Fine-tune models on curated datasets using supervised fine-tuning, LoRA, and QLoRA without destroying the base model’s general capabilities
Apply reinforcement learning from verifiable rewards (RLVR) and modern preference optimisation methods, including DPO and ORPO, to shape model behaviour
Evaluate models rigorously: design benchmarks, detect regressions, and make quality claims that survive scrutiny
Adapt models to specialised domains — from clinical language to legal text — turning general capability into a defensible competitive advantage
Train agentic models that take sequences of actions reliably, not just models that talk about taking actions
Quantise and compress fine-tuned models for deployment without sacrificing the gains you trained for
The literature on post-training either focuses on small educational use cases that ignore enterprise realities or presupposes the workflows of foundation labs. There’s nothing for the crucial middle: enterprise practitioners with real compute budgets who need to customise, align, and deploy AI at scale. This book fills that gap.
The book treats post-training decisions as trade-offs rather than best practices, helping practitioners match techniques to their constraints. Decision frameworks throughout document the costs and benefits of each approach.
Combines technical depth with strategic context. Includes companion Jupyter notebooks covering practical implementation. Shows how to embed proprietary knowledge, organisational values, and domain expertise into foundation models.
Part I: The Foundation
Part II: The Tools
Part III: The Craft
Part IV: The Frontier
Be the first to know when the book is available for early access. No spam — just a single email when it’s ready.
Post-training is where models stop being impressive and start being useful.