Fine-Tuning the Generator: LoRA & PEFT

When prompting and retrieval leave quality on the table, fine-tune the generator itself. Fine-tuning changes how the model responds — its style, format, and reliability — not what it knows. Learn parameter-efficient methods (LoRA, QLoRA), how to build a training set from your traces, and how to deploy adapters without breaking the base model.

Day 2 Progress0%

When to Fine-Tune (and When Not To)

By now you can prompt well, retrieve well, rerank, and evaluate. Fine-tuning is the next lever — and the most misunderstood. Used correctly it is powerful; used as a knowledge-injection hack it quietly wastes weeks.

The Decision Hierarchy

Reach for the cheapest tool that works, in this order:

  1. Prompting — system instructions, few-shot examples, structured output. Free, instant to change.
  2. RAG — when the model needs facts it doesn't have. Knowledge stays external and up to date.
  3. Fine-tuning — when you need the model to behave differently in a way prompting can't reliably achieve.

Only climb to step 3 once steps 1 and 2 are exhausted. Fine-tuning has real costs: a training pipeline, an eval harness, a deployment story, and a model artifact to version and maintain.

Fine-Tuning Changes How, Not What

This is the single most important sentence in this lesson: fine-tuning teaches a model how to respond, not what to know.

It is excellent at shaping behavior:

  • Format & structure — always return valid JSON in a fixed schema, always cite in a house style.
  • Tone & voice — match a brand's register, a legal register, a clinical register.
  • Instruction following — reliably obey complex, repetitive instructions you'd otherwise repeat in every prompt.
  • Classification & routing — crisp, consistent labels on domain text.
  • Prompt compression — bake a long system prompt into the weights, cutting per-request tokens.

The Cardinal Mistake

The most common failure is fine-tuning to inject knowledge — "we'll fine-tune the model on our docs so it knows our product." This backfires:

  • Facts in weights go stale the moment a doc changes; you'd have to retrain to update them.
  • The model learns the style of your docs but still hallucinates specifics confidently.
  • RAG already solves this — cheaper, updatable, and with citations.
If the answer changes when your data changes, that's a job for RAG. If the answer changes when your requirements change — format, tone, behavior — that's a job for fine-tuning.

Fresh or changing facts → RAG. Durable behavior → fine-tuning. Most production systems that fine-tune still use RAG for knowledge; the two are complements.

Key Takeaways
  • Climb the cost ladder in order: prompting, then RAG, then fine-tuning — only fine-tune when behavior (not knowledge) is the gap
  • Fine-tuning changes HOW a model responds (format, tone, instruction-following), not WHAT it knows
  • Fine-tuning to inject facts is the cardinal mistake: facts go stale, the model still hallucinates, and RAG already solves it better

AI Learning Assistant

Powered by advanced LLM

Get personalized help with concepts, code examples, and explanations tailored to your learning pace.

Course Stats

Estimated Time
55 min
Lessons
5 sections