
Chunking Strategies That Don't Ruin Retrieval

The single highest-leverage decision in RAG retrieval quality. Fixed-size vs token-aware, recursive splitting, overlap and the boundary problem, parent-document retrieval, and layout-aware chunking for markdown, HTML, PDF, code, and tables.


Why Chunk Size Is a Real Engineering Decision

Most beginners treat chunking as a configuration value — "use 500, that's what the tutorial said." The Intermediate course position is sharper: chunking is the single biggest decision controlling retrieval quality, and it's mostly invisible until you measure.

The Trade-Off Triangle

Three properties pull against each other:

  • Specificity — small chunks match queries precisely. "When is the deadline?" matches a 30-token chunk that says "The deadline is March 5" with high cosine similarity.
  • Context — large chunks give the LLM enough surrounding material to answer. The same March 5 deadline embedded in a 1000-token chunk also explains what deadline, the reasoning, exceptions, and follow-up actions.
  • Discriminability — a chunk that contains one idea matches one kind of query well. A chunk that crams ten ideas into 2000 tokens averages out into a vague vector that matches everything weakly.

You cannot maximize all three. The right answer depends on what your users will ask.
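The discriminability point can be illustrated with a toy model. The sketch below assumes each "idea" is its own axis in a 10-dimensional space and that a chunk's vector is the plain average of the ideas it contains. Real embedding models do not work exactly like this, so treat it as an intuition aid rather than a measurement.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Toy assumption: ten unrelated ideas, one axis each.
DIM = 10
ideas = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]

query = ideas[0]                    # a query about idea 0
focused_chunk = ideas[0]            # chunk containing only idea 0
crammed_chunk = mean_vector(ideas)  # chunk mixing all ten ideas

focused_score = cosine(query, focused_chunk)
crammed_score = cosine(query, crammed_chunk)

# The crammed chunk scores the same against a query about any of the ideas.
crammed_scores_all = [cosine(idea, crammed_chunk) for idea in ideas]
```

In this toy setup the focused chunk scores 1.0 against its query, while the crammed chunk scores about 0.32 against every one of the ten possible queries: it matches everything, and nothing strongly.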

Chunk size            | Specificity | Context   | Discriminability
Tiny (50–150 tokens)  | High        | Low       | High
Small (200–400)       | High        | Medium    | High
Medium (500–800)      | Medium      | Medium    | Medium
Large (1000–2000)     | Low         | High      | Low
Huge (>2000)          | Very low    | Very high | Very low

Where Each Size Wins

  • Tiny chunks: FAQ-style queries where the answer is a single sentence. Q&A bots over short docs.
  • Small chunks: most RAG over technical documentation. The sweet spot most production systems land on.
  • Medium chunks: domain-heavy material where each paragraph carries multiple related facts. Legal contracts, scientific papers.
  • Large chunks: summarization tasks. When the LLM needs to see "the whole section" to answer.

The Sneaky Cost

Chunking happens at index time. To change your chunk size, you re-chunk every document, re-embed every chunk, and re-upsert every vector. For 10M docs at OpenAI's embedding pricing, that's on the order of a few hundred dollars and a few hours, depending on average document length and which embedding model you use. Not catastrophic, but not a switch you flip lightly.
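A back-of-the-envelope version of that estimate, as a sketch. The average document length, the overlap inflation, and the per-token rate below are all illustrative assumptions, not quoted pricing; substitute your own corpus statistics and your provider's current price.

```python
def reembedding_cost_usd(num_docs, avg_tokens_per_doc, usd_per_million_tokens,
                         overlap_ratio=0.0):
    """Estimate the embedding bill for re-indexing a corpus.

    overlap_ratio is the fraction of extra tokens embedded because
    neighbouring chunks share text (0.10 means 10% more tokens).
    """
    total_tokens = num_docs * avg_tokens_per_doc * (1.0 + overlap_ratio)
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Assumed inputs: 10M docs, ~1,000 tokens each, 10% overlap,
# and a placeholder rate of $0.02 per million tokens.
cost = reembedding_cost_usd(
    num_docs=10_000_000,
    avg_tokens_per_doc=1_000,
    usd_per_million_tokens=0.02,
    overlap_ratio=0.10,
)
print(f"Estimated re-embedding cost: ${cost:,.0f}")
```

Under these assumptions the estimate comes to roughly $220. The figure scales linearly with document length and with the per-token rate, so a pricier model or longer documents can move it into the thousands. It also excludes vector-store write costs and engineering time.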

This is why "measure before you ship" matters. Calibrate on a representative gold set (covered in Day 5 of this course) before committing to a chunk size in production. The cost of re-chunking is much higher than the cost of measuring well the first time.

Why It's Hard to Improve Later

Three things make chunking a high-leverage / hard-to-improve decision:

  1. It's invisible. Retrieval still returns something. Bad chunking produces results that look superficially fine but are systematically worse than they could be. You won't notice until you eval.
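  2. It's coupled to your embedding model. Embedding models differ widely in the input length they handle: all-MiniLM-L6-v2 truncates input beyond 256 word pieces by default, while text-embedding-3-large accepts up to 8191 tokens. Those numbers are input limits, not recommended chunk sizes. Chunks much shorter or longer than the passages a model was trained on embed less well, and anything past the limit is truncated or rejected.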
  3. It interacts with your retrieval architecture. If you're using hybrid search, chunking affects BM25 recall too — and BM25 prefers slightly larger chunks than dense search does.
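One cheap guard for the second point is to check chunk lengths against the embedding model's input limit before indexing. The sketch below uses the two limits mentioned above. The whitespace-based token count is an assumption made to keep the example dependency-free; it undercounts compared with a real tokenizer, so pass in the model's own tokenizer for anything that matters.

```python
# Input limits taken from the text above (tokens / word pieces).
MODEL_MAX_TOKENS = {
    "all-MiniLM-L6-v2": 256,
    "text-embedding-3-large": 8191,
}

def approx_token_count(text):
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def oversized_chunks(chunks, model_name, count_tokens=approx_token_count):
    """Return the indexes of chunks that exceed the model's input limit."""
    limit = MODEL_MAX_TOKENS[model_name]
    return [i for i, chunk in enumerate(chunks) if count_tokens(chunk) > limit]

# Hypothetical chunks: one of 100 words, one of 300 words.
chunks = ["word " * 100, "word " * 300]
too_long_for_minilm = oversized_chunks(chunks, "all-MiniLM-L6-v2")
too_long_for_large = oversized_chunks(chunks, "text-embedding-3-large")
```

Here the 300-word chunk is flagged for all-MiniLM-L6-v2 and neither chunk is flagged for text-embedding-3-large.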

A Reasonable Starting Default

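If you have no data yet: start with 500 tokens, a 50-token overlap, and a recursive splitter. Note that this is not what LangChain gives you out of the box: its RecursiveCharacterTextSplitter measures length in characters by default and ships with larger defaults (chunk_size=4000, chunk_overlap=200), so set the sizes and a token-based length function explicitly. It's not optimal for any specific case, but it's a defensible starting point you can measure from.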
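To make "recursive splitter with overlap" concrete, here is a minimal hand-written sketch. It is not LangChain's implementation. The separator list, the function names, and the use of whitespace-separated words as "tokens" are all assumptions chosen to keep the example self-contained.

```python
SEPARATORS = ["\n\n", "\n", ". ", " "]  # coarse to fine

def count_tokens(text):
    """Assumption: whitespace words stand in for model tokens."""
    return len(text.split())

def recursive_split(text, chunk_size, separators=SEPARATORS):
    """Split into pieces of at most chunk_size tokens, preferring coarse separators."""
    if count_tokens(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: hard-cut every chunk_size words.
        words = text.split()
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        if not part.strip():
            continue
        candidate = current + sep + part if current else part
        if count_tokens(candidate) <= chunk_size:
            current = candidate          # keep merging small parts
            continue
        if current:
            chunks.append(current)       # flush what we have
            current = ""
        if count_tokens(part) <= chunk_size:
            current = part
        else:
            # Part is still too big: retry with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

def add_overlap(chunks, overlap):
    """Prefix each chunk after the first with the tail of its predecessor."""
    out = []
    for i, chunk in enumerate(chunks):
        if i == 0 or overlap <= 0:
            out.append(chunk)
        else:
            tail = chunks[i - 1].split()[-overlap:]
            out.append(" ".join(tail) + " " + chunk)
    return out

def chunk_text(text, chunk_size=500, overlap=50):
    """Chunks of at most chunk_size tokens, including the overlap prefix."""
    pieces = recursive_split(text, chunk_size - overlap)
    return add_overlap(pieces, overlap)
```

A short usage example: `chunk_text(document_text)` returns a list of strings, each at most 500 whitespace tokens, where every chunk after the first begins with the last 50 tokens of the previous piece. Splitting at `chunk_size - overlap` before adding the overlap is one design choice among several; it keeps the final chunks under the budget at the cost of slightly smaller core pieces.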

Key Takeaways
  • Chunking trades specificity, context, and discriminability against each other — no single setting wins for every workload
  • Chunk size is set at index time; re-chunking means re-embedding everything, so the calibration up front pays back forever
  • If you have no data yet, start with 500 tokens / 50 overlap / recursive splitter, set explicitly rather than relying on library defaults, and measure from there
