Chunking Strategies That Don't Ruin Retrieval

Why Chunk Size Is a Real Engineering Decision

Most beginners treat chunking as a configuration value: "use 500, that's what the tutorial said." The Intermediate course position is sharper: chunking is the single biggest decision controlling retrieval quality, and its effect is mostly invisible until you measure it.

The Trade-Off Triangle

Three properties pull against each other:

Specificity — small chunks match queries precisely. "When is the deadline?" matches a 30-token chunk that says "The deadline is March 5" with high cosine similarity.
Context — large chunks give the LLM enough surrounding material to answer. The same March 5 deadline embedded in a 1000-token chunk also explains which deadline it is, the reasoning behind it, its exceptions, and the follow-up actions.
Discriminability — a chunk that contains one idea matches one kind of query well. A chunk that crams ten ideas into 2000 tokens averages out into a vague vector that matches everything weakly.

You cannot maximize all three. The right answer depends on what your users will ask.

Chunk size (tokens)	Specificity	Context	Discriminability
Tiny (50–150)	High	Low	High
Small (200–400)	High	Medium	High
Medium (500–800)	Medium	Medium	Medium
Large (1000–2000)	Low	High	Low
Huge (>2000)	Very low	Very high	Very low
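
The discriminability rows can be illustrated with a toy calculation. The sketch below is not how a real embedding model behaves: it assumes each idea is its own orthogonal direction and that a chunk's vector is the plain average of the ideas it contains. The names `chunk_vector` and `query_vector` are hypothetical helpers for this illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_vector(topic_id, dim):
    """ASSUMPTION: a query about one idea is a one-hot vector for that idea."""
    vec = [0.0] * dim
    vec[topic_id] = 1.0
    return vec

def chunk_vector(topic_ids, dim):
    """ASSUMPTION: a chunk embeds as the average of the ideas it contains."""
    vec = [0.0] * dim
    for topic_id in topic_ids:
        vec[topic_id] += 1.0 / len(topic_ids)
    return vec

DIM = 16
query = query_vector(0, DIM)

# One-idea chunk versus a chunk that crams in ten ideas (including idea 0).
focused = cosine(query, chunk_vector([0], DIM))
crowded = cosine(query, chunk_vector(list(range(10)), DIM))
```

Under these assumptions the similarity falls off as 1/sqrt(k) for a chunk holding k ideas, so the ten-idea chunk matches the query far more weakly than the focused one, and it matches queries about its other nine ideas equally weakly.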

Where Each Size Wins

Tiny chunks: FAQ-style queries where the answer is a single sentence. Q&A bots over short docs.
Small chunks: most RAG over technical documentation. The sweet spot most production systems land on.
Medium chunks: domain-heavy material where each paragraph carries multiple related facts. Legal contracts, scientific papers.
Large chunks: summarization tasks. When the LLM needs to see "the whole section" to answer.

The Sneaky Cost

Chunking happens at index time. To change your chunk size, you re-chunk every document, re-embed every chunk, and re-upsert every vector. For 10M docs at OpenAI's embedding pricing, that's on the order of a few hundred dollars and a few hours, depending on document length and which embedding model you use. That's not catastrophic, but it's not a switch you flip lightly.
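
A back-of-the-envelope estimate makes the re-chunking bill concrete. The function below is a rough sketch, not a billing calculator: `estimate_reembedding_cost` is a hypothetical helper, the average document length is an assumption, and the per-token price is a placeholder you should replace with the current published price for your model.

```python
def estimate_reembedding_cost(n_docs, avg_tokens_per_doc, chunk_size,
                              overlap, usd_per_million_tokens):
    """Approximate embedding cost (USD) of re-chunking a whole corpus.

    Overlap means some tokens are embedded more than once; with a stride of
    (chunk_size - overlap), each token is embedded roughly
    chunk_size / (chunk_size - overlap) times.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    total_tokens = n_docs * avg_tokens_per_doc
    overlap_factor = chunk_size / (chunk_size - overlap)
    embedded_tokens = total_tokens * overlap_factor
    return embedded_tokens / 1_000_000 * usd_per_million_tokens

# ASSUMPTIONS: 1,000 tokens per document and a placeholder price of
# $0.02 per million tokens. Neither number comes from the course text.
cost = estimate_reembedding_cost(
    n_docs=10_000_000,
    avg_tokens_per_doc=1_000,
    chunk_size=500,
    overlap=50,
    usd_per_million_tokens=0.02,
)
```

With these placeholder inputs the estimate comes to about $222, which is consistent with "a few hundred dollars". The figure scales linearly with document length and price, and it excludes vector-store write costs and engineering time.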

This is why "measure before you ship" matters. Calibrate on a representative gold set (covered in Day 5 of this course) before committing to a chunk size in production. The cost of re-chunking is much higher than the cost of measuring well the first time.
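
One way to compare chunk sizes before committing is a hit rate over the gold set. The sketch below assumes the gold set is a list of (question, answer text) pairs and counts a hit when any of the top-k retrieved chunks contains the answer text, which keeps the metric comparable across chunk sizes. `hit_rate_at_k` and `make_keyword_retriever` are hypothetical names, and the keyword retriever is only a toy stand-in for your real retrieval stack.

```python
def hit_rate_at_k(gold, retrieve, k=5):
    """Fraction of gold questions whose answer text appears in the top-k chunks.

    gold:     list of (question, answer_text) pairs
    retrieve: callable mapping a question to a ranked list of chunk strings
    """
    if not gold:
        return 0.0
    hits = 0
    for question, answer in gold:
        top_chunks = retrieve(question)[:k]
        if any(answer.lower() in chunk.lower() for chunk in top_chunks):
            hits += 1
    return hits / len(gold)

def make_keyword_retriever(chunks):
    """Toy retriever: ranks chunks by how many lowercase words they share
    with the query. Swap in your real vector or hybrid search here."""
    def retrieve(query):
        query_words = set(query.lower().split())
        return sorted(
            chunks,
            key=lambda chunk: len(query_words & set(chunk.lower().split())),
            reverse=True,
        )
    return retrieve

chunks = [
    "The deadline is March 5 for all submissions",
    "Refunds take ten days to process",
]
gold = [
    ("When is the deadline", "March 5"),
    ("How long do refunds take", "ten days"),
]
score = hit_rate_at_k(gold, make_keyword_retriever(chunks), k=1)
```

Running the same gold set against indexes built at two or three candidate chunk sizes gives a direct comparison, at the cost of indexing a sample rather than the full corpus.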

Why It's Hard to Improve Later

Three things make chunking a high-leverage / hard-to-improve decision:

It's invisible. Retrieval still returns something. Bad chunking produces results that look superficially fine but are systematically worse than they could be. You won't notice until you eval.
It's coupled to your embedding model. Some embedding models are built for shorter passages (e.g., all-MiniLM-L6-v2 truncates input beyond 256 word pieces by default), while others accept much longer input (text-embedding-3-large takes up to 8191 tokens). Chunks longer than the limit are truncated, and chunks far from the lengths the model was trained on tend to embed less well.
It interacts with your retrieval architecture. If you're using hybrid search, chunking affects BM25 recall too — and BM25 prefers slightly larger chunks than dense search does.
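
The coupling to the embedding model can be checked mechanically before indexing. The sketch below flags chunks that exceed a model's input limit, using the two limits quoted above. `find_oversized_chunks` is a hypothetical helper, and the whitespace word count is a crude stand-in for the model's real tokenizer, which usually produces more tokens than words.

```python
# Limits as quoted in the text above; verify against each model's documentation.
MODEL_MAX_TOKENS = {
    "all-MiniLM-L6-v2": 256,
    "text-embedding-3-large": 8191,
}

def approx_token_count(text):
    """ASSUMPTION: whitespace-separated words approximate tokens.
    Pass the model's own tokenizer via count_tokens for real use."""
    return len(text.split())

def find_oversized_chunks(chunks, model_name, count_tokens=approx_token_count):
    """Return the indexes of chunks longer than the model's input limit."""
    limit = MODEL_MAX_TOKENS[model_name]
    return [i for i, chunk in enumerate(chunks) if count_tokens(chunk) > limit]

sample_chunks = ["short " * 100, "long " * 300]
too_long_for_minilm = find_oversized_chunks(sample_chunks, "all-MiniLM-L6-v2")
too_long_for_large = find_oversized_chunks(sample_chunks, "text-embedding-3-large")
```

Any chunk flagged here would be silently truncated at embed time, so the tail of the chunk would never influence its vector.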

A Reasonable Starting Default

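If you have no data yet: start with 500 tokens, 50 token overlap, recursive splitter. This is a common tutorial configuration rather than a library default. LangChain's RecursiveCharacterTextSplitter, for example, measures size in characters unless you give it a token-based length function, so set the chunk size and overlap explicitly. It's not optimal for any specific case, but it's a defensible starting point you can measure from.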
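
The starting default (500 tokens, 50-token overlap, recursive splitting) can be sketched without any library. This is a simplified illustration, not LangChain's implementation: `chunk_text`, `recursive_split`, and `merge_with_overlap` are hypothetical names, tokens are approximated by whitespace-separated words, and the original separators are not preserved when chunks are rejoined.

```python
def count_tokens(text):
    """ASSUMPTION: whitespace-separated words approximate tokens."""
    return len(text.split())

def recursive_split(text, max_tokens, separators):
    """Split text into pieces of at most max_tokens, trying coarse
    separators (paragraphs) before fine ones (words)."""
    if count_tokens(text) <= max_tokens:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-split on word boundaries.
        words = text.split()
        return [" ".join(words[i:i + max_tokens])
                for i in range(0, len(words), max_tokens)]
    pieces = []
    for part in text.split(separators[0]):
        pieces.extend(recursive_split(part, max_tokens, separators[1:]))
    return pieces

def merge_with_overlap(pieces, max_tokens, overlap):
    """Greedily pack pieces into chunks, carrying the last `overlap`
    words of each finished chunk into the next one."""
    chunks, current = [], []
    for piece in pieces:
        words = piece.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

def chunk_text(text, max_tokens=500, overlap=50,
               separators=("\n\n", "\n", ". ", " ")):
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    # Pieces are capped at max_tokens - overlap so that a piece plus the
    # carried-over overlap never exceeds max_tokens.
    pieces = recursive_split(text, max_tokens - overlap, list(separators))
    return merge_with_overlap(pieces, max_tokens, overlap)

document = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_text(document)
```

Capping pieces at `max_tokens - overlap` is a design choice that guarantees no chunk exceeds the limit. For production use, a real tokenizer should replace the word count, since token counts typically run higher than word counts.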

Key Takeaways

Chunking trades specificity, context, and discriminability against each other — no single setting wins for every workload
Chunk size is set at index time; re-chunking means re-embedding everything, so the calibration up front pays back forever
If you have no data yet, start with 500 tokens / 50 overlap / recursive splitter, set explicitly rather than inherited from a library default, and measure from there
