Pull back the curtain on vectorDB.search(). Understand how HNSW graphs, IVF clustering, and quantization actually work — and which combination to pick when your dataset grows from a million vectors to a billion.
The Beginner course treated vectorDB.search() as magic. This day pulls back the curtain. The first thing to understand is why anyone bothered inventing complicated index structures in the first place.
Given a query vector and a database of N vectors, the most straightforward way to find the nearest neighbors is:
This is brute-force k-nearest-neighbors (k-NN). It's O(N × D) per query, where N is the number of vectors and D is the dimension. For N=10,000 and D=768, that's about 7.6 million floating-point operations — a few milliseconds on a modern CPU. Perfectly fine.
The trouble starts when N grows. Real-world vector databases hold a lot of vectors:
| Use case | Approximate N |
|---|---|
| Personal notes app | 10,000 |
| Mid-size company wiki | 1,000,000 |
| Customer support corpus | 10,000,000 |
| Web-scale image search | 1,000,000,000 |
At a billion vectors of 768 dimensions, a single brute-force query needs 768 billion multiplies — minutes per query on a single machine, before you've even sorted. Multiply that by every concurrent user and you have a system that can't function.
The fix is to give up a tiny bit of accuracy in exchange for a massive speedup. Instead of finding the exact top K neighbors, you find vectors that are almost certainly among the top K, fast.
This is Approximate Nearest Neighbors search. The trade-off is measured by recall — the fraction of the true top K that your ANN search actually returned. A well-tuned ANN system returns 95–99% of the true top K while running 10–1000x faster than brute force.
ANN algorithms fall into a few families:
In modern production systems, you'll mostly encounter HNSW and IVF, often combined with quantization to compress the vectors themselves. The next three sections cover each of these in detail.
It's tempting to think "I'll just use whatever my vector DB defaults to and tune later." That works until it doesn't. The choice of index determines:
Get the choice wrong at the start and you'll either burn money on oversized infrastructure or watch p99 latency climb past your SLO and not know why.
Powered by advanced LLM
Get personalized help with concepts, code examples, and explanations tailored to your learning pace.
Chunking Strategies