Name: Index Types: HNSW, IVF, and Quantization
Availability: InStock

Why Brute Force Breaks

The Beginner course treated vectorDB.search() as magic. This day pulls back the curtain. The first thing to understand is why anyone bothered inventing complicated index structures in the first place.

The Naive Approach

Given a query vector and a database of N vectors, the most straightforward way to find the nearest neighbors is:

Compute the distance between the query and every vector in the database
Sort by distance
Return the top K

This is brute-force k-nearest-neighbors (k-NN). It's O(N × D) per query, where N is the number of vectors and D is the dimension. For N=10,000 and D=768, that's about 7.6 million floating-point operations — a few milliseconds on a modern CPU. Perfectly fine.

Where It Falls Apart

The trouble starts when N grows. Real-world vector databases hold a lot of vectors:

Use case	Approximate N
Personal notes app	10,000
Mid-size company wiki	1,000,000
Customer support corpus	10,000,000
Web-scale image search	1,000,000,000

At a billion vectors of 768 dimensions, a single brute-force query needs 768 billion multiplies — minutes per query on a single machine, before you've even sorted. Multiply that by every concurrent user and you have a system that can't function.

Approximate Nearest Neighbors (ANN)

The fix is to give up a tiny bit of accuracy in exchange for a massive speedup. Instead of finding the exact top K neighbors, you find vectors that are almost certainly among the top K, fast.

This is Approximate Nearest Neighbors search. The trade-off is measured by recall — the fraction of the true top K that your ANN search actually returned. A well-tuned ANN system returns 95–99% of the true top K while running 10–1000x faster than brute force.

Three Families of ANN

ANN algorithms fall into a few families:

Graph-based (HNSW, NSG, DiskANN) — build a navigable graph between vectors, search by walking it
Cluster-based (IVF, ScaNN) — group vectors into clusters, search only the closest few clusters
Tree-based (Annoy, KD-tree) — partition the space into a tree, prune branches that can't contain the answer
Hash-based (LSH) — hash similar vectors to the same bucket

In modern production systems, you'll mostly encounter HNSW and IVF, often combined with quantization to compress the vectors themselves. The next three sections cover each of these in detail.

Why You Should Care

It's tempting to think "I'll just use whatever my vector DB defaults to and tune later." That works until it doesn't. The choice of index determines:

Query latency (a 10ms HNSW lookup vs a 200ms IVF lookup at the same recall)
Build time (HNSW builds slowly; IVF builds fast)
Memory footprint (HNSW needs 2–4x the raw vector size; IVF-PQ can be 30x smaller)
Recall ceiling (HNSW can hit 99%+ recall; LSH typically tops out at 85%)

Get the choice wrong at the start and you'll either burn money on oversized infrastructure or watch p99 latency climb past your SLO and not know why.

Key Takeaways

Brute-force k-NN is O(N × D) per query — fine at 10K vectors, hopeless at 1B
ANN algorithms trade a small recall loss for a 10–1000x speedup
The right index choice affects latency, memory, build time, and your recall ceiling

Index Types: HNSW, IVF, and Quantization

Why Brute Force Breaks

Why Brute Force Breaks

The Naive Approach

Where It Falls Apart

Approximate Nearest Neighbors (ANN)

Three Families of ANN

Why You Should Care

HNSW — The Graph You Walk

IVF — Cluster First, Search Second

Quantization — Compressing the Vectors Themselves

Choosing Between Them

AI Learning Assistant

Course Stats

Up Next