Back to Courses

Index Types: HNSW, IVF, and Quantization

Pull back the curtain on vectorDB.search(). Understand how HNSW graphs, IVF clustering, and quantization actually work — and which combination to pick when your dataset grows from a million vectors to a billion.

Day 1 Progress0%

Why Brute Force Breaks

The Beginner course treated vectorDB.search() as magic. This day pulls back the curtain. The first thing to understand is why anyone bothered inventing complicated index structures in the first place.

The Naive Approach

Given a query vector and a database of N vectors, the most straightforward way to find the nearest neighbors is:

  1. Compute the distance between the query and every vector in the database
  2. Sort by distance
  3. Return the top K

This is brute-force k-nearest-neighbors (k-NN). It's O(N × D) per query, where N is the number of vectors and D is the dimension. For N=10,000 and D=768, that's about 7.6 million floating-point operations — a few milliseconds on a modern CPU. Perfectly fine.

Where It Falls Apart

The trouble starts when N grows. Real-world vector databases hold a lot of vectors:

Use caseApproximate N
Personal notes app10,000
Mid-size company wiki1,000,000
Customer support corpus10,000,000
Web-scale image search1,000,000,000

At a billion vectors of 768 dimensions, a single brute-force query needs 768 billion multiplies — minutes per query on a single machine, before you've even sorted. Multiply that by every concurrent user and you have a system that can't function.

Approximate Nearest Neighbors (ANN)

The fix is to give up a tiny bit of accuracy in exchange for a massive speedup. Instead of finding the exact top K neighbors, you find vectors that are almost certainly among the top K, fast.

This is Approximate Nearest Neighbors search. The trade-off is measured by recall — the fraction of the true top K that your ANN search actually returned. A well-tuned ANN system returns 95–99% of the true top K while running 10–1000x faster than brute force.

Three Families of ANN

ANN algorithms fall into a few families:

  • Graph-based (HNSW, NSG, DiskANN) — build a navigable graph between vectors, search by walking it
  • Cluster-based (IVF, ScaNN) — group vectors into clusters, search only the closest few clusters
  • Tree-based (Annoy, KD-tree) — partition the space into a tree, prune branches that can't contain the answer
  • Hash-based (LSH) — hash similar vectors to the same bucket

In modern production systems, you'll mostly encounter HNSW and IVF, often combined with quantization to compress the vectors themselves. The next three sections cover each of these in detail.

Why You Should Care

It's tempting to think "I'll just use whatever my vector DB defaults to and tune later." That works until it doesn't. The choice of index determines:

  • Query latency (a 10ms HNSW lookup vs a 200ms IVF lookup at the same recall)
  • Build time (HNSW builds slowly; IVF builds fast)
  • Memory footprint (HNSW needs 2–4x the raw vector size; IVF-PQ can be 30x smaller)
  • Recall ceiling (HNSW can hit 99%+ recall; LSH typically tops out at 85%)

Get the choice wrong at the start and you'll either burn money on oversized infrastructure or watch p99 latency climb past your SLO and not know why.

Key Takeaways
  • Brute-force k-NN is O(N × D) per query — fine at 10K vectors, hopeless at 1B
  • ANN algorithms trade a small recall loss for a 10–1000x speedup
  • The right index choice affects latency, memory, build time, and your recall ceiling

AI Learning Assistant

Powered by advanced LLM

Get personalized help with concepts, code examples, and explanations tailored to your learning pace.

Course Stats

Estimated Time
45 min
Lessons
5 sections