Where dense retrieval fails, BM25 wins, and vice versa. The fundamentals of keyword search, Reciprocal Rank Fusion (the production standard for combining the two), when hybrid actually beats pure dense, and how Pinecone / Weaviate / Qdrant / Elasticsearch each implement it.
Pure dense retrieval, the kind covered in the Beginner course, is excellent at matching meaning. Ask "how do I make my database faster" and it finds docs about query tuning, indexing, and caching even when none of them use the literal phrase "make my database faster."
But it has a blind spot, and the blind spot is exactly where keyword search excels.
Dense embedding models compress text into a fixed-size vector that captures semantic meaning. The mechanism that makes them work for paraphrase — collapsing surface variation into the same conceptual region — also makes them lose information about literal tokens.
Three classes of queries where this hurts:
Identifiers and codes. A user types "error E_AUTH_4012". The vector for that string captures something like "this looks like an authentication-related error message." It does NOT carry the literal token E_AUTH_4012 as a strong feature. A doc that literally contains E_AUTH_4012 won't necessarily rank above a doc that talks about authentication errors in general.
Person and product names. "When did John Smith join the company?" — the embedding represents "someone with a first and last name joined." It's weakly anchored to "John Smith" specifically. If your corpus has hundreds of name-mention chunks, vector search returns name-mention-like results, not the right person's biographical chunk.
Out-of-distribution terminology. Domain-specific jargon that wasn't in the embedding model's training data (chemical names, gene symbols, protocol identifiers, your company's internal codenames) is typically broken into subword pieces the model has little signal for. The resulting vector lands near text that merely looks similar on the surface, which is often about a different term entirely.
Keyword search does the opposite. Specifically, the BM25 algorithm that powers Elasticsearch, OpenSearch, Solr, and most other production text search systems scores each document on how often the literal query terms appear in it, with adjustments for term rarity (rare terms count for more) and document length (long documents get no automatic advantage).
That's terrible at paraphrase ("database speed" doesn't match "make my DB faster") but excellent at exact-term match. E_AUTH_4012 either appears or doesn't; BM25 nails it instantly.
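To make the scoring concrete, here is a minimal sketch of BM25 in plain Python. It is an illustration under stated assumptions, not how Elasticsearch or any other engine implements it: the whitespace tokenizer, the toy documents, and the function names are all hypothetical, the IDF term uses one common smoothed variant, and `k1=1.2`, `b=0.75` are commonly used defaults rather than tuned values.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); real engines
    # use configurable analyzers with stemming, stop words, etc.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term that is absent contributes nothing
            # Rarity adjustment: rarer terms get a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency with saturation (k1) and length normalization (b).
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus for illustration only.
docs = [
    "authentication errors usually mean the login token expired",
    "error e_auth_4012 is raised when the refresh token is revoked",
    "tune indexes and caching to speed up slow queries",
]

exact_scores = bm25_scores("error E_AUTH_4012", docs)
paraphrase_scores = bm25_scores("make my database faster", docs)
```

With these toy documents, only the one containing the literal code scores above zero for the error query, while the paraphrased query scores zero everywhere because none of its words appear in any document.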
This is the whole motivation for hybrid search. Dense and sparse retrieval miss different things:
| Query type | Dense | BM25 |
|---|---|---|
| "How do I make my DB faster" | ✓ finds rephrasing | ✗ misses if exact words absent |
| "error E_AUTH_4012" | ✗ doesn't anchor on token | ✓ exact match wins |
| "Who is John Smith" | ✗ name not strongly represented | ✓ matches "John Smith" literally |
| "best practices for caching" | ✓ semantic match | ✓ literal word overlap |
The two methods agree on the easy queries and disagree on the hard ones. Combining them keeps the wins from each.
Hybrid search runs both retrievals (dense ANN search and BM25), gets a ranked list from each, and fuses the two lists into a single ranked list before passing it to the LLM (or to the user).
The fusion strategy matters; the next three sections cover the math and the trade-offs. But the architecture is universal: index your docs twice (once with embeddings, once with an inverted index), query twice in parallel, merge the results.
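The query-twice-then-merge shape can be sketched in a few lines of Python. This is a rough illustration, not a production implementation: `dense_search` and `keyword_search` are hypothetical stand-ins that return hard-coded document IDs, and the merge step uses Reciprocal Rank Fusion with the commonly used constant `k=60` as one possible fusion strategy.

```python
from concurrent.futures import ThreadPoolExecutor


def dense_search(query):
    # Hypothetical stand-in for an ANN query against a vector index.
    return ["doc_general_auth", "doc_e_auth_4012", "doc_caching"]


def keyword_search(query):
    # Hypothetical stand-in for a BM25 query against an inverted index.
    return ["doc_e_auth_4012", "doc_error_codes"]


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: each list adds 1 / (k + rank) per doc."""
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


def hybrid_search(query, k=60):
    # Run both retrievals in parallel, then merge the two ranked lists.
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query)
        keyword_future = pool.submit(keyword_search, query)
        ranked_lists = [dense_future.result(), keyword_future.result()]
    return reciprocal_rank_fusion(ranked_lists, k=k)


results = hybrid_search("error E_AUTH_4012")
```

In this toy run, the document that appears in both lists rises to the top of the fused ranking even though the dense stand-in ranked it second. Only ranks are used in the merge, so the two retrievers' raw scores never need to be on the same scale.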
Look at the search systems that handle the highest traffic in the world:
Dense-only retrieval is the beginner default. Large-scale production systems are almost all hybrid. This day is about closing that gap.