Hybrid Search: tsvector + pgvector

Why Pure Vector Search Isn't Enough

If you finished the beginner track, you can already store embeddings in a vector column and run ORDER BY embedding <=> query to get semantic neighbors. That's powerful, but on its own it quietly fails on a whole category of queries.

Where Embeddings Fall Down

Vector search retrieves by meaning, which is exactly wrong when the user wants an exact token:

Identifiers and codes: SKU-4471, ORA-00942, CVE-2024-3094. Embedding models split these into odd subword pieces, so vector distance is an unreliable signal for whether two codes are the same string.
Rare proper nouns: a surname or product name the embedding model never saw in training collapses toward generic neighbors.
Negation and exact phrasing: "invoices not yet paid" embeds close to "paid invoices."
Out-of-domain jargon: internal acronyms that simply aren't in the model's vocabulary.

Keyword search has the opposite failure mode: it nails exact terms but misses synonyms and paraphrases ("car" vs "automobile", "how do I cancel" vs "termination process").
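
The synonym gap is easy to see directly in Postgres. The sketch below assumes the built-in 'english' text search configuration with no custom synonym dictionary; the sample sentences are made up for illustration.

```sql
-- Stemming handles inflections: 'cars' and 'car' reduce to the same lexeme.
SELECT to_tsvector('english', 'We insure cars and trucks')
       @@ plainto_tsquery('english', 'car') AS matches_inflection;   -- true

-- No synonym handling by default: 'automobile' and 'car' are unrelated lexemes.
SELECT to_tsvector('english', 'We insure automobiles and trucks')
       @@ plainto_tsquery('english', 'car') AS matches_synonym;      -- false
```

Full-text search normalizes word forms, not meanings, so the second document is invisible to a query for "car" unless a semantic retriever picks it up.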

The Hybrid Idea

Hybrid search runs both retrievers and fuses their results:

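A lexical retriever: Postgres full-text search over a tsvector, ranked by ts_rank or ts_rank_cd. These score by term frequency and cover density (how close together the matched terms appear); unlike BM25, they use no corpus-wide inverse document frequency.
A semantic retriever: pgvector nearest-neighbor search by cosine or L2 distance.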

Each retriever produces a ranked list. A fusion step merges the two lists into one final ranking. The fused ranking typically outperforms either retriever alone on mixed real-world queries, because each retriever covers the other's blind spots.
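
As a preview of where the later sections end up, here is a minimal sketch of the two retrievers fused with Reciprocal Rank Fusion (the method Section 4 covers). Everything named here is an assumption for illustration: the docs table and its body_tsv and embedding columns, the per-retriever limit of 20, and the constant k = 60, which is a commonly used default rather than a tuned value.

```sql
-- Assumed (hypothetical) table: docs(id bigint, body_tsv tsvector, embedding vector).
-- $1 = the user's query text, $2 = the query embedding (same dimension as docs.embedding).
PREPARE hybrid_search(text, vector) AS
WITH lexical AS (          -- retriever 1: full-text matches, best 20 by ts_rank_cd
    SELECT d.id,
           row_number() OVER (ORDER BY ts_rank_cd(d.body_tsv, q.tsq) DESC) AS rnk
    FROM docs AS d
    CROSS JOIN websearch_to_tsquery('english', $1) AS q(tsq)
    WHERE d.body_tsv @@ q.tsq
    ORDER BY ts_rank_cd(d.body_tsv, q.tsq) DESC
    LIMIT 20
),
semantic AS (              -- retriever 2: 20 nearest neighbors by cosine distance
    SELECT id,
           row_number() OVER (ORDER BY embedding <=> $2) AS rnk
    FROM docs
    ORDER BY embedding <=> $2
    LIMIT 20
)
-- Fusion: each list contributes 1 / (k + rank); k = 60 is an assumed constant.
SELECT COALESCE(l.id, s.id) AS id,
       COALESCE(1.0 / (60 + l.rnk), 0.0)
     + COALESCE(1.0 / (60 + s.rnk), 0.0) AS rrf_score
FROM lexical AS l
FULL OUTER JOIN semantic AS s ON s.id = l.id
ORDER BY rrf_score DESC
LIMIT 10;
```

The FULL OUTER JOIN keeps documents that only one retriever found; a document found by both gets two contributions and rises toward the top. Only ranks enter the final score, so the raw ts_rank_cd values and cosine distances never need to share a scale.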

One Database, Two Indexes

The nice part for a Postgres shop: you don't need a separate search cluster. A single table can carry both a tsvector column (with a GIN index) and a vector column (with an HNSW or IVFFlat index). Both retrievers, the fusion, and your business filters all live in one SQL query against one transactional store — no sync pipeline, no dual-write consistency problem.
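
A minimal schema sketch of that layout follows. The table and column names, the 'english' configuration, and the 384-dimension embedding size are all assumptions for illustration; it also assumes PostgreSQL 12 or later (for generated columns) and pgvector 0.5.0 or later (for HNSW).

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    -- Kept in sync with body automatically; no trigger or application code needed.
    body_tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
    -- Dimension must match the embedding model in use (384 is a placeholder).
    embedding vector(384)
);

-- Lexical side: GIN index for @@ full-text matching.
CREATE INDEX docs_body_tsv_idx ON docs USING gin (body_tsv);

-- Semantic side: HNSW index for the cosine-distance operator <=>.
CREATE INDEX docs_embedding_idx ON docs USING hnsw (embedding vector_cosine_ops);
```

The operator class has to match the distance operator used at query time: vector_cosine_ops serves <=>, while L2 distance (<->) would need vector_l2_ops instead.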

What This Day Covers

Section 2: Postgres full-text search internals — tsvector, tsquery, ts_rank, and the GIN index.
Section 3: pgvector recap and why its scores aren't directly comparable to ts_rank.
Section 4: Reciprocal Rank Fusion (RRF) and why fusing ranks beats fusing scores.
Section 5: Implementing the full hybrid query in SQL with CTEs, plus when hybrid is and isn't worth it.

Key Takeaways

Vector search retrieves by meaning and misses exact tokens like codes, IDs, and rare names
Keyword (full-text) search nails exact terms but misses synonyms and paraphrases
Hybrid search runs both retrievers and fuses the rankings — covering each other's blind spots, all inside one Postgres table
