Name: Metadata Filtering & Query Planning
Availability: InStock

Filtered Vector Search: The Real Query

Day 1 showed how to combine dense vector similarity with sparse keyword matching. But in production, almost no RAG query is only a similarity search. It's a similarity search constrained by metadata: "find the chunks most similar to this question where tenant_id = 42 and doc_type = 'policy' and published_at > now() - interval '90 days'."

Why Filters Are Everywhere

The metadata predicates come from the application, not the model:

Multi-tenancy — never leak tenant A's documents into tenant B's results
Access control — only return rows the current user is allowed to see
Freshness — exclude stale or superseded versions
Faceting — language, source, category, region

In Postgres with pgvector this looks deceptively simple:

SELECT id, content
FROM chunks
WHERE tenant_id = 42 AND doc_type = 'policy'
ORDER BY embedding <=> $1
LIMIT 10;

The <=> operator is cosine distance (<-> is L2, <#> is negative inner product). The ORDER BY ... LIMIT is the KNN search. The WHERE is the filter. The hard part — and the subject of this whole day — is what the planner does with those two requirements together.

Two Ways to Run the Same Query

Postgres can satisfy that query in fundamentally different ways:

Exact / sequential: scan rows, apply the WHERE, sort the survivors by distance, take the top 10. Always correct, but O(N) — fine for thousands of rows, not millions.
Approximate index scan: walk an HNSW or IVFFlat index ordered by distance, and check the WHERE predicate on each candidate as it comes out.

The second is fast but introduces a subtlety that does not exist in unfiltered search: the index is ordered by distance, not by your filter. The vectors that pass your filter might be scattered deep in the index, and an approximate scan can give up before it finds enough of them. That is the filtered-ANN problem, and the rest of this day is about recognizing and fixing it.

The Mental Model

Think of an HNSW index as a guided tour of the nearest vectors in distance order. Unfiltered, the tour hands you the 10 closest and stops. Filtered, the tour hands you the closest vectors but you reject the ones that fail your WHERE. If only 1% of rows match your filter, the tour may walk past hundreds of rejected neighbors before finding 10 keepers — or hit its internal search-budget limit first and return too few, or miss the truly-nearest matching rows entirely. Lost recall, silently.

Key Takeaways

Production RAG queries are KNN search (ORDER BY embedding <=> $1 LIMIT k) plus a metadata WHERE clause
Postgres can answer them exactly (seq/sort, always correct, O(N)) or approximately (vector index scan, fast, recall-sensitive)
Combining a distance-ordered index with an unrelated filter is the filtered-ANN problem — the core challenge of the day

Metadata Filtering & Query Planning

Filtered Vector Search: The Real Query

Filtered Vector Search: The Real Query

Why Filters Are Everywhere

Two Ways to Run the Same Query

The Mental Model

Reading EXPLAIN and EXPLAIN ANALYZE

The Filtered-ANN Problem: Pre vs Post Filtering

Indexes That Support Your Filters

A Tuning Workflow for Filtered KNN

AI Learning Assistant

Course Stats

Up Next