Name: Observability & Drift in Production
Availability: InStock

What 'Healthy' Means for a Vector Store

A Postgres + pgvector store that worked perfectly on launch day will degrade silently. Not because the code changed — because the data changed underneath it. Embedding tables are unusually hostile to Postgres's storage model, and the failure mode is rarely a crash. It's a query that used to take 8 ms now taking 400 ms, or a top-5 retrieval that used to be right now quietly returning the wrong chunks.

Three Axes of Health

Operating a vector store means watching three independent things, and confusing them costs you incidents:

Latency — how long a similarity query takes. Degrades from bloat, bad plans, cache misses, and lock contention.
Throughput — how many queries per second the box sustains before latency blows up. Degrades from CPU saturation and autovacuum stealing I/O.
Quality (recall / relevance) — whether the results are actually right. This is the axis traditional DB monitoring completely ignores, and the one that hurts most in a RAG system.

A normal OLTP table has only the first two. The third — retrieval quality — is what makes a vector store special and is the theme that runs through this whole day.

Why Embedding Tables Are Different

Three properties make embedding tables uniquely demanding:

Rows are huge. A 1536-dim float32 vector is ~6 KB. Postgres stores anything over ~2 KB out-of-line in TOAST. So your "row" is really a small heap tuple plus a TOAST chunk, and every fetch can be two reads.
They churn. RAG corpora get re-embedded constantly: a doc is edited, you delete its old chunks and insert new ones; you swap embedding models and rewrite the entire column. High UPDATE/DELETE rates create dead tuples faster than almost any other table you run.
The index is expensive to maintain. HNSW and IVFFlat indexes in pgvector are costly to build and, for HNSW, mutated in place on every insert. They bloat, and rebuilding them is a heavyweight operation you must schedule.

The Observability Stack

For the rest of this day, the toolbox is:

pg_stat_statements — normalized, aggregated query stats. Your #1 tool for "what is slow."
pg_stat_user_tables / pg_stat_user_indexes — dead tuples, last (auto)vacuum, scan counts, index usage.
pgstattuple (extension) — exact bloat measurement when you need ground truth.
A golden eval set — a fixed set of queries with known-correct results, replayed to detect retrieval drift.
Dashboards + alerts wiring all of the above to something a human sees before a customer does.

If you only remember one framing: traditional monitoring tells you the query is slow; only an eval set tells you the answer is wrong. You need both.

Key Takeaways

A vector store has three health axes — latency, throughput, and retrieval quality — and traditional DB monitoring only covers the first two
Embedding tables are hostile to Postgres: huge TOASTed rows, constant churn from re-embedding, and expensive-to-maintain ANN indexes
The core toolbox is pg_stat_statements, pg_stat_user_tables/pgstattuple for bloat, and a golden eval set for drift — slow ≠ wrong, you must watch both

Observability & Drift in Production

What 'Healthy' Means for a Vector Store

What 'Healthy' Means for a Vector Store

Three Axes of Health

Why Embedding Tables Are Different

The Observability Stack

Finding the Queries That Hurt with pg_stat_statements

Bloat, Dead Tuples, and Tuning Autovacuum

When and How to REINDEX (CONCURRENTLY)

Detecting Retrieval Drift, Dashboards & Alerts

AI Learning Assistant

Course Stats

Up Next