Name: Sharding with Citus for Billion-Scale Vectors
Availability: InStock

When pgvector Outgrows One Node

By now you can run pgvector on a single Postgres instance: a vector column, an HNSW index, and ORDER BY embedding <=> $1 LIMIT 10. That setup is genuinely good — it ships, it has transactions, and it fits the vast majority of products. This day is about the next problem: the one Postgres node that can no longer hold the index.

The Wall Is RAM, Not Disk

HNSW is an in-memory graph. For acceptable query latency, the index has to live in RAM (or at least in the OS page cache). A 1536-dim float32 vector is ~6 KB; the HNSW graph adds roughly m × 8 bytes per vector for neighbor links (~128 bytes at m=16), plus per-tuple Postgres overhead. Round to ~7 KB of resident memory per vector.

10M vectors → ~70 GB — comfortable on one big node
100M vectors → ~700 GB — you're now buying the largest instances
1B vectors → ~7 TB — past what a single machine holds

You hit the RAM wall long before you hit a disk or CPU wall. Once the working set spills out of memory, HNSW traversal turns random reads into page faults and p99 latency falls off a cliff.

Vertical First, Always

Before reaching for a cluster, exhaust the single node — it is dramatically simpler to operate:

Bigger instance. A 1–2 TB-RAM machine holds ~150–250M vectors. That covers most teams forever.
Quantization. pgvector's halfvec (float16) halves index memory at a small recall cost; binary/scalar quantization can cut 4–32×. This often buys you another order of magnitude on one node.
Lower dimensions. Matryoshka embeddings or PCA truncation shrink each vector linearly.
Partial / filtered indexes. If most queries hit recent data, index only the hot partition.

The honest rule of thumb: shard pgvector only when a single, well-quantized, well-provisioned node can't hold your working set or serve your write/query load — and you can say which of those is true, with numbers.

Enter Citus

Citus is a Postgres extension (now Microsoft-owned, also the engine behind Azure Cosmos DB for PostgreSQL) that turns one Postgres into a distributed cluster. It is still Postgres — same SQL, same drivers, same pgvector. The architecture:

A coordinator node holds metadata and plans queries. Clients connect here.
Worker nodes hold the actual data, split into shards (each shard is a regular Postgres table).
A distributed table is one logical table whose rows are spread across worker shards by a distribution column.

You keep writing SELECT ... ORDER BY embedding <=> $1; Citus rewrites it into per-shard queries, runs them on the workers in parallel, and combines the results. The rest of this day is about making that rewrite correct and fast for vector workloads, where the naive combine step has a recall trap.

Key Takeaways

Single-node pgvector hits a RAM wall (HNSW must be resident) around hundreds of millions of vectors — not a disk or CPU wall
Exhaust vertical options first: bigger instance, halfvec/quantization, lower dims, partial indexes — they are far simpler than a cluster
Citus turns Postgres into a coordinator + workers cluster of shards; it stays Postgres, so pgvector and your SQL keep working

Sharding with Citus for Billion-Scale Vectors

When pgvector Outgrows One Node

When pgvector Outgrows One Node

The Wall Is RAM, Not Disk

Vertical First, Always

Enter Citus

Distributed Tables and the Distribution Key

Distributed Vector Search and Top-K Merging

Co-location, Reference Tables, and Joins at Scale

Is Citus Worth It? Operations and the Decision

AI Learning Assistant

Course Stats

Up Next