Capstone: Postgres RAG Platform Design

The synthesis. Two worked case studies, a single-tenant internal knowledge assistant and a large multi-tenant SaaS at 100M+ chunks, walk the full design process end-to-end: schema, pgvector indexing, hybrid search, metadata filtering, scaling with replication and Citus, incremental embedding, and observability. Along the way: capacity planning, cost modeling, and a launch playbook.

Two Profiles, One Design Process

This is the final capstone of the PostgreSQL for AI track. Across the previous four days you learned the pieces in isolation: replication (Day 1), Citus sharding (Day 2), embedding pipelines (Day 3), and observability (Day 4). Today you assemble them into a complete, production-grade Postgres-backed RAG platform, and you do it twice, against two deliberately different profiles, so you can see exactly where each decision flips.

The Two Case Studies

Profile A — Internal Knowledge Assistant (single-tenant). A 2,000-person company wants a "chat with our docs" assistant over Confluence, Google Drive, and a Slack export. Roughly 2 million chunks total, growing ~5% a month. Peak load is ~20 QPS during business hours, near zero overnight. One trust boundary (everyone is an employee), modest latency expectations (~2s end-to-end is fine), and a small platform team.

Profile B — Multi-Tenant SaaS RAG (large). A document-intelligence SaaS serves 8,000 customer tenants, each with their own corpus. Aggregate 120 million chunks and climbing 10M/month. Peak 1,500 QPS, strict tenant isolation (a query must never leak across tenants), p95 latency SLO of 400ms for retrieval, and a 99.9% availability target.

These profiles differ in more than size: they differ in isolation model, growth rate, SLO, and team capacity. Those four axes drive almost every decision below.

The Design Process (the spine of this day)

For each profile we walk the same seven-step spine:

  1. Schema — tables, the vector column, metadata columns, and the chunk/document relationship.
  2. pgvector indexing — HNSW vs IVFFlat, dimensionality, quantization, and maintenance_work_mem.
  3. Hybrid search — combining dense vector similarity with full-text (tsvector/BM25-style) ranking.
  4. Metadata filtering — pre- vs post-filter, partial indexes, and the tenant predicate.
  5. Scaling — read replicas vs Citus distribution, and when each is warranted.
  6. Incremental embedding — keeping the index fresh without full re-embeds.
  7. Observability — what to measure and the alerts that catch silent recall loss.
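Step 3 states the goal, combining dense similarity with full-text ranking, but not how to merge the two result lists. One common option, assumed here rather than prescribed by this track, is Reciprocal Rank Fusion (RRF). RRF merges ranked lists by position, so you never have to put cosine distances and `ts_rank` scores on the same scale. The sketch fuses two best-first lists of chunk IDs, such as the results of a pgvector `ORDER BY embedding <=> $1` query and a full-text `ORDER BY ts_rank(...) DESC` query. The chunk IDs are hypothetical, and k = 60 is simply a commonly used default.

```python
"""Reciprocal Rank Fusion of a vector result list and a full-text result list (sketch)."""
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, limit: int = 10) -> list[tuple[str, float]]:
    # Each input list is ordered best-first. A chunk's fused score is the sum of
    # 1 / (k + rank) over every list it appears in, with rank starting at 1.
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]


if __name__ == "__main__":
    # Hypothetical chunk ids, best-first, from each retriever.
    vector_hits = ["c7", "c2", "c9", "c4"]  # e.g. ORDER BY embedding <=> $1
    keyword_hits = ["c2", "c5", "c7"]  # e.g. ORDER BY ts_rank(...) DESC
    for chunk_id, score in rrf_fuse([vector_hits, keyword_hits], limit=5):
        print(chunk_id, round(score, 5))
```

Chunks that both retrievers return (c2 and c7) outrank every chunk found by only one of them, which is the behavior you usually want from hybrid search.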

The Single Most Important Up-Front Number

Before any of that: estimate storage, because it sets everything downstream (index type, RAM, whether you shard at all). The formula you'll reuse all day:

bytes_per_chunk ≈ (dims × 4) + raw_text + metadata + index_overhead
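The formula translates directly into a few lines of code. The following is a minimal sizing sketch that uses decimal units (1 KB = 1,000 bytes, 1 GB = 1e9 bytes). The index term is the least certain input, and is best measured on a sample build (for example, `pg_relation_size` of the index divided by its row count). For that reason, the sketch prints totals for two per-chunk planning values to show that the single-node vs. distributed conclusion does not hinge on the exact figure.

```python
"""Back-of-envelope storage sizing for a pgvector RAG corpus (planning sketch)."""

FLOAT4_BYTES = 4  # pgvector's `vector` type stores a 4-byte float per dimension


def vector_bytes(dims: int) -> int:
    # Embedding payload only; ignores the few bytes of per-value header.
    return dims * FLOAT4_BYTES


def bytes_per_chunk(dims: int, text_and_metadata: int, index_overhead: int) -> int:
    # bytes_per_chunk ≈ (dims × 4) + raw_text + metadata + index_overhead
    return vector_bytes(dims) + text_and_metadata + index_overhead


def corpus_gb(chunks: int, per_chunk_bytes: int) -> float:
    # Decimal GB, matching the round numbers used in the text.
    return chunks * per_chunk_bytes / 1e9


PROFILE_CHUNKS = {"A": 2_000_000, "B": 120_000_000}

if __name__ == "__main__":
    print(f"1536-dim vector payload: {vector_bytes(1536):,} bytes")
    # The index term is the least certain input, so show two planning values.
    for per_chunk_kb in (10, 15):
        for name, chunks in PROFILE_CHUNKS.items():
            gb = corpus_gb(chunks, per_chunk_kb * 1000)
            print(f"Profile {name} at {per_chunk_kb} KB/chunk: ~{gb:,.0f} GB")
```

Once you have a measured `index_overhead`, pass it to `bytes_per_chunk` instead of relying on a round planning value.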

For 1536-dim OpenAI embeddings, the vector alone is 1536 × 4 = 6,144 bytes (~6 KB). pgvector's HNSW index stores its own copy of each vector alongside the graph's neighbor lists, so it adds roughly as much again: ~6–8 KB/chunk at this dimensionality. Add ~1–2 KB for text, metadata, and row overhead. Call it ~15 KB/chunk all-in as a planning rule.

  • Profile A: 2M × 15 KB ≈ 30 GB, which fits in RAM on a single mid-size instance. You almost certainly do not need to shard.
  • Profile B: 120M × 15 KB ≈ 1.8 TB, far past what a single node can comfortably hold in RAM. You will shard (Citus) and/or partition by tenant.

That one calculation already tells you Profile A is a single-node-plus-replicas story and Profile B is a distributed story. Everything else is detail.

Key Takeaways
  • Four axes drive every design decision: isolation model, growth rate, SLO, and team capacity; size alone is not enough
  • Always estimate all-in storage first (~15 KB/chunk for 1536-dim + HNSW), since it determines index type, RAM, and whether you shard at all
  • Profile A (~30 GB) is a single-node-plus-replicas story; Profile B (~1.8 TB) is inherently a distributed (Citus) story
