The synthesis. Two worked case studies (a single-tenant internal knowledge assistant and a large multi-tenant SaaS at 100M+ chunks) walk the full design process end to end: schema, pgvector indexing, hybrid search, metadata filtering, scaling with replication and Citus, incremental embedding, and observability. It also covers capacity planning, cost modeling, and a launch playbook.
This is the final capstone of the PostgreSQL for AI track. Across the previous four days you learned the pieces in isolation: replication (Day 1), Citus sharding (Day 2), embedding pipelines (Day 3), and observability (Day 4). Today you assemble them into a complete, production Postgres-backed RAG platform — and you do it twice, against two deliberately different profiles, so you can feel where each decision flips.
Profile A — Internal Knowledge Assistant (single-tenant). A 2,000-person company wants a "chat with our docs" assistant over Confluence, Google Drive, and a Slack export. Roughly 2 million chunks total, growing ~5% a month. Peak load is ~20 QPS during business hours, near zero overnight. One trust boundary (everyone is an employee), modest latency expectations (~2s end-to-end is fine), and a small platform team.
Profile B — Multi-Tenant SaaS RAG (large). A document-intelligence SaaS serves 8,000 customer tenants, each with their own corpus. Aggregate 120 million chunks and climbing 10M/month. Peak 1,500 QPS, strict tenant isolation (a query must never leak across tenants), p95 latency SLO of 400ms for retrieval, and a 99.9% availability target.
These are not "small vs big" in size only — they differ in isolation model, growth rate, SLO, and team capacity. Those four axes drive almost every decision below.
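To make the growth-rate axis concrete, here is a minimal sketch that projects both corpora twelve months out. It assumes Profile A's ~5% a month compounds and Profile B's 10M/month is a flat addition; the function names and the 12-month horizon are illustrative choices, not part of the case studies.

```python
def project_compound(start_chunks: int, monthly_rate: float, months: int) -> int:
    """Chunk count after `months` of compound growth at `monthly_rate`."""
    return round(start_chunks * (1 + monthly_rate) ** months)


def project_linear(start_chunks: int, added_per_month: int, months: int) -> int:
    """Chunk count after `months` of adding a fixed number of chunks each month."""
    return start_chunks + added_per_month * months


# Profile A: ~2M chunks, ~5% a month (assumed to compound).
profile_a_12m = project_compound(2_000_000, 0.05, 12)

# Profile B: ~120M chunks, +10M a month (assumed to be a flat addition).
profile_b_12m = project_linear(120_000_000, 10_000_000, 12)

print(f"Profile A after 12 months: {profile_a_12m:,} chunks")
print(f"Profile B after 12 months: {profile_b_12m:,} chunks")
```

Under these assumptions Profile A stays in the low millions after a year, while Profile B doubles to 240 million chunks, so the gap between the two profiles widens rather than closes.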
For each profile we walk the same seven-step spine:
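1. Schema: the vector column, metadata columns, and the chunk/document relationship.
2. pgvector indexing, including maintenance_work_mem.
3. Hybrid search: vector similarity combined with lexical (tsvector/BM25-style) ranking.
4. Metadata filtering.
5. Scaling with replication and Citus.
6. Incremental embedding.
7. Observability.

Before any of that: estimate storage, because it sets everything downstream (index type, RAM, whether you shard at all). The formula you'll reuse all day: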
bytes_per_chunk ≈ (dims × 4) + raw_text + metadata + index_overhead
For 1536-dim OpenAI embeddings: the vector alone is 1536 × 4 = 6,144 bytes (~6 KB). HNSW adds roughly 2–4 KB/chunk of graph. Add ~1–2 KB for text + metadata + row overhead. Call it ~10 KB/chunk all-in as a planning rule.
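The per-chunk arithmetic above can be sketched as a small function. This is a rough illustration only: it treats 1 KB as 1,000 bytes, lumps text, metadata, and row overhead into one argument the way the estimate does, and uses hypothetical parameter names.

```python
def bytes_per_chunk(
    dims: int,
    text_and_metadata_bytes: int,
    index_overhead_bytes: int,
    bytes_per_dim: int = 4,  # float32 components, as in the formula above
) -> int:
    """Planning estimate: vector bytes + text/metadata/row bytes + index overhead."""
    return dims * bytes_per_dim + text_and_metadata_bytes + index_overhead_bytes


# Low end: ~1 KB text + metadata + row overhead, ~2 KB HNSW graph.
low = bytes_per_chunk(1536, 1_000, 2_000)

# High end: ~2 KB text + metadata + row overhead, ~4 KB HNSW graph.
high = bytes_per_chunk(1536, 2_000, 4_000)

print(f"vector alone: {1536 * 4:,} bytes")
print(f"per-chunk range: {low:,} to {high:,} bytes")
```

The range brackets the ~10 KB/chunk planning figure, which is why a single round number is good enough at this stage.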
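That one calculation already tells you Profile A is a single-node-plus-replicas story and Profile B is a distributed story: at ~10 KB/chunk, Profile A's 2 million chunks come to roughly 20 GB, while Profile B's 120 million chunks come to roughly 1.2 TB and grow by about 100 GB a month. Everything else is detail.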
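Applying the ~10 KB/chunk rule to the two profiles' chunk counts shows where that conclusion comes from. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes); the constant and function names are illustrative.

```python
PLANNING_BYTES_PER_CHUNK = 10_000  # the ~10 KB/chunk planning rule (decimal KB assumed)


def total_gb(chunks: int, per_chunk_bytes: int = PLANNING_BYTES_PER_CHUNK) -> float:
    """Total estimated footprint in decimal gigabytes."""
    return chunks * per_chunk_bytes / 1e9


profile_a_gb = total_gb(2_000_000)      # Profile A: ~2 million chunks
profile_b_gb = total_gb(120_000_000)    # Profile B: ~120 million chunks
profile_b_monthly_gb = total_gb(10_000_000)  # Profile B adds ~10M chunks a month

print(f"Profile A: ~{profile_a_gb:,.0f} GB")
print(f"Profile B: ~{profile_b_gb:,.0f} GB, growing ~{profile_b_monthly_gb:,.0f} GB/month")
```

Tens of gigabytes versus more than a terabyte is a difference of kind, not just degree, which is what separates the two architectures.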
You finished the final capstone of the PostgreSQL for AI track. You can now design, size, scale, and operate a production Postgres-backed RAG platform end-to-end.