The synthesis: two worked case studies (a small customer-support bot and a large legal research platform) walk the five-step design process end to end, followed by capacity planning, cost modeling, and a launch playbook. This is the final day of the Vector DB course family.
This is the capstone day. The previous four Advanced days covered topics in isolation: distributed search, fine-tuning, ColBERT, evaluation infrastructure. This day puts them together — how a real team decides what to build.
Production retrieval system design is a discipline. It works backward from requirements, makes scale-appropriate architecture choices, plans operations explicitly, and budgets cost up front. The teams that ship retrieval well follow the same process; the teams that struggle skip steps.
1. Requirements. What's the actual user-facing problem? What queries? What corpus? What latency budget? What quality bar? Without explicit answers, you're optimizing in a vacuum.
2. Sizing. How big is the corpus today? How fast is it growing? What's the QPS at launch and at peak? What's plausible growth over 18 months? Numbers, not guesses.
3. Architecture. Given the requirements and sizing, what retrieval stack? Distributed or single-node? Pure dense, hybrid, ColBERT? Fine-tuned or off-the-shelf embeddings? Each previous Advanced day gave you the rules — this step applies them.
4. Operations. Eval infrastructure, monitoring, alerts, deployment process, on-call. This is what makes the system stay good after launch.
5. Cost. Explicit estimates of monthly cost across embedding API, storage, compute, observability. This determines what's even feasible.
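The sizing step is mostly arithmetic, and it is worth writing down. The following is a minimal sketch, assuming float32 vectors (4 bytes per value) and compound annual growth. The function names are illustrative, and the result covers raw vectors only, not index overhead or metadata.

```python
def raw_vector_bytes(num_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the embeddings alone (assumes one vector per document)."""
    return num_docs * dim * bytes_per_value


def project_corpus(num_docs: int, annual_growth: float, months: int) -> int:
    """Project corpus size after `months`, assuming compound annual growth.

    annual_growth=3.0 means the corpus triples every 12 months.
    """
    return round(num_docs * annual_growth ** (months / 12))


if __name__ == "__main__":
    docs_today = 50_000   # illustrative corpus size
    dim = 1536            # illustrative embedding dimension

    today = raw_vector_bytes(docs_today, dim)
    print(f"raw vectors today: {today / 1e6:.1f} MB")

    docs_later = project_corpus(docs_today, annual_growth=3.0, months=18)
    later = raw_vector_bytes(docs_later, dim)
    print(f"docs in 18 months: {docs_later:,} ({later / 1e6:.1f} MB raw)")
```

Running the projection alongside today's number matters: an architecture that fits comfortably at launch may not fit at the 18-month mark.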
A team that does these five steps in writing produces a system that ships. A team that skips straight to "let's use Pinecone" tends to produce one that can cost five times what it should and still underperform.
Steps 1 (requirements) and 5 (cost) are the most frequently skipped. They're harder than the technical steps and they often surface uncomfortable answers (the requirements aren't fully understood, the cost is much higher than expected).
The discipline that distinguishes mature teams: write the requirements doc before picking the vector DB vendor. Estimate the cost before committing to an architecture. Discover bad fits on paper, not in production.
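Estimating cost on paper can be as simple as summing the four line items named in step 5. The sketch below assumes a per-million-token embedding price. Every figure in it is a hypothetical placeholder, not a vendor quote, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """Monthly cost in USD, split by the categories from step 5."""
    embedding_api: float
    storage: float
    compute: float
    observability: float

    def total(self) -> float:
        return self.embedding_api + self.storage + self.compute + self.observability


def embedding_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Embedding API spend, assuming simple per-token pricing."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Placeholder inputs: replace with your own measured volumes and quoted prices.
    estimate = MonthlyCost(
        embedding_api=embedding_cost(40_000_000, usd_per_million_tokens=0.5),
        storage=30.0,
        compute=140.0,
        observability=10.0,
    )
    print(f"estimated monthly cost: ${estimate.total():.2f}")
```

Keeping the categories separate makes it easier to see which line item dominates, and therefore which architecture choice is driving the total.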
Real failure modes from teams that didn't do this:
These aren't hypothetical. They're the default outcome when teams skip the design process and start coding.
A retrieval system design doc looks something like:
Project: Customer support Q&A bot
Requirements: 50K docs, 100 QPS, p99 < 2s end-to-end, 90% recall@5
Sizing: 50K docs × 1536-dim → 300MB; 100 QPS → single-node OK
Architecture: pgvector + tsvector hybrid + cross-encoder rerank
Operations: 200-query gold set, CI eval, daily drift check, single on-call
Cost: ~$200/month
Risks: corpus expected to grow 3x in year 1 → may need migration to dedicated vector DB
Four sentences per section, signed off by the team before the first line of code. The doc is small, but it forces every decision to be conscious.
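One way to keep the sign-off honest is to check mechanically that no section was left blank. This is a minimal sketch, assuming the doc is plain text with one `Name: content` line per section. The section names come from the sample doc above; the parsing convention and function names are assumptions.

```python
REQUIRED_SECTIONS = (
    "Project", "Requirements", "Sizing", "Architecture",
    "Operations", "Cost", "Risks",
)


def parse_design_doc(text: str) -> dict[str, str]:
    """Map each recognized section name to its content (split on the first colon)."""
    sections: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in REQUIRED_SECTIONS:
            sections[key.strip()] = value.strip()
    return sections


def missing_sections(text: str) -> list[str]:
    """Return required sections that are absent or empty, in canonical order."""
    sections = parse_design_doc(text)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


if __name__ == "__main__":
    draft = """Project: Customer support Q&A bot
Requirements: 50K docs, 100 QPS, p99 < 2s end-to-end, 90% recall@5
Architecture: pgvector + tsvector hybrid + cross-encoder rerank
Cost:
"""
    print("missing:", missing_sections(draft))
```

A check like this says nothing about whether the answers are good; it only flags the skipped steps, which is where the text says most projects go wrong.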
The next two sections walk through two concrete case studies, one small project and one large. Sections 4 and 5 cover the cost modeling and launch operations that apply to both. By the end, you'll have seen the design process applied end-to-end on systems at very different scales.
The exercises ask you to do the same: size a system, model its cost, decide its architecture, and validate the resulting design against requirements.