Capstone: Production System Design

The System Design Process

This is the capstone day. The previous four Advanced days covered topics in isolation: distributed search, fine-tuning, ColBERT, and evaluation infrastructure. This day puts them together and shows how a real team decides what to build.

Production retrieval system design is a discipline. It works backward from requirements, makes scale-appropriate architecture choices, plans operations explicitly, and budgets cost up front. The teams that ship retrieval well follow the same process; the teams that struggle skip steps.

The Five Steps

1. Requirements. What's the actual user-facing problem? What queries? What corpus? What latency budget? What quality bar? Without explicit answers, you're optimizing in a vacuum.

2. Sizing. How big is the corpus today? How fast is it growing? What's the QPS at launch and at peak? What's plausible growth over 18 months? Numbers, not guesses.

3. Architecture. Given the requirements and sizing, what retrieval stack? Distributed or single-node? Pure dense, hybrid, ColBERT? Fine-tuned or off-the-shelf embeddings? Each previous Advanced day gave you the rules — this step applies them.

4. Operations. Eval infrastructure, monitoring, alerts, deployment process, on-call. This is what makes the system stay good after launch.

5. Cost. Explicit estimates of monthly cost across embedding API, storage, compute, observability. This determines what's even feasible.

A team that does these five steps in writing produces a system that ships. A team that skips to "let's use Pinecone" risks shipping one that costs several times what it should and underperforms.
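Step 2 is mostly arithmetic, so it is easy to script. The following is a minimal sketch, not a complete sizing tool. It assumes one float32 vector per document, ignores index overhead and chunking, and models growth as a constant compound monthly rate. The function names and the 10% growth rate are illustrative assumptions.

```python
import math


def raw_vector_bytes(num_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw embedding storage in bytes.

    Assumes one vector per document and float32 values (4 bytes each).
    Index structures and metadata are NOT included.
    """
    return num_docs * dim * bytes_per_value


def projected_docs(docs_today: int, monthly_growth: float, months: int = 18) -> int:
    """Corpus size after compound monthly growth (0.05 means 5% per month)."""
    return math.ceil(docs_today * (1 + monthly_growth) ** months)


# Illustrative inputs: replace with measured numbers, not guesses.
docs_today = 50_000
dim = 1536
monthly_growth = 0.10  # assumption for illustration only

today_mb = raw_vector_bytes(docs_today, dim) / 1e6
future_docs = projected_docs(docs_today, monthly_growth)
future_mb = raw_vector_bytes(future_docs, dim) / 1e6

print(f"today: {docs_today:,} docs, {today_mb:.1f} MB raw vectors")
print(f"in 18 months: {future_docs:,} docs, {future_mb:.1f} MB raw vectors")
```

The point of writing it down is that the 18-month number, not today's number, is the one the architecture choice in step 3 has to survive.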

Why Most Teams Skip Steps

Steps 1 (requirements) and 5 (cost) are the most frequently skipped. They're harder than the technical steps and they often surface uncomfortable answers (the requirements aren't fully understood, the cost is much higher than expected).

The discipline that distinguishes mature teams: write the requirements doc before picking the vector DB vendor. Estimate the cost before committing to an architecture. Discover bad fits on paper, not in production.
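Estimating cost on paper can be as simple as a few multiplications over the four categories named in step 5. The sketch below is one possible shape for that estimate. Every price in it is a placeholder assumption, not a real vendor quote, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCostInputs:
    embedding_tokens: float          # tokens embedded per month (indexing + queries)
    price_per_million_tokens: float  # PLACEHOLDER: take from your vendor's price sheet
    storage_gb: float                # vectors + index + metadata
    price_per_gb_month: float        # PLACEHOLDER
    compute_nodes: int               # retrieval / rerank nodes
    price_per_node_month: float      # PLACEHOLDER
    observability_flat: float        # PLACEHOLDER: logging, metrics, tracing


def monthly_cost(c: MonthlyCostInputs) -> dict[str, float]:
    """Break the monthly bill into the four categories, plus a total."""
    breakdown = {
        "embedding_api": c.embedding_tokens / 1e6 * c.price_per_million_tokens,
        "storage": c.storage_gb * c.price_per_gb_month,
        "compute": c.compute_nodes * c.price_per_node_month,
        "observability": c.observability_flat,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Made-up inputs purely to show the calculation.
example = MonthlyCostInputs(
    embedding_tokens=2_000_000,
    price_per_million_tokens=0.5,
    storage_gb=10,
    price_per_gb_month=0.25,
    compute_nodes=2,
    price_per_node_month=100.0,
    observability_flat=20.0,
)

for item, dollars in monthly_cost(example).items():
    print(f"{item:>14}: ${dollars:,.2f}")
```

Running the same function against the 18-month sizing projection, not just launch-day numbers, is what surfaces a bad fit before the architecture is committed.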

The Cost of Skipping

Real failure modes from teams that didn't do this:

"We chose Pinecone because it was easy." Six months later the team discovers that at 100M vectors the bill is $25K/month and that migrating to self-hosted Qdrant takes a quarter of engineering time.
"We started with single-vector retrieval." Six months later recall plateaus at 75% on a domain where 90% is the user expectation; hybrid search would have been free to add at design time but costs a re-index now.
"We didn't build eval infrastructure." Three retrieval changes silently degraded quality 8 points over a year; nobody knows which change did it.
"We figured we'd add monitoring later." Outage at 3 AM, no dashboards, two hours to diagnose what should have been a 10-minute fix.

These aren't hypothetical. They're the default outcome when teams skip the design process and start coding.
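The silent-degradation failure above is the one a small amount of code prevents. The following is a minimal sketch of a recall@k regression gate that could run in CI against a gold set. The function names, the data shapes (query ID mapped to ranked document IDs), and the one-point tolerance are assumptions for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("each gold query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    results: dict[str, list[str]], gold: dict[str, set[str]], k: int = 5
) -> float:
    """Average recall@k over every query in the gold set."""
    scores = [recall_at_k(results.get(q, []), rel, k) for q, rel in gold.items()]
    return sum(scores) / len(scores)


def passes_gate(current: float, baseline: float, max_drop: float = 0.01) -> bool:
    """Fail the build if recall drops more than max_drop below the baseline."""
    return current >= baseline - max_drop


# Tiny made-up gold set and retrieval output, just to show the flow.
gold = {"q1": {"a", "b"}, "q2": {"c"}}
results = {"q1": ["a", "x", "b"], "q2": ["y", "z"]}

score = mean_recall_at_k(results, gold, k=2)
print(f"recall@2 = {score:.2f}, gate passed: {passes_gate(score, baseline=0.90)}")
```

With a check like this on every retrieval change, a regression is attributed to the change that caused it instead of being discovered a year later.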

The Output

A retrieval system design doc looks something like:

Project: Customer support Q&A bot
Requirements: 50K docs, 100 QPS, p99 < 2s end-to-end, 90% recall@5
Sizing: 50K docs × 1536-dim → 300MB; 100 QPS → single-node OK
Architecture: pgvector + tsvector hybrid + cross-encoder rerank
Operations: 200-query gold set, CI eval, daily drift check, single on-call
Cost: ~$200/month
Risks: corpus expected to grow 3x in year 1 → may need migration to dedicated vector DB

Four sentences per section, signed off by the team before the first line of code. The doc is small but it forces every decision to be conscious.
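One way to make the sign-off mechanical is to treat the doc as a record whose sections must all be filled in before work starts. The sketch below is a hypothetical helper, not part of any tool the course prescribes. The section names mirror the example doc above.

```python
from dataclasses import dataclass, fields


@dataclass
class DesignDoc:
    project: str = ""
    requirements: str = ""
    sizing: str = ""
    architecture: str = ""
    operations: str = ""
    cost: str = ""
    risks: str = ""


def missing_sections(doc: DesignDoc) -> list[str]:
    """Names of sections that are still empty or whitespace-only."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# A draft where cost and risks have not been written yet.
draft = DesignDoc(
    project="Customer support Q&A bot",
    requirements="50K docs, 100 QPS, p99 < 2s end-to-end, 90% recall@5",
    sizing="50K docs x 1536-dim -> 300MB; 100 QPS -> single-node OK",
    architecture="pgvector + tsvector hybrid + cross-encoder rerank",
    operations="200-query gold set, CI eval, daily drift check, single on-call",
)

print(missing_sections(draft))  # → ['cost', 'risks']
```

A check this simple cannot judge whether a section is good, only whether it exists. That is still enough to catch the two steps teams skip most often.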

What This Day Will Do

The next two sections walk through two concrete case studies, one small project and one large. Sections 4 and 5 cover the cost modeling and launch operations that apply to both. By the end, you'll have seen the design process applied end-to-end on systems at very different scales.

The exercises ask you to do the same: size a system, model its cost, decide its architecture, and validate the resulting design against requirements.

Key Takeaways

Production retrieval design is a five-step discipline: requirements → sizing → architecture → operations → cost — skipping steps produces predictable failures
The most-skipped steps are requirements and cost — both surface uncomfortable answers and both are essential to ship a system that fits its actual problem
A design doc forces every decision to be conscious; teams that write one before coding ship systems that work, teams that skip to vendor selection often ship ones that don't
