The intermediate capstone. Assemble the previous four days into one production RAG service: the end-to-end request path from conversational memory → retrieve → rerank → generate, wrapped in the reliability, observability, and evaluation discipline that turns a working pipeline into a service you can ship and operate.
For four days you built techniques in isolation: reranking (Day 1), agentic tool use (Day 2), conversational memory (Day 3), and evaluation (Day 4). A production service is what you get when you wire them into one system that takes a user's message and returns a grounded, cited answer — reliably, observably, and within a latency and cost budget.
Every RAG service splits into an offline half and an online half, exactly as in the beginner course — but each stage is now the production-grade version you spent the week learning.
┌─────────────── OFFLINE ───────────────┐
sources ─► ingest ─► chunk ─► embed ─► vector store + metadata
└────────────────────────────────────────┘
┌─────────────── ONLINE ────────────────┐
user msg ─► contextualize (memory, Day 3)
└─► retrieve top-N (bi-encoder)
└─► rerank to top-K (cross-encoder, Day 1)
└─► [agentic loop? tool use, Day 2]
└─► assemble prompt + cite
└─► generate (LLM)
└─► answer + sources
└────────────────────────────────────────┘
│
evaluation (Day 4) ◄── sample traffic, gold sets, online metrics
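The split in the diagram can be made concrete with a toy example. The following is a minimal sketch, not a production implementation: a bag-of-words `Counter` stands in for a real embedding model, a plain Python list stands in for the vector store, and every name here (`chunk`, `embed`, `build_index`, `retrieve`, the sample file names) is hypothetical. The point is only that `build_index` runs once, ahead of time, while `retrieve` runs on every request.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, str, Counter]  # (source id, chunk text, toy "embedding")


def chunk(text: str, size: int = 8) -> List[str]:
    """Split text into fixed-size word windows (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# OFFLINE: runs once per corpus update, not per request.
def build_index(sources: Dict[str, str]) -> List[Row]:
    index: List[Row] = []
    for source_id, text in sources.items():
        for piece in chunk(text):
            index.append((source_id, piece, embed(piece)))
    return index


# ONLINE: runs per request against the prebuilt index.
def retrieve(index: List[Row], query: str, top_n: int = 2) -> List[Tuple[str, str]]:
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(source_id, piece) for source_id, piece, _ in ranked[:top_n]]


index = build_index({
    "rerank.md": "a cross-encoder reranker rescores each candidate against the query",
    "memory.md": "conversational memory rewrites a follow-up into a standalone query",
})
hits = retrieve(index, "how does the reranker score a candidate", top_n=1)
print(hits[0][0])
```

Keeping the source id next to each chunk is what later makes the "answer + sources" step possible: the online path can cite where each retrieved chunk came from.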
Think of each online stage as a replaceable component with a typed contract:
| Component | In → Out | Introduced in |
|---|---|---|
| Contextualizer | (message, history) → standalone query | Day 3 |
| Retriever | query → top-N candidates | Beginner course (bi-encoder) |
| Reranker | (query, candidates) → top-K + scores | Day 1 |
| Agent loop (optional) | query → tool calls → observations | Day 2 |
| Generator | (prompt, context) → answer + citations | Beginner course |
| Evaluator | (query, context, answer) → scores | Day 4 |
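One way to express a "typed contract" in Python is with `typing.Protocol`. The sketch below covers two rows of the table. The `Candidate` dataclass, the method names, and `KeywordOverlapReranker` are illustrative assumptions, not the course's actual interfaces; the overlap scorer is a toy stand-in for a real cross-encoder.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    text: str
    score: float = 0.0


class Retriever(Protocol):
    """Contract: query -> top-N candidates."""

    def retrieve(self, query: str, top_n: int) -> List[Candidate]: ...


class Reranker(Protocol):
    """Contract: (query, candidates) -> top-K candidates with scores."""

    def rerank(
        self, query: str, candidates: Sequence[Candidate], top_k: int
    ) -> List[Candidate]: ...


class KeywordOverlapReranker:
    """Toy reranker: scores each candidate by tokens shared with the query."""

    def rerank(
        self, query: str, candidates: Sequence[Candidate], top_k: int
    ) -> List[Candidate]:
        q_tokens = set(query.lower().split())
        scored = [
            Candidate(c.doc_id, c.text, float(len(q_tokens & set(c.text.lower().split()))))
            for c in candidates
        ]
        # sorted() is stable, so ties keep their retrieval order.
        return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]


candidates = [
    Candidate("a", "reranking with a cross-encoder"),
    Candidate("b", "unrelated text about cooking"),
    Candidate("c", "cross-encoder reranking improves precision"),
]
# Any object with a matching rerank() satisfies the Reranker contract.
reranker: Reranker = KeywordOverlapReranker()
top = reranker.rerank("cross-encoder reranking", candidates, top_k=2)
print([c.doc_id for c in top])
```

Because `Protocol` uses structural typing, `KeywordOverlapReranker` does not inherit from `Reranker`; a static type checker accepts it because the method signature matches.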
Designing the service as components — not one monolithic function — is what lets you swap a reranker, add an agent step, or A/B a new prompt without rewriting the whole path. That decomposition is the single most important architectural decision in this lesson.
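To illustrate the swap, here is a minimal sketch of a pipeline assembled from plain callables. `RagPipeline`, the stage functions, and the sample documents are all hypothetical, and each stage is a stub (no real retriever, cross-encoder, or LLM is called). The optional agent loop and the evaluator are left out for brevity.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass
class RagPipeline:
    contextualize: Callable[[str, List[str]], str]
    retrieve: Callable[[str], List[str]]
    rerank: Callable[[str, List[str]], List[str]]
    generate: Callable[[str, List[str]], str]

    def answer(self, message: str, history: List[str]) -> str:
        query = self.contextualize(message, history)
        candidates = self.retrieve(query)
        context = self.rerank(query, candidates)
        return self.generate(query, context)


DOCS = [
    "rerankers reorder candidates",
    "memory rewrites follow-up questions",
    "agents call tools",
]


def contextualize(message: str, history: List[str]) -> str:
    # Stub: prepend the last turn so a follow-up carries its context.
    return f"{history[-1]} {message}" if history else message


def retrieve(query: str) -> List[str]:
    # Stub: return every document as a candidate.
    return list(DOCS)


def rerank_passthrough(query: str, candidates: List[str]) -> List[str]:
    return candidates[:2]


def rerank_by_overlap(query: str, candidates: List[str]) -> List[str]:
    q = set(query.lower().split())
    ranked = sorted(
        candidates, key=lambda d: len(q & set(d.lower().split())), reverse=True
    )
    return ranked[:2]


def generate(query: str, context: List[str]) -> str:
    # Stub: a real generator would prompt an LLM with the context.
    return f"answer based on: {context[0]}"


baseline = RagPipeline(contextualize, retrieve, rerank_passthrough, generate)
# Swap only the reranker; every other stage is reused unchanged.
variant = replace(baseline, rerank=rerank_by_overlap)

question = "how do agents call tools"
print(baseline.answer(question, []))
print(variant.answer(question, []))
```

The same pattern supports an A/B test: route a fraction of traffic to `variant`, keep `baseline` for the rest, and compare evaluator scores for the two.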