LLM Integration — Intermediate
Take the beginner RAG pipeline to production: cross-encoder reranking, agentic RAG with tool use, conversational memory, RAG evaluation frameworks, and a production RAG service capstone.
- 1
Reranking: Retrieve-then-Rerank with Cross-Encoders
The highest-ROI quality lever in RAG after chunking. Why top-K vector search is coarse, how a cross-encoder differs from the bi-encoder that powers retrieval, the retrieve-then-rerank pattern and its latency budget, the production rerankers (Cohere, BGE, Voyage), and how to tune over-fetch depth and measure the lift.
50 min · Cross-Encoders · Reranking · Retrieve-then-Rerank
- 2
Agentic RAG: Tool Use, Function Calling & ReAct
Hand retrieval to the model itself. When a fixed retrieve-then-generate pipeline isn't enough, an agentic loop lets the LLM decide when and what to search. Function calling and tool schemas, the ReAct reason-act-observe loop, designing the retriever as a tool, multi-index routing, and the orchestration discipline — step limits, cost, and failure modes — that keeps an agent from looping forever.
55 min · Tool Use · Function Calling · ReAct
- 3
Conversational RAG: Multi-Turn Memory & State
Real users ask follow-ups. 'What about the enterprise tier?' means nothing without the previous turn. Why naive RAG breaks on multi-turn, history-aware query contextualization (condense the question before you retrieve), memory types (buffer, windowed, summary), managing the token budget, and the session, persistence, and privacy concerns of holding conversation state.
50 min · Conversational Memory · Query Contextualization · Session State
- 4
Evaluating RAG: RAGAS, TruLens & LLM-as-Judge
You can't improve what you can't measure. RAG fails on two sides, retrieval and generation, and needs metrics for both. The core RAGAS metrics (faithfulness, answer relevancy, context precision, context recall), LLM-as-judge scoring and its biases, the frameworks that package these (RAGAS, and TruLens with its RAG triad), and how to build an eval harness with regression gates in CI.
55 min · RAGAS · LLM-as-Judge · Faithfulness
- 5
Capstone: A Production RAG Service
The intermediate capstone: assemble the four preceding lessons into one production RAG service. The end-to-end request path (retrieve → rerank → conversational memory → generate), reliability patterns (caching, timeouts, fallbacks, graceful degradation), observability and online evaluation, and a worked case study with a capacity, cost, and launch playbook.
60 min · System Design · Capstone · Production RAG
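
To make the capstone's request path concrete, here is a minimal sketch in plain Python, not the course's implementation. Everything in it is an assumption made for illustration: `CORPUS`, `retrieve`, `rerank`, `generate`, and `answer` are hypothetical names, token overlap stands in for vector search, Jaccard similarity stands in for a cross-encoder, and a format string stands in for the LLM call. It shows over-fetching before reranking, a windowed conversation buffer used to contextualize a follow-up before retrieval, and graceful degradation when the reranker fails.

```python
from collections import deque
from typing import Callable, Deque, List, Set, Tuple

# Hypothetical toy corpus; a real service would query a vector index.
CORPUS = [
    "The enterprise tier includes SSO and audit logs",
    "The starter tier includes email support",
    "Pricing is billed monthly per seat",
    "Audit logs are retained for one year",
]

Reranker = Callable[[str, List[str], int], List[str]]


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Coarse first stage: token overlap as a stand-in for top-K vector search."""
    q = _tokens(query)
    ranked = sorted(corpus, key=lambda d: len(q & _tokens(d)), reverse=True)
    return ranked[:k]


def rerank(query: str, docs: List[str], top_n: int) -> List[str]:
    """Second stage: Jaccard similarity as a stand-in for a cross-encoder score."""
    q = _tokens(query)

    def score(doc: str) -> float:
        union = q | _tokens(doc)
        return len(q & _tokens(doc)) / len(union) if union else 0.0

    return sorted(docs, key=score, reverse=True)[:top_n]


def generate(question: str, context: List[str]) -> str:
    """Stand-in for the LLM call; only reports how much context it was given."""
    return f"Q: {question} | grounded on {len(context)} passage(s)"


def answer(
    question: str,
    history: Deque[Tuple[str, str]],
    corpus: List[str],
    reranker: Reranker = rerank,
    over_fetch: int = 3,
    top_n: int = 2,
) -> Tuple[str, List[str]]:
    # Naive contextualization: prepend the previous user question so a
    # follow-up such as "what about the starter tier" carries its topic.
    standalone = f"{history[-1][0]} {question}" if history else question

    # Over-fetch more candidates than the prompt needs, then narrow them.
    candidates = retrieve(standalone, corpus, over_fetch)
    try:
        context = reranker(standalone, candidates, top_n)
    except Exception:
        # Graceful degradation: fall back to first-stage order.
        context = candidates[:top_n]

    reply = generate(question, context)
    history.append((question, reply))  # deque(maxlen=...) drops the oldest turn
    return reply, context


# Usage: one session with a three-turn window.
session: Deque[Tuple[str, str]] = deque(maxlen=3)
first_reply, first_context = answer("what does the enterprise tier include", session, CORPUS)
```

The fallback branch is the design point: a reranker timeout lowers answer quality rather than failing the request, which is the kind of degradation the reliability patterns above describe. In a real service the `except` clause would catch specific timeout and transport errors and record a metric rather than catching every exception.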