
Building a Q&A Bot

This capstone synthesizes Days 1-4 into a complete production Q&A bot: the ingestion pipeline, the full query chain, citation-aware answers with refusal, and the gold-set evaluation plus shadow/A/B rollout discipline that turns a RAG demo into a RAG product.

Section 1 of 5 · 9 min

System Design Overview

This is the capstone. Days 1-4 gave you the building blocks: the RAG pipeline, prompt discipline, retrieval chains, and vector store integration. Day 5 puts them together into a complete production Q&A bot, the kind of system that lives in a real product, serves real users, and survives real outages.

**The bot has four phases:**

- **Ingestion (offline)** turns source documents into searchable vectors. Runs on a schedule or whenever sources change. Slow operations belong here: chunking decisions, embedding API calls, batched upserts.
- **Query (online)** turns a user's question into context-grounded retrieval. The latency-sensitive path. Cache aggressively, time out fast, fall back gracefully.
- **Generation (online)** turns retrieved chunks into a final answer. Streams to the user. Manages citations, refusal, length.
- **Operation (continuous)** measures, monitors, and improves. Eval against a gold set. Watch cost per query. Detect drift. Roll out changes carefully.

**The component diagram of a production Q&A bot:**

```
sources ──→ parser ──→ chunker ──→ embedder ──→ vector store
                                                    ↑
 (ingestion runs offline, on a schedule or trigger) │
                                                    │
user query ─→ classifier ─→ rewrite ─→ search ──────┘
                  │                      │
                  └─→ direct (cheap)     └─→ rerank
                                               │
                                               ↓
                                    generator with citations ──→ user
                                               │
                                               ↓
                                    observability (logs, costs, eval)
```

**Decision matrix for which Day 1-4 features apply:**

- **Day 1 (RAG anatomy)**: every system starts here. Single-pass is the baseline; layer the rest on top.
- **Day 2 (prompts)**: every system needs version-controlled prompts and structured outputs. Non-negotiable.
- **Day 3 (chains)**: add rewriting when you see conversational queries fail; add HyDE when corpus is heavily declarative; add multi-step only for genuine multi-hop needs.
- **Day 4 (vector store)**: pick a store based on team familiarity first, performance second. Thin internal interface. Tenant-isolated filter that's impossible to skip.

**The capstone discipline:** start with the smallest thing that works. Single-pass RAG, simple prompt, pgvector or whatever's nearest, in-process embedding cache. Get it serving real users. *Then* measure where it fails, and add the chain step or operational hygiene that fixes the failure you actually have. Adding all of Day 3's chain steps at once costs you debugging surface, latency, and money for problems you may not have. A working simple system always beats a half-built sophisticated one.
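The Day 4 rule, a thin internal interface with a tenant filter that cannot be skipped, can be sketched as a small wrapper around whatever store you pick. This is a minimal illustration, not a specific vector database API: `createTenantStore`, `createMemoryStore`, and the `search(vector, k, filter)` shape are assumptions chosen to match the call style used in this lesson.

```javascript
// Sketch: bind the tenant once, so no caller can forget or override it.
// `rawStore.search(vector, k, filter)` is an assumed interface.
function createTenantStore(rawStore, tenantId) {
  if (typeof tenantId !== "string" || tenantId.length === 0) {
    throw new Error("tenantId is required");
  }
  return Object.freeze({
    // Callers may narrow the search with extra filters, but tenant_id is
    // spread last, so it always wins over anything the caller passes in.
    async search(vector, k, extraFilter = {}) {
      return rawStore.search(vector, k, {
        ...extraFilter,
        tenant_id: tenantId,
      });
    },
  });
}

// Hypothetical in-memory store, used only to exercise the wrapper.
// It applies exact-match filters and ignores vector similarity entirely.
function createMemoryStore(rows) {
  return {
    async search(_vector, k, filter) {
      return rows
        .filter((row) =>
          Object.entries(filter).every(([key, value]) => row[key] === value)
        )
        .slice(0, k);
    },
  };
}
```

With a wrapper like this built once per request, application code only ever sees the tenant-bound `search`, so isolation no longer depends on every call site remembering to pass the filter.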
```javascript
// The minimum viable Q&A bot: Days 1-4 collapsed
async function answerQuestion(query, ctx) {
  // 1. Tenant-isolated retrieval (Day 4)
  const queryVec = await embed(query);
  const chunks = await ctx.store.search(queryVec, 5, {
    tenant_id: ctx.tenantId, // never bypassed
  });

  // 2. Build grounded prompt (Day 2)
  const context = chunks.map((c, i) =>
    `[${i + 1}] ${c.text}`
  ).join("\n\n");

  // 3. Generate with citations (Day 2 + 5)
  const answer = await ctx.llm.generate({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content:
        "Answer using ONLY the provided context. " +
        "Cite sources as [N]. If context doesn't contain the answer, say " +
        '"I don\'t have enough information to answer that."'
      },
      { role: "user", content:
        `Context:\n${context}\n\nQuestion: ${query}`
      },
    ],
  });

  return { answer, sources: chunks.map(c => c.id) };
}

// This is ~30 lines and works. Day 3's chain steps come later
// when you measure a problem they'd solve.
```
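The minimal bot returns every retrieved chunk ID as a source, whether or not the answer cited it. One possible next step is to map the `[N]` markers in the answer back to the chunks that were numbered in the prompt. The sketch below assumes the generated answer is available as a plain string and that the model echoes the refusal sentence verbatim; `resolveCitations` and its return shape are hypothetical, not part of the lesson's code.

```javascript
const REFUSAL = "I don't have enough information to answer that.";

// Sketch: map [N] markers in a generated answer back to retrieved chunks.
// Assumes chunks were numbered from 1 in the prompt, in array order.
function resolveCitations(answer, chunks) {
  const cited = new Set();
  const invalid = new Set();

  for (const match of answer.matchAll(/\[(\d+)\]/g)) {
    const n = Number(match[1]);
    if (n >= 1 && n <= chunks.length) {
      cited.add(n);
    } else {
      invalid.add(n); // the model cited a source that was never provided
    }
  }

  const byNumber = (a, b) => a - b;
  const refused = answer.includes(REFUSAL);

  return {
    refused,
    // Only the chunks the answer actually cites, in prompt order.
    sources: [...cited].sort(byNumber).map((n) => chunks[n - 1].id),
    invalidCitations: [...invalid].sort(byNumber),
    // Neither a refusal nor cited: a candidate for logging and review.
    uncited: !refused && cited.size === 0,
  };
}
```

Logging `invalidCitations` and `uncited` alongside each response is one way to measure grounding failures before deciding whether any extra chain step is justified.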

Key Takeaways

  • Four phases: ingestion (offline), query (online), generation (online), operation (continuous)
  • Start with single-pass + clean prompt; layer chain steps when failures justify them
  • Tenant isolation, stable IDs, and observability are non-negotiable from day one
  • A 30-line minimum viable Q&A bot beats a half-built sophisticated one
  • Measure failures in production before adding sophistication


Course Stats

Estimated Time
45 min
Lessons
5 sections
