A fixed retrieve-then-generate pipeline answers one question with one search. Some questions need several searches, a choice between indexes, or a calculation first. Agentic RAG hands that control to the LLM: using function calling and the ReAct loop, the model decides when to retrieve, what to retrieve, and when it finally has enough to answer.
The pipeline you've built so far is static: embed the question, retrieve once, rerank, generate. One question in, one search, one answer out. For a huge fraction of questions that's exactly right — and you should not reach for anything fancier when it works.
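But some questions break the single-shot assumption: they need several searches, a choice between indexes, or a calculation before any answer is possible.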
A static pipeline answers all of these by doing the one thing it knows how to do — retrieve once — and then hoping. That's where quality quietly falls apart.
Agentic RAG stops treating the LLM as the final step and starts treating it as the controller. Instead of you hard-coding "always retrieve, then generate," you give the model a set of tools (search this index, search that index, do math, look up a record) and let it decide when to retrieve, what to retrieve, and when it has enough to answer.
STATIC:   question ─► retrieve ─► generate ─► answer

AGENTIC:  question ─► [ LLM decides ] ─► tool? ─► observe ─┐
                          │     ▲                          │
                          │     └──── loop until done ◄────┘
                          ▼
                       answer
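To make the loop in the diagram concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tool names `search_docs` and `calculator`, the toy corpus, and above all `scripted_controller`, which stands in for the LLM's decision step by replaying a fixed plan. In a real system that function would be a function-calling request to your model; the surrounding loop is the part worth studying.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools. Each takes one string argument and returns an observation.
def search_docs(query: str) -> str:
    corpus = {
        "refund window": "Refunds are accepted within 30 days.",
        "shipping time": "Standard shipping takes 5 days.",
    }
    return corpus.get(query, "no results")

def calculator(expression: str) -> str:
    # Toy calculator: only handles "a + b" with integers.
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS: Dict[str, Callable[[str], str]] = {
    "search_docs": search_docs,
    "calculator": calculator,
}

Step = Tuple[str, str]  # (action, argument); action "answer" ends the loop

def scripted_controller(question: str, observations: List[str]) -> Step:
    """Stand-in for the LLM: replays a fixed plan, then answers.

    A real controller would send the question and observations to a model
    and parse the tool call (or final answer) it returns.
    """
    plan = [
        ("search_docs", "refund window"),
        ("search_docs", "shipping time"),
        ("calculator", "30 + 5"),
    ]
    if len(observations) < len(plan):
        return plan[len(observations)]
    return ("answer", f"Total: {observations[-1]} days")

def run_agent(
    question: str,
    decide: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    observations: List[str] = []
    for _ in range(max_steps):  # hard bound: the loop cannot spin forever
        action, argument = decide(question, observations)
        if action == "answer":
            return argument, observations
        tool = tools.get(action)
        # Feed an error back as an observation instead of crashing.
        observations.append(tool(argument) if tool else f"unknown tool: {action}")
    return "stopped: step limit reached", observations
```

Calling `run_agent("How many days in total?", scripted_controller, TOOLS)` performs two searches and one calculation before answering. Lowering `max_steps` to 2 cuts the run short and returns the step-limit message instead.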
Agentic RAG is strictly more powerful and strictly more expensive. Each decision is another LLM call, each search adds latency, and a loop that doesn't terminate cleanly can run up cost or spin forever. The skill this lesson teaches is not "always go agentic" — it's knowing the mechanism well enough to apply it only where the question shape demands it, and to bound it when you do.
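A rough back-of-the-envelope model shows how that cost grows. The sketch below assumes that every tool step is a search, that calls run one after another, and that each agentic step costs one decision call plus one final call to write the answer. The latency figures (1.0 s per LLM call, 0.25 s per search) are placeholders, not measurements.

```python
from dataclasses import dataclass

@dataclass
class Cost:
    llm_calls: int
    searches: int
    latency_s: float

# Placeholder latencies, chosen only for illustration.
LLM_LATENCY_S = 1.0
SEARCH_LATENCY_S = 0.25

def static_cost() -> Cost:
    # Static pipeline: one search, then one generation call.
    return Cost(1, 1, LLM_LATENCY_S + SEARCH_LATENCY_S)

def agentic_cost(tool_steps: int) -> Cost:
    # One decision call per tool step, plus one final call that answers.
    llm_calls = tool_steps + 1
    latency = llm_calls * LLM_LATENCY_S + tool_steps * SEARCH_LATENCY_S
    return Cost(llm_calls, tool_steps, latency)
```

Under these assumptions, even an agentic run that ends up doing a single search makes two LLM calls where the static pipeline makes one, which is why the static path stays the default when it works.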