Production GraphRAG at Scale

Naive GraphRAG retrieves a fixed neighborhood and hopes the answer is in it. Production GraphRAG lets an LLM drive traversal — planning a path through the graph, deciding hop-by-hop where to go next, and stopping when it has enough. This day is about making that loop fast, cheap, and observable: bounding traversal cost, caching subgraphs and embeddings, holding a latency budget, and surviving the failure modes that only show up at scale.


From Static Retrieval to Agentic Traversal

The beginner course built a GraphRAG pipeline that does this: extract entities from the question, look them up in the graph, pull a fixed k-hop neighborhood, dump it into the prompt. That works for "who is the CEO of Acme?" It falls apart on "which suppliers of Acme's competitors had a recall in the last two years?" — a question whose answer lives several hops away along a path you can't know in advance.
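As a reminder of what that static pipeline looks like, here is a minimal sketch. The toy graph, the node names, and the helpers link_entities, k_hop_edges, and build_context are all made up for illustration. The substring-based entity linker stands in for whatever linker the earlier course used.

```python
from collections import deque

# Toy graph as an adjacency map: node -> list of (relationship, neighbor).
# Every name and edge here is hypothetical.
GRAPH = {
    "Acme": [("HAS_CEO", "Jane Doe"), ("COMPETES_WITH", "Globex")],
    "Globex": [("SUPPLIED_BY", "PartsCo")],
    "PartsCo": [("HAD_RECALL", "Recall-2023")],
    "Jane Doe": [],
    "Recall-2023": [],
}

def link_entities(question):
    """Naive entity linking: any node whose name appears in the question."""
    return [n for n in GRAPH if n.lower() in question.lower()]

def k_hop_edges(seeds, k):
    """Breadth-first fetch of every edge within k hops of the seeds."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    edges = []
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # nodes at the boundary are kept but not expanded
        for rel, nbr in GRAPH.get(node, []):
            edges.append((node, rel, nbr))
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return edges

def build_context(question, k=1):
    """Dump the fixed k-hop neighborhood into a prompt-ready string."""
    facts = k_hop_edges(link_entities(question), k)
    return "\n".join(f"{s} -{r}-> {t}" for s, r, t in facts)

print(build_context("Who is the CEO of Acme?", k=1))
```

With k=1 the CEO fact is in the context. In this toy graph the recall fact sits three hops from Acme, so a k=1 or k=2 fetch never reaches it.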

Why a Fixed Neighborhood Isn't Enough

A static k-hop fetch has two failure modes that pull in opposite directions:

  • Too shallow. The answer is 4 hops away but you fetched 2. It's simply not in the context, and the LLM either hallucinates or says "I don't know."
  • Too deep. You fetch 3 hops "to be safe," and a single well-connected node (a country, a popular product category, a hub author) explodes the neighborhood into tens of thousands of nodes. This is supernode blow-up, and it blows your token budget and your latency at the same time.

You cannot pick one k that is right for every question. The fix is to stop pre-deciding the shape of the subgraph and instead let the LLM decide where to go next, one step at a time.
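To make the blow-up concrete, the sketch below counts how many nodes a k-hop fetch returns on a toy graph in which the third hop lands on a hub. The graph, the 10,000-neighbor hub, and the neighborhood_size helper are assumptions chosen for illustration, not measurements from a real dataset.

```python
def neighborhood_size(adj, seed, k):
    """Count nodes reachable from seed in at most k hops (seed included)."""
    seen = {seed}
    frontier = {seed}
    for _ in range(k):
        # Expand one hop, dropping anything already visited.
        frontier = {n for node in frontier for n in adj.get(node, ())} - seen
        seen |= frontier
    return len(seen)

# Hypothetical graph: a company, one product, and a country hub that
# links to 10,000 other companies.
adj = {
    "Acme": ["Widget"],
    "Widget": ["Acme", "USA"],
    "USA": ["Widget"] + [f"company_{i}" for i in range(10_000)],
}

for k in (1, 2, 3):
    print(k, neighborhood_size(adj, "Acme", k))
# prints:
# 1 2
# 2 3
# 3 10003
```

Going from k=2 to k=3 takes the fetch from 3 nodes to 10,003, and almost all of the extra nodes are irrelevant to any given question.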

The Agentic Loop

Agentic (or iterative) GraphRAG reframes retrieval as a control loop:

  1. Seed. Resolve the question's entities to graph nodes (this is the entity-linking step from earlier days). Those nodes are your starting frontier.
  2. Observe. Summarize the current frontier — node labels, the relationship types available from here, maybe a one-line description per neighbor. This is what the LLM "sees."
  3. Decide. The LLM, given the question and the observation, chooses an action: expand along a specific relationship type, follow a specific node, answer now, or give up.
  4. Act. Execute the chosen graph operation, updating the frontier and the accumulated evidence.
  5. Repeat until the LLM answers, or a budget (hops, nodes, tokens, time) is exhausted.
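The five steps above can be sketched as a small control loop. This is one possible shape under stated assumptions, not a reference implementation: the toy graph, the State class, and the function names are hypothetical, and scripted_decide is a hard-coded stand-in for the LLM call so the example runs without a model. Only a hop budget is enforced, and the "follow a specific node" action is left out for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical toy graph: node -> {relationship type -> [neighbors]}.
GRAPH = {
    "Acme": {"COMPETES_WITH": ["Globex"], "HEADQUARTERED_IN": ["USA"]},
    "Globex": {"SUPPLIED_BY": ["PartsCo"]},
    "PartsCo": {"HAD_RECALL": ["Recall-2023"]},
}

@dataclass
class State:
    frontier: list
    evidence: list = field(default_factory=list)
    hops: int = 0

def observe(state):
    """Step 2: relationship types available from each frontier node."""
    return {node: sorted(GRAPH.get(node, {})) for node in state.frontier}

def act(state, rel):
    """Step 4: expand the frontier along one relationship type."""
    next_frontier = []
    for node in state.frontier:
        for nbr in GRAPH.get(node, {}).get(rel, []):
            state.evidence.append((node, rel, nbr))
            next_frontier.append(nbr)
    state.frontier = next_frontier
    state.hops += 1

def run(question, seeds, decide, max_hops=4):
    """`decide` stands in for the LLM and returns ("expand", rel),
    ("answer", text), or ("give_up", None)."""
    state = State(frontier=list(seeds))                       # 1. seed
    while state.hops < max_hops:                              # 5. budget
        observation = observe(state)                          # 2. observe
        action, arg = decide(question, observation, state.evidence)  # 3. decide
        if action == "expand":
            act(state, arg)                                   # 4. act
        else:
            return action, arg, state.evidence
    return "budget_exhausted", None, state.evidence

def scripted_decide(question, observation, evidence):
    """Stand-in policy: follow a fixed plan, then answer from the evidence."""
    plan = ["COMPETES_WITH", "SUPPLIED_BY", "HAD_RECALL"]
    available = {rel for rels in observation.values() for rel in rels}
    for rel in plan:
        if rel in available:
            return "expand", rel
    if evidence:
        return "answer", evidence[-1][0]
    return "give_up", None

result = run("Which suppliers of Acme's competitors had a recall?",
             ["Acme"], scripted_decide)
print(result[0], result[1])
```

In a real system, decide would prompt a model with the question, the observation, and the evidence so far, then parse its reply into one of the actions. The loop around it stays the same, which is what makes the budget check easy to enforce.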

The crucial difference from naive GraphRAG: the LLM is never handed the whole graph, or even a pre-fetched neighborhood. It sees a local view and steers. This is the same shape as a ReAct agent, with the "tools" being graph operations (expand, get_neighbors, get_node) instead of web search.

Expose Relationship Types, Not Raw Neighbors

The single most important design choice: when you show the LLM the current frontier, show it the available relationship types ("Acme has 3 outgoing SUPPLIES edges, 12 COMPETES_WITH edges, 1 HEADQUARTERED_IN edge") rather than the raw list of neighbor nodes. The LLM picks an edge type to traverse; you then materialize only those neighbors. This keeps the per-step observation small even at a supernode, and it turns the LLM's job into typed query planning rather than scrolling a giant list.
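A minimal sketch of that observation format follows, assuming a node's outgoing edges are available as (relationship, target) pairs. The helper names summarize_edge_types and materialize, and the customer_ and rival_ node names, are invented for the example.

```python
from collections import Counter

def summarize_edge_types(node, out_edges):
    """Render a compact observation: one count per outgoing relationship type."""
    counts = Counter(rel for rel, _target in out_edges)
    parts = [f"{n} outgoing {rel} edge{'s' if n != 1 else ''}"
             for rel, n in sorted(counts.items())]
    return f"{node} has " + ", ".join(parts)

def materialize(out_edges, chosen_rel):
    """Fetch only the neighbors along the relationship type the LLM picked."""
    return [target for rel, target in out_edges if rel == chosen_rel]

# Hypothetical outgoing edges for Acme, matching the counts in the text.
edges = ([("SUPPLIES", f"customer_{i}") for i in range(3)]
         + [("COMPETES_WITH", f"rival_{i}") for i in range(12)]
         + [("HEADQUARTERED_IN", "USA")])

print(summarize_edge_types("Acme", edges))
print(materialize(edges, "SUPPLIES"))
```

The summary has one entry per relationship type, so its length depends on how many types a node has, not on how many neighbors it has. A hub with 10,000 COMPETES_WITH edges would still produce a one-line observation.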

Key Takeaways
  • Static k-hop retrieval is simultaneously too shallow for deep questions and too deep at supernodes — no single k is correct
  • Agentic GraphRAG turns retrieval into an observe→decide→act loop where the LLM steers traversal one hop at a time and never sees the whole graph
  • Expose available relationship TYPES to the LLM instead of raw neighbor lists — this keeps observations small and turns the LLM into a typed query planner

Course Stats

Estimated time: 50 min
Lessons: 5 sections