Graph Algorithms for Retrieval

Storing facts as a graph is only half the story — the real power comes from the algorithms that run over it. Today you'll learn the classic graph algorithms that turn a knowledge graph into a retrieval and ranking engine: traversal (BFS/DFS), shortest paths, PageRank and personalized PageRank, community detection, and centrality. You'll build the intuition for why each one matters, work small examples by hand, then see how they decide what to retrieve and in what order — the foundation of GraphRAG.


Traversal: BFS and DFS

Most graph algorithms build on traversal: systematically visiting nodes by following edges. Two strategies dominate, and the order in which they visit nodes is exactly what makes them useful for retrieval.

Breadth-First Search (BFS)

BFS explores the graph in waves. Starting from a seed node, it visits all nodes one hop away, then all nodes two hops away, and so on. It uses a queue (first-in, first-out).

Because BFS reaches nodes in increasing order of distance, the first time it touches any node it has found a shortest path (in number of hops) to it. This is why BFS is the natural engine for "what is near this entity?" retrieval.
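The queue-driven wave order can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes the graph is stored as a plain dict mapping each node to a list of its out-neighbors, and the function name `bfs_hop_distances` and the toy entities are hypothetical.

```python
from collections import deque


def bfs_hop_distances(graph, seed):
    """Return {node: hop count from seed} for every node reachable from seed."""
    distances = {seed: 0}      # doubles as the visited set
    queue = deque([seed])      # FIFO queue produces the wave-by-wave order
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distances:   # first touch = fewest hops
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical toy knowledge graph (entity -> related entities), for illustration only.
toy_graph = {
    "Marie Curie": ["Radium", "Sorbonne"],
    "Radium": ["Radioactivity"],
    "Sorbonne": ["Paris"],
    "Radioactivity": ["Marie Curie"],   # a cycle back to the seed
    "Paris": [],
}

nearby = bfs_hop_distances(toy_graph, "Marie Curie")
```

For "what is near this entity?" retrieval, one simple option is to keep only the entries of `nearby` whose hop count is at or below a chosen cutoff.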

Depth-First Search (DFS)

DFS dives down one branch as far as it can before backtracking. It uses a stack (last-in, first-out), or equivalently recursion. DFS is great for:

  • Cycle detection (did we come back to a node already on the current path?)
  • Topological ordering of a DAG
  • Exhaustive path enumeration between two nodes
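Each of the three uses listed above can be sketched as a small recursive DFS. The code below is one possible formulation under stated assumptions: the graph is a dict of adjacency lists, all function names are hypothetical, and recursion depth is assumed small enough for Python's default limit.

```python
def has_cycle(graph):
    """Directed cycle check: a cycle exists if DFS meets a node on the current path."""
    on_path, done = set(), set()

    def visit(node):
        on_path.add(node)
        for neighbor in graph.get(node, []):
            if neighbor in on_path:          # back to a node on the current path
                return True
            if neighbor not in done and visit(neighbor):
                return True
        on_path.remove(node)
        done.add(node)                       # fully explored, never revisit
        return False

    return any(visit(node) for node in graph if node not in done)


def topological_order(dag):
    """Reverse DFS post-order; assumes the input really is acyclic."""
    seen, post_order = set(), []

    def visit(node):
        seen.add(node)
        for neighbor in dag.get(node, []):
            if neighbor not in seen:
                visit(neighbor)
        post_order.append(node)              # appended after all descendants

    for node in dag:
        if node not in seen:
            visit(node)
    return post_order[::-1]


def all_simple_paths(graph, source, target):
    """Enumerate every path from source to target that repeats no node."""
    paths, path = [], [source]

    def extend(node):
        if node == target:
            paths.append(list(path))
            return
        for neighbor in graph.get(node, []):
            if neighbor not in path:         # skip nodes already on this path
                path.append(neighbor)
                extend(neighbor)
                path.pop()                   # backtrack

    extend(source)
    return paths


# Hypothetical example graphs.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
loop = {"A": ["B"], "B": ["C"], "C": ["A"]}
```

Note that exhaustive path enumeration can grow exponentially with graph size, so in practice it is usually bounded by a maximum path length.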

The Key Difference for Retrieval

Property | BFS | DFS
Data structure | Queue (FIFO) | Stack / recursion
Visits | Closest nodes first | One deep branch first
Best for | Nearest neighbors, shortest hops | Reachability, cycles, enumeration

Always Track Visited Nodes

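Real knowledge graphs are full of cycles (A → B → C → A). Without a visited set, a traversal that enters a cycle never terminates. Both BFS and DFS must record which nodes they have already seen; marking a node when it is enqueued or pushed, rather than when it is later processed, also keeps duplicates out of the queue or stack.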
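To make the role of the visited set concrete, here is a small sketch of an iterative DFS that terminates on a cyclic graph. It assumes a dict-of-lists graph; the name `reachable_from` and the extra node D are hypothetical additions to the A → B → C → A cycle mentioned above.

```python
def reachable_from(graph, start):
    """Iterative DFS returning every node reachable from start."""
    visited = {start}      # mark on push so no node enters the stack twice
    stack = [start]        # LIFO stack gives the depth-first order
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append(neighbor)
    return visited


# The cycle A -> B -> C -> A, plus one hypothetical extra node D.
cyclic_graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
reached = reachable_from(cyclic_graph, "A")
```

If the `neighbor not in visited` check were removed, the edge C → A would push A again on every lap and the loop would never end.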

Worked Example

Graph: A→B, A→C, B→D, C→D, D→E. Starting BFS at A:

wave 0: A
wave 1: B, C
wave 2: D
wave 3: E

So A is 0 hops from itself, B and C are 1 hop, D is 2 hops, E is 3 hops — a complete distance map from a single source.
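The same waves can be reproduced in code. The sketch below is one way to do it, assuming the example graph is encoded as a dict of adjacency lists; the function name `bfs_waves` is hypothetical. It processes one whole wave at a time instead of using an explicit queue, which visits nodes in the same order.

```python
def bfs_waves(graph, seed):
    """Group the nodes reachable from seed by hop distance (wave 0, wave 1, ...)."""
    visited = {seed}
    wave = [seed]
    waves = []
    while wave:
        waves.append(wave)
        next_wave = []
        for node in wave:
            for neighbor in graph.get(node, []):
                if neighbor not in visited:   # D is reached via B, so C does not re-add it
                    visited.add(neighbor)
                    next_wave.append(neighbor)
        wave = next_wave
    return waves


# The worked example: A→B, A→C, B→D, C→D, D→E.
example_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}

waves = bfs_waves(example_graph, "A")
for hops, nodes in enumerate(waves):
    print(f"wave {hops}: {', '.join(nodes)}")
```

The index of each wave in the returned list is the hop distance of its nodes from the seed.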

Key Takeaways
  • BFS uses a queue and visits closest nodes first, giving shortest hop-distances for free
  • DFS uses a stack/recursion and is ideal for cycles, reachability, and path enumeration
  • Always keep a visited set — real graphs have cycles that otherwise loop forever
