One machine stops being enough long before your graph stops growing. This day is about what breaks when you partition a graph across machines: why graph partitioning is NP-hard, how cross-partition edges quietly dominate traversal cost, what supernodes do to your latency, and how Neo4j Fabric, JanusGraph, TigerGraph, and Amazon Neptune each make a different bet on where the hard part should live.
The Beginner and Intermediate courses treated the graph as a single thing living on a single machine. The Advanced course opens with the inconvenient truth: a graph is the worst-case data structure to split across machines. Relational rows shard cleanly by key. Vector indexes shard by hashing IDs. A graph fights you, because the entire value of a graph is in its edges — and edges are exactly what partitioning has to cut.
When you split a graph's vertices across P partitions, every edge is either local (both endpoints in the same partition) or a cross-partition edge (endpoints in different partitions, sometimes called a "cut edge" or "ghost edge"). Local edges are cheap to traverse — a pointer chase in RAM. Cross-partition edges require a network hop to follow.
The whole game of graph partitioning is: minimize the number of cross-partition edges while keeping partitions roughly balanced in size. Formally this is the balanced minimum k-cut problem.
The unconstrained minimum cut between two vertices is solvable in polynomial time (max-flow / min-cut theorem). But the moment you add the balance constraint — each partition must hold roughly |V| / P vertices — the problem becomes NP-hard. You cannot, in general, find the optimal balanced partition of a large graph in reasonable time.
So in practice nobody computes the optimal cut. They use heuristics:
There are two dual ways to partition:
Most interesting graphs are scale-free / power-law: degree distribution follows a heavy tail. A handful of vertices (celebrities, popular products, the literal node for "United States") have millions of edges, while most vertices have a handful. This destroys naive partitioning — you can't cut around a vertex whose edges touch every partition no matter where you put it. That special pain has a name, and we devote Section 3 to it: supernodes.
Distribution is not free and the cost is recall and latency on traversals, not storage. A graph that is 5% cross-partition edges can still be 50%+ cross-partition traversal cost, because traversals concentrate on the well-connected core. Measuring cross-partition edge ratio is the first diagnostic — and you'll build exactly that in the Practice tab.