Graph Embeddings & Inference

From Traversal to Representation Learning

So far in this track you've treated the graph as something you query — you traverse edges, run Cypher, compute centralities. Today the graph becomes something you learn from. The goal of graph representation learning is to map every node (and often every edge type) to a dense vector — an embedding — such that the geometry of the embedding space reflects the structure of the graph.

Why Embed a Graph at All?

A query like MATCH (a)-[:WORKS_FOR]->(c) returns only edges that literally exist. But real knowledge graphs are radically incomplete — Freebase was estimated to be missing the place-of-birth of ~70% of the people it contained. Two questions traversal can't answer well:

Link prediction: given the edges that exist, which missing edges are most plausible? (Is (Alice)-[:WORKS_FOR]->(Acme) likely even though it's not recorded?)
Similarity: which nodes are "alike" even if they're not directly connected? Two researchers who never co-authored may still be near-identical in role and topic.

Embeddings answer both because they place structurally/semantically similar nodes near each other in vector space, and because algebraic operations on the vectors approximate relational facts.
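To make both claims concrete, here is a minimal sketch using hand-picked 2-D vectors. Nothing here is learned: the `entity` and `relation` tables, the names `Bob` and `Paris`, and the helper functions are all hypothetical, chosen only to illustrate the idea. Similarity is measured with cosine similarity, and the "algebraic operation" shown is a TransE-style translation, where a fact is plausible when head + relation lands close to tail.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration (not learned).
entity = {
    "Alice": [1.0, 0.0],
    "Bob": [0.9, 0.1],
    "Acme": [1.0, 1.0],
    "Paris": [-1.0, 0.5],
}
relation = {"WORKS_FOR": [0.0, 1.0]}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_similar(name):
    """Return the other entity whose vector points in the most similar direction."""
    others = [n for n in entity if n != name]
    return max(others, key=lambda n: cosine(entity[name], entity[n]))


def translation_score(head, rel, tail):
    """TransE-style plausibility: negative distance from (head + rel) to tail."""
    moved = [h + r for h, r in zip(entity[head], relation[rel])]
    return -math.dist(moved, entity[tail])
```

With these toy vectors, `most_similar("Alice")` returns `"Bob"` even though no edge connects them, and `translation_score("Alice", "WORKS_FOR", "Acme")` is higher than the same query with `"Paris"` as the tail. A trained model would learn the vectors so that recorded facts score well; the scoring step itself would look much like this.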

Two Families You Must Distinguish

People conflate these constantly, and it costs them:

Node embeddings (homogeneous, structure-driven): DeepWalk, node2vec, and graph neural networks. They embed nodes primarily from connectivity. The classic random-walk objective: nodes that co-occur on short random walks get similar vectors. In their basic form they ignore edge types: an edge is an edge.
Knowledge-graph embeddings (KGE, relation-aware): TransE, DistMult, ComplEx, RotatE. They embed both entities and relations and are built specifically for multi-relational graphs made of (subject, predicate, object) triples. Their objective is to score triples, which makes them the natural tool for link prediction on a typed KG.
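For the node-embedding family, the phrase "co-occurring on short random walks" can be sketched as follows. This is a simplified, DeepWalk-style illustration under stated assumptions: the toy `graph`, the walk length, and the window size are all hypothetical, and the walks are uniform (node2vec would bias the next-step choice with its return and in-out parameters). Note that the adjacency list carries no edge types at all.

```python
import random

# Hypothetical toy graph as an adjacency list; edge types are deliberately absent.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def random_walk(graph, start, length, rng):
    """Uniform random walk visiting at most `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def context_pairs(walk, window):
    """(center, context) pairs for nodes at most `window` steps apart on the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


rng = random.Random(0)  # fixed seed so the sketch is repeatable
walks = [random_walk(graph, node, length=5, rng=rng) for node in graph for _ in range(3)]
pairs = [p for w in walks for p in context_pairs(w, window=2)]
```

The resulting `pairs` play the role that (word, context) pairs play in skip-gram training: a model trained on them pushes nodes that frequently co-occur toward similar vectors. The training step itself is omitted here.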

A rough rule: if your graph is "who-knows-whom" with one edge meaning, reach for node2vec. If it's a typed KG with many relation types (bornIn, worksFor, capitalOf), reach for a KGE model.

The Closed-World vs Open-World Trap

Traversal-based reasoning is closed-world: an unrecorded edge is treated as false. Embedding-based link prediction is open-world: an unrecorded edge is unknown, and the model produces a plausibility score. This distinction governs how you generate training data — in particular, why you have to synthesize negative examples (covered in Section 3), because the graph only stores positives.
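One common way to synthesize negatives is to corrupt a known triple by swapping its head or tail for a random entity and discarding any result that is already recorded as a positive. The sketch below illustrates that idea only; the triples, the `corrupt` helper, and its `max_tries` parameter are hypothetical, and the approach covered in Section 3 may differ.

```python
import random

# Hypothetical positive triples; the graph stores only facts like these.
positives = {
    ("Alice", "WORKS_FOR", "Acme"),
    ("Bob", "WORKS_FOR", "Acme"),
    ("Carol", "WORKS_FOR", "Globex"),
    ("Acme", "LOCATED_IN", "Paris"),
}
entities = sorted({e for h, _, t in positives for e in (h, t)})


def corrupt(triple, rng, max_tries=100):
    """Swap the head or the tail for a random entity, skipping known positives."""
    head, rel, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        if rng.random() < 0.5:
            corrupted = (candidate, rel, tail)  # corrupt the head
        else:
            corrupted = (head, rel, candidate)  # corrupt the tail
        if corrupted not in positives:
            return corrupted
    return None  # no valid negative found within max_tries


rng = random.Random(0)  # fixed seed so the sketch is repeatable
negatives = [corrupt(t, rng) for t in sorted(positives)]
```

Because the world is open, a sampled "negative" may in fact be a true but unrecorded edge, so these labels are an approximation rather than ground truth. This untyped version can also produce implausible triples such as a city in the head slot of WORKS_FOR; restricting candidates by entity type is a common refinement.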

Key Takeaways

Graph representation learning maps nodes (and relations) to vectors so geometry mirrors structure — enabling link prediction and similarity that literal traversal cannot do
Node embeddings (DeepWalk/node2vec) are structure-driven and edge-type-agnostic; knowledge-graph embeddings (TransE/DistMult/RotatE) are relation-aware and built for typed triples
Traversal is closed-world (missing = false); embedding-based inference is open-world (missing = unknown, scored) — which is why you must synthesize negatives for training
