Graph Embeddings & Inference

A knowledge graph is more than a queryable store of facts: it is a space you can learn from. Today you turn nodes and edges into vectors with random-walk methods (DeepWalk, node2vec) and with translational, bilinear, and rotational models (TransE, DistMult, RotatE). You will use them to score missing links, contrast learned embeddings with ontological and rule-based inference, and use those embeddings to augment retrieval beyond what literal traversal can reach.

Day 3

From Traversal to Representation Learning

So far in this track you've treated the graph as something you query — you traverse edges, run Cypher, compute centralities. Today the graph becomes something you learn from. The goal of graph representation learning is to map every node (and often every edge type) to a dense vector — an embedding — such that the geometry of the embedding space reflects the structure of the graph.

Why Embed a Graph at All?

A query like MATCH (a)-[:WORKS_FOR]->(c) returns only edges that literally exist. But real knowledge graphs are radically incomplete: Freebase was estimated to be missing the place of birth for ~70% of the people it contained. Traversal can't answer two questions well:

  • Link prediction: given the edges that exist, which missing edges are most plausible? (Is (Alice)-[:WORKS_FOR]->(Acme) likely even though it's not recorded?)
  • Similarity: which nodes are "alike" even if they're not directly connected? Two researchers who never co-authored may still be near-identical in role and topic.

Embeddings answer both because they place structurally or semantically similar nodes near each other in vector space, and because algebraic operations on the vectors approximate relational facts. In TransE, for example, a true triple (h, r, t) should satisfy h + r ≈ t.
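To make the similarity use case concrete, here is a minimal sketch of nearest-neighbour lookup by cosine similarity. The vectors are hand-written, hypothetical toy values. A real pipeline would take them from a trained model such as node2vec or TransE. The helper names `cosine` and `most_similar` are illustrative, not from any library.

```python
import math

# Hypothetical toy embeddings. In practice these come from a trained model
# (node2vec, TransE, ...), never written by hand.
embeddings = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.8, 0.2, 0.35],
    "acme":  [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(node, k=1):
    """Return the k other nodes whose vectors are closest (by cosine) to `node`."""
    query = embeddings[node]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != node
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(most_similar("alice", k=2))
```

No edge between alice and bob is consulted here. Their closeness comes purely from the geometry, which is what lets embeddings surface "alike" nodes that traversal would never connect.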

Two Families You Must Distinguish

These two families are easy to conflate, and choosing the wrong one for your graph is a common and costly mistake:

  • Node embeddings (homogeneous, structure-driven): DeepWalk, node2vec, and graph neural networks. They embed nodes primarily from connectivity. The classic objective, used by DeepWalk and node2vec, is that nodes co-occurring on short random walks get similar vectors. They mostly ignore edge types: an edge is an edge.
  • Knowledge-graph embeddings (KGE, relation-aware): TransE, DistMult, ComplEx, RotatE. They embed both entities and relations and are built specifically for multi-relational graphs (subject, predicate, object triples). Their objective is to score triples, which makes them the natural tool for link prediction on a typed KG.
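The random-walk objective in the node-embedding bullet can be sketched in two steps: generate walks, then turn each walk into (center, context) pairs for a skip-gram model such as word2vec to train on. This is a minimal sketch assuming a small hypothetical undirected graph and DeepWalk-style uniform walks. node2vec would replace the uniform `rng.choice` with a biased choice controlled by its return parameter p and in-out parameter q. The skip-gram training itself is omitted.

```python
import random

# Hypothetical toy undirected graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

def random_walk(graph, start, length, rng):
    """DeepWalk-style walk: each step moves to a uniformly chosen neighbour."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def skipgram_pairs(walk, window):
    """(center, context) pairs for nodes within `window` steps on the same walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

rng = random.Random(0)  # seeded so the walks are reproducible
walks = [random_walk(graph, node, length=6, rng=rng)
         for node in graph for _ in range(2)]  # 2 walks per start node
pairs = [p for w in walks for p in skipgram_pairs(w, window=2)]
print(walks[0])
print(len(walks), len(pairs))
```

Note that the edge labels never appear: the walks only see adjacency, which is exactly why this family is edge-type-agnostic.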

A rough rule: if your graph is "who-knows-whom" with one edge meaning, reach for node2vec. If it's a typed KG with many relation types (bornIn, worksFor, capitalOf), reach for a KGE model.
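For the typed-KG case, the triple-scoring idea from the KGE bullet can be sketched with the scoring functions of three of the models it names. This is a simplified sketch: it assumes hand-written, hypothetical 2-d vectors and shows scoring and tail ranking only, with no training loop. TransE uses L2 distance here (L1 is also common), and RotatE relations are given as rotation phases in the complex plane. `rank_tails` is an illustrative helper, not a library API.

```python
import cmath
import math

def transe_score(h, r, t):
    """TransE: a true triple should satisfy h + r ≈ t, so score = -||h + r - t||_2."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i (higher = more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def rotate_score(h, phases, t):
    """RotatE: the relation rotates each complex coordinate, t ≈ h ∘ r with |r_i| = 1,
    so score = -||h ∘ r - t||."""
    r = [cmath.exp(1j * theta) for theta in phases]  # unit-modulus complex numbers
    return -math.sqrt(sum(abs(hi * ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tails(score_fn, h, r, candidates):
    """Sort candidate tail entities from most to least plausible."""
    return sorted(candidates, key=lambda name: score_fn(h, r, candidates[name]),
                  reverse=True)

# Hypothetical toy vectors; real models learn higher-dimensional embeddings.
alice = [0.2, 0.5]
works_for = [0.6, -0.1]
companies = {"acme": [0.8, 0.4], "globex": [-0.5, 0.9]}

# Link prediction as ranking: which company best completes (alice, worksFor, ?)
print(rank_tails(transe_score, alice, works_for, companies))
```

One design consequence is visible directly in the formulas: the DistMult product is unchanged when h and t are swapped, so it scores (h, r, t) and (t, r, h) identically and cannot represent asymmetric relations. RotatE's rotations do not have that limitation.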

The Closed-World vs Open-World Trap

Traversal-based reasoning is closed-world: an unrecorded edge is treated as false. Embedding-based link prediction is open-world: an unrecorded edge is unknown, and the model produces a plausibility score. This distinction governs how you generate training data — in particular, why you have to synthesize negative examples (covered in Section 3), because the graph only stores positives.
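Section 3 covers negative sampling properly. As a preview, here is a minimal sketch of one widely used scheme: corrupt a known triple by replacing its head or tail with a random entity, and reject any corruption that happens to be a stored positive. The triples and the `corrupt` helper are hypothetical, and uniform sampling is assumed for simplicity. Under the open-world view, the generated negatives are only presumed false, not known to be false.

```python
import random

# Hypothetical positive triples: the only thing the graph actually stores.
positives = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "globex"),
    ("acme", "locatedIn", "berlin"),
}
entities = sorted({h for h, _, _ in positives} | {t for _, _, t in positives})

def corrupt(triple, entities, known, rng, max_tries=100):
    """Make one negative by replacing the head or the tail with a random entity.

    Corruptions that are themselves known positives are rejected. The result is
    an assumed negative: the open-world view says it is merely unrecorded."""
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:
            return candidate
    return None  # no negative found (tiny or very dense graph)

rng = random.Random(42)
sources = sorted(positives)
negatives = [corrupt(p, entities, positives, rng) for p in sources]
print(negatives)
```

Each negative keeps the relation and one endpoint of its source triple, so the model learns to separate plausible from implausible completions of the same pattern.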

Key Takeaways
  • Graph representation learning maps nodes (and relations) to vectors so geometry mirrors structure — enabling link prediction and similarity that literal traversal cannot do
  • Node embeddings (DeepWalk/node2vec) are structure-driven and edge-type-agnostic; knowledge-graph embeddings (TransE/DistMult/RotatE) are relation-aware and built for typed triples
  • Traversal is closed-world (missing = false); embedding-based inference is open-world (missing = unknown, scored) — which is why you must synthesize negatives for training

Course Stats

Estimated Time: 50 min
Lessons: 5 sections