Capstone: Knowledge Graph Platform Design

The Two Case Studies & The Design Loop

This is the final capstone of the Knowledge Graphs track. The last five days each handed you one power tool — scaling (Day 1), temporal graphs (Day 2), graph embeddings (Day 3), graph neural networks (Day 4), and production GraphRAG (Day 5). Today we wire them into complete platforms by designing two real systems end-to-end and letting their differences teach the trade-offs.

Case A — ComplyGraph (enterprise compliance & provenance)

A regulated bank needs a graph of who approved what, when, and on whose authority. Entities: people, accounts, transactions, controls, policies, documents. The killer requirement is provenance and auditability: every edge must carry its source, the time it was valid, and who asserted it. Auditors run multi-hop questions like "show every transaction over $10k approved by someone who later failed a controls review." Scale is modest (~50M nodes, ~400M edges) but correctness, lineage, and bitemporal history are non-negotiable.
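To make "every edge must carry its source, the time it was valid, and who asserted it" concrete, here is a minimal sketch of a bitemporal, provenance-carrying edge. The class name, field names, the `APPROVED` relation, and the sample identifiers are all illustrative assumptions, not ComplyGraph's actual schema. "Bitemporal" here means two independent time axes: valid time (when the fact held in the world) and record time (when the system knew it).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ProvenancedEdge:
    # All names below are hypothetical; only the three requirements
    # (source, validity time, asserter) come from the case description.
    subject: str
    predicate: str
    obj: str
    source: str                       # where the fact came from
    asserted_by: str                  # who asserted it
    valid_from: datetime              # valid time: start of truth in the world
    valid_to: Optional[datetime]      # None means "still valid"
    recorded_at: datetime             # record time: when the system learned it
    superseded_at: Optional[datetime] = None  # None means "still the current record"


def visible(edge: ProvenancedEdge, valid_at: datetime, known_at: datetime) -> bool:
    """True if the edge held at `valid_at` according to what was recorded by `known_at`."""
    in_valid_time = edge.valid_from <= valid_at and (
        edge.valid_to is None or valid_at < edge.valid_to
    )
    in_record_time = edge.recorded_at <= known_at and (
        edge.superseded_at is None or known_at < edge.superseded_at
    )
    return in_valid_time and in_record_time


edge = ProvenancedEdge(
    subject="person:alice",
    predicate="APPROVED",
    obj="txn:1001",
    source="doc:approval-form-77",
    asserted_by="system:workflow",
    valid_from=datetime(2023, 3, 1),
    valid_to=None,
    recorded_at=datetime(2023, 3, 2),
)

# The approval was true on 1 March but only recorded on 2 March.
print(visible(edge, datetime(2023, 3, 1, 12), known_at=datetime(2023, 3, 1, 12)))
print(visible(edge, datetime(2023, 3, 1, 12), known_at=datetime(2023, 4, 1)))
```

The two calls differ only in `known_at`, which is the question an auditor asks when they want to know what the bank could have seen at the time, as opposed to what is known now.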

Case B — AtlasRAG (large consumer GraphRAG product)

A consumer research assistant answers natural-language questions over ~80M documents by combining a knowledge graph with retrieval. Entities are extracted automatically from messy text at huge volume. The killer requirements are throughput, freshness, and answer quality at low latency — 3,000 QPS at peak, p95 under 900ms end-to-end, the graph growing by millions of nodes daily. Here a wrong-ish edge is tolerable; a slow or stale answer is not.
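The throughput and latency targets above imply a concurrency figure that is worth computing before any design work. The sketch below applies Little's law (mean requests in flight = arrival rate × mean time in system). The 3,000 QPS and 900 ms figures come from the case description; the 400 ms mean latency is purely an assumption for illustration, and using the p95 budget in place of the mean gives a deliberately conservative sizing number, not an exact one.

```python
def in_flight_requests(qps: float, latency_ms: float) -> float:
    """Little's law: mean concurrency = arrival rate x mean time in system."""
    return qps * latency_ms / 1000.0


PEAK_QPS = 3000        # from the AtlasRAG requirements
P95_BUDGET_MS = 900    # from the AtlasRAG requirements
ASSUMED_MEAN_MS = 400  # assumption for illustration only, not from the lesson

# Conservative figure: treat every request as if it used the full p95 budget.
conservative = in_flight_requests(PEAK_QPS, P95_BUDGET_MS)
# Figure under the assumed mean latency.
typical = in_flight_requests(PEAK_QPS, ASSUMED_MEAN_MS)

print(f"in flight at the p95 budget:   {conservative:.0f}")
print(f"in flight at the assumed mean: {typical:.0f}")
```

Whatever the real mean turns out to be, the serving tier has to hold on the order of a few thousand concurrent requests at peak, which is the kind of number the later capacity planning needs as an input.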

The Five-Stage Design Loop

Every knowledge-graph platform, both of ours included, follows the same pipeline. Memorize it; it is the spine of the rest of this lesson:

Ingestion — get raw sources into the system reliably (batch + streaming).
Extraction — turn raw data into candidate entities and relations.
Entity resolution — collapse duplicates so "IBM", "I.B.M." and "International Business Machines" become one node.
Storage & scale — pick the store, partition it, plan capacity.
Retrieval & serving — answer queries, often hybrid (graph traversal + vector + text).
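The five stages above can be sketched as five small functions wired in order. This is a toy, in-memory illustration under loud assumptions: the "acquired" sentence pattern, the hand-written alias table, and the dictionary-backed graph are all hypothetical stand-ins, and a real system would use a proper extractor, a blocking-plus-similarity resolver, and a real store.

```python
import re
from collections import defaultdict

# Hypothetical alias table; real entity resolution would not be a lookup.
ALIASES = {"ibm": "IBM", "international business machines": "IBM"}


def ingest(sources):
    """Stage 1: yield raw records (here, strings already in memory)."""
    yield from sources


def extract(record):
    """Stage 2: candidate (subject, relation, object) triples.
    Toy pattern: '<subject> acquired <object>'."""
    m = re.match(r"(.+?) acquired (.+)", record)
    return [(m.group(1).strip(), "ACQUIRED", m.group(2).strip())] if m else []


def canonical(name):
    """Stage 3: map surface forms to one node id by stripping punctuation
    and consulting the alias table."""
    cleaned = re.sub(r"[^\w\s]", "", name).strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def store(graph, triple):
    """Stage 4: write the edge (here, into an adjacency dict of sets)."""
    subject, relation, obj = triple
    graph[subject].add((relation, obj))


def retrieve(graph, node):
    """Stage 5: answer a one-hop query for a node's outgoing edges."""
    return sorted(graph.get(node, ()))


def run_pipeline(sources):
    graph = defaultdict(set)
    for record in ingest(sources):
        for s, r, o in extract(record):
            store(graph, (canonical(s), r, canonical(o)))
    return graph


graph = run_pipeline([
    "IBM acquired Red Hat",
    "I.B.M. acquired Red Hat",
    "International Business Machines acquired Red Hat",
])
print(len(graph), retrieve(graph, "IBM"))
```

Three differently spelled mentions collapse into a single `IBM` node with a single edge, which is the job the entity-resolution stage does between extraction and storage.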

The two case studies make the same decisions differently. That contrast is the lesson: there is no universal best architecture, only an architecture that fits a requirement profile.

Requirements Drive Everything

Before any boxes-and-arrows, write the numbers down. The requirement table is the single most useful artifact in a design review:

Dimension	ComplyGraph	AtlasRAG
Scale	50M nodes / 400M edges	1B+ nodes, +millions/day
Read pattern	Deep multi-hop audits, low QPS	Shallow hybrid retrieval, 3k QPS
Freshness	Hours acceptable	Minutes
Consistency	Strong, bitemporal, auditable	Eventual is fine
Failure cost	A wrong audit = regulatory fine	A stale answer = mild annoyance

Notice nothing here is about technology yet. Pick the requirement profile first; the stack falls out of it.
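One way to keep the requirement table honest is to hold it as data, so it can be reviewed and diffed like any other artifact. The sketch below is an assumption about how that might look: the class and function names are hypothetical, and the field values are copied verbatim from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RequirementProfile:
    # One field per row of the requirement table.
    scale: str
    read_pattern: str
    freshness: str
    consistency: str
    failure_cost: str


COMPLYGRAPH = RequirementProfile(
    scale="50M nodes / 400M edges",
    read_pattern="Deep multi-hop audits, low QPS",
    freshness="Hours acceptable",
    consistency="Strong, bitemporal, auditable",
    failure_cost="A wrong audit = regulatory fine",
)

ATLASRAG = RequirementProfile(
    scale="1B+ nodes, +millions/day",
    read_pattern="Shallow hybrid retrieval, 3k QPS",
    freshness="Minutes",
    consistency="Eventual is fine",
    failure_cost="A stale answer = mild annoyance",
)


def differing_dimensions(a: RequirementProfile, b: RequirementProfile) -> list:
    """Return the names of the dimensions on which two profiles disagree."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


print(differing_dimensions(COMPLYGRAPH, ATLASRAG))
```

For these two systems every dimension differs, which is why they end up with different architectures despite sharing the same five-stage loop.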

Key Takeaways

Every KG platform is the same five-stage loop: ingestion, extraction, entity resolution, storage/scale, retrieval/serving
Write the requirements table (scale, read pattern, freshness, consistency, failure cost) before drawing any architecture
The two case studies make the same decisions differently — there is no universal best stack, only a fit to the requirement profile
