Capstone: Knowledge Graph Platform Design

The Two Case Studies & The Design Loop

This is the final capstone of the Knowledge Graphs track. The last four days each handed you one power tool — scaling (Day 1), temporal graphs (Day 2), graph embeddings (Day 3), and production GraphRAG (Day 4). Today we wire them into complete platforms by designing two real systems end-to-end and letting their differences teach the trade-offs.

Case A — ComplyGraph (enterprise compliance & provenance)

A regulated bank needs a graph of who approved what, when, and on whose authority. Entities: people, accounts, transactions, controls, policies, documents. The killer requirement is provenance and auditability: every edge must carry its source, the time it was valid, and who asserted it. Auditors run multi-hop questions like "show every transaction over $10k approved by someone who later failed a controls review." Scale is modest (~50M nodes, ~400M edges) but correctness, lineage, and bitemporal history are non-negotiable.
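To make the provenance and bitemporal requirement concrete, here is a minimal in-memory sketch in Python. The edge fields, the relation names `APPROVED_BY` and `FAILED_REVIEW`, and the sample data are all hypothetical; a real platform would run this as a query against its graph store rather than over a Python list. It shows two ideas: every edge carries its source, asserter, valid time and recorded time, and the audit question is answered "as of" a recorded-time cutoff so the same audit can be replayed later.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Edge:
    src: str
    rel: str
    dst: str
    valid_from: date          # valid time: when the fact became true in the world
    valid_to: Optional[date]  # None = still true (not used by this query)
    recorded_at: date         # transaction time: when the platform learned the fact
    source: str               # provenance: the record or document it came from
    asserted_by: str          # provenance: the person or system that asserted it

def known_as_of(edges, as_of):
    """Transaction-time slice: only facts the platform had recorded by `as_of`,
    so an audit can be replayed exactly as it would have run on that date."""
    return [e for e in edges if e.recorded_at <= as_of]

def audit_query(edges, amounts, as_of, threshold=10_000):
    """Transactions over `threshold` approved by someone who later (in valid
    time) failed a controls review. Each hit keeps both supporting edges,
    so the answer carries its own lineage."""
    view = known_as_of(edges, as_of)
    failures = {}
    for e in view:
        if e.rel == "FAILED_REVIEW":
            failures.setdefault(e.src, []).append(e)
    hits = []
    for e in view:
        if e.rel == "APPROVED_BY" and amounts.get(e.src, 0) > threshold:
            for f in failures.get(e.dst, []):
                if f.valid_from > e.valid_from:
                    hits.append((e.src, e.dst, e, f))
    return hits

# Hypothetical sample data
edges = [
    Edge("tx-1", "APPROVED_BY", "alice", date(2023, 3, 1), None,
         date(2023, 3, 1), "ledger", "core-banking"),
    Edge("tx-2", "APPROVED_BY", "bob", date(2023, 3, 2), None,
         date(2023, 3, 2), "ledger", "core-banking"),
    Edge("alice", "FAILED_REVIEW", "review-17", date(2023, 9, 1), None,
         date(2023, 9, 5), "audit-report-17", "internal-audit"),
]
amounts = {"tx-1": 25_000, "tx-2": 50_000}

print([h[0] for h in audit_query(edges, amounts, date(2023, 12, 31))])  # ['tx-1']
print([h[0] for h in audit_query(edges, amounts, date(2023, 6, 30))])   # []
```

The second call returns nothing because the failed review was only recorded on 2023-09-05; replaying the audit as of mid-2023 reproduces what an auditor could have seen at that time.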

Case B — AtlasRAG (large consumer GraphRAG product)

A consumer research assistant answers natural-language questions over ~80M documents by combining a knowledge graph with retrieval. Entities are extracted automatically from messy text at huge volume. The killer requirements are throughput, freshness, and answer quality at low latency: 3,000 QPS at peak, p95 under 900ms end-to-end, and a graph that grows by millions of nodes daily. Here a wrong-ish edge is tolerable; a slow or stale answer is not.
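The latency and throughput targets can be turned into a back-of-the-envelope budget. The sketch below assumes a sequential pipeline with five hypothetical stages and made-up per-stage p95 budgets; summing stage p95s is only a planning heuristic, since percentiles do not add exactly. It also applies Little's law (requests in flight = arrival rate × time in system) to estimate the concurrency that peak traffic implies.

```python
# Hypothetical per-stage p95 budgets (ms) for one AtlasRAG request, assuming
# the stages run sequentially. Names and numbers are illustrative only.
BUDGET_MS = {
    "entity_linking": 60,
    "graph_expansion": 120,
    "vector_search": 80,
    "rerank": 90,
    "generation": 500,
}
TARGET_P95_MS = 900   # requirement: p95 under 900ms end-to-end
PEAK_QPS = 3_000      # requirement: 3,000 QPS at peak

def budget_headroom(budget: dict[str, int], target_ms: int) -> int:
    """Milliseconds left after summing stage budgets (negative = over budget).
    Summing per-stage p95s is a planning heuristic, not an exact p95."""
    return target_ms - sum(budget.values())

def in_flight_estimate(qps: float, latency_ms: float) -> float:
    """Little's law: L = lambda * W. Plugging in the p95 target instead of
    the mean latency gives a conservative concurrency estimate."""
    return qps * latency_ms / 1000

print(budget_headroom(BUDGET_MS, TARGET_P95_MS))      # 50
print(in_flight_estimate(PEAK_QPS, TARGET_P95_MS))    # 2700.0
```

Something like 2,700 concurrent requests at peak is the figure to provision workers and connection pools against, and the 50 ms of headroom is all that remains for network hops and retries under these assumed numbers.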

The Five-Stage Design Loop

Every knowledge-graph platform — both of ours — is the same pipeline. Memorize it; it is the spine of the rest of this lesson:

Ingestion — get raw sources into the system reliably (batch + streaming).
Extraction — turn raw data into candidate entities and relations.
Entity resolution — collapse duplicates so "IBM", "I.B.M." and "International Business Machines" become one node.
Storage & scale — pick the store, partition it, plan capacity.
Retrieval & serving — answer queries, often hybrid (graph traversal + vector + text).
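The five stages compose into a single pipeline. The toy Python sketch below wires them together end to end; every function, the `->` input format and the alias table are hypothetical stand-ins (a real extractor would be a model, a real store a graph database). Its entity-resolution step shows the simplest version of the "IBM" example: normalize case and punctuation, then consult an alias table.

```python
import re
from typing import Iterable

# Hypothetical alias table; real systems curate or learn these mappings.
ALIASES = {"international business machines": "ibm"}

def ingest(sources: Iterable[str]) -> list[str]:
    # Stage 1: ingestion. Here we just drop blank records.
    return [s.strip() for s in sources if s.strip()]

def extract(records: list[str]) -> list[tuple[str, str, str]]:
    # Stage 2: extraction. A toy parser for "subject -> relation -> object"
    # lines, standing in for a real NER / relation-extraction model.
    triples = []
    for r in records:
        parts = [p.strip() for p in r.split("->")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples

def canonical(name: str) -> str:
    # Normalize case, punctuation and whitespace, then apply the alias table.
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return ALIASES.get(key, key)

def resolve(triples):
    # Stage 3: entity resolution. Map every mention to a canonical node id.
    return [(canonical(s), rel, canonical(o)) for s, rel, o in triples]

def store(triples):
    # Stage 4: storage. An in-memory adjacency map standing in for a
    # partitioned graph store.
    graph = {}
    for s, rel, o in triples:
        graph.setdefault(s, set()).add((rel, o))
    return graph

def serve(graph, entity: str):
    # Stage 5: retrieval. A one-hop lookup; a production system would
    # combine graph traversal with vector and text search.
    return sorted(graph.get(canonical(entity), set()))

raw = [
    "IBM -> headquartered_in -> Armonk",
    "I.B.M. -> acquired -> Red Hat",
    "International Business Machines -> makes -> mainframes",
    "",
]
graph = store(resolve(extract(ingest(raw))))
print(len(graph))              # 1: three spellings collapsed into one node
print(serve(graph, "I.B.M."))  # [('acquired', 'red hat'), ('headquartered_in', 'armonk'), ('makes', 'mainframes')]
```

Normalization alone merges "IBM" and "I.B.M."; the long form merges only because of the alias entry, which hints at why production entity resolution needs more than string rules.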

The two case studies make the same decisions differently. That contrast is the lesson: there is no universal best architecture, only an architecture that fits a requirement profile.

Requirements Drive Everything

Before any boxes-and-arrows, write the numbers down. The requirement table is the single most useful artifact in a design review:

Dimension	ComplyGraph	AtlasRAG
Scale	50M nodes / 400M edges	1B+ nodes, growing by millions/day
Read pattern	Deep multi-hop audits, low QPS	Shallow hybrid retrieval, 3k QPS
Freshness	Hours acceptable	Minutes
Consistency	Strong, bitemporal, auditable	Eventual is fine
Failure cost	A wrong audit = regulatory fine	A stale answer = mild annoyance

Notice nothing here is about technology yet. Pick the requirement profile first; the stack falls out of it.
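One way to keep a design review honest is to encode the requirement table as data and derive coarse decisions from it. The Python sketch below is a hypothetical illustration: the field names, thresholds and decision labels are assumptions chosen to mirror the table, and values the table leaves qualitative (ComplyGraph's "low QPS", "hours" and "minutes" of freshness) are filled with labeled stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirements:
    name: str
    nodes: int
    peak_qps: int
    freshness_minutes: int      # acceptable staleness
    strong_consistency: bool
    bitemporal: bool

# Node counts and AtlasRAG's QPS come from the table; ComplyGraph's peak_qps
# and both freshness values are assumed stand-ins for the qualitative cells.
COMPLYGRAPH = Requirements("ComplyGraph", 50_000_000, 20, 240, True, True)
ATLASRAG = Requirements("AtlasRAG", 1_000_000_000, 3_000, 5, False, False)

def derive_choices(r: Requirements) -> dict:
    """Map a requirement profile to coarse design choices.
    The thresholds are illustrative rules of thumb, not recommendations."""
    return {
        "ingestion": "streaming-first" if r.freshness_minutes < 60 else "batch-first",
        "consistency": "strong" if r.strong_consistency else "eventual",
        "history": ("bitemporal edges with provenance" if r.bitemporal
                    else "latest state only"),
        "partitioning": ("sharded across machines" if r.nodes > 500_000_000
                         else "single replicated cluster"),
        "serving": ("cached, shallow hybrid retrieval" if r.peak_qps > 1_000
                    else "on-demand deep traversal"),
    }

for profile in (COMPLYGRAPH, ATLASRAG):
    print(profile.name, derive_choices(profile))
```

The rules themselves matter less than the order of operations: the profile is written first, and every architectural choice can be traced back to a row in it.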

Key Takeaways

Every KG platform is the same five-stage loop: ingestion, extraction, entity resolution, storage/scale, retrieval/serving
Write the requirements table (scale, read pattern, freshness, consistency, failure cost) before drawing any architecture
The two case studies make the same decisions differently — there is no universal best stack, only a fit to the requirement profile
