Supply-Chain & Model Provenance

The RAG Supply Chain

You did not build most of what your RAG system runs on. You inherited it. A production pipeline stacks a base model, maybe a fine-tune, an embedding model, one or more datasets, a pile of library dependencies — and, uniquely for RAG, the documents it ingests at runtime. Every one of those is a link in a supply chain, and a compromise or a mistake anywhere upstream becomes your incident downstream.

The Links

Link	What you inherit	What can go wrong
Base model	Weights from a vendor/hub	Backdoored or tampered weights, unknown training data
Fine-tune / adapter	A LoRA or full fine-tune	Poisoned training examples, license/PII contamination
Embedding model	The retriever's encoder	Silent version drift, biased or backdoored embeddings
Datasets	Training & eval corpora	Mislabeled, poisoned, or non-compliant data
Dependencies	PyPI/npm packages, CUDA, serving stack	Software supply-chain attacks such as typosquatted or compromised packages
Retrieved documents	Live corpus content	Planted indirect-injection or poisoned content

Why This Is Sharper for Regulated RAG

In a regulated deployment you are accountable not just for what the model said but for what went into it. An auditor asks: which model version produced this answer, what data was it trained on, where did the retrieved evidence come from, and can you prove none of it was tampered with? "We pulled it off a model hub" is not an answer that survives a HIPAA or SOC 2 audit.
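The auditor's questions map naturally onto a per-answer record. The following is a minimal sketch, not a prescribed schema: the class names, field names, and example values (`AnswerAuditRecord`, `training_data_ref`, and so on) are all hypothetical, chosen only to show one field per question.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RetrievedEvidence:
    """One retrieved document used as evidence (hypothetical schema)."""
    doc_id: str
    source_uri: str        # where the retrieved evidence came from
    content_sha256: str    # digest of the text as retrieved, for tamper checks


@dataclass
class AnswerAuditRecord:
    """What an auditor might ask about a single answer (hypothetical schema)."""
    answer_id: str
    model_version: str       # which model version produced this answer
    training_data_ref: str   # pointer to a record of what it was trained on
    evidence: list = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a stable serialization that is easy to diff and log
        return json.dumps(asdict(self), sort_keys=True)


def evidence_from(doc_id: str, source_uri: str, text: str) -> RetrievedEvidence:
    """Build an evidence entry, hashing the exact text that was retrieved."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return RetrievedEvidence(doc_id, source_uri, digest)


if __name__ == "__main__":
    ev = evidence_from("doc-1", "file:///policies/a.txt", "example policy text")
    record = AnswerAuditRecord("ans-1", "base-model@v1", "dataset-card-ref", [ev])
    print(record.to_json())
```

Writing a record like this at answer time, rather than reconstructing it later, is what lets you answer the audit questions with evidence instead of recollection.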

Provenance Is the Through-Line

The defense against an untrustworthy supply chain is provenance: a verifiable record of where each artifact came from, how it was built, and that it hasn't changed since. The rest of this lesson builds that record link by link — attestation for models and data, signing and an AI-SBOM for artifacts, poisoning defenses for what you ingest, and provenance for the retrieved content itself — and connects it to the answer-level citations you built in the Intermediate capstone.
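As a rough illustration of the three parts of that definition (origin, build, and integrity), here is a minimal sketch using only Python's standard library. The `ProvenanceRecord` class, its fields, and the example values are assumptions made for illustration. A plain SHA-256 digest only detects change; it does not prove who produced the artifact, which is the job of signing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry for one supply-chain artifact."""
    name: str        # e.g. "embedding-model"
    source: str      # where the artifact came from
    build_info: str  # how it was built or obtained
    sha256: str      # digest captured when the artifact was vetted


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def record_artifact(name: str, source: str, build_info: str, data: bytes) -> ProvenanceRecord:
    """Capture origin, build, and integrity information in one record."""
    return ProvenanceRecord(name, source, build_info, sha256_of(data))


def is_unchanged(record: ProvenanceRecord, data: bytes) -> bool:
    """True if the bytes still match the digest stored in the record."""
    return record.sha256 == sha256_of(data)


if __name__ == "__main__":
    weights = b"stand-in bytes for a model file"
    rec = record_artifact("base-model", "example-hub", "downloaded as-is", weights)
    print(is_unchanged(rec, weights))            # same bytes as recorded
    print(is_unchanged(rec, weights + b"\x00"))  # one extra byte appended
```

In practice the artifact would be read from disk in chunks and the record stored somewhere the artifact's supplier cannot modify; both are left out here to keep the sketch short.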

Key Takeaways

A RAG system inherits risk from its whole supply chain: base model, fine-tunes, embedding model, datasets, dependencies, and the documents it ingests at runtime.
Regulated deployments make you accountable for inputs, not just outputs: "We pulled it off a model hub" fails an audit.
Provenance, a verifiable record of each artifact's origin, build, and integrity, is the unifying defense, and it connects to the answer-level citations from the Intermediate capstone.
