Supply-Chain & Model Provenance

A RAG system inherits risk from everything it's built on — base models, fine-tunes, embedding models, datasets, dependencies, and the documents it ingests. Make every one of those inputs verifiable: provenance and attestation, artifact signing and an AI-SBOM, poisoning defenses, and trusted ingestion.

The RAG Supply Chain

You did not build most of what your RAG system runs on. You inherited it. A production pipeline stacks a base model, maybe a fine-tune, an embedding model, one or more datasets, a pile of library dependencies — and, uniquely for RAG, the documents it ingests at runtime. Every one of those is a link in a supply chain, and a compromise or a mistake anywhere upstream becomes your incident downstream.

The Links

Link                | What you inherit                       | What can go wrong
Base model          | Weights from a vendor/hub              | Backdoored or tampered weights, unknown training data
Fine-tune / adapter | A LoRA or full fine-tune               | Poisoned training examples, license/PII contamination
Embedding model     | The retriever's encoder                | Silent version drift, biased or backdoored embeddings
Datasets            | Training & eval corpora                | Mislabeled, poisoned, or non-compliant data
Dependencies        | PyPI/npm packages, CUDA, serving stack | Classic software supply-chain attacks
Retrieved documents | Live corpus content                    | Planted indirect-injection or poisoned content

Why This Is Sharper for Regulated RAG

In a regulated deployment you are accountable not just for what the model said but for what went into it. An auditor asks: which model version produced this answer, what data was it trained on, where did the retrieved evidence come from, and can you prove none of it was tampered with? "We pulled it off a model hub" is not an answer that survives a HIPAA or SOC 2 audit.
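
One way to make those four auditor questions answerable is to store a small trace record alongside every answer. The sketch below is illustrative only: `AnswerTrace`, `EvidenceRef`, `audit_gaps`, and every field name are hypothetical, not part of any standard or compliance framework. It shows which fields such a record might carry and how to flag an answer that could not be defended in an audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    """One retrieved document behind an answer (hypothetical schema)."""
    doc_id: str
    source_uri: str  # where the document was ingested from
    sha256: str      # digest of the content at ingestion time


@dataclass(frozen=True)
class AnswerTrace:
    """Per-answer record covering the auditor's questions (hypothetical schema)."""
    answer_id: str
    model_version: str = ""      # which model version produced this answer
    training_data_ref: str = ""  # pointer to documentation of the training data
    evidence: tuple[EvidenceRef, ...] = ()  # where the retrieved evidence came from


def audit_gaps(trace: AnswerTrace) -> list[str]:
    """Return the auditor questions this trace cannot answer."""
    gaps = []
    if not trace.model_version:
        gaps.append("model version")
    if not trace.training_data_ref:
        gaps.append("training data")
    if not trace.evidence:
        gaps.append("evidence source")
    if any(not e.sha256 for e in trace.evidence):
        gaps.append("evidence integrity")
    return gaps


# An answer logged with only a model version leaves two questions open.
trace = AnswerTrace(answer_id="a-1", model_version="example-model@1.2.0")
print(audit_gaps(trace))  # → ['training data', 'evidence source']
```

A record like this only helps if it is written at answer time; reconstructing it after an incident is exactly the position the auditor's questions are meant to expose.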

Provenance Is the Through-Line

The defense against an untrustworthy supply chain is provenance: a verifiable record of where each artifact came from, how it was built, and that it hasn't changed since. The rest of this lesson builds that record link by link — attestation for models and data, signing and an AI-SBOM for artifacts, poisoning defenses for what you ingest, and provenance for the retrieved content itself — and connects it to the answer-level citations you built in the Intermediate capstone.
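
As a minimal sketch of what "a verifiable record" can mean in practice, the snippet below captures origin, build information, and a SHA-256 digest for a file, then checks later that the file still matches. The names `ProvenanceRecord`, `record`, and `unchanged` are hypothetical, and the `origin` and `build` fields are free-form strings assumed for illustration. A bare digest only shows that a file has not changed since it was recorded; it does not show who produced it, which is the gap that signing and attestation are meant to close.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProvenanceRecord:
    """Origin, build, and integrity for one artifact (hypothetical schema)."""
    artifact: str  # file name of the artifact
    origin: str    # where it came from, e.g. a vendor or hub URL
    build: str     # how it was built, e.g. a pipeline run identifier
    sha256: str    # digest of the file at the time it was recorded


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large weight files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(path: Path, origin: str, build: str) -> ProvenanceRecord:
    """Create a provenance record for the file as it exists right now."""
    return ProvenanceRecord(path.name, origin, build, sha256_of(path))


def unchanged(path: Path, rec: ProvenanceRecord) -> bool:
    """True if the file on disk still matches the recorded digest."""
    return sha256_of(path) == rec.sha256
```

In use, `record(...)` would run once when an artifact is admitted to the pipeline and `unchanged(...)` before every load, with the records themselves kept somewhere the artifact's supplier cannot modify.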

Key Takeaways
  • A RAG system inherits risk from its whole supply chain: base model, fine-tunes, embedding model, datasets, dependencies, and the documents it ingests
  • Regulated deployments make you accountable for inputs, not just outputs: "we pulled it off a model hub" fails an audit
  • Provenance, a verifiable record of origin, build, and integrity for each artifact, is the unifying defense, and it ties into the answer-level citations from the Intermediate capstone
