A RAG system inherits risk from everything it's built on — base models, fine-tunes, embedding models, datasets, dependencies, and the documents it ingests. Make every one of those inputs verifiable: provenance and attestation, artifact signing and an AI-SBOM, poisoning defenses, and trusted ingestion.
You did not build most of what your RAG system runs on. You inherited it. A production pipeline stacks a base model, maybe a fine-tune, an embedding model, one or more datasets, a pile of library dependencies — and, uniquely for RAG, the documents it ingests at runtime. Every one of those is a link in a supply chain, and a compromise or a mistake anywhere upstream becomes your incident downstream.
| Link | What you inherit | What can go wrong |
|---|---|---|
| Base model | Weights from a vendor/hub | Backdoored or tampered weights, unknown training data |
| Fine-tune / adapter | A LoRA or full fine-tune | Poisoned training examples, license/PII contamination |
| Embedding model | The retriever's encoder | Silent version drift between indexing and query time, biased or backdoored embeddings |
| Datasets | Training & eval corpora | Mislabeled, poisoned, or non-compliant data |
| Dependencies | PyPI/npm packages, CUDA, serving stack | Classic software supply-chain attacks: typosquatted or hijacked packages, dependency confusion, vulnerable transitive dependencies |
| Retrieved documents | Live corpus content | Planted indirect-injection or poisoned content |
In a regulated deployment you are accountable not just for what the model said but for what went into it. An auditor asks: which model version produced this answer, what data was it trained on, where did the retrieved evidence come from, and can you prove none of it was tampered with? "We pulled it off a model hub" is not an answer that survives a HIPAA or SOC 2 audit.
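One way to make those four questions answerable is to write a small audit record alongside every answer. The sketch below is illustrative only: the `AnswerAuditRecord` and `RetrievedSource` types, their field names, and the example identifiers are assumptions made for this example, not part of any standard or library.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RetrievedSource:
    """Hypothetical record of one piece of retrieved evidence."""
    doc_id: str          # identifier of the retrieved document or chunk
    origin: str          # where the document was ingested from
    content_sha256: str  # digest of the content as it was retrieved


@dataclass(frozen=True)
class AnswerAuditRecord:
    """Hypothetical per-answer record covering the auditor's questions."""
    answer_id: str
    model_version: str            # which model version produced this answer
    training_data_ref: str        # pointer to the dataset or model card
    embedding_model_version: str  # encoder used for retrieval
    sources: tuple[RetrievedSource, ...]  # where the evidence came from


def missing_fields(record: AnswerAuditRecord) -> list[str]:
    """Return the names of empty fields, i.e. questions you could not answer."""
    return [name for name, value in asdict(record).items() if not value]


def to_audit_line(record: AnswerAuditRecord) -> str:
    """Serialize to a single JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    text = b"Example retrieved passage."  # placeholder content
    record = AnswerAuditRecord(
        answer_id="answer-001",                    # illustrative values only
        model_version="example-llm@v1",
        training_data_ref="example-dataset@v1",
        embedding_model_version="example-encoder@v1",
        sources=(
            RetrievedSource(
                doc_id="doc-1",
                origin="s3://example-bucket/doc-1.txt",
                content_sha256=hashlib.sha256(text).hexdigest(),
            ),
        ),
    )
    print(missing_fields(record))  # [] when every question has an answer
    print(to_audit_line(record))
```

A record like this only shows that the questions were answered at write time. Proving that none of the referenced artifacts was tampered with still depends on the integrity checks for each artifact.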
The defense against an untrustworthy supply chain is provenance: a verifiable record of where each artifact came from, how it was built, and that it hasn't changed since. The rest of this lesson builds that record link by link — attestation for models and data, signing and an AI-SBOM for artifacts, poisoning defenses for what you ingest, and provenance for the retrieved content itself — and connects it to the answer-level citations you built in the Intermediate capstone.
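The "hasn't changed since" part of provenance can be checked mechanically: pin a cryptographic digest when an artifact is approved, then recompute it before use. Below is a minimal sketch, assuming a hypothetical `ProvenanceRecord` type and artifacts stored as local files. The record fields and example values are illustrative, not a standard format.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record: where an artifact came from and what it hashed to."""
    name: str      # e.g. "embedding-model"
    source: str    # where the artifact was obtained
    built_by: str  # pipeline or team that produced it
    sha256: str    # digest pinned when the artifact was approved


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(record: ProvenanceRecord, path: Path) -> bool:
    """True if the file on disk still matches the pinned digest."""
    return sha256_of(path) == record.sha256


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "weights.bin"
        artifact.write_bytes(b"original weights")  # stand-in for a real file

        record = ProvenanceRecord(
            name="example-model",                  # illustrative values only
            source="https://example.com/models/example-model",
            built_by="example-training-pipeline",
            sha256=sha256_of(artifact),
        )
        print(is_unchanged(record, artifact))  # True

        artifact.write_bytes(b"tampered weights")
        print(is_unchanged(record, artifact))  # False
```

A matching digest shows only that the bytes are the ones you pinned. It says nothing about who produced them or how, which is the gap that signing and attestation are meant to close.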