Privacy-Preserving & Confidential RAG

Encryption at rest and in transit still leaves data exposed in plaintext the moment it is computed on. This lesson protects data in use: differential privacy, confidential computing and TEEs, per-tenant key management with crypto-shredding, and federated retrieval that keeps the most sensitive corpora in-boundary.


Beyond At-Rest/In-Transit: Protecting Data In Use

The Intermediate track's data-layer lesson hardened two of the three states of data: at rest (disk/column encryption, pgcrypto) and in transit (TLS), with Row-Level Security gating who can read a row. That's necessary — and incomplete.

The Third State

The moment a query runs, the database decrypts rows into memory, the embedding model reads plaintext, and the LLM sees the prompt in the clear. Data in use is exposed to the host OS, the hypervisor, a compromised process, a curious operator, and your cloud provider. For the most sensitive corpora — PHI, privileged legal material, classified data — "encrypted at rest and in transit" still means "plaintext on someone else's CPU."

The Advanced Threat Model

This lesson assumes a stronger adversary than the Intermediate track did:

  • The infrastructure operator is not fully trusted (multi-tenant cloud, regulated outsourcing).
  • An attacker may achieve host-level access (memory scraping, side channels).
  • Aggregate leakage matters: even without reading a row, query patterns and model outputs can leak private facts.
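To make the aggregate-leakage point concrete, here is a minimal sketch of a differencing attack. The corpus, the `Record` fields, and the `count_with_condition` API are all hypothetical and invented for illustration. The API only ever returns counts, yet two of its answers together reveal one person's private flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str
    department: str
    has_condition: bool


# Hypothetical toy corpus; names and values are made up for illustration.
RECORDS = [
    Record("p1", "oncology", True),
    Record("p2", "oncology", False),
    Record("p3", "oncology", True),
    Record("p4", "cardiology", False),
]


def count_with_condition(records, exclude_id=None):
    """Aggregate-only API: returns a count and never exposes a row."""
    return sum(
        1
        for r in records
        if r.has_condition and r.patient_id != exclude_id
    )


def differencing_attack(records, target_id):
    """Infer the target's private flag from two aggregate answers."""
    everyone = count_with_condition(records)
    everyone_but_target = count_with_condition(records, exclude_id=target_id)
    # If removing the target lowers the count by one, the target has the condition.
    return everyone - everyone_but_target == 1


print(differencing_attack(RECORDS, "p3"))  # True
print(differencing_attack(RECORDS, "p2"))  # False
```

Row-level access control does not stop this: the caller never reads a row, only exact aggregates. Bounding what such answers can reveal is the job of differential privacy.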

The Toolkit

Four families of control, each covered next:

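  • Differential privacy: provable bounds on what aggregate outputs reveal about any individual.
  • Confidential computing / TEEs: run the computation inside a hardware-isolated enclave whose memory is encrypted, so the host OS, hypervisor, and operator never see plaintext. (Data is still decrypted inside the enclave; computing directly on ciphertext is homomorphic encryption, a different technique.)
  • Key management & encryption-in-use patterns: per-tenant keys, envelope encryption, crypto-shredding.
  • Federated & partitioned retrieval: keep the most sensitive data in-boundary and never centralize it.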

None is a silver bullet; each buys down a specific risk at a specific cost. The skill is matching control to threat.
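As a first taste of the risk-versus-cost trade-off, here is a minimal sketch of the Laplace mechanism, one common way to make a count query differentially private. The function name `dp_count` and the epsilon values are illustrative assumptions, not part of any particular library. The code uses only the Python standard library and is not hardened against floating-point attacks, so treat it as a teaching sketch and reach for a vetted DP library in production.

```python
import random


def dp_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise calibrated for epsilon-DP.

    Assumes a counting query, where adding or removing one person
    changes the answer by at most 1 (sensitivity = 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) sample.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # seeded only so the demo is repeatable
for eps in (0.1, 1.0, 10.0):
    # Smaller epsilon means stronger privacy and a noisier answer.
    print(eps, round(dp_count(100, eps, rng), 2))
```

The cost is visible in the output: the noise scale is 1/epsilon, so a small epsilon hides any one individual well but makes the count less useful, while a large epsilon gives accurate counts with weaker protection.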

Key Takeaways
  • Encryption at rest and in transit leaves data exposed *in use*: decrypted in RAM during query, embedding, and generation
  • The advanced threat model distrusts the infrastructure operator and assumes possible host-level access and aggregate leakage
  • Four controls buy down in-use risk: differential privacy, confidential computing/TEEs, key management, and federated retrieval


Course Stats

Estimated time: 55 min
Lessons: 5 sections