Securing the Data Layer: RLS, Encryption & Multi-Tenancy

The embeddings, source documents, and metadata behind a RAG system are the crown jewels — together they reconstruct everything sensitive you indexed. This lesson secures that layer: Row-Level Security and tenant isolation enforced in the database, encryption at rest and in transit, permission-aware retrieval that filters before the model ever sees a chunk, and the multi-tenant isolation models for regulated workloads.

The Data Layer Is the Crown Jewels

A RAG system spreads your most sensitive data across more places than teams usually realize. Securing the prompt is not enough — the data layer underneath is where a breach actually hurts.

What Lives in the Data Layer

For every document you ingest, a production RAG store typically holds three things:

  1. The source text: the original chunk, stored so it can be pasted into the prompt at query time.
  2. The embedding: a vector. People assume vectors are "anonymous numbers." They are not: an attacker with access to the embedding model can run embedding inversion to reconstruct a surprising amount of the original text.
  3. The metadata: source, author, patient_id, tenant_id, timestamps, ACLs. Often the most directly sensitive part.

Together, these reconstruct the sensitive corpus you indexed. A vector store full of clinical notes is PHI, even though it looks like floats.
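
To make the three parts concrete, here is a minimal sketch of one stored record. The `StoredChunk` class, its field names, and the `Sensitivity` levels are illustrative assumptions, not a schema from any particular vector store. The helper shows the store inheriting the highest classification of anything ingested into it.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical classification levels; substitute your organization's scheme.
    PUBLIC = 0
    INTERNAL = 1
    PHI = 2


@dataclass(frozen=True)
class StoredChunk:
    text: str                       # 1. source text, pasted into the prompt
    embedding: tuple                # 2. vector; treat as sensitive as the text
    metadata: dict = field(default_factory=dict)  # 3. tenant_id, ACLs, ...
    sensitivity: Sensitivity = Sensitivity.INTERNAL


def store_classification(chunks):
    """Return the classification the whole store must be handled at.

    The store is at least as sensitive as its most sensitive chunk.
    An empty store is treated as PUBLIC (an assumption of this sketch).
    """
    return max((c.sensitivity for c in chunks), default=Sensitivity.PUBLIC)
```

Under these assumptions, a store holding one public FAQ chunk and one clinical-note chunk classifies as `Sensitivity.PHI`.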

The Core Principle

Retrieval must enforce the same access controls as the system of record. If a user cannot read a document in the source application, RAG must never retrieve it on their behalf.
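
As a rough illustration of this principle, the sketch below filters by permission before ranking by similarity, so a forbidden chunk is never a candidate for the prompt. It is an in-memory stand-in, not a real vector store: `Chunk`, `User`, `can_read`, and `retrieve` are hypothetical names, and the group-based ACL model is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    embedding: tuple
    tenant_id: str
    allowed_groups: frozenset  # groups permitted to read the source document


@dataclass(frozen=True)
class User:
    tenant_id: str
    groups: frozenset


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def can_read(user, chunk):
    """Same tenant AND at least one shared group. An empty ACL denies everyone."""
    return chunk.tenant_id == user.tenant_id and bool(user.groups & chunk.allowed_groups)


def retrieve(user, query_embedding, chunks, k=3):
    """Filter first, then rank: forbidden chunks never enter the candidate set."""
    visible = [c for c in chunks if can_read(user, c)]
    visible.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return visible[:k]
```

Filtering before ranking matters: if the top-k were selected first and filtered afterwards, a forbidden chunk could still crowd out permitted results, and any bug in the later filter would hand it to the model.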

This sounds obvious, yet it is the most common serious flaw in RAG systems. Teams copy documents from a permissioned system (a wiki, an EHR, a ticketing system) into a single shared vector index — and silently drop every permission in the process. Now any user's query can surface any document.
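
One way to avoid dropping permissions at ingestion is to copy the source document's tenant and ACL onto every chunk, and to refuse to index anything that arrives without them. The sketch below assumes a simple group-based ACL; `SourceDocument`, `IndexEntry`, `split_into_chunks`, and `ingest` are hypothetical names, and the fixed-size character chunking is only a placeholder for a real chunker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDocument:
    doc_id: str
    text: str
    tenant_id: str
    allowed_groups: frozenset  # ACL as exported from the system of record


@dataclass(frozen=True)
class IndexEntry:
    doc_id: str
    chunk_text: str
    tenant_id: str
    allowed_groups: frozenset  # carried over unchanged from the source


def split_into_chunks(text, size=200):
    """Naive fixed-size chunking by character count (placeholder only)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def ingest(doc, size=200):
    """Build index entries that keep the document's tenant and ACL.

    Fails closed: a document with no tenant or an empty ACL is rejected
    rather than indexed as readable by everyone.
    """
    if not doc.tenant_id or not doc.allowed_groups:
        raise ValueError(f"refusing to index {doc.doc_id!r}: missing tenant or ACL")
    return [
        IndexEntry(doc.doc_id, piece, doc.tenant_id, doc.allowed_groups)
        for piece in split_into_chunks(doc.text, size)
    ]
```

Rejecting documents without permissions is a deliberate choice in this sketch: a missing ACL usually signals an export problem, and defaulting to "public" would recreate the flaw described above.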

Why RAG Makes This Worse

In a normal app, if an authorization check is missing, the user has to find the hidden record. In RAG, the LLM finds it for them and helpfully summarizes it into the answer. A single missing filter doesn't just expose a row — it puts that row's contents into fluent prose, attributed and explained. The blast radius of an authorization bug is larger in RAG than almost anywhere else.

The Plan

The rest of this lesson works bottom-up: enforce visibility in the database with Row-Level Security, protect the bytes with encryption, make retrieval itself permission-aware so forbidden chunks never reach the model, and choose a multi-tenant isolation model that matches your compliance bar.

Key Takeaways
  • A RAG store holds source text, invertible embeddings, and metadata — together they reconstruct the sensitive corpus, so the store inherits the data's classification (PHI in = PHI store)
  • Retrieval must enforce the SAME access controls as the system of record; copying documents into one shared index silently drops their permissions
  • An authorization bug is worse in RAG because the LLM surfaces and summarizes the forbidden content instead of merely exposing a row

Course Stats

Estimated Time
55 min
Lessons
5 sections