Securing the Data Layer: RLS, Encryption & Multi-Tenancy

The Data Layer Is the Crown Jewels

A RAG system spreads your most sensitive data across more places than teams usually realize. Securing the prompt is not enough — the data layer underneath is where a breach actually hurts.

What Lives in the Data Layer

For every document you ingest, a production RAG store typically holds three things:

The source text — the original chunk, stored so it can be pasted into the prompt at query time.
The embedding — a vector. People assume vectors are "anonymous numbers." They are not: given access to the embedding model, an attacker can run an embedding inversion attack and reconstruct much of the original text.
The metadata — source, author, patient_id, tenant_id, timestamps, ACLs. Often the most directly sensitive part.

Together, these reconstruct the sensitive corpus you indexed. A vector store full of clinical notes is PHI, even though it looks like floats.
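To make the three parts concrete, here is a minimal sketch of what one stored record might look like and how the store could inherit the classification of what goes into it. The names `StoredChunk`, `Sensitivity`, and `store_sensitivity` are hypothetical and chosen for illustration; they are not taken from any particular vector database.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Illustrative classification levels; real schemes will differ."""
    PUBLIC = 0
    INTERNAL = 1
    PHI = 2


@dataclass(frozen=True)
class StoredChunk:
    text: str                     # source text, pasted into the prompt at query time
    embedding: tuple[float, ...]  # vector; treat it as sensitive as the text itself
    metadata: dict[str, str]      # e.g. tenant_id, patient_id, author, ACLs
    sensitivity: Sensitivity      # classification of the source document


def store_sensitivity(chunks: list[StoredChunk]) -> Sensitivity:
    """The store is as sensitive as the most sensitive chunk it holds."""
    return max((c.sensitivity for c in chunks), default=Sensitivity.PUBLIC)
```

Under these assumptions, a store holding one clinical note alongside a thousand public documents would still classify as PHI, which is the "PHI in = PHI store" rule in code form.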

The Core Principle

Retrieval must enforce the same access controls as the system of record. If a user cannot read a document in the source application, RAG must never retrieve it on their behalf.

This sounds obvious, yet it is the most common serious flaw in RAG systems. Teams copy documents from a permissioned system (a wiki, an EHR, a ticketing system) into a single shared vector index — and silently drop every permission in the process. Now any user's query can surface any document.
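One way to avoid dropping permissions is to copy each document's ACL into the index at ingestion time and filter on it before ranking. The sketch below uses an in-memory list and a group-based ACL; `IndexedChunk`, `allowed_groups`, and `retrieve` are hypothetical names, and a real system would push the filter into the database or vector store rather than filtering in application code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    text: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]  # ACL copied from the system of record at ingestion


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(index, query_embedding, user_groups, k=3):
    """Return the top-k chunks the user is allowed to read.

    The permission filter runs BEFORE similarity ranking, so a forbidden
    chunk can never be returned, however well it matches the query.
    """
    visible = [c for c in index if c.allowed_groups & user_groups]
    ranked = sorted(
        visible,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]
```

With this shape, a user in the `eng` group whose query is semantically closest to an HR-only chunk simply gets the best match among the chunks they can see.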

Why RAG Makes This Worse

In a normal app, if an authorization check is missing, the user still has to find the hidden record, for example by guessing an ID or URL. In RAG, the LLM finds it for them and helpfully summarizes it into the answer. A single missing filter doesn't just expose a row: it puts that row's contents into fluent prose, attributed and explained. The blast radius of an authorization bug is larger in RAG than almost anywhere else.
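Because anything that reaches the prompt can end up in the answer, one possible safeguard is a second authorization check at prompt assembly, so a bug in the retrieval filter fails loudly instead of leaking. This is an illustrative sketch, not a prescribed design; `Chunk`, `UnauthorizedChunkError`, and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


class UnauthorizedChunkError(Exception):
    """Raised when a chunk the user may not read is about to enter the prompt."""


def build_prompt(question: str, chunks: list[Chunk], user_groups: frozenset[str]) -> str:
    # Re-check every chunk: a missing upstream filter should stop the
    # request here rather than hand forbidden text to the model.
    for chunk in chunks:
        if not (chunk.allowed_groups & user_groups):
            raise UnauthorizedChunkError(chunk.doc_id)
    context = "\n\n".join(c.text for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The check is deliberately redundant with the retrieval filter: the point is that a single missing filter should not be the only thing standing between a forbidden row and a fluent summary of it.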

The Plan

The rest of this lesson works bottom-up: enforce visibility in the database with Row-Level Security, protect the bytes with encryption, make retrieval itself permission-aware so forbidden chunks never reach the model, and choose a multi-tenant isolation model that matches your compliance bar.

Key Takeaways

A RAG store holds source text, invertible embeddings, and metadata — together they reconstruct the sensitive corpus, so the store inherits the data's classification (PHI in = PHI store)
Retrieval must enforce the SAME access controls as the system of record; copying documents into one shared index silently drops their permissions
An authorization bug is worse in RAG because the LLM surfaces and summarizes the forbidden content instead of merely exposing a row
