Securing Agentic & Multi-Tool RAG

The moment an LLM can call tools and take actions, a successful prompt injection stops being a bad answer and starts being a deleted record, a sent email, or a drained budget. Least-privilege tool authorization, sandboxing, human-in-the-loop gates, and defending against injection that arrives through tool outputs and other agents.

Day 4 Progress0%

Agents Expand the Blast Radius

The LLM Integration track taught agentic RAG — tool use, function calling, ReAct loops — where the model decides when to search and what action to take. The Intermediate Secure-RAG level taught that retrieved content is untrusted and can carry prompt injection. This lesson is what happens when you combine them: a model that can be hijacked, wired to tools that can act on the world.

From "Bad Answer" to "Bad Action"

In a read-only RAG bot, the worst case of a successful prompt injection is a wrong or leaked answer. Bad — but bounded. Give that same model a set of tools — send_email, delete_record, issue_refund, run_sql, http_request — and the worst case changes completely. Now an injection in a retrieved document can cause the system to take a real, possibly irreversible action on behalf of a user who never asked for it.

This is OWASP LLM Top 10's Excessive Agency: harm that stems not from the model's text but from the capabilities, permissions, and autonomy you granted it.

The Three Ingredients of Excessive Agency

Excessive agency is the product of three things, and you reduce risk by cutting any of them:

  • Excessive permissions — the tool's credentials can do more than the use case needs (a "read a calendar" tool whose token can also delete events).
  • Excessive functionality — the agent has tools it doesn't need for the task (a support bot with a run_shell tool "just in case").
  • Excessive autonomy — the agent executes high-impact actions with no human confirmation.

The Agentic Threat Model

Map the trust boundaries of a tool-using RAG system:

Input to the loopTrusted?
The system prompt / tool definitionsTrusted (you wrote them)
The end-user's messageUntrusted
Retrieved documentsUntrusted (Intermediate Day 2)
Tool outputsUntrusted (a tool may return attacker-influenced data)
Another agent's messagesUntrusted (it may itself be compromised)

The dangerous realization: in an agent loop, the model's own next prompt is assembled from tool outputs and retrieved text — so injection can enter mid-loop, after all your input-side checks already ran. The rest of this lesson is the controls that keep a hijacked reasoning step from becoming a harmful action.

Key Takeaways
  • Tools turn a successful prompt injection from a bad answer into a real, possibly irreversible action — OWASP LLM 'Excessive Agency'
  • Excessive agency = excessive permissions x excessive functionality x excessive autonomy; cutting any one factor reduces blast radius
  • In an agent loop the model's next prompt is built from untrusted tool outputs and retrieved text, so injection can arrive AFTER input-side checks ran

AI Learning Assistant

Powered by advanced LLM

Get personalized help with concepts, code examples, and explanations tailored to your learning pace.

Course Stats

Estimated Time
55 min
Lessons
5 sections