The moment an LLM can call tools and take actions, a successful prompt injection stops being a bad answer and starts being a deleted record, a sent email, or a drained budget. This lesson covers least-privilege tool authorization, sandboxing, human-in-the-loop gates, and defenses against injection that arrives through tool outputs and other agents.
The LLM Integration track taught agentic RAG — tool use, function calling, ReAct loops — where the model decides when to search and what action to take. The Intermediate Secure-RAG level taught that retrieved content is untrusted and can carry prompt injection. This lesson is what happens when you combine them: a model that can be hijacked, wired to tools that can act on the world.
In a read-only RAG bot, the worst case of a successful prompt injection is a wrong or leaked answer. Bad — but bounded. Give that same model a set of tools — send_email, delete_record, issue_refund, run_sql, http_request — and the worst case changes completely. Now an injection in a retrieved document can cause the system to take a real, possibly irreversible action on behalf of a user who never asked for it.
This is Excessive Agency from the OWASP Top 10 for LLM Applications: harm that stems not from the model's text but from the capabilities, permissions, and autonomy you granted it.
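Excessive agency is the product of three things, and you reduce risk by cutting any of them:
- Excessive functionality: the agent has tools it does not need (for example, a run_shell tool added "just in case").
- Excessive permissions: the tools it has run with broader access than the task requires.
- Excessive autonomy: high-impact actions execute without a human approving them.

Map the trust boundaries of a tool-using RAG system: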
| Input to the loop | Trusted? |
|---|---|
| The system prompt / tool definitions | Trusted (you wrote them) |
| The end-user's message | Untrusted |
| Retrieved documents | Untrusted (Intermediate Day 2) |
| Tool outputs | Untrusted (a tool may return attacker-influenced data) |
| Another agent's messages | Untrusted (it may itself be compromised) |
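One way to make the table concrete is to label every piece of text with its source before it enters the prompt. The following is a minimal sketch, not a reference implementation; the names `Source`, `Segment`, and `context_is_tainted` are hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where a piece of prompt text came from (rows of the table above)."""
    SYSTEM_PROMPT = "system_prompt"
    USER_MESSAGE = "user_message"
    RETRIEVED_DOC = "retrieved_doc"
    TOOL_OUTPUT = "tool_output"
    AGENT_MESSAGE = "agent_message"


# Assumption for this sketch: only text you authored yourself is trusted.
TRUSTED_SOURCES = frozenset({Source.SYSTEM_PROMPT})


@dataclass(frozen=True)
class Segment:
    source: Source
    text: str

    @property
    def trusted(self) -> bool:
        return self.source in TRUSTED_SOURCES


def context_is_tainted(segments: list[Segment]) -> bool:
    """Return True if any untrusted text is part of the prompt."""
    return any(not segment.trusted for segment in segments)


# The loop starts with trusted content only...
context = [Segment(Source.SYSTEM_PROMPT, "You are a support assistant.")]
print(context_is_tainted(context))  # False

# ...and becomes tainted as soon as a tool result is appended.
context.append(Segment(Source.TOOL_OUTPUT, "Ignore previous instructions."))
print(context_is_tainted(context))  # True
```

In practice the context is tainted almost immediately, because the end-user's message is itself untrusted. The label is useful less as a filter and more as a signal that downstream controls should treat the model's next decision with suspicion.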
The dangerous realization: in an agent loop, the model's next prompt is assembled from tool outputs and retrieved text, so an injection can enter mid-loop, after all your input-side checks have already run. The rest of this lesson covers the controls that keep a hijacked reasoning step from becoming a harmful action.
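Because injection can arrive mid-loop, a check on the action itself is more dependable than a check on the input. The sketch below illustrates the idea under stated assumptions: `ToolPolicy`, `ALLOWED_TOOLS`, `authorize`, and the `approve` callback are hypothetical names, and the tool list is only an example. It denies any tool that is not on an allowlist and requires a human decision for irreversible ones.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    irreversible: bool  # irreversible actions need a human decision


# Hypothetical allowlist: a tool that is not listed cannot be called at all.
ALLOWED_TOOLS = {
    "search_docs": ToolPolicy("search_docs", irreversible=False),
    "issue_refund": ToolPolicy("issue_refund", irreversible=True),
}

# Stand-in for a human-in-the-loop prompt: receives the tool name and
# arguments, returns True to approve.
Approver = Callable[[str, dict[str, Any]], bool]


def authorize(tool_name: str, args: dict[str, Any], approve: Approver) -> str:
    """Decide whether a tool call requested by the model may run."""
    policy = ALLOWED_TOOLS.get(tool_name)
    if policy is None:
        return "deny"  # not on the allowlist
    if policy.irreversible and not approve(tool_name, args):
        return "deny"  # human declined (or no human available)
    return "allow"


def reject_all(tool_name: str, args: dict[str, Any]) -> bool:
    """Approver used here to show the default-deny behaviour."""
    return False


print(authorize("search_docs", {"query": "refund policy"}, reject_all))  # allow
print(authorize("issue_refund", {"order_id": "A1"}, reject_all))         # deny
print(authorize("run_shell", {"cmd": "ls"}, reject_all))                 # deny
```

The gate never inspects the prompt, so it gives the same answer whether the model was hijacked by the user's message, a retrieved document, or a tool output.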