From LLM to agentic AI: prompt injection got worse

Agentic AI attack chains

TL;DR

Agentic AI systems transform prompt injection from an isolated model manipulation into coordinated multi-tool attack chains. According to the OWASP Top 10 for Agentic Applications 2026, what was once a single manipulated output can now hijack an agent’s planning, execute privileged tool calls, persist malicious instructions in memory, and propagate attacks across connected systems. Organizations deploying agentic AI must implement defense-in-depth controls including input validation on all data sources, goal-lock mechanisms, tool sandboxing with minimal privileges, and strategic human-in-the-loop approval for high-impact actions.

Read on if you're moving from single-model LLM integrations to agentic systems with tool access — prompt injection risks scale with every capability you add.

This post is part of my series on securing agentic AI systems, covering attack surfaces, defense patterns, and threat modeling for AI agents.

Prompt injection has topped the OWASP Top 10 for LLM Applications since the list’s inception. For simple chatbot integrations, this vulnerability typically meant a user could trick the model into ignoring its instructions or leaking its system prompt. Annoying, sometimes embarrassing, but often contained.

Then came the era of Agentic AI.

In June 2025, researchers disclosed EchoLeak (CVE-2025-32711), a zero-click prompt injection vulnerability in Microsoft 365 Copilot rated CVSS 9.3 (Critical). Without any user interaction, an attacker’s carefully crafted email could coerce Copilot into accessing internal files and transmitting their contents to an attacker-controlled server. A single injection, delivered via a benign-looking email, cascaded through the agent’s retrieval capabilities to exfiltrate chat logs, OneDrive files, SharePoint content, and Teams messages.

This is the new reality. What was once a single manipulated output has become orchestrated multi-tool chains achieving unintended outcomes. The business impact is severe: unauthorized data exfiltration, regulatory exposure under GDPR and similar frameworks, reputational damage from compromised AI assistants acting on behalf of your organization, and potential liability when an agent takes actions your users never authorized. And as organizations race to deploy agentic systems (Gartner predicts that 40% of enterprise applications will integrate AI agents by 2026), the attack surface is expanding faster than most security teams realize.

In this post, I will walk through why agentic systems fundamentally amplify prompt injection risks, how to evolve your security controls for this new paradigm, and the defense-in-depth architecture patterns that can help contain the blast radius when, not if, an injection succeeds.

The amplification effect

To understand why prompt injection becomes dramatically worse in agentic systems, we need to examine what changes when you move from a stateless LLM call to an autonomous agent.

In a traditional LLM integration, prompt injection (OWASP LLM01) typically affects a single model interaction. The attacker manipulates the prompt, the model produces an unintended output, and that output is returned to the user or passed to one downstream system. The blast radius is limited by the scope of that single inference call.

Agentic systems change this equation entirely. The OWASP Top 10 for Agentic Applications 2026 introduces ASI01 (Agent Goal Hijack), which captures the broader agentic impact where a manipulated input doesn’t just alter one output. It redirects goals, planning, and multi-step behavior across the entire agent workflow.

Consider the differences in attack progression. In a simple LLM chatbot, an attacker injects a prompt that makes the model reveal its system prompt or produce harmful content. The damage is contained to that conversation. In an agentic system, that same injection can now hijack the agent’s planning process, causing it to select different tools than intended. The agent might execute those tools with the user’s inherited privileges. Results from one compromised tool call flow into the next iteration of reasoning. The agent might persist malicious instructions in memory for future sessions. And in multi-agent architectures, the compromised agent can propagate tainted instructions to peer agents.

The key insight from the OWASP Agentic security guidance is this: agents amplify existing LLM vulnerabilities. What was a single manipulated output becomes an orchestrated multi-tool kill chain achieving unintended outcomes.

The “Promptware kill chain”

Researchers (Schneier et al., 2026) have begun modeling these multi-step attacks using a framework they call the Promptware Kill Chain, treating prompt injection payloads as a new class of malware that executes in natural language space rather than machine code.

The kill chain proceeds through five stages:

Initial access occurs when the payload enters the LLM’s context via direct or indirect prompt injection, through user input, a poisoned document, a malicious email, a website with hidden malicious commands, or compromised RAG data.
Privilege escalation happens when jailbreaking techniques bypass safety training, allowing the payload to overcome the model’s built-in guardrails.
Persistence is achieved when the payload corrupts long-term memory, ensuring it survives across sessions.
Lateral movement spreads the attack across users, devices, connected services, or other agents in multi-agent architectures.
The attacker achieves their actions on objective, whether that is data exfiltration, unauthorized transactions, or system compromise.

This model helps explain why traditional prompt injection defenses, focused solely on input filtering, fail in agentic contexts. By the time you detect the injection, the agent may have already executed multiple tool calls, persisted malicious data, and propagated to other systems.

Indirect injection

The primary agentic attack vector

While direct prompt injection (where a user explicitly crafts malicious input) remains a concern, indirect prompt injection has emerged as the dominant threat vector for agentic systems.

Indirect injection occurs when malicious instructions are embedded in external data sources that the agent retrieves and processes: documents summarized by a RAG (Retrieval-Augmented Generation) pipeline, emails processed by an assistant, web pages fetched during research, calendar invitations parsed for scheduling, code repositories analyzed during development, and API responses from third-party services.

The agent cannot reliably distinguish between legitimate content and attacker-controlled instructions. As OpenAI acknowledged in December 2025, prompt injection “is unlikely to ever be fully solved” because it represents a fundamental architectural challenge: blending trusted and untrusted inputs in the same context window.

This is why the EchoLeak attack was so effective. The injection payload was embedded in a benign-looking email, a data source Copilot was designed to process. The payload didn’t need to trick a human; it only needed to be parsed by the agent’s retrieval system.

The MCP attack surface

As agentic AI adoption accelerates, the Model Context Protocol (MCP) has emerged as a standard for connecting LLMs to external tools. While MCP provides a structured way to define tool capabilities, it also introduces a significant attack surface that deserves dedicated attention.

Key attack vectors include tool poisoning (malicious instructions in tool descriptions), rug pull attacks (tools mutating behavior after approval), and cross-tool contamination (compromised servers influencing legitimate tools through shared context).

I’ll cover MCP-specific vulnerabilities and defense strategies in depth in my post on MCP security.

Evolving your security controls

The migration checklist

If you are moving from simple LLM integrations to agentic architectures, or building agentic systems from scratch, here are the security controls that must evolve.

Input validation must expand

For traditional LLM integrations, input validation typically focused on the user prompt: checking length limits, filtering known injection patterns, and perhaps running a classifier to detect malicious intent.

For agentic systems, you must validate every data source the agent touches. This includes user prompts (direct injection defense), RAG corpus contents (indirect injection defense), tool responses and API payloads, email and document contents before summarization, MCP tool descriptions and metadata, and inter-agent messages in multi-agent architectures.

The validation approach should combine syntactic checks (length limits, format validation), semantic analysis (“Does this content contain instruction-like patterns?”), and provenance tracking (“Where did this data originate, and do we trust that source?”).

For practical implementation, consider deploying prompt-injection classifiers such as LLM Guard, complemented by output-validation frameworks like Guardrails AI, as validation and control layers around the LLM. These open-source tools help detect common injection patterns and enforce constraints at different stages of the pipeline, ideally before untrusted content can influence agent behavior.

In a RAG pipeline, tag each retrieved chunk with its source and trust level, then include this provenance metadata in the context so downstream validation can apply appropriate scrutiny.

I’ll cover RAG-specific vulnerabilities and defense strategies in depth in my post on RAG security.

Output handling requires context-aware encoding

The principle from OWASP LLM05 (Improper Output Handling) becomes even more critical in agentic systems: treat all model output as untrusted user input.

Before any LLM-generated content flows to a downstream system, apply context-appropriate encoding. For HTML contexts, use HTML entity encoding. For SQL contexts, use parameterized queries. Never let the LLM generate raw SQL that is directly executed. For shell contexts, avoid this entirely if possible; if you must, use sandboxing and strict allowlists rather than blocklists. For JavaScript contexts, apply JSON encoding and strict Content Security Policies. For inter-agent messages, validate structure and content before processing.

The key insight is that LLM output should never be passed directly to any interpreter, whether that is a database engine, a shell, a browser, or another agent, without proper validation, encoding, and guards.

Privilege scope must be per-tool, per-task

In simple integrations, you might give the LLM access to a single API with a long-lived token. Agentic systems demand a more granular approach.

Implement per-tool privilege profiles that define exactly what each tool can access, what actions it can perform, what rate limits apply, and what egress destinations are allowed. An email summarization tool should have read-only access to email, not the ability to send or delete messages.

Use short-lived, task-scoped credentials rather than persistent tokens. If an agent needs database access for a specific query, issue a token that expires after that task completes and is scoped to read-only access on the relevant tables.

Consider the blast radius of each privilege grant. If this tool were compromised via prompt injection, what is the worst-case outcome? Design your privilege model to minimize that worst case.

I’ll cover agent identity and IAM-specific defense strategies in depth in an upcoming post.

Human-in-the-loop must be strategic

The OWASP Agentic guidance emphasizes human-in-the-loop (HITL) controls for high-impact actions. But HITL can become a bottleneck, or worse, a rubber-stamp exercise where reviewers approve everything without scrutiny.

Design risk-based HITL controls rather than applying blanket approval requirements. Implement tiered approvals where low-risk, read-only operations proceed automatically, medium-risk write operations require one-click confirmation, and high-risk destructive or irreversible operations demand detailed review with a preview of what will happen.

Implement pre-execution diffs that show the reviewer exactly what the agent intends to do before it does it. For a file modification, show the diff; for an email send, show the full message and recipients; for a database write, show the exact records that will change.

Protect against HITL fatigue by batching similar low-risk requests and making sure high-risk requests are rare enough that reviewers give them genuine attention. If reviewers are approving hundreds of requests per day, the control has failed.

Memory isolation prevents cross-session contamination

Agentic systems often maintain memory across sessions to provide context and personalization. This memory becomes a persistence vector for prompt injection attacks. An attacker who can write to the agent’s memory can influence all future interactions.

Implement memory segmentation that isolates user sessions and domain contexts from each other. One user’s conversation should never leak into another user’s context. Where shared memory is necessary (for example, organizational knowledge), implement strict validation before any content is committed to shared state.

Scan all memory writes for instruction-like content. If a user’s conversation includes text that looks like a system prompt or tool invocation, that should trigger additional scrutiny before persistence.

Maintain snapshots and rollback capabilities so you can recover from memory poisoning attacks.

Defense-in-depth for agentic systems

Single-layer defenses fail against multi-step attacks. The solution is defense-in-depth: multiple independent security controls at each layer of the agentic architecture, so that a failure in one control does not lead to complete compromise.

Layer 1: Input Perimeter

At the input perimeter, implement prompt injection classifiers that detect known attack patterns. Route all natural-language inputs, whether from users, documents, or external systems, through these classifiers. Apply Content Disarm and Reconstruction (CDR) to documents before the agent processes them, stripping potentially malicious elements while preserving legitimate content.

Maintain trust levels for different input sources. Direct user input might be “medium trust,” while content from external websites is “low trust,” and verified internal systems are “high trust.” These trust levels should influence how aggressively you validate and constrain the content.

I’ll cover multimodal prompt injection vulnerabilities (via video, audio, images) and defense strategies in depth in my post on multimodal attacks.

Layer 2: Goal and Planning Validation

Before the agent executes a plan, validate that the plan aligns with the intended goal. Define explicit, auditable goals in the system configuration, not just in the system prompt, which can be manipulated.

Implement goal-lock mechanisms that detect unexpected shifts in the agent’s objectives. If a user asked for email summarization and the agent is suddenly planning to access the file system, that deviation should trigger an alert or require confirmation.

Use a separate validation model (distinct from the primary agent) to assess whether the planned actions are consistent with the stated goal. This “guardian” pattern works by feeding the agent’s proposed plan to a smaller, faster model with a strict prompt: “Given the user’s original request X, does this plan contain any actions that are not directly necessary to fulfill X? Flag any file system access, network calls, or data exports that appear unrelated to the stated goal.” This provides defense against attacks that successfully compromise the primary model’s reasoning, at the cost of additional latency and compute. That’s a worthwhile tradeoff for high-stakes operations.

Layer 3: Tool Execution Sandboxing

Run all tool executions in isolated sandboxes with restricted network access, file system access, and privilege levels. The agent should never run as root or with administrative privileges.

Implement outbound network allowlists so that even a compromised tool cannot exfiltrate data to arbitrary destinations or establish Command-and-Control (C2) channels. If a tool needs to make HTTP requests, specify exactly which domains it can contact.

For code execution capabilities, increasingly common in agentic systems, use taint tracking on generated code and require safe interpreters that restrict dangerous operations. Ban eval() and equivalent functions with untrusted content.

Layer 4: Output Validation and Encoding

Before any output reaches a downstream system or user, validate that it conforms to expected formats and does not contain suspicious patterns. Apply context-appropriate encoding as described earlier.

Implement anomaly detection on outputs to identify responses that deviate significantly from expected patterns. This can catch attacks that successfully evade input-side defenses.

Layer 5: Monitoring and Response

Log all agent actions, tool invocations, memory operations, and inter-agent communications. These logs should be tamper-evident and retained long enough to support incident investigation.

Implement real-time anomaly detection that can identify attack patterns across the kill chain: unusual sequences of tool calls, unexpected data access patterns, signs of privilege escalation or lateral movement.

Maintain kill switches that can immediately revoke an agent’s credentials and halt its operations if a compromise is detected. In multi-agent systems, implement circuit breakers that can isolate a compromised agent from its peers.

Checklist for agentic security

When reviewing code that implements agentic AI features, use this checklist:

For input handling, ask: Are all user inputs validated before reaching the LLM? Are indirect inputs (files, URLs, emails, RAG data) sanitized? Is there a trust classification for different input sources?
For output handling, ask: Is LLM output encoded appropriately for the target context? Is there validation before downstream use? Are parameterized queries used for any database operations?
For privilege scope, ask: Does each tool have minimum necessary permissions? Are credentials short-lived and task-scoped? Is there a documented blast radius for each privilege grant?
For human approval, ask: Are high-impact actions gated by human confirmation? Is there a pre-execution preview? Is the approval flow resistant to fatigue attacks?
For memory handling, ask: Is memory properly segmented by user and session? Are memory writes scanned for injection patterns? Is there rollback capability?
For monitoring, ask: Are all agent actions logged with sufficient detail? Is there anomaly detection? Are kill switches and circuit breakers implemented?

Quick wins: where to start

If you cannot implement the full defense-in-depth architecture immediately, prioritize these five controls that provide the highest security ROI for the least effort:

Implement outbound network allowlists. Most agentic systems do not need to contact arbitrary internet destinations. Restrict egress to only the domains your tools legitimately require. This single control can prevent most data exfiltration scenarios.
Require human approval for all write and delete operations. Start with a simple rule: any action that modifies external state requires a human click. You can refine the granularity later.
Deploy a prompt injection classifier on all external inputs. These checks can be integrated easily and will catch the most common injection patterns in documents and emails.
Audit your current MCP tool permissions. Create a simple spreadsheet listing each tool, what it can access, and what happens if it is compromised. This exercise alone often reveals unnecessary privileges that can be immediately revoked.
Enable comprehensive logging. You cannot detect what you do not log. Make sure all tool invocations, their inputs, and their outputs are recorded with timestamps and user context.

In the long term: Build the complete defense-in-depth architecture, including goal validation, memory isolation, and real-time anomaly detection. Establish incident response procedures specific to agent compromise.

The shift to agentic AI is inevitable and offers tremendous value. But it also requires us to evolve our security thinking from protecting individual model interactions to securing autonomous systems that plan, decide, and act across multiple steps and services. Organizations that build security in from the start will be the ones that succeed. Those that scramble to retrofit controls after the first headline-grabbing breach will not.

Stay tuned—this is just the start of a series of GenAI-focused blog posts, where I’ll dive deep into the security nuances of advanced threat modeling for agentic AI, as well as critical controls for technologies like Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG).

If this resonated...

If you’re working on GenAI or agentic systems and want to better understand the security risks, I offer agentic AI security assessments covering prompt injection, MCP tool security, memory poisoning, RAG security, and defense architecture.