↑ Resources / Security

Agent Hijacking: The Security Risk Most Enterprises Can't Even Detect

Security · 5 min read · May 2026

In December 2025, OWASP released its first security framework specifically for autonomous AI agents. The OWASP Top 10 for Agentic Applications 2026 was developed by more than 100 industry experts, researchers, and practitioners, and identifies the ten most critical risks facing autonomous AI agents.

At the top of the list — ranked ASI01 — is Agent Goal Hijacking.

According to a Dark Reading poll, 48% of cybersecurity professionals identify agentic AI as the number-one attack vector heading into 2026 — outranking deepfakes, ransomware, and supply chain compromise. Yet only 34% of enterprises have AI-specific security controls in place.

That gap — between the threat and the controls — is where Agent Hijacking lives. And for most enterprises, it is completely invisible.

What Agent Goal Hijacking actually is

ASI01 — Agent Goal Hijacking — occurs when attackers manipulate an agent's objectives through poisoned inputs like emails, documents, or web content. Because agents cannot reliably distinguish instructions from data, a single malicious input can redirect an agent to perform harmful actions using its legitimate tools and access.

This is the critical distinction from traditional cyberattacks. The agent is not compromised in the conventional sense — no credentials are stolen, no system is breached. The agent continues to use its legitimate tools, its legitimate access, and its legitimate identity. It simply pursues a goal it was not supposed to pursue.

Goal hijacking turns an agent into an unintentional insider threat. From the outside — and from most security monitoring tools — it looks like normal agent activity.

How an attack unfolds — a concrete example

A procurement agent in a financial institution processes vendor communications. It has access to the procurement system, the payment API, and the internal document management system. Its legitimate function: review vendor invoices, verify against purchase orders, flag discrepancies, and queue approved payments.

An attacker sends a vendor communication containing a hidden instruction — invisible in normal reading, interpretable by the agent — embedded in the invoice PDF. The instruction is specific: retrieve the list of approved vendor accounts, compare against a target account number, and if the target is not on the list, add it under the cover of a routine database update.

The agent processes the invoice. It follows the embedded instruction. It adds the account. The action is logged in the procurement system as a routine update — because that is exactly what it was, from the system's perspective.

Three weeks later, a payment authorization goes to the newly added account. The agent approves it. The funds transfer.

The fraud surfaces when a legitimate vendor calls about an unpaid invoice.

Now the investigation begins. Which inputs did the agent process in the weeks prior? Which instruction redirected its behavior? What exactly did it retrieve, update, and authorize? When?

If the answer depends on provider-native logs, the investigation is already compromised. The poisoned input may not be in those logs. The causal chain — the sequence of retrievals and tool calls that led to the payment — is not reconstructable. The tamper-evidence that would prove the logs haven't been modified since the incident began does not exist.

Why classical security tools don't catch this

The reason is architectural. Traditional security tools are built around a specific threat model: unauthorized access. An attacker gains access they shouldn't have, and the security tool detects the unauthorized access.

Agent Goal Hijacking doesn't involve unauthorized access. The agent is authorized. Its credentials are valid. Its tool calls are within scope. Every action it takes is, from an access-control perspective, legitimate.

Signature-based detection fails because there is no malware signature — the agent is using its own legitimate capabilities.

Network monitoring fails because the traffic patterns are normal — the agent communicates with the same endpoints it always does.

DLP tools fail because the data movement may be within authorized parameters — the agent is moving data it is permitted to move, to destinations it is permitted to reach.

The only detection mechanism that works is behavioral — and behavioral detection requires a baseline. What does normal look like for this agent, in this role, processing these kinds of inputs? When did behavior deviate? In what sequence did the deviation produce consequences?

That requires a complete, attributable, tamper-evident record of every agent action. Not a dashboard. Not a vendor log. A forensically usable audit trail that exists before the incident, not assembled after it.

Why this is a NIS-2 problem, not just a security problem

NIS-2 Article 23 requires essential and important entities to notify their competent authority of significant incidents within 24 hours of becoming aware, with a full incident report within 72 hours.

For an AI agent incident, that 72-hour window requires immediate access to a complete forensic picture: what the agent processed, what it did, what systems it touched, what data it accessed or exfiltrated, and in what sequence. The notification must be specific — not "we believe our procurement agent may have been manipulated" but a precise account of the incident scope and impact.

Without a tamper-evident audit trail that predates the incident, that account cannot be produced. The organization knows something went wrong. It cannot demonstrate what, when, or how. It cannot bound the scope of the breach. It cannot demonstrate to the competent authority that its incident response was adequate.

NIS-2 does not have an exception for "we didn't have adequate logging." The reporting obligation exists regardless. What changes is whether the organization can meet it — or faces a separate regulatory consequence for failing to maintain the infrastructure that would have made compliance possible.

What detection and response actually requires

Three capabilities are needed to detect Agent Goal Hijacking and respond adequately under NIS-2:

Behavioral baselines per agent role. Normal behavior for a procurement agent is quantifiable: typical input volume, typical tool call patterns, typical endpoints reached, typical cost profile. Deviation from baseline is the detection signal. Without a baseline built from a continuous, attributable record of every action, anomaly detection is impossible.

Causal context per action. When an anomaly is detected, the investigation requires reconstructing the full decision chain: what the agent was shown before it took the action, what it retrieved, what instructions it processed. This is not input/output logging. It is causal context capture — the state of the agent's context at the moment of every tool call.

Tamper-evident records. A sophisticated attacker who has influenced an agent's behavior may also influence what gets logged. Software logs on a compromised host can be altered silently. The forensic record must be cryptographically signed at the point of capture and hash-chained so that any modification is detectable — independent of the agent, independent of the host, independent of the LLM provider.

None of these capabilities exist in provider-native logging. They require a governance layer that sits in the data path — intercepting every action, capturing every context, signing every record — before it reaches the systems the agent acts on.

The window before this becomes a mandatory lesson

Hidden prompts turned copilots into silent exfiltration engines. Agents bent legitimate tools into destructive outputs, and leaked credentials let them operate far beyond their intended scope. These are not hypothetical scenarios from a research paper. They are documented incidents from production systems, cited by OWASP as the empirical basis for the 2026 framework.

Agent Goal Hijacking is already happening. The enterprises that will navigate it successfully are not the ones with the best incident response — it is the ones that built the forensic infrastructure before the incident occurred.

The audit trail that satisfies NIS-2's 72-hour reporting requirement cannot be built after the breach is discovered. It needs to exist continuously, for every agent, before any incident that might require it.

↳ KYDE

KYDE sits in the data path of every AI agent action — capturing causal context, signing every record at the point of capture, and producing the tamper-evident audit trail that behavioral detection and NIS-2 reporting both require. Before any incident.