What Happens When an AI Agent Gets Compromised, And Nobody Has the Logs, KYDE

In 2025, researchers demonstrated a vulnerability they called ForcedLeak. By embedding malicious instructions in a routine web form, they manipulated a Salesforce Agentforce agent into retrieving sensitive CRM data and exfiltrating it to an external destination. The agent believed it was following legitimate instructions. Nobody noticed until the researchers disclosed it.

This is not a hypothetical threat model. It's a demonstrated attack pattern against production enterprise systems. And the most important question it raises isn't technical, it's operational.

If this happened in your environment, would you know?

The new attack surface

Traditional cybersecurity assumes a relatively stable attack surface. Networks, endpoints, applications, each with known entry points that can be monitored and hardened.

AI agents introduce a fundamentally different surface. Every external input an agent processes is a potential attack vector. Emails. Support tickets. Web forms. Uploaded documents. Database records. Any of these can contain embedded instructions, invisible to human readers, interpretable by the agent, designed to redirect its behavior.

This is prompt injection. At single-agent scale it's a nuisance. At enterprise scale, with agents that have access to financial systems, HR data, customer records, and internal APIs, it's a serious operational risk.

The 2025 AppOmni research on ServiceNow's Now Assist environment demonstrated a related pattern, second-order prompt injection, where malicious instructions introduced through one agent were passed to another, creating a cascade of unintended actions across a multi-agent pipeline. The agents weren't malfunctioning. They were functioning exactly as designed, just with inputs that had been manipulated.

The forensics problem

Here's the scenario most security teams haven't fully thought through.

An AI agent in your procurement stack has been manipulated through a prompt injection in a vendor communication. Over three weeks, it has been surfacing confidential contract terms in responses, making subtly incorrect recommendations, and in two cases authorizing payments to accounts that shouldn't have been scoped.

You find out when a vendor calls to ask about an unexpected payment.

Now the investigation begins. And immediately you hit the forensics problem.

Which inputs did the agent process in the three weeks prior? What instructions did it receive? What did it retrieve from your internal systems, and in what context? Which decisions were influenced by the injected instructions, and which were legitimate? What exactly did it send, and where?

If your agent runs on provider-native logging, you have partial inputs and outputs from the provider's perspective. You don't have the full retrieval context. You don't have the causal chain that led to each decision. You don't have a tamper-evident record that proves the logs haven't been modified since the incident began. And if the provider's infrastructure was part of the attack surface, you may not fully trust the logs at all.

Without a complete, independent, tamper-evident audit trail, the investigation stalls. The full scope of the incident remains unknown. Regulatory reporting becomes a problem · DORA's 72-hour window doesn't pause while you wait for vendor logs.

Why software-layer logging isn't enough

The specific threat model of a compromised agent makes clear why software-layer logging is structurally insufficient.

A sophisticated attacker who has gained influence over an agent's behavior, through prompt injection or other means, may also be able to influence what gets logged, how it gets logged, or whether the logs are preserved. Software logs live on infrastructure that the host OS can access. A compromised environment can alter them silently. By the time the incident is discovered, the evidence trail may already be incomplete.

Tamper-evidence requires cryptographic integrity at the point of capture, outside the agent's execution environment. Every entry signed the moment it's written. Every entry chained to the previous one so any modification is detectable. The record must be independent of the agent, independent of the host, and independent of the LLM provider.

This is not a paranoid threat model. It's the same principle that underlies every independent audit function, the auditor cannot be subject to the influence of the entity being audited.

What detection requires

Catching a compromised agent before the damage accumulates requires behavioral baselines. What does normal look like for this agent, in this role, on this infrastructure? What volume of calls, what endpoints, what retrieval patterns, what cost profile?

Deviation from baseline is the signal. An agent that suddenly starts accessing endpoints outside its normal scope. A cost profile that spikes without a corresponding increase in legitimate workload. Tool calls that don't match the agent's assigned role. These are detectable, but only if you have a baseline, and only if you have a continuous, attributable record of every action to compare against it.

Without that record, you're not running anomaly detection. You're running blind.

The question to ask

Every enterprise deploying AI agents in production should be able to answer one question: if an agent in our fleet was compromised today, how long would it take us to know, and what would the investigation look like?

If the honest answer is "we'd find out when something obviously broke" and "the investigation would depend on vendor logs we can't fully verify", that's the gap.

The agents are running. The attack surface exists. The logs either do or they don't.

KYDE

Kyde captures every agent action in a cryptographically signed, hash-chained audit trail, independent of the agent, independent of the host, independent of the LLM provider. Behavioral baselines and anomaly detection on the roadmap. The record exists before the incident.