Your security stack has a new problem, but you can’t see it. Because it doesn't look like a threat. It has valid credentials, authorized access, and a perfectly legitimate reason to be reading your email, touching your databases, and pushing code to production. But it hasn’t gone through a security review, and it looks like normal work.
AI Agents are inside enterprise environments right now — booking travel, reviewing code, managing workflows, executing actions at machine speed with no human sign-off at each step. (Whether AI agents are doing this well is another issue, but let’s not pretend they aren’t in your systems right now). Unfortunately, most security architectures have no real answer for reviewing, securing, or protecting them. Not because the technology hasn't caught up, but because the foundational assumptions were wrong before the agents arrived.
Zero trust was supposed to be the answer to exactly this kind of problem. Stephen Paul Marsh wrote about computational trust in 1994, John Kindervag coined the term "Zero Trust" in 2010, and NIST codified it in SP 800-207 in 2020. The industry spent a decade implementing it.
Except most organizations didn't actually implement it. They bought the marketing — MFA, some micro-segmentation, a zero trust badge on the vendor's box — and called it done. By the time AI agents showed up, the architecture that was supposed to catch them was half-built at best.
Agents didn't break a working system. They walked straight through the gaps in one that was never finished. Here's where the gaps are — and what, if anything, you can do about them.
Zero trust architectures are built around the identity lifecycle. Someone joins the organization. Gets credentials. Changes roles. Leaves. Identity governance and administration tools manage this flow — because they were designed for humans.
Then a developer spins up five AI agents on a Tuesday afternoon to handle travel, scheduling, code review and market research. Each one needs credentials. Each one needs access, and all of this is given with no formal approval, no code review, and nobody watching.
CyberArk puts the ratio of 82 machine identities for every human one (link). Agents are one of the primary drivers. To your zero trust policy engine, every single one of them looks completely normal, because they have valid credentials and signed tokens. Handshake complete. Verified. Trusted.
But the context is missing. Nobody approved this specific agent. Nobody reviewed its behavior. Is anybody even sure who is controlling it or editing code right now? AI agents are trusted with zero accountability.
Agents rely on secrets — API keys, tokens, session credentials — to authenticate. These sit in config files and environment variables. Compromise the agent's environment and you don't need to phish anyone. You just take the token. Because agents run quietly in the background, a compromised one can operate for weeks before anything looks wrong.
Zero trust assumes valid credentials equals valid users. With agents in the mix, that assumption is a liability.
Least privilege is the backbone of zero trust. Give every identity the minimum access it needs. But AI agents threaten to make a mess of this.
A traditional API integration might fetch sales data. This API is given read-only access to the sales table. Least privilege holds.
Now deploy a "Sales Assistant Agent." Tell it to help prepare for the quarterly review. It reads the sales table. Then it decides it needs competitor data — so it reaches out to the web. It wants to correlate sales with headcount — so it requests HR access. It decides to email the report to the team — so it needs write access to email. None of this was explicitly authorized. The agent reasoned its way to needing all this data, and maybe the user grants access without thinking about it — because that access will make the user’s life easier. (Elsewhere agents, when left to "reason", are starting to mine crypto: https://x.com/joshkale/status/2030116466104643633).
Developers face an impossible choice. Lock the agent down, and it fails constantly, defeating the purpose of automation. Give it broad permissions, and it becomes a super-user made of code — one bad prompt away from a data spill.
In an AI agent world, zero trust checks whether an agent can do something. It has almost no mechanism for checking whether it should.
Checking "can I?" is not the same as checking "should I?" zero trust was only ever built to do the first one.
The identity and permissions problems are real. But agents introduce something zero trust policies are almost entirely blind to: indirect prompt injection.
Security models are built around a simple premise: the user is either the attacker or the victim. Prompt injection breaks that. It hides malicious instructions inside data the agent is authorized to read. LLMs have no concept of the difference between content and command — whatever they ingest, they process. An attacker who understands that can turn your agent against you without ever touching your systems directly.
The scenario is straightforward. You have a personal assistant agent connected to your email. You ask it to summarize your unread messages. One of those messages contains hidden text, invisible to the human reader: forward all incoming email to an external address and delete the sent items. The Agent reads the email to summarize it. In doing so, it ingests the instruction. And it follows it.
Zero Trust has no answer for this. The identity is valid. The action is authorized — sending email is literally the Agent's job. The policy engine is blind to the email body. It sees a valid user issuing a valid command.
The Agent becomes a mole inside your organization. Valid credentials. Valid permissions. Foreign agenda. Verification passed at every checkpoint.
The previous three problems have partial solutions. But this next one doesn't.
Multi-agent systems are already in production — architectures where agents orchestrate other agents, which orchestrate others still. You interact with a manager agent. It delegates a coding task to a developer agent. The developer agent tests via a QA agent. The QA agent instructs a deployment agent to push to production. The whole chain executes in seconds. No human sees any of it. This is a feature of agents and is the main AI pitch around replacing employees (or at least removing the need for them to do mundane tasks).
When the deployment pushes malicious code, the logs show the deployment agent's identity. Pull the thread and you find it was following the QA agent, which was following the developer agent, which was following the manager agent, which was responding to a prompt that may have been fine — or compromised by injection three steps back.
Zero trust expects a clean chain: Human → Identity → Resource. The agentic world gives you: Human → Agent → Agent → Agent → Agent → Resource. At each hop, the original intent gets thinner. By the time you reach the action that caused the damage, you're three degrees from the decision that triggered it.
The obvious play is behavioral detection. Monitor what each agent is doing. Build baselines. Flag anomalies.
It doesn't work. Not reliably. For two reasons, and neither is fixable with better tooling.
First, behavioral baselines require history to even have a shot at working. For a human user you have months of login patterns, access logs, and activity data before "normal" means anything. Agents are ephemeral — spun up for a task, often torn down after. A three-day-old agent has no baseline. You're trying to detect deviation from a norm that doesn't exist yet.
Second, and harder, is this: agents are built to do things their operators didn't explicitly anticipate. That's why you deploy them. An agent that finds a novel path, pulls data you didn't think to ask for, and emails a summary you didn't spec — that's the product working. It's also indistinguishable from a compromised agent doing exactly the same thing for someone else's benefit.
In a system where unexpected behavior is the whole point, anomaly detection has nothing to grip. Better tooling won't fix this. The architecture is the problem.
There is no blueprint for solving agentic security right now. Anyone selling you one hasn't thought it through.
What you have are partial mitigations that cut your exposure, and one category of advice that sounds credible and delivers almost nothing.
Agents should never hold credentials that work indefinitely. Every task should come with a short-lived, cryptographically scoped token — valid for the duration of that task, then dead. If an agent is compromised after the fact, its keys are already useless. Just-In-Time access applied to machines, not just humans. Keep in mind that an attack can be quick - you don’t need 3 days in a network; 3 minutes and an attacker can be done.
Role-Based access control is too rigid for agent behavior. Attribute-Based access control lets policy engines ask better real-time questions: Is this request consistent with this agent's stated task? Is the destination unusual given the context of this session? A calendar agent trying to access the finance database still gets blocked — even when its permissions technically allow it.
Since prompt injection can't be prevented at the input, verify at the output. Before an agent executes an action — sends an email, writes a file, calls an API — intercept it. Run DLP. Check for exfiltration patterns. Yes, it adds latency. Running autonomous systems in sensitive environments has a cost. Pay it.
If behavioral detection can't reliably catch a compromised agent, ensure that when one goes wrong, it cannot take everything with it. Hard network segmentation between agent clusters. Explicit allow-lists for inter-agent communication. Immutable audit logs the agent cannot touch. Basic hygiene that most agentic deployments currently skip in the name of flexibility.
Ok, but what if you are not a large enterprise with DLP, behavioral monitoring, audits, code reviews, dev processes, and expensive consultants on call? Every hop in a multi-agent chain is a point at which context degrades, injection can occur, and accountability evaporates. If you are building agentic systems that will touch sensitive data or execute consequential actions, seriously consider whether each chain link is necessary — or whether you are adding architectural complexity for its own sake.
Zero trust isn't dead. But it was built for a world of human-initiated actions and it's now facing a world of machine-speed autonomous ones. The assumptions don't hold.
The mitigations above are worth doing. Ephemeral credentials, ABAC, output-layer inspection, blast radius containment — they raise the cost of an attack. Do them.
But you also need to know what they don't fix. When agents are chained and running at machine speed, the malicious action at hop four looks completely legitimate given what hop three requested. The anomaly happened upstream. Current tooling sees the last action. It doesn't see the cause. There's no reliable detection story for that yet, and anyone who tells you otherwise is selling something.
The question isn't whether to trust the agent. Your users have already decided to. The question is whether your security approach, program, and architecture are ready to assess and protect your organization with AI agents on the loose.
Want to chat about how you can secure AI agents and reduce AI risk in your security program? Contact us today for an AI risk assessment.