Securing Agents Without Defeating Their Purpose

TL;DR

  • Security teams are stuck choosing between agents that can't do anything and agents that can do everything. Both are failures.
  • The standard fixes (read-only mode, approve-every-action, split sessions) don't work. They either kill the agent's usefulness or create the illusion of security.
  • We believe the fix is permissions that narrow dynamically based on what the agent has already done. Keep the capability. Cut the specific risk.

Everyone's figured out that agents are powerful. Claude, Cursor, and coding agents in general do things that would have seemed implausible two years ago. Companies are building on top of them as fast as they can. And security leaders, as one CISO put it to me recently, are just trying to “hold onto the tail of the dragon”.

This post is about that tension. More specifically, it's about why most of the "solutions" to agent security are worse than the problem.

The problem is the whole point

At Oso, we do real-time enforcement of deterministic logic. When you click around in Brex, Webflow, Productboard, Vanta, or 1Password, their backends are calling Oso to decide whether you're allowed to do what you're trying to do. Lately we spend a lot of time helping customers adopt agents safely, because we keep seeing the same pattern: companies get excited about agents, realize they have no safety guarantees, and either block them entirely or neuter them until they're useless.

The specific problem is overpermissioning. Hook an agent up to email, databases, file systems, and code, and it can act. That's the value. It's also how it does the wrong things: through hallucination, prompt injection, or plain misuse.

Some examples, which we log in our Agents Gone Rogue registry:

  • An agent caused a 13-hour AWS outage after deciding to delete and recreate an environment.
  • Notion AI was weaponized via invisible prompt injection text in a job applicant's resume, causing it to read internal documents and exfiltrate them by encoding data in URL query parameters to a web search tool call.

The second one is worth dwelling on. It's not one bad action. It's a sequence of individually plausible actions that compounds into a breach. That's what makes this hard.

A lawyer with a Gmail MCP

Imagine a lawyer using Claude with a Gmail MCP. Useful: scheduling, follow-ups, coordinating with clients. Except lawyers have attorney-client privilege. They can't take information from one client's email and share it with anyone outside that conversation.

A naive agent with read and send access to Gmail can absolutely do that, because it's instructed to, because it hallucinates, or because an adversary slips a prompt injection into an incoming email.

So what do you do?

How we're thinking about the problem

The insight comes from information flow control. The question isn't "can the agent send email?" It's "given what the agent has already seen, what should it be allowed to send, and to whom?"

Here's how that works in the lawyer example:

  • The agent reads an email. That email has a specific set of participants.
  • From that point on, the agent can send email, but only to recipients in that participant list.
  • It cannot send to anyone outside the thread.

The lawyer's agent can read a client email and send a follow-up to that client. It cannot read one client's email and send information to opposing counsel. The constraint is narrow, not total. You keep the capability and cut the specific risk.
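The constraint above fits in a few lines of code. This is a minimal sketch, not a real Oso or MCP API; the class and method names (`EmailSendPolicy`, `on_read`, `may_send`) are all hypothetical, standing in for whatever interceptor sits between the agent and its tools.

```python
# Minimal sketch of a dynamic permission: once the agent reads an
# email, sends are restricted to that email's participants.
# All names here are illustrative, not a real Oso or MCP API.

class EmailSendPolicy:
    def __init__(self):
        self.allowed_recipients = None  # None = no email read yet

    def on_read(self, participants):
        """Record the participants of an email the agent has read.
        Further reads intersect, so the constraint only narrows."""
        if self.allowed_recipients is None:
            self.allowed_recipients = set(participants)
        else:
            self.allowed_recipients &= set(participants)

    def may_send(self, recipients):
        """Allow a send only if every recipient was a participant
        in the mail the agent has already seen."""
        if self.allowed_recipients is None:
            return True  # nothing read yet, so no constraint accumulated
        return set(recipients) <= self.allowed_recipients


policy = EmailSendPolicy()
policy.on_read(["lawyer@firm.com", "client@example.com"])

policy.may_send(["client@example.com"])         # True: inside the thread
policy.may_send(["counsel@otherside.com"])      # False: outside the thread
```

Note that reads intersect rather than union: if the agent has seen two threads, it can only send to people who were on both, so accumulating context can only tighten the policy, never loosen it.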

These dynamic permissions tighten based on what the agent has already done. Traditional authorization is stateless: a user has permission to do X, or they don't, regardless of history. Agents are essentially stateful. They accumulate context. Their permissions should accumulate constraints.

We demoed this on a Claude session at the MCP Connect event recently. With dynamic permissions in place, sending to someone outside the email thread triggers a tool-use denial. Sending to the original participants works fine.

This is what securing agents without defeating their purpose looks like. The agent still does useful work. It just can't do the specific harmful things you don't want it to do.

The alternatives don't work

Before we got here, we watched customers try everything else. None of it works.

Turn off the send tool. If the agent can only read, it can't leak anything. You've also eliminated every workflow that involves sending. You've secured the agent by making it not an agent. We call these impotent agents.

Read-only everywhere. This is a real AI security policy we see in the wild. "All agents get read access; no agent changes state." Sounds prudent. Produces no productivity gains.

Human-in-the-loop approval. The most common approach. Claude does this by default. The problem: how many times have you asked Claude to do something complex, tabbed away, and come back to find it stopped three steps in asking "Is it okay if I read GitHub documentation?" and did nothing after that? And when you are watching: if you have a hundred client emails to follow up on and you're clicking Approve for each send, you stop thinking about it. That's not security. That's the illusion of security. Human is the loop, not human in the loop.

Split read and write into separate sessions. Choose at the start: reading or writing today? Closer to the right idea, but it breaks the workflow you actually wanted. You can't read an email, notice it hasn't been replied to, and send a follow-up.

Taint the context window. Once the agent reads an email, the whole context is tainted and no sends are allowed. Better. Still too blunt. The agent can't read a client email and send a follow-up to that same client.
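The difference between tainting and the participant-based constraint is visible in code: the taint approach collapses everything the agent has seen into a single boolean. A sketch, again with hypothetical names, of why that's too blunt:

```python
# The blunt version: one taint bit for the whole context window.
# After any read, every send is denied, even to the same client.
# Hypothetical names; contrast with a participant-aware policy.

class TaintPolicy:
    def __init__(self):
        self.tainted = False

    def on_read(self, participants):
        self.tainted = True  # who the participants were is thrown away

    def may_send(self, recipients):
        return not self.tainted  # who the recipients are is ignored too


taint = TaintPolicy()
taint.on_read(["lawyer@firm.com", "client@example.com"])
taint.may_send(["client@example.com"])  # False: legitimate follow-up blocked
```

Because `on_read` discards the participant list, the policy can't distinguish a follow-up to the same client from exfiltration to a stranger. That lost information is exactly what a finer-grained policy keeps.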

Every one of these is a different shape of the same mistake: treating the question as binary when it isn't.

Why this generalizes

The principle is: narrow permissions dynamically based on what the agent has already done. The sequence of tool calls an agent makes is a signal that should update what subsequent actions are permitted.
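The same shape applies well beyond email. A sketch of the general mechanism, with entirely hypothetical tool names and rules: each executed tool call can attach a constraint that narrows what later calls are permitted, so, for example, an agent that has queried a production database loses the ability to make outbound web requests.

```python
# Generalized sketch: a tool call can trigger a constraint that
# applies to every subsequent call. Tool names and rules here are
# hypothetical examples, not a real policy.

RULES = {
    # after this tool runs ...   ... this must hold for all later calls
    "db.query_production": lambda call: call["tool"] != "web.fetch",
    "fs.read_secret":      lambda call: call["tool"] != "email.send",
}

class DynamicPermissions:
    def __init__(self):
        self.constraints = []

    def record(self, call):
        """After executing a call, accumulate any constraint it triggers."""
        rule = RULES.get(call["tool"])
        if rule:
            self.constraints.append(rule)

    def permitted(self, call):
        """A call is allowed only if every accumulated constraint holds."""
        return all(rule(call) for rule in self.constraints)


perms = DynamicPermissions()
perms.record({"tool": "db.query_production"})
perms.permitted({"tool": "web.fetch", "url": "https://example.com"})  # False
perms.permitted({"tool": "db.query_production"})                      # True
```

The state here is per-session and append-only: constraints accumulate as the agent acts, which is the stateful mirror of the stateless "user has permission X" check in traditional authorization.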

We enforce at the network level rather than inside any specific agent or MCP. That means the same policy applies to Claude, to OpenAI-based agents, to your own in-house agents. You don't re-implement it per agent, and employees can't opt out. For common MCP patterns, we write the policy for you. For custom MCPs and more complex cases, you extend it.

The core claim

Agents get blocked from real work not because they're too dangerous, but because the tools for constraining them have been too blunt. "Lock everything down" and "approve every action" aren't solutions to overpermissioning. They're different kinds of failure.

The right answer is fine-grained authorization that's specific about which actions are dangerous in which contexts, rather than restrictions so broad they eliminate the point of the agent.

Authorization is hard in traditional applications. In agents, it's harder. The attack surface isn't just who can do what on which resource. It's what can the agent do next, given everything it's already done.

That's what we're working on.

Interested in what this looks like in practice? Book a demo or check out the docs.



About the author

Graham Neray

Cofounder and CEO
