Over-permissioned agents fail in predictable ways. Teams onboard them with broad connectors and scopes to get value quickly, then never tighten access as usage expands.
You can mitigate the risk in five moves: deny-by-default, scope tool access to task + resource + action, just-in-time elevation, continuous tightening, and containment controls that restrict or stop an agent quickly.
Your guiding principle is simple: the LLM proposes actions; policy authorizes them. If you let the agent’s reasoning become your authorization logic, you’re deploying an incident generator. This applies whether you build or buy agents. Most teams are deploying third-party agents, especially for coding, and the risk usually comes from how you connect them to internal tools and data, not from how the model was trained.
What is an over-permissioned agent?
An over-permissioned agent is an agent that can read, write, or act beyond what it needs for the task you gave it.
That sounds obvious, but agent failures usually aren't obvious when they happen. The agent still looks “helpful.” It might even produce a correct user-facing outcome. The problem is that its permission envelope is large enough that one wrong step turns into real damage: a data leak, an irreversible change, a fraudulent action, or lateral movement into systems nobody meant to expose.
When you evaluate whether an agent is over-permissioned, you’re not asking “Will it usually behave?” You’re asking “What's the worst it can do with its available toolset?” Agents are decision engines. They choose steps at runtime. If you give them broad tools, you’ve given them broad authority. If you want a fast gut check, ask this:
If this agent makes a single incorrect tool call, what’s the maximum damage it could cause?
If the answer includes data exposure, financial transactions, or production-impacting changes, you’re in blast-radius engineering territory: designing controls that bound the maximum impact of a single bad tool call. The only sane response is to design for bounded impact, not best-case behavior.
I’ve been watching agent failures pile up. Each incident uses different tools and prompts, but they share the same underlying mistake: broad permissions with weak controls. The Agents Gone Rogue registry is a running record of that pattern.
Why do agents become over-permissioned?
How did we go from “call this endpoint” to “run this workflow”?
Traditional integrations look like API clients. You write the sequence, you know the endpoints, and you can reason about permissions as a static set of capabilities.
Agents operate differently. You give a goal; they decide the steps. They chain tool calls, revisit earlier steps, and keep going until they hit an objective or get blocked. The workflow is not something you control step-by-step; it’s something the agent discovers through reasoning and planning. That’s true whether you built the agent or deployed a third-party coding agent. The only reliable control is which tools it can access, under what constraints.
That shift sounds philosophical until you connect it to permissions.
In a fixed integration, permissions are usually scoped to a narrow API surface.
In a non-deterministic agentic system, permissions are often scoped to “whatever tools might be useful.”
That’s how teams slip from “we integrated a search tool” to “we integrated the search tool, the ticket tool, the customer database, the email tool, and the refund tool… because it made the demo work.”
What permission surfaces matter, and why do they fail differently?
Agents don’t just “use an API.” They touch three permission surfaces, and each one tends to break in its own way.
What can go wrong with data access surfaces?
Data access covers databases, warehouses, file systems, internal docs, ticket attachments, and anything that looks like a retrieval-augmented generation (RAG) pattern.
This is where over-permissioning becomes silent and catastrophic. When a tool can read broadly, the agent can leak broadly. You may never see an explicit “exfiltration” step, e.g. sending an email. The agent might simply include something sensitive in an output, attach the wrong file, or summarize internal data into an external channel because it thought it was relevant.
Data access failures are usually leaks: once the agent can read broadly, it can expose sensitive data in ways you didn’t intend.
What can go wrong with action surfaces?
Action surfaces include email, Slack, ticketing, deploy pipelines, admin consoles, payment systems, and any integration that mutates state.
This surface is dangerous because “write” is not always obvious. Systems trigger side effects. A “read-only” call can still create a new record, fire a webhook, send a notification, or kick off a workflow if the integration is misconfigured or the platform treats “fetch” as “sync.”
Action surfaces fail through side effects: one call can send messages, trigger workflows, or modify systems.
What can go wrong with identity surfaces?
Identity surfaces are service accounts, OAuth tokens, API keys, session delegation, and any mechanism that lets an agent act “as someone.”
Identity is where the most damaging failures start. If the agent’s effective identity is a shared service account with broad admin-level permissions, the user’s intent becomes irrelevant. The agent can do whatever the service account can do, and your system loses the ability to enforce “acting on behalf of” in a meaningful way.
Identity surfaces fail through escalation: if the agent runs under a shared service account or overly broad token, it can act far beyond the user’s intent and entitlements.
What’s the core failure mode?
The core failure mode is permission expansion without a tightening loop.
Teams give agents more power than the task requires, then lack a mechanism to reduce scope as the agent’s behavior becomes known. Permissions expand quickly because friction kills demos. Permissions rarely contract because tightening requires someone to define boundaries, write policies, and accept that “sometimes the agent should be blocked.”
You see the same few patterns over and over:
Prototype-to-prod drift: admin keys or broad scopes used in early testing stick around.
Tool sprawl: every new integration silently adds new actions and new data surfaces.
Long-lived, shared credentials: service accounts and OAuth scopes outlive the task and bypass user intent.
No runtime scoping: static roles don’t adapt to context, so teams over-grant to avoid failures.
That’s agent permission creep in practice: more access shows up automatically while less access requires an explicit decision.
How does over-permissioning actually happen?
Why does “make it work” access become permanent?
The first version of an agent is almost never built like production software. It’s built like a demo. The team wants the agent to succeed. Tool calls are failing. The fastest fix is to widen scopes, use a powerful service account, or skip checks. Everyone tells themselves they will revisit the permissions later.
Then the demo works, the agent ships… and the permission scopes stay.
This is not a morality tale. Rather, it’s normal engineering economics. Tight scopes feel like product work, security work, and operations work all at once. If you don’t design a tightening loop from the start, there’s no natural moment when it magically appears.
How does tool sprawl create implicit power?
Every tool you add is a permission change, whether you treat it that way or not.
Teams often treat tool selection as a model capability problem. They focus on prompts, reasoning, and tool descriptions. The permission impact gets ignored because it’s “just another connector.” That mindset breaks quickly in the real world.
A “read” tool can still trigger writes by calling systems that:
auto-send messages,
open tickets,
create follow-on tasks,
fire webhooks,
sync data to external systems.
If you treat tool exposure as harmless, your permission model will drift until it looks like “the agent can do everything.”
Why do long-lived credentials and ambiguous delegation undermine authorization?
There are two versions of the same bug.
Version one: the agent appears to act as the user in the UI, but calls tools using a shared backend identity. Version two: the agent uses user credential tokens, but with broad scopes and long TTLs, so the permission envelope becomes “whatever this user could ever do,” not “what this user wants to do now.”
Both versions create the same outcome: your system can’t reliably answer “who authorized this action?” because it never forced a deterministic decision.
What is context-driven privilege creep?
This is where teams accidentally let the agent’s reasoning become their authorization logic.
In a healthy design:
the agent proposes a tool call,
policy decides whether the call is allowed,
and enforcement constrains what happens even when allowed.
In an unhealthy design:
the agent classifies intent,
chooses a tool,
and the system treats that choice as approval. The system lets inference become authorization.
A concrete example:
User: “I bought the wrong item, can I get a refund?”
Agent: infers the intent is “refund,” selects issue_refund, and generates parameters.
System: permits the call because the agent’s classification is “refund,” not because policy verified eligibility, refund limits, ownership, time window, fraud signals, or approval thresholds.
It’s subtle because it looks reasonable until it fails. When it fails, you don’t get a clean “auth denied.” You get a refund issued to the wrong user, or a refund size that violates policy, or a pattern of abuse that the agent unknowingly enables.
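To make the healthy version concrete, here’s a minimal Python sketch in which deterministic policy, not the agent’s intent label, authorizes the refund. All names and limits (`RefundPolicy`, `MAX_AMOUNT`, the specific checks) are illustrative assumptions, not a prescribed implementation:

```python
from dataclasses import dataclass

@dataclass
class RefundRequest:
    user_id: str              # who is asking
    order_owner_id: str       # who owns the order
    amount: float
    days_since_purchase: int

class RefundPolicy:
    MAX_AMOUNT = 100.0        # assumed policy limit
    WINDOW_DAYS = 30          # assumed return window

    def authorize(self, req: RefundRequest) -> tuple[bool, str]:
        # Deterministic checks the agent's "intent = refund" label never replaces.
        if req.user_id != req.order_owner_id:
            return False, "requester does not own the order"
        if req.amount > self.MAX_AMOUNT:
            return False, "amount exceeds policy limit"
        if req.days_since_purchase > self.WINDOW_DAYS:
            return False, "outside refund window"
        return True, "ok"

policy = RefundPolicy()
# The agent classified the intent correctly, but the requester doesn't own the order:
ok, reason = policy.authorize(
    RefundRequest(user_id="u1", order_owner_id="u2",
                  amount=20.0, days_since_purchase=3)
)
```

The point of the sketch: `issue_refund` never runs on the strength of the agent’s classification alone; every call passes through checks the agent cannot talk its way around.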
What’s the business impact of over-permissioned agents?
Once you see these systems in production, the impacts stop being theoretical:
Data exfiltration and compliance exposure (PII, financial data, internal IP).
Destructive writes and outages (bad config pushes, data loss, noisy automations).
Loss of confidence (teams pause agent rollout because nobody trusts the boundary).
You can’t sustainably ship agents without a permission model that scales with tool growth.
What permission model actually works for agents?
Why must tool access be deny-by-default?
If your system starts from “tools available unless blocked,” you will miss cases. You will also over-grant to avoid agent failure. Deny-by-default forces a clean inventory:
Which tools exist for this agent?
Under which conditions?
On which resources?
With what constraints?
With what escalation and approval paths?
This is the foundation of tool access controls for agents.
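A deny-by-default registry can be as simple as a lookup where absence means “no.” This is an illustrative sketch (agent and tool names assumed), not a production policy engine:

```python
# Minimal deny-by-default tool registry: a call is allowed only if an
# explicit allow rule covers it. Everything else falls through to deny.
ALLOW_RULES = {
    # (agent, tool): set of allowed actions
    ("support-agent", "crm"): {"read_profile"},
    ("support-agent", "email"): {"draft"},
}

def is_allowed(agent: str, tool: str, action: str) -> bool:
    # No fallback path: anything not explicitly listed is denied.
    return action in ALLOW_RULES.get((agent, tool), set())

assert is_allowed("support-agent", "crm", "read_profile")
assert not is_allowed("support-agent", "crm", "delete_record")  # never granted
assert not is_allowed("support-agent", "refunds", "issue")      # unknown tool -> deny
```

The useful property is that adding a tool without adding a rule changes nothing: new surface area stays inert until someone makes an explicit grant.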
How do you scope permissions to task + resource + action?
The most common mistake I see is granting permissions at the “integration” level. “Access to Jira” is not a permission. It’s a statement of hope.
What works is scoping to:
task (what the agent is doing),
resource (what it can touch),
action (what verbs are allowed),
plus constraints (how it can express that action).
Instead of “agent can access Jira,” you grant:
“agent can create tickets in project ABC”
“agent can set only fields {summary, description, severity}”
“severity must be one of {S1, S2, S3}”
“agent can’t add external watchers”
“rate limit: 5 tickets/minute”
This is also how you avoid creating impotent agents. You don’t block everything; instead, you allow safe slices.
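The Jira grant above can be expressed as data plus a validator. This is a hedged sketch; the field names and values are taken from the example, and the structure is one of many reasonable encodings:

```python
# The scoped grant from the example, as data (all values illustrative).
JIRA_GRANT = {
    "action": "create_ticket",
    "project": "ABC",
    "allowed_fields": {"summary", "description", "severity"},
    "severity_values": {"S1", "S2", "S3"},
    "rate_limit_per_minute": 5,
}

def validate_ticket_call(project: str, fields: dict) -> list[str]:
    """Return a list of constraint violations; empty means the call is in-bounds."""
    errors = []
    if project != JIRA_GRANT["project"]:
        errors.append("project not allowed")
    extra = set(fields) - JIRA_GRANT["allowed_fields"]
    if extra:
        errors.append(f"fields not allowed: {sorted(extra)}")
    if "severity" in fields and fields["severity"] not in JIRA_GRANT["severity_values"]:
        errors.append("invalid severity")
    return errors
```

A call that sets only approved fields in the approved project validates cleanly; a call that tries to add external watchers fails on the field allowlist rather than on hope.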
Why do you need just-in-time elevation with expiry?
Static permissions don’t match dynamic workflows. Agents don’t need permanent broad access. They need temporary authority for specific steps. The pattern leading teams reach for is:
The agent requests elevation for a specific tool + scope.
Policy grants it with a short TTL.
The grant expires automatically after completion or time window.
It’s hard to overstate how much damage this prevents. We believe that most of the “the agent still had access” incidents are credential lifecycle failures.
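The expiry half of this pattern is simple to sketch. Assuming a hypothetical `Elevation` object, the key property is that authority evaporates on its own, with no revocation step to forget:

```python
import time

class Elevation:
    """A just-in-time grant for one tool + scope that expires automatically."""
    def __init__(self, tool: str, scope: str, ttl_seconds: float):
        self.tool, self.scope = tool, scope
        self.expires_at = time.monotonic() + ttl_seconds

    def is_active(self) -> bool:
        # No cleanup job required: the grant is dead once the clock passes it.
        return time.monotonic() < self.expires_at

grant = Elevation(tool="deploy", scope="staging", ttl_seconds=0.05)
assert grant.is_active()
time.sleep(0.1)
assert not grant.is_active()  # authority evaporated without a revocation step
```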
How do you make least privilege continuous?
Least privilege for agents is not a one-time role design exercise. Agents and tools change, and workflows evolve. So you need to build a tightening loop that treats permissions as a living system:
Measure what tools and scopes the agent actually uses.
Detect unused permissions and risky patterns.
Recommend reductions.
Apply reductions manually at first.
Auto-apply low-risk tightening within policy-defined limits.
When I say ‘continuous least privilege,’ I mean the system can tighten permissions over time in two ways. It can recommend reductions for human approval, and it can auto-apply low-risk tightening within predefined limits (like shortening TTLs or removing unused tools). Anything that could break workflows or expand risk should stay in the approval path.
What is containment, and why is it non-negotiable?
Even perfect policy doesn’t eliminate risk. Tools change, connectors fail, prompts get compromised, and humans make mistakes. Containment is what lets you operate agents like production systems rather than science experiments. Keep these controls distinct:
Downgrade scope: keep the agent running but reduce impact (read-only, narrower allowlists, tighter TTLs).
Disable specific tools: keep the agent running but revoke high-risk capabilities.
Quarantine: keep the agent running but isolate it from real systems (sandbox/propose-only). In this context, quarantine is a safety mode, not a shutdown.
Kill switch: hard-stop the agent immediately (block tool calls/egress for that identity).
Figure 1: Deterministic authorization per tool call, with continuous monitoring and containment to prevent over-permissioned behavior. Oso provides the policy layer that authorizes each tool call and enables scoped access, continuous least privilege, and containment.
How do you prevent over-permissioned agents in practice?
Step 1: Where should you enforce authorization decisions?
Pick a boundary you can make deterministic and auditable:
API gateway
tool handler/router
service layer
data access layer
While you don’t need to be prescriptive on the location, you do need to be prescriptive about the properties:
it must be enforced on every tool call,
it must be non-bypassable,
it must be logged.
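Those three properties can live in one wrapper around every tool handler. This is a minimal sketch (the decorator, policy signature, and tool names are all illustrative); the point is that the check and the log happen on every call, with no code path around them:

```python
import functools

AUDIT_LOG: list[dict] = []

def authorized(policy_check):
    """Wrap a tool handler so every call is policy-checked and logged (sketch)."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(actor, resource, **params):
            allowed, reason = policy_check(actor, handler.__name__, resource, params)
            # Logged whether allowed or denied -- the audit trail is not optional.
            AUDIT_LOG.append({"actor": actor, "tool": handler.__name__,
                              "resource": resource, "allowed": allowed,
                              "reason": reason})
            if not allowed:
                raise PermissionError(reason)
            return handler(actor, resource, **params)
        return wrapper
    return decorator

# Illustrative policy and tool: only "support-agent" may read tickets.
def policy(actor, tool, resource, params):
    return actor == "support-agent", "only support-agent may call this tool"

@authorized(policy)
def read_ticket(actor, resource, **params):
    return f"ticket {resource}"
```

Because the handler is only ever reachable through the wrapper, “non-bypassable” is a property of the code structure, not of developer discipline.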
Step 2: How do you model “who is acting” and “on whose behalf”?
Most agent systems get into trouble because “acting principal” is fuzzy. You need to separate:
human user identity (requester),
agent identity (executor),
service account identity (infrastructure).
Then enforce delegation explicitly:
the agent can act within the user’s entitlements,
plus explicit agent allowances,
and nothing else.
Here’s a sanity test you can use: “If I revoke the user’s access, does the agent still have it?” If the answer to that question is “yes”, delegation is broken.
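Explicit delegation reduces to a set intersection: the agent’s effective permissions are the user’s entitlements intersected with the agent’s own allowances. A tiny sketch (permission strings are illustrative):

```python
def effective_permissions(user_perms: set, agent_allowances: set) -> set:
    # The agent can act within the user's entitlements, and nothing else.
    return user_perms & agent_allowances

user = {"read:tickets", "write:tickets", "read:billing"}
agent = {"read:tickets", "write:tickets", "issue:refund"}

# The agent never gains "issue:refund" just because it's in its allowances --
# the user was never entitled to it.
assert effective_permissions(user, agent) == {"read:tickets", "write:tickets"}

# The sanity test from the text: revoke the user, and the agent loses access too.
assert effective_permissions(set(), agent) == set()
```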
Step 3: What is a tool access contract?
A tool access contract describes what safe usage looks like for a tool. It’s the bridge between “we exposed a tool” and “we can control it.” For every tool, you need to document:
the task(s) it supports,
the resources and actions it exposes,
known side effects (messages, webhooks, syncs, follow-on workflows),
the identity and scopes it runs under,
constraints on parameters, outputs, and rates,
escalation and approval paths.
This makes the review concrete and also prevents “tool sprawl” from becoming “permission sprawl.”
Step 4: How do you apply constraints to tool calls?
Your policy evaluates:
actor,
action,
resource,
context.
But the real power is in constraints. You don’t want a world where “deny” is your only safety mechanism, otherwise you’re in the world of impotent agents. You want “allow, but bounded.”
Row-level filters: only allow rows where customer_id == current_customer.
Template restrictions: the agent can write “this kind of thing, in this format, to these destinations.”
Parameter allowlists: only specific command flags or API parameters.
Limits: max refund amount, max rows returned, max recipients.
Template restrictions matter because a lot of high-impact actions are executed through what looks like free-form text. If you let an agent compose arbitrary emails, tickets, or Slack messages, you’ve implicitly let it choose recipients, links, and triggers. Templates force the action through a fixed shape so the agent can only fill approved fields.
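“Allow, but bounded” in miniature: a query tool that injects a row-level filter and caps result size regardless of what the agent asked for. The data shapes and names here are assumptions for illustration:

```python
def run_customer_query(rows, current_customer: str, max_rows: int = 100):
    """Return rows the agent may see: row-filtered and hard-capped."""
    # Row-level filter: the agent only ever sees the current customer's rows,
    # even if it constructed a broader query.
    scoped = [r for r in rows if r["customer_id"] == current_customer]
    # Limit: a hard cap the agent cannot negotiate past.
    return scoped[:max_rows]

rows = [{"customer_id": "c1", "total": 10},
        {"customer_id": "c2", "total": 99}]
assert run_customer_query(rows, "c1") == [{"customer_id": "c1", "total": 10}]
```

The agent stays useful (it gets an answer) while the constraint, not the agent’s judgment, decides what the answer can contain.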
Step 5: Why should you use short-lived credentials and short TTLs?
Long-lived credentials are permission creep in token form. Instead prefer:
per-tool, per-scope credentials,
short TTLs,
capability tokens instead of broad API keys,
aggressive rotation.
The goal is to make “standing access” hard to accidentally create.
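A capability token can bind all of this together: one token, one tool, one scope, one short TTL. This is an in-memory sketch with assumed names; a real system would use signed tokens and a proper store:

```python
import secrets
import time

TOKENS: dict[str, tuple] = {}  # token -> (tool, scope, expires_at); illustrative

def mint(tool: str, scope: str, ttl: float) -> str:
    """Issue a per-tool, per-scope credential with a short lifetime."""
    token = secrets.token_hex(8)
    TOKENS[token] = (tool, scope, time.monotonic() + ttl)
    return token

def check(token: str, tool: str, scope: str) -> bool:
    entry = TOKENS.get(token)
    if not entry:
        return False
    t, s, expires_at = entry
    # Valid only for exactly this tool + scope, and only until expiry.
    return t == tool and s == scope and time.monotonic() < expires_at

tok = mint("jira", "create:ABC", ttl=60)
assert check(tok, "jira", "create:ABC")
assert not check(tok, "jira", "delete:ABC")  # capability is per-tool, per-scope
```

Compare that with a broad API key: the key works everywhere, forever, for everyone who copies it. The capability token fails closed on all three axes.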
Step 6: When should an agent require human approval?
Don’t gate “production deploys” as a blanket category. Gate by risk and reversibility. Otherwise you’ll create impotent agents and teams will route around the system.
Use human-in-the-loop for:
money movement (payments, transfers),
permission grants (role/policy changes),
irreversible/destructive operations (deletes, irreversible migrations or state changes),
high-risk production changes (privilege-impacting, broad blast radius, hard-to-rollback).
Then implement a two-phase commit:
agent proposes,
system validates constraints and eligibility,
a human approves execution.
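The two-phase shape can be sketched in a few lines. `ProposedAction` is a hypothetical name; the invariant that matters is that execution is structurally impossible before approval:

```python
class ProposedAction:
    """Phase one: the agent proposes. Phase two: a human approves, then it runs."""
    def __init__(self, tool: str, params: dict):
        self.tool, self.params = tool, params
        self.approved = False
        self.approver = None

    def approve(self, approver: str):
        self.approved = True
        self.approver = approver

    def execute(self, executor):
        if not self.approved:
            # Execution without approval is a hard failure, not a warning.
            raise PermissionError("approval required before execution")
        return executor(self.tool, self.params)

action = ProposedAction("issue_refund", {"amount": 20})
try:
    action.execute(lambda tool, params: "done")
except PermissionError:
    pass  # blocked until a human approves

action.approve("alice")
assert action.execute(lambda tool, params: "done") == "done"
```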
Step 7: How do you detect drift and contain abuse at runtime?
Even with good policy, runtime is where agents surprise you. Tools change, integrations behave differently than you expect, and the agent’s strategy evolves as it encounters edge cases. So treat containment as an always-on control loop: detect early signals of drift, then degrade safely before you hit an incident.
The key mindset shift is that you’re not trying to perfectly classify “malicious” versus “benign.” You’re trying to answer a simpler question: “Is this agent operating inside the scope we intended?” When the answer becomes unclear, you reduce blast radius automatically.
What signals reliably indicate drift?
I look for signals that correlate with loss of control, not just “weird behavior”:
Anomalous call volume: sudden spikes in tool calls, retries, or parallelism. This often shows runaway loops, bad tool routing, or prompt injection that turns the agent into a scanner.
Repeated denials: a few denials are normal; a stream of them usually means the agent is probing boundaries or stuck in a failure mode where it keeps trying higher-privilege actions.
New resource classes: the agent suddenly attempts to access a different category of resource (e.g., from “tickets” to “users,” from “metrics” to “raw tables,” from “staging” to “prod”). That’s frequently a sign the agent is escaping the intended task scope.
Out-of-policy tool chaining: the agent starts composing risky sequences like “read sensitive → summarize → send externally” or “fetch secrets → call deploy.” It may look reasonable step-by-step, but the chain is the risk.
Shifts in tool usage distribution: the agent’s pattern changes over time. If a support agent that normally uses KB search starts hammering the CRM export endpoint, something has changed.
These signals work because they’re observable at the control plane, so you don’t have to guess the agent’s intent.
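One of these signals, a streak of policy denials, takes only a sliding window to detect. A sketch with assumed thresholds (a real deployment would tune the window and combine several signals):

```python
from collections import deque

class DenialStreakDetector:
    """Flag an agent when denials pile up within a sliding window of calls."""
    def __init__(self, window: int = 10, threshold: int = 5):
        self.recent = deque(maxlen=window)  # True = allowed, False = denied
        self.threshold = threshold

    def record(self, allowed: bool) -> bool:
        """Record one policy decision; return True when containment should trip."""
        self.recent.append(allowed)
        denials = sum(1 for a in self.recent if not a)
        return denials >= self.threshold

detector = DenialStreakDetector(window=10, threshold=3)
assert not detector.record(False)  # one denial: normal
assert not detector.record(False)  # two denials: watch
assert detector.record(False)      # three in the window: trip containment
```

A few scattered denials never trip it; a burst does, which matches the distinction in the text between normal friction and boundary probing.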
How should the system respond?
Responses should be graduated. You want the agent to keep being useful when possible, but you want to cut off risky surfaces quickly.
Downgrade to read-only: my default first response. It preserves utility (diagnosis, summarization, drafting) while preventing most harmful side effects.
Restrict to a tool subset: remove high-risk tools (refunds, outbound comms, admin actions) while keeping safe ones (search, read metrics, draft).
Sandbox-only mode: force the agent into a mode where it can operate on synthetic/sandbox data and produce proposals, but cannot touch production state.
Quarantine: isolate the agent identity, require explicit approvals for any action, and block access to sensitive resource classes. This is “keep it running, but behind glass.”
Kill switch: hard-stop the agent when you have a strong signal of compromise or runaway behavior. Block all tool calls/egress for that agent identity until re-enabled.
A simple rule of thumb: if you can’t explain the behavior, reduce the authority. Don’t wait until you can.
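The graduated ladder above can be encoded directly, so escalation is a lookup rather than a debate. Severity scoring is the hard part and is assumed here; the sketch just shows the mapping:

```python
# The containment ladder from least to most restrictive (labels illustrative).
RESPONSES = ["read_only", "restricted_tools", "sandbox_only", "quarantine", "kill"]

def respond(severity: int) -> str:
    """Map a signal severity to a containment level, clamped to the ladder."""
    return RESPONSES[min(max(severity, 0), len(RESPONSES) - 1)]

assert respond(0) == "read_only"   # default first response: reduce side effects
assert respond(99) == "kill"       # anything off the scale hard-stops the agent
```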
What makes containment real?
Speed and automation. Containment isn’t a policy document. It’s a mechanism that triggers in seconds. If your response path is “open a ticket, page someone, debate whether it’s serious,” you don’t have containment. You have incident response. The bar for this step is: the system can degrade or stop the agent faster than the agent can do meaningful harm.
Step 8: How do you build feedback loops that reduce access?
This is your anti-permission-creep mechanism.
Track:
unused tools by agent/task type,
unused scopes,
denied actions by type,
frequency of high-risk tool usage,
time-to-approval and rollback.
Then:
generate permission reduction recommendations,
apply via policy PRs,
auto-apply low-risk tightening within explicit limits.
The system should tighten permissions automatically as you gain evidence about what the agent actually needs, instead of relying on rare manual audits to claw back access.
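The first step of that loop, recommending reductions from observed usage, is a set difference between what was granted and what was actually used. Grant names here are illustrative:

```python
def recommend_reductions(granted: set[str], used: set[str]) -> set[str]:
    """Grants with no observed usage are candidates for removal."""
    return granted - used

granted = {"jira:create", "jira:delete", "crm:read", "email:send"}
used = {"jira:create", "crm:read"}  # e.g., from 30 days of audit logs

# The agent never used delete or send -- flag them for tightening.
assert recommend_reductions(granted, used) == {"jira:delete", "email:send"}
```

In practice you’d gate the output: low-risk removals (an unused read scope) auto-apply within policy limits, while anything that could break a workflow goes to a human as a policy PR.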
What do least-privileged agents look like in the real world?
Example A: How do you prevent an over-permissioned customer support agent?
Needs: read customer profile and draft replies.
Prevent:
sending emails to arbitrary domains,
editing billing fields,
issuing refunds without policy checks.
Controls:
recipient allowlist (only the verified customer address),
template-only outbound messages,
CRM read-only except a narrow notes field,
refunds either disabled or propose-only with approval + amount caps.
This is the key design move: keep the agent productive where it’s safe (drafting, summarizing, routing) and force explicit gates on high-impact actions (refunds, billing changes).
Example B: How do you prevent an over-permissioned BI/data analyst agent?
Needs: query a warehouse for aggregated metrics.
Prevent:
access to raw PII tables,
exporting full datasets.
Controls:
row/column-level policies (PII excluded by default),
aggregation-only query patterns (templates, not raw SQL),
capped result sizes + export disabled,
cost guards (timeouts, spend limits tied to the agent identity).
The experience stays good: most questions are answered. The unsafe edge cases get escalated rather than “solved” by over-broad access.
Example C: How do you prevent an over-permissioned DevOps runbook agent?
Needs: check service health and open incidents.
Prevent:
arbitrary deploys,
changing IAM policies.
Controls:
environment scoping (staging by default),
human approvals for high-risk production changes,
strict command allowlists and constrained arguments,
quarantine triggers when it attempts new command classes.
This yields an agent that accelerates incident response without turning into an unbounded DevOps or SRE team member.
What improves when you get this right?
When teams implement a real AI agent authorization model, you see the benefits quickly:
fewer incidents caused by agentic automation,
faster agent rollout because controls are inspectable and repeatable,
lower blast radius when prompts/tools get compromised,
clearer audit trails for compliance and governance,
less internal resistance to shipping agents.
It stops being a debate about whether agents are safe. It becomes an engineering question: “Is the boundary correct and enforced?”
Conclusion: what should you remember?
Over-permissioning is a system design flaw, not a model behavior flaw.
Solve it with deterministic authorization at the tool boundary, scoped credentials, just-in-time elevation, continuous tightening, and containment.
If permissions only ever grow, you will eventually ship an over-permissioned agent. The only question is whether you’ll notice before the incident.
If you’re deploying agentic systems, Oso for Agents gives you a practical control plane for this: deterministic authorization at the tool boundary, scoped credentials, and runtime containment (downgrade, tool disable, quarantine, kill switch), plus work we’re doing on automated least privilege so permissions tighten as you gain evidence. If you want to sanity-check your current tool routing and delegation model, grab 20 minutes with one of our engineers and we’ll map your highest-risk tool calls to concrete policy and constraints.
FAQs
What’s the difference between least privilege for humans and for agents?
Humans typically have stable roles. Agents execute dynamic workflows. Static RBAC tends to either break tasks or over-grant access.
For agents, least privilege needs to be runtime-scoped:
per-call authorization,
task/resource/action constraints,
short-lived credentials,
continuous tightening based on observed behavior.
Should agents have their own identity, or always act as the user?
Give agents their own identity for auditability and control. Then enforce delegation so the agent cannot exceed user entitlements without explicit approvals.
If everything runs under a shared service account, you lose attribution and meaningful least privilege.
How do I prevent agent permission creep as tools get added?
Treat tool additions as permission changes:
update the tool access contract,
add explicit allow rules,
constrain by resource/action/output,
and run a tightening loop to remove unused permissions.
When should an AI agent require human approval?
Use approval gates for actions that are irreversible, privilege-changing, or high blast radius:
money movement,
permission grants,
destructive operations,
high-risk production changes.
Avoid gating routine actions that run inside a hardened pipeline with policy checks and rollback.
How do I handle agents that need broad read access for search?
Split retrieval into safe layers:
broad access to non-sensitive collections of text or documents,
restricted access to sensitive data with row/column rules,
output constraints to prevent raw dumps,
caps on results and export.
Require explicit elevation or approval when crossing into sensitive data.
How do I audit what an agent accessed and why?
Log every tool call with:
actor (user/agent/service),
action + resource,
applied constraints,
allow/deny reason,
token scope + TTL,
correlation IDs back to the triggering request.
If you can’t answer “who did what, under what policy, and why,” you can’t safely operate agents in production.
Can I roll this out incrementally without rewriting my whole stack?
Yes. Start with one choke point (tool router or high-risk service layer), then progressively:
deny-by-default for new tools,
constraints for existing tools,
short-lived credentials,
approvals for high-risk actions,
containment + tightening loop.
The key is making the chosen boundary non-bypassable.
What’s the minimum viable set of controls for a first production agent?
Minimum controls that actually matter:
tool allowlist (deny-by-default),
deterministic policy check per tool call,
resource scoping,
short-lived credentials,
basic containment (disable tool + kill switch),
auditable logs for every decision and call.
About the author
Mat Keep
Product Marketer
Mat Keep is a product strategist with three decades of experience in developer tools and enterprise data infrastructure. He has held senior product roles at leading relational and NoSQL database vendors along with data engineering and AIOps providers. Today, he works as an independent advisor helping technology companies navigate the transformative impact of AI. At Oso, he focuses on how secure, scalable authorization can accelerate AI adoption.