- See agent activity end-to-end (actions, decisions, context)
- Detect misbehavior early (before it becomes an incident)
- Enforce guardrails at the action boundary (approvals, rate limits, information-flow controls)
Detecting Misbehavior
Types
The UI is optimized to surface issues that matter for agent workloads:
- Velocity spikes: Agents suddenly executing far more actions than normal (often a sign of a loop, prompt injection, or mis-scoped automation).
- Data leaks: Risky sequences like accessing sensitive data and then attempting an outbound or state-changing action.
- Intent drift: The agent is “still working” but is now using different tools, hitting different resources, or acting outside the expected workflow shape.
To investigate a flagged agent:
- Start from a flagged agent or alert.
- Drill into the recent action timeline to see the exact tool calls and parameters.
- Compare it to historical behavior to identify the root cause of the misbehavior.
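As an illustration of how a velocity spike might be flagged, here is a minimal sketch that compares an agent's current action count against its own history using a z-score. The function name, the thresholds, and the idea of fixed time windows are all assumptions made for this example; the product's actual detection logic is not described here.

```python
from statistics import mean, stdev


def is_velocity_spike(history, current, z_threshold=3.0, min_samples=5):
    """Return True if `current` is far above the agent's baseline.

    history: past per-window action counts for one agent (hypothetical input).
    current: action count in the window being evaluated.
    """
    if len(history) < min_samples:
        return False  # not enough baseline data to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: treat any increase as notable.
        return current > mu
    return (current - mu) / sigma > z_threshold


baseline = [10, 12, 9, 11, 10, 13]
print(is_velocity_spike(baseline, 12))  # False: within normal variation
print(is_velocity_spike(baseline, 80))  # True: far above baseline
```

Comparing an agent against its own history, rather than a global threshold, keeps naturally busy agents from being flagged constantly.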
Simulations (Pre-Release Testing)
Before enabling a new tool or workflow in production, use simulations to validate:
- Which tools the agent attempts to use
- What permissions are actually exercised
- Whether the workflow triggers high-risk sequences
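The three checks above can be sketched with a dry-run recorder that stands in for real tools. This is a hypothetical harness, not the product's simulation API: `SimulationRecorder`, its fields, and the tool and permission names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationRecorder:
    """Records what an agent tries to do; nothing is actually executed."""
    attempted_tools: list = field(default_factory=list)
    exercised_permissions: set = field(default_factory=set)

    def call(self, tool, permission, **params):
        # Log the attempt and return a stub result instead of running the tool.
        self.attempted_tools.append(tool)
        self.exercised_permissions.add(permission)
        return {"tool": tool, "simulated": True}

    def triggers_sequence(self, *pattern):
        """True if the tools in `pattern` were attempted in that order
        (not necessarily back to back)."""
        remaining = iter(self.attempted_tools)
        return all(step in remaining for step in pattern)


rec = SimulationRecorder()
rec.call("crm.read_contacts", "crm:read", limit=50)
rec.call("email.send", "email:send", to="user@example.com")

print(rec.attempted_tools)
print(sorted(rec.exercised_permissions))
print(rec.triggers_sequence("crm.read_contacts", "email.send"))
```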
Production Monitoring
In production, the UI becomes your operational loop:
- Track what changed (new tools, new resources, new patterns)
- Identify over-privilege (permissions that are present but not used)
- Triage high-signal anomalies quickly with action-level evidence (not prompt transcripts alone)
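Both "what changed" and "over-privilege" reduce to set comparisons over observed actions. A minimal sketch, assuming you can export granted permissions and observed tool usage as plain collections (the function names and sample values are hypothetical):

```python
def unused_permissions(granted, exercised):
    """Permissions that are present but were never used (over-privilege)."""
    return sorted(set(granted) - set(exercised))


def new_since_baseline(baseline, observed):
    """Tools or resources seen now that were absent from the baseline."""
    return sorted(set(observed) - set(baseline))


granted = ["crm:read", "crm:write", "email:send"]
exercised = ["crm:read"]
print(unused_permissions(granted, exercised))  # ['crm:write', 'email:send']

print(new_since_baseline({"search"}, {"search", "email.send"}))  # ['email.send']
```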
Enforcing Controls
Approvals / HITL
When a tool call is risky, enforce human approval before execution. In the UI, reviewers should be able to see:
- The tool being invoked and its parameters
- Why it was flagged (policy or risk signal)
- Relevant prior steps in the session (the minimal context needed to decide)
Recommended rollout:
- Start with approvals on high-impact tools (writes, outbound comms, admin actions).
- Expand coverage based on what you observe.
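A minimal sketch of an approval gate at the action boundary, assuming a hypothetical tool-naming scheme in which high-impact tools share a prefix. `ApprovalRequest`, `gate`, and the prefixes are illustrative names, not part of any documented API. The request carries the three things a reviewer needs: the call, the reason it was flagged, and a small slice of prior steps.

```python
from dataclasses import dataclass

# Hypothetical naming scheme for high-impact tools (writes, outbound, admin).
HIGH_IMPACT_PREFIXES = ("write.", "email.", "admin.")


@dataclass
class ApprovalRequest:
    tool: str
    params: dict
    reason: str        # why the call was flagged
    prior_steps: list  # minimal context for the reviewer


def gate(tool, params, session_steps, context_size=3):
    """Return an ApprovalRequest if the call needs review, else None (allow)."""
    if tool.startswith(HIGH_IMPACT_PREFIXES):
        return ApprovalRequest(
            tool=tool,
            params=params,
            reason=f"high-impact tool: {tool}",
            prior_steps=session_steps[-context_size:],
        )
    return None


steps = ["search.web", "crm.read_contacts"]
print(gate("search.web", {"q": "pricing"}, steps))          # None: allowed
print(gate("email.send", {"to": "user@example.com"}, steps))
```

Expanding coverage then amounts to growing the set of gated prefixes as observations justify it.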
Rate Limiting
Rate limits contain the blast radius when agents misbehave:
- Cap action volume per agent/session/tool
- Block or degrade gracefully during spikes
- Create predictable failure modes instead of runaway execution
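One common way to cap action volume is a sliding-window limiter keyed by agent, session, or tool. The sketch below is an assumption about how such a control could work, not the product's implementation; the class name and key format are hypothetical. Timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_actions` per `window_seconds` for each key."""

    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed actions

    def allow(self, key, now):
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_actions:
            return False  # predictable failure instead of runaway execution
        events.append(now)
        return True


limiter = SlidingWindowLimiter(max_actions=2, window_seconds=10.0)
key = ("agent-1", "email.send")  # cap per agent and tool
print(limiter.allow(key, now=0.0))   # True
print(limiter.allow(key, now=1.0))   # True
print(limiter.allow(key, now=2.0))   # False: cap reached
print(limiter.allow(key, now=10.5))  # True: oldest event expired
```

A caller that receives `False` can block the action outright or degrade gracefully, for example by queuing it for review.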
Information Flow
Information-flow controls help prevent unsafe combinations across a session, for example:
- Untrusted input → sensitive data access → outbound or state-changing action
When a flow is gated, the UI should help you:
- Understand which step introduced risk
- Confirm the sequence that triggered enforcement
- Tune policy so safe workflows pass while risky compositions get gated
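The example sequence can be sketched as session-level label tracking: each step adds a label, and a side-effecting action is gated only when both risky labels are already present. `SessionFlow` and the step kinds are hypothetical names. As a simplification, this sketch ignores the order in which the two labels were acquired, which makes it slightly more conservative than the strict sequence.

```python
from dataclasses import dataclass, field

RISK_LABELS = {"untrusted_input", "sensitive_read"}


@dataclass
class SessionFlow:
    labels: set = field(default_factory=set)  # risk labels acquired so far
    trace: list = field(default_factory=list)  # (step, kind) in order

    def record(self, step, kind):
        """kind: 'untrusted_input', 'sensitive_read', 'side_effect', or 'other'.

        'side_effect' covers outbound and state-changing actions.
        Returns 'gate' or 'allow'.
        """
        self.trace.append((step, kind))
        if kind == "side_effect" and RISK_LABELS <= self.labels:
            return "gate"
        if kind in RISK_LABELS:
            self.labels.add(kind)
        return "allow"


flow = SessionFlow()
print(flow.record("fetch web page", "untrusted_input"))    # allow
print(flow.record("read customer table", "sensitive_read"))  # allow
print(flow.record("send email", "side_effect"))            # gate
print(flow.trace)  # shows which steps introduced each label
```

Because a sensitive read followed by a side effect is allowed when no untrusted input is present, ordinary workflows pass while the risky composition is gated.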
Incident Response & Quarantine
When something goes wrong, the UI should support a clean incident loop:
- Detect: alerts triggered by executed actions and abnormal patterns
- Investigate: action timeline + tool call inspection (what happened, where, and with what parameters)
- Contain: quarantine an agent, require approvals, tighten limits
- Learn: turn the incident into a durable policy or control (so it doesn’t recur)
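The "Contain" step can be thought of as a per-agent override that is checked before any action runs. A minimal sketch under that assumption; `ContainmentPolicy` and its decision strings are hypothetical, not a documented interface.

```python
class ContainmentPolicy:
    """Per-agent containment state consulted before every action."""

    def __init__(self):
        self.quarantined = set()        # agents whose actions are all denied
        self.approval_required = set()  # agents whose actions all need review

    def decide(self, agent_id):
        # Quarantine takes precedence over the softer approval requirement.
        if agent_id in self.quarantined:
            return "deny"
        if agent_id in self.approval_required:
            return "needs_approval"
        return "allow"


policy = ContainmentPolicy()
policy.approval_required.add("agent-7")
print(policy.decide("agent-7"))  # needs_approval

policy.quarantined.add("agent-7")
print(policy.decide("agent-7"))  # deny
print(policy.decide("agent-1"))  # allow
```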