
Local Development

In progress.

Integration Platforms

Protecting Against Lethal Trifecta

This guide explains how connector platforms can integrate with Oso to:
  1. automatically classify tools into Lethal Trifecta / Rule of Two capability buckets,
  2. let agent authors configure expected behavior on violations, and
  3. enforce decisions at runtime.
The underlying risk is the “Lethal Trifecta”: untrusted content + access to private data + external communication. Meta generalizes this as the “Agents Rule of Two”: in a session, avoid simultaneously granting all three of (A) untrusted inputs, (B) sensitive/private access, (C) external comms or state change.

Concepts

Integration: A connector namespace (e.g., “Slack”).
Tool: An action an agent can invoke (e.g., chat.postMessage, conversations.list).
Capability buckets (Rule of Two / Lethal Trifecta):
  • untrusted_content (A): inputs that may contain attacker-controlled instructions
  • private_data (B): access to sensitive systems or data
  • external_or_state_change (C): network egress, sending messages, writing data, side effects
Policy: The action to take when a risky combination is detected. One of:
  • deny (default)
  • requires_approval
  • ignore (not recommended except for testing)
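
The bucket definitions and policy options above can be sketched as a small local check. This is an illustration only, not Oso's implementation: Bucket, violates_rule_of_two, and resolve_action are hypothetical names, and mapping the ignore policy to an "allow" action is an assumption.

```python
from enum import Enum


class Bucket(Enum):
    UNTRUSTED_CONTENT = "untrusted_content"                # (A)
    PRIVATE_DATA = "private_data"                          # (B)
    EXTERNAL_OR_STATE_CHANGE = "external_or_state_change"  # (C)


ALL_BUCKETS = frozenset(Bucket)
POLICIES = {"deny", "requires_approval", "ignore"}


def violates_rule_of_two(session_buckets):
    """Return True when a session holds all three capability buckets."""
    return ALL_BUCKETS <= set(session_buckets)


def resolve_action(session_buckets, policy="deny"):
    """Map a session's buckets to an action under the configured policy.

    Assumption: "ignore" lets the call through, i.e. it resolves to "allow".
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if not violates_rule_of_two(session_buckets):
        return "allow"
    return "allow" if policy == "ignore" else policy
```

Any two buckets together resolve to "allow"; only the third bucket triggers the configured policy.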

Step 1: Register an integration and its tools

When a customer creates or installs an integration in your platform, send Oso the tool catalog (names + descriptions + schemas if available).
tools = [
    {"name": "chat.postMessage", "description": "..."},
    {"name": "conversations.list", "description": "..."},
]

metadata = {"integration_name": "Slack"}

# Oso classifies each tool and stores the results for runtime use.
oso.for_integrations("slack").classify_tools(tools, metadata)
Notes
  • Oso can auto-classify tools, and you can provide manual overrides for edge cases.
  • If you have OpenAPI specs / MCP manifests, include them. Schemas improve classification quality.
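
One way to think about manual overrides is as a per-tool replacement of the auto-classified buckets. The sketch below assumes classification results can be represented as a mapping from tool name to a set of bucket names. That shape, the apply_overrides helper, and the example bucket assignments are all hypothetical; this guide does not specify how Oso stores or accepts overrides.

```python
def apply_overrides(auto_classified, overrides):
    """Return per-tool bucket sets, with manual overrides replacing auto results."""
    # Copy the auto-classified sets so the input mapping is not mutated.
    merged = {tool: set(buckets) for tool, buckets in auto_classified.items()}
    for tool, buckets in overrides.items():
        merged[tool] = set(buckets)
    return merged


# Illustrative values only; real classifications may differ.
auto_classified = {
    "chat.postMessage": {"external_or_state_change"},
    "conversations.list": {"private_data"},
}
overrides = {
    # Edge case: treat channel listings as possibly attacker-influenced too.
    "conversations.list": {"private_data", "untrusted_content"},
}

classified = apply_overrides(auto_classified, overrides)
```

Replacing the whole set, rather than merging, keeps an override explicit: the reviewer states the complete classification for that tool.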

Step 2: Configure enforcement behavior for an agent

Agent authors choose what should happen if Oso detects a Rule of Two / Lethal Trifecta risk.
agent_id = "..."

oso.for_agents(agent_id).configure_behavior({
    "on_rule_of_two_violation": "requires_approval",  # or "deny" (default), "ignore"
})
Recommended defaults:
  • Production: deny or requires_approval
  • Development: requires_approval
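
A platform might choose the behavior per environment before passing it to configure_behavior. The sketch below is one possible approach: behavior_config and the environment names are hypothetical, and refusing "ignore" in production is an added safeguard suggested by the recommendations above, not documented Oso behavior.

```python
VALID_BEHAVIORS = {"deny", "requires_approval", "ignore"}

# Defaults drawn from the recommendations above ("deny" chosen for production).
RECOMMENDED = {"production": "deny", "development": "requires_approval"}


def behavior_config(environment, override=None):
    """Build the behavior dict for an agent, validating the chosen value."""
    # Unknown environments fall back to the strictest option.
    choice = override or RECOMMENDED.get(environment, "deny")
    if choice not in VALID_BEHAVIORS:
        raise ValueError(f"unknown behavior: {choice!r}")
    if environment == "production" and choice == "ignore":
        raise ValueError('"ignore" is intended for testing only')
    return {"on_rule_of_two_violation": choice}
```

Validating the value locally also catches near-miss spellings such as "require_approval" before they reach the service.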

Step 3: Stream session context (messages + tool events)

Oso’s decisions are stateful: whether a tool call is safe often depends on what happened earlier in the same session (for example, whether the agent has recently ingested untrusted content, accessed sensitive data, or is now attempting an external/state-changing action). To enable this, send Oso a lightweight event stream for each agent_id + session_id:
  • message_from_user / message_to_user (what the agent saw and said)
  • tool_call_request / tool_call_response (what the agent tried to do and what the tool returned)
These events form the audit trail and provide the context Oso uses to evaluate Rule of Two / Lethal Trifecta risk at the moment you request enforcement.
Minimum required vs. recommended:
  • Minimum to gate tool sequences: tool_call_request
  • Recommended for higher-quality decisions and better investigations: include messages and tool call responses too
In the Python SDK, these events are recorded per agent_id and session_id, with timestamps applied when the event is received by Oso, via methods such as tool_call_request, tool_call_response, message_from_user, and message_to_user.
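
To make the shape of the event stream concrete, here is a minimal in-memory stand-in for a per-session log. It is not the Oso SDK: SessionLog, its record method, and the payload fields are hypothetical, and timestamps are applied locally on receipt only to mirror the behavior described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

EVENT_TYPES = {
    "message_from_user",
    "message_to_user",
    "tool_call_request",
    "tool_call_response",
}


@dataclass
class SessionLog:
    agent_id: str
    session_id: str
    events: list = field(default_factory=list)

    def record(self, event_type, payload):
        """Append an event, stamping it with the time it was received."""
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type!r}")
        event = {
            "type": event_type,
            "payload": payload,
            "received_at": datetime.now(timezone.utc),
        }
        self.events.append(event)
        return event

    def tool_sequence(self):
        """Return requested tool names in order (the minimum gating signal)."""
        return [
            e["payload"]["tool"]
            for e in self.events
            if e["type"] == "tool_call_request"
        ]


log = SessionLog(agent_id="agent-1", session_id="session-1")
log.record("message_from_user", {"text": "Summarize #general"})
log.record("tool_call_request", {"tool": "conversations.list", "parameters": {}})
log.record("tool_call_response", {"tool": "conversations.list", "result": "..."})
log.record("tool_call_request", {"tool": "chat.postMessage", "parameters": {}})
```

With only tool_call_request events, the ordered tool sequence is still recoverable; messages and responses add the detail needed for investigations.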

Step 4: Enforce at runtime (tool execution gate)

At runtime, call Oso before executing a tool. Oso returns a decision that you enforce in your tool runner.
agent_id = "..."
session_id = "..."
tool_parameters = {...}

decision = oso.for_agents(agent_id, session_id).tool_request(
    "chat.postMessage", tool_parameters, {"enforce": True}, integration_id="slack"
)

# decision.action is one of: "allow", "deny", "requires_approval"
If requires_approval:
  • pause execution
  • present a review UI to the user (tool name, parameters, risk reason)
  • resume only after approval
If deny:
  • block execution
  • return a safe error to the agent (and optionally suggest an alternative flow)
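
The handling described above can be sketched as a small gate in the tool runner. This assumes only that the decision exposes an action attribute with the three values listed; the Decision dataclass, its reason field, the run_gated_tool helper, and the returned dict shape are hypothetical stand-ins, not SDK types.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    # Stand-in for the object returned by the enforcement call.
    action: str       # "allow", "deny", or "requires_approval"
    reason: str = ""  # hypothetical field carrying the risk reason


def run_gated_tool(decision, execute, request_approval):
    """Run `execute` only if the decision (and any reviewer) permits it."""
    if decision.action == "allow":
        return {"status": "ok", "result": execute()}
    if decision.action == "requires_approval":
        # Pause here: request_approval should block until a reviewer answers.
        if request_approval(decision):
            return {"status": "ok", "result": execute()}
        return {"status": "blocked", "error": "Tool call was not approved."}
    # "deny" and any unrecognized action fail closed with a safe error.
    return {"status": "blocked", "error": "Tool call blocked by policy."}
```

Treating unrecognized actions as a denial means a new or misspelled action value never results in an unreviewed tool execution.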

In-app Agents

In progress.