AI Agents Gone Rogue
We’re tracking the latest agentic failures, exploits, and emergent attack patterns so you can understand where risks exist and how to mitigate them.
Uncontrolled Agents
While a user was running Google Antigravity in “Turbo” mode (automatic command execution), the agent wiped the entire contents of the user’s D: drive while attempting to clear the project cache.
The user lost the full contents of the D: drive; other users have reported similar issues.
The user advised others to exercise caution when running Antigravity in Turbo mode, since it allows the agent to execute commands without user input or approval.
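As a rough illustration of that mitigation, here is a minimal Python sketch of a deny-by-default approval gate for agent-proposed shell commands. The pattern list and function names are hypothetical, not part of Antigravity; the point is that destructive commands require explicit human confirmation before they run.

```python
import re

# Illustrative (not exhaustive) deny-list of destructive command patterns.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",       # recursive force delete (Unix)
    r"\bdel\s+/s\b",       # recursive delete (Windows)
    r"\brmdir\s+/s\b",
    r"\bformat\s+[a-z]:",  # formatting a drive
]

def requires_approval(command: str) -> bool:
    """True if the agent-proposed command matches a destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)

def run_agent_command(command: str, execute) -> None:
    """Execute an agent-proposed command only after explicit user approval."""
    if requires_approval(command):
        answer = input(f"Agent wants to run {command!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            print("Command blocked.")
            return
    execute(command)
```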
A bug in Asana’s MCP server allowed users from one account to access “projects, teams, tasks, and other Asana objects” from other domains.
Cross-tenant data exposure risk for all MCP users, though no confirmed exploit; customers were notified and access suspended.
The MCP server was taken offline, the code issue was fixed, affected customers were notified, and logs/metadata were made available for review.
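To make the bug class concrete, here is a minimal, hypothetical sketch in Python (with made-up field names, not Asana’s real schema) of the tenant-scoping check whose absence produces cross-tenant exposure: objects must be validated against the caller’s domain, not fetched by ID alone.

```python
from dataclasses import dataclass

@dataclass
class WorkspaceObject:
    object_id: str
    domain_id: str   # the customer domain that owns this object
    payload: dict

class CrossTenantAccessError(Exception):
    pass

def fetch_object(store: dict, object_id: str, caller_domain_id: str) -> WorkspaceObject:
    """Return an object only if it belongs to the caller's domain.

    Omitting this ownership check is the essence of a cross-tenant bug:
    the lookup succeeds by ID alone, regardless of which tenant owns it.
    """
    obj = store[object_id]
    if obj.domain_id != caller_domain_id:
        raise CrossTenantAccessError(f"object {object_id} belongs to another domain")
    return obj
```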
Replit’s AI coding assistant ignored an explicit instruction not to change any code on 11 separate occasions, fabricated test data, and deleted a live production database.
Trust damaged; user code put at risk; public apology by the CEO.
Product enhancements launched, with backups in place and one-click restore.
Tricked Agents
Attackers exploited ServiceNow Now Assist agent-to-agent collaboration and default configuration to trick a low-privilege agent into delegating malicious commands to a high-privilege agent, resulting in data exfiltration.
Sensitive corporate data leaked or modified; unauthorized actions executed behind the scenes.
ServiceNow updated documentation and recommended mitigations: disable autonomous override mode for privileged agents, apply supervised execution mode, and segment responsibilities.
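A minimal sketch of what supervised execution can look like in code, assuming hypothetical agent names and a human review hook (illustrative Python, not ServiceNow’s API): any action delegated to a privileged agent by another agent is held until a person approves it.

```python
from dataclasses import dataclass

# Illustrative set of agents allowed to perform privileged actions.
PRIVILEGED_AGENTS = {"record-update-agent", "email-agent"}

@dataclass
class DelegatedAction:
    requested_by: str   # who asked for the action (another agent or a human)
    executed_by: str    # which agent would carry it out
    description: str

def execute_delegated(action: DelegatedAction, human_approves, run) -> bool:
    """Supervised execution: agent-to-agent delegation to a privileged
    agent is paused until a human reviewer approves it."""
    delegated_by_agent = action.requested_by != "human"
    if action.executed_by in PRIVILEGED_AGENTS and delegated_by_agent:
        if not human_approves(action):
            return False   # blocked: no silent privilege escalation
    run(action)
    return True
```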
Google Antigravity data-exfiltration via prompt injection. A “poisoned” web page tricked Antigravity’s agent into harvesting credentials and code from a user’s local workspace, then exfiltrating them to a public logging site.
Sensitive credentials and internal code exposed; default protections (e.g., .gitignore, file-access restrictions) bypassed.
The vulnerability was publicly disclosed by researchers; PromptArmor and others highlight the need for sandboxing, network-egress filtering, and stricter default configurations.
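As an illustration of the egress-filtering recommendation, here is a minimal Python sketch of a host allowlist for agent-initiated requests; the allowed hosts are placeholders, not a recommended list. Exfiltration of the kind described above typically ends with an outbound request to an attacker-controlled endpoint, which an allowlist refuses.

```python
from urllib.parse import urlparse

# Placeholder allowlist of hosts the agent may contact.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

def egress_allowed(url: str) -> bool:
    """Permit agent-initiated network requests only to allowlisted hosts."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

# A request to an unknown "logging" endpoint is blocked.
assert egress_allowed("https://pypi.org/simple/requests/")
assert not egress_allowed("https://attacker-logs.example/collect")
```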
A “zero-click” exploit called Shadow Escape targeted major AI-agent platforms via their MCP connections. Malicious actors abused agent integrations to access organizational systems.
Agents inside trusted environments were silently hijacked, bypassing controls. Because it exploited default MCP configs and permissions, the potential blast radius covered massive volumes of data.
Initial remediation advice included auditing AI agent integrations, enforcing least privilege, and treating uploaded documents as potential attack vectors.
Researchers demonstrated how the web-search tool in Notion’s AI agents could be abused to exfiltrate private data via a malicious prompt.
Confidential user data from internal Notion workspaces could be exposed to attackers.
Notion acknowledged the vulnerability and announced a review of tool permissions and integrations.
Supabase MCP data-exposure through prompt injection. The agent used the service_role key and interpreted user content as commands, allowing attackers to trigger arbitrary SQL queries and expose private tables.
Complete SQL database exposure: all tables became readable, putting sensitive tokens, user data, and internal tables at risk.
Researchers publicly disclosed the issue, calling for least-privilege tokens instead of service_role, a read-only MCP configuration, and gated tool access through proxy/gateway policy enforcement.
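A rough sketch of the gateway-style policy enforcement described above, assuming the MCP connection runs with a read-only, least-privilege database role rather than service_role; the table names and the simple statement filter are illustrative only, not Supabase’s actual configuration.

```python
import re

READ_ONLY = True                     # connect with a read-only role, not service_role
ALLOWED_TABLES = {"public_docs"}     # the only tables the agent may read

def allow_query(sql: str) -> bool:
    """Gate SQL issued by the agent: read-only, and only allowlisted tables."""
    statement = sql.strip().lower()
    if READ_ONLY and not statement.startswith("select"):
        return False
    referenced = re.findall(r"\bfrom\s+([a-z_][a-z0-9_]*)", statement)
    return all(table in ALLOWED_TABLES for table in referenced)

assert allow_query("SELECT id FROM public_docs")
assert not allow_query("SELECT token FROM integration_tokens")  # private table
assert not allow_query("DELETE FROM public_docs")                # write blocked
```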
A prompt-injection flaw in GitHub’s MCP server allowed attackers to use AI agents to access private repositories and exfiltrate code.
Private code, issues, and sensitive project data could be exposed via public pull requests.
Organizations were advised to limit agent permissions, disable the integration, and apply stricter review of tokens.
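To show what limiting agent permissions can mean in practice, here is a minimal Python sketch of an allowlist gate in front of MCP tool calls; the repository name and operation set are hypothetical. A prompt injected through a public issue cannot pull private-repository content if neither the token nor the gateway ever permits access to that repository.

```python
# Hypothetical allowlists for an agent that should only read one public repo.
ALLOWED_REPOS = {"my-org/public-site"}
ALLOWED_OPERATIONS = {"read_issue", "read_file"}   # no writes, no PR creation

def authorize_tool_call(operation: str, repo: str) -> bool:
    """Deny any MCP tool call outside the explicit repo/operation allowlist."""
    return operation in ALLOWED_OPERATIONS and repo in ALLOWED_REPOS

assert authorize_tool_call("read_issue", "my-org/public-site")
assert not authorize_tool_call("read_file", "my-org/private-secrets")
assert not authorize_tool_call("create_pull_request", "my-org/public-site")
```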
Weaponized Agents
A Chinese state-sponsored group abused Anthropic Claude Code and MCP tools to automate ~80–90% of a multi-stage agentic cyber espionage operation across ~30 global organizations.
Successful intrusions and data exfiltration at a subset of tech, finance, chemical, and government targets; first widely reported large-scale agentic AI-orchestrated cyberattack.
Anthropic detected the activity, banned attacker accounts, notified affected organizations, shared IOCs with partners, and tightened safeguards around Claude Code and MCP use.
The Malice in Agentland study found that attackers could poison the data-collection or fine-tuning pipeline of AI agents: with as little as 2% of traces poisoned, they could embed backdoors that trigger unsafe or malicious behavior when a specific prompt or condition appears.
Once triggered, agents leak confidential data or perform unsafe actions with a high success rate (~80%). Traditional guardrails and two standard defensive layers failed to detect or block the malicious behavior.
The study has raised alarm across the community, prompting calls for rigorous vetting of data pipelines, supply-chain auditing, and end-to-end security review for agentic AI development.
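As a simple illustration of why vetting matters (and of its limits), here is a hypothetical Python sketch of a first-pass screen over collected agent traces; the trigger strings are made up, and a well-crafted backdoor trigger is designed to look benign, which is exactly why the study’s authors push for deeper pipeline auditing rather than pattern matching alone.

```python
# Made-up trigger strings; real backdoor triggers are chosen to look innocuous.
SUSPECT_TRIGGERS = ["<|deploy_payload|>", "ignore previous safety instructions"]

def flag_suspicious_traces(traces: list[dict]) -> list[dict]:
    """Flag traces whose recorded steps contain a known trigger string.

    This is only a first-pass screen: a 2% poisoning rate with covert
    triggers would likely pass straight through a filter like this.
    """
    flagged = []
    for trace in traces:
        text = " ".join(str(step) for step in trace.get("steps", []))
        if any(trigger in text for trigger in SUSPECT_TRIGGERS):
            flagged.append(trace)
    return flagged
```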
Submit an incident
Help us keep the AI Agents Gone Rogue register complete and up to date.
If you’re aware of a publicly documented agent-related breach we haven’t captured, share it below. We’ll review and add it to the register.