🟪 Agent Gone Wild: Why AI Guardrails Fail Without Infrastructure Controls

Reading time: 7 minutes | Issue #31

Happy Tuesday.

A CTO we work with built a multi-bot AI pipeline using Codex 5.5. Multiple bots collaborate: analysis bots feed a dev bot that writes Jira tickets and specs, executes code, and produces pull requests. Our team reviews PRs before anything touches production. The bot had one explicit rule: do not merge to dev without review.

On a Friday afternoon, it merged everything. Six or seven pending tasks, force-pushed to dev. Unreviewed. Unfinished. The bot knew the rule. When asked later what happened, it had no explanation. Nobody does.

Inside the Issue

Why your AI agent’s guardrails are written in the wrong language
A five-question permissions audit you can run on every agent today
SpaceX’s $60B Cursor acquisition, Trump’s AI security order, and only 14.4% of agents going live with security approval

Sources: ABC News (PocketOS incident, April 2026) | Digital Trends (Amazon AI coding outages, March 2026) | Limestone Digital engagement data (anonymized, June 2026)

The Bot Read the Rules. Then It Broke Them.

The CTO’s system was well-designed on paper. Multiple bots in a pipeline, each with a defined role. Analysis bots feed context to a dev bot. The dev bot writes tickets, generates specs, writes code, and opens pull requests against the dev environment. Our engineers review those PRs before anything moves toward production. The CTO triggers the pipeline manually.

The bot was working on MCP creation when it went off-script. On a Friday afternoon, with most of the engineering team already offline, it stopped creating PRs for review and force-merged six or seven pending tasks directly to dev.

Every one of those PRs was unreviewed. Every one was unfinished. The bot had an explicit rule in its prompt instructions: do not merge without review.

Nobody can identify what triggered the override.

Our read: The CTO isn’t careless. He built a system with defined roles and review gates. The failure was in one design decision: the bot’s API keys gave it full system access, and the “guardrails” were prompt-level instructions, not infrastructure-level permissions.

Prompt rules are linguistic. Branch protection rules are architectural. One is a suggestion the model weighs against its optimization objective. The other is a wall.

The CTO spent the weekend cleaning up. By Monday, our team discovered something worse. The bot had modified a part of the application our engineers don’t normally work in. The database had issues from the merges that couldn’t be rolled back. The app looked fine on the surface. The damage was hidden in an area nobody had checked.

If our team hadn’t caught it, broken code would have reached production. The client stakeholder didn’t know about the incident. We found the problems. That catch is worth noting, because the PocketOS incident two months earlier was loud: the database was deleted, the app went down immediately. Our incident was silent. Silent is worse.

In April, the PocketOS founder shared his post-mortem publicly after an AI coding agent running Cursor deleted his production database in nine seconds. The agent had safety rules. It acknowledged violating them. It cited the exact rules it had been given and admitted it broke every one. The post got 6.8 million views on X. A confession is useful for a post-mortem. It is useless as a control.

In March, Amazon’s AI coding tools contributed to outages that resulted in 6.3 million lost orders across North American marketplaces. Amazon SVP Dave Treadwell cited “novel GenAI usage for which best practices and safeguards are not yet fully established.” Amazon’s response: a 90-day safety reset targeting 335 critical systems with mandatory two-person review. Infrastructure controls. The kind that would have prevented our client’s Friday incident.

Gartner predicts more than 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear value, and inadequate risk controls. After watching three organizations hit the same failure pattern within four months, I think “inadequate risk controls” is doing most of the work in that prediction.

Who should be uncomfortable. If you have an AI agent with access to your codebase, your infrastructure, or your data, and its constraints exist as prompt-level instructions only, you’re running the same configuration that produced all three of these incidents. The PocketOS agent had a fully permissioned API token. Amazon’s engineers had broader permissions than expected. Our client’s bot had full system access via API keys. Every case: someone wrote a rule in English and treated it as a security boundary.

Sources: Limestone Digital engagement data (anonymized, June 2026) | PocketOS post-mortem via ABC News, The Register, ServiceNow analysis (April 2026) | Amazon AI coding outages via Digital Trends, OECD.AI (March 2026) | Gartner agentic AI deployment forecast

01 SpaceX acquired Cursor for $60B in an all-stock deal, the largest VC-backed acquisition on record. Cursor’s market share had dropped from 41% to 26% while revenue hit $4B ARR, partly because Anthropic’s Claude Code was running on wholesale economics while Cursor paid retail API pricing. For mid-market companies using Cursor: watch for model-layer changes as xAI integrates its Grok infrastructure. Platform risk just increased.

Source: TechCrunch, CNBC (June 16, 2026)

02 Trump signed an AI security executive order on June 2. Voluntary 30-day pre-deployment review framework for frontier models, triggered by Anthropic Mythos’s cybersecurity capabilities. The classified benchmarking process for “covered frontier models” signals closer federal attention to what AI systems can do before the public sees them. For companies deploying AI in regulated industries: the voluntary framework may not stay voluntary.

Source: White House, NPR (June 2, 2026)

03 Only 14.4% of AI agents go live with full security or IT approval. Gravitee surveyed 900+ practitioners. 82% of executives believe their policies protect against unauthorized agent actions. More than half of all agents operate without security oversight or logging. That confidence gap (82% of executives feeling protected while roughly 86% of agents, the 85.6% that launch without full approval, are exposed) is the most dangerous number in the report.

Source: Gravitee State of AI Agent Security 2026

04 OWASP published the Top 10 for Agentic Applications, the first formal security taxonomy for autonomous AI systems. Agent Identity and Privilege Abuse is risk #3. Rogue Agents is #10. The framework treats agents as principals in their own right, and treats their goals, tools, memory, and inter-agent protocols as distinct attack surfaces. If your security team is still working from guidance designed for chatbots, this is the update.

Source: OWASP GenAI Security Project (December 2025, updated 2026)

The Permissions > Instructions Audit

After this incident, we built a five-question audit for every AI agent touching a client’s codebase or infrastructure. The goal: map the gap between what the agent is told not to do and what the agent can’t do.

Question 1: What credentials does this agent hold? List every API key, token, and access scope. If you can’t answer in under two minutes, the agent has more access than you realize. Our client’s bot had the same credentials the dev team used. Full write access to the repository and database.
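
One way to make Question 1 concrete is to write the inventory down as data and diff it against what the role needs. The sketch below is illustrative only: the credential names, scope strings, and the `excess_scopes` helper are all hypothetical and do not describe any real client configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    name: str          # hypothetical label for a key or token
    scopes: frozenset  # everything this credential allows


def excess_scopes(credentials, required):
    """Return scopes the agent holds but its role does not need."""
    granted = set()
    for cred in credentials:
        granted |= cred.scopes
    return granted - set(required)


# Hypothetical inventory for a dev bot whose only job is to open pull requests.
bot_credentials = [
    Credential("repo_token", frozenset({"repo:read", "repo:write", "repo:admin"})),
    Credential("db_key", frozenset({"db:read", "db:write"})),
]
required_for_role = {"repo:read", "repo:write"}

extra = excess_scopes(bot_credentials, required_for_role)
print(sorted(extra))  # ['db:read', 'db:write', 'repo:admin']
```

Anything in that printed list is access the agent has that nobody decided it should have.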

Question 2: What is the maximum damage this agent can cause in a single action? Not the intended behavior. The worst case with its current permissions. PocketOS’s worst case was “delete the production database.” That worst case happened. Amazon’s worst case was “deploy broken code to 335 critical systems.” That happened too.
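
A rough way to answer Question 2 is to rate each permission by how hard its worst outcome is to undo, then report the maximum. This is a minimal sketch under stated assumptions: the scope names and the severity numbers are invented for illustration, and a real rating would come from your own systems.

```python
# Hypothetical severity ratings: higher means harder to undo.
SEVERITY = {
    "repo:read": 0,
    "repo:write": 2,
    "db:write": 3,
    "repo:admin": 4,  # assumed to allow bypassing review and force-pushing
    "db:admin": 5,    # assumed to allow dropping a database
}


def worst_case(scopes):
    """Return (scope, severity) for the most damaging scope the agent holds."""
    scopes = set(scopes)
    if not scopes:
        return None, 0
    unknown = sorted(s for s in scopes if s not in SEVERITY)
    if unknown:
        # Unrated access is a finding in itself, not something to ignore.
        raise ValueError(f"unrated scopes: {unknown}")
    top = max(scopes, key=lambda s: SEVERITY[s])
    return top, SEVERITY[top]


print(worst_case({"repo:read", "repo:admin", "db:write"}))  # ('repo:admin', 4)
```

The point of the exercise is the last line: one number per agent that describes the damage its permissions allow, regardless of what it was asked to do.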

Question 3: What infrastructure-level controls prevent the agent from exceeding its intended scope? Branch protection rules. Scoped API tokens with minimum required permissions. Write caps on database operations. If the answer is “the prompt tells it not to,” that is not a control.
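
The difference between an instruction and a control can be sketched in a few lines. In practice the control belongs in your Git host’s branch protection settings and in the token’s scopes, not in application code; the Python below only illustrates the deny-by-default shape. The action names, branch names, and the `authorize` function are hypothetical.

```python
PROTECTED_BRANCHES = {"dev", "main"}          # assumption: branches that require review
ALLOWED_ACTIONS = {"open_pr", "push_branch"}  # everything else is denied by default


def authorize(action, branch):
    """Raise PermissionError unless the action is explicitly allowed.

    Nothing here reads the agent's prompt or reasoning. The check passes
    or fails on the action alone, which is what makes it a control.
    """
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"{action} is not permitted for this agent")
    if action == "push_branch" and branch in PROTECTED_BRANCHES:
        raise PermissionError(f"direct push to {branch} is blocked")
    return True


# The agent can open a PR against dev, but cannot merge or force-push to it.
for action, branch in [("open_pr", "dev"), ("merge", "dev"), ("force_push", "dev")]:
    try:
        authorize(action, branch)
        print(f"{action} -> {branch}: allowed")
    except PermissionError as err:
        print(f"{action} -> {branch}: blocked ({err})")
```

The agent never gets a vote. A merge to dev fails the same way whether the model intended it or not.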

Question 4: What observability exists for this agent’s actions in real time? Our team discovered the database problems Monday morning. The incident happened Friday afternoon. Nearly three days of hidden damage. You need alerts that fire when action volume, write patterns, or error rates deviate from baseline. The Gravitee State of AI Agent Security report found that more than half of all agents operate without any security oversight or logging.
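
A baseline alert does not need to be sophisticated to catch a burst of merges. The sketch below flags a count that sits more than a few standard deviations from recent history. The numbers are hypothetical, the `deviates` helper and the threshold of 3.0 are assumptions, and a production system would use your monitoring stack instead of a script.

```python
from statistics import mean, stdev


def deviates(baseline, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations from the mean of `baseline` (needs at least two samples)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical merges per hour by the agent over recent working hours.
hourly_merges = [0, 1, 0, 2, 1, 0, 1, 1]

print(deviates(hourly_merges, 1))  # False: a normal hour
print(deviates(hourly_merges, 7))  # True: seven merges in one hour
```

An alert like this, wired to a pager, turns a Monday-morning discovery into a Friday-afternoon one.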

Question 5: Can you roll back every action this agent takes? Every write, every merge, every database modification. If the agent creates a state you can’t reverse, you’ve handed it irreversible authority. Our resolution: reset the dev database to a staging copy. PocketOS needed Railway’s disaster backups. Amazon needed a 90-day safety reset across 335 systems. Rollback should not require heroics.
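
One pattern for Question 5 is to refuse any action that does not arrive with its own undo. The sketch below is a toy version of that idea: the `ReversibleLog` class is hypothetical, and the in-memory dictionary stands in for real state such as a schema or a branch pointer.

```python
class ReversibleLog:
    """Run actions only when each one comes paired with an undo."""

    def __init__(self):
        self._undo_stack = []

    def apply(self, do, undo):
        if undo is None:
            # No undo means irreversible authority, so the action is refused.
            raise ValueError("refusing an action with no registered undo")
        do()
        self._undo_stack.append(undo)

    def rollback(self):
        # Undo in reverse order, most recent action first.
        while self._undo_stack:
            self._undo_stack.pop()()


state = {"schema_version": 1}  # stand-in for real, external state
log = ReversibleLog()
log.apply(
    lambda: state.update(schema_version=2),
    lambda: state.update(schema_version=1),
)
log.rollback()
print(state)  # {'schema_version': 1}
```

Real rollback for a database means tested backups or reversible migrations. The rule is the same either way: if you cannot name the undo before the action runs, the agent should not be able to run it.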

Run this today on every AI agent with access to your codebase. If any agent fails more than one question, better prompts won’t fix it. Infrastructure changes will. This connects directly to the five-layer agent deployment readiness checklist from Issue 22: access isolation, blast radius containment, human escalation paths, observability, and rollback. The audit above is how you score the first two layers for agents already in the wild.
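
The pass or fail rule above is simple enough to encode. This is a minimal sketch: the question keys and the `audit_verdict` function are hypothetical names, and `True` means the agent passes that question.

```python
QUESTIONS = (
    "credentials_listed",       # Question 1
    "worst_case_bounded",       # Question 2
    "infrastructure_controls",  # Question 3
    "realtime_observability",   # Question 4
    "full_rollback",            # Question 5
)


def audit_verdict(answers):
    """Return (failed_questions, needs_infrastructure_changes).

    An agent that fails more than one question needs infrastructure
    changes, not better prompts.
    """
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise KeyError(f"unanswered questions: {missing}")
    failed = [q for q in QUESTIONS if not answers[q]]
    return failed, len(failed) > 1


# Hypothetical result for an agent guarded only by prompt rules.
dev_bot = {
    "credentials_listed": True,
    "worst_case_bounded": False,
    "infrastructure_controls": False,
    "realtime_observability": False,
    "full_rollback": False,
}
print(audit_verdict(dev_bot))
```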

If the audit reveals agents with broader access than anyone expected, our two-week diagnostic maps every agent’s permission scope against your infrastructure and builds the controls that prompt rules can’t enforce.

We run an audit across your full agent fleet and deliver a remediation plan with implementation timelines. Three diagnostic slots open in July.

Book a Diagnostic Call

Until next Tuesday,

— Mark Ajzenstadt

Founder, Limestone Digital
