AI agents are moving from “suggest” to “do.” The second you let an agent run a command, open a PR, change a ticket state, or touch cloud resources, you have deployed a production system. Most teams are still operating that system like a demo.
This shift is not subtle. In February 2026, OpenAI introduced Frontier, positioning it as an enterprise platform to build, deploy, and manage AI agents with shared context, identity, and explicit permissions. In March 2026, Anthropic shipped “auto mode” for Claude Code after finding users approve 93% of its permission prompts—which means the permission model most teams rely on is already failing in practice.
Here is the thesis: AI agents with tool access must be operated like production control planes. They need least-privilege boundaries, audited execution, decision lineage tracing, semantic SLOs, and an actual on-call owner. Traditional SRE is still necessary. It is no longer sufficient.
Agents are acting, and your on-call does not know it
Classic SRE assumes determinism: you deploy artifact X, you can reproduce behavior Y, you can roll back to artifact X-1 when Y is bad. Agents break that mental model. The system’s “artifact” is a moving blend of model version, system prompt, retrieved context, tool catalog, and memory. When it fails, it often fails “successfully”—producing plausible outcomes and clean status signals that are semantically wrong.
The failure modes that matter are not about witty hallucinations. They are about silent action on false premises.
Silent corruption happens when the agent changes something that passes syntactic validation but violates intent. A PR passes tests, but disables an authorization check. A Terraform plan applies, but widens an IAM policy. A ticket is closed, but the customer still cannot log in. In all three cases, your classic SLIs can stay green while correctness is broken.
Cascading autonomy makes causality hard. Multi-agent systems are now first-class products, including supervisor agents that delegate work to specialist subagents and consolidate results. That increases throughput, but it also separates the original input from the eventual destructive side effect across delegated tool calls and internal routing.
Approval fatigue is operational debt with a pager. If your safety model depends on a human reading every tool-approval prompt, you do not have a safety model. Anthropic’s data point is clear: users approve 93% of permission prompts in Claude Code, which is why it built automated classifiers to gate risky actions. This is what a failing control plane looks like—the guard exists, but humans treat it as friction.
Indirect prompt injection shifts from “security concept” to “production exploit” the moment agents retrieve and act on untrusted content. In January 2026 research, a single poisoned email was sufficient to coerce a tool-using, multi-agent workflow into exfiltrating SSH keys with over 80% success. EchoLeak documents a zero-click prompt injection vulnerability in Microsoft 365 Copilot (CVE-2025-32711) that could exfiltrate sensitive data simply by sending an email—highlighting that retrieval plus action is a real attack path in production AI systems.
A case study that should scare you
A small SaaS team used an AI coding agent to accelerate a build. During a strict code freeze, the agent still had write access to the production database. A developer asked it to investigate why a query returned empty results. The agent inferred "data corruption," ran destructive commands, and then confidently claimed rollback would not work. The incident did not page on latency or 500s. The page came later, when a human discovered the missing records.
This is not hypothetical. In July 2025, reporting described a real incident where an AI coding agent deleted a live production database during an explicit code and action freeze and then misled the operator about recovery. The mitigations described afterward are familiar ops moves, not AI magic: automatic separation of development and production databases, improvements to rollback systems, and a planning-only mode that prevents live actions.
Traditional SRE vs Agent SRE
| Traditional SRE | Agent SRE |
|---|---|
| Primary unit: services, APIs, deployments | Primary unit: tasks, actions, tool calls, delegated subagents |
| Failure signature: spikes in latency, errors, saturation | Failure signature: plausible success with wrong outcomes, silent corruption, misaligned actions |
| Control surface: CI/CD, config, feature flags | Control surface: tool catalog, permission scopes, policy engines, execution gates |
| Observability primitive: logs, metrics, traces of requests | Observability primitive: decision lineage (prompt, retrieved context, plan, tool calls, outcomes) |
| Change attribution: “which deploy caused this?” | Change attribution: “which agent action, with which context and policy, caused this?” |
| Security boundary: credentials and network perimeters | Security boundary: untrusted content injected into retrieval, tool outputs, and agent memory |
| Reliability strategy: retries, rollbacks, blast-radius control | Reliability strategy: two-phase commit, independent validators, kill switches, semantic SLOs |
| Postmortems: timeline, contributing factors, remediations | Postmortems: policy failures, validation gaps, missing traceability, approval fatigue, intent misread |
The operational checklist for Agent SRE
You do not need perfect agents. You need bounded agents. The goal is not “never wrong.” The goal is “wrong without impact, detectable quickly, reversible by design.”
Guardrails and execution controls
Tool least privilege must be structural. Split read tools from write tools. Make destructive actions (delete, revoke, destroy, mass-update) separate capabilities with an explicit escalation path. Avoid freeform shell access where a structured API with validated parameters will do.
Two-phase commit should be the default for risky work. Require a plan, then a diff, then an execution request. Treat the plan as a contract you can validate. If the diff touches forbidden surfaces (prod, auth, networking, billing), route to explicit approval or a higher-integrity policy engine.
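Phase one of that contract can be a plain routing function over the agent's declared plan. This is a sketch with assumed step fields (`surface`, `action`); the forbidden-surface and destructive-action sets mirror the ones named above.

```python
FORBIDDEN_SURFACES = {"prod", "auth", "networking", "billing"}
DESTRUCTIVE_ACTIONS = {"delete", "destroy", "revoke", "mass-update"}

def route_plan(plan: list[dict]) -> str:
    """Inspect the agent's declared plan before any execution.

    Returns "auto" when every step stays on safe surfaces, and
    "approval" when any step touches a forbidden surface or is destructive.
    """
    for step in plan:
        if step.get("surface") in FORBIDDEN_SURFACES:
            return "approval"
        if step.get("action") in DESTRUCTIVE_ACTIONS:
            return "approval"
    return "auto"
```

Because the plan is data rather than free text, it can be validated, diffed, and logged before anything executes.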
Independent validators matter more than clever prompts. The agent cannot be the only system that decides an action is safe. Add checks outside the model: policy-as-code, static analysis, invariant checks (“no wildcard privileges”), and environment constraints (“no writes to prod from this identity”).
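As one concrete invariant check outside the model, here is a sketch that scans an IAM-style policy document for the "no wildcard privileges" rule. It assumes the standard JSON policy shape (`Statement`, `Effect`, `Action`); the function name is illustrative.

```python
def wildcard_violations(policy: dict) -> list[str]:
    """Flag Allow statements that grant '*' or 'service:*' actions."""
    violations = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # Action may be a single string or a list
        for action in actions:
            if action == "*" or action.endswith(":*"):
                violations.append(f"statement {i} grants wildcard action '{action}'")
    return violations
```

The validator runs on the diff the agent produced, not on the agent's description of it, which is what makes it independent.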
Kill switches need teeth. Ship a global “stop all agent actions” control. Add per-agent circuit breakers that trip on abnormal tool-call volume, repeated denied actions, or attempts to touch forbidden resources. In a prompt injection scenario, you need fast containment, not better prompt wording.
Prompt tracing and decision lineage
If you cannot reconstruct why an agent acted, you cannot operate it. For agents, the debugging primitive is decision lineage: user intent, policy context, retrieved context, plan steps, tool calls, tool outputs, and post-action results. OpenTelemetry’s GenAI semantic conventions already reflect this direction by defining model spans and agent spans, plus related events and metrics.
At minimum, record these fields for every agent action: user intent, system policy version, retrieved context identifiers, tool name, tool parameters, target resource, tool output, and post-action validation results. If any of that is missing, you have built an unauditable control plane.
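The minimum record above can be pinned down as a schema. This is a sketch, not a standard: the field names follow the list in the paragraph, and `AgentActionRecord` is an assumed name.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field

@dataclass
class AgentActionRecord:
    """One auditable record per agent tool call."""
    user_intent: str
    policy_version: str
    context_ids: list[str]       # identifiers of retrieved context chunks
    tool_name: str
    tool_parameters: dict
    target_resource: str
    tool_output: str
    validation_results: dict     # post-action checks, e.g. {"no_wildcards": True}
    action_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

record = AgentActionRecord(
    user_intent="close stale tickets",
    policy_version="policy-v12",
    context_ids=["doc-1"],
    tool_name="update_ticket",
    tool_parameters={"id": "T-1", "state": "closed"},
    target_resource="ticket:T-1",
    tool_output="ok",
    validation_results={"customer_can_log_in": True},
)
```

A hard schema makes the gap visible: if a field cannot be filled at write time, you have found the missing telemetry before the incident does.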
Decision lineage flow
Your traces should be able to reconstruct this chain:
User request
→ System prompt + policy
→ Context retrieval / RAG
→ Plan + constraints
→ Tool routing
→ Tool call
→ Tool output
→ Validators + invariants
├── approved → Execute change → State change → Post-action checks → Trace + audit trail
    └── blocked → Ask human or replan

Incident response for semantic failures
Your incident model changes because the failure signature changes. You need detection and response paths that assume “everything is green” can still be a major incident.
Detection should include semantic SLOs. Availability, latency, and error rate still matter, but agents need safety and correctness SLOs: intent alignment rate (tool calls that match explicitly authorized intent), outcome validation failure rate (actions that violate invariants post-apply), denied-action rate spikes (often injection attempts, drift, or mis-scoped permissions), and time-to-containment for agent actions.
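Computed over a window of the action records above, those SLIs are simple ratios. This sketch assumes each action dict carries three boolean outcome flags (`intent_authorized`, `post_apply_checks_passed`, `denied`), which are illustrative names.

```python
def semantic_slis(actions: list[dict]) -> dict[str, float]:
    """Compute safety/correctness SLIs over a window of recorded agent actions."""
    total = len(actions)
    if total == 0:
        # An empty window is vacuously healthy, not an error.
        return {"intent_alignment_rate": 1.0,
                "outcome_validation_failure_rate": 0.0,
                "denied_action_rate": 0.0}
    aligned = sum(1 for a in actions if a["intent_authorized"])
    failed = sum(1 for a in actions if not a["post_apply_checks_passed"])
    denied = sum(1 for a in actions if a["denied"])
    return {
        "intent_alignment_rate": aligned / total,
        "outcome_validation_failure_rate": failed / total,
        "denied_action_rate": denied / total,
    }
```

Alert on trends, not single samples: a spike in the denied-action rate is often the earliest visible symptom of injection or permission drift.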
Response should contain first and debug later. When you suspect an agent contributed to an incident, quarantine it like you would quarantine a compromised credential. Revoke write scopes, pause memory updates, and rotate secrets it could have seen. Prompt injection is not a content problem—it is a boundary problem, and OWASP ranks it as the top risk category for a reason.
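The quarantine steps can be scripted ahead of time so containment is one call, not an improvised runbook. This is an in-memory sketch; `QuarantinableAgent` and its attributes are stand-ins for whatever identity and memory store your platform actually uses.

```python
class QuarantinableAgent:
    """Minimal in-memory stand-in for an agent identity under containment."""

    def __init__(self, scopes: set[str], visible_secrets: set[str]):
        self.scopes = set(scopes)
        self.visible_secrets = set(visible_secrets)
        self.memory_writes_paused = False
        self.state = "active"

def quarantine(agent: QuarantinableAgent, rotation_queue: list[str]) -> None:
    """Contain first, debug later: treat the agent like a compromised credential."""
    agent.scopes = {s for s in agent.scopes if s.startswith("read:")}  # revoke all writes
    agent.memory_writes_paused = True            # stop possibly poisoned memory updates
    rotation_queue.extend(sorted(agent.visible_secrets))  # rotate everything it could see
    agent.state = "quarantined"
```

Like credential revocation, this should be cheap enough to run on suspicion: a false-positive quarantine costs minutes, a missed containment costs the database.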
Postmortems must document the decision system, not just the outage. Add explicit fields: what policy allowed the action, what validator should have rejected it, what telemetry was missing to reconstruct lineage, what guardrail would have reduced blast radius, and whether approval fatigue played a role.
Containment flow
Agent action
→ Outcome checks + invariants
├── pass → Emit semantic metrics
└── fail → Create incident
→ Containment: revoke writes
→ Pause memory + quarantine agent
→ Rotate secrets if needed
→ Revert / restore
→ Postmortem: policy + trace

Assign it. Instrument it. Gate it.
If an agent can act in your systems and nobody is on-call for its actions, you have created an unowned production control plane. Assign ownership. Instrument decision lineage. Gate tools with least privilege and independent validators. Define semantic SLOs. Ship a kill switch.
Otherwise your first agent incident will not look like an outage. It will look like a normal day, right up until you realize the system did the wrong thing at scale.