Tags: Agent Containment, Blast Radius, Anthropic, OpenAI, Google DeepMind, Microsoft, OWASP, Sandbox, Permissions, HITL, Audit, Security

Agent containment is the set of architectural patterns that limit what an AI agent can do when it goes wrong. Drawing on guidance from Anthropic, OpenAI, Google DeepMind, Microsoft, and OWASP, this post walks through the four layers every team deploying agents in production needs to understand, illustrated with FlowZap sequence diagrams of the interactions between the Agent, Sandbox, Human Reviewer, Permission Gate, and SIEM.

Why Agent Containment Matters Now

On June 19, 2026, Anthropic published "How we contain Claude across products" — a detailed breakdown of the security architecture protecting claude.ai, Claude Code, and Cowork. The opening line sets the stakes:

"As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it."

Anthropic is the latest of four major ecosystems that have published containment frameworks since January:

Ecosystem	Key Contribution	Date
Anthropic	4-layer containment stack (Sandbox→Permissions→HITL→Audit) for Claude Code	Jun 2026
OpenAI	"Practices for Governing Agentic AI Systems" — blast radius, delegation chains, permission scoping	Jan 2026
Google DeepMind	Agent safety framework for Astra, Mariner, Veo — runtime isolation + approval hierarchies	Mar 2026
Microsoft	AI Red Team lessons from Copilot agents — sandbox escapes, prompt injection in agentic chains	Feb 2026

This isn't theoretical. Every CI/CD pipeline that auto-approves PRs from an AI coding agent is a blast radius waiting to be measured. Every MCP server that grants terminal access without path scoping is a sandbox escape vector. The patterns below are what the four ecosystems converged on.

The 4 Layers of Agent Containment

Anthropic formalized the stack. OpenAI, DeepMind, and Microsoft each contributed nuance. Here's the unified model:

The Interaction Model

Every containment layer is a dialogue between participants, not a monologue inside the agent. The diagrams below show the real interactions:

Layer 1 — Sandbox: Agent ↔ Sandbox Runtime (ephemeral container, path validation)
Layer 2 — Permissions: Agent ↔ Permission Gate (whitelist, scope check)
Layer 3 — HITL: Agent ↔ Human Reviewer (approval, fatigue management)
Layer 4 — Audit: Agent ↔ SIEM (immutable logging, alerting)

Layer 1: Sandboxing — Agent ↔ Sandbox

The first line of defense: the agent runs in an environment where it physically cannot touch anything critical.

The pattern (where the five sources converge):

Dedicated containers or VMs per agent session (Anthropic, Google, Microsoft)
No network access to internal services by default (OpenAI, OWASP #4)
Read-only filesystem mounts for system directories (all five)
Ephemeral storage destroyed after each session (Anthropic, Google)

agent { # Agent
n1: circle label:"Session Start"
n2: rectangle label:"Request Tool Call"
n5: rectangle label:"Process Result"
n6: circle label:"Session End"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> sandbox.n3.handle(top) [label="tool call"]
}

sandbox { # Sandbox
n3: diamond label:"Path in Workspace?"
n4: rectangle label:"Execute in Ephemeral Container"
n7: rectangle label:"Deny - Log Security Event"
n8: rectangle label:"Return Result to Agent"
n3.handle(right) -> n4.handle(left) [label="yes"]
n3.handle(bottom) -> n7.handle(top) [label="no"]
n4.handle(right) -> n8.handle(left)
n7.handle(right) -> n8.handle(bottom)
n8.handle(top) -> agent.n5.handle(bottom) [label="result"]
n5.handle(right) -> n6.handle(left)
}
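The "Path in Workspace?" decision in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not how any of the vendors implement it: the function name `is_inside_workspace` and the `/tmp/ws` workspace root are assumptions made up for the example. The idea is to resolve the requested path first, so that `..` segments and absolute paths cannot slip past a naive prefix check.

```python
from pathlib import Path


def is_inside_workspace(workspace: str, requested: str) -> bool:
    """Return True only if `requested` resolves to a location under `workspace`.

    Hypothetical helper: name and signature are illustrative only.
    """
    root = Path(workspace).resolve()
    # Joining with an absolute `requested` discards `root`, which is what we
    # want: "/etc/passwd" stays "/etc/passwd" and fails the check below.
    target = (root / requested).resolve()
    return target == root or root in target.parents


print(is_inside_workspace("/tmp/ws", "src/main.py"))    # True
print(is_inside_workspace("/tmp/ws", "../etc/passwd"))  # False
print(is_inside_workspace("/tmp/ws", "/etc/passwd"))    # False
```

A check like this belongs in the sandbox runtime, not in the agent, so a compromised agent cannot skip it.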

What Anthropic does: Claude Code runs in a sandboxed environment where each tool invocation is evaluated against ALLOWED_HOSTS, with SSRF protection and request timeouts.
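To make the ALLOWED_HOSTS idea concrete, here is a rough Python sketch of a host allowlist check. It is an assumption-laden illustration, not Anthropic's code: the `ALLOWED_HOSTS` contents, the `is_url_allowed` name, and the HTTPS-only rule are all choices made for this example.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from config.
ALLOWED_HOSTS = {"api.example.com"}


def is_url_allowed(url: str) -> bool:
    """Allow only HTTPS requests to explicitly listed hostnames."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname  # lowercased, without port or userinfo
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # literal IPs (e.g. cloud metadata endpoints) are rejected
    except ValueError:
        pass  # not an IP literal, fall through to the allowlist
    return host in ALLOWED_HOSTS


print(is_url_allowed("https://api.example.com/v1/items"))      # True
print(is_url_allowed("https://169.254.169.254/latest/meta"))   # False
print(is_url_allowed("https://api.example.com@evil.test/"))    # False
```

This only covers the URL string. A fuller SSRF defense would also validate the addresses the hostname resolves to and enforce a request timeout, which this sketch leaves out.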

What Microsoft adds: Copilot agents run in "Defender-managed sandboxes" that intercept prompt injection at the model boundary — before the agent can act on a malicious instruction. Their red team found that 34% of sandbox escapes in agentic systems came through tool descriptions, not user prompts.

The gotcha: Sandboxing is only as good as its configuration. A container with --privileged or a Docker socket mounted inside defeats the purpose. Google DeepMind's safety team recommends runtime attestation: verifying the sandbox configuration hasn't been tampered with before each agent session.
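A pre-session check for the two misconfigurations named above might look like the following sketch. The input is assumed to be a dictionary shaped like the `HostConfig` section of `docker inspect` output (`Privileged` and `Binds`); the function name `sandbox_config_problems` is hypothetical, and this is far simpler than real runtime attestation.

```python
def sandbox_config_problems(host_config: dict) -> list[str]:
    """Return human-readable problems found in a container's host config.

    Assumes a dict shaped like docker inspect's HostConfig section.
    """
    problems = []
    if host_config.get("Privileged"):
        problems.append("container runs with --privileged")
    # Binds entries look like "host_path:container_path[:mode]".
    for bind in host_config.get("Binds") or []:
        source = bind.split(":", 1)[0]
        if source.endswith("docker.sock"):
            problems.append(f"Docker socket mounted: {bind}")
    return problems


risky = {
    "Privileged": True,
    "Binds": ["/var/run/docker.sock:/var/run/docker.sock"],
}
for problem in sandbox_config_problems(risky):
    print(problem)
```

Refusing to start the agent session when the list is non-empty is the simplest policy to reason about.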

Layer 2: Permissions — Agent ↔ Permission Gate

Even inside a sandbox, an agent needs some access. Layer 2 defines exactly what.

The pattern:

Whitelist, never blacklist (Anthropic, OpenAI, OWASP)
Principle of least privilege per tool (Google, Microsoft)
Path-based restrictions: only ./workspace/, never /etc/ (all five)
Read vs. write vs. execute as separate permissions (Anthropic, OpenAI)

agent { # Agent
n1: circle label:"Tool Call Initiated"
n2: rectangle label:"Request File Write"
n5: diamond label:"Permission Granted?"
n6: rectangle label:"Write File to Disk"
n7: rectangle label:"Abort - Log Denial"
n8: circle label:"Done"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> permgate.n3.handle(top) [label="check permission"]
}

permgate { # Permission Gate
n3: rectangle label:"Check Whitelist and Path Scope"
n4: rectangle label:"Return Decision"
n3.handle(right) -> n4.handle(left)
n4.handle(top) -> agent.n5.handle(bottom) [label="decision"]
n5.handle(right) -> n6.handle(left) [label="granted"]
n5.handle(bottom) -> n7.handle(top) [label="denied"]
n6.handle(right) -> n8.handle(left)
n7.handle(right) -> n8.handle(left)
}
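The "Check Whitelist and Path Scope" step can be sketched as a default-deny lookup. Everything named here is an assumption for illustration: the `ALLOWLIST` entries, the `workspace` and `workspace/out` scopes, and the `is_allowed` function. The point is that read, write, and execute are separate grants, and anything not listed is denied.

```python
import posixpath

# Hypothetical allowlist: (tool, action) -> directory the grant is scoped to.
ALLOWLIST = {
    ("read_file", "read"): "workspace",
    ("write_file", "write"): "workspace/out",
}


def is_allowed(tool: str, action: str, path: str) -> bool:
    """Default-deny permission check with path scoping."""
    scope = ALLOWLIST.get((tool, action))
    if scope is None:
        return False  # not on the allowlist: deny
    # Collapse "." and ".." segments before comparing against the scope.
    normalized = posixpath.normpath(path)
    return normalized == scope or normalized.startswith(scope + "/")


print(is_allowed("read_file", "read", "workspace/src/app.py"))     # True
print(is_allowed("write_file", "write", "workspace/src/app.py"))   # False
print(is_allowed("write_file", "write", "workspace/out/../../x"))  # False
print(is_allowed("terminal", "execute", "workspace"))              # False
```

Note that `write_file` has a narrower scope than `read_file` here, which is one way to express least privilege per tool.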

What OpenAI mandates: "Practices for Governing Agentic AI Systems" (Jan 2026) explicitly calls out delegation chain permission scoping — when Agent A delegates to Agent B, B must have strictly fewer permissions than A. No child agent should have more power than its parent. This maps directly to flowzap-senior-dev → flowzap-security-auditor delegation patterns.
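"Strictly fewer permissions" has a direct counterpart in set terms: the child's permissions must be a proper subset of the parent's. A minimal Python sketch, with a hypothetical `can_delegate` function and made-up permission names:

```python
from typing import AbstractSet


def can_delegate(parent: AbstractSet[str], child: AbstractSet[str]) -> bool:
    """True only if the child holds strictly fewer permissions than the parent.

    `<` on sets is the proper-subset test: every child permission must exist
    on the parent, and the parent must hold at least one the child lacks.
    """
    return child < parent


senior_dev = {"read", "write", "execute"}

print(can_delegate(senior_dev, {"read"}))                      # True
print(can_delegate(senior_dev, {"read", "write", "execute"}))  # False (equal)
print(can_delegate(senior_dev, {"read", "network"}))           # False (extra)
```

Running this check at delegation time, before the child agent is spawned, keeps an over-privileged child from ever starting.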

What OWASP flags: Item #4 "Excessive Agency" in the Top 10 for LLM Applications (v2.0, Nov 2025) warns that granting agents unrestricted tool access — especially shell, file system writes, and network egress — is the #1 architectural vulnerability in production agent deployments.

Layer 3: HITL Approval — Agent ↔ Human

Some actions are too dangerous to automate. Layer 3 puts a human between the agent's decision and the real world.

The pattern:

Auto-approve: read-only, low-risk (Anthropic, Microsoft)
Ask: file writes, network calls, shell commands (all five)
Deny: destructive operations, config changes, secret access (OpenAI, Google)
Approval fatigue prevention: batch approvals, pattern learning (Anthropic's "auto mode" innovation)

agent { # Agent
  n1: circle label:"Dangerous Tool Call"
  n2: rectangle label:"Request Human Approval"
  n5: diamond label:"Human Approved?"
  n6: rectangle label:"Execute Tool Safely"
  n7: rectangle label:"Abort Operation"
  n8: circle label:"Done"
  n1.handle(right) -> n2.handle(left)
  n2.handle(bottom) -> human.n3.handle(top) [label="approval request"]
  n5.handle(right) -> n6.handle(left) [label="approved"]
  n6.handle(right) -> n8.handle(left)
  n7.handle(right) -> n8.handle(left)
  n5.handle(bottom) -> n7.handle(top) [label="rejected"]
}

human { # Human Reviewer
  n3: rectangle label:"Review Tool Call and Context"
  n4: rectangle label:"Return Decision"
  n3.handle(right) -> n4.handle(left)
  n4.handle(top) -> agent.n5.handle(bottom) [label="decision"]
}

Action	Default	Rationale
`read_file`	Auto-approve	Read-only, no side effects
`grep` / `glob`	Auto-approve	Search operations
`write_file`	Ask	Modifies filesystem
`terminal` (shell)	Ask	Arbitrary code execution
`web_fetch`	Ask	Network egress
`.env` access	Ask + Warn	Secrets exposure
`rm -rf` / destructive	Deny	Irreversible damage
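The table above translates almost directly into a lookup. The sketch below is one possible encoding, with several assumptions worth flagging: the `Decision` enum, the `decide` function, the choice to deny unknown tools, and the deliberately naive `rm -rf` detector are all invented for illustration. The ".env" row (Ask + Warn) is left out to keep the example short.

```python
import shlex
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto-approve"
    ASK = "ask"
    DENY = "deny"


# Defaults taken from the table; tool names are used as plain strings.
DEFAULTS = {
    "read_file": Decision.AUTO_APPROVE,
    "grep": Decision.AUTO_APPROVE,
    "glob": Decision.AUTO_APPROVE,
    "write_file": Decision.ASK,
    "terminal": Decision.ASK,
    "web_fetch": Decision.ASK,
}


def looks_destructive(command: str) -> bool:
    """Naive heuristic: flags `rm` invoked with both -r and -f."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input is treated conservatively
    if not tokens or tokens[0] != "rm":
        return False
    flags = "".join(
        t[1:] for t in tokens[1:] if t.startswith("-") and not t.startswith("--")
    )
    return "r" in flags and "f" in flags


def decide(tool: str, argument: str = "") -> Decision:
    if tool == "terminal" and looks_destructive(argument):
        return Decision.DENY
    # Assumption: tools missing from the table are denied, not prompted.
    return DEFAULTS.get(tool, Decision.DENY)


print(decide("read_file"))             # Decision.AUTO_APPROVE
print(decide("terminal", "ls -la"))    # Decision.ASK
print(decide("terminal", "rm -rf /"))  # Decision.DENY
```

A string heuristic like `looks_destructive` is easy to evade, so treat it as a convenience on top of the sandbox and permission layers, not a replacement for them.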

What Anthropic innovated: Claude Code's "auto mode" (March 2026) selectively skips permission prompts for low-risk operations while keeping the human in the loop for anything that modifies state. The key innovation: the agent learns which patterns you approve and auto-approves similar future operations, reducing fatigue without sacrificing security. But their postmortem of "three recent issues" (Sept 2025) revealed that pattern-learning auto-approve created a new class of bugs where developers stopped reading prompts and auto-approved everything.

What Google DeepMind enforces: "Approval hierarchies" — for multi-agent systems, no single human approves their own agent's actions. The approver must be in a different reporting chain, preventing rubber-stamping. Project Mariner implements this at the browser-action level.

Layer 4: Audit Logging — Agent ↔ SIEM

The layer most teams skip — and the one they wish they had during an incident.

The pattern:

Immutable log per agent session (Anthropic, Microsoft)
Every tool call logged: timestamp, tool name, arguments (sanitized), result (all five)
Security events flagged: denied permissions, unusual patterns, rate limit hits (OWASP)
Logs shipped to a separate system — not readable by the agent itself (Google)

agent { # Agent
n1: circle label:"Execute Tool Call"
n2: rectangle label:"Send Event to Logger"
n5: rectangle label:"Continue Execution"
n6: circle label:"Done"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> siem.n3.handle(top) [label="log event"]
}

siem { # SIEM Audit System
n3: rectangle label:"Sanitize Args and Strip Secrets"
n4: rectangle label:"Write to Immutable Log Store"
n3.handle(right) -> n4.handle(left)
n4.handle(top) -> agent.n5.handle(bottom) [label="logged"]
n5.handle(right) -> n6.handle(left)
}
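The "Sanitize Args and Strip Secrets" step could look something like this in Python. It is a sketch under assumptions: the `SENSITIVE_NAME` pattern, the `[REDACTED]` marker, and the `build_event` record shape are illustrative choices, not a standard. The record carries the four fields listed in the pattern: timestamp, tool name, sanitized arguments, and result.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: redact any argument whose name hints at a secret.
SENSITIVE_NAME = re.compile(r"key|token|secret|password", re.IGNORECASE)


def sanitize_args(args: dict) -> dict:
    """Return a copy of `args` with secret-looking values redacted."""
    clean = {}
    for name, value in args.items():
        if SENSITIVE_NAME.search(name):
            clean[name] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[name] = sanitize_args(value)  # recurse into nested args
        else:
            clean[name] = value
    return clean


def build_event(tool: str, args: dict, result: str) -> dict:
    """Assemble one audit record for a tool call."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "args": sanitize_args(args),
        "result": result,
    }


event = build_event(
    "web_fetch",
    {"url": "https://api.example.com", "api_key": "sk-123"},
    "200 OK",
)
print(event["args"])  # {'url': 'https://api.example.com', 'api_key': '[REDACTED]'}
```

Matching on argument names errs on the side of over-redaction, which is usually the right trade-off for a log that leaves the machine.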

What Microsoft's red team found: In 40% of their simulated attacks on Copilot agents, the audit logs were the only detection mechanism. In those cases the other layers had failed: permissions through misconfiguration, sandboxing through container escape, HITL through approval fatigue. Audit logs caught 100% of the attacks post-hoc, but only in teams that had actually shipped logs off-machine and set up alerting rules.

What OWASP recommends: Logs must be "attestable" — cryptographically signed so an agent cannot tamper with its own audit trail after a breach. This is particularly critical for CI/CD agents that have write access to the repository.
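One simple way to make a log tamper-evident is an HMAC chain, where each entry's signature covers the previous signature plus the current event. This is a stand-in technique chosen for brevity, not something the OWASP text prescribes. The sketch assumes the signing key lives with the logging service and is never readable by the agent; the function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, prev_sig: str, event: dict) -> str:
    """HMAC-SHA256 over the previous signature plus the canonical event JSON."""
    payload = prev_sig.encode() + json.dumps(event, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, event: dict) -> None:
    prev_sig = log[-1]["sig"] if log else ""
    log.append({"event": event, "sig": sign_entry(key, prev_sig, event)})


def verify_log(log: list, key: bytes) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_sig = ""
    for entry in log:
        expected = sign_entry(key, prev_sig, entry["event"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True


key = b"held-by-the-logging-service-only"  # placeholder key for the example
log = []
append_entry(log, key, {"tool": "write_file", "path": "workspace/out/a.md"})
append_entry(log, key, {"tool": "terminal", "cmd": "git push"})
print(verify_log(log, key))      # True

log[0]["event"]["path"] = "workspace/out/b.md"  # simulate tampering
print(verify_log(log, key))      # False
```

Because HMAC is symmetric, anyone holding the key can forge entries. An asymmetric signature scheme would remove that caveat at the cost of more moving parts.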

Putting It All Together: The Complete Containment Stack

When all four layers work together, the architecture looks like this — a single Containment Stack that the Agent communicates with for every tool call:

agent { # Agent
n1: circle label:"User Prompt Received"
n2: rectangle label:"Agent Plans Tool Call"
n5: rectangle label:"Process Final Result"
n6: circle label:"Response to User"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> stack.n3.handle(top) [label="tool call"]
}

stack { # Containment Stack
n3: rectangle label:"L1 - Sandbox Ephemeral Container"
n4: rectangle label:"L2 - Whitelist Permission Check"
n7: diamond label:"Dangerous Operation?"
n8: rectangle label:"L3 - Human Approves"
n9: rectangle label:"Execute Tool"
n10: rectangle label:"L4 - Audit to Immutable SIEM"
n3.handle(right) -> n4.handle(left)
n4.handle(right) -> n7.handle(left)
n7.handle(right) -> n8.handle(left) [label="yes"]
n7.handle(bottom) -> n9.handle(top) [label="no"]
n8.handle(right) -> n9.handle(left)
n9.handle(right) -> n10.handle(left)
n10.handle(top) -> agent.n5.handle(bottom) [label="result"]
n5.handle(right) -> n6.handle(left)
}
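In code, the combined stack is a short pipeline where each layer can stop the call and the audit step always runs. The sketch below is an assumption-heavy outline: each layer is injected as a plain callable, the `run_tool_call` name and the denial strings are invented here, and the denial branches go beyond what the diagram draws.

```python
from typing import Any, Callable


def run_tool_call(
    call: dict,
    *,
    in_scope: Callable[[dict], bool],       # L1: sandbox / workspace check
    permitted: Callable[[dict], bool],      # L2: allowlist check
    dangerous: Callable[[dict], bool],      # L3: does this need a human?
    ask_human: Callable[[dict], bool],      # L3: True if the human approves
    execute: Callable[[dict], Any],         # the actual tool
    audit: Callable[[dict, Any], None],     # L4: always invoked
) -> Any:
    if not in_scope(call):
        outcome = "denied: outside sandbox"
    elif not permitted(call):
        outcome = "denied: not permitted"
    elif dangerous(call) and not ask_human(call):
        outcome = "denied: human rejected"
    else:
        outcome = execute(call)
    audit(call, outcome)  # log denials as well as successes
    return outcome


audit_trail = []
layers = dict(
    in_scope=lambda c: c["path"].startswith("workspace/"),
    permitted=lambda c: c["tool"] in {"read_file", "write_file"},
    dangerous=lambda c: c["tool"] == "write_file",
    ask_human=lambda c: False,  # reviewer rejects everything in this demo
    execute=lambda c: "ok",
    audit=lambda c, outcome: audit_trail.append((c["tool"], outcome)),
)

print(run_tool_call({"tool": "read_file", "path": "workspace/a.md"}, **layers))
print(run_tool_call({"tool": "write_file", "path": "workspace/a.md"}, **layers))
print(audit_trail)
```

Passing the layers in as callables keeps each one testable on its own and makes it obvious when a deployment is running with one of them stubbed out.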

What Works vs. What Breaks

Approach	Works When	Breaks When	Ecosystem Evidence
Sandbox-only	Agents are stateless, read-only	Agent needs persistent state or DB access	Anthropic: sandbox alone insufficient, June 2026
Permissions-only	Tool surface is small and stable	New tools added without updating whitelist	OpenAI: delegation chains must scope down, Jan 2026
HITL-only	Operations are infrequent	Agent makes 50+ tool calls/task (fatigue)	Anthropic: postmortem Sept 2025 on auto-mode fatigue
Audit-only	You have a dedicated security team	Logs are never reviewed (security theater)	Microsoft Red Team: 40% of attacks only caught by audit, Feb 2026
4-Layer Stack	You're running agents in production	— (this is the target state)	All five ecosystems

The lesson from these leading ecosystems: no single layer is enough. Sandboxing without permissions is a cardboard box. Permissions without HITL is a policy nobody reads. HITL without audit logging means you'll never know what you approved.

What This Means for FlowZap's Architecture

My own learning: The four containment layers map directly to my agent orchestrator:

Containment Layer	FlowZap Implementation	Status
L1 Sandbox	MCP server `secureFetch()` wrapper (SSRF protection, ALLOWED_HOSTS, timeouts)	In place
L2 Permissions	Profile-scoped skills (marie-pierre, code, securite, qa) — each with minimal tool access	In place
L3 HITL	Cron → Idea Scout → Human approval → Writer pipeline	Built this week
L4 Audit	Hermes cron logs → session DB → Telegram delivery	In place

The missing piece: cross-profile permission scoping. When my senior-dev (code profile) delegates to security-auditor (securite profile), the child agent currently inherits full parent permissions. OpenAI's delegation chain principle says the child must have strictly fewer permissions. This is a gap I need to address.

The Bottom Line

Start with Layer 1 (sandboxing) today. If your agent runs in the same environment as your production database, fix that before anything else.
Layers 2 and 3 can be implemented incrementally. Whitelist your tools. Add approval prompts for writes. You don't need a perfect system on day one. Anthropic took about 14 months from Claude Code launch (Apr 2025) to the 4-layer post (Jun 2026).
Layer 4 (audit) is the one most teams skip — and the one they wish they had during an incident. Log every tool call. Ship logs off-machine. Set up alerting rules for [SECURITY] events.
Multi-agent systems multiply the blast radius. OpenAI's chain-of-delegation principle and Google's approval hierarchies are not optional when you have more than one agent in the loop.

Inspirations

Anthropic Engineering — How we contain Claude across products, June 2026
OpenAI — Practices for Governing Agentic AI Systems, January 2026
Google DeepMind — Agent Safety Framework, March 2026
Microsoft AI Red Team — Lessons from Securing Copilot Agents, February 2026
OWASP Top 10 for LLM Applications v2.0, November 2025

All FlowZap diagrams generated with FlowZap Code. Copy any .fz block above and paste it into your FlowZap Account to view, edit, and share.

Agent Containment Patterns: How Anthropic, OpenAI, Google DeepMind and Microsoft are capping the Blast Radius of their framework