Key insight
An agent runtime that shares the operator's process, filesystem, and credentials has the same blast radius as the operator. Bounded sandbox isolation — read-only target mounts, restricted network egress, ephemeral execution — is the same architectural pattern that browsers, container platforms, and CI runners adopted years ago. AI agents are the next workload that needs it.
The default is "share everything"
Most agent runtimes ship as a process the operator launches from their editor or shell. The process inherits the operator's user identity, environment variables, working directory, and network reach. There is no boundary between "the agent" and "the operator's desktop." Whatever capabilities the operator's account possesses, the agent possesses by default.
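The inheritance described above can be observed directly. The following is a minimal Python sketch, not taken from any agent framework: it lists the environment variables whose names suggest a secret, then compares what a child process sees with the inherited environment versus a scrubbed one. The name pattern and the helper names (`credential_like_vars`, `scrubbed_env`, `visible_to_child`) are assumptions made for illustration.

```python
import os
import re
import subprocess
import sys

# Hypothetical heuristic: variable names that often hold secrets.
SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_KEY|ACCESS_KEY)", re.IGNORECASE)


def credential_like_vars(env):
    """Return the sorted names of variables whose names suggest a secret."""
    return sorted(name for name in env if SECRET_NAME.search(name))


def scrubbed_env(env, keep=("PATH", "LANG")):
    """Build a minimal environment containing only explicitly kept variables."""
    return {name: env[name] for name in keep if name in env}


def visible_to_child(env):
    """Launch a child Python process with `env` and return the variable names it sees."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print('\\n'.join(sorted(os.environ)))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Everything printed here is available to any agent launched from this shell.
    print(credential_like_vars(os.environ))
```

Scrubbing the environment only addresses one inheritance path. The child still runs as the same OS user with the same filesystem and network reach, which is why the sections below argue for a sandbox boundary rather than environment hygiene alone.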
For exploratory use this is fine. For sustained use against material workloads, it is a design choice with consequences. The blast radius of any compromise — the unauthorised tool call from a successful prompt injection, the exfiltration from a compromised dependency, the misinterpreted instruction in a poisoned file — extends to whatever the operator can reach. For an engineer logged into source-control systems, cloud platforms, identity providers, and observability stacks, "whatever the operator can reach" is a large surface.
Why the operator's bound is the wrong bound
The operator's permissions exist because the operator has work to do. Most of that work has nothing to do with what any single agent session is supposed to accomplish. The mismatch between "what the operator can do" and "what the agent needs to do" is the gap that bounded sandboxes were invented to address.
This is not a new pattern. Web browsers run scripts in sandboxes precisely because the script's intended scope is much smaller than the user's. Container runtimes apply namespace and capability restrictions to processes for the same reason. Continuous-integration platforms run untrusted code in disposable runners for the same reason again. The AI agent is the next member of the same category.
The Shared Identity Runtime anti-pattern
Definition. An AI agent runtime executes inside the operator's user identity, with full filesystem access, full network reach, and direct access to all credentials present in the operator's environment — for the duration of every session, regardless of workload sensitivity.
Symptoms. No container or virtual-machine isolation around the agent runtime; no read-only mount distinguishing the analysed target from the host filesystem; no egress allow-list at the runtime boundary; credentials accessible to the operator are accessible to the agent without further mediation.
Why it is hazardous. The agent's worst-case behaviour is limited only by the operator's full permissions across every connected system. Defence-in-depth controls at narrower layers (tool catalogues, credential scopes, approval gates) compose better when there is a containing sandbox; without one, each control is the last line of defence for whatever it guards.
Related controls. Ephemeral container or virtual-machine isolation; read-only mount of the analysed target; egress allow-list at the runtime boundary; bounded execution time; a capability-scoped broker for credentials, mediated by the runtime rather than handed wholesale to the agent.
The three-zone architecture
The defensible posture for agent execution organises the system into three trust zones with explicit boundaries between them.
The properties that matter:
- The agent runtime is ephemeral. Each session starts a fresh container or virtual machine; the agent's mutable state does not persist between sessions.
- The target is read-only. Whatever source or content the agent is analysing is mounted into the sandbox without write permission. A successful prompt injection cannot persist back into the analysed material.
- Egress is mediated. The agent reaches external systems only through the capability broker, which enforces a per-session allow-list. Network paths the agent does not need are not reachable.
- Credentials are not handed wholesale. The broker mints credentials for specific calls, scoped to specific verbs, and discards them after use. The agent never holds the operator's full credential set.
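The last two properties can be sketched together. The following Python example is a minimal illustration under stated assumptions: the `CapabilityBroker` and `ScopedCredential` names, the `(host, verb)` allow-list shape, and the `transport` callable are all hypothetical, and a real broker would obtain credentials from a vault or workload-identity service rather than generate random tokens.

```python
import secrets
import time
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ScopedCredential:
    """A short-lived credential valid for one host and one verb (hypothetical shape)."""
    token: str
    host: str
    verb: str
    expires_at: float


class CapabilityBroker:
    """Mediates every outbound call against a per-session allow-list."""

    def __init__(self, allowed, transport, ttl_seconds=30.0):
        # allowed: iterable of (host, verb) pairs permitted for this session.
        self._allowed = frozenset(allowed)
        # transport: callable(url, verb, credential) that performs the real call.
        self._transport = transport
        self._ttl = ttl_seconds

    def call(self, url, verb):
        """Perform one call on the agent's behalf, or refuse it."""
        host = urlsplit(url).hostname
        if (host, verb) not in self._allowed:
            raise PermissionError(f"{verb} {host} is not on the session allow-list")
        # Stand-in for minting: a real broker would request this from a vault.
        credential = ScopedCredential(
            token=secrets.token_urlsafe(16),
            host=host,
            verb=verb,
            expires_at=time.time() + self._ttl,
        )
        # The credential goes to the transport only; the agent receives the result.
        return self._transport(url, verb, credential)
```

In this shape the agent asks for an action and gets back a result. It never receives a credential it could reuse elsewhere, and a request outside the allow-list fails before any credential exists.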
How to introduce a sandbox incrementally
The full three-zone architecture is a project. Teams without budget for the full project still benefit from incremental moves in its direction.
- Move the agent runtime into a container. Even a single container with the operator's identity is better than running inline. The container boundary creates a place where additional restrictions can later be added.
- Mount the analysed target read-only. Cheapest single improvement. A successful prompt injection that attempts to modify the target fails by filesystem permission rather than by the agent's judgement.
- Apply an egress allow-list at the container boundary. The container reaches only the endpoints the workload requires; everything else is refused at the network layer. Many container platforms make this a single configuration change.
- Replace direct credential injection with broker-mediated minting. The credential vault or workload-identity mechanism issues credentials to the broker on demand; the broker invokes the tool on the agent's behalf. The agent never sees the credential directly.
- Cap session duration and resource use. An agent session terminates automatically after a bounded window. Resource limits (CPU, memory, disk) prevent runaway behaviour.
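Several of these steps map onto existing `docker run` flags. The sketch below, which assumes Docker as the container platform, builds the argument list for one ephemeral session; the image name, the `/target` mount point, and the default limits are illustrative assumptions. It uses `--network none`, which blocks all egress rather than implementing an allow-list; a real allow-list would need a proxy or firewall rule in front of the container. Broker-mediated credentials and disk quotas are not covered here.

```python
import subprocess


def sandbox_command(image, target_dir, agent_args, memory="1g", cpus="1.0"):
    """Build a `docker run` argument list for an ephemeral, restricted session."""
    return [
        "docker", "run",
        "--rm",                                 # remove the container when it exits
        "--network", "none",                    # no network at all (stricter than an allow-list)
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp",                      # scratch space that disappears with the container
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid binaries
        "--memory", memory,                     # memory limit
        "--cpus", cpus,                         # CPU limit
        "--pids-limit", "256",                  # cap the number of processes
        # Bind-mount the analysed target without write permission.
        "--mount", f"type=bind,source={target_dir},target=/target,readonly",
        image, *agent_args,
    ]


def run_session(command, timeout_seconds=900):
    """Run one session with a bounded window.

    On timeout, subprocess.run kills the docker client process and raises
    TimeoutExpired. The container itself may keep running, so a real
    deployment would also stop it explicitly or enforce the limit inside it.
    """
    return subprocess.run(command, timeout=timeout_seconds, check=False)
```

A session would then be started with something like `run_session(sandbox_command("agent-image:latest", "/srv/repo", ["analyse", "/target"]))`, where the image and arguments are placeholders.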
Browser sandboxes, container runtimes, and CI runners all solved versions of this problem before. Adopting their architectural patterns for agent runtimes does not require new invention — only the recognition that an AI agent is the same kind of workload.
A practical checklist
- The agent runtime executes inside a container or virtual machine, not inline in the operator's process.
- The container is ephemeral; mutable state does not persist across sessions.
- Analysed target material is mounted read-only into the sandbox.
- An egress allow-list at the container boundary restricts network reach to required endpoints.
- Credentials are issued by a broker on demand and scoped to specific calls; the agent does not hold operator credentials directly.
- Session duration is bounded by a timeout; resource limits (CPU, memory, disk) are configured.
- Filesystem access within the sandbox is minimised; the operator's home directory is not mounted.
- The sandbox image is signed; its provenance is verifiable.
- Sandbox configuration is reviewed at the same cadence as any other security-sensitive configuration in the product.
- Operator-facing documentation explains the sandbox boundary and what kinds of actions require operator approval.
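The machine-checkable items on this list can be expressed as a small audit over a description of the sandbox. The Python sketch below is one possible shape, not a standard: the `SandboxConfig` fields and the finding messages are hypothetical names chosen to mirror the checklist, and the two process items (review cadence and operator documentation) are deliberately left out because they cannot be verified from configuration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SandboxConfig:
    """Hypothetical description of how one agent session is launched."""
    isolated: bool = False                   # runs in a container or VM
    ephemeral: bool = False                  # no mutable state across sessions
    target_read_only: bool = False
    egress_allowlist: Optional[list] = None  # None means unrestricted egress
    brokered_credentials: bool = False
    timeout_seconds: Optional[int] = None
    resource_limits: dict = field(default_factory=dict)
    mounts_home_dir: bool = True
    image_signed: bool = False


def audit(config):
    """Return a finding for every checklist item the configuration fails."""
    limits = config.resource_limits
    checks = [
        (config.isolated, "runtime is not isolated in a container or VM"),
        (config.ephemeral, "mutable state persists across sessions"),
        (config.target_read_only, "analysed target is mounted writable"),
        (config.egress_allowlist is not None, "no egress allow-list"),
        (config.brokered_credentials, "agent holds operator credentials directly"),
        (config.timeout_seconds is not None, "no session timeout"),
        (all(k in limits for k in ("cpu", "memory", "disk")), "resource limits incomplete"),
        (not config.mounts_home_dir, "operator home directory is mounted"),
        (config.image_signed, "sandbox image is unsigned"),
    ]
    return [message for ok, message in checks if not ok]
```

An empty result means the configuration claims to satisfy every item; it does not prove the running system matches the configuration.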
Test your own agent in ten minutes
The fastest way to find out whether this anti-pattern is present in your own system is to ask an AI coding assistant to look for it. Run the prompt below in a fresh chat session, on its own — and judge the system by what the code actually does, not by what its documentation claims.
Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). Identify every location
the failure mode below could occur, read those files in full before
you judge, and list the search terms you used so I can confirm nothing
was missed.
You are looking for one specific failure mode: the agent executes
in the same security context as the operator who launched it (same
OS user, same cloud identity, same file system permissions, same
process), rather than in a sandbox / separate identity with a
narrower set of permissions.
Respond with exactly these four sections:
1. VERDICT: one of [present / not present / unclear]
2. EVIDENCE: file path + line numbers + a one-line quote per claim
3. WHY IT MATTERS: two sentences, plain English
4. FIX: a concrete change, with a short before/after code snippet
if applicable. If "unclear", list the one piece of context you
need to decide.
Insist on the four-part answer: a verdict with a file path, a line number, and a one-line quote is something you can act on; a verdict on its own is just an opinion. If the result is present, the FIX section is your starting point — run the agent in a sandbox or under a separate identity with a narrower set of permissions than the operator. Re-run the same prompt after the change to confirm the verdict flips to not present.
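If the prompt is run repeatedly, the four-part format can be checked mechanically before anyone reads the answer. The following Python sketch assumes the assistant reproduces the numbered section labels exactly as the prompt spells them; the function names are illustrative, and an answer that decorates the labels (for example with bold markup) would need a looser pattern.

```python
import re

SECTIONS = ("VERDICT", "EVIDENCE", "WHY IT MATTERS", "FIX")
VERDICTS = {"present", "not present", "unclear"}
HEADER = re.compile(r"^\s*\d\.\s*(VERDICT|EVIDENCE|WHY IT MATTERS|FIX):", re.MULTILINE)


def parse_report(text):
    """Split a four-section answer into a dict keyed by section name."""
    matches = list(HEADER.finditer(text))
    report = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        report[match.group(1)] = text[match.end():end].strip()
    return report


def problems(text):
    """Return a list of format problems; an empty list means the answer is well-formed."""
    report = parse_report(text)
    found = [f"missing section: {name}" for name in SECTIONS if name not in report]
    if "VERDICT" in report:
        verdict = report["VERDICT"].strip("[] .").lower()
        if verdict not in VERDICTS:
            found.append(f"unexpected verdict: {verdict!r}")
    return found
```

This checks only the shape of the answer. Whether the cited file, line, and quote are real still has to be confirmed against the repository.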
Conclusion
The Shared Identity Runtime is the default because it is the easiest design — and because most agent frameworks ship it as the only design. Replacing it is engineering effort, not breakthrough research; the patterns are well-established in adjacent industries. For workloads where the consequences of an unbounded blast radius are unacceptable, the work is overdue.
The right reflex for any new agent deployment is to ask: what is the worst this agent could do if it were compromised? If the answer is "whatever the operator could do," the architecture has not yet been designed.