Key insight

An agent runtime that shares the operator's process, filesystem, and credentials has the same blast radius as the operator. Bounded sandbox isolation — read-only target mounts, restricted network egress, ephemeral execution — is the same architectural pattern that browsers, container platforms, and CI runners adopted years ago. AI agents are the next workload that needs it.

The default is "share everything"

Most agent runtimes ship as a process the operator launches from their editor or shell. The process inherits the operator's user identity, environment variables, working directory, and network reach. There is no boundary between "the agent" and "the operator's desktop." Whatever capabilities the operator's account possesses, the agent possesses by default.
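As a minimal illustration of this inheritance, the Python sketch below lists the environment variables in the current process whose names look like credentials. The fragments in `SECRET_HINTS` and the function name `inherited_secret_names` are assumptions made for this example; it is a rough heuristic, not a complete secret detector.

```python
import os

# Assumed name fragments that often indicate a credential; illustrative only.
SECRET_HINTS = ("TOKEN", "SECRET", "PASSWORD", "API_KEY", "CREDENTIAL")


def inherited_secret_names(env=None):
    """Return sorted names of environment variables that look like credentials.

    A child process started without an explicit environment receives a copy
    of the parent's environment, so every name returned here would be
    readable by an agent launched from the same shell.
    """
    env = os.environ if env is None else env
    return sorted(
        name for name in env
        if any(hint in name.upper() for hint in SECRET_HINTS)
    )
```

Calling `inherited_secret_names()` in the shell that launches the agent gives a rough inventory of what the agent process starts with. Only names are returned, never values.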

For exploratory use this is fine. For sustained use against material workloads, it is a design choice with consequences. The blast radius of any compromise — the unauthorised tool call from a successful prompt injection, the exfiltration from a compromised dependency, the misinterpreted instruction in a poisoned file — extends to whatever the operator can reach. For an engineer logged into source-control systems, cloud platforms, identity providers, and observability stacks, "whatever the operator can reach" is a large surface.

Why the operator's bound is the wrong bound

The operator's permissions exist because the operator has work to do. Most of that work has nothing to do with what any single agent session is supposed to accomplish. The mismatch between "what the operator can do" and "what the agent needs to do" is the gap that bounded sandboxes were invented to address.

This is not a new pattern. Web browsers run scripts in sandboxes precisely because a script's intended scope is much smaller than the user's. Container runtimes apply namespace and capability restrictions to processes for the same reason, and continuous-integration platforms run untrusted code in disposable runners for the same reason. The AI agent is the next member of the same category.

The Shared Identity Runtime anti-pattern

Definition. An AI agent runtime executes inside the operator's user identity, with full filesystem access, full network reach, and direct access to all credentials present in the operator's environment — for the duration of every session, regardless of workload sensitivity.

Symptoms. No container or virtual-machine isolation around the agent runtime; no read-only mount distinguishing the analysed target from the host filesystem; no egress allow-list at the runtime boundary; credentials accessible to the operator are accessible to the agent without further mediation.

Why it is hazardous. The agent's worst-case behaviour is bounded only by the operator's full permissions across every connected system. Defence-in-depth controls at narrower layers (tool catalogues, credential scopes, approval gates) compose better when there is a containing sandbox; without one, each control is the last line of defence, and a single bypass exposes everything the operator can reach.

Related controls. Ephemeral container or virtual-machine isolation; read-only mount of the analysed target; egress allow-list at the runtime boundary; bounded execution time; a capability-scoped broker for credentials, mediated by the runtime rather than handed wholesale to the agent.
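A capability-scoped broker can be sketched in a few lines of Python. Everything named here is hypothetical and chosen for illustration: the `Grant` fields, the `CapabilityBroker` class, the scope string format, and the `send` transport callable that stands in for whatever HTTP client the broker would use. The point of the sketch is the shape of the mediation: the agent asks for a call by scope and URL, and only the response comes back.

```python
import time
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Grant:
    """A short-lived, narrowly scoped permission issued for one session."""
    scope: str          # hypothetical scope name, e.g. "repo:read"
    host: str           # the only host this grant may be used against
    expires_at: float   # UNIX timestamp after which the grant is invalid


class CapabilityBroker:
    """Holds secrets on the agent's behalf and enforces an egress allow-list."""

    def __init__(self, egress_allow_list, send, clock=time.time):
        self._allowed_hosts = frozenset(h.lower() for h in egress_allow_list)
        self._send = send      # broker-side transport: send(url, headers)
        self._clock = clock
        self._grants = {}      # scope -> (Grant, secret)

    def add_grant(self, grant, secret):
        # Called from the operator zone (for example by the vault), not by the agent.
        self._grants[grant.scope] = (grant, secret)

    def call(self, scope, url):
        """Perform the request for the agent; the secret never leaves the broker."""
        host = (urlsplit(url).hostname or "").lower()
        if host not in self._allowed_hosts:
            raise PermissionError(f"egress to {host!r} is not allow-listed")
        entry = self._grants.get(scope)
        if entry is None:
            raise PermissionError(f"no grant for scope {scope!r}")
        grant, secret = entry
        if grant.host.lower() != host:
            raise PermissionError(f"grant {scope!r} is bound to another host")
        if self._clock() >= grant.expires_at:
            del self._grants[scope]
            raise PermissionError(f"grant {scope!r} has expired")
        return self._send(url, {"Authorization": f"Bearer {secret}"})
```

The design choice worth noting is default-deny: a call succeeds only when the host is allow-listed, a grant exists for the scope, the grant is bound to that host, and the grant has not expired.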

The three-zone architecture

The defensible posture for agent execution organises the system into three trust zones with explicit boundaries between them.

[Figure 1: three-zone diagram.
Operator zone (trust: high; named human, accountable): the operator (IDE / shell) issues the task and receives the report; the credential vault issues just-in-time scopes.
Agent sandbox (trust: mediated; ephemeral, narrowly scoped): the agent runtime runs in an ephemeral container; the capability broker mediates credentials and egress.
Target zone (trust: none; treated as adversarial input): the target source is mounted read-only; external systems sit behind an egress allow-list.
Boundary crossings: intent and credentials cross from the operator zone into the sandbox; read-only access and egress cross from the sandbox into the target zone.
Outcome: a successful compromise inside the sandbox cannot escape its bounded blast radius.]
Figure 1. Three zones with explicit boundaries. Each zone is annotated with its trust level; each boundary is annotated with what is permitted to cross it. The agent runs in an ephemeral middle zone; the target is mounted read-only; egress is mediated by a capability broker. A successful compromise inside the sandbox cannot escape this bounded blast radius.
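The boundaries in Figure 1 can also be written down as an explicit table of permitted crossings. The sketch below is one possible encoding, not part of any framework: the `Zone` enum, the crossing labels, and the `may_cross` function are names invented for this example, with the labels taken from the figure's annotations.

```python
from enum import Enum


class Zone(Enum):
    OPERATOR = "operator"   # trust: high
    SANDBOX = "sandbox"     # trust: mediated
    TARGET = "target"       # trust: none, treated as adversarial input


# What may cross each boundary, keyed by (source, destination).
# The labels are this sketch's shorthand for the annotations in Figure 1.
ALLOWED_CROSSINGS = {
    (Zone.OPERATOR, Zone.SANDBOX): {"intent", "scoped_credential"},
    (Zone.SANDBOX, Zone.OPERATOR): {"report"},
    (Zone.SANDBOX, Zone.TARGET): {"read", "allow_listed_egress"},
}


def may_cross(source, destination, kind):
    """Default-deny: a crossing is permitted only if it is listed above."""
    return kind in ALLOWED_CROSSINGS.get((source, destination), set())
```

Under this encoding, `may_cross(Zone.SANDBOX, Zone.TARGET, "write")` is false simply because nobody listed it, which is the property the read-only mount enforces at the filesystem level.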

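The properties that matter are the ones annotated in Figure 1: the agent runtime is ephemeral and narrowly scoped, the target is mounted read-only and treated as adversarial input, and credentials and egress cross the sandbox boundary only through the capability broker.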

How to introduce a sandbox incrementally

The full three-zone architecture is a project. Teams without budget for the full project still benefit from incremental moves in its direction.

  1. Move the agent runtime into a container.

    Even a single container that still carries the operator's identity is better than running the agent inline in the operator's shell. The container boundary creates a place where additional restrictions can later be added.

  2. Mount the analysed target read-only.

    This is the cheapest single improvement. A successful prompt injection that attempts to modify the target fails because of filesystem permissions, not because of the agent's judgement.

  3. Apply an egress allow-list at the container boundary.

    The container reaches only the endpoints the workload requires; everything else is refused at the network layer. On platforms that support network policies or an egress proxy, this can be a small configuration change.

  4. Replace direct credential injection with broker-mediated minting.

    The credential vault or workload-identity mechanism issues credentials to the broker on demand; the broker invokes the tool on the agent's behalf. The agent never sees the credential directly.

  5. Cap session duration and resource use.

    An agent session terminates automatically after a bounded window. Resource limits (CPU, memory, disk) prevent runaway behaviour.
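Steps 1, 2, and 5 can be sketched as a small Python wrapper around `docker run`. This is an illustration under stated assumptions, not a hardened launcher: the image name, the `/target` mount point, the default limits, and the function names are all hypothetical. The flags themselves are standard `docker run` options. Networking is switched off with `--network none` as a placeholder; a real egress allow-list (step 3) needs a proxy or a platform network policy, and broker-mediated credentials (step 4) are outside this sketch.

```python
import subprocess
from pathlib import Path


def sandbox_command(image, target_dir, name, network="none",
                    memory="2g", cpus="2", pids_limit=256):
    """Build a `docker run` argv for an ephemeral agent container."""
    target = Path(target_dir).resolve()
    return [
        "docker", "run",
        "--rm",                               # ephemeral: removed on exit
        "--name", name,                       # lets run_session kill it by name
        "--read-only",                        # container root filesystem is read-only
        "--cap-drop", "ALL",                  # drop all Linux capabilities
        "--network", network,                 # "none" disables networking
        "--memory", memory,                   # step 5: memory cap
        "--cpus", cpus,                       # step 5: CPU cap
        "--pids-limit", str(pids_limit),      # step 5: process-count cap
        "--volume", f"{target}:/target:ro",   # step 2: target mounted read-only
        "--tmpfs", "/tmp",                    # scratch space that vanishes on exit
        image,
    ]


def run_session(argv, name, max_seconds=900):
    """Run one session; kill the container if it outlives max_seconds.

    Returns the exit code, or None if the session timed out.
    """
    try:
        return subprocess.run(argv, timeout=max_seconds, check=False).returncode
    except subprocess.TimeoutExpired:
        # The timeout only stops the docker CLI process, so the container
        # itself is killed explicitly by name.
        subprocess.run(["docker", "kill", name], check=False)
        return None
```

A session would then be started with something like `run_session(sandbox_command("agent-image", "./repo", name="agent-1"), name="agent-1")`, where `agent-image` stands for whatever image packages the agent runtime.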

The pattern is mature in other industries.

Browser sandboxes, container runtimes, and CI runners all solved versions of this problem before. Adopting their architectural patterns for agent runtimes does not require new invention — only the recognition that an AI agent is the same kind of workload.

A practical checklist

Test your own agent in ten minutes

The fastest way to find out whether this anti-pattern is present in your own system is to ask an AI coding assistant to look for it. Run the prompt below in a fresh chat session, on its own — and judge the system by what the code actually does, not by what its documentation claims.

Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). Identify every location
the failure mode below could occur, read those files in full before
you judge, and list the search terms you used so I can confirm nothing
was missed.

You are looking for one specific failure mode: the agent executes
in the same security context as the operator who launched it (same
OS user, same cloud identity, same file system permissions, same
process), rather than in a sandbox / separate identity with a
narrower set of permissions.

Respond with exactly these four sections:
1. VERDICT: one of [present / not present / unclear]
2. EVIDENCE: file path + line numbers + a one-line quote per claim
3. WHY IT MATTERS: two sentences, plain English
4. FIX: a concrete change, with a short before/after code snippet
   if applicable. If "unclear", list the one piece of context you
   need to decide.

Insist on the four-part answer: a verdict backed by a file path, a line number, and a one-line quote is something you can act on; a verdict on its own is just an opinion. If the verdict is "present", the FIX section is your starting point: run the agent in a sandbox or under a separate identity with a narrower set of permissions than the operator. Re-run the same prompt after the change to confirm that the verdict flips to "not present".
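If the prompt is run repeatedly, the four-part format can be checked mechanically. The Python sketch below assumes the assistant follows the numbered layout requested in the prompt (`1. VERDICT: ...` and so on); the function name `parse_report` is made up for this example, and the parser checks only the structure, not whether the evidence is true.

```python
import re

VERDICTS = {"present", "not present", "unclear"}
SECTIONS = ("VERDICT", "EVIDENCE", "WHY IT MATTERS", "FIX")

# Matches a numbered section header such as "2. EVIDENCE:" at the start of a line.
_HEADER = re.compile(
    r"^\s*\d\.\s*(VERDICT|EVIDENCE|WHY IT MATTERS|FIX)\s*:\s*",
    re.MULTILINE,
)


def parse_report(text):
    """Split a response into its four sections.

    Raises ValueError if a section is missing or the verdict is not one of
    the three permitted values.
    """
    matches = list(_HEADER.finditer(text))
    found = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        found[match.group(1)] = text[match.end():end].strip()
    missing = [s for s in SECTIONS if s not in found]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    verdict = found["VERDICT"].lower().strip("[]. ")
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {found['VERDICT']!r}")
    found["VERDICT"] = verdict
    return found
```

Comparing `parse_report(before)["VERDICT"]` with `parse_report(after)["VERDICT"]` gives a simple way to record whether the verdict changed after the fix.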

Conclusion

The Shared Identity Runtime is the default because it is the easiest design — and because most agent frameworks ship it as the only design. Replacing it is engineering effort, not breakthrough research; the patterns are well-established in adjacent industries. For workloads where the consequences of an unbounded blast radius are unacceptable, the work is overdue.

The right reflex for any new agent deployment is to ask: what is the worst this agent could do if it were compromised? If the answer is "whatever the operator could do," the architecture has not yet been designed.
