The Unsupervised Perimeter: When Some Agents Are Governed and Others Are Not

Key insight

Security investment in a subset of agents is undone by the presence of any user-invocable agent that did not receive the same investment. Either every agent in the product passes the same bar, or the unreviewed agents are moved out of the default surface and labelled accordingly.

How an unsupervised perimeter forms

The starting point is almost always benign. A team adopts an agent framework and inherits a set of community-maintained or vendor-maintained agents that come bundled. A handful — the ones the team actively uses — receive security review, tool catalogue curation, and ongoing maintenance. The rest are kept "in case someone needs them." Over time, the team's investment goes into the few; the many become an inheritance no one is responsible for.

From the user's perspective, all of these agents are equally accessible. From the security perspective, only some of them are hardened. The result is a recognisable governance failure: the perimeter is set by the weakest, least-reviewed agent the user can invoke, not by the most-reviewed ones the security team is proud of.

The OWASP Top 10 for LLM Applications captures the broader issue under LLM06 (Excessive Agency): an LLM-based system granted permissions or capabilities beyond what it needs expands its blast radius beyond what was reviewed. The Unsupervised Perimeter is what happens when "what is needed" is measured against the few governed agents and "what is granted" is measured against all of them.

The Unsupervised Perimeter anti-pattern

Anti-pattern

The Unsupervised Perimeter

Definition. A product ships a set of user-invocable AI agents in which only a subset has received the team's security review, prompt curation, tool-catalogue constraint, and ongoing maintenance. The remainder are inherited, undocumented, and indistinguishable from the curated set at the user's invocation point.

Symptoms. A user-facing agent inventory whose count exceeds the count of named agents in the team's documentation; absence of frontmatter or metadata distinguishing curated from inherited agents; absence of a review cadence covering every shipped agent; absence of an explicit promotion path between "inherited" and "curated" categories.

Why it is hazardous. Anyone who targets an inherited agent, whether an ordinary user or an attacker exploiting a successful prompt injection, encounters none of the controls applied to the curated ones. The product's effective security posture is the posture of its weakest invocable agent, not its strongest.

Related controls. Explicit metadata distinguishing supported from optional agents; a default surface containing only supported agents; an explicit promotion path with documented criteria; a review cadence covering every agent that remains in the supported surface; deletion of inherited agents that have not been used within a defined retention window.

A hypothetical pivot scenario

The following illustrates a plausible pivot path under the Unsupervised Perimeter anti-pattern. It is constructed from elements common to several reported industry patterns; no specific incident is implied.

A product ships eleven flagship agents that have received careful review: tool catalogues curated, prompts audited for prompt-injection resilience, regular maintenance. The same product ships forty additional agents inherited from an upstream template. The inherited agents have not been reviewed; their tool catalogues are inherited defaults; their prompts were written for a different organisation's context.

An attempt to influence the product's behaviour — by an attacker, by a researcher, or simply by an ambitious user — targets one of the forty rather than one of the eleven. The constraints are weaker; the prompt has not been audited; the tool catalogue is broader. Whatever the eleven would have refused, the chosen one accepts. The product's documentation does not distinguish the two categories, so the result is reported and attributed to the product as a whole.

No individual agent failed in isolation. The product's governance failed in aggregate. The customer-facing question becomes "which agents did you actually review?" — and there is no clean answer.


The governance pattern

Define the supported surface explicitly.
Maintain an enumerated list of agents the product supports. Every agent on the list has a named internal owner, a documented purpose, a curated tool catalogue, a prompt-injection review, and a scheduled re-review cadence.
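As a minimal sketch, assuming the inventory is kept in code, one entry might look like the Python below. The class `SupportedAgent`, its field names, the example agent, and the 90-day default cadence are illustrative assumptions rather than a prescribed schema; the point is that each requirement in the paragraph above becomes an explicit field, and the re-review date is computed rather than remembered.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SupportedAgent:
    """One entry in a hypothetical supported-agent inventory."""
    name: str
    owner: str               # named internal owner (person or team)
    purpose: str             # documented purpose, one sentence
    tools: tuple             # curated tool catalogue, as an explicit allow-list
    injection_review: date   # date of the most recent prompt-injection review
    cadence_days: int = 90   # assumed re-review cadence; choose your own

    def next_review_due(self) -> date:
        """Date by which the next scheduled re-review should happen."""
        return self.injection_review + timedelta(days=self.cadence_days)

    def is_overdue(self, today: date) -> bool:
        """True once the scheduled re-review date has passed."""
        return today > self.next_review_due()


SUPPORTED = [
    SupportedAgent(
        name="billing-assistant",
        owner="payments-team",
        purpose="Answers questions about the customer's own invoices.",
        tools=("read_invoice", "list_invoices"),
        injection_review=date(2024, 1, 15),
    ),
]

for agent in SUPPORTED:
    print(agent.name, "next review due", agent.next_review_due())
# billing-assistant next review due 2024-04-14
```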
Move everything else off the default surface.
Inherited or optional agents live in a separate folder (commonly optional/ or experimental/) and carry frontmatter declaring them not user-invocable from the default surface. The user must take a deliberate action to enable any of them.
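One way to enforce this at load time, sketched under stated assumptions: agent definitions are Markdown files with a flat `key: value` frontmatter block, and the opt-out key is called `user-invocable`. Both that key name and the folder names in `OPTIONAL_FOLDERS` are hypothetical, and the hand-rolled parser stands in for a real YAML library. An agent is kept off the default surface if either signal marks it optional.

```python
from pathlib import PurePosixPath

# Folder names treated as optional (assumed; match your own layout).
OPTIONAL_FOLDERS = {"optional", "experimental"}


def parse_frontmatter(text: str) -> dict:
    """Parse a flat `key: value` block delimited by '---' lines.

    Deliberately minimal: no nesting, lists, or quoting. Unterminated or
    missing frontmatter is treated as absent.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}


def default_surface(files: dict) -> list:
    """Return agent file paths allowed on the default invocation surface.

    Excluded if the file sits under an optional folder, or if its
    frontmatter sets the hypothetical key `user-invocable: false`.
    """
    surface = []
    for path, text in sorted(files.items()):
        folders = set(PurePosixPath(path).parts[:-1])
        in_optional = bool(OPTIONAL_FOLDERS & folders)
        meta = parse_frontmatter(text)
        hidden = meta.get("user-invocable", "true").lower() == "false"
        if not in_optional and not hidden:
            surface.append(path)
    return surface


files = {
    "agents/billing.md": "---\nowner: payments-team\n---\nYou answer invoice questions.",
    "agents/optional/legacy-sql.md": "---\nuser-invocable: false\n---\nYou run SQL.",
    "agents/scratch.md": "---\nuser-invocable: false\n---\nDraft agent.",
}
print(default_surface(files))  # ['agents/billing.md']
```

Checking both the folder and the frontmatter means that a misplaced file or a missing label still keeps the agent off the default surface, provided the other signal is present. A stricter variant would invert the default and require an explicit opt-in before any agent appears.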
Publish the supported-versus-optional distinction.
Customer-facing documentation lists supported agents and notes the existence of an optional set. The distinction is visible enough that customer security questionnaires can be answered cleanly.
Document a promotion path.
An optional agent that becomes useful can be promoted to supported by a documented checklist: assign an owner, audit the prompt, curate the tool catalogue, add to the review cadence, ship documentation. The bar is explicit; "supported" is not a feeling.
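The checklist can be made mechanical. The sketch below, built around a hypothetical `Candidate` shape, returns whichever promotion criteria from the paragraph above are still unmet; an empty list means the bar is cleared. How each criterion is recorded (a date, a flag, a documentation URL) is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    """An optional agent under consideration for promotion (hypothetical shape)."""
    name: str
    owner: Optional[str] = None
    prompt_audit: Optional[date] = None   # date of the prompt-injection audit
    catalogue_curated: bool = False       # True once inherited defaults were reviewed
    review_cadence_days: Optional[int] = None
    docs_url: Optional[str] = None


def promotion_blockers(c: Candidate) -> list:
    """Return the checklist items still unmet; an empty list means promote."""
    blockers = []
    if not c.owner:
        blockers.append("assign an owner")
    if c.prompt_audit is None:
        blockers.append("audit the prompt")
    if not c.catalogue_curated:
        blockers.append("curate the tool catalogue")
    if not c.review_cadence_days:
        blockers.append("add to the review cadence")
    if not c.docs_url:
        blockers.append("ship documentation")
    return blockers


draft = Candidate(name="legacy-sql", owner="data-platform")
print(promotion_blockers(draft))
```

Returning the list of gaps, rather than a yes/no answer, gives the prospective owner a concrete to-do list and makes "supported" a state the code can verify.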
Retire what is not used.
Optional agents that remain unused for a defined window (commonly two quarters) are removed. Carrying inherited agents indefinitely is how the perimeter regrows.
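A sketch of the retirement check, assuming usage data is available per agent as a date added and a last-used date. The 182-day `RETENTION` constant approximates two quarters and is an assumed default; the function name and the data shape are hypothetical.

```python
from datetime import date, timedelta

# Roughly two quarters; an assumed default, not a standard.
RETENTION = timedelta(days=182)


def retirement_candidates(
    optional_agents: dict,
    today: date,
    retention: timedelta = RETENTION,
) -> list:
    """Return names of optional agents with no activity inside the window.

    Each value is a (date_added, last_used) pair; last_used is None for an
    agent nobody has invoked. Never-used agents are measured from the date
    they were added, so a newly shipped agent is not retired immediately.
    """
    stale = []
    for name, (added, last_used) in sorted(optional_agents.items()):
        last_activity = last_used if last_used is not None else added
        if today - last_activity > retention:
            stale.append(name)
    return stale


optional_agents = {
    "legacy-sql": (date(2023, 1, 10), date(2023, 3, 1)),
    "csv-cleaner": (date(2023, 6, 1), date(2024, 2, 20)),
    "report-writer": (date(2023, 5, 1), None),
    "translator": (date(2024, 1, 5), None),
}
print(retirement_candidates(optional_agents, today=date(2024, 3, 1)))
# ['legacy-sql', 'report-writer']
```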
Gate the supported surface in continuous integration.
A build-time check verifies that every agent in the default surface has the required metadata: owner, review date, catalogue size, last audit. Agents missing any field fail the build until they are either fixed or moved to optional.
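A minimal version of that gate, assuming the supported inventory is checked in as a JSON list of objects. The field names mirror the four listed above, but their exact spelling (`review_date`, `catalogue_size`, `last_audit`) and the validation rules are assumptions. `run_gate` returns an exit code rather than calling `sys.exit`, so the caller decides how to fail the build.

```python
import json
from datetime import date

# Metadata every default-surface agent must carry (field names are assumptions).
REQUIRED_FIELDS = ("owner", "review_date", "catalogue_size", "last_audit")
DATE_FIELDS = ("review_date", "last_audit")


def gate_errors(agents: list) -> list:
    """Return one message per missing or malformed field, across all agents."""
    errors = []
    for agent in agents:
        name = agent.get("name", "<unnamed>")
        for key in REQUIRED_FIELDS:
            if agent.get(key) in (None, ""):
                errors.append(f"{name}: missing '{key}'")
        for key in DATE_FIELDS:
            value = agent.get(key)
            if value:
                try:
                    date.fromisoformat(value)
                except (TypeError, ValueError):
                    errors.append(f"{name}: '{key}' is not an ISO date")
        size = agent.get("catalogue_size")
        if size is not None and (not isinstance(size, int) or size < 0):
            errors.append(f"{name}: 'catalogue_size' must be a non-negative integer")
    return errors


def run_gate(inventory_json: str) -> int:
    """Print every failure and return a CI exit code (0 passes, 1 fails)."""
    errors = gate_errors(json.loads(inventory_json))
    for message in errors:
        print("FAIL", message)
    return 1 if errors else 0


inventory = json.dumps([
    {"name": "billing-assistant", "owner": "payments-team",
     "review_date": "2024-04-14", "catalogue_size": 2, "last_audit": "2024-01-15"},
    {"name": "legacy-sql", "owner": "", "catalogue_size": 14},
])
exit_code = run_gate(inventory)  # prints three FAIL lines for legacy-sql
```

In a CI step this might be invoked as `raise SystemExit(run_gate(open('agents/inventory.json').read()))`, where the inventory path is hypothetical. Reporting every failure at once, instead of stopping at the first, keeps the fix-or-move-to-optional decision to a single pass.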

Optional is not a code smell. Unlabelled-optional is.

Maintaining a clearly-labelled set of optional agents alongside a curated supported set is a reasonable engineering pattern. The failure is in not drawing the line — leaving the optional and supported categories visually identical at the user's invocation point.

A practical checklist

Every agent in the default surface appears in the supported-agent inventory with a named owner.
Every supported agent has a curated tool catalogue and a prompt-injection review on file.
Every supported agent has a scheduled re-review date; reviews are conducted on cadence and recorded.
Agents that have not been reviewed are moved to a separate optional folder and marked not user-invocable from the default surface.
Customer-facing documentation distinguishes supported from optional agents clearly.
A promotion path from optional to supported is documented with explicit criteria.
Optional agents unused for a defined retention window are deleted.
A continuous-integration check verifies that every agent in the default surface has the required metadata.
The supported-agent inventory is reviewed at the same cadence as any other security-sensitive inventory in the product.
The total count of agents shipped — supported plus optional — is accurately reflected in product documentation. No inflated claims; no buried inheritance.
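The last item above, and the first symptom listed for this anti-pattern, can be checked with a set comparison. This sketch assumes you can produce two sets of agent names: everything the build ships (supported plus optional) and everything the customer-facing documentation lists (again, supported plus optional). How those sets are gathered is left open, and the function name is hypothetical.

```python
def reconcile(shipped: set, documented: set) -> dict:
    """Compare the agents a build ships with the agents its docs list.

    'undocumented': reachable agents the docs never mention (buried inheritance).
    'unshipped': agents the docs advertise that the build lacks (inflated claims).
    """
    return {
        "undocumented": sorted(shipped - documented),
        "unshipped": sorted(documented - shipped),
    }


shipped = {"billing-assistant", "support-triage", "legacy-sql", "csv-cleaner"}
documented = {"billing-assistant", "support-triage", "report-writer"}
result = reconcile(shipped, documented)
in_sync = not result["undocumented"] and not result["unshipped"]
print(result)
# {'undocumented': ['csv-cleaner', 'legacy-sql'], 'unshipped': ['report-writer']}
```

Comparing names rather than counts matters: two lists of equal length can still disagree on which agents they contain.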

Test your own agent in ten minutes

The fastest way to find out whether this anti-pattern is present in your own system is to ask an AI coding assistant to look for it. Run the prompt below in a fresh chat session, on its own — and judge the system by what the code actually does, not by what its documentation claims.

Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). Identify every location
where the failure mode below could occur, read those files in full before
you judge, and list the search terms you used so I can confirm nothing
was missed.

You are looking for one specific failure mode: the codebase
defines or registers multiple agents, sub-agents, or tool-bearing
flows, but the security and governance controls (rate limits, tool
allow-lists, evaluation hooks, logging, content filters) are
applied to only a subset of them.

Enumerate every distinct agent or sub-agent you can find across the
repository and state for each whether the controls applied to the
main one also apply to it.

Respond with exactly these four sections:
1. VERDICT: one of [present / not present / unclear]
2. EVIDENCE: file path + line numbers + a one-line quote per claim
3. WHY IT MATTERS: two sentences, plain English
4. FIX: a concrete change, with a short before/after code snippet
   if applicable. If "unclear", list the one piece of context you
   need to decide.

Insist on the four-part answer: a verdict backed by a file path, a line number, and a one-line quote is something you can act on; a verdict on its own is just an opinion. If the verdict is present, the FIX section is your starting point: apply the same governance controls (rate limits, allow-lists, evaluation hooks, logging, content filters) to every agent and sub-agent, not only the headline one. Re-run the same prompt after the change to confirm that the verdict flips to not present.
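Such a fix usually comes down to making the controls structural rather than per-agent. The sketch below, which is not any particular framework's API, routes every agent through one hypothetical `AgentRegistry`, so a tool allow-list, a sliding-window rate limit, and logging apply to each registered agent identically. Evaluation hooks and content filters are omitted for brevity but would hang off the same `call_tool` choke point. All class names, the 60-second window, and the default rate are assumptions.

```python
import logging
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("agents")


class ToolDenied(Exception):
    """Raised when an agent calls a tool outside its allow-list."""


class RateLimited(Exception):
    """Raised when an agent exceeds its per-minute call budget."""


@dataclass
class GovernedAgent:
    """An agent whose every tool call passes through the same controls."""
    name: str
    allowed_tools: frozenset
    max_calls_per_minute: int
    tools: dict
    _calls: deque = field(default_factory=deque)

    def call_tool(self, tool: str, *args: object, now: Optional[float] = None) -> object:
        now = time.monotonic() if now is None else now
        # Control 1: allow-list. Anything outside the curated catalogue is refused.
        if tool not in self.allowed_tools:
            log.warning("denied %s -> %s", self.name, tool)
            raise ToolDenied(f"{self.name} may not call {tool}")
        # Control 2: sliding-window rate limit over the last 60 seconds.
        while self._calls and now - self._calls[0] >= 60:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls_per_minute:
            raise RateLimited(self.name)
        self._calls.append(now)
        # Control 3: log every permitted call.
        log.info("%s -> %s %r", self.name, tool, args)
        return self.tools[tool](*args)


class AgentRegistry:
    """Single entry point for creating invocable agents (hypothetical)."""

    def __init__(self, tools: dict, default_rate: int = 30) -> None:
        self._tools = tools
        self._default_rate = default_rate
        self._agents: dict = {}

    def register(self, name: str, allowed_tools: set) -> GovernedAgent:
        unknown = set(allowed_tools) - set(self._tools)
        if unknown:
            raise ValueError(f"{name} requests unknown tools: {sorted(unknown)}")
        agent = GovernedAgent(name, frozenset(allowed_tools),
                              self._default_rate, self._tools)
        self._agents[name] = agent
        return agent


registry = AgentRegistry(
    {"read_invoice": lambda i: f"invoice {i}", "run_sql": lambda q: f"ran {q}"},
    default_rate=2,
)
billing = registry.register("billing-assistant", {"read_invoice"})
legacy = registry.register("legacy-sql", {"run_sql"})  # same controls, no exceptions
print(billing.call_tool("read_invoice", 7, now=0.0))  # invoice 7
```

If `register` is the only way the rest of the code obtains an agent, an inherited agent receives the same controls as a curated one, or fails to register at all if it requests a tool the registry does not know.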

Conclusion

The Unsupervised Perimeter is a governance failure, not a coding failure. The fix is a small set of conventions — metadata fields, folder placement, a build-time check, a customer-facing distinction — applied consistently. The cost is modest; the result is that the product's security posture matches its security investment, not the inheritance the team happened not to clean up.

The right reflex when adopting an agent framework is to immediately ask: which of these agents have we reviewed, and which have we inherited? The answer determines the product's effective posture. Make it visible, in code and in documentation, and the perimeter stops being unsupervised.

References & further reading

OWASP Top 10 for LLM Applications — particularly LLM06: Excessive Agency.
NIST AI Risk Management Framework — accountability and governance considerations for AI systems.
Microsoft Azure Well-Architected — Responsible AI — guidance on the operational governance of AI components.
NIST SP 800-207 — Zero Trust Architecture — least-privilege principles applied to system components.