Key insight

An agent is an unbounded loop over metered resources unless you bound it. Cap iterations and tool calls, enforce per-user and per-tenant budgets and rate limits, set timeouts, and add circuit breakers — so a runaway task stops itself instead of stopping your service or draining your account.

Why agents consume without limit

An agent runs a loop: reason, call a tool, read the result, reason again, until it decides the task is done. Each pass can invoke a paid model API and metered tools — search, code execution, external services. Because the model decides when to stop, the loop's length is not fixed by the code; it is a product of the model's judgement, which is non-deterministic. A confused or adversarially-steered agent may never decide it is finished, retrying the same failing step, fanning out into ever more sub-tasks, or looping between two states.
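A minimal Python sketch of that loop makes the problem visible. The names `run_agent`, `model_step`, and `call_tool` are hypothetical stand-ins rather than any framework's API; the only point is that the exit condition lives in the model's reply.

```python
from typing import Callable

# Hypothetical convention: model_step returns ("tool", args) to keep going
# or ("done", answer) to stop. Neither name comes from a real SDK.
Step = tuple[str, str]


def run_agent(task: str,
              model_step: Callable[[list[str]], Step],
              call_tool: Callable[[str], str]) -> str:
    """Unbounded agent loop: only the model's own reply can end it."""
    history = [task]
    while True:  # no iteration cap, no budget, no timeout
        kind, payload = model_step(history)  # paid model call
        if kind == "done":
            return payload
        history.append(call_tool(payload))   # metered tool call
```

If `model_step` never returns `"done"`, nothing in this function stops it.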

The OWASP Top 10 for LLM Applications (2025 edition) names this LLM10: Unbounded Consumption, and the colloquial term, denial-of-wallet, captures the business reality. It is both an availability risk (the agent exhausts a shared quota and degrades the service for everyone) and a direct financial one (it spends real money on tokens and tool calls). Worse, it is attacker-actionable: if each request a user sends costs you more to serve than it costs them to send, an adversary can deliberately weaponise that asymmetry.

The Unbounded Consumption anti-pattern

Definition. An agent loop calls paid models and metered tools with no enforced ceiling on iterations, no per-task or per-actor budget, no rate limit, no timeout, and no circuit breaker to halt a runaway.

Symptoms. Agent loops with no maximum step count; tool calls uncapped per task; no per-user or per-tenant spend or rate budget; no timeout on a task or a tool; cost and token usage unmonitored; no automatic stop when spend or error rate spikes.

Why it is hazardous. A single confused or adversarial task can run indefinitely, draining an API budget, exhausting a shared rate quota, and degrading or disabling the service for every other user — an outage and a bill at the same time.

Related controls. Iteration and tool-call caps per task; per-user and per-tenant budgets and rate limits; timeouts; circuit breakers on spend and error rate; and real-time cost and usage monitoring with alerts.

A hypothetical runaway

The following illustrates a plausible failure mode. No specific incident is implied.

A research agent is asked to “find every paper that cites this one, and every paper that cites those.” The task has no natural stopping point, and the agent has no step cap. It fans out: each citation spawns a search, each search spawns more, each step a paid model call and a metered API request. Within minutes it has made tens of thousands of calls, the API budget for the day is gone, and the shared rate limit is saturated — so every other user's agent starts failing too.
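The arithmetic behind that fan-out is worth making explicit. The sketch below assumes, purely for illustration, that every paper is cited by the same number of others; real citation graphs are uneven, and the branching factor of 20 is an invented figure.

```python
def total_calls(branching: int, depth: int) -> int:
    """Searches issued when every result spawns `branching` more, down to `depth` levels."""
    return sum(branching ** level for level in range(depth + 1))


# Illustrative numbers only: 20 citing papers per paper.
for depth in range(1, 5):
    print(depth, total_calls(20, depth))
# prints:
# 1 21
# 2 421
# 3 8421
# 4 168421
```

Under these assumptions three levels already cost more than eight thousand calls and a fourth passes one hundred and sixty thousand. A step or depth cap is what fixes the total in advance.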

The same outcome can arrive without any honest mistake. An attacker notices that one cheap request to the product triggers a cascade of expensive backend calls, and sends a stream of such requests purely to run up the bill and exhaust the quota. In both cases a per-task iteration cap, a per-user budget, and a circuit breaker on spend would have turned an open-ended runaway into a task that stopped at a known, affordable limit.
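A per-user budget of the kind mentioned above might look like the following sketch. `ActorBudget`, its fields, and the limits used are hypothetical; a production system would usually keep these counters in a shared store rather than in process memory, and would reset them at the start of each budget window.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when an actor would go over its request or spend ceiling."""


@dataclass
class ActorBudget:
    max_requests: int        # requests allowed per window (assumed unit)
    max_spend_usd: float     # spend allowed per window (assumed unit)
    _requests: dict = field(default_factory=lambda: defaultdict(int))
    _spend: dict = field(default_factory=lambda: defaultdict(float))

    def charge(self, actor_id: str, cost_usd: float) -> None:
        """Record one request, or refuse it if either ceiling would be crossed."""
        if self._requests[actor_id] + 1 > self.max_requests:
            raise BudgetExceeded(f"{actor_id}: request limit reached")
        if self._spend[actor_id] + cost_usd > self.max_spend_usd:
            raise BudgetExceeded(f"{actor_id}: spend limit reached")
        self._requests[actor_id] += 1
        self._spend[actor_id] += cost_usd

    def spent(self, actor_id: str) -> float:
        return self._spend[actor_id]


budget = ActorBudget(max_requests=100, max_spend_usd=5.00)
budget.charge("user-a", 2.00)
budget.charge("user-a", 2.50)
# A further 2.00 charge for user-a would raise BudgetExceeded;
# user-b still has its full allowance.
```

Because the check runs before the counters are updated, a refused request costs the actor nothing and never reaches the paid backend.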

Four layers that compose into a defence

  1. Bound the loop.

    Every agent run has a hard maximum on iterations and on tool calls, enforced in code, not left to the model's judgement. When the cap is hit the task stops with a clear “limit reached” result rather than continuing. A bounded loop cannot run away.

  2. Budget per actor.

    Track spend and request volume per user and per tenant, and enforce a ceiling on both. One user's expensive task cannot consume the shared quota, and a single account cannot exceed the cost you allocated to it. Budgets convert “unlimited blast radius” into a known, per-actor cap.

  3. Time-box everything.

    Set timeouts on the overall task and on each tool call, so a hung step or a slow dependency cannot tie up resources indefinitely. A task that exceeds its time budget is cancelled, not left running.

  4. Add circuit breakers and monitor cost.

    Watch spend, token usage, and error rate in real time, and trip a breaker that halts the agent when any crosses a threshold. Pair it with alerts so a runaway is noticed in minutes, not on the monthly invoice. This is the Unsupervised Perimeter lesson applied to cost.
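Layers 1 and 3 can be sketched together in a few lines of Python. Everything named here (`run_bounded`, `Limits`, `model_step`, `call_tool`, and the default values) is hypothetical and not taken from any agent framework. One limitation is worth stating up front: the per-tool timeout uses a thread pool, which stops waiting for a slow call but cannot forcibly kill it, so a real deployment would likely also pass a timeout to the underlying client or run tools in a subprocess.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as ToolTimeout
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_iterations: int = 20
    max_tool_calls: int = 10
    task_timeout_s: float = 60.0
    tool_timeout_s: float = 10.0


def run_bounded(task, model_step, call_tool, limits=Limits()):
    """Agent loop whose ceilings are enforced in code, not by the model."""
    deadline = time.monotonic() + limits.task_timeout_s
    history, tool_calls = [task], 0
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        for _ in range(limits.max_iterations):
            if time.monotonic() >= deadline:
                return ("limit_reached", "task timeout")
            kind, payload = model_step(history)  # not time-boxed in this sketch
            if kind == "done":
                return ("ok", payload)
            if tool_calls >= limits.max_tool_calls:
                return ("limit_reached", "tool-call cap")
            tool_calls += 1
            # Wait no longer than the tool timeout or the time left on the task.
            remaining = max(deadline - time.monotonic(), 0.0)
            future = pool.submit(call_tool, payload)
            try:
                result = future.result(timeout=min(limits.tool_timeout_s, remaining))
            except ToolTimeout:
                return ("limit_reached", "tool timeout")
            history.append(result)
        return ("limit_reached", "iteration cap")
    finally:
        pool.shutdown(wait=False)


def never_done(history):
    """Stand-in for a model that always asks for another tool call."""
    return ("tool", "search")


print(run_bounded("cite-crawl", never_done, lambda q: "results",
                  Limits(max_iterations=5)))
# prints: ('limit_reached', 'iteration cap')
```

Returning an explicit `limit_reached` status, rather than raising or silently truncating, lets the caller tell a capped task apart from a finished one.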

Availability and cost are the same control here.

The limit that stops a runaway from draining your wallet is the same limit that stops it from exhausting the quota everyone else depends on. Bound consumption once and you protect both the bill and the service.
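The breaker from layer 4 can be sketched as a small object that watches spend and error rate over a sliding window. `SpendBreaker`, its thresholds, and the window design are assumptions for illustration; where each call's cost figure comes from (provider usage data, a local price table) is left open.

```python
import time
from collections import deque


class CircuitOpen(Exception):
    """Raised when the breaker has tripped and calls must stop."""


class SpendBreaker:
    def __init__(self, max_spend_usd, max_error_rate, window_s=60.0,
                 min_calls=5, clock=time.monotonic):
        self.max_spend_usd = max_spend_usd
        self.max_error_rate = max_error_rate
        self.window_s = window_s
        self.min_calls = min_calls   # avoid tripping on a single early failure
        self.clock = clock
        self.events = deque()        # (timestamp, cost_usd, failed)
        self.open = False

    def record(self, cost_usd, failed=False):
        """Log one call's outcome and trip the breaker if a threshold is crossed."""
        now = self.clock()
        self.events.append((now, cost_usd, failed))
        while self.events[0][0] < now - self.window_s:
            self.events.popleft()    # drop events older than the window
        spend = sum(cost for _, cost, _ in self.events)
        errors = sum(1 for _, _, bad in self.events if bad)
        too_many_errors = (len(self.events) >= self.min_calls
                           and errors / len(self.events) > self.max_error_rate)
        if spend > self.max_spend_usd or too_many_errors:
            self.open = True         # an alert would be raised here as well

    def check(self):
        """Call before every model or tool request."""
        if self.open:
            raise CircuitOpen("breaker tripped; manual reset required")


breaker = SpendBreaker(max_spend_usd=1.00, max_error_rate=0.5)
for _ in range(3):
    breaker.check()
    breaker.record(cost_usd=0.40)
print(breaker.open)  # prints: True
```

The breaker stays open until someone resets it, which is a deliberate choice in this sketch: a spend spike is a reason for a human to look, not for the agent to retry after a pause.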

A practical checklist

Test your own codebase in ten minutes

The fastest way to find out whether this anti-pattern is present in your own system is to ask an AI coding assistant to look for it. Run the prompt below in a fresh chat session, on its own — and judge the system by what the code actually does, not by what its documentation claims.

Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). Identify every location
the failure mode below could occur, read those files in full before
you judge, and list the search terms you used so I can confirm nothing
was missed.

You are looking for one specific failure mode: an agent loop calls
paid models or metered tools with no enforced ceiling — no hard
cap on iterations or tool calls, no per-user or per-tenant budget or
rate limit, no task/tool timeout, and no circuit breaker that halts a
runaway when spend or error rate spikes.

If there is no agent loop or external/metered call, say
"not applicable".

Respond with exactly these four sections:
1. VERDICT: one of [present / not present / unclear]
2. EVIDENCE: file path + line numbers + a one-line quote per claim
3. WHY IT MATTERS: two sentences, plain English
4. FIX: a concrete change, with a short before/after code snippet
   if applicable. If "unclear", list the one piece of context you
   need to decide.

Insist on the four-part answer: a verdict with a file path, a line number, and a one-line quote is something you can act on; a verdict on its own is just an opinion. If the verdict is “present”, the FIX section is your starting point: add iteration caps, per-actor budgets, timeouts, and a circuit breaker. Re-run the same prompt after the change to confirm the verdict flips to “not present”.

Conclusion

An agent is a loop that spends money, and a loop without a bound is an incident with a delay. The fix is unglamorous and reliable: cap the iterations, budget per actor, time-box the work, and break the circuit when cost or errors spike. None of it limits what a well-behaved agent can do — it only ensures a misbehaving one stops at a number you chose instead of a number your invoice reveals.

References & further reading

  1. OWASP Top 10 for LLM Applications — LLM10: Unbounded Consumption.
  2. OWASP Agentic AI — Threats and Mitigations — resource-exhaustion risks in agent loops.
  3. NIST AI Risk Management Framework — resilience and resource-management controls.
  4. The Unsupervised Perimeter — monitoring and alerting on agent behaviour.