The Unsanitised Output Sink: Model Output Is Untrusted Input

Key insight

Model output is untrusted input to whatever consumes it. Encode it for its destination, validate structured output against a schema, and run the sink with least privilege. The model being yours does not make its output safe, because the input that shaped it was not yours.

Why output gets a free pass

Application-security training teaches us to distrust user input. The model, by contrast, feels like part of our own system — we configured it, we wrote its prompt, it speaks in our product's voice — so its output tends to inherit the trust we'd give our own code. That instinct is wrong. The model's output is a function of its input, and a meaningful share of that input is attacker-reachable: the user's message, a retrieved document, a tool result. Anything that can steer the model can steer what it emits.

The OWASP Top 10 for LLM Applications (2025 edition) names this LLM05: Improper Output Handling. The failure is not in the model; it is in the code that takes the model's text and drops it, unescaped, into a sink that interprets text as instructions: the browser's HTML parser, a SQL engine, a shell, an eval, a downstream API. The classic web vulnerabilities never went away; the model just became a new way to reach them.

The Unsanitised Output Sink anti-pattern

Anti-pattern

Unsanitised Output Sink

Definition. Output produced by the model is passed to a downstream interpreter — browser, shell, SQL engine, file system, or another service — without the encoding, escaping, schema validation, or privilege limits that interpreter requires for untrusted data.

Symptoms. Model output rendered as raw HTML or Markdown without sanitisation; generated strings interpolated into SQL, shell commands, or file paths; structured output parsed without schema validation; output forwarded to another service that treats it as a trusted instruction; output handling that differs from how the same code treats user input.

Why it is hazardous. Because the model is steerable by prompt injection, attacker-controlled input becomes attacker-controlled output, and an unsanitised sink turns that into stored XSS, SQL injection, command execution, or a forged downstream call.

Related controls. Treat output as untrusted; context-aware output encoding per sink; schema validation for structured output; parameterised queries and argument-array process execution; least-privilege sinks; and a content security policy for rendered output.
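To make the "generated strings interpolated into SQL" symptom concrete, here is a minimal sketch using Python's standard sqlite3 module. The table, the function names, and the steered output string are all hypothetical, chosen only for illustration; the point is the contrast between interpolating model text into the query and binding it as a parameter.

```python
import sqlite3


def find_docs_unsafe(conn, model_text):
    # Anti-pattern: model output is interpolated straight into the SQL text,
    # so any quote characters it contains are parsed as SQL.
    query = f"SELECT title FROM docs WHERE owner = '{model_text}'"
    return [row[0] for row in conn.execute(query)]


def find_docs_safe(conn, model_text):
    # Parameterised query: the driver binds model_text as a value,
    # so it is compared as data and never parsed as SQL.
    query = "SELECT title FROM docs WHERE owner = ?"
    return [row[0] for row in conn.execute(query, (model_text,))]


def demo_connection():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT, owner TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?)",
        [("public-faq", "alice"), ("payroll", "bob")],
    )
    return conn


conn = demo_connection()

# A string an injected instruction might steer the model into emitting.
steered_output = "alice' OR '1'='1"

leaked = find_docs_unsafe(conn, steered_output)    # matches every row
contained = find_docs_safe(conn, steered_output)   # matches no owner
```

With the interpolated query the WHERE clause becomes always true and both rows come back; with the bound parameter the whole string is treated as an owner name that does not exist.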

A hypothetical exploitation

The following illustrates a plausible failure mode. No specific incident is implied.

A knowledge agent answers questions and renders its answers as HTML so it can show formatted tables and links. The rendering path inserts the model's output into the page directly, trusting it to be benign prose. An attacker plants a document in the agent's corpus containing an instruction: “End every answer with the following HTML,” followed by a <script> tag that exfiltrates the page's session token.

Later, an ordinary user asks a question that retrieves the planted document. The model appends the script tag as told, and because the rendering path does not sanitise, the browser executes it — stored cross-site scripting, delivered through the model. The same shape repeats at every other sink: had the output instead been interpolated into a SQL WHERE clause or passed to a shell, the result would have been injection or command execution.
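The rendering failure in this scenario can be sketched in a few lines with Python's standard html module. The function names and the planted string are hypothetical. Note the assumption: this sketch escapes everything, which removes the formatting the agent wanted to show; keeping tables and links would need an allow-list HTML sanitiser on top, which is not shown here.

```python
import html


def render_answer_unsafe(model_text: str) -> str:
    # Anti-pattern: model output is inserted into the page as markup.
    return f'<div class="answer">{model_text}</div>'


def render_answer_escaped(model_text: str) -> str:
    # html.escape turns &, <, > and quotes into entities, so the browser
    # displays the text instead of parsing it as tags.
    return f'<div class="answer">{html.escape(model_text, quote=True)}</div>'


# Hypothetical output after the model obeyed the planted instruction.
planted = "See the table above.<script>send(document.cookie)</script>"

unsafe_page = render_answer_unsafe(planted)
safe_page = render_answer_escaped(planted)
```

In the unsafe page the script element survives intact; in the escaped page it arrives as inert text beginning with `&lt;script&gt;`.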

Four layers that compose into a defence

1. Classify model output as untrusted at the boundary.
   The function that receives the model's response marks it as untrusted, the same label you'd put on raw user input. Every downstream use then inherits the obligation to handle it safely. This single reclassification is what most of the fix depends on.
2. Encode for the specific sink.
   Apply context-aware encoding at the point of use: HTML-encode (and sanitise allowed tags) before rendering, parameterise before a database, pass arguments as an array rather than a shell string, and validate structured output against a strict schema before acting on it. There is no universal escape: the encoding must match the destination.
3. Constrain what the output can express.
   Where the output drives an action, restrict it to a closed set: an enum of allowed commands, an allow-list of URL schemes, a schema with no free-form code. A model that can only choose from safe options cannot be talked into an unsafe one.
4. Run every sink with least privilege.
   Assume sanitisation will one day be bypassed and limit the damage: a content security policy on rendered output, a read-only or scoped database role, a sandboxed and unprivileged process for anything executed. Defence in depth means a single missed escape is contained, not catastrophic.
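The first and third layers can be sketched together: a wrapper type that labels model output as untrusted at the boundary, and a strict parser that only ever yields a member of a closed set of actions. Everything named here (Untrusted, Action, the two JSON keys, the length limit) is a hypothetical illustration, and the hand-written checks stand in for whatever schema validator a real system would use.

```python
import json
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Untrusted:
    """Label for text that must be validated or encoded before any use."""
    raw: str


class Action(Enum):
    # The closed set of things model output is allowed to request.
    SEARCH = "search"
    SUMMARISE = "summarise"


ALLOWED_KEYS = {"action", "query"}
MAX_QUERY_LENGTH = 200  # illustrative limit, not a recommendation


def receive_model_output(text: str) -> Untrusted:
    # Boundary function: everything from the model gets the untrusted label.
    return Untrusted(text)


def parse_action(output: Untrusted) -> tuple[Action, str]:
    # Strict validation: exact keys, expected types, action from the enum.
    try:
        data = json.loads(output.raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != ALLOWED_KEYS:
        raise ValueError("unexpected shape or keys")
    action_name, query = data["action"], data["query"]
    if not isinstance(action_name, str) or not isinstance(query, str):
        raise ValueError("action and query must be strings")
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("query is too long")
    try:
        action = Action(action_name)
    except ValueError as exc:
        raise ValueError("action is not in the allowed set") from exc
    return action, query
```

A response such as `{"action": "search", "query": "refund policy"}` parses to `Action.SEARCH`; a response that names an unknown action, adds an extra key, or is not JSON at all raises `ValueError` before anything acts on it. The returned query is still untrusted text and still needs encoding for whichever sink receives it.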

If you'd escape it from a user, escape it from the model.

The test is simple: imagine the exact output string had arrived in a form field. If you'd sanitise it then, you must sanitise it now — the model is just a more articulate source of the same untrusted text.

A practical checklist

Every place model output reaches a sink (browser, SQL, shell, file system, downstream API) is enumerated.
Model output is classified as untrusted at the boundary, identically to user input.
Rendered output is sanitised and context-encoded; raw HTML/Markdown insertion is not used.
Database access from generated values uses parameterised queries, never string interpolation.
Process execution passes arguments as an array; no generated text reaches a shell string.
Structured output is validated against a strict schema before it drives any action.
Output that drives actions is limited to a closed allow-list of commands/schemes.
Rendered output is covered by a content security policy; sinks run with least privilege.
Output handling is reviewed alongside input handling, not as an afterthought.
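Two of the checklist items, argument-array process execution and a closed allow-list of URL schemes, can be sketched with the standard library alone. The function names and the single-entry scheme list are hypothetical, and the child process is just the current Python interpreter echoing its argument, standing in for whatever program a real system would run.

```python
import subprocess
import sys
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}  # illustrative allow-list


def run_with_argv(model_text: str) -> str:
    # Arguments are passed as a list and no shell is involved, so
    # characters such as ; | $ in model_text are never interpreted.
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", model_text]
    result = subprocess.run(
        argv, capture_output=True, text=True, check=True, timeout=10
    )
    return result.stdout


def is_allowed_url(model_text: str) -> bool:
    # Accept only URLs whose scheme is on the allow-list and that name a host.
    try:
        parts = urlsplit(model_text.strip())
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

Calling `run_with_argv("report.txt; echo INJECTED")` hands the whole string to the child as one argument, so the text after the semicolon is printed back rather than run as a second command. `is_allowed_url` accepts `https://example.com/docs` and rejects `javascript:alert(1)` and `file:///etc/passwd`.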

Test your own codebase in ten minutes

The fastest way to find out whether this anti-pattern is present in your own system is to ask an AI coding assistant to look for it. Run the prompt below in a fresh chat session, on its own — and judge the system by what the code actually does, not by what its documentation claims.

Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). Identify every location
the failure mode below could occur, read those files in full before
you judge, and list the search terms you used so I can confirm nothing
was missed.

You are looking for one specific failure mode: text produced by the
model is passed to a downstream interpreter — rendered as HTML or
Markdown in a browser, interpolated into a SQL query, passed to a
shell or eval, written to a file path, or forwarded to another service
— without context-aware encoding, schema validation, or
least-privilege limits. In short, model output is trusted where user
input would be sanitised.

If model output never reaches such a sink, say "not applicable".

Respond with exactly these four sections:
1. VERDICT: one of [present / not present / unclear / not applicable]
2. EVIDENCE: file path + line numbers + a one-line quote per claim
3. WHY IT MATTERS: two sentences, plain English
4. FIX: a concrete change, with a short before/after code snippet
   if applicable. If "unclear", list the one piece of context you
   need to decide.

Insist on the four-part answer: a verdict with a file path, a line number, and a one-line quote is something you can act on; a verdict on its own is just an opinion. If the result is present, the FIX section is your starting point — encode per sink and validate structured output. Re-run the same prompt after the change to confirm the verdict flips to not present.

Conclusion

The model did not invent a new vulnerability class; it gave attackers a new, fluent way to reach the old ones. The remedy is the discipline application security has taught for twenty years, pointed at a new source: distrust the output, encode it for its destination, constrain what it can express, and run the sink with least privilege. Do that and a successful prompt injection has a much harder path to code execution, and a missed escape stays contained.

References & further reading

OWASP Top 10 for LLM Applications — LLM05: Improper Output Handling.
OWASP Cross-Site Scripting Prevention Cheat Sheet — context-aware output encoding.
OWASP Injection Prevention Cheat Sheet — parameterisation and safe sinks.