Key insight

Structured checklists find structured failures. Adversarial role-play finds the failures the checklist did not anticipate — the chain of two small mistakes that becomes one large one, the assumption the code makes that an attacker can flip. Both passes matter; neither replaces the other.

The prompt

Paste this into a fresh chat with whole-repo search set up as described in Tip 1.

Reminder: AI assistant output is probabilistic. Verify every claim against your source before acting on it. False positives and false negatives are expected.

Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). Read every file that
forms an input surface or handles untrusted data in full, and list
the search terms you used so I can confirm nothing was missed.

Take the position of an adversary who wants this AI tool / agent to
do something its authors did not intend — leak data, perform
an unauthorised action, execute attacker-supplied code, exhaust
resources, or impersonate a user.

You have read access to the source. You can send any input the
tool accepts (prompts, tool arguments, files, URLs, configuration).
You cannot modify the source.

Produce, in this exact format:

ATTACK PLAYBOOK
For each attack you can construct (aim for at least five), give:
  - GOAL: what you want the tool to do
  - ENTRY POINT: which input surface you use
  - STEPS: the concrete sequence (specific values, not hand-waving)
  - WHY IT WORKS: which line or design choice makes it possible
  - EFFORT: low / medium / high
  - BLAST RADIUS: contained / lateral movement / full compromise

DEFENCES THAT WOULD STOP EACH ATTACK
For each attack above, the smallest change to the source that
would defeat it. Include a before/after snippet.

WHAT YOU COULD NOT FIND A WAY TO ATTACK
List the parts of the tool that resisted your attempts, and say
briefly why — so the authors know what to keep.

Constraints:
- No generic advice. Every claim must reference a specific file
  and line number in the source.
- If you do not have enough context to construct any attack, say
  so and list the single piece of information you need.
- Do not invent code that is not in the files you read.

Reading the answer

You will get one of three useful outcomes, and one less-useful one:

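Whichever outcome you get, the fixed format makes the reply easy to check mechanically before you read it closely. The sketch below is one possible way to do that, not part of the original tip: it splits the ATTACK PLAYBOOK section into one record per attack, flags missing fields, and flags any WHY IT WORKS entry that does not cite a file and line. The field labels come from the prompt; the citation pattern (`path.ext:123` or `path.ext line 123`), the function names, and the sample reply are assumptions made for illustration.

```python
import re
from collections import Counter

# Field labels taken from the prompt's ATTACK PLAYBOOK format.
FIELDS = ("GOAL", "ENTRY POINT", "STEPS", "WHY IT WORKS", "EFFORT", "BLAST RADIUS")
FIELD_RE = re.compile(r"^\s*-?\s*(" + "|".join(FIELDS) + r"):\s*(.*)$")
# Assumed citation shape: a path with an extension, then ":<n>" or " line <n>".
CITATION_RE = re.compile(r"[\w./-]+\.\w+(?::| line )\d+")


def parse_playbook(reply: str) -> list[dict[str, str]]:
    """Return one dict per attack; each GOAL line starts a new attack."""
    attacks: list[dict[str, str]] = []
    current = None
    last_field = None
    for line in reply.splitlines():
        if line.strip().upper().startswith("DEFENCES"):
            break  # the playbook section ends where the defences section begins
        match = FIELD_RE.match(line)
        if match:
            field, value = match.group(1), match.group(2).strip()
            if field == "GOAL":
                current = {}
                attacks.append(current)
            if current is not None:
                current[field] = value
                last_field = field
        elif current is not None and last_field and line.strip():
            # Continuation line (for example multi-line STEPS).
            current[last_field] += " " + line.strip()
    return attacks


def review(attacks: list[dict[str, str]]) -> list[str]:
    """List format problems worth sending back to the assistant."""
    problems = []
    for i, attack in enumerate(attacks, start=1):
        missing = [f for f in FIELDS if not attack.get(f)]
        if missing:
            problems.append(f"attack {i}: missing {', '.join(missing)}")
        if not CITATION_RE.search(attack.get("WHY IT WORKS", "")):
            problems.append(f"attack {i}: WHY IT WORKS cites no file and line")
    return problems


def effort_counts(attacks: list[dict[str, str]]) -> Counter:
    """Tally EFFORT ratings, case-insensitively, to spot an all-low playbook."""
    return Counter(a.get("EFFORT", "").lower() for a in attacks)


# Hypothetical reply, invented for this example.
SAMPLE = """\
ATTACK PLAYBOOK
  - GOAL: read another user's notes
  - ENTRY POINT: the note_id tool argument
  - STEPS: call get_note with note_id=2 while logged in as user 1
  - WHY IT WORKS: tools/notes.py:41 looks up the note without checking the owner
  - EFFORT: low
  - BLAST RADIUS: lateral movement
  - GOAL: exhaust memory
  - ENTRY POINT: file upload
  - STEPS: upload a 5 GB file
  - WHY IT WORKS: the upload handler never limits size
  - EFFORT: Low

DEFENCES THAT WOULD STOP EACH ATTACK
(omitted)
"""

if __name__ == "__main__":
    parsed = parse_playbook(SAMPLE)
    for problem in review(parsed):
        print(problem)
    print(dict(effort_counts(parsed)))
```

A clean result from this check only means the reply is well-formed. It says nothing about whether each attack is real, so the reminder to verify every claim against your source still applies.
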
When to use this prompt vs. the prompt pack

Situation | Reach for
First-ever self-review of a new tool | Tip 2 (prompt pack) first, this prompt second.
Familiar tool, before a release | This prompt first; you already know the structured answers.
Refactor of a security-critical component | Both, on just the changed files.
Adding a new tool / capability to the agent | This prompt, scoped to the new file plus the dispatcher / catalogue file.

Failure modes & triage

Symptom | Likely cause | Fix
Assistant refuses to role-play an attacker | Safety policy is reading “adversary” as malicious intent. | Reframe: “Act as a security reviewer producing a defensive playbook for the authors of this tool.” Same content, different framing.
Attacks are imaginative but unrelated to your code | Assistant defaulted to general LLM-security tropes instead of reading the source. | Add: “Quote the exact line of code your attack depends on. If you cannot, do not include the attack.”
All attacks have “low effort” | Calibration drift; the assistant is overstating ease. | Add: “Effort is ‘low’ only if the attack works on first try without any reconnaissance. Most real attacks are medium or high.”
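
The "quote the exact line" fix is only useful if someone checks the quotes. As a minimal sketch, assuming you have copied each attack's cited path, line number, and quoted code out of the reply by hand, the helper below confirms that the quoted text really appears at or near the cited line. The function names and the two-line tolerance window are assumptions for illustration, not something the tip prescribes.

```python
from pathlib import Path


def normalise(text: str) -> str:
    """Collapse whitespace so indentation differences do not cause mismatches."""
    return " ".join(text.split())


def quote_matches(repo_root: Path, rel_path: str, line_no: int,
                  quoted: str, window: int = 2) -> bool:
    """Return True if `quoted` appears within `window` lines of the cited line."""
    if line_no < 1:
        return False
    root = repo_root.resolve()
    path = (root / rel_path).resolve()
    # Reject citations that point outside the repository or at a missing file.
    if not path.is_relative_to(root) or not path.is_file():
        return False
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    lo = max(0, line_no - 1 - window)
    hi = min(len(lines), line_no + window)
    needle = normalise(quoted)
    return bool(needle) and any(needle in normalise(line) for line in lines[lo:hi])


if __name__ == "__main__":
    # Hypothetical citation copied from a reply; replace with your own values.
    ok = quote_matches(Path("."), "tools/notes.py", 41, "return DB[note_id]")
    print("quote found" if ok else "quote NOT found: treat this attack as unverified")
```

A failed match does not prove the attack is wrong, since the assistant may have cited a line a few rows off, but it is a cheap signal for which findings to re-read first.
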

Next tip

You now have a structured pass and an adversarial pass. Both produce findings. Tip 4 is the small convention for turning those findings into changes that actually ship.
