Key insight
The point of a 10-minute audit is that you actually run it. A 60-minute one will be skipped on the release that needed it most. Calibrate the routine to fit inside whatever time you reliably have on release day.
Step 1 — The five questions (2 minutes)
Answer these honestly, in your head or in the release-notes draft. The point is to flush the answers from intuition into language, where they can be examined.
- Did this release add, change, or expose a new capability? A new tool the agent can call, a new file path it can read, a new endpoint it can reach. If yes → Prompt 1 and Prompt 11 are required for this release.
- Did this release change anything about authentication, tokens, or how the agent gets its credentials? If yes → Prompt 6 and Prompt 7 are required.
- Did this release add a new place the agent ingests untrusted text? A new RAG source, a new web-fetch tool, a new connector. If yes → Prompt 3 and Prompt 2 are required.
- Did the README or docs change in a way that promises a security behaviour? If yes → Prompt 12 is required — the doc and the code must agree.
- Is this the release that crosses the line from internal to shipped? If yes → you are outside the 10-minute routine. Run the full pack (Tip 2) and the red-team prompt (Tip 3). Prompt 13 is the gate.
If all five answers are “no”, you still run the three prompts below — they are the always-on baseline.
Step 2 — The three always-on prompts (5 minutes)
These three catch the highest-impact regressions in tools that change frequently. Run each in a fresh session, with the whole-repo search set per Tip 1.
Diff-aware tool-catalogue check.
Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor). List the search terms
you used so I can confirm nothing was missed.
List every distinct capability the agent can invoke (tool, function,
sub-agent, shell escape, HTTP call). For each, say whether it is
constrained to a specific named scope or is open-ended. If anything
in this list is new compared to a typical previous-release shape,
flag it.
[response shape]
Credential surface check.
Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor), but do include any .env
example, config file, or Dockerfile. List the search terms you used
so I can confirm nothing was missed.
List every credential, token, or secret the running agent can
access at the moment of a tool call. For each, say: is it long-lived
or short-lived, broad-scope or narrow-scope, in memory only or also
on disk.
[response shape]
Doc-vs-code consistency check.
Search the whole repository to find where this applies — do not
wait for me to list files. Ignore generated, vendored, and dependency
folders (build output, node_modules, vendor), but do include README.md,
SECURITY.md, and any docs/ file. List the search terms you used so I
can confirm nothing was missed.
For every security-relevant claim in the documentation (sandboxing,
isolation, redaction, authentication, permissions, encryption), point
to the source code that implements it. Flag claims with no
implementation.
[response shape]
Step 3 — The release-note paragraph (3 minutes)
Paste this template at the bottom of your release notes. Fill in the blanks. If you cannot, do not ship the release — the questions you cannot answer are the ones a user will ask first.
### Security note for this release
- New capabilities granted to the agent: _____
- Credentials added, rotated, or scoped: _____
- New sources of untrusted text ingested: _____
- Pre-release self-audit run on: <date>
- Self-audit verdict: all prompts “not present” / “not applicable”
except _____ (with mitigation: _____)
- Known limitations users should be aware of: _____
Two things this paragraph quietly does. It tells a security-minded user what they need to know in one place. And it tells you, three releases from now, what was true at the time of this one — which is what a regression investigation needs in its first ten minutes.
When the 10-minute routine is not enough
| Trigger | Run instead |
|---|---|
| First public release of the tool | Full prompt pack (Tip 2) + red-team prompt (Tip 3) + the “ready to ship” prompt (#13). |
| Major refactor of the agent loop or tool dispatcher | Run the full prompt pack on the changed files. |
| New runtime, new identity provider, or new deployment target | Full pack, scoped to the new infra files plus the dispatcher. |
| You received a security report from an outside party | Treat the report as a Tip 3 playbook entry: verify, fix, then run the related prompts from the Tip 4 cheat sheet. |
That’s the playbook
Five tips: scope the assistant, run the 13 prompts, run the red-team prompt, capture the findings and re-test, and run the 10-minute routine on every release. None of it is exotic. The reason it works is that you actually do it — on the small Tuesday-afternoon release as well as the big launch — and the answers accumulate in a file your future self can read.
If you find a tip that should be in this series and is not, the same prompts you used to review your own tool will tell you. Run them; let me know what came back.