Key insight

The point of capturing findings is not bureaucracy; it is being able to ask the same question again next month and notice whether the answer has changed. Without that, every release is a fresh self-review with no memory.

The fix-list file

One file at the root of the tool: SELF-REVIEW.md. Append-only. One block per finding. The format is small on purpose: if it takes more than a minute to fill in, no one will.

## Live-fetch dependency  ·  2026-05-31

VERDICT (initial): present
EVIDENCE: install.sh:14 — "curl -sSL https://example.com/setup.sh | sh"
FIX: pinned to commit hash; SHA verified before exec.
RE-RUN VERDICT: not present (2026-05-31, same prompt)
NOTES: also caught by Prompt 9 on re-run — CI workflow used
       actions/checkout@v4 (now pinned to SHA).

Five lines is the floor. The RE-RUN VERDICT line is the one that matters: it is where you confirm that the fix closed the problem as the prompt sees it, not just the symptom you noticed.
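Because every block uses the same KEY: value layout, the file is easy to check mechanically. Below is a minimal Python sketch, assuming each block starts with a `## title  ·  YYYY-MM-DD` heading and that an indented line continues the previous field (as NOTES does above). `Finding`, `parse_findings` and `open_findings` are hypothetical names, not part of any existing tool. It lists findings that have no RE-RUN VERDICT yet, or whose latest one is neither "not present" nor "not applicable".

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Block heading, e.g. "## Live-fetch dependency  ·  2026-05-31" (assumed layout).
HEADING = re.compile(r"^## (?P<title>.+?)\s+·\s+(?P<date>\d{4}-\d{2}-\d{2})\s*$")
# Field line, e.g. "RE-RUN VERDICT: not present (...)" or "VERDICT (initial): present".
FIELD = re.compile(r"^(?P<key>[A-Z][A-Z-]*(?: [A-Z][A-Z-]*)*(?: \([a-z]+\))?):\s*(?P<value>.*)$")
# Verdicts that count as closed (mirrors the "When to stop" rule).
CLOSED = {"not present", "not applicable"}


@dataclass
class Finding:
    title: str
    date: str
    fields: dict[str, str] = field(default_factory=dict)

    def rerun_verdict(self) -> str | None:
        """Latest re-run verdict, without the trailing "(date, note)" part."""
        raw = self.fields.get("RE-RUN VERDICT")
        return None if raw is None else raw.split("(", 1)[0].strip()


def parse_findings(text: str) -> list[Finding]:
    findings: list[Finding] = []
    last_key: str | None = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            findings.append(Finding(heading["title"], heading["date"]))
            last_key = None
        elif not findings or not line.strip():
            continue  # text before the first block, or a blank line
        elif line[0].isspace() and last_key:
            # Indented line: continuation of the previous field (e.g. NOTES).
            findings[-1].fields[last_key] += " " + line.strip()
        else:
            match = FIELD.match(line)
            if match:
                # A repeated key overwrites, so the last RE-RUN VERDICT wins.
                last_key = match["key"]
                findings[-1].fields[last_key] = match["value"].strip()
    return findings


def open_findings(findings: list[Finding]) -> list[Finding]:
    """Findings with no re-run yet, or whose re-run did not come back closed."""
    return [f for f in findings if f.rerun_verdict() not in CLOSED]
```

Run over SELF-REVIEW.md before a release, `open_findings` gives a short list of fixes that were never confirmed by a re-run.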

The re-run habit

After every fix:

  1. Open a fresh chat session. (Old sessions carry context from earlier turns and will tell you what you want to hear.)
  2. Paste the same prompt that produced the original finding, verbatim. Do not paraphrase: the wording matters.
  3. Record the new verdict on the same block in SELF-REVIEW.md.
  4. If the new verdict is still present, the fix did not land where the prompt looks. Re-investigate; do not move on.
  5. If the new verdict is not present, also re-run at least one related prompt from the pack: a fix sometimes shifts the problem one category over. (Tightening a tool allow-list can move a finding from wildcard tool exposure to shared identity runtime; removing a standing credential can move it from standing credentials to phishable login flows.)
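Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming a finding block is handled as a plain string and that the re-run line follows the `RE-RUN VERDICT: verdict (date, note)` shape from the example block; `record_rerun` and `next_step` are hypothetical helpers. The new line goes at the end of the block instead of overwriting an older one, which keeps the file append-only in spirit.

```python
def record_rerun(block: str, verdict: str, date: str, note: str = "same prompt") -> str:
    """Step 3: add a RE-RUN VERDICT line to the end of one finding block.

    The line is appended rather than edited in place, so the block keeps
    its history and the last RE-RUN VERDICT is the current one.
    """
    line = f"RE-RUN VERDICT: {verdict} ({date}, {note})"
    return block.rstrip("\n") + "\n" + line + "\n"


def next_step(verdict: str) -> str:
    """Steps 4 and 5: what to do with the verdict from the fresh session."""
    if verdict == "present":
        # The fix did not land where the prompt looks.
        return "re-investigate the fix; do not move on"
    if verdict == "not present":
        # The fix may have pushed the problem one category over.
        return "re-run a related prompt from the pack"
    # The steps above only cover these two verdicts.
    raise ValueError(f"unexpected verdict: {verdict!r}")
```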

What “related” means: the cheat sheet

| If you fixed… | Also re-run… | Why |
| --- | --- | --- |
| wildcard tool exposure | shared identity runtime, ungoverned sub-agents | Narrowing the catalogue often surfaces that the runtime itself is too privileged, or that another agent in the codebase still has the old wide catalogue. |
| live-fetch dependency | mutable reference trust in CI | Same root cause in two layers (app code, then CI). Fixing one usually leaves the other. |
| standing credential | phishable login flows, telemetry leakage | Token replacement often changes how login works and what the logs contain. |
| conflated context | unauthenticated tool channels, telemetry leakage | Untrusted content reaching the prompt and untrusted content reaching the logs usually travel through the same plumbing. |
| docs that don’t match code | the prompt for whatever control the doc claimed | The fix should either implement the control or update the doc. Re-run the control’s prompt to confirm which. |
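The cheat sheet is small enough to keep as data next to the prompt pack. A minimal Python sketch, assuming the category names are spelled exactly as in the table; `RELATED` and `related_prompts` are hypothetical names. The docs row is handled separately because its follow-up depends on which control the doc claimed, so the caller has to name it.

```python
from __future__ import annotations

# The cheat sheet as data: fixed category -> related prompts to re-run.
RELATED: dict[str, tuple[str, ...]] = {
    "wildcard tool exposure": ("shared identity runtime", "ungoverned sub-agents"),
    "live-fetch dependency": ("mutable reference trust in CI",),
    "standing credential": ("phishable login flows", "telemetry leakage"),
    "conflated context": ("unauthenticated tool channels", "telemetry leakage"),
}
DOCS_MISMATCH = "docs that don't match code"


def related_prompts(fixed: str, claimed_control: str | None = None) -> tuple[str, ...]:
    """Prompts to re-run after a fix in `fixed` comes back "not present"."""
    if fixed == DOCS_MISMATCH:
        # This row depends on the control the doc claimed, so the caller names it.
        if claimed_control is None:
            raise ValueError("pass the control the doc claimed")
        return (claimed_control,)
    # Categories not in the cheat sheet have no suggested follow-up.
    return RELATED.get(fixed, ())
```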

When to stop

You are done when, in a fresh session, every prompt in the 13-pack returns either “not present” or “not applicable”, and the red-team prompt’s playbook contains no “low effort” attacks. A stricter bar than that is welcome; anything weaker should not ship.
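The stop rule can be written as a release gate. This is a minimal Python sketch, assuming you have collected one verdict per prompt from a single fresh-session run, and that each attack in the red-team playbook carries an effort label such as "low effort" (a hypothetical convention; use whatever labels your playbook prints). `release_blockers` is a hypothetical name. It cannot check that the session was fresh; that part is still on you.

```python
from __future__ import annotations

PASSING = {"not present", "not applicable"}


def release_blockers(verdicts: dict[str, str], attack_efforts: list[str], pack_size: int = 13) -> list[str]:
    """Reasons not to ship yet; an empty list means the stop rule is met.

    verdicts: prompt name -> verdict from one fresh-session run of the pack.
    attack_efforts: effort label of each attack in the red-team playbook.
    """
    blockers: list[str] = []
    if len(verdicts) < pack_size:
        blockers.append(f"only {len(verdicts)} of {pack_size} prompts were run")
    for prompt, verdict in sorted(verdicts.items()):
        if verdict not in PASSING:
            blockers.append(f"{prompt}: {verdict}")
    low = sum(1 for effort in attack_efforts if effort == "low effort")
    if low:
        blockers.append(f"red-team playbook lists {low} low-effort attack(s)")
    return blockers
```

Returning the reasons instead of a bare yes/no makes the output usable as the to-do list for the next round.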

Failure modes & triage

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Re-run says “not present” but the issue still exists | The fix moved the offending code to a file the assistant is no longer reading. | Re-check the file list and add the new location to the scope. |
| SELF-REVIEW.md keeps growing and no one reads it | The format is too verbose, or the file is buried. | Put a summary table at the top with one line per check (check → latest verdict → date), and link it from the README. |
| Same finding keeps coming back across releases | The fix was local; the design that produces the issue is still in place. | Promote the finding from “fix in code” to “fix in design” and write an ADR or design note. The next prompt run should then return “not present” for a structural reason, not a local one. |
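The summary table for the second row can be generated rather than maintained by hand. A minimal Python sketch, assuming the entries have already been pulled out of SELF-REVIEW.md as (check, verdict, ISO date) tuples in file order; `summary_table` is a hypothetical name and the pipe-table output is just one possible layout.

```python
from __future__ import annotations


def summary_table(entries: list[tuple[str, str, str]]) -> str:
    """One line per check: check, latest verdict, date.

    entries: (check, verdict, ISO date) tuples, in the order they appear
    in the file. On equal dates the later entry wins.
    """
    latest: dict[str, tuple[str, str]] = {}
    for check, verdict, date in entries:
        # ISO-8601 dates (YYYY-MM-DD) compare correctly as strings.
        if check not in latest or date >= latest[check][1]:
            latest[check] = (verdict, date)
    rows = ["| Check | Latest verdict | Date |", "| --- | --- | --- |"]
    for check in sorted(latest):
        verdict, date = latest[check]
        rows.append(f"| {check} | {verdict} | {date} |")
    return "\n".join(rows)
```

Regenerating the table on each release keeps it in sync with the append-only blocks below it.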

Next tip

You now have prompts, an adversarial pass, and a small process for working the results. The final tip, Tip 5, is the routine you run on the day of a release.