Chapter 5 — The Bootstrap-Repo Architecture: Separating Source from Distribution

Key insight

Two repositories beat one repository with an orphan branch. The mental model of “source repo” and “distribution repo” is cleaner, the per-release ergonomics are simpler, and the security boundary is more defensible. Adopt the two-repo layout from the first release.

The three structural choices

Once you have decided to ship a signed-image appliance, the customer-facing material (the install script, the documentation the customer needs, and the pin file) has to live somewhere. There are three places to put it.

Option one is to put it in the same repository as the application source, in a top-level folder. The customer clones the whole repository. This works but defeats the entire purpose of the signed-image appliance pattern: the source is right there in the customer’s clone, undermining IP protection and pulling internal dev history into the customer’s view.

Option two is to keep the customer-facing material in an orphan branch of the source repository. The customer clones that branch only. This works mechanically and has the appeal of single-repo simplicity. In practice, it generates a steady stream of low-grade pain: discipline is required to keep development away from the orphan branch; CI configurations get fiddly because the workflow has to know which branch it is running on; each new maintainer spends a confused afternoon on first discovering the orphan. Migrating to a different code-hosting service later is also painful, because the orphan branch’s relationship to main has to be preserved.

Option three is to create a second repository, a bootstrap repository, that contains only the customer-facing material. This is the recommended pattern. The cost is small (one extra repository to create); the benefits recur with every release and every customer onboarding.

Why the two-repository layout wins

Five reasons, in approximate order of importance.

First, it makes the security boundary obvious. The source repository is internal; the bootstrap repository is customer-facing. Every developer can see, at a glance, which side of the boundary a file is on. The boundary stops being a discipline question and becomes a structural one.

Second, it makes branch-protection rules independent. The source repository can have aggressive branch protection (required reviews, required tests, restricted committers) without slowing down the bootstrap repository’s release-by-pull-request flow. Conversely, the bootstrap repository can have a different review policy — potentially even auto-merge of pin-file updates from a verified CI identity — without weakening the source repository.

Third, it makes dependency scanning useful. Bots that scan dependencies (Dependabot, Renovate, the equivalents) report findings against the repository they ran in. With two repositories, the source repository’s alerts are about the application; the bootstrap repository’s alerts are about install-time tooling. Mixed alerts in a single repository drown out signal.

Fourth, it makes the customer’s subscription model meaningful. A customer following the bootstrap repository sees only release events. They are not paged for development commits, feature-branch activity, or documentation drift. They get clean signal.

Fifth, it makes migration trivial. If the source repository moves — to a new code-hosting service, a new organisation, a new account — the customer does not notice. Only the bootstrap repository has the source-repository URL embedded in it (in the signing-identity expression in the pin file), and that is a one-commit update.
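The second reason mentions auto-merging pin-file updates from a verified CI identity. The following is a minimal sketch of what such an eligibility rule could look like, not a description of any hosting service's feature. The identity name, the file names, and the function are all hypothetical, and the string comparison stands in for whatever identity verification the hosting service actually provides.

```python
# Hypothetical values: the CI identity and the files a release is allowed to touch.
CI_IDENTITY = "release-bot"
AUTO_MERGE_PATHS = {"pin.json", "CHANGELOG.md"}


def may_auto_merge(author: str, changed_paths: list[str]) -> bool:
    """Allow auto-merge only for non-empty changes made by the CI identity
    that touch nothing outside the release-only files."""
    if author != CI_IDENTITY:
        return False
    if not changed_paths:
        return False
    return set(changed_paths) <= AUTO_MERGE_PATHS
```

Under this rule a pull request that also edits the install script falls back to human review, which keeps the relaxed policy confined to the files that change on every release.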

What goes in the bootstrap repository

The bootstrap repository is small — tens of files, typically under a megabyte. What goes in it is determined by one question: does the customer need this to install, configure, or operate the agent?

The install script. One entry point, typically with the same name across releases.
The provisioning scripts that the install script calls. These are the supporting steps — identity provisioning, resource-group creation, network configuration, secret-rotation helpers — that the install script orchestrates.
The pin file (chapter 4).
The customer-facing documentation. A short README at the root, a longer setup guide, a troubleshooting guide, a changelog.
Any configuration schemas the customer fills in. Typically a sample configuration file with comments explaining each field.
The licence under which the customer is using the agent.

What deliberately does not go in

The other half of the design is what is excluded. The bootstrap repository must not contain:

The agent’s source code. (That is in the image.)
The agent’s build configuration. (That is in the source repository’s CI.)
The agent’s tests. (Same.)
Internal documentation — design notes, roadmaps, incident write-ups, decision records.
Customer-specific configuration values. The bootstrap repository ships schemas and samples, not real values.
Any internal CI scripts that should not run in a customer’s environment.
Tenant identifiers, fully qualified domain names, or any string that maps back to a real installation.

The exclusion list is enforced two ways: by the script that copies content from the source repository to the bootstrap repository (which uses an explicit allow-list, not a deny-list), and by the pre-flight hygiene checks (chapter 7) that run before any push to the bootstrap repository’s remote.
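The difference between an allow-list and a deny-list is easiest to see in code. The sketch below assumes a hypothetical layout in which customer-facing files live under deploy/ and docs/customer/; the paths and function names are illustrative only. Anything not explicitly listed is rejected, so a newly added internal file is excluded by default rather than by somebody remembering to deny it.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list; paths are relative to the source repository root.
ALLOW_LIST = [
    "deploy/install.sh",
    "deploy/provision",   # a directory: everything beneath it is allowed
    "docs/customer",
    "LICENSE",
]


def is_allowed(path: str, allow_list: list[str] = ALLOW_LIST) -> bool:
    """Return True only for an allow-listed file or a file under an allow-listed directory."""
    candidate = PurePosixPath(path)
    # Reject absolute paths and any attempt to climb out with "..".
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    for entry in allow_list:
        allowed = PurePosixPath(entry)
        if candidate == allowed or allowed in candidate.parents:
            return True
    return False


def partition(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate paths into (copied, skipped) for the review summary."""
    copied = [p for p in paths if is_allowed(p)]
    skipped = [p for p in paths if not is_allowed(p)]
    return copied, skipped
```

The match is on whole path components, so a stray deploy/install.sh.bak is not copied just because its name begins with an allow-listed one.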

The synchronisation script

The bootstrap repository is not maintained by hand. A synchronisation script in the source repository populates it from authoritative source. The script’s shape:

Take a target directory (usually a clone of the bootstrap repository) as input.
Read an explicit allow-list of paths to copy from the source repository.
For each allow-listed path, copy it to the target directory. Overwrite without prompting; the script is the source of truth.
Generate a customer-facing README at the root of the bootstrap repository, replacing the source repository’s README (which is internally focused).
Write a placeholder pin file if one does not exist, so the install script’s shape does not break before the first signed release.
Write a .gitignore matching the source repository’s.
Print a summary of what changed, for review before commit.
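The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the actual script: the allow-list entries, the README text, the pin-file name pin.json, and the placeholder content are all hypothetical (chapter 4 defines the real pin file).

```python
import shutil
from pathlib import Path

# All values below are hypothetical; adjust them to the real repository layout.
ALLOW_LIST = ["install.sh", "provision", "docs/setup.md", "LICENSE"]
README_TEXT = "Agent bootstrap repository\n\nRun install.sh to install the agent.\n"
PIN_FILE = "pin.json"
PIN_PLACEHOLDER = '{"placeholder": true}\n'


def sync(source: Path, target: Path) -> list[str]:
    """Populate `target` from `source` and return a summary of what changed."""
    changes: list[str] = []

    def differs(rel: str, data: bytes) -> bool:
        """Record a change and prepare the parent directory if target/rel is not `data`."""
        dest = target / rel
        if dest.exists() and dest.read_bytes() == data:
            return False
        changes.append(f"{'updated' if dest.exists() else 'added'}: {rel}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        return True

    # Copy every allow-listed path, overwriting without prompting.
    for entry in ALLOW_LIST:
        src = source / entry
        files = sorted(p for p in src.rglob("*") if p.is_file()) if src.is_dir() else [src]
        for path in files:
            rel = path.relative_to(source).as_posix()
            # A missing allow-listed file raises FileNotFoundError here, on purpose.
            if differs(rel, path.read_bytes()):
                shutil.copy2(path, target / rel)  # copy2 keeps the executable bit

    # Generate the customer-facing README.
    if differs("README.md", README_TEXT.encode()):
        (target / "README.md").write_bytes(README_TEXT.encode())

    # Write a placeholder pin file only if none exists; never overwrite a real one.
    if not (target / PIN_FILE).exists():
        differs(PIN_FILE, PIN_PLACEHOLDER.encode())
        (target / PIN_FILE).write_bytes(PIN_PLACEHOLDER.encode())

    # Mirror the source repository's .gitignore when it has one.
    gitignore = source / ".gitignore"
    if gitignore.is_file() and differs(".gitignore", gitignore.read_bytes()):
        shutil.copy2(gitignore, target / ".gitignore")

    return changes
```

Printing each entry of the returned list gives the summary for review before commit. The sketch does not delete files that have dropped off the allow-list; a real script would need to decide how to handle removals.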

The first run of the script is what creates the bootstrap repository’s content. Subsequent runs are typically rare: the script’s job is to keep the install scripts and documentation in sync with what the source repository thinks they should be. Once the bootstrap repository has settled, most release events update only the pin file and the changelog; the install scripts change rarely.

Drift detection

Drift between the source repository’s view of the install scripts and the bootstrap repository’s copy is a real failure mode. Somebody fixes a bug in the install script, commits to the source repository, and forgets to re-run the sync script. A customer onboarding the following month gets the unfixed version.

The defence is a small CI job that runs the sync script in a dry-run mode and fails the source-repository build if the dry-run would produce changes. The signal is precise: “the bootstrap repository is stale; run the sync script and commit.” The cost is a few seconds per build; the saved support time is substantial.
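A dry-run check of this kind can be sketched as follows. It assumes the CI job has a checkout of the bootstrap repository available next to the source checkout, and it compares only the copied files, not generated ones such as the README. The allow-list entries and function names are hypothetical.

```python
import sys
from pathlib import Path

# Hypothetical allow-list; in practice this is the one the sync script uses.
ALLOW_LIST = ["install.sh", "provision", "docs/setup.md"]


def find_drift(source: Path, bootstrap: Path, allow_list: list[str]) -> list[str]:
    """List allow-listed files whose bootstrap copy is missing or differs. Writes nothing."""
    drift: list[str] = []
    for entry in allow_list:
        src = source / entry
        if not src.exists():
            drift.append(f"not in source: {entry}")
            continue
        files = sorted(p for p in src.rglob("*") if p.is_file()) if src.is_dir() else [src]
        for path in files:
            rel = path.relative_to(source).as_posix()
            dest = bootstrap / rel
            if not dest.is_file():
                drift.append(f"missing: {rel}")
            elif dest.read_bytes() != path.read_bytes():
                drift.append(f"stale: {rel}")
    return drift


def check(source: Path, bootstrap: Path, allow_list: list[str] = ALLOW_LIST) -> int:
    """Return a process exit code: 0 when in sync, 1 when the bootstrap copy is stale."""
    drift = find_drift(source, bootstrap, allow_list)
    if drift:
        print("the bootstrap repository is stale; run the sync script and commit",
              file=sys.stderr)
        for line in drift:
            print(f"  {line}", file=sys.stderr)
        return 1
    return 0
```

A CI step would end with something like sys.exit(check(Path("."), Path("../bootstrap"))), so that a non-zero exit code fails the source-repository build.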

Visibility and access

The bootstrap repository’s visibility is a per-product choice. Public is reasonable for an open-source agent; private is reasonable for a commercial agent. For a private bootstrap repository, a practical arrangement is to issue a long-lived read-only access token per customer: the customer’s clone uses the token, and the token can be revoked without affecting other customers.

The agent image is a separate access question, governed by the container registry. The two access decisions are independent: it is common to have a private bootstrap repository (so casual users cannot discover the install instructions) but a public image (so verification tooling works without authentication).

Operational ergonomics

The day-to-day operator workload with the two-repository layout is small and uniform.

Per release: tag the source repository, wait for CI, review the auto-generated pull request on the bootstrap repository, merge it, tag the bootstrap repository, email customers. Roughly three minutes of operator effort, plus the CI wait.

Per customer onboarding: send the bootstrap-repository URL and any required access token. The customer clones, runs the install script, the script does everything else.

Per migration of the source repository: update the signing-identity expression in the bootstrap repository’s pin file template. Re-tag the next release. Customers are unaffected.
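To make the migration case concrete, here is a sketch of the one-field update. The pin file format belongs to chapter 4 and is not reproduced here, so the JSON layout and the signing_identity field name below are assumptions made purely for illustration, as are the function name and URLs.

```python
import json


def retarget_identity(pin_text: str, old_repo: str, new_repo: str) -> str:
    """Rewrite the source-repository URL inside the signing-identity expression.

    Assumes a hypothetical JSON pin file with a "signing_identity" field.
    """
    pin = json.loads(pin_text)
    identity = pin["signing_identity"]
    if old_repo not in identity:
        # Refuse to write a file that silently kept the old identity.
        raise ValueError(f"{old_repo!r} not found in signing identity")
    pin["signing_identity"] = identity.replace(old_repo, new_repo)
    return json.dumps(pin, indent=2) + "\n"
```

Every other field passes through unchanged, which is what keeps the migration to a single small commit in the bootstrap repository.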

A note on monorepos

Teams using a monorepo sometimes ask whether the bootstrap-repo pattern can be implemented as a subdirectory of the monorepo. It can, with the same orphan-branch caveats as option two above — you end up enforcing discipline rather than structure. If the monorepo is non-negotiable, the closest practical equivalent is to publish the contents of the bootstrap subdirectory to a separate distribution repository via CI on every commit, treating the distribution repository as the canonical customer interface. The pattern is the same; the editing surface is just different.

References & further reading

OWASP Top 10 for LLM Applications, LLM03 Supply Chain (which discusses the surface created by distribution channels). genai.owasp.org
SLSA framework, source-integrity requirements at each level. slsa.dev/spec
NIST Secure Software Development Framework (SP 800-218), “Protect the Software” family. csrc.nist.gov/Projects/ssdf