Key insight

The right runner has three properties: it costs nothing while idle, it has no privileged container capabilities, and its identity is managed centrally rather than via a long-lived token in a secret store. Each property closes a different class of finding that compliance teams flag.

Why hosted runners get disabled

Cloud-hosted CI runners are convenient and the default in most public CI systems. They are also one of the most common items disabled by enterprise IT or compliance teams. The reasoning is consistent across organisations:

None of these reasons is wrong. They translate into a policy that disables the hosted runners and tells repository maintainers to bring their own. The first push of a release tag against the new policy fails almost instantly with a runners-disabled message; the rest of the release pipeline never gets to run.

The four bring-your-own options

The options that show up on the comparison table when a team starts solving this are:

| Option | Idle cost | Docker support | Operational overhead | Compliance posture |
|---|---|---|---|---|
| Always-on virtual machine | Pays full price 24/7 | Yes, with Docker installed | Medium: VM lifecycle, patching, image upkeep | Treated as a long-lived workload |
| Self-hosted runner pool in a separate CI service | Per-minute billing | Yes | Medium: a second CI surface to learn | Acceptable but adds a vendor |
| Ephemeral container-job runner (this chapter) | Near zero when idle | Not needed (build offloaded) | Low: reuses an existing container runtime | Strongest: identity-only auth, ephemeral, no privileged container |
| Kubernetes-cluster runner | Pays for the cluster | Yes, with Docker-in-Docker | High: cluster ops on top of CI ops | Variable |

The ephemeral container-job pattern wins on every column that matters to an enterprise approver. The remainder of this chapter explains the four design choices that make it work.

Design choice one: a container-jobs platform, not a virtual machine

The runner is implemented as a container job — a workload that starts, runs one CI job to completion, and exits. The hosting platform schedules the container, monitors it, and terminates the underlying compute when there is nothing to do. The bill is per second of execution time, with no idle cost.

This is a fundamentally different cost model from an always-on virtual machine. A release pipeline that runs perhaps five times a month for ten minutes each costs the price of fifty minutes per month, not the price of a month of uptime. For small operations teams, the cost difference is enough that the conversation about “can we afford a self-hosted runner?” never starts.
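The arithmetic behind that comparison can be sketched in a few lines. The two prices below are hypothetical placeholders chosen only to show the shape of the calculation; substitute your platform's actual rates.

```python
# Hypothetical prices, used only to illustrate the arithmetic.
VM_PRICE_PER_HOUR = 0.10        # assumed rate for an always-on VM
JOB_PRICE_PER_SECOND = 0.00005  # assumed rate for a container job

HOURS_PER_MONTH = 730  # average length of a month in hours


def billed_minutes(runs: int, minutes_per_run: int) -> int:
    """Minutes of execution actually billed under per-execution pricing."""
    return runs * minutes_per_run


def always_on_monthly_cost(price_per_hour: float) -> float:
    """An always-on VM is billed for every hour of the month, busy or idle."""
    return price_per_hour * HOURS_PER_MONTH


def job_monthly_cost(runs: int, minutes_per_run: int, price_per_second: float) -> float:
    """A container job is billed only for the seconds it executes."""
    return billed_minutes(runs, minutes_per_run) * 60 * price_per_second


if __name__ == "__main__":
    print("billed minutes:", billed_minutes(5, 10))
    print("always-on VM:  ", round(always_on_monthly_cost(VM_PRICE_PER_HOUR), 2))
    print("container job: ", round(job_monthly_cost(5, 10, JOB_PRICE_PER_SECOND), 2))
```

With five ten-minute runs, only fifty minutes are billed, whatever the per-second rate turns out to be.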

Every major cloud now exposes a container-jobs primitive that fits this pattern: a managed runtime that takes a container image and a command and runs it to completion. The exact name varies; the shape is the same.

Design choice two: no Docker inside the runner

The traditional self-hosted runner runs Docker locally to build the container image. This requires a privileged container or a dedicated tier of the hosting platform that allows Docker-in-Docker. Both options widen the compliance surface and increase the operational burden.

The better pattern is to offload the actual image build to a managed cloud builder. The major cloud container registries all expose a cloud-build feature that takes a source tree, builds the image, and pushes the result to the registry — without needing Docker on the caller. The runner’s job becomes orchestrating the build, not performing it.

The downstream steps of the pipeline — signing, attestation, copying the image to a public registry — also avoid Docker. A pure-Go OCI client (such as crane) can copy manifests and blobs directly between registries over HTTPS, without a Docker daemon. The signing tool (cosign) is a single binary that runs without a daemon. The attestation tool is similar.

The runner’s container image therefore needs only standard tools: git, the cloud platform’s command-line tool, jq, cosign, syft, crane. No Docker. No privileged mode. The compliance review of the runner is correspondingly short.
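One way to keep that tool list honest is a preflight check that runs when the runner container starts. The sketch below is an illustration, not part of any tool's API: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the cloud platform's command-line tool is left out of the list because its binary name differs per platform.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the text. The cloud platform's CLI is omitted here because
# its binary name varies by platform; add it for your environment.
REQUIRED_TOOLS = ("git", "jq", "cosign", "syft", "crane")


def missing_tools(
    required: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required tools that are not on PATH, in the order given."""
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit("runner image is missing: " + ", ".join(missing))
    print("all required tools found; Docker is not among them")
```

The `which` parameter exists only so the check can be exercised without the tools installed.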

Design choice three: queue-driven autoscaling, scale-to-zero

The container-jobs platform should be configured with an autoscaler that watches the CI provider’s queued-jobs API. When a job appears in the queue, the autoscaler starts a runner. When the queue is empty, the autoscaler scales the runner count to zero.

The autoscaler is typically KEDA (Kubernetes Event-Driven Autoscaling), an open-source project, or the equivalent scaler built into the major container-job platforms. The configuration is usually a few lines: name the metric (queued jobs for a specific repository) and the threshold (one or more queued jobs), and the platform handles the rest.

To the end user the scaling is invisible: a developer pushes a tag, the CI provider queues a job, the autoscaler starts a runner within seconds, the job runs, and the runner exits. There is no manual scaling and no “is the runner up?” on-call concern.
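The scaling decision itself is small. The following is a simplified sketch of what a queue-length scaler computes, not KEDA's actual code; `desired_runners`, `max_runners`, and `jobs_per_runner` are illustrative names.

```python
def desired_runners(queued_jobs: int, max_runners: int, jobs_per_runner: int = 1) -> int:
    """Map the CI queue length to a runner count, scaling to zero when idle."""
    if jobs_per_runner <= 0:
        raise ValueError("jobs_per_runner must be positive")
    if queued_jobs <= 0:
        return 0  # empty queue: no runners, hence no idle cost
    needed = -(-queued_jobs // jobs_per_runner)  # ceiling division
    return min(needed, max_runners)              # never exceed the configured cap


if __name__ == "__main__":
    for queue_length in (0, 1, 3, 25):
        print(queue_length, "->", desired_runners(queue_length, max_runners=10))
```

With one job per runner, which matches the run-one-job-and-exit model described earlier, the runner count simply tracks the queue length up to the cap.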

Design choice four: identity-based authentication, no stored tokens

The runner needs to authenticate to two services: the CI provider (to register itself and pick up jobs) and the cloud platform (to push images, write attestations, log telemetry). The naive approach is to put long-lived tokens in a secrets store and mount them as environment variables. This works but is unsatisfying: the tokens must be rotated, the secrets store must be audited, and the runner’s blast radius is “everything those tokens can do”.

The better pattern uses managed identity. The container job runs as a workload identity in the cloud platform; the cloud platform issues short-lived tokens on demand based on the identity’s role assignments. There is no long-lived secret; rotation is implicit; revocation is one command.

For the CI provider side, registering with a self-hosted runner pool still typically requires a token, but it can be a short-lived registration token issued at runner-start time rather than a long-lived personal access token. The token is needed only once, at registration, and the runner itself exits after a single job, so no long-lived credential is left behind.

What the pipeline actually does

Putting the four design choices together, the pipeline shape is:

  1. Developer pushes a tag.
  2. CI provider queues a job for the runner pool.
  3. Autoscaler detects the queue and starts a runner container.
  4. Runner registers itself with the CI provider using a short-lived registration token.
  5. Runner checks out the source.
  6. Runner invokes the cloud builder to build the image and push it to the cloud registry.
  7. Runner uses crane to copy the image to the public registry.
  8. Runner uses cosign to sign the image, generating a keyless signature anchored to its own identity.
  9. Runner uses syft to generate the SBOM, and the platform’s attest-provenance step to write the SLSA build attestation.
  10. Runner uploads the SBOM and pin-file fragment to the source repository’s release page.
  11. Runner exits. The container job terminates.
  12. Autoscaler observes the queue is empty and scales to zero.

From the developer’s perspective: tag, wait six to ten minutes, release is signed and ready. From the operations perspective: the only standing infrastructure is the runner pool’s configuration; there is no runner machine to maintain.
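Steps 6 to 9 of the list above can be sketched as a sequence of command lines. This is an outline under assumptions, not a definitive implementation: the cloud builder's command differs per platform, so it is passed in as `build_cmd`; the image references are placeholders; signing the public copy is one reasonable reading of step 8; and the attestation step is omitted because it is platform-specific.

```python
import subprocess
from typing import Callable, List


def release_commands(private_ref: str, public_ref: str, build_cmd: List[str]) -> List[List[str]]:
    """Return the argv lists for build, copy, sign, and SBOM, in pipeline order."""
    return [
        build_cmd,                                   # step 6: cloud builder (platform-specific)
        ["crane", "copy", private_ref, public_ref],  # step 7: registry-to-registry copy
        ["cosign", "sign", "--yes", public_ref],     # step 8: keyless signature
        ["syft", public_ref, "-o", "spdx-json"],     # step 9: SBOM written to stdout
    ]


def run_all(commands: List[List[str]], run: Callable = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; "cloud-cli" is a hypothetical stand-in for the real builder CLI.
    commands = release_commands(
        "registry.internal.example/app:v1.0.0",
        "registry.public.example/app:v1.0.0",
        ["cloud-cli", "build", "."],
    )
    for cmd in commands:
        print(" ".join(cmd))
```

In practice it is safer to sign an image by digest rather than by tag, so a real pipeline would resolve the digest after the copy and pass that reference to cosign. None of these commands needs a Docker daemon.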

Common failure modes

Three classes of failure dominate the first few months of running this pattern.

The autoscaler does not pick up the queued job. Almost always a credentials issue — the autoscaler cannot read the CI provider’s queue because its token has insufficient scope. The CI provider’s queue API typically requires a specific organisation-level scope; ensure the token has it.

The cloud builder cannot pull the source. Either the runner has insufficient scope on the cloud builder, or the cloud builder cannot reach the source repository. The fix is usually to grant the runner’s managed identity the equivalent of “trigger builds” on the cloud builder, and to ensure the source repository is reachable from the cloud builder (a code-mirror feature on the cloud builder side is the common solution).

The signing step fails with a no-token error. The CI provider needs to issue an OIDC token to the runner for keyless signing. The pipeline’s permissions block must request id-token: write (or the equivalent on other providers). Without it, cosign has no identity to sign as.
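A preflight check can turn this failure into a clearer message before cosign runs. The sketch below assumes the CI provider is GitHub Actions, which sets the two environment variables named here only when the job has been granted `id-token: write`; other providers expose their OIDC token differently. `missing_oidc_vars` is an illustrative helper name.

```python
import os
from typing import List, Mapping

# Assumption: GitHub Actions. These variables are present only when the
# workflow's permissions block grants id-token: write.
OIDC_ENV_VARS = ("ACTIONS_ID_TOKEN_REQUEST_URL", "ACTIONS_ID_TOKEN_REQUEST_TOKEN")


def missing_oidc_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the OIDC-related variables that are unset or empty."""
    return [name for name in OIDC_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_oidc_vars()
    if missing:
        raise SystemExit(
            "no OIDC token available (missing " + ", ".join(missing)
            + "); check that the job requests id-token: write"
        )
    print("OIDC token request variables present; keyless signing can proceed")
```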

None of these is conceptually hard. Each tends to surface on the first run, and a fifteen-minute debugging session usually sorts it out.

What you give up

The ephemeral pattern trades two things away. First, every job starts from a cold state — there is no warm cache, no pre-pulled image, no pre-cloned source. Build times are seconds-to-minutes longer than they would be on a warm runner. For releases, the trade is fine; for tight feedback loops on every push, the latency may be unacceptable and a hybrid approach (hosted runners for non-release pushes, ephemeral runners for releases) is reasonable.

Second, debugging a runner is harder — you cannot SSH into the runner because it has already terminated. The fix is to make the pipeline observable: structured logs to the cloud platform’s log store, with the runner’s identity and the job’s correlation identifier in every line. Post-mortem analysis from logs is fast enough that the lack of interactive access stops mattering.
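A minimal sketch of such logging, using only the Python standard library, is shown below. The field names (`runner_id`, `job_id`) and the one-JSON-object-per-line layout are assumptions; match them to whatever your log store indexes.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object carrying runner and job identifiers."""

    def __init__(self, runner_id: str, job_id: str) -> None:
        super().__init__()
        self.runner_id = runner_id
        self.job_id = job_id

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "runner_id": self.runner_id,  # identity of this ephemeral runner
            "job_id": self.job_id,        # correlation identifier for the CI job
            "message": record.getMessage(),
        })


def make_logger(runner_id: str, job_id: str, stream=sys.stdout) -> logging.Logger:
    """Build a logger whose every line includes the runner and job identifiers."""
    logger = logging.getLogger(f"runner.{runner_id}.{job_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter(runner_id, job_id))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger("runner-example", "job-example")
    log.info("cloud build started")
```

Because the identifiers are attached by the formatter rather than by each call site, no log line can omit them, which is what makes post-mortem filtering by job practical.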
