Signing Models & the MLBOM: Provenance That Survives Quantum

Key insight

A model you can’t cryptographically trace is one you run on faith. Provenance is a signature bound to an identity you chose to trust plus a record of how the artifact was built — delivered by the same supply-chain stack you already use for containers (Sigstore, in-toto, a CycloneDX MLBOM). Every one of those signatures is RSA/ECDSA/Ed25519 today, so they must migrate to ML-DSA / SLH-DSA — on a different clock from key exchange, because the risk is future forgery, not past harvesting.

In one sentence

Sign your models and generated code now with the supply-chain tools you already have — and make the signing algorithm swappable, because the longest-lived roots of trust must move to post-quantum signatures first.

Why a hash isn’t provenance

Modern AI is assembled from parts you did not build: weights pulled from a public hub, a base model someone else trained, datasets of uncertain origin, and increasingly code an agent wrote. Each is a place tampering can enter — a swapped weights file, a poisoned base model, or simply an artifact whose origin nobody can attest.

The instinctive answer — “publish a checksum” — is necessary but not sufficient. A hash only proves a file matches the digest the same source handed you. If the source or its distribution channel is compromised, the attacker serves a matching hash for the tampered file and you are none the wiser. A hash proves integrity relative to a claim; it says nothing about who made the claim or how the artifact came to be.

Provenance is the missing half: a signature you can trace to an identity you deliberately chose to trust, plus a verifiable record of how the artifact was produced. That is the difference between “this file matches a number on a web page” and “this file was signed by our training pipeline’s identity, built from these inputs, and logged where I can audit it.”
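
The failure mode described above is easy to reproduce. The following is a minimal sketch using only Python's standard library; the byte strings are made-up placeholders standing in for real weights, and the "pinned" digest stands for any value obtained through a channel the attacker does not control.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def checksum_matches(artifact: bytes, published_digest: str) -> bool:
    """The naive check: does the file match the digest it is compared to?"""
    return sha256_hex(artifact) == published_digest


# What the real publisher released (placeholder bytes, not real weights).
genuine_weights = b"genuine model weights"
genuine_digest = sha256_hex(genuine_weights)

# An attacker who controls the download channel swaps BOTH the file and
# the digest displayed beside it.
tampered_weights = b"backdoored model weights"
served_digest = sha256_hex(tampered_weights)

# The comparison passes, because both values came from the attacker.
naive_check_passes = checksum_matches(tampered_weights, served_digest)

# The same file fails against a digest obtained out of band.
pinned_check_passes = checksum_matches(tampered_weights, genuine_digest)

print(naive_check_passes, pinned_check_passes)
```

The hash did its job both times: it reported whether the bytes match a claim. What it cannot tell you is who made the claim, which is the gap a signature bound to a trusted identity is meant to close.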

Model signing, concretely

The reassuring part is that this is not a new tool to invent. It is the software supply-chain stack already hardened for containers and packages, pointed at model artifacts:

Piece	What it does
Sigstore / cosign	Signs the artifact. Supports keyless signing: instead of a long-lived private key, it binds the signature to an identity from your SSO (OIDC) provider via a short-lived certificate.
Fulcio & Rekor	Sigstore’s certificate authority and public transparency log — the signature is recorded so anyone can later audit that it happened.
in-toto	Adds attestations that describe how the model was built (inputs, steps, builder), so the signature covers the process, not just the bytes. Aligns with the SLSA supply-chain framework.
CycloneDX / SPDX MLBOM	The bill of materials, extended with machine-learning components: base model, datasets, hashes, and the model card.

These compose. Cosign signs the weights and produces an in-toto attestation binding that signature to the build; the attestation references an MLBOM describing the inputs; Rekor logs the whole thing. Hugging Face already supports signing model repositories with this machinery, so for many teams the move is adopting a workflow, not writing one. The consumer side is the mirror image: verify the signature against an identity you trust before you load the weights — the same gate you would put on a container image.
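
The consumer-side gate can be sketched in a few lines. This is an illustration rather than a reference integration: it assumes the cosign 2.x CLI is installed and that the publisher ships a Sigstore bundle file next to the weights. The function names, the injectable runner, and the loader callback are all hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def build_verify_command(
    artifact: str, bundle: str, identity: str, issuer: str
) -> List[str]:
    """Assemble a `cosign verify-blob` call pinned to one signer identity."""
    return [
        "cosign", "verify-blob",
        "--bundle", bundle,
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", issuer,
        artifact,
    ]


def run_command(command: Sequence[str]) -> int:
    """Run the command and return its exit code (0 means it succeeded)."""
    return subprocess.run(command, check=False).returncode


def load_if_verified(
    artifact: str,
    bundle: str,
    identity: str,
    issuer: str,
    loader: Callable[[str], object],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> object:
    """Call the loader only after signature verification has succeeded."""
    command = build_verify_command(artifact, bundle, identity, issuer)
    if runner(command) != 0:
        raise PermissionError(f"signature verification failed for {artifact}")
    return loader(artifact)
```

The design choice worth noting is that the loader is never reached on failure: verification is a precondition of loading, not a log line beside it. Passing the runner as a parameter also lets the gate be tested without a cosign binary.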

The MLBOM: a worked example

An MLBOM (machine-learning bill of materials) is deliberately unglamorous. It extends the CBOM/SBOM idea with the facts a reviewer needs to trust a model: what it is, the exact weights, what it was built on, what it learned from, and how it was signed. A simplified CycloneDX ML component looks like this (the // comments are annotations for the reader; strict JSON does not allow them):

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "components": [{
    "type": "machine-learning-model",
    "name": "support-triage-classifier",
    "version": "2026.06",
    "hashes": [{ "alg": "SHA-256", "content": "9f2c…" }],   // the exact weights
    "modelCard": {
      "modelParameters": {
        "base-model": "llama-3.1-8b",                       // built on
        "datasets": [{ "ref": "tickets-2019-2025" }]        // learned from
      },
      "considerations": { "useCases": ["ticket routing"] }
    },
    "signature": {
      "algorithm": "Ed25519",        // ← today; the line that must migrate
      "value": "…"
    }
  }]
}

Nothing here is exotic — and that is the point. The base model and dataset references make a fine-tune auditable: if a base model is later found to be backdoored, you can query every MLBOM that lists it. The hashes pin the exact weights the signature covers. And the signature.algorithm line is the single field that connects this whole supply-chain story back to the rest of the field guide — because that algorithm is what quantum breaks.
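
Both audits mentioned above can be run as plain queries over a collection of MLBOMs. The sketch below assumes each MLBOM has already been parsed into a Python dict and uses the field names from the example above (including "base-model"); the helper names are hypothetical, and the set of algorithm identifiers should be adapted to whatever your BOMs actually record.

```python
from typing import Iterable, Iterator, List

# Algorithm names the article lists as quantum-vulnerable. Real BOMs may
# use other identifiers, so treat this set as an assumption to adapt.
QUANTUM_VULNERABLE = {"RSA", "ECDSA", "Ed25519"}


def ml_components(bom: dict) -> Iterator[dict]:
    """Yield every machine-learning-model component in one MLBOM."""
    for component in bom.get("components", []):
        if component.get("type") == "machine-learning-model":
            yield component


def label(component: dict) -> str:
    """Human-readable name@version for a component."""
    return f"{component.get('name')}@{component.get('version')}"


def models_built_on(boms: Iterable[dict], base_model: str) -> List[str]:
    """List every model whose MLBOM names the given base model."""
    hits = []
    for bom in boms:
        for component in ml_components(bom):
            params = component.get("modelCard", {}).get("modelParameters", {})
            if params.get("base-model") == base_model:
                hits.append(label(component))
    return hits


def models_needing_pq_signature(boms: Iterable[dict]) -> List[str]:
    """List every model still signed with a quantum-vulnerable algorithm."""
    hits = []
    for bom in boms:
        for component in ml_components(bom):
            algorithm = component.get("signature", {}).get("algorithm")
            if algorithm in QUANTUM_VULNERABLE:
                hits.append(label(component))
    return hits


# The worked example above, without the annotations.
example_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "components": [{
        "type": "machine-learning-model",
        "name": "support-triage-classifier",
        "version": "2026.06",
        "modelCard": {"modelParameters": {"base-model": "llama-3.1-8b"}},
        "signature": {"algorithm": "Ed25519", "value": "…"},
    }],
}

print(models_built_on([example_bom], "llama-3.1-8b"))
print(models_needing_pq_signature([example_bom]))
```

The second query is the migration inventory in miniature: once the signing algorithm is a recorded field, finding what still needs to move is a lookup rather than an investigation.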

Provenance is a chain, not a checksum: signed weights, an attestation of how they were built, and a log you can audit — all resting on signatures that must migrate.

Where quantum bites — and when

Every signature in the stack above (cosign’s, the attestation’s, the certificate chain’s) is an RSA, ECDSA, or Ed25519 signature today. All three rest on factoring or discrete-logarithm problems that Shor’s algorithm solves efficiently on a sufficiently large quantum computer. So model provenance inherits the same migration as everything else in this guide: signatures move to ML-DSA (lattice-based, the general-purpose choice) or SLH-DSA (hash-based, conservative and stateless).

But the timing differs from transport, and getting this right avoids both panic and complacency. Signatures are on a different clock from key exchange:

	Key exchange (transport)	Signatures (provenance)
Quantum risk	Harvest now, decrypt later	Future forgery — not harvestable
Why	Recorded ciphertext is decrypted once a quantum computer exists	A signature matters only at the moment it is verified; a forgery minted in 2035 cannot retroactively change a 2026 artifact you already verified
Urgent case	All confidential traffic, now	Long-lived roots of trust — anything that must stay valid past Q-day

The nuance that matters: a signature is not a harvest-now target the way a key exchange is. Nobody gains by recording your signatures — there is no secret in them to unlock later. The danger is that once quantum forgery is possible, an attacker could mint a new signature that validates against a trust anchor that is still in use. That reframes the priority around lifetime:

Long-lived roots — migrate first. A model-signing root, a CA root, or a firmware key expected to be trusted for ten to fifteen years will still be live when quantum arrives. These should adopt the conservative, hash-based SLH-DSA on a near horizon — it is slower and its signatures are large, but it rests on the most battle-tested assumptions.
Short-lived leaves — migrate on schedule. A signature on a model version you will re-sign next quarter, or a token that expires in an hour, can wait for ML-DSA tooling to mature. There is no harvesting emergency, and rushing means paying the large-signature cost everywhere at once.
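
The lifetime rule in the two items above reduces to one comparison. Here is a minimal sketch of that triage; the key names and dates are invented for illustration, and the horizon year is a planning assumption you must choose yourself (2035 below simply echoes the example year in the table, and is not a prediction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigningKey:
    name: str
    trusted_until: int  # last year verifiers are expected to trust this key


def migration_plan(key: SigningKey, forgery_horizon: int) -> str:
    """Classify a key by whether it will still be trusted at the horizon.

    forgery_horizon is the year by which you choose to treat quantum
    forgery as possible. It is a planning input, not a forecast.
    """
    if key.trusted_until >= forgery_horizon:
        return "migrate first (SLH-DSA)"
    return "migrate on schedule (ML-DSA)"


ASSUMED_FORGERY_HORIZON = 2035  # assumption for illustration only

# Hypothetical inventory of signing keys.
inventory = [
    SigningKey("model-signing-root", trusted_until=2040),
    SigningKey("quarterly-release-signature", trusted_until=2026),
]

for key in inventory:
    print(key.name, "->", migration_plan(key, ASSUMED_FORGERY_HORIZON))
```

The useful part is less the function than the inventory it forces: you cannot sort keys by lifetime until every trust anchor has a recorded expiry.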

Signing AI-generated code

The same discipline extends past model weights to the code an agent writes. If an AI system proposes changes that ship — a migration, a config, a function — then “an agent produced this” is a provenance claim like any other, and it deserves the same treatment: sign the artifact, record what produced it, and gate it behind verification before it runs. That is the cryptographic backbone under the disciplined, gated agent output the wider guide argues for — without a signature you can verify, “a trusted agent wrote this” is an assertion, not a fact.
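
"Record what produced it" can take the same shape as a build attestation. The sketch below assembles an in-toto style statement for an agent-written patch. The outer layout follows the in-toto Statement v1 format; the predicate type URI and the predicate fields (producedBy, approvedBy) are hypothetical placeholders, as are the file name and agent identifier. The statement would still need to be signed before it proves anything.

```python
import hashlib
import json


def agent_output_statement(
    path: str, content: bytes, agent: str, reviewer: str
) -> dict:
    """Describe an agent-produced file as an in-toto style statement."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": path,
            "digest": {"sha256": hashlib.sha256(content).hexdigest()},
        }],
        # Hypothetical predicate type and fields, chosen for illustration.
        "predicateType": "https://example.com/agent-output/v1",
        "predicate": {"producedBy": agent, "approvedBy": reviewer},
    }


def matches_statement(content: bytes, statement: dict) -> bool:
    """Check that the bytes about to run are the bytes that were attested."""
    expected = statement["subject"][0]["digest"]["sha256"]
    return hashlib.sha256(content).hexdigest() == expected


patch = b"ALTER TABLE tickets ADD COLUMN priority INTEGER;\n"
statement = agent_output_statement(
    "migrations/0042_priority.sql", patch, "triage-agent", "release-reviewer"
)

print(json.dumps(statement, indent=2))
print(matches_statement(patch, statement))
print(matches_statement(patch + b"DROP TABLE tickets;\n", statement))
```

The gate then has two parts: verify the signature over the statement against an identity you trust, and confirm the artifact still matches the digest the statement names.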

Notice the recurring move: the algorithm is a detail behind an interface, not a hard-coded constant. Sign models and generated code now with whatever you have — Ed25519 today — but keep the signing algorithm a configurable choice so the switch to ML-DSA or SLH-DSA is a config change, not a re-plumb. That is crypto-agility applied to signing, the same discipline the capstone calls the twin of model-agility.
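
One way to picture "a config change, not a re-plumb" is a registry that maps algorithm names to implementations. The sketch below is deliberately a toy: to stay runnable with the standard library alone, it registers two keyed MACs as stand-ins, and they are not real signature schemes. In practice the registry entries would be Ed25519, ML-DSA, or SLH-DSA implementations from a vetted library. All names here are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict, Set

# (key, message) -> signature bytes
SignFn = Callable[[bytes, bytes], bytes]

# Stand-ins only: HMACs are symmetric MACs, NOT signature schemes. They
# exist so the sketch runs without third-party packages.
SIGNERS: Dict[str, SignFn] = {
    "demo-current": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "demo-next": lambda key, msg: hmac.new(key, msg, hashlib.sha512).digest(),
}


def sign(config: dict, key: bytes, message: bytes) -> dict:
    """Sign with whichever algorithm the configuration names."""
    algorithm = config["signing_algorithm"]
    value = SIGNERS[algorithm](key, message)
    # Record the algorithm next to the value so verifiers can dispatch.
    return {"algorithm": algorithm, "value": value.hex()}


def verify(envelope: dict, key: bytes, message: bytes, allowed: Set[str]) -> bool:
    """Verify only if the verifier's own policy allows the algorithm."""
    algorithm = envelope.get("algorithm")
    if algorithm not in allowed or algorithm not in SIGNERS:
        return False
    expected = SIGNERS[algorithm](key, message).hex()
    return hmac.compare_digest(expected, envelope.get("value", ""))


key = b"demo key"
artifact = b"model weights or generated code"

today = sign({"signing_algorithm": "demo-current"}, key, artifact)
later = sign({"signing_algorithm": "demo-next"}, key, artifact)

print(today["algorithm"], verify(today, key, artifact, {"demo-current", "demo-next"}))
print(later["algorithm"], verify(later, key, artifact, {"demo-next"}))
# Once policy retires the old algorithm, old envelopes stop verifying.
print(verify(today, key, artifact, {"demo-next"}))
```

Two details carry the agility: the calling code never names an algorithm, and the verifier holds its own allow-list rather than trusting whatever the envelope claims. Migration then means registering a new entry, flipping the config, and later shrinking the allow-list.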

Key terms

Provenance: A verifiable record of an artifact’s origin (a signature bound to a trusted identity plus a record of how it was built), not merely a matching hash.
Hash: A digest that proves a file matches a claimed value; necessary but not sufficient, because a compromised source can supply a matching hash for a tampered file.
Model signing: Cryptographically signing model artifacts so tampering is detectable and origin is provable; must migrate to PQC signatures.
Sigstore / cosign: Supply-chain signing (with Fulcio CA + Rekor transparency log) used for containers, packages, and now model artifacts; supports keyless signing via SSO identity.
in-toto: A framework for signed attestations binding a signature to how an artifact was built; aligns with the SLSA supply-chain levels.
MLBOM: Machine-learning bill of materials: an inventory of the base model, datasets, hashes, and dependencies behind a model — the AI analogue of the CBOM.
CBOM: Cryptographic Bill of Materials — the crypto inventory the MLBOM extends for AI systems.
ML-DSA / SLH-DSA: The post-quantum signature standards (FIPS 204 / 205); ML-DSA is the general-purpose choice, SLH-DSA the conservative hash-based one for long-lived roots.
SLSA: Supply-chain Levels for Software Artifacts — a framework for graduating build-integrity guarantees that in-toto attestations feed.
HNDL (harvest now, decrypt later): Recording encrypted data today to decrypt after quantum; the reason key exchange is urgent while signatures are not harvestable.

What to carry forward

A hash isn’t provenance — a compromised source serves a matching hash for a tampered file. You need a signature bound to a trusted identity.
Model signing is the existing supply-chain stack (Sigstore/cosign, in-toto, a CycloneDX MLBOM), not a new tool — and Hugging Face already supports it.
An MLBOM records base model, datasets, hashes, and the signing algorithm — making fine-tunes auditable.
Signatures migrate to ML-DSA / SLH-DSA on a different clock than key exchange: forgery risk, not harvesting — so long-lived roots go first.
Extend the same signing discipline to AI-generated code, and keep the algorithm swappable so migration is a config change.

That completes The AI Frontier trio.

Understand it in your own words

Paste into any AI assistant to check yourself:

I'm making AI model provenance quantum-safe. Quiz me one question at a
time, correcting me gently:

1. Why is a published checksum necessary but not sufficient for trust?
2. What do Sigstore/cosign, in-toto, and an MLBOM each contribute?
3. What does an MLBOM entry record, and why does that make a fine-tune
   auditable?
4. Why do signatures migrate on a different clock than key exchange,
   and which signatures are most urgent?
5. How does signing AI-generated code relate to gated agent output?

References & further reading

NIST, FIPS 204 (ML-DSA) & FIPS 205 (SLH-DSA). csrc.nist.gov/publications/fips
Sigstore, cosign & keyless signing. sigstore.dev
in-toto & SLSA, supply-chain attestations. slsa.dev
OWASP CycloneDX, ML-BOM / Machine Learning Bill of Materials. cyclonedx.org
Hugging Face, Model signing & security. huggingface.co/docs/hub/security