Nutrition Labels for Trust

June 13, 2026 16 min read Keywords: concrete, Formal Verification, trust, capabilities, trusted-computing-base, agents, Programming Languages

A money changer weighs coins on a small balance scale while his wife pauses over an illuminated book to watch the scale; the table is strewn with coins, pearls, and a convex mirror reflecting a figure outside the frame. — *The Money Changer and His Wife*, Quentin Matsys, 1514. Musée du Louvre, public domain, via Wikimedia Commons.

The man in the painting at the top of this page is doing the oldest verification job there is. He is weighing each coin on a balance, one at a time, because the face stamped on a coin is a claim and its weight is the evidence, and a money changer who confused the two went broke. He does not trust the mint. He trusts the scale.

Five hundred years later, almost all of our software asks us to trust the stamp. It ships with a name, a logo, a reassuring sentence about security, and no scale anywhere. The bill for trusting the stamp comes due in supply-chain backdoors, dependencies nobody audited, and “verified” badges that turn out to have meant a marketing review, and it usually arrives late and all at once.

Vitalik Buterin put the missing scale into one sentence:

> In an ideal world all software and hardware would have “nutrition labels” that provide a full list of trust dependencies - what math and which actors’ honest behavior (and on what time scale) the system is relying on to provide its core functionality and implied guarantees.

Vitalik Buterin (@VitalikButerin)

binji replied with the obvious objection, and then with something more interesting:

> even if this was available, it could still prove to create a cognitive load that is ignored by many, so they’d still opt into systems that preselect for them. you see this in nutrition labels and dieting etc, where most people prefer the convenience of being given a basket of “things to eat” from a verified source (doctors, dieticians…influencers).
>
> but here’s where the agentic world gets interesting, as ai becomes the new ui, the necessity of privacy preserving agents personalized to a users preferences that can handle the cognitive load to supplement their decisions while proving verified logic on how they come to that conclusion will be key.
>
> feels like a strong intersection opportunity at large here and is a mixture of verification, privacy, agent-assisted decision making, and overall web hygiene

binji (@binji_x)

That exchange splits the problem cleanly. Vitalik is asking for the artifact: what does this system depend on? binji is asking who is supposed to read it without turning every user into a security engineer. Concrete, the programming language I have been writing about, sits between those two questions. It does not solve the whole thing. It builds the part that should never have been prose.

Key takeaways

- A trust nutrition label should be a compiler artifact, not vendor prose. Concrete already produces the math-and-code half of one as a byproduct of compiling.
- The label has real fields: the capabilities a program uses, every trusted and unsafe boundary, and a per-obligation evidence class (proved, assumed, or trusted) down to a named trusted computing base and the axioms the proofs rest on.
- It is verifiable, not asserted. The reports are deterministic, and `concrete diff` fails closed when trust weakens between versions.
- It deliberately stops at one half. It says nothing about which actors you trust or for how long, and the verification is partial: the backend and final binary stay trusted. The output is “here is what is proven, by what, and what is not.”
- For agents, that means trust should come from the kernel-checked artifact, not the agent’s say-so.

#Two different kinds of trust

Vitalik’s label is really asking for two lists. One list is technical: which proofs, components, foreign calls, permissions, and machines does this system rely on? The other is social: which people or institutions must behave honestly, and for how long?

Those are not the same problem. Concrete works on the first list. It says nothing deep about incentives, collusion, governance, or whether some actor stays honest for six months. But the technical list is already enough work, and it is the part a compiler can actually produce.

#A label generated by the compiler

Concrete is built around a simple habit: when the program makes a claim, the compiler should keep the receipt. A contract, a capability, a trusted boundary, a runtime-safety check: each one becomes something the tool can report on. Compilation should leave behind more than a binary. It should leave behind a record of what the program used and what evidence backs each claim.

That ledger has real fields, not slogans. A small slice of it might look like this:

```
function: parse_config
capabilities: File
trusted_boundaries:
  - trusted extern fn os_read
obligations:
  O1 array_bounds buffer[i]
     evidence: proved_by_kernel_decision
     engine: omega
  O2 ensures result_is_valid_config
     evidence: assumed
  O3 proof link Config.Proofs.parse_config_shape
     evidence: stale
     reason: body fingerprint changed
tcb:
  - Concrete checker
  - Lean kernel
  - proof attachment and fingerprint machinery
  - LLVM/backend/runtime/OS/hardware
```

The syntax above is illustrative, but the categories are real: authority, trusted boundary, obligation, evidence class, stale proof, trusted base. The label comes from the program and the proof artifacts. Nobody writes it afterward as a compliance paragraph.

Capabilities. Concrete tracks effects as a visible capability vocabulary: the concrete permissions File, Network, Process, Console, Clock, Random, Env, Alloc, and Unsafe; a Std macro that expands to the standard set; and user-defined aliases that expand at parse time. A function with no annotation is pure. A function that allocates on the heap must say with(Alloc). A function that touches the network must say with(Network), and so must everything that transitively calls it. The label cannot under-report, because a program that uses an effect it did not declare does not compile.

Trusted boundaries. Every place the program steps outside what the checker can guarantee is marked and locatable: trusted fn, trusted impl, trusted extern fn, and functions or calls carrying with(Unsafe). You can ask the compiler to enumerate them, and it answers with a list and source spans rather than a shrug. “Does pointer tricks internally” and “can call arbitrary foreign code” are different risks, and they get different markers.

Evidence classes. My favorite detail is the absence of a single green checkmark. Every obligation says how it is justified: proved_by_lean for a kernel-checked theorem, proved_by_kernel_decision for a decision procedure, solver_trusted for an external solver result, tested_by_oracle, assumed, trusted, stale, unproven, and more. “This type-checks,” “this is proven,” and “we are trusting the author here” are different statements. Concrete keeps them separate. That is the money changer’s distinction restored to software: the coin you weighed, the coin you did not, and the coin you are choosing to take on faith.

The label syntax was illustrative, but the language features behind it are not pseudocode, so here is the authority half in real Concrete. A foreign call is a named, audited boundary; a function that touches the console has to say so; and the requirement climbs the call graph on its own:

```
trusted extern fn putchar(c: i32) -> i32;        // foreign boundary, audited

fn print_int(n: i64) with(Console) { /* ... */ } // needs Console
fn greet() with(Console) { print_int(42); }      // inherits it from print_int
fn main() with(Std) -> Int { greet(); return 0; }
```

Delete with(Console) from greet and the program stops compiling, because greet calls something that needs it. Authority is not a comment that can drift out of date. It is part of the type, rechecked on every edit.
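The propagation rule can be modeled outside the compiler in a few lines. What follows is a minimal Python sketch, not Concrete’s checker: the `Program` shape and the `missing_capabilities` helper are hypothetical names invented for illustration, and `main` declares only Console as a stand-in for with(Std), whose exact expansion the sketch does not model. It applies one local rule, that a caller must declare everything its direct callees declare, and the transitive behavior follows from applying that rule to every function.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: function name -> (declared capabilities, direct callees).
Program = Dict[str, Tuple[Set[str], List[str]]]


def missing_capabilities(program: Program) -> Dict[str, Set[str]]:
    """Return, per function, the capabilities its callees declare but it does not."""
    errors: Dict[str, Set[str]] = {}
    for name, (declared, callees) in program.items():
        needed: Set[str] = set()
        for callee in callees:
            needed |= program[callee][0]  # a caller needs what each callee declares
        missing = needed - declared
        if missing:
            errors[name] = missing
    return errors


good: Program = {
    "print_int": ({"Console"}, []),
    "greet": ({"Console"}, ["print_int"]),
    "main": ({"Console"}, ["greet"]),  # stand-in for with(Std)
}

# Same program with the annotation deleted from greet.
bad: Program = dict(good)
bad["greet"] = (set(), ["print_int"])

print(missing_capabilities(good))  # {}
print(missing_capabilities(bad))   # {'greet': {'Console'}}
```

Because the check is local, fixing `greet` by restoring Console is the only way to make the error go away; there is no place to hide the effect further up the call graph.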

#The label carries proofs, not just declarations

A permissions screen says an app can use the network. A software bill of materials says a binary contains some library at some version. Concrete’s label can say something a list of dependencies cannot: that a specific property of a specific function has been mechanically proven, and by what.

The mechanism is ordinary design by contract, pointed at verification. A function carries #[requires] and #[ensures] clauses and loop invariants. Each becomes a proof obligation with a stable identifier. A precondition is assumed at the function’s entry and surfaced at every caller, so it cannot be quietly dropped; depending on the active policy, the caller either discharges it or carries an explicit unproven obligation.

How it gets discharged is where the marketing version would usually start lying. Concrete is kernel-first. The decision procedures omega, for linear integer arithmetic, and bv_decide, for bitvectors, produce certificates that are checked inside the toolchain, so using them adds no external solver to the trusted base. But the caveats stay visible. bv_decide relies on a compiled LRAT checker, which brings a named tier of native-code trust; it is not pure kernel reduction, and the axiom inventory says so. Concrete can also hand a condition to an external SMT solver. When it does, the result is labeled solver_trusted, and that solver binary becomes part of the trusted base for that obligation. Trust has not vanished. The label tells you which kind of trust you just used.
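For readers who have not met the two decision procedures, here is what they look like in plain Lean 4, outside Concrete. This is a sketch assuming a recent Lean toolchain in which `omega` is built in and `bv_decide` is available through the import shown; the two goals are made-up examples, and how Concrete generates and submits its own obligations is not shown in this post.

```lean
import Std.Tactic.BVDecide

-- Linear arithmetic: an index bound of the kind a bounds check needs.
example (i len : Nat) (h₁ : i < len) (h₂ : len ≤ 64) : i < 64 := by
  omega

-- Bitvectors: a fixed-width identity, closed by a SAT search whose
-- certificate is then checked rather than taken on faith.
example (x y : BitVec 32) : x &&& y = y &&& x := by
  bv_decide
```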

The `concrete prove` workflow is what makes this usable. It generates a Lean proof workspace for a function, links registered theorems back to their obligations, and supports replay so a proof stays bound to the exact source it was written against. A fingerprint, now a truncated SHA-256 over the function’s structure, detects when the code drifts out from under its proof, and the evidence class flips to stale. The label cannot keep claiming “proven” about a function that has since changed.
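The staleness check is easy to picture. Below is a minimal Python sketch under loud assumptions: Concrete fingerprints the function’s structure, while this stand-in hashes whitespace-normalized source text; the 16-character truncation and the `fingerprint` and `evidence_class` helpers are hypothetical choices made for illustration.

```python
import hashlib


def fingerprint(body: str, length: int = 16) -> str:
    """Truncated SHA-256 over a whitespace-normalized function body (assumed scheme)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


def evidence_class(recorded_fp: str, current_body: str) -> str:
    """A proof stays attached only while the body still matches its fingerprint."""
    return "proved_by_lean" if fingerprint(current_body) == recorded_fp else "stale"


original = "return (x >> n) | (x << (32 - n));"
recorded = fingerprint(original)  # stored when the proof was registered

reformatted = "return (x >> n)  |  (x << (32 - n));"  # spacing only
edited = "return (x >> n) | (x << (31 - n));"         # behavior changed

print(evidence_class(recorded, original))     # proved_by_lean
print(evidence_class(recorded, reformatted))  # proved_by_lean
print(evidence_class(recorded, edited))       # stale
```

The design point is that the proof never gets to vouch for code it has not seen: any change that survives normalization invalidates the link until someone replays or rewrites the proof.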

Here is the proof half in real code, just as small. A bit rotation whose precondition says the shift must stay in range:

```
#[requires(0 <= n && n < 32)]
fn rotr(x: u32, n: u32) -> u32 {
    return (x >> n) | (x << (32 - n));
}
```

That #[requires] is not a comment and not a runtime assert. The compiler turns it into an obligation, pushes it onto every caller, and then reports, one call site at a time, how each call discharges it:

```
call rotr(x, 13)       requires 0 <= n && n < 32   ->  proved_at_callsite
call rotr(x, n) [n=7]  requires 0 <= n && n < 32   ->  proved_by_kernel_decision (bv_decide)
call rotr(x, 40)       requires 0 <= n && n < 32   ->  failed_at_callsite
call rotr(x, k)        requires 0 <= n && n < 32   ->  unproven_at_callsite
```

Four calls, four honest verdicts. A constant in range folds to proved. A value fixed earlier by a let is handed to a decision procedure and closed with checked evidence. A constant out of range is reported as a violation, and policy can make that a hard failure. An argument the compiler cannot pin down stays unproven, labeled exactly that, never quietly rounded up to fine. The last line is the one that matters most: the label would rather tell you it does not know than tell you a comforting lie. That is the money changer setting a coin aside because the scale was inconclusive, instead of waving it through.
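The four verdicts amount to a small decision table. Here is a Python sketch of that table, assuming a hypothetical `verdict` helper and a `known` map of let-bound values; the range check in the middle branch is only a stand-in for handing the obligation to a real decision procedure, which this sketch does not do.

```python
from typing import Dict, Union

Arg = Union[int, str]  # a literal constant, or the name of a variable


def verdict(arg: Arg, known: Dict[str, int], lo: int = 0, hi: int = 32) -> str:
    """Classify one call site against the precondition lo <= n < hi."""
    if isinstance(arg, int):
        # Constants fold directly: in range is proved, out of range is a violation.
        return "proved_at_callsite" if lo <= arg < hi else "failed_at_callsite"
    if arg in known:
        # Stand-in for a decision procedure closing the goal with checked evidence.
        value = known[arg]
        return "proved_by_kernel_decision" if lo <= value < hi else "failed_at_callsite"
    # Nothing is known about the argument, so the sketch refuses to guess.
    return "unproven_at_callsite"


print(verdict(13, {}))           # proved_at_callsite
print(verdict("n", {"n": 7}))    # proved_by_kernel_decision
print(verdict(40, {}))           # failed_at_callsite
print(verdict("k", {}))          # unproven_at_callsite
```

The last branch is the one worth copying into any tool of this kind: the default outcome is "unproven", not "fine".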

Return to the config example. The useful fact is not that the program is “verified.” It is that its File authority is visible, its absence of Network and Alloc authority is checked, its operating-system boundary is named, one bounds obligation is discharged by omega, one semantic parsing claim is only assumed, and one old proof has gone stale. Six facts, six different kinds of trust, none of them collapsed into a checkmark. You learn more from that than from a green “verified” badge, precisely because it shows you where the badge would have been lying.

The word “verified” has to stay disciplined. Capabilities and boundaries are enforced or reported by the type system. That is useful, but it is not the same thing as a proof. A contract that reaches proved_by_lean or proved_by_kernel_decision has a machine-checked argument behind it. The label keeps those cases apart so the enforced does not pretend to be proven.

#The label includes itself

The best part is that Concrete applies the same suspicion to itself. It prints the trusted computing base: the layers you must trust for any proof to mean anything. The checker and compiler. The Lean kernel. The proof-attachment and fingerprint machinery. The LLVM backend. The runtime, the operating system, the hardware. And the foreign code behind every extern fn. Most systems hide this list. Concrete prints it.

Ken Thompson gave the reason in his Turing Award lecture, “Reflections on Trusting Trust,” published in 1984. You cannot fully trust code you did not write yourself, and the rot can reach all the way down to the compiler: a compiler can carry a backdoor that survives even after its own source is scrubbed clean, by recognizing when it is compiling itself and quietly reinserting the trick. That does not make trust hopeless. It means “trust me, the compiler is clean” is not an answer. You have to name what trusting the compiler commits you to. A compiler that prints its own trusted base and the axioms its proofs stand on is Thompson’s question answered out loud instead of waved away.

It even prints the axioms. An axiom-inventory gate runs over every theorem and fails the build on anything undocumented. The mathematical assumptions the proofs are allowed to lean on are named: propext, Classical.choice, Quot.sound, and the flagged native-code trust tier for compiled certificate checking. That is the literal answer to Vitalik’s “what math are you relying on,” extracted automatically rather than asserted in a README.
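Lean itself can already answer the per-theorem version of that question, which is presumably the raw material a gate like this is built from; that connection is my assumption, not something the Concrete reports state. A minimal sketch with a made-up theorem name:

```lean
-- Hypothetical obligation: the left-shift amount in a rotation stays in range.
theorem shift_in_range (n : Nat) (h : n < 32) : 32 - n ≤ 32 := by
  omega

-- Ask Lean which axioms the finished proof depends on.
#print axioms shift_in_range
```

An inventory gate is then a matter of running that query over every theorem and comparing the answers against a documented allow-list.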

The label can also be regenerated. Same source, same reports. The `concrete diff` command compares two versions and flags when trust weakens, when a proof goes stale, when authority escalates, when a boundary erodes. A label you can regenerate and diff is evidence. A label you cannot is marketing.
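To make “fails closed” concrete, here is a minimal Python sketch of a trust diff over two labels. The dictionary layout, the `trust_regressions` helper, and the choice to treat only the two kernel-checked classes as strong are all assumptions made for illustration; this is not the output format or the logic of the real tool.

```python
from typing import List

# Assumption: only these two classes count as kernel-checked evidence.
KERNEL_CHECKED = {"proved_by_lean", "proved_by_kernel_decision"}


def trust_regressions(old: dict, new: dict) -> List[str]:
    """List every way the new label asks for more trust than the old one."""
    problems: List[str] = []
    for cap in sorted(set(new["capabilities"]) - set(old["capabilities"])):
        problems.append(f"authority escalated: {cap}")
    for boundary in sorted(set(new["trusted_boundaries"]) - set(old["trusted_boundaries"])):
        problems.append(f"new trusted boundary: {boundary}")
    for oid, before in sorted(old["obligations"].items()):
        # An obligation that disappears is treated as weakened, not as solved.
        after = new["obligations"].get(oid, "missing")
        if before in KERNEL_CHECKED and after not in KERNEL_CHECKED:
            problems.append(f"evidence weakened: {oid} {before} -> {after}")
    return problems


v1 = {
    "capabilities": ["File"],
    "trusted_boundaries": ["os_read"],
    "obligations": {"O1": "proved_by_kernel_decision", "O3": "proved_by_lean"},
}
v2 = {
    "capabilities": ["File", "Network"],
    "trusted_boundaries": ["os_read"],
    "obligations": {"O1": "proved_by_kernel_decision", "O3": "stale"},
}

report = trust_regressions(v1, v2)
exit_code = 1 if report else 0  # any regression blocks the build
for line in report:
    print(line)
```

Run in CI, a non-zero exit code turns a silent downgrade into a change someone has to approve on purpose.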

#What it does not cover

Here is the part I would rather say myself than have you catch me on.

Concrete does not model the second half of Vitalik’s label at all. There is no notion of actors, incentives, collusion, honesty-until-some-time, or social trust anywhere in it. Its accounting is static, about which layers and which math to trust, not dynamic, about which humans behave well and for how long. The actor-and-time-scale half is a real and separate problem, and it belongs to mechanism design and economics, not to a systems language.

The verification is also partial, and the label says so. Proofs attach at the contract and proof-model level, over an intermediate representation and an idealized integer model. The chain from there through the backend down to the final binary is trusted, not verified, and binary correctness sits openly among Concrete’s explicit non-claims. Many obligations are still missing or end in a hand-written Lean proof rather than automatic discharge.

So the claim is not “Concrete proves your program correct end to end.” It is smaller and more useful: Concrete proves selected claims over its proof model, then tells you which properties are proven, by what, which trusted base they rely on, and which properties are not proven at all. That is worth more than a green badge.

#Who reads the label

binji’s objection is correct and survives even a perfect label. Labels create cognitive load, most people ignore them, and they fall back on a basket curated by someone they trust. This happens with food labels and diets, and it would happen with trust labels too. A manifest nobody reads is decoration.

But his answer points at the kind of artifact Concrete produces. He wants agents that carry the cognitive load while showing verified logic for their conclusions. For that to work, the agent needs structured facts it did not invent.

Concrete’s label is machine-consumable. It has identifiers, source spans, dependencies, and evidence classes. An agent can read it directly. The cognitive load binji worries about is a problem for a human staring at a wall of facts, not for software filtering those facts against a user’s policy.
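As a rough picture of what that filtering could look like, here is a Python sketch in which an agent applies a user’s policy to a label. The JSON layout, the policy keys, and the `violations` helper are hypothetical; the facts in the label mirror the illustrative parse_config example earlier in the post rather than any real report.

```python
import json
from typing import List

# Hypothetical machine-readable form of the illustrative parse_config label.
LABEL_JSON = """
{
  "function": "parse_config",
  "capabilities": ["File"],
  "obligations": [
    {"id": "O1", "evidence": "proved_by_kernel_decision"},
    {"id": "O2", "evidence": "assumed"},
    {"id": "O3", "evidence": "stale"}
  ]
}
"""

# Hypothetical user policy: what this user refuses, and what evidence they accept.
POLICY = {
    "forbidden_capabilities": ["Network"],
    "accepted_evidence": ["proved_by_lean", "proved_by_kernel_decision"],
}


def violations(label: dict, policy: dict) -> List[str]:
    """Return the facts in the label that the policy does not accept."""
    found: List[str] = []
    for cap in label["capabilities"]:
        if cap in policy["forbidden_capabilities"]:
            found.append(f"{label['function']}: forbidden capability {cap}")
    for obligation in label["obligations"]:
        if obligation["evidence"] not in policy["accepted_evidence"]:
            found.append(
                f"{label['function']}: {obligation['id']} is {obligation['evidence']}"
            )
    return found


label = json.loads(LABEL_JSON)
for finding in violations(label, POLICY):
    print(finding)
# parse_config: O2 is assumed
# parse_config: O3 is stale
```

The agent contributes the policy and the filtering. Every fact it reports traces back to an identifier in the label, so a skeptical user can check the two findings without trusting the agent’s summary.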

And Concrete’s strongest conclusions arrive with proofs the kernel already checked, or with explicitly weaker labels when they do not. binji wants the agent to prove the logic behind its recommendation. With Concrete, the load-bearing proof evidence was checked independently of any agent. The agent does not have to be trusted to produce that evidence. It only has to point at evidence that already exists and that it cannot forge without changing the artifact.

#Trust should come from the artifact, not the agent

This changes the role of the agent. The usual story makes the agent the thing you must trust: align it, audit it, believe it. Concrete pushes some trust downward into the artifact. The agent’s job is then smaller. It reads facts it cannot easily fake and applies the user’s policy to them.

The split is simple. Concrete produces the verified input. The agent applies the user’s preferences. The Lean kernel anchors the strongest evidence. The remaining trusted layers are named instead of hidden. The agent is still not magic, but at least it is reading facts grounded outside itself.

One more line so I do not oversell it. Concrete answers the input problem: trustworthy facts about an artifact. It does not answer the alignment problem: whether the agent faithfully serves the user. It can make the agent’s inputs harder to fake. It does not make the agent good.

#The ingredients, not just the dish

Everything above labels a program you wrote. But the trust dependency that actually bites is the one you did not write: the parser that quietly starts logging to disk, the hash helper that adds a network call “for telemetry,” the dependency whose proof silently downgrades between versions. Vitalik’s phrase is “a full list of trust dependencies,” and in practice your dependencies are your imports. binji’s basket of ingredients is the import list.

Concrete’s design notes take the next step, and I want to be exact: this part is written down as a direction, not yet shipped. The principle is that an import should not silently grant power. It should say what it brings in and what it is forbidden to bring in. So an import carries a ceiling on authority and a floor on evidence:

```
import std.parse      requires(no File, no Network, no Unsafe)
import hmac.compute   requires(proved_by_lean)
import crypto.compare requires(constant_time, no secret_sink)
```

and a manifest sets a whole-project authority budget:

```toml
[authority]
allowed = ["Alloc"]
forbidden = ["File", "Network", "Process", "Unsafe"]
```

Now drift fails closed. If the parser grows File authority, or the hash helper’s evidence downgrades from proved_by_lean to assumed, the build stops and demands an explicit change to the constraint. That is the supply-chain backdoor from the top of this post, caught at compile time instead of explained in a postmortem.
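Since none of this is shipped, the following is only a Python sketch of how such a budget check might behave, not a description of any existing Concrete feature. The manifest is written as a plain dictionary mirroring the fragment above, and `check_budget` and the per-dependency capability sets are hypothetical. The one deliberate design choice is that anything not explicitly allowed is rejected, so a new capability cannot slip through by being unlisted.

```python
from typing import Dict, List, Set

# Mirrors the [authority] manifest fragment as a plain dictionary.
MANIFEST = {
    "authority": {
        "allowed": ["Alloc"],
        "forbidden": ["File", "Network", "Process", "Unsafe"],
    }
}


def check_budget(manifest: dict, dep_caps: Dict[str, Set[str]]) -> List[str]:
    """Reject any dependency capability that is forbidden or simply not allowed."""
    allowed = set(manifest["authority"]["allowed"])
    forbidden = set(manifest["authority"]["forbidden"])
    errors: List[str] = []
    for dep, caps in sorted(dep_caps.items()):
        for cap in sorted(caps):
            if cap in forbidden:
                errors.append(f"{dep}: uses forbidden {cap}")
            elif cap not in allowed:
                errors.append(f"{dep}: uses undeclared {cap}")
    return errors


before = {"std.parse": {"Alloc"}, "hmac.compute": set()}
after = {"std.parse": {"Alloc", "File"}, "hmac.compute": set()}  # parser drifted

print(check_budget(MANIFEST, before))  # []
print(check_budget(MANIFEST, after))   # ['std.parse: uses forbidden File']
```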

Capabilities, contracts, and evidence classes exist today; bounded imports and authority budgets are a design on paper, not a feature you can run. But the direction is the whole point, because it is where the label stops describing one program and starts describing the entire dependency tree, which is the only level at which “a full list of trust dependencies” is actually true.

#Labels should be compiler artifacts

Software trust labels should not be vendor prose. They should be compiler artifacts: deterministic, diffable, and backed by machine-checked evidence wherever the strong claims are made. Concrete shows what that looks like for capabilities, contracts, proof obligations, evidence classes, axioms, and trusted boundaries. This is the same argument I made in “a fact-producing compiler” and “when the compiler is the oracle,” pointed at a live conversation elsewhere.

The systems-language world and the crypto-trust world are circling the same object from opposite sides. One wants software to emit a verifiable manifest of what it depends on. The other wants a language where that manifest falls out of compilation. Concrete does not solve the honest-actor half, and it does not pretend to. It takes the half that can be mechanized and mechanizes it: the part CI can reject, a reviewer can diff, and an agent can stand on without asking to be believed.

Written with an LLM in the loop, like everything here. The ideas and the mistakes are mine.