Nutrition Labels for Trust
The man in the painting at the top of this page is doing the oldest verification job there is. He is weighing each coin on a balance, one at a time, because the face stamped on a coin is a claim and its weight is the evidence, and a money changer who confused the two went broke. He does not trust the mint. He trusts the scale.
Five hundred years later, almost all of our software asks us to trust the stamp. It ships with a name, a logo, a reassuring sentence about security, and no scale anywhere. The bill for trusting the stamp comes due in supply-chain backdoors, dependencies nobody audited, and “verified” badges that turn out to have meant a marketing review, and it usually arrives late and all at once.
Vitalik Buterin put the missing scale into one sentence:
In an ideal world all software and hardware would have “nutrition labels” that provide a full list of trust dependencies - what math and which actors’ honest behavior (and on what time scale) the system is relying on to provide its core functionality and implied guarantees.
binji replied with the obvious objection, and then with something more interesting:
even if this was available, it could still prove to create a cognitive load that is ignored by many, so they’d still opt into systems that preselect for them. you see this in nutrition labels and dieting etc, where most people prefer the convenience of being given a basket of “things to eat” from a verified source (doctors, dieticians…influencers).
but here’s where the agentic world gets interesting, as ai becomes the new ui, the necessity of privacy preserving agents personalized to a users preferences that can handle the cognitive load to supplement their decisions while proving verified logic on how they come to that conclusion will be key.
feels like a strong intersection opportunity at large here and is a mixture of verification, privacy, agent-assisted decision making, and overall web hygiene
That exchange is useful because it separates the problem into two parts. Vitalik names the missing artifact: a list of trust dependencies. binji names the missing interface: something that can read that list without making every user become a security engineer. Concrete, the programming language I have been writing about, sits exactly between those two needs. It does not solve the whole problem. It does build the part that should not be prose in the first place.
#The label has two halves, and people read it as one
Vitalik’s label is really two lists. One is the math and the code a system depends on: which proofs, which trusted components, which foreign boundaries, which permissions. The other is which actors’ honest behavior it relies on, and for how long. Those are different kinds of dependency, and the most useful thing you can do with the tweet is keep them apart.
The first half is not a someday wish. A language can produce it today, on its own, as a byproduct of compiling. That is the whole claim of this post, and I will keep it narrow: Concrete is what the math-and-code half of a trust label looks like once you stop writing it by hand and let the compiler keep it for you.
#A label generated by the compiler
Concrete is built as systems control plus evidence accounting. Its pipeline is documented as a single arc: claim, then obligation, then evidence, then audit. Contracts, capabilities, trusted boundaries, and runtime-safety rules are all sources of claims; the compiler lowers each into an obligation and attaches evidence to it. The output of compilation is not only a binary. It is a ledger of what the program relies on and how strongly each reliance is justified.
That ledger has real fields, not slogans. A small slice of it might look like this:
function: parse_config
capabilities: File
trusted_boundaries:
- trusted extern fn os_read
obligations:
O1 array_bounds buffer[i]
evidence: proved_by_kernel_decision
engine: omega
O2 ensures result_is_valid_config
evidence: assumed
O3 proof link Config.Proofs.parse_config_shape
evidence: stale
reason: body fingerprint changed
tcb:
- Concrete checker
- Lean kernel
- proof attachment and fingerprint machinery
- LLVM/backend/runtime/OS/hardware
The syntax above is illustrative, but every category is real: authority, trusted boundary, obligation, evidence class, stale proof, and trusted base. The important move is that the label is derived from the program and the proof artifacts, not hand-authored after the fact.
Capabilities. Concrete tracks effects as a visible capability vocabulary: the concrete permissions File, Network, Process, Console, Clock, Random, Env, Alloc, and Unsafe, plus a Std macro that expands to the standard set and user-defined aliases that expand at parse time. A function with no annotation is pure. A function that allocates on the heap must say with(Alloc). A function that touches the network must say with(Network), and so must everything that transitively calls it. The label cannot under-report, because a program that uses an effect it did not declare does not compile.
Trusted boundaries. Every place the program steps outside what the checker can guarantee is marked and locatable: trusted fn, trusted impl, trusted extern fn, and functions or calls carrying with(Unsafe). You can ask the compiler to enumerate them, and it answers with a list and source spans rather than a shrug. “Does pointer tricks internally” and “can call arbitrary foreign code” are different risks, and they get different markers.
Evidence classes. This is the part that matters most, and it is where Concrete refuses the thing every other label does. There is no single green checkmark. Every obligation carries an evidence class that says exactly how it is justified: proved_by_lean for a kernel-checked theorem, proved_by_kernel_decision for a decision procedure, solver_trusted for an external solver result, tested_by_oracle, assumed, trusted, stale, unproven, and more. “It type-checks,” “it is proven,” and “we are taking the author’s word for it” are different colors on the label, and they are kept different on purpose. It is the money changer’s distinction restored to software: the coin you weighed, the coin you did not, and the coin you are choosing to take on faith.
None of that is pseudocode, so here is the authority half in real Concrete. A foreign call is a named, audited boundary; a function that touches the console has to say so; and the requirement climbs the call graph on its own:
trusted extern ; // foreign boundary, audited
with // needs Console
with // inherits it from print_int
with
Delete with(Console) from greet and the program stops compiling, because greet calls something that needs it. The authority on a function is not a comment that can drift out of date. It is part of the type, rechecked on every edit.
#The label carries proofs, not just declarations
A permissions screen says an app can use the network. A software bill of materials says a binary contains some library at some version. Concrete’s label can say something a list of dependencies cannot: that a specific property of a specific function has been mechanically proven, and by what.
The mechanism is ordinary design by contract, pointed at verification. A function carries #[requires] and #[ensures] clauses and loop invariants. Each becomes a proof obligation with a stable identifier. A precondition is assumed at the function’s entry and pushed onto every caller, so it cannot be quietly dropped; the obligation moves to whoever calls the function and has to be discharged there.
How it gets discharged is where I want to be careful, because the honest version is more interesting than the marketing version. Concrete is kernel-first. The decision procedures omega, for linear integer arithmetic, and bv_decide, for bitvectors, produce certificates that are checked inside the toolchain, so using them adds no external solver to the trusted base. Two caveats are kept explicit rather than hidden. First, bv_decide relies on a compiled LRAT checker, which brings a named tier of native-code trust; it is not pure kernel reduction, and the axiom inventory says so. Second, Concrete can also hand a condition to an external SMT solver, and when it does, the result is labeled solver_trusted and the solver binary becomes part of the trusted base for that one obligation, firewalled from the kernel-checked classes. The point is not that none of this requires trust. The point is that the label tells you exactly which kind of trust you got.
concrete prove is the workflow that makes this usable. It generates a Lean proof workspace for a function, links registered theorems back to their obligations, and supports replay so a proof stays bound to the exact source it was written against. A fingerprint, now a truncated SHA-256 over the function’s structure, detects when the code drifts out from under its proof, and the evidence class flips to stale. The label cannot keep claiming “proven” about a function that has since changed.
Here is the proof half in real code, just as small. A bit rotation that is only correct while the shift stays in range:
That #[requires] is not a comment and not a runtime assert. The compiler turns it into an obligation, pushes it onto every caller, and then reports, one call site at a time, how each call discharges it:
call rotr(x, 13) requires 0 <= n && n < 32 -> proved_at_callsite
call rotr(x, n) [n=7] requires 0 <= n && n < 32 -> proved_by_kernel_decision (bv_decide)
call rotr(x, 40) requires 0 <= n && n < 32 -> failed_at_callsite
call rotr(x, k) requires 0 <= n && n < 32 -> unproven_at_callsite
Four calls, four honest verdicts. A constant in range folds to proved. A value fixed earlier by a let is handed to a decision procedure and closed with a checked certificate. A constant out of range is not a lint, it is a compile error. And an argument the compiler cannot pin down stays unproven, labeled exactly that, never quietly rounded up to fine. The last line is the one that matters most: the label would rather tell you it does not know than tell you a comforting lie. That is the money changer setting a coin aside because the scale was inconclusive, instead of waving it through.
Return to the config example. The useful fact is not that the program is “verified.” It is that its File authority is visible, its absence of Network and Alloc authority is checked, its operating-system boundary is named, one bounds obligation is discharged by omega, one semantic parsing claim is only assumed, and one old proof has gone stale. Six facts, six different kinds of trust, none of them collapsed into a checkmark. You learn more from that than from a green “verified” badge, precisely because it shows you where the badge would have been lying.
This is the precise line between formal verification and the rest of the label. Capabilities and boundaries are enforced or reported by the type system, which is strong but mechanical. A contract that reaches proved_by_lean or proved_by_kernel_decision is verified in the relevant sense: the toolchain checked a proof or proof-producing decision procedure for that property. The label shows you which of your guarantees are which, and refuses to let the merely enforced pose as the proven.
#The label includes itself
The best part is that Concrete turns the question back on its own machinery. It enumerates its own trusted computing base: the layers you must trust for any of its proofs to mean anything. The checker and compiler. The Lean kernel. The proof-attachment and fingerprint machinery. The LLVM backend. The runtime, the operating system, the hardware. And the foreign code behind every extern fn. Most systems hide this list. Concrete prints it.
There is a famous reason this is the only honest move. In 1984, accepting the Turing Award, Ken Thompson showed that you cannot fully trust code you did not write yourself, and that the rot reaches all the way down to the compiler: a compiler can carry a backdoor that survives even after its own source is scrubbed clean, by recognizing when it is compiling itself and quietly reinserting the trick. The lesson is not that everything is hopeless. It is that “trust me, the compiler is clean” is never a real answer. The real answer is to name, exactly, what trusting the compiler commits you to. A compiler that prints its own trusted base and the axioms its proofs stand on is Thompson’s question answered out loud instead of waved away.
It even prints the axioms. An axiom-inventory gate runs over every theorem and fails the build on anything undocumented. The mathematical assumptions the proofs are allowed to lean on are named: propext, Classical.choice, Quot.sound, and the flagged native-code trust tier for compiled certificate checking. That is the literal answer to Vitalik’s “what math are you relying on,” extracted automatically rather than asserted in a README.
And the whole label is verifiable rather than merely stated. Reports are deterministic: the same source produces the same label every time. concrete diff compares two versions and flags when trust weakens, when a proof goes stale, when authority escalates, when a boundary erodes. A label you can regenerate and diff is evidence. A label you cannot is marketing.
#What it does not cover
Here is the part I would rather say myself than have you catch me on.
Concrete does not model the second half of Vitalik’s label at all. There is no notion of actors, incentives, collusion, honesty-until-some-time, or social trust anywhere in it. Its accounting is static, about which layers and which math to trust, not dynamic, about which humans behave well and for how long. The actor-and-time-scale half is a real and separate problem, and it belongs to mechanism design and economics, not to a systems language.
The verification is also partial, and the label says so. Proofs attach at the contract and proof-model level, over an intermediate representation and an idealized integer model. The chain from there through the backend down to the final binary is trusted, not verified, and binary correctness sits openly among Concrete’s explicit non-claims. Many obligations are still missing or end in a hand-written Lean proof rather than automatic discharge.
So the honest claim is not “Concrete proves your program correct end to end.” It is “Concrete proves selected claims over its proof model, and tells you exactly which properties are proven, by what, down to which trusted base, and which are not.” That is a stronger and more useful thing to be able to say than a green checkmark, and it is the opposite of the overconfidence that makes most security claims worthless.
#Who reads the label
binji’s objection is correct and survives even a perfect label. Labels create cognitive load, most people ignore them, and they fall back on a basket curated by someone they trust. This happens with food labels and diets, and it would happen with trust labels too. A manifest nobody reads is decoration.
But his own resolution points straight back at the kind of artifact Concrete produces. He wants privacy-preserving agents that carry the cognitive load while proving the verified logic of how they reached a conclusion. That vision needs a particular kind of input, and it is exactly the input Concrete emits.
Concrete’s label is machine-consumable, not human prose. It is stable structured data with identifiers, source spans, dependencies, and evidence classes. An agent reads it directly. The cognitive load binji worries about is a problem for a human staring at a wall of facts, not for software filtering those facts against a user’s policy.
And Concrete’s strongest conclusions arrive with proofs the kernel already checked, or with explicitly weaker labels when they do not. binji wants the agent to prove the logic behind its recommendation. With Concrete, the load-bearing proof evidence was checked independently of any agent. The agent does not have to be trusted to produce that evidence. It only has to point at evidence that already exists and that it cannot forge without changing the artifact.
#Trust should come from the artifact, not the agent
That reframes the agentic-trust conversation, which usually assumes the agent must be the source of trust, so the agent itself must be aligned, audited, and believed. Concrete suggests the inverse. Push verification down into the artifact, and the agent’s job shrinks from manufacturing trust to routing it. The agent becomes a consumer of facts it cannot fake.
The division of labor comes out clean. Concrete is the verified-input layer. The agent is the personalized policy layer. The kernel is the proof anchor for the strongest evidence, and the remaining trusted layers are named instead of hidden. Trust flows from checked proofs and explicit assumptions into the label, then from the label to whatever policy reads it. The point is not that trust disappears. The point is that the trust the agent routes is grounded in artifacts it did not invent.
One more line so I am not overselling. Concrete answers the input problem: trustworthy facts about an artifact. It does not answer the alignment problem: whether the agent faithfully serves the user. It makes the agent’s inputs unforgeable. It does not make the agent good. Anyone who blurs those two is selling the exact overreach this whole approach exists to refuse.
#The ingredients, not just the dish
Everything above labels a program you wrote. But the trust dependency that actually bites is the one you did not write: the parser that quietly starts logging to disk, the hash helper that adds a network call “for telemetry,” the dependency whose proof silently downgrades between versions. Vitalik’s phrase is “a full list of trust dependencies,” and in practice your dependencies are your imports. binji’s basket of ingredients is the import list.
Concrete’s design notes take the obvious next step, and I want to be exact that this part is written down as a direction, not yet shipped. The principle is that an import should not grant power; it should declare and bound what it brings in. So an import carries a ceiling and a floor:
import std.parse requires
import hmac.compute requires
import crypto.compare requires
and a manifest sets a whole-project authority budget:
[]
= ["Alloc"]
= ["File", "Network", "Process", "Unsafe"]
The payoff is that drift fails closed instead of being discovered after the breach. If the parser grows File authority, or the hash helper’s evidence downgrades from proved_by_lean to assumed, the build stops and demands an explicit, reviewed change to the constraint. That is the supply-chain backdoor from the top of this post, refused at compile time instead of mourned in a postmortem.
Capabilities, contracts, and evidence classes exist today; bounded imports and authority budgets are a design on paper, not a feature you can run. But the direction is the whole point, because it is where the label stops describing one program and starts describing the entire dependency tree, which is the only level at which “a full list of trust dependencies” is actually true.
#Labels should be compiler artifacts
The thesis underneath all of this is short. Nutrition labels for software should not be prose written by the vendor. They should be artifacts produced by the compiler, deterministic, diffable, and backed by machine-checked evidence wherever the strong claims are made. Concrete shows what that looks like for capabilities, contracts, proof obligations, evidence classes, axioms, and trusted boundaries. It is the same argument as a fact-producing compiler and when the compiler is the oracle, now pointed at a conversation happening somewhere else.
The systems-language world and the crypto-trust world are circling the same object from opposite sides. One wants software to emit a verifiable manifest of what it depends on. The other wants a language where that manifest falls out of compilation. Concrete does not solve the honest-actor half, and it does not pretend to. What it does is take the half that is mechanizable and actually mechanize it: the part you can build today, the part CI can reject, the part a reviewer can diff, and the part an agent can stand on without asking to be believed.