Can I prove Concrete programs in Lean?

[Image: Newton, William Blake, 1795. A nude figure crouched on a rocky surface at the bottom of the sea, measuring with a compass on a scroll of geometric diagrams.]

I have been thinking about what it would take to prove Concrete programs in Lean. Not the compiler itself, but actual user programs: take a function written in Concrete, connect it to its Core IR representation inside the compiler, and prove properties about it using Lean’s existing theorem library.

#Why I think there is an opening

Verified systems code is expensive. The seL4 microkernel took roughly 20 person-years to verify: about 200,000 lines of Isabelle proof for roughly 10,000 lines of C. Fiat Cryptography generates verified primitives that ship in real browsers. Mathlib has formalized an enormous amount of mathematics in Lean. The proof side is fine. The languages are the problem. They were never designed as proof targets, so most of the effort goes into bridging two unrelated worlds.

That bridging cost is what I want to attack. With VST you write C, a separate tool parses it into Coq, and you hope the tool’s model of C matches what your compiler actually does. Verus stays in Rust but has to model unsafe code, implicit Drop, Deref coercions, and everything else Rust was not designed to make verifiable. F* and Dafny start from verification and try to extract systems code, but you end up in a proof assistant. ATS tried to be both, but never caught on.

It always ends the same way: bolt verification onto a language that fights you, or extract systems code from a language that was not built for it. I think Concrete might be in a different position because of a decision I made early on: writing the compiler in Lean 4.

#What the compiler already gives me

Concrete’s compiler is written in Lean 4. I did not do this because I wanted to prove things about programs. I did it because Lean is a good language. But it created a structural opportunity I did not fully appreciate at first.

After parsing and elaboration, every Concrete program becomes Core IR, a small, explicit, fully typed intermediate representation defined as Lean inductive types:

inductive CExpr where
  | intLit (val : Int) (ty : Ty)
  | boolLit (val : Bool)
  | binOp (op : BinOp) (lhs rhs : CExpr) (ty : Ty)
  | call (fn : String) (typeArgs : List Ty) (args : List CExpr) (ty : Ty)
  | match_ (scrutinee : CExpr) (arms : List CMatchArm) (ty : Ty)
  | borrow (inner : CExpr) (ty : Ty)
  | borrowMut (inner : CExpr) (ty : Ty)
  | deref (inner : CExpr) (ty : Ty)
  -- ... ~40 constructors total across CExpr, CStmt, CMatchArm

This is native Lean data. Not a foreign object imported through FFI. Not a serialized AST that some other tool has to parse. The same types the compiler manipulates during elaboration, type checking, and lowering are the types I could write proofs about. There is no translation between two separate tools, though there are still internal boundaries (elaboration, the Core-to-math connection, lowering) where meaning could drift. Those boundaries live inside one codebase rather than across two ecosystems, and that matters.

All surface sugar is gone by the time code reaches Core. Method calls become plain function calls. ? becomes explicit match-and-return. -> becomes dereference-then-field-access. Borrows, moves, and destroys are explicit operations.
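On the Lean side, that desugared form might look like the following. This is a sketch of what a surface method call such as `xs.len()` could elaborate to; the `.var` constructor and the `Ty` names are my assumptions here, since only the constructors listed above come from the actual compiler:

```lean
-- Sketch only: a Core term a surface call like xs.len() might elaborate
-- to. `.var` and these `Ty` constructors are hypothetical; the receiver
-- has become an ordinary first argument and type arguments are explicit.
def lenCall : CExpr :=
  .call "List::len"                                      -- method resolved to a plain function
        [Ty.int]                                         -- List<T> instantiated at T = Int
        [.borrow (.var "xs" (Ty.list Ty.int))            -- receiver, borrowed explicitly
                 (Ty.ref (Ty.list Ty.int))]
        Ty.int                                           -- result type
```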

The compiler pipeline has hard boundaries:

Parse → Resolve → Check → Elaborate → CoreCheck → Mono → Lower → SSA → Emit

Core is the semantic authority. Everything before it is surface convenience; everything after it is lowering toward machine code. When I started thinking about where a proof boundary would go, Core was the obvious answer. It was already there.

#The idea

Here is what I want to do. Take a Concrete function, say one that reverses a list:

fn reverse<T>(xs: List<T>) -> List<T> {
    let mut acc: List<T> = List::Nil
    for x in xs {
        acc = List::Cons(x, acc)
    }
    return acc
}

The compiler already elaborates this into a CExpr value in Lean. I want to connect that CExpr to Lean’s existing list library and prove something about it, like that it preserves length.

To get there I need three pieces that do not exist yet.

Piece 1: formal evaluation semantics for Core. A Lean definition that says what Core expressions mean:

-- Sketch, not real code yet
inductive Eval : Env → CExpr → Value → Prop where
  | intLit : Eval env (.intLit n ty) (.int n)
  | binOp : Eval env lhs (Value.int a) →
            Eval env rhs (Value.int b) →
            Eval env (.binOp .add lhs rhs ty) (.int (a + b))
  -- ...
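The judgment above leans on `Value` and `Env`, which do not exist yet either. A minimal sketch, with placeholder constructor names rather than compiler code:

```lean
-- Sketch of the supporting definitions the Eval judgment assumes.
inductive Value where
  | int  (n : Int)
  | bool (b : Bool)
  | constructor (name : String) (fields : List Value)  -- enum/struct values

-- Variable bindings, innermost first.
abbrev Env := List (String × Value)
```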

Piece 2: a connection between Core and Lean mathematics. I need to show that “this Core term, when evaluated” means the same thing as “this Lean function over Lean lists.” This is the hardest part, because you are connecting running code to the math you actually care about.
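One possible shape for that connection, with every name hypothetical: encode and denote functions translate between Lean lists and runtime values, and a bridge theorem ties the specific Core term to the specific Lean function. This assumes a `Value` type with a constructor-application form; `reverseCore` stands for the `CExpr` the compiler produced for `reverse`.

```lean
-- Hypothetical bridge layer. None of these definitions exist yet.
def encodeIntList : List Int → Value
  | []      => .constructor "List::Nil" []
  | x :: xs => .constructor "List::Cons" [.int x, encodeIntList xs]

def denoteIntList : Value → Option (List Int)
  | .constructor "List::Nil" [] => some []
  | .constructor "List::Cons" [.int x, rest] =>
      (denoteIntList rest).map (x :: ·)
  | _ => none

-- The statement that "this Core term means List.reverse":
theorem reverse_bridge (xs : List Int) (v : Value)
    (h : Eval [("xs", encodeIntList xs)] reverseCore v) :
    denoteIntList v = some xs.reverse := by
  sorry
```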

Piece 3: the proof itself. Once the first two pieces exist, I can state and prove:

theorem reverse_length (xs : List α) :
    (concreteFn_reverse xs).length = xs.length := by
  -- concreteFn_reverse is the Lean-level interpretation
  -- of the Concrete function's Core representation;
  -- the proof would go through Lean's standard list lemmas
  sorry

But here is why I keep thinking about this. Lean already has thousands of lemmas about lists, natural numbers, ordering, algebraic structures. I would not be rebuilding any of that. I would be reusing it on code that compiles down to native machine instructions with explicit ownership, no GC, no runtime. The connection layer is real work to build, but it is work you pay for once and reuse across every function you verify.
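To make the reuse concrete: once `concreteFn_reverse` is shown equal to `List.reverse`, length preservation is a lemma Lean's standard library already ships, not something I would prove from scratch.

```lean
-- Already in Lean's library: no new list theory needed.
example (xs : List α) : xs.reverse.length = xs.length :=
  List.length_reverse xs
```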

#Why I think Concrete’s design makes this feasible

Writing the compiler in Lean is not enough on its own. If Concrete had the same feature surface as Rust or C++, the Core IR would be enormous and formalization would be just as painful. The reason I think this can actually work is that I designed Concrete to be small, and several of the design choices I made for other reasons turn out to help here.

In Rust, reasoning about ownership requires modeling Drop: implicit destructors that run at scope exit, invisible control flow the programmer never wrote. In Concrete, owned values must be consumed exactly once and cleanup is explicit with destroy and defer. When you formalize Core, what you see is what executes.

Capabilities make effects visible in the type system. If a function has no capability annotations, it is pure; the compiler enforces that. with(File) means file I/O. I can mechanically carve out the pure fragment of a codebase and reason about it without dragging the operating system into the proof.
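A toy model of that carving-out, in my own framing rather than the compiler's actual representation: capabilities as data, purity as an emptiness check.

```lean
-- Toy sketch; names are illustrative, not compiler code.
inductive Capability where
  | file | net | clock
  deriving DecidableEq

structure FnSig where
  name : String
  caps : List Capability

-- The fragment safe to reason about without modeling the OS.
def pureFragment (fns : List FnSig) : List FnSig :=
  fns.filter (fun f => f.caps.isEmpty)
```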

The trust boundaries double as proof boundaries. Safe code is compiler-checked, trusted code hides pointer-level implementation behind a safe API, and unsafe code covers foreign calls and raw system access. The language already marks where each one is, so I do not have to figure out where trust enters the picture.

No operator overloading, no implicit conversions, no closures, no exceptions, no trait objects, no Deref coercions. I originally left these out because I think they make code harder to read and audit. It turns out they also make formalization harder. Not having them keeps Core small.

#How I compare this to other projects

Nobody gets this for free. Verus gets Rust’s ecosystem but has to model a language that fights verification at every turn. Fiat Cryptography generates verified code that no human wrote. RefinedC takes on all of C, and it shows.

What I want is something different: write real low-level code in Concrete, with explicit ownership that compiles to native executables, and then prove properties about it in Lean using Lean’s existing libraries. If the bridge works, the code you prove would be the code you ship, not a model that gets translated. And you would not need a bespoke verification ecosystem because Lean already has one.

Concrete does not have Rust’s ecosystem or C’s installed base. I am betting that a small, explicit language can make the proof part cheap enough to be worth it anyway.

#What exists and what I still need to build

Here is where this actually stands.

What exists: the compiler (~14,300 lines of Lean 4), Core IR as Lean inductive types, explicit pipeline boundaries, CoreCheck validating semantic invariants, ~490 test programs, real compiled programs, and audit reports for capabilities, unsafe boundaries, allocation, and layout.

What does not exist: formal evaluation semantics for Core, a typing judgment as a Lean proposition, or any actual proof about any Concrete program. The pieces are lined up but nobody has written a proof yet.

I have a rough order in mind.

First, formalize a pure Core fragment: integer and boolean literals, binary operations, let bindings, function calls, pattern matching. Maybe 15 constructors. Small enough to formalize in weeks, not months.
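One possible shape for that fragment, all names provisional. `PBinOp` and `PPattern` are stand-ins; the real fragment would be carved out of Core's `CExpr`.

```lean
-- Sketch of a small pure fragment: literals, variables, binary
-- operations, let bindings, calls, and pattern matching.
inductive PBinOp where
  | add | sub | mul | eq | lt

inductive PPattern where
  | wildcard
  | constructor (name : String) (binders : List String)

inductive PExpr where
  | intLit  (n : Int)
  | boolLit (b : Bool)
  | var     (name : String)
  | binOp   (op : PBinOp) (lhs rhs : PExpr)
  | letIn   (name : String) (value body : PExpr)
  | call    (fn : String) (args : List PExpr)
  | match_  (scrutinee : PExpr) (arms : List (PPattern × PExpr))
```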

Then, one proof of one function. I want to take something trivial, like summing a list, and push it through the full pipeline: Concrete source to Core to Lean proof. The proof itself does not matter. What matters is whether the pipeline actually works end to end.

After that, grow the provable fragment: structs, enums, borrows, loops. Last, prove that the compiler passes preserve meaning, from elaboration through lowering to SSA. That depends on everything else being solid first.

#Where I expect problems

Pure functions over algebraic data types are the easiest proof target: arithmetic, structural transformations, parsers over bounded inputs. The first proofs should live here.

FFI, unsafe, and trusted code all sit outside the proof boundary. Foreign functions are black boxes. Trusted and unsafe code do things the type system cannot fully track. You model their contracts as axioms and you do not pretend otherwise. I prefer it this way: the proof tells you exactly where you are trusting something you have not verified, instead of pretending everything is covered.
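For example, a foreign read could enter the proof as an axiom over the (still hypothetical) Eval judgment, which makes the trust visible rather than hidden. `read_byte` and its range spec are invented here for illustration.

```lean
-- Invented example: the contract of a foreign function as an axiom.
-- Nothing verifies this claim; the axiom IS the trust boundary.
axiom read_byte_spec :
    ∀ (env : Env) (fd : CExpr) (v : Value),
      Eval env (.call "read_byte" [] [fd] Ty.int) v →
      ∃ n : Int, v = .int n ∧ 0 ≤ n ∧ n < 256
```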

Mutable heap code is where things start getting expensive again. Separation logic can model it. I would rather earn the right to worry about that later.

Concurrency is last. Real concurrency proofs need explicit models of scheduling, memory ordering, and synchronization, and I do not want to formalize any of that before the simpler cases work well. Concrete defers concurrency to later phases of the language for this reason.

There is another problem I have not mentioned yet: even after formal evaluation semantics are defined for Core, those semantics could diverge from what the compiler’s lowering passes actually do. But this is a single boundary inside a single codebase, not two separate tools that have to agree on what C or Rust means. Closing that gap (proving lowering preserves semantics) is itself a well-defined proof target for later.

#What comes next

None of this has been proved yet. The compiler exists, the IR exists, the pipeline exists. The proofs do not. One real proof of one real function will tell me whether this actually works. That is what I am working on next.