# Building a Tiny Theorem Prover in Python
The first article in this series explained the Curry-Howard correspondence: propositions are types, proofs are programs. That tells you why theorem proving fits so naturally with programming languages. It does not yet tell you what the machine looks like.
A theorem prover, concretely, is smaller than most people expect. A tiny theorem prover is just:
- a language for terms
- a language for types / propositions
- a checker that decides whether a term has a type
- a tiny trusted kernel that defines the legal moves
This article builds that architecture in plain Python. We are not abusing Python’s own type system. Python is just the implementation language. The prover we build has its own terms, its own propositions, and its own checker.
That distinction matters. The goal here is not to show off host-language cleverness. I want the moving parts to be impossible to miss.
## The Smallest Useful Core
If you strip theorem proving down to the bone, you do not need arithmetic, tactics, automation, or even a parser. You need a handful of term forms and a handful of typing rules.
For a first prover, the right target is:
- implication
- variables
- function abstraction
- function application
- named assumptions in a context
This is enough to represent the simply typed lambda calculus, which under Curry-Howard corresponds to intuitionistic propositional logic.
That means we can already represent and verify proofs of statements like:
- `A -> A`
- `A -> B -> A`
- `(A -> B) -> (B -> C) -> A -> C`
These are not toy programming examples. They are logical theorems.
## Terms, Propositions, and Contexts
We need three datatypes:
- Types: propositions like `A`, `B`, or `A -> B`
- Terms: proofs/programs like variables, lambdas, and applications
- Contexts: the assumptions currently in scope
In Python:
```python
from dataclasses import dataclass

# Types (propositions)
class Type: pass

@dataclass(frozen=True)
class TVar(Type):       # an atomic proposition like A or B
    name: str

@dataclass(frozen=True)
class Arrow(Type):      # the implication src -> dst
    src: Type
    dst: Type

# Terms (proofs)
class Term: pass

@dataclass(frozen=True)
class Var(Term):        # use an assumption by name
    name: str

@dataclass(frozen=True)
class Lam(Term):        # assume x : ty, then prove body
    x: str
    ty: Type
    body: Term

@dataclass(frozen=True)
class App(Term):        # apply a proof of an implication to a proof of its premise
    fn: Term
    arg: Term
```
The context is just a mapping from variable names to types:
```python
Context = dict[str, Type]
```
That mapping is all the bookkeeping the prover needs for assumptions.
Why these constructors?
- `Var(x)` means "use the assumption named `x`"
- `Lam(x, A, body)` means "assume `x : A`, then prove `body`"
- `App(f, a)` means "apply a proof of `A -> B` to a proof of `A` to get a proof of `B`"
That is exactly how implication works in natural deduction.
## The Checker Is the Kernel
Now we write the only thing that really matters: the checker.
```python
def infer(ctx: Context, t: Term) -> Type:
    if isinstance(t, Var):
        if t.name not in ctx:
            raise TypeError(f"unbound variable {t.name}")
        return ctx[t.name]

    if isinstance(t, Lam):
        inner = dict(ctx)          # extend the context with the new assumption
        inner[t.x] = t.ty
        body_ty = infer(inner, t.body)
        return Arrow(t.ty, body_ty)

    if isinstance(t, App):
        fn_ty = infer(ctx, t.fn)
        arg_ty = infer(ctx, t.arg)
        if not isinstance(fn_ty, Arrow) or fn_ty.src != arg_ty:
            raise TypeError("ill-typed application")
        return fn_ty.dst

    raise TypeError(f"unknown term: {t!r}")
```
This is the trusted kernel.
If infer is correct, then any term it accepts is a valid proof of the type it returns. If infer is wrong, the system is unsound. Theorem provers target tiny kernels because every line of this function is code you have to trust.
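To make that trust boundary concrete, here is a self-contained sketch of the soundness-critical branch (constructor names like `TVar` and `Arrow` are this sketch's own, chosen to match the reconstruction in this article): the kernel accepts a correct use of modus ponens and rejects an incorrect one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class Arrow:
    src: object
    dst: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def infer(ctx, t):
    if isinstance(t, Var):
        return ctx[t.name]
    if isinstance(t, App):
        fn_ty, arg_ty = infer(ctx, t.fn), infer(ctx, t.arg)
        # modus ponens: fn must prove src -> dst, arg must prove src
        if not isinstance(fn_ty, Arrow) or fn_ty.src != arg_ty:
            raise TypeError("ill-typed application")
        return fn_ty.dst
    raise TypeError("unknown term")

A, B = TVar("A"), TVar("B")
ctx = {"f": Arrow(A, B), "a": A, "b": B}

print(infer(ctx, App(Var("f"), Var("a"))))   # accepted: the application proves B
try:
    infer(ctx, App(Var("f"), Var("b")))      # f needs a proof of A, got a proof of B
except TypeError as e:
    print("rejected:", e)
```

A single missing check in that branch, and the kernel would accept applying a proof of `A -> B` to a proof of `B`, making the logic unsound.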
## What the Rules Mean Logically
Each branch of infer is one proof rule.
### Variables

```python
if isinstance(t, Var):
    return ctx[t.name]
```

Logically, this means:

- if `x : A` is in your assumptions
- then you may conclude `A`
That is the rule of assumption.
### Lambdas

```python
if isinstance(t, Lam):
    inner = dict(ctx)
    inner[t.x] = t.ty
    return Arrow(t.ty, infer(inner, t.body))
```

Logically:

- assume `A`
- under that assumption, derive `B`
- therefore derive `A -> B`
That is implication introduction.
### Applications

Logically:

- if `f` proves `A -> B`
- and `a` proves `A`
- then `f(a)` proves `B`
That is implication elimination, better known as modus ponens.
The checker enforces logical inference rules directly.
## First Proof: A -> A
The identity function is the simplest proof in the system:
```python
A = TVar("A")
identity = Lam("x", A, Var("x"))   # fun x => x

print(infer({}, identity))
```

The inferred type is `Arrow(A, A)`, that is, `A -> A`.
Under Curry-Howard, that means the term proves A -> A.
`fun x => x` is both:

- a program that returns its input
- a proof that if `A` holds, then `A` holds
The same term inhabits both interpretations.
## A Slightly Less Trivial Proof: A -> B -> A
Now prove that if A holds and B holds, then A holds.
```python
A = TVar("A")
B = TVar("B")
const = Lam("x", A, Lam("y", B, Var("x")))

print(infer({}, const))
```

The inferred type is `A -> (B -> A)`.
Logically, that is:
A -> B -> A
The proof is just “ignore the second assumption and return the first one.”
Ordinary functional programming and formal proof are the same structure seen two ways.
## Composition: (A -> B) -> (B -> C) -> A -> C
Now something that feels like actual reasoning:
```python
A = TVar("A")
B = TVar("B")
C = TVar("C")
compose = Lam("f", Arrow(A, B),
          Lam("g", Arrow(B, C),
          Lam("x", A,
              App(Var("g"), App(Var("f"), Var("x"))))))

print(infer({}, compose))
```

The checker returns `(A -> B) -> ((B -> C) -> (A -> C))`.
This is proof as composition of evidence:
- `f` turns evidence of `A` into evidence of `B`
- `g` turns evidence of `B` into evidence of `C`
- so together they turn evidence of `A` into evidence of `C`
In logic, that is a theorem. In programming, it is function composition.
The same term carries both readings.
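Erasing the types makes the programming reading plain: the proof term is ordinary function composition. A quick sketch in plain Python (the helper names `inc` and `double` are illustrative, not from the prover):

```python
# The proof term, with types erased, is just composition:
compose = lambda f: lambda g: lambda x: g(f(x))

inc = lambda n: n + 1      # plays the role of a proof of A -> B
double = lambda n: n * 2   # plays the role of a proof of B -> C

print(compose(inc)(double)(10))   # double(inc(10)) = 22
```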
## What This Tiny Prover Cannot Yet Do
At this point we have a real prover, but it is extremely small.
It cannot yet express:
- conjunction (`A and B`)
- disjunction (`A or B`)
- quantifiers over values
- equality
- natural numbers
- induction
- computation inside types
That is fine. The first implementation exists to make the kernel architecture legible.
Even in this tiny form, you can already see the whole pattern:
- terms are proofs
- types are propositions
- checking is proof verification
Everything more advanced is an extension of that base.
## The Next Useful Extension: Pairs
If you wanted to grow this prover one step, the best next move would be conjunction.
Add:
- a `PairType(A, B)`
- a `PairTerm(a, b)`
- projections `Fst` and `Snd`
Then the checker gets three more rules:
- if `a : A` and `b : B`, then `(a, b) : A and B`
- if `p : A and B`, then `fst(p) : A`
- if `p : A and B`, then `snd(p) : B`
That would let you prove statements like:
- `A -> B -> (A and B)`
- `(A and B) -> A`
- `(A and B) -> B`
The architecture would not change. You would just add more term forms and more rules.
That is the recurring theme in theorem provers: small, explicit rules compose into powerful systems.
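As a self-contained sketch of that extension (using the `PairType`/`PairTerm`/`Fst`/`Snd` names from the list above, in the same dataclass style; the `Var` and `infer` here are a minimal stand-in for the full kernel):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class PairType:          # the proposition "left and right"
    left: object
    right: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class PairTerm:          # a proof of each conjunct
    fst: object
    snd: object

@dataclass(frozen=True)
class Fst:
    pair: object

@dataclass(frozen=True)
class Snd:
    pair: object

def infer(ctx, t):
    if isinstance(t, Var):
        return ctx[t.name]
    if isinstance(t, PairTerm):              # and-introduction
        return PairType(infer(ctx, t.fst), infer(ctx, t.snd))
    if isinstance(t, Fst):                   # and-elimination, left
        ty = infer(ctx, t.pair)
        if not isinstance(ty, PairType):
            raise TypeError("fst expects a pair")
        return ty.left
    if isinstance(t, Snd):                   # and-elimination, right
        ty = infer(ctx, t.pair)
        if not isinstance(ty, PairType):
            raise TypeError("snd expects a pair")
        return ty.right
    raise TypeError("unknown term")

A, B = TVar("A"), TVar("B")
ctx = {"a": A, "b": B}
p = PairTerm(Var("a"), Var("b"))
print(infer(ctx, p))          # A and B
print(infer(ctx, Fst(p)))     # A
```

Each new term form is one dataclass and one branch of the checker; nothing else in the kernel has to change.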
## What Lean Adds Beyond This
The toy checker here and Lean are the same kind of machine, but Lean adds several layers you immediately miss once you try to prove anything nontrivial:
- Dependent types: types can mention values
- Definitional equality: terms can reduce during checking
- Inductive types: natural numbers, lists, trees, equality, all built into the core theory
- Elaboration: Lean fills in omitted arguments, infers implicits, and resolves notation
- Tactics: you work at the level of proof state instead of raw lambda terms
- Automation: simplifiers, arithmetic solvers, rewriting, decision procedures
- A real trust boundary: private constructors, kernel isolation, no honor system
But none of that changes the central picture. There is still a kernel. There are still terms. There are still types. There is still a checker.
That is why building even a tiny prover is so clarifying. It separates the parts that make the logic work from the parts that make the system pleasant to use.
## Why Python Is the Right Language for This Article
Python’s type system is too weak for the Julia-style trick of making the host language do the proving. That is exactly why it is useful here.
In this article, the prover is explicit:
- the terms are your dataclasses
- the propositions are your dataclasses
- the checker is your function
- the trust boundary is code you can point at with one finger
Nothing is hidden in the host language. That makes Python a good teaching language for the architecture, even though it would be a bad language for host-type-system metaprogramming.
## The Real Lesson
The distance from ordinary interpreter code to a theorem prover kernel is much smaller than most programmers think. You start with:
- a syntax
- a few inference rules
- a checker that enforces them
That is enough to get a real notion of proof.
Lean is the same architecture, pushed further and engineered better.
The next article shows the complementary perspective: instead of implementing the prover explicitly, it embeds a tiny one inside Julia’s own type system. Same ideas, different lesson.