A Fact-Producing Compiler
Series note: this article continues the argument from When the Compiler Is the Oracle. It also responds to Dmitri Sotnikov’s article Giving LLMs a Formal Reasoning Engine for Code Analysis. For the foundation of Concrete, start with Why Concrete Exists.
Yesterday I read Dmitri Sotnikov’s article about giving LLMs a symbolic reasoning engine for code analysis. His tool, Chiasmus, parses source code with tree-sitter, turns definitions and calls into logic facts, and lets an LLM ask formal graph questions instead of rummaging through files.
That is exactly the right direction.
If an agent asks “can user input reach this SQL query?”, grep is the wrong abstraction. The answer is not a line of text. The answer is a graph property. If an agent asks “what changes if I touch this function?”, the answer is reverse reachability over the call graph. If it asks “is there dead code?”, the answer is a definition/call/entry-point query. These are not vibes. They are structural facts about a program.
Reading the post made me realize I have been using the wrong phrase for part of Concrete.
Concrete is not only a language with reports.
Concrete is becoming a fact-producing compiler.
#The problem is not that LLMs forget to grep
The usual agent loop for codebase exploration is embarrassingly primitive.
Search for a name. Read some matches. Search for one caller. Read that. Search for another. Build a little call graph in the context window. Forget half of it. Miss a path through a wrapper. Ask another grep question. Repeat.
If this feels familiar, it is because we do it too. We have trained our tools to be very fast at finding text and then asked the model to reconstruct the program from fragments.
This is not because the model is lazy. It is because text search is not a program analysis interface.
A codebase contains facts:
defines(file, function, line)
calls(caller, callee)
imports(module, dependency)
entry_point(main)
Once you have those facts, the question “can A reach B?” is ordinary graph reachability. “Who depends on B?” is reverse reachability. “What is dead?” is “defined, not reachable from an entry point.” A solver, database, compiler, or ten-line graph traversal can answer those questions exhaustively. The LLM should translate intent into a query and explain the answer. It should not reconstruct the graph by hunting through source text.
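To make that concrete, the dead-code question can be answered by a short traversal over toy facts. This is a sketch, not Concrete's real artifact: the fact names and the example functions (`old_api`, `parse`, `helper`) are invented for illustration.

```python
# Toy fact set in the shape of defines/calls/entry_point facts.
# All names here are illustrative, not real Concrete output.
defines = {"main", "parse", "helper", "old_api"}
calls = {("main", "parse"), ("parse", "helper")}
entry_points = {"main"}

def reachable(entries, edges):
    """Every function reachable from the entry points over the call graph."""
    seen, stack = set(entries), list(entries)
    while stack:
        fn = stack.pop()
        for caller, callee in edges:
            if caller == fn and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

# "Dead" = defined, but not reachable from any entry point.
dead = defines - reachable(entry_points, calls)
print(sorted(dead))  # → ['old_api']
```

The point is not the ten lines of Python; it is that once the facts exist, the answer is exhaustive rather than a sample of grep matches.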
That is the neurosymbolic point in practical form: let the neural part understand the human question. Let the symbolic part answer the structural question.
#Concrete starts one layer deeper
Chiasmus has to recover structure from existing languages. It parses source files with tree-sitter, extracts definitions and calls, serializes them as Prolog facts, and queries those facts.
That is powerful because it works for languages that were not designed for this workflow.
Concrete has a different opportunity: the compiler already has the facts.
It does not have to guess that foo() is a call. It checked it. It resolved the name. It checked the callee’s type. It checked the callee’s capabilities. It lowered the call into checked Core. It knows which monomorphized function exists after generics. It knows where the authority came from.
So the direction for Concrete should not start with a tree-sitter adapter. The direction should be:
Concrete source
-> checked compiler facts
-> machine-readable report / fact artifact
-> query CLI
-> editor / CI / review / MCP / agents
The compiler should produce the fact graph as part of its normal work.
#What Concrete already knows today
Concrete is still experimental. It does not have the query CLI yet. It does not have an MCP server. It does not have a polished fact artifact format.
But the compiler is already producing semantic information that most systems languages never expose as a unified surface.
Right now that surface is still a human report. For example, an authority report can already show where a capability comes from:
capability Alloc (13 functions):
pub store <- store -> vec_push
parse_string <- parse_string -> store
pub parse_value <- parse_value -> parse_string
parse_array <- parse_array -> vec_push
parse_object <- parse_object -> vec_push
parse_kv <- parse_kv -> parse_string
That is useful, but it is only the beginning.
In the compiler today, we can track and report things like:
Capabilities. Which functions require File, Network, Process, Random, Console, Alloc, Unsafe, and so on. A function cannot call a more powerful function unless its signature declares that authority.
Transitive authority. Not merely “this function has File.” The report can show the call chain that explains why authority is required.
Allocation. Which functions allocate or have allocation authority. The predictable profile can reject allocation.
Trust and FFI boundaries. Which functions are trusted, which extern calls exist, which functions cross the foreign boundary.
Recursion. Direct recursion and mutual call cycles are detected from the call graph.
Loop boundedness. The compiler classifies functions by loop shape: no loops, structurally bounded loops, unbounded loops, mixed.
Predictable-profile compatibility. --check predictable can reject direct recursion, mutual recursion, unbounded loops, allocation, FFI, and blocking authority. In the packet-decoder example, the parser core can pass while the I/O shell fails for the right reason.
Evidence level. The effects report can distinguish claims that are merely reported, structurally enforced, proof-backed, or dependent on a trusted assumption.
Proof attachment. Proof evidence is keyed by qualified function identity and a body fingerprint. A function with the same name but different checked body does not silently keep its proof.
Proof staleness. If a function has a registered proof by name but the body fingerprint changes, the report can downgrade the evidence and say the proof is stale.
Fingerprints. The compiler can report canonical fingerprints of checked function bodies for proof/evidence attachment.
Lean proof slices. We have started connecting selected Concrete functions to Lean proofs through a restricted proof model. This is small. But it is real enough to make proved an evidence level in the compiler’s report, not a marketing adjective.
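The fingerprint-keyed proof attachment above can be sketched in a few lines. This is an assumption-laden illustration: a real compiler would hash a canonical form of the checked body, not raw text, and the registry shape here is invented.

```python
import hashlib

def body_fingerprint(canonical_body: str) -> str:
    """Fingerprint of a checked function body. Hashing raw text stands in
    for hashing the canonical checked form."""
    return hashlib.sha256(canonical_body.encode()).hexdigest()

# Proof evidence keyed by qualified name AND body fingerprint (illustrative).
proof_registry = {
    ("main.decode_header", body_fingerprint("if len < 4 { return Err }")): "proved",
}

def evidence_for(qualified_name: str, current_body: str) -> str:
    """Same name + same fingerprint: proved. Same name, changed body: the
    proof is stale and the evidence is downgraded, never silently kept."""
    fp = body_fingerprint(current_body)
    if (qualified_name, fp) in proof_registry:
        return "proved"
    if any(name == qualified_name for name, _ in proof_registry):
        return "stale_proof"
    return "reported"

print(evidence_for("main.decode_header", "if len < 4 { return Err }"))  # → proved
print(evidence_for("main.decode_header", "return Ok"))                  # → stale_proof
```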
None of this is an LLM feature. It is compiler work.
But once the compiler knows these things, an agent can use them.
#Reports are not the final interface
Human-readable reports are a good beginning. They are terrible as the final substrate for automation.
An agent should not parse a box-drawn table and hope the columns stay aligned. A CI gate should not grep prose. A proof tool should not guess which text line corresponds to an obligation.
The compiler needs to emit a durable fact artifact.
Something conceptually closer to:
function(main.decode_header)
defined_at(main.decode_header, "examples/packet/src/main.con", 42)
calls(main.recv_and_decode, main.decode_header)
requires_cap(main.recv_and_decode, Network)
authority_path(main.recv_and_decode, Network, [main.recv_and_decode, std.net.recv])
alloc_status(main.decode_header, no_alloc)
recursion_status(main.decode_header, non_recursive)
loop_status(main.decode_header, no_loops)
predictable(main.decode_header, pass)
evidence(main.decode_header, proved)
body_fingerprint(main.decode_header, "...")
proof_obligation(main.decode_header, decode_header_rejects_short, proved)
I am not proposing this exact syntax. JSON may be right. SQLite may be right. Prolog facts may be right. Lean data may be right. A custom artifact may be right.
The important part is the boundary: checked compiler facts first; query surface second; LLM integration third.
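As one hedged sketch of that boundary, here is what "durable fact artifact plus query surface" could look like with SQLite, one of the candidate formats above. The schema, fact names, and functions are illustrative, not a committed design.

```python
import sqlite3

# The compiler would emit these rows; every later tool only queries them.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE calls (caller TEXT, callee TEXT);
    CREATE TABLE requires_cap (fn TEXT, cap TEXT);
""")
db.executemany("INSERT INTO calls VALUES (?, ?)", [
    ("main.main", "main.recv_and_decode"),
    ("main.recv_and_decode", "std.net.recv"),
    ("main.main", "main.decode_header"),
])
db.executemany("INSERT INTO requires_cap VALUES (?, ?)", [
    ("std.net.recv", "Network"),
])

# "Which functions can reach the Network capability?" as transitive
# closure over the call graph, written as a recursive query.
rows = db.execute("""
    WITH RECURSIVE reaches(fn, target) AS (
        SELECT caller, callee FROM calls
        UNION
        SELECT c.caller, r.target FROM calls c JOIN reaches r ON c.callee = r.fn
    )
    SELECT DISTINCT r.fn FROM reaches r
    JOIN requires_cap rc ON r.target = rc.fn AND rc.cap = 'Network'
    ORDER BY r.fn
""").fetchall()
print([fn for (fn,) in rows])  # → ['main.main', 'main.recv_and_decode']
```

Whatever the format ends up being, this is the shape of the contract: the compiler writes facts once, and queries are ordinary database work.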
#What you should be able to ask
Once Concrete has that fact artifact, the first query interface can be boring.
concrete query callers main.decode_header
concrete query reaches main.main main.decode_header
concrete query path main.main --to-cap Network
concrete query authority-path main.main File
concrete query proof-impact main.parse_byte
concrete query predictable-failures
concrete query dead-functions
The answers should be small and structural.
If main can reach the network, show the path:
main.main
-> main.recv_packet
-> std.net.recv_into
-> extern.recv
If a proof became stale, show the identity and fingerprint mismatch. If a function fails the predictable profile, show which gate failed and which call or loop caused it. If a parser core can reach allocation, show the allocation path.
This matters because “why” is part of the fact.
A boolean answer is not enough for a review. reachable: true tells me something is wrong. The path tells me where to cut.
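Returning the path instead of a boolean is a small change to the traversal. A sketch over illustrative call facts matching the example above:

```python
from collections import deque

# Illustrative call facts, not real Concrete output.
calls = {
    "main.main": ["main.recv_packet", "main.decode_header"],
    "main.recv_packet": ["std.net.recv_into"],
    "std.net.recv_into": ["extern.recv"],
}

def path_to(start, goal):
    """BFS that returns the explaining call chain, not just reachable/unreachable."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for callee in calls.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None  # unreachable: the absence of a path is also a fact

print(path_to("main.main", "extern.recv"))
# → ['main.main', 'main.recv_packet', 'std.net.recv_into', 'extern.recv']
```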
#Where MCP fits
An MCP server is a good later interface.
Not first.
I do not want to start with “ship a Concrete MCP because MCP is fashionable.” That gives us a demo before we have the thing the demo should be querying.
The order should be: improve the compiler facts, emit them in a machine-readable format, build a boring query CLI, then let an MCP server be one more client of the same artifact.
The agent asks:

Can the packet parser core touch the network?

The tool answers with a checked structural fact: either the authority path from the parser core to Network, or the fact that no such path exists.

Or the agent asks:

Why is main not predictable?

And the tool answers with the failing gate and the specific call or loop that triggered it.
The LLM writes the explanation. The compiler supplies the facts.
#Why this is different from Rust
Rust has excellent compiler-enforced memory safety. Rust also has cargo metadata, rustdoc JSON, MIR, rust-analyzer, Clippy, miri, cargo-geiger, cargo-udeps, cargo-audit, perf tools, and a universe of analysis tools.
That ecosystem is serious.
But the deepest Concrete facts come from a different language boundary.
In Rust, whether a function reads files, touches the network, spawns a process, allocates on the heap, blocks on a host call, crosses FFI, calls an unsafe wrapper, or remains within an analyzable execution profile is not part of the ordinary function signature. Some of it can be approximated. Some of it can be linted. Some of it can be discovered from MIR or LLVM. But it is not one semantic ledger that the language maintains for every function.
Concrete is trying to make that ledger normal.
Not because queries are cool. Because the ledger is useful to humans before AI appears. It is useful in code review. Useful in CI. Useful in safety cases. Useful in dependency audits. Useful in proof attachment.
Agents make the payoff more obvious because they punish every place where the truth is implicit. If the only way to discover a property is “read these twenty files carefully,” the agent will sometimes fail. If the compiler emits the property as a fact, the agent can query it.
#What we support now, and what comes next
Status matters, so here is the split.
Working now in the experimental compiler:
- explicit function capabilities
- capability enforcement
- authority reports
- allocation reports
- unsafe/trusted/extern reports
- combined effects/evidence reports
- recursion / call-cycle detection
- loop-boundedness classification
- --check predictable gate
- evidence levels: reported, enforced, proved, trusted assumption
- qualified proof identity plus body fingerprint matching
- stale-proof reporting
- selected Lean proof slices connected back to report evidence
Next on the critical path:
- source file / line / span in human-facing reports
- Elm-like errors for predictable-profile and proof-evidence failures
- machine-readable effects/evidence output
- proof/spec/result registry artifacts instead of hardcoded proof attachment
- proof-obligation report
- source-to-ProofCore extraction report
After that:
- query CLI over compiler facts
- semantic diff and trust drift
- module/package policy gates
- attacker-style demo that adds authority/allocation/proof drift and gets caught
- agent-readable performance research packet
- MCP or equivalent agent interface over the same facts
This order is important. If we build the agent interface first, we get a demo. If we build the fact artifact first, everything else gets better.
#The compiler should say what is true
The sentence I keep coming back to is simple:
A compiler should say what is true about the program.
Not only “type check passed.”
What can this function touch? Can it allocate? Can it block? Is it recursive? Are its loops structurally bounded? Does it cross FFI? Does it depend on trusted code? Is it in the predictable subset? Is the proof current? Which proof obligation backs this claim? Which dependency widened authority since yesterday? Which exact path explains the authority?
Those are program facts.
Concrete already produces some of them. It enforces some of them. It proves a tiny subset of them. It should make all of them easier to query.
The future I want is boring to describe and strange to use: you ask your compiler a question about operational power, execution risk, or proof evidence, and it answers with a checked fact and a path.
No archaeology. No grep ritual. No “I inspected a few matches and it seems fine.”
The compiler says what it knows.