Self-Replicating Programs Emerge from Random Noise

A group of figures gathered around a glass vessel in candlelight, watching a scientific demonstration on a white bird
An Experiment on a Bird in the Air Pump, Joseph Wright of Derby, 1768

Most programmers think Turing completeness is the interesting threshold for a computational system. It gets all the attention. But a lower, stranger threshold matters more for the origin of complex behavior: self-replication.

A recent paper by Agüera y Arcas et al. shows that self-replicating programs spontaneously emerge from soups of random code. No one designs them. No fitness function selects for them. They assemble themselves from noise, take over the soup, and keep evolving. I reproduced the core result in about 300 lines of code.

#What Is Turing Completeness

In the 1930s, three people independently formalized what “computation” means:

  - Alan Turing, with the Turing machine
  - Alonzo Church, with the lambda calculus
  - Kurt Gödel, with general recursive functions

All three systems turned out to define exactly the same class of computable functions. This equivalence is a proven theorem. The Church-Turing thesis goes further: it conjectures that these formalisms capture all of what is effectively computable, not just that they agree with each other.

A system is Turing-complete if it can simulate a Turing machine. This sounds like a high bar. It is not.

#The Bar Is Absurdly Low

You need exactly two things for Turing completeness:

  1. Conditional branching: behavior that can depend on data
  2. Unbounded state: storage with no fixed limit, revisitable through loops or recursion

Any system with both is Turing-complete.

The problem is that both of these show up naturally in almost any system designed to be even slightly flexible. You add conditionals because users want “if this then that.” You add repetition because users want to do things more than once. You add variables or cells or registers because users want to store intermediate results. At some point, without anyone planning it, those features combine into a general-purpose computer.

Turing completeness keeps appearing in systems never intended for general computation:

  - CSS, which paired with HTML can encode Rule 110
  - C++ templates, which compute at compile time
  - TypeScript’s type system
  - Magic: The Gathering, via a legal tournament board state
  - PowerPoint, via animations and hyperlinks

Nobody sat down and said “let’s make CSS Turing-complete.” They added features for styling, and those features accidentally crossed the threshold. Turing completeness is not a high peak you climb. It is a shallow pit you fall into.

#The Halting Problem as a Tell

There is an ironic way people sometimes discover accidental Turing completeness. Someone notices that a certain class of configurations can loop forever and there is no general way to predict which ones will terminate.

That is the halting problem. Alan Turing proved in 1936 that no algorithm can decide, for every possible program, whether it halts or runs forever. This is not a practical limitation. It is a mathematical impossibility.

Once a system is Turing-complete, the halting problem applies to it. If you discover that your type checker, your template engine, or your build system can loop forever in ways you cannot predict, you have probably built a Turing-complete system by accident.

#What Is Self-Replication

Self-replication is a different kind of threshold entirely. A self-replicating program is one that produces a copy of itself somewhere.

John von Neumann studied this in the 1940s using cellular automata. He wanted to understand the minimum requirements for a machine that could build a copy of itself. His self-replicating automaton was enormously complex, involving hundreds of thousands of cells. But the concept was clear: a system that reads its own description and writes that description into a new location.

The requirements for self-replication are more minimal than for Turing completeness. You do not need conditionals, arithmetic, or unbounded memory. You need:

  1. A way to read your own representation
  2. A way to write those bytes somewhere else
  3. A way to repeat until the copy is complete

A copy loop is sufficient.
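As a minimal sketch (plain Python, not the paper's language): a program whose entire behavior is a loop that reads its own bytes and writes them over the adjacent half of a shared tape.

```python
def replicate(tape: bytearray, length: int) -> bytearray:
    """Copy the first `length` bytes (the program itself) over the
    neighboring `length` bytes of a shared tape."""
    for i in range(length):           # the entire replicator: one copy loop
        tape[length + i] = tape[i]    # read own byte, write it next door
    return tape
```

After `replicate(bytearray(b"SELFCOPY" + bytes(8)), 8)`, both halves of the tape hold the same eight bytes.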

#Quines Are Not What We Mean

A quine is a program that outputs its own source code. It is a self-description trick and a beloved puzzle in programming culture. But quines are not the kind of self-replication that matters here.

A quine runs once, prints itself to stdout, and stops. It does not spread. It does not compete. It does not modify its environment.
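Concretely, here is a classic two-line Python quine; its output is, character for character, its own source:

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Pipe the output back into Python and you get the same output again, forever. It is a fixed point of execution, not a spreading process.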

The replicators in the paper we are about to discuss do something different. They overwrite their neighbors. When program A executes alongside program B, A writes its own bytes over B’s bytes. Now there are two copies of A. Next epoch, both copies can overwrite two more neighbors. The difference is between self-description and self-propagation. Quines describe themselves. Replicators spread.

#Self-Replication Is a Lower Bar Than Turing Completeness

Self-replication requires less than Turing completeness.

Turing completeness requires conditionals and unbounded state. Self-replication requires only a copy loop. In the Forth variant studied in the paper, a 6-byte program is a complete self-replicator. In certain conditions, a single byte (0C) copies itself onto the neighbor’s tape. One byte.

If systems as narrow as CSS and Magic: The Gathering accidentally cross the Turing completeness threshold, then self-replication, a strictly easier feat, should be even harder to avoid. Any substrate where programs can read and write their own code is a candidate for spontaneous self-replication.

The paper puts this to the test.

#Previous Work: Tierra and Avida

Artificial life researchers have studied self-replicating programs since the early 1990s.

Tom Ray’s Tierra (1991) created a virtual machine populated with self-replicating programs. The programs competed for CPU time and memory. Over time, they evolved: parasites appeared that hijacked other programs’ replication machinery, then hyper-parasites that resisted parasites, then an entire ecosystem of strategies. It was a landmark result.

Charles Ofria’s Avida (2004) extended the idea into a full research platform. Programs could evolve to perform computational tasks in exchange for faster replication. Avida demonstrated the evolution of complex behaviors from simple replicators.

Both started with hand-crafted self-replicators. Neither asked whether one could assemble from nothing.

#The Paper

“Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction” by Blaise Agüera y Arcas, Jyrki Alakuijala, James Evans, Ben Laurie, Alexander Mordvintsev, Eyvind Niklasson, Ettore Randazzo, and Luca Versari answers that question. Yes. Self-replicators emerge spontaneously from random noise.

The paper uses a language called BFF, an extension of Brainfuck. BFF has 10 instructions and operates on a tape with two heads: one for reading, one for writing. The key property is that the program itself lives on the tape. There is no separation between code and data. Programs are self-modifying.
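An interpreter for it fits in a few dozen lines. This is a sketch based on the instruction semantics the paper describes; details like wrapping the heads at the tape ends and halting on unmatched brackets are my assumptions:

```python
def run_bff(tape: bytearray, max_steps: int = 2**13) -> bytearray:
    """Sketch of a BFF interpreter: one tape, two heads, 10 opcodes."""
    n = len(tape)
    head0 = head1 = pc = 0
    for _ in range(max_steps):
        if pc >= n:
            break
        op = chr(tape[pc])
        if   op == '<': head0 = (head0 - 1) % n
        elif op == '>': head0 = (head0 + 1) % n
        elif op == '{': head1 = (head1 - 1) % n
        elif op == '}': head1 = (head1 + 1) % n
        elif op == '-': tape[head0] = (tape[head0] - 1) % 256
        elif op == '+': tape[head0] = (tape[head0] + 1) % 256
        elif op == '.': tape[head1] = tape[head0]
        elif op == ',': tape[head0] = tape[head1]
        elif op == '[' and tape[head0] == 0:
            depth, j = 1, pc                 # jump past the matching ']'
            while depth and j < n - 1:
                j += 1
                if chr(tape[j]) == '[': depth += 1
                elif chr(tape[j]) == ']': depth -= 1
            if depth: break                  # unmatched bracket: halt
            pc = j
        elif op == ']' and tape[head0] != 0:
            depth, j = 1, pc                 # jump back to the matching '['
            while depth and j > 0:
                j -= 1
                if chr(tape[j]) == ']': depth += 1
                elif chr(tape[j]) == '[': depth -= 1
            if depth: break
            pc = j
        pc += 1
    return tape
```

Every byte that is not one of the 10 opcodes is a no-op, which is why a tape of random bytes executes at all instead of crashing.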

#Why Self-Modification Is the Key Ingredient

The experimental setup is not just “run random programs.” It is “concatenate two programs onto a shared tape and run them together.” This means program A’s instructions can write over program B’s bytes.

This is essential. Without it, programs would execute in isolation and nothing would change. Each program would do its thing, produce no lasting effect on any other program, and the soup would stay random forever.

The shared tape is the analog of chemistry. Molecules do not just exist side by side. They react. They break and form bonds. They change each other’s structure. In the same way, programs on a shared tape change each other’s code. That is where the dynamics come from.

#How the Experiment Works

The “primordial soup” algorithm:

  1. Create a pool of 2^17 programs, each 64 bytes long, initialized with random bytes
  2. Each epoch, randomly pair programs up
  3. For each pair, concatenate them into a single 128-byte tape and execute for up to 2^13 steps
  4. After execution, split the tape back into two 64-byte halves
  5. The modified halves replace the original programs
  6. Apply a small background mutation rate (0.024%, random bit flips)
  7. Repeat
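The loop above can be sketched directly. Here the `execute` callback stands in for a BFF run capped at 2^13 steps, and the exact convention for applying the 0.024% mutation rate is my assumption:

```python
import random

def soup_epoch(pool, execute, mutation_rate=0.00024):
    """One epoch of the primordial-soup loop (steps 2-6 above).
    `pool` is a list of 64-byte bytearrays."""
    random.shuffle(pool)                              # 2. random pairing
    for i in range(0, len(pool) - 1, 2):
        tape = execute(pool[i] + pool[i + 1])         # 3. concatenate and run
        pool[i], pool[i + 1] = tape[:64], tape[64:]   # 4-5. split, replace
    for prog in pool:                                 # 6. background bit flips
        for j in range(len(prog)):                    #    (rate convention
            if random.random() < mutation_rate:       #    assumed)
                prog[j] ^= 1 << random.randrange(8)
    return pool
```

Calling `soup_epoch` repeatedly on a pool of 2^17 random 64-byte programs is the whole experiment.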

There is no selection pressure. There is no reward signal. There is no fitness function. Programs just run, modify each other’s tapes, and get put back into the pool.

#What Emergence Looks Like

For the first few thousand epochs, the soup looks like noise. Complexity, measured by high-order entropy approximated via brotli compression, stays low. Programs modify each other randomly and the distribution of bytes drifts toward a stationary state biased by the BFF instruction set.
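My reading of that measure, with zlib as a stdlib stand-in for brotli: take the order-0 byte entropy and subtract a compression-based estimate of the true entropy rate. Noise scores near zero because compression cannot beat the byte distribution; a soup full of copies scores high because compression collapses the redundancy:

```python
import math
import zlib
from collections import Counter

def order0_entropy(soup: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    n = len(soup)
    return -sum(c / n * math.log2(c / n) for c in Counter(soup).values())

def high_order_entropy(soup: bytes) -> float:
    """Order-0 entropy minus a compression-based entropy estimate.
    Near zero for noise; spikes when the soup fills with copies."""
    compressed_bits_per_byte = 8 * len(zlib.compress(soup, 9)) / len(soup)
    return order0_entropy(soup) - compressed_bits_per_byte
```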

Then, suddenly, something changes. The paper calls it a “state transition.” Complexity spikes. The number of unique programs in the soup drops sharply. A self-replicator has emerged.

The self-replicator is a program that, when concatenated with any other program, copies itself over the other program’s half of the tape. Once one appears, it spreads exponentially. Within a few hundred epochs after the transition, most of the soup is copies of the replicator or close variants.

The paper traces the exact moment of emergence. In one case study, the first replicator appears at epoch 2355. Before that, there is a “pre-replicator” loop that copies bytes but is not yet a complete copier. Through a specific sequence of interactions with neighboring programs, the loop acquires the right structure to become a full self-replicator.

#It Is Not Just Random Initialization

A natural objection: maybe the self-replicators were already present in the random initial soup and just needed time to take over.

The paper rules this out carefully. They compare four experimental conditions across 1,000 runs each, including shorter and longer runs and a variant with no background noise.

The short vs. long comparison shows that self-replicators are being created by the dynamics, not just discovered in the initial noise. The no-noise variant is particularly striking: even with zero background mutation and deterministic pairing, self-replicators still emerge at a similar rate. The main driver is self-modification through program interaction, not random noise.

#Replicator Generations

The first replicator to emerge is often fragile. In the case study, the initial self-replicator is a palindrome that copies itself in reverse. The reversed copy then copies itself back in the original direction. This works, but it has a flaw: it cannot handle zeros well. When the replicator encounters programs full of zero bytes, it floods the soup with zeros, a “zero-poisoning” phase where complexity stagnates and degrades.

Then something happens again. A more robust replicator appears somewhere in the soup. This one has a `{<,}` structure that can overwrite zeros. It overtakes the first replicator and its zero-poisoned debris, and complexity starts climbing again.

There are generations of replicators. The first one is not the last. They compete, and more robust variants replace fragile ones. Programs that copy themselves more reliably spread faster, a form of differential survival without an explicit fitness function.

#The 2D Grid

The paper also runs simulations on a 2D grid of 240 x 135 programs, where each program can only interact with neighbors within distance 2. Self-replicators still emerge, but now you can watch them spread spatially. A replicator appears somewhere on the grid and expands outward like a wave, overwriting the random pre-life soup.

The difference from the well-mixed soup is propagation speed. In the 0D soup, a replicator takes over in O(log n) epochs. On a 2D grid, it takes O(sqrt(n)) epochs because it spreads by contact with neighbors. The 2D setting also provides fertile ground for multiple replicator variants to coexist and compete, since spatial separation allows different lineages to persist.

#Beyond BFF

The paper tests the same primordial soup setup with other computational substrates:

Forth. A stack-based language with a restricted instruction set. Self-replicators emerge even faster and more consistently than in BFF. Almost all 1,000 runs show a state transition within 1,000 epochs (compared to 40% within 16,000 for BFF). The relative simplicity of self-replication in Forth explains this: a complete self-replicator can be just 6 bytes. In certain conditions, a single byte (0C) executing on an empty stack copies itself over.

Z80 CPU emulation. A 2D grid of 16-byte programs executed on an emulated real-world Z80 processor. Self-replicators emerge here too, but the dynamics are richer. First, a wave of stack-based replicators sweeps the grid, using the fact that PUSH instructions write to memory via the stack pointer. These form a coexisting ecosystem. Then a second wave appears: replicators that exploit the Z80’s LDIR and LDDR instructions, which are dedicated block-copy operations. These are more efficient and overtake the stack-based variants. Multiple generations of increasingly capable replicators, on a real CPU architecture.

Intel 8080. Produces simple two-byte non-looping replicators in long-tape settings.

SUBLEQ. The counterexample. SUBLEQ is one of the simplest Turing-complete languages, with just a single instruction. Despite being Turing-complete, self-replicators never spontaneously emerged in SUBLEQ soups, even after billions of iterations. The shortest known SUBLEQ self-replicator is 60 bytes (25 bytes in a 4-operand variant called RSUBLEQ4). Too long to assemble by accident from random interactions.
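For reference, a sketch of a SUBLEQ machine (standard operand semantics; treating out-of-range addresses as halt is my simplification). With no copy instruction, moving even one byte takes a subtract-and-branch sequence, which is why the minimal replicator is so long:

```python
def run_subleq(mem, max_steps=10_000):
    """The whole language is one instruction:
    mem[b] -= mem[a]; if the result <= 0, jump to c, else fall through."""
    pc = 0
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            break                      # out-of-range pc: halt
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        if not (0 <= a < len(mem) and 0 <= b < len(mem)):
            break                      # simplification: bad address halts
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem
```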

#What Determines Fertility

The SUBLEQ counterexample is the sharpest result in the paper. It shows that Turing completeness is not what matters. SUBLEQ can compute anything. But it cannot spontaneously produce self-replicators because the shortest one is too long.

What determines whether a computational substrate will spontaneously generate life is not its theoretical power. It is the length of the shortest self-replicator the language admits. Short replicators mean a short distance from random noise to a functioning copier. Long replicators mean the soup stays dead forever.

BFF: short replicators, fertile. Forth: even shorter replicators, even more fertile. Z80: medium-length replicators, fertile but slower. SUBLEQ: long replicators, barren.

This is a concrete, measurable property of a computational substrate. It is not about expressiveness or elegance or mathematical power. It is about how easy it is to accidentally build a copier.

#My Reproduction

I reproduced the 2D BFF soup in about 300 lines of code. The setup is a 240 x 135 grid of 64-instruction BFF programs with a maximum of 2^13 execution steps per interaction. Each program is visualized as an 8x8 pixel square, colored by its content.

The BFF interpreter is the bulk of the code. The rest is the soup logic (random pairing within a radius-2 neighborhood, execution, splitting, mutation) and visualization.
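One simple scheme for the coloring (a sketch, not necessarily the exact mapping): hash each program's bytes to an RGB triple, so exact copies share a color and a replicator takeover reads as a single spreading hue.

```python
import hashlib

def program_color(prog: bytes) -> tuple[int, int, int]:
    """Map program content to a stable RGB color via a hash:
    identical programs get identical colors."""
    h = hashlib.sha256(prog).digest()
    return (h[0], h[1], h[2])
```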

What you see when you run it: a grid of colored static. For a while, nothing seems to happen. The colors shift slowly as programs modify each other. Then, somewhere on the grid, a patch of uniform color appears. It grows. Within a few hundred more epochs, the entire grid is dominated by one or two colors, with small patches of variation at the boundaries where different replicator lineages meet.

That is a self-replicator taking over. No one wrote it. No one selected for it. It assembled itself from random interactions and then outcompeted everything else by the simple fact that it copies itself.

#Open Questions

Self-replication is just the first step. In biology, replication was the spark, but evolution was the fire. Once replicators compete for resources, selection pressure creates increasingly complex strategies: metabolism, signaling, cooperation, parasitism, immune systems.

Can any of that happen in these computational soups? The Z80 experiments hint at it: different replicator families coexist and compete, using different instruction strategies. But nothing remotely like metabolism or cooperation has been observed yet.

What would a substrate need to support behaviors beyond replication? The paper does not answer this, but it frames the question precisely. We now know that self-replication is easy. The hard question is: what comes after?

#Structure Without Design

The other things I have been writing about run in the opposite direction.

In the type systems article, the whole point is that humans design constraints and the compiler enforces them. Types are structure we impose on programs to prevent mistakes. In the theorem proving series, we go further: types become logical propositions, and the compiler verifies mathematical truth. Both are stories about humans carefully building formal structure and machines checking it.

This paper is the opposite. Nobody designed the self-replicators. Nobody wrote a type system or a proof checker. The structure emerged from noise through simple interaction rules. The programs in the soup have no types, no correctness properties, no formal guarantees. They just copy and mutate and compete.

And yet the result is recognizably structured. The replicators have internal logic. They use loops, conditionals, pointer arithmetic. Later generations are more robust than earlier ones. On the Z80, successive waves exploit increasingly powerful instructions. Something that looks like engineering appears without an engineer.

The type systems article and the theorem proving series are about what happens when you start with structure and push it as far as it can go. This paper is about what happens when you start with nothing and structure shows up anyway. Both directions are worth understanding. Formal verification and spontaneous emergence both deal in programs, types, and computational substrates. They are the same landscape seen from opposite ends.

#Further Reading