At the Core of Finance Lies Geometry. In the End, It’s All Jensen’s Inequality.
Never cross a river that is on average four feet deep. This is Nassim Taleb’s warning about the difference between arithmetic averages and the reality of survival. If the river is eight feet deep in the middle and dry on the sides, the average tells you nothing about whether you will drown. You will drown in the middle, or you won’t. There is no averaging across parallel universes where you both survive and die.
This observation is not just a heuristic. It is a mathematical fact about the geometry of multiplicative processes, processes where wealth compounds, where growth feeds on itself, and where a loss of 50% requires a gain of 100% just to break even. The mathematics that governs survival in these environments was invented 400 years ago to help astronomers multiply large numbers, partially rediscovered in the 18th century by Daniel Bernoulli, formalized again in the 20th century through information theory, and then largely obscured by theories that optimized across hypothetical worlds instead of along a single path through time.
This is the story of why geometry sits at the core of finance, why Jensen’s inequality is the clearest expression of that geometry, and why reducing variance can be more valuable than increasing returns.
#I. The Invention of Linearization (1614)
In the early 17th century, astronomers were drowning in calculation. Johannes Kepler had just published his laws of planetary motion, and navigators were trying to compute positions using spherical trigonometry. A single problem might require multiplying two seven-digit numbers, a process that took skilled calculators half an hour and was prone to error.
John Napier, a Scottish laird and amateur mathematician, spent 20 years searching for a way to simplify this. His insight: if you could convert multiplication into addition, calculations would become trivial. He invented “logarithms” (from Greek logos = ratio, arithmos = number), a table that mapped every number to its “ratio-representative.”
The crucial property: $\log(ab) = \log(a) + \log(b)$. Multiplication becomes addition. Division becomes subtraction. Exponentiation becomes multiplication.
But Napier’s invention was more than a computational trick. He had discovered the mathematical tool for linearizing multiplicative processes. Four hundred years later, this same tool would reveal why volatility destroys wealth and why tail hedging works.
#II. The Number That Grows Continuously (1685)
In 1685, Jacob Bernoulli investigated a financial puzzle: if you lend money at 100% annual interest, what happens if you compound it more frequently?
- Annually: $(1 + 1)^1 = 2$
- Monthly: $(1 + 1/12)^{12} \approx 2.61$
- Daily: $(1 + 1/365)^{365} \approx 2.71$
Bernoulli discovered that as the compounding interval shrinks toward zero, the result approaches a limit:
$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \approx 2.71828$$
The number $e$ emerges naturally from continuous compound growth. It is the base of the logarithm that measures growth rates, the “natural” logarithm $\ln(x)$. Where Napier gave us the tool to linearize multiplication, Bernoulli gave us the base that emerges when growth becomes continuous.
Together, Napier and Bernoulli provided the mathematical foundation: wealth grows multiplicatively, and logarithms convert this multiplicative growth into additive increments.
#III. Jensen’s Breakthrough (1906)
The missing geometric step arrived in 1906. Danish mathematician Johan Jensen proved the inequality that now bears his name. He showed that whenever a function is concave, averaging inputs before applying the function gives a larger result than applying the function first and then averaging.
That result sounds abstract, but it is exactly the bridge this story needs. Napier gave us logarithms. Bernoulli showed why the natural logarithm belongs to compounding. Jensen explained why variability hurts once the relevant function bends downward.
#IV. Jensen’s Inequality: The Geometry of Concavity
A function is concave if it bends downward. If you draw a straight line between two points on the curve, that line sits below the curve itself. The logarithm is concave. That simple geometric fact turns out to matter enormously in finance, because wealth compounds.
Jensen’s Inequality states that for a concave function $\varphi$:
$$\mathbb{E}[\varphi(X)] \leq \varphi(\mathbb{E}[X])$$
The notation is simple once you unpack it. $\mathbb{E}[X]$ means the expected value, or average, of $X$. The symbol $\varphi(X)$ means “evaluate the function $\varphi$ at $X$.” So the inequality says: for a concave function, the expected value of the transformed variable is less than or equal to the transformed expected value. The gap between those two quantities is the Jensen gap. It is the mathematical penalty created by variability under concavity.
Here is the simplest possible example. Suppose your wealth factor is either $1.5$ or $0.5$ with equal probability. In other words, you either gain 50% or lose 50%.
- The arithmetic average wealth factor is $(1.5 + 0.5)/2 = 1.0$
- The log of that average is $\ln(1.0) = 0$
- But the average log is $\frac{\ln(1.5) + \ln(0.5)}{2} \approx -0.144$
So the average outcome looks flat, but the expected log-growth is negative. Under repeated exposure to the same kind of multiplicative gamble, that means long-run compound growth is negative even though the arithmetic average looks harmless. That is Jensen’s inequality in action.
To see this more viscerally, start with $100:
- After a 50% gain: $\$100 \times 1.5 = \$150$
- After a 50% loss: $\$150 \times 0.5 = \$75$
You end with $75, a 25% total loss, despite the arithmetic average return being zero. The loss applied to a larger base ($150) than the gain ($100), so it took away more than the gain added. Compounding makes volatility expensive.
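The arithmetic above fits in a few lines of code. This is a minimal sketch of the ±50% coin flip from the text; nothing here is calibrated to real data.

```python
import math

# The +/-50% coin flip from the text: wealth factors 1.5 and 0.5, equally likely.
factors = [1.5, 0.5]

# Arithmetic average of the wealth factors: looks like break-even.
arith_mean = sum(factors) / len(factors)

# Average log-growth: what actually accumulates along one path through time.
mean_log = sum(math.log(f) for f in factors) / len(factors)

print(f"arithmetic mean factor: {arith_mean:.3f}")   # 1.000
print(f"mean log-growth:        {mean_log:+.3f}")    # about -0.144

# One gain then one loss, starting from $100, ends at $75.
wealth = 100 * 1.5 * 0.5
print(f"after +50% then -50%: ${wealth:.0f}")
```

The mean log-growth is negative even though the mean wealth factor is exactly 1: the Jensen gap made concrete.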
Apply this to log-wealth:
$$\mathbb{E}[\ln(1+R)] \leq \ln(1 + \mathbb{E}[R])$$
Here $R$ means the return in one period. If you gain 10%, then $R = 0.10$. If you lose 20%, then $R = -0.20$. The expression $1+R$ is your wealth factor, the number your money gets multiplied by over that period.
The left side is the average log-growth rate, which is what determines long-run compounding. The right side is the log of one plus the average return. The inequality is strict whenever returns vary. If returns were perfectly constant, the two sides would be the same. Variability is what creates the gap.
That distinction matters. The Jensen gap is the general geometric fact. Variance drag is the second-order approximation to that fact when returns are not too large.
To estimate the size of the gap, expand $\ln(1+R)$ in a Taylor series around the mean return $\mu$:
$$\ln(1+R) \approx \ln(1+\mu) + \frac{R-\mu}{1+\mu} - \frac{(R-\mu)^2}{2(1+\mu)^2} + …$$
Now take expectations term by term. Because $\mu = \mathbb{E}[R]$, we have $\mathbb{E}[R-\mu] = 0$, so the linear term vanishes. And because $\sigma^2 = \mathbb{E}[(R-\mu)^2]$, the quadratic term becomes the variance. That gives:
$$\mathbb{E}[\ln(1+R)] \approx \ln(1+\mu) - \frac{\sigma^2}{2(1+\mu)^2}$$
For small returns, and therefore for $|\mu| \ll 1$, we can use $\ln(1+\mu) \approx \mu$ and $(1+\mu)^2 \approx 1$. Then this simplifies to:
$$G \approx \mu - \frac{\sigma^2}{2}$$
Here $\mu$ is the arithmetic mean return, $\sigma^2$ is the variance of returns, and $G$ is the approximate geometric growth rate. So the geometric growth rate is approximately equal to the arithmetic mean $\mu$ minus half the variance. Variance drag is literally the second-order term in the Taylor expansion of a concave function.
This is Jensen’s inequality showing up in a form that is easy to compute. Because the logarithm is concave, variance lowers expected log-return. A portfolio with $\mu = 10\%$ and $\sigma = 20\%$ has geometric growth of approximately $8\%$, which means a 2% annual drag from volatility alone.
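As a sanity check on the approximation, a short simulation can compare $\mu - \sigma^2/2$ against a Monte Carlo estimate of $\mathbb{E}[\ln(1+R)]$. A sketch, assuming normally distributed returns with the illustrative $\mu = 10\%$, $\sigma = 20\%$ from above:

```python
import math
import random

random.seed(0)

mu, sigma = 0.10, 0.20            # illustrative annual mean and volatility

# Second-order approximation: the variance-drag formula.
g_approx = mu - sigma**2 / 2      # 0.08

# Monte Carlo estimate of E[ln(1+R)] for normally distributed returns.
n = 200_000
g_mc = sum(math.log1p(random.gauss(mu, sigma)) for _ in range(n)) / n

print(f"mu - sigma^2/2:       {g_approx:.4f}")
print(f"simulated E[ln(1+R)]: {g_mc:.4f}")   # close to, slightly below, 0.08
```

The simulated value sits slightly below the second-order formula because the expansion drops higher-order terms, a gap that widens as tails get fatter.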
#V. Kelly’s Criterion: Optimization Under Concavity
Long before Kelly, Daniel Bernoulli had already moved in the right direction. In 1738, in his analysis of the St. Petersburg paradox, he proposed logarithmic utility. He did not frame it in terms of time averages or ergodicity, but he correctly saw that the value of money is not linear and that multiplicative risk changes the problem.
Claude Shannon is not just background here. His work is foundational. In 1948, Shannon created information theory, a mathematical framework for reasoning about signal, noise, and transmission. The deep idea is that information can be measured, that uncertainty has structure, and that better information changes what an optimal repeated decision looks like. That is the intellectual foundation Kelly stands on.
In Shannon’s framework, information has operational value because it improves your ability to act under uncertainty. Kelly’s central insight was to take that logic and apply it to betting and capital allocation. If information improves the quality of your edge, then there is an optimal way to convert that edge into compounded wealth growth.
In 1956, John Larry Kelly Jr., a researcher at Bell Labs, derived the optimal betting strategy for a gambler with a private wire giving him noisy information about horse races. His derivation came directly from Shannon’s information theory, the mathematics of signal transmission through noisy channels.
Kelly maximized the expected logarithm of wealth because wealth compounds multiplicatively, and the logarithm converts this into additive growth rates that can be averaged over time.
The Kelly criterion maximizes:
$$G = \mathbb{E}[\ln(W_t/W_{t-1})]$$
Here $W_t$ means your wealth at time $t$. So $W_t/W_{t-1}$ is simply “how much your wealth changed this period,” and the logarithm turns that multiplicative change into something you can add across time.
For a simple bet with probability $p$ of winning, odds $b$, and probability $q=1-p$ of losing, the optimal fraction $f^*$ of wealth to bet is:
$$f^* = \frac{pb - q}{b} = \frac{p(b+1) - 1}{b}$$
For continuous returns, where $\mu$ is the expected return, $r$ is the risk-free rate, and $\sigma^2$ is the variance, the Kelly fraction becomes:
$$f^* = \frac{\mu - r}{\sigma^2}$$
Notice what appears in the denominator: variance. Kelly’s formula explicitly shows that optimal position size depends on the ratio of edge to variance. A strategy with twice the edge but twice the variance gets the same allocation as the original. Variance is not just a measure of “risk.” It is a direct divisor of optimal exposure.
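Both formulas are easy to check numerically. A minimal sketch with hypothetical inputs (the function names are mine, not from any library):

```python
def kelly_discrete(p: float, b: float) -> float:
    """Optimal bet fraction: win probability p, net odds b, stake lost otherwise."""
    q = 1.0 - p
    return (p * b - q) / b

def kelly_continuous(mu: float, r: float, sigma: float) -> float:
    """Optimal exposure for continuous returns: edge divided by variance."""
    return (mu - r) / sigma**2

# A 60% coin at even odds (b = 1): bet 20% of wealth.
print(f"discrete:   f* = {kelly_discrete(0.60, 1.0):.2f}")

# 8% expected return, 2% risk-free, 20% vol: f* = 0.06 / 0.04 = 1.5.
print(f"continuous: f* = {kelly_continuous(0.08, 0.02, 0.20):.2f}")

# Doubling both the edge and the variance leaves the allocation unchanged.
f1 = kelly_continuous(0.08, 0.02, 0.20)
f2 = kelly_continuous(0.14, 0.02, 0.20 * 2**0.5)   # edge 0.12, variance 0.08
print(f"same allocation: {abs(f1 - f2) < 1e-9}")
```

The last comparison makes the text's point explicit: variance divides the edge, so scaling both together changes nothing.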
Kelly betting grows wealth faster than any essentially different strategy in the long run: it outperforms them with probability approaching 1 as time goes to infinity.
Edward O. Thorp was the person who carried this from theory into practice. He used Kelly-style reasoning first in blackjack and then in markets, showing that log-optimal sizing was not just an elegant theorem. It was a workable decision rule under uncertainty.
Leo Breiman gave the result one of its clearest mathematical statements. In 1961, he showed that the log-optimal strategy asymptotically dominates alternative strategies under broad conditions. Kelly gave the rule. Thorp made it operational. Breiman helped make the long-run claim precise.
#Fractional Kelly
Practitioners rarely use full Kelly. The optimal fraction maximizes growth rate but produces extreme volatility, and drawdowns of 50% or more are common. Instead, they use half-Kelly or quarter-Kelly:
$$f_{half} = \frac{f^*}{2}$$
Under the standard local approximation around the Kelly optimum, this reduces growth rate by about 25% but cuts volatility in half. It also provides a safety margin against estimation error. If your estimate of $\mu$ or $\sigma$ is wrong, full Kelly can be catastrophic. Fractional Kelly is the recognition that maximizing geometric growth is the goal, but estimation uncertainty requires humility.
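The half-Kelly tradeoff can be read off the local quadratic approximation to the growth rate, $g(f) \approx f \cdot \text{edge} - f^2\sigma^2/2$. A sketch with illustrative numbers:

```python
# Local quadratic approximation to excess growth: g(f) = f*edge - f^2*sigma^2/2.
# Illustrative inputs: 6% edge over the risk-free rate, 20% volatility.
edge, sigma = 0.06, 0.20

f_star = edge / sigma**2          # full Kelly: 1.5

def growth(f):
    """Approximate excess growth rate at exposure fraction f."""
    return f * edge - f**2 * sigma**2 / 2

g_full = growth(f_star)
g_half = growth(f_star / 2)

print(f"full Kelly: f = {f_star:.2f}, growth = {g_full:.4f}, vol = {f_star * sigma:.2f}")
print(f"half Kelly: f = {f_star / 2:.2f}, growth = {g_half:.4f}, vol = {f_star * sigma / 2:.2f}")
print(f"growth retained at half Kelly: {g_half / g_full:.0%}")   # 75%
```

Half the exposure keeps 75% of the approximate growth with half the volatility, which is exactly the 25% reduction the text describes.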
#VI. Ergodicity Economics: Time vs. Ensemble
In 2019, physicist Ole Peters published a paper in Nature Physics that should have ended 250 years of economic confusion. Peters is one of the central modern figures in this article’s argument because he does not merely add another risk model. He redefines the objective. His broader research program, including earlier work with Alexander Adamou, showed that expected utility theory, the foundation of modern economics, rests on the implicit assumption of ergodicity.
To understand the force of that critique, it helps to name the benchmark. In 1944, John von Neumann and Oskar Morgenstern formalized expected utility theory in Theory of Games and Economic Behavior. Their framework asks which action maximizes average utility across possible states of the world. It became the dominant mathematical language of rational choice. The Peters critique is not aimed at a vague intuition. It is aimed at this formal benchmark.
A process is ergodic if the time average equals the ensemble average. If 100 people flip a coin once, the average outcome equals one person flipping a coin 100 times. In ergodic systems, you can replace time with parallel trials.
Wealth growth is not ergodic. If 100 people each bet their entire net worth on a fair coin flip, about 50 will be wiped out. If one person bets their entire net worth 100 times, they will be wiped out with certainty. The ensemble average (50% survive) looks fine. The time average (certain ruin) is catastrophic.
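The same ensemble-versus-time gap shows up in a Peters-style multiplicative gamble (the +50%/−40% coin often used in the ergodicity-economics literature). A minimal simulation with illustrative parameters:

```python
import math
import random

random.seed(1)

# Peters-style gamble (illustrative): +50% or -40% with equal probability.
up, down = 1.5, 0.6

# Ensemble view: the expected one-period wealth factor looks attractive.
ensemble_factor = (up + down) / 2                    # 1.05

# Time view: per-period growth along one path is the mean log factor.
time_growth = (math.log(up) + math.log(down)) / 2    # negative

print(f"ensemble mean factor per period: {ensemble_factor:.3f}")
print(f"time-average growth per period:  {time_growth:+.4f}")

# One long realized path: track log-wealth to avoid numerical underflow.
log_w = 0.0
for _ in range(10_000):
    log_w += math.log(up) if random.random() < 0.5 else math.log(down)
print(f"log-wealth after 10,000 periods: {log_w:.1f}")   # deeply negative
```

The ensemble mean grows 5% per period, yet any single path almost surely decays. Time averages and ensemble averages disagree because the process is not ergodic.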
This is where the distinction between ensemble optimality and pathwise optimality becomes decisive. A strategy can look optimal when you average across many parallel worlds, yet still be disastrous for one person living through one realized sequence of outcomes.
Peters’ insight, sharpened in related work with Adamou, is that economists since Bernoulli (1738) have maximized expected utility, an ensemble average across parallel states of the world. But individual investors experience time averages. They cannot access parallel universes where they both survived and went bankrupt.
Adamou’s contribution matters here because the joint Peters-Adamou work did not just restate the ergodicity objection in abstract terms. It used specific paradoxes, especially the St. Petersburg paradox, to show how changing the time resolution of the problem changes what counts as a rational decision. That made the time-average interpretation concrete rather than merely philosophical.
The correction is simple: maximize the time-average growth rate. This is exactly Kelly’s criterion. The logarithmic utility that Daniel Bernoulli invented to solve the St. Petersburg paradox in 1738 was correct, but economists forgot why: it emerges naturally from the non-ergodicity of multiplicative growth.
This also clarifies the limit of mean-variance thinking. Markowitz’s framework is useful as a first approximation, but it treats risk as a tradeoff between average return and dispersion in a single period. It does not, by itself, encode the asymmetry of compounding through time or the special importance of ruin.
Paul Samuelson spent years arguing against overextending Kelly logic. His objections were not trivial; they forced the distinction between maximizing expected utility and maximizing long-run growth into the open. Even if one ultimately sides with Kelly and Peters for multiplicative wealth, Samuelson is part of the reason the debate became intellectually sharp.
#VII. Absorbing Barriers and the River
Taleb’s river analogy makes the mathematics visceral:
Never cross a river that is on average four feet deep.
If the river is eight feet deep in the middle and dry on the sides:
- Arithmetic mean: $(0 + 8)/2 = 4$ feet, which seems safe
- Actual constraint: if you are shorter than eight feet, the deep section still kills you
The river’s average depth is irrelevant. What matters is whether any point along the path is deep enough to kill you. The same logic applies to multiplicative wealth: a single ruinous outcome matters more than an average taken across hypothetical parallel paths.
Mathematically, zero is an absorbing barrier. If wealth hits zero, the process stops. You cannot recover. The arithmetic mean ignores this because it averages across paths where you survived and paths where you died. The geometric mean, via the logarithm, assigns infinite negative utility to zero: $\ln(0) = -\infty$.
This is why the Kelly criterion never bets the entire bankroll on any positive-expected-value bet, no matter how good the odds. The logarithm’s concavity makes ruin infinitely worse than any potential gain can compensate for.
#VIII. Fat Tails and Higher Moments
The variance-drag formula truncates the Taylor expansion of $\ln(1+R)$ after the second moment, which is safe only when higher moments are negligible, roughly, when returns are thin-tailed. But financial returns live in Extremistan (Taleb’s term for domains governed by fat-tailed distributions), where extreme events are far more likely than the normal distribution predicts.
Benoit Mandelbrot is the foundational figure here. Long before Taleb, Mandelbrot argued that speculative prices do not behave like neat Gaussian variables. They jump, cluster, and produce extreme moves far more often than classical models would suggest. Once that is true, the simple variance-based approximation is no longer enough. The tails start to dominate the economics.
Jean-Philippe Bouchaud pushed this critique further by arguing that the standard equilibrium picture of markets is too clean. His distinct contribution is to connect fat tails to market microstructure and crowd dynamics. Prices are shaped by crowding, feedback, market impact, and institutional structure, not just by tidy distributions around a stable mean. That matters because it means tail risk is not merely a statistical annoyance. It is built into how markets actually work.
Didier Sornette adds another important layer. His distinct contribution is to model bubbles and crashes as endogenous critical phenomena generated by positive feedback, imitation, and unstable market structure. In his framework, some of the biggest crashes are not random bolts from the blue. They are the natural end point of a system that has become reflexive and fragile.
When returns are fat-tailed, higher moments matter. The cleanest way to see that is to expand the logarithm directly around $R=0$. For $|R|<1$,
$$\ln(1+R) = R - \frac{R^2}{2} + \frac{R^3}{3} - \frac{R^4}{4} + …$$
Taking expectations gives
$$\mathbb{E}[\ln(1+R)] = \mathbb{E}[R] - \frac{1}{2}\mathbb{E}[R^2] + \frac{1}{3}\mathbb{E}[R^3] - \frac{1}{4}\mathbb{E}[R^4] + …$$
This version is more explicit than the shorthand mean-variance formula. The second moment enters with a negative sign, so dispersion hurts growth. The third moment enters with a positive sign, so positive skew helps and negative skew hurts. The fourth raw moment also enters with a negative sign, which means that large extreme moves, whether positive or negative, reduce expected log-growth unless they are offset elsewhere in the distribution. In practice, that means the simple mean-and-variance picture stops being enough once extreme moves become common.
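To see the moment expansion at work, take a hypothetical negatively skewed sample (many small gains, a few large losses) and compare the four-moment truncation with the directly computed expected log-growth. All numbers are illustrative:

```python
import math

# Hypothetical negatively skewed sample: 95 small gains, 5 large losses.
returns = [0.02] * 95 + [-0.30] * 5

def mean(xs):
    return sum(xs) / len(xs)

# Direct expected log-growth over the sample.
direct = mean([math.log1p(r) for r in returns])

# Four-moment truncation: E[R] - E[R^2]/2 + E[R^3]/3 - E[R^4]/4.
m1, m2, m3, m4 = (mean([r**k for r in returns]) for k in (1, 2, 3, 4))
series = m1 - m2 / 2 + m3 / 3 - m4 / 4

print(f"direct E[ln(1+R)]:      {direct:+.5f}")
print(f"four-moment truncation: {series:+.5f}")
print(f"mean-variance only:     {m1 - m2 / 2:+.5f}")   # misses the skew penalty
```

The mean-variance truncation overstates growth here because the third raw moment is negative; the skew and kurtosis terms pull the estimate back toward the true value.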
Standard option pricing models (Black-Scholes) assume log-normal returns with thin tails. In practice, traders partially correct for this with volatility smiles and skews, but any framework that stays too close to a thin-tailed world will still understate the probability of extreme moves. The exact crash frequency matters less than the broader implication: left-tail events occur materially more often than a naive Gaussian calibration suggests.
Taleb’s barbell strategy is the practical philosophical response to this entire section. His central point is not just that tails are fatter than standard models admit. It is that the right response to a fat-tailed world is to organize a portfolio so that ordinary outcomes are survivable and extraordinary dislocations become opportunities rather than existential threats. A barbell does exactly that: most of the capital sits in positions that are robust to ordinary noise, while a small allocation buys extreme convexity.
That structure matters because it changes the shape of the return distribution itself. The puts have bounded downside (premium paid) and very large upside in crashes. That creates positive skewness, limits exposure to ruinous left-tail states, and preserves the possibility of large gains when the system breaks. In Jensen-Kelly terms, Taleb’s contribution is to insist that the objective is not to maximize a smooth average in a well-behaved world. It is to survive and compound in a discontinuous one.
Hyman Minsky belongs in this picture too. His core idea was that stability breeds fragility. Long calm periods encourage leverage, maturity mismatch, and crowded positioning, which makes the eventual break far more violent. That is exactly the kind of environment where average outcomes look benign right up until the left tail arrives. Mandelbrot, Bouchaud, Sornette, and Minsky all push in the same direction: the left tail is part of the structure of the world itself, not a small correction to a calm baseline.
#IX. The Put Strategy as Jensen-Optimal
Spitznagel’s tail hedge, 100% SPY plus deep OTM puts, is where the abstract logic becomes a concrete portfolio. He is not simply buying disaster insurance in the conventional sense. He is taking the full chain, compounding, concavity, variance drag, fat tails, and non-ergodicity, and turning it into a specific capital-allocation rule.
That is why Spitznagel is central to this argument rather than an optional practitioner example. Many investors understand the words “fat tails” and still build portfolios as though crashes are just unpleasant drawdowns around a stable mean. Spitznagel’s contribution is to force the implication all the way through: if the left tail dominates long-run outcomes, then convexity is not a cosmetic add-on. It belongs inside the core architecture of the portfolio.
Even in the simplest second-order approximation, the mechanism is clear. Recall the approximate growth formula:
$$G \approx \mu - \frac{\sigma^2}{2}$$
In this approximation, the comparison is explicit: the hedge is beneficial if the reduction in the variance term, $\sigma^2/2$, is larger than the reduction in the mean term, $\mu$. But that is only the local mean-variance description. The deeper mechanism is that the hedge truncates the left tail, which protects compounding from the states that do the most long-run damage. In symbols:
- The cost: Puts have negative expected return (they expire worthless ~95% of the time). This reduces $\mu$ by the premium paid, say 0.5% annually.
- The benefit: In a crash, puts can pay off many multiples of premium, truncating the left tail. In a local approximation, that often reduces the effective variance term. More importantly, it removes the states that are most destructive to future compounding.
Because the variance drag is quadratic in the local approximation, a modest reduction in tail risk can outweigh a linear premium cost. In the more realistic fat-tailed setting, the stronger statement is that sacrificing some carry can be worthwhile if it materially reduces exposure to ruinous states. Whether the trade is favorable depends on the price of convexity, the strike selection, the roll discipline, and how persistently the market underprices left-tail risk.
This is the geometry of the concave log function at work. The put strategy engineers a return distribution with less destructive left-tail exposure. In a local approximation, you can describe that as reducing the variance penalty more than the mean. In the fuller compounding picture, the more important fact is that the hedge protects you from the states that do disproportionate damage to long-run growth.
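A two-state toy model makes the tradeoff concrete. The numbers below are purely illustrative assumptions (crash probability, premium, and payoff are invented for the sketch), not calibrated to any real hedge:

```python
import math

# Two-state world, all numbers invented for illustration:
# a normal year with probability 0.98, a crash year with probability 0.02.
p_crash = 0.02

# Unhedged: +8% in normal years, -60% in a crash.
unhedged = {0.08: 1 - p_crash, -0.60: p_crash}

# Hedged: pay ~1% of premium per year; in a crash the puts cap the loss at -25%.
hedged = {0.07: 1 - p_crash, -0.25: p_crash}

def arith_mean(dist):
    """Arithmetic expected return."""
    return sum(r * p for r, p in dist.items())

def log_growth(dist):
    """Expected log-growth, the quantity that governs compounding."""
    return sum(math.log1p(r) * p for r, p in dist.items())

for name, dist in [("unhedged", unhedged), ("hedged", hedged)]:
    print(f"{name:9s} arithmetic mean {arith_mean(dist):+.4f}   "
          f"log-growth {log_growth(dist):+.4f}")

# The hedge lowers the arithmetic mean yet raises the compound growth rate:
# exactly the Jensen gap at work.
```

Under these assumed numbers, the hedged portfolio has a lower arithmetic mean but a higher expected log-growth: giving up carry to truncate the left tail is a net gain for compounding.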
This is also where Taleb and Spitznagel meet. Taleb supplies the philosophical and statistical doctrine: avoid ruin, respect discontinuities, seek convexity. Spitznagel supplies the portfolio expression of that doctrine: keep the growth engine, then add a small convex structure that changes what happens in the worst states. The combination is what makes tail hedging more than a fear trade. It becomes a compounding strategy.
#X. Beyond Equities: Where Convexity is Cheaper
The SPY put strategy works, but it may not be optimal. The same Jensen-Kelly logic applies wherever there is reliable asymmetry between calm and crisis:
Rates: Central banks have a reaction function: they cut aggressively in crises. The Fed cut roughly 500bps across 2007-2008 and 150bps in two weeks during COVID. Rate options may underprice these panic-cut scenarios because models assume mean-reversion to stable levels. The trade: earn the risk-free rate, buy OTM calls on SOFR futures.
FX Carry: Currencies like AUD/JPY offer 3-5% annual carry. In stable times, you collect the differential. When risk-off hits, carry trades unwind violently. AUD/JPY dropped 40% in weeks during 2008. The carry itself can fund OTM puts on the high-yielder, creating a self-financing hedge.
Credit: Investment-grade bonds earn spread, but credit events cluster. CDS protection pays off exponentially when spreads blow out (IG: 50→250bps, HY: 300→2000bps in 2008). The barbell here is: earn IG spread, buy HY CDS protection.
Commodities: Oil exhibits extreme asymmetry. It grinds in a $60-80 range, then spikes to $140 (supply shock) or crashes to $20 (demand shock). Deep OTM strangles on crude capture both tails.
The Universa insight is to scan across these markets for wherever tail convexity is cheapest right now. Sometimes that’s equity vol (2007), sometimes credit (2006), sometimes rates (2019).
#XI. How This Fits the Series
This article is the theoretical capstone of the previous three pieces in the Leptokurtic series. It is the point where the series stops presenting separate facts and starts presenting one unified structure.
The first article, Twenty Centuries of Financial Data: What 240 Countries and 2,000 Years Reveal, used the forex-centuries repository to show that fat tails, devaluations, regime shifts, and currency breakages are not modern anomalies. They are the historical baseline. That article established the empirical backdrop: the world is structurally more discontinuous than Gaussian finance admits.
The second article, Detecting Crashes with Fat-Tail Statistics, used the fatcrash repository to test methods drawn from Sornette, Bouchaud, Taleb, and extreme value theory. That article moved from long-run historical evidence to live statistical diagnostics. It showed that crashes are not random noise around a stable mean. They often have detectable precursors, structural signatures, and tail behavior that standard tools miss.
The third article, The Tail Hedge Debate: Spitznagel Is Right, AQR Is Answering the Wrong Question, used the options_backtester repository to test the put-overlay debate directly on real SPY options data. That article supplied the portfolio evidence: deep out-of-the-money convexity can improve the realized path of a portfolio when it is funded and sized in the way Spitznagel actually describes.
This article explains why those three results belong together. The long-run historical data, the crash-detection toolkit, and the options backtests all point at the same underlying structure: wealth compounds multiplicatively, the logarithm is concave, and variability is not a cosmetic annoyance. It changes the geometry of the process. The previous three pieces supplied the data, the diagnostics, and the implementation. This piece supplies the unifying mathematics, and makes explicit why they are all consequences of the same underlying logic.
#XII. The Full Intellectual Lineage
What looks like a single modern investing idea is actually a chain in which each figure adds one missing piece, or corrects one mistake, left by the previous framework:
- Napier (1614): Napier gives the first indispensable tool. Logarithms linearize multiplicative processes, so $\log(ab) = \log(a) + \log(b)$. Without that move, there is no clean way to turn compounding into something that can be analyzed additively.
- Jacob Bernoulli (1685): Bernoulli adds the natural growth constant $e$. That connects Napier’s logarithmic tool to continuous compounding. Napier gives the language. Bernoulli gives the natural base for that language.
- Daniel Bernoulli (1738): Daniel Bernoulli is the first major bridge from pure mathematics to decision theory. He takes the logarithm and applies it to risky choice, arguing that multiplicative risk changes rational behavior. He does not yet have Kelly or ergodicity, but he points in their direction.
- Jensen (1906): Jensen supplies the missing geometric theorem. If the relevant function is concave, variability is penalized. That turns Daniel Bernoulli’s logarithmic intuition into a general structural fact: once wealth is evaluated through a concave function, randomness has a systematic cost.
- von Neumann and Morgenstern (1944): They formalize expected utility as the dominant benchmark for rational choice. This is the framework that later thinkers will refine, challenge, or partially reject. Their role is not to solve the compounding problem. Their role is to define the benchmark that Peters will later criticize.
- Shannon (1948): Shannon makes uncertainty operational. Information is measurable, noise has structure, and better signals change what an optimal repeated decision looks like. This is the mathematical foundation that Kelly later turns into a capital-allocation rule.
- Markowitz (1952): Markowitz gives finance a tractable one-period approximation through mean-variance analysis. That is a real advance, but it is also a simplification. He makes portfolio choice practical, while leaving compounding, path dependence, and ruin underemphasized.
- Kelly (1956): Kelly takes Shannon’s information-theoretic framework and translates it into repeated betting and investment. He shows how an edge should be converted into position size when the objective is long-run compound growth. This is where logarithms, information, and compounding become one explicit rule.
- Thorp and Breiman (1961 onward): Thorp shows Kelly can be used in practice, and Breiman gives the long-run dominance result mathematical force.
- Mandelbrot (1963): Mandelbrot challenges the statistical comfort behind standard finance. Returns are not well described by thin-tailed Gaussian assumptions. Once that is true, simple mean-variance reasoning becomes less reliable, and the left tail matters much more.
- Samuelson (1969): Samuelson is the critic who forces the distinction between expected utility and long-run growth to be stated clearly.
- Minsky (1986): Minsky adds the macro-financial mechanism: stability breeds fragility, so left-tail risk is generated by the system itself.
- Bouchaud (2003 to 2008): Bouchaud links fat tails to market microstructure, feedback, and crowd behavior.
- Sornette (2003): Sornette models bubbles and crashes as endogenous critical phenomena rather than exogenous shocks.
- Taleb (2007 to 2012): Taleb is one of the true centers of gravity in the modern part of this story. He takes the statistical critique of fat tails and turns it into a doctrine of survival. Ruin, fragility, convexity, and asymmetry stop being technical side notes and become the core portfolio problem. Mandelbrot tells you the tails are fatter than you think. Taleb tells you that once you accept that fact, the whole logic of risk-taking has to change.
- Peters and Adamou (2011 to 2019): Peters and Adamou are the other true center of gravity in the modern part of the argument. They reopen the foundations of decision theory by showing that non-ergodic multiplicative processes must be evaluated along time paths, not across hypothetical ensembles. This reconnects Kelly to a deeper justification: it is not just a clever betting rule. It is the correct objective for a non-ergodic compounding process.
- Spitznagel (2021): Spitznagel is the implementation layer, and one of the central modern figures in the article’s thesis. He takes the whole chain, logarithms, concavity, Kelly sizing, fat tails, fragility, and non-ergodicity, and turns it into a practical portfolio architecture built around convex protection and survival through crashes. He is the point where the mathematics ceases to be interpretation and becomes an actual portfolio design.
Seen this way, the chain is continuous. Wealth compounds multiplicatively, not additively, so the basic object is a product of wealth factors through time. Logarithms turn that product into a sum, which makes long-run growth analyzable. Once the relevant function is concave, Jensen’s inequality tells you that variability lowers time-average growth relative to the arithmetic average. Kelly converts that geometry into a sizing rule: maximize expected log-growth, which means take as much exposure as your edge justifies but no more than your variance can support. Peters and Adamou then show why this is not just a preference for one utility function. In a non-ergodic compounding process, time-average growth is the mathematically relevant objective because a single investor lives through one realized path, not across many parallel worlds. That makes survival a structural requirement rather than a matter of taste. Tail hedging is the portfolio implementation of that logic: accept a small recurring cost in ordinary states to reduce exposure to the rare left-tail states that do the most damage to long-run compounding.
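The Kelly step in that chain can be sketched numerically. A minimal illustration, assuming a repeated even-odds bet won with probability p = 0.6 (the numbers are purely illustrative, not from the article): the growth-optimal stake is f* = 2p − 1, expected log-growth peaks there, and betting well beyond it turns growth negative even though the bet itself stays favorable.

```python
import math

def expected_log_growth(f, p=0.6):
    """Expected log-growth per round when staking fraction f of wealth
    on an even-odds bet won with probability p."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

p = 0.6
f_star = 2 * p - 1  # Kelly fraction for an even-odds bet: 0.2

# Growth peaks at f* and over-betting destroys it:
for f in (0.1, 0.2, 0.4, 0.8):
    print(f"f={f:.1f}  g={expected_log_growth(f, p):+.4f}")
# f=0.1  g=+0.0150
# f=0.2  g=+0.0201   <- maximum, at f* = 0.2
# f=0.4  g=-0.0024   <- a favorable bet, ruinously over-sized
# f=0.8  g=-0.2911
```

Note that the bet has positive edge at every stake shown; only the sizing changes. That is the sense in which "no more than your variance can support" is a hard constraint rather than a preference.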
#Conclusion: The Geometry of Survival
Jensen’s inequality is not just a mathematical curiosity. It is the geometry of survival in a multiplicative world. The concavity of the logarithm means that volatility is not merely unpleasant. It is geometrically destructive: a quadratic drag on growth whose cost itself compounds over time.
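The quadratic drag can be seen in a two-period toy example (illustrative numbers): a +50%/−50% sequence has the same arithmetic mean as a flat 0%/0% sequence, yet it destroys a quarter of the capital, roughly as the second-order approximation g ≈ μ − σ²/2 predicts.

```python
import math

# Two return sequences with the same arithmetic mean (0%):
steady = [0.0, 0.0]      # 0%, 0%
volatile = [0.5, -0.5]   # +50%, -50%

def compound(returns):
    """Terminal wealth from 1.0 after applying each period's return."""
    w = 1.0
    for r in returns:
        w *= 1 + r
    return w

print(compound(steady))    # 1.00 -- capital preserved
print(compound(volatile))  # 0.75 -- 25% lost despite a 0% average return

# Second-order approximation of the per-period log-growth: g ~ mu - sigma^2/2
mu = sum(volatile) / len(volatile)                 # 0.0
sigma2 = sum(r * r for r in volatile) / len(volatile)  # 0.25
print(mu - sigma2 / 2)                             # -0.125
print(math.log(compound(volatile)) / 2)            # exact: ~ -0.144
```

The approximation understates the damage here because ±50% moves are large; for small returns the two numbers converge, which is why the variance penalty is called quadratic.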
The 250-year mistake in economics was maximizing the wrong average. The arithmetic mean looks at what happens across a population of investors in parallel. The geometric mean looks at what happens to you through time. For a single investor compounding over decades, only the time average matters.
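The gap between the two averages is starkest in the multiplicative coin flip used by Peters: each flip multiplies wealth by 1.5 (heads) or 0.6 (tails). The ensemble average grows 5% per flip, while any single long path shrinks about 5% per flip.

```python
import math

up, down = 1.5, 0.6  # Peters-style coin flip: +50% or -40% per flip

# Ensemble (arithmetic) average growth factor per flip,
# i.e. what a large population of investors earns on average:
ensemble = (up + down) / 2       # 1.05 -> +5% per flip

# Time (geometric) average growth factor per flip,
# i.e. what one investor experiences along a long path
# where heads and tails come up in equal proportion:
time_avg = math.sqrt(up * down)  # sqrt(0.9) ~ 0.949 -> about -5% per flip

print(ensemble, time_avg)

# After 100 flips (50 heads, 50 tails), a typical single path holds
# about half a percent of its starting wealth:
print(up**50 * down**50)
```

The population average rises because a few lucky paths explode; the typical path, the one you actually live through, decays. That is the non-ergodicity the conclusion is pointing at.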
Tail hedging works because it respects this geometry. It accepts a small, certain reduction in arithmetic return (the put premium) in exchange for protection against the states that do the most damage to compound growth. In a local approximation, that looks like paying to reduce a quadratic variance penalty. In the fuller fat-tailed picture, it is better understood as paying to reduce exposure to ruinous left-tail paths.
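That trade-off can be made concrete with a toy two-state world (the probabilities, returns, and premium below are invented for illustration, not calibrated to any market or to Spitznagel's actual strategy): the hedged portfolio gives up arithmetic return but compounds faster.

```python
import math

# Toy two-state world: 90% of years are "normal", 10% are crashes.
# Unhedged: +12% in normal years, -40% in a crash.
# Hedged: a ~3% put premium lowers normal years to +9%,
#         but the payoff caps a crash year at -15% net.
unhedged = {1.12: 0.9, 0.60: 0.1}  # {growth factor: probability}
hedged   = {1.09: 0.9, 0.85: 0.1}

def arith(states):
    """Arithmetic (ensemble) average return per year."""
    return sum((g - 1) * p for g, p in states.items())

def geom(states):
    """Geometric (time-average) growth rate per year."""
    return math.exp(sum(p * math.log(g) for g, p in states.items())) - 1

print(f"unhedged: arith {arith(unhedged):.3f}, geom {geom(unhedged):.3f}")
print(f"hedged:   arith {arith(hedged):.3f}, geom {geom(hedged):.3f}")
# The hedge gives up arithmetic return (6.6% vs 6.8%)
# yet compounds faster (6.3% vs 5.2%).
```

The premium is a certain cost in nine years out of ten; the payoff matters in exactly the state where the logarithm is steepest. That asymmetry is why the geometric ordering can flip even when the arithmetic ordering does not.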
The logarithm was invented to help astronomers multiply. Four hundred years later, it reveals why you should buy insurance on your stock portfolio, and why, in a world of compounding returns, survival comes before growth.
#References
- Bernoulli, D. (1738). “Specimen Theoriae Novae de Mensura Sortis.”
- Bernoulli, J. (1713). Ars Conjectandi (published posthumously).
- Bouchaud, J. P. (2008). “Economics Needs a Scientific Revolution.” Nature, 455.
- Bouchaud, J. P. & Potters, M. (2003). Theory of Financial Risk and Derivative Pricing. Cambridge University Press.
- Breiman, L. (1961). “Optimal Gambling Systems for Favorable Games.”
- Jensen, J. L. W. V. (1906). “Sur les fonctions convexes et les inégalités entre les valeurs moyennes.” Acta Mathematica, 30.
- Kelly, J. L. (1956). “A New Interpretation of Information Rate.” Bell System Technical Journal, 35(4).
- Mandelbrot, B. (1963). “The Variation of Certain Speculative Prices.” The Journal of Business, 36(4).
- Mandelbrot, B. & Hudson, R. L. (2004). The (Mis)Behavior of Markets. Basic Books.
- Markowitz, H. (1952). “Portfolio Selection.” The Journal of Finance, 7(1).
- Minsky, H. P. (1986). Stabilizing an Unstable Economy. Yale University Press.
- Napier, J. (1614). Mirifici Logarithmorum Canonis Descriptio.
- Peters, O. (2019). “The Ergodicity Problem in Economics.” Nature Physics, 15.
- Peters, O. & Adamou, A. (2011). “The Time Resolution of the St Petersburg Paradox.” Philosophical Transactions of the Royal Society A, 369(1956).
- Peters, O. & Gell-Mann, M. (2016). “Evaluating Gambles Using Dynamics.” Chaos, 26(2).
- Samuelson, P. A. (1969). “Lifetime Portfolio Selection by Dynamic Stochastic Programming.” The Review of Economics and Statistics, 51(3).
- Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27.
- Sornette, D. (2003). Why Stock Markets Crash. Princeton University Press.
- Sornette, D. (2017). Why Stock Markets Crash: Critical Events in Complex Financial Systems (updated edition). Princeton University Press.
- Spitznagel, M. (2021). Safe Haven: Investing for Financial Storms. Wiley.
- Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable. Random House.
- Taleb, N. N. (2012). Antifragile: Things That Gain from Disorder. Random House.
- Thorp, E. O. (1997). “The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market.”
- von Neumann, J. & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.