The Tail Hedge Debate: Spitznagel Is Right, AQR Is Answering the Wrong Question

Figure: The Storm on the Sea of Galilee, Rembrandt, 1633. A ship caught in a violent storm, with the disciples struggling against the wind and waves.

Stock markets crash. The S&P 500 price index fell about 57% from October 9, 2007 to March 9, 2009, and about 34% from February 19, 2020 to March 23, 2020.[1][2] A put option is a contract that pays you when the market falls below a certain price (the “strike”). If you hold stocks and also hold puts, the puts can offset some of your losses during a crash. The question is whether the cost of buying puts is worth the protection they provide.

There are two sides. AQR Capital Management published “Chasing Your Own Tail (Risk)” (Nielsen, Villalon, and Berger, 2011). They argue that buying puts systematically costs more than it saves. On the other side, Mark Spitznagel at Universa Investments, where Nassim Taleb is scientific advisor, argues that a small put allocation improves long-term returns (Spitznagel, 2021). Universa reported a 3,612% gain in March 2020 (via an investor letter, as reported by Bloomberg).[3]

We tested both claims with our open-source options backtester on 17 years of real SPY options data (2008 to 2025), covering three crashes: the 2008 financial crisis, COVID, and the 2022 bear market.

Spitznagel is right about the strategy he actually proposes. AQR’s published critique uses near-ATM puts and a no-leverage framing, neither of which matches the portfolio Spitznagel and Universa describe. When we test deep OTM puts, even the no-leverage framing beats SPY. Spitznagel’s externally funded overlay still wins by a wider margin.

#Why puts are expensive

To understand this debate, we need to start with how options are priced.

An option’s price depends heavily on implied volatility (IV): the market’s estimate of how much the stock price will move in the future. Higher expected movement means the option is worth more, because there’s a greater chance it will end up profitable.

In practice, implied volatility is consistently higher than what actually materializes (realized volatility). This gap is called the Variance Risk Premium (VRP):

$$\text{VRP} = \sigma^2_{\text{implied}} - \sigma^2_{\text{realized}}$$

Think of it this way: $\sigma^2_{\text{implied}}$ is what the market expects the variance to be. $\sigma^2_{\text{realized}}$ is what actually happens. The difference is the premium that option buyers pay over fair value.
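As a minimal sketch of the definition above, the snippet below computes the VRP from one implied volatility quote and a series of daily returns. The function names, the 252-day annualization, and the input numbers are all illustrative assumptions, not output from our backtester.

```python
import math

def realized_variance(daily_returns, trading_days=252):
    """Annualized sample variance of daily simple returns."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    sample_var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return sample_var * trading_days

def variance_risk_premium(implied_vol, daily_returns):
    """VRP = implied variance minus realized variance, both annualized."""
    return implied_vol ** 2 - realized_variance(daily_returns)

# Hypothetical inputs: 20% implied vol, daily returns alternating +1% / -1%
returns = [0.01, -0.01] * 10
realized_vol = math.sqrt(realized_variance(returns))
vrp = variance_risk_premium(0.20, returns)
print(f"realized vol: {realized_vol:.1%}, VRP: {vrp:.4f}")
```

A positive result means the option was priced for more variance than the market delivered over the window.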

Carr and Wu (2009) documented that this spread is persistently positive. Put buyers pay more than fair value on average. The reason is that investors are willing to overpay for crash protection, the same way homeowners overpay for fire insurance relative to the expected loss from fire. Bollerslev, Tauchen, and Zhou (2009) went further: they showed that the VRP is not just a cost. It predicts future stock returns. When the gap between implied and realized variance is wide, future equity returns tend to be higher. The same force that makes puts expensive (fear of crashes) also drives the equity premium that the stock portion of the portfolio earns.

Israelov (2019) confirmed the negative average return of puts and titled his paper “Pathetic Protection: The Elusive Benefits of Protective Puts.” The CBOE S&P 500 Put Protection Index (PPUT) formalizes this as a benchmark: it holds the S&P 500 and buys monthly at-the-money (ATM) puts. It has underperformed the unhedged index over most periods. But ATM puts are the most expensive possible hedge: they have the highest theta decay and the lowest convexity. Testing ATM puts and concluding “puts don’t work” is like testing a sedan on a racetrack and concluding “cars are slow.” The deep OTM puts Spitznagel uses cost a fraction of ATM puts and have far more convexity.

AQR’s argument stops here. Puts lose money on average. Therefore they hurt portfolio performance.

This reasoning is incomplete. It looks only at the average return of the put (the first statistical moment, the mean). It ignores what the put does to the volatility of the portfolio (the second moment, the variance). Compounding depends on both.

#How volatility destroys compounding

If you invest money and earn the same return every year, your wealth compounds smoothly. But if returns fluctuate, even with the same average, you end up with less. This is called variance drain, and it’s the key to understanding why Spitznagel’s strategy works.

The geometric (compound) growth rate of a portfolio is approximately (for returns that are small relative to 1, under lognormal assumptions):

$$G \approx \mu - \frac{\sigma^2}{2}$$

Here $\mu$ is the arithmetic mean return (the simple average of all yearly returns) and $\sigma$ is the standard deviation of those returns (a measure of how much they fluctuate). The term $\frac{\sigma^2}{2}$ is the variance drain: the penalty that volatility imposes on compounding.
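For readers who want to see where the approximation comes from, here is a short sketch. It assumes per-period returns $r$ are small enough for a second-order expansion of the logarithm, and it treats $G$ as the expected log growth rate:

$$G = \mathbb{E}[\ln(1+r)] \approx \mathbb{E}\left[r - \frac{r^2}{2}\right] = \mu - \frac{\sigma^2 + \mu^2}{2} \approx \mu - \frac{\sigma^2}{2}$$

The middle step uses $\mathbb{E}[r^2] = \sigma^2 + \mu^2$. The last step drops $\frac{\mu^2}{2}$, which is negligible when the mean return per period is small.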

A simple example shows why this happens. Start with 100 dollars. Gain 50% one year, lose 50% the next.

The arithmetic average return is $\frac{+50\% + (-50\%)}{2} = 0\%$. But you do not end up with 100 dollars. You end up with 75 dollars. You lost 25% despite an average return of zero. The gain and loss were symmetric in percentage terms, but the loss applied to a larger base (150 dollars), so it took away more than the gain added. That is variance drain.
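The same two-year example can be checked in a few lines. This is only a sketch: it uses the population standard deviation of the two returns for $\sigma$, which is our assumption about how to apply the formula to such a tiny sample.

```python
import statistics

returns = [0.50, -0.50]  # +50% one year, -50% the next

wealth = 100.0
for r in returns:
    wealth *= 1 + r  # compound year by year

arithmetic_mean = statistics.mean(returns)
geometric_rate = (wealth / 100.0) ** (1 / len(returns)) - 1
sigma = statistics.pstdev(returns)
approx_rate = arithmetic_mean - sigma ** 2 / 2  # G ~ mu - sigma^2 / 2

print(f"final wealth: {wealth:.2f}")
print(f"arithmetic mean: {arithmetic_mean:.1%}")
print(f"exact compound rate: {geometric_rate:.2%}/yr")
print(f"approximation: {approx_rate:.2%}/yr")
```

The exact compound rate is about -13.4%/yr and the approximation gives -12.5%/yr. The gap is expected, since 50% swings are far from "small relative to 1".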

The drain is quadratic in volatility, meaning it grows with the square of the fluctuations:

| Portfolio volatility ($\sigma$) | Variance drain ($\frac{\sigma^2}{2}$) |
| --- | --- |
| 10% | 0.5%/yr |
| 20% | 2.0%/yr |
| 40% | 8.0%/yr |

Doubling volatility quadruples the drain. This means that large drawdowns are disproportionately costly to long-run wealth. A single 50% crash costs more in compounding terms than ten 5% corrections, even though the percentages add up to the same total: ten consecutive 5% losses compound to roughly a 40% decline, not 50%.
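A small sketch reproduces the drain figures above and the crash-versus-corrections comparison. The helper name `variance_drain` is ours, introduced only for illustration.

```python
def variance_drain(sigma):
    """Approximate annual compounding penalty for volatility sigma."""
    return sigma ** 2 / 2

drains = {sigma: variance_drain(sigma) for sigma in (0.10, 0.20, 0.40)}
for sigma, drain in drains.items():
    print(f"vol {sigma:.0%}: drain {drain:.1%}/yr")

# One 50% crash versus ten consecutive 5% corrections
one_crash_loss = 1 - (1 - 0.50)
ten_corrections_loss = 1 - (1 - 0.05) ** 10
print(f"one 50% crash: {one_crash_loss:.1%} lost")
print(f"ten 5% corrections: {ten_corrections_loss:.1%} lost")
```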

On SPY (2008 to 2025):

Spitznagel’s thesis is about this second term. If puts reduce portfolio variance by cutting off the worst drawdowns, the reduction in $\frac{\sigma^2}{2}$ can exceed the premium paid. A put costs money on average (it hurts $\mu$, the first moment). But by truncating the worst losses, it reduces the quadratic drag on the portfolio (it helps $\frac{\sigma^2}{2}$, the second moment). The net effect on compound growth $G$ can be positive because the variance drain grows with the square of the loss. Preventing a few large drawdowns saves more in compounding terms than the cumulative premium costs.

#Fat tails and put mispricing

Taleb makes a related but distinct argument in The Black Swan and Statistical Consequences of Fat Tails.

Standard option pricing models (like Black-Scholes) assume returns follow something close to a normal (Gaussian) distribution. In a normal distribution, events far from the average are extraordinarily rare. A crash on the scale of 2008 or 2020 sits far enough into the tails that Gaussian models treat it as effectively negligible for ordinary portfolio construction.

In reality, over the 2008 to 2025 sample we study here, SPY experienced multiple drawdowns of roughly 30% or worse, including 2008-09 and 2020, with 2022 close behind. Real markets have fat tails: extreme events are far more frequent than Gaussian models predict. The probability of a large crash is not astronomically small. It is orders of magnitude higher than a thin-tailed model would suggest.
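To make the thin-tailed baseline concrete, here is a rough sketch of how rare a large one-day drop would be if daily returns really were Gaussian. It assumes 20% annualized volatility (the SPY figure from our sample), 252 trading days, zero mean, and a hypothetical -7% day chosen purely for illustration.

```python
import math

def normal_tail_probability(k_sigma):
    """P(Z <= -k_sigma) for a standard normal variable Z."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

ANNUAL_VOL = 0.20    # annualized volatility assumed for the baseline
TRADING_DAYS = 252   # assumed trading days per year
DROP = 0.07          # hypothetical one-day decline

daily_vol = ANNUAL_VOL / math.sqrt(TRADING_DAYS)
k = DROP / daily_vol                 # size of the drop in daily standard deviations
p = normal_tail_probability(k)

print(f"daily vol: {daily_vol:.2%}, drop = {k:.1f} sigma")
print(f"Gaussian probability per day: {p:.2e}")
```

Under these assumptions the per-day probability is on the order of 1e-8, which is the sense in which a Gaussian model treats such moves as effectively impossible.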

This has a direct consequence for put pricing. Option markets do price skew: deep OTM puts trade at higher implied volatility than ATM options, reflecting some awareness of tail risk. But even after skew is priced, deep OTM puts may still be cheap relative to the realized frequency of crashes. The VRP shows that puts are expensive relative to realized volatility in normal times. But the relevant comparison for deep OTM puts is not average realized volatility; it is the actual frequency and magnitude of extreme drawdowns. The observed frequency of large drawdowns in our sample is far higher than a Gaussian baseline would imply.

Taleb calls this the difference between Mediocristan (where Gaussian statistics work, like human height) and Extremistan (where they do not, like financial returns). The S&P 500 lives in Extremistan.

The variance drain argument (puts reduce $\frac{\sigma^2}{2}$) and the mispricing argument (puts are cheap relative to true tail probabilities) are independent. Either one alone could justify the strategy. Together they explain why the results are as strong as they are.

#Theoretical foundations

The variance drain argument and the fat-tail mispricing argument have deeper roots than the Spitznagel-AQR debate suggests.

Ole Peters and ergodicity economics. The variance drain formula $G \approx \mu - \frac{\sigma^2}{2}$ is a special case of a broader insight. Peters (2019) argues that classical expected-value reasoning fails for multiplicative processes like portfolio growth. The ensemble average (what happens across many parallel investors) diverges from the time average (what happens to one investor over many periods). For a single investor compounding over decades, the time average is what matters, and it is always lower than the ensemble average when returns fluctuate. Spitznagel’s strategy works because it improves the time-average growth rate, even though it reduces the ensemble-average return (by paying premium). Most of finance optimizes for the wrong average.
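The gap between the two averages shows up even in a toy coin-flip bet. The sketch below uses a hypothetical gamble (+50% on heads, -40% on tails, equal odds); the numbers are illustrative and not taken from our data.

```python
import math

UP, DOWN = 1.5, 0.6  # wealth multipliers for heads (+50%) and tails (-40%)

# Ensemble average: expected multiplier across many parallel investors
ensemble_factor = 0.5 * UP + 0.5 * DOWN

# Time average: per-round growth one investor experiences over many rounds
time_factor = math.exp(0.5 * math.log(UP) + 0.5 * math.log(DOWN))

print(f"ensemble average growth per round: {ensemble_factor - 1:+.1%}")
print(f"time average growth per round: {time_factor - 1:+.1%}")
```

The bet looks attractive in expectation (+5% per round) while a single investor who keeps playing loses about 5% per round.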

Bouchaud on fat tails and hedging. Bouchaud, Iori, and Sornette (1996) showed that in fat-tailed markets, Black-Scholes delta hedging leaves large residual risk. The standard model assumes continuous rebalancing in a Gaussian world; real markets have jumps and heavy tails that make perfect hedging impossible. This means option sellers bear more risk than their models suggest, and option buyers (like tail hedgers) get more protection than the models price in. This is the theoretical basis for why deep OTM puts may be systematically cheap relative to true tail risk.

Sornette on endogenous crashes. Sornette (2003) argues that large crashes are not exogenous shocks but endogenous instabilities, the result of self-reinforcing feedback loops (herding, leverage, procyclical risk management) that build up over months or years before releasing suddenly. His Log-Periodic Power Law Singularity (LPPLS) model attempts to detect these signatures. This is relevant to our macro-signal finding: standard indicators (VIX, yield curve, credit spreads) measure risk levels but not the endogenous buildup that precedes crashes. Sornette’s approach is structurally different (it looks for acceleration patterns in price itself), though its real-time track record remains debated.

Rare disaster models. Barro (2006) formalized the idea that the equity premium itself may be compensation for rare catastrophic events. If investors demand higher average returns because crashes happen, then the equity premium and the tail-hedge premium are two sides of the same coin. Kelly and Jiang (2014) showed that time-varying tail risk is priced in equity cross-sections. Bollerslev, Tauchen, and Zhou (2009) demonstrated that the variance risk premium predicts future stock returns. The same VRP that makes puts expensive also signals future equity returns.

#The core disagreement

This is where the debate breaks down. The two sides are not testing the same portfolio.

#What AQR tests

AQR tests portfolios where you sell some of your stocks to buy puts:

$$R_{\text{portfolio}} = (1-w) \cdot R_{\text{SPY}} + w \cdot R_{\text{puts}}$$

At $w = 1\%$: you hold 99% in stocks, 1% in puts. Total portfolio: 100%.

AQR’s argument seems intuitive: you are taking money out of your best asset (stocks, which go up on average) and putting it into an asset with negative expected return (puts, which expire worthless most of the time). But there is a second, less obvious difference: AQR’s published tests use near-the-money puts (roughly 5% out of the money, delta around $-0.35$). These puts are expensive per dollar of notional protection because they have high theta decay and low convexity. Israelov (2019) confirmed this: his “Pathetic Protection” paper tests puts in exactly this delta range and finds negative returns.

When we run the no-leverage framing with deep OTM puts (delta $-0.10$ to $-0.02$, the strikes Spitznagel actually uses), the results reverse: every configuration beats SPY, as shown in the results below. The framing matters, but put selection matters more.

#What Spitznagel actually does

Spitznagel keeps 100% in stocks and buys puts with a small separate budget on top:

$$R_{\text{portfolio}} = 1.0 \cdot R_{\text{SPY}} + w \cdot R_{\text{puts}}$$

The strategy requires a small amount of capital beyond the core equity position to fund the put premium. Some might call this leverage. But it is fundamentally different from ordinary leverage.
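The two return formulas differ only in the equity weight, which a short sketch makes explicit. The function names and the scenario numbers are hypothetical, chosen for illustration rather than drawn from the backtest.

```python
def no_leverage_return(r_spy, r_puts, w):
    """AQR-style framing: sell a fraction w of stocks to fund the puts."""
    return (1 - w) * r_spy + w * r_puts

def overlay_return(r_spy, r_puts, w):
    """Spitznagel-style framing: keep 100% in stocks, add puts on top."""
    return 1.0 * r_spy + w * r_puts

W = 0.005  # 0.5% put budget
scenarios = {
    "calm year (SPY +10%, puts expire worthless)": (0.10, -1.0),
    "crash year (SPY -30%, puts gain 10x)": (-0.30, 10.0),
}
for name, (r_spy, r_puts) in scenarios.items():
    a = no_leverage_return(r_spy, r_puts, W)
    b = overlay_return(r_spy, r_puts, W)
    print(f"{name}: no-leverage {a:+.2%}, overlay {b:+.2%}")
```

For the same puts, the gap between the framings is exactly $w \cdot R_{\text{SPY}}$, which is small at a 0.5% budget. That is one reason put selection can matter more than framing.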

Ordinary leverage means borrowing money to buy more stocks. If you borrow to hold 130% in stocks, your gains are 30% bigger but your losses are also 30% bigger. Drawdowns get worse in proportion to the leverage. The payoff is symmetric: leverage amplifies both good and bad outcomes equally.

A put overlay works differently. It is asymmetric:

This asymmetry happens because a deep OTM put’s sensitivity to the market (its delta, $\Delta$) increases as prices fall. Delta measures how much the put’s price changes per 1-dollar change in the stock. A deep OTM put starts with a delta near zero (barely reacts to market moves). As the market drops and the put moves closer to being “in the money,” delta approaches $-1.0$ (moves dollar-for-dollar with the stock). A small position becomes a large hedge exactly when you need it.
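A minimal sketch of this delta behavior, using the textbook Black-Scholes formula for a European put. Several simplifications are assumed: constant 20% implied volatility (no skew, no volatility spike during the crash), zero interest rate and dividends, and a fixed six months to expiry. SPY options are American-style, so this is an illustration rather than a pricing model.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def put_delta(spot, strike, vol, t_years, rate=0.0):
    """Black-Scholes delta of a European put (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (
        vol * math.sqrt(t_years)
    )
    return norm_cdf(d1) - 1.0

STRIKE = 75.0  # hypothetical strike, 25% below the starting spot of 100
spots = [100.0, 90.0, 80.0, 75.0, 60.0]
deltas = [put_delta(s, STRIKE, vol=0.20, t_years=0.5) for s in spots]

for s, d in zip(spots, deltas):
    print(f"spot {s:>5.0f}: put delta {d:+.3f}")
```

With these inputs the put starts near a delta of -0.02 and moves past -0.9 once the market is well below the strike.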

AQR’s published analysis differs from Spitznagel’s on two dimensions: the framing (no-leverage vs leveraged overlay) and the put selection (near-ATM vs deep OTM). As we show below, both differences matter, but the put selection is the larger driver. Even in the no-leverage framing, deep OTM puts beat SPY in our sample.

The public debate has been contentious. Taleb and Asness clashed publicly in May 2020 over whether Universa’s March 2020 returns proved the strategy works. Aaron Brown (ex-AQR) wrote in Bloomberg that Universa’s percentage returns are “legit but with an asterisk”: the 3,612% is on the put allocation, not the total portfolio. CalPERS’ then-CIO Ben Meng argued that their alternative hedges outperformed tail-risk funds. These disagreements often reduce to framing: what denominator you use, and whether you funded the puts by selling stocks. AQR’s follow-up work (Israelov (2017) and Hurst, Ooi, and Pedersen (2017)) continues to test ATM or near-the-money puts in the sell-stocks-to-fund-puts framing. Neither paper tests the deep OTM overlay that Spitznagel actually runs.

#Results

All tests use deep OTM puts (delta $-0.10$ to $-0.02$, 90 to 180 days to expiration, monthly roll) on real SPY options data.

A note on terminology: “deep OTM” means the put’s strike price is far below the current market price. A put with delta $-0.02$ has roughly a 2% chance of finishing in the money (delta is a rough proxy for that probability). These puts are very cheap, but when the market crashes, they can multiply in value 10x to 50x.

#The calm-period test: 2012-2018

Before showing 17 years of data that include three major crashes, we start with the hardest test for the strategy: the calmest 7-year stretch in our sample. From 2012 to 2018, no correction exceeded -19.3%. There was no GFC, no COVID. If tail hedging only works because of once-in-a-decade crashes, it should bleed money here.

Spitznagel framing (100% stocks, puts on top) during 2012-2018:

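| Config | Annual Return | vs SPY | Max Drawdown | Vol | Sharpe |
| --- | --- | --- | --- | --- | --- |
| 100% SPY (baseline) | +12.35% | | -19.3% | 12.9% | 0.960 |
| + 0.5% deep OTM puts | +16.30% | +3.95% | -22.0% | 16.4% | 0.993 |
| + 1.0% deep OTM puts | +20.34% | +7.99% | -25.2% | 22.0% | 0.926 |
| + 3.3% deep OTM puts | +40.11% | +27.76% | -65.5% | 69.9% | 0.574 |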

No-leverage framing (reduce equity to fund puts) during 2012-2018:

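| Config | Annual Return | vs SPY | Max Drawdown | Vol | Sharpe |
| --- | --- | --- | --- | --- | --- |
| 100% SPY (baseline) | +12.35% | | -19.3% | 12.9% | 0.960 |
| 99.5% SPY + 0.5% puts | +12.71% | +0.36% | -15.4% | 11.1% | 1.145 |
| 99% SPY + 1% puts | +13.03% | +0.68% | -11.9% | 10.1% | 1.289 |
| 96.7% SPY + 3.3% puts | +14.12% | +1.77% | -16.2% | 15.1% | 0.934 |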

Year-by-year for the 0.5% Spitznagel configuration:

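| Year | SPY | Strategy | Excess |
| --- | --- | --- | --- |
| 2012 | +14.17% | +17.40% | +3.23% |
| 2013 | +29.00% | +32.31% | +3.31% |
| 2014 | +14.56% | +18.11% | +3.55% |
| 2015 | +1.31% | +7.91% | +6.60% |
| 2016 | +13.59% | +16.70% | +3.11% |
| 2017 | +20.78% | +24.12% | +3.34% |
| 2018 | -5.24% | -1.70% | +3.53% |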

The 0.5% Spitznagel config beat SPY in every single year. The excess is not coming from once-in-a-decade crashes. Moderate corrections, like the -12% dip in 2015 and the -19% drop in late 2018, are enough to generate meaningful deep OTM put payoffs that more than cover the annual premium. 2015 is the clearest example: SPY returned just +1.31%, but the put allocation gained +6.60% excess, almost entirely from the August correction.

In the no-leverage framing, the premium drag at 0.5% does not show up in the net result: the portfolio still earned +0.36%/yr of excess return across the 7 years. The puts more than paid for themselves even without a major crash.

The danger at high budgets is also clear. At 3.3% in the Spitznagel framing, the strategy’s max drawdown reaches -65.5%, worse than any actual market decline in the period. This does not mean the market fell that much. It means the strategy gave back a large portion of its own prior gains as repeated premium bleed pulled it down from a higher peak. At that budget, the puts themselves become the source of volatility: the premium expires worthless most months and occasionally pays off many times over, creating wild swings. This is the strongest argument for small budgets.

With the calm period as baseline, we now turn to the full 2008-2025 sample.

#No-leverage framing: sell stocks to fund puts (2008-2025)

The full 17-year sample includes the 2008 GFC, the 2020 COVID crash, and the 2022 bear market. With deep OTM puts (delta $-0.10$ to $-0.02$):

| Config | Annual Return | Excess vs SPY | Max Drawdown |
| --- | --- | --- | --- |
| SPY only | +11.05% | | -51.9% |
| 99.9% SPY + 0.1% deep OTM | +11.40% | +0.35% | -51.2% |
| 99.5% SPY + 0.5% deep OTM | +12.63% | +1.59% | -47.3% |
| 99% SPY + 1% deep OTM | +14.11% | +3.07% | -43.1% |
| 96.7% SPY + 3.3% deep OTM | +20.74% | +9.69% | -30.0% |

Every configuration outperforms SPY. Even in the no-leverage framing, deep OTM puts more than compensate for the reduced equity exposure. At 3.3%, annual returns reach +20.74% with a max drawdown of -30.0% versus -51.9% for unhedged SPY.

This contradicts AQR’s published findings because their tests used near-ATM puts (roughly 5% OTM, delta $\approx -0.35$). Near-ATM puts cost far more per unit of tail protection: their higher delta means larger premium, faster theta decay, and less convexity. The put selection, not just the framing, is the key variable AQR’s critique overlooked. Spitznagel’s overlay framing still dominates (see below), because it preserves full equity exposure. But AQR’s conclusion that no-leverage puts “always lose” does not hold when the puts are deep OTM.

#Spitznagel framing: 100% stocks plus puts on top (2008-2025)

| Config | Annual Return | Excess vs SPY | Max Drawdown |
| --- | --- | --- | --- |
| 100% SPY (baseline) | +11.05% | | -51.9% |
| 100% SPY + 0.05% deep OTM | +11.53% | +0.49% | -51.8% |
| 100% SPY + 0.1% deep OTM | +12.05% | +1.00% | -51.2% |
| 100% SPY + 0.2% deep OTM | +13.02% | +1.98% | -50.0% |
| 100% SPY + 0.5% deep OTM | +16.02% | +4.97% | -47.1% |
| 100% SPY + 1.0% deep OTM | +21.08% | +10.03% | -42.4% |
| 100% SPY + 2.0% deep OTM | +31.73% | +20.69% | -32.0% |
| 100% SPY + 3.3% deep OTM | +46.60% | +35.55% | -29.2% |

In this sample, every tested configuration outperforms gross of transaction costs and taxes. Both annual return and max drawdown improve at every budget level. At 0.5% annual premium budget, the total capital committed beyond 100% SPY is just the put premium, yet we see +4.97% excess return and a 4.8 percentage point improvement in max drawdown. The excess return is far larger than the premium spent, which is the signature of convexity. We explain why below.

Standard OTM puts (closer to the money, delta $-0.25$ to $-0.10$) in the same framing:

| Config | Annual Return | Excess vs SPY | Max Drawdown |
| --- | --- | --- | --- |
| 100% SPY + 0.1% std OTM | +12.04% | +0.99% | -51.1% |
| 100% SPY + 0.5% std OTM | +15.80% | +4.75% | -47.8% |
| 100% SPY + 1.0% std OTM | +20.60% | +9.56% | -43.6% |

Both types of puts work. Deep OTM puts produce more hedge per dollar spent because their delta increases more dramatically during a crash (they have more convexity, meaning their payoff accelerates as the market falls further).

#Convexity breakdown

The following table shows the full picture. Sharpe ratio measures risk-adjusted return: $\text{Sharpe} = \frac{R_\text{portfolio}}{\sigma_\text{portfolio}}$. A higher Sharpe means more return per unit of risk. All Sharpe ratios use a risk-free rate of 0%.

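| Strategy | Premium %/yr | Annual % | Excess % | Return per 1% Premium | Max DD % | Vol % | Sharpe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 100% SPY (baseline) | 0.00 | 11.05 | +0.00 | | -51.9 | 20.0 | 0.556 |
| + 0.05% deep OTM | 0.05 | 11.53 | +0.49 | 9.8 | -51.8 | 19.7 | 0.585 |
| + 0.1% deep OTM | 0.10 | 12.05 | +1.00 | 10.0 | -51.2 | 19.4 | 0.620 |
| + 0.2% deep OTM | 0.20 | 13.02 | +1.98 | 9.9 | -50.0 | 19.0 | 0.687 |
| + 0.5% deep OTM | 0.50 | 16.02 | +4.97 | 9.9 | -47.1 | 17.8 | 0.901 |
| + 1.0% deep OTM | 1.00 | 21.08 | +10.03 | 10.0 | -42.4 | 16.7 | 1.259 |
| + 2.0% deep OTM | 2.00 | 31.73 | +20.69 | 10.3 | -32.0 | 17.7 | 1.790 |
| + 3.3% deep OTM | 3.30 | 46.60 | +35.55 | 10.8 | -29.2 | 22.7 | 2.056 |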

Two things stand out. First, in this gross backtest, the return per 1% of annual put premium is roughly 10x across all budget levels. Each 1% of annual premium spent on deep OTM puts generates about 10% of excess return. This ratio is stable across budget levels, which means the convexity of the puts is consistent. It is not an artifact of a single configuration. Transaction costs and bid-ask spreads would reduce this ratio in live trading, but would need to consume the majority of the premium to eliminate the effect.

Second, the Sharpe ratio increases monotonically from 0.556 (SPY alone) to 2.056 (3.3% budget). The strategy improves risk-adjusted returns at every level, not just raw returns. At 0.5% budget, the Sharpe is 0.901 versus 0.556 for unhedged SPY.
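Both ratios can be recomputed from the rounded figures in the table, which is a useful sanity check. The sketch below does only that; because it starts from rounded inputs, its output can differ from the table in the last digit.

```python
# (premium %/yr, annual return %, volatility %) copied from the table above
rows = [
    (0.05, 11.53, 19.7),
    (0.10, 12.05, 19.4),
    (0.20, 13.02, 19.0),
    (0.50, 16.02, 17.8),
    (1.00, 21.08, 16.7),
    (2.00, 31.73, 17.7),
    (3.30, 46.60, 22.7),
]
BASELINE_ANNUAL = 11.05  # 100% SPY annual return, %

def ratios(premium, annual, vol, baseline=BASELINE_ANNUAL):
    """Return (excess return per 1% of premium, Sharpe at a 0% risk-free rate)."""
    excess = annual - baseline
    return excess / premium, annual / vol

results = [ratios(*row) for row in rows]
for (premium, _, _), (per_premium, sharpe) in zip(rows, results):
    print(f"{premium:.2f}% budget: {per_premium:.1f}x per 1% premium, Sharpe {sharpe:.3f}")
```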

#Beyond Sharpe: downside-focused metrics

Sharpe treats upside and downside volatility equally. But upside volatility is welcome: you don’t mind large positive returns. Downside-focused metrics give a clearer picture of how the puts reshape the return distribution:

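| Strategy | Sharpe | Sortino | Calmar | Max DD % | Max DD Days | Skew | Kurtosis | Pos Months % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% SPY | 0.556 | 0.678 | 0.214 | -51.9 | 834 | 0.015 | 14.67 | 66.7 |
| + 0.5% deep OTM | 0.901 | 1.150 | 0.340 | -47.1 | 601 | 0.146 | 12.84 | 68.1 |
| + 1.0% deep OTM | 1.259 | 1.657 | 0.497 | -42.4 | 403 | 0.203 | 12.11 | 70.8 |
| + 2.0% deep OTM | 1.790 | 2.506 | 0.992 | -32.0 | 227 | 0.691 | 16.79 | 76.9 |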

Sortino ratio (return divided by downside deviation only) improves more than Sharpe at every level, because the puts specifically reduce downside volatility while adding upside volatility from crash payoffs. At 0.5% budget, Sortino is 1.150 versus 0.678 for SPY, a 70% improvement, compared to 62% for Sharpe. The puts are doing exactly what they should: reducing bad volatility more than they add total volatility.

Calmar ratio (return divided by max drawdown) tells the same story. SPY’s Calmar of 0.214 means you earn 0.21% of annual return for every 1% of worst-case drawdown. At 2.0% budget, Calmar reaches 0.992, nearly 5x better, because the max drawdown shrinks from -51.9% to -32.0% while returns nearly triple.

Max drawdown duration drops from 834 trading days (over 3 years) for SPY to 227 days (under 1 year) at 2.0% budget. The portfolio recovers from crashes faster because put payoffs during the crash provide capital to compound during the recovery.

Skewness shifts from near zero (0.015, roughly symmetric) to positive (0.691 at 2.0%), meaning the puts create a rightward lean in the return distribution, with more large positive days than large negative ones. This is the distributional signature of convexity: bounded cost on the left, occasional large payoffs on the right.

Kurtosis remains high across all configurations (12-17, far above the Gaussian value of 3), confirming that SPY returns have genuinely fat tails. The puts do not eliminate fat tails; they shift which tail is fatter.
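For reference, here is one common way to compute the downside metrics from a return series. Conventions vary between libraries, so treat this as a sketch under stated assumptions (0% target and risk-free rate, downside deviation taken over all periods, drawdown duration counted as periods spent below the prior peak) rather than the exact definitions used to build the table. The monthly returns are hypothetical.

```python
import math

def max_drawdown(returns):
    """Return (worst peak-to-trough decline, longest run of periods below the prior peak)."""
    wealth, peak = 1.0, 1.0
    worst, underwater, longest = 0.0, 0, 0
    for r in returns:
        wealth *= 1 + r
        if wealth >= peak:
            peak, underwater = wealth, 0
        else:
            underwater += 1
            longest = max(longest, underwater)
            worst = min(worst, wealth / peak - 1)
    return worst, longest

def annualized_return(returns, periods_per_year):
    growth = math.prod(1 + r for r in returns)
    return growth ** (periods_per_year / len(returns)) - 1

def sortino(returns, periods_per_year):
    """Annualized return divided by annualized downside deviation (target 0%)."""
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return annualized_return(returns, periods_per_year) / (downside * math.sqrt(periods_per_year))

def calmar(returns, periods_per_year):
    """Annualized return divided by the absolute max drawdown."""
    worst, _ = max_drawdown(returns)
    return annualized_return(returns, periods_per_year) / abs(worst)

monthly = [0.02, -0.10, 0.03, 0.04, 0.05, 0.01]  # hypothetical monthly returns
worst, longest = max_drawdown(monthly)
print(f"max drawdown {worst:.1%}, underwater for {longest} months")
print(f"Sortino {sortino(monthly, 12):.2f}, Calmar {calmar(monthly, 12):.2f}")
```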

#The diminishing returns of higher budgets

More put budget is not always better. At low levels, puts reduce portfolio variance by truncating the left tail. Vol drops from 20.0% (SPY alone) to a minimum of 16.7% at the 1.0% budget. But beyond that, the puts themselves become a source of variance. Their payoffs are lumpy: zero most months, 20-50x during crashes. At 3.3% budget, annualized volatility rises back to 22.7%, higher than unhedged SPY.

The Sharpe ratio still increases monotonically in our sample because 2008, 2020, and 2022 were large enough crashes that the put payoffs overwhelmed the premium drag at every budget level. But the full-period Sharpe at 3.3% is misleading. The calm-period test (2012-2018, one of the calmest stretches in our sample) reveals the problem: at 3.3%, the Sharpe drops to 0.574, well below unhedged SPY’s 0.960, and the strategy’s max drawdown reaches -65.5% as repeated premium bleed pulls it down from its own prior peak. At 0.5%, the strategy still beats SPY on Sharpe (0.993 vs 0.960) even during these calm years. At 3.3%, you spend 3.3% of portfolio per year in premium. Over a quiet decade, that is 33% of cumulative drag before any crash pays off. At 0.5%, the cumulative drag is only 5%, easily recovered by a single moderate correction.

This is why we recommend 0.5% as the default rather than the “optimal” 3.3%. The 0.5% budget sits in the sweet spot where the variance reduction from tail truncation exceeds the variance added by the put payoffs, and the annual premium cost is small enough to outperform even in the calmest markets.

#Why it works: convexity, not leverage

The word “leverage” is misleading here. The put premium is not the same as notional exposure. When you spend 0.5% of portfolio value on deep OTM puts, you are not adding 0.5% of equity exposure. You are buying contingent downside convexity: a payoff that is near zero most of the time and very large during crashes. If you instead spent 0.5% borrowing to buy more stocks, the excess return would be about 0.05%/yr (0.5% of the equity premium). Instead, we observe +4.97%/yr. The actual excess is roughly 100 times what linear leverage would produce. The extra return is not coming from additional market exposure. It is coming from the put’s convexity.

This distinction matters because ordinary leverage and a put overlay have opposite effects on the two quantities that determine compound growth:

$$G \approx \mu - \frac{\sigma^2}{2}$$

Ordinary leverage (borrowing to buy more stocks) scales $\mu$ and $\sigma$ by the same factor, but the drain depends on $\sigma^2$. If you use 1.5x leverage, $\mu$ increases by 50% but $\sigma$ also increases by 50%, so $\sigma^2$ increases by 125%. The variance drain grows faster than the return. This is why leveraged ETFs underperform their stated multiple over long periods: they win on the first moment and lose on the second.
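Plugging leverage into the growth formula shows the effect directly. The sketch below assumes a hypothetical 10% arithmetic mean return, the 20% volatility from our SPY sample, and free borrowing (which flatters leverage).

```python
def approx_growth(mu, sigma, leverage=1.0):
    """G ~ k*mu - (k*sigma)^2 / 2 for leverage k, ignoring borrowing costs."""
    return leverage * mu - (leverage * sigma) ** 2 / 2

MU, SIGMA = 0.10, 0.20  # MU is a hypothetical input, SIGMA is the sample volatility

for k in (1.0, 1.5, 2.0, 4.0):
    drain = (k * SIGMA) ** 2 / 2
    print(f"{k:.1f}x: mean {k * MU:.1%}, drain {drain:.1%}, growth {approx_growth(MU, SIGMA, k):.1%}")
```

Under these assumptions 1.5x leverage compounds at 10.5%, short of 1.5 times the unlevered 8.0%, and at 4x the growth rate falls back to the unlevered level.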

A put overlay works on each moment independently. The premium is a small, linear cost to $\mu$ (the first moment). But the put’s payoff during a crash truncates the left tail of the return distribution, which disproportionately reduces $\sigma^2$ (the second moment). Because the drain is quadratic in volatility, even a modest reduction in tail losses saves more in compounding terms than the premium costs.

A concrete example: suppose SPY drops 50% in a year. Without puts, that single year’s contribution to variance drain is roughly $0.50^2 / 2 = 12.5\%$. With puts that offset 10 percentage points of the decline (reducing the loss to 40%), the drain contribution drops to $0.40^2 / 2 = 8.0\%$, a savings of 4.5 percentage points, from a put position that cost 0.5% of the portfolio.

Taleb describes this structure as a barbell in Antifragile: combine a large, safe position with a small, highly convex one, and avoid the middle. The Spitznagel portfolio is a barbell. The bulk (100%) is in a broad equity index. A small sliver (0.1% to 1%) is in deep OTM puts.

The bulk earns the market return. The sliver has bounded downside (you can only lose the premium) and convex upside (the puts can return 10x to 50x during a crash). A “medium risk” portfolio with 80% stocks and 20% bonds reduces your exposure to crashes but also reduces your exposure to the equity premium. The barbell keeps full exposure to the equity premium while adding crash protection through a completely different mechanism.

As described above, this asymmetry comes from the put’s delta shifting from near zero to near $-1.0$ as the market crashes. A tiny position becomes a large hedge exactly when you need it. Borrowing cannot replicate this. Borrowing amplifies gains and losses symmetrically. Puts amplify only the crash payoff.

Ordinary leverage can wipe you out. If you borrow to hold 150% in stocks and the market drops 50%, you lose 75% of your equity. A margin call forces you to sell at the bottom.

A put overlay cannot do this. If you spend 0.5% of your portfolio on puts and those puts expire worthless, you lose 0.5%. That is the worst outcome. The maximum loss is the premium paid, which you know at purchase. There is no margin call.

Comparing a 50% market decline:

The put overlay reduces the drawdown. The margin position amplifies it. Similar total capital committed, opposite outcomes.
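A rough version of this comparison can be sketched with three lines of arithmetic. The 20x put payoff is an assumption picked from within the 10x to 50x range discussed earlier, the margin account is assumed to hold 150% in stocks, and borrowing costs and forced liquidation are ignored.

```python
MARKET_RETURN = -0.50    # the 50% decline being compared
PREMIUM = 0.005          # 0.5% of portfolio spent on puts
PAYOFF_MULTIPLE = 20.0   # assumed crash payoff per dollar of premium
MARGIN_EXPOSURE = 1.5    # assumed 150% stock exposure in the margin account

unhedged = MARKET_RETURN
overlay = MARKET_RETURN - PREMIUM + PREMIUM * PAYOFF_MULTIPLE
margin = MARGIN_EXPOSURE * MARKET_RETURN

for name, result in (("100% SPY", unhedged), ("put overlay", overlay), ("1.5x margin", margin)):
    print(f"{name:>12}: {result:+.1%}")
```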

#Sensitivity and robustness

A strategy that only works with one specific set of parameters is likely overfitted to the data. We tested this concern from multiple angles.

#Parameter sensitivity

We ran a 24-combination grid search across DTE (days to expiration: how far out the put expires), delta (how far out of the money), exit timing (when to close the position), and budget (how much to spend annually). The top 10 configurations by Sharpe ratio (SPY Sharpe: 0.556):

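| DTE | Delta | Exit DTE | Budget % | Annual % | Excess % | Max DD % | Vol % | Sharpe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 120-240 | (-0.10, -0.02) | 14 | 1.0 | 22.14 | +11.09 | -42.9 | 16.9 | 1.307 |
| 90-180 | (-0.15, -0.05) | 60 | 1.0 | 21.70 | +10.65 | -42.5 | 16.6 | 1.307 |
| 120-240 | (-0.10, -0.02) | 60 | 1.0 | 22.14 | +11.09 | -44.0 | 17.1 | 1.296 |
| 120-240 | (-0.10, -0.02) | 30 | 1.0 | 22.05 | +11.00 | -43.7 | 17.0 | 1.296 |
| 90-180 | (-0.15, -0.05) | 30 | 1.0 | 21.33 | +10.28 | -42.8 | 16.5 | 1.290 |
| 90-180 | (-0.15, -0.05) | 14 | 1.0 | 21.29 | +10.24 | -42.5 | 16.5 | 1.290 |
| 120-240 | (-0.15, -0.05) | 30 | 1.0 | 21.92 | +10.87 | -45.2 | 17.1 | 1.282 |
| 120-240 | (-0.15, -0.05) | 60 | 1.0 | 22.09 | +11.04 | -45.1 | 17.2 | 1.281 |
| 120-240 | (-0.15, -0.05) | 14 | 1.0 | 21.88 | +10.84 | -45.3 | 17.1 | 1.280 |
| 90-180 | (-0.10, -0.02) | 60 | 1.0 | 21.41 | +10.37 | -43.3 | 16.9 | 1.267 |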

The patterns are clear. All top 10 are at the 1.0% budget, which is the highest tested. More convexity exposure produces better results. Both DTE ranges appear in the top 10, with slightly longer-dated puts performing slightly better. Both delta ranges work. Exit timing has little impact: DTE 14, 30, and 60 all appear.

The single best configuration: DTE 120 to 240, delta (-0.10, -0.02), exit at DTE 14, 1% budget. This produces 22.14%/yr with a Sharpe of 1.307 and max drawdown of -42.9%.

All 24 parameter combinations beat SPY. The worst configuration still outperforms by +4.96%/yr. All 24 have a higher Sharpe ratio than unhedged SPY. Within the tested parameter range on this asset and period, the result does not depend on picking the right parameters. This is encouraging but not definitive: 24 combinations on one asset over 17 years is a limited grid, and different markets or longer time horizons could narrow the margins.
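A grid like this is easy to enumerate. The sketch below rebuilds one plausible 24-combination grid from the values visible in the top-10 table. The DTE ranges, delta ranges, and exit DTEs come from that table; the 0.5% budget level is our assumption for illustration, since the table only confirms 1.0% as the highest budget tested. The dictionary keys are hypothetical names, not the backtester's API.

```python
from itertools import product

dte_ranges = [(90, 180), (120, 240)]             # days to expiration at entry
delta_ranges = [(-0.10, -0.02), (-0.15, -0.05)]  # put delta band
exit_dtes = [14, 30, 60]                         # close the position at this DTE
budgets_pct = [0.5, 1.0]                         # 0.5 is an assumed grid value

grid = [
    {"dte": dte, "delta": delta, "exit_dte": exit_dte, "budget_pct": budget}
    for dte, delta, exit_dte, budget in product(
        dte_ranges, delta_ranges, exit_dtes, budgets_pct
    )
]

print(len(grid))  # 2 * 2 * 3 * 2 combinations
print(grid[0])
```

Each dictionary would then be passed to a backtest run and the results ranked by Sharpe ratio.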

#Rebalance frequency

Rebalancing means closing existing put positions and buying new ones. Rebalance frequency is the single most impactful parameter after budget size. All prior results use monthly rebalancing. More frequent rebalancing captures more crash payoffs because you replace expired or decayed puts faster, maintaining continuous protection:

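| Frequency | Annual % | Excess % | Max DD % | Vol % | Sharpe |
| --- | --- | --- | --- | --- | --- |
| Monthly | 16.02 | +4.97 | -47.1 | 17.8 | 0.901 |
| Biweekly | 24.59 | +13.54 | -44.6 | 18.6 | 1.321 |
| Weekly | 41.61 | +30.56 | -38.8 | 19.0 | 2.192 |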

Biweekly rebalancing is the best practical middle ground in this sample. It improves materially on monthly (24.59%/yr vs. 16.02%/yr) while requiring far fewer rolls than weekly. Weekly rebalancing pushes the backtest further, to 41.61%/yr with a Sharpe of 2.192 and max drawdown improving from -47.1% to -38.8%, but this should be read as an upper-bound sensitivity result rather than a realistic default. With monthly rolls, there are gaps in coverage as puts decay (they lose value as time passes, a phenomenon called theta decay). More frequent rolls keep the hedge closer to full strength at all times.

The practical tradeoff is transaction costs. Biweekly rolling means 26 trades per year versus 12 for monthly and 52 for weekly. At realistic bid/ask spreads for deep OTM puts, the transaction cost drag rises with turnover. Biweekly is easier to defend as a live implementation. Weekly may still outperform monthly after costs, but the gap is less certain once spread, slippage, and intermittent illiquidity are included.
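One way to frame this tradeoff is to ask how much each additional roll could cost before the gross advantage in the table above disappears. The sketch below uses only the gross annual returns from that table and assumes, purely for illustration, that total costs scale linearly with the number of rolls. The function name and the linear cost model are ours, not part of the backtester.

```python
# Gross annual returns (%/yr) from the rebalance-frequency table, with rolls per year.
GROSS = {"monthly": (16.02, 12), "biweekly": (24.59, 26), "weekly": (41.61, 52)}


def breakeven_cost_per_roll(fast: str, slow: str) -> float:
    """Cost per extra roll, in % of portfolio value, that would erase the
    gross return advantage of `fast` over `slow`.

    Assumption (illustrative): total cost is linear in the number of rolls.
    """
    fast_ret, fast_rolls = GROSS[fast]
    slow_ret, slow_rolls = GROSS[slow]
    return (fast_ret - slow_ret) / (fast_rolls - slow_rolls)


for fast, slow in [("biweekly", "monthly"), ("weekly", "biweekly")]:
    cost = breakeven_cost_per_roll(fast, slow)
    print(f"{fast} vs {slow}: break-even at about {cost:.2f}% of portfolio per extra roll")
```

This only converts the table into a per-roll threshold. It says nothing about whether the gross figures themselves would survive spreads, slippage, and illiquidity in live trading.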

Quarterly and semi-annual rebalancing eliminate most of the benefit. Long gaps between rolls leave the portfolio unhedged for extended periods, which is precisely when a crash can strike.

#Profit targets

We tested profit target exits at 3x, 5x, 10x, and 20x the premium paid (e.g., at 3x, a put bought for 100 dollars is sold when it reaches 300 dollars). The result: profit targets barely matter at monthly rebalancing. Holding the puts until the DTE exit date produces results nearly identical to taking profits at any threshold, so a target adds complexity without adding return. It also carries a structural risk: the convex payoff of deep OTM puts concentrates in a few extreme events, and a target of 3x or 5x caps the upside on exactly the trades that drive the strategy’s edge. A 50x payoff during a crash funds years of premium bleed; capping it at 10x would give up most of that value.
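The capping mechanism can be illustrated with made-up numbers. The sketch below is not a backtest result: the path of 39 worthless puts and one 50x payoff is hypothetical, and it assumes a capped put is sold exactly at the target and would otherwise have been held to its full payoff. It ignores the opposite case, where a put touches the target and then falls back.

```python
def net_pnl(multiples, cap=None, premium=1.0):
    """Net P&L of a series of put purchases, in units of currency.

    multiples: final payoff of each put as a multiple of the premium paid.
    cap: profit target; a put is sold once it reaches `cap` times the
         premium, so its realized multiple is truncated at `cap`.
    """
    total = 0.0
    for m in multiples:
        realized = m if cap is None else min(m, cap)
        total += premium * (realized - 1.0)  # payoff minus premium paid
    return total


# Hypothetical path: 39 puts expire worthless, one pays 50x its premium.
path = [0.0] * 39 + [50.0]

for cap in (None, 3, 10, 20):
    label = "no cap" if cap is None else f"{cap}x cap"
    print(f"{label}: net P&L = {net_pnl(path, cap):+.0f} premiums")
```

With these illustrative inputs the uncapped path nets +10 premiums, while a 10x cap turns the same path into a loss of 30 premiums.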

#Macro signal timing

We tested whether macro indicators could improve put timing: buy more puts when a crash seems likely, fewer when it doesn’t. Signals tested include VIX (the market’s “fear gauge”), GDP growth, high-yield credit spreads, the yield curve (10Y-2Y treasury spread, which inverts before recessions), non-financial corporate equity, the dollar index, the Buffett Indicator (market cap/GDP), and Tobin’s Q.

None of them improve put timing. The unconditional strategy (fixed budget, no signal) outperforms every signal-conditioned variant. The reason is that crash timing is inherently unpredictable. The VIX was low before both the 2008 crisis and COVID. The Buffett Indicator has been elevated for decades. Credit spreads were tight in early 2020. These signals contain information about risk levels but not about timing. The put strategy works precisely because it does not try to time crashes. It pays a small, steady cost for permanent protection.

#Out-of-sample validation

A fair objection: 17 years with three crashes may overstate the long-run crash frequency. A 20-year period with no crashes would bleed premium with no payoff. The response is that the premium is small (0.1% to 0.5% per year), so even infrequent crashes are enough to break even.

We tested this directly. We split the data in half and ran the same default configuration (0.5% budget, DTE 90 to 180, delta -0.10 to -0.02) on both periods without re-optimizing:

| Period | Strategy | SPY B&H | Excess | Max DD |
|---|---|---|---|---|
| 2008 to 2016 | 12.14% | 7.29% | +4.85% | -47.1% |
| 2016 to 2025 | 20.02% | 14.92% | +5.09% | -22.3% |
| Full period | 16.02% | 11.05% | +4.97% | -47.1% |

The strategy beats SPY in both halves. The first half contains the GFC (the largest crash in the sample). The second half contains COVID and the 2022 bear market. The excess return is positive in both periods, which means the result is not driven by a single event.
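The mechanics of such a split are simple enough to sketch. The code below is a minimal illustration, not the backtester's own code: it assumes you already have aligned per-period return series for the strategy and the benchmark, and the function names and the monthly default are ours.

```python
def annualized_return(period_returns, periods_per_year=12):
    """Geometric annualized return from a list of simple per-period returns."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    years = len(period_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def split_half_excess(strategy, benchmark, periods_per_year=12):
    """Excess annualized return (strategy minus benchmark) in each half.

    Nothing is re-fit between halves: the same two return series are
    simply cut at the midpoint.
    """
    if len(strategy) != len(benchmark):
        raise ValueError("series must have the same length")
    mid = len(strategy) // 2
    halves = [(strategy[:mid], benchmark[:mid]), (strategy[mid:], benchmark[mid:])]
    return [
        annualized_return(s, periods_per_year) - annualized_return(b, periods_per_year)
        for s, b in halves
    ]


# Toy monthly series (hypothetical): 2%/month vs 1%/month for two years.
first, second = split_half_excess([0.02] * 24, [0.01] * 24)
print(f"first half excess: {first:.2%}, second half excess: {second:.2%}")
```

The point of the design is that the configuration is fixed before the cut, so neither half can influence the parameters used in the other.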

The strongest argument for overfitting remains that the entire edge comes from three crashes. If those crashes had been 20% milder, or if the next 17 years produce no drawdown worse than 25%, the strategy may underperform. The calm-period test (2012-2018) speaks to the second scenario. These were among the calmest years in S&P 500 history; the deepest correction was -19.3%. Yet the Spitznagel overlay at a 0.5% budget still beat SPY on both raw return (+3.95%/yr excess) and Sharpe (0.993 vs. 0.960), and did so in every single calendar year. Moderate corrections of -12% to -19% were enough to generate meaningful deep OTM put payoffs that more than covered the premium. In this sample, the strategy did not need once-in-a-decade crashes.

What we can say is that the strategy is robust to parameter choice, survives an out-of-sample split, and outperforms SPY even during the calmest 7-year window in our data at the recommended 0.5% budget.

#Limitations and open questions

Capacity and execution. Deep OTM puts have limited liquidity, especially during stress. Bid-ask spreads on SPY puts with an absolute delta below 0.05 can exceed 20% of the mid price. During the March 2020 crash, some deep OTM strikes had no bids at all for hours. A strategy that works at $10M may not scale to $10B. Universa manages this by trading across multiple markets and maintaining dealer relationships, but capacity constraints are real.
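To put the 20% spread figure in portfolio terms, the sketch below estimates how much of an annual premium budget is lost to the spread on entry. It is a rough illustration under stated assumptions (fills at the ask, mid price as fair value, no cost on exit), and the function and parameter names are ours.

```python
def entry_spread_drag(annual_budget, spread_pct_of_mid):
    """Annual cost of crossing the spread when buying puts, as a fraction
    of portfolio value.

    annual_budget: premium spent per year as a fraction of the portfolio
                   (0.005 means 0.5%).
    spread_pct_of_mid: quoted bid-ask spread divided by the mid price
                       (0.20 means 20%).

    Assumptions: every purchase fills at the ask, the mid is fair value,
    and exits cost nothing (for example, puts held to expiry).
    """
    ask_over_mid = 1.0 + spread_pct_of_mid / 2.0
    # Share of each dollar paid at the ask that is spread rather than fair value.
    overpay_share = (ask_over_mid - 1.0) / ask_over_mid
    return annual_budget * overpay_share


drag = entry_spread_drag(annual_budget=0.005, spread_pct_of_mid=0.20)
print(f"entry-side spread drag: {drag:.3%} of portfolio per year")
```

Selling puts at the bid before expiry would add exit-side costs on top of this, and spreads that widen under stress would raise both.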

Financing source. Where the put budget comes from matters. Our backtest treats it as an external cost. In practice, the premium could come from reducing equity exposure (no-leverage framing), from a separate cash allocation, or from an institutional budget line. The choice affects both the portfolio math and the behavioral likelihood of maintaining the strategy through long bleed periods.

Tax and turnover. Monthly rolling generates 12 short-term realization events per year, most of them capital losses. In taxable accounts, the interaction between put losses, put gains during crashes, and equity capital gains creates complex tax consequences. This drag is absent from our backtest.

Regime dependence of skew pricing. The volatility skew (how much more expensive OTM puts are relative to ATM options) varies over time. After 2008, skew steepened dramatically, and deep OTM puts became more expensive. If the market “learns” to price tail risk more accurately, the edge may compress. Conversely, long calm periods tend to flatten skew, making puts cheaper again.

Comparison with other tail hedges. We only test put-based strategies. Hurst, Ooi, and Pedersen (2017) at AQR argue that trend-following (managed futures) provides crash protection more cheaply than puts because trend strategies earn a positive premium on average rather than bleeding one. A fair comparison would test both approaches on the same data. Our backtester currently does not support trend-following overlays.

#Future work: Beyond equities

The SPY put strategy works, but it may not be the optimal application of Spitznagel’s structure. The same logic (steady carry plus cheap convexity on extreme moves) applies wherever there is a reliable asymmetry between calm periods and crises. The best market depends on the regime. In a classic disinflationary recession, rates options may be superior. For portfolios already earning carry, FX may be the most natural fit. For institutions with access to OTC markets, credit can offer very strong crisis convexity. VIX is the most direct panic hedge, but often the hardest to own cheaply enough. Several markets exhibit structural tail properties that can be as strong as, or stronger than, equities:

Rates and rate futures. Central banks tend to cut rates aggressively in crises. The Fed dropped rates from 5.25% to 0.25% during the 2008 crisis, and from 1.5% to 0% in two weeks during COVID. These moves are 10x to 20x larger than normal monthly rate changes. Rate options may underprice these panic-cut scenarios because standard models assume mean-reversion around stable levels. The trade would be: earn the risk-free rate (or hold short-term Treasuries), buy OTM calls on SOFR futures that pay off when rates collapse. The counterexample is stagflation: if inflation is high during a recession, central banks may not cut, and rate-based tail hedges would fail. This makes rates a conditional hedge rather than a universal one.

FX carry trades. Currencies like AUD/JPY and MXN/JPY offer interest rate differentials of 3 to 5% annually. In stable times, carry traders collect this premium. When risk sentiment shifts, these positions unwind violently. The 2008 crisis saw AUD/JPY drop 40% in weeks. OTM puts on the high-yield currency may be systematically cheap relative to the crash risk because Gaussian models treat carry-trade unwinds as low-probability events. The carry itself could fund the protection.

Credit and CDS. Investment-grade bonds earn a spread over Treasuries, but credit events are rare and clustered. The barbell structure here is: hold IG bonds for the spread, buy OTM protection on HY or IG CDS indices. The protection bleeds a small annual premium in calm markets. When credit stress hits, the payoff is convex: the 2008 crisis took IG CDS from 50bps to 250bps (5x) and HY CDS from 300bps to 2000bps (6.7x). CDS has a natural asymmetry similar to puts: a bounded cost (the annual premium) and a payoff in a credit crisis that can be many times that cost, capped only by the notional protected.

Volatility products. Buying calls on the VIX is the most direct tail hedge. These options can be extremely convex: when volatility explodes, short-dated VIX calls can reprice very quickly. That is why they are so attractive during panics. But high convexity does not automatically mean a good trade. The key distinction is between convexity (how fast the payoff accelerates in a selloff) and efficiency (how much convexity you get for the premium you pay).

The VIX itself trades around 12 to 15 in calm markets and can spike to 80+ during crashes (it hit 82.69 on March 16, 2020). At first glance, that makes VIX calls look like the perfect hedge. The confusion is that VIX options are not priced on spot VIX. They are priced on VIX futures. So even if spot VIX jumps from 15 to 80 intraday, the option payoff depends on how much the relevant VIX future moves, which is usually much less. This means the eye-catching spot spike overstates the actual option payout.

Short-dated ATM or slightly OTM VIX calls can still have very strong convexity because they are sensitive to sharp near-term changes in implied volatility. But that convexity is usually expensive because it is obvious and heavily demanded by investors looking for crash insurance. On top of that, the VIX futures curve is often in contango in calm markets, which means forward volatility is already priced above spot. That carry drag makes long VIX exposure expensive to hold over time.

So VIX calls are not weak because they lack convexity. They are often less efficient because the convexity is expensive, the payoff is filtered through the futures curve rather than spot VIX, and volatility mean-reverts quickly after the panic. Whether the remaining edge is still worth paying for is an empirical question.

Commodities. Oil crashes during demand shocks, which tend to coincide with equity crashes: crude fell from $145 to $30 in 2008 and briefly went negative in April 2020. OTM puts on crude oil futures would pay off in exactly these scenarios. The directional thesis is weaker than rates or credit because supply shocks push oil the other way (up during crises like the 1973 embargo or 2022 Ukraine war). This makes crude a noisier hedge than the other markets listed here.

Cross-market diversification. The strongest argument for testing multiple markets is not finding the single best hedge but combining several. Crises are correlated: when equities crash, rates get cut, carry trades unwind, credit spreads blow out, and the VIX spikes. The crash payoffs across markets are positively correlated, but the bleed costs are largely independent (rate option decay has nothing to do with FX option decay). A portfolio that spreads 0.5% of annual premium across four markets would bleed roughly the same total amount as concentrating in one, but the probability of at least one leg paying off in any given crisis is higher. This diversification of bleed with correlation of payoff is the multi-market version of Spitznagel’s variance drain argument.
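The probability claim can be bounded without knowing how the legs co-move. The sketch below uses the standard bounds on the probability of a union of events, which hold for any dependence structure. The per-leg probabilities are hypothetical placeholders, not estimates from our data, and the function name is ours.

```python
def at_least_one_payoff_bounds(leg_probs):
    """Bounds on P(at least one leg pays off in a crisis).

    Valid for any dependence between the legs:
      lower bound = the best single leg (the legs overlap completely),
      upper bound = the sum of the legs, capped at 1 (the legs never overlap).
    """
    lower = max(leg_probs)
    upper = min(1.0, sum(leg_probs))
    return lower, upper


# Hypothetical per-crisis payoff probabilities for rates, FX, credit, and VIX legs.
probs = [0.7, 0.6, 0.5, 0.5]
lower, upper = at_least_one_payoff_bounds(probs)
print(f"P(at least one leg pays) is between {lower:.0%} and {upper:.0%}")

# Splitting a fixed 0.5% annual budget evenly leaves the total bleed unchanged.
per_leg_budget = 0.005 / len(probs)
print(f"budget per leg: {per_leg_budget:.3%}, total: {per_leg_budget * len(probs):.3%}")
```

Spreading the budget never lowers the chance of some payoff below that of the best single leg, but each leg is also smaller, so the size of the payoff when only one leg fires shrinks accordingly.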

If the question is “what is better than SPY puts?”, the useful answer is to match the hedge to the portfolio and the regime.

So the practical conclusion is not that one asset dominates. It is that the “best” tail hedge depends on what you already own, what kind of crash you fear, and which markets you can actually trade cheaply and consistently. For most allocators, the strongest implementation is likely to start with equity puts, then diversify a small convexity budget across rates, FX, or credit only when the portfolio, regime, and execution capability justify it.

Testing these alternatives requires different data (CME futures and options, CDS term structures, FX options, VIX futures and options) and modifications to the backtester. This is ongoing work.

#Implementation

Our backtester uses monthly rebalancing: buy the lowest-premium deep OTM put available within the target delta and expiration range. This is a simple, mechanical strategy.
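The selection rule can be sketched in a few lines. This is a simplified illustration rather than the backtester's actual code: the `PutQuote` fields, the `pick_put` name, and the sample chain are hypothetical, and the default ranges are the default configuration quoted earlier (delta -0.10 to -0.02, DTE 90 to 180).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PutQuote:
    strike: float
    delta: float  # negative for puts
    dte: int      # days to expiration
    ask: float    # premium paid to buy


def pick_put(
    chain: Sequence[PutQuote],
    delta_range: Tuple[float, float] = (-0.10, -0.02),
    dte_range: Tuple[int, int] = (90, 180),
) -> Optional[PutQuote]:
    """Return the lowest-premium put inside the delta and DTE ranges, or None."""
    lo_delta, hi_delta = delta_range
    lo_dte, hi_dte = dte_range
    eligible = [
        q for q in chain
        if lo_delta <= q.delta <= hi_delta
        and lo_dte <= q.dte <= hi_dte
        and q.ask > 0  # skip quotes with no offer
    ]
    return min(eligible, key=lambda q: q.ask) if eligible else None


# Hypothetical chain for illustration only.
chain = [
    PutQuote(strike=400, delta=-0.30, dte=120, ask=9.50),  # too close to the money
    PutQuote(strike=340, delta=-0.08, dte=120, ask=2.10),  # eligible
    PutQuote(strike=300, delta=-0.03, dte=150, ask=0.85),  # eligible and cheapest
    PutQuote(strike=280, delta=-0.01, dte=150, ask=0.30),  # too far out of the money
    PutQuote(strike=300, delta=-0.03, dte=30, ask=0.10),   # too short-dated
]
print(pick_put(chain))
```

On each rebalance date the same filter would be applied to that day's chain, with position size set by the premium budget.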

Funding assumption: the put budget is treated as an external, fixed annual premium (e.g., 0.5% of portfolio value) rather than being funded by selling SPY. This is the key distinction from AQR’s setup. The fair benchmark is not plain SPY alone but SPY plus the same external capital source without the puts (e.g., SPY + 0.5% in cash). Since the premium is small and cash earns the risk-free rate, the benchmark difference is minor (roughly 0.02% per year at 0.5% budget and a 4% cash rate), but the framing matters: the outperformance comes from the convexity of the puts, not from deploying more total capital. If you instead fund the put budget by reducing SPY, you recover the no-leverage framing. Returns are lower than the overlay framing because you hold less equity, but deep OTM puts still outperform SPY in both framings in our sample.
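The arithmetic behind the two framings is short. The sketch below reproduces the 0.02% benchmark adjustment and shows the equity exposure under each framing. The function names and framing labels are ours, chosen for illustration.

```python
def cash_benchmark_adjustment(budget, cash_rate):
    """Extra annual return of the 'SPY + budget held in cash' benchmark
    over plain SPY, as a fraction of portfolio value."""
    return budget * cash_rate


def equity_weight(budget, framing):
    """Equity exposure as a fraction of the investor's own capital.

    'overlay': the premium comes from an external source, so equity stays at 100%.
    'no_leverage': the premium is funded by selling SPY.
    """
    if framing == "overlay":
        return 1.0
    if framing == "no_leverage":
        return 1.0 - budget
    raise ValueError(f"unknown framing: {framing!r}")


budget, cash_rate = 0.005, 0.04  # 0.5% budget, 4% cash rate (figures from the text)
print(f"benchmark adjustment: {cash_benchmark_adjustment(budget, cash_rate):.2%} per year")
for framing in ("overlay", "no_leverage"):
    print(f"{framing}: equity weight = {equity_weight(budget, framing):.1%}")
```

The adjustment is two basis points a year, which is why the choice of benchmark changes the framing more than it changes the numbers.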

Methodology note: no attempt is made to optimize timing; the strategy is purely rules-based. Real-world frictions (bid/ask spreads, slippage, and taxes) would reduce headline returns but should not remove the convexity effect.

Universa’s actual implementation is more sophisticated. They manage rolls continuously to maintain their desired exposure profile. They reinvest put profits into stocks at crash lows, buying when prices are depressed. They hedge across multiple markets, not just the S&P 500. Because our simple monthly rule omits all three refinements, we read our backtest results as a conservative proxy for the strategy as designed, not as an estimate of Universa’s actual performance.

#Code

The backtester and all notebooks are open source:

github.com/lambdaclass/options_portfolio_backtester

The performance-critical paths (inventory joins, grid sweeps, filter evaluation) have an optional Rust core via PyO3 and Polars, providing 10-50x speedups. The parallel grid sweep uses Rayon for shared-memory parallelism.

#Disclaimer

This article is research and educational material only. It is not financial advice, investment advice, or a recommendation to buy or sell any security or derivative. Past performance, whether backtested or live, does not guarantee future results. Options trading involves substantial risk of loss. The backtest results presented here are gross of transaction costs, taxes, and slippage, and may not be replicable in live trading. Consult a qualified financial advisor before making any investment decisions.

#References

  1. The S&P 500 closed at 1,565.15 on October 9, 2007 and 676.53 on March 9, 2009, a 56.8% decline on closing prices. The SOA Research Brief Table 3 reports −59%, likely using intraday highs and lows. See SOA Research Brief (Apr 16, 2020).

  2. The same brief notes: “the S&P 500 cratered on March 23, down 34% from its February 19 level.” See SOA Research Brief (Apr 16, 2020).

  3. Bloomberg reports the fund “returned 3,612% in March” and that this came “according to an investor letter … obtained by Bloomberg.” See Taleb-Advised Universa Tail Fund Returned 3,600% in March.