The Tail Hedge Debate: Spitznagel Is Right, AQR Is Answering the Wrong Question
Stock markets crash. The S&P 500 fell about 59% from October 9, 2007 to March 9, 2009,[1] and about 34% from February 19, 2020 to March 23, 2020.[2] A put option is a contract that pays you when the market falls below a certain price. If you hold stocks and also hold puts, the puts can offset some of your losses during a crash. The question is whether the cost of buying puts is worth the protection they provide.
There are two sides. AQR Capital Management, one of the largest hedge funds in the world, published “Chasing Your Own Tail (Risk)”. They argue that buying puts systematically costs more than it saves. On the other side, Mark Spitznagel at Universa Investments, where Nassim Taleb is scientific advisor, argues that a small put allocation improves long-term returns. Universa reported a 3,612% gain in March 2020 (via an investor letter, as reported by Bloomberg).[3]
We tested both claims with our open-source options backtester on 17 years of real SPY options data (2008 to 2025), covering three crashes: the 2008 financial crisis, COVID, and the 2022 bear market.
Spitznagel is right. AQR is testing a different strategy than the one Spitznagel proposes.
# Why puts are expensive
Options are priced using implied volatility, which is a market estimate of how much prices will move. In practice, implied volatility is consistently higher than what actually materializes. This gap is called the Variance Risk Premium:
$$\text{VRP} = \sigma^2_{\text{implied}} - \sigma^2_{\text{realized}}$$
Carr and Wu (2009) documented that this spread is persistently positive. Put buyers pay more than fair value on average. The reason is that investors are willing to overpay for crash protection, the same way homeowners overpay for fire insurance relative to the expected loss from fire. Israelov (2017) confirmed this and titled his paper “Pathetic Protection: The Elusive Benefits of Protective Puts.”
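As a minimal sketch of the definition above, the premium for a single period can be computed from a quoted implied volatility and the returns that followed. The function names, the 252-day annualization, and the sample numbers are illustrative assumptions, not values from our dataset.

```python
TRADING_DAYS = 252  # assumed annualization factor


def realized_variance(daily_returns):
    """Annualized realized variance from daily simple returns (mean taken as zero)."""
    n = len(daily_returns)
    return TRADING_DAYS * sum(r * r for r in daily_returns) / n


def variance_risk_premium(implied_vol, daily_returns):
    """VRP = implied variance minus realized variance, both annualized."""
    return implied_vol ** 2 - realized_variance(daily_returns)


# Hypothetical month: implied vol quoted at 20%, then the market moves 1% a day.
sample_returns = [0.01, -0.01] * 10
vrp = variance_risk_premium(0.20, sample_returns)
print(f"VRP = {vrp:.4f}")
```

A positive result means the option buyer paid for more variance than the market delivered over that window.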
AQR’s argument stops here. Puts lose money on average. Therefore they hurt portfolio performance.
This reasoning is incomplete. It looks only at the average return of the put (the first statistical moment). It ignores what the put does to the volatility of the portfolio (the second moment). Compounding depends on both.
# How volatility destroys compounding
The geometric (compound) growth rate of a portfolio is approximately (for small returns under lognormal assumptions):
$$G \approx \mu - \frac{\sigma^2}{2}$$
Here $\mu$ is the average return and $\sigma$ is the volatility. The $\frac{\sigma^2}{2}$ term is called the variance drain. It is the penalty that volatility imposes on compounding.
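A short sketch of where the approximation comes from, assuming per-period returns $r$ are small, with mean $\mu$ and variance $\sigma^2$, and taking $G$ to be the expected log growth rate. The only step is a second-order Taylor expansion; higher-order terms are dropped.

$$\ln(1+r) \approx r - \frac{r^2}{2}$$

$$G \approx \mathbb{E}[\ln(1+r)] \approx \mathbb{E}[r] - \frac{\mathbb{E}[r^2]}{2} = \mu - \frac{\sigma^2 + \mu^2}{2} \approx \mu - \frac{\sigma^2}{2}$$

The last step drops $\frac{\mu^2}{2}$, which is small next to $\frac{\sigma^2}{2}$ whenever the mean return per period is much smaller than the volatility.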
A simple example shows why. Start with 100. Gain 50% one year, lose 50% the next. The arithmetic average return is 0%. But you do not end up with 100. You end up with 75. The gain and loss were symmetric in percentage terms, but the loss applied to a larger base (150), so it took away more than the gain added. You lost 25% despite an average return of zero. That is variance drain.
The drain is quadratic in volatility. A portfolio with 10% volatility pays roughly 0.5% per year in variance drain. A portfolio with 20% volatility pays roughly 2%. A portfolio with 40% volatility pays roughly 8%. Doubling volatility quadruples the drain. This means that large drawdowns are disproportionately costly to long-run wealth. A single 50% crash costs more in compounding terms than ten 5% corrections, even if the total percentage lost is the same.
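The arithmetic in the last two paragraphs can be checked in a few lines. This is a sketch with helper names of our own choosing; it reproduces the 100 to 75 example, the quadratic drain figures, and the comparison between one 50% crash and ten 5% corrections.

```python
def compound(start, returns):
    """Apply a sequence of simple returns to a starting value."""
    value = start
    for r in returns:
        value *= 1.0 + r
    return value


def geometric_mean(returns):
    """Per-period compound growth rate of a return sequence."""
    growth = compound(1.0, returns)
    return growth ** (1.0 / len(returns)) - 1.0


def variance_drain(vol):
    """Approximate drag on compound growth: sigma^2 / 2."""
    return vol ** 2 / 2.0


end_value = compound(100.0, [0.50, -0.50])   # gain 50%, then lose 50%
drains = {vol: variance_drain(vol) for vol in (0.10, 0.20, 0.40)}
ten_small = compound(1.0, [-0.05] * 10)      # ten 5% corrections
one_big = compound(1.0, [-0.50])             # one 50% crash

print(end_value, geometric_mean([0.50, -0.50]))
print(drains)
print(ten_small, one_big)
```

Ten 5% corrections leave about 60% of the starting value, while a single 50% crash leaves 50%. The two-year example compounds at about -13.4% per year even though its arithmetic mean is zero.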
On SPY (2008 to 2025):
- Arithmetic mean: 12.50%/yr
- Geometric mean: 11.11%/yr
- Variance drain: 1.39%/yr
- Peak rolling drain during the 2008 crisis: 10.5%/yr
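A rolling drain of this kind can be sketched as the annualized variance over a trailing window, divided by two. The 63-day window, the 252-day annualization, and the synthetic return series below are assumptions for illustration; they are not the settings or data behind the figures above.

```python
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def rolling_variance_drain(daily_returns, window=63):
    """Annualized sigma^2 / 2 over each trailing window of daily returns."""
    drains = []
    for end in range(window, len(daily_returns) + 1):
        chunk = daily_returns[end - window:end]
        annual_var = statistics.pvariance(chunk) * TRADING_DAYS
        drains.append(annual_var / 2.0)
    return drains


# Hypothetical series: a calm stretch followed by a turbulent one.
calm = [0.005, -0.005] * 40
turbulent = [0.03, -0.03] * 40
drains = rolling_variance_drain(calm + turbulent)
print(f"first window: {drains[0]:.4f}, peak: {max(drains):.4f}")
```

The drain in the turbulent stretch is far larger than in the calm one because daily moves six times as big produce a variance thirty-six times as big.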
Spitznagel’s thesis is about this second term. If puts reduce portfolio variance by cutting off the worst drawdowns, the reduction in $\frac{\sigma^2}{2}$ can exceed the premium paid. A put costs money on average (it hurts the first moment). But by truncating the worst losses, it reduces the quadratic drag on the portfolio (it helps the second moment). The net effect on compound growth can be positive because the variance drain grows with the square of the loss. Preventing a few large drawdowns saves more in compounding terms than the cumulative premium costs.
Taleb makes a related point about fat tails. Standard option pricing models assume returns follow something close to a normal distribution. Under a normal distribution, a 50% crash in the S&P 500 is roughly a 4-sigma event, expected about once in 30,000 years. In reality, we have had three drawdowns of roughly 25% or more in the last 17 years alone, two of them exceeding 30%. Real markets have fat tails: extreme events are far more frequent than Gaussian models predict. This has a direct consequence for put pricing. If the market prices puts using models that underestimate tail probabilities, then deep OTM puts are systematically cheaper than they should be relative to the actual frequency of crashes. The VRP shows that puts are expensive relative to realized volatility in normal times. But they may be cheap relative to the true probability of the events they protect against. Taleb calls this the difference between Mediocristan (where Gaussian statistics work) and Extremistan (where they do not). The S&P 500 lives in Extremistan.
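The one-in-30,000 figure is what a Gaussian assigns to a 4-sigma downside move. A quick check using only the standard library, where treating each draw as one year is an assumption of this sketch:

```python
import math


def normal_tail_probability(sigmas):
    """One-sided probability of a move at least `sigmas` standard deviations below the mean."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2.0))


p4 = normal_tail_probability(4.0)
periods_between = 1.0 / p4  # expected number of draws between such events
print(f"P(4-sigma drop) = {p4:.3e}, about one in {periods_between:,.0f}")
```

Setting that Gaussian frequency against the crashes actually observed in the sample is the fat-tail argument in miniature.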
# AQR and Spitznagel test different strategies
This is where the debate breaks down. The two sides are not testing the same portfolio.
# What AQR tests
AQR tests portfolios where you sell some of your stocks to buy puts:
$$R_{\text{portfolio}} = (1-w) \cdot R_{\text{SPY}} + w \cdot R_{\text{puts}}$$
At $w = 1\%$: you hold 99% in stocks, 1% in puts. Total portfolio: 100%.
This always loses. You are taking money out of your best asset (stocks, which go up on average) and putting it into an asset with negative expected return (puts, which expire worthless most of the time). The arithmetic is straightforward and AQR is correct about it.
# What Spitznagel actually does
Spitznagel keeps 100% in stocks and buys puts with a small separate budget on top. Total exposure exceeds 100%:
$$R_{\text{portfolio}} = 1.0 \cdot R_{\text{SPY}} + w \cdot R_{\text{puts}}$$
This is a form of leverage because total exposure is above 100%. But it is not ordinary leverage. Ordinary leverage means borrowing money to buy more stocks. If you borrow to hold 130% in stocks, your gains are 30% bigger but your losses are also 30% bigger. Drawdowns get worse in proportion to the leverage.
Put-based leverage works differently. In calm markets, you bleed a small, known premium (the cost of the puts). In a crash, the puts pay off at 10x to 50x the premium paid, because a deep out-of-the-money put’s sensitivity to the market (its delta) increases as prices fall. A put that barely moves in normal markets becomes a powerful hedge during a crash. The leverage ratio of the position depends on the path of the market and increases precisely when you need it most.
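To make the delta mechanism concrete, here is a sketch that prices one deep OTM put before and after a crash. It uses the Black-Scholes formula purely as a convenient way to illustrate convexity, even though its Gaussian assumption is exactly what the fat-tail argument questions. The strike, volatilities, rate, and dates are hypothetical inputs, not backtest values.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put(spot, strike, years, vol, rate=0.02):
    """Black-Scholes price and delta of a European put (no dividends)."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    price = strike * math.exp(-rate * years) * norm_cdf(-d2) - spot * norm_cdf(-d1)
    delta = norm_cdf(d1) - 1.0
    return price, delta


# Hypothetical scenario: a put struck 25% below spot, before and after a 30% drop
# during which implied volatility also jumps from 25% to 60%.
price_before, delta_before = bs_put(spot=100.0, strike=75.0, years=0.5, vol=0.25)
price_after, delta_after = bs_put(spot=70.0, strike=75.0, years=0.4, vol=0.60)
payoff_multiple = price_after / price_before

print(f"before: price={price_before:.2f}, delta={delta_before:.3f}")
print(f"after:  price={price_after:.2f}, delta={delta_after:.3f}")
print(f"value multiple: {payoff_multiple:.1f}x")
```

Under these assumed inputs the put starts with a delta inside the deep OTM range used in our tests and ends with a delta many times larger in magnitude, which is the sense in which the hedge grows as the market falls.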
AQR’s published analysis does not test this portfolio construction.
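The two portfolio formulas can be written side by side. This is a direct transcription of the equations above into code; the function names and the single-period inputs are hypothetical and are not backtest output.

```python
def aqr_portfolio_return(r_spy, r_puts, w):
    """Puts funded by selling stock: (1 - w) * R_SPY + w * R_puts."""
    return (1.0 - w) * r_spy + w * r_puts


def spitznagel_portfolio_return(r_spy, r_puts, w):
    """Puts bought on top of a full stock position: 1.0 * R_SPY + w * R_puts."""
    return r_spy + w * r_puts


# Hypothetical single-period scenarios.
scenarios = {
    "calm": {"r_spy": 0.10, "r_puts": -1.00},    # puts expire worthless
    "crash": {"r_spy": -0.30, "r_puts": 20.00},  # puts return 20x their cost
}
w = 0.005  # 0.5% put budget

for name, s in scenarios.items():
    aqr = aqr_portfolio_return(s["r_spy"], s["r_puts"], w)
    spitz = spitznagel_portfolio_return(s["r_spy"], s["r_puts"], w)
    print(f"{name}: AQR framing {aqr:+.4f}, Spitznagel framing {spitz:+.4f}")
```

In any single period the two formulas differ by exactly $w \cdot R_{\text{SPY}}$. The tables in the Results section report how each construction behaves when it is rolled and compounded over the full sample.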
# Results
All tests use deep OTM puts (delta -0.10 to -0.02, 90 to 180 days to expiration, monthly roll) on real SPY options data from 2008 to 2025.
# AQR framing: sell stocks to fund puts
| Config | Annual Return | Excess vs SPY | Max Drawdown |
|---|---|---|---|
| SPY only | +11.11% | | -51.9% |
| 99.9% SPY + 0.1% deep OTM | +10.27% | -0.78% | -52.1% |
| 99.5% SPY + 0.5% deep OTM | +7.07% | -3.98% | -51.7% |
| 99% SPY + 1% deep OTM | +3.15% | -7.90% | -50.8% |
| 96.7% SPY + 3.3% deep OTM | -14.63% | -25.68% | -94.2% |
Every configuration underperforms. Performance degrades as the put allocation increases. At 3.3%, the strategy loses 14.63% per year and has a 94.2% max drawdown, worse than holding stocks with no hedge at all. This happens because the steady premium bleed compounds downward while the equity exposure is reduced, so long non-crash periods grind the portfolio value lower. AQR is right about this framing. But nobody, including Spitznagel, proposes it.
# Spitznagel framing: 100% stocks plus puts on top
| Config | Annual Return | Excess vs SPY | Max Drawdown |
|---|---|---|---|
| 100% SPY (baseline) | +11.11% | | -51.9% |
| 100% SPY + 0.05% deep OTM | +11.37% | +0.32% | -51.6% |
| 100% SPY + 0.1% deep OTM | +11.63% | +0.58% | -51.4% |
| 100% SPY + 0.2% deep OTM | +12.14% | +1.09% | -50.7% |
| 100% SPY + 0.5% deep OTM | +13.79% | +2.75% | -48.1% |
| 100% SPY + 1.0% deep OTM | +16.46% | +5.41% | -45.0% |
| 100% SPY + 2.0% deep OTM | +21.78% | +10.74% | -38.5% |
| 100% SPY + 3.3% deep OTM | +28.78% | +17.74% | -31.9% |
Every configuration outperforms. Both annual return and max drawdown improve at every budget level.
Consider the 0.5% budget. Total exposure is 100.5%. That is 0.5% of leverage. Ordinary 0.5% leverage on SPY would add about 0.05% of return and make drawdowns slightly worse. Instead we see +2.75% excess return and a 3.8 percentage point improvement in max drawdown. The reason is that the put’s sensitivity to the market is not constant. As the market falls, a deep OTM put moves from barely reacting (delta near zero) to hedging almost dollar for dollar (delta approaching -1.0). A small position becomes a large hedge exactly when you need it.
Standard OTM puts (closer to the money, delta -0.25 to -0.10) in the same framing:
| Config | Annual Return | Excess vs SPY | Max Drawdown |
|---|---|---|---|
| 100% SPY + 0.1% std OTM | +12.11% | +1.07% | -50.9% |
| 100% SPY + 0.5% std OTM | +16.16% | +5.11% | -47.0% |
| 100% SPY + 1.0% std OTM | +21.35% | +10.31% | -41.9% |
Both types of puts work. Deep OTM puts produce more hedge per dollar spent.
# Leverage breakdown
The following table shows the full picture: how much leverage each budget level creates, what it returns, the Sharpe ratio, and how much excess return you get per 1% of put budget. The Sharpe ratio uses a 4% risk-free rate.
| Strategy | Budget %/yr | Leverage | Annual % | Excess % | Return per 1% Budget | Max DD % | Vol % | Sharpe |
|---|---|---|---|---|---|---|---|---|
| 100% SPY (baseline) | 0.00 | 1.000x | 11.11 | +0.00 | | -51.9 | 20.0 | 0.356 |
| + 0.05% deep OTM | 0.05 | 1.001x | 11.37 | +0.32 | 6.5 | -51.6 | 19.7 | 0.374 |
| + 0.1% deep OTM | 0.10 | 1.001x | 11.63 | +0.58 | 5.8 | -51.4 | 19.4 | 0.393 |
| + 0.2% deep OTM | 0.20 | 1.002x | 12.14 | +1.09 | 5.5 | -50.7 | 18.9 | 0.430 |
| + 0.5% deep OTM | 0.50 | 1.005x | 13.79 | +2.75 | 5.5 | -48.1 | 17.7 | 0.552 |
| + 1.0% deep OTM | 1.00 | 1.010x | 16.46 | +5.41 | 5.4 | -45.0 | 16.6 | 0.750 |
| + 2.0% deep OTM | 2.00 | 1.020x | 21.78 | +10.74 | 5.4 | -38.5 | 17.4 | 1.024 |
| + 3.3% deep OTM | 3.30 | 1.033x | 28.78 | +17.74 | 5.4 | -31.9 | 22.0 | 1.128 |
Two things stand out. First, the excess return per 1% of put budget is stable. It ranges from 6.5 at the smallest budget to 5.4 at the largest, and sits at roughly 5.4 to 5.5 from the 0.2% budget upward. Each 1% of annual premium spent on deep OTM puts generates about 5.4% of excess return. The ratio barely moves across a 66-fold range of budgets, which means the convexity of the puts is consistent. It is not an artifact of a single configuration.
Second, the Sharpe ratio increases monotonically from 0.356 (SPY alone) to 1.128 (3.3% budget). The strategy improves risk-adjusted returns at every level, not just raw returns. At 0.5% budget, the Sharpe is 0.552 versus 0.356 for unhedged SPY.
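Both derived columns can be recomputed from the table itself. The sketch below copies the published rows and applies the stated 4% risk-free rate. The small gaps against the published Sharpe values are presumably rounding in the inputs, which is an assumption on our part.

```python
RISK_FREE = 4.0  # percent, as stated for the table

# (budget %, annual %, excess %, vol %, published Sharpe) copied from the table
rows = [
    (0.05, 11.37, 0.32, 19.7, 0.374),
    (0.10, 11.63, 0.58, 19.4, 0.393),
    (0.20, 12.14, 1.09, 18.9, 0.430),
    (0.50, 13.79, 2.75, 17.7, 0.552),
    (1.00, 16.46, 5.41, 16.6, 0.750),
    (2.00, 21.78, 10.74, 17.4, 1.024),
    (3.30, 28.78, 17.74, 22.0, 1.128),
]


def sharpe(annual_pct, vol_pct, rf_pct=RISK_FREE):
    """Sharpe ratio from annual return and volatility, both in percent."""
    return (annual_pct - rf_pct) / vol_pct


def return_per_budget(excess_pct, budget_pct):
    """Excess return earned per 1% of annual put budget."""
    return excess_pct / budget_pct


sharpes = [sharpe(a, v) for _, a, _, v, _ in rows]
ratios = [return_per_budget(e, b) for b, _, e, _, _ in rows]
max_gap = max(abs(s - row[4]) for s, row in zip(sharpes, rows))

for (b, *_), s, r in zip(rows, sharpes, ratios):
    print(f"budget {b:.2f}%: Sharpe {s:.3f}, return per 1% budget {r:.2f}")
```
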
# Parameter sensitivity
We ran a 36-combination grid search across DTE ranges, delta ranges, exit timing, and budget sizes. The top 10 configurations by Sharpe ratio (SPY Sharpe: 0.353):
| DTE | Delta | Exit DTE | Budget % | Annual % | Excess % | Max DD % | Vol % | Sharpe |
|---|---|---|---|---|---|---|---|---|
| 90-180 | (-0.15, -0.05) | 60 | 1.0 | 21.84 | +10.79 | -41.9 | 16.6 | 1.074 |
| 120-240 | (-0.10, -0.02) | 60 | 1.0 | 22.19 | +11.15 | -43.5 | 17.1 | 1.065 |
| 120-240 | (-0.10, -0.02) | 30 | 1.0 | 22.05 | +11.00 | -43.7 | 17.0 | 1.061 |
| 90-180 | (-0.15, -0.05) | 30 | 1.0 | 21.50 | +10.45 | -42.0 | 16.5 | 1.060 |
| 120-240 | (-0.15, -0.05) | 60 | 1.0 | 22.25 | +11.21 | -44.0 | 17.2 | 1.059 |
| 120-240 | (-0.15, -0.05) | 30 | 1.0 | 22.04 | +11.00 | -44.4 | 17.1 | 1.056 |
| 90-180 | (-0.10, -0.02) | 60 | 1.0 | 21.60 | +10.55 | -42.8 | 16.9 | 1.041 |
| 90-180 | (-0.10, -0.02) | 30 | 1.0 | 21.14 | +10.09 | -43.4 | 16.9 | 1.017 |
| 120-240 | (-0.15, -0.05) | 14 | 1.0 | 18.97 | +7.92 | -45.3 | 17.0 | 0.881 |
| 120-240 | (-0.10, -0.02) | 14 | 1.0 | 18.83 | +7.78 | -45.1 | 16.8 | 0.881 |
The patterns are clear. All top 10 are at the 1.0% budget, which is the highest in the grid. More convexity exposure produces better results. Exit at DTE 30 to 60 outperforms exit at DTE 14: rolling while the put still has one to two months left beats holding it to within two weeks of expiration. Both DTE ranges (90 to 180 and 120 to 240) appear in the top 10. The longer-dated range earns slightly higher returns, while the single best Sharpe comes from the 90 to 180 range. Both delta ranges work: (-0.15, -0.05) is ahead on Sharpe in the 90 to 180 range, and (-0.10, -0.02) is marginally ahead in the 120 to 240 range.
The single best configuration: DTE 90 to 180, delta (-0.15, -0.05), exit at DTE 60, 1% budget. This produces 21.84%/yr with a Sharpe of 1.074 and max drawdown of -41.9%. SPY’s Sharpe over the same period is 0.353.
Monthly rebalancing matters. Quarterly and semi-annual rebalancing eliminate most of the benefit.
# The risk is bounded
Taleb describes a strategy he calls the barbell: combine a large, safe position with a small, highly convex one, and avoid the middle. The Spitznagel portfolio is a barbell. The bulk (100%) is in a broad equity index. A small sliver (0.1% to 1%) is in deep OTM puts. The bulk earns the market return. The sliver has bounded downside (you can only lose the premium) and convex upside (the puts can return 10x to 50x during a crash). There is no middle ground position that replicates this payoff profile. A “medium risk” portfolio with 80% stocks and 20% bonds reduces your exposure to crashes but also reduces your exposure to the equity premium. The barbell keeps full exposure to the equity premium while adding crash protection through a completely different mechanism.
Ordinary leverage can wipe you out. If you borrow to hold 150% in stocks and the market drops 50%, you lose 75% of your equity. A margin call forces you to sell at the worst time.
Put-based leverage cannot do this. If you spend 0.5% of your portfolio on puts and those puts expire worthless, you lose 0.5%. That is the worst outcome per put position. The maximum loss is the premium paid, which you know at the time of purchase. There is no margin call. There is no scenario where the puts cause additional losses beyond the premium.
A 50% market decline on a portfolio of 100% SPY plus 0.5% in puts results in roughly a 48% loss, because the puts pay off during the decline. The same 50% decline on a margin-leveraged 100.5% SPY position results in a 50.25% loss. The put overlay reduces the drawdown. The margin position amplifies it.
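The margin side of this comparison is plain multiplication, and the put side is bounded by the premium. A minimal sketch, ignoring borrowing costs and using a hypothetical `put_multiple` parameter (the value of the puts at the end of the period divided by their cost):

```python
def margin_equity_loss(exposure, market_return):
    """Return on equity when `exposure` (1.5 means 150%) is held in stocks via borrowing.
    Borrowing cost is ignored in this sketch."""
    return exposure * market_return


def put_overlay_return(market_return, budget, put_multiple):
    """100% stocks plus `budget` spent on puts that end worth `put_multiple` times
    their cost. put_multiple = 0 means the puts expire worthless."""
    return market_return + budget * (put_multiple - 1.0)


loss_150 = margin_equity_loss(1.5, -0.50)            # 150% stocks, market -50%
loss_1005 = margin_equity_loss(1.005, -0.50)         # 100.5% stocks, market -50%
worst_put_drag = put_overlay_return(0.0, 0.005, 0.0)  # flat market, puts worthless

print(loss_150, loss_1005, worst_put_drag)
```

Whatever the market does, the puts can subtract at most the budget from the period's return, while the margin position scales every loss by the exposure.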
In our data, returns and drawdowns improve at every budget level simultaneously. The premium bleed is linear and small (a first-moment cost). The variance reduction from truncating the left tail is quadratic (a second-moment benefit). On our sample of three crashes in 17 years, the convex payoffs during crashes more than compensate for the steady premium cost.
A fair objection: 17 years with three crashes may overstate the long-run crash frequency. A 20-year period with no crashes would bleed premium with no payoff. The response is that the premium is small (0.1% to 0.5% per year), so even infrequent crashes are enough to break even.
We tested this objection directly. First, robustness: all 36 parameter combinations in our grid search beat SPY. The worst configuration still outperforms by +1.63%/yr. All 36 have a higher Sharpe ratio than unhedged SPY. This is not a result that depends on picking the right parameters. You can be wrong about every parameter choice and still come out ahead.
Second, out-of-sample. We split the data in half and ran the same default configuration (0.5% budget, DTE 90 to 180, delta -0.10 to -0.02) on both periods without re-optimizing.
| Period | Strategy | SPY B&H | Excess | Max DD |
|---|---|---|---|---|
| 2008 to 2017 | 9.72% | 7.29% | +2.43% | -48.1% |
| 2017 to 2025 | 18.01% | 14.92% | +3.09% | -22.3% |
| Full period | 13.79% | 11.05% | +2.75% | -48.1% |
The strategy beats SPY in both halves. The first half contains the GFC (the largest crash in the sample). The second half contains COVID and the 2022 bear market. The excess return is positive in both periods, which means the result is not driven by a single event.
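The two statistics in this table, annualized return and max drawdown, can be computed per sub-period from a portfolio value series. The sketch below uses hypothetical year-end values, not backtest output, and helper names of our own.

```python
def annualized_return(values, years):
    """Compound annual growth rate from the first to the last portfolio value."""
    return (values[-1] / values[0]) ** (1.0 / years) - 1.0


def max_drawdown(values):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


# Hypothetical year-end portfolio values.
curve = [100.0, 60.0, 80.0, 110.0, 130.0, 120.0, 160.0]
mid = len(curve) // 2
first_half, second_half = curve[:mid + 1], curve[mid:]

for label, part in (("first half", first_half), ("second half", second_half)):
    years = len(part) - 1
    print(f"{label}: CAGR {annualized_return(part, years):+.2%}, "
          f"max DD {max_drawdown(part):.1%}")
```

The split point is shared by both halves so that each sub-period starts from the value where the previous one ended.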
The strongest argument for overfitting remains that the entire edge comes from three crashes. If those crashes had been 20% milder, or if the next 17 years produce no drawdown worse than 25%, the strategy may underperform. What we can say is that the strategy is robust to parameter choice and survives an out-of-sample split.
# Implementation
Our backtester uses monthly rebalancing: buy the lowest-premium deep OTM put available within the target delta and expiration range. This is a simple, mechanical strategy.
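The selection rule can be sketched as a filter followed by a minimum. The `PutQuote` type, the `pick_put` function, and the sample quotes are hypothetical illustrations of the rule described above; they are not the backtester's actual API.

```python
from dataclasses import dataclass


@dataclass
class PutQuote:
    strike: float
    delta: float  # negative for puts
    dte: int      # days to expiration
    ask: float    # premium per share


def pick_put(chain, delta_range=(-0.10, -0.02), dte_range=(90, 180)):
    """Return the lowest-premium put inside the delta and DTE windows, or None."""
    lo_delta, hi_delta = delta_range
    lo_dte, hi_dte = dte_range
    eligible = [
        q for q in chain
        if lo_delta <= q.delta <= hi_delta and lo_dte <= q.dte <= hi_dte
    ]
    return min(eligible, key=lambda q: q.ask, default=None)


# Hypothetical quotes for illustration only.
chain = [
    PutQuote(strike=400, delta=-0.30, dte=120, ask=9.50),  # too close to the money
    PutQuote(strike=350, delta=-0.08, dte=120, ask=2.10),
    PutQuote(strike=330, delta=-0.04, dte=150, ask=1.20),  # cheapest eligible
    PutQuote(strike=300, delta=-0.01, dte=150, ask=0.40),  # too far out of the money
    PutQuote(strike=330, delta=-0.05, dte=30, ask=0.30),   # expires too soon
]
chosen = pick_put(chain)
print(chosen)
```

Run once per month, the rule needs no forecast: the delta and expiration windows define the candidates, and price alone breaks the tie.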
Funding assumption: the put budget is treated as an external, fixed annual premium (e.g., 0.5% of portfolio value) rather than being funded by selling SPY. This is why total exposure can exceed 100%. If you instead fund the put budget by reducing SPY, you recover the AQR framing and the results degrade.
Methodology note: no attempt is made to optimize timing; the strategy is purely rules-based. Real-world frictions (bid/ask spreads, slippage, and taxes) would reduce headline returns but should not remove the convexity effect.
Universa’s actual implementation is more sophisticated. They manage rolls continuously to maintain their desired exposure profile. They reinvest put profits into stocks at crash lows, buying when prices are depressed. They hedge across multiple markets, not just the S&P 500. If those refinements add value, our mechanical backtest understates the performance of the actual strategy.
# Code
The backtester and all notebooks are open source:
github.com/lambdaclass/options_backtester
- spitznagel_case.ipynb: full Spitznagel analysis with both framings, AQR vs Universa comparison
- paper_comparison.ipynb: 10 strategies vs academic claims
# References
- Carr, P. and Wu, L. (2009). Variance Risk Premiums. Review of Financial Studies, 22(3).
- Ilmanen, A. and Israelov, R. (2018). Chasing Your Own Tail (Risk). AQR White Paper.
- Israelov, R. (2017). Pathetic Protection: The Elusive Benefits of Protective Puts. J. Alternative Investments.
- Spitznagel, M. (2021). Safe Haven: Investing for Financial Storms. Wiley.
- Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable. Random House.
- Taleb, N. N. (2012). Antifragile: Things That Gain from Disorder. Random House.
- Taleb, N. N. (2020). Statistical Consequences of Fat Tails. STEM Academic Press.
- Whaley, R. (2002). Return and Risk of CBOE Buy Write Monthly Index. J. Derivatives.
- See REFERENCES.md for the full list.

[1] SOA Research Brief, Table 3, lists: “Oct 9, 2007 Mar 9, 2009 17 −2% −59% 68%.” See SOA Research Brief (Apr 16, 2020).

[2] The same brief notes: “the S&P 500 cratered on March 23, down 34% from its February 19 level.” See SOA Research Brief (Apr 16, 2020).

[3] Bloomberg reports the fund “returned 3,612% in March” and that this came “according to an investor letter … obtained by Bloomberg.” See Taleb-Advised Universa Tail Fund Returned 3,600% in March.