Detecting Crashes with Fat-Tail Statistics
Financial markets don’t follow normal distributions. That is a claim about frequency, not just theory: it tells you how often catastrophic events happen. Under a naive Gaussian model, a crisis on the scale of 2008 lands so deep in the tails that standard risk models treat it as effectively impossible. It happened on a Tuesday.
The problem is that we keep using tools designed for thin-tailed worlds. Value at Risk (VaR) models that assume normality. Risk metrics that treat the 2008 crash as an “outlier” rather than a regular feature of financial returns.
I built fatcrash, a Rust+Python toolkit with 15 classical methods, to test whether fat-tail statistical methods can detect crashes before they happen. The performance-critical math (fitting, simulation, all rolling estimators) runs in Rust via PyO3; everything else (data, viz, CLI) is Python.
#What are fat tails?
A fat-tailed distribution is one where extreme events happen far more often than a bell curve (Gaussian distribution) would predict. In a normal distribution, an event five standard deviations from the mean is essentially impossible, roughly a one-in-3.5-million chance. In a fat-tailed distribution, such events are still uncommon, but nowhere near impossible: they show up regularly in financial data.
The technical way to describe this is through the tail index, usually written $\alpha$. A fat-tailed distribution follows a power law in the extremes: the probability of a loss larger than $x$ decays as $P(X > x) \sim x^{-\alpha}$. The smaller the $\alpha$, the fatter the tail and the more likely extreme events are.
Here is a rough guide to what different values of $\alpha$ mean:
- $\alpha < 2$: Infinite variance. The distribution is so fat-tailed that the variance doesn’t converge with more data. Standard statistics like standard deviation and correlation become unreliable. This is Cauchy distribution territory.
- $\alpha$ between 2 and 4: Finite variance but infinite kurtosis. Kurtosis measures how “peaked” a distribution is and how heavy its tails are. When kurtosis is infinite, sample estimates of it are unstable and misleading. This is where most financial assets live.
- $\alpha > 4$: Relatively thin tails. Still fatter than Gaussian, but manageable with conventional tools.
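The gap between these regimes is easy to quantify. A minimal sketch (assuming `scipy` is available; it is not part of fatcrash) comparing the odds of a 5-sigma event under a Gaussian and under a Student-t with 3 degrees of freedom, scaled to the same standard deviation:

```python
import numpy as np
from scipy import stats

# P(X > 5 sigma) under a standard normal: about 1 in 3.5 million.
p_gauss = stats.norm.sf(5)

# Student-t with 3 df has std dev sqrt(3), so 5 sigma is 5*sqrt(3) in t-units.
p_t = stats.t.sf(5 * np.sqrt(3), df=3)

print(f"Gaussian:  1 in {1 / p_gauss:,.0f}")
print(f"t(3):      1 in {1 / p_t:,.0f}")  # thousands of times more likely
```

The Student-t with 3 degrees of freedom is a common stand-in for asset returns with $\alpha \approx 3$; the exact ratio depends on the distribution, but the orders-of-magnitude gap is the point.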
The 15 classical methods in fatcrash fall into three groups: bubble detection (finding the specific price pattern that precedes a crash), regime detection (spotting shifts in how the market behaves over time, including momentum reversals and volatility cascades), and tail estimation (measuring how fat the tails actually are). Let’s walk through each group.
#Bubble detection
These methods look for structural patterns in prices, not statistics of returns. A bubble is a regime of super-exponential growth, prices rising faster and faster, that eventually becomes unsustainable.
#LPPLS: detecting bubbles before they burst
The Log-Periodic Power Law Singularity model takes a fundamentally different approach from statistical methods. Instead of measuring properties of returns, it detects a specific pattern in prices: the bubble signature.
The theory, developed by Didier Sornette at ETH Zurich and described in his book Why Stock Markets Crash, proposes that during a bubble, prices follow super-exponential growth decorated with accelerating oscillations that converge toward a critical time $t_c$, the most likely crash date. Think of it like a wine glass vibrating at increasing frequency before it shatters.
$$\ln p(t) = A + B(t_c - t)^m + C(t_c - t)^m \cos(\omega \ln(t_c - t) + \phi)$$
In plain language: the logarithm of price is the sum of a smooth power-law growth (the $B$ term) and an oscillation whose frequency accelerates as you approach $t_c$ (the cosine term). The seven parameters encode specific dynamics:
- $t_c$: critical time (when the bubble is most likely to end)
- $m$: power law exponent (must be 0.1-0.9 for a valid bubble)
- $\omega$: log-periodic frequency (must be 6-13)
- $B < 0$: required, indicates super-exponential growth
- $A, C, \phi$: amplitude and phase parameters
Fitting this is computationally expensive. In practice the cosine term is expanded into cosine and sine components, replacing $C$ and $\phi$ with linear coefficients $C_1$ and $C_2$ (the Filimonov-Sornette reformulation). For each candidate $(t_c, m, \omega)$, the linear parameters $(A, B, C_1, C_2)$ are then solved analytically via OLS, and the nonlinear search uses a population-based stochastic optimizer over the remaining 3D space. The Sornette filter rejects fits that don’t satisfy the physical constraints.
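To make the OLS step concrete, here is a minimal sketch (not the fatcrash implementation) of the analytic linear solve for one candidate $(t_c, m, \omega)$, run on noise-free synthetic data with known parameters:

```python
import numpy as np

def lppls_linear_fit(t, log_p, tc, m, w):
    """Solve (A, B, C1, C2) by OLS for one candidate (tc, m, w)."""
    dt = tc - t                        # time to critical point, must stay > 0
    f = dt ** m                        # power-law term
    g = f * np.cos(w * np.log(dt))     # log-periodic cosine component
    h = f * np.sin(w * np.log(dt))     # log-periodic sine component
    X = np.column_stack([np.ones_like(t), f, g, h])
    coef, *_ = np.linalg.lstsq(X, log_p, rcond=None)
    sse = float(np.sum((log_p - X @ coef) ** 2))
    return coef, sse

# Synthetic bubble with known parameters, then recover them exactly.
t = np.linspace(0.0, 0.9, 500)
dt = 1.0 - t
log_p = 1.0 - 2.0 * dt**0.5 + 0.1 * dt**0.5 * np.cos(9.0 * np.log(dt))
coef, sse = lppls_linear_fit(t, log_p, tc=1.0, m=0.5, w=9.0)
print(coef)  # recovers (A, B, C1, C2) = (1.0, -2.0, 0.1, 0.0)
```

The outer stochastic search over $(t_c, m, \omega)$ and the Sornette filter are omitted; this only shows why the 7-parameter fit reduces to a 3D nonlinear problem.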
The DS LPPLS confidence indicator fits this model across many overlapping time windows. If a high fraction of windows produce valid bubble fits, confidence is high.
In practice: With the tightened Nielsen (2024) filter (omega [6,13]) and a critical-time proximity constraint, LPPLS achieves 74% recall and 37% precision (F1=50%) across 39 drawdowns. It detects the bubble regime itself (super-exponential growth + oscillations), which precedes both small corrections and major crashes. The LPPLS confidence indicator (multi-window aggregation) reaches 90% recall but 29% precision. The high false positive rate is inherent: LPPLS frequently detects “bubble signatures” during normal bull markets because super-exponential growth patterns are common in trending markets.
#GSADF: explosive unit roots
The Generalized Sup ADF (Augmented Dickey-Fuller) test, introduced by Phillips, Shi, and Yu (2015), detects explosive behavior in prices. To understand it, you need one concept: a unit root. A time series has a unit root if it follows a random walk, meaning today’s value is yesterday’s value plus random noise, with no tendency to return to a long-run average. An explosive series goes further: it grows faster than a random walk, each day’s value is a multiple of yesterday’s, like compound interest gone wild.
GSADF runs backward-expanding ADF unit root tests across all possible start and end dates, taking the supremum (the largest test statistic). If this supremum exceeds Monte Carlo critical values, the series is explosive.
GSADF is complementary to LPPLS. LPPLS detects the specific log-periodic oscillation pattern. GSADF detects any form of explosive growth, regardless of the oscillation structure.
In practice: GSADF detected 38% of drawdowns overall, but 59% of medium-sized drawdowns (15-30%). It achieves 38% precision, the highest among all methods, because explosive unit root tests are more specific than distributional measures.
#Regime detection
These methods look at how the temporal structure of returns changes over time. Before a crash, markets often shift from noisy, mean-reverting behavior to strong, persistent trending, a regime change that tail estimators can’t see because they only measure distributional shape, not temporal dependence.
#DFA: detrended fluctuation analysis
Detrended Fluctuation Analysis, introduced by Peng et al. (1994), measures long-range dependence in non-stationary time series, that is, whether today’s returns are correlated with returns from days or weeks ago. The method works by dividing the integrated series (cumulative sum of returns) into windows, fitting a local polynomial trend in each window, computing the root-mean-square residual (how much the data deviates from the local trend), and checking how that residual scales with window size:
$$F(n) \sim n^{\alpha_{\text{DFA}}}$$
This formula says: the fluctuation $F$ at scale $n$ (window size) grows as a power of $n$. The scaling exponent $\alpha_{\text{DFA}}$ classifies the dynamics:
- $\alpha_{\text{DFA}} = 0.5$: uncorrelated (random walk), no memory in the series
- $\alpha_{\text{DFA}} > 0.5$: persistent (trends tend to continue), an up day makes another up day more likely
- $\alpha_{\text{DFA}} < 0.5$: anti-persistent (mean-reverting), an up day makes a down day more likely
Before a crash, markets often transition from mean-reverting to persistent dynamics. DFA picks up this regime shift. The key advantage over simpler methods is the detrending step: by removing local polynomial trends before measuring fluctuations, DFA separates genuine long-range dependence from spurious correlations caused by local trends.
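The windowing-and-detrending recipe above can be sketched in a few lines (a minimal version with linear detrending and illustrative scale choices, not the fatcrash Rust implementation):

```python
import numpy as np

def dfa_exponent(returns, scales=(16, 32, 64, 128)):
    y = np.cumsum(returns - np.mean(returns))     # integrated profile
    fluct = []
    for n in scales:
        segs = y[: (len(y) // n) * n].reshape(-1, n)
        t = np.arange(n)
        # Remove a local linear trend in each window, keep the mean squared residual.
        ms = [np.mean((s - np.polyval(np.polyfit(t, s, 1), t)) ** 2) for s in segs]
        fluct.append(np.sqrt(np.mean(ms)))
    # F(n) ~ n^alpha, so the slope in log-log space is the DFA exponent.
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope

rng = np.random.default_rng(1)
alpha_white = dfa_exponent(rng.normal(size=4096))
print(alpha_white)  # close to 0.5: no memory in white noise
```

Feeding in persistent (trending) data instead of white noise would push the exponent above 0.5, which is the pre-crash signature discussed above.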
In practice: DFA was the best non-bubble crash detector in our tests (82% recall, 22% precision, F1=34%). It handles non-stationarity better than Hurst’s R/S analysis because the detrending step removes local polynomial trends before measuring fluctuations. The low precision reflects the fact that persistent dynamics are common in financial markets even outside crash windows.
#Hurst exponent: persistence detection
The Hurst exponent, introduced by Harold Edwin Hurst in 1951 while studying Nile river flooding patterns, measures long-range dependence via rescaled range (R/S) analysis. For a time series of length $n$, compute the range of cumulative deviations from the mean, rescale by the standard deviation, and measure how $R/S$ scales with $n$:
$$\frac{R}{S} \sim n^H$$
This formula says: the rescaled range grows as a power of the sample size. $H = 0.5$ is a random walk. $H > 0.5$ is persistent. $H < 0.5$ is mean-reverting. In many liquid return series, the long-run baseline is closer to $H \approx 0.5$, and R/S estimates can be biased upward in finite samples. A shift toward higher estimated $H$ before a crash means the market is trending more strongly, which often accompanies bubble formation.
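The R/S recipe translates directly to code. A minimal sketch (illustrative subsample sizes, not the fatcrash implementation):

```python
import numpy as np

def rs_hurst(x, sizes=(32, 64, 128, 256)):
    log_rs = []
    for n in sizes:
        chunks = x[: (len(x) // n) * n].reshape(-1, n)
        rs_vals = []
        for c in chunks:
            dev = np.cumsum(c - c.mean())   # cumulative deviations from the mean
            r = dev.max() - dev.min()       # range of the cumulative deviations
            s = c.std()                     # rescale by the standard deviation
            if s > 0:
                rs_vals.append(r / s)
        log_rs.append(np.log(np.mean(rs_vals)))
    # R/S ~ n^H, so H is the slope in log-log space.
    h, _ = np.polyfit(np.log(sizes), log_rs, 1)
    return h

rng = np.random.default_rng(2)
h_iid = rs_hurst(rng.normal(size=8192))
print(h_iid)  # near 0.5 for iid noise, modulo the finite-sample upward bias
```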
In practice: Hurst detected 59% of drawdowns. It is simpler than DFA but less robust to non-stationarity, because it doesn’t remove local trends before measuring the range.
#Spectral exponent: frequency domain
The GPH log-periodogram regression, introduced by Geweke and Porter-Hudak (1983), estimates the long-memory parameter $d$ from the frequency domain. Instead of looking at how correlations decay over time (as DFA and Hurst do), it looks at how much power the signal has at different frequencies.
The relationship to Hurst: $d = H - 0.5$. Positive $d$ indicates long memory (persistence). Think of it as measuring the same phenomenon (long-range dependence) but through a different lens: time domain vs. frequency domain.
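A minimal sketch of the GPH regression (textbook form with the common $m = \sqrt{n}$ bandwidth, not the fatcrash implementation): regress the log periodogram at low Fourier frequencies on $\log(4\sin^2(\lambda/2))$; the negative of the slope estimates $d$.

```python
import numpy as np

def gph_d(x):
    n = len(x)
    m = int(np.sqrt(n))                            # common bandwidth choice
    lam = 2 * np.pi * np.arange(1, m + 1) / n      # low Fourier frequencies
    fft = np.fft.fft(x - np.mean(x))
    periodogram = np.abs(fft[1 : m + 1]) ** 2 / (2 * np.pi * n)
    regressor = np.log(4 * np.sin(lam / 2) ** 2)
    slope, _ = np.polyfit(regressor, np.log(periodogram), 1)
    return -slope                                   # d; recall H = d + 0.5

rng = np.random.default_rng(3)
d_white = gph_d(rng.normal(size=4096))
print(d_white)  # near 0: white noise has no long memory
```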
In practice: It detected 28% of drawdowns, comparable to Hill. The frequency-domain approach is theoretically elegant but doesn’t add much beyond what DFA already captures for crash detection.
#Momentum and reversal
Momentum is the tendency for assets that have been rising to keep rising, and assets that have been falling to keep falling. Jegadeesh and Titman (1993) documented this effect in equities: buying past winners and selling past losers produces positive returns over 3-12 month horizons. For crash detection, the signal is not momentum itself but its reversal. When long-term momentum is strongly positive (the asset has been trending up for months) but short-term momentum turns sharply negative (the last few weeks show a sudden decline), the divergence signals a potential crash — the trend is breaking.
The reversal signal is computed as:
$$\text{reversal} = \text{mom}_{\text{long}} - \text{mom}_{\text{short}}$$
where $\text{mom}(k) = \ln(P_t / P_{t-k})$ is the log-return over $k$ periods. A large positive reversal means the long-term trend is still up but the short-term move is down — exactly the pattern seen at the onset of major crashes, when a strong bull market suddenly reverses.
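A minimal sketch of the signal (the 252-day/21-day horizons are illustrative choices, not necessarily fatcrash's defaults), on a synthetic year-long uptrend that reverses in its final month:

```python
import numpy as np

def reversal(prices, long_k=252, short_k=21):
    mom = lambda k: np.log(prices[-1] / prices[-1 - k])  # k-period log-return
    return mom(long_k) - mom(short_k)

# A year-long uptrend (+30% in log terms) followed by a -10% month.
up = 100 * np.exp(np.linspace(0.0, 0.30, 252))
down = up[-1] * np.exp(np.linspace(0.0, -0.10, 22))[1:]
prices = np.concatenate([up, down])
sig = reversal(prices)
print(sig)  # positive: long-term trend still up, short-term move down
```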
Scowcroft and Sefton (2005) showed that momentum returns are partly compensation for crash risk: momentum strategies are profitable on average but suffer catastrophic losses during market reversals. This is the same asymmetry in reverse — the crash unwinds the positions that momentum built up.
In practice: Momentum reversal is not used as a standalone detector (it doesn’t have its own precision/recall row). Instead, it feeds into the combined detector as an independent signal. A large positive reversal — strong 12-month returns but negative 1-month returns — is a danger sign that the existing regime methods (DFA, Hurst) can’t see because they measure temporal dependence, not price-level divergence.
#Price velocity: cascade detection
Price velocity measures the rate of change of realized volatility — not how volatile the market is, but how fast volatility is accelerating. A sudden spike in volatility acceleration often signals a forced-liquidation cascade: margin calls triggering sales, which trigger more margin calls, which trigger more sales.
$$\text{velocity} = \frac{\sigma_t - \sigma_{t-\text{lag}}}{\sigma_{t-\text{lag}}}$$
where $\sigma_t$ is the realized volatility (standard deviation of recent returns) and $\text{lag}$ is the lookback for the rate of change. When velocity exceeds a threshold, the market is in a self-reinforcing volatility spiral.
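A minimal sketch of the formula (window and lag values are illustrative), on synthetic returns where volatility jumps abruptly:

```python
import numpy as np

def price_velocity(returns, window=21, lag=5):
    sigma_now = np.std(returns[-window:])               # current realized vol
    sigma_then = np.std(returns[-window - lag : -lag])  # realized vol `lag` days ago
    return (sigma_now - sigma_then) / sigma_then

rng = np.random.default_rng(4)
calm = rng.normal(scale=0.01, size=100)   # quiet regime
spike = rng.normal(scale=0.04, size=10)   # volatility quadruples abruptly
v = price_velocity(np.concatenate([calm, spike]))
print(v)  # positive: volatility is accelerating
```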
The Feb 5, 2018 “Volmageddon” is the canonical example: the VIX doubled in a single day, triggering the liquidation of short-volatility ETNs (XIV, SVXY), which forced further VIX buying, which triggered more liquidations. The Sep 2019 repo rate spike followed a similar pattern in interest rate markets. In both cases, the price velocity — the acceleration of volatility, not its level — distinguished a mechanical cascade from ordinary high-vol conditions.
In practice: Like momentum reversal, price velocity is not a standalone detector. It feeds into the combined detector as an independent signal. Its value is in catching a specific failure mode that other methods miss: the mechanical cascade, where the crash causes itself through forced selling. Tail estimators measure the distribution of returns; DFA and Hurst measure persistence; velocity measures the feedback loop in real time.
#Tail estimation
These methods directly measure the fatness of the tails, how extreme the extremes really are. They answer questions like: “How often should we expect a 10% daily loss?” and “Is the variance of this distribution even finite?”
#Hill estimator: measuring tail heaviness
The Hill estimator (Hill, 1975) is the most widely used tail index estimator. It fits a power law to the extreme values of a distribution and estimates the exponent $\alpha$.
The estimator works by sorting the data from largest to smallest, taking the $k$ largest observations (called order statistics, which is just a fancy name for sorted values), and computing:
$$\hat{\alpha} = k \left( \sum_{i=1}^{k} \ln \frac{X_{(i)}}{X_{(k)}} \right)^{-1}$$
where $X_{(1)} \geq X_{(2)} \geq \ldots$ are the order statistics. In words: take the $k$ biggest values, compute how far each is from the $k$-th largest (in log scale), average those distances, and invert. A small average log-gap means the extreme values are tightly packed (thin tail), and a large gap means they’re spread out (fat tail).
The choice of $k$ matters enormously. Too small and the estimate is noisy (not enough data points). Too large and you’re including observations from the body of the distribution, not the tail. A Hill plot ($\alpha$ vs. $k$) helps find the plateau where the estimate stabilizes.
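A minimal sketch of the estimator (not the fatcrash implementation), verified on a Pareto sample with a known tail index:

```python
import numpy as np

def hill_alpha(x, k):
    order = np.sort(np.abs(x))[::-1]                     # descending order statistics
    return k / np.sum(np.log(order[:k] / order[k - 1]))  # invert the mean log-gap

# Standard Pareto sample with known tail index alpha = 3.
rng = np.random.default_rng(5)
sample = rng.pareto(3.0, size=100_000) + 1.0
a_hat = hill_alpha(sample, k=1000)
print(a_hat)  # close to 3
```

Sweeping `k` and plotting `hill_alpha(sample, k)` reproduces the Hill plot described above.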
In practice: Hill alpha is useful for characterization: it tells you what kind of distribution you’re dealing with. But as a standalone crash predictor, it’s noisy. In our tests, it only detected 28% of drawdowns.
#Kappa metrics: how far from Gaussian?
Two metrics answer this question from different angles.
Taleb’s kappa, introduced in Statistical Consequences of Fat Tails by Nassim Nicholas Taleb, measures how fast uncertainty shrinks under aggregation: specifically, how the mean absolute deviation (MAD) of a sum of $n$ observations grows as $n$ increases. For well-behaved distributions, the MAD of the sum grows like $\sqrt{n}$, so averages stabilize quickly. For fat-tailed ones, it grows faster, because new extreme observations keep pulling the average around. The formula compares the MAD at two aggregation levels $n_0$ and $n$:
$$\kappa = 2 - \frac{\log n - \log n_0}{\log M(n) - \log M(n_0)}$$
where $M(n)$ is the MAD for $n$ summands. For a Gaussian, $\kappa = 0$ (fast convergence). For a Cauchy distribution, $\kappa = 1$ (no convergence at all). Values between 0 and 1 measure the degree of fat-tailedness.
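A Monte Carlo sketch of the formula (sample sizes and the Gaussian test case are illustrative; fatcrash's estimator may differ in detail):

```python
import numpy as np

def taleb_kappa(sampler, n0=1, n=30, trials=100_000, seed=6):
    rng = np.random.default_rng(seed)
    def mad_of_sum(k):
        s = sampler(rng, (trials, k)).sum(axis=1)   # sums of k draws
        return np.mean(np.abs(s - s.mean()))        # MAD of those sums
    m0, m1 = mad_of_sum(n0), mad_of_sum(n)
    return 2 - (np.log(n) - np.log(n0)) / (np.log(m1) - np.log(m0))

gauss = lambda rng, shape: rng.normal(size=shape)
k_gauss = taleb_kappa(gauss)
print(k_gauss)  # near 0: Gaussian MAD grows like sqrt(n)
```

Swapping in a fat-tailed sampler (e.g. a low-degrees-of-freedom Student-t) pushes the estimate toward 1.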
Max-stability kappa takes a different approach rooted in extreme value theory. The intuition: in a fat-tailed distribution, the single most extreme observation dominates everything. If you split your data into subsamples and find the maximum of each subsample, those subsample maxima will be much smaller than the overall maximum, because the one truly extreme value ended up in just one subsample. In a Gaussian distribution, the subsample maxima would be closer to the overall maximum.
Formally: split your data into $n$ subsamples. Find the maximum of each subsample. Compare the mean of those maxima to the overall maximum:
$$\kappa_{\text{max}} = \frac{\text{mean of subsample maxima}}{\text{overall maximum}}$$
There is no simple closed-form Gaussian benchmark for this ratio (Gaussian maxima grow only logarithmically with sample size), so in practice the Gaussian reference level is estimated by Monte Carlo simulation. For fat-tailed distributions, $\kappa_{\text{max}}$ falls below that simulated benchmark because the overall maximum is far more extreme than the typical subsample maximum.
The ratio $\kappa_{\text{max}} / \text{benchmark}$ is the signal:
- Near 1.0: behaves Gaussian
- Below 0.8: significantly fat-tailed
- Below 0.5: extremely fat-tailed, crisis regime
fatcrash implements both variants.
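The max-stability ratio itself fits in a few lines. A minimal sketch (10 subsamples is an illustrative choice) comparing a Gaussian sample against a very fat-tailed Student-t sample:

```python
import numpy as np

def max_stability_ratio(x, n_subsamples=10):
    chunks = np.array_split(np.abs(x), n_subsamples)
    sub_maxima = [c.max() for c in chunks]            # max of each subsample
    return np.mean(sub_maxima) / np.max(np.abs(x))    # vs. the overall max

rng = np.random.default_rng(7)
r_gauss = max_stability_ratio(rng.normal(size=5000))       # Gaussian reference
r_fat = max_stability_ratio(rng.standard_t(2, size=5000))  # fat-tailed sample
print(r_gauss, r_fat)  # the fat-tailed ratio is lower
```

In the full method, the Gaussian case is simulated many times to build the benchmark distribution, rather than drawn once as here.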
In practice: Max-stability kappa was the best tail-based method in our tests (49% overall detection rate). It’s more robust than Hill because it doesn’t depend on choosing $k$, and it directly benchmarks against Gaussian via Monte Carlo simulation. Taleb’s kappa detected 33% of drawdowns but is more useful for long-term characterization than short-term prediction.
#Pickands estimator: domain-agnostic tail index
The Pickands estimator (Pickands, 1975) estimates the extreme value index $\gamma$ using just three order statistics, making it valid for all three domains of attraction (Frechet, Gumbel, Weibull, explained below in the EVT section):
$$\hat{\gamma} = \frac{1}{\ln 2} \ln \frac{X_{(k)} - X_{(2k)}}{X_{(2k)} - X_{(4k)}}$$
In words: look at the gaps between the $k$-th, $2k$-th, and $4k$-th largest values. If the gap between the top values is much larger than the gap further down, the tail is fat ($\gamma > 0$). If the gaps are similar, the tail is exponential ($\gamma \approx 0$). If the top gap is smaller, the tail is bounded ($\gamma < 0$).
Unlike Hill, which assumes the tail is Pareto (Frechet domain only), Pickands works regardless of the tail type.
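A minimal sketch of the three-order-statistic formula (not the fatcrash implementation), checked on samples from two different domains of attraction:

```python
import numpy as np

def pickands_gamma(x, k):
    order = np.sort(np.abs(x))[::-1]           # descending order statistics
    num = order[k - 1] - order[2 * k - 1]      # gap between X_(k) and X_(2k)
    den = order[2 * k - 1] - order[4 * k - 1]  # gap between X_(2k) and X_(4k)
    return np.log(num / den) / np.log(2)

rng = np.random.default_rng(8)
g_exp = pickands_gamma(rng.exponential(size=50_000), k=500)      # Gumbel: gamma ~ 0
g_par = pickands_gamma(rng.pareto(2.0, size=50_000) + 1, k=500)  # Frechet: gamma ~ 1/2
print(g_exp, g_par)
```

For a Pareto tail, $\gamma = 1/\alpha$, so the $\alpha = 2$ sample should give $\gamma$ near 0.5. The estimator is noisier than Hill, which is the price of its generality.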
In practice: Pickands detected 49% of drawdowns, matching max-stability kappa. Its domain-agnostic nature makes it a useful cross-check on Hill.
#DEH moment estimator
The Dekkers-Einmahl-de Haan moment estimator (Dekkers, Einmahl, and de Haan, 1989) uses first and second moments (averages and averages of squares) of log-spacings between order statistics. Like Pickands, it is valid for all domains of attraction, but it uses more data points from the tail, which makes it less volatile.
In practice: It detected 46% of drawdowns.
#QQ estimator
The QQ estimator computes the tail index from the slope of a log-log QQ plot (quantile-quantile plot) against exponential quantiles. A QQ plot compares the observed distribution against a theoretical one; if the points fall on a straight line, the distributions match. The slope of that line in log-log space gives you the tail index.
In practice: It detected 38% of drawdowns.
#Maximum-to-Sum ratio
The Maximum-to-Sum ratio is a direct diagnostic for whether extreme observations dominate the second moment. For $n$ observations, compute:
$$R_n^{(2)} = \frac{\max_{i \le n} X_i^2}{\sum_{i=1}^{n} X_i^2}$$
In words: what fraction of the total squared magnitude comes from the single largest observation? If one observation dominates the entire sum of squares, the second moment is not stabilizing. If $R_n^{(2)}$ stays bounded away from zero as $n$ grows, that is evidence against a well-behaved finite-variance regime.
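The diagnostic is a one-liner. A minimal sketch contrasting a Gaussian sample with an infinite-variance Student-t sample:

```python
import numpy as np

def max_to_sum(x, p=2):
    xp = np.abs(x) ** p
    return xp.max() / xp.sum()   # share of the p-th moment held by one point

rng = np.random.default_rng(9)
r_thin = max_to_sum(rng.normal(size=100_000))          # vanishes: variance exists
r_fat = max_to_sum(rng.standard_t(1.5, size=100_000))  # stays large: infinite variance
print(r_thin, r_fat)
```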
In practice: It detected 31% of drawdowns.
#EVT: quantifying worst-case scenarios
Extreme Value Theory (EVT) is the standard mathematical framework for modeling tail risk. Instead of fitting a distribution to all the data (where the bulk dominates and extreme events are treated as noise), EVT focuses only on the extremes.
Two complementary approaches:
GPD (Generalized Pareto Distribution): Pick a high threshold $u$ (say, losses worse than the 95th percentile). Fit the GPD to losses that exceed $u$. The Pickands-Balkema-de Haan theorem guarantees that for sufficiently high $u$, the exceedances follow a GPD regardless of the underlying distribution. The GPD has two parameters: scale ($\sigma$, how spread out the exceedances are) and shape ($\xi$, how fat the tail is). From these you get:
$$\text{VaR}_p = u + \frac{\sigma}{\xi}\left[\left(\frac{n}{N_u}(1-p)\right)^{-\xi} - 1\right]$$
$$\text{ES}_p = \frac{\text{VaR}_p + \sigma - \xi u}{1 - \xi}$$
VaR (Value at Risk) tells you the loss you won’t exceed with probability $p$. For example, a 99% VaR of 5% means that on 99% of days, you’ll lose less than 5%. ES (Expected Shortfall, also called Conditional VaR) tells you the average loss when you do exceed VaR, answering the question “when things go badly, how bad do they get on average?”
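A minimal sketch of the peaks-over-threshold workflow (using `scipy`'s `genpareto` rather than the fatcrash API; the 95th-percentile threshold and synthetic Student-t losses are illustrative), plugging the fitted parameters into the VaR and ES formulas above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
losses = np.abs(rng.standard_t(3, size=20_000))    # synthetic fat-tailed losses

u = np.quantile(losses, 0.95)                      # threshold: 95th percentile
excess = losses[losses > u] - u                    # exceedances over u
xi, _, sigma = stats.genpareto.fit(excess, floc=0) # shape xi, scale sigma

n, n_u, p = len(losses), len(excess), 0.99
var_p = u + sigma / xi * ((n / n_u * (1 - p)) ** (-xi) - 1)
es_p = (var_p + sigma - xi * u) / (1 - xi)
print(f"xi={xi:.2f}  99% VaR={var_p:.2f}  99% ES={es_p:.2f}")
```

For Student-t losses with 3 degrees of freedom, the fitted shape should land near $\xi = 1/3$, consistent with $\xi = 1/\alpha$ for a Frechet-type tail.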
GEV (Generalized Extreme Value): Instead of exceedances over a threshold, fit to block maxima (e.g., the worst loss each month). The Fisher-Tippett-Gnedenko theorem guarantees that block maxima converge to a GEV distribution. The shape parameter $\xi$ tells you the tail type:
- $\xi > 0$: Frechet (fat tail, power-law decay), typical for finance
- $\xi \approx 0$: Gumbel (exponential tail)
- $\xi < 0$: Weibull (bounded tail, there’s a maximum possible value)
In practice: GPD VaR detected 42% of drawdowns. It works well for medium corrections but struggles with major crashes because the pre-crash period is itself volatile, making the baseline VaR already elevated.
#Results on 39 drawdowns
We tested all methods on 39 drawdowns across three assets (BTC, SPY, Gold). A drawdown is defined as a peak-to-trough decline in daily close; the pre-crash window is the 120 trading days before the peak, and the calm window is a similar period ending well before the peak. A method “detects” a crash if its signal during the pre-crash window is significantly elevated compared to the calm window.
We also test each method on ~150 non-crash windows (50 per asset, sampled at least 180 days from any crash) to measure false positive rates. This gives us precision, recall, and F1 — not just recall.
The table below shows the 13 classical methods that have standalone precision/recall/F1 scores. Momentum reversal and price velocity are used as signals in the combined detector rather than standalone detectors.
| Method | Precision | Recall | F1 |
|---|---|---|---|
| LPPLS | 37% | 74% | 50% |
| LPPLS confidence | 29% | 90% | 43% |
| GSADF | 38% | 38% | 38% |
| DFA | 22% | 82% | 34% |
| Hurst | 19% | 59% | 28% |
| Pickands | 19% | 49% | 27% |
| Kappa | 19% | 49% | 27% |
| DEH | 18% | 46% | 26% |
| Spectral | 22% | 28% | 25% |
| Taleb Kappa | 20% | 33% | 25% |
| QQ | 16% | 38% | 23% |
| GPD VaR | 12% | 42% | 19% |
| Max-to-Sum | 12% | 31% | 18% |
| Hill | 12% | 28% | 16% |
Precision = how often a signal is correct (TP/(TP+FP)). Recall = how many crashes are caught (TP/(TP+FN)). F1 = harmonic mean of both.
Why precision is low for tail/regime methods: These methods detect distributional regime shifts (tail thickening, persistent dynamics), not crash-specific patterns. They fire in many non-crash periods because fat tails and persistence are pervasive in financial data. This is by design — they measure the distributional regime, not a specific crash. LPPLS and GSADF have higher precision because they detect bubble-specific structure.
The Sornette-Bouchaud debate on precision vs recall: Sornette (the LPPLS inventor) argues that LPPLS is deliberately tuned for high recall at the cost of precision because the cost function is asymmetric — missing a crash is far more expensive than a false alarm. He calls false positives “failed predictions” and argues they are inevitable: bubbles can end in slow deflation rather than sharp crashes. His 2024 paper with Nielsen (arXiv:2405.12803) introduced the tightened omega [6,13] range specifically to improve precision without sacrificing recall.
Bouchaud takes a more skeptical view. In his work at CFM and in papers with Potters, he emphasizes that fat-tail estimators (Hill, etc.) measure unconditional properties of returns and are poor at conditional crash prediction. His point is exactly what the data shows: tail estimators have decent recall but low precision because fat tails are always present, not just before crashes. He favors portfolio-level risk measures (drawdown control, volatility targeting) over point-in-time crash prediction.
Both perspectives are reflected in fatcrash: LPPLS targets the mechanism (Sornette’s approach), tail estimators measure the regime (which Bouchaud correctly notes is always fat-tailed), and the aggregator combines both — using Sornette-style bubble detection as the primary signal and Bouchaud-style regime measurement as confirmation.
Recall by crash size shows that LPPLS confidence catches 93% of small, 94% of medium, and 75% of major crashes. DFA catches 86% of small and 88% of medium crashes.
#Major known crashes
Testing on four major crashes with pre-crash vs. calm period comparison:
| Crash | Kappa | GPD VaR | LPPLS | Hill |
|---|---|---|---|---|
| 2017 BTC Bubble | detected | detected | detected | missed |
| 2021 BTC Crash | detected | detected | detected | missed |
| 2008 Financial Crisis | detected | detected | detected | detected |
| COVID Crash 2020 | detected | — | detected | missed |
Kappa and LPPLS each detected all four. GPD VaR detected 3 of 4. Hill detected 1 of 4. These are recall numbers — false positive rates are reported in the table above.
#Why Hill underperforms
Hill measures the tail index of the return distribution, but this property changes slowly. A 6-month pre-crash window doesn’t necessarily have thinner tails than a 6-month calm window because the calm window might include its own mini-shocks. The Hill estimator is useful for long-term characterization (this asset has $\alpha=3$, that one has $\alpha=4$) but not for short-term prediction.
#Why LPPLS leads on F1
LPPLS detects structure, not statistics. It’s looking for a specific pattern: accelerating growth with log-periodic oscillations. This pattern appears before both 10% corrections and 80% crashes. The tail-based methods need to see the tail thickening, which requires the crash to already be underway. LPPLS sees the bubble building.
With the tightened Nielsen (2024) filter (omega restricted to [6,13] instead of the original loose [2,25], plus a critical-time proximity constraint requiring tc to fall within 40% of the window length after the end), LPPLS achieves the best F1 score (50%) by balancing recall (74%) and precision (37%). The LPPLS confidence indicator trades precision (29%) for higher recall (90%) by aggregating across many sub-windows.
The relatively low precision is inherent: LPPLS frequently detects “bubble signatures” during normal bull markets because super-exponential growth patterns are common. The solution is combining it with the tail-based and regime methods. If LPPLS says “bubble” and kappa says “tails thickening” and DFA shows persistent dynamics, the signal is more reliable.
#Why DFA is the best non-bubble method
DFA detects regime shifts in the correlation structure of returns. Before a crash, markets transition from noisy mean-reverting behavior to strongly persistent trending. This transition is invisible to tail estimators like Hill or kappa, which measure distributional shape. DFA measures temporal dependence. The detrending step gives DFA an edge over Hurst’s R/S analysis (82% recall vs 59%) because raw R/S conflates local trends with long-range dependence. DFA strips out the local trends and measures the residual scaling.
DFA’s 82% recall is high but its 22% precision means it also fires in many non-crash periods — persistent dynamics are common in financial markets. DFA is particularly strong on small and medium drawdowns (86% and 88% recall), where tail-based methods struggle because the distributional shift is subtle.
#Combined detector
For the combined detector, signals are grouped into four independent categories: bubble (LPPLS and GSADF), tail (Hill, Pickands, DEH, QQ, Max-to-Sum, Taleb Kappa, Max-Stability Kappa, GPD VaR), regime (DFA, Hurst, Spectral, momentum reversal), and structure (multiscale agreement across daily, 3-day, and weekly frequencies, LPPLS critical time proximity, and price velocity). When three or more categories independently signal elevated risk, the combined detector applies a +15% bonus to the crash probability.
| | Small (<15%) | Medium (15-30%) | Major (>30%) | Overall |
|---|---|---|---|---|
| Combined (agreement bonus) | 64% | 94% | 75% | 79% |
The combined detector reaches 79% overall, with 94% on medium-sized drawdowns. The agreement requirement filters out most of LPPLS’s false positives while retaining most of its true positives. The gap between small (64%) and medium/major (94%/75%) drawdowns reflects the fact that small corrections often happen without prior tail thickening or regime change. They are genuine surprises, and no method should be expected to predict all of them.
#Long timescales
#54 years of forex (GBP/USD)
We tested on GBP/USD daily data from 1971 to 2025 (13,791 trading days):
| Decade | Hill alpha | Kappa/benchmark | Notable |
|---|---|---|---|
| 1970s | 2.92 | 0.78 | Oil crises, IMF bailout |
| 1980s | 4.36 | 0.68 | Plaza Accord |
| 1990s | 4.51 | 0.83 | Black Wednesday |
| 2000s | 2.90 | 0.57 | 2008 crisis |
| 2010s | 3.86 | 0.35 | Brexit |
| 2020s | 3.39 | 0.71 | Truss mini-budget |
Every decade shows fat tails. In our labeling, all six GBP/USD crisis events were detected (6/6):
- 1976 IMF Crisis
- 1985 Plaza Accord
- 1992 Black Wednesday
- 2008 Financial Crisis
- 2016 Brexit Vote
- 2022 Truss Mini-Budget
#All methods on daily forex (1971-2025)
We ran all methods on 23 currency pairs from FRED daily data. The table shows the key estimators from each category — tail (Hill, QQ, DEH), regime (Hurst, DFA), and bubble (GSADF):
| Pair | Hill $\alpha$ | QQ $\alpha$ | DEH $\gamma$ | Hurst $H$ | DFA $\alpha$ | GSADF |
|---|---|---|---|---|---|---|
| VEF/USD | 1.20 | 0.82 | 1.06 | 0.53 | 0.82 | bubble |
| HKD/USD | 1.73 | 2.12 | 0.24 | 0.54 | 0.62 | bubble |
| KRW/USD | 1.90 | 1.93 | 0.44 | 0.67 | 0.60 | bubble |
| MXN/USD | 2.04 | 1.98 | 0.44 | 0.56 | 0.57 | bubble |
| LKR/USD | 2.14 | 1.97 | 0.51 | 0.58 | 0.66 | bubble |
| TWD/USD | 2.31 | 2.62 | 0.21 | 0.60 | 0.63 | bubble |
| THB/USD | 2.38 | 2.43 | 0.33 | 0.58 | 0.59 | bubble |
| MYR/USD | 2.42 | 2.46 | 0.33 | 0.58 | 0.60 | bubble |
| AUD/USD | 2.58 | 2.30 | 0.44 | 0.56 | 0.56 | bubble |
| INR/USD | 2.62 | 2.56 | 0.34 | 0.57 | 0.58 | bubble |
| CNY/USD | 2.79 | 1.70 | 0.69 | 0.59 | 0.71 | bubble |
| BRL/USD | 2.80 | 3.12 | 0.15 | 0.56 | 0.58 | bubble |
| NZD/USD | 2.89 | 2.46 | 0.44 | 0.57 | 0.56 | — |
| ZAR/USD | 3.19 | 3.43 | 0.15 | 0.58 | 0.54 | bubble |
| NOK/USD | 3.39 | 3.44 | 0.22 | 0.57 | 0.53 | bubble |
| SEK/USD | 3.50 | 2.88 | 0.41 | 0.58 | 0.55 | — |
| SGD/USD | 3.59 | 3.66 | 0.18 | 0.56 | 0.53 | bubble |
| CHF/USD | 3.81 | 3.59 | 0.28 | 0.57 | 0.54 | — |
| CAD/USD | 3.84 | 3.58 | 0.27 | 0.57 | 0.53 | bubble |
| DKK/USD | 3.84 | 3.23 | 0.37 | 0.58 | 0.55 | — |
| JPY/USD | 3.94 | 4.02 | 0.18 | 0.58 | 0.58 | bubble |
| GBP/USD | 4.13 | 4.11 | 0.19 | 0.58 | 0.55 | bubble |
| EUR/USD | 4.88 | 4.90 | 0.12 | 0.56 | 0.54 | — |
All 23 pairs show fat tails: DEH $\gamma > 0$ for 23/23, and two independent tail index estimators converge (mean Hill $\alpha$ = 2.95, mean QQ $\alpha$ = 2.84). Under these estimators, all 23 also show persistence-leaning signatures: Hurst $H > 0.5$ and DFA $\alpha > 0.5$ for every pair. GSADF detected explosive episodes in 18 of 23 pairs.
VEF/USD (Venezuela) is the extreme case: Hill $\alpha$ = 1.20, QQ $\alpha$ = 0.82, DEH $\gamma$ = 1.06 — every estimator confirms infinite variance. At the other end, EUR/USD has the thinnest tails (Hill $\alpha$ = 4.88) but is still fat-tailed by any standard. KRW/USD has the fattest tails among liquid pairs (Hill $\alpha$ = 1.90). CNY/USD shows the strongest persistence (DFA = 0.71), consistent with managed float dynamics.
#500 years, 138 countries
Using the Clio Infra exchange rate dataset (1500-2013), we ran all tail and regime methods on every country with 50+ years of data.
Results:
| Tail regime | Countries | Percentage |
|---|---|---|
| $\alpha < 2$ (Hill estimate in the infinite-variance regime) | 98 | 71% |
| $\alpha$ 2-4 (fat tails, finite variance) | 37 | 27% |
| $\alpha > 4$ (moderate tails) | 3 | 2% |
71% of countries land in the Hill-estimated $\alpha < 2$ regime. The median $\alpha$ across all 138 countries is 1.57. This is strong evidence that variance is unstable or poorly estimated for a large share of the sample. GEV points in the same direction: 81% of countries show Fréchet-type (fat) tails with median $\xi = 0.76$.
The most extreme cases:
Returns in this table are log-returns, which can fall below -100% or exceed +100%. A log-return of -2,748% means the currency lost virtually all its value (the price ratio $e^{-27.48} \approx 0$). This convention is standard in fat-tail analysis because log-returns are additive across time periods.
| Country | Years of data | Hill $\alpha$ | Taleb $\kappa$ | Worst year | Best year |
|---|---|---|---|---|---|
| Syria | 61 | 0.32 | — | -2% | +105% |
| Iraq | 61 | 0.40 | — | -38% | +883% |
| Germany | 153 | 0.52 | 1.00 | -2,748% | +2,104% |
| Nicaragua | 76 | 0.52 | — | -2,159% | +787% |
| Zimbabwe | 56 | 0.55 | — | -12% | +1,345% |
| Hungary | 66 | 0.56 | — | -944% | +247% |
| Peru | 64 | 0.72 | — | -1,660% | +426% |
| Bolivia | 63 | 0.76 | — | -1,794% | +494% |
| Brazil | 129 | 1.03 | — | -3,536% | +318% |
| Argentina | 102 | 1.28 | 1.00 | -2,748% | +388% |
Germany’s -2,748% in a single year is the Weimar hyperinflation. Brazil’s -3,536% reflects the cruzeiro collapse. These aren’t outliers. They’re exactly what a distribution with $\alpha < 1$ predicts.
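To make the convention concrete, the conversion back to a simple (arithmetic) return is $e^{r} - 1$ for log-return $r$. A quick check in plain Python (not part of fatcrash):

```python
import math

def log_to_simple(log_return_pct: float) -> float:
    """Convert a log-return quoted in percent to a simple return."""
    return math.exp(log_return_pct / 100.0) - 1.0

# Germany's Weimar year: a -2,748% log-return is a simple return of
# essentially -100%, i.e. the currency lost virtually all its value.
weimar = log_to_simple(-2748)   # ~ -1.0

# For small moves the two conventions nearly coincide:
small = log_to_simple(5)        # ~ +5.13%
```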
Running all classical methods on the top 30 countries by data length confirms that fat tails and persistence go hand in hand:
| Country | Years | Hill $\alpha$ | Hurst $H$ | Taleb $\kappa$ | Verdict |
|---|---|---|---|---|---|
| Germany | 153 | 0.52 | 0.56 | 1.00 | extreme, persistent |
| Austria | 104 | 0.63 | 0.61 | 1.00 | extreme, persistent |
| Belgium | 114 | 0.89 | 0.64 | 0.86 | extreme, persistent |
| Finland | 100 | 0.94 | 0.58 | 0.43 | extreme, persistent |
| Italy | 95 | 0.77 | 0.80 | 0.95 | extreme, persistent |
| Portugal | 88 | 0.98 | 0.85 | 1.00 | extreme, persistent |
| Greece | 87 | 0.77 | 0.76 | 0.81 | extreme, persistent |
| Argentina | 102 | 1.28 | 0.71 | 1.00 | extreme, persistent |
| Mexico | 113 | 1.06 | 0.70 | 0.92 | extreme, persistent |
| UK | 223 | 2.42 | 0.47 | 0.04 | fat-tailed |
| Canada | 100 | 3.70 | 0.50 | 0.00 | fat-tailed |
Of the top 30 countries, 19 have $\alpha < 2$ (infinite variance), 25 have Hurst $H > 0.5$ (persistent dynamics), and 28 have QQ $\alpha < 4$ (heavy tails confirmed by multiple estimators). Germany, Austria, Argentina, and Portugal saturate at Taleb $\kappa = 1.0$ — Cauchy-like behavior where the CLT does not operate at any practical sample size. Italy ($H$ = 0.80) and Portugal ($H$ = 0.85) show the strongest persistence over century-scale data.
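Taleb's $\kappa$ is a pre-asymptotic metric: $\kappa(n_0, n) = 2 - \frac{\log(n/n_0)}{\log(M(n)/M(n_0))}$, where $M(m)$ is the mean absolute deviation of $m$-fold sums. $\kappa = 0$ means Gaussian-scaling aggregation (the CLT is working); $\kappa = 1$ is the Cauchy limit, where averaging never tames the tail. A Monte Carlo sketch (my own illustrative implementation, not fatcrash's Rust estimator):

```python
import numpy as np

def kappa(sample_fn, n0=1, n=100, trials=50_000, seed=0):
    """Taleb's kappa: 2 - log(n/n0) / log(M(n)/M(n0)), where M(m) is
    the mean absolute deviation of m-fold sums around their mean."""
    rng = np.random.default_rng(seed)

    def mad_of_sums(m):
        s = sample_fn(rng, (trials, m)).sum(axis=1)
        return np.abs(s - s.mean()).mean()

    return 2 - np.log(n / n0) / np.log(mad_of_sums(n) / mad_of_sums(n0))

# Gaussian sums grow like sqrt(n), so kappa is near 0: the CLT operates.
k_gauss = kappa(lambda rng, size: rng.normal(size=size))

# Student-t with 2 degrees of freedom (infinite variance): kappa rises,
# meaning averaging reduces dispersion more slowly than sqrt(n).
k_t2 = kappa(lambda rng, size: rng.standard_t(2, size=size))
```

A country that saturates at $\kappa = 1$ on annual data is telling you that longer averaging windows buy you essentially nothing.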
#Century-by-century: United Kingdom (1789-2013)
The UK has 224 years of continuous exchange rate data:
| Century | Hill $\alpha$ | Regime |
|---|---|---|
| 1800s | 1.19 | Infinite variance (Napoleonic wars, banking crises) |
| 1900s | 3.17 | Fat but finite (Bretton Woods stability) |
| 2000s | 2.04 | Back to borderline infinite variance |
Even within a single country, tail regimes shift across centuries.
#Inflation: 500 years, 82 countries
Inflation data from Clio Infra (1500-2010):
| Statistic | Value |
|---|---|
| Countries analyzed | 82 |
| Countries with hyperinflation (>100%/yr) | 32 (39%) |
| Countries with $\alpha < 2$ | 36 (44%) |
| Median $\alpha$ | 2.14 |
The most extreme inflation tails:
| Country | Years | Hill $\alpha$ | Max inflation |
|---|---|---|---|
| Nicaragua | 71 | 0.30 | 13,110%/yr |
| Zimbabwe | 83 | 0.44 | 24,411%/yr |
| Germany | 494 | 0.57 | 211,427,400,000%/yr |
| Brazil | 226 | 0.63 | 2,948%/yr |
| Peru | 363 | 0.80 | 7,482%/yr |
| Argentina | 274 | 0.85 | 3,079%/yr |
| China | 336 | 0.86 | 1,579%/yr |
| Poland | 414 | 0.97 | 4,738%/yr |
Germany has 494 years of inflation data with $\alpha = 0.57$. Its maximum annual inflation was 211 billion percent (Weimar 1923). With $\alpha < 1$, neither the mean nor the variance of this distribution converges. You cannot compute a confidence interval. You cannot build a VaR model. The standard toolkit breaks down.
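The breakdown is easy to see in simulation. For a Pareto tail with $\alpha < 1$, a single observation keeps dominating the whole sample no matter how much data you collect, so the sample mean never settles. A sketch (illustrative, not fatcrash's Max-to-Sum implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def max_to_sum(alpha: float, n: int) -> float:
    """Max-to-sum ratio of n Pareto(alpha) draws. For alpha < 1 this
    ratio does not vanish as n grows: one observation dominates."""
    x = rng.pareto(alpha, n) + 1.0   # Pareto with tail index alpha
    return x.max() / x.sum()

msr_fat = max_to_sum(0.57, 100_000)   # alpha < 1: one draw dominates
msr_thin = max_to_sum(5.0, 100_000)   # alpha = 5: ratio ~ 0, mean converges
```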
#Extended validation: 96 crash windows across forex and equities
The original 39-drawdown evaluation used only BTC, SPY, and Gold. To test whether the results generalize, we extended the dataset with 23 FRED daily forex pairs (1971-2025) and 6 equity files covering the 2008, 2020, and 2022 crises. This yielded 96 total crash windows and 631 non-crash windows.
LPPLS recall held at 89% on the extended dataset and 90% combined. The precision and F1 patterns remained stable: LPPLS leads on recall, GSADF leads on precision, DFA is the best non-bubble method. The combined detector’s agreement bonus continues to filter false positives effectively.
The forex pairs confirmed that fat tails and persistence are universal: Hill $\alpha < 4$ for all 23 pairs, Hurst $H > 0.5$ for all 23. EM currency pairs (BRL, MXN, KRW, ZAR) show the fattest tails, as expected from their crisis histories.
#Beyond market prices: detecting problems in revenue and profit data
These methods were built for market prices, but most transfer to any time series where you need to detect structural problems — company revenue, profit margins, unit economics, or any financial metric that changes over time. The key distinction: market prices reflect collective speculative behavior (herding, positive feedback loops), while revenue and profit reflect real economic activity (customer demand, operational execution, competitive dynamics).
#What transfers to company data
Tail estimation (Hill, DEH, QQ, Pickands, Kappa, Max-to-Sum, GPD/GEV) — Yes. Revenue growth rates have fat tails. Gabaix (2011) showed that idiosyncratic firm-level shocks drive aggregate fluctuations precisely because firm-size distributions are fat-tailed. A company with Hill $\alpha < 2$ on quarterly revenue growth has a distribution where a single catastrophic quarter can dominate the entire history. EVT gives you calibrated worst-case scenarios: fit GPD to the worst quarterly declines for a valid tail risk estimate.
Persistence detection (DFA, Hurst, Spectral) — Yes. Revenue series often show strong persistence ($H > 0.5$) due to contracts, recurring revenue, and customer stickiness. A shift from persistent ($H > 0.5$) to anti-persistent ($H < 0.5$) could signal fundamental deterioration — the business is losing its growth momentum. DFA handles the non-stationarity inherent in growing companies better than raw Hurst.
GSADF — Partially. It detects unsustainable exponential growth. Applied to revenue, it could flag “growth bubbles” — growth rates that would require capturing 100% of the addressable market to sustain. Useful for evaluating whether a company’s growth trajectory is explosive (and therefore unsustainable) or merely strong.
Momentum and velocity — Partially. Revenue momentum (trailing growth rates) is meaningful for company analysis. A reversal in revenue momentum — strong long-term growth suddenly decelerating — is a classic warning sign. Price velocity (volatility acceleration) is less directly applicable to revenue data, which doesn’t exhibit the forced-liquidation cascades it was designed to detect.
LPPLS — No. It models speculative bubble dynamics: herding, log-periodic oscillations, reflexive feedback loops. Revenue doesn’t exhibit these patterns. It’s driven by real economic activity, not reflexive speculation. Don’t apply LPPLS to your quarterly revenue.
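A minimal version of the momentum-reversal check for revenue, with hypothetical window lengths (two years vs. two quarters):

```python
import numpy as np

def momentum_reversal(growth: np.ndarray,
                      long_w: int = 8, short_w: int = 2) -> bool:
    """Flag when long-horizon revenue momentum is positive but the most
    recent quarters have turned negative: a classic deceleration warning."""
    long_mom = growth[-long_w:].mean()    # trailing two-year growth
    short_mom = growth[-short_w:].mean()  # most recent half-year
    return long_mom > 0 and short_mom < 0

# Example: seven strong quarters followed by two contractions.
g = np.array([0.06, 0.05, 0.07, 0.06, 0.05, 0.06, 0.04, -0.02, -0.03])
flag = momentum_reversal(g)   # True: long-run momentum up, recent quarters down
```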
#Practical example
For a company’s quarterly revenue time series:
A minimal NumPy sketch (the estimators below are illustrative stand-ins, not fatcrash's actual API):

```python
import numpy as np

# Quarterly revenue growth rates (log-growth, so they add across quarters).
# Synthetic series for illustration — substitute your own revenue data.
rng = np.random.default_rng(0)
revenue = 100 * np.cumprod(1 + rng.normal(0.02, 0.06, 80))
growth = np.diff(np.log(revenue))

# Tail index — are revenue shocks fat-tailed?
# Hill estimator on the largest absolute moves (top 20%).
x = np.sort(np.abs(growth))[::-1]
k = max(5, len(x) // 5)
hill_alpha = 1.0 / np.mean(np.log(x[:k] / x[k]))

# Persistence — is growth momentum persistent or fading?
# Lag-1 autocorrelation of growth: > 0 suggests persistence.
acf1 = np.corrcoef(growth[:-1], growth[1:])[0, 1]

# Same question, different method (cross-check): variance ratio.
# For a persistent series, the variance of 4-quarter sums exceeds
# 4x the variance of single quarters.
sums4 = growth[: len(growth) // 4 * 4].reshape(-1, 4).sum(axis=1)
var_ratio = sums4.var() / (4 * growth.var())
```
The practical challenge: quarterly data gives roughly 80 observations over 20 years (vs. 5,000+ daily prices). Tail estimators need at least 100 data points to be reliable. Use monthly revenue or longer history when possible. For shorter series, DFA and Hurst are more robust than Hill because they measure temporal structure rather than distributional shape.
What to watch for:
- Hill $\alpha$ dropping below 3: Revenue shocks are getting more extreme. The distribution is shifting toward heavier tails.
- DFA shifting from > 0.5 to < 0.5: Growth momentum is breaking down. Revenue used to be self-reinforcing; now it’s mean-reverting.
- Max-to-Sum ratio rising: A single quarter is starting to dominate the entire history — either a massive win or a massive loss.
- GPD VaR spiking: The worst-case quarterly decline is getting worse, even accounting for the fat tails.
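The first of these can be tracked with a rolling Hill estimate. A sketch with illustrative window sizes (fatcrash's rolling estimators run in Rust; this is plain NumPy):

```python
import numpy as np

def hill_alpha(returns: np.ndarray, tail_frac: float = 0.2) -> float:
    """Hill tail-index estimate from the largest absolute moves."""
    x = np.sort(np.abs(returns))[::-1]
    k = max(5, int(len(x) * tail_frac))
    return 1.0 / np.mean(np.log(x[:k] / x[k]))

def rolling_hill(returns: np.ndarray, window: int = 40) -> np.ndarray:
    """Trailing-window Hill estimates; a sustained drop below ~3
    suggests the shocks are getting heavier-tailed."""
    return np.array([hill_alpha(returns[i - window:i])
                     for i in range(window, len(returns) + 1)])
```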
These methods won’t tell you why revenue is deteriorating — you still need business context for that. But they can tell you that something structural has changed in the data before it becomes obvious in the headline numbers.
#Conclusions
- Fat tails are universal in this dataset. Every asset class and timescale we tested shows materially heavier tails than a Gaussian baseline. In the long-horizon exchange-rate panel, 71% of countries land in the Hill-estimated $\alpha < 2$ regime. That makes variance-based summaries far less reliable than standard finance usually assumes.
- LPPLS has the best F1 score (50%) because it detects bubble structure, not tail statistics. With tightened filters (Nielsen omega [6,13], tc constraint), it achieves 74% recall and 37% precision. The LPPLS confidence indicator trades precision (29%) for recall (90%). Both have substantial false positive rates — bubble signatures appear during normal bull markets too.
- DFA is the best non-bubble method (82% recall, F1=34%). It detects regime shifts in temporal dependence, not distributional shape. The detrending step makes it robust to non-stationarity where simpler methods like Hurst (59% recall) are confused by local trends. Low precision (22%) reflects that persistent dynamics are common in financial markets.
- Tail-based methods have moderate recall but low precision. Kappa and Pickands (49% recall each), DEH (46%), Hill (28%). These methods detect distributional regime shifts that are pervasive in financial data, not crash-specific patterns. They are most valuable as ensemble components, not standalone detectors.
- The combined detector reaches 79% recall. When bubble, tail, regime, and structural methods independently agree, the signal is more reliable. The agreement requirement filters most false positives while preserving 94% detection on medium-sized drawdowns. Momentum reversal and price velocity add structural signals that capture crash dynamics invisible to distributional methods — trend breaks and forced-liquidation cascades, respectively. No single method is sufficient. These results were validated on an extended dataset of 96 crash windows across 23 forex pairs and equity crises, with LPPLS recall stable at 89-90%.
- Standard variance-based risk models are badly stressed by this data. Modern Portfolio Theory, CAPM, and Black-Scholes-style thinking rely heavily on stable variance estimates. In the long-horizon currency panel, 71% of countries land in the Hill-estimated $\alpha < 2$ regime, which means those variance inputs are often fragile, unstable, or badly misspecified.
- Hyperinflation isn’t rare. 39% of countries experienced >100% annual inflation at some point. Germany’s 211 billion percent is extreme, but dozens of countries experienced four- and five-digit inflation. Any model that treats these as “outliers” is a model that doesn’t understand the data it’s modeling.
- Most methods transfer beyond market prices. Tail estimation, persistence detection, momentum, and EVT work on any time series — revenue, profit, unit economics. LPPLS doesn’t transfer (it models speculative dynamics), but Hill, DFA, Hurst, momentum reversal, and GPD work on company-level data. The challenge is sample size: quarterly data gives ~80 observations vs. 5,000+ daily prices. Use monthly data when possible.
Beyond crash detection, fatcrash includes two portfolio-level tools. Constant volatility targeting (Hallerbach, 2012) sizes positions inversely to realized volatility: when vol spikes, reduce exposure; when vol is low, increase it. This is the Bouchaud-school response to crash risk — don’t predict crashes, just mechanically reduce exposure when the market gets rough.
The rebalance risk signal (Rattray, Granger, Harvey, and Van Hemert, 2020) addresses a subtler problem: mechanical rebalancing (buying stocks after they fall to maintain a target allocation) is negative convexity. During persistent drawdowns where DFA shows trending dynamics and momentum is negative, “buying the dip” amplifies losses. The signal combines DFA persistence with momentum direction to warn when rebalancing is dangerous.
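The volatility-targeting rule is mechanical enough to sketch in a few lines; the 10% target, 20-day window, and leverage cap are illustrative choices, not fatcrash defaults:

```python
import numpy as np

def vol_target_weights(returns: np.ndarray, target_vol: float = 0.10,
                       window: int = 20, max_leverage: float = 1.0) -> np.ndarray:
    """Size exposure inversely to trailing realized volatility
    (annualized from daily returns): cut risk when vol spikes."""
    weights = np.full(len(returns), np.nan)
    for t in range(window, len(returns)):
        realized = returns[t - window:t].std() * np.sqrt(252)
        weights[t] = min(max_leverage, target_vol / realized)
    return weights
```

When realized volatility doubles, exposure halves; the cap keeps the rule from levering up during calm stretches.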
The code is open source: github.com/unbalancedparentheses/fatcrash. The forex data comes from forex-centuries.
#References
- Jegadeesh, N. and Titman, S. (1993). Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency. Journal of Finance, 48(1).
- Scowcroft, A. and Sefton, J. (2005). Understanding Momentum. Journal of Asset Management, 6(3).
- Hallerbach, W. (2012). A Proof of the Optimality of Volatility Weighting over Time. Journal of Investment Strategies, 1(4).
- Rattray, S., Granger, N., Harvey, C. R., and Van Hemert, O. (2020). Strategic Rebalancing. Journal of Portfolio Management, 46(6).
- Jordà, Ò., Schularick, M. and Taylor, A. M. (2019). The Rate of Return on Everything, 1870-2015. Quarterly Journal of Economics, 134(3).
- Gabaix, X. (2011). The Granular Origins of Aggregate Fluctuations. Econometrica, 79(3).
- Taleb, N. N. (2019). How Much Data Do You Need? An Operational, Pre-Asymptotic Metric for Fat-Tailedness. International Journal of Forecasting, 35(2).
- Nielsen, M. and Sornette, D. (2024). Deep LPPLS. arXiv:2405.12803.
#Disclaimer
This article is for educational and research purposes only. Nothing here constitutes financial advice, trading recommendations, or an invitation to buy or sell any asset. The detection rates reported are retrospective, based on labeled historical events, and should not be interpreted as predictive accuracy for future markets. Always consult a qualified financial advisor before making investment decisions.