Do Technical Indicators Actually Work? We Ran 856 Statistical Tests to Find Out

A rigorous, pre-registered-style study of 14 indicators across 20 years, 5 assets, and 5,000-to-10,000-shuffle permutation testing with Bonferroni correction.

The Question Every Trader Avoids Answering Honestly

Retail trading content runs on a single engine: the promise that a specific chart pattern, crossover signal, or indicator reading predicts what happens next. RSI below 30 means it’s oversold and will bounce. The 50-day crosses above the 200-day — golden cross — and the bull run begins. MACD crosses bullish and you buy.

These are presented as facts. They are rarely tested as hypotheses.

We tested them. All of them. With the kind of statistical rigor that would survive peer review: Welch’s t-tests, 5,000-to-10,000 permutation bootstraps, Bonferroni correction for multiple comparisons, and Cohen’s d effect sizes. 856 hypothesis tests total.

Here’s what held up — and what collapsed.

Methodology

Data

Universe: SPY, QQQ, GLD, TLT, EEM — five assets spanning equities (broad, tech), commodities (gold), bonds, and emerging markets
Period: January 2004 – December 2024 (just under 21 years, ~5,280 trading days per asset)
Source: Yahoo Finance adjusted close prices, with volume for OBV/MFI

Diverse assets matter. An indicator that only “works” on SPY might be picking up a statistical artifact of U.S. equity bull markets. We want signals that generalize.

Statistical Design

For each indicator signal (e.g., “RSI crosses below 30”), we split every trading day into two groups:

Signal days: days where the indicator fired
Non-signal days: all other days

We then measured forward log returns at four horizons: 1, 5, 10, and 20 trading days.
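
The split and the forward-return calculation can be sketched as follows. This is a minimal illustration, not the study’s code: the function names are hypothetical, and it assumes close-to-close log returns measured from the signal day’s close, which is one reasonable reading of the design.

```python
import numpy as np
import pandas as pd

def forward_log_returns(close: pd.Series, horizons=(1, 5, 10, 20)) -> pd.DataFrame:
    """Log return from day t's close to day t+h's close, one column per horizon h."""
    log_p = np.log(close)
    # shift(-h) pulls the future price back to row t; the last h rows become NaN
    return pd.DataFrame({h: log_p.shift(-h) - log_p for h in horizons})

def split_by_signal(fwd: pd.Series, signal: pd.Series):
    """Return (signal-day returns, non-signal-day returns), dropping rows with no forward return."""
    valid = fwd.notna()
    fired = signal.astype(bool)
    return fwd[fired & valid].to_numpy(), fwd[(~fired) & valid].to_numpy()

# Toy data: a synthetic price path and a hypothetical signal column
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 104.0])
signal = pd.Series([False, False, True, False, False, False])

fwd = forward_log_returns(close, horizons=(1, 2))
sig_ret, non_ret = split_by_signal(fwd[1], signal)
```

Rows near the end of the sample have no forward return and are dropped from both groups, so the two group sizes will not quite sum to the number of trading days.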

Three tests per (indicator × asset × horizon) combination:

Welch’s t-test: two-sided, unequal variance. The standard parametric test.
Permutation bootstrap: 5,000 to 10,000 random shuffles of the signal labels. Because the labels are shuffled without replacement, this is strictly a permutation test rather than a bootstrap. The null distribution is built from the data itself, with no distributional assumptions.
Cohen’s d: effect size. A p-value tells you whether a difference is distinguishable from chance; Cohen’s d tells you whether it is large enough to matter.
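
The three tests above can be sketched for a single (indicator × asset × horizon) cell like this. It is an illustration under stated assumptions, not the study’s implementation: `evaluate_signal` and the other function names are hypothetical, the synthetic returns are made up, and the shuffle loop is written plainly rather than vectorized.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # reassign the signal labels at random
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        hits += int(diff >= observed)
    # The +1 terms keep the estimate from ever being exactly zero
    return (hits + 1) / (n_perm + 1)

def evaluate_signal(sig_ret, non_ret, n_perm=10_000):
    """Run all three tests on signal-day vs. non-signal-day forward returns."""
    t_stat, welch_p = stats.ttest_ind(sig_ret, non_ret, equal_var=False)
    return {
        "t_stat": float(t_stat),
        "welch_p": float(welch_p),
        "perm_p": permutation_pvalue(sig_ret, non_ret, n_perm=n_perm),
        "cohens_d": float(cohens_d(sig_ret, non_ret)),
        "diff_bps": float((np.mean(sig_ret) - np.mean(non_ret)) * 1e4),
    }

# Synthetic example: signal days drawn with a higher mean than non-signal days
rng = np.random.default_rng(42)
sig = rng.normal(0.005, 0.01, size=200)
non = rng.normal(0.000, 0.01, size=2000)
result = evaluate_signal(sig, non, n_perm=2000)
```

Using the absolute difference in means makes the permutation test two-sided, matching the two-sided Welch test.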

Multiple Comparison Correction

This is where most retail “backtests” fall apart. If you test 856 hypotheses at α = 0.05 and none of the indicators has any real effect, you still expect 856 × 0.05 = 42.8 false positives by pure chance. Raw statistical significance is meaningless without correction.

We applied Bonferroni correction: the threshold drops from 0.05 to 0.05 / 856 = 0.000058. Only tests that survive this level of scrutiny are reported as real.
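
The arithmetic is short enough to check directly. The sketch below also computes one detail worth verifying in any replication: with the common (hits + 1) / (shuffles + 1) estimator, a permutation test cannot report a p-value smaller than 1 / (shuffles + 1). How the study’s permutation p-values were compared against the threshold is not stated here, so the estimator and the helper name are assumptions.

```python
n_tests = 856
alpha = 0.05

# Expected false positives if every null hypothesis were true
expected_false_positives = n_tests * alpha

# Bonferroni-corrected per-test threshold
bonferroni_threshold = alpha / n_tests

def min_permutation_pvalue(n_perm: int) -> float:
    """Smallest p-value reportable with the (hits + 1) / (n_perm + 1) estimator."""
    return 1 / (n_perm + 1)

# Resolution of the permutation test at a few shuffle counts
resolution = {n: min_permutation_pvalue(n) for n in (5_000, 10_000, 20_000)}

# Approximate shuffle count at which the floor reaches the threshold
approx_perms_needed = round(n_tests / alpha)

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"Bonferroni threshold:     {bonferroni_threshold:.6f}")
for n, p in resolution.items():
    print(f"{n:>6} shuffles -> smallest p = {p:.6f}")
```

Under this estimator, 10,000 shuffles bottom out near 0.0001, which is above 0.000058. A replication that wants the permutation p-value itself to clear the Bonferroni threshold would need roughly 17,000 shuffles or more.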

The Indicators Tested

Category	Indicators
Momentum oscillators	RSI, Stochastic, Williams %R, CCI, MFI, Rate of Change (ROC)
Trend-following	SMA crossovers, EMA crossovers, Parabolic SAR, ADX
Volatility / bands	Bollinger Bands, Keltner Channel
Volume	OBV
Hybrid	MACD

14 indicators. 856 hypothesis tests. 50 Bonferroni survivors.

Results

The Scorecard

Figure 1 (Indicator Scorecard, Bonferroni Survivors): Number of hypothesis tests surviving Bonferroni-corrected permutation testing, by indicator.

Rank	Indicator	Bonferroni Survivors	Best Cohen’s d	Verdict
1	OBV	8	0.219	★★★★★ REAL
2	MACD	8	0.149	★★★★★ REAL
3	Keltner Channel	7	0.311	★★★★★ REAL
4	Williams %R	6	0.229	★★★★★ REAL
5	RSI	6	0.302	★★★★★ REAL
6	Stochastic	6	0.229	★★★★★ REAL
7	EMA	4	0.205	★★★★ REAL
8	CCI	2	0.149	★★ MARGINAL
9	MFI	1	0.461	★ MARGINAL
10	Bollinger Bands	1	0.278	★ MARGINAL
11	ADX	1	0.139	★ MARGINAL
12	SMA	0	0.862	NOISE
13	ROC	0	0.178	NOISE
14	PSAR	0	0.181	NOISE

Effect Sizes

Figure 2 (Cohen’s d Effect Sizes by Indicator): Distribution of absolute Cohen’s d values across all tests per indicator. Higher = more practically significant.

Effect sizes are small across the board — this is financial markets, not physics. But “small” is not the same as “useless.” A Cohen’s d of 0.3 on a 20-day forward return horizon, applied systematically, represents real edge.

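The Best Individual Signals

These are the tests that survived Bonferroni correction and had the largest absolute effect sizes. Diff is the mean forward log return on signal days minus the mean on non-signal days, in basis points: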

Asset	Signal	Horizon	N	Diff (bps)	Cohen’s d
SPY	MFI oversold (<20)	1d	76	+54	0.461
SPY	KC lower band touch	20d	454	+141	0.311
SPY	RSI oversold (<30)	20d	249	+137	0.302
SPY	RSI oversold (<30)	5d	249	+69	0.290
QQQ	RSI oversold (<30)	1d	273	+39	0.290
GLD	BB lower band touch	20d	236	+131	0.279
QQQ	RSI oversold (<30)	5d	273	+73	0.262
SPY	KC lower band touch	5d	456	+61	0.255
TLT	RSI oversold (<30)	20d	454	−92	−0.243
SPY	Stochastic oversold (<20)	5d	698	+55	0.229
SPY	Williams %R oversold (<−80)	5d	698	+55	0.229
QQQ	KC lower band touch	5d	467	+61	0.221
SPY	OBV above SMA	20d	3,086	−99	−0.219
SPY	OBV below SMA	20d	2,159	+99	0.219
GLD	RSI oversold (<30)	20d	550	+101	0.214
SPY	Williams %R oversold (<−80)	20d	696	+97	0.214
SPY	Stochastic oversold (<20)	20d	696	+97	0.214
SPY	KC lower band touch	10d	454	+70	0.214
GLD	KC lower band touch	20d	694	+100	0.213
QQQ	Stochastic oversold (<20)	5d	748	+55	0.199
QQQ	Williams %R oversold (<−80)	5d	748	+55	0.199

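The pattern is immediate and striking: apart from the two OBV rows, every top signal is an oversold reading or a lower-band touch. Not a crossover. Not a breakout. Oversold. The one sign flip is TLT, where an oversold RSI is followed by lower returns, not higher.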

The Golden Cross Is Mythology

Figure 3 (SMA vs. Oscillator Comparison): Mean forward returns following trend-following vs. mean-reversion signals. Mean-reversion signals are consistently positive; trend-following signals are near zero or negative.

We ran a dedicated study on SMA-200 crossovers, arguably the best-known signal in retail trading. Six variants were tested:

Price crosses above 200-day SMA (bullish crossover)
Price crosses below 200-day SMA (bearish crossover)
50-day SMA crosses above 200-day SMA (Golden Cross)
50-day SMA crosses below 200-day SMA (Death Cross)
Regime filter: all days price is above 200 SMA
Regime filter: all days price is below 200 SMA
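
The six variants can be expressed as boolean columns on a price series. This is a sketch, not the study’s code: the function and column names are hypothetical, and warm-up days (before the slow SMA exists) are excluded from crossover events by assumption.

```python
import pandas as pd

def sma_signals(close: pd.Series, fast: int = 50, slow: int = 200) -> pd.DataFrame:
    """Boolean flags for the six SMA variants; warm-up rows are all False."""
    sma_fast = close.rolling(fast).mean()
    sma_slow = close.rolling(slow).mean()

    above = close > sma_slow           # comparisons against NaN evaluate to False
    fast_above = sma_fast > sma_slow
    prev_above = above.shift(1, fill_value=False).astype(bool)
    prev_fast_above = fast_above.shift(1, fill_value=False).astype(bool)
    # Require the slow SMA to exist yesterday, so its first valid day is not a "cross"
    prev_ready = sma_slow.notna().shift(1, fill_value=False).astype(bool)

    return pd.DataFrame({
        "price_cross_up": above & ~prev_above & prev_ready,
        "price_cross_down": ~above & prev_above,
        "golden_cross": fast_above & ~prev_fast_above & prev_ready,
        "death_cross": ~fast_above & prev_fast_above,
        "regime_above": above,
        "regime_below": close < sma_slow,
    })

# Toy path with tiny windows so the events are easy to check by hand
close = pd.Series([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0])
signals = sma_signals(close, fast=2, slow=3)
event_counts = signals.sum().to_dict()
```

Note the difference in sample size between the two kinds of signal: crossovers are single-day events and fire rarely, while the regime filters flag every day spent on one side of the average.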

Bonferroni survivors in this dedicated study: 2. Both are the same finding mirrored (TLT above and below its 200-day SMA, at the 20-day horizon). That is consistent with bonds trending differently than equities; it is not evidence that the crossover signal itself predicts anything.

The Golden Cross and Death Cross produced zero statistically significant tests after multiple comparison correction. This is consistent across all five assets and all four forward return horizons.

SMA also had the highest raw Cohen’s d in the entire study — 0.862 — driven by the SMA regime-filter on SPY. But it failed Bonferroni correction, meaning the effect does not survive the multiple comparison threshold. The number of raw tests run creates enough chance hits to generate impressive-looking best-case numbers.

The Golden Cross is a myth. The Death Cross is a myth. In this data, SMA crossovers do not predict returns.

What Actually Works: Mean Reversion in Oversold Conditions

Figure 4 (Forward Return Curves, Top SPY Signals): Cumulative forward log return curves for the strongest individual signals on SPY.

The finding is the same across all three studies (Bollinger Bands, SMA-200, comprehensive benchmark):

Markets mean-revert when sufficiently oversold. Trend-following signals do not predict returns.

The strongest signals:

RSI < 30 (Oversold): 6 Bonferroni survivors. On SPY, days with RSI < 30 are followed by 20-day returns 137 bps higher than non-signal days (d = 0.30). On QQQ the gap is +39 bps even at the 1-day horizon, and on GLD it is +101 bps at 20 days. This signal generalizes across assets and horizons, with one meaningful exception: on TLT the differential is negative (−92 bps at 20 days), so oversold bonds tend to continue falling, not revert.

Keltner Channel lower touch: 7 survivors. When price touches the lower Keltner band (2 ATRs below EMA-20), the next 20 days on SPY return 141 bps more than after non-signal days (d = 0.31). The signal also works on QQQ and GLD. This may outperform RSI because Keltner bands adapt to volatility rather than using a fixed 14-period momentum window.

OBV vs. its SMA: 8 survivors. When OBV is below its moving average (selling pressure in volume terms), subsequent 20-day SPY returns are 99 bps higher than on other days (d = 0.22). The “OBV above SMA” result of −99 bps (d = −0.22) is the same comparison mirrored, since nearly every day falls into one group or the other. By survivor count this is the most consistent indicator in the study (tied with MACD at 8), suggesting that volume-price divergence carries real information about near-term mean reversion.

MFI < 20: only 1 survivor, but the largest effect size among the Bonferroni survivors: d = 0.461 on SPY at the 1-day horizon, +54 bps. The Money Flow Index (a volume-weighted RSI) fires rarely, only 76 times in 20 years, but when it does, the 1-day forward return is exceptional. The small sample makes it hard for the other MFI variants to clear the Bonferroni threshold.

Williams %R and Stochastic: 6 survivors each. These are not two independent findings. With the same lookback, Williams %R is raw Stochastic %K shifted down by 100, so %K < 20 and %R < −80 describe the same days, which is why the two indicators share identical N, bps, and d in the table above. The 5-day and 20-day horizons show consistent +55–97 bps edges on SPY and QQQ.
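
The overlap between Williams %R and Stochastic can be checked directly. The sketch below uses the textbook definitions with a 14-day lookback (an assumption; the study’s exact parameters are not given here) on synthetic OHLC-style data.

```python
import numpy as np
import pandas as pd

def stochastic_k(high, low, close, window=14):
    """Raw Stochastic %K: where the close sits in the recent high-low range, 0 to 100."""
    hh = high.rolling(window).max()
    ll = low.rolling(window).min()
    return 100.0 * (close - ll) / (hh - ll)

def williams_r(high, low, close, window=14):
    """Williams %R: the same range position, expressed on a -100 to 0 scale."""
    hh = high.rolling(window).max()
    ll = low.rolling(window).min()
    return -100.0 * (hh - close) / (hh - ll)

# Synthetic prices: a random walk with highs and lows around each close
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum())
high = close + rng.uniform(0.1, 1.0, 500)
low = close - rng.uniform(0.1, 1.0, 500)

k = stochastic_k(high, low, close).dropna()
r = williams_r(high, low, close).dropna()

max_gap = float((r - (k - 100.0)).abs().max())          # ~0 up to float rounding
agreement = float(((k < 20) == (r < -80)).mean())        # share of days the flags match
```

Algebraically, %R = %K − 100, so the two oversold thresholds select the same days. Counting both indicators separately therefore counts the same tests twice.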

What doesn’t work:

Parabolic SAR: 0 Bonferroni survivors, negative mean return differential (−11 bps at 5d). On average, the days after a PSAR signal underperform the days without one.
MACD: 8 survivors by count, but the best Cohen’s d is only 0.149. Its “REAL” verdict in the scorecard reflects the sheer number of tests run (80 total), not large effects, and the mean differential at the 20-day horizon is negative (−7.6 bps).
ROC: 0 survivors. Rate of change is noise.
ADX: 1 marginal survivor, d = 0.139. Trend strength, as measured by ADX, does not predict returns with enough consistency to be actionable.

The Mean-Reversion vs. Trend-Following Divide

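Read alongside the academic literature on return predictability (Jegadeesh & Titman 1993, Lo & MacKinlay 1988, Poterba & Summers 1988), which works with different horizons and portfolio constructions than this study, the results suggest that: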

Short-to-medium term equity returns exhibit mean reversion at the daily-to-monthly horizon
Trend-following (momentum) works at longer horizons (months to years) or cross-sectional ranking — not on individual signals at daily chart horizons
Volatility-adapted signals (Keltner, Bollinger) outperform simple price-level signals (SMA) because they normalize for changing market regimes

The retail trading industry sells the trend-following story because it’s psychologically compelling. Crossovers are visually clean, easy to explain on YouTube, and generate consistent trading activity (brokerage revenue). The evidence says they don’t work.

The mean-reversion story is less exciting — “buy when the market falls hard, because it usually recovers” — but it’s what the data supports.

Practical Implications

If you use RSI:

The oversold signal (< 30) has genuine edge, particularly on SPY, QQQ, and GLD at 5-to-20-day horizons. The overbought signal (> 70) was not in the top survivors — mean reversion works more cleanly on the downside.
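
For readers who want to reproduce the oversold flag, here is a minimal RSI sketch. It assumes Wilder’s smoothing and a 14-period window, which is the conventional definition; the study’s exact variant is not specified here, and the function name is hypothetical.

```python
import numpy as np

def rsi_wilder(close, period=14):
    """RSI with Wilder's smoothing. The first `period` values are NaN."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)
    gains = np.where(delta > 0, delta, 0.0)
    losses = np.where(delta < 0, -delta, 0.0)

    rsi = np.full(close.shape, np.nan)
    if len(close) <= period:
        return rsi

    # Seed the averages with a simple mean over the first `period` changes
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    for i in range(period, len(close)):
        if i > period:
            # Wilder update: new average = (old * (period - 1) + latest) / period
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            rsi[i] = 100.0  # convention when there are no losses in the window
        else:
            rs = avg_gain / avg_loss
            rsi[i] = 100.0 - 100.0 / (1.0 + rs)
    return rsi

# Sanity checks on monotone paths, then a random walk
rising = rsi_wilder(np.arange(1.0, 41.0))
falling = rsi_wilder(np.arange(40.0, 0.0, -1.0))
rng = np.random.default_rng(1)
walk = rsi_wilder(100 + rng.normal(0, 1, 300).cumsum())
oversold = walk < 30  # NaN warm-up values compare as False
```

Only values computed from past prices enter each day’s RSI, so the flag on day t uses nothing after day t.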

If you use Keltner Channels:

Lower band touches on SPY are among the highest-d survivors in the study, and the signal also survives on QQQ and GLD. On SPY the 20-day forward return differential is 141 bps, with a permutation p-value at the floor of what 10,000 shuffles can resolve. This is actionable, with appropriate position sizing.
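
A lower-band touch can be computed as below. The article gives the band as 2 ATRs below EMA-20; everything else in this sketch is an assumption made for illustration: the 20-day ATR window, the simple-average ATR, and defining a “touch” as a close at or below the band.

```python
import pandas as pd

def keltner_lower_touch(high, low, close, ema_span=20, atr_window=20, mult=2.0):
    """Return (touch flags, lower band) for a Keltner Channel."""
    prev_close = close.shift(1)
    # True range: the largest of the day's range and the two gaps from the prior close
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    atr = true_range.rolling(atr_window).mean()          # simple-average ATR (assumed)
    mid = close.ewm(span=ema_span, adjust=False).mean()  # EMA centre line
    lower = mid - mult * atr
    # Warm-up rows have a NaN band, and a comparison against NaN is False
    return close <= lower, lower

# Toy data: 30 flat days with a 2-point daily range, then one sharp drop
close = pd.Series([100.0] * 30 + [80.0])
high = close + 1.0
low = close - 1.0
touch, lower = keltner_lower_touch(high, low, close, ema_span=5, atr_window=5)
```

Because the band width scales with ATR, the same percentage drop triggers a touch in a calm market and fails to in a volatile one, which is the volatility adaptation the results lean on.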

If you use OBV:

Volume divergence from price direction is real. OBV below its SMA predicts positive returns over the following month. This is the most consistent signal in the study — 8 Bonferroni survivors, generalizes across assets.
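
The OBV comparison can be sketched as follows. OBV here is the standard running sum of volume signed by the day’s price direction; the 20-day SMA window and the function name are assumptions, since the study’s window is not stated.

```python
import numpy as np
import pandas as pd

def obv_vs_sma(close: pd.Series, volume: pd.Series, window: int = 20) -> pd.DataFrame:
    """On-Balance Volume, its moving average, and above/below flags."""
    # +1 on up days, -1 on down days, 0 on unchanged days and on the first row
    direction = np.sign(close.diff()).fillna(0.0)
    obv = (direction * volume).cumsum()
    obv_sma = obv.rolling(window).mean()
    return pd.DataFrame({
        "obv": obv,
        "obv_sma": obv_sma,
        "below": obv < obv_sma,   # False during the SMA warm-up
        "above": obv > obv_sma,
    })

# Toy data: two up days followed by three down days, constant volume
close = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 9.0])
volume = pd.Series([100.0] * 6)
out = obv_vs_sma(close, volume, window=3)
```

Outside the warm-up period almost every day is either above or below, so the two flags split the sample into complementary groups and their return differentials mirror each other.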

If you use SMA crossovers:

Stop. The Golden Cross and Death Cross have been empirically tested and failed. Twenty years. Five assets. Four horizons. Zero survivors.

If you use PSAR or ROC:

Neither produced a single Bonferroni survivor. ROC is indistinguishable from noise, and PSAR is worse: days after a PSAR signal underperform the other days on average (−11 bps at 5 days).

Limitations and Caveats

Transaction costs: All edges reported are pre-cost. The 1-day +54 bps MFI signal on SPY looks spectacular, but it fires only 76 times in 20 years (roughly 4 times per year), so it contributes little on its own. The signals that fire often tend to have smaller per-trade edges, and those are the ones that bid-ask spreads and commissions erode.
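
A back-of-the-envelope version of that trade-off, using differentials and signal counts from the results table. The 5 bps round-trip cost is a hypothetical placeholder, not a measured figure, and the sketch treats every signal day as a separate trade, which overstates trade counts when signals cluster on consecutive days.

```python
YEARS = 20
ROUND_TRIP_COST_BPS = 5.0  # hypothetical placeholder; substitute your own estimate

# (gross differential in bps, number of signal days) taken from the results table
signals = {
    "SPY MFI<20, 1d": (54.0, 76),
    "SPY Stochastic<20, 5d": (55.0, 698),
    "QQQ RSI<30, 1d": (39.0, 273),
}

def net_edge_bps(gross_bps: float, cost_bps: float = ROUND_TRIP_COST_BPS) -> float:
    """Per-trade differential after subtracting a flat round-trip cost."""
    return gross_bps - cost_bps

summary = {
    name: {
        "signals_per_year": n / YEARS,
        "net_bps": net_edge_bps(gross),
        "cost_share": ROUND_TRIP_COST_BPS / gross,  # fraction of the edge lost to costs
    }
    for name, (gross, n) in signals.items()
}

for name, row in summary.items():
    print(f"{name}: {row['signals_per_year']:.1f}/yr, "
          f"net {row['net_bps']:.0f} bps, cost share {row['cost_share']:.0%}")
```

The gross figures are differentials against non-signal days, not absolute returns, so the net number is a rough guide to how much of the statistical edge survives costs and not a profit forecast.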

Regime sensitivity: This study uses 2004–2024, a period dominated by quantitative easing, secular bull markets (2009–2021), and two significant crashes (2008, 2020). Mean reversion works because markets recover. In a regime where they don’t recover (prolonged deflation, structural bear markets), oversold signals could generate extended losses. TLT’s negative RSI result is a reminder that these effects are asset-specific.

Lookahead bias: All forward returns are computed from signal day + 1 to prevent lookahead. Indicators use only past data by construction.

Single-indicator analysis: We test indicators in isolation. Real trading systems combine signals. A Keltner lower touch with OBV confirming might have higher d than either alone — that’s a follow-on study.

Survivorship: SPY, QQQ, GLD, TLT, and EEM are all surviving, liquid instruments. The regime that made them behave this way may not apply to individual stocks.

Conclusion

Technical indicators are not equally valid. This study draws a sharp line between two classes:

Statistically real: RSI, Keltner Channel, Williams %R, Stochastic, OBV, EMA (all in oversold/divergence modes)

Statistical noise: SMA crossovers (Golden Cross, Death Cross), Parabolic SAR, Rate of Change

The mechanism behind the real signals is consistent: mean reversion after oversold extremes. Markets overshoot. Panic creates dislocations. Prices return toward fair value — particularly in liquid, diversified instruments. An indicator that quantifies how far prices have strayed from normal is measuring something real.

An indicator that measures whether the 50-day line crossed the 200-day line is measuring a geometric artifact of the price path, not a causal driver of future returns. The evidence is now quantitative and unambiguous.

Data and Code

All code, raw results (856 test rows with t-statistics, permutation p-values, Cohen’s d, and Bonferroni flags), and chart generation scripts are available at the ThoughtEngine research repository.

Tests used: Welch’s t-test (scipy.stats.ttest_ind), 5,000–10,000 permutation bootstraps (vectorized numpy), Bonferroni correction (α = 0.05 / n_tests), Cohen’s d (pooled standard deviation formula). Data: yfinance adjusted close prices.

Published May 2026 · REIGraph Research

Methodology note: This study is not investment advice. Effect sizes of d = 0.2–0.4 represent statistically real but practically modest edges. Position sizing, risk management, and portfolio construction determine whether statistical edge translates to trading profit.