Selection bias in 0DTE breakout backtests: how full-population validation overturned four positive drafts

Four drafts of the Donchian 0DTE paper reported +$80K, +$50K, and +$32K per year. Full-population tick validation reported −$107K. The gap came from two compounding biases, and one of them was a selection filter that looked like engineering hygiene.

The headline sequence

Four drafts of the same strategy paper, each “more careful” than the last, reported these annualized results for the best exit policy:

v5 changed exactly one thing: it stopped sampling. Every trade in every exit-policy ledger inside the Databento coverage window was downloaded and measured directly. That took 4,177 deduplicated OPRA queries and about $280 of data spend, with real_exec_pnl = exit_bid − entry_ask computed per trade and no extrapolation.
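
As a rough illustration of the per-trade measurement, here is a minimal Python sketch. The ledger rows and the function signature are hypothetical, and the handling of a missing exit bid (booked at zero, so the full entry premium is lost) is an assumption chosen to match the 100% loss on expired-worthless trades described in this post. It is not the paper's actual pipeline.

```python
import math

def real_exec_pnl(entry_ask: float, exit_bid: float) -> float:
    """Per-trade executable P&L: buy at the ask, sell at the bid.

    Assumption: a missing (NaN) exit bid means the option expired worthless,
    so the exit is booked at 0.0 and the full entry premium is lost.
    """
    if math.isnan(exit_bid):
        exit_bid = 0.0
    return exit_bid - entry_ask

# Hypothetical ledger rows: (entry_ask, exit_bid) in option points.
ledger = [
    (2.50, 4.10),          # winner
    (1.80, 0.90),          # partial loss
    (1.20, float("nan")),  # expired out-of-the-money, no bid at the close
]

# Every trade in the ledger is measured; nothing is sampled or extrapolated.
pnls = [real_exec_pnl(ask, bid) for ask, bid in ledger]
total = sum(pnls)
```

The point of the sketch is that the sum runs over the whole ledger, including the row with no exit bid.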

The filter that looked like hygiene

The v3/v4 calibration pipeline required “valid quotes at exit”: finite bid/ask, no NaN, real_mid_exit ≥ 0. That reads like ordinary data hygiene. It is actually a selection rule.

A long 0DTE option that expires out-of-the-money has no dealer bid at the close — the exit bid is NaN precisely because the position died. For the best exit policy, those expired-worthless trades were 191 of 701 validatable out-of-sample trades (27%), and every single one was a 100% loss of entry premium. The strict filter removed them from the calibration sample, so the BS-to-real ratio was estimated on survivors only.
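
A toy example makes the mechanism concrete. The sketch below applies a "valid quotes at exit" rule to four made-up trades and compares the mean P&L of the survivors with the mean over all trades. The numbers, the function names, and the zero-exit treatment of missing bids are all illustrative assumptions, not data from the paper.

```python
import math

def passes_strict_filter(exit_bid: float, exit_ask: float) -> bool:
    """Mimic a 'valid quotes at exit' rule: finite bid/ask, non-negative mid."""
    if not (math.isfinite(exit_bid) and math.isfinite(exit_ask)):
        return False
    return (exit_bid + exit_ask) / 2 >= 0

def pnl(entry_ask: float, exit_bid: float) -> float:
    # Assumption: no exit bid means a 100% loss of entry premium.
    return (0.0 if math.isnan(exit_bid) else exit_bid) - entry_ask

# Hypothetical trades: (entry_ask, exit_bid, exit_ask).
trades = [
    (2.00, 3.00, 3.20),                  # winner
    (2.00, 1.00, 1.20),                  # loser with a live market at exit
    (2.00, float("nan"), 0.05),          # expired worthless: no bid
    (2.00, float("nan"), float("nan")),  # expired worthless: no quotes at all
]

survivors = [t for t in trades if passes_strict_filter(t[1], t[2])]

mean_survivors = sum(pnl(t[0], t[1]) for t in survivors) / len(survivors)
mean_all = sum(pnl(t[0], t[1]) for t in trades) / len(trades)
```

In this toy ledger the survivors average break-even while the full population averages a loss of one point per trade. Every row the filter removed was a total loss.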

The −4,639-point gap between v4’s prediction and v5’s measurement decomposes into two parts: roughly 36% came from the expired-worthless trades the filter deleted, and 64% from stratified over-weighting. The “top win” tier was sampled at about six times its population frequency, which dragged the calibration ratio upward.
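
The over-weighting effect can be shown with a small worked example. In the sketch below, a "top_win" tier makes up 5% of the population but 30% of the calibration sample, a six-fold over-representation. The tier counts and per-tier ratios are invented for illustration, and reweighting by population counts is one standard correction, not necessarily the one used in the paper.

```python
# Hypothetical tiers: population count, sampled count, and the mean
# calibration ratio observed within each tier of the sample.
tiers = {
    "top_win": {"population": 50, "sampled": 30, "ratio": 1.2},
    "other": {"population": 950, "sampled": 70, "ratio": 0.4},
}

def weighted_mean_ratio(tiers: dict, weight_key: str) -> float:
    """Average the per-tier ratio, weighting each tier by `weight_key`."""
    total_weight = sum(t[weight_key] for t in tiers.values())
    return sum(t[weight_key] * t["ratio"] for t in tiers.values()) / total_weight

# Naive estimate: weights follow the sample, so the over-sampled tier dominates.
naive_ratio = weighted_mean_ratio(tiers, "sampled")

# Reweighted estimate: weights follow the true population frequencies.
reweighted_ratio = weighted_mean_ratio(tiers, "population")

# How over-represented the top tier is in the sample relative to the population.
sample_share = tiers["top_win"]["sampled"] / sum(t["sampled"] for t in tiers.values())
population_share = tiers["top_win"]["population"] / sum(t["population"] for t in tiers.values())
over_representation = sample_share / population_share
```

With these made-up inputs the naive ratio is 0.64 and the population-weighted ratio is 0.44, so the sampling scheme alone inflates the calibration ratio by roughly 45%.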

What full-population validation means in practice

The lesson

Any filter phrased as “require valid data at exit” is conditioned on the outcome of the trade. In options backtests the missing-data mechanism is not random — worthless expiry, panic spreads, and halted quotes all correlate with losing. If a validation step can only remove trades, ask what the removed trades have in common before trusting what survives.
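
One way to act on this is to audit every filter by tabulating what it removes. The sketch below is a hypothetical helper (the function name, the row fields, and the labels are all assumptions) that counts kept and removed rows by a chosen characteristic, so a filter that mostly deletes one kind of outcome becomes visible.

```python
import math
from collections import Counter

def audit_filter(rows, keep, label):
    """Count rows by `label(row)`, separately for kept and removed rows."""
    kept, removed = Counter(), Counter()
    for row in rows:
        bucket = kept if keep(row) else removed
        bucket[label(row)] += 1
    return kept, removed

# Hypothetical trade rows with an exit quote and an expiry outcome flag.
rows = [
    {"exit_bid": 3.0, "expired_otm": False},
    {"exit_bid": 0.4, "expired_otm": False},
    {"exit_bid": float("nan"), "expired_otm": True},
    {"exit_bid": float("nan"), "expired_otm": True},
]

kept, removed = audit_filter(
    rows,
    keep=lambda r: math.isfinite(r["exit_bid"]),
    label=lambda r: "expired_otm" if r["expired_otm"] else "had_exit_market",
)
```

Here every removed row carries the same label, which is the warning sign: the filter is selecting on the outcome rather than cleaning the data.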

Full detail is in the negative-result paper on the research page.