Time-Series Momentum: A Reproducible Audit from Specification to Integer Contracts

Research completed / in-sample audit

Can an individual investor improve on holding an equity index without giving it up? This audit answers with a volatility-targeted time-series-momentum overlay and measures what the descent from specification to order ticket actually costs — layer by layer, with every number injected from the repository and every failure disclosed.

Research Question

Can an individual investor systematically improve on holding an equity index — without abandoning the index position? Most retail attempts to beat the index fail at one of three gates: the strategy is overfit; the strategy is real but the chosen instrument leaks the edge through costs; or both are fine and the account is simply too small to hold the prescribed positions in integer contracts. The literature covers the first gate extensively, the second occasionally, and the third almost never. This project walks all three — on one strategy (a volatility-targeted time-series-momentum overlay), with one audit chain.

Methodology — discipline first

Frozen specification. The signal definition, 3/6/12-month look-backs, a 60-day volatility window, a 10% per-asset volatility target, a 1.50 leverage cap, costs (10 bp one-way; 50 bp annual borrow on ETF shorts), and the single 2015-01-01 sample split were all frozen from literature priors before any estimation. The spec forbids re-optimization on this data.
Look-ahead prevention as a unit test. Not a claim — a test: artificially doubling all future returns must leave current signals bit-for-bit unchanged.
One source of truth. Every number in the paper is injected by the build system from the repository’s output files; the build fails on any undefined macro and reproduces byte-identically across runs.
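The first two points above can be illustrated with a minimal sketch. It is not the project's code: the signal rule (an average of return signs over the three look-backs), the trading-day look-back lengths, the per-asset application of the leverage cap, and every name (`FrozenSpec`, `position`, `no_lookahead`) are assumptions made for illustration. Only the parameter values come from the specification described above.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)  # frozen: assigning to a field after construction raises
class FrozenSpec:
    # ASSUMPTION: 63/126/252 trading days stand in for the 3/6/12-month look-backs.
    lookbacks_days: tuple = (63, 126, 252)
    vol_window_days: int = 60      # 60-day volatility window (from the spec)
    vol_target: float = 0.10       # 10% per-asset volatility target (from the spec)
    leverage_cap: float = 1.50     # ASSUMPTION: cap applied per asset here


SPEC = FrozenSpec()


def sign(x):
    """Return -1, 0, or +1."""
    return (x > 0) - (x < 0)


def position(returns, t, spec=SPEC):
    """Target weight formed at the close of day t from returns[0..t] only.

    `returns` is a list of simple daily returns. The signal rule is a
    hypothetical stand-in: the mean sign of trailing compounded returns.
    """
    history = returns[: t + 1]                 # nothing after day t is read
    if len(history) < max(spec.lookbacks_days):
        return 0.0                             # not enough data yet
    votes = [
        sign(math.prod(1.0 + r for r in history[-lb:]) - 1.0)
        for lb in spec.lookbacks_days
    ]
    signal = sum(votes) / len(votes)
    window = history[-spec.vol_window_days:]
    mean = sum(window) / len(window)
    variance = sum((r - mean) ** 2 for r in window) / (len(window) - 1)
    annual_vol = math.sqrt(variance * 252)
    if annual_vol == 0.0:
        return 0.0
    raw = signal * spec.vol_target / annual_vol  # scale to the volatility target
    return max(-spec.leverage_cap, min(spec.leverage_cap, raw))


def no_lookahead(returns, t, spec=SPEC):
    """Double every return after day t; the day-t position must be unchanged."""
    tampered = returns[: t + 1] + [2.0 * r for r in returns[t + 1:]]
    return position(returns, t, spec) == position(tampered, t, spec)
```

In a test suite, `no_lookahead` would be asserted for every date in the sample. The exact `==` comparison is deliberate: a signal that reads only past data receives identical inputs in both runs, so any difference at all, however small, indicates leakage.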

Results — friction, layer by layer

The same strategy is measured at three altitudes:

Strategy (ETF total-return data, 2003-08 → 2026-05, 274 months). The long/short sleeve earns an excess Sharpe of 0.57 net of costs (Newey–West t = 2.86, bootstrap 95% CI [0.19, 0.96]), with a −0.04 correlation to SPY and gains in 6 of 6 crisis windows (GFC: +16.1% versus SPY −46.0%). The long-flat sleeve reaches a Sharpe of 0.85. All 108 cells of the robustness grid have a positive Sharpe (minimum 0.36).
Instrument (CME futures). Futures replications track their ETF counterparts with strict-pair correlations of 0.90–0.97. For a return-stacking overlay on a full SPY position, risk-adjusted performance rises monotonically with the overlay multiple across the evaluated grid.
Execution (integer contracts at a representative $500K account). Annualized tracking error against the frozen model is 327 bp. Rounding and commissions account for only 110 bp; the larger component, 355 bp, is instrument basis between micro contracts and their full-size parents. The components are not additive (the basis figure alone exceeds the total), so neither should be read as a share of the 327 bp. Tracking error, not alpha decay, is where the retail implementation story is decided.
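The execution layer can be made concrete with a small sketch of how a model weight becomes a whole number of contracts and how the resulting gap is measured. The function names, the contract price, and the multiplier are hypothetical; only the $500K account size comes from the text above. Tracking error is taken here as the annualized standard deviation of the daily return difference, which is one common definition and an assumption about how the audit measures it.

```python
import math


def integer_contracts(target_weight, account_equity, price, multiplier):
    """Whole contracts closest to the target notional.

    Note: Python's round() resolves exact halves to the nearest even integer.
    """
    notional_per_contract = price * multiplier
    return round(target_weight * account_equity / notional_per_contract)


def realized_weight(contracts, account_equity, price, multiplier):
    """Portfolio weight actually held after rounding to whole contracts."""
    return contracts * price * multiplier / account_equity


def tracking_error_bp(model_returns, live_returns, periods_per_year=252):
    """Annualized standard deviation of (live - model) returns, in basis points."""
    diffs = [live - model for model, live in zip(model_returns, live_returns)]
    mean = sum(diffs) / len(diffs)
    variance = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    return math.sqrt(variance * periods_per_year) * 1e4


# Hypothetical micro contract: price 5,000 with a multiplier of 5,
# so one contract carries 25,000 of notional.
n = integer_contracts(0.33, 500_000, 5_000.0, 5.0)   # 6.6 contracts wanted -> 7
w = realized_weight(n, 500_000, 5_000.0, 5.0)        # about 0.35 instead of 0.33
```

In this example a single contract is 5% of the account, so the model's 33% weight is held as 35%. The smaller the account relative to contract notional, the coarser this grid becomes, which is the third gate described in the research question.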

Honest disclosures

All results are in-sample. Freezing parameters from literature priors removes one form of overfitting but is not out-of-sample validation. The strategy’s post-2015 softening is shown in full (the long/short Sharpe falls from 0.73 to 0.40 across the frozen split). Several pre-registered acceptance criteria FAILED and are reported as such. The project also reports its own mistakes as results: a 100× contract-value mis-scaling that was found, corrected, and re-run; two pre-registered predictions the data later falsified; and a cost-control incident in the data-acquisition pipeline. This is an internal research package and a candidate for controlled paper trading — it is not live-trading-ready, and nothing here is investment advice.

Takeaway

The descent from paper to order ticket cost real performance at every step — and measuring those steps, rather than asserting the top-line number, is the contribution. In systematic investing, the audit chain is the product.