Figure: Anytime-valid comparison of Major League Baseball forecasters (2010–2019). Scoring rule: Brier; P = FiveThirtyEight, Q = Vegas-Odds.com.


Abstract

Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post hoc, avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times (“anytime-valid”). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis—whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
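

Example

To make the guarantees above concrete, here is a minimal, illustrative Python sketch of an anytime-valid confidence sequence for the running average Brier score difference between two forecasters. It uses a simple Robbins-style sub-Gaussian mixture boundary (valid because Brier score differences lie in [-1, 1]) rather than the paper's variance-adaptive empirical-Bernstein construction; the function names and the simulated data are hypothetical, not from the paper's experiments.

import numpy as np

def brier(p, y):
    """Brier score of a probability forecast p for a binary outcome y."""
    return (p - y) ** 2

def forecast_comparison_cs(p, q, y, alpha=0.05, rho=1.0):
    """Two-sided confidence sequence (level 1 - alpha) for the running
    average score difference between forecasters p and q.

    Score differences lie in [-1, 1], hence are 1-sub-Gaussian around
    their conditional means; a Robbins normal-mixture boundary then gives
    a time-uniform bound on the centered partial sums. This is a simplified
    stand-in for the paper's variance-adaptive empirical-Bernstein CS.
    """
    p, q, y = map(np.asarray, (p, q, y))
    deltas = brier(p, y) - brier(q, y)   # per-round score differences in [-1, 1]
    t = np.arange(1, len(deltas) + 1)
    mean = np.cumsum(deltas) / t         # running empirical average
    # Boundary: |sum of centered deltas| <= sqrt(2(t+rho) log(sqrt((t+rho)/rho)/alpha))
    radius = np.sqrt(2.0 * (t + rho) * np.log(np.sqrt((t + rho) / rho) / alpha)) / t
    return mean - radius, mean + radius

# Toy usage: a sharper forecaster p vs. a noisier forecaster q on simulated games.
rng = np.random.default_rng(0)
truth = rng.uniform(0.3, 0.7, size=10_000)    # true win probabilities
y = rng.binomial(1, truth)                    # game outcomes
p = np.clip(truth + rng.normal(0, 0.05, truth.shape), 0, 1)
q = np.clip(truth + rng.normal(0, 0.25, truth.shape), 0, 1)
lo, hi = forecast_comparison_cs(p, q, y)
print(f"CS after {len(y)} games: [{lo[-1]:.4f}, {hi[-1]:.4f}]")
# Because the CS is time-uniform, monitoring it after every game and stopping
# as soon as it excludes zero does not inflate the error rate.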


Note

Poster Award (Runner-up with $10K research grant), Citadel Securities Inaugural Ph.D. Summit, 2022


Citation

Choe, Y. J., & Ramdas, A. (2023). Comparing sequential forecasters. Operations Research, 72(4), 1368–1387. https://doi.org/10.1287/opre.2021.0792

@article{choe2023comparing,
  title={Comparing Sequential Forecasters},
  author={Choe, Yo Joong and Ramdas, Aaditya},
  journal={Operations Research},
  volume={72},
  number={4},
  pages={1368--1387},
  year={2023},
  publisher={INFORMS}
}