World Cup Oracle
AI vs Polymarket — predicting the 2026 FIFA World Cup with a Time Series Foundation Model ensemble. 48 teams, 104 matches, $525M market.
Context
Polymarket runs a ~$525M market on the 2026 FIFA World Cup — real money pricing real outcomes. On a structured tournament with known teams, brackets, and rules, this is one of the most honest benchmarks a forecasting system can face.
worldcup-oracle asks: can a Time Series Foundation Model ensemble, combined with classical sports-forecasting primitives, find systematic mispricings on a $525M market?
Approach
Hybrid model stack
- TSFM ensemble — Chronos-2, TimesFM 2.5, FlowState. Each forecasts attacking / defensive strength trajectories from match history. Outputs are averaged.
- Club Elo baseline — decades-old rating system that is genuinely hard to beat as a prior. Treated as a fourth model in the ensemble for sanity checks.
- Poisson goal model — maps team-strength pairs into joint goal distributions per fixture.
- Monte Carlo — 50K simulations of the full 104-match tournament to get win / advance / title probabilities.
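The stack above can be sketched end to end in a few functions. This is a minimal illustration under stated assumptions, not the project's actual code: the strength parameterization (an attack multiplier and a defense multiplier per team), `BASE_RATE`, the coin-flip tie-break standing in for extra time and penalties, and the four placeholder teams are all hypothetical. It also simulates a knockout bracket only, not the real group stage.

```python
import math
import random
from collections import Counter

BASE_RATE = 1.3  # hypothetical average goals per team per match


def average_strengths(per_model):
    """Average (attack, defense) per team across the models' outputs."""
    n = len(per_model)
    return {
        team: (sum(m[team][0] for m in per_model) / n,
               sum(m[team][1] for m in per_model) / n)
        for team in per_model[0]
    }


def expected_goals(a, b, strengths):
    """Expected goals for a and b; a defense value below 1 concedes less."""
    att_a, def_a = strengths[a]
    att_b, def_b = strengths[b]
    return BASE_RATE * att_a * def_b, BASE_RATE * att_b * def_a


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_a, lam_b, max_goals=10):
    """(win, draw, loss) for team a, assuming independent Poisson goals."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


def sample_poisson(lam, rng):
    """Draw one Poisson sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def play_knockout(a, b, strengths, rng):
    """Simulate one knockout match; level scores are settled by a coin flip."""
    lam_a, lam_b = expected_goals(a, b, strengths)
    goals_a, goals_b = sample_poisson(lam_a, rng), sample_poisson(lam_b, rng)
    if goals_a == goals_b:
        return a if rng.random() < 0.5 else b
    return a if goals_a > goals_b else b


def title_probs(bracket, strengths, n_sims=50_000, seed=0):
    """Estimate title probabilities; bracket length must be a power of two."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        teams = list(bracket)
        while len(teams) > 1:
            teams = [play_knockout(teams[i], teams[i + 1], strengths, rng)
                     for i in range(0, len(teams), 2)]
        titles[teams[0]] += 1
    return {team: titles[team] / n_sims for team in bracket}


# Placeholder outputs from two hypothetical models: team -> (attack, defense).
strengths = average_strengths([
    {"A": (1.7, 0.7), "B": (1.0, 1.0), "C": (1.0, 1.0), "D": (0.7, 1.3)},
    {"A": (1.5, 0.7), "B": (1.0, 1.0), "C": (1.0, 1.0), "D": (0.7, 1.3)},
])
probs = title_probs(["A", "B", "C", "D"], strengths, n_sims=5_000)
```

Fixing the seed makes a run reproducible, which matters when a small change in a title probability can move an edge across the 5pp threshold.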
Honest edge selection
An edge is flagged STRONG only when both conditions hold:
- Absolute edge vs Polymarket > 5 percentage points
- All 4 models (Chronos-2, TimesFM 2.5, FlowState, Elo) agree on direction
This filters out “one weird model” calls.
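The two-condition rule can be written as a small function. This is a sketch under assumptions: "direction" is read here as each model's own title probability sitting on the same side of the market price as the ensemble, the `NO FLAG` label is made up, and the per-model numbers in the example are placeholders (only the 32.2% and 16.0% figures for Spain come from this page).

```python
STRONG_THRESHOLD_PP = 5.0  # minimum absolute edge, in percentage points


def classify_edge(ensemble_pct, market_pct, per_model_pct):
    """Return (edge in pp, label) for one team.

    ensemble_pct  -- ensemble title probability, in percent
    market_pct    -- market-implied probability, in percent
    per_model_pct -- dict of model name -> that model's probability, in percent
    """
    edge = ensemble_pct - market_pct
    # Every model must sit on the same side of the market as the ensemble.
    all_agree = all((p - market_pct) * edge > 0 for p in per_model_pct.values())
    if abs(edge) > STRONG_THRESHOLD_PP and all_agree:
        label = "STRONG BUY" if edge > 0 else "STRONG SELL"
    else:
        label = "NO FLAG"  # hypothetical label for everything else
    return round(edge, 1), label


# Per-model values below are placeholders, not published results.
spain_models = {"Chronos-2": 30.0, "TimesFM 2.5": 34.0,
                "FlowState": 31.0, "Elo": 28.0}
spain_edge, spain_label = classify_edge(32.2, 16.0, spain_models)
```

Requiring agreement with the sign of the ensemble edge, rather than only agreement among the models, guards against the case where the simulated ensemble lands on the other side of the market from its inputs.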
Backtest validation
53 tests across past World Cups validate both the goal model and the Monte Carlo pipeline before any prediction is run against the live market.
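This page does not say how the backtests are scored. One common choice for probabilistic match forecasts is the multi-class Brier score, sketched below as an assumption rather than as the project's actual metric. The function name and the win / draw / loss encoding are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-class Brier score over matches; lower is better.

    forecasts -- list of (p_win, p_draw, p_loss) tuples for the first team
    outcomes  -- list of "W", "D" or "L" for the same matches
    """
    index = {"W": 0, "D": 1, "L": 2}
    total = 0.0
    for probs, result in zip(forecasts, outcomes):
        hit = index[result]
        # Squared distance between the forecast and the one-hot outcome.
        total += sum((p - (1.0 if i == hit else 0.0)) ** 2
                     for i, p in enumerate(probs))
    return total / len(forecasts)


# A forecast that knows nothing (1/3 each) scores 2/3 whatever the result,
# which gives a floor that any useful goal model should beat.
uninformed = brier_score([(1 / 3, 1 / 3, 1 / 3)], ["W"])
```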
Notable Edges (April 2026)
The model is materially long Spain and short Brazil / England / Portugal vs the market:
- Spain — 32.2% AI vs 16.0% market = +16.2pp STRONG BUY (all 4 models agree)
- Brazil — 3.0% vs 8.6% = −5.6pp STRONG SELL
- England — 6.0% vs 11.3% = −5.4pp STRONG SELL
- Portugal — 1.9% vs 7.0% = −5.2pp STRONG SELL
These calls will be confirmed or proved wrong by July 19, 2026, which is the point. The model commits in writing, in advance, against a real market.
What I’m Learning
The Spain edge is the interesting one. Polymarket prices Spain at the “recently-good-team” level; the model reads their underlying attacking trajectory plus Elo as significantly stronger. If the market is right and the model is wrong, the likeliest reason is that tournament dynamics (one bad match knocks you out) don’t reward season-long form. If the model is right, the likeliest reason is that the market hasn’t fully priced in trajectory.
Either answer teaches me something concrete about where this kind of hybrid forecasting works and where it doesn’t.
Tech Stack & Links
Stack: Python · Chronos-2 · TimesFM 2.5 · FlowState · Club Elo · Poisson · Monte Carlo (50K runs)
Sister project: UEFA Champions League Oracle — same modeling team, different tournament, different conclusions about what TSFMs add.