Does the American heat move the 2026 World Cup?

The 2026 World Cup is being played across the summer heat of North America, and the intuition writes itself: brutal midday kickoffs in Dallas and Monterrey must be changing the football. I decided to actually check, as a side-study inside my World Cup Oracle forecasting project, against a simple standard: not “does it feel hot on TV”, but “does it show up in the data, and does it move a result you could bet on?”

The answer is the interesting kind of no.

Humid heat has a real, measurable effect on players’ bodies and on how teams play — but that “fitness tax” is absorbed before it reaches the scoreboard.

Every claim below is measured on the same 81 completed matches (group stage + Round of 32, through July 2), with the physical numbers coming from FIFA’s own official post-match reports. It’s an honest, sample-limited, exploratory study — I’ll flag every place where the sample is thin and the signal is soft.

The data, and how I kept myself honest

The point of a study like this is not to find something — with 47 significance tests you always will. The point is to survive scrutiny. So the sourcing is deliberately boring and the methods are conservative.

What	Source
Results · venues · kickoff time	Project cache from ESPN’s public scoreboard
Weather (temp, humidity, apparent temp, wet-bulb, solar radiation, elevation)	Open-Meteo archive & forecast APIs, sampled at each stadium’s coordinates for the match’s own 2-hour window
Running & sprint distance	FIFA Training Centre official post-match reports (PMSR) — team-level, 71 group-stage matches parsed
Player-level sprint count & top speed	Same PMSR reports, OCR-extracted and double-validated (953 players, 58 full teams)
Half-time goal timing	ESPN summary key-events, cross-validated 79/79
Model pre-match win prob · Brier	The Oracle’s own evaluation logs, used to subtract out team strength

Three habits do the heavy lifting. Skill-adjusted partial correlation — I regress both the outcome and the temperature on the model’s pre-match favorite probability and correlate the residuals, so “the strong team won” can’t masquerade as “the weather did it.” Permutation tests (20,000 reshuffles) instead of textbook p-values, because the buckets are small. And Bonferroni correction across all 47 tests — a brutal bar that, spoiler, exactly one result clears.
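A minimal sketch of those three habits, in plain Python. It assumes a Pearson correlation on the residuals and a permutation scheme that reshuffles the temperature residuals; the study's own scripts may use a rank correlation or a different shuffle, and the function names here are illustrative, not taken from the repo.

```python
import random
from statistics import mean


def residuals(y, x):
    """Residuals of a least-squares fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # assumes x is not constant
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den


def partial_corr(outcome, temp, skill):
    """Correlate outcome and temperature after removing skill from both."""
    return pearson(residuals(outcome, skill), residuals(temp, skill))


def permutation_p(outcome, temp, skill, n_perm=20_000, seed=0):
    """Two-sided permutation p-value for the skill-adjusted correlation."""
    rng = random.Random(seed)
    r_out = residuals(outcome, skill)
    r_tmp = residuals(temp, skill)
    observed = abs(pearson(r_out, r_tmp))
    shuffled = r_tmp[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between weather and outcome
        if abs(pearson(r_out, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so p is never exactly 0


def bonferroni(p, n_tests=47):
    """Bonferroni-adjusted p-value across n_tests comparisons."""
    return min(1.0, p * n_tests)
```

With 47 tests, a raw p-value has to sit below roughly 0.05 / 47 ≈ 0.001 to stay under 0.05 after adjustment, which is why only one result clears the bar.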

The one thing heat clearly does: it drains the legs

Goals are rare and noisy; running distance is a continuous number every match, which makes it the most sensitive probe of fatigue I have. And it moves.

Figure: Apparent temp × running distance (1 dot = 1 match, n=71). Legend: open-air, roof, indoor A/C (control).

All venues: ρ=−0.40 (p<0.001, n=71). Open-air loses 0.78 km per °C; inside the A/C venues outdoor temp is uncorrelated — the negative control.

✱ A/C venues (Dallas/Houston/Atlanta) = negative control. With wet-bulb temperature the open-air signal is stronger still: ρ=−0.55 (p<0.001).

This is the spine of the whole study. In open-air stadiums, hotter (specifically more humid-hot, measured as wet-bulb temperature) means less running — ρ = −0.55, p < 0.001, about −0.78 km per °C for the two teams combined. It’s the single result that survives Bonferroni correction across all 47 tests.

The reason I read it as a real exposure effect and not a coincidence is the negative control. Three venues (Dallas, Houston, Atlanta) are climate-controlled indoor stadiums. Inside them, the outdoor temperature should be meaningless, and it is: ρ = −0.17, p = 0.57. The effect appears only where players are actually exposed to the weather. (It also survives an altitude check: dropping Mexico City and Guadalajara entirely makes the effect stronger, not weaker.)
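The negative-control check amounts to computing the same correlation separately per venue type. The sketch below assumes ρ is a Spearman rank correlation (the post does not say which coefficient it uses) and uses a hypothetical row layout of (venue_type, outdoor_temp_c, combined_km).

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def rho_by_venue_type(matches):
    """matches: iterable of (venue_type, outdoor_temp_c, combined_km)."""
    groups = {}
    for venue_type, temp, km in matches:
        temps, kms = groups.setdefault(venue_type, ([], []))
        temps.append(temp)
        kms.append(km)
    return {vt: spearman(t, k) for vt, (t, k) in groups.items()}
```

If exposure is what matters, the open-air group should show a clearly negative ρ while the indoor A/C group hovers near zero.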

How the body pays it: fewer sprints, same top speed

I pushed the FIFA reports down to the individual player level — 953 players, OCR’d and validated row-by-row — to ask what kind of running the heat takes.

Figure: Heat cuts the count, not the speed (953 players). Axis: ρ from −0.5 to +0.5.

Group	Measure	ρ	p
Frequency / volume (falls)	Sprint count	−0.49	0.002
Frequency / volume (falls)	Zone-5 sprint dist	−0.15	0.38
Peak speed (intact)	Top speed (max)	+0.07	0.67
Peak speed (intact)	Top speed (mean)	+0.16	0.33

Heat significantly cuts how often players sprint (count ρ=−0.49, p=0.002), while peak speed is intact (ρ=+0.07) — it caps repeated output, not the single-effort ceiling.

The split is clean and physiologically sensible. Heat cuts how often players sprint (sprint count ρ = −0.49, p = 0.002) while leaving their top speed completely intact (+0.07, p = 0.67). It caps repeated high-intensity output, not the single-effort ceiling — you can still hit your max sprint, you just do it less often. This independently reproduces the “top speed unaffected by heat” finding others pulled from FIFA data, and supplies the half they were missing: the count really does fall.

How teams pay it: they pass instead of run

Here’s the twist that explains why the scoreboard barely notices. Losing running doesn’t mean losing the attack — teams just re-route around it.

In open-air heat, passes go up (ρ = +0.31, p = 0.036), completed line-breaks go up (ρ = +0.33, p = 0.024), and expected goals stay flat. Teams substitute passing for running to keep advancing the ball. The same three temperature bands, measured four different ways, make the asymmetry obvious:

Figure: Three bands, four measures (dot = mean, whisker = 95% CI).

Band	Mean temp	n	Combined km/match
Cool	19.6°C	21	228.0
Mild	24.3°C	27	224.2
Hot	29.7°C	23	221.1

Combined km/match — monotone decline, tight CIs. ρ=−0.31(p=0.009) significant.

Only running falls monotonically and significantly; upsets and goals rattle inside overlapping intervals — signal in the body, none on the scoreboard.

Only running distance falls monotonically and significantly. For upsets or goals, the band means just rattle around inside overlapping confidence intervals. The body layer has a signal; the box score doesn’t.
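For readers who want to rebuild the band comparison, here is a rough sketch. The band cutoffs (22 °C and 27 °C) are hypothetical, chosen only to be consistent with the band means above, and the interval is a normal-approximation 95% CI; the study may have used different cutoffs or a bootstrap.

```python
from statistics import mean, stdev


def band_of(temp_c, cool_max=22.0, hot_min=27.0):
    """Assign a temperature band; the cutoffs are illustrative assumptions."""
    if temp_c < cool_max:
        return "Cool"
    if temp_c >= hot_min:
        return "Hot"
    return "Mild"


def band_summary(rows):
    """rows: iterable of (temp_c, value). Returns {band: (n, mean, lo, hi)}."""
    buckets = {"Cool": [], "Mild": [], "Hot": []}
    for temp_c, value in rows:
        buckets[band_of(temp_c)].append(value)
    out = {}
    for band, vals in buckets.items():
        if len(vals) < 2:
            continue  # a spread cannot be estimated from fewer than 2 points
        m = mean(vals)
        half = 1.96 * stdev(vals) / len(vals) ** 0.5  # normal-approx 95% CI
        out[band] = (len(vals), m, m - half, m + half)
    return out
```

Two bands "overlap" in the sense used here when one band's interval contains part of the other's, which is the pattern the goal and upset measures show.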

The dog that didn’t bark: no second-half collapse

If heat drained players late, you’d expect goals — and mistakes — to pile up in the second half of hot matches. It’s the most intuitive version of the story, and it’s simply not there.

Figure: Does heat push goals later? (2H share overall: 56.7%)

Band	Mean temp	n	2H goal share
Cool	19.3°C	23	65%
Mild	24.3°C	28	50%
Hot	30.1°C	30	56%

The most back-loaded band is the coolest, not the hottest — the opposite of a heat-driven collapse. ρ=−0.14 (p=0.22), n.s.

Football is naturally back-loaded (56.7% of goals came after half-time, heat or not). But the most back-loaded band is the coolest one, not the hottest — the exact opposite of a heat-driven collapse. Whatever heat is doing, it isn’t creating late-game chaos.
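The second-half share per band is simple arithmetic once each goal carries a band and a period. A small sketch, assuming regulation-time goals only and a hypothetical (band, period) layout for the goal events:

```python
def second_half_share(goals):
    """goals: iterable of (band, period), period being 1 or 2.

    Returns {band: fraction of that band's goals scored after half-time}.
    """
    totals, late = {}, {}
    for band, period in goals:
        totals[band] = totals.get(band, 0) + 1
        if period == 2:
            late[band] = late.get(band, 0) + 1
    return {band: late.get(band, 0) / n for band, n in totals.items()}
```

A heat-driven collapse would show up as the hot band having the largest share; in the data above it is the cool band.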

So where does that leave the whole board of hypotheses? Here is every correlation I ran, on one axis:

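Figure: Every signal on one ρ axis (filled = p<0.05). Axis: ρ from −0.5 to +0.5.

Layer	Signal	ρ
Body · heat taxes endurance	Wet-bulb → run	−0.55
Body · heat taxes endurance	Temp → run · open	−0.45
Body · heat taxes endurance	Temp → run · all	−0.40
Body · heat taxes endurance	Sprint count	−0.49
Body · heat taxes endurance	Zone-4 sprint	−0.34
Style · pass instead of run	Passes ↑	+0.31
Style · pass instead of run	Line-breaks ↑	+0.33
Controls that should stay flat	Indoor A/C ctrl	−0.17
Controls that should stay flat	Top speed	+0.07
Controls that should stay flat	xG · open	+0.15
The bridge · marginal	Heat × strength	−0.32
Scoreboard · all null	Upset rate	+0.15
Scoreboard · all null	Total goals	−0.01
Scoreboard · all null	2nd-half goals	−0.14
Scoreboard · all null	Fouls	−0.11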

Every dot that lights up is in the body or style layer; the whole scoreboard layer stays dark. Heat changed the bodies and tactics — not the results.

The shape tells the story better than any paragraph: the body and style layers light up, and the entire scoreboard layer stays dark.

The one bridge toward the scoreboard

There is exactly one place where the body effect starts to lean on a result, and it’s the finding I’ll be watching hardest as the sample grows.

Figure: The favorite's edge evaporates in heat (open-air, km/team).

Side	km/team	n
Favorite	113.0	39
Underdog	111.6	39

Favorite's edge: +1.4 km

In cool matches the favorite out-runs the underdog by +1.4 km; heat erases and reverses it to −0.5 km. Interaction: 0.32 km/°C more decline for favorites (perm p=0.082, n=94) — marginal, the only bridge to the upset layer.

In cool matches, the model’s favorite out-runs the underdog by +1.4 km — that extra running is part of how strong teams impose themselves. In the heat, that edge flattens and reverses to −0.5 km. It lines up perfectly in direction with the faint “more upsets in the heat” ripple (p ≈ 0.20) — for the first time giving that soft signal a concrete mechanical candidate. But the interaction itself is only marginal (permutation p = 0.08, and the hot bucket is just 8 + 8 team-matches). It’s a lead, not a result.
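One way to test an interaction like this is to regress the favorite's running edge (favorite km minus underdog km) on temperature and permute the temperatures across matches. This is a sketch of that idea under my own assumptions about data layout, not necessarily the exact procedure behind the p = 0.08 above.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def interaction_perm_p(matches, n_perm=20_000, seed=0):
    """matches: list of (temp_c, favorite_km, underdog_km).

    Returns (edge_slope_km_per_c, two-sided permutation p-value).
    The slope of the edge on temperature equals the favorite's slope
    minus the underdog's slope, i.e. the heat x strength interaction.
    """
    rng = random.Random(seed)
    temps = [t for t, _, _ in matches]
    edge = [fav - dog for _, fav, dog in matches]
    observed = slope(temps, edge)
    shuffled = temps[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null: the edge is unrelated to temperature
        if abs(slope(shuffled, edge)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A negative slope means the favorite's running advantage shrinks as it gets hotter. With only a handful of hot matches, the permutation distribution is wide, which is why the result stays marginal.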

Putting it together

Figure: The full causal chain. Legend: ● strong, ◑ marginal, ○ null.

● Open-air run ↓ (ρ=−0.55, p<0.001, n=47)

The study's strongest signal: −0.78 km/°C, and it vanishes at indoor A/C venues (the negative control).

The early links (body → tactics) are lit and measured; at favorite's edge and upsets the chain turns marginal, and the scoreboard link goes dark. The body tax is real — absorbed symmetrically before the score.

Read left to right, the chain is fully lit through the body and tactical links, turns marginal at “favorite’s edge” and “upsets,” and goes dark at the scoreboard. Both teams slow down together and both switch to a more economical passing style together — so the strength ordering and the goal structure barely move. Your intuition that the American heat matters is confirmed in the players’ bodies and in the teams’ tactics. It just never becomes one side collapsing.

What I’m not claiming

This is not causal proof. One primary hypothesis was pre-registered; everything else is exploratory correlation. Read the soft signals (p ≈ 0.2) as directions to watch, not conclusions.
The sample is small. 81 matches, ~8 team-matches in the hottest strength bucket. The upset lead and the heat × strength interaction only get resolved after the full 104 matches — I’ll re-run then.
Weather is not a factor in the Oracle’s official predictions. The clean null is the product. The live dashboard has a weather tab showing per-match kickoff conditions and a clearly-labeled experimental adjustment (shrunk to −0.71 pp/°C, capped ±3pp) that auto-zeroes if the signal dies as more matches come in.
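To make the shape of that experimental adjustment concrete, here is a hedged sketch of a shrunk, capped, self-zeroing term. Only the −0.71 pp/°C slope and the ±3 pp cap come from the text above; the reference temperature, the p-value kill switch, and every name below are hypothetical stand-ins for whatever the dashboard actually does.

```python
def weather_adjustment_pp(
    temp_c,
    ref_temp_c=25.0,         # hypothetical baseline temperature
    slope_pp_per_c=-0.71,    # shrunk slope quoted in the text
    cap_pp=3.0,              # cap quoted in the text (plus or minus 3 pp)
    signal_p=0.08,           # hypothetical: current p-value of the signal
    kill_p=0.20,             # hypothetical: threshold for auto-zeroing
):
    """Percentage-point adjustment, clamped to +/- cap_pp.

    Returns 0.0 when the underlying signal is judged to have died.
    """
    if signal_p > kill_p:
        return 0.0
    raw = slope_pp_per_c * (temp_c - ref_temp_c)
    return max(-cap_pp, min(cap_pp, raw))
```

The cap is what keeps a soft signal from ever dominating a forecast: even a 10 °C excursion from the baseline moves the number by at most 3 pp.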

The most useful thing a study can do is find a real, strong, well-identified effect — ρ = −0.55, with a working negative control — that still doesn’t predict the outcome. “Statistically significant” and “moves the money” are different questions, and it’s worth building the discipline to tell them apart.

Reproducible from the worldcup-oracle repo: research/weather_effect/ (REPORT.md + scripts 01–12). Charts on this page read straight from the study’s own stats*.json and matches_deep.csv — a snapshot at 81 matches; they’ll shift as the tournament finishes. Sister piece: Forecasting the new Champions League.