Monte Carlo Tournament Simulation — March Machine Learning Mania 2026
At a glance:
- Simulations: 50,000
- Teams in field: 85
- Top champion probability: 22.2% (Duke)
- Model LOSO log-loss: 0.53651
1. Overview
The ensemble model produces pairwise win probabilities for any two teams: given Team A vs Team B, it outputs $P(A\ \text{wins})$. But the competition cares about championship probabilities — the chance that each team wins all six games and takes the title.
To convert pairwise probabilities into championship probabilities, we simulate the entire tournament bracket 50,000 times using Monte Carlo methods. Each simulation draws random outcomes for every game according to the model's predicted probabilities, producing one complete bracket. Aggregating across all simulations gives us the probability of each team reaching any given round, including the championship.
Why Monte Carlo? Analytical computation of championship probability requires integrating over all possible bracket paths (2^63 possible brackets). Monte Carlo simulation approximates this integral with controlled statistical error: for a team with 20% championship probability, the 90% CI from 50,000 simulations is roughly ±0.3 percentage points.
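The quoted precision follows directly from the binomial standard error. A quick sketch of the check (the only assumption is the normal approximation, with $z = 1.645$ for a 90% interval):

```python
import math

def mc_ci_halfwidth(p, n, z=1.645):
    """Half-width of the 90% normal-approximation CI for a probability
    estimated from n independent Monte Carlo draws."""
    return z * math.sqrt(p * (1 - p) / n)

# A team with 20% championship probability, estimated from 50,000 simulations:
hw = mc_ci_halfwidth(0.20, 50_000)
print(f"90% CI half-width: +/-{100 * hw:.2f} percentage points")  # ~0.29
```

This reproduces the roughly ±0.3-point figure quoted above.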
2. How the Simulation Works
Step 1: Build the Field — Identify tournament teams.
Before Selection Sunday: use bracketology consensus from ESPN, CBS, RotoWire, and TeamRankings. All teams with projected seeds and feature coverage are included (85 teams in current field). Since multiple teams may share a seed line (e.g., 6 teams projected at seed 4), each simulation randomly selects 4 per seed line to fill the 64-team bracket.
After Selection Sunday: use official Kaggle seeds from MNCAATourneySeeds.csv (exactly 68 teams: 64 + 4 play-in games).
Step 2: Assign Seeds & Regions — Teams are distributed across four regions by seed line.
Seeds come from bracketology consensus (pre-Selection Sunday) or official Kaggle seeds (post-Selection Sunday).
Default mode: within each seed line, teams are randomly shuffled and assigned to regions. When a seed line has more than 4 teams (common pre-Selection Sunday), only 4 are selected per simulation — the rest are excluded from that draw.
Fixed-region mode (--fixed-regions): uses bracketology consensus region assignments with constraint satisfaction to respect projected placements.
Step 3: Compute Pairwise Probabilities — For every possible matchup (A vs B), the ensemble model predicts $P(A\ \text{wins})$.
3. Round-by-Round Probabilities
Reading the table: Each column shows the probability of a team reaching that round. R64 reflects the probability of being selected into the 64-team bracket. Before Selection Sunday, bracketology projections may assign more than 4 teams to a seed line (e.g., 6 teams projected at seed 4). Each simulation randomly selects 4 of those teams, so each gets roughly 4/N probability of inclusion. All 1-seeds show 100% because exactly 4 teams are projected there. A team like Gonzaga at ~67% R64 means it is one of 6 projected 4-seeds competing for 4 slots.
4. Championship Probabilities
The 90% confidence interval is computed via parametric bootstrap: we draw 10,000 samples from $\text{Binomial}(50000,\ \hat{p})$ where $\hat{p}$ is the observed championship fraction, and report the 5th–95th percentile range.
The intervals are narrow because 50,000 simulations provide substantial precision. For Duke at 22.2%, the 90% CI is [21.9%, 22.5%] — a range of just 0.6 percentage points.
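A minimal version of that bootstrap, assuming NumPy is available (0.222 is Duke's observed championship fraction from the headline numbers):

```python
import numpy as np

def bootstrap_ci(p_hat, n_sims=50_000, n_boot=10_000, seed=0):
    """Parametric bootstrap 90% CI: resample championship counts from
    Binomial(n_sims, p_hat), then take the 5th-95th percentile range."""
    rng = np.random.default_rng(seed)
    frac = rng.binomial(n_sims, p_hat, size=n_boot) / n_sims
    lo, hi = np.percentile(frac, [5, 95])
    return lo, hi

lo, hi = bootstrap_ci(0.222)  # Duke's observed championship fraction
print(f"90% CI: [{lo:.1%}, {hi:.1%}]")
```

With these inputs the interval lands at roughly [21.9%, 22.5%], matching the reported range.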
Interpreting the probabilities: These are model-implied probabilities, not true frequencies. They reflect what the ensemble model believes, conditional on the bracket structure. If the model is miscalibrated, these probabilities will be systematically off.
5. The Elimination Funnel
Even a dominant team faces compounding uncertainty across six elimination rounds. A team that wins each game with 80% probability has only $0.8^6 \approx 26.2\%$ probability of winning all six.
This explains why even the strongest team tops out near 22% championship probability despite being a heavy favorite in each individual game. The funnel steepens in later rounds because the surviving opponents are themselves strong — a 1-seed's Elite Eight opponent is typically a 2- or 3-seed, not a 15-seed.
Note: The formula $P(\text{champion}) \approx \prod P(\text{win round}\ i)$ is approximate because the opponent in each round depends on the bracket path. The Monte Carlo simulation handles this correctly by drawing specific opponents each round based on who advanced.
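The steepening funnel can be made concrete with illustrative (not model-derived) per-round win probabilities for a strong 1-seed:

```python
from math import prod

# Illustrative per-round win probabilities: easy early rounds, then
# progressively stronger opponents (hypothetical values, not model output).
round_probs = [0.97, 0.85, 0.75, 0.65, 0.60, 0.55]

p = 1.0
for rnd, q in zip(["R64", "R32", "S16", "E8", "F4", "Title"], round_probs):
    p *= q
    print(f"Survives {rnd}: {p:.1%}")

print(f"Champion: {prod(round_probs):.1%}")  # same product, ~13%
```

A team favored in every single game still ends up well under a one-in-five title chance once the rounds compound.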
6. Model vs Market
We compare model championship probabilities to prediction market prices from Kalshi, Polymarket, and a sportsbook sharp consensus. The scatter plot reveals a systematic pattern:
- Model systematically higher on defensive/efficiency teams (Houston, Illinois, Michigan St) — the model rewards KenPom metrics that markets may underweight.
- Markets systematically higher on brand-name/momentum teams (Michigan, Kansas, Iowa St) — markets may overweight recent narratives and fan attention.
Top 5 "Value Picks" (Model > Market)

| Team | Seed | Model | Market | Divergence |
|---|---|---|---|---|
| Arizona | 1 | 18.8% | 14.4% | +4.4pp |
| Duke | 1 | 22.5% | 19.5% | +3.0pp |
| Houston | 2 | 9.8% | 7.8% | +2.0pp |
| Illinois | 2 | 6.4% | 4.6% | +1.7pp |
| Florida | 2 | 10.1% | 9.1% | +1.1pp |
Top 5 "Market Favorites" (Market > Model)

| Team | Seed | Model | Market | Divergence |
|---|---|---|---|---|
| Michigan | 1 | 15.6% | 19.5% | -3.9pp |
| Iowa St | 3 | 1.7% | 3.7% | -2.1pp |
| Kansas | 4 | 0.5% | 1.9% | -1.4pp |
| Arkansas | 5 | 0.4% | 1.5% | -1.1pp |
| St John's | 4 | 0.5% | 1.5% | -0.9pp |
Systematic bias pattern: If the model has a structural bias, it's toward rewarding defensive efficiency and underweighting market narratives. This could be correct on average, but if wrong, the errors will be correlated — Houston, Illinois, and Michigan St all fail together, while Michigan, Kansas, and Iowa St all outperform together.
Important: Model predictions are not blended with market data for the primary competition submission. Markets are used only for post-hoc comparison and identifying potential value. An optional secondary submission uses 70% model + 30% market-implied pairwise probabilities.
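The secondary submission's blend is a plain convex combination; a minimal sketch with hypothetical probabilities:

```python
def blend(p_model, p_market, w_model=0.70):
    """70/30 convex blend of model and market-implied pairwise win
    probabilities, as described for the optional secondary submission."""
    return w_model * p_model + (1 - w_model) * p_market

# Hypothetical matchup: model says 0.80, market-implied price says 0.74.
print(f"{blend(0.80, 0.74):.3f}")  # 0.782
```

Because the blend is convex, the result always lies between the two inputs, so the market can only pull the model toward its own price, never past it.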
7. Rating Perturbation (Experimental)
An experimental mode adds Gaussian noise to each team's AdjEM rating before computing pairwise probabilities in each simulation, modeling the inherent uncertainty in the ratings themselves.
The effect is to compress the championship probability distribution:
- Favorites' championship probabilities decrease (they sometimes draw unlucky perturbations, making them appear weaker than they are).
- Underdogs' championship probabilities increase (they sometimes draw lucky perturbations, making them appear stronger).
With $\sigma = 2.5$ (roughly the standard error of KenPom AdjEM): in ~45% of simulations, the #2 team's perturbed AdjEM exceeds #1's.
This behavior is closer to how prediction markets price teams — markets implicitly account for model uncertainty, which is why they tend to be flatter than raw model probabilities. However, perturbation also introduces noise that may not reflect real-world uncertainty accurately.
Currently NOT used in the primary submission. The standard simulation uses fixed team ratings. Rating perturbation is tracked as an experimental comparison mode.
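A sketch of the perturbation mechanism. The logistic form, the `scale` parameter, and the example ratings are hypothetical stand-ins for the ensemble model; only the Gaussian noise on AdjEM with $\sigma = 2.5$ follows the text.

```python
import random

SIGMA = 2.5  # roughly the standard error of KenPom AdjEM, per the text

def perturbed_win_prob(adjem_a, adjem_b, rng, scale=15.0):
    """Pairwise probability after adding Gaussian noise to each rating.
    The logistic-on-rating-gap form and `scale` are hypothetical stand-ins
    for the ensemble model; only the perturbation step is the point here."""
    gap = (adjem_a + rng.gauss(0, SIGMA)) - (adjem_b + rng.gauss(0, SIGMA))
    return 1 / (1 + 10 ** (-gap / scale))

rng = random.Random(0)
n = 100_000
# How often does a #2 team trailing #1 by 0.5 AdjEM points (hypothetical
# gap) overtake it once both ratings are perturbed?
overtakes = sum(24.5 + rng.gauss(0, SIGMA) > 25.0 + rng.gauss(0, SIGMA)
                for _ in range(n))
print(f"#2's perturbed AdjEM exceeds #1's in {overtakes / n:.0%} of draws")
```

With this 0.5-point gap the overtake rate comes out around 44-45%, in line with the figure quoted above for closely rated top teams; a wider true gap would lower it.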
8. Limitations & Caveats
Pre-Selection Sunday: seed assignments are estimates from bracketology consensus. The actual bracket will differ — some projected teams will miss the field, and seed assignments will shift. Region assignments matter enormously: who you face in the Elite Eight depends on your region.
Independence assumption: the simulation treats each game as independent. In reality, fatigue from a close overtime game affects the next round, momentum from a blowout win carries forward, and injuries sustained during the tournament change a team's probability. None of this is modeled.
Model-dependent: championship probabilities are only as good as the pairwise model. If the ensemble is systematically miscalibrated (e.g., overconfident on favorites), the simulation amplifies the miscalibration through six compounding rounds. A 2% bias in each game compounds to a ~12% bias in championship probability.
Statistical precision: 50,000 simulations provide reasonably tight confidence intervals. For a team with 20% championship probability, the standard error is $\sqrt{0.2 \times 0.8 / 50000} \approx 0.18\%$, giving a 90% CI of approximately ±0.3%. For rare outcomes (e.g., a 15-seed champion at 0.01%), the relative uncertainty is much larger.
Bracket structure sensitivity: a team's championship probability depends heavily on which region they are placed in. Two teams in the same region cannot both reach the Final Four. The pre-Selection Sunday simulation uses estimated region assignments that will change.