Monte Carlo Tournament Simulation — March Machine Learning Mania 2026
At a glance:
- Simulations: 50,000
- Teams in field: 85
- Top champion probability: 22.2% (Duke)
- Model LOSO log-loss: 0.53651
1. Overview
The ensemble model produces pairwise win probabilities for any two teams: given Team A vs Team B, it outputs $P(A\ \text{wins})$. But the competition cares about championship probabilities — the chance that each team wins all six games and takes the title.
To convert pairwise probabilities into championship probabilities, we simulate the entire tournament bracket 50,000 times using Monte Carlo methods. Each simulation draws random outcomes for every game according to the model's predicted probabilities, producing one complete bracket. Aggregating across all simulations gives us the probability of each team reaching any given round, including the championship.
Why Monte Carlo? Analytical computation of championship probability requires integrating over all possible bracket paths (2^63 possible brackets). Monte Carlo simulation approximates this integral with controlled statistical error: for a team with 20% championship probability, the 90% CI from 50,000 simulations is roughly ±0.3 percentage points.
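The quoted precision follows directly from the binomial standard error. A quick sketch of the check (the only assumption is the normal approximation, with $z = 1.645$ for a 90% interval):

```python
import math

def mc_ci_halfwidth(p, n, z=1.645):
    """Half-width of the 90% normal-approximation CI for a probability
    estimated from n independent Monte Carlo draws."""
    return z * math.sqrt(p * (1 - p) / n)

# A team with 20% championship probability, estimated from 50,000 simulations:
hw = mc_ci_halfwidth(0.20, 50_000)
print(f"90% CI half-width: +/-{100 * hw:.2f} percentage points")  # ~0.29
```

This reproduces the roughly ±0.3-point figure quoted above.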
2. How the Simulation Works
Step 1: Build the Field — Identify tournament teams.
Before Selection Sunday: use bracketology consensus from ESPN, CBS, RotoWire, and TeamRankings. All teams with projected seeds and feature coverage are included (85 teams in current field). Since multiple teams may share a seed line (e.g., 6 teams projected at seed 4), each simulation randomly selects 4 per seed line to fill the 64-team bracket.
After Selection Sunday: use official Kaggle seeds from MNCAATourneySeeds.csv (exactly 68 teams: 64 + 4 play-in games).
Step 2: Assign Seeds & Regions — Teams are distributed across four regions by seed line.
Seeds come from bracketology consensus (pre-Selection Sunday) or official Kaggle seeds (post-Selection Sunday).
Default mode: within each seed line, teams are randomly shuffled and assigned to regions. When a seed line has more than 4 teams (common pre-Selection Sunday), only 4 are selected per simulation — the rest are excluded from that draw.
Fixed-region mode (--fixed-regions): uses bracketology consensus region assignments with constraint satisfaction to respect projected placements.
Step 3: Compute Pairwise Probabilities — For every possible matchup (A vs B), the ensemble model predicts $P(A\ \text{wins})$.
3. Round-by-Round Probabilities
Reading the table: Each column shows the probability of a team reaching that round. R64 reflects the probability of being selected into the 64-team bracket. Before Selection Sunday, bracketology projections may assign more than 4 teams to a seed line (e.g., 6 teams projected at seed 4). Each simulation randomly selects 4 of those teams, so each gets roughly 4/N probability of inclusion. All 1-seeds show 100% because exactly 4 teams are projected there. A team like Gonzaga at ~67% R64 means it is one of 6 projected 4-seeds competing for 4 slots.
4. Championship Probabilities
The 90% confidence interval is computed via parametric bootstrap: we draw 10,000 samples from $\text{Binomial}(50000,\ \hat{p})$ where $\hat{p}$ is the observed championship fraction, and report the 5th–95th percentile range.
The intervals are narrow because 50,000 simulations provide substantial precision. For Duke at 22.2%, the 90% CI is [21.9%, 22.5%] — a range of just 0.6 percentage points.
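A minimal version of that bootstrap, assuming NumPy is available (0.222 is Duke's observed championship fraction from the headline numbers):

```python
import numpy as np

def bootstrap_ci(p_hat, n_sims=50_000, n_boot=10_000, seed=0):
    """Parametric bootstrap 90% CI: resample championship counts from
    Binomial(n_sims, p_hat), then take the 5th-95th percentile range."""
    rng = np.random.default_rng(seed)
    frac = rng.binomial(n_sims, p_hat, size=n_boot) / n_sims
    lo, hi = np.percentile(frac, [5, 95])
    return lo, hi

lo, hi = bootstrap_ci(0.222)  # Duke's observed championship fraction
print(f"90% CI: [{lo:.1%}, {hi:.1%}]")
```

With these inputs the interval lands at roughly [21.9%, 22.5%], matching the reported range.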
Interpreting the probabilities: These are model-implied probabilities, not true frequencies. They reflect what the ensemble model believes, conditional on the bracket structure. If the model is miscalibrated, these probabilities will be systematically off.
5. The Elimination Funnel
Even a dominant team faces compounding uncertainty across six elimination rounds. A team that wins each game with 80% probability has only $0.8^6 \approx 26.2\%$ probability of winning all six.
This explains why even the strongest team tops out near 22% championship probability despite being a heavy favorite in each individual game. The funnel steepens in later rounds because the surviving opponents are themselves strong — a 1-seed's Elite Eight opponent is typically a 2- or 3-seed, not a 15-seed.
Note: The formula $P(\text{champion}) \approx \prod P(\text{win round}\ i)$ is approximate because the opponent in each round depends on the bracket path. The Monte Carlo simulation handles this correctly by drawing specific opponents each round based on who advanced.
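The steepening funnel can be made concrete with illustrative (not model-derived) per-round win probabilities for a strong 1-seed:

```python
from math import prod

# Illustrative per-round win probabilities: easy early rounds, then
# progressively stronger opponents (hypothetical values, not model output).
round_probs = [0.97, 0.85, 0.75, 0.65, 0.60, 0.55]

p = 1.0
for rnd, q in zip(["R64", "R32", "S16", "E8", "F4", "Title"], round_probs):
    p *= q
    print(f"Survives {rnd}: {p:.1%}")

print(f"Champion: {prod(round_probs):.1%}")  # same product, ~13%
```

A team favored in every single game still ends up well under a one-in-five title chance once the rounds compound.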
6. Model vs Market
We compare model championship probabilities to prediction market prices from Kalshi, Polymarket, and a sportsbook sharp consensus. The scatter plot reveals a systematic pattern:
- Model systematically higher on defensive/efficiency teams (Houston, Illinois, Michigan St) — the model rewards KenPom metrics that markets may underweight.
- Markets systematically higher on brand-name/momentum teams (Michigan, Kansas, Iowa St) — markets may overweight recent narratives and fan attention.
Top 5 "Value Picks" (Model > Market)

| Team | Seed | Model | Market | Divergence |
|---|---|---|---|---|
| Arizona | 1 | 18.8% | 14.4% | +4.4pp |
| Duke | 1 | 22.5% | 19.5% | +3.0pp |
| Houston | 2 | 9.8% | 7.8% | +2.0pp |
| Illinois | 2 | 6.4% | 4.6% | +1.7pp |
| Florida | 2 | 10.1% | 9.1% | +1.1pp |
Top 5 "Market Favorites" (Market > Model)

| Team | Seed | Model | Market | Divergence |
|---|---|---|---|---|
| Michigan | 1 | 15.6% | 19.5% | -3.9pp |
| Iowa St | 3 | 1.7% | 3.7% | -2.1pp |
| Kansas | 4 | 0.5% | 1.9% | -1.4pp |
| Arkansas | 5 | 0.4% | 1.5% | -1.1pp |
| St John's | 4 | 0.5% | 1.5% | -0.9pp |
Systematic bias pattern: If the model has a structural bias, it's toward rewarding defensive efficiency and underweighting market narratives. This could be correct on average, but if wrong, the errors will be correlated — Houston, Illinois, and Michigan St all fail together, while Michigan, Kansas, and Iowa St all outperform together.
Important: Model predictions are not blended with market data for the primary competition submission. Markets are used only for post-hoc comparison and identifying potential value. An optional secondary submission uses 70% model + 30% market-implied pairwise probabilities.
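The secondary submission's blend is a plain convex combination; a minimal sketch with hypothetical probabilities:

```python
def blend(p_model, p_market, w_model=0.70):
    """70/30 convex blend of model and market-implied pairwise win
    probabilities, as described for the optional secondary submission."""
    return w_model * p_model + (1 - w_model) * p_market

# Hypothetical matchup: model says 0.80, market-implied price says 0.74.
print(f"{blend(0.80, 0.74):.3f}")  # 0.782
```

Because the blend is convex, the result always lies between the two inputs, so the market can only pull the model toward its own price, never past it.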
7. Rating Perturbation (Experimental)
An experimental mode adds Gaussian noise to each team's AdjEM rating before computing pairwise probabilities in each simulation, modeling the inherent uncertainty in the ratings themselves.
The effect is to compress the championship probability distribution:
- Favorites' championship probabilities decrease (they sometimes draw unlucky perturbations, making them appear weaker than they are).
- Underdogs' championship probabilities increase (they sometimes draw lucky perturbations, making them appear stronger).
With $\sigma = 2.5$ (roughly the standard error of KenPom AdjEM): in ~45% of simulations, the #2 team's perturbed AdjEM exceeds #1's.
This behavior is closer to how prediction markets price teams — markets implicitly account for model uncertainty, which is why they tend to be flatter than raw model probabilities. However, perturbation also introduces noise that may not reflect real-world uncertainty accurately.
Currently NOT used in the primary submission. The standard simulation uses fixed team ratings. Rating perturbation is tracked as an experimental comparison mode.
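A sketch of the perturbation mechanism. The logistic form, the `scale` parameter, and the example ratings are hypothetical stand-ins for the ensemble model; only the Gaussian noise on AdjEM with $\sigma = 2.5$ follows the text.

```python
import random

SIGMA = 2.5  # roughly the standard error of KenPom AdjEM, per the text

def perturbed_win_prob(adjem_a, adjem_b, rng, scale=15.0):
    """Pairwise probability after adding Gaussian noise to each rating.
    The logistic-on-rating-gap form and `scale` are hypothetical stand-ins
    for the ensemble model; only the perturbation step is the point here."""
    gap = (adjem_a + rng.gauss(0, SIGMA)) - (adjem_b + rng.gauss(0, SIGMA))
    return 1 / (1 + 10 ** (-gap / scale))

rng = random.Random(0)
n = 100_000
# How often does a #2 team trailing #1 by 0.5 AdjEM points (hypothetical
# gap) overtake it once both ratings are perturbed?
overtakes = sum(24.5 + rng.gauss(0, SIGMA) > 25.0 + rng.gauss(0, SIGMA)
                for _ in range(n))
print(f"#2's perturbed AdjEM exceeds #1's in {overtakes / n:.0%} of draws")
```

With this 0.5-point gap the overtake rate comes out around 44-45%, in line with the figure quoted above for closely rated top teams; a wider true gap would lower it.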
8. Limitations & Caveats
Pre-Selection Sunday: seed assignments are estimates from bracketology consensus. The actual bracket will differ — some projected teams will miss the field, and seed assignments will shift. Region assignments matter enormously: who you face in the Elite Eight depends on your region.
Independence assumption: the simulation treats each game as independent. In reality, fatigue from a close overtime game affects the next round, momentum from a blowout win carries forward, and injuries sustained during the tournament change a team's probability. None of this is modeled.
Model-dependent: championship probabilities are only as good as the pairwise model. If the ensemble is systematically miscalibrated (e.g., overconfident on favorites), the simulation amplifies the miscalibration through six compounding rounds. A 2% bias in each game compounds to a ~12% bias in championship probability.
Statistical precision: 50,000 simulations provide reasonably tight confidence intervals. For a team with 20% championship probability, the standard error is $\sqrt{0.2 \times 0.8 / 50000} \approx 0.18\%$, giving a 90% CI of approximately ±0.3%. For rare outcomes (e.g., a 15-seed champion at 0.01%), the relative uncertainty is much larger.
Bracket structure sensitivity: a team's championship probability depends heavily on which region they are placed in. Two teams in the same region cannot both reach the Final Four. The pre-Selection Sunday simulation uses estimated region assignments that will change.