Bracket Simulation & Championship Probabilities

Monte Carlo Tournament Simulation — March Machine Learning Mania 2026

- 50K simulations
- 85 teams in field
- 22.2% top champion probability (Duke)
- 0.53651 model LOSO log-loss

1. Overview

The ensemble model produces pairwise win probabilities for any two teams: given Team A vs Team B, it outputs $P(A\ \text{wins})$. But the competition cares about championship probabilities — the chance that each team wins all six games and takes the title.

To convert pairwise probabilities into championship probabilities, we simulate the entire tournament bracket 50,000 times using Monte Carlo methods. Each simulation draws random outcomes for every game according to the model's predicted probabilities, producing one complete bracket. Aggregating across all simulations gives us the probability of each team reaching any given round, including the championship.

Why Monte Carlo? Computing championship probabilities exactly would require summing over all $2^{63}$ possible bracket outcomes (one binary result for each of the 63 games). Monte Carlo simulation approximates this sum with controlled statistical error: for a team with 20% championship probability, the 90% CI from 50,000 simulations is roughly ±0.3 percentage points.
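The ±0.3-point figure follows from the normal approximation to the binomial; a quick sketch to check it:

```python
import math

def mc_ci_halfwidth(p, n_sims, z=1.645):
    """Half-width of the 90% CI for a Monte Carlo probability estimate
    (normal approximation to the binomial; z = 1.645 for 90%)."""
    return z * math.sqrt(p * (1 - p) / n_sims)

# A team with a true 20% championship probability, 50,000 simulations:
hw = mc_ci_halfwidth(0.20, 50_000)
print(f"+/- {hw * 100:.2f} percentage points")  # roughly +/- 0.29pp
```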

2. How the Simulation Works

*Figure: Monte Carlo methodology diagram*

1. **Build the Field** — Identify tournament teams.
2. **Assign Seeds & Regions** — Teams are distributed across four regions by seed line.
3. **Compute Pairwise Probabilities** — For every possible matchup (A vs B), the ensemble model predicts $P(A\ \text{wins})$:
   $$P = 0.70 \cdot P_{\text{LightGBM}}(\mathbf{x}_\Delta) + 0.30 \cdot P_{\text{LogReg}}(\mathbf{x}_\Delta) \qquad \text{clipped to } [0.01, 0.99]$$
   All probabilities are precomputed and cached before the simulation loop begins.
4. **Simulate Bracket (×50,000)** — Each simulation plays out the full tournament.
5. **Aggregate Results** — Count outcomes across all simulations:
   $$P(\text{champion}) = \frac{\text{# simulations won}}{50{,}000} \qquad P(\text{reach round}\ r) = \frac{\text{# times reached round}\ r}{50{,}000}$$
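The simulate-and-aggregate steps can be sketched as a minimal single-elimination simulator. The toy four-team field, its win probabilities, and the adjacent-pairing bracket order below are illustrative assumptions, not the actual 2026 field or model outputs:

```python
import random
from collections import Counter

def simulate_bracket(teams, pairwise, rng):
    """Play one single-elimination bracket. `teams` is ordered so that
    adjacent entries meet in round 1; pairwise[(a, b)] = P(a beats b)."""
    alive = list(teams)
    while len(alive) > 1:
        # Pair off survivors and draw each game from its probability.
        alive = [a if rng.random() < pairwise[(a, b)] else b
                 for a, b in zip(alive[::2], alive[1::2])]
    return alive[0]

# Toy field: Duke beats anyone 75% of the time; other games are coin flips.
teams = ["Duke", "Nebraska", "Houston", "Gonzaga"]
p_duke = 0.75
pairwise = {}
for a in teams:
    for b in teams:
        if a == b:
            continue
        if a == "Duke":
            pairwise[(a, b)] = p_duke
        elif b == "Duke":
            pairwise[(a, b)] = 1 - p_duke
        else:
            pairwise[(a, b)] = 0.5

rng = random.Random(0)
n_sims = 50_000
champs = Counter(simulate_bracket(teams, pairwise, rng) for _ in range(n_sims))
for team, wins in champs.most_common():
    print(f"{team:10s} P(champion) = {wins / n_sims:.3f}")
```

For this toy field Duke's championship probability converges to $0.75^2 = 0.5625$, illustrating how per-game probabilities compound across rounds.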

3. Round-by-Round Results

Round-by-round advancement probabilities:

| Team | Seed | R64 | R32 | S16 | E8 | F4 | Final | Champ |
|---|---|---|---|---|---|---|---|---|
| Duke | 1 | 100% | 96% | 85% | 71% | 52% | 35% | 22.2% |
| Arizona | 1 | 100% | 96% | 86% | 70% | 49% | 31% | 18.5% |
| Michigan | 1 | 100% | 95% | 83% | 67% | 45% | 27% | 15.3% |
| Florida | 2 | 100% | 94% | 79% | 60% | 33% | 19% | 9.9% |
| Houston | 2 | 100% | 96% | 81% | 62% | 34% | 19% | 9.5% |
| Illinois | 2 | 100% | 95% | 78% | 55% | 28% | 14% | 6.2% |
| Connecticut | 1 | 100% | 94% | 75% | 50% | 30% | 14% | 5.8% |
| Michigan St | 2 | 100% | 94% | 68% | 42% | 17% | 7% | 2.3% |
| Purdue | 3 | 100% | 92% | 67% | 33% | 13% | 5% | 1.8% |
| Iowa St | 3 | 100% | 89% | 63% | 28% | 11% | 4% | 1.6% |
| Gonzaga | 4 | 67% | 58% | 44% | 16% | 9% | 4% | 1.5% |
| Alabama | 4 | 66% | 58% | 39% | 12% | 6% | 2% | 0.6% |
| Virginia | 3 | 100% | 94% | 56% | 20% | 6% | 2% | 0.5% |
| St John's | 4 | 67% | 59% | 37% | 12% | 5% | 2% | 0.5% |
| Louisville | 6 | 100% | 78% | 48% | 20% | 6% | 2% | 0.5% |
| Kansas | 4 | 67% | 59% | 39% | 11% | 5% | 2% | 0.5% |
| Nebraska | 4 | 67% | 58% | 36% | 12% | 5% | 2% | 0.5% |
| Tennessee | 5 | 100% | 85% | 43% | 12% | 5% | 2% | 0.4% |
| Arkansas | 5 | 100% | 82% | 39% | 11% | 4% | 1% | 0.4% |
| Vanderbilt | 5 | 100% | 85% | 41% | 12% | 5% | 1% | 0.4% |
Reading the table: Each column shows the probability of a team reaching that round. R64 reflects the probability of being selected into the 64-team bracket. Before Selection Sunday, bracketology projections may assign more than 4 teams to a seed line (e.g., 6 teams projected at seed 4). Each simulation randomly selects 4 of those teams, so each gets roughly 4/N probability of inclusion. All 1-seeds show 100% because exactly 4 teams are projected there. A team like Gonzaga at ~67% R64 means it is one of 6 projected 4-seeds competing for 4 slots.
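The 4-of-N field selection described above can be sketched as follows; the hypothetical seed-4 bubble list and the uniform random choice of 4 teams are illustrative assumptions:

```python
import random

# Hypothetical seed-4 bubble: 6 projected teams competing for 4 slots.
projected_4_seeds = ["Gonzaga", "Alabama", "St John's", "Kansas",
                     "Nebraska", "Creighton"]

rng = random.Random(42)
n_sims = 50_000
made_field = {t: 0 for t in projected_4_seeds}
for _ in range(n_sims):
    for team in rng.sample(projected_4_seeds, 4):  # draw 4 of the 6
        made_field[team] += 1

for team, count in made_field.items():
    print(f"{team:10s} P(R64) = {count / n_sims:.3f}")  # each near 4/6 = 0.667
```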

4. Championship Probabilities

Championship probabilities with confidence intervals

The 90% confidence interval is computed via parametric bootstrap: we draw 10,000 samples from $\text{Binomial}(50000,\ \hat{p})$ where $\hat{p}$ is the observed championship fraction, and report the 5th–95th percentile range.

The intervals are narrow because 50,000 simulations provide substantial precision. For Duke at 22.2%, the 90% CI is [21.9%, 22.5%] — a range of just 0.6 percentage points.
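The bootstrap described above is a few lines of NumPy; the 0.222 input is Duke's observed championship fraction, and the function name is illustrative:

```python
import numpy as np

def bootstrap_ci(p_hat, n_sims=50_000, n_boot=10_000, level=0.90, seed=0):
    """Parametric bootstrap CI: resample championship counts from
    Binomial(n_sims, p_hat) and take the central percentile range."""
    rng = np.random.default_rng(seed)
    samples = rng.binomial(n_sims, p_hat, size=n_boot) / n_sims
    lo, hi = np.percentile(samples, [100 * (1 - level) / 2,
                                     100 * (1 + level) / 2])
    return lo, hi

lo, hi = bootstrap_ci(0.222)  # Duke's observed championship fraction
print(f"90% CI: [{lo:.1%}, {hi:.1%}]")  # approximately [21.9%, 22.5%]
```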

Interpreting the probabilities: These are model-implied probabilities, not true frequencies. They reflect what the ensemble model believes, conditional on the bracket structure. If the model is miscalibrated, these probabilities will be systematically off.

5. The Elimination Funnel

Elimination funnel for top 4 teams

Even a dominant team faces compounding uncertainty across six elimination rounds. A team that wins each game with 80% probability has only:

$$P(\text{champion}) \approx \prod_{i=1}^{6} P(\text{win round}\ i) = 0.80^6 = 0.262 = 26.2\%$$

This explains why no team exceeds ~22% championship probability even when it is a heavy favorite in each individual game. The funnel steepens in later rounds because the surviving opponents are themselves strong — a 1-seed's Elite Eight opponent is typically a 2- or 3-seed, not a 15-seed.

Note: The formula $P(\text{champion}) \approx \prod P(\text{win round}\ i)$ is approximate because the opponent in each round depends on the bracket path. The Monte Carlo simulation handles this correctly by drawing specific opponents each round based on who advanced.
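The compounding arithmetic is easy to verify. The per-round values below are illustrative, steepening the way the funnel does; they are not the model's actual round probabilities:

```python
# Compounding elimination: per-round win probabilities multiply.
# Illustrative values that steepen as surviving opponents improve.
per_round = [0.96, 0.89, 0.84, 0.73, 0.68, 0.63]

p_champ = 1.0
for rnd, p in enumerate(per_round, start=1):
    p_champ *= p
    print(f"after round {rnd}: P(still alive) = {p_champ:.3f}")

# A constant 80% favorite, as in the formula above:
print(f"0.80^6 = {0.80 ** 6:.3f}")  # 0.262
```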

6. Model vs Market

Model vs market scatter plot

We compare model championship probabilities to prediction market prices from Kalshi, Polymarket, and a sportsbook sharp consensus. The scatter plot reveals a systematic pattern:

Top 5 "Value Picks" (Model > Market)

| Team | Seed | Model | Market | Divergence |
|---|---|---|---|---|
| Arizona | 1 | 18.8% | 14.4% | +4.4pp |
| Duke | 1 | 22.5% | 19.5% | +3.0pp |
| Houston | 2 | 9.8% | 7.8% | +2.0pp |
| Illinois | 2 | 6.4% | 4.6% | +1.7pp |
| Florida | 2 | 10.1% | 9.1% | +1.1pp |

Top 5 "Market Favorites" (Market > Model)

| Team | Seed | Model | Market | Divergence |
|---|---|---|---|---|
| Michigan | 1 | 15.6% | 19.5% | -3.9pp |
| Iowa St | 3 | 1.7% | 3.7% | -2.1pp |
| Kansas | 4 | 0.5% | 1.9% | -1.4pp |
| Arkansas | 5 | 0.4% | 1.5% | -1.1pp |
| St John's | 4 | 0.5% | 1.5% | -0.9pp |

Systematic bias pattern: If the model has a structural bias, it's toward rewarding defensive efficiency and underweighting market narratives. This could be correct on average, but if wrong, the errors will be correlated — Houston, Illinois, and Michigan St all fail together, while Michigan, Kansas, and Iowa St all outperform together.
Important: Model predictions are not blended with market data for the primary competition submission. Markets are used only for post-hoc comparison and identifying potential value. An optional secondary submission uses 70% model + 30% market-implied pairwise probabilities.
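A minimal sketch of the optional secondary-submission blend, assuming model and market-implied pairwise probabilities are already on the same scale; the clipping mirrors the ensemble formula in Section 2:

```python
def blend(p_model, p_market, w_model=0.70):
    """Convex blend of model and market-implied pairwise win
    probabilities, clipped like the primary ensemble output."""
    p = w_model * p_model + (1 - w_model) * p_market
    return min(max(p, 0.01), 0.99)

# Hypothetical pairwise matchup where the model and market disagree:
print(blend(0.62, 0.55))
```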

7. Rating Perturbation (Experimental)

An experimental mode adds Gaussian noise to each team's AdjEM rating before computing pairwise probabilities in each simulation, modeling the inherent uncertainty in team ratings:

$$\text{AdjEM}_{\text{perturbed}} = \text{AdjEM} + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

The effect is to compress the championship probability distribution, pulling the favorites down toward the rest of the field.

This behavior is closer to how prediction markets price teams — markets implicitly account for model uncertainty, which is why they tend to be flatter than raw model probabilities. However, perturbation also introduces noise that may not reflect real-world uncertainty accurately.
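A sketch of the perturbation mode. The logistic mapping from AdjEM difference to win probability, and the `sigma` and `scale` constants, are illustrative stand-ins for the full ensemble model:

```python
import numpy as np

def perturbed_win_prob(adjem_a, adjem_b, sigma=2.0, scale=10.0, rng=None):
    """Pairwise probability for one simulation after adding Gaussian
    noise to each team's AdjEM rating. The logistic mapping and the
    sigma/scale values are illustrative, not the actual ensemble."""
    if rng is None:
        rng = np.random.default_rng()
    a = adjem_a + rng.normal(0.0, sigma)
    b = adjem_b + rng.normal(0.0, sigma)
    return 1.0 / (1.0 + np.exp(-(a - b) / scale))

rng = np.random.default_rng(0)
draws = [perturbed_win_prob(25.0, 10.0, rng=rng) for _ in range(10_000)]
# Without noise: 1 / (1 + exp(-1.5)) ~ 0.818. Averaging over noisy draws
# pulls the estimate slightly toward 0.5 -- the "compression" effect.
print(f"mean perturbed P(A wins) = {float(np.mean(draws)):.3f}")
```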

Currently NOT used in the primary submission. The standard simulation uses fixed team ratings. Rating perturbation is tracked as an experimental comparison mode.

8. Limitations & Caveats