This presentation shows how to model and predict the outcomes of CFL games, estimate each team's probability of finishing first in its division, and derive power rankings that feed a simulation of the regular season. It also notes the financial impact of finishing first and suggests a refinement: weighting recent games more heavily.
Who’s on First: Simulating the Canadian Football League regular season Keith A. Willoughby, Ph.D. University of Saskatchewan Joint Statistical Meetings (2014)
Research questions • Can we develop a spreadsheet model to simulate the outcome of professional football games? • Can we use this model to determine the probabilities of a team finishing first in their division?
Overview of presentation • 1. CFL background • 2. Power rankings model • 3. CFL simulation model • 4. Results
CFL teams (2014) • Eastern Division • Western Division (team logos not reproduced)
Why do teams want to finish 1st in their division? • The 1st place team hosts the divisional championship game • Winners of each divisional championship game meet in the Grey Cup
Financial impact • Hosting a playoff game can yield over $1 million in profit for the home team • Ticket sales, concession sales • Annual salary cap for each team is about $5 million
Power rankings model • To develop the simulation model, we needed to determine each team's probability of victory in any regular season game • Need a way to quantitatively establish the “strength” of each team
“Strength” values • Considers two items: • Particular opponent: defeating a stronger opponent increases a team’s strength value • Outcome of each game (margin of victory): defeating an opponent by a larger margin increases a team’s strength value
Power rankings model • For each game, let: • $S_i$ = score of winning team • $S_j$ = score of losing team • Margin of victory: $\mathrm{MOV}_{i,j} = S_i - S_j$
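The slides do not state the objective of the power rankings optimization model. As a rough sketch only, one common choice is assumed here: pick strength values so that the difference between the winner's and loser's strengths is as close as possible (in the least-squares sense) to the observed margin of victory. The function name `fit_strengths`, the step size, the iteration count, and the toy teams are all hypothetical.

```python
def fit_strengths(games, teams, lr=0.1, iters=2000):
    """Assumed least-squares fit: minimize the mean of
    ((beta[winner] - beta[loser]) - margin)^2 by gradient descent.

    games: list of (winner, loser, margin_of_victory) tuples.
    Strengths are re-centred to mean zero, since only differences matter.
    """
    beta = {t: 0.0 for t in teams}
    n = len(games)
    for _ in range(iters):
        grad = {t: 0.0 for t in teams}
        for winner, loser, margin in games:
            err = (beta[winner] - beta[loser]) - margin
            grad[winner] += 2.0 * err / n
            grad[loser] -= 2.0 * err / n
        for t in teams:
            beta[t] -= lr * grad[t]
        mean = sum(beta.values()) / len(teams)
        for t in teams:
            beta[t] -= mean
    return beta


# Toy data with hypothetical teams A, B, C (not real CFL results).
toy_games = [("A", "B", 10), ("B", "C", 5), ("A", "C", 15)]
toy_beta = fit_strengths(toy_games, ["A", "B", "C"])
```

Under this assumed objective, beating a stronger opponent or winning by a larger margin both push a team's strength value up, which matches the two items listed for the strength values.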
Simulation model • How well do the strength values (β’s) correlate with game outcomes? • Analyzed game results from 2006-2012 seasons • 504 CFL games
Simulation model • Using the optimization model, we determined the strength values (β’s) for each team • Calculated $\beta_i - \beta_j$ for each game in each season • Team $i$ represented the home team
Simulation model • Logistic regression model: • Explanatory variable: $X = \beta_h - \beta_v$, where $h$ = home team and $v$ = visiting team • Response variable: $Y$ = outcome of game (1 if the home team won; 0 if the home team lost) • Tie games (3 of 504): the visiting team was assigned as the winner
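To make the regression setup concrete, here is a minimal sketch that fits a one-variable logistic regression by gradient ascent on the log-likelihood. The slides do not report the fitted coefficients or the fitting tool, so the function names, the learning rate, and the small data set below are illustrative assumptions, not the 504-game sample.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.01, iters=5000):
    """Fit P(Y=1 | X=x) = sigmoid(a + b*x) by gradient ascent.

    xs: strength differences (home minus visitor).
    ys: 1 if the home team won, 0 otherwise (ties coded as 0,
        i.e. credited to the visiting team, as in the slides).
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


# Made-up strength differences and outcomes, for illustration only.
demo_x = [-12, -8, -5, -3, -1, 0, 2, 4, 6, 9, 13]
demo_y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
a_hat, b_hat = fit_logistic(demo_x, demo_y)
```

A positive slope `b_hat` means a larger strength advantage for the home team goes with a higher estimated chance of a home win.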
Probability of victory • Applied the simulation model to the 2013 regular season • Calculated $\beta_h - \beta_v$ for all games yet to be played • Added 3.4 points to the resulting difference to reflect “home field advantage”: the average home-team margin of victory from 2006-2012
Simulation model • Used the logistic regression equation to determine the probability of victory • Generate random numbers using the RAND() function • If RAND() ≤ Calculated probability, then home team wins • Else, visiting team wins
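The single-game step can be sketched outside a spreadsheet as follows, with Python's `random.random()` standing in for `RAND()`. The 3.4-point home adjustment comes from the slides; the `intercept` and `slope` defaults are placeholders, because the fitted regression coefficients are not given.

```python
import math
import random

HOME_FIELD_ADVANTAGE = 3.4  # average home margin of victory, 2006-2012 (from the slides)


def home_win_probability(beta_home, beta_visitor, intercept=0.0, slope=0.1):
    """Logistic probability that the home team wins.

    intercept and slope are hypothetical placeholder coefficients.
    """
    x = (beta_home - beta_visitor) + HOME_FIELD_ADVANTAGE
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def simulate_game(p_home, rng=random):
    """Draw one uniform random number; the home team wins if it is <= p_home."""
    return "home" if rng.random() <= p_home else "visitor"


# Two equally strong teams: the home side is still favoured.
p_even = home_win_probability(0.0, 0.0)
result = simulate_game(p_even, random.Random(2013))
```

Passing a seeded `random.Random` instance makes a run reproducible, which a spreadsheet's `RAND()` does not offer directly.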
Simulation model • Require the following inputs: • Current number of wins • Remaining games • Strength values from the power rankings optimization model
Simulation model • The model calculates the expected number of wins for each team • Across repeated simulation runs, counting how often a specific team finishes with the most wins gives the probability that each team finishes first in its four-team division
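The counting procedure can be sketched as a small Monte Carlo loop for one division. Several details here are assumptions rather than the author's method: the regression coefficients (`intercept`, `slope`), the number of runs, and the handling of ties for first place, which this sketch splits equally among the tied teams. Team names in the usage example are hypothetical.

```python
import math
import random


def simulate_division(current_wins, remaining_games, strengths,
                      intercept=0.0, slope=0.1, home_adv=3.4,
                      n_runs=10_000, seed=0):
    """Estimate first-place probabilities and expected wins for one division.

    current_wins: {team: wins so far} for the division's teams.
    remaining_games: list of (home, visitor) pairs still to be played.
    strengths: {team: strength value} for every team that appears in a game.
    """
    rng = random.Random(seed)
    first = {t: 0.0 for t in current_wins}
    total_wins = {t: 0 for t in current_wins}
    for _ in range(n_runs):
        wins = dict(current_wins)
        for home, visitor in remaining_games:
            x = strengths[home] - strengths[visitor] + home_adv
            p_home = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            winner = home if rng.random() <= p_home else visitor
            if winner in wins:  # ignore wins by teams outside this division
                wins[winner] += 1
        best = max(wins.values())
        leaders = [t for t, w in wins.items() if w == best]
        for t in leaders:  # assumed tie rule: share the first-place credit
            first[t] += 1.0 / len(leaders)
        for t, w in wins.items():
            total_wins[t] += w
    p_first = {t: c / n_runs for t, c in first.items()}
    expected_wins = {t: w / n_runs for t, w in total_wins.items()}
    return p_first, expected_wins


# Hypothetical three-team example with made-up strengths.
p_first, expected_wins = simulate_division(
    {"A": 5, "B": 5, "C": 5},
    [("A", "B"), ("B", "C"), ("C", "A")],
    {"A": 3.0, "B": 0.0, "C": -3.0},
    n_runs=2000,
)
```

The three inputs mirror those listed for the simulation model: current wins, remaining games, and strength values.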
Conclusions • Western Division: • Calgary overtook Saskatchewan • Saskatchewan lost 4 straight games in September • Eastern Division: • Toronto was the dominant team all year
Next steps • Currently, each game is equally weighted • However, more recent games may say more about a team’s current performance than games played much earlier in the season • Could adopt a weighting scheme that gives less emphasis to games earlier in the season
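The slides leave the weighting scheme open. One possible option, shown purely as an illustration, is exponential decay: the most recent game gets weight 1 and each earlier game is discounted by a constant factor. The `decay` value, the function names, and the squared-error form of the fit are assumptions.

```python
def recency_weights(n_games, decay=0.9):
    """Weights for games in chronological order (index 0 = oldest).

    The most recent game has weight 1; each step back multiplies by decay.
    """
    return [decay ** (n_games - 1 - k) for k in range(n_games)]


def weighted_fit_error(games, beta, decay=0.9):
    """Recency-weighted squared error between strength differences and margins.

    games: chronological list of (winner, loser, margin_of_victory).
    beta: {team: strength value}.
    """
    weights = recency_weights(len(games), decay)
    return sum(w * ((beta[win] - beta[lose]) - margin) ** 2
               for w, (win, lose, margin) in zip(weights, games))


# Hypothetical example: the older 7-point win counts half as much as the recent 3-point win.
demo_error = weighted_fit_error([("A", "B", 7), ("A", "B", 3)], {"A": 3, "B": 0}, decay=0.5)
```

Setting `decay=1.0` recovers the current equal weighting, so the scheme can be compared directly against the existing model.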
Thank you for your time! • Contact information: • Keith A. Willoughby, Ph.D. • University of Saskatchewan • willoughby@edwards.usask.ca