Generalized Linear Mixed Model

English Premier League Soccer – 2003/2004 Season

Introduction
• English Premier League Soccer (Football)
• 20 teams; each plays all others twice (home/away)
• Games consist of two halves (45 minutes each)
• No overtime
• Each team is on offense and defense for 38 games (38 first halves and 38 second halves)
• Response Variable: Goals in a half
• Potential Independent Variables:
  • Fixed Factors: Home Dummy, Half2 Dummy, Game# (1-38)
  • Random Factors: Offensive Team, Defensive Team
• Distribution of Response: Poisson?

Preliminary Summary

Summary of Previous Slide
• Teams vary extensively on offense and defense (season goal totals per team)
  • Offense (goals scored): min=38, max=73, mean=50.6, SD=8.85
  • Defense (goals allowed): min=26, max=79, mean=50.6, SD=13.75
• Strong negative correlation between offense and defense: r=-0.80
• Home teams outscore away teams 1.3:1
• Second half outscores first half 1.2:1
• No evidence of autocorrelation in total goals scored over weeks (Durbin-Watson statistic = 2.03)

“Marginal Analysis” – No Team Effects
• Break down goals by Home/Half2 (380 games)

Summary of Previous Slide
• Means (variances) for the 4 half types:
  • Home/1st Half: Mean = 0.692, Variance = 0.689
  • Away/1st Half: Mean = 0.521, Variance = 0.514
  • Home/2nd Half: Mean = 0.813, Variance = 0.912
  • Away/2nd Half: Mean = 0.637, Variance = 0.628
• Means and variances are in close agreement, as expected under a Poisson distribution
• Chi-square statistics for testing for Poisson:
  • Df = (4 categories - 1) - (1 parameter estimated) = 2
  • P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
• Goals scored are consistent with a Poisson distribution
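The mean/variance comparison above can be re-checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the four reported means and variances, computes the variance-to-mean ratio (which equals 1 for an exact Poisson distribution), and recovers the home:away and second:first half scoring ratios. The name `dispersion_index` is introduced here and is not part of the original analysis.

```python
# Sample means and variances of goals per half, copied from the summary above.
half_types = {
    "Home/1st": (0.692, 0.689),
    "Away/1st": (0.521, 0.514),
    "Home/2nd": (0.813, 0.912),
    "Away/2nd": (0.637, 0.628),
}


def dispersion_index(mean, variance):
    """Variance-to-mean ratio; equals 1 for an exact Poisson distribution."""
    return variance / mean


ratios = {name: dispersion_index(m, v) for name, (m, v) in half_types.items()}
for name, ratio in ratios.items():
    print(f"{name}: variance/mean = {ratio:.3f}")

# Each half type has the same number of observations (380), so the marginal
# scoring ratios follow directly from sums of the cell means.
home_to_away = (0.692 + 0.813) / (0.521 + 0.637)
second_to_first = (0.813 + 0.637) / (0.692 + 0.521)
print(f"home:away = {home_to_away:.2f}, second:first = {second_to_first:.2f}")
```

All four ratios sit close to 1, with Home/2nd Half showing the largest departure.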

Generalized Linear Models
• Dependent Variable: Goals Scored
• Distribution: Poisson
• Link Function: log
• Independent Variables: Home, Half2 dummy variables
• Models (fit using generalized linear model software):
  • Model 1: Home and Half2 main effects
  • Model 2: Home, Half2, and the Home x Half2 interaction

Parameter Estimates / Model Fit – Model 1

Distribution                     Poisson
Link Function                    Log
Dependent Variable               goals
Number of Observations Read      1520
Number of Observations Used      1520

Criteria For Assessing Goodness Of Fit
Criterion               DF         Value    Value/DF
Deviance              1517     1650.4574      1.0880
Scaled Deviance       1517     1650.4574      1.0880
Pearson Chi-Square    1517     1549.2570      1.0213
Scaled Pearson X2     1517     1549.2570      1.0213
Log Likelihood                -1411.0226

Algorithm converged.

Parameter Estimates / Model Fit – Model 1

Analysis Of Parameter Estimates
                          Standard    Wald 95% Confidence      Chi-
Parameter   DF  Estimate     Error          Limits           Square   Pr > ChiSq
Intercept    1   -0.6397    0.0588    -0.7549   -0.5245      118.48       <.0001
home         1    0.2624    0.0634     0.1381    0.3866       17.12       <.0001
half2        1    0.1783    0.0631     0.0546    0.3020        7.98       0.0047
Scale        0    1.0000    0.0000     1.0000    1.0000

NOTE: The scale parameter was held fixed.

Parameter Estimates / Model Fit – Model 2

Criteria For Assessing Goodness Of Fit
Criterion               DF         Value    Value/DF
Deviance              1516     1650.3613      1.0886
Scaled Deviance       1516     1650.3613      1.0886
Pearson Chi-Square    1516     1549.7072      1.0222
Scaled Pearson X2     1516     1549.7072      1.0222
Log Likelihood                -1410.9745

Algorithm converged.

Parameter Estimates / Model Fit – Model 2

Analysis Of Parameter Estimates
                           Standard    Wald 95% Confidence      Chi-
Parameter    DF  Estimate     Error          Limits           Square   Pr > ChiSq
Intercept     1   -0.6519    0.0711    -0.7912   -0.5126       84.15       <.0001
home          1    0.2839    0.0941     0.0995    0.4683        9.10       0.0026
half2         1    0.2007    0.0958     0.0129    0.3885        4.39       0.0363
home*half2    1   -0.0395    0.1274    -0.2891    0.2101        0.10       0.7566
Scale         0    1.0000    0.0000     1.0000    1.0000

NOTE: The scale parameter was held fixed.

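Testing for Home/Half2 Interaction
• H0: No Home x Half2 interaction (β_HomeHalf2 = 0)
• HA: Home x Half2 interaction (β_HomeHalf2 ≠ 0)
• Test 1 – Wald Test: chi-square = 0.10 on 1 df, P = 0.7566 (Model 2 output)
• Test 2 – Likelihood Ratio Test: -2(logL1 - logL2) = -2(-1411.0226 - (-1410.9745)) = 0.096 on 1 df
• Neither test gives evidence of an interaction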
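Both interaction tests can be reproduced from the printed output. The sketch below takes the log likelihoods of Models 1 and 2 and the interaction estimate and standard error from the listings above; the helper `chi2_sf_df1` is a name introduced here and relies on the identity P(χ²₁ ≥ x) = erfc(√(x/2)).

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Log likelihoods from the Model 1 and Model 2 goodness-of-fit listings.
loglik_model1 = -1411.0226  # home + half2
loglik_model2 = -1410.9745  # home + half2 + home*half2

# Likelihood ratio test: one extra parameter, so 1 df.
lr_stat = -2.0 * (loglik_model1 - loglik_model2)
lr_p = chi2_sf_df1(lr_stat)

# Wald test: (estimate / standard error)^2 for the interaction term.
wald_stat = (-0.0395 / 0.1274) ** 2
wald_p = chi2_sf_df1(wald_stat)

print(f"LR test:   X2 = {lr_stat:.4f}, p = {lr_p:.4f}")
print(f"Wald test: X2 = {wald_stat:.4f}, p = {wald_p:.4f}")
```

The two statistics nearly coincide, and both p-values are close to the 0.7566 reported for the Wald test.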

Testing for Main Effects for Home & Half2
• Only Wald tests are reported here (both effects are very significant)
• Tests are based on Model 1 (the no-interaction model)
• Home: chi-square = 17.12, P < .0001
• Half2: chi-square = 7.98, P = 0.0047
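As a check on the Model 1 listing, the Wald chi-squares, p-values, and 95% confidence limits can be recomputed from the estimates and standard errors. This is a minimal sketch: `wald_summary` is a helper introduced here, and the 1.96 critical value is the usual normal approximation.

```python
import math

# (estimate, standard error) pairs from the Model 1 parameter table.
estimates = {"home": (0.2624, 0.0634), "half2": (0.1783, 0.0631)}


def wald_summary(estimate, std_error, z_crit=1.96):
    """Return (chi-square, two-sided p-value, 95% confidence limits)."""
    z = estimate / std_error
    chi_square = z * z
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    limits = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return chi_square, p_value, limits


results = {name: wald_summary(est, se) for name, (est, se) in estimates.items()}
for name, (chi_square, p_value, (low, high)) in results.items():
    print(f"{name}: X2 = {chi_square:.2f}, p = {p_value:.4f}, "
          f"95% CI = ({low:.4f}, {high:.4f})")
```

Small differences in the last digit relative to the listing come from the rounding of the printed estimates and standard errors.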

Interpreting the GLM
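Because the link is the log, the Model 1 coefficients act multiplicatively on the expected number of goals in a half. The sketch below is an illustration built only from the Model 1 estimates in the output above; `fitted_mean` is a name introduced here.

```python
import math

# Model 1 estimates (log scale) from the parameter table.
intercept, home, half2 = -0.6397, 0.2624, 0.1783


def fitted_mean(is_home, is_half2):
    """Expected goals in a half under Model 1 (no interaction)."""
    return math.exp(intercept + home * is_home + half2 * is_half2)


# Exponentiated coefficients are rate ratios.
home_ratio = math.exp(home)    # home vs away, same half
half2_ratio = math.exp(half2)  # second vs first half, same venue

print(f"home rate ratio  = {home_ratio:.3f}")
print(f"half2 rate ratio = {half2_ratio:.3f}")
for is_home in (1, 0):
    for is_half2 in (0, 1):
        label = f"{'Home' if is_home else 'Away'}/{'2nd' if is_half2 else '1st'}"
        print(f"{label}: fitted mean = {fitted_mean(is_home, is_half2):.3f}")
```

The rate ratios agree with the 1.3:1 home advantage and 1.2:1 second-half advantage in the preliminary summary, and the fitted means lie close to the four observed cell means.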

Incorporating Random (Team) Effects
• Teams clearly vary in terms of offensive and defensive skills (see slide 3)
• Since many factors are inputs into team abilities (players, coaches, chemistry), we treat team offensive and defensive effects as random
• There will be 20 random offensive effects (one per team) and 20 random defensive effects

Random Team Effects
• All effects are on the log scale for goals scored
• Offense Effects: o_i ~ NID(0, σ_o²)
• Defense Effects: d_i ~ NID(0, σ_d²)
• The estimation process assumes COV(o_i, d_i) = 0, which seems a stretch (but we can still “observe” the covariance of the estimated random effects)

Mixed Effects Model
• Fixed Effects: Intercept, Home, Half2 (a)
• Random Effects: Offteam, Defteam (b)
• Conditional Model (on Random Effects)

Model in Matrix Notation - Example
• League has 3 teams: A, B, C
• Order of entry of games: A@B, A@C, B@C, B@A, C@A, C@B
• Order of entry of scores within game: Home/1st, Away/1st, Home/2nd, Away/2nd
• 3 offense effects, 3 defense effects, 24 observations

Model – Based on 3 Teams
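The design matrices for the three-team example can be written out programmatically. The sketch below is one possible construction, under the assumption that "A@B" means team A visits team B (so B is the home team). The fixed-effects matrix `X` has columns Intercept, Home, Half2, and the random-effects matrix `Z` has three offense columns followed by three defense columns. `build_design` is a name introduced here.

```python
teams = ["A", "B", "C"]
# Assumption: "X@Y" means X is the visiting team and Y is the home team.
games = ["A@B", "A@C", "B@C", "B@A", "C@A", "C@B"]


def build_design(games, teams):
    """Return X (fixed effects), Z (random effects) and a row label per observation."""
    index = {team: i for i, team in enumerate(teams)}
    n_teams = len(teams)
    X, Z, labels = [], [], []
    for game in games:
        away_team, home_team = game.split("@")
        # Row order within a game: Home/1st, Away/1st, Home/2nd, Away/2nd.
        for half2 in (0, 1):
            for is_home in (1, 0):
                offense = home_team if is_home else away_team
                defense = away_team if is_home else home_team
                X.append([1, is_home, half2])
                z_row = [0] * (2 * n_teams)
                z_row[index[offense]] = 1            # offense effect o_k
                z_row[n_teams + index[defense]] = 1  # defense effect d_l
                Z.append(z_row)
                labels.append(f"{game} off={offense} def={defense} half2={half2}")
    return X, Z, labels


X, Z, labels = build_design(games, teams)
for label, x_row, z_row in zip(labels, X, Z):
    print(f"{label:32s} X={x_row} Z={z_row}")
```

Each row of `Z` has exactly two ones, one selecting the offense effect and one the defense effect, and each team appears on offense in 8 of the 24 rows.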

Sequence of Potential Models
• No fixed or random effects (common mean)
• Fixed home and second half effects, no random effects
• Fixed home and second half effects, random offense team effects
• Fixed home and second half effects, random defense team effects
• Fixed home and second half effects, random offense and defense team effects

Results – Estimates (P-Values)
• Reported P-values are based on the Z-test, not the preferred Likelihood Ratio Test
• H0: σ_o² = 0 vs HA: σ_o² > 0   TS: 4958.6 - 4951.9 = 6.7   P = 0.5·P(χ²_1 ≥ 6.7) = .005
• Based on AIC and BIC, the model with both offense and defense effects is best
• No interaction found between team effects and home or half2
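The halved p-value in the likelihood ratio test above arises because the null value σ_o² = 0 lies on the boundary of the parameter space. The sketch below reproduces the arithmetic. It assumes that 4958.6 and 4951.9 are the fit statistics of the models without and with the random offense effect, which the slide does not state explicitly. The variable names are introduced here.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Assumed labels: the slide gives only the two numbers and their difference.
fit_without_offense = 4958.6
fit_with_offense = 4951.9

test_statistic = fit_without_offense - fit_with_offense

# Variance component on the boundary under H0: halve the chi-square(1) tail area.
p_value = 0.5 * chi2_sf_df1(test_statistic)

print(f"TS = {test_statistic:.1f}, p = {p_value:.4f}")
```

The result rounds to the .005 shown on the slide.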

Goodness of Fit
• We test whether the Poisson GLMM is an appropriate model by means of the deviance
• H0: Model fits   HA: Model lacks fit
• Deviance = 1570.7
• DF = N - #fixed parameters = 1520 - 3 = 1517
• P-value = P(χ²_1517 ≥ 1570.7) = 0.1646
• No evidence of lack of fit*
• * If we use the scaled deviance, we do reject, where scaled deviance = 1570.7/0.9531 = 1647.9
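The goodness-of-fit p-value can be approximated without a statistics library. The sketch below uses the Wilson-Hilferty cube-root normal approximation to the chi-square distribution, which is very accurate at 1517 degrees of freedom. It is an approximation chosen here for illustration, not necessarily how the original p-value was obtained.

```python
import math


def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square_df >= x) via the Wilson-Hilferty transformation."""
    k = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail


df = 1520 - 3  # observations minus fixed-effect parameters

p_deviance = chi2_sf_wilson_hilferty(1570.7, df)
p_scaled = chi2_sf_wilson_hilferty(1647.9, df)

print(f"deviance 1570.7:        p = {p_deviance:.4f}")
print(f"scaled deviance 1647.9: p = {p_scaled:.4f}")
```

The first value is close to the reported 0.1646, and the second falls below 0.05, consistent with the footnote that the scaled deviance leads to rejection.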

Best Linear Unbiased Predictors (BLUPs)
• Estimated Team (Random) Effects (teams with high Defense values allow more goals)
• Estimated Fixed Effects
• For each Half_ijkl compute exp{-0.6605 + HOME_i + HALF2_j + o_k + d_l} as the BLUP
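The BLUP formula above is a direct exponentiation of the linear predictor. In the sketch below only the intercept (-0.6605) comes from the slide. The home, half2, offense, and defense values are hypothetical placeholders, since the estimated-effects tables are not reproduced here. `predicted_goals` is a name introduced for illustration.

```python
import math

INTERCEPT = -0.6605  # from the slide


def predicted_goals(home_effect, half2_effect, offense_effect, defense_effect,
                    is_home, is_half2):
    """exp{intercept + HOME + HALF2 + o_k + d_l} for one team-half."""
    eta = (INTERCEPT
           + home_effect * is_home
           + half2_effect * is_half2
           + offense_effect
           + defense_effect)
    return math.exp(eta)


# HYPOTHETICAL values, for illustration only (not the fitted estimates).
home_effect, half2_effect = 0.26, 0.18
strong_offense, weak_defense = 0.10, 0.15

baseline = predicted_goals(home_effect, half2_effect, 0.0, 0.0, 0, 0)
example = predicted_goals(home_effect, half2_effect, strong_offense,
                          weak_defense, 1, 1)
print(f"average teams, away, first half: {baseline:.3f}")
print(f"hypothetical home second half:   {example:.3f}")
```

A positive defense effect for the opponent raises the prediction, matching the note that teams with high Defense values allow more goals.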

Comparison of BLUPs with Actual Scores
• For each team-half, we have the actual score and the BLUP
• Correlation between actual and BLUP = 0.2655
• Concordant pairs of halves (one scores higher than the other on both actual and BLUP) = 452471
• Discordant pairs of halves = 355617
• “Gamma” = (452471 - 355617)/(452471 + 355617) = 0.1199
• Evidence of some positive association between actual and predicted scores
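The gamma statistic counts concordant and discordant pairs and ignores ties. The sketch below first recomputes the reported value from the two pair counts, then shows the pair counting on a small hypothetical data set (the five toy scores and predictions are invented for illustration). `goodman_kruskal_gamma` is a name introduced here.

```python
def goodman_kruskal_gamma(x, y):
    """Return (concordant, discordant, gamma) over all pairs; ties are ignored."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    gamma = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, gamma


# Reported pair counts from the comparison above.
reported_gamma = (452471 - 355617) / (452471 + 355617)
print(f"reported gamma = {reported_gamma:.4f}")

# HYPOTHETICAL toy data: actual goals and predicted means for five halves.
toy_actual = [0, 1, 2, 1, 0]
toy_predicted = [0.4, 0.6, 0.9, 0.5, 0.7]
toy_result = goodman_kruskal_gamma(toy_actual, toy_predicted)
print(f"toy example: concordant, discordant, gamma = {toy_result}")
```

With 1520 halves there are 1520 x 1519 / 2 = 1,154,440 pairs in total, so roughly 30% of pairs are tied on the actual score or the BLUP and do not enter gamma.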

Sources
• Data: SoccerPunter.com
• Methods:
  • Littell, Milliken, Stroup, and Wolfinger (1996). “SAS System for Mixed Models.”
  • Wolfinger, R. and M. O’Connell (1993). “Generalized Linear Mixed Models: A Pseudo-Likelihood Approach,” J. Statist. Comput. Simul., Vol. 48, pp. 233-243.

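SAS Code

    /* Read one record per team-half: home team, road team, goals, indicators */
    data one;
      infile 'engl2003d.dat';
      input hteam $ 1-20 rteam $ 21-40 goals 47-48 half2 56 home 64 round 71-73;
      /* The scoring side is on offense; its opponent is on defense */
      if home=1 then do; offteam=hteam; defteam=rteam; end;
      else do; offteam=rteam; defteam=hteam; end;
    run;

    /* Load the GLIMMIX macro, then fit the Poisson GLMM with a log link */
    %include 'glmm800.sas';
    %glimmix(data=one, procopt=method=reml,
      stmts=%str(
        class offteam defteam;
        model goals = home half2 / s;
        random offteam defteam / s;
      ),
      error=poisson, link=log);
    run;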
