PSC 5940: Estimating the Fit of Multi-Level Models

Session 8, Fall 2009

Log-likelihood - GLMs
• Given a linear model:
• Allows errors to be correlated
• With possibly correlated errors:
• The log-likelihood is defined as:
• Given a linear model, differentiating the Generalized Sum of Squares with respect to β, setting the partial derivative to zero, and solving for β, produces:
• Look familiar?
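The steps on this slide can be sketched in standard notation. This is a reconstruction under assumptions rather than the slide's own notation: the errors are taken to be multivariate normal with covariance matrix Σ, n is the number of observations, and the symbols ε, Σ, ℓ, and GSS are introduced here for illustration.

```latex
% Linear model with possibly correlated Gaussian errors (assumed form)
Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma)

% Log-likelihood under that assumption
\ell(\beta, \Sigma) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma|
    - \frac{1}{2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta)

% Generalized Sum of Squares, its derivative, and the solution for beta
GSS(\beta) = (Y - X\beta)'\Sigma^{-1}(Y - X\beta)
\frac{\partial\, GSS}{\partial \beta} = -2X'\Sigma^{-1}(Y - X\beta) = 0
\;\Rightarrow\; \hat{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y

% Special case of independent, equal-variance errors, \Sigma = \sigma^2 I
\hat{\beta} = (X'X)^{-1}X'Y
```

With independent, equal-variance errors the last line is the familiar OLS estimator, which is presumably the point of "Look familiar?".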

Understanding MLE
• For OLS, the maximum likelihood estimate (MLE) of β is the one that minimizes the SSE: β = (X'X)^(-1) (X'Y)
• For log-likelihood models, the MLE is built from the (logged) likelihood of the errors, given the formula for the Generalized Sum of Squares
• The latter differs from OLS as various assumptions are relaxed (nonlinearity, correlated errors, etc.)
• The results of the likelihood function can be viewed topographically, as a “hill” showing the effect on the likelihood as you vary the estimated coefficients. The maximum likelihood estimate is at the peak of the hill.
• The “fit” measures (adj R², AIC, BIC) tell you how high (or low) the peak is for a given model
• Comparisons of the fit measures across models can assist in model selection
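The "hill" idea can be illustrated with a small base-R sketch. It assumes simulated data and Gaussian errors; the names x, y, loglik_at, and slopes are hypothetical and not from the course data. The log-likelihood is traced over a grid of slope values while the intercept and error standard deviation are held at their fitted values.

```r
# Simulated data; all values are invented for illustration
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# OLS estimate from the closed form (X'X)^(-1) X'Y
X <- cbind(1, x)
beta_ols <- solve(t(X) %*% X, t(X) %*% y)

# ML estimate of the error standard deviation (divides by n, not n - 2)
sigma_ml <- sqrt(mean((y - X %*% beta_ols)^2))

# Gaussian log-likelihood as a function of the slope only
loglik_at <- function(slope) {
  mu <- beta_ols[1] + slope * x
  sum(dnorm(y, mean = mu, sd = sigma_ml, log = TRUE))
}

# Walk across the hill: a grid of slopes centered on the OLS slope
slopes <- seq(beta_ols[2] - 1, beta_ols[2] + 1, length.out = 201)
ll <- sapply(slopes, loglik_at)

# The grid point with the highest log-likelihood sits at the OLS slope
slopes[which.max(ll)]
beta_ols[2]
```

Calling plot(slopes, ll, type = "l") draws the hill; its peak height matches logLik(lm(y ~ x)).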

Measures of Model Fit
• R² and adj R²:
• AIC:
  • Where the log-likelihood term is the maximized log-likelihood for the model
  • AIC penalizes for the complexity of the model
• BIC:
  • Where n is the number of observations
  • BIC penalizes for increased model complexity and sample size (results in preference for simpler models)
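For reference, the standard textbook definitions of these measures are sketched below. The notation is an assumption, not taken from the slide: SSE and SST are the residual and total sums of squares, p is the number of predictors excluding the intercept, k is the number of estimated parameters, and \hat{\ell} is the maximized log-likelihood.

```latex
R^2 = 1 - \frac{SSE}{SST}
\qquad
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

AIC = -2\hat{\ell} + 2k

BIC = -2\hat{\ell} + k\ln(n)
```

Because ln(n) exceeds 2 once n is 8 or more, the BIC penalty per parameter is larger than the AIC penalty in most applied settings, which is why BIC tends to prefer simpler models.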

BIC Test for “Improvement in Model Fit”

Example in R: LM1
• Predicting votes in a referendum on an alternative energy tax (erdf100<-e63_erdf) by price and region
• Explanatory variables:
  • Randomly assigned price values ($6 to $2400 per year): price<-random_p
  • region (already so named)
• ML1<-lmer(erdf100~1+(1|price)+(1|region))

Example in R: LM2
• Predicting votes in a referendum on an alternative energy tax (erdf100<-e63_erdf) by price, region, and perceived risks posed by GCC
• Explanatory variables:
  • Randomly assigned price values ($6 to $2400 per year): price<-random_p
  • region (already so named)
  • risk_gcc (0-10 scale)
• ML2<-lmer(erdf100~1+risk_gcc+(1|price)+(1|region))
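A runnable sketch of the two model specifications follows. It assumes the lme4 package is installed and uses simulated stand-in data, since the survey data are not part of these notes. The data frame dat and every simulated value are invented for illustration; only the variable names and formulas come from the slides. The models are fit with REML = FALSE here, a choice made for this sketch so that likelihood-based criteria are comparable across different fixed effects.

```r
library(lme4)

set.seed(5940)
n <- 1500

# Simulated stand-in for the survey variables (values are invented)
price    <- factor(sample(round(seq(6, 2400, length.out = 15)), n, replace = TRUE))
region   <- factor(sample(paste0("R", 1:5), n, replace = TRUE))
risk_gcc <- sample(0:10, n, replace = TRUE)

# Group-level intercept shifts for the 15 price levels and 5 regions
price_eff  <- rnorm(15, sd = 10)[as.integer(price)]
region_eff <- rnorm(5, sd = 3)[as.integer(region)]

erdf100 <- 30 + 4 * risk_gcc + price_eff + region_eff + rnorm(n, sd = 31)
dat <- data.frame(erdf100, price, region, risk_gcc)

# Varying intercepts for price and region, without and with risk_gcc
ML1 <- lmer(erdf100 ~ 1 + (1 | price) + (1 | region),
            data = dat, REML = FALSE)
ML2 <- lmer(erdf100 ~ 1 + risk_gcc + (1 | price) + (1 | region),
            data = dat, REML = FALSE)

# Side-by-side BIC and the difference (positive favors ML2)
BIC(ML1, ML2)
BIC(ML1) - BIC(ML2)
```

Passing data = dat keeps both fits on the same rows, which matters when a predictor such as risk_gcc has missing values in real data.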

LM1 Result:
> summary(ML1)
Linear mixed model fit by REML
Formula: erdf100 ~ 1 + (1 | price) + (1 | region)
   AIC   BIC logLik deviance REMLdev
 15275 15296  -7634    15271   15267
Random effects:
 Groups   Name        Variance  Std.Dev.
 price    (Intercept)   97.5853  9.8785
 region   (Intercept)    8.7641  2.9604
 Residual             1113.6147 33.3709
Number of obs: 1546, groups: price, 15; region, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   59.026      3.064   19.26

LM2 Result:
Linear mixed model fit by REML
Formula: erdf100 ~ 1 + risk_gcc + (1 | price) + (1 | region)
   AIC   BIC logLik deviance REMLdev
 14960 14987  -7475    14954   14950
Random effects:
 Groups   Name        Variance Std.Dev.
 price    (Intercept)  99.8298  9.9915
 region   (Intercept)   6.6622  2.5811
 Residual             990.3544 31.4699
Number of obs: 1532, groups: price, 15; region, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)  30.9259     3.6408   8.494
risk_gcc      4.1735     0.3068  13.604

BIC Test Results:
• BIC for ML1 - BIC for ML2 = 15296 - 14987 = 309; “Conclusive”
• Note: You can have R calculate the difference: BIC(logLik(ML1))-BIC(logLik(ML2))
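The reported BIC values can be checked by hand from the REMLdev figures in the model output. The sketch below assumes BIC = deviance + k·ln(n), and assumes k counts the fixed effects plus the variance parameters (4 for ML1, 5 for ML2). Those parameter counts are inferred from the output, not stated on the slides, and bic_from_dev is a hypothetical helper.

```r
# BIC from a deviance, a parameter count k, and a sample size n
bic_from_dev <- function(dev, k, n) dev + k * log(n)

# ML1: REMLdev 15267, n = 1546; assumed k = 4
# (1 fixed intercept + 2 random-intercept variances + residual variance)
bic1 <- bic_from_dev(15267, k = 4, n = 1546)

# ML2: REMLdev 14950, n = 1532; assumed k = 5 (adds the risk_gcc slope)
bic2 <- bic_from_dev(14950, k = 5, n = 1532)

round(bic1)                # compare with the BIC column for ML1
round(bic2)                # compare with the BIC column for ML2
round(bic1) - round(bic2)  # the difference reported on the slide
```

The two fits use different numbers of observations (1546 vs. 1532), so a stricter comparison would refit both models on the same rows, and ideally with REML = FALSE since the fixed effects differ.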

Model fit: BIC and Thinking
• BIC is often (mis)used as a statistical “idiot light”
  • Depends on the sample employed
  • Maximizes the predictive capacity of the model rather than model explanation
• When you face a decision of whether to add an explanatory variable, use a 2-step process:
  • Does the variable make theoretical sense?
  • Does BIC show improved model fit?
• If the answers to both are “yes”, then add the variable

BREAK

R Coding
• When modeling with only part of your data, use “subset”
• lmer(y~x…, subset=state=="NY")
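A small base-R sketch of the subset argument follows. It uses lm() on simulated data so that it runs without extra packages; lmer() accepts a subset argument in the same way. The data frame dat and its columns are invented for illustration.

```r
# Simulated data with a state identifier (values are invented)
set.seed(2)
dat <- data.frame(
  state = sample(c("NY", "OK", "TX"), 300, replace = TRUE),
  x     = rnorm(300)
)
dat$y <- 1 + 2 * dat$x + rnorm(300)

# Fit using only the New York rows; the condition is evaluated inside dat
fit_ny <- lm(y ~ x, data = dat, subset = state == "NY")

# Equivalent fit after filtering the data frame by hand
fit_ny2 <- lm(y ~ x, data = dat[dat$state == "NY", ])

nobs(fit_ny)                             # number of NY rows used
all.equal(coef(fit_ny), coef(fit_ny2))   # both approaches agree
```

Note the straight double quotes around "NY": curly quotes pasted from slides will cause a syntax error in R.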

Workshop
• New developments on models?
• Progress on papers
• Research question motivated by literature reviews

Next Week
• Focus is on paper progress
• Build timelines for completion
• Focus on challenges, and what we need to do to surmount them
• Hone your research question: motivated by literature reviews
• Need 1-page progress reports, including task assignments
