Stats 330: Lecture 23

Stats 330: Lecture 23 Multiple Logistic Regression (Cont)

Plan of the day In today’s lecture we continue our discussion of the multiple logistic regression model Topics covered • Models and submodels • Residuals for Multiple Logistic Regression • Diagnostics in Multiple Logistic Regression • No analogue of R2 Reference: Coursebook, sections 5.2.3, 5.2.3

Comparison of models • Suppose model 1 and model 2 are two models, with model 2 a submodel of model1 • If Model 2 is in fact correct, then the difference in the deviances will have approximately a chi-squared distribution • df equals the difference in df of the separate models • Approximation OK for grouped and ungrouped data

Example: kyphosis data • Is age alone an adequate model? > age.glm<-glm(Kyphosis~Age+I(Age^2),family=binomial, data=kyphosis.df) Null deviance: 83.234 on 80 degrees of freedom Residual deviance: 72.739 on 78 degrees of freedom AIC: 78.739 Full model has deviance 54.428 on 76 df Chisq is 72.739 - 54.428 = 18.311 on 78-76=2 df > 1-pchisq(18.311,2) [1] 0.0001056372 Highly significant: need at least one of start and number

Anova in R Two-model form: comparing > anova(age.glm,kyphosis.glm, test=“Chi”) Analysis of Deviance Table Model 1: Kyphosis ~ Age + I(Age^2) Model 2: Kyphosis ~ Age + I(Age^2) + Start + Number Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 78 72.739 2 76 54.428 2 18.311 0.0001056 ***

Residuals • Two kinds of residuals • Pearson residuals • useful for grouped data only • similar to residuals in linear regression, actual minus fitted value • Deviance residuals • useful for grouped and ungrouped data • Measure contribution of each covariate pattern to the deviance

Pearson residuals Pearson residual for pattern i is Probability predicted by model Standardized to have approximately unit variance, so big if more than 2 in absolute value

Deviance residuals (i) • For grouped data, the deviance is

Deviance residuals (i) • Thus, the deviance can be written as the sum of squares of M quantities d1, …, dM ,one for each covariate pattern • Each di is the contribution to the deviance from the ith covariate pattern • If deviance residual is big (more than about 2 in magnitude), then the covariate pattern has a big influence on the likelihood, and hence the estimates

Calculating residuals > pearson.residuals<-residuals(budworm.glm, type="pearson") • > deviance.residuals<-residuals(budworm.glm, type="deviance") • > par(mfrow=c(1,2)) > plot(pearson.residuals, ylab="residuals", main="Pearson") • > abline(h=0,lty=2) > plot(deviance.residuals, ylab="residuals", main="Deviance") • > abline(h=0,lty=2)

Diagnostics: outlier detection • Large residuals indicate covariate patterns poorly fitted by the model • Large Pearson residuals indicate a poor match between the “maximum model probabilities” and the logistic model probabilities, for grouped data • Large deviance residuals indicate influential points • Example: budworm data

Diagnostics: detecting non-linear regression functions • For a single x, plot the logits of the maximal model probabilities against x • For multiple x’s, plot Pearson residuals against fitted probabilities, against individual x’s • If the data has most ni’s equal to 1, so can’t be grouped, try gam (cf kyphosis data)

Example: budworms • Plot Pearson residuals versus dose, plot shows a curve

Diagnostics: influential points Will look at 3 diagnostics • Hat matrix diagonals • Cook’s distance • Leave-one-out Deviance Change

Example: vaso-constriction data Data from study of reflex vaso-constriction (narrowing of the blood vessels) of the skin of the fingers • Can be caused caused by sharp intake of breath

Example: vaso-constriction data Variables measured: Response = 0/1 1=vaso-constriction occurs, 0 = doesn’t occur Volume: volume of air breathed in Rate: rate of intake of breath

Data • Volume Rate Response • 1 3.70 0.825 1 • 2 3.50 1.090 1 • 3 1.25 2.500 1 • 4 0.75 1.500 1 • 5 0.80 3.200 1 • 6 0.70 3.500 1 • 7 0.60 0.750 0 • 8 1.10 1.700 0 • 9 0.90 0.750 0 • 10 0.90 0.450 0 • 11 0.80 0.570 0 • 12 0.55 2.750 0 • 0.60 3.000 0 • . . . 39 obs in all

Plot of data > plot(Rate,Volume,type="n", cex=1.2) > text(Rate,Volume,1:39, col=ifelse(Response==1, “red",“blue"), cex=1.2) > text(2.3,3.5,“blue: no VS", col=“blue", adj=0, cex=1.2) > text(2.3,3.0,“red: VS", col=“red",adj=0, cex=1.2)

Note points 4 and 18

Enhanced residual plots > vaso.glm = glm(Response ~ log(Volume) + log(Rate), family=binomial, data=vaso.df) > pear.r<-residuals(vaso.glm, type="pearson") > dev.r<-residuals(vaso.glm, type="deviance") > par(mfrow=c(1,2)) > plot(pear.r, ylab="residuals", main="Pearson",type="n") > text(pear.r,cex=0.7) > abline(h=0,lty=2) > abline(h=2,lty=2,lwd=2) > abline(h=-2,lty=2,lwd=2) > plot(dev.r, ylab="residuals", main="Deviance",type="h") > text(dev.r, cex=0.7) > abline(h=0,lty=2) > abline(h=2,lty=2,lwd=2) > abline(h=-2,lty=2,lwd=2)

Diagnostics: Hat matrix diagonals • Can define hat matrix diagonals (HMD’s) pretty much as in linear models • HMD big if HMD > 3p/M (M= no of covariate patterns) • Draw index plot of HMD’s

Plotting HMD’s > HMD<-hatvalues(vaso.glm) > plot(HMD,ylab="HMD's",type="h") > text(HMD,cex=0.7) > abline(h=3*3/39, lty=2)

Obs 31 high-leverage

Hat matrix diagonals • In ordinary regression, the hat matrix diagonals measure how “outlying” the covariates for an observation are • In logistic regression, the HMD’s measure the same thing, but are down-weighted according to the estimated probability for the observation. The weights gets small if the probability is close to 0 or 1. • In the vaso-constriction data, points 1,2,17 had very small weights, since the probabilities are close to 1 for these points.

Note points 1,2,17

Diagnostics: Cooks distance • Can define an analogue of Cook’s distance for each point CD = (Pearson resid )2 x HMD/(p*(1-HMD)2) p = number of coeficients • CD big if more than about 10% quantile of the chi-squared distribution on k+1 df, divided by k+1 • Calculate with qchisq(0.1,k+1)/(k+1) • But not that reliable as a measure

Cooks D: calculating and plotting p<-3 CD<-cooks.distance(vaso.glm) plot(CD,ylab="Cook's D",type="h", main="index plot of Cook's distances") text(CD, cex=0.7) bigcook<-qchisq(0.1,p)/p abline(h=bigcook, lty=2)

Points 4 and 18 influential

Diagnostics: leave-one-out deviance change • If the ith covariate pattern is left out, the change in the deviance is approximately (Dev. Res) 2 + (Pearson. Res)2HMD/(1-HMD) Big if more than about 4

Deviance change: calculating and plotting > dev.r<-residuals(vaso.glm,type="deviance") > Dev.change<-dev.r^2 + pear.r^2*HMD/(1-HMD) > plot(Dev.change,ylab="Deviance change", type="h") > text(Dev.change, cex=0.7) > bigdev<-4 > abline(h=bigdev, lty=2)

4 and 18 influential

All together influenceplots(vaso.glm)

Should we delete points? • How influential are the 3 points? • Can delete each in turn and examine changes in coefficients, predicted probabilities • First, coefficients: Deleting: None 31 4 18 All 3 (Intercept) -2.875 -3.041 -5.206 -4.758 -24.348 log(Volume) 5.179 4.966 8.468 7.671 39.142 log(Rate) 4.562 4.765 7.455 6.880 31.642

Should we delete points (2)? • Next, fitted probabilities: • Conclusion: points 4 and 18 have a big effect. delete points Fitted at None 31 4 18 4 and 18 All 3 point 31 0.722 0.627 0.743 0.707 0.996 0.996 point 4 0.075 0.073 0.010 0.015 0.000 0.000 point 18 0.106 0.100 0.018 0.026 0.000 0.000

Should we delete points (3)? • Should we delete? • They could be genuine – no real evidence they are wrong • If we delete them, we increase the regression coefficients, make fitted probabilities more extreme • Overstate the predictive ability of the model

Residuals for ungrouped data • If all cases have distinct covariate patterns, then the residuals lie along two curves (corresponding to success and failure) and have little or no diagnostic value. • Thus, there is a pattern even if everything is OK.

Formulas • Pearson residuals: for ungrouped data, residual for i th case is

Formulas (cont) • Deviance residuals: for ungrouped data, residual for i th case is

Use of plot function plot(kyphosis.glm)

Analogue of R2? • There is no satisfactory analogue of R2 for logistic regression. • For the “small m big n” situation we can use the residual deviance, since we can obtain an approximate p-value. • For other situations we can use the Hosmer –Lemeshow statistic (next slide)

Hosmer-Lemeshow statistic • How can we judge goodness of fit for ungrouped data? • Can use the Hosmer-Lemeshow statistic, which groups the data into cases having similar fitted probabilities • Sort the cases in increasing order of fitted probabilities • Divide into 10 (almost) equal groups • Do a chi-square test to see if the number of successes in each group matches the estimated probability

Kyphosis data Divide probs into 10 classes : lowest 10%, next 10%...... Class 1 Class 2 Class 3 Class 4 Class 5 Observed 0’s 9 8 8 7 8 Observed 1’s 0 0 0 1 0 Total obs 9 8 8 8 8 Expected 1’s 0.022 0.082 0.199 0.443 0.776 Class 6 Class 7 Class 8 Class 9 Class 10 Observed 0’s 8 5 5 3 3 Observed 1’s 0 3 3 5 5 Total obs 8 8 8 8 8 Expected 1’s 1.023 1.639 2.496 3.991 6.328 Note: Expected = Total.obs x average prob

In R, using the kyphosis data Result of fitting model > HLstat(kyphosis.glm) Value of HL statistic = 6.498 P-value = 0.592 A p-value of less than 0.05 indicates problems. No problem indicated for the kyphosis data – logistic appears to fit OK. The function HLstat is in the “330 functions”

Stats 330: Lecture 23