Anareg week 11 • Regression diagnostics
Regression Diagnostics • Partial regression plots • Studentized deleted residuals • Hat matrix diagonals • DFFITS, Cook's D, DFBETAS • Variance inflation factor • Tolerance
NKNW Example • NKNW p 389, section 11.1 • Y is amount of life insurance • X1 is average annual income • X2 is a risk aversion score • n = 18 managers
Partial regression plots • Also called added variable plots or adjusted variable plots • One plot for each Xi
Partial regression plots (2) • Consider X1 • Use the other X’s to predict Y • Use the other X’s to predict X1 • Plot the residuals from the first regression vs the residuals from the second regression
Partial regression plots (3) • These plots can detect • Nonlinear relationships • Heterogeneous variances • Outliers
Output • Analysis of variance: Model DF = 2, F Value = 542.33, Pr > F < .0001; Error DF = 15; C Total DF = 17 • Root MSE = 12.66267 • R-Square = 0.9864
Output (2) • Parameter estimates (Est, Std Err, t, Pr > |t|): Intercept −205.72, 11, −18.06, <.0001 • income 6.288, .20, 30.80, <.0001 • risk 4.738, 1.3, 3.44, 0.0037
Plot the residuals vs each independent variable • From the regression of Y on X1 and X2, plot the residuals against each independent variable. • The plot of residuals against X1 suggests a curvilinear effect. • To check further, examine the partial regression plot.
The partial regression plots • To generate the partial regression plot for X1: • Regress Y and X1 each on X2 • Get the residuals from each regression, namely e(Y|X2) and e(X1|X2) • Plot e(Y|X2) against e(X1|X2) • Do the same for Y and X2 each on X1
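The two auxiliary regressions above can be sketched in a few lines of numpy. The data here are synthetic stand-ins for the NKNW insurance example (the variable names, coefficients, and sample values are illustrative assumptions, not the book's data). The sketch also checks the identity that justifies reading the plot as the adjusted effect of X1: the slope of e(Y|X2) on e(X1|X2) equals b1 from the full regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 18
x1 = rng.normal(50, 10, n)        # stand-in for average annual income
x2 = rng.normal(5, 2, n)          # stand-in for risk aversion score
y = -200 + 6.3 * x1 + 4.7 * x2 + rng.normal(0, 12, n)

def resid(y, X):
    """Residuals from an OLS fit of y on X (intercept added)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    return y - Xd @ b

# Partial regression plot for X1: regress Y and X1 each on X2,
# then plot e(Y|X2) against e(X1|X2).
e_y_given_x2 = resid(y, x2)
e_x1_given_x2 = resid(x1, x2)

# Slope of e(Y|X2) on e(X1|X2); this reproduces b1 from the full fit.
slope = (e_x1_given_x2 @ e_y_given_x2) / (e_x1_given_x2 @ e_x1_given_x2)
```

In practice one would scatter-plot the two residual series; the slope identity (Frisch–Waugh–Lovell) is what makes the plot interpretable as the effect of X1 after adjusting for X2.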
Residuals • There are several versions • Ordinary residuals: ei = Yi − Ŷi • Semistudentized residuals: ei / √MSE • Deleted residuals: di = ei / (1 − hii), where hii is the leverage • Studentized deleted residuals: di* = di / s(di), where s²(di) = MSE(i) / (1 − hii) • Equivalently, di* = ei √[(n − p − 1) / (SSE(1 − hii) − ei²)]
Residuals (2) • We use the notation (i) to indicate that case i has been deleted from the computations • X(i) is the X matrix with case i deleted • MSE(i) is the MSE with case i deleted
Residuals (3) • When we examine the residuals we are looking for • Outliers • Non normal error distributions • Influential observations
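The residual versions above can be computed together from one fit. This is a minimal numpy sketch on synthetic data (an assumption, not the book's example); the closed form for the studentized deleted residual avoids refitting the model n times.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 18, 3                      # p = number of coefficients incl. intercept
X = np.column_stack([np.ones(n), rng.normal(50, 10, n), rng.normal(5, 2, n)])
y = X @ np.array([-200.0, 6.3, 4.7]) + rng.normal(0, 12, n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
h = np.diag(H)                          # leverages h_ii
e = y - H @ y                           # ordinary residuals
SSE = e @ e
MSE = SSE / (n - p)

semistud = e / np.sqrt(MSE)                 # semistudentized residuals
deleted = e / (1 - h)                       # deleted residuals d_i
# Studentized deleted residuals via the closed form
# d_i* = e_i * sqrt((n - p - 1) / (SSE(1 - h_ii) - e_i^2))
t = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))
```

The closed form agrees with literally deleting case i, refitting, and standardizing, which is what the (i) notation on the next slide describes.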
Hat matrix diagonals • hii is a measure of how much Yi contributes to its own fitted value Ŷi • Ŷ1 = h11Y1 + h12Y2 + h13Y3 + … • hii is sometimes called the leverage of the ith observation
Hat matrix diagonals (2) • 0 ≤ hii ≤ 1 and Σhii = p • We would like hii to be small • The average value is p/n • Values far from this average point to cases that should be examined carefully
Hat diagonals • Obs 1: hii = 0.0693 • Obs 2: 0.1006 • Obs 3: 0.1890 • Obs 4: 0.1316 • Obs 5: 0.0756
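A short sketch of the leverage computation and the properties above, on synthetic data (an assumption; the 2p/n screening cutoff is a common rule of thumb, not from this slide):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 18
X = np.column_stack([np.ones(n), rng.normal(50, 10, n), rng.normal(5, 2, n)])
p = X.shape[1]

# Leverages are the diagonals of H = X (X'X)^{-1} X'
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

# They sum to p, so the average is p/n; a common screen flags h_ii > 2p/n
flagged = np.where(h > 2 * p / n)[0]
```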
DFFITS • A measure of the influence of case i on Ŷi • It is a standardized version of the difference between Ŷi computed with and without case i • It is closely related to hii
Cook’s Distance • A measure of the influence of case i on all of the Ŷi’s • It is a standardized version of the sum of squares of the differences between the predicted values computed with and without case i
DFBETAS • A measure of the influence of case i on each of the regression coefficients • It is a standardized version of the difference between the regression coefficient computed with and without case i
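All three influence measures have closed forms built from the residuals and leverages, so none of them requires n refits. A numpy sketch on synthetic data (an assumption, not the book's example), using the standard deletion identities:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 18
X = np.column_stack([np.ones(n), rng.normal(50, 10, n), rng.normal(5, 2, n)])
p = X.shape[1]
y = X @ np.array([-200.0, 6.3, 4.7]) + rng.normal(0, 12, n)

XtX_inv = np.linalg.inv(X.T @ X)
h = np.diag(X @ XtX_inv @ X.T)          # leverages
e = y - X @ XtX_inv @ X.T @ y           # ordinary residuals
SSE = e @ e
MSE = SSE / (n - p)
mse_i = (SSE - e**2 / (1 - h)) / (n - p - 1)   # MSE with case i deleted
t = e / np.sqrt(mse_i * (1 - h))               # studentized deleted residuals

# DFFITS: influence of case i on its own fitted value
dffits = t * np.sqrt(h / (1 - h))

# Cook's D: influence of case i on all fitted values at once
cooks_d = e**2 * h / (p * MSE * (1 - h)**2)

# DFBETAS: influence of case i on each coefficient, using
# b - b(i) = (X'X)^{-1} x_i e_i / (1 - h_ii)  (one row per case)
delta_b = (XtX_inv @ X.T).T * (e / (1 - h))[:, None]
dfbetas = delta_b / np.sqrt(mse_i[:, None] * np.diag(XtX_inv)[None, :])
```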
Variance Inflation Factor • The VIF is related to the variance of the estimated regression coefficients • We calculate it for each explanatory variable • One suggested rule is that a value of 10 or more indicates excessive multicollinearity
Tolerance • TOL = (1 – R2k) • Where R2k is the squared multiple correlation obtained in a regression where all other explanatory variables are used to predict Xk • TOL = 1/VIF • Described in comment on p 411
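The VIF/tolerance definitions above amount to one auxiliary regression per explanatory variable. A minimal numpy sketch with synthetic, deliberately correlated predictors (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 18
x1 = rng.normal(50, 10, n)
x2 = 0.5 * x1 + rng.normal(0, 5, n)   # correlated with x1 on purpose
X = np.column_stack([x1, x2])

def r_squared(y, X):
    """R-square from an OLS fit of y on X (intercept added)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    e = y - Xd @ b
    return 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

# For each X_k: regress it on all other X's, then
# TOL_k = 1 - R2_k and VIF_k = 1 / TOL_k
tol, vif = [], []
for k in range(X.shape[1]):
    r2 = r_squared(X[:, k], np.delete(X, k, axis=1))
    tol.append(1 - r2)
    vif.append(1 / (1 - r2))
```

With only two predictors the two VIFs coincide (both auxiliary R-squares equal the squared correlation of x1 and x2), which matches the identical tolerances for income and risk in the output below.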
Output (Tolerance) • Intercept: . • income: 0.93524 • risk: 0.93524
Last slide • Read NKNW Chapter 11