Lab 11 Multiple Regression Residuals and Influence
Topics
• Influence analysis
• Standardized B weights
• Residuals analysis

Multiple Regression Syntax
  Proc Reg;
    Model dv = iv1 iv2 / stb R influence;
    Plot dv*iv1;
    Plot dv*iv2;
    Plot dv*p.;
    Plot p.*r.;
  Run;
Example
A record company wants to know whether (1) airplay, (2) attractiveness of the band, and (3) advertising budget each contribute significant variance to record sales.
Example Program
  data d2;
    infile 'C:\WINDOWS\Desktop\lab11.txt';
    input adverts sales airplay attract;
  Proc Reg;
    Model sales = adverts airplay attract / stb R influence;
    Plot sales*adverts;
    Plot sales*airplay;
    Plot sales*attract;
    Plot sales*p.;
    Plot p.*r.;
  Run;
Example Output
Model: MODEL1
Dependent Variable: sales

Analysis of Variance
                            Sum of          Mean
  Source            DF     Squares        Square    F Value    Pr > F
  Model              3     1722754        574251     261.64    <.0001
  Error            396      869150    2194.82212
  Corrected Total  399     2591904

  Root MSE          46.84893    R-Square    0.6647
  Dependent Mean   193.20000    Adj R-Sq    0.6621
  Coeff Var         24.24893

Parameter Estimates
                     Parameter    Standard                         Standardized
  Variable    DF      Estimate       Error    t Value    Pr > |t|      Estimate
  Intercept    1     -26.61294    12.20619      -2.18      0.0298             0
  adverts      1       0.08488     0.00487      17.43      <.0001       0.51085
  airplay      1       3.36742     0.19542      17.23      <.0001       0.51199
  attract      1      11.08634     1.71509       6.46      <.0001       0.19168
Output: Residual Analysis (cont.)
Model: MODEL1
Dependent Variable: sales

Output Statistics
          Dep Var    Predicted     Std Error                 Std Error    Student
  Obs       sales        Value  Mean Predict     Residual     Residual   Residual
    1    330.0000     229.9206        7.1963     100.0794       46.293      2.162
    2    330.0000     229.9206        7.1963     100.0794       46.293      2.162
    3    120.0000     228.9494        2.9642    -108.9494       46.755     -2.330
    4    120.0000     228.9494        2.9642    -108.9494       46.755     -2.330
    5    360.0000     291.5573        4.7662      68.4427       46.606      1.469
    6    360.0000     291.5573        4.7662      68.4427       46.606      1.469
    7    270.0000     262.9757        3.7127       7.0243       46.702      0.150
    8    270.0000     262.9757        3.7127       7.0243       46.702      0.150
    9    220.0000     225.7525        5.3483      -5.7525       46.543     -0.124
   10    220.0000     225.7525        5.3483      -5.7525       46.543     -0.124
   11    170.0000     141.0950        3.9487      28.9050       46.682      0.619
   12    170.0000     141.0950        3.9487      28.9050       46.682      0.619
What to look for in your outlier results
• Studentized residuals (these take into account that values of X farther from the mean have larger standard errors) – identify values greater than 2 in absolute value.
• Raw residuals – look for values greater than 2 x root MSE; here 2 x 47 ≈ 94.
• Once you identify the outliers, investigate their influence on your results (a sketch for flagging these cases follows below).
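One convenient way to flag such cases programmatically is PROC REG's OUTPUT statement, which writes the predicted values and residuals to a dataset. This is a minimal sketch, assuming the d2 dataset from the example program above; the dataset names (resid_out, flagged) and the flag variables are illustrative and not part of the lab.

  proc reg data=d2;
    model sales = adverts airplay attract;
    /* save predicted values, raw residuals, and studentized residuals */
    output out=resid_out p=pred r=resid student=stud;
  run;
  quit;

  data flagged;
    set resid_out;
    /* flag cases with |studentized residual| > 2 or |raw residual| > 2 x Root MSE (about 94 here) */
    big_stud  = (abs(stud)  > 2);
    big_resid = (abs(resid) > 2*46.84893);
  run;

  proc print data=flagged;
    where big_stud = 1 or big_resid = 1;
    var sales pred resid stud;
  run;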
Influence Output
Model: MODEL1
Dependent Variable: sales

Output Statistics
                             Cook's                Hat Diag       Cov
  Obs    -2-1 0 1 2               D     RStudent          H     Ratio     DFFITS
    1    |    |****  |         0.028       2.1720     0.0236    0.9866     0.3376
    2    |    |****  |         0.028       2.1720     0.0236    0.9866     0.3376
    3    | ****|     |         0.005      -2.3434     0.0040    0.9597    -0.1486
    4    | ****|     |         0.005      -2.3434     0.0040    0.9597    -0.1486
    5    |    |**    |         0.006       1.4707     0.0104    0.9987     0.1504
    6    |    |**    |         0.006       1.4707     0.0104    0.9987     0.1504
    7    |    |      |         0.000       0.1502     0.0063    1.0163     0.0119
    8    |    |      |         0.000       0.1502     0.0063    1.0163     0.0119
    9    |    |      |         0.000      -0.1234     0.0130    1.0233    -0.0142
   10    |    |      |         0.000      -0.1234     0.0130    1.0233    -0.0142
   11    |    |*     |         0.001       0.6187     0.0071    1.0135     0.0523
   12    |    |*     |         0.001       0.6187     0.0071    1.0135     0.0523
What to look for in your influence results
• Leverage: an index of the importance of an observation for the regression equation; it is a function solely of X.
• Denoted Hat Diag (H) in the output. Look for values greater than 2(k+1)/N, where k is the number of independent variables. Recall that (k+1)/N is the average leverage.
• Here, 2(3+1)/400 = .02 (a sketch for flagging high-leverage cases follows this list).
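As with the residuals, the hat values can be saved with the OUTPUT statement and screened against the 2(k+1)/N cutoff. A minimal sketch, again assuming the d2 data; the dataset names and the leverage variable name are illustrative.

  proc reg data=d2;
    model sales = adverts airplay attract;
    output out=lev_out h=leverage;   /* h= saves the hat diagonal */
  run;
  quit;

  data high_lev;
    set lev_out;
    /* cutoff 2(k+1)/N with k = 3 predictors and N = 400 observations */
    if leverage > 2*(3+1)/400;
  run;

  proc print data=high_lev;
    var adverts airplay attract leverage;
  run;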
Influence (cont.)
• Cook's D – a measure of the overall influence of a single case on the model. Look for values greater than .2.
• DFBETA and standardized DFBETA (DFBETAS) – the change in the regression coefficients when that case is deleted, so you can evaluate the influence on the intercept and on each predictor's coefficient. Look for values that are large relative to the other values, or greater than a conventional cutoff such as 2/√N (here 2/√400 = .10). A sketch for screening these statistics follows below.
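Cook's D and DFFITS can also be written to a dataset with the OUTPUT statement and screened against the cutoffs above; the DFBETAS values themselves are easiest to inspect in the printed INFLUENCE output. A minimal sketch, assuming the d2 data (the dataset and variable names are illustrative):

  proc reg data=d2;
    model sales = adverts airplay attract;
    output out=infl_out cookd=cooks_d dffits=dffits_val;
  run;
  quit;

  proc print data=infl_out;
    /* slide's rule of thumb: Cook's D greater than .2 */
    where cooks_d > 0.2;
    var sales adverts airplay attract cooks_d dffits_val;
  run;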
Influence Output (cont.)
Model: MODEL1
Dependent Variable: sales

Output Statistics
              -------------------DFBETAS-------------------
  Obs     Intercept      adverts      airplay      attract
    1       -0.2177      -0.1672       0.1088       0.2438
    2       -0.2177      -0.1672       0.1088       0.2438
    3        0.0089      -0.0889       0.0066      -0.0131
    4        0.0089      -0.0889       0.0066      -0.0131
    5       -0.0267       0.1228       0.0327      -0.0038
    6       -0.0267       0.1228       0.0327      -0.0038
    7       -0.0018       0.0086       0.0024       0.0001
    8       -0.0018       0.0086       0.0024       0.0001
    9       -0.0060       0.0008      -0.0100       0.0095
   10       -0.0060       0.0008      -0.0100       0.0095
   11        0.0465       0.0016      -0.0147      -0.0362
   12        0.0465       0.0016      -0.0147      -0.0362
Delete outliers with studentized residuals greater than 2 (in absolute value)
  data d2;
    infile 'C:\WINDOWS\Desktop\lab11.txt';
    input adverts sales airplay attract;
    if _n_ = 1 then delete;
    if _n_ = 2 then delete;
    if _n_ = 3 then delete;
    if _n_ = 4 then delete;
  Proc Reg;
    Model sales = adverts airplay attract / stb R influence;
  Run;
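Hardcoding observation numbers works here because the flagged cases happen to be the first four rows of the file. A more general approach is to delete cases by their studentized-residual value. This sketch assumes the resid_out dataset (with its stud variable) created in the earlier OUTPUT-statement sketch:

  data d2_trim;
    set resid_out;
    /* drop any case whose studentized residual exceeds 2 in absolute value */
    if abs(stud) > 2 then delete;
  run;

  proc reg data=d2_trim;
    model sales = adverts airplay attract / stb R influence;
  run;
  quit;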
Output with 4 outliers deleted
Model: MODEL1
Dependent Variable: sales

Analysis of Variance
                            Sum of          Mean
  Source            DF     Squares        Square    F Value    Pr > F
  Model              3     1719469        573156     272.58    <.0001
  Error            392      824249    2102.67719
  Corrected Total  395     2543718

  Root MSE          45.85496    R-Square    0.6760
  Dependent Mean   192.87879    Adj R-Sq    0.6735
  Coeff Var         23.77398

Parameter Estimates
                     Parameter    Standard                         Standardized
  Variable    DF      Estimate       Error    t Value    Pr > |t|      Estimate
  Intercept    1     -21.41404    12.06981      -1.77      0.0768             0
  adverts      1       0.08741     0.00480      18.20      <.0001       0.52814
  airplay      1       3.32150     0.19177      17.32      <.0001       0.50771
  attract      1      10.27947     1.70029       6.05      <.0001       0.17695
In-class example
The dataset "Data11" contains the following 10 variables: id na typeAas typeAii errorC learn errorS errorCmm think mngmt.
Use PROC REG to regress errorC (error competence; higher means more competent) on learn (learning from errors; higher means you learn from errors more quickly), think (thinking about errors; higher means you put more time into thinking about your errors), and mngmt (management's orientation toward errors; higher values mean there are more consequences when errors are made).
Which variables are significant predictors of error competence? How many outliers (studentized residuals greater than 2) did you identify? Delete the outliers. Is there any change in the significance of the Beta weights or the R-square? A program skeleton follows below.
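A minimal program skeleton for the exercise, assuming Data11 is a plain-text file laid out like the earlier lab file; the file path and the infile format are assumptions, so adjust them to wherever your copy of Data11 actually lives.

  data d11;
    infile 'C:\WINDOWS\Desktop\Data11.txt';   /* assumed path and format */
    input id na typeAas typeAii errorC learn errorS errorCmm think mngmt;
  Proc Reg;
    Model errorC = learn think mngmt / stb R influence;
  Run;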