Lab 12

precious

  1. Lab 12: Prediction, Regression with multiple IVs

  2. Proc Reg syntax (selection= may also be backward, stepwise, maxr, minr, or rsquare):

         Proc Reg;
           Model dv = iv1 iv2 / stb selection=forward;
           Plot dv*iv1;
           Plot dv*iv2;
           Plot p.*r.;
         Run;

  3. Example Forward

         data d6;
           infile 'C:\WINDOWS\Desktop\lab12.txt';
           input adverts sales airplay attract;
         Proc Reg;
           Model sales = adverts airplay attract / stb selection=forward;
         Run;

  4. Forward Output

         Model: MODEL1
         Dependent Variable: sales

         Forward Selection: Step 1
         Variable airplay Entered: R-Square = 0.3587 and C(p) = 361.3180

         Analysis of Variance
                                   Sum of         Mean
         Source             DF    Squares       Square   F Value    Pr > F
         Model               1     929726       929726    222.62    <.0001
         Error             398    1662178   4176.32758
         Corrected Total   399    2591904

                     Parameter   Standard
         Variable     Estimate      Error  Type II SS   F Value    Pr > F
         Intercept    84.87251    7.94693      476354    114.06    <.0001
         airplay       3.93918    0.26401      929726    222.62    <.0001

         Bounds on condition number: 1, 1

  5. Questions
     • What is the $R^2$ for this model?
     • What is the raw regression equation for this model?
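As a worked illustration of how these two quantities are read off the Step 1 listing (values rounded from that output; the hat notation is an assumption, not the lab's own), $R^2$ is the model sum of squares over the corrected total, and the raw equation uses the Parameter Estimate column:

```latex
\[
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
    = \frac{929726}{2591904} \approx 0.3587
\]
\[
\widehat{\text{sales}} = 84.87 + 3.94 \times \text{airplay}
\]
```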

  6. Forward Output (cont.)

         Forward Selection: Step 2
         Variable adverts Entered: R-Square = 0.6293 and C(p) = 43.7832

         Analysis of Variance
                                   Sum of         Mean
         Source             DF    Squares       Square   F Value    Pr > F
         Model               2    1631048       815524    336.95    <.0001
         Error             397     960856   2420.29268
         Corrected Total   399    2591904

                     Parameter   Standard
         Variable     Estimate      Error  Type II SS   F Value    Pr > F
         Intercept    41.12384    6.57300       94739     39.14    <.0001
         adverts       0.08689    0.00510      701322    289.77    <.0001
         airplay       3.58879    0.20204      763672    315.53    <.0001

         Bounds on condition number: 1.0105, 4.042

  7. Forward Output (cont.)

         Forward Selection: Step 3
         Variable attract Entered: R-Square = 0.6647 and C(p) = 4.0000

         Analysis of Variance
                                   Sum of         Mean
         Source             DF    Squares       Square   F Value    Pr > F
         Model               3    1722754       574251    261.64    <.0001
         Error             396     869150   2194.82212
         Corrected Total   399    2591904

                     Parameter   Standard
         Variable     Estimate      Error  Type II SS   F Value    Pr > F
         Intercept   -26.61294   12.20619       10433      4.75    0.0298
         adverts       0.08488    0.00487      666664    303.74    <.0001
         airplay       3.36742    0.19542      651719    296.93    <.0001
         attract      11.08634    1.71509       91707     41.78    <.0001

         Bounds on condition number: 1.0425, 9.2867

  8. Forward Output (cont.)

         All variables have been entered into the model.

         Summary of Forward Selection
                 Variable    Number     Partial       Model
         Step    Entered    Vars In    R-Square    R-Square       C(p)    F Value    Pr > F
            1    airplay          1      0.3587      0.3587    361.318     222.62    <.0001
            2    adverts          2      0.2706      0.6293    43.7832     289.77    <.0001
            3    attract          3      0.0354      0.6647     4.0000      41.78    <.0001

  9. Forward Output (cont.)

         Analysis of Variance
                                   Sum of         Mean
         Source             DF    Squares       Square   F Value    Pr > F
         Model               3    1722754       574251    261.64    <.0001
         Error             396     869150   2194.82212
         Corrected Total   399    2591904

         Root MSE           46.84893    R-Square    0.6647
         Dependent Mean    193.20000    Adj R-Sq    0.6621
         Coeff Var          24.24893

         Parameter Estimates
                           Parameter    Standard                           Standardized
         Variable    DF     Estimate       Error    t Value    Pr > |t|        Estimate
         Intercept    1    -26.61294    12.20619      -2.18      0.0298               0
         adverts      1      0.08488     0.00487      17.43      <.0001         0.51085
         airplay      1      3.36742     0.19542      17.23      <.0001         0.51199
         attract      1     11.08634     1.71509       6.46      <.0001         0.19168

  10. Example Backward

          data d6;
            infile 'C:\WINDOWS\Desktop\lab12.txt';
            input adverts sales airplay attract;
          Proc Reg;
            Model sales = adverts airplay attract / stb selection=backward;
          Run;

  11. Backward Output

          Backward Elimination: Step 0
          All Variables Entered: R-Square = 0.6647 and C(p) = 4.0000

          Analysis of Variance
                                    Sum of         Mean
          Source             DF    Squares       Square   F Value    Pr > F
          Model               3    1722754       574251    261.64    <.0001
          Error             396     869150   2194.82212
          Corrected Total   399    2591904

                      Parameter   Standard
          Variable     Estimate      Error  Type II SS   F Value    Pr > F
          Intercept   -26.61294   12.20619       10433      4.75    0.0298
          adverts       0.08488    0.00487      666664    303.74    <.0001
          airplay       3.36742    0.19542      651719    296.93    <.0001
          attract      11.08634    1.71509       91707     41.78    <.0001

          Bounds on condition number: 1.0425, 9.2867

          All variables left in the model are significant at the 0.1000 level.

  12. Stepwise
      • In this example, Stepwise results in the same model as Forward. This need not be the case. At each step, Stepwise considers both adding a new variable and removing a variable already in the model, so it essentially combines forward and backward.
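A minimal sketch of a stepwise run on the same lab data, mirroring the forward and backward examples. The `slentry=` and `slstay=` options (the significance levels for entering and staying in the model) do not appear in the lab; they are written out here as an assumption to make the add/remove criteria visible. The value 0.15 is believed to match the PROC REG default for stepwise, but check the documentation for your SAS release.

```sas
/* Read the lab data (same file and variables as the other examples) */
data d6;
  infile 'C:\WINDOWS\Desktop\lab12.txt';
  input adverts sales airplay attract;
run;

/* Stepwise selection: a variable enters if its p-value is below       */
/* slentry, and is removed again if its p-value rises above slstay.    */
/* Both thresholds are illustrative assumptions, not lab requirements. */
proc reg data=d6;
  model sales = adverts airplay attract
        / stb selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

Comparing the step-by-step listing from this run with the forward listing shows whether any variable was ever removed after entering.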

  13. Example Rsquare

          data d6;
            infile 'C:\WINDOWS\Desktop\lab12.txt';
            input adverts sales airplay attract;
          Proc Reg;
            Model sales = adverts airplay attract / stb selection=rsquare;
          Run;

  14. Rsquare Output

          The REG Procedure
          Model: MODEL1
          Dependent Variable: sales

          R-Square Selection Method

          Number in
              Model    R-Square    Variables in Model
                  1      0.3587    airplay
                  1      0.3346    adverts
                  1      0.1063    attract
          -------------------------------------------------
                  2      0.6293    adverts airplay
                  2      0.4132    adverts attract
                  2      0.4075    airplay attract
          -------------------------------------------------
                  3      0.6647    adverts airplay attract

  15. Questions
      • What combination of variables accounts for the most variance?
      • Is this a significant difference?

  16. Testing Incremental $R^2$
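Only the slide title survives here. A commonly used form of the test, assumed here rather than taken from the lab, is the F test for nested models, where the subscript $L$ marks the larger model, $S$ the smaller model, $k$ the number of predictors, and $N$ the sample size. The worked line checks whether adding attract to adverts and airplay is a significant gain, using the $R^2$ values from the Rsquare output:

```latex
\[
F = \frac{(R^2_L - R^2_S)/(k_L - k_S)}{(1 - R^2_L)/(N - k_L - 1)},
\qquad df = (k_L - k_S,\; N - k_L - 1)
\]
\[
F = \frac{(0.6647 - 0.6293)/(3 - 2)}{(1 - 0.6647)/(400 - 3 - 1)}
  = \frac{0.0354}{0.3353/396} \approx 41.8,
\qquad df = (1,\; 396)
\]
```

Up to rounding, this agrees with the F value of 41.78 that SAS reports for attract at Step 3 of the forward selection.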

  17. Splitting data for cross validation, Step 1
      Run a regression on only the odd-numbered observations (If mod(_N_,2) = 1;):

          data d1;
            infile 'C:\WINDOWS\Desktop\lab11.txt';
            input adverts sales airplay attract;
            *keep even data;
            *If mod(_N_,2) = 0;
            *keep odd data;
            If mod(_N_,2) = 1;
          Proc Reg;
            Model sales=adverts airplay attract / stb;
          Run;

  18. Regression Output

          Analysis of Variance
                                    Sum of         Mean
          Source             DF    Squares       Square   F Value    Pr > F
          Model               3     861377       287126    129.50    <.0001
          Error             196     434575   2217.21827
          Corrected Total   199    1295952

          Root MSE           47.08735    R-Square    0.6647
          Dependent Mean    193.20000    Adj R-Sq    0.6595
          Coeff Var          24.37233

  19. Regression Output (cont.)

          Parameter Estimates
                            Parameter    Standard                           Standardized
          Variable    DF     Estimate       Error    t Value    Pr > |t|        Estimate
          Intercept    1    -26.61294    17.35000      -1.53      0.1267               0
          adverts      1      0.08488     0.00692      12.26      <.0001         0.51085
          airplay      1      3.36742     0.27777      12.12      <.0001         0.51199
          attract      1     11.08634     2.43785       4.55      <.0001         0.19168

  20. Step 2
      Record the following results from the odd-sample output in your homework: the standardized beta weights, the raw regression weights, R-square, and adjusted R-square.

  21. Step 3 Now you want to use the equation that you got from the odd data and see how well it predicts Y values on a new sample (even values). Create a regression equation from the raw beta-weights from the regression output for the odd sample and input it into SAS to get predicted Y-values for the even data.

  22. Step 3 (cont.)

          data d2;
            infile 'C:\WINDOWS\Desktop\lab11.txt';
            input adverts sales airplay attract;
            *keep even data;
            If mod(_N_,2) = 0;
            *keep odd data;
            *If mod(_N_,2) = 1;
            *create variable predY from the regression equation in the odd-sample output;
            predY = -26.61 + .08*adverts + 3.37*airplay + 11.09*attract;
          *correlate predY (the predicted Y) with the actual Y values (sales) from the even sample;
          proc corr;
            var sales predY;
          run;

  23. Proc Corr results

          2 Variables: sales predY

          Simple Statistics
          Variable      N         Mean      Std Dev      Sum      Minimum      Maximum
          sales       200    193.20000     80.69896    38640     10.00000    360.00000
          predY       200    190.29731     64.15582    38059     41.67120    327.00040

          Pearson Correlation Coefficients, N = 200
          Prob > |r| under H0: Rho=0

                        sales      predY
          sales       1.00000    0.81499
                                  <.0001
          predY       0.81499    1.00000
                       <.0001

  24. Step 4
      Square the correlation between the actual Y (sales) and the predicted Y (predY), then compare the result to SAS's adjusted R-square: $r^2 = .815^2 = .664$
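The comparison in Step 4 can be laid out as follows, using the values reported in the odd-sample regression output and the Proc Corr results. The symbol $r^2_{cv}$ for the cross-validated squared correlation is a label introduced here for illustration:

```latex
\[
r^2_{cv} = (0.81499)^2 \approx 0.6642
\]
\[
R^2_{\text{odd}} = 0.6647, \qquad R^2_{\text{adj}} = 0.6595, \qquad
R^2_{\text{odd}} - r^2_{cv} \approx 0.0005
\]
```

The squared cross-validation correlation falls between the adjusted and unadjusted R-square values, so the equation loses very little predictive accuracy when it is applied to the even sample.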

  25. In Class Example
      • Use data 10 from Brannick’s website.
      • Commit, columns 1-2: self-ratings of job commitment
      • Citizen, columns 6-7: self-ratings of citizenship performance
      • Stress, columns 11-12: self-ratings of stress on the job
      • Satis, columns 16-17: self-ratings of job satisfaction; high scores mean more satisfied with the job
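Because the variables sit in fixed columns, one way to read them is SAS column input. The sketch below assumes the data have been saved locally; the file path and the data set name `d10` are hypothetical placeholders, not names given in the lab.

```sas
/* Hypothetical path: replace with wherever you saved the data 10 file */
data d10;
  infile 'C:\path\to\data10.txt';
  /* Column input: each variable is read from the listed column range */
  input commit 1-2 citizen 6-7 stress 11-12 satis 16-17;
run;

/* Print the first few rows to confirm the columns were read correctly */
proc print data=d10 (obs=5);
run;
```

Once the data set reads cleanly, the Proc Reg examples from earlier in the lab can be reused by swapping in these variable names.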

  26. Example cont.
      • Perform forward, stepwise, backward, and rsquare selection.
      • What model accounts for the most variance?
      • What is the difference between stepwise and forward output?
