Lab 12 Prediction, Regression with multiple IVs
Proc Reg;
  Model dv = iv1 iv2 / stb selection=forward; /* or backward, stepwise, maxr, minr, rsquare */
  Plot dv*iv1;
  Plot dv*iv2;
  Plot p.*r.;
Run;
Example Forward
data d6;
  infile 'C:\WINDOWS\Desktop\lab12.txt';
  input adverts sales airplay attract;
Proc Reg;
  Model sales = adverts airplay attract / stb selection=forward;
run;
Forward Output
Model: MODEL1
Dependent Variable: sales
Forward Selection: Step 1
Variable airplay Entered: R-Square = 0.3587 and C(p) = 361.3180
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value    Pr > F
Model               1      929726       929726    222.62    <.0001
Error             398     1662178   4176.32758
Corrected Total   399     2591904
              Parameter   Standard
Variable       Estimate      Error  Type II SS   F Value    Pr > F
Intercept      84.87251    7.94693      476354    114.06    <.0001
airplay         3.93918    0.26401      929726    222.62    <.0001
Bounds on condition number: 1, 1
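The R-Square in the Step 1 header is the Model sum of squares divided by the Corrected Total. The F Value is the Model mean square divided by the Error mean square. The Python snippet below is only a calculator for checking these two relationships against the printed table; the lab itself runs in SAS. SAS rounds the sums of squares for display, so the recomputed Error mean square can differ slightly from the printed one in the later decimals.

```python
# Numbers copied from the Step 1 Analysis of Variance table (airplay only).
ss_model = 929726      # Model sum of squares
ss_error = 1662178     # Error sum of squares
ss_total = 2591904     # Corrected Total sum of squares
df_model, df_error = 1, 398

r_square = ss_model / ss_total          # proportion of variance in sales explained
ms_error = ss_error / df_error          # Mean Square for Error
f_value = (ss_model / df_model) / ms_error

print(f"R-Square = {r_square:.4f}")     # R-Square = 0.3587
print(f"F Value  = {f_value:.2f}")      # F Value  = 222.62
```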
Questions
• What is the R-square (R²) for this model?
• What is the raw regression equation for this model?
Forward Output (cont.)
Forward Selection: Step 2
Variable adverts Entered: R-Square = 0.6293 and C(p) = 43.7832
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value    Pr > F
Model               2     1631048       815524    336.95    <.0001
Error             397      960856   2420.29268
Corrected Total   399     2591904
              Parameter   Standard
Variable       Estimate      Error  Type II SS   F Value    Pr > F
Intercept      41.12384    6.57300       94739     39.14    <.0001
adverts         0.08689    0.00510      701322    289.77    <.0001
airplay         3.58879    0.20204      763672    315.53    <.0001
Bounds on condition number: 1.0105, 4.042
Forward Output (cont.)
Forward Selection: Step 3
Variable attract Entered: R-Square = 0.6647 and C(p) = 4.0000
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value    Pr > F
Model               3     1722754       574251    261.64    <.0001
Error             396      869150   2194.82212
Corrected Total   399     2591904
              Parameter   Standard
Variable       Estimate      Error  Type II SS   F Value    Pr > F
Intercept     -26.61294   12.20619       10433      4.75    0.0298
adverts         0.08488    0.00487      666664    303.74    <.0001
airplay         3.36742    0.19542      651719    296.93    <.0001
attract        11.08634    1.71509       91707     41.78    <.0001
Bounds on condition number: 1.0425, 9.2867
Forward Output (cont.)
All variables have been entered into the model.
Summary of Forward Selection
      Variable  Number   Partial     Model
Step   Entered Vars In  R-Square  R-Square      C(p)   F Value    Pr > F
   1   airplay       1    0.3587    0.3587   361.318    222.62    <.0001
   2   adverts       2    0.2706    0.6293   43.7832    289.77    <.0001
   3   attract       3    0.0354    0.6647    4.0000     41.78    <.0001
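The summary table can be rebuilt from the three step outputs. Partial R-Square is the gain in model R-Square at each step. The F Value for each entering variable is its Type II SS divided by the Error mean square of the model it has just entered, with 1 numerator df. The Python sketch below checks both using numbers copied from the step tables. It is only a calculator; the names `steps` and `summary` are illustrative.

```python
# (variable entered, model R-Square after entry, Type II SS of that variable, MSE after entry)
steps = [
    ("airplay", 0.3587, 929726, 4176.32758),
    ("adverts", 0.6293, 701322, 2420.29268),
    ("attract", 0.6647, 91707, 2194.82212),
]

summary = []
previous_r2 = 0.0
for name, model_r2, type2_ss, mse in steps:
    partial_r2 = model_r2 - previous_r2   # gain in R-Square from adding this variable
    f_value = type2_ss / mse              # F for the variable that just entered (1 df)
    summary.append((name, round(partial_r2, 4), round(f_value, 2)))
    previous_r2 = model_r2

for row in summary:
    print(row)
# ('airplay', 0.3587, 222.62)
# ('adverts', 0.2706, 289.77)
# ('attract', 0.0354, 41.78)
```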
Forward Output (cont.)
                           Sum of         Mean
Source             DF     Squares       Square   F Value    Pr > F
Model               3     1722754       574251    261.64    <.0001
Error             396      869150   2194.82212
Corrected Total   399     2591904
Root MSE            46.84893    R-Square    0.6647
Dependent Mean     193.20000    Adj R-Sq    0.6621
Coeff Var           24.24893
Parameter Estimates
                  Parameter   Standard                      Standardized
Variable     DF    Estimate      Error   t Value  Pr > |t|      Estimate
Intercept     1   -26.61294   12.20619     -2.18    0.0298             0
adverts       1     0.08488    0.00487     17.43    <.0001       0.51085
airplay       1     3.36742    0.19542     17.23    <.0001       0.51199
attract       1    11.08634    1.71509      6.46    <.0001       0.19168
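The fit statistics in the final output follow from the ANOVA table. Adjusted R-square is 1 − (1 − R²)(N − 1)/(N − k − 1). Root MSE is the square root of the Error mean square. Coeff Var is 100 × Root MSE / Dependent Mean. The raw regression equation comes from the Parameter Estimate column. The Python sketch below recomputes these values. The function name `predict_sales` and the input values in the last line are hypothetical and chosen only to show how the equation is applied.

```python
import math

# Values copied from the final (three-predictor) Forward output.
n, k = 400, 3                       # observations, predictors
ss_model, ss_total = 1722754, 2591904
ms_error = 2194.82212
dependent_mean = 193.2

r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
root_mse = math.sqrt(ms_error)
coeff_var = 100 * root_mse / dependent_mean


def predict_sales(adverts, airplay, attract):
    """Raw (unstandardized) regression equation from the Parameter Estimates table."""
    return -26.61294 + 0.08488 * adverts + 3.36742 * airplay + 11.08634 * attract


print(round(r_square, 4), round(adj_r_square, 4))  # 0.6647 0.6621
print(round(root_mse, 3), round(coeff_var, 3))
# Hypothetical case (values invented for illustration): adverts=1000, airplay=20, attract=5
print(round(predict_sales(1000, 20, 5), 2))        # 181.05
```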
Example Backward
data d6;
  infile 'C:\WINDOWS\Desktop\lab12.txt';
  input adverts sales airplay attract;
Proc Reg;
  Model sales = adverts airplay attract / stb selection=backward;
run;
Backward Output
Backward Elimination: Step 0
All Variables Entered: R-Square = 0.6647 and C(p) = 4.0000
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value    Pr > F
Model               3     1722754       574251    261.64    <.0001
Error             396      869150   2194.82212
Corrected Total   399     2591904
              Parameter   Standard
Variable       Estimate      Error  Type II SS   F Value    Pr > F
Intercept     -26.61294   12.20619       10433      4.75    0.0298
adverts         0.08488    0.00487      666664    303.74    <.0001
airplay         3.36742    0.19542      651719    296.93    <.0001
attract        11.08634    1.71509       91707     41.78    <.0001
Bounds on condition number: 1.0425, 9.2867
All variables left in the model are significant at the 0.1000 level.
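Stepwise
• In this example, stepwise selection (selection=stepwise) produces the same model as forward selection, but this need not be the case. At each step, stepwise selection adds the best remaining candidate that meets the entry criterion. It then checks whether any variable already in the model no longer meets the criterion for staying, and removes it if so. In effect, it combines forward selection with backward elimination.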
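The add-then-check-removal loop can be sketched in Python with NumPy. This is a simplified illustration, not PROC REG's implementation. It uses fixed F-to-enter and F-to-remove cutoffs, and the values 4.0 and 3.9 and the helper names `rss` and `stepwise` are hypothetical. PROC REG instead compares p-values with SLENTRY= and SLSTAY=, both 0.15 by default for STEPWISE. Setting `allow_removal=False` turns the same loop into plain forward selection. The usage lines run on simulated data, not the lab file.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares from OLS of y on an intercept plus the columns in cols."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def stepwise(X, y, names, f_enter=4.0, f_remove=3.9, allow_removal=True):
    """Simplified stepwise selection using F-to-enter / F-to-remove cutoffs.

    allow_removal=False gives plain forward selection. Keeping f_remove
    below f_enter helps prevent a variable from cycling in and out.
    """
    n, p = X.shape
    selected = []
    while True:
        changed = False
        # Forward part: the candidate that most reduces the residual SS (largest F).
        base = rss(X, y, selected)
        df_error = n - len(selected) - 2          # error df once one more variable is in
        best_j, best_f = None, -np.inf
        for j in range(p):
            if j not in selected:
                new = rss(X, y, selected + [j])
                f = (base - new) / (new / df_error)
                if f > best_f:
                    best_j, best_f = j, f
        if best_j is not None and best_f >= f_enter:
            selected.append(best_j)
            changed = True
        # Backward part: drop the weakest variable in the model if its F is too small.
        if allow_removal and selected:
            full = rss(X, y, selected)
            df_error = n - len(selected) - 1
            worst_j, worst_f = None, np.inf
            for j in selected:
                reduced = rss(X, y, [k for k in selected if k != j])
                f = (reduced - full) / (full / df_error)
                if f < worst_f:
                    worst_j, worst_f = j, f
            if worst_f < f_remove:
                selected.remove(worst_j)
                changed = True
        if not changed:
            return [names[j] for j in selected]


# Simulated data (not the lab file): y depends on x0 and x1 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=100)
print(stepwise(X, y, ["x0", "x1", "x2"]))
print(stepwise(X, y, ["x0", "x1", "x2"], allow_removal=False))
```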
Example Rsquare
data d6;
  infile 'C:\WINDOWS\Desktop\lab12.txt';
  input adverts sales airplay attract;
Proc Reg;
  Model sales = adverts airplay attract / stb selection=rsquare;
run;
Rsquare Output
The REG Procedure
Model: MODEL1
Dependent Variable: sales
R-Square Selection Method
Number in
    Model    R-Square    Variables in Model
        1      0.3587    airplay
        1      0.3346    adverts
        1      0.1063    attract
-------------------------------------------------
        2      0.6293    adverts airplay
        2      0.4132    adverts attract
        2      0.4075    airplay attract
-------------------------------------------------
        3      0.6647    adverts airplay attract
Questions
• What combination of variables accounts for the most variance?
• Is the gain in R-square from the best two-variable model (adverts airplay) to the three-variable model significant?
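Whether adding a predictor significantly increases R-square can be tested with the standard F test for nested models: F = [(R²full − R²reduced)/(kfull − kreduced)] / [(1 − R²full)/(N − kfull − 1)]. The Python sketch below applies it to the R-Square values from the table above, with N = 400. The function name `r2_change_f` is hypothetical. Because the R-Square values are rounded to four decimals, the result (about 41.81) differs slightly from the F Value of 41.78 (p < .0001) that SAS reports for attract at Step 3 of the forward selection.

```python
def r2_change_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """F test for the increase in R-square between two nested regression models."""
    df_num = k_full - k_reduced          # number of predictors added
    df_den = n - k_full - 1              # error df of the larger model
    return ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)


# Best two-variable model (adverts airplay) versus all three predictors, N = 400.
f_change = r2_change_f(0.6647, 0.6293, n=400, k_full=3, k_reduced=2)
print(round(f_change, 2))   # 41.81
```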
Splitting data for cross-validation, Step 1
Run the regression on the odd-numbered observations only (If mod(_N_,2) = 1;):
data d1;
  infile 'C:\WINDOWS\Desktop\lab11.txt';
  input adverts sales airplay attract;
  *keep even data;
  *If mod(_N_,2) = 0;
  *keep odd data;
  If mod(_N_,2) = 1;
Proc Reg;
  Model sales = adverts airplay attract / stb;
Run;
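The subsetting IF keeps a record only when mod(_N_,2) matches. _N_ counts DATA step iterations starting at 1, so with one record per line it equals the record number. The Python sketch below applies the same odd/even rule to a list. The function name `split_odd_even` and the sample records are hypothetical.

```python
def split_odd_even(rows):
    """Split records the way mod(_N_, 2) does in a SAS DATA step.

    _N_ starts at 1, so odd record numbers go to the first list
    and even record numbers to the second.
    """
    odd = [row for n, row in enumerate(rows, start=1) if n % 2 == 1]
    even = [row for n, row in enumerate(rows, start=1) if n % 2 == 0]
    return odd, even


odd, even = split_odd_even(["r1", "r2", "r3", "r4", "r5"])
print(odd)   # ['r1', 'r3', 'r5']
print(even)  # ['r2', 'r4']
```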
Regression Output
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value    Pr > F
Model               3      861377       287126    129.50    <.0001
Error             196      434575   2217.21827
Corrected Total   199     1295952
Root MSE            47.08735    R-Square    0.6647
Dependent Mean     193.20000    Adj R-Sq    0.6595
Coeff Var           24.37233
Regression Output (cont.)
Parameter Estimates
                  Parameter   Standard                      Standardized
Variable     DF    Estimate      Error   t Value  Pr > |t|      Estimate
Intercept     1   -26.61294   17.35000     -1.53    0.1267             0
adverts       1     0.08488    0.00692     12.26    <.0001       0.51085
airplay       1     3.36742    0.27777     12.12    <.0001       0.51199
attract       1    11.08634    2.43785      4.55    <.0001       0.19168
Step 2
Record the following results from the odd-sample output in your homework: the standardized beta weights, the raw (unstandardized) beta weights, R-square, and adjusted R-square.
Step 3
Next, check how well the equation from the odd data predicts Y in a new sample (the even-numbered observations). Build the regression equation from the raw beta weights in the odd-sample output, and enter it in SAS to compute predicted Y values for the even data.
Step 3 (cont.)
data d2;
  infile 'C:\WINDOWS\Desktop\lab11.txt';
  input adverts sales airplay attract;
  *keep even data;
  If mod(_N_,2) = 0;
  *keep odd data;
  *If mod(_N_,2) = 1;
  *create variable predY with the regression equation in your output from the odd sample;
  /* .08 rounds the odd-sample weight 0.08488 coarsely; keeping more decimals
     reproduces the odd-sample equation more closely. The output below used .08. */
  predY = -26.61 + .08*adverts + 3.37*airplay + 11.09*attract;
/* correlate the predicted Y (predY) with the actual Y values (sales) from the even sample */
proc corr;
  var sales predY;
run;
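Steps 1 through 3 amount to split-half cross-validation. You estimate the equation on one half, apply it unchanged to the other half, and correlate predicted with actual Y. The NumPy sketch below follows the same sequence. The function name `split_half_validation` is hypothetical, and the sketch assumes the predictors are the columns of an array X. The usage lines run on simulated data rather than the lab file. Unlike the SAS step, the sketch keeps the full-precision weights instead of rounded ones.

```python
import numpy as np


def split_half_validation(X, y):
    """Fit OLS on odd-numbered cases, then correlate predictions with actual y on even cases.

    Returns (coefficients, r, r_squared); coefficients[0] is the intercept.
    """
    case = np.arange(1, len(y) + 1)            # 1-based case numbers, like _N_
    train, test = case % 2 == 1, case % 2 == 0
    design = np.column_stack([np.ones(train.sum()), X[train]])
    coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
    pred_y = coef[0] + X[test] @ coef[1:]      # apply the odd-sample equation to even cases
    r = np.corrcoef(y[test], pred_y)[0, 1]
    return coef, r, r ** 2


# Simulated data (not the lab file).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)
coef, r, r2 = split_half_validation(X, y)
print(coef, r, r2)
```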
Proc Corr results
2 Variables: sales predY
Simple Statistics
Variable     N         Mean      Std Dev          Sum      Minimum      Maximum
sales      200    193.20000     80.69896        38640     10.00000    360.00000
predY      200    190.29731     64.15582        38059     41.67120    327.00040
Pearson Correlation Coefficients, N = 200
Prob > |r| under H0: Rho=0
                sales        predY
sales         1.00000      0.81499
                            <.0001
predY         0.81499      1.00000
               <.0001
Step 4
Square the correlation between the actual Y (sales) and the predicted Y (predY), and compare the result to SAS's adjusted R-square for the odd sample:
r² = (.815)² ≈ .664
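The comparison in Step 4 can be checked numerically. The snippet below squares the PROC CORR correlation. It then recomputes the odd-sample adjusted R-square from that sample's ANOVA table using 1 − (1 − R²)(N − 1)/(N − k − 1), with N = 200 and k = 3. Python is used here only as a calculator.

```python
# Correlation between sales and predY in the even sample (from PROC CORR).
r = 0.81499
cross_validated_r2 = r ** 2

# Adjusted R-square for the odd-sample regression, recomputed from its ANOVA table.
n, k = 200, 3
r2_odd = 861377 / 1295952
adj_r2_odd = 1 - (1 - r2_odd) * (n - 1) / (n - k - 1)

print(round(cross_validated_r2, 4))  # 0.6642
print(round(adj_r2_odd, 4))          # 0.6595
```

Here the cross-validated r² (about .664) is essentially equal to, and even slightly above, the odd-sample adjusted R-square (.6595), so the equation shows almost no shrinkage. However, the odd-sample estimates and R-square are identical to the full-data values in the Forward output. This suggests the two halves of this file may contain the same cases. With truly independent halves, some shrinkage would usually be expected.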
In Class Example
• Use data 10 from Brannick’s website.
• Commit, columns 1-2: self-ratings of job commitment
• Citizen, columns 6-7: self-ratings of citizenship performance
• Stress, columns 11-12: self-ratings of stress on the job
• Satis, columns 16-17: self-ratings of job satisfaction; high scores mean more satisfied with the job
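These column positions map directly onto SAS column input, for example: input commit 1-2 citizen 6-7 stress 11-12 satis 16-17; The Python sketch below shows the same column arithmetic: 1-based, inclusive columns become 0-based slices. It assumes each field holds an integer rating and treats a blank field as missing. The function name `parse_record` and the sample record are hypothetical, invented only to show the layout.

```python
# Column positions from the slide above (1-based, inclusive).
COLUMNS = {"commit": (1, 2), "citizen": (6, 7), "stress": (11, 12), "satis": (16, 17)}


def parse_record(line):
    """Read one fixed-column record; a blank field becomes None (like a SAS missing value)."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        field = line[start - 1:end].strip()   # 1-based inclusive columns -> Python slice
        record[name] = int(field) if field else None
    return record


# Hypothetical record, invented to show the layout.
print(parse_record("45   38   20   41"))
# {'commit': 45, 'citizen': 38, 'stress': 20, 'satis': 41}
```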
Example cont.
• Perform forward, stepwise, backward, and rsquare selection.
• Which model accounts for the most variance?
• What is the difference between the stepwise and forward output?