
Lab 13


morey


  1. Lab 13 Partial & Semipartial Correlations, Collinearity, and Nonlinear Trends

  2. Partial and Semipartial Correlation
     • Partial correlation: the correlation between two variables with the effects of a 3rd variable removed.
     • To compute it: remove the variance attributable to the 3rd variable from each of the two variables, then compute the correlation between what remains of them (the residuals).

  3. Partial and Semipartial Correlation (cont.)
     • Run a regression of IV1 on IV3.
     • Regression partitions the variance in the DV (in this case IV1) into two parts: variance explained by the IV (R-squared) and variance not explained by it (the residuals).
     • Run a second regression of IV2 on IV3.
     • Compute the correlation between the residuals from the first regression and the residuals from the second regression. This is the correlation between IV1 and IV2 with IV3 partialed out.
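The two-regression recipe above can be sketched in plain Python (the lab itself uses SAS). This minimal illustration uses one control variable and a small made-up data set. The function names are hypothetical and do not come from any library. The sketch also checks the residual method against the standard closed-form first-order partial correlation, r12.3 = (r12 - r13*r23) / sqrt((1 - r13^2)(1 - r23^2)).

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals from the simple OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def partial_corr_residuals(iv1, iv2, iv3):
    """IV1-IV2 correlation with IV3 partialed out: correlate the two sets of residuals."""
    return pearson(residuals(iv1, iv3), residuals(iv2, iv3))


def partial_corr_formula(r12, r13, r23):
    """Closed-form first-order partial correlation r12.3."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Small made-up data set, for illustration only.
iv1 = [2.0, 4.0, 5.0, 7.0, 8.0, 11.0]
iv2 = [1.0, 3.0, 2.0, 6.0, 7.0, 9.0]
iv3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r_resid = partial_corr_residuals(iv1, iv2, iv3)
r_formula = partial_corr_formula(pearson(iv1, iv2), pearson(iv1, iv3), pearson(iv2, iv3))
print(round(r_resid, 6), round(r_formula, 6))  # the two methods agree
```

The residuals from each regression are uncorrelated with IV3 by construction. This is why their correlation contains no IV3 influence.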

  4. Example of Partial Correlation
     • We want to know the correlation between education and salary. Because we predict that the employees' gender and minority status will influence this correlation, we partial out their influence.
     • Compute the correlation between education and salary controlling for gender and minority status.

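  5. Example of Partial Correlation

     /* Read the employee data and create numeric codes */
     data d1;
       infile 'C:\WINDOWS\Desktop\lab13.txt';
       input id sex $ hiredat $ educ title $ salary startsal jobtime prevexp minority $;
       if sex = "Male" then gender = 1;
       if sex = "Female" then gender = 2;
       if minority = "Yes" then minor = 1;
       if minority = "No" then minor = 2;
     run;

     /* Regress salary on the control variables; save the residuals as r1 */
     proc reg data=d1;
       model salary = gender minor;
       output out=data2 r=r1;
     run;
     quit;

     /* Regress educ on the control variables; save the residuals as r2 */
     proc reg data=d1;
       model educ = gender minor;
       output out=data3 r=r2;
     run;
     quit;

     /* One-to-one merge: both output data sets keep the observations of d1 in the same order */
     data merged;
       merge data2 data3;
     run;

     proc corr data=merged;
       var salary educ gender minor r1 r2;
     run;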

  6. Output for regressing salary on gender and minority status

     Model: MODEL1
     Dependent Variable: salary

                          Analysis of Variance
                                 Sum of          Mean
     Source             DF      Squares        Square    F Value    Pr > F
     Model               2  34116446497   17058223249      77.40    <.0001
     Error             471     1.038E11     220382270
     Corrected Total   473  1.379165E11

     Root MSE          14845    R-Square     0.2474
     Dependent Mean    34420    Adj R-Sq     0.2442
     Coeff Var      43.13034

                          Parameter Estimates
                     Parameter      Standard
     Variable   DF    Estimate         Error    t Value    Pr > |t|
     Intercept   1       42051    3496.63766      12.03      <.0001
     gender      1      -15961    1373.05406     -11.62      <.0001
     minor       1  8762.76693    1652.36821       5.30      <.0001

  7. Output for regressing education on gender and minority status

     Model: MODEL1
     Dependent Variable: educ

                          Analysis of Variance
                                 Sum of          Mean
     Source             DF      Squares        Square    F Value    Pr > F
     Model               2    599.98412     299.99206      42.35    <.0001
     Error             471   3336.48213       7.08383
     Corrected Total   473   3936.46624

     Root MSE        2.66155    R-Square     0.1524
     Dependent Mean 13.49156    Adj R-Sq     0.1488
     Coeff Var      19.72749

                          Parameter Estimates
                     Parameter      Standard
     Variable   DF    Estimate         Error    t Value    Pr > |t|
     Intercept   1    14.59945       0.62690      23.29      <.0001
     gender      1    -2.13024       0.24617      -8.65      <.0001
     minor       1     1.11934       0.29625       3.78      0.0002

  8. Example of Partial Correlation - Output

                          Simple Statistics
     Variable    N       Mean    Std Dev        Sum    Minimum    Maximum
     salary    474      34420      17076   16314875      15750     135000
     educ      474   13.49156    2.88485       6395    8.00000   21.00000
     gender    474    1.45570    0.49856    690.000    1.00000    2.00000
     minor     474    1.78059    0.41428    844.000    1.00000    2.00000
     r1        474          0      14814          0     -22315      91385
     r2        474          0    2.65591          0   -6.70790    6.29210

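  9. Example of Partial Correlation - Output (cont.)

     Pearson Correlation Coefficients, N = 474
     Prob > |r| under H0: Rho=0

                salary      educ    gender     minor        r1        r2
     salary    1.00000   0.66056  -0.44992   0.17734   0.86754   0.50662
                          <.0001    <.0001    0.0001    <.0001    <.0001
     educ      0.66056   1.00000  -0.35599   0.13289   0.53763   0.92064
                <.0001              <.0001    0.0038    <.0001    <.0001
     gender   -0.44992  -0.35599   1.00000   0.07567   0.00000   0.00000
                <.0001    <.0001              0.0999    1.0000    1.0000
     minor     0.17734   0.13289   0.07567   1.00000   0.00000   0.00000
                0.0001    0.0038    0.0999              1.0000    1.0000
     r1        0.86754   0.53763     0.000    0.0000   1.00000   0.58397
     Residual   <.0001    <.0001     1.000    1.0000              <.0001

     • The r1-r2 correlation, 0.58397 (p < .0001), is the partial correlation between salary and education controlling for gender and minority status. It is smaller than the zero-order correlation of 0.66056.
     • r1 has a correlation of 0 with gender and minor, which confirms that their influence has been removed from the residuals.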
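As a cross-check on the output above, the same partial correlation can be computed from the zero-order correlations alone. The method applies the first-order formula recursively, controlling for one variable and then the next. This Python sketch uses hypothetical names and only the correlations printed on this slide. Those correlations are rounded to five decimals, so expect agreement to about three decimals rather than an exact match.

```python
from math import sqrt

# Zero-order correlations from the PROC CORR output above (N = 474).
names = ["salary", "educ", "gender", "minor"]
pairs = {
    ("salary", "educ"): 0.66056,
    ("salary", "gender"): -0.44992,
    ("salary", "minor"): 0.17734,
    ("educ", "gender"): -0.35599,
    ("educ", "minor"): 0.13289,
    ("gender", "minor"): 0.07567,
}

# Symmetric lookup table r[a][b] with 1.0 on the diagonal.
r = {a: {a: 1.0} for a in names}
for (a, b), value in pairs.items():
    r[a][b] = value
    r[b][a] = value


def partial(r, i, j, controls):
    """Partial correlation of i and j controlling for every variable in `controls`.

    Peels off one control k at a time, applying the first-order formula to
    correlations that are already partialed for the remaining controls.
    """
    if not controls:
        return r[i][j]
    k, rest = controls[0], controls[1:]
    r_ij = partial(r, i, j, rest)
    r_ik = partial(r, i, k, rest)
    r_jk = partial(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


r_partial = partial(r, "salary", "educ", ["gender", "minor"])
print(f"zero-order r = {r['salary']['educ']:.5f}")
print(f"partial r    = {r_partial:.5f}")  # close to the r1-r2 value, 0.58397
```

The order in which the controls are removed does not change the result. The residual approach in the SAS program and this formula are two routes to the same quantity.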

  10. Collinearity
     • Collinearity means that within the set of IVs, some IVs are (nearly) perfectly predicted by the other IVs.
     • Diagnostics:
       • Correlation matrix: look for large correlations between IVs.
       • Variance Inflation Factor (VIF): look for values greater than 10.
       • Tolerance: look for small values close to zero.
       • Condition indices: look for values greater than 30. Collinearity is indicated when 2 or more variables have large proportions of variance (.50 or more) on a dimension with a large condition index.
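The VIF and tolerance rules of thumb above both come from one quantity, R_j^2. This is the R-squared from regressing predictor j on all the other predictors. Tolerance is 1 - R_j^2 and VIF is 1 / tolerance, so VIF > 10 corresponds to tolerance < 0.10. The minimal Python sketch below uses a hypothetical function name. It is applied to the tolerances that PROC REG reports for the eating-disorder model in slide 15. Because those tolerances are printed rounded, the recomputed VIFs differ from the printed ones in the second decimal.

```python
def tolerance_and_vif(r2_j):
    """Return (tolerance, VIF) for a predictor whose R-squared on the other
    predictors is r2_j. Tolerance = 1 - R_j^2; VIF = 1 / tolerance."""
    tol = 1.0 - r2_j
    return tol, 1.0 / tol


# Tolerances printed by PROC REG for the eating-disorder model (slide 15).
reported_tol = {"bmi": 0.03632, "percent": 0.03460, "anxiety": 0.77532, "image": 0.50634}

vifs = {}
for name, tol_printed in reported_tol.items():
    r2_j = 1.0 - tol_printed          # R^2 of this predictor on the other three
    tol, vif = tolerance_and_vif(r2_j)
    vifs[name] = vif
    flagged = vif > 10                # rule of thumb from the slide
    print(f"{name:8s} R2_j = {r2_j:.4f}  tolerance = {tol:.5f}  VIF = {vif:6.2f}  flag = {flagged}")
```

A tolerance of 0.036 for bmi means about 96% of its variance is predicted by the other three IVs.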

  11. Example of Collinearity analysis
     Research on eating disorders:
     • BMI: used to approximate % body fat.
     • Percent overweight
     • Appearance anxiety
     • Body image
     • Eating disorder: measures the number of behaviors that signal an eating disorder.
     Check for collinearity by running a correlation matrix and a regression analysis.

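  12. Program

     data d1;
       /* read five variables, each with the 5.0 informat (5 columns, no decimals) */
       input (bmi percent anxiety image disorder) (5.0);
       /* the data lines (not shown on the slide) go between CARDS; and the closing semicolon */
       cards;
     ;
     run;

     proc corr data=d1;
     run;

     /* VIF, TOL, and COLLIN request the collinearity diagnostics */
     proc reg data=d1;
       model disorder = bmi percent anxiety image / vif tol collin;
     run;
     quit;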

  13. Proc Corr Output

     Pearson Correlation Coefficients, N = 235
     Prob > |r| under H0: Rho=0

                   bmi    percent    anxiety      image   disorder
     bmi       1.00000    0.97992    0.43771    0.65529    0.54376
                           <.0001     <.0001     <.0001     <.0001
     percent   0.97992    1.00000    0.40085    0.68914    0.48138
                <.0001                <.0001     <.0001     <.0001
     anxiety   0.43771    0.40085    1.00000    0.33633    0.14723
                <.0001     <.0001                <.0001     0.0240
     image     0.65529    0.68914    0.33633    1.00000    0.36574
                <.0001     <.0001     <.0001                <.0001
     disorder  0.54376    0.48138    0.14723    0.36574    1.00000
                <.0001     <.0001     0.0240     <.0001

  14. Proc Reg Output

                          Analysis of Variance
                                 Sum of          Mean
     Source             DF      Squares        Square    F Value    Pr > F
     Model               4        10628    2657.02322      37.79    <.0001
     Error             230        16171      70.30887
     Corrected Total   234        26799

     Root MSE        8.38504    R-Square     0.3966
     Dependent Mean 80.18298    Adj R-Sq     0.3861
     Coeff Var      10.45738

                          Parameter Estimates
                     Parameter      Standard
     Variable   DF    Estimate         Error    t Value    Pr > |t|
     Intercept   1    56.92129       8.20539       6.94      <.0001
     bmi         1     2.11256       0.27077       7.80      <.0001
     percent     1    -1.61429       0.27513      -5.87      <.0001
     anxiety     1    -0.19021       0.06199      -3.07      0.0024
     image       1     0.18510       0.08071       2.29      0.0227

  15. Collinearity Output (tol and VIF)

                    Parameter Estimates
                                     Variance
     Variable   DF    Tolerance     Inflation
     Intercept   1            .             0
     bmi         1      0.03632      27.53589
     percent     1      0.03460      28.90430
     anxiety     1      0.77532       1.28979
     image       1      0.50634       1.97497

  16. Collinearity Output (collin)

                Collinearity Diagnostics
                                 Condition
     Number      Eigenvalue          Index
     1              4.97499        1.00000
     2              0.01424       18.69085
     3              0.00840       24.33043
     4              0.00207       48.99494
     5           0.00029641      129.55302

     -------------------- Proportion of Variation --------------------
     Number   Intercept       bmi    percent    anxiety      image
     1          0.00017   0.00003    0.00002    0.00044   0.000115
     2          0.06825   0.01779    0.00724    0.16035    0.00312
     3          0.13010   0.00157    0.00003    0.77013    0.04982
     4          0.70537   0.00941    0.00277    0.01361    0.87007
     5          0.09609   0.97120    0.98993    0.05546    0.07688

     • Dimension 5 has a condition index of 129.6, and both bmi (.97) and percent (.99) have variance proportions above .50 on it, so bmi and percent are collinear. This agrees with their VIFs of about 28 and their correlation of .98.
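The condition index for dimension k is the square root of the largest eigenvalue divided by eigenvalue k. The sketch below recomputes the indices from the eigenvalues on this slide. It then applies the rule stated on slide 10: an index above 30 with two or more variables at a variance proportion of .50 or more. Two assumptions apply. First, the intercept column is left out when counting variables. Second, small differences from the printed indices come from the rounded eigenvalues. Function names are hypothetical.

```python
from math import sqrt

# Eigenvalues and variance proportions from the COLLIN output (slide 16).
eigenvalues = [4.97499, 0.01424, 0.00840, 0.00207, 0.00029641]
# One row per dimension; the intercept column is omitted (assumption: only predictors count).
proportions = [
    {"bmi": 0.00003, "percent": 0.00002, "anxiety": 0.00044, "image": 0.000115},
    {"bmi": 0.01779, "percent": 0.00724, "anxiety": 0.16035, "image": 0.00312},
    {"bmi": 0.00157, "percent": 0.00003, "anxiety": 0.77013, "image": 0.04982},
    {"bmi": 0.00941, "percent": 0.00277, "anxiety": 0.01361, "image": 0.87007},
    {"bmi": 0.97120, "percent": 0.98993, "anxiety": 0.05546, "image": 0.07688},
]


def condition_indices(eigs):
    """Condition index k = sqrt(largest eigenvalue / eigenvalue k)."""
    top = max(eigs)
    return [sqrt(top / e) for e in eigs]


def collinear_sets(eigs, props, index_cutoff=30.0, prop_cutoff=0.50):
    """Dimensions with a condition index above index_cutoff on which two or
    more predictors have a variance proportion of at least prop_cutoff."""
    flagged = []
    for dim, (ci, row) in enumerate(zip(condition_indices(eigs), props), start=1):
        big = sorted(name for name, p in row.items() if p >= prop_cutoff)
        if ci > index_cutoff and len(big) >= 2:
            flagged.append((dim, round(ci, 2), big))
    return flagged


print([round(ci, 2) for ci in condition_indices(eigenvalues)])
print(collinear_sets(eigenvalues, proportions))
```

Dimension 4 also has an index above 30. However, only image (plus the intercept) loads heavily on it, so under this reading of the rule it is not flagged.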

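  17. Add the collinear variables together and rerun

     data d1;
       input (bmi percent anxiety image disorder) (5.0);
       /* combine the two collinear predictors into a single score */
       bmiperc = bmi + percent;
       /* the data lines (not shown on the slide) go between CARDS; and the closing semicolon */
       cards;
     ;
     run;

     proc corr data=d1;
       var bmiperc anxiety image;
     run;

     /* bmiperc replaces bmi and percent in the model */
     proc reg data=d1;
       model disorder = bmiperc anxiety image / vif tol collin;
     run;
     quit;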

  18. Proc Corr Output

     Pearson Correlation Coefficients, N = 235
     Prob > |r| under H0: Rho=0

                bmiperc    anxiety      image
     bmiperc    1.00000    0.42132    0.67569
                            <.0001     <.0001
     anxiety    0.42132    1.00000    0.33633
                 <.0001                <.0001
     image      0.67569    0.33633    1.00000
                 <.0001     <.0001

  19. Proc Reg Output

                                 Sum of          Mean
     Source             DF      Squares        Square    F Value    Pr > F
     Model               3   7291.63332    2430.54444      28.78    <.0001
     Error             231        19507      84.44805
     Corrected Total   234        26799

     Root MSE        9.18956    R-Square     0.2721
     Dependent Mean 80.18298    Adj R-Sq     0.2626
     Coeff Var      11.46074

                          Parameter Estimates
                     Parameter      Standard
     Variable   DF    Estimate         Error    t Value    Pr > |t|
     Intercept   1    39.18791       8.53866       4.59      <.0001
     bmiperc     1     0.26428       0.03998       6.61      <.0001
     anxiety     1    -0.09313       0.06615      -1.41      0.1605
     image       1     0.04590       0.08563       0.54      0.5924

  20. Collinearity Diagnostics

                                     Variance
     Variable   DF    Tolerance     Inflation
     Intercept   1            .             0
     bmiperc     1      0.50098       1.99609
     anxiety     1      0.81758       1.22313
     image       1      0.54020       1.85116

                                 Condition
     Number      Eigenvalue          Index
     1              3.98060        1.00000
     2              0.00943       20.54641
     3              0.00799       22.32086
     4              0.00198       44.82460

     ----------------- Proportion of Variation -----------------
     Number    Intercept     bmiperc     anxiety       image
     1        0.00030972  0.00052415  0.00072670  0.00019231
     2           0.01865     0.41736     0.56626     0.01061
     3           0.26238     0.19662     0.41635     0.03812
     4           0.71866     0.38550     0.01666     0.95108

     • All VIFs are now below 2, and no dimension has two or more predictors with variance proportions of .50 or more. The combined model no longer shows the collinearity seen with bmi and percent.

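  21. Testing Interactions with Regression

     data d1;
       input id sex $ hiredat $ educ title $ salary startsal jobtime prevexp minority $;
       if sex = "Male" then gender = 1;
       if sex = "Female" then gender = 2;
       /* product term for the gender x previous-experience interaction */
       inter = gender*prevexp;
       /* the data lines (not shown on the slide) go between CARDS; and the closing semicolon */
       cards;
     ;
     run;

     /* model without the interaction */
     proc reg data=d1;
       model salary = gender prevexp;
     run;

     /* model with the interaction term added */
     proc reg data=d1;
       model salary = gender prevexp inter;
     run;
     quit;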

  22. Proc Reg Output w/out interaction

                                 Sum of          Mean
     Source             DF      Squares        Square    F Value    Pr > F
     Model               2  32095090228   16047545114      71.43    <.0001
     Error             471  1.058214E11     224673896
     Corrected Total   473  1.379165E11

     Root MSE          14989    R-Square     0.2327
     Dependent Mean    34420    Adj R-Sq     0.2295
     Coeff Var      43.54827

                          Parameter Estimates
                     Parameter      Standard
     Variable   DF    Estimate         Error    t Value    Pr > |t|
     Intercept   1       61063    2340.43528      26.09      <.0001
     prevexp     1   -28.80629       6.68120      -4.31      <.0001
     gender      1      -16406    1401.56098     -11.71      <.0001

  23. Proc Reg Output w/ interaction

                                 Sum of          Mean
     Source             DF      Squares        Square    F Value    Pr > F
     Model               3  32501255237   10833751746      48.30    <.0001
     Error             470  1.054152E11     224287745
     Corrected Total   473  1.379165E11

     Root MSE          14976    R-Square     0.2357
     Dependent Mean    34420    Adj R-Sq     0.2308
     Coeff Var      43.51083

                          Parameter Estimates
                     Parameter      Standard
     Variable   DF    Estimate         Error    t Value    Pr > |t|
     Intercept   1       63525    2969.20179      21.39      <.0001
     prevexp     1   -54.37887      20.14155      -2.70      0.0072
     gender      1      -18074    1870.07488      -9.66      <.0001
     inter       1    18.45578      13.71463       1.35      0.1790

     • The interaction term is not significant (t = 1.35, p = .1790), and R-square rises only from 0.2327 to 0.2357 when it is added.
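The test of the interaction can also be read as an F test on the change in the model sum of squares when inter is added. With one added term, this F equals the square of the t value for inter. Below is a short Python check that uses the sums of squares from the two outputs above. The variable names are illustrative.

```python
from math import sqrt

# Model sums of squares from the two PROC REG runs above.
ss_model_without = 32_095_090_228   # salary = gender prevexp
ss_model_with = 32_501_255_237      # salary = gender prevexp inter
mse_with = 224_287_745              # error mean square of the model with inter (470 df)
terms_added = 1                     # only the product term is added

# F for the change: (increase in model SS / terms added) / MSE of the larger model
f_change = ((ss_model_with - ss_model_without) / terms_added) / mse_with
print(f"F(1, 470) = {f_change:.3f}")
print(f"sqrt(F)   = {sqrt(f_change):.3f}")  # compare with t = 1.35 for inter
```

Because F = t^2 here, this is the same test as the t test for inter, with the same p value (.1790).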

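  24. If the interaction is significant, run proc corr and proc gplot by group
     • In this example the interaction is not significant (p = .1790), so the plot and by-group correlations are shown only to illustrate the follow-up.

     data d1;
       input id sex $ hiredat $ educ title $ salary startsal jobtime prevexp minority $;
       if sex = "Male" then gender = 1;
       if sex = "Female" then gender = 2;
       inter = gender*prevexp;
       /* the data lines (not shown on the slide) go between CARDS; and the closing semicolon */
       cards;
     ;
     run;

     /* interpol=rl draws a linear regression line for each gender group */
     symbol1 color=blue interpol=rl value=none;
     symbol2 color=black interpol=rl value=none;

     /* BY-group processing in PROC CORR requires the data sorted by gender */
     proc sort data=d1;
       by gender;
     run;

     proc gplot data=d1;
       plot salary*prevexp=gender;
     run;
     quit;

     proc corr data=d1;
       var salary prevexp;
       by gender;
     run;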
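Gender is coded 1 (Male) and 2 (Female). Under that coding, the interaction model from slide 23 implies one salary-on-prevexp line per gender. The intercept is b0 + b_gender*g and the slope is b_prevexp + b_inter*g. The model contains gender, prevexp, and their product, which gives each group its own intercept and slope. These implied lines should therefore match the separate per-group regression lines in the plot. Below is a small Python sketch using the slide 23 estimates. The names are illustrative, and the estimates are the rounded values SAS printed.

```python
# Estimates from the model with the interaction (slide 23).
b0, b_gender, b_prevexp, b_inter = 63525.0, -18074.0, -54.37887, 18.45578


def implied_line(g):
    """Intercept and prevexp slope implied for gender code g (1 = Male, 2 = Female)."""
    intercept = b0 + b_gender * g
    slope = b_prevexp + b_inter * g
    return intercept, slope


lines = {label: implied_line(g) for g, label in [(1, "Male"), (2, "Female")]}
for label, (a, b) in lines.items():
    print(f"{label:6s}: salary = {a:.0f} + ({b:.5f}) * prevexp")
```

The two slopes differ by exactly b_inter (18.46 per unit of the gender code). That slope difference is what the interaction term tests.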

  25. Correlations by gender

     ------------------------- gender=1 -------------------------
     Pearson Correlation Coefficients, N = 258
     Prob > |r| under H0: Rho=0

                  salary    prevexp
     salary      1.00000   -0.20208
                              0.0011
     prevexp    -0.20208    1.00000
                  0.0011

     ------------------------- gender=2 -------------------------
     Pearson Correlation Coefficients, N = 216
     Prob > |r| under H0: Rho=0

                  salary    prevexp
     salary      1.00000   -0.21958
                              0.0012
     prevexp    -0.21958    1.00000
                  0.0012

  26. In Class Examples
     • Download data8lab. Compute a partial correlation between iq and age controlling for knwldge.
     • Download data8lab. Regress iq on knwldge and age. Then run the regression again, including the interaction term of knwldge and age.
     • Download dataset assign10.txt and check for multicollinearity.
