Psych 5510/6510

Presentation Transcript


  1. Psych 5510/6510 Chapter Eight--Multiple Regression: Models with Multiple Continuous Predictors Part 2: Testing the Addition of One Parameter at a Time Spring, 2009

  2. Overall Test In part one we looked at the overall test of the parameters in Model A. Model C: Ŷi = β0 (where β0 = μY), PC = 1. Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xip-1, PA = p.

  3. Disadvantages The disadvantages of this overall test are: • If some of the parameters in A are worthwhile and some are not, the PRE per parameter added may not be very impressive, with the weaker parameters washing out the effects of the stronger ones. • As with the overall F test in ANOVA, our alternative hypothesis is very vague, namely that at least one of β1 through βp-1 does not equal 0. If Model A is worthwhile overall, we don’t know which of its individual parameters contributed to that worthwhileness.

  4. One parameter test It is usually more interesting to test adding one parameter at a time (PA-PC=1) to our model. Model C: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xip-1 Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xip-1 + βpXip H0: βp = 0 HA: βp ≠ 0
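The one-parameter comparison can be made concrete with a small numeric sketch. It assumes the usual model-comparison definitions, PRE = (SSE_C − SSE_A)/SSE_C and F* = (PRE/(PA − PC)) / ((1 − PRE)/(n − PA)); the SSE values, n, and the function names are hypothetical and chosen only for illustration.

```python
def pre(sse_c, sse_a):
    """Proportional reduction in error when moving from Model C to Model A."""
    return (sse_c - sse_a) / sse_c


def f_star(pre_value, pa, pc, n):
    """F* for the comparison: PRE per added parameter divided by the
    remaining proportion of error per remaining degree of freedom."""
    return (pre_value / (pa - pc)) / ((1.0 - pre_value) / (n - pa))


# Hypothetical values: Model C has 3 parameters, Model A adds one more.
sse_c, sse_a = 200.0, 150.0
n, pc, pa = 24, 3, 4

pre_value = pre(sse_c, sse_a)            # share of Model C's error removed
f_value = f_star(pre_value, pa, pc, n)   # compared against F with 1 and n - PA df
print(pre_value, f_value)
```

Because PA − PC = 1 here, the numerator of F* is simply the PRE of the single added parameter.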

  5. Model C: Ŷi=β0+ β1Xi1+ β2Xi2+ ...+βp-1Xip-1 Model A: Ŷi=β0+ β1Xi1+ β2Xi2+ ...+βp-1Xip-1+ βpXip The values of β1 through βp-1 will probably change when βpXip is added to the model (as we will see, this is due to redundancy among the predictor variables). Remember our subscripting (useful when the situation is not clear from the context): β4.123 (i.e. the value of β4 when 1,2, and 3 are included in the model)

  6. Redundancy When we use more than one predictor variable in our model then an important issue arises; specifically, to what degree are the predictor variables redundant (i.e. share information). For example, using both a child’s age and a child’s height to predict their weight is somewhat redundant, as there is a relationship between their height and age. Please review the Venn diagrams on redundancy from Part 1.

  7. Redundancy Thus two or more predictor variables are redundant to the degree to which they are correlated. Let’s say we are going to add another predictor variable Xp to the model below: Model C: Ŷi = β0 + β1Xi1 +β2Xi2 + β3Xi3 and we want to know how redundant Xp may be with the X variables that are already in the model. Well, we know how to determine that...

  8. Measuring the Redundancy of Xp with X1, X2, and X3 Yes indeed, we know how to measure the relationship between Xp and X1, X2, and X3: treat Xp as the variable being predicted. The R² (i.e. PRE) of moving from a Model C that predicts Xp from its mean alone to a Model A that predicts Xp from X1, X2, and X3 is the measure of the redundancy between Xp and X1, X2, and X3.

  9. Redundancy We can measure the redundancy between the variable we are going to add (Xp) and those variables already in the model (X1 through Xp-1) by seeing how well those already added variables can predict the value of Xp. To do this we will regress Xp on variables X1 through Xp-1, and look at the resulting PRE. The PRE for regressing Xp on variables X1 through Xp-1 is symbolized as R²p, which is a shorter version of the full symbol, R²p.123…p-1

  10. Tolerance Conversely, tolerance (1 − R²p) is a measure of how unique a variable is compared to the other predictor variables already in the model. If tolerance is low, then the variable is redundant and can add little to the model; if tolerance is high, then the variable is not very redundant and thus has the ability to add significantly to the model (if it is correlated with Y, of course). (For a pictorial representation of these ideas see the handout on ‘Tolerance’.)
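As a rough illustration, tolerance can be computed by regressing the new predictor on the predictors already in the model and subtracting the resulting R²p from 1, which is the usual definition of tolerance. The sketch below reuses the age and height example from the redundancy slide with a single existing predictor, so R²p reduces to a squared simple correlation; the data values and function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def r_squared_simple(x, y):
    """PRE (R squared) from regressing y on the single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical children: age in years (already in the model) and
# height in cm (the variable we are thinking of adding).
age = [4, 5, 6, 7, 8, 9, 10, 11]
height = [101, 109, 114, 122, 126, 134, 137, 145]

r2_p = r_squared_simple(age, height)   # how well age predicts height
tolerance = 1 - r2_p                   # what is unique to height
print(round(r2_p, 4), round(tolerance, 4))
```

In this made-up sample age predicts height almost perfectly, so the tolerance of height is close to 0 and it could add very little beyond age.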

  11. Confidence Intervals

  12. Low Tolerance The formula for the confidence interval of β includes tolerance in its denominator (look back at that formula). If tolerance is low, then the confidence interval of β is wide (and thus rejecting β=0 becomes unlikely). If tolerance is very low, the confidence interval for β becomes huge, meaning that we become increasingly unable to determine the true value of β, and the numerical accuracy of some computations begins to drop. Because of this, some statistical programs issue a warning message when tolerance is below .01 (or .001).
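To see why a low tolerance widens the interval, here is a minimal sketch using the standard expression for the standard error of a partial regression coefficient, sqrt(MSE / (SS_X × tolerance)), where SS_X is the sum of squared deviations of that predictor from its mean. Whether this matches the exact layout of the formula on the confidence-interval slide is an assumption; the MSE and SS_X values and the function name are hypothetical.

```python
import math


def se_of_b(mse, ss_x, tolerance):
    """Standard error of a partial regression coefficient.

    mse       : mean squared error of Model A
    ss_x      : sum of squared deviations of the predictor from its mean
    tolerance : 1 - R squared of that predictor regressed on the others
    """
    return math.sqrt(mse / (ss_x * tolerance))


# Hypothetical MSE and SS_X, held fixed while tolerance shrinks.
mse, ss_x = 4.0, 100.0
se_by_tolerance = {tol: se_of_b(mse, ss_x, tol) for tol in (1.0, 0.25, 0.01)}

for tol, se in se_by_tolerance.items():
    print(f"tolerance={tol:<5} standard error={se:.2f}")
```

The confidence interval's half-width is proportional to this standard error, so with everything else held fixed it grows in proportion to 1/sqrt(tolerance).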

  13. Variance Inflation Factor Because a low tolerance makes the confidence interval wider, some programs report the variance inflation factor (VIF), which is the inverse of the tolerance (VIF = 1/tolerance).

  14. Back to the One Parameter Test We are looking at the PRE of adding one new predictor variable to our model: Model C: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xip-1 Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xip-1 + βpXip H0: η² = 0 HA: η² > 0, or equivalently, H0: βp = 0 HA: βp ≠ 0

  15. Statistical Significance SPSS makes this easy: simply regress Y on the variables of Model A. For each β in the model SPSS provides its confidence interval, and the values of ‘t’ and ‘p’ for the test of whether that β = 0. Not only do we get the information needed to decide whether it is worthwhile to add Xp to a model that contains the other variables, we get the same information about adding each variable last to a model that contains the other variables…

  16. Significance (cont.) …for each variable SPSS gives us the PRE for adding that variable to a model containing all of the other variables, and tells us whether or not the β that goes with the variable differs from zero. So in addition to testing βp, similar information is provided for β1 (see below) and all the other β’s: Model C: Ŷi = β0 + β2Xi2 + ... + βp-1Xip-1 + βpXip Model A: Ŷi = β0 + β2Xi2 + ... + βp-1Xip-1 + βpXip + β1Xi1 H0: β1 = 0 HA: β1 ≠ 0 And so on for each β.
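For a single added parameter, the t reported for a coefficient and the PRE of adding that variable last carry the same information, since F* = t² with 1 and n − PA degrees of freedom. A minimal sketch of the conversion, assuming that standard identity; the function name and the numbers are hypothetical and are not taken from the SPSS printout.

```python
def pre_from_t(t, df_error):
    """PRE of adding one parameter, recovered from its t statistic.

    df_error is n - PA, the error degrees of freedom of Model A.
    """
    f_value = t ** 2                       # one-parameter test: F* = t squared
    return f_value / (f_value + df_error)  # rearranged F* = PRE*df / (1 - PRE)


example_pre = pre_from_t(t=2.0, df_error=12)   # 4 / (4 + 12)
print(example_pre)
```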

  17. Coefficient of Partial Determination The PRE from adding a new predictor variable to a model that already contains predictor variables is called the ‘coefficient of partial determination’. It is symbolized as r²Yp.123…p-1 (the PRE of adding variable Xp to the model of Y when variables X1-Xp-1 are already included). See the handout on ‘Partial Correlations’.

  18. Partial Correlation Coefficient The square root of the coefficient of partial determination is called the ‘partial correlation coefficient’. It is symbolized as rYp.123…p-1 It represents the correlation between Y and Xp when the influences of the other predictor variables have been removed from both Y and Xp.

  19. More Descriptions of ‘Partial Correlation Coefficient’ It is the correlation between Y and Xp when the other predictor variables are ‘held constant’. It is the correlation between Y and Xp for people who have identical scores on the other predictor variables.

  20. Part Correlations Another correlation sometimes examined (but not in our approach) is called the ‘part’ or ‘semipartial’ correlation. In this correlation the influence of the other predictor variables (X1-Xp-1) is only removed from Xp, rather than from both Xp and Y (see the handout on Partial Correlations).
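With only one other predictor in the model, the difference between the partial and the part (semipartial) correlation can be sketched directly from the three zero-order correlations, using the standard textbook formulas. The correlation values below are hypothetical, and the function names are illustrative.

```python
import math


def partial_r(r_yp, r_y1, r_p1):
    """Correlation of Y and Xp with X1 removed from both Y and Xp."""
    return (r_yp - r_y1 * r_p1) / math.sqrt((1 - r_y1 ** 2) * (1 - r_p1 ** 2))


def semipartial_r(r_yp, r_y1, r_p1):
    """Correlation of Y and Xp with X1 removed from Xp only."""
    return (r_yp - r_y1 * r_p1) / math.sqrt(1 - r_p1 ** 2)


# Hypothetical zero-order correlations: Y with Xp, Y with X1, Xp with X1.
r_yp, r_y1, r_p1 = 0.5, 0.3, 0.4

pr = partial_r(r_yp, r_y1, r_p1)
sr = semipartial_r(r_yp, r_y1, r_p1)
print(round(pr, 3), round(sr, 3))
```

The two share a numerator; the partial correlation divides by a smaller quantity because the variance X1 explains in Y has also been set aside, so its absolute value is never smaller than that of the part correlation.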

  21. Partial This and Partial That We have three ‘partial’ terms: • Partial regression coefficient: the value of β (or equivalently est. β = b) that goes with a particular predictor variable. • Partial correlation coefficient: the correlation between Y and a particular predictor variable after the influence of the other predictor variables has been removed from both Y and that variable. • Partial coefficient of determination: the PRE of adding a particular predictor variable to a model that already contains the other predictor variables. It is the (partial correlation coefficient)². Now let’s see how the terms connect.

  22. Back to Our Example Dependent Variable: GPA Predictor Variables: • HS_Rank • SAT_V • SAT_M Let’s look at the various ‘partial’ values that go with the predictor variable SAT_M.

  23. The ‘Partial’ Plot • Use the other predictor variables (HS rank and SAT_V) to predict Yi. • Compute the error of those predictions (i.e. create a variable consisting of Yi – Ŷi). This is a variable of residuals (showing how much the actual Y scores vary from what HS rank and SAT_V can predict). Name this variable Yresiduals.

  24. The ‘Partial’ Plot (cont.) • Use the other predictor variables (HS rank and SAT_V) to predict SAT_M. • Compute the error of those predictions (i.e. create a variable consisting of SAT_Mi actual – SAT_Mi predicted scores). This is a variable of residuals (showing how much the actual SAT_M scores vary from what HS rank and SAT_V can predict). Name this variable SAT_Mresiduals.

  25. The ‘Partial’ Plot (cont.) • Now graph the scatter plot of Yresiduals and SAT_Mresiduals. This is the relationship between Y and SAT_M after the influence of the other predictor variables has been removed (from both of them). This is equivalent to saying the relationship between Y and SAT_M when the values of the other variables are held constant.

  26. The ‘Partial’ Plot (cont.)

  27. The ‘Partial’ Plot (cont.) The partial regression coefficient is the slope of that regression line. The partial correlation coefficient is the correlation shown in the plot (the correlation between the Yresiduals and SAT_Mresiduals). The partial coefficient of determination is the r² of that correlation (how much we gain by using the regression line rather than the mean of the Yresiduals scores to predict the Yresiduals scores). Note that the mean of the Yresiduals scores =0.
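The steps of the ‘partial’ plot can be sketched in a few lines. To keep the arithmetic closed-form, this version assumes only one other predictor (SAT_V) instead of the two used in the slides, and the scores are hypothetical rather than the course data; the helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def sp(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def residuals(y, x):
    """Errors left after predicting y from x with a simple regression."""
    slope = sp(x, y) / sp(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical scores for eight students (not the course data set).
gpa = [2.1, 2.8, 2.5, 3.0, 3.4, 3.1, 3.8, 3.6]
sat_v = [45, 50, 48, 55, 60, 52, 65, 58]
sat_m = [48, 47, 55, 52, 58, 60, 62, 66]

y_residuals = residuals(gpa, sat_v)        # GPA with SAT_V removed
sat_m_residuals = residuals(sat_m, sat_v)  # SAT_M with SAT_V removed

# Slope, correlation, and PRE in the residual ("partial") plot.
plot_slope = sp(sat_m_residuals, y_residuals) / sp(sat_m_residuals, sat_m_residuals)
partial_r = sp(sat_m_residuals, y_residuals) / (
    sp(sat_m_residuals, sat_m_residuals) * sp(y_residuals, y_residuals)) ** 0.5
partial_pre = partial_r ** 2

# Coefficient of SAT_M when GPA is regressed on SAT_V and SAT_M together,
# from the closed-form least-squares solution for two predictors.
b_sat_m = (sp(sat_m, gpa) * sp(sat_v, sat_v) - sp(sat_v, gpa) * sp(sat_v, sat_m)) / (
    sp(sat_v, sat_v) * sp(sat_m, sat_m) - sp(sat_v, sat_m) ** 2)

print(plot_slope, b_sat_m, partial_r, partial_pre)
```

The slope fitted in the residual plot and the SAT_M coefficient from the two-predictor regression agree up to rounding error, which is the sense in which the partial regression coefficient is the slope of that plot.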

  28. Back to Our Example (again) Dependent Variable: GPA Predictor Variables: • HS_Rank • SAT_V • SAT_M See SPSS printout.

  29. Test of Worthwhileness of Overall Model Y=GPA Model C: Ŷi = β0(where β0 is μY) Model A: Ŷi = β0 + β1 (HSRanki ) + β2(SAT_Vi) + β3(SAT_Mi) PRE=.220 F*=38.544 p<.001 est. η²=.214

  30. A Look at Each Predictor Variable We will now examine each predictor variable individually, looking at the analysis of adding each variable last to a model that already contains the other predictor variables.

  31. HSRank: Analysis Model C: Ŷi = β0 + β2(SAT_Vi) + β3(SAT_Mi) Model A: Ŷi = β0 + β2(SAT_Vi) + β3(SAT_Mi) + β1(HSRanki) From SPSS: Ŷi = -1.739 + .027(HSRanki) + .011(SAT_Vi) + .022(SAT_Mi) • b1=.027, test to determine whether β1 ≠ 0: t=8.3, p<.001 • Partial correlation between HS rank and GPA (i.e. the correlation between those two variables when the other predictor variables are held constant, so that their influences have been removed from both HS rank and GPA): 0.38 • PRE of adding HS rank to the model (i.e. moving from Model C to Model A): 0.38²=0.14 (p<.001, the same p as for the test of β1 in the first bullet). Extra parameter of Model A worthwhile.

  32. HSRank: Residual Plot The relationship between HSRank and GPA with the other variables held constant. The slope of the regression line is .027 (i.e. b1), the correlation between HSRank and GPA in this plot is .38 (i.e. the partial correlation), and the PRE of using HSRank to predict GPA is .38²=0.14

  33. SAT_V Model C: Ŷi = β0 + β1(HSRanki) + β3(SAT_Mi) Model A: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi) + β3(SAT_Mi) As before: Ŷi = -1.739 + .027(HSRanki) + .011(SAT_Vi) + .022(SAT_Mi) • b2=.011, test to determine whether β2 ≠ 0: t=2.5, p=.011 • Partial correlation between SAT_V and GPA (i.e. the correlation between those two variables when the other predictor variables are held constant, so that their influences have been removed from both SAT_V and GPA): 0.126 • PRE of adding SAT_V to the model (i.e. moving from Model C to Model A): .126²=0.016 (p=.011, the same p as for the test of β2 in the first bullet). Extra parameter of Model A worthwhile.

  34. SAT_V: Residual Plot The relationship between SAT_V and GPA with the other variables held constant. The slope of the regression line is .011 (i.e. b2), the correlation between SAT_V and GPA in this plot is .126 (i.e. the partial correlation), and the PRE of using SAT_V to predict GPA is .126²=0.016

  35. SAT_M Model C: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi) Model A: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi) + β3(SAT_Mi) As before: Ŷi = -1.739 + .027(HSRanki) + .011(SAT_Vi) + .022(SAT_Mi) • b3=.022, test to determine whether β3 ≠ 0: t=4.5, p<.001 • Partial correlation between SAT_M and GPA (i.e. the correlation between those two variables when the other predictor variables are held constant, so that their influences have been removed from both SAT_M and GPA): 0.216 • PRE of adding SAT_M to the model (i.e. moving from Model C to Model A): .216²=0.047 (p<.001, the same p as for the test of β3 in the first bullet). Extra parameter of Model A worthwhile.

  36. SAT_M: Residual Plot The relationship between SAT_M and GPA with the other variables held constant. The slope of the regression line is .022 (i.e. b3), the correlation between SAT_M and GPA in this plot is .216 (i.e. the partial correlation), and the PRE of using SAT_M to predict GPA is .216²=0.047
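As a quick arithmetic check, the PRE reported for adding each variable last is just the square of its partial correlation. The sketch below uses the three partial correlations quoted in the slides; the variable names are illustrative.

```python
# Partial correlations with GPA as quoted in the slides.
reported_partial_r = {"HSRank": 0.38, "SAT_V": 0.126, "SAT_M": 0.216}

# Coefficient of partial determination (PRE) = squared partial correlation.
partial_pre = {name: round(r ** 2, 3) for name, r in reported_partial_r.items()}

for name, value in partial_pre.items():
    print(f"{name}: PRE of adding it last = {value}")
```

To three decimals this gives .144, .016, and .047; the slides report the first value to two decimals as 0.14.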

  37. Tolerances HSRank=.995, SAT_V=.893, SAT_M=.890. The values of the tolerances show that the predictor variables were not very redundant, leaving each with the opportunity to significantly add to the model if their correlation with Y is high.
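These tolerances can be turned back into the quantities defined earlier: the R² of each predictor regressed on the other two (1 − tolerance) and the variance inflation factor (1/tolerance). A small sketch using the three reported values; the dictionary names are illustrative.

```python
# Tolerances as reported for the GPA example.
tolerances = {"HSRank": 0.995, "SAT_V": 0.893, "SAT_M": 0.890}

# Redundancy with the other predictors, and variance inflation factor.
r2_with_others = {name: 1 - tol for name, tol in tolerances.items()}
vif = {name: 1 / tol for name, tol in tolerances.items()}

for name in tolerances:
    print(f"{name}: R2 with others = {r2_with_others[name]:.3f}, VIF = {vif[name]:.3f}")
```

All three VIF values come out only slightly above 1, which is another way of saying that redundancy barely widens the confidence intervals in this example.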

  38. Another Example We are interested in the relationship between unemployment (UN) and industrial production (IP). We expect there to be a negative correlation between the two (the higher the industrial production that year the lower the unemployment, and vice versa).

  39. Data

  40. MODELS • Y=UN • X=IP • MODEL C: Ŷi=β0 =2.82 • MODEL A: Ŷi=β0+ β1 Xi =-.035+.021(Xi) • PRE=.098, p=.379 • Not only do we not reject H0, but the slope was unexpectedly a positive value!

  41. Scatter Plot UN and IP

  42. Bringing ‘Year’ into the Model • Let’s take a look at the relationship between unemployment and year (using the ‘year codes’ of 1 through 10). • Y=UN • X=Year • MODEL C: Ŷi=β0 =2.82 • MODEL A: Ŷi=β0+ β1 Xi =1.67+.21(Xi) • PRE=.428, p=.04 • We reject H0: it is worthwhile to add year to the model (compared to using just the mean).

  43. Scatter Plot UN and Year

  44. Year and IP • Let’s see year’s ability to predict industrial production (IP). • Y=IP • X=Year • MODEL C: Ŷi=β0 =138 • MODEL A: Ŷi=β0+ β1 Xi =114+4.75(Xi) • PRE=.821, p<.001 • We reject H0, year is also good for predicting industrial production.

  45. Year as a Suppressor Variable Perhaps the variable ‘year’ is having a large effect on both unemployment (UN) and industrial production (IP), and is thus masking the relationship between UN and IP. If this is true then year would be called a suppressor variable.

  46. Residuals Let’s take a look at the relationship between unemployment and industrial production when the effects of year are removed from both variables.

  47. Residuals from Using Year to Predict UN and IP

  48. Scatterplot of Residuals

  49. UN and IP Residuals • Partial correlation coefficient: -0.88 • PRE=.77 p=.002 • So there is a negative correlation between UN and IP once the effect of year has been taken out of both UN and IP.
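The masking effect of year can be reproduced with made-up numbers. In the sketch below both series are constructed to trend upward with year while moving in opposite directions around that trend; none of the values are the actual UN and IP data, and the helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def sp(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def corr(a, b):
    return sp(a, b) / (sp(a, a) * sp(b, b)) ** 0.5


def residuals(y, x):
    """Errors left after predicting y from x with a simple regression."""
    slope = sp(x, y) / sp(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical series: year codes 1..10, a shared upward trend, and a
# year-to-year wiggle that pushes IP up exactly when it pushes UN down.
year = list(range(1, 11))
wiggle = [2, -2, 2, -2, 2, -2, 2, -2, 2, -2]
noise = [0.1, -0.05, 0.0, 0.05, -0.1, 0.1, -0.05, 0.0, 0.05, -0.1]
ip = [100 + 5 * t + w for t, w in zip(year, wiggle)]
un = [2 + 0.2 * t - 0.1 * w + e for t, w, e in zip(year, wiggle, noise)]

zero_order_r = corr(un, ip)                                  # year left in
partial_r = corr(residuals(un, year), residuals(ip, year))   # year removed
print(round(zero_order_r, 2), round(partial_r, 2))
```

With these constructed series the zero-order correlation is positive because both variables rise over time, while the correlation between the two sets of residuals is negative, the same pattern as in the UN and IP example.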

  50. SPSS Output We don’t need to actually compute the residuals of using year to predict unemployment, and then using year to predict industrial production, to find the relationship between unemployment and industrial production after the effect of year on both variables has been incorporated into the model. SPSS gives us all that in its computation of the partial regression coefficient and the partial correlation coefficient. See the handout from the course web site.
