1 / 27

Anareg Week 10

Anareg Week 10. Multicollinearity Interesting special cases Polynomial regression. Multicollinearity. Numerical analysis problem is that the matrix X’X is close to singular and is therefore difficult to invert accurately

Download Presentation

Anareg Week 10

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Anareg Week 10 • Multicollinearity • Interesting special cases • Polynomial regression

  2. Multicollinearity • Numerical analysis problem is that the matrix X’X is close to singular and is therefore difficult to invert accurately • Statistical problem is that there is too much correlation among the explanatory variables and it is therefore difficult to determine the regression coefficients

  3. Multicollinearity (2) • Solve the statistical problem and the numerical problem will also be solved • The statistical problem is more serious than the numerical problem • We want to refine a model that has redundancy in the explanatory variables even if X’X can be inverted without difficulty

  4. Multicollinearity (3) • Extremes cases can help us to understand the problem • if all X’s are uncorrelated, Type I SS and Type II SS will be the same, i.e, the contribution of each explanatory variable to the model will be the same whether or not the other explanatory variables are in the model • if there is a linear combination of the explanatory variables that is a constant (e.g. X1 = X2 (X1 - X2 = 0)), then the Type II SS for the X’s involved will be zero

  5. An example Y = gpa X1 = hsm X3 = hss X4 = hse X5 = satm X6 = satv X7 = genderm; Define: sat=satm+satv; We will regress Y on sat satm and satv;

  6. Output Source DF Model 2 Error 221 Corrected Total 223 • Something is wrong • dfM=2 but there are 3 Xs

  7. Output (2) NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

  8. Output (3) NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown. satv = sat - satm

  9. Output (4) Par St Var DF Est Err t P Int 1 1.28 0.37 3.43 0.0007 sat B -0.00 0.00 -0.04 0.9684 satm B 0.00 0.00 2.10 0.0365 satv 0 0 . . .

  10. Extent of multicollinearity • Our CS example had one explanatory variable equal to a linear combination of other explanatory variables • This is the most extreme case of multicollinearity and is detected by statistical software because (X’X) does not have an inverse • We are concerned with cases less extreme

  11. Effects of multicollinearity • Regression coefficients are not well estimated and may be meaningless • Similarly for standard errors of these estimates • Type I SS and Type II SS will differ • R2 and predicted values are usually ok

  12. Two separate problems • Numerical accuracy • (X’X) is difficult to invert • Need good software • Statistical problem • Results are difficult to interpret • Need a better model

  13. Polynomial regression • We can do linear, quadratic, cubic, etc. by defining squares, cubes, etc. in a data step and using these as predictors in a multiple regression • We can do this with more than one explanatory variable • When we do this we generally create a multicollinearity problem

  14. Polynomial Regression (2) • We can remove the correlation between explanatory variables and their squares • Center (subtract the mean) before squaring • NKNW rescale by standardizing (subtract the mean and divide by the standard deviation)

  15. Interaction Models • With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable • Special cases • One indep variable – second order • One indep variable – Third order • Two cindep variables – second order

  16. One Independentvariable –Second Order • The regression model: • The mean response is a parabole and is frequently called a quadratic response function. βo reperesents the mean response of Y when x = 0 and β1 is often called the linear effect coeff while β11 is called the quadratic effect coeff.

  17. One Independentvariable –Third Order • The regression model: • The mean response is

  18. TwoIndependentvariable –Second Order • The regression model: • The mean response is the equation of a conic section. The coeff β12 is often called the interaction effect coeff.

  19. NKNW Example p 330 • Response variable is the life (in cycles) of a power cell • Explanatory variables are • Charge rate (3 levels) • Temperature (3 levels) • This is a designed experiment

  20. check the data Obs cycles chrate temp 1 150 0.6 10 2 86 1.0 10 3 49 1.4 10 4 288 0.6 20 5 157 1.0 20 6 131 1.0 20 7 184 1.0 20 8 109 1.4 20 9 279 0.6 30 10 235 1.0 30 11 224 1.4 30

  21. Create the new variables and run the regression Create new variables chrate2=chrate*chrate; temp2=temp*temp; ct=chrate*temp; Then regress cycles on chrate, temp, chrate2, temp2, and ct;

  22. a. Regression Coefficients Output

  23. Output (2) b. ANOVA Table

  24. Conclusion • We have a multicollinearity problem • Lets look at the correlations (use proc corr) • There are some very high correlations • r(chrate,chrate2) = 0.99103 • r(temp,temp2) = 0.98609

  25. A remedy • We can remove the correlation between explanatory variables and their squares • Center (subtract the mean) before squaring • NKNW rescale by standardizing (subtract the mean and divide by the standard deviation)

  26. Last slide • Read NKNW 7.6 to 7.7 and the problems on pp 317-326 • We used programs cs4.sas and NKNW302.sas to generate the output for today

  27. Last slide • Read NKNW 8.5 and Chapter 9

More Related