
The Principal Components Regression Method

David C. Garen, Ph.D., Hydrologist, USDA Natural Resources Conservation Service, National Water and Climate Center, Portland, Oregon.


Presentation Transcript


  1. The Principal Components Regression Method
     David C. Garen, Ph.D., Hydrologist, USDA Natural Resources Conservation Service, National Water and Climate Center, Portland, Oregon

  2. The General Linear Regression Model
     Y = b0 + b1 X1 + b2 X2 + ... + bn Xn
     where:
     Y = dependent variable
     Xi = independent variables
     bi = regression coefficients
     n = number of independent variables

  3. The Problem
     If the X's are intercorrelated, they contain redundant information, and the b's cannot be meaningfully estimated. However, we do not want to throw out most of the X's; we prefer to retain them for robustness.
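The problem on this slide can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not from the presentation: the synthetic data, the noise levels, and the helper name `ols_coefficients`. Two predictors track the same underlying signal, so ordinary least squares can pin down the sum of their coefficients but not each coefficient separately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 40

# Hypothetical data: two predictors that both track one underlying signal.
signal = rng.normal(size=n_obs)
x1 = signal + 0.05 * rng.normal(size=n_obs)
x2 = signal + 0.05 * rng.normal(size=n_obs)
y = 3.0 * signal + rng.normal(size=n_obs)
X = np.column_stack([x1, x2])


def ols_coefficients(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# How redundant are the predictors?
r12 = np.corrcoef(x1, x2)[0, 1]
cond = np.linalg.cond(np.corrcoef(X, rowvar=False))

# Fit on the full record and on each half of it.
coef_full = ols_coefficients(X, y)
coef_first_half = ols_coefficients(X[:20], y[:20])
coef_second_half = ols_coefficients(X[20:], y[20:])
slope_sum = coef_full[1] + coef_full[2]

print("correlation between x1 and x2:", r12)
print("condition number of correlation matrix:", cond)
print("b1, b2 (first half): ", coef_first_half[1:])
print("b1, b2 (second half):", coef_second_half[1:])
print("b1 + b2 (full record):", slope_sum)
```

Comparing the two half-record fits shows how much the individual b's move when the data change slightly, while their sum stays comparatively stable.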

  4. The Solution
     Possibilities:
     1) Pre-combine X's into composite index(es), e.g., Z-score method
     2) Principal components regression
     These are similar in concept but differ in the mathematics.

  5. Principal Components Analysis
     Principal components regression is just like standard regression except that the independent variables are principal components rather than the original X variables. Principal components are linear combinations of the X's.

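  6. Principal Components Analysis
     Each principal component is a weighted sum of all the X's:
     PC1 = e11 X1 + e12 X2 + ... + e1n Xn
     PC2 = e21 X1 + e22 X2 + ... + e2n Xn
     . . .
     PCn = en1 X1 + en2 X2 + ... + enn Xn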

  7. Principal Components Analysis
     The e's are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X's with each other.
     Principal components are new variables that are not correlated with each other.
     The principal components transformation is equivalent to a rotation of axes.
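A minimal sketch of this step, assuming synthetic data and NumPy (neither is from the presentation): the eigenvectors are taken from the correlation matrix of the X's, the principal components are the standardized X's multiplied by those eigenvectors, and the resulting components come out uncorrelated with each other.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_vars = 50, 4

# Hypothetical intercorrelated predictors: a shared signal plus noise.
common = rng.normal(size=(n_obs, 1))
X = common + 0.5 * rng.normal(size=(n_obs, n_vars))

# Standardize each column; the correlation matrix is the covariance of Z.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# eigh is for symmetric matrices and returns eigenvalues in ascending
# order, so reorder to put the largest-variance component first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each column of PC is one principal component (a weighted sum of the X's).
PC = Z @ eigvecs

# Covariance of the components: diagonal, with the eigenvalues on it.
pc_cov = np.cov(PC, rowvar=False)
print(np.round(pc_cov, 6))
```

The eigenvector matrix is orthonormal, which is why the transformation amounts to a rotation of axes. The sign of each eigenvector is arbitrary, so a component may come out negated relative to another software package.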

  8. Principal Components Analysis

  9. Principal Components Analysis
     The eigenvectors (weights) are based solely on the intercorrelations among the X's and do not depend on Y (in contrast to the Z-score method, for which the opposite is true).
     Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression.

  10. Principal Components Analysis -- Example
      Independent Variables:
      X1 – X5: Snow water equivalent at 5 stations
      X6 – X10: Water year to date precipitation at 5 stations
      X11: Antecedent streamflow
      X12: Climate teleconnection index

  11. Correlation Matrix

  12. First Five Eigenvectors

  13. Principal Components Regression Procedure
      • Try the PC's in order
      • Test for regression coefficient significance (t-test)
      • Stop at the first insignificant component
      • Transform regression coefficients to be in terms of the original variables
      • Sign test: each coefficient's sign must be the same as the sign of that variable's correlation with Y
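The procedure on this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function name `pcr_fit`, the synthetic data, and the choice to test only the newest component's t-statistic at each step are all hypothetical. The slides do not say what happens when the sign test fails, so the sketch only reports the result.

```python
import numpy as np


def pcr_fit(X, y, t_crit=1.2):
    """Hypothetical PCR sketch: add PCs in order until one fails the t-test."""
    n_obs, n_vars = X.shape
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std

    # Eigenvectors of the correlation matrix, largest eigenvalue first.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    PC = Z @ eigvecs

    n_keep = 0
    for k in range(1, n_vars + 1):
        A = np.column_stack([np.ones(n_obs), PC[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        sigma2 = resid @ resid / (n_obs - (k + 1))
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
        # Assumption: only the newly added component is tested.
        if abs(coef[k] / se[k]) < t_crit:
            break  # stop at the first insignificant component
        n_keep = k

    # Refit with the retained PCs, then express in the original variables.
    A = np.column_stack([np.ones(n_obs), PC[:, :n_keep]])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    b = (eigvecs[:, :n_keep] @ gamma[1:]) / std
    b0 = gamma[0] - b @ mean

    # Sign test: each b must share the sign of that X's correlation with Y.
    r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_vars)])
    signs_ok = bool(np.all(np.sign(b) == np.sign(r_xy)))
    return n_keep, b0, b, signs_ok


# Hypothetical data: five intercorrelated predictors driven by one signal.
rng = np.random.default_rng(2)
common = rng.normal(size=(40, 1))
X = common + 0.5 * rng.normal(size=(40, 5))
y = 10.0 * common[:, 0] + 2.0 * rng.normal(size=40)

n_keep, b0, b, signs_ok = pcr_fit(X, y)
pred = b0 + X @ b
print("components kept:", n_keep)
print("coefficients on original X's:", b, "intercept:", b0)
print("sign test passed:", signs_ok)
```

The back-transformation works because each retained component is a weighted sum of the standardized X's, so multiplying the PC coefficients by the eigenvectors and dividing by each variable's standard deviation gives coefficients that apply directly to the original X's.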

  14. Principal Components Regression Procedure
      t-test iterations for example data set (tcrit = 1.2):
      10.243
      10.105 0.622 : stop here, use only first PC
      Continuing ...
      10.225 0.629 1.235 : 3rd PC exceeds tcrit
      10.261 0.632 1.239 -1.073
      10.092 0.621 1.219 -1.055 -0.588
      11.723 0.722 1.416 -1.225 -0.683 -2.764
      11.395 0.702 1.376 -1.191 -0.664 -2.686 -0.073

  15. Principal Components Regression Procedure
      Final model for example data set (1 PC):
      Y = 2.91 X1 + 3.34 X2 + 2.44 X3 + 2.27 X4 + 2.50 X5 + 3.34 X6 + 2.69 X7 + 2.45 X8 + 2.97 X9 + 2.78 X10 + 0.55 X11 + 2.47 X12 - 79.78
      R = 0.906, JR = 0.890, SE = 62.558, JSE = 67.410

  16. Summary
      • Principal components analysis is a standard multivariate statistical procedure
      • Can be used for descriptive purposes to reduce the dimensionality of correlated variables
      • Can be taken a step further to provide new, non-correlated independent variables for regression
      • PC's taken in order, subject to t-test and sign test
      • Final model is expressed in terms of original X variables
