The Principal Components Regression Method David C. Garen, Ph.D. Hydrologist USDA Natural Resources Conservation Service National Water and Climate Center Portland, Oregon
The General Linear Regression Model

Y = b0 + b1X1 + b2X2 + . . . + bnXn

where:
Y = dependent variable
Xi = independent variables
bi = regression coefficients
n = number of independent variables
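A minimal sketch of fitting this model by ordinary least squares with NumPy; the data and variable names below are illustrative and are not from the presentation's example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 30, 3

# Illustrative data: columns of X are the independent variables X1..Xn,
# Y is the dependent variable generated from known coefficients plus noise.
X = rng.normal(size=(n_obs, n_vars))
Y = 4.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n_obs)

# Augment X with a column of ones for the intercept b0, then solve for the b's.
A = np.column_stack([np.ones(n_obs), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
print("b0..bn:", b)
```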
The Problem If the X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated. However, we don’t want to throw out most of the X’s; we prefer to retain them for robustness.
The Solution Possibilities: 1) Pre-combine X’s into composite index(es), e.g., Z-score method 2) Principal components regression These are similar in concept but differ in the mathematics.
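For orientation, here is a hedged sketch of the composite-index idea behind option 1: standardize each X and combine the standardized values into a single index before regressing Y on it. Weighting each standardized X by its correlation with Y is one plausible choice for illustration; the operational Z-score method may define the weights differently.

```python
import numpy as np

def zscore_index_regression(X, Y):
    """Regress Y on a single composite index built from standardized X's.

    The weights here are the correlations of each X with Y -- an assumption
    made for illustration, not necessarily the exact Z-score formulation.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized (Z-score) X's
    w = np.array([np.corrcoef(Z[:, j], Y)[0, 1] for j in range(Z.shape[1])])
    index = Z @ (w / np.abs(w).sum())                  # single composite index
    A = np.column_stack([np.ones(len(Y)), index])
    b, *_ = np.linalg.lstsq(A, Y, rcond=None)          # Y = b0 + b1 * index
    return b, index
```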
Principal Components Analysis Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables. Principal components are linear combinations of the X’s.
Principal Components Analysis Each principal component is a weighted sum of all the X’s:

PC1 = e11 X1 + e12 X2 + . . . + e1n Xn
PC2 = e21 X1 + e22 X2 + . . . + e2n Xn
. . .
PCn = en1 X1 + en2 X2 + . . . + enn Xn
Principal Components Analysis The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other. Principal components are new variables that are not correlated with each other. The principal components transformation is equivalent to a rotation of axes.
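A hedged sketch of this transformation in NumPy, assuming the X’s are standardized before the eigenvectors of their correlation matrix are applied (function and variable names are illustrative):

```python
import numpy as np

def principal_components(X):
    """Return (eigenvectors, PC scores), ordered by decreasing eigenvalue."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the X's
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix of the X's
    eigvals, eigvecs = np.linalg.eigh(R)               # eigenvectors = the e weights
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvecs = eigvecs[:, order]
    PC = Z @ eigvecs                                   # new, mutually uncorrelated variables
    return eigvecs, PC
```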
Principal Components Analysis The eigenvectors (weights) are based solely on the intercorrelations among the X’s and make no use of Y (in contrast to the Z-score method, whose composite weights do depend on Y). Principal components can be used for purely descriptive purposes, but here we want to use them as independent variables in a regression.
Principal Components Analysis -- Example

Independent Variables:
X1 – X5    Snow water equivalent at 5 stations
X6 – X10   Water year to date precipitation at 5 stations
X11        Antecedent streamflow
X12        Climate teleconnection index
Principal Components Regression Procedure • Try the PC’s in order • Test for regression coefficient significance (t-test) • Stop at first insignificant component • Transform regression coefficients to be in terms of original variables • Sign test – coefficient signs must be same as correlation with Y
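A hedged sketch of this stepwise selection, assuming the components come from the principal_components function sketched earlier and using the critical t value from the example that follows; the sign test and the back-transformation of coefficients are handled separately.

```python
import numpy as np

def select_components(PC, Y, t_crit=1.2):
    """Add PCs in order; stop at the first whose coefficient fails the t-test."""
    n = len(Y)
    k_keep = 0
    for k in range(1, PC.shape[1] + 1):
        A = np.column_stack([np.ones(n), PC[:, :k]])     # intercept + first k PCs
        b, *_ = np.linalg.lstsq(A, Y, rcond=None)
        resid = Y - A @ b
        sigma2 = resid @ resid / (n - A.shape[1])        # residual variance
        cov_b = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance
        t_newest = b[-1] / np.sqrt(cov_b[-1, -1])        # t statistic of newest PC
        if abs(t_newest) < t_crit:
            break                                        # first insignificant component
        k_keep = k
    return k_keep
```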
Principal Components Regression Procedure

t-test iterations for example data set (tcrit = 1.2):

1 PC:   10.243
2 PCs:  10.105   0.622          : stop here, use only first PC

Continuing ...
3 PCs:  10.225   0.629   1.235  : 3rd PC exceeds tcrit
4 PCs:  10.261   0.632   1.239  -1.073
5 PCs:  10.092   0.621   1.219  -1.055  -0.588
6 PCs:  11.723   0.722   1.416  -1.225  -0.683  -2.764
7 PCs:  11.395   0.702   1.376  -1.191  -0.664  -2.686  -0.073
Principal Components Regression Procedure

Final model for example data set (1 PC):

Y = 2.91 X1 + 3.34 X2 + 2.44 X3 + 2.27 X4 + 2.50 X5
  + 3.34 X6 + 2.69 X7 + 2.45 X8 + 2.97 X9 + 2.78 X10
  + 0.55 X11 + 2.47 X12 - 79.78

R = 0.906   JR = 0.890   SE = 62.558   JSE = 67.410
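The step that produces such a model is the back-transformation of the PC regression coefficients to the original X variables. A hedged sketch, assuming the PCs were built from standardized X’s as in the earlier sketches (names are illustrative):

```python
import numpy as np

def back_transform(a, eigvecs, k, X_mean, X_std):
    """Express Y = a0 + a1*PC1 + ... + ak*PCk in terms of the original X's.

    a       : coefficients [a0, a1, ..., ak] from the regression on the PCs
    eigvecs : eigenvector matrix (columns = components) used to build the PCs
    """
    w = eigvecs[:, :k] @ a[1:]                 # weights on the standardized X's
    b = w / X_std                              # coefficients on the original X's
    b0 = a[0] - np.sum(w * X_mean / X_std)     # intercept adjusted for standardization
    return b0, b
```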
Summary • Principal components analysis is a standard multivariate statistical procedure • Can be used for descriptive purposes to reduce the dimensionality of correlated variables • Can be taken a step further to provide new, non-correlated independent variables for regression • PC’s taken in order, subject to t-test and sign test • Final model is expressed in terms of original X variables