LECTURE 8: PRINCIPAL COMPONENT ANALYSIS (PCA). EOFs and Principal Components; Selection Rules. Supplementary Readings: Wilks, chapter 9
WE’LL START OUT WITH AN EXAMPLE: 20th CENTURY GLOBAL SURFACE TEMPERATURE RECORD
Surface Temperature Changes Climatic Research Unit (‘CRU’), University of East Anglia
EOFs for the five leading eigenvectors of the global temperature data from 1902-1980. The gridpoint areal weighting factor used in the PCA procedure has been removed from the EOFs so that relative temperature anomalies can be inferred from the patterns.
EOF #1: 12% (88%) • EOF #2: 6% (3%) • EOF #3: 5% (1%) • EOF #4: 4% (1%) • EOF #5: 3% (0.5%)
FILTERING THROUGH PCA: SURFACE TEMPERATURE RECORD FILTERED BY RETAINING THE PROJECTION ONTO THE FIRST FIVE EIGENVECTORS
GLOBAL TEMPERATURE TREND PC #1 EOF #1
EL NINO/SOUTHERN OSCILLATION (ENSO) EOF #2 PC #2 Multivariate ENSO Index (“MEI”)
NORTH ATLANTIC OSCILLATION PC #3 EOF #3
TROPICAL ATLANTIC “DIPOLE” PC #3 EOF #3
ATLANTIC MULTIDECADAL OSCILLATION PC #5 EOF #5
Recall from our earlier lecture the variance-covariance matrix A in the multivariate regression problem. The eigenvectors of A comprise an orthogonal predictor set (Principal Components Regression).
Let us return to the data matrix $X$ (assume it has zero mean). Assume M > N (overdetermined; greater number of “equations” than “unknowns”). We can write $X = U S V^T$, where U and V are unitary matrices (orthogonal matrices if X is real-valued), U is M×N, S is diagonal N×N, and V is N×N. This is the Singular Value Decomposition (SVD).
Typically, we are interested in the case N > M. A revised overdetermined problem can be obtained by redefining the problem in terms of the transpose $X^T$, which is N×M. We can then write $X^T = U S V^T$, where U and V are unitary matrices (orthogonal matrices if X is real-valued), U is N×M, S is diagonal M×M, and V is M×M. This is again the Singular Value Decomposition (SVD).
V is a unitary matrix which diagonalizes $XX^T$: since $X^T = U S V^T$, we have $XX^T = V S U^T U S V^T = V S^2 V^T$. Thus, $S^2$ contains the eigenvalues of $XX^T$. There is a mathematical equivalence between taking the Singular Value Decomposition (SVD) of X and finding the eigenvectors of $A = XX^T$.
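The equivalence between the SVD and the eigen-decomposition can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a synthetic random data matrix X with M rows (spatial points) and N columns (time steps). The sizes, the seed, and the variable names are illustrative assumptions and not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 50                          # assumed sizes: M spatial points, N time steps (N > M)
X = rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)    # remove the time mean of each series

# SVD of the transposed data matrix: X^T = U S V^T
U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
V = Vt.T                              # U is N x M, V is M x M

# Eigen-decomposition of the symmetric matrix A = X X^T
A = X @ X.T
eigvals, eigvecs = np.linalg.eigh(A)  # eigh returns ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Squared singular values should equal the eigenvalues of A,
# and V should diagonalize A.
svd_matches_eig = np.allclose(s**2, eigvals)
V_diagonalizes_A = np.allclose(V.T @ A @ V, np.diag(s**2))
print(svd_matches_eig, V_diagonalizes_A)
```

The columns of V and the eigenvectors returned by `eigh` may differ in sign, which is why the check is done on the diagonalized matrix and not on the vectors themselves.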
U contains as its columns the temporal patterns, or Principal Components (“PCs”), corresponding to the M eigenvalues; these are the “right eigenvectors” of the SVD. V contains as its columns the spatial patterns, or Empirical Orthogonal Functions (“EOFs”), corresponding to the M eigenvalues; these are the “left eigenvectors” of the SVD.
FILTERING WITH EIGENVECTORS We can filter the original data with a subset of M* eigenvectors:
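One common way to write such a filter is to keep only the first M* columns of U and V and the first M* singular values, and to multiply them back together. The sketch below illustrates this under stated assumptions: the function name `pca_filter`, the synthetic single-pattern signal, and the noise level are all hypothetical choices made for illustration, with X stored as space × time.

```python
import numpy as np

def pca_filter(X, n_keep):
    """Reconstruct X (space x time) from its n_keep leading EOF/PC pairs."""
    U, s, Vt = np.linalg.svd(X.T, full_matrices=False)   # X^T = U S V^T
    # Scale the retained PCs by their singular values, then project back
    Xt_filtered = (U[:, :n_keep] * s[:n_keep]) @ Vt[:n_keep, :]
    return Xt_filtered.T

# Hypothetical test data: one spatial pattern with a sinusoidal PC, plus noise
rng = np.random.default_rng(1)
M, N = 6, 200
t = np.arange(N)
pattern = np.linspace(-1.0, 1.0, M)
signal = np.outer(pattern, np.sin(2 * np.pi * t / 50.0))
X = signal + 0.1 * rng.standard_normal((M, N))
X -= X.mean(axis=1, keepdims=True)

X_filt = pca_filter(X, n_keep=1)
retained = np.sum(X_filt**2) / np.sum(X**2)   # fraction of variance retained
print(f"variance retained by 1 mode: {retained:.3f}")
```

Setting `n_keep` equal to M reproduces the original data, so the filter can be thought of as a truncated version of the full decomposition.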
Some Additional Considerations:
• Standardization & Areal Weighting
• Gappy Data
• Frequency domain
• “Rotation”
• Selection Rules
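The first of these considerations can be sketched in code. The example below assumes that each gridpoint series is standardized to unit variance and then weighted by the square root of the cosine of latitude, which is a common convention for equal-angle grids; the lecture does not state which weighting was used, so this choice, the function name `standardize_and_weight`, and the latitudes are assumptions. The last step divides the weights back out of the EOFs so that the patterns can be read as relative anomalies.

```python
import numpy as np

def standardize_and_weight(X, lat_deg):
    """Standardize each row of X (space x time) and apply sqrt(cos(lat)) weights."""
    anomalies = X - X.mean(axis=1, keepdims=True)
    std = anomalies.std(axis=1, ddof=1, keepdims=True)
    Z = anomalies / std                          # unit variance per gridpoint
    w = np.sqrt(np.cos(np.deg2rad(lat_deg)))     # assumed areal weighting
    return Z * w[:, None], w

rng = np.random.default_rng(2)
lat = np.array([0.0, 30.0, 60.0, 75.0])          # hypothetical gridpoint latitudes
scale = np.array([[1.0], [2.0], [3.0], [4.0]])   # unequal raw variances
X = rng.standard_normal((lat.size, 120)) * scale

Xw, w = standardize_and_weight(X, lat)

# PCA on the weighted data: X_w^T = U S V^T, columns of V are weighted EOFs
U, s, Vt = np.linalg.svd(Xw.T, full_matrices=False)
eofs_weighted = Vt.T
eofs_unweighted = eofs_weighted / w[:, None]     # remove the weights for display
```

With this weighting, the variance of each weighted series equals cos(latitude), so high-latitude gridpoints, which represent smaller areas, contribute less to the decomposition.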
SELECTION RULES How many eigenvectors do we consider significant? There is no uniquely defensible criterion...
• Eigenvalue (as a fraction of total variance) > 1/M
• Break in slope in the eigenvalue spectrum (“Scree” test) or the log-eigenvalue (“LEV”) spectrum
• Eigenvalue lies outside the expected distribution for M uncorrelated Gaussian time series of length N (Preisendorfer Rule N). This is an example of a Monte Carlo method
• Rule N’ (takes serial correlation into account)
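The Monte Carlo idea behind Rule N can be sketched as follows. This is a simplified illustration and not a definitive implementation: the function names, the number of trials, the 95th-percentile cutoff, the use of standardized data, and the synthetic two-mode test signal are all assumptions. It does not include the serial-correlation correction of Rule N’.

```python
import numpy as np

def normalized_eigenvalues(X):
    """Eigenvalue fractions of total variance for standardized X (M x N)."""
    X = X - X.mean(axis=1, keepdims=True)
    X = X / X.std(axis=1, ddof=1, keepdims=True)
    s = np.linalg.svd(X, compute_uv=False)       # singular values, descending
    lam = s**2
    return lam / lam.sum()

def rule_n(X, n_trials=500, percentile=95.0, seed=0):
    """Count leading eigenvalues that exceed a white-noise null distribution."""
    M, N = X.shape
    rng = np.random.default_rng(seed)
    null = np.empty((n_trials, min(M, N)))
    for k in range(n_trials):
        # M uncorrelated Gaussian time series of length N
        null[k] = normalized_eigenvalues(rng.standard_normal((M, N)))
    threshold = np.percentile(null, percentile, axis=0)   # one cutoff per rank
    observed = normalized_eigenvalues(X)
    significant = observed > threshold
    # Retain modes up to (not including) the first one that fails the test
    n_keep = significant.size if significant.all() else int(np.argmin(significant))
    return n_keep, observed, threshold

# Hypothetical data: two spatial patterns with periodic PCs, plus white noise
rng = np.random.default_rng(3)
M, N = 10, 200
t = np.arange(N)
mode1 = np.outer(np.ones(M), np.sin(2 * np.pi * t / 40.0))
mode2 = np.outer(np.linspace(-1.0, 1.0, M), np.cos(2 * np.pi * t / 15.0))
X = 2.0 * mode1 + 2.0 * mode2 + rng.standard_normal((M, N))

n_keep, observed, threshold = rule_n(X, n_trials=200)
print("modes retained:", n_keep)
```

Comparing `observed` against `threshold` rank by rank also gives a visual version of the test when both are plotted on the same eigenvalue spectrum.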
SELECTION RULES Preisendorfer Rule N
SELECTION RULES Asymptotic results of Preisendorfer Rule N for large sample size (N, M > 100 or so), with b = N/M
MATLAB EXAMPLE: NORTH ATLANTIC SEA LEVEL PRESSURE DATA 1899-1999