Object Orie’d Data Analysis, Last Time

Object Orie’d Data Analysis, Last Time • Finished Algebra Review • Multivariate Probability Review • PCA as an Optimization Problem (Eigen-decomp. gives rotation, easy sol’n) • Connected Mathematics & Graphics • Started Redistribution of Energy

PCA Redistribution of Energy Convenient summary of amount of structure: Total Sum of Squares Physical Interpretation: Total Energy in Data Insight comes from decomposition Statistical Terminology: ANalysis Of VAriance (ANOVA)

PCA Redist’n of Energy (Cont.) ANOVA mean decomposition: Total Variation = Mean Variation + Mean Residual Variation, i.e. $\sum_{i=1}^{n} \|X_i\|^2 = n \|\bar{X}\|^2 + \sum_{i=1}^{n} \|X_i - \bar{X}\|^2$ Mathematics: Pythagorean Theorem Intuition Quantified via Sums of Squares
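A minimal numerical check of the ANOVA mean decomposition, assuming the data are stored as a d x n matrix whose columns are the data vectors. The toy data and the function name `anova_mean_decomposition` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def anova_mean_decomposition(X):
    """Split the total SS of a d x n data matrix (columns = data vectors)."""
    n = X.shape[1]
    xbar = X.mean(axis=1, keepdims=True)     # d x 1 mean vector
    total_ss = np.sum(X ** 2)                # sum_i ||X_i||^2
    mean_ss = n * np.sum(xbar ** 2)          # n ||Xbar||^2
    resid_ss = np.sum((X - xbar) ** 2)       # sum_i ||X_i - Xbar||^2
    return total_ss, mean_ss, resid_ss

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=1.0, size=(2, 50))   # hypothetical 2-d toy data
total_ss, mean_ss, resid_ss = anova_mean_decomposition(X)
print(f"mean share: {mean_ss / total_ss:.1%}, residual share: {resid_ss / total_ss:.1%}")
```

The two shares add to 100% because the mean vector is orthogonal (in the n-dimensional sense) to the mean residuals, which is the Pythagorean Theorem at work.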

Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Residuals from Mean = Data – Mean Most of Variation = 92% is Mean Variation SS Remaining Variation = 8% is Resid. Var. SS

PCA Redist’n of Energy (Cont.) • Have already studied this decomposition (recall curve e.g.) • Variation due to Mean (% of total) • Variation of Mean Residuals (% of total)

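PCA Redist’n of Energy (Cont.) Now decompose SS about the mean: $\sum_{i=1}^{n} \|X_i - \bar{X}\|^2 = (n-1)\,\mathrm{tr}(\hat{\Sigma}) = (n-1) \sum_{j=1}^{d} \lambda_j$ where: $\hat{\Sigma}$ is the sample covar’ce matrix with eigenvalues $\lambda_j$, so Energy is expressed in trace of covar’ce matrix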

PCA Redist’n of Energy (Cont.) • Eigenvalues provide atoms of SS decomposi’n • Useful Plots are: • “Power Spectrum”: $\lambda_j$ vs. $j$ • “log Power Spectrum”: $\log \lambda_j$ vs. $j$ • “Cumulative Power Spectrum”: $\sum_{j' \le j} \lambda_{j'}$ vs. $j$ • Note PCA gives SS’s for free (as eigenvalues), • but watch factors of $n-1$
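The eigenvalue-based SS decomposition and the three spectra can be sketched as follows. This assumes a d x n data matrix with columns as data vectors and the usual n-1 divisor in the sample covariance; the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 30
# Hypothetical data: coordinate j has standard deviation j
X = rng.normal(size=(d, n)) * np.arange(1, d + 1)[:, None]

Xc = X - X.mean(axis=1, keepdims=True)       # mean residuals
cov = Xc @ Xc.T / (n - 1)                    # d x d sample covariance
eigvals = np.linalg.eigvalsh(cov)[::-1]      # eigenvalues, decreasing

power = eigvals                              # power spectrum
log_power = np.log(eigvals)                  # log power spectrum
cum_power = np.cumsum(eigvals) / eigvals.sum()   # cumulative, as a fraction

resid_ss = np.sum(Xc ** 2)                   # SS about the mean
print(resid_ss, (n - 1) * eigvals.sum())     # equal, up to the factor n-1
```

Plotting `power`, `log_power`, and `cum_power` against the index 1, ..., d gives the three displays listed on the slide.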

PCA Redist’n of Energy (Cont.) • Note, have already considered some of these Useful Plots: • Power Spectrum • Cumulative Power Spectrum

Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Revisit SS Decomposition for PC1: PC1 has “most of var’n” = 93% Reflected by good approximation in Object Space

Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Revisit SS Decomposition for PC2: PC2 has “only a little var’n” = 7% Reflected by poor approximation in Object Space

Different Views of PCA • Solves several optimization problems: • Direction to maximize SS of 1-d proj’d data • Direction to minimize SS of residuals • (same, by Pythagorean Theorem) • “Best fit line” to data in “orthogonal sense” • (vs. regression of Y on X = vertical sense • & regression of X on Y = horizontal sense) • Use one that makes sense…
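A small sketch of these equivalent views on hypothetical 2-d data (the data-generating slope and noise level are assumptions for illustration). It checks that projected SS plus residual SS equals the total SS for any direction, so the direction maximizing one minimizes the other, and it compares the orthogonal best-fit slope with the two regression slopes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + 0.3 * rng.normal(size=n)       # hypothetical toy data
Xc = np.vstack([x, y])
Xc = Xc - Xc.mean(axis=1, keepdims=True)     # 2 x n, centered

# PC1 direction: eigenvector of Xc Xc^t with the largest eigenvalue
evals, evecs = np.linalg.eigh(Xc @ Xc.T)
u1 = evecs[:, -1]

def proj_and_resid_ss(u):
    """SS of 1-d projections onto unit vector u, and SS of orthogonal residuals."""
    scores = u @ Xc
    proj_ss = np.sum(scores ** 2)
    resid_ss = np.sum((Xc - np.outer(u, scores)) ** 2)
    return proj_ss, resid_ss

total_ss = np.sum(Xc ** 2)
p1, r1 = proj_and_resid_ss(u1)

# Brute-force scan over directions on the half circle
angles = np.linspace(0.0, np.pi, 361)
dirs = np.vstack([np.cos(angles), np.sin(angles)])
proj = np.array([proj_and_resid_ss(dirs[:, k])[0] for k in range(dirs.shape[1])])
best = dirs[:, np.argmax(proj)]              # agrees with u1 up to sign

# Three "best fit" slopes, all written as dy/dx
slope_pca = u1[1] / u1[0]                            # orthogonal sense
slope_y_on_x = (Xc[0] @ Xc[1]) / (Xc[0] @ Xc[0])     # vertical sense
slope_x_on_y = (Xc[1] @ Xc[1]) / (Xc[0] @ Xc[1])     # horizontal sense
print(slope_y_on_x, slope_pca, slope_x_on_y)
```

For positively correlated data the orthogonal-fit slope falls between the two regression slopes, which is one way to see that the three fits answer different questions.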

Different Views of PCA Next Time: Add some graphics about this Scatterplot of Toy Data sets + various fits, with residuals Will be Useful in Stor 165, as well

Different Views of PCA 2-d Toy Example Feature Space Object Space • Max SS of Projected Data • Min SS of Residuals • Best Fit Line

PCA Data Representation Idea: Expand Data Matrix in terms of inner prod’ts & eigenvectors Recall notation: Eigenvalue expansion (centered data):

PCA Data Represent’n (Cont.) • Now using: • Eigenvalue expansion (raw data): • Where: • Entries of the eigenvectors are loadings • Entries of the coefficient vectors are scores

PCA Data Represent’n (Cont.) Can focus on individual data vectors: $X_i = \bar{X} + \sum_{j=1}^{d} c_{ij} u_j$, with eigenvectors $u_j$ (part of above full matrix rep’n) Terminology: the coefficients $c_{ij}$ are called “PCs” and are also called scores

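PCA Data Represent’n (Cont.) • More terminology: • Scores, $c_{ij} = u_j^t (X_i - \bar{X})$, are coefficients in eigenvalue representation: $X_i = \bar{X} + \sum_{j=1}^{d} c_{ij} u_j$ • Loadings are entries of eigenvectors: $u_j = (u_{1j}, \dots, u_{dj})^t$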
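A minimal sketch of scores and loadings, assuming a d x n data matrix with columns as data vectors. The names `U` (eigenvector matrix) and `C` (score matrix) are chosen here for illustration and need not match the slide notation.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 25
X = rng.normal(size=(d, n))                  # hypothetical data
xbar = X.mean(axis=1, keepdims=True)
Xc = X - xbar

cov = Xc @ Xc.T / (n - 1)
evals, U = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]              # sort to decreasing eigenvalues
evals, U = evals[order], U[:, order]         # entries of U[:, j] are loadings

C = U.T @ Xc                                 # scores: C[j, i] = u_j^t (X_i - xbar)
X_rebuilt = xbar + U @ C                     # full expansion recovers the data
```

Using all d terms reproduces the data exactly, and the sample variance of the j-th row of scores equals the j-th eigenvalue.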

PCA Data Represent’n (Cont.) • Reduced Rank Representation: • Reconstruct using only $k \le d$ terms • (assuming decreasing eigenvalues) • Gives: rank $k$ approximation of data • Key to PCA dimension reduction • And PCA for data compression (~ .jpeg)
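The reduced rank representation can be sketched as below, under the assumption that the data are a d x n matrix with columns as data vectors. The function name `reduced_rank` and the truncation symbol `k` are illustrative. The eigenvalues here are on the SS scale (no n-1 divisor), so the approximation error is exactly the sum of the discarded eigenvalues.

```python
import numpy as np

def reduced_rank(X, k):
    """Rank-k PCA approximation of a d x n data matrix (columns = data)."""
    xbar = X.mean(axis=1, keepdims=True)
    Xc = X - xbar
    evals, U = np.linalg.eigh(Xc @ Xc.T)     # SS-scale eigenvalues
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    Uk = U[:, :k]                            # first k eigenvectors
    Xk = xbar + Uk @ (Uk.T @ Xc)             # keep only the first k terms
    return Xk, evals

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 40))                 # hypothetical data
Xk, evals = reduced_rank(X, k=2)
err_ss = np.sum((X - Xk) ** 2)
print(err_ss, evals[2:].sum())               # SS lost = discarded eigenvalues
```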

PCA Data Represent’n (Cont.) • Choice of $k$ (number of terms kept) in Reduced Rank Represent’n: • Generally very slippery problem • SCREE plot (Kruskal 1964): • Find knee in power spectrum

PCA Data Represent’n (Cont.) • SCREE plot drawbacks: • What is a knee? • What if there are several? • Knees depend on scaling (power? log?) • Personal suggestion: • Find auxiliary cutoffs (inter-rater variation) • Use the full range (ala scale space)

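PCA Simulation • Idea: given • Mean Vector $\mu$ • Eigenvectors $u_1, \dots, u_d$ • Eigenvalues $\lambda_1, \dots, \lambda_d$ • Simulate data from corresponding Normal Distribution • Approach: Invert PCA Data Represent’n: $X_i = \mu + \sum_{j=1}^{d} \sqrt{\lambda_j}\, Z_{ij}\, u_j$ • where $Z_{ij}$ are i.i.d. $N(0,1)$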
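A sketch of PCA simulation by inverting the data representation: draw independent standard normal scores, scale them by the square roots of the eigenvalues, rotate by the eigenvectors, and add the mean. The function name `simulate_from_pca` and the particular mean, eigenvectors, and eigenvalues are hypothetical inputs chosen for illustration.

```python
import numpy as np

def simulate_from_pca(mu, U, lam, n, rng):
    """Draw n vectors from N(mu, U diag(lam) U^t); returns a d x n matrix."""
    Z = rng.standard_normal((len(lam), n))           # i.i.d. N(0,1) scores
    return mu[:, None] + U @ (np.sqrt(lam)[:, None] * Z)

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0, 0.5])                      # hypothetical mean vector
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # hypothetical orthonormal eigenvectors
lam = np.array([4.0, 1.0, 0.25])                     # hypothetical eigenvalues
X = simulate_from_pca(mu, U, lam, n=20000, rng=rng)

cov_hat = np.cov(X)                                  # rows = variables, columns = observations
cov_target = U @ np.diag(lam) @ U.T
print(np.abs(cov_hat - cov_target).max())            # small for large n
```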

Alternate PCA Computation Issue: for HDLSS data (recall $d > n$) • the $d \times d$ covariance matrix $\hat{\Sigma}$ may be quite large, • Thus slow to work with, and to compute • What about a shortcut? Approach: Singular Value Decomposition (of (centered, scaled) Data Matrix $\tilde{X}$)

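Alternate PCA Computation Singular Value Decomposition of the (centered, scaled) data matrix: $\tilde{X} = U S V^t$ Where: $U$ ($d \times d$) is unitary, $V$ ($n \times n$) is unitary, $S$ ($d \times n$) is diag’l matrix of singular val’s Assume: decreasing singular values $s_1 \ge s_2 \ge \cdots \ge 0$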

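Alternate PCA Computation Singular Value Decomposition of the (centered, scaled) data matrix: $\tilde{X} = U S V^t$ Recall Relation to Eigen-analysis of $\hat{\Sigma} = \tilde{X} \tilde{X}^t = U (S S^t) U^t$ Thus have same eigenvector matrix $U$ And eigenval’s are squares of singular val’s: $\lambda_j = s_j^2$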

Alternate PCA Computation Singular Value Decomposition, Computational advantage (for rank $r$): Use compact form, only need to find e-vec’s ($d \times r$), s-val’s ($r \times r$), scores ($r \times n$) Other components not useful So can be much faster for $d \gg n$
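A sketch of the SVD shortcut on hypothetical HDLSS data, assuming the data matrix is centered and scaled by 1/sqrt(n-1) so that the covariance matrix is the data matrix times its transpose. With `full_matrices=False`, NumPy returns the compact form, so the d x d covariance matrix never has to be decomposed; it is formed below only to check the answer.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 500, 20                               # HDLSS: d much larger than n
X = rng.normal(size=(d, n))                  # hypothetical data, columns = data vectors
Xt = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(n - 1)   # centered, scaled

# Compact SVD: U is d x n, s has length n, Vt is n x n
U, s, Vt = np.linalg.svd(Xt, full_matrices=False)
eigvals_svd = s ** 2                         # eigenvalues = squared singular values

# Direct route, for comparison only: eigen-analysis of the d x d covariance
cov = Xt @ Xt.T
eigvals_cov = np.linalg.eigvalsh(cov)[::-1][:n]

scores = s[:, None] * Vt                     # equals U^t Xt
print(np.abs(eigvals_svd - eigvals_cov).max())
```

Because the data are centered, the rank is at most n-1, so the last singular value is numerically zero.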

Alternate PCA Computation Another Variation: Dual PCA Motivation: Recall for demography data, Useful to view as both Rows as Data & Columns as Data

Alternate PCA Computation Useful terminology (from optimization): Primal PCA problem: Columns as Data Dual PCA problem: Rows as Data

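Alternate PCA Computation Dual PCA Computation: Same as above, but replace the data matrix $X$ with $X^t$ So can almost replace $\tilde{X} \tilde{X}^t$ with $\tilde{X}^t \tilde{X}$ (for the centered, scaled data matrix $\tilde{X}$) Then use SVD, $\tilde{X} = U S V^t$, to get: $\tilde{X}^t \tilde{X} = V (S^t S) V^t$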

Alternate PCA Computation Appears to be cool symmetry: Primal ↔ Dual, Loadings ↔ Scores, left ↔ right singular vectors. But, there is a problem with the means…
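The symmetry can be checked numerically when no mean is subtracted, which is an assumption made here precisely to set the centering problem aside. In this sketch the dual eigenvectors match the primal right singular vectors up to sign, and the singular values coincide.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 8, 5
X = rng.normal(size=(d, n))                  # hypothetical data, NOT mean-centered

# Primal: columns as data. Dual: rows as data, i.e. SVD of the transpose.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Ud, sd, Vdt = np.linalg.svd(X.T, full_matrices=False)

primal_scores = U.T @ X                      # equals diag(s) @ Vt
dual_eigvecs = Ud                            # columns hold the dual loadings

# Dual loadings are the normalized primal scores (up to sign)
print(np.abs(np.abs(dual_eigvecs) - np.abs(Vt.T)).max())
```

Once each analysis subtracts its own mean, the two data matrices are no longer transposes of each other and this exact correspondence is lost.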

Alternate PCA Computation Next time: Explore Loadings & Scores issue More deeply, with explicit look at Notation….

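Primal - Dual PCA Note different “mean vectors”: Primal Mean = Mean of Col. Vec’s: $\frac{1}{n} \sum_{i=1}^{n} X_{\cdot i}$ (a $d \times 1$ vector) Dual Mean = Mean of Row Vec’s: $\frac{1}{d} \sum_{j=1}^{d} X_{j \cdot}$ (a $1 \times n$ vector)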

Primal - Dual PCA Primal PCA, based on SVD of Primal Data $X_P$ (columns minus Primal Mean) Dual PCA, based on SVD of Dual Data $X_D$ (rows minus Dual Mean) Very similar, except: • Different centerings • Different row – column interpretation
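A sketch of the two centerings on a small hypothetical data matrix. The names `X_P` and `X_D` follow the slides' labels for primal and dual data; any (n-1) scaling is deliberately left out here, since the slides flag it as unsettled.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 6, 4
X = rng.normal(size=(d, n)) + 5.0            # hypothetical data with a nonzero mean

primal_mean = X.mean(axis=1, keepdims=True)  # d x 1: mean of the column vectors
dual_mean = X.mean(axis=0, keepdims=True)    # 1 x n: mean of the row vectors

X_P = X - primal_mean                        # primal data: each row sums to 0
X_D = X - dual_mean                          # dual data: each column sums to 0

s_P = np.linalg.svd(X_P, compute_uv=False)
s_D = np.linalg.svd(X_D, compute_uv=False)
print(s_P)
print(s_D)                                   # differs from s_P
```

The singular values differ because the two analyses remove different means, so the power spectra of primal and dual PCA need not agree.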

Primal - Dual PCA Next Time get factors of (n-1) straight. Maybe best to dispense with that in defn Of X_P and X_D…

Primal - Dual PCA Toy Example 1: Random Curves, all in Primal Space: * Constant Shift * Linear * Quadratic * Cubic (chosen to be orthonormal) Plus (small) i.i.d. Gaussian noise d = 40, n = 20
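One way data of this kind could be generated is sketched below. Only d = 40, n = 20, the four polynomial components, orthonormality, and small Gaussian noise come from the slide; the grid, the coefficient scales, the noise level, and the use of QR to orthonormalize are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 40, 20
t = np.linspace(-1.0, 1.0, d)                # assumed grid for the curves

# Orthonormalize 1, t, t^2, t^3 by QR of the Vandermonde matrix
V = np.vander(t, N=4, increasing=True)
Q, _ = np.linalg.qr(V)                       # d x 4, orthonormal columns

coef_scale = np.array([8.0, 4.0, 2.0, 1.0])  # hypothetical component sizes
coefs = coef_scale[:, None] * rng.standard_normal((4, n))
noise_sd = 0.05                              # hypothetical "small" noise level
X = Q @ coefs + noise_sd * rng.standard_normal((d, n))   # columns = random curves

# Share of total SS outside the 4-dimensional polynomial subspace
off_ss = np.sum((X - Q @ (Q.T @ X)) ** 2) / np.sum(X ** 2)
print(off_ss)
```

Each column of `X` is one primal curve and each row is one dual curve, so the same matrix can feed both analyses.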

Primal - Dual PCA Toy Example 1: Raw Data

Primal - Dual PCA Toy Example 1: Raw Data • Primal (Col.) curves similar to before • Data mat’x asymmetric (but same curves) • Dual (Row) curves much rougher (showing Gaussian randomness) • How data were generated • Color map useful? (same as mesh view) • See richer structure than before • Is it useful?

Primal - Dual PCA Toy Example 1: Primal PCA Column Curves as Data

Primal - Dual PCA Toy Example 1: Primal PCA • Expected to recover increasing poly’s • But didn’t happen • Although can see the poly’s (order???) • Mean has quad’ic (since only n = 20???) • Scores (proj’ns) very random • Power Spectrum shows 4 components (not affected by subtracting Primal Mean)

Primal - Dual PCA Toy Example 1: Dual PCA Row Curves as Data

Primal - Dual PCA Toy Example 1: Dual PCA • Curves all very wiggly (random noise) • Mean much bigger, 54% of Total Var! • Scores have strong smooth structure (reflecting ordered primal e.v.’s) (recall primal e.v. ↔ dual scores) • Power Spectrum shows 3 components (Driven by subtraction of Dual Mean) • Primal – Dual mean difference is critical
