
Object Orie’d Data Analysis, Last Time



Presentation Transcript


  1. Object Orie’d Data Analysis, Last Time • Finished Algebra Review • Multivariate Probability Review • PCA as an Optimization Problem (Eigen-decomp. gives rotation, easy sol’n) • Connected Mathematics & Graphics • Started Redistribution of Energy

  2. PCA Redistribution of Energy • Convenient summary of amount of structure: Total Sum of Squares • Physical Interpretation: Total Energy in Data • Insight comes from decomposition • Statistical Terminology: ANalysis Of VAriance (ANOVA)

  3. PCA Redist’n of Energy (Cont.) • ANOVA mean decomposition: Total Variation = Mean Variation + Mean Residual Variation, i.e. $\sum_{i=1}^n \|x_i\|^2 = n \|\bar{x}\|^2 + \sum_{i=1}^n \|x_i - \bar{x}\|^2$ • Mathematics: Pythagorean Theorem • Intuition Quantified via Sums of Squares
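
The mean decomposition on this slide can be checked numerically. The following is a minimal NumPy sketch, assuming columns of the data matrix are the data vectors; the sizes and the simulated data are hypothetical and are not the toy data from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 30                                 # hypothetical dimension and sample size
X = rng.normal(loc=3.0, size=(d, n))         # columns are data vectors (assumed layout)

mean = X.mean(axis=1, keepdims=True)         # d x 1 sample mean vector
resid = X - mean                             # residuals from the mean

total_ss = np.sum(X ** 2)                    # total energy in the data
mean_ss = n * np.sum(mean ** 2)              # energy carried by the mean
resid_ss = np.sum(resid ** 2)                # energy left in the mean residuals

print(f"mean share: {mean_ss / total_ss:.1%}, residual share: {resid_ss / total_ss:.1%}")
```

The two shares add to 100% because the mean part and the residual part are orthogonal, which is the Pythagorean Theorem statement above.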

  4. Connect Math to Graphics (Cont.) 2-d Toy Example [Figure panels: Feature Space, Object Space] • Residuals from Mean = Data – Mean • Most of Variation = 92% is Mean Variation SS • Remaining Variation = 8% is Resid. Var. SS

  5. PCA Redist’n of Energy (Cont.) • Have already studied this decomposition (recall curve e.g.) • Variation due to Mean (% of total) • Variation of Mean Residuals (% of total)

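  6. PCA Redist’n of Energy (Cont.) • Now decompose SS about the mean: $\sum_{i=1}^n \|x_i - \bar{x}\|^2 = (n-1)\,\mathrm{tr}(\hat{\Sigma}) = (n-1) \sum_{j=1}^d \lambda_j$, where $\hat{\Sigma}$ is the sample covariance matrix with eigenvalues $\lambda_1 \ge \dots \ge \lambda_d$ • Energy is expressed in trace of covar’ce matrix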

  7. PCA Redist’n of Energy (Cont.) • Eigenvalues $\lambda_j$ provide atoms of SS decomposi’n • Useful Plots are: • “Power Spectrum”: $\lambda_j$ vs. $j$ • “log Power Spectrum”: $\log \lambda_j$ vs. $j$ • “Cumulative Power Spectrum”: $\sum_{j' \le j} \lambda_{j'}$ vs. $j$ • Note PCA gives SS’s for free (as eigenvalues), but watch factors of $n-1$
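
A short sketch of how the eigenvalues account for the SS about the mean, and of the three spectra listed above. It assumes the usual sample covariance with divisor n - 1 and columns as data vectors; the data are simulated and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 50                                       # hypothetical sizes
row_sd = np.array([[3.0], [2.0], [1.0], [0.5]])    # hypothetical per-coordinate spread
X = row_sd * rng.normal(size=(d, n))               # columns are data vectors

resid = X - X.mean(axis=1, keepdims=True)
cov = resid @ resid.T / (n - 1)                    # d x d sample covariance
eigvals = np.linalg.eigvalsh(cov)[::-1]            # eigenvalues, decreasing order

resid_ss = np.sum(resid ** 2)                      # SS about the mean
power = eigvals / eigvals.sum()                    # power spectrum as shares of the total
log_power = np.log(eigvals)                        # log power spectrum
cum_power = np.cumsum(power)                       # cumulative power spectrum

print("trace of covariance :", np.trace(cov))
print("sum of eigenvalues  :", eigvals.sum())
print("SS about mean/(n-1) :", resid_ss / (n - 1))
```

The three printed numbers agree, which is where the factor of n - 1 enters: the eigenvalues sum to the SS about the mean only after dividing by n - 1.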

  8. PCA Redist’n of Energy (Cont.) • Note, have already considered some of these Useful Plots: • Power Spectrum • Cumulative Power Spectrum

  9. Connect Math to Graphics (Cont.) 2-d Toy Example [Figure panels: Feature Space, Object Space] • Revisit SS Decomposition for PC1: • PC1 has “most of var’n” = 93% • Reflected by good approximation in Object Space

  10. Connect Math to Graphics (Cont.) 2-d Toy Example [Figure panels: Feature Space, Object Space] • Revisit SS Decomposition for PC2: • PC2 has “only a little var’n” = 7% • Reflected by poor approximation in Object Space

  11. Different Views of PCA • Solves several optimization problems: • Direction to maximize SS of 1-d proj’d data • Direction to minimize SS of residuals (same, by Pythagorean Theorem) • “Best fit line” to data in “orthogonal sense” (vs. regression of Y on X = vertical sense & regression of X on Y = horizontal sense) • Use one that makes sense…
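
The equivalence of these views can be illustrated with a brute-force scan over directions in 2-d. This is only a sketch on simulated data (the slope, noise level and grid are hypothetical choices): it compares the scan with the eigenvector direction and with the two regression slopes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + 0.3 * rng.normal(size=n)       # hypothetical 2-d toy data
Z = np.vstack([x, y])
Z = Z - Z.mean(axis=1, keepdims=True)        # 2 x n, centered at the mean
total_ss = np.sum(Z ** 2)

def ss_split(theta):
    """Return (SS of projections, SS of residuals) for the direction at angle theta."""
    u = np.array([np.cos(theta), np.sin(theta)])
    proj = np.outer(u, u @ Z)                # projections onto the line spanned by u
    return np.sum(proj ** 2), np.sum((Z - proj) ** 2)

thetas = np.linspace(0.0, np.pi, 1801)       # grid of candidate directions
proj_ss, resid_ss = np.array([ss_split(t) for t in thetas]).T

evals, evecs = np.linalg.eigh(Z @ Z.T)       # eigenvalues in increasing order
u1 = evecs[:, -1]                            # PC1 direction

slope_pca = u1[1] / u1[0]                        # orthogonal best fit line
slope_y_on_x = (Z[0] @ Z[1]) / (Z[0] @ Z[0])     # vertical residuals
slope_x_on_y = (Z[1] @ Z[1]) / (Z[0] @ Z[1])     # horizontal residuals, drawn as y vs. x

print("scan maximizer (slope):", np.tan(thetas[np.argmax(proj_ss)]))
print("slopes y|x, PCA, x|y  :", slope_y_on_x, slope_pca, slope_x_on_y)
```

On data like these the orthogonal fit falls between the two regression lines, so the three "best fit" lines are generally different lines.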

  12. Different Views of PCA • Next Time: Add some graphics about this • Scatterplot of Toy Data sets + various fits, with residuals • Will be Useful in Stor 165, as well

  13. Different Views of PCA 2-d Toy Example [Figure panels: Feature Space, Object Space] • Max SS of Projected Data • Min SS of Residuals • Best Fit Line

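  14. PCA Data Representation • Idea: Expand Data Matrix in terms of inner prod’ts & eigenvectors • Recall notation: $\tilde{X} = \frac{1}{\sqrt{n-1}} [\, x_1 - \bar{x}, \dots, x_n - \bar{x} \,]$ ($d \times n$), with eigenvectors $u_1, \dots, u_d$ of $\hat{\Sigma} = \tilde{X} \tilde{X}^t$ • Eigenvalue expansion (centered data): $\tilde{X} = \sum_{j=1}^d u_j (u_j^t \tilde{X})$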

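  15. PCA Data Represent’n (Cont.) • Now using the raw data matrix $X = [\, x_1, \dots, x_n \,]$: • Eigenvalue expansion (raw data): $X = \bar{x}\, 1_n^t + \sum_{j=1}^d u_j s_j^t$ • Where: • Entries of $u_j$ (eigenvectors) are loadings • Entries of $s_j = (X - \bar{x}\, 1_n^t)^t u_j$ are scores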

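  16. PCA Data Represent’n (Cont.) • Can focus on individual data vectors: $x_i = \bar{x} + \sum_{j=1}^d \langle x_i - \bar{x}, u_j \rangle u_j$ (part of above full matrix rep’n) • Terminology: the $\langle x_i - \bar{x}, u_j \rangle$ are called “PCs” and are also called scores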

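  17. PCA Data Represent’n (Cont.) • More terminology: • Scores $\langle x_i - \bar{x}, u_j \rangle$ are coefficients in eigenvalue representation: $x_i = \bar{x} + \sum_{j=1}^d \langle x_i - \bar{x}, u_j \rangle u_j$ • Loadings are entries of eigenvectors: $u_j = (u_{j1}, \dots, u_{jd})^t$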
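
Scores and loadings can be made concrete in a few lines. The sketch below assumes columns as data vectors and a covariance with divisor n - 1; the data are simulated and the names `U`, `scores` and `xbar` are illustrative, not notation taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 10                                 # hypothetical sizes
X = rng.normal(size=(d, n))                  # columns are data vectors
xbar = X.mean(axis=1, keepdims=True)
Xc = X - xbar                                # centered data

cov = Xc @ Xc.T / (n - 1)
eigvals, U = np.linalg.eigh(cov)             # eigh returns increasing eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]     # decreasing; entries of U[:, j] are loadings

scores = U.T @ Xc                            # scores[j, i] = <x_i - xbar, u_j>
X_rebuilt = xbar + U @ scores                # mean + sum over j of score * eigenvector

print("max reconstruction error:", np.abs(X_rebuilt - X).max())
print("variance of each score row:", scores.var(axis=1, ddof=1))
print("eigenvalues               :", eigvals)
```

Using all d eigenvectors reproduces the data exactly, and the sample variance of the j-th row of scores equals the j-th eigenvalue.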

  18. PCA Data Represent’n (Cont.) • Reduced Rank Representation: • Reconstruct using only $k \le d$ terms (assuming decreasing eigenvalues) • Gives: rank $k$ approximation of data • Key to PCA dimension reduction • And PCA for data compression (~ .jpeg)
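
A reduced rank reconstruction might look like the following sketch, which keeps the leading eigenvectors only. The number of terms kept (called `k` here), the sizes and the simulated data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, k = 6, 40, 2                                          # hypothetical sizes; keep k terms
row_sd = np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])[:, None]  # hypothetical spreads
X = row_sd * rng.normal(size=(d, n)) + 1.0                  # columns are data vectors

xbar = X.mean(axis=1, keepdims=True)
Xc = X - xbar
eigvals, U = np.linalg.eigh(Xc @ Xc.T / (n - 1))
eigvals, U = eigvals[::-1], U[:, ::-1]       # decreasing eigenvalues

U_k = U[:, :k]                               # leading k eigenvectors
X_k = xbar + U_k @ (U_k.T @ Xc)              # rank-k approximation about the mean

err_ss = np.sum((X - X_k) ** 2)              # energy lost by truncation
kept = eigvals[:k].sum() / eigvals.sum()     # share of SS about the mean retained

print(f"retained {kept:.1%} of the SS about the mean")
```

The energy lost is (n - 1) times the sum of the dropped eigenvalues, so the cumulative power spectrum reads off directly how good each rank-k approximation is.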

  19. PCA Data Represent’n (Cont.) • Choice of $k$ (number of terms kept) in Reduced Rank Represent’n: • Generally very slippery problem • SCREE plot (Kruskal 1964): • Find knee in power spectrum

  20. PCA Data Represent’n (Cont.) • SCREE plot drawbacks: • What is a knee? • What if there are several? • Knees depend on scaling (power? log?) • Personal suggestion: • Find auxiliary cutoffs (inter-rater variation) • Use the full range (ala scale space)

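  21. PCA Simulation • Idea: given • Mean Vector $\mu$ • Eigenvectors $u_1, \dots, u_d$ • Eigenvalues $\lambda_1 \ge \dots \ge \lambda_d$ • Simulate data from corresponding Normal Distribution • Approach: Invert PCA Data Represent’n: $x_i = \mu + \sum_{j=1}^d \sqrt{\lambda_j}\, z_{ij}\, u_j$, where the $z_{ij}$ are i.i.d. $N(0,1)$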
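
One way to sketch this simulation: draw independent standard normal scores, scale them by the square roots of the eigenvalues, rotate by the eigenvectors and add the mean. The mean vector, eigenvalues and (randomly generated) eigenvector matrix below are hypothetical inputs chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 3, 20000                                  # hypothetical dimension, many draws
mu = np.array([1.0, -2.0, 0.5])                  # hypothetical mean vector
eigvals = np.array([4.0, 1.0, 0.25])             # hypothetical eigenvalues, decreasing
U, _ = np.linalg.qr(rng.normal(size=(d, d)))     # some orthonormal eigenvector matrix

Z = rng.normal(size=(d, n))                      # i.i.d. N(0, 1) scores
X = mu[:, None] + U @ (np.sqrt(eigvals)[:, None] * Z)   # columns are simulated data

target_cov = U @ np.diag(eigvals) @ U.T          # covariance implied by the inputs
sample_cov = np.cov(X)                           # rows are variables, columns are draws

print("max abs covariance error:", np.abs(sample_cov - target_cov).max())
```

With a large number of draws the sample mean and covariance come close to the inputs, which is a quick sanity check on the construction.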

  22. Alternate PCA Computation • Issue: for HDLSS data (recall $d > n$) • Covariance matrix $\hat{\Sigma}$ ($d \times d$) may be quite large, • Thus slow to work with, and to compute • What about a shortcut? • Approach: Singular Value Decomposition (of (centered, scaled) Data Matrix $\tilde{X}$)

  23. Alternate PCA Computation • Singular Value Decomposition: $\tilde{X} = U S V^t$ • Where: • $U$ ($d \times d$) is unitary • $V$ ($n \times n$) is unitary • $S$ ($d \times n$) is diag’l matrix of singular val’s • Assume: decreasing singular values

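  24. Alternate PCA Computation • Singular Value Decomposition: $\tilde{X} = U S V^t$ • Recall Relation to Eigen-analysis of $\hat{\Sigma} = \tilde{X} \tilde{X}^t = U S V^t V S^t U^t = U (S S^t) U^t$ • Thus have same eigenvector matrix $U$ • And eigenval’s are squares of singular val’s: $\lambda_j = s_j^2$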

  25. Alternate PCA Computation • Singular Value Decomposition, Computational advantage (for rank $r$): • Use compact form, only need to find $U_{d \times r}$ (e-vec’s), $S_{r \times r}$ (s-val’s), $V_{n \times r}$ (scores) • Other components not useful • So can be much faster for $d \gg n$
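
The SVD shortcut can be sketched with NumPy's compact SVD. The example assumes the data matrix is centered and scaled by 1/sqrt(n - 1), so that its outer product is the sample covariance; the sizes are hypothetical HDLSS choices.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 50, 8                                              # hypothetical HDLSS sizes, d > n
X = rng.normal(size=(d, n))                               # columns are data vectors
Xt = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(n - 1) # centered, scaled data matrix

# Compact SVD: U is d x n, s has n entries, Vt is n x n (no d x d matrix is formed).
U, s, Vt = np.linalg.svd(Xt, full_matrices=False)

# Slow route, for comparison only: eigen-analysis of the d x d covariance matrix.
cov = Xt @ Xt.T
eigvals = np.linalg.eigvalsh(cov)[::-1]

print("leading eigenvalues     :", eigvals[:3])
print("squared singular values :", s[:3] ** 2)
```

Because the columns were centered, the matrix has rank at most n - 1, so only n - 1 of the singular values are nonzero and the remaining eigenvalues of the covariance are zero.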

  26. Alternate PCA Computation Another Variation: Dual PCA Motivation: Recall for demography data, Useful to view as both Rows as Data & Columns as Data

  27. Alternate PCA Computation Useful terminology (from optimization): Primal PCA problem: Columns as Data Dual PCA problem: Rows as Data

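  28. Alternate PCA Computation • Dual PCA Computation: • Same as above, but replace $X$ with $X^t$ • So can almost replace $\hat{\Sigma} = \tilde{X} \tilde{X}^t$ with $\tilde{X}^t \tilde{X}$ • Then use SVD, $\tilde{X} = U S V^t$, to get: $\tilde{X}^t \tilde{X} = V (S^t S) V^t$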

  29. Alternate PCA Computation • Appears to be cool symmetry: • Primal ⇔ Dual • Loadings ⇔ Scores • But, there is a problem with the means…

  30. Alternate PCA Computation • Next time: Explore Loadings & Scores issue more deeply, with explicit look at notation…

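  31. Primal - Dual PCA • Note different “mean vectors” (for the $d \times n$ data matrix $X$): • Primal Mean = Mean of Col. Vec’s: $\bar{x}_P = \frac{1}{n} X 1_n$ ($d \times 1$) • Dual Mean = Mean of Row Vec’s: $\bar{x}_D = \frac{1}{d} X^t 1_d$ ($n \times 1$)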

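  32. Primal - Dual PCA • Primal PCA, based on SVD of Primal Data: $X_P = X - \bar{x}_P 1_n^t$ (columns centered at the Primal Mean $\bar{x}_P$) • Dual PCA, based on SVD of Dual Data: $X_D = (X - 1_d \bar{x}_D^t)^t$ (rows centered at the Dual Mean $\bar{x}_D$) • Very similar, except: • Different centerings • Different row – column interpretation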
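
The loadings and scores symmetry, and the way the two centerings break it, can be seen in a small sketch. It uses a simulated d x n matrix with columns as primal data objects and leaves out the n - 1 scaling; the names `X_P` and `X_D` follow the slides, everything else is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 6, 4                                      # hypothetical sizes
X = rng.normal(size=(d, n)) + 2.0                # columns are primal data objects

# Without any centering, transposing simply swaps the roles of U and V.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U2, s2, Vt2 = np.linalg.svd(X.T, full_matrices=False)

# With centering, primal and dual subtract different means.
primal_mean = X.mean(axis=1, keepdims=True)      # d x 1, mean of the column vectors
dual_mean = X.mean(axis=0, keepdims=True)        # 1 x n, mean of the row vectors
X_P = X - primal_mean                            # primal data
X_D = (X - dual_mean).T                          # dual data, rows of X become objects

s_P = np.linalg.svd(X_P, compute_uv=False)
s_D = np.linalg.svd(X_D, compute_uv=False)

print("uncentered singular values agree:", np.allclose(s, s2))
print("primal spectrum:", s_P ** 2)
print("dual spectrum  :", s_D ** 2)
```

In this sketch the primal centering removes one dimension from the span of the n columns, while the dual centering removes one from the span of the d rows, so the two spectra no longer match.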

  33. Primal - Dual PCA • Next Time: get factors of (n-1) straight. Maybe best to dispense with that in def’n of X_P and X_D…

  34. Primal - Dual PCA Toy Example 1: Random Curves, all in Primal Space: • Constant Shift • Linear • Quadratic • Cubic (chosen to be orthonormal) • Plus (small) i.i.d. Gaussian noise • d = 40, n = 20
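
Data of this kind could be generated along the following lines. This is only a sketch: the slides do not give the coefficient distributions or the noise level, so `coef_sd` and `noise_sd` are hypothetical, and the orthonormal polynomials are obtained here by a QR factorization of a Vandermonde matrix, which is one possible construction.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 40, 20                                    # sizes from the slide
t = np.linspace(-1.0, 1.0, d)                    # hypothetical grid for the curves

V = np.vander(t, 4, increasing=True)             # columns: 1, t, t^2, t^3
B, _ = np.linalg.qr(V)                           # d x 4 orthonormal polynomial basis

coef_sd = np.array([4.0, 3.0, 2.0, 1.0])         # hypothetical spread per component
coefs = coef_sd[:, None] * rng.normal(size=(4, n))   # random coefficients, one column per curve
noise_sd = 0.05                                  # hypothetical "small" noise level

X = B @ coefs + noise_sd * rng.normal(size=(d, n))   # d x n, columns are random curves
print(X.shape)
```

Each column of `X` is a random combination of a constant, linear, quadratic and cubic curve plus noise, so the primal objects are smooth curves while each row mixes n independent random coefficients.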

  35. Primal - Dual PCA Toy Example 1: Raw Data

  36. Primal - Dual PCA Toy Example 1: Raw Data • Primal (Col.) curves similar to before • Data mat’x asymmetric (but same curves) • Dual (Row) curves much rougher (showing Gaussian randomness) • How data were generated • Color map useful? (same as mesh view) • See richer structure than before • Is it useful?

  37. Primal - Dual PCA Toy Example 1: Primal PCA Column Curves as Data

  38. Primal - Dual PCA Toy Example 1: Primal PCA • Expected to recover increasing poly’s • But didn’t happen • Although can see the poly’s (order???) • Mean has quad’ic (since only n = 20???) • Scores (proj’ns) very random • Power Spectrum shows 4 components (not affected by subtracting Primal Mean)

  39. Primal - Dual PCA Toy Example 1: Dual PCA Row Curves as Data

  40. Primal - Dual PCA Toy Example 1: Dual PCA • Curves all very wiggly (random noise) • Mean much bigger, 54% of Total Var! • Scores have strong smooth structure (reflecting ordered primal e.v.’s) (recall primal e.v. ⇔ dual scores) • Power Spectrum shows 3 components (Driven by subtraction of Dual Mean) • Primal – Dual mean difference is critical
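
Why the dual power spectrum can lose a component is easy to reproduce on simulated curves. The sketch below rebuilds a toy data set of the same type (orthonormal constant, linear, quadratic and cubic components plus small noise; the coefficient spreads and noise level are hypothetical) and compares the spectra after the two centerings. It is not the data set shown in the slides, so the percentages there are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 40, 20
t = np.linspace(-1.0, 1.0, d)
B, _ = np.linalg.qr(np.vander(t, 4, increasing=True))    # orthonormal polynomial basis
coefs = np.array([4.0, 3.0, 2.0, 1.0])[:, None] * rng.normal(size=(4, n))
X = B @ coefs + 0.02 * rng.normal(size=(d, n))           # columns are curves

X_P = X - X.mean(axis=1, keepdims=True)      # subtract primal mean (mean column curve)
X_D = X - X.mean(axis=0, keepdims=True)      # subtract dual mean (mean row curve)

# Squared singular values; transposing X_D would not change them.
spec_P = np.linalg.svd(X_P, compute_uv=False) ** 2
spec_D = np.linalg.svd(X_D, compute_uv=False) ** 2

print("primal spectrum, first 5:", np.round(spec_P[:5], 3))
print("dual spectrum, first 5  :", np.round(spec_D[:5], 3))
```

In this construction the constant-shift component is exactly what the dual mean removes, since the other three basis curves average to zero over the grid. That leaves 3 large components in the dual spectrum against 4 in the primal one.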
