
Object Oriented Data Analysis, Last Time

Learn about PCA, its energy redistribution concept, and data representation through eigenvalues and eigenvectors. Explore alternate PCA computations and applications in demography. Understand the optimization problems solved by PCA.

ramirod




Presentation Transcript


  1. Object Orie’d Data Analysis, Last Time • Finished NCI 60 Data • Linear Algebra Review • Multivariate Probability Review • PCA as an Optimization Problem (Eigen-decomp. gives rotation, easy sol’n) • Connected Mathematics & Graphics

  2. Class Listserv Tested on Thursday Evening, 9/8/05 If you did not get the email: • Please add yourself to the list • Use Instructions at bottom of Class Web Page: http://www.unc.edu/~marron/UNCstat322-2005/HomePage.html

  3. PCA Redistribution of Energy Convenient summary of amount of structure: Total Sum of Squares. Physical Interpretation: Total Energy in Data. Insight comes from decomposition. Statistical Terminology: ANalysis Of VAriance (ANOVA)

  4. PCA Redist’n of Energy (Cont.) ANOVA mean decomposition: Total Variation = Mean Variation + Mean Residual Variation Mathematics: Pythagorean Theorem Intuition Quantified via Sums of Squares
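
The mean decomposition on this slide can be checked numerically. The following is a minimal sketch, assuming a d x n data matrix whose columns are the data vectors; the toy sizes, the random data and all variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 50                                   # hypothetical toy sizes
# hypothetical 2-d data: columns are data vectors, centered away from the origin
X = rng.normal(loc=[[3.0], [1.0]], scale=0.5, size=(d, n))

mean = X.mean(axis=1, keepdims=True)           # d x 1 sample mean vector
total_ss = np.sum(X ** 2)                      # total energy (about the origin)
mean_ss = n * np.sum(mean ** 2)                # energy carried by the mean
resid_ss = np.sum((X - mean) ** 2)             # energy of residuals from the mean

# Pythagorean theorem: the two pieces add up to the total
assert np.isclose(total_ss, mean_ss + resid_ss)
print(f"mean share: {mean_ss / total_ss:.1%}, residual share: {resid_ss / total_ss:.1%}")
```

The split is exact because the residuals from the mean are orthogonal to the mean component, which is the Pythagorean Theorem the slide refers to.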

  5. Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Residuals from Mean = Data – Mean Most of Variation = 92% is Mean Variation SS Remaining Variation = 8% is Resid. Var. SS

  6. PCA Redist’n of Energy (Cont.) Now decompose SS about the mean where: Energy is expressed in trace of covar’ce matrix

  7. PCA Redist’n of Energy (Cont.) • Eigenvalues provide atoms of SS decomposi’n • Useful Plots are: • “Power Spectrum”: eigenvalue lambda_j vs. index j • “log Power Spectrum”: log(lambda_j) vs. j • “Cumulative Power Spectrum”: sum of lambda_1, ..., lambda_j vs. j • Note PCA gives SS’s for free (as eigenvalues), • but watch factors of n - 1
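
As a rough illustration of how the eigenvalues carry the sum of squares about the mean, here is a small sketch. It assumes a d x n data matrix with columns as data vectors and a covariance estimate that divides by n - 1; the data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 30                                            # hypothetical sizes
# hypothetical data: rows given different scales so the spectrum is not flat
X = rng.normal(size=(d, n)) * np.arange(d, 0, -1)[:, None]

Xc = X - X.mean(axis=1, keepdims=True)                  # residuals from the mean
cov = Xc @ Xc.T / (n - 1)                               # d x d sample covariance
eigvals = np.linalg.eigvalsh(cov)[::-1]                 # eigenvalues, decreasing

resid_ss = np.sum(Xc ** 2)                              # SS about the mean
assert np.isclose(eigvals.sum(), np.trace(cov))         # energy = trace of covariance
assert np.isclose((n - 1) * eigvals.sum(), resid_ss)    # note the factor of n - 1

power = eigvals / eigvals.sum()                         # power spectrum (as proportions)
log_power = np.log10(eigvals)                           # log power spectrum
cum_power = np.cumsum(power)                            # cumulative power spectrum
```

Plotting `power`, `log_power` and `cum_power` against the component index gives the three displays listed on the slide.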

  8. PCA Redist’n of Energy (Cont.) • Note, have already considered some of these Useful Plots: • Power Spectrum • Cumulative Power Spectrum

  9. Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Revisit SS Decomposition for PC1: PC1 has “most of var’n” = 93% Reflected by good approximation in Object Space

  10. Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Revisit SS Decomposition for PC2: PC2 has “only a little var’n” = 7% Reflected by poor approximation in Object Space

  11. Different Views of PCA • Solves several optimization problems: • Direction to maximize SS of 1-d proj’d data • Direction to minimize SS of residuals • (same, by Pythagorean Theorem) • “Best fit line” to data in “orthogonal sense” • (vs. regression of Y on X = vertical sense • & regression of X on Y = horizontal sense) • Use one that makes sense…
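
The equivalence of the first two optimization problems can be checked by brute force in two dimensions. This is a sketch under assumed toy data (not the example shown in the slides): for every unit direction on a grid, the projected SS and the residual SS add up to the same total, so the direction that maximizes one minimizes the other, and it agrees with the first eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# hypothetical 2-d toy data: an elongated cloud, columns are data vectors
X = np.array([[2.0, 0.0], [1.2, 0.4]]) @ rng.normal(size=(2, n))
Xc = X - X.mean(axis=1, keepdims=True)
total_ss = np.sum(Xc ** 2)

def split_ss(u):
    """Return (projected SS, residual SS) for a unit direction u."""
    proj = np.outer(u, u @ Xc)                 # projection of each column onto u
    return np.sum(proj ** 2), np.sum((Xc - proj) ** 2)

angles = np.linspace(0.0, np.pi, 181)          # one-degree grid of directions
dirs = np.stack([np.cos(angles), np.sin(angles)])
proj_ss, resid_ss = np.array([split_ss(u) for u in dirs.T]).T

# Pythagorean theorem: the two sums of squares always add to the total
assert np.allclose(proj_ss + resid_ss, total_ss)

best = dirs[:, np.argmax(proj_ss)]             # direction with maximal projected SS
assert np.isclose(resid_ss[np.argmax(proj_ss)], resid_ss.min())

# compare with PC1 from the eigen-decomposition (up to sign and grid resolution)
eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T)
pc1 = eigvecs[:, -1]
assert abs(best @ pc1) > 0.999
```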

  12. Different Views of PCA 2-d Toy Example Feature Space Object Space • Max SS of Projected Data • Min SS of Residuals • Best Fit Line

  13. PCA Data Representation Idea: Expand Data Matrix in terms of inner prod’ts & eigenvectors Recall notation: Eigenvalue expansion (centered data):

  14. PCA Data Represent’n (Cont.) • Now using: • Eigenvalue expansion (raw data): • Where: • Entries of are loadings • Entries of are scores

  15. PCA Data Represent’n (Cont.) Can focus on individual data vectors: (part of above full matrix rep’n)

  16. PCA Data Represent’n (Cont.) • Reduced Rank Representation: • Reconstruct using only terms • (assuming decreasing eigenvalues) • Gives: rank approximation of data • Key to PCA data reduction • And PCA for data compression (~ .zip)
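
A reduced rank reconstruction along these lines might look as follows. This is only a sketch: the data are simulated, the number of retained terms `k` is a hypothetical choice, and the reconstruction is taken about the column mean.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k = 40, 20, 3                            # k = number of terms kept (hypothetical)
# hypothetical data: a rank-3 signal plus small noise, columns are data vectors
signal = rng.normal(size=(d, 3)) @ rng.normal(size=(3, n))
X = signal + 0.01 * rng.normal(size=(d, n))

mean = X.mean(axis=1, keepdims=True)
Xc = X - mean
eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / (n - 1))
order = np.argsort(eigvals)[::-1]              # sort into decreasing eigenvalues
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

V_k = eigvecs[:, :k]                           # d x k matrix of loadings
scores_k = V_k.T @ Xc                          # k x n matrix of scores
X_k = mean + V_k @ scores_k                    # reconstruction from the first k terms

# the discarded energy is exactly the sum of the dropped eigenvalues (times n - 1)
assert np.isclose(np.sum((X - X_k) ** 2), (n - 1) * eigvals[k:].sum())
rel_err = np.sum((X - X_k) ** 2) / np.sum(Xc ** 2)
print(f"relative SS lost with k = {k}: {rel_err:.2e}")
```

Storing `mean`, `V_k` and `scores_k` takes d + k(d + n) numbers instead of d x n, which is the sense in which this works as data compression.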

  17. PCA Data Represent’n (Cont.) • Choice of in Reduced Rank Represent’n: • Generally very slippery problem • SCREE plot (Kruskal 1964): • Find knee in power spectrum

  18. PCA Data Represent’n (Cont.) • SCREE plot drawbacks: • What is a knee? • What if there are several? • Knees depend on scaling (power? log?) • Personal suggestion: • Find auxiliary cutoffs (inter-rater variation) • Use the full range (à la scale space)

  19. PCA Simulation • Idea: given • Mean Vector • Eigenvectors • Eigenvalues • Simulate data from corresponding Normal Distribution • Approach: Invert PCA Data Represent’n • where
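
One way to carry out this simulation is sketched below. The mean vector, eigenvectors and eigenvalues are made-up inputs, and the formula used (mean plus eigenvectors times scaled standard Normal scores) is the usual way to invert the representation; the slide's own formula did not survive in this transcript, so treat this as an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 5000
mu = np.array([1.0, -2.0, 0.5])                   # hypothetical mean vector
V, _ = np.linalg.qr(rng.normal(size=(d, d)))      # hypothetical orthonormal eigenvectors
lam = np.array([4.0, 1.0, 0.25])                  # hypothetical eigenvalues

Z = rng.normal(size=(d, n))                       # i.i.d. N(0, 1) entries
# scale the standard Normal scores by sqrt(eigenvalue), rotate, then add the mean
X = mu[:, None] + V @ (np.sqrt(lam)[:, None] * Z)

cov_target = V @ np.diag(lam) @ V.T               # covariance implied by the inputs
cov_sample = np.cov(X)                            # np.cov treats rows as variables
print(np.round(cov_sample - cov_target, 2))       # small for large n
```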

  20. Alternate PCA Computation Issue: for HDLSS (High Dimension, Low Sample Size) data (recall d > n) • The d x d covariance matrix may be quite large, • Thus slow to work with, and to compute • What about a shortcut? Approach: Singular Value Decomposition (of Data Matrix X)

  21. Alternate PCA Computation Singular Value Decomposition: X = U S V^t Where: U is unitary, V is unitary, S is diag’l matrix of singular val’s Assume: decreasing singular values

  22. Alternate PCA Computation Singular Value Decomposition: Recall Relation to Eigen-analysis of X X^t (with X = U S V^t, X X^t = U S^2 U^t) Thus have same eigenvector matrix U And eigenval’s are squares of singular val’s

  23. Alternate PCA Computation Singular Value Decomposition, Computational advantage: Use compact form, only need to find: e-vec’s, e-val’s, scores. Other components not useful. So can be much faster for d >> n
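
The shortcut can be sketched with NumPy's compact SVD. The sizes and data below are hypothetical, and the d x d eigen-analysis is included only as a cross-check of the result; in practice it is the step one would skip.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 500, 20                                   # hypothetical HDLSS sizes, d >> n
X = rng.normal(size=(d, n))                      # columns are data vectors
Xc = X - X.mean(axis=1, keepdims=True)

# compact form: U is d x n, s holds n singular values, Vt is n x n
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

eigvals_svd = s ** 2 / (n - 1)                   # covariance eigenvalues from singular values
scores = s[:, None] * Vt                         # scores, without forming any d x d matrix

# cross-check against the slow route through the d x d covariance matrix
cov = Xc @ Xc.T / (n - 1)
eigvals_full = np.linalg.eigvalsh(cov)[::-1][:n]
assert np.allclose(eigvals_svd, eigvals_full)
assert np.allclose(scores, U.T @ Xc)
```

With `full_matrices=False` only the first n columns of the left singular vector matrix are computed, which is the "compact form" referred to on the slide.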

  24. Alternate PCA Computation Another Variation: Dual PCA Motivation: Recall for demography data, Useful to view as both Rows as Data & Columns as Data

  25. Alternate PCA Computation Useful terminology (from optimization): Primal PCA problem: Columns as Data Dual PCA problem: Rows as Data

  26. Alternate PCA Computation Dual PCA Computation: Same as above, but replace with So can almost replace with Then use SVD, , to get:

  27. Alternate PCA Computation Appears to be cool symmetry: Primal ↔ Dual, Loadings ↔ Scores But, there is a problem with the means…

  28. Primal - Dual PCA Note different “mean vectors”: Primal Mean = Mean of Col. Vec’s: Dual Mean = Mean of Row Vec’s:

  29. Primal - Dual PCA Primal PCA, based on SVD of Primal Data: Dual PCA, based on SVD of Dual Data: Very similar, except: • Different centerings • Different row – column interpretation
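
The effect of the two centerings can be seen on a small example. This is a sketch with made-up data; `X` is assumed to be d x n, so the primal mean averages over columns and the dual mean averages over rows.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 6, 4                                      # hypothetical small sizes
X = rng.normal(size=(d, n)) + 5.0                # hypothetical data with a large overall level

primal_mean = X.mean(axis=1, keepdims=True)      # d x 1: mean of the column vectors
dual_mean = X.mean(axis=0, keepdims=True)        # 1 x n: mean of the row vectors

X_primal = X - primal_mean                       # every row now averages to zero
X_dual = X - dual_mean                           # every column now averages to zero

s_primal = np.linalg.svd(X_primal, compute_uv=False)
s_dual = np.linalg.svd(X_dual, compute_uv=False)

# transposing the data only swaps the roles of the two singular vector matrices
assert np.allclose(np.linalg.svd(X_dual.T, compute_uv=False), s_dual)
# but the two centerings give genuinely different spectra
assert not np.allclose(s_primal, s_dual)
print(np.round(s_primal, 3), np.round(s_dual, 3))
```

So the loadings and scores trade places under transposition, while the choice of mean changes the decomposition itself.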

  30. Primal - Dual PCA Toy Example 1: Random Curves, all in Primal Space: • * Constant Shift • * Linear • * Quadratic • Cubic (chosen to be orthonormal) • Plus (small) i.i.d. Gaussian noise • d = 40, n = 20
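
Data of this kind could be generated roughly as follows. The component sizes and the noise level are not recoverable from this transcript, so `scales` and `noise_sd` below are purely hypothetical, and the orthonormal polynomials are obtained here by a QR factorization, which may differ from how the original curves were built.

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 40, 20                                    # sizes given on the slide
t = np.linspace(-1.0, 1.0, d)                    # hypothetical grid for the curves

# orthonormal constant, linear, quadratic and cubic curves (columns of P)
P, _ = np.linalg.qr(np.vander(t, 4, increasing=True))    # d x 4

scales = np.array([4.0, 3.0, 2.0, 1.0])          # hypothetical component sizes
coefs = scales[:, None] * rng.normal(size=(4, n))        # random coefficient per curve
noise_sd = 0.05                                  # hypothetical "small" noise level
X = P @ coefs + noise_sd * rng.normal(size=(d, n))       # d x n, columns are curves

s = np.linalg.svd(X, compute_uv=False)
print(np.round(s[:6], 2))                        # singular values of the raw data matrix
```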

  31. Primal - Dual PCA Toy Example 1: Raw Data

  32. Primal - Dual PCA Toy Example 1: Raw Data • Primal (Col.) curves similar to before • Data mat’x asymmetric (but same curves) • Dual (Row) curves much rougher (showing Gaussian randomness) • How data were generated • Color map useful? (same as mesh view) • See richer structure than before • Is it useful?

  33. Primal - Dual PCA Toy Example 1: Primal PCA Column Curves as Data

  34. Primal - Dual PCA Toy Example 1: Primal PCA • Expected to recover increasing poly’s • But didn’t happen • Although can see the poly’s (order???) • Mean has quad’ic (since only n = 20???) • Scores (proj’ns) very random • Power Spectrum shows 4 components (not affected by subtracting Primal Mean)

  35. Primal - Dual PCA Toy Example 1: Dual PCA Row Curves as Data

  36. Primal - Dual PCA Toy Example 1: Dual PCA • Curves all very wiggly (random noise) • Mean much bigger, 54% of Total Var! • Scores have strong smooth structure (reflecting ordered primal e.v.’s) (recall primal e.v. ↔ dual scores) • Power Spectrum shows 3 components (Driven by subtraction of Dual Mean) • Primal – Dual mean difference is critical

  37. Primal - Dual PCA Toy Example 1: Dual PCA – Scatterplot

  38. Primal - Dual PCA Toy Example 1: Dual PCA - Scatterplot • Smooth Curve Structure • But not usual curves (Since 1-d curves not quite poly’s) • And only in 1st 3 components • Recall only 3 non-noise components • Since constant curve went into mean (dual) • Remainder is pure noise • Suggests wrong rotation of axes???

  39. Primal - Dual PCA A 3rd Type of Analysis: • Called “SVD decomposition” • Main point: subtract neither mean • Viewed as a serious competitor • Advantage: gives best Mean Square Approximation of Data Matrix • Vs. Primal PCA: best about col. Mean • Vs. Dual PCA: best about row Mean Difference in means is critical!
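
The "best Mean Square Approximation" property can be illustrated by comparing rank-one fits. In this sketch (made-up data, hypothetical names), the first SVD component of the raw data matrix is compared with the two rank-one fits obtained by repeating the column mean or the row mean; by the Eckart-Young theorem the SVD fit can never do worse.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 40, 20
# hypothetical data with a clearly nonzero mean
X = rng.normal(size=(d, n)) + np.linspace(0.0, 3.0, d)[:, None]

U, s, Vt = np.linalg.svd(X, full_matrices=False)          # neither mean subtracted

def rank_k(k):
    """Best rank-k approximation of X built from the first k SVD components."""
    return (U[:, :k] * s[:k]) @ Vt[:k]

col_mean_fit = np.repeat(X.mean(axis=1, keepdims=True), n, axis=1)   # primal mean in each column
row_mean_fit = np.repeat(X.mean(axis=0, keepdims=True), d, axis=0)   # dual mean in each row

err_svd = np.sum((X - rank_k(1)) ** 2)
err_col = np.sum((X - col_mean_fit) ** 2)
err_row = np.sum((X - row_mean_fit) ** 2)

assert err_svd <= err_col + 1e-9 and err_svd <= err_row + 1e-9
assert np.isclose(err_svd, np.sum(s[1:] ** 2))            # leftover energy = remaining s^2
```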

  40. Primal - Dual PCA Toy Example 1: SVD – Curves view

  41. Primal - Dual PCA Toy Example 1: SVD Curves View • Col. Curves view similar to Primal PCA • Row Curves quite different (from dual): • Former mean, now SV1 • Former PC1, now SV2 • i.e. very similar shapes with shifted indices • Again mean centering is crucial • Main difference between PCAs and SVD

  42. Primal - Dual PCA Toy Example 1: SVD – Mesh-Image View

  43. Primal - Dual PCA Toy Example 1: SVD Mesh-Image View • Think about decomposition into modes of variation • Constant x Gaussian • Linear x Gaussian • Cubic by Gaussian • Quadratic • Shows up best in image view? • Why is ordering “wrong”???

  44. Primal - Dual PCA Toy Example 1: All Primal • Why is SVD mode ordering “wrong”??? • Well, not expected… • Key is need orthogonality • Present in space of column curves • But only approximate in row Gaussians • The implicit orthogonalization of SVD (both rows and columns) gave mixture of the poly’s.

  45. Primal - Dual PCA Toy Example 2: All Primal, GS Noise • Started with same column space • Generated i.i.d. Gaussians for row cols • Then did Gram-Schmidt Ortho-normalization (in row space) Visual impression: Amazingly similar to original data (used same seeds of random # generators)
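
The orthonormalization step can be sketched as below. The function `gram_schmidt_rows` is a hypothetical helper written for this illustration (a modified Gram-Schmidt), and the four Gaussian rows stand in for the random coefficients of the four polynomial components.

```python
import numpy as np

def gram_schmidt_rows(G):
    """Orthonormalize the rows of G with (modified) Gram-Schmidt."""
    Q = np.array(G, dtype=float)
    for i in range(Q.shape[0]):
        for j in range(i):
            Q[i] -= (Q[i] @ Q[j]) * Q[j]          # remove the part along earlier rows
        Q[i] /= np.linalg.norm(Q[i])              # rescale to unit length
    return Q

rng = np.random.default_rng(8)
n = 20
G = rng.normal(size=(4, n))                       # four i.i.d. Gaussian row vectors
Q = gram_schmidt_rows(G)

gram_before = G @ G.T                             # only approximately diagonal
gram_after = Q @ Q.T                              # identity after orthonormalization
assert np.allclose(gram_after, np.eye(4))
print(np.round(gram_before / n, 2))
```

Before the step the rows are orthogonal only on average, which is the approximate orthogonality blamed for the mixed components in Toy Example 1; afterwards the orthogonality is exact.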

  46. Primal - Dual PCA Toy Example 2: Raw Data

  47. Primal - Dual PCA Compare with Earlier Toy Example 1

  48. Primal - Dual PCA Toy Example 2: Primal PCA Column Curves as Data Shows Explanation (of wrong components) was correct

  49. Primal - Dual PCA Toy Example 2: Dual PCA Row Curves as Data Still have big mean But Scores look much better

  50. Primal - Dual PCA Toy Example 2: Dual PCA – Scatterplot
