
Object Orie’d Data Analysis, Last Time



Presentation Transcript


  1. Object Orie’d Data Analysis, Last Time • Gene Cell Cycle Data • Microarrays and HDLSS visualization • DWD bias adjustment • NCI 60 Data Today: Detailed (math’cal) look at PCA

  2. Last Time: Checked Data Combo, using DWD Dir’ns

  3. DWD Views of NCI 60 Data • Interesting Question: • Which clusters are really there? • Issues: • DWD great at finding dir’ns of separation • And will do so even if no real structure • Is this happening here? • Or: which clusters are important? • What does “important” mean?

  4. Real Clusters in NCI 60 Data • Simple Visual Approach: • Randomly relabel data (Cancer Types) • Recompute DWD dir’ns & visualization • Get heuristic impression from this • Deeper Approach • Formal Hypothesis Testing • (Done later)
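
The simple visual approach on this slide can be sketched in code. This is only an illustration under stated assumptions: DWD is not implemented here, and the unit vector between class means is used as a plain stand-in for a DWD direction. The data are synthetic Gaussian noise with no real class structure, and the names `mean_diff_direction` and `separation` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_diff_direction(X, labels):
    """Unit vector between class means (a simple stand-in for a DWD direction)."""
    diff = X[:, labels == 1].mean(axis=1) - X[:, labels == 0].mean(axis=1)
    return diff / np.linalg.norm(diff)

def separation(X, labels):
    """Gap between projected class means, in units of pooled projected std."""
    u = mean_diff_direction(X, labels)
    proj = u @ X                      # one projection score per case (column)
    a, b = proj[labels == 1], proj[labels == 0]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

# HDLSS toy data: d = 500 features, n = 40 cases, pure noise (hypothetical)
d, n = 500, 40
X = rng.standard_normal((d, n))       # columns are data vectors
labels = np.repeat([0, 1], n // 2)

observed = separation(X, labels)
# Randomly relabel, recompute the direction, and record the apparent separation
relabelled = [separation(X, rng.permutation(labels)) for _ in range(20)]
```

Comparing `observed` with the values in `relabelled` gives the heuristic impression the slide describes: if random labels separate about as well as the real ones, the apparent clusters may be an artifact of high dimension.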

  5. Random Relabelling #1

  6. Random Relabelling #2

  7. Random Relabelling #3

  8. Random Relabelling #4

  9. Revisit Real Data

  10. Revisit Real Data (Cont.) Heuristic Results: • Strong Clust’s: Melanoma, Leukemia, Renal • Weak Clust’s: CNS, Ovarian, Colon • Not Clust’s: NSCLC, Breast

  11. Rediscovery – Renaming of PCA • Statistics: Principal Component Analysis (PCA) • Social Sciences: Factor Analysis (PCA is a subset) • Probability / Electrical Eng: Karhunen-Loève expansion • Applied Mathematics: Proper Orthogonal Decomposition (POD) • Geo-Sciences: Empirical Orthogonal Functions (EOF)

  12. An Interesting Historical Note The 1st (?) application of PCA to Functional Data Analysis: Rao, C. R. (1958) Some statistical methods for comparison of growth curves, Biometrics, 14, 1-17. 1st Paper with “Curves as Data” viewpoint

  13. Detailed Look at PCA • Three important (and interesting) viewpoints: • Mathematics • Numerics • Statistics • 1st: Review linear alg. and multivar. prob.

  14. Review of Linear Algebra • Vector Space: • set of “vectors”, $x$, • and “scalars” (coefficients), $a$ • “closed” under “linear combination” • ($\sum_i a_i x_i$ in space) • e.g. • $\mathbb{R}^d = \{(x_1, \dots, x_d)^t : x_1, \dots, x_d \in \mathbb{R}\}$, • “$d$ dim Euclid’n space”

  15. Review of Linear Algebra (Cont.) • Subspace: • subset that is again a vector space • i.e. closed under linear combination • e.g. lines through the origin • e.g. planes through the origin • e.g. subsp. “generated by” a set of vectors (all linear combos of them = containing hyperplane through origin)

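  16. Review of Linear Algebra (Cont.) • Basis of subspace: set of vectors that: • span, i.e. everything is a lin. com. of them • are linearly indep’t, i.e. lin. com. is unique • e.g. “unit vector basis” of $\mathbb{R}^d$: $e_1 = (1, 0, \dots, 0)^t, \dots, e_d = (0, \dots, 0, 1)^t$ • since $x = (x_1, \dots, x_d)^t = x_1 e_1 + \cdots + x_d e_d$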

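  17. Review of Linear Algebra (Cont.) Basis Matrix, of subspace of $\mathbb{R}^d$ Given a basis, $v_1, \dots, v_n$, create matrix of columns: $B = (v_1 \cdots v_n)_{d \times n}$ Then “linear combo” is a matrix multiplicat’n: $\sum_{i=1}^n a_i v_i = B a$, where $a = (a_1, \dots, a_n)^t$ Check sizes: $d \times 1 = (d \times n)(n \times 1)$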
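
A minimal NumPy sketch of the basis-matrix idea, with arbitrary illustrative vectors (the names `v1`, `v2`, `B`, `a` are assumptions chosen for this example):

```python
import numpy as np

# Two basis vectors of a 2-dim subspace of R^3 (arbitrary illustrative values)
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])

B = np.column_stack([v1, v2])       # d x n basis matrix, here 3 x 2
a = np.array([2.0, -1.0])           # n coefficients

combo_sum = a[0] * v1 + a[1] * v2   # explicit linear combination
combo_mat = B @ a                   # the same thing as one matrix multiplication
```

The size check reads (3 x 1) = (3 x 2)(2 x 1).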

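  18. Review of Linear Algebra (Cont.) Aside on matrix multiplication: (linear transformat’n) For matrices $A_{m \times k}$, $B_{k \times n}$, Define the “matrix product” $AB$ as the $m \times n$ matrix with entries $(AB)_{ij} = \sum_{l=1}^{k} a_{il} b_{lj}$ (“inner products” of rows of $A$ with columns of $B$) (composition of linear transformations) Often useful to check sizes: $m \times n = (m \times k)(k \times n)$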

  19. Review of Linear Algebra (Cont.) • Matrix trace: • For a square matrix $A_{m \times m}$ • Define $\mathrm{tr}(A) = \sum_{i=1}^{m} a_{ii}$ • Trace commutes with matrix multiplication: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
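
A quick numerical check of the size rule for matrix products and of the trace identity, using random matrices (sizes and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # 3 x 5
B = rng.standard_normal((5, 3))   # 5 x 3

AB = A @ B                        # (3 x 5)(5 x 3) gives 3 x 3
BA = B @ A                        # (5 x 3)(3 x 5) gives 5 x 5

tr_AB = np.trace(AB)
tr_BA = np.trace(BA)              # equal to tr_AB even though the sizes differ
```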

  20. Review of Linear Algebra (Cont.) • Dimension of subspace (a notion of “size”): • number of elements in a basis (unique) • $\dim(\mathbb{R}^d) = d$ (use basis above) • e.g. dim of a line is 1 • e.g. dim of a plane is 2 • dimension is “degrees of freedom”

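  21. Review of Linear Algebra (Cont.) • Norm of a vector: • in $\mathbb{R}^d$, $\|x\| = \left(\sum_{j=1}^{d} x_j^2\right)^{1/2} = (x^t x)^{1/2}$ • Idea: “length” of the vector • Note: strange properties for high $d$, • e.g. “length of diagonal of unit cube” = $\sqrt{d}$ • “length normalized vector”: $x / \|x\|$ • (has length one, thus on surf. of unit sphere) • get “distance” as: $\mathrm{dist}(x, y) = \|x - y\| = \sqrt{(x - y)^t (x - y)}$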
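
The norm, the normalized vector, and the growth of the unit-cube diagonal with dimension can be checked numerically. The helper name `unit_cube_diagonal_length` and the example vectors are assumptions for illustration:

```python
import numpy as np

def unit_cube_diagonal_length(d):
    """Length of the vector (1, ..., 1) in R^d, i.e. the unit cube diagonal."""
    return np.linalg.norm(np.ones(d))

x = np.array([3.0, 4.0])
y = np.array([0.0, 0.0])

x_unit = x / np.linalg.norm(x)    # length-normalized vector, on the unit sphere
dist = np.linalg.norm(x - y)      # Euclidean distance between x and y

# The diagonal grows like sqrt(d), so it is long in high dimensions
diagonals = {d: unit_cube_diagonal_length(d) for d in (2, 100, 10_000)}
```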

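  22. Review of Linear Algebra (Cont.) • Inner (dot, scalar) product: • for vectors $x$ and $y$, $\langle x, y \rangle = \sum_{j=1}^{d} x_j y_j = x^t y$ • related to norm, via $\|x\| = \sqrt{\langle x, x \rangle}$ • measures “angle between $x$ and $y$” as: $\mathrm{angle}(x, y) = \cos^{-1}\left(\frac{\langle x, y \rangle}{\|x\| \, \|y\|}\right)$ • key to “orthogonality”, i.e. “perpendicul’ty”: • $x \perp y$ if and only if $\langle x, y \rangle = 0$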
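
A small sketch of the angle formula and the orthogonality test. The function name `angle_between` is hypothetical, and the clipping step is an added numerical safeguard against rounding slightly outside [-1, 1]:

```python
import numpy as np

def angle_between(x, y):
    """Angle in radians between x and y, from the inner product formula."""
    cosine = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cosine, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
z = np.array([0.0, 2.0])

angle_xy = angle_between(x, y)    # 45 degrees in radians
angle_xz = angle_between(x, z)    # x and z are orthogonal: inner product is 0
```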

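  23. Review of Linear Algebra (Cont.) • Orthonormal basis $v_1, \dots, v_n$: • All ortho to each other, i.e. $\langle v_i, v_{i'} \rangle = 0$, for $i \neq i'$ • All have length 1, i.e. $\langle v_i, v_i \rangle = 1$, for $i = 1, \dots, n$ • “Spectral Representation”: $x = \sum_{i=1}^{n} a_i v_i$ where $a_i = \langle x, v_i \rangle$ • check: $\langle x, v_i \rangle = \left\langle \sum_{i'} a_{i'} v_{i'}, v_i \right\rangle = \sum_{i'} a_{i'} \langle v_{i'}, v_i \rangle = a_i$ • Matrix notation: $x = B a$ where $a^t = x^t B$ i.e. $a = B^t x$ • $a$ is called “transform (e.g. Fourier, wavelet) of $x$”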

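  24. Review of Linear Algebra (Cont.) • Parseval identity, for • $x$ in subsp. gen’d by o. n. basis $v_1, \dots, v_n$: $\|x\|^2 = \sum_{i=1}^{n} \langle x, v_i \rangle^2 = \sum_{i=1}^{n} a_i^2 = \|a\|^2$ • Pythagorean theorem • “Decomposition of Energy” • ANOVA - sums of squares • Transform, $a$, has same length as $x$, • i.e. “rotation in $\mathbb{R}^d$”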
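
The spectral representation and the Parseval identity can be verified numerically. In this sketch the orthonormal basis is obtained from a QR factorization of a random matrix, which is just one convenient way to get such a basis (an assumption of the example, not something the slides prescribe):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# Columns of B form an orthonormal basis of R^d
B, _ = np.linalg.qr(rng.standard_normal((d, d)))
x = rng.standard_normal(d)

a = B.T @ x          # transform: coefficients a_i = <x, v_i>
x_recon = B @ a      # spectral representation: x = sum_i a_i v_i

energy_x = x @ x     # squared length of x
energy_a = a @ a     # squared length of the transform (equal, by Parseval)
```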

  25. Review of Linear Algebra (Cont.) Next time: add part about Gram-Schmidt Ortho-normalization

  26. Review of Linear Algebra (Cont.) • Projection of a vector $x$ onto a subspace $V$: • Idea: member of $V$ that is closest to $x$ • (i.e. “approx’n”) • Find $P_V x \in V$ that solves: $\min_{v \in V} \|x - v\|$ (“least squa’s”) • For inner product (Hilbert) space: • $P_V x$ exists and is unique • General solution in $\mathbb{R}^d$: for basis matrix $B_V$, $P_V x = B_V (B_V^t B_V)^{-1} B_V^t x$ • So “proj’n operator” is “matrix mult’n”: $P_V = B_V (B_V^t B_V)^{-1} B_V^t$ • (thus projection is another linear operation) • (note same operation underlies least squares)
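
A sketch of the projection formula, compared against a least squares solve to illustrate that the same operation underlies both. The basis matrix is random (so it has full column rank with probability one), and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 2
B = rng.standard_normal((d, n))    # basis matrix of a 2-dim subspace of R^5
x = rng.standard_normal(d)

# Projection operator P = B (B^t B)^{-1} B^t, computed with a solve, not an inverse
P = B @ np.linalg.solve(B.T @ B, B.T)
proj = P @ x

# Same answer from least squares: minimize ||x - B c|| over coefficients c
coef, *_ = np.linalg.lstsq(B, x, rcond=None)
proj_ls = B @ coef

resid = x - proj                   # orthogonal to every basis vector
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice; for badly conditioned basis matrices, a QR-based approach is usually preferred.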

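  27. Review of Linear Algebra (Cont.) • Projection using orthonormal basis $v_1, \dots, v_n$: • Basis matrix is “orthonormal”: $B_V^t B_V = I_{n \times n}$ • So $P_V x = B_V B_V^t x$ = Recon(Coeffs of $x$ “in $V$ dir’n”) • For “orthogonal complement”, $V^\perp$, • $x = P_V x + P_{V^\perp} x$ and $\|x\|^2 = \|P_V x\|^2 + \|P_{V^\perp} x\|^2$ • Parseval inequality: $\|P_V x\|^2 = \sum_{i=1}^{n} \langle x, v_i \rangle^2 = \|a\|^2 \le \|x\|^2$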
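
With an orthonormal basis the projection reduces to two matrix multiplications. The following sketch checks the Pythagorean decomposition and the Parseval inequality on random data (the basis comes from a reduced QR factorization, an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 6, 3

# Reduced QR: Q is d x n with orthonormal columns spanning a 3-dim subspace
Q, _ = np.linalg.qr(rng.standard_normal((d, n)))
x = rng.standard_normal(d)

a = Q.T @ x           # coefficients of x in the subspace directions
proj = Q @ a          # reconstruction from the coefficients: P_V x
resid = x - proj      # component in the orthogonal complement

energy_total = x @ x
energy_proj = proj @ proj
energy_resid = resid @ resid
```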

  28. Review of Linear Algebra (Cont.) • (Real) Unitary Matrices: $U_{d \times d}$ with $U^t U = I$ • Orthonormal basis matrix • (so all of above applies) • Follows that $U U^t = I$ • (since have full rank, so $U^{-1}$ exists …) • Lin. trans. (mult. by $U$) is like “rotation” of $\mathbb{R}^d$ • But also includes “mirror images”
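
A 2-d illustration of the two kinds of real unitary matrix: a rotation and a mirror image. Both preserve lengths; the determinant (+1 versus -1) tells them apart. The angle and test vector are arbitrary assumptions:

```python
import numpy as np

theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[1.0,  0.0],
                       [0.0, -1.0]])   # mirror image across the first axis

x = np.array([2.0, 1.0])

len_x = np.linalg.norm(x)
len_rotated = np.linalg.norm(rotation @ x)
len_reflected = np.linalg.norm(reflection @ x)

det_rotation = np.linalg.det(rotation)
det_reflection = np.linalg.det(reflection)
```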

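  29. Review of Linear Algebra (Cont.) Singular Value Decomposition (SVD): For a matrix $X_{d \times n}$ Find a diagonal matrix $S_{d \times n}$, with entries $s_1, \dots, s_{\min(d,n)}$ called singular values And unitary (rotation) matrices $U_{d \times d}$, $V_{n \times n}$ (recall $U^t U = V^t V = I$) so that $X = U S V^t$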

  30. Review of Linear Algebra (Cont.) • Intuition behind Singular Value Decomposition: • For $X$ a “linear transf’n” (via matrix multi’n): $X v = U S V^t v$ • First rotate (by $V^t$) • Second rescale coordinate axes (by $s_i$) • Third rotate again (by $U$) • i.e. have diagonalized the transformation

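  31. Review of Linear Algebra (Cont.) SVD Compact Representation: Useful Labeling: Singular Values in Decreasing Order $s_1 \ge s_2 \ge \cdots \ge s_{\min(d,n)} \ge 0$ Note: singular values = 0 can be omitted Let $r$ = # of positive singular values Then: $X = U_{d \times r} S_{r \times r} V_{n \times r}^t$ Where $U_{d \times r}, S_{r \times r}, V_{n \times r}$ are truncations of $U, S, V$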
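
The full and compact SVD can be sketched with `np.linalg.svd`, which returns singular values in decreasing order. The test matrix is built to have rank 2, and the threshold used to decide which singular values count as zero (`tol`) is an assumption of this example:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, true_rank = 6, 4, 2

# A d x n matrix of rank 2, so two singular values are numerically zero
X = rng.standard_normal((d, true_rank)) @ rng.standard_normal((true_rank, n))

# Full SVD: U is d x d, s holds min(d, n) singular values, Vt is V^t (n x n)
U, s, Vt = np.linalg.svd(X, full_matrices=True)

tol = 1e-10 * s[0]                 # illustrative cutoff for "zero"
r = int(np.sum(s > tol))           # number of positive singular values

# Compact representation: keep only the first r columns / values / rows
U_r, S_r, Vt_r = U[:, :r], np.diag(s[:r]), Vt[:r, :]
X_compact = U_r @ S_r @ Vt_r
```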

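  32. Review of Linear Algebra (Cont.) Eigenvalue Decomposition: For a (symmetric) square matrix $X_{d \times d}$ Find a diagonal matrix $D = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$ And an orthonormal matrix $B_{d \times d}$ (i.e. $B^t B = B B^t = I$) So that: $X B = B D$, i.e. $X = B D B^t$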

  33. Review of Linear Algebra (Cont.) • Eigenvalue Decomposition (cont.): • Relation to Singular Value Decomposition • (looks similar?): $X = U S V^t$ vs. $X = B D B^t$ • Eigenvalue decomposition “harder” • Since needs $U = V$ • Price is eigenvalue decomp’n is generally complex • Except for $X$ square and symmetric • Then eigenvalue decomp. is real valued • Thus is the sing’r value decomp. with: $U = V = B$ and $S = D$ (when all $\lambda_i \ge 0$)

  34. Review of Linear Algebra (Cont.) • Computation of Singular Value and Eigenvalue Decompositions: • Details too complex to spend time here • A “primitive” of good software packages • Eigenvalues $\lambda_1, \dots, \lambda_d$ are unique • Columns of $B = (v_1 \cdots v_d)$ are called • “eigenvectors” • Eigenvectors are “$\lambda$-stretched” by $X$: $X v_i = \lambda_i v_i$

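  35. Review of Linear Algebra (Cont.) • Eigenvalue Decomp. solves matrix problems: • Inversion: $X^{-1} = B \, \mathrm{diag}(\lambda_1^{-1}, \dots, \lambda_d^{-1}) \, B^t$ • Square Root: $X^{1/2} = B \, \mathrm{diag}(\lambda_1^{1/2}, \dots, \lambda_d^{1/2}) \, B^t$ • $X$ is positive (nonn’ve, i.e. semi) definite $\iff$ all $\lambda_i > (\ge) \ 0$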
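
A sketch of inversion and square root through the eigenvalue decomposition, using `np.linalg.eigh` for a symmetric matrix. Note that `eigh` returns eigenvalues in increasing order, the reverse of the ordering used for PCA. The test matrix is an arbitrary symmetric positive definite matrix built for the example:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
A = rng.standard_normal((d, d + 2))
X = A @ A.T                        # symmetric, positive definite almost surely

lam, B = np.linalg.eigh(X)         # eigenvalues (ascending), eigenvectors as columns

X_inv = B @ np.diag(1.0 / lam) @ B.T       # inversion via 1 / lambda_i
X_sqrt = B @ np.diag(np.sqrt(lam)) @ B.T   # square root via sqrt(lambda_i)

is_positive_definite = bool(np.all(lam > 0))
```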

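  36. Review of Multivariate Probability Given a “random vector”, $X = (X_1, \dots, X_d)^t$ A “center” of the distribution is the mean vector, $\mu = EX = (EX_1, \dots, EX_d)^t$ A “measure of spread” is the covariance matrix: $\Sigma = \mathrm{cov}(X)$, the $d \times d$ matrix with entries $\Sigma_{jk} = \mathrm{cov}(X_j, X_k)$ (so $\Sigma_{jj} = \mathrm{var}(X_j)$)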

  37. Review of Multivar. Prob. (Cont.) • Covariance matrix: • Nonneg’ve Definite (since all varia’s are $\ge 0$) • Provides “elliptical summary of distribution” • Calculated via “outer product”: $\mathrm{cov}(X) = E\left[(X - EX)(X - EX)^t\right]$

  38. Review of Multivar. Prob. (Cont.) Empirical versions: Given a random sample $X_1, \dots, X_n$, Estimate the theoretical mean $\mu$, with the sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

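  39. Review of Multivar. Prob. (Cont.) Empirical versions (cont.) And estimate the “theoretical cov.” $\Sigma$, with the “sample cov.”: $\hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^t$ Normalizations: $\frac{1}{n-1}$ gives unbiasedness, $\frac{1}{n}$ gives MLE in Gaussian case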

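  40. Review of Multivar. Prob. (Cont.) Outer product representation: $\hat{\Sigma} = \tilde{X} \tilde{X}^t$, where: $\tilde{X} = \frac{1}{\sqrt{n-1}} \left(X_1 - \bar{X} \ \cdots \ X_n - \bar{X}\right)_{d \times n}$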
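
The outer product representation of the sample covariance can be checked against `np.cov`, which by default treats rows as variables and columns as observations and uses the 1/(n-1) normalization. The data here are synthetic and the variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 3, 50
X = rng.standard_normal((d, n))            # columns are the data vectors X_1..X_n

xbar = X.mean(axis=1, keepdims=True)       # sample mean vector, d x 1
X_tilde = (X - xbar) / np.sqrt(n - 1)      # scaled, mean-centered data matrix

Sigma_hat = X_tilde @ X_tilde.T            # unbiased version, 1/(n-1)
Sigma_mle = Sigma_hat * (n - 1) / n        # Gaussian MLE version, 1/n
```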

  41. PCA as an Optimization Problem Find “direction of greatest variability”:

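  42. PCA as Optimization (Cont.) Find “direction of greatest variability”: Given a “direction vector”, $u \in \mathbb{R}^d$ (i.e. $\|u\| = 1$) Projection of $X_i - \bar{X}$ in the direction $u$: $P_u (X_i - \bar{X}) = \langle X_i - \bar{X}, u \rangle \, u$ Variability in the direction $u$: $\sum_{i=1}^{n} \|P_u (X_i - \bar{X})\|^2 = \sum_{i=1}^{n} \langle X_i - \bar{X}, u \rangle^2$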

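  43. PCA as Optimization (Cont.) Variability in the direction $u$: $\sum_{i=1}^{n} \langle X_i - \bar{X}, u \rangle^2 = u^t \left[\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^t\right] u = (n-1) \, u^t \hat{\Sigma} u$ i.e. (proportional to) a quadratic form in the covariance matrix Simple solution comes from the eigenvalue representation of $\hat{\Sigma}$: $\hat{\Sigma} = B D B^t$ where $B = (v_1 \cdots v_d)$ is orthonormal, & $D = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$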

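  44. PCA as Optimization (Cont.) Variability in the direction $u$: $(n-1) \, u^t \hat{\Sigma} u = (n-1) \, u^t B D B^t u = (n-1) \, (B^t u)^t D (B^t u)$ But $B^t u$ = “$B$ transform of $u$” = “$u$ rotated into $B$ coordinates”, and the diagonalized quadratic form becomes $(n-1) \sum_{j=1}^{d} \lambda_j \langle u, v_j \rangle^2$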

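  45. PCA as Optimization (Cont.) Now since $B$ is an orthonormal basis matrix, $u = \sum_{j=1}^{d} \langle u, v_j \rangle v_j$ and $1 = \|u\|^2 = \sum_{j=1}^{d} \langle u, v_j \rangle^2$ So the rotation $B^t u$ gives a distribution of the (unit) energy of $u$ over the $\hat{\Sigma}$ eigen-directions And $\sum_{j=1}^{d} \lambda_j \langle u, v_j \rangle^2$ is max’d (over $u$), by putting all energy in the “largest direction”, i.e. $u = v_1$, where “eigenvalues are ordered”, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$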

  46. PCA as Optimization (Cont.) • Notes: • Solution is unique (up to sign) when $\lambda_1 > \lambda_2$ • Else have sol’ns in subsp. gen’d by 1st $v_j$s • Projecting onto subspace $\perp$ to $v_1$, • gives $v_2$ as next direction • Continue through $v_3$,…,$v_d$ • Replace $\hat{\Sigma}$ by $\Sigma$ to get theoretical PCA • Estimated by the empirical version
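
The optimization view of PCA can be checked numerically: the leading eigenvector of the sample covariance should beat any other unit direction in projected variability. This is a sketch on synthetic data. The mixing matrix, the scale values, the helper `variability`, and the random-search comparison are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 3, 200

# Toy data: unequal spread along three axes, then linearly mixed (hypothetical)
mix = rng.standard_normal((d, d))
scales = np.array([[3.0], [1.0], [0.3]])
X = mix @ (scales * rng.standard_normal((d, n)))   # columns are data vectors

X_c = X - X.mean(axis=1, keepdims=True)            # residuals from the mean
Sigma_hat = X_c @ X_c.T / (n - 1)                  # sample covariance

lam, B = np.linalg.eigh(Sigma_hat)                 # ascending order
order = np.argsort(lam)[::-1]                      # reorder: lambda_1 >= ... >= lambda_d
lam, B = lam[order], B[:, order]

def variability(u):
    """Sum of squared projections onto unit vector u: (n-1) u^t Sigma_hat u."""
    return np.sum((u @ X_c) ** 2)

v1 = B[:, 0]                                       # first PC direction

# Compare against many random unit directions; none should do better than v1
random_dirs = rng.standard_normal((1000, d))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
best_random = max(variability(u) for u in random_dirs)
```

The remaining columns of `B` give the later directions: each maximizes variability within the subspace orthogonal to the earlier ones.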

  47. Iterated PCA Visualization

  48. Connect Math to Graphics 2-d Toy Example Feature Space Object Space Data Points (Curves) are columns of data matrix, X

  49. Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Sample Mean, $\bar{X}$

  50. Connect Math to Graphics (Cont.) 2-d Toy Example Feature Space Object Space Residuals from Mean = Data - Mean
