
Mastering Dimensionality Reduction through Principal Component Analysis

Explore the motivation behind dimensionality reduction, the principles of Principal Component Analysis (PCA), its applications, and comparisons with other methods. Learn how to reduce computational expense, visualize data effectively, and preserve variance. Discover how to find optimal projections, select the number of components, and apply PCA to tasks like face recognition. Finally, examine the limitations of PCA and alternative dimensionality reduction methods such as Fisher Linear Discriminants, Kernel PCA, Independent Component Analysis, Locally Linear Embedding, ISOMAP, and t-SNE.

richardw




Presentation Transcript


  1. CS 2750: Machine Learning, Dimensionality Reduction, Prof. Adriana Kovashka, University of Pittsburgh, January 19, 2017

  2. Plan for today • Dimensionality reduction – motivation • Principal Component Analysis (PCA) • Applications of PCA • Other methods for dimensionality reduction

  3. Why reduce dimensionality? • Data may intrinsically live in a lower-dim space • Too many features and too few data • Lower computational expense (memory, train/test time) • Want to visualize the data in a lower-dim space • Want to use data of different dimensionality

  4. Goal • Input: Data in a high-dim feature space • Output: Projection of same data into a lower-dim space • F: high-dim X → low-dim X

  5. Goal Slide credit: Erik Sudderth

  6. Some criteria for success • Find a projection where the data has: • Low reconstruction error • High variance of the data See hand-written notes for how we find the optimal projection
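The slide defers the derivation to hand-written notes. As a hedged numerical illustration (not the notes themselves), this NumPy sketch on synthetic 2-D data checks that, for centered data and a unit direction u, the variance along u plus the mean squared reconstruction error equals the total variance. Maximizing the first criterion is therefore the same as minimizing the second, and both pick the top eigenvector of the covariance. The data and helper names are illustrative assumptions.

```python
import numpy as np

# Synthetic 2-D data stretched along one direction (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)  # center the data

def projected_variance(Xc, u):
    """Variance of the centered data along the unit vector u."""
    return np.mean((Xc @ u) ** 2)

def reconstruction_error(Xc, u):
    """Mean squared distance from each point to its projection onto u."""
    X_hat = np.outer(Xc @ u, u)
    return np.mean(np.sum((Xc - X_hat) ** 2, axis=1))

cov = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
u_best = eigvecs[:, -1]                 # top eigenvector = max-variance direction
total = np.trace(cov)                   # total variance of the data

print(f"total variance = {total:.3f}")
for angle in np.linspace(0.0, np.pi, 5):
    u = np.array([np.cos(angle), np.sin(angle)])  # unit direction
    v, e = projected_variance(Xc, u), reconstruction_error(Xc, u)
    # v + e equals the total variance for every unit direction u.
    print(f"angle={angle:.2f}  variance={v:.3f}  error={e:.3f}  sum={v + e:.3f}")
print("top eigenvector:", projected_variance(Xc, u_best), reconstruction_error(Xc, u_best))
```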

  7. Principal Components Analysis Slide credit: Subhransu Maji
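A minimal PCA sketch in NumPy, under the assumption that each row of X is one example. It is not a translation of the linked PCA.m, whose contents are not reproduced here. It centers the data, eigendecomposes the sample covariance, sorts the eigenvalues in decreasing order, and projects onto the top k eigenvectors. The function names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, components, eigenvalues) for the top-k principal components.

    X: (N, D) data matrix, one example per row.
    components: (k, D) array whose rows are unit-length covariance eigenvectors.
    eigenvalues: all D eigenvalues, sorted in decreasing order.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mu, eigvecs[:, :k].T, eigvals

def pca_transform(X, mu, components):
    """Project each row x of X onto the subspace: w = U^T (x - mu)."""
    return (X - mu) @ components.T

# Usage on correlated toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
mu, U, lam = pca_fit(X, k=2)
W = pca_transform(X, mu, U)
print(W.shape)  # (200, 2)
```

Using `eigh` rather than `eig` exploits the symmetry of the covariance matrix, which guarantees real eigenvalues and orthonormal eigenvectors.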

  8. Demo • http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA_demo.m • http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA.m • Demo with eigenfaces: http://www.cs.ait.ac.th/~mdailey/matlab/

  9. Implementation issue • Covariance matrix is huge ($D^2$ entries for $D$ pixels) • But typically # examples $N \ll D$ • Simple trick • $X$ is the $N \times D$ matrix of normalized training data • Solve for eigenvectors $u$ of $XX^T$ ($N \times N$) instead of $X^TX$ ($D \times D$) • Then $X^Tu$ is an eigenvector of the covariance $X^TX$, since $X^TX(X^Tu) = X^T(XX^Tu) = \lambda X^Tu$ • Need to normalize each vector $X^Tu$ to unit length Adapted from Derek Hoiem
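A hedged NumPy sketch of the small-matrix trick, assuming X is an N × D matrix of mean-centered rows with N much smaller than D (random data stands in for face images). Each eigenvector u of the N × N matrix XXᵀ is mapped back to D dimensions as Xᵀu, which has squared length λ, and is then rescaled to unit length. The full D × D matrix is built only to verify the eigenvector equation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 20, 1000                       # few examples, many "pixels" (N << D)
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)                # mean-center each pixel

# Solve the small N x N eigenproblem instead of the D x D one.
eigvals, V = np.linalg.eigh(X @ X.T)  # columns of V: eigenvectors u of X X^T
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Map each u to X^T u (an eigenvector of X^T X with the same eigenvalue),
# then normalize to unit length (its norm before scaling is sqrt(lambda)).
U = X.T @ V[:, :5]                    # D x 5
U /= np.linalg.norm(U, axis=0)

# Verification only: build the expensive D x D matrix and check X^T X u = lambda u.
big = X.T @ X
residual = max(np.linalg.norm(big @ U[:, j] - eigvals[j] * U[:, j]) for j in range(5))
print(f"max eigen-equation residual: {residual:.2e}")
```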

  10. How to pick K? • One goal can be to pick K such that P% of the variance of the data is preserved, e.g. 90% • Let Λ = a vector containing the eigenvalues of the covariance matrix, sorted in decreasing order • Total variance can be obtained from entries of Λ • total_variance = sum(Λ); • Take as many of these entries as needed (with P written as a fraction, e.g. 0.9) • K = find( cumsum(Λ) / total_variance >= P, 1);
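The one-liner on the slide can be sketched in NumPy as follows, assuming p is a fraction in (0, 1]. The eigenvalues are sorted first, because `cumsum` only measures preserved variance when the largest eigenvalues come first. Dividing by the last cumulative sum makes the final ratio exactly 1.0, so p = 1.0 always returns all components. The name `choose_k` and the example eigenvalues are hypothetical.

```python
import numpy as np

def choose_k(eigvals, p=0.9):
    """Smallest K whose top-K eigenvalues preserve at least fraction p of the variance."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # decreasing order
    cum = np.cumsum(lam)
    frac = cum / cum[-1]                 # fraction of total variance kept by top K
    return int(np.argmax(frac >= p)) + 1  # +1: K counts components (MATLAB find is 1-based)

lam = np.array([0.5, 5.0, 1.5, 3.0])     # hypothetical eigenvalues, any order
print(choose_k(lam, 0.9))                # 3
```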

  11. Variance preserved at i-th eigenvalue Figure 12.4 (a) from Bishop

  12. Application: Face Recognition Image from cnet.com

  13. Face recognition: once you’ve detected and cropped a face, try to recognize it [figure: Detection → Recognition → “Sally”] Slide credit: Lana Lazebnik

  14. Typical face recognition scenarios • Verification: a person is claiming a particular identity; verify whether that is true • E.g., security • Closed-world identification: assign a face to one person from among a known set • General identification: assign a face to a known person or to “unknown” Slide credit: Derek Hoiem
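The three scenarios differ mainly in the decision rule applied to distances between face representations. This minimal sketch assumes each face is already a feature vector (for example, PCA coefficients) and uses Euclidean distance with a hand-chosen threshold. The function names, threshold, and gallery values are hypothetical, not taken from the slides.

```python
import numpy as np

def distances(probe, gallery):
    """Euclidean distance from one probe vector to each row of the gallery."""
    return np.linalg.norm(gallery - probe, axis=1)

def verify(probe, claimed_template, threshold):
    """Verification: accept the claimed identity if the probe is close enough."""
    return float(np.linalg.norm(probe - claimed_template)) <= threshold

def identify_closed(probe, gallery, labels):
    """Closed-world identification: always return the nearest known person."""
    return labels[int(np.argmin(distances(probe, gallery)))]

def identify_open(probe, gallery, labels, threshold):
    """General identification: nearest known person, or "unknown" if too far."""
    d = distances(probe, gallery)
    i = int(np.argmin(d))
    return labels[i] if d[i] <= threshold else "unknown"

gallery = np.array([[0.0, 0.0], [10.0, 0.0]])  # hypothetical enrolled faces
labels = ["Sally", "Bob"]
stranger = np.array([3.0, 20.0])
print(identify_closed(stranger, gallery, labels))     # Sally (forced choice)
print(identify_open(stranger, gallery, labels, 2.0))  # unknown
```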

  15. The space of all face images • When viewed as vectors of pixel values, face images are extremely high-dimensional • 24x24 image = 576 dimensions • Slow and lots of storage • But very few 576-dimensional vectors are valid face images • We want to effectively model the subspace of face images Adapted from Derek Hoiem

  16. Representation and reconstruction • Face x in “face space” coordinates: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$ • Reconstruction: $\hat{x} = \mu + w_1u_1 + w_2u_2 + w_3u_3 + w_4u_4 + \dots$ Slide credit: Derek Hoiem
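A hedged sketch of the reconstruction formula x̂ = µ + w₁u₁ + … + wₖuₖ with wⱼ = uⱼᵀ(x − µ), on random correlated data standing in for face vectors. Each added orthonormal component can only remove error, so the reconstruction error is non-increasing in k. With all components it drops to (numerically) zero. The name `reconstruct` is hypothetical.

```python
import numpy as np

def reconstruct(x, mu, U, k):
    """Return mu + sum_{j<k} w_j u_j, where w_j = u_j^T (x - mu).

    U: (D, D) matrix whose columns are orthonormal principal components,
    sorted by decreasing eigenvalue.
    """
    Uk = U[:, :k]
    w = Uk.T @ (x - mu)  # face-space coordinates
    return mu + Uk @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))  # toy stand-in data
mu = X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
U = eigvecs[:, np.argsort(eigvals)[::-1]]  # columns in decreasing-eigenvalue order

x = X[0]
errors = [np.linalg.norm(x - reconstruct(x, mu, U, k)) for k in range(7)]
print(np.round(errors, 3))  # shrinks as k grows, ~0 at k = 6
```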

  17. Recognition w/ eigenfaces Process labeled training images • Find mean $\mu$ and covariance matrix $\Sigma$ • Find k principal components (eigenvectors of $\Sigma$) $u_1, \dots, u_k$ • Project each training image $x_i$ onto subspace spanned by principal components: $(w_{i1}, \dots, w_{ik}) = (u_1^T(x_i - \mu), \dots, u_k^T(x_i - \mu))$ Given novel image x • Project onto subspace: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$ • Classify as closest training face in k-dimensional subspace M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991 Adapted from Derek Hoiem
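The eigenface recognition steps can be sketched end to end in NumPy. This is a minimal illustration, not Turk and Pentland's code. It assumes flattened grayscale images as rows and subtracts the mean before projecting, then classifies a novel image by nearest neighbor in the k-dimensional subspace. The "faces" are synthetic (three hypothetical people, 8×8 images), and the function names are hypothetical.

```python
import numpy as np

def train_eigenfaces(X_train, k):
    """Return mean, top-k eigenvectors (D x k), and training coefficients (N x k)."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / len(X_train))
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k principal components
    W_train = Xc @ U                               # w_ij = u_j^T (x_i - mu)
    return mu, U, W_train

def classify(x, mu, U, W_train, labels):
    """Project a novel image and return the label of the closest training face."""
    w = U.T @ (x - mu)
    return labels[int(np.argmin(np.linalg.norm(W_train - w, axis=1)))]

# Synthetic stand-in for cropped faces: 3 people, 5 noisy 8x8 images each.
rng = np.random.default_rng(1)
prototypes = rng.normal(size=(3, 64)) * 5
X_train = np.vstack([p + rng.normal(size=(5, 64)) for p in prototypes])
labels = np.repeat(["A", "B", "C"], 5)

mu, U, W_train = train_eigenfaces(X_train, k=5)
probe = prototypes[1] + rng.normal(size=64)  # new image of person "B"
print(classify(probe, mu, U, W_train, labels))
```

For real images with many pixels, the D × D covariance here would be replaced by the small N × N eigenproblem from the implementation-issue slide.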

  18. Slide credit: Alexander Ihler

  19. Slide credit: Alexander Ihler

  20. Slide credit: Alexander Ihler

  21. Slide credit: Alexander Ihler

  22. Slide credit: Alexander Ihler

  23. Slide credit: Alexander Ihler

  24. Slide credit: Alexander Ihler

  25. Slide credit: Alexander Ihler

  26. Slide credit: Alexander Ihler

  27. Slide credit: Alexander Ihler

  28. Plan for today • Dimensionality reduction – motivation • Principal Component Analysis (PCA) • Applications of PCA • Other methods for dimensionality reduction

  29. PCA • General dimensionality reduction technique • Preserves most of the variance with a much more compact representation • Lower storage requirements (eigenvectors + a few numbers per face) • Faster matching • What are some problems? Slide credit: Derek Hoiem
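To make the storage claim concrete, here is a back-of-the-envelope count. It uses the 24×24 = 576-pixel faces from the earlier slide together with a hypothetical gallery of 1,000 faces and k = 20 components (both numbers are illustrative assumptions). PCA stores k eigenvectors plus the mean once, and only k coefficients per face. Matching a probe against the gallery likewise compares k numbers per face instead of 576.

```python
def storage_counts(n_faces, n_pixels, k):
    """Numbers stored for raw pixels vs. a k-component PCA representation."""
    raw = n_faces * n_pixels
    pca = n_pixels * k + n_pixels + n_faces * k  # eigenvectors + mean + coefficients
    return raw, pca

raw, pca = storage_counts(n_faces=1000, n_pixels=24 * 24, k=20)
print(raw, pca)  # 576000 32096
```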

  30. PCA limitations • The direction of maximum variance is not always good for classification Slide credit: Derek Hoiem

  31. PCA limitations • PCA preserves maximum variance • A more discriminative subspace: Fisher Linear Discriminants • FLD preserves discrimination • Find projection that maximizes scatter between classes and minimizes scatter within classes Adapted from Derek Hoiem
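A hedged toy comparison of the two criteria. Assume two classes that are spread widely along x1 but separated only along x2. PCA, which ignores labels, picks x1 as the direction of maximum variance. The two-class Fisher direction w ∝ S_W⁻¹(m_A − m_B), with S_W the within-class scatter, picks x2. The data and helper names are illustrative; `separation` is the Fisher ratio (squared gap between projected class means over summed within-class variances), which FLD maximizes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two elongated classes: large spread along x1, separated only along x2.
cov = np.array([[10.0, 0.0], [0.0, 0.2]])
A = rng.multivariate_normal([0.0, -1.0], cov, size=200)
B = rng.multivariate_normal([0.0, 1.0], cov, size=200)

def fisher_direction(A, B):
    """Two-class FLD: w proportional to S_W^{-1} (m_A - m_B), unit length."""
    mA, mB = A.mean(axis=0), B.mean(axis=0)
    S_W = (A - mA).T @ (A - mA) + (B - mB).T @ (B - mB)  # within-class scatter
    w = np.linalg.solve(S_W, mA - mB)
    return w / np.linalg.norm(w)

def pca_direction(X):
    """Top principal component of the pooled data (labels are ignored)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
    return eigvecs[:, -1]

def separation(A, B, w):
    """Fisher ratio along w: squared mean gap over summed within-class variances."""
    a, b = A @ w, B @ w
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

w_fld = fisher_direction(A, B)
w_pca = pca_direction(np.vstack([A, B]))
print("PCA direction:", np.round(w_pca, 3), "separation:", round(separation(A, B, w_pca), 3))
print("FLD direction:", np.round(w_fld, 3), "separation:", round(separation(A, B, w_fld), 3))
```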

  32. Fisher’s Linear Discriminant • Using two classes as example: [figure: two scatter plots over axes x1 and x2, labeled “Poor Projection” and “Good”] Slide credit: Derek Hoiem

  33. Comparison with PCA Slide credit: Derek Hoiem

  34. Other dimensionality reduction methods • Non-linear: • Kernel PCA – Schölkopf et al., Neural Computation 1998 • Independent component analysis – Comon, Signal Processing 1994 • LLE (locally linear embedding) – Roweis and Saul, Science 2000 • ISOMAP (isometric feature mapping) – Tenenbaum et al., Science 2000 • t-SNE (t-distributed stochastic neighbor embedding) – van der Maaten and Hinton, JMLR 2008
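As one example of a non-linear method from this list, here is a minimal kernel PCA sketch in NumPy. It assumes an RBF kernel with a hand-picked gamma (an illustrative choice) and the usual formulation: center the kernel matrix in feature space, eigendecompose it, and project training point i onto component j as sqrt(λⱼ)·αᵢⱼ for unit-norm eigenvectors α. It only reports the shape of the embedding and makes no claim about how well the components separate the two circles.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_pca(X, k, gamma=1.0):
    """Project the training points onto the top-k kernel principal components."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:k]
    lam, alphas = eigvals[idx], eigvecs[:, idx]
    # Projection of training point i on component j is sqrt(lam_j) * alpha_ij.
    return alphas * np.sqrt(np.maximum(lam, 0.0))

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])  # two circles
Z = kernel_pca(X, k=2, gamma=0.5)
print(Z.shape)  # (100, 2)
```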

  35. ISOMAP example Figure from Carlotta Domeniconi

  36. ISOMAP example Figure from Carlotta Domeniconi

  37. t-SNE example Figure from Genevieve Patterson, IJCV 2014

  38. t-SNE example Thomas and Kovashka, CVPR 2016

  39. t-SNE example Thomas and Kovashka, CVPR 2016
