CS 2750: Machine Learning Dimensionality Reduction Prof. Adriana Kovashka, University of Pittsburgh January 19, 2017
Plan for today • Dimensionality reduction – motivation • Principal Component Analysis (PCA) • Applications of PCA • Other methods for dimensionality reduction
Why reduce dimensionality? • Data may intrinsically live in a lower-dim space • Too many features relative to the number of training examples • Lower computational expense (memory, train/test time) • Want to visualize the data in a lower-dim space • Want to use data of different dimensionality
Goal • Input: Data in a high-dim feature space • Output: Projection of the same data into a lower-dim space • f: high-dim x → low-dim x
Goal Slide credit: Erik Sudderth
Some criteria for success • Find a projection where the data has: • Low reconstruction error • High variance of the data See hand-written notes for how we find the optimal projection
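A standard way to state the variance criterion (notation here is assumed shorthand, not copied from the hand-written notes): find a unit vector u maximizing the projected variance uᵀSu, where S = (1/N) Σn (xn − x̄)(xn − x̄)ᵀ is the data covariance. A Lagrange-multiplier argument gives Su = λu, so the best single direction is the eigenvector of S with the largest eigenvalue, and minimizing squared reconstruction error leads to the same subspace.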
Principal Components Analysis Slide credit: Subhransu Maji
Demo • http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA_demo.m • http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA.m • Demo with eigenfaces: http://www.cs.ait.ac.th/~mdailey/matlab/
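As a rough sketch of what such a demo might compute (this is not the contents of PCA_demo.m; the toy data and variable names are made up for illustration):

    X = randn(200, 5) * randn(5, 5);     % toy data: 200 examples, 5 correlated dims
    mu = mean(X, 1);                     % 1-by-D mean
    Xc = X - mu;                         % center the data
    S  = (Xc' * Xc) / size(X, 1);        % D-by-D covariance
    [U, L] = eig(S);                     % columns of U are eigenvectors of S
    [lambda, order] = sort(diag(L), 'descend');
    U = U(:, order);                     % principal components, largest variance first
    K = 2;
    Z = Xc * U(:, 1:K);                  % N-by-K projection onto the top K components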
Implementation issue • Covariance matrix is huge (D² entries for D pixels) • But typically # examples N << D • Simple trick • X is NxD matrix of normalized (mean-subtracted) training data • Solve for eigenvectors u of XXᵀ instead of XᵀX • Then Xᵀu is an eigenvector of the covariance XᵀX • Need to normalize each vector Xᵀu to unit length Adapted from Derek Hoiem
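A hedged sketch of this trick in MATLAB (variable names are assumptions, not the course code):

    X = randn(20, 1000);                 % toy case: N = 20 examples, D = 1000 dims
    X = X - mean(X, 1);                  % center the data
    [V, L] = eig(X * X');                % small N-by-N eigenproblem instead of D-by-D
    [lambda, order] = sort(diag(L), 'descend');
    V = V(:, order);
    U = X' * V;                          % each column Xᵀv is an eigenvector of XᵀX
    U = U ./ sqrt(sum(U.^2, 1));         % normalize each column to unit length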
How to pick K? • One goal can be to pick K such that P% of the variance of the data is preserved, e.g. 90% • Let Λ = a vector containing the eigenvalues of the covariance matrix, sorted in decreasing order • Total variance can be obtained from the entries of Λ • total_variance = sum(Λ); • Take as many of the largest entries as needed • K = find( cumsum(Λ) / total_variance >= P, 1);
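Tying the choice of K to the eigendecomposition (a sketch; eig does not return the eigenvalues sorted, so order them largest-first before taking the cumulative sum; S is assumed to be the covariance matrix computed earlier):

    [U, L] = eig(S);
    [lambda, order] = sort(diag(L), 'descend');
    P = 0.90;                             % fraction of variance to preserve
    K = find(cumsum(lambda) / sum(lambda) >= P, 1);
    U_K = U(:, order(1:K));               % keep the top K principal components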
Variance preserved at i-th eigenvalue Figure 12.4 (a) from Bishop
Application: Face Recognition Image from cnet.com
Face recognition: once you’ve detected and cropped a face, try to recognize it (Figure: detection, then recognition of the cropped face as “Sally”) Slide credit: Lana Lazebnik
Typical face recognition scenarios • Verification: a person is claiming a particular identity; verify whether that is true • E.g., security • Closed-world identification: assign a face to one person from among a known set • General identification: assign a face to a known person or to “unknown” Slide credit: Derek Hoiem
The space of all face images • When viewed as vectors of pixel values, face images are extremely high-dimensional • 24x24 image = 576 dimensions • Slow and lots of storage • But very few 576-dimensional vectors are valid face images • We want to effectively model the subspace of face images Adapted from Derek Hoiem
Representation and reconstruction • Face x in “face space” coordinates: wi = uiᵀ(x − µ), i = 1, …, k • Reconstruction: x̂ = µ + w1u1 + w2u2 + w3u3 + w4u4 + … Slide credit: Derek Hoiem
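In code, the two operations on this slide might look as follows (a sketch; mu is assumed to be the 1-by-D mean face, U the D-by-k matrix whose columns are the principal components, and x a 1-by-D face vector):

    w    = (x - mu) * U;      % coordinates of x in "face space" (1-by-k)
    xhat = mu + w * U';       % reconstruction from the k coefficients (1-by-D)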
Recognition w/ eigenfaces Process labeled training images • Find mean µ and covariance matrix Σ • Find k principal components (eigenvectors of Σ) u1, …, uk • Project each training image xi onto the subspace spanned by the principal components: (wi1, …, wik) = (u1ᵀ(xi − µ), …, ukᵀ(xi − µ)) Given novel image x • Project onto the subspace: (w1, …, wk) = (u1ᵀ(x − µ), …, ukᵀ(x − µ)) • Classify as the closest training face in the k-dimensional subspace M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991 Adapted from Derek Hoiem
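A minimal sketch of the matching step (not Turk and Pentland’s code; Xtrain, labels, mu, U, and x are assumed variables):

    Wtrain = (Xtrain - mu) * U;               % project training faces: N-by-k
    w      = (x - mu) * U;                    % project the novel face: 1-by-k
    d      = sum((Wtrain - w).^2, 2);         % squared distances in the subspace
    [~, idx] = min(d);                        % index of the closest training face
    prediction = labels(idx);                 % predicted identity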
Plan for today • Dimensionality reduction – motivation • Principal Component Analysis (PCA) • Applications of PCA • Other methods for dimensionality reduction
PCA • General dimensionality reduction technique • Preserves most of the variance with a much more compact representation • Lower storage requirements (eigenvectors + a few numbers per face) • Faster matching • What are some problems? Slide credit: Derek Hoiem
PCA limitations • The direction of maximum variance is not always good for classification Slide credit: Derek Hoiem
PCA limitations • PCA preserves maximum variance • A more discriminative subspace: Fisher Linear Discriminants • FLD preserves discrimination • Find projection that maximizes scatter between classes and minimizes scatter within classes Adapted from Derek Hoiem
Fisher’s Linear Discriminant • Using two classes as an example: (Figure: the same two-class data projected onto two directions; a poor projection mixes the classes, a good one separates them) Slide credit: Derek Hoiem
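A minimal sketch of the two-class case pictured above, using the standard closed-form solution w ∝ Sw⁻¹(µ1 − µ2) (toy data; variable names are illustrative):

    X1 = randn(100, 2) + [3 0];               % class 1 (toy data)
    X2 = randn(100, 2) - [3 0];               % class 2
    mu1 = mean(X1, 1);  mu2 = mean(X2, 1);
    Sw = (X1 - mu1)' * (X1 - mu1) + (X2 - mu2)' * (X2 - mu2);   % within-class scatter
    w  = Sw \ (mu1 - mu2)';                   % direction maximizing the Fisher criterion
    w  = w / norm(w);
    z1 = X1 * w;  z2 = X2 * w;                % 1-D projections of the two classes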
Comparison with PCA Slide credit: Derek Hoiem
Other dimensionality reduction methods • Non-linear: • Kernel PCA – Schölkopf et al., Neural Computation 1998 • Independent component analysis – Comon, Signal Processing 1994 • LLE (locally linear embedding) – Roweis and Saul, Science 2000 • ISOMAP (isometric feature mapping) – Tenenbaum et al., Science 2000 • t-SNE (t-distributed stochastic neighbor embedding) – van der Maaten and Hinton, JMLR 2008
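If you want to try one of these quickly, recent releases of MATLAB’s Statistics and Machine Learning Toolbox ship a tsne function (check your version and toolbox; shown here on toy data):

    X = [randn(100, 10); randn(100, 10) + 4];     % two toy clusters in 10-D
    labels = [ones(100, 1); 2 * ones(100, 1)];
    Y = tsne(X);                                  % N-by-2 embedding
    gscatter(Y(:, 1), Y(:, 2), labels);           % visualize the 2-D map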
ISOMAP example Figure from Carlotta Domeniconi
ISOMAP example Figure from Carlotta Domeniconi
t-SNE example Figure from Genevieve Patterson, IJCV 2014
t-SNE example Thomas and Kovashka, CVPR 2016
t-SNE example Thomas and Kovashka, CVPR 2016