CS 2750: Machine Learning Dimensionality Reduction Prof. Adriana Kovashka, University of Pittsburgh January 19, 2017
Plan for today • Dimensionality reduction – motivation • Principal Component Analysis (PCA) • Applications of PCA • Other methods for dimensionality reduction
Why reduce dimensionality? • Data may intrinsically live in a lower-dim space • Too many features relative to the number of training examples • Lower computational expense (memory, train/test time) • Want to visualize the data in a lower-dim space • Want to use data of different dimensionality
Goal • Input: Data in a high-dim feature space • Output: Projection of the same data into a lower-dim space • f: high-dim x → low-dim x
Goal Slide credit: Erik Sudderth
Some criteria for success • Find a projection where the data has: • Low reconstruction error • High variance of the data See hand-written notes for how we find the optimal projection
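A standard way to state the variance criterion (notation here is assumed shorthand, not copied from the hand-written notes): find a unit vector u maximizing the projected variance uᵀSu, where S = (1/N) Σn (xn − x̄)(xn − x̄)ᵀ is the data covariance. A Lagrange-multiplier argument gives Su = λu, so the best single direction is the eigenvector of S with the largest eigenvalue, and minimizing squared reconstruction error leads to the same subspace.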
Principal Components Analysis Slide credit: Subhransu Maji
Demo • http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA_demo.m • http://www.cs.pitt.edu/~kovashka/cs2750_sp17/PCA.m • Demo with eigenfaces: http://www.cs.ait.ac.th/~mdailey/matlab/
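As a rough sketch of what such a demo might compute (this is not the contents of PCA_demo.m; the toy data and variable names are made up for illustration):

    X = randn(200, 5) * randn(5, 5);     % toy data: 200 examples, 5 correlated dims
    mu = mean(X, 1);                     % 1-by-D mean
    Xc = X - mu;                         % center the data
    S  = (Xc' * Xc) / size(X, 1);        % D-by-D covariance
    [U, L] = eig(S);                     % columns of U are eigenvectors of S
    [lambda, order] = sort(diag(L), 'descend');
    U = U(:, order);                     % principal components, largest variance first
    K = 2;
    Z = Xc * U(:, 1:K);                  % N-by-K projection onto the top K components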
Implementation issue • Covariance matrix is huge (D² entries for D pixels) • But typically # examples N << D • Simple trick • X is NxD matrix of normalized (mean-subtracted) training data • Solve for eigenvectors u of XXᵀ instead of XᵀX • Then Xᵀu is an eigenvector of the covariance XᵀX • Need to normalize each vector Xᵀu to unit length Adapted from Derek Hoiem
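A hedged sketch of this trick in MATLAB (variable names are assumptions, not the course code):

    X = randn(20, 1000);                 % toy case: N = 20 examples, D = 1000 dims
    X = X - mean(X, 1);                  % center the data
    [V, L] = eig(X * X');                % small N-by-N eigenproblem instead of D-by-D
    [lambda, order] = sort(diag(L), 'descend');
    V = V(:, order);
    U = X' * V;                          % each column Xᵀv is an eigenvector of XᵀX
    U = U ./ sqrt(sum(U.^2, 1));         % normalize each column to unit length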
How to pick K? • One goal can be to pick K such that P% of the variance of the data is preserved, e.g. 90% • Let Λ = a vector containing the eigenvalues of the covariance matrix, sorted in decreasing order • Total variance can be obtained from the entries of Λ • total_variance = sum(Λ); • Take as many of the largest entries as needed • K = find( cumsum(Λ) / total_variance >= P, 1);
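Tying the choice of K to the eigendecomposition (a sketch; eig does not return the eigenvalues sorted, so order them largest-first before taking the cumulative sum; S is assumed to be the covariance matrix computed earlier):

    [U, L] = eig(S);
    [lambda, order] = sort(diag(L), 'descend');
    P = 0.90;                             % fraction of variance to preserve
    K = find(cumsum(lambda) / sum(lambda) >= P, 1);
    U_K = U(:, order(1:K));               % keep the top K principal components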
Variance preserved at i-th eigenvalue Figure 12.4 (a) from Bishop
Application: Face Recognition Image from cnet.com
Face recognition: once you’ve detected and cropped a face, try to recognize it (Figure: detection, then recognition of the cropped face as “Sally”) Slide credit: Lana Lazebnik
Typical face recognition scenarios • Verification: a person is claiming a particular identity; verify whether that is true • E.g., security • Closed-world identification: assign a face to one person from among a known set • General identification: assign a face to a known person or to “unknown” Slide credit: Derek Hoiem
The space of all face images • When viewed as vectors of pixel values, face images are extremely high-dimensional • 24x24 image = 576 dimensions • Slow and lots of storage • But very few 576-dimensional vectors are valid face images • We want to effectively model the subspace of face images Adapted from Derek Hoiem
Representation and reconstruction • Face x in “face space” coordinates: wi = uiᵀ(x − µ), i = 1, …, k • Reconstruction: x̂ = µ + w1u1 + w2u2 + w3u3 + w4u4 + … Slide credit: Derek Hoiem
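In code, the two operations on this slide might look as follows (a sketch; mu is assumed to be the 1-by-D mean face, U the D-by-k matrix whose columns are the principal components, and x a 1-by-D face vector):

    w    = (x - mu) * U;      % coordinates of x in "face space" (1-by-k)
    xhat = mu + w * U';       % reconstruction from the k coefficients (1-by-D)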
Recognition w/ eigenfaces Process labeled training images • Find mean µ and covariance matrix Σ • Find k principal components (eigenvectors of Σ) u1, …, uk • Project each training image xi onto the subspace spanned by the principal components: (wi1, …, wik) = (u1ᵀ(xi − µ), …, ukᵀ(xi − µ)) Given novel image x • Project onto the subspace: (w1, …, wk) = (u1ᵀ(x − µ), …, ukᵀ(x − µ)) • Classify as the closest training face in the k-dimensional subspace M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991 Adapted from Derek Hoiem
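A minimal sketch of the matching step (not Turk and Pentland’s code; Xtrain, labels, mu, U, and x are assumed variables):

    Wtrain = (Xtrain - mu) * U;               % project training faces: N-by-k
    w      = (x - mu) * U;                    % project the novel face: 1-by-k
    d      = sum((Wtrain - w).^2, 2);         % squared distances in the subspace
    [~, idx] = min(d);                        % index of the closest training face
    prediction = labels(idx);                 % predicted identity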
Plan for today • Dimensionality reduction – motivation • Principal Component Analysis (PCA) • Applications of PCA • Other methods for dimensionality reduction
PCA • General dimensionality reduction technique • Preserves most of the variance with a much more compact representation • Lower storage requirements (eigenvectors + a few numbers per face) • Faster matching • What are some problems? Slide credit: Derek Hoiem
PCA limitations • The direction of maximum variance is not always good for classification Slide credit: Derek Hoiem
PCA limitations • PCA preserves maximum variance • A more discriminative subspace: Fisher Linear Discriminants • FLD preserves discrimination • Find projection that maximizes scatter between classes and minimizes scatter within classes Adapted from Derek Hoiem
Fisher’s Linear Discriminant • Using two classes as an example: (Figure: the same two-class data projected onto two directions; a poor projection mixes the classes, a good one separates them) Slide credit: Derek Hoiem
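A minimal sketch of the two-class case pictured above, using the standard closed-form solution w ∝ Sw⁻¹(µ1 − µ2) (toy data; variable names are illustrative):

    X1 = randn(100, 2) + [3 0];               % class 1 (toy data)
    X2 = randn(100, 2) - [3 0];               % class 2
    mu1 = mean(X1, 1);  mu2 = mean(X2, 1);
    Sw = (X1 - mu1)' * (X1 - mu1) + (X2 - mu2)' * (X2 - mu2);   % within-class scatter
    w  = Sw \ (mu1 - mu2)';                   % direction maximizing the Fisher criterion
    w  = w / norm(w);
    z1 = X1 * w;  z2 = X2 * w;                % 1-D projections of the two classes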
Comparison with PCA Slide credit: Derek Hoiem
Other dimensionality reduction methods • Non-linear: • Kernel PCA – Schölkopf et al., Neural Computation 1998 • Independent component analysis – Comon, Signal Processing 1994 • LLE (locally linear embedding) – Roweis and Saul, Science 2000 • ISOMAP (isometric feature mapping) – Tenenbaum et al., Science 2000 • t-SNE (t-distributed stochastic neighbor embedding) – van der Maaten and Hinton, JMLR 2008
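If you want to try one of these quickly, recent releases of MATLAB’s Statistics and Machine Learning Toolbox ship a tsne function (check your version and toolbox; shown here on toy data):

    X = [randn(100, 10); randn(100, 10) + 4];     % two toy clusters in 10-D
    labels = [ones(100, 1); 2 * ones(100, 1)];
    Y = tsne(X);                                  % N-by-2 embedding
    gscatter(Y(:, 1), Y(:, 2), labels);           % visualize the 2-D map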
ISOMAP example Figure from Carlotta Domeniconi
ISOMAP example Figure from Carlotta Domeniconi
t-SNE example Figure from Genevieve Patterson, IJCV 2014
t-SNE example Thomas and Kovashka, CVPR 2016
t-SNE example Thomas and Kovashka, CVPR 2016