1 / 70

Principal Component Analysis

Principal Component Analysis. B: Chapter 12 HRF: Chapter 14.5. Barnab á s P ó czos University of Alberta. Nov 24, 2009. Contents. Motivation PCA algorithms Applications Face recognition Facial expression recognition PCA theory Kernel-PCA. Some of these slides are taken from

beata
Download Presentation

Principal Component Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Principal Component Analysis B: Chapter 12 HRF: Chapter 14.5 Barnabás Póczos University of Alberta Nov 24, 2009

  2. Contents • Motivation • PCA algorithms • Applications • Face recognition • Facial expression recognition • PCA theory • Kernel-PCA Some of these slides are taken from • Karl Booksh Research group • Tom Mitchell • Ron Parr

  3. Data Visualization Data Compression Noise Reduction Data Classification Trend Analysis Factor Analysis PCA Applications

  4. Example: Given53 blood and urine samples (features) from 65 people. How can we visualize the measurements? Data Visualization

  5. Data Visualization • Matrix format (65x53) Instances Features Difficult to see the correlations between the features...

  6. Data Visualization • Spectral format (65 pictures, one for each person) Difficult to compare the different patients...

  7. Data Visualization • Spectral format (53 pictures, one for each feature) Difficult to see the correlations between the features...

  8. Data Visualization Bi-variate Tri-variate How can we visualize the other variables??? … difficult to see in 4 or higher dimensional spaces...

  9. Data Visualization • Is there a representation better than the coordinate axes? • Is it really necessary to show all the 53 dimensions? • … what if there are strong correlations between the features? • How could we find the smallest subspace of the 53-D space that keeps the most information about the original data? • A solution: Principal Component Analysis

  10. Principle Component Analysis Orthogonal projection of data onto lower-dimension linear space that... • maximizes variance of projected data (purple line) • minimizes mean squared distance between • data point and • projections (sum of blue lines) PCA:

  11. Principle Components Analysis Idea: Given data points in a d-dimensional space, project into lower dimensional space while preserving as much information as possible Eg, find best planar approximation to 3D data Eg, find best 12-D approximation to 104-D data In particular, choose projection that minimizes squared errorin reconstructing original data

  12. Vectors originating from the center of mass Principal component #1 points in the direction of the largest variance. Each subsequent principal component… is orthogonal to the previous ones, and points in the directions of the largest variance of the residual subspace The Principal Components

  13. 2D Gaussian dataset

  14. 1st PCA axis

  15. 2nd PCA axis

  16. w x w1 w1(w1Tx) w2(w2Tx) x’=w1(w1Tx)+w2(w2Tx) w2 PCA algorithm I (sequential) Given the centered data {x1, …, xm}, compute the principal vectors: 1st PCA vector We maximize thevariance of projection of x kth PCA vector x’ PCA reconstruction We maximize the variance of the projection in the residual subspace

  17. PCA algorithm II(sample covariance matrix) • Given data {x1, …, xm}, compute covariance matrix  • PCA basis vectors = the eigenvectors of  • Larger eigenvalue  more important eigenvectors where

  18. PCA algorithm II PCA algorithm(X, k): top k eigenvalues/eigenvectors %X= N  m data matrix, % … each data point xi = column vector, i=1..m • X subtract mean xfrom each column vectorxi in X •  XXT … covariance matrix of X • { i, ui }i=1..N = eigenvectors/eigenvalues of ...1  2  … N • Return { i, ui }i=1..k% top k principle components

  19. PCA algorithm III(SVD of the data matrix) Singular Value Decomposition of the centered data matrix X. • Xfeatures  samples = USVT X U S = VT sig. significant noise noise noise significant samples

  20. Columns of U the principal vectors, { u(1), …, u(k) } orthogonal and has unit norm – so UTU = I Can reconstruct the data using linear combinations of { u(1), …, u(k) } Matrix S Diagonal Shows importance of each eigenvector Columns of VT The coefficients for reconstructing the samples PCA algorithm III

  21. Face recognition

  22. Challenge: Facial Recognition • Want to identify specific person, based on facial image • Robust to glasses, lighting,… Can’t just use the given 256 x 256 pixels

  23. Applying PCA: Eigenfaces Example data set: Images of faces Famous Eigenface approach [Turk & Pentland], [Sirovich & Kirby] Each face x is … 256  256 values (luminance at location) x in 256256 (view as 64K dim vector) Form X = [ x1 , …, xm ]centered data mtx Compute S= XXT Problem: S is 64K  64K … HUGE!!! Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best Method B: Build one PCA database for the whole dataset and then classify based on the weights. x1, …, xm X = 256 x 256 real values m faces

  24. Computational Complexity • Suppose m instances, each of size N • Eigenfaces: m=500 faces, each of size N=64K • Given NN covariance matrix S, can compute • all N eigenvectors/eigenvalues in O(N3) • first k eigenvectors/eigenvalues in O(k N2) • But if N=64K,EXPENSIVE!

  25. A Clever Workaround Note that m<<64K Use L=XTX instead of S=XXT If v is eigenvector of L thenXv is eigenvector of S Proof:Lv =  v XTX v =  v X (XTX v) = X( v) =  Xv (XXT)X v =  (Xv) S (Xv) =  (Xv) x1, …, xm X = 256 x 256 real values m faces

  26. Principle Components (Method B)

  27. Reconstructing… (Method B) • … faster if train with… • only people w/out glasses • same lighting conditions

  28. Shortcomings Requires carefully controlled data: All faces centered in frame Same size Some sensitivity to angle Alternative: “Learn” one set of PCA vectors for each angle Use the one with lowest error Method is completely knowledge free (sometimes this is good!) Doesn’t know that faces are wrapped around 3D objects (heads) Makes no effort to preserve class distinctions

  29. Facial expression recognition

  30. Happiness subspace (method A)

  31. Disgust subspace (method A)

  32. Facial Expression RecognitionMovies (method A)

  33. Facial Expression RecognitionMovies (method A)

  34. Facial Expression RecognitionMovies (method A)

  35. Image Compression

  36. Original Image • Divide the original 372x492 image into patches: • Each patch is an instance that contains 12x12 pixels on a grid • View each as a 144-D vector

  37. L2 error and PCA dim

  38. PCA compression: 144D ) 60D

  39. PCA compression: 144D ) 16D

  40. 16 most important eigenvectors

  41. PCA compression: 144D ) 6D

  42. 6 most important eigenvectors

  43. PCA compression: 144D ) 3D

  44. 3 most important eigenvectors

  45. PCA compression: 144D ) 1D

  46. 60 most important eigenvectors Looks like the discrete cosine bases of JPG!...

  47. 2D Discrete Cosine Basis http://en.wikipedia.org/wiki/Discrete_cosine_transform

  48. Noise Filtering

  49. Noise Filtering, Auto-Encoder… x’ x U x

  50. Noisy image

More Related