Presentation Transcript


  1. EE3J2 Data Mining Lecture 11: Vector Data Analysis and Principal Components Analysis (PCA). Martin Russell

  2. Objectives • To review basic data analysis • To review the notions of mean, variance and covariance • To explain Principal Components Analysis (PCA)

  3. Example from speech processing Plot of high-frequency energy vs low-frequency energy, for 25 ms speech segments, sampled every 10 ms

  4. Basic statistics [Figure: scatter plot of the speech data, annotated with the sample mean, the sample variances in the 'x' and 'y' directions, and the 'x' and 'y' min/max values]

  5. Basic statistics • Denote the samples by $X = x_1, x_2, \ldots, x_T$, where $x_t = (x_{t1}, x_{t2}, \ldots, x_{tN})$ • The sample mean $\mu(X)$ is given by: $\mu(X) = \frac{1}{T}\sum_{t=1}^{T} x_t$

  6. More basic statistics • The sample variance $\sigma^2(X)$ is given by: $\sigma^2(X) = \frac{1}{T}\sum_{t=1}^{T}\left(x_t - \mu(X)\right)^2$, applied component by component for vector data
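A minimal MATLAB sketch of these two statistics; the data matrix X and its sizes are illustrative, with one row per sample as in the later slides:

>> X = randn(100, 2);   % e.g. T = 100 samples of N = 2 dimensional data
>> mu = mean(X, 1);     % sample mean: a 1-by-N row vector
>> v = var(X, 1);       % per-component sample variance; the argument 1 selects 1/T normalisation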

  7. Covariance • As the x value increases, the y value also increases • This is (positive) covariance • If y decreases as x increases, the result is negative covariance

  8. Definition of covariance • The covariance between the mth and nth components of the sample data is defined by: $c_{mn} = \frac{1}{T}\sum_{t=1}^{T}\left(x_{tm} - \mu_m(X)\right)\left(x_{tn} - \mu_n(X)\right)$ • In practice it is useful to subtract the mean $\mu(X)$ from each of the data points $x_t$. The sample mean is then 0 and $c_{mn} = \frac{1}{T}\sum_{t=1}^{T} x_{tm} x_{tn}$
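The definition translates directly into MATLAB; a sketch, where the data X and the component indices m and n are purely illustrative:

>> X = randn(100, 2);                 % illustrative sample data, one row per sample
>> Y = X - mean(X, 1);                % subtract the sample mean from every row
>> m = 1; n = 2;                      % illustrative component indices
>> c_mn = mean(Y(:, m) .* Y(:, n));   % covariance between components m and n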

  9. The covariance matrix • The sample covariance matrix C is the N × N matrix whose (m, n) entry is $c_{mn}$; since $c_{mn} = c_{nm}$, C is symmetric

  10. [Figure: scatter plot of the data with the mean subtracted; the upward trend implies positive covariance]

  11. [Figure: the sample data rotated through π/2; the downward trend implies negative covariance]

  12. [Figure: the data with the covariance removed; the scatter shows no trend]

  13. Principal Components Analysis • PCA is the technique used to diagonalise the sample covariance matrix • The first step is to write the covariance matrix in the form $C = UDU^T$, where D is diagonal and U is a matrix corresponding to a rotation • This can be done using SVD (see lecture 8) or eigenvalue decomposition
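For a symmetric covariance matrix both routes give the same factorisation; a sketch (the data is random, purely so there is a C to factorise):

>> X = randn(200, 2);
>> C = cov(X);              % symmetric sample covariance matrix
>> [U, D] = eig(C);         % eigenvalue decomposition: C = U*D*U'
>> [U2, D2, V2] = svd(C);   % SVD gives the same factors, up to column order and sign
>> norm(C - U*D*U')         % numerically zero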

  14. PCA continued • U implements a rotation through an angle $\theta$ • $e_1$ is the first column of U, and $d_{11}$ is the variance in the direction $e_1$ • $e_2$ is the second column of U, and $d_{22}$ is the variance in the direction $e_2$ [Figure: the rotated axes $e_1$ and $e_2$ and the angle $\theta$ marked on the data]

  15. Example • Illustration of PCA through an example application • 3D dance motion modelling

  16. Data • Analysis of dance sequence data • Body position represented as 90 dimensional vector • Dance sequence represented as a sequence of these vectors • MEng FYP 2004/5, Wan Ni Chong

  17. Data Capture (1)

  18. Data Capture (2)

  19. Data Capture (3)

  20. Calculating PCA • Step 1: Arrange data as a matrix • Rows correspond to individual data points • Number of columns = dimension of data (= 90) • Number of rows = number of data points = N

  21. Calculating PCA (step 2) • Compute the covariance matrix of the data • In MATLAB >>C = cov(X) • Alternatively (as in the slides from the last lecture): • calculate the mean vector m, • subtract m from each row of X to give Y • Then $C = \frac{1}{N} Y^T Y$ (note that MATLAB's cov normalises by N − 1 rather than N)
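A sketch of the alternative computation; the sizes are illustrative:

>> X = randn(200, 90);         % N = 200 data points, each 90-dimensional
>> m = mean(X, 1);             % mean vector
>> Y = X - m;                  % subtract m from each row of X
>> C = (Y' * Y) / size(X, 1);  % C = (1/N) * Y' * Y; cov(X) would divide by N-1 instead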

  22. Calculating PCA (step 3) • Do an eigenvector decomposition of C, so that: $C = UDU^T$ • Where • U is an orthogonal (rotation) matrix • D is a diagonal matrix (in fact, because C is a covariance matrix, all elements of D will be real and non-negative) • In MATLAB type >>[U,D] = eig(C)
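One practical point: eig does not return the eigenvalues in any guaranteed order, so it is convenient to sort them (and the matching columns of U) into decreasing order before choosing components; a sketch:

>> C = cov(randn(200, 90));                   % illustrative covariance matrix
>> [U, D] = eig(C);
>> [lambda, idx] = sort(diag(D), 'descend');  % eigenvalues, largest first
>> U = U(:, idx);                             % reorder the eigenvectors to match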

  23. Calculating PCA (step 4) • Each column of U is a principal vector • The corresponding eigenvalue indicates the variance of the data along that dimension • Large eigenvalues indicate significant components of the data • Small eigenvalues indicate that the variation along the corresponding eigenvectors may be noise

  24. Eigenvalues [Figure: the 90 eigenvalues of the dance data plotted in decreasing order; the first few principal components are the more significant components, while those towards the 90th are insignificant]
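A plot of this kind can be produced from the sorted eigenvalues; a sketch with illustrative data:

>> lambda = sort(eig(cov(randn(200, 90))), 'descend');    % sorted eigenvalue spectrum
>> semilogy(lambda, 'o-');                                % log scale separates large and small values
>> xlabel('component'); ylabel('eigenvalue (variance)');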

  25. Calculating PCA (step 5) • It may be advantageous to ignore dimensions which correspond to small eigenvalues, and only consider the projection of the data onto the most significant eigenvectors • In this way the dimension of the data can be reduced

  26. Visualising PCA [Figure: the original pattern (blue) is mapped into eigenspace by U, coordinates 11 – 90 are set to zero, and the result is mapped back by U⁻¹ to give the reduced pattern (red)]
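A sketch of the whole reduction pipeline on slides 25 and 26; the data, the sizes and the choice k = 10 are illustrative, and since U is orthogonal its inverse is just its transpose:

>> X = randn(200, 90);                    % illustrative data, one pattern per row
>> Y = X - mean(X, 1);                    % mean-subtracted patterns
>> [U, D] = eig(cov(Y));
>> [~, idx] = sort(diag(D), 'descend');
>> U = U(:, idx);                         % most significant eigenvectors first
>> k = 10;                                % number of dimensions to keep
>> Z = Y * U;                             % map each pattern into eigenspace
>> Z(:, k+1:end) = 0;                     % set coordinates 11-90 to zero
>> Y_reduced = Z * U';                    % map back: inv(U) = U' for orthogonal U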

  27. PCA Example • Original 90 dimensional data reduced to just 1 dimension

  28. PCA Example • Original 90 dimensional data reduced to 10 dimensions

  29. Summary • Example of PCA • Analysis of 90 dimensional 3D dance data • The analysis shows that PCA can reduce the 90 dimensional representation to just 10 dimensions with minimal loss of accuracy
