
Multivariate Data Analysis



Presentation Transcript


  1. Multivariate Data Analysis Principal Component Analysis

  2. Principal Component Analysis (PCA) • Singular Value Decomposition • Eigenvector / eigenvalue calculation
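Slide 2 names two computational routes to PCA. The following is a minimal sketch, assuming NumPy and an invented mean-centered data matrix, of how the two routes relate: the eigenvalues of X'X are the squared singular values of X, and the eigenvectors match the right singular vectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # invented data: 6 objects, 3 variables
Xc = X - X.mean(axis=0)              # mean-center each variable

# Route 1: singular value decomposition, Xc = U S V'
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigenvector / eigenvalue calculation on the cross-product matrix Xc'Xc
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(eigvals)[::-1]    # eigh returns ascending order; sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvalues equal squared singular values; loading vectors agree up to sign.
same_values = np.allclose(eigvals, s**2)
same_vectors = np.allclose(np.abs(eigvecs), np.abs(Vt.T))
print(same_values, same_vectors)
```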

  3. Data Matrix X (I×K) • Reduce variables • Improve projections • Remove noise • Find outliers • Find classes [Figure: matrix X with I rows and K columns]

  4. PCA • Example with 2 variables, 6 objects • Find best (most informative) direction in space • Describe direction • Make projection

  5. [Figure: scatter plot; axes x1 and x2]

  6. [Figure: scatter plot; axes x1 and x2]

  7. 1st PC

  8. 1st PC • Score • Residual

  9. 1st PC • Unit vector • Loading p1 • Loading p2

  10. 1st PC • Unit vector • Loading p1 = cos(a) • Loading p2 = sin(a), where a is the angle between the 1st PC and the x1 axis
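The geometry of slides 4 to 10 can be sketched numerically. This is an illustration only: the six data points are invented, and the first PC is obtained from NumPy's SVD rather than from whatever software produced the slides. It shows that the loadings of the unit vector are cos(a) and sin(a), that scores are projections onto the PC, and that residuals are what is left over.

```python
import numpy as np

# Invented 2-variable, 6-object data set (not from the slides).
X = np.array([[1.0, 1.2], [2.0, 1.9], [3.0, 3.2],
              [4.0, 3.8], [5.0, 5.1], [6.0, 5.9]])
Xc = X - X.mean(axis=0)          # mean-center

# The first loading vector is the first right singular vector (unit length).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]
if p[0] < 0:                     # the sign of a PC is arbitrary; fix it for readability
    p = -p

a = np.arctan2(p[1], p[0])       # angle between the 1st PC and the x1 axis
t = Xc @ p                       # scores: position of each object along the PC
E = Xc - np.outer(t, p)          # residuals: the part of Xc the PC does not describe

print("angle (degrees):", np.degrees(a))
print("loadings:", p, "cos(a), sin(a):", np.cos(a), np.sin(a))
```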

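  11. [Figure: data matrix X (I×K) with score vector t (length I) and loading vector p (length K); row i highlighted]

  12. [Figure: data matrix X (I×K) with score vector t and loading vector p; column k highlighted]

  13. [Figure: data matrix X (I×K) with score vector t and loading vector p]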

  14. X = t1p1’ + t2p2’ + ... + tApA’ + E, or in matrix form X = TP’ + E • X: properly preprocessed data matrix (I×K) • T: score matrix (I×A) • P: loading matrix (K×A) • E: residual matrix (I×K) • ta: score vector • pa: loading vector
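The decomposition on slide 14 can be checked with a few lines of NumPy. This is a sketch under stated assumptions: `pca_model` is a hypothetical helper name, the data are random, and the components come from a truncated SVD. It confirms the matrix shapes and that the sum of rank-one terms ta pa' equals TP'.

```python
import numpy as np

def pca_model(X, A):
    """Return T (I x A), P (K x A), E (I x K) such that X = T P' + E.

    X is assumed to be already preprocessed (e.g. mean-centered).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]         # score vectors t_a as columns
    P = Vt[:A].T                 # loading vectors p_a as columns
    E = X - T @ P.T              # residual matrix
    return T, P, E

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 5))     # invented data: I = 10 objects, K = 5 variables
X = X - X.mean(axis=0)

A = 2
T, P, E = pca_model(X, A)

# Sum of the rank-one terms t_1 p_1' + ... + t_A p_A'
X_hat = sum(np.outer(T[:, a], P[:, a]) for a in range(A))
print(T.shape, P.shape, E.shape)
```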

  15. The Wine Example • People magazine • Wise & Gallagher

  16. Wine data
               Wine      Beer       Spirit   LifeEx    HeartD
     France    63.5000   40.1000    2.5000   78.0000   61.1000
     Italy     58.0000   25.1000    0.9000   78.0000   94.1000
     Switz     46.0000   65.0000    1.7000   78.0000   106.4000
     Austra    15.7000   102.1000   1.2000   78.0000   173.0000
     Brit      12.2000   100.0000   1.5000   77.0000   199.7000
     U.S.A.    8.9000    87.8000    2.0000   76.0000   176.0000
     Russia    2.7000    17.1000    3.8000   69.0000   373.6000
     Czech     1.7000    140.0000   1.0000   73.0000   283.7000
     Japan     1.0000    55.0000    2.1000   79.0000   34.7000
     Mexico    0.2000    50.4000    0.8000   73.0000   36.4000

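  17. Column statistics (columns in the same order as the data table: 20.99 is the mean of the Wine column, 68.26 of the Beer column)
                          Wine      Beer      Spirit   LifeEx    HeartD
     Mean                 20.9900   68.2600   1.7500   75.9000   153.8700
     Standard deviation   24.9270   38.6718   0.9132   3.2128    110.8182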
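The means and standard deviations on slide 17 can be reproduced from the data on slide 16. The sketch below assumes NumPy and the sample standard deviation (divisor I - 1, `ddof=1`), and then autoscales the data (mean-center, divide by standard deviation). Whether the slides used exactly this preprocessing before PCA is an assumption.

```python
import numpy as np

variables = ["Wine", "Beer", "Spirit", "LifeEx", "HeartD"]
# Rows: France, Italy, Switz, Austra, Brit, U.S.A., Russia, Czech, Japan, Mexico
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],
    [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4],
    [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7],
    [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6],
    [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7],
    [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)      # sample standard deviation (divisor I - 1)
Z = (X - mean) / std             # autoscaled data: zero mean, unit variance

for name, m, sd in zip(variables, mean, std):
    print(f"{name:7s} mean = {m:9.4f}  std = {sd:9.4f}")
```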

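  18. [Scree plot: singular value vs. component; λ1 = 46%, then 32%, 12%, 8%, 2%]

  19. [Score plot: Score 2 (32%) vs. Score 1 (46%); objects: Czech, Brit, Austral, Mex, USA, Japan, Switz, Italy, France, Russia]

  20. [Loading plot: Loading 2 vs. Loading 1; variables: Beer, Life exp., Heart dis., Wine, Spirit]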

  21. Conclusions • Scores = positions of objects in multivariate space • Loadings = importance of original variables for new directions • Try to explain a large enough portion of X (46 + 32 = 78%)
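A two-component PCA of the wine data can be sketched as follows, to make the conclusions concrete. Assumptions: NumPy, autoscaling as preprocessing, and SVD as the algorithm. The slides report 46% and 32% for the first two components; this sketch should give comparable figures only if the original analysis was preprocessed the same way, so no expected output is stated here.

```python
import numpy as np

countries = ["France", "Italy", "Switz", "Austra", "Brit",
             "U.S.A.", "Russia", "Czech", "Japan", "Mexico"]
variables = ["Wine", "Beer", "Spirit", "LifeEx", "HeartD"]
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],
    [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4],
    [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7],
    [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6],
    [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7],
    [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

# Assumed preprocessing: autoscale (mean-center, unit variance).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
T = U[:, :2] * s[:2]                     # scores: positions of the countries
P = Vt[:2].T                             # loadings: weights of the variables
explained = 100 * s**2 / np.sum(s**2)    # percent of total sum of squares

print("explained %:", np.round(explained, 1))
for name, (t1, t2) in zip(countries, T):
    print(f"{name:7s} score1 = {t1:6.2f}  score2 = {t2:6.2f}")
for name, (p1, p2) in zip(variables, P):
    print(f"{name:7s} loading1 = {p1:6.2f}  loading2 = {p2:6.2f}")
```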

  22. The Apricot Example • Manley & Geladi

  23. [Figure: apricot (Afrikaans: appelkoos) spectra; pseudoabsorbance vs. wavelength (nm)]

  24. [Scree plot: singular value vs. component number]

  25. What is rank? • Mathematical rank: at most min(I,K); gives zero residual • Effective rank = A; separates model from noise
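The difference between mathematical and effective rank can be illustrated on synthetic data. In this sketch the data, the noise level, and the 5% threshold on the singular values are all invented for illustration; the slides do not prescribe a rule for choosing A.

```python
import numpy as np

rng = np.random.default_rng(2)
I, K, A_true = 20, 8, 2

# Systematic part with two underlying components, plus a little noise.
T = rng.normal(size=(I, A_true))
P = rng.normal(size=(K, A_true))
X = T @ P.T + 0.01 * rng.normal(size=(I, K))

# Mathematical rank: every nonzero singular value counts, noise included.
math_rank = np.linalg.matrix_rank(X)

# Effective rank: count singular values that stand out from the noise floor.
s = np.linalg.svd(X, compute_uv=False)
threshold = 0.05 * s[0]                  # hypothetical cut-off, not from the slides
eff_rank = int(np.sum(s > threshold))

print("singular values:", np.round(s, 3))
print("mathematical rank:", math_rank, " effective rank:", eff_rank)
```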

  26. ANOVA
     Comp#   SS        SS%     SS%cum
     1       68.8269   98.10   98.10
     2       1.2843    1.83    99.93
     3       0.0463    0.07    100
     4       0.0045    0.01
     5       0.0007    0.00
     6       0.0003    0.00
     7       0.0002    0.00
     8       0.0001    0.00
     9       0.0000    0.00
     10      0.0000    0.00
     Total   70.1634   100
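The SS% and SS%cum columns follow directly from the SS column and the total. A short sketch of the arithmetic, using the values from the table (plain Python, no libraries):

```python
# SS per component and total, as listed on slide 26.
ss = [68.8269, 1.2843, 0.0463, 0.0045, 0.0007,
      0.0003, 0.0002, 0.0001, 0.0000, 0.0000]
ss_total = 70.1634

ss_pct = [100.0 * v / ss_total for v in ss]   # SS% = 100 * SS_a / SS_tot

ss_cum = []                                   # SS%cum = running sum of SS%
running = 0.0
for pct in ss_pct:
    running += pct
    ss_cum.append(running)

for a, (v, pct, cum) in enumerate(zip(ss, ss_pct, ss_cum), start=1):
    print(f"{a:2d}  {v:8.4f}  {pct:6.2f}  {cum:6.2f}")
```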

  27. [Score plot: Score 2 (2%) vs. Score 1 (98%)]

  28. ANOVA • SStot = λ1 + λ2 + λ3 + ... + λ(I or K) • SStot = SS1 + SS2 + SS3 + ... + SS(I or K) • From largest to smallest!

  29. ANOVA • X = TP’ + E • data = model + residual • SStot = SSmod + SSres • R² = SSmod / SStot = 1 - SSres / SStot • Coefficient of determination (often in %)
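The sum-of-squares bookkeeping above can be verified numerically. This sketch assumes NumPy, random mean-centered data, and an SVD-based model; `r2_curve` is a hypothetical helper name. It checks that SStot is the sum of the squared singular values and that both expressions for R² agree.

```python
import numpy as np

def r2_curve(X):
    """Cumulative R2 for 1, 2, ... components of a preprocessed matrix X."""
    ss = np.linalg.svd(X, compute_uv=False) ** 2   # SS_a = eigenvalue = s_a^2
    return np.cumsum(ss) / ss.sum()

rng = np.random.default_rng(4)
X = rng.normal(size=(12, 5))       # invented data
X = X - X.mean(axis=0)

A = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
E = X - (U[:, :A] * s[:A]) @ Vt[:A]    # residual after A components

ss_tot = np.sum(X**2)
ss_res = np.sum(E**2)
ss_mod = ss_tot - ss_res
r2 = ss_mod / ss_tot

print("R2 (%):", 100 * r2, " 1 - SSres/SStot (%):", 100 * (1 - ss_res / ss_tot))
```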

  30. Examples
     Data set     Comp.   R² = SSmod   SSres
     Wines        2       78%          22%
     Apricots 1   2       99.93%       0.07%
     Apricots 2   3       100%         ±0.0%

  31. [Figure: spectra with outliers removed; absorbance vs. wavelength (nm)]

  32. [Scree plot, no outliers: singular values vs. component; λ1 = 81%, then 16%, 3%]

  33. [Score plot: Score 3 (3%) vs. Score 2 (16%); groups: Whole fruit, No kernel, Thin slice]

  34. [Figure: Loadings 2 and 3 vs. wavelength (nm)]

  35. [Loading plot: Loading 3 vs. Loading 2]

  36. More nomenclature • Score = Latent Variable • Loading vector = Eigenvector • Effective rank = Pseudorank = Model dimensionality = Number of components • SSa = Eigenvalue • Singular value = SSa^(1/2)

  37. An analysis sequence • 1. Scale, mean-center data • 2. Calculate a few components • 3. Check scores, loadings • 4. Find outliers, groupings, explain • 5. Remove outliers

  38. An analysis sequence (continued) • 6. Scale, mean-center data • 7. Calculate enough components • 8. Try to determine pseudorank • 9. Check score plots • 10. Check loading plots • 11. Check residuals
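The eleven steps can be strung together in a short script. This is a sketch under several assumptions that are not in the slides: the data are random with one planted outlier, outliers are flagged with a median/MAD rule on the first score, the pseudorank is chosen by a hypothetical 80% cumulative-SS rule, and the residual standard deviation per object divides by K. In practice steps 3, 4, 9, and 10 are done by looking at plots.

```python
import numpy as np

def autoscale(X):
    """Steps 1 and 6: mean-center and scale each variable to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, A):
    """Steps 2 and 7: scores T, loadings P, and SS fraction of every component."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :A] * s[:A], Vt[:A].T, s**2 / np.sum(s**2)

def robust_z(t):
    """Hypothetical outlier measure: distance from the median in MAD units."""
    dev = np.abs(t - np.median(t))
    return dev / (1.4826 * np.median(dev))

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))       # invented data: 30 objects, 4 variables
X[0] += 25.0                       # planted outlier, for illustration only

# Steps 1-5: first pass with a few components, then remove flagged objects.
T, P, _ = pca(autoscale(X), A=2)
flagged = np.where(robust_z(T[:, 0]) > 6.0)[0]    # threshold is an assumption
X_clean = np.delete(X, flagged, axis=0)

# Steps 6-8: rescale the cleaned data and pick the number of components.
Z = autoscale(X_clean)
_, _, frac = pca(Z, A=Z.shape[1])
A = int(np.searchsorted(np.cumsum(frac), 0.80) + 1)   # hypothetical 80% rule

# Steps 9-11: final model; scores and loadings for plotting, residuals per object.
T, P, _ = pca(Z, A)
E = Z - T @ P.T
res_std = np.sqrt((E**2).sum(axis=1) / Z.shape[1])    # assumed divisor: K

print("flagged objects:", flagged, " components kept:", A)
print("residual stdev per object:", np.round(res_std, 3))
```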

  39. [Figure: Wines, residual standard deviation; labels 0 to 4]

  40. [Figure: Wines, residual standard deviation; labels 0 to 4]
