Principal Component Analysis

Principal Component Analysis. Zelin Jia, Shengbin Lin, 10/20/2015. What is PCA? An orthogonal transformation that converts correlated variables into artificial variables (principal components). The resulting vectors are an orthogonal basis set. A tool in exploratory data analysis.

Presentation Transcript


  1. Principal Component Analysis Zelin Jia Shengbin Lin 10/20/2015

  2. What is PCA? • An orthogonal transformation • Converts correlated variables into uncorrelated artificial variables (principal components) • The resulting vectors form an orthogonal basis set • A tool in exploratory data analysis https://en.wikipedia.org/wiki/Principal_component_analysis

  3. Why use PCA? • Reduce the dimensionality of the data • Compress the data • Prepare the data for further analysis using other techniques • Understand your data better by interpreting the loadings, and by graphing the derived variables http://psych.colorado.edu/wiki/lib/exe/fetch.php?media=labs:learnr:emily_-_principal_components_analysis_in_r:pca_how_to.pdf Dr. Peter Westfall

  4. How PCA works • PCA begins with the covariance matrix of the column-centered data: $\mathrm{Cov}(X) = X^T X / (n-1)$, where $n$ is the number of observations • Calculate the eigenvectors and eigenvalues of the covariance matrix • This gives eigenvectors $z_i$ and eigenvalues $\lambda_i$ (constraint: $z_i^T z_i = 1$) • Arrange the eigenvectors in decreasing order of their eigenvalues • Pick the leading eigenvectors and multiply the original data matrix $X$ by them to get the PC matrix. https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
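
The steps on this slide can be sketched in a few lines. This is only an illustration in Python with NumPy, not the R code used in the presentation; the function name `pca_eig` and the random 25 × 8 data are assumptions made up for the example.

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the sample covariance matrix (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                 # center each column
    S = Xc.T @ Xc / (Xc.shape[0] - 1)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # symmetric input, eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = eigvecs[:, :k]                      # keep the k leading unit-length eigenvectors
    return Xc @ Z, eigvals, Z               # PC scores, all eigenvalues, loadings

# Made-up data: 25 observations of 8 variables (not the financial sample from the slides).
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 8))
scores, eigvals, Z = pca_eig(X, k=3)
print(scores.shape)              # one row per observation, one column per kept component
print(eigvals / eigvals.sum())   # share of total variance carried by each component
```

The sample variance of each score column equals the corresponding eigenvalue, which is why sorting by eigenvalue ranks the components by variance.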

  5. Example of how PCA works (in R) • A sample financial data set with 8 variables and 25 observations • Perform PCA on this data and reduce the number of variables from 8 to something more manageable https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca

  6. Simulate PCA on uncorrelated data and highly correlated data (in R) • PCA works better on more highly correlated data, in that a greater reduction in dimension is achievable. Provided by Dr. Peter Westfall
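
A small simulation can make this point concrete. The sketch below is not Dr. Westfall's R simulation; it is a stand-in written in Python with NumPy, and the sample size, the number of variables, the noise level and the helper name `explained_variance_ratio` are all assumptions chosen for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component (hypothetical helper)."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # decreasing order
    return eigvals / eigvals.sum()

rng = np.random.default_rng(1)
n, p = 500, 5

# Uncorrelated case: p independent standard normal variables.
uncorrelated = rng.normal(size=(n, p))

# Highly correlated case: every variable is one shared factor plus a little noise.
common = rng.normal(size=(n, 1))
correlated = common + 0.2 * rng.normal(size=(n, p))

r_unc = explained_variance_ratio(uncorrelated)
r_cor = explained_variance_ratio(correlated)
print("uncorrelated, share of variance in PC1:", round(float(r_unc[0]), 3))
print("correlated,   share of variance in PC1:", round(float(r_cor[0]), 3))
```

With independent variables each component carries roughly an equal share, so dropping components loses a lot; with a strong shared factor the first component alone carries most of the variance.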

  7. PCA standardization • Why: without standardization, a variable measured in small numbers, even if it is the more important one, is overwhelmed in its contribution to the covariance by variables measured in larger numbers. https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca
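
The effect of scale can be seen on two made-up variables. This is a minimal sketch in Python with NumPy; the scales (1 and 1000) and the helper name `first_loading` are assumptions for illustration, not values from the slides.

```python
import numpy as np

def first_loading(X):
    """Loading vector of the first principal component (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, -1]          # eigh sorts ascending, so the last column is PC1

rng = np.random.default_rng(2)
n = 200
small = rng.normal(scale=1.0, size=n)       # variable measured in small numbers
large = rng.normal(scale=1000.0, size=n)    # variable measured in large numbers
X = np.column_stack([small, large])

raw = first_loading(X)                                  # PCA on the raw data
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # zero mean, unit variance
std = first_loading(X_std)                              # PCA on standardized data

print("raw loadings:         ", np.abs(raw))
print("standardized loadings:", np.abs(std))
```

On the raw data the first component is almost entirely the large-scale variable; after standardization the two variables enter with equal weight in this two-variable case.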

  8. Properties of PCs • The number of principal components is less than or equal to the number of original variables. • The first principal component has the largest possible variance. • Each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. https://en.wikipedia.org/wiki/Principal_component_analysis
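
These properties can be written as an optimization problem. The notation is an assumption carried over from the "How PCA works" slide: $S$ is the sample covariance matrix of the centered data $X$ with $p$ variables, and $z_k$, $\lambda_k$ are its unit-length eigenvectors and eigenvalues.

```latex
z_1 = \arg\max_{z^T z = 1} \; z^T S z,
\qquad
z_k = \arg\max_{\substack{z^T z = 1 \\ z^T z_j = 0,\; j < k}} \; z^T S z,
\qquad
\operatorname{Var}(X z_k) = z_k^T S z_k = \lambda_k,
\quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
```

Since $S$ is $p \times p$, there are at most $p$ such directions, which is the first property on the slide.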

  9. What is SVD? Applied_Regression_Analysis_A_Research_Tool.pdf

  10. Relationship between SVD and PCA • From the SVD we have $X = U L^{1/2} Z^T$, and since $Z^T Z = I$, it follows that $W = XZ = U L^{1/2}$ • If $X$ is an $n \times p$ matrix of observations on $p$ variables, each column of $W$ is a new variable defined as a linear transformation of the original variables. Applied_Regression_Analysis_A_Research_Tool.pdf
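
The identity on this slide can be checked numerically. The sketch below uses Python with NumPy on made-up data, and it assumes that $L$ holds the eigenvalues of $X^T X$, so that $L^{1/2}$ holds the singular values of $X$, as the notation suggests.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))     # made-up data: n = 30 observations, p = 4 variables
X = X - X.mean(axis=0)           # center the columns

# Thin SVD: X = U diag(s) Z^T, where s plays the role of the diagonal of L^{1/2}.
U, s, Zt = np.linalg.svd(X, full_matrices=False)

W_from_projection = X @ Zt.T     # W = X Z
W_from_svd = U * s               # W = U L^{1/2}; broadcasting scales column j of U by s[j]

# Squared singular values should match the eigenvalues of X^T X (decreasing order).
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]

print(np.allclose(W_from_projection, W_from_svd))
print(np.allclose(s**2, eigvals))
```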

  11. EFA vs PCA • EFA: provides a model to explain why the data look the way they do. • PCA: is not a model that explains how the data look; there is no model at all. Provided by Dr. Peter Westfall

  12. EFA vs PCA http://www.gac-usp.com.br/resources/use_of_exploratory_factor_analysis_park_dailey.pdf

  13. EFA vs PCA • EFA: one postulates that there is a smaller set of unobserved (latent) variables or constructs underlying the variables actually observed or measured (this is commonly done to assess validity). • PCA: one is simply trying to mathematically derive a relatively small number of variables that convey as much of the information in the observed/measured variables as possible. http://www.gac-usp.com.br/resources/use_of_exploratory_factor_analysis_park_dailey.pdf

  14. Applications of PCA • Data visualization • Image compression

  15. Data visualization • If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user with a lower-dimensional picture. https://en.wikipedia.org/wiki/Principal_component_analysis

  16. Using PCA to compress images • The PCA formulation may be used as a digital image compression algorithm with a low level of loss. http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1679-45082012000200004
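
The idea can be tried on a synthetic image. This is a rough sketch in Python with NumPy, not the method of the linked article: it treats each image row as an observation, keeps `k` principal components, and rebuilds the image from them. The 64 × 64 test pattern and the function name `compress` are assumptions for illustration.

```python
import numpy as np

def compress(image, k):
    """Rebuild an image from its first k principal components (hypothetical helper)."""
    mean = image.mean(axis=0)
    centered = image - mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :k] * s[:k]          # k numbers per row instead of one per column
    return scores @ Vt[:k] + mean      # approximate image from k components

# Synthetic 64 x 64 "image": a smooth pattern plus a little noise.
rng = np.random.default_rng(4)
x = np.linspace(0, 1, 64)
image = np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))
image = image + 0.01 * rng.normal(size=(64, 64))

# Relative reconstruction error for a few choices of k.
errors = {}
for k in (1, 5, 20):
    errors[k] = np.linalg.norm(image - compress(image, k)) / np.linalg.norm(image)
print(errors)
```

Storing `k` score columns, `k` loading vectors and the column means takes far fewer numbers than the full image when `k` is small, and the error shrinks as `k` grows.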

  17. princomp vs prcomp • For prcomp: "The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy." • For princomp: "The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp." http://stats.stackexchange.com/questions/20101/what-is-the-difference-between-r-functions-prcomp-and-princomp
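
The two computational routes can be compared side by side. The sketch below is written in Python with NumPy and only mimics the two approaches; it does not call R. It also assumes the usual divisors, n - 1 for the SVD route (as in prcomp) and n for the eigen route (as in princomp's default), which is why the standard deviations differ by a constant factor.

```python
import numpy as np

rng = np.random.default_rng(5)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(40, 3)) @ mixing    # made-up correlated data
n = X.shape[0]
Xc = X - X.mean(axis=0)                  # center the columns

# SVD route: singular values of the centered data, divisor n - 1.
s = np.linalg.svd(Xc, compute_uv=False)
sdev_svd = s / np.sqrt(n - 1)

# Eigen route: eigenvalues of the covariance matrix, here with divisor n.
eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]
sdev_eig = np.sqrt(eigvals)

print(sdev_svd)
print(sdev_eig)
print(sdev_svd / sdev_eig)   # constant ratio sqrt(n / (n - 1))
```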

  18. Thanks!