Multivariate statistical methods

amadis

Presentation Transcript


  1. Multivariate statistical methods

  2. Multivariate methods • multivariate dataset – a group of n objects described by m variables (as a rule n > m, if possible) • confirmatory vs. exploratory analysis • confirmatory – emphasis on parameter estimation and hypothesis testing • exploratory – emphasis on exploring the data and finding patterns and structure

  3. Multivariate statistical methods • Classification of units: cluster analysis, discriminant analysis • Analysis of relations among variables: canonical correlation analysis, factor analysis, principal component analysis

  4. Methods for analysis of relations among variables

  5. Principal component analysis • one of the oldest and most widely used multivariate statistical methods • introduced by Pearson in 1901 and, independently, by Hotelling in 1933 • principal aims: • detecting relations among variables • reducing the number of variables and finding new, meaningful variables

  6. Principal component analysis • the basis is a linear transformation of the original variables into a smaller number of new, fictitious variables, the so-called principal components • component characteristics: • the components are mutually uncorrelated • for m original variables, r ≤ m components are kept; ideally r is much smaller than m, and the r principal components still explain a sufficient share of the variability of the original variables

  7. PCA • component characteristics: • the method is based on explaining the total variability in full • principal components are ordered by their share of explained variance • the first component explains the most variance, the last component the least

  8. PCA procedure • initial analysis – exploring relations among the variables (graphs, descriptive statistics) • examining the correlation matrix (if the original variables are correlated, reducing their number is possible) • principal component analysis and choice of a suitable number of components (70–90 % of explained variance is usually enough) • interpretation of the principal components
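The "70–90 % of explained variance" rule for choosing the number of components can be sketched in a few lines. This is only an illustration: the function name `choose_n_components`, the 0.8 threshold, and the eigenvalues are hypothetical and not taken from the slides.

```python
import numpy as np

def choose_n_components(eigvals, threshold=0.8):
    """Return (r, cumulative): the smallest r whose first r components
    explain at least `threshold` of the total variance."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative share reaches the threshold.
    r = int(np.searchsorted(cumulative, threshold)) + 1
    return min(r, len(eigvals)), cumulative

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to m = 4).
eigvals = [2.4, 1.1, 0.3, 0.2]
r, cumulative = choose_n_components(eigvals, threshold=0.8)
print(r, np.round(cumulative, 3))
```

With these made-up eigenvalues the first two components explain 3.5 / 4 = 87.5 % of the variance, so two components pass an 80 % threshold.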

  9. PCA procedure • PCA is based on either • the covariance matrix (variables in the same units with similar variances), or • the correlation matrix (standardized data, or variables in different units)
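Why the units matter can be seen on a small synthetic example. The sketch below assumes two made-up variables (height in metres, mass in grams) generated at random; nothing here comes from the original presentation. It compares the share of variance carried by each component under the covariance matrix and under the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)  # synthetic data, for illustration only
height_m = rng.normal(1.75, 0.1, size=200)
mass_g = 1000 * (70 + 50 * (height_m - 1.75)) + rng.normal(0, 5000, size=200)
X = np.column_stack([height_m, mass_g])

S = np.cov(X, rowvar=False)        # covariance matrix (depends on units)
R = np.corrcoef(X, rowvar=False)   # correlation matrix (unit-free)

# Standardized data: zero mean and unit sample variance in every column.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_Z = np.cov(Z, rowvar=False)  # should match R

# Share of total variance per component, largest first.
share_cov = np.linalg.eigvalsh(S)[::-1] / np.trace(S)
share_cor = np.linalg.eigvalsh(R)[::-1] / np.trace(R)
print("covariance-based shares: ", share_cov)
print("correlation-based shares:", share_cor)
```

Under the covariance matrix the variable with the numerically largest variance (mass in grams) dominates the first component almost entirely, while the correlation matrix treats both variables on an equal footing.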

  10. Model of PCA → symbols: standardized original variable, weights of the principal component, principal components in standardized form • indices: i = 1, 2, …, n (number of units); j, k = 1, 2, …, p (number of variables)

  11. PCA – mathematical model • original matrix – dataset X (n × m), n objects, m variables • Z = [zij] is the standardized matrix X, i = 1, …, n, j = 1, …, m • the aim is to find a transformation matrix Q that converts the m standardized variables (matrix Z) into m mutually uncorrelated components (matrix P): P = Z · Q
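A minimal NumPy sketch of the transformation P = Z · Q follows. It assumes that Q is obtained from the eigenvectors of the correlation matrix and that standardization uses the sample standard deviation (divisor n − 1); the function name `pca_standardized` and the random data are hypothetical.

```python
import numpy as np

def pca_standardized(X):
    """Return (Z, Q, eigvals, P) for a data matrix X of shape (n, m)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardize each column: zero mean, unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = (Z.T @ Z) / (n - 1)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, Q = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]      # largest variance first
    eigvals, Q = eigvals[order], Q[:, order]
    P = Z @ Q                              # component matrix, P = Z . Q
    return Z, Q, eigvals, P

rng = np.random.default_rng(0)             # synthetic data, illustration only
X = rng.normal(size=(50, 3))
X[:, 1] += 0.8 * X[:, 0]                   # make two variables correlated
Z, Q, eigvals, P = pca_standardized(X)
print(np.round(np.cov(P, rowvar=False), 6))  # off-diagonal entries are ~0
```

The printed covariance matrix of P is diagonal up to rounding, which is the sense in which the components are mutually uncorrelated.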

  12. PCA – mathematical model • rearranging P = Z · Q → we get the matrix Λ

  13. PCA – mathematical model • Λ is the matrix of variances and covariances of the principal components. Because the principal components are uncorrelated, the covariances are 0 and Λ is diagonal, with the variances of the principal components on the diagonal • the sum of the variances of the standardized variables equals m, so the proportions of the diagonal values of Λ to m indicate how large a share of the total variance of all variables the first, second, …, last component explains

  14. PCA – mathematical model • R is the correlation matrix of the original variables • the diagonal values of Λ are the eigenvalues of R; the columns of Q are the eigenvectors belonging to each eigenvalue
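The formulas on these slides did not survive in the transcript. One way to write the step from P = Z · Q to Λ, assuming Z is standardized with the divisor n − 1 and Q is orthogonal (QᵀQ = I), is sketched below; the symbol λ_k for the k-th diagonal value of Λ is introduced here for illustration.

```latex
\begin{aligned}
R &= \frac{1}{n-1}\, Z^{\mathsf T} Z, \\
\Lambda &= \frac{1}{n-1}\, P^{\mathsf T} P
         = \frac{1}{n-1}\, (ZQ)^{\mathsf T} (ZQ)
         = Q^{\mathsf T} R\, Q, \\
R\,Q &= Q\,\Lambda, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_m), \qquad
\sum_{k=1}^{m} \lambda_k = \operatorname{tr}(R) = m .
\end{aligned}
```

The last line is the eigenvalue problem for R, and the trace identity is why the share of total variance explained by the k-th component is λ_k / m.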

  15. PCA – other notions • the coordinates of an object on the nonstandardized principal components are called its "scores" • the matrix of all scores for all n objects is called the "score matrix" • the scores of each object are in a row • the columns of the matrix are the score vectors

  16. PCA – other notions • the share of the total variability of each original variable Xi, i = 1, 2, …, m, that is explained by the r principal components is called the communality of Xi • it is computed as the square of the multiple correlation coefficient → r²
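Communalities can be sketched numerically as follows. Because the components are uncorrelated, the squared multiple correlation of a variable with the first r components equals the sum of its squared correlations with each of them. The sketch assumes PCA on the correlation matrix; the data are random and the helper `communalities` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(2)   # synthetic data, for illustration only
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]               # introduce some correlation
X[:, 3] += 0.5 * X[:, 2]

R = np.corrcoef(X, rowvar=False)
eigvals, Q = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, Q = eigvals[order], Q[:, order]

# loadings[j, k]: correlation between original variable j and component k.
loadings = Q * np.sqrt(eigvals)

def communalities(loadings, r):
    """Share of each variable's variance explained by the first r components."""
    return (loadings[:, :r] ** 2).sum(axis=1)

h2_two = communalities(loadings, 2)
h2_all = communalities(loadings, 4)
print(np.round(h2_two, 3))
print(np.round(h2_all, 3))       # all m components explain everything
```

Keeping all m components gives a communality of 1 for every variable, which matches the statement that the full set of components explains the total variability.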

  17. PCA – graphical visualization • Cattell's graph → scree plot • a tool for determining the number of principal components

  18. PCA – graphical visualization • graph of coefficients of correlation (1st and 2nd principal component)

  19. PCA – graphical visualization • graph of component scores
