Multivariate statistical methods
Multivariate methods • multivariate dataset – a group of n objects described by m variables (as a rule n > m, if possible) • confirmatory vs. exploratory analysis • confirmatory – focused on parameter estimation and hypothesis testing • exploratory – focused on data exploration and on discovering patterns and structure
Multivariate statistical methods • Classification of units: cluster analysis, discriminant analysis • Analysis of relations among variables: canonical correlation analysis, factor analysis, principal component analysis
Principal component analysis • one of the oldest and most widely used multivariate statistical methods • introduced by Pearson in 1901 and, independently, by Hotelling in 1933 • principal aims: • detecting relations among variables • reducing the number of variables and finding new, meaningful variables
Principal component analysis • based on a linear transformation of the original variables into a smaller number of new, artificial variables, the so-called principal components • component characteristics: • they are mutually uncorrelated • for m original variables, a suitable dimension r ≤ m is chosen (ideally r is much smaller than m) so that the first r principal components explain a sufficient share of the variability of the original variables
PCA • component characteristics: • the method is based on a full explanation of the total variability (all m components together explain 100 % of it) • principal components are ordered by their share of explained variance • the first component explains the most variance, the last component the least
PCA procedure • preliminary analysis – exploring relations among variables (graphs, descriptive statistics) • examining the correlation matrix (if the original variables are correlated, reducing the number of variables is possible) • principal component analysis and choice of a suitable number of components (usually 70–90 % of explained variance is enough) • interpretation of the principal components
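As an illustration of the "70–90 % of explained variance" rule, the sketch below picks the smallest number of components r whose cumulative share of variance reaches a chosen threshold. It assumes Python with NumPy; the function name `choose_n_components`, the default threshold of 0.8 and the example eigenvalues are hypothetical choices for illustration.

```python
import numpy as np

def choose_n_components(eigenvalues, threshold=0.8):
    """Return the smallest r whose first r components explain >= threshold of total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # largest eigenvalue first
    cumulative = np.cumsum(lam) / lam.sum()                      # cumulative explained share
    # first position where the cumulative share reaches the threshold, converted to a count
    r = int(np.searchsorted(cumulative, threshold)) + 1
    return min(r, len(lam))                                      # guard against rounding at 100 %

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to m = 4)
eigenvalues = [2.5, 1.0, 0.3, 0.2]
print(choose_n_components(eigenvalues, threshold=0.8))  # 2  (two components explain 87.5 %)
```

Raising the threshold to 0.9 would keep three components here, which shows how sensitive r is to the chosen cut-off.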
PCA procedure • PCA can be based on • the covariance matrix (variables in the same units, with similar variances) • the correlation matrix (standardized data, or variables in different units)
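A minimal NumPy sketch, with synthetic (hypothetical) data, of why this choice matters: the correlation matrix is simply the covariance matrix of the standardized data, whereas a covariance-based analysis of variables on very different scales is dominated by the variable with the largest variance. Standardization here assumes the sample standard deviation (`ddof=1`).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 objects, 3 independent variables on very different scales
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 1000.0])

cov = np.cov(X, rowvar=False)          # covariance matrix (depends on units)
corr = np.corrcoef(X, rowvar=False)    # correlation matrix (unit-free)

# Standardizing each column (mean 0, sample sd 1) turns the covariance into the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(np.allclose(np.cov(Z, rowvar=False), corr))  # True

# Share of total variance carried by each component, largest first
print(np.linalg.eigvalsh(cov)[::-1] / np.trace(cov))    # first share is close to 1
print(np.linalg.eigvalsh(corr)[::-1] / np.trace(corr))  # shares are much more even
```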
Model of PCA: $z_{ij} = \sum_{k=1}^{m} q_{jk}\, p_{ik}$, where $z_{ij}$ … standardized original variable, $q_{jk}$ … weights of the principal components, $p_{ik}$ … principal components computed from the standardized variables; $i = 1, 2, \dots, n$ (number of units), $j, k = 1, 2, \dots, m$ (number of variables)
PCA – mathematical model • original matrix – dataset $X$ ($n \times m$), n objects, m variables • $Z = [z_{ij}]$ is the standardized matrix $X$, $i = 1, \dots, n$, $j = 1, \dots, m$ • the aim is to find a transformation matrix $Q$ that converts the m standardized variables (matrix $Z$) into m mutually uncorrelated components (matrix $P$): $P = ZQ$
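The transformation $P = ZQ$ can be sketched in NumPy as follows. The sketch takes Q as the eigenvectors of the correlation matrix R, ordered by decreasing eigenvalue, and standardizes with the sample standard deviation; the data are synthetic. Because Q is orthogonal, it also checks the inverse relation $Z = PQ^{\top}$, i.e. $z_{ij} = \sum_k q_{jk}\, p_{ik}$.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical dataset: n = 50 objects, m = 4 variables, the first two correlated
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]

n, m = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized matrix Z = [z_ij]

R = Z.T @ Z / (n - 1)                 # correlation matrix of the original variables
eigenvalues, Q = np.linalg.eigh(R)    # columns of Q are eigenvectors of R (ascending order)
Q = Q[:, np.argsort(eigenvalues)[::-1]]  # reorder so the first column has the largest variance

P = Z @ Q                             # P = Z . Q, an n x m matrix of principal components

# Q is orthogonal, so the transformation can be inverted: Z = P Q^T
print(np.allclose(P @ Q.T, Z))        # True
```

Note that each eigenvector is determined only up to its sign, so another implementation may return some columns of P multiplied by -1.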
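PCA – mathematical model • modifying $P = ZQ$ gives the covariance matrix of the principal components: $\Lambda = \frac{1}{n-1} P^{\top} P = Q^{\top} \left( \frac{1}{n-1} Z^{\top} Z \right) Q = Q^{\top} R\, Q$, where $R$ is the correlation matrix of the original variables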
PCA – mathematical model • $\Lambda$ is the covariance matrix of the principal components. Because the principal components are uncorrelated, all covariances are 0 and $\Lambda$ is diagonal, with the variances of the principal components $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ on the diagonal • the variances of the standardized variables sum to m, so the proportions $\lambda_k / m$ indicate how large a share of the total variance of all variables is explained by the first, second, …, last component
PCA – mathematical model • $R$ is the correlation matrix of the original variables, $R = \frac{1}{n-1} Z^{\top} Z$ • the diagonal values of $\Lambda$ are the eigenvalues of $R$, and the columns of $Q$ are the eigenvectors belonging to the respective eigenvalues, so that $R = Q \Lambda Q^{\top}$
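A numerical check of these relations, as a NumPy sketch on synthetic data. `np.linalg.eigh` (for symmetric matrices) returns eigenvalues in ascending order, so they are re-sorted to put the largest first. The sketch verifies that $Q^{\top} R Q$ is diagonal with the eigenvalues on the diagonal and that the eigenvalues sum to m, then prints the shares $\lambda_k / m$.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: n = 60 objects, m = 3 variables, two of them correlated
X = rng.normal(size=(60, 3))
X[:, 2] += X[:, 0]

n, m = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = Z.T @ Z / (n - 1)                     # correlation matrix of X

eigenvalues, Q = np.linalg.eigh(R)        # ascending eigenvalues, eigenvectors in columns
order = np.argsort(eigenvalues)[::-1]     # largest variance first
lam, Q = eigenvalues[order], Q[:, order]

Lam = Q.T @ R @ Q                         # covariance matrix of the components
print(np.allclose(Lam, np.diag(lam)))     # True: diagonal, eigenvalues on the diagonal
print(np.isclose(lam.sum(), m))           # True: total variance of standardized data is m
print(lam / m)                            # share of total variance per component
```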
PCA – other notions • the coordinates of objects on the non-standardized principal components are called "scores" • the matrix of scores of all n objects is called the "score matrix" • the scores of individual objects are in its rows • its columns are the score vectors of the components
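A short NumPy sketch of the score matrix on synthetic data, taking Q as the eigenvectors of the correlation matrix as in the mathematical model. The non-standardized scores keep the component variances $\lambda_k$; the rescaled matrix `std_scores` (a hypothetical name) is shown only for contrast, with each column scaled to unit variance.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))       # hypothetical data: 8 objects, 3 variables
X[:, 1] += X[:, 0]

n, m = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, Q = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
lam, Q = eigenvalues[order], Q[:, order]

scores = Z @ Q                        # score matrix: row i = object i, column k = component k
std_scores = scores / np.sqrt(lam)    # rescaled so that each column has unit variance

print(scores.shape)                   # (8, 3)
print(scores.var(axis=0, ddof=1))     # equal to lam (up to rounding)
```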
PCA – other notions • the share of the total variability of an original variable $X_i$, $i = 1, 2, \dots, m$, that is explained by the first r principal components is called the communality of $X_i$ • it is computed as the square of the multiple correlation coefficient of $X_i$ with these r components (the coefficient of determination)
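For a PCA of the correlation matrix, the correlation of variable j with component k is $q_{jk}\sqrt{\lambda_k}$, and because the components are uncorrelated, the communality equals the sum of these squared correlations over the r retained components. The NumPy sketch below (synthetic data, r = 2 chosen arbitrarily) computes it that way and cross-checks it against the coefficient of determination of a least-squares regression of each standardized variable on the r component scores.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))     # hypothetical data: 100 objects, 4 variables
X[:, 1] += X[:, 0]
X[:, 3] += 0.5 * X[:, 2]

n, m = X.shape
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, Q = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
lam, Q = eigenvalues[order], Q[:, order]

r = 2                                    # number of retained components (illustrative choice)
loadings = Q[:, :r] * np.sqrt(lam[:r])   # correlation of each variable with each retained component
communality = (loadings ** 2).sum(axis=1)

# Cross-check: R^2 of regressing each standardized variable on the r component scores
P = Z @ Q[:, :r]
coef, *_ = np.linalg.lstsq(P, Z, rcond=None)
residual = Z - P @ coef
r_squared = 1 - (residual ** 2).sum(axis=0) / (Z ** 2).sum(axis=0)
print(np.allclose(communality, r_squared))  # True
```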
PCA – graphical visualization • Cattell's scree plot • a tool for determining the number of principal components to retain
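A scree plot shows the eigenvalues in decreasing order; the number of components is usually read off where the curve levels off (the "elbow"). To stay dependency-free, the sketch below prints a text version instead of using a plotting library. The function `text_scree` and the eigenvalues are hypothetical.

```python
import numpy as np

def text_scree(eigenvalues, width=40):
    """Return a simple text 'scree plot': one bar per component, largest eigenvalue first."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    lines = []
    for k, value in enumerate(lam, start=1):
        bar = "#" * int(round(width * value / lam[0]))   # bar length proportional to eigenvalue
        lines.append(f"PC{k:<2} {value:6.3f} {bar}")
    return "\n".join(lines)

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (sum = 5)
print(text_scree([2.6, 1.3, 0.6, 0.3, 0.2]))
```

In this made-up example the bars drop sharply over the first two or three components and then flatten, which is the pattern the scree criterion looks for.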
PCA – graphical visualization • graph of the correlation coefficients between the original variables and the 1st and 2nd principal components
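The coordinates in this graph are the correlations between each variable and the first two components. In a correlation-matrix PCA they can be computed as $q_{jk}\sqrt{\lambda_k}$; the NumPy sketch below (synthetic data) does so and checks the result against correlations computed directly with `np.corrcoef`. Each point lies inside the unit circle, because its squared distance from the origin is the variable's communality for two components.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 4))      # hypothetical data: 80 objects, 4 variables
X[:, 1] += X[:, 0]
X[:, 3] -= X[:, 2]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, Q = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
lam, Q = eigenvalues[order], Q[:, order]

# Correlation of variable j with component k equals q_jk * sqrt(lambda_k)
coords = Q[:, :2] * np.sqrt(lam[:2])   # row j = (x, y) point of variable j in the graph

# Cross-check against directly computed correlations
P = Z @ Q[:, :2]
direct = np.array([[np.corrcoef(Z[:, j], P[:, k])[0, 1] for k in range(2)] for j in range(4)])
print(np.allclose(coords, direct))     # True
```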
PCA – graphical visualization • graph of component scores