Canonical Correlation Analysis (CCA)
CCA • This is it! • The mother of all linear statistical analyses • When? • We want to find a structural relation between a set of independent variables and a set of dependent variables.
CCA • When? (part 2) • To what extent can one set of two or more variables be predicted or “explained” by another set of two or more variables? • What contribution does a single variable make to the explanatory power of the set of variables to which the variable belongs? • What contribution does a single variable make to predicting or “explaining” the composite of the variables in the variable set to which the variable does not belong? • What different dynamics are involved in the ability of one variable set to “explain”, in different ways, different portions of the other variable set? • What relative power do different canonical functions have to predict or explain relationships? • How stable are canonical results across samples or sample subgroups? • How closely do obtained canonical results conform to expected canonical results?
CCA • Assumptions • Linearity: if not, use nonlinear canonical correlation analysis. • Absence of multicollinearity: if not, use Partial Least Squares (PLS) regression to reduce the space. • Homoscedasticity: if not, transform the data. • Normality: if not, use re-sampling. • A lot of data: max(p, q) × 20 × nb of pairs. • Absence of outliers.
CCA • Toy example • [Data matrix X: columns are the IVs followed by the DVs]
CCA • Z score transformation • Each column of X (IVs and DVs) is standardized to give Z • [Matrix Z]
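The z-score step can be sketched as follows. The data are randomly generated placeholders, not the toy example from the slide, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

# Hypothetical data: 8 cases, 4 columns (2 IVs followed by 2 DVs).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(8, 4))

# Standardize each column: subtract its mean, divide by its sample
# standard deviation (ddof=1 is an assumed convention).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(Z.mean(axis=0))         # each column mean is ~0
print(Z.std(axis=0, ddof=1))  # each column standard deviation is ~1
```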
CCA • Canonical Correlation Matrix
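The slide names the matrix without defining it. A common definition, assumed here, partitions the correlation matrix of all variables into R_xx, R_xy, R_yx and R_yy and forms R = R_yy⁻¹ R_yx R_xx⁻¹ R_xy. A minimal sketch on randomly generated data:

```python
import numpy as np

# Hypothetical data: n cases, p IVs and q DVs that depend partly on the IVs.
rng = np.random.default_rng(0)
n, p, q = 200, 2, 2
ivs = rng.normal(size=(n, p))
dvs = 0.6 * ivs @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))

# Correlation matrix of all variables, partitioned into four blocks.
full = np.corrcoef(np.hstack([ivs, dvs]), rowvar=False)
Rxx, Rxy = full[:p, :p], full[:p, p:]
Ryx, Ryy = full[p:, :p], full[p:, p:]

# Assumed definition of the canonical correlation matrix.
R = np.linalg.inv(Ryy) @ Ryx @ np.linalg.inv(Rxx) @ Rxy
print(R)
```

R is q × q and generally not symmetric, but its eigenvalues are real and lie between 0 and 1.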
CCA • Relations with other subspace methods
CCA • Eigenvalues and eigenvectors decomposition • R is decomposed into eigenvalues and eigenvectors, as in PCA
CCA • Eigenvalues and eigenvectors decomposition • The square roots of the eigenvalues are the canonical correlations
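A small sketch of this step, assuming R = R_yy⁻¹ R_yx R_xx⁻¹ R_xy and using made-up correlation blocks rather than the slide's values:

```python
import numpy as np

# Hypothetical correlation blocks for 2 IVs and 2 DVs.
Rxx = np.array([[1.0, 0.4], [0.4, 1.0]])
Ryy = np.array([[1.0, 0.3], [0.3, 1.0]])
Rxy = np.array([[0.5, 0.2], [0.3, 0.4]])

# Assumed definition of the canonical correlation matrix.
R = np.linalg.inv(Ryy) @ Rxy.T @ np.linalg.inv(Rxx) @ Rxy

# R is not symmetric, so use the general eigen-solver and keep the real part.
eigvals, eigvecs = np.linalg.eig(R)
order = np.argsort(eigvals.real)[::-1]  # largest eigenvalue first
eigvals = eigvals.real[order]
eigvecs = eigvecs.real[:, order]

# Canonical correlations are the square roots of the eigenvalues.
canonical_corr = np.sqrt(eigvals)
print(canonical_corr)
```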
CCA • Significance test for the canonical correlation • A significant output indicates that there is shared variance between the IV and DV sets • Procedure: • We test all the canonical correlates together (m = 1, …, min(p, q)) • If significant, we remove the first canonical correlate and test the remaining ones (m = 2, …, min(p, q)) • Repeat
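The slide does not name the test statistic. The sketch below assumes Bartlett's chi-square approximation to Wilks' lambda, a common choice for this sequential procedure. The function name `sequential_tests` and the input values are hypothetical.

```python
import numpy as np

def sequential_tests(eigvals, n, p, q):
    """Return (m, chi2, df) for each step m = 1..len(eigvals).

    Assumes Bartlett's approximation:
      Wilks' lambda_m = prod_{i >= m} (1 - eigenvalue_i)
      chi2 = -[n - 1 - (p + q + 1) / 2] * ln(lambda_m)
      df   = (p - m + 1) * (q - m + 1)
    """
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    results = []
    for m in range(1, len(eigvals) + 1):
        wilks = np.prod(1.0 - eigvals[m - 1:])  # drop the first m-1 correlates
        chi2 = -(n - 1 - (p + q + 1) / 2.0) * np.log(wilks)
        df = (p - m + 1) * (q - m + 1)
        results.append((m, chi2, df))
    return results

# Hypothetical eigenvalues (squared canonical correlations) and sample size.
results = sequential_tests([0.64, 0.25], n=50, p=2, q=2)
for m, chi2, df in results:
    print(f"m={m}: chi2={chi2:.2f}, df={df}")
```

Each chi-square value is then compared with the chi-square distribution on `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available.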
CCA • Significance test for the canonical correlation Since all canonical variables are significant, we will keep them all.
CCA • Canonical Coefficients • Analogous to regression coefficients • B_y is obtained from the eigenvectors and the correlation matrix of the dependent variables • B_x is the corresponding matrix for the independent variables
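One standard way to obtain the two coefficient matrices is sketched below. It is an assumption, not necessarily the slide's exact formula: the eigenvectors of R are rescaled so that each DV-side variate has unit variance (b′ R_yy b = 1), and the IV-side coefficients follow as B_x = R_xx⁻¹ R_xy B_y with each column divided by its canonical correlation. The correlation blocks are made up.

```python
import numpy as np

# Hypothetical correlation blocks for 2 IVs and 2 DVs.
Rxx = np.array([[1.0, 0.4], [0.4, 1.0]])
Ryy = np.array([[1.0, 0.3], [0.3, 1.0]])
Rxy = np.array([[0.5, 0.2], [0.3, 0.4]])

# Eigen-decomposition of the (assumed) canonical correlation matrix.
R = np.linalg.inv(Ryy) @ Rxy.T @ np.linalg.inv(Rxx) @ Rxy
eigvals, eigvecs = np.linalg.eig(R)
order = np.argsort(eigvals.real)[::-1]
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
rc = np.sqrt(eigvals)  # canonical correlations

# DV-side coefficients: rescale each eigenvector so that b' Ryy b = 1.
By = eigvecs / np.sqrt(np.sum(eigvecs * (Ryy @ eigvecs), axis=0))

# IV-side coefficients: map By through Rxx^-1 Rxy, then divide each column
# by its canonical correlation so the IV-side variates also have unit variance.
Bx = np.linalg.inv(Rxx) @ Rxy @ By / rc

print(Bx)
print(By)
```

With this scaling, Bx′ R_xx Bx and By′ R_yy By are both identity matrices, and the diagonal of Bx′ R_xy By reproduces the canonical correlations.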
CCA • Canonical Variates • Scores obtained by applying the canonical coefficients to the z-scored variables, analogous to predicted scores in regression
CCA • Loading matrices • Matrices of correlations between the variables and the canonical variates: A_x for the IVs, A_y for the DVs
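A sketch tying canonical variates and loadings together on randomly generated data. It assumes coefficients scaled to give unit-variance variates and computes loadings as A_x = R_xx B_x and A_y = R_yy B_y. It then checks that A_x matches the directly computed correlations between the IVs and their variates.

```python
import numpy as np

# Hypothetical data: 2 IVs and 2 DVs that depend partly on the IVs.
rng = np.random.default_rng(2)
n, p, q = 500, 2, 2
ivs = rng.normal(size=(n, p))
dvs = ivs @ np.array([[0.7, 0.2], [0.1, 0.6]]) + rng.normal(size=(n, q))

def zscore(a):
    """Column-wise z-scores using the sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

Zx, Zy = zscore(ivs), zscore(dvs)
Rxx = Zx.T @ Zx / (n - 1)
Ryy = Zy.T @ Zy / (n - 1)
Rxy = Zx.T @ Zy / (n - 1)

# Canonical coefficients, scaled for unit-variance variates (assumed convention).
R = np.linalg.inv(Ryy) @ Rxy.T @ np.linalg.inv(Rxx) @ Rxy
lam, V = np.linalg.eig(R)
order = np.argsort(lam.real)[::-1]
lam, V = lam.real[order], V.real[:, order]
By = V / np.sqrt(np.sum(V * (Ryy @ V), axis=0))
Bx = np.linalg.inv(Rxx) @ Rxy @ By / np.sqrt(lam)

# Canonical variates: z-scores weighted by the canonical coefficients.
X_var, Y_var = Zx @ Bx, Zy @ By

# Loadings: correlations between variables and variates on the same side.
Ax, Ay = Rxx @ Bx, Ryy @ By

# Direct check: correlate each IV with each IV-side variate.
check = np.corrcoef(np.hstack([Zx, X_var]), rowvar=False)[:p, p:]
print(np.allclose(Ax, check))
```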
CCA • Loadings and canonical correlations for both canonical variate pairs • Only loadings greater than 0.3 in absolute value are interpreted. • [Figure: loadings and canonical correlation for each canonical variate pair]
CCA • Proportion of variance extracted • How much variance does each of the canonical variates extract from the variables on its own side of the equation? • [Results for the first and second canonical variates on each side]
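The usual computation, assumed here, is the mean of the squared loadings in each column of the loading matrix. The loading values below are hypothetical.

```python
import numpy as np

# Hypothetical IV-side loading matrix:
# rows = variables, columns = canonical variates.
Ax = np.array([[0.60, 0.80],
               [0.96, -0.28]])

# Proportion of variance each variate extracts from its own set:
# sum of squared loadings in the column divided by the number of variables.
pv_x = (Ax ** 2).mean(axis=0)
print(pv_x)        # one proportion per canonical variate
print(pv_x.sum())  # total extracted by all variates together
```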
CCA • Redundancy • How much variance the canonical variates from the IVs extract from the DVs, and vice versa. • [Equation: redundancy rd_yx, computed with the eigenvalues]
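Redundancy is commonly computed, and assumed here, as the proportion of variance a variate extracts from its own set multiplied by the corresponding eigenvalue (the squared canonical correlation). The numbers below are hypothetical, not the slide's results.

```python
import numpy as np

# Hypothetical inputs for two canonical variate pairs.
pv_y = np.array([0.55, 0.45])     # variance each DV variate extracts from the DVs
eigvals = np.array([0.64, 0.25])  # squared canonical correlations

# Redundancy: variance in the DVs accounted for by each IV-side variate.
rd = pv_y * eigvals
print(rd)        # per canonical variate
print(rd.sum())  # total redundancy across variates
```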
CCA • Redundancy • How much variance the canonical variates from the IVs extract from the DVs, and vice versa. • Summary: The first canonical variate from the IVs extracts 40% of the variance in the y variables. The second canonical variate from the IVs extracts 30% of the variance in the y variables. Together they extract 70% of the variance in the DVs. The first canonical variate from the DVs extracts 49% of the variance in the x variables. The second canonical variate from the DVs extracts 24% of the variance in the x variables. Together they extract 73% of the variance in the IVs.
CCA • Rotation • A rotation does not influence the variance proportion or the redundancy. • [Equation: rotated loading matrix]