Multivariate Statistics ESM 206, 5/17/05
WHAT IS MULTIVARIATE STATISTICS?
A collection of techniques to help us understand patterns in, and make predictions with, large datasets with many variables
• Ordination: find a (hopefully small) number of composite variables that capture most of the variability among data points
• Cluster Analysis: discover natural groupings of similar data points
• Discriminant Analysis: find a (hopefully small) number of composite variables that can be used to predict the levels of a categorical dependent variable
• Canonical Correlation Analysis: find relationships between two groups of variables
  • “Dependent variable” is multivariate
WHAT CAN MULTIVARIATE STATISTICS DO?
• Reflect more accurately the true multidimensional nature of environmental systems
• Provide a way to handle large datasets with large numbers of variables by summarizing the redundancy
• Provide rules for combining variables in an “optimal” way
• Provide a means of detecting and quantifying truly multivariate patterns that arise out of the correlational structure of the variable set
• Provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested experimentally
ORDINATION
• Simplify the interpretation of complex data by organizing sampling entities along independent gradients or factors defined by combinations of interrelated variables
• Uncover a more fundamental set of factors that account for the major patterns across all of the original variables
• If a few major gradients explain much of the variability in the data, then the data can be interpreted with respect to these gradients with little loss of information
PRINCIPAL COMPONENTS ANALYSIS (PCA)
• Most commonly used ordination technique
• Given P correlated variables, extract P principal components
  • Linear combinations of the variables
  • Uncorrelated with one another
  • First PC is direction through data cloud that captures the most variance in data
  • Second PC is direction perpendicular to first that captures the most remaining variance
  • Etc.
• Assumptions of PCA:
  • Data are multivariate normal
  • Data are independent
  • Observed variables depend linearly on underlying factors
  • May need to transform data to satisfy these
• Unless variables are all measured on same scale, use correlations rather than covariances
  • Gives equal weight to variability in all variables
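The extraction described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in the course: the function name `pca_correlation` and the synthetic data are assumptions made up for the example. It standardizes the variables (equivalent to using correlations rather than covariances), takes the eigenvectors of the correlation matrix as the PC directions, and projects the data onto them to get scores.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix (rows = observations, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so every variable gets equal weight.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]      # largest variance first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    scores = Z @ eigvecs                   # coordinates of each observation on the PCs
    return eigvals, eigvecs, scores

# Synthetic data (made up): three variables driven by one hidden factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
X = np.column_stack([
    factor + 0.3 * rng.normal(size=200),
    2.0 * factor + 0.5 * rng.normal(size=200),
    -factor + 0.4 * rng.normal(size=200),
])

eigvals, eigvecs, scores = pca_correlation(X)
print("eigenvalues:", eigvals)
print("covariance of scores:\n", np.cov(scores, rowvar=False).round(6))
```

With P = 3 variables the function returns 3 components; the printed covariance matrix of the scores is diagonal, which is the "uncorrelated with one another" property, and its diagonal holds the eigenvalues.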
EXAMPLE: CHEMICAL SOLUBILITY
• 72 chemical compounds tested for solubility in each of 6 solvents
• Solubility measured on log scale
• Strong (but not perfect) correlations among the 6 solvents
• Can we use fewer than 6 variables to characterize each chemical?
SOLUBILITY PCA
• Eigenvalue indicates how much of the variability in data is explained by the PC
  • Magnitude depends on number of variables (and variances if done with covariance matrix)
  • Instead look at percents
• Eigenvector gives coefficients of linear relationship of PC to each variable
  • NOTE: some software scales the eigenvectors differently
• Interpretation:
  • PC1 is axis of overall increasing solubility
  • PC2 is axis of differential solubility in 1-Octanol & Ether vs. other 4 solvents
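To make the points about percents and eigenvector scaling concrete, here is a small sketch. The 3×3 correlation matrix is hypothetical (all pairwise correlations set to 0.8) and is not the solubility data. It converts eigenvalues to percent of variance and compares unit-length eigenvectors with one common alternative scaling, "loadings", in which each eigenvector is multiplied by the square root of its eigenvalue.

```python
import numpy as np

# Hypothetical correlation matrix for illustration only (not the solubility data).
R = np.array([[1.0, 0.8, 0.8],
              [0.8, 1.0, 0.8],
              [0.8, 0.8, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Raw eigenvalues sum to the number of variables; percents are easier to read.
percent = 100.0 * eigvals / eigvals.sum()
cumulative = np.cumsum(percent)

# The sign of an eigenvector is arbitrary; flip PC1 so its coefficients are positive.
if eigvecs[:, 0].sum() < 0:
    eigvecs[:, 0] *= -1

# One alternative scaling: loadings = eigenvector * sqrt(eigenvalue).
# For a correlation-matrix PCA these are the correlations between variables and PCs.
loadings = eigvecs * np.sqrt(eigvals)

print("percent of variance:", percent.round(2))
print("cumulative percent: ", cumulative.round(2))
print("PC1 unit eigenvector:", eigvecs[:, 0].round(4))
print("PC1 loadings:        ", loadings[:, 0].round(4))
```

Both scalings describe the same axes; only the length of each coefficient vector differs, which is why coefficients reported by different packages may not match digit for digit.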
CHARACTERISTICS OF ORDINATION
• Organizes sampling entities (e.g., species, sites, observations) along continuous environmental gradients
• Assesses relationships within a single set of variables; doesn’t define a relationship between a set of independent variables and one or more dependent variables
  • However, PCs can be used as independent variables in a regression
• Reduces dimensionality of a multivariate data set by condensing a large number of original variables into a smaller set of new composite variables with minimal loss of information
• Summarizes data redundancy by placing similar entities in proximity in ordination space
• Defines new composite variables (e.g., principal components) as weighted linear combinations of the original variables
• Eliminates noise from a multivariate data set by recovering patterns in the first few composite dimensions and deferring noise to subsequent axes
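The remark that PCs can serve as independent variables in a regression can be sketched as follows. Everything here is an assumption for illustration: the data are synthetic (four correlated predictors generated from two hidden gradients), and the choice of k = 2 components is arbitrary. The first k PC scores replace the original predictors in an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Synthetic data (made up): 4 correlated predictors driven by 2 hidden gradients.
g = rng.normal(size=(n, 2))
mix = np.array([[1.0, 0.8, 0.0, 0.2],
                [0.0, 0.3, 1.0, 0.9]])
X = g @ mix + 0.1 * rng.normal(size=(n, 4))
y = 2.0 * g[:, 0] - 1.0 * g[:, 1] + 0.1 * rng.normal(size=n)

# PCA on the correlation matrix of the predictors.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the first k components as the new, uncorrelated predictors.
k = 2
scores = Z @ eigvecs[:, :k]

# Ordinary least squares of y on an intercept plus the k PC scores.
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
r_squared = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print("variance captured by first k PCs (%):", (100 * eigvals[:k].sum() / eigvals.sum()).round(1))
print("regression coefficients:", coef.round(3))
print("R^2:", round(float(r_squared), 3))
```

Because the scores are uncorrelated, the fitted coefficients do not suffer from the collinearity that the original four predictors would introduce.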
OTHER ORDINATION TECHNIQUES
• Polar Ordination (PO)
• Factor Analysis (FA)
  • This is often used as a generic term meaning “ordination” in social sciences
• Nonmetric Multidimensional Scaling (NMMDS)
  • Relaxes normality and linearity assumptions by using ranks
• Correspondence Analysis (CA)
  • Allows data (e.g., species abundance) to take on peak values at intermediate levels of the gradient
  • Also called Reciprocal Averaging
• Detrended Correspondence Analysis (DCA)
  • Deals particularly well with nonlinear relationships
• Canonical Correspondence Analysis (CCA)
  • Like CA, but ordination of variables of interest (e.g., species abundance) is constrained to depend linearly on other variables (e.g., environmental characteristics) measured at same sites
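As one illustration from this list, correspondence analysis can be sketched with a singular value decomposition of the standardized residuals of an abundance table. This is a common textbook formulation and not necessarily the one any particular package uses; the 5-site by 4-species table is made up, with each species peaking at a different point along the site sequence.

```python
import numpy as np

# Made-up abundance table: rows = sites along a gradient, columns = species.
N = np.array([[10, 2, 0,  0],
              [ 6, 8, 1,  0],
              [ 1, 9, 6,  1],
              [ 0, 3, 9,  5],
              [ 0, 0, 4, 10]], dtype=float)

P = N / N.sum()            # table of proportions
r = P.sum(axis=1)          # site (row) masses
c = P.sum(axis=0)          # species (column) masses

# Standardized residuals from the independence model r_i * c_j.
expected = np.outer(r, c)
S = (P - expected) / np.sqrt(expected)

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
inertia = sv ** 2          # variation captured by each CA axis

# Principal coordinates of sites and species on the CA axes.
site_scores = (U * sv) / np.sqrt(r)[:, None]
species_scores = (Vt.T * sv) / np.sqrt(c)[:, None]

print("percent of inertia per axis:", (100 * inertia / inertia.sum()).round(1))
print("site scores on axis 1:   ", site_scores[:, 0].round(3))
print("species scores on axis 1:", species_scores[:, 0].round(3))
```

The total inertia equals the table's chi-square statistic divided by the grand total, and the last singular value is zero because the residuals are centered.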
FURTHER READING
• McGarigal, K., S. Cushman, and S. Stafford. 2000. Multivariate Statistics for Wildlife and Ecology Research (Springer-Verlag, New York).
• Gotelli, N.J., and A.M. Ellison. 2004. A Primer of Ecological Statistics (Sinauer, Sunderland, MA); Chapter 12.
• Spicer, J. 2005. Making Sense of Multivariate Data Analysis (Sage Press, Thousand Oaks).