Introduction to Multivariate Analysis and Multivariate Distances

Introduction to Multivariate Analysis and Multivariate Distances. Hal Whitehead, BIOL4062/5062. Topics: data matrices; problems with data matrices (missing values, outliers); matrices used in multivariate analysis; multivariate distances; association matrices.

Presentation Transcript


  1. Introduction to Multivariate Analysis and Multivariate Distances Hal Whitehead BIOL4062/5062

  2. • Data matrices
     • Problems with data matrices
        • missing values
        • outliers
     • Matrices used in multivariate analysis
     • Multivariate distances
     • Association matrices

  3. The Data Matrix: units are the rows; variables are the columns

  4. The Data Matrix

  5. Visualize the Data Matrix as points in multidimensional space (one point per unit, one axis per variable)

  6. Problems with Data Matrix
     • Missing values
     • Outliers
     • Units not independent
     • Many zeros
     • Not multivariate normal

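  10. Missing Data
     Often present in ecological, or other biological, data. Options:
     • delete columns of the data matrix (variables with missing values)
     • delete rows of the data matrix (units with missing values)
     • delete only the pairs of elements in which one value is missing (pairwise deletion)
     • interpolate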
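The four options on the missing-data slide can be sketched in NumPy as follows. This is a minimal illustration, assuming missing values are coded as `np.nan` and that the data matrix has units as rows and variables as columns; the matrix and all function names are hypothetical. The slide does not say how to interpolate, so linear interpolation along the order of the units is an assumption that only makes sense when units have a natural order (e.g. time or position along a transect).

```python
import numpy as np

# Hypothetical data matrix: 4 units (rows) x 3 variables (columns); np.nan marks missing values
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
])

def drop_incomplete_columns(X):
    """Delete every variable (column) that has at least one missing value."""
    return X[:, ~np.isnan(X).any(axis=0)]

def drop_incomplete_rows(X):
    """Delete every unit (row) that has at least one missing value."""
    return X[~np.isnan(X).any(axis=1), :]

def pairwise_complete(x, y):
    """Pairwise deletion: keep only pairs of elements where neither value is missing."""
    keep = ~(np.isnan(x) | np.isnan(y))
    return x[keep], y[keep]

def interpolate_column(x):
    """Fill missing values by linear interpolation along the unit order (assumed meaningful)."""
    x = x.copy()                      # do not modify the original data matrix
    missing = np.isnan(x)
    idx = np.arange(len(x))
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x
```

Column deletion keeps only the first variable here, row deletion keeps only the last two units, and pairwise deletion keeps three of the four pairs for variables 1 and 2, so it discards the least data of the three deletion options.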

  11. Outliers
     • Statistical packages often flag "outliers", for example:
       *** WARNING *** Case 86 has large leverage (Leverage = 0.252)
     • If they are plausibly
        • the result of biological, or other, processes outside the scope of the model being used, or
        • the result of measurement or coding error,
       they may be discarded
     • Otherwise they should be retained (perhaps use a different model)
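A leverage warning like the one quoted above refers to the diagonal of the hat matrix of a linear model, H = X(XᵀX)⁻¹Xᵀ, where X includes a column of ones for the intercept. The sketch below computes these values for a hypothetical set of predictors with one deliberately extreme unit. The cutoff of 2p/n (p = number of fitted parameters) is a common rule of thumb and an assumption here; it is not necessarily the rule used by the package that produced the warning.

```python
import numpy as np

def leverages(predictors):
    """Diagonal of the hat matrix H = X (X'X)^-1 X' for a linear model with intercept."""
    n = predictors.shape[0]
    X = np.column_stack([np.ones(n), predictors])     # add the intercept column
    H = X @ np.linalg.solve(X.T @ X, X.T)             # avoids an explicit matrix inverse
    return np.diag(H)

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))        # hypothetical predictors: 30 units, 2 variables
x[5] = [6.0, -6.0]                  # hypothetical unit far from the rest in predictor space

h = leverages(x)
p = x.shape[1] + 1                  # fitted parameters, including the intercept
flagged = np.where(h > 2 * p / len(h))[0]   # rule-of-thumb cutoff (an assumption)
print(flagged, h.max().round(3))
```

A high leverage only says a unit is unusual in the predictors; as the slide notes, whether to discard it depends on why it is unusual.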

  12. Problems with Data Matrix
     • Missing values
     • Outliers
     • Units not independent
        • not a problem unless doing statistical tests
     • Many zeros
        • use special methods (e.g. correspondence analysis)
     • Not multivariate normal
        • transform if possible

  13. Uses of Multivariate Analysis
     • Large data sets
        • simplify
        • summarize
        • find patterns
     • Analyze groupings of units
     • Find groupings of units
     • Examine relationships between variables

  14. Some Matrices Used in Multivariate Analysis
     • Data matrix: rectangular
        • units i = 1, …, n
        • variables j, k
     • Covariance matrix between variables: symmetric (square/triangular)
        • $c_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$, where $\bar{x}_k$ is the mean of $x_{ik}$
     • Correlation matrix between variables: symmetric (square/triangular)
        • $r_{jk} = \frac{c_{jk}}{S_j S_k}$, where $S_k$ is the standard deviation of $x_{ik}$
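The covariance and correlation formulas on slide 14 translate directly into a few lines of NumPy. This is a minimal sketch, assuming the data matrix stores units as rows and variables as columns; the small matrix is made up for illustration. NumPy's own `np.cov` and `np.corrcoef` with `rowvar=False` use the same n − 1 convention, so they serve as a cross-check.

```python
import numpy as np

def covariance_matrix(X):
    """c_jk = sum_i (x_ij - xbar_j)(x_ik - xbar_k) / (n - 1); rows of X are units."""
    n = X.shape[0]
    centred = X - X.mean(axis=0)            # subtract each variable's mean
    return centred.T @ centred / (n - 1)    # p x p, symmetric

def correlation_matrix(X):
    """r_jk = c_jk / (S_j S_k), where S_k is the standard deviation of variable k."""
    C = covariance_matrix(X)
    s = np.sqrt(np.diag(C))                 # S_k, using the same n - 1 denominator
    return C / np.outer(s, s)

# Hypothetical data matrix: 4 units (rows) x 3 variables (columns)
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.1, 0.3],
              [3.0, 5.9, 0.9],
              [4.0, 8.2, 0.1]])

C = covariance_matrix(X)
R = correlation_matrix(X)
print(np.allclose(C, np.cov(X, rowvar=False)))       # True
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
```

Both results are symmetric, which is why the slide notes they can be stored as square or triangular matrices.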

  15. Data Matrix

  16. Covariance Matrix

  17. Correlation Matrix

  18. Multivariate distances between units or groups of units
     1. Euclidean distance, summed over the p variables:
        $d_{ih} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{hk})^2}$
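A minimal sketch of the Euclidean distance between two units measured on the same p variables. For groups of units, the sketch assumes the distance is taken between the groups' mean vectors; the example values are hypothetical.

```python
import numpy as np

def euclidean(x, y):
    """Square root of the summed squared differences over the p variables."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(np.sum(d ** 2)))

# Two hypothetical units measured on p = 2 variables
print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0

# Between groups of units: compare the groups' mean vectors (hypothetical data)
group1 = np.array([[1.0, 2.0], [3.0, 4.0]])
group2 = np.array([[4.0, 6.0], [6.0, 8.0]])
between = euclidean(group1.mean(axis=0), group2.mean(axis=0))
```

Because every variable enters on its own scale, a variable measured in millimetres will dominate one measured in metres; the Penrose and Mahalanobis distances address this.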

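  19. Multivariate distances between units or groups of units
     2. Penrose distance, summed over the p variables:
        $P_{ih} = \sqrt{\sum_{k=1}^{p} \frac{(x_{ik} - x_{hk})^2}{S_k^2}}$, where $S_k^2$ is the variance of $x_{ik}$
     Corrects for variables measured in different units or over different ranges.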
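The sketch below computes the Penrose distance in the form of a square root of the sum, over the p variables, of squared differences each divided by that variable's variance S_k². That exact form is an assumption, as is taking S_k² to be the sample variance of variable k over all units. The second call shows the property the slide describes: rescaling a variable (here, hypothetically, switching it to units 100 times smaller) leaves the distance unchanged.

```python
import numpy as np

def penrose(x, y, variances):
    """sqrt( sum_k (x_k - y_k)^2 / S_k^2 ): each squared difference scaled by its variable's variance."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(np.sum(d ** 2 / np.asarray(variances, float))))

# Hypothetical units measured on two variables, with variable variances S_k^2
x, y = [0.0, 0.0], [3.0, 4.0]
S2 = [9.0, 16.0]
base = penrose(x, y, S2)   # sqrt(1 + 1)

# Same data with variable 1 expressed in units 100 times smaller:
# its values scale by 100 and its variance by 100^2
rescaled = penrose([0.0, 0.0], [300.0, 4.0], [90000.0, 16.0])
print(base, rescaled)      # both equal sqrt(2)
```

SciPy's standardized Euclidean distance (`scipy.spatial.distance.seuclidean`) computes the same quantity when given the per-variable variances.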

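  20. Multivariate distances between units or groups of units
     3. Mahalanobis distance, summed over the p variables:
        $D_{ih}^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (x_{ir} - x_{hr}) \, v_{rs} \, (x_{is} - x_{hs})$, where $v_{rs}$ are the elements of the inverse of the covariance matrix
     Corrects for correlations between variables.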
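A minimal sketch of the squared Mahalanobis distance, a double sum over r and s of the differences weighted by the elements v_rs of the inverse covariance matrix. Which covariance matrix to use (for example, one pooled within groups when comparing groups) is a modelling choice not fixed by the sketch; the 2 × 2 covariance matrix below is hypothetical. It shows how correlation changes distance: two differences with the same Euclidean length get very different Mahalanobis distances depending on whether they follow the correlation or run against it.

```python
import numpy as np

def mahalanobis_sq(x, y, cov):
    """D^2 = sum_r sum_s (x_r - y_r) v_rs (x_s - y_s), with v = inverse of cov.

    Solving cov @ z = d avoids forming the inverse explicitly; d @ z equals d' V d.
    """
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(d @ np.linalg.solve(cov, d))

# Hypothetical covariance matrix: two strongly positively correlated variables
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

along = mahalanobis_sq([0, 0], [1, 1], cov)    # difference follows the correlation
across = mahalanobis_sq([0, 0], [1, -1], cov)  # difference runs against it
print(round(along, 3), round(across, 3))
```

With an identity covariance matrix the function reduces to the squared Euclidean distance.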

  21. 3 species of iris; 4 measurements
     Euclidean distances:
               A      B      C
        A      0
        B    3.2      0
        C    4.8    1.6      0
     Penrose distances:
               A      B      C
        A      0
        B    2.8      0
        C    3.9    1.5      0
     Mahalanobis distances:
               A      B      C
        A      0
        B   89.9      0
        C  179.4   17.2      0
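The Euclidean values in this example can be checked from species means alone. The sketch assumes A, B and C are setosa, versicolor and virginica (the slide does not name them) and uses the published per-species means of Fisher's iris data (sepal length, sepal width, petal length, petal width, in cm). The Penrose and Mahalanobis values also need the variances and covariances of the raw measurements, so they are not reproduced here.

```python
import numpy as np

# Published per-species means of Fisher's iris data; mapping to A, B, C is an assumption
means = {
    "A": np.array([5.006, 3.428, 1.462, 0.246]),  # setosa
    "B": np.array([5.936, 2.770, 4.260, 1.326]),  # versicolor
    "C": np.array([6.588, 2.974, 5.552, 2.026]),  # virginica
}

def euclidean(x, y):
    """Euclidean distance between two mean vectors."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

labels = list(means)
for i, a in enumerate(labels):
    # Lower triangle of the distance matrix, rounded as on the slide
    row = [f"{euclidean(means[a], means[b]):.1f}" for b in labels[: i + 1]]
    print(a, " ".join(row))
# A 0.0
# B 3.2 0.0
# C 4.8 1.6 0.0
```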

  22. The Standard Data Matrix: units are the rows; variables are the columns

  23. The Association Matrix: units are both the rows and the columns
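One way to turn a standard data matrix (units × variables) into an association matrix (units × units) is to compute a distance between every pair of units. The sketch below does this with the Euclidean distance using NumPy broadcasting; the three units are hypothetical. The result is a dissimilarity matrix: symmetric, with zeros on the diagonal.

```python
import numpy as np

def euclidean_distance_matrix(X):
    """n x n association (dissimilarity) matrix between the rows (units) of X."""
    diff = X[:, None, :] - X[None, :, :]     # shape (n, n, p): all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical data matrix: 3 units (rows) x 2 variables (columns)
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D = euclidean_distance_matrix(X)
print(D)
```

SciPy's `pdist` followed by `squareform` (in `scipy.spatial.distance`) produces the same matrix and also supports other metrics.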

  24. Association matrices (similarity or dissimilarity)
     • Social structure
        • association between individuals
     • Community ecology
        • similarity between species, sites
        • dissimilarities between species, sites
     • Genetic distances
     • Correlation matrices
     • Covariance matrices
     • Distance matrices
        • Euclidean, Penrose, Mahalanobis

  25. Association matrices: dissimilarity/similarity
     • Dissimilarity: Mahalanobis distances between iris species
               A      B      C
        A      0
        B   89.9      0
        C  179.4   17.2      0
     • Similarity: genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)

  26. Association matrices: symmetric/asymmetric
     • Asymmetric: grooming rates of capuchin monkeys (Perry 1996)
     • Symmetric: genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)
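Whether an association matrix is symmetric can be checked by comparing it with its transpose. In a grooming-rate matrix, entry [i, j] can be read as how often individual i grooms j, which need not equal how often j grooms i; relatedness between two individuals is the same in both directions. The values below are hypothetical, not taken from the cited studies.

```python
import numpy as np

def is_symmetric(A, tol=1e-12):
    """True if A is square and equal to its transpose (within tolerance)."""
    A = np.asarray(A, float)
    return A.shape[0] == A.shape[1] and np.allclose(A, A.T, atol=tol)

# Hypothetical grooming rates: row i = groomer, column j = recipient
grooming = np.array([[0.0, 2.5, 0.1],
                     [0.4, 0.0, 1.2],
                     [0.1, 1.2, 0.0]])

# Hypothetical relatedness coefficients: same value in both directions
relatedness = np.array([[1.0, 0.25, 0.5],
                        [0.25, 1.0, 0.125],
                        [0.5, 0.125, 1.0]])

print(is_symmetric(grooming), is_symmetric(relatedness))  # False True
```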
