
An Introduction to Multivariate Analysis

An Introduction to Multivariate Analysis. Drs. Alan S.L. Leung and Kenneth M.Y. Leung. Lectures 14-15. Multivariate analysis. An extension to univariate (with a single variable) and bivariate (with two variables) analysis

Presentation Transcript


  1. An Introduction to Multivariate Analysis. Drs. Alan S.L. Leung and Kenneth M.Y. Leung. Lectures 14-15

  2. Multivariate analysis • An extension to univariate (with a single variable) and bivariate (with two variables) analysis • Dealing with a number of samples and species/environmental variables simultaneously

  3. Multivariate Data Set • Morphological measurements of organisms (e.g. length) • Physiological measurements of organisms (e.g. blood pressure) • Physicochemical measurements of the environment (e.g. air temperature) • Species abundance • Species richness, etc. Data are usually arranged in the form of a data matrix.

  4. Similarity (S) between samples • Ranges from 0 to 100% (or 0 to 1) • S = 100% if two samples are totally similar (i.e. the entries in the two samples are identical) • S = 0 if two samples are totally dissimilar (i.e. the two samples have no species in common)

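  5. Bray-Curtis coefficient (Bray & Curtis, 1957) • First developed in terrestrial ecology

     $$S_{jk} = 100\left\{1 - \frac{\sum_{i=1}^{n} |y_{ij} - y_{ik}|}{\sum_{i=1}^{n} (y_{ij} + y_{ik})}\right\}$$

     where $y_{ij}$ is the abundance of species i in sample j, $y_{ik}$ is the abundance of species i in sample k, and n is the total number of species (the sums run over species, not samples).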

  6. where $y_{ij}$ is the abundance of species i in sample j, $y_{ik}$ is the abundance of species i in sample k, and n is the total number of species. • Please calculate the Bray-Curtis similarity between samples: • X2 and X3 • X3 and Y1

  7. $$S_{X2,X3} = 100\left\{1 - \frac{3+0+0+2+8}{11+0+0+14+58}\right\} = 84$$

     $$S_{X3,Y1} = 100\left\{1 - \frac{0+6+8+2+30}{14+6+8+10+36}\right\} = 38$$
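
The two answers on slide 7 can be checked with a few lines of Python. This is only a minimal sketch, not the lecturers' code: the function name `bray_curtis_similarity` is invented for illustration, and the abundance vectors for X2, X3 and Y1 are an assumption, back-calculated from the numerator and denominator terms shown on slide 7, because the original data table did not survive in the transcript.

```python
def bray_curtis_similarity(sample_j, sample_k):
    """Bray-Curtis similarity (0-100) between two samples.

    Each sample is a list of abundances, one entry per species,
    with the species in the same order in both lists.
    """
    if len(sample_j) != len(sample_k):
        raise ValueError("both samples must list the same species")
    # Numerator: sum over species of |y_ij - y_ik|
    numerator = sum(abs(a - b) for a, b in zip(sample_j, sample_k))
    # Denominator: sum over species of (y_ij + y_ik)
    denominator = sum(a + b for a, b in zip(sample_j, sample_k))
    if denominator == 0:
        raise ValueError("similarity is undefined when both samples are empty")
    return 100 * (1 - numerator / denominator)


# ASSUMPTION: abundances inferred from the terms on slide 7,
# not copied from the (missing) data table.
x2 = [4, 0, 0, 8, 25]
x3 = [7, 0, 0, 6, 33]
y1 = [7, 6, 8, 4, 3]

print(round(bray_curtis_similarity(x2, x3)))  # 84
print(round(bray_curtis_similarity(x3, y1)))  # 38
```

Species absent from both samples contribute zero to both sums, so joint absences do not change the similarity.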

  8. Species similarity matrix

  9. Transformation • Two distinct roles: • To meet the statistical assumptions of parametric analysis (e.g. to reduce variance heterogeneity before ANOVA) • To weight the contributions of common and rare species in non-parametric multivariate analysis

  10. Why transform the data? • To weight the contributions of common and rare species • Transformed and untransformed data can give different dissimilarities between samples • This in turn affects the final outcome (solution) of nMDS

  11. Choice of transformation in multivariate analysis, in increasing degree of severity: • Square-root (lets species of intermediate abundance contribute) • Fourth-root / Log (1+y) (lets rare species contribute) • Presence/Absence (not commonly used)
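
To see how these transformations become progressively more severe, the sketch below applies each one to a single sample. The function name `transform` and the abundance values are hypothetical, chosen only so that the roots come out as whole numbers; they are not data from the lecture.

```python
import math


def transform(abundances, method):
    """Apply one of the transformations listed on slide 11 to a list of abundances."""
    if method == "none":
        return [float(y) for y in abundances]
    if method == "square_root":
        return [math.sqrt(y) for y in abundances]
    if method == "fourth_root":
        # Fourth root computed as the square root of the square root
        return [math.sqrt(math.sqrt(y)) for y in abundances]
    if method == "log":
        # log(1 + y), so that an abundance of zero stays zero
        return [math.log(1 + y) for y in abundances]
    if method == "presence_absence":
        return [1.0 if y > 0 else 0.0 for y in abundances]
    raise ValueError(f"unknown transformation: {method}")


# Hypothetical sample: one absent, one rare, two intermediate and one dominant species
sample = [0, 1, 16, 81, 10000]

for method in ["none", "square_root", "fourth_root", "log", "presence_absence"]:
    print(method, transform(sample, method))
```

In this made-up sample the dominant species outweighs the rare one by 10000 to 1 when untransformed, by 100 to 1 after the square root, by 10 to 1 after the fourth root, and not at all under presence/absence.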

  12. Species similarity matrix – Fourth-root transformed Some patterns can be seen, but…

  13. Multivariate Techniques • The most widely used multivariate techniques include: • Cluster Analysis • Ordination • e.g. multiple discriminant analysis

  14. Cluster Analysis • Put samples (sites, species, or environmental variables) into groups based on their similarity. • Samples within the same group are more similar to each other than samples in different groups

  15. Dendrogram (figure; axis label: Samples). Statistical software: PRIMER 5 for Windows
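
The grouping behind a dendrogram can be sketched as repeated merging of the two most similar clusters. The example below assumes group-average linkage, which the slides do not specify, and uses a hypothetical similarity matrix for four samples A to D. It is an illustration only and does not reproduce PRIMER's implementation.

```python
from itertools import combinations


def group_average_clustering(labels, similarity):
    """Agglomerative clustering with group-average linkage (an assumed choice).

    similarity maps frozenset({a, b}) to the similarity (0-100) of samples a and b.
    Returns the merges in order as (cluster_1, cluster_2, similarity_at_merge).
    """
    clusters = [(label,) for label in labels]
    merges = []

    def average_similarity(c1, c2):
        # Mean similarity over all between-cluster pairs of samples
        values = [similarity[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        level = average_similarity(*best)
        merges.append((best[0], best[1], level))
        # Replace the two merged clusters with their union
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
    return merges


# Hypothetical similarities (%) between four samples
labels = ["A", "B", "C", "D"]
similarity = {
    frozenset(("A", "B")): 84, frozenset(("A", "C")): 40,
    frozenset(("A", "D")): 30, frozenset(("B", "C")): 38,
    frozenset(("B", "D")): 32, frozenset(("C", "D")): 70,
}

for first, second, level in group_average_clustering(labels, similarity):
    print(first, "+", second, "at similarity", level)
```

Each merge corresponds to one node of the dendrogram, drawn at the similarity level at which the two groups join.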

  16. Ordination • Graphical presentation technique • Ordination map (usually two- or three-dimensional) • The relative distances among points in the ordination map represent the similarity among samples (e.g. in species composition)

  17. Two Types of Ordination Techniques • Indirect gradient analysis: includes only biological data (a species abundance by samples matrix); environmental data can be correlated with the ordination axes subsequently • Direct gradient analysis: includes both environmental and biological data

  18. Indirect gradient analysis, including: • Principal Component Analysis (PCA) • Correspondence Analysis (CA) • Detrended Correspondence Analysis (DCA) • Non-metric Multi-dimensional Scaling (nMDS) Direct gradient analysis, including: • Redundancy Analysis (RDA) • Canonical Correspondence Analysis (CCA) • Detrended Canonical Correspondence Analysis (DCCA)

  19. PCA • Uses the original data matrix • Figure labels: best-fit curve; First Principal Component Axis (PC1). Source: Clarke, K. R. & Warwick, R. M. (1994) Change in Marine Communities: an Approach to Statistical Analysis and Interpretation. Plymouth Marine Laboratory, Plymouth: 144pp.

  20. Second principal component axis (PC2) – perpendicular to PC1 (i.e. uncorrelated / orthogonal) Rotation

  21. Third principal component axis (PC3) Theoretically, many more species can be added

  22. The variances extracted by the PCs

      Eigenvalues
      PC   Eigenvalue   %Variation   Cum.%Variation
      1    3.39         67.8          67.8
      2    0.92         18.4          86.1
      3    0.56         11.2          97.4
      4    0.11          2.1          99.5
      5    0.02          0.5         100.0

      Eigenvectors (coefficients in the linear combinations of variables making up the PCs)
      Variable (species)   PC1      PC2      PC3      PC4      PC5
      A                     0.269    0.823    0.485   -0.088   -0.092
      B                     0.521   -0.264   -0.018   -0.143   -0.799
      C                     0.515   -0.226    0.082   -0.635    0.523
      D                    -0.499    0.227   -0.292   -0.739   -0.261
      E                    -0.377   -0.388    0.820   -0.150   -0.109
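
The "%Variation" column is each eigenvalue divided by the sum of all eigenvalues. The sketch below recomputes it from the rounded eigenvalues printed on slide 22, so the last digit can differ slightly from the slide (for example 2.2 rather than 2.1 for PC4), presumably because the slide was computed from unrounded values. It also checks that the PC1 coefficients form a vector of approximately unit length. The variable names are chosen for this illustration.

```python
# Eigenvalues and PC1 coefficients as printed on slide 22 (rounded values)
eigenvalues = [3.39, 0.92, 0.56, 0.11, 0.02]
pc1_coefficients = [0.269, 0.521, 0.515, -0.499, -0.377]

total = sum(eigenvalues)

# Percentage of the total variance extracted by each PC
percent_variation = [100 * value / total for value in eigenvalues]

# Running total of the percentages
cumulative = []
running_total = 0.0
for percent in percent_variation:
    running_total += percent
    cumulative.append(running_total)

for pc, (value, percent, cum) in enumerate(
        zip(eigenvalues, percent_variation, cumulative), start=1):
    print(f"PC{pc}: eigenvalue={value:.2f}  %var={percent:.1f}  cum%={cum:.1f}")

# Eigenvectors are scaled to unit length: the squared coefficients sum to about 1
pc1_length_squared = sum(c * c for c in pc1_coefficients)
print(f"Sum of squared PC1 coefficients: {pc1_length_squared:.3f}")
```

The first two PCs together account for roughly 86% of the variation, which is why a two-dimensional PCA plot is a reasonable summary of these five variables.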

  23. PCA Assumptions • Linear relationships between variables • Normality of the variables Ecological data that fulfil these assumptions are rare.

  24. Multidimensional Scaling • A technique for analyzing multivariate data • Visualization of the relationships between samples to facilitate interpretation in a low dimensional space • There are two types of MDS: • Metric • Non-metric

  25. Metric MDS • Assumes the input data are measured on an interval or ratio scale • Quantitative Non-metric MDS (nMDS) • The data should be in the form of ranks • Quantitative and/or qualitative

  26. Major Advantages of nMDS • Ordination is based on the ranked similarities/dissimilarities between pairs of samples • Because the actual data values are not used in the ordination, there are few (no?) assumptions about the nature and quality of the data • Ordinal data can be used, e.g. 1 = very low; 2 = low; 3 = mid; 4 = high; 5 = very high
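
Because nMDS works on ranks, any transformation that preserves the order of the dissimilarities leaves its input unchanged. The sketch below illustrates only that ranking step, not the ordination itself. The helper `rank_values` and the dissimilarity values are hypothetical.

```python
import math


def rank_values(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0
    while position < len(order):
        # Extend the run to cover all values tied with the current one
        end = position
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[position]]):
            end += 1
        average_rank = (position + end) / 2 + 1
        for index in order[position:end + 1]:
            ranks[index] = average_rank
        position = end + 1
    return ranks


# Hypothetical dissimilarities (%) for the six pairs among four samples
dissimilarities = [16.0, 62.0, 66.0, 62.0, 45.0, 90.0]

ranks_raw = rank_values(dissimilarities)
# A monotonic change of scale (here a square root) does not alter the ranks
ranks_rescaled = rank_values([math.sqrt(d) for d in dissimilarities])

print(ranks_raw)
print(ranks_raw == ranks_rescaled)
```

The same ranking applies to ordinal scores such as the 1 to 5 scale above, since only the order of the pairwise dissimilarities matters.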

  27. Bray-Curtis similarity Modified from Clarke & Warwick, 1994

  28. An Ecological Example Spatial and temporal variability in benthic macroinvertebrate communities in Hong Kong Streams

  29. Macroinvertebrate communities

  30. Macroinvertebrate communities

  31. Macroinvertebrate communities

  32. Study Sites (HK map)

  33. Spatial

  34. Temporal

  35. Macroinvertebrate Sampling & Identification

  36. Statistical Analysis: nested analysis of variance (ANOVA) • Spatial: Regions (random, orthogonal); Sites (random, nested within Regions); Sections (random, nested within Sites) • Temporal: Years (random, orthogonal); Seasons (fixed, orthogonal); Days (random, nested within Years and Seasons) • Interactions between them

  37. Statistical Analysis: non-parametric multivariate analysis • Non-metric multidimensional scaling (nMDS): displays the stream community data in ordination diagrams intended to reveal underlying patterns in the community structure • Analysis of similarities (ANOSIM): compares the community structure across space and time
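
The slides do not show how ANOSIM compares groups. As a hedged illustration, the sketch below computes the ANOSIM R statistic in its usual form, R = (mean between-group rank - mean within-group rank) / (M/2), where M = n(n-1)/2 is the number of sample pairs and ranks are taken on the dissimilarities. The permutation test that normally accompanies R is omitted, and the function names, sample labels and dissimilarities are all hypothetical.

```python
from itertools import combinations


def rank_values(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0
    while position < len(order):
        end = position
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[position]]):
            end += 1
        for index in order[position:end + 1]:
            ranks[index] = (position + end) / 2 + 1
        position = end + 1
    return ranks


def anosim_r(groups, dissimilarity):
    """ANOSIM R statistic (no permutation test).

    groups maps each sample label to its group;
    dissimilarity maps frozenset({a, b}) to the dissimilarity of samples a and b.
    """
    pairs = list(combinations(sorted(groups), 2))
    ranks = rank_values([dissimilarity[frozenset(pair)] for pair in pairs])
    within = [r for (a, b), r in zip(pairs, ranks) if groups[a] == groups[b]]
    between = [r for (a, b), r in zip(pairs, ranks) if groups[a] != groups[b]]
    if not within or not between:
        raise ValueError("need both within-group and between-group pairs")
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


# Hypothetical example: two sites, two samples each, clearly separated
groups = {"S1": "site_a", "S2": "site_a", "S3": "site_b", "S4": "site_b"}
dissimilarity = {
    frozenset(("S1", "S2")): 10, frozenset(("S3", "S4")): 20,
    frozenset(("S1", "S3")): 60, frozenset(("S1", "S4")): 70,
    frozenset(("S2", "S3")): 80, frozenset(("S2", "S4")): 90,
}

print(anosim_r(groups, dissimilarity))
```

R close to 1 means samples within a group are more similar to each other than to samples from other groups; R close to 0 means no difference between groups.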

  38. Species Abundance vs. Samples

  39. Fourth-root transformed

  40. A measure of goodness-of-fit

  41. Multivariate analysis - Temporal • Years [all samples in all sites; each Region; each Site; each Section in each Site] • Seasons (all years and each year) [all samples in all sites; each Region; each Site; each Section in each Site] • Dates within Seasons in each year
