Adapted from Ecological Statistical Workshop, FLC, Daniel Laughlin Distance Measures and Ordination
Goals of Ordination • To arrange items along an axis or multiple axes in a logical order • To extract a few major gradients that explain much of the variability in the total dataset • Most importantly: to interpret the gradients since important ecological processes generated them
What makes ordination possible? • Variables (species) are “correlated” (in a broad sense) • Correlated variables = redundancy • Ordination thrives on the complex network of inter-correlations among species
Ordination helps to: • Describe the strongest patterns of community composition • Separate strong patterns from weak ones • Reveal unforeseen patterns and suggest unforeseen processes
“Direct” gradient analysis • Order plots along measured environmental gradients • e.g., regress diatom abundance on salinity
“Indirect” gradient analysis • Order plots according to • covariation among species, or • dissimilarity among sample units • Following this step, we can then examine correlations between environment and ordination axes • Axes = Gradients • In PCA, these are called “Principal Components”
Data reduction • Goal: to reduce the dimensionality of community datasets • (e.g., from 100 species down to 2 or 3 main gradients) • n × p → n × d (n sample units, p species, d ordination axes) • These d dimensions represent the strongest correlation structure in the data • This is possible because of redundancy in the data (i.e., species are “correlated”)
Ordination Diagrams • Do not seek patterns as you would with a regression: axes are orthogonal (uncorrelated) • Know two things: • What the points represent (plots or species?) • Distance in the diagram is proportional to compositional dissimilarity [Figure: NMS ordination; Axis 1: “Abiotic”, Axis 2: “Biotic”]
How many axes? • “How many discrete signals can be detected against a background of noise?” • Typically we expect 2 or 3 gradients to be sufficient, but if we know that 5 independent environmental gradients are structuring the vegetation (water, light, CO2, nutrients, grazers, etc.), then perhaps 5 axes are justified
Two basic techniques • Eigenanalysis methods: use information from the variance-covariance matrix or correlation matrix (e.g., PCA) • Appropriate for linear models, since covariance is a measure of linear association • Distance-based methods: use information from the distance matrix (e.g., NMS) • Appropriate for nonlinear models, since some distance measures and ordination techniques can “linearize” nonlinear associations
Distance measures • Distance = Difference = Dissimilarity • Distance matrix is like a triangular mileage chart on maps (symmetric) • We are interested in the distances between sample units (plots) in species space
Distance measures • In univariate species space (one species), the distance between two points is their difference in abundance • We will examine two kinds of distance measures: • Euclidean distance, and • Bray-Curtis (Sørensen) distance
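Domains and Ranges • Euclidean: domain of x = all values; range of d = f(x) is non-negative • Sørensen (Bray-Curtis): domain of x is x ≥ 0; range of d = f(x) is 0 ≤ d ≤ 1 (or 0 ≤ d ≤ 100 when expressed as a percentage)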
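As a rough illustration of the two measures, the sketch below computes Euclidean and Bray-Curtis (Sørensen) distances for a small, made-up abundance table and assembles the symmetric distance matrix. The function names and the data are hypothetical and use only the Python standard library; this is not the PC-ORD implementation.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two plots in species space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Bray-Curtis (Sorensen) distance: summed absolute differences
    divided by the total abundance of both plots (ranges from 0 to 1)."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both plots are empty; distance is undefined")
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def distance_matrix(plots, dist):
    """Symmetric n x n matrix, like a mileage chart on a map."""
    n = len(plots)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(plots[i], plots[j])
    return d

# Hypothetical abundances: rows are plots, columns are species.
plots = [
    [10, 0, 0],   # plot 1: only species A
    [0, 10, 0],   # plot 2: only species B (shares nothing with plot 1)
    [5, 5, 0],    # plot 3: half A, half B
]
ed = distance_matrix(plots, euclidean)
bc = distance_matrix(plots, bray_curtis)
```

Plots 1 and 2 share no species, so their Bray-Curtis distance reaches the maximum of 1, while the Euclidean distance has no fixed upper bound and depends on the raw abundances.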
Which one works best? “If species respond noiselessly to environmental gradients, then we seek a perfect linear relationship between distances in species space and distances in environmental space. Any departure from that represents a partial failure of our distance measure.” McCune p. 51
Easy dataset (low beta diversity) Figure 6.6
Difficult dataset (high beta diversity) Intuitive property Figure 6.7
NMS is able to linearize the relationship between distance in species space and environmental distance because it is based on ranked distances (stay tuned)
Theoretical basis • Our choice is primarily empirical: we should select measures that have shown superior performance • One important theoretical basis: Euclidean distance (ED) measures distance through uninhabitable, impossibly species-rich space • In contrast, city-block distances are measured along the edges of species space, exactly where the sample units lie in the dust bunny distribution!
Nonmetric Multidimensional Scaling (NMS, NMDS, MDS, NMMDS, etc.)
NMS • Uses a distance/dissimilarity matrix • Makes no assumptions regarding linear relationships among variables • Arranges plots in a space that best approximates the distances in a distance matrix
From a map to a distance matrix Calculate distances
From a distance matrix to a map NMS Question: How well do the distances in the ordination match the distances in the distance matrix?
Advantages of NMS • Avoids the assumptions of linear relations • The use of ranked distances tends to linearize the relationship between distances in species space and distances in environmental space • You can use any distance measure
Historical disadvantages of NMS • Failing to find the best solution (low “stress”) due to local minima • Slow computation time These concerns have largely been dealt with given modern computer power
In a nutshell • NMS is an iterative search for the best positions of n entities on k dimensions (axes) that minimizes the stress of the k-dimensional configuration • “Stress” is a measure of departure from monotonicity in the relationship between the original distance matrix and the distances in the ordination diagram
Achieving monotonicity (Fig 16.2) • The closer the points lie to a monotonic line, the better the fit and the lower the stress • If S* = 0, then the relationship is perfectly monotonic • Blue = perfect fit, monotonic • Red = high stress, not monotonic
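To make "departure from monotonicity" concrete, here is a minimal sketch that fits a monotone (non-decreasing) line to the ordination distances with the pool-adjacent-violators algorithm and then computes stress. It assumes Kruskal's stress formula 1 scaled to 0-100, with the raw sum of squares playing the role of S*; the function names and numbers are hypothetical, and ties in the original distances are simply left in input order.

```python
import math

def monotone_fit(values):
    """Pool-adjacent-violators: best non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean drops below the one before it.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(original, ordination):
    """Both arguments list the same plot pairs (i < j) in the same order.
    Returns (raw stress, stress scaled to 0-100)."""
    order = sorted(range(len(original)), key=lambda k: original[k])
    d = [ordination[k] for k in order]          # ordination distances, ranked
    d_hat = monotone_fit(d)                     # closest monotone values
    raw = sum((x - y) ** 2 for x, y in zip(d, d_hat))
    return raw, 100.0 * math.sqrt(raw / sum(x * x for x in d))

# Hypothetical distances for three pairs of plots.
original = [0.2, 0.5, 0.9]
perfect = stress(original, [1.0, 2.0, 4.0])   # rank order preserved
flawed = stress(original, [1.0, 3.0, 2.0])    # last two pairs out of order
```

Only the rank order of the original distances enters the calculation, which is why any distance measure can be used.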
Instability • Instability is calculated as the standard deviation in stress over the preceding 10 iterations • Instabilities of about 0.0001 or lower are generally preferred • sd = sqrt(var)
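The instability criterion can be sketched in a few lines. The version below assumes the sample standard deviation (`statistics.stdev`) and a hypothetical list of stress values recorded once per iteration; whether a given program uses the sample or population form is not stated here.

```python
import statistics

def instability(stress_history, window=10):
    """Standard deviation of stress over the most recent `window` iterations.
    Returns None until enough iterations have accumulated."""
    if len(stress_history) < window:
        return None
    return statistics.stdev(stress_history[-window:])

def is_stable(stress_history, tolerance=1e-4, window=10):
    """True once instability has dropped to the tolerance or below."""
    value = instability(stress_history, window)
    return value is not None and value <= tolerance

# Hypothetical runs: one still improving, one that has levelled off.
still_moving = [20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 11.0]
settled = [12.5] * 10
```

Here `is_stable(still_moving)` is false and `is_stable(settled)` is true.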
Landscape analogy for NMS • Global minimum • Local minimum (strong, regular, geometric patterns emerge)
Reliability of Ordination • Low stress and stable solutions • Proportion of variance represented (R2) • Monte Carlo tests
Variance represented? • “Ode to an eigenvalue” • NMS is not based on partitioning variance, so there is no direct method • Calculate R² for the relationship between Euclidean distances in the ordination and Bray-Curtis distances in the distance matrix • Axis: Increment R², Cumulative R² • Axis 1: 0.37, 0.37 • Axis 2: 0.20, 0.57 • Axis 3: 0.15, 0.72
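One way to sketch this after-the-fact R² is to correlate the pairwise distances from the distance matrix with Euclidean distances in the ordination, using the first axis, then the first two, and so on. The code assumes that each increment is the difference between successive cumulative values; that convention, the function names, and the example scores are assumptions for illustration only.

```python
import math

def pairwise_euclidean(scores):
    """Euclidean distances among plots in ordination space (pairs i < j)."""
    n = len(scores)
    return [math.dist(scores[i], scores[j])
            for i in range(n) for j in range(i + 1, n)]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("distances have no variation")
    return sxy * sxy / (sxx * syy)

def variance_represented(original, scores):
    """original: pairwise distances (pairs i < j) from the distance matrix.
    scores: one row of axis scores per plot.
    Returns rows of (axis, increment R2, cumulative R2)."""
    rows, previous = [], 0.0
    for k in range(1, len(scores[0]) + 1):
        first_k_axes = [row[:k] for row in scores]
        cumulative = r_squared(original, pairwise_euclidean(first_k_axes))
        rows.append((k, cumulative - previous, cumulative))
        previous = cumulative
    return rows

# Hypothetical two-axis scores for four plots. For this check the "original"
# distances are taken from the same scores, so two axes recover them fully.
scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
original = pairwise_euclidean(scores)
table = variance_represented(original, scores)
```

With real data the original distances would be the Bray-Curtis values from the distance matrix, and the cumulative R² would normally fall short of 1.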
Monte Carlo test • Has the final NMS configuration extracted stronger axes than expected by chance? • Compare the stress obtained using your data with the stress obtained from multiple runs of randomized versions of your data (randomly shuffled within columns) • P-value = (1 + n)/(1 + N), where n = number of randomized runs with final stress less than or equal to the observed minimum stress, and N = number of randomized runs • In other words, the P-value is approximately the proportion of randomized runs with stress less than or equal to the observed stress
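The two ingredients of the test, shuffling within columns and the P-value formula, can be sketched as follows. The NMS run itself is not shown; the randomized stress values are made-up placeholders, and the function names are hypothetical.

```python
import random

def shuffle_within_columns(matrix, rng):
    """Shuffle each species (column) independently across plots (rows)."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]

def monte_carlo_p(observed_stress, randomized_stresses):
    """P = (1 + n) / (1 + N), with n = randomized runs whose final stress
    is less than or equal to the observed stress."""
    n = sum(1 for s in randomized_stresses if s <= observed_stress)
    return (1 + n) / (1 + len(randomized_stresses))

# Hypothetical abundance matrix: rows are plots, columns are species.
data = [[4, 0, 7], [1, 3, 0], [0, 9, 2], [6, 1, 1]]
randomized = shuffle_within_columns(data, random.Random(42))

# Hypothetical final stresses from 249 randomized runs, none of which
# did as well as the observed stress of 12.0.
p_value = monte_carlo_p(12.0, [20.0] * 249)   # (1 + 0) / (1 + 249)
```

Shuffling within columns keeps each species' abundance distribution intact while breaking the associations among species, which is the structure NMS relies on.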
Autopilot mode in PC-ORD Table 16.3 in McCune and Grace (2002)
Choosing the best solution • Select the appropriate number of dimensions • Seek low stress • Use a Monte Carlo test • Avoid unstable solutions
1. How many dimensions? One dimension is generally not used unless the data are known to be unidimensional. More than three becomes difficult to interpret. Find the elbow (Figure 16.3) and inspect Monte Carlo tests.
2. Seek low stress • <5 = excellent • 5-10 = good • 10-20 = fair, useable • 20-30 = not great, still useable • >30 = dangerously close to random Adapted from Table 16.4, p 132
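The rule of thumb above can be written as a small lookup. The handling of values that fall exactly on a boundary (5, 10, 20, 30) is an assumption, since the table gives only ranges, and the function name is hypothetical.

```python
def describe_stress(stress):
    """Rule-of-thumb label for final stress on the 0-100 scale."""
    if stress < 0:
        raise ValueError("stress cannot be negative")
    if stress < 5:
        return "excellent"
    if stress < 10:
        return "good"
    if stress < 20:
        return "fair, useable"
    if stress <= 30:
        return "not great, still useable"
    return "dangerously close to random"

label = describe_stress(12.3)
```

For a final stress of 12.3 this gives "fair, useable".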
A general procedure • Carefully read pages 135-136 • In your papers, you should report the information that is listed on page 136 • Autopilot mode works really well, but don’t publish ordinations obtained using the Quick and Dirty option! Be sure to publish the parameter settings.
Interpreting NMS axes • Two main, complementary approaches • Evaluate how species abundances are correlated with NMS axes • Evaluate how environmental variables are correlated with NMS axes
Overlays • Overlays: flexible way to see whether a variable is patterned on an ordination; not limited to linear relationships Axis 1
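Species versus Axes • Resist the temptation to use p-values when examining these relationships! • The relationships are often nonlinear • Testing them is circular reasoning, since the axes were derived from the same species data [Figure panels: unimodal pattern; linear pattern]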
Environmental Variables • Joint plots: diagrams of radiating lines, where the angle and length of a line indicate the direction and strength of the relationship
The analysis of community composition • Continuous covariates • Use ordination to produce a continuous response variable (i.e., axis) • Use covariance analysis (multiple regression, SEM) to explain variance of the axis • Categorical groups • Ordination is not required (remember, ordination is not the test) • Permutational MANOVA (PerMANOVA): can use on any experimental design • MRPP (only one-way or blocked designs) • ANOSIM (up to two factors, in R and PRIMER)
MANOVA • Multivariate Analysis of Variance • Traditional parametric method • Assumes linear relations among variables, multivariate normality, equal variances and covariances • Not appropriate for community data
PerMANOVA • Permutational MANOVA • Straightforward extension of ANOVA • Decomposes variance in the distance matrix • No distributional assumptions • Can still be sensitive to heterogeneous variances (dispersion) among groups • Anderson, M. 2001. Austral Ecology
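A one-way PerMANOVA can be sketched directly from a distance matrix. The code below follows the sums-of-squares partition in Anderson (2001): total and within-group sums of squared distances, a pseudo-F ratio, and a P-value from permuting the group labels. The function names, the seed, and the toy data are hypothetical; this is a sketch for a single factor, not a substitute for PC-ORD, PRIMER, or R routines.

```python
import random

def pseudo_f(dist, groups):
    """dist: symmetric distance matrix; groups: one label per plot."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same idea, one group at a time.
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss = sum(dist[i][j] ** 2
                 for pos, i in enumerate(members) for j in members[pos + 1:])
        ss_within += ss / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Returns (observed pseudo-F, permutation P-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)               # reassign plots to groups
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + permutations)

# Hypothetical one-variable example so the result can be checked by hand:
# with Euclidean distances the pseudo-F equals the classical ANOVA F.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
groups = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(x - y) for y in values] for x in values]
f_obs, p_value = permanova(dist, groups)
```

Because only the distance matrix is used, the same code works with Bray-Curtis distances. Note that a significant result can still reflect differences in dispersion among groups rather than differences in location.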