Quantifying biological diversity. Lou Jost. Standard measures of biological diversity lead to invalid inferences.
What are diversity measures used for? • Measure the impact of human or natural disturbances on an ecosystem. • Prioritize sites for conservation • Provide a robust community summary statistic which can be compared with values predicted by ecological and evolutionary theories. • Genetic diversity as well as species diversity
Meaning of diversity in biology • Compositional complexity as viewed by an organism in the community. • In the standard approach, diversity is independent of density: diversity depends only on the relative abundances p_i. When comparing diversities of communities, the communities being compared should have the same densities. (In some important approaches this rule can be relaxed.)
Principle of transfers For most purposes, if all else is equal, a community with five equally common species is more diverse than a community with one very abundant species and four extremely rare ones. For fixed richness and density, we require diversity measures to obey a Principle of Transfers: Diversity should never decrease with transfer of abundance from abundant species to rare species.
Diversity is linked to compositional similarity and differentiation between groups • Compare the mean diversity of the groups to the diversity of the pooled groups.
Diversity is linked to compositional similarity and differentiation between groups • Standard method in ecology and genetics: Similarity = mean within-group diversity / pooled diversity.
Biologists have equated diversity with standard measures of complexity: Shannon entropy: H = −Σ p_i ln p_i. Generalized entropies of order or degree q (for example Havrda and Charvat, Daroczy, Tsallis entropy): H_q = (1 − Σ p_i^q)/(q − 1).
q=1: “Shannon entropy” or “Shannon-Weaver index” or “Shannon-Wiener index”. q=2: “Gini-Simpson index” in ecology, “heterozygosity” or “gene diversity” in genetics; this is the probability that two randomly drawn individuals belong to different species. p_i is the true relative abundance of species i in the population, and S is the number of species.
Vast ecological and genetic literature based on these measures Do they give reasonable answers to biological questions???
Conservation biology application • 20 islands: Each island has the same number of individuals, the same number of species, and the same species relative abundance distribution. • Assume that there are no shared species between islands; each island has a completely distinctive set of species. • Their diversities are all equal regardless of one’s definition of diversity.
For definiteness suppose the species relative abundances on each island are the same as those actually observed for the trees of Barro Colorado Island, Panama (Hubbell et al. 2005). • Conservation biology question: Suppose our goal is to protect half the diversity of the region. How many islands must be preserved? • The correct answer has to be ten islands by symmetry. What do the standard diversity measures say?
Why does species richness give the reasonable answer while the other standard measures do not? • Each island must contribute equally to total diversity. • Linearity with respect to pooling: For N completely distinct, equally large islands of equal diversity, pooled diversity must equal N ∙ individual diversity. • This “Replication Principle” from economics is the requirement for diversity to be self-consistent in these kinds of inferences.
This property is implicit in our intuitive concept of diversity, and many of our rules of inference presuppose this property. • Shannon entropy and the Gini-Simpson index (heterozygosity) do not have this property. • Species richness, exp(H), and inverse Simpson concentration do have it.
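The island thought experiment can be checked numerically. The sketch below is illustrative only: the five-species abundance vector is a made-up stand-in (the Barro Colorado Island data are not reproduced here), and the function names are hypothetical. It pools 10 and 20 completely distinct, equally large islands and reports what fraction of the regional "diversity" each measure says is protected by saving 10 of the 20 islands.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H = -sum(p_i ln p_i)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def gini_simpson(p):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)

def pool_distinct_islands(p, n_islands):
    """Pool n equally large islands that share no species.

    Every island has the same abundance vector p, so each species'
    pooled relative abundance is p_i / n_islands.
    """
    return [x / n_islands for _ in range(n_islands) for x in p]

# Hypothetical island abundances (assumption: NOT the BCI tree data).
island = [0.5, 0.2, 0.15, 0.1, 0.05]

all20 = pool_distinct_islands(island, 20)
half10 = pool_distinct_islands(island, 10)

measures = {
    "species richness": lambda p: float(len(p)),
    "Shannon entropy": shannon_entropy,
    "Gini-Simpson index": gini_simpson,
    "exp(Shannon entropy)": lambda p: math.exp(shannon_entropy(p)),
    "inverse Simpson concentration": lambda p: 1.0 / sum(x * x for x in p),
}

# Fraction of regional diversity "protected" by saving 10 of 20 islands.
fraction_protected = {
    name: f(half10) / f(all20) for name, f in measures.items()
}

for name, frac in fraction_protected.items():
    print(f"{name}: {frac:.3f}")
```

Under these assumptions the three measures that obey the Replication Principle report exactly one half, while Shannon entropy and the Gini-Simpson index report that far more than half of the diversity has been protected.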
Standard method in ecology and genetics: Similarity = mean within-group “diversity” / pooled “diversity”. Linking diversity to compositional similarity and differentiation between groups
So the classic measures of biodiversity give badly misleading and self-contradictory results!!
“Numbers equivalents” or “effective number of species” or “Hill numbers” • How many equally common species are needed to give the observed entropy value X? That number is the “numbers equivalent” of the entropy value. • Found by setting X = H(p_1, p_2, ..., p_S) = H(1/D, 1/D, ..., 1/D) and solving for D. D ranges from 1 to S, where S is the actual number of species in the community. • Equivalence classes are defined by the value of the standard complexity measures. Parameterize these classes by D. • The value of D obeys the Replication Principle, so D is a valid measure of diversity.
“Numbers equivalents” or “effective number of species” or “Hill numbers” • When we find the effective number of species using any q-order generalized entropy that is a monotonic function of the sum of the relative abundances to the power q, we always get the same formula.
The numbers equivalent of a standard generalized entropy (eg Renyi, Tsallis) is the inverse of a power mean of the species relative abundances.
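A minimal sketch of the numbers equivalent of order q, written as the standard Hill number D = (Σ p_i^q)^(1/(1−q)), with the q → 1 limit exp(Shannon entropy). The function names and the example abundance vector are assumptions made for illustration. The second function converts a Tsallis-type entropy H_q = (1 − Σ p_i^q)/(q − 1) to the same effective number, illustrating that different entropies of the same order share one numbers equivalent.

```python
import math

def hill_number(p, q):
    """Effective number of species of order q."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def tsallis_entropy(p, q):
    """Generalized entropy (1 - sum p_i^q) / (q - 1), for q != 1."""
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)

def tsallis_to_hill(h, q):
    """Numbers equivalent of a Tsallis entropy value h, for q != 1."""
    return (1.0 - (q - 1.0) * h) ** (1.0 / (1.0 - q))

even = [0.2] * 5                         # five equally common species
uneven = [0.5, 0.2, 0.15, 0.1, 0.05]     # hypothetical community

d_even = [hill_number(even, q) for q in (0, 1, 2)]
d_uneven = [hill_number(uneven, q) for q in (0, 1, 2)]
d_from_entropy = tsallis_to_hill(tsallis_entropy(uneven, 2), 2)

print(d_even)
print(d_uneven)
print(d_from_entropy)
```

For the even community every order returns 5; for the uneven one the effective number falls below the richness of 5 as q increases.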
“Numbers equivalents” or “effective number of species” or “Hill numbers” • Introduced by economists in the 1960s. • Introduced to ecology for special cases by MacArthur in the 1960s, and to genetics by Crow and Kimura in 1964 for the special case of heterozygosity. • General case treated by Mark Hill in ecology in 1973. • Didn’t catch on in ecology but did in economics…
“Numbers equivalents” or “effective number of species” or “Hill numbers” • We now have diversity measures which possess the properties that are implicit in the diversity concept used by biologists. • Biologists had been making inferences about diversity which were invalid because the mathematical properties of standard measures of complexity did not support these rules of inference. • Biologists did not notice the problem!!!
This diversity can be plotted as a function of q to create a “diversity profile” • This is similar to the Renyi spectrum based on Renyi generalized entropies. • The diversity profile contains all the information contained in the relative abundance vector for the community.
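A diversity profile is just the Hill number evaluated over a range of orders q. The sketch below tabulates one rather than plotting it; the grid of orders and the abundance vector are arbitrary choices for illustration, and `diversity_profile` is a hypothetical helper name.

```python
import math

def hill_number(p, q):
    """Effective number of species of order q."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def diversity_profile(p, orders):
    """Return (q, D_q) pairs for the requested orders."""
    return [(q, hill_number(p, q)) for q in orders]

community = [0.5, 0.2, 0.15, 0.1, 0.05]   # hypothetical abundances
orders = [0, 0.5, 1, 1.5, 2, 3]

profile = diversity_profile(community, orders)
for q, d in profile:
    print(f"q = {q}: D = {d:.3f}")
```

The profile starts at species richness (q = 0) and never increases with q; the steeper the drop, the more uneven the community. A perfectly even community gives a flat profile.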
Diversity profile plotting software https://chao.shinyapps.io/SpadeR/
How to partition effective number of species (Hill numbers) into within- and between-group components? • A method commonly used in ecology and genetics to set conservation priorities and understand evolutionary processes is “additive partitioning” (Nei 1973, Lande 1996). • Total (gamma) diversity = mean within-group (alpha) diversity + between-group (beta) diversity, where H is a generalized entropy with q=1 or q=2: Hbetween = Htotal − Hwithin, that is, Hbeta = Hgamma − Halpha. • For q=2, this partitions the Gini-Simpson index or gene diversity H = 1 − Σ p_i^2. • Additive partitioning of the Gini-Simpson index or gene diversity is incomplete: it produces a between-group component that is confounded with the within-group component. Htotal = Hwithin + Hbetween, but since 0 ≤ H < 1, when Hwithin is close to unity, Hbetween must be close to 0 no matter how different the groups are.
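The confounding can be seen in a small numerical sketch. The two groups below are hypothetical: each has 100 equally common species and they share none, so they are as different as two groups can be. Variable names are illustrative.

```python
def gini_simpson(p):
    """Gini-Simpson index (heterozygosity) 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)

k = 100  # species per group (assumed value for illustration)

# Abundance vectors over the combined list of 2k species; no overlap.
group_a = [1.0 / k] * k + [0.0] * k
group_b = [0.0] * k + [1.0 / k] * k
pooled = [(a + b) / 2.0 for a, b in zip(group_a, group_b)]

h_within = (gini_simpson(group_a) + gini_simpson(group_b)) / 2.0
h_total = gini_simpson(pooled)
h_between = h_total - h_within      # additive "beta"
similarity = h_within / h_total     # within / pooled ratio

# The same data expressed as effective numbers of species.
d_within = 1.0 / (1.0 - h_within)
d_total = 1.0 / (1.0 - h_total)
d_ratio = d_total / d_within

print(h_within, h_total, h_between, similarity)
print(d_within, d_total, d_ratio)
```

The additive between-group component is about 0.005 and the within/pooled ratio is about 0.995, which would suggest nearly identical groups. The effective numbers (100 within, 200 pooled) give a ratio of 2, correctly indicating two completely distinct groups.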
Axioms: Partitioning diversity into within- and between-group components • Complete partitioning: The within-group component should contain no information about the between-group component, and vice versa. Knowledge of one component should give no information, nor impose any mathematical constraint, on the value of the other component. • Within-group component should be a generalized mean of the diversities of the individual groups. Weakest possible specification: If all groups have diversity D, then the within-group diversity is D. • Between-group component must take minimum when communities are identical and maximum when they share no species. This between-group diversity measures the degree of differentiation of the relative abundance vectors. (Other goals are possible.) • These properties are implicit in the way that biologists use the concepts of within- and between-group diversity, even though their measures of within- and between-group diversity did not generally possess these properties.
Partitioning diversity… (Jost, Ecology 2007) • If a meaningful partitioning of the numbers equivalent exists!!!!
Components of Shannon diversity (alpha = within-group, beta = between-group, gamma = total diversity)
“Beta” or between-group diversity for Shannon diversity • Beta ranges from 1 (when all communities have identical compositions) to the exponential of the entropy of the weights (when all assemblages are completely different). • When weights are equal, beta ranges from 1 to N where N is number of communities. • This beta diversity is the effective number of completely distinct communities in the region or dataset. It is the exponential of Mutual Information.
Shannon beta diversity = regional heterogeneity of the relative abundance vectors • If there are N equally large, completely distinct communities in the region, beta = N. • It is not necessary to identify or demarcate the communities in the region: just make many random sample points or points on a grid. As number of samples becomes large, the sample beta approaches the true regional beta.
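The Shannon components can be sketched as follows, assuming the usual multiplicative form for q = 1: alpha is the exponential of the weighted mean Shannon entropy of the communities, gamma is the exponential of the Shannon entropy of the weighted pooled community, and beta = gamma / alpha. The function names, example communities, and weights are all made up for illustration.

```python
import math

def shannon(p):
    """Shannon entropy of a relative-abundance (or weight) vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def shannon_partition(communities, weights):
    """Return (alpha, beta, gamma) as effective numbers for q = 1.

    communities: relative-abundance vectors over one shared species list.
    weights: statistical weights of the communities, summing to 1.
    """
    alpha = math.exp(sum(w * shannon(p) for w, p in zip(weights, communities)))
    n_species = len(communities[0])
    pooled = [
        sum(w * p[i] for w, p in zip(weights, communities))
        for i in range(n_species)
    ]
    gamma = math.exp(shannon(pooled))
    return alpha, gamma / alpha, gamma

# Three hypothetical communities that share no species.
distinct = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9, 0.1],
]
equal_w = [1.0 / 3.0] * 3
unequal_w = [0.5, 0.3, 0.2]

_, beta_equal, _ = shannon_partition(distinct, equal_w)
_, beta_unequal, _ = shannon_partition(distinct, unequal_w)
_, beta_identical, _ = shannon_partition([distinct[0]] * 3, equal_w)

print(beta_equal, beta_unequal, beta_identical)
```

With equal weights the three distinct communities give beta = 3; with unequal weights beta equals the exponential of the entropy of the weights; identical communities give beta = 1.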
Shannon beta diversity = regional heterogeneity of the relative abundance vectors. Each color represents a completely distinct community with no shared species with other communities. Beta diversity for left-hand region = 4, beta diversity for right-hand region = 1.05.
Within-group or “alpha” diversity for q ≠ 1 • Mean generalized entropy, converted to effective number of species. • Beta ranges from 1 to N, where N is the number of communities. • It is impossible to satisfy my partitioning conditions when weights are unequal for q ≠ 0 or 1.
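A sketch of the equal-weight case for q ≠ 1, under the assumption that alpha is obtained by averaging Σ p_i^q across communities and converting that mean to an effective number, with gamma the Hill number of the pooled community and beta = gamma / alpha. The function name and example data are hypothetical.

```python
def partition_order_q(communities, q):
    """Return (alpha, beta, gamma) of order q != 1, equal weights assumed."""
    if abs(q - 1.0) < 1e-12:
        raise ValueError("use the Shannon (q = 1) formulas instead")
    n = len(communities)
    n_species = len(communities[0])
    exponent = 1.0 / (1.0 - q)

    # Mean of sum(p_i^q) over communities, then converted to a numbers equivalent.
    mean_sum = sum(
        sum(x ** q for x in p if x > 0) for p in communities
    ) / n
    alpha = mean_sum ** exponent

    # Pooled community with equal weights.
    pooled = [sum(p[i] for p in communities) / n for i in range(n_species)]
    gamma = sum(x ** q for x in pooled if x > 0) ** exponent
    return alpha, gamma / alpha, gamma

# Three hypothetical communities with no shared species.
distinct = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9, 0.1],
]

_, beta_q2_distinct, _ = partition_order_q(distinct, 2)
_, beta_q0_distinct, _ = partition_order_q(distinct, 0)
_, beta_q2_identical, _ = partition_order_q([distinct[1]] * 3, 2)

print(beta_q2_distinct, beta_q0_distinct, beta_q2_identical)
```

For these equally weighted communities beta reaches N = 3 when no species are shared and falls to 1 when the communities are identical, regardless of how even each community is.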
Normalizations of this beta diversity provide measures of compositional similarity and differentiation
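Two simple ways of mapping beta from its range [1, N] onto [0, 1] are sketched below for equally weighted communities. Whether these are the exact normalizations shown on the original slide is an assumption; they are offered only as examples of the idea, and the function names are illustrative. The logarithmic version applies to the Shannon (q = 1) beta.

```python
import math

def log_overlap(beta, n):
    """Similarity 1 - ln(beta)/ln(N): 1 when identical, 0 when distinct."""
    return 1.0 - math.log(beta) / math.log(n)

def relative_homogeneity(beta, n):
    """Similarity (1/beta - 1/N) / (1 - 1/N): linear in 1/beta."""
    return (1.0 / beta - 1.0 / n) / (1.0 - 1.0 / n)

n = 4  # number of communities (assumed)
for beta in (1.0, 2.0, 4.0):
    print(beta, log_overlap(beta, n), relative_homogeneity(beta, n))
```

Each similarity measure has a matching differentiation measure given by its complement (one minus the similarity).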
Monotonicity issues • q=0, 1: Beta and normalized similarity and differentiation measures obey the strongest possible monotonicity properties. • Other values of q: Beta and normalized similarity and differentiation measures obey weaker monotonicity properties and should not be used without good reason.
Another problem with classical diversity measures • Diversity is the complexity per individual and depends only on relative abundances. Adding a super-abundant new species will decrease diversity even if the original species are not affected by the new species. • An ideal diversity measure would never decrease with the addition of a new species that does not change the abundances of the other species. Hill numbers do not satisfy that property. • Need to find a density-dependent analogue of “probability of inter-specific encounters”, such as the rate of interspecific encounters per unit time. Active field of work.
Why have people ignored these problems with their measures for so long? • Ecologists and geneticists often treat measures as mere tools for the calculation of p-values (statistical significance). • Statistical significance depends at least as much on sample size as on the magnitude of the effect being measured. • In natural populations, the null hypothesis of zero differentiation is virtually always false, and if sample size is large enough, a difference can always be demonstrated with any desired degree of statistical significance. • P-values are not a substitute for real measures of effect size, and despite its popularity with researchers and journal editors, testing a null hypothesis is rarely the appropriate model in science.
Why have people ignored these problems with their measures for so long? Ecological problems should usually be cast in terms of estimating a meaningful parameter, rather than testing an always-false null hypothesis (which will always be rejected if sample size is large enough). Biologists need measures whose absolute magnitudes are interpretable. Statistical uncertainties should be expressed by confidence intervals, not p values.
Xie xie. Thank you! • Special thanks to Dr. Anne Chao for inviting me and giving me the chance to know this beautiful country.
Supplementary Material: Population Genetics • Gst = 1 - Hs/Ht • We’ve already seen how badly this works as a measure of compositional differentiation in ecology, but it is still the standard measure of compositional differentiation in genetics (it also has other more legitimate uses in genetics).
A measure of allelic differentiation • The complement of the Morisita-Horn index (q=2) is a measure of dissimilarity for genetics: D = [(HT – HS)/(1 – HS)] [n/(n-1)] = 1- GD /GS • If all n subpopulations consist of k equally common alleles, this measure gives the proportion of each subpopulation’s alleles that are unique to that subpopulation. • This is a measure of pure differentiation, independent of average within-subpopulation heterozygosity, unlike GST. • It should replace GST when heterozygosity-based allelic differentiation is the quantity of interest.
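The contrast between GST and D can be sketched with two hypothetical examples in which every subpopulation has k equally common alleles. The function and variable names are illustrative; subpopulations are assumed to be equally weighted, and D is computed from the heterozygosity formula given above.

```python
def heterozygosity(p):
    """Expected heterozygosity 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)

def gst_and_d(subpops):
    """Return (GST, D) for equally weighted subpopulations.

    subpops: allele-frequency vectors over one shared allele list.
    """
    n = len(subpops)
    n_alleles = len(subpops[0])
    hs = sum(heterozygosity(p) for p in subpops) / n
    pooled = [sum(p[i] for p in subpops) / n for i in range(n_alleles)]
    ht = heterozygosity(pooled)
    gst = 1.0 - hs / ht
    d = ((ht - hs) / (1.0 - hs)) * (n / (n - 1))
    return gst, d

# Example 1: two subpopulations, 10 equally common alleles each, none shared.
no_shared = [[0.1] * 10 + [0.0] * 10, [0.0] * 10 + [0.1] * 10]
gst_1, d_1 = gst_and_d(no_shared)

# Example 2: four equally common alleles each; two are shared, two are unique.
half_shared = [
    [0.25, 0.25, 0.25, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.25, 0.25, 0.25],
]
gst_2, d_2 = gst_and_d(half_shared)

print(gst_1, d_1)
print(gst_2, d_2)
```

In the first example the subpopulations are completely differentiated, yet GST is only about 0.05 while D = 1. In the second, D = 0.5, matching the proportion of each subpopulation's alleles that are unique to it.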
Link to ecological and genetic models • Hubbell’s neutral model of biodiversity • Finite island model in population genetics • These models play a role like that of the ideal gas or the two-body problem in physics; they are simple enough that we can solve them analytically.
Genetic divergence • Differentiation D at equilibrium is: D ≈ 1/{1 + m/[u(n-1)]}. These factors control neutral divergence in subdivided populations. Very different from the standard view. • Traditional Gst at equilibrium under the same model is: Gst = 1 - Hs/Ht ≈ 1/(1 + 4Nm + 4Nu) ≈ 1/(1 + 4Nm). • n = number of subpopulations, N = size of subpopulations, m = migration rate, u = mutation rate.
Genetic divergence • Population genetics rule of thumb: more than 1 migrant per generation = little or no differentiation in the absence of natural selection. Counter-intuitive for large N. • This is wrong since it only tells us that GST will be low, not that real allelic differentiation will be low. GST can be low even for completely differentiated subpopulations, or can be high even when subpopulations show little differentiation.
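The equilibrium approximations quoted above for the finite island model can be evaluated side by side. The parameter values below are arbitrary assumptions chosen so that about 10 migrants arrive per generation (Nm = 10), well above the rule-of-thumb threshold; function names are illustrative.

```python
def equilibrium_gst(N, m, u):
    """Gst ≈ 1 / (1 + 4Nm + 4Nu)."""
    return 1.0 / (1.0 + 4.0 * N * m + 4.0 * N * u)

def equilibrium_d(n, m, u):
    """D ≈ 1 / (1 + m / (u (n - 1)))."""
    return 1.0 / (1.0 + m / (u * (n - 1)))

# Hypothetical parameters (assumptions, not estimates from real data).
N = 10_000   # size of each subpopulation
n = 11       # number of subpopulations
m = 1e-3     # migration rate
u = 1e-4     # mutation rate

migrants_per_generation = N * m
gst = equilibrium_gst(N, m, u)
d = equilibrium_d(n, m, u)

print(migrants_per_generation, gst, d)
```

With these values Gst is about 0.02, which the rule of thumb would read as negligible differentiation, while D is about 0.5. Note that N does not appear in the approximation for D: it depends only on the migration rate relative to the mutation rate and on the number of subpopulations.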