BioSim2 Workshop

New England Association of Environmental Biologists, 30th Annual Meeting
Bethel Inn, Bethel, Maine
March 30, 2006

Outline
• B, Index of Biotic Similarity - 1976 Paper
  Faults of Other Indexes Available
  Basic Concept and BioSim
• B, Index of Biotic Similarity - 1992 Paper
  Options and BioSim1
  Preferred Options
• B, Index of Biotic Similarity - 2005 Paper
  BioSim2
  Macroinvertebrate Data
  Original Data Rearranged in Double-Dendrogram Order
  Habitat Data
  Physical Chemical Data
  All Data Combined
• Working with Your Own Data

Community Structure (Species Composition)
Species occurrence and abundance (that is, both the kinds and numbers of species present).
Measures of Community Structure Being Used in Pollution Surveys in the 1970s
• Gleason's Richness Index
• Shannon's Diversity Index
• Simpson's Index of Dominance
• Brillouin's Index
• Menhinick's Richness Index
• Pielou's Evenness Index
• McIntosh's Index

Where Others Go Wrong

Where Presence-Absence Similarity Coefficients Go Wrong
P = number of matches in which a given taxon is present at both stations
N = number of matches in which a given taxon is absent from both stations
M = number of matches in which a given taxon is present in one station and absent from the other

Where Chutter's Biotic Index Fails
N = total number of individuals at Station a
Xi = number of individuals in the ith taxon at Station a
Qi = quality index for the ith taxon
k = the number of taxa at Station a
CB is calculated for only one station at a time.
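The formula on this slide did not survive in the transcript. From the symbol definitions above, Chutter's index is usually written as an abundance-weighted mean of the taxon quality values. The expression below is a reconstruction from those definitions, not a copy of the slide:

```latex
CB = \frac{1}{N} \sum_{i=1}^{k} X_i \, Q_i
```

Every term refers to Station a alone, which is why CB cannot by itself express how similar two stations are.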

Where Percentage Similarity of Community Fails
k = number of different taxa at Stations a and b
ria and rib = the relative abundances of the ith taxon at Stations a and b, respectively
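The slide's formula is not in the transcript. As a minimal sketch, assuming the common form PSC = 1 - 0.5 Σ|ria - rib| with relative abundances expressed as proportions, the following shows one limitation that follows directly from working with relative abundances: two stations with the same proportions but very different densities score as identical. The function name and the station counts are hypothetical.

```python
def percentage_similarity(counts_a, counts_b):
    """Percentage similarity of community between two stations.

    Assumes the form PSC = 1 - 0.5 * sum(|r_ia - r_ib|), where r_ia and
    r_ib are relative abundances (proportions) of taxon i at each station.
    """
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each station needs at least one individual")
    rel_a = [x / total_a for x in counts_a]
    rel_b = [x / total_b for x in counts_b]
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(rel_a, rel_b))


# Hypothetical counts: Station b holds 100 times as many organisms as
# Station a, but in exactly the same proportions.
station_a = [40, 20, 10, 10, 2]
station_b = [4000, 2000, 1000, 1000, 200]
psc = percentage_similarity(station_a, station_b)
print(psc)
```

Because only proportions enter the calculation, the hundredfold difference in abundance has no effect on the result.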

Index of Biotic Similarity (Pinkham-Pearson Index)
Barbour et al. (1992), in a systematic comparison of the metrics proposed in EPA's rapid bioassessment protocol (Plafkin et al., 1989), concluded that B "may be the most appropriate metric to serve as a measure of community similarity."

Example of B, Index of Biotic Similarity
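The worked example on this slide was an image and is not in the transcript. As a stand-in, here is a minimal sketch that assumes the usual form of the Pinkham-Pearson index, B = (1/k) Σ min(xia, xib) / max(xia, xib), where k is the number of matches. The function name and counts are hypothetical, and scoring a 0/0 match as 1 is only one of the conventions discussed later in the talk.

```python
def biotic_similarity(counts_a, counts_b):
    """Index of biotic similarity B between two stations.

    Each match contributes min/max of the two counts for one taxon;
    B is the mean of those ratios over all k matches.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both stations must list the same taxa")
    k = len(counts_a)
    if k == 0:
        raise ValueError("at least one taxon is required")
    total = 0.0
    for xa, xb in zip(counts_a, counts_b):
        if xa == 0 and xb == 0:
            total += 1.0  # assumption: a 0/0 match is scored as 1
        else:
            total += min(xa, xb) / max(xa, xb)
    return total / k


# Hypothetical counts: same proportions, hundredfold difference in density.
station_a = [40, 20, 10, 10, 2]
station_b = [4000, 2000, 1000, 1000, 200]
b_value = biotic_similarity(station_a, station_b)
print(round(b_value, 4))
```

Unlike an index built on relative abundances, B responds to the difference in absolute numbers: every match here has a ratio of 1/100, so B is about 0.01.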

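Comparing More Than Two Stations
Each comparison between stations is called a paired comparison (PC). When dealing with two or more stations, the number of paired comparisons is expressed by the formula PC = n(n - 1)/2, where n is the number of stations. For example, 11 stations give 11 × 10 / 2 = 55 paired comparisons.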

Matrix of B’s Between 11 Habitat Parameters

BioSim • BioSim, 1976 Fortran IV • Calculated B’s from original data matrix and produced dendrograms from resulting matrices of B’s.

BioSim1 • BioSim1, 1992, DOS • As above, plus • Defined terms clearly • Provided a strategy for analyzing data • Contained new features as part of this strategy

BioSim1 - Terms • Original Data Set • The Matrix of B’s • The Dendrogram

BioSim1 – Terms - Original Data Set
A data set is usually 3-dimensional. That is, it includes three variables:
• sample sites, or replicates at one site
• taxa sampled
• sampling dates
Normally the data set is analyzed two variables at a time. Thus an original data set has two axes, with one variable on each axis. This is a two-dimensional matrix.
The parameters are the actual values of the variables, such as:
• Site 1
• Gammarus sp.
• June 10, 2005
Data points in the matrix are the numbers of organisms recorded in the sample involving a parameter on the x-axis and one on the y-axis. Each comparison between two data points is called a match.

BioSim1 – Terms Components of an Original Data Set

BioSim1 – Terms - The Matrix of B’s The triangular array of paired comparisons between all the possible pairs of parameters of a given variable forms a matrix of B's. Each paired comparison in the matrix is a B-value.
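A short sketch of how such a triangular array could be built, assuming B is the mean of min/max ratios over all matches and that a 0/0 match scores 1. The site names, counts, and function names are hypothetical and are not taken from BioSim1.

```python
from itertools import combinations


def biotic_similarity(counts_a, counts_b):
    """Mean of min/max ratios over all matches (0/0 scored as 1 here)."""
    ratios = []
    for xa, xb in zip(counts_a, counts_b):
        hi = max(xa, xb)
        ratios.append(1.0 if hi == 0 else min(xa, xb) / hi)
    return sum(ratios) / len(ratios)


def matrix_of_b(data):
    """Return one B-value per unordered pair of parameters.

    `data` maps a parameter (for example a site) to its list of counts,
    with taxa in the same order for every parameter.
    """
    return {
        (p, q): biotic_similarity(data[p], data[q])
        for p, q in combinations(data, 2)
    }


# Hypothetical counts for four taxa at three sites.
sites = {
    "Site 1": [10, 5, 0, 2],
    "Site 2": [20, 5, 1, 2],
    "Site 3": [0, 1, 8, 0],
}
b_matrix = matrix_of_b(sites)
for pair, value in b_matrix.items():
    print(pair, round(value, 3))
```

Three sites yield three paired comparisons, which is the triangular half of a 3 × 3 table with the diagonal left out.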

BioSim1 – Terms Components of a Matrix of B’s

BioSim1 – Terms - The Dendrogram The matrix of B's is surveyed for paired comparisons of parameters with high B-values. Each of these pairs is linked in a cluster. For the purpose of the discussion which follows, a single parameter is considered to comprise a cluster. By an iterative process, additional clusters are linked to those already linked, based on a function of the average for the B-values between the two clusters. In this manner a dendrogram of clusters is formed which resembles a tree on its side, with the branches linking the various clusters at nodes. The nodes link clusters of parameters. Note that the B-value for a given cluster is found by extending the node lines to the coefficient of similarity scale (B-value scale).
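The iterative linking described above can be sketched as plain average-linkage clustering. This is an illustration under stated assumptions, not BioSim's actual algorithm: the "function of the average" is taken to be the simple mean of all B-values between two clusters, and the site names and B-values are invented.

```python
from itertools import combinations


def average_linkage(names, b):
    """Cluster parameters from a matrix of B's.

    `b` maps frozenset({p, q}) to the B-value for that pair. At each
    step the two clusters with the highest mean between-cluster B are
    linked. Returns the merges in order as (cluster_1, cluster_2, mean_B);
    mean_B is the level at which the node would sit on the B-value scale.
    """
    clusters = [(name,) for name in names]  # a single parameter is a cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            values = [b[frozenset((p, q))] for p in c1 for q in c2]
            mean_b = sum(values) / len(values)
            if best is None or mean_b > best[2]:
                best = (c1, c2, mean_b)
        c1, c2, mean_b = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, mean_b))
    return merges


# Hypothetical matrix of B's for four sites.
names = ["Site 1", "Site 2", "Site 3", "Site 4"]
b = {
    frozenset(("Site 1", "Site 2")): 0.9,
    frozenset(("Site 3", "Site 4")): 0.8,
    frozenset(("Site 1", "Site 3")): 0.2,
    frozenset(("Site 1", "Site 4")): 0.3,
    frozenset(("Site 2", "Site 3")): 0.1,
    frozenset(("Site 2", "Site 4")): 0.2,
}
merges = average_linkage(names, b)
for c1, c2, level in merges:
    print(c1, "+", c2, "at B =", round(level, 3))
```

Sites 1 and 2 link first, then Sites 3 and 4, and the two pairs join last at the mean of the four between-cluster B-values.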

BioSim1 – Terms Components of a Dendrogram

BioSim1 – Strategy for Data Analysis • Nature of the original data points • Configuration of the original data matrix • Variations of B • Data points with low numbers of organisms • Unweighted vs weighted clustering • Configuration of the rearranged data matrix • Establishing environmentally valid subclusters

BioSim1 – Strategy for Data Analysis Nature of the original data points • numbers of individuals in a taxon (density) • percentage of entire sample represented by a single taxon (% composition) • biomass, productivity, chlorophyll, etc. • chemical parameters (will need scaling) • physical parameters (will need scaling) • habitat parameters (will need scaling) • paired comparisons between any of the above (will need scaling)

BioSim1 – Strategy for Data Analysis
Configuration of the original data matrix
• Taxa across sites for a given date (2-dimensional matrix): clusters of sites based on the taxa they contain, and clusters of taxa at those sites
• Taxa across dates for a given site (2-dimensional matrix): clusters of dates based on the taxa they contain, and clusters of taxa on those dates
• Sites across dates for a given taxon (2-dimensional matrix): clusters of dates based on sites, and clusters of sites on those dates
• Taxa across sites for given dates (3-dimensional matrix): clusters of sites on dates based on the taxa they contain, and clusters of taxa based on their distribution at sites on given dates

BioSim1 – Strategy for Data Analysis Variations of B • B, as displayed • B1, used original data matrix to calculate % composition and then developed a dendrogram based on the % composition values • B2, as in B1, but calculated using a weighting factor for each match based on the average of the % composition values in that match
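A rough sketch of the two variations, under assumptions the slide does not spell out: B1 applies the min/max ratio to % composition values, and B2 weights each match by the mean of its two % composition values and divides by the sum of the weights. Function names and counts are hypothetical, and the dendrogram step is left out.

```python
def percent_composition(counts):
    """Convert raw counts at one station to % of the whole sample."""
    total = sum(counts)
    return [100.0 * x / total for x in counts]


def ratio(x, y):
    """min/max ratio for one match; a 0/0 match is scored as 1 here."""
    hi = max(x, y)
    return 1.0 if hi == 0 else min(x, y) / hi


def b1(counts_a, counts_b):
    """B calculated on % composition instead of raw counts."""
    pa, pb = percent_composition(counts_a), percent_composition(counts_b)
    return sum(ratio(x, y) for x, y in zip(pa, pb)) / len(pa)


def b2(counts_a, counts_b):
    """As B1, but each match is weighted by its mean % composition."""
    pa, pb = percent_composition(counts_a), percent_composition(counts_b)
    weights = [(x + y) / 2.0 for x, y in zip(pa, pb)]
    weighted = sum(w * ratio(x, y) for w, x, y in zip(weights, pa, pb))
    return weighted / sum(weights)


# Hypothetical counts: one dominant taxon and two rare ones.
station_a = [90, 5, 5]
station_b = [80, 10, 10]
print(round(b1(station_a, station_b), 3))
print(round(b2(station_a, station_b), 3))
```

In this example B1 is about 0.63 and B2 about 0.83, because the weighting lets the well-matched dominant taxon count for more than the two rare taxa.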

BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
• 0/0 matches: scored as 1 or ignored?
• 1/1, 1/2, 2/2 matches: ignored? (Perkins, 1981)
• Compressing the data matrix: delete any taxon that is represented by fewer than X total individuals over all the sites/dates being compared, as long as X/n is < 3, where n is the number of times the taxon occurs. Any similarly logical rule could be used instead.

BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
Further consideration of sampling error:
0/1 = 0.0 ; 1/1 = 1
0/100 = 0.0 ; 1/100 = 0.01
Solution (Clifford and Stephenson, 1975): use an adjustment factor (f). Add f to both numerator and denominator. They recommend f = 1/5 of the lowest non-zero entry in the original data matrix. In the above case, f = 0.2:
(0 + 0.2)/(1 + 0.2) = 0.17
(0 + 0.2)/(100 + 0.2) = 0.002
which may reflect a more realistic relationship.
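The adjustment can be checked with a few lines of code. This is a small sketch with hypothetical function names; the only rule taken from the slide is that f is one fifth of the lowest non-zero entry and is added to both numerator and denominator.

```python
def adjustment_factor(matrix):
    """One fifth of the lowest non-zero entry in the data matrix."""
    nonzero = [value for row in matrix for value in row if value > 0]
    return min(nonzero) / 5.0


def adjusted_ratio(x, y, f):
    """min/max ratio for one match with f added to both parts."""
    lo, hi = min(x, y), max(x, y)
    return (lo + f) / (hi + f)


# Hypothetical data matrix whose lowest non-zero entry is 1.
matrix = [
    [0, 1],
    [0, 100],
]
f = adjustment_factor(matrix)
small = adjusted_ratio(0, 1, f)
large = adjusted_ratio(0, 100, f)
print(f, round(small, 2), round(large, 3))
```

With f = 0.2, a 0/1 match scores about 0.17 instead of 0, while a 0/100 match stays close to 0, so the likely sampling accident is treated more gently than the clear absence.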

BioSim1 – Strategy for Data Analysis Data points with low numbers of organisms • Ignored matches The number of matches (k in the formula for B) is decreased by one for each match ignored in each paired comparison.
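To make the effect on k concrete, here is a sketch of B with optional ignoring of matches. It assumes B is the mean of min/max ratios over the matches that remain. The parameter names `ignore_zero_zero` and `ignored_pairs` are hypothetical and are not BioSim1 option names.

```python
def biotic_similarity(counts_a, counts_b, ignore_zero_zero=False,
                      ignored_pairs=frozenset()):
    """B with optional ignoring of low-count matches.

    `ignored_pairs` holds (low, high) count pairs to skip, for example
    {(1, 1), (1, 2), (2, 2)}. Each ignored match reduces k by one.
    """
    total = 0.0
    k = 0
    for xa, xb in zip(counts_a, counts_b):
        lo, hi = min(xa, xb), max(xa, xb)
        if (lo, hi) in ignored_pairs:
            continue  # match ignored: not counted in k
        if hi == 0:
            if ignore_zero_zero:
                continue  # 0/0 match ignored: not counted in k
            total += 1.0  # otherwise a 0/0 match is scored as 1
        else:
            total += lo / hi
        k += 1
    if k == 0:
        raise ValueError("no matches left to compare")
    return total / k


# Hypothetical counts for four taxa at two stations.
a = [10, 0, 1, 4]
b = [5, 0, 2, 4]
low_counts = frozenset({(1, 1), (1, 2), (2, 2)})
print(biotic_similarity(a, b))                                   # k = 4
print(biotic_similarity(a, b, ignore_zero_zero=True))            # k = 3
print(biotic_similarity(a, b, ignore_zero_zero=True,
                        ignored_pairs=low_counts))               # k = 2
```

The same two stations give different B-values depending on which matches are dropped, so the chosen options should be reported along with the results.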

BioSim1 – Strategy for Data Analysis
Unweighted vs weighted clustering
• This is the algorithm that determines which of two candidate clusters to join to an already existing cluster.
• A cluster of 5 parameters could be linked to a cluster of 3 parameters or to another of 2 parameters. There would be 5 x 3 B-values to average for the first possibility and 5 x 2 for the second.
• Unweighted: The averages of the two sets of B-values determine the decision.
• Weighted: The number of parameters in the cluster to be joined plays a role in the decision.

BioSim1 – Strategy for Data Analysis Configuration of the rearranged data matrix Rearranging original data matrix in dendrogram order

BioSim1 – Strategy for Data Analysis Configuration of the rearranged data matrix Rearranging original data matrix in double-dendrogram order

BioSim1 – Strategy for Data Analysis • Establishing environmentally valid subclusters

Preferred Options

BioSim2 • BioSim2, 2005, Java Format • Most features as in BioSim1 • But, very user-friendly • Provides many of the options in BioSim1 as automatic output • Expands on capability of mining the data for insights

Figure 1. Opening Screen of BioSim2.

Demonstration of BioSim2 • Input original compressed data matrix • Output of BioSim2 • Row Dendrogram • Row Cophenetic Correlation Coefficient • Column Dendrogram • Column Cophenetic Correlation Coefficient • Original Data Matrix Rearranged in Double-Dendrogram Order

0’s Present 0’s Removed

Coding The Reordered Data Matrix

Synthesis
