Species Richness in Soil Bacterial Communities: A Proposed Approach to Overcome Sample-Size Bias

Methods: DNA extraction from 0.5 g of soil using a modified lysis and bead-beating protocol. PCR: A near complete 16S rRNA gene fragment was amplified using primer pair 27F and 1391R. Cloning: Invitrogen TOPO-TA cloning kit. Sequencing: 13,486 near complete 16S rRNA gene clones were generated, and sequenced at the Department of Energy Joint Genome Institute Alignment/ Distance matrix: Greengenes OTU assignment: DOTUR Chimera check: using Mallard, 485 sequences were identified as potential chimeras and removed from the dataset. The remaining 13,001 sequences were designated Kessler farm soil clones (KFS clones). Abstract Estimates of species richness based on 16S rRNA gene clone libraries are increasingly utilized to gauge the level of bacterial diversity within various ecosystems. However, previous studies have indicated that regardless of the method utilized, species richness estimates obtained is dependent on the size of clone libraries analyzed. We here propose an approach to overcome sample size bias in species richness estimates in complex microbial communities. Parametric (Maximum likelihood-based and rarefaction curve-based) and nonparametric approaches were used to estimate species richness in a library of 13,001 near full-length 16S rRNA clones derived from soil, as well as in multiple subsets of the original library. Species richness estimates obtained increased with the increase in library size. To obtain a sample size-unbiased estimate of species richness, we calculated the theoretical clone library sizes required to encounter the estimated species richness at various clone library sizes, used curve fitting to determine the theoretical clone library size required to encounter the “true” species richness, and subsequently determined the corresponding sample size-unbiased species richness value. Using this approach, sample size unbiased estimates of 17,230, 15,571, and 33,912 were obtained for the ML-based, rarefaction curve-based, and ACE-1 estimators, compared to bias uncorrected values of 15,009, 11,913, and 20,909, respectively. Effect of sample size on various species richness estimates. Subsets of 100, 500, 1000, and 3289 clones were randomly drawn from the 13,001 KFS dataset and treated as smaller clone libraries. Species richness estimates for all subset clone libraries were estimated using all previously described parametric and nonparametric methods. Regardless of the model used, the estimate of species richness increased with sample size. The estimate versus sample size plot was generally linear until sample size 3289-clone after which it started to approach an asymptote. Mixed exponential-mixed Poisson Poisson Pareto-mixed Poisson Lognormal-mixed Poisson Inverse Gaussian-mixed Poisson Negative Binomial Towards a sample-size unbiased estimate of species richness. Since species richness estimate increases with clone library size, the estimates species richness is only a fraction of the true richness. To obtain a sample-size unbiased estimate of species richness we proposed the following approach. Using parametric ML-based species richness estimates (SRest) obtained at different clone library sizes, we calculated the theoretical clone library sizes required to observe the absolute majority (99, 99.9, or 99.99%) (CLth-99, CLth-99.9, or CLth-99.99) of the SRest at different actual clone library sizes (CLact). Species richness estimates used Parametric Non-Parametric Rarefaction curve-based Fitting models: Michaelis Menten (MM), MM with intercept, double MM, negative exponential, and negative exponential with intercept Chao, Chao_bc ACE, ACE-1 ML-based EstimateS The library size at which CLact = CLth i.e. the point at which theoretical clone library effort is met was 6.3 X 106. http://www.stat.cornell.edu/~bunge/ ML-based SREst Vs CLth-99 suggested 17, 230 as a sample size-unbiased SR estimate Repeating the procedure described above with rarefaction curve-based, and ACE-1 estimators, values of 15,571, and 33,912 were obtained as a sample size-unbiased SR estimate, respectively. Comparison of mixed-Poisson parametric models used to fit the Kessler Farm Soil dataset frequency dataa References Behnke A, et al. 2006. Appl. Environ. Microbiol.72, 3623-3636. Dunbar J. et al. 2002. Appl. Environ. Microbiol.6, 3035-3045. Joen S-O, et al. 2006. Appl. Environ. Microbiol.72, 6578-6583. Hill TCJ, et al. 2003. FEMS Microbiol. Ecol.43, 1-11. Hong S-H, et al. 2006. Proc. Natl. Acad. Sci. USA103, 117-122. Hughes JB, et al. 2001. Appl. Environ. Microbiol.67, 4399-4406. Marguiles M, et al 2005. Nature437, 376-380. O’Hara RB 2005. J. Animal Ecol.74, 375-386. Roesch LFW, et al. 2007 ISME J.1, 283-290. Schloss PD & Handleman J 2006. PLoS Comput. Biol.2, 786-793. Schloss PD & Handleman J 2005. Appl. Environ. Microbiol.71, 1501-1506. Stach EM & Bull AT 2004. Antonie Van Leeuwenhoek87, 3-9. Stoeck T, et al. 2007 PLos One8, e278. Zuendorf A, et al. 2006 FEMS Microbiol. Ecol.58, 476-491. Conclusions and Discussions • We utilized a large 16S rRNA gene clone library (13,001) to demonstrate the problems associated with SR determination. • In spite of the large size of the library (13,001 clones), all SRest values continued to increase regardless of the estimation method used. • We used a ML-based, rarefaction curve-based, and nonparametric-based estimate for our sample size-unbiased approach. • A sample size-unbiased estimate (17,230) is still 15% higher than the SRest value obtained using the entire 13,001 clone library (15,009). • The approach highlights the value of utilizing large clone library datasets in SR estimates. The larger the clone library, the closer the rarefaction curve, CLth Vs CLact, and CLth Vs SRest plots will be to reaching an asymptote. • Pyrosequencing-based approaches could be useful for obtaining large 16S rRNA gene datasets from various habitats. Comparison of different models used to fit the rarefaction curve in Kessler Farm Soil dataseta Study Site An undisturbed tall grass prairie preserve in Kessler Farm Field Laboratory biological research station in central Oklahoma Nonparametric estimators of species richness for Kessler Farm Soil dataset. Species Richness in Soil Bacterial Communities: A Proposed Approach to Overcome Sample-Size Bias Noha Youssef, Mostafa Elshahed Oklahoma State University, Stillwater, OK Funding Provided By NSF Microbial Observatories Program JGI-Laboratory sequencing program OSU Start-up funds to Mostafa Elshahed N-229 Introduction Culture-independent 16S rRNA gene-based analysis has been utilized to study bacterial, archaeal, and eukaryotic diversity from various habitats. Collectively, these studies demonstrated that the scope of microbial diversity is far broader than previously implied based on cultivation analysis. The sizes of analyzed clone libraries have steadily been increasing which allowed for the utilization of various statistical approaches in evaluating microbial diversity (Dunbar, 2002; Schloss, 2006; Joen, 2006; Hong, 2006; Hughes, 2001). Species richness estimation methods are either parametric or nonparametric (O' Hara, 2005). Nonparametric estimators have been more commonly used in microbial ecology (Stach, 2004; Schloss, 2005; Hill, 2003). Parametric methods (maximum likelihood-,ML, and rarefaction curve-based) have recently been used to estimate bacterial and microeukaryotic diversities in complex microbial communities (Joen, 2006; Hong, 2006; Behnke, 2006; Stoeck, 2007; Zuendorf, 2006; Roesch, 2007). Regardless of the approach utilized for species richness determination, it has been observed that the estimated species richness value (SRest) is dependent on the size of the clone library used in the calculation (Dunbar, 2002; Roesch, 2007; Schloss, 2006). The problem could theoretically be rectified by adequate sampling effort to encounter all bacterial species in the sample. However, in complex microbial communities, a large fraction of the bacterial species is present in low abundance and even with recent advances in sequencing technologies (Margulies, 2005), a complete census of all bacterial species remains a daunting task.

Species Richness in Soil Bacterial Communities: A Proposed Approach to Overcome Sample-Size Bias

Species Richness in Soil Bacterial Communities: A Proposed Approach to Overcome Sample-Size Bias

Presentation Transcript

Biodiversity and exotic species invasion in riparian plant communities

Island biogeography

Soil Stabilization

EPI 5240: Introduction to Epidemiology Bias and Misclassification November 9, 2009

Soil Horizons

PATHOGENESIS OF BACTERIAL INFECTION PATHOGENICITY TOXIGENICITY VIRULENCE

Soil Microbiology

Sample Selection Bias – Covariate Shift: Problems, Solutions, and Applications

Recreational Fish Species List Fish illustration, Minimum size, Bag limits

Bacterial Infections 07/02/2011

Chapter Eight

EVOLUTION: CHANGE OVER TIME

Sample size calculation and development of sampling plan

Using Soil Color to Assess Hydrology

ENVIRONMENTAL PRACTICAL 2 A2.2VP2

How big should my study be? The science and art of choosing your sample size

Genomes of bacterial pathogens and their diversity

Chapter 15 Soil Resources

Power and Sample Size: Issues in Study Design

OPTION G