
Accurate estimation of microbial communities using pyrotags and itags

Julien Tremblay, PhD (jtremblay@lbl.gov)



Presentation Transcript


  1. Accurate estimation of microbial communities using pyrotags and itags. Julien Tremblay, PhD, jtremblay@lbl.gov

  2. 16S rRNA as phylogenetic marker gene. The 70S ribosome has two subunits: 30S (16S rRNA, 21 proteins) and 50S (5S rRNA, 23S rRNA, 34 proteins). [Figure: Escherichia coli 16S rRNA primary and secondary structure.] 16S rRNA is highly conserved between different species of bacteria and archaea. (Falk Warnecke)

  3. 16S rRNA in environmental microbiology (Sanger clone libraries): 900-1100 bp length. (Falk Warnecke)

  4. Next generation sequencing (NGS). 454: 0.5M reads of 450 bp ($$). Illumina: 10-400M reads of 150 bp per lane ($).

  5. Game plan to survey microbial diversity. The 16S rRNA gene contains variable regions V1 to V9. Generate amplicons of a given variable region from a bacterial community (many millions of sequences). Reduce the dataset by dereplication/clustering. Identification (BLAST against DB). Amplicon tags = deeper, cheaper, faster. [Figure: example abundance labels on sequences: X 10, X 1, X 1,000, X 2,000; X 200, X 1,200, X 800, X 10,000.]
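The dereplication step named on this slide can be illustrated with a minimal sketch. It assumes exact-match dereplication of in-memory read strings; the function name `dereplicate` and the toy reads are hypothetical and are not taken from the presentation or from any pipeline it mentions.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs, most abundant first.

    Assumption: reads are plain strings and only exact duplicates
    (case-insensitive) are merged.
    """
    counts = Counter(read.upper() for read in reads)
    # Sort by descending count, then by sequence so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input, for illustration only.
reads = ["ACGT", "acgt", "ACGA", "ACGT", "TTGA", "ACGA"]
unique = dereplicate(reads)
for sequence, count in unique:
    print(f"{sequence} x {count}")
```

Keeping the count next to each unique sequence preserves abundance information while later steps (clustering, BLAST) operate on far fewer sequences.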

  6. Rare biosphere. [Figure: rank-abundance curve (axes: rank, abundance) running from high abundance to low abundance; the low-abundance tail is labeled "rare biosphere".] Sequencing error? Relatively small size of amplicons.
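A rank-abundance curve like the one on this slide can be computed from per-OTU read counts. The sketch below is illustrative only: the function names, the toy counts, and the 1% cutoff used to mark the "rare" tail are assumptions, since the slides do not define a threshold.

```python
def rank_abundance(otu_counts):
    """Return (rank, otu, relative_abundance) tuples; rank 1 is the most abundant."""
    total = sum(otu_counts.values())
    ordered = sorted(otu_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, otu, count / total)
        for rank, (otu, count) in enumerate(ordered, start=1)
    ]


def rare_tail(ranked, max_fraction):
    """List OTUs whose relative abundance is below max_fraction (hypothetical cutoff)."""
    return [otu for _, otu, fraction in ranked if fraction < max_fraction]


# Made-up counts, not data from the presentation.
otu_counts = {"otu_a": 900, "otu_b": 90, "otu_c": 9, "otu_d": 1}
ranked = rank_abundance(otu_counts)
rare = rare_tail(ranked, max_fraction=0.01)
print(ranked)
print(rare)
```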

  7. Rare bias sphere? Is the rare biosphere an artifact of NGS error? Control experiment: estimate the rare biosphere in a single strain of E. coli, using primers 27F/342R (V1 & V2) and 1114F/1392R (V8). It should not be, if relatively stringent clustering parameters are applied. Subject to controversy: is rare always real? Kunin et al. (2009), Environ. Microbiol.; Quince et al. (2009), Nat. Methods.

  8. PyroTagger (for 454 amplicons). Pipeline steps: unzip, validate; remove low-quality reads; redundancy removal; PyroClust & Uclust; remove chimeras; sample comparison, post-processing. pyrotagger.jgi-psf.org

  9. Classification and barcode separation
     • Sequences of cluster (OTU) representatives
     • BLAST vs GreenGenes and Silva databases, dereplicated at 99.5%
     • Distribution of microbial phyla in the dataset
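Barcode separation, as named in this slide's title, can be sketched as follows. This is a simplified illustration that assumes barcodes sit at the start of each read, all have the same length, and must match exactly. The function name, barcodes, and sample names are hypothetical, and this is not the PyroTagger implementation.

```python
def split_by_barcode(reads, barcodes):
    """Assign reads to samples by exact match on a leading barcode.

    barcodes maps barcode sequence -> sample name (all barcodes the same length,
    an assumption of this sketch). Returns (samples, unassigned), where samples
    maps sample name -> list of reads with the barcode stripped.
    """
    lengths = {len(barcode) for barcode in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of equal length")
    barcode_length = lengths.pop()

    samples = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        name = barcodes.get(read[:barcode_length])
        if name is None:
            unassigned.append(read)  # no known barcode at the read start
        else:
            samples[name].append(read[barcode_length:])  # strip the barcode
    return samples, unassigned


# Toy barcodes and reads, for illustration only.
barcodes = {"AAGG": "sample_1", "CCTT": "sample_2"}
reads = ["AAGGACGT", "CCTTTTGA", "GGGGACGT", "AAGGTTTT"]
samples, unassigned = split_by_barcode(reads, barcodes)
print(samples)
print(unassigned)
```

Real pipelines usually tolerate a small number of barcode mismatches; exact matching is used here only to keep the sketch short.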

  10. Illumina tags (itags)
      • Typical 454 run: 450,000–500,000 reads
      • "Typical" Illumina run:
          • GAIIx: 10,000,000–40,000,000 reads/lane
          • HiSeq: 350,000,000 reads/lane
          • MiSeq (available soon): 1,000,000–2,000,000 reads/lane
      • MiSeq technology will replace 454 for microbial community surveys.
      • PyroTagger is replaced by a more efficient clustering algorithm (SeqObs).

  11. itags clustering. Reduces dataset by 80%. Default: 97% (label appears twice on the slide). (Edward Kirton, JGI)
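To make the idea of clustering at a 97% default concrete, here is a simplified greedy centroid clustering sketch. Several things are assumptions: that 97% refers to a sequence identity threshold, that identity can be approximated by ungapped position-by-position comparison, and the function names themselves. It is not the SeqObs, PyroClust, or Uclust algorithm, which the slides do not describe.

```python
def identity(a, b):
    """Fraction of matching positions (ungapped), relative to the longer sequence."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(unique_reads, threshold=0.97):
    """Greedy centroid clustering of (sequence, count) pairs.

    Sequences are visited from most to least abundant; each joins the first
    existing centroid it matches at >= threshold, otherwise it founds a cluster.
    """
    clusters = []
    for seq, count in sorted(unique_reads, key=lambda sc: (-sc[1], sc[0])):
        for cluster in clusters:
            if identity(seq, cluster["centroid"]) >= threshold:
                cluster["size"] += count
                cluster["members"].append(seq)
                break
        else:
            clusters.append({"centroid": seq, "size": count, "members": [seq]})
    return clusters


# Toy 100 bp sequences, for illustration only.
base = "ACGT" * 25
near = "T" + base[1:50] + "A" + base[51:]  # 2 substitutions -> 98% identity
far = "TGCAT" + base[5:]                   # 5 substitutions -> 95% identity
clusters = greedy_cluster([(base, 50), (near, 3), (far, 10)])
for cluster in clusters:
    print(cluster["size"], len(cluster["members"]))
```

Visiting abundant sequences first means likely error-free reads become centroids, so low-count variants within the threshold are absorbed instead of founding their own clusters.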

  12. Number of reads >> number of clusters Edward Kirton, JGI

  13. Benefits of parallelization Edward Kirton, JGI

  14. itags: validating SeqObs output by comparing with PyroTagger results. Samples: synthetic communities, termite gut, surface sediments, compost, sludge. Platforms and pipelines: 454 with PyroTagger (V8 region); Illumina GAIIx with SeqObs pipeline (V4, V5 and V9 regions); Illumina MiSeq with SeqObs pipeline (V4 region).

  15. itags: phyla abundance comparison, GAIIx vs 454 V8 region

  16. itags: phyla abundance comparison, MiSeq V4 region vs 454 V8 region
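A phylum-level comparison between two runs requires normalizing read counts to relative abundances first, because the platforms differ greatly in sequencing depth. The sketch below shows one way to do that. The function names and the counts are made up for illustration and are not results from the presentation.

```python
def relative_abundance(counts):
    """Convert {phylum: read_count} into {phylum: fraction_of_total}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {phylum: n / total for phylum, n in counts.items()}


def compare_profiles(counts_a, counts_b):
    """Return {phylum: (fraction_a, fraction_b, fraction_a - fraction_b)}.

    Phyla missing from one profile are treated as abundance 0.0 there.
    """
    rel_a = relative_abundance(counts_a)
    rel_b = relative_abundance(counts_b)
    comparison = {}
    for phylum in sorted(set(rel_a) | set(rel_b)):
        a = rel_a.get(phylum, 0.0)
        b = rel_b.get(phylum, 0.0)
        comparison[phylum] = (a, b, a - b)
    return comparison


# Made-up counts for two hypothetical runs of different depth.
counts_a = {"Proteobacteria": 300, "Firmicutes": 100}
counts_b = {"Proteobacteria": 6000, "Firmicutes": 3000, "Bacteroidetes": 1000}
comparison = compare_profiles(counts_a, counts_b)
for phylum, (a, b, diff) in comparison.items():
    print(f"{phylum}: {a:.2f} vs {b:.2f} (difference {diff:+.2f})")
```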

  17. itags, confidence level: alignment length against reference DB. [Figure panels: 454; GAIIx; MiSeq 5' reads; MiSeq assembled reads.]

  18. itags, confidence level: E values. [Figure panels: MiSeq 5' reads; 454; GAIIx; MiSeq assembled reads.]

  19. Challenges
      • Short amplicon size
      • What filtering parameters to use (stringency level)?
      • Balance between filter stringency and keeping as much data as we can
      • Whole new dimension for rare biosphere?
      • Handling large numbers of samples (on the order of tens of thousands)
      • Cost of barcoded primers, handling

  20. Acknowledgments • Susannah Tringe • Edward Kirton • Feng Chen • Kanwar Singh • Rob Knight lab (Univ. of Colorado) Thanks!

  21. 16S rRNA Dangl lab, UNC
