
Annotating Metagenomes Using the NMPDR

Learn how to generate consistent and accurate annotations for metagenomes using the NMPDR system. This workshop will cover the basics of metagenome annotation, as well as advanced features such as automated metabolic reconstruction.

cmathews


Presentation Transcript


  1. ASM General Meeting, Boston. Annotating Metagenomes Using the NMPDR. See also poster B-179 (126B), Aziz et al. Rob Edwards, Department of Computer Sciences, San Diego State University, and Mathematics and Computer Sciences Division, Argonne National Laboratory. www.nmpdr.org www.theseed.org

  2. How much has been sequenced? [Figure: number of known sequences plotted by year, with labels for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]

  3. How much will be sequenced? [Figure labels: everybody in USA; everybody in Boston; one genome from every species; 100 people; most major microbial environments; all cultured Bacteria.]

  4. The Problem: How do you generate consistent and accurate annotations for metagenomes?

  5. The SEED Family

  6. Annotations using subsystems. FIG has developed the notion of a Subsystem: a generalization of “pathway”, defined as a collection of functional roles jointly involved in a biological process or complex. Subsystems have been extended into FIGfams: protein families whose members perform the same function.

  7. Subsystems make up metabolism. [Image: Wikipedia Metabolism portal, http://en.wikipedia.org/wiki/Portal:Metabolism]

  8. SEED Viewer

  9. Populated Subsystem

  10. Subsystems Are Not Just Pathways
     • genome context (virulence islands, prophages, conserved gene clusters)
     • virulence mechanism
     • enzymatic activity
     • cellular localization
     • predicted or measured co-regulation
     • common phenotype
     • combinations of criteria

  11. Automated Annotations of Complete Genomes http://rast.nmpdr.org/
     • Automated, user-originated processing
     • Takes 1-7 hours depending on size and complexity of the genome
     • ~1,500 external submissions, including 150 genomes not yet publicly released
     • Reannotation of >500 genomes complete
     • 789 users, 160 organizations, 25 countries

  12. Automated Annotations of Complete Metagenomes: MG-RAST Server http://metagenomics.theseed.org/
     • Accurate and consistent annotations in a few days
     • Automatic metabolic reconstruction
     • Freely available after registration

  13. Metagenome Annotation: automated pipeline
     • Upload sequences in FASTA format, with or without quality scores (Q-scores)
     • Removes exact duplicates (a 454 artefact)
     • Renumbers sequences (mapping provided)
     • BLAST against SEED nr and 16S rDNA
     • Annotations and metabolic reconstruction
     • Taxonomic summary
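The duplicate-removal and renumbering steps of the pipeline can be illustrated with a minimal sketch. This is not the MG-RAST implementation: the function names, the `seq` ID prefix, and the choice to compare sequences case-insensitively are all assumptions made for illustration. It only shows the general idea of dropping exact duplicate reads and keeping a mapping from new IDs back to the original headers.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def dedupe_and_renumber(records, prefix="seq"):
    """Drop exact duplicate sequences and assign sequential IDs.

    Returns (renumbered_records, mapping), where mapping links each
    new ID to the original header of the first read with that sequence.
    The ID scheme is hypothetical, not the one MG-RAST uses.
    """
    seen = set()
    renumbered = []
    mapping = {}
    for header, seq in records:
        key = seq.upper()  # assumption: case differences are not meaningful
        if key in seen:
            continue
        seen.add(key)
        new_id = f"{prefix}{len(renumbered) + 1}"
        renumbered.append((new_id, seq))
        mapping[new_id] = header
    return renumbered, mapping
```

Calling `dedupe_and_renumber(read_fasta(fasta_text))` returns the unique reads in their original order together with the ID mapping, so downstream results can still be traced back to the uploaded read names.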

  14. Metagenome Metabolic Reconstruction

  15. Phylogenomics

  16. Comparing Metagenomes to Genomes (or other metagenomes!)

  17. Metabolic potential in environments

  18. MG-RAST computation: ~19 hours of compute per input megabyte. [Figure: hours of compute time plotted against input size (MB).]

  19. How much so far? 676 metagenomes (~200 GS20, ~200 FLX, ~200 Sanger). 10,012,793,995 bp (10 Gbp). Average: ~15 Mbp per metagenome. Compute time (on a single CPU): 190,243 hours = 7,926 days = 21 years
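The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes one byte of input per base pair and 1 MB = 10^6 bytes (FASTA headers and line breaks are ignored); under that assumption the total compute time works out to roughly 19 hours per input megabyte. The function and key names are illustrative only.

```python
def summarize(total_bp, n_metagenomes, compute_hours):
    """Derive the summary figures quoted on the slide from the raw totals."""
    # Assumption: 1 base pair is about 1 byte of input, and 1 MB = 1e6 bytes.
    input_mb = total_bp / 1e6
    days = compute_hours / 24
    return {
        "avg_bp": total_bp / n_metagenomes,
        "days": days,
        "years": days / 365,
        "hours_per_mb": compute_hours / input_mb,
    }


stats = summarize(
    total_bp=10_012_793_995,
    n_metagenomes=676,
    compute_hours=190_243,
)
```

With the slide's totals, the average comes out near 14.8 Mbp (rounded to ~15 Mbp on the slide), and the compute time is just under 7,927 days, or a little under 22 years on a single CPU.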

  20. Lots of sequences, all pyrosequencing

  21. From Sequences To Environments (Dinsdale et al, Nature 2008). [Figure: ordination plot with axes CDA 60.2% and CDA 21.7%. Function labels: Stress, Membrane transport, Sulfur, Signaling, Capsule, Motility, Phosphorus, RNA, Respiration. Environment labels: Mine, Saltern, Marine, Microbialites, Fish, Animals, Coral, Freshwater.]

  22. Upcoming Features
     • More user options (removing sequences, E-values, percent identities, etc.)
     • More databases (ACLAME, human, etc.)
     • More user-generated content (mash-ups) via web services and a published API
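As an illustration of what user-selectable E-value and percent-identity cutoffs mean in practice, here is a minimal sketch that filters BLAST hits. It assumes the hits are in the standard 12-column tab-separated BLAST tabular format (percent identity in column 3, E-value in column 11); the slides do not say what format or defaults MG-RAST uses, so the function name and the default thresholds are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5, min_identity=50.0):
    """Keep BLAST tabular hits that pass both cutoffs.

    Each kept hit is returned as (query_id, subject_id, pident, evalue).
    Thresholds are illustrative defaults, not MG-RAST settings.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column tabular record
        pident = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue and pident >= min_identity:
            kept.append((fields[0], fields[1], pident, evalue))
    return kept
```

Tightening `max_evalue` or raising `min_identity` trades sensitivity for confidence in each assignment, which is why exposing them as user options matters for short metagenomic reads.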

  23. Accessing Data via Web Services. Thanks: Bahador Nosrat, SDSU

  24. Workshops: Free workshops on NMPDR, RAST, MG-RAST, and SEED. Upcoming workshops: Greece, Argonne, Urbana-Champaign, San Diego. Contact Leslie McNeil (lkmcneil@ncsa.uiuc.edu) or visit http://www.nmpdr.org/

  25. Acknowledgements
     • Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen, Mark D'Souza
     • FIG: Ross Overbeek, Veronika Vonstein, Annotators
     • Statistics & Web services: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat
     • Environmental Genomics: Forest Rohwer and the labs that provided sequence
