Learn how to generate consistent and accurate annotations for metagenomes using the NMPDR system. This workshop will cover the basics of metagenome annotation, as well as advanced features such as automated metabolic reconstruction.
ASM General Meeting, Boston. Annotating Metagenomes Using the NMPDR See also poster: B-179 (126B) Aziz et al Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory www.nmpdr.org www.theseed.org
How much has been sequenced? [Figure: number of known sequences by year; marked points: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]
How much will be sequenced? [Figure labels: 100 people, everybody in Boston, everybody in USA, all cultured Bacteria, one genome from every species, most major microbial environments]
The Problem How do you generate consistent and accurate annotations for metagenomes?
The SEED Family
Annotations using subsystems FIG has developed the notion of a Subsystem: a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex. Subsystems have been extended into FIGfams: protein families whose members perform the same function.
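One way to picture a subsystem as "a collection of functional roles" is as a small table of genomes against roles. The sketch below is illustrative only: the class name, method names, role names, and gene identifiers are all hypothetical and are not taken from the SEED or NMPDR code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subsystem:
    """Hypothetical model: a named set of functional roles, populated per genome."""
    name: str
    roles: List[str]
    # genome -> role -> gene identifiers assigned to that role
    assignments: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def assign(self, genome: str, role: str, gene_id: str) -> None:
        """Record that gene_id in genome performs role."""
        if role not in self.roles:
            raise ValueError(f"{role!r} is not a role of subsystem {self.name!r}")
        self.assignments.setdefault(genome, {}).setdefault(role, []).append(gene_id)

    def missing_roles(self, genome: str) -> List[str]:
        """Roles of the subsystem with no gene assigned in genome."""
        found = self.assignments.get(genome, {})
        return [role for role in self.roles if role not in found]


# Made-up example data, for illustration only
example = Subsystem("Example subsystem", ["role A", "role B", "role C"])
example.assign("genome 1", "role A", "gene_1")
print(example.missing_roles("genome 1"))
```

Listing the roles that have no gene in a given genome is the kind of question such a table makes easy to ask.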
Subsystems make up metabolism Wikipedia Metabolism http://en.wikipedia.org/wiki/Portal:Metabolism
SEED Viewer
Populated Subsystem
Subsystems Are Not Just Pathways • genome context (virulence islands, prophages, conserved gene clusters) • virulence mechanism • enzymatic activity • cellular localization • predicted or measured co-regulation • common phenotype • combinations of criteria
Automated Annotations of Complete Genomes http://rast.nmpdr.org/ • Automated, user-originated processing • Takes 1-7 hours depending on size and complexity of the genome • ~1,500 external submissions, including 150 genomes not yet publicly released • Reannotation of >500 genomes complete • 789 users, 160 organizations, 25 countries
Automated Annotations of Complete Metagenomes http://metagenomics.theseed.org/ MG-RAST Server • Accurate and consistent annotations in a few days • Automatic metabolic reconstruction • Freely available after registration
Metagenome Annotation Automated pipeline • upload sequences in FASTA, with or without Q-scores • removes exact duplicates (454 artefact) • renumbers sequences (mapping provided) • BLAST against SEED nr, 16S rDNA • Annotations and metabolic reconstruction • Taxonomic summary
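The duplicate-removal and renumbering steps of the pipeline can be sketched as follows. This is a minimal illustration, not the MG-RAST implementation: the function names and the `seq` identifier prefix are assumptions, and "exact duplicate" is taken here to mean an identical sequence string.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_and_renumber(
    records: Iterable[Tuple[str, str]], prefix: str = "seq"
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Drop exact duplicate sequences and give the survivors new IDs.

    Returns the renumbered records and a mapping of new ID -> original header.
    """
    seen = set()
    kept: List[Tuple[str, str]] = []
    mapping: Dict[str, str] = {}
    for header, seq in records:
        if seq in seen:  # exact duplicate of an earlier read: skip it
            continue
        seen.add(seq)
        new_id = f"{prefix}{len(kept) + 1}"
        kept.append((new_id, seq))
        mapping[new_id] = header
    return kept, mapping


# Made-up reads, for illustration only
fasta = [">read_A sample", "ACGT", "AC", ">read_B", "ACGTAC", ">read_C", "TTTT"]
kept, mapping = dedupe_and_renumber(parse_fasta(fasta))
for new_id, seq in kept:
    print(f">{new_id}\n{seq}")
```

Keeping the mapping from new identifiers back to the original headers is what lets results be traced to the uploaded reads after renumbering.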
MG-RAST computation ~19 hours of compute per input megabyte [Figure: hours of compute time plotted against input size (MB)]
How much so far ~200 GS20, ~200 FLX, ~200 Sanger: 676 metagenomes 10,012,793,995 bp (10 Gbp) Average: ~15 Mbp per metagenome Compute time (on a single CPU): 190,243 hours = 7,926 days = 21 years
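The compute-time figures follow from the rate of ~19 hours per input megabyte. The sketch below reproduces the arithmetic under two assumptions that the slides do not state: one base pair occupies one byte of input, and a megabyte is 10^6 bytes. The function and constant names are illustrative.

```python
HOURS_PER_MB = 19             # rate quoted in the talk
TOTAL_BP = 10_012_793_995     # total base pairs processed
N_METAGENOMES = 676


def estimate_compute(total_bp: int, hours_per_mb: float = HOURS_PER_MB):
    """Return (hours, days, years) of single-CPU compute for total_bp of input.

    Assumes 1 bp = 1 byte and 1 MB = 1,000,000 bytes.
    """
    megabytes = total_bp / 1_000_000
    hours = megabytes * hours_per_mb
    days = hours / 24
    years = days / 365
    return hours, days, years


hours, days, years = estimate_compute(TOTAL_BP)
average_bp = TOTAL_BP / N_METAGENOMES
print(f"{int(hours):,} hours = {int(days):,} days = {int(years)} years")
print(f"average: ~{average_bp / 1e6:.0f} Mbp per metagenome")
```

Truncating to whole units gives 190,243 hours, 7,926 days and 21 years, with an average of about 15 Mbp per metagenome.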
Lots of sequences, all pyrosequencing
From Sequences To Environments [Figure: CDA plot, axes CDA 60.2% and CDA 21.7%; functional categories: Stress, Membrane transport, Sulfur, Signaling, Capsule, Motility, Phosphorus, RNA, Respiration; environments: Mine, Saltern, Marine, Microbialites, Fish, Animals, Coral, Freshwater] Dinsdale et al, Nature 2008
Upcoming Features • More user options (removing sequences, E-values, percent identities, etc) • More databases (ACLAME, human, etc) • More user generated content (mash-ups) via webservices and published API
Accessing Data via Web Services Thanks: Bahador Nosrat SDSU
Workshops Free workshops on NMPDR, RAST, mg-RAST, SEED Upcoming workshops: Greece, Argonne, Urbana-Champaign, San Diego Contact Leslie McNeil lkmcneil@ncsa.uiuc.edu or visit http://www.nmpdr.org/
Acknowledgements • Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen, Mark D'Souza • FIG: Ross Overbeek, Veronika Vonstein, Annotators • Statistics & Web services: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat • Environmental Genomics: Forest Rohwer and the labs that provided sequence