NSF/EU Cyberinfrastructure Meeting, Washington, DC. Annotating Metagenomes Using the SEED. Rob Edwards Department of Computer Sciences, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory. www.nmpdr.org. www.theseed.org.
How much has been sequenced? [Figure: number of known sequences plotted against year, with annotations marking the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]
How much will be sequenced?
• Everybody in USA
• Everybody in San Diego
• One genome from every species
• 100 people
• Most major microbial environments
• All cultured Bacteria
What do we want from annotations? Consistent Accurate Available Reliable
Consistent
The Importance of Consistency
• Consistency: same genes connected to same functional role
• Enables communication
• Required for most comparative genomics assays
hisA
FIG function: Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)
Other functions in RefSeq:
• phosphoribosylformimino-5-aminoimidazole carboxamide
• phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
• phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
• 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
• N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase
• Phosphoribosyl isomerase A
• [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]
Measuring Consistency
• Define a set of protein families such that each family contains genes performing the same function
• Attach functional roles to protein families
• Measure the consistency of the annotations made to genes within each family
• "Consistency" is the odds that two proteins from the same family have the same function
• Evaluate both families and functions
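One way to make this measure concrete is sketched below. It reads "odds" as the probability that two distinct proteins drawn from the same family carry the same annotation, and it compares annotations by exact string match; both choices are assumptions made for illustration, not the scoring actually used for the SEED. The function names and the input layout (a dict mapping a family name to its list of annotation strings) are hypothetical.

```python
from collections import Counter
from math import comb


def family_consistency(functions):
    """Fraction of protein pairs in one family that share an annotation.

    `functions` is a list with one annotation string per protein.
    Returns None when the family has fewer than two proteins,
    because no pair can be formed.
    """
    n = len(functions)
    if n < 2:
        return None
    # Pairs that agree: choose 2 from each group of identical annotations.
    same_pairs = sum(comb(count, 2) for count in Counter(functions).values())
    return same_pairs / comb(n, 2)


def database_consistency(families):
    """Pool all within-family pairs: agreeing pairs / total pairs.

    `families` maps a (hypothetical) family name to its annotation list.
    """
    same = total = 0
    for functions in families.values():
        same += sum(comb(count, 2) for count in Counter(functions).values())
        total += comb(len(functions), 2)
    return same / total if total else None


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    toy = {
        "fam1": ["role A", "role A", "role A", "role B"],
        "fam2": ["role C", "role C", "role C"],
    }
    for name, functions in toy.items():
        print(name, family_consistency(functions))
    print("pooled", database_consistency(toy))
```

Because the comparison is by exact string, spelling variants of a single role count as disagreements, which is the behavior a consistency score is meant to expose.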
Consistency among databases
Accurate
How to measure accuracy
• If everything were called “hypothetical protein”, the database would be 100% consistent
• Need to measure accuracy (specificity) as well as consistency
• Sample 100 proteins at random from the “curated” set (i.e. those believed to be correct)
• Manually inspect annotations to score correctness
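The sampling step above could be sketched as follows. This is a minimal illustration, not the procedure actually used: the protein identifiers, the fixed random seed, and the True/False scoring layout are all assumptions introduced here, and the manual inspection itself still has to be done by a person.

```python
import random


def sample_for_review(curated_ids, k=100, seed=0):
    """Draw up to k protein ids at random, without replacement.

    A fixed seed (an assumption of this sketch) keeps the draw
    reproducible, so several reviewers can score the same sample.
    """
    ids = sorted(curated_ids)  # stable order before sampling
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


def accuracy(scores):
    """Fraction of reviewed annotations judged correct.

    `scores` maps a protein id to True (correct) or False (incorrect),
    as recorded during manual inspection.
    """
    if not scores:
        raise ValueError("no reviewed annotations")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical ids standing in for a curated protein set.
    curated = [f"protein_{i}" for i in range(1000)]
    to_review = sample_for_review(curated)
    print(len(to_review), "proteins selected for manual inspection")
```

With 100 sampled proteins, each reviewed annotation moves the estimate by one percentage point, so the result is a coarse estimate of accuracy, not a precise figure.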
Available
http://metagenomics.theseed.org
• Free service
• User registration/log in
• Free to upload sequences in several formats
• Automatically annotates sequences
• Download in several formats
• Complete genomes too: http://www.nmpdr.org/anno-server
• Soon to come: Plasmids, phages, other short genomes
Comparing Metagenomes to Genomes (or other metagenomes!)
Reliable (Believable)
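From sequences to environments [Figure: CDA plot with axes labeled CDA 60.2% and CDA 21.7%. Environments shown: Mine, Saltern, Marine, Microbialites, Fish, Animals, Coral, Freshwater. Functional categories shown: Stress, Membrane transport, Sulfur, Signaling, Capsule, Motility, Phosphorus, RNA, Respiration.]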
What do we want from annotations? Consistent Accurate Available Reliable When do we want it? NOW
Acknowledgements
• Environmental Genomics: Forest Rohwer, Rohwer lab members, All the labs that provided sequence
• Statistics: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito
• Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen
• FIG: Ross Overbeek, Veronika Vonstein, Annotators
Subsystems make up metabolism [Figure source: Wikipedia Metabolism portal, http://en.wikipedia.org/wiki/Portal:Metabolism]