Annotating Metagenomes Using the SEED

NSF/EU Cyberinfrastructure Meeting, Washington, DC. Annotating Metagenomes Using the SEED. Rob Edwards, Department of Computer Sciences, San Diego State University; Mathematics and Computer Sciences Division, Argonne National Laboratory. www.nmpdr.org. www.theseed.org.


Presentation Transcript


  1. NSF/EU Cyberinfrastructure Meeting, Washington, DC. Annotating Metagenomes Using the SEED. Rob Edwards, Department of Computer Sciences, San Diego State University; Mathematics and Computer Sciences Division, Argonne National Laboratory. www.nmpdr.org www.theseed.org

  2. How much has been sequenced? [Figure: number of known sequences by year. Labeled points: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing.]

  3. How much will be sequenced? [Figure labels: everybody in USA; everybody in San Diego; one genome from every species; 100 people; most major microbial environments; all cultured Bacteria.]

  4. What do we want from annotations? Consistent Accurate Available Reliable

  5. Consistent

  6. The Importance of Consistency • Consistency: same genes connected to same functional role • Enables communication • Required for most comparative genomics assays

  7. hisA
     FIG function: Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)
     Other functions in RefSeq:
     • phosphoribosylformimino-5-aminoimidazole carboxamide
     • phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
     • phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
     • 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
     • N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase
     • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
     • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase
     • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
     • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase
     • Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]

  8. Measuring Consistency • Define a set of protein families such that each family contains genes playing the same function • Attach functional roles to protein families • Measure the consistency of the annotations made to genes within each family • "consistency" is the odds that two proteins from the same family have the same function • Evaluate both families and functions.
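
The consistency measure on slide 8 can be sketched in a few lines. This is a minimal illustration, not the SEED's own implementation. It assumes that "odds" means the probability that two distinct proteins drawn from one family carry exactly the same annotation string. The function name `family_consistency` and the sample family are hypothetical.

```python
from collections import Counter
from math import comb


def family_consistency(annotations):
    """Fraction of protein pairs in one family that share the same annotation.

    `annotations` holds one function string per protein in the family.
    Returns None when the family has fewer than two proteins, because no
    pair can be formed.
    """
    n = len(annotations)
    if n < 2:
        return None
    counts = Counter(annotations)
    # Pairs that agree: choose 2 proteins from each group of identical strings.
    agreeing_pairs = sum(comb(c, 2) for c in counts.values())
    return agreeing_pairs / comb(n, 2)


# Hypothetical family of four proteins; the strings are taken from slide 7.
hisA_family = [
    "Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)",
    "Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)",
    "phosphoribosylformimino-5-aminoimidazole carboxamide",
    "Phosphoribosyl isomerase A",
]

print(family_consistency(hisA_family))
```

In this example only one of the six possible pairs agrees, so the score is 1/6. Exact string matching is the simplest choice here; any normalization of names before comparison would be an additional assumption.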

  9. Consistency among databases

  10. Accurate

  11. How to measure accuracy • If everything was called “hypothetical protein” the database would be 100% consistent • Need to measure accuracy (specificity) as well as consistency • Sample 100 proteins at random from “curated” set (i.e. that are believed to be correct) • Manually inspect annotations to score correctness
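
The sampling procedure on slide 11 could look like the sketch below. It is an illustration under stated assumptions, not the procedure's actual code: the identifiers, the helper names `sample_for_review` and `accuracy`, and the use of a fixed seed are all hypothetical. The manual inspection step is represented by a dictionary of true/false scores that a curator would fill in.

```python
import random


def sample_for_review(curated_ids, k=100, seed=None):
    """Draw up to k distinct protein IDs at random from the curated set."""
    ids = list(curated_ids)
    rng = random.Random(seed)  # seeded generator so a review sample can be reproduced
    return rng.sample(ids, min(k, len(ids)))


def accuracy(scores):
    """Fraction of manually inspected annotations judged correct.

    `scores` maps protein ID -> True (correct) or False (incorrect).
    """
    if not scores:
        raise ValueError("no inspected annotations to score")
    return sum(1 for ok in scores.values() if ok) / len(scores)


# Hypothetical curated set of 1,000 protein IDs.
curated = [f"protein_{i}" for i in range(1000)]
to_inspect = sample_for_review(curated, k=100, seed=1)

# A curator would replace these placeholder scores with real judgments.
placeholder_scores = {pid: True for pid in to_inspect}
print(len(to_inspect), accuracy(placeholder_scores))
```

Sampling without replacement keeps each inspected protein distinct, so the score is a simple proportion of the sample.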

  12. Available

  13. http://metagenomics.theseed.org • Free service • User registration/log in • Free to upload sequences in several formats • Automatically annotates sequences • Download in several formats • Complete genomes too: http://www.nmpdr.org/anno-server • Soon to come: Plasmids, phages, other short genomes

  14. Metagenome Metabolic Reconstruction

  15. Metabolic potential in environments

  16. Phylogenomics

  17. Comparing Metagenomes to Genomes (or other metagenomes!)

  18. Reliable (Believable)

  19. Metabolic potential in environments

  20. From sequences to environments [Figure: plot with axes CDA 60.2% and CDA 21.7%. Function labels: Stress, Membrane transport, Sulfur, Signaling, Capsule, Motility, Phosphorus, RNA, Respiration. Environment labels: Mine, Saltern, Marine, Microbialites, Fish, Animals, Coral, Freshwater.]

  21. What do we want from annotations? Consistent Accurate Available Reliable When do we want it? NOW

  22. Acknowledgements • Environmental Genomics: Forest Rohwer, Rohwer lab members, all the labs that provided sequence • Statistics: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito • Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen • FIG: Ross Overbeek, Veronika Vonstein, Annotators

  23. Subsystems make up metabolism. Wikipedia Metabolism portal: http://en.wikipedia.org/wiki/Portal:Metabolism
