
Building and Refining AraCyc: Data Content, Sources, and Methodologies


Presentation Transcript


  1. Building and Refining AraCyc: Data Content, Sources, and Methodologies • Kate Dreher • TAIR, AraCyc, PMN • Carnegie Institution for Science

  2. AraCyc • AraCyc – Arabidopsis Metabolic EnCyclopedia • Database of metabolic pathways found in Arabidopsis • Accessible from: • TAIR – The Arabidopsis Information Resource • www.arabidopsis.org

  3. AraCyc • AraCyc – Arabidopsis Metabolic EnCyclopedia • Database of metabolic pathways found in Arabidopsis • Accessible from: • PMN – Plant Metabolic Network • www.plantcyc.org

  4. AraCyc Pathway pages Evidence Code Compound Reaction Pathway Gene Enzyme + Additional curated information

  5. AraCyc Pathway pages Classification Superpathways Summary Pathway variants References

  6. AraCyc Pathway pages Evidence Code Compound Reaction Pathway Gene Enzyme

  7. AraCyc Pathway pages Evidence Code Compound Reaction Pathway Gene Enzyme

  8. AraCyc Compound pages AraCyc Compound: CDP-choline Synonyms Classification(s) Molecular Weight / Formula Appears as Reactant Appears as Product

  9. AraCyc Pathway pages Evidence Code Compound Reaction Pathway Gene Enzyme

  10. AraCyc Enzyme detail pages AraCyc Enzyme: phosphatidyltransferase Multifunctional protein

  11. AraCyc Enzyme detail pages AraCyc Enzyme: phosphatidyltransferase Reaction Pathway(s) Inhibitors, Kinetic Parameters, etc. Summary References

  12. AraCyc Pathway pages Evidence Code Compound Reaction Pathway Gene To TAIR . . . Enzyme

  13. AraCyc 4.5 (released June 2008) • More detailed information available in the Release Notes

  14. PlantCyc 1.0 (released June 2008) • www.plantcyc.org

  15. Putting AraCyc (and PlantCyc) to use • Reference information • Pathways, Genes, Enzymes, Reactions, and Metabolites • Data Analysis (AraCyc) • Use the OMICS viewer • Display the results of experiments on an Arabidopsis metabolic map • Study your data or public data sets

  16. Putting AraCyc to use • Display the results of experiments on an Arabidopsis metabolic map Compounds Transcripts or Proteins
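
To display experimental results on the metabolic map, the measurements first have to be arranged as a table keyed by gene, protein, or compound. The sketch below writes such a table as tab-delimited text. The layout (identifier in the first column, one numeric column per experiment), the function name `to_tab_delimited`, and all values are assumptions made for illustration; the gene IDs are borrowed from slide 23. Check the OMICS viewer documentation for the format it actually expects.

```python
import csv
import io

# Hypothetical expression ratios for two Arabidopsis genes (illustrative values only).
measurements = {
    "AT1G69370": [1.8, 0.9],
    "AT2G27820": [0.4, 1.1],
}

def to_tab_delimited(data):
    """Return one line per identifier: ID, then one value per experiment, tab-separated."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for identifier, values in data.items():
        writer.writerow([identifier, *values])
    return buffer.getvalue()

table = to_tab_delimited(measurements)
print(table)
```

The same shape would work for compound data by replacing the gene IDs with compound names.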

  17. Putting AraCyc (and PlantCyc) to use • Reference information • Pathways, Genes, Enzymes, Reactions, and Metabolites • Data Analysis (AraCyc) • Use the OMICS viewer • Display the results of experiments on an Arabidopsis metabolic map • Study your data or public data sets • Generate new hypotheses • Find metabolic differences in your mutant with “no phenotype” • Identify pathways that are related to your favorite biological process • See more at “Advanced Bioinformatic Resources for Arabidopsis” • Thursday, July 24, 7 PM in the Grand Salon • Enzyme discovery • Fill “pathway holes” through comparative analyses

  18. Putting AraCyc (and PlantCyc) to use • Pathway “Hole Filling” [Figure: the AraCyc pathway choline biosynthesis I has a pathway “hole” at an ethanolamine step (enzyme shown as “??????”); PlantCyc data from spinach and soybean are used to fill it.]
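
The hole-filling idea on this slide can be sketched in a few lines: a "hole" is a reaction in a species pathway with no assigned enzyme, and candidates come from other species that have an enzyme for the same reaction in the reference database. Everything here is hypothetical (reaction IDs, gene and enzyme IDs, and the dictionary layout); it is a minimal sketch of the concept, not how AraCyc or PlantCyc store their data.

```python
# Reactions of one pathway in a single-species database, with assigned genes.
# All identifiers below are hypothetical placeholders.
species_pathway = {
    "RXN-A": ["GENE-ARA-1"],
    "RXN-B": [],              # no enzyme assigned: a pathway "hole"
    "RXN-C": ["GENE-ARA-2"],
}

# Multi-species reference database: reaction -> {species: known enzymes}.
reference_db = {
    "RXN-A": {"spinach": ["ENZ-SPI-1"]},
    "RXN-B": {"spinach": ["ENZ-SPI-2"], "soybean": ["ENZ-SOY-1"]},
}

def find_holes(pathway):
    """Return the reactions that have no assigned gene."""
    return [rxn for rxn, genes in pathway.items() if not genes]

def candidates_for_holes(pathway, reference):
    """For each hole, list enzymes known for that reaction in other species."""
    return {rxn: reference.get(rxn, {}) for rxn in find_holes(pathway)}

holes = find_holes(species_pathway)
candidates = candidates_for_holes(species_pathway, reference_db)
print(holes)
print(candidates)
```

The candidate enzymes could then serve as starting points for a comparative analysis, for example a sequence-similarity search against the Arabidopsis genome.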

  19. Data sources and data flow Research Community Genes, Proteins, Metabolites Experimental Data Data repositories Published literature Curators Computational predictions Community submissions Metabolic Pathway Databases

  20. Data sources and data flow • Information enters metabolic pathway database in two stages • Stage 1: Initial build • Stage 2: Updates and improvements • AraCyc 1.0 – Initial Build - 2002

  21. Initial AraCyc Build (2002) • 7900 Arabidopsis genes annotated to the GO term ‘catalytic activity’ • 4900 loci in small molecule metabolism • 19% of the total genome • Goal: Map these loci to metabolic PATHWAYS • Solution: • Use reference database: MetaCyc (460 metabolic pathways) • Run PathoLogic program (SRI International) • Predict metabolic pathways present in Arabidopsis

  22. MetaCyc • Multi-kingdom metabolic pathway database • METAbolic EnCYClopedia • SRI International (www.metacyc.org) • First released in 1999 • All pathways generated by curators extracting information from the scientific literature • Only contains pathways with experimental support • Reference database • Used to create SINGLE SPECIES databases • . . . including AraCyc in 2002!

  23. Initial AraCyc Build (2002) [Figure: MetaCyc supplies the reference pathway chorismate → prephenate → L-arogenate → L-phenylalanine, catalyzed by chorismate mutase (EC 5.4.99.5), prephenate aminotransferase (EC 2.6.1.79), and arogenate dehydratase (EC 4.2.1.91). The annotated genome supplies DNA sequences, gene calls (e.g. AT1G69370), and gene functions (chorismate mutase, arogenate dehydratase; genes AT1G69370, AT2G27820). PathoLogic combines the two inputs to produce AraCyc.]

  24. PathoLogic Program • Matches input enzymes to reference enzymes • Name • Enzyme Commission (EC) number • Identifies probable pathways • Enzyme coverage • Predicted species distribution • Initial AraCyc 1.0 build (2002) • PathoLogic inferred over 200 pathways • PathoLogic mapped 940 genes to the pathways
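
As a rough illustration of the matching steps on this slide, the sketch below matches annotated genes to reference reactions by EC number or enzyme name, then scores the pathway by enzyme coverage. It is not PathoLogic itself: the data structures and the names `match_enzymes` and `pathway_coverage` are assumptions, and the gene-to-function pairing is read from the phenylalanine biosynthesis example on slide 23. The second criterion, predicted species distribution, is not modeled.

```python
# Reference pathway (MetaCyc-style): reactions with EC number and enzyme name.
REFERENCE_PATHWAY = {
    "name": "phenylalanine biosynthesis",
    "reactions": [
        {"ec": "5.4.99.5", "enzyme": "chorismate mutase"},
        {"ec": "2.6.1.79", "enzyme": "prephenate aminotransferase"},
        {"ec": "4.2.1.91", "enzyme": "arogenate dehydratase"},
    ],
}

# Annotated genome: gene ID -> functional annotation (illustrative pairing;
# the first gene deliberately has a name but no EC number).
ANNOTATED_GENES = {
    "AT1G69370": {"function": "chorismate mutase", "ec": None},
    "AT2G27820": {"function": "arogenate dehydratase", "ec": "4.2.1.91"},
}

def match_enzymes(pathway, genes):
    """Map each reaction's EC number to genes matched by EC number or by name."""
    matches = {}
    for rxn in pathway["reactions"]:
        hits = []
        for gene_id, ann in genes.items():
            same_ec = ann["ec"] is not None and ann["ec"] == rxn["ec"]
            same_name = ann["function"].lower() == rxn["enzyme"].lower()
            if same_ec or same_name:
                hits.append(gene_id)
        matches[rxn["ec"]] = sorted(hits)
    return matches

def pathway_coverage(matches):
    """Fraction of reactions with at least one matched gene."""
    return sum(1 for hits in matches.values() if hits) / len(matches)

matches = match_enzymes(REFERENCE_PATHWAY, ANNOTATED_GENES)
coverage = pathway_coverage(matches)
print(matches)
print(f"coverage: {coverage:.2f}")
```

In this toy input two of three reactions are matched, so the pathway would be a plausible candidate that still contains one unfilled step.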

  25. Validation of a New Database • PathoLogic errs on the side of over-prediction • Curators validate pathways . . .

  26. Validation of a New Database • Curators • Find support for predicted pathways • Is the pathway described in Arabidopsis literature? • Are the crucial metabolites described in Arabidopsis literature? • Does the pathway include a unique reaction catalyzed by an Arabidopsis protein?
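
The three questions on this slide can be read as a checklist applied to each predicted pathway. The sketch below records the answers and separates supported pathways from unsupported ones. The rule that any single "yes" is enough is an assumption, not something the slides state, and the pathway names and answers are hypothetical placeholders.

```python
# One record per predicted pathway; the three flags mirror the curators' questions.
# Pathway names and answers are hypothetical.
predicted = [
    {
        "pathway": "hypothetical pathway A",
        "in_arabidopsis_literature": False,
        "key_metabolites_reported": True,
        "unique_reaction_with_arabidopsis_enzyme": False,
    },
    {
        "pathway": "hypothetical pathway B",
        "in_arabidopsis_literature": False,
        "key_metabolites_reported": False,
        "unique_reaction_with_arabidopsis_enzyme": False,
    },
]

CRITERIA = (
    "in_arabidopsis_literature",
    "key_metabolites_reported",
    "unique_reaction_with_arabidopsis_enzyme",
)

def has_support(record):
    """Assumed rule: one positive answer is enough to keep the pathway."""
    return any(record[criterion] for criterion in CRITERIA)

kept = [r["pathway"] for r in predicted if has_support(r)]
flagged = [r["pathway"] for r in predicted if not has_support(r)]
print(kept, flagged)
```

Pathways in `flagged` would be the ones a curator considers removing or editing.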

  27. Validation of a New Database • Curators: • Remove pathways not found in Arabidopsis • glycogen biosynthesis • C4 photosynthesis • caffeine biosynthesis • Edit pathways operating via a different route • Phenylalanine biosynthesis in bacteria vs. Arabidopsis

  28. Validation of a New Database • Edit pathways operating via a different route AraCyc Pathway: phenylalanine biosynthesis

  29. Completion of a New Database • Curators • Add Arabidopsis pathways not present in reference database • Add Arabidopsis compounds, reactions, and enzymes not mapped to a pathway • Assign evidence codes to pathways and enzymes

  30. Assignment of Evidence Codes

  31. AraCyc 1.0 . . . and beyond • Information enters metabolic pathway database in two stages • Stage 1: Initial build • Stage 2: Updates and improvements

  32. Database updates and improvements

  33. Database updates and improvements • New rounds of computational pathway prediction • New TAIR genome releases • New MetaCyc releases • New round of PathoLogic prediction

  34. Database updates and improvements • New rounds of computational pathway prediction • New TAIR genome releases • New reference database – PlantCyc (www.plantcyc.org) • Part of the Plant Metabolic Network • Released in June 2008 • Contains plant pathways supported by: • experimental evidence • expert hypothesis • Reviewed by an editorial board of biochemists • Will include enzymes from newly sequenced plant genomes and EST collections

  35. Database updates and improvements • New rounds of computational pathway prediction • Newest TAIR genome annotations + newest version of PlantCyc → PathoLogic program → updated pathway predictions for AraCyc • See poster: ICAR1404 • Newly predicted pathways undergo pathway validation
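
One way to picture a new prediction round is as a comparison of sets: pathways PathoLogic predicts now, pathways already in the database, and pathways curators removed earlier. The sketch below keeps only the genuinely new predictions for validation. Excluding previously removed pathways is an assumption about how a re-run might be handled; the removed pathways are the examples from slide 27, and "hypothetical new pathway" is a placeholder.

```python
# Pathways currently in the curated database (examples named in these slides).
current_pathways = {"phenylalanine biosynthesis", "choline biosynthesis I"}

# Pathways curators removed during an earlier validation (slide 27).
previously_removed = {
    "glycogen biosynthesis",
    "C4 photosynthesis",
    "caffeine biosynthesis",
}

# Output of a new prediction round (hypothetical).
new_prediction = {
    "phenylalanine biosynthesis",
    "choline biosynthesis I",
    "glycogen biosynthesis",
    "hypothetical new pathway",
}

def pathways_to_validate(predicted, current, removed):
    """Newly predicted pathways that are neither present nor already rejected."""
    return sorted(predicted - current - removed)

to_validate = pathways_to_validate(new_prediction, current_pathways, previously_removed)
print(to_validate)
```

Only the entries in `to_validate` would go through the curator validation step described earlier.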

  36. Database updates and improvements • New curator entries • Curators search for new information in scientific literature • TAIR curators • Assign new functional annotations to metabolic genes • AraCyc curators • Manually attach enzymes to pathways • Identify new and updated pathways • Write or revise summaries

  37. Database updates and improvements • New community submissions • Jamborees • Experts meet individually with curators • Review pathways in specific metabolic domains • Provide useful references and suggest important pathways • Curation Booth • Open during all poster sessions – Booth #1 • Please come (free candy!) • TAIR or PMN website

  38. Community submissions • TAIR – www.arabidopsis.org

  39. Community submissions • TAIR – www.arabidopsis.org

  40. Community submissions • PMN – www.plantcyc.org

  41. Community submissions curator@plantcyc.org • PMN – www.plantcyc.org

  42. Community submissions = fame! • PMN Contributor page Your name here!

  43. Acknowledgements: TAIR, AraCyc, and the PMN • Eva Huala (Director and Co-PI) • Sue Rhee (PI and Co-PI) • Current Curators: Peifen Zhang (Director and lead curator – metabolism), Tanya Berardini (lead curator – functional annotation), David Swarbreck (lead curator – structural annotation), A. S. Karthikeyan (curator), Donghui Li (curator) • Recent Past Curators: Christophe Tissier (curator), Hartmut Foerster (curator) • Tech Team Members: Bob Muller (Manager), Larry Ploetz (Sys. Administrator), Raymond Chetty, Anjo Chi, Vanessa Kirkup, Cynthia Lee, Tom Meyer, Shanker Singh, Chris Wilks • Metabolic Pathway Software: Peter Karp and SRI group (NIH)

  44. Thank you . . . www.arabidopsis.org curator@arabidopsis.org www.arabidopsis.org/biocyc curator@arabidopsis.org www.plantcyc.org curator@plantcyc.org Please visit us at the Curation Booth!

  45. Curation workflow • identify a pathway • draw pathway diagram • find details of reactions • find details of enzymes • data entry • Details collected: • reactions • structure of substrates • enzymes • EC number • kinetic parameters • inhibitors / activators • coding gene

  46. Database maintenance and improvement [Figure: genome annotation + PathoLogic prediction + manual pathway curation feed the single-species databases (AraCyc 4.5, RiceCyc, PoplarCyc); existing databases are refined (AraCyc 5.0, RiceCyc, PoplarCyc); PlantCyc is the multi-species reference database.]

  47. Database maintenance and improvement [Figure: same diagram, with single-species databases AraCyc 5.0, RiceCyc, PoplarCyc, MaizeCyc, and more; PlantCyc is the multi-species reference database.]

  48. Database maintenance and improvement [Figure: same diagram, with single-species databases AraCyc 10.0, RiceCyc, PoplarCyc, MaizeCyc, and more; PlantCyc is the multi-species reference database.]

