Curation of the EcoCyc Database: The EcoCyc Update Project • Martha Arnaud, Scientific Database Curator • Bioinformatics Research Group, SRI International • http://www.ecocyc.org • http://www.biocyc.org
EcoCyc Organization • EcoCyc collects information about multiple types of database objects: • Pathway * • Reaction * • Compound * • Protein • Gene * • Transcription Unit * (* = organized in hierarchies) [Diagram labels: Genes, Proteins, Pathway, Reactions, Compounds]
EcoCyc Statistics • 176 pathways • 992 enzymes • 1006 enzymatic reactions • 169 transporters • 828 transcription units • 1929 proteins have a comment (598 of these comments are longer than 300 characters)
EcoCyc Pathway Information http://biocyc.org:1555/ECOLI/new-image?type=PATHWAY&object=ALANINE-VALINESYN-PWY&detail-level=2
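The pathway page address on this slide is an ordinary query-string URL, so links to other pathways can be assembled the same way. The following is a minimal sketch, assuming the parameter names seen in the slide's URL (type, object, detail-level) keep their meaning for other pathway IDs; the helper name pathway_url is hypothetical, and the server may no longer answer at this host and port.

```python
from urllib.parse import urlencode

# Host, port, and path are copied from the slide; treat them as historical.
BASE = "http://biocyc.org:1555/ECOLI/new-image"


def pathway_url(pathway_id: str, detail_level: int = 2) -> str:
    """Build a pathway-display URL in the form shown on the slide.

    pathway_url is a hypothetical helper, not part of any EcoCyc API.
    """
    params = {
        "type": "PATHWAY",
        "object": pathway_id,
        "detail-level": detail_level,
    }
    return f"{BASE}?{urlencode(params)}"


url = pathway_url("ALANINE-VALINESYN-PWY")
print(url)
```

Only the URL string is built here; nothing is fetched, so the sketch runs without network access.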
EcoCyc Protein Information [Screenshot of a protein page; labeled parts: reaction, comment, citations]
EcoCyc Metabolic Overview Static or animated views of expression data http://biocyc.org/ov-expr.shtml
EcoCyc Curation • names and synonyms • gene classes • subunit composition of protein complexes • location of gene product • protein or complex molecular weight • enzyme activity name • enzyme properties (activators, inhibitors, cofactors) • comment fields • evidence • citations • reactions catalyzed • pathway information
Build a new MOD or add a “Pathway Module”! Pathway Tools Software: software environment for creation, curation, analysis, and Web publishing of MODs • Takes an annotated genome • Generates a database, including pathway predictions • Freely available (academics/non-profits) • http://bioinformatics.ai.sri.com/ptools/ • ptools-info@ai.sri.com Current Pathway Tools Users • Saccharomyces cerevisiae: SGD, Stanford University • Arabidopsis thaliana: Carnegie Institution of Washington • Plasmodium falciparum: Stanford University • Mycobacterium tuberculosis: Stanford University • Synechocystis: Carnegie Institution of Washington • Methanococcus jannaschii: EBI
EcoCyc Strengths • Metabolism • Transport • Transcription regulation
EcoCyc into the Future: “EcoCyc is not just metabolism anymore!” …an integrated, review-level information resource on E. coli genomics and biochemistry…
The EcoCyc Update Project: • What do we need to do? (Goals) • Can we possibly get it done? (Quantification) • Where do we start? (Priorities) • How is it going? (Progress)
EcoCyc Update: Curation Goals Curate every gene product: • literature-based descriptions • comprehensive reference lists • Expand database scope beyond metabolism, transporters, and transcription • Curate associated reactions and pathways • Stay current with the latest papers
EcoCyc Update: Quantification • 4405 genes − 175 transcription factors − 168 transporters = 4062 genes to curate • Full-time curator: 4 days/week on curation, plus a part-time curator (70%) in years 2-4 • Year 1: 1600 hours • Year 2: 3000 hours • Year 3: 3000 hours • Year 4: 3000 hours • Total: 10,600 hours / 4062 genes = 2.6 hours per gene • Curation of abstracts
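The per-gene time budget on this slide can be re-derived in a few lines. This is a minimal sketch that uses only the figures given on the slide; the variable names are illustrative.

```python
# Gene counts from the slide.
total_genes = 4405
transcription_factors = 175
transporters = 168

# Genes left for this project after excluding the two groups above.
genes_to_curate = total_genes - transcription_factors - transporters

# Curation hours available per project year, as listed on the slide.
hours_by_year = {1: 1600, 2: 3000, 3: 3000, 4: 3000}
total_hours = sum(hours_by_year.values())

# Average time budget per gene over the four years.
hours_per_gene = total_hours / genes_to_curate

print(f"Genes to curate: {genes_to_curate}")
print(f"Total hours: {total_hours}")
print(f"Hours per gene: {hours_per_gene:.1f}")
```

The result reproduces the slide's 4062 genes, 10,600 hours, and roughly 2.6 hours per gene.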
EcoCyc Update: Priorities • 1. Problems raised by users and advisors • 2. Gene products that have new characterizations published in the literature • 3. Gene products that have not yet been thoroughly curated • 4. Gene products that have been curated, but have not been updated lately
Where are we now? • 807 gene products curated • 807/4062 = 19.9% of the total (excluding transporters and transcription factors) • 4-year plan: curate 615 genes in Year 1 • We are meeting our goal!
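The progress figures can be checked against the Year 1 target in the same way. This is a minimal sketch that uses only the numbers on this slide and the 4062-gene total from the quantification slide; the variable names are illustrative.

```python
# Figures from the slides.
genes_to_curate = 4062   # total excluding transporters and transcription factors
curated_so_far = 807     # gene products curated to date
year_one_goal = 615      # Year 1 target from the 4-year plan

# Share of the total already curated, as a percentage.
percent_done = 100 * curated_so_far / genes_to_curate

# Gene products still to be curated, and whether the Year 1 goal is met.
remaining = genes_to_curate - curated_so_far
goal_met = curated_so_far >= year_one_goal

print(f"Curated: {percent_done:.1f}% of the total")
print(f"Remaining: {remaining} gene products")
print(f"Year 1 goal met: {goal_met}")
```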
The EcoCyc Collaboration UNAM • Julio Collado-Vides, Project Leader • Socorro Gama-Castro, Curator • Martin Peralta, Curator TIGR • Ian Paulsen, Project Leader • Mark Hance, Curator UCSD • Milton Saier, Project Leader • Can Tran, Curator SRI • Peter Karp, PI • Suzanne Paley, Software Engineer • John Pick, Software Engineer • Martha Arnaud, Curator UCD • John Ingraham, Project Leader MBL • Monica Riley, Editor Emerita • Funding: • NIH National Center for Research Resources
Pathway/Genome DBs Created by External Users • Saccharomyces cerevisiae, Stanford University: pathway.yeastgenome.org/biocyc/ • Plasmodium falciparum, Stanford University: plasmocyc.stanford.edu • Mycobacterium tuberculosis, Stanford University: BioCyc.org • Arabidopsis thaliana and Synechocystis, Carnegie Institution of Washington: Arabidopsis.org:1555 • Methanococcus jannaschii, EBI: Maine.ebi.ac.uk:1555 • Other PGDBs in progress by 40 other users • Software freely available • Each PGDB owned by its creator