
Metabolomics Applications of the BioCyc Databases and Pathway Tools Software

Metabolomics Applications of the BioCyc Databases and Pathway Tools Software. Peter D. Karp, SRI International (ecocyc.org, biocyc.org, metacyc.org). Overview of the MetaCyc family of Pathway/Genome Databases (PGDBs).


Presentation Transcript


  1. Metabolomics Applications of the BioCyc Databases and Pathway Tools Software • Peter D. Karp • SRI International • ecocyc.org • biocyc.org • metacyc.org

  2. Overview • Overview of MetaCyc family of Pathway/Genome Databases (PGDBs) • BioCyc collection: EcoCyc, MetaCyc, HumanCyc, etc. • Curated PGDBs for Arabidopsis, Yeast, Mouse, Fly, etc. • Overview of Pathway Tools software • Automatic generation of metabolic-flux models

  3. MetaCyc Family of Pathway/Genome Databases • 6,000+ databases from many institutions • All domains of life with microbial emphasis • Genomes plus predicted metabolic pathways • DBs derived from MetaCyc via computational pathway prediction • Common schema • Common controlled vocabularies • Managed using Pathway Tools software • Figure: MetaCyc Family (6,000+) with BioCyc.org (3,500) as a subset • Archives of Toxicology 85:1015 2011

  4. Curated Databases Within the MetaCyc Family http://biocyc.org/otherpgdbs.shtml

  5. Pathway Tools Software • Comprehensive systems biology software environment • Create and maintain an organism database integrating genome, pathway, regulatory information • Computational inference tools • Interactive editing tools • Query and visualize that database • Generate steady-state metabolic flux models • Flux-balance analysis • Interpret omics datasets • Comparative analysis tools • Licensed by 5,000+ groups

  6. Motivations: Management of Metabolic Pathway Data • Organize growing corpus of data on metabolic pathways • Experimentally elucidated pathways in the biomedical literature • Computationally predicted pathways derived from genome data • Provide software tools for querying and comprehending this complex information space • Multiorganism view: MetaCyc • Unique, experimentally elucidated pathways across all organisms • Reference database for computational pathway prediction • Organism-specific view: • Organism-specific Pathway/Genome Databases • Detailed qualitative models of metabolic networks • Combine computational predictions with experimentally determined pathways

  7. Model Organism Databases / Organism-Specific Databases • DBs that describe the genome and other information about an organism • Every sequenced organism with an active experimental community requires a MOD • Integrate genome data with information about the biochemical and genetic network of the organism • Integrate literature-based information with computational predictions • Accurate metabolic modeling requires a curation effort

  8. Rationale for MODs • Each “complete” genome is incomplete in several respects: • 40%-60% of genes have no assigned function • Roughly 7% of those assigned functions are incorrect • Many assigned functions are non-specific • Need continuous updating of annotations with respect to new experimental data and computational predictions • Gene positions, sequence, gene functions, regulatory sites, pathways • MODs are platforms for global analyses of an organism • Interpret omics data in a pathway context • In silico prediction of essential genes • Characterize systems properties of metabolic and genetic networks

  9. Pathway/Genome Database • Diagram labels: Pathways • Reactions • Compounds • Sequence Features • Proteins • RNAs • Regulation • Operons • Promoters • DNA Binding Sites • Regulatory Interactions • Genes • Chromosomes • Plasmids • Cell

  10. BioCyc Collection of 3,000 Pathway/Genome Databases • Pathway/Genome Database (PGDB) – combines information about • Pathways, reactions, substrates • Enzymes, transporters • Genes, replicons • Transcription factors/sites, promoters, operons • Tier 1: Literature-Derived PGDBs • MetaCyc, HumanCyc, YeastCyc • EcoCyc -- Escherichia coli K-12 • AraCyc – Arabidopsis thaliana • Tier 2: Computationally-derived DBs, Some Curation -- 34 PGDBs • Bacillus subtilis, Mycobacterium tuberculosis • Tier 3: Computationally-derived DBs, No Curation -- ~3,000 PGDBs

  11. Obtaining a PGDB for Organism of Interest • Find existing PGDB in BioCyc • Find existing PGDB from larger MetaCyc family of PGDBs • http://biocyc.org/otherpgdbs.shtml • Download from PGDB registry • http://biocyc.org/registry.html • Create your own PGDB

  12. Pathway Tools Software: PGDBs Created Outside SRI • 4,000+ licensees: 250 groups applying software to 1,700 organisms • Saccharomyces cerevisiae, SGD project, Stanford University; 135 pathways / 565 publications -- BioCyc.org • FungiCyc, Broad Institute • Candida albicans, CGD project, Stanford University • dictyBase, Northwestern University • Mouse, MGD, Jackson Laboratory -- BioCyc.org • Drosophila, FlyBase, Harvard University -- BioCyc.org • Under development: C. elegans, WormBase • Arabidopsis thaliana, TAIR, Carnegie Institution of Washington; 288 pathways / 2282 publications -- BioCyc.org • PlantCyc: Poplar, Cassava, Corn, Grape, Soy, Carnegie Institution • Six Solanaceae species, Cornell University • GrameneDB: Rice, Sorghum, Maize, Cold Spring Harbor Laboratory • Medicago truncatula, Samuel Roberts Noble Foundation • ChlamyCyc, GoFORSYS

  13. Pathway Tools Software: PGDBs Created Outside SRI • M. Bibb, John Innes Centre, Streptomyces coelicolor • F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa • Genoscope, Acinetobacter • R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579 • Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major • Sergio Encarnacion, UNAM, Sinorhizobium meliloti • Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis

  14. Pathway Tools Software: PGDBs Created Outside SRI • Large scale users: • C. Medigue, Genoscope, 500+ PGDBs • J. Zucker, Broad Inst, 94 PGDBs • G. Sutton, J. Craig Venter Institute, 80+ PGDBs • G. Burger, U Montreal, 60+ PGDBs • E. Uberbacher, ORNL, 33 Bioenergy-related organisms • Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes • Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml

  15. EcoCyc Project – EcoCyc.org • E. coli Encyclopedia • Review-level Model-Organism Database for E. coli • Derived from 25,000 publications • “Multi-dimensional annotation of the E. coli K-12 genome” • Gene product summaries and literature citations • Evidence codes • Gene Ontology terms • Protein features (active sites, metal ion binding sites) • Multimeric complexes • Metabolic pathways • Regulation of gene expression and of protein activity • Gene essentiality data • Growth under alternative nutrient conditions • Karp, Gunsalus, Collado-Vides, Paulsen, Nuc. Acids Res. 41:D605 2013

  16. EcoCyc = E. coli Dataset + Pathway/Genome Navigator • EcoCyc v17.0 • Pathways: 312 • Reactions: Metabolic: 1600, Transport: 370 • Compounds: 2,400 • Citations: 24,000 • Monomers: 4389 • Complexes: 976 • RNAs: 301 • Regulation: Operons: 4,500, Trans Factors: 222, Promoters: 3,770, TF Binding Sites: 2,700, Reg Interactions: 5,900 • Genes: 4,499 • URL: EcoCyc.org

  17. Perspective 1: EcoCyc as Online Encyclopedia • All gene products for which experimental literature exists are curated with a minireview summary • 3,730 gene products contain summaries • Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more • Additional summaries and other data found in pages for genes, operons, pathways • Quick Search

  18. Perspective 2: EcoCyc as Queryable Database • High-fidelity knowledge representation amenable to structured queries • 333 database fields capture object properties and relationships • Each molecular species defined as a DB object • Genes, proteins, small molecules • Each molecular interaction defined as a DB object • Metabolic and transport reactions, regulation • Extensive search tools • Object-specific search: Search menu • Advanced search: Search -> Advanced

  19. Perspective 3: EcoCyc as Predictive Metabolic Model • A steady-state quantitative model of E. coli metabolism can be generated from EcoCyc • Predicts phenotypes of E. coli knock-outs, and growth/no-growth of E. coli on different nutrients • Model is updated on each EcoCyc release • Serves as a quality check on the EcoCyc data

  20. EcoCyc Accelerates Science • Experimentalists • E. coli experimentalists • Experimentalists working with other microbes • Analysis of expression data • Computational biologists • Biological research using computational methods • Genome annotation • Study properties of E. coli metabolic and regulatory networks • Bioinformaticists • Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions • Metabolic engineers • “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents” • Educators • Microbiology and metabolism education

  21. Recent Developments in EcoCyc • EcoCyc contains six knock-out datasets for E. coli containing 13,000 growth observations

  22. Recent Developments in EcoCyc – Growth-Observation Data • EcoCyc contains 1831 growth observations under 522 conditions for E. coli • Substantial number of discrepancies • 45 cases remain where growth status is unclear

  23. MetaCyc: Metabolic Encyclopedia • Describes experimentally determined metabolic pathways, reactions, enzymes, and compounds • Literature-based DB with extensive references and commentary • MetaCyc vs BioCyc: Experimentally elucidated pathways • Jointly developed by • P. Karp, R. Caspi, C. Fulcher, SRI International • L. Mueller, A. Pujar, Boyce Thompson Institute • S. Rhee, P. Zhang, Carnegie Institution • Nucleic Acids Research 2012 Database Issue

  24. MetaCyc Data -- Version 18.0 • “A Systematic Comparison of the MetaCyc and KEGG Pathway Databases,” BMC Bioinformatics 2013 14(1):112

  25. Taxonomic Distribution of MetaCyc Pathways -- Version 17.5

  26. Comparison with KEGG • KEGG vs MetaCyc: Reference pathway collections • KEGG maps are not pathways (Nuc Acids Res 34:3687 2006) • KEGG maps contain multiple biological pathways • KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms • Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes chosen from a KEGG pathway • KEGG has few literature citations, few comments, less enzyme detail • KEGG vs organism-specific PGDBs • KEGG does not curate or customize pathway networks for each organism • Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis

  27. Pathway Tools

  28. Pathway Tools Software • Diagram labels: Annotated Genome • MetaCyc • PathoLogic • Pathway/Genome Database • Pathway/Genome Navigator • Pathway/Genome Editors • MetaFlux • Briefings in Bioinformatics 11:40-79 2010

  29. Pathway Tools Enables Multi-Use Metabolic Databases • Metabolic Model • Encyclopedia • Queryable Database • Zoomable Metabolic Map • Omics Data Analysis

  30. Pathway Tools Software: PathoLogic • Computational creation of new Pathway/Genome Databases • Transforms genome into Pathway Tools schema and layers inferred information above the genome • Predicts operons • Predicts metabolic network • Predicts which genes code for missing enzymes in metabolic pathways • Infers transport reactions from transporter names

  31. Pathway Tools Software: Pathway/Genome Editors • Interactively update PGDBs with graphical editors • Support geographically distributed teams of curators with object database system • Gene and protein editor • Reaction editor • Compound editor • Pathway editor • Operon editor • Publication editor

  32. Pathway Tools Software: Pathway/Genome Navigator • Querying and visualization of: • Pathways • Reactions • Metabolites • Genes/Proteins/RNA • Regulatory interactions • Chromosomes • Modes of operation: • Web mode • Desktop mode • Most functionality shared

  33. Pathway Tools Software: MetaFlux • Speeds development of genome-scale metabolic flux models • Steady-state quantitative flux models generated directly from PGDBs • Computed reaction fluxes can be painted onto metabolic overview diagram • Multiple gap-filler accelerates model development by suggesting model completions: • Reactions to add from MetaCyc • Additional nutrients and secreted compounds
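
Steady-state flux models of this kind are conventionally posed as a linear program. The block below states the standard flux-balance analysis formulation as a sketch. The symbols S, v, c, l, and u are notation assumed here, not taken from the slides, and MetaFlux's own formulation (particularly for gap filling) may differ.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l \le v \le u
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c weights the objective (typically a biomass reaction), and l and u are lower and upper flux bounds. Under this formulation a gene knock-out can be simulated by setting both bounds of the affected reactions to zero and re-solving.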

  34. Pathway Tools Schema / Ontology • 1064 classes • Datatype classes such as: • Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters) • Taxonomies for Pathways, Reactions, Compounds • Cell Component Ontology • Evidence Ontology • 308 attributes and relationships • Span genome, metabolism, regulatory information • Meta-data: Creator, Creation-Date • Comment, Citations, Common-Name, Synonyms • Attributes: Molecular-Weight, DNA-Footprint-Size • Relationships: Catalyzes, Component-Of, Product
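
To make the class/attribute/relationship split concrete, here is a minimal Python sketch of a frame-like object model. It is an illustration only: the class and field names (`Frame`, `frame_id`, `reactions_of`) are hypothetical and far simpler than the actual Pathway Tools schema, which the slide describes as having over a thousand classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """Hypothetical base class carrying metadata shared by all objects."""
    frame_id: str
    common_name: str = ""
    synonyms: List[str] = field(default_factory=list)
    comment: str = ""


@dataclass
class Compound(Frame):
    molecular_weight: float = 0.0  # an attribute


@dataclass
class Reaction(Frame):
    substrates: List[Compound] = field(default_factory=list)
    products: List[Compound] = field(default_factory=list)  # a relationship


@dataclass
class Protein(Frame):
    catalyzes: List[Reaction] = field(default_factory=list)  # a relationship
    component_of: List["Protein"] = field(default_factory=list)


def reactions_of(proteins):
    """Follow the catalyzes relationship and return sorted reaction IDs."""
    return sorted({rxn.frame_id for p in proteins for rxn in p.catalyzes})


# Toy data with made-up identifiers.
cpd_a = Compound("CPD-A", molecular_weight=180.16)
cpd_b = Compound("CPD-B")
rxn_1 = Reaction("RXN-1", substrates=[cpd_a], products=[cpd_b])
enzyme = Protein("ENZ-1", common_name="example enzyme", catalyzes=[rxn_1])
print(reactions_of([enzyme]))
```

Storing relationships as object references rather than strings is what makes structured queries such as "all reactions catalyzed by these proteins" a simple traversal.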

  35. Pathway Prediction • Pathway prediction is useful because • Pathways organize the metabolic network into mentally tractable units • Pathways guide us to search for missing enzymes • Pathway inference fills in holes in the metabolic network • Pathways can be used for analysis of high-throughput data • Visualization, enrichment analysis • Pathway prediction is hard because • Reactome inference is imperfect • Some reactions present in multiple pathways • Pathway variants share many reactions in common • Increasing size of MetaCyc
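
The slide names enrichment analysis as one use of predicted pathways without giving a method. A common choice is a one-sided hypergeometric test, sketched below in pure Python. The function and parameter names are assumptions for illustration, and this is not necessarily the statistic Pathway Tools computes.

```python
from math import comb


def enrichment_p_value(n_genome, n_pathway, n_hits, n_hits_in_pathway):
    """Upper-tail hypergeometric probability.

    n_genome:          total number of genes considered
    n_pathway:         genes annotated to the pathway
    n_hits:            genes in the experimental hit list
    n_hits_in_pathway: hit-list genes that fall in the pathway
    Returns P(X >= n_hits_in_pathway) when n_hits genes are drawn at random.
    """
    max_overlap = min(n_pathway, n_hits)
    total = comb(n_genome, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_genome - n_pathway, n_hits - k)
        for k in range(n_hits_in_pathway, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 5 in the pathway, 5 hits, all 5 hits in the pathway.
print(enrichment_p_value(10, 5, 5, 5))
```

In practice the p-values for all pathways would also need a multiple-testing correction, which is omitted here.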

  36. Reactome Inference • For each protein in the organism, infer reaction(s) it catalyzes • Protein functions can be specified in three ways: • Enzyme names (protein functions) (uncontrolled vocabulary) • EC numbers • Gene Ontology terms • Detect conflicts among this information • Example: • Yersinia pseudotuberculosis PB1 • 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase / EC 4.1.1.71
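
The conflict check in the example can be sketched as a comparison between the EC number implied by the enzyme name and the EC number in the annotation. The code below is a toy illustration: `NAME_TO_EC` and `find_conflict` are hypothetical, the table holds a single entry, and the EC number assigned to that name (2.2.1.9) reflects my reading of the enzyme nomenclature and should be verified against an authoritative source.

```python
# Illustrative lookup only. A real system would consult a curated
# dictionary of enzyme names rather than a hand-written table.
NAME_TO_EC = {
    "2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase":
        "2.2.1.9",  # assumption: verify against an enzyme nomenclature database
}


def find_conflict(product_name, annotated_ec):
    """Return (ec_from_name, annotated_ec) if the two disagree, else None.

    Names missing from the lookup table produce no conflict, because there
    is nothing to compare against.
    """
    ec_from_name = NAME_TO_EC.get(product_name.strip().lower())
    if ec_from_name is None or ec_from_name == annotated_ec:
        return None
    return (ec_from_name, annotated_ec)


name = "2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase"
print(find_conflict(name, "4.1.1.71"))
```

A flagged conflict would go to a curator or a tie-breaking rule; the sketch only detects the disagreement and does not decide which annotation is right.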

  37. Enzyme Name Matching • Extraneous information found in gene product names • Putative carbamate kinase, alpha subunit • Carbamate kinase (abcD) • Carbamate kinase (3.2.1.4) • Monoamine oxidase B • bifunctional proline dehydrogenase/pyrroline-5-carboxylate dehydrogenase
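
One way to handle the extraneous information listed above is to normalize names before matching them against a reference dictionary. The sketch below uses regular expressions; the patterns and the function names `normalize` and `split_functions` are assumptions made for illustration and are not the actual rules used by PathoLogic.

```python
import re

# Hedging qualifiers at the start of a name, e.g. "Putative ...".
QUALIFIER = re.compile(r"^(putative|probable|predicted|hypothetical)\s+", re.IGNORECASE)
# Parenthesized extras such as gene names "(abcD)" or EC numbers "(3.2.1.4)".
PARENTHETICAL = re.compile(r"\s*\([^)]*\)")
# Trailing subunit designations, e.g. ", alpha subunit".
SUBUNIT = re.compile(r",?\s*(alpha|beta|gamma|delta)\s+subunit$", re.IGNORECASE)
# Leading "bifunctional" / "multifunctional" markers.
MULTI = re.compile(r"^(bi|multi)functional\s+", re.IGNORECASE)


def normalize(name):
    """Strip qualifiers, parentheticals, and subunit suffixes; lowercase."""
    name = PARENTHETICAL.sub("", name.strip())
    name = QUALIFIER.sub("", name)
    name = SUBUNIT.sub("", name)
    return " ".join(name.lower().split())


def split_functions(name):
    """Split a multifunctional name on '/' and normalize each part.

    Caveat: names that legitimately contain '/' would be split incorrectly.
    """
    name = MULTI.sub("", name.strip())
    return [normalize(part) for part in name.split("/")]


for raw in ["Putative carbamate kinase, alpha subunit",
            "Carbamate kinase (abcD)",
            "Carbamate kinase (3.2.1.4)"]:
    print(normalize(raw))
```

Isozyme suffixes such as the trailing "B" in "Monoamine oxidase B" are left untouched here, since removing single letters safely needs more context than a regular expression provides.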

  38. Inference of Metabolic Pathways • For each pathway in MetaCyc consider • What fraction of its reactions are present in the just-inferred reactome of the organism? • Are enzymes present for reactions unique to the pathway? • Are enzymes present for designated “key reactions” within MetaCyc pathways? • Calvin cycle / ribulose bisphosphate carboxylase • Is a given pathway outside its designated taxonomic range? • Calvin cycle: green plants, green algae, etc. • Standards in Genomic Sciences 5:424-429 2011
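
The four questions above can be gathered into per-pathway evidence and combined by a decision rule. The sketch below shows one way to do that. The `Pathway` class, the reaction identifiers, the 0.5 threshold, and the way the criteria are combined are all hypothetical; the slides do not state PathoLogic's actual rules or weights.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Pathway:
    name: str
    reactions: Set[str]                      # assumed non-empty
    key_reactions: Set[str] = field(default_factory=set)
    taxonomic_range: Set[str] = field(default_factory=set)  # empty = unrestricted


def evidence(pathway, reactome, all_pathways, lineage):
    """Collect the four kinds of evidence listed on the slide."""
    others = set().union(*(p.reactions for p in all_pathways if p is not pathway))
    unique = pathway.reactions - others
    return {
        "fraction_present": len(pathway.reactions & reactome) / len(pathway.reactions),
        "unique_present": bool(unique & reactome),
        "key_reactions_present": pathway.key_reactions <= reactome,
        "in_taxonomic_range": (not pathway.taxonomic_range
                               or bool(pathway.taxonomic_range & lineage)),
    }


def predict(pathway, reactome, all_pathways, lineage, min_fraction=0.5):
    """Hypothetical decision rule combining the evidence."""
    ev = evidence(pathway, reactome, all_pathways, lineage)
    if not ev["key_reactions_present"]:
        return False
    if not ev["in_taxonomic_range"] and ev["fraction_present"] < 1.0:
        return False
    return ev["unique_present"] or ev["fraction_present"] >= min_fraction


# Toy data: a bacterium whose reactome lacks the Calvin cycle key reaction.
calvin = Pathway("Calvin cycle", {"R1", "R2", "RUBISCO"},
                 key_reactions={"RUBISCO"}, taxonomic_range={"Viridiplantae"})
glycolysis = Pathway("glycolysis", {"R1", "R2", "R3"})
pathways = [calvin, glycolysis]
reactome = {"R1", "R2", "R3"}
lineage = {"Bacteria"}
for p in pathways:
    print(p.name, predict(p, reactome, pathways, lineage))
```

The toy case shows why the key-reaction test matters: two of the three Calvin cycle reactions are present because they are shared with another pathway, yet the pathway is rejected because its key reaction is missing.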

  39. Evaluation of Pathway Inference • Define gold-standard pathway prediction set • E. coli, Yeast, Arabidopsis, Synechococcus, Mouse • Positive and negative pathways • PathoLogic achieved 91% accuracy • BMC Bioinformatics 11:15 2010
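
With a gold standard of positive and negative pathways, accuracy can be computed as the fraction of gold-standard pathways classified correctly. The sketch below assumes that definition; the slide does not spell out how the 91% figure was calculated, and the pathway labels are made up.

```python
def accuracy(predicted, gold_positive, gold_negative):
    """Fraction of gold-standard pathways that are classified correctly.

    predicted:     set of pathways the predictor reports as present
    gold_positive: pathways known to be present
    gold_negative: pathways known to be absent
    """
    true_positives = len(predicted & gold_positive)
    true_negatives = len(gold_negative - predicted)
    total = len(gold_positive) + len(gold_negative)
    return (true_positives + true_negatives) / total


# Toy example with made-up pathway labels.
predicted = {"A", "B", "X"}
gold_positive = {"A", "B", "C"}
gold_negative = {"X", "Y"}
print(accuracy(predicted, gold_positive, gold_negative))
```

Pathways outside the gold standard are ignored by this measure, so it says nothing about predictions for which no curated answer exists.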

  40. Comparison with KEGG • KEGG vs MetaCyc: Reference pathway collections • KEGG maps are not pathways (Nuc Acids Res 34:3687 2006) • KEGG maps contain multiple biological pathways • KEGG maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms • KEGG modules are incomplete • KEGG has few literature citations, few comments, less enzyme detail • KEGG vs organism-specific PGDBs • KEGG does not curate or customize pathway networks for each organism • Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis • KEGG algorithms • Not published; accuracy unknown

  41. Pathway Analysis of Metagenomes • Bin the metagenome data and create separate PGDBs for each organism • Hallam lab • Compute list of all pathways present in the metagenome
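
If the binning route is taken, the list of pathways present in the metagenome can be assembled by merging the pathway lists of the per-bin PGDBs. The sketch below assumes each bin's predictions are already available as a plain list of names; the input structure and the function name are hypothetical.

```python
from collections import Counter


def pathways_in_metagenome(pathways_by_bin):
    """Map each pathway to the number of bins (organisms) it was predicted in.

    pathways_by_bin: dict of bin name -> iterable of predicted pathway names
    """
    counts = Counter()
    for pathways in pathways_by_bin.values():
        counts.update(set(pathways))  # count each pathway once per bin
    return dict(counts)


# Toy input with made-up bins.
bins = {
    "bin1": ["glycolysis", "TCA cycle"],
    "bin2": ["glycolysis"],
}
print(sorted(pathways_in_metagenome(bins).items()))
```

Keeping the per-bin count alongside each pathway helps distinguish functions that are widespread in the community from those carried by a single organism.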

  42. Analysis of High Throughput Datasets • Genome-scale visualizations of cellular networks • Generated automatically from PGDB • Magnify, interrogate • Omics viewers paint omics data onto overview diagrams • Different perspectives on same dataset • Use animation for multiple time points or conditions
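
Painting omics data onto a diagram comes down to mapping each measured value to a color for the corresponding reaction. The sketch below illustrates that idea only. The palette, the value range, and the rule of using the gene with the largest absolute value are assumptions, not the behavior of the actual Omics Viewer.

```python
PALETTE = ["blue", "lightblue", "gray", "orange", "red"]


def color_bin(value, vmin, vmax, palette=PALETTE):
    """Map a value in [vmin, vmax] to a palette entry, clamping outliers."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)
    index = int((clamped - vmin) / (vmax - vmin) * len(palette))
    return palette[min(index, len(palette) - 1)]


def paint(reaction_genes, expression, vmin=-2.0, vmax=2.0):
    """Assign a color to each reaction that has at least one measured gene.

    reaction_genes: dict of reaction ID -> list of gene IDs
    expression:     dict of gene ID -> value (e.g. a log2 ratio)
    """
    colors = {}
    for reaction, genes in reaction_genes.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            colors[reaction] = color_bin(max(values, key=abs), vmin, vmax)
    return colors


# Toy data with made-up identifiers.
reaction_genes = {"RXN-1": ["g1", "g2"], "RXN-2": ["g3"]}
expression = {"g1": 0.1, "g2": -1.9}
print(paint(reaction_genes, expression))
```

For multiple time points or conditions, calling `paint` once per column of data yields one frame per condition, which is the basis for animation.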

  43. Cellular Overview Diagram • Combines metabolic map and transporters • Automatically generated, organism-specific • Zoomable, queryable

  44. E. coli Cellular Overview
