410 likes | 618 Views
EcoCyc , MetaCyc, and the Pathway Tools Software. Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International pkarp@ai.sri.com http:// www.ai.sri.com/pkarp/talks / BioCyc.org EcoCyc.org , MetaCyc.org. MetaCyc Family of Pathway/Genome Databases.
E N D
EcoCyc, MetaCyc, and the Pathway Tools Software Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International pkarp@ai.sri.com http://www.ai.sri.com/pkarp/talks/ BioCyc.org EcoCyc.org, MetaCyc.org
MetaCyc Family ofPathway/Genome Databases • 1,700+ databases from multiple institutions • Cover all domains of life with microbial emphasis • All DBs derived from MetaCyc via computational pathway prediction • Common schema • Common controlled vocabularies • Common methodologies Archives of Toxicology 2011
BioCyc Collection of 1,100 Pathway/Genome Databases • Pathway/Genome Database (PGDB) – combines information about • Pathways, reactions, substrates • Enzymes, transporters • Genes, replicons • Transcription factors/sites, promoters, operons • Tier 1: Literature-Derived PGDBs • MetaCyc • EcoCyc -- Escherichia coli K-12 • Tier 2: Computationally-derived DBs, Some Curation -- 28 PGDBs • HumanCyc, BsubCyc • Mycobacterium tuberculosis • Tier 3: Computationally-derived DBs, No Curation -- The remainder
EcoCyc Project – EcoCyc.org • E.coli Encyclopedia • Review-level Model-Organism Database for E. coli • Tracks evolving annotation of the E. coli genome and cellular networks • The two paradigms of EcoCyc • “Multi-dimensional annotation of the E. coli K-12 genome” • Positions of genes; functions of gene products – 76% / 66% exp • Gene Ontology terms; MultiFun terms • Gene product summaries and literature citations • Evidence codes • Multimeric complexes • Metabolic pathways • Regulation of gene expression and of protein activity Karp, Gunsalus, Collado-Vides, Paulsen Nuc. Acids Res. 35:7577 2007ASM News 70:25 2004 Science 293:2040
EcoCyc = E.coli Dataset + Pathway/Genome Navigator URL: EcoCyc.org Pathways: 260 Reactions: Metabolic: 1446 Transport: 287 Compounds: 1,830 EcoCycv15.0 Citations: 21,000 Proteins: 4,479 Complexes: 895 RNAs: 285 Regulation: Operons: 3,409 Trans Factors: 206 Promoters: 1,878 TF Binding Sites: 2,394 Reg Interactions: 5345 Genes: 4,489
PortEco.org • EcoCyc + PortEco = E. coli model-organism database • Query multiple E. coli databases simultaneously • E. coli gene expression archive • E. coli Wiki • ~40 E. coli and Shigella databases available at BioCyc.org
MetaCyc: Metabolic Encyclopedia • Describe a representative sample of every experimentally determined metabolic pathway • Describe properties of metabolic enzymes • Literature-based DB with extensive references and commentary • Pathways, reactions, enzymes, substrates • MetaCycvsBioCyc: Experimentally elucidated pathways • Jointly developed by • P. Karp, R. Caspi, C. Fulcher, SRI International • L. Mueller, A. Pujar, Boyce Thompson Institute • S. Rhee, P. Zhang, Carnegie Institution Nucleic Acids Research2010
Applications of MetaCyc • Reference source on metabolic pathways and enzymes • Predict pathways from genomes • Metabolic engineering • Find desired metabolic pathways and reactions • Find enzymes with desired activities, regulatory properties • Determine cofactor requirements
Pathway Tools Software + Annotated Genome PathoLogic Genome-Scale Flux Model Pathway/Genome Database Pathway/Genome Navigator Pathway/Genome Editors Briefings in Bioinformatics 11:40-79 2010
Pathway Tools Software: PathoLogic • Computational creation of new Pathway/Genome Databases • Transforms genome into Pathway Tools schema and layers inferred information above the genome • Predicts operons • Predicts metabolic network • Predicts which genes code for missing enzymes in metabolic pathways • Infers transport reactions from transporter names
Pathway Tools Software:Pathway/Genome Editors • Interactively update PGDBs with graphical editors • Support geographically distributed teams of curators with object database system • Gene and protein editor • Reaction editor • Compound editor • Pathway editor • Operon editor • Publication editor
Pathway Tools Software:Pathway/Genome Navigator • Querying and visualization of: • Pathways • Reactions • Metabolites • Genes/Proteins/RNA • Regulatory interactions • Chromosomes • Two modes of operation: • Web mode • Desktop mode • Most functionality shared, but each has unique functionality
Cellular Overview Diagram • Combines metabolic map and transporters • Automatically generated for each organism • Zoomable, queryable • Web-based and desktop • BioCyc.org • Tools Cellular Overview • Tools Regulatory Overview • Fastest with Safari, Chrome, Firefox
Regulatory Overview and Omics Viewer • Show regulatory relationships among gene groups
Enrichment Analysis “My experiments yielded a set of genes/metabolites. What do they have in common?” • Given a set of genes: • What GO terms are statistically over-represented in that set? • What metabolic pathways are over-represented? • What transcriptional regulators are over-represented? • Given a set of metabolites: • What metabolic pathways are statistically over-represented in that set?
Automated Generation of Metabolic Flux Models from PGDBs Joint work with Mario Latendresse
Goals • Decrease the time required to construct FBA models from 9-12 months to several weeks • Create richer FBA models that are tightly coupled to genome and regulatory information • Make FBA models and results more transparent
Approach: Derive FBA Models from PGDBs • Store and update metabolic model within Pathway Tools • Export to constraint solver for model execution/solving • Fast generation of metabolic model from annotated genome • Pathway Tools schema • Associate a wealth of information with each metabolic model • Unique identifiers and controlled vocabulary for model components • Tools for querying and visualization of metabolic models • Tools for model debugging and analysis • Reaction balance checking • Dead-end metabolite analysis • Visualize reaction flux using cellular overview • Multiple gap filling
FBA Model Execution • Runs SCIP solver on .lp file • Konrad-Zuse-ZentrumfürInformationstechnik Berlin • Interpret SCIP output • Determine if SCIP found a solution • Map fluxes to PGDB reactions • Display resulting fluxes on the Cellular Overview
Model Debugging via Multiple Gap Filling • Most FBA models are not initially solvable because of incomplete or incorrect information • Use meta-optimization to postulate alterations to a model to render it solvable • Each alteration has an associated cost; minimize cost of alterations • Formulate as MILP and submit to SCIP
Multiple Gap Filling of FBA Models • Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212): • Reverse directionality of selected reactions • Add a minimal number of reactions from MetaCyc to the model to enable a solution • Reaction cost is a function of reaction taxonomic range • Metabolite gap filling: Postulate additional nutrients and secretions • Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
Comparative Analysis • Via Cellular Overview • Comparative genome browser • Comparative pathway table • Comparative analysis reports • Compare reaction complements • Compare pathway complements • Compare transporter complements
Advanced Query Form • Intuitive construction of complex database queries of SQL power
Work in Progress • Computation of reaction atom mappings • Program to generate metabolic pathways that synthesize target compound from feedstock compound
How to Learn More • BioCyc.org Help menu • BioCyc Webinars • Biocyc.org/webinar.shtml • Publications page • Biocyc.org/publications.shtml • Tutorials held at SRI • Next week: FBA