Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples

Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples. Randy Gobbel, Ph.D. Bioinformatics Research Group SRI International gobbel@ai.sri.com http://BioCyc.org/. Computing with Pathway Tools: APIs. Generic functions with a consistent naming scheme

Presentation Transcript


  1. Computational Exploration of Metabolic Networks with Pathway Tools Part 2: APIs & Examples • Randy Gobbel, Ph.D. • Bioinformatics Research Group, SRI International • gobbel@ai.sri.com • http://BioCyc.org/

  2. Computing with Pathway Tools: APIs • Generic functions with a consistent naming scheme • Basic frame access functions • Built-in functions for analysis and global statistics • Simultaneous access to multiple KBs • Cross-species comparisons • Specialized KBs • MetaCyc • SchemaBase

  3. Computing with Pathway Tools: APIs • PerlCyc interface • Library of Perl functions for querying PGDBs via socket connection • Database access functions • Select_Organism, All_Pathways • Functions for performing inference / hardwired queries • Genes_Of_Reaction, Genes_Of_Pathway • Transcription_Unit_Transcription_Factors • Enzyme_P • JavaCyc interface also in progress • http://aracyc.stanford.edu/~mueller/perlcyc/ • Lisp API • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

  4. Perlcyc and Javacyc • Interface to running Pathway Tools image through TCP • Names are translated to Perl and Java conventions • Object references are supported by means of unique frame names

  5. Pathway Tools API Functions • get_class_all_instances(Class) • Returns the instances of Class • Key Pathway Tools classes: • Genetic-Elements • Genes • Proteins • Polypeptides • Protein-Complexes • Pathways • Reactions • Compounds-And-Elements • Enzymatic-Reactions • Transcription-Units • Promoters • DNA-Binding-Sites

  6. Pathway Tools API Functions • Notation Frame.Slot means a specified slot of a specified frame • get_slot_value(Frame Slot) • Returns first value of Frame.Slot • get_slot_values(Frame Slot) • Returns all values of Frame.Slot • slot_has_value_p(Frame Slot) • Returns true if Frame.Slot has at least one value • member_slot_value_p(Frame Slot Value) • Returns true if Value is one of the values of Frame.Slot
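The four slot-access functions can be illustrated with a small stand-in. This is only a sketch in Python over an in-memory dictionary, not PerlCyc, JavaCyc, or the Lisp API. The `KB` structure and the lowercase slot names are assumptions; the frame and its values are taken from the Reactions.dat example in this deck.

```python
# Toy knowledge base (assumption): frame name -> slot name -> list of values.
# Values are copied from the R84-RXN flat-file example; this is not a real PGDB.
KB = {
    "R84-RXN": {
        "left": ["GLC-6-P", "NAD"],
        "right": ["6-P-GLUCONATE", "NADH", "PROTON"],
        "in-pathway": ["P122-PWY", "P107-PWY"],
    },
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot (an empty list if there are none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if frame.slot has at least one value."""
    return len(get_slot_values(frame, slot)) > 0


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of frame.slot."""
    return value in get_slot_values(frame, slot)
```

For instance, `get_slot_value("R84-RXN", "left")` gives the first reactant, while `get_slot_values` gives both; the real interfaces resolve the same kind of request against a running Pathway Tools image.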

  7. Additional Pathway Tools Functions – Semantic Inference Layer • Built-in functions encode commonly used queries that compute indirect DB relationships • genes_of_pathway, substrates_of_pathway • all_transcription_factors, regulon_of_protein • See http://bioinformatics.ai.sri.com/ptools/ptools-fns.html for more information

  8. Computing with Pathway Tools: Flat Files • Two file formats: tab-delimited, attribute-value • One file for each format, each datatype • Specification: • http://bioinformatics.ai.sri.com/ptools/flatfile-format.html • Examples: • Pathways.col – Pathways and genes encoding enzymes • Enzymes.col – Enzymes and reactions they catalyze • Pathways.dat – Full data on each pathway • Reactions.dat – Full data on each reaction

  9. Example Flat File
     UNIQUE-ID - P107-PWY
     TYPES - Energy-Metabolism
     COMMON-NAME - RuMP cycle and formaldehyde assimilation
     REACTION-LIST - FORMATEDEHYDROG-RXN
     REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
     REACTION-LIST - 6PGLUCONDEHYDROG-RXN
     REACTION-LIST - R84-RXN
     REACTION-LIST - PGLUCISOM-RXN
     REACTION-LIST - R12-RXN
     REACTION-LIST - R10-RXN
     SYNONYMS - ribulose-monophosphate cycle
     SYNONYMS - formaldehyde oxidation
     //

  10. Example Flat File – Reactions.dat
     UNIQUE-ID - R84-RXN
     TYPES - EC-1.1.1
     EC-NUMBER - 1.1.1.-
     IN-PATHWAY - P122-PWY
     IN-PATHWAY - P107-PWY
     LEFT - GLC-6-P
     LEFT - NAD
     OFFICIAL-EC? - NO
     RIGHT - 6-P-GLUCONATE
     RIGHT - NADH
     RIGHT - PROTON
     //

  11. Example Flat File – Compounds.dat
     UNIQUE-ID - GLC-6-P
     TYPES - Carbohydrate-Derivatives
     COMMON-NAME - glucose-6-phosphate
     CAS-REGISTRY-NUMBERS - 56-73-5
     CHEMICAL-FORMULA - (C 6)
     CHEMICAL-FORMULA - (H 13)
     CHEMICAL-FORMULA - (O 9)
     CHEMICAL-FORMULA - (P 1)
     MOLECULAR-WEIGHT - 260.137
     SYNONYMS - D-glucose-6-P
     SYNONYMS - glucose-6-P
     SYNONYMS - α-D-glucose-6-phosphate
     SYNONYMS - α-D-glucose-6-P
     SYNONYMS - D-glucose-6-phosphate
     //
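The attribute-value examples above share one layout: one `ATTRIBUTE - value` pair per line, repeated attributes for multi-valued slots, and `//` closing each record. A minimal reader for that layout might look like the following Python sketch. The function name `parse_dat` is hypothetical, skipping lines that start with `#` is an assumption, and any other features of the real format (see the specification URL on slide 8) are not handled.

```python
from collections import defaultdict


def parse_dat(text):
    """Parse attribute-value text into a list of records.

    Each record is a dict mapping an attribute name to the list of its values,
    in file order. A line consisting of "//" ends the current record.
    """
    records = []
    current = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # assumption: "#" marks a comment
            continue
        if line == "//":
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        # Split on the first " - " only, so values such as "1.1.1.-" survive.
        attribute, separator, value = line.partition(" - ")
        if separator:
            current[attribute].append(value)
    if current:  # tolerate a final record without a closing "//"
        records.append(dict(current))
    return records


# The Reactions.dat record shown on slide 10.
SAMPLE = """\
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
"""

records = parse_dat(SAMPLE)
```

Keeping every attribute as a list avoids special-casing single-valued slots such as `UNIQUE-ID`; callers take element 0 when they expect one value.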

  12. Bioinformatics Results: Algorithms • Query and visualization environment for genome and pathway information • PathoLogic algorithm predicts the metabolic network of an organism from its genome • Algorithm for global characterization of a metabolic network • Algorithms under development for qualitative modeling of the cell

  13. The Pathway Tools KB as a "virtual cell" • Detailed representation of proteins, including subunits • Protein complexes and modifications • Links from genome, through proteins, to pathways and superpathways

  14. Computing with the Metabolic Network • Comparative analysis of metabolic networks • Visualization of expression data • Correlation of metabolism and transport • Connectivity analysis of metabolic network • Forward propagation of metabolites • Verification of known growth media with metabolic network

  15. Computational Exploration of PGDBs • Infer metabolic network from genome • Bioinformatics 18:705 2002 • Global properties of the metabolic network • Genome Research 10:568 2000 • Global properties of the genetic network • Comparison of whole metabolic networks • Consistency of a PGDB with respect to known growth-media requirements • Search for gaps in metabolic network • Pacific Symp Biocomputing 2001:471

  16. Example Studies • Relationship of protein subunits to gene positions • Global properties of the E. coli metabolic network • Reactions catalyzed by more than one enzyme • Enzymes that catalyze more than one reaction • Reactions participating in more than one pathway • Automatic detection of intersection points in the metabolic network • Nutrient analyses • Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network? • Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds? • Operon prediction

  17. Protein subunits and linked genes • Question: are protein subunits coded by neighboring genes? • Proteins are linked to genes, gene positions are recorded in the KB • Procedure • Fetch all protein complexes • Subunits are stored in the ‘components’ slot • Each component has a ‘gene’ slot • Genes have ‘left-end-position’ and ‘right-end-position’ slots • Results • Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
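The procedure on this slide can be sketched over toy data. Everything below is hypothetical: the gene names, coordinates, and complexes are invented, and "neighboring" is taken to mean that the subunit genes hold consecutive ranks when all genes are sorted by left-end position, which is only one possible reading of the slide.

```python
# Hypothetical toy data standing in for the KB slots named on the slide.
GENES = {  # gene -> (left-end-position, right-end-position)
    "geneA": (100, 900),
    "geneB": (950, 1800),
    "geneC": (5000, 6000),
    "geneD": (9000, 9800),
}
PROTEIN_GENE = {  # protein -> value of its 'gene' slot
    "protA": "geneA",
    "protB": "geneB",
    "protC": "geneC",
    "protD": "geneD",
}
COMPLEXES = {  # protein complex -> values of its 'components' slot
    "complex1": ["protA", "protB"],  # genes adjacent on the chromosome
    "complex2": ["protA", "protD"],  # genes far apart
}


def gene_order(genes):
    """Map each gene to its rank when genes are sorted by left-end position."""
    ordered = sorted(genes, key=lambda gene: genes[gene][0])
    return {gene: rank for rank, gene in enumerate(ordered)}


def subunits_are_neighbors(components, protein_gene, order):
    """True if the subunits come from two or more genes with consecutive ranks."""
    ranks = sorted({order[protein_gene[protein]] for protein in components})
    return len(ranks) > 1 and all(b - a == 1 for a, b in zip(ranks, ranks[1:]))


ORDER = gene_order(GENES)
neighbor_complexes = [
    name
    for name, components in COMPLEXES.items()
    if subunits_are_neighbors(components, PROTEIN_GENE, ORDER)
]
```

Requiring at least two distinct genes restricts the check to heteromeric complexes, matching the population the slide's result refers to.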

  18. Global properties: How many reactions are catalyzed by more than one enzyme? • Procedure • get_class_all_instances(‘Reactions’) • We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot • result = reactions with more than one value for their ‘enzymatic-reaction’ slot • Results • About 10% of reactions are catalyzed by more than one enzyme • Two classes of multi-enzyme reactions • Homologous enzymes • “Easy” reactions

  19. Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?) • Procedure • get_class_all_instances(‘Proteins’) • result = proteins with more than one value in the ‘catalyzes’ slot • Results • 100 out of 607 enzymes catalyze multiple reactions • This is significantly more than predicted by genome sequencing projects

  20. Global properties: Reactions in multiple pathways • Procedure • get_class_all_instances(‘Reactions’) • result = reactions with more than one value in the ‘in-pathway’ slot • Significance • Reactions that appear in multiple pathways correspond to intersection points in the metabolic network • Could be used to identify candidate reactions for drug targets
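The three global-property queries (slides 18 to 20) share one pattern: fetch all instances of a class, then keep the frames whose slot holds more than one value. A minimal Python sketch of that pattern follows. The two dictionaries are an invented stand-in for a PGDB, and the two access functions only mimic the API functions of the same name; `frames_with_multiple_values` and `fraction_multi` are hypothetical helpers.

```python
# Invented toy KB: class membership and slot values (not a real PGDB).
CLASS_INSTANCES = {
    "Reactions": ["rxn1", "rxn2", "rxn3"],
    "Proteins": ["prot1", "prot2"],
}
SLOT_VALUES = {
    "rxn1": {"enzymatic-reaction": ["enzrxn1", "enzrxn2"], "in-pathway": ["pwy1"]},
    "rxn2": {"enzymatic-reaction": ["enzrxn3"], "in-pathway": ["pwy1", "pwy2"]},
    "rxn3": {},  # no enzyme assigned and not in any pathway
    "prot1": {"catalyzes": ["enzrxn1", "enzrxn3"]},
    "prot2": {"catalyzes": ["enzrxn2"]},
}


def get_class_all_instances(cls):
    """Stand-in for the API call: all frames that are instances of cls."""
    return list(CLASS_INSTANCES.get(cls, []))


def get_slot_values(frame, slot):
    """Stand-in for the API call: all values of frame.slot."""
    return list(SLOT_VALUES.get(frame, {}).get(slot, []))


def frames_with_multiple_values(cls, slot):
    """Instances of cls that have more than one value in slot."""
    return [
        frame
        for frame in get_class_all_instances(cls)
        if len(get_slot_values(frame, slot)) > 1
    ]


def fraction_multi(cls, slot):
    """Multi-valued frames as a fraction of frames with at least one value."""
    with_value = [f for f in get_class_all_instances(cls) if get_slot_values(f, slot)]
    if not with_value:
        return 0.0
    return len(frames_with_multiple_values(cls, slot)) / len(with_value)


multi_enzyme_reactions = frames_with_multiple_values("Reactions", "enzymatic-reaction")
multifunctional_enzymes = frames_with_multiple_values("Proteins", "catalyzes")
intersection_reactions = frames_with_multiple_values("Reactions", "in-pathway")
```

`fraction_multi` divides by frames that have at least one value, following the restriction on slide 18 to reactions with an enzymatic-reaction entry.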

  21. Metabolic Overview Queries • Species comparison • Highlight reactions that are • Shared/not-shared with • Any-one/All-of • A specified set of species • Overlay expression data • Absolute or relative expression levels • Reaction color reflects expression level
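The species-comparison options combine two choices: shared or not shared, and any one or all of the selected species. One way to read them, sketched in Python with plain sets, is shown below. The function name, its parameters, and the reaction sets are hypothetical, and this is not how the Overview diagram itself is implemented. In this reading, "not shared with all" means missing from at least one of the selected species, which is an assumption.

```python
def compare_reactions(reference, others, shared=True, mode="any"):
    """Select reactions of `reference` by their presence in `others`.

    mode="any": a reaction counts as present if any other species has it.
    mode="all": a reaction counts as present only if every other species has it.
    shared=True keeps the present reactions; shared=False keeps the rest.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    return {
        rxn
        for rxn in reference
        if combine(rxn in other for other in others) == shared
    }


# Hypothetical reaction sets for three organisms.
ORG_A = {"rxn1", "rxn2", "rxn3", "rxn4"}
ORG_B = {"rxn1", "rxn2"}
ORG_C = {"rxn1", "rxn3"}

shared_with_all = compare_reactions(ORG_A, [ORG_B, ORG_C], shared=True, mode="all")
shared_with_any = compare_reactions(ORG_A, [ORG_B, ORG_C], shared=True, mode="any")
unique_to_a = compare_reactions(ORG_A, [ORG_B, ORG_C], shared=False, mode="any")
```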

  23. C. crescentus Cell Cycle Gene Expression

  24. Global Consistency Checking of Biochemical Network • Given: • A PGDB for an organism • A set of initial metabolites • Infer: • What set of products can be synthesized by the small-molecule metabolism of the organism • Can known growth medium yield known essential compounds? • Pacific Symposium on Biocomputing p471 2001

  25. Algorithm: Forward Propagation [Diagram labels: Nutrient set • Transport • Metabolite set • Reactants • PGDB reaction pool • “Fire” reactions • Products]
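Forward propagation can be sketched as a fixed-point loop: start from the nutrient set, "fire" every reaction whose reactants are all in the metabolite set, add its products, and repeat until nothing changes. The Python below is a minimal illustration under simplifying assumptions: reactions run left to right only, and transport is not modeled separately. It is not the published algorithm (Pacific Symposium on Biocomputing 2001:471). R84-RXN is taken from the flat-file example; the `toy-` reactions and compounds are invented.

```python
def forward_propagate(nutrients, reactions):
    """Return (metabolite set, fired reaction names) reachable from nutrients.

    reactions maps a reaction name to a (reactants, products) pair.
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:  # repeat until a full pass fires no new reaction
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and set(reactants) <= metabolites:
                fired.add(name)
                metabolites |= set(products)
                changed = True
    return metabolites, fired


REACTIONS = {
    # From the Reactions.dat example on slide 10.
    "R84-RXN": (["GLC-6-P", "NAD"], ["6-P-GLUCONATE", "NADH", "PROTON"]),
    # Invented reactions for illustration only.
    "toy-rxn-2": (["6-P-GLUCONATE"], ["toy-product"]),
    "toy-rxn-3": (["toy-missing-input"], ["toy-unreachable"]),
}

metabolites, fired = forward_propagate(["GLC-6-P", "NAD"], REACTIONS)

# Consistency check: which essential compounds were not produced?
ESSENTIAL = {"toy-product", "toy-unreachable"}  # hypothetical essential set
not_produced = ESSENTIAL - metabolites
```

The set difference at the end mirrors the consistency question on slide 24: any essential compound left in `not_produced` points to a gap in the nutrient set or in the reaction pool.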

  26. Results • Phase I: Forward propagation • 21 initial compounds yielded only half of 38 essential compounds for E. coli • Phase II: Manually identify • Bugs in EcoCyc (e.g., two objects for tryptophan) • Missing initial protein substrates (e.g., ACP) • Missing pathways in EcoCyc • Phase III: Forward propagation with 11 more initial metabolites • Yielded all 38 essential compounds

  27. Initial Metabolites (Total: 21 compounds)

  28. Nutrient-Related Analysis: Validation of the EcoCyc Database • Results on EcoCyc: Phase I: • Essential compounds • produced 19 • not produced 19 • Total compounds • produced: (28%) • Reactions • Fired (31%)

  29. Missing Essential Compounds Due To • Bugs in EcoCyc • Narrow conceptualization of the problem • Protein substrates • Incomplete biochemical knowledge

  30. Nutrient-Related Analysis: Validation of the EcoCyc Database • Results on EcoCyc: Phase II (after adding 11 extra metabolites): • Essential compounds • produced 38 • not produced 0 • Total compounds • produced: (49%) • not produced: (51%) • Reactions • Fired (58%) • Not fired (42%)

  31. Operon Prediction • Based on the method of Moreno-Hagelsieb et al. Bioinformatics 18 Suppl. 1 (2002) • Distance between genes • Functional classification • Correctly predicts 75% of transcription units, 65% of operons • Additional information available in PGDB • Pathways • Protein complexes • Transporters • Improved prediction performance: 80% of transcription units, 69% of operons • Detailed paper in preparation

  32. Visualization of Genetic Network • Operon display window • Transcription factor display window • Highlight regulon on Overview diagram • Paint expression data onto Overview diagram • Database adapter mechanism: MAGE-ML intermediate form • Adapter defined for SMD • Animation • User specified mapping of color ranges • Import of SAM files (next release) • List of significantly +/- genes • Display full genetic network (later release)

  33. Acknowledgements • SRI: Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud • EcoCyc Project: Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier • MetaCyc Project: Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville • Stanford: Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh • Funding sources: • NIH National Center for Research Resources • NIH National Institute of General Medical Sciences • NIH National Human Genome Research Institute • Department of Energy Microbial Cell Project • DARPA BioSpice, UPC • BioCyc.org
