Computational Exploration of Metabolic Networks with Pathway Tools, Part 2: APIs & Examples • Randy Gobbel, Ph.D. • Bioinformatics Research Group, SRI International • gobbel@ai.sri.com • http://BioCyc.org/
Computing with Pathway Tools: APIs • Generic functions with a consistent naming scheme • Basic frame access functions • Built-in functions for analysis and global statistics • Simultaneous access to multiple KBs • Cross-species comparisons • Specialized KBs • MetaCyc • SchemaBase
Computing with Pathway Tools: APIs • PerlCyc interface • Library of Perl functions for querying PGDBs via socket connection • Database access functions • Select_Organism, All_Pathways • Functions for performing inference / hardwired queries • Genes_Of_Reaction, Genes_Of_Pathway • Transcription_Unit_Transcription_Factors • Enzyme_P • JavaCyc interface also in progress • http://aracyc.stanford.edu/~mueller/perlcyc/ • Lisp API • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
PerlCyc and JavaCyc • Interface to a running Pathway Tools image over TCP • Names are translated to Perl and Java conventions • Object references are supported by means of unique frame names
Pathway Tools API Functions • get_class_all_instances(Class) • Returns the instances of Class • Key Pathway Tools classes: Genetic-Elements, Genes, Proteins, Polypeptides, Protein-Complexes, Pathways, Reactions, Compounds-And-Elements, Enzymatic-Reactions, Transcription-Units, Promoters, DNA-Binding-Sites
Pathway Tools API Functions • Notation Frame.Slot means a specified slot of a specified frame • get_slot_value(Frame Slot) • Returns first value of Frame.Slot • get_slot_values(Frame Slot) • Returns all values of Frame.Slot • slot_has_value_p(Frame Slot) • Returns true if Frame.Slot has at least one value • member_slot_value_p(Frame Slot Value) • Returns true if Value is one of the values of Frame.Slot
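The four slot-access functions above can be illustrated with a minimal Python sketch. It does not call Pathway Tools, PerlCyc, or JavaCyc; it assumes a toy in-memory knowledge base (the `KB` dictionary, a hypothetical stand-in) and only mirrors the semantics stated on the slide.

```python
# Toy knowledge base: frame name -> slot name -> list of values.
# KB is a hypothetical stand-in for a PGDB; its contents are illustrative.
KB = {
    "R84-RXN": {
        "LEFT": ["GLC-6-P", "NAD"],
        "RIGHT": ["6-P-GLUCONATE", "NADH", "PROTON"],
        "IN-PATHWAY": ["P122-PWY", "P107-PWY"],
    },
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot (an empty list if there are none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if frame.slot has at least one value."""
    return len(get_slot_values(frame, slot)) > 0


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of frame.slot."""
    return value in get_slot_values(frame, slot)
```

Returning None for an empty slot is a choice made for this sketch; the behavior of the real API in that case is not stated here.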
Additional Pathway Tools Functions – Semantic Inference Layer • Built-in functions encode commonly used queries that compute indirect DB relationships • genes_of_pathway, substrates_of_pathway • all_transcription_factors, regulon_of_protein • See http://bioinformatics.ai.sri.com/ptools/ptools-fns.html for more information
Computing with Pathway Tools: Flat Files • Two file formats: tab-delimited, attribute-value • One file for each format, each datatype • Specification: • http://bioinformatics.ai.sri.com/ptools/flatfile-format.html • Examples: • Pathways.col – Pathways and genes encoding enzymes • Enzymes.col – Enzymes and reactions they catalyze • Pathways.dat – Full data on each pathway • Reactions.dat – Full data on each reaction
Example Flat File
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//
Example Flat File – Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
Example Flat File – Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
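The attribute-value records shown in the three examples can be read with a few lines of Python. The following is a minimal sketch, assuming only what the examples show: one "ATTRIBUTE - value" pair per line, repeated attributes for multi-valued slots, and "//" closing each record. The function name `parse_attribute_value` is hypothetical, and treating lines that start with "#" as comments is an assumption; the flat-file specification linked earlier is the authoritative reference.

```python
def parse_attribute_value(text):
    """Parse attribute-value records into a list of dicts.

    Each record maps an attribute name to the list of its values, in file
    order. A line containing only '//' ends the current record.
    """
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        if line == "//":
            if current:
                records.append(current)
            current = {}
            continue
        # Split on the first ' - ' only, so values such as '1.1.1.-' survive.
        attribute, separator, value = line.partition(" - ")
        if not separator:
            continue  # not an attribute line; ignored in this sketch
        current.setdefault(attribute.strip(), []).append(value.strip())
    if current:  # tolerate a final record with no closing '//'
        records.append(current)
    return records


# The Reactions.dat example record, reproduced as test input.
SAMPLE = """\
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
"""

reaction = parse_attribute_value(SAMPLE)[0]
print(reaction["LEFT"], "->", reaction["RIGHT"])
```

Storing every attribute as a list keeps single-valued and multi-valued slots uniform, which matches the get_slot_value / get_slot_values distinction in the API slides.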
Bioinformatics Results: Algorithms • Query and visualization environment for genome and pathway information • PathoLogic algorithm predicts the metabolic network of an organism from its genome • Algorithm for global characterization of a metabolic network • Algorithms under development for qualitative modeling of the cell
The Pathway Tools KB as a "virtual cell" • Detailed representation of proteins, including subunits • Protein complexes and modifications • Links from genome, through proteins, to pathways and superpathways
Computing with the Metabolic Network • Comparative analysis of metabolic networks • Visualization of expression data • Correlation of metabolism and transport • Connectivity analysis of metabolic network • Forward propagation of metabolites • Verification of known growth media with metabolic network
Computational Exploration of PGDBs • Infer metabolic network from genome • Bioinformatics 18:705 2002 • Global properties of the metabolic network • Genome Research 10:568 2000 • Global properties of the genetic network • Comparison of whole metabolic networks • Consistency of a PGDB with respect to known growth-media requirements • Search for gaps in metabolic network • Pacific Symp Biocomputing 2001:471
Example Studies • Relationship of protein subunits to gene positions • Global properties of the E. coli metabolic network • Reactions catalyzed by more than one enzyme • Enzymes that catalyze more than one reaction • Reactions participating in more than one pathway • Automatic detection of intersection points in the metabolic network • Nutrient analyses • Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network? • Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds? • Operon prediction
Protein subunits and linked genes • Question: are protein subunits coded by neighboring genes? • Proteins are linked to genes, gene positions are recorded in the KB • Procedure • Fetch all protein complexes • Subunits are stored in the ‘components’ slot • Each component has a ‘gene’ slot • Genes have ‘left-end-position’ and ‘right-end-position’ slots • Results • Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
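The procedure on this slide can be sketched in Python over toy data. Everything here is hypothetical: the `GENES`, `PROTEINS`, and `COMPLEXES` dictionaries stand in for KB frames, and "neighboring" is assumed to mean that the subunit genes occupy consecutive positions when all genes are ordered by left-end-position. The slide does not state the criterion actually used.

```python
# Hypothetical toy frames; slot names follow the slide.
GENES = {
    "geneA": {"left-end-position": 100, "right-end-position": 900},
    "geneB": {"left-end-position": 950, "right-end-position": 1800},
    "geneC": {"left-end-position": 5000, "right-end-position": 5600},
    "geneD": {"left-end-position": 9000, "right-end-position": 9900},
}
PROTEINS = {
    "protA": {"gene": "geneA"},
    "protB": {"gene": "geneB"},
    "protD": {"gene": "geneD"},
}
COMPLEXES = {
    "complexAB": {"components": ["protA", "protB"]},
    "complexAD": {"components": ["protA", "protD"]},
}


def gene_order(genes):
    """Map each gene name to its rank when sorted by left-end-position."""
    ordered = sorted(genes, key=lambda g: genes[g]["left-end-position"])
    return {gene: rank for rank, gene in enumerate(ordered)}


def subunits_are_neighbors(complex_name):
    """True if the complex has two or more distinct genes in consecutive ranks."""
    order = gene_order(GENES)
    components = COMPLEXES[complex_name]["components"]
    ranks = sorted({order[PROTEINS[p]["gene"]] for p in components})
    # Distinct ranks are consecutive exactly when max - min == count - 1.
    return len(ranks) > 1 and ranks[-1] - ranks[0] == len(ranks) - 1


neighboring = [c for c in COMPLEXES if subunits_are_neighbors(c)]
print(neighboring)
```

Complexes whose subunits all come from one gene are reported as False, since the question concerns heteromeric complexes.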
Global properties: How many reactions are catalyzed by more than one enzyme? • Procedure • get_class_all_instances(‘Reactions’) • We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot • result = reactions with more than one value for their ‘enzymatic-reaction’ slot • Results • About 10% of reactions are catalyzed by more than one enzyme • Two classes of multi-enzyme reactions • Homologous enzymes • “Easy” reactions
Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?) • Procedure • get_class_all_instances(‘Proteins’) • result = proteins with more than one value in the ‘catalyzes’ slot • Results • 100 out of 607 enzymes catalyze multiple reactions • This is significantly more than predicted by genome sequencing projects
Global properties: Reactions in multiple pathways • Procedure • get_class_all_instances(‘Reactions’) • result = reactions with more than one value in the ‘in-pathway’ slot • Significance • Reactions that appear in multiple pathways correspond to intersection points in the metabolic network • Could be used to identify candidate reactions for drug targets
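The global-property queries (multi-enzyme reactions, multifunctional enzymes, reactions in multiple pathways) share one pattern: fetch all instances of a class, then count the frames with more than one value in a given slot. A minimal Python sketch of that pattern follows. The `REACTIONS` dictionary and both function names are hypothetical stand-ins for a PGDB queried through get_class_all_instances and get_slot_values.

```python
# Hypothetical toy frames: reaction name -> slot name -> list of values.
REACTIONS = {
    "rxn1": {"enzymatic-reaction": ["enzrxn1", "enzrxn2"], "in-pathway": ["pwyA"]},
    "rxn2": {"enzymatic-reaction": ["enzrxn3"], "in-pathway": ["pwyA", "pwyB"]},
    "rxn3": {"enzymatic-reaction": [], "in-pathway": []},
}


def frames_with_values(frames, slot, minimum):
    """Return sorted names of frames with at least `minimum` values in `slot`."""
    return sorted(
        name for name, slots in frames.items()
        if len(slots.get(slot, [])) >= minimum
    )


def multi_value_fraction(frames, slot):
    """Fraction of frames with a populated `slot` that hold more than one value."""
    populated = frames_with_values(frames, slot, 1)
    multiple = frames_with_values(frames, slot, 2)
    return len(multiple) / len(populated) if populated else 0.0


print(frames_with_values(REACTIONS, "enzymatic-reaction", 2))
print(frames_with_values(REACTIONS, "in-pathway", 2))
print(multi_value_fraction(REACTIONS, "enzymatic-reaction"))
```

Dividing by the populated frames rather than all frames follows the note that only reactions with at least one enzymatic-reaction value are of interest.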
Metabolic Overview Queries • Species comparison • Highlight reactions that are • Shared/not-shared with • Any-one/All-of • A specified set of species • Overlay expression data • Absolute or relative expression levels • Reaction color reflects expression level
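The species-comparison options reduce to set operations over each organism's reaction set. The sketch below is one possible reading, not the Overview implementation: `SPECIES_REACTIONS` and `highlight` are hypothetical names, and "not-shared" is assumed to mean the complement, within the reference organism, of the shared set under the same any-one/all-of mode.

```python
# Hypothetical toy data: organism -> set of reaction identifiers.
SPECIES_REACTIONS = {
    "org1": {"rxnA", "rxnB", "rxnC"},
    "org2": {"rxnA", "rxnB"},
    "org3": {"rxnA", "rxnD"},
}


def highlight(reference, others, mode="any", shared=True):
    """Return the reactions of `reference` to highlight.

    mode="any": compare against reactions found in at least one other species.
    mode="all": compare against reactions found in every other species.
    shared=False returns the reference reactions outside that shared set.
    """
    ref = SPECIES_REACTIONS[reference]
    other_sets = [SPECIES_REACTIONS[name] for name in others]
    if mode == "any":
        common = ref & set().union(*other_sets)
    elif mode == "all":
        common = ref.intersection(*other_sets)
    else:
        raise ValueError("mode must be 'any' or 'all'")
    return common if shared else ref - common


print(sorted(highlight("org1", ["org2", "org3"], mode="all")))
```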
Global Consistency Checking of Biochemical Network • Given: • A PGDB for an organism • A set of initial metabolites • Infer: • What set of products can be synthesized by the small-molecule metabolism of the organism • Can known growth medium yield known essential compounds? • Pacific Symposium on Biocomputing p471 2001
Algorithm: Forward Propagation • [Figure: flow diagram with the labels Nutrient set, Transport, Metabolite set, Reactants, "Fire" reactions, PGDB reaction pool, Products]
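Forward propagation can be sketched as a fixed-point loop: a reaction fires once all of its reactants are in the metabolite set, and its products are then added to that set. The Python below is a minimal sketch under simplifying assumptions, not the published algorithm: reactions are one-directional, transport is ignored (nutrients are placed directly in the metabolite set), and the reaction pool is a hypothetical dictionary.

```python
def forward_propagate(reactions, nutrients):
    """Fire reactions until no new reaction can fire.

    reactions: dict mapping reaction name -> (reactants, products).
    nutrients: iterable of initial metabolites.
    Returns (metabolite set, set of fired reaction names).
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and set(reactants) <= metabolites:
                fired.add(name)
                metabolites |= set(products)
                changed = True
    return metabolites, fired


# Hypothetical reaction pool: r3 can never fire because E is never available.
POOL = {
    "r1": (["A", "B"], ["C"]),
    "r2": (["C"], ["D"]),
    "r3": (["E"], ["F"]),
}

produced, fired = forward_propagate(POOL, ["A", "B"])
essential = {"D", "F"}
print(sorted(essential - produced))  # essential compounds not reached
```

Comparing the produced set with a list of essential compounds, as in the last two lines, is the consistency check described on the preceding slide: any essential compound left over points to a missing nutrient, a missing pathway, or an error in the database.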
Results • Phase I: Forward propagation • 21 initial compounds yielded only half of 38 essential compounds for E. coli • Phase II: Manually identify • Bugs in EcoCyc (e.g., two objects for tryptophan) • Missing initial protein substrates (e.g., ACP) • Missing pathways in EcoCyc • Phase III: Forward propagation with 11 more initial metabolites • Yielded all 38 essential compounds
Nutrient-Related Analysis: Validation of the EcoCyc Database • Results on EcoCyc: Phase I: • Essential compounds • produced 19 • not produced 19 • Total compounds • produced: (28%) • Reactions • Fired (31%)
Missing Essential Compounds Due To • Bugs in EcoCyc • Narrow conceptualization of the problem • Protein substrates • Incomplete biochemical knowledge
Nutrient-Related Analysis: Validation of the EcoCyc Database • Results on EcoCyc: Phase II (After adding 11 extra metabolites): • Essential compounds • produced 38 • not produced 0 • Total compounds • produced: (49%) • not produced: (51%) • Reactions • Fired (58%) • Not fired (42%)
Operon Prediction • Based on the method of Moreno-Hagelsieb et al. Bioinformatics 18 Suppl. 1 (2002) • Distance between genes • Functional classification • Correctly predicts 75% of transcription units, 65% of operons • Additional information available in PGDB • Pathways • Protein complexes • Transporters • Improved prediction performance: 80% of transcription units, 69% of operons • Detailed paper in preparation
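To make the "distance between genes" signal concrete, the following Python sketch groups adjacent same-strand genes whose intergenic gap falls under a fixed cutoff. This is a deliberately simplified stand-in and not the cited method: the function name, the gene tuples, and the 50 bp cutoff are all assumptions made for illustration, and functional classification and the additional PGDB information are not modeled.

```python
def group_by_distance(genes, max_gap=50):
    """Group genes into candidate transcription units.

    genes: list of (name, left, right, strand) tuples.
    Adjacent genes join the same group when they are on the same strand and
    the gap between them is at most `max_gap` base pairs (hypothetical cutoff).
    """
    ordered = sorted(genes, key=lambda g: g[1])
    groups = []
    for gene in ordered:
        _, left, _, strand = gene
        if groups:
            previous = groups[-1][-1]
            if previous[3] == strand and left - previous[2] <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return [[g[0] for g in group] for group in groups]


# Hypothetical gene coordinates.
GENES = [
    ("g1", 100, 500, "+"),
    ("g2", 520, 900, "+"),
    ("g3", 1500, 2000, "+"),
    ("g4", 2010, 2500, "-"),
]

print(group_by_distance(GENES))
```

Overlapping genes produce a negative gap and are therefore grouped, which is the intended behavior of this sketch.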
Visualization of Genetic Network • Operon display window • Transcription factor display window • Highlight regulon on Overview diagram • Paint expression data onto Overview diagram • Database adapter mechanism: MAGE-ML intermediate form • Adapter defined for SMD • Animation • User specified mapping of color ranges • Import of SAM files (next release) • List of significantly +/- genes • Display full genetic network (later release)
Acknowledgements • SRI: Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud • EcoCyc Project: Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier • MetaCyc Project: Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville • Stanford: Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh • Funding sources: NIH National Center for Research Resources; NIH National Institute of General Medical Sciences; NIH National Human Genome Research Institute; Department of Energy Microbial Cell Project; DARPA BioSpice, UPC • BioCyc.org