
Graph-based analysis of biochemical networks

aMAZE - Protein Function and Biochemical Processes. Graph-based analysis of biochemical networks. Contents: mapping metabolic networks onto a graph; traversal rules for metabolic graphs; path finding; path finding in weighted graphs; pathway reconstruction by reaction clustering.


Presentation Transcript


  1. aMAZE - Protein Function and Biochemical Processes • Graph-based analysis of biochemical networks • Jacques van Helden (Jacques.van.Helden@ulb.ac.be)

  2. Contents • Mapping metabolic networks onto a graph • Traversal rules for metabolic graphs • Path finding • Path finding in weighted graphs • Pathway reconstruction by reaction clustering • From gene expression data to pathways • Recurrent modules

  3. Graph-based analysis of biochemical networks • Mapping metabolic networks onto graphs

  4. Metabolic network [Figure: methionine biosynthesis from L-Homoserine in E.coli and S.cerevisiae. Compounds: L-Homoserine, SuccinylSCoA, AcetylCoA, HSCoA, CoA, Alpha-succinyl-L-Homoserine, O-acetyl-homoserine, L-Cysteine, Succinate, Cystathionine, H2O, Sulfide, NH4+, Pyruvate, Homocysteine, 5-MethylTHF, THF, L-Methionine. Reactions (EC numbers): 2.3.1.46, 2.3.1.31, 4.2.99.9, 4.4.1.8, 4.2.99.10, 2.1.1.14.]

  5. One node per compound • vertices = compounds • arcs = reactions • problem: no representation of cross-point reactions [Figure: the methionine network drawn with compounds as vertices; the same reaction label (e.g. 2.3.1.46, 4.2.99.9) is repeated on several arcs.]

  6. One node per reaction • vertices = reactions • arcs = intermediate compounds • problem: no representation of cross-point compounds [Figure: the methionine network drawn with reactions 2.3.1.46, 2.3.1.31, 4.2.99.9, 4.4.1.8, 4.2.99.10 and 2.1.1.14 as vertices; arcs are labelled with Alpha-succinyl-L-Homoserine, O-acetyl-homoserine, Cystathionine and Homocysteine, the last one appearing on two arcs.]

  7. One node per compound and per reaction • 2 types of vertices: compounds and reactions • arcs • from substrate to reaction • from reaction to product • arc labels can be used to represent stoichiometry [Figure: the methionine network of slide 4 drawn with one vertex per compound and one vertex per reaction.]

  8. Reactions and compounds: directed bipartite graph • A bipartite graph is a graph G whose vertex set V can be partitioned into two subsets U and W, such that each edge of G has one endpoint in U and one endpoint in W. • Arcs never go from compound to compound. • Arcs never go from reaction to reaction. • 5,871 compounds • 5,223 reactions • 21,194 arcs
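The mapping described on slides 7 and 8 can be sketched in a few lines of Python. This is only an illustration, not the aMAZE implementation: the `REACTIONS` dictionary is hypothetical toy data (two reactions from the methionine example, with side compounds assigned from standard biochemistry), and the function names are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical toy data: reaction id -> (substrates, products).
REACTIONS = {
    "2.3.1.46": (["L-Homoserine", "SuccinylSCoA"],
                 ["Alpha-succinyl-L-Homoserine", "HSCoA"]),
    "4.2.99.9": (["Alpha-succinyl-L-Homoserine", "L-Cysteine"],
                 ["Cystathionine", "Succinate"]),
}

def build_bipartite_graph(reactions):
    """Return (arcs, kind): arcs maps a vertex to its successors,
    kind maps a vertex to 'compound' or 'reaction'."""
    arcs = defaultdict(set)
    kind = {}
    for rid, (substrates, products) in reactions.items():
        kind[rid] = "reaction"
        for compound in substrates:      # arc: substrate -> reaction
            kind[compound] = "compound"
            arcs[compound].add(rid)
        for compound in products:        # arc: reaction -> product
            kind[compound] = "compound"
            arcs[rid].add(compound)
    return dict(arcs), kind

def is_bipartite(arcs, kind):
    """True when every arc joins a compound and a reaction."""
    return all(kind[u] != kind[v] for u, targets in arcs.items() for v in targets)

arcs, kind = build_bipartite_graph(REACTIONS)
print(is_bipartite(arcs, kind))
```

Stoichiometry could be stored as an arc label by replacing the successor sets with dictionaries, which is one way to read the remark on slide 7.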

  9. Extending the graph to full biochemical networks • The concept can be extended to include additional types of vertices: • biochemical entities: compounds, genes, proteins, … • biochemical interactions: reaction, catalysis, transcription, regulation, translocation, transport catalysis, … • This makes it possible to represent metabolism, regulation, transport, signal transduction, compartments, … • Warning: with this extension, the graph is no longer bipartite, because some interactions have other interactions as output (e.g. a catalysis acts on a reaction) • van Helden et al. (2000) Biol Chem, 381(9-10), 921-35. • van Helden et al. (2001) Briefings in Bioinformatics, 2(1), 81-93. • van Helden et al. (2002) In Bioinformatics and Genome Analysis. Springer-Verlag, Berlin Heidelberg, Vol. 38.

  10. Graph-based analysis of metabolic networks • Traversal rules for metabolic graphs

  11. Ubiquitous compounds • Reactions • 4.2.1.52: L-Aspartic Semialdehyde + Pyruvate → dihydrodipicolinic acid + H2O • 3.5.1.18: Succinyl diaminopimelate + H2O → succinate + LL-diaminopimelic acid • Invalid pathway (the two reactions are linked only through H2O) • L-Aspartic Semialdehyde → 4.2.1.52 → H2O → 3.5.1.18 → LL-diaminopimelic acid

  12. Compound connectivity

  13. Compound connectivity

  14. Reaction connectivity

  15. Reaction connectivity - without ubiquitous compounds

  16. Invalid intermediates • Where to set the limit? • Seems obvious for H2O (1615), NADH (569), ... • What about ATP (435)? • And pyruvate? • And NH3? • Depends on the reaction/pathway considered • e.g. ATP is a valid intermediate in nucleotide biosynthesis • Depends on the atoms being transferred during the reaction • e.g. NADH gives one proton • Depends on the focus of the question • e.g. in an analysis of energy metabolism, ATP and NAD will matter
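A minimal sketch of how a connectivity threshold could be applied, assuming the numbers in parentheses on slide 16 count the reactions in which each compound takes part. The toy `REACTIONS` data, the function names and the `keep` parameter are hypothetical; `keep` is only there to illustrate that the choice depends on the question (for example, retaining ATP when studying nucleotide biosynthesis).

```python
from collections import Counter

# Hypothetical toy data: the two reactions of slide 11.
REACTIONS = {
    "4.2.1.52": (["L-Aspartic Semialdehyde", "Pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"]),
    "3.5.1.18": (["Succinyl diaminopimelate", "H2O"],
                 ["succinate", "LL-diaminopimelic acid"]),
}

def compound_connectivity(reactions):
    """Count, for each compound, the reactions it participates in."""
    counts = Counter()
    for substrates, products in reactions.values():
        for compound in set(substrates) | set(products):
            counts[compound] += 1
    return counts

def ubiquitous_compounds(reactions, threshold, keep=()):
    """Compounds used by at least `threshold` reactions, except those in `keep`."""
    counts = compound_connectivity(reactions)
    return {c for c, n in counts.items() if n >= threshold and c not in keep}

print(ubiquitous_compounds(REACTIONS, threshold=2))
```

On a real network the threshold would be far higher than 2; the slide itself stresses that no single cut-off is right for every pathway.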

  17. Ubiquitous compounds • Jeong et al. (Nature 2000; 407: 651-654) • Calculated the network diameter, i.e. the average length of the shortest path between two compounds • Showed that when ubiquitous compounds ("hubs" in their terminology) are removed, the diameter increases. • Compared the metabolic network diameter between different organisms. • "Surprising" result: the network diameter does not depend on the number of enzymes found in the organism. • But: for this comparison, all compounds were considered, including H2O.

  18. Direct traversal of reversible reactions • Reaction • 4.2.1.52: L-Aspartic Semialdehyde + Pyruvate ↔ dihydrodipicolinic acid + H2O • Valid pathways • L-Aspartic Semialdehyde → 4.2.1.52 → dihydrodipicolinic acid • dihydrodipicolinic acid → 4.2.1.52 → L-Aspartic Semialdehyde • Invalid pathway (substrate to substrate) • L-Aspartic Semialdehyde → 4.2.1.52 → Pyruvate

  19. Mutual exclusion of reverse reactions • Reactions • 4.2.1.52: L-Aspartic Semialdehyde + Pyruvate → dihydrodipicolinic acid + H2O • 4.2.1.52 reverse: dihydrodipicolinic acid + H2O → L-Aspartic Semialdehyde + Pyruvate • Invalid pathway (forward and reverse direction of the same reaction) • L-Aspartic Semialdehyde → 4.2.1.52 → dihydrodipicolinic acid → 4.2.1.52 reverse → Pyruvate
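The three traversal rules (skip ubiquitous compounds, never go from substrate to substrate, never use a reaction together with its reverse) can be combined in a small breadth-first search. The sketch below is one possible reading of slides 11, 18 and 19, not the author's program; the data and all names are hypothetical. It keeps whole paths in the queue so that the mutual-exclusion rule can be checked per path, which is simple but would not scale to a full KEGG-sized graph.

```python
from collections import deque

# Hypothetical toy data: the two reactions of slide 11, treated as reversible.
REACTIONS = {
    "4.2.1.52": (["L-Aspartic Semialdehyde", "Pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"]),
    "3.5.1.18": (["Succinyl diaminopimelate", "H2O"],
                 ["succinate", "LL-diaminopimelic acid"]),
}

def directed_reactions(reactions):
    """Represent each reversible reaction by a forward and a reverse node."""
    directed = {}
    for rid, (substrates, products) in reactions.items():
        directed[(rid, "forward")] = (substrates, products)
        directed[(rid, "reverse")] = (products, substrates)
    return directed

def shortest_valid_path(reactions, source, target, excluded=frozenset()):
    """Shortest compound-to-compound path that respects the traversal rules."""
    directed = directed_reactions(reactions)
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        compound = path[-1]
        if compound == target:
            return path
        # Reaction steps are stored as (id, direction) tuples in the path.
        used_ids = {step[0] for step in path if isinstance(step, tuple)}
        for (rid, direction), (substrates, products) in directed.items():
            # Enter through a substrate only; a reaction id already used in
            # either direction is forbidden (mutual exclusion).
            if compound not in substrates or rid in used_ids:
                continue
            for product in products:
                if product in excluded or product in path:
                    continue
                queue.append(path + [(rid, direction), product])
    return None

print(shortest_valid_path(REACTIONS, "L-Aspartic Semialdehyde",
                          "LL-diaminopimelic acid", excluded={"H2O"}))
```

With `excluded={"H2O"}` the invalid shortcut of slide 11 disappears, and a request for a path from L-Aspartic Semialdehyde to Pyruvate finds nothing because both are substrates of the same reaction.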

  20. Traversal of reversible reactions • Fell & Wagner (Nat Biotechnol 2000; 18: 1121-2) • Selected a sub-network (energy metabolism and small molecule biosynthesis in E.coli). • Discarded ubiquitous compounds. • Identified the "center" of the network: glutamate, followed by pyruvate. • But: reactions can be traversed from substrate to substrate or from product to product. • Jeong et al. (Nature 2000; 407: 651-654) • Calculated the network diameter. • But: reactions can be traversed from substrate to substrate or from product to product.

  21. Graph-based analysis of biochemical networks • Path finding

  22. Applications of path finding to biochemical networks • metabolic pathways from compound A to compound B (2-ends path finding) • genes regulated by a membrane receptor via a signal transduction pathway (1-end path-finding) • proteins and compounds regulating directly or indirectly the expression of a given gene (1-end path finding, reverse) • feed-back loops (cycle finding) • functional distance between two enzymes, in terms of the minimal number of steps between the reactions they catalyze

  23. A graph of compounds and reactions • Reactions from KEGG • Compound nodes • 10,166 compounds (only 4,302 used by one reaction) • Reaction nodes • 5,283 reactions • Arcs • 10,685 substrate → reaction (7,297 non-trivial) • 10,621 reaction → product (6,828 non-trivial)

  24. Metabolic Pathways as subgraphs • Escherichia coli • 4219 genes (Blattner) • 967 enzymes (Swissprot) • 159 pathways (EcoCyc)

  25. Functional distance between enzymes • The length of the shortest path between two reactions can be considered as a measure of their functional distance. • By extension, one can estimate the functional distance between two enzymes as the length of the shortest path between the reactions they catalyze. • Example of application: interpretation of pairs of fused genes • Two enzymatic functions can be carried by a single gene in one genome, and by two separate genes in another genome, as the result of a gene fusion event • Are such fusion events preferentially observed between functionally related enzymes?
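As a rough illustration of the functional distance, the sketch below counts the minimal number of steps between two reactions. It assumes that two reactions are neighbours when a non-excluded product of one is a substrate of the other, in either order; that convention, the toy data (three lysine-pathway reactions from slide 39) and the function name are assumptions made for this example.

```python
from collections import deque

# Hypothetical toy data: three consecutive lysine biosynthesis reactions.
REACTIONS = {
    "4.2.1.52": (["L-aspartic semialdehyde", "pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"]),
    "1.3.1.26": (["dihydrodipicolinic acid", "NADPH"],
                 ["tetrahydrodipicolinate", "NADP+"]),
    "2.3.1.117": (["tetrahydrodipicolinate", "succinyl CoA"],
                  ["N-succinyl-epsilon-keto-L-alpha-aminopimelic acid", "CoA"]),
}

def reaction_distance(reactions, start, goal, excluded=frozenset()):
    """Minimal number of steps between two reactions, or None if unreachable."""
    def linked(a, b):
        subs_a, prods_a = (set(side) - excluded for side in reactions[a])
        subs_b, prods_b = (set(side) - excluded for side in reactions[b])
        return bool(prods_a & subs_b or prods_b & subs_a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return distance[current]
        for other in reactions:
            if other not in distance and linked(current, other):
                distance[other] = distance[current] + 1
                queue.append(other)
    return None

print(reaction_distance(REACTIONS, "4.2.1.52", "2.3.1.117"))
```

The distance between two enzymes would then be the smallest such value over the reactions they catalyze.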

  26. Shortest path finding with gene fusion pairs • Fusion pairs • Tsoka and Ouzounis (Nat Genet 2000; 26: 141-2) • Shortest path analysis • van Helden et al. (2002) In Bioinformatics and Genome Analysis. Springer-Verlag, Berlin Heidelberg, Vol. 38. [Figure: enzyme A and enzyme B are mapped onto the graph of reactions and compounds; shortest path finding gives the functional distance between enzymes; results are shown for fusion pairs and random pairs.]

  27. Pathway enumeration • Kuffner et al. (Bioinformatics 2000; 16: 825-836). • All possible paths from glucose to pyruvate, with maximal length 9 → 500,000 possible paths. • Adding constraints • Selecting "complete" pathways, i.e. where all side reactants are ubiquitous • Constraint on pathway width • Width 2 → 541 pathways • Width 1 → 170 pathways [Figure: source compound and target compound, together with the graph of reactions and compounds, are passed to path finding, which returns potential metabolic pathways.]
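Path enumeration with a bound on the number of reactions can be sketched as a depth-first search. This is not the method of Kuffner et al., only a minimal illustration of why the number of paths grows quickly; the toy data keeps the main compounds of the two homocysteine routes from the methionine example, and all names are hypothetical.

```python
# Hypothetical toy data: main compounds only, side compounds left out.
REACTIONS = {
    "2.3.1.46": (["L-Homoserine"], ["Alpha-succinyl-L-Homoserine"]),
    "4.2.99.9": (["Alpha-succinyl-L-Homoserine"], ["Cystathionine"]),
    "4.4.1.8": (["Cystathionine"], ["Homocysteine"]),
    "2.3.1.31": (["L-Homoserine"], ["O-acetyl-homoserine"]),
    "4.2.99.10": (["O-acetyl-homoserine"], ["Homocysteine"]),
}

def enumerate_paths(reactions, source, target, max_reactions):
    """All simple paths from source to target using at most max_reactions."""
    results = []

    def extend(compound, path, used):
        if compound == target:
            results.append(list(path))
            return
        if len(used) == max_reactions:
            return
        for rid, (substrates, products) in reactions.items():
            if rid in used or compound not in substrates:
                continue
            for product in products:
                if product in path:          # do not revisit a compound
                    continue
                path.extend([rid, product])
                used.add(rid)
                extend(product, path, used)
                used.discard(rid)            # backtrack
                del path[-2:]

    extend(source, [source], set())
    return results

for path in enumerate_paths(REACTIONS, "L-Homoserine", "Homocysteine", 3):
    print(" -> ".join(path))
```

Constraints such as completeness or pathway width, as listed on the slide, would be applied as additional filters on the returned paths.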

  28. Scoring pathways with gene expression data [Figure: workflow. Source compound and target compound, together with the graph of reactions and compounds → path finding → potential metabolic pathways → select reactions (for each pathway separately) → set of reactions → identification of enzymes → enzyme-coding genes → gene expression data → scoring of gene cluster (covariance of the response) → most probably relevant pathways.]

  29. Scoring pathways with gene expression data • Zien, A., Kuffner, et al. (2000). ISMB 8, 407-17. [Figure: pathway score distribution; labels: random control, (glycolysis) found.]

  30. Path finding - summary • Metabolic pathways are organism-dependent • The shortest path is generally not the most relevant. • Simple path enumeration returns innumerable false positives. • Adding consistency rules (complete pathways) reduces the number of returned pathways. • Pathway scoring makes it possible to select the most relevant pathways for a given organism. • Requirements • Gene expression data • Specification of the source and target compounds

  31. Graph-based analysis of biochemical networks • Pathway building by reaction clustering

  32. Reconstructing a pathway from a subset of reactions • Input: • a set of reactions (the seed reactions) • Output: • a metabolic pathway including • the seed reactions, together with their substrates and products • optionally, some additional reactions, intercalated to improve the pathway connectivity • the pathway can either be connected, or contain several unconnected components

  33. Seed nodes Compound Reaction Seed Reaction

  34. Linking seed nodes Compound Reaction Seed Reaction Direct link

  35. Compound Reaction Seed Reaction Direct link Intercalated reaction Enhance linking by intercalating reactions

  36. Subgraph extraction
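The seed-linking idea of slides 32 to 36 can be sketched as follows. The slides do not give the algorithm, so this is an assumed reading: two reactions are linked when they share a compound (orientation is ignored, since it is inferred afterwards), and a pair of seeds may be joined through at most `max_gap` intercalated non-seed reactions. The toy data uses only the main compounds of four lysine-pathway reactions; the function names and the `max_gap` parameter are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: main compounds only, ubiquitous compounds left out.
REACTIONS = {
    "4.2.1.52": (["L-aspartic semialdehyde"], ["dihydrodipicolinic acid"]),
    "1.3.1.26": (["dihydrodipicolinic acid"], ["tetrahydrodipicolinate"]),
    "2.3.1.117": (["tetrahydrodipicolinate"],
                  ["N-succinyl-epsilon-keto-L-alpha-aminopimelic acid"]),
    "2.6.1.17": (["N-succinyl-epsilon-keto-L-alpha-aminopimelic acid"],
                 ["succinyl diaminopimelate"]),
}

def share_compound(reactions, a, b):
    """True when reactions a and b have at least one compound in common."""
    compounds_a = set(reactions[a][0]) | set(reactions[a][1])
    compounds_b = set(reactions[b][0]) | set(reactions[b][1])
    return bool(compounds_a & compounds_b)

def linking_reactions(reactions, a, b, seeds, max_gap):
    """Shortest list of non-seed reactions joining a to b, or None."""
    queue = deque([(a, [])])
    seen = {a}
    while queue:
        current, between = queue.popleft()
        if share_compound(reactions, current, b):
            return between               # empty list means a direct link
        if len(between) == max_gap:
            continue
        for other in reactions:
            if other in seen or other in seeds:
                continue
            if share_compound(reactions, current, other):
                seen.add(other)
                queue.append((other, between + [other]))
    return None

def build_pathway(reactions, seeds, max_gap):
    """Seed reactions plus the reactions intercalated to connect them."""
    pathway = set(seeds)
    for a, b in combinations(sorted(seeds), 2):
        between = linking_reactions(reactions, a, b, set(seeds), max_gap)
        if between:
            pathway.update(between)
    return pathway

print(sorted(build_pathway(REACTIONS, {"4.2.1.52", "2.3.1.117"}, max_gap=1)))
```

Seeds that cannot be joined within the gap limit simply stay in separate components, which matches the remark on slide 32 that the output may contain several unconnected components.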

  37. Validation of the method • Take a known pathway (e.g. Lysine biosynthesis in Escherichia coli: 9 reactions). • Provide the program with a subset of reactions. • See if the program is able to reconstruct the whole pathway on the basis of this subset.

  38. Validation of the method • Take a set of experimentally characterized pathways, and for each one • Select a subset of enzymes • Use the reactions catalysed by these enzymes as seed nodes • Extract the subgraph • Compare with known pathway

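  39. Lysine biosynthesis in E.coli [Figure: pathway diagram, read top to bottom; each step is given as reaction (enzyme, gene, EC number)] • L-Aspartate (from aspartate biosynthesis) + ATP → L-aspartyl-4-P + ADP (aspartate kinase III, lysC, 2.7.2.4) • L-aspartyl-4-P + NADPH + H+ → L-aspartic semialdehyde + NADP+ + Pi (aspartate semialdehyde dehydrogenase, asd, 1.2.1.11) • L-aspartic semialdehyde is also the branch point towards methionine biosynthesis and threonine biosynthesis • L-aspartic semialdehyde + pyruvate → dihydrodipicolinic acid + 2 H2O (dihydrodipicolinate synthase, dapA, 4.2.1.52) • dihydrodipicolinic acid + NADPH or NADH + H+ → tetrahydrodipicolinate + NADP+ or NAD+ (dihydrodipicolinate reductase, dapB, 1.3.1.26) • tetrahydrodipicolinate + succinyl CoA → N-succinyl-epsilon-keto-L-alpha-aminopimelic acid + CoA (tetrahydrodipicolinate N-succinyltransferase, dapD, 2.3.1.117) • N-succinyl-epsilon-keto-L-alpha-aminopimelic acid + glutamate → succinyl diaminopimelate + alpha-ketoglutarate (succinyl diaminopimelate aminotransferase, dapC, 2.6.1.17) • succinyl diaminopimelate + H2O → LL-diaminopimelic acid + succinate (N-succinyldiaminopimelate desuccinylase, dapE, 3.5.1.18) • LL-diaminopimelic acid → meso-diaminopimelic acid (diaminopimelate epimerase, dapF, 5.1.1.7) • meso-diaminopimelic acid → L-lysine + CO2 (diaminopimelate decarboxylase, lysA, 4.1.1.20; the figure also shows the lysR protein, lysR)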

  40. Example: reconstitution of lysine pathway • Gap size: 0 • all EC numbers from the original pathway are provided as seeds • Seeds • 1.2.1.11 • 1.3.1.26 • 2.3.1.117 • 2.6.1.17 • 2.7.2.4 • 3.5.1.18 • 4.1.1.20 • 4.2.1.52 • 5.1.1.7 • Result: • Inferring reaction orientation (reverse or forward) • Ordering

  41. Example: reconstitution of lysine pathway • Gap size: 1 • 5 seed reactions • Result • Inferring missing steps • Inferring reaction orientation • Ordering

  42. Example: reconstitution of lysine pathway • Gap size: 2 • 4 seed reactions • Result • E.coli pathway found • Alternative pathways also returned

  43. Example: reconstitution of lysine pathway • Gap size: 3 • 3 seed reactions • Result • E.coli pathway is not found, because the program finds shortcuts between the seed reactions

  44. Building pathways from operons • Pathways obtained with the pathway builder, using the genes from His operon as seeds

  45. Applications of pathway reconstruction • We have the complete genome for more than 100 bacteria • For these genomes, • there is almost no experimental characterization of metabolism • enzymes have been predicted by sequence similarity. • gene expression data will in some cases be available, in most cases not. • In some cases, one expects to find the same pathways as in model organisms, but in other cases, variants or completely distinct pathways

  46. Strategy 1: starting from annotated pathways • For each known pathway from model organisms • Select the subset of reactions for which an enzyme exists in the target organism • If a reasonable number of reactions are present • Using these as seeds, reconstruct a pathway • This strategy is likely to detect some variants of the annotated pathways, but is not able to predict novel pathways.

  47. Strategy 2 - starting from predicted functional groups • Comparative genomics provides us with clues about functional modularity • operons can be predicted with different methods, and reveal some level of modular organisation. • groups of synteny can also reveal functional modules. • phylogenetic profiles reveal groups of co-evolving genes, which are generally involved in the same process or pathway. • Strategy • predict operons, groups of synteny, and groups of co-evolving genes • with each of these groups • select enzyme-coding genes • identify the reactions catalyzed by their products • use these reactions as seeds for the pathway builder

  48. Graph-based analysis of biochemical networks • Path finding in weighted graphs

  49. Path finding in a weighted graph • Assign a higher weight to highly connected compounds. This makes it possible to work with the whole graph while reducing the probability of using a pool metabolite as intermediate between two successive reactions. • Assign a smaller weight to reactions for which an enzyme has been identified in the genome. This will favour organism-specific pathways, without excluding spontaneous reactions or reactions catalysed by an enzyme not yet identified in this organism. • When gene expression data is available, assign a weight to reactions according to the level of expression of the corresponding enzymes. This will favour context-specific pathways.
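A minimal sketch of weighted path finding, using Dijkstra's algorithm as a stand-in since the slides do not name the algorithm. The cost of a path is assumed here to be the sum of the weights of the reactions and compounds it enters, with a default weight of 1; the `compound_weight` and `reaction_weight` dictionaries, the function name and the toy data are hypothetical. The H2O weight of 1615 reuses the connectivity quoted on slide 16. Reactions are traversed in the forward direction only, to keep the example short.

```python
import heapq

# Hypothetical toy data: part of lysine biosynthesis, with H2O as a side compound.
REACTIONS = {
    "4.2.1.52": (["L-aspartic semialdehyde", "pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"]),
    "1.3.1.26": (["dihydrodipicolinic acid"], ["tetrahydrodipicolinate"]),
    "2.3.1.117": (["tetrahydrodipicolinate"],
                  ["N-succinyl-epsilon-keto-L-alpha-aminopimelic acid"]),
    "2.6.1.17": (["N-succinyl-epsilon-keto-L-alpha-aminopimelic acid"],
                 ["succinyl diaminopimelate"]),
    "3.5.1.18": (["succinyl diaminopimelate", "H2O"],
                 ["succinate", "LL-diaminopimelic acid"]),
}

def lightest_path(reactions, source, target,
                  compound_weight=None, reaction_weight=None):
    """Return (cost, path) for the lowest-weight path, or None if none exists."""
    compound_weight = compound_weight or {}
    reaction_weight = reaction_weight or {}
    best = {source: 0}
    heap = [(0, source, [source])]
    while heap:
        cost, compound, path = heapq.heappop(heap)
        if compound == target:
            return cost, path
        if cost > best.get(compound, float("inf")):
            continue                         # stale queue entry
        for rid, (substrates, products) in reactions.items():
            if compound not in substrates:
                continue
            for product in products:
                new_cost = (cost + reaction_weight.get(rid, 1)
                            + compound_weight.get(product, 1))
                if new_cost < best.get(product, float("inf")):
                    best[product] = new_cost
                    heapq.heappush(heap, (new_cost, product, path + [rid, product]))
    return None

cost, path = lightest_path(REACTIONS, "L-aspartic semialdehyde",
                           "LL-diaminopimelic acid",
                           compound_weight={"H2O": 1615})
print(cost, " -> ".join(path))
```

Without weights the search takes the two-step shortcut through H2O; with the H2O weight it follows the five-reaction route, without H2O having been removed from the graph. Lower reaction weights for enzymes found in the genome, or expression-based weights, would be passed through `reaction_weight` in the same way.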

  50. Test case: methionine biosynthesis [Figure: L-Aspartate → 2.7.2.4 → L-aspartyl-4-P → 1.2.1.11 → L-aspartic semialdehyde → 1.1.1.3 → L-Homoserine; E.coli branch: L-Homoserine → 2.3.1.46 → Alpha-succinyl-L-Homoserine → 4.2.99.9 → Cystathionine → 4.4.1.8 → Homocysteine; S.cerevisiae branch: L-Homoserine → 2.3.1.31 → O-acetyl-homoserine → 4.2.99.10 → Homocysteine; common end: Homocysteine → 2.1.1.14 → L-Methionine → 2.5.1.6 → S-Adenosyl-L-Methionine.]
