1 / 30

Homomorphism Mapping in Metabolic Pathways

Homomorphism Mapping in Metabolic Pathways. Qiong Cheng, Dipendra Kaur, Robert Harrison , Alexander Zelikovsky Computer Science in Georgia State University. Dec. 1 2007 RECOMB Satellite Conference on Systems Biology 2007. Outline. Concept of Metabolic pathway comparison

jamar
Download Presentation

Homomorphism Mapping in Metabolic Pathways

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Homomorphism Mapping in Metabolic Pathways Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky Computer Science in Georgia State University Dec. 1 2007 RECOMB Satellite Conference on Systems Biology 2007

  2. Outline • Concept of Metabolic pathway comparison • Enzyme similarity • Graph mappings: embeddings & homomorphisms • Min cost homomorphism problem for trees • Optimal DP algorithm for trees • Min cost homomorphism problem for arbitrary graphs • Minimum Feedback vertex set (MFVS) • Searching metabolic networks for • pathway motifs • pathway holes • Web tool • Architecture & Brief interface • Future work

  3. 2.7.1.13 1.1.1.49 1.1.1.34 1.1.1.44 3.1.1.31 A portion of pentose phosphate pathway Metabolic pathway & pathways model • Metabolic pathways model • Metabolic pathway

  4. match match match Mismatch/Substitute Comparison of metabolic pathways • Enzyme similarity and pathway topology together represent the similarity of pathway functionality. • Enzyme Similarity • Pathway topology Similarity

  5. A B C D A’ X’ D’ A’ A B’ X’ B C D Related work • Linear topology (Forst & Schulten[1999], Chen & Hofestaedt[2004];) • Tree topology (Pinter [2005] o(|VG|2|VT|/log|VG|+|VG||VT|log|VT|) ) • Arbitrary topology Mapping : Linear pattern  Graph (Kelly et al 2004) ( o(|VT|i+2|VG|2) ) Exhaustively search (Sharan et al 2005 ( o(i!) o(|VT|i+2|VG|2) ), Yang et al 2007 ( o(2|VG||VG|2) )

  6. = Δ[X, Y] log2c(X, Y) = = = Enzyme mapping cost Enzyme D = d1 . d2 . d3 . d4 • EC (Enzyme Commission) notation • Measure Enzyme similarity score Δ by the lowest common upper class distribution • Measure Δ by tight reaction property Enzyme X = x1 . x2 . x3 . x4 Enzyme Y = y1 . y2 . y3 . y4 Δ[X, Y] = 1 = = = = = Δ[X, Y] = 10 Δ[X, Y] = +∞ otherwise

  7. Sv in VT Δ(v, fv(v)) f T G Graph mappings: embeddings & homomorphisms • Isomorphism • Isomorphic embedding • Homeomorphic embedding • Homomorphism Homomorphism f : T  G: fv : VT VG fe : ET paths of G Edge-to-path cost : l (|fe(e)|-1) = Homomorphism cost + l Se in ET (|fe(e)|-1) We allow different enzymes to be mapped to the same enzyme.

  8. ignoring direction Min cost homomorphism of multi source tree to arbitrary graph • A multi-source tree is a directed graph, whose underlying undirected graph is a tree. • Given an multisource tree T =<VT, ET> (Pattern) and an arbitrary graph G =<VG, EG> (Text), find min cost homomorphism of multisource tree to arbitrary graph f : T  G

  9. A B D C E F D B 1 Text G 1 1 2 3 A 2 C 3 2 Transitive closure 1 1 1 F 2 E Transitive Closure of G : G* Preprocessing of text graph Transitive closure of G is graph G*=(V, E*), where E*={(i,j): there is i-j-path in G}

  10. a b c d Pattern T Ordering a c d b Pattern graph ordering • Construct ordered pattern T’ • DFS traversal • Processing order in opposite way Ordered pattern T’ • Each edge ei in T’ is the unique edge connecting vi • with the previous vertices in the order

  11. a b c Text arbitrary order d Pattern T DP table min cost homomorphism mapping from T’s subgraph induced by previous vertices in the orderinto G* DT[a, uj]

  12. vi uj h(j, j’) vil uj’ G* Pattern T DT[i, j] i<|VT| j<|VG| = Filling DP table • Recursive function Δ(vi , uj)if vi is a leaf in T Δ(vi, uj) + ∑l=1 to adj(vi)Minj’=1 to |VG|C(il, j’) if vi is a leaf in T C[il, jl] = DT[il, jl] + l(h(j, jl) - 1) lis penalty for gaps h(j, jl) = #(hops between uj and ujl in G)

  13. Runtime Analysis for mapping trees • Transitive closure takes O(|VG||EG|). • Pattern graph ordering takes O(|VT| + |ET|) • Dynamic programming - Calculate min contribution of all child pairs of node pair (vi∈T,uj∈G) takes tij = degT (vi)degG*(uj) - Filling DT takes Sj=1 to |VG| Si=1 to |VT|tij = Sj=1 to |VG| degG*(uj)Si=1 to |VT|degT(vi) = 2|EG*||ET| The total runtime for mapping trees is O(VG||EG|+|VG*||VT|).

  14. A B e e a b a b c c d d MFVS • Minimum Feedback vertex set (MFVS) : • Given: an undirected graph G=(V,E) and a nonnegative weight function w on V • Find: a minimum weight subset of V whose removal leaves an acyclic graph. • Bad news MFVS problem is NP-complete. • Good news 2-approximation • Greedy Algorithm • Delete degree 1/0 vertices from V and set remaining vertices to V’ • MFVS<- f • while V’ fdo • pick up the set S of maximal degree vertices • MFVS <- MFVS U S • Delete degree 1/0 vertices from V’

  15. Min cost homomorphism of arbitrary graphs • Given an arbitrary graph P =<VP, EP> (Pattern) and an arbitrary graph G =<VG, EG> (Text), find min cost homomorphism f : P  G • Algorithm • Findminimum feedback vertex set F(P) of P • Construct a multi source tree P’ = <Vp-F(P), Ep(Vp-F(P))> • for every sub mapping f’v: F(P) VG do • obtain min cost homomorphism of multi source tree P’ to arbitrary graph G under sub mapping f’v • choose min cost homomorphism for all sub mappings

  16. Runtime Analysis for mapping arbitrary graphs • Finding min feedback vertex set takes O(|VP| + |ET|) • O(VG |F(P)|) possible mappings for MFVS • Finding min homomorphism mapping of multi source tree to arbitrary graph takes O(VG||EG|+|VG*||VT|). The total runtime is O(VG |F(P)|(|VG||EG|+|VG*||VT|)).

  17. a b a b c d c d Statistical significance • Randomized P-Value computation • Random degree-conserved graph generation: • Reshuffle nodes Reshuffle edge • Reshuffle edges

  18. Experiments & applications • Identifying conserved pathways • 24 pathways that are conserved across all 4 species • 18 more pathways that are conserved across at least three of these species • All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E.coli + Hallobacterium • Resolving ambiguity • Discovering pathways holes

  19. Mappings with cycles

  20. Resolving Ambiguity

  21. Pathway holes • Check if there is such enzyme in pattern • Find the closest protein in the same group • If identity is too high > 80% then we expect good filling • Align to previous and next enzyme – the functions may be taken over

  22. Filling pathways holes

  23. Graph Extraction Graph Layout Graph Visualization Graph searching Query Cached Indexing Pathway Database Browsers Visualized Outputs Additional Value Service Web Service Architecture

  24. Web Interface

  25. Future work • Approximation algorithm to handle with the comparison of general graphs • Mining protein interaction network • Discovery of critical elements or modules based on graph comparison • Discovery of evolution relation of organisms by pathway comparison of different organisms at different time points • Integration with genome database

  26. Reference • Ron Y Pinter, Oleg Rokhlenko, Esti Yeger-Lotem, Michal Ziv-Ukelson: Alignment of metabolic pathways. Bioinformatics. LNCS 3109. Springer-Verlag.(Aug 2005)21(16): 3401-8 • Sebastian Wernicke: Combinatorial Algorithms to Cope with the Complexity of Biological Networks. Dissertation (December 2006) • J. Ellson, E. Gansner, E. Koutsofios, S. North, and G. Woodhull. Graphviz and dynagraph - static and dynamic graph drawing tools. In M. Junger and P. Mutzel, editors, Graph Drawing Software, pages 127-148. Springer-Verlag, 2003 • Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM, pages 721-724, 2002. • N. Ketkar, L. Holder, D. Cook, R. Shah and J. Coble, Subdue: Compression-based Frequent Pattern Discovery in Graph Data, Proceedings of the ACM KDD Workshop on Open-Source Data Mining, August 2005. • K, Borgwardt, S. Bottger, H. Kriegel, VGM: visual graph mining, International Conference on Management of Data archive Proceedings of the 2006 ACM SIGMOD international conference on Management of data • Q. Cheng, D. Kaur, R. Harrison, and A. Zelikovsky,"Mapping and Filling Metabolic Pathways ", RECOMB Satellite Conference on Systems Biology 2007 • Q. Cheng, R. Harrison, and A. Zelikovsky,"Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways", Proc. of IEEE 7-th International Symposium on BioInformatics and BioEngineering (BIBE'07)

  27. Question? Thanks!

  28. Handling Cycles • Sorting of the pattern such that children can communicate only through parent • “Fix” images for some pattern vertices => interrupt communication through cycles • Feedback vertex set F(T)= VT-F(T) is acyclic • Runtime is increased by factor O(VG |F(T)|) • t(v) = # of “reasonable” text images of v • ∏ t(v) -> min ≈∑ log(t(v)) ->min • 2-approximation algo

  29. Query Cached Indexing Pathway Database Distributed Pathway DB (BioCyc, KEGG) Genome DB Sequence alignment tools (Swiss-prot, Blast, ClusterW) Software architecture of service-oriented pathway mining tool Services Container Ambiguity pairs AI Potential holes Rule based mining DB Pathway Modeling Comparison Storage Indexing …… Data-Control-View Browsers SW PDC Visualized Outputs Simulation Additional Value Service

More Related