Homomorphism Mapping in Metabolic Pathways
Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky
Computer Science, Georgia State University
Dec. 1, 2007, RECOMB Satellite Conference on Systems Biology 2007
Outline
• Concept of metabolic pathway comparison
• Enzyme similarity
• Graph mappings: embeddings & homomorphisms
• Min cost homomorphism problem for trees
• Optimal DP algorithm for trees
• Min cost homomorphism problem for arbitrary graphs
• Minimum feedback vertex set (MFVS)
• Searching metabolic networks for
  • pathway motifs
  • pathway holes
• Web tool
• Architecture & brief interface
• Future work
Metabolic pathway & pathway model
• Metabolic pathway
• Metabolic pathway model
[Figure: a portion of the pentose phosphate pathway, with enzymes 2.7.1.13, 1.1.1.49, 1.1.1.34, 1.1.1.44, 3.1.1.31]
Comparison of metabolic pathways
• Enzyme similarity and pathway topology together represent the similarity of pathway functionality.
• Enzyme similarity
• Pathway topology similarity
[Figure: two aligned pathways with three enzyme pairs marked "match" and one marked "mismatch/substitute"]
Related work
• Linear topology (Forst & Schulten [1999]; Chen & Hofestaedt [2004])
• Tree topology (Pinter [2005]): $O(|V_G|^2 |V_T| / \log|V_G| + |V_G||V_T| \log|V_T|)$
• Arbitrary topology
  • Mapping a linear pattern to a graph (Kelly et al. 2004): $O(|V_T|^{i+2} |V_G|^2)$
  • Exhaustive search (Sharan et al. 2005: $O(i!)\,O(|V_T|^{i+2} |V_G|^2)$; Yang et al. 2007: $O(2^{|V_G|} |V_G|^2)$)
[Figure: example mappings between pathways on vertices A, B, C, D and A', B', X', D']
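Enzyme mapping cost
• EC (Enzyme Commission) notation: Enzyme D = d1 . d2 . d3 . d4
• Measure the enzyme similarity score Δ by the lowest common upper class distribution: $\Delta[X, Y] = \log_2 c(X, Y)$
• Measure Δ by tight reaction property, for Enzyme X = x1 . x2 . x3 . x4 and Enzyme Y = y1 . y2 . y3 . y4:
$$\Delta[X, Y] = \begin{cases} 0 & \text{if } x_1 = y_1,\ x_2 = y_2,\ x_3 = y_3,\ x_4 = y_4 \\ 1 & \text{if } x_1 = y_1,\ x_2 = y_2,\ x_3 = y_3,\ x_4 \ne y_4 \\ 10 & \text{if } x_1 = y_1,\ x_2 = y_2,\ x_3 \ne y_3 \\ +\infty & \text{otherwise} \end{cases}$$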
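The tight-reaction-property cost can be sketched as a prefix comparison of EC numbers. This is a minimal sketch under an assumed reading of the slide: cost 0 for identical EC numbers, 1 when only the fourth digit differs, 10 when the first two digits agree, and +∞ otherwise. The function name `ec_mapping_cost` is hypothetical.

```python
import math


def ec_mapping_cost(x: str, y: str) -> float:
    """Cost of mapping enzyme x to enzyme y, both given as 'd1.d2.d3.d4'.

    Assumed reading of the slide: the cost depends only on how many
    leading EC digits the two enzymes share.
    """
    xs, ys = x.split("."), y.split(".")
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("EC numbers must have four dot-separated fields")
    if xs == ys:
        return 0          # identical enzymes
    if xs[:3] == ys[:3]:
        return 1          # differ only in the fourth digit
    if xs[:2] == ys[:2]:
        return 10         # agree on the first two digits only
    return math.inf       # too dissimilar to be mapped
```

For example, two enzymes from the pentose phosphate slide, 1.1.1.49 and 1.1.1.44, differ only in the fourth digit and would cost 1 under this reading.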
Graph mappings: embeddings & homomorphisms
• Isomorphism
• Isomorphic embedding
• Homeomorphic embedding
• Homomorphism
Homomorphism $f : T \to G$:
  $f_v : V_T \to V_G$
  $f_e : E_T \to$ paths of $G$
Edge-to-path cost: $\lambda\,(|f_e(e)| - 1)$
Homomorphism cost $= \sum_{v \in V_T} \Delta(v, f_v(v)) + \lambda \sum_{e \in E_T} (|f_e(e)| - 1)$
We allow different enzymes to be mapped to the same enzyme.
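To make the cost formula concrete, here is a minimal sketch that evaluates it for one given mapping. The representation is an assumption: `vertex_map` plays the role of f_v, `edge_paths` plays the role of f_e with each path stored as a list of text vertices, and |f_e(e)| is taken to be the number of edges on the path. The names are hypothetical.

```python
def homomorphism_cost(vertex_map, edge_paths, delta, lam):
    """Sum of enzyme mapping costs plus lam * (path length - 1) per pattern edge.

    vertex_map: pattern vertex -> text vertex (f_v)
    edge_paths: pattern edge (a, b) -> list of text vertices on the path (f_e)
    delta:      function (pattern vertex, text vertex) -> mapping cost
    lam:        gap penalty (lambda in the formula)
    """
    vertex_cost = sum(delta(v, image) for v, image in vertex_map.items())
    gap_cost = 0
    for (a, b), path in edge_paths.items():
        # The path must start and end at the images of the edge endpoints.
        if path[0] != vertex_map[a] or path[-1] != vertex_map[b]:
            raise ValueError(f"path for edge {(a, b)} does not match vertex_map")
        hops = len(path) - 1          # |f_e(e)| counted in edges
        gap_cost += lam * (hops - 1)  # a direct edge (1 hop) costs nothing
    return vertex_cost + gap_cost
```

A pattern edge mapped to a two-edge path contributes one gap, so it adds exactly λ to the total.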
Min cost homomorphism of multi-source tree to arbitrary graph
• A multi-source tree is a directed graph whose underlying undirected graph (obtained by ignoring edge direction) is a tree.
• Given a multi-source tree $T = \langle V_T, E_T \rangle$ (pattern) and an arbitrary graph $G = \langle V_G, E_G \rangle$ (text), find a min cost homomorphism $f : T \to G$ of the multi-source tree to the arbitrary graph.
Preprocessing of text graph
• The transitive closure of $G$ is the graph $G^* = (V, E^*)$, where $E^* = \{(i, j) : \text{there is an } i\text{-}j \text{ path in } G\}$.
[Figure: text graph G on vertices A to F and its transitive closure G*, with numeric edge labels 1 to 3]
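The preprocessing step can be sketched with one breadth-first search per vertex. This is an illustrative sketch, not the authors' code: it stores, for every pair (i, j) in E*, the hop count of a shortest i-j path, which the DP later uses as h. The function name and the adjacency-dict input format are assumptions.

```python
from collections import deque


def transitive_closure_hops(adj):
    """Return {(i, j): hops} for every j reachable from i (j != i).

    adj maps each vertex of the directed text graph to a list of successors.
    """
    closure = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w not in dist:      # first visit gives the shortest hop count
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, hops in dist.items():
            if target != source:
                closure[(source, target)] = hops
    return closure
```

Each search visits every vertex and edge at most once, so the whole loop costs O(|V|(|V| + |E|)) time.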
Pattern graph ordering
• Construct ordered pattern T'
  • DFS traversal
  • Process vertices in the opposite (reverse) order
• Each edge $e_i$ in T' is the unique edge connecting $v_i$ with the previous vertices in the order
[Figure: pattern T on vertices a, b, c, d and its ordered pattern T' with ordering a, c, d, b]
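The ordering step can be sketched as an iterative DFS over the underlying undirected tree. The function name, the input format, and the example edges in the usage note are hypothetical; the slide does not give the edges of its pattern.

```python
def order_pattern(edges, root):
    """DFS order of a multi-source tree, ignoring edge direction.

    Returns (order, link): order lists vertices as visited from root, and
    link[v] is the unique directed pattern edge joining v to the vertices
    that precede it in the order (None for the root).
    """
    nbrs = {root: []}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, (a, b)))
        nbrs.setdefault(b, []).append((a, (a, b)))
    order, link, stack = [], {root: None}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w, edge in nbrs[v]:
            if w not in link:          # w is seen for the first time
                link[w] = edge
                stack.append(w)
    return order, link
```

With the hypothetical edges a→b, c→a, c→d and root a, the order is a, c, d, b; the DP then processes `reversed(order)`, so every vertex is handled after all vertices below it.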
DP table
• DT[a, u_j]: min cost homomorphism mapping from T's subgraph induced by the previous vertices in the order into G*
[Figure: DP table indexed by the pattern vertices a, b, c, d of T and by the text vertices in arbitrary order]
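Filling DP table
• Recursive function, for $i \le |V_T|$ and $j \le |V_G|$:
$$DT[i, j] = \begin{cases} \Delta(v_i, u_j) & \text{if } v_i \text{ is a leaf in } T \\ \Delta(v_i, u_j) + \sum_{l=1}^{adj(v_i)} \min_{j'=1}^{|V_G|} C[i_l, j'] & \text{otherwise} \end{cases}$$
• $C[i_l, j'] = DT[i_l, j'] + \lambda\,(h(j, j') - 1)$
• $\lambda$ is the penalty for gaps
• $h(j, j')$ = number of hops between $u_j$ and $u_{j'}$ in G
[Figure: pattern vertex $v_i$ mapped to $u_j$ and its neighbor $v_{i_l}$ mapped to $u_{j'}$ in G*, joined by a path of $h(j, j')$ hops]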
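The recurrence can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the pattern is rooted at a caller-chosen vertex, edge direction decides whether the text path runs from parent image to child image or the reverse, adjacent pattern vertices must map to distinct text vertices (h ≥ 1), and every text vertex appears as a key of `text_adj`. All names are hypothetical.

```python
import math
from collections import deque


def hop_counts(text_adj):
    """hops[s][t] = shortest number of edges from s to t in the text graph."""
    hops = {}
    for s in text_adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in text_adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        hops[s] = dist
    return hops


def min_cost_tree_homomorphism(pattern_edges, root, text_adj, delta, lam):
    """Min cost of mapping a multi-source tree into the text graph."""
    # Undirected neighbor lists; the flag records whether the edge leaves v.
    nbrs = {root: []}
    for a, b in pattern_edges:
        nbrs.setdefault(a, []).append((b, True))
        nbrs.setdefault(b, []).append((a, False))
    # DFS order from the root; parents always precede their children.
    order, parent, stack = [], {root: None}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c, _ in nbrs[v]:
            if c not in parent:
                parent[c] = v
                stack.append(c)
    hops = hop_counts(text_adj)
    DT = {}
    for v in reversed(order):              # children before parents
        for u in text_adj:
            total = delta(v, u)
            for c, outgoing in nbrs[v]:
                if parent[c] != v:         # skip the edge to v's own parent
                    continue
                best = math.inf
                for w in text_adj:
                    # Path direction follows the pattern edge direction.
                    h = hops[u].get(w) if outgoing else hops[w].get(u)
                    if h:                  # reachable and distinct (h >= 1)
                        best = min(best, DT[c, w] + lam * (h - 1))
                total += best
            DT[v, u] = total
    return min(DT[root, u] for u in text_adj)
```

A leaf has no children, so its entry reduces to Δ(v_i, u_j), matching the first case of the recurrence. Recovering the mapping itself would additionally require storing the minimizing w for each child.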
Runtime analysis for mapping trees
• Transitive closure takes $O(|V_G||E_G|)$.
• Pattern graph ordering takes $O(|V_T| + |E_T|)$.
• Dynamic programming
  • Calculating the min contribution of all child pairs of node pair $(v_i \in T, u_j \in G)$ takes $t_{ij} = \deg_T(v_i)\,\deg_{G^*}(u_j)$.
  • Filling DT takes $\sum_{j=1}^{|V_G|} \sum_{i=1}^{|V_T|} t_{ij} = \sum_{j=1}^{|V_G|} \deg_{G^*}(u_j) \sum_{i=1}^{|V_T|} \deg_T(v_i) = 2|E_{G^*}||E_T|$.
The total runtime for mapping trees is $O(|V_G||E_G| + |E_{G^*}||V_T|)$, since $|E_T| = |V_T| - 1$ for a tree.
MFVS
• Minimum feedback vertex set (MFVS):
  • Given: an undirected graph G = (V, E) and a nonnegative weight function w on V
  • Find: a minimum weight subset of V whose removal leaves an acyclic graph
• Bad news: the MFVS problem is NP-complete.
• Good news: 2-approximation
• Greedy algorithm
  • Delete degree 1/0 vertices from V and set the remaining vertices to V'
  • MFVS ← ∅
  • while V' ≠ ∅ do
    • pick the set S of maximal degree vertices
    • MFVS ← MFVS ∪ S
    • Remove S from V', then delete degree 1/0 vertices from V'
[Figure: two example graphs, A and B, on vertices a to e]
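The greedy procedure can be sketched as below. This is a heuristic sketch with two assumptions that differ from the slide: it ignores vertex weights, and it removes one maximal-degree vertex per round (ties broken by the largest label) rather than the whole set S. It does not implement the 2-approximation mentioned above, and the function name is hypothetical.

```python
def greedy_feedback_vertex_set(edges):
    """Greedy feedback vertex set of a simple undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def remove(v):
        for w in adj.pop(v):
            adj[w].discard(v)

    def prune():
        # Vertices of degree 0 or 1 cannot lie on a cycle; peel them off.
        low = [v for v in adj if len(adj[v]) <= 1]
        while low:
            v = low.pop()
            if v not in adj:
                continue
            neighbors = list(adj[v])
            remove(v)
            low.extend(w for w in neighbors if len(adj[w]) <= 1)

    fvs = set()
    prune()
    while adj:
        # Every remaining vertex has degree >= 2, so a cycle still exists.
        v = max(adj, key=lambda x: (len(adj[x]), x))
        fvs.add(v)
        remove(v)
        prune()
    return fvs
```

Removing one vertex at a time avoids deleting a whole triangle when a single vertex already breaks the cycle.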
Min cost homomorphism of arbitrary graphs
• Given an arbitrary graph $P = \langle V_P, E_P \rangle$ (pattern) and an arbitrary graph $G = \langle V_G, E_G \rangle$ (text), find a min cost homomorphism $f : P \to G$
• Algorithm
  • Find a minimum feedback vertex set F(P) of P
  • Construct a multi-source tree $P' = \langle V_P - F(P),\ E_P(V_P - F(P)) \rangle$
  • for every sub-mapping $f'_v : F(P) \to V_G$ do
    • obtain the min cost homomorphism of multi-source tree P' to arbitrary graph G under sub-mapping $f'_v$
  • choose the min cost homomorphism over all sub-mappings
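The outer loop over sub-mappings can be sketched with `itertools.product`. In this sketch `solve_tree` is a hypothetical callable standing in for the tree algorithm: it receives the fixed images of the feedback vertices and returns the min cost of completing the mapping. It is not defined by the slides.

```python
import math
from itertools import product


def best_over_submappings(feedback_vertices, text_vertices, solve_tree):
    """Try every assignment F(P) -> V_G and keep the cheapest completion.

    Returns (best_cost, best_fixed), where best_fixed maps each feedback
    vertex to its chosen text image (None if every completion is infinite).
    """
    fvs = list(feedback_vertices)
    best_cost, best_fixed = math.inf, None
    # |V_G| ** |F(P)| candidate sub-mappings in total.
    for images in product(text_vertices, repeat=len(fvs)):
        fixed = dict(zip(fvs, images))
        cost = solve_tree(fixed)
        if cost < best_cost:
            best_cost, best_fixed = cost, fixed
    return best_cost, best_fixed
```

The number of loop iterations grows exponentially in |F(P)|, which is why a small feedback vertex set matters.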
Runtime analysis for mapping arbitrary graphs
• Finding a min feedback vertex set takes $O(|V_P| + |E_P|)$.
• There are $O(|V_G|^{|F(P)|})$ possible mappings for the MFVS.
• Finding the min cost homomorphism mapping of a multi-source tree to an arbitrary graph takes $O(|V_G||E_G| + |E_{G^*}||V_T|)$.
The total runtime is $O(|V_G|^{|F(P)|}\,(|V_G||E_G| + |E_{G^*}||V_T|))$.
Statistical significance
• Randomized P-value computation
• Random degree-conserved graph generation:
  • Reshuffle nodes
  • Reshuffle edges
[Figure: a graph on vertices a, b, c, d before and after reshuffling an edge]
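Edge reshuffling that conserves degrees is commonly done by swapping the targets of two randomly chosen edges. The sketch below shows that technique for a simple directed graph, together with an empirical P-value taken as the fraction of random graphs that score at least as well as the real one. Both function names and the exact P-value definition are assumptions, not taken from the slides.

```python
import random


def reshuffle_edges(edges, n_swaps, rng):
    """Degree-conserving randomization: (a, b), (c, d) -> (a, d), (c, b).

    Assumes a simple directed graph with at least two edges; swaps that
    would create a self-loop or a duplicate edge are skipped.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def empirical_p_value(observed_cost, random_costs):
    """Fraction of random mapping costs that are <= the observed cost."""
    hits = sum(1 for cost in random_costs if cost <= observed_cost)
    return hits / len(random_costs)
```

Every vertex keeps its in-degree and out-degree because each swap only exchanges the targets of two edges.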
Experiments & applications
• Identifying conserved pathways
  • 24 pathways that are conserved across all 4 species
  • 18 more pathways that are conserved across at least three of these species
  • All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E. coli + Halobacterium
• Resolving ambiguity
• Discovering pathway holes
Pathway holes
• Check if there is such an enzyme in the pattern
• Find the closest protein in the same group
• If identity is high (> 80%), we expect a good filling
• Align to the previous and next enzymes, since the function may be taken over by them
Web service architecture
[Diagram: browsers and visualized outputs; web service and additional value service; graph extraction, graph layout, graph visualization, and graph searching; query, cache, and indexing; pathway database]
Future work
• Approximation algorithm to handle the comparison of general graphs
• Mining protein interaction networks
• Discovery of critical elements or modules based on graph comparison
• Discovery of evolutionary relations among organisms by comparing pathways of different organisms at different time points
• Integration with genome databases
References
• R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, and M. Ziv-Ukelson. Alignment of metabolic pathways. Bioinformatics 21(16): 3401-8, Aug 2005. LNCS 3109, Springer-Verlag.
• S. Wernicke. Combinatorial Algorithms to Cope with the Complexity of Biological Networks. Dissertation, December 2006.
• J. Ellson, E. Gansner, E. Koutsofios, S. North, and G. Woodhull. Graphviz and dynagraph - static and dynamic graph drawing tools. In M. Junger and P. Mutzel, editors, Graph Drawing Software, pages 127-148. Springer-Verlag, 2003.
• X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721-724, 2002.
• N. Ketkar, L. Holder, D. Cook, R. Shah, and J. Coble. Subdue: Compression-based Frequent Pattern Discovery in Graph Data. Proceedings of the ACM KDD Workshop on Open-Source Data Mining, August 2005.
• K. Borgwardt, S. Bottger, and H. Kriegel. VGM: visual graph mining. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.
• Q. Cheng, D. Kaur, R. Harrison, and A. Zelikovsky. "Mapping and Filling Metabolic Pathways". RECOMB Satellite Conference on Systems Biology 2007.
• Q. Cheng, R. Harrison, and A. Zelikovsky. "Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways". Proc. of IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07).
Questions? Thanks!
Handling cycles
• Sort the pattern such that children can communicate only through the parent
• "Fix" images for some pattern vertices ⇒ interrupt communication through cycles
• Feedback vertex set F(T): $V_T - F(T)$ is acyclic
• Runtime is increased by a factor of $O(|V_G|^{|F(T)|})$
• t(v) = number of "reasonable" text images of v
• $\prod t(v) \to \min \;\approx\; \sum \log t(v) \to \min$
• 2-approximation algorithm
Software architecture of service-oriented pathway mining tool
[Diagram: data sources: distributed pathway DB (BioCyc, KEGG), genome DB, sequence alignment tools (Swiss-Prot, BLAST, ClustalW); services container: pathway modeling, comparison, storage, indexing, ……; query, cache, and indexing over the pathway database; rule-based mining DB and AI for ambiguity pairs and potential holes; Data-Control-View layer with browsers, SW, PDC, visualized outputs, and simulation; additional value service]