1 / 65

Approximate Labelled Subtree Homeomorphism

Approximate Labelled Subtree Homeomorphism. Based on: “Approximate Labelled Subtree Homeomorphism” R. Y. Pinter, O.Rokhlenko, D. Tsur, M. Ziv-Ukelson “Alignment of Metabolic Pathways” R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, M. Ziv-Ukelson. The general Idea.

xylia
Download Presentation

Approximate Labelled Subtree Homeomorphism

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Approximate Labelled Subtree Homeomorphism Based on: “Approximate Labelled Subtree Homeomorphism” R. Y. Pinter, O.Rokhlenko, D. Tsur, M. Ziv-Ukelson “Alignment of Metabolic Pathways” R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, M. Ziv-Ukelson

  2. The general Idea Biological Problem Converting into terms of computer science problem Finding the solution Reverting back to Biological terms

  3. Metabolism

  4. Ag Stimuli IL-12 IL-12R Thnp Th1 TNF-a Stat 4 IFN- T-Bet IL-2 Proliferation Signal transduction

  5. Why pathways? • Metabolic and regulatory pathways have biological importance. • These pathways are evolutionary conserved.

  6. What do we want to do? • Compare one metabolic pathway of a certain organism against the same metabolic pathways in other organisms. • Compare a metabolic pathway against other metabolic pathways in the same organism.

  7. How do we do (it)?

  8. The subtree homeomorphism problem: Given a pattern tree P and a text tree T, find a subtree of T which is isomorphic to P or decide that there is no such tree. Degree 2 node can be deleted from the text tree. ?

  9. Graph homeomorphism Text Pattern Colors?

  10. Graph homeomorphism Text Pattern

  11. Text Pattern Labels (similarity) topology Graph homeomorphism

  12. Back to 2nd semester… • An unrooted tree is an undirected, acyclic, connected graph (T=(VT,ET(( • A rooted tree is a triple Tr=(VT,ET,r(where (VT,ET( is an unrooted tree, and r is some vertex in V which is called the root. The root node of the tree implies the direction for all the edges in the graph. • A multi-source tree is an acyclic, directed graph, whose underlying undirected graph is a tree.

  13. Back to 2nd semester… A tree is said to be ordered if the relative order of its subtree in each node is fix. Otherwise a tree is unordered. for “ordered” Problem:

  14. What are we allowed to do? • Taking into account both label similarity and topology. • We are permited to delete vertexes from the text tree. • We are NOT permited to delete vertexes from the pattern tree.

  15. What we “gonna” see today:

  16. Some definitions: • Let Δdenote a predefined node-to-node similarity score table. • Let D denote a predefined score for deleting a node from a tree (usually a penalty). • A mapping M from T1 to T2 is a partial one to one map from the nodes of T1 to the nodes of T2 that preserves the ancestor relations of the nodes.

  17. Our problem: Let M be a mapping from T1 to T2 . The Labelled Subtree Homeomorphic Similarity Score of M[T1,T2] is: LSH (M[T1,T2]) = D  (|T1|-|T2|) + ∑ (u,v) ∈ M Δ]u,v] Given two undirected labeled trees P and T, We want to find a mapping M and a subtree t of T, such that: LSH (M [t,P]) is maximal.

  18. Text Pattern Score Score: 2 Scoring Score: 5 Score: 2

  19. Dynamic programming P T v u y1 y2 y3 x1 x2

  20. RScore[u,v] is the maximum between two terms: • The node-to-node similarity value Δ[v,u] plus the sum of the weights of the matched edges in the maximal assignment over G. This term is only compute if c(u) ≤c(v) (otherwise: -∞). • The weight RScore[yi,u] for the comparison of u and the best scoring child yi of v, updated with the penalty for deleting v. C(u) is the number of the children of u

  21. RScore[u,v] - example w Pattern Text score matrix deletion = -1 b v a u Max {5,10-1} = 9 Max {3,9-1} = 8

  22. The assignment problem Let G be a bipartite graph G = (V = X U Y,E) with weights w (x,y) for all edges. The assignment problem is to compute a matching M (list of monogamic pairs) such that: • The size of M is maximal among all the matchings. • From all the matchings above, The sum of the weights is maximal.

  23. Solving the assignment problem • Reduction from the assignment problem to the min cost max flow problem. • We’ll construct G’ which contains G(V,E) with the following changes: • Two more vertexes: s,t • Edges from s to X and from Y to t, while w (s,x) = 0, w (y,t) =0 • The cost of the other edges in E is –w (x,y) • The capacity of all edges is 1 What is it? Among all the maximal flows we’ll choose the cheapest

  24. From assignment to matching v u y3 y2 y1 x1 x2 y2 x1 s y2 t x2 y2

  25. Time complexity analysis • Edmonds and Karp’s algorithm: O(EV*logV) • Fredman and Tarjan: O(VE + V2logV) (independent of the edges cost) • Gabow and Tarjan: O(V1/2Elog(VC) where the input costs are integers and in the range [-C,….,C] (the similarity assumption)

  26. Reminder…

  27. What did we have so far? • Motivation • “Advanced” homeomorphism: labels and topology • Scoring and deletion • Dynamic programming • Matching • Questions?

  28. The algorithm for rooted unordered trees: • Input: Rooted trees T = (VT,ET,r) and P = (VP,EP,r’ )). • Output: The root of the subtree t of T which has the highest similarity score to P, (and homeomorphic to P).

  29. Dynamic programming for each node u of P in postorder do for each node v of T in postorder do ifu is leaf then ifv is leaf then RScore(v, u) = Δ [v,u] else RScores(v,u) = ComputeScores (v,u) end if else ifLevel(u) > Level(v) thenRScore(v, u) = -∞ elseRScores(v,u) = ComputeScores (v,u) end if ; end if; end for; end for Node to node score Delete from the pattern

  30. Procedure ComputeScores (v,u) Let k denote the out-degree of node u and l denote the out degree of node v ifk >l then AssignmentScore(G) = -∞ else Construct a bipartite graph G with node bipartition X and Y such that: X is the set of children {x1…xk{ of u, Y is the set of children {y1…yl{ of v, node ui ∈ X X is connected to node vj ∈ Y via an edge whose weight w(ui,vj) is set to RScore(vj,ui). AssignmentScore(G) = max ∑ (i,j) ∈ M RScores[yj,xi] end if Find, among all children of v, the node BestChild(v,u) whose ALSH score with u is highest: BestChild(v,u) = max j=1 to l RScore(yj,u) return max {Δ [v,u]+AssignmentScore(G),BestChild(v,u)+δ} Deletion penalty

  31. Time complexity analysis Observation 1: ∑u =1 to m c(u) = m-1 ∑v =1 to n c(v) = n-1 The number of the vertexes in the pattern

  32. Time complexity analysis The weighted assignment is computed once for each pair u,v uT, vP In a bipartite graph there are c(v)+c(u) nodes and c(v)c(u) edges. Based on Fredman and Tarjan the time complexity is: O(∑u=1 to m ∑ v=1 to n)c(u)2)c(v)+c(u)c(v) log (c(v)) = (observation 1) O(∑u=1 to m c(u)2)n+c(u)n log n) = (observation 1) O(m2n + mn log n)

  33. Unrooted unordered trees: • The problem: each vertex in both the text tree and the pattern tree can be the root. • The naïve solution: choose an arbitrary node r of T to get a rooted tree. Next, for each u P compute rooted ALSH between Puand Tr. • Time complexity: O(m3n+m2n log n)

  34. 2nd try: • Select an arbitrary node r in T as the root • For each internal node in T (in post order) and for each node in P compute an “improved” matching problem u v

  35. 2nd try: • Select an arbitrary node r in T as the root • For each internal node in T (in post order) and for each node in P compute an “improved” matching problem u v

  36. 2nd try: • Select an arbitrary node r in T as the root • For each internal node in T (in post order) and for each node in P compute an “improved” matching problem u v

  37. General idea for keeping the time complexity • Find the best match between the children {x1,..,xn) of v∈T and {y1,…,ym} of u∈P. • After computing the best match and removing a node xi (which act as the parent of u) there is a way to find the optimal matching between {x1,…,xn}\xi and {y1,…,ym} in O(d(u)c(v)+c(v) log c(v)) • The total time complexity for computing all assignments between v and u: O(d(u)2c(v)+d(u)c(v) log c(v))

  38. Time complexity Observation 2: The sum of vertex degrees in an unrooted tree P is ∑u =1 to m d(u) = 2m-2 We’ve study that at Combinatorics

  39. Time complexity – continue… O((∑u =1 to m ∑v =1 tond(u)2c(v))+d(u)c(v) log c(v)) = O((∑u =1 to m d(u)2n +d(u)n log n) = O(m2n + mn log n) Observation 1 Observatin 2

  40. Up the tree… For each vertex v∈T, u∈P and xi∈ neighbors (u), UScore[v,u, xi[ is the maximal LSH between a subtree pu,xi of P and a corresponding homeomorphic subtree of tv,r if one exists. otherwise, UScore[v,u,xi] is set to -∞ A subtree in P which his root is u and the root’s parent is xi

  41. UScore[u,v,xi] is the maximum between two terms: • The node-to-node similarity value Δ[v,u] plus the sum of the weights of the matched edges in the maximal assignment over Gi. This term is only compute if d(u) - 1 c(v) (otherwise: -∞). • The weight UScore[yi,u,xi] for the comparison of u and the best scoring child yi of v, updated with the penalty for deleting v. d(u) is the degree of u

  42. And if ‘u’ is the root… • We have to compute an additional entry UScore[v,u,Φ]. • This entry represent the fact that u might be the root of P. • The root of P will be node u such that: UScore[v,u,Φ] is maximal.

  43. Multi-source graphs • DAG = Directed Acyclic Graph. • A multi-source tree is a DAG whose its underlying structure is an unrooted, unordered trees.

  44. Multi-source graph - example pattern text UScore[u,v,r’] = -∞ r’ r u v

  45. Multi-source graphs & alignment • We’ll use the algorithm for the unrooted unordered tress. • We’ll filter out subtree alignments that map together edges of conflicting direction. • We’ll split the bipartite graph G = {X U Y,E} into two different graphs: one correspond to macthing of incoming-edge neighbors of u and v and the other for matching outgoing edge neighbors.

  46. ALSH for ordered rooted trees

  47. Solving ALSH for ordered rooted trees • Maximum weighted matching problem on ordered bipartite graphs, where no edges are allowed to cross. • Given a pattern string X, a source Y, and a character to character similarity table Δ[∑X, ∑Y], find among all |X|-sized subsequences of Y the subsequence Q which is most similar to X, that is, the sum ∑i=1 to|X|Δ[Qi,Xi] is maximized.

  48. 0 0 0 ∆ -∞ ki+1 x1 -∞ x2 y1 y2 y3 lj+1 String alignment This is NOT the deletion penalty y1 y2 y3 We can’t delete nodes from the pattern tree

  49. Time complexity for rooted ordered For each node pair (v∈T,u∈P), the time complexity of the assignmentb is O(c(u)c(v)) (dynamic programming) ∑u =1 to m ∑v =1 to n O(c(v)  c(u)) = ∑v =1 to n O(m  c(v)) = O(m  n) Observation 1

More Related