Common Intervals in Sequences, Trees, and Graphs

Explore the conservation of gene order in bacterial genomes, study gene clusters, and analyze common intervals in permutations and trees with formalization and algorithms.

Presentation Transcript


  1. Common Intervals in Sequences, Trees, and Graphs Steffen Heber and Jiangtian Li

  2. Genome Comparison of Bacteria [Kim et al., Nat. Biotechnol., 2004]

  3. Gene Order & Function in Bacteria • Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996] • Some genes cluster together even in unrelated species. • Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]

  4. Gene Order & Function in Bacteria

  5. Gene Order & Function in Bacteria

  6. Formalization of Gene Clusters Genomes: permutations π1, π2, …, πk. Genes: numbers 1, …, n. Example: π1 = (1 2 3 4 5 6 7 8), π2 = (8 7 6 4 5 2 1 3), π3 = (3 1 2 5 8 7 6 4), π4 = (6 7 4 2 1 3 8 5)

  7. Intervals • For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n. • Any permutation of [n] has n(n-1)/2 intervals. Example: π = (1 3 5 4 2 6 7)
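
The interval definition above can be sketched in a few lines of Python, using the permutation shown on the slide. The function name `intervals` is illustrative and does not come from the presentation.

```python
def intervals(perm):
    """Return every set {perm[i], ..., perm[j]} with i < j (0-based positions)."""
    n = len(perm)
    return [frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)]

pi = [1, 3, 5, 4, 2, 6, 7]          # permutation from the slide
ivs = intervals(pi)
n = len(pi)
print(len(ivs), n * (n - 1) // 2)   # both counts are 21
```

For example, {4, 5} is an interval of this permutation (positions 3 and 4), while {1, 5} is not.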

  8. Common Intervals • For a family F = (π0, π1, …, π_{k-1}) of permutations, a common interval of F (= conserved gene cluster) is a subset S ⊆ [n], iff S is an interval in all πi. • We say S ∈ C_F. Example permutations: (1 3 5 4 2 6 7) and (2 4 5 1 3 7 6)
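
A brute-force sketch of the definition above: intersect the interval sets of all permutations. This is only an illustration with hypothetical helper names; it enumerates all O(n^2) intervals per permutation and is not the O(kn + K) algorithm mentioned later in the talk.

```python
def intervals(perm):
    """All sets {perm[i], ..., perm[j]} with i < j."""
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}

def common_intervals(perms):
    """Sets that are intervals in every permutation of the family."""
    common = intervals(perms[0])
    for p in perms[1:]:
        common &= intervals(p)
    return common

a = [1, 3, 5, 4, 2, 6, 7]   # the two permutations from the slide
b = [2, 4, 5, 1, 3, 7, 6]
common = common_intervals([a, b])
for s in sorted(common, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

For the two example permutations this yields nine common intervals, among them {1, 3}, {4, 5}, {6, 7} and {1, 2, 3, 4, 5}.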

  11. Lemma Let F = (π0, π1, …, π_{k-1}) and c, d ∈ C_F. • If c ∩ d ≠ ∅ then c ∪ d ∈ C_F. Example permutations: (1 3 5 4 2 6 7) and (2 4 5 1 3 7 6)

  12. Lemma Let F = (π0, π1, …, π_{k-1}) and c, d ∈ C_F. • If c ∩ d ≠ ∅ then c ∪ d ∈ C_F. • We call c ∪ d reducible. Example permutations: (1 3 5 4 2 6 7) and (2 4 5 1 3 7 6) [Figure labels: irreducible interval, reducible interval]
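
The lemma can be checked by brute force on the two example permutations. The sketch below also splits the common intervals into reducible and irreducible ones, under the assumption that "reducible" means a union of two overlapping common intervals, neither of which contains the other; the slide does not spell this condition out, and all helper names are hypothetical.

```python
from itertools import combinations

def intervals(perm):
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}

def common_intervals(perms):
    common = intervals(perms[0])
    for p in perms[1:]:
        common &= intervals(p)
    return common

C = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])

# Lemma: the union of two intersecting common intervals is a common interval.
overlapping = [(c, d) for c, d in combinations(C, 2) if c & d]
lemma_holds = all((c | d) in C for c, d in overlapping)

# Assumed reading of "reducible": union of two overlapping, non-nested intervals.
reducible = {c | d for c, d in overlapping if not (c <= d or d <= c)}
irreducible = C - reducible
print(lemma_holds, len(reducible), len(irreducible))
```

On this example, three of the nine common intervals are reducible and six are irreducible, consistent with the bound I < n for n = 7.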

  13. Analysis • We have K ≤ n(n-1)/2 common intervals, and I < n irreducible intervals. • Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time & O(n) space

  14. Common Intervals of Trees Let T, T1, …, Tk be trees with vertex set [n]. Definition: • S ⊆ [n] is an interval of T iff T[S] is connected and |S| > 1. • S ⊆ [n] is a common interval of T1, …, Tk iff S is an interval in all trees. • Tree intervals generalize intervals of permutations.
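
A brute-force sketch of the tree definition, assuming trees are given as edge lists on vertices 1..n. It enumerates all subsets and is therefore exponential in n; it is meant only to illustrate the definition, and the helper names are hypothetical. Encoding a permutation as a path reproduces the permutation intervals, which illustrates the last bullet.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices

def common_tree_intervals(trees, n):
    """All S with |S| > 1 that induce a connected subgraph in every tree."""
    return [frozenset(s)
            for size in range(2, n + 1)
            for s in combinations(range(1, n + 1), size)
            if all(is_connected(s, t) for t in trees)]

def path_of(perm):
    """Path whose vertices appear in the order given by the permutation."""
    return list(zip(perm, perm[1:]))

a, b = [1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]
single = common_tree_intervals([path_of(a)], 7)
both = common_tree_intervals([path_of(a), path_of(b)], 7)
print(len(single), len(both))
```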

  15. Miscellaneous Example: common intervals of T1, T2: { [2], [3], [4], [5] } • (Common) Intervals in trees are induced subtrees. [Figure: trees T1 and T2 on vertices 1 to 5]

  16. Structure of Tree Intervals • Tree intervals have the Helly property, i.e. for any family of tree intervals (T_i), i ∈ I, the assumption T_p ∩ T_q ≠ ∅ for every p, q ∈ I implies ∩_{i ∈ I} T_i ≠ ∅.

  17. Extreme Cases n-vertex stars S_{n-1}: number of non-trivial induced subtrees: 2^{n-1} - 1

  18. The Common Interval Graph • Given T = (T1, …, Tk) and corresponding common intervals C_T. The common interval graph G_T = (V, E) is the graph with V = C_T, E = {(c,d) | c, d ∈ C_T, c ∩ d ≠ ∅, c ≠ d}

  19. Example • V = [n], T = (P_n, S_{n-1}) • We have C_T = { [2], [3], …, [n] }, G_T = K(C_T), the complete graph on C_T. [Figure: path and star on vertices 1 to 4, and G_T with vertices [2], [3], [4], …, [n]]
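
The common interval graph of this example can be built directly from its definition. The sketch below hard-codes C_T = { [2], …, [n] } for n = 5 as stated on the slide and checks that every pair of vertices is joined by an edge; the function name is hypothetical.

```python
from itertools import combinations

def common_interval_graph(common):
    """Vertices are the distinct common intervals; edges join intersecting pairs."""
    vertices = list(dict.fromkeys(frozenset(c) for c in common))
    edges = {frozenset((c, d)) for c, d in combinations(vertices, 2) if c & d}
    return vertices, edges

n = 5
C_T = [set(range(1, j + 1)) for j in range(2, n + 1)]   # [2], [3], ..., [n]
vertices, edges = common_interval_graph(C_T)
m = len(vertices)
print(m, len(edges), m * (m - 1) // 2)
```

All intervals contain vertex 1, so they pairwise intersect and the edge count equals that of a complete graph on n - 1 vertices.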

  20. Common Interval Graphs cont’d A graph is called chordal, if it does not contain an induced cycle Cn on n>3 vertices. Proposition: Common interval graphs of trees are chordal graphs.

  21. Irreducible Common Intervals For a common interval c ∈ C_T and a subset V ⊆ C_T we say that V generates c, iff • for each d ∈ V, d ⊂ c • c = ∪_{d ∈ V} d • G_T[V] is connected. If there is no such V then c is irreducible. The irreducible intervals generate all common intervals. [Figure: example on vertices 1 to 7]
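
A sketch of this definition, assuming the full set of common intervals is already known. A generating set V, if one exists, lies inside a single connected component of the intersection graph on the proper sub-intervals of c, so it is enough to test whether some component has union c. This is a direct illustration of the definition, not the O(kn^2) algorithm from the talk; the function name is hypothetical.

```python
def irreducible(common):
    """Return the common intervals that no connected family of proper sub-intervals generates."""
    common = [frozenset(c) for c in common]
    result = []
    for c in common:
        subs = [d for d in common if d < c]      # proper sub-intervals of c
        seen, generated = set(), False
        for start in range(len(subs)):
            if start in seen:
                continue
            # Collect one connected component of the intersection graph on subs.
            comp, stack = [], [start]
            seen.add(start)
            while stack:
                u = stack.pop()
                comp.append(u)
                for v in range(len(subs)):
                    if v not in seen and subs[u] & subs[v]:
                        seen.add(v)
                        stack.append(v)
            if frozenset().union(*(subs[u] for u in comp)) == c:
                generated = True
                break
        if not generated:
            result.append(c)
    return result

# Nested chain [2], [3], [4], [5]: no interval is generated by smaller ones.
chain = [set(range(1, j + 1)) for j in range(2, 6)]
# All contiguous ranges of 1..4 (common intervals of two identical paths).
ranges = [set(range(i, j + 1)) for i in range(1, 5) for j in range(i + 1, 5)]
print(len(irreducible(chain)), sorted(sorted(c) for c in irreducible(ranges)))
```

In the second example only the three adjacent pairs remain irreducible; every longer range is the union of overlapping shorter ones.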

  22. Finding Irreducible Intervals • We have K < 2^{n-1} common intervals, and I < n irreducible intervals. • Find all irreducible common intervals of k trees on n vertices: O(kn^2) time & O(kn) space

  23. Finding Irreducible Intervals • Irreducible intervals are minimal common intervals containing an adjacent vertex pair. [Figure: example trees on vertices x, y, z, l, m]

  24. Graph Intervals G = (V, E), undirected, connected graph, V = [n]. S ⊆ V is an interval (convex), iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S. [Figure: two vertex sets in graphs on vertices 1 to 4, one convex, one not]
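
Convexity can be tested with breadth-first distances: a vertex w lies on a shortest u-v path exactly when d(u,w) + d(w,v) = d(u,v). The sketch below assumes a connected graph given as an adjacency dictionary; the 4-cycle used as input is an illustrative example, not necessarily the graph drawn on the slide.

```python
from collections import deque

def bfs_dist(adj, src):
    """Shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_convex(S, adj):
    """True if S contains every vertex on every shortest path between vertices of S."""
    S = set(S)
    dist = {v: bfs_dist(adj, v) for v in S}
    for u in S:
        for v in S:
            for w in adj:
                # In an undirected graph dist[v][w] equals d(w, v).
                if w not in S and dist[u][w] + dist[v][w] == dist[u][v]:
                    return False
    return True

cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # 4-cycle 1-2-3-4-1
print(is_convex({1, 2}, cycle), is_convex({2, 3}, cycle), is_convex({1, 2, 3}, cycle))
```

Here {1, 2} and {2, 3} are convex, but {1, 2, 3} is not, because the shortest path 1-4-3 leaves the set.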

  25. Common Intervals of Graphs Let G = (G1, …, Gk) be a family of connected undirected graphs with vertex set [n]. Definition: S ⊆ [n] is a common interval of G, iff S is an interval in all graphs. • Graph intervals generalize tree intervals. [Figure: example graphs G0 and G1 on vertices 1 to 4]

  26. Some Differences • The union of convex sets is NOT always convex.

  27. Some Differences • The common convex hull of an adjacent vertex pair is NOT always irreducible. [Figure: example graphs G1 and G2 on vertices 1 to 3]

  28. Finding Irreducible Graph Intervals Sketch: Given G = (G0, G1, …, G_{k-1})
      For each edge (i,j) ∈ Ei* do
        S(i,j) := {i,j}
        For each (k,l) ∈ S(i,j)
          Add vertices ‘between’ k and l to S(i,j)
      Remove reducible intervals
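
The inner loop of the sketch can be read as a closure computation: start from an edge and keep adding vertices that lie on a shortest path between two vertices already in the set, in any of the graphs, until nothing changes. The Python below is one possible interpretation of that step under this assumption; it does not cover the final removal of reducible intervals, and the two input graphs are made up for illustration.

```python
from collections import deque

def bfs_dist(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def common_hull(seed, graphs):
    """Smallest superset of `seed` that is convex in every graph of the family."""
    S = set(seed)
    dists = [{v: bfs_dist(adj, v) for v in adj} for adj in graphs]
    changed = True
    while changed:
        changed = False
        for dist in dists:
            for u in list(S):
                for v in list(S):
                    for w in dist:
                        # w is 'between' u and v if it lies on a shortest u-v path.
                        if w not in S and dist[u][w] + dist[w][v] == dist[u][v]:
                            S.add(w)
                            changed = True
    return frozenset(S)

g0 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # cycle 1-2-3-4-1 (hypothetical)
g1 = {1: [3], 3: [1, 2], 2: [3, 4], 4: [2]}         # path 1-3-2-4 (hypothetical)
print(sorted(common_hull({2, 3}, [g0, g1])), sorted(common_hull({1, 2}, [g0, g1])))
```

The edge {2, 3} is already convex in both graphs, while the hull of {1, 2} first gains vertex 3 from the path and then vertex 4 from the cycle.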

  29. Extreme Cases Permutations (identical permutations): • C ≤ n(n-1)/2, I < n. Trees (identical star-trees): • C < 2^{n-1}, I < n. Graphs (complete graphs): • C < 2^n, I ≤ n(n-1)/2
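
The three bounds on C can be checked by counting for a small n. The sketch below relies on simple characterizations rather than generic algorithms: for identical permutations (taken as the identity without loss of generality) a set is an interval iff its elements are consecutive; in a star it must contain the centre; in a complete graph every subset is convex. Only subsets of size at least 2 are counted, which is an assumption for the graph case.

```python
from itertools import combinations

def subsets(n):
    """All subsets of {1, ..., n} with at least two elements."""
    for size in range(2, n + 1):
        for s in combinations(range(1, n + 1), size):
            yield frozenset(s)

n = 5
# Identity permutation: intervals are runs of consecutive numbers.
perm_count = sum(1 for s in subsets(n) if max(s) - min(s) + 1 == len(s))
# Star with centre 1: a subset is connected iff it contains the centre.
star_count = sum(1 for s in subsets(n) if 1 in s)
# Complete graph: all vertices are adjacent, so every subset is convex.
complete_count = sum(1 for _ in subsets(n))

print(perm_count, n * (n - 1) // 2)    # 10 10
print(star_count, 2 ** (n - 1))        # 15 16
print(complete_count, 2 ** n)          # 26 32
```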

  30. Example: InterDom Database of protein domain interactions. • Gene fusions • Protein-protein interactions (DIP & BIND) • Protein complexes (PDB)

  31. Comparing Two Networks

  32. Comparing Three Networks G: Gene fusion, P: PDB, B: BIND, D: DIP

  33. Irreducible Intervals [Figure; axis label: size of irreducible interval]

  34. Biologically Meaningful? [Figure labels: regulator of chromosome condensation, protein kinase, PH domain, RAS family domain, ankyrin repeat]

  35. THANK YOU!!!
