Common Intervals in Sequences, Trees, and Graphs. Steffen Heber and Jiangtian Li
Genome Comparison of Bacteria [Kim et al., Nat. Biotechnol., 2004]
Gene Order & Function in Bacteria
• Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996]
• Some genes cluster together even in unrelated species.
• Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]
Formalization of Gene Clusters
• Genomes: permutations $\pi_1, \pi_2, \dots, \pi_k$
• Genes: numbers $1, \dots, n$
Example ($n = 8$, $k = 4$):
$\pi_1$ = 1 2 3 4 5 6 7 8
$\pi_2$ = 8 7 6 4 5 2 1 3
$\pi_3$ = 3 1 2 5 8 7 6 4
$\pi_4$ = 6 7 4 2 1 3 8 5
Intervals
• For a permutation $\pi$ of $[n] = \{1, 2, \dots, n\}$, an interval (= gene cluster) is a set $\{\pi(i), \pi(i+1), \dots, \pi(j)\}$ with $1 \le i < j \le n$.
• Any permutation of $[n]$ has $n(n-1)/2$ intervals.
Example: $\pi$ = 1 3 5 4 2 6 7
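A minimal Python sketch of the definition (0-based positions; the function name `intervals` is our own). Every pair of positions $i < j$ yields exactly one interval, which gives the count $n(n-1)/2$.

```python
from itertools import combinations


def intervals(perm):
    """All intervals {perm[i], ..., perm[j]} with i < j (0-based), as frozensets."""
    return [frozenset(perm[i:j + 1]) for i, j in combinations(range(len(perm)), 2)]


perm = [1, 3, 5, 4, 2, 6, 7]
ivs = intervals(perm)
print(len(ivs))  # 21 = 7 * 6 / 2
```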
Common Intervals
• For a family $F = (\pi_0, \pi_1, \dots, \pi_{k-1})$ of permutations of $[n]$, a common interval of $F$ (= conserved gene cluster) is a subset $S \subseteq [n]$ that is an interval in every $\pi_i$.
• We write $S \in C_F$.
Example family: 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6
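Here is a brute-force check of this definition on the example family 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6. It assumes the permutations are given as Python lists, and the helper `common_intervals` is hypothetical. It is a simple scan over all windows, not the $O(kn + K)$ algorithm cited in the slides. A window of the first permutation is common iff its elements occupy a contiguous block of positions in every other permutation.

```python
def common_intervals(perms):
    """Return the common intervals (size >= 2) of a family of permutations of the same set."""
    first = perms[0]
    # position of each element in every other permutation
    positions = [{x: i for i, x in enumerate(p)} for p in perms[1:]]
    n = len(first)
    result = []
    for i in range(n):
        for j in range(i + 1, n):
            window = first[i:j + 1]
            # contiguous in p  <=>  max position - min position == window length - 1
            if all(max(pos[x] for x in window) - min(pos[x] for x in window) == j - i
                   for pos in positions):
                result.append(frozenset(window))
    return result


family = [[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]]
C = common_intervals(family)
print(len(C))  # 9
print(sorted(sorted(c) for c in C))
```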
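Lemma
Let $F = (\pi_0, \pi_1, \dots, \pi_{k-1})$ and $c, d \in C_F$.
• If $c \cap d \neq \emptyset$, then $c \cup d \in C_F$.
• If, moreover, neither of $c$ and $d$ contains the other, we call $c \cup d$ reducible. Common intervals that are not reducible are irreducible.
(Figure: the permutations 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6 with an irreducible and a reducible interval marked.)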
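The lemma can be checked exhaustively on the example family 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6. The sketch below is our own. It uses a hypothetical brute-force helper `common_intervals` to verify that every union of two intersecting common intervals is again common. It then lists the unions that are reducible in the sense of the lemma.

```python
from itertools import combinations


def common_intervals(perms):
    """Brute force: windows of perms[0] that are contiguous in every other permutation."""
    first, n = perms[0], len(perms[0])
    positions = [{x: i for i, x in enumerate(p)} for p in perms[1:]]
    result = set()
    for i in range(n):
        for j in range(i + 1, n):
            window = first[i:j + 1]
            if all(max(pos[x] for x in window) - min(pos[x] for x in window) == j - i
                   for pos in positions):
                result.add(frozenset(window))
    return result


C = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])

# Lemma: c ∩ d ≠ ∅ implies that c ∪ d is again a common interval.
assert all(c | d in C for c, d in combinations(C, 2) if c & d)

# Reducible: union of two intersecting common intervals, neither containing the other.
reducible = {c | d for c, d in combinations(C, 2) if c & d and not (c <= d or d <= c)}
print(sorted(sorted(r) for r in reducible))  # [[1, 2, 3, 4, 5], [1, 3, 4, 5], [2, 4, 5]]
```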
Analysis
• There are $K \le n(n-1)/2$ common intervals and $I < n$ irreducible intervals.
• All $K$ common intervals of $k \ge 2$ permutations of $[n]$ can be found in $O(kn + K)$ time and $O(n)$ space.
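The bounds $K \le n(n-1)/2$ and $I < n$ can be spot-checked on random families. This is only an empirical sketch with brute-force helpers of our own, not the $O(kn + K)$ algorithm. It treats as irreducible every common interval that is not the union of two intersecting, non-nested common intervals.

```python
import random
from itertools import combinations


def common_intervals(perms):
    """Brute force: windows of perms[0] that are contiguous in every other permutation."""
    first, n = perms[0], len(perms[0])
    positions = [{x: i for i, x in enumerate(p)} for p in perms[1:]]
    result = set()
    for i in range(n):
        for j in range(i + 1, n):
            window = first[i:j + 1]
            if all(max(pos[x] for x in window) - min(pos[x] for x in window) == j - i
                   for pos in positions):
                result.add(frozenset(window))
    return result


def irreducible(C):
    """Common intervals that are not the union of two intersecting, non-nested common intervals."""
    reducible = {c | d for c, d in combinations(C, 2) if c & d and not (c <= d or d <= c)}
    return C - reducible


random.seed(1)
n, k = 8, 3
max_I = 0
for _ in range(300):
    perms = [random.sample(range(1, n + 1), n) for _ in range(k)]
    C = common_intervals(perms)
    assert len(C) <= n * (n - 1) // 2  # K <= n(n-1)/2
    assert len(irreducible(C)) < n     # I < n
    max_I = max(max_I, len(irreducible(C)))
print("largest I observed:", max_I)
```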
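Common Intervals of Trees
Let $T, T_1, \dots, T_k$ be trees with vertex set $[n]$.
Definition:
• $S \subseteq [n]$ is an interval of $T$ iff $T[S]$ is connected and $|S| > 1$.
• $S \subseteq [n]$ is a common interval of $T_1, \dots, T_k$ iff $S$ is an interval in all trees.
• Tree intervals generalize intervals of permutations. (Read a permutation as a path: its connected vertex sets with more than one vertex are exactly its intervals.)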
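Below is a brute-force sketch of common tree intervals. It is exponential in $n$, so it is only suitable for tiny examples. Names such as `common_tree_intervals` and `path` are our own. Reading each of the permutations 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6 as a path shows that tree intervals generalize permutation intervals: the common intervals of the two paths coincide with the common intervals of the two permutations.

```python
from itertools import combinations


def adjacency(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_connected(S, adj):
    """True iff the subgraph induced by the vertex set S is connected (DFS inside S)."""
    S = set(S)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & S:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S


def common_tree_intervals(n, trees):
    """All S ⊆ [n] with |S| > 1 that induce a connected subgraph in every tree."""
    adjs = [adjacency(n, edges) for edges in trees]
    return {frozenset(S)
            for size in range(2, n + 1)
            for S in combinations(range(1, n + 1), size)
            if all(is_connected(S, adj) for adj in adjs)}


def path(perm):
    """Edges of the path that visits the vertices in permutation order."""
    return list(zip(perm, perm[1:]))


C = common_tree_intervals(7, [path([1, 3, 5, 4, 2, 6, 7]), path([2, 4, 5, 1, 3, 7, 6])])
print(len(C))  # 9
```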
Miscellaneous
Example: the common intervals of $T_1$ and $T_2$ are $\{[2], [3], [4], [5]\}$, where $[m] = \{1, \dots, m\}$. (Figure: trees $T_1$ and $T_2$ on vertex set $[5]$.)
• (Common) intervals in trees are induced subtrees.
Structure of Tree Intervals
• Tree intervals have the Helly property: for any family of tree intervals $(T_i)_{i \in I}$, if $T_p \cap T_q \neq \emptyset$ for all $p, q \in I$, then $\bigcap_{i \in I} T_i \neq \emptyset$.
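The next sketch checks the three-set case of the Helly property exhaustively on a small tree. The tree is a hypothetical example. The check covers only triples of intervals, not arbitrary families. The triangle at the end shows why the tree structure matters: its three edges intersect pairwise but share no common vertex.

```python
from itertools import combinations

# Hypothetical example tree on vertices 1..7.
edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7)]
n = 7
adj = {v: set() for v in range(1, n + 1)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)


def is_connected(S):
    """True iff the subgraph of the tree induced by S is connected."""
    S = set(S)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] & S:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S


# All tree intervals: connected vertex sets with more than one vertex.
tree_intervals = [frozenset(S) for size in range(2, n + 1)
                  for S in combinations(range(1, n + 1), size) if is_connected(S)]

# Triples that intersect pairwise but have an empty common intersection.
violations = [(a, b, c) for a, b, c in combinations(tree_intervals, 3)
              if a & b and b & c and a & c and not (a & b & c)]
print(len(tree_intervals), len(violations))

# In a triangle (not a tree) the three edges intersect pairwise, yet share no vertex.
triangle_edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
print(frozenset.intersection(*triangle_edges))  # frozenset()
```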
Extreme Cases
• $n$-vertex stars $S_{n-1}$: number of non-trivial induced subtrees $= 2^{n-1} - 1$.
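A quick brute-force confirmation of the star count follows. It assumes that the star $S_{n-1}$ has center 1 and leaves $2, \dots, n$. A vertex set with at least two vertices induces a subtree exactly when it contains the center together with a nonempty set of leaves. That gives $2^{n-1} - 1$ such sets.

```python
from itertools import combinations


def count_nontrivial_subtrees(n, edges):
    """Count vertex sets S with |S| > 1 whose induced subgraph is connected."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for size in range(2, n + 1):
        for S in map(set, combinations(range(1, n + 1), size)):
            seen, stack = {min(S)}, [min(S)]
            while stack:
                for w in adj[stack.pop()] & S:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if seen == S:
                count += 1
    return count


for n in range(2, 11):
    star = [(1, leaf) for leaf in range(2, n + 1)]  # center 1, leaves 2..n
    assert count_nontrivial_subtrees(n, star) == 2 ** (n - 1) - 1
print("star counts match 2^(n-1) - 1 for n = 2, ..., 10")
```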
The Common Interval Graph
• Given $T = (T_1, \dots, T_k)$ and its common intervals $C_T$, the common interval graph $G_T = (V, E)$ is the graph with $V = C_T$ and $E = \{(c, d) \mid c, d \in C_T,\ c \cap d \neq \emptyset,\ c \neq d\}$.
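Building $G_T$ from a list of common intervals is a direct transcription of the definition. Here is a minimal sketch. The function name and the small input list are hypothetical.

```python
from itertools import combinations


def common_interval_graph(intervals):
    """Vertices: distinct intervals; edges: unordered pairs of distinct intervals sharing an element."""
    V = list(dict.fromkeys(frozenset(c) for c in intervals))  # deduplicate, keep order
    E = [(c, d) for c, d in combinations(V, 2) if c & d]
    return V, E


V, E = common_interval_graph([{1, 2}, {2, 3}, {3, 4}, {1, 2, 3, 4}])
print(len(V), len(E))  # 4 5
```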
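Example
• $V = [n]$, $T = (P_n, S_{n-1})$ (in the figure, $P_n$ is the path $1, 2, \dots, n$ and $S_{n-1}$ is the star centered at vertex 1).
• We have $C_T = \{[2], [3], \dots, [n]\}$ and $G_T = K(C_T)$, the complete graph on $C_T$, because every $[m]$ contains vertex 1.
(Figure: $G_T$ with vertices $[2], [3], [4], \dots, [n]$.)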
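The example can be reproduced by brute force for a small $n$. This assumes, as in the figure, that $P_n$ is the path $1, 2, \dots, n$ and $S_{n-1}$ is the star centered at vertex 1. The helper names are our own.

```python
from itertools import combinations


def adjacency(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_connected(S, adj):
    """True iff the subgraph induced by S is connected."""
    S = set(S)
    seen, stack = {min(S)}, [min(S)]
    while stack:
        for w in adj[stack.pop()] & S:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S


n = 6
path = adjacency(n, [(i, i + 1) for i in range(1, n)])
star = adjacency(n, [(1, v) for v in range(2, n + 1)])
C = [frozenset(S) for size in range(2, n + 1)
     for S in combinations(range(1, n + 1), size)
     if is_connected(S, path) and is_connected(S, star)]

prefixes = {frozenset(range(1, m + 1)) for m in range(2, n + 1)}  # [2], ..., [n]
print(set(C) == prefixes)  # True
# Every [m] contains vertex 1, so all pairs of common intervals intersect: G_T is complete.
print(all(c & d for c, d in combinations(C, 2)))  # True
```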
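Common Interval Graphs (cont'd)
A graph is called chordal if it does not contain an induced cycle $C_n$ on $n > 3$ vertices.
Proposition: Common interval graphs of trees are chordal graphs. (Every common interval induces a subtree of $T_1$, and intersection graphs of subtrees of a tree are chordal.)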
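Chordality can be tested with maximum cardinality search (Tarjan and Yannakakis): the reverse of the visit order is a perfect elimination ordering iff the graph is chordal. The sketch below is a simple, unoptimized version of that test written for illustration. It is applied first to a 4-cycle, which is not chordal. It is then applied to the common interval graph of two small hypothetical trees, which by the proposition must be chordal.

```python
from itertools import combinations


def is_chordal(adj):
    """Maximum cardinality search; the graph is chordal iff, for every vertex, its
    neighbors visited earlier form a clique (reverse visit order = perfect elimination ordering)."""
    order, weight, unvisited = [], {v: 0 for v in adj}, set(adj)
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])  # most already-visited neighbors
        order.append(v)
        unvisited.remove(v)
        for w in adj[v] & unvisited:
            weight[w] += 1
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [w for w in adj[v] if pos[w] < pos[v]]
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False
    return True


def tree_adjacency(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected(S, adj):
    """True iff the subgraph induced by S is connected."""
    S = set(S)
    seen, stack = {min(S)}, [min(S)]
    while stack:
        for w in adj[stack.pop()] & S:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S


n = 6  # two hypothetical trees on vertices 1..6
trees = [tree_adjacency(n, [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]),
         tree_adjacency(n, [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)])]
C = [frozenset(S) for size in range(2, n + 1)
     for S in combinations(range(1, n + 1), size)
     if all(connected(S, t) for t in trees)]
G_T = {c: {d for d in C if d != c and c & d} for c in C}

cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_chordal(cycle4), is_chordal(G_T))  # False True
```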
Irreducible Common Intervals
For a common interval $c \in C_T$ and a subset $V \subseteq C_T$, we say that $V$ generates $c$ iff
• $d \subset c$ for each $d \in V$,
• $c = \bigcup_{d \in V} d$, and
• $G_T[V]$ is connected.
If no such $V$ exists, then $c$ is irreducible. The irreducible intervals generate all common intervals.
(Figure: example on vertices 1, ..., 7.)
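The definition suggests a direct test. Take the common intervals strictly contained in $c$: $c$ is reducible iff some connected component of their intersection graph has union $c$. Whole components suffice, because a generating set $V$ lies inside one component, and that component's union is squeezed between $c$ and $c$. The sketch below uses our own helper names. It applies the test to the common intervals of the two paths read from the permutations 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6; those intervals were obtained by brute force.

```python
def component_unions(sets):
    """Union of each connected component of the intersection graph on `sets`."""
    remaining, unions = list(sets), []
    while remaining:
        merged = set(remaining.pop())
        grew = True
        while grew:
            grew = False
            for d in list(remaining):
                if merged & d:  # d meets some member of the current component
                    merged |= d
                    remaining.remove(d)
                    grew = True
        unions.append(frozenset(merged))
    return unions


def irreducible(C):
    """c is irreducible iff no component of {d in C : d ⊂ c} covers c."""
    return {c for c in C if c not in component_unions([d for d in C if d < c])}


C = {frozenset(s) for s in [{1, 3}, {4, 5}, {2, 4}, {6, 7}, {1, 3, 5}, {2, 4, 5},
                             {1, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6, 7}]}
print(sorted(sorted(c) for c in irreducible(C)))
# [[1, 2, 3, 4, 5, 6, 7], [1, 3], [1, 3, 5], [2, 4], [4, 5], [6, 7]]
```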
Finding Irreducible Intervals
• There are $K < 2^{n-1}$ common intervals and $I < n$ irreducible intervals.
• All irreducible common intervals of $k$ trees on $n$ vertices can be found in $O(kn^2)$ time and $O(kn)$ space.
Finding Irreducible Intervals (cont'd)
• Irreducible intervals are minimal common intervals containing an adjacent vertex pair.
(Figure: two trees on vertices x, y, z, l, m.)
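One way to compute the minimal common interval containing an adjacent pair $\{x, y\}$ is a closure. Keep adding, in every tree, the vertices on the path between two vertices already in the set, until nothing changes. The minimal interval is unique because intersections of subtrees of a tree are again subtrees. The sketch below is our own illustration, not necessarily the authors' $O(kn^2)$ procedure. It reads the permutations 1 3 5 4 2 6 7 and 2 4 5 1 3 7 6 as paths. In this example, the closures of the adjacent pairs of the first path are exactly its irreducible intervals.

```python
from collections import deque


def adjacency(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def tree_path(adj, u, v):
    """Vertices on the unique u-v path in a tree (BFS from u, then walk back from v)."""
    parent, queue = {u: None}, deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path


def closure(pair, adjs):
    """Smallest vertex set containing `pair` that is connected in every tree."""
    S, changed = set(pair), True
    while changed:
        changed = False
        for adj in adjs:
            for u in list(S):
                for v in list(S):
                    for w in tree_path(adj, u, v):
                        if w not in S:
                            S.add(w)
                            changed = True
    return frozenset(S)


pi0, pi1 = [1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]
trees = [adjacency(7, list(zip(p, p[1:]))) for p in (pi0, pi1)]
minimal = {closure(edge, trees) for edge in zip(pi0, pi0[1:])}
print(sorted(sorted(c) for c in minimal))
# [[1, 2, 3, 4, 5, 6, 7], [1, 3], [1, 3, 5], [2, 4], [4, 5], [6, 7]]
```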
Graph Intervals
Let $G = (V, E)$ be an undirected, connected graph with $V = [n]$. $S \subseteq V$ is an interval (convex) iff the induced subgraph $G[S]$ is connected and contains every shortest path whose end vertices are in $S$.
(Figure: a convex vertex set and one that is NOT convex, on vertices 1, ..., 4.)
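Convexity can be tested with breadth-first-search distances: a vertex $w$ lies on some shortest $u$-$v$ path iff $d(u,w) + d(w,v) = d(u,v)$. Here is a minimal sketch. It assumes an unweighted connected graph given as an adjacency dict. The 4-cycle is a hypothetical example, because the slide's figure is not recoverable.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    dist, queue = {source: 0}, deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def is_convex(S, adj):
    """S is convex iff every vertex on a shortest path between two vertices of S is in S.
    In a connected graph this also makes G[S] connected, so no separate check is needed."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    return all(w in S
               for u, v in combinations(S, 2)
               for w in adj
               if dist[u][w] + dist[w][v] == dist[u][v])


cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}  # 1-2-3-4-1
print(is_convex({1, 2}, cycle4))  # True
print(is_convex({1, 3}, cycle4))  # False: both 2 and 4 lie on shortest 1-3 paths
```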
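Common Intervals of Graphs
Let $G = (G_1, \dots, G_k)$ be a family of connected undirected graphs with vertex set $[n]$.
Definition: $S \subseteq [n]$ is a common interval of $G$ iff $S$ is an interval in all graphs.
• Graph intervals generalize tree intervals. (In a tree, the unique path between two vertices is the shortest one, so connected vertex sets are convex.)
(Figure: graphs $G_0$ and $G_1$ on vertices 1, ..., 4.)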
Some Differences
• Unlike in permutations and trees, the union of two intersecting convex sets is NOT always convex.
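Here is a concrete instance on the 4-cycle 1-2-3-4-1, a hypothetical example. $\{1,2\}$ and $\{2,3\}$ are convex and intersect in vertex 2. Their union $\{1,2,3\}$ is not convex, because the shortest path 1-4-3 leaves it. On the path 1-2-3-4, which is a tree, the same union is convex. The convexity test uses BFS distances: $w$ lies on a shortest $u$-$v$ path iff $d(u,w) + d(w,v) = d(u,v)$.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    dist, queue = {source: 0}, deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def is_convex(S, adj):
    """True iff S contains every vertex on a shortest path between two of its vertices."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    return all(w in S
               for u, v in combinations(S, 2)
               for w in adj
               if dist[u][w] + dist[w][v] == dist[u][v])


cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}  # 1-2-3-4-1
path4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}        # 1-2-3-4 (a tree)
a, b = {1, 2}, {2, 3}
print(is_convex(a, cycle4), is_convex(b, cycle4), is_convex(a | b, cycle4))  # True True False
print(is_convex(a | b, path4))  # True
```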
Some Differences (cont'd)
• The common convex hull of an adjacent vertex pair is NOT always irreducible.
(Figure: graphs $G_1$ and $G_2$ on vertices 1, 2, 3.)
Finding Irreducible Graph Intervals
Sketch: Given $G = (G_0, G_1, \dots, G_{k-1})$:
For each edge $(i, j) \in E_{i^*}$ do
    $S(i,j) := \{i, j\}$
    For each pair of vertices $k, l \in S(i,j)$
        Add the vertices 'between' $k$ and $l$ to $S(i,j)$
Remove reducible intervals
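The following is a direct Python reading of the sketch, under several assumptions of our own:
• $E_{i^*}$ is taken to be the edge set of the first graph $G_0$.
• "Between" $k$ and $l$ means on a shortest $k$-$l$ path in some $G_i$.
• The growth of $S(i,j)$ is iterated until it stabilizes.
• As the sketch implies, every irreducible interval occurs among the hulls.

Under the last assumption, a hull is removed as reducible when a connected group of strictly smaller hulls covers it. This is an illustration only, not the authors' implementation, and it makes no attempt at efficiency.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    dist, queue = {source: 0}, deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def adjacency(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def component_unions(sets):
    """Union of each connected component of the intersection graph on `sets`."""
    remaining, unions = list(sets), []
    while remaining:
        merged, grew = set(remaining.pop()), True
        while grew:
            grew = False
            for d in list(remaining):
                if merged & d:
                    merged |= d
                    remaining.remove(d)
                    grew = True
        unions.append(frozenset(merged))
    return unions


def hull(pair, dists, n):
    """Grow {i, j} by vertices on a shortest path (in any graph) between two members."""
    S, changed = set(pair), True
    while changed:
        changed = False
        for dist in dists:
            for k, l in combinations(list(S), 2):
                for w in range(1, n + 1):
                    if w not in S and dist[k][w] + dist[w][l] == dist[k][l]:
                        S.add(w)
                        changed = True
    return frozenset(S)


def irreducible_graph_intervals(n, edge_lists):
    adjs = [adjacency(n, edges) for edges in edge_lists]
    dists = [{v: bfs_distances(adj, v) for v in adj} for adj in adjs]
    hulls = {hull(e, dists, n) for e in edge_lists[0]}  # E_{i*} := edges of G_0 (assumption)
    return {c for c in hulls if c not in component_unions([d for d in hulls if d < c])}


k4 = list(combinations(range(1, 5), 2))
print(len(irreducible_graph_intervals(4, [k4, k4])))  # 6 = 4 * 3 / 2
pi0, pi1 = [1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]
paths = [list(zip(p, p[1:])) for p in (pi0, pi1)]
print(sorted(sorted(c) for c in irreducible_graph_intervals(7, paths)))
# [[1, 2, 3, 4, 5, 6, 7], [1, 3], [1, 3, 5], [2, 4], [4, 5], [6, 7]]
```

On complete graphs every edge is its own hull, which matches the extreme case $I = n(n-1)/2$. On paths, the result coincides with the irreducible intervals of the corresponding permutations.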
Extreme Cases
• Permutations (identical permutations): $C = n(n-1)/2$, $I < n$
• Trees (identical star-trees): $C < 2^{n-1}$, $I < n$
• Graphs (complete graphs): $C < 2^n$, $I = n(n-1)/2$
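Two of the extreme cases are easy to confirm by brute force for small $n$; the sketch is our own. For identical permutations, every window is common and only the $n-1$ adjacent pairs are irreducible. In a complete graph every vertex set is convex, so there are $2^n - n - 1 < 2^n$ common intervals with at least two vertices, and exactly the $n(n-1)/2$ vertex pairs are irreducible. A set counts as reducible when the strictly smaller common intervals inside it have a connected component whose union is the whole set.

```python
from itertools import combinations


def component_unions(sets):
    """Union of each connected component of the intersection graph on `sets`."""
    remaining, unions = list(sets), []
    while remaining:
        merged, grew = set(remaining.pop()), True
        while grew:
            grew = False
            for d in list(remaining):
                if merged & d:
                    merged |= d
                    remaining.remove(d)
                    grew = True
        unions.append(frozenset(merged))
    return unions


def irreducible(C):
    """Common intervals not covered by a connected group of strictly smaller ones."""
    return {c for c in C if c not in component_unions([d for d in C if d < c])}


n = 6
# Identical permutations: the common intervals are all windows of 1..n with >= 2 elements.
perm = list(range(1, n + 1))
windows = {frozenset(perm[i:j + 1]) for i, j in combinations(range(n), 2)}
print(len(windows), len(irreducible(windows)))  # 15 5

# Complete graph K_n: all distances are 1, so every vertex set is convex.
all_sets = {frozenset(S) for size in range(2, n + 1) for S in combinations(range(1, n + 1), size)}
print(len(all_sets), len(irreducible(all_sets)))  # 57 15
```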
Example: InterDom
Database of protein domain interactions.
• Gene fusions
• Protein-protein interactions (DIP & BIND)
• Protein complexes (PDB)
Comparing Three Networks
G: gene fusion, P: PDB, B: BIND, D: DIP
Irreducible Intervals
(Figure; x-axis: size of irreducible interval.)
Biologically Meaningful?
(Figure labels: regulator of chromosome condensation, protein kinase, PH domain, RAS family domain, ankyrin repeat.)