270 likes | 459 Views
Sorting by reversals. Bogdan Pasaniuc Dept. of Computer Science & Engineering. Overview. Biological background Definitions Unsigned Permutations Approximation Algorithm Sorting Signed Permutations Simplified Algorithm. Mouse (X chrom.). What is the evolutionary path ?
E N D
Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering
Overview • Biological background • Definitions • Unsigned Permutations • Approximation Algorithm • Sorting Signed Permutations • Simplified Algorithm
Mouse (X chrom.) • What is the evolutionary path ? • What is the ancestor chromosome? • Chromosomes lists of genes permutation Unknown ancestor Human (X chrom.)
Mutation at chromosome level • Inversion (1 2 3 4 5 6 7) (1 4 3 2 5 6 7) • Transposition (1 2 3 4 5 6 7) (1 5 6 2 3 4 7) • Translocation (1 2 3 4 5 6 7) (1 2 3 4 5 2 3 4 6 7) • Inversions • Known as reversals • The most common • Most often reflect the differences between and within species • What is the minimum number of reversals required to transform one perm. into another? • Reversal distance good approx. for evolutionary distance
1 2 3 9 10 8 4 7 5 6 Reversals Genes (blocks) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Reversals 1 2 3 9 10 8 4 7 5 6 1, 2, 3, 8, 7, 6, 5, 4, 9, 10
Reversals 1 2 3 9 10 8 4 7 Breakpoints 5 6 1, 2, 3, 8, 7, 6, 5, 4, 9, 10
Breakpointa pair of adjacent positions (i,i+1) s. t. | i - i+1| ≠ 1 • The values i i+1are not consecutive • If | i - i+1| = 1 then the values i i+1are adjacent • Introduce 0= 0 , n+1 = n+1 • (0,1) breakpoint if 1≠ 1 • (n,n+1) breakpoint if n≠ n • A reversal affects the breakpoints only at its endpoints Any reversal can remove or induce at most 2 bkpts.
Strip A maximal run of increasing (decreasing) elements. • Identity permutation has no breakpoints and any other permutation has at least one breakpoint • Greedy at each step remove the maximum number of breakpoints. • Ф() = number of breakpoints in • While(Ф() > 0) • Choose a reversal that removes the maximum number of breakpoints. (if there is a tie favor the reversal that leaves a decreasing strip) • Greedy ends in at most Ф() steps.
Quality of approximation Lemma1:Every permutation with a decreasing strip has a reversal that removes one breakpoint. Proof: consider the decreasing strip with i being the smallest i -1 must be in an increasing strip that lies to the left or right Breakpoint that will be removed
Lemma2: has a decreasing strip. If every reversal that removes one bkpt leaves a permutation with no decreasing strips has a reversal that removes two bkpts. Proof: • consider the decreasing strip with i being the smallest increasing strip must be to the left. i • consider the decreasing strip with jbeing the largest decreasing strip containing j+1must be to the right. j
Fact 1: i andj must overlap • j must lie in i if it doesn’t then oi has the decreasing strip that contains j • i must lie in j if it doesn’t then ojhas the decreasing strip that contains i
Fact 2. i =j If i -j ≠ 0 then - if i -j contains an increasing strip ojhas a decreasing strip - if i -j contains an decreasing strip oihas a decreasing strip • Then =i = removes 2 breakpoints.
Lemma 3:Greedy solves a permutation with a decreasing strip in at most Ф() – 1 reversals • Obs:if i has no decreasing strip at step i-1 the reversal removed 2 bkpts. • we can use one reversal to create a decr. strip exists a reversal that removes at least one bkpt • Theorem1: Greedy sorts every permutation in at most Ф() reversals. • If has a decreasing strip at most Ф() -1 reversals • If has no decreasing strip every reversal induces a decreasing strip after one step we can apply lemma3 at most Ф() reversals
Corollary:Greedy is a 2-approximation algorithm • Every reversal removes at most 2 bkpts • OPT() ≥ Ф() /2 ≥ Greedy() /2 • Greedy() ≤ 2* OPT() . • Runtime #of steps O(n). At each step we need to analyze reversalsO(n2). Total runtime = O(n3). analyze only reversals that remove bkpts O(n2).
Signed permutations: reversals change the sign: (1,2,3,4,5,6,7,8,9,10) (1,2,3,-8,-7,-6,-5,-4,9,10) Problem: Given a signed perm., find the minimum length series of reversals that transforms it into the identity perm. polynomial algorithm (Hannenhalli&Pevzner ’95) relies on several intermediary constructions these constructions have been simplified first completely elementary treatment of the problem (Bergeron ’05)
Oriented pair a pair of consecutive integers with different signs (0,3,1,6,5,-2,4,7) o.p. (3,-2) and (1,-2). • o.p. reversals that create consecutive integers (3,-2) : (0,3,1,6,5,-2,4,7) (0,3,2,-5,-6,-1,4,7) (1,-2) : (0,3,1,6,5,-2,4,7) (0,3,-5,-6,-1,-2,4,7) • Oriented reversal: reversal that creates consecutive integers • Score of a reversal: # of oriented pairs it creates.
Algorithm1: As long as has an oriented pair, choose the oriented reversal that has the maximal score. output will be a permutation with positive elements. 0 and n+1 are positive; if there is a negative element there exists an o.p. Claim1: If Alg1 applies k reversals to , yielding ’ then d() = d(’) + k.
(0 2 5 4 3 6 1 7 ) (0 2 5 4 3 6 1 7 ) (0 2 5 4 3 6 1 7 ) Sorting positive perms.: - signed perm. with positive elements - circular order: 0 successor of n+1. - reduced if it does not contain consecutive elements. framed interval in : i j+1 j+2 …j+k-1i+k s.t. i < j+1 j+2 … j+k-1 < i+k (0 2 5 4 3 6 1 7 ) hurdle a framed int. that contains no shorter framed int.
Idea: create oriented pairs and then apply Algorithm1 Operations on Hurdles: • Hurdle Cutting:i j+1 j+2 …i+1…j+k-1i+k (0 1 4 3 2 5) (0 -3 -4 -1 2 5) • Hurdle Merging: i … i+k … i’ … i’…i’+k’ (0 2 5 4 3 6 1 7) • Simple hurdle if cutting it decreases the # of hurdles • Super hurdles if cutting it increases the # of hurdles (0 2 5 4 3 -6 1 7 )
Algorithm2: has 2k hurdles merge any two non-consecutive hurdles has 2k+1 hurdles cut one simple hurdle (if it has none merge any two non-consecutive) Claim2: Alg1 + Alg2 optimally sort any signed perm.
Proof of claims: • breakpoint graph • 1. each positive el x 2x-1,2x and each negative (-x) 2x,2x-1 (0 -1 3 5 4 6 -2 7) (0 2 1 5 6 9 10 7 8 11 12 4 3 13 ) arcs
Arcs oriented if they span an odd # of elements • Arc overlap graph: • Vertices -> arcs from breakpoint graph • Edges arcs overlap
Every oriented vertex corresponds to an oriented pair. • Fact2: Score of an oriented reversal (oriented vertex v) is T+U-O+1. • T= #oriented vertices. • U= #unoriented vertices adjacent to v • O= #oriented vertices adjacent to v • Oriented component if it contains an oriented v • Safe reversal does not create new unoriented components.
Theorem (Hannenhalli&Pevzner). Any sequence of oriented safe reversals is optimal. • Theorem. An oriented reversal of maximal score is safe. • claim1 holds. • Claim2 is proven in a similar manner.
J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995. • A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner Theory. 2005 • A. Caprara. Sorting by reversals is difficult. 1997 • S. Hannenhalli and Pavel Pevzner.Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999