Faster Sorting by Reversals Eric Tannier, Marie-France Sagot INRIA, Lyon, France
Motivations Genome Rearrangements Human Mouse
Sorting by Reversals
(HS) 0 7 5 3 -1 -6 -2 4 8
     0 1 -3 -5 -7 -6 -2 4 8
     0 1 -3 -5 -4 2 6 7 8
     0 1 -3 -2 4 5 6 7 8
(MM) 0 1 2 3 4 5 6 7 8
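The scenario on this slide can be replayed with a small sketch. The helper name `apply_reversal` and the position pairs in `scenario` are assumptions read off the successive lines of the slide; they are not given in the talk itself.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with positions i..j (inclusive) reversed and negated."""
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


human = [0, 7, 5, 3, -1, -6, -2, 4, 8]
# (i, j) position pairs inferred by comparing consecutive lines of the slide
scenario = [(1, 4), (4, 7), (3, 5), (2, 3)]

current = human
for i, j in scenario:
    current = apply_reversal(current, i, j)
    print(current)
# last line printed: [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Each reversal flips both the order and the signs of the chosen segment, which is why the signs matter when sorting.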
History
1995 Hannenhalli and Pevzner: first polynomial algorithm, O(n^4)
1996 Berman and Hannenhalli: complexity improvement, O(n^2 α(n))
1997 Kaplan, Shamir and Tarjan: complexity improvement, O(n^2)
1997 Caprara: NP-completeness of the unsigned problem
2003 Bergeron: simple presentation
2003 Ozery-Flato and Shamir: "It is a central problem in the study of genome rearrangements whether one can obtain a subquadratic algorithm for sorting by reversals"
The Breakpoint Graph
Reality: 0 7 5 3 -1 -6 -2 4 8
Desire: 0 -1 -2 3 4 5 -6 7 8
The Breakpoint Graph 1-cycle, adjacency 4 5 2-cycle 3 -4 5 3-cycle 3 -4 -4.5 5 6 3 -4 5 6 Two 2-cycles
The effect of a reversal on the cycles
Before (both cases):
  0 -1 -2 3 4 5 -6 7 8
  0 7 5 3 -1 -6 -2 4 8
After a reversal on a non-oriented cycle:
  0 7 -4 2 6 1 -3 -5 8
  0 1 2 3 -4 -5 6 7 8
After a reversal on an oriented cycle:
  0 1 -3 -5 -7 -6 -2 4 8
  0 1 -2 -3 4 -5 -6 -7 8
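A minimal sketch of how the two cases differ in cycle count. It assumes the usual doubling construction for the breakpoint graph (element x becomes two points, reality edges join neighbouring points in the current order, desire edges join 2k and 2k+1); the function names and the position pairs are illustrative and not taken from the talk.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with positions i..j (inclusive) reversed and negated."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def count_cycles(perm):
    """Count alternating cycles for a framed signed permutation 0 ... n+1."""
    points = [0]
    for x in perm[1:-1]:
        a = abs(x)
        points += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    points.append(2 * perm[-1] - 1)

    # Reality edges join the points found at positions 2k and 2k + 1.
    reality = {}
    for k in range(0, len(points), 2):
        u, v = points[k], points[k + 1]
        reality[u], reality[v] = v, u

    # Desire edges join the values 2k and 2k + 1, i.e. v and v ^ 1.
    seen, cycles = set(), 0
    for start in points:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = reality[v]
            seen.add(w)
            v = w ^ 1
    return cycles


before = [0, 7, 5, 3, -1, -6, -2, 4, 8]
non_oriented = apply_reversal(before, 2, 7)  # 0 7 -4 2 6 1 -3 -5 8
oriented = apply_reversal(before, 1, 4)      # 0 1 -3 -5 -7 -6 -2 4 8

print(count_cycles(before), count_cycles(non_oriented), count_cycles(oriented))
# prints: 4 4 5
```

Under these assumptions the reversal on the non-oriented cycle leaves the number of cycles unchanged, while the one on the oriented cycle creates one more cycle.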
In the Breakpoint Graph
Oriented cycle = cycle with a blue edge joining elements of different signs
Component = set of cycles not crossing any cycle outside the set
Oriented component = component with an oriented cycle
Unoriented component = component with no oriented cycle
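The sign test in the first definition is easy to sketch. The snippet below assumes the blue edges are the desire edges joining k and k+1, and simply lists those whose two endpoints carry different signs in the current permutation; `oriented_desire_edges` is a hypothetical helper name.

```python
def oriented_desire_edges(perm):
    """List the pairs (k, k + 1) whose two elements have different signs in perm."""
    positive = {abs(x): x >= 0 for x in perm}
    top = len(perm) - 1
    return [(k, k + 1) for k in range(top) if positive[k] != positive[k + 1]]


print(oriented_desire_edges([0, 7, 5, 3, -1, -6, -2, 4, 8]))
# prints: [(0, 1), (2, 3), (5, 6), (6, 7)]

print(oriented_desire_edges([0, 7, 5, 6, 1, 2, 3, 4, 8]))
# prints: []
```

A cycle containing at least one of the listed edges is oriented in the sense of the slide. The second call shows an unsorted permutation with no such edge, so none of its cycles is oriented.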
The theorem of Hannenhalli and Pevzner
d = n + 1 - c + t
d: minimum number of reversals
n: size of the permutation
c: number of cycles in the breakpoint graph
t: number of reversals to clear unoriented components
The theorem of Hannenhalli and Pevzner (no unoriented component)
d = n + 1 - c
d: minimum number of reversals
n: size of the permutation
c: number of cycles in the breakpoint graph
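As a worked instance on the running example 0 7 5 3 -1 -6 -2 4 8: assuming n counts the elements 1 to 7 (the frame elements 0 and 8 excluded) and that the breakpoint graph has c = 4 cycles (counted by hand here, not stated on the slide), the formula gives

```latex
d = n + 1 - c = 7 + 1 - 4 = 4
```

This agrees with the four reversals of the scenario shown earlier for the same permutation.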
A bad choice among oriented cycles
Before:
  0 -1 -2 3 4 5 -6 7 8
  0 7 5 3 -1 -6 -2 4 8
After:
  0 7 5 6 1 -3 -2 4 8
  0 1 -2 -3 4 5 6 7 8
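A short sketch of why this choice is bad. It replays the reversal shown on the slide, then the one follow-up reversal that repairs the remaining negative elements; the position pairs and the helper name are assumptions made for illustration.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with positions i..j (inclusive) reversed and negated."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


start = [0, 7, 5, 3, -1, -6, -2, 4, 8]
after_bad = apply_reversal(start, 3, 5)   # the reversal shown on the slide
stuck = apply_reversal(after_bad, 5, 6)   # turns -3 -2 into 2 3

print(after_bad)  # [0, 7, 5, 6, 1, -3, -2, 4, 8]
print(stuck)      # [0, 7, 5, 6, 1, 2, 3, 4, 8]
print(all(x >= 0 for x in stuck), stuck == sorted(stuck))
# prints: True False
```

After these two reversals every element is positive but the permutation is not sorted, so no edge joins elements of different signs and no oriented cycle is left to work with.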
Different approaches
Naive: choose any oriented cycle, apply the corresponding reversal, and if it creates an unoriented component, choose another one. O(n^3)
Better: test properties of oriented cycles that guarantee their reversal cannot create an unoriented component. O(n^2)
Our method: bad oriented cycles are good ones... later
The algorithm
Step  Sequence shown              Cycle labels shown  Solution
1     0 -1 -2 3 4 5 -6 7 8        A B C D             empty
2     0 1 -2 -3 4 5 6 7 8         A B C               D
3     0 1 2 3 4 5 6 7 8           A B                 D,C
4     0 1 -2 -3 4 5 6 7 8         A B C               (D,C)
5     0 -1 -2 3 4 5 -6 7 8        A B C D             (D,C)
6     0 1 -2 -3 4 -5 -6 -7 8      B C D               A...(D,C)
7     0 1 2 -3 -4 -5 6 7 8        C D                 A,B...(D,C)
8     0 1 2 3 4 5 6 7 8                               A,B,D,C
Time complexity
With any classical data structure, it takes linear time to perform a reversal, so at least quadratic time to sort.
Kaplan and Verbin (2003) invented a data structure to represent permutations, which allows picking an oriented cycle and performing a reversal in time O(sqrt(n log n)).
We use the same data structure to sort by reversals in time O(n sqrt(n log n)).
The data structure 0 7 5 3 -1 -6 -2 4 8 -1 -2 5 0 4 3 -6 7 8
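To give a feel for how a reversal can avoid touching every element, here is a heavily simplified sketch of a block decomposition with lazy "flipped" flags. It is not the structure of Kaplan and Verbin: the class name, the block size (about sqrt(n)) and the absence of rebalancing or of any search for oriented cycles are all assumptions made to keep the example short.

```python
import math


class BlockedPermutation:
    """Signed permutation stored as blocks with lazy 'flipped' flags (sketch only)."""

    def __init__(self, perm):
        size = max(1, math.isqrt(len(perm)))  # assumed block size
        # Each block is [items, flipped]; flipped means "read backwards, negated".
        self.blocks = [[list(perm[k:k + size]), False]
                       for k in range(0, len(perm), size)]

    @staticmethod
    def _read(block):
        items, flipped = block
        return [-x for x in reversed(items)] if flipped else list(items)

    def _boundary(self, pos):
        """Make sure a block starts at position pos and return its index."""
        offset = 0
        for idx, block in enumerate(self.blocks):
            length = len(block[0])
            if offset == pos:
                return idx
            if pos < offset + length:
                items = self._read(block)  # only this block is rewritten
                cut = pos - offset
                self.blocks[idx:idx + 1] = [[items[:cut], False],
                                            [items[cut:], False]]
                return idx + 1
            offset += length
        return len(self.blocks)

    def reverse(self, i, j):
        """Reverse and negate positions i..j (inclusive)."""
        lo = self._boundary(i)
        hi = self._boundary(j + 1)
        middle = self.blocks[lo:hi][::-1]  # reverse the order of whole blocks
        for block in middle:
            block[1] = not block[1]        # flip lazily instead of rewriting
        self.blocks[lo:hi] = middle

    def to_list(self):
        return [x for block in self.blocks for x in self._read(block)]


bp = BlockedPermutation([0, 7, 5, 3, -1, -6, -2, 4, 8])
for i, j in [(1, 4), (4, 7), (3, 5), (2, 3)]:
    bp.reverse(i, j)
print(bp.to_list())
# prints: [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Only the two blocks cut by the ends of the reversed interval are rewritten; the blocks in between are reordered and flagged. Without rebalancing the blocks can fragment over time, which a real implementation would have to prevent to keep the stated bound.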
Future work
Can we do better in time complexity?
Can the method give ideas to
- sort with several (>2) permutations? (NP-hard, Caprara, 2002)
- sort by transpositions? (unknown complexity)