
Faster Sorting by Reversals Eric Tannier, Marie-France Sagot INRIA, Lyon, France

Presentation Transcript


  1. Faster Sorting by Reversals Eric Tannier, Marie-France Sagot INRIA, Lyon, France

  2. Motivations Genome Rearrangements Human Mouse

  3. Sorting by Reversals 0 7 5 3 -1 -6 -2 4 8 (HS) (MM) 0 1 2 3 4 5 6 7 8

  4. Sorting by Reversals 0 7 5 3 -1 -6 -2 4 8 (HS) 0 1 -3 -5 -7 -6 -2 4 8 (MM) 0 1 2 3 4 5 6 7 8

  5. Sorting by Reversals 0 7 5 3 -1 -6 -2 4 8 (HS) 0 1 -3 -5 -7 -6 -2 4 8 0 1 -3 -5 -4 2 6 7 8 (MM) 0 1 2 3 4 5 6 7 8

  6. Sorting by Reversals 0 7 5 3 -1 -6 -2 4 8 (HS) 0 1 -3 -5 -7 -6 -2 4 8 0 1 -3 -5 -4 2 6 7 8 0 1 -3 -2 4 5 6 7 8 (MM) 0 1 2 3 4 5 6 7 8

  7. Sorting by Reversals 0 7 5 3 -1 -6 -2 4 8 (HS) 0 1 -3 -5 -7 -6 -2 4 8 0 1 -3 -5 -4 2 6 7 8 0 1 -3 -2 4 5 6 7 8 (MM) 0 1 2 3 4 5 6 7 8
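
The scenario shown in slides 3 to 7 can be replayed with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `reverse_segment` and the index pairs in `steps` are assumptions read off the permutations printed on the slides. A signed reversal reverses the order of a segment and flips the sign of every element in it.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with perm[i..j] (inclusive) reversed and sign-flipped."""
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


# Starting permutation from the slides (labelled HS), framed by 0 and 8.
human = [0, 7, 5, 3, -1, -6, -2, 4, 8]

# Inclusive index pairs (an assumption: chosen to reproduce the slides' rows).
steps = [(1, 4), (4, 7), (3, 5), (2, 3)]

scenario = [human]
for i, j in steps:
    scenario.append(reverse_segment(scenario[-1], i, j))

for row in scenario:
    print(row)
```

Each call copies the list, so a single reversal costs time linear in the length of the permutation.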

  8. History 1995: Hannenhalli and Pevzner, first polynomial algorithm, O(n^4). 1996: Berman and Hannenhalli, complexity improvement, O(n^2 α(n)). 1997: Kaplan, Shamir and Tarjan, complexity improvement, O(n^2). 1997: Caprara, NP-completeness of the unsigned problem. 2003: Bergeron, simple presentation. 2003: Ozery-Flato and Shamir: "It is a central problem in the study of genome rearrangements whether one can obtain a subquadratic algorithm for sorting by reversals"

  9. The Breakpoint Graph 0 7 5 3 -1 -6 -2 4 8 Reality 0 -1 -2 3 4 5 -6 7 8 Desire

  10. The Breakpoint Graph 1-cycle, adjacency 4 5 2-cycle 3 -4 5 3-cycle 3 -4 -4.5 5 6 3 -4 5 6 Two 2-cycles

  11. The effect of a reversal on the cycles 0 -1 -2 3 4 5 -6 7 8 0 -1 -2 3 4 5 -6 7 8 0 7 5 3 -1 -6 -2 4 8 0 7 5 3 -1 -6 -2 4 8 0 7 -4 2 6 1 -3 -5 8 0 1 -3 -5 -7 -6 -2 4 8 0 1 2 3 -4 -5 6 7 8 0 1 -2 -3 4 -5 -6 -7 8 Non-oriented cycle Oriented cycle

  12. In the Breakpoint Graph Oriented cycle = cycle with blue edges joining elements of different signs. Component = set of cycles that do not cross any cycle outside the set. Oriented component = component with an oriented cycle. Unoriented component = component with no oriented cycle.

  13. The theorem of Hannenhalli and Pevzner d = n + 1 - c + t, where d is the minimum number of reversals, n is the size of the permutation, c is the number of cycles in the breakpoint graph, and t is the number of reversals needed to clear unoriented components

  14. The theorem of Hannenhalli and Pevzner (no unoriented component) d = n + 1 - c, where d is the minimum number of reversals, n is the size of the permutation, and c is the number of cycles in the breakpoint graph
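
To make the formula d = n + 1 - c concrete, here is a hedged sketch that counts the cycles of the breakpoint graph and evaluates the formula. It uses the standard doubling construction (each signed element x becomes two points, 2x-1 and 2x, swapped when x is negative), which the slides do not spell out, so treat it as an assumption. The names `count_cycles` and `reversal_distance_no_unoriented` are hypothetical. The input is assumed to be already framed by 0 and n + 1, and the result is the reversal distance only when there is no unoriented component (t = 0).

```python
def count_cycles(framed):
    """Count alternating cycles in the breakpoint graph of a framed signed permutation."""
    pts = []
    for x in framed:
        if x >= 0:
            pts += [2 * x - 1, 2 * x]      # positive element: left point, right point
        else:
            pts += [-2 * x, -2 * x - 1]    # negative element: points swapped
    pts = pts[1:-1]                        # drop the outer points of the two frame elements
    pos = {v: k for k, v in enumerate(pts)}

    seen = set()
    cycles = 0
    for start in range(0, len(pts), 2):
        if start in seen:
            continue
        cycles += 1
        k = start
        while k not in seen:
            partner = k ^ 1                # "reality" edge: positions 2i and 2i+1
            seen.update((k, partner))
            k = pos[pts[partner] ^ 1]      # "desire" edge: values 2v and 2v+1
    return cycles


def reversal_distance_no_unoriented(framed):
    """Evaluate d = n + 1 - c; valid only if there is no unoriented component."""
    n = len(framed) - 2                    # frame elements 0 and n + 1 are not counted
    return n + 1 - count_cycles(framed)


human = [0, 7, 5, 3, -1, -6, -2, 4, 8]
print(count_cycles(human), reversal_distance_no_unoriented(human))
```

For the example permutation this gives c = 4 and d = 4, which agrees with the four reversals used in the sorting scenario of the earlier slides.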

  15. A bad choice among oriented cycles 0 -1 -2 3 4 5 -6 7 8 0 7 5 3 -1 -6 -2 4 8 0 7 5 6 1 -3 -2 4 8 0 1 -2 -3 4 5 6 7 8

  16. Different approaches Naive: choose any oriented cycle, apply the corresponding reversal, and if it creates an unoriented component, choose another one: O(n^3). Better: test some properties on oriented cycles that cannot create an unoriented component: O(n^2). Our method: bad oriented cycles are good ones... later

  17. The algorithm B A 0 -1 -2 3 4 5 -6 7 8 C Solution : empty D

  18. The algorithm B A 0 1 -2 -3 4 5 6 7 8 C Solution : D

  19. The algorithm B A 0 1 2 3 4 5 6 7 8 Solution : D,C

  20. The algorithm B A 0 1 -2 -3 4 5 6 7 8 C Solution : (D,C)

  21. The algorithm B A 0 -1 -2 3 4 5 -6 7 8 C Solution : (D,C) D

  22. The algorithm B 0 1 -2 -3 4 -5 -6 -7 8 C Solution : A...(D,C) D

  23. The algorithm 0 1 2 -3 -4 -5 6 7 8 C Solution : A,B...(D,C) D

  24. The algorithm 0 1 2 3 4 5 6 7 8 Solution : A,B,D,C

  25. Time complexity With any classical data structure, it takes linear time to perform a reversal, so sorting takes at least quadratic time. Kaplan and Verbin (2003) invented a data structure to represent permutations, which makes it possible to pick an oriented cycle and perform a reversal in time O(sqrt(n log(n))). We use the same data structure to sort by reversals in time O(n sqrt(n log(n))), since at most n + 1 reversals are needed.

  26. The data structure 0 7 5 3 -1 -6 -2 4 8 -1 -2 5 0 4 3 -6 7 8

  27. Future work Can we do better in time complexity? Can the method give ideas to sort with several (>2) permutations (NP-hard, Caprara, 2002)? To sort by transpositions (unknown complexity)?
