Efficient Data Structures and a New Randomized Approach for Sorting Signed Permutations by Reversals

Haim Kaplan and Elad Verbin

Unsigned Sorting by Reversals • Given a permutation, find a shortest sequence of reversals that transforms it to (1 2 … n) • Example: (3 4 2 5 1)

Signed Sorting by Reversals • Given a signed permutation, find a shortest sequence of reversals that transforms it to (+1 +2 … +n); a reversal inverts both the order and the signs of the elements in the reversed segment • Example: (3 4 -2 -5 1)

Example: (3 4 -2 -5 1) → (-4 -3 -2 -5 1) → (-4 -3 -2 -1 5) → (1 2 3 4 5)
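The reversal operation used in this example can be replayed with a few lines of Python. This is a minimal sketch; the helper name `reverse_segment` and the 1-based, inclusive (i, j) convention are our own assumptions, not notation from the slides.

```python
def reverse_segment(perm, i, j):
    """Return a copy of the signed permutation `perm` with positions i..j
    (1-based, inclusive) reversed: the segment's order is inverted and
    every element in it changes sign."""
    assert 1 <= i <= j <= len(perm)
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


pi = [3, 4, -2, -5, 1]
for i, j in [(1, 2), (4, 5), (1, 4)]:  # the three reversals of the example
    pi = reverse_segment(pi, i, j)
    print(pi)
# [-4, -3, -2, -5, 1]
# [-4, -3, -2, -1, 5]
# [1, 2, 3, 4, 5]
```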

Motivation • Biology: computing large-scale evolutionary distance (most parsimonious scenario) • Heuristics for TSP

History of Unsigned SBR • 1993: Conjectured to be NP-Hard and 2-approx. algorithm, Kececioglu and Sankoff • 1997: Proven NP-Hard, Caprara • 1999: Proven MAX-SNP Hard, Berman and Karpinski • 2001: 1.375-approximation by Berman, Hannenhalli and Karpinski

History of Signed SBR • 1993: Conjectured NP-Hard, Kececioglu and Sankoff • 1995: Polynomial Algorithm, Pevzner and Hannenhalli • 1996: O(n² α(n)) Algorithm, Berman and Hannenhalli • 1997: Simpler O(n²) Algorithm, Kaplan, Shamir and Tarjan - Best to date. • 2001: Very Simple cubic solution, Bergeron

Variants • Computing just the distance (signed version) – linear time (Bader et al., 2001) • Many, many more

Sorting Signed Permutations

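The Breakpoint Graph • Transform the permutation: +x becomes xa, xb and -x becomes xb, xa; frame the result with 0 and n+1. For (+3 +4 -2 -5 +1) this gives 0 3a 3b 4a 4b 2b 2a 5b 5a 1a 1b 6 • Blue edges between adjacent vertices of neighboring elements (0–3a, 3b–4a, 4b–2b, 2a–5b, 5a–1a, 1b–6) • Red edges between consecutive elements, i.e., between xb and (x+1)a (0–1a, 1b–2a, 2b–3a, 3b–4a, 4b–5a, 5b–6)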

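Calculating the distance • Note: the breakpoint graph has a single decomposition into alternating (blue/red) cycles • dist = n+1-c+h+f, where c is the number of cycles and h, f are small parameters of the breakpoint graph (h counts hurdles; f = 1 if the permutation is a fortress, else 0) • Usually h+f=0, and h+f>0 can be eliminated in linear time; then dist = n+1-c • Goal: find a reversal that creates a cycle and keeps h+f=0 (a safe reversal) • Example: 0 3a 3b 4a 4b 2b 2a 5b 5a 1a 1b 6 has c = 3 cycles, so dist = 6 - 3 = 3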
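A short Python sketch can check the formula on the running example. It builds the vertex sequence and the blue and red edges as described on the breakpoint-graph slide, counts the alternating cycles c, and returns n+1-c. It does not compute hurdles or fortresses, so the result equals the reversal distance only under the assumption h+f=0, which holds for this example. Function names are illustrative.

```python
def breakpoint_vertices(perm):
    """Vertex sequence of the breakpoint graph: +x -> xa xb, -x -> xb xa,
    framed by 0 (right end only) and n+1 (left end only)."""
    n = len(perm)
    seq = [(0, "b")]
    for x in perm:
        seq += [(x, "a"), (x, "b")] if x > 0 else [(-x, "b"), (-x, "a")]
    seq.append((n + 1, "a"))
    return seq


def count_cycles(perm):
    """Number c of alternating blue/red cycles in the breakpoint graph."""
    seq = breakpoint_vertices(perm)
    blue, red = {}, {}
    for k in range(0, len(seq), 2):       # blue: adjacent vertices of neighboring elements
        u, v = seq[k], seq[k + 1]
        blue[u], blue[v] = v, u
    for x in range(len(perm) + 1):        # red: xb -- (x+1)a, consecutive elements
        u, v = (x, "b"), (x + 1, "a")
        red[u], red[v] = v, u
    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:              # follow blue, then red, until back at start
            w = blue[v]
            seen.update((v, w))
            v = red[w]
    return cycles


def distance_assuming_no_hurdles(perm):
    """n + 1 - c; equals the reversal distance only when h + f = 0."""
    return len(perm) + 1 - count_cycles(perm)


print(count_cycles([3, 4, -2, -5, 1]))                  # 3
print(distance_assuming_no_hurdles([3, 4, -2, -5, 1]))  # 3
```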

Oriented Edges • We consider only reversals that reverse the segment between the endpoints of a red edge. Def: such a reversal acts on the red edge • Red edges can be: Oriented (right-to-right or left-to-left) or Unoriented (left-to-right or right-to-left)

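Oriented Edges • Red edges can be: Unoriented (acting on them leaves c unchanged, Δc = 0) or Oriented (acting on them creates a new cycle, Δc = +1) • Thm: a reversal acting on a red edge creates a new cycle if and only if the edge is oriented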
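For signed permutations the orientation test reduces to a sign check. With +x written as xa xb and -x as xb xa, the red edge xb–(x+1)a joins two right ends or two left ends exactly when x and x+1 carry opposite signs (0 and n+1 count as positive). The Python sketch below uses this to list the reversals acting on oriented red edges; the function name and the 1-based, inclusive (i, j) convention are illustrative assumptions, and the segment bounds are derived from the edge conventions on the breakpoint-graph slide.

```python
def oriented_reversals(perm):
    """Distinct reversals (i, j), 1-based inclusive, that act on an oriented
    red edge xb -- (x+1)a of the signed permutation `perm`."""
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]          # 0 and n+1 count as positive
    pos = {abs(v): k for k, v in enumerate(framed)}
    result = set()
    for x in range(n + 1):
        p, q = pos[x], pos[x + 1]
        x_positive = framed[p] >= 0
        if x_positive == (framed[q] >= 0):
            continue                              # same signs: unoriented edge
        lo, hi = min(p, q), max(p, q)
        if x_positive:
            # xb and (x+1)a are both right ends: reverse strictly after lo, through hi
            result.add((lo + 1, hi))
        else:
            # xb and (x+1)a are both left ends: reverse from lo, stopping before hi
            result.add((lo, hi - 1))
    return sorted(result)


print(oriented_reversals([3, 4, -2, -5, 1]))  # [(1, 2), (3, 4), (4, 5)]
```

Two different oriented edges can induce the same reversal (here the pairs 1,2 and 5,6 both give (4, 5)), so the sketch returns a set of distinct reversals. The reversal (1, 2) is the first step of the earlier example.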

Safe reversals • Safe oriented reversal: an oriented reversal that does not increase h+f • Theorem (H-P): a permutation with h+f=0 always has a safe oriented reversal (unless it is id) • SBR algorithms iteratively find a safe reversal; KST does this in linear time (total running time O(n²)) • H-P characterized safe oriented reversals using the overlap graph

What happens if we disregard safety? • Permutations with h+f>0 are rare

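RandomWalk • [Figure: the graph of oriented reversals starting from π = (1 -4 2 -3); the children of a node are the results of its oriented reversals. Each path has fewer than n+2 steps. Red nodes have h+f>0 ("failed, but doesn't know it yet"); a node with no children is either id ("Yey!") or red ("Whoops..")] • H-P: every node with no children is either id or red • Prop: oriented reversals never decrease h+f (i.e., red nodes point only to red nodes) • Prop: if we end up at id, then the sequence was a minimal sorting sequence; otherwise we can retry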
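RandomWalk with retries can be sketched in Python as follows. It is a naive version that rescans the permutation in O(n) per step, not the faster structure from the paper; the helpers restate the sign-based orientation test in compact form, each step picks uniformly among the distinct oriented reversals, and all names plus the `max_trials` cutoff are our own assumptions. Because every oriented reversal adds exactly one cycle, a walk that ends at id has used exactly n+1-c reversals, which is why a successful trial is a minimal sorting sequence.

```python
import random


def apply_reversal(perm, i, j):
    # Reverse positions i..j (1-based, inclusive) and flip their signs.
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]


def oriented_reversals(perm):
    """Distinct reversals (i, j) acting on oriented red edges: x and x+1 of
    opposite sign, with the framing elements 0 and n+1 counted as positive."""
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    pos = {abs(v): k for k, v in enumerate(framed)}
    found = set()
    for x in range(n + 1):
        x_pos = framed[pos[x]] >= 0
        if x_pos == (framed[pos[x + 1]] >= 0):
            continue                      # unoriented edge: no new cycle
        lo, hi = sorted((pos[x], pos[x + 1]))
        # both right ends if x > 0, both left ends if x < 0
        found.add((lo + 1, hi) if x_pos else (lo, hi - 1))
    return sorted(found)


def random_walk(perm, rng):
    """One trial: apply random oriented reversals until none is left.
    Returns (final permutation, list of reversals applied)."""
    steps = []
    while True:
        choices = oriented_reversals(perm)
        if not choices:                   # id, or a red (h+f>0) dead end
            return perm, steps
        i, j = rng.choice(choices)
        perm = apply_reversal(perm, i, j)
        steps.append((i, j))


def sort_by_random_walk(perm, max_trials=1000, seed=None):
    """Retry RandomWalk until a trial ends at id.
    Returns (number of trials used, sorting sequence), or None."""
    rng = random.Random(seed)
    identity = list(range(1, len(perm) + 1))
    for trial in range(1, max_trials + 1):
        final, steps = random_walk(list(perm), rng)
        if final == identity:
            return trial, steps
    return None


print(sort_by_random_walk([3, 4, -2, -5, 1], seed=1))
```

On (2 1), which has no oriented reversal and is not id, every trial stops immediately at a red node and the function returns None.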

After how many trials will RandomWalk succeed? • Worst permutation – many • Average permutation – very few • Example: π7 = (2 4 6 -1 -3 -5 7) (permutation of Michal Ozery and Ron Shamir) [Figure: probability of success of RandomWalk on π7, with three branch points each marked "50% chance of failure"]

What is the average over all permutations of the expected number of trials until success? • Theoretically – we don’t know (but we do know that red permutations are polynomially rare) • Experimentally – 1.6, regardless of n

Behavior of RandomWalk on the average • [Figure: empirical testing of RandomWalk on random permutations, selected uniformly among permutations without unoriented components]

Implementing the Random Walk

Basic structure • Our DS is based on simple data structures of Fredman, Johnson, McGeoch & Ostheimer ('95), which are based on those of Chrobak, Szymacha & Krawczyk ('90). These structures were invented for implementing TSP heuristics • These data structures allow us to maintain a permutation under: • Reversals • Queries: … • All operations take logarithmic time.

Random Walk • Repeatedly: • Select a random oriented reversal • Perform it and update the permutation

How fast can we run one RandomWalk iteration? • Repeatedly: • Select a random oriented reversal • Perform it and update the permutation • Each step can be done in O(n) time • In the paper we show how to do it in … time, so one Random Walk takes … time

Further Questions • Why does RandomWalk work (so well)? • Are there variants of RandomWalk that work better (i.e. have good worst-case behavior – no bad cases)? • Can RandomWalk be easily and efficiently implemented?

Further Questions • Are there variants of RandomWalk that can be easily and efficiently implemented but maintain good average-case behavior (e.g., by waiving the demand for a uniform selection of an oriented reversal)? • Can we maintain safety or hurdelity too?

General Further Research • Is there a subquadratic algorithm for SBR? • What is the structure of the space of all sorting reversal sequences? (i.e. the permutation graph we saw before)

Fin.

Fredman et al.'s DS • Splay tree with the permutation in the leaves; an inorder scan gives the permutation • Reverse bits at the nodes: if a node's bit is on, the order of its subtree should be reversed, and so should the signs of its elements.

Reverse(i,j): • Splay operations must keep the invariant that the tree indeed represents the permutation [Figure: splay(j)]

Reverse(i,j): [Figure: splay(i)]

Reverse(i,j): [Figure: tree with nodes i, j, k and subtrees T1–T4 before and after the reversal; the result shows -i, -j, -k and reversed subtrees T3R, T2R]
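The slides realize Reverse(i,j) with splay operations on a tree whose leaves hold the permutation. As a hedged stand-in, the Python sketch below applies the same reverse-bit idea to an implicit treap (a randomized balanced binary tree ordered by position, with one element per node) instead of a splay tree; this swap, and every name in the code, is our own choice. A set bit means "reverse this subtree and negate its signs" and is pushed down lazily, so each reversal costs O(log n) expected time. The queries of the original structure are not reproduced.

```python
import random


class Node:
    """Treap node; `rev` is the reverse bit: if set, the order of this
    subtree must be reversed and the signs of its elements flipped."""
    __slots__ = ("value", "prio", "size", "rev", "left", "right")

    def __init__(self, value, rng):
        self.value, self.prio, self.size, self.rev = value, rng.random(), 1, False
        self.left = self.right = None


def size(t):
    return t.size if t else 0


def push(t):
    """Apply a pending reverse bit to t and hand it down to the children."""
    if t is not None and t.rev:
        t.left, t.right = t.right, t.left
        t.value = -t.value
        for child in (t.left, t.right):
            if child is not None:
                child.rev = not child.rev
        t.rev = False


def update(t):
    t.size = 1 + size(t.left) + size(t.right)


def split(t, k):
    """Split t into (first k elements, remaining elements) in inorder."""
    if t is None:
        return None, None
    push(t)
    if size(t.left) < k:
        a, b = split(t.right, k - size(t.left) - 1)
        t.right = a
        update(t)
        return t, b
    a, b = split(t.left, k)
    t.left = b
    update(t)
    return a, t


def merge(a, b):
    """Concatenate two treaps (all of a precedes all of b)."""
    if a is None or b is None:
        return a if b is None else b
    if a.prio > b.prio:
        push(a)
        a.right = merge(a.right, b)
        update(a)
        return a
    push(b)
    b.left = merge(a, b.left)
    update(b)
    return b


class SignedPermutation:
    def __init__(self, values, seed=0):
        rng = random.Random(seed)
        self.root = None
        for v in values:
            self.root = merge(self.root, Node(v, rng))

    def reverse(self, i, j):
        """Reverse positions i..j (1-based, inclusive) and flip their signs."""
        left, rest = split(self.root, i - 1)
        mid, right = split(rest, j - i + 1)
        if mid is not None:
            mid.rev = not mid.rev
        self.root = merge(merge(left, mid), right)

    def to_list(self):
        """Inorder scan: the current signed permutation."""
        out = []

        def walk(t):
            if t is not None:
                push(t)
                walk(t.left)
                out.append(t.value)
                walk(t.right)

        walk(self.root)
        return out
```

Replaying the three reversals of the earlier example (`sp = SignedPermutation([3, 4, -2, -5, 1])`, then `sp.reverse(1, 2)`, `sp.reverse(4, 5)`, `sp.reverse(1, 4)`) leaves `sp.to_list()` equal to `[1, 2, 3, 4, 5]`.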
