Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Outline
• Greedy approach to motif searching
• Genome rearrangements
• Sorting by reversals
• Greedy algorithms for sorting by reversals
• Approximation algorithms
• Breakpoint reversal sort

Greedy motif searching
• Developed by Gerald Hertz and Gary Stormo in 1989
• CONSENSUS is a tool based on this greedy algorithm
• Faster than the brute-force and simple motif search algorithms
• An approximation algorithm with an unknown approximation ratio

Greedy motif search – Steps
• Input: t DNA sequences, each of length n, and l, the length of the motif to search for
• Output: a set of starting positions of l-mers, one per sequence
• Performs an exhaustive search on the first two sequences, using Hamming distance, to find the two closest l-mers
• Forms a 2 × l seed matrix from these two l-mers
• Scans each of the remaining t − 2 sequences in turn to find the l-mer that best matches the seed matrix, and adds it as the next row of the seed matrix
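
The steps above can be sketched in Python as follows. This is a minimal illustration, not the CONSENSUS implementation: the function names are hypothetical, all sequences are assumed to have the same length, start positions are 0-based, and "best matches the seed" is assumed to mean "maximizes the consensus score of the extended matrix", which the slides do not spell out.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def score(lmers):
    """Consensus score: for each column, the count of its most common letter."""
    return sum(max(col.count(c) for c in set(col)) for col in zip(*lmers))


def greedy_motif_search(dna, l):
    """Return one 0-based start position per sequence (assumed scoring, see lead-in)."""
    n = len(dna[0])
    positions = range(n - l + 1)

    # Exhaustive search over all pairs of l-mers in the first two sequences.
    s1, s2 = min(
        product(positions, positions),
        key=lambda p: hamming(dna[0][p[0]:p[0] + l], dna[1][p[1]:p[1] + l]),
    )
    starts = [s1, s2]
    seed = [dna[0][s1:s1 + l], dna[1][s2:s2 + l]]  # the 2 x l seed matrix

    # Greedily extend the seed matrix with one l-mer from each remaining sequence.
    for seq in dna[2:]:
        best = max(positions, key=lambda s: score(seed + [seq[s:s + l]]))
        starts.append(best)
        seed.append(seq[best:best + l])
    return starts


print(greedy_motif_search(["GGACGTGG", "ACGTCCCC", "TTTTACGT"], 4))
```

In this toy input the motif ACGT is planted exactly once in each sequence, so the greedy choice is unambiguous; on real data the result depends on which two sequences come first.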

Complexity
• The exhaustive search on the first two sequences requires $l(n-l+1)^2$ operations, which is $O(ln^2)$
• The sequential scan of the remaining t − 2 sequences requires $l(n-l+1)(t-2)$ operations, which is $O(lnt)$
• Thus the running time of greedy motif search is $O(ln^2 + lnt)$
• If t is small compared to n, the algorithm behaves as $O(ln^2)$
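
As a rough illustration, with hypothetical values l = 8, n = 100 and t = 10 (chosen for the arithmetic, not taken from the slides), the two terms evaluate to:

\[
l(n-l+1)^2 = 8 \cdot 93^2 = 69192, \qquad l(n-l+1)(t-2) = 8 \cdot 93 \cdot 8 = 5952
\]

Under these assumed values the search over the first two sequences accounts for most of the work, which is the regime where t is small relative to n.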

Consensus tool
• The greedy motif algorithm may miss the optimal motif
• The CONSENSUS tool keeps a large number of seed matrices
• The CONSENSUS tool can process the sequences in random order
• As a result, the CONSENSUS tool is less likely to miss the optimal motif

Genome rearrangements
• Genome rearrangements change the order of genes
• A series of rearrangements can alter the genomic architecture of a species
• Cabbage and turnip genes are 99% similar
• Fewer than 250 genomic rearrangements have occurred since the divergence of human and mouse

History of Chromosome X
[Figure; source: Rat Consortium, Nature, 2004]
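
Types of Rearrangements
• Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
• Translocation: 1 2 3 4 5 6 → 1 2 6 4 5 3 (segments 3 and 6 are exchanged)
• Fusion: two chromosomes are joined into one, giving 1 2 3 4 5 6
• Fission: the reverse of fusion; one chromosome 1 2 3 4 5 6 splits into two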

Greedy algorithms in Gene Rearrangements
• Biologists are interested in finding the smallest number of reversals in an evolutionary sequence
• This number gives a lower bound on the number of rearrangements and indicates the similarity between two species
• Two greedy algorithms are used:
  - Simple reversal sort
  - Breakpoint reversal sort
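
Gene Order
• Gene order is represented by a permutation π:
  $\pi = \pi_1 \dots \pi_{i-1} \, \pi_i \, \pi_{i+1} \dots \pi_{j-1} \, \pi_j \, \pi_{j+1} \dots \pi_n$
• A reversal ρ(i, j) reverses (flips) the elements from position i to position j in π:
  $\pi \cdot \rho(i, j) = \pi_1 \dots \pi_{i-1} \, \pi_j \, \pi_{j-1} \dots \pi_{i+1} \, \pi_i \, \pi_{j+1} \dots \pi_n$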

Reversal example
π = 1 2 3 4 5 6 7 8
ρ(3, 5) ↓
1 2 5 4 3 6 7 8
ρ(5, 6) ↓
1 2 5 4 6 3 7 8
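
The reversal operation in this example can be written as a small Python helper. This is a sketch; the name `reversal` and the choice to return a new list (rather than modify in place) are assumptions for illustration. Positions i and j are 1-based, as on the slide.

```python
def reversal(perm, i, j):
    """Return a copy of perm with the elements at 1-based positions i..j reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


perm = [1, 2, 3, 4, 5, 6, 7, 8]
step1 = reversal(perm, 3, 5)   # flips the block 3 4 5
step2 = reversal(step1, 5, 6)  # flips the block 3 6
print(step1)
print(step2)
```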

Reversal distance problem
• Goal: given two permutations, find the shortest series of reversals that transforms one into the other
• Input: permutations π and σ
• Output: a series of reversals $\rho_1, \dots, \rho_t$ transforming π into σ, such that t is minimum
• The minimum t is the reversal distance between π and σ, written d(π, σ)

Sorting by reversals
• Goal: given a permutation, find a shortest series of reversals that transforms it into the identity permutation
• Input: permutation π
• Output: a series of reversals $\rho_1, \dots, \rho_t$ transforming π into the identity permutation, such that t is minimum
• The minimum t is the reversal distance of π, written d(π)

Sorting by reversals - Greedy algorithm
• When sorting the permutation π = 1 2 3 6 4 5, the first three elements are already in order, so it makes no sense to break them up
• The length of the already sorted prefix of π is denoted prefix(π)
• Here prefix(π) = 3
• This suggests a greedy algorithm: increase prefix(π) at every step

Simple Reversal Sort – Pseudocode
• This idea leads to an algorithm that sorts by moving the i-th element to the i-th position

SimpleReversalSort(π)
  for i ← 1 to n − 1
    j ← position of element i in π (i.e., π_j = i)
    if j ≠ i
      π ← π · ρ(i, j)
      output π
    if π is the identity permutation
      return

Example – SimpleReversalSort is not optimal
• Input: 6 1 2 3 4 5
• SimpleReversalSort: 612345 → 162345 → 126345 → 123645 → 123465 → 123456
• The greedy SimpleReversalSort takes 5 steps, whereas an optimal solution takes only 2 steps: 612345 → 543216 → 123456
• The pancake flipping problem is a related example of sorting by reversals (restricted to prefix reversals)
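
A minimal Python sketch of SimpleReversalSort, run on the example input. The function name and the choice to return the list of intermediate permutations (instead of printing them) are assumptions made for illustration.

```python
def simple_reversal_sort(perm):
    """Sort a permutation of 1..n by moving element i to position i, for i = 1..n-1.

    Returns the list of permutations obtained after each reversal.
    """
    perm = list(perm)
    n = len(perm)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):
        j = perm.index(i) + 1                    # 1-based position of element i
        if j != i:
            perm[i - 1:j] = perm[i - 1:j][::-1]  # apply the reversal rho(i, j)
            steps.append(list(perm))
        if perm == identity:
            break
    return steps


steps = simple_reversal_sort([6, 1, 2, 3, 4, 5])
for p in steps:
    print(p)
print(len(steps), "reversals")
```

On this input the sketch performs 5 reversals, against the 2 reversals of the optimal solution shown above.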
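
Approximation Ratio
• These algorithms produce an approximate solution rather than an optimal one
• The approximation ratio of an algorithm A on input π is A(π) / OPT(π), the size of the solution produced by A over the size of an optimal solution
• For an algorithm A that minimizes the objective function (minimization algorithm), the ratio on inputs of size n is:
  $\max_{|\pi| = n} A(\pi) / \mathrm{OPT}(\pi)$
• For a maximization algorithm:
  $\min_{|\pi| = n} A(\pi) / \mathrm{OPT}(\pi)$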

Breakpoints – A different face of greed
• In a permutation $\pi = \pi_1 \dots \pi_n$:
  - if $\pi_i$ and $\pi_{i+1}$ are consecutive numbers, the pair is an adjacency
  - if $\pi_i$ and $\pi_{i+1}$ are not consecutive numbers, the pair is a breakpoint
• Example: π = 1 | 9 | 3 4 | 7 8 | 2 | 6 5
• Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints
• Pairs (3,4), (7,8) and (6,5) form adjacencies
• b(π): the number of breakpoints in permutation π
• Our goal is to eliminate all breakpoints, thus forming the identity permutation
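
Counting breakpoints is a single pass over neighbouring pairs. The sketch below uses a hypothetical `extend` flag: with `extend=False` it counts breakpoints exactly as in the example on this slide, and with `extend=True` it first adds 0 at the front and n + 1 at the end, which is the convention used when sorting.

```python
def count_breakpoints(perm, extend=True):
    """Count neighbouring pairs whose values are not consecutive numbers."""
    if extend:
        seq = [0] + list(perm) + [len(perm) + 1]
    else:
        seq = list(perm)
    return sum(abs(a - b) != 1 for a, b in zip(seq, seq[1:]))


example = [1, 9, 3, 4, 7, 8, 2, 6, 5]
print(count_breakpoints(example, extend=False))
print(count_breakpoints(example))
```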

Breakpoint Reversal Sort – Steps
• Put two elements $\pi_0 = 0$ and $\pi_{n+1} = n + 1$ at the ends of π
• Eliminate breakpoints using reversals
• Each reversal eliminates at most 2 breakpoints
• This implies reversal distance d(π) ≥ b(π) / 2

π = 2 3 1 4 6 5
0 2 3 1 4 6 5 7   b(π) = 5
0 1 3 2 4 6 5 7   b(π) = 4
0 1 2 3 4 6 5 7   b(π) = 2
0 1 2 3 4 5 6 7   b(π) = 0

• As stated, the algorithm is not guaranteed to terminate: a reversal that reduces b(π) may not exist at every step, so it may run forever

Pseudocode – Breakpoint Reversal Sort

BreakpointReversalSort(π)
  while b(π) > 0
    Among all possible reversals, choose reversal ρ minimizing b(π · ρ)
    π ← π · ρ(i, j)
    output π
  return
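
A brute-force Python sketch of this pseudocode, trying every reversal at each step. The function names are hypothetical, and the `max_steps` cap is an added safeguard (not part of the pseudocode) because the plain greedy rule is not guaranteed to terminate. Ties between equally good reversals are broken by taking the first one found, which is an arbitrary choice.

```python
def breakpoints(perm):
    """Number of breakpoints after extending perm with 0 and n + 1."""
    ext = [0] + perm + [len(perm) + 1]
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def breakpoint_reversal_sort(perm, max_steps=100):
    """Greedily apply the reversal that minimizes the breakpoint count.

    Returns the list of permutations visited, starting with the input.
    Stops after max_steps reversals even if the permutation is not sorted.
    """
    perm = list(perm)
    n = len(perm)
    history = [list(perm)]
    while breakpoints(perm) > 0 and len(history) <= max_steps:
        # All reversals of a block of at least two elements.
        candidates = (
            perm[:i] + perm[i:j][::-1] + perm[j:]
            for i in range(n)
            for j in range(i + 2, n + 1)
        )
        perm = min(candidates, key=breakpoints)
        history.append(perm)
    return history


history = breakpoint_reversal_sort([2, 3, 1, 4, 6, 5])
for p in history:
    print(p, breakpoints(p))
```

For 2 3 1 4 6 5 this sketch also needs three reversals, but it may take a different path from the one on the slide, since it picks a reversal that removes two breakpoints whenever one exists.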

Using strips
• A strip is an interval between two consecutive breakpoints in a permutation
• Decreasing strip: a strip of elements in decreasing order
• Increasing strip: a strip of elements in increasing order
• Example: 0 1 9 4 3 7 8 2 5 6 10
• A single-element strip can be declared either increasing or decreasing. We choose to declare such strips decreasing, with the exception of the strips containing 0 and n + 1
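
The strip decomposition of the example can be computed as sketched below. The function name and the `(kind, elements)` output format are assumptions for illustration; single-element strips are labelled decreasing unless they hold 0 or n + 1, following the convention above.

```python
def strips(perm):
    """Split the extended permutation 0, perm, n+1 into labelled strips."""
    ext = [0] + list(perm) + [len(perm) + 1]
    last = ext[-1]

    # Cut the sequence at every breakpoint.
    pieces, current = [], [ext[0]]
    for prev, cur in zip(ext, ext[1:]):
        if abs(cur - prev) == 1:
            current.append(cur)
        else:
            pieces.append(current)
            current = [cur]
    pieces.append(current)

    # Label each strip as increasing or decreasing.
    labelled = []
    for s in pieces:
        if len(s) == 1:
            kind = "increasing" if s[0] in (0, last) else "decreasing"
        else:
            kind = "increasing" if s[1] > s[0] else "decreasing"
        labelled.append((kind, s))
    return labelled


for kind, s in strips([1, 9, 4, 3, 7, 8, 2, 5, 6]):
    print(kind, s)
```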

Reducing breakpoints
• Choose the decreasing strip containing the smallest element k in π (smallest among all decreasing strips)
• Find k − 1 in the permutation
• Reverse the segment between k − 1 and k, so that k ends up next to k − 1

Example: π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9   b(π) = 5
0 1 2 3 8 7 5 6 4 9   b(π) = 4
0 1 2 3 4 6 5 7 8 9   b(π) = 2
0 1 2 3 4 5 6 7 8 9   b(π) = 0
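
One step of this rule can be sketched in Python as follows. The name `reduce_step` and the return value `None` (when no decreasing strip exists) are assumptions for illustration. An element is treated as belonging to a decreasing strip if it is part of a falling adjacency or has no adjacency on either side.

```python
def reduce_step(perm):
    """Apply one reversal that places k next to k - 1, where k is the smallest
    element lying in a decreasing strip. Returns the new permutation, or None
    if perm has no decreasing strip."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]

    def in_decreasing_strip(idx):
        x = ext[idx]
        falls = ext[idx - 1] == x + 1 or ext[idx + 1] == x - 1
        rises = ext[idx - 1] == x - 1 or ext[idx + 1] == x + 1
        return falls or not rises   # isolated elements count as decreasing

    candidates = [ext[idx] for idx in range(1, n + 1) if in_decreasing_strip(idx)]
    if not candidates:
        return None

    k = min(candidates)
    pos_k, pos_prev = ext.index(k), ext.index(k - 1)
    if pos_prev < pos_k:
        # k - 1 is to the left: flip everything after k - 1 up to and including k.
        ext[pos_prev + 1:pos_k + 1] = ext[pos_prev + 1:pos_k + 1][::-1]
    else:
        # k - 1 is to the right: flip everything after k up to and including k - 1.
        ext[pos_k + 1:pos_prev + 1] = ext[pos_k + 1:pos_prev + 1][::-1]
    return ext[1:-1]


perm = [1, 4, 6, 5, 7, 8, 3, 2]
while perm is not None:
    print(perm)
    perm = reduce_step(perm)
```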

ImprovedBreakpointReversalSort
• Sometimes a permutation does not contain any decreasing strips
• In that case an increasing strip has to be reversed so that it becomes a decreasing strip
• Taking this into consideration, we have an improved algorithm

ImprovedBreakpointReversalSort(π)
  while b(π) > 0
    if π has a decreasing strip
      Among all possible reversals, choose reversal ρ that minimizes b(π · ρ)
    else
      Choose a reversal ρ that flips an increasing strip in π
    π ← π · ρ
    output π
  return

Example – ImprovedBreakpointReversalSort
• There are no decreasing strips in π for:
  π = 0 1 2 | 5 6 7 | 3 4 | 8   b(π) = 3
  π · ρ(6, 7) = 0 1 2 | 5 6 7 | 4 3 | 8   b(π · ρ(6, 7)) = 3
• ρ(6, 7) does not change the number of breakpoints
• ρ(6, 7) creates a decreasing strip, thus guaranteeing that the next step will decrease the number of breakpoints
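
The improved algorithm can be sketched in Python as below, run on the permutation from this example. The function names are hypothetical. Two choices here are assumptions rather than part of the slides: the best reversal is found by brute force over all reversals, and when no decreasing strip exists the sketch flips the last increasing strip that contains neither 0 nor n + 1 (any such strip would do).

```python
def breakpoints(perm):
    """Number of breakpoints after extending perm with 0 and n + 1."""
    ext = [0] + perm + [len(perm) + 1]
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def strip_bounds(perm):
    """Return the extended permutation and the (start, end) index pairs
    (end exclusive) of its strips."""
    ext = [0] + perm + [len(perm) + 1]
    bounds, start = [], 0
    for idx in range(1, len(ext)):
        if abs(ext[idx] - ext[idx - 1]) != 1:
            bounds.append((start, idx))
            start = idx
    bounds.append((start, len(ext)))
    return ext, bounds


def improved_breakpoint_reversal_sort(perm):
    """Return the permutations visited, starting with the input."""
    perm = list(perm)
    n = len(perm)
    history = [list(perm)]
    while breakpoints(perm) > 0:
        ext, bounds = strip_bounds(perm)
        # Strips that contain neither 0 nor n + 1.
        inner = [(s, e) for s, e in bounds if s > 0 and e < n + 2]
        # Single-element inner strips count as decreasing.
        has_decreasing = any(e - s == 1 or ext[s] > ext[s + 1] for s, e in inner)
        if has_decreasing:
            candidates = (
                perm[:i] + perm[i:j][::-1] + perm[j:]
                for i in range(n)
                for j in range(i + 2, n + 1)
            )
            perm = min(candidates, key=breakpoints)
        else:
            s, e = inner[-1]  # indices into ext; perm index = ext index - 1
            perm = perm[:s - 1] + perm[s - 1:e - 1][::-1] + perm[e - 1:]
        history.append(perm)
    return history


history = improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4])
for p in history:
    print(p, breakpoints(p))
```

The first step of this run flips the strip 3 4, matching ρ(6, 7) above; the breakpoint count stays at 3 for that step and drops on each step after it.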

Approximation Ratio - ImprovedBreakpointReversalSort
• The approximation ratio is at most 4
• The algorithm eliminates at least one breakpoint in every two steps, so it takes at most 2b(π) steps
• Approximation ratio: 2b(π) / d(π)
• An optimal algorithm eliminates at most 2 breakpoints in every step: d(π) ≥ b(π) / 2
• Performance guarantee:
  $\frac{2b(\pi)}{d(\pi)} \le \frac{2b(\pi)}{b(\pi)/2} = 4$

References
• An Introduction to Bioinformatics Algorithms, Neil C. Jones and Pavel A. Pevzner
• http://bix.ucsd.edu/bioalgorithms/slides.php#Ch5