
Greedy Algorithms CS 6030



Presentation Transcript


  1. Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

  2. Outline • Greedy approach to Motif searching • Genome rearrangements • Sorting by Reversals • Greedy algorithms for sorting by reversals • Approximation algorithms • Breakpoint Reversal sort

  3. Greedy motif searching • Developed by Gerald Hertz and Gary Stormo in 1989 • CONSENSUS is a tool based on this greedy algorithm • Faster than the brute-force and simple motif search algorithms • An approximation algorithm with an unknown approximation ratio

  4. Greedy motif search – Pseudocode

  5. Greedy motif search – Steps • Input: t DNA sequences, each of length n, and l, the length of the motif to search for • Output: a set of starting positions of l-mers, one per sequence • Performs an exhaustive search, using Hamming distance, over the first two sequences • Forms a 2 × l seed matrix from the two closest l-mers • Scans each of the remaining t − 2 sequences for the l-mer that best matches the seed and adds it as the next row of the seed matrix
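
The steps above can be sketched in Python. This is a minimal illustration, not the CONSENSUS implementation: the function names `score` and `greedy_motif_search` are hypothetical, and the sketch assumes the consensus score (column-wise count of the most common letter) as the measure of fit. For two l-mers, maximizing that score is the same as minimizing their Hamming distance.

```python
from collections import Counter
from itertools import product


def score(lmers):
    """Consensus score: for each column, count the most common letter."""
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*lmers))


def greedy_motif_search(dna, l):
    """Return one 0-based start position per sequence (assumes t >= 2)."""
    positions = [range(len(seq) - l + 1) for seq in dna]

    # Exhaustive search over all pairs of l-mers in the first two sequences.
    s0, s1 = max(
        product(positions[0], positions[1]),
        key=lambda st: score([dna[0][st[0]:st[0] + l], dna[1][st[1]:st[1] + l]]),
    )
    starts = [s0, s1]
    seed = [dna[0][s0:s0 + l], dna[1][s1:s1 + l]]

    # Greedy scan: extend the seed matrix one sequence at a time.
    for i in range(2, len(dna)):
        best = max(positions[i], key=lambda p: score(seed + [dna[i][p:p + l]]))
        starts.append(best)
        seed.append(dna[i][best:best + l])
    return starts


print(greedy_motif_search(["TTACGTTT", "GACGTGGG", "CCCCACGT"], 4))
```

Because each later row is chosen against the rows already fixed, an unlucky seed from the first two sequences cannot be corrected afterwards, which is why the result is only approximate.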

  6. Complexity • The exhaustive search on the first two sequences requires l(n − l + 1)^2 operations, which is O(ln^2) • The sequential scan of the remaining t − 2 sequences requires l(n − l + 1)(t − 2) operations, which is O(lnt) • Thus the running time of greedy motif search is O(ln^2 + lnt) • If t is small compared to n, the algorithm behaves as O(ln^2)

  7. Consensus tool • The greedy motif algorithm may miss the optimal motif • The CONSENSUS tool saves a large number of seed matrices instead of a single one • The CONSENSUS tool can process the sequences in random order • As a result, the CONSENSUS tool is less likely to miss the optimal motif

  8. Genome rearrangements • Gene rearrangements result in a change of gene ordering • A series of gene rearrangements can alter the genomic architecture of a species • 99% similarity between cabbage and turnip genes • Fewer than 250 genomic rearrangements since the divergence of humans and mice

  9. History of Chromosome X (figure source: Rat Consortium, Nature, 2004)

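  10. Types of Rearrangements • Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6 • Translocation: chromosomes 1 2 3 and 4 5 6 → 1 2 6 and 4 5 3 • Fusion: two chromosomes that together carry 1 2 3 4 5 6 merge into the single chromosome 1 2 3 4 5 6 • Fission: the reverse of fusion, one chromosome splits into two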

  11. Greedy algorithms in Gene Rearrangements • Biologists are interested in finding the smallest number of reversals in an evolutionary sequence • This number gives a lower bound on the number of rearrangements that occurred and a measure of the similarity between two species • Two greedy algorithms are used: simple reversal sort and breakpoint reversal sort

  12. Gene Order • Gene order is represented by a permutation π: π = π_1 … π_{i-1} π_i π_{i+1} … π_{j-1} π_j π_{j+1} … π_n • Reversal ρ(i, j) reverses (flips) the elements from position i to position j in π • π · ρ(i, j) = π_1 … π_{i-1} π_j π_{j-1} … π_{i+1} π_i π_{j+1} … π_n

  13. Reversal example • π = 1 2 3 4 5 6 7 8 • ρ(3, 5) → 1 2 5 4 3 6 7 8 • ρ(5, 6) → 1 2 5 4 6 3 7 8
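
As a small illustration of the reversal operation, the sketch below reproduces the example above. The helper name `reverse_segment` is an assumption made here; positions are 1-based and inclusive, matching the slides.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [1, 2, 3, 4, 5, 6, 7, 8]
pi = reverse_segment(pi, 3, 5)
print(pi)  # [1, 2, 5, 4, 3, 6, 7, 8]
pi = reverse_segment(pi, 5, 6)
print(pi)  # [1, 2, 5, 4, 6, 3, 7, 8]
```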

  14. Reversal distance problem • Goal: Given two permutations, find the shortest series of reversals that transforms one into the other • Input: Permutations π and σ • Output: A series of reversals ρ_1, …, ρ_t transforming π into σ, such that t is minimum • t is the reversal distance between π and σ • d(π, σ) denotes the smallest possible value of t, given π and σ

  15. Sorting by reversal • Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation • Input: Permutation π • Output: A series of reversals ρ_1, …, ρ_t transforming π into the identity permutation, such that t is minimum
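
For very small permutations the minimum number of reversals can be found by brute force, which is useful for checking the greedy algorithms discussed next. The sketch below is an assumption-laden illustration (the name `reversal_distance` is hypothetical): it runs a breadth-first search over all permutations reachable by reversals, so it is only practical for short inputs.

```python
from collections import deque


def reversal_distance(perm):
    """Minimum number of reversals needed to sort perm (exhaustive BFS)."""
    start, target = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        # Try every reversal of a segment with at least two elements.
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)


print(reversal_distance([6, 1, 2, 3, 4, 5]))
```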

  16. Sorting by reversal - Greedy algorithm • When sorting the permutation π = 1 2 3 6 4 5, the first three elements are already in order, so it does not make sense to break them apart • The length of the already sorted prefix of π is denoted prefix(π) • Here prefix(π) = 3 • This suggests a greedy algorithm: increase prefix(π) at every step

  17. Simple Reversal sort – Pseudocode • A straightforward approach leads to an algorithm that sorts by moving the ith element to the ith position
      SimpleReversalSort(π)
        for i ← 1 to n – 1
          j ← position of element i in π (i.e., π_j = i)
          if j ≠ i
            π ← π · ρ(i, j)
            output π
          if π is the identity permutation
            return

  18. Example – SimpleReversalSort not optimal • Input: 6 1 2 3 4 5 • 612345 → 162345 → 126345 → 123645 → 123465 → 123456 • Greedy SimpleReversalSort takes 5 steps, whereas an optimal solution takes only 2 steps: 612345 → 543216 → 123456 • A closely related problem is the pancake flipping problem, in which only prefixes may be reversed
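
A minimal Python sketch of SimpleReversalSort, run on the input above. The function name and the choice to return the list of intermediate permutations (instead of printing them) are assumptions made for illustration.

```python
def simple_reversal_sort(perm):
    """Sort perm by moving element i to position i; return each intermediate state."""
    perm = list(perm)
    steps = []
    for i in range(1, len(perm)):          # i = 1 .. n-1
        j = perm.index(i) + 1              # 1-based position of element i
        if j != i:
            perm[i - 1:j] = perm[i - 1:j][::-1]   # apply reversal (i, j)
            steps.append(list(perm))
    return steps


for state in simple_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(state)
```

The run produces five reversals, compared with the two-reversal optimum shown in the example.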

  19. Approximation Ratio • These algorithms produce an approximate solution rather than an optimal one • The approximation ratio of an algorithm A on input π is A(π) / OPT(π), where A(π) is the value of the solution produced by A and OPT(π) is the value of an optimal solution • For an algorithm A that minimizes the objective function (minimization algorithm), the performance guarantee is max_{|π| = n} A(π) / OPT(π) • For a maximization algorithm, it is min_{|π| = n} A(π) / OPT(π)

  20. Breakpoints – A different face of greed • In a permutation π = π_1 … π_n: if π_i and π_{i+1} are consecutive numbers, the pair is an adjacency; if they are not consecutive numbers, the pair is a breakpoint • Example: π = 1 | 9 | 3 4 | 7 8 | 2 | 6 5 • Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints • Pairs (3,4), (7,8) and (6,5) form adjacencies • b(π): the number of breakpoints in permutation π • Our goal is to eliminate all breakpoints, thus forming the identity permutation
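
Counting breakpoints is a one-line scan over neighbouring pairs. The sketch below uses a hypothetical helper `count_breakpoints` and counts over the sequence exactly as given, which matches the example above.

```python
def count_breakpoints(seq):
    """Number of neighbouring pairs that are not consecutive numbers."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) != 1)


pi = [1, 9, 3, 4, 7, 8, 2, 6, 5]
print(count_breakpoints(pi))  # 5
```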

  21. Breakpoint Reversal Sort – Steps • Put two elements π_0 = 0 and π_{n+1} = n + 1 at the ends of π • Eliminate breakpoints using reversals • Each reversal eliminates at most 2 breakpoints • This implies reversal distance ≥ #breakpoints / 2 • Example: π = 2 3 1 4 6 5 • 0 2 3 1 4 6 5 7, b(π) = 5 → 0 1 3 2 4 6 5 7, b(π) = 4 → 0 1 2 3 4 6 5 7, b(π) = 2 → 0 1 2 3 4 5 6 7, b(π) = 0 • Not efficient, as it may run forever

  22. Pseudocode – Breakpoint Reversal Sort
      BreakpointReversalSort(π)
        while b(π) > 0
          among all possible reversals, choose the reversal ρ minimizing b(π · ρ)
          π ← π · ρ
          output π
        return
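
A rough Python sketch of this greedy loop follows. The names are hypothetical, and the `max_steps` cap is an addition made here (not part of the pseudocode) so the loop cannot run forever when no reversal lowers the breakpoint count.

```python
from itertools import combinations


def count_breakpoints(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) != 1)


def breakpoint_reversal_sort(perm, max_steps=100):
    """Greedily apply the reversal that minimizes the number of breakpoints."""
    seq = [0] + list(perm) + [len(perm) + 1]   # add sentinels 0 and n+1
    n = len(perm)
    steps = []
    while count_breakpoints(seq) > 0 and len(steps) < max_steps:
        # Enumerate every reversal of positions i..j that keeps the sentinels fixed.
        candidates = (
            seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]
            for i, j in combinations(range(1, n + 1), 2)
        )
        seq = min(candidates, key=count_breakpoints)
        steps.append(seq[1:-1])
    return steps


for state in breakpoint_reversal_sort([2, 3, 1, 4, 6, 5]):
    print(state)
```

Ties between equally good reversals are broken by enumeration order here, so the intermediate permutations may differ from a hand-worked example even when the number of steps is the same.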

  23. Using strips • A strip is an interval between two consecutive breakpoints in a permutation • Decreasing strip: strip of elements in decreasing order • Increasing strip: strip of elements in increasing order • Example: 0 1 | 9 | 4 3 | 7 8 | 2 | 5 6 | 10 • A single-element strip can be declared either increasing or decreasing. We choose to declare them decreasing, with the exception of the strips containing 0 and n + 1
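
The strip decomposition can be sketched as follows. `label_strips` is a hypothetical helper; it expects the extended permutation (with 0 and n + 1 already attached) and applies the convention above for single-element strips.

```python
def label_strips(seq):
    """Split an extended permutation into strips and label each one."""
    last = len(seq) - 1                     # value of the sentinel n+1
    result, start = [], 0
    for k in range(1, len(seq) + 1):
        # A strip ends at the end of the sequence or at a breakpoint.
        if k == len(seq) or abs(seq[k] - seq[k - 1]) != 1:
            block = seq[start:k]
            if len(block) == 1:
                kind = "increasing" if block[0] in (0, last) else "decreasing"
            else:
                kind = "increasing" if block[1] > block[0] else "decreasing"
            result.append((block, kind))
            start = k
    return result


for block, kind in label_strips([0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]):
    print(block, kind)
```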

  24. Reducing breakpoints • Choose the decreasing strip with the smallest element k in π • Find k − 1 in the permutation • Reverse the segment between k and k − 1 • Example: π = 1 4 6 5 7 8 3 2 • 0 1 4 6 5 7 8 3 2 9, b(π) = 5 → 0 1 2 3 8 7 5 6 4 9, b(π) = 4 → 0 1 2 3 4 6 5 7 8 9, b(π) = 2 → 0 1 2 3 4 5 6 7 8 9, b(π) = 0
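
One step of this rule can be sketched as below. The function name is hypothetical, the input is assumed to be the extended permutation, and the handling of the case where k − 1 lies to the right of k is an assumption based on the usual statement of the rule (the example above only exercises the case where k − 1 lies to the left).

```python
def reduce_breakpoints_step(seq):
    """Apply one k / k-1 reversal; return None if there is no decreasing strip."""
    last = len(seq) - 1
    decreasing, start = [], 0
    for k in range(1, len(seq) + 1):
        if k == len(seq) or abs(seq[k] - seq[k - 1]) != 1:
            block = seq[start:k]
            single = len(block) == 1
            # Single strips are decreasing unless they hold 0 or n+1.
            if (single and block[0] not in (0, last)) or (not single and block[1] < block[0]):
                decreasing.extend(block)
            start = k
    if not decreasing:
        return None
    k = min(decreasing)                      # smallest element of any decreasing strip
    pos_k, pos_prev = seq.index(k), seq.index(k - 1)
    if pos_prev < pos_k:
        lo, hi = pos_prev + 1, pos_k         # k-1 is to the left of k
    else:
        lo, hi = pos_k + 1, pos_prev         # k-1 is to the right of k
    return seq[:lo] + seq[lo:hi + 1][::-1] + seq[hi + 1:]


seq = [0, 1, 4, 6, 5, 7, 8, 3, 2, 9]
while seq is not None:
    print(seq)
    seq = reduce_breakpoints_step(seq)
```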

  25. ImprovedBreakpointReversalSort • Sometimes a permutation may not contain any decreasing strips • In that case an increasing strip has to be reversed so that it becomes a decreasing strip • Taking this into consideration, we have an improved algorithm
      ImprovedBreakpointReversalSort(π)
        while b(π) > 0
          if π has a decreasing strip
            among all possible reversals, choose the reversal ρ that minimizes b(π · ρ)
          else
            choose a reversal ρ that flips an increasing strip in π
          π ← π · ρ
          output π
        return

  26. Example – ImprovedBreakpointReversalSort • There are no decreasing strips in π for: π = 0 1 2 | 5 6 7 | 3 4 | 8, b(π) = 3 • π · ρ(6,7) = 0 1 2 | 5 6 7 | 4 3 | 8, b(π · ρ(6,7)) = 3 • ρ(6,7) does not change the number of breakpoints • ρ(6,7) creates a decreasing strip, thus guaranteeing that the next step will decrease the number of breakpoints
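
A compact sketch of ImprovedBreakpointReversalSort in Python, under stated assumptions: all function names are hypothetical, and when no decreasing strip exists the sketch flips the first increasing strip that does not contain 0, which is one arbitrary choice among several valid ones (the example above flips a different strip).

```python
def breakpoints(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) != 1)


def strips(seq):
    """Index ranges (lo, hi), inclusive, of maximal runs without breakpoints."""
    out, start = [], 0
    for k in range(1, len(seq) + 1):
        if k == len(seq) or abs(seq[k] - seq[k - 1]) != 1:
            out.append((start, k - 1))
            start = k
    return out


def is_decreasing(seq, lo, hi):
    if lo == hi:                              # single-element strip
        return seq[lo] not in (0, len(seq) - 1)
    return seq[lo] > seq[hi]


def flip(seq, lo, hi):
    return seq[:lo] + seq[lo:hi + 1][::-1] + seq[hi + 1:]


def improved_breakpoint_reversal_sort(perm):
    seq = [0] + list(perm) + [len(perm) + 1]
    history = []
    while breakpoints(seq) > 0:
        runs = strips(seq)
        if any(is_decreasing(seq, lo, hi) for lo, hi in runs):
            # Try every reversal that keeps the sentinels in place.
            candidates = [flip(seq, i, j)
                          for i in range(1, len(seq) - 1)
                          for j in range(i + 1, len(seq) - 1)]
            seq = min(candidates, key=breakpoints)
        else:
            lo, hi = runs[1]                  # first strip after the one holding 0
            seq = flip(seq, lo, hi)
        history.append(seq[1:-1])
    return history


for state in improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4]):
    print(state)
```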

  27. Approximation Ratio - ImprovedBreakpointReversalSort • The approximation ratio is 4 • The algorithm eliminates at least one breakpoint in every two steps, so it takes at most 2b(π) steps • Approximation ratio: 2b(π) / d(π) • An optimal algorithm eliminates at most 2 breakpoints in every step: d(π) ≥ b(π) / 2 • Performance guarantee: 2b(π) / d(π) ≤ 2b(π) / (b(π) / 2) = 4
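
Written as a single chain of inequalities, the argument reads as follows. Here A(π) is notation assumed for the number of reversals the improved algorithm uses on π; b(π) and d(π) are as defined in the slides.

```latex
\[
\frac{A(\pi)}{d(\pi)}
\;\le\; \frac{2\,b(\pi)}{d(\pi)}
\;\le\; \frac{2\,b(\pi)}{b(\pi)/2}
\;=\; 4 ,
\qquad\text{using } A(\pi) \le 2\,b(\pi) \text{ and } d(\pi) \ge \frac{b(\pi)}{2}.
\]
```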

  28. References • An Introduction to Bioinformatics Algorithms, Neil C. Jones and Pavel A. Pevzner • http://bix.ucsd.edu/bioalgorithms/slides.php#Ch5

  29. Questions
