Shortest Superstring (SS)

Presentation Transcript


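  1. Shortest Superstring (SS)
     Let s be a shortest superstring, and let s1, s2, s3, s4, s5 be the order in which the input strings appear in s. Write overlap(si, sj) for the longest suffix of si that is also a prefix of sj, and pref(si, sj) for the part of si that precedes this overlap. Then
     s = pref(s1, s2) + pref(s2, s3) + pref(s3, s4) + pref(s4, s5) + s5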
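
The decomposition on this slide can be tried out directly. The following is a minimal Python sketch, assuming that overlap(a, b) is the longest suffix of a, shorter than both strings, that is also a prefix of b, and that no input string is a substring of another. The function names and the sample strings are illustrative and do not come from the slides.

```python
def overlap(a: str, b: str) -> str:
    """Longest suffix of a (shorter than both strings) that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return b[:k]
    return ""


def pref(a: str, b: str) -> str:
    """Part of a that precedes its overlap with b."""
    return a[: len(a) - len(overlap(a, b))]


def superstring_from_order(order: list[str]) -> str:
    """pref(s1, s2) + pref(s2, s3) + ... + pref(s_{n-1}, s_n) + s_n."""
    prefixes = (pref(a, b) for a, b in zip(order, order[1:]))
    return "".join(prefixes) + order[-1]


# Hypothetical input: three strings that overlap pairwise by two characters.
print(superstring_from_order(["abcd", "cdef", "efgh"]))  # abcdefgh
```

The order is given here; finding the order that minimizes the length is the hard part of the problem.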

  2. SS rewritten
     As before, s is a shortest superstring with the strings in the order s1, ..., s5:
     s = pref(s1, s2) + pref(s2, s3) + pref(s3, s4) + pref(s4, s5) + s5
     Close the chain into a cycle by splitting the last string as s5 = pref(s5, s1) + overlap(s5, s1):
     s = pref(s1, s2) + pref(s2, s3) + pref(s3, s4) + pref(s4, s5) + pref(s5, s1) + overlap(s5, s1)

  3. TSP ≈ SS
     TSP on a digraph with vertices s_i and distances pref(s_i, s_j):
     TSP = pref(s1, s2) + pref(s2, s3) + pref(s3, s4) + pref(s4, s5) + pref(s5, s1)
     SS = pref(s1, s2) + pref(s2, s3) + pref(s3, s4) + pref(s4, s5) + pref(s5, s1) + overlap(s5, s1)
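
The relation between the tour and the superstring can be checked numerically. This is a small Python sketch under the same assumptions as the definitions above (overlap is the longest suffix/prefix match shorter than both strings); the helper names and the three sample strings are made up for illustration.

```python
def overlap(a: str, b: str) -> str:
    """Longest suffix of a (shorter than both strings) that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return b[:k]
    return ""


def pref_len(a: str, b: str) -> int:
    """Edge weight |pref(a, b)| in the prefix graph."""
    return len(a) - len(overlap(a, b))


def tour_weight(order: list[str]) -> int:
    """Weight of the closed tour s1 -> s2 -> ... -> sn -> s1."""
    return sum(pref_len(a, b) for a, b in zip(order, order[1:] + order[:1]))


def superstring_length(order: list[str]) -> int:
    """Length of pref(s1, s2) + ... + pref(s_{n-1}, s_n) + s_n."""
    return sum(pref_len(a, b) for a, b in zip(order, order[1:])) + len(order[-1])


order = ["abcd", "cdef", "efab"]  # hypothetical input
closing = len(overlap(order[-1], order[0]))
print(tour_weight(order), closing, superstring_length(order))  # 6 2 8
```

The superstring length exceeds the tour weight by exactly the closing overlap, which is the term the later slides have to bound.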

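  4. Approximate TSP
     Take a minimum cycle cover of the digraph instead of a tour. In the example it consists of two cycles, (s1, s2, s3) and (s4, s5), with weights
     CC1 = pref(s1, s2) + pref(s2, s3) + pref(s3, s1)
     CC2 = pref(s4, s5) + pref(s5, s4)
     Each cycle is opened at its first string and closed with the corresponding overlap, overlap(s3, s1) and overlap(s5, s4), and the resulting strings are concatenated:
     approx SS = pref(s1, s2) + pref(s2, s3) + pref(s3, s1) + overlap(s3, s1) + pref(s4, s5) + pref(s5, s4) + overlap(s5, s4)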
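
The cycle-cover construction can be sketched in Python as follows. This is only an illustration: the slides do not say how the minimum cycle cover is computed, so the sketch simply tries every permutation, which is feasible only for a handful of strings (a real implementation would solve an assignment problem instead). Self-loops are allowed as cycles of length one, and all function names are assumptions.

```python
from itertools import permutations


def overlap(a: str, b: str) -> str:
    """Longest suffix of a (shorter than both strings) that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return b[:k]
    return ""


def pref_len(a: str, b: str) -> int:
    return len(a) - len(overlap(a, b))


def min_cycle_cover(strings: list[str]) -> list[list[int]]:
    """Brute force: every permutation is a cycle cover; keep the lightest."""
    n = len(strings)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(pref_len(strings[i], strings[p[i]]) for i in range(n)),
    )
    seen, cycles = set(), []
    for start in range(n):  # split the permutation into its cycles
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = best[i]
        if cycle:
            cycles.append(cycle)
    return cycles


def approx_superstring(strings: list[str]) -> str:
    """Open each cycle at its first string and concatenate the results."""
    parts = []
    for cycle in min_cycle_cover(strings):
        members = [strings[i] for i in cycle]
        chain = "".join(a[: pref_len(a, b)] for a, b in zip(members, members[1:]))
        # pref(last, first) + overlap(last, first) is just the last string.
        parts.append(chain + members[-1])
    return "".join(parts)
```

Each per-cycle string contains every string of its cycle, so the concatenation is a valid (though not necessarily shortest) superstring.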

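  5. Estimating the error size
     How big is overlap(s3, s1) compared to the weight of its cycle CC? Pretty big in the worst case. E.g.
     s1 = abcabcabc
     s2 = bcabcabca
     s3 = cabcabcab
     pref(s1, s2) = a, pref(s2, s3) = b, pref(s3, s1) = c
     The cycle has weight 3 (its prefixes spell "abc"), but overlap(s3, s1) = "abcabcab" has length 8. The string produced for this cycle is
     "abc" + overlap(s3, s1) = "abc" + "abcabcab"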
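
The numbers in this example can be reproduced with a few lines of Python. The sketch assumes the usual definitions (overlap is the longest suffix/prefix match shorter than both strings, pref is what precedes it); the function names are illustrative.

```python
def overlap(a: str, b: str) -> str:
    """Longest suffix of a (shorter than both strings) that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return b[:k]
    return ""


def pref(a: str, b: str) -> str:
    """Part of a that precedes its overlap with b."""
    return a[: len(a) - len(overlap(a, b))]


s1, s2, s3 = "abcabcabc", "bcabcabca", "cabcabcab"

prefixes = [pref(s1, s2), pref(s2, s3), pref(s3, s1)]
closing = overlap(s3, s1)
cycle_string = "".join(prefixes) + closing

print(prefixes)      # ['a', 'b', 'c']
print(closing)       # abcabcab
print(cycle_string)  # abcabcabcab
```

The closing overlap (8 characters) is far longer than the cycle weight (3), so it cannot be charged to the cycle cover directly.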

  6. Three upper bounds for overlap
     • A trivial one: overlap(s3, s1) ≤ |s1|
     • A semi-trivial one. Let r_1, r_2, ..., r_k be the first strings of the k cycles in the cover, in the order in which they appear in OPT. Then
       Σ |r_i| ≤ OPT + Σ overlap(r_i, r_{i+1})
       since Σ (|r_i| - overlap(r_i, r_{i+1})) ≤ OPT
     • A clever one: overlap(r_i, r_{i+1}) ≤ CC_i + CC_{i+1}

  7. The clever bound
     overlap(r_i, r_{i+1}) ≤ CC_i + CC_{i+1}
     If |r_i| ≤ CC_i, the bound follows trivially, since overlap(r_i, r_{i+1}) ≤ |r_i| (similarly, if |r_{i+1}| ≤ CC_{i+1} there is nothing to show).
     Else, r_i is longer than CC_i. Huh? This can only happen if r_i is periodic, since r_i is fully contained in the string obtained by going around its cycle repeatedly (similarly, r_{i+1} is periodic too).
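
The periodicity claim can be illustrated on the strings from slide 5, where the cycle weight is 3 and every string has length 9. The following Python sketch only checks the claim on that one example and proves nothing in general; has_period is a hypothetical helper name.

```python
def has_period(s: str, p: int) -> bool:
    """True if s[i] == s[i + p] for every valid i, i.e. s repeats every p characters."""
    return all(s[i] == s[i + p] for i in range(len(s) - p))


cycle_weight = 3  # |pref(s1, s2)| + |pref(s2, s3)| + |pref(s3, s1)| on slide 5
cycle_strings = ["abcabcabc", "bcabcabca", "cabcabcab"]

for s in cycle_strings:
    # Each string is longer than the cycle weight and repeats with that period.
    print(s, len(s) > cycle_weight, has_period(s, cycle_weight))
```

A string such as "abcabd", in contrast, has no period 3, so it could not be longer than a cycle of weight 3 that contains it.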

  8. The clever bound
     Now, by way of contradiction, assume we have two periodic strings r_i, r_{i+1} such that
     overlap(r_i, r_{i+1}) ≥ CC_i + CC_{i+1}
     I.e. we have two periodic strings r_i, r_{i+1}, each containing its cycle, and with high overlap.
     Intuitive idea: if two periodic things overlap for long enough, they must be contained in each other modulo shifts. If so, it is not hard to see that cycle i covers every string in cycle i+1, and hence the two cycles can be merged into one cycle of cost CC_i, which contradicts the fact that we had a minimum cycle cover.
     The intuition is right; the details are in the book.

  9. Three upper bounds for overlap
     • A trivial one: overlap(s3, s1) ≤ |s1|
     • A semi-trivial one. Let r_1, r_2, ..., r_k be the first strings of the k cycles in the cover, in the order in which they appear in OPT. Then
       Σ |r_i| ≤ OPT + Σ overlap(r_i, r_{i+1})
       since Σ (|r_i| - overlap(r_i, r_{i+1})) ≤ OPT
     • A clever one: overlap(r_i, r_{i+1}) ≤ CC_i + CC_{i+1}

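  10. Putting the three bounds together
     • overlap(s3, s1) ≤ |s1|
     • Σ |r_i| ≤ OPT + Σ overlap(r_i, r_{i+1})
     • overlap(r_i, r_{i+1}) ≤ CC_i + CC_{i+1}
     approx SS = pref(s1, s2) + pref(s2, s3) + pref(s3, s1) + overlap(s3, s1) + pref(s4, s5) + pref(s5, s4) + overlap(s5, s4)
       ≤ OPT + overlap(s3, s1) + overlap(s5, s4)      (a minimum cycle cover weighs no more than the optimal tour, which is at most OPT)
       ≤ OPT + |s1| + |s4|                            (trivial bound)
       ≤ OPT + Σ |r_i|                                (here r_1 = s1 and r_2 = s4)
       ≤ OPT + OPT + Σ overlap(r_i, r_{i+1})          (semi-trivial bound)
       ≤ OPT + OPT + Σ (CC_i + CC_{i+1})              (clever bound)
       ≤ OPT + OPT + OPT + OPT                        (each cycle weight occurs at most twice in the sum, and Σ CC_i ≤ OPT)
       = 4 OPT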
