
SMAWK


Presentation Transcript


  1. SMAWK

  2. Revise

  3. Global alignment (Revise) • Alignment graph for S = aacgacga, T = ctacgaga • V(i,j) = max { V(i-1,j-1) + δ(S[i], T[j]), V(i-1,j) + δ(S[i], -), V(i,j-1) + δ(-, T[j]) } • Complexity: O(n²) [Figure: alignment graph for S and T]
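
The recurrence on this slide can be sketched as a plain quadratic-time table fill. This is a minimal sketch, not the presenter's code: the names `global_alignment_score` and `simple_delta`, the boundary rows (a prefix aligned entirely against gaps), and the +1/-1 scoring are assumptions made for illustration.

```python
def global_alignment_score(s, t, delta):
    """Fill V(i, j) with the three-way max recurrence; '-' denotes a gap."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary conditions: a prefix aligned only against gaps.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + delta(s[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + delta("-", t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + delta(s[i - 1], t[j - 1]),  # align S[i] with T[j]
                V[i - 1][j] + delta(s[i - 1], "-"),           # S[i] against a gap
                V[i][j - 1] + delta("-", t[j - 1]),           # T[j] against a gap
            )
    return V[n][m]


def simple_delta(x, y):
    # Hypothetical scoring: +1 for a match, -1 for a mismatch or a gap.
    return 1 if x == y and x != "-" else -1
```

Calling `global_alignment_score("aacgacga", "ctacgaga", simple_delta)` fills a 9 × 9 table, which is the O(n²) cost the slide refers to.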

  4. DIST and OUT matrix (Revise) • Block: sub-sequences “acg”, “ag” [Figure: block G with input border I and output border O, its DIST matrix, and the OUT matrix with its column maxima]
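
A minimal sketch of the explicit route through OUT, assuming the usual reading of the figure: OUT[i][j] = I[i] + DIST[i][j], and each output border value O[j] is the maximum of column j of OUT. The function name `output_border` and the numbers in the usage note are hypothetical; they are not the values from the slide's figure.

```python
def output_border(I, DIST):
    """Build OUT explicitly, then take column maxima to get O."""
    rows, cols = len(DIST), len(DIST[0])
    # OUT[i][j]: best score reaching output j when entering at input i.
    OUT = [[I[i] + DIST[i][j] for j in range(cols)] for i in range(rows)]
    # O[j]: best score over all entry points, i.e. the maximum of column j.
    O = [max(OUT[i][j] for i in range(rows)) for j in range(cols)]
    return OUT, O
```

For example, `output_border([0, 2, 1], [[0, 1], [-1, 3], [2, 0]])` builds OUT = [[0, 1], [1, 5], [3, 1]] and O = [3, 5]. Building OUT touches every entry, which is the cost that SMAWK avoids.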

  5. Compute O without explicit OUT (SMAWK) • Block: sub-sequences “acg”, “ag” [Figure: block G with input border I, output border O, and its DIST matrix]

  6. Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays. • Definition: a matrix M[0…m,0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d • Convex condition: M[a,c] ≥ M[b,c] ⇒ M[a,d] ≥ M[b,d]. • Concave condition: M[a,c] ≤ M[b,c] ⇒ M[a,d] ≤ M[b,d].

  7. SMAWK • Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.

  8. Presentation Outline • What are Monge arrays? • Monge ⇒ totally monotone • Why is the DIST alignment matrix a Monge array? • How to compute totally monotone arrays efficiently? • SMAWK • Given a totally monotone array • Compute all column maxima in O(n)

  9. Monge and Totally monotone properties

  10. Monge • A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d • M[a, c] + M[b, d] ≤ M[a, d] + M[b, c] • M[a, c] + M[b, d] ≥ M[a, d] + M[b, c]
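
The definition can be checked by brute force over every pair of rows a < b and every pair of columns c < d. This is a sketch for small matrices only; the function name `is_monge` and the choice of passing the comparison as `operator.le` (for the "≤" form) or `operator.ge` (for the "≥" form) are assumptions made for illustration.

```python
import operator
from itertools import combinations


def is_monge(M, relation=operator.le):
    """True if M[a][c] + M[b][d] (relation) M[a][d] + M[b][c] for all a < b, c < d."""
    rows, cols = range(len(M)), range(len(M[0]))
    return all(
        relation(M[a][c] + M[b][d], M[a][d] + M[b][c])
        for a, b in combinations(rows, 2)   # yields pairs with a < b
        for c, d in combinations(cols, 2)   # yields pairs with c < d
    )
```

For instance, the matrix with entries (i - j)² satisfies the "≤" form, and its negation satisfies the "≥" form.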

  11. Totally monotone • A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n; a<b and c<d • Convex condition: M[a,c] ≥ M[b,c] ⇒ M[a,d] ≥ M[b,d] • Concave condition: M[a,c] ≤ M[b,c] ⇒ M[a,d] ≤ M[b,d] • Monge ⇒ totally monotone
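
One way to see why a Monge matrix is totally monotone, sketched for the Monge inequality in its "≤" form (an assumption about which form is meant; the "≥" form follows by reversing every inequality):

```latex
\begin{align*}
M[a,c] + M[b,d] &\le M[a,d] + M[b,c] && (a < b,\ c < d) \\
\Longleftrightarrow\quad M[a,c] - M[b,c] &\le M[a,d] - M[b,d] \\
\text{so if } M[a,c] \ge M[b,c]:\quad 0 \le M[a,c] - M[b,c] &\le M[a,d] - M[b,d]
\;\Longrightarrow\; M[a,d] \ge M[b,d].
\end{align*}
```

The converse does not hold in general: total monotonicity only constrains the order of entries, not their sums.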

  12. Intuition • Monge: quadrangle inequality relating M[a, c] + M[b, d] and M[a, d] + M[b, c] [Figure: quadrangle with labels a, b, c, d, x, z]

  13. History • Computational Geometry • All nearest neighbor problem • Shamos and Hoey proved Θ(n log n) in 1975 • All farthest neighbor problem • F. P. Preparata proved Θ(n log n) in 1977 • All farthest neighbor problem in convex polygon • Lee and Preparata proved O(n) in 1978

  14. SMAWK • Aggarwal et al. proved O(n) for all farthest neighbors in a convex polygon in 1987 • Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.

  15. DIST and OUT Matrices

  16. Assumption • Row and column maxima of a totally monotone matrix can be computed in O(n) • Why are the DIST and OUT matrices of the alignment problem totally monotone?

  17. DIST and OUT matrix (Revise) • Block: sub-sequences “acg”, “ag” [Figure: block G with input border I and output border O, its DIST matrix, and the OUT matrix with its column maxima]

  18. Compute O without explicit OUT (SMAWK) • Block: sub-sequences “acg”, “ag” [Figure: block G with input border I, output border O, and its DIST matrix]

  19. DIST is Monge [Figure: block G with input border I and output border O]

  20. DIST is a Monge array • Monge • M[a, c] + M[b, d] ≥ M[a, d] + M[b, c] • Totally monotone by the concave condition: • M[a,c] ≤ M[b,c] ⇒ M[a,d] ≤ M[b,d]

  21. Comment on this approach • Advantages • Easy to parallelize • Easy to combine • Disadvantages • Need to compute/keep more information

  22. Applications • Parallel sequence alignment • O(log m log n) time • Using O(m n / log m) processors (CREW PRAM) • Best non-overlapping alignment score • O(n² log² n) time • Tandem approximate repeat • O(n² log n) time • Common Substring Alignment

  23. SMAWK

  24. Find all column minima of the following totally monotone array • For every 2×2 submatrix [a b; c d]: b < d ⇒ a < c; b = d ⇒ a ≤ c

  25. Find all column minima of the following totally monotone array • For every 2×2 submatrix [a b; c d]: b < d ⇒ a < c; b = d ⇒ a ≤ c • Equivalently: a > c ⇒ b > d; a = c ⇒ b ≥ d

  26. Observation 1 • [a b; c d]: b < d ⇒ a < c; b = d ⇒ a ≤ c; a > c ⇒ b > d; a = c ⇒ b ≥ d

  27. Observation 2 • [a b; c d]: b < d ⇒ a < c; b = d ⇒ a ≤ c; a > c ⇒ b > d; a = c ⇒ b ≥ d

  28. [a b; c d]: b < d ⇒ a < c; b = d ⇒ a ≤ c; a > c ⇒ b > d; a = c ⇒ b ≥ d • SMAWK is a recursive algorithm of 2 steps • REDUCE • INTERPOLATE

  29. [a b; c d]: b < d ⇒ a < c; b = d ⇒ a ≤ c; a > c ⇒ b > d; a = c ⇒ b ≥ d • SMAWK is a recursive algorithm of 2 steps • REDUCE • INTERPOLATE • REDUCE removes rows • INTERPOLATE removes half of the columns
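
The two steps can be sketched as follows for the column-minima problem. This is a minimal sketch under stated assumptions, not the presenter's code: `f(r, c)` is a hypothetical lookup returning entry (r, c), and the matrix is assumed to satisfy the 2×2 condition that whenever the upper row is no worse than the lower row in some column, it is also no worse in every column to the left. The names `smawk_column_minima` and `_smawk` are illustrative.

```python
def smawk_column_minima(num_rows, num_cols, f):
    """Return {column: row holding a minimum of that column}."""
    result = {}
    if num_rows > 0:
        _smawk(list(range(num_rows)), list(range(num_cols)), f, result)
    return result


def _smawk(rows, cols, f, result):
    if not cols:
        return
    # REDUCE: discard rows that cannot be needed, keeping at most len(cols) rows.
    stack = []
    for r in rows:
        while stack:
            c = cols[len(stack) - 1]
            if f(stack[-1], c) <= f(r, c):
                break          # the stacked row still wins at column c
            stack.pop()        # the stacked row loses here and everywhere to the right
        if len(stack) < len(cols):
            stack.append(r)
    rows = stack
    # INTERPOLATE: solve every second column recursively ...
    _smawk(rows, cols[1::2], f, result)
    # ... then fill each remaining column by scanning only the rows that lie
    # between the answers of its two neighbouring columns.
    k = 0
    for i in range(0, len(cols), 2):
        c = cols[i]
        last = result[cols[i + 1]] if i + 1 < len(cols) else rows[-1]
        best = rows[k]
        while rows[k] != last:
            k += 1
            if f(rows[k], c) < f(best, c):
                best = rows[k]
        result[c] = best
```

As a usage example, `smawk_column_minima(3, 3, lambda i, j: (i - j) ** 2)` places each column's minimum on the diagonal. Ties are resolved to some minimal row, so comparing values against a brute-force scan is a safer check than comparing row indices.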

  30. REDUCE

