SMAWK
Global alignment (Revise) • [Figure: alignment graph for S = aacgacga, T = ctacgaga] • $V(i,j) = \max \{\, V(i-1,j-1) + \delta(S[i], T[j]),\; V(i-1,j) + \delta(S[i], -),\; V(i,j-1) + \delta(-, T[j]) \,\}$ • Complexity: $O(n^2)$
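The recurrence above can be sketched as a plain dynamic program. This is a minimal illustration only: the scoring values (`match`, `mismatch`, `gap`) stand in for the score function in the recurrence and are assumptions, as is the function name `global_alignment_score`; the slides do not specify them.

```python
def global_alignment_score(s, t, match=2, mismatch=-1, gap=-1):
    """Fill the table V of the recurrence and return V[len(s)][len(t)].

    The three scoring parameters are hypothetical placeholders for the
    score function used in the recurrence.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Border cells: a prefix aligned against nothing costs one gap per symbol.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + gap
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + gap
    # Interior cells: best of diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + gap,
                          V[i][j - 1] + gap)
    return V[n][m]
```

Calling `global_alignment_score("aacgacga", "ctacgaga")` fills a 9 × 9 table, which is where the quadratic cost comes from.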
DIST and OUT matrix (Revise) • [Figure: block for the sub-sequences “acg” and “ag”, with input border I and output border O] • DIST matrix • I (input borders) • OUT matrix • O: column maxima of OUT
DIST and OUT matrix (Revise) • [Figure: the same block, sub-sequences “acg” and “ag”, with DIST matrix and input borders I] • Compute O without explicit OUT, using SMAWK
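For comparison, here is the naive computation that SMAWK is meant to avoid, as a minimal sketch. It assumes OUT[i][j] = I[i] + DIST[i][j] and O[j] = max over i of OUT[i][j], which is how the "max col" label is read here. The function name `output_border` and the numbers in the usage note are hypothetical, not taken from the figure.

```python
def output_border(I, DIST):
    """Build OUT explicitly, then take the maximum of each column.

    Assumes OUT[i][j] = I[i] + DIST[i][j]; every entry of OUT is touched,
    so the cost is proportional to the size of the matrix.
    """
    rows, cols = len(DIST), len(DIST[0])
    OUT = [[I[i] + DIST[i][j] for j in range(cols)] for i in range(rows)]
    O = [max(OUT[i][j] for i in range(rows)) for j in range(cols)]
    return OUT, O
```

With the made-up inputs `I = [1, 0]` and `DIST = [[0, 2], [3, 1]]`, OUT is `[[1, 3], [3, 1]]` and O is `[3, 3]`.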
Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays. • Definition: a matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d • Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$. • Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$.
SMAWK • Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
Presentation Outline • What are Monge arrays? • Monge ⇒ totally monotone • Why is the DIST alignment matrix a Monge array? • How to compute the maxima of totally monotone arrays efficiently? • SMAWK • Given a totally monotone array • Compute all column maxima in O(n)
Monge • A matrix M[0…m, 0…n] is Monge if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d • $M[a,c] + M[b,d] \le M[a,d] + M[b,c]$ • $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$
Totally monotone • A matrix M[0…m, 0…n] is totally monotone if either condition 1 or 2 below holds for all a, b = 0…m; c, d = 0…n; a < b and c < d • Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$ • Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$ • Monge ⇒ Totally monotone
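Both definitions can be checked by brute force on small matrices, which is handy for experimenting with examples. The sketch below assumes the convex forms use ≤ (Monge) and ≥ ⇒ ≥ (totally monotone) and the concave forms use the reversed signs. The function names are hypothetical, and the check is quartic in the matrix side, so it is only meant for toy inputs.

```python
from itertools import combinations


def is_monge(M, concave=False):
    """Check the Monge inequality for every pair of rows a < b and columns c < d."""
    for a, b in combinations(range(len(M)), 2):
        for c, d in combinations(range(len(M[0])), 2):
            lhs = M[a][c] + M[b][d]
            rhs = M[a][d] + M[b][c]
            # Convex form wants lhs <= rhs, concave form wants lhs >= rhs.
            violated = lhs < rhs if concave else lhs > rhs
            if violated:
                return False
    return True


def is_totally_monotone(M, concave=False):
    """Check the convex (default) or concave total monotonicity condition."""
    for a, b in combinations(range(len(M)), 2):
        for c, d in combinations(range(len(M[0])), 2):
            if concave:
                if M[a][c] <= M[b][c] and not M[a][d] <= M[b][d]:
                    return False
            else:
                if M[a][c] >= M[b][c] and not M[a][d] >= M[b][d]:
                    return False
    return True
```

For example, the matrix with entries `(i - j) ** 2` passes both convex checks, while `[[0, 0], [1, 5]]` is totally monotone (convex) but not Monge, which illustrates that the implication goes one way only.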
Intuition • Monge: quadrangle inequality • $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$ • [Figure: quadrilateral with vertices a, b, c, d and segments labelled x and z]
History • Computational Geometry • All nearest neighbor problem • Shamos and Hoey proved $\Theta(n \log n)$ in 1975 • All farthest neighbor problem • F. P. Preparata proved $\Theta(n \log n)$ in 1977 • All farthest neighbor problem in convex polygon • Lee and Preparata proved O(n) in 1978
SMAWK • Aggarwal et al. proved O(n) for the all farthest neighbor problem in a convex polygon in 1987 • Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
Assumption • Row and column maxima of a totally monotone matrix can be computed in O(n) • Why are the DIST and OUT matrices of the alignment problem totally monotone?
DIST is Monge • [Figure: block for “acg” and “ag” with input border I and output border O]
DIST is Monge array • Monge • $M[a,c] + M[b,d] \ge M[a,d] + M[b,c]$ • Totally monotone by Concave condition: • $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$
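The step from the Monge inequality to the concave total monotonicity condition can be written out in a few lines. This is a sketch that assumes the concave forms use $\ge$ for Monge and $\le \Rightarrow \le$ for total monotonicity, with $a < b$ and $c < d$ as in the definitions:

```latex
\begin{align*}
&\text{Assume } a < b,\; c < d,\; \text{and } M[a,c] + M[b,d] \ge M[a,d] + M[b,c]. \\
&\text{Rearranging: } M[b,d] - M[a,d] \ge M[b,c] - M[a,c]. \\
&\text{If } M[a,c] \le M[b,c], \text{ then } M[b,c] - M[a,c] \ge 0, \\
&\text{so } M[b,d] - M[a,d] \ge 0, \text{ i.e. } M[a,d] \le M[b,d].
\end{align*}
```

The converse does not hold in general: total monotonicity only constrains the order of entries, not their differences.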
Comment on this approach • Advantages • Easy to parallelize • Easy to combine • Disadvantages • Need to compute/keep more information
Applications • Parallel sequence alignment • O(log m log n) time • Using O(mn / log m) processors (CREW PRAM) • Best non-overlapping alignment score • $O(n^2 \log^2 n)$ time • Tandem approximate repeat • $O(n^2 \log n)$ time • Common Substring Alignment
Find all column minima of a totally monotone array • For every 2×2 submatrix [a b; c d] (rows "a b" and "c d"): • $b < d \Rightarrow a < c$ • $b = d \Rightarrow a \le c$ • $a > c \Rightarrow b > d$ • $a = c \Rightarrow b \ge d$
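The 2×2 conditions can be tested directly on a small matrix, alongside a brute-force baseline for the column minima. This sketch assumes the garbled arrows on the slide read "a > c implies b > d" and "a = c implies b ≥ d" for the submatrix with rows (a, b) and (c, d); the other two conditions follow from these by contraposition. Both function names are hypothetical.

```python
from itertools import combinations


def satisfies_2x2_conditions(M):
    """Check every 2x2 submatrix [a b; c d] taken from rows r1 < r2, columns c1 < c2."""
    for r1, r2 in combinations(range(len(M)), 2):
        for c1, c2 in combinations(range(len(M[0])), 2):
            a, b = M[r1][c1], M[r1][c2]
            c, d = M[r2][c1], M[r2][c2]
            # Lower row strictly better on the left => strictly better on the right.
            if a > c and not b > d:
                return False
            # Tie on the left => lower row at least as good on the right.
            if a == c and not b >= d:
                return False
    return True


def column_minima_brute_force(M):
    """Baseline: scan every entry of every column."""
    return [min(col) for col in zip(*M)]
```

For the toy matrix `[[0, 9, 36], [5, 2, 17], [18, 3, 6]]` the conditions hold and the column minima are `[0, 2, 6]`, found in rows 0, 1, 2: the row of the minimum never moves up as the column index grows.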
Observation 1 • [Figure: 2×2 submatrix [a b; c d] with the total monotonicity conditions]
Observation 2 • [Figure: 2×2 submatrix [a b; c d] with the total monotonicity conditions]
SMAWK is a recursive algorithm of 2 steps • REDUCE • INTERPOLATE • REDUCE removes rows • INTERPOLATE removes half of the columns
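The two steps can be sketched in Python as follows. This is one common way to write the algorithm, not necessarily the form used in the lecture. It assumes a matrix given as a lookup function `f(row, col)` that satisfies the 2×2 conditions for column minima (lower row strictly better on the left implies strictly better on the right; a tie on the left implies at least as good on the right). The name `smawk_column_minima` and its signature are hypothetical.

```python
def smawk_column_minima(rows, cols, f):
    """Return {col: (minimum value, row)} for a totally monotone matrix f(row, col).

    rows and cols are lists of indices in increasing order; rows must be non-empty.
    Ties are resolved in favour of the topmost row.
    """
    if not cols:
        return {}

    # REDUCE: keep at most len(cols) rows that can still hold a column minimum.
    stack = []
    for r in rows:
        # The stack top is responsible for column cols[len(stack) - 1];
        # if row r beats it there, r also beats it in all later columns.
        while stack and f(stack[-1], cols[len(stack) - 1]) > f(r, cols[len(stack) - 1]):
            stack.pop()
        if len(stack) < len(cols):
            stack.append(r)
    rows = stack

    # Recurse on every second column (positions 1, 3, 5, ...).
    minima = smawk_column_minima(rows, cols[1::2], f)

    # INTERPOLATE: fill in the remaining columns (positions 0, 2, 4, ...).
    # The minimum of such a column lies between the minimum rows of its neighbours.
    r = 0
    for k in range(0, len(cols), 2):
        col = cols[k]
        last_row = rows[-1] if k == len(cols) - 1 else minima[cols[k + 1]][1]
        best = (f(rows[r], col), rows[r])
        while rows[r] != last_row:
            r += 1
            best = min(best, (f(rows[r], col), rows[r]))
        minima[col] = best
    return minima
```

A possible call is `smawk_column_minima(list(range(3)), list(range(5)), lambda r, c: (2 * r - 3 * c) ** 2 + r)`, whose result can be compared against a brute-force scan of each column. Passing a lookup function rather than a materialised matrix matters here, since the point of the algorithm is to avoid evaluating most entries.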