An outline of divide and conquer algorithms, including MergeSort, linear space sequence alignment, four-Russians speedup, and constructing LCS in sub-quadratic time.
Outline • MergeSort • Finding the middle point in the alignment matrix in linear space • Linear space sequence alignment • Block Alignment • Four-Russians speedup • Constructing LCS in sub-quadratic time
Divide and Conquer Algorithms • Divide problem into sub-problems • Conquer by solving sub-problems recursively. If the sub-problems are small enough, solve them in brute force fashion • Combine the solutions of sub-problems into a solution of the original problem (tricky part)
Sorting Problem Revisited • Given: an unsorted array • Goal: sort it
Mergesort: Divide Step • Step 1 – Divide • log(n) rounds of halving split an array of size n into single elements
Mergesort: Conquer Step • Step 2 – Conquer • log n iterations, each iteration takes O(n) time • Total time: O(n log n)
Mergesort: Combine Step • Step 3 – Combine • 2 arrays of size 1 can be easily merged to form a sorted array of size 2 • 2 sorted arrays of size n and m can be merged in O(n+m) time to form a sorted array of size n+m
Mergesort: Combine Step • Example: combining 2 sorted arrays of size 4 (figure omitted)
Merge Algorithm • Merge(a, b) • n1 ← size of array a • n2 ← size of array b • a[n1+1] ← ∞ • b[n2+1] ← ∞ • i ← 1 • j ← 1 • for k ← 1 to n1 + n2 • if a[i] < b[j] • c[k] ← a[i] • i ← i + 1 • else • c[k] ← b[j] • j ← j + 1 • return c
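The merge step can be sketched in Python as follows. This is a minimal illustration rather than the slides' exact procedure: it uses 0-based lists and explicit bounds checks instead of sentinel entries placed past the end of each array, and the name `merge` is chosen here for illustration.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in O(len(a) + len(b)) time."""
    c = []
    i = j = 0
    # Repeatedly copy the smaller of the two front elements.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # One list is exhausted; the rest of the other is already sorted.
    c.extend(a[i:])
    c.extend(b[j:])
    return c


print(merge([4, 6, 7, 20], [1, 3, 5, 9]))  # [1, 3, 4, 5, 6, 7, 9, 20]
```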
Mergesort: Example • Input: 20 4 7 6 1 3 9 5 • Divide: [20 4 7 6] [1 3 9 5] → [20 4] [7 6] [1 3] [9 5] → [20] [4] [7] [6] [1] [3] [9] [5] • Conquer: [4 20] [6 7] [1 3] [5 9] → [4 6 7 20] [1 3 5 9] → [1 3 4 5 6 7 9 20]
MergeSort Algorithm • MergeSort(c) • n ← size of array c • if n = 1 • return c • left ← list of first n/2 elements of c • right ← list of last n − n/2 elements of c • sortedLeft ← MergeSort(left) • sortedRight ← MergeSort(right) • sortedList ← Merge(sortedLeft, sortedRight) • return sortedList
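A compact Python sketch of the recursion is shown here. To keep the example short it assumes the standard-library `heapq.merge` for the combine step (any linear-time merge of two sorted lists would do), and it treats an empty list as a base case as well, which the pseudocode does not cover.

```python
from heapq import merge as merge_sorted  # stdlib merge of already-sorted iterables


def merge_sort(c):
    """Return a sorted copy of c using divide and conquer."""
    n = len(c)
    if n <= 1:               # base case: nothing left to divide
        return list(c)
    left = c[: n // 2]       # first n/2 elements
    right = c[n // 2 :]      # last n - n/2 elements
    # Conquer each half recursively, then combine the two sorted halves.
    return list(merge_sorted(merge_sort(left), merge_sort(right)))


print(merge_sort([20, 4, 7, 6, 1, 3, 9, 5]))  # [1, 3, 4, 5, 6, 7, 9, 20]
```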
MergeSort: Running Time • The problem is reduced to baby steps • The i-th merging iteration takes O(n) time in total • The number of iterations is O(log n) • Running time: O(n log n)
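The same bound can be read off a recurrence. The following is a sketch under the simplifying assumptions that n is a power of 2 and that merging n elements costs exactly cn for some constant c (both introduced here for illustration):

```latex
T(n) = 2\,T(n/2) + cn, \qquad T(1) = c
\;\Longrightarrow\;
T(n) = \underbrace{cn + cn + \cdots + cn}_{\log_2 n \text{ merging levels}} + \underbrace{cn}_{n \text{ base cases}}
     = cn\log_2 n + cn = O(n \log n)
```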
Divide and Conquer Approach to LCS • Path(source, sink) • if (source & sink are in consecutive columns) • output the longest path from source to sink • else • middle ← middle vertex between source & sink • Path(source, middle) • Path(middle, sink) • The only problem left is how to find this “middle vertex”!
Computing Alignment Path Requires Quadratic Memory • Space complexity of computing an alignment path for sequences of length n and m is O(nm) • We need to keep all backtracking references in memory to reconstruct the path
Computing Alignment Score with Linear Memory • Space complexity of computing just the alignment score is O(n) • We only need the previous column to calculate the current column, and we can throw that previous column away once we are done using it
Computing Alignment Score: Recycling Columns • Only two columns of scores are saved at any given time • The memory for column 1 is reused to store column 3 • The memory for column 2 is reused to store column 4
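The column-recycling idea can be sketched for the LCS score like this. It is a minimal illustration, assuming the LCS scoring used in this deck (a matching diagonal edge scores 1, all other edges score 0); the function name `lcs_length` and the convention that `v` indexes rows and `w` indexes columns are assumptions made here.

```python
def lcs_length(v, w):
    """Length of the LCS of v and w using only two columns of the DP table."""
    n = len(v)
    prev = [0] * (n + 1)   # scores for column j - 1
    curr = [0] * (n + 1)   # scores for column j (row 0 always stays 0)
    for j in range(1, len(w) + 1):
        for i in range(1, n + 1):
            if v[i - 1] == w[j - 1]:
                curr[i] = prev[i - 1] + 1             # diagonal (match) edge
            else:
                curr[i] = max(prev[i], curr[i - 1])   # horizontal or vertical edge
        # Recycle: the old "previous" buffer is overwritten by the next column.
        prev, curr = curr, prev
    return prev[n]


print(lcs_length("ATCTGAT", "TGCATA"))  # 4
```

Memory use is two lists of n + 1 integers regardless of the length of `w`, but no backtracking information is kept, so only the score is available.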
Crossing the Middle Line • We want to calculate the longest path from (0,0) to (n,m) that passes through (i, m/2), where i ranges from 0 to n and represents the i-th row • Define length(i) as the length of the longest path from (0,0) to (n,m) that passes through vertex (i, m/2) • Such a path splits at (i, m/2) into a prefix part, Prefix(i), and a suffix part, Suffix(i)
Crossing the Middle Line • Define (mid, m/2) as the vertex where the longest path crosses the middle column • length(mid) = optimal length = max_{0 ≤ i ≤ n} length(i)
Computing Prefix(i) • prefix(i) is the length of the longest path from (0,0) to (i, m/2) • Compute prefix(i) by dynamic programming in the left half of the matrix (columns 0 to m/2) • Store prefix(i), the scores in column m/2
Computing Suffix(i) • suffix(i) is the length of the longest path from (i, m/2) to (n,m) • Equivalently, suffix(i) is the length of the longest path from (n,m) to (i, m/2) with all edges reversed • Compute suffix(i) by dynamic programming in the right half of the “reversed” matrix (columns m down to m/2) • Store suffix(i), the scores in column m/2
Length(i) = Prefix(i) + Suffix(i) • Add prefix(i) and suffix(i) to compute length(i): length(i) = prefix(i) + suffix(i) • The row i that maximizes length(i) gives a middle vertex (i, m/2) of the maximum path
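Putting prefix(i) and suffix(i) together, a middle vertex for LCS could be found roughly as follows. This is a sketch under assumptions: LCS scoring (match = 1, otherwise 0), `v` indexing rows and `w` indexing columns, and the helper names `last_column` and `middle_vertex` are hypothetical. The reversed matrix is handled by reversing both strings.

```python
def last_column(v, w):
    """Final DP column: entry i is the LCS length of v[:i] and all of w."""
    n = len(v)
    prev = [0] * (n + 1)
    for ch in w:
        curr = [0] * (n + 1)
        for i in range(1, n + 1):
            if v[i - 1] == ch:
                curr[i] = prev[i - 1] + 1
            else:
                curr[i] = max(prev[i], curr[i - 1])
        prev = curr
    return prev


def middle_vertex(v, w):
    """Return (i, length(i)) for a row i where a longest path crosses column m/2."""
    n, mid = len(v), len(w) // 2
    # prefix[i]: longest path from (0, 0) to (i, m/2), left half of the matrix.
    prefix = last_column(v, w[:mid])
    # rev[k]: LCS of the last k characters of v with the right half of w,
    # computed on the reversed strings (the "reversed" matrix).
    rev = last_column(v[::-1], w[mid:][::-1])
    # suffix[i]: longest path from (i, m/2) to (n, m).
    suffix = [rev[n - i] for i in range(n + 1)]
    length = [prefix[i] + suffix[i] for i in range(n + 1)]
    best = max(range(n + 1), key=lambda i: length[i])
    return best, length[best]


row, score = middle_vertex("ATCTGAT", "TGCATA")
print(score)  # 4, the LCS length of the two strings
```

Each call to `last_column` keeps only two columns, so the middle vertex is found in O(n) space and time proportional to the area nm.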
Time = Area: First Pass • On the first pass, the algorithm covers the entire area • Area = nm • Computing prefix(i) covers the left half; computing suffix(i) covers the right half
Time = Area: Second Pass • On the second pass, the algorithm covers only 1/2 of the area • Area covered: Area/2
Time = Area: Third Pass • On the third pass, only 1/4 of the area is covered • Area covered: Area/4
Geometric Reduction At Each Iteration • 1 + ½ + ¼ + ... + (½)^k ≤ 2 • Runtime: O(Area) = O(nm) • Fraction of the area covered per pass: 1st pass: 1, 2nd pass: 1/2, 3rd pass: 1/4, 4th pass: 1/8, 5th pass: 1/16
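The bound follows from the closed form of a finite geometric sum. As a short worked derivation, assuming each pass costs time proportional to the area it covers and that there are k + 1 passes:

```latex
\text{Time} \;\propto\; nm\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \left(\tfrac{1}{2}\right)^{k}\right)
= nm \cdot \frac{1 - (1/2)^{k+1}}{1 - 1/2}
= 2nm\left(1 - \left(\tfrac{1}{2}\right)^{k+1}\right)
< 2nm = O(nm)
```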