CS 575 Design and Analysis of Computer Algorithms Professor Michal Cutler Lecture 12 March 7, 2005. This class. Longest common subsequence Edit distance. Principle of optimality - shortest path problem. Problem: Given a graph G and vertices s and t, find a shortest path in G from s to t
CS 575 Design and Analysis of Computer Algorithms • Professor Michal Cutler • Lecture 12 • March 7, 2005
This class • Longest common subsequence • Edit distance
Principle of optimality - shortest path problem • Problem: Given a graph G and vertices s and t, find a shortest path in G from s to t. • Theorem: A subpath from s’ to t’ of a shortest path P is also a shortest path. • Proof: If there were a shorter path from s’ to t’, we could cut the existing subpath from s’ to t’ out of P and paste the shorter one in its place. • The result would be a path from s to t shorter than P, contradicting the optimality of P.
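The theorem can be checked on a small example. The sketch below is an illustration only: it assumes non-negative edge weights and uses Dijkstra's algorithm, which the slide does not name, and the graph and the helper names (dijkstra, path_to, subpaths_are_shortest) are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; assumes non-negative edge weights."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def path_to(prev, source, target):
    """Rebuild the path source..target from the predecessor map."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def subpaths_are_shortest(graph, path):
    """Check that every subpath of path is itself a shortest path."""
    for i, u in enumerate(path):
        dist_u, _ = dijkstra(graph, u)
        cost = 0
        for k in range(i + 1, len(path)):
            cost += graph[path[k - 1]][path[k]]
            if dist_u[path[k]] != cost:
                return False
    return True

# Hypothetical example graph: vertex -> {neighbor: weight}
graph = {"s": {"a": 1, "b": 4}, "a": {"b": 2, "t": 6}, "b": {"t": 1}, "t": {}}
dist, prev = dijkstra(graph, "s")
path = path_to(prev, "s", "t")
print(path, dist["t"], subpaths_are_shortest(graph, path))
```

On this graph the shortest path from s to t is s, a, b, t with length 4, and every subpath of it is also shortest.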
A problem that does not satisfy the Principle of Optimality • Problem: What is the longest simple route between City A and B? • Simple = never visit the same spot twice. • The longest simple route (solid line consisting of edges (A, C), (C, D) and (D, B)) has city C as an intermediate city. • It does not consist of the longest simple route from A to C plus the longest simple route from C to B. • [Figure: cities A, B, C and D, with routes labeled "Longest A to C" and "Longest A to B"]
Biological Applications • Compare the DNA of two or more organisms • How similar are the two strands? • Is one a substring of the other? • Find a new longest strand in which the bases (A, C, G, T) appear in the same order as in the original 2 strands?
Longest Common Subsequence (LCS) • Problem: Given sequences x[1..m] and y[1..n], find a longest common subsequence of both. • Example: x = ABCBDAB and y = BDCABA • BCA is a common subsequence • BCBA and BDAB are two LCSs
LCS • Brute force solution • Writing a recurrence equation • The dynamic programming solution • Application of algorithm
Brute force solution • Solution: For every subsequence of x, check if it is a subsequence of y. • Analysis: • There are 2^m subsequences of x. • Each check takes O(n) time, since we scan y for the first element, then continue scanning for the second element, etc. • The worst-case running time is O(n 2^m), which is exponential in m.
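A minimal sketch of the brute force approach, for illustration only. The function names is_subsequence and brute_force_lcs are assumptions, and the search tries longer subsequences of x first so that the first hit is an LCS.

```python
from itertools import combinations

def is_subsequence(z, y):
    """Return True if z is a subsequence of y, scanning y once: O(len(y))."""
    it = iter(y)
    # "ch in it" advances the iterator, so matches must appear in order.
    return all(ch in it for ch in z)

def brute_force_lcs(x, y):
    """Enumerate subsequences of x, longest first; up to 2^m candidates."""
    for k in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), k):
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                return z
    return ""

lcs = brute_force_lcs("ABCBDAB", "BDCABA")
print(len(lcs))  # 4
```

This is only practical for very short inputs, which is the point of the analysis above.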
Writing the recurrence equation • Let X_i denote the i-th prefix x[1..i] of x[1..m], and • let X_0 denote the empty prefix. • We will first compute the length of an LCS of X_m and Y_n, LenLCS(m, n), and then use information saved during the computation to find the actual subsequence. • We need a recursive formula for computing LenLCS(i, j).
Writing the recurrence equation • If X_i and Y_j end with the same character x_i = y_j, the LCS must include the character. If it did not, we could get a longer common subsequence by adding the common character. • If X_i and Y_j do not end with the same character, there are two possibilities: • either the LCS does not end with x_i, • or it does not end with y_j. • Let Z_k denote an LCS of X_i and Y_j.
X_i and Y_j end with x_i = y_j • X_i = x_1 x_2 … x_{i-1} x_i • Y_j = y_1 y_2 … y_{j-1} y_j, with y_j = x_i • Z_k = z_1 z_2 … z_{k-1} z_k, with z_k = y_j = x_i • Z_k is Z_{k-1} followed by z_k = y_j = x_i, where Z_{k-1} is an LCS of X_{i-1} and Y_{j-1} • LenLCS(i, j) = LenLCS(i-1, j-1) + 1
X_i and Y_j end with x_i ≠ y_j • X_i = x_1 x_2 … x_{i-1} x_i and Y_j = y_1 y_2 … y_{j-1} y_j • If z_k ≠ y_j, then Z_k is an LCS of X_i and Y_{j-1} • If z_k ≠ x_i, then Z_k is an LCS of X_{i-1} and Y_j • LenLCS(i, j) = max{LenLCS(i, j-1), LenLCS(i-1, j)}
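The two cases of the recurrence can be written directly as a recursive function. The sketch below is illustrative: the name len_lcs is an assumption, Python strings are 0-indexed so x_i is x[i-1], and memoization via lru_cache is added so each (i, j) pair is computed once.

```python
from functools import lru_cache

def len_lcs(x, y):
    """Length of an LCS of x and y, following the recurrence for LenLCS(i, j)."""
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == 0 or j == 0:
            return 0  # an empty prefix has an empty LCS
        if x[i - 1] == y[j - 1]:
            return rec(i - 1, j - 1) + 1  # case x_i = y_j
        return max(rec(i, j - 1), rec(i - 1, j))  # case x_i != y_j
    return rec(len(x), len(y))

print(len_lcs("ABCBDAB", "BDCABA"))  # 4
```

Without the cache the same subproblems would be solved repeatedly, which is what motivates the table-based solution.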
The dynamic programming solution • Initialize the first row and the first column of the matrix LenLCS to 0. • Calculate LenLCS(1, j) for j = 1, …, n. • Then calculate LenLCS(2, j) for j = 1, …, n, etc. • Also store in a table an arrow pointing to the array element that was used in the computation. • The computation takes O(mn) time, since each of the mn entries is computed in constant time.
LCS-Length(X, Y)
    m ← length[X]
    n ← length[Y]
    for i ← 1 to m do c[i, 0] ← 0
    for j ← 0 to n do c[0, j] ← 0
LCS-Length(X, Y) cont.
    for i ← 1 to m do
        for j ← 1 to n do
            if x_i = y_j
                c[i, j] ← c[i-1, j-1] + 1
                b[i, j] ← “D”
            else if c[i-1, j] ≥ c[i, j-1]
                c[i, j] ← c[i-1, j]
                b[i, j] ← “U”
            else
                c[i, j] ← c[i, j-1]
                b[i, j] ← “L”
    return c and b
Example • To find an LCS, follow the arrows starting from c[m, n]; each diagonal arrow denotes a member of the LCS.
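The table computation and the arrow-following step can be sketched together. This is a minimal illustration using the names c and b and the labels "D", "U", "L" from the slides; the function names lcs_length and read_lcs are assumptions, ties are broken toward "U", and Python's 0-based strings mean x_i is x[i-1].

```python
def lcs_length(x, y):
    """Fill the length table c and the arrow table b in O(mn) time."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "D"   # diagonal: x_i = y_j is part of the LCS
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "U"   # up
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "L"   # left
    return c, b

def read_lcs(b, x, i, j):
    """Follow the arrows from (i, j); collect a character at each diagonal."""
    out = []
    while i > 0 and j > 0:
        if b[i][j] == "D":
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "U":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

x, y = "ABCBDAB", "BDCABA"
c, b = lcs_length(x, y)
print(c[len(x)][len(y)], read_lcs(b, x, len(x), len(y)))  # 4 BCBA
```

With this tie-breaking rule the traceback yields BCBA for the example strings; a different rule could yield another LCS such as BDAB.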
Edit distance • Given two strings s and t • Edit distance = the minimum number of basic operations needed to convert one into the other • Basic operations are typically character-level: • Insert • Delete • Replace • Transposition is often included as well • http://www.merriampark.com/ld.htm
Dynamic programming for edit distance • Let s[1, 2, ..., m] and t[1, 2, ..., n] be the two strings. The recurrence equation is: • r(i, j) =0 when s[i] = t[j], otherwise 1
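A sketch of the dynamic program in code. It assumes the standard Levenshtein formulation, with insert, delete and replace each costing 1 and no transposition, and uses r(i, j) as defined above; the table name d and the function name edit_distance are illustrative assumptions rather than names from the lecture.

```python
def edit_distance(s, t):
    """Minimum number of inserts, deletes and replaces turning s into t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i characters of s[1..i]
    for j in range(n + 1):
        d[0][j] = j          # insert all j characters of t[1..j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            r = 0 if s[i - 1] == t[j - 1] else 1   # r(i, j)
            d[i][j] = min(
                d[i - 1][j] + 1,       # delete s[i]
                d[i][j - 1] + 1,       # insert t[j]
                d[i - 1][j - 1] + r,   # keep or replace
            )
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```

As with LCS, the table has (m+1)(n+1) entries and each is filled in constant time, so the running time is O(mn).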
The Greedy Method • Greedy algorithms make good local choices in the hope that they result in an optimal solution. • They result in feasible solutions, • not necessarily optimal ones. • A proof is needed to show that the algorithm finds an optimal solution. • A counterexample shows that the greedy algorithm does not provide an optimal solution.
Pseudo-code for Greedy Algorithm
Set Greedy(Set Candidate) {
    solution = new Set();
    while (Candidate.isNotEmpty()) {
        next = Candidate.select();      // use selection criteria,
                                        // remove from Candidate and return value
        if (solution.isFeasible(next))  // constraints satisfied
            solution.union(next);
        if (solution.solves())
            return solution;
    }
    // No more candidates and no solution
    return null;
}
Pseudo code for greedy cont. • select() chooses a candidate based on a local selection criterion, removes it from Candidate, and returns its value. • isFeasible() checks whether adding the selected value to the current solution can result in a feasible solution (no constraints are violated). • solves() checks whether the problem is solved.
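The template can be sketched in Python with the three operations passed in as functions. Everything concrete here is an assumption for illustration: the function name greedy, the use of lists instead of sets, and the small change-making instance (pay 36 from a fixed pile of coins, always trying the largest coin first).

```python
def greedy(candidates, select, is_feasible, solves):
    """Generic greedy loop: select, test feasibility, stop when solved."""
    candidates = list(candidates)
    solution = []
    while candidates:
        nxt = select(candidates)       # local selection criterion
        candidates.remove(nxt)         # each candidate is considered once
        if is_feasible(solution, nxt): # constraints still satisfied?
            solution.append(nxt)
        if solves(solution):
            return solution
    return None                        # no more candidates and no solution

# Hypothetical instance: pay exactly 36 using the coins in the pile.
target = 36
result = greedy(
    [25, 10, 10, 5, 1, 1],
    select=max,
    is_feasible=lambda sol, coin: sum(sol) + coin <= target,
    solves=lambda sol: sum(sol) == target,
)
print(result)  # [25, 10, 1]
```

The loop itself never changes; only the selection criterion, the feasibility test and the stopping test are problem-specific.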
Elements of the Greedy Strategy • Cast the problem as one in which we make a greedy choice and are left with one subproblem to solve. • To show optimality: 1. Prove that there is always an optimal solution to the original problem that makes the greedy choice.
Elements of the Greedy Strategy 2. Demonstrate that what remains is a subproblem with the property: if we combine an optimal solution of the subproblem with the greedy choice, we have an optimal solution to the original problem.
Activity Selection • Given a set S of n activities, with start time s_i and finish time f_i for activity i • Find a maximum-size subset A of compatible activities (maximum number of activities). • Activities are compatible if they do not overlap. • Can you suggest a greedy choice?
Example • [Figure: seven activities (1 to 7) on a time axis from 0 to 15, with (start, finish) intervals (11, 13), (2, 12), (3, 10), (11, 15), (3, 7), (1, 4), (0, 2)]
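Counter Example 1 • Select by start time • [Figure: three activities on a time axis from 0 to 15, with (start, finish) intervals (11, 15), (1, 4), (0, 15)] • The earliest-starting activity (0, 15) overlaps both others, so this rule selects one activity, while (1, 4) and (11, 15) are compatible.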
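Counter Example 2 • Select by minimum duration • [Figure: three activities on a time axis from 0 to 15, with (start, finish) intervals (8, 15), (1, 8), (7, 9)] • The shortest activity (7, 9) overlaps both others, so this rule selects one activity, while (1, 8) and (8, 15) are compatible.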
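Both counterexamples can be reproduced with a small sketch. It assumes that activities are half-open intervals [start, finish), so that an activity ending at 8 is compatible with one starting at 8; the helper names overlaps and greedy_by are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = (s, f) and b = (s, f) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_by(activities, key):
    """Consider activities in order of key; keep each one that fits."""
    chosen = []
    for act in sorted(activities, key=key):
        if all(not overlaps(act, c) for c in chosen):
            chosen.append(act)
    return chosen

# Counter Example 1: earliest start time first.
by_start = greedy_by([(11, 15), (1, 4), (0, 15)], key=lambda a: a[0])
# Counter Example 2: minimum duration first.
by_duration = greedy_by([(8, 15), (1, 8), (7, 9)], key=lambda a: a[1] - a[0])

print(by_start)     # [(0, 15)]
print(by_duration)  # [(7, 9)]
```

Each rule ends up with a single activity even though two compatible activities exist, so neither greedy choice is optimal.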
Select by finishing time • [Figure: seven activities (1 to 7) on a time axis from 0 to 15, with (start, finish) intervals (11, 13), (2, 12), (3, 10), (11, 15), (3, 7), (1, 4), (0, 2)]
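Selecting by earliest finishing time can be sketched as follows. The function name select_activities is an assumption, the intervals are those of the seven-activity example, and activities that touch at an endpoint are assumed to be compatible.

```python
def select_activities(activities):
    """Greedy choice: repeatedly take the compatible activity that finishes first."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:      # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

activities = [(11, 13), (2, 12), (3, 10), (11, 15), (3, 7), (1, 4), (0, 2)]
print(select_activities(activities))  # [(0, 2), (3, 7), (11, 13)]
```

Because the activities are processed in order of finishing time, it is enough to compare each start time with the finish time of the last chosen activity; the sort dominates the running time at O(n log n).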
Next class Greedy algorithms