This text discusses the recursive algorithm for computing the longest common subsequence (LCS) of two given sequences, X and Y. It also explains the optimal substructure property of LCS and the recursive equation for computing the length of an LCS.
Recursive Algorithm:
Compute-Opt(j)
    if j = 0 then return 0
    else return max { v_j + Compute-Opt(p(j)), Compute-Opt(j-1) }
Running time: $> 2^{n/2}$. (not required)
chapter25
Example instance (jobs indexed by finish time):
Index   v_j   p(j)
1       2     0
2       4     0
3       4     1
4       7     0
5       2     3
6       1     3
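The recursion can be tried directly on this six-job instance. The following is a minimal sketch, not the lecturer's code: the lists `v` and `p` and the call counter `calls` are names introduced here for illustration, with index 0 used as a placeholder so that jobs are numbered 1 to 6.

```python
# Values and predecessor indices for the six-job instance on the slide.
# Index 0 is a placeholder so that jobs are numbered 1..6.
v = [0, 2, 4, 4, 7, 2, 1]
p = [0, 0, 0, 1, 0, 3, 3]

calls = 0  # hypothetical counter: how many times compute_opt is invoked


def compute_opt(j):
    """Return the maximum total value obtainable from jobs 1..j."""
    global calls
    calls += 1
    if j == 0:
        return 0
    # Either take job j (and jump to p(j)) or skip it.
    return max(v[j] + compute_opt(p[j]), compute_opt(j - 1))


best = compute_opt(6)
print("optimal value:", best)
print("recursive calls:", calls)
```

Even with only six jobs, the number of calls is several times the number of distinct subproblems, which is the repeated work that the bottom-up version avoids.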
(not required) [Figure: recursion tree of subproblems rooted at OPT(6); the same subproblems OPT(1), OPT(2), OPT(3) appear many times.] The tree of subproblems grows very quickly. It may take exponential time.
(not required) $T(n) = T(n-1) + T(n-2) > 2T(n-2) > 4T(n-4) > 8T(n-6) > \dots > 2^{n/2}\,T(1)$
Weighted Interval Scheduling: Bottom-Up
• Input: n, s1, s2, …, sn, f1, f2, …, fn, v1, v2, …, vn
• Sort jobs by finish times so that f1 ≤ f2 ≤ … ≤ fn.
• Compute p(1), p(2), …, p(n)
• M[0]=0;
• for j = 1 to n do
      M[j] = max { vj + M[p(j)], M[j-1] }
      if (M[j] == M[j-1]) then B[j]=0 else B[j]=1   /* for backtracking */
• m=n;   /* Backtracking */
• while (m ≠ 0) {
      if (B[m]==1) then { print job m; m=p(m) }
      else m=m-1
  }
B[j]=0 indicates that job j is not selected. B[j]=1 indicates that job j is selected.
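The bottom-up procedure above can be sketched in Python as follows. This is an illustrative version under the assumption that the jobs are already sorted by finish time and that `v` and `p` are given as 1-indexed lists (index 0 unused); the function name `weighted_interval_schedule` is made up for this sketch.

```python
def weighted_interval_schedule(v, p):
    """Bottom-up DP for weighted interval scheduling.

    v[j] is the value of job j and p[j] its predecessor index, for j = 1..n.
    Index 0 of both lists is unused. Returns (M, chosen), where chosen lists
    the selected jobs in the order found by backtracking (latest job first).
    """
    n = len(v) - 1
    M = [0] * (n + 1)
    B = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(v[j] + M[p[j]], M[j - 1])
        B[j] = 0 if M[j] == M[j - 1] else 1  # 1 means job j is selected

    chosen = []
    m = n
    while m != 0:  # backtracking
        if B[m] == 1:
            chosen.append(m)
            m = p[m]
        else:
            m -= 1
    return M, chosen


# The six-job instance from the slides.
M, chosen = weighted_interval_schedule([0, 2, 4, 4, 7, 2, 1],
                                       [0, 0, 0, 1, 0, 3, 3])
print(M)
print(chosen)
```

On the six-job instance this fills `M` with 0, 2, 4, 6, 7, 8, 8 and selects jobs 5, 3 and 1.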
Example (bottom-up computation; the weights w_j are the values v_j of the instance above):
j      1  2  3  4  5  6
w_j    2  4  4  7  2  1
p(j)   0  0  1  0  3  3
M[0]=0; M[1]=w1+M[0]=2+0=2;
M[2]=w2+M[0]=4+0=4;
M[3]=w3+M[1]=4+2=6;
M[4]=w4+M[0]=7+0=7;
M[5]=w5+M[3]=2+6=8;
M[6]=M[5]=8, because w6+M[3]=1+6=7<8.
j: 0 1 2 3 4 5 6
M: 0 2 4 6 7 8 8
B: 0 1 1 1 1 1 0
Backtracking: job 1, job 3, job 5.
Backtracking and time complexity
• Backtracking is used to get the schedule.
• The p()'s can be computed in O(n) time after sorting all the jobs based on the starting times.
• Time complexity:
  • O(n) if the jobs are sorted and p() is computed.
  • Total time: O(n log n) including sorting.
Computing p()'s in O(n) time
The p()'s can be computed in O(n) time using two sorted lists, one sorted by finish time (if two jobs have the same finish time, sort them based on starting time) and the other sorted by start time.
Start time: b(0, 5), a(1, 3), e(3, 8), c(5, 6), d(6, 8)
Finish time: a(1, 3), b(0, 5), c(5, 6), d(6, 8), e(3, 8)
p(d)=c, p(c)=b, p(e)=a, p(a)=0, p(b)=0. (See demo7)
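One way to realize the two-list idea is a single forward scan with a pointer into the finish-time list. The sketch below is an assumption about how the scan could be organized, not the content of demo7; the function name `compute_p` and the tuple layout `(name, start, finish)` are invented for illustration, and jobs are assumed to have positive length. `None` stands for p(j)=0.

```python
def compute_p(jobs):
    """Compute p() for jobs given as (name, start, finish) tuples.

    `jobs` must already be sorted by finish time. Returns a dict mapping
    each job name to the name of the last job (in finish order) whose finish
    time is <= its start time, or None if there is no such job.
    """
    by_start = sorted(jobs, key=lambda job: job[1])  # the second sorted list
    p = {}
    k = 0  # number of jobs in finish order with finish <= current start
    for name, start, _finish in by_start:
        # Start times only increase, so k never moves backwards: O(n) scan.
        while k < len(jobs) and jobs[k][2] <= start:
            k += 1
        p[name] = jobs[k - 1][0] if k > 0 else None
    return p


# The five jobs from the slide, listed in finish-time order.
jobs = [("a", 1, 3), ("b", 0, 5), ("c", 5, 6), ("d", 6, 8), ("e", 3, 8)]
p = compute_p(jobs)
print(p)
```

The sort inside the function costs O(n log n); if the start-sorted list is supplied from outside, the scan itself is linear.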
Example 2:
Start time: b(0, 5), a(1, 3), e(3, 8), c(5, 6), d(6, 8)
Finish time: a(1, 3), b(0, 5), c(5, 6), d(6, 8), e(3, 8)
p(d)=c, p(c)=b, p(e)=a, p(a)=0, p(b)=0.
v(a)=2, v(b)=3, v(c)=5, v(d)=6, v(e)=8.8.
Solution:
M[0]=0, M[a]=2.
M[b]=max{2, 3+M[p(b)]}=3.
M[c]=max{3, 5+M[p(c)]}=5+M[b]=8.
M[d]=max{8, 6+M[p(d)]}=6+M[c]=6+8=14.
M[e]=max{14, 8.8+M[p(e)]}=max{14, 8.8+M[a]}=max{14, 10.8}=14.
Backtracking: b, c, d.
Job: a b c d e
B:   1 1 1 1 0
Longest common subsequence
• Definition 1: Given a sequence $X = x_1 x_2 \dots x_m$, another sequence $Z = z_1 z_2 \dots z_k$ is a subsequence of X if there exists a strictly increasing sequence $i_1, i_2, \dots, i_k$ of indices of X such that for all $j = 1, 2, \dots, k$, we have $x_{i_j} = z_j$.
• Example 1: If X=abcdefg, then Z=abdg is a subsequence of X (take the letters at positions 1, 2, 4 and 7).
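Definition 1 translates into a simple left-to-right scan: walk through X once and advance in Z whenever the next needed letter is found. The sketch below is one such check; the name `is_subsequence` is chosen here for illustration.

```python
def is_subsequence(z, x):
    """Return True if z is a subsequence of x (Definition 1)."""
    k = 0  # index of the next letter of z that still has to be matched
    for letter in x:
        if k < len(z) and letter == z[k]:
            k += 1
    return k == len(z)


print(is_subsequence("abdg", "abcdefg"))  # Example 1 from the slide
print(is_subsequence("ba", "abcdefg"))    # order matters
```

The first call returns True and the second returns False, because the indices must be strictly increasing.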
Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
• Example 2: X=abcdefg and Y=aaadgfd. Z=adf is a common subsequence of X and Y: the letters a, d, f appear in this order in both X and Y.
Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y of maximum length. (The length of a sequence is the number of letters in the sequence.)
• A longest common subsequence may not be unique.
• Example: X=abcd, Y=acbd. Both acd and abd are LCSs.
Longest common subsequence problem
• Input: Two sequences $X = x_1 x_2 \dots x_m$ and $Y = y_1 y_2 \dots y_n$.
• Output: a longest common subsequence of X and Y.
• Applications:
  • Similarity of two lists. Given two lists L1: 1, 2, 3, 4, 5 and L2: 1, 3, 2, 4, 5, the length of the LCS is 4, indicating the similarity of the two lists.
  • Unix command “diff”.
Longest common subsequence problem
• Input: Two sequences $X = x_1 x_2 \dots x_m$ and $Y = y_1 y_2 \dots y_n$.
• Output: a longest common subsequence of X and Y.
• A brute-force approach: Suppose that $m \le n$. Try all subsequences of X (there are $2^m$ subsequences of X), test whether each one is also a subsequence of Y, and select the longest one that is.
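The brute-force approach can be written down in a few lines, which makes its exponential cost easy to see. This is a sketch for small inputs only; `brute_force_lcs` and `is_subsequence` are names introduced here, and candidates are tried from longest to shortest so that the first hit is an LCS.

```python
from itertools import combinations


def is_subsequence(z, s):
    """Return True if z is a subsequence of s."""
    it = iter(s)
    return all(letter in it for letter in z)


def brute_force_lcs(x, y):
    """Try every subsequence of the shorter string, longest first."""
    if len(x) > len(y):
        x, y = y, x  # enumerate subsequences of the shorter sequence
    for k in range(len(x), -1, -1):
        for positions in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in positions)
            if is_subsequence(candidate, y):
                return candidate
    return ""


print(brute_force_lcs("abcd", "acbd"))
```

For X=abcd and Y=acbd this returns a common subsequence of length 3. The loop may examine up to $2^m$ candidates, each checked in O(n) time.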
Characterizing a longest common subsequence
• Theorem (Optimal substructure of an LCS)
• Let $X = x_1 x_2 \dots x_m$ and $Y = y_1 y_2 \dots y_n$ be two sequences, and let $Z = z_1 z_2 \dots z_k$ be any LCS of X and Y.
• 1. If $x_m = y_n$, then $z_k = x_m = y_n$ and Z[1..k-1] is an LCS of X[1..m-1] and Y[1..n-1].
• 2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS of X[1..m-1] and Y.
• 3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS of X and Y[1..n-1].
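The recursive equation
• Let c[i,j] be the length of an LCS of X[1...i] and Y[1...j].
• c[i,j] can be computed as follows:
$$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j, \\ \max\{c[i,j-1],\ c[i-1,j]\} & \text{if } i,j > 0 \text{ and } x_i \ne y_j. \end{cases}$$
Computing the length of an LCS
• There are only (m+1)(n+1) values c[i,j], so we can compute them all in a specific order (for example row by row) in which c[i-1,j-1], c[i-1,j] and c[i,j-1] are already known when c[i,j] is computed.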
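The recursive equation can also be evaluated top-down if each c[i,j] is stored after its first computation. The sketch below uses memoization from the Python standard library rather than an explicit table; `lcs_length` is a name chosen for this illustration, and the approach suits short sequences only, since deep recursion can hit Python's recursion limit.

```python
from functools import lru_cache


def lcs_length(x, y):
    """Length of an LCS of x and y, following the recursive equation."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j) is the LCS length of x[:i] and y[:j] (1-based in the slides).
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length("abcd", "acbd"))
```

Because every pair (i, j) is evaluated at most once, the number of evaluations is bounded by (m+1)(n+1).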
The algorithm to compute an LCS
for i=1 to m do
    c[i,0]=0;
for j=0 to n do
    c[0,j]=0;
for i=1 to m do
    for j=1 to n do
    {
        if x[i]==y[j] then
            c[i,j]=c[i-1,j-1]+1;
            b[i,j]=1;
        elseif c[i-1,j]>=c[i,j-1] then
            c[i,j]=c[i-1,j];
            b[i,j]=2;
        else
            c[i,j]=c[i,j-1];
            b[i,j]=3;
    }
Example 3: X=BDCABA and Y=ABCBDAB.
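The table for Example 3 can be reproduced with a direct transcription of the table-filling algorithm. The sketch below is illustrative; `lcs_tables` is a name introduced here, and Python's 0-based strings are accessed as `x[i - 1]` to match the 1-based x[i] of the slides.

```python
def lcs_tables(x, y):
    """Fill the length table c and the direction table b for x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1  # diagonal move
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2  # move up
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3  # move left
    return c, b


c, b = lcs_tables("BDCABA", "ABCBDAB")
for row in c:
    print(" ".join(str(value) for value in row))
print("LCS length:", c[6][7])
```

The bottom-right entry of `c` is the LCS length for the two sequences, which is 4 for this example.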
Constructing an LCS (backtracking)
• We can find an LCS using the b[i,j]'s.
• We start with b[m,n] and track back to some cell b[0,j] or b[i,0].
• The algorithm to construct an LCS (backtracking):
1. i=m
2. j=n;
3. if i==0 or j==0 then exit;
4. if b[i,j]==1 then { print x_i; i=i-1; j=j-1; }
5. else if b[i,j]==2 then i=i-1
6. else j=j-1
7. Goto Step 3.
• The letters are printed in reverse order, from the last letter of the LCS to the first.
• The time complexity: O(nm) to fill the tables; the backtracking itself takes O(n+m) steps.
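Putting the table computation and the backtracking together gives a complete LCS routine. This is a sketch under the same conventions as the slides (b values 1, 2, 3 for diagonal, up, left); the function name `lcs` is an assumption of this example, and the collected letters are reversed at the end because backtracking finds them last to first.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, 1
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], 2
            else:
                c[i][j], b[i][j] = c[i][j - 1], 3

    letters = []
    i, j = m, n
    while i > 0 and j > 0:  # stop at row 0 or column 0
        if b[i][j] == 1:
            letters.append(x[i - 1])  # record x_i before moving diagonally
            i, j = i - 1, j - 1
        elif b[i][j] == 2:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(letters))


print(lcs("BDCABA", "ABCBDAB"))
```

Which LCS is returned depends on the tie-breaking rule in the table-filling step; a different rule can yield a different sequence of the same length.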
Remarks on weighted interval scheduling
• It takes a long time to explain. (50+13 minutes)
• Do not mention exponential time etc.
• For the first example, use the format of example 2 to show the computation process (more clearly).
Shortest common supersequence
• Definition: Let X and Y be two sequences. A sequence Z is a supersequence of X and Y if both X and Y are subsequences of Z.
• Shortest common supersequence problem:
  Input: Two sequences X and Y.
  Output: a shortest common supersequence of X and Y.
• Example: X=abc and Y=abb. Both abbc and abcb are shortest common supersequences of X and Y.
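Recursive Equation:
• Let c[i,j] be the length of an SCS of X[1...i] and Y[1...j].
• c[i,j] can be computed as follows:
$$c[i,j] = \begin{cases} j & \text{if } i = 0, \\ i & \text{if } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j, \\ \min\{c[i,j-1] + 1,\ c[i-1,j] + 1\} & \text{if } i,j > 0 \text{ and } x_i \ne y_j. \end{cases}$$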
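The pseudo-code (here n is the length of X and m is the length of Y)
for i=0 to n do c[i,0]=i;
for j=0 to m do c[0,j]=j;
for i=1 to n do
    for j=1 to m do
        if (x[i]==y[j]) then { c[i,j]=c[i-1,j-1]+1; b[i,j]=1; }
        else {
            c[i,j]=min{c[i-1,j]+1, c[i,j-1]+1};
            if (c[i,j]==c[i-1,j]+1) then b[i,j]=2;
            else b[i,j]=3;
        }
p=n; q=m;   /* backtracking */
while (p≠0 or q≠0) {
    if (q==0) then { print x[p]; p=p-1 }
    else if (p==0) then { print y[q]; q=q-1 }
    else if (b[p,q]==1) then { print x[p]; p=p-1; q=q-1 }
    else if (b[p,q]==2) then { print x[p]; p=p-1 }
    else { print y[q]; q=q-1 }
}
The letters are printed in reverse order, from the last letter of the SCS to the first.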
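A runnable version of the SCS computation is sketched below. It departs from the pseudo-code in one respect, stated here as a design choice of this sketch: instead of storing a separate b table, the backtracking re-derives each move from the c table. The function name `scs` is introduced for illustration.

```python
def scs(x, y):
    """Return one shortest common supersequence of x and y."""
    n, m = len(x), len(y)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        c[i][0] = i
    for j in range(m + 1):
        c[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = min(c[i - 1][j], c[i][j - 1]) + 1

    letters = []
    p, q = n, m
    while p > 0 or q > 0:
        if p > 0 and q > 0 and x[p - 1] == y[q - 1]:
            letters.append(x[p - 1])  # shared letter, emitted once
            p, q = p - 1, q - 1
        elif q == 0 or (p > 0 and c[p][q] == c[p - 1][q] + 1):
            letters.append(x[p - 1])  # letter taken from x only
            p -= 1
        else:
            letters.append(y[q - 1])  # letter taken from y only
            q -= 1
    return "".join(reversed(letters))


print(scs("abc", "abb"))
```

For X=abc and Y=abb the result has length 4, in line with the example on the earlier slide. The boundary cases (p=0 or q=0) are handled explicitly so that the remaining prefix of the other sequence is copied out.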
Exercises
• Exercise 1: For the weighted interval scheduling problem, there are eight jobs with starting time and finish time as follows: j1=(0, 8), j2=(2, 3), j3=(3, 6), j4=(5, 9), j5=(8, 12), j6=(9, 11), j7=(10, 13) and j8=(11, 16). The weight for each job is as follows: v1=3.5, v2=2.0, v3=3.0, v4=3.0, v5=6.5, v6=2.5, v7=12.0, and v8=8.0. Find a maximum weight subset of mutually compatible jobs. (The backtracking process is required. You have to compute the p()'s, but the process of computing them is NOT required.)
• Exercise 2: Let X=abbacab and Y=baabcbb. Find a longest common subsequence of X and Y. The backtracking process is required.
Summary of Week 6
• Understand the algorithms for the weighted interval scheduling problem, LCS and SCS.
• The “alignment of sequences” part is not taught.
Alignment of sequences
• An alignment:
  • inserting spaces into X and Y such that the two resulting sequences X’ and Y’ are of the same length;
  • every letter in X’ is opposite to a unique letter in Y’.
Examples (X’ above Y’):
o-currence        o-curr-ance        abbbaa--bbbbaab
occurrence        o-curre-nce        ababaaabbbbba-b
• The alignment value: $\sum_{i=1}^{l} s(X'[i], Y'[i])$, where l is the common length of X’ and Y’, X’[i] and Y’[i] are the two letters in column i of the alignment, and s(X’[i], Y’[i]) is the score (weight) of these opposing letters.
• There are several popular score schemes for DNA and protein sequences.
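The alignment value is just a column-by-column sum, which the following sketch makes concrete. The score function `similarity` (match 1, anything else 0) is an assumption picked for the illustration, as are the names `alignment_value` and `GAP`; any other score scheme can be passed in instead.

```python
GAP = "-"  # symbol used for an inserted space


def similarity(a, b):
    """Illustrative score: 1 for two equal letters, 0 otherwise."""
    return 1 if a == b and a != GAP else 0


def alignment_value(x_aligned, y_aligned, s):
    """Sum the score s over all columns of an alignment."""
    if len(x_aligned) != len(y_aligned):
        raise ValueError("aligned sequences must have the same length")
    return sum(s(a, b) for a, b in zip(x_aligned, y_aligned))


value = alignment_value("o-currence", "occurrence", similarity)
print(value)
```

For the first example alignment, nine of the ten columns contain equal letters, so the value under this score is 9.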
Recursive equations:
$$c[i,j] = \max\{\, c[i-1,j-1] + s(X[i], Y[j]),\ c[i,j-1] + s(\_, Y[j]),\ c[i-1,j] + s(X[i], \_) \,\}$$
Similarity Score Scheme (max):
• match: 1;
• mismatch or insertion or deletion: 0.
Example: the sequences ABBCAAA (columns) and ABCCAA (rows).
       A  B  B  C  A  A  A
    0  0  0  0  0  0  0  0
A   0  1  1  1  1  1  1  1
B   0  1  2  2  2  2  2  2
C   0  1  2  2  3  3  3  3
C   0  1  2  2  3  3  3  3
A   0  1  2  2  3  4  4  4
A   0  1  2  2  3  4  5  5
This is the same as LCS if we use the special similarity score and maximization.
Recursive equations:
$$c[i,j] = \min\{\, c[i-1,j-1] + s(X[i], Y[j]),\ c[i,j-1] + s(\_, Y[j]),\ c[i-1,j] + s(X[i], \_) \,\}$$
Distance Score Scheme (min):
• match: 1 (a matched column still contributes one letter to the supersequence, as in the SCS equation c[i-1,j-1]+1);
• insertion or deletion: 1;
• mismatch: 2.
Example: the sequences ABBCAAA (columns) and ABCCAA (rows).
       A  B  B  C  A  A  A
    0  1  2  3  4  5  6  7
A   1  1  2  3  4  5  6  7
B   2  2  2  3  4  5  6  7
C   3  3  3  4  4  5  6  7
C   4  4  4  5  5  6  7  8
A   5  5  5  6  6  6  7  8
A   6  6  6  7  7  7  7  8
This is the same as SCS if we use the special distance score and minimization.
A score emphasizing A-A match: (max)
• A-A match: 1,
• Any other match or mismatch: 0.
Example: the sequences ABBCAAA (columns) and ABCCAA (rows).
       A  B  B  C  A  A  A
    0  0  0  0  0  0  0  0
A   0  1  1  1  1  1  1  1
B   0  1  1  1  1  1  1  1
C   0  1  1  1  1  1  1  1
C   0  1  1  1  1  1  1  1
A   0  1  1  1  1  2  2  2
A   0  1  1  1  1  2  3  3
There are 3 A-A matches.
Recursive equations:
For a distance score (minimization):
$$c[i,j] = \min\{\, c[i-1,j-1] + s(X[i], Y[j]),\ c[i,j-1] + s(\_, Y[j]),\ c[i-1,j] + s(X[i], \_) \,\}$$
For a similarity score (maximization):
$$c[i,j] = \max\{\, c[i-1,j-1] + s(X[i], Y[j]),\ c[i,j-1] + s(\_, Y[j]),\ c[i-1,j] + s(X[i], \_) \,\}$$
• Time and space complexity: Both are O(nm), or $O(n^2)$ if both sequences have equal length n.
• Why? We have to compute c[i,j] (the cost) and b[i,j] (for backtracking). Each will take $O(n^2)$.
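Both recursive equations can be served by one routine that takes the score function and the choice of max or min as parameters. The sketch below is an illustration, not the lecturer's code. Its assumptions: the first row and column are filled by accumulating gap scores, and the distance score charges 1 for a match so that the minimum equals the SCS length. The names `align`, `similarity`, `distance` and `aa_score` are invented here.

```python
GAP = "_"  # stands for an inserted space


def align(x, y, s, best):
    """Optimal alignment value of x and y under score s; best is max or min."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # x[:i] aligned against spaces only
        c[i][0] = c[i - 1][0] + s(x[i - 1], GAP)
    for j in range(1, n + 1):  # y[:j] aligned against spaces only
        c[0][j] = c[0][j - 1] + s(GAP, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = best(c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                           c[i][j - 1] + s(GAP, y[j - 1]),
                           c[i - 1][j] + s(x[i - 1], GAP))
    return c[m][n]


def similarity(a, b):
    """Match 1; mismatch, insertion or deletion 0."""
    return 1 if a == b and a != GAP else 0


def distance(a, b):
    """Assumed costs: match 1, insertion or deletion 1, mismatch 2."""
    if GAP in (a, b) or a == b:
        return 1
    return 2


def aa_score(a, b):
    """Only an A-A column scores 1."""
    return 1 if a == b == "A" else 0


x, y = "ABCCAA", "ABBCAAA"
print(align(x, y, similarity, max))  # LCS length
print(align(x, y, distance, min))    # SCS length
print(align(x, y, aa_score, max))    # number of A-A matches
```

The two nested loops touch each of the (m+1)(n+1) cells once and do constant work per cell, which is where the O(nm) time and space bound comes from.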