Midterm: week 7 in the lecture for 2 hours

This text discusses the recursive algorithm for computing the longest common subsequence (LCS) of two given sequences, X and Y. It also explains the optimal substructure property of LCS and the recursive equation for computing the length of an LCS.

bobbic

Presentation Transcript


  1. Midterm: week 7 in the lecture for 2 hours chapter25

  5. Recursive algorithm:
       Compute-Opt(j)
         if j = 0 then return 0
         else return max{ v_j + Compute-Opt(p(j)), Compute-Opt(j-1) }
     Running time: more than $2^{n/2}$ in the worst case. (not required)

  6. Example instance (jobs indexed in order of finish time):
       Job j:  1  2  3  4  5  6
       v_j:    2  4  4  7  2  1
       p(j):   0  0  1  0  3  3
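The recursion of slide 5 can be run directly on this six-job instance. A minimal Python sketch, assuming the jobs are 1-indexed with index 0 as an unused placeholder; the lists `v` and `p` and the function name `compute_opt` are hypothetical names for the table above:

```python
# Six-job instance from the table above; index 0 is an unused placeholder.
v = [0, 2, 4, 4, 7, 2, 1]  # v[j]: value of job j
p = [0, 0, 0, 1, 0, 3, 3]  # p[j]: last job compatible with job j (0 = none)


def compute_opt(j: int) -> int:
    """Plain recursion from slide 5, with no memoization."""
    if j == 0:
        return 0
    return max(v[j] + compute_opt(p[j]), compute_opt(j - 1))


print(compute_opt(6))  # 8
```

The result agrees with the bottom-up table computed later (M[6] = 8), but this version recomputes the same subproblems many times.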

  7. (not required) The tree of subproblems grows very quickly, so the plain recursion may take exponential time.
     [Figure: recursion tree of Compute-Opt rooted at OPT(6); subproblems such as OPT(3), OPT(2) and OPT(1) appear many times.]

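  8. (not required) In the worst case (for example, p(j) = j - 2 for every j), taking T(0) = T(1):
     $T(n) = T(n-1) + T(n-2) \ge 2T(n-2) \ge 4T(n-4) \ge 8T(n-6) \ge \cdots \ge 2^{\lfloor n/2 \rfloor}\,T(1)$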
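The bound can be checked empirically by counting calls. A minimal sketch, assuming the hypothetical worst-case instance p(j) = max(j - 2, 0) (not the six jobs of slide 6); `count_calls` is an illustrative name, and job values are ignored because only the shape of the recursion matters:

```python
def count_calls(n: int) -> int:
    """Count Compute-Opt calls for n jobs when p(j) = max(j - 2, 0).

    Only the call structure matters here, so job values are ignored.
    """
    calls = 0

    def compute_opt(j: int) -> None:
        nonlocal calls
        calls += 1
        if j == 0:
            return
        compute_opt(max(j - 2, 0))  # the Compute-Opt(p(j)) branch
        compute_opt(j - 1)          # the Compute-Opt(j - 1) branch

    compute_opt(n)
    return calls


for n in (6, 12, 18):
    print(n, count_calls(n), round(2 ** (n / 2)))
```

The call count satisfies calls(n) = 1 + calls(n-1) + calls(n-2), the same Fibonacci-like growth as T(n), so it always stays above $2^{n/2}$.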

  9. Weighted Interval Scheduling: Bottom-Up
     • Input: n, s_1, s_2, …, s_n, f_1, f_2, …, f_n, v_1, v_2, …, v_n
     • Sort the jobs by finish time so that f_1 ≤ f_2 ≤ … ≤ f_n.
     • Compute p(1), p(2), …, p(n).
         M[0] = 0;
         for j = 1 to n do {
             M[j] = max{ v_j + M[p(j)], M[j-1] };
             if (M[j] == M[j-1]) then B[j] = 0 else B[j] = 1;   /* for backtracking */
         }
         m = n;                                               /* backtracking */
         while (m ≠ 0) {
             if (B[m] == 1) then { print job m; m = p(m); }
             else m = m - 1;
         }
     B[j] = 0 indicates that job j is not selected; B[j] = 1 indicates that job j is selected.
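A runnable version of the bottom-up pseudocode, as a sketch. It assumes the jobs are already sorted by finish time, p() is already computed, and both lists are 1-indexed with an unused entry at index 0; the function name is hypothetical:

```python
def weighted_interval_scheduling(v: list[float], p: list[int]) -> tuple[list[float], list[int]]:
    """Bottom-up DP with the B[] array used for backtracking.

    Jobs are 1-indexed and sorted by finish time; v[0] and p[0] are unused.
    Returns the table M and the selected jobs, in the order backtracking prints them.
    """
    n = len(v) - 1
    M = [0.0] * (n + 1)
    B = [0] * (n + 1)  # B[j] = 1 if job j is taken in the optimum for jobs 1..j
    for j in range(1, n + 1):
        M[j] = max(v[j] + M[p[j]], M[j - 1])
        B[j] = 0 if M[j] == M[j - 1] else 1

    selected = []
    m = n
    while m != 0:  # backtracking
        if B[m] == 1:
            selected.append(m)
            m = p[m]
        else:
            m -= 1
    return M, selected


M, jobs = weighted_interval_scheduling([0, 2, 4, 4, 7, 2, 1], [0, 0, 0, 1, 0, 3, 3])
print(M)     # [0.0, 2.0, 4.0, 6.0, 7.0, 8.0, 8.0]
print(jobs)  # [5, 3, 1]
```

Filling M takes O(n) time and backtracking takes at most n steps, matching the complexity stated on slide 11.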

  10. Example (the six jobs above):
      M[1] = v_1 + M[0] = 2 + 0 = 2
      M[2] = v_2 + M[0] = 4 + 0 = 4
      M[3] = v_3 + M[1] = 4 + 2 = 6
      M[4] = v_4 + M[0] = 7 + 0 = 7
      M[5] = v_5 + M[3] = 2 + 6 = 8
      M[6] = M[5] = 8, since v_6 + M[3] = 1 + 6 = 7 < 8
        j:  0  1  2  3  4  5  6
        M:  0  2  4  6  7  8  8
        B:  0  1  1  1  1  1  0
      Backtracking prints job 5, job 3, job 1; the selected jobs {1, 3, 5} have total value 2 + 4 + 2 = 8.

  11. Backtracking and time complexity
      • Backtracking is used to recover the schedule itself.
      • The p()'s can be computed in O(n) time after sorting the jobs by finish time and by start time.
      • Time complexity:
        • O(n) if the jobs are sorted and p() is computed.
        • Total time: O(n log n), including sorting.

  12. Computing p()'s in O(n) time
      The p()'s can be computed in O(n) time using two sorted lists: one sorted by finish time (if two jobs have the same finish time, sort them by start time) and the other sorted by start time.
        Start time:  b(0, 5), a(1, 3), e(3, 8), c(5, 6), d(6, 8)
        Finish time: a(1, 3), b(0, 5), c(5, 6), d(6, 8), e(3, 8)
      p(d) = c, p(c) = b, p(e) = a, p(a) = 0, p(b) = 0. (See demo7.)
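One way to realize the two-list scan in Python, as a sketch. It assumes positive-length jobs given as a dictionary name -> (start, finish); `compute_p` is a hypothetical name and `None` stands for p = 0. Because start times only increase along the start-time list, the pointer into the finish-time list never moves backwards, so the scan after sorting is O(n). Ties in finish time are broken here by increasing start time; in this example that choice does not change any p() value.

```python
from typing import Optional


def compute_p(jobs: dict[str, tuple[int, int]]) -> dict[str, Optional[str]]:
    """p(j) = last job, in finish-time order, that finishes no later than j starts."""
    by_finish = sorted(jobs, key=lambda name: (jobs[name][1], jobs[name][0]))
    by_start = sorted(jobs, key=lambda name: jobs[name][0])

    p: dict[str, Optional[str]] = {}
    k = 0  # by_finish[:k] are exactly the jobs finishing <= the current start time
    for name in by_start:
        start = jobs[name][0]
        while k < len(by_finish) and jobs[by_finish[k]][1] <= start:
            k += 1
        p[name] = by_finish[k - 1] if k > 0 else None
    return p


print(compute_p({"a": (1, 3), "b": (0, 5), "c": (5, 6), "d": (6, 8), "e": (3, 8)}))
```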

  13. Example 2:
        Start time:  b(0, 5), a(1, 3), e(3, 8), c(5, 6), d(6, 8)
        Finish time: a(1, 3), b(0, 5), c(5, 6), d(6, 8), e(3, 8)
      p(d) = c, p(c) = b, p(e) = a, p(a) = 0, p(b) = 0.
      v(a) = 2, v(b) = 3, v(c) = 5, v(d) = 6, v(e) = 8.8.
      Solution:
        M[0] = 0, M[a] = 2.
        M[b] = max{2, 3 + M[p(b)]} = max{2, 3 + 0} = 3.
        M[c] = max{3, 5 + M[p(c)]} = 5 + M[b] = 8.
        M[d] = max{8, 6 + M[p(d)]} = 6 + M[c] = 6 + 8 = 14.
        M[e] = max{14, 8.8 + M[p(e)]} = max{14, 8.8 + M[a]} = max{14, 10.8} = 14.
      Backtracking selects d, c, b; the optimal schedule {b, c, d} has total value 14.
        Job: a  b  c  d  e
        B:   1  1  1  1  0
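An end-to-end sketch that sorts, computes p(), fills M and backtracks for Example 2. It assumes jobs are given as name -> (start, finish, value); `schedule` is a hypothetical name. Note one swapped-in technique: p() is found by binary search over the sorted finish times with Python's `bisect.bisect_right` (O(log n) per job) instead of the two-list scan of slide 12. Ties in finish time are ordered by start time, so e is processed before d here; the optimum is the same.

```python
from bisect import bisect_right


def schedule(jobs: dict[str, tuple[float, float, float]]) -> tuple[float, list[str]]:
    """Weighted interval scheduling for name -> (start, finish, value)."""
    order = sorted(jobs, key=lambda name: (jobs[name][1], jobs[name][0]))
    finishes = [jobs[name][1] for name in order]
    n = len(order)
    # p[j] = number of jobs finishing no later than job j starts (1-indexed; 0 = none).
    p = [0] + [bisect_right(finishes, jobs[name][0]) for name in order]
    v = [0.0] + [jobs[name][2] for name in order]

    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(v[j] + M[p[j]], M[j - 1])

    chosen, j = [], n
    while j > 0:
        if M[j] != M[j - 1]:  # same rule as B[j] = 1 on slide 9
            chosen.append(order[j - 1])
            j = p[j]
        else:
            j -= 1
    return M[n], chosen


jobs = {"a": (1, 3, 2), "b": (0, 5, 3), "c": (5, 6, 5), "d": (6, 8, 6), "e": (3, 8, 8.8)}
print(schedule(jobs))  # (14.0, ['d', 'c', 'b'])
```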

  14. Longest common subsequence
      • Definition 1: Given a sequence X = x_1x_2...x_m, another sequence Z = z_1z_2...z_k is a subsequence of X if there exists a strictly increasing sequence i_1 < i_2 < ... < i_k of indices of X such that x_{i_j} = z_j for all j = 1, 2, ..., k.
      • Example 1: If X = abcdefg, then Z = abdg is a subsequence of X (take the indices 1, 2, 4, 7).
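Definition 1 can be checked constructively by producing the index sequence i_1 < ... < i_k. A minimal sketch; `subsequence_indices` is a hypothetical name, and it returns 1-based indices to match the slide. Greedily matching each letter of Z at its earliest possible position never blocks later letters, so the scan finds indices whenever any exist.

```python
from typing import Optional


def subsequence_indices(z: str, x: str) -> Optional[list[int]]:
    """Return 1-based indices i_1 < ... < i_k with x[i_j] = z[j], or None."""
    indices = []
    i = 0
    for ch in z:
        while i < len(x) and x[i] != ch:  # skip letters of x that cannot match ch
            i += 1
        if i == len(x):
            return None  # ran out of x: z is not a subsequence
        indices.append(i + 1)  # convert to 1-based, as on the slide
        i += 1
    return indices


print(subsequence_indices("abdg", "abcdefg"))  # [1, 2, 4, 7]
```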

  15. Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
      • Example 2: X = abcdefg and Y = aaadgfd. Z = adf is a common subsequence of X and Y (it occupies positions 1, 4, 6 in both).

  16. Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y with the greatest length. (The length of a sequence is the number of letters in the sequence.)
      • A longest common subsequence may not be unique.
      • Example: for X = abcd and Y = acbd, both acd and abd are LCSs.

  17. Longest common subsequence problem
      • Input: two sequences X = x_1x_2...x_m and Y = y_1y_2...y_n.
      • Output: a longest common subsequence of X and Y.
      • Applications:
        • Similarity of two lists. Given L1: 1, 2, 3, 4, 5 and L2: 1, 3, 2, 4, 5, the length of an LCS is 4, which indicates how similar the two lists are.
        • The Unix command "diff".

  18. Longest common subsequence problem
      • Input: two sequences X = x_1x_2...x_m and Y = y_1y_2...y_n.
      • Output: a longest common subsequence of X and Y.
      • A brute-force approach: suppose that m ≤ n. Try all subsequences of X (there are 2^m of them), test whether each one is also a subsequence of Y, and select the longest. This takes exponential time.
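The brute-force approach, sketched in Python for tiny inputs only. The helper names `is_subsequence` and `lcs_brute_force` are hypothetical; subsequences of X are generated as index subsets with `itertools.combinations`, longest first, so in the worst case all 2^m of them are examined.

```python
from itertools import combinations


def is_subsequence(z: str, y: str) -> bool:
    """True if z is a subsequence of y; `ch in it` advances the iterator past the match."""
    it = iter(y)
    return all(ch in it for ch in z)


def lcs_brute_force(x: str, y: str) -> set[str]:
    """All longest common subsequences, found by trying subsequences of x longest first."""
    for k in range(len(x), -1, -1):
        found = set()
        for idx in combinations(range(len(x)), k):
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                found.add(z)
        if found:
            return found
    return {""}  # not reached: the empty sequence (k = 0) is always common


print(lcs_brute_force("abcd", "acbd"))  # {'abd', 'acd'} (set order may vary)
```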

  19. Characterizing a longest common subsequence
      • Theorem (optimal substructure of an LCS): Let X = x_1x_2...x_m and Y = y_1y_2...y_n be two sequences, and let Z = z_1z_2...z_k be any LCS of X and Y.
        1. If x_m = y_n, then z_k = x_m = y_n and Z[1..k-1] is an LCS of X[1..m-1] and Y[1..n-1].
        2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is an LCS of X[1..m-1] and Y.
        3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y[1..n-1].

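  20. The recursive equation
      • Let c[i,j] be the length of an LCS of X[1..i] and Y[1..j]. Then c[i,j] can be computed as follows:
        $$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0,\\ c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\ \max\{c[i,j-1],\ c[i-1,j]\} & \text{if } i,j > 0 \text{ and } x_i \neq y_j. \end{cases}$$
      Computing the length of an LCS
      • There are (m+1)(n+1) values c[i,j], so we can compute all of them in a specific order: row by row (i = 1, ..., m) and from left to right within each row (j = 1, ..., n). Then c[i-1,j-1], c[i-1,j] and c[i,j-1] are already available when c[i,j] is computed.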
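The recursive equation can also be evaluated top-down, caching each c[i,j] the first time it is computed. A minimal sketch using the standard-library `functools.lru_cache` for memoization (a swapped-in alternative to the row-by-row order); `lcs_length` is a hypothetical name. Python strings are 0-indexed, so x_i is `x[i - 1]`.

```python
from functools import lru_cache


def lcs_length(x: str, y: str) -> int:
    """c[m, n] from the recursive equation, evaluated top-down with memoization."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:  # x_i = y_j
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length("BDCABA", "ABCBDAB"))  # 4
```

Each of the (m+1)(n+1) subproblems is computed at most once, so the running time is O(mn), the same as the bottom-up table; the recursion depth is at most m + n.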

  21. The algorithm to compute an LCS
        for i = 1 to m do
            c[i,0] = 0;
        for j = 0 to n do
            c[0,j] = 0;
        for i = 1 to m do
            for j = 1 to n do {
                if x[i] == y[j] then                  /* diagonal: x_i is in the LCS */
                    c[i,j] = c[i-1,j-1] + 1;
                    b[i,j] = 1;
                elseif c[i-1,j] >= c[i,j-1] then      /* from the cell above */
                    c[i,j] = c[i-1,j];
                    b[i,j] = 2;
                else                                  /* from the cell to the left */
                    c[i,j] = c[i,j-1];
                    b[i,j] = 3;
            }
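The pseudocode translates almost line for line into Python. A sketch, assuming 0-indexed strings (so x[i] on the slide is `x[i - 1]` here); `lcs_tables` is a hypothetical name. Both tables are (m+1) x (n+1) lists whose row 0 and column 0 stay 0.

```python
def lcs_tables(x: str, y: str) -> tuple[list[list[int]], list[list[int]]]:
    """Fill c and b as in the pseudocode (rows i = 0..m, columns j = 0..n).

    b[i][j] = 1: diagonal (x_i = y_j); 2: copied from c[i-1][j]; 3: copied from c[i][j-1].
    """
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3
    return c, b


c, b = lcs_tables("BDCABA", "ABCBDAB")  # Example 3
print(c[6][7])  # 4
```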

  22. Example 3: X = BDCABA and Y = ABCBDAB.

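  23. Constructing an LCS (backtracking)
      • We can find an LCS using the b[i,j]'s.
      • We start at b[m,n] and trace back until we reach row 0 or column 0.
      • The algorithm to construct an LCS (backtracking):
        1. i = m;
        2. j = n;
        3. if i == 0 or j == 0 then exit;
        4. if b[i,j] == 1 then { print x_i; i = i-1; j = j-1; }
           else if b[i,j] == 2 then i = i-1;
           else if b[i,j] == 3 then j = j-1;
        5. Goto Step 3.
      • The letters are printed in reverse order, from the last letter of the LCS to the first.
      • Time complexity: backtracking takes O(m + n) steps, since i or j decreases at every step; together with filling the table, the total time is O(nm).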
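A complete sketch that fills the table and then backtracks. As a design choice it does not store b: at each cell it re-derives the arrow from c using the same tests as the pseudocode (including the tie rule that prefers moving up), which saves one table. The name `lcs` is hypothetical; letters are collected last-to-first and reversed at the end.

```python
def lcs(x: str, y: str) -> str:
    """Return one LCS of x and y: fill c bottom-up, then trace back from (m, n)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    letters = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:          # b[i][j] == 1: x_i is in the LCS
            letters.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:  # b[i][j] == 2: move up
            i -= 1
        else:                             # b[i][j] == 3: move left
            j -= 1
    return "".join(reversed(letters))     # undo the last-to-first order


print(lcs("BDCABA", "ABCBDAB"))
```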

  24. Remarks on weighted interval scheduling
      • It takes a long time to explain (50 + 13 minutes).
      • Do not mention exponential time, etc.
      • For the first example, use the format of Example 2 to show the computation process more clearly.

  25. Shortest common supersequence
      • Definition: Let X and Y be two sequences. A sequence Z is a common supersequence of X and Y if both X and Y are subsequences of Z.
      • Shortest common supersequence (SCS) problem:
        Input: two sequences X and Y.
        Output: a shortest common supersequence of X and Y.
      • Example: X = abc and Y = abb. Both abbc and abcb are shortest common supersequences of X and Y.

  26. Recursive equation
      • Let c[i,j] be the length of an SCS of X[1..i] and Y[1..j]. Then c[i,j] can be computed as follows:
        $$c[i,j] = \begin{cases} j & \text{if } i = 0,\\ i & \text{if } j = 0,\\ c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\ \min\{c[i,j-1] + 1,\ c[i-1,j] + 1\} & \text{if } i,j > 0 \text{ and } x_i \neq y_j. \end{cases}$$
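The SCS recurrence can be evaluated the same way as the LCS one. A minimal top-down sketch with `functools.lru_cache`; `scs_length` is a hypothetical name and x_i is `x[i - 1]` in 0-indexed Python.

```python
from functools import lru_cache


def scs_length(x: str, y: str) -> int:
    """c[|x|, |y|] from the SCS recursive equation, evaluated top-down with memoization."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0:
            return j  # only the j letters of Y[1..j] remain
        if j == 0:
            return i  # only the i letters of X[1..i] remain
        if x[i - 1] == y[j - 1]:  # a shared letter is written once
            return c(i - 1, j - 1) + 1
        return min(c(i, j - 1) + 1, c(i - 1, j) + 1)

    return c(len(x), len(y))


print(scs_length("abc", "abb"))  # 4
```

For both examples the result equals |X| + |Y| minus the LCS length (3 + 3 - 2 = 4 for abc and abb).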

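  28. The pseudocode (X has length n, Y has length m)
        for i = 0 to n do { c[i,0] = i; b[i,0] = 2; }   /* only letters of X remain */
        for j = 0 to m do { c[0,j] = j; b[0,j] = 3; }   /* only letters of Y remain */
        for i = 1 to n do
            for j = 1 to m do
                if (x[i] == y[j]) then {
                    c[i,j] = c[i-1,j-1] + 1; b[i,j] = 1;
                } else {
                    c[i,j] = min{ c[i-1,j] + 1, c[i,j-1] + 1 };
                    if (c[i,j] == c[i-1,j] + 1) then b[i,j] = 2; else b[i,j] = 3;
                }
        p = n; q = m;   /* backtracking */
        while (p ≠ 0 or q ≠ 0) {
            if (b[p,q] == 1) then { print x[p]; p = p-1; q = q-1; }
            else if (b[p,q] == 2) then { print x[p]; p = p-1; }
            else { print y[q]; q = q-1; }
        }
      The letters are printed in reverse order.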
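A runnable Python version of this pseudocode, as a sketch. It keeps the slide's conventions (X has length n, Y has length m, ties go to b = 2) and sets b[i][0] = 2 and b[0][j] = 3 so that backtracking can walk along row 0 or column 0 once one sequence is used up. The name `scs` is hypothetical; cell b[0][0] is overwritten by the second loop but is never read.

```python
def scs(x: str, y: str) -> str:
    """Return one shortest common supersequence of x and y."""
    n, m = len(x), len(y)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    b = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        c[i][0], b[i][0] = i, 2  # only letters of x remain: move up
    for j in range(m + 1):
        c[0][j], b[0][j] = j, 3  # only letters of y remain: move left
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, 1
            elif c[i - 1][j] <= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j] + 1, 2
            else:
                c[i][j], b[i][j] = c[i][j - 1] + 1, 3

    out = []
    p, q = n, m
    while p != 0 or q != 0:
        if b[p][q] == 1:    # shared letter, written once
            out.append(x[p - 1])
            p, q = p - 1, q - 1
        elif b[p][q] == 2:  # letter of x
            out.append(x[p - 1])
            p -= 1
        else:               # letter of y
            out.append(y[q - 1])
            q -= 1
    return "".join(reversed(out))  # letters were produced last-to-first


print(scs("abc", "abb"))  # abbc
```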

  29. Exercises
      • Exercise 1: For the weighted interval scheduling problem, there are eight jobs with the following start and finish times: j1 = (0, 8), j2 = (2, 3), j3 = (3, 6), j4 = (5, 9), j5 = (8, 12), j6 = (9, 11), j7 = (10, 13) and j8 = (11, 16). The weights of the jobs are v1 = 3.5, v2 = 2.0, v3 = 3.0, v4 = 3.0, v5 = 6.5, v6 = 2.5, v7 = 12.0 and v8 = 8.0. Find a maximum-weight subset of mutually compatible jobs. (The backtracking process is required.) (You have to compute the p()'s, but showing the process of computing them is NOT required.)
      • Exercise 2: Let X = abbacab and Y = baabcbb. Find a longest common subsequence of X and Y. The backtracking process is required.

  30. Summary of Week 6
      • Understand the algorithms for the weighted interval scheduling problem, LCS and SCS.
      • The "alignment of sequences" part is not taught.

  31. Alignment of sequences
      • An alignment:
        • inserts spaces into X and Y so that the two resulting sequences X' and Y' have the same length, and
        • every letter in X' is opposite a unique letter in Y'.
      Examples (X' above Y'; "-" denotes a space):
        o-currence    o-curr-ance    abbbaa--bbbbaab
        occurrence    o-curre-nce    ababaaabbbbba-b
      • The alignment value is $\sum_{i=1}^{\ell} s(X'[i], Y'[i])$, where ℓ is the common length of X' and Y', X'[i] and Y'[i] are the two letters in column i of the alignment, and s(X'[i], Y'[i]) is the score (weight) of these opposing letters.
      • There are several popular score schemes for DNA and protein sequences.
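The alignment value is a column-by-column sum, so it takes only a few lines to compute for a given alignment. A sketch, assuming "-" marks a space as in the examples; `alignment_value` and the scheme `match_score` (1 for two equal letters, 0 otherwise) are hypothetical illustrations, not one of the popular DNA or protein schemes.

```python
from typing import Callable

GAP = "-"  # a space in the alignment, written "-" in the examples


def alignment_value(xa: str, ya: str, s: Callable[[str, str], float]) -> float:
    """Sum of s(X'[i], Y'[i]) over the columns of an alignment."""
    if len(xa) != len(ya):
        raise ValueError("X' and Y' must have the same length")
    return sum(s(a, b) for a, b in zip(xa, ya))


def match_score(a: str, b: str) -> int:
    """Hypothetical scheme: 1 for two equal letters, 0 otherwise (spaces never score)."""
    return 1 if a == b and a != GAP else 0


print(alignment_value("o-currence", "occurrence", match_score))  # 9
```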

  32. Recursive equations:
      $$c[i,j] = \max\{\, c[i-1,j-1] + s(X[i],Y[j]),\ c[i,j-1] + s(\_,Y[j]),\ c[i-1,j] + s(X[i],\_) \,\}$$
      Similarity score scheme (max):
      • match: 1;
      • mismatch, insertion or deletion: 0.
      Example: X = ABCCAA (rows), Y = ABBCAAA (columns)
                  A  B  B  C  A  A  A
               0  0  0  0  0  0  0  0
            A  0  1  1  1  1  1  1  1
            B  0  1  2  2  2  2  2  2
            C  0  1  2  2  3  3  3  3
            C  0  1  2  2  3  3  3  3
            A  0  1  2  2  3  4  4  4
            A  0  1  2  2  3  4  5  5
      This is the same as LCS if we use this special similarity score and maximization.
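The max recurrence with a pluggable score function, as a sketch. Two assumptions are labeled here: "_" stands for a space, and the first row and column are filled with insertions/deletions only, since the slide does not state base cases (with this scheme they are all 0, matching the table). `max_alignment_table` and `similarity` are hypothetical names.

```python
from typing import Callable

GAP = "_"  # s(_, Y[j]) and s(X[i], _) score a letter opposite a space


def max_alignment_table(x: str, y: str, s: Callable[[str, str], int]) -> list[list[int]]:
    """c[i][j] = best alignment score of X[1..i] and Y[1..j] (maximization)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # assumed base case: X[1..i] against spaces only
        c[i][0] = c[i - 1][0] + s(x[i - 1], GAP)
    for j in range(1, n + 1):  # assumed base case: Y[1..j] against spaces only
        c[0][j] = c[0][j - 1] + s(GAP, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = max(
                c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # X[i] opposite Y[j]
                c[i][j - 1] + s(GAP, y[j - 1]),           # space opposite Y[j]
                c[i - 1][j] + s(x[i - 1], GAP),           # X[i] opposite a space
            )
    return c


def similarity(a: str, b: str) -> int:
    """Match 1; mismatch, insertion or deletion 0."""
    return 1 if a == b else 0


print(max_alignment_table("ABCCAA", "ABBCAAA", similarity)[6])  # [0, 1, 2, 2, 3, 4, 5, 5]
```

Passing `lambda a, b: 1 if a == b == "A" else 0` instead of `similarity` gives the A-A scheme of slide 34, whose bottom-right value is 3.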

  33. Recursive equations:
      $$c[i,j] = \min\{\, c[i-1,j-1] + s(X[i],Y[j]),\ c[i,j-1] + s(\_,Y[j]),\ c[i-1,j] + s(X[i],\_) \,\}$$
      Distance score scheme (min):
      • match: 1;
      • insertion and deletion: 1;
      • mismatch: 2.
      Example: X = ABCCAA (rows), Y = ABBCAAA (columns)
                  A  B  B  C  A  A  A
               0  1  2  3  4  5  6  7
            A  1  1  2  3  4  5  6  7
            B  2  2  2  3  4  5  6  7
            C  3  3  3  4  4  5  6  7
            C  4  4  4  5  5  6  7  8
            A  5  5  5  6  6  6  7  8
            A  6  6  6  7  7  7  7  8
      This is the same as SCS if we use this special distance score and minimization.
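The min recurrence, sketched the same way. It assumes the scores match 1, insertion/deletion 1 and mismatch 2, which reproduce the example table (for instance c[1][1] = 1 for the A-A match); "_" again stands for a space, and the first row and column use insertions/deletions only. `min_alignment_table` and `distance` are hypothetical names.

```python
from typing import Callable

GAP = "_"  # a space opposite a letter


def min_alignment_table(x: str, y: str, s: Callable[[str, str], int]) -> list[list[int]]:
    """c[i][j] = minimum alignment cost of X[1..i] and Y[1..j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # X[1..i] against spaces only
        c[i][0] = c[i - 1][0] + s(x[i - 1], GAP)
    for j in range(1, n + 1):  # Y[1..j] against spaces only
        c[0][j] = c[0][j - 1] + s(GAP, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = min(
                c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                c[i][j - 1] + s(GAP, y[j - 1]),
                c[i - 1][j] + s(x[i - 1], GAP),
            )
    return c


def distance(a: str, b: str) -> int:
    """Match 1, insertion or deletion 1, mismatch 2."""
    if a == b or GAP in (a, b):
        return 1
    return 2


table = min_alignment_table("ABCCAA", "ABBCAAA", distance)
print(table[6])  # [6, 6, 6, 7, 7, 7, 7, 8]
```

A mismatch (cost 2) never beats an insertion plus a deletion (also 2), so c[i][j] here is exactly the SCS length of the two prefixes: 8 for the full sequences.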

  34. A score emphasizing A-A matches (max):
      • A-A match: 1;
      • any other match or mismatch: 0 (insertions and deletions also score 0).
      Example: X = ABCCAA (rows), Y = ABBCAAA (columns)
                  A  B  B  C  A  A  A
               0  0  0  0  0  0  0  0
            A  0  1  1  1  1  1  1  1
            B  0  1  1  1  1  1  1  1
            C  0  1  1  1  1  1  1  1
            C  0  1  1  1  1  1  1  1
            A  0  1  1  1  1  2  2  2
            A  0  1  1  1  1  2  3  3
      An optimal alignment has 3 A-A matches.

  35. Recursive equations:
      $$c[i,j] = \min\{\, c[i-1,j-1] + s(X[i],Y[j]),\ c[i,j-1] + s(\_,Y[j]),\ c[i-1,j] + s(X[i],\_) \,\}$$
      $$c[i,j] = \max\{\, c[i-1,j-1] + s(X[i],Y[j]),\ c[i,j-1] + s(\_,Y[j]),\ c[i-1,j] + s(X[i],\_) \,\}$$
      • Time and space complexity: both are O(nm), or O(n^2) if both sequences have length n.
      • Why? We have to compute c[i,j] (the cost) and b[i,j] (for backtracking) for every pair (i, j). When both sequences have length n, each table has (n+1)^2 entries and each entry takes O(1) time, so each table takes O(n^2) time and space.
