Longest Common Subsequence

damali

Presentation Transcript


  2. Longest Common Subsequence
     • Strand of DNA: a string of molecules called bases
     • Four possible bases: adenine (A), guanine (G), cytosine (C) and thymine (T)
     • A DNA strand is expressible as a string over the alphabet {A, C, G, T}
     • Examples
       • organism 1: ACCGGTCGAGTGCGCGGAAGCCGGCCGAA
       • organism 2: GTCGTTCGGAATGCCGTTGCTCTGTGTAAA
     • "Similarity" of strands is used as a measure of similarity of organisms
     • "Similar" could mean
       • one string is a substring of the other (text-matching algorithms)
       • the number of letter changes needed to go from one to the other is small (edit distance, exercise 15-3)
       • longest common subsequence: a longest sequence that appears as a (not necessarily consecutive) subsequence of both sequences
     • GTCGTCGGAAGCCGGCCGAA is the LCS of the above strands.

  4. Formal Definitions
     • Subsequence: a sequence Z = ⟨z_1, z_2, ..., z_k⟩ is a subsequence of a sequence X = ⟨x_1, x_2, ..., x_m⟩ if there is an increasing sequence of indices 1 ≤ i_1 < i_2 < ⋯ < i_k ≤ m such that x_{i_j} = z_j for j = 1, …, k.
     • Example: Z = ⟨B, C, D, B⟩ is a subsequence of X = ⟨A, B, C, B, D, A, B⟩
     • Index sequence: ⟨2, 3, 5, 7⟩
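The definition can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `subsequence_indices` is hypothetical. It scans X left to right, takes the earliest match for each symbol of Z, and reports 1-based indices to match the slide's convention.

```python
def subsequence_indices(z, x):
    """Return 1-based indices i_1 < ... < i_k with x[i_j] == z[j], or None.

    Hypothetical helper for illustration; it greedily picks the earliest
    possible position in x for each symbol of z.
    """
    indices = []
    pos = 0  # next 0-based position in x that may still be used
    for symbol in z:
        # Skip symbols of x that do not match the current symbol of z.
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # ran out of x: z is not a subsequence of x
        indices.append(pos + 1)  # convert to the slide's 1-based indexing
        pos += 1
    return indices


print(subsequence_indices("BCDB", "ABCBDAB"))  # [2, 3, 5, 7]
```

For the slide's example the greedy scan happens to return the same index sequence, ⟨2, 3, 5, 7⟩; in general a subsequence can have several valid index sequences.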

  5. Formal Definitions
     • Common subsequence: a sequence Z is a common subsequence of sequences X and Y if Z is a subsequence of X and Z is a subsequence of Y
     • Example: Z = ⟨B, C, A⟩ is a common subsequence of X = ⟨A, B, C, B, D, A, B⟩ and Y = ⟨B, D, C, A, B, A⟩
     • The Z given above is not a longest common subsequence of X and Y: ⟨B, C, B, A⟩ is a longer common subsequence of X and Y
     • In fact, ⟨B, C, B, A⟩ is a longest common subsequence of X and Y.
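The claims in this example can be confirmed by brute force. The sketch below is an illustration only and is not from the slides; `is_subsequence` and `brute_force_lcs_length` are hypothetical names. It tries every subsequence of X, longest first, which takes exponential time and is only practical for strings this short.

```python
from itertools import combinations


def is_subsequence(z, x):
    """True if the symbols of z appear in x in order (not necessarily adjacent)."""
    remaining = iter(x)
    # `symbol in remaining` advances the iterator past the first match,
    # so later symbols can only match later positions of x.
    return all(symbol in remaining for symbol in z)


def brute_force_lcs_length(x, y):
    """Length of a longest common subsequence, by exhaustive search over x."""
    for k in range(len(x), 0, -1):
        # combinations() keeps the original order, so each tuple is a subsequence of x.
        if any(is_subsequence(candidate, y) for candidate in combinations(x, k)):
            return k
    return 0  # only the empty sequence is common


X, Y = "ABCBDAB", "BDCABA"
print(is_subsequence("BCA", X) and is_subsequence("BCA", Y))    # True
print(is_subsequence("BCBA", X) and is_subsequence("BCBA", Y))  # True
print(brute_force_lcs_length(X, Y))                             # 4
```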

  6. Longest Common Subsequence Problem
     • Common Subsequence Problem
       Input: finite sequences X and Y
       Output: a maximum-length common subsequence of X and Y
     • We will examine an efficient dynamic programming algorithm for this problem.

  8. Longest Common Subsequence Characterization
     • Definition: for each i with 0 ≤ i ≤ m, the ith prefix of the sequence X = ⟨x_1, x_2, ..., x_m⟩ is the sequence X_i = ⟨x_1, x_2, ..., x_i⟩
     • Note that the 0th prefix is the empty sequence, which has length 0
     • Example: if X = ⟨A, B, C, B, D, A, B⟩, then X_4 = ⟨A, B, C, B⟩
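In a language with 0-based strings, the ith prefix corresponds to a slice of the first i symbols. A small Python sketch, with the hypothetical helper name `prefix`:

```python
def prefix(x, i):
    """Return X_i, the first i symbols of x; X_0 is the empty sequence."""
    if not 0 <= i <= len(x):
        raise ValueError("i must satisfy 0 <= i <= m")
    return x[:i]


print(prefix("ABCBDAB", 4))  # ABCB
```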

  13. Optimal Substructure for LCS
      • Theorem: let X = ⟨x_1, x_2, ..., x_{m-1}, x_m⟩ and Y = ⟨y_1, y_2, ..., y_{n-1}, y_n⟩ be sequences, and let Z = ⟨z_1, z_2, ..., z_{k-1}, z_k⟩ be any LCS of X and Y.
        1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
        2. If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y.
        3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y_{n-1}.
      • The importance of the above theorem is that it shows that an LCS of two sequences contains an LCS of prefixes of the sequences.
      • Therefore, the LCS problem has the optimal-substructure property.
      • A recurrence characterizing the LCS of two sequences also follows from the theorem.

  14. Optimal Substructure for LCS
      • Given sequences X = ⟨x_1, x_2, ..., x_m⟩, Y = ⟨y_1, y_2, ..., y_n⟩ and integers i, j with 0 ≤ i ≤ m and 0 ≤ j ≤ n, let c[i,j] denote the length of a longest common subsequence of X_i and Y_j
      • Then c[i,j] satisfies the following recurrence:
          c[i,j] = 0                                 if i = 0 or j = 0
          c[i,j] = c[i-1,j-1] + 1                    if i, j > 0 and x_i = y_j
          c[i,j] = max { c[i-1,j], c[i,j-1] }        if i, j > 0 and x_i ≠ y_j
      • A direct recursive implementation of the above recurrence would produce an exponential-time algorithm, because overlapping sub-problems are solved repeatedly
      • Since there are only Θ(mn) distinct sub-problems, we can use dynamic programming to compute the solutions bottom up.
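To make the overlapping-sub-problems point concrete, here is a hedged Python sketch of the recurrence for c[i,j]: 0 when either prefix is empty, 1 + c[i-1,j-1] when the last symbols match, and otherwise the larger of c[i-1,j] and c[i,j-1]. It uses top-down recursion with memoization, a different route from the bottom-up table the slides go on to use. The name `lcs_length_memo` is an assumption, and strings are 0-based, so x_i is `x[i - 1]`.

```python
from functools import lru_cache


def lcs_length_memo(x, y):
    """Length of an LCS of x and y via the recurrence, with each c(i, j) cached."""

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0  # an empty prefix has only the empty common subsequence
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1  # last symbols match: extend the LCS
        return max(c(i - 1, j), c(i, j - 1))  # drop the last symbol of x or of y

    return c(len(x), len(y))


print(lcs_length_memo("ABCBDAB", "BDCABA"))  # 4
```

Without the cache, the same (i, j) pairs would be recomputed many times; with it, each of the (m+1)(n+1) pairs is evaluated at most once. Python's recursion limit makes this sketch unsuitable for long inputs.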

  15. Dynamic Programming Algorithm for LCS
      • Input: sequences a and b
      • Output: two-dimensional table c
      • Postcondition: c[i][j] is the length of a longest common subsequence of the prefixes a_i and b_j

  16. Dynamic Programming Algorithm for LCS
      LCS(a, b, c)
        m = a.last
        n = b.last
        for i = 0 to m
          c[i][0] = 0
        for j = 1 to n
          c[0][j] = 0
        for i = 1 to m
          for j = 1 to n
            if ( a[i] ≠ b[j] )
              c[i][j] = max { c[i-1][j], c[i][j-1] }
            else
              c[i][j] = 1 + c[i-1][j-1]
      Running time: Θ(mn)
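A runnable Python rendering of the LCS pseudocode might look like the following sketch. It is an adaptation rather than the slides' own code: the name `lcs_table` is an assumption, the table is returned instead of being filled through a parameter, and because Python strings are 0-based the slide's a[i] becomes `a[i - 1]`.

```python
def lcs_table(a, b):
    """Build the (m+1) x (n+1) table c, where c[i][j] is the LCS length
    of the first i symbols of a and the first j symbols of b."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0: an empty prefix has LCS length 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] != b[j - 1]:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
            else:
                c[i][j] = 1 + c[i - 1][j - 1]
    return c


table = lcs_table("ABCBDAB", "BDCABA")
print(table[7][6])  # 4
print(table[7])     # [0, 1, 2, 2, 3, 4, 4]
```

The two nested loops visit each of the m·n interior cells once and do constant work per cell, which is where the Θ(mn) running time comes from.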

  17. Dynamic Programming Algorithm for LCS
      • Printing the longest common subsequence
      LCS_print(a, m, n, c) {
        if ( c[m][n] == 0 )
          return
        if ( c[m][n] == c[m-1][n] )
          LCS_print(a, m-1, n, c)
        else if ( c[m][n] == c[m][n-1] )
          LCS_print(a, m, n-1, c)
        else {
          LCS_print(a, m-1, n-1, c)
          print( a[m] )
        }
      }
      Running time: O(m+n)
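The traceback in LCS_print can be sketched in Python as below. This is an illustrative adaptation, not the slides' code: it walks the table iteratively instead of recursively, returns the string instead of printing it, and rebuilds the table itself so the block is self-contained. The names `lcs_table` and `lcs_string` are assumptions.

```python
def lcs_table(a, b):
    """c[i][j] = LCS length of the first i symbols of a and first j symbols of b."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] != b[j - 1]:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
            else:
                c[i][j] = 1 + c[i - 1][j - 1]
    return c


def lcs_string(a, b):
    """Recover one LCS by walking back from c[m][n], testing 'up' before 'left'."""
    c = lcs_table(a, b)
    i, j = len(a), len(b)
    picked = []
    while c[i][j] != 0:  # a nonzero entry implies i >= 1 and j >= 1
        if c[i][j] == c[i - 1][j]:
            i -= 1  # value came from the cell above
        elif c[i][j] == c[i][j - 1]:
            j -= 1  # value came from the cell to the left
        else:
            picked.append(a[i - 1])  # diagonal step: a[i] is part of the LCS
            i -= 1
            j -= 1
    return "".join(reversed(picked))  # symbols were collected back to front


print(lcs_string("ABCBDAB", "BDCABA"))  # BCBA
```

Each loop iteration decreases i, j, or both, so the walk takes at most m + n steps, matching the O(m+n) bound. Testing "left" before "up" could return a different, equally long common subsequence.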

  18. Example
      • In the example that follows we will use the following symbols to indicate how the value in a given cell is computed:
        ←  c[m][n] = c[m][n-1]
        ↑  c[m][n] = c[m-1][n]
        ↖  c[m][n] = 1 + c[m-1][n-1]
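As a sketch of how such an annotated table could be produced, the Python code below records, next to each value, which neighbouring cell it came from: left, above, or diagonal. It is not from the slides. The name `lcs_arrows` is hypothetical, and the tie-breaking rule (prefer the cell above when the cell above and the cell to the left are equal) is an assumption that the slides do not state.

```python
def lcs_arrows(a, b):
    """Return (c, arrow): LCS-length table and, per cell, the arrow for its source."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    arrow = [[""] * (n + 1) for _ in range(m + 1)]  # row 0 / column 0 have no arrow
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                c[i][j] = 1 + c[i - 1][j - 1]
                arrow[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:  # assumed tie-break: prefer "up"
                c[i][j] = c[i - 1][j]
                arrow[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                arrow[i][j] = "←"
    return c, arrow


c, arrow = lcs_arrows("ABCBDAB", "BDCABA")
for i in range(1, 8):
    # Print each row as arrow+value pairs, e.g. "↑0 ↑0 ↑0 ↖1 ←1 ↖1" for row 1.
    print(" ".join(f"{arrow[i][j]}{c[i][j]}" for j in range(1, 7)))
```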

  19-27. Example: a = ⟨A, B, C, B, D, A, B⟩, b = ⟨B, D, C, A, B, A⟩ (the table figures on these slides are not captured in the transcript)

  28. Example: Computing the LCS
      • We next use the previous table to find the longest common subsequence of the strings a = ⟨A, B, C, B, D, A, B⟩ and b = ⟨B, D, C, A, B, A⟩

  29-38. Example: a = ⟨A, B, C, B, D, A, B⟩, b = ⟨B, D, C, A, B, A⟩ (the table figures on these slides are not captured in the transcript)
         Longest Common Subsequence: BCBA

  39. LCS Homework Page 348, # 2,3
