The Longest Common Subsequence Problem CSE 373 Data Structures
Reading Goodrich and Tamassia, 3rd ed, Chapter 12, section 11.5, pp.570-574. CSE 373 AU 04 -- Longest Common Subsequences
Motivation
• Two Problems and Methods for String Comparison:
  • The substring problem
  • The longest common subsequence problem
• In both cases, good algorithms do substantially better than the brute-force methods.
String Matching Problem
• Given two strings TEXT and PATTERN, find the first occurrence of PATTERN in TEXT.
• Useful in text editing, document analysis, genome analysis, etc.
String Matching Problem: Brute-Force Algorithm
(n = length of TEXT, m = length of PATTERN)

For i = 0 to n – m {
    For j = 0 to m – 1 {
        If TEXT[i + j] ≠ PATTERN[j] then break
        If j = m – 1 then return i
    }
}
return -1

Suppose TEXT = 0000000000001 and PATTERN = 0000001. Every alignment matches all the leading zeros of PATTERN before the final 1 settles it, so each of the n – m + 1 alignments costs m comparisons. This type of problem has Θ(n²) behavior when m is proportional to n.
A more efficient algorithm is the Boyer-Moore algorithm. (We will not be covering it in this course.)
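The brute-force scan can be sketched in Python roughly as follows. The function name brute_force_find and the comparison counter are assumptions added for illustration only; the slides do not define them.

```python
def brute_force_find(text, pattern):
    """Return (index, comparisons): the index of the first occurrence of
    pattern in text (or -1 if there is none), and the number of character
    comparisons performed along the way."""
    n, m = len(text), len(pattern)
    comparisons = 0                      # illustrative counter, not in the slides
    for i in range(n - m + 1):           # each candidate alignment of pattern
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break                    # mismatch: try the next alignment
            j += 1
        if j == m:                       # every pattern character matched
            return i, comparisons
    return -1, comparisons


# The worst-case style input from the slide.
index, comparisons = brute_force_find("0000000000001", "0000001")
print(index, comparisons)
```

On inputs of this shape the counter grows as (n – m + 1) · m, which is the quadratic behavior the slide refers to.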
Longest Common Subsequence Problem
• A Longest Common Subsequence (LCS) of two strings S1 and S2 is a longest string that can be obtained from S1 and from S2 by deleting elements.
• For example, S1 = “thoughtful” and S2 = “shuffle” have an LCS: “hufl”.
• Useful in spelling correction, document comparison, etc.
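To make "obtained by deleting elements" concrete, here is a small Python check that a candidate string is a subsequence of another string. The helper name is_subsequence is an assumption for illustration; it only verifies a given candidate and does not find an LCS.

```python
def is_subsequence(candidate, s):
    """True if candidate can be obtained from s by deleting characters."""
    remaining = iter(s)
    # `ch in remaining` advances the iterator until ch is found, so the
    # characters of candidate must appear in s in the same relative order.
    return all(ch in remaining for ch in candidate)


s1, s2 = "thoughtful", "shuffle"
print(is_subsequence("hufl", s1), is_subsequence("hufl", s2))  # True True
print("hufl" in s1)  # False: a subsequence need not be a contiguous substring
```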
Dynamic Programming
• Analyze the problem in terms of a number of smaller subproblems.
• Solve the subproblems and keep their answers in a table.
• Each subproblem’s answer is easily computed from the answers to its own subproblems.
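As one possible illustration of these three points, the sketch below computes the length of an LCS top-down, using a cache as the table of subproblem answers. The names lcs_length_memo and solve are assumptions, and using prefix pairs as the subproblems is one common choice rather than something this slide prescribes.

```python
from functools import lru_cache


def lcs_length_memo(s1, s2):
    """Length of an LCS of s1 and s2, computed top-down with a cache."""

    @lru_cache(maxsize=None)             # the cache plays the role of the table
    def solve(i, j):
        # Subproblem: length of an LCS of the prefixes s1[:i] and s2[:j].
        if i == 0 or j == 0:
            return 0                     # an empty prefix shares nothing
        if s1[i - 1] == s2[j - 1]:
            return solve(i - 1, j - 1) + 1
        # Otherwise drop the last character of one prefix or the other.
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(s1), len(s2))


print(lcs_length_memo("thoughtful", "shuffle"))  # 4
```

Each (i, j) pair is solved once and then looked up, so the work is bounded by the number of prefix pairs rather than by the number of subsequences. The recursion depth grows with the string lengths, so this form suits short inputs.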
Longest Common Subsequence: Algorithm using Dynamic Programming
• For every prefix of S1 and prefix of S2 we’ll compute the length L of an LCS.
• In the end, we’ll get the length of an LCS for S1 and S2 themselves.
• The subsequence can be recovered from the matrix of L values.
• (see demonstration)
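A minimal Python sketch of this table-based approach follows, assuming the standard recurrence: L[i][j] = L[i-1][j-1] + 1 when the last characters of the two prefixes match, and max(L[i-1][j], L[i][j-1]) otherwise. The function name lcs and the tie-breaking rule in the traceback are assumptions; the course demonstration may fill or walk the matrix differently.

```python
def lcs(s1, s2):
    """Return (length, subsequence) for one LCS of s1 and s2."""
    n, m = len(s1), len(s2)
    # L[i][j] = length of an LCS of the prefixes s1[:i] and s2[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Recover one LCS by walking back from the bottom-right corner.
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])      # this character is part of the LCS
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1                       # assumed tie-break: move up first
        else:
            j -= 1
    return L[n][m], "".join(reversed(chars))


print(lcs("thoughtful", "shuffle"))  # (4, 'hufl')
```

Filling the matrix takes time and space proportional to the product of the two string lengths, and the traceback adds at most one step per row or column.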