An Extension of the String-to-String Correction Problem. Roy Lowrance and Robert A. Wagner. Journal of the ACM, vol. 22, no. 2, April 1975, pp. 177-183. Speaker: 吳展碩.
Edit Distance
• Three edit operations:
  • Substitution: abcd -> aacd (change b to a)
  • Insertion: abcd -> abacd (insert an a)
  • Deletion: abcd -> abd (delete c)
• Given two strings T and P, the problem is to determine the minimum number of edit operations needed to transform T into P.
Note: For clarity, we assume that all edit operations have the same cost.
d[i, j] = min( d[i-1, j] + 1,
               d[i, j-1] + 1,
               d[i-1, j-1] + cost(A[i]->B[j]) )
(This example is copied from Wikipedia.)
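The recurrence above can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the function name `edit_distance` is hypothetical, all three operations are assumed to cost 1, and cost(A[i]->B[j]) is assumed to be 0 when the two characters are equal.

```python
def edit_distance(t: str, p: str) -> int:
    """Unit-cost edit distance with substitution, insertion and deletion.

    Hypothetical helper for illustration; assumes every operation costs 1.
    """
    n, m = len(t), len(p)
    # d[i][j] = minimum number of operations turning t[:i] into p[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters of t[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters of p[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # cost(A[i]->B[j]): 0 if the characters already match, else 1
            sub = 0 if t[i - 1] == p[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[n][m]


print(edit_distance("abcd", "aacd"))   # one substitution
print(edit_distance("abcd", "abacd"))  # one insertion
print(edit_distance("abcd", "abd"))    # one deletion
```

Each of the three calls corresponds to one of the single-operation examples from the Edit Distance slide.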
The Problem
• This paper extends the set of edit operations to include the operation of interchanging two adjacent characters.
• Swap
• Example: T: a b c d, P: c d a
  a b c d -> a c d -> c a d -> c d a (delete b, swap a and c, swap a and d)
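To make the example concrete, the three steps can be replayed with two tiny string helpers. The names `delete` and `swap` are hypothetical and exist only for this illustration; positions are 0-based.

```python
def delete(s: str, i: int) -> str:
    """Remove the character at 0-based position i."""
    return s[:i] + s[i + 1:]


def swap(s: str, i: int) -> str:
    """Interchange the adjacent characters at positions i and i + 1."""
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]


steps = ["abcd"]
steps.append(delete(steps[-1], 1))  # delete b
steps.append(swap(steps[-1], 0))    # swap a and c
steps.append(swap(steps[-1], 1))    # swap a and d
print(" -> ".join(steps))  # abcd -> acd -> cad -> cda
```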
Trace
• A trace is a graphical specification of how edit operations apply to each character in the two strings.
• Example: T: a b c d, P: c d a
Important Properties
• In the following cases, the edit operations can be replaced by other edit operations:
  • 2 swaps -> insertion + deletion, or deletion + substitution
  • swap + substitution -> 2 substitutions
  • K, L: swap + K deletions + L insertions, a trace with lower cost
The Algorithm
(Figure: positions i', i and j', j.)
d[i, j] = min( d[i-1, j] + 1,
               d[i, j-1] + 1,
               d[i-1, j-1] + cost(A[i]->B[j]),
               d[i'-1, j'-1] + (i-i'-1) + (j-j'-1) + 1 )
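A minimal Python sketch of this four-term recurrence follows. The figure that defined i' and j' did not survive, so the sketch assumes the usual reading of this recurrence: i' is the last position before i with A[i'] = B[j], and j' is the last position before j with B[j'] = A[i]. The function name `swap_edit_distance` and the sentinel layout of the table are choices made for this illustration, not taken from the paper, and all operations are assumed to cost 1.

```python
def swap_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance that also allows adjacent swaps (a sketch)."""
    n, m = len(a), len(b)
    big = n + m + 1  # sentinel larger than any real distance
    # Table shifted by one: d[i + 1][j + 1] holds the slide's d[i, j].
    # Row 0 and column 0 hold the sentinel, so a missing i' or j'
    # (encoded as 0) makes the swap term too large to be chosen.
    d = [[big] * (m + 2) for _ in range(n + 2)]
    for i in range(n + 1):
        d[i + 1][1] = i  # d[i, 0] = i deletions
    for j in range(m + 1):
        d[1][j + 1] = j  # d[0, j] = j insertions
    last_row = {}  # character -> last 1-based position seen in a
    for i in range(1, n + 1):
        last_col = 0  # last j in this row with b[j] == a[i]
        for j in range(1, m + 1):
            i1 = last_row.get(b[j - 1], 0)  # i' (assumed definition)
            j1 = last_col                   # j' (assumed definition)
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j + 1] + 1,   # d[i-1, j] + 1
                d[i + 1][j] + 1,   # d[i, j-1] + 1
                d[i][j] + cost,    # d[i-1, j-1] + cost(A[i]->B[j])
                # d[i'-1, j'-1] + (i-i'-1) + (j-j'-1) + 1
                d[i1][j1] + (i - i1 - 1) + (j - j1 - 1) + 1,
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]


print(swap_edit_distance("abcd", "cda"))  # 3
print(swap_edit_distance("ca", "abc"))    # 2
```

The second call shows the swap term at work: "ca" becomes "ac" with one swap, and one insertion of b between the swapped characters gives "abc".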
Summary
• With simple preprocessing of T and P, the problem can be solved by dynamic programming in O(|T||P|) time.
• If we allow the edit operations to have different costs:
  • Insertion (cost WI)
  • Deletion (cost WD)
  • Swap (cost WS)
  • Substitution (cost WC)
  then the algorithm works if 2 WS ≥ WI + WD.
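The weighted case could be sketched as below. This is an illustrative generalisation of the recurrence, not code from the paper: the function name `weighted_distance` and the parameter names `wi`, `wd`, `ws`, `wc` are hypothetical stand-ins for WI, WD, WS, WC, and the sketch assumes that i' and j' are the last earlier positions where the crossing characters match. It rejects weights that violate 2 WS ≥ WI + WD, since the summary only states that the algorithm works under that condition.

```python
def weighted_distance(t, p, wi=1.0, wd=1.0, ws=1.0, wc=1.0):
    """Weighted edit distance with swaps (a sketch under stated assumptions)."""
    if 2 * ws < wi + wd:
        raise ValueError("requires 2*WS >= WI + WD")
    n, m = len(t), len(p)
    inf = float("inf")
    # d[i + 1][j + 1] is the cost of turning t[:i] into p[:j];
    # row 0 and column 0 stay infinite and act as sentinels.
    d = [[inf] * (m + 2) for _ in range(n + 2)]
    for i in range(n + 1):
        d[i + 1][1] = i * wd  # delete i characters
    for j in range(m + 1):
        d[1][j + 1] = j * wi  # insert j characters
    last_row = {}  # character -> last 1-based position seen in t
    for i in range(1, n + 1):
        last_col = 0  # last j in this row with p[j] == t[i]
        for j in range(1, m + 1):
            i1 = last_row.get(p[j - 1], 0)
            j1 = last_col
            if t[i - 1] == p[j - 1]:
                change = 0.0
                last_col = j
            else:
                change = wc
            d[i + 1][j + 1] = min(
                d[i][j + 1] + wd,   # deletion
                d[i + 1][j] + wi,   # insertion
                d[i][j] + change,   # substitution or match
                # swap plus the deletions and insertions in between
                d[i1][j1] + (i - i1 - 1) * wd + ws + (j - j1 - 1) * wi,
            )
        last_row[t[i - 1]] = i
    return d[n + 1][m + 1]


print(weighted_distance("ab", "ba"))          # swap is cheapest: 1.0
print(weighted_distance("ab", "ba", ws=2.0))  # swap no cheaper than other options: 2.0
```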