Speaker: 吳展碩


Presentation Transcript


  1. An Extension of the String-to-String Correction Problem. Roy Lowrance and Robert A. Wagner. Journal of the ACM, vol. 22, No. 2, April 1975, pp. 177-183. Speaker: 吳展碩

  2. Edit Distance • Three edit operations: • Substitution • abcd -> aacd (change b to a) • Insertion • abcd -> abacd (insert an a) • Deletion • abcd -> abd (delete c) • Given two strings T and P, the problem is to determine the minimum number of edit operations needed to transform T into P. Note: For simplicity, we assume all edit operations have the same cost.

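  3. Let d[i, j] be the edit distance between the first i characters of A (= T) and the first j characters of B (= P), with d[i, 0] = i and d[0, j] = j. Then d[i, j] = min( d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost(A[i]->B[j]) ), where cost(A[i]->B[j]) is 0 if A[i] = B[j] and 1 otherwise. [Figure: worked example of the d[i, j] table, copied from Wikipedia]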
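A minimal Python sketch of this recurrence with unit costs (the classic Wagner-Fischer table). The function name edit_distance is hypothetical, and string indices in the code are 0-based, so A[i] in the recurrence is a[i - 1] here.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute); hypothetical name."""
    n, m = len(a), len(b)
    # d[i][j] = distance between the first i characters of a and the first j of b.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete A[i]
                d[i][j - 1] + 1,         # insert B[j]
                d[i - 1][j - 1] + cost,  # substitute A[i] -> B[j], or keep a match
            )
    return d[n][m]
```

Each of the three one-operation examples from the previous slide has distance 1, while edit_distance("ab", "ba") is 2: without a swap operation, interchanging two characters costs two substitutions.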

  4. The Problem • This paper extends the set of edit operations with one that interchanges two adjacent characters. • Swap • Example: T: a b c d, P: c d a. abcd -> acd (delete b) -> cad (swap a and c) -> cda (swap a and d)
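The example sequence can be replayed with two small string helpers. This is only a sketch: delete and swap are hypothetical names, and positions are 0-based, unlike the 1-based indices used in the recurrences.

```python
def delete(s: str, k: int) -> str:
    """Remove the character at 0-based position k (hypothetical helper)."""
    return s[:k] + s[k + 1:]


def swap(s: str, k: int) -> str:
    """Interchange the adjacent characters at 0-based positions k and k + 1."""
    if not 0 <= k < len(s) - 1:
        raise IndexError("swap needs two adjacent characters")
    return s[:k] + s[k + 1] + s[k] + s[k + 2:]


# Replay the example: abcd -> acd -> cad -> cda (three operations).
steps = ["abcd"]
steps.append(delete(steps[-1], 1))  # delete b
steps.append(swap(steps[-1], 0))    # swap a and c
steps.append(swap(steps[-1], 1))    # swap a and d
print(" -> ".join(steps))           # abcd -> acd -> cad -> cda
```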

  5. Trace • A trace is a graphical specification of how edit operations apply to each character in the two strings. • Example (T: a b c d, P: c d a): [Figure: trace diagram linking the characters of T to the characters of P]
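Since the slide's diagram is not reproduced, here is a sketch of one way to encode a trace, assuming it is stored as a set of 1-based (i, j) pairs, each a line from T[i] to P[j]. This encoding and the name describe_trace are assumptions for illustration. The function lists unmatched characters, mismatched lines, and pairs of lines that cross.

```python
from itertools import combinations


def describe_trace(t: str, p: str, lines: set[tuple[int, int]]) -> dict:
    """Summarize a trace given as 1-based (i, j) lines joining t[i] to p[j]."""
    tis = [i for i, _ in lines]
    pjs = [j for _, j in lines]
    # Assumed rule: each character is touched by at most one line.
    assert len(set(tis)) == len(tis) and len(set(pjs)) == len(pjs)
    return {
        # Characters of t with no line are deleted.
        "deleted": [t[i - 1] for i in range(1, len(t) + 1) if i not in tis],
        # Characters of p with no line are inserted.
        "inserted": [p[j - 1] for j in range(1, len(p) + 1) if j not in pjs],
        # Lines joining different characters call for a substitution.
        "substituted": sorted((i, j) for i, j in lines if t[i - 1] != p[j - 1]),
        # Two lines cross when their order in t and their order in p disagree.
        "crossing": sorted(
            (a, b)
            for a, b in combinations(sorted(lines), 2)
            if (a[0] - b[0]) * (a[1] - b[1]) < 0
        ),
    }
```

For the lines a-a (1, 3), c-c (3, 1) and d-d (4, 2), one trace consistent with the example on the previous slide, it reports b as deleted and two crossing pairs; in this example they line up with the one deletion and the two swaps.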

  6. Important Properties • In the following cases, a combination of edit operations can be replaced by other edit operations.

  7. [Figure: trace diagrams of the cases: 2 swaps vs. insertion + deletion or deletion + substitution; 2 substitutions vs. swap + substitution; swap + K deletions + L insertions; a trace with lower cost]
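To sanity-check these replacement cases on tiny inputs, the exact unit-cost distance with all four operations can be found by brute force. The sketch below is a breadth-first search over strings; neighbors and brute_force_distance are hypothetical names. It assumes an optimal sequence never needs characters outside T and P, and its cost grows exponentially, so it is only usable on strings of a few characters.

```python
from collections import deque
from typing import Iterator


def neighbors(s: str, alphabet: str) -> Iterator[str]:
    """Yield every string reachable from s by one unit-cost edit."""
    for k in range(len(s)):
        yield s[:k] + s[k + 1:]                  # deletion
        for c in alphabet:
            if c != s[k]:
                yield s[:k] + c + s[k + 1:]      # substitution
    for k in range(len(s) + 1):
        for c in alphabet:
            yield s[:k] + c + s[k:]              # insertion
    for k in range(len(s) - 1):
        yield s[:k] + s[k + 1] + s[k] + s[k + 2:]  # swap of adjacent characters


def brute_force_distance(t: str, p: str) -> int:
    """Breadth-first search from t to p; only practical for tiny strings."""
    # Assumption: characters outside t and p never help an optimal sequence.
    alphabet = "".join(sorted(set(t + p)))
    seen = {t}
    queue = deque([(t, 0)])
    while queue:
        s, dist = queue.popleft()
        if s == p:
            return dist
        for nxt in neighbors(s, alphabet):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("unreachable")  # not expected: p can always be built
```

For example, brute_force_distance("abc", "bca") is 2, which both two swaps and one deletion plus one insertion achieve.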

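  8. The Algorithm • Let i' < i be the last position in A with A[i'] = B[j], and j' < j the last position in B with B[j'] = A[i]. Then d[i, j] = min( d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost(A[i]->B[j]), d[i'-1, j'-1] + (i-i'-1) + (j-j'-1) + 1 ). The last term, skipped if i' or j' does not exist, deletes the i-i'-1 characters between A[i'] and A[i], inserts the j-j'-1 characters between B[j'] and B[j], and swaps A[i'] with A[i].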
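A minimal sketch of this recurrence with unit costs, written in the style of the common textbook formulation. The names lowrance_wagner, last_row and last_col are hypothetical. The extra sentinel row and column, holding a value larger than any distance, is an implementation choice that makes the swap term lose automatically when i' or j' does not exist.

```python
def lowrance_wagner(a: str, b: str) -> int:
    """Unit-cost distance with insert, delete, substitute and adjacent swap."""
    n, m = len(a), len(b)
    inf = n + m + 1  # larger than any possible distance
    # d[x + 1][y + 1] holds the distance between a[:x] and b[:y];
    # row 0 and column 0 are sentinels standing for index -1.
    d = [[inf] * (m + 2) for _ in range(n + 2)]
    for x in range(n + 1):
        d[x + 1][1] = x
    for y in range(m + 1):
        d[1][y + 1] = y
    last_row: dict[str, int] = {}  # last 1-based row where each character of a occurred
    for i in range(1, n + 1):
        last_col = 0  # last 1-based column j' < j with B[j'] == A[i]
        for j in range(1, m + 1):
            i1 = last_row.get(b[j - 1], 0)  # i' (0 if none)
            j1 = last_col                   # j' (0 if none)
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if cost == 0:
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j + 1] + 1,     # delete A[i]
                d[i + 1][j] + 1,     # insert B[j]
                d[i][j] + cost,      # substitute or match
                # d[i'-1, j'-1] + (i-i'-1) deletions + 1 swap + (j-j'-1) insertions
                d[i1][j1] + (i - i1 - 1) + 1 + (j - j1 - 1),
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]
```

With this sketch, lowrance_wagner("abcd", "cda") returns 3, the cost of the three-step example, and lowrance_wagner("ab", "ba") returns 1 instead of the 2 needed without swaps.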

  9. Summary • With simple preprocessing of T and P, the problem can be solved by dynamic programming in O(|T||P|) time. • If the edit operations have different costs, insertion (cost WI), deletion (cost WD), swap (cost WS) and substitution (cost WC), the algorithm still works provided 2WS ≥ WI + WD.
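A sketch of the weighted variant, assuming the swap term becomes d[i'-1, j'-1] + (i-i'-1)·WD + WS + (j-j'-1)·WI, the weighted analogue of the unit-cost term above. The parameters wi, wd, wc and ws are hypothetical stand-ins for WI, WD, WC and WS, and the function rejects weights that violate 2WS ≥ WI + WD.

```python
def weighted_lowrance_wagner(a: str, b: str, wi: float = 1, wd: float = 1,
                             wc: float = 1, ws: float = 1) -> float:
    """Weighted distance with adjacent swaps; assumes 2*ws >= wi + wd."""
    if 2 * ws < wi + wd:
        raise ValueError("this recurrence assumes 2*ws >= wi + wd")
    n, m = len(a), len(b)
    inf = float("inf")
    # Row 0 and column 0 are sentinels; d[x + 1][y + 1] is the cost for a[:x], b[:y].
    d = [[inf] * (m + 2) for _ in range(n + 2)]
    for x in range(n + 1):
        d[x + 1][1] = x * wd  # delete all of a[:x]
    for y in range(m + 1):
        d[1][y + 1] = y * wi  # insert all of b[:y]
    last_row: dict[str, int] = {}
    for i in range(1, n + 1):
        last_col = 0
        for j in range(1, m + 1):
            i1 = last_row.get(b[j - 1], 0)
            j1 = last_col
            if a[i - 1] == b[j - 1]:
                sub = 0
                last_col = j
            else:
                sub = wc
            d[i + 1][j + 1] = min(
                d[i][j + 1] + wd,   # deletion
                d[i + 1][j] + wi,   # insertion
                d[i][j] + sub,      # substitution or match
                d[i1][j1] + (i - i1 - 1) * wd + ws + (j - j1 - 1) * wi,  # swap
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]
```

For instance, with ws=5 and wc=3, turning "ab" into "ba" costs 2 through one deletion and one insertion rather than 5 through a swap.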
