The String-to-String Correction Problem
Advisor: Dr. Hsu
Graduate: Wen-Hsiang Hu
Authors: Robert A. Wagner, Michael J. Fischer
Journal of the Association for Computing Machinery, January 1974
Outline
• Motivation
• Objective
• Introduction
• Edit Distance
• Algorithm
• Longest Common Subsequences
• Conclusion
• Discussion
Motivation
• Change one string into another using a sequence of edit operations.
Objective
• Solve the string-to-string correction problem by computing the edit distance between the two strings.
Introduction
• The three editing operations are:
  • changing one character to another single character
  • deleting one character from the given string
  • inserting a single character into the given string
Edit Distance
• The edit distance δ(A,B) from string A to string B is the minimum cost over all sequences of edit operations that transform A into B.
• δ(A,B) = min{ γ(T) | T is a trace from A to B }, where A and B are two strings and γ is a cost function.
[Figure: example illustrating the change, delete, and insert operations]
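The definition above can be made concrete with a small dynamic-programming sketch. This is only an illustration, not the paper's own pseudocode: the function name `edit_distance` and the three cost parameters are assumptions, and the costs are taken to be constants rather than a general cost function γ.

```python
def edit_distance(a, b, change_cost=1, delete_cost=1, insert_cost=1):
    """Return the minimum total cost of edit operations that turn a into b.

    Assumption: each operation has a fixed cost (a simplification of a
    general cost function that could depend on the characters involved).
    """
    m, n = len(a), len(b)
    # d[i][j] = cheapest cost of turning the first i characters of a
    # into the first j characters of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + delete_cost   # delete every character of a
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + insert_cost   # insert every character of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Matching characters cost nothing; differing ones need a change.
            step = 0 if a[i - 1] == b[j - 1] else change_cost
            d[i][j] = min(
                d[i - 1][j - 1] + step,        # change (or keep) a[i-1]
                d[i - 1][j] + delete_cost,     # delete a[i-1]
                d[i][j - 1] + insert_cost,     # insert b[j-1]
            )
    return d[m][n]
```

With unit costs, `edit_distance("kitten", "sitting")` returns 3 (two changes and one insertion). The table has (|A|+1) × (|B|+1) entries, so the running time is proportional to |A|·|B|.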
Algorithm Y
• When an actual least-cost trace T from A to B is wanted, and not only its cost, Algorithm Y prints the pairs in T.
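A rough sketch of such a traceback is shown below. It is an assumption-laden illustration, not the paper's listing: the names `distance_matrix` and `least_cost_trace` are hypothetical, costs are fixed constants, and the pairs are returned as a list of 1-based positions (i, j) instead of being printed.

```python
def distance_matrix(a, b, change_cost=1, delete_cost=1, insert_cost=1):
    """Build the table d where d[i][j] is the cost of turning a[:i] into b[:j]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + delete_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + insert_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = 0 if a[i - 1] == b[j - 1] else change_cost
            d[i][j] = min(d[i - 1][j - 1] + step,
                          d[i - 1][j] + delete_cost,
                          d[i][j - 1] + insert_cost)
    return d


def least_cost_trace(a, b, change_cost=1, delete_cost=1, insert_cost=1):
    """Walk back through the table and collect the pairs of one least-cost trace.

    A pair (i, j) means that character i of a corresponds to character j of b
    (either kept unchanged or changed). Positions are 1-based.
    """
    d = distance_matrix(a, b, change_cost, delete_cost, insert_cost)
    i, j = len(a), len(b)
    pairs = []
    while i > 0 and j > 0:
        if d[i][j] == d[i - 1][j] + delete_cost:
            i -= 1                      # a[i] was deleted
        elif d[i][j] == d[i][j - 1] + insert_cost:
            j -= 1                      # b[j] was inserted
        else:
            pairs.append((i, j))        # a[i] is paired with b[j]
            i -= 1
            j -= 1
    pairs.reverse()                     # report pairs left to right
    return pairs
```

For example, `least_cost_trace("ab", "b")` returns `[(2, 1)]`: the second character of the source is kept as the first character of the target, and the leading `a` is deleted.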
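Longest Common Subsequences
• Let ρ(A,B) be the length of the longest common subsequence of A and B.
• When each insertion and each deletion costs 1 and a change costs at least 2 (so a change is never cheaper than a deletion followed by an insertion):
  δ(A,B) = |A| + |B| − 2ρ(A,B)
  ⇒ ρ(A,B) = (|A| + |B| − δ(A,B)) / 2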
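The relation can be checked numerically. The sketch below is illustrative only: `lcs_length` and `indel_distance` are hypothetical names, and the cost assignment (insert = delete = 1, change = 2) is an assumption chosen so that a change never beats a delete-plus-insert.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    t = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[m][n]


def indel_distance(a, b):
    """Edit distance with insert = delete = 1 and change = 2 (assumed costs)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = 0 if a[i - 1] == b[j - 1] else 2
            d[i][j] = min(d[i - 1][j - 1] + step,
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    return d[m][n]


def lcs_from_distance(a, b):
    """Recover the LCS length from the distance via (|A| + |B| - distance) / 2."""
    return (len(a) + len(b) - indel_distance(a, b)) // 2
```

For "kitten" and "sitting" the longest common subsequence is "ittn" (length 4), the distance under these costs is 6 + 7 − 2·4 = 5, and `lcs_from_distance` gives back 4.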
Conclusion
• The paper presents an algorithm for computing the edit distance between two strings.
• Obvious applications:
  • automatic spelling correction
  • finding the LCS (longest common subsequence) to measure the similarity of two strings
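Discussion
• In Wagner (1974), the correction problem from the source string to the target string does not allow crossing block moves, and it requires a one-to-one correspondence between the characters of the source and the characters of the target.
• Walter (1984) lifts both restrictions. Figures 1 and 2 compare the conventional LCS (longest common subsequence) with string comparison after the restrictions are lifted.
Figure 1. Conventional LCS processing versus an LCS that allows crossing matches.
Figure 2. Conventional LCS processing versus an LCS that allows repeated matches.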