Complement to lecture 11: Levenshtein distance algorithm
Levenshtein distance

Also called “Edit distance”

Purpose: to compute the smallest set of basic operations
    Insertion
    Deletion
    Replacement
that will turn one string into another

Intro. to Programming, lecture 11 (complement): Levenshtein
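Levenshtein distance: “Michael Jackson” to “Mendelssohn”

Source      M  I  C  H  A  E  L  _  J  A  C  K  S  O     N
Operation      S  D  S  S        S  D  D  D  D        I
Target      M  E     N  D  E  L  S              S  O  H  N
Distance    0  1  2  3  4  4  4  5  6  7  8  9  9  9  10 10

Operations: S = substitute, D = delete, I = insert; a blank means the letter is kept. The underscore stands for the space character.
Distance: 10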
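As a check on the example above, the following minimal Python sketch applies the edit script from the slide to the source string. The slides use Eiffel; the function name `apply_edits` and the (operation, letter) encoding of the script are assumptions made for this illustration, not part of the lecture.

```python
def apply_edits(source, script):
    """Apply an edit script to source and return the resulting string.

    script is a list of (op, letter) pairs. op is "K" (keep), "S" (substitute),
    "D" (delete) or "I" (insert); letter is the new character for "S" and "I"
    and is ignored for "K" and "D".
    """
    out = []
    pos = 0  # index of the next unread character of source
    for op, letter in script:
        if op == "K":
            out.append(source[pos])
            pos += 1
        elif op == "S":
            out.append(letter)
            pos += 1
        elif op == "D":
            pos += 1
        elif op == "I":
            out.append(letter)
        else:
            raise ValueError(f"unknown operation {op!r}")
    if pos != len(source):
        raise ValueError("script does not consume the whole source")
    return "".join(out)


# Edit script read off the slide, with the kept letters made explicit.
SCRIPT = [
    ("K", ""), ("S", "E"), ("D", ""), ("S", "N"), ("S", "D"),
    ("K", ""), ("K", ""), ("S", "S"), ("D", ""), ("D", ""),
    ("D", ""), ("D", ""), ("K", ""), ("K", ""), ("I", "H"), ("K", ""),
]

result = apply_edits("MICHAEL JACKSON", SCRIPT)
cost = sum(1 for op, _ in SCRIPT if op != "K")  # keeps are free
print(result, cost)
```

Only the substitutions, deletions and insertions count towards the distance, which is why `cost` skips the "K" steps.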
Levenshtein distance algorithm

    levenshtein (source, target: STRING): INTEGER
            -- Minimum number of operations to turn source into target
        local
            distance: ARRAY_2 [INTEGER]
            i, j, deletion, insertion, substitution: INTEGER
        do
                -- distance is indexed from zero:
                -- rows 0 .. source.count, columns 0 .. target.count
            create distance.make (source.count, target.count)
            from i := 0 until i > source.count loop
                distance [i, 0] := i ; i := i + 1
            end
            from j := 0 until j > target.count loop
                distance [0, j] := j ; j := j + 1
            end
            -- (Continued)
Levenshtein, continued

            from i := 1 until i > source.count loop
                from j := 1
                invariant
                        -- For all p: 0 .. i, q: 0 .. j - 1, we can turn source [1 .. p]
                        -- into target [1 .. q] in distance [p, q] operations
                until j > target.count loop
                    if source [i] = target [j] then
                        distance [i, j] := distance [i - 1, j - 1]
                    else
                        deletion := distance [i - 1, j]
                        insertion := distance [i, j - 1]
                        substitution := distance [i - 1, j - 1]
                        distance [i, j] := minimum (deletion, insertion, substitution) + 1
                    end
                    j := j + 1
                end
                i := i + 1
            end
            Result := distance [source.count, target.count]
        end

Notation: s [m .. n] is the substring of s with items at positions k such that m ≤ k ≤ n (empty if m > n).
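For readers who want to run the algorithm, here is a minimal Python sketch of the same table-filling scheme. It is a translation made for illustration, not the lecture's code: Python strings are indexed from zero, so `source[i - 1]` plays the role of `source [i]` in the Eiffel text, and a list of lists stands in for `ARRAY_2`.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # distance[i][j]: cost of turning the first i characters of source
    # into the first j characters of target.
    distance = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        distance[i][0] = i  # delete all i characters
    for j in range(cols):
        distance[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            if source[i - 1] == target[j - 1]:
                # Same character: keep it at no cost.
                distance[i][j] = distance[i - 1][j - 1]
            else:
                deletion = distance[i - 1][j]
                insertion = distance[i][j - 1]
                substitution = distance[i - 1][j - 1]
                distance[i][j] = min(deletion, insertion, substitution) + 1
    return distance[len(source)][len(target)]


print(levenshtein("BEETH", "BEATLES"))
```

The table has (source length + 1) × (target length + 1) entries, so both time and memory grow with the product of the two lengths.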
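Distance table for source BEETH (rows) and target BEATLES (columns). Each cell shows the distance and the operation that produces it.

                B      E      A      T      L      E      S
         0      1      2      3      4      5      6      7
   0     0      1 I    2 I    3 I    4 I    5 I    6 I    7 I
B  1     1 D    0 K    1 I    2 I    3 I    4 I    5 I    6 I
E  2     2 D    1 D    0 K    1 I    2 I    3 I    4 I    5 I
E  3     3 D    2 D    1 K    1 S    2 S/I  3 S/I  3 K    4 I
T  4     4 D    3 D    2 D    2 S/D  1 K    2 I    3 I    4 S/I
H  5     5 D    4 D    3 D    3 S/D  2 D    2 S    3 S/I  4 S/I

S = Substitute, K = Keep, I = Insert, D = Delete. Two letters in a cell mean that both operations give the minimum.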
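The table above can be regenerated with a short Python sketch that records, for every cell, which operation or operations achieve the minimum. The function name `annotated_table` and the label strings (for example "SI" for a tie between substitute and insert) are choices made for this illustration.

```python
def annotated_table(source, target):
    """Return (distance, labels). labels[i][j] lists the letters among
    K, S, I, D for every operation that yields distance[i][j]."""
    rows, cols = len(source) + 1, len(target) + 1
    distance = [[0] * cols for _ in range(rows)]
    labels = [[""] * cols for _ in range(rows)]
    for i in range(1, rows):
        distance[i][0], labels[i][0] = i, "D"  # first column: deletions only
    for j in range(1, cols):
        distance[0][j], labels[0][j] = j, "I"  # first row: insertions only
    for i in range(1, rows):
        for j in range(1, cols):
            if source[i - 1] == target[j - 1]:
                distance[i][j] = distance[i - 1][j - 1]
                labels[i][j] = "K"
            else:
                candidates = {
                    "S": distance[i - 1][j - 1],
                    "I": distance[i][j - 1],
                    "D": distance[i - 1][j],
                }
                best = min(candidates.values())
                distance[i][j] = best + 1
                # Keep every operation that ties for the minimum.
                labels[i][j] = "".join(
                    op for op, cost in candidates.items() if cost == best
                )
    return distance, labels


distance, labels = annotated_table("BEETH", "BEATLES")
for row_values, row_labels in zip(distance, labels):
    print("  ".join(f"{v} {ops:<2}" for v, ops in zip(row_values, row_labels)))
```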
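One optimal path through the table for BEETH (rows) to BEATLES (columns). Cells on the path are marked with *.

               B     E     A     T     L     E     S
         0     1     2     3     4     5     6     7
   0     0*    1     2     3     4     5     6     7
B  1     1     0*    1     2     3     4     5     6
E  2     2     1     0*    1     2     3     4     5
E  3     3     2     1     1*    2     3     3     4
T  4     4     3     2     2     1*    2*    3*    4
H  5     5     4     3     3     2     2     3     4*

Steps along the path:
    1. Keep B
    2. Keep E
    3. Subst E → A
    4. Keep T
    5. Ins L
    6. Ins E
    7. Subst H → S

Operations: Substitute, Keep, Insert, Delete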
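A path like the one above can be recovered by walking back from the bottom-right cell to the top-left one. The Python sketch below is one way to do it; the function name `edit_path`, the step strings, and the tie-breaking order (substitute before insert before delete) are assumptions of this illustration, and other orders can yield different paths of the same cost.

```python
def edit_path(source, target):
    """Return one minimal edit script as a list of step descriptions."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            if source[i - 1] == target[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]) + 1

    # Walk back from the bottom-right cell, choosing at each step a
    # neighbour that explains the value of the current cell.
    steps = []
    i, j = len(source), len(target)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1]:
            steps.append(f"Keep {source[i - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            steps.append(f"Subst {source[i - 1]}->{target[j - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            steps.append(f"Ins {target[j - 1]}")
            j -= 1
        else:
            steps.append(f"Del {source[i - 1]}")
            i -= 1
    steps.reverse()  # steps were collected from the end of the strings
    return steps


path = edit_path("BEETH", "BEATLES")
for number, step in enumerate(path, start=1):
    print(number, step)
```

With this tie-breaking order the sketch produces the same seven steps as the slide for BEETH and BEATLES.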