Hamming Distance • Very efficient, but only defined for strings of the same length. • It simply counts the number of positions at which the characters differ. • Won't help much for us.
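A minimal sketch of the idea in Python (the function name is just illustrative):

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(c1 != c2 for c1, c2 in zip(a, b))

print(hamming_distance("karolin", "kathrin"))  # 3
```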
Levenshtein distance • It measures distance as the number of "operations" required to transform one string into another. • These operations are insertion, deletion and substitution. • Damerau-Levenshtein distance also includes transposition. • This may be useful for spelling correction, but I am not sure how efficient it will be in our case.
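A sketch of the standard dynamic-programming formulation with unit costs (not any particular library's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using insertions, deletions and substitutions, each costing 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```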
Needleman-Wunsch • This algorithm is like Damerau-Levenshtein but with a weighted edit distance; it is used in biology. • Mainly used for sequence alignment. • So obviously we don't need it.
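For reference, the scoring part is a short dynamic program; the match/mismatch/gap weights below are illustrative assumptions, since the algorithm leaves them configurable:

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score with configurable (weighted) match/mismatch/gap values."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(needleman_wunsch_score("GATTACA", "GCATGCU"))  # 0 with these weights
```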
Smith–Waterman algorithm • Like the Needleman-Wunsch algorithm, this is also mainly used for alignment (local rather than global). • It is also used in biology. • The Gotoh variant is also used to find alignments.
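The only real change from the sketch above is that scores are floored at zero and the best cell anywhere in the matrix is taken, which is what makes the alignment local; again the weights are just assumptions:

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Local alignment score: cells are floored at 0, answer is the best cell."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(smith_waterman_score("martha", "marhta"))  # 8 with these weights
```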
Jaro-Winkler Similarity • The order of occurrence is an essential part of determining similarity. • For instance, in the strings "martha" and "marhta" the transposed "th" and "ht" still count as matching characters because they are within 2 characters of each other, so the pair scores very highly (though not as a complete match). • The more transpositions found between the two strings, the smaller the overall matching weight.
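A sketch of the computation; the 0.1 prefix weight and 4-character prefix cap are the commonly used defaults, not something fixed by these slides:

```python
def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity plus the Winkler bonus for a shared prefix (up to 4 chars)."""
    if not a and not b:
        return 1.0
    window = max(len(a), len(b)) // 2 - 1
    a_match = [False] * len(a)
    b_match = [False] * len(b)

    # Characters "match" if equal and within the window of each other.
    matches = 0
    for i, ca in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_match[j] and b[j] == ca:
                a_match[i] = b_match[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: half the matched characters that appear out of order.
    k = transpositions = 0
    for i, ca in enumerate(a):
        if a_match[i]:
            while not b_match[k]:
                k += 1
            if ca != b[k]:
                transpositions += 1
            k += 1
    transpositions //= 2

    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3

    # Winkler prefix bonus.
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb or prefix == 4:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)

print(round(jaro_winkler("martha", "marhta"), 3))  # ~0.961, high but not 1.0
```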
Matching coefficient • This is essentially the same as Hamming distance with one change: position is not important. • It simply counts the number of terms present in both strings: |a ∩ b| • It doesn't take into account the sizes of a and b. • There are some metrics that use the same idea but include the sizes of a and b; those follow on the next slides, and any one of them may be helpful for us.
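A minimal sketch, assuming whitespace tokenization into word sets (the tokenization choice is an assumption here):

```python
def matching_coefficient(a: str, b: str) -> int:
    """|a ∩ b| over word tokens: how many distinct terms the two strings share."""
    return len(set(a.split()) & set(b.split()))

print(matching_coefficient("the quick brown fox", "the slow brown dog"))  # 2
```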
Jaccard coefficient • The sentence is tokenized into words, and the words are then compared with the other sentence's words. • |a ∩ b| / |a ∪ b| • This is one of the most efficient algorithms. • The Overlap coefficient is similar, with a slight modification to the formula: |a ∩ b| / min(|a|, |b|)
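A sketch of both formulas under the same whitespace-tokenization assumption:

```python
def jaccard(a: str, b: str) -> float:
    """|a ∩ b| / |a ∪ b| over word tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def overlap(a: str, b: str) -> float:
    """|a ∩ b| / min(|a|, |b|) over word tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / min(len(sa), len(sb)) if sa and sb else 0.0

print(jaccard("the quick brown fox", "the slow brown dog"))  # 2/6 ≈ 0.33
print(overlap("the quick brown fox", "the slow brown dog"))  # 2/4 = 0.5
```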
Sørensen Similarity • Same as Jaccard similarity but with a different formula. • Similarity = 2 × |a ∩ b| / (|a| + |b|) • This is identical to Dice's coefficient. • These may all be considered normalised versions of the simple matching coefficient.
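And the corresponding sketch, again over whitespace-tokenized word sets:

```python
def sorensen_dice(a: str, b: str) -> float:
    """2·|a ∩ b| / (|a| + |b|) over word tokens; identical to Dice's coefficient."""
    sa, sb = set(a.split()), set(b.split())
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0

print(sorensen_dice("the quick brown fox", "the slow brown dog"))  # 2*2/8 = 0.5
```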
Other metrics • Other metrics such as SFS, Tau, confusion probability, skew divergence, cosine similarity, TF-IDF, etc. are either not useful for us or involve heavy computation, which is not feasible in our case.