
Rules for Approximate String Matching

R.C.T. Lee


Presentation Transcript


  1. Rules for Approximate String Matching R.C.T. Lee

  2. Rule 1 Consider two substrings A1 = P1S1 and A2 = P2S2, where P1 and P2 are prefixes and S1 and S2 are the corresponding suffixes. If ed(A1, A2) ≦k and S1=S2, then ed(P1, P2) ≦k.

  3. Rule 1: [AKLLLR2000], [H2005], [HHLS2006], [JB2000], [LV89], [NB99], [NB2000], [S80], [TU93], and [WM92].
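To make Rule 1 concrete, here is a small Python sketch. `edit_distance` is our own helper implementing the textbook unit-cost Levenshtein recurrence, not code taken from any of the cited papers. When the last characters of two strings agree, the recurrence takes the diagonal value, so a common suffix can be stripped without changing the distance at all, which is slightly stronger than Rule 1.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


# A1 = P1 + S and A2 = P2 + S, so S1 = S2 = S.
p1, p2, s = "survey", "surgery", "ing"
print(edit_distance(p1, p2))          # 2
print(edit_distance(p1 + s, p2 + s))  # 2
```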

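  4. Rule 2 Let m be the length of B. If ed(A, B) ≦k, then the length of A must be between m-k and m+k, because substitutions leave the length unchanged and each insertion or deletion changes it by one.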

  5. Rule 2: [FN2004], [NB99], [NB2000] and [TU93].
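Rule 2 is typically used as a cheap pre-filter. A minimal Python sketch, with a hypothetical function name, that discards candidates without computing any edit distance:

```python
def passes_length_filter(a: str, b: str, k: int) -> bool:
    """Rule 2 with m = len(b): ed(a, b) <= k is possible only if m - k <= len(a) <= m + k."""
    m = len(b)
    return m - k <= len(a) <= m + k


candidates = ["abc", "abcdefgh", "ab", "abcdefghijk"]
print([c for c in candidates if passes_length_filter(c, "abcdefg", 2)])  # ['abcdefgh']
```

Passing the filter does not imply a match; it only means the more expensive distance computation is still needed.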

  6. Rule 3 If S1 contains S1’ and the distance between S1’ and every substring of P is larger than k, then ed(S1, P)>k.

  7. Rule 3: [ALP2004].
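Rule 3 needs the smallest distance between S1’ and any substring of P. The Python sketch below computes it with the Levenshtein recurrence. It uses an all-zero first row, so the substring may start anywhere in P, and takes the minimum of the last row, so it may end anywhere. This is the free-start, free-end idea of Sellers' algorithm [S80]. Both function names are ours.

```python
def min_substring_distance(x: str, p: str) -> int:
    """Smallest ed(x, Q) over all substrings Q of p, including the empty one."""
    prev = [0] * (len(p) + 1)  # Q may start at any position of p
    for i, cx in enumerate(x, start=1):
        cur = [i] + [0] * len(p)
        for j, cp in enumerate(p, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cx != cp))
        prev = cur
    return min(prev)  # Q may end at any position of p


def rule3_rejects(s1_part: str, p: str, k: int) -> bool:
    """Rule 3: if S1' is farther than k from every substring of P, then every
    string S1 that contains S1' satisfies ed(S1, P) > k."""
    return min_substring_distance(s1_part, p) > k


print(min_substring_distance("cdx", "abcdefgh"))  # 1, e.g. "cdx" -> "cde"
print(rule3_rejects("xyz", "abcdefgh", 2))        # True
```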

  8. Rule 4 For any substring S1 in T, if there exists a substring S2 in P to the left of S1 such that ed(S1, S2) ≦k, and S2 is the rightmost such substring, then move P to the right so that S2 is aligned with S1.

  9. Rule 4: [ALP2004].
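A minimal Python sketch of the shift in Rule 4. It makes two assumptions that are ours and are not stated on the slide: S2 has the same length as S1, and S1 currently lies under P[s : s+len(S1)] in 0-based slicing. `rule4_shift` is a hypothetical helper. It returns how far P moves to the right, or None when no qualifying S2 exists.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb))
        prev = cur
    return prev[-1]


def rule4_shift(p: str, s1: str, s: int, k: int):
    """Hypothetical Rule 4 helper: S1 lies under p[s : s + len(s1)].

    Start positions q < s are scanned from right to left, so the first hit is
    the rightmost S2; shifting P by s - q puts that S2 under S1."""
    width = len(s1)
    for q in range(s - 1, -1, -1):
        if edit_distance(p[q:q + width], s1) <= k:
            return s - q
    return None


print(rule4_shift("abdxabd", "abd", 4, 0))  # 4: the copy of "abd" at P[0:3]
```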

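  10. Based upon Rule 3 and Rule 2, we have Rule 5: if the window size is (m-k) and there exists a substring S1 in the window such that the distance between S1 and every substring of P is larger than k, then we can safely move P to the right so that its left end lies just past the first character of S1. The move is safe because any occurrence starting between the left end of the window and the first character of S1 has length at least m-k (Rule 2), so it would contain S1, which Rule 3 rules out.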

  11. If Rule 5 is not satisfied, then for every substring S1 in the window there exists a substring S2 in P such that ed(S1, S2) ≦k.

  12. Rule 5-1 If Rule 5 is not satisfied, P can only be moved one position to the right.

  13. Rule 5: [HN2005].
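One way to choose S1 for Rule 5 follows from Rule 3. Any string containing an unmatchable S1 is also unmatchable, so it suffices to test the suffixes of the window, shortest first. The first one that is farther than k from every substring of P gives the largest safe shift. The Python sketch below is only illustrative, since it recomputes a dynamic program for every suffix; the bit-parallel techniques of [HN2005] are far faster. `rule5_shift` and its 0-based window start `w` are hypothetical, and the sketch assumes 0 ≦ k < m and that the window lies inside T.

```python
def min_substring_distance(x: str, p: str) -> int:
    """Smallest ed(x, Q) over all substrings Q of p, including the empty one."""
    prev = [0] * (len(p) + 1)
    for i, cx in enumerate(x, start=1):
        cur = [i] + [0] * len(p)
        for j, cp in enumerate(p, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cx != cp))
        prev = cur
    return min(prev)


def rule5_shift(t: str, p: str, k: int, w: int) -> int:
    """Shift for the window t[w : w + m - k] under Rules 5 and 5-1."""
    m = len(p)
    end = w + m - k                      # the window is t[w:end]
    for a in range(end - 1, w - 1, -1):  # suffixes t[a:end], shortest first
        if min_substring_distance(t[a:end], p) > k:
            # t[a:end] plays the role of S1: no occurrence starts in t[w..a]
            return a + 1 - w
    return 1                             # Rule 5-1: only a one-position move


print(rule5_shift("xxxxxxxx", "abcd", 1, 0))  # 2
print(rule5_shift("abcdabcd", "abcd", 1, 0))  # 1
```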

  14. Rule 6 For strings A and B of equal length, Hamming Distance(A, B) ≧ Edit Distance(A, B).

  15. Rule 6: [AKLLLR2000], [FN2004] and [TU93].
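Rule 6 holds because substituting every mismatched position is one valid edit script. The short Python sketch below uses helper names of our own. It shows that the gap can be large: rotating a string by one position gives the maximal Hamming distance but an edit distance of only 2.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions where a and b differ; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb))
        prev = cur
    return prev[-1]


print(hamming_distance("abcdef", "bcdefa"), edit_distance("abcdef", "bcdefa"))  # 6 2
```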

  16. Rule 7 For strings A and B, if A contains k+1 characters (counting repeated occurrences separately) that do not appear in B, then ed(A, B)>k. Rule 7-1 Let A and B be two strings, and let a1, a2, …, ak+1 be k+1 characters of A, where ai is at position ji of A and is therefore aligned with position ji of B. If, for every i, ai does not appear in B[ji-k, ji+k], then ed(A, B)>k.

  17. Rule 7: [TU93].
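The two filters of Rule 7 can be sketched in Python as follows. The function names are ours and indexing is 0-based. Rule 7-1 rests on one observation: matching A[i] with B[j] aligns prefixes of lengths i and j, which already costs at least |i - j|. So under a cost of at most k, A[i] can only be matched within B[i-k, i+k].

```python
def rule7_rejects(a: str, b: str, k: int) -> bool:
    """Rule 7: each position of a whose character never occurs in b must be
    substituted or deleted, so k + 1 such positions force ed(a, b) > k."""
    chars_of_b = set(b)
    return sum(ch not in chars_of_b for ch in a) >= k + 1


def rule7_1_rejects(a: str, b: str, k: int) -> bool:
    """Rule 7-1: count positions i whose character a[i] is absent from the
    window b[i-k .. i+k] (clipped to b); k + 1 of them force ed(a, b) > k."""
    misses = 0
    for i, ch in enumerate(a):
        window = b[max(0, i - k): i + k + 1]  # slicing clips at the end of b
        if ch not in window:
            misses += 1
    return misses >= k + 1


# Every character of "abcdef" occurs in "fedcba", so Rule 7 is silent,
# but most of them are far from their aligned positions.
print(rule7_rejects("abcdef", "fedcba", 1), rule7_1_rejects("abcdef", "fedcba", 1))  # False True
```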

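  18. Rule 8 Let there be two strings A and B. Let B be divided into j pieces B1, B2, …, Bj. If ed(A, B) ≦k, there is at least one piece Bi and a substring Ai of A such that ed(Ai, Bi) ≦⌊k/j⌋. (An optimal alignment splits A into consecutive parts aligned with B1, …, Bj. Their costs sum to at most k, so at least one of the j costs is at most ⌊k/j⌋.)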

  19. Rule 8-1 Let A and B be two strings. Let B be divided into j pieces B1, B2, …, Bj. If ed(S, Bi) > ⌊k/j⌋ for every Bi and every substring S of A, then ed(A, B)>k.

  20. Rule 8-2 Let A and B be two strings with lengths m+k and m respectively. Let B be divided into j pieces B1, B2, …, Bj, and let AP be any prefix of A. If ed(S, Bi) > ⌊k/j⌋ for every Bi and every substring S of A, then ed(AP, B)>k.

  21. Rule 8: [NB99] and [NB2000].
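A Python sketch of the partitioning filter of Rules 8-1 and 8-2 follows. The rule allows any division of B. Here, by our own choice, B is cut into j nearly equal consecutive pieces, and each piece is compared against its best-matching substring of A. All function names are hypothetical.

```python
def min_substring_distance(x: str, s: str) -> int:
    """Smallest ed(x, Q) over all substrings Q of s, including the empty one."""
    prev = [0] * (len(s) + 1)
    for i, cx in enumerate(x, start=1):
        cur = [i] + [0] * len(s)
        for j, cs in enumerate(s, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cx != cs))
        prev = cur
    return min(prev)


def split_into_pieces(b: str, j: int):
    """Cut b into j consecutive pieces whose lengths differ by at most one."""
    q, r = divmod(len(b), j)
    pieces, start = [], 0
    for i in range(j):
        size = q + (1 if i < r else 0)
        pieces.append(b[start:start + size])
        start += size
    return pieces


def rule8_rejects(a: str, b: str, k: int, j: int) -> bool:
    """Rule 8-1: if every piece B_i is farther than floor(k / j) from every
    substring of a, then ed(a, b) > k; by Rule 8-2 this also covers every prefix of a."""
    bound = k // j
    return all(min_substring_distance(piece, a) > bound
               for piece in split_into_pieces(b, j))


print(split_into_pieces("abcdefgh", 3))           # ['abc', 'def', 'gh']
print(rule8_rejects("xxxxxxxxxx", "abcdefgh", 3, 2))  # True
```

A False result does not mean the strings match; it only means the filter cannot exclude them and a full verification is still required.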

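  22. Rule 9 Let A and B be two strings with lengths m+k and m respectively. Let A’ be the prefix of A with length m-k. Let a1, a2, …, aj be the distinct characters that occur in A’. Let N(A’, ai) and N(B, ai) be the number of times that ai appears in A’ and B respectively, and let Ci = N(A’, ai) - N(B, ai). Let AP be any prefix of A. If max(C1, 0) + max(C2, 0) + … + max(Cj, 0) > k, then ed(AP, B)>k. (A prefix shorter than m-k is excluded by Rule 2, and a longer one contains A’, in which every surplus occurrence of a character must be substituted or deleted.)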

  23. Rule 9-1 Let A and B be two strings with lengths m+k and m respectively. Let a1, a2, …, aj be the distinct characters that occur in A. Let N(A, ai) and N(B, ai) be the number of times that ai appears in A and B respectively, and let Ci = N(B, ai) - N(A, ai). Let AP be any prefix of A. If max(C1, 0) + max(C2, 0) + … + max(Cj, 0) > k, then ed(AP, B)>k.
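Rules 9 and 9-1 are counting filters that only need character histograms. The Python sketch below uses `collections.Counter`. Subtracting one Counter from another keeps only the positive counts, which gives the sum of max(Ci, 0) in Rule 9 directly. The function names are hypothetical, and the sketch assumes |A| = m+k and |B| = m as in the rules.

```python
from collections import Counter


def rule9_rejects(a: str, b: str, k: int) -> bool:
    """Rule 9: surplus characters of the prefix a' = a[:m - k] relative to b.
    If the surplus exceeds k, every prefix AP of a has ed(AP, b) > k."""
    m = len(b)
    surplus = Counter(a[:max(m - k, 0)]) - Counter(b)  # keeps only the positive Ci
    return sum(surplus.values()) > k


def rule9_1_rejects(a: str, b: str, k: int) -> bool:
    """Rule 9-1: for characters occurring in a, copies that b needs beyond what
    all of a can supply; no prefix of a can supply more than a itself."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(max(count_b[ch] - count_a[ch], 0) for ch in count_a) > k


print(rule9_rejects("bbbbb", "aaaa", 1), rule9_1_rejects("aabbb", "aaaa", 1))  # True True
```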

  24. Rule 10 Let P and T be two strings with lengths m and n respectively. If P matches a substring P’ of T at position i, then any substring S of T[i-k, i+m+k], a region of roughly m+2k characters, is a candidate that may satisfy ed(S, P) ≦k.

  25. Rule 10: [NB99].
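Rule 10 confines verification to a neighbourhood of the position i. The Python sketch below reads T[i-k, i+m+k] as the slice t[i-k : i+m+k], which is m+2k characters clipped to T. That boundary convention is our assumption. The region is checked with a column-wise dynamic program in the style of Sellers [S80], which reports every end position of a substring within distance k of P. Both function names are hypothetical.

```python
def occurrence_ends(t: str, p: str, k: int):
    """End positions e (exclusive) such that some t[s:e] has ed(t[s:e], p) <= k."""
    col = list(range(len(p) + 1))  # column for the empty text prefix
    ends = []
    for e, ch in enumerate(t, start=1):
        new = [0]  # an occurrence may start anywhere in t
        for i, pc in enumerate(p, start=1):
            new.append(min(col[i] + 1,               # extra text character
                           new[i - 1] + 1,           # pattern character skipped
                           col[i - 1] + (pc != ch)))  # match or substitute
        col = new
        if col[-1] <= k:
            ends.append(e)
    return ends


def verify_candidate(t: str, p: str, k: int, i: int):
    """Rule 10: examine only t[i-k : i+m+k]; return end positions in t's coordinates."""
    m = len(p)
    lo, hi = max(0, i - k), min(len(t), i + m + k)
    return [lo + e for e in occurrence_ends(t[lo:hi], p, k)]


print(verify_candidate("xxxxabcdxxxx", "abcd", 1, 4))  # [7, 8, 9]
```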

  26. Rule 11 Let P and Q be two strings, and let P be divided into consecutive pieces P1, P2, …, Pn. For each i, let Qi be the substring of Q for which ed(Pi, Qi) is the smallest. If …

  27. Application of Rule 11 A window W of T is divided into pieces t1, t2, …, tn, and for each ti, Pi is the substring of P for which ed(ti, Pi) is the smallest. If for some n, …

  28. [AKLLLR2000] Text Indexing and Dictionary Matching with One Error, Amir, A., Keselman, D., Landau, G. M., Lewenstein, M., Lewenstein, N. and Rodeh, M., Journal of Algorithms, Vol. 37, 2000, pp. 309-325. • [ALP2004] Faster Algorithms for String Matching with k Mismatches, Amir, A., Lewenstein, M. and Porat, E., Journal of Algorithms, Vol. 50, 2004, pp. 257-275. • [FN2004] Average-Optimal Multiple Approximate String Matching, Fredriksson, K. and Navarro, G., ACM Journal of Experimental Algorithmics, Vol. 9, Article No. 1.4, 2004, pp. 1-47.

  29. [GG86] Improved String Matching with k Mismatches, Galil, Z. and Giancarlo, R., SIGACT News, Vol. 17, No. 4, 1986, pp. 52-54. • [H2005] Bit-parallel approximate string matching algorithms with transposition, Hyyrö, H., Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229. • [HHLS2006] Approximate String Matching Using Compressed Suffix Arrays, Huynh, T. N. D., Hon, W. K., Lam, T. W. and Sung, W. K., Theoretical Computer Science, Vol. 352, 2006, pp. 240-249.

  30. [HN2005] Bit-parallel Witnesses and their Applications to Approximate String Matching, Hyyrö, H. and Navarro, G., Algorithmica, Vol. 41, No. 3, 2005, pp. 203-231. • [JB2000] Approximate string matching using factor automata, Holub, J. and Melichar, B., Theoretical Computer Science, Vol. 249, 2000, pp. 305-311. • [LV86] String Matching with k Mismatches by Using Kangaroo Method, Landau, G. M. and Vishkin, U., Theoretical Computer Science, Vol. 43, 1986, pp. 239-249.

  31. [LV89] Fast Parallel and Serial Approximate String Matching, Landau, G. and Vishkin, U., Journal of Algorithms, Vol. 10, 1989, pp. 157-169. • [NB99] Very fast and simple approximate string matching, Navarro, G. and Baeza-Yates, R., Information Processing Letters, Vol. 72, 1999, pp. 65-70. • [NB2000] A Hybrid Indexing Method for Approximate String Matching, Navarro, G. and Baeza-Yates, R., Vol. 1, No. 1, 2000, pp. 205-239.

  32. [S80] String Matching with Errors, Sellers, P. H., Journal of Algorithms, Vol. 20, No. 1, 1980, pp. 359-373. • [TU93] Approximate Boyer-Moore String Matching, Tarhio, J. and Ukkonen, E., SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp. 243-260. • [WM92] Fast Text Searching: Allowing Errors, Wu, S. and Manber, U., Communications of the ACM, Vol. 35, 1992, pp. 83-91.
