
Parameterized Pattern Matching by Boyer-Moore-type Algorithms



  1. Parameterized Pattern Matching by Boyer-Moore-type Algorithms. Brenda S. Baker. Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541-550. Advisor: Prof. R. C. T. Lee. Speaker: Kuei-hao Chen.

  2. Let us consider two strings: A=a1a2a3a4a5=xaxby and B=b1b2b3b4b5=bacbc. If the edit distance concept is used, A may be transformed into B by substituting a1 by b1, a3 by b3, and a5 by b5.

  3. In this paper, we define a new transformation in which a character may be substituted by another character, but the substitution is global: if x in A is substituted by a, then every x in A is substituted by a.

  4. Consider the example A=xaxby and B=bacbc again. To transform A into B, the first x must be substituted by b. Since the substitution is global, every x becomes b, giving A′=babby. Its third character is b, whereas b3=c, so under this kind of substitution A=xaxby cannot be transformed into B.

  5. For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.
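The global substitution of slides 3 to 5 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the helper names substitute and global_substitution are hypothetical. global_substitution only requires that every occurrence of a character of A maps to the same character of B, which is the (not necessarily one-to-one) substitution used in these slides.

```python
from typing import Dict, Optional


def substitute(s: str, mapping: Dict[str, str]) -> str:
    """Globally replace each character of s by its image under mapping.

    Characters that are not keys of mapping are left unchanged."""
    return "".join(mapping.get(ch, ch) for ch in s)


def global_substitution(a: str, b: str) -> Optional[Dict[str, str]]:
    """Return the global substitution turning a into b, or None if none exists.

    Every occurrence of a character of a must map to the same character of b;
    the map is NOT required to be one-to-one."""
    if len(a) != len(b):
        return None
    mapping: Dict[str, str] = {}
    for x, y in zip(a, b):
        # setdefault records x -> y the first time x is seen and returns the
        # earlier image otherwise, so a conflict shows up as a different value.
        if mapping.setdefault(x, y) != y:
            return None
    return mapping


print(substitute("xaxby", {"x": "b"}))        # babby (slide 4)
print(global_substitution("xaxby", "bacbc"))  # None: x would need both b and c
print(global_substitution("xaxby", "babbc"))  # {'x': 'b', 'a': 'a', 'b': 'b', 'y': 'c'}
```

Note that the last mapping sends both x and b to b, so it is a global substitution but not a bijection in the sense of slide 6.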

  6. We define a bijection to be a global substitution that maps a set of distinct characters one-to-one onto another set of distinct characters. A string P p-matches a string Q if P can be transformed into Q by a bijection.

  7. Let A=ababc and B=bcbcd. Then A p-matches B, because the bijection a→b, b→c, c→d transforms A into B.

  8. On the other hand, for A=ababc and B=bcbdc, A does not p-match B. It is actually easy to determine whether A p-matches B. Given A=a1a2…aN and B=b1b2…bN, A p-matches B if and only if, for every i and j, ai=aj exactly when bi=bj; that is, if ai=x and bi=y, then aj=x forces bj=y, and bj=y forces aj=x.

  9. For A=ababc and B=bcbcd, every a in A is matched with b, every b with c, and every c with d. This is not true for A=ababc and B=bcbdc, where b is matched with both c and d. Thus, given two strings A and B of the same length, determining whether A p-matches B is trivial: a single scan over the positions suffices.
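The test described in slides 8 and 9 can be done in one left-to-right scan. The Python sketch below (p_matches is a hypothetical helper name) keeps one dictionary per direction, so it also rejects maps that send two different characters to the same character, which a bijection must not do.

```python
from typing import Dict


def p_matches(a: str, b: str) -> bool:
    """Return True iff a bijection on characters transforms a into b."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}   # character of a -> character of b
    backward: Dict[str, str] = {}  # character of b -> character of a
    for x, y in zip(a, b):
        # Each pair (x, y) must agree with every earlier pair in both directions.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


print(p_matches("ababc", "bcbcd"))  # True:  a->b, b->c, c->d
print(p_matches("ababc", "bcbdc"))  # False: b would map to both c and d
print(p_matches("ab", "cc"))        # False: a and b would both map to c
```

The loop visits each position once, so, assuming constant-time dictionary operations, the test is linear in the string length.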

  10. Another important property is transitivity: if A p-matches B and B p-matches C, then A p-matches C. This holds because applying the first bijection and then the second is again a bijection, and it transforms A into C.

  11. This paper considers the following problem: given a text T and a pattern P, find all occurrences where P p-matches a substring of T. For example, for the pattern P=abaec, P p-matches the substrings S1 and S2 of T examined in the next slide.
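As a baseline for this problem, a brute-force search applies the p-match test to every length-m window of T, which takes O(nm) time for a text of length n and a pattern of length m; the Boyer-Moore-type rules in this paper aim to skip windows instead. In this Python sketch the text T is a hypothetical example built to contain S2=cacbd and S1=bcbda, the function names are illustrative, and positions are 0-based (the slides count from 1).

```python
from typing import Dict, List


def p_matches(a: str, b: str) -> bool:
    """Return True iff a bijection on characters transforms a into b."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def p_match_positions(text: str, pattern: str) -> List[int]:
    """Brute force: 0-based starts of the windows of text that p-match pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if p_matches(text[i:i + m], pattern)]


T = "cacbdbcbda"  # hypothetical text: S2 = cacbd starts at 0, S1 = bcbda at 5
print(p_match_positions(T, "abaec"))  # [0, 5]
```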

  12. For P=abaec and S2=cacbd, the substitution a→c, b→a, e→b, c→d transforms P into S2. For S2=cacbd and S1=bcbda, the substitution c→b, a→c, b→d, d→a transforms S2 into S1. Composing the two, P=abaec is transformed into S1=bcbda by a→b, b→c, e→d, c→a.

  13. The substitution can be visualized as follows:
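The composition in slide 12, and the transitivity property of slide 10 behind it, can be checked mechanically. In this Python sketch (the helper names bijection and compose are hypothetical), bijection recovers the unique character map between two p-matching strings, and composing the maps for P to S2 and S2 to S1 yields the map for P to S1.

```python
from typing import Dict, Optional


def bijection(a: str, b: str) -> Optional[Dict[str, str]]:
    """Return the character bijection transforming a into b, or None."""
    if len(a) != len(b):
        return None
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return None
    return forward


def compose(f: Dict[str, str], g: Dict[str, str]) -> Dict[str, str]:
    """Apply f first, then g (assumes every value of f is a key of g)."""
    return {c: g[f[c]] for c in f}


P, S2, S1 = "abaec", "cacbd", "bcbda"
f = bijection(P, S2)   # {'a': 'c', 'b': 'a', 'e': 'b', 'c': 'd'}
g = bijection(S2, S1)  # {'c': 'b', 'a': 'c', 'b': 'd', 'd': 'a'}
assert f is not None and g is not None
h = compose(f, g)      # {'a': 'b', 'b': 'c', 'e': 'd', 'c': 'a'}
print(h == bijection(P, S1))     # True
print("".join(h[c] for c in P))  # bcbda
```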

  14. This paper is based on Good Suffix Rule 1 and Good Suffix Rule 2 of the Boyer-Moore algorithm [BM77], adapted to p-matching.

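  15. Good Suffix Rule 1 for p-match. Let T1 be the longest suffix of the text window that p-matches a suffix P1 of P, and let y be the character of P immediately to the left of P1, where the p-mismatch occurred. If zP2 is the rightmost substring of P such that P2 p-matches P1 but zP2 does not p-match yP1, we can move P so that P2 is aligned with T1, as follows. (If zP2 p-matched yP1, then by transitivity the shifted pattern would p-mismatch at the same text position again.)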

  16. Example. [Figure: p-mismatch, transformed pattern P′, and shift.]

  17. [Figure: transformed pattern P′.] After the shift, we again compare T and P from right to left and find that T6,15 ≡ P1,10, that is, T6…T15 p-matches P1…P10.
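A brute-force Python sketch of Good Suffix Rule 1 follows, under stated assumptions: positions are 0-based, matched is the length L of P1, and the rule is read as "P2 p-matches P1 but zP2 does not p-match yP1". The function name rule1_shift is hypothetical, and it recomputes p-matches directly instead of using an efficient preprocessing step.

```python
from typing import Dict, Optional


def p_matches(a: str, b: str) -> bool:
    """Return True iff a bijection on characters transforms a into b."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def rule1_shift(pattern: str, matched: int) -> Optional[int]:
    """Shift given by Good Suffix Rule 1, or None if the rule does not apply.

    matched = L means the suffix P1 = pattern[m-L:] p-matched the text and the
    p-mismatch occurred at pattern index j = m-L-1 (character y = pattern[j]).
    Trying shifts s = 1, 2, ... finds the rightmost zP2, where
    P2 = pattern[j+1-s : m-s] and z = pattern[j-s]."""
    m = len(pattern)
    j = m - matched - 1
    if j < 0:  # the whole pattern matched; rule 1 needs a p-mismatch
        return None
    p1, y_p1 = pattern[j + 1:], pattern[j:]
    for s in range(1, j + 1):
        p2, z_p2 = pattern[j + 1 - s:m - s], pattern[j - s:m - s]
        if p_matches(p2, p1) and not p_matches(z_p2, y_p1):
            return s
    return None


print(rule1_shift("abaec", 2))  # 2: zP2 = aba does not p-match yP1 = aec
print(rule1_shift("abab", 1))   # None: every candidate zP2 p-matches yP1 = ab
```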

  18. Good Suffix Rule 2 for p-match. Let T1 be the longest suffix of the text window that p-matches a suffix P1 of P. Consider the longest suffix of P1 that p-matches a prefix P2 of P. If such a suffix exists, we move P so that P2 is aligned with the corresponding suffix of T1, as follows:

  19. Example. [Figure: p-mismatch, transformed pattern P′, and shift.]

  20. [Figure: transformed pattern P′.]
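Good Suffix Rule 2 can be sketched the same way. In this Python sketch (rule2_shift is a hypothetical name, positions 0-based), any shift s of at least m−L leaves the prefix pattern[:m−s] under text that p-matched pattern[s:], so the smallest such s whose prefix and suffix p-match aligns the longest qualifying suffix of P1 with a prefix of P.

```python
from typing import Dict


def p_matches(a: str, b: str) -> bool:
    """Return True iff a bijection on characters transforms a into b."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def rule2_shift(pattern: str, matched: int) -> int:
    """Shift given by Good Suffix Rule 2 after `matched` characters p-matched.

    Tries s = m-L, m-L+1, ... and returns the first s for which the prefix
    pattern[:m-s] p-matches the suffix pattern[s:]; falls back to m."""
    m = len(pattern)
    for s in range(max(m - matched, 1), m):
        if p_matches(pattern[:m - s], pattern[s:]):
            return s
    return m


print(rule2_shift("abcdc", 2))  # 3: prefix ab p-matches suffix dc (a->d, b->c)
print(rule2_shift("abcaa", 2))  # 4: ab does not p-match aa, but a p-matches a
```

Because any single character p-matches any other, a suffix of length 1 always qualifies, so this sketch reaches the fallback of m only when nothing matched or the pattern has length 1.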

  21. The shift function ∆ is

  22. Example. [Figure: p-mismatch, transformed pattern P′, and shift, with j′=7 and j=9.]

  23. [Figure: p-mismatch, transformed pattern P′, and shift, with j′=7 and j=9.]

  24. [Figure: transformed pattern P′.]
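Putting the two rules together, the following Python sketch (not Baker's implementation; all names are hypothetical) precomputes by brute force a shift for every possible matched length L, then scans T with right-to-left comparisons that extend the p-match one character pair at a time. A rule 1 shift is always smaller than m−L, so it is preferred when it exists; otherwise rule 2 is used, including after a full match. Whether this coincides exactly with the paper's shift function ∆ is an assumption. Each shift here is the smallest one consistent with the p-matched suffix, so no occurrence is skipped.

```python
from typing import Dict, List


def p_matches(a: str, b: str) -> bool:
    """Return True iff a bijection on characters transforms a into b."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def good_suffix_shift(pattern: str, matched: int) -> int:
    """Smallest shift allowed by rule 1, else by rule 2, after `matched` chars p-matched."""
    m = len(pattern)
    j = m - matched - 1  # index of the p-mismatch; -1 after a full match
    # Rule 1: rightmost zP2 with P2 ~ P1 and zP2 not ~ yP1 (shifts s <= j).
    for s in range(1, j + 1):
        p2_ok = p_matches(pattern[j + 1 - s:m - s], pattern[j + 1:])
        zp2_same = p_matches(pattern[j - s:m - s], pattern[j:])
        if p2_ok and not zp2_same:
            return s
    # Rule 2: longest suffix of P1 that p-matches a prefix of P (shifts s >= m-L).
    for s in range(max(m - matched, 1), m):
        if p_matches(pattern[:m - s], pattern[s:]):
            return s
    return m


def p_match_search(text: str, pattern: str) -> List[int]:
    """0-based start positions of all substrings of text that p-match pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = [good_suffix_shift(pattern, L) for L in range(m + 1)]
    result: List[int] = []
    i = 0
    while i <= n - m:
        forward: Dict[str, str] = {}
        backward: Dict[str, str] = {}
        L = 0  # length of the window suffix that p-matches the pattern suffix
        while L < m:
            x, y = text[i + m - 1 - L], pattern[m - 1 - L]
            if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
                break  # p-mismatch: x T1 does not p-match y P1
            L += 1
        if L == m:
            result.append(i)
        i += shift[L]
    return result


print(p_match_search("cacbdbcbda", "abaec"))  # [0, 5]
```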

  25. Time Complexity
      • In the average case, the preprocessing phase takes O(m log min(m, |Π|)) time, the space complexity is O(n), and the searching phase takes O(n log min(m, |Π|)) time, where m=|P|, n=|T|, and Π is the alphabet of symbols that may be renamed.

  26. References
      • [AFM94] Amihood Amir, Martin Farach, and S. Muthukrishnan, Alphabet dependence in parameterized matching, Info. Proc. Letters, Vol. 49, pp. 111-115, 1994.
      • [Bak] Brenda S. Baker, Parameterized pattern matching: algorithms and applications, J. Comput. Syst. Sci., to appear.
      • [Bak92] Brenda S. Baker, A program for identifying duplicated code, In Computing Science and Statistics Vol. 24: Proceedings of the 24th Symposium on the Interface, pp. 49-57, 1992.
      • [Bak93a] Brenda S. Baker, Parameterized duplication in strings: algorithms and an application to software maintenance, submitted for publication, 1993.
      • [Bak93b] Brenda S. Baker, A theory of parameterized pattern matching: algorithms and applications, In Proceedings of the 25th Annual Symposium on Theory of Computing, pp. 71-80, 1993.
      • [BM77] Robert S. Boyer and J. Strother Moore, A fast string searching algorithm, Commun. ACM, Vol. 20, No. 10, pp. 762-772, 1977.

  27. References
      • [BYGR90] Ricardo A. Baeza-Yates, Gaston H. Gonnet, and Mireille Regnier, Analysis of Boyer-Moore-type string searching algorithms, In Proc. of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 328-343, 1990.
      • [BYR92] Ricardo A. Baeza-Yates and Mireille Regnier, Average running time of the Boyer-Moore-Horspool algorithm, Theoretical Computer Sci., Vol. 92, pp. 19-31, 1992.
      • [CLC+92] Maxime Crochemore, Thierry Lecroq, Artur Czumaj, Leszek Gasieniec, S. Jarominek, and W. Plandowski, Speeding up two string-matching algorithms, In 9th Annual Symposium on Theoretical Aspects of Computer Science, LNCS Vol. 577, pp. 589-600, 1992.
      • [Col91] Richard Cole, Tight bounds on the complexity of the Boyer-Moore string matching algorithm, In Proceedings of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 224-234, 1991.
      • [Hor80] R. Nigel Horspool, Practical fast searching in strings, Soft. Pract. and Exp., Vol. 10, pp. 501-506, 1980.

  28. References
      • [HS91] Andrew Hume and Daniel Sunday, Fast string search, Soft. Pract. and Exp., Vol. 21, No. 11, pp. 1221-1248, 1991.
      • [IS94] Ramana M. Idury and Alejandro A. Schaffer, Multiple matching of parameterized patterns, In Proc. of the 5th Symposium on Combinatorial Pattern Matching, pp. 226-239, 1994.
      • [KMP77] D. E. Knuth, J. H. Morris, and V. R. Pratt, Fast pattern matching in strings, SIAM J. Comput., Vol. 6, No. 2, pp. 323-350, 1977.
      • [Ryt80] Wojciech Rytter, A correct preprocessing algorithm for Boyer-Moore string-searching, SIAM J. Comput., Vol. 9, No. 3, pp. 509-512, 1980.
      • [Sch88] R. Schaback, On the expected sublinearity of the Boyer-Moore algorithm, SIAM J. Comput., Vol. 17, No. 4, pp. 648-659, 1988.
      • [Sun90] Daniel M. Sunday, A very fast substring search algorithm, Commun. ACM, Vol. 33, No. 8, pp. 132-139, 1990.

  29. THANK YOU
