
The Zhu-Takaoka Algorithm

The Zhu-Takaoka Algorithm. R. F. Zhu and T. Takaoka, "On improving the average case of the Boyer-Moore string matching algorithm," Journal of Information Processing 10(3):173-177, 1987. Advisor: Prof. R. C. T. Lee. Speaker: S. Y. Tang.


  1. The Zhu-Takaoka Algorithm. R. F. Zhu and T. Takaoka, "On improving the average case of the Boyer-Moore string matching algorithm," Journal of Information Processing 10(3):173-177, 1987. Advisor: Prof. R. C. T. Lee. Speaker: S. Y. Tang.

  2. The Zhu-Takaoka Algorithm solves the string matching problem. • String matching problem: Input: a text string T of length n and a pattern string P of length m. Output: all occurrences of P in T.

  3. The Zhu-Takaoka Algorithm is a variant of the Boyer-Moore (BM) Algorithm. It improves only the bad character rule of the BM Algorithm. • Zhu and Takaoka replaced the bad character rule with a 2-substring rule. The good suffix rules are still used.

  4. The 2-Substring Rule • Consider text = ACTGCTAAGTA and pattern = CTAAG. The 2-substring GC does not appear in P. [Figure: successive alignments of the text and the pattern]

  5. How can we know whether a specified 2-substring appears in P or not?
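A direct, if naive, way to answer this question is to collect every 2-substring of P in a set and test membership. The sketch below is only an illustration and is not the table-based method the algorithm uses; the helper name `two_substrings` is hypothetical. It uses the text and pattern from the 2-substring rule slide.

```python
def two_substrings(s):
    """Return the set of all length-2 substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


text, pattern = "ACTGCTAAGTA", "CTAAG"
m = len(pattern)

# Last 2-substring of the first window of the text (positions m-2 and m-1).
last_pair = text[m - 2:m]

print(last_pair, last_pair in two_substrings(pattern))  # prints: GC False
```

A set answers only "does it occur?"; the ztBc table additionally records how far the rightmost occurrence lies from the right end of P, which is what determines the shift.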

  6. Whenever a mismatch or a complete match occurs, we take the last 2-substring of the current window of T and look for its rightmost occurrence in P, if one exists. This is done by constructing a ztBc table. • Example [Figure: text and pattern, shift by 5 and shift by 1] T(CA) = 5 (that is, ztBc[C,A] = 5) means that the rightmost occurrence of CA in P lies 5 positions from the right end, so we can shift by 5. T(GA) = 1 means that the rightmost occurrence of GA lies 1 position from the right end; if GA is the 2-substring to be matched, we shift by 1.

  7. ztBc[a,b] The preprocessing phase of the algorithm consists of computing, for each pair of characters (a, b) with a, b ∈ Σ, the rightmost occurrence of ab in x[0..m-2].

  8. Preprocessing phase. Consider text = ATTGCCTAATA and pattern = CTAAG. The alphabet of the pattern is {A, C, G, T}. The sign "*" denotes a character of the text that never appears in the pattern. First, we fill every entry of the table with the length m of the pattern. Example: [Figure: ztBc table with every entry set to m = 5]

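  9. Preprocessing phase. Then, suppose the 2-substring ab does not occur in P[0..m-2]. If P[0] = b, we set ztBc[i, b] = m-1 for every character i. Example: T: ATTGCCTAAGTA, P: CTAAG. Here P[0] = C, so ztBc[i, C] = 4 for every i. [Figure: two alignments of P against T, with a and b marked]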

  10. Preprocessing phase. Finally, we set ztBc[a,b] = k if k ≤ m-2, P[m-k-2..m-k-1] = ab, and ab does not occur in P[m-k-1..m-2]. Example: P: CTAAG gives ztBc[A,A] = 1, ztBc[T,A] = 2, ztBc[C,T] = 3. [Figure: pattern with a and b marked]
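The three preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `build_zt_bc`, the dictionary keyed by character pairs, and the explicit `alphabet` argument are all choices made for this example.

```python
def build_zt_bc(pattern, alphabet):
    """Build the ztBc table as a dict keyed by (a, b) character pairs.

    `alphabet` is assumed to contain every character that can appear
    in the text or the pattern.
    """
    m = len(pattern)
    # Step 1: fill every entry with the pattern length m.
    table = {(a, b): m for a in alphabet for b in alphabet}
    # Step 2: pairs ending in P[0] can be aligned with the start of P.
    for a in alphabet:
        table[(a, pattern[0])] = m - 1
    # Step 3: scan P[0..m-2] left to right, so that later (rightmost)
    # occurrences of a pair overwrite earlier ones.
    for i in range(1, m - 1):
        table[(pattern[i - 1], pattern[i])] = m - 1 - i
    return table


zt_bc = build_zt_bc("CTAAG", "ACGT")
print(zt_bc[("A", "A")], zt_bc[("T", "A")], zt_bc[("C", "T")])  # prints: 1 2 3
print(zt_bc[("G", "C")], zt_bc[("G", "T")])                     # prints: 4 5
```

The pair AG at the very end of CTAAG is deliberately not recorded, because only P[0..m-2] is scanned; ztBc[A,G] therefore keeps the default value 5.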

  11. Case 1: ztBc[a,b] = k with k ≤ m-2. • Example [Figure: text and pattern, shift by 5, with a and b marked] • ztBc[C,A] = 5; k ≤ m-2, because x[8-5-2..8-5-1] = ab (x[1..2] = CA) and "CA" does not occur in x[8-5-1..8-2] (x[2..6]).

  12. Case 2: ztBc[a,b] = k with k = m-1. • Example [Figure: text and pattern, shift by 7, with a and b marked] • ztBc[C,G] = 7; k = m-1, because x[0] = b (G = G) and "CG" does not occur in x[0..8-2] (x[0..6]).

  13. Case 3: ztBc[a,b] = k with k = m. • Example [Figure: text and pattern, with a and b marked] • ztBc[A,C] = 8; k = m, because x[0] ≠ b (G ≠ C) and "AC" does not occur in x[0..8-2] (x[0..6]).
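The three cases can be checked directly from their definitions. The sketch below assumes the pattern is GCAGAGAG; the slides do not state it in the surviving text, but this pattern has m = 8, x[0] = G and x[1..2] = CA, and it reproduces every value quoted above. The function name `zt_bc_case` is hypothetical.

```python
def zt_bc_case(pattern, a, b):
    """Return (case number, k) for the pair ab, straight from the definition."""
    m = len(pattern)
    prefix = pattern[:m - 1]        # x[0..m-2]
    pos = prefix.rfind(a + b)       # start of the rightmost occurrence, or -1
    if pos != -1:
        # ab sits at x[m-k-2..m-k-1], so pos = m-k-2 and k = m-2-pos.
        return 1, m - 2 - pos
    if pattern[0] == b:
        return 2, m - 1
    return 3, m


x = "GCAGAGAG"  # assumed example pattern, see the note above
print(zt_bc_case(x, "C", "A"))  # prints: (1, 5)
print(zt_bc_case(x, "C", "G"))  # prints: (2, 7)
print(zt_bc_case(x, "A", "C"))  # prints: (3, 8)
```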

  14. Full Example [Figure: text and pattern, shift by 5, with a and b marked] In this step we shift by the ztBc function, because ztBc[C,A] = 5 > bmGs[7] = 1, where CA is the text 2-substring aligned with P6P7. The pattern shifts 5 positions to the right, by Case 1.

  15. Full Example [Figure: text and pattern, exact match, shift by 7, with a and b marked] In this step we shift by the bmGs function, because ztBc[A,G] = 2 < bmGs[0] = 7.

  16. Full Example [Figure: text and pattern, shift by 4, with a and b marked] In this step we shift by the bmGs function, because ztBc[A,G] = 2 < bmGs[5] = 4.

  17. Full Example [Figure: text and pattern, with a and b marked] In this step either function can be used to shift, because ztBc[C,G] = 7 = bmGs[6].
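The four steps of the full example can be replayed with a small search routine that takes the larger of the bmGs and ztBc shifts at each mismatch. This is a sketch under assumptions, not the authors' implementation: the pattern GCAGAGAG and the text GCATCGCAGAGAGTATACAGTACG are not given in the surviving slide text and were chosen because they reproduce the quoted shifts; all function names are hypothetical; the good suffix table is computed with a simple quadratic suffix scan; and patterns shorter than 2 characters are rejected.

```python
def build_zt_bc(pattern, alphabet):
    """ztBc table as a dict keyed by (a, b) character pairs."""
    m = len(pattern)
    table = {(a, b): m for a in alphabet for b in alphabet}
    for a in alphabet:
        table[(a, pattern[0])] = m - 1
    for i in range(1, m - 1):
        table[(pattern[i - 1], pattern[i])] = m - 1 - i
    return table


def good_suffix(pattern):
    """Boyer-Moore good suffix shift table (bmGs)."""
    m = len(pattern)
    # suff[i]: length of the longest common suffix of pattern[:i+1] and pattern.
    suff = [0] * m
    for i in range(m):
        k = 0
        while k <= i and pattern[i - k] == pattern[m - 1 - k]:
            k += 1
        suff[i] = k
    bm_gs = [m] * m
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:            # pattern[:i+1] is also a suffix
            while j < m - 1 - i:
                if bm_gs[j] == m:
                    bm_gs[j] = m - 1 - i
                j += 1
    for i in range(m - 1):
        bm_gs[m - 1 - suff[i]] = m - 1 - i
    return bm_gs


def zt_search(pattern, text):
    """Return (match positions, list of shifts taken)."""
    m, n = len(pattern), len(text)
    if m < 2:
        raise ValueError("this sketch assumes a pattern of length >= 2")
    zt_bc = build_zt_bc(pattern, set(pattern) | set(text))
    bm_gs = good_suffix(pattern)
    hits, shifts = [], []
    j = 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1                      # compare right to left
        if i < 0:
            hits.append(j)
            shift = bm_gs[0]
        else:
            pair = (text[j + m - 2], text[j + m - 1])
            shift = max(bm_gs[i], zt_bc[pair])
        shifts.append(shift)
        j += shift
    return hits, shifts


print(zt_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
# prints: ([5], [5, 7, 4, 7])
```

The shifts 5, 7, 4, 7 correspond to the four steps of the full example. On a complete match this sketch shifts by bmGs[0] directly, which here agrees with taking the larger of the two values.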

  18. Time complexity • Preprocessing phase: O(m + σ²) time and space complexity, where σ is the size of the alphabet of the text. • Searching phase: O(m × n) time complexity.

  19. References • Zhu, R. F. and Takaoka, T., 1987, On improving the average case of the Boyer-Moore string matching algorithm, Journal of Information Processing 10(3):173-177.

  20. Thank you for your attention.
