The Zhu-Takaoka Algorithm. On improving the average case of the Boyer-Moore string matching algorithm, Journal of Information Processing 10(3):173-177, 1987 R. F. ZHU, T. TAKAOKA. Advisor: Prof. R. C. T. Lee Speaker: S. Y. Tang.
The Zhu-Takaoka Algorithm solves the string matching problem.
• String matching problem:
Input: a text string T of length n and a pattern string P of length m.
Output: all occurrences of P in T.
The Zhu-Takaoka Algorithm is a variant of the Boyer-Moore (BM) Algorithm. It improves only the bad character rule of the BM Algorithm.
• Zhu and Takaoka replaced the bad character rule with a 2-substring rule. The good suffix rule is still used.
The 2-Substring Rule
• Consider text=ACTGCTAAGTA and pattern=CTAAG. The last 2-substring of the first window ACTGC is GC, and no GC appears in P.
How can we know whether a specified 2-substring appears in P or not?
Whenever a mismatch or a complete match occurs, we take the last 2-substring of the current window of T and look for its rightmost occurrence in P, if one exists. This is done by constructing a ztBc table.
• Example
ztBc[C,A]=5 means that the rightmost CA in P ends 5 positions before the right end of P. Thus we can shift by 5.
ztBc[G,A]=1 means that the rightmost GA in P ends 1 position before the right end of P. If GA is the 2-substring to be matched, we shift by 1.
ztBc[a,b]
• The preprocessing phase of the algorithm computes, for each pair of characters (a, b) with a, b ∈ Σ, the rightmost occurrence of ab in x[0..m-2], where x denotes the pattern P.
preprocessing phase
• Consider text=ATTGCCTAATA and pattern=CTAAG. The alphabet of the pattern is {A, C, G, T}. The sign "*" denotes any character of the text that never appears in the pattern.
• First, we fill every entry of the table with the pattern length m.
preprocessing phase
• Then, for every 2-substring ab that does not occur in P[0..m-2]: if P[0] = b, we set ztBc[i, b] = m-1 for all i.
• Example: T: ATTGCCTAAGTA, P: CTAAG. Here P[0] = C, so ztBc[i, C] = m-1 = 4 for all i.
preprocessing phase
• Finally, we set ztBc[a,b] = k if k ≤ m-2, P[m-k-2..m-k-1] = ab, and ab does not occur in P[m-k-1..m-2].
• Example: P: CTAAG gives ztBc[C,T] = 3, ztBc[T,A] = 2 and ztBc[A,A] = 1.
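The three preprocessing steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function name `build_zt_bc`, the dictionary representation, and the explicit `alphabet` argument are assumptions made for illustration. The pattern GCAGAGAG used in the usage lines is also an assumption, inferred from the values quoted in these slides (m = 8, x[0] = G, x[1..2] = CA).

```python
def build_zt_bc(pattern, alphabet):
    """Build the ztBc table as a dict keyed by character pairs (a, b).

    Hypothetical helper: `alphabet` must contain every character of the
    pattern and of the text that will be searched.
    """
    m = len(pattern)
    if m < 2:
        raise ValueError("the 2-substring rule needs a pattern of length >= 2")

    # Step 1: fill every entry with the pattern length m.
    zt_bc = {(a, b): m for a in alphabet for b in alphabet}

    # Step 2: every pair ending in pattern[0] gets m - 1.
    for a in alphabet:
        zt_bc[(a, pattern[0])] = m - 1

    # Step 3: pairs occurring in pattern[0..m-2], scanned left to right,
    # so the rightmost occurrence is the one that remains in the table.
    for i in range(1, m - 1):
        zt_bc[(pattern[i - 1], pattern[i])] = m - 1 - i

    return zt_bc


table = build_zt_bc("GCAGAGAG", "ACGT")
print(table[("C", "A")], table[("G", "A")], table[("C", "G")], table[("A", "C")])
# prints: 5 1 7 8
```

Scanning left to right in step 3 is what makes later (more rightward) occurrences overwrite earlier ones, so no explicit "rightmost" search is needed.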
Case 1: ztBc[a,b] = k with k ≤ m-2
• Example (m = 8): ztBc[C,A] = 5; k ≤ m-2, because x[8-5-2..8-5-1] = ab (x[1..2] = CA) and "CA" does not occur in x[8-5-1..8-2] (x[2..6]). The pattern shifts by 5.
Case 2: ztBc[a,b] = k with k = m-1
• Example (m = 8): ztBc[C,G] = 7; k = m-1, because x[0] = b (G = G) and "CG" does not occur in x[0..8-2] (x[0..6]). The pattern shifts by 7.
Case 3: ztBc[a,b] = k with k = m
• Example (m = 8): ztBc[A,C] = 8; k = m, because x[0] ≠ b (G ≠ C) and "AC" does not occur in x[0..8-2] (x[0..6]).
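The three cases can also be checked directly from their definitions, without building the whole table. The sketch below is only an illustration: the name `zt_case` is hypothetical, and the pattern GCAGAGAG is an assumption inferred from the values in the examples (m = 8, x[0] = G, x[1..2] = CA).

```python
def zt_case(pattern, a, b):
    """Return (case, k) for the pair ab, following the three cases.

    Hypothetical helper; `a` and `b` are single characters.
    """
    m = len(pattern)
    prefix = pattern[:m - 1]        # x[0..m-2]
    pos = prefix.rfind(a + b)       # start of the rightmost ab, or -1

    if pos >= 0:
        # Case 1: ab occurs in x[0..m-2]; k = m - 2 - pos, so k <= m - 2.
        return 1, m - 2 - pos
    if pattern[0] == b:
        # Case 2: ab does not occur, but x[0] = b; k = m - 1.
        return 2, m - 1
    # Case 3: ab does not occur and x[0] != b; k = m.
    return 3, m


for a, b in [("C", "A"), ("C", "G"), ("A", "C")]:
    print(a + b, zt_case("GCAGAGAG", a, b))
# prints:
# CA (1, 5)
# CG (2, 7)
# AC (3, 8)
```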
• Full Example: Shift by 5. In this step, we select the ztBc function to shift because ztBc[C,A] = 5 > bmGs[7] = 1, where CA is the 2-substring of the text aligned with P6P7. The pattern shifts 5 steps right by case 1.
• Full Example (exact match): Shift by 7. In this step, we select the bmGs function to shift because ztBc[A,G] = 2 < bmGs[0] = 7.
• Full Example: Shift by 4. In this step, we select the bmGs function to shift because ztBc[A,G] = 2 < bmGs[5] = 4.
• Full Example: Shift by 7, by the bmGs or the ztBc function. We can select either function to shift because ztBc[C,G] = 7 = bmGs[6].
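The search loop traced in the full example can be sketched as below: compare the window right to left, then shift by the larger of the bmGs and ztBc values. This is a sketch under stated assumptions, not the authors' implementation. The names `good_suffix_table`, `zt_table` and `zt_search` are hypothetical, the good suffix table is built with a simple quadratic suffix computation for readability, the ztBc table is stored sparsely with the defaults m-1 and m applied at lookup time, and the pattern and text in the usage lines are assumptions chosen to reproduce the shifts 5, 7, 4, 7 quoted above.

```python
def good_suffix_table(p):
    """bmGs table of Boyer-Moore (simple O(m^2) suffix computation)."""
    m = len(p)
    # suff[i] = length of the longest common suffix of p[0..i] and p.
    suff = [0] * m
    for i in range(m):
        k = 0
        while k <= i and p[i - k] == p[m - 1 - k]:
            k += 1
        suff[i] = k

    bm_gs = [m] * m
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:            # p[0..i] is also a suffix of p
            while j < m - 1 - i:
                if bm_gs[j] == m:
                    bm_gs[j] = m - 1 - i
                j += 1
    for i in range(m - 1):
        bm_gs[m - 1 - suff[i]] = m - 1 - i
    return bm_gs


def zt_table(p):
    """Sparse ztBc: only pairs that occur in p[0..m-2] are stored."""
    m = len(p)
    return {(p[i - 1], p[i]): m - 1 - i for i in range(1, m - 1)}


def zt_search(text, p):
    """Return the start positions of all occurrences of p in text."""
    n, m = len(text), len(p)
    if m < 2:
        raise ValueError("the 2-substring rule needs a pattern of length >= 2")
    bm_gs, zt_bc = good_suffix_table(p), zt_table(p)
    found, j = [], 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and p[i] == text[j + i]:
            i -= 1
        a, b = text[j + m - 2], text[j + m - 1]   # last 2-substring of window
        zt = zt_bc.get((a, b), m - 1 if b == p[0] else m)
        if i < 0:
            found.append(j)
            i = 0                                  # complete match: use bmGs[0]
        j += max(bm_gs[i], zt)
    return found


print(zt_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))  # prints [5]
print(zt_search("ACTGCTAAGTA", "CTAAG"))                  # prints [4]
```

Taking the maximum is safe because each table on its own gives a shift that cannot skip an occurrence, so the larger of the two cannot either.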
Time complexity
• Preprocessing phase in O(m + σ²) time and space complexity (σ = the size of the alphabet of the text).
• Searching phase in O(m × n) time complexity.
References • ZHU, R.F. and TAKAOKA, T., 1987, On improving the average case of the Boyer-Moore string matching algorithm, Journal of Information Processing 10(3):173-177 .