Simon Algorithm
String matching algorithms and automata. SIMON I. 1st American Workshop on String Processing, pp 151-157 (1993).
Advisor: Prof. R. C. T. Lee
Speaker: C. W. Lu
String matching problem
• Given a text string T of length n and a pattern string P of length m.
• Find all occurrences of P in T.
• The Simon algorithm is an algorithm which solves the string matching problem.
• It skips over the pattern by using Rule 2.
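As a baseline for the problem statement above, here is a minimal brute-force sketch in Python. It is not the Simon algorithm; it only makes "find all occurrences of P in T" concrete. The function name `find_all` and the sample strings are assumptions chosen for illustration.

```python
def find_all(text, pattern):
    """Return every start index at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    # Try every window of length m and compare it with the pattern.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


print(find_all("BCBCBDBCBABCBA", "BCBABCBA"))  # [6]
print(find_all("AAAA", "AA"))                  # [0, 1, 2]
```

This takes O(mn) time in the worst case, which is the cost that KMP and Simon avoid by reusing what has already been matched.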
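• The KMP algorithm uses a Prefix Function to determine the window shift. Suppose a mismatch occurs between the pattern character x and the text character y.
Case 1: Find the longest suffix u of the matched part that is also a prefix of P and is followed in P by a character z with z ≠ x. Move P so that this prefix u is aligned with the suffix u in T.
Case 2: There is no such u where z ≠ x. Move P past the matched part.
[Figure: T and P with the suffix u, the prefix u and the characters x, y, z marked]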
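• The Simon algorithm improves the KMP algorithm by using a better function: the character z that follows the prefix u in P must be equal to the text character y at which the mismatch occurred (z = y).
Case 1: Find the longest such u. Move P so that its prefix u is aligned with the suffix u in T.
Case 2: There is no such u where z = y. Move P past the matched part.
[Figure: T and P with the suffix u, the prefix u and the characters y, z marked]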
Example:
[Figure: T and P before the shift]
∵ The string P(5,8) = ATCA is the longest suffix of ATCACATCA which is equal to a prefix of P, namely P(0,3), and P(4) = T(9), that is, T(5,9) = P(0,4). Therefore, we can slide the window to align P(4) with T(9).
[Figure: T and P after the shift]
Example:
[Figure: T and P before the shift]
∵ There is no suffix of ATCACATCAA which is equal to a prefix of P. Therefore, we slide the window to align P(0) with T(9).
[Figure: T and P after the shift]
In this case, the Simon algorithm is better than the KMP algorithm, because the KMP algorithm would align P(0) with T(5).
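Simon Table
• Let [formula not preserved in the source].
• Let u be the longest suffix of (P(0, i-1) + y) which is equal to a prefix of P, where y ≠ P(i).
• If u is not empty, the triple (i, y, |u|) is stored in SimonTable.
[Figure: u as a suffix of the matched part of T and as a prefix of P]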
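To make the definition of u concrete, the following Python sketch computes |u| directly from the definition by trying every prefix length. It is a brute-force illustration, not the construction used by the algorithm; the function name `longest_suffix_prefix` is an assumption, and the pattern BCBABCBA is the one used in the worked example of these slides.

```python
def longest_suffix_prefix(pattern, i, y):
    """Length of the longest suffix of P(0, i-1) + y that is a prefix of P.

    Returns 0 when no non-empty suffix qualifies. The caller is expected
    to pass y != P(i), as required by the definition.
    """
    s = pattern[:i] + y
    # Try the longest candidate first, so the first hit is the longest u.
    for k in range(min(len(s), len(pattern)), 0, -1):
        if s.endswith(pattern[:k]):
            return k
    return 0


P = "BCBABCBA"
print(longest_suffix_prefix(P, 3, "C"))  # 2, so (3, C, 2) is a table entry
print(longest_suffix_prefix(P, 8, "B"))  # 5, so (8, B, 5) is a table entry
print(longest_suffix_prefix(P, 3, "D"))  # 0, so there is no entry for (3, D)
```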
Note that, in the Simon Algorithm, when a mismatch occurs at location i of P against the text character y, if (i, y, |u|) ∈ SimonTable, we move P by (i-|u|+1) steps; otherwise, we move P by i+1 steps.
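The shift rule above can be sketched as a small lookup in Python. The table is the one obtained for P = BCBABCBA in the worked example of these slides; storing it as a dict keyed by (i, y) and the names `SIMON_TABLE` and `shift` are assumptions made for illustration.

```python
# Simon Table for P = BCBABCBA, stored as {(i, y): |u|}.
SIMON_TABLE = {
    (1, "B"): 1,
    (3, "C"): 2, (3, "B"): 1,
    (5, "B"): 1,
    (7, "C"): 2, (7, "B"): 1,
    (8, "B"): 5,
}


def shift(table, i, y):
    """Number of steps to move P after a mismatch at location i against y."""
    u_len = table.get((i, y))
    if u_len is None:
        return i + 1          # no entry: move P by i + 1 steps
    return i - u_len + 1      # entry (i, y, |u|): move P by i - |u| + 1 steps


print(shift(SIMON_TABLE, 3, "C"))  # 3 - 2 + 1 = 2
print(shift(SIMON_TABLE, 3, "D"))  # no entry, so 3 + 1 = 4
print(shift(SIMON_TABLE, 8, "B"))  # 8 - 5 + 1 = 4
```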
The Simon Table can be constructed recursively from the Prefix Table, the table used in the MP algorithm. Here prefix(i) is the length of the longest proper prefix of P(0, i) which is also a suffix of P(0, i).
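For reference, the Prefix Table itself can be computed with the usual MP failure-function loop. The Python sketch below assumes that prefix(i) is the length of the longest proper prefix of P(0, i) that is also a suffix of P(0, i), which agrees with the values listed in the worked example; the name `prefix_table` is an assumption.

```python
def prefix_table(pattern):
    """prefix[i] = length of the longest proper border of pattern[0..i]."""
    m = len(pattern)
    prefix = [0] * m
    t = 0  # length of the border currently being extended
    for i in range(1, m):
        # Fall back to shorter borders until pattern[i] can extend one.
        while t > 0 and pattern[i] != pattern[t]:
            t = prefix[t - 1]
        if pattern[i] == pattern[t]:
            t += 1
        prefix[i] = t
    return prefix


print(prefix_table("BCBABCBA"))  # [0, 0, 1, 0, 1, 2, 3, 4]
```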
The Simon Table
for ( i = 1 ; i <= m ; i++ ) {
    t = prefix(i-1)
    while ( t > 0 ) {
        if ( P(i) ≠ P(t) & (i, P(t), *) ∉ SimonTable )
            add (i, P(t), t+1) to SimonTable
        end if
        t = prefix(t-1)    /* recursive; executed in every iteration */
    end while
    /* here t = 0 */
    if ( P(i) ≠ P(0) & (i, P(0), *) ∉ SimonTable )
        add (i, P(0), 1) to SimonTable
    end if
end for
/* For i = m, P(m) does not exist and is treated as different from every character. */
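The construction above can be sketched in Python as follows. This is one possible reading of the pseudocode, not a definitive implementation: the step t = prefix(t-1) is taken in every iteration (as in the worked example), the t = 0 case is folded into the same loop, and P(m) is treated as different from every character. The names `prefix_table` and `simon_table` and the dict layout {(i, y): |u|} are assumptions.

```python
def prefix_table(pattern):
    """prefix[i] = length of the longest proper border of pattern[0..i]."""
    prefix = [0] * len(pattern)
    t = 0
    for i in range(1, len(pattern)):
        while t > 0 and pattern[i] != pattern[t]:
            t = prefix[t - 1]
        if pattern[i] == pattern[t]:
            t += 1
        prefix[i] = t
    return prefix


def simon_table(pattern):
    """Build the Simon Table as a dict {(i, y): |u|}."""
    m = len(pattern)
    prefix = prefix_table(pattern)
    table = {}
    for i in range(1, m + 1):
        # P(m) does not exist; None never equals a real character.
        p_i = pattern[i] if i < m else None
        t = prefix[i - 1]
        while True:
            y = pattern[t]
            # Borders are visited from longest to shortest, so the first
            # entry recorded for (i, y) is the one with the longest u.
            if y != p_i and (i, y) not in table:
                table[(i, y)] = t + 1
            if t == 0:
                break
            t = prefix[t - 1]
    return table


print(simon_table("BCBABCBA"))
```

For P = BCBABCBA the result contains the seven entries (1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1) and (8, B, 5).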
Example

i      : 0 1 2 3 4 5 6 7
P      : B C B A B C B A
Prefix : 0 0 1 0 1 2 3 4

i = 1. t = prefix(1-1) = prefix(0) = 0. P(1) ≠ P(t) = P(0) & (1, B, *) ∉ SimonTable. Add (1, B, 1) to SimonTable.
SimonTable: {(1, B, 1)}

i = 2. t = prefix(2-1) = prefix(1) = 0. P(2) = P(t) = P(0) = B. Nothing is added.
SimonTable: {(1, B, 1)}

i = 3. t = prefix(3-1) = prefix(2) = 1. P(3) ≠ P(t) = P(1) & (3, C, *) ∉ SimonTable. Add (3, C, t+1) = (3, C, 2) to SimonTable. t = prefix(t-1) = prefix(0) = 0. P(3) ≠ P(0) & (3, B, *) ∉ SimonTable. Add (3, B, 1) to SimonTable.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1)}

i = 4. t = prefix(4-1) = prefix(3) = 0. P(4) = P(t) = P(0) = B. Nothing is added.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1)}

i = 5. t = prefix(5-1) = prefix(4) = 1. P(5) = P(t) = P(1) = C. t = prefix(t-1) = prefix(0) = 0. P(5) ≠ P(0) & (5, B, *) ∉ SimonTable. Add (5, B, 1) to SimonTable.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1)}

i = 6. t = prefix(6-1) = prefix(5) = 2. P(6) = P(t) = P(2) = B. t = prefix(t-1) = prefix(1) = 0. P(6) = P(0) = B. Nothing is added.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1)}

i = 7. t = prefix(7-1) = prefix(6) = 3. P(7) = P(t) = P(3) = A. t = prefix(t-1) = prefix(2) = 1. P(7) ≠ P(t) = P(1) & (7, C, *) ∉ SimonTable. Add (7, C, t+1) = (7, C, 2) to SimonTable. t = prefix(t-1) = prefix(0) = 0. P(7) ≠ P(0) & (7, B, *) ∉ SimonTable. Add (7, B, 1) to SimonTable.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1)}

i = 8. t = prefix(8-1) = prefix(7) = 4. P(8) ≠ P(t) = P(4) & (8, B, *) ∉ SimonTable. Add (8, B, t+1) = (8, B, 5) to SimonTable. t = prefix(t-1) = prefix(3) = 0. P(8) ≠ P(0), but (8, B, *) ∈ SimonTable, so nothing is added.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1), (8, B, 5)}
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1), (8, B, 5)}
(Only the part of T that is fixed by the statements below is shown.)

• Example:
T : B C B C B D B C B A B C B A …
P : B C B A B C B A
∵ P(3) ≠ T(3) = C, and (3, C, 2) ∈ SimonTable.
∴ Move P by (3-2+1) = 2 steps.

• Example:
T : B C B C B D B C B A B C B A …
P :     B C B A B C B A
∵ P(3) ≠ T(5) = D, and (3, D, *) ∉ SimonTable.
∴ Move P by 4 steps.

• Example:
T : B C B C B D B C B A B C B A …
P :             B C B A B C B A
∵ P(0, 7) = T(6, 13), and (8, B, 5) ∈ SimonTable.
∴ Move P by (8-5+1) = 4 steps.
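The three search situations above can be reproduced with a short Python sketch of the searching phase. Instead of moving a window explicitly, it tracks i, the number of pattern characters currently matched: moving P by i-|u|+1 steps leaves |u| characters matched, and moving P by i+1 steps leaves none. The function names, the dict layout of the table and the sample text (whose last character B is assumed) are illustrative assumptions, not taken from the source.

```python
def prefix_table(pattern):
    """prefix[i] = length of the longest proper border of pattern[0..i]."""
    prefix = [0] * len(pattern)
    t = 0
    for i in range(1, len(pattern)):
        while t > 0 and pattern[i] != pattern[t]:
            t = prefix[t - 1]
        if pattern[i] == pattern[t]:
            t += 1
        prefix[i] = t
    return prefix


def simon_table(pattern):
    """Simon Table as {(i, y): |u|}, built from the Prefix Table."""
    m = len(pattern)
    prefix = prefix_table(pattern)
    table = {}
    for i in range(1, m + 1):
        p_i = pattern[i] if i < m else None  # P(m) matches nothing
        t = prefix[i - 1]
        while True:
            y = pattern[t]
            if y != p_i and (i, y) not in table:
                table[(i, y)] = t + 1
            if t == 0:
                break
            t = prefix[t - 1]
    return table


def simon_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    table = simon_table(pattern)
    hits = []
    i = 0  # number of pattern characters matched so far
    for j, y in enumerate(text):
        if i < m and y == pattern[i]:
            i += 1                     # match: extend the matched part
        else:
            i = table.get((i, y), 0)   # mismatch: keep |u| characters, or none
        if i == m:
            hits.append(j - m + 1)     # P(0, m-1) = T(j-m+1, j)
    return hits


print(simon_search("BCBCBDBCBABCBAB", "BCBABCBA"))  # [6]
```

On this text the search uses (3, C, 2) at T(3), finds no entry for (3, D) at T(5), reports the occurrence at position 6, and then uses (8, B, 5) at T(14).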
• Preprocessing phase: O(m) time and space complexity.
• Searching phase: O(m+n) time complexity.
References
• BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.
• CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, A. Apostolico and Z. Galil ed., Chapter 1, pp 1-53, Oxford University Press.
• CROCHEMORE, M., HANCART, C., 1997, Automata for Matching Patterns, in Handbook of Formal Languages, Volume 2, Linear Modeling: Background and Application, G. Rozenberg and A. Salomaa ed., Chapter 9, pp 399-462, Springer-Verlag, Berlin.
• CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.
• HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, pp 99-110.
• HANCART, C., 1993, On Simon's string searching algorithm, Inf. Process. Lett. 47(2):95-99.
• HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph.D. Thesis, University Paris 7, France.
• SIMON, I., 1993, String matching algorithms and automata, in Proceedings of 1st American Workshop on String Processing, R.A. Baeza-Yates and N. Ziviani ed., pp 151-157, Universidade Federal de Minas Gerais, Brazil.
• SIMON, I., 1994, String matching algorithms and automata, in Results and Trends in Theoretical Computer Science, Graz, Austria, Karhumäki, Maurer and Rozenberg ed., pp 386-395, Lecture Notes in Computer Science 814, Springer Verlag.