The Galil-Giancarlo algorithm
Galil, Z. and Giancarlo, R., On the exact complexity of string matching: upper bounds, SIAM Journal on Computing, Vol. 21, No. 3, 1992, pp. 407-437.
Advisor: Prof. R. C. T. Lee
Speaker: S. Y. Tang
The Galil-Giancarlo algorithm solves the string matching problem.
• String matching problem:
  Input: a text string T of length n and a pattern string P of length m.
  Output: all occurrences of P in T.
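To make the input and output of the problem concrete, here is a minimal brute-force sketch. It is not the Galil-Giancarlo algorithm; it only states what any string matcher must return. The function name `find_occurrences` and the 0-based positions are assumptions made for illustration.

```python
def find_occurrences(text: str, pattern: str) -> list[int]:
    """Return every 0-based start position at which pattern occurs in text.

    Brute-force reference for the problem statement only: it tries every
    window of length m and costs O(n * m) character comparisons in the worst case.
    """
    n, m = len(text), len(pattern)
    # range() is empty when the pattern is longer than the text.
    return [j for j in range(n - m + 1) if text[j:j + m] == pattern]


if __name__ == "__main__":
    # Overlapping occurrences are all reported.
    print(find_occurrences("aaaa", "aa"))
    print(find_occurrences("GCAGGAGCAGCA", "GCA"))
```

A linear-time matcher such as the Colussi or GG algorithm should report the same positions while comparing far fewer characters.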
The Galil-Giancarlo algorithm (GG algorithm for short) improves the worst case of the Colussi algorithm.
• The GG algorithm has two phases: preprocessing and searching.
• The preprocessing phase is the same as in the Colussi algorithm.
• In the searching phase, the GG algorithm adds 5 cases that determine how to jump. This is the difference between the GG algorithm and the Colussi algorithm.
The cases in which the GG algorithm is not used
• Case 1: The pattern has only one period. The entire window is skipped. There is no way to know whether there is a prefix in the window equal to a prefix of the pattern.
• Example:
  T: GCAGCGGGAC
  P: GGAGC (mismatch)
  P: GGAGC (after the shift)
• Case 2: A prefix of the pattern is already known to be equal to a prefix of the window.
• Example:
  T: GGACGGAACGCA
  P: GGAGGGA (mismatch)
  P: GGAGGGA (after the shift)
• Example:
  T: GCAGGAGCAGCA
  P: GGAGGAG (mismatch)
  P: GGAGGAG (after the shift)
The five cases of the GG algorithm
• Case 1: l > k. Shift. (Figure example: k = 2, l = 5.)
• Case 2: l = k and p[l+1] ≠ t[j+k]. Shift. (Figure example: k = 3, l = 3.)
• Case 3: l < k and p[l+1] ≠ t[j+k]. Shift. (Figure example: k = 5, l = 2.)
• Case 4: l = k and p[l+1] = t[j+k]. No shift is needed. (Figure example: k = 3, l = 3.)
• Case 5: l < k and p[l+1] = t[j+k]. Shift. (Figure example: k = 5, l = 3.)
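The five conditions can be read as a small decision procedure. The sketch below only classifies a situation into one of the cases listed on the slides; it does not compute the shift distance, which the slides do not specify. The names `gg_case`, `needs_shift`, and `next_chars_equal` are hypothetical, and `next_chars_equal` stands for the result of the comparison p[l+1] = t[j+k].

```python
def gg_case(k: int, l: int, next_chars_equal: bool) -> int:
    """Classify a situation into one of the five GG cases from the slides.

    k, l             -- the two lengths compared on the slides
    next_chars_equal -- hypothetical flag: True when p[l+1] == t[j+k]
    """
    if l > k:
        return 1                      # Case 1: l > k (character test not needed)
    if l == k:
        return 4 if next_chars_equal else 2
    return 5 if next_chars_equal else 3  # remaining possibility: l < k


def needs_shift(case: int) -> bool:
    """According to the slides, only Case 4 proceeds without a shift."""
    return case != 4


if __name__ == "__main__":
    # The (k, l) values used in the slide figures.
    for k, l, eq in [(2, 5, False), (3, 3, False), (5, 2, False),
                     (3, 3, True), (5, 3, True)]:
        c = gg_case(k, l, eq)
        print(f"k={k} l={l} equal={eq} -> Case {c}, shift={needs_shift(c)}")
```

Passing the comparison result as a flag keeps the sketch independent of whether the strings are indexed from 0 or from 1.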
Example (1/7)
[Figure: T and P aligned; mismatch, then shift; Shift[4] = 4]
We first compare the noholes using phase 1 of the Colussi algorithm, then shift according to Shift[i].
Example (2/7)
[Figure: T and P aligned; match]
Example (3/7)
[Figure: T and P aligned; mismatch, then shift; Shift[0] = 5]
After all noholes are matched, we compare the holes using phase 2 of the Colussi algorithm, then shift according to Shift[i].
Example (4/7)
[Figure: T and P aligned; k = 2, l = 3; shift]
In this case, we shift using Case 1 of the GG algorithm, because the condition for using the GG algorithm is satisfied and l > k.
Example (5/7)
[Figure: T and P aligned; all noholes are matched; mismatch, then shift; Shift[2] = 5]
After checking the cases of the GG algorithm, we return to the Colussi algorithm.
Example (6/7)
[Figure: T and P aligned; k = 2, l = 3; shift]
In this case, we shift using Case 5 of the GG algorithm, because the condition for using the GG algorithm is satisfied and l < k.
Example (7/7)
[Figure: T and P aligned; exact match]
After checking the cases of the GG algorithm, we return to the Colussi algorithm.
Time complexity
• The preprocessing phase takes O(m) time and space.
• The searching phase takes O(n) time.
• The algorithm performs at most (4/3)n text character comparisons in the worst case.
Conclusion
• The Galil-Giancarlo algorithm is very similar to the Colussi algorithm. The Colussi algorithm performs very badly if the pattern starts and ends with a sequence of repetitions of the same symbol. For these patterns, the Colussi algorithm shifts by a single position, and (3/2)n comparisons are actually performed. Galil and Giancarlo devised a way to avoid these shifts by a single position.
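As a rough illustration of what the improvement means in numbers, the sketch below evaluates the two worst-case bounds quoted on the slides, (3/2)n for the Colussi algorithm and (4/3)n for the GG algorithm, for a given text length n. The function name `worst_case_comparisons` is hypothetical, and the values are upper bounds taken from the slides, not measured comparison counts.

```python
from fractions import Fraction

# Worst-case factors as stated on the slides (text character comparisons per text character).
COLUSSI_FACTOR = Fraction(3, 2)
GG_FACTOR = Fraction(4, 3)


def worst_case_comparisons(n: int, factor: Fraction) -> Fraction:
    """Return the worst-case bound factor * n as an exact fraction."""
    return factor * n


if __name__ == "__main__":
    n = 1200
    colussi = worst_case_comparisons(n, COLUSSI_FACTOR)
    gg = worst_case_comparisons(n, GG_FACTOR)
    print(f"n = {n}: Colussi bound = {colussi}, GG bound = {gg}, "
          f"difference = {colussi - gg}")
```

Exact fractions are used so that the (4/3)n bound is not rounded when n is not a multiple of 3.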
References
• [B92] Breslauer, D., Efficient String Algorithmics, Ph.D. Thesis, Report CU-024-92, Computer Science Department, Columbia University, New York, NY, 1992.
• [GG92] Galil, Z. and Giancarlo, R., On the exact complexity of string matching: upper bounds, SIAM Journal on Computing, Vol. 21, No. 3, 1992, pp. 407-437.