
KMP Skip Search Algorithm

KMP Skip Search Algorithm. Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in Computer Science, Vol. 1448, 1998, pp. 55-64. Advisor: Prof. R. C. T. Lee. Speaker: Z. H. Pan.


Presentation Transcript


  1. KMP Skip Search Algorithm
     Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in Computer Science, Vol. 1448, 1998, pp. 55-64.
     Advisor: Prof. R. C. T. Lee
     Speaker: Z. H. Pan

  2. Definition
     • String Matching Problem:
       Input: a text string T of length n and a pattern string P of length m.
       Output: all occurrences of P in T.
     Example: [Figure: example text T, positions 1 to 20, and pattern P]
     The occurrence of P in T: T5
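The problem definition can be made concrete with a brute-force matcher. This is only a baseline sketch for checking results, not the KMP Skip Search algorithm itself. The function name `naive_occurrences` is hypothetical, and positions are 0-based as in the later example slides.

```python
def naive_occurrences(text, pattern):
    """Return every 0-based start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


# Text and pattern taken from the worked example in the later slides.
print(naive_occurrences("ACTACATATAGGACTACGTACCAGCATTACTACGTT", "ACTACGT"))
# prints [12, 28]
```

This takes O(nm) time in the worst case, which is the cost the preprocessing tables are meant to avoid.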

  3. The KMP Skip Search algorithm consists of two phases: preprocessing and searching.
     • The KMP Skip Search algorithm uses the KMP table to improve the Skip Search algorithm.

  4. Preprocessing
     • The preprocessing phase computes the buckets for all characters of the alphabet (Z), the list table (List), the MP table (mpNext) and the KMP table (kmpNext).
     Example:
     Text string T = GCATCGCAGAGAGTATACAGTACG
     Pattern string P = GCAGAGAG

     i        0  1  2  3  4  5  6  7
     P[i]     G  C  A  G  A  G  A  G

     c        A  C  G  T
     Z[c]     6  1  7 -1

     i        0  1  2  3  4  5  6  7
     List[i] -1 -1 -1  0  2  3  4  5

     i        0  1  2  3  4  5  6  7  8
     mpNext  -1  0  0  0  1  0  1  0  1
     kmpNext -1  0  0 -1  1 -1  1 -1  1
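The four tables can be computed as in the sketch below. It assumes the usual definitions: Z[c] is the last position of c in P, List[i] is the previous position holding the same character as P[i], and mpNext and kmpNext are the standard Morris-Pratt and Knuth-Morris-Pratt failure tables. The function names are hypothetical, and the alphabet is passed in explicitly only so that absent characters such as T get a bucket.

```python
def pre_mp(p):
    """Morris-Pratt table: mp_next[i] is the longest border of p[:i]."""
    m = len(p)
    mp_next = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and p[i] != p[j]:
            j = mp_next[j]
        i += 1
        j += 1
        mp_next[i] = j
    return mp_next


def pre_kmp(p):
    """Knuth-Morris-Pratt table: skips borders followed by the same character."""
    m = len(p)
    kmp_next = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and p[i] != p[j]:
            j = kmp_next[j]
        i += 1
        j += 1
        if i < m and p[i] == p[j]:
            kmp_next[i] = kmp_next[j]
        else:
            kmp_next[i] = j
    return kmp_next


def pre_skip(p, alphabet):
    """Buckets: z[c] is the last position of c, lst[i] the previous one."""
    z = {c: -1 for c in alphabet}
    lst = [-1] * len(p)
    for i, c in enumerate(p):
        lst[i] = z.get(c, -1)
        z[c] = i
    return z, lst


z, lst = pre_skip("GCAGAGAG", "ACGT")
print(z)                    # {'A': 6, 'C': 1, 'G': 7, 'T': -1}
print(lst)                  # [-1, -1, -1, 0, 2, 3, 4, 5]
print(pre_mp("GCAGAGAG"))   # [-1, 0, 0, 0, 1, 0, 1, 0, 1]
print(pre_kmp("GCAGAGAG"))  # [-1, 0, 0, -1, 1, -1, 1, -1, 1]
```

The printed values agree with the tables on this slide.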

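  5. A general situation for the search phase
     [Figure: T and P aligned so that T[i] = P[j]; start and wall are marked on T; X marks the mismatch that follows a matched substring of length k.]
     • First, the Skip Search algorithm is used to find an alignment in which T[i] = P[j].
     • start is the position of T that is aligned with the first character of P.
     • wall is the position of the first mismatch in T when T is aligned with P.
     • k is the length of the substring of P that equals the substring of T, so wall = start + k.
     • kmpStart is the next start position given by the KMP shift.
     • skipStart is the next start position given by the skip shift.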
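A single attempt can be sketched as follows. The sketch assumes that k counts the characters of P already known to match from position start, so that wall = start + k, which is what the values in the example slides suggest. The function name `attempt` and its return convention are hypothetical.

```python
def attempt(text, pattern, start, wall):
    """Compare pattern against text at alignment start, resuming at wall.

    Returns (k, new_wall): k is the number of matched characters of the
    pattern and new_wall = start + k is the first unmatched text position.
    """
    k = max(wall - start, 0)  # characters before wall are already known to match
    while (k < len(pattern) and start + k < len(text)
           and text[start + k] == pattern[k]):
        k += 1
    return k, start + k


T = "ACTACATATAGGACTACGTACCAGCATTACTACGTT"
P = "ACTACGT"
print(attempt(T, P, 0, 0))    # (5, 5): mismatch at T[5]
print(attempt(T, P, 12, 12))  # (7, 19): whole match, k = m
```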

  6. If k = 0, that is, no prefix of P equals the substring of T, the Skip Search algorithm is used. Otherwise, when k > 0, that is, a prefix of P equals the substring of T, we compute kmpStart, wall and skipStart and distinguish four cases.
     Case 1. skipStart < kmpStart: a shift according to the skip algorithm is applied, which gives a new value for skipStart, and skipStart and kmpStart have to be compared again.
     Case 2. kmpStart < skipStart < wall: a shift according to the shift table of Morris-Pratt is applied. This gives a new value for kmpStart, and skipStart and kmpStart have to be compared again.
     Case 3. skipStart = kmpStart: another attempt can be performed with start = skipStart.
     Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart.
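The case analysis can be read as a small decision function. In this sketch, Cases 1 and 2 ask for another shift and everything else starts a new attempt at skipStart. Treating "everything else" as Cases 3 and 4 is an assumption, made because the example slides file kmpStart = wall = 10, skipStart = 12 under Case 4. The name `next_action` and the returned labels are hypothetical.

```python
def next_action(skip_start, kmp_start, wall):
    """Decide what to do with the two candidate start positions."""
    if skip_start < kmp_start:
        return "skip shift"  # Case 1: recompute skip_start, compare again
    if kmp_start < skip_start < wall:
        return "mp shift"    # Case 2: recompute kmp_start, compare again
    return "attempt"         # Cases 3 and 4: start = skip_start


print(next_action(skip_start=4, kmp_start=3, wall=5))     # mp shift
print(next_action(skip_start=4, kmp_start=5, wall=5))     # skip shift
print(next_action(skip_start=12, kmp_start=10, wall=10))  # attempt
```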

  7. Example: step 1
     First the Skip Search algorithm is used to align T and P.
     T has positions 0 to 35 and P = ACTACGT has positions 0 to 6.
     T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
         ACTACGT  (start = 0, wall = 5, k = 5)
            ACTACGT  (KMP's shift, kmpStart = 3)
             ACTACGT  (skip's shift, skipStart = 4)
     wall = 5, kmpStart = 3, skipStart = 4
     Case 2. kmpStart < skipStart < wall: a shift according to the shift table of Morris-Pratt is applied. This gives a new value for kmpStart, and skipStart and kmpStart have to be compared again.

  8. Example: step 1-1
     start = 0, wall = 5, k = 2
     T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
              ACTACGT  (KMP's shift, kmpStart = 5)
             ACTACGT  (skip's shift, skipStart = 4)
     wall = 5, kmpStart = 5, skipStart = 4
     Case 1. skipStart < kmpStart: a shift according to the skip algorithm is applied, which gives a new value for skipStart, and skipStart and kmpStart have to be compared again.

  9. Example: step 1-2
     start = 0
     k = 0, so the Skip Search algorithm is used. The new start is 9.
     T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                  ACTACGT  (start = 9)

  10. Example: step 2
      start = 9, wall = 10, k = 1
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                   ACTACGT  (start = 9)
                    ACTACGT  (KMP's shift, kmpStart = 10)
                      ACTACGT  (skip's shift, skipStart = 12)
      wall = 10, kmpStart = 10, skipStart = 12
      Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart, so start = 12.

  11. Example: step 3
      start = 12, wall = 19
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                      ACTACGT  (start = 12: match, k = 7)
                             ACTACGT  (KMP's shift, kmpStart = 19)
                          ACTACGT  (skip's shift, skipStart = 16)
      wall = 19, kmpStart = 19, skipStart = 16
      Case 1. skipStart < kmpStart: a shift according to the skip algorithm is applied, which gives a new value for skipStart, and skipStart and kmpStart have to be compared again.

  12. Example: step 3-1
      start = 12
      k = 0, so the Skip Search algorithm is used. The new start is 19.
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                             ACTACGT  (start = 19)

  13. Example: step 4
      start = 19, wall = 21, k = 2
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                             ACTACGT  (start = 19)
                               ACTACGT  (KMP's shift, kmpStart = 21)
                               ACTACGT  (skip's shift, skipStart = 21)
      wall = 21, kmpStart = 21, skipStart = 21
      Case 3. skipStart = kmpStart: another attempt can be performed with start = skipStart, so start = 21.

  14. Example: step 5
      start = 21
      k = 0, so the Skip Search algorithm is used. The new start is 25.
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                                   ACTACGT  (start = 25)

  15. Example: step 6
      start = 25, wall = 26, k = 1
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                                   ACTACGT  (start = 25)
                                    ACTACGT  (KMP's shift, kmpStart = 26)
                                      ACTACGT  (skip's shift, skipStart = 28)
      wall = 26, kmpStart = 26, skipStart = 28
      Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart, so start = 28.

  16. Example: step 7
      start = 28
      T = ACTACATATAGGACTACGTACCAGCATTACTACGTT
                                      ACTACGT  (start = 28: match, k = 7)
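The whole walk-through can be reproduced with a compact search routine. The sketch below follows the commonly published formulation of KMP Skip Search, with the skip buckets stored in a dictionary. It is an illustration under that assumption rather than the authors' reference code, and the names `kmp_skip_search`, `failure_tables` and `next_bucket` are hypothetical.

```python
def failure_tables(p):
    """Return (mp_next, kmp_next), the MP and KMP failure tables of p."""
    m = len(p)
    mp_next, kmp_next = [-1] * (m + 1), [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and p[i] != p[j]:
            j = mp_next[j]
        i, j = i + 1, j + 1
        mp_next[i] = j
        # KMP refinement: reuse the shorter border if the next char repeats.
        kmp_next[i] = kmp_next[j] if i < m and p[i] == p[j] else j
    return mp_next, kmp_next


def kmp_skip_search(text, pattern):
    """Return all 0-based occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    mp_next, kmp_next = failure_tables(pattern)
    z, lst = {}, [-1] * m          # buckets: last and previous positions
    for i, c in enumerate(pattern):
        lst[i] = z.get(c, -1)
        z[c] = i
    per = m - kmp_next[m]          # shift applied after a whole match

    def next_bucket(j):
        """Advance j by m until text[j] occurs in the pattern (or j >= n)."""
        j += m
        while j < n and z.get(text[j], -1) < 0:
            j += m
        return j

    found, wall = [], 0
    j = next_bucket(-1)
    if j >= n:
        return found
    i = z[text[j]]
    start = j - i
    while start <= n - m:
        wall = max(wall, start)
        k = wall - start
        while k < m and pattern[k] == text[start + k]:
            k += 1
        wall = start + k
        if k == m:
            found.append(start)
            i -= per
        else:
            i = lst[i]
        if i < 0:
            j = next_bucket(j)
            if j >= n:
                return found
            i = z[text[j]]
        kmp_start = start + k - kmp_next[k]
        k = kmp_next[k]
        start = j - i              # this is skipStart
        while start < kmp_start or kmp_start < start < wall:
            if start < kmp_start:  # Case 1: skip shift
                i = lst[i]
                if i < 0:
                    j = next_bucket(j)
                    if j >= n:
                        return found
                    i = z[text[j]]
                start = j - i
            else:                  # Case 2: Morris-Pratt shift
                kmp_start += k - mp_next[k]
                k = mp_next[k]
    return found


print(kmp_skip_search("ACTACATATAGGACTACGTACCAGCATTACTACGTT", "ACTACGT"))
# prints [12, 28]
```

On this input the routine visits the start positions 0, 9, 12, 19, 21, 25 and 28, the same sequence as in the slides, and reports the two matches at positions 12 and 28.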

  17. Time Complexity
      • The preprocessing phase of the KMP Skip Search algorithm takes O(m + σ) time, where σ is the size of the alphabet.
      • The searching phase of the KMP Skip Search algorithm takes O(n) time.

