String Matching of Bit Parallel Suffix Automata
Suffix Automata • Based on a Directed Acyclic Word Graph (DAWG) • Facilitates comparing equivalent suffix strings • Nondeterministic suffix automaton • Deterministic suffix automaton, obtained by subset construction
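The subset construction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction used in the slides: the function name `determinize_suffix_automaton` is made up here, and the nondeterministic automaton is assumed to have one state per position 0..m with an epsilon-transition from the initial state to every position, so the initial deterministic state is the full position set.

```python
def determinize_suffix_automaton(p):
    """Subset construction for the suffix automaton of p (illustrative sketch).

    Each deterministic state is a frozenset of positions in 0..len(p).
    Returns (start_state, transitions, final_states).
    """
    m = len(p)
    start = frozenset(range(m + 1))  # epsilon-closure of the initial state
    delta = {}
    seen = {start}
    todo = [start]
    while todo:
        s = todo.pop()
        for c in sorted(set(p)):
            # Advance every position in s whose next character is c.
            nxt = frozenset(i + 1 for i in s if i < m and p[i] == c)
            if not nxt:
                continue  # no transition on c from this state
            delta[(s, c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    # A state is final when it contains position m (a suffix has been read).
    finals = {s for s in seen if m in s}
    return start, delta, finals


start, delta, finals = determinize_suffix_automaton("baabbaa")
print(sorted(delta[(start, "a")]))  # [2, 3, 6, 7]
print(sorted(delta[(start, "b")]))  # [1, 4, 5]
```

Each deterministic state is the set of positions reached after reading some factor. The sketch favors readability over speed; linear-time online constructions exist but are longer.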
Suffix Automata Search • Also called Backward DAWG Matching (BDM) • Build the factors x of pattern p • endpos(x): the set of all pattern positions where an occurrence of x ends • Ex: pattern = baabbaa, endpos(aa) = {3, 7} • A shift is safe if the pattern contains no equivalent suffix • Text: the window shifts left to right • Window size = pattern length • On failing to match a factor, shift the window
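The endpos definition can be checked directly with a brute-force helper. The function below is a hypothetical illustration (its name and 1-based convention are chosen to match the example above), not part of the BDM algorithm itself.

```python
def endpos(pattern, x):
    """Return the 1-based positions at which an occurrence of x ends in pattern."""
    k = len(x)
    return {i + k for i in range(len(pattern) - k + 1) if pattern[i:i + k] == x}


print(sorted(endpos("baabbaa", "aa")))  # [3, 7]
# Factors with the same endpos set are equivalent and share one automaton state:
print(endpos("baabbaa", "bb") == endpos("baabbaa", "abb"))  # True
```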
BDM Algorithm • Build the automaton • Final state reached
Suffix Automata Search Example 1. Build the deterministic suffix automaton of the reversed pattern 2. Use endpos(x) to find a factor 3. On failing to find a factor, do a safe shift
Suffix Automata Search Example: p = aabbaab, T = abbabaabbaab. Brackets mark the window; the space inside the brackets separates the unread part from the suffix read so far.
[Figure: deterministic suffix automaton of p^r = baabbaa, with states labeled by their endpos sets (01234567, 2367, 145, 37, 26, 4, 5, 6, 7)]
1. T = [abbaba a]bbaab. a is a factor of p^r and a reverse prefix of p. last = 6
2. T = [abbab aa]bbaab. aa is a factor of p^r and a reverse prefix of p. last = 5
3. T = [abba baa]bbaab. aab is a factor of p^r.
4. T = [abb abaa]bbaab. We fail to recognize the next a, so we shift the window to last and search again at T = abbab[aabbaab]. last is reset to 7.
5. T = abbab[aabbaa b]. b is a factor of p^r.
6. T = abbab[aabba ab]. ba is a factor of p^r.
7. T = abbab[aabb aab]. baa is a factor of p^r and a reverse prefix of p. last = 4
8. T = abbab[aab baab]. baab is a factor of p^r.
9. T = abbab[aa bbaab]. baabb is a factor of p^r.
10. T = abbab[a abbaab]. baabba is a factor of p^r.
11. T = abbab[ aabbaab]. We recognize the word aabbaab and report an occurrence.
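The search traced above can be reproduced with a short Python sketch of BDM. The names `build_suffix_dfa` and `bdm_search` are hypothetical, and the automaton is built by a plain subset construction for brevity, which is an assumption of this sketch rather than the construction the slides use. A state containing position m means the characters read so far form a suffix of p^r, that is, a reversed prefix of p.

```python
def build_suffix_dfa(p):
    """Deterministic suffix automaton of p; states are frozensets of positions."""
    m = len(p)
    start = frozenset(range(m + 1))
    delta, seen, todo = {}, {start}, [start]
    while todo:
        s = todo.pop()
        for c in set(p):
            nxt = frozenset(i + 1 for i in s if i < m and p[i] == c)
            if nxt:
                delta[(s, c)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return start, delta


def bdm_search(p, t):
    """Return the 0-based start positions of p in t using backward DAWG matching."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    start, delta = build_suffix_dfa(p[::-1])  # automaton of the reversed pattern
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m          # last = shift to apply once the window is done
        state = start
        while j > 0 and state is not None:
            state = delta.get((state, t[pos + j - 1]))  # read the window backward
            j -= 1
            if state is not None and m in state:  # a prefix of p was recognized
                if j > 0:
                    last = j                      # remember the safe shift
                else:
                    occurrences.append(pos)       # the whole window matched
        pos += last
    return occurrences


print(bdm_search("aabbaab", "abbabaabbaab"))  # [5]
```

On this input the first window fails after three characters with last = 5, so the window moves to position 5, where the full pattern is recognized.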
BNDM Algorithm • Backward Nondeterministic Dawg Matching (BNDM) • Handles classes, multiple patterns, and errors • Uses bit parallelism; combines Shift-Or and BDM • 20% to 25% faster than BDM, 10% to 40% faster than BM • Update Function
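A minimal Python sketch of the basic BNDM search follows. It simulates the nondeterministic suffix automaton with one bit per pattern position; the update used is D = (D << 1) & mask after D &= B[c], a common way to write the BNDM step and assumed here since the slide's formula did not survive. The function name `bndm_search` is hypothetical. Python integers are unbounded, so the sketch ignores the machine-word limit that motivates partitioning long patterns.

```python
def bndm_search(p, t):
    """Return the 0-based start positions of p in t using bit-parallel BNDM."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    # B[c] has bit (m-1-i) set when p[i] == c.
    B = {}
    for i, c in enumerate(p):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    mask = (1 << m) - 1
    high = 1 << (m - 1)      # set when the text read so far is a prefix of p
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        D = mask             # every pattern position is initially active
        while D:
            D &= B.get(t[pos + j - 1], 0)   # read the window backward
            j -= 1
            if D & high:
                if j > 0:
                    last = j                # a prefix of p starts at window offset j
                else:
                    occurrences.append(pos) # the whole window matched
                    break
            D = (D << 1) & mask
        pos += last
    return occurrences


print(bndm_search("aabbaab", "abbabaabbaab"))  # [5]
```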
BNDM Further Improvement • Handle long patterns • Partition pattern p into subpatterns p_i • Build an array of D and B, and process each part with the basic algorithm • If p_i is found, then process p_(i+1) … • Handle classes • Modify the B table only • Set the i-th bit for all characters belonging to the i-th position in the pattern • Multiple patterns • Two methods • Interleave the patterns, shifting r bits for each D update • Just concatenate and shift 1 bit, but with the modified update D = (D << 1) & (1^(m-1) 0)^r, where 1^(m-1) 0 denotes m-1 one bits followed by a zero bit, repeated r times • r is the number of patterns • Approximate matching • Use Wu's method
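The concatenation method for multiple patterns, together with class handling through the B table, can be sketched as below. This is an illustration under stated assumptions: the function name `multi_bndm_search` is hypothetical, all patterns are assumed to have the same length m, and each pattern position is given as a string of allowed characters (a plain string pattern therefore works unchanged). The `clear` mask is the (1^(m-1) 0)^r term: it drops the bit carried from the top of one pattern's block into the next.

```python
def multi_bndm_search(patterns, t):
    """Return (position, pattern_index) pairs for r equal-length patterns."""
    r, m, n = len(patterns), len(patterns[0]), len(t)
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    # Pattern k occupies bits k*m .. k*m+m-1; a class sets the bit for each member.
    B = {}
    for k, p in enumerate(patterns):
        for i, cls in enumerate(p):
            for c in cls:
                B[c] = B.get(c, 0) | (1 << (k * m + m - 1 - i))
    high = sum(1 << (k * m + m - 1) for k in range(r))                 # (1 0^(m-1))^r
    clear = sum((((1 << m) - 1) & ~1) << (k * m) for k in range(r))    # (1^(m-1) 0)^r
    full = (1 << (r * m)) - 1
    hits = []
    pos = 0
    while pos <= n - m:
        j, last, D = m, m, full
        while D:
            D &= B.get(t[pos + j - 1], 0)   # read the window backward
            j -= 1
            if D & high:
                if j > 0:
                    last = j                # some pattern has a prefix here
                else:
                    hits.extend((pos, k) for k in range(r)
                                if (D >> (k * m + m - 1)) & 1)
                    break
            D = (D << 1) & clear            # shift by 1, clearing cross-pattern carries
        pos += last
    return hits


print(multi_bndm_search(["aab", "bba"], "abbabaabbaab"))
# [(1, 1), (5, 0), (7, 1), (9, 0)]
print(multi_bndm_search([["a", "ab", "b"]], "aabxabb"))  # class pattern a[ab]b
# [(0, 0), (4, 0)]
```

Only the B table changes to support classes; the search loop is the same as for plain patterns.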
Performance Comparison (times in 1/100 of a second per megabyte)
References • Gonzalo Navarro and Mathieu Raffinot. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448, pages 14-33, 1998. • Gonzalo Navarro and Mathieu Raffinot. Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata. 1998.