
Filter based fast matching of long patterns by using SIMD instructions

Filter based fast matching of long patterns by using SIMD instructions. M. Oğuzhan Külekci, TÜBİTAK-UEKAE National Research Institute of Electronics & Cryptology, Turkey, kulekci@uekae.tubitak.gov.tr. Area of the research: off-line (without using an index) exact pattern matching.


Presentation Transcript


  1. Filter based fast matching of long patterns by using SIMD instructions • M. Oğuzhan Külekci, TÜBİTAK-UEKAE National Research Institute of Electronics & Cryptology, Turkey • kulekci@uekae.tubitak.gov.tr • PSC'09, Prague, Czech Republic

  2. Area of the research • Off-line (without using an index) exact pattern matching • for patterns longer than 32 bytes • Methodology: filter-then-search • Technology: SIMD parallelisation

  3. Filter-then-search • Two-phase search: • Filtration: detect potential match areas of the text via an easy-to-compute filter function. • Verification: verify the existence of the pattern only at those positions that pass the filtration.

  4. Filter-then-search • To be advantageous, computing the filter value of a text portion must be cheaper than performing a full scan of the pattern over the same area.

  5. Filter-then-search • The efficiency of a filter depends on two criteria: • Distinguishing power: the rate at which verification is called. • Computation cost: the time and space complexity of the filter calculation. • Usually there is a trade-off between the two.

  6. Previous work on filtering algorithms • Filtering algorithms are especially useful in approximate string matching, e.g., factor filters, suffix filters, gapped q-grams, counting filters, ... (not included in this study) • Lecroq's q-hash algorithm for exact matching (2007) • Fredriksson & Grabowski's AOSO and FAOSO (2005) • Bit-parallel algorithms may be used as filters

  7. This work offers... • This work proposes exploiting single instruction multiple data (SIMD) parallelisation in pattern matching. • To this end, it presents a filter that is easy to compute with SIMD intrinsics.

  8. SIMD Technology • Relatively new, with less than a decade of history. • The main target was multimedia applications (audio/video/image processing), as algorithms in those areas are highly data-parallel. • Not yet much addressed in the string algorithms area.

  9. SIMD Technology • 128-bit special registers (4 floats/integers, or 2 doubles, or 16 8-bit characters) • A special instruction set dedicated to some operations • [Figure: a single instruction applies an operation Θ to all four lanes in parallel, z_i = x_i Θ y_i for i = 1, ..., 4.]
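To make the lane picture concrete, here is a minimal C sketch using SSE2 intrinsics, with the slide's generic operation Θ instantiated as 32-bit integer addition (an illustrative choice; the helper name `add4_epi32` is hypothetical). One `_mm_add_epi32` call produces all four results at once.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: 128-bit integer vectors */

/* Hypothetical helper: z[i] = x[i] + y[i] for four 32-bit lanes,
   computed by a single SIMD addition instead of four scalar ones. */
static void add4_epi32(const int x[4], const int y[4], int z[4]) {
    __m128i vx = _mm_loadu_si128((const __m128i *)x);  /* unaligned load */
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    _mm_storeu_si128((__m128i *)z, _mm_add_epi32(vx, vy));
}
```

For example, x = {1, 2, 3, 4} and y = {10, 20, 30, 40} give z = {11, 22, 33, 44}.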

  10. Intel SIMD Technology • SSE (Streaming SIMD Extensions) • MMX • SSE2 • SSE3, SSE3e • SSE4.1 & SSE4.2 (special instructions dedicated to string matching) • AVX (Advanced Vector Extensions, 2011?)

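  11. Basics... • Text: $t_0 \ldots t_{n-1}$ is divided into $N = \lceil n/16 \rceil$ 16-byte blocks $D_0 \ldots D_{N-1}$, where $D_i = t_{16i} \ldots t_{16i+15}$; the last block $D_{N-1}$ starts at $t_{16(N-1)}$ and ends at $t_{n-1}$. • Pattern: $p_0 \ldots p_{m-1}$ is divided into $M = \lceil m/16 \rceil$ blocks $Q_0 \ldots Q_{M-1}$, where $Q_0 = p_0 \ldots p_{15}$ and $Q_{M-1}$ starts at $p_{16(M-1)}$ and ends at $p_{m-1}$.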

  12. Basics... • Let L denote the zero-based index of the last complete 16-byte block of the pattern: $L = \lfloor m/16 \rfloor - 1$. • Example: for m = 58, Q = Q0Q1Q2Q3, where Q0, Q1 and Q2 are full 16-byte blocks and Q3 holds the remaining 10 bytes followed by 6 null padding bytes; $L = \lfloor 58/16 \rfloor - 1 = 2$.
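The block arithmetic of slides 11 and 12 fits in a few lines of C. This is a sketch with hypothetical helper names; the ceiling used for N and M is an assumption consistent with the null-padded last block Q3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of 16-byte blocks covering len bytes
   (the last block may be partial), i.e. ceil(len / 16). */
static size_t blocks16(size_t len) {
    return (len + 15) / 16;
}

/* Hypothetical helper: L, the zero-based index of the last complete
   16-byte block of the pattern, floor(m / 16) - 1. */
static size_t last_full_block(size_t m) {
    assert(m >= 32);  /* SSEF targets patterns of at least 32 bytes */
    return m / 16 - 1;
}
```

With m = 58, `blocks16(58)` is 4 (Q0 ... Q3, with 64 - 58 = 6 padding bytes in Q3) and `last_full_block(58)` is 2.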

  13. Main idea • Compute $F = \mathrm{filter}(D_{zL+L})$ for $0 \le z < N/L$. • F indicates whether P may begin at any byte of the preceding 16-byte blocks $D_{zL}$ to $D_{zL+(L-1)}$. • If so, call verification. • Move right by L blocks.
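Filtering only every L-th block is safe because an occurrence starting anywhere in $D_{zL} \ldots D_{zL+L-1}$ is at least $16(L+1)$ bytes long, so it always spans $D_{zL+L}$ completely. The C sketch below (hypothetical helper names) states that argument as a checkable predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: index of the block whose filter is evaluated for an
   occurrence starting at text position s: z = floor(s / 16L), block zL + L. */
static size_t filtered_block(size_t s, size_t L) {
    assert(L >= 1);
    return (s / (16 * L)) * L + L;
}

/* True if the occurrence [s, s + m - 1] fully contains its filtered block. */
static bool covers_filtered_block(size_t s, size_t m) {
    assert(m >= 32);
    size_t L = m / 16 - 1;
    size_t b = filtered_block(s, L);
    return s <= 16 * b && 16 * b + 15 <= s + m - 1;
}
```

For m = 58 (L = 2), starts 0 to 31 map to D2 and starts 32 to 63 map to D4, matching the shift of L blocks.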

  14. Main idea • Following the same example (m = 58, L = 2): [Figure: text blocks D0 ... D5, with copies of P = Q0Q1Q2Q3 starting at different bytes of D0 and D1, i.e. bytes 0 to 31; every such occurrence covers D2 entirely.]

  15. Main idea • Following the same example (m = 58, L = 2): [Figure: the same, with P starting in D2 and D3, i.e. bytes 32 to 63; the filter is now computed on D4.] • Note that L is actually the shift amount!

  16. Filter Calculation • Given a 16-byte block (Byte 0 ... Byte 15): 1. Shift each byte left by K bits. 2. Concatenate the individual sign bits into the 16-bit filter value F. • [Figure: the bits of each byte before and after the shift; the sign bit of each shifted byte is the bit that originally sat K positions below the most significant bit, and F collects these bits from Byte 0 to Byte 15.]
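A plain scalar version of the two steps is a useful reference. This sketch (hypothetical name `filter_scalar`) assumes 0 ≤ K ≤ 7 and, like SSE2's movemask, stores the bit taken from byte i in bit i of F.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference of the filter: shift each byte left by K
   and collect the resulting sign bits; byte i contributes bit i of F. */
static uint16_t filter_scalar(const unsigned char block[16], int K) {
    assert(K >= 0 && K <= 7);
    uint16_t F = 0;
    for (int i = 0; i < 16; i++) {
        unsigned char shifted = (unsigned char)(block[i] << K);  /* step 1 */
        F |= (uint16_t)(((shifted >> 7) & 1u) << i);              /* step 2 */
    }
    return F;
}
```

For the block "acgtacgtacgtacgt", K = 5 keeps bit 2 of each byte (0 for a and c, 1 for g and t), so F = 0xCCCC.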

  17. Shift left by K bits? • Why do we shift each byte left by K bits? • To compose the filter from the most informative bit of each byte ⇒ a more distinguishing filter • How to determine the actual K? • According to the alphabet • According to the text (more powerful, but not practical)

  18. Shift left by K bits? • e.g., ASCII-coded DNA sequences: a = 0110 0001, t = 0111 0100, c = 0110 0011, g = 0110 0111
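In these codes, bits 7, 6, 5 and 3 are identical for all four bases, while bits 2 and 1 split them two against two. One simple way to pick K "according to the alphabet" is sketched below; it is a hypothetical heuristic (not necessarily the author's rule) that assumes equiprobable symbols and chooses the K whose surviving bit, bit 7 - K, divides the alphabet most evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: assuming equiprobable symbols, return the K
   (0..7) for which bit (7 - K) splits the alphabet most evenly.
   Ties keep the smallest K. */
static int choose_k(const unsigned char *alphabet, size_t sigma) {
    int bestK = 0;
    size_t bestBalance = 0;
    for (int K = 0; K <= 7; K++) {
        size_t ones = 0;
        for (size_t i = 0; i < sigma; i++)
            ones += (alphabet[i] >> (7 - K)) & 1u;  /* bit that survives */
        size_t balance = ones < sigma - ones ? ones : sigma - ones;
        if (balance > bestBalance) {
            bestBalance = balance;
            bestK = K;
        }
    }
    return bestK;
}
```

For "acgt" it returns K = 5 (bit 2); for a binary alphabet "01" only bit 0 differs, so it returns K = 7.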

  19. Filter computation via SSE2 • Intrinsics from the SSE2 instruction set: • tmp128 = _mm_slli_epi64(inp128, K); // performs the shift • F = _mm_movemask_epi8(tmp128); // performs the sign-bit concatenation
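SSE2 has no per-byte shift, so `_mm_slli_epi64` shifts the two 64-bit lanes instead. For 0 ≤ K ≤ 7 this leaves every sign bit unchanged compared with a per-byte shift: the new bit 7 of each byte comes from bit 7 - K of the same byte, and bits carried in from the neighbouring byte land only in the low K positions. The C sketch below wraps the slide's two intrinsics (function names are hypothetical) and adds a byte-wise reference for comparison.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical wrapper around the two intrinsics of the slide. */
static int filter_sse2(const unsigned char *block, int K) {
    assert(K >= 0 && K <= 7);
    __m128i inp128 = _mm_loadu_si128((const __m128i *)block);
    __m128i tmp128 = _mm_slli_epi64(inp128, K);  /* shift both 64-bit lanes */
    return _mm_movemask_epi8(tmp128);            /* sign bit of byte i -> bit i */
}

/* Per-byte reference: bit (7 - K) of byte i becomes bit i of F. */
static int filter_bytewise(const unsigned char *block, int K) {
    int F = 0;
    for (int i = 0; i < 16; i++)
        F |= (((block[i] << K) >> 7) & 1) << i;
    return F;
}
```

Both functions agree on every block and every K in 0 ... 7, e.g. both return 0xCCCC for "acgtacgtacgtacgt" with K = 5.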

  20. Preprocessing • The pattern can align with the text block $D_{i+L}$ ($i \equiv 0 \pmod L$), on which the filter is computed, in $16L$ different ways. • [Figure: if P starts at byte 0 of $D_i$, then $D_{i+L}$ holds $p_{16L} \ldots p_{16L+15}$; if P starts at byte 1, $D_{i+L}$ holds $p_{16L-1} \ldots p_{16L+14}$; ...; if P starts at the last byte of $D_{i+L-1}$, $D_{i+L}$ holds $p_1 \ldots p_{16}$.]

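  21. Preprocessing: FList • For m = 58, L = 2, alignment i gives r = 16L - i: i = 0 → r = 32 → f([p32 .. p47]); i = 1 → r = 31 → f([p31 .. p46]); ...; i = 31 → r = 1 → f([p1 .. p16]). • FList is a table indexed by the filter value (0 ... 65535); each entry is either null or a list of the alignments i whose pattern filter f([p_r .. p_{r+15}]) equals that value. • [Figure: FList entries 0 ... 65535, most of them null.]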
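A possible C layout for FList, sketched under assumptions: the list heads live in a 65536-entry array and the 16L alignment nodes are chained through a `next` array, with -1 standing for null. The type and function names are hypothetical. Since 16L + 16 ≤ m, every window p[r .. r+15] with r = 16L - i lies inside the pattern, so no padding is needed here.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdlib.h>

typedef struct {
    int *head;  /* 65536 entries: first alignment i with that filter, -1 = null */
    int *next;  /* 16L entries: next alignment with the same filter, -1 = end */
    size_t L;
} flist_t;

/* 16-bit filter of 16 bytes (slide 19). */
static int filter16(const unsigned char *b, int K) {
    return _mm_movemask_epi8(_mm_slli_epi64(_mm_loadu_si128((const __m128i *)b), K));
}

/* Builds FList for pattern p of length m; returns 0 on success, -1 on OOM. */
static int flist_build(flist_t *fl, const unsigned char *p, size_t m, int K) {
    assert(m >= 32 && K >= 0 && K <= 7);
    fl->L = m / 16 - 1;
    size_t W = 16 * fl->L;  /* number of alignments */
    fl->head = malloc(65536 * sizeof *fl->head);
    fl->next = malloc(W * sizeof *fl->next);
    if (fl->head == NULL || fl->next == NULL) {
        free(fl->head);
        free(fl->next);
        return -1;
    }
    for (size_t f = 0; f < 65536; f++) fl->head[f] = -1;
    for (size_t i = W; i-- > 0;) {         /* reverse order keeps lists ascending */
        int f = filter16(p + (W - i), K);  /* r = 16L - i */
        fl->next[i] = fl->head[f];
        fl->head[f] = (int)i;
    }
    return 0;
}

static void flist_free(flist_t *fl) {
    free(fl->head);
    free(fl->next);
}
```

For m = 58 this stores exactly 32 alignment nodes, each reachable from the FList entry of its own filter value.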

  22. SSEF Algorithm

  23. SSEF Algorithm • Following the same example (m = 58, L = 2), let's investigate the situation for i = 16. • D16 = [t16.16 .. t16.16+15] = [t256 .. t271]. • Assume f(D16) = 217, and that FList[217] is not null and contains 1. • Remember that 1 means [p31 .. p46] may align with [t256 .. t271]. • Thus, P may occur at [t225 .. t282]. • Note that $[t_{(i-L)\cdot 16+j} \ldots t_{(i-L)\cdot 16+j+m-1}] = [t_{225} \ldots t_{282}]$ for i = 16, L = 2, m = 58, j = 1. • Call verification to check whether [p0 .. p57] = [t225 .. t282].
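Putting the pieces together, the following C sketch follows slides 13 to 23: build FList, filter $D_L, D_{2L}, \ldots$, and verify each candidate start $(i-L)\cdot 16 + j$ with `memcmp`. It is an illustration under assumptions (SSE2 available, m ≥ 32, K chosen by the caller, all names hypothetical), not the author's code. Only blocks lying completely inside the text are filtered, which suffices because every occurrence fully covers its filtered block.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdlib.h>
#include <string.h>

/* 16-bit filter of 16 bytes: bit (7 - K) of every byte. */
static int ssef_filter(const unsigned char *b, int K) {
    return _mm_movemask_epi8(_mm_slli_epi64(_mm_loadu_si128((const __m128i *)b), K));
}

/* Hypothetical SSEF sketch: stores up to cap match positions (ascending) in
   out and returns the total number of matches; returns 0 if m < 32 or on
   allocation failure. */
static size_t ssef_search(const unsigned char *t, size_t n,
                          const unsigned char *p, size_t m, int K,
                          size_t *out, size_t cap) {
    assert(K >= 0 && K <= 7);
    if (m < 32) return 0;          /* SSEF targets patterns of >= 32 bytes */
    const size_t L = m / 16 - 1;   /* shift amount, in blocks */
    const size_t W = 16 * L;       /* number of alignments */
    int *head = malloc(65536 * sizeof *head);  /* FList heads, -1 = null */
    int *next = malloc(W * sizeof *next);
    size_t count = 0;
    if (head == NULL || next == NULL) {
        free(head);
        free(next);
        return 0;
    }

    /* Preprocessing: alignment j places p[W-j .. W-j+15] on the filtered block. */
    for (size_t f = 0; f < 65536; f++) head[f] = -1;
    for (size_t j = W; j-- > 0;) {
        int f = ssef_filter(p + (W - j), K);
        next[j] = head[f];
        head[f] = (int)j;
    }

    /* Search: filter D_L, D_2L, ... and verify each alignment in FList[f]. */
    for (size_t b = L; 16 * b + 16 <= n; b += L) {
        int f = ssef_filter(t + 16 * b, K);
        for (int j = head[f]; j != -1; j = next[j]) {
            size_t s = 16 * (b - L) + (size_t)j;  /* (i - L)*16 + j */
            if (s + m <= n && memcmp(t + s, p, m) == 0) {
                if (count < cap) out[count] = s;
                count++;
            }
        }
    }
    free(head);
    free(next);
    return count;
}
```

Because each alignment list is built in ascending order and consecutive windows do not overlap, positions are reported in ascending order; on any input, the result should coincide with a naive scan.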

  24. Complexity • Preprocessing • Space: • a 64K-entry FList + 16L pattern filter nodes • space consumption is $O(16L) = O(m)$, since $L = \lfloor m/16 \rfloor - 1$ (the FList size is constant) • Time: • exactly 16L filter operations are performed on the pattern • $O(16L) = O(m)$

  25. Complexity • Searching • The filter is computed on every L-th of the N 16-byte blocks. • The total number of filtering operations is $O(N/L)$. • After each filter computation, verification is called: • at most 16L times • at least 0 times • on average about $16L/65536$ (16L/64K) times

  26. Complexity • Best case • no verification call, just the filter calculations • $O(N/L) = O(n/m)$ • Worst case • all 16L possible alignments are verified at each filter • $O(N/L + (N/L)\cdot 16L \cdot m) = O(nm)$ • Average case • at each filtering operation, verification is called on average $16L/65536$ times • $O(N/L + (N/L)\cdot(16L/65536)\cdot m) = O(n/m + nm/65536)$
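As a worked instance of the average-case terms, take m = 58 (L = 2) and assume that the 16-bit filter values of text blocks are uniformly distributed:

```latex
\begin{aligned}
\text{expected verifications per filter} &\approx \frac{16L}{65536} = \frac{32}{65536} = \frac{1}{2048},\\
\text{filter evaluations} &\approx \frac{N}{L} \approx \frac{n/16}{2} = \frac{n}{32},\\
\text{expected verification cost} &\le \frac{n}{32}\cdot\frac{32}{65536}\cdot 58 = \frac{58\,n}{65536} \approx 0.00089\,n \text{ byte comparisons}.
\end{aligned}
```

Under that assumption, the filter evaluations dominate and verification adds well under one comparison per thousand text bytes.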

  27. Experimental Results • SSEF is compared with BLIM, QS, 3-hash, 8-hash, BOM2, and BSOM2 • 64-bit Intel Xeon machine, 3 GB memory, gcc with the -O3 option • Small, medium and large alphabets

  28. Data sets


  35. Grand Average Speed • Very fast on the binary alphabet and on plain DNA sequences. • On natural-language text, the improvement is also significant.

  36. Conclusions • An initial attempt at benefiting from SIMD parallelisation. • Faster than the compared alternatives on all alphabet sizes. • Its speed is not much affected by the alphabet size (similar to the q-hash filter). • A strong new alternative for exact matching of long patterns on biological sequences.

  37. Future work • Can we do better with SSE4? • What about shorter patterns (fewer than 32 bytes)? • Where else in the string algorithms area can SIMD parallelisation be deployed?

  38. Thank you... Questions?
