Jumbled Matching with SIMD • Sukhpal Singh Ghuman and Jorma Tarhio
Outline • Definition • Motivation • Previous algorithms • SIMD • Jumbled matching with SIMD • Experimental results • Concluding remarks
Jumbled matching • A substring u of T is jumbled-equivalent to P if every character occurs in u exactly as many times as in P, which implies |P| = |u|. • The task is to find all substrings of T that are permutations of P. • For example, for P = edcba and T = aabecdcddee, the substring abecd is a match.
Jumbled matching • A jumbled pattern can be described by its Parikh vector. • The Parikh vector is the vector of multiplicities of the characters. • For example, p(S) = (1, 2, 1) for S = abcb over the alphabet {a, b, c}.
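The definition above can be sketched in C as follows. This is a minimal illustration, not code from the talk: the function names and the 256-entry byte alphabet are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIGMA 256  /* assumed alphabet size: one slot per byte value */

/* Fill pv[c] with the number of occurrences of byte c in s[0..len-1]. */
void parikh_vector(const char *s, size_t len, unsigned pv[SIGMA]) {
    assert(s != NULL && pv != NULL);
    memset(pv, 0, SIGMA * sizeof pv[0]);
    for (size_t i = 0; i < len; i++)
        pv[(unsigned char)s[i]]++;
}

/* Return 1 if u and p have equal length and equal Parikh vectors, else 0. */
int jumbled_equal(const char *u, const char *p) {
    unsigned pu[SIGMA], pp[SIGMA];
    size_t lu = strlen(u), lp = strlen(p);
    if (lu != lp)
        return 0;
    parikh_vector(u, lu, pu);
    parikh_vector(p, lp, pp);
    return memcmp(pu, pp, sizeof pu) == 0;
}
```

With these helpers, jumbled_equal("abecd", "edcba") holds, while the first window of the example text, aabec, is rejected because it contains no d.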
Motivation • Alignment of strings • SNP discovery • Discovery of repeated patterns • Interpretation of mass spectrometry data
Previous Algorithms: Count • Key idea: scan the text forward while maintaining counts of characters. • These algorithms work in linear time. • They were originally developed as filtration methods for online approximate string matching.
Previous Algorithms: BAM • Cantone and Faro (Proc. PSC 2014) presented the BAM algorithm (Bit-parallel Abelian Matcher). • BAM associates a counter (bin) with each distinct character in P. • A single 1-bit counter is shared by the remaining characters of the alphabet.
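The counting idea can be sketched as a sliding window. This is a generic illustration under stated assumptions, not the exact Count algorithm cited in the talk: the names bump_diff and count_jumbled, the byte alphabet, and the output convention (0-based start positions) are all choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Add delta (+1 or -1) to diff[c] and keep the number of non-zero entries current. */
void bump_diff(int diff[256], size_t *nonzero, unsigned char c, int delta) {
    if (diff[c] == 0) (*nonzero)++;
    diff[c] += delta;
    if (diff[c] == 0) (*nonzero)--;
}

/* Store up to cap 0-based start positions of jumbled matches of p in t.
   Returns the total number of matches found. */
size_t count_jumbled(const char *t, const char *p, size_t *out, size_t cap) {
    size_t n = strlen(t), m = strlen(p), found = 0, nonzero = 0;
    int diff[256] = {0};  /* diff[c] = (count of c in window) - (count of c in p) */
    assert(out != NULL || cap == 0);
    if (m == 0 || n < m)
        return 0;
    for (size_t i = 0; i < m; i++)
        bump_diff(diff, &nonzero, (unsigned char)p[i], -1);
    for (size_t i = 0; i < n; i++) {
        bump_diff(diff, &nonzero, (unsigned char)t[i], +1);   /* character entering */
        if (i >= m)
            bump_diff(diff, &nonzero, (unsigned char)t[i - m], -1);  /* character leaving */
        if (i + 1 >= m && nonzero == 0) {   /* window t[i-m+1..i] has the same counts as p */
            if (found < cap)
                out[found] = i + 1 - m;
            found++;
        }
    }
    return found;
}
```

Each text character is added once and removed once, so the scan is linear in the text length. Tracking the number of non-zero differences avoids comparing whole count arrays at every position.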
Bit-parallel simulation • P = abbccc • Counter fields: c, b, a, other characters
Previous Algorithms • Chhabra et al. (Proc. PSC 2015) presented: • BAM2, a variation of BAM that handles a 2-gram at a time. • EBL (Exact Backward for Large alphabets), based on the SBNDM2 algorithm.
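The packed-counter idea can be illustrated with a simplified sketch. It is not the BAM algorithm itself: it only shows one way to keep a counter per distinct pattern character, plus a shared 1-bit counter, inside a single 64-bit word, and it re-reads every window from scratch. The struct, the function names, and the field layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t init;      /* initial packed state of all counter fields */
    uint64_t inc[256];  /* value added to the state when a given byte is read */
    uint64_t overflow;  /* mask holding the top (overflow) bit of every field */
    size_t m;           /* pattern length */
} packed_counters;

/* Lay out the fields for pattern p. Returns 0 if they do not fit in 64 bits. */
int pc_build(packed_counters *pc, const char *p) {
    unsigned cnt[256] = {0};
    unsigned pos = 1;   /* bit 0 is the shared 1-bit field for non-pattern bytes */
    assert(pc != NULL && p != NULL);
    pc->m = strlen(p);
    for (size_t i = 0; i < pc->m; i++)
        cnt[(unsigned char)p[i]]++;
    pc->init = 0;
    pc->overflow = 1;   /* the 1-bit field is its own overflow bit */
    for (int c = 0; c < 256; c++) {
        if (cnt[c] == 0) { pc->inc[c] = 1; continue; }
        unsigned w = 1; /* field width: smallest w with 2^(w-1) > cnt[c] */
        while (((uint64_t)1 << (w - 1)) <= cnt[c]) w++;
        if (pos + w > 64)
            return 0;
        uint64_t top = (uint64_t)1 << (pos + w - 1);
        pc->inc[c] = (uint64_t)1 << pos;
        /* Start at 2^(w-1) - 1 - cnt[c], so occurrence cnt[c] + 1 sets the top bit. */
        pc->init |= top - pc->inc[c] - ((uint64_t)cnt[c] << pos);
        pc->overflow |= top;
        pos += w;
    }
    return 1;
}

/* Return 1 if the m bytes starting at w are a permutation of the pattern. */
int pc_window_matches(const packed_counters *pc, const char *w) {
    uint64_t d = pc->init;
    for (size_t i = 0; i < pc->m; i++) {
        d += pc->inc[(unsigned char)w[i]];
        if (d & pc->overflow)
            return 0;   /* a character exceeded its quota or is not in the pattern */
    }
    return 1;           /* m characters read and no quota exceeded: all counts agree */
}
```

Stopping at the first overflow bit keeps a carry from spilling into the neighbouring field. If all m characters are read without overflow, no count exceeds its quota, and since the quotas sum to m every count must equal its quota.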
SIMD • The SIMD (single instruction, multiple data) architecture applies a single instruction to multiple data elements at once. • There are sixteen 128-bit registers, known as XMM0 to XMM15. • We use specialized string matching SIMD instructions in addition to standard SIMD instructions.
Jumbled Matching + SIMD • The SIMD string instructions provide several aggregation operations: • Equal each • Equal any • Ranges
Equal Any Approach • Equal any: • Handles 16 bytes at a time. • Takes two operands as input: the set of characters in the pattern and a text window. • Example: • Operand1: aeiou • Operand2: You drive me mad • Output: 0110001010010010
Equal Any Approach • The equal any SIMD command returns a bitvector of 16 bits marking the positions in the test window that hold any character of the pattern. • A match candidate is found if m consecutive bits of the vector are ones, where m = |P|.
SIMD Instructions • simd-equal-any(x, y) • _mm_extract_epi16(_mm_cmpistrm(x, y, _SIDD_CMP_EQUAL_ANY), 0) • simd-cmpeq(x, y) • _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) • In both cases x and y are of type __m128i.
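The simd-equal-any operation and the candidate test can be sketched as below. This is an illustration under assumptions, not the authors' code: the function names are invented, the intrinsic path is compiled only when SSE4.2 is enabled (otherwise a scalar loop emulates it), neither operand may contain a zero byte because _mm_cmpistrm treats a zero byte as the end of the string, and the candidate test assumes that the m one-bits must be consecutive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE4_2__
#include <nmmintrin.h>
#endif

/* Bit i of the result is set if window[i] occurs in set[0..set_len-1].
   Assumes set_len <= 16 and no zero bytes in either operand. */
unsigned equal_any16(const char *set, size_t set_len, const char window[16]) {
    assert(set != NULL && set_len <= 16);
#ifdef __SSE4_2__
    char buf[16] = {0};             /* zero padding terminates a short set */
    memcpy(buf, set, set_len);
    __m128i x = _mm_loadu_si128((const __m128i *)buf);
    __m128i y = _mm_loadu_si128((const __m128i *)window);
    return (unsigned)_mm_extract_epi16(_mm_cmpistrm(x, y, _SIDD_CMP_EQUAL_ANY), 0);
#else
    unsigned mask = 0;              /* scalar emulation of the same bitvector */
    for (int i = 0; i < 16; i++)
        if (memchr(set, window[i], set_len) != NULL)
            mask |= 1u << i;
    return mask;
#endif
}

/* Return 1 if the 16-bit mask has a run of at least m consecutive one-bits. */
int has_run(unsigned mask, unsigned m) {
    assert(m >= 1 && m <= 16);
    for (unsigned k = 1; k < m; k++)
        mask &= mask >> 1;          /* bit i survives only if bit i+1 was also set */
    return mask != 0;
}
```

Here bit 0 of the result corresponds to the first byte of the window, so the slide's output 0110001010010010, written with position 0 on the left, is the integer 0x4946. A run of m ones only marks a candidate; the window still has to be verified by a checking routine.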
Least Frequent Character Approach • Based on the character of the pattern that is least frequent in the text. • The frequency of a character is taken from the text or from the language. • We use SIMD instructions to analyze whether a test window of 16 bytes holds the least frequent character of the pattern.
Least Frequent Character Approach • R is an array of 16 bytes, each holding the least frequent character of the pattern. • The SIMD register x holds R and the SIMD register y holds a test window of 16 bytes of the text. • The registers x and y are compared by the simd-cmpeq operation. • Our previous algorithms are used as the checking (or local search) routine.
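A minimal sketch of this filter follows. It is one possible arrangement, not the authors' implementation: the function names are invented, frequencies are taken from the text itself, a plain counting check stands in for the checking routine, and the intrinsic path is used only when SSE2 is available (otherwise a scalar loop emulates it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* simd-cmpeq: bit i of the result is set if window[i] == c. */
unsigned cmpeq_mask16(char c, const char window[16]) {
#ifdef __SSE2__
    __m128i x = _mm_set1_epi8(c);   /* R: sixteen copies of c */
    __m128i y = _mm_loadu_si128((const __m128i *)window);
    return (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(x, y));
#else
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        if (window[i] == c) mask |= 1u << i;
    return mask;
#endif
}

/* Character of p that occurs least often in t (first one wins on ties). */
char least_frequent_char(const char *t, const char *p) {
    size_t freq[256] = {0};
    assert(p[0] != '\0');
    for (const char *s = t; *s; s++) freq[(unsigned char)*s]++;
    char best = p[0];
    for (const char *s = p; *s; s++)
        if (freq[(unsigned char)*s] < freq[(unsigned char)best]) best = *s;
    return best;
}

/* Stand-in checking routine: 1 if w[0..m-1] is a permutation of p[0..m-1]. */
int is_permutation(const char *w, const char *p, size_t m) {
    int d[256] = {0};
    for (size_t i = 0; i < m; i++) { d[(unsigned char)w[i]]++; d[(unsigned char)p[i]]--; }
    for (int c = 0; c < 256; c++) if (d[c] != 0) return 0;
    return 1;
}

/* Store up to cap 0-based match positions; returns the number of matches. */
size_t lfc_search(const char *t, const char *p, size_t *out, size_t cap) {
    size_t n = strlen(t), m = strlen(p), found = 0, next = 0;
    if (m == 0 || n < m) return 0;
    char c = least_frequent_char(t, p);
    for (size_t base = 0; base < n; base += 16) {
        char win[16] = {0};         /* zero padding never equals c */
        memcpy(win, t + base, n - base < 16 ? n - base : 16);
        unsigned mask = cmpeq_mask16(c, win);
        for (unsigned i = 0; i < 16; i++) {
            if (!((mask >> i) & 1u)) continue;
            size_t j = base + i;    /* text position holding c */
            size_t lo = j + 1 >= m ? j + 1 - m : 0;
            if (lo < next) lo = next;   /* earlier starts were already checked */
            for (size_t s = lo; s <= j && s + m <= n; s++)
                if (is_permutation(t + s, p, m)) {
                    if (found < cap) out[found] = s;
                    found++;
                }
            next = j + 1;
        }
    }
    return found;
}
```

Every match must contain the chosen character, so only windows that cover one of its occurrences are passed to the checking routine, and a 16-byte block with an empty mask is skipped without further work.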
Experimental Results • The performance of SIMD instructions depends on the architecture of the processor. • The performance of a single instruction is measured by latency and throughput.
Latency and throughput of SIMD instructions for Nehalem and Haswell
Experimental Results: Execution times of algorithms for English data on Nehalem.
Experimental Results: Execution times of algorithms for Protein data on Nehalem.
Experimental Results: Execution times of algorithms for English data on Haswell.
Concluding remarks • We introduced improved solutions for exact jumbled pattern matching based on the SIMD architecture. • If the latency of the SIMD instructions used improves in future processors, the running times of the algorithms will change accordingly.