260 likes | 284 Views
Explore the innovative Jumbled Matching technique with SIMD technology for efficient substring permutations discovery in text data. Learn about SIMD architecture, experimental results, and application in pattern recognition. Discover how SIMD revolutionizes pattern matching algorithms.
E N D
Jumbled Matching with SIMD Sukhpal Singh Ghuman and Jorma Tarhio
Outline • Definition • Motivation • Previous algorithms • SIMD • Jumbled matching with SIMD • Experimental results • Concluding remarks
Jumbled matching • A substring u of T is a jumbled equivalent to P if the count of each character in P is equal to its count in u and |P| = |u| holds. • To find substrings of T which are permutations of P. • For example: P = edcba in T = aabecdcddee.
Jumbled matching • Jumbled patterns can be described as Parikh vector. • Vector of multiplicities of the characters. • p(S) is (1,2,1) for S = abcb.
Motivation • Alignment of strings • SNP discovery • Discovery of repeated patterns • Interpretation of mass spectrometry data
Previous Algorithms: Count • Key Idea - scan the text forward while maintaining counts of characters. • Work in linear time. • These algorithms were developed as filtration methods for online approximate string matching.
Previous Algorithms: BAM • Cantone and Faro (Proc. PSC 2014) presented the BAM algorithm (Bit-parallel Abelian Matcher). • Associate a counter (bin) to each distinct character in P. • A single 1-bit counter for the remaining characters of the alphabet.
Bit Parallel simulation P = abbccc cbaother characters
Previous Algorithms • Chhabra et al. (Proc. PSC 2015) presented: • BAM2 - A variation of BAM that handles a 2 - gram at a time. • EBL (Exact Backward for Large alphabets) - Based on the SBNDM2 algorithm.
SIMD • The SIMD architecture allows the execution of multiple data on single instruction. • Sixteen 128-bit registers known as XMM0 to XMM15. • Weusespecializedstringmatching SIMD instructions in addition to standard SIMD instructions
Jumbled Matching + SIMD • SIMD comprise of several aggregation operations: • Equal each • Equal any • Ranges
Equal Any Approach • Equal any: • Handles 16 bytes at one time. • Two operands as input. • Set of characters in the pattern. • Text window. • Example: • Operand1: aeiou • Operand2: You drive me mad • Output: 0110001010010010
Equal Any Approach • The equal any SIMD command returns a bitvector of 16 bits showing the positions in the test window which hold any character of the pattern. • A match candidate is found if the m bits of the vector are ones.
SIMD Instructions • simd-equal-any(x, y) • _mm_extract_epi16( _mm_cmpistrm(x, y, SIDD_CMP_EQUAL_ANY), 0). • simd-cmpeq(x, y) • mm_movemask_epi8( _m128i_mm_cmpeq_epi8(_m128i x, _m128i y)).
Least Frequent Character Approach • Based on the least frequent character of the text in the pattern. • Frequency of the character is based on the text or on the language. • We use SIMD instructions to analyze whether a test window of 16 bytes holds the least frequent character of the pattern.
Least Frequent Character Approach • R is an array containing 16 bytes, each of which holding the least frequent character of the pattern. • The SIMD register x holds R and the SIMD register y holds a test window of 16 bytes of the text. • The registers x and y are compared by the simd-cmpeq operation. • Our previous algorithms used as checking ( or local search) routine.
Experimental Results • The performance of SIMD instructions depends on the architecture of the processor. • The performance of a single instruction is measured by latency and throughput.
Latency and throughput of SIMD instructions for Nehalem and Haswell
Experimental Results: Execution times of algorithms for English data on Nehalem.
Experimental Results: Execution times of algorithms for Protein data on Nehalem.
Experimental Results: Execution times of algorithms for English data on Haswell.
Concluding remarks • We introduced improved solutions for exact jumbled pattern matching based on the SIMD architecture. • If the latency of the used SIMD instructions would improve in future processors, the running times of the algorithms will respectively change.