A fast, lock-free approach for efficient parallel counting of occurrences of k-mers Presented By: Dinesh Agarwal
Overview • Introduction • Algorithm • Updating Hash Table • Reducing memory usage • Space efficient key encoding • Fast merging of hash tables • Results
Introduction • Problem definition: given a string S, count the number of occurrences of every substring of length k • Substrings of length k are called k-mers • Counting their occurrences is called k-mer counting
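As a baseline for the problem definition above, a minimal single-threaded sketch in Python. The function name `count_kmers` and the example string are illustrative assumptions, not part of the presented method, which targets parallel counting with a custom hash table.

```python
from collections import Counter


def count_kmers(s: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of s, overlapping ones included."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A string of length n has n - k + 1 windows of length k (none if n < k).
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


counts = count_kmers("ACGTACGT", 3)
```

This naive version stores each k-mer as a full string in a general-purpose dictionary; the remaining slides are about avoiding exactly that memory and synchronization cost.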
Algorithm: k-mer hash table • M = length of the hash table, M = 2^L for some L • i-th possible location for k-mer m: pos(m, i) = (hash(m) + reprobe(i)) % M • The key is an integer in the set U_k = [0, 4^k - 1] • Function hash is a bijection • Function reprobe(i) = i (i + 1) / 2
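The addressing scheme on this slide can be sketched as follows. The 2-bit base encoding (A=0, C=1, G=2, T=3) and the multiplicative `toy_hash` are assumptions chosen only because they give an integer key in [0, 4^k - 1] and a bijection on that range; the slides do not say which bijection is actually used.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding per base
ODD_MULTIPLIER = 0x9E3779B1              # any odd constant works for this toy hash


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer in [0, 4**k - 1], two bits per base."""
    key = 0
    for base in kmer:
        key = (key << 2) | CODE[base]
    return key


def toy_hash(key: int, k: int) -> int:
    """Hypothetical bijection on [0, 4**k - 1]: multiply by an odd number mod 4**k."""
    return (key * ODD_MULTIPLIER) % (1 << (2 * k))


def reprobe(i: int) -> int:
    """Triangular-number reprobe offset i(i+1)/2."""
    return i * (i + 1) // 2


def pos(hashed: int, i: int, L: int) -> int:
    """i-th candidate slot in a table of M = 2**L entries."""
    M = 1 << L
    return (hashed + reprobe(i)) % M


# With M a power of two, the first M probes visit every slot exactly once.
L = 4
h = toy_hash(encode("ACGTA"), 5)
probe_sequence = [pos(h, i, L) for i in range(1 << L)]
```

One reason to pair triangular offsets with a power-of-two table size is that the probe sequence is a permutation of all M slots, so an insertion only fails when the table is genuinely full.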
Updating the hash table • CAS (compare-and-swap) instruction
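The slide only names the CAS instruction, so the following is a sketch of the usual retry-loop pattern rather than the presented implementation. Python has no hardware CAS, so the hypothetical `AtomicCell` class emulates one; its internal lock merely stands in for the atomicity that the CPU instruction provides, and the counting logic itself never takes a lock.

```python
import threading


class AtomicCell:
    """Emulated machine word with compare-and-swap (assumption for illustration)."""

    def __init__(self, value: int = 0):
        self._value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity only

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_increment(cell: AtomicCell, amount: int = 1) -> int:
    """Lock-free style update: read, compute, CAS, and retry if another thread won."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + amount):
            return old + amount


def worker(cell: AtomicCell, n: int) -> None:
    for _ in range(n):
        atomic_increment(cell)


cell = AtomicCell()
threads = [threading.Thread(target=worker, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A failed CAS means another thread changed the value in between, so the loop simply re-reads and tries again; no thread ever blocks while holding the slot. The same pattern could presumably be used to claim an empty slot for a key before incrementing its value.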
Reduced memory usage • The value field is smaller than what would be needed to hold the largest frequency value • The majority of k-mers appear once • Most of the remaining ones appear c times • A small number appear a large number of times • Use two entries in the hash table for a key with a large count • The value is the concatenation of the values stored in both entries
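The two-entry idea can be illustrated with a small sketch. The class name `SplitCounter`, the 3-bit field width, and the use of two Python dictionaries in place of real hash table slots are all assumptions made for brevity; only the principle (low bits in the first entry, high bits in a second entry created on overflow) comes from the slide.

```python
VALUE_BITS = 3                       # assumed, deliberately tiny value field
VALUE_MASK = (1 << VALUE_BITS) - 1


class SplitCounter:
    """Counts with a narrow value field, spilling high bits into a second entry."""

    def __init__(self):
        self.primary = {}    # key -> low VALUE_BITS bits of the count
        self.overflow = {}   # key -> remaining high bits, created only when needed

    def add(self, key: str, amount: int = 1) -> None:
        total = self.primary.get(key, 0) + amount
        self.primary[key] = total & VALUE_MASK
        carry = total >> VALUE_BITS
        if carry:
            self.overflow[key] = self.overflow.get(key, 0) + carry

    def count(self, key: str) -> int:
        # The full value is the concatenation: high bits followed by low bits.
        return (self.overflow.get(key, 0) << VALUE_BITS) | self.primary.get(key, 0)


table = SplitCounter()
for _ in range(20):
    table.add("ACG")     # frequent k-mer: needs the second entry
table.add("TTT")         # rare k-mer: fits in the narrow field
```

Since most k-mers occur only a few times, nearly all keys pay for a single narrow entry, and only the few high-frequency keys occupy a second one.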
Space efficient key encoding • The position of a key gives the lower L bits of f(m) • The key field only stores the higher 2k - L bits of f(m) • Conversely, the key field content and the position together determine f(m) • The k-mer m can be recovered by applying f^-1 to f(m)
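A sketch of this encoding, under loud assumptions: f is taken to be multiplication by an odd constant modulo 2^(2k), which is invertible but is not necessarily the bijection used in the presented work, and reprobing is ignored, so the slot index equals the lower L bits of f(m). The values of `K` and `L` are arbitrary.

```python
K = 5                        # k-mer length (assumed)
L = 4                        # table has M = 2**L slots (assumed)
KEY_BITS = 2 * K             # a k-mer over {A, C, G, T} needs 2k bits
MOD = 1 << KEY_BITS
MULT = 0x9E3779B1            # odd, hence invertible modulo a power of two
MULT_INV = pow(MULT, -1, MOD)  # modular inverse (Python 3.8+)


def f(key: int) -> int:
    """Hypothetical bijective hash on [0, 4**K - 1]."""
    return (key * MULT) % MOD


def f_inv(hashed: int) -> int:
    """Inverse of f."""
    return (hashed * MULT_INV) % MOD


def split(key: int) -> tuple[int, int]:
    """Return (slot position, bits actually stored in the key field)."""
    hashed = f(key)
    position = hashed & ((1 << L) - 1)   # lower L bits are implied by the slot
    stored = hashed >> L                 # only the upper 2k - L bits are kept
    return position, stored


def recover(position: int, stored: int) -> int:
    """Rebuild f(m) from slot and stored bits, then invert f to get the key."""
    return f_inv((stored << L) | position)


position, stored = split(0b0110110001)
original = recover(position, stored)
```

With reprobing, the slot is no longer the home position, so a real table would also need enough information to recover the home position from the slot; that detail is left out here.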
Fast merging of intermediate hash tables • Hash tables are too big to keep in memory • Sorting in linear time* • Let pos(m) be the final position of a k-mer • If pos(m1, 0) + reprobe(max) < pos(m2, 0) then pos(m1) < pos(m2) • Resolving the ordering within a window of size reprobe(max) is sufficient to sort the output.
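The window argument can be sketched as below. This is an illustration under assumptions, not the presented implementation: each occupied slot is modelled as a tuple `(home, kmer, count)` with `home = pos(m, 0)`, wrap-around at the end of the table is ignored, and all intermediate tables are assumed to share the same size and hash so that a k-mer has the same home position in each. The function names are hypothetical.

```python
import heapq


def sort_by_home(table, max_offset):
    """Emit entries ordered by home position using a bounded look-ahead window.

    An entry in slot p has home >= p - max_offset, so once the scan has passed
    slot p, every buffered entry with home <= p - max_offset is safe to emit.
    """
    heap, out = [], []
    for p, slot in enumerate(table):
        if slot is not None:
            heapq.heappush(heap, slot)
        while heap and heap[0][0] <= p - max_offset:
            out.append(heapq.heappop(heap))
    while heap:                       # flush what is left in the final window
        out.append(heapq.heappop(heap))
    return out


def merge_counts(*sorted_runs):
    """Merge runs already sorted by (home, kmer), summing counts of equal k-mers."""
    merged = []
    for home, kmer, count in heapq.merge(*sorted_runs):
        if merged and merged[-1][:2] == (home, kmer):
            merged[-1] = (home, kmer, merged[-1][2] + count)
        else:
            merged.append((home, kmer, count))
    return merged


MAX_OFFSET = 3  # reprobe(2) = 3, assuming at most two reprobes
table_a = [(0, "AAA", 2), (0, "CCC", 1), (2, "GGG", 1), (0, "TTT", 4),
           None, (5, "ACG", 1), None, None]
run_a = sort_by_home(table_a, MAX_OFFSET)
run_b = [(0, "AAA", 1), (5, "ACG", 2)]   # a second table, already in home order
merged = merge_counts(run_a, run_b)
```

Because the buffer never holds more entries than fit in one window, the cost per entry depends on the window size rather than on the table size, which is what makes a single sequential pass over each on-disk table sufficient.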