
A fast, lock-free approach for efficient parallel counting of occurrences of k - mers


Presentation Transcript


  1. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers Presented By: Dinesh Agarwal

  2. Overview • Introduction • Algorithm • Updating Hash Table • Reducing memory usage • Space efficient key encoding • Fast merging of hash tables • Results

  3. Introduction • Problem definition: given a string S, count the number of occurrences of every substring of length k • Substrings of length k are called k-mers • Counting their occurrences is called k-mer counting
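
As a point of reference for the problem definition above, here is a minimal single-threaded sketch of k-mer counting. It is only a naive baseline, not the lock-free method the talk describes; the function name `count_kmers` and the sample string are illustrative assumptions.

```python
from collections import Counter

def count_kmers(s: str, k: int) -> Counter:
    """Count every length-k substring of s (naive baseline, no parallelism)."""
    # A string of length n has n - k + 1 substrings of length k.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

counts = count_kmers("ACGTACGT", 3)
print(counts)
```

The six 3-mers of the sample string are ACG, CGT, GTA, TAC, ACG, CGT, so ACG and CGT are each counted twice.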

  4. Algorithm: k-mer hash table • M = length of the hash table, M = 2^L for some L • i-th possible location for k-mer m: pos(m, i) = (hash(m) + reprobe(i)) % M • The key is an integer in the set U_k = [0, 4^k - 1] • The function hash is a bijection • The function reprobe(i) = i(i + 1) / 2
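
The probing scheme on this slide can be sketched as follows. The 2-bit base encoding (A=0, C=1, G=2, T=3) and the identity function used as a stand-in for hash are assumptions made for illustration; the slide only states that the key lies in [0, 4^k - 1] and that hash is a bijection.

```python
# Assumed 2-bit encoding; any fixed assignment of the four bases would do.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(kmer: str) -> int:
    """Pack a k-mer into an integer in [0, 4^k - 1], two bits per base."""
    key = 0
    for base in kmer:
        key = (key << 2) | CODE[base]
    return key

def reprobe(i: int) -> int:
    """Quadratic reprobe offset i(i + 1) / 2 from the slide."""
    return i * (i + 1) // 2

def pos(key: int, i: int, L: int, hash_fn=lambda x: x) -> int:
    """i-th candidate slot in a table of size M = 2^L.
    hash_fn defaults to the identity, a placeholder for the real bijection."""
    M = 1 << L
    return (hash_fn(key) + reprobe(i)) % M

key = encode("ACG")
probes = [pos(key, i, L=4) for i in range(5)]
print(key, probes)
```

With M a power of two, the triangular offsets visit every slot exactly once over M probes, which is one reason this pairing of table size and reprobe function is convenient.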

  5. Updating the hash table • CAS instruction
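
The retry loop that a CAS-based update relies on can be sketched as below. Python has no hardware compare-and-swap, so `Cell.compare_and_swap` emulates the instruction with a small internal lock; that emulation, and the names `Cell`, `increment`, and `worker`, are assumptions for illustration. A real implementation would use the processor's atomic instruction, and the talk's actual update routine is not shown in this transcript.

```python
import threading

class Cell:
    """One counter slot. CAS is emulated here; hardware does it atomically."""
    def __init__(self, value: int = 0):
        self._value = value
        self._guard = threading.Lock()  # stands in for instruction atomicity

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Write new only if the slot still holds expected; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(cell: Cell) -> None:
    """Read, compute, CAS; retry if another thread changed the slot meanwhile."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return

def worker(cell: Cell, n: int) -> None:
    for _ in range(n):
        increment(cell)

cell = Cell()
threads = [threading.Thread(target=worker, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.load())
```

No thread ever waits while holding the slot: a failed CAS just means another thread made progress, and the loser retries with the fresh value.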

  6. Updating the hash table

  7. Reduced memory usage • The value field is smaller than what would be needed to hold the largest frequency value • The majority of k-mers appear once • Most of the remaining ones appear c times • A small number appear a large number of times • Use two entries in the hash table for a key whose count overflows one value field • The value is the concatenation of the values stored in both entries
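
The arithmetic behind splitting one count across two entries can be sketched as follows. The 3-bit field width and the names `add_one` and `total` are hypothetical, and the sketch shows only the carry and concatenation, not how the second entry is located in the table.

```python
VALUE_BITS = 3                   # hypothetical width of one value field
LOW_MAX = (1 << VALUE_BITS) - 1  # largest count a single field can hold

def add_one(low: int, high: int) -> tuple[int, int]:
    """Increment a count stored as two fields: low (first entry), high (second)."""
    if low < LOW_MAX:
        return low + 1, high
    # The first field is full: wrap it to zero and carry into the second entry.
    return 0, high + 1

def total(low: int, high: int) -> int:
    """Concatenate the two fields, high bits first, to get the full count."""
    return (high << VALUE_BITS) | low

low, high = 0, 0
for _ in range(10):
    low, high = add_one(low, high)
print(low, high, total(low, high))
```

Keys seen only a few times never touch the second entry, so the common case pays for a narrow field only.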

  8. Space efficient key encoding • The position of a key reveals its lower L bits • The key field stores only the higher 2k - L bits of f(m) • Conversely, the content can tell the position of the sequence of k-mers • The k-mer can be recovered by applying f^-1 to the reconstructed f(m)
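
A minimal sketch of this encoding follows, under stated assumptions: f is taken to be multiplication by an odd constant modulo 2^(2k), chosen only because it is an easily invertible bijection, and the key is assumed to land in its first-choice slot (no reprobing). The constants `K`, `L`, `A` and the helper names are illustrative, not taken from the talk.

```python
K = 5                      # k-mer length, so keys are 2k = 10 bits wide
L = 6                      # table has 2^L slots
KEY_BITS = 2 * K
MASK = (1 << KEY_BITS) - 1

A = 0x2F5                            # assumed odd multiplier: a bijection mod 2^(2k)
A_INV = pow(A, -1, 1 << KEY_BITS)    # modular inverse (needs Python 3.8+)

def f(m: int) -> int:
    return (m * A) & MASK

def f_inv(h: int) -> int:
    return (h * A_INV) & MASK

def store(m: int) -> tuple[int, int]:
    """Return (slot, stored): the slot is the low L bits of f(m),
    and only the high 2k - L bits are written into the key field."""
    h = f(m)
    return h & ((1 << L) - 1), h >> L

def recover(slot: int, stored: int) -> int:
    """Rebuild f(m) from the slot index and stored bits, then invert f."""
    return f_inv((stored << L) | slot)

slot, stored = store(0b0110110001)
print(slot, stored, recover(slot, stored))
```

Here the key field needs only 2k - L = 4 bits instead of 10, because the slot index carries the rest.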

  9. Fast merging of intermediate hash tables • Hash tables are too big to keep in memory • Sorting in linear time* • Let pos(m) be the final position of a k-mer • If pos(m1, 0) + reprobe(max) < pos(m2, 0), then pos(m1) < pos(m2) • Resolving the ordering within a window of size reprobe(max) is therefore sufficient to sort the output
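
The windowed ordering argument can be sketched as follows. The sketch assumes each occupied slot holds a tuple (home, key, count) with home = pos(m, 0), that every entry sits at most `max_offset` = reprobe(max) slots past its home, and that wrap-around at the end of the table is ignored. The function name and the sample table are hypothetical, and this is one way to exploit the bound rather than the talk's actual merge routine.

```python
import heapq

def sort_by_home(table, max_offset):
    """Emit entries in order of home position using a bounded window.
    table[p] is None or (home, key, count) with home <= p <= home + max_offset."""
    heap, out = [], []
    for p, entry in enumerate(table):
        if entry is not None:
            heapq.heappush(heap, entry)
        # Any entry still to come sits at position > p, so its home is
        # > p - max_offset. Entries with home <= p - max_offset are final.
        while heap and heap[0][0] + max_offset <= p:
            out.append(heapq.heappop(heap))
    while heap:                      # flush whatever is left in the window
        out.append(heapq.heappop(heap))
    return out

# Hypothetical table of 8 slots with max_offset = reprobe(2) = 3.
table = [None, None, (2, 10, 1), (2, 11, 1), (3, 12, 2), (2, 13, 1), None, None]
ordered = sort_by_home(table, max_offset=3)
print(ordered)
```

The heap never holds more than about `max_offset + 1` entries, so the cost per entry depends on the window size rather than on the table size.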

  10. Results

  11. Results (continued)

  12. Questions?
