A fast, lock-free approach for efficient parallel counting of occurrences of k-mers

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers • Presented by: Dinesh Agarwal

Overview • Introduction • Algorithm • Updating Hash Table • Reducing memory usage • Space efficient key encoding • Fast merging of hash tables • Results

Introduction • Problem definition: given a string S, count the number of occurrences of every substring of length k • Substrings of length k are called k-mers • Determining the number of occurrences of each k-mer is called k-mer counting
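As a point of reference for the problem definition, here is a minimal single-threaded sketch that counts k-mers with an ordinary dictionary. The function name `count_kmers` and the sample string are illustrative assumptions, not part of the presentation; the lock-free hash table discussed later replaces the dictionary.

```python
from collections import Counter

def count_kmers(s: str, k: int) -> Counter:
    """Count every length-k substring of s (naive single-threaded baseline)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A string of length n has n - k + 1 substrings of length k.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

# Illustrative input: the 3-mer "ACG" occurs twice, every other 3-mer once.
counts = count_kmers("ACGTACGA", 3)
```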

Algorithm: k-mer hash table • M = length of the hash table, M = 2^L for some L • i-th possible location for k-mer m: pos(m, i) = (hash(m) + reprobe(i)) % M • The key is an integer in the set U_k = [0, 4^k - 1] • Function hash is a bijection • Function reprobe(i) = i(i + 1) / 2
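The position formula can be sketched as follows. This is an illustration under stated assumptions: the 2-bit base encoding (A=0, C=1, G=2, T=3) and the identity function used in place of the bijective hash are placeholders chosen here, not details given on the slide.

```python
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding per base

def encode(kmer: str) -> int:
    """Map a k-mer to an integer in [0, 4^k - 1], two bits per base."""
    key = 0
    for base in kmer:
        key = (key << 2) | ENC[base]
    return key

def reprobe(i: int) -> int:
    """Triangular-number reprobe offset: i(i + 1) / 2."""
    return i * (i + 1) // 2

def pos(key: int, i: int, L: int) -> int:
    """i-th candidate slot in a table of size M = 2^L.

    The identity stands in for the bijective hash (assumption).
    Because M is a power of two, '% M' is a bit mask.
    """
    h = key  # placeholder for hash(key)
    return (h + reprobe(i)) & ((1 << L) - 1)
```

With a power-of-two table size, the triangular offsets visit every slot exactly once over M probes, which is one reason this reprobe function pairs well with M = 2^L.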

Updating the hash table • CAS instruction

Updating the hash table
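The CAS-based update can be sketched as below. Python has no hardware compare-and-swap, so the `AtomicCell` class emulates one machine word; its internal lock only stands in for the atomicity of the instruction and is not part of the technique. `CountingTable`, the identity hash, and the `max_reprobe` default are all assumptions made for this illustration.

```python
import threading

EMPTY = -1  # assumed sentinel for an unclaimed key slot

class AtomicCell:
    """Emulates one word with compare-and-swap (the lock mimics hardware atomicity)."""
    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

class CountingTable:
    def __init__(self, L, max_reprobe=8):
        self.mask = (1 << L) - 1
        self.max_reprobe = max_reprobe
        self.keys = [AtomicCell(EMPTY) for _ in range(1 << L)]
        self.vals = [AtomicCell(0) for _ in range(1 << L)]

    def _slot(self, key, i):
        return (key + i * (i + 1) // 2) & self.mask  # identity hash (assumption)

    def add(self, key):
        for i in range(self.max_reprobe + 1):
            p = self._slot(key, i)
            # Try to claim an empty slot; on failure, check whether the
            # slot already holds our key (keys are never removed).
            if self.keys[p].compare_and_swap(EMPTY, key) or self.keys[p].load() == key:
                while True:  # CAS retry loop for the counter
                    old = self.vals[p].load()
                    if self.vals[p].compare_and_swap(old, old + 1):
                        return True
        return False  # table considered full for this key

    def get(self, key):
        for i in range(self.max_reprobe + 1):
            p = self._slot(key, i)
            k = self.keys[p].load()
            if k == key:
                return self.vals[p].load()
            if k == EMPTY:
                return 0
        return 0

# Keys 5 and 21 collide in a 16-slot table, so 21 is placed by reprobing.
table = CountingTable(L=4)
def worker():
    for _ in range(500):
        for key in (5, 21, 5, 7):
            table.add(key)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No thread ever waits on another thread's progress through a slot: a failed CAS simply means another thread got there first, and the loser retries or reprobes.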

Reduced memory usage • The value field is smaller than what would be needed to hold the largest frequency value • The majority of k-mers appear once • Most of the remaining ones appear c times • A small number appear a large number of times • Use two entries in the hash table for such a key • The value is the concatenation of the values stored in both entries
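The two-entry idea can be illustrated with a small counter whose low bits live in the first entry and whose overflow is carried into a second entry. The field width `VALUE_BITS = 3` and the function names are assumptions chosen for the example; the slide does not state the actual field size.

```python
VALUE_BITS = 3  # assumed width of the small value field

def increment(low: int, high: int) -> tuple[int, int]:
    """Add one to a count split across two entries.

    'low' is the value field of the first entry, 'high' the value
    field of the second entry, which is used only after 'low' overflows.
    """
    low += 1
    if low >> VALUE_BITS:  # low no longer fits in its field
        low = 0
        high += 1          # carry into the second entry
    return low, high

def combined(low: int, high: int) -> int:
    """Full count: concatenation of the two value fields."""
    return (high << VALUE_BITS) | low

low, high = 0, 0
for _ in range(13):
    low, high = increment(low, high)
```

A k-mer seen fewer than 2^VALUE_BITS times never touches a second entry, so the common case pays only for the small field.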

Space efficient key encoding • The position of a key gives the lower L bits of f(m) • The key field stores only the higher 2k - L bits of f(m) • Conversely, the content can tell the position of the sequence of k-mers • The k-mer can be recovered by applying f^-1 to the reassembled value of f(m)
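A sketch of this encoding, under several assumptions: f is taken to be the bijective hash, modeled here as multiplication by an odd constant modulo 2^(2k) (a stand-in, not necessarily the function used in the presented work), and the reprobe index i is assumed to be stored next to the high bits so that the lower L bits can be recovered from a reprobed position. The values of `K`, `L`, and `MULT` are arbitrary.

```python
K, L = 5, 4                    # assumed k-mer length and table exponent
KEY_BITS = 2 * K               # a k-mer occupies 2k bits
KEY_MASK = (1 << KEY_BITS) - 1
M_MASK = (1 << L) - 1

MULT = 0x2F5                                  # odd, hence invertible mod 2^(2k)
MULT_INV = pow(MULT, -1, 1 << KEY_BITS)       # modular inverse (Python 3.8+)

def f(m: int) -> int:
    """Stand-in bijection on [0, 4^k - 1]."""
    return (m * MULT) & KEY_MASK

def f_inv(h: int) -> int:
    return (h * MULT_INV) & KEY_MASK

def reprobe(i: int) -> int:
    return i * (i + 1) // 2

def store(m: int, i: int) -> tuple[int, int, int]:
    """Return (position, high bits kept in the key field, reprobe index)."""
    h = f(m)
    position = (h + reprobe(i)) & M_MASK
    return position, h >> L, i

def recover(position: int, high: int, i: int) -> int:
    """Rebuild f(m) from the slot position and the stored bits, then invert f."""
    low = (position - reprobe(i)) & M_MASK    # lower L bits of f(m)
    return f_inv((high << L) | low)
```

Only 2k - L bits of the hash (plus the small reprobe index, under the assumption above) are stored per key, rather than all 2k bits.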

Fast merging of intermediate hash tables • Hash tables too big to keep in memory • Sorting in linear time* • Let pos(m) be the final position of a k-mer m • If pos(m1, 0) + reprobe(max) < pos(m2, 0), then pos(m1) < pos(m2) • Resolving the ordering within a window of size reprobe(max) is sufficient to sort the output.
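The windowed ordering argument can be sketched with a small heap. Here `home` denotes pos(m, 0) and `position` denotes the final slot pos(m); the sketch assumes entries are read in increasing position, that position - window <= home <= position with window = reprobe(max), and that wraparound at the end of the table is ignored. The function name and the sample data are illustrative.

```python
import heapq

def sort_by_home(entries, window):
    """Yield (home, payload) in nondecreasing home order.

    entries: iterable of (position, home, payload), sorted by position.
    The heap never needs to hold entries spanning more than 'window'
    positions, so memory stays bounded regardless of table size.
    """
    heap = []
    for position, home, payload in entries:
        # Every later entry has home >= position - window, so anything
        # smaller than that threshold is already in its final order.
        while heap and heap[0][0] < position - window:
            yield heapq.heappop(heap)
        heapq.heappush(heap, (home, payload))
    while heap:
        yield heapq.heappop(heap)

# Illustrative table scan: (position, home, payload), window = 3.
scan = [(0, 0, "a"), (1, 0, "b"), (2, 1, "c"), (3, 0, "d"),
        (4, 4, "e"), (5, 2, "f"), (6, 5, "g")]
ordered = list(sort_by_home(scan, window=3))
```

Once each intermediate table is emitted in this order, several such streams can be combined with an ordinary k-way merge.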

Results

Results (continued)

Questions?
