
Accelerating HMMER Search on GPUs using Hybrid Task and Data Parallelism

Narayan Ganesan¹, Roger Chamberlain², Jeremy Buhler², and Michela Taufer¹. ¹Computer and Information Sciences Dept., University of Delaware; ²Computer Science and Engineering, Washington University in St. Louis.


Presentation Transcript


  1. Accelerating HMMER Search on GPUs using Hybrid Task and Data Parallelism. Narayan Ganesan¹, Roger Chamberlain², Jeremy Buhler², and Michela Taufer¹; ¹Computer and Info. Sciences Dept., University of Delaware; ²Computer Science and Engineering, Washington University in St. Louis

  2. Motivation
     • Dataset sizes are growing rapidly for many applications
     • Problem sizes grow as the product of the data sizes
       • E.g., genome sequence alignment, protein motif finding
     • The number of problems also grows as the product of the number of available datasets
     • Many sequentially dependent algorithms are executed serially
     • Processor clock speeds have hit a wall (~3.5-4.0 GHz, reached around 2003), so serial evaluation is no longer feasible for large-data applications
     • The need for parallelism is greater than ever
     • Parallel hardware is ubiquitous
       • GPU, multi-core, SIMD, hybrid, MIMD, FPGA
     • Effort must be spent on parallelizing algorithms and applications for this hardware

  3. Application: Protein Motif Finding
     [Figure: protein synthesis processes for classes A through N, each generating sequences such as ...PAQVEMYKFLLRISQLNRD..., ...CHTEARGLEGVCIDPKK..., ...DELAAMVRDYLKKTPEF...]
     • Proteins are synthesized by biological processes that can be described by HMMs
     • Each class is characterized by short characteristic sequences, or motifs
     • Each class, or "generator", is described by a hidden Markov model
     • Protein motif finding answers the questions:
       • Given a sequence, which class does it belong to?
       • Given a sequence and an HMM, what is the probability that the sequence belongs to that class?

  4. Protein Motif Finding
     [Figure: a protein synthesis process emitting sequences such as ...PAQVEMYKFLLRISQLNRD..., ...CHTEARGLEGVCIDPKK..., ...DGEACNSPYLDWRKDTEQ...]
     • Protein motif finding is very similar to signal source identification
     • A protein sequence may contain multiple motifs: ..ACGFTDFWAPSLTHLTIKNL..
     • Motifs in the sequence are sometimes modified by the insertion and deletion of random amino acids
     • Occurrences of motifs can be modeled by profile hidden Markov models
     • The Viterbi algorithm is used to "decode" a given protein sequence against a model
     • The result is the probability of the most likely path through the model, which measures how likely the sequence is to belong to the class
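To make the decoding step concrete, here is a minimal log-space Viterbi sketch for a generic HMM (not HMMER's profile-specific version). The function name `viterbi`, the two states `X` and `Y`, the symbols `a` and `b`, and all probabilities in the usage example are hypothetical; the function returns the log-probability of the most likely state path together with that path.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (log-probability, state path) of the most likely path emitting obs."""
    # score[s]: best log-probability of a path that ends in state s after the current symbol
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for sym in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best, prev = max((score[r] + log_trans[r][s], r) for r in states)
            new_score[s] = best + log_emit[s][sym]
            ptr[s] = prev
        back.append(ptr)
        score = new_score
    best, last = max((v, s) for s, v in score.items())
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the last symbol to the first
        path.append(ptr[path[-1]])
    path.reverse()
    return best, path

# Hypothetical two-state model: X prefers symbol 'a', Y prefers 'b'.
lg = math.log
states = ["X", "Y"]
log_start = {"X": lg(0.5), "Y": lg(0.5)}
log_trans = {"X": {"X": lg(0.8), "Y": lg(0.2)}, "Y": {"X": lg(0.2), "Y": lg(0.8)}}
log_emit = {"X": {"a": lg(0.9), "b": lg(0.1)}, "Y": {"a": lg(0.1), "b": lg(0.9)}}
best, path = viterbi("aab", states, log_start, log_trans, log_emit)
print(path, round(math.exp(best), 5))  # ['X', 'X', 'Y'] 0.05832
```

Working in log space turns products of probabilities into sums and avoids underflow on long sequences; HMMER similarly works with integer log-odds scores.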

  5. HMMER Search – Protein Motif Finding
     [Figure: profile HMM with match states M1 to M5, insert states I1 to I4, delete states D2 to D4, and special states S, N, B, E, C, T, J; sample paths through the model align motif variants such as ....ACGFTDFWAPSLTHLTIKNL...., ....ACGFTDFWAPSALTHLTIKNL...., ....ACGFTDFWAPSAGLTHLTIKNL...., and ....ACGFTDFWAPSAGL-HLTIKNL....]

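  6. HMMER Search – Recurrence Equations
     [Figure: recurrence equations for the Viterbi scores VM, VI, and VD of the match (M1 to M5), insert (I1 to I4), and delete (D2 to D4) states, together with the special states S, N, B, E, C, T, J]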
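The equations on this slide did not survive extraction. As a reference point only, the standard log-space Viterbi recurrences for HMMER2's Plan 7 model can be written in the slide's notation as below; this is a reconstruction from the HMMER2 formulation, not necessarily the exact form the authors used. Here $x_i$ is the $i$-th residue of a sequence of length $L$, $k = 1, \dots, m$ indexes model nodes, $e$ denotes emission scores, $t$ denotes transition scores, and the N, J, and C states emit at score 0.

```latex
\begin{aligned}
V_M(i,k) &= e_{M_k}(x_i) + \max\{\, V_M(i-1,k-1) + t(M_{k-1} \to M_k),\; V_I(i-1,k-1) + t(I_{k-1} \to M_k),\\
&\qquad V_D(i-1,k-1) + t(D_{k-1} \to M_k),\; X_B(i-1) + t(B \to M_k) \,\}\\
V_I(i,k) &= e_{I_k}(x_i) + \max\{\, V_M(i-1,k) + t(M_k \to I_k),\; V_I(i-1,k) + t(I_k \to I_k) \,\}\\
V_D(i,k) &= \max\{\, V_M(i,k-1) + t(M_{k-1} \to D_k),\; V_D(i,k-1) + t(D_{k-1} \to D_k) \,\}\\
X_E(i) &= \max_{k} \{\, V_M(i,k) + t(M_k \to E) \,\}\\
X_N(i) &= X_N(i-1) + t(N \to N)\\
X_J(i) &= \max\{\, X_J(i-1) + t(J \to J),\; X_E(i) + t(E \to J) \,\}\\
X_C(i) &= \max\{\, X_C(i-1) + t(C \to C),\; X_E(i) + t(E \to C) \,\}\\
X_B(i) &= \max\{\, X_N(i) + t(N \to B),\; X_J(i) + t(J \to B) \,\}\\
\text{score} &= X_C(L) + t(C \to T)
\end{aligned}
```

Two dependencies stand out: every $V_M$ cell of row $i$ needs $X_B(i-1)$, which in turn needs $X_E(i-1)$, a maximum over the entire previous row; and $V_D(i,k)$ needs $V_D(i,k-1)$ from the same row.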

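  7. HMMER Search – Protein Motif Finding
     [Figure: DP matrix with model positions 1 to m across and sequence positions 1 to L down, an XE column for the end state, and the Match, Insert, and Delete recurrences]
     • The dependence on XE imposes a row-major order of computation
     • Delete-state costs impose a serial dependency on the previous element in the same row
     • The computation is therefore harder to parallelize by conventional means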
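A minimal Python sketch of a simplified Plan 7-style Viterbi scorer follows, written to make both dependencies visible: rows are processed strictly in order because X_B for the next row is derived from X_E of the current row, and the delete chain is a left-to-right loop within a row. It is an illustration under assumptions, not the HMMER or GPU code: the function name `profile_viterbi`, the dictionary-based score layout, and key names such as "MM" and "NB" are hypothetical, every node gets insert and delete slots (set unused transitions to -inf), and no traceback is kept.

```python
import math

NEG_INF = -math.inf

def profile_viterbi(seq, emit_m, emit_i, t, xt):
    """Log-space Viterbi score of seq against a simplified Plan 7-style profile.

    emit_m[k], emit_i[k]: dicts residue -> match / insert emission score at node k
    t[key][k]: 'MM','IM','DM','MD','DD' lead from node k to node k+1,
               'MI','II' stay at node k, 'BM' enters M_k, 'ME' leaves M_k
    xt[key]:   special-state transitions 'NN','NB','EJ','JJ','JB','EC','CC','CT'
    """
    m = len(emit_m)
    prev_m, prev_i, prev_d = [NEG_INF] * m, [NEG_INF] * m, [NEG_INF] * m
    xn, xj, xc = 0.0, NEG_INF, NEG_INF
    xb = xn + xt["NB"]                      # X_B(0)
    for a in seq:                           # one row per residue; rows must go in order
        cur_m, cur_i, cur_d = [NEG_INF] * m, [NEG_INF] * m, [NEG_INF] * m
        for k in range(m):
            # Match: uses only row i-1 (diagonal neighbours and X_B(i-1)).
            best = xb + t["BM"][k]
            if k > 0:
                best = max(best,
                           prev_m[k - 1] + t["MM"][k - 1],
                           prev_i[k - 1] + t["IM"][k - 1],
                           prev_d[k - 1] + t["DM"][k - 1])
            cur_m[k] = emit_m[k].get(a, NEG_INF) + best
            # Insert: uses the same node in row i-1.
            cur_i[k] = emit_i[k].get(a, NEG_INF) + max(prev_m[k] + t["MI"][k],
                                                      prev_i[k] + t["II"][k])
            # Delete: uses node k-1 of the SAME row, a serial chain along the row.
            if k > 0:
                cur_d[k] = max(cur_m[k - 1] + t["MD"][k - 1],
                               cur_d[k - 1] + t["DD"][k - 1])
        # X_E(i) is a max over the whole row, so row i+1 cannot start before row i ends.
        xe = max(cur_m[k] + t["ME"][k] for k in range(m))
        xn = xn + xt["NN"]
        xj = max(xj + xt["JJ"], xe + xt["EJ"])
        xc = max(xc + xt["CC"], xe + xt["EC"])
        xb = max(xn + xt["NB"], xj + xt["JB"])
        prev_m, prev_i, prev_d = cur_m, cur_i, cur_d
    return xc + xt["CT"]
```

Only the match and insert updates are independent across k within a row; the `cur_d` loop and the per-row `xe` reduction are the two serial points the slide refers to.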

  8. HMMER Search – Protein Motif Finding
     [Figure: row i of the DP matrix, columns 1 to m]
     • Delete costs impose a sequential dependency along row i
     • Parallelize the calculations within each row
     • The recurrence for VD can be parallelized using a blocking strategy
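Within row i the delete recurrence has the shape D[k] = max(a[k], D[k-1] + b[k]), where a[k] = VM(i,k-1) + t(M(k-1) → D(k)) can be computed for every k at once (match scores in row i depend only on row i-1) and b[k] = t(D(k-1) → D(k)). Because + distributes over max, the chain can be cut into blocks and the pieces recombined. The Python sketch below simulates one such three-phase blocking scheme sequentially; the paper's exact blocking strategy may differ, and the function names are hypothetical.

```python
import math

NEG_INF = -math.inf

def delete_chain_serial(a, b):
    """Reference: D[j] = max(a[j], D[j-1] + b[j]) with D[-1] = -inf, one left-to-right pass."""
    d, prev = [], NEG_INF
    for aj, bj in zip(a, b):
        prev = max(aj, prev + bj)
        d.append(prev)
    return d

def delete_chain_blocked(a, b, block):
    """Same result in three phases that mimic a blocked parallel scheme."""
    n = len(a)
    starts = list(range(0, n, block))
    # Phase 1 (independent per block): local chain assuming -inf enters the block,
    # plus the running sum of b inside the block.
    local, psum = [0.0] * n, [0.0] * n
    for s in starts:
        prev, acc = NEG_INF, 0.0
        for j in range(s, min(s + block, n)):
            prev = max(a[j], prev + b[j])
            acc += b[j]
            local[j], psum[j] = prev, acc
    # Phase 2 (serial, one value per block): carry the true D value across block boundaries.
    carry_in, carry = [], NEG_INF
    for s in starts:
        carry_in.append(carry)
        end = min(s + block, n) - 1
        carry = max(local[end], carry + psum[end])
    # Phase 3 (independent per block): fold the incoming carry into every element.
    d = [0.0] * n
    for s, c in zip(starts, carry_in):
        for j in range(s, min(s + block, n)):
            d[j] = max(local[j], c + psum[j])
    return d
```

On a GPU, phases 1 and 3 would map to threads working on separate blocks in parallel, while phase 2 touches only one value per block and could run serially or as a parallel scan.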

  9. NVIDIA CUDA Programming Interface
     [Figure: CUDA hardware model. Threads are grouped into thread blocks that run on multiprocessors; each multiprocessor contains processors 1 to M with their own registers, an instruction unit, shared memory, a constant cache, and a texture cache; all multiprocessors access global memory]

  10. Traditional Task Parallelism for GPUs
     • Different GPU threads work on independent tasks
     • The time taken by different tasks varies according to the nature of the job
       • Sequences of different lengths take different amounts of time to complete
     • Load imbalance is possible
     • Total time depends on where the longest sequences lie in the database
     [Figure: sequences Seq i, Seq i+M, ... assigned to thread blocks 1 to P]
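A small sketch of why static task parallelism can be unbalanced: if worker w processes sequences w, w+P, w+2P, ... and cost is approximated by sequence length (the model length is fixed for a search, so DP work grows with residue count), the finish time is set by the most heavily loaded worker. The function name and the sequence lengths below are hypothetical.

```python
def round_robin_loads(lengths, num_workers):
    """Residues assigned to each worker under a static round-robin schedule."""
    loads = [0] * num_workers
    for idx, n in enumerate(lengths):
        loads[idx % num_workers] += n  # worker idx % P gets sequence idx
    return loads

# Hypothetical database: one long sequence among short ones.
lengths = [4000, 300, 250, 350, 300, 280]
loads = round_robin_loads(lengths, 3)
makespan = max(loads)              # the slowest worker decides the finish time
ideal = sum(lengths) / len(loads)  # perfectly balanced share per worker
print(loads, makespan, round(ideal, 1))  # [4350, 600, 530] 4350 1826.7
```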

  11. Hybrid Data and Task Parallelism
     • We extract parallelism from the data dependencies
     • Multiple GPU threads cooperate on the same task
     • Dividing the database into roughly equal chunks naturally avoids load imbalance
     • This technique works for uniform recurrence equations
       • Ubiquitous in computational biology, including local sequence alignment, multiple sequence alignment, and motif finding
     [Figure: sequences Seq i, Seq i+M, ... processed cooperatively by thread blocks 1 to P]
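One simple way to divide a database into roughly equal chunks is to cut it by cumulative residue count rather than by sequence count, so each thread block receives about the same amount of DP work; within a block, all threads then cooperate on one sequence at a time. This contiguous split, and the names `balanced_chunks` and `chunk_loads`, are a hypothetical illustration, not necessarily the paper's partitioning rule.

```python
def balanced_chunks(lengths, num_chunks):
    """Split sequence indices into contiguous chunks with roughly equal residue totals."""
    total = sum(lengths)
    chunks = [[] for _ in range(num_chunks)]
    start = 0  # residue offset at which the current sequence begins
    for idx, n in enumerate(lengths):
        # Map the starting offset to a chunk; offsets only grow, so chunks stay contiguous.
        k = start * num_chunks // total if total else 0
        chunks[min(k, num_chunks - 1)].append(idx)
        start += n
    return chunks

def chunk_loads(lengths, chunks):
    """Total residues per chunk."""
    return [sum(lengths[i] for i in chunk) for chunk in chunks]

lengths = [500, 100, 100, 100, 100, 100]  # hypothetical residue counts
chunks = balanced_chunks(lengths, 2)
print(chunks, chunk_loads(lengths, chunks))  # [[0], [1, 2, 3, 4, 5]] [500, 500]
```

A sequence longer than a chunk's share still lands in a single chunk, but because the threads of a block split each sequence's DP work between them, that sequence no longer serializes on one thread.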

  12. GPU Implementation
     [Figure: row i of the DP matrix, columns 1 to m]
     • Multiple threads cooperate on a single sequence by partitioning its computation and working on different partitions
     • The working set is one row of the DP matrix: 507 × 3 32-bit integers
     • A model of size 507 has 507 × 9 transition probabilities and 507 × 40 emission probabilities, stored as short integers
     • Model data is read into shared memory using coalesced accesses
     • The working set is stored and updated within shared memory
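The sizes quoted on this slide translate into bytes as follows. This is plain arithmetic under stated assumptions: 4-byte scores for the three per-node states (M, I, D) in the DP row, 2-byte short integers for model parameters, and the 9 transition and 40 emission values per node from the slide (40 is consistent with 20 match plus 20 insert amino-acid emissions, but that split is an assumption). The helper name `footprint` is hypothetical.

```python
INT32_BYTES = 4  # one DP score (assumed 32-bit integer)
SHORT_BYTES = 2  # one model parameter (assumed 16-bit short)

def footprint(model_len, states=3, transitions=9, emissions=40):
    """Return bytes for (one DP row, transition table, emission table)."""
    row = model_len * states * INT32_BYTES
    trans = model_len * transitions * SHORT_BYTES
    emit = model_len * emissions * SHORT_BYTES
    return row, trans, emit

for m in (128, 256, 507):  # the HMM sizes used in the experiments
    row, trans, emit = footprint(m)
    print(m, row, trans, emit, trans + emit)
# last line printed: 507 6084 9126 40560 49686
```

For the largest model the parameters alone come to roughly 49 KB, a figure worth comparing against the shared-memory capacity of the target GPU.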

  13. Performance and Results
     • We compared our implementation on 1 and 4 GPUs against mpi-gpu HMMER on the same dataset
     • 3 HMM sizes: 128, 256, 507
     • NCBI NR protein database:
       • 5.5 GB in size
       • 10.5 million protein sequences

  14. Conclusions and Future Work
     • Our GPU implementation of HMMER is:
       • 5-8x faster than the GPU-HMMER search implementation on the same dataset
       • 100x faster than the CPU implementation
     • Future work: phylogenetic motif identification
       • Identify common genetic motifs among several groups of organisms
       • Such motifs are representative of evolutionary relatedness among different species (typically thousands)
       • Closely related to the multiple sequence alignment problem via hidden Markov models
       • Computationally intensive, so faster motif finding is essential
