1 / 24

Massively Parallel Computing for Protein Alignment

Massively Parallel Computing for Protein Alignment. Bertil Schmidt School of Computer Engineering Nanyang Technological University Singapore. Contents. Motivation Smith-Waterman Algorithm Parallelization on the Hybrid Architecture Parallelization on the Fuzion 150

bazyli
Download Presentation

Massively Parallel Computing for Protein Alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Massively Parallel Computing for Protein Alignment Bertil Schmidt School of Computer Engineering Nanyang Technological University Singapore

  2. Contents • Motivation • Smith-Waterman Algorithm • Parallelization on the Hybrid Architecture • Parallelization on the Fuzion 150 • Performance Evaluation • Conclusion and Future Work

  3. Motivation • Genetic sequence databases are growing exponentially • Database growth rate will continue for the foreseeable future, since multiple concurrent genome projects have begun, with more to come

  4. Motivation • Discovered sequences are analyzed by comparison with databases • Complexity of sequence comparison is proportional to the product of query size times database size •  Analysis too slow on sequential computers • Two possible approaches • Heuristics, e.g. BLAST,FastA, but the more efficient the heuristics, the worse the quality of the results • Parallel Processing, get high-quality results in reasonable time

  5. Mycobacterium Smegmatis Mycobacterium Tuberculosis 3918 Protein Sequences 1.329.298 AminoAcids 4289 Protein Sequences 1.359.008 AminoAcids Full Genome Comparison • related Organisms, but Tuberculosis causes a disease  find common and different parts • 16106 pairwise sequence comparisons • Project with IMCB, Thomas Dick

  6. GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII |||::::| : |::| ||:::||||:|:|||:: ::| |:::: GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV Slower Search Speed Faster Data Quality Lower Higher Protein Sequence Alignment • BLAST, FastA, Smith-Waterman Smith- Waterman FastA BLAST

  7. Smith-Waterman Algorithm • Optimal local alignment of two sequences • Performs an exhaustive search for the optimal local alignment • Complexity O(nm) for sequence lengths n and m • Based on the 'dynamic programming' (DP) algorithm • Fill the DP matrix using a substitution (mutation) matrix • Find the maximal value (score) in the matrix • Trace back from the score until a 0 value is reached

  8. Smith-Waterman Algorithm • Aligning S1 and S2 of length l1 and l2 using Recurrences: • Calculate three possible ways to extend the alignment • by one AminoAcid (AA) in each sequence • by one AA in the first sequence and align it with a gap in the second • by one AA in the second sequence and align it with a gap in the first

  9. A T C T C G T A T G A T G  0 0 0 0 0 0 0 0 0 0 0 0 0 0 G 0 T 0 0 2 1 2 1 1 4 3 2 1 1 3 2 C 0 0 1 4 3 4 3 3 3 2 1 0 2 2 T 0 0 2 3 6 5 4 5 4 5 4 3 2 1 A 0 2 2 2 5 5 4 4 7 6 5 6 5 4 T 0 1 4 3 4 4 4 6 5 9 8 7 8 7 C 0 0 3 6 5 6 5 5 5 8 8 7 7 7 A 0 2 2 5 5 5 5 4 7 7 7 10 9 8 C 0 1 1 4 4 7 6 5 6 6 6 9 9 8 A T C T C G T A T G A T G G T C T A T C A C Smith-Waterman Algorithm Align S1=ATCTCGTATGATGS2=GTCTATCAC 0 0 0 0 0 0 2 1 0 0 2 1 0 2 2 =1, =1 4 3 5 7 9 8 10

  10. Systola 1024: PC add-on board with 1024 processors • Fuzion 150: 1536 processors on a single chip Parallel Architectures for Bioinformatics • Embedded Massively Parallel Accelerators • Other accelerators: Decypher, Biocellerator, GeneMatcher2, Kestrel, SAMBA

  11. Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 High speed Myrinet switch Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Systola1024 Parallel Architectures for Bioinformatics • combines SIMD and MIMD paradigm within a parallel architecture Hybrid Computer

  12. Previous Applications • Volume Visualization • Automatic Visual Quality Control (Opel) • Cryptography • Computer Tomography • Video Compression • Range of Transforms (Fourier, Wavelet, Hough, Radon) • Computer Graphics

  13. RAM NORTH RAM WEST Controller program memory host computer bus ISA Interface processors Architecture of Systola 1024 • Instruction Systolic Array: • 32  32 mesh of processing elements • wavefront instruction execution

  14. - + - - * - + - - - * * * * + - + * + - + + * * - + + * * + + * - + + * * + + + + * * column selectors + instructions - * + * - + - - + * row selectors Instruction Systolic Array • wavefront instruction execution  fast accumulation operations (e.g. row sum, broadcast, ringshift)

  15. l1 P1 P2 P13  A T C T C G T A T G A T G  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 2 1 0 2 G 0 0 0 0 0 2 T 0 0 0 2 2 1 1 1 2 2 1 1 4 3 2 1 1 3 2 3 2 l2 C 0 0 0 1 1 3 4 4 3 4 3 3 3 2 1 0 2 2 4 T 0 0 0 3 2 2 3 6 5 4 5 4 5 4 3 2 1 A 0 2 2 2 2 2 5 5 4 4 7 6 5 6 5 4 T 0 1 4 3 4 4 4 6 5 9 8 7 8 7 C 0 0 3 6 5 6 5 5 5 8 8 7 7 7 1 A 0 2 2 5 5 5 5 4 7 7 7 10 9 8 C 0 1 1 4 4 7 6 5 6 6 6 9 9 8 Parallelization of Smith-Waterman • matrix cells along a single diagonal are computed in parallel • comparison is performed in l1+l21 steps on l1 PEs 0 5

  16. a1023 a1022 a992 a63 a62 a32 bk….b1b0 a31 a30 a0 Mapping onto Systola 1024 • Subject sequences can be pipelined with only step delay  k steps for subject sequence of length k a: query sequence (equal to 1024) b: subject sequence …c1c0 X • Efficient routing on the ISA: Row Ringshift and Broadcast

  17. Query sequence length 256 512 1024 2048 4096 Systola 1024 speedup to PIII 850 294 5 577 6 1137 6 2241 6 4611 6 Cluster of 16 Systolas speedup to PIII 850 20 81 38 86 73 91 142 94 290 94 Performance Evaluation • Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths • Parallel implementation scales linearly with sequence length and number of PCs • Computing time dominates data transfer time need a state-of-the-art architecture

  18. Fuzion 150 Architecture Linear SIMD Array 1536PEs each with 2 Kbytes DRAM SIMD Controller • 0.25-m, single-chip, SIMD architecture • 1536 PEs @ 200 MHz  300 GOPS • 600 GB/s on-chip, 6.4 GB/s off-chip bandwidth • Multithreading (control units interact via semaphores) • developed by Clearspeed Technology (UK) for graphics, networking processing Instruction Fetch Local Memory Host AGP Rambus FUZION Bus 1,2 or 4 Channels (6.4 GB/s) 32-bit EPU (ARC) Video I/O Display

  19. Instructions ALU (8 bits) Register file 32 Bytes Left PE Right PE PE Memory 2 KByte DRAM Block I/O Channel Fuzion 150 Architecture Local Memory Block 5 Fuzion Bus PE (5,0) PE (5,1) PE (5,255) Block 1 PE (1,0) PE (1,1) PE (1,255) Block 0 PE (0,0) PE (0,1) PE (0,255)

  20. Fuzion 150 - Debugger

  21. a1535 a1534 a1280 a511 a510 a256 a0 a1 a255 Mapping onto the Fuzion 150 Block 5 a: query sequence (equal to 1536) Block 1 b: subject sequence Block 0 bk….b1b0 …c1c0 X • No fast global communication  2-step local communcication • Subject sequence can be pipelined with only step delay

  22. Query sequence length 256 512 1024 2048 4096 Fuzion 150 speedup to PIII 850 12 136 22 151 42 157 82 163 162 165 Performance Evaluation • Scan times in seconds for TrEMBL 14 (351’834 Protein Sequences) for various query sequence lengths • Parallel implementation scales linearly with sequence length • Computing time dominates data transfer time

  23. Performance Evaluation • Normalized time Comparison for a 10 Mbase search on different parallel architectures with different query length • 4faster than 16K-PE MasPar • 6faster than Kestrel • 5faster than SAMBA (special-purpose 3-board architecture)

  24. Conclusions and Future Work • Demonstrated how fine-grained parallel architectures can be applied efficiently for Comparative Genomics • Significant runtime savings for full genome comparisons and database searching  More Discovery Is Possible at a good price-performance ratio • Accelerating other Bioinformatics Applications, e.g. Hidden Markov Models • Build a next generation architecture at Center for High Performance Embedded Systems, NTU • Integration of accelerators in a Grid Environment

More Related