1 / 22

NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT

NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT. Author: Souradip Sarkar; Gaurav Ramesh Kulkarni; Partha Pratim Pande; and Ananth Kalyanaraman Publisher: IEEE TRANSACTIONS ON COMPUTERS, JANUARY 2010 Presenter: Chin-Chung Pan Date: 2010/09/08. OUTLINE.

kiona
Download Presentation

NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. NETWORK-ON-CHIP HARDWARE ACCELERATORSFOR BIOLOGICAL SEQUENCE ALIGNMENT Author: Souradip Sarkar; Gaurav Ramesh Kulkarni; Partha Pratim Pande; and Ananth Kalyanaraman Publisher: IEEE TRANSACTIONS ON COMPUTERS, JANUARY 2010 Presenter: Chin-Chung Pan Date: 2010/09/08

  2. OUTLINE • Introduction • Algorithms for Sequence Alignment and Parallelization • The PP Approach • The AD Approach • NoC Implementation • Mapping the Sequence Alignment Algorithms to NoC • NoC Switch Design • Experimental Results

  3. INTRODUCTION • Propose optimized NoC architectures for different sequence alignment algorithms that were originally designed for distributed memory parallel computers. • Provide a thorough comparative evaluation of their respective performance and energy dissipation. • The results show that our NoC-based implementations can provide above 102-103-fold speedup over other hardware accelerators and above 104-fold speedup over traditional CPU architectures.

  4. ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • We briefly outline the main ideas behind these two algorithms: • PP algorithm (for parallel prefix) • AD algorithm (for antidiagonal) • There are two main challenges in parallelizing DP algorithms for PSA: • Meeting the dependency constraint without affecting the parallel speedup during forward pass. • Computing the optimal retrace without storing the entire DP table during forward pass.

  5. ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • The PP approach • The forward pass in table computation proceeds iteratively, row by row, all the PEs participate in computing row i in parallel. • an O(n/p) local computation. • an O(logp) parallel prefix communication.

  6. ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • The PP approach

  7. ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • The AD algorithm • This requires that both the input sequences s1 and s2 are made available to all the PEs. • For the forward pass, the algorithm proceeds iteratively by computing one antidiagonal of DPtable at each “time step”.

  8. NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The PP Implementation • While there are multiple NoC architectures, the hypercubic topology is best suited for the parallel prefix operation. • But a physical realization of the NoC is limited by the layout dimension of the chip, which is predominantly 2D in practice. D-dimensional mesh

  9. NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The PP Implementation • It is not practical to implement any arbitrarily higher dimensional hypercube. • A well-defined property of the communication pattern in parallel prefix algorithm is that PEs do not communicate arbitrarily. • The forward pass is followed by a backward pass operation. This step is implemented using p-1 neighbor PE communication exchanges, as the PEs regenerate the path from cell [m, n] to [0, 0].

  10. NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The AD Implementation • In our implementation, we achieve this by embedding a ring into the mesh, and following the Moore space filing curve. • Note that this result is unlike the PP approach; the communication volume is independent of the number of PEs during the forward pass.

  11. NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The AD Implementation • In our backward pass implementation, we partition the processors into two subgroups, and the number of processors in each of the subgroups depends on the number of cells in the two partitions. • The processor grouping requires a broadcast operation, to propagate the new partitioning cell coordinate to all the PEs, which takes O(log p) time (where p is the number of PEs).

  12. NOC IMPLEMENTATIONNOC SWITCH DESIGN • To reduce the delay, instead of building a multihop or pipelined communication link between two nonadjacent PEs. • The switch boxes are designed to establish a direct communication path (unpipelined or single hop) between the PEs

  13. NOC IMPLEMENTATIONNOC SWITCH DESIGN • To exchange data between PE1 and PE2, the following pass transistors will be on: Miph1 and Mh1 of the switch connected to PE1, and Miph1 of the switch connected to PE2.

  14. NOC IMPLEMENTATIONNOC SWITCH DESIGN

  15. EXPERIMENTAL RESULTSPIPELINED VERSUS UNPIPELINED IMPLEMENTATION • We considered two types of NoC implementations: • A “Pipelined” communication scheme, where all the nonadjacent PEs communicate in multiple hops step by step. • An “Unpipelined” communication scheme, where the nonadjacent hypercubic neighboring PEs communicate in a single hop with the help of bypass. Pipelined Unpipelined Multihop

  16. EXPERIMENTAL RESULTSPIPELINED VERSUS UNPIPELINED IMPLEMENTATION

  17. EXPERIMENTAL RESULTSENERGY AND TIMING CHARACTERISTICS OF THE PP APPROACH

  18. EXPERIMENTAL RESULTSENERGY AND TIMING CHARACTERISTICS OF THE AD APPROACH

  19. EXPERIMENTAL RESULTSCOMPARATIVE EVALUATION OF PP AND AD SCHEMES

  20. EXPERIMENTAL RESULTSIMPACT OF VARYING STRING SIZES • In order to allow a fair comparison, we varied the string sizes but keeping the total work (i.e., number of cells in the DP table) same. • As the number of rows is decreased and/or the number of columns increased, the total time and energy both decrease.

  21. EXPERIMENTAL RESULTSPARALLEL PREFIX IMPLEMENTATION ON PROTEIN SEQUENCE DATA

  22. EXPERIMENTAL RESULTSDISCUSSION

More Related