NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT

NETWORK-ON-CHIP HARDWARE ACCELERATORSFOR BIOLOGICAL SEQUENCE ALIGNMENT Author: Souradip Sarkar; Gaurav Ramesh Kulkarni; Partha Pratim Pande; and Ananth Kalyanaraman Publisher: IEEE TRANSACTIONS ON COMPUTERS, JANUARY 2010 Presenter: Chin-Chung Pan Date: 2010/09/08

OUTLINE • Introduction • Algorithms for Sequence Alignment and Parallelization • The PP Approach • The AD Approach • NoC Implementation • Mapping the Sequence Alignment Algorithms to NoC • NoC Switch Design • Experimental Results

INTRODUCTION • Propose optimized NoC architectures for different sequence alignment algorithms that were originally designed for distributed memory parallel computers. • Provide a thorough comparative evaluation of their respective performance and energy dissipation. • The results show that our NoC-based implementations can provide above 102-103-fold speedup over other hardware accelerators and above 104-fold speedup over traditional CPU architectures.

ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • We briefly outline the main ideas behind these two algorithms: • PP algorithm (for parallel prefix) • AD algorithm (for antidiagonal) • There are two main challenges in parallelizing DP algorithms for PSA: • Meeting the dependency constraint without affecting the parallel speedup during forward pass. • Computing the optimal retrace without storing the entire DP table during forward pass.

ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • The PP approach • The forward pass in table computation proceeds iteratively, row by row, all the PEs participate in computing row i in parallel. • an O(n/p) local computation. • an O(logp) parallel prefix communication.

ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • The PP approach

ALGORITHMS FOR SEQUENCE ALIGNMENT AND PARALLELIZATION • The AD algorithm • This requires that both the input sequences s1 and s2 are made available to all the PEs. • For the forward pass, the algorithm proceeds iteratively by computing one antidiagonal of DPtable at each “time step”.

NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The PP Implementation • While there are multiple NoC architectures, the hypercubic topology is best suited for the parallel prefix operation. • But a physical realization of the NoC is limited by the layout dimension of the chip, which is predominantly 2D in practice. D-dimensional mesh

NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The PP Implementation • It is not practical to implement any arbitrarily higher dimensional hypercube. • A well-defined property of the communication pattern in parallel prefix algorithm is that PEs do not communicate arbitrarily. • The forward pass is followed by a backward pass operation. This step is implemented using p-1 neighbor PE communication exchanges, as the PEs regenerate the path from cell [m, n] to [0, 0].

NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The AD Implementation • In our implementation, we achieve this by embedding a ring into the mesh, and following the Moore space filing curve. • Note that this result is unlike the PP approach; the communication volume is independent of the number of PEs during the forward pass.

NOC IMPLEMENTATIONMAPPING THE SEQUENCE ALIGNMENT ALGORITHMS TO NOC • The AD Implementation • In our backward pass implementation, we partition the processors into two subgroups, and the number of processors in each of the subgroups depends on the number of cells in the two partitions. • The processor grouping requires a broadcast operation, to propagate the new partitioning cell coordinate to all the PEs, which takes O(log p) time (where p is the number of PEs).

NOC IMPLEMENTATIONNOC SWITCH DESIGN • To reduce the delay, instead of building a multihop or pipelined communication link between two nonadjacent PEs. • The switch boxes are designed to establish a direct communication path (unpipelined or single hop) between the PEs

NOC IMPLEMENTATIONNOC SWITCH DESIGN • To exchange data between PE1 and PE2, the following pass transistors will be on: Miph1 and Mh1 of the switch connected to PE1, and Miph1 of the switch connected to PE2.

NOC IMPLEMENTATIONNOC SWITCH DESIGN

EXPERIMENTAL RESULTSPIPELINED VERSUS UNPIPELINED IMPLEMENTATION • We considered two types of NoC implementations: • A “Pipelined” communication scheme, where all the nonadjacent PEs communicate in multiple hops step by step. • An “Unpipelined” communication scheme, where the nonadjacent hypercubic neighboring PEs communicate in a single hop with the help of bypass. Pipelined Unpipelined Multihop

EXPERIMENTAL RESULTSPIPELINED VERSUS UNPIPELINED IMPLEMENTATION

EXPERIMENTAL RESULTSENERGY AND TIMING CHARACTERISTICS OF THE PP APPROACH

EXPERIMENTAL RESULTSENERGY AND TIMING CHARACTERISTICS OF THE AD APPROACH

EXPERIMENTAL RESULTSCOMPARATIVE EVALUATION OF PP AND AD SCHEMES

EXPERIMENTAL RESULTSIMPACT OF VARYING STRING SIZES • In order to allow a fair comparison, we varied the string sizes but keeping the total work (i.e., number of cells in the DP table) same. • As the number of rows is decreased and/or the number of columns increased, the total time and energy both decrease.

EXPERIMENTAL RESULTSPARALLEL PREFIX IMPLEMENTATION ON PROTEIN SEQUENCE DATA

EXPERIMENTAL RESULTSDISCUSSION

NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT

NETWORK-ON-CHIP HARDWARE ACCELERATORS FOR BIOLOGICAL SEQUENCE ALIGNMENT

Presentation Transcript

2. Comparing biological sequences: sequence alignment (cont’d)

Network-on-chip

Trees, Stars, and Multiple Biological Sequence Alignment

Sequence Alignment

Network-on-Chip

Sequence Alignment

Biological Sequence Comparison and Alignment

Network-on-Chip

Parallel Hardware for Sequence Comparison and Alignment

A Brief Introduction to Biological Sequence Alignment

Sequence Alignment

Sequence Alignment

2. Comparing biological sequences : sequence alignment

Sequence Alignment

Sequence Alignment

Sequence alignment

Biological Motivation for Multiple Sequence Alignment

Sequence Alignment with Traceback on Reconfigurable Hardware

Sequence Alignment

Sequence Alignment

Sequence Alignment

Sequence Alignment