A Massively Parallel Architecture for Bioinformatics

Presented by Md Jamiul Jahid

Introduction • Bioinformatics algorithms are demanding scientific computing workloads • In general, most bioinformatics algorithms are fairly simple • They deal with huge amounts of data • The size of DNA sequence databases doubles every year

Introduction • A typical DNA sequence contains 3.4 billion base pairs • Most algorithms use only simple operations on the input data, such as • Arithmetic operations • String matching • String comparison
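To illustrate how simple these operations are, here is a minimal Python sketch of naive string matching and string comparison on DNA strings. The function names `find_matches` and `hamming_distance` and the sample sequences are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
def find_matches(sequence: str, pattern: str) -> list[int]:
    """Return every start index where pattern occurs in sequence (naive scan)."""
    if not pattern:
        return []
    hits = []
    for i in range(len(sequence) - len(pattern) + 1):
        # Compare the window starting at i against the pattern.
        if sequence[i:i + len(pattern)] == pattern:
            hits.append(i)
    return hits


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(find_matches("ACGTACGT", "CGT"))
    print(hamming_distance("ACGT", "ACCT"))
```

Both functions use only comparisons and additions, which is the kind of operation mix the slide describes.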

Introduction • Standard CPUs are designed to provide a good instruction mix for almost all commonly used algorithms • For a specific target class of algorithms they are not efficient • Results • High runtime • High energy consumption • High cost

Contribution • Presents a massively parallel architecture • Built from low-cost FPGAs (Field Programmable Gate Arrays) • The authors call it COPACOBANA 5000 • Short for Cost-Optimized Parallel Code Breaker and Analyzer

COPACOBANA 1000 • Built for cryptanalysis: fast code breaking • 120 low-cost FPGAs • 20 subunits • Each subunit holds 6 Xilinx Spartan-3 XC3S1000 FPGAs

COPACOBANA 1000 • Assumptions • Programs are parallelizable • Data transfer demand is low • Every node needs very little local memory, which can be served from the on-chip RAM of the FPGAs
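The assumptions above can be sketched in Python as a workload that splits into independent pieces and returns only a small result per node. This is an illustrative simulation, not the machine's actual programming model: the names `partition`, `count_hits`, and `search` are hypothetical, and the nodes are simulated one after another on a single CPU. The default of 120 nodes mirrors the 120 FPGAs named on the earlier slide.

```python
def partition(records: list[str], nodes: int) -> list[list[str]]:
    """Deal records round-robin into one work list per node."""
    return [records[i::nodes] for i in range(nodes)]


def count_hits(chunk: list[str], pattern: str) -> int:
    """Work done by one node: count the records that contain the pattern."""
    return sum(pattern in record for record in chunk)


def search(records: list[str], pattern: str, nodes: int = 120) -> int:
    """Simulate the nodes sequentially and add up their per-node counts."""
    # Each node sends back a single integer, so the data transfer stays low.
    return sum(count_hits(chunk, pattern) for chunk in partition(records, nodes))


if __name__ == "__main__":
    database = ["ACGT", "TTTT", "GACG"]
    print(search(database, "ACG", nodes=2))
```

No node needs to see another node's records, which is what makes a task of this shape a fit for the stated assumptions.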

COPACOBANA 5000 • Bus Concept • Point-to-point connections between neighboring FPGA cards • Each point-to-point connection contains 8 pairs of wires • Each pair runs at 250 MHz, for a total of 2 Gbit/s
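The 2 Gbit/s figure follows from the wire count and clock rate if each pair carries one bit per clock cycle. That one-bit-per-cycle rate is an assumption made here to reproduce the slide's total; the slide does not state it. A short Python sketch of the arithmetic, with hypothetical constant and function names:

```python
PAIRS_PER_LINK = 8        # wire pairs per point-to-point connection (from the slide)
CLOCK_HZ = 250_000_000    # 250 MHz per pair (from the slide)
BITS_PER_CYCLE = 1        # assumption: one bit per pair per clock cycle


def link_rate_bits_per_s() -> int:
    """Aggregate rate of one point-to-point connection in bit/s."""
    return PAIRS_PER_LINK * CLOCK_HZ * BITS_PER_CYCLE


def transfer_seconds(payload_bits: int) -> float:
    """Time to move a payload over one connection, ignoring protocol overhead."""
    return payload_bits / link_rate_bits_per_s()


if __name__ == "__main__":
    print(link_rate_bits_per_s())
    # Hypothetical payload: 3.4 billion bases at an assumed 2 bits per base.
    print(transfer_seconds(3_400_000_000 * 2))
```

Under these assumptions the link rate is 8 × 250 Mbit/s = 2 Gbit/s, matching the slide. The 2-bits-per-base encoding in the usage example is likewise an illustrative assumption.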

COPACOBANA 5000 • Controller • The root control entity runs on a remote host computer • The host is connected to COPACOBANA 5000 by LAN • Two scenarios • Data on the remote host • Data on COPACOBANA 5000

COPACOBANA 5000 • FPGA Card • Uses the Xilinx Spartan-3 5000 • Each card contains 8 FPGAs • All FPGAs are globally clocked

Performance Estimation • Comparison between • PC • COPACOBANA 1000 • COPACOBANA 5000

Conclusion • The paper proposes new hardware for running bioinformatics algorithms • The hardware is • Cheap • Low in power consumption • Efficient

Questions?

Thank You

Reference • Gerd Pfeiffer, Stefan Baumgart, Jan Schröder, and Manfred Schimmler, A Massively Parallel Architecture for Bioinformatics, 9th International Conference on Computational Science (ICCS 2009).
