Architecture Exploration of FPGA based Accelerators for Bioinformatics

Architecture Exploration of FPGA based Accelerators for Bioinformatics. Thesis Presentation. B. Sharat Chandra Varma Amarnath Shashi Khosla School of Information Technology, varma@cse.iitd.ac.in. Supervisors. Prof. M. Balakrishnan Department of Computer Science and Engineering

Presentation Transcript


  1. Architecture Exploration of FPGA based Accelerators for Bioinformatics
     Thesis Presentation
     B. Sharat Chandra Varma, Amarnath Shashi Khosla School of Information Technology, varma@cse.iitd.ac.in
     Supervisors:
     • Prof. M. Balakrishnan, Department of Computer Science and Engineering, mbala@cse.iitd.ac.in
     • Dr. Kolin Paul, Department of Computer Science and Engineering, kolin@cse.iitd.ac.in

  2. Motivation

  3. Architecture exploration
     [Figure: methodology and high-level models (Bart Kienhuis et al., SAMOS 2002 [24]) applied to an FPGA with accelerator HEBs.]
     Applications:
     • Accelerating protein docking
     • Accelerating de novo genome assembly

  4. Methodology for DSE
     VEB flow, adopted from Chun Hok Ho et al., FCCM 2006 [16].

  5. Application: Sequencing Problem
     • Sample → Sequencer → Reads
     • A large number of short reads of 35-250 bp are generated.
     • There is a need for:
       - mapping the short reads to a reference genome
       - reconstructing the whole genome from the overlap information
     • 0.9 billion bp: 16 h 43 min, 166 GB RAM (Nitin Joshi et al., HiPC 2011 [20]).
     [Figure: Sample → Sequencer → Computer]

  6. Application: de novo genome assembly
     Construct the whole sequence from the reads when the reference genome is not known.
     Sample: … ACTGTGTGTACTGATGTCACTGCTCGATCTATCCTAAGCTGTGATACTGCA …
     Reads (in the order shown in the figure):
     ACTGTGTGTA … CCTAAGCTG
     TGTGTACTGAC … GCTGTGATAC
     CTGATGTCAC ATGATACTGCA
     [Figure: the reads are grouped into two contigs.]

  7. Approach
     Flow: Reads → Velvet (CPU) → Contigs
     Objective:
     • Make the contigs as long as possible by aligning the reads.
     • Reduce the number of contigs.

  8. FPGA based Acceleration
     Flow: Reads → Redundancy Removal Unit (FPGA) → Intermediate Contigs → Velvet (CPU) → Contigs

  9. Different Models and Parameters
     • C-Model using Mapsembler [17]: overall speed-up, quality of output, simulation of large genomes, initial algorithm changes
     • System-C Model: threshold variation, pre-filter design, simulation with HEBs, effect of pre-filter on speedup
     • VHDL Model: clock speed, number of PEs in FPGA, HEB design

  10. Meaning of Terms
      • K-mer of a string: a substring of length K.
      • String: ACATCGTAGACAGTAGTCGATGTCGATC
      • E.g. if K = 11:
        - K-mer 1 = ACATCGTAGAC
        - K-mer 2 = CATCGTAGACA
        - K-mer 3 = ATCGTAGACAG
        - …
        - K-mer N = CGATGTCGATC
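The K-mer definition on this slide can be sketched in a few lines of Python. This is only an illustration; the function name `kmers` is hypothetical and does not come from the presentation.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every length-k substring of sequence, in left-to-right order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A string of length n has n - k + 1 k-mers (none if n < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    example = "ACATCGTAGACAGTAGTCGATGTCGATC"  # string from the slide
    result = kmers(example, 11)
    # First three k-mers and the last one, as listed on the slide.
    print(result[:3], result[-1], len(result))
```

For the 28-base string on the slide and K = 11, this yields 28 - 11 + 1 = 18 K-mers, so N = 18 in the slide's numbering.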

  11. Meaning of Extension
      • Starter: ACATCGTAGACAGTAGTCGATGTCGATC
      • Read: TGGATGATAGCATCGTAGACACA
      • Extend starter:
        - Starter: ACATCGTAGACAGTAGTCGATGTCGATC
        - Read: TGGATGATAACATCGTAGACAGT
      • Extended starter: TGGATGATAACATCGTAGACAGTAGTCGATGTCGATC
      • Extension happens at the edges, either right or left.
      • [Animation: the starter ACATCGTAGACAGTAGTCGATGTCGATC and the read TGGATGATAACATCGTAGACAGT are compared in Cycle 1, Cycle 2, Cycle 3, … up to Cycle 9.]
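A minimal sketch of edge extension, assuming exact matching and assuming that the read is slid one base at a time against the starter edge until the overlapping part agrees. This loosely mirrors the cycle-by-cycle comparison on the slide but is not taken from the thesis design. The names `extend_left`, `extend_right` and `min_overlap` are hypothetical.

```python
from typing import Optional


def extend_left(starter: str, read: str, min_overlap: int = 1) -> Optional[str]:
    """Extend the starter on its left edge with the read, if they overlap exactly."""
    # overhang = number of read bases that stick out to the left of the starter.
    for overhang in range(1, len(read)):
        overlap = read[overhang:]
        if len(overlap) < min_overlap:
            break
        if starter.startswith(overlap):
            return read[:overhang] + starter
    return None


def extend_right(starter: str, read: str, min_overlap: int = 1) -> Optional[str]:
    """Extend the starter on its right edge with the read, if they overlap exactly."""
    # overhang = number of read bases that stick out to the right of the starter.
    for overhang in range(1, len(read)):
        overlap = read[:len(read) - overhang]
        if len(overlap) < min_overlap:
            break
        if starter.endswith(overlap):
            return starter + read[len(read) - overhang:]
    return None


if __name__ == "__main__":
    starter = "ACATCGTAGACAGTAGTCGATGTCGATC"
    read = "TGGATGATAACATCGTAGACAGT"
    print(extend_left(starter, read))
```

With the starter and the second read from the slide, the first match occurs at an overhang of nine bases, and the result equals the extended starter shown on the slide.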

  12. MAPSEMBLER [25] for Assembly: C Model
      • Reads: random reads are used as starters.
      • Hash table (K-mer to fragment and position):
        - Kmer 1: (frag10,pos3)
        - Kmer 2: (frag2,pos3)(frag8,pos1)
        - …
        - Kmer n-1: (frag9,pos3)
        - Kmer n: (frag3,pos3)(frag8,pos1)
      • Steps shown in the figure: Extended Starter, Delete Read, Update Hash.
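The hash table on this slide maps each K-mer to the (fragment, position) pairs where it occurs. The sketch below shows one way such an index could be built and updated in Python. It is an illustration under assumptions, not Mapsembler's actual code: the function names are hypothetical, and fragments are numbered from 0 here.

```python
from collections import defaultdict


def build_kmer_index(reads: list[str], k: int) -> dict[str, list[tuple[int, int]]]:
    """Map every k-mer to the list of (fragment number, position) where it occurs."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for frag, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].append((frag, pos))
    return index


def remove_read(index: dict[str, list[tuple[int, int]]], read: str, frag: int, k: int) -> None:
    """Drop all entries of one read, e.g. after it has been merged into a starter."""
    for pos in range(len(read) - k + 1):
        kmer = read[pos:pos + k]
        entries = index.get(kmer)
        if entries is None:
            continue  # already removed (the k-mer repeats inside the read)
        remaining = [entry for entry in entries if entry[0] != frag]
        if remaining:
            index[kmer] = remaining
        else:
            del index[kmer]


if __name__ == "__main__":
    reads = ["ACGTA", "CGTAC"]
    index = build_kmer_index(reads, 3)
    print(dict(index))
    remove_read(index, reads[0], 0, 3)
    print(dict(index))
```

Looking up a K-mer from a starter edge in this index returns the candidate reads that may extend it; removing a read once it has been used corresponds to the "Delete Read" and "Update Hash" steps in the figure.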

  13. MAPSEMBLER for Assembly
      • Reads: random reads are used as starters.
      • Hash table: Kmer 1: (frag10,pos3); Kmer 2: (frag2,pos3)(frag8,pos1); …; Kmer n-1: (frag9,pos3); Kmer n: (frag3,pos3)(frag8,pos1)
      • Extended Starter: the starter grows as reads are merged into it.
      • Intermediate Contig: a starter that is not extended during a full single round of reads.

  14. Hardware Design
      • Host: supplies the reads.
      • FPGA: three Pre-Filter units, three "Extend if Possible" units, then Further Processing.
      • Input: 2 x 256 bit = read and readVec
      • read contains:
        - 200 bits: ReadString (read length = 100; A=00, C=01, T=10, G=11)
        - 32 bits: read number
        - 1 bit: used for initializing starters in the stream
      • readVec = vector constructed

  15. Pre-Filter Design
      • Construct a 256-bit vector with one bit for each of the 256 possible 4-mers.
      • A bit is set to 1 if the corresponding 4-mer exists in the read; otherwise it is 0.
      • E.g. if the read is AAAAAAAGGGGG:
          4-mer:  AAAA  AAAG  AAAC  AAAT  …  GGGG
          bit:    1     1     0     0     …  1
      • Construct the readVec for each read (pre-processing in hardware).
      • Find the population count of '1's in:
        - readVec AND starterLeftVec, giving popcntL
        - readVec AND starterRightVec, giving popcntR
      • If popcntL > threshold or popcntR > threshold, send the read for extension.
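The pre-filter steps can be sketched in software as follows. The base codes A=00, C=01, T=10, G=11 are the ones given elsewhere in the slides; the mapping from a 4-mer to a bit position (first base in the most significant bits) and the function names are assumptions made for this illustration.

```python
BASE_CODE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}  # codes from the slides


def read_vec(read: str) -> int:
    """Build the 256-bit presence vector of 4-mers as a Python integer."""
    vec = 0
    for i in range(len(read) - 3):
        index = 0
        for base in read[i:i + 4]:
            # Assumed bit order: the first base ends up in the most significant bits.
            index = (index << 2) | BASE_CODE[base]
        vec |= 1 << index  # index is in 0..255
    return vec


def popcount(value: int) -> int:
    """Number of '1' bits in value."""
    return bin(value).count("1")


def passes_prefilter(rv: int, starter_left_vec: int, starter_right_vec: int,
                     threshold: int) -> bool:
    """True if the read shares enough 4-mers with either starter end."""
    popcnt_l = popcount(rv & starter_left_vec)
    popcnt_r = popcount(rv & starter_right_vec)
    return popcnt_l > threshold or popcnt_r > threshold


if __name__ == "__main__":
    rv = read_vec("AAAAAAAGGGGG")
    print(popcount(rv))  # 4-mers present: AAAA, AAAG, AAGG, AGGG, GGGG
```

Because the vector only records which 4-mers occur, not where, the filter is cheap but approximate: a read that passes may still fail the actual extension check.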

  16. Hardware Design
      • A read is represented in a binary-coded format: A="00", C="01", T="10" and G="11".
      • The whole starter is not stored; only its left end and right end are stored, each as long as a read.
      • E.g. for ACTGCTGTGTGTGTGTGTGTGTGATGTACTGCA with read length = 9:
        - starterLeft = ACTGCTGTG
        - starterRight = TGTACTGCA
      • The first time, the read is stored as both starterLeft and starterRight.
      • A precheck tests whether an extension is possible.
      • Shift and extend if possible.
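The 2-bit encoding and the starter-end bookkeeping can be illustrated with a small Python sketch. The packing order (first base in the most significant bits) and all function names are assumptions for this example; the slides only specify the per-base codes and the example values.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}  # codes from the slide
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(read: str) -> int:
    """Pack a read into an integer, two bits per base (assumed: first base is most significant)."""
    value = 0
    for base in read:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack an integer produced by encode() back into a string of the given length."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(BITS_TO_BASE[(value >> shift) & 0b11])
    return "".join(bases)


def starter_ends(starter: str, read_length: int) -> tuple[str, str]:
    """Return (starterLeft, starterRight): the two ends of the starter, each one read long."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return starter[:read_length], starter[-read_length:]


if __name__ == "__main__":
    left, right = starter_ends("ACTGCTGTGTGTGTGTGTGTGTGATGTACTGCA", 9)
    print(left, right)
    print(format(encode(left), "018b"))
```

A read of length 100 needs 2 x 100 = 200 bits in this encoding, which matches the 200-bit ReadString field of the 256-bit input word. When the starter is a single read, both ends returned by `starter_ends` equal that read.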

  17. Speedups: Swinepox Genome (C-model)
      Intel Core 2 Duo E4700 processor running at 2.6 GHz with 4 GB RAM.
      • Speedup increases and then decreases slightly with the number of rounds.
      • I/O dominates after the knee is reached.
      • Speedup is higher with more PEs.

  18. Compression with varying threshold (System-C model)
      • Compression is higher with HEBs.
      • Beyond a threshold of 12, the compression decreases.
      • Multi-FPGA simulations were done using the System-C model (more results in the paper).

  19. HEB design: area and operating frequency (VHDL model)
      • The HEB is a 1-counter, e.g. "11110100" gives 5.
      • 256-bit 1-counter: area = 11983 µm², operating frequency = 185 MHz.
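The function of a 1-counter can be modelled in software as below. The pairwise adder-tree structure is an assumption chosen because it is a common hardware arrangement; the slides do not describe the internal structure of the HEB, and the name `ones_count_tree` is hypothetical.

```python
def ones_count_tree(bits: str) -> int:
    """Count the '1' characters in a bit string by summing neighbours level by level."""
    if any(ch not in "01" for ch in bits):
        raise ValueError("bits must contain only '0' and '1'")
    level = [int(ch) for ch in bits]
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-sized levels so every value has a partner
        # Each pass halves the number of partial sums, like one stage of an adder tree.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


if __name__ == "__main__":
    print(ones_count_tree("11110100"))  # example from the slide
```

For a 256-bit input this model needs eight halving passes, since 2^8 = 256.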

  20. Results: E. coli Genome
      Configurations compared:
      • a: without any HEBs
      • b: with only FIFO controller HEBs
      • c: with only 1-counter HEBs
      • d: with both HEBs
      HEBs reduce the processing time.
