
Hardware Acceleration of Parallel Prefix Algorithms



Presentation Transcript


  1. Hardware Acceleration of Parallel Prefix Algorithms Peter Scott (Project leader) Avinash Srinivasa Vaibhav Sundriyal

  2. What is parallel prefix? • Finding parallelism in serial-looking problems. • Take an array, like [1, 3, 2, 1]. • Find its partial sums: [1, 1+3, 1+3+2, 1+3+2+1] = [1, 4, 6, 7]. • Any associative operation works, not just addition. • Matrix multiplication is okay (associative, though not commutative). • Vector dot product doesn’t work (it maps two vectors to a scalar, so its results cannot be combined again).
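A minimal Python sketch of the serial definition on slide 2, assuming an inclusive scan (output element i combines input elements 0 through i, in order). The helper names `inclusive_scan` and `matmul2` are illustrative, not part of the project. The matrix case multiplies copies of the Fibonacci matrix to show that a non-commutative operator is fine as long as the operands stay in order.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Mat2 = Tuple[Tuple[int, int], Tuple[int, int]]

def inclusive_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Serial prefix: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    out: List[T] = []
    for x in xs:
        # Keep the running result on the left so non-commutative ops stay ordered.
        out.append(x if not out else op(out[-1], x))
    return out

def matmul2(a: Mat2, b: Mat2) -> Mat2:
    """2x2 integer matrix product (associative, not commutative)."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

print(inclusive_scan([1, 3, 2, 1], lambda a, b: a + b))  # [1, 4, 6, 7]

fib: Mat2 = ((1, 1), (1, 0))
print(inclusive_scan([fib] * 5, matmul2)[-1])  # ((8, 5), (5, 3))
```

Any operator passed as `op` only needs to be associative; the parallel versions on the later slides rely on exactly that property to regroup the work.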

  3. Applications • DNA sequence alignment • Large tree data structure acceleration • Incremental regular expression matching • Many others: the accelerator is parameterized by its operator kernel.
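One common way to fit regular expression matching into the prefix framework (an assumption here; the slides do not say how the project did it) is to compile the pattern to a DFA and map each input character to the DFA's transition function. Composing transition functions is associative, so a scan over them gives the DFA state after every prefix of the input. The toy DFA for "contains `ab`" and all names below are hypothetical, and the sketch shows only the associative kernel, not the incremental update.

```python
from typing import Dict, List, Tuple

Transition = Tuple[int, ...]  # Transition[s] = next state when in state s

# Hypothetical toy DFA for "input contains the substring 'ab'":
# state 0 = nothing matched, 1 = just saw 'a', 2 = saw "ab" (accepting, absorbing).
N_STATES = 3
DELTA: Dict[str, Transition] = {
    "a": (1, 1, 2),
    "b": (0, 2, 2),
}
ACCEPTING = {2}

def compose(f: Transition, g: Transition) -> Transition:
    """Apply f, then g. Function composition is associative, so it can drive a scan."""
    return tuple(g[f[s]] for s in range(len(f)))

def match_prefixes(text: str, start: int = 0) -> List[bool]:
    """For every prefix of text, report whether the DFA accepts it."""
    result: List[bool] = []
    acc: Transition = tuple(range(N_STATES))  # identity function
    for ch in text:
        # Serial scan for clarity; a parallel scan using `compose` yields the same values.
        acc = compose(acc, DELTA[ch])
        result.append(acc[start] in ACCEPTING)
    return result

print(match_prefixes("bab"))  # [False, False, True]
```

Each scan element is a whole function (a tuple of N_STATES entries), so the kernel's data width grows with the DFA size.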

  4. Parallel version of this • Distribute data to several processors. • Do redundant computations to get parallelism. Image taken from Steele & Hillis, 1986.
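The redundant-computation idea on slide 4 can be sketched as a lockstep simulation of a Hillis and Steele style scan, with one virtual processor per element. This is an illustrative software model, not the project's hardware schedule; the function name is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def hillis_steele_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive scan in ceil(log2 n) synchronous steps.

    In each step, element i combines with element i - d, and d doubles every
    step. Every update reads the previous step's values, as if one processor
    per element worked in lockstep.
    """
    vals = list(xs)
    d = 1
    while d < len(vals):
        # Earlier operand on the left keeps non-commutative operators correct.
        vals = [op(vals[i - d], v) if i >= d else v for i, v in enumerate(vals)]
        d *= 2
    return vals

print(hillis_steele_scan([1, 3, 2, 1], lambda a, b: a + b))  # [1, 4, 6, 7]
```

The scan finishes in O(log n) steps but performs O(n log n) operations instead of the serial O(n); that extra, overlapping work is the redundancy the slide refers to.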

  5. Architecture • Several processors, shared multi-channel bus

  6. Example: four processors (P1 through P4), two elements each
     Step          P1     P2      P3      P4
     (initial)     1,2    3,4     5,6     7,8
     OPERATE       1,3    3,7     5,11    7,15
     COMMUNICATE   1,3    3,7     5,11    7,15
     UPDATE        1,3    6,10    5,11    18,26
     COMMUNICATE   1,3    6,10    5,11    18,26
     UPDATE        1,3    6,10    15,21   28,36
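The sketch below replays the four-processor example on slide 6 in software. The communication pattern (in each round, processors in the upper half of a group add the running total held by the last processor of the lower half) is inferred from the numbers in the figure; the function name and the power-of-two processor count are assumptions.

```python
from typing import List

def block_prefix_sums(blocks: List[List[int]]) -> List[List[int]]:
    """Local scan per processor, then log2(p) COMMUNICATE/UPDATE rounds.

    Assumes p = len(blocks) is a power of two. With half = 1, 2, 4, ...,
    processors are grouped in spans of 2 * half; each processor in the upper
    half of a span adds the total of the last processor in the lower half.
    """
    p = len(blocks)
    # OPERATE: each processor computes prefix sums over its own block.
    state: List[List[int]] = []
    for blk in blocks:
        run, acc = [], 0
        for x in blk:
            acc += x
            run.append(acc)
        state.append(run)
    half = 1
    while half < p:
        # COMMUNICATE: snapshot block totals so all updates use this round's values.
        totals = [blk[-1] for blk in state]
        # UPDATE: upper-half processors add the lower half's running total.
        for i in range(p):
            if (i // half) % 2 == 1:
                sender = (i // half) * half - 1  # last processor of the lower half
                state[i] = [v + totals[sender] for v in state[i]]
        half *= 2
    return state

print(block_prefix_sums([[1, 2], [3, 4], [5, 6], [7, 8]]))
# [[1, 3], [6, 10], [15, 21], [28, 36]]
```

After the first round the intermediate state is [[1, 3], [6, 10], [5, 11], [18, 26]], matching the first UPDATE row of the example.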

  7. Bus contention • There are often more processors than bus channels. • How to deal with contention? • Answer: pre-computed static scheduling. • Store schedule as sequence of instructions: • Write <channel> • Load <channel> • No_op • Comm_step_complete
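A hypothetical sketch of the static scheduling on slide 7: a first-fit scheduler that packs one communication step's transfers onto the available channels and emits one instruction stream per processor, using the instruction names from the slide. The bus model is an assumption (a transfer occupies one channel for one cycle, one writer may be read by several loaders, and a processor performs at most one bus operation per cycle), and the slides do not say which scheduling algorithm the project actually used.

```python
from typing import Dict, List, Tuple

Transfer = Tuple[int, List[int]]  # (sender, receivers), processors numbered from 0

def schedule_step(
    transfers: List[Transfer], n_procs: int, n_channels: int
) -> List[List[str]]:
    """First-fit static schedule for one communication step (hypothetical model)."""
    cycles: List[Dict[int, str]] = []  # per bus cycle: processor -> instruction
    used: List[int] = []               # channels already taken in each cycle
    for sender, receivers in transfers:
        procs = [sender, *receivers]
        # Earliest cycle with a free channel and none of these processors busy.
        c = 0
        while c < len(cycles) and (
            used[c] == n_channels or any(p in cycles[c] for p in procs)
        ):
            c += 1
        if c == len(cycles):  # contention: open a new bus cycle
            cycles.append({})
            used.append(0)
        ch = used[c]
        used[c] += 1
        cycles[c][sender] = f"Write {ch}"
        for r in receivers:
            cycles[c][r] = f"Load {ch}"
    # Idle processors wait with No_op; every program ends the step explicitly.
    return [
        [cyc.get(p, "No_op") for cyc in cycles] + ["Comm_step_complete"]
        for p in range(n_procs)
    ]

# First round of slide 6 (P1 -> P2, P3 -> P4) on a single-channel bus.
for prog in schedule_step([(0, [1]), (2, [3])], n_procs=4, n_channels=1):
    print(prog)
# ['Write 0', 'No_op', 'Comm_step_complete']
# ['Load 0', 'No_op', 'Comm_step_complete']
# ['No_op', 'Write 0', 'Comm_step_complete']
# ['No_op', 'Load 0', 'Comm_step_complete']
```

With `n_channels=2` both transfers fit in one cycle on channels 0 and 1, so each program shrinks to a single bus instruction followed by `Comm_step_complete`.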

  8. How to use the final product • Write VHDL for an associative binary operation, like addition or multiplication. • Say how many processors you want, how wide your data are, how many bus channels, etc. • A wizard generates all the VHDL. • Just customize it and go.
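Before writing the VHDL kernel, one might model the operator in software at the chosen data width and spot-check that it is associative, since the scan is only correct for associative operators. This Python sketch is an assumption, not part of the project's tool flow; `wrapping_add` and `looks_associative` are hypothetical helpers. Truncating to a fixed width keeps addition associative (it becomes addition modulo 2^width), while an operator such as subtraction fails the check.

```python
import random
from typing import Callable

Op = Callable[[int, int], int]

def wrapping_add(width: int) -> Op:
    """Unsigned addition truncated to `width` bits, like a fixed-width adder."""
    mask = (1 << width) - 1
    return lambda a, b: (a + b) & mask

def looks_associative(op: Op, width: int, trials: int = 10_000, seed: int = 0) -> bool:
    """Randomized spot check of (a op b) op c == a op (b op c) on width-bit inputs.

    A counterexample proves the operator is NOT associative; passing only
    means no counterexample was found in the sampled inputs.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randrange(1 << width) for _ in range(3))
        if op(op(a, b), c) != op(a, op(b, c)):
            return False
    return True

print(looks_associative(wrapping_add(8), 8))             # True
print(looks_associative(lambda a, b: (a - b) & 0xFF, 8))  # False
```

A random test cannot prove associativity, so for a new kernel it complements, rather than replaces, a short algebraic argument.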

  9. Core generator wizard

  10. Core generator wizard

  11. Generates processor code…

  12. …and various supporting files • Bus program memory holds bus instructions • Prefix accelerator instantiates processors and bus • Etc.

  13. Related Papers • Explanation of parallel prefix and DNA sequence alignment (Aluru): http://class.ece.iastate.edu/cpre526/basics.pdf • Data parallel algorithms (Steele and Hillis): http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf • Prefix sums and their applications (Blelloch): http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf • Finger trees (Hinze & Paterson): http://www.soi.city.ac.uk/~ross/papers/FingerTree.pdf
