
A Computing Origami: Folding Streams in FPGAs

DAC 2009, California, USA. S. M. Farhad, PhD Student, University of Sydney.


Presentation Transcript


  1. A Computing Origami: Folding Streams in FPGAs. DAC 2009, California, USA. S. M. Farhad, PhD Student, University of Sydney

  2. Outline • Motivation • Stream programming • FPGA • Problem • Stream Folding • Results • Conclusion

  3. Stream Programming Paradigm • Programs expressed as stream graphs • Streams: sequences of data elements • Actors: functions applied to streams • Independent actors with explicit communication • Regular and repeating computation (Figure: Streams → Actor/Filter → Streams)
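
The paradigm on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: the `actor` helper and the `scale` and `offset` actors are hypothetical names chosen for illustration, and generators stand in for the explicit communication channels between actors.

```python
from typing import Callable, Iterable, Iterator


def actor(fn: Callable[[int], int]) -> Callable[[Iterable[int]], Iterator[int]]:
    """Wrap a per-element function as an actor mapping an input stream to an output stream."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for item in stream:
            yield fn(item)  # the same computation repeats for every element
    return run


# Two independent actors (hypothetical examples); each sees only its input stream.
scale = actor(lambda x: 2 * x)
offset = actor(lambda x: x + 1)


def pipeline(stream: Iterable[int]) -> Iterator[int]:
    """A two-node stream graph: stream -> scale -> offset -> stream."""
    return offset(scale(stream))


result = list(pipeline(range(4)))
```

Because each actor touches only its own input and output, a stateless actor such as `scale` could be replicated and fed alternating elements without changing the result, which is the property the folding approach relies on.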

  4. FPGA • FPGAs are widely available as programmable coprocessors • Opportunities to exploit FPGA-based acceleration • Multimedia, networking, graphics, and security codes

  5. Problem • Maximizing throughput subject to area and latency constraints • Resolving bottleneck actors • The replicated filters do not require resynthesis

  6. Motivating Example

  7. Motivating Example

  8. Motivating Example

  9. Outline • Motivation • Stream programming • FPGA • Problem • Stream Folding • Results • Conclusion

  10. Area/Throughput Design Folding

      foreach Filter f in S do
          workFactor[f] = f.latency · S.runs(f);
          designPointArea += f.area · workFactor[f];
      scaleLimit = min over f with f.hasState of (1 / workFactor[f]);
      scaling = min(AREA / designPointArea, scaleLimit);
      foreach Filter f in S do
          replication[f] = workFactor[f] · scaling;
      while area(replication) > AREA do
          replication = reduceThroughput(replication);
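
The folding procedure on this slide can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the slide does not define `area(...)` or `reduceThroughput(...)`, so here the area is assumed to be the sum of per-replica areas with replica counts rounded up to at least one, and throughput is assumed to be reduced by scaling every replication factor by 0.9. The `Filter` fields and the `fold_design` name are likewise hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Filter:
    name: str
    latency: float           # time per firing (unit assumed)
    area: float              # area of one replica (unit assumed)
    runs: int                # firings per steady-state iteration, S.runs(f)
    has_state: bool = False  # stateful filters limit how far the design scales


def total_area(filters, replication):
    # Assumption: replicas are whole units, at least one per filter.
    return sum(f.area * max(1, math.ceil(replication[f.name])) for f in filters)


def fold_design(filters, area_budget):
    if sum(f.area for f in filters) > area_budget:
        raise ValueError("even one replica per filter exceeds the area budget")
    # Work factor: latency times the number of firings per iteration.
    work = {f.name: f.latency * f.runs for f in filters}
    design_point_area = sum(f.area * work[f.name] for f in filters)
    # Stateful filters cap the scaling at 1 / workFactor.
    scale_limit = min(
        (1.0 / work[f.name] for f in filters if f.has_state), default=math.inf
    )
    scaling = min(area_budget / design_point_area, scale_limit)
    replication = {name: w * scaling for name, w in work.items()}
    # Assumed reduceThroughput: shrink all factors by 10% until the design fits.
    while total_area(filters, replication) > area_budget:
        replication = {name: 0.9 * r for name, r in replication.items()}
    return {name: max(1, math.ceil(r)) for name, r in replication.items()}


plan = fold_design(
    [Filter("C", 4, 10, 1, has_state=True), Filter("D", 2, 5, 4)],
    area_budget=200,
)
```

In this example the stateful filter `C` has a work factor of 4, so the scaling is capped at 1/4 and `C` keeps a single replica while the stateless filter `D` receives two.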

  11. Calculating Throughput

  12. Calculating Latency • FPGAs that are coupled to host processors • Initiation interval (DMA) • Replication improves throughput, but it often increases latency • Major factors for latency variation: non-periodic data arrival, data-token reordering, local congestion

  13. Latency constrained design folding

      latConf = null; T = ∞;
      while throughput(thrConf) ≤ T do
          if feasibleImprovement(thrConf) then
              candidates = simAnnealing(thrConf, T);
              foreach candidate in candidates do
                  if throughput(candidate) < T then
                      latConf = candidate;
                      T = throughput(latConf);
          thrConf = reduceThroughput(thrConf);
      return latConf
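
The control flow of this slide can be sketched in Python with the problem-specific pieces passed in as callables. Several points are assumptions rather than facts from the talk: the slide accepts a candidate when its value is below T, so the sketch reads `throughput(...)` as a lower-is-better figure (for example, cycles per item) and calls it `cost`; the `search` callable stands in for `simAnnealing` and is not a simulated-annealing implementation; and the `None` check that stops the loop when no further reduction exists is an added guard. The toy model at the bottom (one filter, integer replica counts, a latency bound of 6) is purely hypothetical.

```python
import math


def latency_constrained_fold(thr_conf, cost, feasible_improvement, search, reduce_throughput):
    """Mirror the slide's loop: start from the throughput-optimal configuration
    and keep the best candidate that the search reports as latency-feasible."""
    lat_conf = None
    best = math.inf  # T in the slide
    while thr_conf is not None and cost(thr_conf) <= best:
        if feasible_improvement(thr_conf):
            for candidate in search(thr_conf, best):
                if cost(candidate) < best:
                    lat_conf = candidate
                    best = cost(candidate)
        thr_conf = reduce_throughput(thr_conf)
    return lat_conf


# Toy model (hypothetical): a configuration is a replica count r.
LATENCY_BOUND = 6


def cost(r):
    return 12 / r  # cycles per item; more replicas means lower cost


def latency(r):
    return 2 + r  # latency grows with replication in this toy model


def search(r, best):
    # Stand-in for simAnnealing: propose r itself if it meets the latency bound.
    return [r] if latency(r) <= LATENCY_BOUND else []


def reduce_throughput(r):
    return r - 1 if r > 1 else None


result = latency_constrained_fold(8, cost, lambda r: True, search, reduce_throughput)
```

Starting from eight replicas, the loop steps down until four replicas satisfy the latency bound, records that configuration, and stops once the next reduced configuration can no longer beat it.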

  14. Results

  15. Questions?
