A Computing Origami: Folding Streams in FPGAs. DAC 2009, California, USA. S. M. Farhad, PhD Student, University of Sydney
Outline • Motivation • Stream programming • FPGA • Problem • Stream Folding • Results • Conclusion
Stream Programming Paradigm • Programs expressed as stream graphs • Streams: Sequence of data elements • Actor: Functions applied to streams • Independent actors with explicit communication • Regular and repeating computation [Figure: streams flowing into and out of an actor/filter]
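A minimal Python sketch of the stream model described above: each actor is a function that consumes a fixed number of tokens per firing (pop rate) and produces a fixed number (push rate), and actors communicate only through FIFO channels. `Actor`, `run_pipeline`, and the per-actor repetition counts are hypothetical names chosen for illustration; the repetition counts play the role of `S.runs(f)` in the folding algorithm and are supplied by hand here rather than derived from the graph.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Actor:
    name: str
    pop: int                                  # tokens consumed per firing
    push: int                                 # tokens produced per firing
    work: Callable[[List[int]], List[int]]    # the function applied to the stream


def run_pipeline(actors: List[Actor], inputs: List[int],
                 iterations: int, runs: Dict[str, int]) -> List[int]:
    """Execute a linear pipeline for a number of steady-state iterations.

    runs[name] is how many times that actor fires per iteration (hypothetical,
    chosen by hand so that every FIFO is balanced).
    """
    # fifos[i] feeds actors[i]; fifos[-1] collects the pipeline output.
    fifos = [deque() for _ in range(len(actors) + 1)]
    fifos[0].extend(inputs)
    for _ in range(iterations):
        for i, actor in enumerate(actors):
            for _ in range(runs[actor.name]):
                # Explicit communication: read exactly `pop` tokens from the input FIFO.
                args = [fifos[i].popleft() for _ in range(actor.pop)]
                out = actor.work(args)
                if len(out) != actor.push:
                    raise ValueError(f"{actor.name} violated its push rate")
                fifos[i + 1].extend(out)
    return list(fifos[-1])


double = Actor("double", pop=1, push=1, work=lambda xs: [2 * xs[0]])
pair_sum = Actor("pair_sum", pop=2, push=1, work=lambda xs: [xs[0] + xs[1]])
print(run_pipeline([double, pair_sum], [1, 2, 3, 4], iterations=2,
                   runs={"double": 2, "pair_sum": 1}))   # [6, 14]
```

Because `double` fires twice for every firing of `pair_sum`, the FIFO between them is empty again at the end of each iteration; this rate balance is what makes the computation regular and repeating.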
FPGA • FPGAs are widely available as programmable coprocessors • They offer opportunities for FPGA-based acceleration of multimedia, networking, graphics, and security codes
Problem • Maximize throughput subject to area and latency constraints • Resolve bottleneck actors by replicating them • Replicated filters do not require resynthesis
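To make "bottleneck actor" concrete, here is a simplified Python model. It is an assumption, not the paper's cost model: a filter's work per steady-state iteration is split evenly across its replicas, communication is free, and the slowest filter sets the iteration period, so throughput is the reciprocal of that period. The names `bottleneck`, `work`, and `replication` are hypothetical.

```python
from typing import Dict, Tuple


def bottleneck(work: Dict[str, float],
               replication: Dict[str, int]) -> Tuple[str, float]:
    """Return the filter that limits throughput and the period it imposes.

    Simplified model (assumption): each replica takes an equal share of the
    filter's work, so filter f needs work[f] / replication[f] time units per
    steady-state iteration; the pipeline can go no faster than its slowest filter.
    """
    period = {f: work[f] / replication[f] for f in work}
    worst = max(period, key=period.get)   # ties resolve to the first filter listed
    return worst, period[worst]


work = {"src": 1, "fir": 8, "sink": 2}
print(bottleneck(work, {"src": 1, "fir": 1, "sink": 1}))  # ('fir', 8.0)
print(bottleneck(work, {"src": 1, "fir": 8, "sink": 1}))  # ('sink', 2.0)
```

In this toy model, replicating `fir` eightfold cuts the period from 8 to 2 time units (throughput rises from 1/8 to 1/2 iterations per unit) and moves the bottleneck to `sink`.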
Area/Throughput Design Folding
    foreach filter f in S do
        workFactor[f] = f.latency · S.runs(f)
        designPointArea += f.area · workFactor[f]
    scaleLimit = min over stateful filters f (f.hasState) of 1 / workFactor[f]
    scaling = min(AREA / designPointArea, scaleLimit)
    foreach filter f in S do
        replication[f] = workFactor[f] · scaling
    while area(replication) > AREA do
        replication = reduceThroughput(replication)
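The folding steps above can be sketched in Python. Two pieces are assumptions, since the slide does not specify them: replication counts are rounded up to whole instances (at least one per filter), which is what can push the area back over the budget, and `reduce_throughput` removes one instance from the most-replicated filter. `Filter`, `fold`, `area_budget` (AREA in the pseudocode), and the units of latency and area are likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Filter:
    name: str
    latency: float          # time per firing (hypothetical unit)
    area: float             # area of one instance (hypothetical unit)
    runs: int               # firings per steady-state iteration, S.runs(f)
    has_state: bool = False


def area(filters: List[Filter], replication: Dict[str, int]) -> float:
    return sum(f.area * replication[f.name] for f in filters)


def reduce_throughput(filters: List[Filter],
                      replication: Dict[str, int]) -> Dict[str, int]:
    # Hypothetical policy: drop one instance from the most-replicated filter.
    candidates = [f for f in filters if replication[f.name] > 1]
    if not candidates:
        raise ValueError("design does not fit even without replication")
    victim = max(candidates, key=lambda f: replication[f.name])
    reduced = dict(replication)
    reduced[victim.name] -= 1
    return reduced


def fold(filters: List[Filter], area_budget: float) -> Dict[str, int]:
    work_factor: Dict[str, float] = {}
    design_point_area = 0.0
    for f in filters:
        work_factor[f.name] = f.latency * f.runs
        design_point_area += f.area * work_factor[f.name]
    # scaleLimit keeps workFactor * scaling <= 1 for stateful filters,
    # i.e. they end up with a single instance.
    stateful = [1.0 / work_factor[f.name] for f in filters if f.has_state]
    scale_limit = min(stateful) if stateful else math.inf
    scaling = min(area_budget / design_point_area, scale_limit)
    # Assumption: whole instances only, and every filter keeps at least one.
    replication = {f.name: max(1, math.ceil(work_factor[f.name] * scaling))
                   for f in filters}
    while area(filters, replication) > area_budget:
        replication = reduce_throughput(filters, replication)
    return replication


filters = [Filter("src", latency=1, area=10, runs=1, has_state=True),
           Filter("fir", latency=8, area=20, runs=1),
           Filter("sink", latency=2, area=5, runs=1)]
print(fold(filters, area_budget=200))  # {'src': 1, 'fir': 8, 'sink': 2}
print(fold(filters, area_budget=100))  # {'src': 1, 'fir': 4, 'sink': 2}
```

With a budget of 100, scaling gives 5 copies of `fir` after rounding up (area 120), so the while loop trims one copy to get back within budget.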
Calculating Latency • FPGAs coupled to host processors • Initiation interval (DMA) • Replication improves throughput, but it often increases latency! • Major factors in latency variation: non-periodic data arrival, data-token reordering, local congestion
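The effect of data-token reordering on latency can be illustrated with a small Python simulation. It assumes (hypothetically) that tokens are dealt to replicas round-robin, that each replica processes its tokens in arrival order, and that the join must emit tokens in their original order; the per-token service times are made-up inputs. A token that finishes early on one replica still waits at the join for earlier tokens, which adds latency.

```python
from typing import List


def replicated_latency(arrivals: List[int], service: List[int],
                       replicas: int) -> List[int]:
    """Per-token latency through a replicated filter with an in-order join."""
    free_at = [0] * replicas          # time at which each replica becomes idle
    finish = []
    for i, (arrive, work) in enumerate(zip(arrivals, service)):
        r = i % replicas              # round-robin split (assumption)
        start = max(arrive, free_at[r])
        free_at[r] = start + work
        finish.append(free_at[r])
    # In-order join: token i leaves no earlier than every token before it.
    latencies, released = [], 0
    for arrive, done in zip(arrivals, finish):
        released = max(released, done)
        latencies.append(released - arrive)
    return latencies


print(replicated_latency([0, 1, 2, 3], [4, 1, 4, 1], replicas=2))  # [4, 3, 6, 5]
```

Token 1 needs only 1 time unit of service and finishes at time 2, but it reports a latency of 3 because it waits at the join until token 0 finishes at time 4.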
Latency-constrained design folding
    latConf = null; T = ∞
    while throughput(thrConf) ≤ T do
        if feasibleImprovement(thrConf) then
            candidates = simAnnealing(thrConf, T)
            foreach candidate in candidates do
                if throughput(candidate) < T then
                    latConf = candidate
                    T = throughput(latConf)
        thrConf = reduceThroughput(thrConf)
    return latConf
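A Python transcription of the latency-constrained loop follows. The comparisons (`≤ T`, `< T`, starting from `T = ∞`) only make sense if `throughput(·)` returns a quantity where smaller is better, so this sketch reads it as the iteration period; that reading is an interpretation. `feasibleImprovement`, `simAnnealing`, and `reduceThroughput` are not specified on the slide, so they are passed in as callables, and `max_steps` is a hypothetical guard added so the loop always terminates. The usage example plugs in a brute-force search over a toy model as a stand-in; it is not simulated annealing.

```python
import math
from typing import Callable, Iterable, Optional, TypeVar

C = TypeVar("C")  # a design configuration (e.g., per-filter replication counts)


def latency_constrained_fold(
    thr_conf: C,
    period: Callable[[C], float],                 # throughput(.) read as a period
    feasible_improvement: Callable[[C], bool],
    search: Callable[[C, float], Iterable[C]],    # stands in for simAnnealing
    reduce_throughput: Callable[[C], C],
    max_steps: int = 1000,                        # hypothetical termination guard
) -> Optional[C]:
    lat_conf: Optional[C] = None
    best = math.inf                               # T in the pseudocode
    steps = 0
    while period(thr_conf) <= best and steps < max_steps:
        if feasible_improvement(thr_conf):
            for cand in search(thr_conf, best):
                if period(cand) < best:
                    lat_conf = cand
                    best = period(lat_conf)
        thr_conf = reduce_throughput(thr_conf)
        steps += 1
    return lat_conf


# Toy model (hypothetical): a config is a replication count r; more replicas
# shorten the period but lengthen latency, and the latency bound is 11.
def toy_period(r: int) -> float:
    return 12 / r


def toy_latency(r: int) -> int:
    return 2 + 3 * r


def brute_force(r: int, bound: float) -> list:
    # Every config up to r that meets the latency bound and beats `bound`.
    return [k for k in range(1, r + 1)
            if toy_latency(k) <= 11 and toy_period(k) < bound]


print(latency_constrained_fold(6, toy_period, lambda r: True, brute_force,
                               lambda r: r - 1))  # 3
```

Starting from the throughput-optimal toy configuration (6 replicas, latency 20), the loop settles on 3 replicas, the fastest configuration whose latency stays within the bound, and stops once reducing throughput can no longer beat that period.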