Enhancing FPGA Performance for Arithmetic Circuits
Philip Brisk¹, Ajay K. Verma¹, Paolo Ienne¹, Hadi Parandeh-Afshar¹,²
² University of Tehran
¹ Department of Electrical and Computer Engineering
Outline
• State of the Art: FPGAs
• Proposed Solution
  • Field Programmable Counter Array (FPCA)
  • New Lattice for Accelerating Arithmetic Computations
  • Integrate on Same Die as FPGA
• Experimental Results
• Conclusion
FPGA vs. ASIC
Criterion            ASIC   FPGA
Performance           √
Area Utilization      √
Power Consumption     √
Flexibility                  √
Time-to-Market               √
FPGA Commentary
• Poor Performance for Arithmetic Operations Compared to ASIC
• IP Cores
  • Limited Flexibility; 18-bit Adder/Multiplier
• Full Adder Implemented in CLB Structure
• Fast Carry-Chain (Xilinx and Altera)
  • Reduces Routing Delay
• Cannot Use Compressor Trees to Add k>2 Values
  • Wallace/Dadda/3-Greedy
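To make concrete what a compressor tree offers over a chain of carry-propagate adders, here is a minimal Python sketch of word-level carry-save addition. It is an illustration only, not code from the presentation; the names `carry_save_add` and `compressor_tree_sum` are hypothetical, and operands are assumed to be non-negative integers.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to two with the same total.

    No carry propagates between bit positions: each position is an
    independent full adder (3:2 counter).
    """
    sum_bits = a ^ b ^ c                              # per-bit sum
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, one weight up
    return sum_bits, carry_bits


def compressor_tree_sum(values):
    """Add k non-negative integers with carry-save steps plus one final adder."""
    rows = list(values)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]      # three rows in, two rows out
    return sum(rows)                  # the only carry-propagate addition


if __name__ == "__main__":
    print(compressor_tree_sum([3, 5, 7, 9]))
```

Each carry-save step removes one operand without rippling a carry across the word, so only the last addition needs a fast carry chain.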
Proposed Solution
• Transform a DFG to Expose Multi-Input Addition Ops
  • [Verma and Ienne, ICCAD ’04]
• Map Addition Ops onto New Lattice (FPCA)
  • Proposed Here
• Map Everything Else onto Traditional FPGA
  • Standard Approach
• Integrate FPGA+FPCA Onto Same Die
  • Ongoing Research at EPFL
Verma-Ienne Transformation [ICCAD ’04]
[Figure: ADPCM data-flow graph before and after the transformation. Before: steps 0 to 3 combine shift (>>), AND (&), compare (=), select (SEL) and add (+) operations on delta and step to produce vpdiff. After: the additions are merged into a single compressor tree (∑) followed by one final adder (+) producing vpdiff.]
Proposed Hybrid Lattice
[Figure: FPGA and FPCA lattices; the compressor tree (∑) on the FPCA feeds a final adder (+), implemented as programmable IP or on the FPGA.]
• FPCA: Field Programmable Counter Array
• Novel Lattice for Accelerating Large Sums
Counters
• Counters You Know
  • 2:2 – Half Adder
  • 3:2 – Full Adder (Carry-Save Adder)
• m:n Counter
  • Counts the Number of Input Bits Set to 1
  • Outputs the Count as an n-bit Binary Value
  • n = ⌈log2(m+1)⌉
• The Correct Building Block for Computing Sums of k>2 Numbers
• Better than LUTs!
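The behaviour of an m:n counter can be sketched in a few lines of Python. This is a behavioural model under stated assumptions, not a hardware description from the presentation; the function names `counter_output_width` and `counter` are hypothetical. For m ≥ 1 the output width ⌈log2(m+1)⌉ equals the bit length of m, which the sketch uses to avoid floating-point logarithms.

```python
def counter_output_width(m):
    """Number of output bits n of an m:n counter, n = ceil(log2(m + 1))."""
    return m.bit_length()


def counter(bits):
    """Behavioural model of an m:n counter.

    Takes m input bits (each 0 or 1) and returns the number of ones as
    n output bits, least significant bit first.
    """
    m = len(bits)
    n = counter_output_width(m)
    count = sum(bits)
    return [(count >> i) & 1 for i in range(n)]


if __name__ == "__main__":
    print(counter([1, 1]))                 # 2:2 counter (half adder): [sum, carry]
    print(counter([1, 1, 1]))              # 3:2 counter (full adder): [sum, carry]
    print(counter([1, 0, 1, 1, 0, 1, 1]))  # 7:3 counter
```

The half adder and full adder fall out as the m = 2 and m = 3 cases, which is why larger counters are a natural generalization for summing many bits of equal weight.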
Field Programmable Counter Array (FPCA)
• Same Structure as an FPGA
• Replace CLBs with Counters
• Integrate onto Same Die as FPGA
[Figure: FPGA lattice of CLBs beside FPCA lattice of counters]
Experimental Methodology
• Xilinx Virtex-4, Altera Stratix-II, With/Without FPCA
  • 90nm CMOS Technology
• For Multi-Input Addition Ops
  • FPGA – Adder Tree
    • Binary Adders in Virtex-4
    • Ternary Adders in Stratix-II
  • FPCA – Build Compressor Trees From Counters
    • Use Modified Wallace Algorithm
    • Place-and-Route Using VPR
    • Use FPGA for Final Addition
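The idea of building a compressor tree from counters can be sketched as column compression in Python. This is a plain Wallace-style reduction written for illustration; it is not the modified Wallace algorithm used in the experiments, whose details the slides do not give. The names `to_columns`, `compress` and `final_add`, and the parameter `m` (largest counter available), are assumptions made for the sketch.

```python
def to_columns(operands, width):
    """columns[w] holds the bit of weight 2**w from every operand."""
    return [[(x >> w) & 1 for x in operands] for w in range(width)]


def compress(columns, m=3):
    """Reduce every column to at most two bits using counters of up to m inputs.

    Returns the reduced columns and the number of counter stages used.
    """
    if m < 3:
        raise ValueError("counters need at least 3 inputs to shrink a column")
    stages = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + m.bit_length())]
        for w, col in enumerate(columns):
            for i in range(0, len(col), m):
                group = col[i:i + m]
                if len(group) < 3:
                    nxt[w].extend(group)        # too few bits: pass through
                    continue
                count = sum(group)              # what the counter computes
                for j in range(len(group).bit_length()):
                    nxt[w + j].append((count >> j) & 1)
        while nxt and not nxt[-1]:
            nxt.pop()                           # drop unused high columns
        columns = nxt
        stages += 1
    return columns, stages


def final_add(columns):
    """Stand-in for the final carry-propagate adder on the two remaining rows."""
    return sum(bit << w for w, col in enumerate(columns) for bit in col)


if __name__ == "__main__":
    operands = [13, 7, 250, 99, 1, 64, 200, 31]
    for m in (3, 7):
        cols, stages = compress(to_columns(operands, 8), m=m)
        print(m, stages, final_add(cols), sum(operands))
```

Every counter with three or more inputs emits fewer bits than it consumes, so the total bit count falls at each stage and the loop terminates with two rows left for the final adder.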
Experimental Results
[Figure: delay (ns) charts for Virtex-4 and Stratix-II]
Experimental Results
[Figure: charts for Virtex-4 and Stratix-II]
• FPCA – Register Placed on Every Counter Output
Conclusion
• FPGA Performance for Arithmetic Circuits is Lacking
• Hybrid FPGA/FPCA Accelerates Arithmetic Circuits
• Significant Improvement in Area Utilization
  • Counters are the Correct Building Blocks for Multi-Input Additions
• Marginal Improvements in Delay
  • FPGA – Fast Carry-Chain (No Routing Delay)
  • FPCA – All Wires Have Routing Delays
  • Naïve/No Retiming in These Experiments