Rapid Estimation of Power Consumption for Hybrid FPGAs Chun Hok Ho¹, Philip Leong², Wayne Luk¹, Steve Wilton³ ¹ Department of Computing, Imperial College London ² Department of Computer Science and Engineering, Chinese University of Hong Kong ³ Department of Electrical and Computer Engineering, University of British Columbia 9 September 2008
Overview 1. Motivation 2. Contributions 3. Related Work 4. Rapid Power Estimation Flow 5. Technology Mapper 6. Evaluation 7. Future work + Conclusion
Motivation • For a new hybrid FPGA architecture • How do we assess power dissipation rapidly? • How do we map applications onto such an architecture effectively?
Contributions • High-level power estimation flow • Estimates power using various vendor toolchains and techniques • Hybrid FPGA technology mapper • Produces a netlist/bitstream from a dataflow graph (DFG)
Related work: Hybrid FPGA architecture [1] D=9, M=4, R=3, F=3, 2 add, 2 mul: best density over benchmarks [1] C. Ho et al., “Domain-Specific Hybrid FPGA: Architecture and Floating Point Applications”, FPL 2007
Related work: Virtual Embedded Blocks (VEB) [1] • Dummy blocks are used to model a coarse-grained block’s area and delay • A timing analyzer can then determine the hybrid’s performance (including fine-to-coarse routing and delays) [1] C. Ho et al., “Virtual Embedded Blocks: A Methodology for Evaluating Embedded Elements in FPGAs”, FCCM 2006
Power estimation flow • Several toolchains are involved • VEB modelling flow • FPGA power spreadsheet model • ASIC power compiler flow • Limitations • Dynamic power consumption only (power loss due to switching activity) • A constant activity rate is assumed • Core only – no I/O power is assessed • First-order estimation • A simulation-based model is required for accurate results
Power estimation flow • Pall – total power dissipation • Pfgu – power dissipated in the fine-grained units (FGU) • Pcgu – power dissipated in the coarse-grained units (CGU) • Pr – power dissipated in the routing between FGU and CGU
Power estimation flow (Pfgu) • Synthesize the circuit with the VEB flow • Estimate the power of the circuit with the spreadsheet approach (P') • A constant activity rate of 12.5% is applied • Estimate the power of the VEB with the spreadsheet approach (Pveb) • Pfgu = P' - Pveb
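The subtraction on this slide, together with the four terms defined on the previous one, can be sketched in a few lines of Python. This is only an illustration: the slides do not spell out how Pall is composed, so the plain sum in `total_power` is an assumption, as are the function names and the milliwatt figures in the usage lines. Since Pr is modelled through the output loading applied when estimating Pcgu, the sketch lets Pr default to zero for the case where it is already folded into Pcgu.

```python
def fgu_power(p_circuit_mw, p_veb_mw):
    """Pfgu = P' - Pveb: remove the dummy VEB's share from the spreadsheet total."""
    if p_veb_mw > p_circuit_mw:
        raise ValueError("VEB power cannot exceed the power of the whole circuit")
    return p_circuit_mw - p_veb_mw


def total_power(p_fgu_mw, p_cgu_mw, p_r_mw=0.0):
    """Assumed composition: Pall = Pfgu + Pcgu + Pr.

    Leave p_r_mw at 0.0 when the routing power is already included in
    Pcgu through the calibrated output loading.
    """
    return p_fgu_mw + p_cgu_mw + p_r_mw


# Hypothetical numbers, for illustration only.
p_fgu = fgu_power(p_circuit_mw=120.0, p_veb_mw=45.0)
p_all = total_power(p_fgu, p_cgu_mw=30.0)
print(p_fgu, p_all)
```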
Power estimation flow (Pcgu) • Synthesize the coarse-grained unit with the ASIC flow • Configure the ASIC netlist with the bitstream • Apply a constant activity rate to all nets • Estimate the dynamic power with the power compiler tool
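To make the "constant activity rate on all nets" step concrete, here is a first-order sketch of the standard dynamic power relation, P = activity × C × Vdd² × f, summed over nets. It is not the power compiler's actual model, which works from characterised cell libraries; the function name, the net capacitances, the supply voltage and the clock frequency are all hypothetical. The 12.5% default is borrowed from the Pfgu step, since the slides do not state the rate used here. Some texts include a factor of 1/2 depending on how activity is defined.

```python
def dynamic_power_w(net_caps_f, vdd_v, freq_hz, activity=0.125):
    """First-order dynamic power with one activity rate shared by all nets.

    net_caps_f: switched capacitance of each net, in farads.
    Returns power in watts.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    total_cap = sum(net_caps_f)
    return activity * total_cap * vdd_v ** 2 * freq_hz


# Hypothetical CGU with two nets of 1 pF and 2 pF at 1.5 V and 100 MHz.
print(dynamic_power_w([1e-12, 2e-12], vdd_v=1.5, freq_hz=100e6))
```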
Power estimation flow (Pr) • Pr can be modelled by applying a suitable output loading when estimating Pcgu • The output loading can be calibrated against an existing embedded block • The embedded multiplier blocks in Virtex II are used for calibration
Power estimation flow (Pr) • Estimate the power of the multiplier in the FPGA using the spreadsheet (Pem) • Implement a multiplier in the ASIC flow • Estimate the power of the ASIC multiplier (Pam) • Adjust the loading capacitance (CL) such that Pam ~= Pem • Apply CL when estimating Pcgu
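The calibration loop can be sketched as a one-dimensional search for CL. The slides do not say how CL is adjusted; bisection is used here as one plausible choice, assuming Pam grows monotonically with CL. The callable `pam_of_cl` stands in for a rerun of the ASIC power estimation at a given loading, and the toy model and all numbers in the usage lines are hypothetical.

```python
def calibrate_load(pam_of_cl, pem, cl_lo, cl_hi, rel_tol=1e-3, max_iter=100):
    """Find CL in [cl_lo, cl_hi] such that pam_of_cl(CL) ~= pem.

    Assumes pam_of_cl is monotonically increasing in CL.
    """
    if not pam_of_cl(cl_lo) <= pem <= pam_of_cl(cl_hi):
        raise ValueError("target power Pem is not bracketed by [cl_lo, cl_hi]")
    for _ in range(max_iter):
        cl_mid = 0.5 * (cl_lo + cl_hi)
        pam = pam_of_cl(cl_mid)
        if abs(pam - pem) <= rel_tol * pem:
            return cl_mid
        if pam < pem:
            cl_lo = cl_mid  # ASIC estimate too low: increase the loading
        else:
            cl_hi = cl_mid  # ASIC estimate too high: decrease the loading
    return 0.5 * (cl_lo + cl_hi)


# Toy stand-in for the ASIC flow: 2 pF internal capacitance plus the load.
def toy_pam(cl_f):
    return 0.125 * (2e-12 + cl_f) * 1.5 ** 2 * 100e6


pem = toy_pam(5e-12)  # pretend the spreadsheet reported this value
print(calibrate_load(toy_pam, pem, cl_lo=0.0, cl_hi=1e-10))
```

The resulting CL would then be reused as the output loading for the coarse-grained units, as the last bullet describes.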
Technology mapper • A tool for producing a netlist/bitstream from a high-level description • Reuses an existing C-to-gate compiler • CHiMPS [1] • Trident [2] • fly [3] • Only the backend is different – the technology mapper [1] A. Putnam, et al., “CHiMPS: A C-Level Compilation Flow for Hybrid CPU-FPGA Architectures”, FPL 2008 [2] J. Tripp, et al., “Trident: An FPGA Compiler Framework for Floating-Point Algorithms”, FPL 2005 [3] C. Ho, et al., “Fly - A Modifiable Hardware Compiler”, FPL 2002
Technology mapper • Greedy algorithm • Not optimal but effective in most cases • Pack as many operations as possible into a single coarse-grained unit • No suitable block – use soft core • Coarse-grained units used up – use soft core
Mapping example fadd tmp1, a, b fadd tmp2, c, d
Mapping example fmul tmp3, tmp1, tmp2
Mapping example fsqrt tmp4, tmp3 • No dedicated square-root block, so use the fine-grained unit
Mapping example fmul z, tmp4, g • Instantiate another coarse-grained unit and connect everything together
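The mapping example can be reproduced with a small greedy sketch. This is not the authors' mapper: the capacity table is taken from the "2 add, 2 mul" architecture on the related-work slide, the function and variable names are hypothetical, and the rule that a soft-core operation closes the current coarse-grained unit is an assumption chosen so that the result matches the slides. Operand connectivity is ignored for brevity.

```python
# Assumed per-CGU resources ("2 add, 2 mul").
CGU_CAPACITY = {"fadd": 2, "fmul": 2}


def greedy_map(dfg, max_cgus):
    """Map (opcode, dest, *srcs) tuples, given in topological order.

    Returns a list of (dest, target), where target is 'cgu<k>' or 'soft'.
    """
    placement = []
    free = None      # remaining resources of the CGU currently being packed
    cgu_count = 0
    for opcode, dest, *_srcs in dfg:
        if opcode not in CGU_CAPACITY:
            # No suitable block: fall back to soft logic.
            placement.append((dest, "soft"))
            free = None  # assumption: later ops start a fresh CGU
            continue
        if free is None or free[opcode] == 0:
            if cgu_count == max_cgus:
                # Coarse-grained units used up: fall back to soft logic.
                placement.append((dest, "soft"))
                free = None
                continue
            free = dict(CGU_CAPACITY)
            cgu_count += 1
        free[opcode] -= 1
        placement.append((dest, f"cgu{cgu_count - 1}"))
    return placement


example = [
    ("fadd", "tmp1", "a", "b"),
    ("fadd", "tmp2", "c", "d"),
    ("fmul", "tmp3", "tmp1", "tmp2"),
    ("fsqrt", "tmp4", "tmp3"),
    ("fmul", "z", "tmp4", "g"),
]
for dest, target in greedy_map(example, max_cgus=2):
    print(dest, "->", target)
```

With two units available, the first three operations share one coarse-grained unit, `fsqrt` goes to the fine-grained fabric, and the final `fmul` gets a second unit. With `max_cgus=1` the final `fmul` falls back to soft logic instead.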
Evaluation • How effective is the technology mapper? • Compare with optimal mapping • How much power/energy can be saved by introducing coarse-grained units? • Compare with existing FPGA devices
Evaluation • 8 benchmark circuits • DSP computation kernels: e.g. bfly • Linear algebra: e.g. mm3 • Complete application: e.g. bgm • Synthetic benchmark: e.g. syn2 • Circuits are mapped to the hybrid FPGA using the technology mapper • Synthesized to Xilinx Virtex II devices for comparison
Evaluation: Power reduction * syn7 is implemented on XC2V8000-5
Evaluation: Energy reduction • Energy reduced by 14 times on average
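Energy is power multiplied by execution time, so for a single circuit the energy reduction factor is the product of the power reduction and the reduction in execution time. The sketch below applies that identity to the averages reported in this deck (power reduced by 4 times, energy by 14 times). Averages over benchmarks do not compose exactly, so the result is only a rough indication, and the function name is hypothetical.

```python
def implied_delay_reduction(power_reduction, energy_reduction):
    """E = P * t, so energy_reduction = power_reduction * delay_reduction."""
    if power_reduction <= 0:
        raise ValueError("power_reduction must be positive")
    return energy_reduction / power_reduction


# Reported averages: power 4x, energy 14x.
print(implied_delay_reduction(4.0, 14.0))
```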
Future work • Integration of the technology mapper into an existing compiler • Trident, fly • Simulation-based power estimation flow for more accurate results • Power estimation comparison with the HHVPR [1] flow • Static power consumption? [1] N. Choy, et al., “Activity-Based Power Estimation and Characterization of DSP and Multiplier Blocks in FPGAs”, FPT 2006
Conclusion • Rapid power estimation flow for hybrid FPGAs • VEB flow, FPGA power spreadsheet, ASIC power compiler • Technology mapper for hybrid FPGAs • Targets different coarse-grained units • DFG input for compatibility with existing compilers • Produces netlist and bitstream • Assessed hybrid FPGA power consumption • Power reduced by 4 times • Energy reduced by 14 times