
Closed-Loop Modeling of Power and Temperature Profiles of FPGAs



Presentation Transcript


  1. Closed-Loop Modeling of Power and Temperature Profiles of FPGAs • Kanupriya Gulati, Sunil P. Khatri, Peng Li • Department of ECE, Texas A&M University, College Station

  2. Introduction • Due to the increasing density of FPGAs, power is now a zeroth-order design constraint • During operation, the two components of power consumption are • Dynamic power • Temperature independent • Static power • Gate leakage • Largely temperature independent • Sub-threshold leakage • Exponential dependence on junction temperature • Leakage power raises the temperature, which in turn raises leakage power; this positive feedback loop could cause • Non-convergence (thermal runaway) • Convergence above a safe junction temperature (thermal breakdown) • [Figure: feedback loop with labels "Increase in dynamic power", "Increase in temperature", "Increase in leakage power"]

  3. Our Approach • Our approach is design and FPGA device specific • Partition the placed and routed FPGA design into n² grid regions • For each grid region, at the given temperature • Compute total power (dynamic and leakage power) • Dynamic power is computed based on the logic in the region • Leakage power is computed using fast and accurate macromodels • From the power of the n² grid regions, compute the new thermal profile • Compute the increase in temperature for each grid region • If the change in temperature in all grid regions is less than ε, stop and declare convergence • If there is no convergence and the new temperature in any grid region is more than a threshold value, declare thermal breakdown • Else recompute the leakage power of each grid region using the new temperature value and iterate
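
The iteration on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`closed_loop`, `toy_thermal`, `toy_leakage`) and the toy models are hypothetical, and the default values for the initial temperature, ε and the breakdown threshold are the ones quoted on the experimental-setup slide.

```python
import math
from typing import Callable, List, Tuple

Grid = List[List[float]]

def closed_loop(
    p_dyn: Grid,
    leakage: Callable[[int, int, float], float],
    thermal: Callable[[Grid], Grid],
    t_init: float = 27.0,
    eps: float = 1e-3,
    t_max: float = 110.0,
    max_iters: int = 1000,
) -> Tuple[str, Grid]:
    """Iterate power -> temperature -> leakage until convergence or breakdown."""
    n = len(p_dyn)
    temp = [[t_init] * n for _ in range(n)]
    for _ in range(max_iters):
        # Total power per region: fixed dynamic part plus leakage at the current temperature.
        p_tot = [[p_dyn[i][j] + leakage(i, j, temp[i][j]) for j in range(n)]
                 for i in range(n)]
        temp_new = thermal(p_tot)
        delta = max(abs(temp_new[i][j] - temp[i][j])
                    for i in range(n) for j in range(n))
        if delta < eps:
            return "converged", temp_new
        if max(max(row) for row in temp_new) > t_max:
            return "thermal breakdown", temp_new
        temp = temp_new
    # max_iters is a safety cap added for this sketch; it is not part of the slides.
    return "no convergence", temp

# Hypothetical toy models, for illustration only.
def toy_thermal(p_tot: Grid) -> Grid:
    # Each region heats up by 10 degrees C per watt, with no coupling between regions.
    return [[27.0 + 10.0 * p for p in row] for row in p_tot]

def toy_leakage(i: int, j: int, t: float) -> float:
    # Exponential temperature dependence with made-up coefficients.
    return 0.1 * math.exp(0.02 * (t - 27.0))

status, temps = closed_loop([[1.0, 1.0], [1.0, 1.0]], toy_leakage, toy_thermal)
```

With a steeper leakage curve the same loop exits through the breakdown branch instead, which is the thermal-runaway case described in the introduction.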

  4. Our Approach – Flowchart

  5. Our Approach – Dynamic Power • Compute using the XPower tool from Xilinx • XPower reads the design data file and computes the activity estimate ‘α’ • After synthesis, placement and routing of the design, we compute the maximum operating frequency ‘fckt’ • XPower has the node and wire capacitance values, so Pdyn = C * Vdd² * fckt * α • Find the contribution of grid region (i, j) to Pdyn • For each LUT in grid region (i, j), we compute • Probability of the output being logic ‘1’, P1 = (ΣVk)/16 • Where Vk is the logic value stored in the kth SRAM cell of the LUT • Probability of the output switching, Psw = 2 * P1 * (1 - P1) • Average probability of switching in the grid region, P(i, j) = (ΣPsw)/q • Where q is the number of LUTs per grid region • Pdyn(i, j) = Pdyn * P(i, j) / (ΣP(i, j)), where the sum runs over all grid regions
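
As a hedged illustration of how the per-region dynamic power on this slide could be computed, the sketch below takes the total Pdyn as given (for example from XPower) and distributes it by the average LUT switching probability. The function names and the input layout (a grid of lists of 16-bit LUT contents) are assumptions made for this example; treating a region without LUTs as having zero switching probability is also an assumption, not something the slides state.

```python
from typing import List

def lut_switch_prob(sram_bits: List[int]) -> float:
    """Psw = 2 * P1 * (1 - P1), with P1 the fraction of SRAM bits storing 1."""
    p1 = sum(sram_bits) / len(sram_bits)  # len is 16 for a 4-input LUT
    return 2.0 * p1 * (1.0 - p1)

def distribute_dynamic_power(
    p_dyn_total: float,
    luts: List[List[List[List[int]]]],
) -> List[List[float]]:
    """Split the total dynamic power across grid regions.

    luts[i][j] is the list of LUTs in region (i, j); each LUT is its list
    of SRAM bits.
    """
    # Average switching probability P(i, j) per region (0.0 if the region is empty).
    p = [[sum(lut_switch_prob(b) for b in region) / len(region) if region else 0.0
          for region in row] for row in luts]
    total = sum(sum(row) for row in p)
    if total == 0.0:
        # No switching activity anywhere: assign no dynamic power (assumption).
        return [[0.0 for _ in row] for row in p]
    return [[p_dyn_total * pij / total for pij in row] for row in p]

# Example: a 4-input AND (one SRAM bit set) and a LUT with half of its bits set.
and_lut = [0] * 15 + [1]
half_lut = [0, 1] * 8
grid = [[[and_lut], [half_lut]],
        [[], [and_lut, half_lut]]]
p_dyn_regions = distribute_dynamic_power(1.0, grid)
```

Because each region's share is normalized by the sum of P(i, j) over all regions, the per-region values always add up to the total Pdyn.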

  6. Our Approach – Static Power • [Figures: LUT implementation using a 16:1 MUX; L2’ leakage; NMOS passgate sub-threshold leakage states; NMOS passgate gate leakage states]

  7. Our Approach – Static Power • Pre-compute leakage using SPICE for • LUT • SRAM configuration data is known • Each of the 31 pass gates in the LUT is in one of • 4 states (L1, L2, L3 or L2’) contributing to sub-threshold leakage, or • 4 states (K1, K2, K3 or K4) contributing to gate leakage • Remaining states have negligible leakage contribution • But we do not know the f1, f2, f3 and f4 inputs to the LUT • Take the average over the 16 possible input combinations • SRAM cell in LUT (stored 1 and 0) • D-flipflop (output 1 and 0) • MUX • [Figure: logic block in the FPGA]

  8. Our Approach – Total Power • Generate temperature dependent leakage macromodel for • LUT (L states), D-flipflop, SRAM and MUX • Pre-compute the leakage values at 3 different temperatures and fit exponential curve • Gate leakage (for K states) is largely temperature independent • Leakage is quickly and accurately estimated for the logic block at any temperature • Maximum 3% error when compared to explicit SPICE runs • 4 orders of magnitude faster • Compute leakage for grid region (i, j) at any temperature, Plkg(i, j, T) • Taking the sum of the leakages of all LUTs, D-flipflops, SRAMs and MUXes in region (i, j) at any temperature T = temp(i, j) • Total power Ptot(i, j, T) = Pdyn(i, j) + Plkg(i, j, T)
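
A minimal sketch of the macromodel idea, assuming the exponential has the form P(T) = a * exp(b * T) and is fitted by least squares on ln P (the slides only say an exponential curve is fitted to leakage values at 3 temperatures). The function names, the component tuple layout and all numeric values are hypothetical; the sample points are synthetic, not SPICE data.

```python
import math
from typing import List, Sequence, Tuple

def fit_exponential(temps: Sequence[float], leakages: Sequence[float]) -> Tuple[float, float]:
    """Fit P(T) = a * exp(b * T) by linear least squares on ln(P)."""
    n = len(temps)
    ys = [math.log(p) for p in leakages]
    mean_t = sum(temps) / n
    mean_y = sum(ys) / n
    b = (sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, ys))
         / sum((t - mean_t) ** 2 for t in temps))
    a = math.exp(mean_y - b * mean_t)
    return a, b

def region_leakage(components: List[Tuple[int, float, float]],
                   gate_leakage: float, temp: float) -> float:
    """Plkg(i, j, T): temperature-dependent terms plus a constant gate-leakage term.

    components holds (count, a, b) per component type (e.g. LUT, flip-flop).
    """
    return gate_leakage + sum(cnt * a * math.exp(b * temp) for cnt, a, b in components)

# Synthetic sample points generated from a known curve (not measured data).
true_a, true_b = 2e-6, 0.03
sample_temps = [25.0, 75.0, 125.0]
sample_leak = [true_a * math.exp(true_b * t) for t in sample_temps]
a_fit, b_fit = fit_exponential(sample_temps, sample_leak)

# Leakage of a hypothetical region with 10 identical components at 27 degrees C.
p_lkg = region_leakage([(10, a_fit, b_fit)], gate_leakage=1e-6, temp=27.0)
```

Once the coefficients are fitted, evaluating the leakage at a new temperature is a single exponential per component type, which is where the speedup over repeated SPICE runs comes from.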

  9. Our Approach – Temperature Computation • We use the approach from • “Critical path analysis considering temperature, power supply variations and temperature induced leakage”, P. Li, ISQED 2006 • Assume a 1 W power consumption in grid region (i, j) • Table Zij(k, l) indicates the resulting temperature at grid region (k, l) • We precompute n² such Zij tables, each with n² entries • We know the total power consumption of each grid region • Thus, we find the new temperature, temp_new(i, j), at the (i, j)th grid region by superposition • Details of the thermal model • Circuit discretized into n² grid regions • 15 layers of metal/dielectric are modeled • Assuming a metallization percentage for each layer, the thermal conductivity of each layer is computed • Model includes heat dissipation due to heat sinks
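
The superposition step can be sketched as below. One assumption is made explicit here: each table entry Zij(k, l) is treated as the temperature rise above ambient at region (k, l) per watt dissipated in region (i, j); the slides do not say whether the tables store a rise or an absolute temperature. The function name `superpose` and the toy table values are hypothetical.

```python
from typing import List

Grid = List[List[float]]

def superpose(p_tot: Grid, z: List[List[Grid]], t_ambient: float) -> Grid:
    """temp_new(k, l) = t_ambient + sum over (i, j) of Ptot(i, j) * Zij(k, l)."""
    n = len(p_tot)
    temp_new = [[t_ambient] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            p = p_tot[i][j]
            z_ij = z[i][j]  # response table for a 1 W source in region (i, j)
            for k in range(n):
                for l in range(n):
                    temp_new[k][l] += p * z_ij[k][l]
    return temp_new

# Toy 2x2 tables: 10 degrees C per watt in the heated region, 1 degree C per watt elsewhere.
n = 2
z_tables = [[[[10.0 if (k, l) == (i, j) else 1.0 for l in range(n)]
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]
temp_new = superpose([[1.0, 0.0], [0.0, 2.0]], z_tables, t_ambient=27.0)
```

For an n x n grid this costs n² multiply-adds per region, so one temperature update is a dense n² by n² matrix-vector product rather than a full thermal simulation.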

  10. Endgame and Experimental Setup • Endgame • Find the absolute difference between temp(i, j) and temp_new(i, j) • Declare convergence when the maximum difference over all grid points is < 0.001°C • If temp_new(i, j) > 110°C and there is no convergence, we declare thermal breakdown • Setup • Applied our methodology to 10 designs, implemented on a Virtex-4 XCVLX200 Xilinx FPGA device • Synthesized, placed and routed using Xilinx ISE 8.1i • Initial temperature set at 27°C • n = 16 • To the best of our knowledge, no other existing work reports final converged temperature and power numbers for FPGA designs after closing the dependence loop between leakage and temperature • We therefore compared our final temperatures against a full-chip 3D thermal modeling and simulation tool • Maximum (average) error in temperature was 2.52% (1.05%) for the DMA benchmark • Our approach is faster by ~40X per iteration

  11. Results • Circuits operating at 450 MHz • [Figure: temperature profile for circuit DMA]

  12. Conclusions • Developed a technique to simultaneously model (in an FPGA) • Power consumption • Temperature • Used fast and accurate macromodels for leakage estimation • Over all circuit components of a logic block, at all temperatures • Less than 3% error compared to SPICE and • Up to 4 orders of magnitude speedup • Approach • Partition the FPGA design (placed and routed) into 16x16 grid regions • Compute total power consumption (dynamic and leakage) for each region • Find the thermal profile of the IC under this power consumption • Using pre-computed power-to-temperature tables • New thermal information is used to update the leakage power consumption • Steps are iterated until the temperature converges (for all grid regions) or exceeds a safe value (for any grid region) • Final temperature obtained from our method • Compared to a full-chip 3D temperature estimation tool • Shows max. (avg.) error of 2.52% (1.05%) for the DMA benchmark
