Closed-Loop Modeling of Power and Temperature Profiles of FPGAs
Kanupriya Gulati, Sunil P. Khatri, Peng Li
Department of ECE, Texas A&M University, College Station
Introduction
• Due to increasing density of FPGAs
  • Power is now a zeroth order design constraint
• During operation, two components of power consumption are
  • Dynamic Power
    • Temperature independent
  • Static Power
    • Gate leakage
      • Largely temperature independent
    • Sub-threshold leakage
      • Exponential dependence on junction temperature
• This positive feedback loop could cause
  • Non-convergence (thermal runaway)
  • Convergence above a safe junction temperature (thermal breakdown)
[Figure: positive feedback loop linking increase in dynamic power, increase in temperature and increase in leakage power]
Our Approach
• Our approach is design and FPGA device specific
• Partition the placed and routed FPGA design into n² grid regions
• For each grid region, at the given temperature
  • Compute total power (dynamic and leakage power)
    • Dynamic power computed based on logic in the region
    • Leakage power computed using fast and accurate macromodels
• From the power of the n² grid regions, compute new thermal profile
  • Compute increase in temperature for each grid region
• If change in temperature in all grid regions is less than ε, stop and declare convergence
• If no convergence and new temperature in any grid region is more than a threshold value, declare thermal breakdown
• Else recompute leakage power of each grid region using new temperature value and iterate
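The iteration described on this slide can be sketched as a fixed-point loop. This is a minimal illustration, not the authors' implementation: the function names, the iteration cap `max_iters`, and both toy models (an exponential leakage curve and a two-region linear thermal model) are assumptions made up for the example. The 27°C start, the 0.001°C tolerance and the 110°C limit are the values quoted later in the slides.

```python
import math

def closed_loop(p_dyn, leakage_at, thermal_model,
                t_init=27.0, eps=1e-3, t_limit=110.0, max_iters=1000):
    """Iterate power -> temperature -> leakage until convergence or breakdown.

    p_dyn: per-region dynamic power in watts (temperature independent).
    leakage_at(T): hypothetical leakage macromodel, watts at temperature T.
    thermal_model(p_tot): hypothetical thermal model, returns per-region temperatures.
    max_iters is an assumed safety cap, not part of the slides.
    """
    temps = [t_init] * len(p_dyn)
    for iteration in range(1, max_iters + 1):
        # Total power per region at the current temperature estimate.
        p_tot = [p + leakage_at(t) for p, t in zip(p_dyn, temps)]
        new_temps = thermal_model(p_tot)
        max_delta = max(abs(n - o) for n, o in zip(new_temps, temps))
        if max_delta < eps:
            return "converged", new_temps, iteration
        # Not converged and some region is above the limit.
        if max(new_temps) > t_limit:
            return "thermal breakdown", new_temps, iteration
        temps = new_temps
    return "iteration cap reached", temps, max_iters

def toy_leakage(temp):
    # Made-up exponential leakage curve, for illustration only.
    return 0.05 * math.exp(0.02 * (temp - 27.0))

def make_toy_thermal(self_rise, neighbor_rise, t_ambient=27.0):
    # Made-up linear model: degrees C per watt from the region itself and from the others.
    def model(p_tot):
        total = sum(p_tot)
        return [t_ambient + self_rise * p + neighbor_rise * (total - p) for p in p_tot]
    return model

status, temps, iters = closed_loop([0.5, 0.2], toy_leakage, make_toy_thermal(20.0, 5.0))
hot_status, hot_temps, hot_iters = closed_loop([0.5, 0.2], toy_leakage, make_toy_thermal(200.0, 50.0))
print(status, iters, hot_status, hot_iters)
```

The convergence test runs before the threshold test, which mirrors the slide's wording ("if no convergence and new temperature ... more than a threshold value").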
Our Approach – Dynamic Power
• Compute using the XPower tool from Xilinx
  • XPower reads the design data file and computes activity estimate ‘α’
  • After synthesis, place and route of the design, we compute the maximum operating frequency ‘fckt’
  • XPower has the node and wire capacitance values. So, Pdyn = C * Vdd² * fckt * α
• Find the contribution of grid region (i, j) to Pdyn
  • For each LUT in grid region (i, j), we compute
    • Probability of output being logic ‘1’, P1 = (ΣVk)/16
      • Where Vk is the logic value stored in the kth SRAM of the LUT
    • Probability of output switching, Psw = 2 * P1 * (1-P1)
  • Average probability of switching in the grid region P(i, j) = (ΣPsw)/q
    • Where q is the number of LUTs per grid region
  • Pdyn(i, j) = Pdyn * P(i, j) * 1/(ΣP(i, j))
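The per-region split of Pdyn can be illustrated with a short sketch. The function names and the two example LUTs are hypothetical, and treating a region with no LUTs as having zero switching probability is an assumption the slides do not address.

```python
def lut_switching_probability(sram_bits):
    """Psw = 2 * P1 * (1 - P1), where P1 is the fraction of 1s among the 16 SRAM bits."""
    if len(sram_bits) != 16:
        raise ValueError("a 4-input LUT stores 16 configuration bits")
    p1 = sum(sram_bits) / 16
    return 2 * p1 * (1 - p1)

def distribute_dynamic_power(p_dyn_total, luts_by_region):
    """Split the total dynamic power across regions in proportion to P(i, j)."""
    avg_psw = {}
    for region, luts in luts_by_region.items():
        q = len(luts)
        # Assumption: a region without LUTs gets zero switching probability.
        avg_psw[region] = sum(lut_switching_probability(b) for b in luts) / q if q else 0.0
    total = sum(avg_psw.values())
    if total == 0:
        return {region: 0.0 for region in avg_psw}
    return {region: p_dyn_total * p / total for region, p in avg_psw.items()}

# Hypothetical LUT contents: a 4-input AND (one 1 bit) and a 4-input XOR (eight 1 bits).
and4 = [0] * 15 + [1]
xor4 = [bin(k).count("1") % 2 for k in range(16)]

shares = distribute_dynamic_power(1.0, {(0, 0): [and4], (0, 1): [xor4]})
print(shares)
```

Because the shares are normalized by ΣP(i, j), they always add up to the total Pdyn reported for the whole design.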
Our Approach – Static Power
[Figure: LUT implementation using a 16:1 MUX]
[Figure: NMOS passgate sub-threshold leakage states, including L2’ leakage]
[Figure: NMOS passgate gate leakage states]
Our Approach – Static Power
• Pre-compute leakage using SPICE for
  • LUT
    • SRAM configuration data is known
    • Each of the 31 pass gates in the LUT is in one of
      • 4 states (L1, L2, L3 or L2’) contributing to sub-threshold leakage, or
      • 4 states (K1, K2, K3 or K4) contributing to gate leakage, or
      • Remaining states, which have negligible leakage contribution
    • But we do not know the f1, f2, f3 and f4 inputs to the LUT
      • Take average over the 16 possible input combinations
  • SRAM cell in LUT (stored 1 and 0)
  • D-flipflop (output 1 and 0)
  • MUX
[Figure: Logic block in the FPGA]
Our Approach – Total Power
• Generate temperature dependent leakage macromodel for
  • LUT (L states), D-flipflop, SRAM and MUX
  • Pre-compute the leakage values at 3 different temperatures and fit exponential curve
  • Gate leakage (for K states) is largely temperature independent
• Leakage is quickly and accurately estimated for the logic block at any temperature
  • Maximum 3% error when compared to explicit SPICE runs
  • 4 orders of magnitude faster
• Compute leakage for grid region (i, j) at any temperature, Plkg(i, j, T)
  • Taking the sum of the leakages of all LUTs, D-flipflops, SRAMs and MUXes in region (i, j) at any temperature T = temp(i, j)
• Total power Ptot(i, j, T) = Pdyn(i, j) + Plkg(i, j, T)
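One way to build such a macromodel from three pre-computed points is a least-squares line through ln(P) versus T. The slides only say that an exponential curve is fitted, so the functional form P(T) = a * exp(b * T), the function names, and the three sample points below are assumptions; the sample values are synthetic, not SPICE data.

```python
import math

def fit_exponential(temps, leakages):
    """Least-squares fit of ln(P) = ln(a) + b * T; returns (a, b)."""
    n = len(temps)
    logs = [math.log(p) for p in leakages]
    t_mean = sum(temps) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(temps, logs))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b

def region_leakage(models, gate_leakage, temp):
    """Sum the fitted sub-threshold curves of a region plus its constant gate leakage."""
    return gate_leakage + sum(a * math.exp(b * temp) for a, b in models)

# Synthetic sample points generated from a known curve, for illustration only.
sample_temps = [27.0, 70.0, 110.0]
sample_leak = [2e-9 * math.exp(0.03 * t) for t in sample_temps]

a_fit, b_fit = fit_exponential(sample_temps, sample_leak)
p_region = region_leakage([(a_fit, b_fit)] * 4, gate_leakage=1e-9, temp=50.0)
print(a_fit, b_fit, p_region)
```

Once (a, b) is stored per component type and state, evaluating the leakage at a new temperature is a single exponential, which is consistent with the large speedup over explicit SPICE runs claimed on the slide.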
Our Approach – Temperature Computation
• We use the following approach
  • “Critical path analysis considering temperature, power supply variations and temperature induced leakage”, P. Li, ISQED 2006
  • Assume a 1W power consumption in grid region (i, j)
  • Table Zij(k, l) indicates resulting temperature at grid region (k, l)
  • We precompute n² such Zij tables, each with n² entries
  • We know the total power consumption of each grid region
  • Thus, we find the new temperature, temp_new(i,j), at the (i, j)th grid region, by superposition
• Details of the thermal model
  • Circuit discretized into n² grid regions
  • 15 layers of metal/dielectric are modeled
  • Assuming a metallization percentage for each layer, the thermal conductivity of each layer is computed
  • Model includes heat dissipation due to heat sinks
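The superposition step can be sketched as a weighted sum over the precomputed tables. The slides do not say whether Zij(k, l) stores an absolute temperature or a rise over ambient; this sketch assumes a rise in °C per watt, and the function name, the flattened region indexing and the 2-region table values are hypothetical.

```python
def superpose(power, z, t_ambient=27.0):
    """Return per-region temperatures from per-region power by superposition.

    power[src]: total power of source region src, in watts.
    z[src][dst]: assumed temperature rise (deg C) at region dst for 1 W in region src.
    Regions are flattened to a single index: region (i, j) maps to i * n + j.
    """
    n_regions = len(power)
    return [
        t_ambient + sum(power[src] * z[src][dst] for src in range(n_regions))
        for dst in range(n_regions)
    ]

# Made-up 2-region example: 20 deg C/W self heating, 5 deg C/W coupling.
z_tables = [[20.0, 5.0],
            [5.0, 20.0]]
temp_new = superpose([0.55, 0.25], z_tables)
print(temp_new)
```

With n = 16 there are 256 regions, so each update costs 256 x 256 multiply-adds, which is cheap compared with re-solving a full thermal model on every iteration.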
Endgame and Experimental Setup
• Endgame
  • Find the absolute difference between temp(i,j) and temp_new(i,j)
  • Declare convergence when the maximum difference over all grid points is < 0.001°C
  • If temp_new(i,j) > 110°C, and no convergence, we declare thermal breakdown
• Setup
  • Applied our methodology to 10 designs, implemented on a Xilinx Virtex-4 XC4VLX200 FPGA device
  • Synthesized, placed and routed using Xilinx ISE 8.1i
  • Initial temperature set at 27°C
  • n = 16
• To the best of our knowledge, no other existing work reports final converged temperature and power numbers for FPGA designs, after closing the dependence loop between leakage and temperature
  • We therefore compared our final temperatures against a full-chip 3D thermal modeling and simulation tool
    • Maximum (average) error in temperature was 2.52% (1.05%) for the DMA benchmark
    • Our approach is faster by ~40X per iteration
Results
Circuits operating at 450 MHz
[Figure: Temperature profile for circuit DMA]
Conclusions
• Developed a technique to simultaneously model (in an FPGA)
  • Power consumption
  • Temperature
• Used fast and accurate macromodels for leakage estimation
  • Over all circuit components of a logic block, at all temperatures
  • Less than 3% error compared to SPICE and
  • Up to 4 orders of magnitude speedup
• Approach
  • Partition FPGA design (placed and routed) into 16x16 grid regions
  • Compute total power consumption (dynamic and leakage) for each region
  • Find thermal profile of IC under this power consumption
    • Using pre-computed power-to-temperature tables
  • New thermal information is used to update the leakage power consumption
  • Steps iterated until the temperature converges (for all grid regions), or exceeds a safe value (for any grid region)
• Final temperature obtained from our method
  • Compared to full-chip 3D temperature estimation tool
  • Shows max. (avg.) error of 2.52% (1.05%) for the DMA benchmark