CSV881: Low-Power Design
Gate-Level Power Optimization
Vishwani D. Agrawal, James J. Danaher Professor
Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal
Lectures 10, 11, 12: Gate-level optimization

Components of Power
• Dynamic
  • Signal transitions
    • Logic activity
    • Glitches
  • Short-circuit (often neglected)
• Static
  • Leakage

Power of a Transition
Dynamic Power = $C_L V_{DD}^2 / 2 + P_{sc}$
[Figure: circuit with supply VDD, ground, resistances R, input Vi, output Vo, load capacitance CL, and short-circuit current isc]

Dynamic Power
• Each transition of a gate consumes energy $CV^2/2$.
• Methods of power saving:
  • Minimize load capacitances
    • Transistor sizing
    • Library-based gate selection
  • Reduce transitions
    • Logic design
    • Glitch reduction

Glitch Power Reduction
• Design a digital circuit for minimum transient energy consumption by eliminating hazards.

Theorem 1
• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
  • Output logic state changes: one transition is necessary.
  • Output logic state unchanged: no transition is necessary.

Event Propagation
• A single lumped inertial delay is modeled for each gate.
• PI transitions are assumed to occur without time skew.
[Figure: example circuit showing event propagation along paths P1, P2, and P3]

Inertial Delay of an Inverter
$d = \frac{d_{HL} + d_{LH}}{2}$
[Figure: Vin and Vout waveforms versus time, marking dHL and dLH]

Multi-Input Gate
• Gate with inputs A and B, output C, and delay d < DPD
• DPD: differential path delay
[Figure: waveforms of A, B, and C, marking DPD, d, and the resulting hazard or glitch]

Balanced Path Delays
• Gate with inputs A and B, output C, and delay d < DPD
• A delay buffer is inserted to balance the input path delays
[Figure: waveforms of A, B, and C, marking d; no glitch]

Glitch Filtering by Inertia
• Gate with inputs A and B, output C, and delay d > DPD
[Figure: waveforms of A, B, and C, marking DPD and d > DPD; filtered glitch]

Theorem
• Given that events occur at the input of a gate, whose inertial delay is $d$, at times $t_1 \le \dots \le t_n$, the number of events at the gate output cannot exceed
  $\min\left(n,\ 1 + \frac{t_n - t_1}{d}\right)$
[Figure: time axis marking t1, t2, t3, ..., tn and the interval tn - t1]
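
The bound in the theorem above can be evaluated with a few lines of Python. This is a minimal sketch, not part of the original lecture: the function name `max_output_events` is hypothetical, and the floor is an added step that is valid only because the number of events is an integer.

```python
import math


def max_output_events(arrival_times, d):
    """Upper bound on output events for a gate with inertial delay d.

    Implements min(n, 1 + (t_n - t_1) / d) from the theorem. The floor is
    an addition of this sketch: an integer event count that cannot exceed
    x also cannot exceed floor(x).
    """
    if d <= 0:
        raise ValueError("inertial delay must be positive")
    if not arrival_times:
        return 0
    n = len(arrival_times)
    spread = max(arrival_times) - min(arrival_times)  # t_n - t_1
    return min(n, 1 + math.floor(spread / d))


# Four input events spread over 3 time units, gate delay 2.
bound_wide = max_output_events([0, 1, 2, 3], 2)
# Two input events closer together than the gate delay.
bound_narrow = max_output_events([0, 0.5], 1)
```

When all input events fall within less than one gate delay of each other, the bound drops to a single output event, which is the condition the next slide states.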

Minimum Transient Design
• Minimum transient energy condition for a Boolean gate:
  $|t_i - t_j| < d$
  where $t_i$ and $t_j$ are arrival times of input events and $d$ is the inertial delay of the gate.

Balanced Delay Method
• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted
[Figure: example circuit annotated with delay values; no increase in critical path delay]

Hazard Filter Method
• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase
[Figure: example circuit annotated with gate delay values]

Designing a Glitch-Free Circuit
• Maintain the specified critical path delay.
• Glitches are suppressed at all gates by
  • Path delay balancing
  • Glitch filtering by increasing the inertial delay of gates, or by inserting delay buffers when necessary.
• A linear program optimally combines all objectives.
[Figure: gate of delay D with input path delays d1 and d2, satisfying $|d_1 - d_2| < D$]

Problem Complexity
• The number of paths in a circuit can be exponential in circuit size.
• Considering all paths through enumeration is infeasible for large circuits.
• Example: c880 has 6.96M path constraints.

Define Arrival Time Variables
• $d_i$: gate delay.
• Define two timing window variables per gate output:
  • $t_i$: earliest time of signal transition at gate i.
  • $T_i$: latest time of signal transition at gate i.
• Glitch suppression constraint: $T_i - t_i < d_i$
[Figure: gate i with delay di, input windows (t1, T1), ..., (tn, Tn), and output window (ti, Ti)]
Reference: T. Raja, Master's Thesis, Rutgers Univ., 2002.
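
The timing-window variables can be illustrated by propagating windows through a tiny netlist and checking the glitch suppression constraint. This is a sketch under stated assumptions: the two-gate circuit, the signal names `a`, `b`, `g1`, `g2`, and the helper names are hypothetical, and each window is set to its tightest value (earliest fanin plus delay, latest fanin plus delay) rather than being left to a solver.

```python
def propagate_windows(primary_inputs, gates):
    """Return {signal: (t, T)} for gates given as (name, delay, fanins).

    The gates must be listed in topological order.
    """
    # Primary inputs are assumed to switch at time 0 with no skew.
    window = {pi: (0.0, 0.0) for pi in primary_inputs}
    for name, delay, fanins in gates:
        earliest = min(window[f][0] for f in fanins) + delay  # t_i
        latest = max(window[f][1] for f in fanins) + delay    # T_i
        window[name] = (earliest, latest)
    return window


def glitch_suppressed(gates, window):
    """Check T_i - t_i < d_i for every multi-input gate."""
    return {
        name: (window[name][1] - window[name][0]) < delay
        for name, delay, fanins in gates
        if len(fanins) > 1
    }


def example(g2_delay):
    # Hypothetical circuit: g1 inverts input a; g2 combines g1 and input b.
    gates = [("g1", 1.0, ["a"]), ("g2", g2_delay, ["g1", "b"])]
    window = propagate_windows(["a", "b"], gates)
    return window, glitch_suppressed(gates, window)


window_fast, ok_fast = example(1.0)  # g2 delay equals the input skew
window_slow, ok_slow = example(2.0)  # g2 delay exceeds the input skew
```

With a unit delay on `g2` the window width equals the gate delay, so the strict inequality fails; raising the delay to 2 satisfies it, which mirrors the hazard filter method.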

Linear Program
• Variables: gate and buffer delays, arrival time variables.
• Objective: minimize number of buffers.
• Subject to: overall circuit delay constraint for all input-output paths.
• Subject to: minimum transient energy condition for all multi-input gates.

An Example: Full Adder add1b
[Figure: full adder circuit add1b with unit gate delays; critical path delay = 6]

Linear Program
• Gate variables: $d_4, \dots, d_{12}$
• Buffer delay variables: $d_{15}, \dots, d_{29}$
• Window variables: $t_4, \dots, t_{29}$ and $T_4, \dots, T_{29}$

Multiple-Input Gate Constraints
For gate 7:
  $T_7 \ge T_5 + d_7$,  $t_7 \le t_5 + d_7$
  $T_7 \ge T_6 + d_7$,  $t_7 \le t_6 + d_7$
  $d_7 > T_7 - t_7$  (glitch suppression)

Single-Input Gate Constraints
Buffer 19:
  $T_{16} + d_{19} = T_{19}$
  $t_{16} + d_{19} = t_{19}$

Critical Path Delay Constraints
  $T_{11} \le \text{maxdelay}$
  $T_{12} \le \text{maxdelay}$
maxdelay is specified.

Objective Function
• Need to minimize the number of buffers.
• Because that leads to a nonlinear objective function, we use an approximate criterion:
  minimize $\sum_{\text{all buffers}} (\text{buffer delay})$, i.e., minimize $d_{15} + d_{16} + \cdots + d_{29}$
• This gives a near-optimum result.

AMPL Solution: maxdelay = 6
[Figure: optimized full adder annotated with gate and buffer delays; critical path delay = 6]

AMPL Solution: maxdelay = 7
[Figure: optimized full adder annotated with gate and buffer delays; critical path delay = 7]

AMPL Solution: maxdelay ≥ 11
[Figure: optimized full adder annotated with gate and buffer delays; critical path delay = 11]

ALU4: Four-Bit ALU 74181
Maximum power savings (zero-buffer design): peak = 33%, average = 21%

ALU4: Original and Low-Power

Benchmark Circuits
Circuit   Max-delay (gates)   No. of buffers   Normalized power, average   Normalized power, peak
ALU4      7                   5                0.80                        0.68
ALU4      15                  0                0.79                        0.67
C880      24                  62               0.68                        0.54
C880      48                  34               0.68                        0.52
C6288     47                  294              0.40                        0.36
C6288     94                  120              0.36                        0.34
C7552     43                  366              0.44                        0.34
C7552     86                  111              0.42                        0.32

C7552 Circuit: Spice Simulation
Power saving: average 58%, peak 68%

References
• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th Int’l Conf. VLSI Design, Jan. 2003, pp. 527-532.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. Low Power Electron., vol. 2, no. 1, pp. 121-128, Apr. 2006.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Variable Input Delay CMOS Logic for Low Power Design,” IEEE Trans. VLSI Systems, vol. 17, no. 10, pp. 1534-1545, Oct. 2009.

Exercise: Dynamic Power
• An average gate:
  • Supply voltage, VDD: V = 1 volt
  • Output capacitance, C = 1 pF
  • Activity factor, α = 10%
  • Clock frequency, f = 1 GHz
• What is the dynamic power consumption of a 1 million gate VLSI chip?

Answer
• Dynamic energy per transition = $0.5\,CV^2$
• Dynamic power per gate = energy per second = $0.5\,CV^2 \alpha f = 0.5 \times 10^{-12} \times 1^2 \times 0.1 \times 10^{9} = 0.5 \times 10^{-4}$ W = 50 μW
• Power for a 1 million gate chip = 50 W
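
The arithmetic in this answer can be checked with a short script. This is only a sketch; the function and variable names are hypothetical, and the parameter values are the ones given in the exercise.

```python
def dynamic_power_per_gate(c_farads, vdd_volts, activity, freq_hz):
    """Average dynamic power of one gate: 0.5 * C * V^2 * alpha * f (watts)."""
    return 0.5 * c_farads * vdd_volts ** 2 * activity * freq_hz


# Exercise values: C = 1 pF, V = 1 V, alpha = 10%, f = 1 GHz.
per_gate_watts = dynamic_power_per_gate(1e-12, 1.0, 0.1, 1e9)

# One million such gates on the chip.
chip_watts = per_gate_watts * 1_000_000
```

Because the power scales with the square of the supply voltage, halving V in this model would cut the chip power to a quarter.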

Components of Power
• Dynamic
  • Signal transitions
    • Logic activity
    • Glitches
  • Short-circuit
• Static
  • Leakage

Subthreshold Conduction
$I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right) \times \left(1 - \exp\frac{-V_{ds}}{V_T}\right)$
[Figure: Ids (log scale, 10 pA to 1 mA) versus Vgs (0 to 1.8 V) for a transistor with terminals d, g, s, marking Vth, the subthreshold region with its subthreshold slope, and the saturation region]

Thermal Voltage, $V_T$
• $V_T = kT/q$ = 26 mV at room temperature.
• When $V_{ds}$ is several times greater than $V_T$:
  $I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right)$
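
The simplification above can be checked numerically by evaluating the drain-voltage term of the subthreshold current. This is an illustrative sketch: the function names are hypothetical, and the default n = 1.5 is borrowed from the leakage calculation on the following slide.

```python
import math

V_T = 0.026  # thermal voltage kT/q at room temperature, in volts


def drain_factor(vds, vt=V_T):
    """The term (1 - exp(-Vds / VT)) of the subthreshold current."""
    return 1.0 - math.exp(-vds / vt)


def subthreshold_current(i0, vgs, vth, vds, n=1.5, vt=V_T):
    """Ids = I0 * exp((Vgs - Vth) / (n * VT)) * (1 - exp(-Vds / VT))."""
    return i0 * math.exp((vgs - vth) / (n * vt)) * drain_factor(vds, vt)


# At Vds = 4 * VT the drain term is already within 2% of 1.
factor_at_4vt = drain_factor(4 * V_T)

# At Vgs = Vth and a large Vds, the current is essentially I0.
ids_at_threshold = subthreshold_current(1.0, 0.3, 0.3, 1.0)
```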

Leakage Current
• Leakage current equals $I_{ds}$ when $V_{gs} = 0$
• Leakage current, $I_{ds} = I_0 \exp(-V_{th}/nV_T)$
• At cutoff, $V_{gs} = V_{th}$, and $I_{ds} = I_0$
• Lowering leakage to $10^{-b} \times I_0$ requires
  $V_{th} = b\,nV_T \ln 10 = 1.5b \times 26 \ln 10 = 90b$ mV
• Example: to lower leakage to $I_0/1{,}000$, $V_{th}$ = 270 mV
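
The threshold voltage needed for a given leakage reduction follows directly from the expression above. A minimal sketch, assuming n = 1.5 and VT = 26 mV as used on the slide; the function names are hypothetical.

```python
import math


def vth_for_leakage_reduction_mv(decades, n=1.5, vt_mv=26.0):
    """Vth (in mV) that lowers leakage to 10**(-decades) * I0."""
    return decades * n * vt_mv * math.log(10)


def leakage_ratio(vth_mv, n=1.5, vt_mv=26.0):
    """Leakage relative to I0 at Vgs = 0: exp(-Vth / (n * VT))."""
    return math.exp(-vth_mv / (n * vt_mv))


per_decade_mv = vth_for_leakage_reduction_mv(1)     # about 90 mV per decade
three_decades_mv = vth_for_leakage_reduction_mv(3)  # about 270 mV
ratio_check = leakage_ratio(three_decades_mv)       # back to about 1/1000
```

The unrounded values are slightly below the slide's figures (roughly 89.8 mV per decade), which the slide rounds to 90 mV.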

Threshold Voltage
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$
• $V_{t0}$ is the threshold voltage when the source is at body potential (0.4 V for a 180 nm process)
• $\Phi_s = 2V_T \ln(N_A/n_i)$ is the surface potential
• $\gamma = (2q\varepsilon_{si}N_A)^{1/2}\, t_{ox}/\varepsilon_{ox}$ is the body effect coefficient (0.4 to 1.0)
• $N_A$ is the doping level = $8 \times 10^{17}$ cm$^{-3}$
• $n_i = 1.45 \times 10^{10}$ cm$^{-3}$

Threshold Voltage, $V_{sb}$ = 1.1 V
• Thermal voltage, $V_T = kT/q$ = 26 mV
• $\Phi_s$ = 0.93 V
• $\varepsilon_{ox} = 3.9 \times 8.85 \times 10^{-14}$ F/cm
• $\varepsilon_{si} = 11.7 \times 8.85 \times 10^{-14}$ F/cm
• $t_{ox}$ = 40 Å
• $\gamma$ = 0.6 V$^{1/2}$
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$ = 0.68 V
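
The numbers on this slide can be reproduced from the formulas on the previous one. This is a sketch, not the lecture's own calculation: the variable names are hypothetical, and the electron charge q = 1.602e-19 C is a physical constant that the slides do not list explicitly.

```python
import math

Q = 1.602e-19                # electron charge in coulombs (not listed on the slides)
V_T = 0.026                  # thermal voltage, volts
N_A = 8e17                   # doping level, cm^-3
N_I = 1.45e10                # intrinsic carrier concentration, cm^-3
EPS_OX = 3.9 * 8.85e-14      # oxide permittivity, F/cm
EPS_SI = 11.7 * 8.85e-14     # silicon permittivity, F/cm
T_OX = 40e-8                 # oxide thickness: 40 angstroms expressed in cm
V_T0 = 0.4                   # zero-bias threshold voltage, volts
V_SB = 1.1                   # source-to-body voltage, volts

# Surface potential: 2 * VT * ln(NA / ni)
phi_s = 2 * V_T * math.log(N_A / N_I)

# Body effect coefficient: sqrt(2 * q * eps_si * NA) * tox / eps_ox
gamma = math.sqrt(2 * Q * EPS_SI * N_A) * T_OX / EPS_OX

# Threshold voltage with body bias
v_th = V_T0 + gamma * (math.sqrt(phi_s + V_SB) - math.sqrt(phi_s))
```

The unrounded results agree with the slide's 0.93 V, 0.6 V^(1/2), and 0.68 V to two decimal places.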

A Sample Calculation
• $V_{DD}$ = 1.2 V, 100 nm CMOS process
• Transistor width, W = 0.5 μm
• OFF device ($V_{gs} = V_{th}$) leakage:
  • $I_0$ = 20 nA/μm for a low-threshold transistor
  • $I_0$ = 3 nA/μm for a high-threshold transistor
• 100M transistor chip:
  • Power = $(100 \times 10^{6}/2)(0.5 \times 20 \times 10^{-9}\ \text{A})(1.2\ \text{V})$ = 600 mW for all low-threshold transistors
  • Power = $(100 \times 10^{6}/2)(0.5 \times 3 \times 10^{-9}\ \text{A})(1.2\ \text{V})$ = 90 mW for all high-threshold transistors

Dual-Threshold Chip
• Low threshold only for the 20% of transistors on critical paths.
• Leakage power = 600 × 0.2 + 90 × 0.8 = 120 + 72 = 192 mW
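
The leakage figures of the sample calculation and the dual-threshold mix can be reproduced as follows. A minimal sketch with hypothetical names; the factor 1/2 from the slide is modeled as the fraction of transistors that are OFF, which is an interpretation the slide does not state.

```python
def chip_leakage_watts(n_transistors, width_um, i0_amps_per_um, vdd_volts,
                       off_fraction=0.5):
    """Leakage power of a chip whose OFF transistors each leak W * I0."""
    # off_fraction = 0.5 mirrors the division by 2 on the slide (assumed meaning).
    return n_transistors * off_fraction * width_um * i0_amps_per_um * vdd_volts


low_vth_watts = chip_leakage_watts(100e6, 0.5, 20e-9, 1.2)   # all low threshold
high_vth_watts = chip_leakage_watts(100e6, 0.5, 3e-9, 1.2)   # all high threshold

# Dual-threshold chip: 20% low-threshold, 80% high-threshold transistors.
dual_vth_watts = 0.2 * low_vth_watts + 0.8 * high_vth_watts
```

In this model the dual-threshold chip leaks 68% less than the all-low-threshold chip while keeping low-threshold devices on the critical paths.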

Dual-Threshold CMOS Circuit

Dual-Threshold Design
• To maintain performance, all gates on critical paths are assigned low $V_{th}$.
• Most other gates are assigned high $V_{th}$.
• However, some gates on non-critical paths may also be assigned low $V_{th}$ to prevent those paths from becoming critical.

Integer Linear Programming (ILP) to Minimize Leakage Power
• Use a dual-threshold CMOS process.
• First, assign low $V_{th}$ to all gates.
• Use an ILP model to find the delay ($T_c$) of the critical path.
• Use another ILP model to find the optimal $V_{th}$ assignment for all gates, and the resulting reduced leakage power, without increasing $T_c$.
• Further reduction of leakage power is possible by letting $T_c$ increase.

ILP - Variables
For each gate i define two variables:
• $T_i$: the longest time at which the output of gate i can produce an event after the occurrence of an input event at a primary input of the circuit.
• $X_i$: a variable specifying low or high $V_{th}$ for gate i; $X_i$ is an integer in [0, 1]:
  • 1: gate i is assigned low $V_{th}$
  • 0: gate i is assigned high $V_{th}$

ILP - Objective Function
• Leakage power: minimize the sum of all gate leakage currents, given by
  $\sum_i \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right]$
• $I_{Li}$ is the leakage current of gate i with low $V_{th}$
• $I_{Hi}$ is the leakage current of gate i with high $V_{th}$
• Using SPICE simulation results, construct a leakage current lookup table, indexed by the gate type and the input vector.

ILP - Constraints
• For each gate i: constraint (1)
• When the output of gate j is a fanin of gate i: constraint (2)
• Max delay constraints for primary outputs (PO): constraint (3)
• $T_{max}$ is the maximum delay of the critical path
[Figure: gate j with output time Tj driving gate i with output time Ti]
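
The two-step procedure (find $T_c$ with all gates at low $V_{th}$, then minimize leakage without exceeding it) can be sketched on a toy circuit. Everything specific here is an assumption for illustration: the five-gate netlist, the delay and leakage numbers, and the model in which a gate's delay depends only on its $X_i$. Exhaustive enumeration stands in for an ILP solver, so this shows the objective and the delay limit rather than the lecture's actual ILP formulation.

```python
from itertools import product

# Hypothetical netlist: gate -> fanin gates, listed in topological order.
# Gates with no fanin gates are driven only by primary inputs (time 0).
FANINS = {"g1": [], "g2": ["g1"], "g3": ["g2"], "g5": [], "g4": ["g3", "g5"]}
PRIMARY_OUTPUTS = ["g4"]

# Assumed per-gate numbers, keyed by X (1 = low Vth, 0 = high Vth).
DELAY = {1: 1.0, 0: 2.0}      # low Vth is faster
LEAKAGE = {1: 20.0, 0: 3.0}   # low Vth leaks more (arbitrary units)


def critical_delay(x):
    """Latest output time over all primary outputs for assignment x."""
    latest = {}
    for gate, fanins in FANINS.items():
        latest[gate] = max((latest[f] for f in fanins), default=0.0) + DELAY[x[gate]]
    return max(latest[po] for po in PRIMARY_OUTPUTS)


def total_leakage(x):
    """Objective: sum over gates of X*IL + (1 - X)*IH."""
    return sum(LEAKAGE[x[gate]] for gate in FANINS)


def best_assignment(t_max):
    """Lowest-leakage assignment whose critical delay does not exceed t_max."""
    best = None
    for bits in product((0, 1), repeat=len(FANINS)):
        x = dict(zip(FANINS, bits))
        if critical_delay(x) > t_max:
            continue
        if best is None or total_leakage(x) < total_leakage(best):
            best = x
    return best


# Step 1: all gates low Vth gives the critical path delay Tc.
tc = critical_delay({gate: 1 for gate in FANINS})
# Step 2: minimize leakage without increasing Tc.
optimal = best_assignment(tc)
# Letting Tc grow by one unit allows further leakage reduction.
relaxed = best_assignment(tc + 1.0)
```

In this toy circuit only the off-critical gate `g5` can move to high $V_{th}$ at the original $T_c$; relaxing the delay limit by one unit lets one gate on the critical path switch as well.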

ILP Constraint Example
• Assume all primary input (PI) signals on the left arrive at the same time.
• For gate 2, constraints are