CSV881: Low-Power Design Gate-Level Power Optimization

Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal
Lectures 10, 11, 12: Gate-level optimization

Components of Power
• Dynamic
  • Signal transitions
    • Logic activity
    • Glitches
  • Short-circuit (often neglected)
• Static
  • Leakage

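Power of a Transition
Dynamic power = $C_L V_{DD}^2/2 + P_{sc}$
[Figure: CMOS gate switching model between $V_{DD}$ and ground, with resistances $R$, load capacitance $C_L$, input $V_i$, output $V_o$, and short-circuit current $i_{sc}$.]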

Dynamic Power
• Each transition of a gate consumes $CV^2/2$.
• Methods of power saving:
  • Minimize load capacitances
    • Transistor sizing
    • Library-based gate selection
  • Reduce transitions
    • Logic design
    • Glitch reduction

Glitch Power Reduction
• Design a digital circuit for minimum transient energy consumption by eliminating hazards.

Theorem 1
• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
  • Output logic state changes: one transition is necessary.
  • Output logic state unchanged: no transition is necessary.

Event Propagation
• A single lumped inertial delay is modeled for each gate.
• Primary input (PI) transitions are assumed to occur without time skew.
[Figure: example circuit with gate delays and event arrival times along paths P1, P2, and P3.]

Inertial Delay of an Inverter
$d = \frac{d_{HL} + d_{LH}}{2}$
[Figure: inverter waveforms $V_{in}$ and $V_{out}$ versus time, marking $d_{HL}$ and $d_{LH}$.]

Multi-Input Gate
• Gate with inputs A and B, output C, and delay $d < \mathrm{DPD}$
• DPD: differential path delay
[Figure: waveforms of A, B, and C with the input events separated by DPD; a hazard (glitch) appears at C.]

Balanced Path Delays
• Same gate (delay $d < \mathrm{DPD}$) with a delay buffer of delay DPD inserted on one input
[Figure: waveforms of A, B, and C; no glitch at C.]

Glitch Filtering by Inertia
• Gate with inputs A and B, output C, and delay $d > \mathrm{DPD}$
[Figure: waveforms of A, B, and C with the input events separated by DPD; the glitch at C is filtered.]

Theorem
• Given that events occur at the input of a gate, whose inertial delay is $d$, at times $t_1 \le \dots \le t_n$, the number of events at the gate output cannot exceed
  $\min\left(n,\ 1 + \frac{t_n - t_1}{d}\right)$
[Figure: time axis marking $t_1, t_2, t_3, \dots, t_n$ and the interval $t_n - t_1$.]
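The bound in this theorem is easy to evaluate numerically. The following is a minimal sketch, not code from the lecture: the function name `max_output_events` is hypothetical, and taking the floor of the ratio is an assumption made here because an event count must be an integer.

```python
import math

def max_output_events(arrival_times, d):
    """Upper bound on output events: min(n, 1 + (t_n - t_1)/d)."""
    if d <= 0:
        raise ValueError("inertial delay d must be positive")
    times = sorted(arrival_times)
    n = len(times)
    if n == 0:
        return 0
    spread = times[-1] - times[0]
    # Assumption: the fractional part is dropped, since events are counted in whole numbers.
    return min(n, 1 + math.floor(spread / d))

# Four input events spread over 5 time units, gate delay 2.
print(max_output_events([0, 1, 2, 5], d=2))
# Three input events that all fall within one gate delay.
print(max_output_events([0, 0.5, 1.0], d=2))
```

When the spread of input events is smaller than the gate delay, the bound collapses to a single output event, which is the case the glitch-suppression methods aim for.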

Minimum Transient Design
• Minimum transient energy condition for a Boolean gate: $|t_i - t_j| < d$, where $t_i$ and $t_j$ are arrival times of input events and $d$ is the inertial delay of the gate.

Balanced Delay Method
• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted
[Figure: example circuit annotated with gate and buffer delays; no increase in critical path delay.]

Hazard Filter Method
• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase
[Figure: example circuit annotated with gate delays.]

Designing a Glitch-Free Circuit
• Maintain specified critical path delay.
• Glitch suppressed at all gates by
  • Path delay balancing
  • Glitch filtering by increasing inertial delay of gates or by inserting delay buffers when necessary.
• A linear program optimally combines all objectives.
• For a gate of delay $D$ whose input paths have delays $d_1$ and $d_2$: $|d_1 - d_2| < D$

Problem Complexity
• Number of paths in a circuit can be exponential in circuit size.
• Considering all paths through enumeration is infeasible for large circuits.
• Example: c880 has 6.96M path constraints.
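To see how quickly the path count grows, the sketch below counts input-to-output paths in a small directed acyclic graph without enumerating them. The circuit built by `diamond_chain` is a hypothetical test structure (each stage forks into two gates that reconverge), not one of the benchmark circuits, and all names are illustrative.

```python
from functools import lru_cache

def count_paths(fanout, sources, sinks):
    """Count source-to-sink paths in a DAG given as {node: [successors]}."""
    sink_set = set(sinks)

    @lru_cache(maxsize=None)
    def paths_from(node):
        # A path may end at this node if it is a sink.
        total = 1 if node in sink_set else 0
        for nxt in fanout.get(node, ()):
            total += paths_from(nxt)
        return total

    return sum(paths_from(s) for s in sources)

def diamond_chain(stages):
    """Hypothetical circuit: every stage forks into two nodes that reconverge."""
    fanout = {}
    for k in range(stages):
        fanout[f"n{k}"] = [f"a{k}", f"b{k}"]
        fanout[f"a{k}"] = [f"n{k + 1}"]
        fanout[f"b{k}"] = [f"n{k + 1}"]
    return fanout, ["n0"], [f"n{stages}"]

for stages in (1, 10, 20):
    fanout, pis, pos = diamond_chain(stages)
    nodes = 3 * stages + 1
    print(f"{nodes} nodes -> {count_paths(fanout, pis, pos)} paths")
```

The node count grows linearly with the number of stages while the path count doubles per stage, which is why a formulation with one constraint per path does not scale.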

Define Arrival Time Variables
• $d_i$: gate delay.
• Define two timing window variables per gate output:
  • $t_i$: earliest time of signal transition at gate $i$.
  • $T_i$: latest time of signal transition at gate $i$.
• Glitch suppression constraint: $T_i - t_i < d_i$
[Figure: gate $i$ with delay $d_i$, input windows $(t_1, T_1), \dots, (t_n, T_n)$, and output window $(t_i, T_i)$.]
Reference: T. Raja, Master’s Thesis, Rutgers Univ., 2002.
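The timing-window idea can be sketched for fixed gate delays. This is an illustration under assumptions, not the thesis code: the netlist, the names `timing_windows` and `glitch_free`, and the choice to take the windows as tight (earliest input plus delay, latest input plus delay) are all made up here. The check is applied to multi-input gates only.

```python
def timing_windows(gates, order, pi_time=0.0):
    """Propagate (earliest, latest) transition times through a netlist.

    gates: {name: (delay, [fanin names])}; fanin names that are not gates
    are treated as primary inputs, all assumed to switch at pi_time.
    order: gate names in topological order.
    """
    t, T = {}, {}
    for g in order:
        d, fanins = gates[g]
        t[g] = min(t.get(f, pi_time) for f in fanins) + d  # earliest output event
        T[g] = max(T.get(f, pi_time) for f in fanins) + d  # latest output event
    return t, T

def glitch_free(gates, t, T):
    """Check T_i - t_i < d_i for every multi-input gate."""
    return {g: (T[g] - t[g]) < d
            for g, (d, fanins) in gates.items() if len(fanins) > 1}

# Hypothetical three-gate circuit with unit delays.
gates = {
    "g1": (1.0, ["a", "b"]),
    "g2": (1.0, ["g1", "c"]),  # inputs arrive at times 1 and 0
    "g3": (1.0, ["g2"]),
}
order = ["g1", "g2", "g3"]
t, T = timing_windows(gates, order)
print(glitch_free(gates, t, T))

gates["g2"] = (1.5, ["g1", "c"])  # a larger inertial delay filters the glitch
t, T = timing_windows(gates, order)
print(glitch_free(gates, t, T))
```

Because only two numbers are kept per gate output, the work grows with the number of gates rather than with the number of paths.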

Linear Program
• Variables: gate and buffer delays, arrival time variables.
• Objective: minimize number of buffers.
• Subject to: overall circuit delay constraint for all input-output paths.
• Subject to: minimum transient energy condition for all multi-input gates.

An Example: Full Adder add1b
[Figure: full adder circuit add1b with unit gate delays.]
Critical path delay = 6

Linear Program
• Gate variables: $d_4, \dots, d_{12}$
• Buffer delay variables: $d_{15}, \dots, d_{29}$
• Window variables: $t_4, \dots, t_{29}$ and $T_4, \dots, T_{29}$

Multiple-Input Gate Constraints
For gate 7:
• $T_7 \ge T_5 + d_7$ and $t_7 \le t_5 + d_7$
• $T_7 \ge T_6 + d_7$ and $t_7 \le t_6 + d_7$
• $d_7 > T_7 - t_7$ (glitch suppression)

Single-Input Gate Constraints
Buffer 19:
• $T_{16} + d_{19} = T_{19}$
• $t_{16} + d_{19} = t_{19}$

Critical Path Delay Constraints
• $T_{11} \le$ maxdelay
• $T_{12} \le$ maxdelay
• maxdelay is specified

Objective Function
• Need to minimize the number of buffers.
• Because that leads to a nonlinear objective function, we use an approximate criterion: minimize the sum of buffer delays over all buffers, i.e., minimize $d_{15} + d_{16} + \dots + d_{29}$
• This gives a near optimum result.
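Collecting the pieces, the linear program can be written in one place. This is a summary sketch generalized from the gate 7, buffer 19, and primary output examples; the index names $i$, $j$, $k$, and $b$ are notation introduced here.

```latex
\begin{align*}
\text{minimize}\quad & \sum_{b \,\in\, \text{buffers}} d_b \\
\text{subject to}\quad
 & T_i \ge T_j + d_i,\quad t_i \le t_j + d_i
   && \text{for each multi-input gate } i \text{ and each fanin } j, \\
 & d_i > T_i - t_i
   && \text{for each multi-input gate } i, \\
 & T_i = T_j + d_i,\quad t_i = t_j + d_i
   && \text{for each single-input gate or buffer } i \text{ with fanin } j, \\
 & T_k \le \texttt{maxdelay}
   && \text{for each primary output } k .
\end{align*}
```

The number of constraints grows with the number of gates and fanin connections, not with the number of paths.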

AMPL Solution: maxdelay = 6
[Figure: full adder annotated with the optimized gate and buffer delays.]
Critical path delay = 6

AMPL Solution: maxdelay = 7
[Figure: full adder annotated with the optimized gate and buffer delays.]
Critical path delay = 7

AMPL Solution: maxdelay ≥ 11
[Figure: full adder annotated with the optimized gate delays.]
Critical path delay = 11

ALU4: Four-Bit ALU 74181
Maximum power savings (zero-buffer design): Peak = 33%, Average = 21%

ALU4: Original and Low-Power

Benchmark Circuits
Circuit   Max-delay (gates)   No. of buffers   Normalized power, average   Normalized power, peak
ALU4              7                  5                   0.80                      0.68
ALU4             15                  0                   0.79                      0.67
C880             24                 62                   0.68                      0.54
C880             48                 34                   0.68                      0.52
C6288            47                294                   0.40                      0.36
C6288            94                120                   0.36                      0.34
C7552            43                366                   0.44                      0.34
C7552            86                111                   0.42                      0.32

C7552 Circuit: Spice Simulation
Power saving: Average 58%, Peak 68%

References
• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th Int’l Conf. VLSI Design, Jan. 2003, pp. 527-532.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. Low Power Electron., vol. 2, no. 1, pp. 121-128, Apr. 2006.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Variable Input Delay CMOS Logic for Low Power Design,” IEEE Trans. VLSI Systems, vol. 17, no. 10, pp. 1534-1545, Oct. 2009.

Exercise: Dynamic Power
• An average gate:
  • Supply voltage, $V = V_{DD} = 1$ V
  • Output capacitance, $C = 1$ pF
  • Activity factor, $\alpha = 10\%$
  • Clock frequency, $f = 1$ GHz
• What is the dynamic power consumption of a 1 million gate VLSI chip?

Answer
• Dynamic energy per transition = $0.5\,CV^2$
• Dynamic power per gate = energy per second = $0.5\,CV^2 \alpha f = 0.5 \times 10^{-12} \times 1^2 \times 0.1 \times 10^{9} = 0.5 \times 10^{-4}$ W = 50 μW
• Power for a 1 million gate chip = 50 W
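The same arithmetic can be checked in a few lines. This is a minimal sketch; the function name `dynamic_power_per_gate` is chosen here for illustration.

```python
def dynamic_power_per_gate(c_load, vdd, activity, freq):
    """Average dynamic power of one gate in watts: 0.5 * C * V^2 * alpha * f."""
    return 0.5 * c_load * vdd ** 2 * activity * freq

# Exercise values: C = 1 pF, V = 1 V, alpha = 10%, f = 1 GHz, 1 million gates.
p_gate = dynamic_power_per_gate(c_load=1e-12, vdd=1.0, activity=0.1, freq=1e9)
p_chip = 1_000_000 * p_gate
print(f"{p_gate * 1e6:.1f} uW per gate, {p_chip:.1f} W per chip")
```

Since the power scales with $V^2$, rerunning with `vdd=0.7` roughly halves the result, all else being equal.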

Components of Power
• Dynamic
  • Signal transitions
    • Logic activity
    • Glitches
  • Short-circuit
• Static
  • Leakage

Subthreshold Conduction
$I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right) \times \left(1 - \exp\frac{-V_{ds}}{V_T}\right)$
[Figure: $I_{ds}$ on a log scale (10 pA to 1 mA) versus $V_{gs}$ (0 to 1.8 V) for a transistor with terminals d, g, and s, showing the subthreshold region with its subthreshold slope below $V_{th}$ and the saturation region above it.]

Thermal Voltage, $V_T$
• $V_T = kT/q = 26$ mV at room temperature.
• When $V_{ds}$ is several times greater than $V_T$:
  $I_{ds} = I_0 \exp\left(\frac{V_{gs} - V_{th}}{nV_T}\right)$
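The effect of dropping the $V_{ds}$ term can be checked numerically. The sketch below compares the full subthreshold expression with the simplified one; the function names are hypothetical, and the values $n = 1.5$, $V_{th} = 0.3$ V, and a normalized $I_0 = 1$ are assumptions chosen only for illustration.

```python
import math

V_T = 0.026  # thermal voltage kT/q at room temperature, in volts

def subthreshold_current(vgs, vds, vth, i0, n=1.5):
    """Full form: I0 * exp((Vgs - Vth)/(n*VT)) * (1 - exp(-Vds/VT))."""
    return i0 * math.exp((vgs - vth) / (n * V_T)) * (1.0 - math.exp(-vds / V_T))

def subthreshold_current_large_vds(vgs, vth, i0, n=1.5):
    """Simplified form, intended for Vds several times larger than VT."""
    return i0 * math.exp((vgs - vth) / (n * V_T))

# Ratio of the full form to the simplified form as Vds grows (Vgs = 0).
for vds in (0.026, 0.1, 0.2, 1.2):
    full = subthreshold_current(0.0, vds, 0.3, 1.0)
    simple = subthreshold_current_large_vds(0.0, 0.3, 1.0)
    print(f"Vds = {vds:5.3f} V  full/simple = {full / simple:.4f}")
```

The ratio approaches 1 once $V_{ds}$ reaches a few multiples of 26 mV, which is why the second factor is usually omitted at normal supply voltages.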

Leakage Current
• Leakage current equals $I_{ds}$ when $V_{gs} = 0$
• Leakage current, $I_{ds} = I_0 \exp\left(-V_{th}/(nV_T)\right)$
• At cutoff, $V_{gs} = V_{th}$, and $I_{ds} = I_0$
• Lowering leakage to $10^{-b} \times I_0$: $V_{th} = b\,n\,V_T \ln 10 = 1.5b \times 26 \ln 10 = 90b$ mV (with $n = 1.5$)
• Example: to lower leakage to $I_0/1{,}000$, $V_{th} = 270$ mV
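The relation between threshold voltage and leakage reduction can be verified with a short script. This is a sketch using the slide's values ($n = 1.5$, $V_T = 26$ mV) as defaults; the function names are chosen here for illustration.

```python
import math

def vth_for_leakage_reduction(b, n=1.5, v_t_mv=26.0):
    """Threshold voltage in mV such that Ids(Vgs = 0) = 10**(-b) * I0."""
    return b * n * v_t_mv * math.log(10)

def leakage_ratio(vth_mv, n=1.5, v_t_mv=26.0):
    """Ids(Vgs = 0) / I0 = exp(-Vth / (n * VT))."""
    return math.exp(-vth_mv / (n * v_t_mv))

for b in (1, 2, 3):
    vth = vth_for_leakage_reduction(b)
    print(f"b = {b}: Vth = {vth:.1f} mV, leakage/I0 = {leakage_ratio(vth):.1e}")
```

The unrounded value is just under 90 mV per decade of leakage reduction, so the 270 mV figure for three decades is a rounded result.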

Threshold Voltage
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right]$
• $V_{t0}$ is the threshold voltage when the source is at body potential (0.4 V for a 180 nm process)
• $\Phi_s = 2V_T \ln(N_A/n_i)$ is the surface potential
• $\gamma = (2q\varepsilon_{si}N_A)^{1/2}\, t_{ox}/\varepsilon_{ox}$ is the body effect coefficient (0.4 to 1.0 V$^{1/2}$)
• $N_A$ is the doping level = $8 \times 10^{17}$ cm$^{-3}$
• $n_i = 1.45 \times 10^{10}$ cm$^{-3}$

Threshold Voltage, $V_{sb} = 1.1$ V
• Thermal voltage, $V_T = kT/q = 26$ mV
• $\Phi_s = 0.93$ V
• $\varepsilon_{ox} = 3.9 \times 8.85 \times 10^{-14}$ F/cm
• $\varepsilon_{si} = 11.7 \times 8.85 \times 10^{-14}$ F/cm
• $t_{ox} = 40$ Å
• $\gamma = 0.6$ V$^{1/2}$
• $V_{th} = V_{t0} + \gamma\left[(\Phi_s + V_{sb})^{1/2} - \Phi_s^{1/2}\right] = 0.68$ V
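The numbers on this slide can be reproduced from the threshold voltage formula. The sketch below assumes $V_{t0} = 0.4$ V, $N_A = 8 \times 10^{17}$ cm$^{-3}$, and $n_i = 1.45 \times 10^{10}$ cm$^{-3}$; the electron charge value and all function names are additions made here for illustration.

```python
import math

Q = 1.602e-19    # electron charge, C
EPS0 = 8.85e-14  # permittivity of free space, F/cm
V_T = 0.026      # thermal voltage, V

def surface_potential(n_a, n_i):
    """Phi_s = 2 * VT * ln(NA / ni), in volts."""
    return 2.0 * V_T * math.log(n_a / n_i)

def body_effect_coefficient(n_a, t_ox_cm):
    """gamma = sqrt(2 * q * eps_si * NA) * t_ox / eps_ox, in V^0.5."""
    eps_si = 11.7 * EPS0
    eps_ox = 3.9 * EPS0
    return math.sqrt(2.0 * Q * eps_si * n_a) * t_ox_cm / eps_ox

def threshold_voltage(v_t0, gamma, phi_s, v_sb):
    """Vth = Vt0 + gamma * (sqrt(Phi_s + Vsb) - sqrt(Phi_s))."""
    return v_t0 + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))

phi_s = surface_potential(n_a=8e17, n_i=1.45e10)
gamma = body_effect_coefficient(n_a=8e17, t_ox_cm=40e-8)  # 40 angstrom in cm
v_th = threshold_voltage(v_t0=0.4, gamma=gamma, phi_s=phi_s, v_sb=1.1)
print(f"phi_s = {phi_s:.2f} V, gamma = {gamma:.2f} V^0.5, Vth = {v_th:.2f} V")
```

Setting `v_sb=0.0` returns $V_{t0}$ itself, which shows that the increase above 0.4 V comes entirely from the source-to-body bias.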

A Sample Calculation
• $V_{DD} = 1.2$ V, 100 nm CMOS process
• Transistor width, $W = 0.5$ μm
• OFF device ($V_{gs} = V_{th}$) leakage:
  • $I_0 = 20$ nA/μm for a low-threshold transistor
  • $I_0 = 3$ nA/μm for a high-threshold transistor
• 100M transistor chip
• Power = $(100 \times 10^{6}/2)(0.5 \times 20 \times 10^{-9}\ \text{A})(1.2\ \text{V}) = 600$ mW for all low-threshold transistors
• Power = $(100 \times 10^{6}/2)(0.5 \times 3 \times 10^{-9}\ \text{A})(1.2\ \text{V}) = 90$ mW for all high-threshold transistors

Dual-Threshold Chip
• Low threshold only for the 20% of transistors on critical paths.
• Leakage power = $600 \times 0.2 + 90 \times 0.8 = 120 + 72 = 192$ mW
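The leakage figures for the all-low, all-high, and dual-threshold chips can be reproduced as follows. This is a sketch: the factor of 1/2 in the sample calculation is read here as the fraction of transistors that are OFF, which is an interpretation rather than something the slides state, and the function names are illustrative.

```python
def leakage_power(n_transistors, width_um, i0_per_um, vdd, off_fraction=0.5):
    """Leakage power in watts.

    off_fraction is an assumed reading of the 1/2 factor in the sample
    calculation: the share of devices that are OFF and leaking.
    """
    return n_transistors * off_fraction * width_um * i0_per_um * vdd

p_low = leakage_power(100e6, 0.5, 20e-9, 1.2)   # all low-threshold devices
p_high = leakage_power(100e6, 0.5, 3e-9, 1.2)   # all high-threshold devices

def dual_threshold_power(low_fraction):
    """Leakage when only low_fraction of the transistors use low Vth."""
    return low_fraction * p_low + (1.0 - low_fraction) * p_high

print(f"all low Vth : {p_low * 1e3:.0f} mW")
print(f"all high Vth: {p_high * 1e3:.0f} mW")
print(f"20% low Vth : {dual_threshold_power(0.2) * 1e3:.0f} mW")
```

Varying `low_fraction` shows how the saving depends on how many transistors must stay at low threshold to preserve timing.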

Dual-Threshold CMOS Circuit

Dual-Threshold Design
• To maintain performance, all gates on critical paths are assigned low $V_{th}$.
• Most other gates are assigned high $V_{th}$.
• But some gates on non-critical paths may also be assigned low $V_{th}$ to prevent those paths from becoming critical.

Integer Linear Programming (ILP) to Minimize Leakage Power
• Use a dual-threshold CMOS process
• First, assign all gates low $V_{th}$
• Use an ILP model to find the delay ($T_c$) of the critical path
• Use another ILP model to find the optimal $V_{th}$ assignment, as well as the reduced leakage power for all gates, without increasing $T_c$
• Further reduction of leakage power is possible by letting $T_c$ increase
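The three-step procedure can be illustrated on a toy circuit. In the sketch below an exhaustive search stands in for the ILP solver, which is only practical for a handful of gates; the five-gate netlist and the delay and leakage numbers are hypothetical and not taken from any cell library or SPICE run.

```python
from itertools import product

# Hypothetical circuit: gate name -> fanin gate names (primary inputs omitted).
FANIN = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g3"], "g5": ["g1"]}
ORDER = ["g1", "g2", "g3", "g4", "g5"]  # topological order
OUTPUTS = ["g4", "g5"]
# Illustrative per-gate values for each threshold choice.
DELAY = {"low": 1.0, "high": 1.4}
LEAK = {"low": 20.0, "high": 3.0}

def critical_delay(assign):
    """Longest arrival time at any primary output for a Vth assignment."""
    T = {}
    for g in ORDER:
        T[g] = max((T[f] for f in FANIN[g]), default=0.0) + DELAY[assign[g]]
    return max(T[o] for o in OUTPUTS)

def total_leakage(assign):
    return sum(LEAK[assign[g]] for g in ORDER)

def best_assignment(t_max):
    """Exhaustive search over all assignments, standing in for the ILP."""
    best = None
    for choice in product(("low", "high"), repeat=len(ORDER)):
        assign = dict(zip(ORDER, choice))
        if critical_delay(assign) <= t_max + 1e-9:
            if best is None or total_leakage(assign) < total_leakage(best):
                best = assign
    return best

all_low = {g: "low" for g in ORDER}
t_c = critical_delay(all_low)          # critical path delay with all gates low Vth
best = best_assignment(t_c)            # minimize leakage without increasing Tc
print(t_c, best, total_leakage(best))
relaxed = best_assignment(1.25 * t_c)  # allow Tc to increase by 25%
print(relaxed, total_leakage(relaxed))
```

With the delay held at $T_c$, only the gate with timing slack moves to high threshold; relaxing the delay lets more gates switch and lowers leakage further.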

ILP - Variables
For each gate $i$ define two variables:
• $T_i$: the longest time at which the output of gate $i$ can produce an event after the occurrence of an input event at a primary input of the circuit.
• $X_i$: a variable specifying low or high $V_{th}$ for gate $i$; $X_i \in \{0, 1\}$ is an integer, where 1 means gate $i$ is assigned low $V_{th}$ and 0 means gate $i$ is assigned high $V_{th}$.

ILP - Objective Function
• Leakage power: minimize the sum of all gate leakage currents, given by $\sum_i \left[X_i I_{Li} + (1 - X_i) I_{Hi}\right]$
• $I_{Li}$ is the leakage current of gate $i$ with low $V_{th}$
• $I_{Hi}$ is the leakage current of gate $i$ with high $V_{th}$
• Using SPICE simulation results, construct a leakage current look-up table, which is indexed by the gate type and the input vector.

ILP - Constraints
• For each gate $i$: constraints (1) and (2), where the output of gate $j$ is a fanin of gate $i$
• Max delay constraints for primary outputs (PO): constraint (3), where $T_{max}$ is the maximum delay of the critical path
[Figure: gate $j$ with output time $T_j$ driving gate $i$ with output time $T_i$.]
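One plausible form of these constraints, written with the variables $T_i$ and $X_i$ defined earlier, is sketched below. The symbols $D_{Li}$ and $D_{Hi}$ (delay of gate $i$ with low and high $V_{th}$) are assumptions introduced here, and the grouping may not match the slide's numbering (1) to (3).

```latex
\begin{align*}
 & T_i \ge T_j + X_i D_{Li} + (1 - X_i) D_{Hi}
   && \text{for each gate } i \text{ and each fanin gate } j, \\
 & X_i \in \{0, 1\}
   && \text{for each gate } i, \\
 & T_k \le T_{\max}
   && \text{for each primary output } k .
\end{align*}
```

Under this form, choosing $X_i = 1$ selects the low-threshold delay $D_{Li}$ for gate $i$, and $X_i = 0$ selects $D_{Hi}$.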

ILP Constraint Example
• Assume all primary input (PI) signals on the left arrive at the same time.
• For gate 2, constraints are
[Figure: example circuit and the constraints for gate 2.]
