ELEC 5970-001/6970-001(Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic Circuits Low-Power Logic Design and Parallelism. Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University
ELEC 5970-001/6970-001 (Fall 2005) Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Low-Power Logic Design and Parallelism
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
ELEC 5970-001/6970-001 Lecture 17
State Encoding
• Two-bit binary counter:
  • State sequence: 00→01→10→11→00
  • Six bit transitions in four clock cycles
  • 6/4 = 1.5 transitions per clock
• Two-bit Gray-code counter:
  • State sequence: 00→01→11→10→00
  • Four bit transitions in four clock cycles
  • 4/4 = 1.0 transition per clock
• The Gray-code counter is more power efficient.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.
Three-Bit Counters
[Figure not recovered.]
N-Bit Counter: Toggles in Counting Cycle
• Binary counter: $T(\text{binary}) = 2(2^N - 1)$
• Gray-code counter: $T(\text{gray}) = 2^N$
• $T(\text{gray})/T(\text{binary}) = 2^{N-1}/(2^N - 1) \to 0.5$ as N grows
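The toggle counts above can be checked by brute force. The following is a minimal sketch, not taken from the lecture: the function names are illustrative, and the Gray code is assumed to be the standard reflected binary code `i ^ (i >> 1)`.

```python
def gray(i: int) -> int:
    """Reflected binary Gray code of i (assumed encoding)."""
    return i ^ (i >> 1)


def toggles_per_cycle(codes: list[int]) -> int:
    """Total bit flips over one full counting cycle, including the wrap-around."""
    total = 0
    for k, cur in enumerate(codes):
        nxt = codes[(k + 1) % len(codes)]
        total += bin(cur ^ nxt).count("1")  # Hamming distance between states
    return total


def counter_toggles(n_bits: int) -> tuple[int, int]:
    """Return (binary toggles, Gray-code toggles) for an n_bits counter."""
    states = list(range(2 ** n_bits))
    binary = toggles_per_cycle(states)
    gray_code = toggles_per_cycle([gray(i) for i in states])
    return binary, gray_code


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        b, g = counter_toggles(n)
        print(f"N={n}: binary={b}, gray={g}, ratio={g / b:.3f}")
```

For N = 2 this reproduces the six and four transitions counted on the State Encoding slide, and the ratio approaches 0.5 as N increases.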
Bus Encoding
• Example: four-bit bus
• 0000→1110 has three transitions.
• If the bits of the second pattern are inverted, then 0000→0001 has only one transition.
• Bit-inversion encoding for an N-bit bus:
[Figure: number of bit transitions after inversion encoding (vertical axis, ticks 0, N/2, N) versus number of bit transitions (horizontal axis, ticks 0, N/2, N).]
Bus-Inversion Encoding Logic
[Figure: block diagram with sent data, polarity decision logic, bus register, polarity bit, and received data.]
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
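The polarity decision can be sketched in a few lines. This is an illustrative model rather than the circuit from the cited paper: it assumes the bus register starts at all zeros, inverts a word when more than N/2 data lines would otherwise switch, and does not count transitions on the polarity line itself. All function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words: list[int], n_bits: int) -> list[tuple[int, int]]:
    """Return (bus_value, polarity_bit) pairs for a sequence of data words."""
    mask = (1 << n_bits) - 1
    bus = 0  # assumed initial contents of the bus register
    encoded = []
    for word in words:
        if hamming(bus, word) > n_bits / 2:
            bus, polarity = ~word & mask, 1  # send the complement
        else:
            bus, polarity = word, 0          # send the word unchanged
        encoded.append((bus, polarity))
    return encoded


def bus_invert_decode(encoded: list[tuple[int, int]], n_bits: int) -> list[int]:
    """Recover the original words from (bus_value, polarity_bit) pairs."""
    mask = (1 << n_bits) - 1
    return [(~bus & mask) if polarity else bus for bus, polarity in encoded]


if __name__ == "__main__":
    data = [0b1110, 0b0001, 0b1111, 0b0000]
    sent = bus_invert_encode(data, 4)
    print(sent)
    print(bus_invert_decode(sent, 4) == data)
```

With this rule the data lines never switch more than N/2 bits per transfer, which matches the folded shape of the plot on the Bus Encoding slide.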
FSM State Encoding
Transition probability based on PI statistics
[Figure: the same three-state transition graph drawn under two state encodings, the first with state codes 11, 00, 01 and the second with state codes 01, 00, 11; edge probabilities 0.6, 0.3, 0.1, 0.4, 0.1, 0.9, 0.6 in both.]
Expected number of state-bit transitions:
• First encoding: 2(0.3+0.4) + 1(0.1+0.1) = 1.6
• Second encoding: 1(0.3+0.4+0.1) + 2(0.1) = 1.0
State encoding can be selected using a power-based cost function.
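A power-based cost function of this kind can be sketched as a probability-weighted sum of Hamming distances. The edge list below is an assumption: the slide's graph did not survive extraction, so the topology is reconstructed only to be consistent with the two sums above, and the state names S0, S1, S2 are hypothetical. Self-loops are omitted because they cause no state-bit transitions.

```python
from itertools import permutations

# Assumed edges (source, destination, probability), chosen to match the slide's sums.
EDGES = [("S0", "S1", 0.3), ("S1", "S0", 0.4), ("S0", "S2", 0.1), ("S2", "S1", 0.1)]


def hamming(a: int, b: int) -> int:
    """Number of state bits that differ between two codes."""
    return bin(a ^ b).count("1")


def expected_toggles(encoding: dict[str, int]) -> float:
    """Cost function: sum of probability times Hamming distance over all edges."""
    return sum(p * hamming(encoding[s], encoding[d]) for s, d, p in EDGES)


def best_encoding(states: list[str], n_bits: int) -> dict[str, int]:
    """Exhaustively search all code assignments for the lowest cost."""
    candidates = permutations(range(2 ** n_bits), len(states))
    best = min(candidates, key=lambda codes: expected_toggles(dict(zip(states, codes))))
    return dict(zip(states, best))


if __name__ == "__main__":
    first = {"S0": 0b11, "S1": 0b00, "S2": 0b01}
    second = {"S0": 0b01, "S1": 0b00, "S2": 0b11}
    print(expected_toggles(first), expected_toggles(second))
    print(best_encoding(["S0", "S1", "S2"], 2))
```

Exhaustive search is only practical for very small machines; it is used here purely to show that the cost function ranks encodings.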
FSM: Clock-Gating
• Moore machine: outputs depend only on the state variables.
• If a state has a self-loop in the state transition graph (STG), the clock can be stopped whenever the self-loop is to be executed.
[Figure: STG fragment with states Si, Sj, Sk and edge labels Xi/Zk, Xj/Zk, Xk/Zk.]
The clock can be stopped when the (Xk, Sk) combination occurs.
Clock-Gating in Moore FSM
[Figure: block diagram with combinational logic (PI, PO), flip-flops, clock activation logic, a latch, and clock CK.]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
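The effect of stopping the clock on self-loops can be illustrated with a small behavioral model. The three-state machine below is hypothetical and is not the STG from the slides; the model counts only the clock edges delivered to the state flip-flops and ignores the power drawn by the clock activation logic and latch.

```python
# Hypothetical next-state table: (state, input) -> next state.
FSM = {
    ("IDLE", 0): "IDLE", ("IDLE", 1): "RUN",
    ("RUN", 0): "RUN",   ("RUN", 1): "DONE",
    ("DONE", 0): "IDLE", ("DONE", 1): "DONE",
}


def simulate(fsm: dict, start: str, inputs: list[int], gated: bool) -> tuple[str, int]:
    """Return (final state, clock edges delivered to the state flip-flops)."""
    state = start
    clock_edges = 0
    for x in inputs:
        nxt = fsm[(state, x)]
        # With gating, the activation logic suppresses the clock on a self-loop.
        if not gated or nxt != state:
            clock_edges += 1
            state = nxt
    return state, clock_edges


if __name__ == "__main__":
    stimulus = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0]
    print(simulate(FSM, "IDLE", stimulus, gated=False))
    print(simulate(FSM, "IDLE", stimulus, gated=True))
```

Both runs end in the same state, so gating does not change behavior; the saving depends on how often the input statistics keep the machine in a self-loop.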
Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop schematic with signals D, Q, and CK.]
Low-Power Datapath Architecture
• Lower the supply voltage
  • This slows down circuit speed
  • Use parallel computing to gain the speed back
• Works well when the threshold voltage is also lowered.
• About 60% reduction in power is obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
A Reference Datapath
[Figure: input register, combinational logic, and output register clocked by CK; switched capacitance Cref.]
• Supply voltage = $V_{ref}$
• Total capacitance switched per cycle = $C_{ref}$
• Clock frequency = $f$
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
[Figure: input and output at rate f; registers and combinational-logic copies 1 to N clocked at f/N; N-to-1 multiplexer; multiphase clock generator and mux control driven by CK.]
• Supply voltage: $V_N \le V_1 = V_{ref}$
• N = degree of parallelism
• Each copy processes every Nth input and operates at reduced voltage.
Control Signals, N = 4
[Figure: timing diagram of CK and clock phases 1 to 4.]
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
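The final ratio is easy to evaluate numerically. A minimal sketch follows; the function name is illustrative, and the overhead factor δ = 0.1 used in the demo is a made-up value, not one given in the lecture.

```python
def parallel_power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta * (N - 1)] * (V_N / V_ref)^2."""
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


if __name__ == "__main__":
    # V2 = 2.9 V and Vref = 5 V are the values marked on the Voltage vs. Speed plot;
    # delta = 0.1 is a hypothetical overhead factor chosen for illustration.
    print(parallel_power_ratio(n=2, v_n=2.9, v_ref=5.0, delta=0.1))
```

With N = 1 and V_N = V_ref the ratio is 1 regardless of δ, which is a quick sanity check on the formula.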
Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where
• $I$ is the saturation current
• $k$ is a technology parameter
• $W/L$ is the width-to-length ratio of the transistor
• $V_t$ is the threshold voltage
Voltage reduction slows down as we get closer to $V_t$.
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS; marked points N = 1 at Vref = 5 V, N = 2 at V2 = 2.9 V, N = 3 at V3; Vt marked on the voltage axis.]
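Given the delay model above, the reduced supply voltage for N parallel copies is the voltage at which a gate is N times slower than at Vref. The sketch below solves for it by bisection. The function names and the Vt = 0.8 V used in the demo are assumptions, and this simple model need not reproduce the 2.9 V read off the plotted curve.

```python
def gate_delay(v: float, v_t: float) -> float:
    """Normalized delay V / (V - Vt)^2; the constant C_L / (k W/L) is dropped."""
    return v / (v - v_t) ** 2


def supply_for_parallelism(n: int, v_ref: float, v_t: float, tol: float = 1e-9) -> float:
    """Find V_N in (Vt, Vref] such that delay(V_N) = N * delay(Vref)."""
    target = n * gate_delay(v_ref, v_t)
    lo, hi = v_t + 1e-9 * v_ref, v_ref  # delay falls monotonically as V rises above Vt
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, v_t) > target:
            lo = mid  # too slow: the supply must be higher
        else:
            hi = mid  # fast enough: try a lower supply
    return hi


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(supply_for_parallelism(n, v_ref=5.0, v_t=0.8), 3))
```

The printed voltages shrink by less and less as N grows, which is the "slows down as we get closer to Vt" behavior noted on the slide.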
Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5 V, with curves for Vt = 0.8 V, Vt = 0.4 V, and Vt = 0 V (extreme case).]
Extreme Case: Vt = 0
Delay: $T \propto 1/V_{ref}$
For N processing elements, delay = $NT$, so $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2} \to \frac{1}{N}$
For negligible overhead, $\delta \to 0$:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For $V_t > 0$, the power reduction is less and there will be an optimum value of N.
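The steps behind this slide can be written out from the gate-delay formula and the power ratio given earlier in the lecture. The derivation below is a sketch; reading the slide's limit of 1/N as the case δ = 1 is an interpretation, not something the slide states.

```latex
\begin{align*}
T &\approx \frac{C_L V}{k (W/L) (V - V_t)^2}
   \;\xrightarrow{\;V_t = 0\;}\; \frac{C_L}{k (W/L)\, V} \propto \frac{1}{V} \\
T(V_N) &= N\, T(V_{ref})
   \;\Rightarrow\; \frac{1}{V_N} = \frac{N}{V_{ref}}
   \;\Rightarrow\; V_N = \frac{V_{ref}}{N} \\
\frac{P_N}{P_1} &= [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}
   = \frac{1 + \delta (N - 1)}{N^2} \\
\delta = 0 &: \quad \frac{P_N}{P_1} = \frac{1}{N^2},
\qquad
\delta = 1 : \quad \frac{P_N}{P_1} = \frac{N}{N^2} = \frac{1}{N}
\end{align*}
```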
Reduced-Power Shift Register
[Figure: chains of D flip-flops clocked by CK(f/2), feeding an output multiplexer.]
Flip-flops are operated at full voltage and half the clock frequency.
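One plausible reading of this structure is two interleaved half-length chains, each clocked on alternate cycles, with the output multiplexer picking the chain that was just clocked. The behavioral sketch below assumes that reading (the figure itself did not survive extraction), assumes registers reset to zero, and uses illustrative function names. It checks that the interleaved version produces the same output stream while clocking half as many flip-flops per cycle.

```python
from collections import deque


def shift_register(xs: list[int], length: int) -> tuple[list[int], int]:
    """Reference: one chain of `length` flip-flops, all clocked every cycle."""
    reg = deque([0] * length)
    out, ff_clockings = [], 0
    for x in xs:
        out.append(reg.pop())   # oldest bit leaves the chain
        reg.appendleft(x)       # new bit enters
        ff_clockings += length
    return out, ff_clockings


def interleaved_shift_register(xs: list[int], length: int) -> tuple[list[int], int]:
    """Two half-length chains; each is clocked on alternate cycles (rate f/2)."""
    assert length % 2 == 0, "this sketch assumes an even number of stages"
    half = length // 2
    banks = [deque([0] * half), deque([0] * half)]
    out, ff_clockings = [], 0
    for t, x in enumerate(xs):
        bank = banks[t % 2]       # only this chain receives a clock edge
        out.append(bank.pop())    # output multiplexer selects the active chain
        bank.appendleft(x)
        ff_clockings += half
    return out, ff_clockings


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
    ref_out, ref_clk = shift_register(bits, 16)
    par_out, par_clk = interleaved_shift_register(bits, 16)
    print(ref_out == par_out, par_clk / ref_clk)
```

The multiplexer and clock-phase generation add overhead that this model does not account for.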
Power Consumption of Shift Reg.
16-bit shift register, 2μ CMOS
$P = C' V_{DD}^2 f / n$
[Figure: normalized power (ticks 0.0, 0.25, 0.5, 1.0) versus degree of parallelism n = 1, 2, 4.]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks, multicore versus single core, 2000 to 2008. Source: Computer, May 2005, p. 12.]