ELEC 5970-001/6970-001 (Fall 2005)
Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu
ELEC 5970-001/6970-001 Lecture 19
SIA Roadmap for Processors (1999)
Source: http://www.semichips.org
Power Reduction in Processors
• Nearly every available technique is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods
SPEC CPU2000 Benchmarks
• Twelve integer programs (CINT2000) and 14 floating-point programs (CFP2000).
• The run time of each program is normalized to the run time on a Sun Ultra 5_10 with a 300MHz processor, giving a SPEC ratio.
• The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
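The summary calculation on this slide can be sketched in a few lines of Python. The run times are made-up placeholders, not SPEC data, and the function names are hypothetical. The factor of 100 assumes the SPEC CPU2000 convention that the reference machine scores 100; the slide itself says only that run times are normalized to the reference machine.

```python
import math

def spec_ratio(reference_seconds, measured_seconds, scale=100.0):
    """Normalized score for one program; larger means faster than the reference.

    Assumption: scale=100 follows the SPEC CPU2000 convention.
    """
    return scale * reference_seconds / measured_seconds

def summary_score(ratios):
    """Geometric mean of the per-program SPEC ratios (computed via logarithms)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Placeholder run times in seconds: (reference machine, machine under test).
runs = [(1400.0, 140.0), (1800.0, 90.0), (1100.0, 275.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in runs]
print(ratios)
print(summary_score(ratios))
```

A geometric mean is used so that the relative ranking of two machines does not depend on which machine is chosen as the reference.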
Reference CPU: Sun Ultra 5_10 300MHz Processor
CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
Source: www.spec.org
Two Benchmark Results
• Baseline: A uniform configuration, not optimized for any specific program:
  • The same compiler, with the same settings and flags, is used for all benchmarks
  • Other restrictions
• Peak: The run is optimized to obtain peak performance for each benchmark program.
CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
Source: www.spec.org
CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
Source: www.spec.org
CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
Source: www.spec.org
Energy SPEC Benchmarks
• Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

\[ \text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{Joules consumed}} \]
Energy Efficiency
• Efficiency averaged over n benchmark programs:

\[ \text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n} \]

where $\text{Efficiency}_i$ is the efficiency for program i.
• Relative efficiency:

\[ \text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}} \]
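As a rough illustration of these definitions, the sketch below computes per-program energy efficiency, its geometric mean, and the relative efficiency against a reference machine. All run times and energy figures are invented placeholders, and the function names are hypothetical.

```python
import math

def energy_efficiency(execution_seconds, joules):
    """(1 / execution time) / joules consumed, as defined on the slide."""
    return (1.0 / execution_seconds) / joules

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_efficiency(machine_runs, reference_runs):
    """Ratio of the geometric-mean efficiencies of two machines.

    Each argument is a list of (execution_seconds, joules) pairs, one per program.
    """
    machine = geometric_mean([energy_efficiency(t, e) for t, e in machine_runs])
    reference = geometric_mean([energy_efficiency(t, e) for t, e in reference_runs])
    return machine / reference

# Placeholder measurements: (seconds, joules) for two programs.
reference_runs = [(100.0, 2000.0), (200.0, 4000.0)]
machine_runs = [(50.0, 1000.0), (100.0, 2000.0)]
print(relative_efficiency(machine_runs, reference_runs))
```

In this made-up case the machine under test halves both run time and energy on every program, so its relative efficiency comes out at about 4: the metric is the reciprocal of the time-energy product.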
SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three configurations: always maximum clock; laptop with adaptive clock; minimum power with minimum clock]
Voltage Scaling
• Dynamic: Reduce voltage and frequency during idle or low-activity periods.
• Static: Clustered voltage scaling
  • Logic on non-critical paths is given a lower voltage.
  • 47% power reduction with 10% area increase reported.
  • M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
Pipeline Gating
• A pipelined processor uses speculative execution.
• Incorrect branch prediction results in pipeline stalls and wasted energy.
• Idea: Stop fetching instructions if a branch hazard is expected:
  • If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetch for some k cycles.
• Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
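The gating rule stated on this slide can be modeled with a small counter, as in the toy sketch below. It follows only the slide's wording (count M, threshold N, suspend for k cycles) and is not the full mechanism of the cited paper. The class and method names are hypothetical, and clearing the counter when gating starts is an assumption, since the slide does not say how M is reset.

```python
class FetchGate:
    """Toy model of the slide's rule: if M > N, suspend fetch for k cycles."""

    def __init__(self, threshold_n, stall_cycles_k):
        self.n = threshold_n
        self.k = stall_cycles_k
        self.m = 0           # count of incorrect predictions observed
        self.stall_left = 0  # gated cycles still to come

    def record_misprediction(self):
        self.m += 1

    def fetch_allowed(self):
        """Call once per cycle; returns True if the fetch stage may run."""
        if self.stall_left > 0:
            self.stall_left -= 1
            return False
        if self.m > self.n:
            # Assumption: the counter is cleared when gating begins.
            self.m = 0
            self.stall_left = self.k - 1  # this cycle is the first gated cycle
            return False
        return True


gate = FetchGate(threshold_n=2, stall_cycles_k=3)
for _ in range(3):  # three mispredictions push M above N = 2
    gate.record_misprediction()
print([gate.fetch_allowed() for _ in range(5)])
```

With N = 2 and k = 3, fetch is suspended for exactly three cycles and then resumes.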
Slack Scheduling
• Application: Superscalar, out-of-order execution:
  • An instruction is executed as soon as the data and resources it needs become available.
  • A commit unit reorders the results.
• Delay the execution of instructions whose results are not immediately needed.
• Example of RISC instructions:
  • add r0, r1, r2; (A)
  • sub r3, r4, r5; (B)
  • and r9, x1, r9; (C)
  • or r5, r9, r10; (D)
  • xor r2, r10, r11; (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
Slack Scheduling Example
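The original figure for this example is not available, so the following is only a sketch of how slack could be computed for the five instructions (A) to (E). It rests on several assumptions that the slides do not state: the last operand of each instruction is the destination register, every instruction takes one cycle, execution units are unlimited, and only read-after-write dependences matter. The operand `x1` is copied as it appears in the listing. All names in the code are hypothetical.

```python
# (label, source registers, destination register)
# Assumption: the last operand in the slide's listing is the destination.
PROGRAM = [
    ("A", ("r0", "r1"), "r2"),    # add r0, r1, r2
    ("B", ("r3", "r4"), "r5"),    # sub r3, r4, r5
    ("C", ("r9", "x1"), "r9"),    # and r9, x1, r9
    ("D", ("r5", "r9"), "r10"),   # or  r5, r9, r10
    ("E", ("r2", "r10"), "r11"),  # xor r2, r10, r11
]

def compute_slack(program):
    """Slack = latest start cycle - earliest start cycle (unit latency assumed)."""
    last_writer = {}
    producers = {}
    for label, sources, dest in program:
        # Read-after-write dependences on the most recent writer of each source.
        producers[label] = {last_writer[s] for s in sources if s in last_writer}
        last_writer[dest] = label

    earliest = {}
    for label, _, _ in program:
        earliest[label] = max((earliest[p] + 1 for p in producers[label]), default=0)

    final_cycle = max(earliest.values())
    latest = {label: final_cycle for label, _, _ in program}
    # Program order is a valid topological order, so walk it backwards.
    for label, _, _ in reversed(program):
        for p in producers[label]:
            latest[p] = min(latest[p], latest[label] - 1)

    return {label: latest[label] - earliest[label] for label, _, _ in program}

print(compute_slack(PROGRAM))
```

Under these assumptions only instruction A has slack (one cycle): its result is consumed by E, which must also wait for D, so A could be delayed or sent to a slower, lower-power unit without lengthening the schedule.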
Slack Scheduling
[Figure: block diagram with labels: re-order buffer, scheduling logic, low-power execution units, slack bit]
Parallel Architecture
Single processor, clocked at f:
• Capacitance = C
• Voltage = V
• Frequency = f
• Power = $CV^2 f$
Two processors in parallel, each clocked at f/2:
• Capacitance = 2.2C
• Voltage = 0.6V
• Frequency = 0.5f
• Power = $0.396\,CV^2 f$
Pipeline Architecture
Single processor, clocked at f:
• Capacitance = C
• Voltage = V
• Frequency = f
• Power = $CV^2 f$
Pipelined processor (two ½-processor stages with pipeline registers), clocked at f:
• Capacitance = 1.2C
• Voltage = 0.6V
• Frequency = f
• Power = $0.432\,CV^2 f$
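The power figures for the parallel and pipelined organizations follow directly from the dynamic power expression $CV^2 f$ used on these slides. A minimal check, with capacitance, voltage and frequency expressed as multiples of the single-processor values C, V and f (the function name is hypothetical):

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic power C * V^2 * f, here in units of the baseline C*V^2*f."""
    return capacitance * voltage ** 2 * frequency

baseline = dynamic_power(1.0, 1.0, 1.0)   # single processor
parallel = dynamic_power(2.2, 0.6, 0.5)   # two processors, each at f/2
pipelined = dynamic_power(1.2, 0.6, 1.0)  # two pipeline stages at f

for name, power in [("baseline", baseline),
                    ("parallel", parallel),
                    ("pipelined", pipelined)]:
    print(f"{name}: {power:.3f} x CV^2f")
```

Both alternatives pay for extra capacitance (duplicated logic or added registers) but come out ahead because the supply voltage enters the expression squared.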
Approximate Trend
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Clock Distribution
[Figure: clock distribution network; input labeled "clock"]
Clock Power

\[ P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n} \]

where
• $C_L$ = total load capacitance
• $\lambda$ = constant fanout at each stage in distribution network
Clock consumes about 40% of total processor power.
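The series for the clock power is a finite geometric sum, so it can be evaluated term by term or in closed form. The sketch below does both. The numeric inputs (1 nF load, 1.5 V, 200 MHz, fanout of 4, three stages) are illustrative assumptions, not values from the lecture, and the function names are hypothetical.

```python
def clock_power(c_load, v_dd, freq, fanout, stages):
    """P_clk = C_L * V_DD^2 * f * sum_{n=0}^{stages-1} 1 / fanout^n."""
    base = c_load * v_dd ** 2 * freq
    return base * sum(1.0 / fanout ** n for n in range(stages))

def clock_power_closed_form(c_load, v_dd, freq, fanout, stages):
    """Same quantity via the geometric-series formula; requires fanout != 1."""
    base = c_load * v_dd ** 2 * freq
    r = 1.0 / fanout
    return base * (1.0 - r ** stages) / (1.0 - r)

# Illustrative values only (assumed, not from the slides).
p_sum = clock_power(1e-9, 1.5, 200e6, fanout=4, stages=3)
p_closed = clock_power_closed_form(1e-9, 1.5, 200e6, fanout=4, stages=3)
print(p_sum, p_closed)
```

Because each earlier stage contributes a factor of 1/λ less than the one after it, the final stage driving the load dominates: with a fanout of 4, the whole network costs only about a third more than the load alone.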
Clock Network Examples
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
• Reduce voltage to 1.5V: power falls 5.3x, to 4.9W
• Eliminate FP: power falls 3x, to 1.6W
• Scale 0.75→0.35μm: power falls 2x, to 0.8W
• Reduce clock load: power falls 1.3x, to 0.6W
• Reduce frequency 200→160MHz: power falls 1.25x, to 0.5W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
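The chain of reductions in this example can be replayed numerically. The sketch below applies the slide's factors in order, starting from 26W, and also checks that the voltage and frequency factors agree with the $V^2$ and $f$ dependence of dynamic power. The function and variable names are hypothetical; the slide rounds each intermediate value, so the unrounded running figures differ slightly.

```python
START_WATTS = 26.0  # Alpha 21064 at 200MHz, 3.45V

# (step, reduction factor) exactly as listed on the slide.
STEPS = [
    ("Reduce voltage 3.45V -> 1.5V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160MHz", 1.25),
]

def apply_reductions(start_watts, steps):
    """Return the running power after each step (no intermediate rounding)."""
    power = start_watts
    trace = []
    for name, factor in steps:
        power /= factor
        trace.append((name, power))
    return trace

trace = apply_reductions(START_WATTS, STEPS)
for name, watts in trace:
    print(f"{name}: {watts:.2f} W")

voltage_factor = (3.45 / 1.5) ** 2  # dynamic power scales with V^2
frequency_factor = 200.0 / 160.0    # and linearly with f
print(voltage_factor, frequency_factor)
```

The combined factor is a little over 50x, which takes the 26W starting point to roughly the 0.5W of the cited design.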