ELEC 7770 Advanced VLSI Design, Spring 2007
Power Aware Microprocessors
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
ELEC 7770: Advanced VLSI Design (Agrawal)

SIA Roadmap for Processors (1999)
Source: http://www.semichips.org

Power Reduction in Processors
• Just about every available technique is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods

SPEC CPU2000 Benchmarks
• Twelve integer and 14 floating-point programs, CINT2000 and CFP2000.
• Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra 5_10 with a 300 MHz processor.
• CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.

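The normalization and averaging described above can be sketched as follows. This is only an illustration, not SPEC's own tooling: the run times are made-up placeholders, the function names are hypothetical, and the scale factor of 100 (so that the reference machine scores 100) is an assumption based on the SPEC CPU2000 reporting convention rather than on the slide.

```python
import math

def spec_ratio(ref_time_s, run_time_s, scale=100.0):
    """Reference run time divided by measured run time (scale is an assumption)."""
    return scale * ref_time_s / run_time_s

def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference seconds, measured seconds) pairs, not real SPEC data.
runs = [(1400.0, 140.0), (1800.0, 90.0), (1100.0, 220.0)]

ratios = [spec_ratio(ref, run) for ref, run in runs]
summary = geometric_mean(ratios)

print("SPEC ratios:", ratios)
print("Summary (geometric mean):", round(summary, 1))
```

A geometric mean is used so that no single benchmark with a very large ratio dominates the summary figure.
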
Reference CPU s: Sun Ultra 5_10 300MHz Processor

CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
Source: www.spec.org

Two Benchmark Results
• Baseline: A uniform configuration, not optimized for any specific program:
  • Same compiler with the same settings and flags used for all benchmarks
  • Other restrictions
• Peak: The run is optimized to obtain peak performance for each benchmark program.

CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
Source: www.spec.org

CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
Source: www.spec.org

CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
Source: www.spec.org

Energy SPEC Benchmarks
• Energy efficiency mode: Besides execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

Energy Efficiency
• Efficiency averaged over n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

where $\text{Efficiency}_i$ is the efficiency for program i.
• Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$

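A minimal sketch of the three definitions (per-program efficiency, geometric-mean efficiency, and relative efficiency) is given below. The function names and the time and energy figures are hypothetical placeholders chosen for easy arithmetic; they are not measurements.

```python
import math

def energy_efficiency(exec_time_s, energy_j):
    """(1 / execution time) / joules consumed, as defined above."""
    return (1.0 / exec_time_s) / energy_j

def mean_efficiency(runs):
    """Geometric mean of per-program efficiencies over (time_s, energy_j) pairs."""
    effs = [energy_efficiency(t, e) for t, e in runs]
    return math.exp(sum(math.log(x) for x in effs) / len(effs))

def relative_efficiency(machine_runs, reference_runs):
    """Efficiency of a computer divided by that of the reference computer."""
    return mean_efficiency(machine_runs) / mean_efficiency(reference_runs)

# Hypothetical (execution time in s, energy in J) per benchmark program.
machine = [(10.0, 50.0), (20.0, 100.0)]
reference = [(20.0, 100.0), (40.0, 200.0)]

rel = relative_efficiency(machine, reference)
print("Relative efficiency:", round(rel, 3))
```

In this made-up case the machine is twice as fast and uses half the energy on every program, so its relative efficiency comes out as 4.
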
SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three configurations: always max. clock; laptop adaptive clock; min. power, min. clock]

Voltage Scaling
• Dynamic: Reduce voltage and frequency during idle or low-activity periods.
• Static: Clustered voltage scaling
  • Logic on non-critical paths is given a lower voltage.
  • 47% power reduction with 10% area increase reported.
  • M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.

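The dynamic case can be illustrated with a small sketch that picks the lowest-voltage operating point still meeting a deadline. Everything here is an assumption made for illustration: the operating-point table, the effective capacitance, the cycle count, and the function names are hypothetical, and only switching power (C·V²·f) is modeled, with leakage ignored.

```python
def dynamic_power(c_eff, vdd, freq_hz):
    """Switching power in watts: C * V^2 * f."""
    return c_eff * vdd ** 2 * freq_hz

def task_energy(c_eff, vdd, cycles):
    """Switching energy in joules for a fixed cycle count: C * V^2 * cycles."""
    return c_eff * vdd ** 2 * cycles

def pick_operating_point(points, cycles, deadline_s):
    """Return the lowest-voltage (vdd, freq_hz) pair that meets the deadline."""
    feasible = [(v, f) for v, f in points if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda p: p[0])

# Hypothetical (supply voltage in V, clock frequency in Hz) pairs.
POINTS = [(1.2, 1.0e9), (1.0, 8.0e8), (0.8, 5.0e8)]
C_EFF = 1.0e-9    # assumed effective switched capacitance, farads
CYCLES = 6.0e8    # assumed work for the task, clock cycles

vdd, freq = pick_operating_point(POINTS, CYCLES, deadline_s=1.0)
energy_ratio = task_energy(C_EFF, vdd, CYCLES) / task_energy(C_EFF, 1.2, CYCLES)
power_ratio = dynamic_power(C_EFF, vdd, freq) / dynamic_power(C_EFF, 1.2, 1.0e9)

print("Chosen point:", vdd, "V at", freq, "Hz")
print("Energy vs. full speed:", round(energy_ratio, 3))
print("Power vs. full speed:", round(power_ratio, 3))
```

Because the cycle count is fixed, the energy saving comes from the V² term alone; the lower frequency reduces power further but only stretches the same work over more time.
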
Pipeline Gating
• A pipelined processor uses speculative execution.
• Incorrect branch prediction results in pipeline stalls and wasted energy.
• Idea: Stop fetching instructions if a branch hazard is expected:
  • If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetch for k cycles.
• Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.

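The gating rule stated above (count M, threshold N, stall of k cycles) can be sketched as a small counter model. This follows the simplified rule on the slide and is not claimed to reproduce the mechanism of the cited paper; the class name, the method names, and the choice to reset M once fetch is gated are assumptions made for illustration.

```python
class FetchGate:
    """Toy model: gate instruction fetch for k cycles when M exceeds N."""

    def __init__(self, n_threshold, k_cycles):
        self.n = n_threshold     # N: tolerated count before gating
        self.k = k_cycles        # k: cycles to suspend fetch
        self.m = 0               # M: running count of incorrect predictions
        self.stall_left = 0      # remaining gated cycles

    def record_misprediction(self):
        """Count one incorrect prediction and gate fetch if M > N."""
        self.m += 1
        if self.m > self.n:
            self.stall_left = self.k
            self.m = 0           # assumption: restart counting after gating

    def tick(self):
        """Advance one cycle; return True if fetch is allowed this cycle."""
        if self.stall_left > 0:
            self.stall_left -= 1
            return False
        return True


gate = FetchGate(n_threshold=2, k_cycles=3)
for _ in range(3):               # third misprediction pushes M above N
    gate.record_misprediction()
fetch_allowed = [gate.tick() for _ in range(4)]
print(fetch_allowed)
```

With N = 2 and k = 3, fetch is suspended for the three cycles after the third misprediction and resumes on the fourth.
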
Slack Scheduling
• Application: Superscalar, out-of-order execution:
  • An instruction is executed as soon as the data and resources it needs become available.
  • A commit unit reorders the results.
• Delay the execution of instructions whose results are not immediately needed.
• Example of RISC instructions:
  • add r0, r1, r2; (A)
  • sub r3, r4, r5; (B)
  • and r9, x1, r9; (C)
  • or r5, r9, r10; (D)
  • xor r2, r10, r11; (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.

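One way to see which of the five instructions has slack is to compare earliest and latest start cycles. The sketch below rests on assumptions the slide does not state: the last operand of each instruction is taken as the destination register, every instruction takes one cycle, only read-after-write dependences are tracked, and resources are unlimited. The function names are hypothetical.

```python
# (label, source registers, destination register); destination-last is an assumption.
PROGRAM = [
    ("A", ("r0", "r1"), "r2"),    # add r0, r1, r2
    ("B", ("r3", "r4"), "r5"),    # sub r3, r4, r5
    ("C", ("r9", "x1"), "r9"),    # and r9, x1, r9
    ("D", ("r5", "r9"), "r10"),   # or  r5, r9, r10
    ("E", ("r2", "r10"), "r11"),  # xor r2, r10, r11
]

def raw_dependencies(program):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer, deps = {}, {}
    for name, sources, dest in program:
        deps[name] = {last_writer[s] for s in sources if s in last_writer}
        last_writer[dest] = name
    return deps

def slack(program, latency=1):
    """Latest start minus earliest start, in cycles, for every instruction."""
    deps = raw_dependencies(program)
    order = [name for name, _, _ in program]
    asap = {}
    for n in order:                       # earliest start: after all producers
        asap[n] = max((asap[d] + latency for d in deps[n]), default=0)
    end = max(asap.values())
    alap = {n: end for n in order}
    for n in reversed(order):             # latest start: before all consumers
        for d in deps[n]:
            alap[d] = min(alap[d], alap[n] - latency)
    return {n: alap[n] - asap[n] for n in order}

print(raw_dependencies(PROGRAM))
print(slack(PROGRAM))
```

Under these assumptions D waits for B and C, and E waits for A and D, so only A can be delayed (by one cycle) without delaying E.
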
Slack Scheduling Example

Slack Scheduling
[Figure: block diagram with re-order buffer, scheduling logic, low-power execution units, and slack bit]

Clock Distribution
[Figure: clock distribution network; labeled signal: clock]

Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where
$C_L$ = total load capacitance
$\lambda$ = constant fanout at each stage in the distribution network
Clock consumes about 40% of total processor power.

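The series above is easy to evaluate numerically. In this sketch the capacitance, supply voltage, frequency, fanout, and stage count are hypothetical values picked only to exercise the formula, and the function names are not from the source.

```python
def clock_power(c_load, vdd, freq_hz, fanout, stages):
    """Sum C_L * V_DD^2 * f / fanout^n for n = 0 .. stages - 1."""
    base = c_load * vdd ** 2 * freq_hz
    return base * sum(1.0 / fanout ** n for n in range(stages))

def clock_power_closed_form(c_load, vdd, freq_hz, fanout, stages):
    """Same quantity using the finite geometric-series formula (fanout != 1)."""
    base = c_load * vdd ** 2 * freq_hz
    return base * (1.0 - fanout ** -stages) / (1.0 - 1.0 / fanout)

# Hypothetical values: 1 nF total load, 1.5 V, 200 MHz, fanout 4, 3 stages.
p_sum = clock_power(1.0e-9, 1.5, 200.0e6, fanout=4, stages=3)
p_closed = clock_power_closed_form(1.0e-9, 1.5, 200.0e6, fanout=4, stages=3)

print("Clock power (series):", round(p_sum, 6), "W")
print("Clock power (closed form):", round(p_closed, 6), "W")
```

With a fanout of 4 the upstream buffer stages add roughly 31% on top of the power spent driving the final load, and further stages add little because the terms shrink geometrically.
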
Clock Network Examples
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

Power Reduction Example
• Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
• Reduce voltage to 1.5 V: power (5.3x reduction) = 4.9 W
• Eliminate FP: power (3x reduction) = 1.6 W
• Scale 0.75 → 0.35 μm: power (2x reduction) = 0.8 W
• Reduce clock load: power (1.3x reduction) = 0.6 W
• Reduce frequency 200 → 160 MHz: power (1.25x reduction) = 0.5 W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.

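The chain of reductions above can be checked with a few lines of arithmetic. The sketch divides the starting power by each quoted factor in turn and also confirms that the voltage and frequency factors match V² and f scaling; the variable names are illustrative only.

```python
# (step, reduction factor) exactly as quoted in the list above.
STEPS = [
    ("Reduce voltage to 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

power_w = 26.0                        # Alpha 21064 starting point
trace = []
for label, factor in STEPS:
    power_w /= factor                 # apply each reduction cumulatively
    trace.append((label, round(power_w, 1)))

# Power scales with V^2 and with f, so two factors can be derived directly.
voltage_factor = (3.45 / 1.5) ** 2
frequency_factor = 200.0 / 160.0

for label, watts in trace:
    print(f"{label}: {watts} W")
print("Voltage factor:", round(voltage_factor, 2))
print("Frequency factor:", frequency_factor)
```

The running values reproduce the 4.9, 1.6, 0.8, 0.6, and 0.5 W figures in the list, and (3.45/1.5)² ≈ 5.29 accounts for the quoted 5.3x.
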
Parallel Architecture
• Single processor (input and output at rate f): Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
• Two parallel processors, each clocked at f/2 (input and output at rate f): Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396\,CV^2 f$

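The parallel-architecture figure follows directly from scaling each term of C·V²·f. The helper below is a hypothetical name; it takes each quantity as a ratio to the single-processor baseline.

```python
def relative_power(cap_ratio, volt_ratio, freq_ratio):
    """Power relative to the baseline C * V^2 * f, given per-term ratios."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio

# Two processors at half the clock: 2.2C, 0.6V, 0.5f.
parallel = relative_power(cap_ratio=2.2, volt_ratio=0.6, freq_ratio=0.5)
print("Parallel power / baseline power:", round(parallel, 3))
```

The capacitance more than doubles, but the lower clock rate permits a lower supply voltage, and the squared voltage term dominates the result.
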
Pipeline Architecture
• Single processor (input and output at rate f): Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
• Pipelined processor (two half-processor stages separated by registers, clocked at f): Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\,CV^2 f$

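For the pipelined case the arithmetic can be written out term by term, using only the values given above (1.2C, 0.6V, and unchanged frequency f):

```latex
\begin{aligned}
P_{\text{pipeline}} &= (1.2\,C)\,(0.6\,V)^2\,f \\
                    &= 1.2 \times 0.36 \times C V^2 f \\
                    &= 0.432\,C V^2 f
\end{aligned}
```

The extra pipeline registers raise the capacitance by 20%, while the shorter logic depth per stage is what allows the supply voltage to drop to 0.6V at the same clock frequency.
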
Approximate Trend
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

For More on Microprocessors
• T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
• R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.