ELEC 5270/6270 Fall 2007 Low-Power Design of Electronic Circuits Power Aware Microprocessors

Vishwani D. Agrawal, James J. Danaher Professor, Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849, vagrawal@eng.auburn.edu



  1. ELEC 5270/6270 Fall 2007, Low-Power Design of Electronic Circuits: Power Aware Microprocessors
     Vishwani D. Agrawal, James J. Danaher Professor
     Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849
     vagrawal@eng.auburn.edu
     http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Fall07/course.html
     ELEC6270 Fall 07, Lecture 14

  2. SIA Roadmap for Processors (1999)
     Source: http://www.semichips.org

  3. Power Reduction in Processors
     • Just about everything is used.
     • Hardware methods:
       • Voltage reduction for dynamic power
       • Dual-threshold devices for leakage reduction
       • Clock gating, frequency reduction
       • Sleep mode
     • Architecture:
       • Instruction set
       • Hardware organization
     • Software methods

  4. SPEC CPU2000 Benchmarks
     • Twelve integer programs (CINT2000) and 14 floating-point programs (CFP2000).
     • The run time of each program is normalized to obtain a SPEC ratio with respect to its run time on a Sun Ultra 5_10 with a 300 MHz processor.
     • The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
     • LINPACK is a numerically intensive floating-point linear system (Ax = b) program used for benchmarking supercomputers.
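The normalization and averaging described on this slide can be sketched in a few lines of Python. This is an illustration only: the run times are made-up values, the function names are hypothetical, and the scale factor of 100 (so that the reference machine scores 100) is an assumption about the SPEC CPU2000 convention that the slide itself does not state.

```python
import math

def spec_ratio(reference_time_s, measured_time_s, scale=100.0):
    """Ratio of reference run time to measured run time.

    scale=100 is an assumed convention (reference machine scores 100).
    """
    return scale * reference_time_s / measured_time_s

def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds, not real SPEC data.
reference_times = [1400.0, 1800.0, 1100.0]   # reference machine
measured_times = [140.0, 150.0, 100.0]       # machine under test

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
summary = geometric_mean(ratios)
print("SPEC ratios:", ratios)
print("Summary (geometric mean):", round(summary, 1))
```

The geometric mean is used rather than the arithmetic mean so that the summary does not depend on which machine is chosen as the reference.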

  5. Reference CPU s: Sun Ultra 5_10 300 MHz Processor

  6. CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
     SPECint2000_base = 1341, SPECint2000 = 1389
     Source: www.spec.org

  7. Two Benchmark Results
     • Baseline: a uniform configuration, not optimized for any specific program:
       • The same compiler, with the same settings and flags, is used for all benchmarks
       • Other restrictions
     • Peak: the run is optimized to obtain peak performance for each benchmark program.

  8. CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
     SPECfp2000_base = 1627, SPECfp2000 = 1630
     Source: www.spec.org

  9. CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
     SPECint2000_base = 579, SPECint2000 = 588
     Source: www.spec.org

  10. CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
     SPECfp2000_base = 648, SPECfp2000 = 659
     Source: www.spec.org

  11. Energy SPEC Benchmarks
     • Energy efficiency mode: besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:
       $$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

  12. Energy Efficiency
     • Efficiency averaged over n benchmark programs:
       $$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$
       where $\text{Efficiency}_i$ is the efficiency for program i.
     • Relative efficiency:
       $$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
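As a rough illustration of the three definitions above (per-program energy efficiency, geometric-mean efficiency over n programs, and relative efficiency), the following Python sketch applies them to invented measurements. The function names and all numbers are hypothetical and are not taken from the lecture.

```python
import math

def energy_efficiency(execution_time_s, energy_j):
    """(1 / execution time) divided by joules consumed."""
    return (1.0 / execution_time_s) / energy_j

def mean_efficiency(efficiencies):
    """Geometric mean of the per-program efficiencies."""
    efficiencies = list(efficiencies)
    log_sum = sum(math.log(e) for e in efficiencies)
    return math.exp(log_sum / len(efficiencies))

def relative_efficiency(machine_eff, reference_eff):
    """Efficiency of a computer relative to the reference computer."""
    return machine_eff / reference_eff

# Hypothetical (time in s, energy in J) pairs for three programs.
machine_runs = [(10.0, 200.0), (20.0, 500.0), (5.0, 80.0)]
reference_runs = [(40.0, 400.0), (50.0, 900.0), (25.0, 300.0)]

machine_eff = mean_efficiency(energy_efficiency(t, e) for t, e in machine_runs)
reference_eff = mean_efficiency(energy_efficiency(t, e) for t, e in reference_runs)
print("Relative efficiency:", relative_efficiency(machine_eff, reference_eff))
```

A relative efficiency above 1 means the machine under test delivers more work per joule-second than the reference under this metric.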

  13. SPEC2000 Relative Energy Efficiency
     [Figure: relative energy efficiency for three configurations: always maximum clock; laptop, adaptive clock; minimum power, minimum clock.]

  14. Voltage Scaling
     • Dynamic: reduce voltage and frequency during idle or low-activity periods.
     • Static: clustered voltage scaling
       • Logic on non-critical paths is given a lower voltage.
       • 47% power reduction with 10% area increase reported.
     • M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.

  15. Processor Utilization
     Throughput = Operations / second
     [Figure: throughput versus time, showing compute-intensive processes at maximum throughput, low-throughput (background) processes, and system idle periods.]

  16. Examples of Processes
     • Compute-intensive: spreadsheet, spelling check, video decoding, scientific computing.
     • Low throughput: data entry, screen updates, low-bandwidth I/O data transfer.
     • Idle: no computation, no expected output.

  17. Effects of Voltage Reduction
     • Voltage reduction increases delay and decreases throughput:
       • Slow reduction in throughput at first
       • Rapid reduction in throughput for VDD ≤ Vth
       • Time per operation (TPO) increases
     • Voltage reduction continues to reduce power consumption:
       • Energy per operation (EPO) = Power × TPO
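The relation EPO = Power × TPO can be explored numerically with a simple model. The sketch below is not from the lecture: it assumes an alpha-power-law delay model (TPO proportional to VDD / (VDD − Vth)^α, with α = 2), normalizes the switched capacitance to 1, and is only meaningful for VDD above Vth.

```python
def time_per_operation(vdd, vth=1.0, alpha=2.0):
    """Assumed alpha-power-law delay model; valid only for vdd > vth."""
    if vdd <= vth:
        raise ValueError("model only covers vdd > vth")
    return vdd / (vdd - vth) ** alpha

def energy_per_operation(vdd):
    """Dynamic switching energy with capacitance normalized to 1."""
    return vdd ** 2

def power(vdd, vth=1.0, alpha=2.0):
    """Power = EPO / TPO, so that EPO = Power x TPO holds by construction."""
    return energy_per_operation(vdd) / time_per_operation(vdd, vth, alpha)

# Tabulate the three quantities against VDD / Vth (Vth normalized to 1).
for ratio in (1.5, 2.0, 3.0, 4.0, 5.0):
    print(f"VDD/Vth={ratio:.1f}  TPO={time_per_operation(ratio):.3f}  "
          f"Power={power(ratio):.3f}  EPO={energy_per_operation(ratio):.3f}")
```

Under these assumptions, lowering VDD raises TPO slowly at first and then steeply near Vth, while power and EPO keep falling, which matches the qualitative behavior listed on the slide.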

  18. Energy per Operation (EPO)
     [Figure: EPO, power, and TPO (vertical axis, 0.0 to 1.0) plotted against VDD / Vth (horizontal axis, 1 to 5).]

  19. Dynamic Voltage and Clock
     T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002, pp. 35-36.

  20. Problem of Process Variation and Leakage
     [Figure: number of chips versus Vth (lower to higher). Chips with lower Vth fail the power specification (yield loss due to high leakage); chips with higher Vth fail the clock specification (yield loss due to slow speed).]
     From a presentation: D. Ditzel, “Power Reduction using LongRun2 in Transmeta’s Efficeon Processor,” May 17, 2006.

  21. Pipeline Gating
     • A pipelined processor uses speculative execution.
     • Incorrect branch prediction results in pipeline stalls and wasted energy.
     • Idea: stop fetching instructions if a branch hazard is expected:
       • If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
     • Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
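The gating rule on this slide (suspend fetch for k cycles once the count M exceeds N) can be modeled with a small counter. The sketch below follows the slide's wording rather than any particular hardware design; the class name `FetchGate` is hypothetical, and resetting M to zero when the gate closes is an assumption the slide does not specify.

```python
class FetchGate:
    """Toy model of the fetch-gating rule: M > N suspends fetch for k cycles."""

    def __init__(self, threshold_n, stall_cycles_k):
        self.threshold_n = threshold_n
        self.stall_cycles_k = stall_cycles_k
        self.mispredictions_m = 0
        self.stall_remaining = 0

    def record_misprediction(self):
        """Count one incorrect prediction and close the gate if M exceeds N."""
        self.mispredictions_m += 1
        if self.mispredictions_m > self.threshold_n:
            self.stall_remaining = self.stall_cycles_k
            self.mispredictions_m = 0  # assumption: restart the count after gating

    def fetch_allowed(self):
        """Call once per cycle; returns False while fetch is suspended."""
        if self.stall_remaining > 0:
            self.stall_remaining -= 1
            return False
        return True

gate = FetchGate(threshold_n=2, stall_cycles_k=3)
for _ in range(3):              # three mispredictions push M past N = 2
    gate.record_misprediction()
print([gate.fetch_allowed() for _ in range(5)])
```

With N = 2 and k = 3, the first three cycles after the third misprediction are gated and fetch resumes on the fourth.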

  22. Slack Scheduling
     • Application: superscalar, out-of-order execution:
       • An instruction is executed as soon as the required data and resources become available.
       • A commit unit reorders the results.
     • Delay the completion of instructions whose result is not immediately needed.
     • Example of RISC instructions:
       • add r0, r1, r2; (A)
       • sub r3, r4, r5; (B)
       • and r9, x1, r9; (C)
       • or r5, r9, r10; (D)
       • xor r2, r10, r11; (E)
     • J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
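One way to see which of the five instructions could be delayed is to compute the scheduling slack of each one from its data dependences. The sketch below rests on two assumptions that the slide does not confirm: operands are read as "op src1, src2, dest", and every instruction takes one cycle. The function names are hypothetical.

```python
# Assumption: "op src1, src2, dest" operand order and unit latency.
instructions = {
    "A": ("add", ["r0", "r1"], "r2"),
    "B": ("sub", ["r3", "r4"], "r5"),
    "C": ("and", ["r9", "x1"], "r9"),
    "D": ("or", ["r5", "r9"], "r10"),
    "E": ("xor", ["r2", "r10"], "r11"),
}

def raw_dependences(instrs):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer = {}
    deps = {}
    for name, (_, sources, dest) in instrs.items():
        deps[name] = sorted({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = name
    return deps

def slack_cycles(instrs, latency=1):
    """Slack = latest start minus earliest start, in cycles."""
    deps = raw_dependences(instrs)
    order = list(instrs)
    earliest = {}
    for name in order:  # forward pass: as-soon-as-possible start times
        earliest[name] = max((earliest[d] + latency for d in deps[name]), default=0)
    finish = max(earliest.values())
    latest = {name: finish for name in order}
    for name in reversed(order):  # backward pass: as-late-as-possible start times
        for d in deps[name]:
            latest[d] = min(latest[d], latest[name] - latency)
    return {name: latest[name] - earliest[name] for name in order}

print(raw_dependences(instructions))
print(slack_cycles(instructions))
```

Under these assumptions, D waits for B and C, and E waits for A and D, so A is the only instruction with slack (one cycle): its result is not needed until D has finished.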

  23. Slack Scheduling Example

  24. Slack Scheduling
     [Figure: block diagram with a re-order buffer, scheduling logic, low-power execution units, and a slack bit.]

  25. Clock Distribution
     [Figure: clock distribution network driven from the clock source.]

  26. Clock Power
     $$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
     where $C_L$ = total load capacitance and $\lambda$ = constant fanout at each stage in the distribution network.
     The clock consumes about 40% of total processor power.
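The clock power sum is a finite geometric series and is easy to evaluate directly. In the sketch below the function name and the example values (100 pF load, 1.5 V, 200 MHz, fanout 4, 4 stages) are hypothetical and chosen only to show the calculation.

```python
def clock_power(c_load, vdd, freq, fanout, stages):
    """P_clk = C_L * VDD^2 * f * sum_{n=0}^{stages-1} 1 / fanout^n."""
    base = c_load * vdd ** 2 * freq
    return base * sum(1.0 / fanout ** n for n in range(stages))

# Hypothetical values: 100 pF total load, 1.5 V, 200 MHz, fanout 4, 4 stages.
load_power = clock_power(100e-12, 1.5, 200e6, fanout=4, stages=1)
total_power = clock_power(100e-12, 1.5, 200e6, fanout=4, stages=4)
print("Final-stage load power (W):", load_power)
print("Total clock power (W):", total_power)
print("Overhead of the buffer tree:", total_power / load_power)
```

For a fanout greater than 1 the series is bounded by λ/(λ − 1), so with a fanout of 4 the distribution buffers add at most one third to the power of the final load.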

  27. Clock Network Examples
     D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

  28. Power Reduction Example
     • Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
     • Reduce voltage to 1.5 V, power (5.3x) = 4.9 W
     • Eliminate FP, power (3x) = 1.6 W
     • Scale 0.75 → 0.35 μm, power (2x) = 0.8 W
     • Reduce clock load, power (1.3x) = 0.6 W
     • Reduce frequency 200 → 160 MHz, power (1.25x) = 0.5 W
     • J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
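The chain of reductions on this slide can be checked by dividing the starting power by each factor in turn. The short Python sketch below uses only the figures from the slide; the variable names are illustrative, and small differences from the slide's intermediate values come from the slide rounding at every step.

```python
# Reduction factors as listed on the slide, applied in order.
steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

power_w = 26.0  # Alpha 21064 at 200 MHz and 3.45 V
for description, factor in steps:
    power_w /= factor
    print(f"{description}: {power_w:.2f} W")

# The voltage factor follows from the quadratic dependence of dynamic power on VDD.
voltage_factor = (3.45 / 1.5) ** 2
print(f"(3.45 / 1.5)^2 = {voltage_factor:.2f}")
```

The frequency factor is likewise the ratio 200/160 = 1.25, since dynamic power is proportional to clock frequency.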

  29. Parallel Architecture
     [Figure: a single processor clocked at f, compared with two processors in parallel, each clocked at f/2, with input and output at rate f.]
     • Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
     • Parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396\,CV^2 f$

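  30. Pipeline Architecture
     [Figure: a single processor with input and output registers clocked at f, compared with the processor split into two half-processors separated by a pipeline register, also clocked at f.]
     • Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
     • Pipelined processor: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\,CV^2 f$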
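The power figures quoted for the parallel and pipelined organizations both follow from P = CV²f with scaled capacitance, voltage, and frequency. The sketch below reproduces them; the function name `relative_power` is hypothetical, and the scale factors are the ones given on the two slides.

```python
def relative_power(cap_factor, voltage_factor, freq_factor):
    """Power relative to the baseline C * V^2 * f."""
    return cap_factor * voltage_factor ** 2 * freq_factor

# Scale factors taken from the parallel and pipeline slides.
parallel = relative_power(cap_factor=2.2, voltage_factor=0.6, freq_factor=0.5)
pipeline = relative_power(cap_factor=1.2, voltage_factor=0.6, freq_factor=1.0)

print(f"Parallel:  {parallel:.3f} x CV^2f")
print(f"Pipelined: {pipeline:.3f} x CV^2f")
```

In both cases the saving comes from the quadratic voltage term: the extra capacitance (2.2x or 1.2x) is more than offset by running at 0.6V.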

  31. Approximate Trend
     G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

  32. For More on Microprocessors
     • T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
     • R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.
