
Power Management

Power Management. Lecture notes, S. Yalamanchili and S. Mukhopadhyay. Technology Scaling: 30% scaling down in dimensions → doubles transistor density; Vdd scaling → lower power; transistor delay = Cgate·Vdd/ISAT.


Presentation Transcript


  1. Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay

  2. Technology Scaling • 30% scaling down in dimensions → doubles transistor density • Power per transistor • Vdd scaling → lower power • Transistor delay = Cgate·Vdd/ISAT • Cgate, Vdd scaling → lower delay [Figure: MOSFET with gate, drain, source, and body terminals; cross-section labeled with oxide thickness tox and channel length L]

  3. Moore’s Law Goal: Sustain Performance Scaling • Performance scaled with number of transistors • Dennard scaling*: power scaled with feature size From wikipedia.org *R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.

  4. Parallelism and Power • How much of the chip area is devoted to compute? • Run many cores slower. Why does this reduce power? [Figures: IBM Power5 and AMD Trinity die photos. Sources: forwardthinking.pcmag.com, IBM]

  5. The Power Wall • Power per transistor scales with frequency but also scales with Vdd • Lower Vdd can be compensated for with increased pipelining to keep throughput constant • Power per transistor is not the same as power per area → power density is the problem! • Multiple units can be run at lower frequencies to keep throughput constant, while saving power
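
The scaling argument on this slide can be made explicit with the standard first-order model of switching power. This is a sketch, not taken from the slides: the activity factor $\alpha$, switched capacitance $C$, unit count $N$, and reduced supply $V'_{dd}$ are symbols introduced here for illustration, and leakage is ignored.

```latex
% Dynamic (switching) power of one unit at supply V_dd and clock f
P_{dyn} = \alpha \, C \, V_{dd}^{2} \, f

% N identical units, each clocked at f/N (same aggregate throughput,
% assuming perfectly parallel work), running at a reduced supply V'_{dd}
P'_{dyn} = N \cdot \alpha \, C \, {V'_{dd}}^{2} \, \frac{f}{N}
         = \alpha \, C \, {V'_{dd}}^{2} \, f

% Ratio: the saving comes entirely from the lower supply voltage
\frac{P'_{dyn}}{P_{dyn}} = \left( \frac{V'_{dd}}{V_{dd}} \right)^{2}
```

Under these assumptions, lowering the frequency alone does not reduce the total power for a fixed throughput; the benefit appears because slower units can tolerate a lower Vdd.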

  6. What is the Problem? Mukhopadhyay and Yalamanchili (2009) • Based on scaling using Pentium-class cores • While Moore’s Law continues, scaling phenomena have changed • Power densities are increasing with each generation

  7. ITRS Roadmap for Logic Devices From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008

  8. Power Management Basics Lecture notes S. Yalamanchili and S. Mukhopadhyay

  9. What are my Options? • Better technology • Manufacturing • Better devices (FinFET) • New devices → non-CMOS? → this is the future • Be more efficient: activity management • Clock gating: dynamic energy/power • Power gating: static energy/power • Power state management: both • Improved architecture • Simpler pipelines • Parallelism [Slide annotation: Not this course]

  10. Activity Management • Clock Gating • Turn off clock to a block of logic • Eliminate unnecessary transitions/activity • Clock distribution power • Power Gating • Turn off power to a block of logic, e.g., core • No leakage [Figures: clock-gating circuit with input, clk, and cond signals around combinational logic; power gate transistor between Vdd and Core 0 / Core 1]

  11. Multiple Voltage Frequency Domains • Cores and ring in one DVFS domain • Graphics unit in another DVFS domain • Cores and portion of cache can be gated off [Figure: Intel Sandy Bridge processor. From E. Rotem et al., Hot Chips 2011]

  12. Processor Power States • Performance States: P-states • Operate at different voltage/frequencies • Recall delay-voltage relationship • Lower voltage → lower leakage • Lower frequency → lower power (not the same as energy!) • Lower frequency → longer execution time • Idle States: C-states • Sleep states • Differ in how much state is saved • SW or HW managed transitions between states!
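
The distinction between lower power and lower energy can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `run_at_pstate`, the effective capacitance `ceff_f`, and the fixed leakage current `leak_a` are hypothetical names and values, and real leakage current also falls with voltage.

```python
def run_at_pstate(cycles, freq_hz, vdd_v, ceff_f=1e-9, leak_a=0.5):
    """Return (time_s, power_w, energy_j) for a fixed amount of work.

    ceff_f (effective switched capacitance) and leak_a (leakage current)
    are hypothetical constants chosen only for illustration.
    """
    time_s = cycles / freq_hz                   # lower frequency -> longer run
    dynamic_w = ceff_f * vdd_v ** 2 * freq_hz   # switching power
    static_w = leak_a * vdd_v                   # leakage power
    power_w = dynamic_w + static_w
    return time_s, power_w, power_w * time_s


CYCLES = 2e9
fast = run_at_pstate(CYCLES, 2e9, 1.0)         # high-performance P-state
slow_same_v = run_at_pstate(CYCLES, 1e9, 1.0)  # frequency scaling only
slow_low_v = run_at_pstate(CYCLES, 1e9, 0.8)   # voltage and frequency scaling

for name, (t, p, e) in [("fast", fast), ("slow, same Vdd", slow_same_v),
                        ("slow, lower Vdd", slow_low_v)]:
    print(f"{name}: time={t:.2f} s, power={p:.2f} W, energy={e:.2f} J")
```

In this model, halving the frequency alone lowers power but raises energy, because leakage is paid for twice as long; energy drops only when the voltage is lowered as well.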

  13. Example of P-states AMD Trinity A10-5800 APU: 100W TDP • Software Managed Power States • Changing Power States is not free

  14. Example of P-states From: http://www.intel.com/content/www/us/en/processors/core/2nd-gen-core-family-mobile-vol-1-datasheet.html

  15. Management Knobs • Each core can be in any one of multiple states • How do I decide which state to set each core to? • Who decides? HW? SW? • How do I decide when I can turn off a core? • What am I saving? Static energy or dynamic energy?

  16. Power Management • Software controlled power management • Optimize power and/or energy • Orchestrated by the operating system or application libraries • Industry standard interfaces for power management • Advanced Configuration and Power Interface (ACPI) • https://www.acpica.org/ • http://www.acpi.info/ • Hardware power management • Optimized power/energy • Failsafe operation, e.g., protect against thermal emergencies

  17. Power Management 3.0 • Performance and energy efficiency depend on effective utilization of power and thermal headroom • Convert thermal headroom to higher performance through boost [Figure: die temperature and instructions/cycle versus time, showing max die temp and thermal headroom; performance levels divided into SW visible states and HW boost states]

  18. Boosting: Intel Sandy Bridge • Exploit package physics • Temperature changes on the order of milliseconds • Use the thermal headroom [Figure: power versus time; a low power phase builds up thermal credits, followed by a turbo boost region up to max power lasting 10s of seconds, then TDP power]
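
How long a boost can last before the part must fall back can be estimated with a first-order thermal model. This is a sketch, not the Sandy Bridge algorithm: the lumped thermal resistance `r_th`, time constant `tau_s`, and all numeric values are hypothetical, and the exponential response is the usual single-RC approximation.

```python
import math


def steady_state_temp_c(power_w, t_amb_c, r_th):
    """Temperature the package settles at for a constant power (lumped model)."""
    return t_amb_c + power_w * r_th


def boost_duration_s(t_start_c, boost_w, t_max_c, t_amb_c, r_th, tau_s):
    """Time until the temperature reaches t_max_c while dissipating boost_w.

    Uses T(t) = T_final - (T_final - T_start) * exp(-t / tau).
    """
    t_final_c = steady_state_temp_c(boost_w, t_amb_c, r_th)
    if t_final_c <= t_max_c:
        return math.inf   # boost power is sustainable indefinitely
    if t_start_c >= t_max_c:
        return 0.0        # no thermal headroom left
    return tau_s * math.log((t_final_c - t_start_c) / (t_final_c - t_max_c))


# Hypothetical package parameters (illustration only).
T_AMB_C, R_TH, TAU_S, T_MAX_C = 40.0, 0.5, 10.0, 90.0

cool_start_c = steady_state_temp_c(20.0, T_AMB_C, R_TH)   # after a low-power phase
from_cool = boost_duration_s(cool_start_c, 130.0, T_MAX_C, T_AMB_C, R_TH, TAU_S)
from_warm = boost_duration_s(80.0, 130.0, T_MAX_C, T_AMB_C, R_TH, TAU_S)
print(f"boost from cool start: {from_cool:.1f} s, from warm start: {from_warm:.1f} s")
```

A cooler starting point (more accumulated thermal credit) yields a longer boost, which is the behavior the slide describes.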

  19. Power Gating • Turn off components that are not being used • Lose all state information • Costs of powering down • Costs of powering up • Smart shutdown • Models to guide decisions [Figure: Intel Sandy Bridge processor]
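
One simple model for guiding a shutdown decision is a break-even idle time: gating pays off only if the leakage energy saved exceeds the energy spent powering down and back up. The sketch below assumes that model; the function names and all numbers are hypothetical, and wake-up latency is ignored.

```python
def break_even_idle_s(e_down_j, e_up_j, leak_w):
    """Idle time at which leakage saved equals the power-down/up overhead."""
    return (e_down_j + e_up_j) / leak_w


def should_power_gate(predicted_idle_s, e_down_j, e_up_j, leak_w):
    """Gate only if the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_idle_s(e_down_j, e_up_j, leak_w)


# Hypothetical costs: 1 uJ to save state and power down, 3 uJ to power up
# and restore state, 0.2 W of leakage while idle but powered.
E_DOWN_J, E_UP_J, LEAK_W = 1e-6, 3e-6, 0.2

threshold_s = break_even_idle_s(E_DOWN_J, E_UP_J, LEAK_W)
print(f"break-even idle time: {threshold_s * 1e6:.0f} us")
print(should_power_gate(5e-6, E_DOWN_J, E_UP_J, LEAK_W))    # short idle period
print(should_power_gate(1e-3, E_DOWN_J, E_UP_J, LEAK_W))    # long idle period
```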

  20. Parallelism • Concurrency + lower frequency → greater energy efficiency • Example • 4X #cores • 0.75X voltage • 0.5X frequency • 1X power • 2X in performance [Figure: multiple cores, each with its own cache]
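
The example's numbers can be checked against the usual first-order dynamic power model (power proportional to core count × Vdd² × frequency). This is a sketch under that assumption, with leakage ignored and perfectly parallel work; the function names are illustrative.

```python
def relative_dynamic_power(core_scale, vdd_scale, freq_scale):
    """Dynamic power relative to the baseline: cores * Vdd^2 * f."""
    return core_scale * vdd_scale ** 2 * freq_scale


def relative_throughput(core_scale, freq_scale):
    """Throughput relative to the baseline, assuming perfect parallel scaling."""
    return core_scale * freq_scale


power = relative_dynamic_power(4, 0.75, 0.5)
performance = relative_throughput(4, 0.5)
print(f"power: {power}x, performance: {performance}x")
```

Under this model the power comes out at about 1.1X, which the slide rounds to 1X, while performance doubles.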

  21. Simplify Core Design • Support for branch prediction, schedulers, etc. consumes more energy per instruction • Can fit many more simpler cores on a die [Figures: AMD Bulldozer core; ARM A7 core (arm.com)]

  22. Metrics • Power efficiency • MIPS/watt • Ops/watt • Energy efficiency • Joules/instruction • Joules/op • Composite • Energy-delay product • Energy-delay² product • Why are these useful?
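
One way to see why the composite metrics are useful is to compare two design points that trade energy for delay. The sketch below uses hypothetical design points; only the metric definitions (energy × delay and energy × delay²) are standard.

```python
def energy_delay_product(energy_j, delay_s):
    """EDP: weights energy and delay equally."""
    return energy_j * delay_s


def energy_delay_squared(energy_j, delay_s):
    """ED^2P: penalizes slow designs more heavily than EDP."""
    return energy_j * delay_s ** 2


# Hypothetical design points: (energy in joules, delay in seconds).
designs = {"fast": (2.5, 1.0), "slow": (2.0, 2.0)}

best_energy = min(designs, key=lambda d: designs[d][0])
best_edp = min(designs, key=lambda d: energy_delay_product(*designs[d]))
best_ed2p = min(designs, key=lambda d: energy_delay_squared(*designs[d]))
print(best_energy, best_edp, best_ed2p)
```

Here the slow design wins on energy alone, but the fast design wins once delay is taken into account, so the choice of metric changes the ranking.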

  23. Modeling Lecture notes S. Yalamanchili and S. Mukhopadhyay

  24. Microarchitectural Level Models • How can we study power consumption without building circuits? • Models • Models are available at multiple levels of abstraction. We are interested in microarchitectural models

  25. Processor Microarchitecture [Figure: block diagram with Fetch, Decode, Execute/Writeback, Memory, and Network stages. Components: instruction cache, instruction TLB, branch prediction, fetch queue, instruction decoder, instruction queue, register files, ALU, MUL, FPU, LD, ST, L1 data cache, data TLB, L2 data cache, on-chip network, NoC router]

  26. Energy/Power Calculation • How do we calculate energy or power dissipation for a given microarchitecture? • Energy/Power varies between: • Different ISAs; ARM vs Intel x86 • Different microarchitectures; in-order vs out-of-order • Different applications; memory-bound vs compute-bound • Different technologies; 90nm vs 22nm technology • Different operating conditions; frequency, temperature

  27. Architecture Activity (1) • Activity 1: Instruction Fetch: fbuffer.write++; icache.read++; • Collect activity counts of each architecture component (through simulation or measurement). • The list of components differs between microarchitectures. • Activity counts at each component differ between applications. [Figure: processor microarchitecture block diagram with the fetch-stage components highlighted]

  28. Architecture Activity (2) • Activity 2: Instruction Decode: idecoder.logic++; fbuffer.read++; • Read/write accesses to caches, buffers, etc. • Logical accesses to logic blocks such as decoder, ALUs, etc. • Tradeoff of differentiating more access types (accuracy) vs simulation speed (complexity). [Figure: processor microarchitecture block diagram with the decode-stage components highlighted]

  29. Power and Architecture Activity • For example, at the nth clock cycle, the collected counters are: • Data cache: • read = 20, write = 12; • per-read energy = 0.5nJ; per-write energy = 0.6nJ; • Read energy = read × per-read energy = 10nJ • Write energy = write × per-write energy = 7.2nJ • Total activity energy = read + write energies = 17.2nJ • If n = 50 clock cycles and clock frequency = 2GHz, total activity power = energy × clock_freq / n = 688mW • Note: n / clock_freq = duration of n clock periods in seconds; power = time average of energy
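
The worked example above can be reproduced in a few lines. This is a minimal sketch using the slide's numbers; the dictionary layout and variable names are illustrative and not taken from any particular simulator.

```python
# Activity counters collected for the data cache over the first n cycles.
counts = {"read": 20, "write": 12}
# Per-access energies in nanojoules (values from the slide).
per_access_nj = {"read": 0.5, "write": 0.6}

N_CYCLES = 50
CLOCK_HZ = 2e9

# Energy per access type, then the total activity energy.
energy_nj = {kind: counts[kind] * per_access_nj[kind] for kind in counts}
total_energy_nj = sum(energy_nj.values())

# Power is the time average of energy over the n clock periods.
elapsed_s = N_CYCLES / CLOCK_HZ
power_w = total_energy_nj * 1e-9 / elapsed_s

print(f"read={energy_nj['read']:.1f} nJ, write={energy_nj['write']:.1f} nJ")
print(f"total={total_energy_nj:.1f} nJ, power={power_w * 1e3:.0f} mW")
```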

  30. Things to consider (1) How do we calculate per-read/write energies? • Per-access energies can be estimated from circuit-level designs and analyses. • There are various open-source tools for this. [Figure: an architecture specification and technology parameters feed a circuit-level estimation tool, which produces estimation results: area, energy, timing, etc.]

  31. Things to consider (2) Is per-access energy always the same? • Per-access energy in fact depends on: • how many bits are switching • how they are switching (0→1 or 1→0) • It is reasonable to assume constant per-access energy over long-term observation (e.g., n = 1M clock cycles); the number of switching bits is averaged (e.g., 50% of bits are switching). • Most architecture simulators do not capture bit-level details due to simulation complexity.
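
The bit-level dependence can be sketched by counting rising and falling transitions between the old and new value on a bus or storage row. This is an illustration only: the per-transition energies `e_rise_pj` and `e_fall_pj` are hypothetical, and real per-bit energies come from circuit-level analysis.

```python
def count_transitions(old, new, width=32):
    """Return (rises, falls): bits going 0->1 and 1->0 between two values."""
    mask = (1 << width) - 1
    old &= mask
    new &= mask
    rises = bin(~old & new & mask).count("1")   # was 0, now 1
    falls = bin(old & ~new & mask).count("1")   # was 1, now 0
    return rises, falls


def access_energy_pj(old, new, e_rise_pj=1.0, e_fall_pj=0.8, width=32):
    """Data-dependent access energy; per-transition energies are hypothetical."""
    rises, falls = count_transitions(old, new, width)
    return rises * e_rise_pj + falls * e_fall_pj


print(count_transitions(0b1010, 0b0110, width=4))
print(access_energy_pj(0b1010, 0b0110, width=4))
```

Averaged over many accesses with varied data, this data-dependent energy approaches a constant per-access value, which is the simplification the slide describes.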

  32. Things to consider (3) If a register file didn’t have read/write accesses but held data, what is the energy dissipation? • Energy (or power) is largely composed of dynamic and static dissipation. • Dynamic (or switching) energy refers to energy dissipation due to switching activities. • Static (or leakage) energy is the dissipation needed to keep the electronic system turned on. • In this case, the register file has no dynamic energy dissipation but consumes static energy.
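
The idle register file case can be expressed as a small energy breakdown. This is a sketch with hypothetical numbers; `component_energy_j` and its parameters are illustrative names, and leakage power is treated as constant over the interval.

```python
def component_energy_j(accesses, per_access_j, leak_w, elapsed_s):
    """Return (dynamic_j, static_j, total_j) for one component."""
    dynamic_j = accesses * per_access_j   # energy due to switching activity
    static_j = leak_w * elapsed_s         # leakage paid while powered on
    return dynamic_j, static_j, dynamic_j + static_j


# Hypothetical register file: 0.1 nJ per access, 10 mW leakage, 1 ms interval.
idle = component_energy_j(0, 0.1e-9, 0.01, 1e-3)
busy = component_energy_j(100_000, 0.1e-9, 0.01, 1e-3)
print(f"idle: dynamic={idle[0]:.2e} J, static={idle[1]:.2e} J")
print(f"busy: dynamic={busy[0]:.2e} J, static={busy[1]:.2e} J")
```

With zero accesses the dynamic term vanishes, but the static term is unchanged; removing it requires power gating rather than clock gating.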

  33. Thermal Issues Lecture notes S. Yalamanchili and S. Mukhopadhyay

  34. Thermal Issues • Heat can cause damage to the chip • Need failsafe operation • Thermal fields change the physical characteristics • Leakage current and therefore power increases • Delay increases • Device degradation becomes worse • Cooling solution determines the permitted power dissipation

  35. Thermal Design Power (TDP) • This is the maximum power at which the part is designed to operate • Dictates the design of the cooling system • Max temperature → Tjmax • Typically fixed by worst case workload • Parts are typically operating below the TDP • Opportunities for turbo mode? [Figure: AMD Trinity APU; http://ecs.vancouver.wsu.edu/thermofluids-research]
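
The link between Tjmax, the cooling solution, and the sustainable power can be sketched with a steady-state thermal resistance model. This is an assumption-laden illustration rather than how any vendor specifies TDP: the junction-to-ambient resistance `r_th_c_per_w` and all values are hypothetical.

```python
def junction_temp_c(power_w, t_amb_c, r_th_c_per_w):
    """Steady-state junction temperature for a constant power dissipation."""
    return t_amb_c + power_w * r_th_c_per_w


def max_sustained_power_w(tj_max_c, t_amb_c, r_th_c_per_w):
    """Largest constant power that keeps the junction at or below Tjmax."""
    return (tj_max_c - t_amb_c) / r_th_c_per_w


# Hypothetical values: Tjmax = 100 C, ambient = 40 C, 0.5 C/W cooling solution.
TJ_MAX_C, T_AMB_C, R_TH = 100.0, 40.0, 0.5

limit_w = max_sustained_power_w(TJ_MAX_C, T_AMB_C, R_TH)
typical_c = junction_temp_c(80.0, T_AMB_C, R_TH)   # a workload below the limit
print(f"sustained power limit: {limit_w:.0f} W")
print(f"junction temperature at 80 W: {typical_c:.0f} C")
```

A better cooling solution (lower thermal resistance) raises the sustainable power, and a workload running below the limit leaves thermal headroom.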

  36. Heat Sink Limits on Performance • Thermal design power (TDP) • Determines the cooling solution & package limits • Performance depends on effective utilization of this thermal headroom • Convert thermal headroom to higher performance through boosting [Figures: heat sink photo (www.legitreviews.com); die temperature and instructions/cycle versus time for a workload, showing max die temp and thermal headroom; power levels showing SW visible states, TDP power, and HW boost states up to boost power]

  37. Trinity TDP Source: http://www.anandtech.com/show/6347/amd-a10-5800k-a8-5600k-review-trinity-on-the-desktop-part-2

  38. Issues • Cooling chips is now an issue for computer architects! • Co-design the cooling system and the processor • Some very “cool” new technologies • E.g., microfluidics!

  39. Electrical and Fluidic I/Os Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE) • Fluid flow through the microchannels carries heat out to an external heat exchanger (e.g., heat sink)

  40. Fabrication Examples Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE) • Micropin-fins (150 µm diameter and 225 µm diameter) and vias • Electrical and fluidic microbumps, fluidic vias and fine wires

  41. Conclusions • Power/energy is the leading driver of modern architecture design • Power and energy management is key to scalability • Need integrated power/energy, performance, thermal management in fielded systems • What about energy/power efficient algorithms?

  42. Study Guide • Explain the difference between energy dissipation and power dissipation • Distinguish between static power dissipation and dynamic power dissipation • Explain dynamic voltage frequency scaling (DVFS) • What are power states? • Why is this an advantage? • What is the impact of DVFS on i) energy, ii) execution time, and iii) power? • Distinguish between clock gating and power gating

  43. Study Guide (cont.) • Define thermal design power (TDP) • Name two schemes for preventing the chip from exceeding TDP. Explain how they achieve this goal • What does boosting achieve? • What is the difference between C-states and P-states? • Name one power management technique that will save static power • How does using many slower, simpler cores improve power efficiency?

  44. Study Guide (cont.) • How is thermal design power (TDP) calculated? • When using boost algorithms, what determines the duration of the high frequency operation? • How does a power virus work? • Describe how throttling works • Know the power dissipation in some modern processor-memory systems drawn from the embedded, server, and high performance computing segments

  45. Glossary • Boosting • C-states • Dynamic Power and Energy • Power Gating • P-states • Static Power and Energy • Time constant • Thermal Design Point • Throttling
