
Thermal Management: Technologies & Design Techniques

Thermal Management: Technologies & Design Techniques. Yan Lin, Philip Lee, Jinjun Xiong. Part of the slides courtesy of the original authors.

Presentation Transcript


  1. Thermal Management: Technologies & Design Techniques Yan Lin, Philip Lee, Jinjun Xiong ---- Part of the slides courtesy of the original authors.

  2. Reference • [1] W. Liao, F. Li, and L. He, "Microarchitecture Level Power and Thermal Simulation Considering Temperature Dependent Leakage Model," ISLPED, 2003 • [2] D. Brooks and M. Martonosi, "Dynamic Thermal Management for High-Performance Microprocessors," HPCA, 2001 • [3] F. Bellosa, S. Kellner, M. Waitz, and A. Weissel, "Event-Driven Energy Accounting for Dynamic Thermal Management," COLP, 2003 • [4] H. Zeng, C. Ellis, A. Lebeck, and A. Vahdat, "ECOSystem: Managing Energy as a First-Class Operating System Resource," ASPLOS, 2002

  3. Outline [1] • Introduction • Leakage power modeling with temperature scaling • Coupled power and thermal simulation • Sub-Conclusions

  4. Introduction • Leakage power is about 40% of total power for Intel Pentium IV processors at 3GHz [A. Grove, IEDM 2002] • Leakage power increases exponentially with temperature • Coupled power and thermal simulation is needed for accurate power and thermal modeling

  5. Circuit and Power States • Active circuit and power state • Pa: full power dissipation without any throttling • Pa = Pd + Ps • Pd: dynamic power • Clock gating and standby state • Ps: leakage power with clock gating • Power gating and inactive state • Pi: reduced leakage power with power gating

  6. Leakage Power Reduction by Power Gating • MTCMOS for logic circuits • Near 100% leakage power reduction in sleep mode • No data retention • VRC (Virtual power/ground Rails Clamp) for memory units • Less power reduction but with data retention • [Figure: two circuit diagrams, each showing low-Vt logic on a virtual GND rail controlled by a sleep signal]

  7. Leakage Power Model for Logic • Leakage power: • Iavg is the averaged leakage current over circuit blocks considering logic states, transistor stacking, and transistor size • Its maximum and minimum values are stable when the number of circuit blocks is large enough (> 20)

  8. Leakage Power Model for Memory Units • Memory units modeled by SRAM array: • Plogic : leakage power for logic such as wordline drivers, write circuits and precharge transistors • Pcircuit : leakage power for SRAM cells

  9. Temperature Scaling • Iavg in logic circuits: • Memory units: • α, β, γ and δ are coefficients • T is the absolute temperature • Coefficients are obtained by curve fitting, with less than 6% error relative to SPICE

  10. Temperature Calculation • Stable on-chip temperature: T = Ta + Rt · P • T: on-chip temperature • Ta: ambient temperature • Rt: thermal resistance (for unit area) • Transient temperature • Suppose the average power within (t1, t2) is Pavg; the update uses τheat if Ta + Rt · Pavg > Tt1, else τcool • τheat, τcool: heating and cooling time constants
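
The transient formula itself is not reproduced in the transcript. The sketch below shows one common way to realize the rule on the slide, assuming a first-order exponential approach toward the steady-state target Ta + Rt · Pavg, with τheat used when the target is above the current temperature and τcool otherwise. The exponential form, the function names, and any numeric values are assumptions for illustration, not the authors' exact model.

```python
import math

def steady_temperature(t_ambient, r_thermal, power):
    """Steady-state temperature: ambient plus thermal resistance times power."""
    return t_ambient + r_thermal * power

def transient_temperature(t_start, t_ambient, r_thermal, p_avg, dt,
                          tau_heat, tau_cool):
    """Temperature after dt at average power p_avg.

    Assumed first-order model: the temperature relaxes exponentially
    from t_start toward the steady-state target.
    """
    target = steady_temperature(t_ambient, r_thermal, p_avg)
    # Heating if the target lies above the current temperature, else cooling.
    tau = tau_heat if target > t_start else tau_cool
    return target + (t_start - target) * math.exp(-dt / tau)
```

Calling `transient_temperature` once per simulation interval, and feeding the result back as `t_start`, gives a temperature trace that a leakage model can consume in the next interval.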

  11. Temperature Calculation Modes • Universal mode • Assume the whole chip has a uniform temperature • Provide lower bound of the maximum on-chip temperature • Individual mode • Divide the whole system into components • Calculate a temperature for each individual component • Assume no horizontal heat transfer among components • Provide upper bound of maximum temperature and maximum temperature gap
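
The two modes can be compared with a small steady-state sketch. It assumes each component is described by a (power, area) pair and that a block of area A has thermal resistance Rt / A, so its temperature is Ta + Rt · P / A. Component names and numbers are hypothetical.

```python
def universal_temperature(components, t_ambient, r_unit):
    """Universal mode: one temperature for the whole chip."""
    total_power = sum(power for power, _ in components.values())
    total_area = sum(area for _, area in components.values())
    return t_ambient + r_unit * total_power / total_area

def individual_temperatures(components, t_ambient, r_unit):
    """Individual mode: one temperature per component, no lateral heat flow."""
    return {
        name: t_ambient + r_unit * power / area
        for name, (power, area) in components.items()
    }

# Hypothetical floorplan: name -> (power in W, area in arbitrary units).
chip = {"alu": (10.0, 2.0), "cache": (5.0, 5.0), "fpu": (6.0, 1.0)}
uni = universal_temperature(chip, t_ambient=45.0, r_unit=2.0)
ind = individual_temperatures(chip, t_ambient=45.0, r_unit=2.0)
gap = max(ind.values()) - min(ind.values())
```

Under these assumptions the chip-wide power density can never exceed the densest component's, which is why the universal value sits at or below the hottest individual value.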

  12. Temperature Dependent Energy • Total leakage energy changes by a factor of 2.5X when temperature changes from 90°C to 130°C • Any study of leakage energy is inaccurate unless it considers thermal effects • At 2GHz in individual mode, clock gating reduces dynamic energy by up to 69.29% and, because of the reduced temperature, leakage energy by up to 48.06% • Case I: 1GHz without throttling • Case II: 2GHz without throttling • Case III: 2GHz with clock gating • [Figure: normalized total energy (0% to 100%), split into dynamic and leakage energy, for Cases I to III in individual (ind) and universal (uni) modes at 90°C, 110°C, and 130°C]

  13. Sub-conclusions • Coupled power and thermal simulation is necessary • growing significance of leakage • Leakage is an exponential function of temperature • The first cycle-accurate coupled power and thermal simulator is developed • Power and thermal management will be investigated • With inter-dependence between power and temperature

  14. Outline [2] • Introduction • Mechanisms for Dynamic Thermal Management • Simulation Results • Conclusions

  15. Introduction • Power dissipation becomes critical with increasing clock rate and transistor count • Thermal and power-delivery issues become especially critical for high-performance microprocessors • Dynamic Thermal Management is needed for high-performance processors

  16. Overview

  17. Mechanisms for DTM • Initiation delay: e.g., operating system interrupt and handler • Response delay: e.g., voltage and frequency scaling • Policy delay: number of cycles before checking temperature after turning on DTM • [Figure: DTM timeline with events Trigger Reached, Turn Response On, Check Temp, Check Temp, Turn Response Off and intervals Initiation Delay, Response Delay, Policy Delay, Shutoff Delay]
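
The interaction of these delays can be illustrated with a toy cycle-level loop. This is only a sketch: the linear heating and cooling rates are an assumption made to keep the example short, and the function and parameter names are hypothetical rather than taken from [2].

```python
def simulate_dtm(cycles, trigger, start_temp, heat_rate, cool_rate,
                 initiation_delay, response_delay, policy_delay, shutoff_delay):
    """Return a list of (cycle, temperature, response_on) tuples."""
    temp = start_temp
    response_on = False
    pending = None      # "on" or "off" while a state change is in flight
    countdown = 0       # cycles left until the pending change takes effect
    next_check = 0      # next cycle at which the policy re-checks temperature
    trace = []
    for cycle in range(cycles):
        # Toy thermal model: cool while the response is on, heat otherwise.
        temp += -cool_rate if response_on else heat_rate
        if pending is not None:
            countdown -= 1
            if countdown <= 0:
                response_on = (pending == "on")
                if response_on:
                    next_check = cycle + policy_delay
                pending = None
        elif not response_on and temp >= trigger:
            # Trigger reached: the response engages only after both delays.
            pending, countdown = "on", initiation_delay + response_delay
        elif response_on and cycle >= next_check:
            if temp < trigger:
                pending, countdown = "off", shutoff_delay
            else:
                next_check = cycle + policy_delay
        trace.append((cycle, temp, response_on))
    return trace

trace = simulate_dtm(cycles=40, trigger=80, start_temp=78, heat_rate=1,
                     cool_rate=1, initiation_delay=2, response_delay=1,
                     policy_delay=3, shutoff_delay=1)
peak = max(temp for _, temp, _ in trace)
```

In this run the temperature keeps rising during the initiation and response delays, so `peak` ends up above `trigger`; shortening those delays shrinks the overshoot.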

  18. Trigger Mechanisms • Initiation mechanisms: hardware support for initiating responses • Trigger mechanisms: temperature sensors for thermal feedback; on-chip activity counters; dynamic profiling analysis; compile-time trigger requirements • Response mechanisms: micro-architecture techniques; frequency/voltage scaling techniques

  19. Power • Emergency setting: 25W • Emergencies are removed with DTM for all benchmarks except Fppp • (The paper does not report average power with DTM)

  20. Performance Degradation • Performance loss at various trigger levels • [Figure panels: Frequency/Voltage Scaling Techniques; Microarchitecture Techniques]

  21. Sub-Conclusions • DTM allows arbitrary tradeoffs between performance and savings • Designer can focus on average power • Trigger delay is a key factor in performance overhead

  22. Introduction [3] • Two major design alternatives to deal with power dissipation: • Cooling technology designed to handle maximum power consumption • Heat removal designed for typical sustained power across realistic workloads. • Most dynamic thermal management (DTM) techniques do not account for application-specific techniques

  23. Introduction • An event-driven energy estimation model • Uses event-monitoring counters to estimate actual power consumption • Identifies which processes are using the power of the system • Allows the OS to treat energy as a resource • A CPU scheduler limits the execution time slices of “hot” processes

  24. Event-Driven DTM • Use performance counters available in modern processors to determine whether a process is “hot” • Estimating power consumption from counters is faster than measuring it directly • Experiments performed on Pentium 4 • Limitation: accounts for thermal management of only the CPU

  25. From Events to Energy • Energy estimation done by correlating a processor-internal event to an amount of energy
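
A minimal sketch of this correlation, assuming the estimate is a linear weighted sum of counter values. The event names and per-event energies below are hypothetical placeholders, not calibrated Pentium 4 figures.

```python
def estimate_energy(event_counts, energy_per_event):
    """Estimate energy in joules as sum(count * joules_per_event)."""
    return sum(count * energy_per_event[event]
               for event, count in event_counts.items())

# Hypothetical weights (joules per event) and counter deltas for one interval.
weights = {"uops_retired": 2e-9, "l2_misses": 50e-9}
counts = {"uops_retired": 1000, "l2_misses": 10}
interval_energy = estimate_energy(counts, weights)
```

Dividing `interval_energy` by the interval length gives an average power estimate that can be attributed to whichever process ran during that interval.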

  26. Energy Containers • Energy abstracted as a first class resource • Allows OS to actively schedule/manage based on energy • An energy container is a specific type of resource container • Processes are throttled based on the limits of the energy containers
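
The throttling idea can be sketched as follows. The class and function names are hypothetical, and the periodic refill is an assumption about how limits might be renewed, not a description of the authors' implementation.

```python
class EnergyContainer:
    """Tracks energy charged to a process against a per-period limit."""

    def __init__(self, limit_joules):
        self.limit = limit_joules
        self.used = 0.0

    def charge(self, joules):
        """Account energy consumed on behalf of the owning process."""
        self.used += joules

    def exhausted(self):
        return self.used >= self.limit

    def refill(self):
        """Start a new accounting period (assumed policy)."""
        self.used = 0.0

def runnable(processes):
    """Processes whose containers still have energy left are schedulable."""
    return [name for name, box in processes.items() if not box.exhausted()]

procs = {"hot": EnergyContainer(1.0), "cool": EnergyContainer(1.0)}
procs["hot"].charge(1.5)      # the hot process overdraws its limit
procs["cool"].charge(0.2)
schedulable = runnable(procs)
```

Here the scheduler would skip the "hot" process until its container is refilled, which is the throttling effect described on the slide.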

  27. From Energy to Temperature • From energy equations and Newton’s Law of Cooling, the following formula is derived to estimate processor temperature: • The constants c1, c2, and T0 were determined experimentally using test programs • In all cases, estimated temperature > measured
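
The formula is not reproduced in the transcript. A plausible form consistent with Newton's Law of Cooling and the constants named on the slide is dT/dt = c1 · P − c2 · (T − T0); this form is an assumption, as are the function names and the numeric constants below. The sketch integrates it with simple Euler steps.

```python
def step_temperature(temp, power, dt, c1, c2, t0):
    """One Euler step of the assumed model dT/dt = c1*P - c2*(T - T0)."""
    return temp + (c1 * power - c2 * (temp - t0)) * dt

def estimate_temperature(power_samples, dt, c1, c2, t0, start=None):
    """Run the model over a sequence of power samples taken dt apart."""
    temp = t0 if start is None else start
    for power in power_samples:
        temp = step_temperature(temp, power, dt, c1, c2, t0)
    return temp

# Hypothetical constants: 40 W sustained power, sampled every 0.1 s.
final = estimate_temperature([40.0] * 5000, dt=0.1, c1=0.05, c2=0.1, t0=30.0)
```

Under this assumed model a constant power P settles at T0 + c1 · P / c2, so `final` approaches 50 with the values above.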

  28. Evaluation • Compared estimated temperature with measured temperature on various benchmarks

  29. Evaluation

  30. Original vs. Energy/Temperature Scheduling

  31. Overhead • Event-monitoring counters are read on each timer interrupt (1000 times/sec), and the cost of a context switch with energy container support increases by 49% • However, performance loss < 1% • The cost of estimating temperature is negligible, since it takes ~4.85 µs and runs only 1-10 times/sec

  32. Introduction [4] • Energy as a first class resource • Explicit allocation of energy to competing applications • Control of battery resource • Goal of extending battery life by limiting average discharge rate • Uses currentcy model, an energy accounting framework • ECOSystem (a modified Linux)

  33. The Currentcy Model • Model uses a common unit of currentcy for energy accounting and allocation • 1 unit of currentcy represents the right to consume a certain amount of energy within a fixed amount of time

  34. The Currentcy Model • Allocation: currentcy is divided among competing tasks based on specified weights • Payback: each managed device has a cost that requires payment in currentcy • Allows OS to determine which tasks get access to the energy resource
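
Allocation and payback can be sketched in a few lines. The function names, task names, and costs are hypothetical, and refusing access when a balance is too low is a simplifying assumption about how payback is enforced.

```python
def allocate_currentcy(budget, weights):
    """Split a currentcy budget among tasks in proportion to their weights."""
    total = sum(weights.values())
    return {task: budget * weight / total for task, weight in weights.items()}

def pay(balances, task, cost):
    """Charge a task for device use; return True if access is granted."""
    if balances[task] < cost:
        return False          # not enough currentcy: the task must wait
    balances[task] -= cost
    return True

balances = allocate_currentcy(100.0, {"browser": 3, "player": 1})
denied = pay(balances, "player", 30.0)    # costs more than the player holds
granted = pay(balances, "player", 20.0)
```

Raising a task's weight gives it a larger share of each allocation, which is how the OS decides which tasks get access to the energy resource.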

  35. ECOSystem (Energy-Centric Operating System) • Currentcy model implemented in Linux OS • Models the power characteristics of 3 primary devices: • CPU • Disk • Wireless Network Interface

  36. Energy Accounting • Accuracy of currentcy model vs. program counter sampling

  37. Achieving Target Battery Lifetime

  38. Q & A • Thank you!
