
System-level, Unified In-band and Out-of-band Dynamic Thermal Control

Dong Li (Virginia Tech), Rong Ge (Marquette University), Kirk Cameron (Virginia Tech)





  1. System-level, Unified In-band and Out-of-band Dynamic Thermal Control
     Dong Li (Virginia Tech), Rong Ge (Marquette University), Kirk Cameron (Virginia Tech)

  2. Motivation
     • Hot spots, or elevated temperatures in some areas of the data center, are quite common
     • Out-of-band techniques (e.g., CPU cooling fans) are less studied
     • In-band and out-of-band techniques operate independently, without cooperating with each other
     • Challenge 1: enforcing the same user control policy across diverse physical mechanisms
     • Challenge 2: the need for a tunable controller

  3. Temperature Characteristics of Parallel Applications

  4. Temperature Characteristics of Parallel Applications
     • Three temperature characteristics: sudden change, gradual change, and jitter
     • Sudden and gradual changes lead to an actual temperature increase or decrease
     • Jitter lacks a sustained increase or decrease following a spike
     • Design a controller that recognizes these types and responds accordingly

  5. History-based Context-aware Temperature Control (Basic Idea)
     • Periodically profile the temperature and use this historical information to predict future CPU temperature
     • Based on the prediction, identify the appropriate technique to perform thermal control and balance power and performance for the next interval

  6. History-based Context-aware Temperature Control (Temperature Profiling and Prediction)
     • Use a two-level window to track temperature changes over both short and long time periods
     • Level-one temperature window: collects raw temperature samples and reacts to "sudden" changes; its average value reduces jitter
     • Level-two temperature window (a FIFO with a front and a rear): holds the level-one averages and reacts to "gradual" changes
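The two-level window can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation: the window sizes (`l1_size`, `l2_size`), the class and method names, and the reading of the level-one difference as "newest average minus previous average" and the level-two difference as "newest minus oldest entry of the FIFO" are all assumptions.

```python
from collections import deque
from statistics import mean


class TwoLevelWindow:
    """Hypothetical sketch of a two-level temperature window.

    Level one buffers `l1_size` raw samples and reduces them to one average,
    which damps jitter. Level two is a FIFO holding the last `l2_size`
    level-one averages, which exposes slower, gradual drift.
    """

    def __init__(self, l1_size: int = 4, l2_size: int = 5):
        self.l1_size = l1_size
        self.l1: list[float] = []
        self.l2: deque = deque(maxlen=l2_size)  # the oldest average is evicted automatically

    def add_sample(self, temp_c: float) -> bool:
        """Add one raw sample; return True when a new level-one average was produced."""
        self.l1.append(temp_c)
        if len(self.l1) < self.l1_size:
            return False
        self.l2.append(mean(self.l1))
        self.l1.clear()
        return True

    def delta_l1(self) -> float:
        """Short-term change between the two most recent averages (reacts to sudden changes)."""
        if len(self.l2) < 2:
            return 0.0
        return self.l2[-1] - self.l2[-2]

    def delta_l2(self) -> float:
        """Long-term change between the newest and oldest averages in the FIFO (reacts to gradual changes)."""
        if len(self.l2) < 2:
            return 0.0
        return self.l2[-1] - self.l2[0]
```

With `l1_size=4`, a single 12-degree spike inside one batch of samples (50, 62, 50, 50) moves that batch's average by only 3 degrees, which is how the averaging step damps jitter while a sustained shift still propagates into both differences.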

  7. History-based Context-aware Temperature Control (Temperature Profiling and Prediction)
     • We assume that the temperature will change at the same rate during the next sampling round
     • The predicted temperature difference (tL1/L2) is then used to determine the appropriate temperature regulator response
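The same-rate assumption amounts to a one-step linear extrapolation. A minimal sketch, assuming the inputs are two consecutive averaged temperatures in degrees Celsius taken one sampling round apart (the function name is hypothetical):

```python
def predict_next(current_avg: float, previous_avg: float) -> tuple[float, float]:
    """Extrapolate one sampling round ahead, assuming the temperature keeps
    changing at the same rate as during the last round.

    Returns (predicted temperature, predicted change), both in degrees Celsius.
    """
    delta = current_avg - previous_avg  # change observed over the last round
    return current_avg + delta, delta   # same change assumed for the next round
```

For example, averages of 53.0 and then 55.0 predict 57.0 for the next round, with a predicted change of +2.0; it is this predicted change, rather than the absolute temperature, that drives the regulator's response.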

  8. History-based Context-aware Temperature Control (Target Mode Identification)
     • Inputs:
         ◦ Predicted temperature behavior based on the temperature profile (tL1/L2)
         ◦ A parameter (Pp) specified by the user that indicates the aggressiveness of the temperature controller
     • Outputs:
         ◦ Fan speed
         ◦ Frequency setting
     • The controls follow the thermal control policy (Pp)

  9. History-based Context-aware Temperature Control (Target Mode Identification)
     • We maintain a "thermal control array" {g1, g2, g3, …, gnp, …, gN} for each available thermal control technique on the system
     • Each entry represents a mode that controls temperature to a certain degree; effectiveness at controlling temperature increases from weak (g1) to strong (gN)

  10. History-based Context-aware Temperature Control (Target Mode Identification)
     • To coordinate multiple thermal management techniques, we fill out the arrays {g1, g2, g3, …, gnp, …, gN} in a unified way; effectiveness at controlling temperature increases from weak to strong
     • Entries g1 through gnp are filled with a subset of the physically available modes, evenly extracted from the full set
     • The remaining entries are filled with the most effective mode, gN
     • np is determined by Pp
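The unified fill rule, and how one index can then drive two techniques at once (as in the hybrid fan and DVFS control), might look like the sketch below. It is an illustration under assumptions: "evenly extracted" is read as evenly spaced indices that include the weakest and strongest physical modes, the `n_p == 1` case keeps only the weakest mode in front, and the fan duty cycles and frequencies are hypothetical values, not the test platform's.

```python
def fill_control_array(modes, n, n_p):
    """Build an N-entry thermal control array (hypothetical sketch).

    modes: physically available modes, ordered from weakest to strongest cooling.
    Entries 1..n_p hold modes evenly extracted from the full set (weakest to
    strongest); entries n_p+1..N repeat the most effective mode.
    """
    if not 1 <= n_p <= n:
        raise ValueError("n_p must lie in [1, N]")
    m = len(modes)
    if n_p == 1:
        head = [modes[0]]
    else:
        # Integer arithmetic keeps the first pick at modes[0] and the last at modes[-1].
        head = [modes[j * (m - 1) // (n_p - 1)] for j in range(n_p)]
    return head + [modes[-1]] * (n - n_p)


# Unified use: both arrays share N and n_p, so one index selects a mode in each.
FAN_DUTY = [20, 40, 60, 80, 100]        # hypothetical fan PWM duty cycles (%), weakest first
CPU_GHZ = [2.4, 2.2, 2.0, 1.8, 1.6]     # hypothetical 5 frequencies, highest (weakest cooling) first

fan_array = fill_control_array(FAN_DUTY, n=8, n_p=5)
dvfs_array = fill_control_array(CPU_GHZ, n=8, n_p=5)
index = 6                               # 1-based index chosen by the controller
print(fan_array[index - 1], dvfs_array[index - 1])  # prints "100 1.6"
```

Because both arrays are built with the same N and np, the controller only has to reason about a single index; each technique translates that index into its own physical setting.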

  11. History-based Context-aware Temperature Control (Target Mode Identification)
     • np is determined by Pp: the range of Pp, [PMIN, PMAX], is mapped onto the range of np, [1, N]
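The slide shows only the endpoints of the mapping (PMIN to 1, PMAX to N). A minimal sketch, assuming the mapping is linear and rounds to the nearest index; the values PMIN = 0, PMAX = 100 and N = 9 in the usage lines are hypothetical:

```python
def pp_to_np(pp: float, p_min: float, p_max: float, n: int) -> int:
    """Map the user policy Pp in [p_min, p_max] linearly onto n_p in [1, N].

    A smaller Pp gives a smaller n_p, so more array entries hold the strongest mode.
    """
    pp = min(max(pp, p_min), p_max)          # clamp out-of-range policies
    frac = (pp - p_min) / (p_max - p_min)    # 0.0 at p_min, 1.0 at p_max
    return 1 + int(frac * (n - 1) + 0.5)     # round to the nearest index


for pp in (25, 50, 75):                      # the policies used in the evaluation
    print(pp, pp_to_np(pp, p_min=0, p_max=100, n=9))
```

Under these assumptions the three evaluated policies give np = 3, 5 and 7, so the aggressive policy (Pp = 25) leaves six of the nine entries at the strongest mode.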

  12. History-based Context-aware Temperature Control (Target Mode Identification)
     • Pp in [PMIN, PMAX] maps to np in [1, N], which marks the entry gnp in {g1, …, gnp, …, gN}
     • A smaller Pp leads to more aggressive thermal control
     • More array entries store the most effective temperature mode
     • A small increment in the array index can lead to a large increase in cooling effect

  13. History-based Context-aware Temperature Control (Target Mode Identification)
     • We use the predicted temperature variance (tL1/L2) from the two-level window to identify an index in the thermal control array {g1, …, gi, …, gi+C*t, …, gN}
     • If gi is the current mode, the next mode is gi+C*t, where t is the predicted temperature variance and C = (N - 1)/(TMAX - TMIN) maps the temperature range [TMIN, TMAX] onto the index range [1, N]
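The index update i → i + C·t can be sketched as below. The rounding to an integer index and the clamping to [1, N] are assumptions the slide does not spell out, as is the function name; TMIN and TMAX are taken to bound the range of temperature change the controller maps onto the array.

```python
def next_mode_index(i: int, delta_t: float, n: int, t_min: float, t_max: float) -> int:
    """Move from the current mode g_i to g_(i + C*t), with C = (N - 1) / (T_MAX - T_MIN).

    delta_t is the predicted temperature change t; the result is clamped to [1, N].
    """
    c = (n - 1) / (t_max - t_min)     # array steps per degree of predicted change
    j = i + round(c * delta_t)        # nearest integer (Python rounds exact halves to even)
    return min(max(j, 1), n)          # stay within the array
```

With N = 9, TMIN = 0 and TMAX = 8 (so C = 1), a predicted rise of 2 degrees moves the controller from g3 to g5, while a predicted drop of 5 degrees from g3 bottoms out at g1.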

  14. Performance Evaluation (Platform)
     • We implemented a fan driver that dynamically sets the fan speed according to processor temperature
     • Temperature samples are collected from the digital thermal sensors embedded in the processor
     • The processor frequency can be scaled among 5 levels
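The slides do not say which interface the fan driver and sensor reader use. On Linux, one common route (an assumption here, not necessarily what the authors did) is the hwmon sysfs interface: the coretemp driver exposes the processor's digital thermal sensor readings as tempN_input files in millidegrees Celsius, and many fan controllers expose pwmN files that accept values from 0 to 255, usually after writing 1 to pwmN_enable to select manual control. Since hwmon paths vary between machines, the sketch takes paths as arguments.

```python
from pathlib import Path


def read_temp_c(temp_input: Path) -> float:
    """Read an hwmon tempN_input file (millidegrees Celsius) and return degrees Celsius."""
    return int(temp_input.read_text().strip()) / 1000.0


def duty_to_pwm(duty_percent: float) -> int:
    """Convert a duty cycle in percent to the 0-255 scale of hwmon pwmN files."""
    duty_percent = min(max(duty_percent, 0.0), 100.0)
    return round(duty_percent * 255 / 100)


def write_fan_duty(pwm_file: Path, duty_percent: float) -> None:
    """Write a duty cycle to an hwmon pwmN file (needs root and manual PWM mode)."""
    pwm_file.write_text(f"{duty_to_pwm(duty_percent)}\n")


# Example paths only; the hwmon index and channel differ between machines.
# temp = read_temp_c(Path("/sys/class/hwmon/hwmon0/temp1_input"))
# write_fan_duty(Path("/sys/class/hwmon/hwmon1/pwm1"), 60)
```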

  15. Performance Evaluation (Dynamic Fan Control)
     • Our dynamic fan control responds to temperature changes under different control policies: Pp = 25 (aggressive), Pp = 50 (moderate), and Pp = 75 (weak)

  16. Performance Evaluation (Dynamic Fan Control)
     • Pp = 50; benchmark: bt.B.4
     • We compare our dynamic fan control method with the traditional static method and with constant fan speed control

  17. Performance Evaluation (Dynamic Fan Control)
     • In general, a larger maximum PWM duty cycle leads to lower temperature
     • With our dynamic control, a less powerful fan can deliver cooling similar to that of a more powerful fan

  18. Performance Evaluation (Temperature Aware DVFS Control)
     • Benchmark: LU.B.4; coupled with traditional static fan control; Pp = 50
     • Our DVFS control scales down the frequency only when the average temperature has stabilized
     • Our DVFS control restores the original frequency once the temperature stays consistently below the threshold, so as to avoid performance loss
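The two DVFS rules above can be read as a simple hysteresis policy. The sketch below is one possible interpretation, not the authors' code: the threshold, the stability tolerance `stable_delta`, the number of cool intervals `hold`, and the one-step-at-a-time scale-down are all hypothetical choices.

```python
class TempAwareDVFS:
    """Hypothetical sketch of the slide's DVFS policy (names and parameters are assumptions).

    freqs: available frequencies, highest first.
    - Scale down one step only when the averaged temperature is at or above
      `threshold` and has stabilized (|change| <= stable_delta since the last interval).
    - Restore the original (highest) frequency once the average has stayed
      below `threshold` for `hold` consecutive intervals.
    """

    def __init__(self, freqs, threshold, stable_delta=0.5, hold=3):
        self.freqs = list(freqs)
        self.threshold = threshold
        self.stable_delta = stable_delta
        self.hold = hold
        self.level = 0          # index into freqs; 0 is the original frequency
        self.below_count = 0    # consecutive intervals below the threshold
        self.prev_avg = None    # previous averaged temperature

    def step(self, avg_temp):
        """Process one averaged temperature reading and return the frequency to use."""
        stable = self.prev_avg is not None and abs(avg_temp - self.prev_avg) <= self.stable_delta
        self.prev_avg = avg_temp
        if avg_temp >= self.threshold:
            self.below_count = 0
            if stable and self.level < len(self.freqs) - 1:
                self.level += 1                 # hot and stable: scale down one step
        else:
            self.below_count += 1
            if self.below_count >= self.hold:
                self.level = 0                  # consistently cool: restore original frequency
        return self.freqs[self.level]
```

For example, with five hypothetical frequencies (2.4 down to 1.6 GHz), a 60 °C threshold and `hold=2`, the averaged readings 62, 65, 65.2, 65.3, 58, 57 yield 2.4, 2.4, 2.2, 2.0, 2.0, 2.4 GHz: the rising phase is ignored, the stable hot phase steps the frequency down, and two cool intervals restore it.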

  19. Performance Evaluation (Temperature Aware DVFS Control)
     • Our DVFS control performs better than CPUSPEED in terms of both power saving and performance

  20. Performance Evaluation (Dynamic Hybrid Fan and DVFS Control)
     • Our method effectively unifies different thermal control techniques and reacts to different user control policies with minimal performance impact

  21. Conclusion
     • We classify the thermal characteristics of parallel applications and use a two-level temperature window to make our controller more effective
     • We introduce a simple parameter (Pp) that lets the user specify the aggressiveness of in-band and out-of-band techniques for thermal reduction
     • We integrate an out-of-band method (fan control) and an in-band method (DVFS)
     • We explore an efficient fan control method

  22. Thank You
