System-level, Unified In-band and Out-of-band Dynamic Thermal Control
Dong Li (Virginia Tech), Rong Ge (Marquette University), Kirk Cameron (Virginia Tech)
Motivation
• Hot spots, or areas of elevated temperature, are common in data centers
• Out-of-band techniques (e.g., CPU cooling fans) are less studied than in-band techniques
• In-band and out-of-band techniques operate independently, without cooperating with each other
• Challenge 1: enforcing the same user control policy across diverse physical mechanisms
• Challenge 2: the need for a tunable controller
Temperature Characteristics of Parallel Applications
• Three temperature characteristics: sudden change, gradual change, and jitter
• Sudden and gradual changes reflect an actual increase or decrease in temperature
• Jitter is a spike that is not followed by a sustained increase or decrease
• We design a controller that recognizes these types and responds to each accordingly
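The slides do not say how the three patterns are told apart. As a rough illustration only, a short trace could be labeled by comparing its net change with its largest single-step jump. The function name `classify_trace`, the thresholds `spike` and `sustained`, the `edge` length, and the extra "steady" label are hypothetical choices, not values from the paper.

```python
def classify_trace(trace, spike=3.0, sustained=2.0, edge=2):
    """Label a short temperature trace (degrees C) as sudden, gradual, jitter, or steady.

    Hypothetical heuristic: `spike`, `sustained`, and `edge` are illustrative
    thresholds, not parameters from the paper.
    """
    if len(trace) < 2 * edge:
        raise ValueError("trace must contain at least 2 * edge samples")
    head = sum(trace[:edge]) / edge          # level at the start of the trace
    tail = sum(trace[-edge:]) / edge         # level at the end of the trace
    net_change = tail - head                 # sustained increase or decrease
    max_jump = max(abs(b - a) for a, b in zip(trace, trace[1:]))
    if abs(net_change) < sustained:
        # No lasting change: a large jump is only a spike (jitter).
        return "jitter" if max_jump >= spike else "steady"
    # A lasting change: sudden if it arrived in one large step, otherwise gradual.
    return "sudden" if max_jump >= spike else "gradual"


print(classify_trace([50, 50, 50, 58, 58, 58]))  # sudden
print(classify_trace([50, 51, 52, 53, 54, 55]))  # gradual
print(classify_trace([50, 50, 57, 50, 50, 50]))  # jitter
```

The key distinction the slide draws is captured by `net_change`: sudden and gradual changes leave the temperature somewhere new, while jitter returns to where it started.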
History-based Context-aware Temperature Control (Basic Idea)
• Periodically profile temperature and use the historical information to predict future CPU temperature
• Based on the prediction, identify the appropriate technique to perform thermal control for the next interval, balancing power and performance
History-based Context-aware Temperature Control (Temperature Profiling and Prediction)
• Use a two-level window to track temperature changes over both short and long time periods
• The level-one window holds raw temperature samples and reacts to "sudden" changes
• The samples in the level-one window are averaged to reduce jitter
• The level-two window is a FIFO (from front to rear) of these averages and reacts to "gradual" changes
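A minimal sketch of the two-level window, assuming a non-overlapping level-one window whose average is pushed onto a fixed-length level-two FIFO. The class name, the window sizes, and the definitions of `t_l1` (last minus first sample of the latest level-one window) and `t_l2` (rear minus front of the FIFO) are assumptions for illustration, not the authors' code.

```python
from collections import deque


class TwoLevelWindow:
    """Sketch of a two-level temperature window (sizes are hypothetical).

    Level one collects raw samples; each time it fills, its average is pushed
    onto the level-two FIFO, which drops its front entry when full.
    """

    def __init__(self, l1_size=4, l2_size=5):
        self.l1_size = l1_size
        self._pending = []                 # samples of the level-one window being filled
        self.last_l1 = []                  # most recently completed level-one window
        self.l2 = deque(maxlen=l2_size)    # FIFO of level-one averages (front ... rear)

    def add_sample(self, temp_c):
        """Add one sensor reading; return the new average when level one completes."""
        self._pending.append(float(temp_c))
        if len(self._pending) < self.l1_size:
            return None
        avg = sum(self._pending) / len(self._pending)  # averaging damps jitter
        self.last_l1 = self._pending
        self.l2.append(avg)
        self._pending = []
        return avg

    def t_l1(self):
        """Short-term change: last minus first sample of the latest level-one window."""
        return self.last_l1[-1] - self.last_l1[0] if self.last_l1 else 0.0

    def t_l2(self):
        """Long-term change: rear minus front of the level-two FIFO."""
        return self.l2[-1] - self.l2[0] if len(self.l2) >= 2 else 0.0


w = TwoLevelWindow(l1_size=2, l2_size=3)
for s in [50, 52, 50, 52, 60, 62, 70, 72]:
    w.add_sample(s)
print(list(w.l2), w.t_l1(), w.t_l2())  # [51.0, 61.0, 71.0] 2.0 20.0
```

With `l1_size=4`, a single 8 °C spike among readings of 50 °C raises the level-one average only to 52 °C, which is how averaging keeps jitter from dominating the level-two FIFO.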
History-based Context-aware Temperature Control (Temperature Profiling and Prediction)
• We assume that temperature will continue to change at the same rate during the next round of sampling
• The temperature difference measured in the two-level window (tL1 for level one, tL2 for level two) then determines the appropriate temperature regulator response
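One way to write the prediction step, assuming tL1 is the difference between the last and first samples s_1, …, s_k of the most recent level-one window and tL2 is the difference between the rear and front averages of the level-two FIFO (these definitions are our reading of the slides, not stated in them):

```latex
t_{L1} = s_k - s_1, \qquad
t_{L2} = \bar{T}_{\mathrm{rear}} - \bar{T}_{\mathrm{front}}

\hat{T}_{\mathrm{next}} = T_{\mathrm{now}} + t, \qquad t \in \{t_{L1},\, t_{L2}\}
```

If the rate is constant and the next round spans as much time as the window did, the predicted change equals t itself, which is why the difference can feed the regulator directly. For example, front and rear averages of 61 °C and 64 °C give tL2 = 3 °C and a predicted 67 °C when T_now is taken as the rear average.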
History-based Context-aware Temperature Control (Target Mode Identification)
• Inputs:
  • Predicted temperature behavior based on the temperature profile (tL1/tL2)
  • A user-specified parameter (Pp) that sets the aggressiveness of the temperature controller
• Outputs:
  • Fan speed
  • Frequency setting
• The controls follow the thermal control policy set by Pp
History-based Context-aware Temperature Control (Target Mode Identification)
• We maintain a "thermal control array" for each thermal control technique available on the system: {g1, g2, g3, …, gnp, …, gN}
• Each element is a mode that controls temperature to a certain degree; effectiveness increases from weak (g1) to strong (gN)
History-based Context-aware Temperature Control (Target Mode Identification)
• To coordinate multiple thermal management techniques, we fill all thermal control arrays in a unified way: {g1, g2, g3, …, gnp, …, gN}, ordered from weak to strong effectiveness
• The part of the array up to gnp is filled with a subset of the physically available modes, evenly extracted from the full set
• The remainder of the array, up to gN, is filled with the most effective mode
• np is determined by Pp
History-based Context-aware Temperature Control (Target Mode Identification)
• np is determined by Pp: the policy range [PMIN, PMAX] is mapped onto the array index range [1, N]
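A linear mapping is one natural reading of the PMIN-to-1, PMAX-to-N relationship; the round-to-nearest rule below, and the values PMIN = 0, PMAX = 100, and N = 5 used in the example, are assumptions rather than settings from the paper.

```latex
n_p = 1 + \left\lfloor \frac{(P_p - P_{\mathrm{MIN}})(N - 1)}{P_{\mathrm{MAX}} - P_{\mathrm{MIN}}} + \frac{1}{2} \right\rfloor

% Example with P_MIN = 0, P_MAX = 100, N = 5:
% P_p = 25  ->  n_p = 1 + \lfloor 1 + 0.5 \rfloor = 2
% P_p = 50  ->  n_p = 1 + \lfloor 2 + 0.5 \rfloor = 3
% P_p = 75  ->  n_p = 1 + \lfloor 3 + 0.5 \rfloor = 4
```

The endpoints behave as the slide requires: P_p = P_MIN gives n_p = 1 and P_p = P_MAX gives n_p = N.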
History-based Context-aware Temperature Control (Target Mode Identification)
• Pp, ranging from PMIN to PMAX, maps to np, ranging from 1 to N, which sets the position of gnp in {g1, …, gnp, …, gN}
• A smaller Pp leads to more aggressive thermal control:
  • More array items store the most effective temperature mode
  • A small increment in the array index can lead to a large increase in cooling effect, because the weaker modes are spread over fewer slots
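The array-filling rule above can be sketched as follows, assuming slots 1..np receive evenly spaced picks from the physical modes (weakest first) and the remaining slots receive the strongest mode. The function name `build_control_array`, the pick rule, and the choice of the strongest mode when np = 1 are illustrative assumptions.

```python
def build_control_array(modes, n, n_p):
    """Fill an n-slot thermal control array for one technique.

    Hypothetical layout: slots 1..n_p hold modes evenly extracted from
    `modes` (ordered weakest -> strongest cooling); slots n_p+1..n all
    hold the strongest mode, g_N.
    """
    if not modes:
        raise ValueError("modes must not be empty")
    if not 1 <= n_p <= n:
        raise ValueError("n_p must lie in [1, n]")
    m = len(modes)
    if n_p == 1:
        subset = [modes[-1]]  # most aggressive policy: every slot is g_N
    else:
        # Evenly spaced picks from the weakest (index 0) to the strongest (m - 1).
        subset = [modes[int(j * (m - 1) / (n_p - 1) + 0.5)] for j in range(n_p)]
    return subset + [modes[-1]] * (n - n_p)


# Five physical modes, e.g. the 5 DVFS frequencies ordered from weakest to strongest cooling.
modes = ["g1", "g2", "g3", "g4", "g5"]
for n_p in (5, 3, 1):
    print(n_p, build_control_array(modes, 5, n_p))
```

Moving from slot 2 to slot 3 jumps from g3 to g5 when np = 3, but only from g2 to g3 when np = 5, which is the "small index step, large cooling step" effect described on the slide.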
History-based Context-aware Temperature Control (Target Mode Identification)
• We use the predicted temperature change (tL1/tL2) from the two-level window to identify an index in the thermal control array: {g1, …, gi, …, gi+C·t, …, gN}, where gi is the current mode and gi+C·t is the next mode
• C = (N - 1)/(TMAX - TMIN) maps the temperature range [TMIN, TMAX] onto the index range [1, N]
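The index update above can be sketched as a one-line step plus clamping. Rounding C·t to the nearest slot, clamping to [1, N], the function name `next_mode_index`, and the example range TMIN = 40 °C, TMAX = 80 °C are assumptions for illustration.

```python
def next_mode_index(i, t, n, t_min, t_max):
    """Return the next 1-based index into an n-slot thermal control array.

    i:  current index (mode g_i)
    t:  predicted temperature change in degrees C (t_L1 or t_L2)
    C = (n - 1) / (t_max - t_min) converts degrees into array slots.
    Rounding to the nearest slot and clamping to [1, n] are assumptions.
    """
    c = (n - 1) / (t_max - t_min)
    j = i + round(c * t)
    return max(1, min(n, j))


control_array = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9"]
i = 3
for t in (1.0, 5.0, 10.0, -20.0):
    j = next_mode_index(i, t, len(control_array), t_min=40.0, t_max=80.0)
    print(t, j, control_array[j - 1])
```

Here C = 8/40 = 0.2 slots per degree, so a 1 °C wobble leaves the mode at g3, a predicted 10 °C rise moves two slots to g5, and a large predicted drop is clamped at g1.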
Performance Evaluation (Platform)
• We implement a fan driver that dynamically sets the fan speed according to processor temperature
• Temperature samples are collected from the digital thermal sensors embedded in the processor
• The processor frequency can be scaled among 5 levels
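The slides do not describe the driver's interface. On Linux, one comparable user-space approach, assumed here rather than taken from the paper, uses the standard hwmon sysfs files: the coretemp driver reports the processor's digital thermal sensors as tempN_input values in millidegrees Celsius, and many fan controllers expose a pwmN file that accepts values from 0 to 255. The helper names and example paths are hypothetical.

```python
from pathlib import Path


def read_temp_c(temp_input):
    """Read a Linux hwmon tempN_input file (millidegrees Celsius); return degrees C."""
    return int(Path(temp_input).read_text().strip()) / 1000.0


def duty_to_pwm(duty_percent):
    """Map a duty cycle in percent onto the 0-255 scale of hwmon pwmN files."""
    duty = max(0.0, min(100.0, float(duty_percent)))
    return round(duty * 255 / 100)


def set_fan_duty(pwm_file, duty_percent):
    """Write a duty cycle to a pwmN file.

    Assumes the matching pwmN_enable file was already set to 1 (manual
    control); writing requires root on a real system.
    """
    Path(pwm_file).write_text(f"{duty_to_pwm(duty_percent)}\n")


# Example paths (hypothetical; the hwmonX number and channel vary per machine):
#   read_temp_c("/sys/class/hwmon/hwmon1/temp1_input")
#   set_fan_duty("/sys/class/hwmon/hwmon2/pwm1", 60)
```

Keeping the percent-to-PWM conversion in its own pure function makes it easy to pair with a thermal control array whose fan modes are expressed as duty cycles.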
Performance Evaluation (Dynamic Fan Control)
• Our dynamic fan control responds to temperature changes under different control policies: Pp = 25 (aggressive), Pp = 50 (moderate), and Pp = 75 (weak)
Performance Evaluation (Dynamic Fan Control)
• Pp = 50; benchmark: BT.B.4
• We compare our dynamic fan control method with the traditional static method and with constant fan speed control
Performance Evaluation (Dynamic Fan Control)
• In general, a larger maximum PWM duty cycle leads to a lower temperature
• With our dynamic control, a less powerful fan can deliver cooling effects similar to those of a more powerful fan
Performance Evaluation (Temperature-Aware DVFS Control)
• Benchmark: LU.B.4, combined with traditional static fan control; Pp = 50
• Our DVFS control scales down the frequency only once the average temperature has stabilized
• Our DVFS control scales the frequency back up to its original value once the temperature stays consistently below the threshold, to avoid performance loss
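The two DVFS rules above can be sketched as a small state machine over the 5 frequency levels. The class name, the 70 °C threshold, the 0.5 °C "stabilized" band, the three-round restore delay, and the one-level-per-round step-down are all hypothetical; mapping a level to an actual frequency (for example through cpufreq) is left out.

```python
class TempAwareDvfs:
    """Sketch of the DVFS policy described above (all thresholds hypothetical).

    Levels are indices into the available frequencies, 0 = lowest and
    n_levels - 1 = original (highest) frequency.
    """

    def __init__(self, n_levels=5, threshold_c=70.0, stable_delta_c=0.5, below_rounds=3):
        self.top = n_levels - 1
        self.level = self.top
        self.threshold_c = threshold_c
        self.stable_delta_c = stable_delta_c  # max change that still counts as "stabilized"
        self.below_rounds = below_rounds      # rounds below threshold before restoring
        self._prev = None
        self._below = 0

    def step(self, avg_temp_c):
        """Feed one averaged temperature reading; return the frequency level to use."""
        stabilized = self._prev is not None and abs(avg_temp_c - self._prev) <= self.stable_delta_c
        self._prev = avg_temp_c
        if avg_temp_c >= self.threshold_c:
            self._below = 0
            # Scale down one step only when the hot average has stopped moving.
            if stabilized and self.level > 0:
                self.level -= 1
        else:
            self._below += 1
            # Restore the original frequency after consistently cool readings.
            if self._below >= self.below_rounds:
                self.level = self.top
        return self.level


dvfs = TempAwareDvfs()
trace = [72.0, 75.0, 75.2, 75.1, 68.0, 67.5, 67.0]
print([dvfs.step(t) for t in trace])  # [4, 4, 3, 2, 2, 2, 4]
```

While the temperature is still climbing (72 to 75 °C) the frequency is left alone; only after the average settles does it step down, and it jumps straight back to the original level after three cool rounds rather than climbing one step at a time.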
Performance Evaluation (Temperature-Aware DVFS Control)
• Our DVFS control outperforms CPUSPEED in both power saving and performance
Performance Evaluation (Dynamic Hybrid Fan and DVFS Control)
• Our method effectively unifies different thermal control techniques and responds to different user control policies with minimal performance impact
Conclusion
• We classify the thermal characteristics of parallel applications and use a two-level temperature window to make our controller more effective
• We introduce a simple parameter (Pp) that lets the user specify the aggressiveness of in-band and out-of-band techniques for thermal reduction
• We integrate an out-of-band method (fan control) and an in-band method (DVFS)
• We explore an efficient fan control method