The Laboratory for Computer Architecture at Virginia (LAVA)
Kevin Skadron
University of Virginia, Department of Computer Science
Why We Care About Thermal Management...
Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
Dynamic Thermal Management
• Dynamically adjust execution to control temperature
• Avoid catastrophic failure (heat sink, fan)
• Permit the use of a less expensive thermal package
• Design for less than the worst case
• Package costs ~$1 / W above ~40 W
• Peak power as high as 130 W in 1-2 generations (SIA roadmap)
• Temperatures over 100°C
Dynamic Thermal Management
• Deal with “hot spots”
• Localized heating occurs much faster than chip-wide heating
• Chip-wide treatment is too conservative
• Prove temperature will be safely bounded
Thermal Modeling
• Want a fine-grained model of temperature
• Power dissipation: too indirect, not easy to measure in HW
“Ohm’s Law” for Temperature
• V ↔ temperature
• I ↔ power
• R ↔ thermal resistance
• C ↔ thermal capacitance
• RC ↔ time constant
$$\Delta V = \frac{I \cdot \Delta t}{C} - \frac{V \cdot \Delta t}{RC}$$
• Lets us compute stepwise changes in temperature for any granularity at which we can get P, T, R, C
• Steady state: V = IR (T = PR)
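As a brief worked derivation of where the stepwise update comes from (a sketch using the standard single-node RC circuit, with V measured relative to the reference node; the slide itself gives only the result): the current I charges the capacitor C while the resistor R drains it, and a forward-Euler step of that balance gives the update, whose fixed point is the steady state quoted above.

```latex
\begin{align}
C \frac{dV}{dt} &= I - \frac{V}{R}
  && \text{(current in minus current leaking through } R\text{)} \\
\Delta V &\approx \frac{dV}{dt}\,\Delta t
  = \frac{I \cdot \Delta t}{C} - \frac{V \cdot \Delta t}{RC}
  && \text{(forward-Euler step of length } \Delta t\text{)} \\
\Delta V = 0 \;&\Longrightarrow\; \frac{I}{C} = \frac{V}{RC}
  \;\Longrightarrow\; V = IR
  && \text{(steady state; thermally, } T = PR\text{)}
\end{align}
```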
Thermal Modeling
• Use thermal resistance and capacitance of Si
• Develop computationally efficient model based on lumped values
$$\Delta T_i = \frac{P_i \cdot \Delta t}{C_i} - \frac{T_i \cdot \Delta t}{R_i C_i}$$
• Integrate in Wattch (power/performance simulator)
• Time evolution of temperature is driven by unit activities and power dissipations on a per-cycle basis
• Detect hot spots and activate thermal response
• Typical time constant: 10-100 µs
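The lumped per-block update can be sketched in a few lines of Python. This is a minimal illustration, not the Wattch integration: the function names and the R, C, power, and time-step values are hypothetical, and the temperature is taken as the rise above a fixed reference (for example the heat sink), which is an assumption.

```python
def step_temperature(temp, power, r_th, c_th, dt):
    """Advance one block's temperature rise by one forward-Euler step.

    Implements dT = P*dt/C - T*dt/(R*C) for a single lumped block.
    """
    return temp + power * dt / c_th - temp * dt / (r_th * c_th)


def simulate(power, r_th, c_th, dt, steps, temp=0.0):
    """Apply constant power for `steps` time steps and return the final rise."""
    for _ in range(steps):
        temp = step_temperature(temp, power, r_th, c_th, dt)
    return temp


# Hypothetical block parameters (not taken from the slides).
R_TH = 2.0     # thermal resistance, K/W
C_TH = 25e-6   # thermal capacitance, J/K (so RC = 50 microseconds)
POWER = 4.0    # constant power dissipation, W
DT = 1e-6      # time step, s

after_one_rc = simulate(POWER, R_TH, C_TH, DT, steps=50)
final = simulate(POWER, R_TH, C_TH, DT, steps=1000)
print(f"rise after one RC: {after_one_rc:.3f} K, after 20 RC: {final:.3f} K")
```

After many time constants the simulated rise settles at P·R, matching the steady-state relation T = PR; after one time constant it has covered roughly 63% of that distance.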
Fetch Toggling
• Fetch toggling
• Disable fetch every N cycles
• 4/5, 2/3, 1/2, 1/3, 1/5, …
• How to set the fetch rate?
[Figure: pipeline stages IF, ID, EX, MEM, WB]
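One way to picture the toggling settings is as a per-cycle fetch-enable signal. The sketch below is illustrative only: `fetch_enabled` is a hypothetical helper, and reading each fraction (4/5, 2/3, ...) as the share of cycles in which fetch stays enabled is an assumption, since the slide does not say which way the fractions run.

```python
from fractions import Fraction


def fetch_enabled(cycle, duty):
    """Return True if instruction fetch is allowed in this cycle.

    `duty` is a Fraction giving the assumed share of cycles with fetch
    enabled. Enabled cycles are spread evenly: fetch proceeds whenever
    the running count floor(cycle * duty) increments.
    """
    n, d = duty.numerator, duty.denominator
    return ((cycle + 1) * n) // d > (cycle * n) // d


def pattern(duty, cycles):
    """Render a toggling pattern: 'F' = fetch, '.' = fetch disabled."""
    return "".join("F" if fetch_enabled(c, duty) else "." for c in range(cycles))


for duty in (Fraction(4, 5), Fraction(2, 3), Fraction(1, 2), Fraction(1, 5)):
    print(duty, pattern(duty, 15))
```

Over any whole number of periods the count of enabled cycles equals the duty fraction exactly, so a controller can pick among a small set of fractions without drift.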
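Feedback-Control of Fetch Toggling
• Formal feedback control
• PID: $m = K_c \left( e + K_I \int e \, dt + K_d \frac{de}{dt} \right)$
• Easy to compute
• toggling = f(m)
[Figure: feedback loop. The set point minus the measured T gives the error e; the controller outputs m; the actuator (I-fetch toggling) changes power P; the thermal dynamics produce temperature T; a temperature sensor feeds back the measured T.]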
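A discrete-time version of the PID law might look like the following sketch. The class name, the gains, the error sign convention (e = set point minus measured T), and the mapping `fetch_duty` standing in for f(m) are all hypothetical choices for illustration; the slides do not specify them. Anti-windup handling for the integral term is omitted for brevity.

```python
class PIDController:
    """Discrete PID: m = Kc * (e + KI * integral(e) + Kd * de/dt)."""

    def __init__(self, kc, ki, kd, setpoint):
        self.kc, self.ki, self.kd = kc, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0       # running approximation of the integral of e
        self.prev_error = None    # no derivative estimate on the first sample

    def update(self, measured, dt):
        """Consume one temperature sample and return the controller output m."""
        error = self.setpoint - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kc * (error + self.ki * self.integral + self.kd * derivative)


def fetch_duty(m):
    """Hypothetical f(m): full-rate fetch when m >= 0, throttled as m drops."""
    return min(1.0, max(0.0, 1.0 + m))


# Proportional-only example with an illustrative gain and a 107.9 set point.
ctrl = PIDController(kc=2.0, ki=0.0, kd=0.0, setpoint=107.9)
m = ctrl.update(measured=108.0, dt=1e-6)
print(f"m = {m:.3f}, fetch duty = {fetch_duty(m):.3f}")
```

With this sign convention a block running above the set point yields a negative m and therefore a reduced fetch duty, while a cool block leaves fetch at full rate.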
Other Thermal-Management Techniques
• Fetch toggling
• Fetch throttling
• Decode throttling
• Speculation control
• Frequency/voltage scaling
Per-Structure Response
• Hot spots
• Branch predictor (probed every cycle)
• Load-store queue
• L1 D-cache (for high-BW apps)
• …most major structures are a hot spot for at least one SPEC2k app
• Modified Wattch
• Sampling rate: every 1000 cycles (RC of hot spots is 10-100 µs)
• Base temp. of 100°C (SIA roadmap)
• Emergency threshold of 108°C (Yuan/Hong, SEMI-THERM ’01)
• Set point of 107.9°C
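A quick back-of-the-envelope check shows why sampling every 1000 cycles is fine-grained relative to the hot-spot time constants. The clock frequency below is an assumption (the slides do not give one), and the shortest time constant is taken as 10 µs.

```python
import math

CLOCK_HZ = 1.5e9        # assumed clock frequency, not given in the slides
SAMPLE_CYCLES = 1000    # sampling interval from the slide
RC_SHORTEST_S = 10e-6   # shortest hot-spot time constant, read as 10 microseconds

sample_interval_s = SAMPLE_CYCLES / CLOCK_HZ
samples_per_rc = RC_SHORTEST_S / sample_interval_s

# Largest fraction of the remaining gap to steady state that a first-order
# RC response can close within one sampling interval.
max_fraction_per_sample = 1.0 - math.exp(-sample_interval_s / RC_SHORTEST_S)

print(f"sampling interval: {sample_interval_s * 1e6:.2f} us")
print(f"samples per shortest RC: {samples_per_rc:.1f}")
print(f"max fractional change per sample: {max_fraction_per_sample:.1%}")
```

Under these assumptions the controller sees well over ten samples per time constant, so temperature moves only a few percent of the way toward its steady state between samples.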
Thermal Modeling: Where to go from here? (i.e., lots of research questions)
• Floor-planning issues and granularity of lumped R/C values
• Thermal coupling among blocks
• Response lag in temperature sensors
• Validation techniques
• Visualization
• How to deal with large time scales?
Thermal Management: Where to go from here? (i.e., lots more research questions)
• New mechanisms
• Characterize benchmarks
• When to use frequency/voltage scaling
• Faster HW techniques for sensing temperature changes
• Robust response despite sensor lag
• Hot spots
• Temperature effects on leakage current
• Joint control of temp., power, and performance
Thermal Management: Where to go from here? (i.e., lots more research questions)
• New mechanisms
• When to use clock scaling
• Robust response despite sensor lag
• Temperature effects on leakage current
• Joint control of temperature, power, and performance
Summary
• New tools for thermal management
• Models
• Mechanisms
Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
Performance Loss
• Performance loss reduced by 65%