Server-level Power Control Ming Chen
Motivations(1) • Clusters of hundreds, even thousands, of servers; • Occupying one room of a building, or even a whole building; • Servers racked in cabinets at high density; • Cabinets arranged in rows and columns to fill a whole room.
Motivations(2) • Power and energy consumption have become key concerns in data centers; • Solutions: • Peak power management to decrease the cost of cooling and power delivery systems; • Power-efficient design to improve performance per watt. From Spring 2005, Data Center User's Group Conference, The Adaptive Data Center: Managing Dynamic Technologies
Outline • Power management for CPU • Server-level Power Control (paper 1) • Formal Control Techniques for Power-Performance Management (paper 2) • Comparison between the two papers
Why CPU Power Management? • The CPU is the most-used actuator in power management; • It accounts for the majority of a server's total power consumption; • More than 60% of the total power consumption. • Well-documented interfaces for adjusting power scaling: • P-states; • T-states.
CPU Power Knob (1)—P-states • DVFS: scale both voltage and frequency; • Vendor implementations: PowerNow!, SpeedStep, Cool'n'Quiet. [Figure: power p vs. frequency f]
CPU Power Knob (2)—T-states • Clock throttling: gate the clock for a fraction (the duty cycle) of each period, at fixed voltage. [Figure: power p vs. duty cycle]
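The contrast between the two knobs follows from the textbook dynamic CMOS power model P = C·V²·f (an idealization added here for illustration, not taken from the slides): P-states cut both voltage and frequency, while T-states only gate the clock at fixed voltage. A minimal sketch:

```python
def dynamic_power(c_eff, v, f):
    """Idealized dynamic CMOS power: P = C_eff * V^2 * f."""
    return c_eff * v * v * f

# P-state: halve frequency and scale voltage down proportionally
# -> power drops to 1/8 of nominal.
p_dvfs = dynamic_power(1.0, 0.5, 0.5)

# T-state: halve the duty cycle at fixed voltage
# -> average power only drops to 1/2 of nominal.
p_throttle = 0.5 * dynamic_power(1.0, 1.0, 1.0)
```

This is why DVFS is the more power-efficient knob per unit of performance lost, while clock throttling offers finer-grained, faster actuation.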
Server-level Power Control Charles Lefurgy, Xiaorui Wang and Malcolm Ware IBM Research, Austin University of Tennessee, Knoxville
Motivations • Workload varies a lot: very few, worst-case episodes of power consumption force operators to over-provision the cooling and power delivery systems. • Goal: • Manage peak power to avoid unnecessary over-provisioning of cooling and power delivery systems.
Control Options • Open-loop: • No measurement of power; • Choose a fixed speed for a given power budget; • Based on the most power-hungry workload. • Ad-hoc control: • Measure power and compare it with the power budget; • Raise/lower the performance state by one level based on the comparison.
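The ad-hoc baseline can be sketched as a simple bang-bang rule (the function name and the "higher level = more throttling" convention are illustrative assumptions, not from the paper):

```python
def adhoc_step(level, power, budget, max_level):
    """One step of the ad-hoc baseline: move one throttling level
    up or down depending on which side of the budget we are on."""
    if power > budget and level < max_level:
        return level + 1   # over budget: throttle harder
    if power < budget and level > 0:
        return level - 1   # under budget: relax throttling
    return level
```

Because it moves only one discrete level per step regardless of the error size, this scheme reacts slowly and oscillates around the budget, which is what motivates the control-theoretic design.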
Contributions • The first paper to manage the peak power of a single server with a closed-loop control system; • A feedback controller based on control theory; • Detailed derivation and analysis of stability and accuracy; • Empirical results on a physical hardware system; • Better application performance than previous methods.
Platform • IBM BladeCenter HS20 blade server with Intel Xeon processors; • Power constraint: 250 W; • The power supply must not be overloaded for more than 1 second.
System Modeling(1) • Power changes immediately (within 1 ms) as the performance state changes, so the plant can be modeled as a static function of throttling level; • The model slope A is obtained by curve fitting; since the slope varies across workloads, which A should be chosen?
Controller Design(1) • Controller: P controller; • Plant: the linear power model from System Modeling; • First-order delta-sigma modulator: • Maps the floating-point output of the controller to a series of discrete throttling levels; • For example: 6.2 is discretized as 6, 6, 6, 6, 7, 6, 6, 6, 6, 7.
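A first-order delta-sigma modulator can be sketched as follows (a generic implementation, not the paper's code): it carries the quantization error forward so that the running average of the emitted integer levels tracks the controller's floating-point output.

```python
def delta_sigma(level, n):
    """First-order delta-sigma modulator: quantize a float throttling
    level into n integer levels whose average tracks the input."""
    out = []
    err = 0.0                 # accumulated quantization error
    for _ in range(n):
        desired = level + err
        q = round(desired)    # nearest discrete throttling level
        err = desired - q     # carry the residual to the next step
        out.append(q)
    return out

# A continuous level of 6.2 becomes mostly 6s with an occasional 7,
# so the average over 10 steps is exactly 6.2.
seq = delta_sigma(6.2, 10)
```

The exact phase of the 7s may differ from the slide's example sequence, but the time-average is the same.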
Controller Design(2) • Minimal prototype: assumes the model slope is exact; • But the slope varies: • Different workloads on the same server have different slopes; • The same workload on different servers has different slopes. • Real model: must tolerate this slope variation.
Performance Analysis • Stability: 0 < g < 2; • Steady-state error; • Settling time.
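The stability condition can be checked numerically with a toy closed-loop simulation. Assuming a linear power model of the paper's form with hypothetical constants (300 W unthrottled peak, each throttling level removing a few watts), the error contracts by a factor (1 − g·A_true/A_model) per step, so the loop converges iff the effective gain g·A_true/A_model lies in (0, 2):

```python
def simulate(a_true, a_model, g, p_set, steps=60):
    """Toy closed-loop power capping (hypothetical constants).
    Plant (assumed linear): p = 300 - a_true * t, for throttle level t.
    P controller: t += (g / a_model) * (p - p_set).
    Stable iff 0 < g * a_true / a_model < 2."""
    t = 0.0
    for _ in range(steps):
        p = 300.0 - a_true * t            # measured power
        t += (g / a_model) * (p - p_set)  # throttle harder if over budget
    return 300.0 - a_true * t             # final power

# A 1.5x slope mismatch (effective gain 1.5) still converges to the
# 250 W set point; a 2.5x mismatch pushes the gain past 2 and diverges.
```

This is exactly why the paper's slope-variation analysis matters: the controller tolerates model error only up to the point where the effective loop gain reaches 2.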
System Architecture • Power monitor: • Hardware that measures power at 1000 samples/second; • Firmware in the service processor averages the power measurements. • Controller: • Computes the ideal (floating-point) throttling level. • Actuator: • Maps the floating-point level to discrete throttling levels and writes the CPU register to throttle the clock.
Comparison with Ad-hoc Controller(2) • Set points range from 180 W to 260 W in 1 W increments; • P4MAX is used; • The average of three runs is plotted; • The P controller achieves a precision of 0.1 W; • The ad-hoc controller needs a safety margin of 6.1 W.
Comparison of Three Controllers • Open-loop set point: • Highest setting at which P4MAX does not violate the power budget; • P controller set point: • Power budget reduced by the 2% measurement error; • Ad-hoc controller set point: • 6.1 W below the P controller set point.
Conclusion • A control-theoretic peak power management solution for servers is presented; • Better control performance and application performance than two baselines; • Stability, settling time, and zero steady-state error are analyzed based on control theory.
Critiques • Peak power management vs. performance per watt; • Clock throttling + DVFS: what is the combined solution? • High-precision power measurement hardware is required, which is not available to everyone.
Formal Control Techniques for Power-Performance Management Qiang Wu, Philo Juang, Margaret Martonosi, Li-Shiuan Peh, Douglas W.Clark Princeton University
Background: MCD • [Figure: pipeline split into clock domains — Ifetch/Decode (f1), INT exec (f2), FP exec (f3), Ld/St exec (f4)] • Each function block operates with an independent clock; • Advantages: • Less clock distribution overhead; • Less clock skew; • Less power consumption; • DVFS flexibility. • Queue structures between domains decouple them efficiently.
Basic Idea • [Figure: queue occupancy q feeds a DVFS controller that sets the domain frequency f] • Adapt frequency to workload changes: • Capability > demand: energy wasted; capability < demand: degraded performance. • Queue occupancy: • Gives clues about capability and demand; • Serves as the feedback signal to control the domain frequency.
System Modeling(1) • [Figure: two clock domains at frequencies f1 and f2 connected by a queue q; the upstream domain's demand sets the arrival rate, the downstream domain's frequency sets the service rate]
System Modeling(2) • λt and μt (arrival and service rates): independent and stationary random processes; • Each control period T includes N sampling periods Δt; • q′k is the controlled variable.
System Linearization • f is the manipulated variable, but it enters the model nonlinearly; • It is generally hard to design an effective controller for a nonlinear system; • Fortunately, the nonlinear part of this system can be separated out.
Controller Design • PI controller: • Proportional gain Kp; • Integral gain Ki.
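The PI loop on queue occupancy can be sketched with a fluid queue model (the gains, rates, and velocity-form update below are illustrative assumptions, not the paper's values): the integral term drives the occupancy error to zero, so the frequency settles at exactly the arrival rate.

```python
def simulate_queue(q_ref, lam, kp, ki, steps=400):
    """Fluid-model sketch: each period the queue gains `lam` entries and
    drains `f` entries (service rate taken proportional to frequency f).
    Velocity-form PI: df = kp * (e - e_prev) + ki * e, with e = q - q_ref,
    so a growing backlog raises the domain frequency."""
    q, f = 0.0, 0.0
    e_prev = q - q_ref
    for _ in range(steps):
        q = max(0.0, q + lam - f)   # queue occupancy update
        e = q - q_ref
        f = max(0.0, f + kp * (e - e_prev) + ki * e)
        e_prev = e
    return q, f

# With kp = 0.2, ki = 0.05 the loop settles at q ~ q_ref and f ~ lam.
```

The steady state illustrates the slide's point: q_ref only shifts where the queue sits (the energy-performance lever), while the frequency always converges to match demand.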
Energy-Performance Tradeoff • How aggressively to save energy? Or to preserve performance? • A simple lever — the qref position: • Increase qref: more aggressive energy saving; • Decrease qref: value performance more. • Software/hardware cooperation: • Software makes the overall tradeoff decisions; • Hardware implements the details of speed adaptation.
Experiments(1)–Illustrative Exp • Benchmark Epic_Decode: [Figure: queue entries and frequency settings over time]
Experimental Results • Simulator: SimpleScalar + Wattch power estimation extension + MCD processor extension; • Benchmarks: 18 benchmarks.
Extension for CMPs (1) • Using task queues; • Dependency among parallel application threads: • Parallel sections require all threads to finish before moving on. • Two valid assumptions: • The tile with the highest queue occupancy is on the critical path; • The tile on the critical path should run at full speed. • What is the solution?
Extension for CMPs (2)–Dist_PID • qref is the performance lever; • Each tile estimates its own qtarget; • The tiles exchange their qtarget values; • The tile with the highest qtarget is identified as the critical path; • All other tiles set their qref to that highest qtarget.
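The reference-exchange step above can be sketched in a few lines (the function name is illustrative): each tile broadcasts its qtarget estimate, the maximum marks the critical path, and every tile adopts it as its local qref.

```python
def exchange_qref(q_targets):
    """Dist_PID reference exchange (sketch): the tile with the highest
    q_target is assumed to sit on the critical path, so all tiles adopt
    that maximum as their local q_ref; each tile's local PI loop then
    slows non-critical tiles accordingly (a higher q_ref means more
    aggressive energy saving, per the energy-performance lever)."""
    critical = max(q_targets)
    return [critical for _ in q_targets]
```

Because only one scalar per tile is exchanged, the coordination overhead is small, though the slides' critique about the exchange delay still applies.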
Experiment for Dist_PID • Simulator: modified XTREM (a validated SimpleScalar-based ARM simulator); • Dist_PID has lower EDP (energy-delay product) than Local_PID, i.e., a better energy-performance tradeoff.
Conclusion • A control-based solution for the power-performance tradeoff in MCD processors and CMPs is presented; • An analytical queue model between clock domains is derived; • Building on the PI controller for MCDs, Dist_PID is introduced for CMPs; • Simulation results verify the performance of the controllers.
Critiques • What are the effects of λ on the stability and accuracy of the controller? • Simulation results are not convincing enough; • Dist_PID is only compared with Local_PID — how about other solutions for CMPs? • What is the overhead or delay of exchanging qtarget in Dist_PID?