Latency-aware and performance-preserving Power Capping Arka Bhattacharya, David Culler (UCB); Aman Kansal, Sriram Sankar, Sriram Govindan (Microsoft)
What do I mean by power capping? • Restrict server power consumption to a specific power budget, through manipulation of load or scaling of processor frequency.
Summary of the talk: • Data centers need power capping. • Any power capping technique should be fast, and should ensure graceful degradation of performance. • Related work has proposed power capping through either frequency scaling or processor utilization capping. • In an open system, using either of these knobs alone can lead to a cascading failure. • Hence, to maintain a stable system one needs to maintain the desired power level through admission control, and implement a frequency-scaling governor for safety.
Data Center Cost Analysis • James Hamilton's 2010 figures for a 50k-server, 8 MW facility. • Power distribution and cooling account for roughly 20% of the total data center budget. – James Hamilton
Why do power capping? • To under-provision UPS batteries/generators. • According to the previous figure, the annual cost of power distribution and cooling equipment is > $7M for about 50k servers. • Current UPS provisioning is mostly based on worst-case faceplate or spec-power ratings.
[Figure: PDF of the power consumption of a colo containing an online application, marking peak power, the current (worst-case) UPS provisioning level, the headroom between them, and a more aggressive UPS provisioning level.]
Other reasons for doing power capping • Ensure circuit protection. • Reclaim the UPS recharge budget. • Shave off data center peak power usage (for data centers paying peak-pricing rates). • Differentiate service among data center apps. • React to changes in the power supply from the utility.
Ramp rate of power spikes [Figure: power consumption vs. time, marking two consecutive samples S1 and S2 and the 95th-percentile level.] Power spike = PowerConsumption(S2) − PowerConsumption(S1); Ramp rate = power spike per sampling period.
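To make the definitions concrete, here is a minimal sketch (in Python) of computing power spikes and the ramp rate from a series of power samples. The sample values are hypothetical, not measurements from the talk; the 30 s period matches the sampling rate on the next slide.

    # Hedged sketch: power spikes and ramp rate per the definitions above.
    # Sample values are hypothetical; the 30 s sampling period is from the
    # next slide.
    samples_watts = [210.0, 215.0, 290.0, 310.0]
    sampling_period_s = 30.0

    # Power spike between consecutive samples S1 and S2:
    spikes = [s2 - s1 for s1, s2 in zip(samples_watts, samples_watts[1:])]

    # Ramp rate = power spike per sampling period (here normalized to W/s):
    ramp_rates = [spike / sampling_period_s for spike in spikes]

    print(max(spikes), max(ramp_rates))  # worst-case spike (W) and ramp (W/s)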
Power spikes in the case of an under-provisioned UPS [Figure: observed power spikes; sampling rate = 30 s.]
For circuit protection: Latency analysis of power-capping methods
Prior work in feedback-based power capping, e.g.: • Power budgeting for virtualized data centers – Lim et al. (ATC, 2011) • Coordinated power control and performance management for virtualized server clusters – Wang et al. (IEEE TPDS, 2010) • SHIP: Scalable hierarchical power control for large-scale data centers (PACT, 2009) • Dynamic voltage scaling in multitier web servers with end-to-end delay control – Horvath et al. (IEEE Trans. Comput., 2007)
Worst-case power rise in servers • Power rise in an Intel Xeon L5520 server: fastest observed power rise (from min to max): 100 ms. • Power rise in an Intel Xeon L5640 server: fastest observed power rise (from min to max): 200 ms.
Methods to decrease server power • DVFS (Dynamic Voltage and Frequency Scaling): reduces the frequency and voltage the processor runs at. • Processor utilization capping: imposes a certain number of idle cycles on the CPU, while it runs at the same frequency. • Admission control: reduces the amount of network traffic that the server serves.
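As an illustration of the DVFS knob, here is a hedged sketch of one way to cap core frequency on Linux through the cpufreq sysfs interface. It is not the actuation code used in the talk; it requires root and assumes the platform exposes scaling_max_freq.

    # Hedged sketch: DVFS actuation via Linux cpufreq sysfs (requires root;
    # available frequencies are platform-specific -- illustrative only).
    import glob

    def cap_frequency_khz(target_khz):
        """Cap each core's maximum frequency; the governor then scales
        voltage and frequency down, reducing power."""
        for path in glob.glob(
                "/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq"):
            with open(path, "w") as f:
                f.write(str(target_khz))

    # Example (value must be a frequency the CPU supports; see
    # scaling_available_frequencies):
    # cap_frequency_khz(1_600_000)   # cap all cores at 1.6 GHz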
Timeline of events, with measured latencies:
1. Central controller gives the actuation command.
2. Command reaches the destination server (~20 ms).
3. Command received by the agent/daemon (< 1 ms).
4. Settings changed in hardware and the function call returns (< 1 ms; < 40–60 ms in the current implementation, which uses user-level code).
5. Power decreases (frequency scaling: 200–350 ms; processor capping: ~2 s; admission control: > 2 s*).
* Still to be measured accurately.
Take-away 1: If UPS capacity < peak power of the IT equipment, one needs to implement a non-feedback governor (based on DVFS, processor capping, or hardware capping).
Why do we need network admission control? • In an open system, if a frequency-scaled server is stuck with more work than it can handle: • server latency goes up (because of filled queues); • requests that get dropped are retried by the clients' TCP stacks; • the total load on the system keeps increasing => cascading failure. • In a closed system: • an implicit admission control takes place, because new requests are not issued until old requests are served; • latency increases, but the system does not spiral into cascading failure.
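A toy fluid model makes the open-system failure mode concrete: with a fixed external arrival rate, any service rate below it makes the backlog (and hence latency) grow without bound, while admitting only what the scaled server can serve restores stability. All rates below are illustrative assumptions, not measurements from the experiments in this talk.

    # Hedged fluid approximation of the open-system instability
    # (illustrative rates only).
    arrival_rate = 100.0          # requests/s, fixed by external clients
    service_rate_full = 120.0     # requests/s at full frequency
    service_rate_scaled = 80.0    # requests/s after frequency scaling

    def backlog_after(seconds, service_rate, admitted_rate=None):
        """Backlog growth = (admitted arrivals - departures), floored at 0."""
        admitted = arrival_rate if admitted_rate is None else admitted_rate
        return max(0.0, admitted - service_rate) * seconds

    print(backlog_after(10, service_rate_full))    # 0.0   -> stable
    print(backlog_after(10, service_rate_scaled))  # 200.0 -> unbounded growth
    # With admission control capping admitted traffic at the scaled rate:
    print(backlog_after(10, service_rate_scaled, admitted_rate=80.0))  # 0.0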
Experiment setup to check frequency-scaling effects on closed and open systems • Open system => a Xeon X5550 server running the Wikipedia benchmark on an Apache web server, on Linux. • Closed system => the StockTrader application on Windows Server 2008, on a Xeon L5520. • In both systems, load was generated by 3–5 external servers.
Frequency scaling: demarcating stable and unstable regions for each frequency
Unstable open-loop system due to frequency scaling [Figure: system trace going unstable; annotation marks the point where frequency scaling is applied.]
Open-system power capping [Figure: operating regions, marking where power reduction is required and where admission control is required.]
Take-away 2: In an open system, while doing power capping one must perform admission control to maintain stability.
Capping a closed online application • Experiment: generate constant load for the server, lower the processor cap gradually, and observe the effects on latency. (A sketch of one way to impose such a cap follows below.)
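One concrete way to impose a processor utilization cap on Linux is the cgroup v2 CPU bandwidth controller, which enforces idle cycles at unchanged frequency. This is a hedged sketch, not the mechanism used in the Windows experiment above; it requires root, and the cgroup "capped" (with the cpu controller enabled) is an assumption.

    # Hedged sketch: processor utilization capping via Linux cgroup v2.
    # The kernel grants the group only `quota` microseconds of CPU time per
    # `period` microseconds, forcing idle cycles at unchanged frequency.
    CGROUP = "/sys/fs/cgroup/capped"  # assumed to exist

    def set_cpu_cap(percent, period_us=100_000):
        quota_us = period_us * percent // 100
        with open(CGROUP + "/cpu.max", "w") as f:
            f.write(f"{quota_us} {period_us}")

    # As in the experiment above: lower the cap gradually, measure latency.
    # for pct in range(100, 30, -10):
    #     set_cpu_cap(pct)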
Take-away 3: In a closed system, doing admission control along with frequency scaling leads to almost the same throughput, but with better latency.
Admission control • Assumption: a known relation between network traffic (T) and power consumption (P). • Problem statement: reduce the traffic of an application from its current state T1 to T2, such that power goes from its current state P1 to P2. • Challenges: • traffic changes every instant; • a request from a user may spawn multiple flows; • how to do it in an app-agnostic way?
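As a sketch of how the assumed traffic-to-power relation could be used, the snippet below assumes a linear model P = P_idle + k·T purely for illustration (P_idle and k are hypothetical fitted constants, not values from the talk) and derives the fraction of current traffic to admit so that power moves from P1 to P2.

    # Hedged sketch: admit fraction from an ASSUMED linear power model
    # P = P_IDLE + K * T. Constants are hypothetical.
    P_IDLE = 120.0   # watts at zero traffic (assumed)
    K = 0.5          # watts per (request/s) (assumed)

    def target_traffic(p2):
        """Traffic level T2 expected to bring power down to budget P2."""
        return max(0.0, (p2 - P_IDLE) / K)

    def admit_fraction(t1, p2):
        """Fraction of current traffic T1 to admit so power falls to P2."""
        return min(1.0, target_traffic(p2) / t1)

    # Example: at T1 = 400 req/s, a 250 W budget means admitting
    # (250 - 120) / 0.5 = 260 req/s, i.e. a fraction of 0.65.
    print(admit_fraction(400.0, 250.0))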
Admission control trade-offs at each layer • Layer 2: • Pro: simple implementation. • Con: all connections get hurt equally.
Admission control, continued • Layer 3: • Pros: cuts off entire requests, spanning multiple flows; easy to configure in a firewall; does not need app-level compliance. • Con: coarse admission control due to NATs.
Admission control, continued • Layer 4: • Pro: can do finer-grained admission control than IP. • Con: a webpage may be served over multiple flows, and different flows of the same request might get different service. • Layer 7: • Pros: has the most insight into the app's workings; can do fine-grained admission control. • Con: the data center needs app compliance / load-balancer compliance.
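To make the Layer-7 option concrete, here is a minimal hedged sketch of app-level admission control as a WSGI middleware that probabilistically sheds whole requests with a 503. The admit probability would be set by a power controller such as the one sketched above; all names here are illustrative, not the talk's implementation.

    # Hedged Layer-7 sketch: probabilistic request shedding in a WSGI
    # middleware. Rejecting at the app layer drops whole requests, the
    # fine-grained control this slide describes.
    import random

    class AdmissionControl:
        def __init__(self, app, admit_probability=1.0):
            self.app = app
            self.admit_probability = admit_probability  # set by controller

        def __call__(self, environ, start_response):
            if random.random() > self.admit_probability:
                start_response("503 Service Unavailable",
                               [("Content-Type", "text/plain"),
                                ("Retry-After", "1")])
                return [b"request shed by power-capping admission control\n"]
            return self.app(environ, start_response)

    # Usage: wrap any WSGI app, e.g. app = AdmissionControl(app, 0.65)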
Future work • Evaluate the trade-offs of doing network admission control at the different layers. • Devise and implement algorithms to do admission control at each layer.