Power Cost Reduction in Distributed Data Centers

Power Cost Reduction in Distributed Data Centers Yuan Yao University of Southern California Joint work with Longbo Huang, Abhishek Sharma, Leana Golubchik and Michael Neely • IBM Student Workshop for Frontiers of Cloud Computing 2011 • Paper to appear in INFOCOM 2012

Background and motivation • Data centers are growing in number and size… • Number of servers: Google (~1M) • Data centers built in multiple locations • IBM owns and operates hundreds of data centers worldwide • …and in power cost! • Google spends ~$100M/year on power • Goal: reduce power cost while considering QoS

Existing Approaches • Power efficient hardware design • System design/Resource management • Use existing infrastructure • Exploit options in routing and resource management of data center

Existing Approaches • Power cost reduction through algorithm design • Server level: power-speed scaling [Wierman09] • Data center level: rightsizing [Gandhi10, Lin11] • Inter data center level: geographical load balancing [Qureshi09, Liu11] • [Figure labels: job, $2/kWh, $5/kWh]

Our Approach: SAVE • We provide a framework that allows us to exploit options at all of these levels • (Server level, data center level, inter data center level) + temporal volatility of power prices = StochAstic power redUction schEme (SAVE) • [Figure labels: job arrived, job served]

Our Model: data center and workload • M geographically distributed data centers • Each data center contains a front end server and a back end cluster • Workloads Ai(t) (i.i.d.) arrive at the front end servers and are routed to one of the back end clusters (routing variable µji(t))

Our Model: server operation and cost • Back end cluster of data center i contains Ni servers • Ni(t) servers active • Service rate of active servers: bi(t) ∈ [0, bmax] • Power price at data center i: pi(t) (i.i.d.) • Power usage at data center i: • Power cost at data center i:

Our Model: two time scale • The system we model operates on two time scales • At t=kT (every T slots), change the number of active servers Nj(t) • In every time slot, change the service rate bj(t)
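The two time scales above can be pictured as a simple control loop. This is only a sketch: `run_two_time_scale`, `choose_servers` and `choose_rate` are hypothetical names introduced for illustration, and the decision rules passed in are placeholders, not the rules used by SAVE.

```python
def run_two_time_scale(num_slots, T, choose_servers, choose_rate):
    """Apply a slow decision every T slots and a fast decision every slot.

    choose_servers(t) -> number of active servers (hypothetical callback)
    choose_rate(t, n_active) -> service rate (hypothetical callback)
    """
    history = []
    n_active = None
    for t in range(num_slots):
        if t % T == 0:
            # Slow time scale: only at t = kT may the active-server count change.
            n_active = choose_servers(t)
        # Fast time scale: the service rate may change in every slot.
        rate = choose_rate(t, n_active)
        history.append((t, n_active, rate))
    return history


# Placeholder rules: the server count follows t, the rate is constant.
history = run_two_time_scale(
    num_slots=6, T=3,
    choose_servers=lambda t: 10 + t,
    choose_rate=lambda t, n: 0.5,
)
```

With T=3 the server count is held fixed within each frame of three slots, while the rate callback is consulted in all six slots.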

Our Model: summary • Input: power prices pi(t), job arrivals Ai(t) • Two time scale control action: • Queue evolution: • Objective: Minimize the time average power cost subject to all constraints on Π, and queue stability
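The queue evolution equation itself is not reproduced in this transcript. As a rough illustration only, the sketch below assumes the standard discrete-time queue update (serve what is available, then add new arrivals); `update_queue` is a hypothetical helper and may differ from the exact dynamics in the paper.

```python
def update_queue(backlog, served, arrived):
    """One slot of an assumed generic queue update.

    The backlog cannot go negative, so service beyond the current
    backlog is wasted; arrivals in this slot are added afterwards.
    """
    return max(backlog - served, 0.0) + arrived


# Example: 5 jobs queued, 3 served, 2 new arrivals.
next_backlog = update_queue(5.0, 3.0, 2.0)
```

Queue stability, as required by the objective, means that backlogs updated this way stay bounded on average over time.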

SAVE: intuitions • SAVE operates at both front end and back end • Front end routing: • When , choose μij(t)>0 • Back end server management: • Choose small Nj(t) and bj(t) to reduce the power cost fj(t) • When is large, choose large Nj(t) and bj(t) to stabilize the queue

SAVE: how it works • Front end routing: • In every time slot t, choose μij(t) to maximize • Back end server management: Choose V>0 • At time slot t=kT, choose Nj(t) to minimize • In every time slot τ, choose bj(τ) to minimize • Serve jobs and update queue sizes
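The expressions being maximized and minimized are not reproduced in this transcript. The sketch below is therefore not SAVE's actual rule; it assumes a generic drift-plus-penalty objective of the kind commonly used in Lyapunov optimization, V * (power cost) - (queue backlog) * (service), and a hypothetical per-server power function `power(b)`. The names `choose_service_rate`, `power` and `grid` are illustrative assumptions.

```python
def choose_service_rate(queue, n_active, price, V, b_max, power, grid=101):
    """Grid-search b in [0, b_max] for an ASSUMED drift-plus-penalty objective.

    objective(b) = V * price * n_active * power(b) - queue * n_active * b
    Large queue favors large b; large V favors low power cost.
    """
    best_b, best_val = 0.0, float("inf")
    for k in range(grid):
        b = b_max * k / (grid - 1)
        val = V * price * n_active * power(b) - queue * n_active * b
        if val < best_val:
            best_b, best_val = b, val
    return best_b


cubic = lambda b: b ** 3  # hypothetical power-speed curve, not from the slides

b_small_v = choose_service_rate(queue=3.0, n_active=1, price=1.0, V=1.0,
                                b_max=1.0, power=cubic)
b_large_v = choose_service_rate(queue=3.0, n_active=1, price=1.0, V=4.0,
                                b_max=1.0, power=cubic)
```

Under these assumptions a larger V picks a lower service rate for the same backlog, which is the cost versus delay knob the slides describe.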

SAVE: performance • Theorem on performance of our approach: • Delay of SAVE ≤ O(V) • Power cost of SAVE ≤ Power cost of OPTIMAL + O(1/V) • OPTIMAL can be any scheme that stabilizes the queues • V controls the trade-off between average queue size (delay) and average power cost • SAVE is suited to delay tolerant workloads

Experimental Setup • We simulate data centers at 7 locations • Real world power prices • Poisson arrivals • We use synthetic workloads that mimic MapReduce jobs • Power Cost • Power consumption of active servers • Power price • Power consumption of servers in sleep • Power usage effectiveness
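For readers who want to reproduce the arrival process, Poisson arrivals per time slot can be generated with the standard library alone. This sketch uses Knuth's multiplication method; the rate of 5 jobs per slot and the seed are hypothetical values, not the parameters used in the experiments.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(0)          # fixed seed so runs are repeatable
arrivals = [poisson_sample(5.0, rng) for _ in range(10000)]
mean_arrivals = sum(arrivals) / len(arrivals)
```

Knuth's method is simple but slows down for large rates, so a library sampler would be preferable for heavy workloads.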

Experimental Setup: Heuristics for comparison • Local Computation • Send jobs to local back end • Load Balancing • Evenly split jobs across all back ends • Low Price (similar to [Qureshi09]) • Send more jobs to places with low power prices • All servers are activated • Instant On/Off • Routing is the same as Load Balancing • Data center i tunes Ni(t) and bi(t) every time slot to minimize its power cost • No additional cost for activating servers or putting them to sleep • Unrealistic
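The routing side of the first three heuristics can be sketched as functions that return a matrix whose entry [i][j] is the workload sent from front end i to back end j. The exact Low Price rule is not given on the slide; weighting back ends in inverse proportion to their current price is an assumption made here for illustration, and all function names are hypothetical.

```python
def local_computation(arrivals):
    """Each front end sends all of its jobs to its own back end."""
    m = len(arrivals)
    return [[arrivals[i] if i == j else 0.0 for j in range(m)]
            for i in range(m)]


def load_balancing(arrivals):
    """Each front end splits its jobs evenly across all back ends."""
    m = len(arrivals)
    return [[arrivals[i] / m for _ in range(m)] for i in range(m)]


def low_price(arrivals, prices):
    """ASSUMED rule: split in inverse proportion to each back end's price."""
    inv = [1.0 / p for p in prices]
    total = sum(inv)
    return [[a * w / total for w in inv] for a in arrivals]


arrivals = [6.0, 3.0]   # jobs arriving at front ends 0 and 1
prices = [1.0, 2.0]     # hypothetical power prices at back ends 0 and 1
routes_local = local_computation(arrivals)
routes_even = load_balancing(arrivals)
routes_cheap = low_price(arrivals, prices)
```

In every case each row sums to that front end's arrivals, so no work is dropped; only the destination mix changes.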

Experimental Results • As V increases, power cost reduction grows from ~0.1% to ~18% • SAVE is more effective for delay tolerant workloads • [Figure: relative power cost reduction as compared to Local Computation]
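The metric reported here is a relative reduction against the Local Computation baseline. A minimal sketch of that calculation follows; the function name and the cost figures are illustrative placeholders, not measured values from the experiments.

```python
def relative_cost_reduction(baseline_cost, scheme_cost):
    """Fraction of the baseline power cost saved by the scheme."""
    return (baseline_cost - scheme_cost) / baseline_cost


# Placeholder numbers: a scheme costing 82 against a baseline of 100.
reduction = relative_cost_reduction(100.0, 82.0)
```

A value of 0.18 corresponds to the ~18% end of the range quoted above, and a value near 0.001 to the ~0.1% end.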

Experimental Results: Power Usage • We record the actual power usage (not cost) of all schemes in our experiments • Our approach also reduces power usage

Summary • We propose a two time scale, non work conserving control algorithm aimed at reducing power cost in distributed data centers. • Our work facilitates an explicit power cost vs. delay trade-off • We derive analytical bounds on the time average power cost and service delay achieved by our algorithm • Through simulations we show that our approach can reduce the power cost by as much as 18%, and that it also reduces power usage.

Future work • Other problems on power reduction in data centers • Scheduling algorithms to save power • Delay sensitive workloads • Virtualized environment, when migration is available

Questions? • Please check out our paper: • "Data Centers Power Reduction: A two Time Scale Approach for Delay Tolerant Workloads", to appear in INFOCOM 2012 • Contact info: yuanyao@usc.edu http://www-scf.usc.edu/~yuanyao/
