Green Computing: Energy Consumption Optimized Service Hosting Walter Binder University of Lugano, Switzerland Niranjan Suri IHMC, Florida, USA
Motivation • Data centers are becoming ubiquitous • Large installations of computer systems • Providing critical services • Data centers are big power consumers • Continuously operating computers, regardless of the load • Cooling 2009-01-26
Reducing Power Consumption • Green Grid consortium advocates data center design and management to improve energy efficiency • Right-sizing data centers at design time • Energy-efficient cooling • Virtualization (multiple servers on same physical machine) • Processor power saving (e.g., clock rate depending on load) • Powering down unused machines • Computers with dedicated roles (e.g., computers performing backups)
Our Approach • Load on machines varies over time • Turn off machines that are not needed and restart them when the load requires it • Problems • Load is distributed over multiple machines • Load reduction is typically also distributed across multiple machines • Need to consolidate load on a subset of machines in order to free up machines that can be turned off • Goal: Minimum number of machines running • Constraint: QoS must be ensured • Service-Level Agreements (SLAs) must not be violated
Example (figure not reproduced)
Service Types • Hosting environment may offer multiple service types • Service type consists of • Service interface • SLA defining QoS parameters • SLA parameters specified according to a common ontology • WS-Agreement, WSLA, SLAng, etc. • Here: Single QoS parameter: Response time
Stateless versus Stateful Services • Stateless service: • Requests are independent • After completing all pending requests, a stateless service may be stopped • Stateful service: • Requests in one session may depend on prior requests in the same session • Sessions may be explicitly terminated by clients, or expire after some period of inactivity • After termination of all sessions, a stateful service may be stopped
Hosting Environment (1) • Dedicated machines for three different purposes: • File servers • Provide all data sources • Compute servers • Execute service requests • Dispatchers • Receive service requests and choose compute servers to handle them • Decide on shutdown and restart of compute servers • Dispatchers and file servers are continuously running • Only idle compute servers may be shut down
Hosting Environment (2) • [Figure: clients send requests to the dispatcher; the dispatcher dispatches them to compute servers; compute servers access data on the file servers]
Hosting Environment (3) • Heterogeneous environment • Machines have different computing resources • Dynamically changing environment • New machines may be added • Cores may fail • Compute servers may host any number of service types, and a service type may be hosted by any number of compute servers • Compute servers are ranked according to energy efficiency
Node Manager • Each compute server runs a Node Manager component • Monitors idle time and average response time for each service type • Communicates measurements to dispatcher • Handles server shutdown upon request from dispatcher • Notifies dispatcher upon startup
Shutdown of Compute Servers • Dispatcher notifies Node Manager on compute server to prepare shutdown • No further service requests are dispatched to the compute server • Node Manager waits for • Completion of all previously accepted requests • Termination of all active sessions • Alternative: Migration of sessions
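The shutdown preparation above can be pictured as a small piece of bookkeeping on the Node Manager. The following is a minimal sketch, not the authors' implementation: the class `NodeManagerState` and all of its method names are hypothetical, and it assumes that requests belonging to an already open session are still served while the server drains.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManagerState:
    """Hypothetical drain bookkeeping for one compute server."""
    draining: bool = False          # set once the dispatcher asks to prepare shutdown
    pending_requests: int = 0       # accepted but not yet completed requests
    active_sessions: set = field(default_factory=set)  # open stateful sessions

    def prepare_shutdown(self) -> None:
        """Called when the dispatcher requests shutdown preparation."""
        self.draining = True

    def accept_request(self, session_id=None) -> bool:
        """Accept a request; while draining, only existing sessions are served (assumption)."""
        if self.draining and session_id not in self.active_sessions:
            return False
        if session_id is not None:
            self.active_sessions.add(session_id)
        self.pending_requests += 1
        return True

    def complete_request(self) -> None:
        self.pending_requests -= 1

    def end_session(self, session_id) -> None:
        """Session terminated by the client or expired after inactivity."""
        self.active_sessions.discard(session_id)

    def ready_to_power_down(self) -> bool:
        """True once all accepted requests are done and all sessions have ended."""
        return self.draining and self.pending_requests == 0 and not self.active_sessions
```

A stateless service only needs the `pending_requests` counter; the session set matters for stateful services, where the wait can be long unless sessions are migrated.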
Shutdown Options • Complete shutdown • No power consumption • Ensures clean state upon restart (e.g., no memory leaks) • Slow restart • Hibernation • No power consumption • Memory saved on persistent storage • Resume by reloading memory snapshot • Standby • Reduced power consumption • Processor stopped, but memory remains active • Fast restart
Restart of Compute Servers • Wake on LAN • Magic packet is broadcast to LAN • Special header: 0xFF repeated 6 times • Followed by the MAC address of the machine to restart, repeated 16 times • Dispatcher initiates compute server restart • Node Manager notifies dispatcher of completed restart • Dispatcher needs to know MAC addresses of all compute servers
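To make the magic packet concrete, here is a minimal sketch of how a dispatcher might build and broadcast one. The function names are illustrative, and the UDP broadcast address and port 9 are common conventions assumed here, not something the slides specify.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN payload: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac: str,
                     broadcast_addr: str = "255.255.255.255",  # assumed default
                     port: int = 9) -> None:                   # assumed conventional port
    """Broadcast the magic packet over UDP so the sleeping machine's NIC can see it."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast_addr, port))
```

Calling `send_wake_on_lan("00:11:22:33:44:55")` would send a 102-byte payload; the target machine must have Wake on LAN enabled in its firmware and network adapter for this to have any effect.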
Service Dispatch: Definitions • n compute servers <s1,…,sn> • Sorted according to energy efficiency • sx is more energy efficient than sy ⇔ x < y • In each configuration • s1 … sr are running (1 ≤ r ≤ n) • sr+1 … sn are shut down (or in the process of shutting down) • pT(i): probability that request for service type T is dispatched to si
Service Dispatch upon Request • Take a random number z (0 ≤ z ≤ 1; uniform distribution) • Choose sc such that c = min { i : 1 ≤ i ≤ n and z ≤ pT(1) + … + pT(i) } • Related to lottery scheduling • Tickets instead of probabilities
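The selection rule above amounts to walking the cumulative distribution until it reaches z. A minimal sketch follows, assuming the probabilities for one service type are held in a list ordered by energy efficiency; the function name `choose_server` and the optional `z` argument (for reproducible tests) are illustrative. Servers with probability 0 are skipped explicitly, which is a small safeguard added here for the corner case z = 0.

```python
import random
from typing import Optional, Sequence


def choose_server(probabilities: Sequence[float], z: Optional[float] = None) -> int:
    """Return the 1-based index c = min{ i : z <= p(1) + ... + p(i) }."""
    if z is None:
        z = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for i, p in enumerate(probabilities, start=1):
        cumulative += p
        # Skip servers with zero probability (e.g. servers that are shut down).
        if p > 0 and z <= cumulative:
            return i
    # Rounding may leave the total slightly below 1; fall back to the last
    # server that has a non-zero probability.
    for i in range(len(probabilities), 0, -1):
        if probabilities[i - 1] > 0:
            return i
    raise ValueError("all probabilities are zero")
```

With `probabilities = [0.5, 0.25, 0.25, 0.0]`, a draw of z = 0.6 selects the second server, and the fourth server (probability 0) is never selected.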
Update of Probabilities (1) • In regular intervals, dispatcher obtains monitoring data from Node Managers of running compute servers • If a running server si (i < r) had idle time and had no problem meeting the SLAs: • Increase load on si, reduce load on sr • pT(r) := pT(r) – Δp • pT(i) := pT(i) + Δp • If r > 1 and pT(r) = 0 for all service types T, initiate shutdown of sr
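The consolidation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the probabilities of each service type are kept in a list indexed by server, and the transferred amount is clamped so that pT(r) never becomes negative (the slides do not say how that boundary is handled).

```python
from typing import Dict, List


def shift_load_towards(p: List[float], i: int, r: int, delta: float) -> None:
    """Move up to `delta` probability from s_r to s_i (1-based indices) for one service type."""
    if not (1 <= i < r <= len(p)):
        raise ValueError("need 1 <= i < r <= n")
    moved = min(delta, p[r - 1])  # clamp: p(r) must not drop below 0 (assumption)
    p[r - 1] -= moved
    p[i - 1] += moved


def last_server_is_unused(tables: Dict[str, List[float]], r: int) -> bool:
    """True if r > 1 and s_r receives no requests of any service type."""
    return r > 1 and all(p[r - 1] == 0.0 for p in tables.values())
```

Because the same amount is subtracted from one entry and added to another, the probabilities of a service type keep summing to 1. Once `last_server_is_unused` returns True, the dispatcher can ask the Node Manager on sr to prepare shutdown.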
Update of Probabilities (2) • If compute server si violates the SLA for a service type T (overload situation): • First try to find a running compute server sk (1 ≤ k ≤ r) that has idle time and met the SLAs of all service types • Balance load between si and sk • pT(i) := pT(i) – Δp • pT(k) := pT(k) + Δp • If there is no such compute server sk, initiate restart of sr+1
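The overload rule can be sketched in the same style. The names `ServerReport` and `handle_overload` are hypothetical, and two choices are assumptions not stated on the slide: among several candidate servers the most energy-efficient one (lowest index) is picked, and the transfer is clamped so that pT(i) stays non-negative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerReport:
    """Hypothetical summary of one Node Manager's monitoring data."""
    has_idle_time: bool
    met_all_slas: bool


def handle_overload(p: List[float], reports: List[ServerReport],
                    i: int, r: int, delta: float) -> Optional[int]:
    """Relieve overloaded s_i for one service type.

    Returns the 1-based index k of the server that took over load, or None
    if no running server qualifies and s_{r+1} should be restarted.
    """
    for k in range(1, r + 1):  # lowest index first: most energy efficient (assumption)
        report = reports[k - 1]
        if k != i and report.has_idle_time and report.met_all_slas:
            moved = min(delta, p[i - 1])  # clamp: p(i) must not drop below 0 (assumption)
            p[i - 1] -= moved
            p[k - 1] += moved
            return k
    return None
```

A return value of None is the signal for the dispatcher to wake the next server, for example via Wake on LAN.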
Future Work (1) • Testbed and evaluation • Main evaluation metric: Energy savings for given workloads • Service performance must be modeled • Traces of service execution in data centers needed • Migration of sessions • Reduces the time for preparing shutdown • Complex optimization criteria • Minimize number of service types hosted on the same compute server • Consider estimated shutdown preparation time when choosing the compute server to shut down
Future Work (2) • Distribution and replication • Service dispatcher must not become bottleneck • Fault tolerance • Dispatcher must detect compute server failures • Dispatcher must not become single point of failure • Sudden load fluctuations • Shutting down machines increases vulnerability with respect to denial-of-service attacks
Conclusions • Data centers are growing and consume huge amounts of electrical energy • Energy can be saved by powering down unused machines according to the current load • Requires consolidation of services on a subset of the available machines • Probabilistic approach to energy-consumption-aware load balancing