GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini
Motivation • Datacenters consume large amounts of energy • Energy cost is not the only problem • Brown sources: coal, natural gas… • Connect datacenters to green sources • Solar panels, wind turbines… • Green datacenter • Early examples in the field
Green datacenter • Energy sources • Solar/wind: variable over time • Electrical grid: backup • Mitigation approaches are not ideal • Batteries and net metering • We need to match the energy demand to the supply [Figure: power over time, showing solar power against the workload's load]
Delaying load within time bounds • Delaying some jobs is OK (respecting time bounds) [Figure: jobs J1, J2, and J3 placed on nodes over time against the available power, before and after delaying]
Scheduling data-processing workloads in green datacenters • Data-processing jobs • Each task operates on a chunk of data • Data distributed among servers • Simple workflow: MapReduce • Map tasks: process input data • Reduce tasks: merge maps’ outputs • Challenges • Match MapReduce workload with green energy availability • No information on #nodes, length, power… • Conserve energy while ensuring data availability [Figure: MapReduce workflow in which Map tasks (1 to 5) feed Reduce tasks (6, 7) through a shuffle]
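The Map, Shuffle, and Reduce steps named on this slide can be illustrated with a minimal in-memory sketch. It is not Hadoop code: the word-count job and the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) are assumptions chosen only to show how each map task handles one chunk and how reduce tasks merge the maps' outputs.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: process one chunk of input and emit (key, value) pairs."""
    return [(word, 1) for word in chunk.split()]


def shuffle(map_outputs):
    """Shuffle: group the values emitted by all map tasks by key."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: merge the maps' outputs for a single key."""
    return key, sum(values)


def run_job(chunks):
    """Run one map task per chunk, then shuffle, then reduce."""
    map_outputs = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(map_outputs)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two chunks stand in for data distributed among servers.
    print(run_job(["a b a", "b c"]))
```

In a real cluster each chunk lives on particular servers, which is why turning servers off to save energy interacts with data availability.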
Overview of GreenHadoop • Predict solar energy availability • May delay jobs but must meet time bounds • Maximize green energy use • If not enough green energy, minimize brown electricity cost • Brown energy cost + peak brown power cost • Deactivate idle servers while keeping data available • Divided into two parts • Computation scheduling • Data management
1. Computation scheduling • Estimate the energy required by jobs (EWMA) [Figure: queued jobs Job1 to Job6]
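An exponentially weighted moving average (EWMA) blends each new observation with the running estimate. The sketch below shows the general technique only; the smoothing factor `alpha`, the choice to average per-job energy in Wh, and the class name `JobEnergyEstimator` are assumptions, since the slide does not say how GreenHadoop parameterizes its estimate.

```python
class JobEnergyEstimator:
    """Running EWMA of the energy consumed per job (hypothetical helper)."""

    def __init__(self, alpha=0.5):
        # alpha is an assumed smoothing factor; higher values favor recent jobs.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._estimate = None

    def record(self, energy_wh):
        """Fold the measured energy of a finished job into the estimate."""
        if self._estimate is None:
            self._estimate = float(energy_wh)
        else:
            self._estimate = (
                self.alpha * energy_wh + (1.0 - self.alpha) * self._estimate
            )
        return self._estimate

    def estimate(self):
        """Current per-job energy estimate, or None before any observation."""
        return self._estimate


if __name__ == "__main__":
    estimator = JobEnergyEstimator(alpha=0.5)
    for measured in (100.0, 200.0, 100.0):
        print(estimator.record(measured))
```

Multiplying the per-job estimate by the number of queued jobs gives a rough figure for the energy the scheduler still has to place.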
1. Computation scheduling • Assign green energy first • Predict energy availability (weather forecast) [Figure: power over time from now, across off-peak, on-peak, and off-peak periods, with jobs Job1 to Job6]
1. Computation scheduling • Assign cheap brown energy [Figure: power over time from now, across off-peak, on-peak, and off-peak periods, with jobs Job1 to Job6 and the previous peak marked]
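The overview describes the brown electricity bill as an energy cost plus a peak power cost. A minimal sketch of that two-part bill follows; the function names, the per-slot data layout, and all prices in the example are hypothetical, and the idea that staying at or below the previous peak adds no peak charge is an assumption about why the figure marks the previous peak.

```python
def brown_cost(power_kw, price_per_kwh, slot_hours, peak_charge_per_kw):
    """Return (energy_cost, peak_cost) for a series of equal-length slots.

    power_kw[i] is the brown power drawn in slot i and price_per_kwh[i]
    is the on-peak or off-peak price that applies to that slot.
    """
    energy_cost = sum(
        power * price * slot_hours
        for power, price in zip(power_kw, price_per_kwh)
    )
    peak_cost = max(power_kw, default=0.0) * peak_charge_per_kw
    return energy_cost, peak_cost


def extra_peak_cost(previous_peak_kw, planned_kw, peak_charge_per_kw):
    """Additional peak charge incurred only if the plan exceeds the old peak."""
    return max(0.0, planned_kw - previous_peak_kw) * peak_charge_per_kw


if __name__ == "__main__":
    # Hypothetical prices: 0.08 $/kWh off-peak, 0.13 $/kWh on-peak, 5 $/kW peak.
    print(brown_cost([1.0, 2.0], [0.08, 0.13], 1.0, 5.0))
    print(extra_peak_cost(previous_peak_kw=2.0, planned_kw=1.5,
                          peak_charge_per_kw=5.0))
```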
1. Computation scheduling • Assign expensive energy • Current power → Active servers [Figure: power over time from now, across off-peak, on-peak, and off-peak periods, with jobs Job1 to Job6 and the active servers marked]
1. Computation scheduling • As time goes by… the number of active servers changes [Figure: active servers and power over time from now]
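The scheduling steps on these slides (green energy first, then cheap brown energy, then expensive energy, then current power mapped to active servers) can be sketched as a greedy allocation over time slots. This is an illustration under stated assumptions, not GreenHadoop's actual algorithm: the slot model, the per-slot capacity, the fixed per-server power, and the names `assign_energy` and `active_servers` are all hypothetical, and the list of slots is assumed to end at the jobs' time bound.

```python
import math


def assign_energy(required_wh, green_wh, brown_price, capacity_wh):
    """Greedily place required_wh of work into time slots.

    green_wh[t] is the predicted green energy in slot t, brown_price[t]
    the brown price in slot t, and capacity_wh the most energy the
    cluster can use in one slot. Returns (plan, unplaced_wh).
    """
    slots = range(len(green_wh))
    plan = [0.0] * len(green_wh)
    remaining = required_wh

    # Step 1: use predicted green energy, earliest slots first.
    for t in slots:
        use = min(remaining, green_wh[t], capacity_wh)
        plan[t] += use
        remaining -= use

    # Step 2: fill with brown energy, cheapest slots first. The sort is
    # stable, so equally priced slots are used in time order.
    for t in sorted(slots, key=lambda t: brown_price[t]):
        use = min(remaining, capacity_wh - plan[t])
        plan[t] += use
        remaining -= use

    return plan, remaining


def active_servers(slot_wh, slot_hours, server_watts, total_servers):
    """Servers needed to draw the power planned for one slot."""
    power_w = slot_wh / slot_hours
    return min(total_servers, math.ceil(power_w / server_watts))


if __name__ == "__main__":
    plan, unplaced = assign_energy(
        required_wh=10, green_wh=[2, 5, 0],
        brown_price=[0.13, 0.08, 0.08], capacity_wh=6,
    )
    print(plan, unplaced)
    print(active_servers(600, slot_hours=1, server_watts=250, total_servers=16))
```

Re-running the allocation at the start of each slot, with updated predictions and queue contents, is one way the number of active servers would change as time goes by.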
2. Data management • Deactivate servers to save energy • Some data might become unavailable • Prior solution: covering subset [Leverich’09] • Set of servers always running has ALL data • Our approach • Only required data has to be available • We usually require fewer active servers [Figure: file blocks 1 to 8 spread across servers, with the covering subset marked]
2. Data management [Figure: Servers 1 to 5 holding file blocks 1 to 8, grouped into Active, Decommission, and Down states; running queue: JobA, JobB, JobC; legend: required file, non-required file]
2. Data management • GreenHadoop (computation) requires only 2 servers [Figure: same cluster diagram; running queue: JobA, JobB, JobC]
2. Data management • Move required files to Active servers [Figure: same cluster diagram, with a required file being replicated to an Active server]
2. Data management • A decommissioned server can be sent to Down [Figure: same cluster diagram; running queue: JobA, JobB, JobC]
2. Data management • Jobs to be executed change → Required files change [Figure: same cluster diagram; running queue: JobA, JobB, JobC, JobD]
2. Data management • Make missing data available [Figure: same cluster diagram; running queue: JobB, JobC, JobD]
2. Data management • GreenHadoop (computation) requires 3 servers [Figure: same cluster diagram; running queue: JobB, JobC, JobD]
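The walkthrough above reduces to three questions: which files do the queued jobs require, which of those are missing from the Active servers, and when can a decommissioned server go Down. The sketch below answers them with plain sets. It does not use HDFS APIs; the function names and the server, job, and file assignments in the example are hypothetical and are not taken from the figure.

```python
def required_files(queue, job_inputs):
    """Union of the input files of every job in the running queue."""
    files = set()
    for job in queue:
        files |= job_inputs[job]
    return files


def missing_on_active(required, placement, active):
    """Required files that no Active server currently holds."""
    available = set()
    for server in active:
        available |= placement[server]
    return required - available


def can_go_down(server, required, placement, active):
    """True once every required file on `server` also lives on an Active server."""
    held_required = placement[server] & required
    on_active = set()
    for other in active:
        if other != server:
            on_active |= placement[other]
    return held_required <= on_active


if __name__ == "__main__":
    # Hypothetical placement: server name -> file blocks stored there.
    placement = {"s1": {1, 7}, "s2": {2, 4, 6}, "s3": {3, 5}}
    job_inputs = {"JobA": {4, 6}, "JobB": {5}, "JobC": {1}}
    active = {"s1", "s2"}

    needed = required_files(["JobA", "JobB", "JobC"], job_inputs)
    print(missing_on_active(needed, placement, active))   # must be replicated
    placement["s1"].add(5)                                 # replicate file 5
    print(can_go_down("s3", needed, placement, active))
```

When the queue changes, recomputing `required_files` shows which data must be made available again, which mirrors the step where a new job changes the required files.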
Evaluation methodology • Cluster with 16 Xeon servers • Hadoop, and Hadoop that turns off idle servers (EAHadoop) • GreenHadoop: green energy, brown electricity cost • Energy profile • NJ electricity pricing (on/off peak and peak cost) • Solar farm energy availability (14 PV panels) • Five pairs of days (combinations of high and low days) • Workload • Derived from Facebook [Zaharia’09] • Jobs of up to 37 GB, 600 tasks, and 6 hours in length • Internal time bound of one day
Energy prediction vs actual [Figure: predicted vs actual solar energy, annotated with cloud cover, rain, and thunderstorm]
GreenHadoop for Facebook & high-high days • 31% more green • 39% cost savings [Figure: power over time; legend: green produced, green predicted, green consumed, brown consumed, brown price; annotations: 30 kWh, 59 kWh, $8.00 and 39 kWh, 25 kWh, $6.06 (-24%)]
GreenHadoop for Facebook • Effect of parameters in GreenHadoop • Different pairs of days
Other results • Workload intensity (datacenter utilization) • High-priority jobs • Shorter time bounds • Data availability • Workload variations • Consistent green energy increases and cost savings
Conclusions • Data-processing scheduler for green datacenters • Predicts green energy availability • Increases the use of green energy • Reduces brown electricity costs • Manages data availability • We are building Parasol • Solar-powered μdatacenter • Poster session