GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks

Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini

Presentation Transcript


  1. GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini

  2. Motivation
     • Datacenters consume large amounts of energy
     • Energy cost is not the only problem
         • Brown sources: coal, natural gas…
     • Connect datacenters to green sources
         • Solar panels, wind turbines…
     • Green datacenter
         • Early examples in the field

  3. Green datacenter
     • Energy sources
         • Solar/wind: variable over time
         • Electrical grid: backup
     • Mitigation approaches are not ideal
         • Batteries and net metering
     • We need to match the energy demand to the supply
     [Figure: solar power and workload power load over time (Power vs. Time)]
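To make "match the energy demand to the supply" concrete, the sketch below counts how much of a load profile solar power can cover, slot by slot. The hourly profiles and the function name `green_energy_used` are hypothetical illustrations, not data from the talk; the point is that the same total demand, moved into sunny hours, uses far more green energy.

```python
# Minimal sketch: how much of a load can be served by a variable green supply.
# All numbers are hypothetical, in kW per one-hour slot.

def green_energy_used(load_kw, solar_kw, slot_hours=1.0):
    """Return the energy (kWh) of the load covered by solar, summed over all slots."""
    if len(load_kw) != len(solar_kw):
        raise ValueError("profiles must have the same number of slots")
    # In each slot, green energy can cover at most min(demand, supply).
    return sum(min(load, solar) for load, solar in zip(load_kw, solar_kw)) * slot_hours

solar = [0, 0, 2, 4, 4, 2, 0, 0]         # solar output peaks at midday
night_load = [3, 3, 1, 1, 1, 1, 3, 3]    # demand concentrated when the sun is down
shifted_load = [1, 1, 2, 4, 4, 2, 1, 1]  # same 16 kWh total, moved into sunny slots

print(green_energy_used(night_load, solar))    # 4.0
print(green_energy_used(shifted_load, solar))  # 12.0
```

Surplus solar in a slot is lost unless it is stored or net-metered, and any load it does not cover falls back to the grid.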

  4. Delaying load within time bounds
     • Delaying some jobs is OK, as long as their time bounds are respected
     [Figure: jobs J1, J2, J3 shown as power/nodes over time, before and after delaying some of them]
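A minimal sketch of delaying one job within its time bound, assuming the job's power draw and duration are known and the green supply is given per slot (in GreenHadoop these would be estimates and predictions). The function `best_start` is hypothetical: it tries every start slot that still lets the job finish by its deadline and keeps the one that overlaps most with green energy.

```python
def best_start(job_kw, duration, release, deadline, solar_kw):
    """Choose the start slot that maximizes the green energy used by one job.

    The job runs for `duration` consecutive slots at `job_kw`, may start no earlier
    than `release`, and must finish by slot `deadline` (exclusive, <= len(solar_kw)).
    Returns (start_slot, green_kwh), or (None, 0) if no start is feasible.
    """
    best, best_green = None, 0
    for start in range(release, deadline - duration + 1):
        green = sum(min(job_kw, solar_kw[t]) for t in range(start, start + duration))
        # Strict ">" keeps the earliest start among ties, so the job is not delayed needlessly.
        if best is None or green > best_green:
            best, best_green = start, green
    return best, best_green

solar = [0, 0, 2, 4, 4, 2, 0, 0]
print(best_start(3, 2, release=0, deadline=8, solar_kw=solar))  # (3, 6)
print(best_start(3, 2, release=0, deadline=4, solar_kw=solar))  # (2, 5)
```

A tighter time bound (deadline 4) forces an earlier start and less green energy, which is the trade-off the slide describes.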

  5. Scheduling data-processing workloads in green datacenters
     • Data-processing jobs
         • Each task operates on a chunk of data
         • Data distributed among servers
     • Simple workflow: MapReduce
         • Map tasks: process input data
         • Reduce tasks: merge the map tasks' outputs
     Challenges
     • Match MapReduce workload with green energy availability
         • No information on jobs' number of nodes, length, power…
     • Conserve energy while ensuring data availability
     [Figure: MapReduce workflow; map tasks 1 to 5 feed reduce tasks 6 and 7 through a shuffle]
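For readers new to MapReduce, this toy word count walks through the map, shuffle, and reduce phases from the slide's figure in plain Python. Word count is a standard textbook example, not one of the jobs from the talk; Hadoop runs each map task on one data chunk and distributes map and reduce tasks across the cluster's servers.

```python
from collections import defaultdict

def map_task(chunk):
    """Map: emit (word, 1) for every word in one chunk of input data."""
    return [(word, 1) for word in chunk.split()]

def shuffle(map_outputs):
    """Shuffle: group intermediate values by key across all map outputs."""
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Reduce: merge all values emitted for one key."""
    return key, sum(values)

chunks = ["green energy", "brown energy", "green datacenter"]  # one chunk per map task
map_outputs = [map_task(c) for c in chunks]
result = dict(reduce_task(k, v) for k, v in shuffle(map_outputs).items())
print(result)  # {'green': 2, 'energy': 2, 'brown': 1, 'datacenter': 1}
```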

  6. Overview of GreenHadoop
     • Predict solar energy availability
     • May delay jobs but must meet their time bounds
     • Maximize green energy use
     • If there is not enough green energy, minimize brown electricity cost
         • Brown energy cost + peak brown power cost
     • Deactivate idle servers while keeping data available
     • Divided into two parts
         • Computation scheduling
         • Data management
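The cost bullet can be written as brown cost = Σ_t price(t) · E_brown(t) + peak_charge · max(0, P_peak − P_prev). The sketch below computes it for a per-slot brown power plan, assuming an on-peak/off-peak energy price and a peak charge billed on the highest brown power drawn in the billing period, so drawing up to the peak already reached (P_prev) adds no extra peak cost. The function name and all prices are hypothetical; they are not the New Jersey tariff used in the evaluation.

```python
def brown_cost(brown_kw, on_peak, off_price, on_price, peak_charge,
               prev_peak_kw=0.0, slot_hours=1.0):
    """Brown electricity cost of a plan: energy charges plus any increase in peak-power charge.

    brown_kw[t]          brown power drawn in slot t (kW)
    on_peak[t]           True if slot t falls in the on-peak pricing period
    off_price, on_price  energy prices ($/kWh), hypothetical
    peak_charge          price of peak brown power ($/kW) for the billing period, hypothetical
    prev_peak_kw         peak brown power already reached earlier in the billing period
    """
    energy_cost = sum(
        kw * slot_hours * (on_price if peak else off_price)
        for kw, peak in zip(brown_kw, on_peak)
    )
    new_peak_kw = max([prev_peak_kw, *brown_kw])
    peak_cost = peak_charge * (new_peak_kw - prev_peak_kw)
    return energy_cost + peak_cost

plan = [2.0, 2.0, 5.0, 1.0]
on_peak = [False, True, True, False]
cost = brown_cost(plan, on_peak, off_price=0.08, on_price=0.13,
                  peak_charge=5.0, prev_peak_kw=3.0)
print(round(cost, 2))  # 11.15
```

Here 10 of the 11.15 dollars come from raising the peak from 3 kW to 5 kW; capping the on-peak slot at 3 kW would remove that term entirely, which is why the peak matters as much as the energy price.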

  7. 1. Computation scheduling
     • Estimate the energy required by jobs (EWMA)
     [Figure: jobs Job1 to Job6 and their estimated energy]
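An exponentially weighted moving average (EWMA) keeps a running estimate that favors recent observations: estimate ← α · observation + (1 − α) · estimate. A minimal sketch, assuming the quantity being averaged is the energy measured for each completed job and that total demand is approximated as the estimate times the number of queued jobs; the smoothing factor α = 0.5, the kWh values, and the function name are hypothetical.

```python
def ewma_update(estimate, observation, alpha=0.5):
    """Blend a new observation into the running EWMA estimate (alpha in (0, 1])."""
    if estimate is None:  # the first observation seeds the estimate
        return observation
    return alpha * observation + (1 - alpha) * estimate

# Hypothetical energy (kWh) measured for jobs as they complete.
estimate = None
for measured_kwh in [10.0, 14.0, 12.0]:
    estimate = ewma_update(estimate, measured_kwh)

queued_jobs = 6  # e.g. Job1 to Job6 on the slide
print(estimate)                # 12.0
print(estimate * queued_jobs)  # 72.0
```

A larger α reacts faster to workload changes; a smaller α smooths out noisy measurements.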

  8. 1. Computation scheduling
     • Assign green energy first
     • Predict energy availability (weather forecast)
     [Figure: jobs Job1 to Job6 placed under the predicted green power curve, from Now onward across off-peak, on-peak, and off-peak periods (Power vs. Time)]
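The slide says green energy availability is predicted from weather forecasts. One simple model, assumed here rather than taken from the talk, scales the panels' expected clear-sky output by an attenuation factor that depends on each slot's forecast condition. The condition names echo the labels of the energy-prediction plot later in the talk, but the factors and the `predict_solar` function are hypothetical; such factors could, for example, be fitted from past production data.

```python
# Hypothetical attenuation factors per forecast condition (fraction of clear-sky output).
ATTENUATION = {"clear": 1.0, "cloud cover": 0.6, "rain": 0.3, "thunderstorm": 0.2}

def predict_solar(clear_sky_kw, forecast):
    """Predicted green power per slot: clear-sky output scaled by the forecast condition."""
    if len(clear_sky_kw) != len(forecast):
        raise ValueError("need one forecast condition per slot")
    return [kw * ATTENUATION[condition] for kw, condition in zip(clear_sky_kw, forecast)]

clear_sky = [0.0, 2.0, 4.0, 4.0, 2.0, 0.0]  # expected output on a sunny day (kW)
forecast = ["clear", "clear", "cloud cover", "rain", "thunderstorm", "clear"]
print(predict_solar(clear_sky, forecast))  # [0.0, 2.0, 2.4, 1.2, 0.4, 0.0]
```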

  9. 1. Computation scheduling
     • Assign cheap brown energy
     [Figure: jobs Job1 to Job6 placed over time across off-peak, on-peak, and off-peak periods, with the previous brown power peak marked (Power vs. Time)]

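  10. 1. Computation scheduling
     • Assign expensive energy
     • The power planned for the current time determines the number of active servers
     [Figure: jobs Job1 to Job6 placed across off-peak, on-peak, and off-peak periods, with the active servers from Now onward (Power vs. Time)]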
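Slides 8 to 10 fill the energy the queued jobs need into future time slots in order of cost: predicted green energy first, then cheap brown energy, then expensive brown energy, with the power planned for the current slot setting how many servers stay active. The greedy sketch below follows that order under simplifying assumptions: demand is a single kWh total (for example an EWMA-based estimate), "cheap" means off-peak and "expensive" means on-peak, the cluster cannot exceed `max_kw`, the last slot ends at the time bound, and each active server draws a fixed `server_kw`. It ignores the previous-peak consideration from slide 9. All names are hypothetical; this is not GreenHadoop's actual scheduler.

```python
import math

def plan_energy(demand_kwh, green_kw, on_peak, max_kw, slot_hours=1.0):
    """Greedily assign demand to slots: green first, then off-peak brown, then on-peak brown.

    green_kw[t] is predicted green power in slot t; slot 0 is "now" and the last slot
    ends at the time bound. Returns the planned cluster power (kW) per slot.
    """
    plan = [0.0] * len(green_kw)
    remaining = demand_kwh

    def fill(t, limit_kw):
        # Raise slot t's planned power toward limit_kw, never above max_kw.
        nonlocal remaining
        room_kw = max(0.0, min(limit_kw, max_kw) - plan[t])
        take_kwh = min(room_kw * slot_hours, remaining)
        plan[t] += take_kwh / slot_hours
        remaining -= take_kwh

    for t in range(len(plan)):  # 1. predicted green energy
        fill(t, green_kw[t])
    for t in range(len(plan)):  # 2. cheap (off-peak) brown energy
        if not on_peak[t]:
            fill(t, max_kw)
    for t in range(len(plan)):  # 3. expensive (on-peak) brown energy
        if on_peak[t]:
            fill(t, max_kw)
    if remaining > 1e-9:
        raise ValueError("demand cannot be met before the time bound")
    return plan

def active_servers(planned_kw, server_kw):
    """Number of servers to keep active so the cluster can draw the planned power."""
    return math.ceil(planned_kw / server_kw)

green = [0.0, 2.0, 4.0, 4.0, 2.0, 0.0]
on_peak = [False, True, True, True, True, False]
plan = plan_energy(18.0, green, on_peak, max_kw=4.0)
print(plan)                           # [4.0, 2.0, 4.0, 4.0, 2.0, 2.0]
print(active_servers(plan[0], 0.25))  # 16
```

Re-running such a plan periodically, as predictions and the job queue change, would make the active-server count vary over time in the way slide 11 depicts.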

  11. 1. Computation scheduling
     • As time goes by, the number of active servers changes
     [Figure: active servers over time (Power vs. Time)]

  12. 2. Data management
     • Deactivate servers to save energy
         • Some data might become unavailable
     • Prior solution: covering subset [Leverich'09]
         • A set of servers that is always running holds ALL the data
     • Our approach
         • Only the data required by the jobs to be executed has to be available
         • We usually require fewer active servers
     [Figure: covering subset; servers storing numbered data files 1 to 8]
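The difference between the two approaches can be illustrated as a coverage problem: choose active servers so that every needed file has at least one copy on them. The sketch below uses greedy set cover, a standard heuristic chosen here for illustration (neither the covering-subset work nor GreenHadoop is claimed to use it), on a hypothetical placement of files 1 to 8 over five servers. Covering all files needs three servers, while covering only the files required by queued jobs needs two.

```python
def greedy_cover(needed_files, placement):
    """Greedy set cover: pick servers until each needed file has a copy on a chosen server.

    placement maps server name -> set of file ids stored on that server.
    Returns the chosen servers in the order they were picked.
    """
    uncovered = set(needed_files)
    chosen = []
    while uncovered:
        # Take the server that stores the most still-uncovered files.
        server = max(placement, key=lambda s: len(placement[s] & uncovered))
        gained = placement[server] & uncovered
        if not gained:
            raise ValueError(f"files {sorted(uncovered)} are not stored on any server")
        chosen.append(server)
        uncovered -= gained
    return chosen

# Hypothetical placement of files 1 to 8 (illustrative only).
placement = {
    "S1": {1, 2, 7},
    "S2": {1, 2, 4, 6},
    "S3": {3, 5, 6},
    "S4": {2, 4, 8},
    "S5": {3, 7, 8},
}
print(greedy_cover(range(1, 9), placement))   # ['S2', 'S5', 'S3']  all data
print(greedy_cover({1, 4, 5, 6}, placement))  # ['S2', 'S3']        only required files
```

Greedy set cover does not always find the smallest set of servers, but it keeps the example short.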

  13. 2. Data management
     [Figure: five servers, each in the Active, Decommission, or Down state and storing numbered files; running queue: JobA, JobB, JobC; files are marked as required (needed by queued jobs) or non-required]

  14. 2. Data management
     • GreenHadoop's computation scheduling requires only 2 servers
     [Figure: servers in Active, Decommission, and Down states with their files; running queue: JobA, JobB, JobC]

  15. 2. Data management
     • Replicate: move required files to Active servers
     [Figure: required files copied from the decommissioned server to Active servers; running queue: JobA, JobB, JobC]

  16. 2. Data management
     • The decommissioned server can be sent to Down
     [Figure: servers in Active, Decommission, and Down states with their files; running queue: JobA, JobB, JobC]
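Slides 15 and 16 describe the steps for a server leaving the active set: copy each required file it holds that no active server already has onto an active server, after which the server can be sent to Down. A minimal sketch of that bookkeeping, assuming server states and file placement are kept in dictionaries and that each copy goes to the active server storing the fewest files (a hypothetical balancing rule); the function name, placement, and required-file set are illustrative, not GreenHadoop's actual data-management code.

```python
def decommission(server, state, placement, required):
    """Replicate `server`'s required files onto Active servers, then mark it Down.

    state: server -> "Active" | "Decommission" | "Down" (updated in place)
    placement: server -> set of file ids (updated in place)
    Returns the (file, destination) copies that were made.
    """
    active = [s for s, st in state.items() if st == "Active"]
    if not active:
        raise ValueError("no Active server can receive the required files")
    copies = []
    for f in sorted(placement[server] & required):
        if any(f in placement[s] for s in active):
            continue  # some Active server already stores a copy
        dest = min(active, key=lambda s: len(placement[s]))  # hypothetical balancing rule
        placement[dest].add(f)
        copies.append((f, dest))
    state[server] = "Down"  # every required file it held now has a copy on an Active server
    return copies

placement = {"S1": {1, 2, 7}, "S2": {1, 2, 4, 6}, "S3": {3, 5, 6}}
state = {"S1": "Active", "S2": "Decommission", "S3": "Active"}
required = {1, 4, 5, 6}  # files needed by jobs in the running queue
print(decommission("S2", state, placement, required))  # [(4, 'S1')]
print(state["S2"])                                     # Down
```

Non-required files (here file 2 exists elsewhere anyway) are not copied; a file kept only on a Down server stays unavailable until that server returns, which is acceptable while no queued job needs it.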

  17. 2. Data management
     • The jobs to be executed change → the required files change
     [Figure: servers in Active, Decommission, and Down states with their files; running queue: JobA, JobB, JobC, JobD]

  18. 2. Data management
     • Make missing data available
     [Figure: servers in Active, Decommission, and Down states with their files; running queue: JobB, JobC, JobD]

  19. 2. Data management
     • GreenHadoop's computation scheduling requires 3 servers
     [Figure: servers in Active, Decommission, and Down states with their files; running queue: JobB, JobC, JobD]
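When the running queue changes, some newly required files may be stored only on Decommission or Down servers. A minimal sketch of making them available again, assuming server states and file placement are kept in dictionaries and a hypothetical selection policy: among servers that hold missing files, prefer a Decommission server (it has not yet been sent to Down) over a Down one, then the one holding the most missing files. The names, placement, and policy are illustrative assumptions.

```python
def servers_to_wake(required, state, placement):
    """Activate inactive servers until every required file has a copy on an Active server.

    state: server -> "Active" | "Decommission" | "Down" (updated in place)
    placement: server -> set of file ids
    Returns the servers that were switched to Active, in order.
    """
    available = set()
    for s, st in state.items():
        if st == "Active":
            available |= placement[s]
    missing = set(required) - available
    candidates = [s for s, st in state.items() if st != "Active"]
    woken = []
    while missing:
        # Hypothetical policy: useful servers first, Decommission before Down, then most files.
        best = max(
            candidates,
            key=lambda s: (bool(placement[s] & missing),
                           state[s] == "Decommission",
                           len(placement[s] & missing)),
            default=None,
        )
        if best is None or not (placement[best] & missing):
            raise ValueError(f"files {sorted(missing)} are not stored on any server")
        state[best] = "Active"
        woken.append(best)
        candidates.remove(best)
        missing -= placement[best]
    return woken

placement = {"S1": {1, 2, 7}, "S2": {1, 2, 4, 6}, "S3": {3, 5, 6},
             "S4": {2, 4, 8}, "S5": {3, 7, 8}}
state = {"S1": "Active", "S2": "Down", "S3": "Active",
         "S4": "Decommission", "S5": "Down"}
required = {4, 5, 6, 8}  # hypothetical: a newly queued job needs file 8
print(servers_to_wake(required, state, placement))  # ['S4']
```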

  20. Evaluation methodology
     • Cluster with 16 Xeon servers
     • Hadoop, and Hadoop turning off idle servers (EAHadoop)
     • GreenHadoop: green energy, brown electricity cost
     • Energy profile
         • NJ electricity pricing (on/off-peak prices and peak power cost)
         • Solar farm energy availability (14 PV panels)
         • Five pairs of days (combinations of high and low green energy days)
     • Workload
         • Derived from Facebook [Zaharia'09]
         • Jobs of up to 37 GB, 600 tasks, and 6 hours
         • Internal time bound of one day

  21. Energy prediction vs. actual
     [Figure: predicted vs. actual solar energy, annotated with cloud cover, rain, and thunderstorm periods]

  22. GreenHadoop for Facebook & high-high days
     • 31% more green energy
     • 39% cost savings
     [Figure: energy over time (green produced, green predicted, green consumed, brown consumed) and brown price; annotated values 30 kWh, 59 kWh, $8.00 and 39 kWh, 25 kWh, $6.06 (-24%)]

  23. GreenHadoop for Facebook
     [Figure panels: effect of parameters in GreenHadoop; different pairs of days]

  24. Other results
     • Workload intensity (datacenter utilization)
     • High-priority jobs
     • Shorter time bounds
     • Data availability
     • Workload variations
     • Consistent green energy increases and cost savings

  25. Conclusions
     • Data-processing scheduler for green datacenters
         • Predicts green energy availability
         • Increases the use of green energy
         • Reduces brown electricity costs
         • Manages data availability
     • We are building Parasol
         • Solar-powered μdatacenter
         • Poster session

  26. GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini
