
Optimizing MapReduce Provisioning in the Cloud

Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra. http://www.cs.umn.edu/~cardosa. Department of Computer Science, University of Minnesota; †IBM Almaden Research Center.



Presentation Transcript


  1. Optimizing MapReduce Provisioning in the Cloud • Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra • http://www.cs.umn.edu/~cardosa • Department of Computer Science, University of Minnesota • †IBM Almaden Research Center

  2. MapReduce Provisioning Problem • Platform: • Virtualized Cloud Environment, which enables • Virtualized MapReduce Clusters • Several MapReduce Jobs from different users • Goal: Optimize system-wide metrics, such as: throughput, energy, load distribution, user costs • Problem: At the Cloud Service Provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

  3. MapReduce Platform: Hadoop • Open-source implementation of the MapReduce distributed computing framework • Used widely: Yahoo, Facebook, NYT, (Google) [Figure label: Input Data]

  4. Hadoop Clusters • Distributed data • Replicated chunks • Distributed computation • Map/reduce tasks • Traditional: Dedicated physical nodes

  5. Virtual Hadoop Clusters • Run Hadoop on top of VMs • E.g.: Amazon Elastic MapReduce = Hadoop + Amazon EC2 [Figure labels: Hadoop Processes, VM Pool, Server Pool]

  6. Roadmap • Intro & Problem • Platform Overview • Spatio-Temporal Insights for Provisioning • Building Blocks for MapReduce Provisioning • Case Study: Performance optimization • Case Study: Energy optimization

  7. Spatio-Temporal Insights for Provisioning • Initial Focus: Energy Savings • Goal: Minimize energy usage • Energy+cooling ~ 42% of total cost [Hamilton08] • Problem: How to place the VMs on available physical servers to minimize energy usage? • Minimize Cumulative Machine Uptime (CMU)
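One way to make the CMU objective concrete is sketched below. The slide does not define CMU formally, so this assumes a server stays powered on until its longest-running VM finishes and that CMU is the sum of those per-server uptimes. The function name, server names, and runtimes are hypothetical.

```python
from typing import Dict, List


def cumulative_machine_uptime(placement: Dict[str, List[int]]) -> int:
    """Sum, over all servers, of the time each server must stay powered on.

    Assumption (not stated on the slide): a server stays up until its
    longest-running VM finishes, and an empty server is never powered on.
    """
    return sum(max(runtimes) for runtimes in placement.values() if runtimes)


# Hypothetical VM runtimes in minutes; these are not the slide's figures.
mixed = {"s1": [100, 20], "s2": [100, 20]}
grouped = {"s1": [100, 100], "s2": [20, 20]}

print(cumulative_machine_uptime(mixed))    # 200
print(cumulative_machine_uptime(grouped))  # 120
```

Under these assumptions, co-locating VMs with similar runtimes lets the second server shut down after 20 minutes instead of 100.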

  8. VM Placement: Spatial Fit • Co-place complementary workloads [Figure labels: Job 1, Job 2, Job 3, Job 4]
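A minimal sketch of spatial fit follows, assuming each job's VM is described by a hypothetical (CPU %, I/O %) demand pair and each server has 100% capacity in both dimensions. It uses plain first-fit packing as a stand-in; the slides do not say which placement algorithm the authors use.

```python
from typing import Dict, List, Tuple

Demand = Tuple[int, int]  # (CPU %, I/O %) of one server; hypothetical units


def first_fit(jobs: Dict[str, Demand],
              capacity: Demand = (100, 100)) -> List[List[str]]:
    """Place each job on the first server with room in every dimension."""
    servers: List[List[str]] = []   # job names per server
    used: List[List[int]] = []      # resources consumed per server
    for name, demand in jobs.items():
        for i, load in enumerate(used):
            if all(u + d <= c for u, d, c in zip(load, demand, capacity)):
                servers[i].append(name)
                used[i] = [u + d for u, d in zip(load, demand)]
                break
        else:
            # No existing server has room: open a new one.
            servers.append([name])
            used.append(list(demand))
    return servers


# Two CPU-heavy and two I/O-heavy jobs (illustrative numbers only).
jobs = {"job1": (70, 20), "job2": (70, 20), "job3": (20, 70), "job4": (20, 70)}
print(first_fit(jobs))  # [['job1', 'job3'], ['job2', 'job4']]
```

Each CPU-heavy job ends up sharing a server with an I/O-heavy one, so four jobs fit on two servers rather than four.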

  9. Which placement is better? [Figure: placements A and B; labels: SHUTDOWN, SHUTDOWN; runtimes shown: 100 min, 20 min, 20 min, 20 min, 10 min, 20 min]

  10. Time Balancing [Figure: "Time Balance" diagram; values shown: 20 20 25 25 / 20 20 20 20 / 25 25 25 25 / 30 30 30 90]

  11. Building Blocks for Provisioning [Figure labels: MapReduce Jobs; Objective-driven resource provisioning; Job profiling; Cluster scaling; Migration; Continuous Optimization; Initial Provisioning; Cloud Execution Environment]

  12. Building Blocks for Provisioning • Job Profiling: MapReduce job runtime estimation • Based on number of VMs allocated to job • Based on input data size • Offline and Online Profiling • Cluster Scaling: Changing number of VMs allocated to a particular MapReduce job • Affects runtime of job; relies on Job Profiling model • Migration: Useful for continuous optimization • Load balancing, VM consolidation

  13. Job Profiling: Runtime Estimation • Based on Number of VMs

  14. Job Profiling: Runtime Estimation • Based on Input Data Size

  15. Job Profiling: Runtime Estimation • Online Profiling: Additional refinement
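The offline side of runtime estimation can be sketched as a small curve fit. The model form runtime = a + b / num_vms is an assumption chosen for illustration (a fixed overhead plus perfectly divisible work); the slides do not state the model the authors fit, and the sample measurements are made up.

```python
from typing import List, Tuple


def fit_runtime_model(samples: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares fit of runtime = a + b / num_vms.

    `samples` holds (num_vms, observed_runtime) pairs from profiling runs
    and must contain at least two distinct VM counts.
    """
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # work that divides across VMs
    a = mean_y - b * mean_x  # fixed overhead that does not
    return a, b


def predict_runtime(a: float, b: float, num_vms: int) -> float:
    """Estimated runtime for a given VM count under the fitted model."""
    return a + b / num_vms


# Hypothetical profiling runs: (VMs, minutes).
samples = [(1, 130.0), (2, 70.0), (4, 40.0), (8, 25.0)]
a, b = fit_runtime_model(samples)
print(round(a, 3), round(b, 3), round(predict_runtime(a, b, 16), 3))
```

Online profiling could then refine `a` and `b` by refitting as new observations arrive; the same approach extends to input data size by adding it as a second regressor.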

  16. Cluster Scaling • Increasing allocated resources (typical): • Add VMs to join the virtualized Hadoop cluster • Job performance increases, runtime decreases • E.g., for Time Balancing: Energy reasons • E.g., Load Balancing and Deadlines: Performance
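As a sketch of how cluster scaling can rely on a job profiling model, the function below finds the smallest VM count that reaches a target runtime. It assumes a hypothetical model runtime = a + b / num_vms with already-known parameters; neither the model nor the `max_vms` cap comes from the slides.

```python
def vms_for_target_runtime(a: float, b: float, target: float,
                           max_vms: int = 1024) -> int:
    """Smallest VM count n with a + b / n <= target.

    Raises ValueError if no count up to max_vms reaches the target,
    e.g. when the target is below the fixed overhead `a`.
    """
    for n in range(1, max_vms + 1):
        if a + b / n <= target:
            return n
    raise ValueError("target runtime not reachable within max_vms")


# Illustrative parameters: 10 min fixed overhead, 120 VM-minutes of work.
print(vms_for_target_runtime(10.0, 120.0, 40.0))  # 4
print(vms_for_target_runtime(10.0, 120.0, 25.0))  # 8
```

The linear search keeps the logic obvious; because the modelled runtime decreases as VMs are added, a binary search would work as well.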

  17. Cluster Scaling: Time Balancing [Figure: "Time Balance" diagram; values shown: 20 20 25 25 / 20 20 20 20 / 25 25 25 25 / 30 30 30 90]

  18. Roadmap • Intro & Problem • Platform Overview • Spatio-Temporal Insights for Provisioning • Building Blocks for MapReduce Provisioning • Case Study: Performance optimization • Case Study: Energy optimization

  19. Case Study: Performance & Deadlines • Goal: Meet deadlines for MapReduce jobs • Determine initial allocation accurately • Dynamically adjust allocation to meet deadline if necessary • Monitoring: Use offline profiling to estimate number of VMs needed based on past performance • Actuation: Online profiling: Trigger points to invoke cluster scaling
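A trigger point for deadline-driven scaling might look like the sketch below. It assumes progress is linear in time and that the remaining work divides evenly across VMs; both are simplifications, and the function name and parameters are hypothetical rather than taken from the authors' system.

```python
import math


def vms_needed_for_deadline(current_vms: int, elapsed: float,
                            progress: float, deadline: float) -> int:
    """Return the VM count needed to finish by `deadline`.

    `progress` is the fraction of the job completed so far (0 < p <= 1).
    Returns `current_vms` unchanged when the job is already on track.
    """
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    # Projected remaining time at the current rate (linear-progress assumption).
    remaining = elapsed * (1.0 - progress) / progress
    if elapsed + remaining <= deadline:
        return current_vms  # on track: no scaling needed
    time_left = deadline - elapsed
    if time_left <= 0:
        raise ValueError("deadline has already passed")
    # Assume the remaining work speeds up in proportion to added VMs.
    return math.ceil(current_vms * remaining / time_left)


# 4 VMs, 30 min elapsed, half done: projected finish at 60 min.
print(vms_needed_for_deadline(4, 30.0, 0.5, 50.0))  # 6
print(vms_needed_for_deadline(4, 30.0, 0.5, 70.0))  # 4
```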

  20. Case Study: Energy Savings • Goal: Minimize energy consumption from the execution of a large batch of MapReduce jobs • Energy+cooling ~ 42% of total cost [Hamilton08] • Pass energy savings on to users • Problem: How to place the VMs on available physical servers to minimize energy usage? • Minimize Cumulative Machine Uptime (CMU)

  21. Case Study: Energy Savings • Use Job Profiling to place similar-runtime VMs together for initial provisioning • Use Job Profiling to adjust number of VMs in each cluster to adjust runtimes if needed • Monitoring: Online profiling to determine when energy could be saved by using migration or cluster scaling • Actuation: Use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
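Placing similar-runtime VMs together can be sketched as sort-then-chunk. This is a simple stand-in for the initial provisioning step, assuming every server holds a fixed number of VMs and stays on until its longest-running VM finishes; the runtimes are invented for illustration.

```python
from typing import List


def group_by_runtime(runtimes: List[int], slots_per_server: int) -> List[List[int]]:
    """Sort estimated VM runtimes and fill servers with neighbouring values."""
    ordered = sorted(runtimes)
    return [ordered[i:i + slots_per_server]
            for i in range(0, len(ordered), slots_per_server)]


def cmu(servers: List[List[int]]) -> int:
    """Cumulative machine uptime: each server runs until its slowest VM ends."""
    return sum(max(vms) for vms in servers if vms)


# Hypothetical estimated runtimes in minutes, two VM slots per server.
runtimes = [20, 90, 25, 90, 20, 25]
arrival_order = [runtimes[i:i + 2] for i in range(0, len(runtimes), 2)]
grouped = group_by_runtime(runtimes, 2)

print(cmu(arrival_order), cmu(grouped))  # 205 135
```

Estimates from job profiling are imperfect, which is why the slide pairs this initial placement with online monitoring plus cluster scaling or migration to correct it later.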

  22. Conclusion • Framework: Building blocks (STEAMEngine) for the optimization of MapReduce provisioning from a cloud service provider perspective • Preliminary evaluations to validate usefulness of each building block • Approaches for applying building blocks to meet specific goals, e.g. performance, energy

  23. Thank you! • Questions?

  24. Job Profiling: Runtime Estimation • Based on Number of VMs

  25. Cluster Scaling • Increasing allocated resources (typical): • Add VMs to join the virtualized Hadoop cluster • Job performance increases, runtime decreases • E.g., for Time Balancing: Energy reasons • E.g., Load Balancing and Deadlines: Performance
