Optimizing MapReduce Provisioning in the Cloud

Optimizing MapReduce Provisioning in the Cloud Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra http://www.cs.umn.edu/~cardosa Department of Computer Science, University of Minnesota †IBM Almaden Research Center

MapReduce Provisioning Problem • Platform: • Virtualized Cloud Environment, which enables • Virtualized MapReduce Clusters • Several MapReduce Jobs from different users • Goal: Optimize system-wide metrics, such as: throughput, energy, load distribution, user costs • Problem: At the Cloud Service Provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

MapReduce Platform: Hadoop • Open-source implementation of the MapReduce distributed computing framework • Used widely: Yahoo, Facebook, NYT, (Google) [Figure label: Input Data]

Hadoop Clusters • Distributed data • Replicated chunks • Distributed computation • Map/reduce tasks • Traditional: Dedicated physical nodes

Virtual Hadoop Clusters • Run Hadoop on top of VMs • E.g.: Amazon Elastic MapReduce = Hadoop + Amazon EC2 [Figure labels: Hadoop Processes, VM Pool, Server Pool]

Roadmap • Intro & Problem • Platform Overview • Spatio-Temporal Insights for Provisioning • Building Blocks for MapReduce Provisioning • Case Study: Performance optimization • Case Study: Energy optimization

Spatio-Temporal Insights for Provisioning • Initial Focus: Energy Savings • Goal: Minimize energy usage • Energy+cooling ~ 42% of total cost [Hamilton08] • Problem: How to place the VMs on available physical servers to minimize energy usage? • Minimize Cumulative Machine Uptime (CMU)
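As a rough illustration of the CMU objective, the sketch below assumes that a server must stay powered on until its longest-running VM finishes and can be shut down afterwards, so CMU is the sum of those per-server uptimes. The function name and the runtimes are hypothetical and are not taken from the slides.

```python
def cumulative_machine_uptime(placement):
    """Sum of per-server uptimes.

    `placement` is a list of servers; each server is a list of the
    runtimes (in minutes) of the VMs placed on it. Assumption: a server
    stays on exactly as long as its longest-running VM.
    """
    return sum(max(runtimes) for runtimes in placement if runtimes)


# Hypothetical runtimes: two long VMs (100 min) and two short VMs (20 min).
mixed = [[100, 20], [100, 20]]      # both servers stay up for 100 min
grouped = [[100, 100], [20, 20]]    # second server shuts down after 20 min

print(cumulative_machine_uptime(mixed))    # 200
print(cumulative_machine_uptime(grouped))  # 120
```

Under this assumption, co-locating VMs with similar runtimes lets some servers shut down earlier, which lowers CMU.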

VM Placement: Spatial Fit • Co-place complementary workloads [Figure labels: Job 1, Job 2, Job 3, Job 4]

Which placement is better? [Figure: placements A and B; labels shown: SHUTDOWN, SHUTDOWN, 100min, 20min, 20min, 20min, 10min, 20min]

Time Balancing [Figure: a "Time Balance" step; runtime labels shown: 20, 25, 30, 90]

Building Blocks for Provisioning MapReduce Jobs • Objective-driven resource provisioning • Job profiling • Cluster scaling • Migration • Continuous Optimization • Initial Provisioning • Cloud Execution Environment

Building Blocks for Provisioning • Job Profiling: MapReduce job runtime estimation • Based on number of VMs allocated to job • Based on input data size • Offline and Online Profiling • Cluster Scaling: Changing number of VMs allocated to a particular MapReduce job • Affects runtime of job; relies on Job Profiling model • Migration: Useful for continuous optimization • Load balancing, VM consolidation

Job Profiling: Runtime Estimation • Based on Number of VMs
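One way to turn offline profiling runs into a runtime estimate is to fit a simple curve to (number of VMs, runtime) samples. The sketch below assumes a model of the form T(n) = a + b/n, where a is a fixed overhead and b is the parallelizable work; this model, the function names, and the sample data are illustrative assumptions, not the model used in the talk.

```python
def fit_runtime_model(samples):
    """Least-squares fit of T(n) = a + b / n.

    `samples` is a list of (num_vms, runtime_minutes) pairs and must
    contain at least two distinct VM counts.
    """
    xs = [1.0 / n for n, _ in samples]   # regress runtime on 1/n
    ys = [t for _, t in samples]
    m = len(samples)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_runtime(a, b, num_vms):
    """Predicted runtime in minutes for a given VM count."""
    return a + b / num_vms


# Hypothetical profiling runs: (VMs, minutes).
profile = [(2, 65.0), (4, 35.0), (8, 20.0)]
a, b = fit_runtime_model(profile)
print(round(estimate_runtime(a, b, 6), 1))
```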

Job Profiling: Runtime Estimation • Based on Input Data Size
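A minimal sketch of input-size-based estimation, assuming runtime grows linearly with input size for a fixed number of VMs. That linearity, the function name, and the numbers are assumptions made for illustration only.

```python
def scale_runtime_by_input(ref_runtime_min, ref_input_gb, new_input_gb):
    """Scale a profiled runtime to a new input size.

    Assumption: runtime is proportional to input size when the
    number of VMs is held constant.
    """
    if ref_input_gb <= 0:
        raise ValueError("reference input size must be positive")
    return ref_runtime_min * (new_input_gb / ref_input_gb)


# Hypothetical: a 10 GB profiling run took 30 minutes.
print(scale_runtime_by_input(30, 10, 25))  # 75.0
```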

Job Profiling: Runtime Estimation • Online Profiling: Additional refinement
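Online refinement could, for example, blend the offline estimate with what the running job actually reports. The heuristic below is a hypothetical sketch: it extrapolates a total runtime from observed progress and trusts that extrapolation more as the job advances. The weighting scheme and names are assumptions.

```python
def refine_total_runtime(offline_total_min, elapsed_min, progress_fraction):
    """Blend an offline runtime estimate with an online extrapolation.

    Assumptions: the job reports a progress fraction in (0, 1], and
    progress advances at a roughly constant rate.
    """
    if not 0 < progress_fraction <= 1:
        raise ValueError("progress_fraction must be in (0, 1]")
    online_total = elapsed_min / progress_fraction
    # Weight the online extrapolation by how much of the job has been observed.
    return (1 - progress_fraction) * offline_total_min + progress_fraction * online_total


# Hypothetical: offline profile said 40 min; after 15 min the job is 25% done.
print(refine_total_runtime(40, 15, 0.25))  # 45.0
```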

Cluster Scaling • Increasing allocated resources (typical): • Add VMs to the virtualized Hadoop cluster • Job performance increases, runtime decreases • E.g., for Time Balancing: energy reasons • E.g., for Load Balancing and Deadlines: performance
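Because cluster scaling relies on the job profiling model, a scaling decision can be sketched as inverting that model. The example assumes a runtime model T(n) = a + b/n with b > 0; the model, the coefficients, and the function name are hypothetical.

```python
import math


def vms_needed(a, b, target_runtime_min):
    """Smallest VM count n with a + b / n <= target_runtime_min.

    Returns None when the target is at or below the fixed overhead `a`,
    since no VM count can reach it under this model. Assumes b > 0.
    """
    if target_runtime_min <= a:
        return None
    return max(1, math.ceil(b / (target_runtime_min - a)))


# Hypothetical coefficients: 5 min fixed overhead, 120 VM-minutes of work.
print(vms_needed(5, 120, 30))  # 5
print(vms_needed(5, 120, 25))  # 6
print(vms_needed(5, 120, 5))   # None
```

For time balancing, the target runtime would be the runtime of the other jobs sharing the same servers.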

Cluster Scaling: Time Balancing [Figure: a "Time Balance" step; runtime labels shown: 20, 25, 30, 90]

Case Study: Performance & Deadlines • Goal: Meet deadlines for MapReduce jobs • Determine initial allocation accurately • Dynamically adjust allocation to meet the deadline if necessary • Monitoring: Use offline profiling to estimate the number of VMs needed based on past performance • Actuation: Use online profiling trigger points to invoke cluster scaling
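The deadline loop above could be sketched as a check that runs at each trigger point. This version assumes the remaining runtime scales inversely with the number of VMs and that progress is reported as a fraction; both assumptions, and all names, are hypothetical.

```python
import math


def vms_to_meet_deadline(current_vms, elapsed_min, progress_fraction, deadline_min):
    """VM count needed to finish by the deadline, or None if it has passed.

    Assumptions: constant progress rate at the current allocation, and
    remaining runtime inversely proportional to the VM count.
    """
    if not 0 < progress_fraction < 1:
        raise ValueError("progress_fraction must be in (0, 1)")
    time_left = deadline_min - elapsed_min
    if time_left <= 0:
        return None
    projected_remaining = elapsed_min * (1 - progress_fraction) / progress_fraction
    if projected_remaining <= time_left:
        return current_vms  # on track: no scaling needed
    return math.ceil(current_vms * projected_remaining / time_left)


# Hypothetical: 4 VMs, 20 min elapsed, 25% done, 60 min deadline.
print(vms_to_meet_deadline(4, 20, 0.25, 60))  # 6
```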

Case Study: Energy Savings • Goal: Minimize energy consumption from the execution of a large batch of MapReduce jobs • Energy+cooling ~ 42% of total cost [Hamilton08] • Pass energy savings on to users • Problem: How to place the VMs on available physical servers to minimize energy usage? • Minimize Cumulative Machine Uptime (CMU)

Case Study: Energy Savings • Use Job Profiling to place similar-runtime VMs together for initial provisioning • Use Job Profiling to adjust number of VMs in each cluster to adjust runtimes if needed • Monitoring: Online profiling to determine when energy could be saved by using migration or cluster scaling • Actuation: Use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
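A minimal sketch of the initial placement step, assuming each server offers a fixed number of identical VM slots and that profiled runtimes are available for every VM. Sorting by runtime and filling servers in order is one simple way to group similar-runtime VMs; it is an illustrative heuristic with hypothetical names and data, not necessarily the placement algorithm behind the talk.

```python
def place_by_runtime(vm_runtimes, slots_per_server):
    """Group VMs with similar estimated runtimes onto the same server.

    Sorts runtimes in descending order and fills servers one at a time.
    Assumption: every server has `slots_per_server` identical VM slots.
    """
    ordered = sorted(vm_runtimes, reverse=True)
    return [ordered[i:i + slots_per_server]
            for i in range(0, len(ordered), slots_per_server)]


# Hypothetical estimated runtimes in minutes.
placement = place_by_runtime([30, 90, 25, 20, 25, 20], slots_per_server=2)
print(placement)  # [[90, 30], [25, 25], [20, 20]]

# Uptime per server is set by its longest-running VM.
print(sum(max(server) for server in placement))  # 135
```

In this example the 30-minute VM shares a server with the 90-minute one, which is the kind of mismatch that cluster scaling or migration could later correct.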

Conclusion • Framework: Building blocks (STEAMEngine) for the optimization of MapReduce provisioning from a cloud service provider perspective • Preliminary evaluations to validate usefulness of each building block • Approaches for applying building blocks to meet specific goals, e.g. performance, energy

Thank you! • Questions?
