Improving MapReduce Performance in Heterogeneous Environments Sturzu Antonio-Gabriel, SCPD
Table of Contents • Intro • Scheduling in Hadoop • Heterogeneity in Hadoop • The LATE Scheduler (Longest Approximate Time to End) • Evaluation of Performance
Intro • The large volume of data that Internet services process has created a need for parallel processing • The leading example is Google, which uses MapReduce to process 20 petabytes of data per day • MapReduce breaks a computation into small tasks that run in parallel on multiple machines, and it scales easily to very large clusters • Two key benefits of MapReduce are: • Fault tolerance • Speculative execution
Intro • Google has noted that speculative execution improves response time by 44% • The paper shows an efficient way to perform speculative execution in order to maximize performance • It also shows that Hadoop’s simple speculative algorithm, which compares each task’s progress to the average progress, breaks down in heterogeneous systems
Intro • The proposed scheduling algorithm improves Hadoop’s response time by a factor of two • Most of the examples and tests are run on Amazon’s Elastic Compute Cloud (EC2) • The paper addresses two important problems in speculative execution: • Choosing the best node to run the speculative task on • Distinguishing between nodes that are slightly slower than the mean and true stragglers
Scheduling in Hadoop • Hadoop divides each MapReduce job into tasks • The input file is split into even-sized chunks replicated for fault tolerance • Each chunk of input is first processed by a map task that outputs a set of key-value pairs • Map outputs are split into buckets based on the key
Scheduling in Hadoop • When all map tasks finish, reducers apply a reduce function to the set of values associated with each key
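The map, bucket, and reduce flow described above can be sketched in a single process as a word count. This is a minimal illustration, not Hadoop’s API: `map_fn`, `reduce_fn`, `run_job` and the hash-based bucketing are hypothetical choices, and everything runs sequentially instead of on slave machines.

```python
from collections import defaultdict


def map_fn(chunk):
    """Map: emit a (word, 1) key-value pair for every word in one input chunk."""
    for word in chunk.split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: combine all values associated with a single key."""
    return key, sum(values)


def run_job(chunks, num_buckets=2):
    # Map phase: each chunk is processed independently (one map task per chunk in Hadoop).
    buckets = [defaultdict(list) for _ in range(num_buckets)]
    for chunk in chunks:
        for key, value in map_fn(chunk):
            # Split map output into buckets by key (hash partitioning is assumed here).
            buckets[hash(key) % num_buckets][key].append(value)
    # Reduce phase: runs only after all map output exists; one reducer per bucket.
    results = {}
    for bucket in buckets:
        for key, values in bucket.items():
            k, v = reduce_fn(key, values)
            results[k] = v
    return results
```

For example, `run_job(["a b a", "b c"])` returns a dictionary with counts 2 for `a`, 2 for `b` and 1 for `c`. Because every occurrence of a key lands in the same bucket, each reducer sees all the values for the keys it owns.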
Scheduling in Hadoop • Hadoop runs several map and reduce tasks concurrently on each slave in order to overlap I/O with computation • Each slave tells the master when it has empty task slots, and the master fills a slot in this order: • First, any failed task is given priority • Second, a non-running task • Third, a task to execute speculatively
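The slot-filling priority above can be sketched as follows. `Task`, `pick_task` and the `pick_speculative` callback are hypothetical names; the callback stands in for whichever speculation policy is in use (Hadoop’s native rule or LATE), so this sketch only fixes the order in which the three kinds of work are considered.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    task_id: str
    failed: bool = False
    running: bool = False
    finished: bool = False


def pick_task(tasks: List[Task],
              pick_speculative: Callable[[List[Task]], Optional[Task]]) -> Optional[Task]:
    """Fill one empty slot: failed task first, then a non-running task, then a speculative copy."""
    # 1. A task that failed earlier gets top priority.
    for t in tasks:
        if t.failed:
            return t
    # 2. Next, any task that has not started yet.
    for t in tasks:
        if not t.running and not t.finished:
            return t
    # 3. Otherwise, let the speculation policy choose a running task to duplicate.
    return pick_speculative([t for t in tasks if t.running])
```

Keeping speculation behind a callback mirrors the structure of the slides: Hadoop and LATE share steps 1 and 2 and differ only in step 3.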
Scheduling in Hadoop • To select speculative tasks, Hadoop uses a progress score between 0 and 1 • For a map task, the progress score is the fraction of input data read • For a reduce task, execution is divided into three phases, each of which accounts for 1/3 of the score: • The copy phase • The sort phase • The reduce phase
Scheduling in Hadoop • In each phase, the score is the fraction of data processed • Hadoop calculates an average score for each category of tasks in order to define a threshold for speculative execution • When a task’s progress score is less than the average for its category minus 0.2, and the task has run for at least one minute, it is marked as a straggler
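The progress score and Hadoop’s straggler rule can be sketched as below. The 0.2 gap and the one-minute minimum come from the slide; the function names and the `(task_id, progress_score, runtime_seconds)` tuple layout are hypothetical, and "at least one minute" is read as a runtime of 60 seconds or more.

```python
from statistics import mean


def map_progress(bytes_read: float, input_size: float) -> float:
    """Map progress score: fraction of the task's input data read so far."""
    return bytes_read / input_size


def reduce_progress(phase: int, fraction_in_phase: float) -> float:
    """Reduce progress score. phase is 0 (copy), 1 (sort) or 2 (reduce);
    each phase contributes 1/3 of the score."""
    return (phase + fraction_in_phase) / 3.0


def hadoop_stragglers(tasks, gap=0.2, min_runtime_s=60.0):
    """tasks: list of (task_id, progress_score, runtime_seconds) for ONE category.
    Returns tasks whose score is below the category average minus `gap`
    and that have been running for at least `min_runtime_s` seconds."""
    avg = mean(score for _, score, _ in tasks)
    return [tid for tid, score, runtime in tasks
            if score < avg - gap and runtime >= min_runtime_s]
```

A reduce task halfway through its sort phase scores `reduce_progress(1, 0.5)`, i.e. 0.5. With scores 0.9, 0.9, 0.3 and 0.3 the average is 0.6, so both 0.3 tasks fall below 0.4, but only one that has run for a minute or more is flagged.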
Scheduling in Hadoop • All tasks below the threshold are considered equally slow, and ties between them are broken by data locality • This threshold works well in homogeneous systems because tasks tend to start and finish in “waves” at roughly the same times • When running multiple jobs, Hadoop uses a FIFO discipline
Scheduling in Hadoop • Assumptions made by Hadoop Scheduler: • Nodes can perform work at roughly the same rate • Tasks progress at a constant rate throughout time • There is no cost to launching a speculative task on a node that would otherwise have an idle slot
Scheduling in Hadoop • A task’s progress score is representative of the fraction of its total work that it has done • Tasks tend to finish in waves, so a task with a low progress score is likely a straggler • Tasks in the same category (map or reduce) require roughly the same amount of work
Heterogeneity in Hadoop • Heterogeneity breaks assumptions 1 and 2, since nodes no longer work at the same rate • Because the threshold is fixed, too many speculative tasks are launched, taking resources away from useful tasks (assumption 3 also fails) • Because the scheduler ranks candidates for speculative execution by data locality, the wrong tasks may be chosen first • Assumptions 3, 4 and 5 break down on both homogeneous and heterogeneous clusters
The LATE Scheduler • The main idea is to speculatively execute the task that will finish farthest in the future • LATE estimates a task’s progress rate as ProgressScore/T, where T is the amount of time the task has been running • The estimated time to completion is (1 - ProgressScore)/ProgressRate
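The two estimates above translate directly into code. This sketch uses hypothetical function names; treating a task with zero progress as finishing "never" (infinite time left) is an assumption added to avoid dividing by zero.

```python
import math


def progress_rate(progress_score: float, elapsed_s: float) -> float:
    """ProgressRate = ProgressScore / T, where T is how long the task has been running."""
    return progress_score / elapsed_s


def time_to_end(progress_score: float, elapsed_s: float) -> float:
    """Estimated time left = (1 - ProgressScore) / ProgressRate."""
    rate = progress_rate(progress_score, elapsed_s)
    # Assumption: no measurable progress means the task is treated as never finishing.
    return math.inf if rate <= 0 else (1.0 - progress_score) / rate


def late_candidate(running: dict) -> str:
    """running maps task_id -> (progress_score, elapsed_s).
    Returns the task estimated to finish farthest in the future."""
    return max(running, key=lambda tid: time_to_end(*running[tid]))
```

With tasks A = (0.5, 100 s), B = (0.2, 50 s) and C = (0.9, 300 s), the estimated times left are 100 s, 200 s and about 33 s, so B is chosen. C has the lowest progress rate but is almost done, so a speculative copy of it would gain little; ranking by time left rather than by progress is what separates LATE from Hadoop’s rule.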
The LATE Scheduler • To give a speculative copy the best chance of beating the original task, LATE launches speculative tasks only on fast nodes • It identifies slow nodes with a SlowNodeThreshold, a threshold on the total work a node has performed • Because speculative tasks cost resources, LATE uses two additional heuristics: • A limit on the number of speculative tasks running at once (SpeculativeCap) • A SlowTaskThreshold that determines whether a task is slow enough to be speculated (compared against the task’s progress rate)
The LATE Scheduler • When a node asks for a new task and fewer than SpeculativeCap speculative tasks are running: • If the node’s total progress is below SlowNodeThreshold, ignore the request • Rank the currently running tasks that are not already being speculated by their estimated time to completion • Launch a copy of the highest-ranked task whose progress rate is below SlowTaskThreshold • LATE does not take data locality into account
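The decision procedure above can be sketched as one function. Several details here are assumptions rather than facts from the slides: `RunningTask`, `late_choose` and `percentile` are hypothetical names; SpeculativeCap is an integer count; a node’s total progress is supplied by the caller (for instance, the sum of progress scores of its finished and running tasks); and both thresholds are expressed as nearest-rank percentiles, which matches the percentage values used in the sensitivity experiments.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunningTask:
    task_id: str
    progress_score: float
    elapsed_s: float           # assumed > 0
    speculated: bool = False   # True if a speculative copy is already running

    def rate(self) -> float:
        return self.progress_score / self.elapsed_s

    def time_left(self) -> float:
        r = self.rate()
        return math.inf if r <= 0 else (1.0 - self.progress_score) / r


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]; one simple choice among several."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def late_choose(node: str,
                node_total_progress: Dict[str, float],
                tasks: List[RunningTask],
                num_speculative: int,
                speculative_cap: int,
                slow_node_pct: float = 25.0,
                slow_task_pct: float = 25.0) -> Optional[str]:
    """Return the id of the task to copy onto `node`, or None to ignore the request."""
    # SpeculativeCap: only speculate while fewer than the cap are running.
    if num_speculative >= speculative_cap:
        return None
    # SlowNodeThreshold: nodes in the slowest slow_node_pct get no speculative work.
    if node_total_progress[node] <= percentile(list(node_total_progress.values()), slow_node_pct):
        return None
    candidates = [t for t in tasks if not t.speculated]
    if not candidates:
        return None
    # SlowTaskThreshold: computed over the progress rates of all running tasks.
    slow_task = percentile([t.rate() for t in tasks], slow_task_pct)
    # Rank by estimated time left, farthest-finishing first; copy the first slow one.
    for t in sorted(candidates, key=lambda t: t.time_left(), reverse=True):
        if t.rate() <= slow_task:
            return t.task_id
    return None
```

With nearest-rank percentiles, values exactly at the threshold count as slow, which is why the comparisons use `<=`. One consequence is that if every node reports the same total progress, every request is ignored; a real scheduler would need an explicit tie rule.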
The LATE Scheduler • Advantages of the algorithm: • Robust to node heterogeneity, because it relaunches only the slowest tasks and only a small number of them • Prioritizes among slow tasks based on how much they hurt response time • Takes node heterogeneity into account when choosing which node runs a speculative task • Speculatively executes only tasks that will improve the total response time, not every slow task
The LATE Scheduler • The time-to-completion estimate can be wrong when a task’s progress rate decreases over time, but in general it gives correct approximations for typical MapReduce jobs
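A small worked case shows how a falling progress rate misleads the estimate. The scenario is hypothetical: a task finishes half its work in `first_half_s` seconds, then runs the remaining half `slowdown` times slower, and we compare LATE’s estimate at the halfway point with the true remaining time.

```python
def estimate_vs_actual(first_half_s: float, slowdown: float):
    """Hypothetical task: first half of the work takes first_half_s seconds,
    the second half runs `slowdown` times slower.
    Returns (estimated_time_left, actual_time_left) at the halfway point."""
    progress = 0.5
    rate = progress / first_half_s          # ProgressRate = ProgressScore / T
    estimated = (1.0 - progress) / rate     # LATE's time-to-completion estimate
    actual = first_half_s * slowdown        # remaining half at the reduced speed
    return estimated, actual
```

`estimate_vs_actual(10.0, 2.0)` gives an estimate of about 10 s against an actual 20 s: the estimate assumes the early rate continues, so it underestimates the time left by half. With `slowdown=1.0` (a constant rate) the two values agree.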
Evaluation • To create heterogeneity, they mapped a variable number of virtual machines (from 1 to 8) onto each host in the EC2 cluster • They measured the impact of contention on I/O performance and on application-level performance
Evaluation • For application-level performance, they sorted 100 GB of random data using Hadoop’s Sort benchmark with speculative execution disabled • With isolated VMs the job completed in 408 s, and with VMs packed densely onto physical hosts (7 VMs per host) it took 1094 s, roughly 2.7 times longer • To evaluate the scheduling algorithms, they used clusters of about 200 VMs and performed 5-7 runs
Evaluation • Results for scheduling in a heterogeneous cluster of 243 VMs using the Sort job (128 MB per host), for a total of 30 GB of data:
Evaluation • On average, LATE finished jobs 27% faster than Hadoop’s native scheduler and 31% faster than with no speculation • Results for scheduling with stragglers • To simulate stragglers, they manually slowed down eight VMs in a cluster of 100
Evaluation • For each run, they sorted 256 MB per host, for a total of 25 GB:
Evaluation • They also ran two other workloads on a heterogeneous cluster with stragglers: • Grep • WordCount • They used a 204-node cluster with 1 to 8 VMs per host • For the Grep test, they searched 43 GB of text data, or about 200 MB per host
Evaluation • On average, LATE finished jobs 36% faster than Hadoop’s native scheduler and 56% faster than with no speculation • For the WordCount test, they used a 21 GB data set, or about 100 MB per host
Evaluation • Sensitivity analysis • SpeculativeCap • SlowTaskThreshold • SlowNodeThreshold • SpeculativeCap results • They ran experiments at six SpeculativeCap values from 2.5% to 100%, repeating each experiment 5 times
Evaluation • Sensitivity to SlowTaskThreshold • The goal is to avoid speculating tasks that are progressing quickly, even when they are the only tasks left • They tested 6 values from 5% to 100%
Evaluation • We observe that values from 25% upward all work well, with 25% being the optimal value • Sensitivity to SlowNodeThreshold