Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic
Mohammad Hammoud, M. Suhail Rehman, and Majd F. Sakr
Hadoop MapReduce
• MapReduce is now a pervasive analytics engine on the cloud
• Hadoop is an open source implementation of MapReduce
• Hadoop MapReduce incorporates two phases, a Map phase and a Reduce phase, which encompass multiple map and reduce tasks
[Figure: map tasks read HDFS blocks of the dataset and emit partitions; reduce tasks shuffle, merge, and reduce these partitions and write results back to HDFS]
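As a minimal illustration of this flow (not Hadoop's actual Java API), the sketch below mimics how map outputs are split into key-based partitions and how each reduce task merges and aggregates one partition; the word-count logic, the block strings, and the function names are assumptions made for the example.

```python
from collections import defaultdict

NUM_REDUCE_TASKS = 3  # assumed number of reduce tasks (one partition each)

def map_task(block):
    """Map stage: emit (key, value) pairs from one HDFS block (here: word counts)."""
    for word in block.split():
        yield word, 1

def partition(key):
    """Partitioner: decide which reduce task receives this key (hash partitioning)."""
    return hash(key) % NUM_REDUCE_TASKS

def reduce_task(pairs):
    """Reduce stage: merge the shuffled pairs of one partition and aggregate values."""
    merged = defaultdict(int)
    for key, value in pairs:
        merged[key] += value
    return dict(merged)

# Dataset split into "blocks", one per map task.
blocks = ["the cat sat", "the dog sat", "the cat ran"]

# Shuffle stage: route every map output pair to its partition.
partitions = defaultdict(list)
for block in blocks:
    for key, value in map_task(block):
        partitions[partition(key)].append((key, value))

# Each reduce task processes its own partition.
for r in range(NUM_REDUCE_TASKS):
    print(f"reduce task {r}:", reduce_task(partitions[r]))
```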
Task Scheduling in Hadoop
• A golden principle adopted by Hadoop is: "Moving computation towards data is cheaper than moving data towards computation"
• Hadoop applies this principle to map task scheduling but not to reduce task scheduling
• With reduce task scheduling, once a slave (or a TaskTracker, TT) polls for a reduce task, R, at the master node (or the JobTracker, JT), JT assigns the TT any R
[Figure: a locality problem, where R is scheduled at TT1 while its partitions exist at TT4; shuffling the partitions across the rack switches and the core switch gives a Total Network Distance (TND) of 4. CS = core switch, RS = rack switch]
Data Locality: A Working Example
• TT1 and TT2 are feeding nodes of a reduce task R
• Every TT is requesting R from JT
• JT can assign R to any TT
[Figure: five placement cases for R. CASE-I: TND_R = 8; CASE-II: TND_R = 8; CASE-III: TND_R = 4; CASE-IV: TND_R = 2; CASE-V: TND_R = 2]
• Hadoop does not distinguish between the different cases to choose the one that provides the best locality (i.e., CASE-IV or CASE-V)
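A small sketch of how TND can be computed, assuming a two-level rack/core topology with per-partition hop costs of 0 (node-local), 2 (rack-local), and 4 (off-rack). The rack layout, node names, and helper functions are illustrative assumptions chosen so that the candidate placements reproduce the TND values 8, 8, 4, 2, and 2 seen in the five cases; they are not taken from the original figures.

```python
# Assumed rack layout: TT1, TT2, TT3 under rack switch RS1; TT4, TT5 under RS2.
RACK_OF = {"TT1": "RS1", "TT2": "RS1", "TT3": "RS1", "TT4": "RS2", "TT5": "RS2"}

def network_distance(src, dst):
    """Switch-link hops needed to shuffle one partition from src to dst."""
    if src == dst:
        return 0        # node-local: nothing crosses the network
    if RACK_OF[src] == RACK_OF[dst]:
        return 2        # rack-local: src -> rack switch -> dst
    return 4            # off-rack: src -> rack switch -> core switch -> rack switch -> dst

def total_network_distance(feeding_nodes, placement):
    """TND_R: sum of the distances from every feeding node of R to R's placement."""
    return sum(network_distance(tt, placement) for tt in feeding_nodes)

feeders = ["TT1", "TT2"]                                 # nodes holding R's partitions
for candidate in ["TT4", "TT5", "TT3", "TT1", "TT2"]:    # candidates spanning the five cases
    print(candidate, "-> TND_R =", total_network_distance(feeders, candidate))
# Native Hadoop may pick any requesting TT; the best placements (TT1 or TT2) give TND_R = 2.
```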
Partitioning Skew in MapReduce
• The existing Hadoop reduce task scheduler is not only locality-unaware, but also partitioning-skew-unaware
• Partitioning skew refers to the significant variance in intermediate keys' frequencies and their distribution across different data nodes
• Partitioning skew has been reported to exist in many scientific applications, including feature extraction and bioinformatics, among others
• Partitioning skew causes shuffle skew, where some reduce tasks receive more data than others
[Figure: partition-size distributions for Sort, WordCount, and K-Means]
Partitioning Skew: A Working Example
• TT1 and TT2 are feeding nodes of a reduce task R
• TT1 and TT2 are requesting R from JT
• R's partitions at TT1 and TT2 are of sizes 100MB and 20MB
[Figure: CASE-IV and CASE-V both have TND_R = 2, yet CASE-IV shuffles 100MB while CASE-V shuffles only 20MB]
• Hadoop does not consider the partitioning skew exhibited by some MapReduce applications
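To make the shuffle-volume difference concrete, here is a tiny sketch (using the partition sizes stated in the example; the helper name is an assumption) that counts how many MB must cross the network for each placement of R: every partition that is not already local to R has to be shuffled.

```python
# Partition sizes (MB) of reduce task R at its feeding nodes, as in the example.
PARTITION_MB = {"TT1": 100, "TT2": 20}

def shuffled_mb(placement):
    """Data that must cross the network: every partition not already local to R."""
    return sum(size for node, size in PARTITION_MB.items() if node != placement)

for candidate in ["TT1", "TT2"]:
    print(f"R at {candidate}: {shuffled_mb(candidate)} MB shuffled")
# R at TT1 -> 20 MB shuffled; R at TT2 -> 100 MB shuffled, even though TND_R is 2 either way.
```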
Our Work
• We explore the locality and partitioning skew problems present in the current Hadoop implementation
• We propose the Center-of-Gravity Reduce Scheduler (CoGRS), a locality-aware, skew-aware reduce task scheduler for MapReduce
• CoGRS attempts to schedule every reduce task, R, at its center-of-gravity node, determined by:
  • The network locations of R's feeding nodes
  • The skew in the sizes of R's partitions
• By scheduling reduce tasks at their center-of-gravity nodes, we argue for diminished network traffic and improved Hadoop performance
Talk Roadmap
• The proposed Center-of-Gravity Reduce Scheduler (CoGRS)
  • Tradeoffs: Locality, Concurrency, Load Balancing, and Utilization
  • CoGRS and the Shuffle Stage in Hadoop
• Quantitative Methodology and Evaluations
  • CoGRS on a Private Cloud
  • CoGRS on Amazon EC2
• Concluding Remarks
CoGRS Approach
• To address data locality and partitioning skew, CoGRS attempts to place every reduce task, R, at a suitable node that minimizes:
  • The Total Network Distance of R (TND_R)
  • The shuffled data
• We suggest that a suitable node would be the center-of-gravity node, chosen in accordance with:
  • The network locations of R's feeding nodes
  • The weights of R's partitions
• We define the weight of a partition P needed by R as the size of P divided by the total size of all the partitions needed by R
Weighted Total Network Distance
• We propose a new metric called Weighted Total Network Distance (WTND), defined as follows:
  • WTND_R = Σ_{i=1..n} (ND_i × w_i), where n is the number of R's partitions, ND_i is the network distance required to shuffle partition i to R, and w_i is the weight of partition i
• In principle, the center-of-gravity of R is always one of R's feeding nodes, since it is less expensive to access data locally than to shuffle it over the network
• Hence, we designate the center-of-gravity of R to be the feeding node of R that provides the minimum WTND_R
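A sketch of the center-of-gravity selection, reusing the same assumed rack layout and hop costs as the earlier sketch: weights are partition sizes divided by the total size, WTND_R sums ND_i × w_i over R's partitions, and the center-of-gravity is the feeding node with the minimum WTND_R. The node names, sizes, and layout are illustrative assumptions, not CoGRS's actual code.

```python
# Assumed two-rack layout and hop costs (same illustrative model as before).
RACK_OF = {"TT1": "RS1", "TT2": "RS1", "TT3": "RS1", "TT4": "RS2", "TT5": "RS2"}

def network_distance(src, dst):
    """0 = node-local, 2 = rack-local, 4 = off-rack."""
    if src == dst:
        return 0
    return 2 if RACK_OF[src] == RACK_OF[dst] else 4

def weighted_tnd(partition_sizes, placement):
    """WTND_R = sum over partitions i of ND_i * w_i, with w_i = size_i / total size."""
    total = sum(partition_sizes.values())
    return sum(network_distance(node, placement) * (size / total)
               for node, size in partition_sizes.items())

def center_of_gravity(partition_sizes):
    """The feeding node of R that minimizes WTND_R."""
    return min(partition_sizes, key=lambda node: weighted_tnd(partition_sizes, node))

# R's partitions: 100MB at TT1 and 20MB at TT4 (skewed, and in different racks).
sizes = {"TT1": 100, "TT4": 20}
for node in sizes:
    print(node, "-> WTND_R =", round(weighted_tnd(sizes, node), 2))
print("center of gravity:", center_of_gravity(sizes))  # TT1: the heavier partition stays local
```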
Locality, Load Balancing, Concurrency, and Utilization Tradeoffs
• Strictly exploiting data locality can lead to scheduling skew
• CoGRS gives up some locality for the sake of extracting more concurrency and improving load balancing and cluster utilization
[Figure: if the center-of-gravity node (e.g., TT5) is occupied, CoGRS attempts to schedule R close to it, yielding better utilization and load balancing]
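This tradeoff can be sketched as a simple fallback rule (an illustration of the idea, not CoGRS's actual scheduler code, and the slot counts below are invented): if the center-of-gravity node has no free reduce slot, pick the closest idle node to it, trading a little locality for concurrency and load balancing.

```python
def schedule_reduce(cog_node, free_slots, rack_of):
    """Place R at its center-of-gravity node if it has a free reduce slot;
    otherwise fall back to the nearest node (by switch hops) that does."""
    def hops(a, b):
        if a == b:
            return 0
        return 2 if rack_of[a] == rack_of[b] else 4

    candidates = [node for node, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None                      # nothing idle: R stays pending
    return min(candidates, key=lambda node: hops(cog_node, node))

rack_of = {"TT1": "RS1", "TT2": "RS1", "TT3": "RS1", "TT4": "RS2", "TT5": "RS2"}
free_slots = {"TT1": 0, "TT2": 1, "TT3": 0, "TT4": 1, "TT5": 0}  # cog node TT5 is busy
print(schedule_reduce("TT5", free_slots, rack_of))  # falls back to TT4, the closest idle node
```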
CoGRS and Early Shuffle
• To determine the center-of-gravity node of a particular reduce task, R, we need to designate the network locations of R's feeding nodes
• This cannot be precisely determined before the Map phase commits, because any map task could lead a cluster node to become a feeding node of R
• Default Hadoop starts scheduling reduce tasks after only 5% of map tasks commit, so as to overlap the Map and Reduce phases (early shuffle)
• CoGRS defers early shuffle a little bit (e.g., until 20% of map tasks commit) so that most (or all) keys (which determine reduce tasks) will likely have been encountered
[Figure: timeline of the Map, Shuffle, and Reduce stages with early shuffle]
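A minimal sketch of the deferral logic: reduce scheduling is held back until a chosen fraction of map tasks has committed. In Hadoop 0.20.x this fraction corresponds to the `mapred.reduce.slowstart.completed.maps` setting (0.05, i.e., 5%, by default); the 20% threshold shown for CoGRS, the map counts, and the function below are illustrative assumptions.

```python
def may_schedule_reduces(committed_maps, total_maps, slowstart_fraction):
    """Allow reduce-task scheduling (and hence shuffling) only once the given
    fraction of map tasks has committed."""
    return committed_maps / total_maps >= slowstart_fraction

total_maps = 200
for committed in (5, 10, 40, 45):
    hadoop = may_schedule_reduces(committed, total_maps, 0.05)  # default Hadoop: 5%
    cogrs  = may_schedule_reduces(committed, total_maps, 0.20)  # deferred early shuffle: 20%
    print(f"{committed}/{total_maps} maps committed -> Hadoop: {hadoop}, CoGRS: {cogrs}")
```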
Quantitative Methodology
• We evaluate CoGRS on:
  • A private cloud with 14 machines
  • Amazon EC2 with 8, 16, and 32 instances
• We use Apache Hadoop 0.20.2
• We use various benchmarks with different dataset distributions
Timeline: Sort2 (An Example)
[Figure: job timelines for native Hadoop with early shuffle off (H_OFF) and on (H_ON), and for CoGRS. H_ON ends earlier than H_OFF; CoGRS, which defers early shuffle a little bit, ends even earlier]
Reduce Network Traffic on Private Cloud
• On average, CoGRS increases node-local data by 34.5% and decreases off-rack data by 9.6% versus native Hadoop
Execution Times on Private Cloud
• CoGRS outperforms native Hadoop by an average of 3.2% and by up to 6.3%
CoGRS on Amazon EC2: Sort2
[Figure: results with 8, 16, and 32 EC2 instances]
• Compared to native Hadoop, on average, CoGRS increases node-local data by 1%, 32%, and 57.9%, and decreases off-rack data by 2.3%, 10%, and 38.6% with 8, 16, and 32 cluster sizes, respectively
• This translates to 1.9%, 7.4%, and 23.8% average reductions in job execution times under CoGRS versus native Hadoop with 8, 16, and 32 cluster sizes, respectively
Concluding Remarks
• In this work we observed that network load is of special concern with MapReduce
  • A large amount of traffic can be generated during the shuffle stage
  • This can deteriorate Hadoop performance
• We realized that scheduling reduce tasks at their center-of-gravity nodes has positive effects on Hadoop's network traffic and performance
  • Average reductions of 9.6% and 38.6% in off-rack network traffic were accomplished on a private cloud and on Amazon EC2, respectively
  • This provided Hadoop with up to 6.3% and 23.8% performance improvements on a private cloud and on Amazon EC2, respectively
• We expect CoGRS to play a major role in MapReduce for applications that exhibit high partitioning skew (e.g., scientific applications)
Thank You! Questions?