Breaking the MapReduce Stage Barrier

Abhishek Verma, Nicolas Zea, Brian Cho, Indranil Gupta and Roy Campbell
Department of Computer Science, University of Illinois at Urbana-Champaign

MapReduce
• MapReduce introduced by Google [OSDI 04]
• Programming model
  • Map and reduce primitives borrowed from functional languages
  • Map applies the same computation to each partition of the data; reduce aggregates the map outputs
• Clustered computing system
  • Automatic parallelization & job distribution
  • Fault tolerance via job re-execution
  • Provides status and monitoring tools
IEEE Cluster 2010

Who is using MapReduce/Hadoop?

MapReduce Model
[Figure: word count over four input splits, with a barrier between map and reduce]
• Input splits: "To be, or not to be:" | "that is the question:" | "Whether 'tis nobler in" | "the mind to suffer"
• Map outputs:
  • <to, 1>, <be, 1>, <or, 1>, <not, 1>, <to, 1>, <be, 1>
  • <that, 1>, <is, 1>, <the, 1>, <question, 1>
  • <whether, 1>, <tis, 1>, <nobler, 1>, <in, 1>
  • <the, 1>, <mind, 1>, <to, 1>, <suffer, 1>
• Reduce outputs, sorted by key: <be, 2>, <in, 1>, <is, 1>, <mind, 1>, <nobler, 1>, <not, 1>, <or, 1>, <question, 1>, <suffer, 1>, <that, 1>, <the, 2>, <tis, 1>, <to, 3>, <whether, 1>

Motivation
• Barrier between the Map and Reduce stages
  • The Reduce stage does not start until all map tasks have completed
• Why is the barrier present?
  • Grouping of values by key is done through sorting
  • Simplifies re-execution of failed tasks
• Is this barrier necessary?
• Which applications benefit if we remove this barrier?

Outline
• Motivation
• Breaking the barrier
• Classify reduce operations
• Memory management
• Evaluation
• Conclusion

Original Wordcount
function map(key, value):
    // key: document name
    // value: document contents
    for each word in value do
        Emit intermediate (word, 1)
    end for
function reduce(key, values):
    // key: a word
    // values: a list of counts
    result ← 0
    for each val in values do
        result ← result + val
    end for
    Write (key, result) as output
function run():
    // the framework has already sorted and grouped all map outputs by key
    while there are more keys do
        key ← current key
        values ← current values
        reduce(key, values)
    end while
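The pseudocode above can be approximated in a few lines of Python to make the barrier visible. This is only a minimal single-process sketch, not Hadoop code: the names `map_fn`, `reduce_fn`, and `run_with_barrier` are hypothetical, and the punctuation-stripping tokenizer is an assumption added so that the example text counts cleanly.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_name, contents):
    """Emit (word, 1) for every word in the document."""
    for token in contents.lower().split():
        word = token.strip(".,:;!?'\"")  # assumed tokenizer, not from the slides
        if word:
            yield (word, 1)


def reduce_fn(word, counts):
    """Sum all counts for one word."""
    return (word, sum(counts))


def run_with_barrier(documents):
    # Barrier: every map output is collected before any reduce call is made.
    intermediate = []
    for name, contents in documents.items():
        intermediate.extend(map_fn(name, contents))
    # Grouping by key is done through sorting, as in the default framework.
    intermediate.sort(key=itemgetter(0))
    return [
        reduce_fn(key, (count for _, count in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    ]
```

Because the sort needs the complete intermediate list, no call to `reduce_fn` can happen until the last map output has arrived.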

Barrier-less Wordcount
function map(key, value):
    // key: document name
    // value: document contents
    for each word in value do
        Emit intermediate (word, 1)
    end for
function reduce(key, values):
    // key: a word
    // values: a list of counts
    Fetch result for key from TreeMap
    for each val in values do
        result ← result + val
    end for
    Update (key, result) in TreeMap
function run():
    Create a new TreeMap
    while there are more keys do
        key ← current key
        values ← current values
        if TreeMap does not contain key then
            Insert (key, 0) in the TreeMap
        end if
        reduce(key, values)
    end while
    // After all reduce invocations are done
    for each (key, value) in TreeMap do
        Write (key, value) as output
    end for
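A rough Python rendering of the barrier-less version may help show what changes. It is a sketch under stated assumptions: a plain `dict` stands in for the TreeMap (so keys are sorted only when the output is written), and the class name `BarrierlessWordCount` is hypothetical.

```python
class BarrierlessWordCount:
    """Keeps one partial count per key so reduce can run on records in any order."""

    def __init__(self):
        # Stand-in for the TreeMap; a dict does not keep keys sorted,
        # so sorting is deferred to the final output step.
        self.partial = {}

    def reduce(self, key, values):
        # Fetch the previous partial result (0 if the key is new),
        # fold in the new values, and store the result back.
        result = self.partial.get(key, 0)
        for val in values:
            result += val
        self.partial[key] = result

    def run(self, records):
        # Records are consumed as they arrive; no sort, no barrier.
        for key, value in records:
            self.reduce(key, [value])
        # After all reduce invocations are done, write the output.
        return sorted(self.partial.items())
```

For example, `BarrierlessWordCount().run([("to", 1), ("be", 1), ("to", 1)])` folds each record into the partial state as soon as it is seen, regardless of which map task produced it.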

Barrier-less MapReduce
[Figure: four animation frames of the same word count; reducers update partial results as each map output arrives]
• Step 1, after "To be, or not to be:" is mapped to <to, 1>, <be, 1>, <or, 1>, <not, 1>, <to, 1>, <be, 1>
  • Partial results: <be, 2>, <not, 1>, <or, 1>, <to, 2>
• Step 2, after "the mind to suffer" is mapped to <the, 1>, <mind, 1>, <to, 1>, <suffer, 1>
  • Partial results: <be, 2>, <mind, 1>, <not, 1>, <or, 1>, <suffer, 1>, <the, 1>, <to, 3>
• Step 3, after "Whether 'tis nobler in" is mapped to <whether, 1>, <tis, 1>, <nobler, 1>, <in, 1>
  • Partial results: <be, 2>, <in, 1>, <mind, 1>, <nobler, 1>, <not, 1>, <or, 1>, <suffer, 1>, <the, 1>, <tis, 1>, <to, 3>, <whether, 1>
• Step 4, after "that is the question:" is mapped to <that, 1>, <is, 1>, <the, 1>, <question, 1>
  • Final results: <be, 2>, <in, 1>, <is, 1>, <mind, 1>, <nobler, 1>, <not, 1>, <or, 1>, <question, 1>, <suffer, 1>, <that, 1>, <the, 2>, <tis, 1>, <to, 3>, <whether, 1>

MapReduce Timeline

Implementation
• Modified Hadoop 0.20
  • Bypass the sorting mechanism
  • Modify reduce invocation so it can be called with a single record
• Shared-buffer producer/consumer design
  • One thread shuffles intermediate data into the buffer
  • The other thread applies the reduce function to the shuffled data
• Configurable boolean flag
  • conf.setBarrier(true/false)
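The producer/consumer arrangement described above can be sketched with Python's standard `queue` and `threading` modules. This is only an illustration of the pattern, not the modified Hadoop code: the function names, the buffer size, and the word-count reduce are all assumptions.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the shuffled stream


def shuffle_thread(map_outputs, buffer):
    """Producer: copies intermediate records into the shared buffer."""
    for record in map_outputs:
        buffer.put(record)  # blocks while the bounded buffer is full
    buffer.put(_DONE)


def reduce_thread(buffer, partial):
    """Consumer: applies reduce to one record at a time as it arrives."""
    while True:
        record = buffer.get()
        if record is _DONE:
            break
        key, value = record
        partial[key] = partial.get(key, 0) + value


def run_pipeline(map_outputs, buffer_size=4):
    buffer = queue.Queue(maxsize=buffer_size)
    partial = {}
    producer = threading.Thread(target=shuffle_thread, args=(map_outputs, buffer))
    consumer = threading.Thread(target=reduce_thread, args=(buffer, partial))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return partial
```

The bounded queue lets reduction overlap with shuffling while keeping the amount of buffered, not-yet-reduced data limited.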

Classifying Reduce Operations
• Post-reduction processing
• Single reducer aggregation
• Sorting
• Selection
• Aggregation
• Identity
• Cross-key operations

Post-Reduction Processing
• Two-step process
  • Entries are processed and inserted into a temporary structure
  • A post-processing operation is applied at the end
• E.g., Last.fm tracks the number of unique users listening to a particular music track
• Default implementation
  • Group users by track by emitting <TrackId, UserID>
  • Finally, find the number of unique users
• Barrier-less implementation
  • Maintain a set of UserIDs for each TrackId
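The barrier-less variant of the unique-listeners example could look roughly like the following. The class and method names are hypothetical; the only idea taken from the slide is keeping a set of UserIDs per TrackId and counting at the end.

```python
from collections import defaultdict


class UniqueListeners:
    """Barrier-less reducer state: one set of user IDs per track ID."""

    def __init__(self):
        self.users_by_track = defaultdict(set)

    def reduce(self, track_id, user_id):
        # Step 1: insert each record into the temporary structure as it arrives.
        # The set silently ignores repeated listens by the same user.
        self.users_by_track[track_id].add(user_id)

    def finish(self):
        # Step 2: post-processing, applied once after all records are seen.
        return {track: len(users) for track, users in self.users_by_track.items()}
```

The set per track is what makes the memory footprint grow with the number of distinct (track, user) pairs.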

Single Reducer Aggregation
• Use a single reducer to aggregate output from multiple map tasks
• E.g., computing Black-Scholes option pricing
  • Map computes exponentiations
  • A single reduce computes the mean and standard deviation
• Barrier-less implementation
  • Reducer keeps a running sum and a running sum of squares
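Keeping a running sum and sum of squares is enough to recover both statistics without storing the individual values. The sketch below assumes the population standard deviation, using sqrt(E[x²] − (E[x])²); the slides do not say which variant was used, and the class name is hypothetical.

```python
import math


class RunningStats:
    """Constant-size reducer state: count, sum, and sum of squares."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def reduce(self, value):
        # Each map output updates the state and can then be discarded.
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def mean(self):
        return self.total / self.n

    def std_dev(self):
        # Population standard deviation (an assumption of this sketch).
        variance = self.total_sq / self.n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding
```

Unlike the previous classes of reduce operation, the state here stays constant in size no matter how many records arrive.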

Sorting
• One of the few applications that requires sorted keys
• TeraSort and other sorting benchmarks
  • Use identity map and identity reduce functions
  • Framework does the sorting in the shuffle phase
• Barrier-less: use a red-black tree
  • Maintains O(records) state
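The idea of sorting incrementally inside the reducer can be illustrated as follows. Python's standard library has no red-black tree, so this sketch swaps in a sorted list maintained with `bisect.insort`; that substitution costs O(n) per insert instead of O(log n), and the class name is hypothetical.

```python
import bisect


class SortedReducer:
    """Keeps every record in key order as it arrives (O(records) state)."""

    def __init__(self):
        # Stand-in for the red-black tree named on the slide.
        self.records = []

    def reduce(self, key, value):
        # Insert at the position that keeps the list sorted.
        # Tuples compare by key first, then by value for equal keys.
        bisect.insort(self.records, (key, value))

    def finish(self):
        # Identity reduce: the output is simply the records in sorted order.
        return list(self.records)
```

Either structure has to hold every record until the end, which is why this class of reduce operation motivates the memory management techniques that follow.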

Memory Management Techniques
• Intermediate results can grow up to O(records) and overflow memory capacity
• Techniques to manage memory
  • Disk-spill and merge
    • When memory usage reaches a threshold, spill the results to disk
    • Merge the partial results together at the end
  • Disk-spilling key/value store
    • Every reduce invocation fetches the previous partial result, processes it, and stores it back
    • The key/value store can evict records according to LRU
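The disk-spill-and-merge technique can be sketched for the word-count case as below. Several details are assumptions made for illustration: the threshold is a number of keys rather than bytes, spills are JSON-lines files in a temporary directory, and the class name `SpillingCounter` is hypothetical.

```python
import heapq
import json
import os
import tempfile


class SpillingCounter:
    """Word-count reducer that spills sorted partial results to disk."""

    def __init__(self, max_keys, spill_dir=None):
        self.max_keys = max_keys  # assumed threshold: number of keys in memory
        self.spill_dir = spill_dir or tempfile.mkdtemp()
        self.partial = {}
        self.spill_files = []

    def reduce(self, key, value):
        self.partial[key] = self.partial.get(key, 0) + value
        if len(self.partial) >= self.max_keys:
            self._spill()

    def _spill(self):
        # Write the in-memory partial results as one sorted run, then clear them.
        path = os.path.join(self.spill_dir, f"spill-{len(self.spill_files)}.jsonl")
        with open(path, "w") as f:
            for key in sorted(self.partial):
                f.write(json.dumps([key, self.partial[key]]) + "\n")
        self.spill_files.append(path)
        self.partial = {}

    def _read(self, path):
        with open(path) as f:
            for line in f:
                key, value = json.loads(line)
                yield key, value

    def finish(self):
        # Spill whatever is left, then merge all sorted runs in one pass,
        # combining the partial counts of equal keys.
        if self.partial:
            self._spill()
        merged = []  # a real system would stream this to the output instead
        for key, value in heapq.merge(*(self._read(p) for p in self.spill_files)):
            if merged and merged[-1][0] == key:
                merged[-1] = (key, merged[-1][1] + value)
            else:
                merged.append((key, value))
        return merged
```

Because each run is written in key order, the final merge only needs one record per run in memory at a time.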

Evaluation
• Experimental setup
  • 16 nodes of the OpenCirrus cloud
  • 8 cores, 16 GB RAM, 2 TB hard disk
  • One node as job and DFS master, plus 15 slaves
  • Number of map and reduce tasks per node = 4
• Does the job completion time decrease after removing the barrier?
• How much programmer effort is required?
• How well do our memory management techniques work?

Application Speedups

Memory management techniques

Memory management

Conclusion
• General-purpose MapReduce frameworks without the barrier can yield significant benefits
• Experiments demonstrate an average speedup of 25%
• Preserves fault tolerance and ease of programming
• Questions?
  • verma7@illinois.edu
