MapReduce Programming Model • Based on Lin and Dyer’s text: Chapter 3
Job Tracker and Task Tracker • Figure 2.6
MapReduce Model
• A programmer has no control over:
  • Where a mapper or reducer runs (i.e., on which node in the cluster).
  • When a mapper or reducer begins or finishes.
  • Which input key-value pairs are processed by a specific mapper.
  • Which intermediate key-value pairs are processed by a specific reducer.
Techniques for controlling execution and managing data flow
• The programmer is able to:
  • Construct complex data types as keys and values for storage, processing, and communication.
  • Specify initialization code that runs before map and/or reduce, and termination code that runs after map and/or reduce.
  • Preserve state across multiple keys in the mapper and/or the reducer.
  • Control the sort order of intermediate keys.
  • Control the partitioning of the key space, and thus the set of keys a particular reducer will process.
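The last technique, partitioning the key space, can be sketched in a few lines. This is plain Python rather than the Hadoop API: `partition` and `shuffle` are hypothetical names, and the CRC32 hash is an assumption chosen only because it is stable across runs.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Map an intermediate key to a reducer index (hypothetical helper)."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group (key, value) pairs per reducer, with keys sorted inside each group."""
    buckets = [dict() for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].setdefault(key, []).append(value)
    # Each reducer sees its keys in sorted order, each with its list of values.
    return [sorted(bucket.items()) for bucket in buckets]
```

Because the reducer index depends only on the key, every value for a given key lands at the same reducer. Swapping in a different `partition` function changes which keys a reducer sees.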
Objective
• Address the issues without creating a bottleneck for scalability.
• The gold standard that MapReduce aims for is linear scalability.
• Storing and manipulating state has the potential to hinder scalability.
• How can performance be improved?
  • Make the map and reduce functions efficient.
  • Make the transfer of intermediate data efficient.
  • Aggregate intermediate data, an important operation for efficiency.
  • Shrink the intermediate key space.
  • What else can we do?
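A rough illustration of why aggregating intermediate data matters, using word count as the example. This is a plain-Python sketch, not Hadoop code; `naive_map` and `combine` are hypothetical names, and whitespace tokenization is an assumption.

```python
from collections import Counter


def naive_map(doc: str):
    """Emit (term, 1) for every token: one intermediate pair per token."""
    return [(term, 1) for term in doc.split()]


def combine(pairs):
    """Local aggregation: sum the counts per term before the shuffle."""
    totals = Counter()
    for term, count in pairs:
        totals[term] += count
    return sorted(totals.items())


doc = "to be or not to be"
raw = naive_map(doc)
combined = combine(raw)
print(len(raw), len(combined))  # prints "6 4"
```

Six intermediate pairs shrink to four here. The saving grows with the number of repeated terms in the input.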
Mapper
• http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapreduce/Mapper.html
• http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/package-summary.html
• http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
Mapper with built-in combiner-v1
class Mapper
  method Map(docid a, doc d)
    H ← new AssociativeArray
    for all term t ∈ doc d do
      H{t} ← H{t} + 1            // Tally counts for entire document
    for all term t ∈ H do
      Emit(term t, count H{t})
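The v1 pseudocode translates almost line for line into Python. The sketch below is not the Hadoop `Mapper` API: the class name `PerDocumentMapper`, the generator-based emit, and whitespace tokenization are all assumptions made for illustration.

```python
from collections import defaultdict


class PerDocumentMapper:
    """Sketch of v1: counts are tallied within a single document only."""

    def map(self, docid, doc):
        counts = defaultdict(int)      # H <- new AssociativeArray
        for term in doc.split():       # for all term t in doc d
            counts[term] += 1          # tally counts for the entire document
        for term, count in counts.items():
            yield term, count          # Emit(term t, count H{t})
```

Calling `list(PerDocumentMapper().map("d1", "a b a"))` yields one pair per distinct term in that document. The associative array is discarded after each call, so no state survives between documents.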
Mapper with built-in combiner-v2
class Mapper
  method Initialize
    H ← new AssociativeArray
  method Map(docid a, doc d)
    for all term t ∈ doc d do
      H{t} ← H{t} + 1            // Tally counts across documents
  method Close
    for all term t ∈ H do
      Emit(term t, count H{t})
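The v2 variant keeps the associative array alive across calls to Map and emits only at Close. A minimal Python sketch follows. It is not the Hadoop API: `InMapperCombiningMapper` and the driver `run_mapper` are hypothetical names, and the driver merely stands in for the framework calling Initialize, Map, and Close in order.

```python
from collections import defaultdict


class InMapperCombiningMapper:
    """Sketch of v2: state is preserved across all documents seen by this mapper."""

    def initialize(self):
        self.counts = defaultdict(int)   # H <- new AssociativeArray

    def map(self, docid, doc):
        for term in doc.split():
            self.counts[term] += 1       # tally counts across documents

    def close(self):
        for term, count in self.counts.items():
            yield term, count            # emit once per distinct term


def run_mapper(mapper, docs):
    """Hypothetical driver that mimics the framework's call order."""
    mapper.initialize()
    for docid, doc in docs:
        mapper.map(docid, doc)
    return sorted(mapper.close())
```

For the input `[("d1", "a b a"), ("d2", "b c")]`, `run_mapper` returns one pair per distinct term across both documents. The cost is that `counts` grows with the number of distinct terms the mapper sees, which is the kind of stored state that can hinder scalability.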