MapReduce

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat, Google, Inc.

Abstract & Introduction
• MapReduce is a programming model & an associated implementation for processing & generating large data sets
• Users specify a map function and a reduce function
• The map function processes a key/value pair and generates a set of intermediate key/value pairs
• The reduce function merges all intermediate values associated with the same intermediate key

Programming Model
• The user expresses the computation as two functions: Map() and Reduce()
• The MapReduce library groups together all intermediate values associated with the same intermediate key I
• It passes those values to Reduce() via an iterator
• Reduce() merges the values to form a possibly smaller set of values
• Typically zero or one output value is produced per Reduce invocation
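The grouping step described on this slide can be illustrated with a minimal single-process sketch. It is not the Google implementation, which is distributed across a cluster; the names `run_mapreduce`, `identity_map`, and `max_reduce` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy in-memory driver (hypothetical): map, group by key, then reduce."""
    # Map phase: collect every intermediate (key, value) pair,
    # grouping values that share the same intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)

    # Reduce phase: each key's values are handed over as an iterator,
    # and the reduce function yields zero or more merged values.
    output = {}
    for ikey in sorted(groups):
        output[ikey] = list(reduce_fn(ikey, iter(groups[ikey])))
    return output

def identity_map(key, value):
    # Emits the input pair unchanged as the intermediate pair.
    yield (key, value)

def max_reduce(key, values):
    # Merges all values for one key into a single output value.
    yield max(values)

readings = [("a", 3), ("b", 5), ("a", 7)]
print(run_mapreduce(readings, identity_map, max_reduce))
```

Passing `iter(...)` to the reduce function mirrors the slide's point that values arrive through an iterator, so a reduce function need not assume the whole list is available at once.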

Example: Counting the Number of Word Occurrences (user code)

map(String key, String value):
    // key: document name
    // value: document contents
    // emits each word plus an associated count of occurrences
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    // sums together all counts emitted for a particular word
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

• User code is linked together with the MapReduce library
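For readers who want to run the word-count example, here is a rough Python translation. It is a sketch under stated assumptions: the `word_count` driver is a hypothetical in-memory stand-in for the MapReduce library, and words are split on whitespace, which the slide does not specify.

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents.
    # Emits each word with the count "1" as a string, as in the slide.
    for word in value.split():
        yield (word, "1")

def reduce_fn(key, values):
    # key: a word, values: an iterator over string counts.
    result = 0
    for v in values:
        result += int(v)
    yield str(result)

def word_count(documents):
    """Hypothetical stand-in for the library: group by word, then reduce."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {
        word: next(reduce_fn(word, iter(counts)))
        for word, counts in intermediate.items()
    }

docs = {"d1": "a b a", "d2": "b c"}
print(word_count(docs))
```

Counts stay as strings at the function boundaries to follow the slide's `ParseInt` and `AsString` calls; a Python-native version would more likely pass integers directly.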

Associated Types
map (k1, v1) → list (k2, v2)
reduce (k2, list (v2)) → list (v2)
• Input keys & values are drawn from a different domain than the output keys & values
• Intermediate keys & values are from the same domain as the output keys & values
• User code converts between strings & the appropriate types
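The two signatures above can be written down as type aliases. The sketch below uses Python's `typing` module; the record format ("name, score" lines) and the functions `parse_record` and `best_score` are assumptions made up for illustration, not part of the original slides.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K1 = TypeVar("K1")  # input key domain
V1 = TypeVar("V1")  # input value domain
K2 = TypeVar("K2")  # intermediate and output key domain
V2 = TypeVar("V2")  # intermediate and output value domain

# map (k1, v1) -> list (k2, v2)
MapFn = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce (k2, list (v2)) -> list (v2)
ReduceFn = Callable[[K2, Iterator[V2]], Iterable[V2]]

def parse_record(offset: int, line: str) -> Iterable[Tuple[str, str]]:
    # Input domain: (line offset, raw text). Intermediate domain: (name, score string).
    name, score = line.split(",")
    return [(name.strip(), score.strip())]

def best_score(name: str, scores: Iterator[str]) -> Iterable[str]:
    # User code converts strings to ints and back, as the slide notes.
    return [str(max(int(s) for s in scores))]

typed_map: MapFn[int, str, str, str] = parse_record
typed_reduce: ReduceFn[str, str] = best_score

print(typed_map(0, "ann, 7"))
print(typed_reduce("ann", iter(["7", "12", "3"])))
```

Here the input pair `(int, str)` differs from the intermediate pair `(str, str)`, while the reduce output shares the intermediate value type, matching the two domain bullets above.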

More Examples
