MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Google, Inc.
Abstract & Introduction
• MapReduce: a programming model and an associated implementation for processing and generating large data sets
• Users specify a map function and a reduce function
• The map function processes a key/value pair to generate a set of intermediate key/value pairs
• The reduce function merges all intermediate values associated with the same intermediate key
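Programming Model
• The user expresses the computation as two functions: Map() and Reduce()
• The MapReduce library groups together all intermediate values associated with the same intermediate key I
• The library passes these values to Reduce() via an iterator, which lets Reduce() handle lists of values too large to fit in memory
• Reduce() merges the values to form a possibly smaller set of values
• Typically, zero or one output value is produced per Reduce() invocation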
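The library's side of this model can be sketched in a single process. This is a minimal illustration, not the actual implementation: it assumes everything fits in memory and runs sequentially, whereas the real library spreads the work across many machines. Intermediate pairs are sorted by key and grouped with `itertools.groupby`, so each Reduce() call receives its values through a lazy iterator. The names `run_mapreduce`, `word_map`, and `repeated_words` and the string-typed signatures are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical string-typed signatures, echoing the slide's String/Iterator types.
Pair = Tuple[str, str]
MapFn = Callable[[str, str], Iterable[Pair]]
ReduceFn = Callable[[str, Iterator[str]], Iterable[str]]


def run_mapreduce(inputs: Iterable[Pair], map_fn: MapFn,
                  reduce_fn: ReduceFn) -> List[Pair]:
    """Single-process, in-memory sketch of what the library does around user code."""
    # Map phase: apply the user's Map() to every input key/value pair.
    intermediate: List[Pair] = []
    for key, value in inputs:
        intermediate.extend(map_fn(key, value))

    # Grouping: sort by intermediate key so that equal keys become adjacent.
    intermediate.sort(key=itemgetter(0))

    # Reduce phase: one Reduce() call per distinct intermediate key, with the
    # values handed over through a lazy iterator instead of a list.
    output: List[Pair] = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = (value for _, value in group)
        for result in reduce_fn(key, values):  # typically zero or one result
            output.append((key, result))
    return output


if __name__ == "__main__":
    def word_map(doc: str, text: str) -> Iterable[Pair]:
        return [(word, "1") for word in text.split()]

    def repeated_words(word: str, counts: Iterator[str]) -> Iterable[str]:
        total = sum(int(c) for c in counts)
        return [str(total)] if total > 1 else []  # zero or one output value

    print(run_mapreduce([("d1", "a b a"), ("d2", "b c")], word_map, repeated_words))
    # [('a', '2'), ('b', '2')]
```

Sorting is just one way to bring equal keys together; a hash-based grouping would serve the same purpose in this sketch. The `repeated_words` reducer shows the "zero or one output value" case: it emits nothing for a word seen only once.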
Example: Counting the Number of Word Occurrences (user code)

    map(String key, String value):
      // key: document name
      // value: document contents
      // emits each word plus an associated count of occurrences
      for each word w in value:
        EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      // sums together all counts emitted for a particular word
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));

• The user code is linked together with the MapReduce library
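For comparison, here is a direct Python translation of the word-count user code. It is a sketch: `EmitIntermediate` and `Emit` are modeled as returned lists rather than library calls, `map_fn` and `reduce_fn` are hypothetical names, and treating a "word" as a whitespace-separated token is an assumption.

```python
from typing import Iterator, List, Tuple


def map_fn(key: str, value: str) -> List[Tuple[str, str]]:
    # key: document name (unused); value: document contents.
    # Emits each word with the count "1", like EmitIntermediate(w, "1").
    # Assumption: a "word" is a whitespace-separated token.
    return [(w, "1") for w in value.split()]


def reduce_fn(key: str, values: Iterator[str]) -> List[str]:
    # key: a word; values: an iterator over the counts emitted for it.
    result = 0
    for v in values:
        result += int(v)  # ParseInt(v)
    return [str(result)]  # Emit(AsString(result)): one output value


if __name__ == "__main__":
    pairs = map_fn("doc1", "to be or not to be")
    # Hand-feed reduce_fn the values for one key, standing in for the library.
    print(reduce_fn("to", (v for k, v in pairs if k == "to")))  # ['2']
```

The counts travel as strings ("1", then `str(result)`) to stay close to the slide's String-based signatures.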
Associated Types

    map    (k1, v1)       → list(k2, v2)
    reduce (k2, list(v2)) → list(v2)

• Input keys and values are drawn from a different domain than the output keys and values
• Intermediate keys and values are from the same domain as the output keys and values
• The C++ implementation passes strings to and from the user-defined functions; user code converts between strings and the appropriate types
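The type signatures above can be written down with Python generics. This is an illustrative sketch only: Python type hints stand in for the conceptual types, and `word_map`, `sum_reduce`, and `with_string_values` are hypothetical names. The adapter mimics the string boundary by letting the typed reduce function work on ints while its caller sees only strings.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")

# map:    (k1, v1)       -> list(k2, v2)
MapFn = Callable[[K1, V1], List[Tuple[K2, V2]]]
# reduce: (k2, list(v2)) -> list(v2)
ReduceFn = Callable[[K2, Iterator[V2]], List[V2]]


def word_map(doc_name: str, contents: str) -> List[Tuple[str, int]]:
    # k1 = document name, v1 = contents (input domain);
    # k2 = word, v2 = int count (intermediate domain).
    return [(word, 1) for word in contents.split()]


def sum_reduce(word: str, counts: Iterator[int]) -> List[int]:
    # Output values are ints, i.e. the same domain as the intermediate v2.
    return [sum(counts)]


def with_string_values(reduce_fn: ReduceFn[K2, V2],
                       parse: Callable[[str], V2],
                       fmt: Callable[[V2], str]) -> ReduceFn[K2, str]:
    """Adapt a typed reduce function to a string-in/string-out interface."""
    def wrapped(key: K2, values: Iterator[str]) -> List[str]:
        typed = (parse(v) for v in values)              # strings -> V2
        return [fmt(r) for r in reduce_fn(key, typed)]  # V2 -> strings
    return wrapped


# Annotations recording that the concrete functions fit the generic schema.
_map: MapFn[str, str, str, int] = word_map
_reduce: ReduceFn[str, int] = sum_reduce
```

For example, `with_string_values(sum_reduce, int, str)("a", iter(["1", "2"]))` returns `["3"]`: the reduce logic stays typed, and only the adapter knows about strings.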