
MapReduce

MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat, Google, Inc. Abstract & Introduction: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function and a reduce function.

Presentation Transcript


  1. MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat, Google, Inc.

  2. Abstract & Introduction
     • MapReduce is a programming model and an associated implementation for processing and generating large data sets
     • Users specify a map function and a reduce function
     • The map function processes a key/value pair and generates a set of intermediate key/value pairs
     • The reduce function merges all intermediate values associated with the same intermediate key

  3. Programming Model
     • The user writes two functions: Map() and Reduce()
     • The MapReduce library groups together all intermediate values associated with the same intermediate key I
     • It passes those values to Reduce() via an iterator
     • Reduce() merges the values to form a possibly smaller set of values
     • Typically zero or one output value is produced per Reduce invocation
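The grouping step described on this slide can be sketched in a few lines of Python. This is a single-process illustration only, not the distributed library itself; the names `group_by_key`, `run_reduce`, and `longest` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every intermediate value under its intermediate key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def run_reduce(
    groups: Dict[str, List[str]],
    reduce_fn: Callable[[str, Iterator[str]], Iterable[str]],
) -> Dict[str, List[str]]:
    """Call reduce_fn once per key, handing it the values through an iterator."""
    output: Dict[str, List[str]] = {}
    for key in sorted(groups):
        output[key] = list(reduce_fn(key, iter(groups[key])))
    return output


def longest(key: str, values: Iterator[str]) -> Iterator[str]:
    """Hypothetical reduce function: keep only the longest value for a key."""
    yield max(values, key=len)


intermediate = [("a", "x"), ("b", "yy"), ("a", "zzz")]
grouped = group_by_key(intermediate)   # {"a": ["x", "zzz"], "b": ["yy"]}
reduced = run_reduce(grouped, longest)  # {"a": ["zzz"], "b": ["yy"]}
```

Here each reduce call returns exactly one value, which matches the typical zero-or-one case named on the slide.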

  4. Example: Counting Number of Word Occurrences (user code)

     map (String key, String value):
       // key: document name
       // value: document contents
       // emits each word plus associated count of occurrences
       for each word w in value:
         EmitIntermediate (w, "1");

     reduce (String key, Iterator values):
       // key: a word
       // values: a list of counts
       // sums together all counts emitted for a particular word
       int result = 0;
       for each v in values:
         result += ParseInt(v);
       Emit (AsString(result));

     User code is linked together with the MapReduce library
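As a runnable counterpart to the pseudocode on this slide, here is a minimal Python sketch. It assumes whitespace-separated words and runs everything in one process; `map_fn`, `reduce_fn`, and `word_count` are illustrative names, and the small driver stands in for the MapReduce library that the user code would normally be linked with.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, str]]:
    """key: document name (unused here); value: document contents."""
    for word in value.split():
        # Emit each word with a count of "1", kept as a string as on the slide.
        yield (word, "1")


def reduce_fn(key: str, values: Iterator[str]) -> str:
    """key: a word; values: an iterator over the counts emitted for that word."""
    result = 0
    for v in values:
        result += int(v)
    return str(result)


def word_count(documents: Dict[str, str]) -> Dict[str, str]:
    """Stand-in driver: run map over every document, group by word, then reduce."""
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: reduce_fn(word, iter(counts)) for word, counts in intermediate.items()}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = word_count(docs)
print(counts["the"])  # prints 3
```

Counts travel as strings, so the reduce function converts them with `int` before summing and back with `str` before emitting, mirroring `ParseInt` and `AsString` in the pseudocode.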

  5. Associated Types

     map (k1, v1) → list (k2, v2)
     reduce (k2, list (v2)) → list (v2)

     • Input keys & values are drawn from a different domain than the output keys & values
     • Intermediate keys & values are from the same domain as the output keys & values
     • User code converts between strings & appropriate types
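The two signatures above can be written down as generic type aliases in Python. The sketch below is an illustration under assumptions: the aliases `Mapper` and `Reducer` and the two example functions are hypothetical, not part of any MapReduce API. The input domain is (line number, line text), while the intermediate and output domain is (first letter, length encoded as a string).

```python
from typing import Callable, List, Tuple, TypeVar

K1 = TypeVar("K1")  # input key type
V1 = TypeVar("V1")  # input value type
K2 = TypeVar("K2")  # intermediate/output key type
V2 = TypeVar("V2")  # intermediate/output value type

# map (k1, v1) -> list (k2, v2)
Mapper = Callable[[K1, V1], List[Tuple[K2, V2]]]
# reduce (k2, list (v2)) -> list (v2)
Reducer = Callable[[K2, List[V2]], List[V2]]


def first_letter_lengths(line_no: int, line: str) -> List[Tuple[str, str]]:
    """Input domain: (int, str). Output domain: (str, str)."""
    return [(word[0].lower(), str(len(word))) for word in line.split()]


def total_length(letter: str, lengths: List[str]) -> List[str]:
    """User code converts strings to ints, sums them, and converts back."""
    return [str(sum(int(n) for n in lengths))]


mapper: Mapper[int, str, str, str] = first_letter_lengths
reducer: Reducer[str, str] = total_length

pairs = mapper(1, "Map merges many")             # [("m", "3"), ("m", "6"), ("m", "4")]
total = reducer("m", [v for _, v in pairs])      # ["13"]
```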

  6. More Examples
