MapReduce

MapReduce Powering Hadoop

Overview • What Is MapReduce • How It Divides Work • Example • Conclusion • References

What Is MapReduce • Originally created by Google • Used to query large data sets • Extracts relations from unstructured data • Can draw from many disparate data sources

How It Divides Work http://docs.basho.com/riak/1.3.0/tutorials/querying/MapReduce/

4 Refinements • The general algorithms fit most needs • User-defined tweaks to the Map and Reduce functions handle special problems

4.1 Partitioning Function • Users specify the number of reduce tasks to run (R) • Intermediate keys are assigned to those tasks by a partitioning function • The default function is hash(key) mod R • Sometimes we want related output grouped together, such as grouping web data by domain • We can redefine the partitioning function as hash(Hostname(urlkey)) mod R so that all URLs from one host end up in the same partition
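
The two partitioning functions above can be sketched in Python. This is a minimal illustration, not the interface of any real framework: the names stable_hash, default_partition and host_partition are hypothetical, R = 4 is an arbitrary choice, and CRC32 stands in for whatever hash a real system uses.

```python
import zlib
from urllib.parse import urlparse

R = 4  # number of reduce tasks (arbitrary value for this sketch)


def stable_hash(text: str) -> int:
    # CRC32 is used only because it gives the same result on every run;
    # it is an assumption of this sketch, not the hash a real system uses.
    return zlib.crc32(text.encode("utf-8"))


def default_partition(key: str, r: int = R) -> int:
    # hash(key) mod R
    return stable_hash(key) % r


def host_partition(url_key: str, r: int = R) -> int:
    # hash(Hostname(urlkey)) mod R: every URL from one host maps to one partition
    host = urlparse(url_key).hostname or ""
    return stable_hash(host) % r


urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/",
]

for url in urls:
    print(url, default_partition(url), host_partition(url))
```

With host_partition, the two example.com URLs always receive the same partition number, whereas default_partition may scatter them.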

4.2 Ordering Guarantees • Within each partition, intermediate key/value pairs are always processed in increasing key order • This makes it easy to produce a sorted output file per partition, which supports efficient random-access lookups by key
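
To see why key order helps with lookups, here is a small sketch that sorts one partition's pairs by key and then finds a key by binary search. The sample data and the lookup helper are made up for illustration.

```python
import bisect

# Intermediate pairs for one partition, in arbitrary arrival order (sample data).
partition = [("pear", 3), ("apple", 1), ("fig", 2)]

# Process the pairs in increasing key order.
sorted_pairs = sorted(partition, key=lambda kv: kv[0])
keys = [k for k, _ in sorted_pairs]


def lookup(key):
    # Binary search over the sorted keys; returns None if the key is absent.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return sorted_pairs[i][1]
    return None


print(keys)
print(lookup("fig"), lookup("kiwi"))
```

Because the output is sorted, a lookup needs O(log n) comparisons instead of a full scan.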

4.3 Combiner Function • There is sometimes significant repetition in the intermediate keys • This is usually handled in the Reduce function, but sometimes we want to partially merge the data on the map side before it is sent over the network • The combiner function does this for us, and in some situations yields significant performance gains
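
A word-count sketch shows the effect of a combiner. The function names (map_words, combine, reduce_counts) and the two input splits are hypothetical; the point is only that combining on the map side shrinks the intermediate data without changing the final result.

```python
from collections import Counter, defaultdict


def map_words(text):
    # Emit (word, 1) for every word in one input split.
    return [(word, 1) for word in text.split()]


def combine(pairs):
    # Partially merge the pairs produced by a single map task.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_counts(pairs):
    # Final merge across the output of all map tasks.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


splits = ["the cat the dog", "the end"]
raw = [map_words(s) for s in splits]
combined = [combine(p) for p in raw]

pairs_without = sum(len(p) for p in raw)
pairs_with = sum(len(p) for p in combined)
result = reduce_counts(pair for task in combined for pair in task)

print(pairs_without, pairs_with, result)
```

The saving grows with the amount of key repetition inside each split, which is why the gain is large for some jobs and negligible for others.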

4.4 Input and Output Types • MapReduce can read input data in a number of formats • The way the data is organized for input greatly affects the output • Adding support for a new input type only requires users to implement the reader interface
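
The reader idea can be sketched as a small abstract class with one implementation per input format. Everything here (Reader, TextLineReader, KeyValueReader, the records method) is a hypothetical shape chosen for illustration, not the API of Hadoop or any other system.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator, Tuple


class Reader(ABC):
    """Hypothetical reader interface: turns raw input into key/value records."""

    @abstractmethod
    def records(self) -> Iterator[Tuple[Any, str]]:
        ...


class TextLineReader(Reader):
    # Key is the character offset of the line, value is the line's contents.
    def __init__(self, text: str):
        self.text = text

    def records(self):
        offset = 0
        for line in self.text.splitlines(keepends=True):
            yield offset, line.rstrip("\n")
            offset += len(line)


class KeyValueReader(Reader):
    # Each line holds "key<TAB>value".
    def __init__(self, text: str):
        self.text = text

    def records(self):
        for line in self.text.splitlines():
            key, _, value = line.partition("\t")
            yield key, value


print(list(TextLineReader("ab\ncd").records()))
print(list(KeyValueReader("k1\tv1\nk2\tv2").records()))
```

The same input text yields different records depending on the reader, and a new format only needs another subclass.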

4.5 Side-effects • Sometimes we want to output additional files from the Map or Reduce functions • Users are responsible for making these side effects atomic and idempotent • Tasks that write multiple output files should be deterministic
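
One common way to keep an extra output file safe when a task is re-run is to write it to a temporary file and then rename it into place in a single step. The slides do not spell out this technique, so treat the sketch below, including the helper name write_side_file, as an assumption about typical usage.

```python
import os
import tempfile


def write_side_file(path: str, data: str) -> None:
    # Write to a temporary file in the same directory, then rename it into
    # place so readers never see a half-written file (assumed technique).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # replaces the target in a single step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "side-output.txt")
    write_side_file(target, "extra results\n")
    write_side_file(target, "extra results\n")  # a re-run leaves the same file
    with open(target) as handle:
        print(handle.read(), end="")
```

Running the write twice with the same data leaves exactly one file with the same contents, which is the behaviour a re-executed task needs.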
