An Introduction to Apache Hadoop MapReduce

Apache Hadoop MapReduce
• What is it?
• Why use it?
• How does it work?
• Some examples
• Big users

MapReduce – What is it?
• Processing engine of Hadoop
• Developers write Map and Reduce functions
• Used for big data batch processing
• Parallel processing of huge data volumes
• Fault tolerant
• Scalable

MapReduce – Why use it?
• Your data is in the terabyte / petabyte range
• You have huge I/O
• The Hadoop framework takes care of
  • Job and task management
  • Failures
  • Storage
  • Replication
• You just write the Map and Reduce functions

MapReduce – How does it work?
Take word counting as an example, something that Google does all of the time.

MapReduce – How does it work?
• Input data is split into shards
• Split data is mapped to key,value pairs, e.g. Bear,1
• Mapped data is shuffled/sorted by key, e.g. Bear
• Sorted data is reduced, e.g. Bear,2
• Final data is stored on HDFS
• There might be an extra map layer before the shuffle
• The JobTracker controls all tasks in a job
• Each TaskTracker runs map and reduce tasks on its node
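The steps above can be sketched in plain Python. This is a minimal single-process illustration of the split, map, shuffle and reduce cycle for word counting, not Hadoop code. The function names and the three-line sample input are assumptions made for the example, and nothing is written to HDFS.

```python
from collections import defaultdict


def split_input(text, num_shards):
    """Split: deal the input lines out into num_shards shards."""
    lines = text.strip().splitlines()
    return [lines[i::num_shards] for i in range(num_shards)]


def map_shard(lines):
    """Map: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_groups(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, num_shards=3):
    """Run the whole cycle: split -> map -> shuffle -> reduce."""
    shards = split_input(text, num_shards)
    mapped = [map_shard(shard) for shard in shards]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, chosen so that Bear is counted twice.
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))
    # prints {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

In a real cluster each shard would be mapped on a different node and the shuffle would move data across the network; here every stage simply runs in turn in one process.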

MapReduce – Some examples
A visual example with colours to show you the cycle: Split -> Map -> Shuffle -> Reduce

MapReduce – Some examples
A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
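The division of work between a job tracker and its task trackers can be sketched as a toy scheduler. This is a rough illustration only: the class names mirror the Hadoop roles, but the methods, the round-robin assignment and the simulated failures are all assumptions made for the example. The real JobTracker does far more, including heartbeats and data-locality-aware scheduling.

```python
class TaskTracker:
    """Hypothetical worker node that runs the tasks it is given."""

    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)  # task ids this node fails on (simulated)

    def run(self, task_id, func, data):
        if task_id in self.fail_on:
            raise RuntimeError(f"{self.name} failed on task {task_id}")
        return func(data)


class JobTracker:
    """Hypothetical coordinator: assigns tasks and retries failures elsewhere."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.log = []  # (task_id, tracker_name, outcome) for each attempt

    def run_tasks(self, func, inputs):
        results = []
        for task_id, data in enumerate(inputs):
            # Try each tracker once, starting from a round-robin position.
            for attempt in range(len(self.trackers)):
                tracker = self.trackers[(task_id + attempt) % len(self.trackers)]
                try:
                    results.append(tracker.run(task_id, func, data))
                    self.log.append((task_id, tracker.name, "ok"))
                    break
                except RuntimeError:
                    self.log.append((task_id, tracker.name, "failed"))
            else:
                raise RuntimeError(f"task {task_id} failed on every node")
        return results


if __name__ == "__main__":
    # node-a is set up to fail task 0, so the job tracker retries it on node-b.
    job = JobTracker([TaskTracker("node-a", fail_on={0}), TaskTracker("node-b")])
    print(job.run_tasks(len, [["Deer", "Bear"], ["Car"]]))  # prints [2, 1]
    print(job.log)
```

The point of the sketch is that the caller only supplies the task function and the inputs; assignment and retry after a failure are handled by the coordinator.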

Hadoop MapReduce – Big users
• Users
  • Facebook
  • Yahoo
  • Amazon
  • eBay
• Providers
  • Amazon
  • Cloudera
  • Hortonworks
  • MapR

Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You can just pay for those hours that you need to solve your problems
