An introduction to Apache Hadoop MapReduce: what is it and how does it work? What is the MapReduce cycle and how are jobs managed? Why should it be used, and who are the big users and providers?
Apache Hadoop MapReduce • What is it? • Why use it? • How does it work? • Some examples • Big users
MapReduce – What is it? • The processing engine of Hadoop • Developers write Map and Reduce functions, which Hadoop runs as a job • Used for big data batch processing • Parallel processing of huge data volumes • Fault tolerant • Scalable
MapReduce – Why use it? • Your data is in the terabyte / petabyte range • You have very heavy I/O • The Hadoop framework takes care of: job and task management, failures, storage, replication • You just write the Map and Reduce functions
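To illustrate "you just write Map and Reduce", here is a minimal sketch of the two pieces a developer would supply, written in the line-oriented style that Hadoop Streaming uses (tab-separated key and value on each line). Word counting is used as the illustration; the function names `mapper` and `reducer` are assumptions made for this sketch and are not part of any Hadoop API. Sorting the mapper output by key, which the framework does between the two steps, is imitated here with `sorted`.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word. Records must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local stand-in for what the framework does between the two steps:
# it sorts the mapper output by key before handing it to the reducer.
records = sorted(mapper(["Bear Car", "Bear"]))
counts = list(reducer(records))
```

Everything else (splitting the input, scheduling tasks, retrying failures, storing and replicating data) is left to the framework, which is the point the slide makes.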
MapReduce – How does it work? Take word counting as an example, something that a search engine such as Google does all of the time.
MapReduce – How does it work? • Input data is split into shards • Each shard is mapped to key,value pairs, e.g. Bear,1 • Mapped data is shuffled/sorted by key, e.g. Bear • Sorted data is reduced, e.g. Bear,2 • Final data is stored on HDFS • There may be an extra processing step on the map output before the shuffle • The JobTracker controls all tasks in a job • A TaskTracker controls the map and reduce tasks on its node
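The cycle above can be imitated in a few lines of plain Python. This is a single-process sketch of the data flow only, not Hadoop code: the function names, the number of shards and the sample text are all made up for illustration, and there is no HDFS, JobTracker or TaskTracker involved.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Split: deal the input lines out into num_splits shards."""
    lines = text.splitlines()
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(shard):
    """Map: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for line in shard for word in line.split()]


def shuffle_phase(mapped_shards):
    """Shuffle/sort: collect the values for each key, keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, num_splits=3):
    """Run Split -> Map -> Shuffle -> Reduce over the text."""
    shards = split_input(text, num_splits)
    mapped = [map_phase(shard) for shard in shards]
    grouped = shuffle_phase(mapped)
    return reduce_phase(grouped)


if __name__ == "__main__":
    # Made-up sample input for illustration.
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))
    # prints {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

In a real cluster each shard would be mapped on a different node and the shuffle would move data across the network; here the list comprehension over `shards` simply stands in for that parallelism.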
MapReduce - Some examples A visual example with colours to show you the cycle: Split -> Map -> Shuffle -> Reduce
MapReduce - Some examples A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
Hadoop MapReduce – Big users • Users: Facebook, Yahoo, Amazon, eBay • Providers: Amazon, Cloudera, Hortonworks, MapR
Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You can just pay for those hours that you need to solve your problems