
An Introduction to Apache Hadoop MapReduce

An introduction to Apache Hadoop MapReduce: what is it and how does it work? What is the MapReduce cycle and how are jobs managed? Why should it be used, and who are its big users and providers?

semtechs

Presentation Transcript


  1. Apache Hadoop MapReduce • What is it? • Why use it? • How does it work? • Some examples • Big users

  2. MapReduce – What is it? • The processing engine of Hadoop • Developers write Map and Reduce functions • Used for batch processing of big data • Processes huge data volumes in parallel • Fault tolerant • Scalable

  3. MapReduce – Why use it? • Your data is in the terabyte/petabyte range • Your workload is I/O-heavy • The Hadoop framework takes care of: job and task management, failures, storage (HDFS) and replication • You just write the Map and Reduce functions

  4. MapReduce – How does it work? Take word counting as an example: the kind of large-scale text processing that Google originally built MapReduce for.
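Hadoop Streaming lets the map and reduce steps be ordinary scripts that read lines from stdin and write tab-separated key/value lines to stdout, with the framework sorting the map output by key before the reducer sees it. A minimal sketch of word count in that style, assuming Python 3 and simple whitespace tokenisation; the file name wordcount.py and its map/reduce argument are illustrative choices, not part of Hadoop:

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum the counts for each word.

    Assumes the input is already sorted by key, so equal words are adjacent
    (the framework's shuffle/sort guarantees this between map and reduce).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "map" runs the mapper, anything else runs the reducer.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

The job can be tried locally without a cluster: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` plays the part of the shuffle.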

  5. MapReduce – How does it work? • Input data is split into shards (input splits) • Each split is mapped to key/value pairs, e.g. (Bear, 1) • Mapped data is shuffled and sorted by key, so all Bear pairs end up together • Sorted data is reduced, e.g. (Bear, 2) • Final output is stored on HDFS • There may be an extra map-side step (e.g. a combiner) before the shuffle • The JobTracker (Hadoop 1.x) schedules and monitors all tasks in a job • TaskTrackers run the individual map and reduce tasks
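The split, map, shuffle/sort and reduce steps above can be simulated in a single process to make the data flow concrete. This is a sketch, not Hadoop code: the function names are hypothetical, splits are cut by line rather than by HDFS block, nothing runs in parallel, and the optional map-side step is modelled as a combiner that pre-sums counts before the shuffle.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def split_input(lines: List[str], n_splits: int) -> List[List[str]]:
    """Split: cut the input into contiguous shards, one per map task."""
    size = max(1, -(-len(lines) // n_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_task(split: List[str]) -> List[Pair]:
    """Map: turn every word in the split into a (word, 1) pair."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs: List[Pair]) -> List[Pair]:
    """Optional combiner: pre-sum counts on the map side to shrink the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def shuffle(map_outputs: List[List[Pair]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key and order the keys."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for output in map_outputs:
        for word, count in output:
            grouped[word].append(count)
    return {word: grouped[word] for word in sorted(grouped)}


def reduce_task(word: str, counts: List[int]) -> Pair:
    """Reduce: sum all counts for one key."""
    return word, sum(counts)


def run_job(lines: List[str], n_splits: int = 2, use_combiner: bool = True) -> List[Pair]:
    splits = split_input(lines, n_splits)
    map_outputs = [map_task(s) for s in splits]
    if use_combiner:
        map_outputs = [combine(out) for out in map_outputs]
    grouped = shuffle(map_outputs)
    return [reduce_task(word, counts) for word, counts in grouped.items()]
```

With or without the combiner the result is the same; the combiner only reduces the number of pairs that cross the network in the shuffle. That is why it suits associative, commutative operations such as summing, and why Hadoop is free to run it zero or more times.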

  6. MapReduce – Some examples • A colour-coded visual example of the cycle: Split -> Map -> Shuffle -> Reduce
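In the Shuffle step each key must land on exactly one reducer, so that all the Bear pairs meet in the same reduce call. Hadoop's default HashPartitioner does this with the key's hash, masked to a non-negative value, modulo the number of reduce tasks. A minimal sketch of the idea, assuming zlib.crc32 as a stand-in hash (Python's built-in hash() is randomised per process for strings, and Hadoop itself uses the Java key's hashCode(), so real partition numbers will differ); the function names are hypothetical:

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Choose a reducer for a key: (hash & max int) % number of reducers.

    crc32 is a deterministic stand-in for Java's hashCode().
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def shuffle_to_reducers(
    pairs: List[Tuple[str, int]], num_reducers: int
) -> List[Dict[str, List[int]]]:
    """Route every (key, value) pair to one reducer's bucket."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer processes its keys in sorted order.
    return [dict(sorted(bucket.items())) for bucket in buckets]
```

Because the choice depends only on the key and the number of reducers, every map task routes a given word to the same reducer without any coordination between tasks.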

  7. MapReduce – Some examples • A visual example of MapReduce with the JobTracker and TaskTrackers shown alongside the individual map and reduce tasks
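In classic (Hadoop 1.x) MapReduce the JobTracker hands tasks to TaskTrackers and re-runs a task elsewhere when its tracker fails, which is where the fault tolerance comes from. A toy, single-process sketch of that idea; the classes, the round-robin assignment and the max_attempts value are illustrative assumptions, not Hadoop's actual scheduler (which also weighs data locality and tracks TaskTrackers through heartbeats):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class TrackerDown(Exception):
    """Raised when a task is sent to a tracker that has failed."""


@dataclass
class TaskTracker:
    """Hypothetical worker node that runs the tasks assigned to it."""
    name: str
    alive: bool = True
    completed: List[str] = field(default_factory=list)

    def run(self, task_id: str, work: Callable[[], object]) -> object:
        if not self.alive:
            raise TrackerDown(self.name)
        result = work()
        self.completed.append(task_id)
        return result


class JobTracker:
    """Hypothetical coordinator: assigns tasks round-robin and, if a
    tracker is down, reschedules the task on the next tracker."""

    def __init__(self, trackers: List[TaskTracker], max_attempts: int = 4) -> None:
        self.trackers = trackers
        self.max_attempts = max_attempts

    def run_job(self, tasks: Dict[str, Callable[[], object]]) -> Dict[str, object]:
        results: Dict[str, object] = {}
        for i, (task_id, work) in enumerate(tasks.items()):
            for attempt in range(self.max_attempts):
                # Each retry moves on to the next tracker in the list.
                tracker = self.trackers[(i + attempt) % len(self.trackers)]
                try:
                    results[task_id] = tracker.run(task_id, work)
                    break
                except TrackerDown:
                    continue
            else:
                # Every attempt hit a failed tracker: give up on the job.
                raise RuntimeError(f"task {task_id} failed after {self.max_attempts} attempts")
        return results
```

Re-running a map task elsewhere is safe because its input split is still stored on replicated HDFS.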

  8. Hadoop MapReduce – Big users • Users: Facebook, Yahoo, Amazon, eBay • Providers: Amazon, Cloudera, Hortonworks, MapR

  9. Contact Us • Feel free to contact us at www.semtech-solutions.co.nz or info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve them
