This Edureka MapReduce Tutorial (MapReduce Tutorial blog: https://goo.gl/W0Rmtd) will help you understand the basic concepts of Hadoop's processing component, MapReduce. Below are the topics covered in this MapReduce Tutorial:

1) What is Hadoop MapReduce?
2) MapReduce In Nutshell
3) Advantages of MapReduce
4) Hadoop MapReduce Approach with an Example
5) Hadoop MapReduce/YARN Components
6) YARN With MapReduce
7) Yarn Application Workflow
8) Running a MapReduce Program

Check our complete Hadoop playlist here: https://goo.gl/ExJdZs
EDUREKA HADOOP CERTIFICATION TRAINING www.edureka.co/big-data-and-hadoop
Agenda for today's Session
1. What is Hadoop MapReduce?
2. MapReduce In Nutshell
3. Advantages of MapReduce
4. Hadoop MapReduce Approach with an Example
5. Hadoop MapReduce/YARN Components
6. YARN With MapReduce
7. Yarn Application Workflow
8. MapReduce Program with Hands On
Hadoop Components
2 main Hadoop Components: Storage and Processing
MapReduce: Data Processing Using Programming
Hadoop MapReduce is the processing component of Apache Hadoop.
It processes data in parallel in a distributed environment.
[Diagram: Big Data → Result]
MapReduce In Nutshell
A Program Model: Map and Reduce functions
Features: Large Scale Distributed Model, Parallel Programming
Used in: Index and Search, Classification, Recommendation, Analytics
Implemented: Google, Apache Hadoop (HDFS, MapReduce, Pig, Hive, HBase)
Design Pattern For: Summarization (Eg: Inverted Index), Classification (Eg: Top N records), Recommendation (Eg: Sort), Analytics (Eg: Join, Selection)
2 Biggest Advantages of MapReduce
Advantage 1: Parallel Processing
Data is processed in parallel
Processing becomes fast
[Diagram: Master, Data, and Slaves A to E]
Advantage 2: Data Locality - Processing to Storage
Moving Data to processing is very costly
In MapReduce, we move processing to Data
[Diagram: Master, Data, and Slaves A to E]
Traditional vs MapReduce Way
Election Votes Counting
Election Votes Casting
Votes are stored at different Booths
Result Centre has the details of all the Booths
[Diagram: Result Centre, Data, and Booths A to E]
Election Votes Counting: Traditional Way
Counting: Traditional Approach
Votes are moved to Result Centre for counting
Moving all the votes to Centre is costly
Result Centre is over-burdened
Counting takes time
[Diagram: Result Centre, Data, and Booths A to E]
Hadoop MapReduce To the Rescue!
Hadoop MapReduce Doesn't Follow This Approach
[Diagram: Result Centre, Data, and Booths A to E]
Election Votes Counting: MapReduce Way
Counting: MapReduce Approach
Votes are counted at individual booths
Booth-wise results are sent back to the Result Centre
Final Result is declared easily and quickly this way
[Diagram: Result Centre, Votes, and Booths A to E]
MapReduce In Detail
MapReduce Way
Anatomy of a MapReduce Program
Map: (K1, V1) → List (K2, V2)
Reduce: (K2, list (V2)) → List (K3, V3)
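The two signatures above can be made concrete with a small in-memory sketch. This is not the Hadoop API: `run_mapreduce`, `max_temp_mapper`, `max_temp_reducer`, and the sample records are hypothetical names and data chosen for illustration. The driver applies the mapper to every (K1, V1) record, groups the emitted values by K2, and hands each (K2, list(V2)) group to the reducer.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to each (K1, V1), group by K2, then reduce each group."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):        # Map: (K1, V1) -> list of (K2, V2)
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):                # keys are processed in sorted order
        output.extend(reducer(k2, groups[k2]))  # Reduce: (K2, list(V2)) -> list of (K3, V3)
    return output


def max_temp_mapper(line_number, line):
    """K1 = line number, V1 = 'city,temperature'; emits (city, temperature)."""
    city, temp = line.split(",")
    return [(city, int(temp))]


def max_temp_reducer(city, temps):
    """K2 = city, list(V2) = temperatures; emits (city, highest temperature)."""
    return [(city, max(temps))]


# Hypothetical input records (illustrative data only).
records = [(0, "Delhi,31"), (1, "Mumbai,29"), (2, "Delhi,35"), (3, "Mumbai,27")]
result = run_mapreduce(records, max_temp_mapper, max_temp_reducer)
print(result)
```

In a real cluster the grouping step is the shuffle and sort performed by the framework between the map and reduce phases; here it is just a dictionary.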
Let us take an example to understand the MapReduce Way
MapReduce Way: Word Count Process
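A word count can be walked through stage by stage in plain Python. The sketch below is an assumption-laden illustration rather than the program used in the session: the input lines and the function names (`map_words`, `shuffle`, `reduce_counts`) are invented for the example, and everything runs in one process instead of on a cluster.

```python
from collections import defaultdict

# Hypothetical input, one string per input split (illustrative data only).
splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]


def map_words(line):
    """Mapping: emit (word, 1) for every word in one split."""
    return [(word, 1) for word in line.split()]


def shuffle(mapped_pairs):
    """Shuffling: group all the 1s that belong to the same word."""
    grouped = defaultdict(list)
    for word, one in mapped_pairs:
        grouped[word].append(one)
    return grouped


def reduce_counts(grouped):
    """Reducing: sum the list of 1s for each word."""
    return {word: sum(ones) for word, ones in grouped.items()}


mapped = [pair for line in splits for pair in map_words(line)]
grouped = shuffle(mapped)
word_counts = reduce_counts(grouped)
print(sorted(word_counts.items()))
```

Each split is mapped independently, which is what lets the map stage run in parallel on different nodes; only the shuffle needs to bring pairs with the same key together.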
Executing a MapReduce Program
MapReduce Using Yarn
YARN: Moving beyond MapReduce
BATCH (MapReduce)
INTERACTIVE (Text)
ONLINE (HBase)
STREAMING (Storm, S4, …)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
OTHER (Search) (Weave..)
Hadoop 2.x Daemons
Hadoop 2.x MapReduce Yarn Components
Client
» Submits a MapReduce Job
Resource Manager
» Cluster Level resource manager
» Long Life, High Quality Hardware
Node Manager
» One per Data Node
» Monitors resources on Data Node
ApplicationMaster
» One per application
» Short life
» Coordinates and Manages MapReduce Jobs
» Negotiates with Resource Manager to schedule tasks; the tasks are started by NodeManager(s)
Container
» Created by NM when requested
» Allocates certain amount of resources (memory, CPU etc.) on a slave node
Job History Server
» Maintains information about submitted MapReduce jobs after their ApplicationMaster terminates
YARN Application Workflow in MapReduce
YARN Workflow
[Diagram: a Resource Manager (Scheduler and Applications Manager, AsM) above a grid of Node Managers; some Node Managers host App Master 1 with Containers 1.1 and 1.2, others host App Master 2 with Containers 2.1, 2.2 and 2.3]
Application Workflow
Execution Sequence (Client, RM, NM, AM):
1. Client submits an application
2. RM allocates a container to start AM
3. AM registers with RM
4. AM requests containers from RM
5. AM notifies NM to launch containers
6. Application code is executed in the container
7. Client contacts RM/AM to monitor the application's status
8. AM unregisters with RM
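The eight steps can be traced with a toy simulation. This is deliberately not the YARN API: the classes `ResourceManager` and `NodeManager`, the function `run_application`, and the task being run are hypothetical stand-ins written only to show the order of the interactions.

```python
class ResourceManager:
    """Toy stand-in for the RM: tracks registered AMs and hands out container ids."""

    def __init__(self):
        self.registered = set()
        self.next_container = 0
        self.log = []

    def allocate(self, count):
        ids = list(range(self.next_container, self.next_container + count))
        self.next_container += count
        return ids


class NodeManager:
    """Toy stand-in for an NM: runs a task function inside a 'container'."""

    def launch(self, container_id, task):
        return task(container_id)


def run_application(rm, nm, app_id, task, num_tasks):
    """Walk through the eight workflow steps in order, logging each one."""
    rm.log.append("1. Client submits " + app_id)
    am_container = rm.allocate(1)[0]
    rm.log.append("2. RM allocates container %d to start AM" % am_container)
    rm.registered.add(app_id)
    rm.log.append("3. AM registers with RM")
    containers = rm.allocate(num_tasks)
    rm.log.append("4. AM requests %d containers from RM" % num_tasks)
    rm.log.append("5. AM notifies NM to launch containers")
    results = [nm.launch(c, task) for c in containers]
    rm.log.append("6. Application code is executed in the containers")
    rm.log.append("7. Client contacts RM/AM to monitor status")
    rm.registered.discard(app_id)
    rm.log.append("8. AM unregisters with RM")
    return results


rm = ResourceManager()
nm = NodeManager()
# Hypothetical task: square the container id (illustrative only).
results = run_application(rm, nm, "app-1", lambda cid: cid * cid, 3)
print("\n".join(rm.log))
```

Note that the first container goes to the ApplicationMaster itself, so the task containers in this toy run are numbered from 1.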
Learning Resources
Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
MapReduce Interview Questions: www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce
Thank You … Questions/Queries/Feedback