
Hadoop MapReduce: Programmer's Perspective

Hadoop MapReduce: Programmer's Perspective. HAMS Technologies, www.hams.co.in, director@hams.co.in, priyank@hams.co.in, vivek@hams.co.in. Hadoop overview.

Presentation Transcript


  1. Hadoop MapReduce: Programmer's Perspective
     HAMS Technologies, www.hams.co.in, director@hams.co.in, priyank@hams.co.in, vivek@hams.co.in

  2. Hadoop overview
     • A framework that makes it easy to write and run applications that process vast amounts of data. Its ecosystem includes components such as MapReduce, HDFS, Hive, HBase and Pig.
     • Yahoo is the biggest contributor. Other major contributors are Facebook, Google and Amazon/A9.
     • Here's what makes it especially useful:
         • Scalable and reliable
         • Easy to implement
         • Efficient
         • Lots of tools available
         • Supports many well-known languages and scripts

  3. How does Hadoop work?
     • MapReduce divides applications into small blocks of work.
     • HDFS creates the desired number of replicas of each data block for reliability and places them on compute nodes around the cluster.
     • MapReduce can then process the data locally, followed by aggregation of the intermediate results.

  4. General flow in the MapReduce architecture
     • Create a clustered network.
     • Load the data into the cluster.
     • Process the data where it is stored with Map (mapper tasks).
     • Aggregate the results with Reduce (reducer task).
     [Diagram: three Local Data blocks, each feeding a Map task; the Map tasks produce Partial Result-1, Partial Result-2 and Partial Result-3, which a single Reduce task combines into the Aggregated Result.]
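The flow on this slide can be sketched in plain Java without a cluster. This is only an illustration under stated assumptions: the class name `MapReduceFlowSketch`, the methods `mapPartition`, `reduce` and `run`, and the sample log lines are all hypothetical, and counting lines that contain "ERROR" is just a stand-in for whatever work the map tasks do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, in-memory illustration of the slide's flow; not Hadoop code.
public class MapReduceFlowSketch {

    // "Map" step: works on one node's local data and returns a partial result.
    // Here the partial result is the number of lines containing "ERROR".
    public static int mapPartition(List<String> localLines) {
        int count = 0;
        for (String line : localLines) {
            if (line.contains("ERROR")) {
                count++;
            }
        }
        return count;
    }

    // "Reduce" step: aggregates all partial results into one final result.
    public static int reduce(List<Integer> partialResults) {
        int total = 0;
        for (int partial : partialResults) {
            total += partial;
        }
        return total;
    }

    // Runs one map per partition, then a single reduce over the partial results.
    public static int run(List<List<String>> partitions) {
        List<Integer> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            partials.add(mapPartition(partition));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        // Three partitions stand in for the three "Local Data" boxes.
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("INFO start", "ERROR disk full"),
            Arrays.asList("ERROR timeout", "ERROR retry failed"),
            Arrays.asList("INFO done"));
        System.out.println("Aggregated result: " + run(partitions)); // prints 3
    }
}
```

In a real cluster each `mapPartition` call would run on the machine that holds that partition, and only the small partial results would travel over the network to the reducer.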

  5. General attributes of the MapReduce architecture
     • Distributed file system (DFS)
     • Data locality
     • Data redundancy for fault tolerance
     • Map tasks are applied to partitioned data and scheduled so that their input blocks are on the same machine.
     • Reduce tasks process the data partitioned by the Map tasks.
     [Diagram: the same Local Data, Map, Partial Result, Reduce, Aggregated Result flow as on slide 4.]

  6. Hadoop is an open-source implementation of the MapReduce architecture, maintained by Apache
     [Diagram: Hadoop consists of HDFS (Hadoop Distributed File System), MapReduce (job trackers) and Hive (Hadoop interactIVE). Master nodes: name node(s) and job tracker node(s). Slave nodes: data node(s), each paired with a tracker node.]

  7. Hadoop Streaming lets you create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
     • HDFS (Hadoop Distributed File System) is a clustered network used to store data. HDFS includes the logic to replicate and track the data blocks. An HDFS write is shown below; data is retrieved from HDFS by following the same steps in reverse.
     [Diagram: a client holding hams.txt (Block-1, Block-2, Block-3) asks the name node, "I have a file containing 3 blocks. Where should I write them?" The name node answers, "OK, write them to data nodes 1, 2 and 3." The blocks are then written to Data Node-1, Data Node-2 and Data Node-3 out of n data nodes, each paired with a tracker node.]
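The name node's role in the write shown above can be illustrated with a toy placement routine. This is a sketch under loud assumptions: the class `BlockPlacementSketch` and the method `placeBlocks` are hypothetical, and the round-robin rule is chosen only for simplicity. It is not the placement policy HDFS actually uses, which also takes the cluster topology into account.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a name node deciding where block replicas go; not HDFS code.
public class BlockPlacementSketch {

    // Returns, for each block number (1-based), the data-node numbers (1-based)
    // that hold a replica. Placement is simple round-robin (an assumption).
    public static Map<Integer, List<Integer>> placeBlocks(
            int blockCount, int dataNodeCount, int replication) {
        if (replication > dataNodeCount) {
            throw new IllegalArgumentException(
                "replication factor cannot exceed the number of data nodes");
        }
        Map<Integer, List<Integer>> placement = new LinkedHashMap<>();
        for (int block = 0; block < blockCount; block++) {
            List<Integer> nodes = new ArrayList<>();
            for (int replica = 0; replica < replication; replica++) {
                // Consecutive replicas go to consecutive nodes, wrapping around.
                nodes.add((block + replica) % dataNodeCount + 1);
            }
            placement.put(block + 1, nodes);
        }
        return placement;
    }

    public static void main(String[] args) {
        // A 3-block file on 3 data nodes, with 2 replicas per block.
        System.out.println(placeBlocks(3, 3, 2));
        // prints {1=[1, 2], 2=[2, 3], 3=[3, 1]}
    }
}
```

Because every block lives on more than one node, losing a single data node in this toy model never loses a block, which is the reliability property the slides attribute to HDFS replication.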

  8. When to use Hadoop
     • Unstructured data that needs analysis
     • Very large amounts of data
     • Write once (or rarely), read many times
     • Multiple modules written in different languages

  9. Kinds of people who develop applications using Hadoop
     1. Hadoop admin/technical person: configures the Hadoop environment, setting up the cluster with the required number of nodes and the details of all data sources.
     2. Hadoop programmer: writes the map and reduce functions that perform the data analysis.
     * Here we take the perspective of the Hadoop programmer.

  10. Map/Reduce is a programming model for efficient distributed computing
      • It works like a Unix pipeline:
          Unix:   cat input | grep | sort | uniq -c | cat > output
          Hadoop: Input | Map | Shuffle & Sort | Reduce | Output
      • A simple model, but a good fit for many applications:
          • Log processing
          • Web index building
          • Count of URL access frequency
          • Reverse web-link graph: list of all source URLs associated with a given target URL
          • Inverted index: produces <word, list(Document ID)> pairs
          • Distributed sort
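To make the Input | Map | Shuffle & Sort | Reduce | Output pipeline concrete, here is a small in-memory sketch of the inverted-index application from the list above. It uses only the Java standard library, not Hadoop; the class `InvertedIndexSketch`, its method names, and the two sample documents are assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory illustration of the pipeline stages; not Hadoop code.
public class InvertedIndexSketch {

    // Map: emit one (word, documentId) pair for every word in the document.
    public static List<Map.Entry<String, String>> map(String documentId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, documentId));
            }
        }
        return pairs;
    }

    // Shuffle & Sort: group all values by key, with the keys in sorted order.
    public static SortedMap<String, List<String>> shuffleAndSort(
            List<Map.Entry<String, String>> pairs) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: turn the values for one word into a sorted list of distinct ids.
    public static List<String> reduce(List<String> documentIds) {
        return new ArrayList<>(new TreeSet<>(documentIds));
    }

    // Input -> Map -> Shuffle & Sort -> Reduce -> Output
    public static SortedMap<String, List<String>> buildIndex(Map<String, String> documents) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            pairs.addAll(map(doc.getKey(), doc.getValue()));
        }
        SortedMap<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : shuffleAndSort(pairs).entrySet()) {
            index.put(group.getKey(), reduce(group.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("d1", "hadoop stores data");
        documents.put("d2", "hadoop processes data");
        System.out.println(buildIndex(documents));
        // prints {data=[d1, d2], hadoop=[d1, d2], processes=[d2], stores=[d1]}
    }
}
```

As with the Unix pipeline, each stage only consumes the output of the previous one, which is what lets Hadoop run the map and reduce stages on different machines.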

  12. Here we need to implement the map and reduce functions and write the code that launches the application
      • Mapper
          • Input: value: a line of the input text
          • Output: key: word, value: 1
      • Reducer
          • Input: key: word, value: set of counts
          • Output: key: word, value: sum
      • Launching program
          • Defines the job
          • Submits the job to the cluster

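  13. Mapper (example for word count)

      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
          // Reused for every emitted pair to avoid allocating new objects
          private final static IntWritable one = new IntWritable(1);
          private Text word = new Text();

          @Override
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              String line = value.toString();
              // Tokens are separated by tab characters
              StringTokenizer tokenizer = new StringTokenizer(line, "\t");
              while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  context.write(word, one); // emit (word, 1)
              }
          }
      }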
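One detail of the mapper is easy to miss: passing "\t" as the delimiter means the line is split on tab characters only, so words separated by spaces stay together in one token. The following standalone sketch uses only `java.util.StringTokenizer`; the class `TokenizerDelimiterSketch`, the helper `tokens`, and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Shows how the delimiter string changes what counts as a "word".
public class TokenizerDelimiterSketch {

    // Collects all tokens of a line for the given set of delimiter characters.
    public static List<String> tokens(String line, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "big data\tbig cluster";
        // Tab only, as in the mapper: spaces do not split tokens.
        System.out.println(tokens(line, "\t"));  // prints [big data, big cluster]
        // Space and tab: every word becomes its own token.
        System.out.println(tokens(line, " \t")); // prints [big, data, big, cluster]
    }
}
```

If the input is ordinary space-separated text, a delimiter string such as " \t" (or the single-argument `StringTokenizer` constructor, which splits on whitespace) is probably closer to what a word count needs.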

  14. Reducer (example for word count)

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
          private IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              // Add up all the counts emitted for this word
              int sum = 0;
              for (IntWritable val : values) {
                  sum += val.get();
              }
              result.set(sum);
              context.write(key, result); // emit (word, sum)
          }
      }

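  15. MapReduce launcher

      // Classes used here come from org.apache.hadoop.conf, org.apache.hadoop.fs,
      // org.apache.hadoop.io, org.apache.hadoop.mapreduce and its lib.input/lib.output packages.
      Configuration conf = new Configuration();
      // Job.getInstance replaces the deprecated new Job(conf, name) constructor
      Job job = Job.getInstance(conf, "wordcount");
      // Tells Hadoop which jar to ship to the cluster (the one containing this class)
      job.setJarByClass(WordCountMap.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      job.setMapperClass(WordCountMap.class);
      job.setReducerClass(Reduce.class);
      job.setInputFormatClass(TextInputFormat.class);
      job.setOutputFormatClass(TextOutputFormat.class);
      // Input and output paths are taken from the second and third arguments
      FileInputFormat.addInputPath(job, new Path(args[1]));
      FileOutputFormat.setOutputPath(job, new Path(args[2]));
      // Submit the job and block until it finishes, printing progress
      job.waitForCompletion(true);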

  16. Running the complete program
      • Build the jar file, either directly in Eclipse or with the jar command.
      • Configure Hadoop.
      • Place the jar file in the appropriate location.
      • Let's move on to the demo :)

  17. Documentation
      • Hadoop Wiki
          • Introduction: http://hadoop.apache.org/core/
          • Getting Started: http://wiki.apache.org/hadoop/GettingStartedWithHadoop
          • Map/Reduce Overview: http://wiki.apache.org/hadoop/HadoopMapReduce
          • DFS: http://hadoop.apache.org/core/docs/current/hdfs_design.html
      • Javadoc: http://hadoop.apache.org/core/docs/current/api/index.html

  18. Thank you
      Kindly send us an email at the addresses below with any suggestions or requests for clarification. We would like to hear from you.
      HAMS Technologies, www.hams.co.in, director@hams.co.in, priyank@hams.co.in, vivek@hams.co.in
