TOOLS FOR BIG DATA
Sandeep Prasad, Dipojjwal Ray
GROUP 7
Objectives...
• Apache Hadoop
• Successful installation of Apache Hadoop v1.0.3 and v1.0.4
• WordCount with Hadoop MapReduce
• Estimating the value of 'Pi' with Hadoop MapReduce
• MapReduce and HDFS
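Apache Hadoop...
• The name is sometimes expanded as "High-Availability Distributed Object-Oriented Platform"
• Open source
• Run here as a pseudo-distributed, single-node cluster
• Originated within the Apache Lucene project (as part of Nutch); now a top-level Apache project
• Handles petabytes of data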
Installation of Hadoop v1.0.3 & v1.0.4...
• Release date v1.0.3: May 16, 2012
• Release date v1.0.4: October 12, 2012
• OS: Ubuntu 12.04
• Prerequisites: Sun Java, a dedicated hduser account
• Configuration
Examples...
• WordCount example:
$ /bin/hadoop jar hadoop-1.0.3-examples.jar wordcount file01.txt
• Estimation of 'Pi':
$ /bin/hadoop jar hadoop-1.0.3-examples.jar pi (x) (y)
x = number of maps
y = samples per map
Runtime: 2.25 seconds (x = 10, y = 100)
Estimated value: 3.1480000000000
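The 'Pi' estimate can be illustrated locally without a cluster. The sketch below is not the code of the bundled Hadoop example; it assumes the estimator samples points in a unit square with a Halton sequence (bases 2 and 3) and counts how many fall inside the inscribed circle. The function names `halton`, `map_task`, and `estimate_pi` are hypothetical, chosen only for this illustration.

```python
from math import pi

def halton(index, base):
    """Radical inverse of `index` in `base` (one coordinate of a Halton point)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result

def map_task(offset, num_samples):
    """One 'map': count sample points inside and outside the inscribed circle."""
    inside = 0
    for i in range(offset + 1, offset + num_samples + 1):
        # Shift the point so the circle is centred at the origin (radius 0.5).
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    return inside, num_samples - inside

def estimate_pi(num_maps, samples_per_map):
    """The 'reduce': sum the per-map counts and scale the area ratio by 4."""
    counts = [map_task(m * samples_per_map, samples_per_map)
              for m in range(num_maps)]
    inside = sum(c[0] for c in counts)
    total = num_maps * samples_per_map
    return 4.0 * inside / total

if __name__ == "__main__":
    # Same parameters as on the slide: x = 10 maps, y = 100 samples per map.
    print(estimate_pi(10, 100), "vs", pi)
```

With only 10 × 100 = 1000 sample points the estimate is coarse, which is consistent with the slide's result agreeing with π to only about two decimal places; more maps or more samples per map tighten it.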
MapReduce & HDFS...
• Divide-and-conquer approach
• The Map() and Reduce() functions have their roots in functional programming
• JobTracker and TaskTracker
• NameNode and DataNode
• Hadoop Distributed File System
• Java framework
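The functional-programming roots of Map() and Reduce() can be sketched in a few lines of plain Python. This is a single-process illustration of the word count idea, not Hadoop's API; `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names, and the grouping step stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict
from functools import reduce

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: fold the list of counts for one word into a single total."""
    return word, reduce(lambda a, b: a + b, counts, 0)

def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello mapreduce"]))
```

Because each line is mapped independently and each word is reduced independently, both phases can be split across many machines, which is the divide-and-conquer aspect listed above.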
References...
• http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster
• http://lintool.github.io/Cloud9/
• Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce (book)
• http://hadoop.apache.org/releases.html
• http://www.apache.org/dyn/closer.cgi/hadoop/co
• Hadoop is a framework written in Java
• HDFS is a highly fault-tolerant distributed file system
• The JobTracker web UI provides general job statistics for the Hadoop cluster, lists running, completed, and failed jobs, and links to a job history log file
• The TaskTracker web UI shows running and non-running tasks