Hadoop Demo Presented by: Imranul Hoque
Topics • Hadoop running modes: stand-alone, pseudo-distributed, cluster • Running MapReduce jobs • Status/logs • Sample MapReduce code
Required Software • Hadoop (release 0.18.3) • http://apache.osuosl.org/hadoop/core/hadoop-0.18.3/hadoop-0.18.3.tar.gz • Java Development Kit (JDK 1.6.0_01) • http://java.sun.com/javase/downloads/index.jsp • Ant (release 1.7.1) • http://apache.inetbridge.net/ant/binaries/apache-ant-1.7.1-bin.tar.gz
Setup • NameNode: sherpa01 • JobTracker: sherpa02 • DataNode/TaskTracker: sherpa05, sherpa06
Assumptions • ssh must be installed and sshd must be running on every node • Shared home directory (NFS) across all nodes in the cluster (makes life easier)
Steps • Install JDK and Ant • Set up passphraseless ssh • Compile Hadoop • Set config parameters • Start Hadoop • Run jobs • Check job status
Passphraseless ssh • On the source host, generate a private/public key pair: ~/.ssh/id_dsa and ~/.ssh/id_dsa.pub • Send the public key to the destination host • On the destination, append the public key to the authorized-key list: ~/.ssh/authorized_keys
Passphraseless ssh (2) • On each of sherpa01, sherpa02, sherpa05, and sherpa06 (i.e., four times): • ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa • cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys • Because the home directory is on NFS, all four nodes share the same ~/.ssh/authorized_keys; modify the hostname entries in authorized_keys accordingly • Add "StrictHostKeyChecking no" in /etc/ssh/ssh_config to turn off the host-key prompt
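Sanity check (a minimal example; host names are the ones from the Setup slide): • ssh sherpa05 hostname • If the key setup worked, this prints sherpa05 without asking for a password or passphrase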
Setting the PATH • export JAVA_HOME=/usr/java/jdk1.6.0_01 • export ANT_HOME=~/ant • export PATH=/usr/java/jdk1.6.0_01/bin:$PATH • export PATH=~/ant/bin:$PATH
Installing and Configuring Hadoop • Extract the tarball • Build with Ant • Modify conf/hadoop-env.sh: • export JAVA_HOME=/usr/java/jdk1.6.0_01 • Inform Hadoop of the masters and slaves • conf/masters • conf/slaves (the DataNode/TaskTracker hosts: sherpa05, sherpa06) • Modify conf/hadoop-site.xml (a minimal sketch follows)
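A minimal conf/hadoop-site.xml for this cluster might look like the following; the port numbers and the replication factor are illustrative assumptions, not values taken from the demo:
<property>
  <name>fs.default.name</name>
  <value>hdfs://sherpa01:9000</value>   <!-- NameNode address; port is an assumed example -->
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>sherpa02:9001</value>          <!-- JobTracker address; port is an assumed example -->
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>                      <!-- assumption: two DataNodes in this setup -->
</property>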
Rack Awareness
<property>
  <name>topology.script.file.name</name>
  <value>conf/fakedns.sh</value>
</property>
• In fakedns.sh: echo /rack_id
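A sketch of what fakedns.sh could contain, assuming every node is simply reported as sitting in the same rack (Hadoop invokes the topology script with one or more host names or IPs as arguments and expects one rack path per argument on stdout):
#!/bin/sh
# Minimal fake topology script: print the same rack path
# for every host name/IP that Hadoop passes as an argument.
for host in "$@"; do
  echo "/rack_id"
done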
Starting Hadoop • Format the NameNode filesystem (on sherpa01): • bin/hadoop namenode -format • From the NameNode (sherpa01): • bin/start-dfs.sh • From the JobTracker (sherpa02): • bin/start-mapred.sh
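To confirm the daemons came up (a quick check, not part of the original demo): • jps on each node should list NameNode on sherpa01, JobTracker on sherpa02, and DataNode/TaskTracker on sherpa05 and sherpa06 • bin/hadoop dfsadmin -report (run from sherpa01) summarizes the live DataNodes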
Running MapReduce • Copy data to HDFS • bin/hadoop dfs -copyFromLocal ~/data gutenberg • Run MapReduce • bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -r 6 gutenberg gutenberg-output • Some HDFS commands • copyToLocal, cat, cp, rm, du, ls, etc.
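To inspect the result once the job finishes (a sketch; the part-NNNNN output file names are the HDFS defaults of that era, so treat them as an assumption): • bin/hadoop dfs -ls gutenberg-output • bin/hadoop dfs -cat gutenberg-output/part-00000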
Job/Node Status • NameNode: • http://sherpa01.cs.uiuc.edu:50001 • JobTracker: • http://sherpa02.cs.uiuc.edu:50002 • Also look at the logs under logs/
WordCount.java • src/examples/org/apache/hadoop/examples/WordCount.java • Map function • Reduce function • Driver function
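A sketch of the map function, written against the old (0.18-era) org.apache.hadoop.mapred API; the class name here is made up, and details may differ slightly from the WordCount.java shipped with the release:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits (word, 1) for every token in the input line;
// the reducer then sums the counts for each word.
public class WordCountMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, ONE);
    }
  }
}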
Shutdown • From the NameNode (sherpa01): • bin/stop-dfs.sh • From the JobTracker (sherpa02): • bin/stop-mapred.sh
Conclusion • For more details: • http://hadoop.apache.org/core/ • http://wiki.apache.org/hadoop/