Tutorial: To run the MapReduce EEMD code with Hadoop on FutureGrid • by Rewati Ovalekar
Step 1: • Code is available at: http://code.google.com/p/cyberaide/ • Download the code from: http://code.google.com/p/cyberaide/source/browse/#svn%2Ftrunk%2Fproject%2Fspring2011%2FEEMDAnalysis%2FEEMDJava
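For instance, the EEMDJava folder can be checked out directly with Subversion (the checkout URL below is the standard Google Code form of the browse link above; the local directory name is just an example):
    svn checkout http://cyberaide.googlecode.com/svn/trunk/project/spring2011/EEMDAnalysis/EEMDJava EEMDJava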
Step 2: • Create a FutureGrid account • For further details, refer to: https://portal.futuregrid.org/tutorials (FutureGrid Tutorial)
Step 3: • Login to FutureGrid • ssh username@india.futuregrid.org • A welcome message will be displayed on successful login
Step 4: • Create a jar file of the EEMD code • Step 5: • To transfer the jar file and the input file (a full example session is shown below): • sftp username@india.futuregrid.org • put /../filepath
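A complete session might look like this (EEMDHadoop.jar and input.txt are example names; jar cf packages the compiled classes, here assumed to live in a bin/ directory):
    jar cf EEMDHadoop.jar -C bin/ .
    sftp username@india.futuregrid.org
    sftp> put EEMDHadoop.jar
    sftp> put input.txt
    sftp> quit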
Step 6: • In order to run Hadoop on FutureGrid, create a Eucalyptus account • For further details, refer to: https://portal.futuregrid.org/tutorials/eucalyptus • Step 7: • Once the account is approved, load the Eucalyptus tools: module load euca2ools
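Step 8 below assumes a keypair named after your username already exists; if not, one can be created with euca2ools (the keypair name here is an example):
    euca-add-keypair username > username.private
    chmod 0600 username.private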
Step 8: • Make sure that the jar file and the input file are in the same directory as the username.private key • Run the image which has Hadoop on it: euca-run-instances -k rovaleka -t c1.xlarge emi-D778156D • -k indicates the key name • -t indicates the instance type • emi-D778156D indicates the image name • -n indicates the number of instances to run
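For example, a two-instance run with the same instance type and image would be (key name as created earlier):
    euca-run-instances -k username -t c1.xlarge -n 2 emi-D778156D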
Step 8 (contd.): • Check the status using: euca-describe-instances • Keep checking until the status is 'running'; once it is, you can log in and run Hadoop. The output will look like the example below:
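The listing looks roughly like the following; the reservation id, instance id, IP addresses, and trailing fields here are illustrative:
    RESERVATION  r-4A9B0812  username  default
    INSTANCE     i-4F8806AF  emi-D778156D  149.165.146.171  10.0.5.66  running  rovaleka  ...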
Step 9: • Transfer the input file and the jar file to the required VM using: scp -i username.private filename root@149.165.146.171:/ (Make sure that the address is the same as the address assigned to you, else it will ask for a password) • Login using: ssh -i username.private root@149.165.146.171 (Make sure the address is the same)
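With the example file names used earlier, the two commands would be:
    scp -i username.private EEMDHadoop.jar input.txt root@149.165.146.171:/
    ssh -i username.private root@149.165.146.171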
Step 10: • A welcome message will be displayed on successful login • Retrieve the transferred files and move them into the Hadoop folder: cd /.. mv filename /opt/hadoop-0.20.2 cd /opt/hadoop-0.20.2 • SINGLE NODE: the steps below cover running Hadoop on a single node
Step 11: • To start Hadoop: cd /opt/hadoop-0.20.2 bin/start-all.sh • To check that everything has started: jps
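On a single node, a successful start shows roughly the following processes in the jps output (process ids omitted; these are the standard Hadoop 0.20 daemons):
    NameNode
    DataNode
    SecondaryNameNode
    JobTracker
    TaskTracker
    Jps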
Step 12: • Transfer the input file to HDFS: bin/hadoop dfs -copyFromLocal inputfile name_in_HDFS • To check if it is present on HDFS: bin/hadoop dfs -ls • NOTE: The input file needs to be transferred again every time Hadoop is started
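With the example input file from earlier, copying it to HDFS under the same name would be:
    bin/hadoop dfs -copyFromLocal input.txt input.txt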
Step 13: • To run the code: bin/hadoop jar [jarFile] EEMDHadoop [inputfilename] [required_output_file]
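Substituting the example names used above (the jar and file names are illustrative; EEMDHadoop is the class name given in the command template):
    bin/hadoop jar EEMDHadoop.jar EEMDHadoop input.txt output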
Step 14: • Retrieve the output: bin/hadoop dfs -copyToLocal [outputFileName] [outputfileNameToBeGiven] (the output will be available in the part-00000 file) • To check the logs and debug the code, go to the folder logs/userlogs
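The output can also be inspected directly on HDFS without copying it out (directory name as in the example run above):
    bin/hadoop dfs -cat output/part-00000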
Step 15: • Stop Hadoop: bin/stop-all.sh exit