
Hadoop Installation Fully Distributed Mode

Learn step-by-step Hadoop installation and configuration in fully distributed mode on Ubuntu VMs in AWS for big data processing. Covers setting up security groups and SSH connections, installing Java, downloading and configuring Hadoop, and running the WordCount program.

vidrine




Presentation Transcript


  1. Hadoop Installation Fully Distributed Mode Qianwen Ye

  2. Before We Start • 1. Create a few VM instances (Ubuntu is suggested) • 2. Set proper security group constraints • 3. Allow passphraseless SSH connections between them

  3. Security Group Snapshot Inbound Outbound

  4. What I Have: • 4 Ubuntu VMs in AWS • 172.31.11.234 • 172.31.3.56 • 172.31.12.237 • 172.31.14.124 • Passphraseless SSH connection already set up

  5. Overview • Change the /etc/hosts file (optional) • Java Installation • Hadoop Environment Configuration

  6. Change Hosts File • On each VM’s terminal: • Add the following content:
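
The slide's screenshot with the actual entries is not part of this transcript. As a minimal sketch, the fragment below maps the four IPs from slide 4 to host names. The names `master` and `slave1` to `slave3` are hypothetical, and which IP plays the master role is an assumption, since the slides do not say. The script writes to a scratch file rather than to /etc/hosts.

```shell
# Scratch file for review; nothing here touches /etc/hosts directly.
HOSTS_FRAGMENT="${HOSTS_FRAGMENT:-./hosts.fragment}"

# Host names are hypothetical; the IP-to-role mapping is an assumption.
cat > "$HOSTS_FRAGMENT" <<'EOF'
172.31.11.234 master
172.31.3.56   slave1
172.31.12.237 slave2
172.31.14.124 slave3
EOF

cat "$HOSTS_FRAGMENT"
```

After review, the fragment can be appended on each VM with `sudo tee -a /etc/hosts < hosts.fragment`.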

  7. Change Hosts File • Then we can use the following command to connect to each other:

  8. Install Java on each VM • Install Java

  9. Install Java on each VM • Configure JAVA HOME
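
The two Java slides can be sketched as follows. The slides do not name a Java version, so the package `openjdk-8-jdk` and the path `/usr/lib/jvm/java-8-openjdk-amd64` are assumptions that fit a typical 64-bit Ubuntu VM. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Assumed package name and install path; adjust to the Java version in use.
JAVA_PACKAGE="${JAVA_PACKAGE:-openjdk-8-jdk}"
JAVA_HOME_GUESS="${JAVA_HOME_GUESS:-/usr/lib/jvm/java-8-openjdk-amd64}"

install_java() {
  run sudo apt-get update
  run sudo apt-get install -y "$JAVA_PACKAGE"
  run java -version
}

install_java

# Line to add to the shell profile so JAVA_HOME is set at login.
echo "export JAVA_HOME=$JAVA_HOME_GUESS"
```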

  10. Download Hadoop: Master Node Only • Go to the Hadoop download page • http://hadoop.apache.org/releases.html • Find the link for the binary download

  11. Download Hadoop: Master Node Only • Download and unzip it
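
A minimal sketch of the download step, assuming Hadoop 2.7.3 fetched from the Apache archive. The version is a guess (the slides do not state one); the later mention of `mapred-site.xml.template` and `slaves` only suggests a 2.x release. Commands are printed unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Assumed version; replace with the release chosen on the download page.
HADOOP_VERSION="${HADOOP_VERSION:-2.7.3}"
TARBALL="hadoop-$HADOOP_VERSION.tar.gz"
URL="https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/$TARBALL"

download_hadoop() {
  run wget "$URL"
  # Unpack into the home directory, giving ~/hadoop-<version>.
  run tar -xzf "$TARBALL" -C "$HOME"
}

download_hadoop
```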

  12. Configure ~/.bash_profile • For all VMs:
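
The profile entries from the slide's screenshot are not in the transcript. The sketch below writes plausible entries to a scratch file; both paths are assumptions (Java 8 in the usual Ubuntu location, Hadoop 2.7.3 unpacked in the home directory).

```shell
# Scratch file for review; append it to ~/.bash_profile once the paths are right.
PROFILE_FRAGMENT="${PROFILE_FRAGMENT:-./bash_profile.fragment}"

# The quoted heredoc keeps $HOME, $PATH and $HADOOP_HOME unexpanded in the file.
cat > "$PROFILE_FRAGMENT" <<'EOF'
# Assumed locations; adjust to where Java and Hadoop actually live.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

cat "$PROFILE_FRAGMENT"
```

Adding both `bin` and `sbin` to `PATH` makes `hadoop`, `hdfs`, and the `start-*.sh` scripts available without full paths. Run `source ~/.bash_profile` after appending.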

  13. Configure Hadoop: Master Node Only • Hadoop’s directory • Files that need to be modified • core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml • hadoop-env.sh • slaves, masters

  14. core-site.xml
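
The slide's file contents are missing from the transcript. A minimal sketch of a `core-site.xml` for a cluster like this one sets `fs.defaultFS` to the NameNode address. The host name `master` and port 9000 are assumed values. The file is written to a scratch directory, to be copied into `$HADOOP_HOME/etc/hadoop/` after review.

```shell
# Scratch directory so the sketch does not overwrite a live configuration.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NameNode address; "master" and port 9000 are assumed values. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/core-site.xml"
```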

  15. hdfs-site.xml
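
As a sketch of what `hdfs-site.xml` might hold here: a replication factor and the local storage directories. The value 3 assumes three DataNodes, and the `/home/ubuntu/hdfs/...` paths are hypothetical. None of these values come from the slides.

```shell
# Scratch directory so the sketch does not overwrite a live configuration.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Copies kept of each block; 3 assumes three DataNodes. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Hypothetical local paths for NameNode metadata and DataNode blocks. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/ubuntu/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/ubuntu/hdfs/datanode</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/hdfs-site.xml"
```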

  16. mapred-site.xml.template
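
Hadoop 2.x releases ship only `mapred-site.xml.template`, which is usually copied to `mapred-site.xml` before editing. The sketch below writes the one setting such a file typically needs on a YARN cluster; the slide's actual contents are not in the transcript, so this is an assumption about what it showed.

```shell
# Scratch directory so the sketch does not overwrite a live configuration.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

# In a real install, start from the template:
#   cp mapred-site.xml.template mapred-site.xml
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/mapred-site.xml"
```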

  17. yarn-site.xml
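
A minimal `yarn-site.xml` sketch under the same assumptions: the ResourceManager runs on a host named `master` (hypothetical), and NodeManagers enable the shuffle service that MapReduce needs. The slide's own values are not recoverable from the transcript.

```shell
# Scratch directory so the sketch does not overwrite a live configuration.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Host running the ResourceManager; "master" is an assumed name. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Auxiliary shuffle service required by MapReduce on YARN. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/yarn-site.xml"
```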

  18. hadoop-env.sh
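
The usual edit in `hadoop-env.sh` is to give `JAVA_HOME` an explicit path, because the daemons are started over SSH and may not inherit the login environment. The sketch below assumes that is what the slide showed. It works on a scratch copy, creates a one-line stand-in if no file is present, and uses an assumed Java 8 path.

```shell
# Scratch directory so the sketch does not overwrite a live configuration.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"
ENV_FILE="$CONF_DIR/hadoop-env.sh"

# Stand-in for the shipped file so the sketch runs outside a real install.
[ -f "$ENV_FILE" ] || echo 'export JAVA_HOME=${JAVA_HOME}' > "$ENV_FILE"

# Assumed Java location; adjust to the installed JDK.
JAVA_HOME_VALUE="${JAVA_HOME_VALUE:-/usr/lib/jvm/java-8-openjdk-amd64}"

# Replace the JAVA_HOME line with an explicit path, keeping a .bak backup.
sed -i.bak "s|^export JAVA_HOME=.*|export JAVA_HOME=$JAVA_HOME_VALUE|" "$ENV_FILE"

grep '^export JAVA_HOME=' "$ENV_FILE"
```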

  19. Masters and slaves • Slaves • Master
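
Both files are plain lists of host names, one per line. The sketch below uses the hypothetical names `master` and `slave1` to `slave3`; the slide's real entries are not in the transcript. In Hadoop 2.x the start scripts read `slaves` to find the worker nodes; the `masters` file is written only because the slides list it.

```shell
# Scratch directory so the sketch does not overwrite a live configuration.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

# Hypothetical host names, one per line.
printf '%s\n' slave1 slave2 slave3 > "$CONF_DIR/slaves"
printf '%s\n' master > "$CONF_DIR/masters"

echo "slaves:";  cat "$CONF_DIR/slaves"
echo "masters:"; cat "$CONF_DIR/masters"
```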

  20. Send Hadoop to all other nodes
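
One way to send the configured directory to the other nodes is `scp` in a loop, sketched below. The worker names, the Hadoop directory, and the assumption that every node uses the same home path are all hypothetical. Commands are printed unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Assumed install directory and hypothetical worker host names.
HADOOP_DIR="${HADOOP_DIR:-$HOME/hadoop-2.7.3}"
WORKERS="${WORKERS:-slave1 slave2 slave3}"

send_hadoop() {
  for host in $WORKERS; do
    # Copy the whole configured directory to the same path on each worker.
    run scp -r "$HADOOP_DIR" "$host:$HOME/"
  done
}

send_hadoop
```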

  21. Format Namenode and Start Hadoop

  22. Processes on Master node and Slave node
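
The format, start, and process-check steps can be sketched with the standard Hadoop 2.x commands, run on the master node with `$HADOOP_HOME/bin` and `$HADOOP_HOME/sbin` on the `PATH`. In a typical layout, `jps` on the master lists NameNode, SecondaryNameNode, and ResourceManager, and on each slave DataNode and NodeManager; the slides' screenshots are not available to confirm their exact output. Commands are printed unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

start_cluster() {
  # Format once only: reformatting erases existing HDFS metadata.
  run hdfs namenode -format
  # Start the HDFS daemons, then the YARN daemons.
  run start-dfs.sh
  run start-yarn.sh
  # List the running Java processes on this node.
  run jps
}

start_cluster
```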

  23. Example: WordCount

  24. WordCount: Map

  25. WordCount: Reduce

  26. WordCount: Main
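
The Map, Reduce, and Main slides were code screenshots that did not survive in the transcript. As a stand-in, the script below writes the widely used WordCount example from the Apache Hadoop MapReduce tutorial to `WordCount.java`. The class names `TokenizerMapper` and `IntSumReducer` come from that tutorial, and whether the slides used the same code is an assumption.

```shell
# Write the tutorial-style WordCount source to the current directory.
cat > WordCount.java <<'EOF'
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Main: wire mapper, combiner and reducer; args are input and output paths.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
EOF

echo "wrote WordCount.java"
```

The reducer doubles as the combiner because summing counts is associative, so partial sums on the map side do not change the final totals.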

  27. Compile WordCount and make jar package
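
A sketch of the compile and packaging step, following the approach in the Apache Hadoop MapReduce tutorial. It assumes a source file named `WordCount.java` in the current directory, a JDK 8 that still ships `lib/tools.jar`, and the hypothetical jar name `wc.jar`. Commands are printed unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

build_wordcount() {
  # Let the hadoop launcher find the Java compiler (JDK 8 layout assumed).
  export HADOOP_CLASSPATH="${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}/lib/tools.jar"
  # Compile with Hadoop's own classpath already in place.
  run hadoop com.sun.tools.javac.Main WordCount.java
  # Package the outer class and its nested classes; wc.jar is a made-up name.
  run jar cf wc.jar WordCount*.class
}

build_wordcount
```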

  28. Prepare Input
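
Preparing input usually means creating an HDFS directory and uploading one or more text files. The sketch below creates a small local sample and prints the upload commands; the file name `sample.txt` and the HDFS path are hypothetical. Set `DRY_RUN=0` to run the commands against the cluster.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical HDFS input directory.
INPUT_DIR="${INPUT_DIR:-/user/ubuntu/wordcount/input}"

# Small local sample file to count words in.
printf 'hello hadoop\nhello world\n' > sample.txt

prepare_input() {
  run hdfs dfs -mkdir -p "$INPUT_DIR"
  run hdfs dfs -put sample.txt "$INPUT_DIR"
}

prepare_input
```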

  29. Execute WordCount Program

  30. Check Result
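
Running the job and reading its result can be sketched as below. The jar name `wc.jar`, the main class `WordCount`, and both HDFS paths are assumptions. The output directory must not exist before the job starts, and with a single reducer the counts land in `part-r-00000`. Commands are printed unless `DRY_RUN=0` is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical HDFS paths; the output directory must not exist yet.
INPUT_DIR="${INPUT_DIR:-/user/ubuntu/wordcount/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/user/ubuntu/wordcount/output}"

run_wordcount() {
  # Submit the job: jar, main class, input path, output path.
  run hadoop jar wc.jar WordCount "$INPUT_DIR" "$OUTPUT_DIR"
  # List the output files, then print the word counts.
  run hdfs dfs -ls "$OUTPUT_DIR"
  run hdfs dfs -cat "$OUTPUT_DIR/part-r-00000"
}

run_wordcount
```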

  31. Thank you!
