A Hadoop Overview
Outline
• Progress Report
• MapReduce Programming
• Hadoop Cluster Overview
• HBase Overview
• Q & A
Progress
• Hadoop setup has been completed.
  • Version 0.19.0, running in standalone mode.
• HBase setup has been completed.
  • Version 0.19.3, running without HDFS.
• Simple MapReduce demonstration.
  • A simple word count program.
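The word count demonstration can be pictured as three steps: map, shuffle, reduce. The sketch below simulates those steps locally in plain Python; it does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

Calling `word_count(["hello hadoop", "hello mapreduce"])` counts "hello" twice and the other two words once each. In a real job the map and reduce functions would run on different nodes and the framework would perform the shuffle.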
Hadoop
• Full name: the Apache Hadoop project.
• Open-source implementation of reliable, scalable distributed computing.
• An aggregation of the following projects (and its core):
  • Avro
  • Chukwa
  • HBase
  • HDFS
  • Hive
  • MapReduce
  • Pig
  • ZooKeeper
Virtual Machine (VM)
• Virtualization
  • All services are delivered through VMs.
  • Allows dynamic configuration and management.
  • Multiple VMs can run on a single commodity machine.
• VMware
HDFS (Hadoop Distributed File System)
• The highly scalable distributed file system of Hadoop.
• Resembles the Google File System (GFS).
• Provides reliability through replication.
• NameNode & DataNode
  • NameNode
    • Maintains file system metadata and the namespace.
    • Provides management and control services.
    • Usually one instance.
  • DataNode
    • Provides data storage and retrieval services.
    • Usually several instances.
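The split between NameNode (metadata) and DataNodes (storage), and the way replication provides reliability, can be sketched with a toy model. Everything here is an assumption made for illustration: the class `ToyNameNode`, its methods, the DataNode names, and the round-robin placement are hypothetical, and real HDFS uses its own placement policy.

```python
class ToyNameNode:
    """Holds metadata only: which DataNodes store each block of each file."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (path, block_index) -> list of DataNode names
        self._next = 0       # round-robin cursor (hypothetical placement policy)

    def add_file(self, path, num_blocks):
        """Record where each block's replicas would be stored."""
        for block in range(num_blocks):
            replicas = [
                self.datanodes[(self._next + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            self.block_map[(path, block)] = replicas
            self._next = (self._next + 1) % len(self.datanodes)

    def locate(self, path, block, failed=()):
        """Return the replicas of a block that are still reachable."""
        return [dn for dn in self.block_map[(path, block)] if dn not in failed]
```

With four DataNodes and three replicas per block, `locate` still returns two live replicas after any single DataNode fails, which is the reliability property the slide refers to.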
MapReduce
• The sophisticated distributed computing service of Hadoop.
• A computation framework.
• Usually runs on top of HDFS.
• JobTracker & TaskTracker
  • JobTracker
    • Manages the distribution of tasks to the TaskTrackers.
    • Handles job submission, monitoring, and control.
  • TaskTracker
    • Manages individual map or reduce tasks on a compute node.
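The JobTracker/TaskTracker relationship can be illustrated with a toy scheduler: one coordinator hands pending tasks to workers that have free slots. This is a simplified sketch, not Hadoop's scheduler; the class `ToyJobTracker`, the `slots_per_tracker` parameter, and the tracker and task names are all hypothetical.

```python
from collections import deque


class ToyJobTracker:
    """Hands out pending tasks to TaskTrackers that have a free slot."""

    def __init__(self, tasktrackers, slots_per_tracker=2):
        self.free_slots = {name: slots_per_tracker for name in tasktrackers}
        self.pending = deque()
        self.running = {}  # task name -> tracker name

    def submit_job(self, tasks):
        """Queue the map and reduce tasks of a submitted job."""
        self.pending.extend(tasks)

    def schedule(self):
        """Assign pending tasks to free slots; return the new assignments."""
        assigned = {}
        for tracker in self.free_slots:
            while self.free_slots[tracker] > 0 and self.pending:
                task = self.pending.popleft()
                self.free_slots[tracker] -= 1
                self.running[task] = tracker
                assigned[task] = tracker
        return assigned

    def task_finished(self, task):
        """Release the slot held by a completed task."""
        tracker = self.running.pop(task)
        self.free_slots[tracker] += 1
```

With two trackers of one slot each and three submitted tasks, the first `schedule()` call places two tasks and the third waits until `task_finished` frees a slot.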
Cluster Makeup
• A Hadoop cluster is usually made up of:
  • Real machines.
    • Not required to be homogeneous.
    • Homogeneity helps maintainability.
  • Server processes.
    • Multiple processes can run on a single VM.
• Master & Slave
  • The node (machine) running the JobTracker or NameNode is the master node.
  • The nodes running a TaskTracker or DataNode are slave nodes.
Administrator Scripts
• The administrator can use the following scripts to start or stop server processes.
• They are located in $HADOOP_HOME/bin:
  • start-all.sh / stop-all.sh
  • start-mapred.sh / stop-mapred.sh
  • start-dfs.sh / stop-dfs.sh
  • slaves.sh
  • hadoop
Configuration
• By default, each Hadoop Core server loads its configuration from several files.
• These files are located in $HADOOP_HOME/conf.
• Identical copies of these files are usually maintained on every machine in the cluster.
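Hadoop configuration files are XML documents made of name/value properties, with site-specific settings layered over defaults. The sketch below shows that layering in plain Python. The two XML snippets are illustrative stand-ins, not copies of real files, and the host `master.example`, the port, and the helper names `parse_properties` and `load_config` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative default settings (values assumed for this example).
DEFAULT_XML = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>fs.default.name</name><value>file:///</value></property>
</configuration>"""

# Illustrative site overrides; the host name and port are hypothetical.
SITE_XML = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://master.example:9000</value></property>
</configuration>"""


def parse_properties(xml_text):
    """Turn a <configuration> document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def load_config(default_xml, site_xml):
    """Apply site overrides on top of the defaults."""
    config = parse_properties(default_xml)
    config.update(parse_properties(site_xml))
    return config
```

Because every node reads the same kind of files, keeping identical copies on each machine means every server process resolves the same effective settings.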
Q & A
• Any questions?