VIRTUALISATION OF HADOOP CLUSTERS
Dr G Sudha Sadasivam, Assistant Professor, Department of CSE, PSGCT

Introduction
• A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance.
• Challenges
  • Partitioning of a machine
  • Concurrent execution of multiple operating systems
  • Isolation of virtual machines from one another
  • Support for heterogeneous applications
  • Low performance overhead
• Xen is a virtual machine monitor for x86 that supports execution of multiple guest operating systems.
• A Xen system is layered: hypervisor, kernel, and user-space applications.

Objective
• Automate the creation and deletion of a virtual cluster for hosting Hadoop using Xen.
• A large physical cluster can be simulated on a few physical machines.
Steps
• The user supplies the configuration by editing configuration files.
• The tool generates the user-specified number of VMs running Hadoop.
• Users can manage the Hadoop file system.
• Users can submit jobs on each physical machine.

Need for virtualisation
• Ability to recover quickly from software problems by saving a copy of the guest image.
• High availability by relocating guests when a server machine is inoperable.
• Dynamic load balancing by migrating guests between server machines.
• Consolidation of many services on one physical machine, each administered independently in its own VM.
• Use of the abundant computational power of the physical machine.
• Minimisation of cost.
• Switching between applications on different operating systems using hypervisors.

HADOOP CLUSTER CONFIGURATION
• The host node is configured as master (NN) and also acts as a slave (DN).
• Each guest node (DN) is configured as a slave.

• The master is the host OS, which acts as job tracker / name node.
• The slave is the guest OS, which acts as task tracker / data node.

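The role assignment on this slide can be sketched in a few lines of Python. This is only an illustration, not the tool described in the slides: the function names and the hostnames `xen-host`, `guest1`, and `guest2` are hypothetical, and the sketch assumes a classic Hadoop layout in which worker hostnames are listed one per line in a `conf/slaves` file.

```python
def plan_roles(host, guests, host_is_slave=True):
    """Map each node name to the Hadoop daemons it should run.

    The host OS is the master (NameNode and JobTracker) and, as on the
    slide, may also act as a slave. Every guest OS is a slave.
    """
    roles = {host: ["NameNode", "JobTracker"]}
    if host_is_slave:
        roles[host] += ["DataNode", "TaskTracker"]
    for guest in guests:
        roles[guest] = ["DataNode", "TaskTracker"]
    return roles


def slaves_file_text(roles):
    """Render the worker list, one hostname per line (assumed conf/slaves format)."""
    workers = [name for name, daemons in roles.items() if "DataNode" in daemons]
    return "\n".join(workers) + "\n"


# Hypothetical hostnames, for illustration only.
roles = plan_roles("xen-host", ["guest1", "guest2"])
print(slaves_file_text(roles), end="")
```

Passing `host_is_slave=False` gives the variant in which the host runs only the master daemons.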
Steps in implementing
• Installation of the Xen kernel
• Creation of the guest OS
• Configuration of the guest OS
• Installation of the Java Development Kit
• Extraction and configuration of the Hadoop cluster
• Creation of an OS image for new guest machines
• Creation and removal of other virtual machines (new guests are created by copying the OS image)

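The last two steps, cloning a prepared OS image for each new guest and removing guests again, might look roughly like the following. This is a minimal sketch under stated assumptions: the file naming scheme (`hadoop-guest1.img`, ...) and both function names are hypothetical, and a real deployment would also have to give each clone its own hostname and network settings.

```python
import shutil
import tempfile
from pathlib import Path


def clone_guest_images(base_image, target_dir, count, prefix="hadoop-guest"):
    """Copy the prepared base OS image once per new guest and return the new paths."""
    base = Path(base_image)
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for i in range(1, count + 1):
        dest = out / f"{prefix}{i}.img"
        shutil.copyfile(base, dest)  # full copy; every guest gets its own disk
        created.append(dest)
    return created


def remove_guest_images(images):
    """Delete the image files of guests that are no longer needed."""
    for image in images:
        Path(image).unlink()


# Usage with a throwaway directory and a stand-in for the real OS image.
with tempfile.TemporaryDirectory() as workdir:
    base = Path(workdir) / "base.img"
    base.write_bytes(b"placeholder image contents")
    images = clone_guest_images(base, Path(workdir) / "guests", count=3)
    print([p.name for p in images])
    remove_guest_images(images)
```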
Automated creation of a Hadoop virtual cluster
• An XML file holds the configuration details of the new VM.

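The slides do not show the XML file itself, so the following is only a guess at how such a file could drive VM creation. The XML element and attribute names are invented for illustration. The generated text uses common Xen guest configuration keys (`name`, `memory`, `vcpus`, `disk`, `vif`); a bootable guest would also need kernel or bootloader settings, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the real tool's XML layout is not given in the slides.
SAMPLE_XML = """
<vm name="hadoop-guest1" memory="512" vcpus="1">
  <disk image="/var/xen/hadoop-guest1.img" device="xvda"/>
  <network bridge="xenbr0"/>
</vm>
"""


def xen_config_from_xml(xml_text):
    """Translate one <vm> description into Xen guest configuration text."""
    vm = ET.fromstring(xml_text)
    name = vm.get("name")
    memory = int(vm.get("memory"))       # memory in MiB
    vcpus = int(vm.get("vcpus", "1"))    # default to one virtual CPU
    disk = vm.find("disk")
    image = disk.get("image")
    device = disk.get("device")
    bridge = vm.find("network").get("bridge")
    lines = [
        f'name = "{name}"',
        f"memory = {memory}",
        f"vcpus = {vcpus}",
        f"disk = ['file:{image},{device},w']",
        f"vif = ['bridge={bridge}']",
    ]
    return "\n".join(lines) + "\n"


print(xen_config_from_xml(SAMPLE_XML), end="")
```

Keeping the translation a pure function from text to text makes it easy to check before any VM is actually started.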
Advantages of automated virtualization in Hadoop
• Effective isolation of the datanode from the load caused by other processes on the machine makes the datanode more responsive and reliable.
• Running multiple virtual machines on each physical machine makes the scheduling units finer-grained, so multiple task trackers can be scheduled on the same machine and the overall utilization of the cluster improves.
• A snapshot of a virtual cluster makes it possible to re-activate the same cluster in the future and resume work from the snapshot (rollback).

Enhancements
• Providing a graphical console for monitoring and managing the virtual cluster.
• Creation and migration of virtual machines for load balancing.
• Enabling snapshots of virtual machines for checkpointing.
• Providing an intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts that virtual machine, increasing reliability.

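The proposed monitoring enhancement amounts to a check-and-restart pass over the cluster. The sketch below shows only that control flow and is not part of the original work: `is_alive` and `restart` are hypothetical callables supplied by the caller (for example, a ping check and a wrapper around the hypervisor's management tool), so no particular Xen command is assumed.

```python
def restart_failed_vms(vm_names, is_alive, restart):
    """Run one monitoring pass and return the names of the VMs that were restarted.

    is_alive(name) -> bool and restart(name) are caller-supplied hooks.
    """
    restarted = []
    for name in vm_names:
        if not is_alive(name):
            restart(name)
            restarted.append(name)
    return restarted


# Usage with an in-memory stand-in for the real cluster state.
state = {"guest1": True, "guest2": False, "guest3": True}


def fake_is_alive(name):
    return state[name]


def fake_restart(name):
    state[name] = True


print(restart_failed_vms(list(state), fake_is_alive, fake_restart))
```

A real monitor would repeat this pass at a fixed interval and would likely cap the number of restart attempts per VM.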
Master as a physical node
• 7 nodes
• Data nodes: 6 virtual nodes
• Name node: 1 physical node

Master as a virtual node
• 7 nodes
• Data nodes: 1 physical node + 5 virtual nodes
• Name node: 1 virtual node