Dolly+ for system management of PC Linux cluster
A. Manabe (CRC@KEK)
Atsushi.Manabe@kek.jp
http://corvus.kek.jp/~manabe

Motivation
• Installing and updating system software on more than 10 PCs was boring and even hard work for me.
• How hard will installation on over 100 PCs be?
• If installation is very fast, you can easily switch the OS from an old version to a new one and back again. This is very convenient for testing a brand-new OS version.
• Fast installation is also useful for system recovery after hard disk trouble.

An idea and Objective
• The installation process:
  • First, install the system on one PC.
  • Then clone its disk image to the other PCs via the network.
  • Config files unique to each node (hostname, IP address and so on) are created and written over the cloned copy on each PC's disk.
• Target:
  • Installation on very many PCs (100~1000) of almost the same spec.
• Objective:
  • Very fast installation; for example, installation on 100 PCs in 10 min.
  • Good scalability with the number of nodes.
  • As little human operation as possible; any that remains is done in a centralized way.

Dolly and Dolly+
Dolly
• A Linux application to copy/clone files and/or disk images among many PCs through a network.
• Dolly was originally developed by the CoPs project at ETH (Switzerland) and is free software.
Dolly+ features
• Transfer/copy of sequential files (no 2 GB size limit) and/or normal files (optionally decompressed and untarred on the fly) via a TCP/IP network.
• Virtual RING network connection topology.
• Pipelining and multi-threading mechanism for speed-up.
• Fail recovery mechanism for robust operation.

Dolly+: How do you start it on Linux
Server side (which has the original file)
  % dollyS [-v] -f config_file
Nodes side
  % dollyC [-v]
Config file example (the labels after "<-" are slide annotations, not part of the file)
  iofiles 3                      <- number of files to transfer
  /dev/hda1 > /tmp/dev/hda1
  /data/file.gz >> /data/file
  boot.tar.Z >> /boot
  server n000.kek.jp             <- server name
  firstclient n001.kek.jp
  lastclient n020.kek.jp
  client 20                      <- number of client nodes
  n001                           <- client names
  n002
  :
  n020
  endconfig                      <- end code
The left of '>' is the input file on the server; the right is the output file on the clients. '>' means Dolly+ does not modify the image. '>>' indicates that Dolly+ should cook (decompress, untar, ...) the file according to its name.

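To make the layout of the config file concrete, here is a minimal Python sketch that reads a file of this shape into a dictionary. It is not Dolly+'s own parser: the grammar is inferred only from the example on the slide, and the function name `parse_dolly_config` and the three-client sample are hypothetical.

```python
EXAMPLE = """\
iofiles 3
/dev/hda1 > /tmp/dev/hda1
/data/file.gz >> /data/file
boot.tar.Z >> /boot
server n000.kek.jp
firstclient n001.kek.jp
lastclient n003.kek.jp
client 3
n001
n002
n003
endconfig
"""


def parse_dolly_config(text):
    """Parse a config laid out like the slide example (assumed grammar)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    cfg = {"iofiles": [], "clients": []}
    i = 0
    while i < len(lines):
        key, _, rest = lines[i].partition(" ")
        if key == "iofiles":
            count = int(rest)
            for entry in lines[i + 1 : i + 1 + count]:
                # Test for ">>" first, because ">" is a substring of it.
                op = ">>" if ">>" in entry else ">"
                src, dst = (part.strip() for part in entry.split(op, 1))
                # The third field records whether the file should be "cooked".
                cfg["iofiles"].append((src, dst, op == ">>"))
            i += count
        elif key in ("server", "firstclient", "lastclient"):
            cfg[key] = rest.strip()
        elif key == "client":
            count = int(rest)
            cfg["clients"] = lines[i + 1 : i + 1 + count]
            i += count
        elif key == "endconfig":
            break
        i += 1
    return cfg


cfg = parse_dolly_config(EXAMPLE)
print(cfg["server"], cfg["clients"], cfg["iofiles"])
```

The count after `iofiles` and `client` tells the reader how many following lines belong to that section, which is why the sketch skips ahead by `count` after each one.
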
Dolly: Virtual Ring Topology
• The server is the host that holds the original image.
• The physical network connection can be whatever you like.
• Logically, Dolly arranges the nodes into a ring chain whose order is specified by Dolly's config file.
• Although each transfer is only between two adjacent nodes, the ring can use the full performance of a switched network with full-duplex ports.
• Good for networks built from many switches.
[Figure: node PCs connected through network hubs/switches; legend: physical connection, logical (virtual) connection.]

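The logical chain can be pictured as a list of (sender, receiver) pairs built from the node order in the config file. The sketch below is only an illustration of that idea; `ring_links` and the host names are hypothetical, and it models the forward data path only (the slides do not say whether the last node connects back to the server).

```python
def ring_links(server, clients):
    """Return the logical (sender, receiver) links, starting at the server."""
    chain = [server] + list(clients)
    return list(zip(chain, chain[1:]))


links = ring_links("n000", ["n001", "n002", "n003"])

# Each host sends to at most one neighbour and receives from at most one,
# so no full-duplex port carries two transfers in the same direction.
senders = [s for s, _ in links]
receivers = [r for _, r in links]
print(links)
print(len(set(senders)) == len(senders), len(set(receivers)) == len(receivers))
```
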
Other network connection possibilities, not supported by Dolly
(Few) Server - (Many) Client model
• The server can be a daemon process (you don't need to start it by hand).
• Performance does not scale with the number of nodes.
• Server bottleneck; network congestion.
Multicasting or Broadcasting
• No server bottleneck.
• Gets the maximum performance from a network whose switch fabric supports multicasting.
• A node failure does not affect the overall process very much, so it could be robust.
• A failed node needs a re-transfer, so speed is governed by the slowest node, as in the RING topology.
• Uses UDP, not TCP, so the application must take care of transfer reliability.

Cascade Topology
• The server bottleneck can be overcome.
• Cannot reach maximum network performance, but is better than the many-to-one topology.
• Weak against node failure: a failure also spreads down the cascade and is difficult to recover from.

PIPELINING & multi threading
[Figure: a file (BOF to EOF) is split into 4 MB chunks numbered 1, 2, 3, ...; successive chunks are in flight at the same time between the server, node 1, node 2 and the next node, with 3 threads running in parallel.]

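The pipelining idea can be tried out in memory with threads and bounded queues standing in for network links. This is a toy sketch, not Dolly+'s code: each node here is a single thread (the slide mentions 3 threads in parallel without naming their roles), the chunk size is 4 bytes instead of 4 MB, and the names `server`, `node` and `clone` are hypothetical.

```python
import queue
import threading

CHUNK_SIZE = 4  # bytes in this toy example; the slide uses 4 MB chunks


def server(data, out_q):
    """Cut the image into chunks and push them to the first node."""
    for start in range(0, len(data), CHUNK_SIZE):
        out_q.put(data[start:start + CHUNK_SIZE])
    out_q.put(None)  # end-of-file marker


def node(in_q, out_q, store):
    """Forward each chunk downstream, then keep a local copy."""
    while True:
        chunk = in_q.get()
        if out_q is not None:
            out_q.put(chunk)  # forwarded before storing, EOF marker included
        if chunk is None:
            break
        store.extend(chunk)


def clone(data, n_nodes):
    """Copy `data` to n_nodes (>= 1) simulated nodes connected in a chain."""
    links = [queue.Queue(maxsize=2) for _ in range(n_nodes)]
    stores = [bytearray() for _ in range(n_nodes)]
    threads = [threading.Thread(target=server, args=(data, links[0]))]
    for i in range(n_nodes):
        out_q = links[i + 1] if i + 1 < n_nodes else None
        threads.append(
            threading.Thread(target=node, args=(links[i], out_q, stores[i]))
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [bytes(s) for s in stores]


copies = clone(b"0123456789abcdef!", 3)
print(copies)
```

Because every node passes a chunk on as soon as it arrives, later nodes start receiving long before the first node has the whole file, which is the effect the figure describes.
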
Fail recovery mechanism
• A single node failure could be a "show stopper" in the RING (= series connection) topology.
• Dolly+ provides an automatic 'short cut' mechanism for node problems.
  • When a node has trouble, the upstream node detects it by a send time-out.
  • The upstream node negotiates with the downstream node for reconnection and re-transfer of a file chunk.
• The RING topology makes this easy to implement.
[Figure: time out; short cutting.]

Re-transfer in short cutting
[Figure: a file (BOF to EOF) in 4 MB chunks passing from the server through node 1 and node 2 to the next node, showing chunk re-transfer when a node is short-cut.]

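The short-cut step can be sketched as bookkeeping on the node chain: remove the failed node, then let its upstream neighbour resume from the first chunk the downstream neighbour is missing. The sketch below is an assumption-laden illustration, not the Dolly+ protocol: the function `short_cut`, the per-node chunk counters and the assumption that the upstream node still holds the chunk to be resent are all hypothetical.

```python
def short_cut(chain, chunks_received, failed):
    """Drop `failed` from the chain and work out where re-transfer resumes.

    chain           -- host names in transfer order, server first
    chunks_received -- number of complete chunks each host already holds
    Returns (new_chain, resume). resume is None if the failed node was the
    last one, else (upstream, downstream, index_of_first_chunk_to_resend).
    """
    idx = chain.index(failed)
    if idx == 0:
        raise ValueError("the server itself cannot be short-cut")
    new_chain = chain[:idx] + chain[idx + 1:]
    if idx == len(chain) - 1:
        return new_chain, None  # nothing downstream needs a re-transfer
    upstream, downstream = chain[idx - 1], chain[idx + 1]
    # Chunks are indexed from 0, so a host holding 6 chunks needs index 6 next.
    return new_chain, (upstream, downstream, chunks_received[downstream])


chain = ["server", "node1", "node2", "node3"]
received = {"server": 9, "node1": 8, "node2": 7, "node3": 6}
new_chain, resume = short_cut(chain, received, "node2")
print(new_chain, resume)
```
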
Performance (measured and expected)
• Measured performance (see the graph on the next page)
  • 1 server - 1 node (Pentium III 1 GHz x 2 CPUs)
    • ATA 100 IDE disk / full-duplex 100Base-TX network ~ MB/s
    • 2 GB image copy ~8.2 MB/s, elapsed time ~230 sec.
  • 1 server - 10 nodes (at the moment I have only 11 PCs available for the test)
    • All nodes are the same type of hardware as above.
    • 2 GB image copy ~720 MB/s in aggregate, elapsed time ~260 sec.
• Thanks to the pipelining mechanism, elapsed time increases only slightly as the number of nodes increases (see the graph on the next page).
• Theoretically, 5 min for 1 node means 10 min for 500 nodes.
• The measured time is for image cloning only; in practice you need about 4 more minutes for the boot process (PXE + kickstart).

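The "5 min for 1 node, 10 min for 500 nodes" estimate is consistent with a simple idealized pipeline model, sketched below. The model is an assumption, not a formula given on the slides: it takes 2 GB as 2048 MB, uses the 4 MB chunk size from the pipelining figure, and assumes every extra node delays the end of the transfer by exactly one chunk time. The function name `pipeline_time` is hypothetical.

```python
def pipeline_time(image_mb, chunk_mb, nodes, one_node_seconds):
    """Idealized elapsed time for a chain of `nodes` receivers.

    The last node finishes (nodes - 1) chunk times after the first one,
    so total time = (chunks + nodes - 1) * time_per_chunk.
    """
    chunks = image_mb / chunk_mb
    per_chunk = one_node_seconds / chunks
    return (chunks + nodes - 1) * per_chunk


# 2 GB image in 4 MB chunks = 512 chunks.
t_1 = pipeline_time(2048, 4, 1, 300)      # 5 min baseline for one node
t_500 = pipeline_time(2048, 4, 500, 300)  # about 592 s, i.e. just under 10 min
t_10 = pipeline_time(2048, 4, 10, 230)    # about 234 s from the measured 230 s
print(t_1, t_500, t_10)
```

With 512 chunks, adding 499 more nodes roughly doubles the number of chunk times, which is where the factor of two comes from. For 10 nodes the model predicts about 234 s from the measured one-node time of 230 s, somewhat below the measured ~260 s, so real transfers carry overhead that this model leaves out.
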
How does dolly+ start after pushing reset button.
1. Set up the kickstart, PXE and DHCP config files and run these servers. Prepare one installed PC. Connect all PCs to the network.
2. Push the reset button of all nodes.
3. The PXE process starts on all nodes.
  3.1) PXE asks the DHCP server for the boot method (local boot, kickstart installation or diskless client), the IP address and the hostname. The default boot method can be set in the DHCP config file, so no keyboard operation on each node is necessary. (Assume you select kickstart installation.)
  3.2) PXE downloads a small OS kernel with kickstart images (~2 MB) by multicast TFTP from the PXE server.
  3.3) Linux runs on all nodes with a RAM disk root.
(On the slides, red items are steps you have to perform; blue items proceed automatically.)

How does dolly+ start after pushing reset button. (2)
4. The kickstart process starts.
  4.1) kickstart gets the IP address and hostname from the DHCP server and keeps them on the local RAM disk.
  4.2) It makes file systems on the nodes' disks according to the kickstart config file.
  4.3) kickstart invokes a 'post' shell script on the nodes.
    4.3.1) The script runs dollyC on the nodes.
5. Start dollyS on the pre-installed machine.
  5.1) After all nodes are ready, you start 'dollyS' on the PC that was installed before this process. You can check the nodes' readiness with the 'ping' command.

How does dolly+ start after pushing reset button. (3)
4. (continued) The kickstart post shell script continues.
  4.4) It overwrites the individual host information (IP address, hostname, fstab, ...) on the cloned local disk from the RAM disk.
  4.5) It runs LILO.
6. Re-configure the DHCP servers so that the nodes boot locally.
7. Push the reset button of all nodes again.
• PXE (Preboot eXecution Environment) http://developer.intel.com/ial/WfM/wfmspecs.htm
  • A standard proposed by Intel for network card firmware for network booting. It requires PXE-compliant NIC hardware.
• Kickstart (RedHat)
  • RedHat's batch installer.

Conclusion
• I have developed a fast installation process suitable for Linux clusters consisting of a massive number of PCs.
• Dolly+ is disk image cloning software and is one part of the installation process proposed here. Cloning a disk image to 10 PCs takes almost the same time as cloning to 1 PC, and high scalability up to installations of several hundred PCs is expected.
• We are using it for a 16-PC cluster and will use it for a cluster of 70 PCs this September.
• Software is available from http://corvus.kek.jp/~manabe/pcf/dolly
• Thank you for reading!