
Administration Tools for Managing Large Scale Linux Cluster



Presentation Transcript


  1. Administration Tools for Managing Large Scale Linux Cluster. CRC, KEK, Japan. S. Kawabata, A. Manabe

  2. Linux PC Clusters in KEK

  3. PC Cluster 2: Pentium III 800 MHz, 80 CPUs (40 nodes). PC Cluster 1: Pentium III Xeon 500 MHz, 144 CPUs (36 nodes).

  4. PC Cluster 3: Pentium III Xeon 700 MHz, 320 CPUs (80 nodes).

  5. PC Cluster 4: 1U servers, Pentium III 1.2 GHz, 256 CPUs (128 nodes).

  6. PC Cluster 5: 3U blade servers, LP (low-power) Pentium III 700 MHz, 40 CPUs (40 nodes).

  7. PC clusters • More than 400 nodes already installed, counting only middle-sized and larger PC clusters. • Each PC cluster is managed by an individual user group. • A major experiment group plans to install several hundred blade-server nodes this year.

  8. Center Machine • The KEK Computer Center plans to have more than 1000 nodes in the near future (~2004). • The system will be installed under a ~4-year rental contract. • It will be shared among many user groups (not dedicated to a single group). • System partitioning will vary according to the groups' demands for CPU power.

  9. PC cluster for system R&D • Fujitsu TS225, 50 nodes • Pentium III 1 GHz x 2 CPUs • 512 MB memory • 31 GB disk • 100BaseTX x 2 • 1U rack-mount model • RS232C x 2 • Remote BIOS setting • Remote reset/power-off

  10. Necessary admin tools • Installation/update • Command execution • Configuration • Status monitoring

  11. Installation tool

  12. Installation tool: image cloning. Install the system and applications by copying a disk partition image to the nodes.

  13. Installation tool: package-based. (Diagram: a package server, with a package information DB and a package archive, serving the client nodes.)

  14. Remote Installation via NW • Cloning disk image • SystemImager (VA) http://systemimager.sourceforge.net/ • CATS-i (Soongsil Univ.) • CloneIt http://www.ferzkopp.net/Software/CloneIt/ • Commercial: ImageCast, Ghost, … • Packages/applications installation • Kickstart + rpm (RedHat) • LUI (IBM) http://oss.software.ibm.com/developerworks/projects/lui • Lucie (TITech) http://matsu-www.is.titech.ac.jp/~takamiya/lucie/

  15. Dolly+ • We developed `dolly+', an `image cloning via network' installer. • WHY ANOTHER? • We install/update frequently (according to user needs), 100~1000 nodes at a time. • Traditional server/client software suffers from a server bottleneck. • Multicast copy of a ~GB image seems unstable (no free software?).

  16. (Few) server - (many) client model • The server can be a daemon process (you don't need to start it by hand). • Performance does not scale with the number of nodes: server bottleneck and network congestion. Multicasting or broadcasting • No server bottleneck. • Gets the maximum performance out of a network whose switch fabrics support multicasting. • A single node failure does not affect the whole process very much, so it can be robust; but a failed node needs re-transfer, and speed is governed by the slowest node, as in a RING topology. • Uses UDP rather than TCP, so the application must take care of transfer reliability.

  17. Dolly and Dolly+ Dolly • A Linux application to copy/clone files and/or disk images among many PCs through a network. • Dolly was originally developed by the CoPs project at ETH (Switzerland) and is free software. Dolly+ features • Transfers/copies sequential files (no 2 GB size limitation) and/or normal files via a TCP/IP network (optional: decompress and untar on the fly). • Virtual RING network connection topology. • Pipelining and multi-threading mechanism for speed-up. • Fail-recovery mechanism for robust operation.

  18. Dolly: Virtual Ring Topology. Server = host having the original image. • The physical network connection can be whatever you like. • Logically, `dolly' makes a ring chain of nodes, specified in dolly's config file. • Although transfer is only between two adjacent nodes, this exploits the maximum performance of a switching network with full-duplex ports. • Good for a network complex built from many switches. (Diagram: node PCs and network hub switches; physical connections vs. the logical (virtual) ring.)
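To make the ring concrete, here is a minimal Python sketch (illustrative only, not dolly+'s actual code; `ring_links' is a hypothetical helper) of how the chain follows from the ordered node list in the config file:

    # Minimal sketch: derive the virtual ring from the ordered node list.
    # Hypothetical helper, not dolly+'s actual implementation.
    def ring_links(server, nodes):
        """Return (host, next_host) pairs forming the logical ring chain."""
        chain = [server] + nodes
        return [(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]

    # Each host only ever talks to its successor, whatever the physical wiring:
    print(ring_links("n000.kek.jp", ["n001", "n002", "n003"]))
    # [('n000.kek.jp', 'n001'), ('n001', 'n002'), ('n002', 'n003')]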

  19. Cascade Topology • The server bottleneck can be overcome. • Cannot reach maximum network performance, but better than a many-clients-to-one-server topology. • Weak against a node failure: a failure spreads down the cascade as well, and is difficult to recover from.

  20. Pipelining & multi-threading. (Diagram: the file is split into 4 MB chunks, numbered 1, 2, 3, ... from BOF to EOF; the server streams chunks over the network to Node 1, which streams them on to Node 2 and so on down the ring; 3 threads run in parallel on each node.)
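A minimal sketch of the per-node three-thread pipeline, with Python queues standing in for the TCP connections (names are hypothetical; dolly+ itself is not implemented this way):

    import queue
    import threading

    CHUNK = 4 * 1024 * 1024   # 4 MB file chunks, as in dolly+

    def node_pipeline(upstream, downstream, outfile):
        """Receive chunks, write them to local disk, and relay them on.
        upstream/downstream are queue.Queue stand-ins for TCP connections;
        a None chunk marks EOF."""
        to_write = queue.Queue(4)
        to_send = queue.Queue(4)

        def receiver():               # thread 1: read from the previous node
            while True:
                chunk = upstream.get()
                to_write.put(chunk)
                to_send.put(chunk)
                if chunk is None:
                    break

        def writer():                 # thread 2: write chunks to disk
            with open(outfile, "wb") as f:
                while (chunk := to_write.get()) is not None:
                    f.write(chunk)

        def sender():                 # thread 3: forward to the next node
            while True:
                chunk = to_send.get()
                downstream.put(chunk)
                if chunk is None:
                    break

        threads = [threading.Thread(target=t) for t in (receiver, writer, sender)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

Because the three stages overlap, one chunk can be flowing to the next node while the previous one is still being written, which is what keeps the ring's speed nearly independent of its length.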

  21. Fail recovery mechanism • A single node failure could be a "show stopper" in a RING (= series connection) topology. • Dolly+ provides an automatic `short cut' mechanism for node problems. • When a node is in trouble, the upstream node detects it by a send timeout. • The upstream node then negotiates with the downstream node for reconnection and re-transfer of a file chunk. • The RING topology makes this easy to implement.
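A rough sketch of the short-cut step on the upstream node (hypothetical code; the timeout value and the negotiation details are assumptions, not dolly+'s actual protocol):

    import socket

    SEND_TIMEOUT = 30.0   # assumed timeout; dolly+'s real value may differ

    def send_chunk(conn, chunk, successor):
        """Send a chunk to the next node; on timeout, 'short cut' to the
        node after it (successor = (host, port)) and re-send the chunk."""
        try:
            conn.settimeout(SEND_TIMEOUT)
            conn.sendall(chunk)
            return conn
        except (socket.timeout, OSError):
            conn.close()
            new_conn = socket.create_connection(successor)
            new_conn.sendall(chunk)   # re-transfer the chunk the dead node lost
            return new_conn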

  22. Re-transfer in short cutting. (Diagram: the same 4 MB chunk pipeline as slide 20; after the short cut, the chunk that was in flight to the failed node is re-sent to the next node in the ring.)

  23. Dolly+: How do you start it on Linux
  Server side (which has the original file):
      % dollyS [-v] -f config_file
  Node side:
      % dollyC [-v]
  Config file example:
      iofiles 3                 <- number of files to transfer
      /dev/hda1 > /tmp/dev/hda1
      /data/file.gz >> /data/file
      boot.tar.Z >> /boot
      server n000.kek.jp        <- server name
      firstclient n001.kek.jp
      lastclient n020.kek.jp
      client 20                 <- number of client nodes
      n001                      <- client names
      n002
      :
      n020
      endconfig                 <- end of config
  The left of `>' is the input file on the server; the right is the output file on the clients. `>' means dolly+ does not modify the image; `>>' means dolly+ should cook (decompress, untar, ...) the file according to the file name.
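For illustration, a small Python parser for the config format above (a sketch assuming only the keywords shown in the example; the real dolly+ parser may accept more):

    def parse_dolly_config(path):
        """Parse a dolly+-style config file into a dict (sketch only)."""
        cfg = {"iofiles": [], "clients": []}
        with open(path) as f:
            lines = (line.strip() for line in f)
            for line in lines:
                if line.startswith("iofiles"):
                    n = int(line.split()[1])          # number of files to transfer
                    cfg["iofiles"] = [next(lines) for _ in range(n)]
                elif line.startswith(("server", "firstclient", "lastclient")):
                    key, value = line.split()
                    cfg[key] = value
                elif line.startswith("client "):
                    n = int(line.split()[1])          # number of client nodes
                    cfg["clients"] = [next(lines) for _ in range(n)]
                elif line == "endconfig":
                    break
        return cfg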


  25. Performance of dolly+ • Less than 5 min expected for 100 nodes! • HW: Fujitsu TS225, Pentium III 1 GHz x 2, SCSI disk, 512 MB memory, 100BaseT network.

  26. Dolly+ transfer speed: scalability with size of image
  PC hardware spec (server & nodes): 1 GHz Pentium III x 2, IDE-ATA/100 disk, 100BASE-TX network, 256 MB memory.
  (Plot: transferred bytes (MB) vs. elapsed time (sec); all setups track the 7 MB/s and 10 MB/s reference lines.)
      setup               elapsed time   speed
      1 server-1 node     230 sec        8.2 MB/s
      1 server-2 nodes    252 sec        7.4 MB/s (x2)
      1 server-7 nodes    266 sec        7.0 MB/s (x7)
      1 server-10 nodes   260 sec        7.2 MB/s (x10)
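The near-flat scaling is what a simple pipeline model predicts. As a back-of-the-envelope check (the image size and the model itself are assumptions read off the plot, not the authors' analysis):

    # Pipeline model: total time ~ (n_chunks + n_nodes - 1) * t_chunk,
    # so each extra node adds only one chunk-time, not one image-time.
    image_mb = 1800           # assumed image size (MB), read off the plot
    chunk_mb = 4              # dolly+ file chunk size
    rate = 7.2                # per-link throughput from the table (MB/s)

    t_chunk = chunk_mb / rate             # ~0.56 sec per chunk per hop
    n_chunks = image_mb / chunk_mb        # 450 chunks

    for n_nodes in (1, 2, 10, 100):
        t = (n_chunks + n_nodes - 1) * t_chunk
        print("%3d nodes: %4.0f sec" % (n_nodes, t))
    # ~250 sec for a few nodes, ~305 sec for 100: roughly in line with the
    # measured 230-266 sec and the "less than 5 min for 100 nodes" estimate.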

  27. How does dolly+ start after rebooting • Nodes broadcast over the LAN in search of an installation server. • The PXE server responds to each node with its IP address and the kernel download server. • The kernel and a `ram disk / FS' are multicast-TFTP'ed to the nodes, and the kernel starts. • The kernel hands off to an installation script which runs a disk tool and `dolly+'.

  28. How does dolly+ start after rebooting (cont.) • The script partitions the hard drive, creates file systems, and starts the `dolly+' client on the node. • You start the `dolly+' master on the master host to begin the disk-clone process. • The script then configures the individual nodes (host name, IP address, etc.). • The node is then ready to boot from its hard drive for the first time.

  29. Remote Execution

  30. Remote execution • Administrators sometimes need to issue a command to all nodes urgently. • Remote execution could use rsh/ssh/pikt/cfengine/SUT (mpich)* ... • The points are: • making it easy to see the execution result (fail or success) at a glance; • parallel execution among the nodes (see the sketch below). *) Scalable Unix Tools for clusters: http://www-unix.mcs.anl.gov/sut/
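As a minimal illustration of those two points (parallelism plus at-a-glance results), a Python sketch using ssh; this is not WANI's implementation, which the next slides describe:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def run_on(host, command):
        """Run a command on one node via ssh and classify the result."""
        try:
            r = subprocess.run(["ssh", host, command],
                               capture_output=True, text=True, timeout=60)
            status = "OK" if r.returncode == 0 else "FAIL(%d)" % r.returncode
            return host, status
        except subprocess.TimeoutExpired:
            return host, "TIMEOUT"

    def run_everywhere(hosts, command):
        """Issue the command to all nodes in parallel; print one line per
        node so failures stand out at a glance."""
        with ThreadPoolExecutor(max_workers=32) as pool:
            for host, status in pool.map(lambda h: run_on(h, command), hosts):
                print("%-12s %s" % (host, status))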

  31. WANI • A Web-based remote command executor. • Easy to select the nodes concerned. • Easy to specify a script or type in commands. • Issues the commands to the nodes in parallel. • Collects the results after error/failure detection. • Currently the software is a prototype built from combinations of existing protocols and tools. (Anyway, it works!)

  32. WANI is implemented on the `Webmin' GUI. (Screenshot: start page, command input, node selection.)

  33. (Screenshot: switching to the command execution result page; one cell per host name; results from 200 nodes in one page.)

  34. Error detection • Exit code. • `grep -i` for the words "fail/failure/error". • Check against the sys_errlist[] (perror) message list. • Check against `strings /bin/sh` output. Frame color represents the state: white = initial, yellow = command started, black = finished; the background color marks the detected result.
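A Python sketch of those checks (hypothetical; WANI performs them with grep and the print filter shown on slide 36, and the `strings /bin/sh' check is omitted here):

    import os
    import re

    # Failure words and errno message strings, as listed on the slide.
    FAIL_WORDS = re.compile(r"fail|failure|error", re.IGNORECASE)
    ERRNO_MSGS = [os.strerror(e) for e in range(1, 40)]   # sys_errlist-style

    def looks_failed(exit_code, output):
        """Classify a command result using the slide's checks."""
        if exit_code != 0:                              # 1. exit code
            return True
        if FAIL_WORDS.search(output):                   # 2. fail/failure/error
            return True
        if any(msg in output for msg in ERRNO_MSGS):    # 3. perror message list
            return True
        return False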

  35. (Screenshot: clicking a result cell shows the node's stdout output; a second link shows the stderr output.)

  36. (Architecture diagram: a Web browser talks to the Webmin server; commands go through the PIKT server (piktc) to piktc_svc on the node hosts for execution; results come back via lpr/lpd, where a print_filter acts as the error detector and produces error-marked command result pages.)

  37. Status Monitoring • Cfengine / PIKT*1 / PICA*2 • Ganglia*3 *1 http://pikt.org *2 http://pica.sourceforge.net/wtf.html *3 http://ganglia.sourceforge.net

  38. Conclusion • Installation: dolly+ • Installs/updates/switches systems very quickly. • Remote command execution • `Result at a glance' is important for quick iteration. • Parallel execution is important. • Status monitor / configuration manager • Not matured yet. • The software is available from http://corvus.kek.jp/~manabe/pcf/dolly • Thank you for reading!


