Tools for Cluster Administration and Applications (ancient technology – from 2001…)
Large Cluster Administration: what is the...
...Problem
• System administrators DO NOT scale
  • install / update operating system
  • install applications
  • add / remove users, etc.
• Users DO NOT scale
  • install applications
  • move data files
  • launch applications
  • interact with active jobs, etc.
...Solution
• Tools that…
  • treat the cluster as a single machine
  • scale from 1-to-N nodes (10,000's of nodes)
  • scale to federated clusters
  • are easy to learn, use, and adapt
Tool Review
• SystemImager
• LUI – Linux Utility for cluster Install
• VA Cluster Management (VACM)
• Alert
• Parallel UNIX Commands (Ptools)
• dsh
• prsh
• Webmin
• ALINKA LCM – Linux Cluster Manager
• ALINKA RAISIN
• SCMS – Smile Cluster Management System
• C3 – Cluster Command & Control
• M3C – Managing Multiple Multi-User Clusters
SystemImager
• Disk image / system administration
  • maintains disk coherency across the cluster
  • administrator-level tool
  • image server stores images
  • can build an image-server database of site disk images
• Pros:
  • supported by VA Linux as open source
  • architecture independent
• Cons:
  • requires each node to request its image ("pull" image)
  • only operates at the disk-image level (not individual files)
• Dependencies:
  • rsync, DHCP
• http://download.sourceforge.net/systemimager
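The "pull" model means each node initiates its own update from the image server. The Python sketch below only builds the command a node might run; it assumes the image server runs an rsync daemon exporting each image as a module named after the image, and the server name, image name, and exclude list are hypothetical rather than SystemImager's actual layout or client options.

```python
import shlex

# Hypothetical paths that should never be overwritten on a running node.
DEFAULT_EXCLUDES = ("/proc", "/tmp", "/dev")


def build_pull_command(image_server, image_name, excludes=DEFAULT_EXCLUDES):
    """Return the rsync argv a node could run to pull an image (sketch only).

    Assumes an rsync daemon on the image server exporting each image as a
    module named after the image (a hypothetical layout).
    """
    # -a: archive mode (permissions, owners, links, times);
    # --delete: remove local files that are absent from the image.
    cmd = ["rsync", "-a", "--delete"]
    for path in excludes:
        # A leading "/" anchors the pattern at the root of the transfer.
        cmd.append(f"--exclude={path}")
    # "host::module/" is rsync daemon syntax; the trailing slash copies the
    # module's contents rather than a directory named after the module.
    cmd.append(f"{image_server}::{image_name}/")
    cmd.append("/")
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_pull_command("imageserver", "compute_node_v1")))
```

Keeping the command as a list rather than a shell string avoids quoting problems; a real client would also have to preserve node-specific files (for example network settings), which this sketch ignores.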
Linux Utility for cluster Install – (LUI)
• System install / restore
  • administrator-level tool
  • easy to duplicate an install by resource
  • resources: Linux kernel, system map, partition table, RPMs, "user exits", local & remote NFS file systems
  • no need to store disk images
• Pros:
  • LUI available as an RPM
  • supported by IBM as open source
  • architecture independent
  • machine & resource groups
• Cons:
  • only useful for system initialization
  • manually installed packages must be reinstalled
• Dependencies:
  • NFS, tftp-hpa, bootp or DHCP, Perl
• http://oss.software.ibm.com/developer/opensource/linux/projects/lui
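Installing "by resource" with machine and resource groups boils down to a mapping from machines to the resources they should receive. The hypothetical Python sketch below models that idea; the class, group names, and resource names are illustrative and do not reflect LUI's actual database format.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterDB:
    """Hypothetical model of LUI-style machine and resource groups."""
    machine_groups: dict = field(default_factory=dict)   # group -> [machines]
    resource_groups: dict = field(default_factory=dict)  # group -> [resources]
    allocations: list = field(default_factory=list)      # (machine group, resource group)

    def allocate(self, machine_group, resource_group):
        """Assign every resource in resource_group to every machine in machine_group."""
        self.allocations.append((machine_group, resource_group))

    def resources_for(self, machine):
        """Collect the resources allocated to any group containing `machine`."""
        result = []
        for mgroup, rgroup in self.allocations:
            if machine in self.machine_groups.get(mgroup, []):
                for res in self.resource_groups.get(rgroup, []):
                    if res not in result:  # keep allocation order, skip duplicates
                        result.append(res)
        return result
```

Because installs are described by resources rather than stored disk images, adding a node only means adding it to a machine group; its install list is then derived from the existing allocations.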
VA Cluster Management – (VACM)
• GUI-based hardware-level monitor
  • device power control, hardware reset, remote BIOS control, chassis intrusion, CPU fan status
  • targets Intel Intelligent Platform Management Interface (IPMI) motherboards
• Pros:
  • monitoring does not impact performance, as IPMI runs in hardware microcontrollers
• Cons:
  • only available for Intel IPMI-compliant motherboards
  • does not monitor power supply fans or external fans
• Dependencies:
  • IPMI motherboard:
    • NB440BX Server Platform (Nightshade)
    • T440BX Server Platform (Nightlight)
    • L440GX Server Platform (Lancewood)
  • GTK+ v1.02, Gnome-libs, GDK v1.2, imlib v1.0.6
• http://www.valinux.com/software/vacm/
Alert
• Web-based UNIX cluster monitoring tool
  • local clients on each node report to monitor node(s)
  • clients are scripts running as cron jobs
  • monitors run a daemon to receive reports from clients
• Monitors
  • alerts
  • print web pages
  • email notification of events
• Pros:
  • supports cluster configuration files, allowing definitions of subclusters
  • errors can be categorized
  • notifications can be assigned for each category
  • uses a dedicated Alert log instead of requiring a search through syslog
  • clients can be written to handle new monitoring tasks
• Cons:
  • no proactive event correction ability
• http://www.cs.virginia.edu/~jdm2d/alert/
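Categorized errors with per-category notifications amount to a routing table from category to actions. A minimal Python sketch, assuming a hypothetical one-line report format `host category message` sent by the cron-driven clients; the categories and handlers are illustrative, not Alert's actual protocol.

```python
def parse_report(line):
    """Split a hypothetical 'host category message' report line."""
    host, category, message = line.split(maxsplit=2)
    return host, category, message


class Monitor:
    """Routes client reports to the notifications assigned to their category."""

    def __init__(self):
        self.handlers = {}  # category -> list of callables(host, message)
        self.log = []       # dedicated log, so nobody has to search syslog

    def on(self, category, handler):
        """Attach a notification (e-mail, web page update, ...) to a category."""
        self.handlers.setdefault(category, []).append(handler)

    def receive(self, line):
        """Log every report, then fire only the handlers for its category."""
        host, category, message = parse_report(line)
        self.log.append((host, category, message))
        for handler in self.handlers.get(category, []):
            handler(host, message)
```

For example, attaching an e-mail sender only to a `disk` category means load reports are still logged but never page anyone.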
Parallel UNIX Commands – (Ptools project)
• Parallel versions of common UNIX commands
  • cp, cat, ls, rm, mv, find, ps, kill, exec, and test
• Other parallel tools
  • parallel process find, command execution on satisfied condition, command execution on a collection of files, display of command output
• Target architecture
  • MPP with a full UNIX environment on each node
  • SP-1
  • Meiko CS-2
  • UNIX NOWs
• Argonne National Laboratory
  • William Gropp
  • Ewing Lusk
• Status: vaporware; the latest reference is a '94 SHPCC paper
• http://www.ptools.org/
• http://www.ptools.org/projects.html#PUC
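Command execution on a satisfied condition composes two parallel steps: a parallel test, then a parallel command on the nodes that passed. The Python sketch below shows that composition with local functions standing in for the remote test and command; in a real tool each call would go over rsh/ssh or to a per-node process, and the node names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, nodes):
    """Apply func to every node concurrently; results keep node order."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return list(pool.map(func, nodes))


def exec_if(nodes, condition, action):
    """Run `action` only on the nodes where `condition` returned True."""
    checks = parallel_map(condition, nodes)          # step 1: parallel test
    passed = [n for n, ok in zip(nodes, checks) if ok]
    return dict(zip(passed, parallel_map(action, passed)))  # step 2: parallel exec
```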
Distributed Shell – (dsh)
• Command-line based
  • sequential execution across a collection of hosts
  • rsh to access nodes
  • output prepended with host name
• Pros:
  • single or multiple remote commands
  • can create node groups
  • commands can target individual hosts or node groups
• Cons:
  • no concurrent execution
  • no interactive operation
• Dependencies:
  • rsh, Perl
  • environment vars:
    • BEOWULF_ROOT – directory with Beowulf-related files
    • WCOLL – location of file with default working collective
• http://www.ccr.buffalo.edu/dsh.htm
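The dsh model (read a working collective, run the command on each host in turn, prefix each output line with the host name) can be sketched in Python as follows. The WCOLL file format assumed here (one host per line, `#` comments), the `host: ` prefix, and the injectable `run` parameter are assumptions for illustration; by default the sketch calls `rsh host command`, mirroring dsh's use of rsh.

```python
import os
import subprocess


def parse_collective(text):
    """Parse a working-collective file: one host per line, '#' starts a comment."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def read_collective(path=None):
    """Read the collective from `path`, defaulting to the file named by $WCOLL."""
    with open(path or os.environ["WCOLL"]) as fh:
        return parse_collective(fh.read())


def rsh_run(host, command):
    """Run one command on one host through rsh and return its stdout."""
    return subprocess.run(["rsh", host, command], capture_output=True, text=True).stdout


def dsh(hosts, command, run=rsh_run):
    """Run `command` on each host in turn, prefixing output with the host name."""
    lines = []
    for host in hosts:  # strictly sequential, one host after another
        for out in run(host, command).splitlines():
            lines.append(f"{host}: {out}")
    return lines
```

The sequential loop is exactly why dsh is listed with "no concurrent execution": total time grows linearly with the number of hosts.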
Parallel Remote Shell – (prsh)
• Command-line based
  • concurrent execution across a collection of hosts
  • runs a UNIX command across nodes
  • stderr & stdout returned to the originating computer
• Pros:
  • can use rsh or ssh
  • hosts and options can be specified in environment variables
  • output can be associated with its hostname using --prepend
• Cons:
  • cannot perform interactive tasks (stdin set to /dev/null)
  • using --status with rsh is unreliable
• Dependencies:
  • rsh, ssh, Perl
  • environment vars:
    • PRSH_OPTIONS – options applied before command-line options
    • PRSH_HOSTS – default host list
• http://www.cacr.caltech.edu/projects/beowulf/GrendelWeb/software/index.html
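prsh starts the command on all hosts at once, with stdin set to /dev/null so no remote task can block waiting for input. A minimal Python sketch of that behaviour, assuming `ssh host command` as the transport (rsh would be a drop-in replacement) and a whitespace-separated host list in PRSH_HOSTS; `make_argv` is a hypothetical hook added so the sketch can be run without a cluster.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Default transport: ssh (rsh would work the same way)."""
    return ["ssh", host, command]


def run_one(host, command, make_argv):
    proc = subprocess.run(
        make_argv(host, command),
        stdin=subprocess.DEVNULL,  # non-interactive: remote reads see EOF
        capture_output=True,
        text=True,
    )
    return host, proc.stdout + proc.stderr


def prsh(command, hosts=None, prepend=False, make_argv=ssh_argv):
    """Run `command` on all hosts concurrently and collect their output."""
    if hosts is None:  # fall back to a PRSH_HOSTS-style default list
        hosts = os.environ.get("PRSH_HOSTS", "").split()
    if not hosts:
        return []
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        results = list(pool.map(lambda h: run_one(h, command, make_argv), hosts))
    lines = []
    for host, output in results:  # reported in host-list order
        for line in output.splitlines():
            lines.append(f"{host}: {line}" if prepend else line)
    return lines


if __name__ == "__main__":
    # Local demo: launch Python instead of ssh, so no cluster is needed.
    def local_argv(host, command):
        return [sys.executable, "-c", f"print({command!r}, 'from', {host!r})"]

    for line in prsh("hello", hosts=["node1", "node2"], prepend=True, make_argv=local_argv):
        print(line)
```

The `prepend` flag plays the role of `--prepend`: without it, lines from different hosts cannot be told apart.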
Webmin
• Web interface for system administration
  • designed for use on individual systems, not clusters
  • web server and CGI programs perform administration tasks
• Pros:
  • quick, graphical interface to most common system administration tasks
  • telnet module for console access to hosts
  • ability to define custom commands
  • view and manage running processes
  • easy addition of user-written modules, with standards for writing them
• Cons:
  • not intended for clusters
  • must have a web server on every host
  • modules must be written entirely in Perl
• Dependencies:
  • Perl 5 or later
  • web server
• http://www.webmin.com/webmin/
ALINKA LCM – Linux Cluster Manager
• Command-line based management and configuration
• Pros:
  • cluster-wide command execution, except superuser commands
  • ability to define and manage subclusters
  • load monitoring of nodes
  • MPI/PVM job execution support
• Cons:
  • master node is the NFS server for /home, /etc, and /var, limiting scalability
  • no support for SSH, and the cluster command doesn't work as root
  • no support for NIS or shadow passwords
  • limited to homogeneous clusters
  • difficult to install and operate
• Dependencies:
  • rsh, tar, nfs-server, sudo, PHP cgi-bin with pgsql support, bootpd, tcpdump, PostgreSQL, gawk
• http://www.alinka.com/download.htm#lcm
ALINKA RAISIN
• GUI-based management and configuration
  • same functionality as ALINKA LCM
  • added GUI
• Pros:
  • cluster-wide command execution, except superuser commands
  • ability to define and manage subclusters
  • load monitoring of nodes
  • MPI/PVM job execution support
• Cons:
  • all cons of ALINKA LCM
  • commercial license
• Dependencies:
  • same as ALINKA LCM
  • Apache
  • PHP module for Apache with PostgreSQL support
  • gnuplot
• http://www.alinka.com/araisin.htm
Smile Cluster Management System – (SCMS)
• Command-line and GUI environment
  • designed for managing Beowulf-type clusters as a single machine
  • latest version looks promising, with a Ptools-like command-line interface
• Pros:
  • many system utilities (e.g., node status, node control panel, node file system, disk space, ftp, process status, reboot/shutdown, RPM package manager, telnet, parallel UNIX commands, alarm services, and motherboard monitoring)
  • performance monitoring/logging of CPU, memory, I/O, and network
  • user-definable alarm levels with e-mail or script notifications
• Cons:
  • no support for job scheduling and cluster resource allocation
  • no MPI/PVM job submission tool
  • no support for SSH
• Dependencies:
  • rsh, Java, Perl
• http://smile.cpe.ku.ac.th/
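User-definable alarm levels map a sampled metric to the most severe threshold it reaches, then fire the notification attached to that level. A minimal Python sketch; the level names, thresholds, and notification callables are hypothetical, not SCMS's configuration format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AlarmLevel:
    name: str
    threshold: float                # alarm when value >= threshold
    notify: Callable[[str], None]   # e.g. send an e-mail or run a script


def check(metric: str, value: float, levels: List[AlarmLevel]) -> Optional[str]:
    """Fire only the most severe level whose threshold `value` reaches."""
    crossed = [lvl for lvl in levels if value >= lvl.threshold]
    if not crossed:
        return None
    worst = max(crossed, key=lambda lvl: lvl.threshold)
    worst.notify(f"{worst.name}: {metric}={value}")
    return worst.name
```

Firing only the worst level avoids sending both a warning and a critical message for the same sample.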
Cluster Command & Control (C3) Tools
• Command-line based
  • single-machine interface
  • cluster configuration file
  • serial & parallel versions
• Pros:
  • serial version: deterministic execution, good for debugging
  • parallel version: efficient execution
  • ability to rapidly deploy software updates and update system images
  • command-line list option allows subcluster management
  • distributed file scatter and gather operations
  • execution of any non-interactive command
• Cons:
  • no support for interactive command execution
• Dependencies:
  • DHCP, rsync 2.4.3 or later, OpenSSL, OpenSSH, DNS, SystemImager v0.23, Perl v5.6.0 or later
• http://www.csm.ornl.gov/clusterpowertools
• torc@msr.epm.ornl.gov
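A command-line list option lets one command target a subset of the cluster. The Python sketch below expands a range specification such as `1-5,7` into node names; the `cluster:ranges` syntax, 0-based positions into the cluster's node list, and the cluster and node names are assumptions loosely modelled on C3 usage, not a specification of C3's parser.

```python
def expand_ranges(spec):
    """Expand '1-5,7' into [1, 2, 3, 4, 5, 7] (sorted, duplicates removed)."""
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"bad range: {part}")
            indexes.update(range(start, end + 1))
        else:
            indexes.add(int(part))
    return sorted(indexes)


def select_nodes(target, clusters):
    """Resolve 'cluster' or 'cluster:ranges' against {cluster: [nodes]}."""
    name, _, spec = target.partition(":")
    nodes = clusters[name]
    if not spec:
        return list(nodes)  # no list given: the whole cluster
    return [nodes[i] for i in expand_ranges(spec)]
```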
Cluster Command & Control (C3) Tools
• System administration
  • cpushimage - "push" image across cluster
  • cshutdown - remote shutdown to reboot or halt cluster
• User tools
  • cpush - push a single file to a directory on each node
  • crm - delete a single file or directory on each node
  • cget - retrieve files from each node
  • cexec - execute an arbitrary command on each node
  • cps - run ps and retrieve the output from each node
  • ckill - kill a process on each node
• Add "s" to the end for the serial version: cshutdowns, cpushs, etc.
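Scatter (cpush) and gather (cget) differ mainly in direction, but gathering a file with the same name from every node needs per-node local names to avoid overwrites. The Python sketch below only plans the copies as `scp` commands; using scp as the transport and appending `_<node>` to gathered files are assumptions for illustration, not necessarily what C3 does.

```python
import posixpath


def plan_push(nodes, local_path, remote_dir):
    """Scatter: one scp per node, copying local_path into remote_dir."""
    return [["scp", local_path, f"{node}:{remote_dir}"] for node in nodes]


def plan_get(nodes, remote_path, local_dir):
    """Gather: fetch remote_path from every node into local_dir, suffixing
    each copy with its node name (hypothetical convention) so the copies
    do not overwrite one another."""
    base = posixpath.basename(remote_path)
    return [
        ["scp", f"{node}:{remote_path}", posixpath.join(local_dir, f"{base}_{node}")]
        for node in nodes
    ]
```

Each planned argv could then be run one after another (the serial "s" style) or all at once (the parallel style).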