Learn how to build a PC cluster by installing Linux, administering the system, interconnecting with other PCs, and using parallel programming methods.
Introduction to Clusters: Build Yourself a PC Cluster NOW!
Tutorial S1 – November 11, 2001
Supercomputing 2001 – Denver, CO
Drs. Christian Halloy & Kwai Wong
• halloy@jics.utk.edu, wong@jics.utk.edu
• Joint Institute for Computational Science
• University of Tennessee, Oak Ridge National Laboratory
• http://www.jics.utk.edu
Build Yourself a PC Cluster NOW
Acknowledgments
• All the contributors to Linux
• All the contributors to Beowulf cluster technology
• All the contributors to the art and science of parallel computing
• JICS staff, graduate students, and collaborators
Disclaimer
• The information and examples provided are based on the Red Hat Linux 7.1 installation on Intel PC platforms (with specific examples from some of our own hardware configurations)
• Much of it should be applicable to other versions of Linux
• There is no warranty that the material presented here is error free
• Neither the authors nor JICS will be held responsible for any direct, indirect, special, incidental or consequential damages related to any use of these materials
Goals
• In this tutorial you will learn how to:
  • Install Linux on a stand-alone PC
  • Administer and maintain the system
  • Interconnect with other PCs over the network
  • Build a PC cluster
  • Manage multiple jobs on a cluster
  • Use parallel programming methods with MPI
  • Install and utilize useful scientific computing libraries
Outline
• Introduction
• Installation
• File structure of Linux
• User administration
• Networking
• NFS
• NIS
• Kernel
• RPM
• PBS
• Software + Installation
• Compilers
• MPI, PVM
• Parallel Programming
• Parallel Software Libs: BLAS, ScaLAPACK, ATLAS, etc.; AZTEC, PETSc, etc.
• PICMSS
• Cluster Tools: C3, OSCAR
Introduction
Need of Computing Resources
• The computational capacity required depends on the application, the formulation, and what you want to see.
• Length scale: number of grid points (the resolution of the movie screen)
• Time scale: number of time steps, i.e. repeated calculations (how smoothly you want your movie to play)
• 2D problem:
  • grid points: 100 x 100 = 10,000 points
  • 1 point = 1 double-precision word = 8 bytes
  • a vector of 10,000 elements ~ 80 KB
  • need 10 such vectors ~ 800 KB
• 3D problem:
  • grid points: 1000 x 1000 x 1000 = 10^9 points
  • 10^9 unknowns ~ 8 GB
  • need 100 such vectors ~ 800 GB !!!
• Need PARALLEL COMPUTING
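
The memory estimates on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and it assumes one 8-byte double per grid point, as the slide does.

```python
BYTES_PER_DOUBLE = 8  # one double-precision value per grid point


def vector_bytes(*grid_dims):
    """Bytes needed for one vector that stores a double at every grid point."""
    points = 1
    for n in grid_dims:
        points *= n
    return points * BYTES_PER_DOUBLE


def total_bytes(num_vectors, *grid_dims):
    """Bytes needed for several such vectors on the same grid."""
    return num_vectors * vector_bytes(*grid_dims)


# 2D case from the slide: 10 vectors on a 100 x 100 grid
print(total_bytes(10, 100, 100) / 1e3, "KB")
# 3D case from the slide: 100 vectors on a 1000 x 1000 x 1000 grid
print(total_bytes(100, 1000, 1000, 1000) / 1e9, "GB")
```

The 3D case is a million times larger than the 2D case per vector, which is why it no longer fits in the memory of a single PC of this era.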
Parallel Processing
• Division of WORK into smaller tasks
• Multiple computers work on the smaller tasks simultaneously
>> Reduce Wall Clock Time <<
Issues of Parallel Computing
• Pros:
  • Save total elapsed time
  • Access to huge amounts of memory and disk space
  • Cheaper overall cost (?)
• Cons:
  • Difficult to construct
  • An efficient parallel algorithm may need some thought
  • Cost of program development
KEYS:
1) LOAD BALANCE - similar amount of work for every processor
2) LOCALITY - minimize communications among processors
3) PORTABILITY - work well on different computer platforms
4) SCALABILITY - solve increasingly larger problems efficiently
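
As an illustration of the LOAD BALANCE key, here is a minimal sketch of a block decomposition that hands every processor a nearly equal share of the work. The function name `block_range` and the use of a processor `rank` are assumptions made for this example; the slides do not prescribe a particular scheme.

```python
def block_range(n_items, n_procs, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    The first (n_items % n_procs) ranks get one extra item, so no two
    processors differ by more than one item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Split 10 grid rows among 3 processors
for rank in range(3):
    print(rank, block_range(10, 3, rank))
```

Keeping each processor's items contiguous also helps LOCALITY, since only the block boundaries need to be exchanged with neighbours.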
General Parallel Computer Classification
• Shared Memory Systems (SMP) (SGI, HP, PC, IBM, SUN, etc.): processors (P) reach a shared memory over a bus or switch
• Distributed Memory Systems (MPP) (Compaq, SP2, PC or workstation cluster): processors (P), each with its own memory (M), joined by a communication network
Networks Of Workstations or PCs
• NOWs use available resources to make a “poor man’s supercomputer”
• Workstations or PCs are networked using Ethernet, ATM, etc.
• The workstations cooperate in solving problems, together acting like a distributed memory multiprocessor
• Information is passed from one workstation to another using messages
• Programs written in C or Fortran can use freely available message passing libraries (PVM, MPI, etc.) to send and receive information
  • PVM = Parallel Virtual Machine
  • MPI = Message Passing Interface
• Programs developed on NOWs can easily be ported to supercomputers when they are ready for production runs
About Linux
• In 1991 Linus Torvalds created the first version of the Linux OS on an 80386 PC and made it open source software. Thousands of people have since contributed to it.
• A Linux distribution is composed of a version of the Linux kernel plus tools and utilities. Linus Torvalds controls the release of the Linux kernel.
• Linux can be regarded as a free Unix OS for the PC. It also comes with a lot of free software, compilers, and web tools.
• The first official release, Linux kernel 1.0, came in March 1994. The current stable kernel is 2.4.2.x.
• However, it does not always support newly emerging hardware, and it is not quite user-friendly (yet) with office tools.
• Several distributions exist: Red Hat, Debian, Mandrake, SuSE, …
• Lots of software packages are free thanks to the open source efforts
Linux and Beowulf
• Beowulf was the name of one of the first projects to produce parallel Linux clusters from off-the-shelf hardware and freely available software
• Use inexpensive Intel x86 based boxes (PCs)
• One or more networking methods (10/100 Ethernet, Myrinet, etc.)
• A fast network (hub, switch, gigabit switch or other)
• Linux operating system (freely available Unix clone, includes full source code)
• cc, f77, vi, emacs, perl, python and other free compilers and tools
• Free PVM or MPI implementations (MPICH, LAM), etc.
• Results in a Beowulf style supercomputer, for far less $$ than any traditional supercomputer
• http://www.linux.org
• http://www.linuxdoc.org
• http://www.beowulf.org/
• http://cesdis.gsfc.nasa.gov/beowulf/
• http://www.extremelinux.org
• http://www.xtreme-machines.com/
PC Clusters Overview
PC Clusters
• PC Cluster: a group of personal computers connected by a network
• Features:
  • Hardware
    • Can use nearly any PC
    • Easy hardware service
    • Inexpensive
  • Software
    • Free
    • Available source codes
    • Open source and lots of help
  • Performance
    • Can get supercomputer performance on specific tasks
    • By far the best cost/performance ratio
PC Clusters Overview
• “The Do-It-Yourself Supercomputer” by W. Hargrove, F. Hoffman, T. Sterling, August 2001 issue of Scientific American
• The “Stone SouperComputer” at Oak Ridge National Laboratory
• http://www.sciam.com/2001/0801issue/0801hargrove.html
PC Clusters Overview (cont’d)
• NASA Ames Research Center
ORNL PC Cluster
• HighTORC: 64-node cluster (128 PIII processors), www.ccs.ornl.gov/torc
Network Interconnect
• Ethernet -- a standard LAN protocol for a 10 Mb/s bus using CSMA/CD (carrier-sense multiple access with collision detection) as the access method. It uses various transmission media: coaxial cables, unshielded twisted pairs, optical fibers
  • Fast Ethernet (100 Mb/s)
  • Gigabit Ethernet (1 billion bits per second), carried primarily on optical fiber
  • 10 Gb/s …
• ATM (Asynchronous Transfer Mode) -- a dedicated-connection switching technology that organizes digital data into 53-byte cells or packets and transmits them over a medium using digital signal technology
• Myrinet -- a gigabit-per-second network with full-duplex 1.28+1.28 Gigabit/second links, switch ports, and interface ports
PC Cluster vs. SP2
FLOPS / Dollar Performance
Case Study – Hive
The Team
The Plan and The Specification
• Plan:
  • Budget < $40000
  • One powerful server for compilations, space for file storage, and serial computations
  • Several fat worker nodes with lots of memory for serial computations
  • More worker nodes for parallel computations
  • Try to fit the system into a 24-port 100Mb switch
  • Meeting with local and major vendors
  • Forward & backward iterations with the buyer
• Specification:
  • One server node with dual CPU & SCSI drive
  • 5 fat worker nodes with 1 GB RAM
  • 16 worker nodes with 512 MB RAM
  • One 24-port 100Mb switch
The Hardware Schematic
The Fact Sheet: total = ~$36300 (Feb, 2000)
• 1 Server: 1 x $4459 = $4459
  PIII 550 MHz, dual CPU, 1 GB RAM, 18 GB SCSI, tall tower with 6 bays, others
• 5 Fat Workers: 5 x $1862 = $9310
  PIII 500 MHz, single CPU, 1 GB RAM, 6 GB IDE, mid tower, no others
• 14 Thin Workers: 14 x $1181 = $16534
  PIII 500 MHz, single CPU, 512 MB RAM, 6 GB IDE, mid tower, no others
• 2 Thin Workers: 2 x $1273 = $2546
  PIII 450 MHz, single CPU, 384 MB RAM, 6 GB IDE, flat case, others
• 1 24-port switch: $2072
• 1 UPS, 2 kitchen racks: $550
• Cables & software (f77 & C++): $700
• 1 15” monitor: $129
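
The fact sheet can be cross-checked with a short script. The quantities and prices below are copied from the slide; the item labels and helper names are just illustrative choices.

```python
# (description, quantity, unit price in dollars), copied from the fact sheet
ITEMS = [
    ("Server", 1, 4459),
    ("Fat worker", 5, 1862),
    ("Thin worker (500 MHz)", 14, 1181),
    ("Thin worker (450 MHz)", 2, 1273),
    ("24-port switch", 1, 2072),
    ("UPS and 2 kitchen racks", 1, 550),
    ("Cables and software", 1, 700),
    ("15-inch monitor", 1, 129),
]

# Entries that are compute nodes and therefore need a switch port
NODE_ITEMS = {"Server", "Fat worker", "Thin worker (500 MHz)", "Thin worker (450 MHz)"}


def total_cost(items):
    """Sum of quantity times unit price over all line items."""
    return sum(qty * price for _, qty, price in items)


def node_count(items):
    """Number of machines that must be plugged into the switch."""
    return sum(qty for name, qty, _ in items if name in NODE_ITEMS)


print(total_cost(ITEMS))   # 36300
print(node_count(ITEMS))   # 22
```

The line items add up to the quoted total, and the 22 nodes fit within the 24-port switch and the $40000 budget named in the plan.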
The Preparations and Details
• Chose a local vendor:
  • Largest local supplier in the region, with a large warehouse
  • 3-year warranty and quick turnaround on fixes
  • Willing to work with our schedule and needs
  • PRICE LOW enough to win the contract
• Machine room:
  • Enough space to house the cluster, ~10’x10’
  • Enough power: four 20-amp sockets
  • Steady electric supply, steady cool air supply, Ethernet ports
• Machine specification:
  • IP addresses: two valid IP addresses, registered with the network agency
  • Names for the entire PC cluster, tag the machines, official stuff!!
The Action Steps
• Get the server and two clients from the vendor
• Install the server and clients, then test them out
• Set up a PC at the site and test the network connection
• Ask the vendor to duplicate the client disk, put the copies in the rest of the worker (client) nodes, and deliver them to the site
• Set up the entire cluster in the machine room
• Change IP addresses and names on all the clients
• Check and test the entire cluster
• Prepare documentation and user materials
• Back up images on tapes
• Install commercial compilers and software & run some benchmarks
• Ask the users to run test cases and report problems; fix problems
• Declare the cluster in production mode
The Hard Work
The Product
The PC Cluster – NOW (built at SC2001!)
• Outside IP (get in using SSH only)
• Server (galaxy): 192.168.1.100, NIS domain name = workshop
• Switch connecting the server to the clients on private IPs (allow RSH)
• Clients star1, star2, and star3 use the private addresses 192.168.1.1 to 192.168.1.3
Installation and File System
Installation Steps
• Procedure rundown:
  • Reboot your computer with the Red Hat CD
  • Create partitions for Linux
  • Manage the mount points for the created partitions
  • Format the partitions
  • Install the Linux software packages
  • Configure X Windows
  • Configure the network
  • Install LILO
• NFS
• NIS
• Software
Linux Installation Steps – Lab 1 (RH-7.1)
• Insert the RH CD and install Linux
• Enter language selection, keyboard selection
• Choose the custom option to install the system
• Perform disk partitioning using Disk Druid
• Choose partitions to format
• LILO configuration
• Network configuration
• Firewall configuration (optional)
• Language support selection, time zone selection
• Account setup, set root password
• Authentication configuration: MD5 – yes; shadow passwd – no; NIS – no
• Package selection
• X configuration
• Monitor configuration – choose text mode to log in
• Install ----- 30 minutes --- coffee break
• Create boot disk
• Exit - DONE WITH BASIC SETUP!
Boot and Supplemental Floppies (Ref)
• If you use the installation boot floppy, you need to prepare two diskettes:
  • boot.img (for CD-ROM and hard drive installation)
  • bootnet.img (for FTP, HTTP, NFS installation)
• Building the installation boot diskette:
  • Insert the Red Hat Linux CD into the CD drive
  • Run rawrite.exe in the Dosutils folder on the CD-ROM
  • At the “Enter Disk image source file name:” prompt, enter e:/images/boot.img to create a boot disk or e:/images/supp.img to create a supplemental disk (assuming that e: is your CD drive), then press Enter
  • Insert a clean formatted floppy in the floppy drive. You need a separate floppy for each of the two image files
  • At the “Enter target drive:” prompt, enter a: (assuming that a: is your floppy drive), then press Enter
Disk Partitioning
• A partition is a section of the hard drive dedicated to an operating system.
• Due to the BIOS limitations of Intel-based machines, a hard disk can have:
  • Up to four primary partitions
  • An extended partition that contains logical partitions
• Data is stored only on primary or logical partitions
• Swap partition(s) can be used to increase the amount of memory (RAM) by creating an area of virtual on-disk memory. Access to swap space is much slower than access to conventional RAM. Memory pages are moved to swap space when they haven’t been accessed for some time.
• At least two partitions are required for Linux:
  • a swap partition (>16 MB; 2 x RAM is better)
  • a root partition, named / (>200 MB), located entirely below cylinder 1023 (~8 GB) due to the BIOS limitations
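
The “~8 GB” figure for the cylinder limit can be reproduced with a little arithmetic. The sketch below assumes the commonly used translated disk geometry of 255 heads and 63 sectors per track with 512-byte sectors; the slide does not state the geometry, so treat these defaults as an assumption.

```python
SECTOR_BYTES = 512


def addressable_bytes(cylinders=1024, heads=255, sectors_per_track=63):
    """Bytes reachable through cylinder/head/sector addressing.

    The default geometry (255 heads, 63 sectors per track) is an assumed,
    typical translated geometry, not a value taken from the slides.
    """
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


limit = addressable_bytes()
print(limit, "bytes, about", round(limit / 1e9, 1), "GB")
```

A root partition (or at least the kernel image) that extends past this point may be unreachable by the boot loader on such a BIOS.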
Partitioning tools: Disk Druid
• Red Hat 7.1 uses the Disk Druid program to create or edit Linux disk partitions
• It has a pull-down menu to select mount points and a field to edit the size of the partition.
• A Linux disk partition is denoted with a combination of:
  • two letters indicating the type of device (hd for IDE disks, sd for SCSI disks)
  • followed by a letter for the device (/dev/hda is the first IDE hard disk, /dev/hdb the second IDE hard disk)
  • followed by the partition number (e.g. /dev/hda5); Disk Druid then shows the size and mount point of the partition
• Do not create or delete partitions of other operating systems
• None of the changes made will take effect until you save them
• fdisk is also available as an alternative to Disk Druid
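
The naming scheme can be made concrete with a small parser. This is a sketch only: `parse_device` is a hypothetical helper, and it covers just the hd and sd prefixes mentioned on the slide.

```python
import re

# /dev/ + two-letter bus prefix + disk letter + optional partition number
_DEVICE_RE = re.compile(r"^/dev/(hd|sd)([a-z])(\d+)?$")
_BUS = {"hd": "IDE", "sd": "SCSI"}


def parse_device(name):
    """Split a name such as /dev/hda5 into (bus, disk index, partition).

    The disk index counts from 0 (a = first disk). The partition is None
    when the name refers to the whole disk, e.g. /dev/hdb.
    """
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError(f"not an IDE/SCSI device name: {name!r}")
    prefix, letter, number = match.groups()
    disk_index = ord(letter) - ord("a")
    partition = int(number) if number is not None else None
    return _BUS[prefix], disk_index, partition


print(parse_device("/dev/hda5"))
print(parse_device("/dev/sda1"))
```

A name with the disk letter missing, such as /dev/hd6, is rejected with a ValueError.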
Mount point & /etc/fstab
• The directory to which a device or partition is attached is called a mount point. The partition table you create is written directly to the disk drive.
• The mount points and the partitions that are mapped to them are written to the /etc/fstab file, from where they are read during the boot process and mounted using the mount -a command:
    /dev/hda5      /             ext2     defaults          1 1
    /dev/hda8      /tmp          ext2     defaults          1 2
    /dev/hda7      /usr          ext2     defaults          1 2
    /dev/hda6      swap          swap     defaults          0 0
    /dev/fd0       /mnt/floppy   ext2     noauto,owner      0 0
    none           /proc         proc     defaults          0 0
    none           /dev/pts      devpts   gid=5,mode=620    0 0
    galaxy:/home   /home         nfs      defaults          0 0
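
Each /etc/fstab line has six whitespace-separated fields: device, mount point, filesystem type, mount options, dump flag, and fsck pass number. The sketch below parses text in that format; `parse_fstab` and `FstabEntry` are illustrative names, and the sample uses three of the entries from the slide.

```python
from collections import namedtuple

FstabEntry = namedtuple(
    "FstabEntry", "device mount_point fs_type options dump fsck_pass"
)

SAMPLE = """\
# device       mount point  type  options   dump pass
/dev/hda5      /            ext2  defaults  1 1
/dev/hda7      /usr         ext2  defaults  1 2
galaxy:/home   /home        nfs   defaults  0 0
"""


def parse_fstab(text):
    """Parse fstab-formatted text into a list of FstabEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        device, mount_point, fs_type, options, dump, fsck_pass = fields
        entries.append(
            FstabEntry(device, mount_point, fs_type,
                       options.split(","), int(dump), int(fsck_pass))
        )
    return entries


for entry in parse_fstab(SAMPLE):
    print(entry.mount_point, "<-", entry.device, f"({entry.fs_type})")
```

Because fields are split on whitespace, the options field must not contain spaces: multiple options are joined with commas only.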
Filesystem Hierarchy Standard (FHS)
• The Filesystem Hierarchy Standard, FHS, has been designed to provide a standard used by Linux distribution developers, package developers, and system implementers. According to this standard, all Linux distributions should put programs and data in similar places. Having such a standard has a number of benefits:
  • This eliminates confusion for new users and developers
  • This makes it easier to maintain a system
  • Third party product vendors, like Netscape, Acrobat, and Corel, can port their software to Linux and not a particular vendor’s distribution
• FHS Home Page: http://www.pathname.com/fhs
The Root Directory ( / )
• According to the FHS, the root directory must contain most of the following directories:
  • /bin - user programs to take the system beyond booting into production
  • /boot - kernel images, RAM disk images, and kernel symbol maps necessary to install the boot loader and images (within 1023 cylinders)
  • /dev - system devices
  • /etc - system configuration files
  • /home - default directory for users’ home directories
  • /mnt - mount points for cdrom, floppy and NFS mounts
  • /opt - third party software (Acrobat, Word, etc.)
  • /proc - virtual filesystem that keeps track of running processes
  • /root - root’s home directory
  • /sbin - system programs necessary for system boot, recovery and initialization
  • /tmp - temporary storage
  • /lib - kernel loadable modules
  • /usr - static shared and unshared data (esp. native Linux software)
  • /var - for holding changing data (machine specific)
The /etc directory
Host-specific system configuration files. NEVER Mount!!!
• This is the directory where the Linux configuration files are located.
• /etc/cron.daily - programs executed by the system daily. You can put any script that you want to be executed every day under this directory.
• /etc/cron.weekly - same as /etc/cron.daily but executed weekly
• /etc/pam.d - security configuration files
• /etc/rc.d - system boot and initialization scripts
• /etc/skel - a set of files copied to a newly created user account (.bashrc, .bash_profile, .cshrc, .rhosts)
• /etc/sysconfig - network configuration files
• inittab, passwd, fstab, HOSTNAME, resolv.conf, xinetd.d, printcap, …
The /proc directory
Every Linux system has a /proc directory. It is not a real file system. It is an interface into kernel data structures that provide information about running processes.
• /proc/<filename> - info about components of the running system (/proc/interrupts, /proc/loadavg, /proc/cpuinfo, /proc/meminfo)
• /proc/net - info about current networking statistics
• /proc/<pid> - directory of the running process
• /proc/sys - files which can be used to change the runtime configuration of the system (network, kernel, virtual memory config)
• /proc/<pid>/environ - working environment of the process
• /proc/<pid>/exe - symlink to the binary file being executed
• /proc/<pid>/fd - file descriptors, which are symlinks to the process’ open files
• /proc/<pid>/stat - info used by the ps command
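
Because /proc entries are plain text, they are easy to read from a script. As a sketch, the function below parses the contents of /proc/loadavg, which holds the 1, 5 and 15 minute load averages, a running/total count of scheduling entities, and the most recently assigned PID. The function name and the sample line are made up for illustration.

```python
def parse_loadavg(text):
    """Parse the single line found in /proc/loadavg into a dictionary."""
    one, five, fifteen, procs, last_pid = text.split()
    running, total = procs.split("/")
    return {
        "load_1min": float(one),
        "load_5min": float(five),
        "load_15min": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


# Illustrative sample line; on a Linux node you could instead use
#   with open("/proc/loadavg") as f: info = parse_loadavg(f.read())
info = parse_loadavg("0.20 0.18 0.12 1/80 11206\n")
print(info["load_1min"], info["running"], info["total"])
```

On a cluster, polling this file on every node is a cheap way to see which machines are busy.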
The /var directory
Changing Data, Write Only, Individual
• /var/cache/ - used by programs that need files in place of memory
• /var/lock/ - for locking devices
• /var/log/ - log files (boot process, logins, security, mail, spool)
• /var/mail/ - user mail spool directory
• /var/run/ - files with PIDs and other info about running processes
• /var/tmp/ - temporary storage used by daemons
• /var/spool/ - spooled data (printing: /var/spool/lpd)
• /var/yp/ - NIS database and Makefile
• Note: Keep an eye on the size of this directory and also on the log messages. It can get out of hand sometimes!
Partitioning Strategies
• Client disk partitioning:
  • / must be small, stable and non-changing (around 200 MB)
  • /home will be mounted from the server
  • /var can consume your space very quickly, so keep it separate to prevent filling the root partition
  • /usr/src can be spacious. Keep it separate to prevent filling any system partitions
  • /opt: third party products (Star Office) can be huge
  • swap is required
  • /tmp can be very big, esp. for parallel applications
• Server disk partitioning:
  • /home: a full partition can crash users’ codes, prevent users from logging in, and cause mail problems and others
  • /var/log can grow fast, plus you may want to keep archives there
  • server directories depend on the purpose of the server: web server, ftp server, mail server
  • / and swap: the same as for the client
  • /usr/export for outside software
Boot Process and LILO
Boot Process
ROM BIOS -> boot sector -> LILO -> kernel (RAM) -> init
• The BIOS program loads the boot sector from the boot device. The boot sector contains the Linux loader, which loads the Linux kernel; the kernel then starts the init process, which starts all the Linux services.
• The first 512 bytes of a disk contain a boot sector. Floppy disks have only one boot sector, whereas each partition on a hard drive has its own boot sector.
• The first sector of the entire disk is called the master boot record (MBR). It is the only boot sector loaded by the BIOS. The MBR contains a small loader program and a partition table. A DOS MBR passes control to the active partition.
• Kernel image: /boot/vmlinuz-2.4.1
What is LILO?
• LILO (Linux Loader) is a program that allows you to boot Linux (as well as some other OS’s)
• It is better to install LILO after the other OS’s have been installed
• The LILO configuration file is /etc/lilo.conf
• When LILO boots the system, it can only load data sectors that can be accessed by the BIOS (one of the first two physical disks, often up to 1024 cylinders)
• Any time you recompile the kernel or overwrite the boot image files you must reinstall LILO (run /sbin/lilo)
/etc/lilo.conf
    boot=/dev/hda5
    map=/boot/map
    install=/boot/boot.b
    prompt
    timeout=50
    image=/boot/vmlinuz-2.4.1
        label=linux
        root=/dev/hda5
        read-only
        password=whowho
        restricted
    other=/dev/hda1
        label=dos
        table=/dev/hda
LILO Location
LILO can be placed in the MBR, the first sector of /boot, or a floppy.
• The Master Boot Record (MBR, /dev/hda) -- a recommended place if only one operating system will be installed. In this case LILO will be the primary boot loader and will handle the first-stage booting process for all OS’s on your drive.
  • Before the installation, make a backup copy of your MBR on a floppy disk, e.g.:
      dd if=/dev/hda of=/mnt/floppy/MBR bs=512 count=1
  • 512 bytes = 446 (for loader program) + 64 (for partition table) + 2 (for verification of boot sector)
  • To restore the old MBR, e.g.:
      dd if=/mnt/floppy/MBR of=/dev/hda bs=446 count=1
    (this rewrites only the 446-byte loader program and leaves the current partition table in place)
• The root partition of your Linux installation (usually /dev/hda1 or /dev/hda2) -- the best place for LILO (safe, easier to uninstall, etc.)
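
The 446 + 64 + 2 layout, and the reason the restore command copies only 446 bytes, can be illustrated on in-memory byte strings. This is a sketch that never touches a real disk; the function names are hypothetical, and the two-byte signature 0x55 0xAA is the standard boot-sector marker.

```python
SECTOR_SIZE = 512
LOADER_SIZE = 446               # boot loader code
TABLE_SIZE = 64                 # four 16-byte partition entries
BOOT_SIGNATURE = b"\x55\xaa"    # last two bytes of a valid boot sector


def split_mbr(sector):
    """Split a 512-byte MBR image into (loader, partition table, signature)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR image must be exactly 512 bytes")
    loader = sector[:LOADER_SIZE]
    table = sector[LOADER_SIZE:LOADER_SIZE + TABLE_SIZE]
    signature = sector[LOADER_SIZE + TABLE_SIZE:]
    return loader, table, signature


def restore_loader(current, backup):
    """Mimic 'dd bs=446 count=1': take the loader code from the backup
    but keep the partition table and signature of the current MBR."""
    return backup[:LOADER_SIZE] + current[LOADER_SIZE:]


# Fake images: the backup holds the old loader and an outdated table
backup = b"\x01" * 446 + b"\x02" * 64 + BOOT_SIGNATURE
current = b"\x09" * 446 + b"\x03" * 64 + BOOT_SIGNATURE
loader, table, signature = split_mbr(restore_loader(current, backup))
print(loader[:1], table[:1], signature == BOOT_SIGNATURE)
```

Restoring all 512 bytes instead would also overwrite the partition table, which is harmful if the disk has been repartitioned since the backup was taken.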
LILO boot process
• LILO boots the kernel by passing it a number of arguments. The kernel is loaded into RAM and then calls the system boot and initialization program /sbin/init. Init is the mother of all processes. Its primary role is to create processes from a configuration script stored in the file /etc/inittab.
  1. The first line of /etc/inittab tells init which runlevel is the default:
       id:5:initdefault:
  2. System initialization is started by the script /etc/rc.d/rc.sysinit:
       si::sysinit:/etc/rc.d/rc.sysinit
  3. Init then runs another shell script based on the runlevel:
       l5:5:wait:/etc/rc.d/rc 5
  4. Init starts the programs that run only once:
       ud::once:/sbin/update
  5. Init restarts respawned processes when they die:
       1:12345:respawn:/sbin/mingetty tty1
  6. Init catches certain events (power failures, ctrl-alt-del and other interrupts):
       ca::ctrlaltdel:/sbin/shutdown -t3 -r now
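
Every /etc/inittab entry has the form id:runlevels:action:process. The sketch below parses the six entries quoted on this slide and picks out the default runlevel and the commands that apply to it. The helper names are illustrative, and the matching rule (an empty runlevels field applies to every runlevel) is a simplification of what init really does.

```python
from collections import namedtuple

InittabEntry = namedtuple("InittabEntry", "id runlevels action process")

SAMPLE = """\
id:5:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
l5:5:wait:/etc/rc.d/rc 5
ud::once:/sbin/update
1:12345:respawn:/sbin/mingetty tty1
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
"""


def parse_inittab(text):
    """Parse inittab-formatted text into a list of InittabEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first three colons only; the command may contain more
        entry_id, runlevels, action, process = line.split(":", 3)
        entries.append(InittabEntry(entry_id, runlevels, action, process))
    return entries


def default_runlevel(entries):
    """Return the runlevel named by the initdefault entry."""
    for entry in entries:
        if entry.action == "initdefault":
            return int(entry.runlevels)
    raise ValueError("no initdefault entry found")


def commands_for(entries, runlevel):
    """Commands whose runlevels field is empty or contains `runlevel`."""
    level = str(runlevel)
    return [e.process for e in entries
            if e.process and (not e.runlevels or level in e.runlevels)]


entries = parse_inittab(SAMPLE)
print(default_runlevel(entries))
print(commands_for(entries, default_runlevel(entries)))
```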