Beowulf Clusters Matthew Doney
What is a cluster? • A cluster is a group of computers connected together • Several different methods of connecting them • Distributed • Computers widely separated, connected over the internet • Used by projects like SETI@home and GIMPS • Workstation Cluster • Collection of workstations loosely connected by a LAN • Cluster Farm • PCs connected over a LAN that perform work when idle
What is a Beowulf Cluster? • A Beowulf Cluster is one class of cluster computer • Uses Commercial Off The Shelf (COTS) hardware • Typically contains both master and slave nodes • Not defined by a specific piece of hardware Image Source: http://www.cse.mtu.edu/Common/cluster.jpg
What is a Beowulf Cluster? • The origin of the name “Beowulf” • Main character of the Old English poem of the same name • Described in the poem – “he has thirty men’s heft of grasp in the gripe of his hand, the bold-in-battle”. Image Source: http://www.teachingcollegeenglish.com/wp-content/uploads/2011/06/lynd-ward-17-jnanam-dot-net.jpg
Cluster Computer History – 1950s • SAGE, one of the first cluster computers • Developed by IBM for NORAD • Linked radar stations together to form the first early-warning detection system Image Source: http://www.ieeeghn.org/wiki/images/3/34/Sage_nomination.jpg
Cluster Computer History – 1970s • Technological Advancements • VLSI (Very Large Scale Integration) • Ethernet • UNIX Operating System
Cluster Computer History – 1980s • Increased interest in cluster computing • Ex: NSA connected 160 Apollo workstations in a cluster configuration • First widely used clustering product: VAXcluster • Development of task scheduling software • Condor package developed by UW-Madison • Development of parallel programming software • PVM (Parallel Virtual Machine)
Cluster Computer History – 1990s • NOW (Network of Workstations) project at UC Berkeley • First cluster on the TOP500 list • Development of the Myrinet LAN system • Beowulf project started at NASA’s Goddard Space Flight Center Image Source: http://www.cs.berkeley.edu/~pattrsn/Arch/NOW2.jpg
Cluster Computer History – Beowulf • Developed by Thomas Sterling and Donald Becker • 16 individual nodes • 100 MHz Intel 80486 processors • 16 MB memory, 500 MB hard drive • Two 10 Mbps Ethernet ports • Early version of Linux • Used the PVM library
Cluster Computer History – 1990s • MPI standard developed • Created to be a global standard to replace existing message passing protocols • DOE, NASA, and California Institute of Technology collaboration • Developed a Beowulf system with sustained performance of 1 Gflops • Cost $50,000 • Awarded the Gordon Bell Prize for price/performance • 28 clusters were on the TOP500 list by the end of the decade
Beowulf Cluster Advantages • Price/Performance • Using COTS hardware greatly reduces associated costs • Scalability • Because the system is built from individual nodes, more can easily be added with only minor changes to the network • Convergence Architecture • Commodity hardware has standardized operating systems, instruction sets, and communication protocols • Code portability has greatly increased
Beowulf Cluster Advantages • Flexibility of Configuration and Upgrades • Large variety of COTS components • Standardization of COTS components allows for easy upgrades • Technology Tracking • Can use new components as soon as they come out • No delay waiting for manufacturers to integrate components • High Availability • System will continue to run if an individual node fails
Beowulf Cluster Advantages • Level of Control • System is easily configured to the user’s liking • Development Cost and Time • No special hardware needs to be designed • Less time spent designing the system; parts just need to be selected • Cheaper mass-market components
Beowulf Cluster Disadvantages • Programming Difficulty • Programs need to be highly parallelized to take advantage of the hardware design • Distributed Memory • Program data is split over the individual nodes • Network speed can bottleneck performance • Results may need to be collected and combined by a single node
Beowulf Cluster Architecture • Master-Slave configuration • Master Node • Job scheduling • System monitoring • Resource management • Slave Node • Does assigned work • Communicates with other slave nodes • Sends results to master node
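The master-slave split above maps naturally onto MPI. Below is a minimal sketch in C, not taken from the original slides: rank 0 plays the master and hands a hypothetical work item to each slave, and each slave computes a partial result in its own local memory and sends it back, illustrating both the master-slave configuration and the point from the previous slide that results may need to be combined by a single node.

/* master_slave.c -- minimal MPI master-slave sketch (illustrative only)
   Compile: mpicc master_slave.c -o master_slave
   Run:     mpirun -np 4 ./master_slave                                    */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's id              */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of nodes       */

    if (rank == 0) {
        /* Master node: assign one work item to each slave, then collect
           and combine the partial results.                                */
        long total = 0;
        for (int i = 1; i < size; i++) {
            int work = 1000 * i;            /* hypothetical job size       */
            MPI_Send(&work, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        for (int i = 1; i < size; i++) {
            long partial;
            MPI_Recv(&partial, 1, MPI_LONG, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            total += partial;               /* results gathered on one node */
        }
        printf("master combined result: %ld\n", total);
    } else {
        /* Slave node: receive assigned work, compute on local memory,
           and send the partial result back to the master.                 */
        int work;
        MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        long partial = 0;
        for (int k = 0; k < work; k++)
            partial += k;                   /* stand-in for real computation */
        MPI_Send(&partial, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}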
Node Hardware • Typically desktop PCs • Can consist of other types of computers, e.g. • Rack-mount servers • Case-less motherboards • PS3s • Raspberry Pi boards
Node Software • Operating System • Resource Manager • Message Passing Software
Resource Management Software • Condor • Developed by UW-Madison • Allows distributed job submission • PBS (Portable Batch System) • Initially developed by NASA • Developed to schedule jobs on parallel compute clusters • Maui • Adds enhanced scheduling and monitoring to an existing resource manager (e.g. PBS) • Allows administrators to set individual and group job priorities
Sample Condor Submit File • Submits 150 copies of the program foo • Each copy of the program has its own input, output, and error message file • All of the log information from Condor goes to one file
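The submit file itself is not reproduced in the text. A sketch that matches the three bullets above, assuming the executable is named foo and the per-copy file names follow the usual $(Process) convention, might look like this:

# Condor submit description file for 150 copies of foo (reconstruction, not the original file)
universe   = vanilla
executable = foo
input      = foo.$(Process).in    # $(Process) runs 0..149, so each copy gets its own files
output     = foo.$(Process).out
error      = foo.$(Process).err
log        = foo.log              # one log file collects Condor's log information for all jobs
queue 150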
Sample Maui Configuration File • User yangq will have the highest priority, with users of the group ART having the lowest • Members of group CS_SE are limited to 20 jobs using no more than 100 nodes
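The configuration file is likewise only described, not shown. A maui.cfg fragment expressing the two bullets, with the exact priority values chosen purely for illustration, could look roughly like this:

# maui.cfg fragment (sketch; priority values are illustrative)
# user yangq gets the highest individual priority, group ART the lowest
USERCFG[yangq]  PRIORITY=1000
GROUPCFG[ART]   PRIORITY=1
# group CS_SE: at most 20 jobs using no more than 100 nodes in total
GROUPCFG[CS_SE] MAXJOB=20 MAXNODE=100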
Sample PBS Submit File • Submits a job “my_job_name” that needs 1 hour and 4 CPUs with 2 GB of memory • Uses file “my_job_name.in” as input • Uses file “my_job_name.log” as output • Uses file “my_job_name.err” as error output
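The corresponding PBS batch script is also only described. A plausible reconstruction is below; the program name my_program, the shell, and the exact resource syntax (which varies between PBS flavors) are assumptions:

#!/bin/sh
#PBS -N my_job_name
#PBS -l walltime=01:00:00
#PBS -l ncpus=4
#PBS -l mem=2gb
#PBS -o my_job_name.log
#PBS -e my_job_name.err

# run from the directory the job was submitted from,
# reading the input file on standard input
cd $PBS_O_WORKDIR
./my_program < my_job_name.in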
Message Passing Software • MPI (Message Passing Interface) • Widely used in the HPC community • Specification is controlled by the MPI Forum • Available for free • PVM (Parallel Virtual Machine) • First message passing protocol to be widely used • Provides support for fault-tolerant operation
Interconnection Hardware • Two main choices – technology and topology • Main Technologies • Ethernet, with speeds up to 10 Gbps • InfiniBand, with speeds up to 300 Gbps Image Source: http://www.sierra-cables.com/Cables/Images/12X-Infiniband-R.jpg
Interconnection Topology • Bus Network • Torus Network • Flat Neighborhood Network