Beowulf Clusters Paul Tymann Computer Science Department Rochester Institute of Technology ptt@cs.rit.edu
Parallel Computers (Summary) • In the mid-70s and 80s, high-performance computing was dominated by systems contained in a single “box”. • Architectures were specialized and very different from each other. • There is no von Neumann architecture for parallel computers. • If the software and hardware architectures matched, you could attain significant improvements in performance. • Difficult (almost impossible in some cases) to port programs. • Programmers had to be specialized. • Very expensive. 364 - Beowulf Clusters
Seymour Cray (1925-1996) • Packaging, including heat removal. • High level bit plumbing… getting the bits from I/O, into memory through a processor and back to memory and to I/O. • Parallelism. • Programming: O/S and compiler. • Problems being solved. • Established the template for vector supercomputer architecture. 364 - Beowulf Clusters
Cray X-MP/4 364 - Beowulf Clusters
Cray 2 364 - Beowulf Clusters
Thinking Machines • Company founded by Danny Hillis, Guy Steele and others. • Thinking Machines was the leader in scalable computing, with software and applications running on parallel systems ranging from 16 to 1024 processors. • In developing the Connection Machine system, Thinking Machines also did pioneering work in parallel software. 364 - Beowulf Clusters
Basic Organization • Host sends commands & data to microcontroller • Microcontroller broadcasts control signals and data to the processor array • Microcontroller collects data from the processor array [Diagram: Host Computer → Microcontroller → CM Processors and Memories] 364 - Beowulf Clusters
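The organization above is the classic SIMD model: the host issues one command at a time, the microcontroller broadcasts it, and every processor in the array applies it to its own local data in lockstep. The C sketch below is purely illustrative and is not part of the original slides or of any Connection Machine API; the names NodeState and broadcast_add are invented for this example.

#include <stdio.h>

#define NUM_PROCS 8                      /* one entry per processor in the array */

/* Each array processor owns a small local memory. */
typedef struct { int local; } NodeState;

/* The "microcontroller": broadcast one operation and one operand,
 * and every processor applies it to its own data (conceptually in lockstep). */
static void broadcast_add(NodeState procs[], int n, int operand) {
    for (int i = 0; i < n; i++)
        procs[i].local += operand;
}

int main(void) {
    NodeState procs[NUM_PROCS];
    for (int i = 0; i < NUM_PROCS; i++)
        procs[i].local = i;                        /* distinct data per processor */

    broadcast_add(procs, NUM_PROCS, 10);           /* one instruction, many data  */

    for (int i = 0; i < NUM_PROCS; i++)
        printf("proc %d: %d\n", i, procs[i].local);
    return 0;
}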
CM2 364 - Beowulf Clusters
CM5 364 - Beowulf Clusters
SPMD Computing • SPMD stands for single program, multiple data • The same program is run on the processors of an MIMD machine • Occasionally the processors may synchronize • Because the same program is executed on separate data, different branches may be taken, leading to asynchronous parallelism • SPMD came about as a desire to do SIMD-like calculations on MIMD machines • SPMD is not a hardware paradigm; it is the software equivalent of SIMD (see the sketch below) 364 - Beowulf Clusters
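A minimal SPMD sketch, assuming MPI (covered later in the deck) as the message-passing layer: the same C program is started on every node, each process learns its own rank, and the branch it takes depends on that rank, which is how a single program yields asynchronous, MIMD-style parallelism. The master/worker split shown here is an illustrative assumption, not something prescribed by SPMD itself.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* the same program starts everywhere     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* ...but each process gets a unique rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* only rank 0 takes this branch */
        printf("coordinator: %d processes running\n", size);
    } else {
        /* every other rank takes this branch, on its own slice of the data */
        printf("worker %d doing its share of the work\n", rank);
    }

    MPI_Barrier(MPI_COMM_WORLD);             /* an occasional synchronization point */
    MPI_Finalize();
    return 0;
}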
Distributed Systems • A collection of autonomous computers linked by a network, with software designed to produce an integrated computing facility. • The introduction of LANs at the beginning of the 1970s triggered the development of distributed systems. • As an alternative to expensive parallel systems, many researchers began to “build” parallel computers using distributed computing technology. [Diagram: workstations connected by a Local Area Network] 364 - Beowulf Clusters
Distributed vs. High Performance • Distributed systems and distributed software are in common use today • Web servers • ATM networks • Cell phone systems • … • These systems use distributed computing for architectural reasons (reliability, modularity, …), not necessarily for speed. • High-performance distributed computing uses distributed computing to reduce the run time of an application • Primary interest is speed • Primary use is parallel computing 364 - Beowulf Clusters
Common Systems • Beowulf – conqueror of computationally intensive problems • COWs – Clusters of Workstations 364 - Beowulf Clusters
Clusters of Workstations • Cycle vampires • Use wasted compute cycles on the desktop • Utilize equipment that is not designed for distributed computing • 100 Mbps may be fine for mail… • Must work with an OS that is designed for general-purpose computing • Typically suspend computation when the workstation becomes active • Some common software environments include • Condor • PVM/p4 • Autorun • … 364 - Beowulf Clusters
What Is a Beowulf Cluster? • “It's a kind of high-performance massively parallel computer built primarily out of commodity hardware components, running a free-software operating system like Linux or FreeBSD, interconnected by a private high-speed network.” – Beowulf FAQ. • A key feature of a Beowulf cluster is that the machines in the cluster are dedicated to running high-performance computing tasks. • The cluster is on a private network. • It is usually connected to the outside world through only a single node. 364 - Beowulf Clusters
Beowulf Architecture [Diagram: a control node connects the external network to a cluster of dedicated machines on a separate, private network] 364 - Beowulf Clusters
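As a concrete (and purely hypothetical) illustration of that layout, the compute nodes are usually given private addresses that are visible only inside the cluster, while the control node has a second interface on the external network. The hostnames and addresses below are invented for this sketch and do not come from the original slides.

# /etc/hosts as it might look on the control node (illustrative addresses only)
10.0.0.1     control     # also has an external interface facing the outside world
10.0.0.11    node01      # compute nodes live only on the private network
10.0.0.12    node02
10.0.0.13    node03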
Origins of Beowulf • In the early 1990s, NASA researchers Becker & Sterling identified these problems: • Computing projects need more power. • Budgets are increasingly tight. • Supercomputer manufacturers were going bust. • Maintenance contracts voided. • Proprietary hardware no longer upgradeable. • Proprietary software no longer maintainable. “Learning the peculiarities of a specific vendor only enslaves you to that vendor.” 364 - Beowulf Clusters
1994: Wiglaf • Becker & Sterling named their prototype system Wiglaf: • 16 nodes, each with • 486-DX4 CPU (100 MHz) • 16 MB RAM (60 ns) • 540 MB or 1 GB disk • three 10-Mbps Ethernet cards (communication load spread across three distinct Ethernets) • Triple-bus topology • 42 Mflops (measured) Source: Joel Adams, http://www.calvin.edu/~adams/ 364 - Beowulf Clusters
1995: Hrothgar • They named their next system Hrothgar: • 16 nodes, each with • Pentium CPU (100 MHz) • 32 MB RAM • 1.2 GB disk • Two 100-Mbps NICs • Two 100-Mbps switches (double-bus topology) • 280 Mflops (measured) Source: Joel Adams, http://www.calvin.edu/~adams/ 364 - Beowulf Clusters
1997: Stone SouperComputer • Hoffman & Hargrove built ORNL’s Stone SouperComputer • donated/castoff nodes • 486s, Pentiums, ... • whatever RAM and disk were available • one 10-Mbps NIC • one 10-Mbps Ethernet (bus topology) • Feb 2002: 133 nodes • Total cost: $0 • Performance/price ratio: effectively infinite Source: Joel Adams, http://www.calvin.edu/~adams/ 364 - Beowulf Clusters
Stone SouperComputer Source: Joel Adams, http://www.calvin.edu/~adams/ 364 - Beowulf Clusters
plexus.lac.rit.edu • 53 dual-processor PIII 1.4 GHz boxes (106 CPUs), each with: • 36 GB SCSI drives. • 512 MB of RAM. • Gigabit Ethernet card. • A management node. • Storage/server node. • An attached storage array with 14 × 73 GB SCSI drives (approx. 1 TB of storage). • The cluster is connected by switched Gigabit Ethernet. • A 100-Mbps Ethernet is used for administration. 364 - Beowulf Clusters
Beowulf Resources • The Beowulf Project • http://www.beowulf.org • The Beowulf Underground • http://www.beowulf-underground.org/ • The Beowulf HOWTO • http://www.linux.com/howto/Beowulf-HOWTO.html • The Scyld Computing Corporation • http://www.scyld.com 364 - Beowulf Clusters
Communication • Communication is vital in any kind of distributed application. • Initially most people wrote their own protocols. • Tower of Babel effect. • Eventually standards appeared. • Parallel Virtual Machine (PVM). • Message Passing Interface (MPI). 364 - Beowulf Clusters
What is MPI? • A message passing library specification • Message-passing model • Not a compiler specification (i.e. not a language) • Not a specific product • Designed for parallel computers, clusters, and heterogeneous networks 364 - Beowulf Clusters
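Because MPI is a specification rather than a language, an MPI program is ordinary C (or Fortran) calling library routines. The sketch below shows basic point-to-point message passing between two processes; the payload value and message tag are arbitrary choices for illustration.

#include <stdio.h>
#include <mpi.h>

/* Run with at least two processes, e.g.  mpirun -np 2 ./prog  */
int main(int argc, char *argv[]) {
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                            /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* send to rank 1    */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

With a typical implementation this would be compiled with mpicc and launched with mpirun or mpiexec, though the exact commands depend on the MPI installed on the cluster.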
The MPI Process • Development began in early 1992 • Open process / broad participation • IBM, Intel, TMC, Meiko, Cray, Convex, nCUBE • PVM, p4, Express, Linda, … • Laboratories, universities, government • Final version of the draft in May 1994 • Public and vendor implementations are now widely available 364 - Beowulf Clusters