
Building Beowulfs for High Performance Computing


Presentation Transcript


  1. Building Beowulfs for High Performance Computing Duncan Grove Department of Computer Science University of Adelaide http://dhpc.adelaide.edu.au/projects/beowulf

  2. Anatomy of a “Beowulf”
  • “Cluster” of networked PCs
  • Intel PentiumII or Compaq Alpha
  • Switched 100Mbit/s Ethernet or Myrinet
  • Linux
  • Parallel and batch software support
  [Diagram: outside world <-> front-end node <-> switching infrastructure <-> compute nodes n1 … nN]

  3. Why build Beowulfs?
  • Science/$
  • Some problems take lots of processing
  • Many supercomputers are used as batch processing engines
  • Traditional supercomputers are wasteful for high throughput computing
  • Beowulfs: “[useful] computational cycles at the lowest possible price”
  • Suited to high throughput computing
  • Effective at an increasingly large set of parallel problems

  4. Three Computational Paradigms
  • Data Parallel: regular grid based problems; parallelising compilers, e.g. HPF; e.g. physicists running lattice gauge calculations
  • Message Passing: unstructured parallel problems; MPI, PVM; e.g. chemists running molecular dynamics simulations (see the MPI sketch below)
  • Task Farming: “high throughput computing” with batch jobs and queuing systems; e.g. chemists running Gaussian
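  The message-passing paradigm is the one most Beowulf applications use. As a minimal illustrative sketch (not taken from the talk), the C program below passes a token around a ring of MPI processes; it assumes an MPI implementation such as MPICH or LAM/MPI (both listed later under the perseus software) and at least two processes.

```c
/* ring.c: minimal MPI message-passing sketch (illustrative, not from the talk).
 * Run with at least two processes, e.g. mpirun -np 4 ./ring */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, token;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Rank 0 starts the token, then waits for it to come back around. */
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1 % size, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
        printf("token returned to rank 0 after visiting %d processes\n", size);
    } else {
        /* Everyone else receives from the previous rank and forwards it. */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

  It would typically be compiled with an MPI wrapper such as mpicc and launched across the compute nodes with mpirun.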

  5. A Brief Cluster History
  • Caltech Prehistory
  • Berkeley NOW
  • NASA Beowulf
  • Stone SouperComputer
  • USQ Topcat
  • UIUC NT Supercluster
  • LANL Avalon
  • SNL Cplant
  • AU Perseus?

  6. Beowulf Wishlist
  • Single System Image (SSI): unified process space, distributed shared memory, distributed file system
  • Performance easily extensible: just “add more bits”
  • Is fault tolerant
  • Is “simple” to administer and use

  7. Current Sophistication?
  • Shrinkwrapped “solutions” or do-it-yourself
  • Not much more than a nicely installed network of PCs
  • A few kernel hacks to improve performance
  • No magical software for making the cluster transparent to the user
  • Queuing software and parallel programming software can create the appearance of a more unified machine

  8. Stone SouperComputer

  9. Iofor
  • Learning platform
  • Program development
  • Simple benchmarking
  • Simple performance evaluation of real applications
  • Teaching machine
  • Money lever

  10. iMacwulf
  • Student lab by day, Beowulf by night?
  • MacOS with Appleseed
  • LinuxPPC 4.0, soon LinuxPPC 5.0
  • MacOS/X

  11. “Gigaflop harlotry”
  Machine                   | Cost              | # Processors | ~ Peak Speed
  Cray T3E                  | $10s of millions  | 1084         | 1300 Gflop/s
  SGI Origin 2000           | $10s of millions  | 128          | 128 Gflop/s
  IBM SP2                   | $10s of millions  | 512          | 400 Gflop/s
  Sun HPC                   | $1s of millions   | 64           | 50 Gflop/s
  TMC CM5                   | $5 million (1992) | 128          | 20 Gflop/s
  SGI PowerChallenge        | $1 million (1995) | 20           | 20 Gflop/s
  Beowulf cluster + Myrinet | $1 million        | 256          | 120 Gflop/s
  Beowulf cluster           | $300K             | 256          | 120 Gflop/s

  12. The obvious, but important
  In the past:
  • Commodity processors way behind supercomputer processors
  • Commodity networks way, way, way behind supercomputer networks
  In the now:
  • Commodity processors only just behind supercomputer processors
  • Commodity networks still way, way behind supercomputer networks
  • More exotic networks still way behind supercomputer networks
  In the future:
  • Commodity processors will be supercomputer processors
  • Will the commodity networks catch up?

  13. Hardware possibilities

  14. OS possibilities

  15. Open Source
  The good...
  • Lots of users, active development
  • Easy access to make your own tweaks
  • Aspects of Linux are still immature, but recently SGI has released XFS as open source and Sun has released its HPC software as open source
  And the bad...
  • There’s a lot of bad code out there!

  16. Network technologies
  • So many choices! Interfaces, cables, switches, hubs; ATM, Ethernet, Fast Ethernet, gigabit Ethernet, firewire, HiPPI, serial HiPPI, Myrinet, SCI…
  • The important issues: latency, bandwidth, availability, price, price/performance, application type! (a simple latency/bandwidth measurement sketch follows below)
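  Latency and bandwidth, the first two issues listed, are usually measured with a simple ping-pong test between two nodes. The sketch below is one illustrative way to do this with MPI; the 1 MB message size and 100 repetitions are arbitrary choices, and much smaller messages would be used to estimate latency instead of bandwidth.

```c
/* pingpong.c: rough MPI bandwidth/latency probe (illustrative sketch).
 * Run with exactly two processes, ideally placed on two different nodes. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NBYTES (1 << 20)   /* 1 MB message for the bandwidth estimate */
#define REPS   100

int main(int argc, char **argv)
{
    int rank, i;
    char *buf;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(NBYTES);

    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            /* Send the buffer and wait for it to bounce back. */
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            /* Echo whatever arrives straight back to rank 0. */
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double round_trip = (t1 - t0) / REPS;          /* seconds per round trip */
        double bandwidth  = 2.0 * NBYTES / round_trip; /* bytes moved per second */
        printf("avg round trip: %g s, approx bandwidth: %g MB/s\n",
               round_trip, bandwidth / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

  Run on two different nodes, the reported figure approximates the sustainable point-to-point bandwidth of the interconnect, which is what matters for the application types above.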

  17. Disk subsystems
  • I/O is a problem in parallel systems: data that is not local to a compute node is a performance hit
  • Distributed file systems: CacheFS, CODA
  • Parallel file systems: PVFS
  • On-line bulk data is interesting in itself: a Beowulf bulk data server, cf. slow, expensive tape silos...

  18. Perseus
  • Machine for chemistry simulations, mainly high throughput computing
  • RIEF grant in excess of $300K
  • 128 nodes, for < $2K per node
  • Dual processor PII450
  • At least 256MB RAM; some nodes up to 1GB
  • 6GB local disk each
  • 5x24 (+2x4) port Intel 100Mbit/s switches

  19. Perseus: Phase 1
  • Prototype of 16 dual processor PIIs
  • 100Mbit/s switched Ethernet

  20. Perseus: installing a node
  [Diagram: outside world <-> front-end node <-> switching infrastructure <-> compute nodes n1 … nN]
  • Front-end node provides: user node, administration, compilers, queues, NFS, DNS, NIS, /etc/*, bootp/dhcp, kickstart, ...
  • Compute nodes are installed by booting from floppy disk or bootrom

  21. Software on perseus
  Software to support the three computational paradigms:
  • Data Parallel: Portland Group HPF
  • Message Passing: MPICH, LAM/MPI, PVM
  • High throughput computing: Condor, GNU Queue; Gaussian94, Gaussian98

  22. Expected parallel performance
  • Loki, 1996: 16 Pentium Pro processors, 10Mbit/s Ethernet; 3.2 Gflop/s peak, achieved 1.2 real Gflop/s on the Linpack benchmark
  • Perseus, 1999: 256 PentiumII processors, 100Mbit/s Ethernet; 115 Gflop/s peak, ~40 Gflop/s on the Linpack benchmark?
  • Compare with the Top 500! That would get us to about #200 currently
  • Other Australian machines? NEC SX/4 @ BOM at #102; Sun HPC at #181, #182, #255; Fujitsu VPP @ ANU at #400
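  The 115 Gflop/s peak figure is consistent with counting one floating-point operation per clock cycle per 450 MHz PentiumII, a common (and optimistic) convention for peak ratings:

  \[ 256 \times 450\,\text{MHz} \times 1\,\text{flop/cycle} = 115.2\ \text{Gflop/s} \]

  The ~40 Gflop/s Linpack estimate is then roughly a third of peak, similar in spirit to Loki achieving 1.2 Gflop/s of its 3.2 Gflop/s peak.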

  23. Reliability in a large system
  • Build it right!
  • Is the operating system and software running OK? Is heat dissipation going to be a problem?
  • Monitoring daemon (see the sketch below): normal features (CPU, network, memory, disk) and more exotic features (power supply and CPU fan speeds, motherboard and CPU temperatures)
  • Do we have any heisen-cabling? Racks and lots of cable ties!
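  As a hedged sketch of the “normal features” side of such a monitoring daemon, the C fragment below periodically reports load average and free memory from the standard Linux /proc files. Fan speeds and temperatures would need hardware-specific sensor support (for example lm_sensors) and are omitted, and the 60-second interval is an arbitrary choice.

```c
/* monitor.c: minimal node-health sketch covering only the "normal" features
 * (CPU load and free memory). Temperatures and fan speeds would need
 * hardware-specific sensor support and are omitted here. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        FILE *f;
        double load1, load5, load15;

        /* /proc/loadavg begins with the 1, 5 and 15 minute load averages. */
        f = fopen("/proc/loadavg", "r");
        if (f) {
            if (fscanf(f, "%lf %lf %lf", &load1, &load5, &load15) == 3)
                printf("load average: %.2f %.2f %.2f\n", load1, load5, load15);
            fclose(f);
        }

        /* Scan /proc/meminfo for the MemFree line (reported in kB). */
        f = fopen("/proc/meminfo", "r");
        if (f) {
            char line[128];
            long kb;
            while (fgets(line, sizeof line, f)) {
                if (sscanf(line, "MemFree: %ld", &kb) == 1) {
                    printf("free memory: %ld kB\n", kb);
                    break;
                }
            }
            fclose(f);
        }

        sleep(60);  /* arbitrary reporting interval */
    }
    return 0;
}
```

  In practice a daemon like this would push its readings back to the front-end node rather than print them locally.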

  24. The limitations...
  • Scalability
  • Load balancing: effects of machines’ capabilities; desktop machines vs. dedicated machines
  • Resource allocation
  • Task migration
  • Distributed I/O
  • System monitoring and control tools
  • Maintenance requirements: installation, upgrading, versioning; complicated scripts; parallel interactive shell? (see the sketch below)
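  On the maintenance point, a parallel interactive shell is essentially “run this command on every node”. The toy sketch below shows the idea in C using fork and rsh; the node names n1..n4 are hypothetical placeholders, a real cluster would read a host list (and might prefer ssh), and only a single-word command is forwarded.

```c
/* prun.c: toy "run one command on every node" sketch, in the spirit of a
 * parallel interactive shell. Node names n1..n4 are hypothetical; a real
 * cluster would read a host list, and might use ssh instead of rsh. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    const char *nodes[] = { "n1", "n2", "n3", "n4" };  /* placeholder names */
    int nnodes = sizeof nodes / sizeof nodes[0];
    int i;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command\n", argv[0]);
        return 1;
    }

    /* Fork one rsh per node so the command runs on all nodes in parallel. */
    for (i = 0; i < nnodes; i++) {
        if (fork() == 0) {
            execlp("rsh", "rsh", nodes[i], argv[1], (char *)NULL);
            perror("execlp rsh");
            _exit(127);
        }
    }

    /* Collect all children before exiting so their output is not cut off. */
    for (i = 0; i < nnodes; i++)
        wait(NULL);

    return 0;
}
```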

  25. …and the opportunities
  • A large proportion of the current limitations compared with traditional HPC solutions are merely systems integration problems
  • Some contributions to be made in: HOWTOs; monitoring and maintenance; performance modelling and real benchmarking
