
Running, Using, and Maintaining a Cluster

From a software viewpoint. Andrew Fitz Gibbon.


Presentation Transcript


  1. Running, Using, and Maintaining a Cluster • From a software viewpoint • Andrew Fitz Gibbon

  2. What’s involved? • A cluster. • People to use that cluster. • Software for those people to use. • Ways to keep the cluster working. • Libraries and User applications • Schedulers and resource managers • Monitoring and Maintenance tools

  3. Goals? • Reliability • Availability • Efficiency* • Security** • AFAP*** * Maintains high utilization through people getting “real” work done, not through system time; should also be optimized for your users’ workload. ** But not so secure as to lock out users. *** AFAP: As Fast As Possible
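The efficiency footnote can be made concrete with a small sketch. It assumes each node reports cumulative user, system, and idle CPU seconds; the `CpuSample` type and both function names are hypothetical, and real numbers would come from whatever monitoring tool the cluster runs. The point is only that "utilization" should count time spent on users' work separately from system time.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """Cumulative CPU seconds for one node (hypothetical record layout)."""
    user: float    # time spent running users' code
    system: float  # time spent in the kernel / system overhead
    idle: float    # time spent doing nothing


def utilization(sample):
    """Return (user_fraction, system_fraction) of total CPU time."""
    total = sample.user + sample.system + sample.idle
    if total == 0:
        return 0.0, 0.0
    return sample.user / total, sample.system / total


def cluster_utilization(samples):
    """Aggregate per-node samples, then compute cluster-wide fractions."""
    summed = CpuSample(
        user=sum(s.user for s in samples),
        system=sum(s.system for s in samples),
        idle=sum(s.idle for s in samples),
    )
    return utilization(summed)
```

A cluster that looks busy but shows a large system fraction is not efficient in the sense used on this slide.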

  4. Compilers • GNU Compiler suite • Intel Compiler suite • Portland Group Compilers • Cray Compilers • ....

  5. Libraries • MPI • ATLAS • ScalaBLAST • CUDA • Gaussian • mpiBLAST • BLAS • LAPACK • ScaLAPACK • PETSc • ... ad nauseam

  6. Job Managers • TORQUE/PBS, LSF, LoadLeveler, Condor, Sun Grid Engine • Usually tied into a scheduler such as FIFO or Maui • Separate queues based on needs • E.g., CUDA, Debug, “normal,” intpar
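To illustrate how a FIFO policy and separate queues fit together, here is a minimal toy sketch. It is not how TORQUE, Maui, or any of the listed products is implemented; the `FifoScheduler` class, its methods, and the node counts are assumptions made for illustration. Each queue owns a fixed number of nodes, and within a queue only the job at the head may start.

```python
from collections import deque


class FifoScheduler:
    """Toy strict-FIFO scheduler with one node pool per queue (illustrative)."""

    def __init__(self, queue_nodes):
        # queue name -> total nodes dedicated to that queue
        self.capacity = dict(queue_nodes)
        self.free = dict(queue_nodes)
        self.waiting = {name: deque() for name in queue_nodes}
        self.running = {}  # job_id -> (queue, nodes)

    def submit(self, job_id, queue, nodes):
        """Append a job to the back of the named queue."""
        if queue not in self.waiting:
            raise KeyError(f"unknown queue: {queue}")
        if nodes > self.capacity[queue]:
            raise ValueError(f"{job_id} can never fit in queue {queue}")
        self.waiting[queue].append((job_id, nodes))

    def schedule(self):
        """Start jobs in arrival order; a blocked head job blocks its queue."""
        started = []
        for queue, jobs in self.waiting.items():
            while jobs and jobs[0][1] <= self.free[queue]:
                job_id, nodes = jobs.popleft()
                self.free[queue] -= nodes
                self.running[job_id] = (queue, nodes)
                started.append(job_id)
        return started

    def finish(self, job_id):
        """Release the nodes held by a completed job."""
        queue, nodes = self.running.pop(job_id)
        self.free[queue] += nodes
```

Strict FIFO is simple but can leave nodes idle behind a large head job, which is one reason sites pair a job manager with a more capable scheduler. A separate debug queue keeps short test jobs from waiting behind long production runs.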

  7. Monitoring tools • Cacti • Ganglia

  8. Maintenance tools • C3 Tools • Nagios • Cron • IPMI • Modules • ...
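As one example of the kind of check these tools run, here is a sketch of a disk-usage check that follows the Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL). The function names and the 80/90 percent thresholds are assumptions chosen for illustration, not part of the original slides or of any shipped plugin.

```python
import shutil

# Nagios plugin convention: the exit code carries the check status.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to (status_code, one-line message)."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path=".", warn=80.0, crit=90.0):
    """Measure usage of the filesystem holding `path` and classify it."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    return classify(percent_used, warn, crit)
```

A wrapper script would print the message and pass the status code to `sys.exit`, so that a monitoring server or a cron job can act on the result.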

  9. Questions?
