Cost-effective clustering with OpenPBS

Ben Webb, WGR Research Group, Physical and Theoretical Chemistry Lab., University of Oxford



  1. Cost-effective clustering with OpenPBS
     Ben Webb, WGR Research Group
     Physical and Theoretical Chemistry Lab., University of Oxford

  2. Overview
     • History of PBS
     • Interests of the WGR group
     • OpenPBS architecture: portability, security, scheduling
     • Grid integration
     • Alternatives

  3. History of PBS
     • PBS is the “Portable Batch System”
     • Developed from 1993 to 1997 for NASA
     • Intended to replace NQS
     • Currently available as:
        • OpenPBS (open source)
        • PBSPro (commercial)

  4. Interests of the WGR group
     • High throughput
        • Virtual screening (cancer screensaver)
        • Met by loose “grid” of over 2 million PCs; United Devices/Intel
     • High performance
        • Ab initio chemistry
        • Simulation of chemical reactions (free energy)
        • Met by OpenPBS at zero software cost

  5. OpenPBS architecture
     • Server: keeps track of all jobs
     • Scheduler: tells the server when and where to run jobs
     • MOM (Machine Oriented Miniserver): runs on each node to start, monitor, and terminate jobs, under instruction from the server
     • POSIX-compliant batch system
     • Supports file staging for executables and data
     • No need for a shared filesystem (e.g. NFS), although this does simplify communication
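The file staging mentioned above is requested per job. The following is a minimal sketch, not taken from the slides, that builds the text of a job script in Python. It assumes the `#PBS` directive syntax of the `qsub` man page (`-N`, `-l nodes=`, `-l walltime=`, and `-W stagein=local_file@hostname:remote_file`); the helper name `build_job_script`, the host `fileserver`, and all file names are hypothetical.

```python
def build_job_script(name, command, nodes=1, walltime="01:00:00",
                     stage_in=(), stage_out=()):
    """Return the text of a PBS job script (illustrative helper, not a PBS API).

    stage_in and stage_out are sequences of (local_file, host, remote_file).
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",
        f"#PBS -l nodes={nodes}",
        f"#PBS -l walltime={walltime}",
    ]
    # Assumed syntax: one -W attribute holding a comma-separated file list.
    if stage_in:
        files = ",".join(f"{loc}@{host}:{rem}" for loc, host, rem in stage_in)
        lines.append(f"#PBS -W stagein={files}")
    if stage_out:
        files = ",".join(f"{loc}@{host}:{rem}" for loc, host, rem in stage_out)
        lines.append(f"#PBS -W stageout={files}")
    lines.append(command)
    return "\n".join(lines) + "\n"


script = build_job_script(
    "scf", "./run_scf input.dat", nodes=2,
    stage_in=[("input.dat", "fileserver", "/data/input.dat")],
    stage_out=[("output.log", "fileserver", "/data/output.log")],
)
print(script)
```

The resulting text would be handed to `qsub`; with staging in place the job does not depend on a shared filesystem between the submission host and the execution node.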

  6. An example OpenPBS setup

  7. Advantages of PBSPro
     • Pre-emptive job scheduling
     • Scheduler backfilling
     • Improved fault tolerance
     • “Desktop Cycle Harvesting”
     • Paid support (all OpenPBS support is via mailing lists)
     • Largely compatible with OpenPBS

  8. Portability
     • Runs on most Unix-like systems: e.g. Linux/Irix/Unicos/HPUX/IA64 etc.
     • MOMs for various architectures take advantage of system-specific features
        • e.g. checkpointing supported on certain architectures
     • Full server/client/MOM support for heterogeneous networks

  9. Queues and nodes
     • Unlike NQS, PBS does not rely on queues for scheduling decisions
     • Queues are not tied to nodes, but can specify resources
     • Routing queues can pass jobs to execution queues, possibly on different PBS servers
     • Nodes can have any number of virtual processors

  10. Resource definition
      • Server-defined properties group nodes into classes, e.g. “intel” for all Intel architecture machines
      • Additional resources (e.g. tape drives, software licences) can be specified by each MOM
      • Custom resources are not utilised by the default scheduler
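To make node properties and virtual processors concrete, here is a small sketch that parses a server nodes file and selects nodes by property. It assumes the usual one-node-per-line layout (`hostname[:ts] [property ...] [np=N]`, where `:ts` marks a timeshared node); the host names and the helper names `parse_nodes_file` and `nodes_with` are made up for illustration.

```python
def parse_nodes_file(text):
    """Parse nodes-file text into {hostname: {"np", "properties", "timeshared"}}."""
    nodes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, *tokens = line.split()
        timeshared = name.endswith(":ts")  # assumed marker for timeshared nodes
        if timeshared:
            name = name[:-3]
        np = 1                             # assume one virtual processor by default
        properties = set()
        for token in tokens:
            if token.startswith("np="):
                np = int(token[3:])
            else:
                properties.add(token)
        nodes[name] = {"np": np, "properties": properties,
                       "timeshared": timeshared}
    return nodes


def nodes_with(nodes, prop):
    """Return the sorted names of nodes carrying the given property."""
    return sorted(n for n, info in nodes.items() if prop in info["properties"])


SAMPLE = """\
node01 np=2 intel
node02 np=2 intel
sgi01:ts irix
"""

nodes = parse_nodes_file(SAMPLE)
print(nodes_with(nodes, "intel"))
```

A job asking for `-l nodes=2:intel` would be restricted to the nodes this selection returns.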

  11. Resource usage
      • Timeshared nodes: balanced by load
      • Cluster nodes: jobs allocated to virtual processors, usually exclusively
      • MOMs track jobs and kill any that exceed resource limits (e.g. CPU or wall time, memory)
      • No unified mechanism for accounting of running and finished jobs
         • qstat for running jobs
         • Server accounting logs for finished jobs
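Because finished jobs appear only in the server accounting logs, sites typically post-process those logs themselves. The sketch below totals wall time per user. It assumes the semicolon-separated record layout `date time;record_type;job_id;key=value ...`, with `E` marking a job-end record and a `resources_used.walltime` field; the sample records are invented for illustration.

```python
from collections import defaultdict


def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def walltime_by_user(log_lines):
    """Sum resources_used.walltime (in seconds) per user over 'E' records."""
    totals = defaultdict(int)
    for line in log_lines:
        parts = line.strip().split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # skip malformed lines and non-end records
        fields = dict(item.split("=", 1)
                      for item in parts[3].split() if "=" in item)
        if "user" in fields and "resources_used.walltime" in fields:
            totals[fields["user"]] += hms_to_seconds(
                fields["resources_used.walltime"])
    return dict(totals)


SAMPLE_LOG = [
    "04/01/2003 10:00:00;E;101.server;user=alice group=wgr "
    "resources_used.cput=00:50:00 resources_used.walltime=01:00:00",
    "04/01/2003 11:30:00;E;102.server;user=bob group=wgr "
    "resources_used.walltime=00:30:00",
    "04/01/2003 12:00:00;S;103.server;user=alice group=wgr",
    "04/01/2003 13:00:00;E;103.server;user=alice group=wgr "
    "resources_used.walltime=00:15:30",
]

totals = walltime_by_user(SAMPLE_LOG)
print(totals)
```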

  12. Scheduling
      • Scheduler is just a privileged client
      • Well-defined PBS scheduling API
      • Facilities to write schedulers in C/BaSL/Tcl
      • OpenPBS provides a simple FIFO scheduler, as well as custom schedulers to take advantage of system-specific features
      • Maui scheduler (third party) also integrates with other batch systems, and provides powerful scheduling
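The idea of a FIFO policy can be illustrated with a toy model. This is not the OpenPBS scheduler or its API: it is a sketch that assumes single-node jobs, places each job on the first node with enough free virtual processors, and lets the head of the queue block everything behind it (no backfilling). All names are hypothetical.

```python
from collections import deque


def fifo_schedule(jobs, free_vps):
    """Toy strict-FIFO placement.

    jobs: iterable of (job_id, vps_needed) in submission order.
    free_vps: dict mapping node name to free virtual processors.
    Returns (placements, waiting): a dict job_id -> node and a list of job ids.
    """
    free = dict(free_vps)  # work on a copy; the caller's dict is untouched
    queue = deque(jobs)
    placements = {}
    while queue:
        job_id, needed = queue[0]
        node = next((n for n, vps in free.items() if vps >= needed), None)
        if node is None:
            break  # strict FIFO: the blocked head job holds back later jobs
        free[node] -= needed
        placements[job_id] = node
        queue.popleft()
    return placements, [job_id for job_id, _ in queue]


placements, waiting = fifo_schedule(
    [("a", 2), ("b", 2), ("c", 4), ("d", 1)],
    {"node01": 2, "node02": 2},
)
print(placements, waiting)
```

In this run job `d` would fit nowhere anyway once `a` and `b` are placed, but even with spare capacity it would wait behind `c`; letting such small jobs run ahead is what backfilling adds.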

  13. Security
      • Uses rhosts mechanism for authentication of clients to the server (consistent user name space not required), but does not require rsh
      • MOMs can use rsh, ssh or cp (via NFS) to stage files in and out
      • Access Control Lists can also be used to provide extra security
      • PBS daemons use non-random port numbers, and TCP for most communication, allowing straightforward firewalling
      • All daemons run as root! (No reported vulnerabilities to date, however.)
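Fixed port numbers make it simple to verify a firewall configuration from another machine. The sketch below probes TCP ports using only the Python standard library. The port numbers are the commonly cited OpenPBS defaults and should be treated as an assumption: check the installation's own configuration before relying on them. The host name in the usage comment is hypothetical.

```python
import socket

# Assumed default ports; a site may have configured different values.
PBS_DEFAULT_PORTS = {
    "pbs_server": 15001,
    "pbs_mom": 15002,
    "pbs_resmon": 15003,
    "pbs_sched": 15004,
}


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_pbs_ports(host, ports=PBS_DEFAULT_PORTS):
    """Map each daemon name to whether its TCP port is reachable on host."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}


# Example (hypothetical host name):
# print(check_pbs_ports("pbs-server.example.org"))
```

Run from outside the firewall, every entry should come back `False`; run from a submission host, at least the server port should be reachable.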

  14. Parallel support
      • Conventional MPI mechanisms rely on well-behaved users, and lack resource tracking
      • OpenPBS provides a Task Manager (TM) API
         • Allows parallel PBS jobs to spawn processes on nodes other than the master
      • mpiexec (third party) allows start-up of MPI jobs via the TM mechanism (MPICH/EMP/LAM)
      • Current LAM CVS also has a PBS-TM boot SSI (system services interface) for job start-up
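The TM API itself is a C interface, so it is not reproduced here. As a smaller illustration of what a parallel job can learn about its allocation, the sketch below reads the file named by the `PBS_NODEFILE` environment variable, which PBS sets inside a job and which is assumed to list one host name per allocated virtual processor. The helper name `slots_per_host` is hypothetical.

```python
import os
from collections import Counter


def slots_per_host(nodefile_path=None):
    """Count allocated virtual processors per host from a PBS node file.

    With no argument, the path is read from $PBS_NODEFILE, so the call
    only works inside a running PBS job.
    """
    path = nodefile_path or os.environ["PBS_NODEFILE"]
    with open(path) as handle:
        hosts = [line.strip() for line in handle if line.strip()]
    return Counter(hosts)


# Inside a job script:
#     for host, slots in slots_per_host().items():
#         print(host, slots)
```

A launcher that starts processes with rsh or ssh from this list bypasses the MOMs, which is the resource-tracking gap the TM mechanism is meant to close.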

  15. Customisation
      • Full source code available, for commercial or non-commercial use
      • Site-specific modification routines allow easy customisation of “likely targets”
      • Defined C API for job submission, query etc.
      • Third-party projects and patches, e.g. mpiexec, Cplant (fault tolerance), PyPBS, scalability patches, AFS token management

  16. Grid integration
      • Globus Resource Allocation Manager (GRAM) available for PBS
      • Maui scheduler or PBSPro default scheduler support advance reservations
      • Silver metascheduler is grid-aware, has full support for PBS, and can work with or without Globus

  17. Comparison with Sun Grid Engine
      • Both systems perform balancing of jobs/load between managed nodes
      • PBS server is a single point of failure; SGE supports shadow masters
      • SGE now seems to be more actively developed than OpenPBS

  18. Summary and acknowledgements
      • OpenPBS is a cheap solution for Linux clustering, conventional supercomputer management, and/or use of idle workstations
      • Can upgrade easily to PBSPro if desired

      PBS includes software developed by NASA Ames Research Center, Lawrence Livermore National Laboratory, and Veridian Information Solutions, Inc. Visit www.OpenPBS.org for OpenPBS software support, products, and information.

      WGR group webpages: http://bellatrix.pcl.ox.ac.uk/
