OpenPBS (Portable Batch System): A Job Scheduling System. By: Zac Brownell.
Overview
• Overview
• System Details
• Commands (client)
• Server
• Executor (MOM – Machine-Oriented Miniserver)
• Scheduler
• Staging Files
• Utilities/Extensions for OpenPBS
• Questions
Commands Overview
• PBS offers two types of commands
  • Command-line commands (POSIX 1003.2d conformant)
  • GUI (Graphical User Interface) – haven't seen this anywhere, but the system admin install directions call it out
• Commands allow users to do the following to jobs:
  • Submit
  • Monitor
  • Modify
  • Delete
• Commands can be installed on any system type supported by PBS and do not require the local presence of any of the other components of PBS
• Three classifications of commands exist:
  • User commands (any authorized user can use)
    • Operate against jobs
  • Operator commands (require different access privileges)
    • Operate against queues
  • Manager commands (require different access privileges)
    • Operate against servers
Example Commands
• Alter Job
  • qalter [-a date_time] [-A account_string] [-c interval] [-e path] [-h hold_list] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path] [-p priority] [-r c] [-S path] [-u user_list] [-W additional_attributes] job_identifier ...
• Delete Job
  • qdel [-W delay] job_identifier ...
• Hold Job
  • qhold [-h hold_list] job_identifier ...
• Move Job
  • qmove destination job_identifier ...
• Message Job
  • qmsg [-E] [-O] message_string job_identifier ...
• Order Jobs
  • qorder job_identifier job_identifier
• Rerun Job
  • qrerun job_identifier ...
Example Commands (cont)
• Release Job
  • qrls [-h hold_list] job_identifier ...
• Select Jobs
  • qselect [-a [op]date_time] [-A account_string] [-c [op]interval] [-h hold_list] [-l resource_list] [-N name] [-p [op]priority] [-r rerun] [-s states] [-u user_list]
• Signal Job
  • qsig [-s signal] job_identifier ...
• Status Jobs, Queues, or Servers
  • qstat [-f] [-W site_specific] [job_identifier... | destination...]
  • qstat [-a|-i|-r] [-n] [-s] [-G|-M] [-R] [-u user_list] [job_identifier... | destination...]
  • qstat -Q [-f] [-W site_specific] [destination...]
  • qstat -q [-G|-M] [destination...]
  • qstat -B [-f] [-W site_specific] [server_name...]
• Submit Job
  • qsub [-a date_time] [-A account_string] [-c interval] [-C directive_prefix] [-e path] [-h] [-I] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path] [-p priority] [-q destination] [-r c] [-S path_list] [-u user_list] [-v variable_list] [-V] [-W additional_attributes] [-z] [script]
• Etc., etc., etc.
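As a rough illustration of how the qsub options in the synopsis combine, the Python sketch below assembles a command line without running it. The helper name build_qsub_command, the script name run_sim.sh, the queue name batch, and the walltime value are hypothetical; only the flags themselves (-N, -q, -l, -j, -o) come from the synopsis.

```python
import shlex


def build_qsub_command(script, name=None, queue=None, resources=None,
                       join_output=False, output_path=None):
    """Return a qsub argument list for a job script (nothing is executed)."""
    argv = ["qsub"]
    if name is not None:
        argv += ["-N", name]            # -N name
    if queue is not None:
        argv += ["-q", queue]           # -q destination
    if resources:
        # -l resource_list is given here as comma-separated name=value pairs
        resource_list = ",".join(f"{key}={value}" for key, value in resources.items())
        argv += ["-l", resource_list]
    if join_output:
        argv += ["-j", "oe"]            # join value "oe": merge stderr into stdout
    if output_path is not None:
        argv += ["-o", output_path]     # -o path
    argv.append(script)
    return argv


# Hypothetical job: every value below is a placeholder.
command = build_qsub_command(
    "run_sim.sh",
    name="demo",
    queue="batch",
    resources={"walltime": "00:10:00"},
    join_output=True,
)
print(shlex.join(command))
```

Building the argument list separately from executing it keeps the sketch usable on a machine with no PBS client installed; on a real client host the list could be handed to subprocess.run.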
Server Overview
• The Job Server is the central focus for PBS
  • Generally referred to as the Server or by its executable name, pbs_server
• All commands and the other daemons communicate with the Server via an IP network
• Provides basic batch services
  • Receiving/creating a batch job
  • Modifying the job
  • Protecting the job against system crashes
  • Running the job (placing it into execution)
• Maintains two queues:
  • Routing queue (optional)
  • Execution (batch) queue (required)
Executor (MOM) Overview
• Daemon which actually places the job into execution
• The daemon is called Mom, as it is the mother of all executing jobs; it is also referred to by its executable name, pbs_mom
• Places a job into execution when it receives a copy of the job from a Server
• Creates a new session as identical to a user login session as possible
  • Example: if the user's login shell is csh, then Mom creates a session in which .login is run as well as .cshrc
    • csh = C Shell
    • .login is the file read only at login
    • .cshrc is the file read each time a C shell is started
• Returns the job's output to the user when directed to do so by the Server
• OpenPBS supports Interactive mode, which allows standard I/O to be redirected to the client at run time (neat feature; you have to specifically call it out if you want to use it, as it is not the default setting)
Scheduler Overview
• Daemon which contains the site's policy controlling which job is run, in addition to where and when it is run
• PBS allows each site to create its own scheduler
• Scheduler communicates with the various Moms to learn about the state of system resources
  • Memory usage
  • CPU load average
• Scheduler communicates with the Server to learn about the availability of jobs to execute
  • Status of jobs in queue
  • Job dependencies
• Scheduler appears as a batch Manager to the Server
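Because each site supplies its own policy, the following is only one possible sketch of a policy decision, not the scheduler shipped with OpenPBS. It assumes a first-in, first-out queue, a per-node load-average cutoff, and a memory requirement per job; the class names, field names, and the max_load threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Resource state a Mom might report (hypothetical fields)."""
    host: str
    load_average: float
    free_memory_mb: int


@dataclass
class QueuedJob:
    """Job state the Server might report (hypothetical fields)."""
    job_id: str
    memory_mb: int


def choose_placement(jobs, nodes, max_load=1.0):
    """Return (job, node) for the first job that fits somewhere, else None."""
    for job in jobs:  # jobs are assumed to be in submission order (FIFO)
        candidates = [
            node for node in nodes
            if node.load_average < max_load and node.free_memory_mb >= job.memory_mb
        ]
        if candidates:
            # Among nodes that fit, prefer the one with the lowest load average.
            return job, min(candidates, key=lambda node: node.load_average)
    return None


nodes = [
    NodeStatus("node1", load_average=0.4, free_memory_mb=2048),
    NodeStatus("node2", load_average=0.1, free_memory_mb=512),
    NodeStatus("node3", load_average=1.5, free_memory_mb=8192),
]
jobs = [QueuedJob("job.1", memory_mb=1024), QueuedJob("job.2", memory_mb=256)]

placement = choose_placement(jobs, nodes)
print(placement)
```

Here job.1 lands on node1: node2 has too little free memory and node3 is over the load cutoff.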
Batch Scheduling on a Single Host
[Diagram: events, the Server (jobs), the Scheduler (policy), and a MOM (kernel, running jobs) on one host]
1) Event tells Server to initiate a scheduling cycle
2) Server sends scheduling command to Scheduler
3) Scheduler requests resource info from MOM
4) MOM returns requested info
5) Scheduler requests job info from Server
6) Server sends job status info to Scheduler. Scheduler makes policy decision to run job
7) Scheduler sends run request to Server
8) Server sends job to MOM to run
Batch Scheduling on Multiple Hosts
[Diagram: events, a Client, the Server (jobs), the Scheduler (policy), and two hosts each running a MOM (kernel, running jobs)]
1) Event tells Server to initiate a scheduling cycle
2) Server sends scheduling command to Scheduler
3) Scheduler requests resource info from MOM
4) MOM returns requested info
5) Scheduler requests job info from Server
6) Server sends job status info to Scheduler. Scheduler makes policy decision to run job
7) Scheduler sends run request to Server
8) Server sends job to MOM to run
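The eight steps of a scheduling cycle can be traced with a small in-process Python model. This is a sketch under heavy assumptions: real PBS daemons exchange these messages over an IP network, whereas the Mom and Server classes here are plain objects, and the policy in step 6 (oldest job to the least-loaded host) is a placeholder. All class, method, and host names are hypothetical.

```python
class Mom:
    """Stand-in for a pbs_mom daemon on one host."""

    def __init__(self, host, load_average):
        self.host = host
        self.load_average = load_average
        self.running = []

    def resource_info(self):
        return {"host": self.host, "load_average": self.load_average}

    def run(self, job_id):
        self.running.append(job_id)


class Server:
    """Stand-in for pbs_server: holds queued jobs and knows the Moms."""

    def __init__(self, moms, queued_jobs):
        self.moms = {mom.host: mom for mom in moms}
        self.queued = list(queued_jobs)

    def job_info(self):
        return list(self.queued)

    def run_job(self, job_id, host, trace):
        self.queued.remove(job_id)
        trace.append(f"8) Server sends {job_id} to MOM on {host} to run")
        self.moms[host].run(job_id)


def scheduling_cycle(server, trace):
    """Walk one cycle, appending each step to trace; return (job, host) or None."""
    trace.append("1) Event tells Server to initiate a scheduling cycle")
    trace.append("2) Server sends scheduling command to Scheduler")
    trace.append("3) Scheduler requests resource info from each MOM")
    resources = [mom.resource_info() for mom in server.moms.values()]
    trace.append("4) MOMs return requested info")
    trace.append("5) Scheduler requests job info from Server")
    jobs = server.job_info()
    trace.append("6) Server sends job status info; Scheduler makes policy decision")
    if not jobs:
        return None
    # Placeholder policy: oldest queued job goes to the least-loaded host.
    job_id = jobs[0]
    host = min(resources, key=lambda info: info["load_average"])["host"]
    trace.append(f"7) Scheduler sends run request for {job_id} to Server")
    server.run_job(job_id, host, trace)
    return job_id, host


moms = [Mom("node1", 0.9), Mom("node2", 0.2)]
server = Server(moms, ["job.1", "job.2"])
trace = []
result = scheduling_cycle(server, trace)
print("\n".join(trace))
```

With a single Mom in the list the same function models the single-host case, since the Scheduler never talks to a Mom to start a job: the run request always goes through the Server.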
Staging In/Out Files
• Staging In (moving files prior to job)
  • Designate remote host/name and local name to use (for copy)
  • If file is local, /bin/cp is used to copy
  • If file is remote, rcp is used
• Staging Out (moving files after a job)
  • Occurs during job exiting ('E') state
  • Once files are copied to the remote host, they're removed from the execution host
  • Can retain by using the link ('ln') command
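The choice between /bin/cp and rcp for stage-in can be sketched as follows. This assumes a stage-in specification of the form local_file@hostname:remote_file and treats a file as local when its hostname equals the execution host; the function name and the example hosts and paths are hypothetical, and the sketch only builds the command rather than copying anything.

```python
def stage_in_command(spec, execution_host):
    """Return the copy command for one stage-in spec (nothing is executed).

    spec is assumed to look like 'local_file@hostname:remote_file'.
    """
    local_file, _, source = spec.partition("@")
    host, _, remote_file = source.partition(":")
    if not (local_file and host and remote_file):
        raise ValueError(f"malformed stage-in spec: {spec!r}")
    if host == execution_host:
        # The file already lives on the execution host: plain local copy.
        return ["/bin/cp", remote_file, local_file]
    # The file lives on another host: remote copy.
    return ["rcp", f"{host}:{remote_file}", local_file]


print(stage_in_command("input.dat@node1:/data/input.dat", "node1"))
print(stage_in_command("input.dat@fileserver:/data/input.dat", "node1"))
```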
Staging In Files
[State diagram. Entry points: Job with Stage In Requirements; Job without Stage In Requirements. States: Stage In Failed; Stage In Complete; Pre-Run (send jobs to MOM); Running (MOM started job). Transition labels: stage in request; run job request; Files Only; Files & Execute time; fail; stage in complete; mom acknowledges.]
Interface Library (IFL)
• Provides a means of building new batch clients (any batch service request is exposed via IFL)
• Allows users to build jobs that query their own status or spawn new jobs, display custom status info (instead of using qstat), or build new control commands
• 1:1 relationship between commands and batch service requests (that is, anything you can do via the command-line interface, you can script using IFL)
• Source: src/lib/Libifl/pbsD_*.c
• Header: <pbs_error.h>, <pbs_ifl.h>
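The 1:1 relationship can be written down as a lookup table from each command to the IFL call behind it. The table below is a sketch: the function names are given as recalled from <pbs_ifl.h> and should be checked against the header of the installed version before being relied on. IFL itself is a C library; Python is used here only to hold the table, and the helper ifl_calls_for is hypothetical.

```python
# Command -> IFL function(s) assumed to carry out the same batch service request.
# Names are assumptions to verify against <pbs_ifl.h>.
IFL_CALLS = {
    "qalter": ["pbs_alterjob"],
    "qdel": ["pbs_deljob"],
    "qhold": ["pbs_holdjob"],
    "qmove": ["pbs_movejob"],
    "qmsg": ["pbs_msgjob"],
    "qorder": ["pbs_orderjob"],
    "qrerun": ["pbs_rerunjob"],
    "qrls": ["pbs_rlsjob"],
    "qselect": ["pbs_selectjob"],
    "qsig": ["pbs_sigjob"],
    # qstat covers jobs, queues (-Q), and servers (-B)
    "qstat": ["pbs_statjob", "pbs_statque", "pbs_statserver"],
    "qsub": ["pbs_submit"],
}


def ifl_calls_for(command):
    """Return the IFL function names assumed to back a command."""
    return IFL_CALLS[command]


for command in sorted(IFL_CALLS):
    print(f"{command:8s} -> {', '.join(ifl_calls_for(command))}")
```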
PBSWeb
• Developed at University of Alberta as an aid to the OpenPBS job scheduler
• Key benefits
  • Allow the user to submit jobs to PBS without hand-writing a complicated script
  • Save all scripts submitted by a user for future modification and/or resubmission
  • Allow a user to submit jobs to any site with equal ease. The site could be across the street or across the country
  • Allow the user to manage jobs without logging into any remote machines. The user may view jobs in queues and delete any of their currently queued or running jobs
  • Allow the user to view the output of a job without logging into any remote machine
  • Do all of this in a secure manner (via SSL), and with utilities commonly available on most Unices
  • Submit jobs as the actual user, not just on behalf of the user (via SSH)
• Additional information
  • http://webdocs.cs.ualberta.ca/~pinchak/PBSWeb/
UnderLord Scheduler
• Jointly developed by Jefferson Lab and MIT to provide a meta-facility between their labs
• Consists of two schedulers
  • OverLord – handles migrating jobs between sites
  • UnderLord – responsible for deploying jobs locally
• Implements 'stage classes' to allow custom scheduling policy
  • Job Age Stage – considers time spent in queue and weights jobs accordingly
  • Job Duration Stage – considers projected time the job will take to run (can multiply for parallel execution)
  • Queue Priority Stage – considers priority of different queues on the server, along with historical system utilization
  • User Share Stage – considers the user's historical usage of the system and weights jobs accordingly
  • User Priority Stage – provides priority status for an individual user's jobs (if multiple jobs have the same weight, priority determines which runs first)
• User's Manual
  • http://www.jlab.org/hpc/UnderLord/UnderLord.pdf
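The stage-class idea can be illustrated with a small Python sketch in which each stage is a function returning a weight and the weights are summed per job. The formulas are invented for illustration and are not UnderLord's: it is assumed here that longer waits raise a job's weight, that projected run time multiplied by processor count lowers it, and that user priority only breaks ties. Only two of the five stages are modelled, and every name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    hours_in_queue: int
    projected_hours: int
    processors: int
    user_priority: int


def job_age_stage(job):
    """Assumed rule: the longer a job has waited, the higher its weight."""
    return job.hours_in_queue


def job_duration_stage(job):
    """Assumed rule: projected time times processor count counts against a job."""
    return -job.projected_hours * job.processors


def rank_jobs(jobs, stages):
    """Order jobs by summed stage weight, using user priority to break ties."""
    def sort_key(job):
        weight = sum(stage(job) for stage in stages)
        return (weight, job.user_priority)
    return sorted(jobs, key=sort_key, reverse=True)


jobs = [
    Job("a", hours_in_queue=5, projected_hours=1, processors=2, user_priority=1),
    Job("b", hours_in_queue=4, projected_hours=1, processors=1, user_priority=5),
    Job("c", hours_in_queue=1, projected_hours=2, processors=4, user_priority=0),
]
ranked = rank_jobs(jobs, [job_age_stage, job_duration_stage])
print([job.job_id for job in ranked])
```

Jobs a and b both total a weight of 3, so b runs first on user priority; c trails with a weight of -7. Passing the stages in as a list mirrors the point of stage classes: a site changes policy by changing which stages it plugs in.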
Missing Resources
• Official OpenPBS home
  • No longer hosts OpenPBS info – only the paid 'PBS Works' software suite (Altair Software)
• PBS mini-HowTo
  • Hosted by the Danish Institute of Physics (DTU Fysik), but no longer available
• Center for HPC Cluster Resource Management and Scheduling (supercluster.org)
  • Now the website for Adaptive Computing
• OSC Configuration and Tools for OpenPBS and Maui Scheduler
  • Ohio Supercomputer Center site still hosts PBS info, but links to this documentation are broken
    • Maui Scheduler
    • MPI Exec
• OpenPBS Download site (former MRJ Technology Solutions PBS web site)
  • http://pbs.mrj.com/
Resources
• OpenPBS Public Home
  • http://www.mcs.anl.gov/research/projects/openpbs/
• OpenPBS NASA Gov site
  • http://www.nas.nasa.gov/Software/PBS/docs.html