BQS. Yves Fouilhé, Centre de Calcul de l’IN2P3, fouilhe@in2p3.fr. June 99.
Contents
• Architecture
• A job’s life
• Job tracking
• Special events
• Administration
• Details: Group objectives, Post-mortem script, Files, Resource management, Privileges, Authentication
Architecture
• Client machines
• Class I: T < 100000 s, Spool < 5 MB, Scratch < 400 MB, VMEM < 128 MB
• Worker ccwasn03: Platform = SUN, VMEM = 1 GB, Scratch = 4 GB, Spool = 30 MB
• Cluster Anastasie: Master, scheduling parameters, group objectives
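The class limits and worker capacities on this slide can be read as two small records. The sketch below is only an illustration: the field names and the `fits` check are assumptions, not BQS code, and the slide does not say how the scheduler actually compares a class with a worker.

```python
from dataclasses import dataclass

GB = 1024  # megabytes per gigabyte, used to express worker sizes in MB


@dataclass(frozen=True)
class JobClass:
    """Upper limits attached to a job class (values taken from the slide)."""
    name: str
    cpu_seconds: int
    spool_mb: int
    scratch_mb: int
    vmem_mb: int


@dataclass(frozen=True)
class Worker:
    """Capacities advertised by a worker (values taken from the slide)."""
    hostname: str
    platform: str
    vmem_mb: int
    scratch_mb: int
    spool_mb: int


def fits(job_class: JobClass, worker: Worker) -> bool:
    # Hypothetical rule: a class fits if none of its limits exceeds
    # the corresponding worker capacity.
    return (
        job_class.spool_mb <= worker.spool_mb
        and job_class.scratch_mb <= worker.scratch_mb
        and job_class.vmem_mb <= worker.vmem_mb
    )


CLASS_I = JobClass("I", cpu_seconds=100_000, spool_mb=5, scratch_mb=400, vmem_mb=128)
CCWASN03 = Worker("ccwasn03", "SUN", vmem_mb=1 * GB, scratch_mb=4 * GB, spool_mb=30)
```

With these values, `fits(CLASS_I, CCWASN03)` is true, since every class I limit is below the capacity of ccwasn03.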
A job’s life: 1. Submission
• qsub from a client machine, job queued on master
• 3 files sent:
  • main and post-mortem scripts, environment variables
• Resource specifications
  • Class and/or explicit
  • CPU time, spool, scratch, memory, tokens, platform
• More options:
  • group, messages, standard IO, hold/at-time, rerunnable, priority
• Other commands:
  • qalter, qhold, qrls
A job’s life: 2. Scheduling
• Job and WorkPoint selection
  • Job:
    • state and characteristics of queued jobs
    • set by commands: qsub, qalter, qhold, qrls
    • group objectives and consumption
  • WorkPoint:
    • state and characteristics of Workers and WorkPoints
    • set by commands: qstop, qstart, qterm, qinit
• Scheduling algorithm
  • runs on master, based on global variables managed by bqsc qvar
A job’s life: 3. Spawning
• master sends selected job to selected worker:
  • jobblock,
  • tokens,
  • the 3 files sent by qsub.
• on worker, BQS:
  • gives the job a jobid (jobnumber.hostname),
  • builds the scratch and spool directories for the job,
  • takes the identity of the user (user, group, tokens),
  • starts the shell under which the job will execute
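The naming conventions used at spawn time can be sketched as follows. The jobid format comes from this slide, and the directory layouts come from the Files and Resource Management slides; the function names are hypothetical, and which machine's hostname goes into the jobid is not stated in the slides, so it is left as a parameter.

```python
from pathlib import PurePosixPath


def job_id(jobnumber: int, hostname: str) -> str:
    # jobid format given on the slide: jobnumber.hostname
    return f"{jobnumber}.{hostname}"


def scratch_dir(username: str, jobnumber: int) -> PurePosixPath:
    # Scratch layout from the slides: /scratch/usernamejobnumber
    return PurePosixPath("/scratch") / f"{username}{jobnumber}"


def spool_dir(jobnumber: int) -> PurePosixPath:
    # Spool layout from the slides: /usr/spool/bqs/usertmp/jobnumber
    return PurePosixPath("/usr/spool/bqs/usertmp") / str(jobnumber)
```

For example, `scratch_dir("alice", 1234)` gives `/scratch/alice1234`, where `alice` and `1234` are made-up values.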
A job’s life: 4. Execution
• The shell executes, in this order:
  • BQS prologue script,
  • job main script,
  • BQS epilogue script
• prologue:
  • starts bqseoj in the background
  • writes the start-of-job banner to stdout
• bqseoj:
  • tells master when job starts and ends
  • periodically checks and extends tokens,
  • keeps job PGID, last target of qdel
A job’s life: 5. End of Job
• epilogue, after the end of the main script:
  • writes end-of-job banner to stdout
  • sends final accounting data to master
  • terminates the main shell
• bqseoj, now orphaned:
  • starts the user’s post-mortem script, if one was given to qsub
  • starts the BQS post-mortem script,
  • sends standard outputs to the user
Job tracking
• Tracking:
  • qjob [-l] jobname
  • qshow -R worker
  • qcat jobname
  • qaccount -p
• Debugging
  • vi master&worker:/usr/spool/bqs/logyyyymmdd
  • vi worker:/usr/spool/bqs/eojlogs/jobid
• Usual job states
  • QUEUED/HOLD: after submission
  • SUBMITTED: job selected, sent to worker, PGID back
  • STARTED: bqseoj started
  • RUNNING: job reported by danmonitor
  • ENDED: bqsjobend and then bqseoj reported end of job
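The usual job states listed above can be sketched as a small state machine. The state names come from the slide; the transition table is an assumption inferred from the order in which the states are listed (plus a HOLD/QUEUED pair for qhold and qrls), not a description of the actual BQS implementation.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "QUEUED"
    HOLD = "HOLD"
    SUBMITTED = "SUBMITTED"
    STARTED = "STARTED"
    RUNNING = "RUNNING"
    ENDED = "ENDED"


# Assumed transitions for a job that runs without any special event.
ALLOWED = {
    JobState.QUEUED: {JobState.HOLD, JobState.SUBMITTED},
    JobState.HOLD: {JobState.QUEUED},
    JobState.SUBMITTED: {JobState.STARTED},
    JobState.STARTED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.ENDED},
    JobState.ENDED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise ValueError if the step is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {new.value} is not allowed")
    return new
```

Special events such as deletion, requeueing or resource kills are deliberately left out of this table.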
Special events and states
• Job voluntary interruption
  • qdel [-f]: DELET[ING|ED]
  • qrerun [-f -h]: REQUEUE[ING|D|D-HOLD], RERUN (-nr !)
• REBOOT of a worker
  • REQUEUED (-nr !)
• Spawning failure
  • HOLDSYS (qjob -l => HOLDTOK, HOLDSYS)
• Resource limit exhausted
  • SIGNALLED (qjob -l => SIG[TL|TMP|MEM|TOK][ACK])
  • KILLED (qjob -l => KILL[G][TL|TMP|MEM|TOK])
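The detailed resource-limit states shown by qjob -l follow a compact bracket notation. The sketch below splits such a state into its parts, assuming that `[A|B]` means "one of" and that a single bracketed token such as `[ACK]` or `[G]` is optional. The slides do not explain what each part means, so the decoder only labels the pieces and attaches no meaning to them; the function name and dictionary keys are hypothetical.

```python
import re

# Patterns transcribed from the slide notation:
#   SIG[TL|TMP|MEM|TOK][ACK]  and  KILL[G][TL|TMP|MEM|TOK]
_SIG = re.compile(r"SIG(TL|TMP|MEM|TOK)(ACK)?")
_KILL = re.compile(r"KILL(G)?(TL|TMP|MEM|TOK)")


def decode_limit_state(state: str):
    """Split a detailed state into its parts, or return None if it does not match."""
    m = _SIG.fullmatch(state)
    if m:
        return {"action": "SIG", "resource": m.group(1), "ack": m.group(2) is not None}
    m = _KILL.fullmatch(state)
    if m:
        return {"action": "KILL", "resource": m.group(2), "g": m.group(1) is not None}
    return None
```

States that are not resource-limit states, such as HOLDSYS, simply return `None`.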
Administration
• Add a machine:
  • add it to cluster.conf on master
  • prepare the machine: register the bqs user, make the bqs spool, install the software, do local customizations, copy configuration files from master, register bqsdaemon in inetd.conf, monitor danmonitor
  • qinit + qstart
• Other commands
  • qstart, qstop: start or stop machines or WorkPoints
  • qterm, qinit: temporarily take a machine out of a cluster and put it back in
• Configuration and global variables in /usr/spool/bqs:
  • cluster.conf, global_vars, cluster_vars, master_vars
  • scope: global, cluster, master
  • bqsc qvar query ALL
  • bqsc qvar [query|set] -s stanza=val var=val
Group objectives
• Every authorized group is given a CPU consumption objective
  • in normalized CPU hours
  • for the current semester
  • bqsc qvar query -s group=groupname
  • bqsc qvar set -s group=groupname objective=value
• An objective is not absolute; it is to be read relative to the other groups’ objectives
• Scheduling is based on the distance between CPU consumption and objective
  • distance has thresholds
  • CPU consumed is amortized over the last 30 days
• Flooding factors prevent a group from monopolizing resources
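The slides do not give the distance formula, so the following is only one possible reading of "distance between CPU consumption and objective" when objectives are relative to each other: compare each group's share of consumed CPU with its share of the total objective. The function names are hypothetical, and thresholds, the 30-day amortization and flooding factors are all left out.

```python
def relative_distance(consumed: dict, objectives: dict, group: str) -> float:
    """Share of consumed CPU minus share of objective (assumed definition).

    Negative means the group is behind its objective, positive means ahead.
    """
    total_consumed = sum(consumed.values())
    total_objective = sum(objectives.values())
    consumed_share = consumed[group] / total_consumed if total_consumed else 0.0
    objective_share = objectives[group] / total_objective
    return consumed_share - objective_share


def most_behind_group(consumed: dict, objectives: dict) -> str:
    # Hypothetical selection rule: favour the group furthest behind its objective.
    return min(objectives, key=lambda g: relative_distance(consumed, objectives, g))
```

With two groups that have equal objectives, the one that has consumed less CPU is the one this rule would favour.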
Post-mortem script and JOBRC
• qsub -af can send your post-mortem script to BQS
  • to perform housekeeping after normal or abnormal end of job
  • to move a good file from a temporary directory to a safe, backed-up one, or to delete the results of a job that terminated incorrectly
  • to register processed events in a database
  • to start another dependent job or restart this job
  • to alert the user upon a special condition
• JOBRC may be stored by qalter
  • qalter -rc may be used during the execution of the main script of your job to store a value in the BQS database
  • BQS makes this value available to your post-mortem script in the JOBRC environment variable
  • this helps to decide whether the job terminated correctly or which step failed
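A post-mortem script can branch on JOBRC along these lines. This is a sketch under stated assumptions: only the JOBRC environment variable itself comes from the slide. The convention that "0" means success and any other value names the failing step is hypothetical, as is the function name; a real job would use whatever values its main script stored with qalter -rc.

```python
import os


def classify_job_end(environ=os.environ) -> str:
    """Interpret JOBRC under an assumed convention (0 = success)."""
    raw = environ.get("JOBRC")
    if raw is None or raw == "":
        # The main script never stored a value, so nothing can be concluded.
        return "unknown"
    if raw == "0":
        return "ok"
    # Assumed convention: any other value identifies the step that failed.
    return f"failed at step {raw}"


if __name__ == "__main__":
    print(classify_job_end())
```

The housekeeping actions from the slide (moving good files, deleting bad results, alerting the user) would then hang off the returned value.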
Files
• All machines
  • /usr/spool/bqs/logtoday, logyyyymmdd
  • cluster.conf, global_vars, cluster_vars
• Master:
  • In_m, In_m.v, In_m.af,
  • jobblocks, yymmblocks
  • .usertokens/.username
  • override.worker, state.worker,
  • master_vars, all_masters_vars
• Worker:
  • spool: /usr/spool/bqs/usertmp/jobnumber/In_m.O & E
  • bqseoj log: /usr/spool/bqs/eojlogs/jobid
  • scratch: /scratch/usernamejobnumber directory
Resource Management
• Specification
  • explicit with qsub -l T,M,scratch,spool
  • or default corresponding to the explicit or implied job class
• Resource usage is monitored by danmonitor on workers and reported to master:
  • CPU time: normalized, sum over all current and ended processes
  • Virtual memory: sum over all current processes
  • Standard IO, aka spool: /usr/spool/bqs/usertmp/jobnumber directory
  • Scratch: /scratch/usernamejobnumber directory
• Master stops jobs that exceed any one limit:
  • SIGUSR1 is sent to the process that used qtl (modified CERNLIB TIMEL), or to the largest CPU consumer.
  • After a grace period, SIGUSR2 and, if needed, SIGKILL are sent to the whole job (PGID)
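The escalation described in the last bullet can be sketched as follows. The order of signals comes from the slide; everything else is an assumption: the function and parameter names are hypothetical, the slide does not say whether the same grace period applies before SIGKILL, and the real logic lives in BQS itself. The signal senders are passed in as parameters so the sequence can be exercised without touching real processes (Unix only, since the defaults are `os.kill` and `os.killpg`).

```python
import os
import signal
import time


def stop_job(pgid, target_pid, grace_seconds, job_alive,
             send_pid=os.kill, send_group=os.killpg, sleep=time.sleep):
    """Escalate SIGUSR1 -> SIGUSR2 -> SIGKILL and return the signals sent.

    target_pid is the process chosen for the first warning (per the slide,
    the one that used qtl or the largest CPU consumer); job_alive is a
    callable reporting whether the job still has running processes.
    """
    sent = []
    # Step 1: warn a single process so it can finish cleanly.
    send_pid(target_pid, signal.SIGUSR1)
    sent.append("SIGUSR1")
    sleep(grace_seconds)
    if not job_alive():
        return sent
    # Step 2: after the grace period, signal the whole process group.
    send_group(pgid, signal.SIGUSR2)
    sent.append("SIGUSR2")
    sleep(grace_seconds)  # assumed: a second grace period before the kill
    # Step 3: kill the whole group only if it is still there.
    if job_alive():
        send_group(pgid, signal.SIGKILL)
        sent.append("SIGKILL")
    return sent
```

A job that exits during the first grace period only ever receives SIGUSR1, which is the cooperative path the qtl mechanism is meant to enable.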
Special resource: AFS tokens
• Specification
  • local cell token always required
  • other tokens may be declared as required (qsub -l afs_cell)
  • jobs are always equipped with all of the owner’s known tokens
• Monitoring
  • bqseoj checks token lifetimes
  • bqseoj stops the job if a required token expires
  • the user is warned that a token is expiring and can refresh and send it (qlog)
  • local cell token may be automatically refreshed
  • refreshed tokens are reloaded by bqseoj
  • bqseoj interrupts the stop process or uses the CPU time limit procedure to stop the job.
Clients and Servers
• Clients:
  • the client commands were mentioned earlier (man bqs)
  • they can be used through an API (man bqsapi)
  • clients are authenticated by danauthd, which runs on every client machine (cf. the ident protocol, RFC 1413)
• Servers: bqsd and bqsdaemon on every machine
  • bqsdaemon, via inetd, dedicated to qsub, qcat and qrcp
  • bqsd used for all other commands
  • several bqsd are started on master, each dedicated to a series of commands
  • functionalities used according to the role of each machine; some commands are forwarded from the local bqsd to the master bqsd, but most go directly from client to master
Privileges
• Administrators
  • are declared in cluster.conf
  • displayed by qshow -A
  • equivalent to privilege bqs.admin
• Privilege manager may be used
  • privc client command and API, available to BQS and others
  • privileges given to users, groups and machines
  • thane.groupname to act on all jobs pertaining to a group: qsub -p, qalter, qdel, qhold, qrerun, qrls, qsig
  • bqs.admin (>bqs.*) and bqs.operator: super thane and administration
  • bqs.clustername to submit to a restricted cluster
  • bqs.bypass_lock to bypass complex or cluster lock
  • bqs.spawn_forbidden, bqs.spawn_anyway
  • bqs.Pclass gives access to the restricted class P (for example)
Authentication
• Looks like an implementation of the ident protocol, RFC 1413
  • daemon danauthd running on the client machine
  • bqsd on master receives a privileged command and wants to be sure of the identity of its client calling from the remote client machine
  • bqsd calls danauthd (running as root on the client machine and assumed to be trustworthy) and asks it for the identity, on its machine, of the owner of a given socket and protocol
• danauthd and the corresponding API
  • are not limited to BQS but may be used by other servers,
  • can be seen at least as an identification mechanism
• identd could replace danauthd
  • probably available for all of the OSes we use,
  • but danauthd is here and returns more information
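Since the slide compares danauthd to the ident protocol, here is a minimal sketch of the RFC 1413 exchange itself: the server that wants to identify a connection sends a port pair, and the ident daemon answers with either a USERID or an ERROR line. This illustrates RFC 1413 only; the wire format of danauthd is not described in the slides, and the function names are hypothetical. For simplicity the parser strips whitespace around the user id, which a strict implementation would preserve.

```python
def build_query(port_on_server: int, port_on_client: int) -> bytes:
    # "Server" here means the machine running the ident daemon,
    # i.e. the BQS client machine in the scenario on the slide.
    return f"{port_on_server} , {port_on_client}\r\n".encode("ascii")


def parse_reply(line: bytes) -> dict:
    """Parse one RFC 1413 reply line into a dictionary."""
    text = line.decode("ascii", errors="replace").strip()
    parts = text.split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"malformed ident reply: {text!r}")
    server_port, client_port = (int(p) for p in parts[0].split(","))
    kind = parts[1].strip().upper()
    ports = (server_port, client_port)
    if kind == "USERID":
        if len(parts) != 4:
            raise ValueError(f"USERID reply without user id: {text!r}")
        return {"ports": ports, "opsys": parts[2].strip(), "user": parts[3].strip()}
    if kind == "ERROR":
        return {"ports": ports, "error": parts[2].strip()}
    raise ValueError(f"unknown reply type: {kind!r}")
```

A reply such as `6193, 23 : USERID : UNIX : stjohns` (the example form used in the RFC) yields the user name together with the port pair, which the caller should check against the query it sent.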