Research Computing Environment at the University of Alberta
Diego Novillo
Research Computing Support Group, University of Alberta
April 1999
Computing Environment
• SGI Origin 2000, 42 CPUs, 10 GB RAM
• Mix of interactive and batch jobs
• 2 CPUs for interactive activity
• 40 CPUs used by batch jobs
• Batch jobs managed by LSF (Platform Computing)
[Figure: Monthly system utilization in CPU-days, compared with the theoretical maximum]
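The theoretical maximum in the utilization chart is processors x days in the month. A minimal sketch of that arithmetic in sh, assuming the 40 batch CPUs and a 30-day month (the chart may instead have used all 42 CPUs or the exact month length); `max_cpu_days` and `utilization_pct` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers for the monthly utilization arithmetic.

# Theoretical maximum CPU-days = number of CPUs x days in the month.
# Usage: max_cpu_days CPUS DAYS
max_cpu_days() {
    echo $(( $1 * $2 ))
}

# Utilization as a percentage of the theoretical maximum,
# rounded down by integer division.
# Usage: utilization_pct USED_CPU_DAYS CPUS DAYS
utilization_pct() {
    max=$(max_cpu_days "$2" "$3")
    echo $(( $1 * 100 / $max ))
}

# max_cpu_days 40 30         prints 1200 (40 batch CPUs, 30-day month)
# utilization_pct 900 40 30  prints 75
```

For example, 900 CPU-days used out of a possible 1200 is 75% utilization.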
[Figure: Average wait time in queue (hours)]
• Need to balance parallel jobs
• We have started using load thresholds
Some thoughts on usage
• Scalar (single-processor) use is predominant (so far)
• We are starting to push the system toward its capacity
• Jobs are waiting too long in the queue
• We need to modify the queue policies:
  • Lower runtime limits
  • Checkpoint/restart
  • A limit on the number of jobs submitted
Job queues
• Parallel queue: par
  • High priority
  • Slot-based: up to 32 processors
  • Jobs are never suspended
• Sequential queue: nic
  • Low priority
  • Threshold-based: jobs run while system utilization is below 95%
  • Jobs can be preempted by parallel jobs
Job queues II
• Two special queues
  • npseq
    • For sequential jobs that should not be preempted
    • Very low priority
    • Only 4 slots available
  • special
    • For jobs that need to run longer than the system runtime limit
    • Only 1 slot available
    • Must be approved by the committee
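The queue rules on the last two slides can be summarized as a small decision function. This is a sketch under stated assumptions: `pick_queue` is a hypothetical helper that does not query LSF, it assumes the 32-processor cap on par applies per job, and the order of the checks (parallel first, then committee-approved long runs, then non-preemptable sequential jobs) is one reasonable reading of the policy.

```shell
#!/bin/sh
# Hypothetical helper: suggest a queue based on the policies on these slides.
# Usage: pick_queue NPROCS NO_PREEMPT LONG_APPROVED
#   NPROCS         number of processors the job needs
#   NO_PREEMPT     1 if a sequential job must not be preempted, else 0
#   LONG_APPROVED  1 if the committee approved running past the limit, else 0
# Assumption: the 32-processor cap on par applies per job.
pick_queue() {
    if [ "$1" -gt 32 ]; then
        echo "pick_queue: par allows at most 32 processors" >&2
        return 1
    elif [ "$1" -gt 1 ]; then
        echo par        # parallel: high priority, never suspended
    elif [ "$3" -eq 1 ]; then
        echo special    # longer than the system limit; 1 slot, needs approval
    elif [ "$2" -eq 1 ]; then
        echo npseq      # sequential, not preemptable; very low priority, 4 slots
    else
        echo nic        # default sequential queue; may be preempted
    fi
}
```

Its output can be passed straight to bsub; for example, `bsub -q "$(pick_queue 1 1 0)" ./mypgm` submits to npseq.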
Fairshare system
• Jobs are scheduled according to priorities
• Priorities are dynamic and based on:
  • Number of shares
  • Past usage (currently 2 weeks of history)
  • Type of job (parallel jobs get higher priority)
• Resource availability also matters
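The slide lists the inputs to the dynamic priority but not how they are combined, and LSF's actual formula is not given here. As a rough illustration only, here is a toy model in which priority grows with shares, shrinks with the CPU-hours used in the last two weeks, and is doubled for parallel jobs. `fs_priority`, the divide-by-(1 + usage) weighting and the factor of 2 are all assumptions; resource availability is not modeled.

```shell
#!/bin/sh
# Toy fairshare priority. NOT LSF's real formula; the weights are assumptions.
# Usage: fs_priority SHARES USED_CPU_HOURS IS_PARALLEL
#   priority = SHARES / (1 + USED_CPU_HOURS), doubled when IS_PARALLEL is 1
fs_priority() {
    awk -v s="$1" -v u="$2" -v p="$3" 'BEGIN {
        prio = s / (1 + u)          # more recent usage -> lower priority
        if (p == 1) prio *= 2       # parallel jobs get a boost
        printf "%.2f\n", prio
    }'
}
```

With equal shares, the group that used less CPU recently comes first: `fs_priority 10 0 0` prints `10.00`, while `fs_priority 10 4 0` prints `2.00`.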
Getting started
• Complete LSF documentation online: http://www.ualberta.ca/CNS/RESEARCH/LSF/
• Man pages are also available
• Add one line to your login file (e.g. ~/.cshrc or ~/.profile):
    source /usr/local/lsf/cshrc.lsf    (C shell)
  or
    . /usr/local/lsf/profile.lsf       (Bourne shell)
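Which of the two lines you need depends on your login shell. A minimal sketch that prints the right one; `lsf_init_line` is a hypothetical helper, and the mapping (csh and tcsh use the C shell file; sh, ksh and bash use the Bourne shell file) is an assumption based on which syntax each shell accepts:

```shell
#!/bin/sh
# Hypothetical helper: print the LSF setup line for the given shell.
# The two paths are the ones listed on the slide.
# Usage: lsf_init_line SHELL_NAME   (e.g. csh, tcsh, sh, ksh, bash)
lsf_init_line() {
    case "$1" in
        csh|tcsh)    echo "source /usr/local/lsf/cshrc.lsf" ;;
        sh|ksh|bash) echo ". /usr/local/lsf/profile.lsf" ;;
        *)           echo "lsf_init_line: unknown shell '$1'" >&2
                     return 1 ;;
    esac
}

# Example: append the Bourne shell line to your login file.
# lsf_init_line sh >> "$HOME/.profile"
```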
Submitting jobs
% bsub [options] pgm args
  -q name   Which queue to use
  -n num    How many processors
  -o file   Output file
• Queue defaults to ‘nic’.
• If no output file is given, results are mailed to you.
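The three options combine on a single bsub command line. A minimal wrapper sketch; `submit` and the `DRY_RUN` variable are hypothetical, and only the `-q`, `-n` and `-o` options from the slide are passed to bsub:

```shell
#!/bin/sh
# Hypothetical wrapper around bsub using the options from the slide.
# Set DRY_RUN=1 to print the bsub command instead of running it.
# Usage: submit QUEUE NPROCS OUTFILE PROGRAM [ARGS...]
submit() {
    if [ "$#" -lt 4 ]; then
        echo "usage: submit QUEUE NPROCS OUTFILE PROGRAM [ARGS...]" >&2
        return 2
    fi
    queue=$1
    nprocs=$2
    outfile=$3
    shift 3                     # remaining arguments: program and its args
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo bsub -q "$queue" -n "$nprocs" -o "$outfile" "$@"
    else
        bsub -q "$queue" -n "$nprocs" -o "$outfile" "$@"
    fi
}

# Example: an 8-processor job in the parallel queue. LSF replaces %J in
# the output file name with the job ID.
# submit par 8 run.%J.out ./mysim input.dat
```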
Watching jobs
% bjobs [options]
  -l       All the details
  -p       Only pending jobs (and why they are pending)
  -a       All jobs (including finished ones)
  -u all   All the jobs in the system
  jobid    Just the job with this ID
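To watch one particular job you need its ID, which bsub reports when it accepts the job, in a message of the form `Job <1234> is submitted to queue <nic>.` A sketch that captures the ID for later bjobs calls; `jobid_from_bsub` is a hypothetical helper and assumes that message format:

```shell
#!/bin/sh
# Hypothetical helper: extract the job ID from bsub's confirmation message,
#   Job <1234> is submitted to queue <nic>.
# Reads bsub's output on standard input and prints only the number;
# lines that do not match produce no output.
jobid_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example (needs LSF): submit, then watch just that job.
# id=$(bsub -o out.txt ./mypgm | jobid_from_bsub)
# bjobs -l "$id"     # all the details for this job
# bjobs -p           # your pending jobs and why they are pending
```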
Manipulating jobs
% bkill jobid     Kills the job (can also send a signal)
% bstop jobid     Suspends the job (even if it is not running)
% bresume jobid   Resumes the job
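These commands differ only in name, so a script can dispatch on an action word. A sketch; `jobctl` and `DRY_RUN` are hypothetical, and the `signal` action assumes bkill's `-s` option for sending a signal (check the bkill man page for the signal names your LSF version accepts):

```shell
#!/bin/sh
# Hypothetical helper mapping an action name to the LSF command on the slide.
# Set DRY_RUN=1 to print the command instead of running it.
# Usage: jobctl stop|resume|kill JOBID
#        jobctl signal SIGNAME JOBID
jobctl() {
    case "$1" in
        stop)   set -- bstop "$2" ;;
        resume) set -- bresume "$2" ;;
        kill)   set -- bkill "$2" ;;
        signal) set -- bkill -s "$2" "$3" ;;   # assumes bkill -s SIGNAL JOBID
        *)      echo "jobctl: unknown action '$1'" >&2
                return 2 ;;
    esac
    # "$@" now holds the full command, e.g. bstop 1234
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

With `DRY_RUN=1` set, `jobctl stop 1234` prints `bstop 1234` without touching the job.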
Getting usage statistics
• We keep monthly statistics on our web page: http://www.ualberta.ca/CNS/RESEARCH/
• For current information:
% bacct [opts]                Total usage for your jobs; can be restricted to specific dates and jobs
% priorities (or bhpart -r)   Lists the priorities of all groups
Monitoring load on the system
% bqueues   Shows the queues and how loaded they are
% lsload    Quick glance at the load on the system
• GUI tools are also available (xlsbatch, xlsmon)
• Please use them sparingly, as they add to the interactive load on the system.
Contact Information
• Visit our home page: http://www.ualberta.ca/CNS/RESEARCH/
• Questions and comments: Research.Support@Ualberta.CA