Running Jobs on Jacquard
An overview of interactive and batch computing, with comparisons to Seaborg
David Turner
NUG Meeting
3 Oct 2005
Topics
• Interactive
  • Serial
  • Parallel
  • Limits
• Batch
  • Serial
  • Parallel
• Queues and Policies
• Charging
• Comparison with Seaborg
Execution Environment
• Four login nodes
  • Serial jobs only
  • CPU limit: 60 minutes
  • Memory limit: 64 MB
• 320 compute nodes
  • “Interactive” parallel jobs
  • Batch serial and parallel jobs
  • Scheduled by PBSPro
• Queue limits and policies established to meet system objectives
  • User input is critical!
Interactive Jobs
• Serial jobs run on login nodes (see the example after this list)
  • cd, ls, pathf90, etc.
  • ./a.out
• Parallel jobs run on compute nodes
  • Controlled by PBSPro; request nodes with qsub -I, then launch with mpirun:

    % qsub -I -q interactive -l nodes=8:ppn=2
    % cd $PBS_O_WORKDIR
    % mpirun -np 16 ./a.out

  • Longer interactive sessions may request the batch queue:

    % qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00
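• A minimal serial sketch on a login node, assuming a hypothetical Fortran source hello.f90 compiled with the pathf90 compiler mentioned above:

    % pathf90 -o hello hello.f90
    % ./hello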
PBSPro
• Marketed by Altair Engineering
• Based on the open-source Portable Batch System developed for NASA
• Also installed on DaVinci
• Batch scripts contain directives:

    #PBS -o myjob.out

• Directives may also appear as command-line options:

    qsub -o myjob.out …
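• The two forms are interchangeable; in standard PBS, an option given on the qsub command line overrides the matching directive in the script. A sketch, assuming a script myjob containing the directive above:

    % qsub -o other.out myjob

  STDOUT then goes to other.out rather than myjob.out.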
Simple Batch Script

    #PBS -l nodes=8:ppn=2,walltime=00:30:00
    #PBS -N myjob
    #PBS -o myjob.out
    #PBS -e myjob.err
    #PBS -A mp999
    #PBS -q debug
    #PBS -V
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./a.out
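• Batch serial jobs use the same mechanism; a minimal sketch, assuming a serial executable a.out in the submission directory:

    #PBS -l nodes=1:ppn=1,walltime=00:10:00
    #PBS -N serialjob
    cd $PBS_O_WORKDIR
    ./a.out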
Useful PBS Options (1)
• -A repo
  Charge this job to repository repo
  Default: Your default repository
• -N jobname
  Provide a name for the job; up to 15 printable, non-whitespace characters
  Default: Name of batch script
• -q qname
  Submit job to batch queue qname
  Default: batch
Useful PBS Options (2)
• -S shell
  Specify shell as the scripting language
  Default: Your login shell
• -V
  Export current environment variables into the batch job environment
  Default: Do not export
Useful PBS Options (3)
• -o outfile
  Write STDOUT to outfile
  Default: <jobname>.o<jobid>
• -e errfile
  Write STDERR to errfile
  Default: <jobname>.e<jobid>
• -j [eo|oe]
  Join STDOUT and STDERR on STDERR (eo) or STDOUT (oe)
  Default: Do not join
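• For example, to collect both streams in a single file (a sketch using the options above):

    #PBS -o myjob.out
    #PBS -j oe

  Both STDOUT and STDERR then appear in myjob.out.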
Useful PBS Options (4)
• -m [a|b|e|n]
  E-mail notification
  a = send mail when job is aborted by the system
  b = send mail when job begins
  e = send mail when job ends
  n = do not send mail
  Options a, b, and e may be combined
  Default: a
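• A sketch requesting all three notifications; standard PBS also accepts -M to set the recipient list (not covered above; the address is hypothetical):

    #PBS -m abe
    #PBS -M einstein@example.gov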
Batch Queue Policies
• Each user may have:
  • One running interactive job
  • One running debug job
  • Four jobs running over the entire system
• Only one batch128 job is allowed to run at a time.
• The batch256 queue usually has a run limit of zero; NERSC staff will arrange to run jobs of this size.
Submitting Batch Jobs

    % qsub myjob
    93935.jacin03
    %

• Record the jobid for tracking!
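• A sketch for capturing the jobid automatically, assuming an sh-family shell (qsub prints the jobid on STDOUT):

    % jobid=`qsub myjob`
    % echo $jobid
    93935.jacin03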
Deleting Batch Jobs

    % qdel 93935.jacin03
    %
Monitoring Batch Jobs (1)
• PBS command qstat:

    % qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    93295.jacin03-ib job5             einstein         00:00:00 R batch16
    93894.jacin03    EV80fl02_3       legendre         0        H batch16
    93330.jacin03    test.script      laplace          00:00:23 R batch32
    93897.jacin03    runlu8x8         rasputin         0        Q batch32
    93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
    ...

• Use the -u option for single-user output:

    % qstat -u einstein
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    93295.jacin03-ib job5             einstein         00:00:00 R batch16
    %
Monitoring Batch Jobs (2)
• NERSC command qs:

    % qs
    JOBID ST USER     NAME       NDS REQ      USED     SUBMIT
    93939 R  gauss    STDIN        1 00:30:00 00:10:43 Oct 2 16:47:00
    93891 R  einstein runlu4x8    16 01:00:00 00:38:48 Oct 2 15:23:36
    93918 R  inewton  r4_16        8 01:00:00 00:10:37 Oct 2 15:36:35
    ...
    93785 Q  inewton  r4_64       32 01:00:00 -        Oct 2 08:42:36
    93828 Q  rasputin nodemove    64 00:05:00 -        Oct 2 12:00:11
    93897 Q  einstein runlu8x8    32 01:00:00 -        Oct 2 15:24:27
    ...
    93893 H  legendre EV80fl02_2   4 03:00:00 -        Oct 2 15:24:23
    93894 H  legendre EV80fl02_3   4 03:00:00 -        Oct 2 15:24:24
    93917 H  legendre EV80fl98_5   4 03:00:00 -        Oct 2 15:26:06
    ...

• Also provides a -u option
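• For example (rows mirror the listing above):

    % qs -u einstein
    JOBID ST USER     NAME       NDS REQ      USED     SUBMIT
    93891 R  einstein runlu4x8    16 01:00:00 00:38:48 Oct 2 15:23:36
    93897 Q  einstein runlu8x8    32 01:00:00 -        Oct 2 15:24:27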
Monitoring Batch Jobs (3)
• NERSC website has a current queue look:
  http://www.nersc.gov/nusers/status/jacquard/qstat
• Also has a completed jobs list:
  http://www.nersc.gov/nusers/status/jacquard/pbs_summary
• Numerous filtering options available:
  • Owner
  • Account
  • Queue
  • Jobid
Charging
• Machine charge factor (cf) = 4
  • Based on benchmarks and user applications
  • Currently under review
• Serial interactive
  • Charge = cf × cputime
  • Always charged to default repository
• All parallel
  • Charge = cf × 2 × nodes × walltime (see the worked example after this list)
  • Charged to default repo unless -A specified
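• A worked example, assuming the factor of 2 reflects the two processors per node: an 8-node parallel job running for 30 minutes is charged cf × 2 × nodes × walltime = 4 × 2 × 8 × 0.5 = 32 charge units (processor-hours, with walltime in hours).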
Things To Look Out For (1)
• Do not set group write permission on your home directory; it will prevent PBS from running your jobs (see the commands after this list).
• Library modules must be loaded at runtime as well as at link time.
• Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
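• To check for and clear group write permission on your home directory (standard shell commands):

    % ls -ld $HOME
    % chmod g-w $HOME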
Things To Look Out For (2)
• Do not run more than one MPI program in a single batch script (see the dependency sketch after this list).
• If your login shell is bash, you may see:

    accept: Resource temporarily unavailable
    done.

  In this case, specify a different shell using the -S directive, such as:

    #PBS -S /usr/bin/ksh
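• If two MPI runs must execute in sequence, one option is standard PBS job dependencies; a sketch, with each run in its own script (script names and jobid are illustrative):

    % qsub step1.job
    93940.jacin03
    % qsub -W depend=afterok:93940.jacin03 step2.job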
Things To Look Out For (3)
• Batch jobs always start in $HOME. To get to the directory where the job was submitted:

    cd $PBS_O_WORKDIR

  For jobs that work with large files:

    cd $SCRATCH/some_subdirectory

• PBS buffers output and error files until the job completes. To view the files (in your home directory) while the job is running, add:

    #PBS -k oe
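• A sketch combining the two, running from scratch while keeping the executable in the submission directory (the subdirectory name is hypothetical):

    cd $SCRATCH/myrun
    mpirun -np 16 $PBS_O_WORKDIR/a.out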
Things To Look Out For (4)
• The following is just a warning and can be ignored:

    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.
Resources
• NERSC Website
  http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
  http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
• NERSC Consulting
  1-800-66-NERSC, menu option 3, 8 am - 5 pm Pacific time
  (510) 486-8600, menu option 3, 8 am - 5 pm Pacific time
  consult@nersc.gov
  http://help.nersc.gov/