Batch Systems


  1. Batch Systems
     • In a number of scientific computing environments, multiple users must share a compute resource:
        • research clusters
        • supercomputing centers
     • On multi-user HPC clusters, the batch system is a key component for aggregating compute nodes into a single, sharable computing resource
     • The batch system becomes the “nerve center” for coordinating the use of resources and controlling the state of the system in a way that must be “fair” to its users
     • As current and future expert users of large-scale compute resources, you need to be familiar with the basics of a batch system

  2. Batch Systems
     • The core functionality of all batch systems is essentially the same, regardless of the size or specific configuration of the compute hardware:
     • Multiple Job Queues:
        • queues provide an orderly environment for managing a large number of jobs
        • queues are defined with a variety of limits for maximum run times, memory usage, and processor counts; they are often assigned different priority levels as well
        • queues may be interactive or non-interactive
     • Job Control:
        • submission of individual jobs to do some work (e.g., serial or parallel HPC applications)
        • simple monitoring and manipulation of individual jobs, and collection of resource usage statistics (e.g., memory usage, CPU usage, and elapsed wall-clock time per job)
     • Job Scheduling:
        • a policy decides the priority between individual user jobs
        • the scheduler allocates resources to scheduled jobs

  3. Batch Systems
     • Job Scheduling Policies:
        • the scheduler must decide how to prioritize all the jobs on the system and allocate the necessary resources for each job (processors, memory, file systems, etc.)
        • the scheduling process can be easy or non-trivial depending on the size of the system and the desired functionality
        • first in, first out (FIFO) scheduling: jobs are simply scheduled in the order in which they are submitted
        • political scheduling: enables some users to have more priority than others
        • fairshare scheduling: the scheduler ensures users have equal access over time
     • Additional features may also impact scheduling order:
        • advance reservations: resources can be reserved in advance for a particular user or job
        • backfill: can be combined with any of the scheduling paradigms to allow smaller jobs to run while larger jobs wait for enough resources to become available
           • backfill of smaller jobs helps maximize the overall resource utilization
           • backfill can be your friend for short-duration jobs
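
The difference between FIFO and backfill can be sketched in a few lines of shell. This is a simplified illustration, not how LSF or any other scheduler is implemented: the function name `schedule`, the `name:procs:minutes` job format, and the idea of passing in a fixed "window" (minutes until the first blocked job is expected to start) are all assumptions made for the example.

```shell
#!/bin/sh
# schedule MODE FREE WINDOW JOB...
#   MODE    fifo or backfill
#   FREE    processors idle right now
#   WINDOW  minutes until the first blocked job is expected to start
#   JOB     "name:procs:minutes", given in submission order
# Prints the names of the jobs that can start now.
schedule() {
    mode=$1 free=$2 window=$3
    shift 3
    blocked=0
    started=""
    for job in "$@"; do
        name=${job%%:*}
        rest=${job#*:}
        procs=${rest%%:*}
        minutes=${rest#*:}
        if [ "$blocked" -eq 0 ]; then
            if [ "$procs" -le "$free" ]; then
                started="${started:+$started }$name"
                free=$((free - procs))
            else
                # The head of the queue does not fit: FIFO stops here.
                blocked=1
                if [ "$mode" = fifo ]; then break; fi
            fi
        elif [ "$procs" -le "$free" ] && [ "$minutes" -le "$window" ]; then
            # Backfill: a later job may start only if it fits in the idle
            # processors and finishes before the blocked job needs them.
            started="${started:+$started }$name"
            free=$((free - procs))
        fi
    done
    echo "$started"
}

schedule fifo     32 60 a:16:120 big:64:600 c:8:30 d:16:90
schedule backfill 32 60 a:16:120 big:64:600 c:8:30 d:16:90
```

With 32 idle processors, the FIFO call prints `a` because the 64-processor job blocks everything behind it. The backfill call prints `a c`: the small, short job `c` slips in, while `d` does not fit in the remaining 8 processors.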

  4. Batch Systems
     • Common batch systems you may encounter in scientific computing:
        • Platform LSF
        • PBS
        • LoadLeveler (IBM)
        • SGE
     • All have similar functionality but different syntax
     • It is reasonably straightforward to convert your job scripts from one system to another
     • All of the above include batch-system directives that can be placed in a shell script to request certain resources (processors, queues, etc.)
     • We will focus primarily on LSF, since it is the system running on Lonestar

  5. Batch Submission Process
     [Diagram labels: internet, Server (Head), Queue, Master, Compute Nodes C1 C2 C3 C4, Launch mpirun]
     • Submission: bsub < job
     • Queue: the job script waits for resources on the server
     • Master: the compute node that executes the job script and launches ALL MPI processes
     • Launch: ssh to each compute node to start the executable (e.g. a.out)
        ibrun ./a.out
        mpirun -np # ./a.out

  6. LSF Batch System
     • Lonestar uses Platform LSF for both the batch queuing system and the scheduling mechanism (it provides similar functionality to PBS, but requires different commands for job submission and monitoring)
     • LSF includes global fairshare, a mechanism for ensuring that no one user monopolizes the computing resources
     • Batch jobs are submitted on the front end and are subsequently executed on compute nodes as resources become available
     • Order of job execution depends on a variety of parameters:
        • Submission Time
        • Queue Priority: some queues have higher priorities than others
        • Backfill Opportunities: small jobs may be backfilled while waiting for bigger jobs to complete
        • Fairshare Priority: users who have recently used a lot of compute resources will have a lower priority than those who are submitting new jobs
        • Advance Reservations: jobs may be blocked in order to accommodate advance reservations (for example, during maintenance windows)
        • Number of Actively Scheduled Jobs: there are limits on the maximum number of concurrent processors used by each user

  7. Lonestar Queue Definitions

  8. Lonestar Queue Definitions
     • Additional Queue Limits
        • In the normal and high queues, a single user can use at most 512 processors at one time. Jobs requiring more processors are deferred for possible scheduling until running jobs complete. For example, a single user can have any of the following job combinations eligible for scheduling:
           • Run 2 jobs requiring 256 procs each
           • Run 4 jobs requiring 128 procs each
           • Run 8 jobs requiring 64 procs each
           • Run 16 jobs requiring 32 procs each
        • A maximum of 25 queued jobs per user is allowed at one time
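
The job combinations listed for the 512-processor limit are plain integer division, which the following sketch reproduces. The helper names `max_concurrent_jobs` and `can_schedule` are made up for this illustration; they are not LSF commands.

```shell
#!/bin/sh
# Per-user processor limit taken from the slide; adjust for other systems.
USER_PROC_LIMIT=512

# max_concurrent_jobs PROCS_PER_JOB: how many equal-sized jobs fit under the limit.
max_concurrent_jobs() {
    echo $((USER_PROC_LIMIT / $1))
}

# can_schedule IN_USE REQUESTED: succeeds if a new job stays within the limit.
can_schedule() {
    [ $(($1 + $2)) -le "$USER_PROC_LIMIT" ]
}

for procs in 256 128 64 32; do
    echo "$procs procs per job: $(max_concurrent_jobs "$procs") jobs at once"
done
```

For example, `can_schedule 384 128` succeeds, while `can_schedule 384 256` fails, so the second job would be deferred until running jobs complete.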

  9. LSF Fairshare
     • A global fairshare mechanism is implemented on Lonestar to provide fair access to its substantial compute resources
     • Fairshare computes a dynamic priority for each user and uses this priority in making scheduling decisions
     • Dynamic priority is based on the following criteria:
        • Number of shares assigned
        • Resources used by jobs belonging to the user:
           • Number of job slots reserved
           • Run time of running jobs
           • Cumulative actual CPU time (not normalized), adjusted so that recently used CPU time is weighted more heavily than CPU time used in the distant past

  10. LSF Fairshare
     • bhpart: command to see the current fairshare priority (the PRIORITY column). For example:
        lslogin1--> bhpart -r
        HOST_PARTITION_NAME: GlobalPartition
        HOSTS: all
        SHARE_INFO_FOR: GlobalPartition/
        USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED   CPU_TIME  RUN_TIME
        avijit           1     0.333        0         0        0.0         0
        chona            1     0.333        0         0        0.0         0
        ewalker          1     0.333        0         0        0.0         0
        minyard          1     0.333        0         0        0.0         0
        phaa406          1     0.333        0         0        0.0         0
        bbarth           1     0.333        0         0        0.0         0
        milfeld          1     0.333        0         0        2.9         0
        karl             1     0.077        0         0    51203.4         0
        vmcalo           1     0.000      320         0  2816754.8   7194752
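
To make the dynamic-priority criteria concrete, here is a rough sketch. It assumes the form commonly documented for LSF fairshare: shares divided by a weighted sum of CPU time, run time, and job slots. The factor values (0.7, 0.7, 3.0) are assumed defaults; Lonestar's actual configuration is not given in these slides. The CPU time passed in is meant to be the already-decayed value, so the CPU_TIME column above cannot simply be plugged in.

```shell
#!/bin/sh
# dynamic_priority SHARES CPU_TIME RUN_TIME JOB_SLOTS
# The three factors are assumed LSF defaults, not confirmed Lonestar settings.
dynamic_priority() {
    awk -v shares="$1" -v cpu="$2" -v run="$3" -v slots="$4" 'BEGIN {
        cpu_factor = 0.7   # weight of (already decayed) CPU time
        run_factor = 0.7   # weight of run time of running jobs
        job_factor = 3.0   # weight of started and reserved job slots
        usage = cpu * cpu_factor + run * run_factor + (1 + slots) * job_factor
        printf "%.3f\n", shares / usage
    }'
}

dynamic_priority 1 0 0 0      # one share, no recent usage
dynamic_priority 1 10 0 0     # same user after some (decayed) CPU time
```

Under these assumptions the first call prints 0.333, which is the value bhpart shows for the idle users in the listing, and the second prints 0.100: priority drops as usage accumulates.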

  11. Commonly Used LSF Commands
     Note: most of these commands support a “-l” argument for long listings. For example, bhist -l <jobID> gives a detailed history of a specific job. Consult the man pages for each of these commands for more information.

  12. LSF Batch System • LSF Defined Environment Variables:

  13. LSF Batch System • Comparison of LSF, PBS and LoadLeveler commands that provide similar functionality

  14. Batch System Concerns
     • Submission (need to know)
        • Required Resources
        • Run-time Environment
        • Directory of Submission
        • Directory of Execution
        • Files for stdout/stderr Return
        • Email Notification
     • Job Monitoring
     • Job Deletion
        • Queued Jobs
        • Running Jobs

  15. LSF: Basic MPI Job Script
        #!/bin/csh
        #BSUB -n 32
        #BSUB -J hello
        #BSUB -o %J.out
        #BSUB -e %J.err
        #BSUB -q normal
        #BSUB -P A-ccsc
        #BSUB -W 0:15
        echo "Master Host = "`hostname`
        echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
        echo "PWD_DIR: "`pwd`
        ibrun ./hello
     Annotations:
     • -n 32: total number of processes
     • -J hello: job name
     • -o %J.out: stdout output file name (%J = jobID)
     • -e %J.err: stderr output file name
     • -q normal: submission queue
     • -P A-ccsc: your project name
     • -W 0:15: max run time (15 minutes)
     • echo lines: echo pertinent environment info
     • ibrun ./hello: execution command; ibrun is the parallel application manager and mpirun wrapper script, ./hello is the executable

  16. LSF: Extended MPI Job Script
        #!/bin/csh
        #BSUB -n 32
        #BSUB -J hello
        #BSUB -o %J.out
        #BSUB -e %J.err
        #BSUB -q normal
        #BSUB -P A-ccsc
        #BSUB -W 0:15
        #BSUB -w 'ended(1123)'
        #BSUB -u karl@tacc.utexas.edu
        #BSUB -B
        #BSUB -N
        echo "Master Host = "`hostname`
        echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
        ibrun ./hello
     Annotations:
     • -n 32: total number of processes
     • -J hello: job name
     • -o %J.out: stdout output file name (%J = jobID)
     • -e %J.err: stderr output file name
     • -q normal: submission queue
     • -P A-ccsc: your project name
     • -W 0:15: max run time (15 minutes)
     • -w 'ended(1123)': dependency on job <1123>
     • -u karl@tacc.utexas.edu: email address
     • -B: email when the job begins execution
     • -N: email job report information upon completion

  17. LSF: Job Script Submission
     • When submitting jobs to LSF using a job script, a redirection is required so that bsub reads the script, including its #BSUB directives, from standard input. Consider the following script:
        lslogin1> cat job.script
        #!/bin/csh
        #BSUB -n 32
        #BSUB -J hello
        #BSUB -o %J.out
        #BSUB -e %J.err
        #BSUB -q normal
        #BSUB -W 0:15
        echo "Master Host = "`hostname`
        echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
        echo "PWD_DIR: "`pwd`
        ibrun ./hello
     • To submit the job (the redirection is required!):
        lslogin1% bsub < job.script
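
Before submitting, it can help to review which resources a script actually requests. The sketch below lists the `#BSUB` lines of a script and then submits it with the required redirection. The helper names `list_directives` and `submit` are invented for this example, and the submission step only runs where a `bsub` command is available.

```shell
#!/bin/sh
# list_directives FILE: print the options of every "#BSUB" line in FILE.
list_directives() {
    sed -n 's/^#BSUB[[:space:]]*//p' "$1"
}

# submit FILE: show what is requested, then hand the script to bsub on stdin.
submit() {
    list_directives "$1"
    if command -v bsub >/dev/null 2>&1; then
        bsub < "$1"
    else
        echo "bsub not found; nothing submitted" >&2
    fi
}
```

On a system with LSF, `submit job.script` would print lines such as `-n 32` and `-q normal` before handing the script to bsub.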

  18. LSF: Interactive Execution
     • Several ways to run interactively
     • Submit the entire command to bsub directly:
        > bsub -q development -I -n 2 -W 0:15 ibrun ./hello
        Your job is being routed to the development queue
        Job <11822> is submitted to queue <development>.
        <<Waiting for dispatch ...>>
        <<Starting on compute-1-0>>
         Hello, world!
         --> Process # 0 of 2 is alive. ->compute-1-0
         --> Process # 1 of 2 is alive. ->compute-1-0
     • Submit using a normal job script and include the additional -I directive:
        > bsub -I < job.script
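
The "Job <11822> is submitted to queue ..." line that bsub prints can be parsed to chain jobs with the `-w 'ended(<jobID>)'` dependency shown in the extended job script. The following is a minimal sketch: the helper names and the two script arguments are hypothetical, and it assumes bsub writes that line to standard output in exactly the format shown above.

```shell
#!/bin/sh
# job_id_from_bsub: read bsub output on stdin and print the numeric job ID.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# submit_after FIRST SECOND: submit FIRST, then SECOND with a dependency on it.
submit_after() {
    id=$(bsub < "$1" | job_id_from_bsub)
    [ -n "$id" ] || { echo "could not read job ID" >&2; return 1; }
    bsub -w "ended($id)" < "$2"
}

# Parsing demonstrated on the sample line from the slide:
echo 'Job <11822> is submitted to queue <development>.' | job_id_from_bsub
```

The last line prints 11822. On a system with LSF, `submit_after first.script second.script` would hold the second job until the first has ended.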

  19. Batch Script Suggestions
     • Echo commands as they are issued
        • use “set -x” in ksh or “set echo” in csh
     • Avoid absolute pathnames
        • Use relative path names or environment variables ($HOME, $WORK)
     • Abort the job when a critical command fails
     • Print the environment
        • Include the "env" command if your batch job doesn't execute the same way as an interactive run
     • Use the “./” prefix for executing commands in the current directory
        • The dot means to look for the command in the present working directory. Not all systems include "." in your $PATH variable (usage: ./a.out)
     • Track your CPU time
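
Two of these suggestions, echoing commands and aborting on a critical failure, can be combined in a small wrapper. This sketch uses an explicit helper (the names `run_step` and `job_body` are made up here) rather than the shell's built-in `set -x`, and `true` and `false` stand in for real job steps.

```shell
#!/bin/sh
# run_step CMD [ARGS...]: echo the command, run it, and report a failure.
run_step() {
    echo "+ $*"
    "$@" || { echo "FAILED: $*" >&2; return 1; }
}

# job_body: each step runs only if the previous one succeeded.
job_body() {
    run_step true &&                    # stand-in for staging input files
    run_step false &&                   # stand-in for a critical command that fails
    run_step echo "solver finished"     # never reached
}

job_body || echo "job aborted"
```

In a real job script the stand-ins would be replaced by commands such as `run_step ./a.out`, keeping the “./” prefix for executables in the current directory.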

  20. LSF Job Monitoring (showq utility)
        lslogin1% showq
        ACTIVE JOBS--------------------
        JOBID  JOBNAME       USERNAME  STATE     PROC  REMAINING  STARTTIME
        11318  1024_90_96x6  vmcalo    Running   64    18:09:19   Fri Jan 9 10:43:53
        11352  naf           phaa406   Running   16    17:51:15   Fri Jan 9 10:25:49
        11357  24N           phaa406   Running   16    18:19:12   Fri Jan 9 10:53:46
        23 Active jobs    504 of 556 Processors Active (90.65%)
        IDLE JOBS----------------------
        JOBID  JOBNAME       USERNAME  STATE     PROC  WCLIMIT    QUEUETIME
        11169  poroe8        xgai      Idle      128   10:00:00   Thu Jan 8 10:17:06
        11645  meshconv019   bbarth    Idle      16    24:00:00   Fri Jan 9 16:24:18
        3 Idle jobs
        BLOCKED JOBS-------------------
        JOBID  JOBNAME       USERNAME  STATE     PROC  WCLIMIT    QUEUETIME
        11319  1024_90_96x6  vmcalo    Deferred  64    24:00:00   Thu Jan 8 18:09:11
        11320  1024_90_96x6  vmcalo    Deferred  64    24:00:00   Thu Jan 8 18:09:11
        17 Blocked jobs
        Total Jobs: 43  Active Jobs: 23  Idle Jobs: 3  Blocked Jobs: 17

  21. LSF Job Monitoring (bjobs command)
        lslogin1% bjobs
        JOBID  USER    STAT  QUEUE   FROM_HOST  EXEC_HOST       JOB_NAME    SUBMIT_TIME
        11635  bbarth  RUN   normal  lonestar   2*compute-8     *shconv009  Jan 9 16:24
                                                2*compute-9-22
                                                2*compute-3-25
                                                2*compute-8-30
                                                2*compute-1-27
                                                2*compute-4-2
                                                2*compute-3-9
                                                2*compute-6-13
        11640  bbarth  RUN   normal  lonestar   2*compute-3     *shconv014  Jan 9 16:24
                                                2*compute-6-2
                                                2*compute-6-5
                                                2*compute-3-12
                                                2*compute-4-27
                                                2*compute-7-28
                                                2*compute-3-5
                                                2*compute-7-5
        11657  bbarth  PEND  normal  lonestar                   *shconv028  Jan 9 16:38
        11658  bbarth  PEND  normal  lonestar                   *shconv029  Jan 9 16:38
        11662  bbarth  PEND  normal  lonestar                   *shconv033  Jan 9 16:38
        11663  bbarth  PEND  normal  lonestar                   *shconv034  Jan 9 16:38
        11667  bbarth  PEND  normal  lonestar                   *shconv038  Jan 9 16:38
        11668  bbarth  PEND  normal  lonestar                   *shconv039  Jan 9 16:38
     Note: Use “bjobs -u all” to see jobs from all users.
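
A long bjobs listing is easier to read as a per-state count. The sketch below assumes the column layout shown on this slide, with STAT as the third field and host continuation lines that do not start with a job ID. The helper name `count_by_stat` is invented for the example, and the sample input is abbreviated from the listing above.

```shell
#!/bin/sh
# count_by_stat: read bjobs output on stdin, print "STAT count" per state.
count_by_stat() {
    awk '$1 ~ /^[0-9]+$/ { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

count_by_stat <<'EOF'
JOBID  USER    STAT  QUEUE   FROM_HOST  EXEC_HOST       JOB_NAME    SUBMIT_TIME
11635  bbarth  RUN   normal  lonestar   2*compute-8     *shconv009  Jan 9 16:24
                                        2*compute-9-22
11657  bbarth  PEND  normal  lonestar                   *shconv028  Jan 9 16:38
11658  bbarth  PEND  normal  lonestar                   *shconv029  Jan 9 16:38
EOF
```

For this sample it reports two pending jobs and one running job. On a system with LSF the same function could be fed directly, for example `bjobs -u all | count_by_stat`.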

  22. LSF Job Monitoring (lsuser utility)
        lslogin1$ lsuser -u vap
        JOBID   QUEUE   USER  NAME           PROCS  SUBMITTED
        547741  normal  vap   vap_hd_sh_p96  14     Tue Jun 7 10:37:01 2005
        HOST           R15s  R1m  R15m  PAGES   MEM    SWAP   TEMP
        compute-11-11  2.0   2.0  1.4   4.9P/s  1840M  2038M  24320M
        compute-8-3    2.0   2.0  2.0   1.9P/s  1839M  2041M  23712M
        compute-7-23   2.0   2.0  1.9   2.3P/s  1838M  2038M  24752M
        compute-3-19   2.0   2.0  2.0   2.6P/s  1847M  2041M  23216M
        compute-14-19  2.0   2.0  2.0   2.1P/s  1851M  2040M  24752M
        compute-3-21   2.0   2.0  1.7   2.0P/s  1845M  2038M  24432M
        compute-13-11  2.0   2.0  1.5   1.8P/s  1841M  2040M  24752M

  23. LSF Job Manipulation/Monitoring
     • To kill a running or queued job (takes ~30 seconds to complete):
        bkill <jobID>
        bkill -r <jobID>    (use when bkill alone won’t delete the job)
     • To suspend a queued job:
        bstop <jobID>
     • To resume a suspended job:
        bresume <jobID>
     • To see more information on why a job is pending:
        bjobs -p <jobID>
     • To see a historical summary of a job:
        bhist <jobID>
        lslogin1> bhist 11821
        Summary of time in seconds spent in various states:
        JOBID  USER  JOB_NAME  PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
        11821  karl  hello     131   0      127  0      0      0      258
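
The bhist summary can be turned into a quick "how long did I wait" figure. This sketch assumes the ten-column layout shown above and a job name without spaces; the helper name `wait_share` is made up for the example.

```shell
#!/bin/sh
# wait_share: read a bhist summary on stdin and report time pending vs. running.
wait_share() {
    awk '$1 ~ /^[0-9]+$/ && NF == 10 && $10 > 0 {
        printf "job %s: pending %.1f%%, running %.1f%%\n", $1, 100 * $4 / $10, 100 * $6 / $10
    }'
}

# Demonstrated on the sample line from the slide:
echo '11821 karl hello 131 0 127 0 0 0 258' | wait_share
```

For the sample job this prints `job 11821: pending 50.8%, running 49.2%`. On a system with LSF the live equivalent would be `bhist 11821 | wait_share`.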
