Learn about the BYU SP-2 system, its interactive nodes for login and testing, batch scheduling system, and parallel file system. Explore compiler options, libraries, and documentation. Understand job scheduling, backfill scheduling, and using LoadLeveler. Use sample scripts and commands for job management.
Our System • Interactive nodes (2) • used for login, compilation & testing • marylou10.et.byu.edu • I/O and scheduling nodes (7) • used for the batch scheduling system and the parallel file system • Compute nodes (26) • 22 with 4 processors each • 4 with 16 processors each
Compilers • xlc C • xlC C++ • xlf Fortran • Parallel Compilers • mpcc • mpCC • mpxlf • Optimization • -O5 -qarch=pwr3 -qtune=pwr3 -qhot • Libraries • -lblas, -lfftw, -llapack, -lessl
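A minimal sketch of how the flags on this slide combine on one command line. The file names (hello_mpi.c, hello_mpi) are hypothetical, and the function only prints the command instead of running it, so it can be tried on a machine without the IBM compilers.

```shell
#!/bin/sh
# Optimization flags copied from the slide above.
OPT_FLAGS="-O5 -qarch=pwr3 -qtune=pwr3 -qhot"

# Print (not run) a compile line for an MPI C program.
# $1 = source file, $2 = executable name, remaining args = libraries.
mpi_c_compile_line() {
    src=$1
    exe=$2
    shift 2
    echo "mpcc $OPT_FLAGS -o $exe $src $*"
}

# Hypothetical file names; -lessl is one of the libraries listed above.
mpi_c_compile_line hello_mpi.c hello_mpi -lessl
```

Swapping mpcc for mpCC or mpxlf gives the corresponding C++ or Fortran line.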
Other Stuff • Documentation • http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/ • http://marylou.byu.edu • Launching parallel jobs • done through the batch scheduler • Your job is a shell script that you hand to the batch scheduler for execution • Can look at xloadl for help creating script
Batch job scheduler • Batch Schedulers • PBS (Portable Batch System), open source • LoadLeveler, a descendant of Condor • The process • user submits jobs to queue • machines register with scheduler, offering to run jobs of certain classes • scheduler allocates jobs to machines and tracks them • once started, jobs are scheduled by the kernel
Scheduling parallel jobs • jobs can ask for • number of nodes (1 CPU) • number of tasks per node (multiple CPUs) • non-shared nodes (multiple CPUs) • mixing jobs can be bad • two I/O-intensive processes on a 2-CPU node can ruin performance for both • same for two RAM-intensive processes
Scheduling parallel jobs (2) • All nodes, processors, and resources given to a job stay allocated for the duration of the entire job • No dynamic adjustments, except by creating jobs with multiple steps • each step can have different requirements • each step can express dependency on other steps
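A multi-step job might look like the sketch below. The step names, classes, node counts, and output file names are hypothetical. It assumes the LoadLeveler keywords step_name and dependency and the LOADL_STEP_NAME environment variable, none of which appear on the slides, so check them against the local documentation. The echo lines stand in for real programs.

```shell
#!/bin/sh
# Two-step job: a small serial step, then a larger parallel step that
# runs only if the first step exits with status 0.
# @ step_name = prepare
# @ job_type = serial
# @ class = short
# @ output = prepare.out
# @ error = prepare.err
# @ queue
# @ step_name = solve
# @ dependency = (prepare == 0)
# @ job_type = parallel
# @ class = medium
# @ node = 4
# @ tasks_per_node = 4
# @ output = solve.out
# @ error = solve.err
# @ queue

# The same script body runs for every step, so branch on the step name.
run_step() {
    case "$1" in
        prepare) echo "serial preprocessing would run here" ;;
        solve)   echo "parallel solver would run here" ;;
        *)       echo "not running under LoadLeveler (no step name)" ;;
    esac
}

run_step "${LOADL_STEP_NAME:-}"
```

Each "# @ queue" line closes one step, so the first step holds a single CPU and only the second step holds the 4 nodes.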
Scheduling parallel jobs (3) • Management must • allow some jobs to use the entire machine • allow short jobs to get started quickly; they should not have to wait weeks in the queue • Some very long jobs may be needed, but are to be avoided
Backfill scheduling • [Figure: Jobs A, B, C, and D scheduled on a 10-node system, plotted against time; job order shown as B, A, C, D]
Backfill scheduling • Requires a real (wall-clock) time limit to be set for each job • A more accurate (shorter) estimate gives the job a better chance of running earlier • Short jobs can move through the system more quickly • Uses the system better by avoiding wasted cycles while larger jobs wait
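The effect of the time limit can be sketched with a simplified backfill test: a waiting job may start now if it fits on the idle nodes and its wall-clock limit ends before the job at the head of the queue is due to start. This is an illustration only, not LoadLeveler's actual algorithm, and the node counts and times are hypothetical.

```shell
#!/bin/sh
# can_backfill FREE_NODES GAP_MIN JOB_NODES JOB_LIMIT_MIN
# Exit status 0 when the job fits on the idle nodes AND its wall-clock
# limit is no longer than the gap before the reserved job starts.
can_backfill() {
    free_nodes=$1 gap_min=$2 job_nodes=$3 job_limit_min=$4
    [ "$job_nodes" -le "$free_nodes" ] && [ "$job_limit_min" -le "$gap_min" ]
}

# Hypothetical state: 4 of 10 nodes are idle for the next 120 minutes.
if can_backfill 4 120 3 60; then
    echo "3 nodes, 60 min limit: backfill now"
fi
if ! can_backfill 4 120 3 180; then
    echo "3 nodes, 180 min limit: must wait"
fi
```

The same 3-node job starts immediately with a 60-minute limit and waits with a 180-minute limit, which is why a tight estimate pays off.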
Using LoadLeveler • Graphical user interface: xloadl • Make a shell script with LoadLeveler keywords as shell comments; the queue keyword comes last because it closes the job step
# @ output = thing.log
# @ error = thing.err
# @ class = short
# @ executable = thingx
# @ node = 6,10
# @ tasks_per_node = 4
# @ requirements = (Adapter==hps_us)
# @ queue
Sample LoadLeveler Script
#!/bin/ksh
# @ job_type = parallel
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y/directory
# @ notify_user = student_rt_y@byu.edu
# @ class = short
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3")
# @ blocking = unlimited
# @ total_tasks = 4
# @ network.MPI = switch,shared,US
# @ queue
./your_exe_and_any_args
Sample serial job
#!/bin/ksh
# @ job_type = serial
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y
# @ notify_user = student_rt_y@byu.edu
# @ class = medium
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ queue
paupnew Hlav3ashort.paup
LoadLeveler commands • llq: shows all jobs • can also use showq • llq -s JobID : show why not running • llclass : shows classes • llstatus : shows machines • llcancel JobID : cancel job • llhold JobID : put job in hold state
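A typical session might chain these commands as sketched below. llsubmit (the standard LoadLeveler submit command) and the -u option of llq are not listed on the slide and are assumed here. The function names and the job.cmd file name are hypothetical.

```shell
#!/bin/sh
# submit_and_show CMD_FILE: submit a command file, then list only your jobs.
submit_and_show() {
    llsubmit "$1" && llq -u "${USER:-$(id -un)}"
}

# why_waiting JOB_ID...: ask the scheduler why each job is not running.
why_waiting() {
    for job_id in "$@"; do
        llq -s "$job_id"
    done
}

# Usage on a login node (job ID format as in the sample output):
#   submit_and_show job.cmd
#   why_waiting m1015i.1626.0
#   llcancel m1015i.1626.0
```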
Sample llq output
bash-2.05a$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
m1015i.1127.0            mdt36      8/7  12:41  R  50  long         m1009i
m1015i.1128.0            mdt36      8/7  12:41  R  50  long         m1019i
m1015i.1497.0            jl447      8/12 16:25  R  50  long         m1012i
m1015i.1544.0            to5        8/13 08:44  R  50  long         m1045i
m1015i.1545.0            to5        8/13 08:44  R  50  long         m1045i
…
m1015i.1602.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1598.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1601.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1599.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1600.0            taskman    8/14 08:13  R  50  short        m1011i
m1015i.1626.0            mendez     8/14 13:07  I  50  long
m1015i.1625.0            cr66       8/14 12:40  I  50  medium
m1015i.1513.0            jl447      8/13 07:08  I  50  long
m1015i.1572.0            dvd        8/13 10:45  I  50  medium
m1015i.1576.0            dvd        8/13 11:22  I  50  medium
m1015i.1577.0            dvd        8/13 11:25  I  50  medium
m1015i.1566.0            mdt36      8/13 08:51  I  50  long
m1015i.1564.0            mdt36      8/13 08:50  I  50  long
…
m1015i.1612.0            taskman    8/14 08:27  I  50  short
m1015i.1624.0            taskman    8/14 08:57  I  50  short
m1015i.1623.0            taskman    8/14 08:57  I  50  short
58 job step(s) in queue, 23 waiting, 0 pending, 35 running, 0 held, 0 preempted
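Long llq listings are easier to read when summarized. The sketch below counts jobs per state (the ST column), assuming the column layout of the sample output; the function name count_by_state is hypothetical. The demo feeds it three lines from the sample so it runs without LoadLeveler.

```shell
#!/bin/sh
# count_by_state: read llq output on stdin, print "STATE COUNT" per state.
# Job lines are recognized by an Id ending in ".<number>.<number>"; the
# state (ST) is the fifth whitespace-separated field on those lines.
count_by_state() {
    awk '$1 ~ /\.[0-9]+\.[0-9]+$/ { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Demo on a header line and three job lines from the sample output.
count_by_state <<'EOF'
Id                       Owner      Submitted   ST PRI Class        Running On
m1015i.1127.0            mdt36      8/7  12:41  R  50  long         m1009i
m1015i.1626.0            mendez     8/14 13:07  I  50  long
m1015i.1625.0            cr66       8/14 12:40  I  50  medium
EOF
```

The demo prints "I 2" and "R 1". On a login node the live equivalent would be llq | count_by_state.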
Sample showq output
bash-2.05a$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
m1015i.1581.0      taskman     Running     1    18:39:00  Wed Aug 14 08:06:24
m1015i.1582.0      taskman     Running     1    18:39:00  Wed Aug 14 08:06:24
m1015i.1580.0      taskman     Running     1    18:39:00  Wed Aug 14 08:06:24
…
m1015i.1615.0      taskman     Running     1    21:33:42  Wed Aug 14 11:01:06
m1015i.1613.0      taskman     Running     1    23:43:05  Wed Aug 14 13:10:29
m1015i.1575.0      dvd         Running     4  2:15:10:38  Wed Aug 14 04:38:02
m1015i.1127.0      mdt36       Running     8  2:23:14:21  Wed Aug  7 12:41:45
…
m1015i.1567.0      jar65       Running     4  9:04:07:44  Tue Aug 13 17:35:08
m1015i.1569.0      jar65       Running     4  9:08:28:16  Tue Aug 13 21:55:40
m1015i.1547.0      to5         Running     8  9:21:11:49  Wed Aug 14 10:39:13
m1015i.1546.0      to5         Running     8  9:21:11:49  Wed Aug 14 10:39:13
35 Active Jobs    150 of 184 Processors Active (81.52%)
                   26 of  34 Nodes Active (76.47%)
IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
m1015i.1513.0      jl447          Idle     2  5:00:00:00  Tue Aug 13 07:08:09
m1015i.1572.0      dvd            Idle     8  3:00:00:00  Tue Aug 13 10:45:18
…
23 Idle Jobs
NON-QUEUED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
Total Jobs: 58   Active Jobs: 35   Idle Jobs: 23   Non-Queued Jobs: 0
LoadLeveler environment • Normally the same as your login environment • Limits are set; use llclass -l to see the values • ulimit -S -a (soft limits) • ulimit -H -a (hard limits) • Big heap requirements • -bmaxdata:0x80000000 up to 2 GB data (heap) • -q64 -bmaxdata:0x…. up to 8 EB
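The -bmaxdata value is a byte count in hexadecimal, so it is easy to get wrong by hand. The sketch below derives the flag from a heap size in megabytes. It assumes the 32-bit heap is granted in 256 MB segments, which is not stated on the slide, and the function name maxdata_flag is hypothetical.

```shell
#!/bin/sh
SEGMENT_MB=256   # assumption: 32-bit heap grows in 256 MB segments
MAX_SEGMENTS=8   # 8 x 256 MB = 0x80000000, the 2 GB ceiling on the slide

# maxdata_flag MEGABYTES: print a -bmaxdata flag large enough for the heap.
maxdata_flag() {
    # Round the request up to a whole number of segments.
    segs=$(( ($1 + SEGMENT_MB - 1) / SEGMENT_MB ))
    if [ "$segs" -gt "$MAX_SEGMENTS" ]; then
        echo "more than 2 GB requested: build with -q64 instead" >&2
        return 1
    fi
    printf '%s0x%X\n' '-bmaxdata:' $(( segs * SEGMENT_MB * 1024 * 1024 ))
}

maxdata_flag 2048   # the 2 GB case from the slide
```

The call prints -bmaxdata:0x80000000, matching the slide. The soft and hard limits reported by ulimit still apply on top of whatever the linker flag allows.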