Using the BYU SP-2
Our System
• Interactive nodes (2)
  • used for login, compilation & testing
  • marylou10.et.byu.edu
• I/O and scheduling nodes (7)
  • used for the batch scheduling system and the parallel file system
• Compute nodes (26)
  • 22 with 4 processors each
  • 4 with 16 processors each
Compilers
• xlc (C)
• xlC (C++)
• xlf (Fortran)
• Parallel compilers
  • mpcc (C)
  • mpCC (C++)
  • mpxlf (Fortran)
• Optimization
  • -O5 -qarch=pwr3 -qtune=pwr3 -qhot
• Libraries
  • -lblas, -lfftw, -llapack, -lessl
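As a rough illustration of how the compiler, optimization flags, and libraries above fit together on one command line, here is a minimal sketch. The source and executable names (solver.c, solver) are hypothetical, and the choice of -lessl is only an example; the script prints the command and runs it only if mpcc is actually installed.

```shell
#!/bin/sh
# Hypothetical file names; the flags are the ones listed on the slide.
SRC=solver.c
EXE=solver
OPT="-O5 -qarch=pwr3 -qtune=pwr3 -qhot"
LIBS="-lessl"

# Assemble the full compile-and-link command for the parallel C compiler.
CMD="mpcc $OPT -o $EXE $SRC $LIBS"
echo "$CMD"

# Run the compile only where the IBM parallel compiler is available.
if command -v mpcc >/dev/null 2>&1; then
    $CMD
fi
```

Swapping mpcc for xlc would give the serial equivalent with the same flags.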
Other Stuff
• Documentation
  • http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/
  • http://marylou.byu.edu
• Launching parallel jobs
  • done through the batch scheduler
  • your job is a shell script that you hand to the batch scheduler for execution
  • xloadl can help you create the script
Batch job scheduler
• Batch schedulers
  • PBS (Portable Batch System), open source
  • LoadLeveler, a descendant of Condor
• The process
  • users submit jobs to a queue
  • machines register with the scheduler, offering to run jobs of a certain class
  • the scheduler allocates jobs to machines and tracks them
  • once started, a job's processes are scheduled by the operating system kernel
Scheduling parallel jobs
• Jobs can ask for
  • a number of nodes (1 CPU)
  • a number of tasks per node (multiple CPUs)
  • non-shared nodes (multiple CPUs)
• Mixing jobs on a node can be bad
  • two I/O-intensive processes on a 2-CPU node can ruin performance for both
  • the same holds for two RAM-intensive processes
Scheduling parallel jobs (2)
• All nodes, processors, and other resources a job requests are allocated for the duration of the entire job
• No dynamic adjustments, except by creating jobs with multiple steps
  • each step can have different requirements
  • each step can express a dependency on other steps
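A multi-step job of the kind described above might look roughly like the following sketch. The step names, class, and node counts are hypothetical. The step_name and dependency keywords and the LOADL_STEP_NAME environment variable are used as I recall them from the LoadLeveler documentation, so check them against the local manual before relying on them.

```shell
#!/bin/sh
# Sketch of a two-step LoadLeveler command file (step names, class, and node
# counts are hypothetical). Each "# @ queue" line closes one step definition.
#
# @ step_name      = prepare
# @ job_type       = serial
# @ class          = short
# @ queue
#
# @ step_name      = solve
# @ dependency     = (prepare == 0)
# @ job_type       = parallel
# @ node           = 4
# @ tasks_per_node = 4
# @ class          = short
# @ queue

# The same script body runs for every step, so branch on the step name.
run_step() {
    case "$1" in
        prepare) echo "preparing input files" ;;
        solve)   echo "running the parallel solver" ;;
        *)       echo "unknown step: $1" >&2; return 1 ;;
    esac
}

# LOADL_STEP_NAME is assumed to be set by LoadLeveler when the step runs;
# outside the scheduler this falls back to the first step.
run_step "${LOADL_STEP_NAME:-prepare}"
```

Here the serial step holds one processor while it prepares input, and the 4-node allocation is requested only for the second step, which starts only if the first exits with status 0.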
Scheduling parallel jobs (3)
• Management must
  • allow some jobs to use the entire machine
  • allow short jobs to get started quickly; they should not have to wait weeks in the queue
• Some very long jobs may be needed, but they are to be avoided
Backfill scheduling
[Figure: Jobs A, B, C, and D placed on the system over time; labels in the figure: "10 nodes", "system", "time"]
Backfill scheduling
• Requires a real time limit to be set for every job
• A more accurate (shorter) estimate gives a job a better chance of starting earlier
• Short jobs can move through the system more quickly
• Uses the system better by avoiding wasted cycles while jobs wait
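The idea can be sketched with a deliberately simplified rule: while the job at the head of the queue waits for enough nodes, a waiting job may start now if it fits in the currently idle nodes and its time limit ends before the head job's reserved start. This is not how LoadLeveler or any real scheduler implements backfill; the function name, job names, and numbers below are all hypothetical.

```shell
#!/bin/sh
# can_backfill FREE_NODES WINDOW_HOURS
# Reads "name nodes limit_hours" lines on stdin and prints the jobs that can
# start now without delaying the reserved job (simplified rule, see above).
can_backfill() {
    awk -v free="$1" -v window="$2" '
        ($2 + 0) <= (free + 0) && ($3 + 0) <= (window + 0) {
            print $1
            free -= $2      # these nodes are now taken
        }
    '
}

# 4 idle nodes, 6 hours until the big job is due to start.
can_backfill 4 6 <<'EOF'
short1 2 4
long1  2 48
wide1  8 1
short2 2 5
EOF
```

With these inputs short1 and short2 are selected: long1 asks for more time than the window allows, and wide1 needs more nodes than are idle. This is why a tighter time limit improves a job's chance of starting early.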
Using LoadLeveler
• Graphical user interface: xloadl
• Make a shell script with LoadLeveler keywords as shell comments; the queue keyword comes last, after the keywords that describe the step
  # @ output = thing.log
  # @ error = thing.err
  # @ class = short
  # @ executable = thingx
  # @ node = 6,10
  # @ tasks_per_node = 4
  # @ requirements = (Adapter == "hps_us")
  # @ queue
Sample LoadLeveler Script
#!/bin/ksh
# @ job_type = parallel
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y/directory
# @ notify_user = student_rt_y@byu.edu
# @ class = short
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3")
# @ blocking = unlimited
# @ total_tasks = 4
# @ network.MPI = switch,shared,US
# @ queue
./your_exe_and_any_args
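One way to try a script like this is to write it to a file and hand it to llsubmit. The sketch below is trimmed down from the sample; the file name myjob.cmd is hypothetical, and the wall_clock_limit line is an addition not in the sample (a keyword as I recall it from the LoadLeveler documentation), included because backfill scheduling depends on a time limit. The submit step runs only where llsubmit exists.

```shell
#!/bin/sh
# Write a small job command file. The quoted 'EOF' keeps $(Executable) and
# friends from being expanded by the shell; LoadLeveler substitutes them.
JOBFILE=myjob.cmd   # hypothetical file name
cat > "$JOBFILE" <<'EOF'
#!/bin/ksh
# @ job_type = parallel
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ class = short
# @ wall_clock_limit = 00:30:00
# @ total_tasks = 4
# @ network.MPI = switch,shared,US
# @ queue
./your_exe_and_any_args
EOF

# Submit only if LoadLeveler is installed on this machine.
if command -v llsubmit >/dev/null 2>&1; then
    llsubmit "$JOBFILE"
else
    echo "llsubmit not found; wrote $JOBFILE only"
fi
```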
Sample serial job
#!/bin/ksh
# @ job_type = serial
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y
# @ notify_user = student_rt_y@byu.edu
# @ class = medium
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ queue
paupnew Hlav3ashort.paup
LoadLeveler commands
• llq : shows all jobs
  • can also use showq
• llq -s JobID : shows why a job is not running
• llclass : shows classes
• llstatus : shows machines
• llcancel JobID : cancels a job
• llhold JobID : puts a job in the hold state
Sample llq output
bash-2.05a$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
m1015i.1127.0            mdt36      8/7 12:41   R  50  long         m1009i
m1015i.1128.0            mdt36      8/7 12:41   R  50  long         m1019i
m1015i.1497.0            jl447      8/12 16:25  R  50  long         m1012i
m1015i.1544.0            to5        8/13 08:44  R  50  long         m1045i
m1015i.1545.0            to5        8/13 08:44  R  50  long         m1045i
…
m1015i.1602.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1598.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1601.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1599.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1600.0            taskman    8/14 08:13  R  50  short        m1011i
m1015i.1626.0            mendez     8/14 13:07  I  50  long
m1015i.1625.0            cr66       8/14 12:40  I  50  medium
m1015i.1513.0            jl447      8/13 07:08  I  50  long
m1015i.1572.0            dvd        8/13 10:45  I  50  medium
m1015i.1576.0            dvd        8/13 11:22  I  50  medium
m1015i.1577.0            dvd        8/13 11:25  I  50  medium
m1015i.1566.0            mdt36      8/13 08:51  I  50  long
m1015i.1564.0            mdt36      8/13 08:50  I  50  long
…
m1015i.1612.0            taskman    8/14 08:27  I  50  short
m1015i.1624.0            taskman    8/14 08:57  I  50  short
m1015i.1623.0            taskman    8/14 08:57  I  50  short
58 job step(s) in queue, 23 waiting, 0 pending, 35 running, 0 held, 0 preempted
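For a long queue it can help to summarize this listing rather than read it row by row. The sketch below counts job steps per state (the ST column), assuming the column layout shown in the sample; count_states is a hypothetical helper name, and the rows fed to it are copied from the sample output.

```shell
#!/bin/sh
# count_states: read llq output on stdin, print "STATE COUNT" per job state.
# Only rows whose first field looks like a job id (host.cluster.step) count,
# which skips the header, the dashed rule, and the summary line.
count_states() {
    awk '$1 ~ /^[A-Za-z0-9]+\.[0-9]+\.[0-9]+$/ { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Try it on a few rows copied from the sample output.
count_states <<'EOF'
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
m1015i.1127.0            mdt36      8/7 12:41   R  50  long         m1009i
m1015i.1497.0            jl447      8/12 16:25  R  50  long         m1012i
m1015i.1602.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1626.0            mendez     8/14 13:07  I  50  long
m1015i.1625.0            cr66       8/14 12:40  I  50  medium
58 job step(s) in queue, 23 waiting, 0 pending, 35 running, 0 held, 0 preempted
EOF
```

For these five rows it prints "I 2" and "R 3". On the cluster itself the same function would be used as `llq | count_states`.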
Sample showq output
bash-2.05a$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME   STATE    PROC  REMAINING    STARTTIME
m1015i.1581.0      taskman    Running  1     18:39:00     Wed Aug 14 08:06:24
m1015i.1582.0      taskman    Running  1     18:39:00     Wed Aug 14 08:06:24
m1015i.1580.0      taskman    Running  1     18:39:00     Wed Aug 14 08:06:24
…
m1015i.1615.0      taskman    Running  1     21:33:42     Wed Aug 14 11:01:06
m1015i.1613.0      taskman    Running  1     23:43:05     Wed Aug 14 13:10:29
m1015i.1575.0      dvd        Running  4     2:15:10:38   Wed Aug 14 04:38:02
m1015i.1127.0      mdt36      Running  8     2:23:14:21   Wed Aug 7 12:41:45
…
m1015i.1567.0      jar65      Running  4     9:04:07:44   Tue Aug 13 17:35:08
m1015i.1569.0      jar65      Running  4     9:08:28:16   Tue Aug 13 21:55:40
m1015i.1547.0      to5        Running  8     9:21:11:49   Wed Aug 14 10:39:13
m1015i.1546.0      to5        Running  8     9:21:11:49   Wed Aug 14 10:39:13
35 Active Jobs   150 of 184 Processors Active (81.52%)   26 of 34 Nodes Active (76.47%)
IDLE JOBS----------------------
JOBNAME            USERNAME   STATE    PROC  WCLIMIT      QUEUETIME
m1015i.1513.0      jl447      Idle     2     5:00:00:00   Tue Aug 13 07:08:09
m1015i.1572.0      dvd        Idle     8     3:00:00:00   Tue Aug 13 10:45:18
…
23 Idle Jobs
NON-QUEUED JOBS----------------
JOBNAME            USERNAME   STATE    PROC  WCLIMIT      QUEUETIME
Total Jobs: 58   Active Jobs: 35   Idle Jobs: 23   Non-Queued Jobs: 0
LoadLeveler environment
• Normally the same as your login environment
• Limits are set; use llclass -l to see the values
  • ulimit -S -a (soft limits)
  • ulimit -H -a (hard limits)
• Big heap requirements
  • -bmaxdata:0x80000000 : up to 2 GB of data (heap)
  • -q64 -bmaxdata:0x…. : up to 8 EB
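To check what a job will actually get, it can be useful to print the data-segment limits from inside the job script and to confirm how much heap a given -bmaxdata value represents. The sketch below does both; maxdata_gib is a hypothetical helper that just does the hexadecimal arithmetic.

```shell
#!/bin/sh
# Show the soft and hard limits on the data segment (heap) for this shell.
# ulimit -d reports kilobytes, or "unlimited".
echo "soft data limit: $(ulimit -S -d)"
echo "hard data limit: $(ulimit -H -d)"

# maxdata_gib: convert a -bmaxdata hex value to whole GiB (1 GiB = 2^30 bytes).
maxdata_gib() {
    echo $(( $1 / 1073741824 ))
}

maxdata_gib 0x80000000
```

The last line prints 2, matching the 2 GB figure above: 0x80000000 is 2^31 bytes.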