Learn how to create and submit jobs efficiently on a high-performance computing system, using block creation, allocation, job modes, submission methods, and more.
Job submission • Basic procedure • Create a block • Allocate a block • Boot a block • Run a job • Free the block, or run another job
Block creation • Two ways of creating blocks • Block builder • mmcs_db_console • The use of block builder is recommended • Block builder can create any available block • Block builder is much easier to use…
Block creation • Block builder • Available via Navigator • Able to create any block with a valid block size • 16, 32, 64, 128 and 256 nodes (mesh) • 512 and multiples of 512 nodes (torus/mesh) • Starting card • 16 node : J00, J01 • 64 node : N00, N02, N04, N08, N10, N12 or N14 • 128 node : N00, N04, N08 or N12 • 256 node : N00 or N08
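The block-size rule above can be written as a quick check. This is a minimal sketch that only encodes the sizes listed on the slide; the function name `is_valid_block_size` is made up for illustration and is not part of Navigator or MMCS.

```python
def is_valid_block_size(nodes: int) -> bool:
    """Return True if `nodes` matches one of the block sizes listed on the slide."""
    # Sub-midplane blocks (mesh only): 16, 32, 64, 128 and 256 nodes.
    small_blocks = {16, 32, 64, 128, 256}
    if nodes in small_blocks:
        return True
    # 512 nodes and whole multiples of 512 (torus/mesh).
    return nodes >= 512 and nodes % 512 == 0


if __name__ == "__main__":
    for n in (16, 48, 256, 512, 768, 1024):
        print(n, is_valid_block_size(n))
```

A size such as 48 or 768 is rejected because it is neither one of the small mesh sizes nor a multiple of 512.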
Block creation • mmcs_db_console • Able to create most of the available blocks • Provides a set of commands to create blocks • genblock : a base partition • genblocks : each base partition on the system • gensmallblock : a sub-base partition • genBPblock : a set of base partitions • genfullblock : the entire system • Use Navigator for pass-through or split cables
Block deletion • Available via mmcs_db_console • mmcs$ delete bgpblock R00-M0 • Type ‘help delete’ at the mmcs shell prompt for usage • Block deletion is not available via Navigator’s GUI • mmcs_db_console within the Navigator is available
Exercise • Create a block from the block builder • Create a block from the mmcs_db_console • Delete a block from the mmcs_db_console
Job modes • There are three job modes: virtual node mode (VNM), SMP mode, and dual mode • MPI ranks (processes) per node and threads per process:
  Mode   Processes/node   Threads/process
  VNM    4                1
  SMP    1                4
  Dual   2                2
• Core assignment in each mode:
  Core    Virtual node mode   SMP mode   Dual mode
  CPU 0   Rank 0              Rank 0     Rank 0
  CPU 1   Rank 1              thread     thread
  CPU 2   Rank 2              thread     Rank 1
  CPU 3   Rank 3              thread     thread
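As a worked example of the three modes, the sketch below computes how many MPI ranks a block of a given size provides. The per-mode numbers come from the slide; the dictionary keys (`"VN"`, `"SMP"`, `"DUAL"`) and the function name `layout` are labels chosen for this sketch only.

```python
# (processes per node, threads per process) for each job mode, as on the slide.
JOB_MODES = {
    "VN": (4, 1),    # virtual node mode
    "SMP": (1, 4),   # SMP mode
    "DUAL": (2, 2),  # dual mode
}


def layout(nodes: int, mode: str) -> dict:
    """Return the rank and thread counts for `nodes` compute nodes in `mode`."""
    procs_per_node, threads_per_proc = JOB_MODES[mode]
    return {
        "ranks": nodes * procs_per_node,
        "threads_per_rank": threads_per_proc,
        "cores_used_per_node": procs_per_node * threads_per_proc,
    }


if __name__ == "__main__":
    for mode in JOB_MODES:
        print(mode, layout(64, mode))
```

For a 64-node block this gives 256 ranks in virtual node mode, 64 in SMP mode, and 128 in dual mode; all four cores of a node are used in every mode.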
Job submission • Ways to submit a job • mmcs_db_console • mpirun • LoadLeveler
Job submission • mmcs_db_console • A console for the Midplane Management Control System (MMCS) • Used to configure and allocate blocks of compute nodes and I/O nodes and to run programs on the BG/P system • Intended mainly for administrator use • Requires access to the service node • Environment variables need to be set • /etc/profile.d/bgp.sh • Caveats when submitting jobs from the console • No stdin support • stdout and stderr are sent to files
Job submission • mmcs_db_console
  $ cd /bgsys/drivers/ppcfloor/bin
  $ ./mmcs_db_console
  mmcs$ allocate_block R00-M0
  mmcs$ boot_block
  mmcs$ submit_job /bghome/test/hello.rts /bghome/test
  mmcs$ free R00-M0
  mmcs$ quit
Type ‘help’ at the mmcs shell prompt for available commands
Job submission • mmcs commands • allocate_block : marks the block as allocated, but does not boot it • boot_block : initializes, loads, and starts the block resources • submit_job : starts an executable running on the currently selected block • free : releases the resources associated with the block ID
Job submission • mpirun • Launches jobs on the BG/P hardware and acts as a job monitor • mpirun continually monitors the status of the job and terminates when the job is done • Transparently forwards stdin and receives stdout and stderr • Acts as a gateway for debuggers such as gdb and TotalView • Each job requires a partition • Can be allocated on the fly (-np or -shape) • Or use predefined partitions • Can boot partitions from their initial state • Disable this feature with -noallocate • The user should then verify that no overlapping hardware is busy • Can optionally leave booted partitions in place with -nofree
Job submission • mpirun
  $ mpirun -partition R00-M0 -mode SMP -cwd /bghome/test -exe /bghome/test/hello.rts
  partition : specifies which block to use
  mode : specifies the execution processor mode
  cwd : specifies the current working directory
  exe : specifies the program to run
Type mpirun -h for available options
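When mpirun is driven from a wrapper script, it helps to build the argument list in one place. The sketch below assembles the same four options shown on the slide, written with plain ASCII hyphens; it only constructs and prints the command and does not run it. The helper name `build_mpirun_command` is hypothetical.

```python
import shlex


def build_mpirun_command(partition: str, mode: str, cwd: str, exe: str) -> list:
    """Build an mpirun argument list from the four options shown on the slide."""
    return [
        "mpirun",
        "-partition", partition,  # which block to use
        "-mode", mode,            # execution processor mode
        "-cwd", cwd,              # current working directory for the job
        "-exe", exe,              # program to run
    ]


if __name__ == "__main__":
    cmd = build_mpirun_command(
        "R00-M0", "SMP", "/bghome/test", "/bghome/test/hello.rts"
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than one string avoids quoting problems if a path ever contains spaces; the list can be passed directly to `subprocess.run` on a front-end node where mpirun is installed.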
Job submission • LoadLeveler • Allocates machine resources to run jobs • Scheduling of jobs depends on the availability of resources within the system • A user submits a job using a job command file • Maximizes the efficiency of the cluster by maximizing the utilization of resources
Job submission • LoadLeveler • Some of the tasks LoadLeveler can perform: • Choosing the next job to run • Examining the job requirements • Collecting the available resources in its cluster • Dispatching the job to the selected machine • Controlling running jobs • Creating reservations and scheduling jobs to run in the reservations • Job preemption to enable high-priority jobs to run immediately • Fair share scheduling to automatically balance resources among users or groups of users • Co-scheduling to enable several jobs to be scheduled to run at the same time
Example code • 1. Write a simple hello world program:
  /* Hello World program */
  #include <stdio.h>
  int main(void)
  {
      printf("Hello World!\n");
      return 0;
  }
• 2. Compile the program:
  /bgsys/drivers/ppcfloor/comm/bin/mpicc -o hello hello.c
• 3. Run the program: assuming that the program lives in /bgsys/apps and you want the results (stdout and stderr) to be written to /bgsys/apps/results, enter at the mmcs_db_console prompt:
  mmcs$ submit_job /bgsys/apps/hello /bgsys/apps/results/
Exercise • Submit a job using mmcs_db_console • Free the block after the job finishes • Submit a job using mpirun • Submit a job using LoadLeveler
Job termination • mmcs_db_console • killjob, kill_job • mmcs$ killjob R00-M0 124 • mmcs$ wait_job • Terminating a job can take a while • default timeout is 5 minutes
Job termination • mpirun • Control-C • mpirund will do a cleanup • Do not send multiple Control-C signals • A second Control-C will force termination • A third Control-C is almost equivalent to kill -9, which may leave the block state in limbo
Scripting • A list of commands for mmcs_db_console can be written to a file and used as a script
  $ mmcs_db_console < script_file
• script_file is a plain ASCII text file containing a list of commands for mmcs_db_console
Scripting • Sample script_file • Create and test several blocks
  $ cat script_file
  genblock R00-M0 R00-M0 64
  allocate R00-M0
  free R00-M0
  genblock R00-M1 R00-M1 64
  allocate R00-M1
  free R00-M1
  …
  quit
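A script_file like the sample can be generated rather than typed by hand. The sketch below repeats the sample's three-command pattern for each midplane name it is given; the `genblock` arguments, including the trailing `64`, are copied verbatim from the sample and are not interpreted here. The function name `make_block_test_script` is hypothetical.

```python
def make_block_test_script(midplanes) -> str:
    """Return script_file text that creates, allocates and frees one block per midplane."""
    lines = []
    for mp in midplanes:
        # Same three commands as the sample script, with the block named after the midplane.
        lines.append(f"genblock {mp} {mp} 64")
        lines.append(f"allocate {mp}")
        lines.append(f"free {mp}")
    lines.append("quit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; redirect the output to a file to use it with mmcs_db_console.
    print(make_block_test_script(["R00-M0", "R00-M1"]), end="")
```

The printed text can be saved to a file and fed to the console with input redirection, as described on the previous slide.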
Bridge API • Public API used by job schedulers • LoadLeveler, SLURM, Altair PBS Pro, Platform LSF, Cobalt • Used by mpirun too • Has interfaces to manage various Blue Gene resources • Create, destroy, and query logical constructs such as jobs and partitions • Query physical entities such as midplanes, node cards, switches, and cables • Essentially a thin abstraction layer over the database • Requires a polling model to obtain machine state, for example: • Grab a snapshot of the machine state • Create a partition based on free resources • Boot the partition • Poll the partition state until it is INITIALIZED
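The last step of the polling model can be sketched as a loop with a timeout. This is an illustration of the pattern only and does not call the Bridge API: `get_state` is a hypothetical callable standing in for whatever query returns the partition state, the 300-second timeout and 5-second interval are arbitrary defaults, and every state name other than INITIALIZED is made up for the demo.

```python
import time


def wait_until_initialized(get_state, timeout_s=300.0, interval_s=5.0,
                           sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll get_state() until it returns "INITIALIZED" or the timeout expires.

    get_state is a hypothetical stand-in for a real partition-state query.
    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "INITIALIZED":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Fake state sequence for demonstration; only INITIALIZED comes from the slide.
    states = iter(["FREE", "BOOTING", "BOOTING", "INITIALIZED"])
    print(wait_until_initialized(lambda: next(states), interval_s=0.0))
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays; a scheduler would normally choose the interval to limit load on the database behind the API.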