Learn how to create parallel programs on the cluster at UNCW using Paraguin and submit jobs through the Sun Grid Engine (SGE) scheduler. Compile and run programs using job submission files. Manage job status and delete jobs as needed.
Assignment 2 Using Paraguin to Create Parallel Programs
Cluster at UNCW
[Diagram labels: user computers; dedicated cluster; Ethernet interface; submit host: babbage; master node (head node): harpua; switch; compute nodes: compute-0-0, compute-0-1, compute-0-2, …]
Cluster at UNCW
• We use the Sun Grid Engine (SGE) to schedule jobs on the cluster
• This gives users exclusive use of the compute nodes, so that one user's application does not interfere with the performance of another's
• The scheduler (SGE) is responsible for allocating compute nodes to jobs exclusively
• Compile as normal:
  $ mpicc hello.c -o hello
SGE
• But running is done through a job submission file
• Some SGE commands:
  • qsub <job submission file> - submits a job to the scheduler to run
  • qstat - shows the status of submitted jobs (waiting, queued, running, terminated, etc.)
  • qdel <#> - deletes a job (by number) from the system
  • qhost - shows a list of hosts
SGE
• Example job submission file (hello.sge):
  #!/bin/sh
  # Usage: qsub hello.sge
  #$ -S /bin/sh
  #$ -pe orte 16           # Specify how many processors we want
  # -- our name ---
  #$ -N Hello              # Name for the job
  #$ -l h_rt=00:01:00      # Request 1 minute to execute
  #$ -cwd                  # Make sure that the .e and .o file arrive in the working directory
  #$ -j y                  # Merge the standard out and standard error to one file
  mpirun -np $NSLOTS ./hello
SGE
• Example job submission file (hello.sge):
  #!/bin/sh
  # Usage: qsub hello.sge
  #$ -S /bin/sh
  #$ -pe orte 16           # Specify how many processors we want
SGE
• Example job submission file (hello.sge):
  # -- our name ---
  #$ -N Hello              # Name for the job
  #$ -l h_rt=00:01:00      # Request 1 minute to execute
• -N sets the name of the job and the prefix of the output files: Hello.o### and Hello.po###
• -l h_rt=00:01:00 indicates that the job will need only a minute. This is important so that SGE will clean up if the program hangs or terminates incorrectly. You may need to increase the time for longer programs, or SGE will terminate the program before it has completed.
SGE
• Example job submission file (hello.sge):
  #$ -cwd                  # Make sure that the .e and .o file arrive in the working directory
  #$ -j y                  # Merge the standard out and standard error to one file
• -cwd runs the job in the current directory
• SGE will create 3 files: Hello.o##, Hello.e##, and Hello.po##. The -j y option merges the Hello.o and Hello.e files (standard out and standard error).
SGE
• Example job submission file (hello.sge):
  mpirun -np $NSLOTS ./hello
• And finally the command to run the MPI program. $NSLOTS is the same number given with the #$ -pe orte 16 line.
SGE Example
  $ qstat
  $ qsub hello.sge
  Your job 106 ("Hello") has been submitted
  $ qstat
  job-ID  prior    name   user     state  submit/start at      queue  slots  ja-task-ID
  --------------------------------------------------------------------------------------
  106     0.00000  Hello  cferner  qw     09/04/2012 09:08:38         16
  $
The state of “qw” means queued and waiting.
SGE Example
  $ qstat
  job-ID  prior    name   user     state  submit/start at      queue                    slots  ja-task-ID
  --------------------------------------------------------------------------------------------------------
  106     0.55500  Hello  cferner  r      09/04/2012 09:11:43  all.q@compute-0-0.local  16
  [cferner@babbage mpi_assign]$
The state of “r” means running.
SGE Example
  $ ls
  hello  hello.c  Hello.o106  Hello.po106  hello.sge  ring  ring.c  ring.sge  test  test.c  test.sge
  $ cat Hello.o106
  Hello world from master process 0 running on compute-0-2.local
  Message from process = 1 : Hello world from process 1 running on compute-0-2.local
  Message from process = 2 : Hello world from process 2 running on compute-0-2.local
  …
You will want to clean up the output files when you are done with them, or you will end up with a bunch of clutter.
Deleting a job
  $ qstat
  job-ID  prior    name   user     state  submit/start at      queue  slots  ja-task-ID
  --------------------------------------------------------------------------------------
  108     0.00000  Hello  cferner  qw     09/04/2012 09:18:20         16
  $ qdel 108
  cferner has registered the job 108 for deletion
  $ qstat
  $
Assignment 2 Setup (Do this only once)
• Put these lines in the file .bash_profile:
  export MACHINE=x86_64-redhat-linux
  export SUIFHOME=/share/apps/suifhome
  export COMPILER_NAME=gcc
  `perl $SUIFHOME/setup_suif -sh`
• Run the command:
  $ . .bash_profile
• Notice the 2 periods and the space between them
Hello World Program
• Program is given to you
• You simply need to compile it and run it (using a job submission file)
• Try running it on many processors
• Produce documentation of compiling and running the program
Matrix Multiplication
• Matrix Multiplication skeleton program is given to you in the Appendix
• Includes:
  • Opening the input file
  • Reading the input
  • Taking a time stamp
  • Taking a 2nd time stamp
  • Computing the elapsed time between the time stamps
  • Printing the results
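The two time stamps and the elapsed-time step can be sketched in plain C as follows. This is only an illustration, not the skeleton from the Appendix: the helper names elapsed_seconds and time_work are hypothetical, and the use of POSIX gettimeofday is an assumption about how the time stamps are taken.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock seconds between two time stamps (hypothetical helper). */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Runs `work` between two time stamps and returns the elapsed seconds. */
double time_work(void (*work)(void))
{
    struct timeval t0, t1;
    assert(work != NULL);
    gettimeofday(&t0, NULL);   /* first time stamp */
    work();                    /* the computation being measured */
    gettimeofday(&t1, NULL);   /* second time stamp */
    return elapsed_seconds(t0, t1);
}
```

Only the computation should sit between the two time stamps; reading the input file and printing the results stay outside the timed region.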
Matrix Multiplication
• You need to:
  • Broadcast the error to the processors and exit if necessary
  • Scatter the input
  • Compute the partial results
  • Gather the partial results
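The scatter / compute / gather structure can be illustrated with a serial C sketch. It is a stand-in, not Paraguin or MPI code: the loop over p plays the role of the processes, the function names are hypothetical, and it assumes square n x n matrices stored in row-major order, with n divisible by the number of processes and every process holding all of B.

```c
#include <assert.h>

/* Compute `rows` rows of C = A * B. a_part holds `rows` rows of A (n columns
   each), b is the full n x n matrix, c_part receives the matching rows of C. */
void multiply_rows(int rows, int n, const double *a_part,
                   const double *b, double *c_part)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a_part[i * n + k] * b[k * n + j];
            c_part[i * n + j] = sum;
        }
    }
}

/* Serial stand-in for scatter / compute / gather over nprocs blocks of rows. */
void blocked_matmul(int n, int nprocs, const double *a, const double *b,
                    double *c)
{
    assert(nprocs > 0 && n % nprocs == 0);
    int rows = n / nprocs;                         /* rows per "process" */
    for (int p = 0; p < nprocs; p++) {
        const double *a_part = a + p * rows * n;   /* "scatter": block p of A */
        double *c_part = c + p * rows * n;         /* "gather": block p of C */
        multiply_rows(rows, n, a_part, b, c_part); /* partial result */
    }
}
```

In the parallel version each block of rows would be computed by a different process, so only multiply_rows corresponds to per-process work.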
Heat Distribution
• Using the stencil pattern, model the distribution of heat in a room that has a fireplace along one wall
Heat Distribution
• Each newly computed value will be the average of its neighbors (diagonals included) and its own old value
• So each value at location i,j should be the average of 9 values
• This reduces oscillations
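One iteration of the 9-value average can be sketched in serial C as follows. The function name stencil_step, the row-major n x n layout, and the choice to copy boundary cells unchanged (treating the walls and fireplace as fixed temperatures) are assumptions made for illustration.

```c
#include <assert.h>

/* One stencil iteration: each interior point becomes the average of the
   3 x 3 block centered on it (8 neighbors plus its own old value).
   Boundary cells are copied unchanged (assumed fixed temperatures). */
void stencil_step(int n, const double *old_grid, double *new_grid)
{
    assert(n >= 3);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1) {
                new_grid[i * n + j] = old_grid[i * n + j];
                continue;
            }
            double sum = 0.0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    sum += old_grid[(i + di) * n + (j + dj)];
            new_grid[i * n + j] = sum / 9.0;
        }
    }
}
```

Writing into a separate new_grid keeps every average based on old values only; the two grids would be swapped between iterations.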
Producing a Visual of the Output
[Figures: output produced with X11 Graphics; output produced with Excel]
Producing a Visual of the Output
• See the document http://coitweb.uncc.edu/~abw/ITCS4145F13/Assignments/X11GraphicsNotes.pdf for help with creating graphics using X11.
• The Excel graph is a surface plot
Monte Carlo Estimation of π (required for Graduates/optional for Undergraduates)
• Scatter/Gather pattern, but uses broadcast and reduce
• This is not a workflow pattern
• π can also be estimated by integrating a suitable function, but you aren’t asked to do this.
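A serial C sketch of the Monte Carlo estimate, assuming the usual approach of sampling random points in the unit square and counting those that fall inside the quarter circle. It is not Paraguin or MPI code: the loop over p stands in for the processes, the sample count is the value that would be broadcast, and the sum of hits is the value that would be reduced. The function names, the seeds, and the small linear congruential generator are all illustrative choices.

```c
#include <assert.h>

/* Small 64-bit linear congruential generator returning a value in [0, 1).
   Chosen only to keep the sketch self-contained and repeatable. */
static double next_uniform(unsigned long long *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Work of one "process": count random points inside the quarter circle. */
long count_hits(long samples, unsigned long long seed)
{
    long hits = 0;
    unsigned long long state = seed;
    for (long s = 0; s < samples; s++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}

/* Serial stand-in for broadcast (samples_per_proc) and reduce (sum of hits). */
double estimate_pi(int nprocs, long samples_per_proc)
{
    assert(nprocs > 0 && samples_per_proc > 0);
    long total_hits = 0;
    for (int p = 0; p < nprocs; p++)   /* each p uses its own seed */
        total_hits += count_hits(samples_per_proc,
                                 12345ULL + (unsigned long long)p);
    return 4.0 * (double)total_hits
         / ((double)nprocs * (double)samples_per_proc);
}
```

The ratio of hits to samples approximates the area of the quarter circle, π/4, which is why the result is multiplied by 4. Giving each process a different seed matters; identical seeds would make every process repeat the same samples.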