Using NPACI Systems, Part I: NPACI Environment (http://www.npaci.edu/Resources) NPACI Parallel Computing Workshop, March 10, 2003, San Diego Supercomputer Center
NPACI Computer Systems • Full list at: www.npaci.edu/Resources • This Institute's focus: the SDSC IBM Blue Horizon
NPACI Environment Outline: • Logging in • ssh, startup scripts • File System • $HOME, $WORK, HPSS, hsi • Accounting • reslist
NPACI Environment: Logging In • ssh horizon.npaci.edu -l username • ssh (secure shell) is required for secure access • For the SDSC security policy, see: http://security.sdsc.edu/
NPACI Environment: Unix Startup Files • NPACI provides default .cshrc, .login, and .profile files • These set up the NPACI default programming environment and the Unix environment • Run the COPYDEFAULTS utility to restore the startup files: /usr/local/paci/shellrc/bin/COPYDEFAULTS (then source .cshrc)
Filesystem Structure • $HOME/: 50-100 MB per user of backed-up storage; use it for Unix essentials, dotfiles, etc. • $WORK/: >200 GB of purged disk storage (96-hour purge); your primary workspace • Archival storage: HPSS (SDSC) / DMF (TACC) • a disk/tape system providing terabytes of storage
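The division of labor above (small backed-up $HOME, large purged $WORK) suggests staging each run's data into $WORK rather than $HOME. The sketch below illustrates that pattern; the run1 directory and input.dat file are hypothetical names, and on a machine where $WORK is not defined it falls back to a temporary directory purely so the example runs anywhere.

```shell
#!/bin/sh
# Stage input data into $WORK before a run. $WORK is scratch space that is
# purged (96 hours), so keep only copies here, never the sole copy.
WORK=${WORK:-$(mktemp -d)}    # fallback for illustration on non-NPACI hosts
mkdir -p "$WORK/run1"         # hypothetical per-run directory
echo "input data" > "$WORK/run1/input.dat"
ls "$WORK/run1"
```

Anything worth keeping should be copied back out of $WORK (to $HOME or HPSS) before the purge window expires.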
HPSS - Archival Storage (SDSC) • Interactive execution: hsi is the recommended HPSS interface utility; pftp can also be used
Begin an interactive session: tf004i% hsi
Save a file in HPSS: hsi> put filename
Retrieve a file from HPSS: hsi> get filename
End the session: hsi> quit
HPSS - Archival Storage (SDSC) • Single-line execution: • Saving a file in HPSS: tf004i% hsi put filename • Retrieving a file from HPSS: tf004i% hsi get filename • For tips and examples on using HPSS, see http://www.npaci.edu/HPSS http://www.sdsc.edu/Storage/hsi/
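A common way to use the single-line form is to bundle a whole results directory into one tar file and archive that, since one large file is generally friendlier to a tape-backed system than many small ones. This sketch assumes hypothetical names (results/, results.tar) and skips the actual transfer when hsi is not installed, so it can be tried anywhere:

```shell
#!/bin/sh
# Bundle a results directory and archive it to HPSS with single-line hsi.
mkdir -p results
echo "run output" > results/out.dat     # stand-in for real results
tar cf results.tar results              # one archive instead of many small files
if command -v hsi >/dev/null 2>&1; then
    hsi put results.tar                 # single-line execution, as on the slide
else
    echo "hsi not available on this host; skipping archive step"
fi
```

Retrieval is symmetric: hsi get results.tar, then untar into $WORK.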
NPACI Environment: Accounting • Normally updated every 24 hours • beware of going over your allocation • Commands: • reslist -u username (-a accountname) • resalloc
Using NPACI Systems, Part II: Blue Horizon (http://www.npaci.edu/Horizon) NPACI Parallel Computing Workshop, March 10, 2003, San Diego Supercomputer Center
Overview • Hardware overview • Compiling programs • How CPUs (nodes) are requested • Running parallel jobs • Interactive • LoadLeveler scripts
Configuration • 144 IBM SP High Nodes • 8 shared-memory processors per node, for 1,152 processors total • 4 GB SDRAM per node • 13 "B80" nodes for interactive work • 4 shared-memory processors per node • 2 GB SDRAM per node • Power3 processors • 375 MHz CPU clock • up to 4 floating-point operations per cycle • 1.5 GFLOPS per processor (peak) • Nodes connected by the IBM Colony switch • 350 MB/second peak bandwidth • 350 MB/second measured with an MPI "ping-pong" test
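The per-processor and machine-wide peak figures follow directly from the numbers on this slide; the arithmetic can be checked with a few lines of shell:

```shell
#!/bin/sh
# Peak-performance arithmetic for Blue Horizon, using the slide's figures.
clock_mhz=375          # Power3 clock
flops_per_cycle=4      # up to 4 floating-point ops per cycle
cpus_per_node=8
nodes=144
mflops_per_cpu=$(( clock_mhz * flops_per_cycle ))            # 1500 MFLOPS = 1.5 GFLOPS
mflops_total=$(( mflops_per_cpu * cpus_per_node * nodes ))   # whole-machine peak
echo "Per CPU:  $mflops_per_cpu MFLOPS"
echo "Machine:  $mflops_total MFLOPS"
```

That gives 1,500 MFLOPS per CPU and 1,728,000 MFLOPS (about 1.7 TFLOPS) peak for the full 1,152-processor machine.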
Blue Horizon Programming Environment • Compilers • Fortran: xlf, xlf90, xlf95, mpxlf, mpxlf90, mpxlf95, xlf_r, xlf90_r, xlf95_r, mpxlf_r, mpxlf90_r, mpxlf95_r • C/C++: xlc, xlC, mpcc, mpCC, gcc (GNU C/C++) • Tools • Debuggers • TotalView • pedb • Profilers • xprofiler • Parallel Environment (PE) and Parallel Operating Environment (POE) • enables running parallel jobs • Message-passing libraries: MPI, MPICH, LAPI, MPL
Blue Horizon - Compiling C • C: xlc -qtune=pwr3 -qarch=pwr3 -O3 program.c • C with MPI: mpcc -qtune=pwr3 -qarch=pwr3 -O3 program.c • C with OpenMP: cc_r -qsmp=noauto:omp program.c • C with MPI and OpenMP (hybrid): mpcc_r -qsmp=noauto:omp program.c • C++: xlC, mpCC, xlC_r, mpCC_r
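The pattern on this slide is that the compiler driver name encodes the programming model (mp* prefix for MPI, _r suffix for threads) while the optimization flags stay the same. A small sketch that assembles the command line for each model makes the pattern explicit; nothing is actually compiled, and combining the -qtune/-qarch flags with the threaded drivers is an illustrative choice, not something the slide mandates:

```shell
#!/bin/sh
# Pick the IBM XL C driver for a given programming model (names from the slide)
# and assemble the full compile command as a string.
model="mpi+openmp"                  # one of: serial, mpi, openmp, mpi+openmp
case "$model" in
    serial)     cc="xlc"    ;;      # plain C
    mpi)        cc="mpcc"   ;;      # MPI wrapper
    openmp)     cc="cc_r"   ;;      # thread-safe (_r) driver
    mpi+openmp) cc="mpcc_r" ;;      # hybrid: MPI wrapper + thread-safe
esac
flags="-qtune=pwr3 -qarch=pwr3 -O3"
case "$model" in
    openmp|mpi+openmp) flags="$flags -qsmp=noauto:omp" ;;   # enable OpenMP
esac
cmd="$cc $flags program.c"
echo "$cmd"
```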
Blue Horizon - Compiling Fortran • Fortran 77, serial: xlf -qtune=pwr3 -qarch=pwr3 -O3 program.f • Fortran 77 with MPI: mpxlf -qtune=pwr3 -qarch=pwr3 -O3 program.f • Fortran 77 with OpenMP: xlf_r -qsmp=noauto:omp program.f • Fortran 77 with MPI and OpenMP: mpxlf_r -qsmp=noauto:omp program.f • Fortran 90/95: xlf90, mpxlf90, xlf90_r, mpxlf90_r, xlf95, mpxlf95, xlf95_r, mpxlf95_r
How CPUs are Requested: Batch Processing • Blue Horizon has 144 nodes with 8 CPUs on each node: 144 * 8 = 1,152 CPUs • CPUs are requested on a node basis; nodes are not shared with other users' tasks: • number of nodes • number of tasks per node • number of threads per task (hybrid code only) • CPUs requested = nodes * tasks * threads • Hybrid code spawns threads per MPI task
Speaker notes (Laura C. Nett): the numbers of nodes and tasks are requested via the POE or LoadLeveler environment, while the number of threads is set via an environment variable. The current hardware is limited to 4 MPI tasks per node when using fast communications, so 4 CPUs per node sit idle in that case; you are not charged for them, and the new Colony switch will change this. When using threads, set the number of threads to spawn per MPI task.
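The CPU-count formula on this slide can be made concrete with a worked example; the job shape below (16 nodes, hybrid code) is illustrative, not prescribed:

```shell
#!/bin/sh
# Worked example of: CPUs requested = nodes * tasks_per_node * threads_per_task
nodes=16
tasks_per_node=4      # max 4 MPI tasks/node with fast communications (see notes)
threads_per_task=2    # hybrid code only; non-threaded jobs effectively use 1
cpus=$(( nodes * tasks_per_node * threads_per_task ))
echo "CPUs requested: $cpus"
```

Here 16 * 4 * 2 = 128 CPUs, which matches the 4-CPUs-busy-per-node limit when each MPI task spawns 2 threads.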
Running Interactive Jobs • Interactive jobs: first log on to the B80 nodes: ssh b80login.sdsc.edu
poe a.out arg_list -nodes n -tasks_per_node m -rmpool 1 -euilib ip -euidevice en0
• Interactive debugging: pedb a.out arg_list -nodes n -tasks_per_node m -rmpool 1 -euilib ip -euidevice en0
Alternatively, use environment variables: setenv MP_NODES n setenv MP_TASKS_PER_NODE m setenv MP_RMPOOL 1 setenv MP_EUILIB ip setenv MP_EUIDEVICE en0
tasks_per_node: 1-4 (the number of CPUs on each B80 node)
Speaker notes (Laura C. Nett): interactive jobs run much as on the SP2, except that you define the number of CPUs via the numbers of nodes and tasks; if you want threads spawned, set that before running poe. The interactive CPUs are set aside for use during the day and, unlike on the SP2, are not shared. The command-line options correspond to the MP_* environment variables and are given in all lower case. tasks_per_node can be 1-8 (the number of CPUs on a node), but for straight MPI code using fast communications the maximum is 4.
Batch Jobs with LoadLeveler • Develop a program and make sure it runs correctly • Write a script file with information about your job and the nodes/tasks/threads you want to run it on • Submit the script file to LoadLeveler, the batch job utility • Check the status of your job • NPACI Batch Script Generator (sample scripts): http://hotpage.npaci.edu/Batch
Blue Horizon Batch Queue Concepts: User Batch Script -> LoadLeveler -> Catalina Scheduler
LoadLeveler Scripts: Example 1 (8 CPUs per node, 16 nodes)
#!/bin/ksh
# @ environment = MP_EUILIB=us; MP_SHARED_MEMORY=YES; MP_PULSE=0; MP_INTRDELAY=100;
# @ class = high
# @ job_type = parallel
# @ node = 16
# @ tasks_per_node = 8
# @ node_usage = not_shared
# @ network.MPI = css0,not_shared,US
# @ wall_clock_limit = 3:30:00
# @ input = /dev/null
# @ output = LL_out.$(jobid)
# @ error = LL_err.$(jobid)
# @ initialdir = /work/login-name/mydir
# @ notify_user = login-name@npaci.edu
# @ notification = always
# @ queue
poe a.out
(or "poe myscript", where myscript is:)
#!/bin/ksh
a.out
LoadLeveler Scripts: Example 2 (4 CPUs per node plus 2 threads per task)
#!/bin/ksh
# @ environment = MP_EUILIB=us; MP_SHARED_MEMORY=YES; MP_PULSE=0; MP_INTRDELAY=100;
# @ class = high
# @ job_type = parallel
# @ node = 16
# @ tasks_per_node = 4
# @ node_usage = not_shared
# @ network.MPI = css0,not_shared,US
# @ wall_clock_limit = 3:30:00
# @ input = /dev/null
# @ output = LL_out.$(jobid)
# @ error = LL_err.$(jobid)
# @ initialdir = /work/login-name/mydir
# @ notify_user = login-name@npaci.edu
# @ notification = always
# @ queue
export OMP_NUM_THREADS=2
export SPINS=0
export YIELDS=0
export SPINLOOPTIME=5000
poe a.out
(or "poe myscript", where myscript is:)
#!/bin/ksh
export OMP_NUM_THREADS=2
export SPINS=0
export YIELDS=0
export SPINLOOPTIME=5000
a.out
Keywords in LoadLeveler Scripts (keywords are case-insensitive) arguments - arguments to pass to the executable input - file to use as stdin output - file to use as stdout initialdir - initial working directory for the job executable - executable or script to run job_type - can be parallel or serial notify_user - user to send email to regarding this job
Keywords for LoadLeveler Scripts (cont.) notification - when to e-mail notify_user (always, error, start, never, complete) node - how many nodes you want tasks_per_node - MPI tasks per node (max of 8) wall_clock_limit - maximum wall-clock time for the job class - class (queue) to run in (low, normal, high, express) queue - puts the job in the queue
LoadLeveler Commands • Job submission: llsubmit filename - submit job script filename to LoadLeveler llcancel [-q] jobid - cancel the job with the given id llq [-s job_number] - list jobs (or one job by number)
Scheduler Commands • Job status: showq - show the job queue showq | fgrep jobname - get info on a specific job showbf - show backfill (free slots available now)
Documentation NPACI Blue Horizon User Guide http://www.npaci.edu/Horizon IBM SP documentation http://www.rs6000.ibm.com/resource/aix_resource/sp_books
Lab Session - Blue Horizon • Log into horizon: ssh -l username horizon.npaci.edu • Ensure you have the default environment by running COPYDEFAULTS: /usr/local/paci/shellrc/bin/COPYDEFAULTS source .cshrc • Copy the sample LoadLeveler script and MPI example source code: cp /gpfs/Consult/Training/LoadLeveler/LLexample . cp /gpfs/Consult/Training/LoadLeveler/hello_mpi.f . • Compile the sample source code: mpxlf hello_mpi.f • Run the executable interactively with 4 tasks per node (MPI only): poe a.out -nodes 1 -tasks_per_node 4 -rmpool 1 • Submit the job to the LoadLeveler batch queue: • first edit the LLexample script to use your own login name • then submit LLexample with the "llsubmit" command: llsubmit LLexample • Check the current queues: showq