Introduction to Scientific Computing on BU’s Linux Cluster Doug Sondak Linux Clusters and Tiled Display Walls Boston University July 30 – August 1, 2002
Outline
• hardware
• parallelization
• compilers
• batch system
• profilers
Hardware
BU’s Cluster
• 52 two-processor nodes
• specifications
  • 2 Pentium III processors per node
  • 1 GHz
  • 1 GB memory per node
  • 32 KB L1 cache per CPU
  • 256 KB L2 cache per CPU
BU’s Cluster (2)
• Myrinet 2000 interconnects
  • sustained 1.96 Gb/s
• Linux
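The aggregate figures implied by these specifications follow from simple arithmetic. The sketch below uses only the numbers on the slides; the bandwidth conversion assumes decimal units (1 Gb = 1000 Mb) and 8 bits per byte.

```shell
# Numbers taken from the hardware slides.
NODES=52
CPUS_PER_NODE=2
MEM_GB_PER_NODE=1
LINK_GBIT_PER_S=1.96   # sustained Myrinet 2000 rate, gigabits per second

TOTAL_CPUS=$((NODES * CPUS_PER_NODE))
TOTAL_MEM_GB=$((NODES * MEM_GB_PER_NODE))

# Assumption: decimal units and 8 bits per byte.
LINK_MBYTE_PER_S=$(awk -v g="$LINK_GBIT_PER_S" 'BEGIN { printf "%.0f", g * 1000 / 8 }')

echo "total CPUs:        $TOTAL_CPUS"
echo "total memory (GB): $TOTAL_MEM_GB"
echo "link rate (MB/s):  $LINK_MBYTE_PER_S"
```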
Some Timings
• CFD code, MPI, 4 procs.

    Machine                      Sec.
    Origin2000                   495
    SP                           329
    Cluster, 2 procs. per box    174
    Cluster, 1 proc. per box     153
    Regatta                       78
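To compare the machines at a glance, each time can be divided by the fastest entry. This is a minimal sketch that hard-codes the five timings from the slide; the choice of the Regatta as baseline is ours, not the slide's.

```shell
# Seconds from the timing slide (CFD code, MPI, 4 processors).
TIMINGS='Origin2000|495
SP|329
Cluster, 2 procs. per box|174
Cluster, 1 proc. per box|153
Regatta|78'

# Print each machine's time relative to the Regatta (78 s).
RATIOS=$(printf '%s\n' "$TIMINGS" |
  awk -F'|' -v base=78 '{ printf "%-27s %5.2f\n", $1, $2 / base }')
printf '%s\n' "$RATIOS"
```

The same numbers show that placing two processes on each box costs roughly 14% compared with one process per box (174 s versus 153 s).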
Parallelization
Parallelization
• MPI is the recommended method
• PVM may also be used
• some MPI tutorials
  • Boston University http://scv.bu.edu/Tutorials/MPI/
  • NCSA http://pacont.ncsa.uiuc.edu:8900/public/MPI/
Parallelization (2)
• OpenMP is available for SMP within a node
• mixed MPI/OpenMP not presently available
  • we’re working on it!
Compilers
Compilers
• Portland Group
  • pgf77
  • pgf90
  • pgcc
  • pgCC
Compilers (2)
• gnu
  • g77
  • gcc
  • g++
Compilers (3)
• Intel
  • Fortran ifc
  • C/C++ icc
Compilers (2)
Polyhedron F77 Benchmarks http://www.polyhedron.com/

    Benchmark   PG      gnu     Intel
    AC           8.66   12.38    6.13
    ADI          8.48    9.27    6.83
    AIR         16.41   15.65   13.45
    CHESS       11.67   10.06   10.16
    DODUC       21.35   36.23   18.18
    LP8          4.31    7.88    4.16
    MDB          3.62    3.81    2.94
    MOLENR      11.66   12.72    7.61
    PI          24.58   41.95    7.08
    PNPOLY       3.81    5.24    4.86
    RO          10.75   10.31    3.92
    TFFT        18.84   20.24   20.18
Compilers (3)
• Portland Group
  • pgf77 generally faster than g77
• Intel
  • ifc generally faster than pgf77
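The "generally faster" statements can be checked against the Polyhedron figures by counting wins per benchmark. The sketch below embeds the twelve rows from the benchmark slide and assumes the entries are run times, so the smallest value wins; the slide itself does not state the units.

```shell
# Polyhedron F77 figures from the slide: name, PG, gnu, Intel.
# Assumption: entries are run times, so lower is better.
BENCH='AC 8.66 12.38 6.13
ADI 8.48 9.27 6.83
AIR 16.41 15.65 13.45
CHESS 11.67 10.06 10.16
DODUC 21.35 36.23 18.18
LP8 4.31 7.88 4.16
MDB 3.62 3.81 2.94
MOLENR 11.66 12.72 7.61
PI 24.58 41.95 7.08
PNPOLY 3.81 5.24 4.86
RO 10.75 10.31 3.92
TFFT 18.84 20.24 20.18'

SUMMARY=$(printf '%s\n' "$BENCH" | awk '
  {
    pg = $2 + 0; gnu = $3 + 0; intel = $4 + 0   # force numeric comparison
    best = "PG"; t = pg
    if (gnu < t)   { best = "gnu";   t = gnu }
    if (intel < t) { best = "Intel"; t = intel }
    wins[best]++
    if (pg < gnu)   pg_lt_gnu++      # head to head: pgf77 vs g77
    if (intel < pg) intel_lt_pg++    # head to head: ifc vs pgf77
  }
  END {
    printf "fastest: PG=%d gnu=%d Intel=%d; pg<gnu=%d intel<pg=%d of %d\n",
           wins["PG"], wins["gnu"], wins["Intel"], pg_lt_gnu, intel_lt_pg, NR
  }')
echo "$SUMMARY"
```

Under that assumption, pgf77 beats g77 on 9 of the 12 codes and ifc beats pgf77 on 10 of 12, which matches the summary on the slide.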
Compilers (4)
• Linux C/C++ compilers
  • gcc/g++ seem to be the standard, usually described as good compilers
Portland Group
-O2
  • highest level of optimization
-fast
  • same as -O2 -Munroll -Mnoframe
-Minline
  • function inlining
Portland Group (2)
-Mbyteswapio
  • swaps between big endian and little endian
  • useful for reading files created on our SP, Regatta, or Origin2000
-Ktrap=fp
  • trap floating point invalid operation, divide by zero, or overflow
  • slows code down, only use for debugging
Portland Group (3)
-Mbounds
  • array bounds checking
  • slows code down, only use for debugging
-mp
  • process OpenMP directives
-Mconcur
  • automatic SMP parallelization
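As an illustration of how these flags might be combined, the sketch below builds three pgf77 command lines: a debugging build, an optimized build, and an OpenMP build. The file names mycode.f and mycode are hypothetical; only the flags come from the slides. The commands are executed only if pgf77 and the source file are present, otherwise they are just printed.

```shell
# Hypothetical file names; only the flags come from the slides.
SRC=mycode.f
EXE=mycode

# Debug build: bounds checking and floating-point traps (slow, debugging only).
DEBUG_CMD="pgf77 -Mbounds -Ktrap=fp -o $EXE $SRC"
# Optimized build.
FAST_CMD="pgf77 -fast -o $EXE $SRC"
# Optimized build that also processes OpenMP directives.
OMP_CMD="pgf77 -fast -mp -o $EXE $SRC"

for cmd in "$DEBUG_CMD" "$FAST_CMD" "$OMP_CMD"; do
  if command -v pgf77 >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $cmd                       # compiler and source present: really build
  else
    echo "would run: $cmd"     # otherwise just show the command
  fi
done
```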
Intel
• Need to set some environment variables
  • contained in /usr/local/IT/intel6.0/compiler60/ia32/bin/iccvars.csh
  • source this file, copy it into your .cshrc file, or source it in .cshrc
  • there’s an identical file called ifcvars.csh to avoid (create?) confusion
Intel (2)
-O3
  • highest level of optimization
-ipo
  • interprocedural optimization
-unroll
  • loop unrolling
Intel (3)
-openmp -fpp
  • process OpenMP directives
-parallel
  • automatic SMP parallelization
-CB
  • array bounds checking
Intel (3)
-CU
  • check for use of uninitialized variables
• Endian conversion by way of environment variables
    setenv F_UFMTENDIAN big
  • all reads will be converted from big to little endian, all writes from little to big endian
Intel (4)
• Can specify units for endian conversion
    setenv F_UFMTENDIAN big:10,20
• Can mix endian conversions
    setenv F_UFMTENDIAN "little;big:10,20"
  • all units are little endian except for 10 and 20, which will be converted
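The slides use csh syntax. For anyone working in bash or sh, the equivalent settings might look like the sketch below. The variable name and values are those from the slides; the quotes around the mixed form are needed because an unquoted semicolon ends the command in the shell.

```shell
# All unformatted units treated as big endian.
export F_UFMTENDIAN=big
echo "F_UFMTENDIAN=$F_UFMTENDIAN"

# Only units 10 and 20 treated as big endian.
export F_UFMTENDIAN=big:10,20
echo "F_UFMTENDIAN=$F_UFMTENDIAN"

# Everything little endian except units 10 and 20.
# The quotes keep the shell from treating ';' as a command separator.
export F_UFMTENDIAN='little;big:10,20'
echo "F_UFMTENDIAN=$F_UFMTENDIAN"
```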
Batch System
Batch System
• PBS
  • different from LSF on the O2ks, SPs, and Regattas
  • there’s only one queue
      dque
qsub
• job submission done through script
  • script details will follow
      qsub scriptname
• returns job ID
• in working directory
  • std. out - scriptname.ojobid
  • std. err - scriptname.ejobid

    [sondak@hn003 run]$ qsub corrun
    808.hn003.nerf.bu.edu
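The output file names can be derived from the string that qsub returns. The sketch below uses the sample job ID from the slide and assumes, as is the usual PBS default, that the "jobid" in the file names is the leading numeric part of the ID.

```shell
# In a real session the ID would be captured with:  JOBID=$(qsub corrun)
SCRIPT=corrun
JOBID=808.hn003.nerf.bu.edu      # sample ID from the slide

# Assumption: output files use only the numeric part before the first dot.
JOBNUM=${JOBID%%.*}
STDOUT_FILE=$SCRIPT.o$JOBNUM
STDERR_FILE=$SCRIPT.e$JOBNUM

echo "standard output: $STDOUT_FILE"
echo "standard error:  $STDERR_FILE"
```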
qstat
• Check status of all your jobs
    qstat
• lies about run time
  • often (always?) zero

    [sondak@hn003 run]$ qstat
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    808.hn003        corrun           sondak                  0 R dque
qstat (2)
• S - job status
  • Q - queued
  • R - running
  • E - exiting (finishing up)
• qstat -f gives detailed status
    exec_host = nodem019/0+nodem018/0+nodem017/0+nodem016/0
• to specify jobid
    qstat jobid
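The status letter and the node list are easy to pull out in a script. This sketch parses the sample qstat line and the exec_host value shown on the slides rather than calling qstat, so it runs anywhere; on the cluster the same pipeline could be fed from qstat output instead.

```shell
# Sample job line from the qstat slide: id, name, user, time use, S, queue.
QSTAT_LINE='808.hn003 corrun sondak 0 R dque'

# The status letter is the fifth whitespace-separated field.
STATE=$(printf '%s\n' "$QSTAT_LINE" | awk '{ print $5 }')
case "$STATE" in
  Q) MEANING=queued ;;
  R) MEANING=running ;;
  E) MEANING=exiting ;;
  *) MEANING=unknown ;;
esac
echo "job state: $STATE ($MEANING)"

# Node names from the exec_host value reported by qstat -f.
EXEC_HOST='nodem019/0+nodem018/0+nodem017/0+nodem016/0'
NODE_LIST=$(printf '%s\n' "$EXEC_HOST" | tr '+' '\n' | cut -d/ -f1)
printf '%s\n' "$NODE_LIST"
```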
Other PBS Commands
• kill job
    qdel jobid
• some less-important PBS commands
  • qalter, qhold, qrls, qmsg, qrerun
• man pages are available for all commands
PBS Script
• For serial runs

    #!/bin/bash
    # Set the default queue
    #PBS -q dque
    # ppn is CPUs per node
    #PBS -l nodes=1:ppn=1,walltime=00:30:00
    cd $PBS_O_WORKDIR
    myrun
PBS/MPI
• For MPI, set up gmi file in PBS script

    test -d ~/.gmpi || mkdir ~/.gmpi
    GMCONF=~/.gmpi/conf.$PBS_JOBID
    /usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
    cd $PBS_O_WORKDIR
    NP=$(head -1 $GMCONF)
PBS/MPI (2)
• To run MPI, end PBS script with (all on one line)

    mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog
PBS/MPI (3)
• mpirun.ch_gm
  • version of mpirun that uses Myrinet
• --gm-f $GMCONF
  • access configuration file constructed above
• --gm-recv polling
  • poll continually to check for completion of sends and receives
  • most efficient for dedicated procs.
    • That’s us!
PBS/MPI (4)
• --gm-use-shmem
  • enable shared-memory support
  • may improve or degrade performance
  • try your code with and without it
• --gm-kill 5
  • if one MPI process aborts, kill others after 5 sec.
PBS/MPI (5)
• -np $NP
  • run on NP procs as computed earlier in script
  • equals “nodes x ppn” from PBS -l option
• PBS_JOBID=$PBS_JOBID
  • seems redundant redundant
  • do it anyway
• myprog
  • run the darn code already!
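Putting the pieces from the PBS/MPI slides together, a complete job script might look like the one written out by the sketch below. The resource request (2 nodes, 2 CPUs per node, 30 minutes) and the file name mpirun.pbs are assumptions chosen for illustration; every command inside the script is taken from the slides.

```shell
# Write a complete MPI job script assembled from the slides.
# Assumptions: 2 nodes x 2 CPUs, 30 minutes, file name mpirun.pbs.
cat > mpirun.pbs <<'EOF'
#!/bin/bash
#PBS -q dque
#PBS -l nodes=2:ppn=2,walltime=00:30:00

# Build the configuration file that mpirun.ch_gm reads via --gm-f
test -d ~/.gmpi || mkdir ~/.gmpi
GMCONF=~/.gmpi/conf.$PBS_JOBID
/usr/local/xcat/bin/pbsnodefile2gmconf $PBS_NODEFILE > $GMCONF
cd $PBS_O_WORKDIR
NP=$(head -1 $GMCONF)

mpirun.ch_gm --gm-f $GMCONF --gm-recv polling --gm-use-shmem --gm-kill 5 -np $NP PBS_JOBID=$PBS_JOBID myprog
EOF

# Process count this request should yield: nodes x ppn.
NODES=2
PPN=2
EXPECTED_NP=$((NODES * PPN))
echo "wrote mpirun.pbs; expecting NP=$EXPECTED_NP"
```

On the cluster the generated file would then be submitted with qsub mpirun.pbs.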
Profiling
Portland Group
• Portland Group compiler flag
  • function level
      -Mprof=func
  • line level
      -Mprof=lines
    • much larger file
• running the code creates pgprof.out file in working directory
PG (2)
• At unix prompt, type pgprof command
  • will pop up window with bar chart of timing results
  • can take file name argument in case you’ve renamed the pgprof.out file
      pgprof pgprof.lines
PG (3)
• option to specify source directory
    pgprof -Isourcedir pgprof.lines
  • can specify multiple directories with multiple -I flags
• also can use GUI menu
  • Options → Source Directory...
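The profiling steps on these slides can be strung together as follows. This is a sketch only: the names mycode.f, mycode, and src are hypothetical, and with DRY_RUN=1 the commands are printed rather than executed so the sequence can be inspected on a machine without the Portland Group tools.

```shell
# Set DRY_RUN=0 on the cluster to execute the commands for real.
DRY_RUN=1

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run pgf77 -fast -Mprof=lines -o mycode mycode.f   # compile with line-level profiling
run ./mycode                                      # running the code writes pgprof.out
run mv pgprof.out pgprof.lines                    # keep the line-level profile under its own name
run pgprof -Isrc pgprof.lines                     # view it, telling pgprof where the source lives
```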
PG (5)
• Calls - number of times routine was called
• Time - time spent in specified routine
• Cost - time spent in specified routine plus time spent in called routines
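The difference between Time and Cost is easiest to see with numbers. The sketch below uses made-up timings for a hypothetical call tree in which main calls solve and output, and solve calls flux; none of these routines or values come from the slides.

```shell
# Hypothetical Time values in seconds (time spent in each routine itself).
T_MAIN=1
T_SOLVE=4
T_FLUX=10
T_OUTPUT=2

# Cost = own Time plus the Cost of every routine it calls.
COST_FLUX=$T_FLUX                                  # leaf routine: Cost equals Time
COST_OUTPUT=$T_OUTPUT                              # leaf routine: Cost equals Time
COST_SOLVE=$((T_SOLVE + COST_FLUX))                # solve calls flux
COST_MAIN=$((T_MAIN + COST_SOLVE + COST_OUTPUT))   # main calls solve and output

echo "solve: Time=$T_SOLVE Cost=$COST_SOLVE"
echo "main:  Time=$T_MAIN Cost=$COST_MAIN"
```

A routine with a small Time but a large Cost, like main here, is not itself the bottleneck; the place to look is further down its call tree.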
PG (6)
• Lines profiling
  • with optimization, may not be able to identify many (most?) lines in source code
    • reports results for blocks of code, e.g., loops
  • without optimization, doesn’t measure what you really want
  • initial screen looks like “func” screen
  • double-click function/subroutine name to get line-level listing
Questions/Comments
• Feel free to contact us directly with questions about the cluster or parallelization/optimization issues
    Doug Sondak sondak@bu.edu
    Kadin Tseng kadin@bu.edu