Kickstart Tutorial/Seminar on using the 64-node P4-Xeon Cluster in Science Faculty June 11, 2003
Aims and target audience • Aims: • Provide a kickstart tutorial to potential cluster users in Science Faculty, HKBU • Promote the usage of the PC cluster in Science Faculty • Target audience • Science Faculty students referred by their project/thesis supervisors • Staff who are interested in High Performance Computing
Outline • Brief introduction • Hardware, software, login and policy • How to write and run program on multiple CPUs • Simple MPI programming • Resources on MPI documentation • Demonstration of software installed • SPRNG, BLAS, NAMD2, GAMESS, PGI
Hardware Configuration • 1 master node + 64 compute nodes + Gigabit Interconnection • Master node • Dell PE2650, P4-Xeon 2.8GHz x 2 • 4GB RAM, 36GB x 2 U160 SCSI (mirror) • Gigabit ethernet ports x 2 • SCSI attached storage • Dell PV220S • 73GB x 10 (RAID5)
Hardware Configuration (cont) • Compute nodes • Dell PE2650, P4-Xeon 2.8GHz x 2 • 2GB RAM, 36GB U160 SCSI HD • Gigabit ethernet ports x 2 • Gigabit Interconnect • Extreme Blackdiamond 6816 Gigabit ethernet • 256Gb backplane • 72 Gigabit ports (8 ports card x 9)
Software installed • Cluster operating system • ROCKS 2.3.2 from www.rocksclusters.org • MPI and PVM libraries • LAM/MPI 6.5.9, MPICH 1.2.5, PVM 3.4.3-6beolin • Compilers • GCC 2.96, GCC 3.2.3 • PGI C/C++/f77/f90/hpf version 4.0 • MATH libraries • ATLAS 3.4.1, ScaLAPACK, SPRNG 2.0a • Application software • MATLAB 6.1 with MPITB • Gromacs 3.1.4, NAMD2.5b1 , Gamess • Editors • vi, pico, emacs, joe • Queuing system • OpenPBS 2.3.16, Maui scheduler
Cluster O.S. – ROCKS 2.3.2 • Developed by NPACI and SDSC • Based on RedHat 7.3 • Allows setup of 64 nodes in 1 hour • Useful commands for users to monitor jobs on all nodes, e.g. • cluster-fork date • cluster-ps morris • cluster-kill morris • Web-based management and monitoring • http://tdgrocks.sci.hkbu.edu.hk
Hostnames • Master node • External : tdgrocks.sci.hkbu.edu.hk • Internal : frontend-0 • Compute nodes • comp-pvfs-0-1, …, comp-pvfs-0-48 • Short names: cp0-1, cp0-2, …, cp0-48
Network diagram • Master node: tdgrocks.sci.hkbu.edu.hk / frontend-0 (192.168.8.1) • Gigabit ethernet switch • Compute nodes: comp-pvfs-0-1 (192.168.8.254), comp-pvfs-0-2 (192.168.8.253), …, comp-pvfs-0-48 (192.168.8.207)
Login to the master node • Login is allowed remotely from all HKBU networked PCs by ssh or vncviewer • SSH login (terminal login) • Use your favourite ssh client software, e.g. PuTTY or SSH Secure Shell on Windows, or OpenSSH on Linux/UNIX • E.g. on all SCI workstations (sc11 – sc30), type ssh tdgrocks.sci.hkbu.edu.hk
Login to the master node • VNC login (graphical login) • Use vncviewer, downloaded from http://www.uk.research.att.com/vnc/ • E.g. on sc11 – sc30.sci.hkbu.edu.hk, type vncviewer vnc.sci.hkbu.edu.hk:51 • E.g. on Windows, run vncviewer and when asked for the server address, type vnc.sci.hkbu.edu.hk:51
Username and password • Unified password authentication has been implemented • Same as that of your Netware account • Password authentication uses NDS-AS • Setup similar to net1 and net4 in ITSC
ssh key generation • To make use of multiple nodes in the PC cluster, users are required to use ssh. • Key generation is done once automatically during the first login • You may input a passphrase to protect the key pair • The key pair is stored in your $HOME/.ssh/
User Policy • Users are allowed to login remotely from other networked PCs in HKBU. • All users must use their own user account to login. • The master node (frontend) is used only for login, simple editing of program source code, preparing the job-dispatching script and dispatching jobs to compute nodes. No foreground or background jobs may be run on it. • Dispatching of jobs must be done via the OpenPBS system.
OpenPBS system • Provides a fair and efficient job dispatching and queuing system for the cluster • A PBS script must be written to run a job • Either sequential or parallel jobs can be handled by PBS • Job error and output are stored in separate files named according to job IDs.
PBS script example (sequential) #!/bin/bash #PBS -l nodes=1 #PBS -N prime #PBS -m ae #PBS -q default # the above is the PBS directive used in batch queue # Assume that you placed the executable in /u1/local/share/pbsexamples echo Running on host `hostname` /u1/local/share/pbsexamples/prime 216091 • PBS scripts are shell script with directives preceding with #PBS • The above example request only 1 node and deliver the job named ‘prime’ in default queue. • The PBS system will mail a message after the job executed.
Delivering a PBS job
• Prepare and compile the executable:
cp /u1/local/share/pbsexamples/prime.c .
cc -o prime prime.c -lm
• Prepare and edit the PBS script as above:
cp /u1/local/share/pbsexamples/prime.bat .
• Submit the job:
qsub prime.bat
PBS script example (parallel) #!/bin/sh #PBS -N cpi #PBS -r n #PBS -e cpi.err #PBS -o cpi.log #PBS -m ae #PBS -l nodes=5:ppn=2 #PBS -l walltime=01:00:00 # This job's working directory echo Working directory is $PBS_O_WORKDIR cd $PBS_O_WORKDIR echo Running on host `hostname` echo This jobs runs on the following processors: echo `cat $PBS_NODEFILE` # Define number of processors NPROCS=`wc -l < $PBS_NODEFILE` echo This job has allocated $NPROCS nodes # Run the parallel MPI executable “cpi” /u1/local/mpich-1.2.5/bin/mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS /u1/local/share/pbsexamples/cpi
Delivering parallel jobs • Copy the PBS script example cp /u1/local/share/pbsexamples/runcpi . • Submit the PBS job qsub runcpi • Note the error and output files, named cpi.err and cpi.log as specified in the script (without -e/-o directives they default to cpi.e??? and cpi.o???)
End of Part 1 Thank you!
Demonstration of software installed • SPRNG • BLAS, ScaLAPACK • MPITB for MATLAB • NAMD2 and VMD • GAMESS, GROMACS • PGI Compilers for parallel programming • More…
SPRNG 2.0a • Scalable Parallel Pseudo Random Number Generators Library • A set of libraries for scalable and portable pseudorandom number generation • Most suitable for parallel Monte-Carlo simulation • The current version is installed in /u1/local/sprng2.0a
• For serial source code (e.g. mcErf.c), compile with
gcc -c -I /u1/local/sprng2.0/include mcErf.c
gcc -o mcErf -L /u1/local/sprng2.0/lib mcErf.o -lsprng -lm
• For parallel source code (e.g. mcErf-mpi.c), compile with
mpicc -c -I /u1/local/sprng2.0/include mcErf-mpi.c
mpicc -o mcErf-mpi -L /u1/local/sprng2.0/lib mcErf-mpi.o -lsprng -lpmpich -lmpich -lm
• Or use a Makefile to automate the above process
• Sample files mcPi.tar.gz and mcErf.tar.gz can be found in /u1/local/share/example/sprng/ on the cluster
• Thanks to Mr. K.I. Liu for providing documentation and samples for SPRNG at http://www.math.hkbu.edu.hk/~kiliu
• More information can be found at http://sprng.cs.fsu.edu/
BLAS • Basic Linear Algebra Subprograms • Basic vector and matrix operations • Sample code showing the speed of BLAS matrix-matrix multiplication against a self-written for loop in /u1/local/share/example/blas • dgemm.c, makefile.dgemm • dgemm-mpi.c, makefile.dgemm-mpi • Thanks to Mr. C.W. Yeung, MATH, for providing the above example • Further information can be found at http://www.netlib.org/blas/
ScaLAPACK • Scalable LAPACK • PBLAS + BLACS • PBLAS: Parallel Basic Linear Algebra Subprograms • BLACS: Basic Linear Algebra Communication Subprograms • Supports MPI and PVM; only the MPI version is installed on our cluster • Directories for the BLAS, BLACS and ScaLAPACK libraries: • /u1/local/ATLAS/lib/Linux_P4SSE2_2/ • /u1/local/BLACS/LIBS • /u1/local/SCALAPACK/libscalapack.a • PBLAS and ScaLAPACK examples (pblas.tgz, scaex.tgz) are stored in /u1/local/share/example/scalapack • Further information can be found at http://www.netlib.org/scalapack/scalapack_home.html • Please ask Morris for further information.
MPITB for MATLAB • MPITB example: MC.tar.gz in /u1/local/share/example/mpitb • Untar the example in your home directory: tar xzvf /u1/local/share/example/mpitb/MC.tar.gz • Run lamboot first, then start MATLAB and run qsub runMCpbs.bat • Further information can be found at http://www.sci.hkbu.edu.hk/~smlam/tdgc/MPITB • Thanks to Tammy Lam, MATH, for providing the above homepage and examples
NAMD2 • Parallel, object-oriented molecular dynamics code • High-performance simulation of large biomolecular systems • Binary downloaded and installed in /u1/local/namd2/ • Works with the VMD frontend (GUI) • Demonstration of VMD and NAMD using alanin.zip in /u1/local/example/namd2 • Further information can be found at http://www.ks.uiuc.edu/Research/namd • Ask Morris Law for further information
GAMESS • The General Atomic and Molecular Electronic Structure System • A general ab initio quantum chemistry package • Thanks to Justin Lau, CHEM, for providing sample scripts and an explanation of the chemistry behind them.
PGI compilers support 3 types of parallel programming • Automatic shared-memory parallel • Used in SMP within the same node (max NCPUS=2) • Using the option -Mconcur in pgcc, pgCC, pgf77, pgf90
pgcc -o pgprime -Mconcur prime.c
export NCPUS=2
./pgprime
• User-directed shared-memory parallel • Used in SMP within the same node (max NCPUS=2) • Using the option -mp in pgcc, pgCC, pgf77, pgf90
pgf90 -o f90prime -mp prime.f90
export NCPUS=2
./f90prime
• Users should understand OpenMP parallelization directives for Fortran and pragmas for C and C++ • Consult the PGI Workstation user guide for details • http://www.pgroup.com/ppro_docs/pgiws_ug/pgiug_.htm
PGI compilers support 3 types of parallel programming (cont) • Data parallel: shared- or distributed-memory parallel • Only HPF is supported • Suitable in SMP and cluster environments
pghpf -o hello hello.hpf
./hello -pghpf -np 8 -stat alls
• PGHPF environment variables • PGHPF_RSH=ssh • PGHPF_HOST=cp0-1,cp0-2, • PGHPF_STAT=alls (can be cpu, mem, all, etc.) • PGHPF_NP (max no.=16, license limit) • Example files in /u1/local/share/example/hpf • hello.tar.gz, pde1.tar.gz • Consult the PGHPF user guide at http://www.pgroup.com/ppro_docs/pghpf_ug/hpfug.htm
Other software under consideration • PGAPACK • Parallel Genetic Algorithm Package • /u1/local/pga • PETSc • the Portable, Extensible Toolkit for Scientific Computation • Any suggestions?
End of Part 2 Thank you!
What is Message Passing Interface (MPI)? • A portable standard for communication • Processes communicate through messages • Each process is a separate program • All data is private
What is Message Passing Interface (MPI)? • It is a library, not a language!! • Different compilers may be used, but all must link against the same library, e.g. MPICH, LAM, etc. • There are two versions now, MPI-1 and MPI-2 • Uses a standard sequential language: Fortran, C, etc.
Basic Idea of Message Passing Interface (MPI) • MPI environment - initialize, manage, and terminate communication among processes • Communication between processes • global communication, e.g. broadcast, gather, etc. • point-to-point communication, e.g. send, receive, etc. • Complicated data structures, e.g. matrices and memory
Is MPI Large or Small? • MPI is large • More than one hundred functions • But not necessarily a measure of complexity • MPI is small • Many parallel programs can be written with just 6 basic functions • MPI is just right • One can access flexibility when it is required • One need not master all MPI functions
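As a sketch of the "just 6 basic functions" claim, here is a small C program that uses only MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, and MPI_Finalize. It must be compiled with mpicc and launched under mpirun; the message pattern (each rank reporting to rank 0) is chosen for illustration, not taken from the tutorial's examples:

```c
#include <stdio.h>
#include <mpi.h>

/* Each nonzero rank sends its rank number to rank 0,
   which receives and prints them -- only the 6 basic MPI calls. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int value;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %d from rank %d\n", value, src);
        }
    } else {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

Compile and run, e.g.: mpicc -o basic6 basic6.c; mpirun -np 4 basic6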
When to Use MPI? • You need a portable parallel program • You are writing a parallel library • You care about performance • You have a problem that can be solved in parallel
F77/F90, C/C++ MPI library calls • Fortran 77/90 uses subroutines • CALL is used to invoke the library call • Nothing is returned, the error code variable is the last argument • All variables are passed by reference • C/C++ uses functions • Just the name is used to invoke the library call • The function returns an integer value (an error code) • Variables are passed by value, unless otherwise specified
Getting started with LAM • Create a file called "lamhosts" • The content of "lamhosts" (8 nodes):
cp0-1
cp0-2
cp0-3
…
cp0-8
frontend-0
Getting started with LAM • Start LAM on the specified cluster:
LAMRSH=ssh
export LAMRSH
lamboot -v lamhosts
• Remove all traces of the LAM session on the network:
lamhalt
• In the case of a catastrophic failure (e.g., one or more LAM nodes crash), lamhalt will hang; use wipe instead:
LAMRSH=ssh
export LAMRSH
wipe -v lamhosts
Getting started with MPICH • Open the ".bashrc" file under your home directory • Add the following line at the end of the file:
PATH=/u1/local/mpich-1.2.5/bin:/u1/local/pgi/linux86/bin:$PATH
• Save and exit • Restart the terminal
MPI Commands • mpicc, mpif77, mpif90 - compile MPI programs
mpicc -o foo foo.c
mpif77 -o foo foo.f
mpif90 -o foo foo.f90
• mpirun - start the execution of MPI programs
mpirun -v -np 2 foo
MPI Environment • Initialize - initialize the environment • Finalize - terminate the environment • Communicator - default communication group for all processes • Version - establish the version of MPI • Total processes - determine the total number of processes • Rank/Process ID - assign an identifier to each process • Timing functions - MPI_Wtime, MPI_Wtick
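A short C sketch exercising these environment calls (version, size, rank, and the timing functions); compile with mpicc and run under mpirun. The program layout is an illustration, not one of the cluster's installed examples:

```c
#include <stdio.h>
#include <mpi.h>

/* Query the MPI environment: version, process count, rank,
   and wall-clock timing around a (here empty) work section. */
int main(int argc, char *argv[]) {
    int rank, size, major, minor;
    double t0, t1;
    MPI_Init(&argc, &argv);
    MPI_Get_version(&major, &minor);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    t0 = MPI_Wtime();
    /* ... parallel work would go here ... */
    t1 = MPI_Wtime();
    printf("rank %d of %d, MPI %d.%d, elapsed %g s (timer tick %g s)\n",
           rank, size, major, minor, t1 - t0, MPI_Wtick());
    MPI_Finalize();
    return 0;
}
```

MPI_Wtick reports the resolution of MPI_Wtime, which matters when timing short code sections.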
MPI_INIT • Initializes the MPI environment • Assigns all spawned processes to MPI_COMM_WORLD, the default communicator • C:
int MPI_Init(int *argc, char ***argv)
• Input parameters • argc - pointer to the number of arguments • argv - pointer to the argument vector • Fortran:
CALL MPI_INIT(error_code)
• integer error_code - variable that gets set to an error code
MPI_FINALIZE • Terminates the MPI environment • C:
int MPI_Finalize(void)
• Fortran:
CALL MPI_FINALIZE(error_code)
• integer error_code - variable that gets set to an error code
Hello World 1 (C)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    printf("Hello world!\n");
    MPI_Finalize();
    return 0;
}
Hello World 1 (Fortran)

      program main
      include 'mpif.h'
      integer ierr
      call MPI_INIT(ierr)
      print *, 'Hello world!'
      call MPI_FINALIZE(ierr)
      end