Running MPI on “Gridfarm” Bryan Carpenter February, 2005
Gridfarm Nodes
• We assume you have a login account for gridfarm08.ucs.indiana.edu (which also has the alias gf8.ucs.indiana.edu), and that from there you are able to log in (ssh) to the four worker nodes (hosts):
    • gridfarm04.ucs.indiana.edu
    • gridfarm05.ucs.indiana.edu
    • gridfarm06.ucs.indiana.edu
    • gridfarm07.ucs.indiana.edu
  which also have the aliases gf4, gf5, gf6, gf7.
• All hosts, including gf8, share a common file system.
• You may use gf8 for initial log-in and some administrative tasks, but all MPI-specific operations should happen on gf4, gf5, gf6, gf7 (MPI is not installed on gf8).
• You can also use gf2 and gf3, but these notes do not discuss them.
LAM MPI
• LAM MPI is one of the two most widely used free implementations of MPI. The other is MPICH.
• We use LAM because it is developed at Indiana University. MPICH is also a perfectly good choice (e.g. if you want to run on “non-UNIX/Linux platforms”).
• For general information/documentation on LAM see:
    http://www.lam-mpi.org/
  For MPICH see:
    http://www-unix.mcs.anl.gov/mpi/mpich/
• You will find that the LAM MPI software is already installed on Gridfarm. But you will need to do some configuration in your own account before you can use the software.
LAM Execution Model
• Before you can run parallel programs using LAM, you need to start the LAM daemons on all the worker nodes (in our case, on gf4, gf5, gf6, gf7).
• Once the LAM daemons are running you can start MPI programs using mpirun or mpiexec.
• You only need to start the LAM daemons once, early in your session; then you can run parallel programs as many times as you like.
• In fact the daemons will carry on running after your log-in session finishes, unless you explicitly stop them (with lamhalt).
• LAM daemons are “per user”: it is your responsibility to start and stop your own daemons, and what you do with your own daemons shouldn’t directly affect other users (except that you share the machines' resources with them).
lamboot
• You start the LAM daemons using the lamboot command.
• Once things are properly configured, you only need to run lamboot on one node, and it will start the daemons on all nodes.
• You pass an argument to lamboot which is a file containing a list of hosts on which to start the daemons.
• Unfortunately there is some complexity to configuring things so this works.
• Under the Gridfarm set-up, lamboot has to use ssh to remotely execute the command that starts the daemon.
• lamboot is very picky, and will only succeed if you can ssh between nodes without having to give a password or answer any interactive questions.
Configuring ssh. 1.
• Generate an ssh key-pair to authenticate yourself between nodes, using the ssh-keygen command, e.g.:
    $ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/dbc/.ssh/id_rsa): [HIT RETURN]
    Enter passphrase (empty for no passphrase): [HIT RETURN]
    Enter same passphrase again: [HIT RETURN]
    Your identification has been saved in /home/dbc/.ssh/id_rsa.
    Your public key has been saved in /home/dbc/.ssh/id_rsa.pub.
    The key fingerprint is:
    55:62:76:f9:19:07:ad:c7:18:9f:7d:1b:0f:9c:08:b1 dbc@gridfarm004.ucs.indiana.edu
• Do not specify a passphrase here.
Configuring ssh. 2.
• To make your new key an authorised key for log-in, copy the new public key to your authorized_keys file, e.g.:
    $ cd ~/.ssh
    $ cp id_rsa.pub authorized_keys
  (Note that cp replaces any existing authorized_keys file.)
• You should now be able to ssh between nodes of Gridfarm (only!) without giving a password or passphrase.
• Try it.
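Where an authorized_keys file already exists, appending the key is safer than copying over it. The following is a minimal sketch of that alternative, not part of the original notes: authorize_key is a hypothetical helper name, and the 700/600 permissions are an assumption about what the local sshd expects (a common convention).

```shell
# authorize_key DIR: add DIR/id_rsa.pub to DIR/authorized_keys without
# overwriting existing entries. Hypothetical helper, for illustration only.
authorize_key() {
    ssh_dir="$1"
    pub="$ssh_dir/id_rsa.pub"
    auth="$ssh_dir/authorized_keys"

    # Refuse to continue if the key pair has not been generated yet.
    [ -f "$pub" ] || { echo "no public key at $pub" >&2; return 1; }

    touch "$auth"
    # Append only if the exact key line is not already present.
    grep -qxFf "$pub" "$auth" || cat "$pub" >> "$auth"

    # Assumed permissions: private to the owner.
    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}

# Typical call, once ssh-keygen has been run:
#   authorize_key "$HOME/.ssh"
```

Running the helper twice leaves a single copy of the key in the file, so it is safe to repeat.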
Configuring ssh. 3.
• There is one last stage to go through before lamboot will work.
• The first time you ssh to any node, you may get a message saying the host is unrecognized, and you need to answer “yes” in response to the question about whether to continue connecting.
• This interactive dialog will trip up lamboot.
• Before running lamboot for the first time, manually ssh to all nodes gf4, gf5, gf6, gf7 to initialize the list of known hosts, e.g.:
    $ ssh gf4
    The authenticity of host 'gf4 (156.56.104.84)' can't be established.
    RSA key fingerprint is b7:a1:92:a0:4f:d1:e0:10:c5:68:d2:48:86:7c:f7:5b.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'gf4,156.56.104.84' (RSA) to the list of known hosts.
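After the manual log-ins, it can help to confirm that every node really is reachable without any prompt. The sketch below is one way to do that and is not part of the original notes: check_nodes is a hypothetical helper name, and it relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of asking for a password or a host-key confirmation.

```shell
# check_nodes HOST...: report which hosts accept a non-interactive ssh.
# Hypothetical helper, for illustration only.
check_nodes() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes: never prompt; "true" is a no-op remote command.
        if ssh -x -o BatchMode=yes "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED (password prompt or unknown host key?)"
            failed=1
        fi
    done
    # Non-zero exit status if any host failed.
    return "$failed"
}

# Typical call on Gridfarm:
#   check_nodes gf4 gf5 gf6 gf7
```

Any host reported as FAILED still needs a manual ssh (to accept its host key) or a fix to the key set-up before lamboot is attempted.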
Booting the Daemons. 1.
• Prepare a “hosts” file called (e.g.) bhost.def, containing the names of the nodes, on separate lines, like this:
    gf4
    gf5
    gf6
    gf7
• Set the value of the environment variable LAMRSH to "ssh -x". E.g. put the lines:
    LAMRSH="ssh -x"
    export LAMRSH
  in your ~/.bashrc, and “source” it.
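The two preparation steps can also be done directly from the command line. This is a small sketch under the assumption that bhost.def should live in the current working directory; setting LAMRSH this way only affects the current shell, so the ~/.bashrc lines are still needed for later sessions.

```shell
# Write the hosts file, one node name per line.
printf '%s\n' gf4 gf5 gf6 gf7 > bhost.def

# Tell LAM to use ssh (without X forwarding) for remote execution.
LAMRSH="ssh -x"
export LAMRSH

# Show what was written, as a quick check.
cat bhost.def
```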
Booting the Daemons. 2.
• Finally, run the lamboot command, e.g.:
    $ lamboot -v bhost.def
    LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
    Executing hboot on n0 (gf4 - 1 CPU)...
    Executing hboot on n1 (gf5 - 1 CPU)...
    Executing hboot on n2 (gf6 - 1 CPU)...
    Executing hboot on n3 (gf7 - 1 CPU)...
    topology done
• If you did everything right, you should get output similar to this.
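An MPI Test Program

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        char name[MPI_MAX_PROCESSOR_NAME];   /* buffer for the host name */
        int len;

        MPI_Init(&argc, &argv);              /* start up MPI */
        MPI_Get_processor_name(name, &len);  /* host this process runs on */
        printf("Hello from %s\n", name);
        MPI_Finalize();                      /* shut down MPI */
        return 0;
    }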
Compiling an MPI Program
• Copy the contents of the previous slide to a file mpitest.c (for example).
• Compile it, specifying mpitest as the output executable, as follows:
    $ mpicc -o mpitest mpitest.c
• mpicc is a wrapper for the standard cc command that automatically finds the MPI include files and libraries. It supports the same options as cc.
Running an MPI Program
• Run mpitest in 6 processes as follows:
    $ mpirun -np 6 mpitest
    Hello from gridfarm004.ucs.indiana.edu
    Hello from gridfarm004.ucs.indiana.edu
    Hello from gridfarm005.ucs.indiana.edu
    Hello from gridfarm007.ucs.indiana.edu
    Hello from gridfarm006.ucs.indiana.edu
    Hello from gridfarm005.ucs.indiana.edu
• Again, if you did everything right, you should get output similar to this.
• Note there are only 4 physical hosts used in this example, so extra processes “wrap around”, and gf4 and gf5 each hold two processes.
• Exercise: try this with gf2 and gf3 enabled.
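The “wrap around” can be pictured with a little arithmetic. The sketch below assumes that ranks are placed round-robin over the booted nodes; that is consistent with the output above (two processes each on gf4 and gf5) but is an assumption, not something these notes state. place_ranks is a hypothetical helper name, and the script only models the placement; it does not run MPI.

```shell
# place_ranks NP HOST...: print an assumed round-robin rank-to-host mapping.
# Hypothetical helper, for illustration only.
place_ranks() {
    np="$1"; shift
    rank=0
    while [ "$rank" -lt "$np" ]; do
        # Positional parameters are 1-based, hence the +1.
        idx=$(( rank % $# + 1 ))
        eval "host=\${$idx}"
        echo "rank $rank -> $host"
        rank=$(( rank + 1 ))
    done
}

place_ranks 6 gf4 gf5 gf6 gf7
```

With 6 processes and 4 hosts, ranks 4 and 5 land back on gf4 and gf5. The order in which the “Hello” lines actually appear is not the rank order, since each process prints independently.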
More C MPI Examples
• There are about a dozen examples at
    http://www.npac.syr.edu/projects/cpsedu/summer98summary/examples/mpi-c/mpi-c.html
  or
    http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol2/cps615course/mpi-c.html
• You can download the source code and compile and run the examples in a similar way to the simple example just given.
• For example:
    $ mpicc -o monte-carlo_with_mpi monte-carlo_with_mpi.c -lm
    $ mpirun -np 4 monte-carlo_with_mpi
• Note we need to link the math library with -lm in this and most other examples here.
Notes on C Examples
• Most of the C examples need the -lm flag for compilation.
• Most examples will run in any number of processes, but Example-1 needs exactly 4 processes and Example-11 needs exactly 6 processes.
• Example-3 doesn’t overlap sends and recvs correctly and in principle could deadlock (exercise: fix this).
• In Example-8, the calls to the MPE pipe commands (MPE is a set of MPI extensions) that communicate particle information between processors are disabled, so it presumably doesn’t produce the right answers (exercise: LAM has MPE, so enable these calls).
• For Example-9 you will have to define the C pre-processor macro DO_TEST to make an executable. E.g. compile by:
    $ mpicc -o pipe -DDO_TEST pipe.c
mpiJava
• mpiJava is a Java interface to MPI, developed in Indiana and other places.
• The software is available from:
    http://www.hpjava.org/mpiJava.html
• You will find mpiJava is already installed on Gridfarm in the directory /opt/mpiJava/.
• This installation makes use of the native LAM installation described earlier.
• mpiJava can also be configured to use other native MPI packages (e.g. MPICH).
Environment for mpiJava
• Add the mpiJava classes to your class path, and the prunjava script to your path, e.g. add the lines
    CLASSPATH=".:/opt/mpiJava/lib/classes/"
    export CLASSPATH
    PATH=$PATH:/opt/mpiJava/src/scripts/
  to your ~/.bashrc file, and “source” it.
Compiling an mpiJava Example
• As an example we use a parallel Java version of Conway’s Game of Life, included in the mpiJava distribution.
• Copy the source file to your working directory, e.g.:
    $ cp /opt/mpiJava/examples/simple/Life.java .
• Provided the mpiJava classes are on your class path, you can just compile an mpiJava program using javac:
    $ javac Life.java
Running an mpiJava Example
• The easiest way to run an mpiJava example is by using the prunjava script, e.g.:
    $ prunjava 4 Life
  The first argument is the number of processes to run in. The second argument is the name of the class to run.
• If this runs correctly it will print out a series of successive states of the “Life” board.
• Note this program is written to run in exactly 4 processes.