MPI Program Structure Self Test with solution
Self Test • How would you modify "Hello World" so that only even-numbered processors print the greeting message?
Answer

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int myrank, size;
    MPI_Init(&argc, &argv);                    /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);    /* Get my rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* Get the total number of processors */
    if ((myrank % 2) == 0)
        printf("Processor %d of %d: Hello World!\n", myrank, size);
    MPI_Finalize();                            /* Terminate MPI */
    return 0;
}
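To check the behavior, the program can be compiled and launched in the usual way; the file name hello_even.c is arbitrary, and the wrapper and launcher names (mpicc, mpirun) and their options vary with the MPI installation:

mpicc hello_even.c -o hello_even
mpirun -np 4 ./hello_even      # only ranks 0 and 2 should print the greeting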
Self Test • Consider the following MPI pseudo-code, which sends a piece of data from processor 1 to processor 2:

MPI_INIT()
MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
if (myrank == 1)
    MPI_SEND (some data to processor 2 in MPI_COMM_WORLD)
else {
    MPI_RECV (data from processor 1 in MPI_COMM_WORLD)
    print "Message received!"
}
MPI_FINALIZE()

• Here MPI_SEND and MPI_RECV are blocking send and receive routines. Thus, for example, a process encountering the MPI_RECV statement will block while waiting for a message from processor 1.
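For reference, a literal C rendering of this pseudo-code might look like the sketch below. The payload value, the message tag of 0, and the MPI_INT datatype are arbitrary choices made for illustration; they are not part of the original pseudo-code.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int myrank, data = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 1) {
        /* Processor 1 sends one integer to processor 2 */
        MPI_Send(&data, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);
    } else {
        /* Every other processor posts a blocking receive for a message from processor 1 */
        MPI_Recv(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Message received!\n");
    }
    MPI_Finalize();
    return 0;
}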
Self Test • If this code is run on a single processor, what do you expect to happen? • The code will print "Message received!" and then terminate. • The code will terminate normally with no output. • The code will hang with no output. • An error condition will result.
Answer • Incorrect. This cannot happen; there is only one processor! • Incorrect. Not quite. Remember that ranks are indexed starting with zero. • Incorrect. Good try! The receive on processor 0 is blocking, and so you might expect it to simply wait forever for an incoming message. However, remember that there is no processor 1. • Correct! Both 1 and 2 are invalid ranks in this case; thus the code prints an error message and exits. (Note, however, that checking for invalid arguments may be disabled by default on some machines. In such cases the code will appear to hang with no output, since the blocking receive on processor 0 is never satisfied.)
Self Test • If the code is run on three processors, what do you expect? • The code will terminate after printing "Message received!". • The code will hang with no output. • The code will hang after printing "Message received!". • The code will give an error message and exit (possibly leaving a core file).
Answer • Incorrect. Not quite. Remember that processor 0 also posts a (blocking) receive. • Incorrect. Close. It is true that the blocking receive on processor 0 is never satisfied, but the communication between processor 1 and processor 2 occurs independently of this. • Correct. Yes! The receive posted by processor 0 is blocking, so the code hangs while waiting for a message to arrive. • Incorrect! No, in this case all ranks referred to in the code are valid, so there is no error.
Self Test • Consider an MPI code running on four processors, denoted A, B, C, and D. In the default communicator MPI_COMM_WORLD their ranks are 0-3, respectively. Assume that we have defined another communicator, called USER_COMM, consisting of processors B and D. Which one of the following statements about USER_COMM is always true? • Processors B and D have ranks 1 and 3, respectively. • Processors B and D have ranks 0 and 1, respectively. • Processors B and D have ranks 1 and 3, but which has which is in general undefined. • Processors B and D have ranks 0 and 1, but which has which is in general undefined.
Answer • Incorrect. Remember that ranks in a given communicator always start from zero. Try again. • Incorrect. Close. Ranks are assigned starting with zero, so this is sometimes true. Remember, however, that there is no connection between ranks in different communicators. • Incorrect. Remember that ranks in a given communicator always start from zero. Try again. • Correct! Yes! This is the only statement which is always true. Ranks are assigned starting from zero, but which processor gets which rank is in general undefined, and ranks in different communicators are unrelated.
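As an illustration, one possible way to build a communicator like USER_COMM is to extract a group from MPI_COMM_WORLD and create a new communicator from it. The sketch below assumes the scenario above (world ranks 1 and 3 as the members); everything else is illustrative rather than the course's own code, and it must be run with at least 4 processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int world_rank, user_rank;
    int members[2] = {1, 3};                 /* world ranks of processors B and D */
    MPI_Group world_group, user_group;
    MPI_Comm USER_COMM;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Build a group containing only world ranks 1 and 3, then a communicator from it */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, 2, members, &user_group);
    MPI_Comm_create(MPI_COMM_WORLD, user_group, &USER_COMM);

    if (USER_COMM != MPI_COMM_NULL) {
        /* Ranks in USER_COMM are numbered from zero, independently of MPI_COMM_WORLD */
        MPI_Comm_rank(USER_COMM, &user_rank);
        printf("World rank %d has rank %d in USER_COMM\n", world_rank, user_rank);
        MPI_Comm_free(&USER_COMM);
    }
    MPI_Group_free(&user_group);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}

Note that this particular construction assigns ranks in the order listed in the members array, so here B would get rank 0 and D rank 1. Other ways of building the communicator need not preserve that order, which is why, in general, the assignment is undefined.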
Course Problem • Description • The initial problem implements a parallel search of an extremely large (several thousand elements) integer array. The program finds all occurrences of a certain integer, called the target, and writes all the array indices where the target was found to an output file. In addition, the program reads both the target value and all the array elements from an input file. • Exercise • You now have enough knowledge to write pseudo-code for the parallel search algorithm introduced in Chapter 1. In the pseudo-code, you should correctly initialize MPI, have each processor determine and use its rank, and terminate MPI. By tradition, the Master processor has rank 0. Assume in your pseudo-code that the real code will be run on 4 processors.
Solution

#include <stdio.h>
#include <mpi.h>

int rank, error;
error = MPI_Init(&argc, &argv);
error = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) then
    read in target value from input data file
    send target value to processor 1
    send target value to processor 2
    send target value to processor 3
    read in integer array b from input file
    send first third of array to processor 1
    send second third of array to processor 2
    send last third of array to processor 3
    while (not done)
        receive target indices from any of the slaves
        write target indices to the output file
    end while
else
    receive the target from processor 0
    receive my sub_array from processor 0
    for each element in my subarray
        if (element value == target) then
            convert local index into global index    // SEE COMMENT #1
            send global index to processor 0
        end if
    end loop
    send message to processor 0 indicating my search is done    // SEE COMMENT #2
end if

error = MPI_Finalize();
Solution • Comment #1 • For example, say that the array b is 300 elements long. • Each of the three slaves would then work with its own local array of 100 elements. • Say processor 3 found the target at index 5 of its local array. Five is then the local index, but in the global array b the index of that particular target location is 200 + 5 = 205. It is this global index that should be sent back to the master. • Thus, in the real program (which you will write after the next chapter), you will have to write code that converts local indices to global indices; one possible fragment is sketched below. • Comment #2 • There are several ways each slave could tell the master that it is done with its part of the total search. • One way would be to have the message contain a special integer value that could not possibly be an index of the array b. • An alternative would be to send a message with a special tag, different from the tag used by the "real" messages containing target indices. NOTE: Message tags will be discussed in the next chapter.
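To make Comments #1 and #2 concrete, a slave-side fragment along the lines of the sketch below could perform the index conversion and then send an end-of-search message using a sentinel value. The chunk size of 100, the sentinel -1, and the function and variable names are assumptions made for this sketch; the tag-based alternative is deferred to the next chapter.

#include <mpi.h>

#define CHUNK 100      /* elements per slave when array b has 300 elements (an assumption) */
#define DONE  -1       /* sentinel value that cannot be a valid index of b */

/* Hypothetical helper: search this slave's piece, report the global index of each
   match to the master (rank 0), then send the DONE sentinel. */
void search_and_report(const int *sub_array, int target, int rank)
{
    int local_i, global_i;
    for (local_i = 0; local_i < CHUNK; local_i++) {
        if (sub_array[local_i] == target) {
            /* The slave with rank r holds global indices (r-1)*CHUNK .. r*CHUNK - 1,
               so processor 3's local index 5 becomes global index 205 */
            global_i = (rank - 1) * CHUNK + local_i;
            MPI_Send(&global_i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }
    /* Tell the master this slave has finished searching its piece */
    global_i = DONE;
    MPI_Send(&global_i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}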