An Introduction • Background • The message-passing model • Parallel computing model • Communication between processes • Sources of further MPI information • Basics of MPI message passing • Fundamental concepts • Simple examples in C • Extended point-to-point operations • Non-blocking communication modes
Companion Material • Online examples available at http://www.mcs.anl.gov/mpi/tutorial • http://www.netlib.org/utk/papers/mpi-book/mpi-book.html contains a relatively complete reference on how to program with MPI, along with examples.
The Message-Passing Model • A process is (traditionally) a program counter and an address space • Processes may have multiple threads (program counters and associated stacks) sharing a single address space. MPI is for communication among processes, which have separate address spaces. • Interprocess communication consists of • Synchronization (synchronous or asynchronous) • Movement of data from one process's address space to another's
Synchronous Communication • A synchronous communication does not complete until the message has been received • A FAX or registered mail
Asynchronous Communication • An asynchronous communication completes as soon as the message is on the way. • A post card or email
What is MPI? • A message-passing library specification • Extended message-passing model • Not a language or compiler specification • Not a specific implementation or product • For parallel computers, clusters, and heterogeneous networks • Full-featured • Designed to permit the development of parallel software libraries • Designed to provide access to advanced parallel hardware for • End users • Library writers • Tool developers
What is message passing? • Data transfer plus synchronization • Requires cooperation of sender and receiver • Cooperation not always apparent in code
MPI Sources • The standard itself: • At http://www.mpi-forum.org • All official MPI releases, in both PostScript and HTML • Books: • Using MPI: Portable Parallel Programming with the Message-Passing Interface, by Gropp, Lusk, and Skjellum, MIT Press, 1994 • MPI: The Complete Reference, by Snir, Otto, Huss-Lederman, Walker, and Dongarra, MIT Press, 1996 • Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995 • Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997 • MPI: The Complete Reference Vol. 1 and 2, MIT Press, 1998 • Other information on the web: • At http://www.mcs.anl.gov/mpi • Pointers to lots of material, including other talks and tutorials, a FAQ, and other MPI pages
Features of MPI • General • Communications combine context and group for message security • Thread safety • Point-to-point communication • Structured buffers and derived datatypes, heterogeneity • Modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered • Collective • Both built-in and user-defined collective operations • Large number of data movement routines • Subgroups defined directly or by topology
Features NOT in MPI • Process Management • Remote memory transfer • Threads • Virtual shared memory
Why use MPI? • MPI provides a powerful, efficient, and portable way to express parallel programs. • MPI was explicitly designed to enable libraries which may eliminate the need for many users to learn (much of) MPI. • Portable • Expressive • Good way to learn about subtle issues in parallel computing
Is MPI large or small? • MPI is large (125 functions) • MPI's extensive functionality requires many functions. • Number of functions not necessarily a measure of complexity. • MPI is small (6 functions) • Many parallel programs can be written with just 6 basic functions. • MPI is just right • One can access flexibility when it is required. • One need not master all parts of MPI to use it. • MPI is whatever size you like
Skeleton MPI Program

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Main part of the program: which MPI calls you use here depends
       on your data partitioning and parallelization architecture */

    MPI_Finalize();
    return 0;
}
Initializing MPI • The first MPI routine called in any MPI program must be the initialization routine MPI_INIT • MPI_INIT is called once by every process, before any other MPI routine: int MPI_Init(int *argc, char ***argv);
A minimal MPI program (C)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    printf("Hello, world!\n");
    MPI_Finalize();
    return 0;
}
Commentary • #include "mpi.h" provides basic MPI definitions and types. • MPI_Init starts MPI • MPI_Finalize exits MPI • Note that all non-MPI routines are local; thus printf runs on each process
Notes on C • In C: • mpi.h must be included with #include "mpi.h" • MPI functions return error codes or MPI_SUCCESS
Error handling • By default, an error causes all processes to abort. • The user can cause routines to return(with an error code) instead. • A user can also write and install custom error handlers. • Libraries might want to handle errors differently from applications.
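To illustrate the second and third bullets, here is a minimal sketch (not part of the original slides) that installs the predefined MPI_ERRORS_RETURN handler on MPI_COMM_WORLD so that a failing call returns an error code the program can inspect; the out-of-range destination rank is there only to provoke an error. The call shown uses the MPI-2 name; MPI-1 spelled it MPI_Errhandler_set.

/* Sketch: make MPI routines return error codes instead of aborting */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, err, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* rank 'size' is out of range, so this send must fail */
    err = MPI_Send(&size, 1, MPI_INT, size, 0, MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        printf("MPI_Send returned an error: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}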
Running MPI Programs • The MPI-1 standard does not specify how to run an MPI program, just as the Fortran standard does not specify how to run a Fortran program. • In general, starting an MPI program is dependent on the implementation of MPI you are using, and might require various scripts, program arguments, and/or environment variables. • Common way: mpirun -np 2 hello
Finding out about the environment • Two important questions that arise early in a parallel program are: -How many processes are participating in this computation? -Which one am I? • MPI provides functions to answer these questions: -MPI_Comm_size reports the number of processes. -MPI_Comm_rank reports the rank, a number between 0 and size-1, identifying the calling process.
Better Hello (C)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Some basic concepts • Processes can be collected into groups. • Each message is sent in a context, and must be received in the same context. • A group and context together form a communicator. • A process is identified by its rank in the group associated with a communicator. • There is a default communicator whose group contains all initial processes, called MPI_COMM_WORLD.
MPI Datatypes • The data in a message to be sent or received is described by a triple (address, count, datatype) • An MPI datatype is recursively defined as: • Predefined, corresponding to a datatype from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION) • A contiguous array of MPI datatypes • A strided block of datatypes • An indexed array of blocks of datatypes • An arbitrary structure of datatypes • There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs, or a row of a matrix stored columnwise (see the sketch below).
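As a concrete illustration of the last bullet, here is a minimal sketch (not part of the original slides; the struct and variable names are illustrative) that builds a derived datatype for an (int, float) pair with MPI_Type_create_struct and commits it so it can be used in sends and receives.

/* Sketch: a derived datatype describing an (int, float) pair */
#include <mpi.h>
#include <stddef.h>

typedef struct { int key; float value; } pair_t;

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int          blocklens[2] = { 1, 1 };
    MPI_Aint     displs[2]    = { offsetof(pair_t, key),
                                  offsetof(pair_t, value) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_FLOAT };
    MPI_Datatype pair_type;

    MPI_Type_create_struct(2, blocklens, displs, types, &pair_type);
    MPI_Type_commit(&pair_type);

    pair_t pairs[4];   /* an array of 4 pairs can now be described as (pairs, 4, pair_type) */
    (void)pairs;       /* e.g. MPI_Send(pairs, 4, pair_type, dest, tag, comm) */

    MPI_Type_free(&pair_type);
    MPI_Finalize();
    return 0;
}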
Why datatypes? • Since all data is labeled by type, an MPI implementation can support communication between processes on machines with very different memory representations and lengths of elementary datatypes (heterogeneous communication). • Specifying application-oriented layout of data in memory • Reduces memory-to-memory copies in the implementation • Allows the use of special hardware (scatter/gather) when available.
MPI basic send/receive • We need to fill in the details of the basic picture: Process 0 calls Send(data) and Process 1 calls Receive(data) • Things that need specifying: • How will "data" be described? • How will processes be identified? • How will the receiver recognize/screen messages? • What will it mean for these operations to complete?
MPI basic (blocking) send • MPI_Send(void *start, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) • The message buffer is described by (start, count, datatype). • The target process is specified by dest, which is the rank of the target process in the communicator specified by comm. • When this function returns, the data has been delivered to the system and the buffer can be reused. The message may not have been received by the target process.
MPI_Send primitive • MPI_Send performs a standard-mode, blocking send. • The send buffer specified by MPI_Send consists of count successive entries of the type indicated by datatype, starting with the entry at address buf. • The count may be zero, in which case the data part of the message is empty.
Basic MPI Datatypes

MPI datatype          C datatype
MPI_CHAR              char
MPI_SIGNED_CHAR       signed char
MPI_UNSIGNED_CHAR     unsigned char
MPI_SHORT             signed short
MPI_UNSIGNED_SHORT    unsigned short
MPI_INT               signed int
MPI_UNSIGNED          unsigned int
MPI_LONG              signed long
MPI_UNSIGNED_LONG     unsigned long
Basic MPI Datatypes (continued)

MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI basic (blocking) receive • MPI_RECV(void *start, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status) • Waits until a matching (on source and tag) message is received from the system, and the buffer can be used. • Source is rank in communicator specified by comm, or MPI_ANY_SOURCE. • Status contains further information • Receiving fewer than count occurrences of datatype is OK, but receiving more is an error.
More comment on send and receive • A receive operation may accept messages from an arbitrary sender, but a send operation must specify a unique receiver. • Source equals destination is allowed, that is, a process can send a message to itself.
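Putting the two calls together, here is a minimal sketch (not part of the original slides) in which rank 0 sends one integer to rank 1 using the blocking MPI_Send / MPI_Recv pair described above; it assumes the program is run with at least two processes.

/* Sketch: blocking point-to-point transfer of one integer */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);          /* dest = 1, tag = 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* source = 0, tag = 0 */
        printf("Rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}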
MPI_ISEND and MPI_IRECV primitives MPI_ISEND(buf, count, datatype, dest, tag, comm, request) MPI_IRECV(buf, count, datatype, source, tag, comm, request) • Nonblocking communication uses request objects to identify communication operations and link the posting operation with the completion operation. • request is a request handle which can be used to query the status of the communication or wait for its completion.
More on nonblocking send and receive • A nonblocking send call indicates that the system may start copying data out of the send buffer. The sender must not access any part of the send buffer after a nonblocking send operation is posted, until the complete-send returns. • A nonblocking receive indicates that the system may start writing data into the receive buffer. The receiver must not access any part of the receive buffer after a nonblocking receive operation is posted, until the complete-receive returns.
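A minimal sketch (not part of the original slides) of this pattern: each side posts its operation, may do unrelated work, and then calls MPI_Wait before touching the buffer again; it assumes at least two processes.

/* Sketch: nonblocking send/receive completed with MPI_Wait */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Request request;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 7;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
        /* ... do work that does not touch 'value' ... */
        MPI_Wait(&request, &status);    /* complete-send */
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
        /* ... do work that does not touch 'value' ... */
        MPI_Wait(&request, &status);    /* complete-receive */
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}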
MPI_RECV primitive • MPI_RECV performs a standard-mode, blocking receive. • The receive buffer consists of storage sufficient to contain count consecutive entries of the type specified by datatype, starting at address buf. • An overflow error occurs if the incoming data does not fit into the receive buffer. • The receiver can specify a wildcard value for source (MPI_ANY_SOURCE) and/or a wildcard value for tag (MPI_ANY_TAG), indicating that any source and/or tag are acceptable.
MPI tags • Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message. • Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive. • Some non-MPI message-passing systems have called tags "message types". MPI calls them tags to avoid confusion with datatypes.
Retrieving further information • Status is a data structure allocated in the user's program. • In C:

int recvd_tag, recvd_from, recvd_count;
MPI_Status status;
MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
recvd_tag  = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count(&status, datatype, &recvd_count);
Tags and contexts • Separation of messages used to be accomplished by use of tags, but • This requires libraries to be aware of tags used by other libraries • This can be defeated by use of "wild card" tags • Contexts are different from tags • No wild cards allowed • Allocated dynamically by the system when a library sets up a communicator for its own use • User-defined tags are still provided in MPI for user convenience in organizing the application • Use MPI_Comm_split to create new communicators (sketched below)
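A minimal sketch (not part of the original slides) of MPI_Comm_split: world ranks are split by parity into two subcommunicators, each with its own context, so messages in one cannot be matched by receives in the other.

/* Sketch: splitting MPI_COMM_WORLD into even and odd subcommunicators */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, sub_rank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int color = world_rank % 2;                                     /* color selects the new group */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);    /* key = world_rank orders ranks */

    MPI_Comm_rank(subcomm, &sub_rank);
    printf("World rank %d has rank %d in subcommunicator %d\n",
           world_rank, sub_rank, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}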
MPI is simple • Many parallel programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV • Point-to-point (send/recv) isn't the only way…
Alternative set of 6 functions for simplified MPI • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_BCAST • MPI_REDUCE
Introduction to collective operations in MPI • Collective operations are called by all processes in a communicator • MPI_Bcast distributes data from one process (the root) to all others in a communicator. Syntax: MPI_Bcast(void *message, int count, MPI_Datatype datatype, int root, MPI_Comm comm) • MPI_Reduce combines data from all processes in the communicator and returns the result to one process. Syntax: MPI_Reduce(void *message, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm) • In many numerical algorithms, send/receive can be replaced by Bcast/Reduce, improving both simplicity and efficiency.
MPI_Datatype values MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_UNSIGNED_CHAR, MPI_UNSIGNED_SHORT, MPI_UNSIGNED, MPI_UNSIGNED_LONG, MPI_BYTE, MPI_PACKED • MPI_OP Values MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC
Example: PI in C - 1

#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i, rc;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x, a;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    while (!done) {
        if (myid == 0) {
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d", &n);
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0) break;
Example: PI in C - 2

        h = 1.0 / (double)n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n",
                   pi, fabs(pi - PI25DT));
    }
    MPI_Finalize();
    return 0;
}
Compile and run the code • mpicc -o pi pi.c • Run the code using: mpirun -np <number of processes> -machinefile XXX pi • -machinefile tells MPI to run the program on the machines listed in XXX.
One more example • Simple summation of numbers using MPI • This program adds numbers stored in the data file "rand_data.txt" in parallel. • This program is taken from the book "Parallel Programming" by Barry Wilkinson and Michael Allen.
Continued...

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 100

int main(int argc, char **argv)
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (0 == myid) {
        /* open input file and initialize data */
        strcpy(fn, getenv("PWD"));
        strcat(fn, "/rand_data.txt");
Continued...

        if (NULL == (fp = fopen(fn, "r"))) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) {
            fscanf(fp, "%d", &data[i]);
        }
        fclose(fp);
    }

    /* broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

    /* add portion of data */
    x = MAXSIZE / numprocs;   /* must be an integer */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++) {
        myresult += data[i];
    }
Continued...

    printf("I got %d from %d\n", myresult, myid);

    /* compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (0 == myid) {
        printf("The sum is %d.\n", result);
    }

    MPI_Finalize();
    return 0;
}
Communication modes MPI provides multiple modes for sending messages: • Synchronous mode (MPI_Ssend): the send does not complete until a matching receive has begun. (Unsafe programs become incorrect and usually deadlock within an MPI_Ssend) • Buffered mode (MPI_Bsend): the user supplies the buffer to the system for its use. (User supplies enough memory to make an unsafe program safe) • Ready mode (MPI_Rsend): the user guarantees that a matching receive has already been posted (allowing access to a faster protocol)
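A minimal sketch (not part of the original slides) of buffered mode: the sender attaches a buffer with MPI_Buffer_attach so that MPI_Bsend can complete locally using that space; it assumes at least two processes.

/* Sketch: buffered-mode send with a user-supplied buffer */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int   bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
        char *buffer  = malloc(bufsize);
        MPI_Buffer_attach(buffer, bufsize);

        value = 99;
        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        MPI_Buffer_detach(&buffer, &bufsize);   /* blocks until the buffered message has been handed to the system */
        free(buffer);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d via a buffered send\n", value);
    }

    MPI_Finalize();
    return 0;
}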
Deadlocks in blocking operations • Send a large message from process 0 to process 1 • If there is insufficient storage at the destination, the send must wait for the user to provide the memory space (through a receive) • What happens with:
Process 0: Send(1); Recv(1)
Process 1: Send(0); Recv(0)
• This is called "unsafe" because it depends on the availability of system buffers.
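One way to make the exchange above safe is to let MPI pair the operations itself; this minimal sketch (not part of the original slides, and assuming exactly two processes) uses MPI_Sendrecv so that neither process blocks waiting for the other. Alternatively, one process could call Recv before Send while its partner does the reverse.

/* Sketch: a deadlock-free exchange between two processes using MPI_Sendrecv */
#include <mpi.h>
#include <stdio.h>

#define N 100000

int main(int argc, char *argv[])
{
    int rank, other, i;
    static double sendbuf[N], recvbuf[N];   /* static to avoid a large stack frame */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                        /* partner rank: 0 <-> 1 */

    for (i = 0; i < N; i++) sendbuf[i] = rank;

    /* Send to and receive from the partner in one call; MPI schedules
       the transfers so the exchange cannot deadlock */
    MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, other, 0,
                 recvbuf, N, MPI_DOUBLE, other, 0,
                 MPI_COMM_WORLD, &status);

    printf("Rank %d exchanged %d doubles with rank %d\n", rank, N, other);

    MPI_Finalize();
    return 0;
}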