More on MPI • Nonblocking point-to-point routines • Deadlock • Collective communication
Non-blocking send/recv • Most hardware has a communication co-processor: communication can happen at the same time as computation.

[Slide diagram: timelines for Proc 0 and Proc 1 are compared. With MPI_Send_start / MPI_Recv_start, then computation, then MPI_Send_wait / MPI_Recv_wait, communication and computation can overlap. With blocking MPI_Send / MPI_Recv followed by computation, there is no comm/comp overlap.]
Non-blocking send/recv routines • Non-blocking primitives provide the basic mechanisms for overlapping communication with computation. • Non-blocking operations return (immediately) “request handles” that can be tested and waited on. MPI_Isend(start, count, datatype, dest, tag, comm, request) MPI_Irecv(start, count, datatype, dest, tag, comm, request) MPI_Wait(&request, &status)
One can also test without waiting: MPI_Test(&request, &flag, &status) • MPI allows multiple outstanding non-blocking operations. MPI_Waitall(count, array_of_requests, array_of_statuses) MPI_Waitany(count, array_of_requests, &index, &status)
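A minimal sketch in C of how these calls fit together; the buffer size, tag, and compute() placeholder are illustrative assumptions, not part of the slides:

    #include <mpi.h>

    static void compute(void) { /* ... application work done while the message is in flight ... */ }

    void exchange(int rank)
    {
        double sendbuf[1024], recvbuf[1024];
        MPI_Request reqs[2];
        MPI_Status  stats[2];
        int peer = 1 - rank;              /* assumes exactly two ranks */

        /* Both calls return immediately with request handles. */
        MPI_Isend(sendbuf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(recvbuf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        compute();                        /* overlap: communication proceeds in the background */

        MPI_Waitall(2, reqs, stats);      /* block until both outstanding operations complete */
    }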
Deadlocks • Send a large message from process 0 to process 1 • If there is insufficient storage at the destination, the send must wait for memory space • What happens with this code?

Process 0: Send(1); Recv(1)        Process 1: Send(0); Recv(0)

• This is called “unsafe” because it depends on the availability of system buffers
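The same pattern as a C fragment (the message size is an illustrative assumption; whether it actually deadlocks depends on the implementation's internal buffering):

    /* Both ranks send first, then receive. If the messages exceed the MPI
     * implementation's internal buffer space, both MPI_Send calls block and
     * neither process ever reaches MPI_Recv: deadlock. Assumes exactly two ranks. */
    enum { N = 1 << 20 };
    static double sendbuf[N], recvbuf[N];
    int rank, peer;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;

    MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);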
Some Solutions to the “unsafe” Problem • Order the operations more carefully:

Process 0: Send(1); Recv(1)        Process 1: Recv(0); Send(0)

• Supply the receive buffer at the same time as the send:

Process 0: Sendrecv(1)        Process 1: Sendrecv(0)
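A sketch of the combined-call solution; the buffer names and counts are carried over from the fragment above as assumptions:

    /* MPI_Sendrecv issues the send and the matching receive as one operation,
     * so the library can schedule them without deadlock regardless of buffering. */
    MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                 recvbuf, N, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);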
More Solutions to the “unsafe” Problem • Supply own space as buffer for the send (buffered-mode send):

Process 0: Bsend(1); Recv(1)        Process 1: Bsend(0); Recv(0)

• Use non-blocking operations:

Process 0: Isend(1); Irecv(1); Waitall        Process 1: Isend(0); Irecv(0); Waitall
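A sketch of the buffered-mode variant (buffer sizing and variable names are assumptions, and <stdlib.h> is needed for malloc; the non-blocking variant looks like the Isend/Irecv/Waitall example shown earlier):

    /* The application attaches its own buffer, so MPI_Bsend can copy the
     * message out and return even if the receiver is not ready yet. */
    int bufsize = N * sizeof(double) + MPI_BSEND_OVERHEAD;
    char *attached = malloc(bufsize);
    MPI_Buffer_attach(attached, bufsize);

    MPI_Bsend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Buffer_detach(&attached, &bufsize);   /* waits until buffered messages are delivered */
    free(attached);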
MPI Collective Communication • Send/recv routines are also called point-to-point routines (two parties). Some operations require more than two parties, e.g. broadcast and reduce. Such operations are called collective operations, or collective communication operations. • Non-blocking collective operations exist only since MPI-3. • Three classes of collective operations: • Synchronization • Data movement • Collective computation
Synchronization • MPI_Barrier( comm ) • Blocks until all processes in the group of the communicator comm call it.
Collective Data Movement

[Slide diagram: Broadcast copies A from P0 to all of P0–P3; Scatter distributes A, B, C, D from P0 so that each of P0–P3 receives one piece; Gather is the inverse, collecting one piece from each of P0–P3 back onto P0.]
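A minimal C sketch of these three data-movement calls; the element counts, the use of rank 0 as root, and the variable names are illustrative assumptions (the fragment also assumes MPI_Init has been called and <stdlib.h> for malloc):

    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double a = 0.0;                        /* Broadcast: root's value of A reaches every rank */
    MPI_Bcast(&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double *full = NULL, piece;            /* Scatter: root holds A B C D..., each rank gets one piece */
    if (rank == 0) full = malloc(nprocs * sizeof(double));
    MPI_Scatter(full, 1, MPI_DOUBLE, &piece, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Gather is the inverse: one piece from every rank back onto the root. */
    MPI_Gather(&piece, 1, MPI_DOUBLE, full, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);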
Collective Computation

[Slide diagram: Reduce combines A, B, C, D from P0–P3 into a single result ABCD on P0; Scan (prefix reduction) leaves A on P0, AB on P1, ABC on P2, and ABCD on P3.]
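A short C sketch of both operations with MPI_SUM as the combiner (the contributed values are arbitrary placeholders):

    /* Each rank contributes one value. Reduce combines all of them onto the
     * root; Scan gives rank i the combination of the values from ranks 0..i. */
    double mine = rank + 1.0, total, prefix;

    MPI_Reduce(&mine, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Scan(&mine, &prefix, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);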
MPI Collective Routines • Many routines: Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, Reduce_scatter, Scan, Scatter, Scatterv • “All” versions deliver results to all participating processes. • “V” versions allow the chunks to have different sizes. • Allreduce, Reduce, Reduce_scatter, and Scan take both built-in and user-defined combiner functions.
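As an illustration of a user-defined combiner, a hedged C sketch of creating an MPI_Op and using it with Allreduce (the absmax operation and the local/global/n variables are assumptions, not from the slides):

    #include <math.h>

    /* Element-wise maximum-of-absolute-value; the argument list is dictated
     * by MPI's MPI_User_function type. */
    void absmax(void *in, void *inout, int *len, MPI_Datatype *dtype)
    {
        double *a = (double *)in, *b = (double *)inout;
        for (int i = 0; i < *len; i++)
            if (fabs(a[i]) > fabs(b[i])) b[i] = a[i];
    }

    /* Register the combiner and use it; with Allreduce every rank gets the result. */
    MPI_Op op;
    MPI_Op_create(absmax, 1 /* commutative */, &op);
    MPI_Allreduce(local, global, n, MPI_DOUBLE, op, MPI_COMM_WORLD);
    MPI_Op_free(&op);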
MPI discussion • Ease of use • The programmer takes care of the ‘logical’ distribution of the global data structure • The programmer takes care of synchronizations and explicit communications • None of these is easy. • MPI is hard to use!
MPI discussion • Expressiveness • Data parallelism • Task parallelism • There is always a way to express a computation in MPI, if one does not mind how hard the program is to write.
MPI discussion • Exposing architecture features • MPI forces one to consider locality, which often leads to more efficient programs. • The MPI standard does have some features that expose the architecture (e.g. process topologies). • Performance is a strength of MPI programming. • It would be nice to have the best of both worlds: OpenMP and MPI.