An introduction to message passing on scalable distributed-memory machines: the characteristics of such machines, the MPI API, MPI_Send and MPI_Recv semantics, communicators, and collective operations.
Distributed-Memory (Message-Passing) Paradigm
FDI 2007, Track Q, Day 1 – Afternoon Session
Characteristics of Distributed-Memory Machines
• Scalable interconnection network.
• Some memory physically local, some remote.
• Message-passing programming model.
• Major issues: network latency and bandwidth, message-passing overhead, data distribution.
• Incremental parallelization and debugging can be difficult.
What is MPI?
• The de facto standard API for explicit message-passing MIMD/SPMD programming.
• Many implementations over many networks.
• Developed in the mid-1990s by a consortium, reflecting lessons learned from machine-specific libraries and PVM.
• Focused on: homogeneous MPPs, high performance, library writing, portability.
• For more information: people.cs.vt.edu/~ribbens/fdi
Six-Function MPI
• MPI_Init: initialize MPI
• MPI_Finalize: close it down
• MPI_Comm_rank: get my process number (rank)
• MPI_Comm_size: how many processes in total?
• MPI_Send: send a message
• MPI_Recv: receive a message
MPI “Hello World” in Fortran77

      implicit none
      include 'mpif.h'
      integer myid, numprocs, ierr

      call mpi_init( ierr )
      call mpi_comm_rank( mpi_comm_world, myid, ierr )
      call mpi_comm_size( mpi_comm_world, numprocs, ierr )
      print *, "hello from ", myid, " of ", numprocs
      call mpi_finalize( ierr )
      stop
      end
MPI “Hello World” in C

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int myid, numprocs, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("hello from %s: process %d of %d\n",
           processor_name, myid, numprocs);
    MPI_Finalize();
    return 0;
}
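With a typical MPI implementation (for example MPICH or Open MPI), a program like this is usually built with the compiler wrapper and launched with mpirun or mpiexec; the file name hello.c below is just an example, and exact commands vary by installation:

mpicc hello.c -o hello
mpirun -np 4 ./hello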
Function MPI_Send

MPI_SEND(buf, count, datatype, dest, tag, comm)
  IN  buf       initial address of send buffer (choice)
  IN  count     number of entries to send (integer)
  IN  datatype  datatype of each entry (handle)
  IN  dest      rank of destination (integer)
  IN  tag       message tag (integer)
  IN  comm      communicator (handle)

C binding:
  int MPI_Send(void *buf, int count, MPI_Datatype datatype,
               int dest, int tag, MPI_Comm comm)

Fortran binding:
  MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERR)
    <type> BUF(*)
    INTEGER COUNT, DATATYPE, DEST, TAG, COMM, IERR
Function MPI_Recv

MPI_RECV(buf, count, datatype, source, tag, comm, status)
  OUT buf       initial address of receive buffer (choice)
  IN  count     max number of entries to receive (integer)
  IN  datatype  datatype of each entry (handle)
  IN  source    rank of source (integer)
  IN  tag       message tag (integer)
  IN  comm      communicator (handle)
  OUT status    return status (Status)

C binding:
  int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
               int source, int tag, MPI_Comm comm, MPI_Status *status)

Fortran binding:
  MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERR)
    <type> BUF(*)
    INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
    INTEGER STATUS(MPI_STATUS_SIZE), IERR
MPI Send and Recv Semantics
• These are “standard mode”, blocking calls.
• Send returns when buf may be re-used; Recv returns when the data in buf is available.
• count consecutive items of type datatype, beginning at buf, are sent to the process with rank dest.
• tag can be used to distinguish among messages from the same source.
• Messages are non-overtaking.
• Buffering policies depend on the implementation.
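A minimal sketch of these semantics (added here for illustration, not part of the original slides): rank 0 sends a small array of doubles with a standard-mode blocking MPI_Send, and rank 1 posts a matching blocking MPI_Recv. The tag value 99 and the buffer size are arbitrary choices.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myid, numprocs;
    double buf[4] = {0.0, 0.0, 0.0, 0.0};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    if (myid == 0 && numprocs > 1) {
        buf[0] = 3.14;   /* fill the send buffer */
        /* standard-mode blocking send: returns when buf may be re-used */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (myid == 1) {
        /* blocking receive: returns when the data is available in buf */
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &status);
        printf("rank 1 received %f\n", buf[0]);
    }

    MPI_Finalize();
    return 0;
}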
MPI Communicators
• A communicator can be thought of as a set of processes; every communication event takes place within a particular communicator.
• MPI_COMM_WORLD is the initial set of processes (note the static process model).
• Why do communicators exist?
  • Collective operations over subsets of processes.
  • Can define special topologies for sets of processes.
  • Separate communication contexts for libraries.
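As one illustration of building a communicator over a subset of processes (a sketch added here, not from the slides), MPI_Comm_split partitions MPI_COMM_WORLD into new communicators; here processes are grouped by whether their world rank is even or odd, and each group gets its own ranks and communication context. The color/key choices are arbitrary.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, sub_rank, color;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* processes with the same color end up in the same new communicator */
    color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    MPI_Comm_rank(subcomm, &sub_rank);
    printf("world rank %d has rank %d in group %d\n",
           world_rank, sub_rank, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}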
MPI Collective Operations
• An operation over an entire communicator.
• Must be called by every member of the communicator.
• Three classes of collective operations:
  • Synchronization (MPI_Barrier)
  • Data movement
  • Collective computation
Collective Routines (Gropp)
• Many routines: Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, ReduceScatter, Scan, Scatter, Scatterv.
• The “All” versions deliver results to all participating processes.
• The “v” versions allow the chunks contributed by (or delivered to) each process to have different sizes.
• Allreduce, Reduce, ReduceScatter, and Scan take both built-in and user-defined combination functions.
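A brief sketch touching each class of collective (added for illustration, not from the slides): a barrier for synchronization, a broadcast for data movement, and an all-reduce for collective computation. Every process in MPI_COMM_WORLD must make each of these calls; the value 100 broadcast by the root is arbitrary.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myid, n = 0, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) n = 100;          /* only the root owns the value initially */

    MPI_Barrier(MPI_COMM_WORLD);                      /* synchronization */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);     /* data movement: root to all */
    MPI_Allreduce(&myid, &sum, 1, MPI_INT, MPI_SUM,
                  MPI_COMM_WORLD);                    /* collective computation: sum of ranks */

    printf("rank %d: n = %d, sum of ranks = %d\n", myid, n, sum);

    MPI_Finalize();
    return 0;
}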
MPI topics not covered …
• Topologies: Cartesian and graph
• User-defined types
• Message-passing modes
• Intercommunicators
• MPI-2 topics: MPI I/O, remote memory access, dynamic process management