Lecture 3. MPI Computing
张少强 sqzhang@163.com http://bioinfo.uncc.edu/szhang
Message-Passing Computing
The previous lecture covered point-to-point message passing. This lecture covers more MPI routines: collective routines, synchronous routines, and non-blocking routines.
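Collective message-passing routines
Routines that send a message to a group of processes or receive messages from a group of processes. They are usually more efficient than the equivalent sequence of point-to-point calls.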
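Collective Communication
Involves a group of processes within one communicator. No message tags are used.
• MPI_Bcast() - broadcasts from the root process to all other processes
• MPI_Gather() - one process gathers a value from each process in the group
• MPI_Scatter() - scatters the root process's data across the processes
• MPI_Alltoall() - sends data from all processes to all processes
• MPI_Reduce() - combines the values from all processes into a single value
• MPI_Reduce_scatter() - combines values, then scatters the result
• MPI_Scan() - computes a prefix reduction of the data across the processes
• MPI_Barrier() - a way to synchronize processes: each process is blocked until every process in the communicator has called the barrier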
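Of the routines listed, the prefix reduction performed by MPI_Scan() is the least obvious. The following is a minimal single-machine sketch of the values an inclusive scan with MPI_SUM would deliver to each rank. It does not call MPI; the function name simulate_scan_sum and its parameters are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates the result of an inclusive prefix sum across processes:
 * result[r] is the value rank r would end up with, i.e. the sum of
 * the contributions of ranks 0..r. This is an illustration only and
 * does not use MPI. */
void simulate_scan_sum(const int *contrib, int *result, size_t nprocs)
{
    assert(contrib != NULL && result != NULL);

    int running = 0;
    for (size_t rank = 0; rank < nprocs; rank++) {
        running += contrib[rank];   /* include this rank's own value */
        result[rank] = running;
    }
}
```

With contributions 3, 1, 4, 1 from ranks 0 to 3, the ranks would receive 3, 4, 8 and 9 respectively.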
MPI broadcast operation
Broadcast: sending the same message to all processes in a communicator.
Multicast: sending the same message to a defined group of processes.
Basic MPI scatter operation
Sending each element of an array in the root process to a separate process. The contents of the ith location of the array are sent to the ith process.
MPI_Scatter()
• The simplest scatter sends each element of an array to a different process.
• The MPI_Scatter() routine can also send a fixed number of contiguous elements to each process.
Scattering contiguous groups of elements to each process
Example
In the following code, the size of the send buffer is 100 * <number of processes>, and 100 contiguous elements are sent to each process. The send buffer is significant only at the root (process 0).

    #include <stdlib.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int size, *sendbuf, recvbuf[100];               /* for each process */

        MPI_Init(&argc, &argv);                         /* initialize MPI */
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        sendbuf = (int *)malloc(size * 100 * sizeof(int));   /* allocate memory */
        .
        MPI_Scatter(sendbuf, 100, MPI_INT, recvbuf, 100, MPI_INT, 0, MPI_COMM_WORLD);
        .
        free(sendbuf);
        MPI_Finalize();                                 /* terminate MPI */
        return 0;
    }
Gather
Having one process collect individual values from a set of processes.
Gather Example
To gather items from a group of processes into process 0, using dynamically allocated memory in the root process:

    int data[10];          /* data to be gathered from each process */
    int *buf = NULL;       /* receive buffer, significant only at the root */
    int myrank, grp_size;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);             /* find rank */
    if (myrank == 0) {
        MPI_Comm_size(MPI_COMM_WORLD, &grp_size);       /* find group size */
        buf = (int *)malloc(grp_size * 10 * sizeof(int));   /* allocate memory */
    }
    /* The receive count is the number of items received from EACH process (10),
       not the total number of items received. */
    MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);
    …

MPI_Gather() gathers from all processes, including the root.
Reduce
A gather operation combined with a specified arithmetic or logical operation. Example: values from all processes could be gathered and then added together by the root.
Reduce - operations
MPI_Reduce(sendbuf, recvbuf, count, datatype, op, root, comm)
Parameters:
    sendbuf    send buffer address
    recvbuf    receive buffer address
    count      number of send buffer elements
    datatype   data type of send elements
    op         reduce operation. Several operations are available, including:
                   MPI_MAX    maximum
                   MPI_MIN    minimum
                   MPI_SUM    sum
                   MPI_PROD   product
    root       root process rank for result
    comm       communicator
A collective communication example (a parallel program that computes a sum)

    #include <stdio.h>
    #include "mpi.h"

    #define NUM 40

    int main(int argc, char *argv[])
    {
        int myid, i, numprocs, x, low, high, myresult, result;
        int data[NUM] = {10, 13, 34, 45, 56, 76, 83, 54, 55, 60,
                         33, 54, 62, 10, 45, 32, 40, 54, 55, 60,
                         32, 43, 62, 44, 12, 31, 56, 91, 22, 90,
                         32, 44, 67, 76, 89, 99, 18, 77, 40, 21};

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        /* The root sends the whole array to every process. */
        MPI_Bcast(data, NUM, MPI_INT, 0, MPI_COMM_WORLD);

        /* Each process sums its own block of x elements. This assumes NUM is
           divisible by numprocs; otherwise the last NUM % numprocs elements
           are not summed. */
        x = NUM / numprocs;
        myresult = 0;
        low = myid * x;
        high = low + x;
        for (i = low; i < high; i++)
            myresult += data[i];
        printf("I got %d from process %d\n", myresult, myid);

        /* Add the partial sums together at the root. */
        MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("The sum is %d \n", result);

        MPI_Finalize();
        return 0;
    }
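Running the example above
Save the file as bcast_reduce.c. First compile it with mpicc (for example, mpicc bcast_reduce.c -o bcast_reduce), then run it on 4 processes with mpiexec (for example, mpiexec -n 4 ./bcast_reduce).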
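The sum example computes each process's block as NUM / numprocs elements, which covers the whole array only when the division is exact (as it is for 40 elements on 4 processes). One possible way to handle other process counts is to give the first NUM % numprocs ranks one extra element each. The helper below is a sketch of that idea; the name block_range and its parameters are hypothetical and are not part of MPI or of the original program.

```c
#include <assert.h>

/* Computes the half-open index range [*low, *high) that process `rank`
 * should handle when n items are split as evenly as possible among
 * nprocs processes. The first (n % nprocs) ranks get one extra item. */
void block_range(int n, int nprocs, int rank, int *low, int *high)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    assert(low != 0 && high != 0);

    int base  = n / nprocs;   /* items every rank receives */
    int extra = n % nprocs;   /* leftover items to hand out */

    *low  = rank * base + (rank < extra ? rank : extra);
    *high = *low + base + (rank < extra ? 1 : 0);
}
```

In the sum program, a call such as block_range(NUM, numprocs, myid, &low, &high) could replace the three lines that compute x, low and high. With 40 elements on 3 processes the ranges would be [0, 14), [14, 27) and [27, 40).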
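Main features of collective routines
• Operate on a group of processes within one communicator
• Can replace a set of point-to-point communications
• Communication is locally blocking
• Synchronization is not guaranteed (implementation dependent)
• May require a root process to be specified
• The amount of data sent must exactly match the amount the receiver expects
• No message tags are needed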
Barrier
Blocks a process until all processes in the communicator have called it. A synchronous operation.
MPI_Barrier(comm)
    comm    communicator
Synchronous Message Passing
Routines that return when the message transfer has been completed.
Synchronous send routine
• Waits until the complete message can be accepted by the receiving process before sending the message. In MPI, the MPI_Ssend() routine.
Synchronous receive routine
• Waits until the message it is expecting arrives. In MPI, this is simply the regular MPI_Recv() routine.
Synchronous Message Passing
Synchronous message-passing routines intrinsically perform two actions:
• They transfer data, and
• They synchronize processes.
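Synchronous Ssend() and recv() using a 3-way protocol
[Figure: time lines for Process 1 and Process 2, exchanging a request to send, an acknowledgment, and the message.
(a) When Ssend() occurs before recv(): Process 1 issues a request to send and is suspended; when Process 2 reaches recv() it returns an acknowledgment, the message is transferred, and both processes continue.
(b) When recv() occurs before Ssend(): Process 2 is suspended at recv(); when Process 1 reaches Ssend() it issues a request to send, the message is transferred and acknowledged, and both processes continue.]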
Parameters of synchronous send (same as blocking send)
MPI_Ssend(buf, count, datatype, dest, tag, comm)
    buf        address of send buffer
    count      number of items to send
    datatype   datatype of each item
    dest       rank of destination process
    tag        message tag
    comm       communicator
An example

    //ssend_rcev.c
    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, size, i;
        int buffer[10];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (size < 2) {
            printf("Please run with two processes.\n");
            fflush(stdout);   /* flush the stdout buffer */
            MPI_Finalize();
            return 0;
        }

        if (rank == 0) {
            for (i = 0; i < 10; i++)
                buffer[i] = i;
            printf("Process: %d sending ...\n", rank);
            MPI_Ssend(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
        }

        if (rank == 1) {
            for (i = 0; i < 10; i++)
                buffer[i] = -1;
            printf("Process: %d receiving ...\n", rank);
            MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
            for (i = 0; i < 10; i++) {
                if (buffer[i] != i)
                    printf("Error: buffer[%d]=%d but is expected to be %d\n", i, buffer[i], i);
                else
                    printf("Process %d: buffer[%d]=%d as expected\n", rank, i, i);
            }
            fflush(stdout);
        }

        MPI_Finalize();
        return 0;
    }
Asynchronous Message Passing
• Routines that do not wait for actions to complete before returning. They usually require local storage for messages.
• There is more than one version, depending on the actual semantics for returning.
• In general, they do not synchronize processes but allow processes to move forward sooner.
• Must be used with care.
MPI blocking and non-blocking
• Blocking - routines return after their local actions complete, though the message transfer may not have been completed. Sometimes called locally blocking. The local completion condition for a blocking send is that the location holding the message can be reused by other statements or routines, or have its contents changed, without affecting the message transfer.
• Non-blocking - routines return immediately (asynchronous). Non-blocking assumes that the data storage used for the transfer is not modified by subsequent statements before it has been used for the transfer, and it is left to the programmer to ensure this.
The terms blocking and non-blocking may have different interpretations in other systems.
MPI Nonblocking Routines
• Non-blocking send - MPI_Isend() - will return "immediately", even before the source location is safe to be altered.
• Non-blocking receive - MPI_Irecv() - will return even if there is no message to accept.
Nonblocking Routine Formats
MPI_Isend(buf, count, datatype, dest, tag, comm, request)
MPI_Irecv(buf, count, datatype, source, tag, comm, request)
Completion is detected by MPI_Wait() and MPI_Test().
MPI_Wait() waits until the operation has completed and then returns.
MPI_Test() returns immediately with a flag set to indicate whether the operation had completed at that time.
To find out whether a particular operation has completed, pass the request parameter returned by that operation to MPI_Wait() or MPI_Test().
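Example
To send an integer x from process 0 to process 1 and allow process 0 to continue (msgtag and compute() are assumed to be defined elsewhere):

    int myrank;
    MPI_Request req1;
    MPI_Status status;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
    if (myrank == 0) {
        int x = 0;                            /* value to send (set as needed) */
        MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
        compute();                            /* must not modify x */
        MPI_Wait(&req1, &status);             /* x may be reused after this */
    } else if (myrank == 1) {
        int x;
        MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
    }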
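How message-passing routines return before the message transfer has completed
A message buffer is needed between source and destination to hold the message.
[Figure: time lines for Process 1 and Process 2 with a message buffer between them. Process 1 calls send(), the message is placed in the message buffer, and Process 1 continues; later, Process 2 calls recv() and reads the message from the buffer.]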
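The role of the message buffer can be illustrated with a toy single-process model: send copies the message into a buffer and returns at once, so the sender may overwrite its variable before the receiver has read anything. This is only a sketch of the idea, not how any MPI implementation works internally; the Mailbox type and the mailbox_send and mailbox_recv functions are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAILBOX_CAPACITY 64

/* Toy message buffer holding at most one message at a time. */
typedef struct {
    unsigned char bytes[MAILBOX_CAPACITY];
    size_t length;
    int full;
} Mailbox;

/* Copies the message into the buffer and returns immediately.
 * Returns 0 on success, -1 if the buffer is occupied or too small. */
int mailbox_send(Mailbox *box, const void *msg, size_t len)
{
    assert(box != NULL && msg != NULL);
    if (box->full || len > MAILBOX_CAPACITY)
        return -1;
    memcpy(box->bytes, msg, len);
    box->length = len;
    box->full = 1;
    return 0;   /* the sender's own storage may now be reused */
}

/* Reads the buffered message into `out`.
 * Returns the number of bytes copied, or -1 if no message fits or exists. */
long mailbox_recv(Mailbox *box, void *out, size_t max_len)
{
    assert(box != NULL && out != NULL);
    if (!box->full || box->length > max_len)
        return -1;
    memcpy(out, box->bytes, box->length);
    box->full = 0;
    return (long)box->length;
}
```

If the sender stores 42 in a variable, calls mailbox_send(), and then overwrites the variable, a later mailbox_recv() still delivers 42, because the copy in the buffer is independent of the sender's storage. This mirrors the local completion condition of a blocking send.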