Chapter 8 Matrix-Vector Multiplication
Sequential Algorithm Matrix-Vector Multiplication:
Input: a[0..m-1, 0..n-1] – matrix with dimension m×n
       b[0..n-1] – vector with dimension n×1
Output: c[0..m-1] – vector with dimension m×1
for i ← 0 to m-1
    c[i] ← 0
    for j ← 0 to n-1
        c[i] ← c[i] + a[i,j] × b[j]
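A direct C translation of this pseudocode might look like the sketch below; the function name matvec and the row-major storage of a are illustrative choices, not part of the original algorithm.

/* Sequential matrix-vector multiplication: c = a × b.
   a is m×n (stored row-major), b has n elements, c has m elements. */
void matvec(int m, int n, const double a[], const double b[], double c[])
{
    for (int i = 0; i < m; i++) {
        c[i] = 0.0;
        for (int j = 0; j < n; j++)
            c[i] += a[i * n + j] * b[j];
    }
}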
MPI_Scatter
• Divides an array on the specified root process into equal parts and sends one part to each process in the same communicator.
int Sdata[], Rdata[], Send_cnt, Recv_cnt, src, err;
MPI_Comm COMM;
MPI_Datatype Stype, Rtype;
err = MPI_Scatter(Sdata, Send_cnt, Stype, Rdata, Recv_cnt, Rtype, src, COMM);
MPI_Scatter
int Sdata[8] = {1,2,3,4,5,6,7,8}, Rdata[2];
int Send_cnt = 2, Recv_cnt = 2, src = 0;
MPI_Scatter(Sdata, Send_cnt, MPI_INT, Rdata, Recv_cnt, MPI_INT, src, MPI_COMM_WORLD);
[Figure: Sdata = [1,2,3,4,5,6,7,8] on CPU0 is scattered so that Rdata = [1,2] on CPU0, [3,4] on CPU1, [5,6] on CPU2, [7,8] on CPU3.]
MPI_Scatter
• Send_cnt and Recv_cnt should normally be equal, and Stype and Rtype should be the same datatype; otherwise the results may be incorrect.
• If there are N processes in the communicator, the size of Sdata on the root must be at least Send_cnt*N.
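As a complete illustration, a minimal program reproducing the scatter shown above might look like the following sketch; it assumes the program is launched with exactly 4 processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int Sdata[8] = {1, 2, 3, 4, 5, 6, 7, 8};   /* meaningful only on the root */
    int Rdata[2];
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every process calls MPI_Scatter; the send buffer is used only on root 0. */
    MPI_Scatter(Sdata, 2, MPI_INT, Rdata, 2, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process %d received [%d,%d]\n", rank, Rdata[0], Rdata[1]);

    MPI_Finalize();
    return 0;
}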
MPI_Scatterv • A scatter operation in which different processes may end up with different numbers of elements.
Function MPI_Scatterv
MPI_Scatterv (void *send_buffer, int *send_cnt, int *send_disp, MPI_Datatype send_type, void *recv_buffer, int recv_cnt, MPI_Datatype recv_type, int root, MPI_Comm communicator)
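A sketch of an uneven scatter, assuming 4 processes and that MPI_Init has already been called; the counts {1,2,3,4}, the displacements, and the data values are only illustrative.

/* Rank 0 scatters a 10-element array unevenly: 1, 2, 3 and 4 elements. */
int send_cnt[4]  = {1, 2, 3, 4};        /* elements sent to each rank        */
int send_disp[4] = {0, 1, 3, 6};        /* starting offset in Sdata per rank */
int Sdata[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int Rdata[4];                           /* large enough for the biggest part */
int rank;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Scatterv(Sdata, send_cnt, send_disp, MPI_INT,
             Rdata, send_cnt[rank], MPI_INT, 0, MPI_COMM_WORLD);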
MPI_Gather
• Collects data from every process in the same communicator and places the combined result on the specified root process.
int Sdata[], Rdata[], Send_cnt, Recv_cnt, dest, err;
MPI_Comm COMM;
MPI_Datatype Stype, Rtype;
err = MPI_Gather(Sdata, Send_cnt, Stype, Rdata, Recv_cnt, Rtype, dest, COMM);
MPI_Gather
int Send_cnt = 2, Recv_cnt = 2, dest = 0;
MPI_Gather(Sdata, Send_cnt, MPI_INT, Rdata, Recv_cnt, MPI_INT, dest, MPI_COMM_WORLD);
[Figure: Sdata = [1,2] on CPU0, [3,4] on CPU1, [5,6] on CPU2, [7,8] on CPU3 are gathered into Rdata = [1,2,3,4,5,6,7,8] on CPU0.]
MPI_Gather
• Send_cnt and Recv_cnt should normally be equal, and Stype and Rtype should be the same datatype; otherwise the results may be incorrect.
• If there are N processes in the communicator, the size of Rdata on the root must be at least Recv_cnt*N.
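A minimal sketch of the gather shown above, assuming 4 processes and that MPI_Init has already been called; each process fills its Sdata from its rank so that rank i contributes {2i+1, 2i+2}.

/* Each of the 4 processes sends 2 ints; rank 0 collects all 8 of them. */
int rank, Sdata[2], Rdata[8];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Sdata[0] = 2 * rank + 1;                /* e.g. rank 1 contributes {3,4} */
Sdata[1] = 2 * rank + 2;

MPI_Gather(Sdata, 2, MPI_INT, Rdata, 2, MPI_INT, 0, MPI_COMM_WORLD);
/* Only rank 0's Rdata now holds [1,2,3,4,5,6,7,8]. */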
MPI_Gatherv • A gather operation in which the number of elements collected from different processes may vary.
Function MPI_Gatherv MPI_Gatherv (void* send_buffer, int send_cnt, MPI_Datatype send_type, void *recv_buffer, int* recv_cnt, int* recv_disp, MPI_Datatype recv_type, int root, MPI_Comm communicator)
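The mirror image of the MPI_Scatterv sketch, again assuming 4 processes and that MPI_Init has already been called; rank i contributes i+1 elements, and the payload values are arbitrary.

/* Rank i sends i+1 elements; rank 0 gathers all 10 of them. */
int recv_cnt[4]  = {1, 2, 3, 4};        /* elements received from each rank */
int recv_disp[4] = {0, 1, 3, 6};        /* where each rank's data lands     */
int Sdata[4], Rdata[10];
int rank, i;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
for (i = 0; i <= rank; i++)
    Sdata[i] = rank;                    /* arbitrary payload */

MPI_Gatherv(Sdata, rank + 1, MPI_INT,
            Rdata, recv_cnt, recv_disp, MPI_INT, 0, MPI_COMM_WORLD);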
MPI_Allgather
• Like MPI_Gather, but MPI_Allgather delivers the gathered result to every process in the same communicator, not just to a root.
int Sdata[], Rdata[], Send_cnt, Recv_cnt, err;
MPI_Comm COMM;
MPI_Datatype Stype, Rtype;
err = MPI_Allgather(Sdata, Send_cnt, Stype, Rdata, Recv_cnt, Rtype, COMM);
MPI_Allgather
int Send_cnt = 2, Recv_cnt = 2;
MPI_Allgather(Sdata, Send_cnt, MPI_INT, Rdata, Recv_cnt, MPI_INT, MPI_COMM_WORLD);
• Recv_cnt is the number of elements received from each process (2 here), not the total number gathered.
[Figure: Sdata = [1,2] on CPU0, [3,4] on CPU1, [5,6] on CPU2, [7,8] on CPU3; after the call, Rdata = [1,2,3,4,5,6,7,8] on every CPU.]
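This is the collective used by the rowwise block-striped matrix-vector algorithm: each process computes its own block of c, and an all-gather replicates the complete result vector on every process. A minimal sketch, assuming m is divisible by the number of processes, local_m = m/p, and that local_a (this process's rows), the full vector b, and the buffers local_c and c are already set up; all of these names are illustrative.

/* Rowwise block-striped c = a × b: compute the local block, then
   replicate the full result vector c on every process. */
int i, j;
for (i = 0; i < local_m; i++) {
    local_c[i] = 0.0;
    for (j = 0; j < n; j++)
        local_c[i] += local_a[i * n + j] * b[j];
}
MPI_Allgather(local_c, local_m, MPI_DOUBLE,
              c, local_m, MPI_DOUBLE, MPI_COMM_WORLD);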
MPI_Allgatherv
An all-gather operation in which different processes may contribute different numbers of elements.
Function MPI_Allgatherv
int MPI_Allgatherv (void *send_buffer, int send_cnt, MPI_Datatype send_type, void *receive_buffer, int *receive_cnt, int *receive_disp, MPI_Datatype receive_type, MPI_Comm communicator)
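When the vector length m is not a multiple of p, the block sizes differ and MPI_Allgatherv applies. A sketch of one common way to build the count and displacement arrays (the block-decomposition formula used here is an assumption, not taken from the slides); p, m, rank, local_c and c are assumed to be set up already, and <stdlib.h> provides malloc.

/* Process i owns elements i*m/p .. (i+1)*m/p - 1 (integer division). */
int *recv_cnt  = malloc(p * sizeof(int));
int *recv_disp = malloc(p * sizeof(int));
for (int i = 0; i < p; i++) {
    recv_disp[i] = i * m / p;
    recv_cnt[i]  = (i + 1) * m / p - i * m / p;
}
MPI_Allgatherv(local_c, recv_cnt[rank], MPI_DOUBLE,
               c, recv_cnt, recv_disp, MPI_DOUBLE, MPI_COMM_WORLD);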
MPI_Alltoall • An all-to-all exchange of data elements among processes
Function MPI_Alltoall
int MPI_Alltoall (void *send_buffer, int send_count, MPI_Datatype send_type, void *recv_buffer, int recv_count, MPI_Datatype recv_type, MPI_Comm communicator)
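A minimal sketch, assuming 4 processes and that MPI_Init has already been called; each process sends one integer to every process and receives one from each, with an arbitrary payload.

/* Rank r sends Sdata[j] to rank j and receives into Rdata[j]
   the value that rank j sent to rank r. */
int rank, j, Sdata[4], Rdata[4];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
for (j = 0; j < 4; j++)
    Sdata[j] = 10 * rank + j;           /* arbitrary payload */

MPI_Alltoall(Sdata, 1, MPI_INT, Rdata, 1, MPI_INT, MPI_COMM_WORLD);
/* Afterwards, Rdata[j] == 10*j + rank on every process. */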
Data Decomposition Options • Rowwise block-striped decomposition • Columnwise block-striped decomposition • Checkerboard block decomposition
There are four collective communication operations
• The processes in the first column of the virtual process grid participate in the communication that gathers vector b when p is not square.
• The processes in the first row of the virtual process grid participate in the communication that scatters vector b when p is not square.
• Each first-row process broadcasts its block of b to the other processes in the same column of the process grid.
• Each row of processes in the grid performs an independent sum-reduction, yielding vector c in the first column of processes.
int MPI_Dims_create
int MPI_Dims_create (int nodes, int dims, int *size)
nodes: an input parameter, the number of processes in the grid.
dims: an input parameter, the number of dimensions in the desired grid.
size: an input/output parameter, the size of each grid dimension.
int MPI_Cart_create
int MPI_Cart_create (MPI_Comm old_comm, int dims, int *size, int *periodic, int reorder, MPI_Comm *cart_comm)
old_comm: the old communicator. All processes in the old communicator must collectively call the function.
dims: the number of grid dimensions.
*size: an array of size dims. Element size[j] is the number of processes in dimension j.
*periodic: an array of size dims. Element periodic[j] should be 1 if dimension j is periodic (communications wrap around the edges of the grid) and 0 otherwise.
reorder: a flag indicating whether process ranks can be reordered. If reorder is 0, the rank of each process in the new communicator is the same as its rank in old_comm.
*cart_comm: an output parameter, the new Cartesian communicator.
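A sketch of how these two calls might be combined to build a two-dimensional, non-periodic process grid; the variable names are illustrative.

/* Create a 2-D virtual process grid covering all processes. */
MPI_Comm cart_comm;
int p;
int size[2]     = {0, 0};               /* 0 lets MPI_Dims_create choose      */
int periodic[2] = {0, 0};               /* no wrap-around in either dimension */

MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Dims_create(p, 2, size);            /* e.g. p = 6 gives size = {3, 2} */
MPI_Cart_create(MPI_COMM_WORLD, 2, size, periodic, 1, &cart_comm);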
int MPI_Cart_rank
int MPI_Cart_rank (MPI_Comm comm, int *coords, int *rank)
comm: an input parameter whose value is the Cartesian communicator in which the communication is occurring.
coords: an input parameter, an integer array containing the coordinates of a process in the virtual grid.
rank: an output parameter, the rank of the process in comm with the specified coordinates.
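Continuing the grid-creation sketch above, MPI_Cart_rank translates grid coordinates into a rank in the Cartesian communicator; the coordinates {0, 0} are just an example.

/* Find the rank of the process at grid coordinates (0, 0). */
int coords[2] = {0, 0};
int grid_rank;

MPI_Cart_rank(cart_comm, coords, &grid_rank);
/* grid_rank can now be used, e.g. as the root of a collective on cart_comm. */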