Friday, October 20, 2006 “Work expands to fill the time available for its completion.” • Parkinson’s 1st Law
MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count_recvd)
• Returns the number of entries received in the count_recvd variable.
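A minimal usage sketch, with illustrative buffer sizes, tags, and variable names (not from the slides): rank 1 sends fewer entries than the receiver's capacity, and rank 0 uses MPI_Get_count to find out how many entries actually arrived.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int buf[100];             /* room for up to 100 ints */
            MPI_Status status;
            int count_recvd;
            /* The sender may transmit fewer than 100 entries. */
            MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            /* Ask how many MPI_INT entries were actually received. */
            MPI_Get_count(&status, MPI_INT, &count_recvd);
            printf("Received %d ints from rank %d\n",
                   count_recvd, status.MPI_SOURCE);
        } else if (rank == 1) {
            int data[42];
            for (int i = 0; i < 42; i++) data[i] = i;
            MPI_Send(data, 42, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }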
Matrix Vector Multiplication • n x n matrix A • Vector b • x = Ab • p processing elements • Suppose A is distributed row-wise (n/p rows per process) • Each process computes a different portion of x
Matrix Vector Multiplication [Figure: initial distribution. Colors represent data distributed on different processes; each process holds n/p rows of A, together with parts of b and x.]
Matrix Vector Multiplication [Figure: colors represent that all parts of b are required by each process.]
Matrix Vector Multiplication (All parts of b are required by each process) • Which collective operation can we use?
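The collective we need is MPI_Allgather, as the summary slide below confirms. A minimal sketch of the row-wise computation, assuming n is divisible by p; the identifiers (A_local, b_local, x_local) are illustrative, not from the slides.

    #include <mpi.h>
    #include <stdlib.h>

    /* Row-wise x = Ab: each process owns n/p rows of A, n/p entries of b,
       and produces the matching n/p entries of x. */
    void matvec_rowwise(const double *A_local, const double *b_local,
                        double *x_local, int n, MPI_Comm comm) {
        int p;
        MPI_Comm_size(comm, &p);
        int nlocal = n / p;

        /* Every process needs all of b: gather all pieces everywhere. */
        double *b_full = malloc(n * sizeof(double));
        MPI_Allgather(b_local, nlocal, MPI_DOUBLE,
                      b_full, nlocal, MPI_DOUBLE, comm);

        /* Local rows of A times the full b give our portion of x. */
        for (int i = 0; i < nlocal; i++) {
            x_local[i] = 0.0;
            for (int j = 0; j < n; j++)
                x_local[i] += A_local[i * n + j] * b_full[j];
        }
        free(b_full);
    }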
Matrix Vector Multiplication • n x n matrix A • Vector b • x = Ab • p processing elements • Suppose A is distributed column-wise (n/p columns per process) • Each process computes a different portion of x.
Matrix Vector Multiplication [Figure: initial distribution. Colors represent data distributed on different processes; each process holds n/p columns of A, together with parts of b and x.]
Partial sums calculated by each process [Figure: with n/p columns of A each, every process computes a partial value (e.g., partial x0) for every entry of x.]
MPI_Reduce [Figure: MPI_Reduce with count = 4 and dest = 1. Tasks 0–3 each contribute a buffer, and the element-wise reduction result lands in Task 1.] • Element-wise reduction can be done.
• The row-wise distribution requires one MPI_Allgather operation (as sketched above). • The column-wise distribution requires MPI_Reduce and MPI_Scatter operations (sketched below).
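A matching sketch of the column-wise version under the same assumptions (n divisible by p, illustrative identifiers): each process forms a full-length partial result, MPI_Reduce sums these element-wise on rank 0, and MPI_Scatter hands each process its n/p entries of x.

    #include <mpi.h>
    #include <stdlib.h>

    /* Column-wise x = Ab: each process owns n/p columns of A (stored as an
       n x n/p block) and the n/p entries of b that multiply those columns. */
    void matvec_colwise(const double *A_local, const double *b_local,
                        double *x_local, int n, MPI_Comm comm) {
        int p, rank;
        MPI_Comm_size(comm, &p);
        MPI_Comm_rank(comm, &rank);
        int nlocal = n / p;

        /* Partial sums for all n entries of x. */
        double *partial = calloc(n, sizeof(double));
        for (int j = 0; j < nlocal; j++)
            for (int i = 0; i < n; i++)
                partial[i] += A_local[i * nlocal + j] * b_local[j];

        /* Element-wise sum of the partial vectors onto rank 0 ... */
        double *x_full = (rank == 0) ? malloc(n * sizeof(double)) : NULL;
        MPI_Reduce(partial, x_full, n, MPI_DOUBLE, MPI_SUM, 0, comm);

        /* ... then distribute n/p entries of the final x to each process. */
        MPI_Scatter(x_full, nlocal, MPI_DOUBLE,
                    x_local, nlocal, MPI_DOUBLE, 0, comm);

        free(partial);
        if (rank == 0) free(x_full);
    }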
Matrix Matrix Multiplication • A and B are n x n matrices • p is the number of processing elements • The matrices are partitioned into blocks of size n/√p x n/√p
16 processes, each represented by a different color. Different portions of the n x n matrices A, B, and C are divided among these processes.
BUT! To compute Ci,j we need all sub-matrices Ai,k and Bk,j for 0 <= k < √p
To compute Ci,j we need all sub-matrices Ai,k and Bk,j for 0 <= k < √p • All-to-all broadcast of matrix A’s blocks in each row • All-to-all broadcast of matrix B’s blocks in each column (one way to realize these broadcasts is sketched below)
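A sketch of the two broadcasts, under the assumption of a √p x √p grid with row-major ranks; the use of MPI_Comm_split and all identifiers here are my choices, not prescribed by the slides. The communicator is split into row and column sub-communicators, and MPI_Allgather performs the all-to-all broadcast within each.

    #include <mpi.h>

    /* Gather all √p A-blocks of my grid row and all √p B-blocks of my grid
       column; bsize is the number of entries per n/√p x n/√p block. */
    void broadcast_blocks(const double *A_block, const double *B_block,
                          double *A_row, double *B_col,
                          int bsize, int sqrt_p, MPI_Comm comm) {
        int rank;
        MPI_Comm_rank(comm, &rank);
        int my_row = rank / sqrt_p, my_col = rank % sqrt_p;

        MPI_Comm row_comm, col_comm;
        MPI_Comm_split(comm, my_row, my_col, &row_comm);  /* same row together */
        MPI_Comm_split(comm, my_col, my_row, &col_comm);  /* same column together */

        /* All-to-all broadcast of A's blocks within each row ... */
        MPI_Allgather(A_block, bsize, MPI_DOUBLE,
                      A_row, bsize, MPI_DOUBLE, row_comm);
        /* ... and of B's blocks within each column. */
        MPI_Allgather(B_block, bsize, MPI_DOUBLE,
                      B_col, bsize, MPI_DOUBLE, col_comm);

        MPI_Comm_free(&row_comm);
        MPI_Comm_free(&col_comm);
    }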
Cannon’s Algorithm • Memory-efficient version of the previous algorithm. Each process in the ith row requires all √p sub-matrices Ai,k, 0 <= k < √p • Schedule the computation so that the √p processes in the ith row use different Ai,k at any given time
Cannon’s Algorithm • To compute C0,0 we need all sub-matrices A0,k and Bk,0 for 0 <= k < √p
Cannon’s Algorithm [Figure sequence: at each step, blocks of A shift left and blocks of B shift up, with wraparound. After √p steps, the sequence of √p sub-matrix multiplications is done.]
Some initial alignment is required!
Shift all sub-matrices Ai,j to the left (with wraparound) by i steps. Shift all sub-matrices Bi,j up (with wraparound) by j steps. After these circular shift operations, process Pi,j has sub-matrices Ai,(j+i)mod√p and B(i+j)mod√p,j.
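A sketch of just this alignment, assuming row-major ranks (P(i,j) has rank i·√p + j) and illustrative identifiers; MPI_Sendrecv_replace performs each wraparound shift in place.

    #include <mpi.h>

    /* Initial alignment: shift A(i,j) left by i and B(i,j) up by j,
       both with wraparound; bsize is the number of entries per block. */
    void cannon_align(double *A_block, double *B_block, int bsize,
                      int sqrt_p, MPI_Comm comm) {
        int rank;
        MPI_Comm_rank(comm, &rank);
        int i = rank / sqrt_p, j = rank % sqrt_p;

        /* A goes to column (j - i) mod √p and arrives from (j + i) mod √p,
           so afterwards we hold Ai,(j+i)mod√p. */
        int a_dest = i * sqrt_p + (j - i + sqrt_p) % sqrt_p;
        int a_src  = i * sqrt_p + (j + i) % sqrt_p;
        MPI_Sendrecv_replace(A_block, bsize, MPI_DOUBLE,
                             a_dest, 0, a_src, 0, comm, MPI_STATUS_IGNORE);

        /* B goes to row (i - j) mod √p and arrives from (i + j) mod √p,
           so afterwards we hold B(i+j)mod√p,j. */
        int b_dest = ((i - j + sqrt_p) % sqrt_p) * sqrt_p + j;
        int b_src  = ((i + j) % sqrt_p) * sqrt_p + j;
        MPI_Sendrecv_replace(B_block, bsize, MPI_DOUBLE,
                             b_dest, 0, b_src, 0, comm, MPI_STATUS_IGNORE);
    }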
Topologies • Many computational science and engineering problems use a series of matrix or grid operations. • The dimensions of the matrices or grids are often determined by the physical problems. • Frequently in multiprocessing, these matrices or grids are partitioned, or domain-decomposed, so that each partition is assigned to a process.
Topologies • By default, MPI uses a linear ordering and views processes as a one-dimensional topology. • Although it is still possible to refer to each of the partitions by a linear rank number, mapping the linear process ranks to a higher-dimensional virtual rank numbering facilitates a much clearer and more natural computational representation.
Topologies • To address this need, the MPI library provides topology routines. • Interacting processes are then identified by their coordinates in that topology.
Topologies • Each MPI process is mapped into the higher-dimensional topology. [Figure: different ways to map a set of processes to a two-dimensional grid. (a) and (b) show row- and column-wise mappings of these processes, (c) shows a mapping that follows a space-filling curve (dotted line), and (d) shows a mapping in which neighboring processes are directly connected in a hypercube.]
Topologies • Ideally, the mapping would be determined by the interaction among processes and the connectivity of physical processors. • However, the mechanism for assigning ranks to MPI processes does not use information about the interconnection network. • Reason: the architecture-independent advantages of MPI (otherwise different mappings would have to be specified for different interconnection networks). • It is left to the MPI library to find an appropriate mapping that reduces the cost of sending and receiving messages.
MPI allows specification of virtual process topologies in terms of a graph. • Each node in the graph corresponds to a process, and an edge exists between two nodes if they communicate with each other. • The most common topologies are Cartesian topologies (one-, two-, or higher-dimensional grids).
Creating and Using Cartesian Topologies • We can create Cartesian topologies using the function: int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)
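A small usage sketch; the 4 x 4 dimensions and the periodicity flags here are illustrative choices, not mandated by the slides.

    #include <mpi.h>

    /* Arrange the processes of MPI_COMM_WORLD into a 2-D grid. */
    MPI_Comm make_grid(void) {
        int dims[2]    = {4, 4};   /* 4 rows, 4 columns: needs 16 processes */
        int periods[2] = {1, 1};   /* wraparound in both dimensions */
        MPI_Comm comm_cart;
        /* reorder = 1 allows the library to renumber ranks to better
           match the underlying machine. */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_cart);
        return comm_cart;
    }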
With processes renamed in a 2D grid topology, we are able to assign or distribute work, or distinguish among the processes, by their grid positions rather than by their linear process ranks.
MPI_CART_CREATE is a collective communication function. • It must be called by all processes in the group.
Creating and Using Cartesian Topologies • Since sending and receiving messages still require (one-dimensional) ranks, MPI provides routines to convert ranks to Cartesian coordinates and vice versa. int MPI_Cart_coords(MPI_Comm comm_cart, int rank, int maxdims, int *coords) int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
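A short sketch of both conversions, using the grid communicator from the previous sketch; the function and variable names here are illustrative.

    #include <mpi.h>
    #include <stdio.h>

    void show_mapping(MPI_Comm comm_cart) {
        int rank, back, coords[2];
        MPI_Comm_rank(comm_cart, &rank);

        /* Linear rank -> (row, col) grid coordinates. */
        MPI_Cart_coords(comm_cart, rank, 2, coords);
        printf("Rank %d sits at (%d, %d)\n", rank, coords[0], coords[1]);

        /* (row, col) grid coordinates -> linear rank (the inverse). */
        MPI_Cart_rank(comm_cart, coords, &back);   /* back == rank */
    }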
Creating and Using Cartesian Topologies • The most common operation on Cartesian topologies is shifting data along a dimension of the topology. int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step, int *rank_source, int *rank_dest) • MPI_CART_SHIFT is used to find two "nearby" neighbors of the calling process along a specific direction of an N-dimensional Cartesian topology. • This direction is specified by the input argument dir to MPI_CART_SHIFT. • The two neighbors are called the "source" and "destination" ranks.
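As a closing sketch, the per-step shifts of Cannon’s algorithm expressed with MPI_Cart_shift on a periodic grid (identifiers are illustrative); combined with MPI_Sendrecv_replace, each call moves a block one position with wraparound.

    #include <mpi.h>

    /* One Cannon step: shift A one block left (dimension 1) and B one
       block up (dimension 0); requires periods = 1 in MPI_Cart_create. */
    void cannon_step_shifts(double *A_block, double *B_block, int bsize,
                            MPI_Comm comm_cart) {
        int src, dst;

        /* disp = -1: rank_dest is the neighbor at coordinate - 1 (left/up),
           rank_source is the neighbor at coordinate + 1 we receive from. */
        MPI_Cart_shift(comm_cart, 1, -1, &src, &dst);
        MPI_Sendrecv_replace(A_block, bsize, MPI_DOUBLE,
                             dst, 0, src, 0, comm_cart, MPI_STATUS_IGNORE);

        MPI_Cart_shift(comm_cart, 0, -1, &src, &dst);
        MPI_Sendrecv_replace(B_block, bsize, MPI_DOUBLE,
                             dst, 0, src, 0, comm_cart, MPI_STATUS_IGNORE);
    }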