Parallel Computing—Higher-level Concepts of MPI
MPI—Presentation Outline
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
Communicators, groups, and contexts
• MPI provides higher-level abstractions for building parallel libraries:
  • Communicators + groups provide:
    • Process naming (ranks instead of IP addresses + ports)
    • Group scope for collective operations
  • Contexts provide:
    • A safe communication space
What are communicators?
• A data structure that contains groups (and thus processes)
• Why it is useful:
  • Process naming: ranks serve as names for application programmers
  • Easier than IP addresses + ports
  • Supports group communication as well as point-to-point communication
• There are two types of communicators:
  • Intracommunicators: communication within a group
  • Intercommunicators: communication between two groups (which must be disjoint)
What are contexts?
• A unique integer that acts as an additional, system-managed tag on messages
• Each communicator has a distinct context that provides a safe communication universe:
  • The context is agreed upon by all processes when a communicator is built
• Intracommunicators have two contexts:
  • One for point-to-point communication
  • One for collective communication
• Intercommunicators also have two contexts:
  • Explained in the coming slides
Intracommunicators
• Contain one group
• Allow point-to-point and collective communication among the processes of that group
• Communicators can only be built from existing communicators:
  • MPI.COMM_WORLD is the first intracommunicator to start with
• Creating an intracommunicator is a collective operation:
  • All processes in the existing communicator must call it for it to succeed
• Intracommunicators can have process topologies:
  • Cartesian
  • Graph
Creating new intracommunicators

  MPI.Init(args);
  int[] incl1 = {0, 3};                                // ranks to include
  Group grp1 = MPI.COMM_WORLD.Group();                 // group behind COMM_WORLD
  Group grp2 = grp1.Incl(incl1);                       // subgroup of ranks 0 and 3
  Intracomm newComm = MPI.COMM_WORLD.Create(grp2);     // collective over COMM_WORLD
How do processes agree on the context for a new intracommunicator?
• Each process keeps a static context variable, incremented whenever an Intracomm is created
• Each process increments this variable and sends it to all the other processes
• The maximum value is agreed upon as the new context
• The existing communicator's context is used for sending these "context agreement" messages:
  • What about MPI.COMM_WORLD?
  • It is safe anyway: it is the first intracommunicator, so there is no chance of conflicts
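The agreement step can be pictured as a max-reduction of every process's context counter over the existing communicator. A minimal conceptual sketch in mpiJava-style syntax (the variable localContextCounter is hypothetical, and real MPI libraries do this internally rather than through user-level calls):

  // Conceptual sketch: agree on the context id for a new communicator.
  // Every process proposes its incremented counter; the maximum wins.
  int[] proposed = { localContextCounter + 1 };   // hypothetical per-process counter
  int[] agreed = new int[1];
  MPI.COMM_WORLD.Allreduce(proposed, 0, agreed, 0, 1, MPI.INT, MPI.MAX);
  localContextCounter = agreed[0];                // all processes now share one context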
Intercommunicators
• Contain two groups:
  • Local group (the group containing the local process)
  • Remote group
• The two groups must be disjoint
• Allow only point-to-point communication
• Intercommunicators cannot have process topologies
• Next slide: how to create intercommunicators
Creating intercommunicators

  MPI.Init(args);
  int[] incl2 = {0, 2, 4, 6};
  int[] incl3 = {1, 3, 5, 7};
  Group grp1 = MPI.COMM_WORLD.Group();
  int rank = MPI.COMM_WORLD.Rank();
  Group grp2 = grp1.Incl(incl2);                     // even ranks
  Group grp3 = grp1.Incl(incl3);                     // odd ranks
  // Both Create calls are collective over COMM_WORLD; non-members get a null communicator
  Intracomm comm1 = MPI.COMM_WORLD.Create(grp2);
  Intracomm comm2 = MPI.COMM_WORLD.Create(grp3);
  Intercomm icomm = null;
  if (rank % 2 == 0) {
      icomm = MPI.COMM_WORLD.Create_intercomm(comm1, 0, 1, 56);
  } else {
      icomm = MPI.COMM_WORLD.Create_intercomm(comm2, 1, 0, 56);
  }
Creating intercomms …
• The arguments to the Create_intercomm method:
  • Local communicator (which contains the current process)
  • local_leader (rank)
  • remote_leader (rank)
  • Tag for the messages sent while selecting contexts
• But the groups are disjoint, so how can they communicate?
  • That is where a peer communicator is required
  • At least the local_leader and remote_leader must be members of this peer communicator
• In the accompanying figure, MPI.COMM_WORLD is the peer communicator, and processes 0 and 1 (ranks relative to MPI.COMM_WORLD) are the leaders of their respective groups
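Once the intercommunicator exists, point-to-point calls address processes by their rank in the remote group. A minimal sketch continuing the code above (message value and tag are arbitrary):

  int[] msg = new int[1];
  if (rank == 0) {                               // leader of the even group
      msg[0] = 42;
      icomm.Send(msg, 0, 1, MPI.INT, 0, 99);     // dest 0 = rank 0 of the remote (odd) group
  } else if (rank == 1) {                        // leader of the odd group
      icomm.Recv(msg, 0, 1, MPI.INT, 0, 99);     // source 0 = rank 0 of the remote (even) group
  }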
Selecting contexts for intercomms
• An intercommunicator has two contexts:
  • send_context (used for sending messages)
  • recv_context (used for receiving messages)
• In intercommunicators, processes in the local group can only send messages to the remote group
• How is the context agreed upon?
  • Each group decides its own context
  • The leaders (local and remote) exchange the contexts agreed upon
  • The greater of the two is selected as the context
[Figure: MPI.COMM_WORLD containing processes 0–7 divided into two disjoint groups, Group1 and Group2; the smaller numbers show each process's rank within its own group.]
MPI—Presentation Outline
• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
Collective communications
• Provided as a convenience for application developers:
  • Save significant development time
  • Efficient algorithms may be used
  • Stable (tested)
• Built on top of point-to-point communication
• These operations include:
  • Broadcast, Barrier, Reduce, Allreduce, Alltoall, Scatter, Gather, Allgather, Scan
  • Versions that allow displacements between the data (e.g. Scatterv, Gatherv)
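A minimal sketch of two of these operations in mpiJava-style syntax (the data values are arbitrary; every process in the communicator must make the same collective call):

  MPI.Init(args);
  int rank = MPI.COMM_WORLD.Rank();
  int size = MPI.COMM_WORLD.Size();
  int[] table = new int[size];                   // significant at the root only
  if (rank == 0)
      for (int i = 0; i < size; i++) table[i] = i * i;
  int[] mine = new int[1];
  // Scatter: the root deals one element of table to each process
  MPI.COMM_WORLD.Scatter(table, 0, 1, MPI.INT, mine, 0, 1, MPI.INT, 0);
  // Broadcast: afterwards every rank holds the root's entire buffer
  MPI.COMM_WORLD.Bcast(table, 0, size, MPI.INT, 0);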
[Figure: Broadcast, scatter, gather, allgather, and alltoall data movements; image from the MPI standard document]
Reduce collective operations
• Combine one value from every process using a predefined operation:
  • MPI.PROD, MPI.SUM, MPI.MIN, MPI.MAX
  • MPI.LAND, MPI.BAND, MPI.LOR, MPI.BOR, MPI.LXOR, MPI.BXOR
  • MPI.MINLOC, MPI.MAXLOC
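For example, summing one contribution from every process (a sketch; MPI.SUM could be replaced by any operation above):

  int rank = MPI.COMM_WORLD.Rank();
  int[] send = { rank + 1 };
  int[] recv = new int[1];
  // Reduce: only the root (rank 0) receives the combined result
  MPI.COMM_WORLD.Reduce(send, 0, recv, 0, 1, MPI.INT, MPI.SUM, 0);
  // Allreduce: the same reduction, but every rank receives the result
  MPI.COMM_WORLD.Allreduce(send, 0, recv, 0, 1, MPI.INT, MPI.SUM);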
A Typical Barrier() Implementation
• Eight processes, which form a single group
• Each process exchanges an integer four times
• Overlaps communications well
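One common point-to-point scheme for a barrier is the dissemination barrier sketched below; whether the implementation described above uses exactly this pattern (or this number of rounds) is not stated, so treat it as an illustration:

  // Dissemination barrier: in round k, rank r sends a token to (r + 2^k) mod size
  // and receives one from (r - 2^k) mod size; after ceil(log2(size)) rounds,
  // no process can leave the barrier before all processes have entered it.
  int size = MPI.COMM_WORLD.Size();
  int rank = MPI.COMM_WORLD.Rank();
  int[] out = new int[1], in = new int[1];
  for (int k = 1; k < size; k <<= 1) {
      int dest = (rank + k) % size;
      int src  = (rank - k + size) % size;
      MPI.COMM_WORLD.Sendrecv(out, 0, 1, MPI.INT, dest, 77,
                              in,  0, 1, MPI.INT, src,  77);
  }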
Intracomm.Bcast( … )
• Sends data from one process to all the other processes
• Code from adlib:
  • A communication library for HPJava
  • The current implementation is based on an n-ary tree:
    • Limitation: broadcasts only from rank=0
    • Generated dynamically
    • Cost: O(log2(N))
• MPICH 1.2.5 uses a linear algorithm:
  • Cost: O(N)
  • MPICH2 has much-improved algorithms
• LAM/MPI uses n-ary trees:
  • Limitation: broadcasts only from rank=0
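A sketch of the binary case of such a tree broadcast, built directly on point-to-point calls; this illustrates the O(log2(N)) structure and is not adlib's actual code (assumes import mpi.*):

  // Binomial-tree broadcast rooted at rank 0: in each round the set of
  // processes holding the data doubles, giving O(log2(N)) steps.
  static void treeBcast(int[] buf, Intracomm comm) throws MPIException {
      int rank = comm.Rank(), size = comm.Size();
      for (int mask = 1; mask < size; mask <<= 1) {
          if (rank < mask) {                    // already has the data: forward it
              int dest = rank + mask;
              if (dest < size) comm.Send(buf, 0, buf.length, MPI.INT, dest, 0);
          } else if (rank < 2 * mask) {         // this round's receivers
              comm.Recv(buf, 0, buf.length, MPI.INT, rank - mask, 0);
          }
      }
  }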
MPI—Presentation Outline
• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
MPI Datatypes
• What kind (type) of data can be sent using MPI messaging?
• There are two categories:
  • Basic (primitive) datatypes
  • Derived datatypes
MPI Basic Datatypes
• MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG
• MPI_UNSIGNED_CHAR, MPI_UNSIGNED_SHORT, MPI_UNSIGNED, MPI_UNSIGNED_LONG
• MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE
• MPI_BYTE
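In mpiJava these correspond to constants such as MPI.INT, MPI.DOUBLE, and MPI.BYTE, used wherever a call takes a Datatype argument; for example (values and tag arbitrary):

  int rank = MPI.COMM_WORLD.Rank();
  double[] buf = new double[10];
  if (rank == 0) {
      for (int i = 0; i < buf.length; i++) buf[i] = 0.5 * i;
      MPI.COMM_WORLD.Send(buf, 0, buf.length, MPI.DOUBLE, 1, 5);
  } else if (rank == 1) {
      MPI.COMM_WORLD.Recv(buf, 0, buf.length, MPI.DOUBLE, 0, 5);
  }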
Derived Datatypes
• Besides the basic datatypes, it is possible to communicate heterogeneous and non-contiguous data using derived datatypes:
  • Contiguous
  • Indexed
  • Vector
  • Struct
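For instance, a Vector type can describe one column of a row-major matrix (one element per row, a full row apart). The sketch below uses the mpiJava-style static factories on Datatype plus the Commit() call; the exact signatures are an assumption to check against the mpiJava spec:

  int n = 4, rank = MPI.COMM_WORLD.Rank();
  double[] matrix = new double[n * n];                // row-major n x n matrix
  // n blocks of 1 element, stride n: a single column of the matrix
  Datatype column = Datatype.Vector(n, 1, n, MPI.DOUBLE);
  column.Commit();                                    // commit before use
  if (rank == 0) {
      MPI.COMM_WORLD.Send(matrix, 2, 1, column, 1, 9);    // send column 2
  } else if (rank == 1) {
      MPI.COMM_WORLD.Recv(matrix, 2, 1, column, 0, 9);    // receive into column 2
  }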
MPI—Presentation Outline
• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
Virtual topologies
• Used to arrange the processes of an intracommunicator in a geometric shape
• Virtual topologies need not match the physical layout of the machines:
  • Though an implementation may use them to exploit the underlying machine architecture
• MPI provides:
  • Cartesian topology
  • Graph topology
Cartesian topology: mapping four processes onto a 2x2 grid
• Each process is assigned a coordinate:
  • Rank 0: (0,0)
  • Rank 1: (1,0)
  • Rank 2: (0,1)
  • Rank 3: (1,1)
• Uses:
  • Calculate a rank from a grid position
  • Calculate a grid position from a rank
  • Easier to locate the ranks of neighbours
• Many applications have matching communication patterns:
  • Lots of messaging with immediate neighbours (see the sketch after the next slide)
Periods in Cartesian topology
• Axis 1 (the y-axis) is periodic:
  • Processes in the top and bottom rows have valid neighbours towards the top and bottom respectively (the edges wrap around)
• Axis 0 (the x-axis) is non-periodic:
  • Processes in the rightmost and leftmost columns have undefined neighbours towards the right and left respectively
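A sketch of the 2x2 example in mpiJava-style syntax; Create_cart, Coords, Rank, and Shift are the standard topology calls, but the ShiftParms field names are an assumption to verify against the spec:

  int[] dims = {2, 2};
  boolean[] periods = {false, true};                 // axis 0 non-periodic, axis 1 periodic
  Cartcomm grid = MPI.COMM_WORLD.Create_cart(dims, periods, true);
  int[] myCoords = grid.Coords(grid.Rank());         // e.g. rank 3 -> (1,1)
  // Neighbours one step along the periodic axis 1 (wraps at the edges);
  // on a non-periodic edge the neighbour would come back as MPI.PROC_NULL
  ShiftParms shift = grid.Shift(1, 1);
  // shift.rank_source / shift.rank_dest can be used directly in Send/Recv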
Doing Matrix Multiplication using MPI
• Just to give you an idea of how MPI-based applications are designed …
Basically how it works!

  1 0 2     0 1 0     2 3 2
  2 1 0  x  0 0 1  =  0 2 1
  0 2 2     1 1 1     2 2 4
Matrix Multiplication M x N

  int rank = MPI.COMM_WORLD.Rank();
  int size = MPI.COMM_WORLD.Size();
  if (rank == 0) {                       // master process
      // initialize matrices M and N
      for (int i = 1; i < size; i++) {
          // send rows of matrix M to process i
      }
      // broadcast matrix N (a collective call in which the master also participates)
      for (int i = 1; i < size; i++) {
          // receive rows of the resultant matrix from process i
      }
      // print results
  } else {                               // worker processes
      // receive this process's rows of matrix M
      // call broadcast to receive matrix N
      // compute the sub-matrix product (done in parallel across workers)
      // send the resultant rows back to the master process
  }
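A minimal runnable version of this scheme in mpiJava-style syntax is sketched below. It assumes square size x size matrices so that each process (including the master) owns exactly one row, and it uses the identity matrix for N purely so the printed product is easy to check; the class name and initial values are illustrative:

  import mpi.*;

  public class MatMul {
      public static void main(String[] args) throws MPIException {
          MPI.Init(args);
          int rank = MPI.COMM_WORLD.Rank();
          int size = MPI.COMM_WORLD.Size();
          int n = size;                               // n x n matrices, one row per process
          double[] M = new double[n * n];             // row-major
          double[] N = new double[n * n];
          double[] row = new double[n];
          double[] result = new double[n];
          if (rank == 0) {                            // master: initialize and distribute
              for (int i = 0; i < n * n; i++) { M[i] = i + 1; N[i] = (i % n == i / n) ? 1 : 0; }
              for (int i = 1; i < size; i++)          // ship row i of M to process i
                  MPI.COMM_WORLD.Send(M, i * n, n, MPI.DOUBLE, i, 1);
              System.arraycopy(M, 0, row, 0, n);      // master keeps row 0 for itself
          } else {
              MPI.COMM_WORLD.Recv(row, 0, n, MPI.DOUBLE, 0, 1);
          }
          MPI.COMM_WORLD.Bcast(N, 0, n * n, MPI.DOUBLE, 0);  // collective: all ranks call it
          for (int j = 0; j < n; j++) {               // local row times N, computed in parallel
              result[j] = 0.0;
              for (int k = 0; k < n; k++) result[j] += row[k] * N[k * n + j];
          }
          if (rank == 0) {                            // master: gather and print the product
              System.arraycopy(result, 0, M, 0, n);   // reuse M as the output buffer
              for (int i = 1; i < size; i++)
                  MPI.COMM_WORLD.Recv(M, i * n, n, MPI.DOUBLE, i, 2);
              for (int i = 0; i < n * n; i++)
                  System.out.print(M[i] + ((i % n == n - 1) ? "\n" : " "));
          } else {
              MPI.COMM_WORLD.Send(result, 0, n, MPI.DOUBLE, 0, 2);
          }
          MPI.Finalize();
      }
  }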