WW Grid Parallel Programming with Message-Passing Interface (MPI): An Introduction • Grid Computing and Distributed Systems (GRIDS) Lab • The University of Melbourne, Melbourne, Australia • www.gridbus.org • Rajkumar Buyya
Message-Passing Programming Paradigm • Each processor in a message-passing program runs a sub-program • written in a conventional sequential language • all variables are private • sub-programs communicate via special subroutine calls • [Figure: processors (P), each with its own memory (M), connected by an interconnection network]
MPI Slides are Derived from • Dirk van der Knijff, High Performance Parallel Programming, PPT Slides • MPI Notes, Maui HPC Centre. • Melbourne Advanced Research Computing Center • http://www.hpc.unimelb.edu.au
Single Program Multiple Data • Introduced in data-parallel programming (HPF) • The same program runs everywhere • A restriction of the general message-passing model • Some vendors only support SPMD parallel programs • The usual way of writing MPI programs • The general message-passing model can be emulated
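SPMD example
#include <mpi.h>

void MasterRoutine(void) { /* placeholder: hand out work, collect results */ }
void WorkerRoutine(void) { /* placeholder: receive work, compute, reply */ }

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {              /* common convention: process 0 becomes the master */
        MasterRoutine(/*arguments*/);
    } else {                      /* it is a worker process */
        WorkerRoutine(/*arguments*/);
    }
    MPI_Finalize();
    return 0;
}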
Messages • Messages are packets of data moving between sub-programs • The message passing system has to be told the following information • Sending processor • Source location • Data type • Data length • Receiving processor(s) • Destination location • Destination size
Messages • Access: • Each sub-program needs to be connected to a message passing system • Addressing: • Messages need to have addresses to be sent to • Reception: • It is important that the receiving process is capable of dealing with the messages it is sent • A message passing system is similar to: • Post-office, Phone line, Fax, E-mail, etc • Message Types: • Point-to-Point, Collective, Synchronous (telephone)/Asynchronous (Postal)
Point-to-Point Communication • Simplest form of message passing • One process sends a message to another • Several variations on how sending a message can interact with execution of the sub-program
Point-to-Point variations • Synchronous Sends • provide information about the completion of the message • e.g. fax machines • Asynchronous Sends • Only know when the message has left • e.g. post cards • Blocking operations • only return from the call when operation has completed • Non-blocking operations • return straight away - can test/wait later for completion
Collective Communications • Collective communication routines are higher-level routines involving several processes at a time • Can be built out of point-to-point communications • Barriers • synchronise processes • Broadcast • one-to-many communication • Reduction operations • combine data from several processes to produce (usually) a single result
Message Passing Systems • Initially each manufacturer developed its own • Wide range of features, often incompatible • Several groups developed systems for workstations • PVM (Parallel Virtual Machine) • de facto standard before MPI • Open Source (NOT public domain!) • User Interface to the System (daemons) • Support for Dynamic environments
MPI Forum - www.mpi-forum.org • Sixty people from forty different organisations • Both users and vendors, from the US and Europe • Two-year process of proposals, meetings and review • Produced a document defining a standard Message Passing Interface (MPI) • to provide source-code portability • to allow efficient implementation • it provides a high level of functionality • support for heterogeneous parallel architectures • parallel I/O (in MPI 2.0) • MPI 1.0 contains over 115 routines/functions that can be grouped into 8 categories.
General MPI Program Structure • MPI include file • Initialise MPI environment • Do work and perform message communication • Terminate MPI environment
MPI programs • MPI is a library - there are NO language changes • Header Files • C: #include <mpi.h> • MPI Function Format • C: error = MPI_Xxxx(parameter, ...); or, ignoring the returned error code, MPI_Xxxx(parameter, ...);
MPI helloworld.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int numtasks, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello World from process %d of %d\n", rank, numtasks);
    MPI_Finalize();
    return 0;
}
Example - C
#include <mpi.h>
#include <stdlib.h>   /* for exit() */
/* include other usual header files */

int main(int argc, char **argv)
{
    /* initialize MPI */
    MPI_Init(&argc, &argv);

    /* main part of program */

    /* terminate MPI */
    MPI_Finalize();
    exit(0);
}
Handles • MPI controls its own internal data structures • MPI hands out 'handles' that let programmers refer to these structures • C handles are of distinct typedef'd types, and arrays are indexed from 0 • Some arguments can be of any type; in C these are declared as void *
Initializing MPI • The first MPI routine called in any MPI program must be MPI_Init • The C version accepts the arguments to main • int MPI_Init(int *argc, char ***argv); • MPI_Init must be called by every MPI program • Making multiple MPI_Init calls is erroneous • MPI_Initialized is an exception to the first rule: it may be called before MPI_Init
MPI_COMM_WORLD • MPI_INIT defines a communicator called MPI_COMM_WORLD for every process that calls it • All MPI communication calls require a communicator argument • MPI processes can only communicate if they share a communicator • A communicator contains a group, which is a list of processes • Each process has its rank within the communicator • A process can have several communicators
Communicators • MPI uses objects called communicators that define which collection of processes communicate with each other • Every process has a unique integer identifier (its rank) assigned by the system when the process initialises. A rank is sometimes called a process ID • Processes can request information from a communicator • MPI_Comm_rank(MPI_Comm comm, int *rank) • Returns the rank of the process in comm • MPI_Comm_size(MPI_Comm comm, int *size) • Returns the size of the group in comm
Finishing up • An MPI program should call MPI_Finalize when all communications have completed • Once it is called, no other MPI calls can be made • Aborting: MPI_Abort(comm, errorcode) • Attempts to abort all processes listed in comm; if comm is MPI_COMM_WORLD, the whole program terminates
MPI Programs Compilation and Execution • Let us look into the MARC Alpha Cluster
Manjra: GRIDS Lab Linux Cluster • Master node: manjra.cs.mu.oz.au • Dual Xeon 2 GHz • 512 MB memory • 250 GB integrated storage • Gigabit LAN • CD-ROM and floppy drives • Red Hat Linux release 7.3 (Valhalla) • Worker nodes (node1..node13): each of the 13 worker nodes has • Pentium 4 2 GHz • 512 MB memory • 40 GB hard disk • Gigabit LAN • Red Hat Linux release 7.3 (Valhalla) • [Figure: Manjra Linux cluster, with master manjra.cs.mu.oz.au and internal worker nodes node1 to node13]
How Legion clusters look • [Figure: front view and back view]
Compile and Run Commands • Compile: • manjra> mpicc helloworld.c -o helloworld • Run (-np sets the number of processes): • manjra> mpirun -np 3 -machinefile machines.list helloworld • The file machines.list contains the list of nodes: • manjra.cs.mu.oz.au • node1 • node2 • node3 • node4 • node6 • (node5 and node7 are not working today!)
Sample Run and Output • A run with 3 processes: • manjra> mpirun -np 3 -machinefile machines.list helloworld • Hello World from process 0 of 3 • Hello World from process 1 of 3 • Hello World from process 2 of 3 • A run without mpirun (a single process by default): • manjra> helloworld • Hello World from process 0 of 1
Sample Run and Output • A Run with 6 Processes: • manjra> mpirun -np 6 -machinefile machines.list helloworld • Hello World from process 0 of 6 • Hello World from process 3 of 6 • Hello World from process 1 of 6 • Hello World from process 5 of 6 • Hello World from process 4 of 6 • Hello World from process 2 of 6 • Note: Process execution need not be in process number order.
Sample Run and Output • A run with 6 processes: • manjra> mpirun -np 6 -machinefile machines.list helloworld • Hello World from process 0 of 6 • Hello World from process 3 of 6 • Hello World from process 1 of 6 • Hello World from process 2 of 6 • Hello World from process 5 of 6 • Hello World from process 4 of 6 • Note: the output order changed from the previous run. The mapping of processes to machines can differ from run to run, and the machines may carry different loads, so processes reach their printf at different times.
MPI Routines – C and Fortran • Environment Management • Point-to-Point Communication • Collective Communication • Process Group Management • Communicators • Derived Type • Virtual Topologies • Miscellaneous Routines
MPI Messages • A message contains a number of elements of some particular datatype • MPI datatypes • Basic Types • Derived types • Derived types can be built up from basic types • C types are different from Fortran types
Point-to-Point Communication • Communication between two processes • Source process sends message to destination process • Communication takes place within a communicator • Destination process is identified by its rank in the communicator • MPI provides four communication modes for sending messages • standard, synchronous, buffered, and ready • Only one mode for receiving
Standard Send • Completes once the message has been sent • Note: it may or may not have been received • Programs should obey the following rules: • It should not assume the send will complete before the receive begins - can lead to deadlock • It should not assume the send will complete after the receive begins - can lead to non-determinism • processes should be eager readers - they should guarantee to receive all messages sent to them - else network overload • Can be implemented as either a buffered send or synchronous send
Standard Send (cont.) • MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) • buf: the address of the data to be sent • count: the number of elements of datatype that buf contains • datatype: the MPI datatype • dest: rank of the destination in communicator comm • tag: a marker used to distinguish different message types • comm: the communicator shared by sender and receiver • ierror: the Fortran return value of the send (in C, the error code is the function's return value)
Standard Blocking Receive • Note: all sends so far have been blocking (but this only makes a difference for synchronous sends) • Completes when a message has been received • MPI_Recv(buf, count, datatype, source, tag, comm, status) • source: rank of the source process in communicator comm • status: returns information about the message • Synchronous Blocking Message-Passing • processes synchronise • sender process specifies the synchronous mode • blocking: both processes wait until the transaction has completed
For a communication to succeed • Sender must specify a valid destination rank • Receiver must specify a valid source rank • The communicator must be the same • Tags must match • Message types must match • The receiver's buffer must be large enough • Receiver can use wildcards • MPI_ANY_SOURCE • MPI_ANY_TAG • the actual source and tag are returned in the status parameter
MPI Send/Receive a Character
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, tag = 1;
    char inmsg, outmsg = 'X';
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Run with at least 2 processes: rank 0 and rank 1 exchange one char */
    if (rank == 0) {
        dest = 1;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        printf("Rank0 sent: %c\n", outmsg);
        source = 1;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    } else if (rank == 1) {
        source = 0;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        printf("Rank1 received: %c\n", inmsg);
        dest = 0;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}