
MPI and OpenMP





  1. MPI and OpenMP By: Jesus Caban and Matt McKnight

  2. What is MPI? • MPI: Message Passing Interface • It is not a new programming language; it is a library of functions that can be called from C/Fortran/Python • Successor to PVM (Parallel Virtual Machine) • Developed by an open, international forum with representation from industry, academia, and government laboratories.

  3. What is it for? • Allows data to be passed between processes in a distributed-memory environment • Provides source-code portability • Allows efficient implementation • Offers a great deal of functionality • Supports heterogeneous parallel architectures

  4. MPI Communicator • Idea: a group of processes that are allowed to communicate with each other • The most commonly used communicator is MPI_COMM_WORLD • Note the MPI naming convention: constants are written MPI_XXX; functions are written MPI_Xxx(parameters) and called as var = MPI_Xxx(parameters); or MPI_Xxx(parameters);

  5. Getting Started • Include the MPI header file • Initialize the MPI environment • Work: make message-passing calls (Send, Receive) • Terminate the MPI environment

  6. Include File • Include the MPI header file:
     #include <stdio.h>
     #include <stdlib.h>
     #include <mpi.h>
     int main(int argc, char** argv){
         ...
     }

  7. Initialize MPI • Initialize the MPI environment:
     int main(int argc, char** argv){
         int numtasks, rank;
         MPI_Init(&argc, &argv);
         MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         ...
     }

  8. Initialize MPI (cont.) • MPI_Init(&argc, &argv): no MPI functions may be called before this call. • MPI_Comm_size(MPI_COMM_WORLD, &nump): a communicator is a collection of processes that can send messages to each other; MPI_COMM_WORLD is a predefined communicator consisting of all the processes running when program execution begins. • MPI_Comm_rank(MPI_COMM_WORLD, &myrank): lets a process find out its own rank.

  9. Terminate MPI • Terminate the MPI environment:
     #include <stdio.h>
     #include <stdlib.h>
     #include <mpi.h>
     int main(int argc, char** argv){
         ...
         MPI_Finalize();
     }
     No MPI functions may be called after this call.

  10. Let’s work with MPI • Work: make message-passing calls (Send, Receive):
     if(my_rank != 0){
         MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
     } else {
         MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
     }

  11. Work (cont.)
     int MPI_Send(void* message, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
     int MPI_Recv(void* message, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

  12. Hello World!!
     #include <stdio.h>
     #include <string.h>
     #include "mpi.h"
     int main(int argc, char* argv[]) {
         int my_rank, p, source, dest, tag = 0;
         char message[100];
         MPI_Status status;
         MPI_Init(&argc, &argv);
         MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
         MPI_Comm_size(MPI_COMM_WORLD, &p);
         if (my_rank != 0) {
             /* Create message */
             sprintf(message, "Hello from process %d!", my_rank);
             dest = 0;
             MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
         } else {
             for(source = 1; source < p; source++) {
                 MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
                 printf("%s\n", message);
             }
         }
         MPI_Finalize();
         return 0;
     }

  13. Compile and Run MPI
     • Compile
       gcc -o hello.exe mpi_hello.c -lmpi
       mpicc -o hello.exe mpi_hello.c
     • Run
       mpirun -np 5 hello.exe
     • Output
       $ mpirun -np 5 hello.exe
       Hello from process 1!
       Hello from process 2!
       Hello from process 3!
       Hello from process 4!

  14. More MPI Functions • MPI_Bcast(void *m, int s, MPI_Datatype dt, int root, MPI_Comm comm) • Sends a copy of the data in m on the process with rank root to each process in the communicator. • MPI_Reduce(void *operand, void *result, int count, MPI_Datatype datatype, MPI_Op operator, int root, MPI_Comm comm) • Combines the operands stored in the memory referenced by operand using operation operator and stores the result in result on process root. • double MPI_Wtime(void) • Returns a double-precision value that represents the number of seconds that have elapsed since some point in the past. • MPI_Barrier(MPI_Comm comm) • Each process in comm blocks until every process in comm has called it.
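  A minimal sketch of how these four calls fit together; the variable names and the sum-of-ranks computation are illustrative, not from the slides:

     #include <stdio.h>
     #include "mpi.h"
     int main(int argc, char* argv[]) {
         int rank, nprocs, n = 0, sum = 0;
         double t0, t1;
         MPI_Init(&argc, &argv);
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
         if (rank == 0) n = 100;                        /* root chooses a problem size      */
         MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* now every process has n          */
         MPI_Barrier(MPI_COMM_WORLD);                   /* line everyone up before timing   */
         t0 = MPI_Wtime();
         /* each process contributes its rank; MPI_SUM combines them on root (rank 0) */
         MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
         t1 = MPI_Wtime();
         if (rank == 0)
             printf("n=%d, sum of ranks=%d, elapsed=%f s\n", n, sum, t1 - t0);
         MPI_Finalize();
         return 0;
     }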

  15. More Examples • Trapezoidal Rule: • Integral from a to b of a nonnegative function f(x) • Approach: estimate the area by partitioning the region into regular geometric shapes (trapezoids) and then adding their areas; a parallel sketch follows • Compute Pi
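  A minimal sketch of the parallel trapezoidal rule in the style of Pacheco's book (referenced at the end). The integrand f(x) = x*x, the interval [0, 1], and n = 1024 are illustrative choices, not from the slides: each process applies the trapezoid formula h*[f(a)/2 + f(a+h) + ... + f(b)/2] on its own subinterval, and MPI_Reduce adds the partial integrals on rank 0.

     #include <stdio.h>
     #include "mpi.h"
     double f(double x) { return x * x; }            /* illustrative integrand */
     double trap(double a, double b, int n, double h) {
         double area = (f(a) + f(b)) / 2.0;
         int i;
         for (i = 1; i < n; i++) area += f(a + i * h);
         return area * h;
     }
     int main(int argc, char* argv[]) {
         int rank, p, n = 1024, local_n;
         double a = 0.0, b = 1.0, h, local_a, local_b, local_sum, total;
         MPI_Init(&argc, &argv);
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         MPI_Comm_size(MPI_COMM_WORLD, &p);
         h = (b - a) / n;                  /* trapezoid width, same on every process */
         local_n = n / p;                  /* assumes p divides n evenly             */
         local_a = a + rank * local_n * h;
         local_b = local_a + local_n * h;
         local_sum = trap(local_a, local_b, local_n, h);
         MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
         if (rank == 0) printf("Integral from %f to %f = %f\n", a, b, total);
         MPI_Finalize();
         return 0;
     }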

  16. Compute PI
     #include <stdio.h>
     #include <math.h>
     #include "mpi.h"
     #define PI 3.141592653589793238462643
     #define PI_STR "3.141592653589793238462643"
     #define MAXLEN 40
     #define f(x) (4./(1.+ (x)*(x)))
     void main(int argc, char *argv[]){
         int N=0, rank, nprocrs, i, answer=1;
         double mypi, pi, h, sum, x, starttime, endtime, runtime, runtime_max;
         char buff[MAXLEN];
         MPI_Init(&argc, &argv);
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         printf("CPU %d saying hello\n", rank);
         MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);
         if(rank==0) printf("Using a total of %d CPUs\n", nprocrs);

  17. Compute PI
         while(answer){
             if(rank==0){
                 printf("This program computes pi as "
                        "4.*Integral{0->1}[1/(1+x^2)]\n");
                 printf("(Using PI = %s)\n", PI_STR);
                 printf("Input the Number of intervals: N = ");
                 fgets(buff, MAXLEN, stdin);
                 sscanf(buff, "%d", &N);
                 printf("pi will be computed with %d intervals on %d processors\n", N, nprocrs);
             }
             /* Procr 0 = P(0) gives N to all other processors */
             MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
             if(N<=0) goto end_program;

  18. Compute PI
             starttime = MPI_Wtime();
             sum = 0.0;
             h = 1./N;
             for(i=1+rank; i<=N; i+=nprocrs){
                 x = h*(i-0.5);
                 sum += f(x);
             }
             mypi = sum*h;
             endtime = MPI_Wtime();
             runtime = endtime - starttime;
             MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
             MPI_Reduce(&runtime, &runtime_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
             printf("Procr %d: runtime = %f\n", rank, runtime);
             fflush(stdout);
             if(rank==0){
                 printf("For %d intervals, pi = %.14lf, error = %g\n", N, pi, fabs(pi-PI));

  19. Compute PI
                 printf("computed in = %f secs\n", runtime_max);
                 fflush(stdout);
                 printf("Do you wish to try another run? (y=1;n=0) ");
                 fgets(buff, MAXLEN, stdin);
                 sscanf(buff, "%d", &answer);
             }
             /* processors wait while P(0) gets new input from user */
             MPI_Barrier(MPI_COMM_WORLD);
             MPI_Bcast(&answer, 1, MPI_INT, 0, MPI_COMM_WORLD);
             if(!answer) break;
         }
     end_program:
         printf("\nProcr %d: Saying good-bye!\n", rank);
         if(rank==0) printf("\nEND PROGRAM\n");
         MPI_Finalize();
     }

  20. Compile and Run Example 2
     • Compile
       gcc -o pi.exe pi.c -lmpi
     • Run and output
       $ mpirun -np 2 pi.exe
       Procr 1 saying hello
       Procr 0 saying hello
       Using a total of 2 CPUs
       This program computes pi as 4.*Integral{0->1}[1/(1+x^2)]
       (Using PI = 3.141592653589793238462643)
       Input the Number of intervals: N = 10
       pi will be computed with 10 intervals on 2 processors
       Procr 0: runtime = 0.000003
       Procr 1: runtime = 0.000003
       For 10 intervals, pi = 3.14242598500110, error = 0.000833331
       computed in = 0.000003 secs

  21. What is OpenMP? • Similar to MPI, but used for shared-memory parallelism • Simple set of directives • Incremental parallelism • Unfortunately it only works with proprietary compilers…

  22. Compilers and Platforms (taken from www.openmp.org) • Fujitsu/Lahey Fortran, C and C++: Intel Linux Systems; Sun Solaris Systems • HP HP-UX PA-RISC/Itanium: Fortran, C, aC++ • HP Tru64 Unix: Fortran, C, C++ • IBM XL Fortran and C from IBM: IBM AIX Systems • Intel C++ and Fortran Compilers from Intel: Intel IA32 Linux Systems; Intel IA32 Windows Systems; Intel Itanium-based Linux Systems; Intel Itanium-based Windows Systems • Guide Fortran and C/C++ from Intel's KAI Software Lab: Intel Linux Systems; Intel Windows Systems • PGF77 and PGF90 Compilers from The Portland Group, Inc. (PGI): Intel Linux Systems; Intel Solaris Systems; Intel Windows/NT Systems • SGI MIPSpro 7.4 Compilers: SGI IRIX Systems • Sun Microsystems Sun ONE Studio 8, Compiler Collection, Fortran 95, C, and C++: Sun Solaris Platforms; Compiler Collection Portal • VAST from Veridian Pacific-Sierra Research: IBM AIX Systems; Intel IA32 Linux Systems; Intel Windows/NT Systems; SGI IRIX Systems; Sun Solaris Systems

  23. How do you use OpenMP? • C/C++ API • Parallel Construct: when a ‘region’ of the program can be executed by multiple parallel threads, this fundamental construct starts the execution. #pragma omp parallel [clause[ [, ]clause] ...] new-line structured-block The clause is one of the following: if (scalar-expression) private (variable-list) firstprivate (variable-list) default (shared | none) shared (variable-list) copyin (variable-list) reduction (operator: variable-list) num_threads (integer-expression)
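  A minimal sketch of the parallel construct with a couple of these clauses; the variable names are illustrative, and the program assumes an OpenMP-aware compiler (enabled with a vendor-specific flag such as -fopenmp or -mp):

     #include <stdio.h>
     #include <omp.h>
     int main(void) {
         int tid = 0, nthreads = 0;
         /* the structured block below is executed once by every thread in the team */
         #pragma omp parallel private(tid) shared(nthreads) num_threads(4)
         {
             tid = omp_get_thread_num();        /* each thread has its own copy of tid */
             if (tid == 0)                      /* one thread records the team size    */
                 nthreads = omp_get_num_threads();
             printf("Hello from thread %d\n", tid);
         }
         printf("Team had %d threads\n", nthreads);
         return 0;
     }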

  24. Fundamental Constructs • for Construct • Defines an iterative work-sharing construct in which the iterations of the associated loop are executed in parallel. • sections Construct • Identifies a noniterative work-sharing construct that specifies a set of structured blocks that are divided among the threads, each section being executed once by one thread. (A short sketch of both follows.)
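  A minimal sketch of both work-sharing constructs inside one parallel region; the array size and the two dummy sections are illustrative:

     #include <stdio.h>
     #include <omp.h>
     #define N 8
     int main(void) {
         int i, a[N];
         #pragma omp parallel
         {
             /* for construct: the loop iterations are divided among the threads */
             #pragma omp for
             for (i = 0; i < N; i++)
                 a[i] = i * i;

             /* sections construct: each section is executed once, by some thread */
             #pragma omp sections
             {
                 #pragma omp section
                 printf("section 1 on thread %d\n", omp_get_thread_num());
                 #pragma omp section
                 printf("section 2 on thread %d\n", omp_get_thread_num());
             }
         }
         for (i = 0; i < N; i++) printf("a[%d] = %d\n", i, a[i]);
         return 0;
     }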

  25. single Construct • Associates a structured block’s execution with only one thread in the team • parallel for Construct • Shortcut for a parallel region containing only a single for directive • parallel sections Construct • Shortcut for a parallel region containing only a single sections directive (a sketch of the shortcut and of single follows)
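  A minimal sketch of the combined parallel for shortcut and of the single construct; the vector-scaling loop is illustrative:

     #include <stdio.h>
     #define N 16
     int main(void) {
         int i;
         double x[N];

         /* parallel for: creates the team and shares the loop in a single directive */
         #pragma omp parallel for
         for (i = 0; i < N; i++)
             x[i] = 2.0 * i;

         #pragma omp parallel
         {
             /* single: exactly one thread of the team executes this block */
             #pragma omp single
             printf("initialized %d elements, last = %f\n", N, x[N - 1]);
         }
         return 0;
     }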

  26. Master and Synchronization Directives • master Construct • Specifies a structured block that is executed only by the master thread of the team • critical Construct • Restricts execution of the associated structured block to a single thread at a time • barrier Directive • Synchronizes all threads in a team: when this construct is encountered, each thread waits until all the others have reached this point (see the sketch below)
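  A minimal sketch combining the three directives; the shared counter is illustrative:

     #include <stdio.h>
     #include <omp.h>
     int main(void) {
         int count = 0;
         #pragma omp parallel shared(count)
         {
             /* master: only the master thread (thread 0) executes this block */
             #pragma omp master
             printf("team size is %d\n", omp_get_num_threads());

             /* barrier: master has no implied barrier, so synchronize explicitly */
             #pragma omp barrier

             /* critical: one thread at a time updates the shared counter */
             #pragma omp critical
             count++;
         }
         printf("count = %d\n", count);
         return 0;
     }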

  27. atomic Construct • Ensures that a specific memory location is updated ‘atomically’ (meaning only one thread is allowed write access at a time) • flush Directive • Specifies a “cross-thread” sequence point at which all threads in a team are ensured a “clean” view of certain objects in memory • ordered Construct • The structured block following this directive is executed in the same order as it would be in a sequential loop (an atomic sketch follows)
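  A minimal sketch of atomic protecting a shared sum; the loop bound is illustrative. For scalar accumulations like this one, the reduction clause on the next slide is usually the better choice:

     #include <stdio.h>
     #define N 1000
     int main(void) {
         int i, sum = 0;
         #pragma omp parallel for shared(sum)
         for (i = 0; i < N; i++) {
             /* atomic: the read-modify-write of sum is performed atomically */
             #pragma omp atomic
             sum += i;
         }
         printf("sum = %d (expected %d)\n", sum, N * (N - 1) / 2);
         return 0;
     }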

  28. Data • How do we control the data in this SMP environment? • threadprivate Directive • Makes file-scope and namespace-scope variables private to each thread • Data-Sharing Attributes • private: private to each thread • firstprivate • lastprivate • shared: shared among all threads • default: sets the default data-sharing attribute • reduction: perform a reduction on scalars • copyin: assign the same value to threadprivate variables • copyprivate: broadcast the value of a private variable from one member of a team to the others (a reduction sketch follows)
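  A minimal sketch of shared and reduction clauses on a dot product; the vector length is illustrative and simply mirrors the kind of kernel timed on the next slide:

     #include <stdio.h>
     #define N (16 * 1024)
     int main(void) {
         int i;
         static double x[N], y[N];
         double dot = 0.0;

         for (i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

         /* x and y are shared; each thread accumulates its own private copy of dot,
            and the partial sums are combined (the reduction) when the loop ends */
         #pragma omp parallel for shared(x, y) reduction(+:dot)
         for (i = 0; i < N; i++)
             dot += x[i] * y[i];

         printf("dot = %f (expected %f)\n", dot, 2.0 * N);
         return 0;
     }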

  29. Scalability test on SGI Origin 2000 Timing results of the dot product test in milliseconds for n = 16 * 1024. www.public.iastate.edu/~grl/HFP1/hpf.openmp.mpi.June6.2002.html

  30. Timing results of matrix times matrix test in milliseconds for n = 128 www.public.iastate.edu/~grl/HFP1/hpf.openmp.mpi.June6.2002.html

  31. Architecture comparison From http://www.csm.ornl.gov/~dunigan/sgi/

  32. References • Book: Parallel Programming with MPI, Peter Pacheco • www-unix.mcs.anl.gov/mpi • http://alliance.osc.edu/impi/ • http://rocs.acomp.usf.edu/tut/mpi.php • http://www.lam-mpi.org/tutorials/nd/ • www.openmp.org • www.public.iastate.edu/~grl/HFP1/hpf.openmp.mpi.June6.2002.html
