MPI and OpenMP
How to Get and Install MPI? • http://www-unix.mcs.anl.gov/mpi/ • mpich2-doc-install.pdf // installation guide • mpich2-doc-user.pdf // user guide
Outline • Message-passing model • Message Passing Interface (MPI) • Coding MPI programs • Compiling MPI programs • Running MPI programs • Benchmarking MPI programs • OpenMP
Processes • Number is specified at start-up time • Remains constant throughout execution of program • All execute same program • Each has unique ID number • Alternately performs computations and communicates
Circuit Satisfiability (figure: an example 16-input circuit; the sample input combination shown is not satisfied)
Solution Method • Circuit satisfiability is NP-complete • No known algorithms to solve in polynomial time • We seek all solutions • We find them through exhaustive search • 16 inputs ⇒ 65,536 combinations to test
Partitioning: Functional Decomposition • Embarrassingly parallel: No channels between tasks
Agglomeration and Mapping • Properties of parallel algorithm • Fixed number of tasks • No communications between tasks • Time needed per task is variable • Consult mapping strategy decision tree • Map tasks to processors in a cyclic fashion
Cyclic (interleaved) Allocation • Assume p processes • Each process gets every pth piece of work • Example: 5 processes and 12 pieces of work • P0: 0, 5, 10 • P1: 1, 6, 11 • P2: 2, 7 • P3: 3, 8 • P4: 4, 9
Summary of Program Design • Program will consider all 65,536 combinations of 16 boolean inputs • Combinations allocated in cyclic fashion to processes • Each process examines each of its combinations • If it finds a satisfiable combination, it will print it
Include Files • MPI header file: #include <mpi.h> • Standard I/O header file: #include <stdio.h>
Local Variables int main (int argc, char *argv[]) { int i; int id; /* Process rank */ int p; /* Number of processes */ void check_circuit (int, int); • Include argc and argv: they are needed to initialize MPI • One copy of every variable for each process running this program
Initialize MPI • First MPI function called by each process • Not necessarily first executable statement • Allows system to do any necessary setup MPI_Init (&argc, &argv);
Communicators • Communicator: opaque object that provides message-passing environment for processes • MPI_COMM_WORLD • Default communicator • Includes all processes • Possible to create new communicators • Will do this in Chapters 8 and 9
Communicators (figure: the MPI_COMM_WORLD communicator containing six processes with ranks 0 through 5)
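Creating new communicators is deferred to Chapters 8 and 9; purely as a preview sketch (not taken from those chapters), MPI_Comm_split can divide MPI_COMM_WORLD into smaller communicators, here into even- and odd-ranked groups:

/* Sketch: splitting MPI_COMM_WORLD into two sub-communicators
 * (even-ranked and odd-ranked processes).                      */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
    int world_rank, sub_rank;
    MPI_Comm sub_comm;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &world_rank);

    /* "color" selects which new communicator each process joins */
    MPI_Comm_split (MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);
    MPI_Comm_rank (sub_comm, &sub_rank);

    printf ("World rank %d has rank %d in its sub-communicator\n",
            world_rank, sub_rank);

    MPI_Comm_free (&sub_comm);
    MPI_Finalize ();
    return 0;
}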
Determine Number of Processes • First argument is communicator • Number of processes returned through second argument MPI_Comm_size (MPI_COMM_WORLD, &p);
Determine Process Rank • First argument is communicator • Process rank (in range 0, 1, …, p-1) returned through second argument MPI_Comm_rank (MPI_COMM_WORLD, &id);
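Taken together, the calls introduced so far already form a complete (if trivial) MPI program; a minimal sketch, separate from the circuit example, in which every process reports its rank:

/* Minimal MPI sketch: each process reports its rank and the
 * total number of processes in MPI_COMM_WORLD.               */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
    int id;   /* process rank        */
    int p;    /* number of processes */

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &p);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);

    printf ("Process %d of %d is alive\n", id, p);

    MPI_Finalize ();
    return 0;
}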
Replication of Automatic Variables (figure: six processes, each holding its own copy of id and p; p is 6 in every process, while id ranges over 0–5)
What about External Variables? int total; int main (int argc, char *argv[]) { int i; int id; int p; … • Where is variable total stored?
Cyclic Allocation of Work for (i = id; i < 65536; i += p) check_circuit (id, i); • Parallelism is outside function check_circuit • It can be an ordinary, sequential function
Shutting Down MPI • Call after all other MPI library calls • Allows system to free up MPI resources MPI_Finalize();
Our Call to MPI_Reduce() • Only process 0 will get the result MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); if (!id) printf ("There are %d different solutions\n", global_count);
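Assembling the fragments above gives a sketch of the whole program. Note that check_circuit is given an int return value here so that solutions can be counted for MPI_Reduce, which differs slightly from the void prototype shown earlier, and its body is only a placeholder:

/* Sketch of the circuit-satisfiability program assembled from the
 * fragments above; check_circuit is only a placeholder.           */
#include <mpi.h>
#include <stdio.h>

int check_circuit (int id, int z);   /* returns 1 if input z satisfies the circuit */

int main (int argc, char *argv[]) {
    int i;
    int id;            /* process rank                               */
    int p;             /* number of processes                        */
    int count = 0;     /* solutions found by this process            */
    int global_count;  /* total solutions (valid on process 0 only)  */

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &p);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);

    /* cyclic allocation: process id tests combinations id, id+p, id+2p, ... */
    for (i = id; i < 65536; i += p)
        count += check_circuit (id, i);

    MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (!id)
        printf ("There are %d different solutions\n", global_count);

    MPI_Finalize ();
    return 0;
}

int check_circuit (int id, int z) {
    /* Placeholder: the real function extracts the 16 bits of z,
     * evaluates the circuit, and prints z if it is satisfied.    */
    return 0;
}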
Benchmarking the Program • MPI_Barrier barrier synchronization • MPI_Wtick timer resolution • MPI_Wtime current time
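A sketch of how these three calls are typically combined to time a section of code; the negated-start-time idiom and the placeholder work section are illustrative only:

/* Sketch: timing a parallel section with MPI_Barrier / MPI_Wtime. */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
    int id;
    double elapsed_time;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);

    MPI_Barrier (MPI_COMM_WORLD);        /* start all processes together */
    elapsed_time = -MPI_Wtime();         /* record (negated) start time  */

    /* ... work to be timed goes here ... */

    elapsed_time += MPI_Wtime();         /* add end time -> elapsed time */

    if (!id)
        printf ("Elapsed time: %f s (timer resolution %f s)\n",
                elapsed_time, MPI_Wtick());

    MPI_Finalize ();
    return 0;
}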
How to form a Ring? • To form a ring of systems, first execute the following command: [nayan@MPI_system1 ~]$ mpd & [1] 9152 This makes MPI_system1 the master system of the ring. [nayan@MPI_system1 ~]$ mpdtrace -l MPI_system1_32958 (172.16.1.1)
How to form a Ring? (cont’d) • Then run the following command in a terminal on every other system: [nayan@MPI_system1 ~]$ mpd -h MPI_system1 -p 32958 & Here MPI_system1 is the host name of the master system and 32958 is the port number of the master system (as reported by mpdtrace -l).
How to kill a Ring? • To kill the ring, run the following command on the master system: [nayan@MPI_system1 ~]$ mpdallexit
Compiling MPI Programs • To compile and execute the above program, follow these steps. • First, compile sat1.c by executing the following command on the master system: [nayan@MPI_system1 ~]$ mpicc -o sat1.out sat1.c Here mpicc is the MPI compiler wrapper used to compile the sat1.c file, and sat1.out is the output (executable) file.
Running MPI Programs • To run this output file, type the following command on the master system (the -n flag specifies the number of processes): [nayan@MPI_system1 ~]$ mpiexec -n 1 ./sat1.out
OpenMP • OpenMP: An application programming interface (API) for parallel programming on multiprocessors • Compiler directives • Library of support functions • OpenMP works in conjunction with Fortran, C, or C++
Shared-memory Model Processors interact and synchronize with each other through shared variables.
Fork/Join Parallelism • Initially only master thread is active • Master thread executes sequential code • Fork: Master thread creates or awakens additional threads to execute parallel code • Join: At end of parallel code created threads die or are suspended
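A minimal sketch of fork/join behaviour: one thread exists before and after the parallel region, and a team of threads (however many the runtime chooses) executes inside it:

/* Sketch of fork/join: one thread before and after the parallel
 * region, several threads inside it.                             */
#include <omp.h>
#include <stdio.h>

int main (void) {
    printf ("Before the parallel region: 1 thread\n");

    #pragma omp parallel              /* fork: team of threads created */
    {
        printf ("Hello from thread %d of %d\n",
                omp_get_thread_num(), omp_get_num_threads());
    }                                 /* join: only master continues   */

    printf ("After the parallel region: 1 thread\n");
    return 0;
}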
Shared-memory Model vs. Message-passing Model • Shared-memory model • Number of active threads is 1 at start and finish of the program; it changes dynamically during execution • Message-passing model • All processes are active throughout execution of the program
Parallel for Loops • C programs often express data-parallel operations as for loops for (i = first; i < size; i += prime) marked[i] = 1; • OpenMP makes it easy to indicate when the iterations of a loop may execute in parallel • Compiler takes care of generating code that forks/joins threads and allocates the iterations to threads
Pragmas • Pragma: a compiler directive in C or C++ • Stands for “pragmatic information” • A way for the programmer to communicate with the compiler • Compiler is free to ignore pragmas • Syntax: #pragma omp <rest of pragma>
Parallel for Pragma • Format: #pragma omp parallel for for (i = 0; i < N; i++) a[i] = b[i] + c[i]; • Compiler must be able to verify the run-time system will have information it needs to schedule loop iterations
Function omp_get_num_procs • Returns number of physical processors available for use by the parallel program int omp_get_num_procs (void)
Function omp_set_num_threads • Uses the parameter value to set the number of threads to be active in parallel sections of code • May be called at multiple points in a program void omp_set_num_threads (int t)
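A minimal sketch combining omp_get_num_procs and omp_set_num_threads with the parallel for pragma; the array size and contents are arbitrary:

/* Sketch: set the thread count to the number of available processors,
 * then execute a loop in parallel.                                     */
#include <omp.h>
#include <stdio.h>

#define N 1000

int main (void) {
    int i;
    double a[N], b[N], c[N];

    for (i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }   /* serial setup */

    omp_set_num_threads (omp_get_num_procs());   /* one thread per processor */

    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf ("a[N-1] = %f\n", a[N-1]);
    return 0;
}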
C+MPI vs. C+MPI+OpenMP (figure comparing a pure C+MPI process layout with a hybrid C+MPI+OpenMP layout)
How to run an OpenMP program $ export OMP_NUM_THREADS=2 $ gcc -fopenmp filename.c $ ./a.out
For more information • http://www.cs.ccu.edu.tw/~naiwei/cs5635/cs5635.html • http://www.nersc.gov/nusers/help/tutorials/mpi/intro/print.php • http://www-unix.mcs.anl.gov/mpi/tutorial/mpiintro/ppframe.htm • http://www.mhpcc.edu/training/workshop/mpi/MAIN.html • http://www.openmp.org