OpenMP Presented by Kyle Eli
OpenMP • Open • Open, Collaborative Specification • Managed by the OpenMP Architecture Review Board (ARB) • MP • Multi Processing
OpenMP is… • A specification for using shared-memory parallelism in Fortran and C/C++. • Compiler Directives • Library Routines • Environment Variables • Usable with • Fortran 77, Fortran 90, ANSI C (C89), or ANSI C++ • Does not require Fortran 90 or C++
OpenMP requires… • Platform support • Many operating systems, including Windows, Solaris, Linux, AIX, HP-UX, IRIX, OSX • Many CPU architectures, including x86, x86-64, PowerPC, Itanium, PA-RISC, MIPS • Compiler support • Many commercial compilers from vendors such as Microsoft, Intel, Sun, and IBM • GCC via GOMP • Should be included in GCC 4.2 • May already be available with some distributions
OpenMP offers… • Consolidation of vendor-specific implementations • Single-source portability • Support for coarse grain parallelism • Allows for complex (coarse grain) code in parallel applications.
OpenMP offers… • Scalability • Simple constructs with low overhead • However, still dependent on the application and algorithm • Nested parallelism • Nested parallel regions may be serialized (executed by a single thread) • Loop-level parallelism • However, no support for task parallelism yet • They’re working on it
OpenMP compared to… • Message Passing Interface (MPI) • OpenMP is not a message passing specification • Less overhead • High Performance Fortran (HPF) • Not widely accepted • Focus on data parallelism
OpenMP compared to… • Pthreads • Not targeted for HPC/scientific computing • No support for data parallelism • Requires lower-level programming • FORALL loops • Simple loops • Subroutine calls can’t have side-effects • Various parallel programming languages • May be architecture specific • May be application specific
The OpenMP Model • Sequential code • Implemented in the usual way • Executes normally • Parallel code • Multiple threads created • Number of threads can be user-specified • Each thread executes the code in the parallel region
Using OpenMP • Compiler directives • Begin with #pragma omp • In C/C++, the code region is defined by curly braces following the directive • Should be ignored by compilers that don’t understand OpenMP • Define how regions of code should be executed • Define variable scope • Synchronization
Using OpenMP • Parallel region construct • #pragma omp parallel • Defines a region of parallel code • Causes a team of threads to be created • Each thread in the team executes the code in the region • Threads join at the end of the region
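For illustration, a minimal sketch of a parallel region (assuming a C compiler with OpenMP enabled):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* A team of threads is created; each thread executes the block. */
        #pragma omp parallel
        {
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }   /* Threads join at the implicit barrier here. */
        return 0;
    }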
Using OpenMP • Work-sharing directives • For • Sections • Single
Using OpenMP • For construct • #pragma omp for • Loop parallelism • Iterations of the loop are divided amongst worker threads • Workload division can be user-specified • Branching out of the loop is not allowed
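A sketch of loop parallelism; the schedule(static) clause is one example of user-specified workload division:

    void scale(double *a, int n)
    {
        int i;
        /* Loop iterations are divided amongst the worker threads;
           the loop variable i is implicitly private. */
        #pragma omp parallel for schedule(static)
        for (i = 0; i < n; i++)
            a[i] *= 2.0;
    }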
Using OpenMP • Sections construct • #pragma omp sections • Divides code into sections which are divided amongst worker threads • #pragma omp section • Used to define each section
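For example, a sketch with two independent tasks placed in separate sections:

    #include <stdio.h>

    void do_two_tasks(void)
    {
        /* Each section is assigned to one of the worker threads,
           so the two tasks may run concurrently. */
        #pragma omp parallel sections
        {
            #pragma omp section
            printf("working on task A\n");

            #pragma omp section
            printf("working on task B\n");
        }
    }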
Using OpenMP • Single construct • #pragma omp single • Only one thread executes the code • Useful when code is not thread-safe • All other threads wait until execution completes
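A sketch of the single construct inside a parallel region:

    #include <stdio.h>
    #include <omp.h>

    void run(void)
    {
        #pragma omp parallel
        {
            /* Exactly one thread performs the (possibly non-thread-safe) setup;
               the other threads wait at the implicit barrier after the block. */
            #pragma omp single
            printf("setup done by thread %d\n", omp_get_thread_num());

            /* All threads continue from here once the setup has completed. */
        }
    }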
Using OpenMP • Synchronization directives • Master • #pragma omp master • Code is executed only by the master thread • Critical • #pragma omp critical • Code is executed by only one thread at a time • Barrier • #pragma omp barrier • Threads will wait for all other threads to reach this point before continuing • Atomic • #pragma omp atomic • The following statement (which must be an assignment) is executed by only one thread at a time.
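A sketch combining these directives (the variables and values are only illustrative):

    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        double sum = 0.0;

        #pragma omp parallel
        {
            /* critical: only one thread at a time executes this block. */
            #pragma omp critical
            count++;

            /* atomic: the following update assignment is performed atomically. */
            #pragma omp atomic
            sum += 1.0;

            /* barrier: every thread waits here until all threads arrive. */
            #pragma omp barrier

            /* master: only the master thread executes this statement. */
            #pragma omp master
            printf("count = %d, sum = %.1f\n", count, sum);
        }
        return 0;
    }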
Using OpenMP • Synchronization Directives • Flush • #pragma omp flush • Thread-visible variables are written back to memory to present a consistent view across all threads • Ordered • #pragma omp ordered • Forces iterations of a loop to be executed in sequential order • Used with the For directive • Threadprivate • #pragma omp threadprivate • Causes global variables to be local and persistent to a thread across multiple parallel regions
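For instance, a sketch of the ordered directive used with a loop:

    #include <stdio.h>

    void print_in_order(int n)
    {
        int i;
        /* The ordered clause on the loop enables the ordered directive inside;
           iterations may run in parallel, but the printf executes in
           sequential iteration order. */
        #pragma omp parallel for ordered
        for (i = 0; i < n; i++) {
            #pragma omp ordered
            printf("iteration %d\n", i);
        }
    }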
Using OpenMP • Data Scope • By default, most variables are shared • Loop index and subroutine stack variables are private
Using OpenMP • Data scoping attributes • Private • New object of the same type is created for each thread • Not initialized • Shared • Shared amongst all threads • Default • Allows specification of default scope (Private, Shared, or None) • Firstprivate • Variable is initialized with the value from the original object
Using OpenMP • Data scoping attributes • Lastprivate • Original object is updated with data from last section or loop iteration • Copyin • Variable in each thread is initialized with the data from the original object in the master thread • Reduction • Each thread gets a private copy of the variable, and the reduction clause allows specification of an operator for combining the private copies into the final result
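For example, a sketch of a dot product using the reduction clause:

    double dot(const double *a, const double *b, int n)
    {
        int i;
        double result = 0.0;

        /* reduction(+:result): each thread works on a private copy of result;
           the private copies are combined with + into the shared variable
           when the loop finishes. */
        #pragma omp parallel for reduction(+:result)
        for (i = 0; i < n; i++)
            result += a[i] * b[i];

        return result;
    }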
OpenMP Example • A short OpenMP example…
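The example itself is not included here; as a stand-in, a minimal sketch that estimates pi with a parallel loop and a reduction:

    #include <stdio.h>

    int main(void)
    {
        const int n = 1000000;
        const double h = 1.0 / n;
        double sum = 0.0;
        int i;

        /* Each thread accumulates part of the integral; the reduction
           combines the partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++) {
            double x = (i + 0.5) * h;
            sum += 4.0 / (1.0 + x * x);
        }

        printf("pi is approximately %.12f\n", sum * h);
        return 0;
    }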
References • http://www.openmp.org • http://www.llnl.gov/computing/tutorials/openMP/
MPI By Chris Van Horn
What is MPI? • Message Passing Interface • More specifically a library specification for a message passing interface
Why MPI? • What are the advantages of a message passing interface? • What could a message passing interface be used for?
History of MPI • MPI 1.1 • Before MPI, everyone had to implement their own message passing interface • A committee was formed of around 60 people from 40 organizations
MPI 1.1 • The standardization process began in April 1992 • Preliminary draft submitted November 1992 • Just meant to get the ball rolling
MPI 1.1 Continued • Subcommittees were formed for the major component areas • Goal to produce standard by Fall 1993
MPI Goals • Design an application programming interface (not necessarily for compilers or a system implementation library). • Allow efficient communication: Avoid memory-to-memory copying and allow overlap of computation and communication and offload to communication co-processor, where available. • Allow for implementations that can be used in a heterogeneous environment. • Allow convenient C and Fortran 77 bindings for the interface. • Assume a reliable communication interface: the user need not cope with communication failures. Such failures are dealt with by the underlying communication subsystem.
MPI Goals Continued • Define an interface that is not too different from current practice, such as PVM, NX, Express, p4, etc., and provides extensions that allow greater flexibility. • Define an interface that can be implemented on many vendors' platforms, with no significant changes in the underlying communication and system software. • Semantics of the interface should be language independent. • The interface should be designed to allow for thread-safety.
MPI 2.0 • In March 1995 work began on extensions to MPI 1.1 • Forward Compatibility was preserved
Goals of MPI 2.0 • Further corrections and clarifications for the MPI-1.1 document. • Additions to MPI-1.1 that do not significantly change its types of functionality (new datatype constructors, language interoperability, etc.). • Completely new types of functionality (dynamic processes, one-sided communication, parallel I/O, etc.) that are what everyone thinks of as "MPI-2 functionality." • Bindings for Fortran 90 and C++. This document specifies C++ bindings for both MPI-1 and MPI-2 functions, and extensions to the Fortran 77 binding of MPI-1 and MPI-2 to handle Fortran 90 issues. • Discussions of areas in which the MPI process and framework seem likely to be useful, but where more discussion and experience are needed before standardization (e.g. 0-copy semantics on shared-memory machines, real-time specifications).
How MPI is used • An MPI program consists of autonomous processes • The processes communicate via calls to MPI communication primitives
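A minimal sketch of such a program, in which process 0 sends one integer to process 1 (run with at least two processes):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Process 1 received %d from process 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }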
Features • Process Management • One Sided Communication • Collective Operations • I/O
What MPI Does Not Do • Resource control • It was not possible to design a portable interface that would be appropriate for the broad spectrum of existing and potential resource and process controllers.
Process Management • Can be tricky to implement properly • What to watch out for: • The MPI-2 process model must apply to the vast majority of current parallel environments. These include everything from tightly integrated MPPs to heterogeneous networks of workstations. • MPI must not take over operating system responsibilities. It should instead provide a clean interface between an application and system software.
Warnings continued • MPI must continue to guarantee communication determinism, i.e., process management must not introduce unavoidable race conditions. • MPI must not contain features that compromise performance. • MPI-1 programs must work under MPI-2, i.e., the MPI-1 static process model must be a special case of the MPI-2 dynamic model.
How Issues Addressed • MPI remains primarily a communication library. • MPI does not change the concept of communicator.
One Sided Communication • Functions that establish communication between two sets of MPI processes that do not share a communicator. • When would one sided communication be useful?
One Sided Communication • How are the two sets of processes going to communicate with each other? • Need some sort of rendezvous point.
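In MPI-2 the rendezvous point is typically a port name; a hedged sketch of the server side using MPI_Open_port, MPI_Comm_accept and MPI_Close_port (the client side would call MPI_Comm_connect with the same port name):

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        char port_name[MPI_MAX_PORT_NAME];
        MPI_Comm client;

        MPI_Init(&argc, &argv);

        /* Open a port and advertise it; clients pass the same name
           to MPI_Comm_connect. */
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("Clients should connect to port: %s\n", port_name);

        /* Blocks until the other set of processes connects; the result
           is an intercommunicator linking the two groups. */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

        /* ... communicate with the clients via "client" ... */

        MPI_Comm_disconnect(&client);
        MPI_Close_port(port_name);
        MPI_Finalize();
        return 0;
    }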
Collective Operations • Intercommunicator collective operations • All-To-All • All processes contribute to the result. All processes receive the result. • MPI_Allgather, MPI_Allgatherv • MPI_Alltoall, MPI_Alltoallv • MPI_Allreduce, MPI_Reduce_scatter • All-To-One • All processes contribute to the result. One process receives the result. • MPI_Gather, MPI_Gatherv • MPI_Reduce
Collective Operations • One-To-All • One process contributes to the result. All processes receive the result. • MPI_Bcast • MPI_Scatter, MPI_Scatterv • Other • Collective operations that do not fit into one of the above categories. • MPI_Scan • MPI_Barrier
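For illustration, a sketch combining a One-To-All broadcast with an All-To-One reduction:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank, n, local, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* One-To-All: the root (rank 0) contributes, everyone receives. */
        if (rank == 0) n = 100;
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* All-To-One: every process contributes, only the root receives. */
        local = rank * n;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("total = %d\n", total);

        MPI_Finalize();
        return 0;
    }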
I/O • Optimizations required for efficiency can only be implemented if the parallel I/O system provides a high-level interface
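A hedged sketch of that high-level interface: each process writes its rank into its own slot of a shared file (the filename "datafile" is only illustrative):

    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int rank;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File_open(MPI_COMM_WORLD, "datafile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each process writes at an offset determined by its rank. */
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int),
                          &rank, 1, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }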
MPI Implementations • Many different implementations exist; the most widely used are MPICH (MPI 1.1) and MPICH2 (MPI 2.0) • Developed at Argonne National Laboratory
Examples • To run the program "ocean" with arguments "-gridfile" and "ocean1.grd" in C:

    char command[] = "ocean";
    char *argv[] = {"-gridfile", "ocean1.grd", NULL};
    MPI_Comm_spawn(command, argv, ...);

• To run the program "ocean" with arguments "-gridfile" and "ocean1.grd" and the program "atmos" with argument "atmos.grd" in C:

    char *array_of_commands[2] = {"ocean", "atmos"};
    char **array_of_argv[2];
    char *argv0[] = {"-gridfile", "ocean1.grd", (char *)0};
    char *argv1[] = {"atmos.grd", (char *)0};
    array_of_argv[0] = argv0;
    array_of_argv[1] = argv1;
    MPI_Comm_spawn_multiple(2, array_of_commands, array_of_argv, ...);
More Examples

    /* manager */
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int world_size, universe_size, *universe_sizep, flag;
        MPI_Comm everyone;            /* intercommunicator */
        char worker_program[100];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        if (world_size != 1) error("Top heavy with management");

        MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                     &universe_sizep, &flag);
Example Continued

        if (!flag) {
            printf("This MPI does not support UNIVERSE_SIZE. How many\n\
processes total?");
            scanf("%d", &universe_size);
        } else universe_size = *universe_sizep;
        if (universe_size == 1) error("No room to start workers");

        /*
         * Now spawn the workers. Note that there is a run-time determination
         * of what type of worker to spawn, and presumably this calculation must
         * be done at run time and cannot be calculated before starting
         * the program. If everything is known when the application is
         * first started, it is generally better to start them all at once
         * in a single MPI_COMM_WORLD.
         */
        choose_worker_program(worker_program);
        MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, universe_size-1,
                       MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
                       MPI_ERRCODES_IGNORE);

        /*
         * Parallel code here. The communicator "everyone" can be used
         * to communicate with the spawned processes, which have ranks 0,..
         * MPI_UNIVERSE_SIZE-1 in the remote group of the intercommunicator
         * "everyone".
         */
        MPI_Finalize();
        return 0;
    }
Yet More Example

    /* worker */
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int size;
        MPI_Comm parent;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL) error("No parent!");
        MPI_Comm_remote_size(parent, &size);
        if (size != 1) error("Something's wrong with the parent");

        /*
         * Parallel code here.
         * The manager is represented as the process with rank 0 in (the remote
         * group of) MPI_COMM_PARENT. If the workers need to communicate among
         * themselves, they can use MPI_COMM_WORLD.
         */
        MPI_Finalize();
        return 0;
    }
References • MPI Standards (http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/mpi2-report.htm)