The hybrid approach to programming clusters of multi-core architectures Hybrid OpenMP/MPI
High-performance computing • Cluster computing • To achieve high performance, many computers are networked together to work on computation-heavy problems as a whole • Multi-core computers • Once maximum single-core speeds began to level off, multi-core computers became the popular solution
High-performance computing • Multi-core clusters • As multi-core computers became popular, cluster nodes began to adopt this design • Multi-core nodes promise more computing power per node while saving power and space • Programming these multi-core nodes, however, may call for a different approach
Programming • Cluster computers • The standard model for programming clusters is MPI • MPI, the Message Passing Interface, is used mostly with distributed memory • Multi-core computers • Multi-core computers have shared memory • An efficient way to program for shared memory is OpenMP
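As a minimal sketch of the distributed-memory MPI model (standard MPI calls only; the printed message is illustrative): each process has its own rank and its own address space, and nothing is shared implicitly.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    /* Each process has a private address space; all sharing must
       happen through explicit messages. */
    printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```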
Multi-core clusters • When programming for multi-core clusters, MPI treats cores on the same node and on different nodes in the same way • MPI completely ignores the fact that cores on a single node share memory • This causes problems because data is duplicated across the processes on a single multi-core node
MPI • MPI uses communicator objects, which connect groups of processes in the MPI session • MPI supports point-to-point communication between two specific processes • Collective functions are used to communicate among all processes in a process group
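A short sketch contrasting the two styles, using standard MPI calls (MPI_Bcast stands in for the collective; run with at least two processes):

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Point-to-point: rank 0 sends one value to rank 1. */
    int value = 42;
    if (rank == 0)
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    /* Collective: every process in the communicator receives
       rank 0's value. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```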
Problems with MPI • Collectives • Some collectives take array arguments whose size equals the number of processes in the communicator • Examples of such collectives are MPI_Gatherv, MPI_Scatterv, and MPI_Alltoallv • MPI_Alltoallv takes four such integer arrays (send counts, send displacements, receive counts, receive displacements); with a million processes and 4-byte integers, each array alone occupies 4 MB on every process
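A sketch of why these arguments grow with the communicator: the four count/displacement arrays below are required by MPI_Alltoallv itself, while the function name and the one-element-per-peer pattern are illustrative assumptions.

```c
#include <mpi.h>
#include <stdlib.h>

/* Illustrative only: sendbuf and recvbuf must each hold at least
   nprocs ints. With nprocs = 1,000,000 and 4-byte ints, EACH of the
   four argument arrays occupies 4 MB on every process. */
void alltoallv_exchange(int *sendbuf, int *recvbuf, MPI_Comm comm) {
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    int *sendcounts = malloc(nprocs * sizeof(int));
    int *sdispls    = malloc(nprocs * sizeof(int));
    int *recvcounts = malloc(nprocs * sizeof(int));
    int *rdispls    = malloc(nprocs * sizeof(int));

    for (int p = 0; p < nprocs; p++) {  /* one element to/from each peer */
        sendcounts[p] = 1;  sdispls[p] = p;
        recvcounts[p] = 1;  rdispls[p] = p;
    }

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT, comm);

    free(sendcounts); free(sdispls); free(recvcounts); free(rdispls);
}
```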
Solution • Combining OpenMP and MPI is a worthy solution • MPI handles the communication between nodes across distributed memory • OpenMP handles the work within a single node on shared memory
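A minimal hybrid hello-world sketch, assuming an MPI library with MPI_THREAD_FUNNELED support; it demonstrates only the division of labor, not a real computation:

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* Ask for FUNNELED support: only the main thread will call MPI. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI handles the node level; OpenMP spawns threads that share
       memory inside the node. */
    #pragma omp parallel
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```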
Implementation • Since OpenMP is not an all-or-nothing model, it can be introduced into selected parts of the program • One can identify the most time-consuming loops and place OpenMP directives on them • One can also place directives on loops over undistributed dimensions
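For example, parallelizing a single hot loop looks like the sketch below; the function and variable names are made up for illustration, and the rest of the program stays serial.

```c
/* One directive on the time-consuming loop is the whole change;
   compile with OpenMP enabled (e.g., -fopenmp). */
void scale_rows(double *a, const double *b, int n, double alpha) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = alpha * b[i];
}
```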
Hybrid masteronly • In this model only one MPI process is used per node • All work within the node is done by OpenMP threads • Problems • The other threads in the node idle while MPI communication is taking place • MPI bandwidth is not always fully used by a single communicating thread
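A sketch of the masteronly pattern for a hypothetical 1-D stencil step (the halo exchange sends in one direction only, to keep it short); all names are illustrative:

```c
#include <mpi.h>

/* Masteronly sketch: MPI is called only outside the parallel region,
   so every thread except the communicating one sits idle during the
   exchange. 'left' and 'right' are precomputed neighbor ranks. */
void timestep(double *u, double *u_new, int n, int left, int right) {
    /* Communication phase: a single thread, all others idle. */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                 &u[n - 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Computation phase: all OpenMP threads participate. */
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);
}
```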
Hybrid with overlap • This method avoids idling compute threads during communication • Communication is assigned to one or more OpenMP threads, which handle it in parallel with useful computation on the remaining threads
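A sketch of the overlap pattern for the same hypothetical 1-D stencil, assuming at least MPI_THREAD_FUNNELED and two or more threads; the work is partitioned by hand because an `omp for` worksharing construct cannot be entered by only part of the thread team:

```c
#include <mpi.h>
#include <omp.h>

/* Overlap sketch: thread 0 does the halo exchange while the other
   threads update interior points that do not depend on the halo.
   Assumes the team has at least two threads. */
void timestep_overlap(double *u, double *u_new, int n,
                      int left, int right) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nth = omp_get_num_threads();

        if (tid == 0) {
            /* Communication thread (funneled: only this thread
               calls MPI). */
            MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                         &u[n - 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            /* Compute threads 1..nth-1 split the interior points
               1..n-2 among themselves by hand. */
            int workers = nth - 1;
            int chunk = (n - 2 + workers - 1) / workers;
            int lo = 1 + (tid - 1) * chunk;
            int hi = (lo + chunk < n - 1) ? lo + chunk : n - 1;
            for (int i = lo; i < hi; i++)
                u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);
        }
        /* Implicit barrier: the halo is in place before any
           halo-dependent work in the next phase. */
    }
}
```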
Conclusion • The hybrid approach has advantages over pure MPI in some cases • This does not hold for all cases; some show the opposite effect • If programmed well, hybrid applications can be highly scalable and deliver a large performance increase