This paper presents a performance comparison of pure message-passing and hybrid MPI-OpenMP parallelization models on SMP clusters. The study includes an evaluation of hyperplane scheduling, a fine-grain hybrid model, and a coarse-grain hybrid model. Experimental results and future work are also discussed.
Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters Nikolaos Drosinos and Nectarios Koziris National Technical University of Athens Computing Systems Laboratory {ndros,nkoziris}@cslab.ece.ntua.gr www.cslab.ece.ntua.gr
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Motivation • Active research interest in • SMP clusters • Hybrid programming models • However: • Mostly fine-grain hybrid paradigms (masteronly model) • Mostly DOALL multi-threaded parallelization IPDPS 2004
Contribution • Comparison of 3 programming models for the parallelization of tiled loop algorithms • pure message-passing • fine-grain hybrid • coarse-grain hybrid • Advanced hyperplane scheduling • minimizes synchronization needs • overlaps computation with communication • preserves data dependencies IPDPS 2004
Algorithmic Model
Tiled nested loops with constant flow data dependencies:

FORACROSS tile_0 DO
  ...
  FORACROSS tile_(n-2) DO
    FOR tile_(n-1) DO
      Receive(tile);
      Compute(tile);
      Send(tile);
    END FOR
  END FORACROSS
  ...
END FORACROSS

IPDPS 2004
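For concreteness, here is a minimal serial sketch of such a tiled nest for a hypothetical 2D instance with unit flow dependencies. The array, sizes, and tile widths are illustrative and not taken from the paper; the Receive/Send steps appear only in the distributed versions that follow.

/* Hypothetical serial 2D instance of the tiled model: a loop nest with
 * unit flow dependencies, blocked into T0 x T1 tiles. */
#include <stdio.h>

#define N0 64
#define N1 64
#define T0 16          /* tile size along dimension 0 */
#define T1 16          /* tile size along dimension 1 */

static double A[N0][N1];

int main(void)
{
    /* The outer loops enumerate tiles; the inner loops sweep the points of
     * one tile.  The i-1 / j-1 references are the constant flow dependencies
     * that force tiles to execute in a pipelined (wavefront) order. */
    for (int ti = 0; ti < N0 / T0; ti++)
        for (int tj = 0; tj < N1 / T1; tj++)
            for (int i = ti * T0; i < (ti + 1) * T0; i++)
                for (int j = tj * T1; j < (tj + 1) * T1; j++) {
                    double up   = (i > 0) ? A[i - 1][j] : 1.0;
                    double left = (j > 0) ? A[i][j - 1] : 1.0;
                    A[i][j] = 0.5 * (up + left);
                }

    printf("A[N0-1][N1-1] = %f\n", A[N0 - 1][N1 - 1]);
    return 0;
}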
Target Architecture SMP clusters IPDPS 2004
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Pure Message-passing Model

tile_0 = pr_0;
...
tile_(n-2) = pr_(n-2);
FOR tile_(n-1) = 0 TO ... DO
  Pack(snd_buf, tile_(n-1) - 1, pr);
  MPI_Isend(snd_buf, dest(pr));
  MPI_Irecv(recv_buf, src(pr));
  Compute(tile);
  MPI_Waitall;
  Unpack(recv_buf, tile_(n-1) + 1, pr);
END FOR

IPDPS 2004
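A possible self-contained C/MPI rendering of this pipeline for a hypothetical 2D instance: rows are block-distributed over ranks and the column dimension is cut into tiles of width W. The 2-step pipeline skew (rank p computes tile t at step 2p + t) is one plausible scheduling choice that lets MPI_Isend/MPI_Irecv overlap with Compute; all sizes and names are illustrative, not the paper's code.

/* Hypothetical pure MPI pipeline sketch (2D instance). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NROWS  64      /* rows owned by each rank (distributed dimension) */
#define NCOLS  256     /* pipelined dimension, cut into tiles of width W  */
#define W      32
#define NTILES (NCOLS / W)

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Row 0 is the halo row holding the upper neighbour's boundary. */
    double (*A)[NCOLS] = malloc((NROWS + 1) * sizeof *A);
    for (int i = 0; i <= NROWS; i++)
        for (int j = 0; j < NCOLS; j++)
            A[i][j] = 1.0;

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;  /* src(pr)  */
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;  /* dest(pr) */
    double snd_buf[W] = {0.0}, rcv_buf[W] = {0.0};

    /* Rank p computes its local tile t at global step k = 2*p + t, so the
     * boundary produced at step k-1 travels to rank p+1 while tile t is
     * being computed, and is consumed there one step later. */
    for (int k = 0; k < 2 * (size - 1) + NTILES; k++) {
        int t = k - 2 * rank;                 /* tile computed at this step */
        MPI_Request req[2];

        if (t - 1 >= 0 && t - 1 < NTILES)     /* pack boundary of previous tile */
            memcpy(snd_buf, &A[NROWS][(t - 1) * W], W * sizeof(double));
        MPI_Isend(snd_buf, W, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(rcv_buf, W, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &req[1]);

        if (t >= 0 && t < NTILES)             /* compute current tile */
            for (int i = 1; i <= NROWS; i++)
                for (int j = t * W; j < (t + 1) * W; j++) {
                    double left = (j > 0) ? A[i][j - 1] : 1.0;
                    A[i][j] = 0.5 * (A[i - 1][j] + left);
                }

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        if (t + 1 >= 0 && t + 1 < NTILES && up != MPI_PROC_NULL)
            memcpy(&A[0][(t + 1) * W], rcv_buf, W * sizeof(double)); /* halo for next tile */
    }

    if (rank == size - 1)
        printf("A[%d][%d] = %f\n", NROWS, NCOLS - 1, A[NROWS][NCOLS - 1]);
    free(A);
    MPI_Finalize();
    return 0;
}

Redundant boundary messages at pipeline fill/drain steps are sent and simply ignored, which keeps every Isend matched by an Irecv in the same step.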
Pure Message-passing Model IPDPS 2004
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Hyperplane Scheduling • Implements coarse-grain parallelism assuming inter-tile data dependencies • Tiles are organized into data-independent subsets (groups) • Tiles of the same group can be concurrently executed by multiple threads • Barrier synchronization between threads IPDPS 2004
Hyperplane Scheduling [figure: each tile is identified by (mpi_rank, omp_tid, tile) and assigned to a group] IPDPS 2004
Hyperplane Scheduling

#pragma omp parallel
{
  group_0 = pr_0;
  ...
  group_(n-2) = pr_(n-2);
  tile_0 = pr_0 * m_0 + th_0;
  ...
  tile_(n-2) = pr_(n-2) * m_(n-2) + th_(n-2);
  FOR(group_(n-1)) {
    tile_(n-1) = group_(n-1) - ...;
    if (0 <= tile_(n-1) <= ...)
      compute(tile);
    #pragma omp barrier
  }
}

IPDPS 2004
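A shared-memory-only sketch of this schedule for a hypothetical 2D instance, assuming the thread count divides the row dimension: thread tid owns a block of rows, the column dimension is cut into NTILES tiles, and tile (tid, t) is assigned to group g = tid + t, so tile_(n-1) = group - tid as in the pseudocode above. All names and sizes are illustrative.

/* Hypothetical OpenMP hyperplane schedule (2D instance). */
#include <omp.h>
#include <stdio.h>

#define N      256     /* rows and columns */
#define W      32      /* tile width along the pipelined dimension */
#define NTILES (N / W)

static double A[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = 1.0;

    #pragma omp parallel
    {
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        int rows     = N / nthreads;   /* row block owned by this thread
                                          (assumes nthreads divides N) */

        /* Groups 0 .. (nthreads-1) + (NTILES-1) form the wavefront;
         * within a group all tiles are data-independent. */
        for (int g = 0; g < nthreads + NTILES - 1; g++) {
            int t = g - tid;           /* tile_(n-1) = group - tile_0 */
            if (t >= 0 && t < NTILES)
                for (int i = tid * rows; i < (tid + 1) * rows; i++)
                    for (int j = t * W; j < (t + 1) * W; j++) {
                        double up   = (i > 0) ? A[i - 1][j] : 1.0;
                        double left = (j > 0) ? A[i][j - 1] : 1.0;
                        A[i][j] = 0.5 * (up + left);
                    }
            #pragma omp barrier        /* groups execute in order */
        }
    }

    printf("A[N-1][N-1] = %f\n", A[N - 1][N - 1]);
    return 0;
}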
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Fine-grain Model • Incremental parallelization of computationally intensive parts • Pure MPI + hyperplane scheduling • Inter-node communication outside of multi-threaded part (MPI_THREAD_MASTERONLY) • Thread synchronization through implicit barrier of omp parallel directive IPDPS 2004
Fine-grain Model

FOR(group_(n-1)) {
  Pack(snd_buf, tile_(n-1) - 1, pr);
  MPI_Isend(snd_buf, dest(pr));
  MPI_Irecv(recv_buf, src(pr));
  #pragma omp parallel
  {
    thread_id = omp_get_thread_num();
    if (valid(tile, thread_id, group_(n-1)))
      Compute(tile);
  }
  MPI_Waitall;
  Unpack(recv_buf, tile_(n-1) + 1, pr);
}

IPDPS 2004
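A possible self-contained rendering of the fine-grain (masteronly) model, reusing the pipeline skew of the pure MPI sketch above for a hypothetical 3D instance with unit dependencies along X and Z only, so that the threads' Y slices within a group are independent. Sizes and names are illustrative, and NY is assumed to be a multiple of the thread count.

/* Hypothetical fine-grain hybrid sketch: MPI distributes X, OpenMP threads
 * split Y, Z is cut into tiles of width W and pipelined.  All MPI calls
 * stay outside the parallel region; the implicit barrier of "omp parallel"
 * synchronizes the threads of each group. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <string.h>

#define NX 32              /* X planes owned by each rank */
#define NY 64
#define NZ 256
#define W  32
#define NTILES (NZ / W)

static double A[NX + 1][NY][NZ];            /* A[0] is the halo plane */
static double snd_buf[NY * W], rcv_buf[NY * W];

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i <= NX; i++)
        for (int j = 0; j < NY; j++)
            for (int z = 0; z < NZ; z++)
                A[i][j][z] = 1.0;

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int k = 0; k < 2 * (size - 1) + NTILES; k++) {
        int t = k - 2 * rank;               /* z-tile computed at this step */
        MPI_Request req[2];

        if (t - 1 >= 0 && t - 1 < NTILES)   /* pack boundary of previous tile */
            for (int j = 0; j < NY; j++)
                memcpy(&snd_buf[j * W], &A[NX][j][(t - 1) * W], W * sizeof(double));
        MPI_Isend(snd_buf, NY * W, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(rcv_buf, NY * W, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &req[1]);

        /* Multi-threaded computation of the current tile; under the assumed
         * stencil the Y blocks of different threads are independent. */
        #pragma omp parallel
        {
            int tid  = omp_get_thread_num();
            int nthr = omp_get_num_threads();
            int jb   = NY / nthr;           /* Y block owned by this thread */
            if (t >= 0 && t < NTILES)       /* valid(tile, thread_id, group) */
                for (int i = 1; i <= NX; i++)
                    for (int j = tid * jb; j < (tid + 1) * jb; j++)
                        for (int z = t * W; z < (t + 1) * W; z++) {
                            double back = (z > 0) ? A[i][j][z - 1] : 1.0;
                            A[i][j][z] = 0.5 * (A[i - 1][j][z] + back);
                        }
        }   /* implicit barrier before MPI_Waitall/unpack */

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        if (t + 1 >= 0 && t + 1 < NTILES && up != MPI_PROC_NULL)
            for (int j = 0; j < NY; j++)    /* unpack halo for the next tile */
                memcpy(&A[0][j][(t + 1) * W], &rcv_buf[j * W], W * sizeof(double));
    }

    if (rank == size - 1)
        printf("A[NX][NY-1][NZ-1] = %f\n", A[NX][NY - 1][NZ - 1]);
    MPI_Finalize();
    return 0;
}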
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Coarse-grain Model • Threads are only initialized once • SPMD paradigm (requires more programming effort) • Inter-node communication inside multi-threaded part (requires MPI_THREAD_FUNNELED) • Thread synchronization through explicit barrier (omp barrier directive) IPDPS 2004
Coarse-grain Model

#pragma omp parallel
{
  thread_id = omp_get_thread_num();
  FOR(group_(n-1)) {
    #pragma omp master
    {
      Pack(snd_buf, tile_(n-1) - 1, pr);
      MPI_Isend(snd_buf, dest(pr));
      MPI_Irecv(recv_buf, src(pr));
    }
    if (valid(tile, thread_id, group_(n-1)))
      Compute(tile);
    #pragma omp master
    {
      MPI_Waitall;
      Unpack(recv_buf, tile_(n-1) + 1, pr);
    }
    #pragma omp barrier
  }
}

IPDPS 2004
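The same hypothetical 3D instance written in the coarse-grain style: one parallel region encloses the whole pipeline loop, all MPI calls are funneled through the master thread (MPI_THREAD_FUNNELED), and an explicit barrier ends every group. Again a sketch with illustrative sizes, not the paper's implementation; NY is assumed to be a multiple of the thread count.

/* Hypothetical coarse-grain hybrid sketch (same 3D instance as above). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <string.h>

#define NX 32
#define NY 64
#define NZ 256
#define W  32
#define NTILES (NZ / W)

static double A[NX + 1][NY][NZ];            /* A[0] is the halo plane */
static double snd_buf[NY * W], rcv_buf[NY * W];
static MPI_Request req[2];                  /* touched only by the master */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i <= NX; i++)
        for (int j = 0; j < NY; j++)
            for (int z = 0; z < NZ; z++)
                A[i][j][z] = 1.0;

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int nthr = omp_get_num_threads();
        int jb   = NY / nthr;               /* Y block owned by this thread */

        for (int k = 0; k < 2 * (size - 1) + NTILES; k++) {
            int t = k - 2 * rank;           /* z-tile computed at this step */

            #pragma omp master
            {   /* only the master thread talks to MPI */
                if (t - 1 >= 0 && t - 1 < NTILES)
                    for (int j = 0; j < NY; j++)
                        memcpy(&snd_buf[j * W], &A[NX][j][(t - 1) * W], W * sizeof(double));
                MPI_Isend(snd_buf, NY * W, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[0]);
                MPI_Irecv(rcv_buf, NY * W, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &req[1]);
            }

            if (t >= 0 && t < NTILES)       /* every thread computes its Y block */
                for (int i = 1; i <= NX; i++)
                    for (int j = tid * jb; j < (tid + 1) * jb; j++)
                        for (int z = t * W; z < (t + 1) * W; z++) {
                            double back = (z > 0) ? A[i][j][z - 1] : 1.0;
                            A[i][j][z] = 0.5 * (A[i - 1][j][z] + back);
                        }

            #pragma omp master
            {
                MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
                if (t + 1 >= 0 && t + 1 < NTILES && up != MPI_PROC_NULL)
                    for (int j = 0; j < NY; j++)
                        memcpy(&A[0][j][(t + 1) * W], &rcv_buf[j * W], W * sizeof(double));
            }
            #pragma omp barrier             /* group finished on all threads */
        }
    }

    if (rank == size - 1)
        printf("A[NX][NY-1][NZ-1] = %f\n", A[NX][NY - 1][NZ - 1]);
    MPI_Finalize();
    return 0;
}

Because "omp master" carries no implied barrier, the explicit barrier at the end of each group is what guarantees that the unpacked halo and the freshly computed tile are visible to all threads before the next group starts.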
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Experimental Results • 8-node SMP Linux Cluster (800 MHz PIII, 128 MB RAM, kernel 2.4.20) • MPICH v.1.2.5 (--with-device=ch_p4, --with-comm=shared) • Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static) • Fast Ethernet interconnection • ADI micro-kernel benchmark (3D) IPDPS 2004
Alternating Direction Implicit (ADI) • Stencil computation used for solving partial differential equations • Unitary data dependencies • 3D iteration space (X x Y x Z) IPDPS 2004
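A minimal serial sketch of one sweep with unitary dependencies on an X x Y x Z grid follows. The real ADI benchmark performs several such sweeps with different coefficients, so this only illustrates the dependence structure that forces the pipelined schedules compared in this work; sizes are illustrative.

/* Hypothetical sweep with unit flow dependencies along all three axes. */
#include <stdio.h>

#define X 64
#define Y 64
#define Z 64

static double A[X][Y][Z];

int main(void)
{
    for (int i = 0; i < X; i++)
        for (int j = 0; j < Y; j++)
            for (int k = 0; k < Z; k++)
                A[i][j][k] = 1.0;

    /* Each point depends on its predecessor along every dimension
     * (unitary dependencies). */
    for (int i = 1; i < X; i++)
        for (int j = 1; j < Y; j++)
            for (int k = 1; k < Z; k++)
                A[i][j][k] = (A[i - 1][j][k] + A[i][j - 1][k] + A[i][j][k - 1]) / 3.0;

    printf("A[X-1][Y-1][Z-1] = %f\n", A[X - 1][Y - 1][Z - 1]);
    return 0;
}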
ADI – 2 dual SMP nodes IPDPS 2004
ADI X=128 Y=512 Z=8192 – 2 nodes IPDPS 2004
ADI X=256 Y=512 Z=8192 – 2 nodes IPDPS 2004
ADI X=512 Y=512 Z=8192 – 2 nodes IPDPS 2004
ADI X=512 Y=256 Z=8192 – 2 nodes IPDPS 2004
ADI X=512 Y=128 Z=8192 – 2 nodes IPDPS 2004
ADI X=128 Y=512 Z=8192 – 2 nodes (computation vs. communication time breakdown) IPDPS 2004
ADI X=512 Y=128 Z=8192 – 2 nodes (computation vs. communication time breakdown) IPDPS 2004
Overview • Introduction • Pure Message-passing Model • Hybrid Models • Hyperplane Scheduling • Fine-grain Model • Coarse-grain Model • Experimental Results • Conclusions – Future Work IPDPS 2004
Conclusions • Tiled loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm • Hybrid models can be competitive with the pure message-passing paradigm • The coarse-grain hybrid model can be more efficient than the fine-grain one, but is also more complicated • Programming efficiently in OpenMP is not easier than programming efficiently in MPI IPDPS 2004
Future Work • Application of methodology to real applications and standard benchmarks • Work balancing for coarse-grain model • Investigation of alternative topologies, irregular communication patterns • Performance evaluation on advanced interconnection networks (SCI, Myrinet) IPDPS 2004
Thank You! Questions? IPDPS 2004