An Emerging, Portable Co-Array Fortran Compiler for High-Performance Computing
Daniel Chavarría-Miranda, Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey
{danich, ccristi, dotsenko, johnmc}@cs.rice.edu

Presentation Transcript


Programming Models for High-Performance Computing

MPI
• Portable and widely used
• The programmer has explicit control over data locality and communication
• Using MPI can be difficult and error prone
• Most of the burden of communication optimization falls on application developers; compiler support is underutilized

HPF
• Annotated sequential code (semiautomatic parallelization)
• The compiler is responsible for communication and data locality
• Requires heroic compiler technology
• The model limits application paradigms: extensions to the standard are required to support irregular computation

Co-Array Fortran: a sensible alternative to these extremes
• A simple and expressive model for high-performance programming, based on extensions to a widely used language
• Performance: users control data and computation partitioning
• Portability: the same language serves SMPs, MPPs, and clusters
• Programmability: a global address space for simplicity

Co-Array Fortran Language
• SPMD process images: the number of images is fixed during execution, and images operate asynchronously
• Both private and shared data
  real a(20,20)       private: a 20x20 array in each image
  real a(20,20)[*]    shared: a 20x20 array in each image
• Simple one-sided shared-memory communication
  x(:,j:j+2) = a(r,:)[p:p+2]   copies rows from images p:p+2 into local columns
• Flexible synchronization
  sync_team(team [,wait])
  team: a vector of process ids to synchronize with
  wait: a vector of processes to wait for (a subset of team)
  (a short sketch using sync_team appears at the end of this transcript)
• Pointers and dynamic allocation
• Parallel I/O

Explicit Data and Computation Partitioning

  integer A(10,10)[*]

declares a 10x10 co-array with one instance of A on every image. A single one-sided assignment, such as

  A(1:10,1:10)[2] = A(1:10,1:10)

copies the local instance of A into the instance on image 2. Co-Array Fortran enables simple expression of complicated communication patterns; a short sketch combining these language features appears below.
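The following fragment is not part of the poster; it is a minimal sketch, assuming the CAF subset described above, that combines a shared co-array, the image-inquiry intrinsics, a one-sided read from a neighboring image, and barrier synchronization.

  program caf_sketch
    ! illustrative only: one 10x10 instance of a on every image
    integer, save :: a(10,10)[*]
    integer :: me, n, left

    me = this_image()                 ! my image number (1..n)
    n  = num_images()                 ! total number of images

    a = me                            ! each image fills its local instance
    call sync_all()                   ! barrier: every instance is now defined

    left = merge(n, me - 1, me == 1)  ! left neighbor on a ring of images
    a(1,:) = a(1,:)[left]             ! one-sided get of the neighbor's first row

    call sync_all()
  end program caf_sketch

The co-array reference a(1,:)[left] is the only place communication occurs; everything else is ordinary Fortran executed independently by each image.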
Finite Element Example

  subroutine assemble(start, prin, ghost, neib, x)
    integer :: start(:), prin(:), ghost(:), neib(:)
    integer :: k1, k2, p
    real :: x(:)[*]

    call sync_all(neib)
    do p = 1, size(neib)                 ! update from ghost regions
      k1 = start(p); k2 = start(p+1) - 1
      x(prin(k1:k2)) = x(prin(k1:k2)) + x(ghost(k1:k2))[neib(p)]
    enddo
    call sync_all(neib)
    do p = 1, size(neib)                 ! update the ghost regions
      k1 = start(p); k2 = start(p+1) - 1
      x(ghost(k1:k2))[neib(p)] = x(prin(k1:k2))
    enddo
    call sync_all
  end subroutine assemble

Research Focus
• Compiler-directed optimization of communication, tailored to the target platform's communication fabric
• Transform one-sided communication into 1.5-sided, two-sided, and collective communication where useful
• Generate both fine-grain load/store and calls to communication libraries as necessary
• Multi-model code for hierarchical architectures
• Platform-driven optimization of computation
• Compiler-directed parallel I/O (with UIUC)
• Enhancements to the Co-Array Fortran synchronization model

Sum Reduction Example

Original Co-Array Fortran program:

  program eCafSum
    integer, save :: caf2d(10, 10)[*]
    integer :: sum2d(10, 10)
    integer :: me, num_imgs, i

    ! what is my image number
    me = this_image()
    ! how many images are running
    num_imgs = num_images()

    ! initial data assignment
    caf2d(1:10, 1:10) = me
    call sync_all()

    ! compute the sum for the 2d co-array
    if (me .eq. 1) then
      sum2d(1:10, 1:10) = 0
      do i = 1, num_imgs
        sum2d(1:10, 1:10) = sum2d(1:10, 1:10) + caf2d(1:10, 1:10)[i]
      end do
      write(*,*) 'sum2d = ', sum2d
    endif
    call sync_all()
  end program eCafSum

Resulting Fortran 90 parallel program:

  program eCafSum
    < Co-array Fortran initialization >
    ecafsum_caf2d%ptr(1:10, 1:10) = me
    call CafArmciSynchAll()
    if (me .eq. 1) then
      sum2d(1:10, 1:10) = 0
      do i = 1, num_imgs, 1
        allocate( cafTemp_2%ptr(1:10, 1:10) )
        cafTemp_4%ptr => ecafsum_caf2d%ptr(1:10, 1:10)
        call CafArmciGetS(ecafsum_caf2d%handle, i, cafTemp_4, cafTemp_2)
        sum2d(1:10, 1:10) = cafTemp_2%ptr(1:10, 1:10) + sum2d(1:10, 1:10)
        deallocate( cafTemp_2%ptr )
      end do
      write(*,*) 'sum2d = ', sum2d(1:10, 1:10)
    endif
    call CafArmciSynchAll()
    call CafArmciFinalize()
  end program eCafSum

Current Implementation Status
• Source-to-source code generation for wide portability
• Open-source compiler will be available
• Working prototype for a subset of the language
• The initial compiler implementation performs no optimization: each co-array access is transformed into a get/put operation at the same point in the code
• Code generation targets the widely portable ARMCI communication library
• Front end based on the production-quality Open64 front end, modified to support source-to-source compilation
• Successfully compiled and executed NAS MG on an SGI Origin; performance is similar to hand-coded MPI
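The examples above synchronize only with sync_all (optionally with a wait list); the sync_team(team [,wait]) primitive from the language summary is not exercised. The fragment below is a minimal sketch of pairwise synchronization with two ring neighbors, written against the language description on this poster; the program and its neighbor arithmetic are illustrative assumptions, not code from the poster.

  program team_sync_sketch
    real,    save :: halo(100)[*]
    integer :: me, n, team(2)

    me = this_image()
    n  = num_images()

    ! ring neighbors; this arithmetic is invented for illustration
    team(1) = merge(n, me - 1, me == 1)   ! left neighbor
    team(2) = merge(1, me + 1, me == n)   ! right neighbor

    halo = me                             ! fill the local halo buffer
    call sync_team(team)                  ! synchronize with the two neighbors only
    halo(1:100) = halo(1:100)[team(1)]    ! one-sided read from the left neighbor
    call sync_team(team)
  end program team_sync_sketch

Synchronizing with a small team instead of all images is the kind of point-to-point coordination the finite element assemble routine depends on, and one target of the synchronization-model enhancements listed under Research Focus.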
