An Evaluation of Auto-Scoping in OpenMP

  1. An Evaluation of Auto-Scoping in OpenMP
     Michael Voss, Eric Chiu, Patrick Chow, Catherine Wong and Kevin Yuen
     ECE Department, University of Toronto

  2. An Overview of Auto-scoping
     • Dieter an Mey proposed Auto-scoping as an extension to OpenMP (www.cOMPunity.org)
     • Relieve users from burden of explicit scoping
       • error prone
       • tedious
       • compromise: explicit and automatic parallelization
       • analysis is similar to automatic parallelization
       • successful in 1 of 2 scientific programs
     WOMPAT 2004

  3.
     C$OMP PARALLEL DO SHARED(A,B)
     C$OMP&PRIVATE(I,J)
           DO I = 1,100
             DO J = 1,100
               A(I,J) = A(J,I) + B(I,J)
             ENDDO
           ENDDO
     C$OMP END PARALLEL DO

     Using DEFAULT(AUTO)

     C$OMP PARALLEL DO
     C$OMP&DEFAULT(AUTO)
           DO I = 1,100
             DO J = 1,100
               A(I,J) = A(J,I) + B(I,J)
             ENDDO
           ENDDO
     C$OMP END PARALLEL DO

  4. Outline of Talk
     • Introduction
     • Implementing DEFAULT(AUTO) in Polaris
     • An evaluation of DEFAULT(AUTO) in Polaris
       • comparison with EA Sun Studio 9 F95 compiler
     • A Discussion of runtime support
     • Related Work
     • Conclusion

  5. Implementing DEFAULT(AUTO) in Polaris
     • Polaris is an auto-parallelizer for Fortran 77
     • Supports a range of advanced techniques
       • The Range Test
       • The Omega Test
       • Array and Scalar Privatization
       • Array and Scalar Reduction Recognition
       • Induction Variable Substitution
       • Interprocedural Constant Propagation
       • Most Interprocedural Optimization by Inlining

  6. Polaris as an OMP to OMP Translator
     [Diagram: Polaris translation paths]
     • Components: Polaris Parser, DDtest pass, Reduction pass, Privatization pass, …, OpenMP Backend, Moerae Backend
     • Inputs and outputs: Fortran 77, Fortran 77 + OpenMP, Fortran 77 + Moerae calls
     • Paths shown
       • Original automatic parallelization path
       • OpenMP to explicitly threaded code path
       • New OpenMP to OpenMP path

  7. Supporting DEFAULT(AUTO)
     • Parse DEFAULT(AUTO)
     • React appropriately to user directives
       • selective loop parallelization
       • no changes without AUTO directive
       • user scoping overrides Polaris scoping
       • can parallelize loops that cannot be fully auto-scoped
     • Limitations
       • only regions with PARALLEL DO semantics
       • bails out on general parallel regions
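The precedence rule on this slide (user scoping overrides Polaris scoping, and a region may still contain variables the analysis cannot scope) can be sketched in a few lines. This is a minimal illustration only, not Polaris code: the function `merge_scoping`, the dictionary layout, and the assumption that the analysis leaves `D` of Example 2 unresolved are all hypothetical.

```python
def merge_scoping(variables, user_clauses, analysis):
    """Combine user clauses with compiler analysis for one parallel region.

    variables    : names referenced in the region
    user_clauses : dict name -> scope written explicitly by the user
    analysis     : dict name -> scope found by the compiler, or None if unknown
    Returns (scopes, unresolved); the region is fully scoped only when
    unresolved is empty.
    """
    scopes = {}
    unresolved = []
    for name in variables:
        if name in user_clauses:
            # An explicit clause always wins over the analysis result.
            scopes[name] = user_clauses[name]
        elif analysis.get(name) is not None:
            scopes[name] = analysis[name]
        else:
            unresolved.append(name)
    return scopes, unresolved


# Hypothetical analysis result for Example 2; D is assumed to be unresolved.
variables = ["A", "B", "C", "D", "E", "F", "G", "H", "T", "S", "I"]
analysis = {name: "shared" for name in "ABCEFGH"}
analysis.update({"T": "private", "I": "private", "S": "reduction(+)", "D": None})

scopes, unresolved = merge_scoping(variables, {"D": "shared"}, analysis)
```

With the user clause `SHARED(D)` present, `unresolved` is empty and the region can be emitted with explicit clauses; without it, `D` is the one name left over.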

  8. Example 1: No explicit scoping
     !$OMP PARALLEL DEFAULT(AUTO)
           DO N = 1,7
             DO M = 1,7
     !$OMP DO
               DO L = LSS(itsub),LEE(itsub)
                 I = IG(L)
                 J = JG(L)
                 K = KG(L)
                 LIJK = L2IJK(L)
                 RHS(L,M) = RHS(L,M)
          +        - FJAC(LIJK,LM00,M,N)*DQCO(i-1,j,k,n,NB)*FM00(L)
          +        - FJAC(LIJK,LP00,M,N)*DQCO(i+1,j,k,n,NB)*FP00(L)
          +        - FJAC(LIJK,L0M0,M,N)*DQCO(i,j-1,k,n,NB)*F0M0(L)
          +        - FJAC(LIJK,L0P0,M,N)*DQCO(i,j+1,k,n,NB)*F0P0(L)
               ENDDO
     !$OMP END DO NOWAIT
             ENDDO
           ENDDO
     !$OMP END PARALLEL

  9. Example 1: No explicit scoping
     !$OMP PARALLEL
     !$OMP+DEFAULT(SHARED)
     !$OMP+PRIVATE(M,L,N)
           DO n = 1, 7, 1
           DO m = 1,7, 1
     !$OMP DO
           DO l = lss(itsub), lee(itsub), 1
           rhs(l, m) = rhs(l, m)+(-dqco(ig(l), (-1)+jg(l), kg(l), n, nb))*
          *f0m0(l)*fjac(l2ijk(l), l0m0, m, n)+(-dqco(ig(l), 1+jg(l), kg(l), n
          *, nb))*f0p0(l)*fjac(l2ijk(l), l0p0, m, n)+(-dqco((-1)+ig(l), jg(l)
          *, kg(l), n, nb))*fjac(l2ijk(l), lm00, m, n)*fm00(l)+(-dqco(1+ig(l)
          *, jg(l), kg(l), n, nb))*fjac(l2ijk(l), lp00, m, n)*fp00(l)
           ENDDO
     !$OMP END DO NOWAIT
           ENDDO
           ENDDO
     !$OMP END PARALLEL

  10. Example 2: Explicit scoping
            SUBROUTINE RECURSION(n,k,a,b,c,d,e,f,g,h,s)
            REAL*8 A(*),B(*),C(*),D(*),E(*),F(*),G(*),H(*)
            REAL*8 T,S
            INTEGER N,K,I
            S = 0.0D0
      C$OMP PARALLEL SHARED(D)
      C$OMP+DEFAULT(AUTO)
      C$OMP DO
            DO I = 1,N
              T = F(I) + G(I)
              A(I) = B(I) + C(I)
              D(I+K) = D(I) + E(I)
              H(I) = H(I) * T
              S = S + H(I)
            END DO
      C$OMP END DO
      C$OMP END PARALLEL
            END

  11. Example 2: Explicit scoping
            SUBROUTINE recursion(n, k, a, b, c, d, e, f, g, h, s)
            DOUBLE PRECISION a, b, c, d, e, f, g, h, s, t
            INTEGER*4 i, k, n
            DIMENSION a(*), b(*), c(*), d(*), e(*), f(*), g(*), h(*)
            s = 0.0D0
      !$OMP PARALLEL
      !$OMP+DEFAULT(SHARED)
      !$OMP+PRIVATE(T,I)
      !$OMP DO
      !$OMP+REDUCTION(+:s)
            DO i = 1, n, 1
            t = f(i)+g(i)
            a(i) = b(i)+c(i)
            d(i+k) = d(i)+e(i)
            h(i) = h(i)*t
            s = h(i)+s
            ENDDO
      !$OMP END DO
      !$OMP END PARALLEL
            RETURN
            END
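The scalar scopes in the translated output (`PRIVATE(T,I)`, `REDUCTION(+:s)`) follow a pattern that a toy classifier can reproduce. The sketch below is an assumption-laden simplification, not the Polaris privatization or reduction pass: the statement encoding, the function name `classify_scalars`, and the rule set are hypothetical. It ignores arrays, control flow, whether a private scalar is live after the loop, and the actual operator in a self-update (every `s = s op ...` is treated as a `+` reduction).

```python
def classify_scalars(body, loop_index):
    """Classify the scalars of one loop body.

    body : list of (lhs, rhs) pairs in statement order. lhs is the scalar
           written, or None when the statement writes an array element;
           rhs is the list of scalars read by the statement.
    Returns dict name -> "private", "shared", "reduction", or None (unknown).
    """
    scopes = {loop_index: "private"}
    first_access = {}        # scalar -> "r" or "w", first access in an iteration
    self_updates = {}        # scalar -> one flag per write: does it read itself?
    reads_elsewhere = set()  # scalars read by statements that do not write them
    for lhs, rhs in body:
        for name in rhs:     # reads of a statement happen before its write
            first_access.setdefault(name, "r")
            if name != lhs:
                reads_elsewhere.add(name)
        if lhs is not None:
            first_access.setdefault(lhs, "w")
            self_updates.setdefault(lhs, []).append(lhs in rhs)
    for name, first in first_access.items():
        if name == loop_index:
            continue
        if name not in self_updates:
            scopes[name] = "shared"      # only ever read
        elif first == "w":
            scopes[name] = "private"     # defined before use in every iteration
        elif all(self_updates[name]) and name not in reads_elsewhere:
            scopes[name] = "reduction"   # only touched by s = s + ...
        else:
            scopes[name] = None          # value carried between iterations
    return scopes


# Loop body of Example 2, scalars only.
example2_body = [
    ("t", ["i"]),        # t = f(i) + g(i)
    (None, ["i"]),       # a(i) = b(i) + c(i)
    (None, ["i", "k"]),  # d(i+k) = d(i) + e(i)
    (None, ["i", "t"]),  # h(i) = h(i) * t
    ("s", ["s", "i"]),   # s = s + h(i)
]
example2_scopes = classify_scalars(example2_body, "i")
```

For this body the sketch marks `i` and `t` private, `k` shared, and `s` a reduction, which agrees with the clauses on the slide. A scalar that is read before it is written in an iteration comes back as `None`, the case in which a region cannot be fully auto-scoped.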

  12. Evaluation of DEFAULT(AUTO)
      • Fortran 77 Benchmarks from SPEC OpenMP
        • removed all explicit scoping
        • added DEFAULT(AUTO) to all regions
        • used Omni OpenMP compiler as backend (-O2)
      • Explicit speedup vs. auto-scope speedup
        • four processor Xeon server
        • 1.8 GHz processors, 16 GBytes main memory
        • Hyperthreaded, but only used 1 thread per CPU
      • Also used EA Sun Studio 9 Fortran 95 compiler
        • supports DEFAULT(__AUTO)
        • report number of regions auto-scoped
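The comparison of explicit speedup against auto-scope speedup comes down to two ratios. The sketch below assumes the usual definition of speedup (serial time divided by parallel time); the function names and the timings are made up for illustration and are not measurements from the talk.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup of a parallel run over the serial run of the same benchmark."""
    return serial_seconds / parallel_seconds


def retained_fraction(serial_seconds, explicit_seconds, auto_seconds):
    """Share of the explicitly scoped speedup that the auto-scoped version keeps."""
    return speedup(serial_seconds, auto_seconds) / speedup(serial_seconds, explicit_seconds)


# Hypothetical timings in seconds, for illustration only.
explicit = speedup(120.0, 40.0)
auto = speedup(120.0, 60.0)
kept = retained_fraction(120.0, 40.0, 60.0)
```

A benchmark whose regions are all auto-scoped should keep a fraction close to 1; a value well below 1 indicates that some region stayed serial.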

  13. Performance of Auto-scoping
      Sun results are for the Early Access Version of the Sun Microsystems Studio 9 Fortran 95 compiler.

  14. Discussion
      • Many regions were not fully analyzable
        • Polaris could not fully inline the regions
        • several regions were general parallel regions
      • Early Access Sun Studio 9 compiler
        • auto-scoped fewer regions in general
        • missed important regions in Swim and Mgrid
        • regions could be parallelized but not auto-scoped
      • Sun compiler could auto-scope some regions that Polaris could not
        • can analyze general parallel regions

  15. A general parallel region from Wupwise
      Polaris fails but the Sun compiler succeeds
      C$OMP PARALLEL DEFAULT(AUTO)
            LSCALE = ZERO
            LSSQ = ONE
      C$OMP DO
            DO IX = 1, 1 + (N - 1) *INCX, INCX
              IF (DBLE (X(IX)) .NE. ZERO) THEN
                ...
                LSSQ = ONE + LSSQ* (LSCALE / TEMP) ** 2
                LSCALE = TEMP
              END IF
              ...
            END DO
      C$OMP END DO
      C$OMP CRITICAL
            IF (SCALE .LT. LSCALE) THEN
              SSQ = ((SCALE / LSCALE) ** 2) * SSQ + LSSQ
              SCALE = LSCALE
            ELSE
              SSQ = SSQ + ((LSCALE / SCALE) ** 2) * LSSQ
            END IF
      C$OMP END CRITICAL
      C$OMP END PARALLEL
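The region has the shape of a hand-written reduction: each thread accumulates into its own `LSCALE` and `LSSQ`, then folds them into the shared `SCALE` and `SSQ` inside the critical section. The sketch below simulates that shape serially, one chunk per thread. It is an assumption in several respects: the lines elided on the slide are not reconstructed, the per-element update (including its `else` branch) is the standard scaled sum-of-squares recurrence rather than the Wupwise source, the data are real instead of complex, and all function names are hypothetical.

```python
import math


def partial_scaled_sum(chunk):
    """Work of one simulated thread, using private lscale and lssq.

    Maintains the invariant sum(x*x for x in chunk) == lscale**2 * lssq.
    """
    lscale, lssq = 0.0, 1.0
    for x in chunk:
        if x != 0.0:
            temp = abs(x)
            if lscale < temp:
                lssq = 1.0 + lssq * (lscale / temp) ** 2
                lscale = temp
            else:
                # Assumed branch: it falls in the part elided on the slide.
                lssq += (temp / lscale) ** 2
    return lscale, lssq


def combine(scale, ssq, lscale, lssq):
    """Body of the CRITICAL section: fold one thread's pair into the shared pair."""
    if scale < lscale:
        ssq = ((scale / lscale) ** 2) * ssq + lssq
        scale = lscale
    else:
        ssq = ssq + ((lscale / scale) ** 2) * lssq
    return scale, ssq


def scaled_norm(chunks, scale=0.0, ssq=1.0):
    """Euclidean norm over all chunks; assumes each chunk has a nonzero element."""
    for chunk in chunks:  # the order in which threads enter the critical section
        scale, ssq = combine(scale, ssq, *partial_scaled_sum(chunk))
    return scale * math.sqrt(ssq)
```

Calling `scaled_norm([[3.0, 4.0], [12.0, 84.0]])` and the same chunks in reverse order should agree up to rounding, which is why the combine step only needs mutual exclusion, not a fixed order. Scoping the region automatically requires recognizing that `LSCALE` and `LSSQ` are written before they are read by every thread, outside any worksharing loop, which is the general-region analysis the slide attributes to the Sun compiler.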

  16. Runtime Support for Auto-scoping
      • add speculate directive for regions that cannot be auto-scoped
      • applies to very few regions in SPEC OpenMP
      • requires interprocedural marking of reads/writes
      • only 2 regions not auto-scoped can be fully analyzed
      !$OMP PARALLEL
      !$OMP+DEFAULT(SHARED)
      !$OMP+PRIVATE(U51K,U41K,U31K,Q,U21K,M,K,I,U41,U31KM1,U51KM1,U21KM1)
      !$OMP+PRIVATE(U41KM1,TMP,J)
      !$OMP+SPECULATE(UTMP,RTMP)
      !$OMP DO
      !$OMP+LASTPRIVATE(FLUX2)
            DO j = jst, jend, 1
            ...
            ENDDO
      !$OMP END DO
      !$OMP END PARALLEL
      (a region from the RHS subroutine of Applu)
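Marking reads and writes at run time can be illustrated with a small conflict check over an access trace. This is a hedged sketch in the spirit of runtime dependence testing, not the implementation behind the proposed `SPECULATE` clause: the trace format and both function names are hypothetical, and the check is conservative (two iterations writing the same element also count as a failure). The access pattern used for illustration is the statement `d(i+k) = d(i) + e(i)` from Example 2, not the Applu region.

```python
def speculation_succeeds(trace):
    """Check a recorded access trace of one shared array for conflicts.

    trace : iterable of (iteration, element, kind) with kind "r" or "w".
    Returns True when no element written by one iteration is read or
    written by a different iteration.
    """
    writers = {}  # element -> iteration that wrote it
    readers = {}  # element -> set of iterations that read it
    for iteration, element, kind in trace:
        if kind == "w":
            if writers.setdefault(element, iteration) != iteration:
                return False  # two different iterations write the element
        else:
            readers.setdefault(element, set()).add(iteration)
    for element, writer in writers.items():
        if readers.get(element, set()) - {writer}:
            return False      # another iteration reads what this one wrote
    return True


def trace_shift(n, k):
    """Accesses to d made by d(i+k) = d(i) + e(i) for i = 1..n."""
    trace = []
    for i in range(1, n + 1):
        trace.append((i, i, "r"))
        trace.append((i, i + k, "w"))
    return trace
```

Whether the loop is safe depends on a value known only at run time: with `n = 8`, a shift of `k = 8` or `k = 0` passes the check, while `k = 1` fails because iteration 2 reads the element that iteration 1 wrote.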

  17. Related Work
      • DEFAULT(AUTO) proposed by Dieter an Mey
      • Many commercial and research auto-parallelizers
        • Polaris, SUIF, CAPO, …
        • Perform parallelization and scoping
      • The EA Sun Studio 9 Fortran 95 Compiler
        • paper also here at WOMPAT
        • thanks to Yuan Lin for pointing me to it
      • Runtime dependence testing
        • Saltz, Rauchwerger, …

  18. Conclusion
      • Implemented DEFAULT(AUTO) in Polaris
        • created full OpenMP to OpenMP translator
        • added facilities for auto-scoping
      • Evaluated implementation
        • 2 of 5 benchmarks fully auto-scoped
        • remainder showed significant loss of speedup
        • results different from EA Sun compiler
        • performance not portable across compilers
      • Discussed speculative parallelization support

  19. Conclusion cont…
      • Combination of loop and region analyzer
        • Polaris auto-scoped more regions
        • Sun compiler can handle general regions
      • Performance may not be portable across compilers
        • never is but…
        • sacrifice performance for convenience
        • perhaps a useful tool during manual parallelization
      • Future work
        • general region support in Polaris
