HPF (High Performance Fortran)

What is HPF?
• HPF is a standard for data-parallel programming.
• Extends Fortran-77 or Fortran-90.
• Similar extensions exist for C and C++, but Fortran is really the focus.

Principle of HPF
• Extending sequential language with data distribution directives.
• Data distribution directives specify on which processor a certain part of an array should reside.
• Compiler then produces:
    • parallel program,
    • communication between the processes.

What the Standard Says
• Can be used with both Fortran-77 and Fortran-90.
• Distribution directives are just a hint; the compiler can ignore them.
• HPF can be used on both shared memory and distributed memory hardware platforms.

In Commercial Use
• HPF is always used with Fortran-90.
• Distribution directives are a must.
• HPF used on both shared memory and distributed memory platforms.
• But the truth is that the language was really meant for distributed memory platforms.

Not to Confuse You
• We will discuss commercial use:
    • Fortran-90
    • Concurrency extensions to Fortran-90 in HPF.
    • HPF data distribution directives.
    • How HPF maps to a distributed memory platform.
• Afterwards, we will discuss what the standard allows in addition.

Fortran-90
• Fortran + a number of array features.
• Scalar operations are extended to arrays.
• Intrinsic functions are extended to arrays.
• Additional array-based intrinsic functions.

Array Assignment
Scalar assignment:
    integer a, b, c
    a = b + c
Array assignment:
    integer A(10,10), B(10,10), C(10,10)
    A = B + C

Requirements for Array Assignment
• Arrays must be conformable:
    • have the same number of dimensions, and
    • have the same size in each dimension.
• One major exception is allowed: a scalar is conformable with any array.
    integer A(10,10), B(10,10), c
    A = B + c
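
The conformability rule and the scalar exception can be tried out with a small program. This is a minimal sketch, assuming any Fortran 90 (or later) compiler; the program name and the fill values are illustrative and not part of the slides.

```fortran
program conformable_demo
  implicit none
  integer :: A(10,10), B(10,10), c

  B = 2          ! a scalar on the right is applied to every element of B
  c = 3
  A = B + c      ! A and B have the same shape; c is added to each element

  ! SHAPE returns the size of each dimension. Two arrays are
  ! conformable when their shapes are identical.
  print *, 'shape of A:', shape(A)
  print *, 'shape of B:', shape(B)
  print *, 'every element of A equals 5:', all(A == 5)
end program conformable_demo
```

Declaring B as B(10,5) instead would make `A = B + c` non-conformable, which a compiler can reject when the shapes are known at compile time.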

Intrinsic Functions Extended to Arrays
    real A(10,10), B(10,10)
    A = SQRT(A)
    B = ABS(A)

Additional Array Intrinsic Functions
• MAXVAL, MINVAL
• MAXLOC, MINLOC
    • return array of indices
• SUM, PRODUCT
• MATMUL, DOT_PRODUCT, TRANSPOSE

Examples
    real A(100,100), B(100), s, c
    integer i(1), j(2)
    s = SUM(A)
    i = MAXLOC(B)
    j = MINLOC(A)
    c = DOT_PRODUCT(B, B)
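
To make the results of these intrinsics concrete, here is a small sketch with arrays that can be checked by hand. The sizes and values are assumptions chosen for illustration, as are the program name and the variable d. Note that MAXLOC and MINLOC return one index per dimension, which is why i has one element and j has two.

```fortran
program reduction_demo
  implicit none
  real    :: A(2,3), B(3), s, d
  integer :: i(1), j(2)

  B = (/ 4.0, 9.0, 1.0 /)

  ! RESHAPE fills A in column-major order, so
  ! row 1 of A is 5, 3, 8 and row 2 of A is 7, 6, 2.
  A = reshape((/ 5.0, 7.0, 3.0, 6.0, 8.0, 2.0 /), (/ 2, 3 /))

  s = sum(A)               ! sum of all six elements: 31.0
  i = maxloc(B)            ! position of 9.0 in B: (/ 2 /)
  j = minloc(A)            ! position of 2.0 in A: (/ 2, 3 /)
  d = dot_product(B, B)    ! 16 + 81 + 1 = 98.0; both arguments are rank 1

  print *, 's =', s
  print *, 'i =', i
  print *, 'j =', j
  print *, 'd =', d
end program reduction_demo
```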

Array Sections
    array( lower_bound : upper_bound : stride )
• Refers to the section of the array between lower_bound and upper_bound, with an optional stride specified.
• Multiple dimensions may be specified, with the obvious meaning.
• Array sections may be used wherever arrays may be used.

Examples
    integer A(10), B(10), C(10)
    integer D(50), E(100), F(100)
    integer max, min
    integer G(100), H(100,100)
    A(1:8) = B(1:8) + C(2:9)
    D = E(1:100:2) + F(2:100:2)
    max = MAXVAL( G(1:100:10) )
    min = MINVAL( H(1:100, 1:50) )
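
Since conformability depends on how many elements a section selects, it helps to count them. For a positive stride, the number of elements in lower:upper:stride is MAX(0, (upper - lower + stride) / stride) with integer division. The sketch below checks this on a 10-element array; the array size and program name are illustrative assumptions.

```fortran
program section_demo
  implicit none
  integer :: E(10), k

  E = (/ (k, k = 1, 10) /)       ! E = 1, 2, ..., 10

  print *, E(1:10:2)             ! elements 1, 3, 5, 7, 9
  print *, E(2:9:2)              ! elements 2, 4, 6, 8 (upper bound 9 is not reached)

  ! SIZE of each section next to the value given by the formula.
  print *, size(E(1:10:2)), (10 - 1 + 2) / 2   ! 5 and 5
  print *, size(E(2:9:2)),  (9 - 2 + 2) / 2    ! 4 and 4
end program section_demo
```

The two sections have different sizes, so they could not appear together in one array expression.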

Semantics of Array Assignments
• First, the entire right hand side is evaluated.
• Then, assignments are made to the left hand side.

Example
    integer :: A(4) = (/ 7, 8, 12, 14 /)
    A(2:3) = A(1:2)
=> results in A being (/ 7, 7, 8, 14 /)
=> not (/ 7, 7, 7, 14 /)
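
The difference between the two outcomes is the difference between an array assignment and an element-by-element loop. A minimal sketch, assuming a Fortran 90 compiler; the second array B and the program name are introduced here only for comparison.

```fortran
program assign_semantics
  implicit none
  integer :: A(4), B(4), i

  A = (/ 7, 8, 12, 14 /)
  B = A

  ! Array assignment: the right hand side (7, 8) is read in full
  ! before anything is stored, so A becomes 7, 7, 8, 14.
  A(2:3) = A(1:2)

  ! Sequential loop: B(2) is overwritten with 7 before it is read
  ! for B(3), so B becomes 7, 7, 7, 14.
  do i = 2, 3
     B(i) = B(i-1)
  end do

  print *, 'array assignment:', A
  print *, 'sequential loop: ', B
end program assign_semantics
```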

Sequential/Parallel Fortran-90
• Fortran-90 is a sequential language.
• However, its array assignment semantics make it easy to parallelize automatically.

Not Perfect, Though (1 of 2)
    do i = 1,100
      X(i,i) = 0.0
    enddo
• Obviously parallelizable.
• Not expressible as a Fortran-90 array assignment, because array sections can only describe regular sections and a diagonal is not one.

Not Perfect, Though (2 of 2)
    integer D(50), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is correct, but
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is not, because D (100 elements) is not conformable with the 50-element right hand side.

HPF: Additional Expressions of Parallelism
• FORALL array assignment.
• INDEPENDENT directive.

FORALL Array Assignment
    FORALL( subscript = lower_bound : upper_bound : stride, mask) array-assignment
• Execute all iterations of the subscript loop in parallel for the given set of indices, where mask is true.
• May have multiple dimensions.
• Same semantics: first compute the right hand side, then assign to the left hand side.
• Only one assignment to a particular element is allowed (not checked by the compiler!).

Examples (1 of 3)
    do i = 1,100
      X(i,i) = 0.0
    enddo
becomes
    FORALL(i=1:100) X(i,i) = 0.0

Examples (2 of 3)
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
becomes (correctly)
    FORALL(i=1:50) D(i) = E(2*i-1) + F(2*i)

Examples (3 of 3)
• A multiple dimension example with use of the mask option.
• Set all the elements of X above the diagonal to the sum of their indices.
    FORALL(i=1:100, j=1:100, i<j) X(i,j) = i+j
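
The masked FORALL is easier to see on a matrix small enough to print. FORALL was later adopted into standard Fortran 95, so the sketch below should compile with an ordinary Fortran compiler, although it will then run sequentially. The size n = 4, the zero fill, and the program name are illustrative assumptions.

```fortran
program forall_demo
  implicit none
  integer, parameter :: n = 4
  integer :: X(n,n), i, j

  X = 0

  ! Only index pairs with i < j (strictly above the diagonal) are assigned.
  forall (i = 1:n, j = 1:n, i < j) X(i,j) = i + j

  ! Print X row by row; the diagonal and lower triangle stay 0.
  do i = 1, n
     print *, X(i,:)
  end do
end program forall_demo
```

The first row printed is 0, 3, 4, 5, since X(1,2) = 3, X(1,3) = 4, and X(1,4) = 5.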

The INDEPENDENT Directive
    !HPF$ INDEPENDENT
    DO …
    ENDDO
• Asserts that the iterations of the loop do not interfere with each other, so they can be executed in any order.

Examples (1 of 2)
    !HPF$ INDEPENDENT
    DO i = 1, 100
      DO j = 1, 100
        IF(i.NE.j) A(i,j) = 1.0
        IF(i.EQ.j) A(i,j) = 0.0
      ENDDO
    ENDDO

Examples (2 of 2): Nesting
    !HPF$ INDEPENDENT
    DO i = 1, 100
    !HPF$ INDEPENDENT
      DO j = 1, 100
        IF(i.NE.j) A(i,j) = 1.0
        IF(i.EQ.j) A(i,j) = 0.0
      ENDDO
    ENDDO
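
One informal way to read "any order" is that running the iterations forwards and backwards must give the same array. The sketch below applies that sanity check to the loop above. It assumes an ordinary Fortran compiler, which treats the `!HPF$` lines as comments; the second array R and the program name are introduced only for the comparison. Agreement between two orders is a necessary condition, not a proof of independence.

```fortran
program independent_demo
  implicit none
  integer, parameter :: n = 100
  real    :: A(n,n), R(n,n)
  integer :: i, j

  ! Forward order, with the directives from the slide.
  !HPF$ INDEPENDENT
  do i = 1, n
     !HPF$ INDEPENDENT
     do j = 1, n
        if (i /= j) A(i,j) = 1.0
        if (i == j) A(i,j) = 0.0
     end do
  end do

  ! Same loop body, iterations visited in reverse order.
  do i = n, 1, -1
     do j = n, 1, -1
        if (i /= j) R(i,j) = 1.0
        if (i == j) R(i,j) = 0.0
     end do
  end do

  ! Each element is written by exactly one iteration and no iteration
  ! reads another's result, so both orders produce the same array.
  print *, 'same result in both orders:', all(A == R)
end program independent_demo
```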

HPF/Fortran-90 Matrix Multiply (1 of 4)
    C = MATMUL( A, B )

HPF Matrix Multiply (2 of 4)
    C = 0.0
    DO k=1,n
      FORALL(i=1:n, j=1:n) C(i,j) = C(i,j) + A(i,k) * B(k,j)
    ENDDO

HPF Matrix Multiply (3 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
      DO j=1,n
        C(i,j) = 0.0
        DO k=1,n
          C(i,j) = C(i,j) + A(i,k) * B(k,j)
        ENDDO
      ENDDO
    ENDDO

HPF Matrix Multiply (4 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
    !HPF$ INDEPENDENT
      DO j=1,n
        C(i,j) = 0.0
        DO k=1,n
          C(i,j) = C(i,j) + A(i,k) * B(k,j)
        ENDDO
      ENDDO
    ENDDO
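
The formulations of matrix multiply can be checked against each other on a small case. The sketch below compares MATMUL, a sequential loop over k wrapping a FORALL over i and j, and the explicit triple loop. It uses integer matrices so the comparison is exact; the size n = 3, the fill values, and the names C1, C2, C3 are illustrative assumptions. On a compiler without HPF support everything runs sequentially.

```fortran
program matmul_demo
  implicit none
  integer, parameter :: n = 3
  integer :: A(n,n), B(n,n), C1(n,n), C2(n,n), C3(n,n)
  integer :: i, j, k

  ! Arbitrary integer test data.
  forall (i = 1:n, j = 1:n) A(i,j) = i + j
  forall (i = 1:n, j = 1:n) B(i,j) = i - j

  ! Version 1: the intrinsic.
  C1 = matmul(A, B)

  ! Version 2: each FORALL adds one rank-1 update for a fixed k.
  C2 = 0
  do k = 1, n
     forall (i = 1:n, j = 1:n) C2(i,j) = C2(i,j) + A(i,k) * B(k,j)
  end do

  ! Version 3: explicit loops; the i and j iterations are independent,
  ! while the k loop accumulates into C3(i,j) and stays sequential.
  do i = 1, n
     do j = 1, n
        C3(i,j) = 0
        do k = 1, n
           C3(i,j) = C3(i,j) + A(i,k) * B(k,j)
        end do
     end do
  end do

  print *, 'FORALL version matches MATMUL:', all(C1 == C2)
  print *, 'loop version matches MATMUL:  ', all(C1 == C3)
end program matmul_demo
```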

HPF/Fortran-90 SOR (1 of 4)
    TEMP(1:n,1:n) = 0.25 * ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + &
                             GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
    GRID(1:n,1:n) = TEMP(1:n,1:n)

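HPF/Fortran-90 SOR (1’ of 4)
    GRID(1:n,1:n) = 0.25 * ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + &
                             GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
Also works, because of the array assignment rules: the entire right hand side is evaluated from the old values of GRID before any element is assigned.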
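
That the in-place update gives the same result as the version with a temporary can be checked directly. This is a minimal sketch, assuming a grid with one layer of boundary cells, declared as (0:n+1, 0:n+1); the size n = 4, the initial values, and the names G1 and G2 are illustrative assumptions.

```fortran
program relax_demo
  implicit none
  integer, parameter :: n = 4
  real    :: G1(0:n+1,0:n+1), G2(0:n+1,0:n+1), TEMP(n,n)
  integer :: i, j

  ! Arbitrary initial values, identical in both grids.
  forall (i = 0:n+1, j = 0:n+1) G1(i,j) = real(i*i + j*j)
  G2 = G1

  ! Version with an explicit temporary.
  TEMP = 0.25 * ( G1(1:n,0:n-1) + G1(1:n,2:n+1) + &
                  G1(0:n-1,1:n) + G1(2:n+1,1:n) )
  G1(1:n,1:n) = TEMP

  ! In-place version: the right hand side is evaluated from the old
  ! values of G2 before any interior element is overwritten.
  G2(1:n,1:n) = 0.25 * ( G2(1:n,0:n-1) + G2(1:n,2:n+1) + &
                         G2(0:n-1,1:n) + G2(2:n+1,1:n) )

  print *, 'in-place update matches temporary version:', all(G1 == G2)
end program relax_demo
```

An element-by-element DO loop written in place would not have this property, because later iterations would read neighbours that earlier iterations had already updated.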

HPF SOR (2 of 4)
    FORALL(i=1:n,j=1:n) TEMP(i,j) = 0.25 * ( GRID(i-1,j) + GRID(i+1,j) + &
                                             GRID(i,j-1) + GRID(i,j+1) )
    FORALL(i=1:n,j=1:n) GRID(i,j) = TEMP(i,j)

HPF SOR (3 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
      DO j=1,n
        TEMP(i,j) = 0.25 * ( GRID(i-1,j) + GRID(i+1,j) + &
                             GRID(i,j-1) + GRID(i,j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i=1,n
      DO j=1,n
        GRID(i,j) = TEMP(i,j)
      ENDDO
    ENDDO

HPF SOR (4 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
    !HPF$ INDEPENDENT
      DO j=1,n
        TEMP(i,j) = 0.25 * ( GRID(i-1,j) + GRID(i+1,j) + &
                             GRID(i,j-1) + GRID(i,j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i=1,n
    !HPF$ INDEPENDENT
      DO j=1,n
        GRID(i,j) = TEMP(i,j)
      ENDDO
    ENDDO
