HPF (High Performance Fortran)
What is HPF? • HPF is a standard for data-parallel programming. • Extends Fortran-77 or Fortran-90. • Similar extensions exist for C and C++, but Fortran is really the focus.
Principle of HPF • Extending sequential language with data distribution directives. • Data distribution directives specify on which processor a certain part of an array should reside. • Compiler then produces: • parallel program, • communication between the processes.
What the Standard Says • Can be used with both Fortran-77 and Fortran-90. • Distribution directives are only a hint; the compiler may ignore them. • HPF can be used on both shared memory and distributed memory hardware platforms.
In Commercial Use • HPF is always used with Fortran-90. • Distribution directives are a must. • HPF used on both shared memory and distributed memory platforms. • But the truth is that the language was really meant for distributed memory platforms.
Not to Confuse You • We will discuss commercial use: • Fortran-90 • Concurrency extensions to Fortran-90 in HPF. • HPF data distribution directives. • How HPF maps to a distributed memory platform. • Afterwards, we will discuss what the standard allows in addition.
Fortran-90 • Fortran + a number of array features. • Scalar operations are extended to arrays. • Intrinsic functions are extended to arrays. • Additional array-based intrinsic functions.
Array Assignment
Scalar assignment:
    integer a, b, c
    a = b + c
Array assignment:
    integer A(10,10), B(10,10), C(10,10)
    A = B + C
Requirements for Array Assignment • Arrays must be conformable, that is, they must • have the same number of dimensions, and • have the same size in each dimension. • One major exception is allowed: a scalar is conformable with any array.
    integer A(10,10), B(10,10), c
    A = B + c
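The scalar exception can be tried out with a small standalone program. This is only a sketch: the program name and the initial values 2 and 3 are assumptions chosen for illustration.

```fortran
program conformable_demo
  implicit none
  integer :: A(10,10), B(10,10), c

  B = 2          ! scalar on the right: every element of B becomes 2
  c = 3
  A = B + c      ! c is added to every element of B

  ! shape() returns the extent of each dimension
  print *, 'shape(A) =', shape(A)
  ! every element of A should now be 2 + 3
  print *, 'all elements equal 5:', all(A == 5)
end program conformable_demo
```

The scalar behaves as if it were an array of the same shape as B with every element equal to c.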
Intrinsic Functions Extended to Arrays
    real A(10,10), B(10,10)
    A = SQRT(A)
    B = ABS(A)
Additional Array Intrinsic Functions • MAXVAL, MINVAL • MAXLOC, MINLOC • return array of indices • SUM, PRODUCT • MATMUL, DOT_PRODUCT, TRANSPOSE
Examples
    real A(100,100), B(100), s, c
    integer i(1), j(2)
    s = SUM(A)
    i = MAXLOC(B)
    j = MINLOC(A)
    c = DOT_PRODUCT(B, B)
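To see what the location intrinsics return, here is a small sketch with hand-picked data. The array sizes and values are assumptions for illustration, chosen small enough to check by eye.

```fortran
program intrinsic_demo
  implicit none
  real    :: A(2,3), B(4)
  integer :: i(1), j(2)

  B = (/ 3.0, 9.0, 1.0, 4.0 /)
  ! reshape fills column by column: A(1,1)=5.0, A(2,1)=2.0, A(1,2)=8.0, A(2,2)=0.5, ...
  A = reshape((/ 5.0, 2.0, 8.0, 0.5, 7.0, 6.0 /), (/ 2, 3 /))

  i = maxloc(B)    ! one index, because B has one dimension: position of 9.0
  j = minloc(A)    ! two indices, because A has two dimensions: row and column of 0.5

  print *, 'sum(B)    =', sum(B)
  print *, 'maxloc(B) =', i
  print *, 'minloc(A) =', j
end program intrinsic_demo
```

MAXLOC and MINLOC always return an integer array with one entry per dimension of their argument, which is why i is declared with size 1 and j with size 2.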
Array Sections
    array( lower_bound : upper_bound : stride )
• Refers to the section of the array between lower_bound and upper_bound, with an optional stride specified. • Multiple dimensions may be specified, with the obvious meaning. • Array sections may be used wherever arrays may be used.
Examples
    integer A(10), B(10), C(10)
    integer D(50), E(100), F(100)
    integer max
    integer G(100), H(100,100)
    A(1:8) = B(1:8) + C(2:9)
    D = E(1:100:2) + F(2:100:2)
    max = MAXVAL( G(1:100:10) )
    max = MINVAL( H(1:100, 1:50) )
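Whether two strided sections conform can be checked by asking for their sizes. A minimal sketch, assuming E and F are simply filled with 1..100 (the program name and the data are illustrative only):

```fortran
program section_demo
  implicit none
  integer :: D(50), E(100), F(100), i

  E = (/ (i, i = 1, 100) /)     ! E(i) = i
  F = E

  print *, size(E(1:100:2))     ! elements 1, 3, ..., 99  : 50 elements
  print *, size(F(2:100:2))     ! elements 2, 4, ..., 100 : 50 elements
  print *, size(F(2:99:2))      ! elements 2, 4, ..., 98  : 49 elements

  ! Both operands and D have 50 elements, so this assignment is conformable
  D = E(1:100:2) + F(2:100:2)
  print *, D(1), D(50)          ! 1+2 and 99+100
end program section_demo
```

An upper bound of 99 with stride 2 starting at 2 stops at element 98, so that section is one element short of a 50-element partner.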
Semantics of Array Assignments • First, the entire right hand side is evaluated. • Then, assignments are made to the left hand side.
Example
    integer :: A(4) = (/ 7, 8, 12, 14 /)
    A(2:3) = A(1:2)
=> results in A being (/ 7, 7, 8, 14 /)
=> not (/ 7, 7, 7, 14 /), which an element-by-element loop would produce
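The two outcomes can be reproduced side by side. In this sketch the second array L and the explicit loop are assumptions added for contrast; they are not part of the slide's example.

```fortran
program rhs_first_demo
  implicit none
  integer :: A(4), L(4), i

  A = (/ 7, 8, 12, 14 /)
  L = A

  ! Array assignment: the right hand side A(1:2) is read in full
  ! before any element of A(2:3) is stored.
  A(2:3) = A(1:2)

  ! Sequential loop: the store to L(2) is visible when L(3) is computed.
  do i = 2, 3
    L(i) = L(i-1)
  end do

  print *, A    ! 7 7 8 14
  print *, L    ! 7 7 7 14
end program rhs_first_demo
```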
Sequential/Parallel Fortran-90 • Fortran-90 is a sequential language. • However, its array assignment semantics makes it easy to parallelize it (automatically).
Not Perfect, Though (1 of 2)
    do i = 1,100
      X(i,i) = 0.0
    enddo
• Obviously parallelizable. • Not expressible as a Fortran-90 array assignment, which allows only regular sections.
Not Perfect, Though (2 of 2)
    integer D(50), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is correct, but
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
is not, because D has 100 elements while the right hand side has only 50, so the arrays are not conformable.
HPF: Additional Expressions of Parallelism • FORALL array assignment. • INDEPENDENT construct.
FORALL Array Assignment
    FORALL( subscript = lower_bound : upper_bound : stride, mask ) array-assignment
• Execute all iterations of the subscript loop in parallel for the given set of indices where mask is true. • May have multiple dimensions. • Same semantics: first compute right hand side, then assign to left hand side. • Only one assignment to a particular element is allowed (not checked by the compiler!).
Examples (1 of 3)
    do i = 1,100
      X(i,i) = 0.0
    enddo
becomes
    FORALL(i=1:100) X(i,i) = 0.0
Examples (2 of 3)
    integer D(100), E(100), F(100)
    D = E(1:100:2) + F(2:100:2)
becomes (correctly)
    FORALL(i=1:50) D(i) = E(2*i-1) + F(2*i)
Examples (3 of 3) • A multiple dimension example with use of the mask option. • Set all the elements of X above the diagonal to the sum of their indices.
    FORALL(i=1:100, j=1:100, i<j) X(i,j) = i+j
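The masked FORALL can be run on a small matrix to see exactly which elements it touches. This sketch assumes n = 4 instead of 100 so that the result fits on screen, and it zeroes X first so untouched elements are visible.

```fortran
program forall_demo
  implicit none
  integer, parameter :: n = 4      ! assumed small size for readable output
  integer :: X(n,n), i, j

  X = 0
  ! Only index pairs with i < j (strictly above the diagonal) are assigned
  forall (i = 1:n, j = 1:n, i < j) X(i,j) = i + j

  ! Print row by row:
  !   0 3 4 5
  !   0 0 5 6
  !   0 0 0 7
  !   0 0 0 0
  do i = 1, n
    print *, X(i,:)
  end do
end program forall_demo
```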
The INDEPENDENT Directive
    !HPF$ INDEPENDENT
    DO …
    ENDDO
• Asserts that the iterations of the loop do not depend on one another, so they can be executed in any order.
Examples (1 of 2)
    !HPF$ INDEPENDENT
    DO i = 1, 100
      DO j = 1, 100
        IF(i.NE.j) A(i,j) = 1.0
        IF(i.EQ.j) A(i,j) = 0.0
      ENDDO
    ENDDO
Examples (2 of 2): Nesting
    !HPF$ INDEPENDENT
    DO i = 1, 100
      !HPF$ INDEPENDENT
      DO j = 1, 100
        IF(i.NE.j) A(i,j) = 1.0
        IF(i.EQ.j) A(i,j) = 0.0
      ENDDO
    ENDDO
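It may help to contrast a loop that qualifies for INDEPENDENT with one that does not. The sketch below is an illustration only: the arrays, sizes, and loop bodies are assumptions. Because the directive begins with `!`, a compiler without HPF support treats it as a comment, so the program still runs sequentially as ordinary Fortran.

```fortran
program independent_demo
  implicit none
  integer, parameter :: n = 8
  real    :: A(n), B(n)
  integer :: i

  A = 1.0
  B = 1.0

  ! Iteration i reads and writes only A(i): any execution order gives the same result.
  !HPF$ INDEPENDENT
  do i = 1, n
    A(i) = 2.0 * A(i)
  end do

  ! Iteration i reads B(i-1), which iteration i-1 writes.
  ! The order matters, so this loop must NOT be marked INDEPENDENT.
  do i = 2, n
    B(i) = B(i-1) + 1.0
  end do

  print *, A
  print *, B
end program independent_demo
```

The directive is an assertion by the programmer; marking the second loop INDEPENDENT would be an error that the compiler is not required to detect.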
HPF/Fortran-90 Matrix Multiply (1 of 4)
    C = MATMUL( A, B )
HPF Matrix Multiply (2 of 4)
    C = 0.0
    DO k=1,n
      FORALL(i=1:n, j=1:n) C(i,j) = C(i,j) + A(i,k) * B(k,j)
    ENDDO
HPF Matrix Multiply (3 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
      DO j=1,n
        C(i,j) = 0.0
        DO k=1,n
          C(i,j) = C(i,j) + A(i,k) * B(k,j)
        ENDDO
      ENDDO
    ENDDO
HPF Matrix Multiply (4 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
      !HPF$ INDEPENDENT
      DO j=1,n
        C(i,j) = 0.0
        DO k=1,n
          C(i,j) = C(i,j) + A(i,k) * B(k,j)
        ENDDO
      ENDDO
    ENDDO
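A quick sanity check is to compare the intrinsic against a hand-written formulation on random data. This is a sketch under assumptions: n = 3, the names C1 and C2, and the random inputs are all chosen for illustration. It uses a sequential loop over k with a FORALL over i and j inside.

```fortran
program matmul_demo
  implicit none
  integer, parameter :: n = 3      ! assumed small size
  real    :: A(n,n), B(n,n), C1(n,n), C2(n,n)
  integer :: i, j, k

  call random_number(A)
  call random_number(B)

  ! Version 1: the intrinsic
  C1 = matmul(A, B)

  ! Version 2: k stays sequential because every k adds into the same C2(i,j);
  ! for a fixed k, all (i,j) updates touch distinct elements.
  C2 = 0.0
  do k = 1, n
    forall (i = 1:n, j = 1:n) C2(i,j) = C2(i,j) + A(i,k) * B(k,j)
  end do

  ! The two results should agree up to floating-point rounding
  print *, 'max difference:', maxval(abs(C1 - C2))
end program matmul_demo
```

The k loop is the one dimension that cannot be declared parallel here, since it is a reduction into C(i,j).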
HPF/Fortran-90 SOR (1 of 4)
    TEMP(1:n,1:n) = 0.25 * ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
    GRID(1:n,1:n) = TEMP(1:n,1:n)
HPF/Fortran-90 SOR (1’ of 4)
    GRID(1:n,1:n) = 0.25 * ( GRID(1:n,0:n-1) + GRID(1:n,2:n+1) + GRID(0:n-1,1:n) + GRID(2:n+1,1:n) )
Also works, because the array assignment rules require the whole right hand side to be evaluated before GRID is updated.
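That the in-place form matches the form with an explicit temporary can be checked on a small grid. In this sketch n = 4, the names G1 and G2, and the random initial grid are assumptions for illustration; the grids are declared with bounds 0:n+1 so the boundary rows and columns exist.

```fortran
program sweep_demo
  implicit none
  integer, parameter :: n = 4      ! assumed small interior size
  real :: G1(0:n+1,0:n+1), G2(0:n+1,0:n+1), TEMP(n,n)

  call random_number(G1)
  G2 = G1

  ! Version with an explicit temporary
  TEMP = 0.25 * ( G1(1:n,0:n-1) + G1(1:n,2:n+1) &
                + G1(0:n-1,1:n) + G1(2:n+1,1:n) )
  G1(1:n,1:n) = TEMP

  ! In-place version: the right hand side is evaluated from the old values
  ! before any element of G2 is overwritten
  G2(1:n,1:n) = 0.25 * ( G2(1:n,0:n-1) + G2(1:n,2:n+1) &
                       + G2(0:n-1,1:n) + G2(2:n+1,1:n) )

  print *, 'max difference:', maxval(abs(G1 - G2))
end program sweep_demo
```

A sequential DO loop nest that updated GRID in place would instead mix old and new neighbor values, which is a different iteration scheme.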
HPF SOR (2 of 4)
    FORALL(i=1:n, j=1:n) TEMP(i,j) = 0.25 * ( GRID(i-1,j) + GRID(i+1,j) + GRID(i,j-1) + GRID(i,j+1) )
    FORALL(i=1:n, j=1:n) GRID(i,j) = TEMP(i,j)
HPF SOR (3 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
      DO j=1,n
        TEMP(i,j) = 0.25 * ( GRID(i-1,j) + GRID(i+1,j) + GRID(i,j-1) + GRID(i,j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i=1,n
      DO j=1,n
        GRID(i,j) = TEMP(i,j)
      ENDDO
    ENDDO
HPF SOR (4 of 4)
    !HPF$ INDEPENDENT
    DO i=1,n
      !HPF$ INDEPENDENT
      DO j=1,n
        TEMP(i,j) = 0.25 * ( GRID(i-1,j) + GRID(i+1,j) + GRID(i,j-1) + GRID(i,j+1) )
      ENDDO
    ENDDO
    !HPF$ INDEPENDENT
    DO i=1,n
      !HPF$ INDEPENDENT
      DO j=1,n
        GRID(i,j) = TEMP(i,j)
      ENDDO
    ENDDO