CISC 879 Parallel Computation
High Performance Fortran (HPF)
Ibrahim Halil Saruhan, saruhan@cis.udel.edu
"Although the [Fortran] group broke new ground … they never lost sight of their main objective, namely to produce a product that would be acceptable to practical users with real problems to solve. Fortran … is still by far the most popular language for numerical computation." Maurice V. Wilkes
Outline
• Introduction
• Brief History of Fortran and HPF
• HPF Directives, Syntax
• Data Mapping
• Data Parallelism
• Putting It all Together
• Intrinsic Procedures
• Extrinsic Procedures
• References and Further Information
Introduction
HPF is:
• a language that combines the full Fortran 90 language with special user annotations dealing with data distribution.
• intended to be a standard programming language for computationally intensive applications on many types of machines.
• a set of extensions to Fortran expressing parallel execution at a high level.
• designed to provide a portable extension to Fortran 90 for writing data-parallel applications.
HPFF (the High Performance Fortran Forum) is the group of people who developed HPF.
Since its introduction almost four decades ago, Fortran has been the language of choice for scientific and engineering programming, and HPF is the latest set of extensions to this venerable language.
Brief History of Fortran and HPF
Early 1950s: The first programming language to be called Fortran was developed by IBM.
1957: Fortran became popular after the first compiler was delivered to a customer.
1966: ANSI published the first formal standard for Fortran, covering features such as the integer, real, and double precision types, the DO loop, IF conditionals, subroutines, functions, Hollerith data (later replaced by the character type), and global variables. This standard is called Fortran 66.
1978: ANSI and ISO published a new standard (Fortran 77), with features such as IF / THEN / ELSE IF / END IF conditional statements, the complex data type and complex constants, the character type, and formatted, unformatted, and direct-access file input and output.
Brief History of Fortran and HPF
1991: The desire to revise the FORTRAN 77 standard led to work under the title Fortran 8X. The result was published by ISO in 1991 and renamed Fortran 90. Its goal was to modernize Fortran so that it could continue its long history as a scientific and engineering programming language.
To support efficient programming on the new generation of parallel machines, Fortran needed extensions, and that need led to the beginning of HPF. The first group to discuss standardization of parallel Fortran features was the Parallel Computing Forum (PCF). The original goals of the group were to standardize the language features for task-oriented parallelism and shared-memory machines.
Nov 1991: Digital Equipment Corporation organized a meeting at the Supercomputing '91 conference in Albuquerque, New Mexico, to discuss HPF.
Brief History of Fortran and HPF
Jan 1992: Kickoff meeting for HPFF in Houston, Texas, hosted by the Center for Research on Parallel Computation at Rice University. Over 130 people attended. Because the meeting was larger than expected, a series of smaller "working group" meetings was scheduled to create the language draft.
Mar 1992: The HPFF working group, nearly 40 people, met for the first time in Dallas, Texas. Eight further meetings were held.
May 1993: The HPFF working group produced the HPF Language Specification, version 1.0.
HPF Directives and Their Syntax
The form of an hpf-directive-line (H201) is:
    directive-origin hpf-directive
where a directive-origin (H202) is one of
    !HPF$    CHPF$    *HPF$
Fortran 90 allows comments to begin with "C" and "*" as well as "!" in fixed source form, but allows only "!" to begin a comment in free source form.
There are two forms of directive in HPF:
• specification-directive (H204): must be in the specification part of the program unit. Examples: ALIGN, DISTRIBUTE, PROCESSORS …
• executable-directive (H205): appears with the other Fortran 90 executable constructs in the program unit.
HPF-conforming or not?
RIGHT:
    !HPF$ DISTRIBUTE (CYCLIC) :: PERIODIC_TABLE …
WRONG (a directive may not share a line with a Fortran statement):
    REAL PERIODIC_TABLE (103); !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC)
RIGHT:
    REAL PERIODIC_TABLE (103)
    !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC)
WRONG (two directives may not be joined with ";" on one line):
    !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC); DISTRIBUTE LOG_TABLE (BLOCK)
RIGHT:
    !HPF$ DISTRIBUTE PERIODIC_TABLE (CYCLIC)
    !HPF$ DISTRIBUTE LOG_TABLE (BLOCK)
Programming Model of HPF
[Diagram. Labels: Programming Model; Communication; Parallelism; FORALL; DO INDEPENDENT; INTRINSIC and STANDARD LIBRARY FUNCTIONS; EXTRINSIC FUNCTIONS]
Data Mapping
HPF describes data-to-processor mapping by using two kinds of directives, DISTRIBUTE and ALIGN.
Distribute: directive that describes how an array is divided into even-sized pieces and distributed to processors in a regular way.
There are 4 processors in this example.
    REAL A (100,100)                      Array declaration
    !HPF$ DISTRIBUTE A (BLOCK, BLOCK)
Result: each processor receives a 50x50 block of A; for example, P1 gets A(1:50,1:50).
    !HPF$ DISTRIBUTE A (CYCLIC, *)
Result: each processor receives every 4th row of A; for example, P1 gets A(1,1:100), A(5,1:100), A(9,1:100), …
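The two distributions above can be modelled in a few lines of Python. This is only a sketch of the ownership arithmetic, not HPF itself: the function names are hypothetical, and the 2 x 2 processor grid for (BLOCK, BLOCK) is an assumption chosen because it reproduces the 50x50 blocks stated on the slide.

```python
import math

def block_owner(i, n, p):
    """1-based coordinate of the processor holding index i (1-based)
    when n indices are distributed BLOCK over p processors."""
    block_size = math.ceil(n / p)      # contiguous blocks of ceiling(n/p) indices
    return (i - 1) // block_size + 1

def cyclic_owner(i, p):
    """1-based processor holding index i under CYCLIC (round-robin dealing)."""
    return (i - 1) % p + 1

N = 100
GRID = (2, 2)   # assumption: the 4 processors are arranged as a 2 x 2 grid

def owner_block_block(i, j):
    """Grid coordinates of the processor holding A(i, j) under (BLOCK, BLOCK)."""
    return (block_owner(i, N, GRID[0]), block_owner(j, N, GRID[1]))

# DISTRIBUTE A (CYCLIC, *): only the row index selects the processor.
rows_on_p1 = [i for i in range(1, N + 1) if cyclic_owner(i, 4) == 1]

print(owner_block_block(1, 1), owner_block_block(50, 50))  # (1, 1) (1, 1)
print(owner_block_block(51, 1))                            # (2, 1)
print(rows_on_p1[:3])                                      # [1, 5, 9]
```

Under these assumptions the first processor holds exactly A(1:50,1:50) for (BLOCK, BLOCK) and rows 1, 5, 9, … for (CYCLIC, *), matching the results stated above.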
Data Mapping
Align: directive that describes how two arrays "line up" together.
    !HPF$ ALIGN X(I) WITH Y(I)
Result: X and Y are always distributed the same way.
    !HPF$ ALIGN X(I) WITH Y(2*I-1)
Result: each element X(I) is placed with element Y(2*I-1), that is, with every other element of Y (X can have at most half as many elements as Y).
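A small Python model may help show what the strided alignment implies for placement. The array sizes (Y with 16 elements, X with 8), the BLOCK distribution of Y, and the 4 processors are all assumptions made for illustration; none of them come from the slide.

```python
import math

N_Y, N_X, P = 16, 8, 4   # hypothetical sizes and processor count

def owner_y(j):
    """Processor (1-based) holding Y(j), assuming Y is distributed BLOCK."""
    return (j - 1) // math.ceil(N_Y / P) + 1

def owner_x(i):
    """ALIGN X(I) WITH Y(2*I-1): X(i) lives wherever Y(2*i-1) lives."""
    return owner_y(2 * i - 1)

x_owners = [owner_x(i) for i in range(1, N_X + 1)]
print(x_owners)   # [1, 1, 2, 2, 3, 3, 4, 4]
```

X is never distributed on its own in this model: its placement follows entirely from Y, which is the point of ALIGN.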
Data Mapping Example
There are 4 processors in this example.
    REAL DECK_OF_CARDS (52)
    !HPF$ DISTRIBUTE DECK_OF_CARDS (CYCLIC)
P1: DECK_OF_CARDS (1:49:4)     1  5  9 … 49
P2: DECK_OF_CARDS (2:50:4)     2  6 10 … 50
P3: DECK_OF_CARDS (3:51:4)     3  7 11 … 51
P4: DECK_OF_CARDS (4:52:4)     4  8 12 … 52
    REAL DECK_OF_CARDS (52)
    !HPF$ DISTRIBUTE DECK_OF_CARDS (CYCLIC(5))
P1: DECK_OF_CARDS (1:5) and DECK_OF_CARDS (21:25) and DECK_OF_CARDS (41:45)
P2: DECK_OF_CARDS (6:10) and DECK_OF_CARDS (26:30) and DECK_OF_CARDS (46:50)
P3: DECK_OF_CARDS (11:15) and DECK_OF_CARDS (31:35) and DECK_OF_CARDS (51:52)
P4: DECK_OF_CARDS (16:20) and DECK_OF_CARDS (36:40)
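The card-dealing picture can be checked with a short Python model of CYCLIC(k), where plain CYCLIC is the k = 1 case. The function names are hypothetical; the deck size and processor count are those of the example.

```python
def cyclic_k_owner(i, k, p):
    """Processor (1-based) holding element i (1-based) under CYCLIC(k):
    chunks of k consecutive elements are dealt round-robin to p processors."""
    return ((i - 1) // k) % p + 1

def cards_on(proc, k, n=52, p=4):
    """All elements of an n-element array that land on processor proc."""
    return [i for i in range(1, n + 1) if cyclic_k_owner(i, k, p) == proc]

print(cards_on(1, 1)[:4])                        # [1, 5, 9, 13]
print(cards_on(3, 5))                            # 11..15, 31..35, 51, 52
print([len(cards_on(q, 5)) for q in (1, 2, 3, 4)])   # [15, 15, 12, 10]
```

With CYCLIC(5) the load is no longer even: the last partial chunk (cards 51 and 52) goes to P3, and P4 receives only two full chunks.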
HPF Data Mapping Model
[Diagram] Arrays or other objects → ALIGN (static) or REALIGN (dynamic) → Group of aligned objects → DISTRIBUTE (static) or REDISTRIBUTE (dynamic) → Abstract processors as a user-declared Cartesian mesh → Optional implementation-dependent directive → Physical processors
Data Mapping Example
    REAL, DIMENSION (16) :: A, B, C
    REAL, DIMENSION (32) :: D
    REAL, DIMENSION (8) :: X
    REAL, DIMENSION (0:9) :: Y
    INTEGER, DIMENSION (16) :: INX
    !HPF$ PROCESSORS, DIMENSION(4) :: PROCS
    !HPF$ DISTRIBUTE (BLOCK) ONTO PROCS :: A, B, D, INX
    !HPF$ DISTRIBUTE (CYCLIC) ONTO PROCS :: C
    !HPF$ ALIGN (I) WITH Y(I+1) :: X
HPF Data Mapping Declaration
         PROCS(1)      PROCS(2)      PROCS(3)       PROCS(4)
a        1 2 3 4       5 6 7 8       9 10 11 12     13 14 15 16
b        1 2 3 4       5 6 7 8       9 10 11 12     13 14 15 16
c        1 5 9 13      2 6 10 14     3 7 11 15      4 8 12 16
d        1 to 8        9 to 16       17 to 24       25 to 32
inx      1 2 3 4       5 6 7 8       9 10 11 12     13 14 15 16
x: elements 1 to 8, each x(i) stored with y(i+1)
y: elements 0 to 9
HPF Data Mapping Example 1
    FORALL (I=1:16) A(I) = B(I)
A and B are both distributed BLOCK, so A(I) and B(I) always sit on the same processor. No communication.
HPF Data Mapping Example 2
    FORALL (I=1:16) A(I) = C(I)
A is distributed BLOCK and C is distributed CYCLIC, so only A(1), A(6), A(11) and A(16) find their C element locally. Total communication is 12 elements.
HPF Data Mapping Example 3
    FORALL (I=1:15) A(I) = B(I+1)
Only the last element of each block of A needs a B element from the next processor. Total communication is 3 elements.
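The three communication totals can be reproduced with a small Python count. The sketch assumes an owner-computes rule (the processor holding the left-hand-side element performs the assignment and must fetch any right-hand-side element it does not hold); the helper names are hypothetical.

```python
def block_owner(i, n=16, p=4):
    """Processor (1-based) holding element i under BLOCK."""
    size = -(-n // p)                  # ceiling division
    return (i - 1) // size + 1

def cyclic_owner(i, p=4):
    """Processor (1-based) holding element i under CYCLIC."""
    return (i - 1) % p + 1

def remote_reads(indices, lhs_owner, rhs_owner, rhs_index):
    """Count iterations whose right-hand-side element lives on another processor."""
    return sum(1 for i in indices if lhs_owner(i) != rhs_owner(rhs_index(i)))

ex1 = remote_reads(range(1, 17), block_owner, block_owner, lambda i: i)      # A(I) = B(I)
ex2 = remote_reads(range(1, 17), block_owner, cyclic_owner, lambda i: i)     # A(I) = C(I)
ex3 = remote_reads(range(1, 16), block_owner, block_owner, lambda i: i + 1)  # A(I) = B(I+1)
print(ex1, ex2, ex3)   # 0 12 3
```

The same helper can be used to try other mappings, for example distributing A as CYCLIC too, in which case Example 2 would need no communication.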
Data Parallelism
The most important features used by HPF for parallelism are FORALL and INDEPENDENT.
FORALL generalizes the Fortran 90 array assignment to handle new shapes of arrays. FORALL is not a loop, nor is it a parallel loop as defined in some languages: it does not iterate in any well-defined order.
The INDEPENDENT directive gives the compiler more information about a DO loop or FORALL statement. It tells the compiler that the iterations of a DO loop make no conflicting data accesses that would force the loop to run sequentially. The assertion is the programmer's responsibility; in the loop below it holds only if INDX contains no repeated values.
    !HPF$ INDEPENDENT
    DO I=1,N
      X(INDX(I)) = Y(I)
    END DO
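One way to see what the INDEPENDENT assertion promises for the loop X(INDX(I)) = Y(I) is to run the iterations in two different orders and compare the results. The Python sketch below does that with 0-based indices; the function names and test data are hypothetical.

```python
def scatter(x, indx, y, order):
    """Perform X(INDX(I)) = Y(I) for the iterations I in the given order."""
    out = list(x)
    for i in order:
        out[indx[i]] = y[i]
    return out

def order_independent(indx):
    """True if forward and reverse execution give the same X.
    Y holds distinct values, so any repeated target index shows up as a difference."""
    n = len(indx)
    x = [0] * (max(indx) + 1)
    y = list(range(10, 10 + n))
    forward = scatter(x, indx, y, range(n))
    backward = scatter(x, indx, y, reversed(range(n)))
    return forward == backward

print(order_independent([2, 0, 3, 1]))   # True:  INDX is a permutation
print(order_independent([2, 0, 2, 1]))   # False: two iterations write X(2)
```

When INDX repeats a value, two iterations write the same element and the final result depends on execution order, so asserting INDEPENDENT for such a loop would be incorrect.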
Data Parallelism
There are two kinds of FORALL statement: the single-statement form and the multi-statement form.
A single-statement FORALL:
    FORALL (I = 2:5) A(I,I) = A(I-1,I-1)
A before:
    11 12 13 14 15
    21 22 23 24 25
    31 32 33 34 35
    41 42 43 44 45
    51 52 53 54 55
A after:
    11 12 13 14 15
    21 11 23 24 25
    31 32 22 34 35
    41 42 43 33 45
    51 52 53 54 44
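The diagonal example shows why FORALL is not a loop: every right-hand side is evaluated before any element is assigned. A Python model of both behaviours, with hypothetical function names, makes the difference concrete.

```python
def make_a():
    """5 x 5 matrix whose element (i, j) holds the two-digit value 'ij'."""
    return [[10 * i + j for j in range(1, 6)] for i in range(1, 6)]

def forall_diagonal(a):
    """FORALL (I = 2:5) A(I,I) = A(I-1,I-1): evaluate all right-hand sides, then assign."""
    rhs = {i: a[i - 2][i - 2] for i in range(2, 6)}   # old A(I-1,I-1), 1-based I
    for i, value in rhs.items():
        a[i - 1][i - 1] = value
    return a

def do_loop_diagonal(a):
    """The same statement inside a sequential DO loop: each step sees earlier updates."""
    for i in range(2, 6):
        a[i - 1][i - 1] = a[i - 2][i - 2]
    return a

print([row[k] for k, row in enumerate(forall_diagonal(make_a()))])   # [11, 11, 22, 33, 44]
print([row[k] for k, row in enumerate(do_loop_diagonal(make_a()))])  # [11, 11, 11, 11, 11]
```

The FORALL result shifts the old diagonal down by one position, as in the matrices above, while the sequential loop propagates the value 11 all the way down.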
Data Parallelism
A multi-statement FORALL:
    FORALL (I = 1:8)
      A(I,I) = SQRT(A(I,I))
      FORALL (J = I-3 : I+3, J/=I .AND. J>=1 .AND. J<=8)
        A(I,J) = A(I,I) * A(J,J)
      END FORALL
    END FORALL
A before:
     1  0  0  0  0  0  0  0
     0  4  0  0  0  0  0  0
     0  0  9  0  0  0  0  0
     0  0  0 16  0  0  0  0
     0  0  0  0 25  0  0  0
     0  0  0  0  0 36  0  0
     0  0  0  0  0  0 49  0
     0  0  0  0  0  0  0 64
A after:
     1  2  3  4  0  0  0  0
     2  2  6  8 10  0  0  0
     3  6  3 12 15 18  0  0
     4  8 12  4 20 24 28  0
     0 10 15 20  5 30 35 40
     0  0 18 24 30  6 42 48
     0  0  0 28 35 42  7 56
     0  0  0  0 40 48 56  8
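In a multi-statement FORALL, each statement is completed for every index value before the next statement starts. The Python sketch below models that rule for the 8 x 8 example; the function name is hypothetical and the code uses 0-based storage with 1-based index arithmetic.

```python
import math

def multi_statement_forall(n=8):
    # Initial matrix: squares on the diagonal, zeros elsewhere.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float((i + 1) ** 2)

    # Statement 1, for every I: A(I,I) = SQRT(A(I,I))
    new_diag = [math.sqrt(a[i][i]) for i in range(n)]
    for i in range(n):
        a[i][i] = new_diag[i]

    # Statement 2 (nested FORALL), for every valid (I, J):
    # evaluate all right-hand sides first, then assign.
    updates = {}
    for i in range(1, n + 1):
        for j in range(i - 3, i + 4):
            if j != i and 1 <= j <= n:
                updates[(i, j)] = a[i - 1][i - 1] * a[j - 1][j - 1]
    for (i, j), value in updates.items():
        a[i - 1][j - 1] = value
    return a

result = multi_statement_forall()
print([int(v) for v in result[0]])   # [1, 2, 3, 4, 0, 0, 0, 0]
print([int(v) for v in result[4]])   # [0, 10, 15, 20, 5, 30, 35, 40]
```

Because the whole diagonal is square-rooted before any product is formed, every off-diagonal entry is the product of two already-updated diagonal values.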
Putting It all Together
The total performance of an HPF program is the combination of parallelism and communication. It depends on the programming model, compiler design, target machine characteristics, and other factors.
A simple model for the total computation time of a parallel program is:
    Ttotal = Tpar / Pactive + Tserial + Tcomm
where:
• Ttotal is the total execution time.
• Tpar is the total work that can be executed in parallel.
• Pactive is the number of (physical) processors that are active, executing the work in Tpar.
• Tserial is the total work that is done serially.
• Tcomm is the cost of communications.
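The model is simple enough to evaluate directly. In the Python sketch below the function name and all the numbers are hypothetical, chosen only to show how the serial and communication terms limit the benefit of adding processors.

```python
def total_time(t_par, p_active, t_serial=0.0, t_comm=0.0):
    """Ttotal = Tpar / Pactive + Tserial + Tcomm (all in the same arbitrary time unit)."""
    if p_active < 1:
        raise ValueError("at least one processor must be active")
    return t_par / p_active + t_serial + t_comm

# Hypothetical workload: 1000 units of parallel work, 50 serial, 30 communication.
for p in (1, 4, 16):
    print(p, total_time(1000.0, p, t_serial=50.0, t_comm=30.0))
# 1 1080.0
# 4 330.0
# 16 142.5
```

Going from 4 to 16 processors cuts the parallel term by a factor of four, but the total only drops from 330 to 142.5 because Tserial and Tcomm stay fixed in this model.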
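Example
    REAL, DIMENSION(16,16) :: X, Y
    …..
    FORALL (J = 2:15, K = 2:15)
      Y(J,K) = (X(J,K) + X(J-1,K) + X(J+1,K) + X(J,K-1) + X(J,K+1)) / 5.0
    END FORALL
Various distributions of a 16x16 array onto four processors:
• DISTRIBUTE X (*, BLOCK): each processor holds a strip of four adjacent columns (P1, P2, P3, P4 from left to right).
• DISTRIBUTE X (BLOCK, BLOCK): each processor holds one 8x8 quadrant (P1 and P2 on top, P3 and P4 below).
• DISTRIBUTE X (*, CYCLIC): columns are dealt to P1, P2, P3, P4 in turn, so each processor holds every fourth column.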
Example
DISTRIBUTE X (BLOCK, BLOCK)
• Each processor holds an 8x8 subarray.
• Each processor must compute 49 elements of Y; P1, for example, must compute Y(2:8,2:8).
• Each processor can compute 36 elements of Y without requiring communication.
• For the remaining 13 elements of Y it must obtain 7 elements of X from each of two other processors.
• Tpar/Pactive is 49 element-computations.
• Tcomm is 14 element-exchanges.
DISTRIBUTE X (*, BLOCK)
• P2 and P3 each must compute 56 elements of Y (a 14x4 subarray of Y).
• P1 and P4 each must compute 42 elements of Y (a 14x3 subarray of Y).
• P2 must exchange 14 elements of X with P1 and another 14 elements of X with P3.
• P3 has the same computation as P2.
• P1 and P4 have less work to do.
• Overall completion time: Tpar/Pactive is 56 element-computations and the communication overhead (Tcomm) is 28 element-exchanges.
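The element counts for the five-point stencil can be recomputed with a short Python model. It assumes that X and Y share the same distribution, that the owner of Y(J,K) computes it, and that an "exchange" is one distinct element of X read from another processor; the function names are hypothetical.

```python
from math import ceil

N = 16

def block(i, n, p):
    """0-based processor coordinate of index i (1-based) under BLOCK."""
    return (i - 1) // ceil(n / p)

def owner_block_block(j, k):      # (BLOCK, BLOCK) on an assumed 2 x 2 grid
    return (block(j, N, 2), block(k, N, 2))

def owner_star_block(j, k):       # (*, BLOCK): only the column index matters
    return block(k, N, 4)

def owner_star_cyclic(j, k):      # (*, CYCLIC): columns dealt round-robin
    return (k - 1) % 4

def stencil_cost(owner):
    """Return (largest per-processor work, largest per-processor remote reads)."""
    work, remote = {}, {}
    for j in range(2, N):
        for k in range(2, N):
            me = owner(j, k)
            work[me] = work.get(me, 0) + 1
            remote.setdefault(me, set())
            for a, b in ((j, k), (j - 1, k), (j + 1, k), (j, k - 1), (j, k + 1)):
                if owner(a, b) != me:
                    remote[me].add((a, b))   # distinct remote X elements
    return max(work.values()), max(len(s) for s in remote.values())

print(stencil_cost(owner_block_block))   # (49, 14)
print(stencil_cost(owner_star_block))    # (56, 28)
print(stencil_cost(owner_star_cyclic))   # (56, 112)
```

The first two results match the figures above. Under the same assumptions, (*, CYCLIC) keeps the maximum work at 56 but raises the remote reads to 112, because every left and right neighbour column lives on a different processor.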
Intrinsic and Library Procedures
• System Inquiry Functions (like NUMBER_OF_PROCESSORS, PROCESSORS_SHAPE, SIZE)
• Mapping Inquiry Subroutines (like HPF_ALIGNMENT, HPF_TEMPLATE, HPF_DISTRIBUTION)
• Computational Functions
• Bit Manipulation Functions (like ILEN, LEADZ, POPCNT, POPPAR)
• Array Location Functions (like MAXLOC, MINLOC)
• Array Reduction Functions (like IALL, IANY, IPARITY, PARITY)
• Array Combining Scatter Functions (like SUM_SCATTER, ALL_SCATTER, ANY_SCATTER)
• Array Prefix and Suffix Functions (like SUM_PREFIX, SUM_SUFFIX)
• Array Sorting Functions (like GRADE_DOWN)
Extrinsic Procedures
HPF provides a mechanism by which HPF programs may call procedures written in other parallel programming languages. Because such procedures are outside of HPF, they are called extrinsic procedures.
For instance:
    INTERFACE
      EXTRINSIC (COBOL) SUBROUTINE PRINT_REPORT(DATA_ARRAY)
        REAL DATA_ARRAY(:,:)
      END SUBROUTINE PRINT_REPORT
    END INTERFACE
References and Further Information
• Charles H. Koelbel, David B. Loveman, Robert S. Schreiber, The High Performance Fortran Handbook, The MIT Press, 1994 (in the library)
• Ian Foster, Designing and Building Parallel Programs, http://www-unix.mcs.anl.gov/dbpp/text/node82.html#SECTION03400000000000000000
• http://www.crpc.rice.edu/HPFF/, Rice University
• http://www.npac.syr.edu/hpfa/, Syracuse University