Theory and Practice of Dependence Testing • Data and control dependences • Scalar data dependences • True-, anti-, and output-dependences • Loop dependences • Loop parallelism • Dependence distance and direction vectors • Loop-carried dependences • Loop-independent dependences • Loop dependence and loop iteration reordering • Dependence tests
Data and Control Dependences • Data dependence: a constraint that data must be produced and consumed in the correct order • Control dependence: a dependence that arises as a result of control flow

S1  PI = 3.14
S2  R = 5.0
S3  AREA = PI * R ** 2

S1  IF (T.EQ.0.0) GOTO S3
S2  A = A / T
S3  CONTINUE
Data Dependence • Definition • There is a data dependence from statement S1 to statement S2 iff
1) both S1 and S2 access the same memory location and at least one of them stores into it, and
2) there is a feasible run-time execution path from S1 to S2
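As a small illustration of condition 2 (not from the slides; the names FLAG, X, A, and B are made up), the two statements below access the same variable X, but without an enclosing loop no run-time execution path leads from S1 to S2, so there is no data dependence between them:

      IF (FLAG) THEN
S1       X = A
      ELSE
S2       B = X
      ENDIF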
Data Dependence Classification • Dependence relations are similar to hardware hazards (which may cause pipeline stalls) • True dependences (also called flow dependences), denoted by S1 δ S2, are the same as RAW (Read After Write) hazards • Anti dependences, denoted by S1 δ-1 S2, are the same as WAR (Write After Read) hazards • Output dependences, denoted by S1 δo S2, are the same as WAW (Write After Write) hazards

True (RAW):        Anti (WAR):        Output (WAW):
S1  X = …          S1  … = X          S1  X = …
S2  … = X          S2  X = …          S2  X = …
S1 δ S2            S1 δ-1 S2          S1 δo S2
Compiling for Loop Parallelism • Dependence properties are crucial for determining and exploiting loop parallelism • Bernstein’s conditions • Iteration I1 does not write into a location that is read by iteration I2 • Iteration I2 does not write into a location that is read by iteration I1 • Iteration I1 does not write into a location that is written into by iteration I2
Fortran’s PARALLEL DO • The PARALLEL DO guarantees that there are no scheduling constraints among its iterations

PARALLEL DO I = 1, N
   A(I+1) = A(I) + B(I)
ENDDO
Not OK (loop-carried true dependence)

PARALLEL DO I = 1, N
   A(I) = A(I) + B(I)
ENDDO
OK

PARALLEL DO I = 1, N
   A(I-1) = A(I) + B(I)
ENDDO
Not OK (loop-carried anti dependence)

PARALLEL DO I = 1, N
   S = S + B(I)
ENDDO
Not OK (every iteration updates S)
Loop Dependences • While the PARALLEL DO requires the programmer to ensure that the loop iterations can be executed in parallel, an optimizing compiler must analyze loop dependences for automatic loop parallelization and vectorization • Granularity of parallelism • Instruction-level parallelism (ILP), not loop-specific • Synchronous (e.g. SIMD, also vector and SSE) parallelism is fine-grain; the compiler aims to optimize inner loops • Asynchronous (e.g. MIMD, work sharing) parallelism is coarse-grain; the compiler aims to optimize outer loops
Synchronous Parallel Machines • SIMD model, simpler and cheaper compared to MIMD • Processors operate in lockstep executing the same instruction on different portions of the data space • Application should exhibit data parallelism • Control flow requires masking operations (similar to predicated instructions), which can be inefficient

[Figure: example 4x4 processor mesh]
Asynchronous Parallel Computers • Multiprocessors • Shared memory is directly accessible by each PE (processing element) • Multicomputers • Distributed memory requires message passing between PEs • SMPs • Shared memory in small clusters of processors, need message passing across clusters

[Figure: a shared-memory multiprocessor (PEs p1..p4 on a bus with shared memory) and a distributed-memory machine (PEs p1..p4 with local memories m1..m4 connected by a bus, x-bar, or network)]
Advanced Compiler Technology • Powerful restructuring compilers transform loops into parallel or vector code • Must analyze data dependences in loop nests besides straight-line code

DO I = 1, N
   DO J = 1, N
      C(J,I) = 0.0
      DO K = 1, N
         C(J,I) = C(J,I) + A(J,K)*B(K,I)
      ENDDO
   ENDDO
ENDDO

parallelize:

PARALLEL DO I = 1, N
   DO J = 1, N
      C(J,I) = 0.0
      DO K = 1, N
         C(J,I) = C(J,I) + A(J,K)*B(K,I)
      ENDDO
   ENDDO
ENDDO

vectorize:

DO I = 1, N
   DO J = 1, N, 64
      C(J:J+63,I) = 0.0
      DO K = 1, N
         C(J:J+63,I) = C(J:J+63,I) + A(J:J+63,K)*B(K,I)
      ENDDO
   ENDDO
ENDDO
Normalized Iteration Space • In most situations it is preferable to use normalized iteration spaces • For an arbitrary loop with loop index I, loop bounds L and U, and stride S, the normalized iteration number is i = (I-L+S)/S

DO I = 100, 20, -10
   A(I) = B(100-I) + C(I/5)
ENDDO

is analyzed with i = (110-I)/10 as

DO i = 1, 9
   A(110-10*i) = B(10*i-10) + C(22-2*i)
ENDDO
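A minimal sketch of this computation (not from the slides; the function name NORMIT is an assumption):

INTEGER FUNCTION NORMIT(I, L, S)
!  Normalized iteration number i = (I - L + S) / S for loop index I,
!  lower bound L, and stride S
INTEGER I, L, S
NORMIT = (I - L + S) / S
END

For the loop above, NORMIT(100, 100, -10) = 1 and NORMIT(20, 100, -10) = 9.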
Dependence in Loops • In this example loop, statement S1 “depends on itself” from the previous iteration • More precisely, there is a true dependence from statement S1 in iteration i to S1 in iteration i+1

DO I = 1, N
S1    A(I+1) = A(I) + B(I)
ENDDO

is executed as

S1    A(2) = A(1) + B(1)
S1    A(3) = A(2) + B(2)
S1    A(4) = A(3) + B(3)
S1    A(5) = A(4) + B(4)
      …
Iteration Vector • Definition • Given a nest of n loops, the iteration vector i of a particular iteration of the innermost loop is a vector of integers i = (i1, i2, …, in) where ik represents the iteration number for the loop at nesting level k • The set of all possible iteration vectors is an iteration space
Iteration Vector Example • The iteration space of the statement at S1 is {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} • At iteration i = (2,1) the value of A(1,2) is assigned to A(2,1)

DO I = 1, 3
   DO J = 1, I
S1    A(I,J) = A(J,I)
   ENDDO
ENDDO

[Figure: the iteration space plotted on an (I,J) grid]
Iteration Vector Ordering • The iteration vectors are naturally ordered according to a lexicographical order, e.g. iteration (1,2) precedes (2,1) and (2,2) in the example on the previous slide • Definition • Iteration i precedes iteration j, denoted i < j, iff
1) i[1:n-1] < j[1:n-1], or
2) i[1:n-1] = j[1:n-1] and in < jn
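A minimal sketch of this ordering test (not from the slides; the function name LEXLT is an assumption): it returns .TRUE. exactly when iteration vector IV lexicographically precedes JV.

LOGICAL FUNCTION LEXLT(IV, JV, N)
!  .TRUE. iff the n-element iteration vector IV precedes JV
!  in the lexicographic order defined above
INTEGER N, K
INTEGER IV(N), JV(N)
LEXLT = .FALSE.
DO K = 1, N
   IF (IV(K) .LT. JV(K)) THEN
      LEXLT = .TRUE.       ! first differing component decides
      RETURN
   ELSE IF (IV(K) .GT. JV(K)) THEN
      RETURN               ! IV does not precede JV
   ENDIF
ENDDO
END                        ! equal vectors: no strict precedence

For the example on the previous slide, LEXLT((/1,2/), (/2,1/), 2) returns .TRUE. because (1,2) precedes (2,1).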
Loop Dependence • Definition • There exists a dependence from S1 to S2 in a loop nest iff there exist two iteration vectors i and j such that
1) i < j and there is a path from S1 to S2,
2) S1 accesses memory location M on iteration i and S2 accesses memory location M on iteration j, and
3) one of these accesses is a write
Loop Dependence Example • Show that the example loop has no loop dependences • Answer: there are no iteration vectors i and j in {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} such that i < j and either S1 in iteration i writes to the same element of A that is read at S1 in iteration j, or S1 in iteration i reads an element of A that is written in iteration j

DO I = 1, 3
   DO J = 1, I
S1    A(I,J) = A(J,I)
   ENDDO
ENDDO
Loop Reordering Transformations • Definitions • A reordering transformation is any program transformation that changes the execution order of the code, without adding or deleting any statement executions • A reordering transformation preserves a dependence if it preserves the relative execution order of the source and sink of that dependence
Fundamental Theorem of Dependence • Any reordering transformation that preserves every dependence in a program preserves the meaning of that program
Valid Transformations • A transformation is said to be valid for the program to which it applies if it preserves all dependences in the program

DO I = 1, 3
S1    A(I+1) = B(I)
S2    B(I+1) = A(I)
ENDDO

valid (statement interchange; both dependences are loop-carried and remain so):

DO I = 1, 3
      B(I+1) = A(I)
      A(I+1) = B(I)
ENDDO

invalid (loop reversal; it reverses the source and sink of the loop-carried dependences):

DO I = 3, 1, -1
      A(I+1) = B(I)
      B(I+1) = A(I)
ENDDO
Dependences and Loop Transformations • Loop dependences are tested before a transformation is applied • When a dependence test is inconclusive, dependence must be assumed • In this example the value of K is unknown and loop dependence is assumed:

DO I = 1, N
S1    A(I+K) = A(I) + B(I)
ENDDO
Dependence Distance Vector • Definition • Suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence distance vector d(i, j) is defined as d(i, j) = j - i • Example:

DO I = 1, 3
   DO J = 1, I
S1    A(I+1,J) = A(I,J)
   ENDDO
ENDDO

True dependence between S1 and itself on
i = (1,1) and j = (2,1): d(i, j) = (1,0)
i = (2,1) and j = (3,1): d(i, j) = (1,0)
i = (2,2) and j = (3,2): d(i, j) = (1,0)
Dependence Direction Vector • Definition • Suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence direction vector D(i, j) is defined as

D(i, j)k = “<” if d(i, j)k > 0
D(i, j)k = “=” if d(i, j)k = 0
D(i, j)k = “>” if d(i, j)k < 0
Example

DO I = 1, 3
   DO J = 1, I
S1    A(I+1,J) = A(I,J)
   ENDDO
ENDDO

True dependence between S1 and itself on
i = (1,1) and j = (2,1): d(i, j) = (1,0), D(i, j) = (<, =)
i = (2,1) and j = (3,1): d(i, j) = (1,0), D(i, j) = (<, =)
i = (2,2) and j = (3,2): d(i, j) = (1,0), D(i, j) = (<, =)

[Figure: iteration space on an (I,J) grid with arrows from the source to the sink of each dependence]
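A minimal sketch of the mapping from distance components to direction components (not from the slides; the function name DIRCOMP is an assumption):

CHARACTER FUNCTION DIRCOMP(D)
!  Maps one distance vector component d(i,j)k to the corresponding
!  direction vector component D(i,j)k per the definition above
INTEGER D
IF (D .GT. 0) THEN
   DIRCOMP = '<'
ELSE IF (D .EQ. 0) THEN
   DIRCOMP = '='
ELSE
   DIRCOMP = '>'
ENDIF
END

Applying DIRCOMP to each component of d(i, j) = (1,0) gives D(i, j) = (<, =), as in the example above.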
Data Dependence Graphs • Data dependences arise between statement instances • It is generally infeasible to represent all data dependences that arise in a program • Usually only static data dependences are recorded using the data dependence direction vector and depicted in a data dependence graph to compactly represent data dependences in a loop nest

DO I = 1, 10000
S1    A(I) = B(I) * 5
S2    C(I+1) = C(I) + A(I)
ENDDO

Static data dependences for accesses to A and C: S1(=)S2 and S2(<)S2

Data dependence graph: an edge S1 → S2 labeled (=) and a self-edge S2 → S2 labeled (<)
Loop-Independent Dependences • Definition • Statement S1 has a loop-independent dependence on S2 iff there exist two iteration vectors i and j such that
1) S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and i = j
2) there is a control flow path from S1 to S2 within the iteration
Loop-Carried Dependences • Definitions • Statement S1 has a loop-carried dependence on S2 iff there exist two iteration vectors i and j such that S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and d(i, j) > 0 (that is, D(i, j) contains a “<” as its leftmost non-“=” component) • A loop-carried dependence from S1 to S2 is
1) forward if S2 appears after S1 in the loop body
2) backward if S2 appears before S1 (or if S1 = S2)
• The level of a loop-carried dependence is the index of the leftmost non-“=” component of D(i, j) for the dependence
Example (1) • The loop-carried dependence S1(<)S2 is forward • The loop-carried dependence S2(<)S1 is backward • All loop-carried dependences are of level 1, because D(i, j) = (<) for every dependence

DO I = 1, 3
S1    A(I+1) = F(I)
S2    F(I+1) = A(I)
ENDDO

S1(<)S2 and S2(<)S1
Example (2) • All loop-carried dependences are of level 3, because D(i, j) = (=, =, <) • Note: level-k dependences are also denoted by Sx δk Sy

DO I = 1, 10
   DO J = 1, 10
      DO K = 1, 10
S1       A(I,J,K+1) = A(I,J,K)
      ENDDO
   ENDDO
ENDDO

S1(=,=,<)S1, or in the alternative notation for a level-3 dependence, S1 δ3 S1
Direction Vectors and Reordering Transformations (1) • Let T be a transformation on a loop nest that does not rearrange the statements in the body of the innermost loop; then T is valid if, after it is applied, none of the direction vectors for dependences with source and sink in the nest has a leftmost non-“=” component that is “>”
Direction Vectors and Reordering Transformations (2) • A reordering transformation preserves all level-k dependences if
1) it preserves the iteration order of the level-k loop,
2) it does not interchange any loop at level < k to a position inside the level-k loop, and
3) it does not interchange any loop at level > k to a position outside the level-k loop
Examples

DO I = 1, 10
S1    F(I+1) = A(I)
S2    A(I+1) = F(I)
ENDDO

Find deps.: both S1(<)S2 and S2(<)S1 are level-1 dependences

Choose transform (statement interchange; valid because both dependences stay loop-carried at level 1):

DO I = 1, 10
S2    A(I+1) = F(I)
S1    F(I+1) = A(I)
ENDDO

DO I = 1, 10
   DO J = 1, 10
      DO K = 1, 10
S1       A(I+1,J+2,K+3) = A(I,J,K) + B
      ENDDO
   ENDDO
ENDDO

Find deps.: S1(<,<,<)S1 is a level-1 dependence

Choose transform (interchange and reverse the inner loops; valid because the iteration order of the level-1 loop is preserved):

DO I = 1, 10
   DO K = 10, 1, -1
      DO J = 1, 10
S1       A(I+1,J+2,K+3) = A(I,J,K) + B
      ENDDO
   ENDDO
ENDDO
Direction Vectors and Reordering Transformations (3) • A reordering transformation preserves a loop-independent dependence between statements S1 and S2 if it does not move statement instances between iterations and preserves the relative order of S1 and S2
Examples

DO I = 1, 10
S1    A(I) = …
S2    … = A(I)
ENDDO

Find deps.: S1(=)S2 is a loop-independent dependence

Choose transform (loop reversal; valid because the relative order of S1 and S2 is preserved):

DO I = 10, 1, -1
S1    A(I) = …
S2    … = A(I)
ENDDO

DO I = 1, 10
   DO J = 1, 10
S1    A(I,J) = …
S2    … = A(I,J)
   ENDDO
ENDDO

Find deps.: S1(=,=)S2 is a loop-independent dependence

Choose transform (loop interchange and reversal; valid for the same reason):

DO J = 10, 1, -1
   DO I = 10, 1, -1
S1    A(I,J) = …
S2    … = A(I,J)
   ENDDO
ENDDO
Combining Direction Vectors • When a loop nest has multiple different directions at the same loop level k, we sometimes abbreviate the level-k direction vector component with “*”, e.g. S1(*)S2 stands for S1(<)S2, S1(=)S2, and S1(>)S2

DO I = 1, 9
S1    A(I) = …
S2    … = A(10-I)
ENDDO

S1(*)S2

DO I = 1, 10
S1    S = S + A(I)
ENDDO

S1(*)S1

DO I = 1, 10
   DO J = 1, 10
S1    A(J) = A(J)+1
   ENDDO
ENDDO

S1(*,=)S1
Iteration Vectors • Plotting the iteration vectors of a loop nest on a grid renders the dependences in the loop’s iteration space • Helps to visualize dependences

DO I = 1, 3
   DO J = 1, I
S1    A(I+1,J) = A(I,J)
   ENDDO
ENDDO

Distance vector is (1,0): S1(<,=)S1

[Figure: iteration space on an (I,J) grid with dependence arrows]
Example 1

DO I = 1, 4
   DO J = 1, 4
S1    A(I,J+1) = A(I-1,J)
   ENDDO
ENDDO

Distance vector is (1,1): S1(<,<)S1

[Figure: 4x4 iteration space with dependence arrows]
Example 2

DO I = 1, 4
   DO J = 1, 5-I
S1    A(I+1,J+I-1) = A(I,J+I-1)
   ENDDO
ENDDO

Distance vector is (1,-1): S1(<,>)S1

[Figure: triangular iteration space with dependence arrows]
Example 3

DO I = 1, 4
   DO J = 1, 4
S1    B(I) = B(I) + C(I)
S2    A(I+1,J) = A(I,J)
   ENDDO
ENDDO

Distance vectors are (0,1) and (1,0): S1(=,<)S1 and S2(<,=)S2

[Figure: 4x4 iteration space with dependence arrows]
Parallelization • It is valid to convert a sequential loop to a parallel loop if the loop carries no dependence

DO I = 1, 4
   DO J = 1, 4
S1    A(I,J+1) = A(I,J)
   ENDDO
ENDDO

S1(=,<)S1: the I loop carries no dependence

parallelize:

PARALLEL DO I = 1, 4
   DO J = 1, 4
S1    A(I,J+1) = A(I,J)
   ENDDO
ENDDO
Vectorization (1) • A single-statement loop that carries no dependence can be vectorized

DO I = 1, 4
S1    X(I) = X(I) + C
ENDDO

vectorize:

S1    X(1:4) = X(1:4) + C

(Fortran 90 array statement, executed as a vector operation)
Vectorization (2) • This example has a loop-carried dependence, so vectorizing it is invalid

DO I = 1, N
S1    X(I+1) = X(I) + C
ENDDO

S1(<)S1

S1    X(2:N+1) = X(1:N) + C

Invalid: the Fortran 90 statement is not equivalent to the loop
Vectorization (3) • Because only single-statement loops can be vectorized, loops with multiple statements must be transformed using the loop distribution transformation • Distribution is directly possible when the loop has no loop-carried dependences or only forward flow dependences

DO I = 1, N
S1    A(I+1) = B(I) + C
S2    D(I) = A(I) + E
ENDDO

loop distribution:

DO I = 1, N
S1    A(I+1) = B(I) + C
ENDDO
DO I = 1, N
S2    D(I) = A(I) + E
ENDDO

vectorize:

S1    A(2:N+1) = B(1:N) + C
S2    D(1:N) = A(1:N) + E
Vectorization (4) • When a loop has backward flow dependences and no loop-independent dependences, interchange the statements to enable loop distribution

DO I = 1, N
S1    D(I) = A(I) + E
S2    A(I+1) = B(I) + C
ENDDO

statement interchange and loop distribution:

DO I = 1, N
S2    A(I+1) = B(I) + C
ENDDO
DO I = 1, N
S1    D(I) = A(I) + E
ENDDO

vectorize:

S2    A(2:N+1) = B(1:N) + C
S1    D(1:N) = A(1:N) + E
Preliminary Transformations to Support Dependence Testing • To determine distance and direction vectors, dependence tests require array subscripts to be in a standard form • Preliminary transformations put more subscripts in affine form, where subscripts are linear integer functions of the loop induction variables • Loop normalization • Induction-variable substitution • Constant propagation
Loop Normalization • See the earlier slide on normalized iteration spaces • Dependence tests assume iteration spaces are normalized • Normalization can distort dependence directions (as in the example below), but it is preferred because it simplifies loop analysis • Make the lower bound 1 (or 0) • Make the stride 1

DO I = 1, M
   DO J = I, N
S1    A(J,I) = A(J,I-1) + 5
   ENDDO
ENDDO

S1(<,=)S1

normalize:

DO I = 1, M
   DO J = 1, N - I + 1
S1    A(J+I-1,I) = A(J+I-1,I-1) + 5
   ENDDO
ENDDO

S1(<,>)S1
Data Flow Analysis • Data flow analysis with reaching definitions or SSA form supports constant propagation and dead code elimination • These are needed to standardize array subscripts

K = 2
DO I = 1, 100
S1    A(I+K) = A(I) + 5
ENDDO

constant propagation:

DO I = 1, 100
S1    A(I+2) = A(I) + 5
ENDDO
Forward Substitution • Forward substitution and induction-variable substitution (IVS) yield array subscripts that are amenable to dependence analysis

DO I = 1, 100
      K = I + 2
S1    A(K) = A(K) + 5
ENDDO

forward substitution:

DO I = 1, 100
S1    A(I+2) = A(I+2) + 5
ENDDO
Induction Variable Substitution • After IVS the array subscripts are affine, e.g. accesses A(2*I+1) and A(2*I+2) • A dependence test, here the GCD test, solves the dependence equation 2*id - 2*iu = -1 • Since 2 does not divide 1, there is no data dependence (intuitively, one access touches only odd-indexed elements of A and the other only even-indexed elements)
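A minimal sketch of the GCD divisibility check (not from the slides; the function name GCDTEST and its argument convention are assumptions): for a dependence equation a*id + b*iu = c, a dependence is possible only if gcd(|a|, |b|) divides c.

LOGICAL FUNCTION GCDTEST(A, B, C)
!  .TRUE. means a dependence is possible for a*id + b*iu = c,
!  .FALSE. means the equation has no integer solution, hence no dependence
INTEGER A, B, C, X, Y, T
X = ABS(A)
Y = ABS(B)
DO WHILE (Y .NE. 0)        ! Euclid's algorithm: X becomes gcd(|A|,|B|)
   T = MOD(X, Y)
   X = Y
   Y = T
ENDDO
IF (X .EQ. 0) THEN         ! degenerate case A = B = 0
   GCDTEST = (C .EQ. 0)
ELSE
   GCDTEST = (MOD(ABS(C), X) .EQ. 0)
ENDIF
END

For the equation above, GCDTEST(2, -2, -1) returns .FALSE., confirming that no dependence exists.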
Example (1) • Before and after induction-variable substitution of KI in the inner loop

INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      KI = KI + INC
      U(KI) = U(KI) + W(J)
   ENDDO
   S(I) = U(KI)
ENDDO

after IVS of the inner loop:

INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      U(KI+J*INC) = U(KI+J*INC) + W(J)
   ENDDO
   KI = KI + 100 * INC
   S(I) = U(KI)
ENDDO
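As a follow-up sketch (not shown on the slide), substituting KI in the outer loop as well, and then propagating the constant INC = 2, removes the induction variable entirely and leaves subscripts of U that are affine in I and J:

DO I = 1, 100
   DO J = 1, 100
      U(200*(I-1) + 2*J) = U(200*(I-1) + 2*J) + W(J)
   ENDDO
   S(I) = U(200*I)
ENDDO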