
Theory and Practice of Dependence Testing


Presentation Transcript


  1. Theory and Practice of Dependence Testing • Data and control dependences • Scalar data dependences • True-, anti-, and output-dependences • Loop dependences • Loop parallelism • Dependence distance and direction vectors • Loop-carried dependences • Loop-independent dependences • Loop dependence and loop iteration reordering • Dependence tests

  2. Data and Control Dependences • Data dependence: a constraint that ensures data is produced and consumed in the correct order • Control dependence: a dependence that arises as a result of control flow
      Data dependence example:
      S1 PI = 3.14
      S2 R = 5.0
      S3 AREA = PI * R ** 2
      Control dependence example:
      S1 IF (T.EQ.0.0) GOTO S3
      S2 A = A / T
      S3 CONTINUE

  3. Data Dependence • Definition • There is a data dependence from statement S1 to statement S2 iff 1) both S1 and S2 access the same memory location and at least one of them stores into it, and 2) there is a feasible run-time execution path from S1 to S2

  4. Data Dependence Classification • Dependence relations are similar to hardware hazards (which may cause pipeline stalls) • True dependences (also called flow dependences), denoted by S1 δ S2, are the same as RAW (Read After Write) hazards • Anti dependences, denoted by S1 δ^-1 S2, are the same as WAR (Write After Read) hazards • Output dependences, denoted by S1 δ^o S2, are the same as WAW (Write After Write) hazards
      True dependence, S1 δ S2:
      S1 X = …
      S2 … = X
      Anti dependence, S1 δ^-1 S2:
      S1 … = X
      S2 X = …
      Output dependence, S1 δ^o S2:
      S1 X = …
      S2 X = …

  5. Compiling for Loop Parallelism • Dependence properties are crucial for determining and exploiting loop parallelism • Bernstein’s conditions: two iterations I1 and I2 can be executed in parallel if • Iteration I1 does not write into a location that is read by iteration I2 • Iteration I2 does not write into a location that is read by iteration I1 • Iteration I1 does not write into a location that is written into by iteration I2

  6. Fortran’s PARALLEL DO • The PARALLEL DO guarantees that there are no scheduling constraints among its iterations
      OK (each iteration reads and writes only its own element of A):
      PARALLEL DO I = 1, N
        A(I) = A(I) + B(I)
      ENDDO
      Not OK (each of these loops has a dependence between different iterations):
      PARALLEL DO I = 1, N
        A(I+1) = A(I) + B(I)
      ENDDO
      PARALLEL DO I = 1, N
        A(I-1) = A(I) + B(I)
      ENDDO
      PARALLEL DO I = 1, N
        S = S + B(I)
      ENDDO
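Bernstein's conditions can be checked mechanically for small loops. The following Python sketch is an illustration and not part of the original slides: it assumes N = 8 and models each iteration with hypothetical `reads` and `writes` functions that return sets of memory locations, then tests every pair of distinct iterations of the four loops on this slide.

```python
from itertools import combinations

def bernstein_ok(reads, writes, iterations):
    """True if every pair of distinct iterations satisfies Bernstein's conditions.
    reads(i) and writes(i) return the sets of locations touched by iteration i."""
    for i1, i2 in combinations(iterations, 2):
        if writes(i1) & reads(i2):    # I1 writes a location that I2 reads
            return False
        if writes(i2) & reads(i1):    # I2 writes a location that I1 reads
            return False
        if writes(i1) & writes(i2):   # both iterations write the same location
            return False
    return True

N = 8                                  # assumed loop bound for the illustration
iterations = range(1, N + 1)

# Each loop body as (reads, writes); a location is a tuple (array name, index).
loops = {
    "A(I+1) = A(I) + B(I)": (lambda i: {("A", i), ("B", i)}, lambda i: {("A", i + 1)}),
    "A(I) = A(I) + B(I)":   (lambda i: {("A", i), ("B", i)}, lambda i: {("A", i)}),
    "A(I-1) = A(I) + B(I)": (lambda i: {("A", i), ("B", i)}, lambda i: {("A", i - 1)}),
    "S = S + B(I)":         (lambda i: {("S",), ("B", i)},   lambda i: {("S",)}),
}

results = {name: bernstein_ok(r, w, iterations) for name, (r, w) in loops.items()}
```

Under these assumptions only `A(I) = A(I) + B(I)` passes; each of the other three loops has a pair of iterations that violates one of the conditions.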

  7. Loop Dependences • While the PARALLEL DO requires the programmer to ensure that the loop iterations can be executed in parallel, an optimizing compiler must analyze loop dependences for automatic loop parallelization and vectorization • Granularity of parallelism • Instruction level parallelism (ILP), not loop-specific • Synchronous (e.g. SIMD, also vector and SSE) parallelism is fine grain and the compiler aims to optimize inner loops • Asynchronous (e.g. MIMD, work sharing) parallelism is coarse grain and the compiler aims to optimize outer loops

  8. Synchronous Parallel Machines • SIMD model, simpler and cheaper compared to MIMD • Processors operate in lockstep executing the same instruction on different portions of the data space • Application should exhibit data parallelism • Control flow requires masking operations (similar to predicated instructions), which can be inefficient
      [Figure: example 4x4 processor mesh, processors p1,1 through p4,4]

  9. Asynchronous Parallel Computers • Multiprocessors • Shared memory is directly accessible by each PE (processing element) • Multicomputers • Distributed memory requires message passing between PEs • SMPs • Shared memory in small clusters of processors, need message passing across clusters
      [Figure: processors p1 to p4 sharing one memory over a bus; processors p1 to p4 with memories m1 to m4 connected by a bus, x-bar, or network]

  10. Advanced Compiler Technology • Powerful restructuring compilers transform loops into parallel or vector code • Must analyze data dependences in loop nests as well as in straight-line code
      Original loop nest:
      DO I = 1, N
        DO J = 1, N
          C(J,I) = 0.0
          DO K = 1, N
            C(J,I) = C(J,I)+A(J,K)*B(K,I)
          ENDDO
        ENDDO
      ENDDO
      Parallelized:
      PARALLEL DO I = 1, N
        DO J = 1, N
          C(J,I) = 0.0
          DO K = 1, N
            C(J,I) = C(J,I)+A(J,K)*B(K,I)
          ENDDO
        ENDDO
      ENDDO
      Vectorized:
      DO I = 1, N
        DO J = 1, N, 64
          C(J:J+63,I) = 0.0
          DO K = 1, N
            C(J:J+63,I) = C(J:J+63,I)+A(J:J+63,K)*B(K,I)
          ENDDO
        ENDDO
      ENDDO

  11. Normalized Iteration Space • In most situations it is preferable to use normalized iteration spaces • For an arbitrary loop with loop index I, loop bounds L and U, and stride S, the normalized iteration number is i = (I-L+S)/S
      DO I = 100, 20, -10
        A(I) = B(100-I) + C(I/5)
      ENDDO
      is analyzed with i = (110-I)/10 as
      DO i = 1, 9
        A(110-10*i) = B(10*i-10) + C(22-2*i)
      ENDDO
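The normalization formula can be checked against the example on this slide. The Python sketch below is an added illustration (the helper name `normalized` is hypothetical); it enumerates the original index values, computes i = (I-L+S)/S for each, and confirms that the rewritten subscripts agree with the original ones.

```python
def normalized(I, L, S):
    """Normalized iteration number i = (I - L + S) / S.
    The division is exact for every value the loop index actually takes."""
    return (I - L + S) // S

# DO I = 100, 20, -10
L, U, S = 100, 20, -10
index_values = list(range(L, U - 1, S))   # 100, 90, ..., 20 (stop bound assumes S < 0)
numbers = [normalized(I, L, S) for I in index_values]

# Each original subscript must equal the rewritten subscript for the same iteration:
# A(I) vs A(110-10*i), B(100-I) vs B(10*i-10), C(I/5) vs C(22-2*i).
subscripts_match = all(
    I == 110 - 10 * i and 100 - I == 10 * i - 10 and I // 5 == 22 - 2 * i
    for I, i in zip(index_values, numbers)
)
```

Here `numbers` runs from 1 to 9, matching the bounds of the normalized loop.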

  12. Dependence in Loops • In this example loop statement S1 “depends on itself” from the previous iteration • More precisely, statement S1 in iteration i has a true dependence with S1 in iteration i+1
         DO I = 1, N
      S1   A(I+1) = A(I) + B(I)
         ENDDO
      is executed as
      S1 A(2) = A(1) + B(1)
      S1 A(3) = A(2) + B(2)
      S1 A(4) = A(3) + B(3)
      S1 A(5) = A(4) + B(4)
      …

  13. Iteration Vector • Definition • Given a nest of n loops, the iteration vector i of a particular iteration of the innermost loop is a vector of integers i = (i1, i2, …, in), where ik represents the iteration number for the loop at nesting level k • The set of all possible iteration vectors is an iteration space

  14. Iteration Vector Example • The iteration space of the statement at S1 is {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} • At iteration i = (2,1) the value of A(1,2) is assigned to A(2,1)
         DO I = 1, 3
           DO J = 1, I
      S1     A(I,J) = A(J,I)
           ENDDO
         ENDDO
      [Figure: the iteration space plotted on axes I and J, with points (1,1), (2,1), (2,2), (3,1), (3,2), (3,3)]

  15. Iteration Vector Ordering • The iteration vectors are naturally ordered according to a lexicographical order, e.g. iteration (1,1) precedes (2,1) and (2,2) in the example on the previous slide • Definition • Iteration i precedes iteration j, denoted i < j, iff 1) i[1:n-1] < j[1:n-1], or 2) i[1:n-1] = j[1:n-1] and in < jn
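The recursive definition of precedence translates directly into code. The Python sketch below is an added illustration (the function name `precedes` is hypothetical): it compares the prefixes first and then the last component, and it agrees with Python's built-in tuple comparison, which is also lexicographic.

```python
def precedes(i, j):
    """Lexicographic i < j for iteration vectors of equal length, following
    the definition: compare the prefixes i[1:n-1] and j[1:n-1] first,
    then the last components."""
    if len(i) != len(j):
        raise ValueError("iteration vectors must have the same length")
    if len(i) == 0:
        return False                      # empty prefixes are equal
    *i_prefix, i_last = i
    *j_prefix, j_last = j
    return precedes(i_prefix, j_prefix) or (i_prefix == j_prefix and i_last < j_last)

# Iteration space of DO I = 1, 3 / DO J = 1, I, listed in execution order.
space = [(I, J) for I in range(1, 4) for J in range(1, I + 1)]
in_execution_order = all(precedes(a, b) for a, b in zip(space, space[1:]))
```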

  16. Loop Dependence • Definition • There exists a dependence from S1 to S2 in a loop nest iff there exist two iteration vectors i and j such that 1) i < j, or i = j and there is a path from S1 to S2 in the loop body, 2) S1 accesses memory location M on iteration i and S2 accesses memory location M on iteration j, and 3) one of these accesses is a write

  17. Loop Dependence Example • Show that the example loop has no loop dependences • Answer: there are no iteration vectors i and j in {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} such that i < j and either S1 in iteration i writes to the same element of A that is read at S1 in iteration j, or S1 in iteration i reads an element of A that is written in iteration j
         DO I = 1, 3
           DO J = 1, I
      S1     A(I,J) = A(J,I)
           ENDDO
         ENDDO
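For an iteration space this small, the answer can be confirmed by exhaustive search. The Python sketch below is an added illustration, not a dependence test a compiler would use: the helper `loop_dependences` is hypothetical and simply compares the elements written and read by every ordered pair of distinct iterations. For contrast it also runs on an assumed rectangular 3x3 space, where dependences do appear.

```python
def loop_dependences(space, written, read):
    """List (i, j, kind) for all iterations i before j that touch the same
    element with at least one write. `space` must be in execution order."""
    found = []
    for pos, i in enumerate(space):
        for j in space[pos + 1:]:
            if written(i) == read(j):
                found.append((i, j, "true"))
            if read(i) == written(j):
                found.append((i, j, "anti"))
            if written(i) == written(j):
                found.append((i, j, "output"))
    return found

written = lambda v: (v[0], v[1])   # S1 writes A(I,J)
read = lambda v: (v[1], v[0])      # S1 reads  A(J,I)

triangular = [(I, J) for I in range(1, 4) for J in range(1, I + 1)]   # DO J = 1, I
rectangular = [(I, J) for I in range(1, 4) for J in range(1, 4)]      # assumed DO J = 1, 3

deps_triangular = loop_dependences(triangular, written, read)
deps_rectangular = loop_dependences(rectangular, written, read)
```

With the triangular bounds the list is empty, because the loop writes only elements with I >= J and reads only elements with row <= column; the two sets meet only on the diagonal, within a single iteration.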

  18. Loop Reordering Transformations • Definitions • A reordering transformation is any program transformation that changes the execution order of the code, without adding or deleting any statement executions • A reordering transformation preserves a dependence if it preserves the relative execution order of the source and sink of that dependence

  19. Fundamental Theorem of Dependence • Any reordering transformation that preserves every dependence in a program preserves the meaning of that program

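  20. Valid Transformations • A transformation is said to be valid for the program to which it applies if it preserves all dependences in the program
         DO I = 1, 3
      S1   A(I+1) = B(I)
      S2   B(I+1) = A(I)
         ENDDO
      Valid (statement interchange; both loop-carried dependences keep their source before their sink):
         DO I = 1, 3
           B(I+1) = A(I)
           A(I+1) = B(I)
         ENDDO
      Invalid (loop reversal; the sinks now execute before their sources):
         DO I = 3, 1, -1
           A(I+1) = B(I)
           B(I+1) = A(I)
         ENDDO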
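The effect of the two transformations can be observed by running all three loop versions on the same data. The Python sketch below is an added illustration with arbitrary test values (the helper `run` is hypothetical). Agreement on one input does not prove validity, but a mismatch does show that a transformation changed the meaning of the program.

```python
def run(iteration_order, statement_order):
    """Execute the loop body for each I in iteration_order, running the
    statements in statement_order, and return the final arrays."""
    A = {k: 10 * k for k in range(1, 5)}     # arbitrary data for A(1..4)
    B = {k: 100 * k for k in range(1, 5)}    # arbitrary data for B(1..4)
    for I in iteration_order:
        for stmt in statement_order:
            if stmt == "S1":
                A[I + 1] = B[I]              # S1 A(I+1) = B(I)
            else:
                B[I + 1] = A[I]              # S2 B(I+1) = A(I)
    return A, B

original = run([1, 2, 3], ["S1", "S2"])
interchanged = run([1, 2, 3], ["S2", "S1"])     # statements swapped in the body
reversed_order = run([3, 2, 1], ["S1", "S2"])   # DO I = 3, 1, -1
```

On this data the interchanged body reproduces the original result, while the reversed loop does not.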

  21. Dependences and Loop Transformations • Loop dependences are tested before a transformation is applied • When a dependence test is inconclusive, dependence must be assumed • In this example the value of K is unknown and loop dependence is assumed:
         DO I = 1, N
      S1   A(I+K) = A(I) + B(I)
         ENDDO

  22. Dependence Distance Vector • Definition • Suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence distance vector d(i, j) is defined as d(i, j) = j - i • Example:
         DO I = 1, 3
           DO J = 1, I
      S1     A(I+1,J) = A(I,J)
           ENDDO
         ENDDO
      True dependence between S1 and itself on
      i = (1,1) and j = (2,1): d(i, j) = (1,0)
      i = (2,1) and j = (3,1): d(i, j) = (1,0)
      i = (2,2) and j = (3,2): d(i, j) = (1,0)

  23. Dependence Direction Vector • Definition • Suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence direction vector D(i, j) is defined componentwise as
      D(i, j)_k = “<” if d(i, j)_k > 0
      D(i, j)_k = “=” if d(i, j)_k = 0
      D(i, j)_k = “>” if d(i, j)_k < 0

  24. Example
         DO I = 1, 3
           DO J = 1, I
      S1     A(I+1,J) = A(I,J)
           ENDDO
         ENDDO
      True dependence between S1 and itself on
      i = (1,1) and j = (2,1): d(i, j) = (1,0), D(i, j) = (<, =)
      i = (2,1) and j = (3,1): d(i, j) = (1,0), D(i, j) = (<, =)
      i = (2,2) and j = (3,2): d(i, j) = (1,0), D(i, j) = (<, =)
      [Figure: iteration space on axes I and J with points (1,1), (2,1), (2,2), (3,1), (3,2), (3,3) and arrows from each source iteration to its sink]
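The distance and direction vectors listed for this example can be reproduced by enumeration. The Python sketch below is an added illustration (the helper names `distance` and `direction` are hypothetical): it pairs each iteration that writes A(I+1,J) with the iteration that later reads the same element, then applies the two definitions.

```python
def distance(i, j):
    """Dependence distance vector d(i, j) = j - i."""
    return tuple(b - a for a, b in zip(i, j))

def direction(d):
    """Dependence direction vector derived componentwise from a distance vector."""
    return tuple("<" if k > 0 else "=" if k == 0 else ">" for k in d)

# Iteration space of DO I = 1, 3 / DO J = 1, I.
space = [(I, J) for I in range(1, 4) for J in range(1, I + 1)]

# S1 writes A(I+1,J) on iteration i = (I, J) and reads A(I',J') on iteration j = (I', J');
# the two references name the same element when (I+1, J) == (I', J').
pairs = [(i, j) for i in space for j in space if (i[0] + 1, i[1]) == j]

distances = [distance(i, j) for i, j in pairs]
directions = [direction(d) for d in distances]
```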

  25. Data Dependence Graphs • Data dependences arise between statement instances • It is generally infeasible to represent all data dependences that arise in a program • Usually only static data dependences are recorded using the data dependence direction vector and depicted in a data dependence graph to compactly represent data dependences in a loop nest
         DO I = 1, 10000
      S1   A(I) = B(I) * 5
      S2   C(I+1) = C(I) + A(I)
         ENDDO
      Static data dependences for accesses to A and C: S1(=)S2 and S2(<)S2
      [Figure: data dependence graph with an edge from S1 to S2 labeled (=) and a self-edge on S2 labeled (<)]

  26. Loop-Independent Dependences • Definition • There is a loop-independent dependence from statement S1 to statement S2 iff there exist two iteration vectors i and j such that 1) S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and i = j, and 2) there is a control flow path from S1 to S2 within the iteration

  27. Loop-Carried Dependences • Definitions • There is a loop-carried dependence from statement S1 to statement S2 iff there exist two iteration vectors i and j such that S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and d(i, j) > 0 (that is, D(i, j) contains a “<” as its leftmost non-“=” component) • A loop-carried dependence from S1 to S2 is 1) forward if S2 appears after S1 in the loop body, 2) backward if S2 appears before S1 (or if S1 = S2) • The level of a loop-carried dependence is the index of the leftmost non-“=” component of D(i, j) for the dependence

  28. Example (1) • The loop-carried dependence S1(<)S2 is forward • The loop-carried dependence S2(<)S1 is backward • All loop-carried dependences are of level 1, because D(i, j) = (<) for every dependence
         DO I = 1, 3
      S1   A(I+1) = F(I)
      S2   F(I+1) = A(I)
         ENDDO
      Dependences: S1(<)S2 and S2(<)S1

  29. Example (2) • All loop-carried dependences are of level 3, because D(i, j) = (=, =, <) • Note: level-k dependences are also denoted by Sx δ_k Sy
         DO I = 1, 10
           DO J = 1, 10
             DO K = 1, 10
      S1       A(I,J,K+1) = A(I,J,K)
             ENDDO
           ENDDO
         ENDDO
      Dependence: S1(=,=,<)S1
      Alternative notation for a level-3 dependence: S1 δ_3 S1
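The level of a dependence follows directly from its direction vector. The Python sketch below is an added illustration (the function name `level` is hypothetical); it returns the 1-based index of the leftmost non-"=" component and `None` for a loop-independent dependence, whose direction vector contains only "=".

```python
def level(D):
    """Level of a dependence with direction vector D: the 1-based index of
    the leftmost non-'=' component, or None if every component is '='."""
    for k, component in enumerate(D, start=1):
        if component != "=":
            return k
    return None

level_example_1 = level(("<",))             # S1(<)S2 in a single loop
level_example_2 = level(("=", "=", "<"))    # S1(=,=,<)S1 in a triple nest
```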

  30. Direction Vectors and Reordering Transformations (1) • Let T be a transformation on a loop nest that does not rearrange the statements in the body of the innermost loop; then T is valid if, after it is applied, none of the direction vectors for dependences with source and sink in the nest has a leftmost non-“=” component that is “>”

  31. Direction Vectors and Reordering Transformations (2) • A reordering transformation preserves all level-k dependences if 1) it preserves the iteration order of the level-k loop, 2) it does not interchange any loop at level < k to a position inside the level-k loop, and 3) it does not interchange any loop at level > k to a position outside the level-k loop

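  32. Examples
      First example:
         DO I = 1, 10
      S1   F(I+1) = A(I)
      S2   A(I+1) = F(I)
         ENDDO
      Find dependences: both S1(<)S2 and S2(<)S1 are level-1 dependences
      Choose transformation (statement interchange, which leaves the iteration order of the level-1 loop unchanged):
         DO I = 1, 10
      S2   A(I+1) = F(I)
      S1   F(I+1) = A(I)
         ENDDO
      Second example:
         DO I = 1, 10
           DO J = 1, 10
             DO K = 1, 10
      S1       A(I+1,J+2,K+3) = A(I,J,K) + B
             ENDDO
           ENDDO
         ENDDO
      Find dependences: S1(<,<,<)S1 is a level-1 dependence
      Choose transformation (interchange of the J and K loops with reversal of K, which leaves the level-1 loop I unchanged):
         DO I = 1, 10
           DO K = 10, 1, -1
             DO J = 1, 10
      S1       A(I+1,J+2,K+3) = A(I,J,K) + B
             ENDDO
           ENDDO
         ENDDO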
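The direction-vector rule for transformations that leave the innermost loop body unchanged can be applied mechanically to the triple loop nest. The Python sketch below is an added illustration under simplifying assumptions: the helpers `permute`, `reverse`, and `is_valid` are hypothetical, and only the components "<", "=", and ">" are handled.

```python
def permute(D, order):
    """Direction vector after loop interchange. `order` lists the old 0-based
    loop positions from outermost to innermost in the new nest."""
    return tuple(D[k] for k in order)

def reverse(D, k):
    """Direction vector after reversing the loop at 0-based position k."""
    flip = {"<": ">", ">": "<", "=": "="}
    return tuple(flip[c] if pos == k else c for pos, c in enumerate(D))

def is_valid(D):
    """True unless the leftmost non-'=' component is '>'."""
    for component in D:
        if component != "=":
            return component == "<"
    return True

D = ("<", "<", "<")                     # S1(<,<,<)S1 for the loops I, J, K

# New nest order I, K, J, then reverse K (now at position 1).
transformed = reverse(permute(D, (0, 2, 1)), 1)
transformed_ok = is_valid(transformed)

# Reversing the outermost loop I instead would not be allowed.
reversed_outer_ok = is_valid(reverse(D, 0))
```

The transformed vector is (<, >, <): the ">" introduced by reversing K is harmless because the level-1 component is still "<".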

  33. Direction Vectors and Reordering Transformations (3) • A reordering transformation preserves a loop-independent dependence between statements S1 and S2 if it does not move statement instances between iterations and preserves the relative order of S1 and S2

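  34. Examples
      First example:
         DO I = 1, 10
      S1   A(I) = …
      S2   … = A(I)
         ENDDO
      Find dependences: S1(=)S2 is a loop-independent dependence
      Choose transformation (loop reversal, keeping S1 before S2 within each iteration):
         DO I = 10, 1, -1
      S1   A(I) = …
      S2   … = A(I)
         ENDDO
      Second example:
         DO I = 1, 10
           DO J = 1, 10
      S1     A(I,J) = …
      S2     … = A(I,J)
           ENDDO
         ENDDO
      Find dependences: S1(=,=)S2 is a loop-independent dependence
      Choose transformation (loop interchange and reversal of both loops, keeping S1 before S2 within each iteration):
         DO J = 10, 1, -1
           DO I = 10, 1, -1
      S1     A(I,J) = …
      S2     … = A(I,J)
           ENDDO
         ENDDO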

  35. Combining Direction Vectors • When a loop nest has multiple different directions at the same loop level k, we sometimes abbreviate the level-k direction vector component with “*” • S1(*)S2 stands for the combination of S1(<)S2, S1(=)S2, and S1(>)S2
      Example with S1(*)S2:
         DO I = 1, 9
      S1   A(I) = …
      S2   … = A(10-I)
         ENDDO
      Example with S1(*)S1:
         DO I = 1, 10
      S1   S = S + A(I)
         ENDDO
      Example with S1(*,=)S1:
         DO I = 1, 10
           DO J = 1, 10
      S1     A(J) = A(J)+1
           ENDDO
         ENDDO

  36. Iteration Vectors • Plotting the iteration vectors of a loop nest on a grid gives a rendition of the dependences in the loop’s iteration space • Helps to visualize dependences
         DO I = 1, 3
           DO J = 1, I
      S1     A(I+1,J) = A(I,J)
           ENDDO
         ENDDO
      Distance vector is (1,0), S1(<,=)S1
      [Figure: iteration space grid with axes I = 1 to 3 and J = 1 to 3]

  37. Example 1
         DO I = 1, 4
           DO J = 1, 4
      S1     A(I,J+1) = A(I-1,J)
           ENDDO
         ENDDO
      Distance vector is (1,1), S1(<,<)S1
      [Figure: iteration space grid with axes I = 1 to 4 and J = 1 to 4]

  38. Example 2
         DO I = 1, 4
           DO J = 1, 5-I
      S1     A(I+1,J+I-1) = A(I,J+I-1)
           ENDDO
         ENDDO
      Distance vector is (1,-1), S1(<,>)S1
      [Figure: iteration space grid with axes I = 1 to 4 and J = 1 to 4]

  39. Example 3
         DO I = 1, 4
           DO J = 1, 4
      S1     B(I) = B(I) + C(I)
      S2     A(I+1,J) = A(I,J)
           ENDDO
         ENDDO
      Distance vectors are (0,1) and (1,0), S1(=,<)S1 and S2(<,=)S2
      [Figure: iteration space grid with axes I = 1 to 4 and J = 1 to 4]

  40. Parallelization • It is valid to convert a sequential loop to a parallel loop if the loop carries no dependence
         DO I = 1, 4
           DO J = 1, 4
      S1     A(I,J+1) = A(I,J)
           ENDDO
         ENDDO
      Dependence: S1(=,<)S1, carried by the J loop only, so the I loop can be parallelized:
         PARALLEL DO I = 1, 4
           DO J = 1, 4
      S1     A(I,J+1) = A(I,J)
           ENDDO
         ENDDO
      [Figure: iteration space grid with axes I = 1 to 4 and J = 1 to 4]

  41. Vectorization (1) • A single-statement loop that carries no dependence can be vectorized
         DO I = 1, 4
      S1   X(I) = X(I) + C
         ENDDO
      vectorizes to the Fortran 90 array statement
      S1 X(1:4) = X(1:4) + C
      [Figure: the array statement executed as a single vector add operation]

  42. Vectorization (2) • This example has a loop-carried dependence S1(<)S1, so vectorizing it is invalid
         DO I = 1, N
      S1   X(I+1) = X(I) + C
         ENDDO
      Invalid: the Fortran 90 array statement below does not compute the same result as the loop
      S1 X(2:N+1) = X(1:N) + C
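The difference between the loop and the array statement can be seen on a small input. The Python sketch below is an added illustration with arbitrary data (both function names are hypothetical). It relies on the fact that a Fortran 90 array assignment evaluates its entire right-hand side before storing any element, which the list comprehension mimics.

```python
def sequential_loop(X, C):
    """DO I = 1, N: X(I+1) = X(I) + C, using 0-based indices."""
    X = list(X)
    for i in range(len(X) - 1):
        X[i + 1] = X[i] + C       # reads the value stored by the previous iteration
    return X

def array_statement(X, C):
    """X(2:N+1) = X(1:N) + C: the whole right-hand side is evaluated first."""
    rhs = [x + C for x in X[:-1]]  # uses only the old values of X
    return [X[0]] + rhs

X = [1, 2, 3, 4, 5]                # arbitrary test data
loop_result = sequential_loop(X, 10)
vector_result = array_statement(X, 10)
```

The loop propagates each new value into the next iteration, while the array statement only ever sees the old values, so the two results differ.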

  43. Vectorization (3) • Because only single-statement loops can be vectorized, loops with multiple statements must first be transformed using the loop distribution transformation • This applies when the loop has no loop-carried dependence or has forward flow dependences
         DO I = 1, N
      S1   A(I+1) = B(I) + C
      S2   D(I) = A(I) + E
         ENDDO
      After loop distribution:
         DO I = 1, N
      S1   A(I+1) = B(I) + C
         ENDDO
         DO I = 1, N
      S2   D(I) = A(I) + E
         ENDDO
      After vectorization:
      S1 A(2:N+1) = B(1:N) + C
      S2 D(1:N) = A(1:N) + E

  44. Vectorization (4) • When a loop has backward flow dependences and no loop-independent dependences, interchange the statements to enable loop distribution
         DO I = 1, N
      S1   D(I) = A(I) + E
      S2   A(I+1) = B(I) + C
         ENDDO
      After statement interchange and loop distribution:
         DO I = 1, N
      S2   A(I+1) = B(I) + C
         ENDDO
         DO I = 1, N
      S1   D(I) = A(I) + E
         ENDDO
      After vectorization:
      S2 A(2:N+1) = B(1:N) + C
      S1 D(1:N) = A(1:N) + E
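Why the S2 loop has to come first after distribution can be checked by running both orders. The Python sketch below is an added illustration with arbitrary test data (the function names are hypothetical): it compares the original loop with the distributed version in which the S2 loop runs first, and with the version that keeps the source order.

```python
def original(A, B, C, E):
    """S1 and S2 in one loop, in source order (0-based indices)."""
    A, D = list(A), [None] * len(B)
    for i in range(len(B)):
        D[i] = A[i] + E          # S1 D(I) = A(I) + E
        A[i + 1] = B[i] + C      # S2 A(I+1) = B(I) + C
    return A, D

def distributed(A, B, C, E, s2_first):
    """One loop per statement; s2_first selects which loop runs first."""
    A, D = list(A), [None] * len(B)

    def s1_loop():
        for i in range(len(B)):
            D[i] = A[i] + E

    def s2_loop():
        for i in range(len(B)):
            A[i + 1] = B[i] + C

    for loop in ((s2_loop, s1_loop) if s2_first else (s1_loop, s2_loop)):
        loop()
    return A, D

args = ([1, 2, 3, 4], [10, 20, 30], 100, 1000)   # arbitrary test data, N = 3
same = distributed(*args, s2_first=True) == original(*args)
different = distributed(*args, s2_first=False) != original(*args)
```

With the S2 loop first, every A(I) read by S1 already holds the value that S2 produced for it in the original loop; keeping the source order makes S1 read the old values.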

  45. Preliminary Transformations to Support Dependence Testing • Determining distance and direction vectors with dependence tests requires array subscripts to be in a standard form • Preliminary transformations put more subscripts in affine form, where subscripts are linear integer functions of loop induction variables • Loop normalization • Induction-variable substitution • Constant propagation

  46. Loop Normalization • See slide on normalized iteration space • Dependence tests assume iteration spaces are normalized • The normalization distorts dependences, but it is otherwise preferred to support loop analysis • Make lower bound 1 (or 0) • Make stride 1
         DO I = 1, M
           DO J = I, N
      S1     A(J,I) = A(J,I-1) + 5
           ENDDO
         ENDDO
      Dependence: S1(<,=)S1
      After normalization:
         DO I = 1, M
           DO J = 1, N - I + 1
      S1     A(J+I-1,I) = A(J+I-1,I-1) + 5
           ENDDO
         ENDDO
      Dependence: S1(<,>)S1
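The change from (<,=) to (<,>) can be confirmed by enumerating both iteration spaces. The Python sketch below is an added illustration, assuming M = 4 and N = 6 (the helper `true_dependence_directions` is hypothetical); it collects the direction vectors of all true dependences before and after normalization.

```python
def true_dependence_directions(space, written, read):
    """Set of direction vectors over all pairs i < j (lexicographic order)
    where iteration i writes the element that iteration j reads."""
    found = set()
    for i in space:
        for j in space:
            if i < j and written(*i) == read(*j):
                found.add(tuple("<" if b > a else "=" if b == a else ">"
                                for a, b in zip(i, j)))
    return found

M, N = 4, 6   # assumed bounds for the illustration

# Before: DO J = I, N with A(J,I) = A(J,I-1) + 5
before = [(I, J) for I in range(1, M + 1) for J in range(I, N + 1)]
dirs_before = true_dependence_directions(
    before, lambda I, J: (J, I), lambda I, J: (J, I - 1))

# After: DO J = 1, N-I+1 with A(J+I-1,I) = A(J+I-1,I-1) + 5
after = [(I, J) for I in range(1, M + 1) for J in range(1, N - I + 2)]
dirs_after = true_dependence_directions(
    after, lambda I, J: (J + I - 1, I), lambda I, J: (J + I - 1, I - 1))
```

Both loop nests touch the same array elements in the same order; only the numbering of the inner iterations changes, which is what shifts the second component from "=" to ">".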

  47. Data Flow Analysis • Data flow analysis with reaching definitions or SSA form supports constant propagation and dead code elimination • The above are needed to standardize array subscripts
         K = 2
         DO I = 1, 100
      S1   A(I+K) = A(I) + 5
         ENDDO
      After constant propagation:
         DO I = 1, 100
      S1   A(I+2) = A(I) + 5
         ENDDO

  48. Forward Substitution • Forward substitution and induction variable substitution (IVS) yield array subscripts that are amenable to dependence analysis
         DO I = 1, 100
           K = I + 2
      S1   A(K) = A(K) + 5
         ENDDO
      After forward substitution:
         DO I = 1, 100
      S1   A(I+2) = A(I+2) + 5
         ENDDO

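  49. Induction Variable Substitution • IVS is applied before the dependence test • GCD test to solve the dependence equation 2i_d - 2i_u = -1 • Since 2 does not divide 1, there is no data dependence
      [Figure: code rewritten by IVS and then passed to the dependence test; recoverable array references: A[] … A[2*i+1] A[2*i+2]]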
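The GCD test used on this slide can be written in a few lines. The Python sketch below is an added illustration (the function name `gcd_test` is hypothetical): a linear equation a1*x1 + ... + an*xn = c has an integer solution only if the greatest common divisor of the coefficients divides c. The test ignores loop bounds, so a positive answer only means that a dependence cannot be ruled out.

```python
from functools import reduce
from math import gcd

def gcd_test(coefficients, c):
    """Return False if a1*x1 + ... + an*xn = c has no integer solution
    (no dependence), True if a solution may exist (assume dependence)."""
    g = reduce(gcd, coefficients)   # math.gcd returns a non-negative result
    if g == 0:                      # all coefficients are zero
        return c == 0
    return c % g == 0

# Dependence equation from the slide: 2*i_d - 2*i_u = -1
may_depend = gcd_test([2, -2], -1)
```

Here the GCD of 2 and -2 is 2, which does not divide -1, so `may_depend` is `False`.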

  50. Example (1) • Before and after induction-variable substitution of KI in the inner loop
      Before:
      INC = 2
      KI = 0
      DO I = 1, 100
        DO J = 1, 100
          KI = KI + INC
          U(KI) = U(KI) + W(J)
        ENDDO
        S(I) = U(KI)
      ENDDO
      After:
      INC = 2
      KI = 0
      DO I = 1, 100
        DO J = 1, 100
          U(KI+J*INC) = U(KI+J*INC) + W(J)
        ENDDO
        KI = KI + 100 * INC
        S(I) = U(KI)
      ENDDO
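That the substitution preserves the meaning of the loop nest can be checked by running both versions. The Python sketch below is an added illustration: the function names are hypothetical, the trip count 100 is generalized to a parameter `n`, and integer test data is used so that results can be compared exactly.

```python
def before_ivs(W, n=100, inc=2):
    """Original nest: KI is incremented on every inner iteration."""
    U = [0] * (n * n * inc + 1)     # large enough for the biggest subscript
    S = [0] * (n + 1)
    ki = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            ki = ki + inc
            U[ki] = U[ki] + W[j]
        S[i] = U[ki]
    return U, S

def after_ivs(W, n=100, inc=2):
    """After substitution: the subscript is the affine expression KI + J*INC,
    and KI is updated once per outer iteration."""
    U = [0] * (n * n * inc + 1)
    S = [0] * (n + 1)
    ki = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            U[ki + j * inc] = U[ki + j * inc] + W[j]
        ki = ki + n * inc
        S[i] = U[ki]
    return U, S

W = list(range(101))                # arbitrary test data; W[0] is unused
equivalent = before_ivs(W) == after_ivs(W)
```

After the substitution the inner loop no longer updates KI, so its subscripts are affine in J and can be handed to a dependence test.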
