Loop Restructuring
• Loop unswitching
• Loop peeling
• Loop fusion
• Loop alignment for fusion
• Loop reversal
• Loop fission
• Loop alignment
• Loop index set splitting
• Loop interchange
• Scalar expansion

Unswitching

    DO I = 1, N
      DO J = 2, N
        IF T(I) > 0 THEN
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ELSE
          A(I,J) = 0.0
        ENDIF
      ENDDO
    ENDDO

• Loop unswitching moves loop-invariant conditionals out of the loop (here T(I) > 0 does not depend on J)
• Reduces the frequency of executing branches
• But: leads to code expansion

After unswitching:

    DO I = 1, N
      IF T(I) > 0 THEN
        DO J = 2, N
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ENDDO
      ELSE
        DO J = 2, N
          A(I,J) = 0.0
        ENDDO
      ENDIF
    ENDDO

Peeling

    J = 0
    K = M
    DO I = 0, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO

• Loop peeling removes the first (or last) iteration of a loop into separate code
• Enables loop fusion by changing bounds of one loop to match bounds of another
• But: leads to code expansion

After peeling the first iteration:

    J = 0
    K = M
    A(K) = B(J) - B(K)
    K = J
    J = J + 1
    DO I = 1, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO

Fusion

    S1 B(1) = T(1)*X(1)
    S2 DO I = 2, N
    S3   B(I) = T(I)*X(I)
    S4 ENDDO
    S5 DO I = 2, N
    S6   A(I) = B(I) - B(I-1)
    S7 ENDDO

    Original code has dependences S1 δ S6 and S3 δ S6

• Combine two consecutive loops with same IV and loop bounds into one
• Fused loop must preserve all dependence relations of the original loop
• Enables more effective scalar optimizations in fused loop
• But: may reduce temporal locality

After fusion:

    S1 B(1) = T(1)*X(1)
    Sx DO I = 2, N
    S3   B(I) = T(I)*X(I)
    S6   A(I) = B(I) - B(I-1)
    Sy ENDDO

    Fused loop has dependences S1 δ S6 and S3 δ(=) S6 and S3 δ(<) S6

Example

Original loops:

    S1 DO I = 1, N
    S2   A(I) = B(I) + 1
    S3 ENDDO
    S4 DO I = 1, N
    S5   C(I) = A(I)/2
    S6 ENDDO
    S7 DO I = 1, N
    S8   D(I) = 1/C(I+1)
    S9 ENDDO

Which of the three fused loops is legal?

a)

    S1 DO I = 1, N
    S2   A(I) = B(I) + 1
    S3 ENDDO
    Sx DO I = 1, N
    S5   C(I) = A(I)/2
    S8   D(I) = 1/C(I+1)
    Sy ENDDO

b)

    Sx DO I = 1, N
    S2   A(I) = B(I) + 1
    S5   C(I) = A(I)/2
    Sy ENDDO
    S7 DO I = 1, N
    S8   D(I) = 1/C(I+1)
    S9 ENDDO

c)

    Sx DO I = 1, N
    S2   A(I) = B(I) + 1
    S5   C(I) = A(I)/2
    S8   D(I) = 1/C(I+1)
    Sy ENDDO

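One way to explore the question is to execute the original loops and each fused variant on the same data and compare the results. The sketch below does that in Python as a stand-in for the slide notation. The function names, the sample size, and the initial array values are assumptions made for illustration. A mismatch shows that a variant is illegal; a match on one sample is only consistent with legality and is not a proof.

```python
# Sketch: compare each fused variant against the original three loops.
# Lists are used 1-based (index 0 is padding) and have a slot N+1
# because S8 reads C(I+1).

def fresh(n):
    a = [0.0] * (n + 2)
    b = [float(i) for i in range(n + 2)]
    c = [1.0] * (n + 2)          # nonzero "old" values so 1/C(I+1) is defined
    d = [0.0] * (n + 2)
    return a, b, c, d

def original(n):
    a, b, c, d = fresh(n)
    for i in range(1, n + 1):
        a[i] = b[i] + 1          # S2
    for i in range(1, n + 1):
        c[i] = a[i] / 2          # S5
    for i in range(1, n + 1):
        d[i] = 1 / c[i + 1]      # S8
    return a, c, d

def variant_a(n):                # second and third loops fused
    a, b, c, d = fresh(n)
    for i in range(1, n + 1):
        a[i] = b[i] + 1          # S2
    for i in range(1, n + 1):
        c[i] = a[i] / 2          # S5
        d[i] = 1 / c[i + 1]      # S8 reads C(I+1) before S5 has written it
    return a, c, d

def variant_b(n):                # first and second loops fused
    a, b, c, d = fresh(n)
    for i in range(1, n + 1):
        a[i] = b[i] + 1          # S2
        c[i] = a[i] / 2          # S5 reads A(I) written in the same iteration
    for i in range(1, n + 1):
        d[i] = 1 / c[i + 1]      # S8
    return a, c, d

def variant_c(n):                # all three loops fused
    a, b, c, d = fresh(n)
    for i in range(1, n + 1):
        a[i] = b[i] + 1          # S2
        c[i] = a[i] / 2          # S5
        d[i] = 1 / c[i + 1]      # S8 reads C(I+1) before S5 has written it
    return a, c, d

def matches_original(n=8):
    ref = original(n)
    variants = (("a", variant_a), ("b", variant_b), ("c", variant_c))
    return {name: f(n) == ref for name, f in variants}

if __name__ == "__main__":
    print(matches_original())
```

On this sample only variant b reproduces the original results. In a) and c), S8 reads C(I+1) one iteration before S5 writes it, so the flow dependence from S5 to S8 is reversed into an anti-dependence.
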
Alignment for Fusion

    S1 DO I = 1, N
    S2   B(I) = T(I)/C
    S3 ENDDO
    S4 DO I = 1, N
    S5   A(I) = B(I+1) - B(I-1)
    S6 ENDDO

    S2 δ S5

• Alignment for fusion changes iteration bounds of one loop to enable fusion when dependences would otherwise prevent fusion

After aligning the first loop:

    S1 DO I = 0, N-1
    S2   B(I+1) = T(I+1)/C
    S3 ENDDO
    S4 DO I = 1, N
    S5   A(I) = B(I+1) - B(I-1)
    S6 ENDDO

    S2 δ S5

After peeling and fusion:

    Sx B(1) = T(1)/C
    S1 DO I = 1, N-1
    S2   B(I+1) = T(I+1)/C
    S5   A(I) = B(I+1) - B(I-1)
    S6 ENDDO
    Sy A(N) = B(N+1) - B(N-1)

    Loop deps: S2 δ(=) S5, S2 δ(<) S5

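The effect of the alignment can be checked by running the two separate loops, a direct fusion without alignment, and the aligned and fused version on the same input. This is a minimal sketch in Python; the function names and the sample values for T, B, and C are assumptions, and N is assumed to be at least 1.

```python
# Sketch: B has slots 0..N+1 because S5 reads B(I-1) and B(I+1).
# T is used 1-based; index 0 is padding.

def separate_loops(t, b, c, n):
    b = list(b)
    a = [0.0] * (n + 1)
    for i in range(1, n + 1):
        b[i] = t[i] / c                  # S2
    for i in range(1, n + 1):
        a[i] = b[i + 1] - b[i - 1]       # S5
    return a, b

def naive_fused(t, b, c, n):
    b = list(b)
    a = [0.0] * (n + 1)
    for i in range(1, n + 1):
        b[i] = t[i] / c                  # S2
        a[i] = b[i + 1] - b[i - 1]       # S5 reads B(I+1) before it is written
    return a, b

def aligned_fused(t, b, c, n):
    b = list(b)
    a = [0.0] * (n + 1)
    b[1] = t[1] / c                      # Sx: peeled first iteration of S2
    for i in range(1, n):                # I = 1 .. N-1
        b[i + 1] = t[i + 1] / c          # S2, shifted by one iteration
        a[i] = b[i + 1] - b[i - 1]       # S5
    a[n] = b[n + 1] - b[n - 1]           # Sy: peeled last iteration of S5
    return a, b

# Arbitrary sample data (assumption).
N = 6
T = [0.0] + [2.0 * i for i in range(1, N + 1)]
B0 = [100.0 + i for i in range(N + 2)]
C = 2.0

if __name__ == "__main__":
    ref = separate_loops(T, B0, C, N)
    print("aligned matches:", aligned_fused(T, B0, C, N) == ref)
    print("naive matches:  ", naive_fused(T, B0, C, N) == ref)
```
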
Reversal

    S1 DO I = 1, N
    S2   B(I) = T(I)*X(I)
    S3 ENDDO
    S4 DO I = 1, N
    S5   A(I) = B(I+1)
    S6 ENDDO

    S2 δ S5

• Reverse the direction of the iteration
• Only legal for loops that have no carried dependences
• Enables loop fusion by ensuring dependences are preserved between loop statements

After reversing both loops:

    S1 DO I = N, 1, -1
    S2   B(I) = T(I)*X(I)
    S3 ENDDO
    S4 DO I = N, 1, -1
    S5   A(I) = B(I+1)
    S6 ENDDO

    S2 δ S5

After fusion:

    S1 DO I = N, 1, -1
    S2   B(I) = T(I)*X(I)
    S5   A(I) = B(I+1)
    S6 ENDDO

    S2 δ(<) S5

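To see why the reversal matters here, the sketch below runs the two separate loops, a forward fusion, and the reversed fusion on the same data. It is written in Python as an illustration; the function names and sample values are assumptions.

```python
# Sketch: B has slots 0..N+1 because S5 reads B(I+1); T and X are 1-based.

def separate_loops(t, x, b, n):
    b = list(b)
    a = [0.0] * (n + 1)
    for i in range(1, n + 1):
        b[i] = t[i] * x[i]           # S2
    for i in range(1, n + 1):
        a[i] = b[i + 1]              # S5
    return a, b

def fused_forward(t, x, b, n):
    b = list(b)
    a = [0.0] * (n + 1)
    for i in range(1, n + 1):
        b[i] = t[i] * x[i]           # S2
        a[i] = b[i + 1]              # S5 reads a stale B(I+1)
    return a, b

def fused_reversed(t, x, b, n):
    b = list(b)
    a = [0.0] * (n + 1)
    for i in range(n, 0, -1):        # I = N, 1, -1
        b[i] = t[i] * x[i]           # S2
        a[i] = b[i + 1]              # S5 reads B(I+1) written one iteration earlier
    return a, b

# Arbitrary sample data (assumption).
N = 5
T = [0.0] + [float(i) for i in range(1, N + 1)]
X = [0.0] + [3.0] * N
B0 = [-1.0] * (N + 2)

if __name__ == "__main__":
    ref = separate_loops(T, X, B0, N)
    print("forward fusion matches: ", fused_forward(T, X, B0, N) == ref)
    print("reversed fusion matches:", fused_reversed(T, X, B0, N) == ref)
```
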
Fission (1)

    S1 DO I = 1, 10
    S2   DO J = 1, 10
    S3     A(I,J) = B(I,J) + C(I,J)
    S4     D(I,J) = A(I,J-1) * 2.0
    S5   ENDDO
    S6 ENDDO

    S3 δ(=,<) S4

• Loop fission (or loop distribution) splits a single loop into multiple loops
• Enables vectorization
• Enables parallelization of separate loops if original loop is sequential
• Loop fission must preserve all dependence relations of the original loop

After fission of the inner loop:

    S1 DO I = 1, 10
    S2   DO J = 1, 10
    S3     A(I,J) = B(I,J) + C(I,J)
    Sx   ENDDO
    Sy   DO J = 1, 10
    S4     D(I,J) = A(I,J-1) * 2.0
    S5   ENDDO
    S6 ENDDO

    S3 δ(=,<) S4

After vectorization of the inner loops and parallelization of the outer loop:

    S1 PARALLEL DO I = 1, 10
    S3   A(I,1:10)=B(I,1:10)+C(I,1:10)
    S4   D(I,1:10)=A(I,0:9) * 2.0
    S6 ENDDO

    S3 δ(=,<) S4

Fission (2)

    S1 DO I = 1, 10
    S2   A(I) = A(I) + B(I-1)
    S3   B(I) = C(I-1)*X + Z
    S4   C(I) = 1/B(I)
    S5   D(I) = sqrt(C(I))
    S6 ENDDO

    S3 δ(<) S2
    S4 δ(<) S3
    S3 δ(=) S4
    S4 δ(=) S5

• Compute the acyclic condensation of the dependence graph to find a legal order of the loops

After fission:

    S1 DO I = 1, 10
    S3   B(I) = C(I-1)*X + Z
    S4   C(I) = 1/B(I)
    Sx ENDDO
    Sy DO I = 1, 10
    S2   A(I) = A(I) + B(I-1)
    Sz ENDDO
    Su DO I = 1, 10
    S5   D(I) = sqrt(C(I))
    Sv ENDDO

[Figure: dependence graph with edges S3→S2 (distance 1), S4→S3 (distance 1), S3→S4 (distance 0), S4→S5 (distance 0), and its acyclic condensation, in which S3 and S4 form a single node that precedes S2 and S5]

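The acyclic condensation can be computed by finding the strongly connected components of the dependence graph and ordering them topologically. The sketch below uses Kosaraju's two-pass depth-first search, which is one common way to do this and is not necessarily the method the slides have in mind. The function name `condensation_order` and the edge-list encoding are assumptions. Each returned component corresponds to one loop after fission, listed in a legal order.

```python
# Sketch: strongly connected components in topological order (Kosaraju).
# An edge (u, v) means statement v depends on statement u.

def condensation_order(nodes, edges):
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record DFS finishing order on the original graph.
    seen, finish = set(), []

    def visit(v):
        seen.add(v)
        for w in succ[v]:
            if w not in seen:
                visit(w)
        finish.append(v)

    for v in nodes:
        if v not in seen:
            visit(v)

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    # Each DFS tree is one component; components appear in an order in
    # which every dependence source precedes its sink.
    comp_of, comps = {}, []

    def collect(v, idx):
        comp_of[v] = idx
        comps[idx].append(v)
        for w in pred[v]:
            if w not in comp_of:
                collect(w, idx)

    for v in reversed(finish):
        if v not in comp_of:
            comps.append([])
            collect(v, len(comps) - 1)

    return [sorted(c) for c in comps]

# Dependence graph of the loop above.
NODES = ["S2", "S3", "S4", "S5"]
EDGES = [("S3", "S2"), ("S4", "S3"), ("S3", "S4"), ("S4", "S5")]

if __name__ == "__main__":
    for loop in condensation_order(NODES, EDGES):
        print("one loop containing:", ", ".join(loop))
```

S3 and S4 end up in the same component because they lie on a dependence cycle, so they must stay together in one loop. The loops for S2 and S5 only have to come after it; their relative order is free.
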
Alignment

    S1 DO I = 2, N
    S2   A(I) = B(I) + C(I)
    S3   D(I) = A(I-1) * 2.0
    S4 ENDDO

    S2 δ(<) S3

• Align statements in a loop body by expanding the iteration set
• Attempts to transform loop-carried dependences into loop-independent dependences
• Enables loop parallelization

After alignment:

    S1 DO i = 1, N
    S2   IF (i>1) A(i) = B(i) + C(i)
    S3   IF (i<N) D(i+1) = A(i) * 2.0
    S4 ENDDO

    S2 δ(=) S3

[Figure: before and after alignment]

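A small experiment can illustrate why alignment enables parallelization: once the dependence is loop-independent, the iterations can run in any order. The Python sketch below executes both versions in forward and in reversed iteration order. The function names, the `order` parameter, and the sample data are assumptions for illustration.

```python
# Sketch: arrays are 1-based lists with slots 0..N; index 0 is padding.

def run_original(a, b, c, d, n, order=None):
    a, d = list(a), list(d)
    if order is None:
        order = range(2, n + 1)
    for i in order:
        a[i] = b[i] + c[i]           # S2
        d[i] = a[i - 1] * 2.0        # S3 reads A written one iteration earlier
    return a, d

def run_aligned(a, b, c, d, n, order=None):
    a, d = list(a), list(d)
    if order is None:
        order = range(1, n + 1)
    for i in order:
        if i > 1:
            a[i] = b[i] + c[i]       # S2
        if i < n:
            d[i + 1] = a[i] * 2.0    # S3 reads A written in this iteration
    return a, d

# Arbitrary sample data (assumption).
N = 6
A0 = [float(i) for i in range(N + 1)]
B = [10.0 * i for i in range(N + 1)]
C = [1.0] * (N + 1)
D0 = [0.0] * (N + 1)

if __name__ == "__main__":
    ref = run_original(A0, B, C, D0, N)
    print("aligned, forward: ", run_aligned(A0, B, C, D0, N) == ref)
    print("aligned, reversed:", run_aligned(A0, B, C, D0, N, range(N, 0, -1)) == ref)
    print("original, reversed:", run_original(A0, B, C, D0, N, range(N, 1, -1)) == ref)
```

The aligned loop gives the same result in either order, while the original loop does not, which is consistent with the dependence changing from S2 δ(<) S3 to S2 δ(=) S3.
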
Index Set Splitting

    S1 DO I = 1, 100
    S2   A(I) = B(I) + C(I)
    S3   IF I > 10 THEN
    S4     D(I) = A(I) + A(I-10)
    S5   ENDIF
    S6 ENDDO

• Divide index set into two portions
• Removes conditionals to enable other transformations
• General case handles affine conditions in multi-dimensional loops by detecting a hyperplane through the iteration space polytope
• But: code expansion

After splitting:

    S1 DO I = 1, 10
    S2   A(I) = B(I) + C(I)
    Sx ENDDO
    Sy DO I = 11, 100
    S2   A(I) = B(I) + C(I)
    S4   D(I) = A(I) + A(I-10)
    Su ENDDO

[Figure: a two-dimensional iteration space with axes I and J, split by the hyperplane 3*J > I into Loop1 and Loop2]

Loop Interchange (1)

    S1 DO I = 1, N
    S2   DO J = 1, M
    S3     A(I,J) = A(I,J-1) + B(I,J)
    S4   ENDDO
    S5 ENDDO

    S3 δ(=,<) S3

• Changes the nesting order of nested loops
• Loop interchange must preserve all dependence relations of the original loop
• Enables vectorization of an outer loop
• Can be used to improve spatial locality

After interchange:

    S2 DO J = 1, M
    S1   DO I = 1, N
    S3     A(I,J) = A(I,J-1) + B(I,J)
    S4   ENDDO
    S5 ENDDO

    S3 δ(<,=) S3

After vectorization of the inner loop:

    S2 DO J = 1, M
    S3   A(1:N,J)=A(1:N,J-1)+B(1:N,J)
    S5 ENDDO

    S3 δ(<,=) S3

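Loop Interchange (2)

    S1 DO I = 1, N
    S2   DO J = 1, M
    S3     DO K = 1, L
    S4       A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    S5     ENDDO
    S6   ENDDO
    S7 ENDDO

    S4 δ(<,<,=) S4
    S4 δ(<,=,>) S4

• Compute the direction matrix and find which columns can be permuted without violating dependence relations in original loop nest

Direction matrix (one row per dependence, columns in loop order I, J, K):

    < < =
    < = >

Invalid permutation (loop order J, K, I): the leftmost non-"=" entry of the second row is ">".

    < < =        < = <
    < = >   →    = > <

Valid permutation (loop order J, I, K): the leftmost non-"=" entry of every row is "<".

    < < =        < < =
    < = >   →    = < >
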
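The legality test on the direction matrix can be sketched in a few lines: a column permutation is accepted when, in every permuted row, the leftmost entry other than "=" is "<". The Python below is an illustration under the assumption that rows contain only "<", "=", and ">" (no "*" entries); the names `is_legal` and `legal_orders` are hypothetical.

```python
from itertools import permutations

def is_legal(direction_matrix, perm):
    """perm[k] is the original column placed at nesting level k."""
    for row in direction_matrix:
        for d in (row[p] for p in perm):
            if d == "<":
                break            # dependence carried forward by this loop
            if d == ">":
                return False     # dependence would run backwards
    return True

def legal_orders(direction_matrix, loops):
    result = []
    for perm in permutations(range(len(loops))):
        if is_legal(direction_matrix, perm):
            result.append("".join(loops[p] for p in perm))
    return result

# Direction matrix of S4 in the loop nest above, columns I, J, K.
MATRIX = [("<", "<", "="),
          ("<", "=", ">")]

if __name__ == "__main__":
    print(legal_orders(MATRIX, "IJK"))
```

For this matrix the accepted loop orders are IJK, IKJ, and JIK; the order JKI shown as invalid above is rejected, as are KIJ and KJI.
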
Scalar Expansion

    S1 DO I = 1, N
    S2   T = A(I) + B(I)
    S3   C(I) = T + 1/T
    S4 ENDDO

    S2 δ(=) S3
    S2 δ⁻¹(<) S3

• Breaks anti-dependence relations by expanding or promoting a scalar into an array
• Scalar anti-dependence relations prevent certain loop transformations such as loop fission and loop interchange

After scalar expansion:

    Sx IF N > 0 THEN
    Sy   ALLOC Tx(1:N)
    S1   DO I = 1, N
    S2     Tx(I) = A(I) + B(I)
    S3     C(I) = Tx(I) + 1/Tx(I)
    S4   ENDDO
    Sz   T = Tx(N)
    Su ENDIF

    S2 δ(=) S3

Example

    S1 DO I = 1, 10
    S2   T = A(I,1)
    S3   DO J = 2, 10
    S4     T = T + A(I,J)
    S5   ENDDO
    S6   B(I) = T
    S7 ENDDO

    S2 δ(=) S4
    S4 δ(=,<) S4
    S4 δ(=) S6
    S2 δ⁻¹(<) S6

After scalar expansion:

    S1 DO I = 1, 10
    S2   Tx(I) = A(I,1)
    S3   DO J = 2, 10
    S4     Tx(I) = Tx(I)+A(I,J)
    S5   ENDDO
    S6   B(I) = Tx(I)
    S7 ENDDO

    S2 δ(=) S4
    S4 δ(=,<) S4
    S4 δ(=) S6

After loop fission:

    S1 DO I = 1, 10
    S2   Tx(I) = A(I,1)
    Sx ENDDO
    S1 DO I = 1, 10
    S3   DO J = 2, 10
    S4     Tx(I) = Tx(I) + A(I,J)
    S5   ENDDO
    Sy ENDDO
    Sz DO I = 1, 10
    S6   B(I) = Tx(I)
    S7 ENDDO

    S2 δ S4
    S4 δ(=,<) S4
    S4 δ S6

After loop interchange and vectorization:

    S2 Tx(1:10) = A(1:10,1)
    S3 DO J = 2, 10
    S4   Tx(1:10) = Tx(1:10)+A(1:10,J)
    S5 ENDDO
    S6 B(1:10) = Tx(1:10)

    S2 δ S4
    S4 δ(<,=) S4
    S4 δ S6

Other Loop Restructuring Transformations
• Loop skewing: denormalize iteration vectors to change the shape of the iteration space (skew) to allow loop interchange
• Strip mining: decompose a single loop into two nested loops, where the outer loop steps from strip to strip and the inner loop computes one strip of the data
• Loop tiling: divide the iteration space of a loop nest into tiles (blocks); outer loops step from tile to tile and inner loops iterate within one tile

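Strip mining and tiling only change the order in which iterations are visited, which the following Python sketch makes visible by returning the visit order. The function names, the 0-based indexing, and the strip and tile sizes are assumptions chosen for illustration; skewing is not covered.

```python
# Sketch: iteration order produced by strip mining and by tiling.
# Indices are 0-based here (a Python convention, unlike the slides).

def strip_mined(n, strip):
    """Visit 0..n-1 with an outer loop over strips and an inner loop per strip."""
    visited = []
    for start in range(0, n, strip):                     # outer: one step per strip
        for i in range(start, min(start + strip, n)):    # inner: within the strip
            visited.append(i)
    return visited

def tiled(n, m, tile_i, tile_j):
    """Visit an n-by-m iteration space tile by tile."""
    visited = []
    for ii in range(0, n, tile_i):                       # outer loops: tile origin
        for jj in range(0, m, tile_j):
            for i in range(ii, min(ii + tile_i, n)):     # inner loops: inside tile
                for j in range(jj, min(jj + tile_j, m)):
                    visited.append((i, j))
    return visited

if __name__ == "__main__":
    print(strip_mined(10, 4))
    print(tiled(4, 4, 2, 2)[:8])
```

Strip mining alone keeps the original iteration order, while tiling visits all points of one tile before moving to the next. The `min(...)` bounds handle sizes that are not a multiple of the strip or tile size.
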