Loop Restructuring

azize

Presentation Transcript


  1. Loop Restructuring
  • Loop unswitching
  • Loop peeling
  • Loop fusion
  • Loop alignment for fusion
  • Loop reversal
  • Loop fission
  • Loop alignment
  • Loop index set splitting
  • Loop interchange
  • Scalar expansion

  2. Unswitching

      DO I = 1, N
        DO J = 2, N
          IF T(I) > 0 THEN
            A(I,J) = A(I,J-1)*T(I) + B(I)
          ELSE
            A(I,J) = 0.0
          ENDIF
        ENDDO
      ENDDO

  • Loop unswitching moves a loop-invariant conditional out of the loop
  • Reduces how often the branch is executed (once per I iteration instead of once per (I,J) iteration)
  • But: leads to code expansion, because the loop is duplicated in each branch

      DO I = 1, N
        IF T(I) > 0 THEN
          DO J = 2, N
            A(I,J) = A(I,J-1)*T(I) + B(I)
          ENDDO
        ELSE
          DO J = 2, N
            A(I,J) = 0.0
          ENDDO
        ENDIF
      ENDDO
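The Python sketch below checks that unswitching preserves the results of the loop nest above. Fortran's 1-based arrays are emulated with padded lists (index 0 unused). The function names and the input values in `make_inputs` are hypothetical, chosen only so that both branches are taken.

```python
def make_inputs(n):
    """Hypothetical 1-based inputs: index 0 is unused, A(I,1) is the initial column."""
    t = [0.0] + [float((-1) ** i * i) for i in range(1, n + 1)]  # T(I): odd I negative, even I positive
    b = [0.0] + [float(i) for i in range(1, n + 1)]              # B(I)
    a = [[1.0] * (n + 1) for _ in range(n + 1)]                  # A(I,J), all ones initially
    return t, b, a


def original(n, t, b, a):
    for i in range(1, n + 1):
        for j in range(2, n + 1):
            if t[i] > 0:                    # tested on every J iteration, though it depends only on I
                a[i][j] = a[i][j - 1] * t[i] + b[i]
            else:
                a[i][j] = 0.0
    return a


def unswitched(n, t, b, a):
    for i in range(1, n + 1):
        if t[i] > 0:                        # hoisted: tested once per I iteration
            for j in range(2, n + 1):
                a[i][j] = a[i][j - 1] * t[i] + b[i]
        else:                               # the J loop is duplicated in each branch
            for j in range(2, n + 1):
                a[i][j] = 0.0
    return a


n = 6
assert original(n, *make_inputs(n)) == unswitched(n, *make_inputs(n))
```

The test is only valid because `T(I)` does not change inside the J loop. If the condition depended on J, it could not be hoisted.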

  3. Peeling

      J = 0
      K = M
      DO I = 0, N
        A(K) = B(J) - B(K)
        K = J
        J = J + 1
      ENDDO

  • Loop peeling moves the first (or last) iteration of a loop out into separate code
  • Enables loop fusion by changing the bounds of one loop to match the bounds of another
  • But: leads to code expansion

      J = 0
      K = M
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
      DO I = 1, N
        A(K) = B(J) - B(K)
        K = J
        J = J + 1
      ENDDO
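This minimal Python sketch peels the first iteration of the peeling example and checks that the results match. Arrays are plain 0-based lists here because the loop itself starts at index 0. The function names and test values are hypothetical. The `assert` inside the peeled loop shows what peeling buys in this example: once iteration I = 0 is out of the way, K and J become simple functions of I.

```python
def original(n, m, a, b):
    j, k = 0, m
    for i in range(0, n + 1):        # DO I = 0, N
        a[k] = b[j] - b[k]
        k = j
        j = j + 1
    return a


def peeled(n, m, a, b):
    j, k = 0, m
    a[k] = b[j] - b[k]               # peeled iteration I = 0: the only one with K == M
    k = j
    j = j + 1
    for i in range(1, n + 1):        # DO I = 1, N
        assert k == i - 1 and j == i # K and J now track I directly
        a[k] = b[j] - b[k]
        k = j
        j = j + 1
    return a


n, m = 5, 8
size = max(n, m) + 2
b = [float(x * x) for x in range(size)]
assert original(n, m, [0.0] * size, b) == peeled(n, m, [0.0] * size, b)
```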

  4. Fusion

      S1  B(1) = T(1)*X(1)
      S2  DO I = 2, N
      S3    B(I) = T(I)*X(I)
      S4  ENDDO
      S5  DO I = 2, N
      S6    A(I) = B(I) - B(I-1)
      S7  ENDDO

  • Combine two consecutive loops with the same induction variable and loop bounds into one
  • The fused loop must preserve all dependence relations of the original loops
  • Enables more effective scalar optimizations in the fused loop
  • But: may reduce temporal locality

      S1  B(1) = T(1)*X(1)
      Sx  DO I = 2, N
      S3    B(I) = T(I)*X(I)
      S6    A(I) = B(I) - B(I-1)
      Sy  ENDDO

  The original code has dependences S1 δ S6 and S3 δ S6. The fused loop has dependences S1 δ S6, S3 δ(=) S6 and S3 δ(<) S6.
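The fusion example translates to this short Python sketch. Arrays are 1-based padded lists, and the names and inputs are hypothetical. In the fused loop, `B(I)` is written earlier in the same iteration (the `(=)` dependence), while `B(I-1)` was written one iteration earlier, or by S1 when I = 2 (the `(<)` dependence). Every value therefore still exists before it is read.

```python
def original(n, t, x):
    b = [0.0] * (n + 1)
    a = [0.0] * (n + 1)
    b[1] = t[1] * x[1]                 # S1
    for i in range(2, n + 1):          # S2-S4
        b[i] = t[i] * x[i]             # S3
    for i in range(2, n + 1):          # S5-S7
        a[i] = b[i] - b[i - 1]         # S6
    return a, b


def fused(n, t, x):
    b = [0.0] * (n + 1)
    a = [0.0] * (n + 1)
    b[1] = t[1] * x[1]                 # S1
    for i in range(2, n + 1):          # Sx-Sy
        b[i] = t[i] * x[i]             # S3
        a[i] = b[i] - b[i - 1]         # S6: B(I) from this iteration, B(I-1) from the previous one or S1
    return a, b


n = 5
t = [float(i) for i in range(n + 1)]
x = [2.0] * (n + 1)
assert original(n, t, x) == fused(n, t, x)
```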

  5. Example

      S1  DO I = 1, N
      S2    A(I) = B(I) + 1
      S3  ENDDO
      S4  DO I = 1, N
      S5    C(I) = A(I)/2
      S6  ENDDO
      S7  DO I = 1, N
      S8    D(I) = 1/C(I+1)
      S9  ENDDO

  a)
      S1  DO I = 1, N
      S2    A(I) = B(I) + 1
      S3  ENDDO
      Sx  DO I = 1, N
      S5    C(I) = A(I)/2
      S8    D(I) = 1/C(I+1)
      Sy  ENDDO

  b)
      Sx  DO I = 1, N
      S2    A(I) = B(I) + 1
      S5    C(I) = A(I)/2
      Sy  ENDDO
      S7  DO I = 1, N
      S8    D(I) = 1/C(I+1)
      S9  ENDDO

  c)
      Sx  DO I = 1, N
      S2    A(I) = B(I) + 1
      S5    C(I) = A(I)/2
      S8    D(I) = 1/C(I+1)
      Sy  ENDDO

  Which of the three fused loops is legal?
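One way to probe the question is to run all four versions on the same inputs, as in the Python sketch below. The names, the dispatch on a `version` string, and the input values are hypothetical. A differing result proves a fusion is illegal. A matching result on one input does not prove legality, so the dependence argument still matters.

That argument runs as follows. In b), S5 reads A(I) right after S2 writes it, so S2 δ(=) S5 is preserved. In a) and c), S8 reads C(I+1) before S5 writes it in iteration I+1, which reverses the original flow dependence from S5 to S8.

```python
def run(version, n, b, c0):
    """Run one version of the slide-5 code; arrays are 1-based (index 0 unused)."""
    a = [0.0] * (n + 2)
    c = list(c0)                         # C(N+1) is read but never written, so it needs an initial value
    d = [0.0] * (n + 2)
    loop = range(1, n + 1)               # DO I = 1, N
    if version == "original":
        for i in loop:
            a[i] = b[i] + 1              # S2
        for i in loop:
            c[i] = a[i] / 2              # S5
        for i in loop:
            d[i] = 1 / c[i + 1]          # S8
    elif version == "a":                 # S5 and S8 fused
        for i in loop:
            a[i] = b[i] + 1
        for i in loop:
            c[i] = a[i] / 2
            d[i] = 1 / c[i + 1]          # C(I+1) has not been computed yet
    elif version == "b":                 # S2 and S5 fused
        for i in loop:
            a[i] = b[i] + 1
            c[i] = a[i] / 2              # A(I) was just written in this iteration
        for i in loop:
            d[i] = 1 / c[i + 1]
    elif version == "c":                 # all three loops fused
        for i in loop:
            a[i] = b[i] + 1
            c[i] = a[i] / 2
            d[i] = 1 / c[i + 1]          # same problem as version a
    else:
        raise ValueError(version)
    return a, c, d


n = 6
b = [float(i) for i in range(n + 2)]
c0 = [1.0] * (n + 2)
reference = run("original", n, b, c0)
for v in ("a", "b", "c"):
    print(v, "matches" if run(v, n, b, c0) == reference else "differs")
# a differs / b matches / c differs
```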

  6. Alignment for Fusion

      S1  DO I = 1, N
      S2    B(I) = T(I)/C
      S3  ENDDO
      S4  DO I = 1, N
      S5    A(I) = B(I+1) - B(I-1)
      S6  ENDDO

  Dependence: S2 δ S5

  • Alignment for fusion changes the iteration bounds of one loop to enable fusion when dependences would otherwise prevent it

      S1  DO I = 0, N-1
      S2    B(I+1) = T(I+1)/C
      S3  ENDDO
      S4  DO I = 1, N
      S5    A(I) = B(I+1) - B(I-1)
      S6  ENDDO

  Dependence: S2 δ S5

      Sx  B(1) = T(1)/C
      S1  DO I = 1, N-1
      S2    B(I+1) = T(I+1)/C
      S5    A(I) = B(I+1) - B(I-1)
      S6  ENDDO
      Sy  A(N) = B(N+1) - B(N-1)

  Loop dependences: S2 δ(=) S5, S2 δ(<) S5
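This Python sketch contrasts naive fusion with aligned fusion. Arrays are 1-based padded lists, and the function names and inputs are hypothetical. Fusing the loops directly is illegal because S5 would read B(I+1) before S2 writes it. Shifting S2 by one iteration and peeling the leftover ends (Sx and Sy) restores the original values. The sketch assumes N ≥ 1.

```python
def original(n, t, c, b0):
    b = list(b0)                         # B(0) and B(N+1) are read but never written
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] / c                  # S2
    for i in range(1, n + 1):
        a[i] = b[i + 1] - b[i - 1]       # S5
    return a, b


def naive_fusion(n, t, c, b0):
    """Illegal: S5 reads B(I+1) before S2 writes it."""
    b = list(b0)
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] / c
        a[i] = b[i + 1] - b[i - 1]
    return a, b


def aligned_and_fused(n, t, c, b0):
    """Assumes N >= 1."""
    b = list(b0)
    a = [0.0] * (n + 2)
    b[1] = t[1] / c                      # Sx: first iteration of the shifted S2 loop
    for i in range(1, n):                # DO I = 1, N-1
        b[i + 1] = t[i + 1] / c          # S2 shifted: B(I+1) is ready before S5 reads it
        a[i] = b[i + 1] - b[i - 1]       # S5
    a[n] = b[n + 1] - b[n - 1]           # Sy: last iteration of the S5 loop
    return a, b


n = 6
t = [float(i) for i in range(n + 2)]
b0 = [10.0] * (n + 2)
assert aligned_and_fused(n, t, 2.0, b0) == original(n, t, 2.0, b0)
assert naive_fusion(n, t, 2.0, b0) != original(n, t, 2.0, b0)
```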

  7. Reversal

      S1  DO I = 1, N
      S2    B(I) = T(I)*X(I)
      S3  ENDDO
      S4  DO I = 1, N
      S5    A(I) = B(I+1)
      S6  ENDDO

  Dependence: S2 δ S5

  • Reverse the direction of the iteration
  • Only legal for loops that have no carried dependences
  • Enables loop fusion by ensuring that dependences between the statements of the loops are preserved

      S1  DO I = N, 1, -1
      S2    B(I) = T(I)*X(I)
      S3  ENDDO
      S4  DO I = N, 1, -1
      S5    A(I) = B(I+1)
      S6  ENDDO

  Dependence: S2 δ S5

      S1  DO I = N, 1, -1
      S2    B(I) = T(I)*X(I)
      S5    A(I) = B(I+1)
      S6  ENDDO

  Dependence: S2 δ(<) S5
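The Python sketch below shows why reversal makes this fusion legal. Arrays are 1-based padded lists, and the function names and inputs are hypothetical. Fusing the forward loops would make S5 read B(I+1) one iteration too early. When the loop runs from N down to 1, B(I+1) was written in the previous iteration.

```python
def original(n, t, x, b0):
    b = list(b0)                         # B(N+1) is read but never written
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] * x[i]               # S2
    for i in range(1, n + 1):
        a[i] = b[i + 1]                  # S5
    return a, b


def forward_fusion(n, t, x, b0):
    """Illegal: S5 reads B(I+1) before S2 writes it."""
    b = list(b0)
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] * x[i]
        a[i] = b[i + 1]
    return a, b


def reversed_and_fused(n, t, x, b0):
    b = list(b0)
    a = [0.0] * (n + 2)
    for i in range(n, 0, -1):            # DO I = N, 1, -1
        b[i] = t[i] * x[i]               # S2
        a[i] = b[i + 1]                  # S5: B(I+1) was written in the previous iteration
    return a, b


n = 6
t = [float(i) for i in range(n + 2)]
x = [3.0] * (n + 2)
b0 = [-1.0] * (n + 2)
assert reversed_and_fused(n, t, x, b0) == original(n, t, x, b0)
assert forward_fusion(n, t, x, b0) != original(n, t, x, b0)
```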

  8. Fission (1)

      S1  DO I = 1, 10
      S2    DO J = 1, 10
      S3      A(I,J) = B(I,J) + C(I,J)
      S4      D(I,J) = A(I,J-1) * 2.0
      S5    ENDDO
      S6  ENDDO

  Dependence: S3 δ(=,<) S4

  • Loop fission (or loop distribution) splits a single loop into multiple loops
  • Enables vectorization
  • Enables parallelization of the separate loops if the original loop is sequential
  • Loop fission must preserve all dependence relations of the original loop

      S1  DO I = 1, 10
      S2    DO J = 1, 10
      S3      A(I,J) = B(I,J) + C(I,J)
      Sx    ENDDO
      Sy    DO J = 1, 10
      S4      D(I,J) = A(I,J-1) * 2.0
      S5    ENDDO
      S6  ENDDO

  Dependence: S3 δ(=,<) S4

      S1  PARALLEL DO I = 1, 10
      S3    A(I,1:10) = B(I,1:10) + C(I,1:10)
      S4    D(I,1:10) = A(I,0:9) * 2.0
      S6  ENDDO

  Dependence: S3 δ(=,<) S4
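As a sanity check, the three forms of the fission example can be compared in Python. Arrays are 1-based padded lists of lists, and `fresh`, the other function names, and the input values are hypothetical. In `vectorized`, each array statement is modelled with slice assignment: the right-hand side is fully evaluated before the slice is stored, as with a Fortran array assignment.

```python
N = 10  # loop bounds from the slide


def fresh():
    """Hypothetical inputs; A(I,0) = 0.5 is the boundary column read when J = 1."""
    a = [[0.5] * (N + 1) for _ in range(N + 1)]
    b = [[float(i + j) for j in range(N + 1)] for i in range(N + 1)]
    c = [[float(i * j) for j in range(N + 1)] for i in range(N + 1)]
    d = [[0.0] * (N + 1) for _ in range(N + 1)]
    return a, b, c, d


def original(a, b, c, d):
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            a[i][j] = b[i][j] + c[i][j]          # S3
            d[i][j] = a[i][j - 1] * 2.0          # S4: reads A written one J iteration earlier
    return a, d


def distributed(a, b, c, d):
    for i in range(1, N + 1):                    # I iterations touch disjoint rows
        for j in range(1, N + 1):                # first J loop: S3 only
            a[i][j] = b[i][j] + c[i][j]
        for j in range(1, N + 1):                # second J loop: S4 only
            d[i][j] = a[i][j - 1] * 2.0
    return a, d


def vectorized(a, b, c, d):
    for i in range(1, N + 1):                    # PARALLEL DO I = 1, 10
        a[i][1:N + 1] = [b[i][j] + c[i][j] for j in range(1, N + 1)]  # A(I,1:10) = B(I,1:10) + C(I,1:10)
        d[i][1:N + 1] = [a[i][j] * 2.0 for j in range(0, N)]          # D(I,1:10) = A(I,0:9) * 2.0
    return a, d


assert original(*fresh()) == distributed(*fresh()) == vectorized(*fresh())
```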

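  9. Fission (2)

      S1  DO I = 1, 10
      S2    A(I) = A(I) + B(I-1)
      S3    B(I) = C(I-1)*X + Z
      S4    C(I) = 1/B(I)
      S5    D(I) = sqrt(C(I))
      S6  ENDDO

  Dependences: S3 δ(<) S2, S4 δ(<) S3, S3 δ(=) S4, S4 δ(=) S5

  • Compute the acyclic condensation of the dependence graph to find a legal order of the loops

  (Figure: the dependence graph over S2 to S5 and its acyclic condensation, in which the cycle between S3 and S4 collapses into a single node.)

  S3 and S4 form a dependence cycle, so they must stay in the same loop, and that loop must come before the loops for S2 and S5:

      S1  DO I = 1, 10
      S3    B(I) = C(I-1)*X + Z
      S4    C(I) = 1/B(I)
      Sx  ENDDO
      Sy  DO I = 1, 10
      S2    A(I) = A(I) + B(I-1)
      Sz  ENDDO
      Su  DO I = 1, 10
      S5    D(I) = sqrt(C(I))
      Sv  ENDDO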
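The slide does not say how the acyclic condensation is computed. One standard approach is sketched below in Python: Tarjan's algorithm finds the strongly connected components, and Kahn's algorithm orders them topologically. Statements are integers (2 stands for S2), and each edge `(u, v)` is a dependence from `Su` to `Sv`. The function names and the tie-break that keeps the original statement order are choices of this sketch, not part of the slide.

```python
from collections import defaultdict


def sccs(nodes, edges):
    """Tarjan's algorithm: return the strongly connected components (each sorted)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(u):
        index[u] = low[u] = len(index)       # DFS discovery number
        stack.append(u)
        on_stack.add(u)
        for v in succ[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:               # u is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            result.append(sorted(comp))

    for u in nodes:
        if u not in index:
            visit(u)
    return result


def loop_order(nodes, edges):
    """Topologically sort the condensation; each component becomes one loop."""
    comps = sccs(nodes, edges)
    comp_of = {s: k for k, comp in enumerate(comps) for s in comp}
    succ, indeg = defaultdict(set), [0] * len(comps)
    for u, v in edges:
        cu, cv = comp_of[u], comp_of[v]
        if cu != cv and cv not in succ[cu]:
            succ[cu].add(cv)
            indeg[cv] += 1
    ready = [k for k in range(len(comps)) if indeg[k] == 0]
    order = []
    while ready:
        ready.sort(key=lambda k: min(comps[k]))  # tie-break: keep original statement order
        k = ready.pop(0)
        order.append(comps[k])
        for m in succ[k]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return order


edges = [(3, 2), (4, 3), (3, 4), (4, 5)]  # S3 δ(<) S2, S4 δ(<) S3, S3 δ(=) S4, S4 δ(=) S5
for group in loop_order([2, 3, 4, 5], edges):
    print("loop:", ", ".join(f"S{s}" for s in group))
# loop: S3, S4 / loop: S2 / loop: S5
```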

  10. Alignment

      S1  DO I = 2, N
      S2    A(I) = B(I) + C(I)
      S3    D(I) = A(I-1) * 2.0
      S4  ENDDO

  Dependence: S2 δ(<) S3

  • Align statements in a loop body by expanding the iteration set
  • Attempts to transform loop-carried dependences into loop-independent dependences
  • Enables loop parallelization

      S1  DO i = 1, N
      S2    IF (i > 1) A(i) = B(i) + C(i)
      S3    IF (i < N) D(i+1) = A(i) * 2.0
      S4  ENDDO

  Dependence: S2 δ(=) S3
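This Python sketch compares the alignment example before and after the transformation. Arrays are 1-based padded lists, and the names and inputs are hypothetical. After alignment, S3 reads only the A value written in the same iteration, so no dependence crosses iterations. The guards handle the extra first iteration and the dropped last one.

```python
def original(n, b, c, a0):
    a = list(a0)                         # A(1) is read at I = 2 but never written by the loop
    d = [0.0] * (n + 2)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]               # S2
        d[i] = a[i - 1] * 2.0            # S3: A from the previous iteration
    return a, d


def aligned(n, b, c, a0):
    a = list(a0)
    d = [0.0] * (n + 2)
    for i in range(1, n + 1):
        if i > 1:
            a[i] = b[i] + c[i]           # S2
        if i < n:
            d[i + 1] = a[i] * 2.0        # S3 shifted by one: A from this iteration
    return a, d


n = 8
b = [float(i) for i in range(n + 2)]
c = [1.0] * (n + 2)
a0 = [7.0] * (n + 2)
assert aligned(n, b, c, a0) == original(n, b, c, a0)
```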

  11. Index Set Splitting

      S1  DO I = 1, 100
      S2    A(I) = B(I) + C(I)
      S3    IF I > 10 THEN
      S4      D(I) = A(I) + A(I-10)
      S5    ENDIF
      S6  ENDDO

  • Divide the index set into two portions
  • Removes conditionals to enable other transformations
  • The general case handles affine conditions in multi-dimensional loops by detecting a hyperplane through the iteration space polytope
  • But: code expansion

      S1  DO I = 1, 10
      S2    A(I) = B(I) + C(I)
      Sx  ENDDO
      Sy  DO I = 11, 100
      S2    A(I) = B(I) + C(I)
      S4    D(I) = A(I) + A(I-10)
      Su  ENDDO

  (Figure: a two-dimensional iteration space over I and J, split by the hyperplane 3*J > I into Loop1 and Loop2.)
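Below, a Python sketch splits the index set of the one-dimensional example at I = 10. Arrays are 1-based padded lists of length 101, and the names and inputs are hypothetical. The condition `I > 10` is false on the first portion and true on the second, so neither loop needs the test.

```python
def original(b, c, d0):
    a = [0.0] * 101
    d = list(d0)
    for i in range(1, 101):              # DO I = 1, 100
        a[i] = b[i] + c[i]               # S2
        if i > 10:
            d[i] = a[i] + a[i - 10]      # S4
    return a, d


def split(b, c, d0):
    a = [0.0] * 101
    d = list(d0)
    for i in range(1, 11):               # I = 1..10: the condition is always false
        a[i] = b[i] + c[i]
    for i in range(11, 101):             # I = 11..100: the condition is always true
        a[i] = b[i] + c[i]
        d[i] = a[i] + a[i - 10]
    return a, d


b = [float(i) for i in range(101)]
c = [0.0] * 101
d0 = [-1.0] * 101
assert split(b, c, d0) == original(b, c, d0)
```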

  12. Loop Interchange (1)

      S1  DO I = 1, N
      S2    DO J = 1, M
      S3      A(I,J) = A(I,J-1) + B(I,J)
      S4    ENDDO
      S5  ENDDO

  Dependence: S3 δ(=,<) S3

  • Changes the nesting order of nested loops
  • Loop interchange must preserve all dependence relations of the original loop
  • Enables vectorization of an outer loop
  • Can be used to improve spatial locality: with Fortran's column-major layout, the interchanged nest below accesses A and B with stride 1 in its inner loop

      S2  DO J = 1, M
      S1    DO I = 1, N
      S3      A(I,J) = A(I,J-1) + B(I,J)
      S4    ENDDO
      S5  ENDDO

  Dependence: S3 δ(<,=) S3

      S2  DO J = 1, M
      S3    A(1:N,J) = A(1:N,J-1) + B(1:N,J)
      S5  ENDDO

  Dependence: S3 δ(<,=) S3
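Both interchange claims can be illustrated in Python: the two loop orders compute the same values, and the interchanged order walks memory sequentially. This is a minimal sketch with hypothetical names. `column_major_offsets` assumes A is declared `A(1:N,0:M)`, so that A(I,J) sits at element offset `(I-1) + J*N` in Fortran's column-major layout.

```python
def original(n, m, a, b):
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i][j - 1] + b[i][j]    # S3: dependence carried by J only
    return a


def interchanged(n, m, a, b):
    for j in range(1, m + 1):
        for i in range(1, n + 1):              # independent I iterations: A(1:N,J) on the slide
            a[i][j] = a[i][j - 1] + b[i][j]
    return a


def column_major_offsets(n, m, outer_is_i):
    """Element offsets of A(I,J) in visiting order, assuming A(1:N,0:M) in column-major layout."""
    if outer_is_i:
        order = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]
    else:
        order = [(i, j) for j in range(1, m + 1) for i in range(1, n + 1)]
    return [(i - 1) + j * n for i, j in order]


n, m = 4, 5
make_a = lambda: [[float(i) for _ in range(m + 1)] for i in range(n + 1)]
b = [[float(i * j) for j in range(m + 1)] for i in range(n + 1)]
assert original(n, m, make_a(), b) == interchanged(n, m, make_a(), b)
print(column_major_offsets(3, 2, outer_is_i=True))   # stride N between consecutive accesses
print(column_major_offsets(3, 2, outer_is_i=False))  # stride 1
```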

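  13. Loop Interchange (2)

      S1  DO I = 1, N
      S2    DO J = 1, M
      S3      DO K = 1, L
      S4        A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
      S5      ENDDO
      S6    ENDDO
      S7  ENDDO

  Dependences: S4 δ(<,<,=) S4, S4 δ(<,=,>) S4

  • Compute the direction matrix and find which columns can be permuted without violating the dependence relations of the original loop nest

  Direction matrix (one row per dependence, columns I, J, K):

      < < =
      < = >

  Invalid permutation (loop order J, K, I). The second row now starts with = >, which would reverse the dependence:

      < = <
      = > <

  Valid permutation (loop order J, I, K). The leftmost non-= entry of every row is still <:

      < < =
      = < >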
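The legality test for column permutations is short enough to sketch in Python. Permute the columns of every row, then reject the permutation if any row's leftmost non-`=` entry becomes `>`. The matrix is the one derived for the triple nest above; `legal`, `legal_orders` and the string encoding of directions are hypothetical names and choices for this sketch.

```python
from itertools import permutations

LOOPS = ("I", "J", "K")
DIRECTION_MATRIX = [
    ("<", "<", "="),   # S4 δ(<,<,=) S4
    ("<", "=", ">"),   # S4 δ(<,=,>) S4
]


def legal(matrix, perm):
    """A loop order is legal if every permuted row keeps '<' as its leftmost non-'=' entry."""
    for row in matrix:
        permuted = [row[p] for p in perm]
        leading = next((d for d in permuted if d != "="), "=")  # all '=' means loop-independent
        if leading == ">":
            return False
    return True


def legal_orders(matrix):
    return ["".join(LOOPS[p] for p in perm)
            for perm in permutations(range(len(LOOPS))) if legal(matrix, perm)]


print(legal_orders(DIRECTION_MATRIX))  # ['IJK', 'IKJ', 'JIK']
```

The two permutations discussed for this nest correspond to `(1, 2, 0)` (order J, K, I, rejected) and `(1, 0, 2)` (order J, I, K, accepted).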

  14. Scalar Expansion

      S1  DO I = 1, N
      S2    T = A(I) + B(I)
      S3    C(I) = T + 1/T
      S4  ENDDO

  Dependences: S2 δ(=) S3, S2 δ⁻¹(<) S3

  • Breaks anti-dependence relations by expanding or promoting a scalar into an array
  • Scalar anti-dependence relations prevent certain loop transformations, such as loop fission and loop interchange

      Sx  IF N > 0 THEN
      Sy    ALLOC Tx(1:N)
      S1    DO I = 1, N
      S2      Tx(I) = A(I) + B(I)
      S3      C(I) = Tx(I) + 1/Tx(I)
      S4    ENDDO
      Sz    T = Tx(N)
      Su  ENDIF

  Dependence: S2 δ(=) S3
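The scalar expansion example, including its guard, looks like this in Python. Arrays are 1-based padded lists, and the names and inputs are hypothetical. The copy-back `T = Tx(N)` keeps T's final value correct if T is live after the loop. The `N > 0` guard matters because with an empty loop there is no `Tx(N)` to copy, and T must keep its old value.

```python
def original(n, a, b, t):
    c = [0.0] * (n + 1)
    for i in range(1, n + 1):
        t = a[i] + b[i]              # S2: every iteration overwrites the same scalar
        c[i] = t + 1 / t             # S3
    return c, t                      # T may be live after the loop


def expanded(n, a, b, t):
    c = [0.0] * (n + 1)
    if n > 0:                        # Sx: skip allocation and copy-back for an empty loop
        tx = [0.0] * (n + 1)         # Sy: ALLOC Tx(1:N), index 0 unused
        for i in range(1, n + 1):
            tx[i] = a[i] + b[i]      # S2: each iteration has its own element
            c[i] = tx[i] + 1 / tx[i] # S3
        t = tx[n]                    # Sz: restore the scalar's final value
    return c, t


n = 5
a = [float(i) for i in range(n + 1)]
b = [1.0] * (n + 1)
assert expanded(n, a, b, 0.0) == original(n, a, b, 0.0)
assert expanded(0, [0.0], [0.0], 9.0) == original(0, [0.0], [0.0], 9.0)
```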

  15. Example

      S1  DO I = 1, 10
      S2    T = A(I,1)
      S3    DO J = 2, 10
      S4      T = T + A(I,J)
      S5    ENDDO
      S6    B(I) = T
      S7  ENDDO

  Dependences: S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6, S2 δ⁻¹(<) S6

  After scalar expansion:

      S1  DO I = 1, 10
      S2    Tx(I) = A(I,1)
      S3    DO J = 2, 10
      S4      Tx(I) = Tx(I) + A(I,J)
      S5    ENDDO
      S6    B(I) = Tx(I)
      S7  ENDDO

  Dependences: S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6

  After loop fission:

      S1  DO I = 1, 10
      S2    Tx(I) = A(I,1)
      Sx  ENDDO
      S1  DO I = 1, 10
      S3    DO J = 2, 10
      S4      Tx(I) = Tx(I) + A(I,J)
      S5    ENDDO
      Sy  ENDDO
      Sz  DO I = 1, 10
      S6    B(I) = Tx(I)
      S7  ENDDO

  Dependences: S2 δ S4, S4 δ(=,<) S4, S4 δ S6

  After loop interchange and vectorization:

      S2  Tx(1:10) = A(1:10,1)
      S3  DO J = 2, 10
      S4    Tx(1:10) = Tx(1:10) + A(1:10,J)
      S5  ENDDO
      S6  B(1:10) = Tx(1:10)

  Dependences: S2 δ S4, S4 δ(<,=) S4, S4 δ S6
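This Python sketch compares the first and last forms of the row-sum example. Arrays are 1-based padded lists, the function names are hypothetical, and array statements are modelled with slice assignment. Each row is still summed in the same order (A(I,1), then A(I,2), and so on), so the floating-point results agree exactly.

```python
N = 10  # loop bounds from the slide


def original(a):
    b = [0.0] * (N + 1)
    for i in range(1, N + 1):
        t = a[i][1]                                              # S2
        for j in range(2, N + 1):
            t = t + a[i][j]                                      # S4
        b[i] = t                                                 # S6
    return b


def expanded_interchanged_vectorized(a):
    tx = [0.0] * (N + 1)
    b = [0.0] * (N + 1)
    tx[1:N + 1] = [a[i][1] for i in range(1, N + 1)]             # S2: Tx(1:10) = A(1:10,1)
    for j in range(2, N + 1):                                    # S3: J is now the outer loop
        tx[1:N + 1] = [tx[i] + a[i][j] for i in range(1, N + 1)] # S4: whole column at once
    b[1:N + 1] = tx[1:N + 1]                                     # S6: B(1:10) = Tx(1:10)
    return b


a = [[float(i * j) for j in range(N + 1)] for i in range(N + 1)]
assert expanded_interchanged_vectorized(a) == original(a)
```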

  16. Other Loop Restructuring Transformations
  • Loop skewing: denormalize iteration vectors to change the shape of the iteration space (skew) to allow loop interchange
  • Strip mining: decompose a single loop into two nested loops (where the inner loop computes a strip of the data)
  • Loop tiling: the iteration space is divided into tiles (blocks) that are executed one after another, typically to improve cache locality
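Strip mining and tiling can be illustrated in Python by recording the order in which each variant visits the iterations. The Fortran-style comments use hypothetical names (`IS`, `STRIP`) that do not come from the slides. Strip mining keeps the original iteration order. Tiling a 2-D nest (strip-mining both loops and moving the strip loops outward) changes the order, so, like interchange, it is only legal when the dependences allow it.

```python
def strip_mined(n, strip):
    """Strip mining: one loop over 1..n becomes a loop over strips and a loop within a strip."""
    order = []
    for start in range(1, n + 1, strip):                        # DO IS = 1, N, STRIP
        for i in range(start, min(start + strip - 1, n) + 1):   # DO I = IS, MIN(IS+STRIP-1, N)
            order.append(i)
    return order


def tiled(n, m, ti, tj):
    """Tiling: strip-mine both loops of a 2-D nest and move the strip loops outward."""
    order = []
    for ii in range(1, n + 1, ti):                              # tile rows
        for jj in range(1, m + 1, tj):                          # tile columns
            for i in range(ii, min(ii + ti - 1, n) + 1):        # rows within the tile
                for j in range(jj, min(jj + tj - 1, m) + 1):    # columns within the tile
                    order.append((i, j))
    return order


assert strip_mined(10, 4) == list(range(1, 11))                # same iterations, same order
full = [(i, j) for i in range(1, 6) for j in range(1, 4)]
assert sorted(tiled(5, 3, 2, 2)) == full                       # same iterations, different order
print(tiled(4, 4, 2, 2)[:4])                                   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```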
