
Proactive Loop-nest Optimizations

Mei Ye (mei.ye@amd.com). Acknowledgements: Dinesh Suresh, Roy Ju, Michael Lai.


Presentation Transcript


  1. Proactive Loop-nest Optimizations Mei Ye mei.ye@amd.com Acknowledgements: Dinesh Suresh, Roy Ju, Michael Lai

  2. Adjacent Loops Five little pumpkins sitting on a gate …

  3. [Figure: tree of a function (Func) with nested If, Then, Else, Block and Loop nodes.]

  4. Proactive Loop Fusion: an automated process that iteratively applies a set of code transformations (if-merging, head/tail duplication, code motion, etc.) over the whole function, in no fixed order, to bring pairs of loops adjacent to each other so that loop fusion becomes possible.

  5. Proactive Loop Fusion Candidates
     A pair of loops are proactive loop fusion candidates iff:
     • They have a Least Common Predecessor (LCP) in the tree.
     • The paths from the candidates to the LCP have equal length.
     • Each pair of corresponding nodes on the two paths has the same type, and pairs of Ifs have identical values for their condition expressions.
     • The loops are not adjacent to each other but are otherwise good fusion candidates.
     Complexity: O((depth * n)^2) (depth: depth in tree, n: number of loops at that depth)
     [Figure: tree with the LCP above two If nodes and a Block; Loop1 and Loop2 sit under the Ifs' Then/Else branches.]
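The structural part of the candidate test above can be sketched as a lockstep walk from both loops towards the root. This is a minimal illustration, not the compiler's implementation: `Node`, `Kind`, `add` and `fusion_lcp` are hypothetical names, conditions are compared as plain strings, and the adjacency and "otherwise good fusion candidate" checks are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node; names are illustrative only.
enum class Kind { Func, If, Then, Else, Block, Loop };

struct Node {
    Kind kind;
    std::string cond;          // condition text, meaningful for If nodes only
    Node* parent;
    std::vector<Node*> kids;
    explicit Node(Kind k, const std::string& c = "")
        : kind(k), cond(c), parent(nullptr) {}
};

// Attach child under parent and return the child.
Node* add(Node* parent, Node* child) {
    child->parent = parent;
    parent->kids.push_back(child);
    return child;
}

// Returns the LCP if the two loops pass the structural test, else nullptr.
// Walking both paths one step at a time enforces equal path length: loops at
// different depths never meet, so one pointer runs off the root first.
Node* fusion_lcp(Node* l1, Node* l2) {
    if (l1 == l2 || l1->kind != Kind::Loop || l2->kind != Kind::Loop)
        return nullptr;
    Node* a = l1;
    Node* b = l2;
    while (a != b) {
        if (!a || !b) return nullptr;                      // unequal path lengths
        if (a->kind != b->kind) return nullptr;            // node types must match
        if (a->kind == Kind::If && a->cond != b->cond)     // Ifs need equal conditions
            return nullptr;
        a = a->parent;
        b = b->parent;
    }
    return a;  // first node shared by both paths
}
```

Checking every pair of loops at a given depth with this test is one way to arrive at a quadratic cost of the kind quoted on the slide.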

  6. Proactive Loop Fusion Transformation Candidates
     Proactive loop fusion transformation candidates, cand1 and cand2:
     1. Are immediate children of the LCP of the loop fusion candidates.
     2. Are each either an If or a Loop.
     3. For every sibling in (cand1, cand2) that is a Block or an If: the Block can be safely and legally moved above cand1 if cand1 is a Loop; the If has at least one path that has no dependency on the loop fusion candidates.
     4. For every sibling in (cand1, cand2] that is an If: its preceding siblings can be legally if-merged or head-duplicated into it.
     5. For every sibling in [cand1, cand2) that is an If: its succeeding siblings can be legally if-merged or tail-duplicated into it.
     [Figure: LCP with children cand1 (If), a Block and cand2 (If); Loop1 and Loop2 sit under the Ifs' Then/Else branches.]

  7. [Figure: (1) LCP with children If (cand1, sc1), Block (sc2) and If (cand2); tail duplication gives (2) LCP with children If (sc1) and If (sc2); if-merging gives (3) LCP with a single If.]

  8. Action Table

         sc1    sc2    Action
         Loop   Block  Safe code motion of sc2 before sc1; iteration continues on sc1.
         If     Block  Tail duplication of sc2 into sc1; iteration continues on sc1.
         Loop   If     Head duplication of sc1 into sc2; iteration continues on sc2.
         If     If     If-merging or tail duplication of sc2 into sc1; iteration continues on sc1.
         If     Loop   Tail duplication of sc2 into sc1; iteration continues on sc1.
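The action table reads naturally as a lookup on the kinds of two neighbouring siblings. The sketch below encodes exactly the five rows listed; `Kind`, `Action`, `Step` and `next_step` are hypothetical names chosen for illustration, and any pair not in the table maps to `Action::None`.

```cpp
#include <cassert>

// Hypothetical encoding of the action table; names are illustrative only.
enum class Kind { Loop, If, Block };
enum class Action { None, CodeMotion, TailDuplication, HeadDuplication, IfMergeOrTailDup };

struct Step {
    Action action;    // transformation to apply to the pair (sc1, sc2)
    int continue_on;  // 1 = continue on sc1, 2 = continue on sc2, 0 = not applicable
};

Step next_step(Kind sc1, Kind sc2) {
    if (sc1 == Kind::Loop && sc2 == Kind::Block) return {Action::CodeMotion, 1};
    if (sc1 == Kind::If   && sc2 == Kind::Block) return {Action::TailDuplication, 1};
    if (sc1 == Kind::Loop && sc2 == Kind::If)    return {Action::HeadDuplication, 2};
    if (sc1 == Kind::If   && sc2 == Kind::If)    return {Action::IfMergeOrTailDup, 1};
    if (sc1 == Kind::If   && sc2 == Kind::Loop)  return {Action::TailDuplication, 1};
    return {Action::None, 0};  // pair not covered by the table
}
```

Only the Loop/If row continues on sc2, because head duplication absorbs sc1 into sc2; in every other row sc2 is moved or folded into sc1.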

  9. If-merging
     Before:

         if (a) {
           for (i=0; i<n; i++) stmt1;
           if (b) stmt2;
         }
         if (a) {
           for (i=0; i<n; i++) stmt3;
         }

     After:

         if (a) {
           for (i=0; i<n; i++) stmt1;
           if (b) stmt2;
           for (i=0; i<n; i++) stmt3;
         }

     [Figure: before, the LCP (Func) has children If(a) (cand1, sc1) and If(a) (cand2, sc2); after, a single If(a) whose Then branch holds Loop, If(b), Loop.]

  10. Head duplication
     Before:

         if (a) {
           for (i=0; i<n; i++) stmt1;
           if (b) stmt2;
           for (i=0; i<n; i++) stmt3;
         }

     After:

         if (a) {
           if (b) {
             for (i=0; i<n; i++) stmt1;
             stmt2;
           } else {
             for (i=0; i<n; i++) stmt1;
           }
           for (i=0; i<n; i++) stmt3;
         }

     [Figure: before, the LCP (Then branch of If(a)) has children Loop (cand1, sc1), If(b) (sc2) and Loop (cand2); after, the first Loop is duplicated into both branches of If(b).]

  11. Tail duplication
     Before:

         if (a) {
           if (b) {
             for (i=0; i<n; i++) stmt1;
             stmt2;
           } else {
             for (i=0; i<n; i++) stmt1;
           }
           for (i=0; i<n; i++) stmt3;
         }

     After:

         if (a) {
           if (b) {
             for (i=0; i<n; i++) stmt1;
             stmt2;
             for (i=0; i<n; i++) stmt3;
           } else {
             for (i=0; i<n; i++) stmt1;
             for (i=0; i<n; i++) stmt3;
           }
         }

     [Figure: before, the LCP has children If(b) (sc1) and Loop (sc2); after, the trailing Loop is duplicated into both branches of If(b).]

  12. Code motion
     Before:

         if (a) {
           if (b) {
             for (i=0; i<n; i++) stmt1;
             stmt2;
             for (i=0; i<n; i++) stmt3;
           } else {
             for (i=0; i<n; i++) stmt1;
             for (i=0; i<n; i++) stmt3;
           }
         }

     After:

         if (a) {
           if (b) {
             stmt2;
             for (i=0; i<n; i++) stmt1;
             for (i=0; i<n; i++) stmt3;
           } else {
             for (i=0; i<n; i++) stmt1;
             for (i=0; i<n; i++) stmt3;
           }
         }

     [Figure: before, the LCP (Then branch of If(b)) has children Loop (cand1, sc1), Block (sc2) and Loop (cand2); after, the Block precedes the two Loops, which are now adjacent.]
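The chain on slides 9 to 12 can be checked end to end with a small harness. This is a sketch under stated assumptions: the bodies of `stmt1`, `stmt2` and `stmt3` are invented so that `stmt2` is independent of both loops (which is what makes the code motion legal) and `stmt3` reads only the value written in the same iteration (which is what makes the final fusion legal). `State`, `original`, `adjacent` and `fused` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct State {
    std::vector<int> x, z;
    int y;
    explicit State(int n) : x(n, 0), z(n, 0), y(0) {}
    bool operator==(const State& o) const { return x == o.x && z == o.z && y == o.y; }
};

// Assumed statement bodies, chosen only to make the dependences explicit.
inline void stmt1(State& s, int i) { s.x[i] = i + 1; }
inline void stmt2(State& s)        { s.y = 42; }
inline void stmt3(State& s, int i) { s.z[i] = 2 * s.x[i]; }

// Slide 9, before if-merging: the loops sit under two separate if (a) regions.
State original(bool a, bool b, int n) {
    State s(n);
    if (a) { for (int i = 0; i < n; i++) stmt1(s, i); if (b) stmt2(s); }
    if (a) { for (int i = 0; i < n; i++) stmt3(s, i); }
    return s;
}

// Slide 12, after code motion: the loops are adjacent on every path.
State adjacent(bool a, bool b, int n) {
    State s(n);
    if (a) {
        if (b) {
            stmt2(s);
            for (int i = 0; i < n; i++) stmt1(s, i);
            for (int i = 0; i < n; i++) stmt3(s, i);
        } else {
            for (int i = 0; i < n; i++) stmt1(s, i);
            for (int i = 0; i < n; i++) stmt3(s, i);
        }
    }
    return s;
}

// The follow-up the transformations are meant to enable: fuse each adjacent pair.
State fused(bool a, bool b, int n) {
    State s(n);
    if (a) {
        if (b) {
            stmt2(s);
            for (int i = 0; i < n; i++) { stmt1(s, i); stmt3(s, i); }
        } else {
            for (int i = 0; i < n; i++) { stmt1(s, i); stmt3(s, i); }
        }
    }
    return s;
}
```

Comparing the three functions for all four combinations of `a` and `b` gives identical final states under these assumed statement bodies. With a `stmt2` that read `x`, the code motion step would no longer be legal and the comparison would fail.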

  13. Top-level driver for proactive loop fusion (pseudocode):

         void COMP_UNIT::Pro_loop_fusion_trans() {
           // Identify proactive loop fusion candidates and flag LCPs.
           pro_loop_fusion_trans->Classify_loops(func);
           // Start top-down proactive loop fusion transformations.
           pro_loop_fusion_trans->Top_down_trans(func);
         }

         void PRO_LOOP_FUSION_TRANS::Top_down_trans(SC_NODE * sc) {
           if (sc is a LCP) {  // Process LCPs.
             while (1) {
               // Find proactive loop fusion transformation candidates.
               Find_cand(sc, &cand1, &cand2);
               // Invoke proactive loop fusion transformations.
               if (cand1 && cand2)
                 Traverse_trans(cand1, cand2);
               else
                 break;
             }
             if (transformation happens) {
               // Re-identify proactive loop fusion candidates.
               Classify_loops(sc);
             }
           }
           // Recursively visit child nodes.
           SC_LIST_ITER sc_list_iter;
           SC_NODE * kid;
           FOR_ALL_ELEM(kid, sc_list_iter, Init(sc->Kids()))
             Top_down_trans(kid);
         }

     Complexity: O(n*m) (n: number of LCPs, m: number of intervening nodes among loop fusion candidates)

  14. Proactive Loop Interchange: an automated process that applies loop unswitching, reverse loop unswitching, if-condition distribution, if-condition tree height reduction and other control flow graph transformations to eliminate intervening statements between the outer and inner loops of a loop nest so that loop interchange becomes possible.

  15. If-condition tree height reduction
     Before:

         for (i=0; i<n; i++) {
           if (a & (1<<i)) {
             if (b)
               bar();
             else if (c) {
               for (j=0; j<m; j++)
                 a[j][i] = 0;
             }
           }
         }

     After:

         for (i=0; i<n; i++) {
           if (a & (1<<i)) {
             if (!b && c) {
               for (j=0; j<m; j++)
                 a[j][i] = 0;
             } else if (b)
               bar();
           }
         }

     [Figure: tree before and after the transformation; the nested if(b)/if(c) above the inner Loop is replaced by a single if(!b&&c), with if(b) moved to its Else branch.]

  16. If-condition distribution
     Before:

         for (i=0; i<n; i++) {
           if (a & (1<<i)) {
             if (!b && c) {
               for (j=0; j<m; j++)
                 a[j][i] = 0;
             } else if (b)
               bar();
           }
         }

     After:

         for (i=0; i<n; i++) {
           if (!b && c) {
             if (a & (1<<i)) {
               for (j=0; j<m; j++)
                 a[j][i] = 0;
             }
           } else if (b) {
             if (a & (1<<i))
               bar();
           }
         }

     [Figure: tree before and after the transformation; if(a&(1<<i)) is distributed into the branches of if(!b&&c) and if(b).]

  17. Reversed loop unswitching
     Before:

         for (i=0; i<n; i++) {
           if (!b && c) {
             if (a & (1<<i)) {
               for (j=0; j<m; j++)
                 a[j][i] = 0;
             }
           } else if (b) {
             if (a & (1<<i))
               bar();
           }
         }

     After:

         for (i=0; i<n; i++) {
           if (!b && c) {
             for (j=0; j<m; j++) {
               if (a & (1<<i))
                 a[j][i] = 0;
             }
           } else if (b) {
             if (a & (1<<i))
               bar();
           }
         }

     [Figure: tree before and after the transformation; if(a&(1<<i)) moves from above the inner Loop to inside it.]

  18. Loop unswitching
     Before:

         for (i=0; i<n; i++) {
           if (!b && c) {
             for (j=0; j<m; j++) {
               if (a & (1<<i))
                 a[j][i] = 0;
             }
           } else if (b) {
             if (a & (1<<i))
               bar();
           }
         }

     After:

         if (!b && c) {
           for (i=0; i<n; i++) {
             for (j=0; j<m; j++) {
               if (a & (1<<i))
                 a[j][i] = 0;
             }
           }
         } else if (b) {
           for (i=0; i<n; i++) {
             if (a & (1<<i))
               bar();
           }
         }

     [Figure: tree before and after the transformation; the loop-invariant if(!b&&c) and if(b) are hoisted above the outer Loop, leaving a perfectly nested Loop/Loop pair.]
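The sequence on slides 15 to 18 can be exercised the same way. The sketch below makes a few assumptions that are not on the slides: the scalar bit mask and the two-dimensional array, which both appear as `a` there, are renamed `mask` and `arr`; `bar()` is modelled as a call counter; and `n` is assumed to be smaller than the bit width of `unsigned`. `Result`, `original`, `unswitched` and `interchanged` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Result {
    std::vector<std::vector<int>> arr;  // arr[j][i]: m rows, n columns, initially 1
    int bar_calls;                      // stands in for the side effect of bar()
    Result(int m, int n) : arr(m, std::vector<int>(n, 1)), bar_calls(0) {}
    bool operator==(const Result& o) const {
        return arr == o.arr && bar_calls == o.bar_calls;
    }
};

// Slide 15, before any transformation.
Result original(unsigned mask, bool b, bool c, int n, int m) {
    Result r(m, n);
    for (int i = 0; i < n; i++) {
        if (mask & (1u << i)) {
            if (b) r.bar_calls++;
            else if (c) { for (int j = 0; j < m; j++) r.arr[j][i] = 0; }
        }
    }
    return r;
}

// Slide 18, after loop unswitching: the i and j loops are perfectly nested.
Result unswitched(unsigned mask, bool b, bool c, int n, int m) {
    Result r(m, n);
    if (!b && c) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                if (mask & (1u << i)) r.arr[j][i] = 0;
    } else if (b) {
        for (int i = 0; i < n; i++)
            if (mask & (1u << i)) r.bar_calls++;
    }
    return r;
}

// The interchange this enables: j outermost, so the inner loop walks along a row.
Result interchanged(unsigned mask, bool b, bool c, int n, int m) {
    Result r(m, n);
    if (!b && c) {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                if (mask & (1u << i)) r.arr[j][i] = 0;
    } else if (b) {
        for (int i = 0; i < n; i++)
            if (mask & (1u << i)) r.bar_calls++;
    }
    return r;
}
```

In the interchanged form the inner loop varies the last subscript of `arr[j][i]`, which matches the heuristic of having the memory reference iterate on the inner loop's dimension.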

  19. Heuristics
     Proactive loop fusion
     • Maximize loop fusion.
     • Large or unknown trip count loops.
     • Loops on symmetric paths with same iteration spaces.
     • Pre-check on transformation legality.
     Proactive loop interchange
     • Fully-permutable loop-nest.
     • Memory reference iterates on inner loop’s dimension. Inner loop has large or unknown trip counts.
     • Simply-nested if-regions.
     • Pre-check on transformation legality.

  20. Peak scores of libquantum
     System: AMD Istanbul, 2.4 GHz, 2 sockets, 6 cores/socket, 64 KB L1 instruction cache, 64 KB L1 data cache, 512 KB L2 cache, 6 MB/socket L3 cache, 32 GB DDR2-800 memory, SLES10 SP2

  21. Reference: Kit Barton (www.cs.ualberta.ca/~cbarton)
     • Gather intervening code between loops using the dominance relation.
     • Build a Data Dependence Graph of the intervening code.
     • Use a schedule queue to identify movable nodes.

  22. Barton’s Non-Adjacent loops example

         while (i < N) {
           a += i;
           i++;
         }
         b := a * 2;
         c := b + 6;
         g := 0;
         h := g + 10;
         if (c < 100)
           d := c/2;
         else
           e := c * 2;
         while (j < N) {
           f := g + 6;
           j++;
         }

     [Figure: graph over the intervening statements b := a * 2; g := 0; c := b + 6; h := g + 10; if (c < 100) d := c/2; else e := c * 2;]

  23. Barton’s Non-Adjacent loops example
     Original:

         while (i < N) {
           a += i;
           i++;
         }
         b := a * 2;
         c := b + 6;
         g := 0;
         h := g + 10;
         if (c < 100)
           d := c/2;
         else
           e := c * 2;
         while (j < N) {
           f := g + 6;
           j++;
         }

     After moving the intervening code:

         g := 0;
         h := g + 10;
         while (i < N) {
           a += i;
           i++;
         }
         while (j < N) {
           f := g + 6;
           j++;
         }
         b := a * 2;
         c := b + 6;
         if (c < 100)
           d := c/2;
         else
           e := c * 2;
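A quick way to see that the rearrangement in Barton's example preserves behaviour is to run both orderings and compare every variable. The sketch below transcribes the two listings into C++; the zero initial values and the names `Vars`, `original` and `moved` are assumptions made for illustration, since the slides do not give initial values.

```cpp
#include <cassert>
#include <tuple>

struct Vars {
    int a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = 0, i = 0, j = 0;
    bool operator==(const Vars& o) const {
        return std::tie(a, b, c, d, e, f, g, h, i, j) ==
               std::tie(o.a, o.b, o.c, o.d, o.e, o.f, o.g, o.h, o.i, o.j);
    }
};

// Original order: the two loops are separated by intervening code.
Vars original(int N) {
    Vars v;
    while (v.i < N) { v.a += v.i; v.i++; }
    v.b = v.a * 2;
    v.c = v.b + 6;
    v.g = 0;
    v.h = v.g + 10;
    if (v.c < 100) v.d = v.c / 2; else v.e = v.c * 2;
    while (v.j < N) { v.f = v.g + 6; v.j++; }
    return v;
}

// After code motion: statements the second loop needs (g, h) move above the
// first loop, statements that need the first loop's result (b, c, the if)
// move below the second loop, and the loops become adjacent.
Vars moved(int N) {
    Vars v;
    v.g = 0;
    v.h = v.g + 10;
    while (v.i < N) { v.a += v.i; v.i++; }
    while (v.j < N) { v.f = v.g + 6; v.j++; }
    v.b = v.a * 2;
    v.c = v.b + 6;
    if (v.c < 100) v.d = v.c / 2; else v.e = v.c * 2;
    return v;
}
```

The split follows the data dependences: `g` and `h` do not depend on the first loop, while `b`, `c` and the `if` depend on `a` and are not read by the second loop.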

  24. Barton’s Pros & Cons
     Pros
     • Powerful full-fledged code motion.
     Cons
     • Loops must be control-flow equivalent.
     • No finer granularity in if-regions.
