Constraint Programming for Compiler Optimization
March 2006
Acknowledgements
Joint work with: Alexander Golynski, Alejandro López-Ortiz, Abid Malik, Jim McInnes, Claude-Guy Quimper, John Tromp, Kent Wilken
Funding: NSERC, IBM Canada
Optimization problems in compilers
• Instruction selection
• Instruction scheduling
  • basic-block instruction scheduling
  • super-block scheduling
  • software pipelining & loop unrolling
• Register allocation
• Memory hierarchy optimizations
Basic-block instruction scheduling
• Schedule basic-block
  • straight-line sequence of code with single entry, single exit
• Multiple-issue pipelined processors
  • multiple instructions can begin execution each clock cycle
  • delay or latency before results are available
• Find minimum length schedule
• Classic problem
  • lots of attention in literature
Example: (a + b) + c
instructions
A: r1 ← a
B: r2 ← b
C: r3 ← c
D: r1 ← r1 + r2
E: r1 ← r1 + r3
dependency DAG (edge: latency)
A → D: 3
B → D: 3
C → E: 3
D → E: 1
Single-issue pipelined processor: non-optimal schedule
A: r1 ← a
B: r2 ← b
nop
nop
D: r1 ← r1 + r2
C: r3 ← c
nop
nop
E: r1 ← r1 + r3
Single-issue pipelined processor: optimal schedule
A: r1 ← a
B: r2 ← b
C: r3 ← c
nop
D: r1 ← r1 + r2
E: r1 ← r1 + r3
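The two single-issue schedules can be checked mechanically. A minimal sketch follows, assuming the latencies of the example DAG (A to D: 3, B to D: 3, C to E: 3, D to E: 1) and counting each nop as one issue cycle. The names `EDGES`, `issue_cycles` and `is_valid` are illustrative and do not come from the slides.

```python
# Latency edges (producer, consumer, latency) read off the example dependency DAG.
EDGES = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]


def issue_cycles(sequence):
    """Map each instruction to its 1-based issue cycle; a 'nop' occupies a cycle."""
    return {op: cycle for cycle, op in enumerate(sequence, start=1) if op != "nop"}


def is_valid(sequence, edges=EDGES):
    """True if every consumer issues at least `latency` cycles after its producer."""
    cycle = issue_cycles(sequence)
    return all(cycle[j] >= cycle[i] + lat for i, j, lat in edges)


non_optimal = ["A", "B", "nop", "nop", "D", "C", "nop", "nop", "E"]
optimal = ["A", "B", "C", "nop", "D", "E"]

for name, seq in [("non-optimal", non_optimal), ("optimal", optimal)]:
    print(name, "valid:", is_valid(seq), "length:", len(seq))
```

Both sequences respect every latency; they differ only in length (9 cycles versus 6). Dropping the remaining nop from the shorter sequence would violate D ≥ B + 3.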
Multiple-issue pipelined processor: issue width is 2
[Figure: dependency DAG for instructions A through E next to a schedule over cycles 1 to 5.]
Multiple-issue pipelined processor: issue width is 1+1
[Figure: dependency DAG for instructions A through E next to a schedule over cycles 1 to 6.]
Production compilers
“At the outset, note that basic-block scheduling is an NP-hard problem, even with a very simple formulation of the problem, so we must seek an effective heuristic, rather than exact, approach.”
Steven Muchnick, Advanced Compiler Design & Implementation, 1997
Optimal approaches: state-of-the-art
Single-issue
• Previous
  • 10-40 instructions: ILP (Arya, 1985); CP (Ertl & Krall, 1991)
  • up to 1000 instructions: ILP (Wilken et al., 2000)
• Our work
  • up to 2600 instructions
  • 20× faster
Multiple-issue
• Previous
  • 10-40 instructions: ILP (Chang et al., 1997); DP (Kessler, 1998)
  • up to 1000 instructions: B&B (Heffernan et al., 2005)
• Our work
  • up to 2600 instructions
  • 50-fold improvement
Constraint programming methodology
• Model problem
  • specify in terms of constraints on acceptable solutions
  • define/choose constraint model: variables, domains, constraints
• Solve model
  • define/choose search algorithm
  • define/choose heuristics
Minimal constraint model
variables: A, B, C, D, E
domains: {1, …, m}
constraints:
D ≥ A + 3
D ≥ B + 3
E ≥ C + 3
E ≥ D + 1
gcc(A, B, C, D, E, width)
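To make the minimal model concrete, here is a brute-force sketch that enumerates every assignment of issue cycles in {1, …, m} and keeps one that satisfies the latency constraints and the gcc constraint, read here as "no cycle is used by more than `width` instructions". It is only practical for tiny blocks and is not the solver described in these slides; `satisfies`, `solve` and `min_length` are hypothetical names.

```python
from collections import Counter
from itertools import product

VARS = ["A", "B", "C", "D", "E"]
# (i, j, lat) encodes the latency constraint j >= i + lat.
EDGES = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]


def satisfies(assign, width):
    """Check the latency constraints and the per-cycle capacity (gcc) constraint."""
    latency_ok = all(assign[j] >= assign[i] + lat for i, j, lat in EDGES)
    capacity_ok = max(Counter(assign.values()).values()) <= width
    return latency_ok and capacity_ok


def solve(m, width):
    """Return some schedule using cycles 1..m, or None if none exists."""
    for values in product(range(1, m + 1), repeat=len(VARS)):
        assign = dict(zip(VARS, values))
        if satisfies(assign, width):
            return assign
    return None


def min_length(width, max_m=10):
    """Smallest m for which the model has a solution (searching up to max_m)."""
    for m in range(1, max_m + 1):
        if solve(m, width) is not None:
            return m
    return None


print(min_length(width=1), min_length(width=2))
```

On the example DAG this reports a minimum length of 6 for issue width 1 and 5 for issue width 2.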
Bounds consistency constraint propagation
variable: successive domains
A: [1, 6] → [1, 3] → [1, 2]
B: [1, 6] → [1, 3] → [1, 2]
C: [1, 6] → [1, 3] → [3, 3]
D: [1, 6] → [4, 6] → [4, 5]
E: [1, 6] → [4, 6] → [5, 6] → [6, 6]
constraints:
D ≥ A + 3
D ≥ B + 3
E ≥ C + 3
E ≥ D + 1
gcc(A, B, C, D, E, 1)
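The propagation on this slide can be reproduced with a small sketch. It assumes issue width 1, so the gcc constraint reduces to all-different, and it uses a deliberately naive Hall-interval pruning rather than the faster propagators discussed later in the talk. All function names are illustrative.

```python
# (i, j, lat) encodes the latency constraint j >= i + lat.
EDGES = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]


class Inconsistent(Exception):
    """Raised when more variables than values must fit into an interval."""


def propagate_latency(dom, edges):
    """Tighten bounds for each constraint j >= i + lat; report whether anything changed."""
    changed = False
    for i, j, lat in edges:
        (lo_i, hi_i), (lo_j, hi_j) = dom[i], dom[j]
        new_i = (lo_i, min(hi_i, hi_j - lat))
        new_j = (max(lo_j, lo_i + lat), hi_j)
        if (new_i, new_j) != (dom[i], dom[j]):
            dom[i], dom[j] = new_i, new_j
            changed = True
    return changed


def propagate_alldiff(dom):
    """Naive Hall-interval pruning for issue width 1 (one instruction per cycle)."""
    changed = False
    lows = sorted({lo for lo, _ in dom.values()})
    highs = sorted({hi for _, hi in dom.values()})
    for a in lows:
        for b in highs:
            if b < a:
                continue
            inside = {v for v, (lo, hi) in dom.items() if a <= lo and hi <= b}
            if len(inside) > b - a + 1:
                raise Inconsistent
            if len(inside) < b - a + 1:
                continue  # [a, b] is not a Hall interval
            for v, (lo, hi) in list(dom.items()):
                if v in inside:
                    continue
                # Variables outside the Hall interval may not use its cycles.
                new_lo = b + 1 if a <= lo <= b else lo
                new_hi = a - 1 if a <= hi <= b else hi
                if (new_lo, new_hi) != (lo, hi):
                    dom[v] = (new_lo, new_hi)
                    changed = True
    return changed


def bounds_consistency(m, edges=EDGES):
    """Propagate to a fixpoint; return the domains, or None if length m is infeasible."""
    dom = {v: (1, m) for v in "ABCDE"}
    while True:
        try:
            changed = propagate_latency(dom, edges)
            changed = propagate_alldiff(dom) or changed
        except Inconsistent:
            return None
        if any(lo > hi for lo, hi in dom.values()):
            return None
        if not changed:
            return dom


print(bounds_consistency(6))
```

With m = 6 the fixpoint is A [1, 2], B [1, 2], C [3, 3], D [4, 5], E [6, 6], the final domains listed above. With m = 5 the sketch reports inconsistency.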
Improvements to constraint model
• 1. Distance constraints
  • constraints over nodes which define regions
• 2. Predecessor and successor constraints
  • constraints over nodes with multiple predecessors or multiple successors
• 3. Safe pruning constraint
  • global constraint
• 4. Dominance constraints
  • constraints based on graph isomorphism
Distance constraints: Regions
A pair of nodes i, j defines a region in a DAG G if:
(i) there is more than one path from i to j, and
(ii) no node k distinct from i and j lies on every path from i to j.
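The region definition can be tested directly on small graphs. The sketch below enumerates all paths, which is exponential in the worst case and meant only as an illustration. The successor-map representation, the function names and the toy graph are assumptions, not taken from the slides.

```python
def all_paths(succ, i, j):
    """Enumerate all paths from i to j in a DAG given as a successor map."""
    if i == j:
        return [[j]]
    return [[i] + rest for n in succ.get(i, []) for rest in all_paths(succ, n, j)]


def defines_region(succ, i, j):
    """True if (i, j) satisfies conditions (i) and (ii) of the region definition."""
    paths = all_paths(succ, i, j)
    if len(paths) <= 1:
        return False  # condition (i) fails: at most one path
    interiors = [set(p[1:-1]) for p in paths]
    # Condition (ii): no interior node is shared by every path.
    return not set.intersection(*interiors)


# Hypothetical toy DAG: a diamond s -> {a, b} -> t followed by t -> u.
succ = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": ["u"]}
print(defines_region(succ, "s", "t"))
print(defines_region(succ, "s", "u"))
```

In the toy graph, (s, t) defines a region, while (s, u) does not because both paths from s to u pass through t.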
Distance constraints: Estimate
[Figure: DAG with nodes A through H and edge latencies of 1 and 3. Successive frames place the nodes between a pair of nodes into slots j, j+1, …, j+9 to estimate their distance; the frames for the pairs A, F; E, H; and A, H are labelled 5, 5 and 9.]
Distance constraints: Optimal
[Figure: the same DAG annotated with domains A [1,1]; B, C [2,3]; D, E [5,6]; F, G [6,7]; H [10,10].]
• Estimate: H ≥ A + 9
• Not optimal: with A = 1 and H = 10, propagating the latency constraints and then the all-diff constraint is inconsistent
• Optimal: H ≥ A + 10
Improvements to constraint model
• 1. Distance constraints
  • constraints over nodes which define regions
• 2. Predecessor and successor constraints
  • constraints over nodes with multiple predecessors or multiple successors
• 3. Safe pruning constraint
  • global constraint
• 4. Dominance constraints
  • constraints based on graph isomorphism
Predecessor constraints
[Figure: DAG with nodes A through H, edge latencies and variable domains, including A [4, ] and H [ ,14]. Successive frames place predecessors into slots and tighten lower bounds; the derived domains shown are [9,12] and then [12,14].]
Successor constraints
[Figure: the same DAG and domains. Placing successors into slots tightens an upper bound; the derived domain shown is [4,6].]
Constraint programming methodology
• Model problem
  • specify in terms of constraints on acceptable solutions
  • define/choose constraint model: variables, domains, constraints
• Solve model
  • define/choose search algorithm
  • define/choose heuristics
Solving instances of the model
• Use constraints to establish:
  • lower bound on length m of optimal schedule
  • min and max of domains of variables
• Backtracking search
  • branches on min(x), min(x)+1, …
  • interleave with bounds consistency constraint propagation
  • fallback: singleton consistency on bounds
• If no solution found, increment m and repeat search
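The overall loop (start from a lower bound on m, search, and increment m on failure) can be sketched as follows. This is a simplified illustration on the example DAG. It branches on the earliest feasible cycle first and checks latencies and issue width directly, instead of interleaving full bounds consistency propagation or singleton consistency. All names are hypothetical.

```python
from collections import Counter

# (i, j, lat) encodes the latency constraint j >= i + lat.
EDGES = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]
ORDER = ["A", "B", "C", "D", "E"]  # a topological order of the DAG


def lower_bounds():
    """Earliest cycle of each instruction from latencies alone (critical path)."""
    lo = {v: 1 for v in ORDER}
    for v in ORDER:
        for i, j, lat in EDGES:
            if i == v:
                lo[j] = max(lo[j], lo[i] + lat)
    return lo


def search(m, width, lo, assign=None):
    """Backtracking search for a schedule that uses cycles 1..m."""
    assign = {} if assign is None else assign
    if len(assign) == len(ORDER):
        return dict(assign)
    x = ORDER[len(assign)]
    # Earliest cycle allowed by the already scheduled predecessors of x.
    start = max([lo[x]] + [assign[i] + lat for i, j, lat in EDGES if j == x])
    load = Counter(assign.values())
    for cycle in range(start, m + 1):  # branch on min(x), min(x)+1, ...
        if load[cycle] < width:
            assign[x] = cycle
            result = search(m, width, lo, assign)
            if result is not None:
                return result
            del assign[x]
    return None


def schedule(width, max_m=50):
    """Try m = lower bound, lower bound + 1, ... until a schedule is found."""
    lo = lower_bounds()
    m = max(lo.values())
    while m <= max_m:
        result = search(m, width, lo)
        if result is not None:
            return m, result
        m += 1
    return None


print(schedule(width=1))
print(schedule(width=2))
```

For issue width 1 the first attempt with m = 5 fails and the search succeeds with m = 6; for issue width 2 a schedule of length 5 is found immediately.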
Solving instances of the model
[Figure: search on the example DAG for A through E. With m = 5 every domain starts as [1,5] and the domains become empty, so no schedule of length 5 exists. With m = 6 every domain starts as [1,6] and the domains become A [1,2], B [1,2], C [3,3], D [5,5], E [6,6].]
Improvements to constraint solver
• Design special purpose constraint propagators
  • commonly occurring constraints
  • significantly improve efficiency
• Improved algorithms for bounds consistency
  • all-diff constraint
  • gcc constraint
Comparing all-diff propagators (prototype)
Time (sec.) to solve instruction scheduling problems; model includes latency, distance, and all-diff constraints.
DC: Régin, 1994; MT: Mehlhorn & Thiel, 2000; BC: IJCAI-2003
Comparing gcc propagators (prototype)
Time (sec.) to solve instruction scheduling problems; model includes latency and gcc constraints; width is 2.
DC: Régin, 1996; vH: van Hentenryck et al., 1992; BC: CP-2003
Putting it all together: Experimental results
SPEC 2000 & MediaBench Benchmarks
Total of 352,111 basic blocks of size 3 or greater
Improved = improved schedule over heuristic scheduler
Timed out = not solved within 10 minutes
Putting it all together: Experimental results
SPEC 2000 & MediaBench Benchmarks
For basic blocks with improved schedules
Conclusions
• CP approach to instruction scheduling
• Single-issue processors
  • 20 times faster than previous best optimal approach
• Multiple-issue processors
  • larger and more difficult problems
  • 50-fold reduction in number of problems that cannot be solved
• Constraint propagators
  • faster all-diff and gcc constraint propagators
  • useful in many problems
Current and future work: Expand scope of problem
• Instruction selection
• Instruction scheduling
  • basic-block instruction scheduling
  • super-block scheduling
  • software pipelining & loop unrolling
• Register allocation
• Memory hierarchy optimizations