Optimal Superblock Scheduling Using Enumeration Ghassan Shobaki, CS Dept. Kent Wilken, ECE Dept. University of California, Davis www.ece.ucdavis.edu/aco/
Outline • Background • Existing Solutions • Optimal Solution • Experimental Results • Summary and Future Work
Overview • “Instruction Scheduling is the most fundamental ILP-oriented phase”. [Josh Fisher et al., “Embedded Computing”] • Scheduler tries to find an instruction order that minimizes pipeline stalls • Schedule must preserve program’s semantics and honor hardware constraints
Elements of Instruction Scheduling • Region Formation • Schedule Construction (the focus of our research)
Region Formation • Scheduler’s scope is a sub-graph of the program’s control flow graph (CFG) • Local scheduling: single basic block • Global scheduling: multiple basic blocks: • Trace • Superblock and hyperblock • Treegion • General acyclic: e.g. Wavefront (2000)
Schedule Construction • NP-Hard problem for realistic machines • Heuristic Solutions: Virtually all production compilers and most research • Optimal Approaches: Recent research • Local: Integer Programming and enumeration • Global: Integer Programming
The Superblock • Single-entry multiple-exit sequence of basic blocks • Data and control dependencies and allowed code motions are represented by a Directed Acyclic Graph (DAG)
Example Superblock DAG [Figure: DAG over instructions A through I with latency-labeled edges. Exit probabilities: 0.3 for C, 0.2 for F, and 0.5 for the final exit I.]
List Scheduling • Most common method in practice • Approximate greedy algorithm that runs fast in practice • Data-ready instructions stored in a priority list • Priorities assigned according to heuristics • If ready list is not empty, schedule top priority instruction • Else schedule a stall • Advance to next issue slot
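The loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration for a single-issue machine, not the scheduler used in the talk: the function name `list_schedule`, the DAG encoding (a dict mapping each instruction to its `(successor, latency)` pairs), and the four-instruction example DAG are all assumptions made for the sketch.

```python
def list_schedule(succs, priority):
    """Single-issue list scheduling (illustrative sketch).

    succs:    {instr: [(successor, latency), ...]}; every instruction is a key.
    priority: {instr: number}; larger means schedule earlier.
    Returns one entry per cycle: an instruction name, or None for a stall.
    """
    # Count unscheduled predecessors of every instruction.
    preds_left = {i: 0 for i in succs}
    for edges in succs.values():
        for s, _ in edges:
            preds_left[s] += 1

    ready_time = {i: 0 for i in succs}  # earliest cycle allowed by latencies
    candidates = [i for i in succs if preds_left[i] == 0]
    schedule = []
    cycle = 0
    remaining = len(succs)

    while remaining:
        # Data-ready: all predecessors scheduled and their latencies elapsed.
        ready = [i for i in candidates if ready_time[i] <= cycle]
        if ready:
            pick = max(ready, key=lambda i: priority[i])  # top priority
            candidates.remove(pick)
            schedule.append(pick)
            remaining -= 1
            for s, lat in succs[pick]:
                ready_time[s] = max(ready_time[s], cycle + lat)
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    candidates.append(s)
        else:
            schedule.append(None)  # ready list empty: schedule a stall
        cycle += 1  # advance to the next issue slot
    return schedule


# Hypothetical DAG (not the one from the slides): X feeds Y and Z, both feed W.
dag = {"X": [("Y", 1), ("Z", 3)], "Y": [("W", 1)], "Z": [("W", 1)], "W": []}
# Priorities here are critical-path distances to the end of the DAG.
prio = {"X": 4, "Y": 1, "Z": 1, "W": 0}
print(list_schedule(dag, prio))
```

On this hypothetical DAG the scheduler has to insert one stall while waiting for the latency-3 edge from X to Z, which is the kind of slot an optimal scheduler tries to fill or prove unavoidable.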
Critical-Path Heuristic [Figure: the example DAG with each instruction annotated with its critical-path priority.] Resulting schedule (cycle: instruction): 0: A, 1: B, 2: G, 3: C, 4: D, 5: H, 6: E, 7: F, 8: I
Superblock Heuristics • Critical Path • Successive Retirement • Dependence height and speculative yield (DHASY) • G* • Speculative Hedge • Balance Scheduling
Optimal Scheduling • Can improve on heuristic schedules • Accurate heuristic methods are already complex • In some applications, longer compile times can be tolerated • Serves as a reference for evaluating the accuracy of heuristics and studying ILP limits
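Objective • Find a schedule with minimum cost, the probability-weighted sum of exit delays: $\mathrm{Cost}(S) = \sum_i P_i D_i$, taken over the $E$ side exits and the final exit • $S$: a given schedule • $P_i$: probability of exit $i$ • $D_i$: delay of exit $i$ from its lower bound $L_i$ • $E$: number of side exits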
Cost Function Example: CP [Figure: the example DAG with each instruction annotated with a bracketed cycle range; the exits carry C [2,3], F [6,7], and I [8,8].] CP schedule (cycle: instruction): 0: A, 1: B, 2: G, 3: C, 4: D, 5: H, 6: E, 7: F, 8: I. Cost = 0.3*1 + 0.2*1 + 0.5*0 = 0.5
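The cost computation in this example is simple enough to check mechanically. The sketch below assumes the exits are C, F, and I with the probabilities shown on the slide, and takes their lower bounds (2, 6, and 8) from the low end of the bracketed ranges in the figure. The function name `schedule_cost` and the tuple layout are illustrative choices, not part of the original work.

```python
def schedule_cost(exits, schedule):
    """Probability-weighted sum of exit delays for a single-issue schedule.

    exits:    list of (name, probability, lower_bound_cycle)
    schedule: list of instruction names, one per cycle
    """
    cycle_of = {instr: cycle for cycle, instr in enumerate(schedule)}
    return sum(p * (cycle_of[name] - lb) for name, p, lb in exits)


# Exit data as read from the slide (lower bounds taken from the figure's ranges).
exits = [("C", 0.3, 2), ("F", 0.2, 6), ("I", 0.5, 8)]

cp_schedule = ["A", "B", "G", "C", "D", "H", "E", "F", "I"]
print(round(schedule_cost(exits, cp_schedule), 2))  # 0.5
```

Any other ordering of the same nine instructions can be scored the same way, which is how a candidate schedule is compared against the heuristic's cost of 0.5.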
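Optimal Algorithm [Flowchart] • Compute lower bounds • Find a heuristic solution • If Cost = 0: done • Otherwise: fix branches, then enumerate • If feasible: done • If not feasible: fix branches again and re-enumerate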
Enumeration • List scheduling with backtracking • Explores one target length at a time • A subset of instructions can be fixed • Branch-and-Bound approach with four feasibility tests (pruning techniques) • Node superiority • LB tightening • History-based domination • Relaxed Scheduling
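A stripped-down version of "list scheduling with backtracking" for one target length might look like the following. It is only a sketch under stated assumptions: it targets a single-issue machine, uses the same hypothetical DAG encoding as a plain dict of `(successor, latency)` lists, and replaces the four feasibility tests named on the slide with one trivial pruning rule (give up when fewer slots remain than unscheduled instructions). The name `enumerate_schedule` is made up for the example.

```python
def enumerate_schedule(succs, target_length):
    """Search for a single-issue schedule of at most target_length cycles.

    succs: {instr: [(successor, latency), ...]}; every instruction is a key.
    Returns a list with one entry per cycle (None = stall), or None if no
    schedule fits in target_length cycles.
    """
    preds = {i: [] for i in succs}
    for i, edges in succs.items():
        for s, lat in edges:
            preds[s].append((i, lat))

    n = len(succs)
    issue = {}   # instruction -> cycle it was issued in
    slots = []   # the partial schedule built so far

    def is_ready(i, cycle):
        return all(p in issue and issue[p] + lat <= cycle
                   for p, lat in preds[i])

    def search(cycle):
        if len(issue) == n:
            return True
        # Simple pruning: not enough slots left for the remaining instructions.
        if n - len(issue) > target_length - cycle:
            return False
        for i in succs:
            if i not in issue and is_ready(i, cycle):
                issue[i] = cycle
                slots.append(i)
                if search(cycle + 1):
                    return True
                del issue[i]       # backtrack
                slots.pop()
        slots.append(None)         # as a last resort, try a stall here
        if search(cycle + 1):
            return True
        slots.pop()
        return False

    return list(slots) if search(0) else None


# Hypothetical DAG (not from the slides).
dag = {"X": [("Y", 1), ("Z", 3)], "Y": [("W", 1)], "Z": [("W", 1)], "W": []}
print(enumerate_schedule(dag, 4))  # no 4-cycle schedule exists
print(enumerate_schedule(dag, 5))
```

Calling the function with increasing target lengths mirrors the "one target length at a time" strategy: the first length for which a schedule is found is the shortest feasible one for this toy model.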
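Enumeration Example [Figure: a DAG over instructions I1 through I5 with latency-2 edges, beside an enumeration tree for target length = 4. One path reaches a stall and is infeasible, so the enumerator backtracks.]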
Branch Combinations & Subset Sum • The Branch Combination Problem is NP-Complete! • Can be reduced to Subset Sum • In practice, the number of branches and the sizes of their ranges are small • Solved efficiently using Dynamic Programming
Complete Example [Figure: the example DAG with bracketed cycle ranges, as in the cost function example.] • Start with the CP heuristic: Cost = 0.5 • Only length 8 is interesting • Branch combinations, with the resulting cycles of exits C and F:
Branch Comb   C   F   Cost
(0, 0)        2   6   0.0
(0, 1)        2   7   0.2
(1, 0)        3   6   0.3
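The table of branch combinations can be reproduced by brute force for this small example. The sketch below enumerates every combination of side-exit delays, keeps those cheaper than the heuristic's cost, and sorts them by cost. It assumes each pair is (delay of C, delay of F), that each side exit can be delayed by at most one cycle (from the ranges C [2,3] and F [6,7]), and it uses plain enumeration rather than the dynamic-programming method the slides mention. The name `branch_combinations` is hypothetical.

```python
from itertools import product


def branch_combinations(side_exits, upper_bound):
    """List side-exit delay combinations cheaper than upper_bound, by cost.

    side_exits: list of (name, probability, lower_bound_cycle, max_delay)
    Returns tuples (cost, delays, cycles), sorted by increasing cost.
    """
    combos = []
    delay_ranges = [range(max_delay + 1) for _, _, _, max_delay in side_exits]
    for delays in product(*delay_ranges):
        cost = sum(p * d for (_, p, _, _), d in zip(side_exits, delays))
        cycles = tuple(lb + d for (_, _, lb, _), d in zip(side_exits, delays))
        # Small tolerance so a combination equal to the bound is excluded.
        if cost < upper_bound - 1e-9:
            combos.append((cost, delays, cycles))
    combos.sort()
    return combos


# Side exits as read from the slide; the heuristic cost 0.5 is the bound.
side_exits = [("C", 0.3, 2, 1), ("F", 0.2, 6, 1)]
for cost, delays, cycles in branch_combinations(side_exits, 0.5):
    print(delays, cycles, round(cost, 2))
```

Trying the combinations in this order means the first one that the enumerator finds feasible is also the cheapest feasible one.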
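Branch Combination (0,0), Cost = 0.0: Infeasible [Figure: the example DAG with cycle ranges tightened for C fixed at [2,2] and F fixed at [6,6], beside a relaxed schedule.] Relaxed schedule (cycle: instruction): 0: A, 1: B, 2: C, 3: D, 4: G, 5: E and H. Both E and H land in cycle 5 (marked X on the slide), so this combination is infeasible.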
Branch Combination (0,1), Cost = 0.2 [Figure: the example DAG with cycle ranges tightened for C fixed at [2,2] and F fixed at [7,7], beside the enumeration steps.] Optimal schedule: A, B, C, G, D, H, E, F, I with cost 0.2
Experimental Results • Superblocks imported from GCC using SPEC CPU2000, FP and INT • Scheduled for 4 machine models: • single-issue • dual-issue • quad-issue • six-issue • Time limit set to 1 second per problem
Summary & Future Work • An optimal superblock scheduling technique has been developed • About 99% of hard problems solved within 1 sec • 80% improved • Next Goal: explore other global regions. Trace is strongest candidate