
Optimal Superblock Scheduling Using Enumeration

Ghassan Shobaki, CS Dept.; Kent Wilken, ECE Dept.; University of California, Davis (www.ece.ucdavis.edu/aco/). Outline: Background, Existing Solutions, Optimal Solution, Experimental Results, Summary and Future Work.


Presentation Transcript


  1. Optimal Superblock Scheduling Using Enumeration Ghassan Shobaki, CS Dept. Kent Wilken, ECE Dept. University of California, Davis www.ece.ucdavis.edu/aco/

  2. Outline • Background • Existing Solutions • Optimal Solution • Experimental Results • Summary and Future Work

  3. Overview • “Instruction Scheduling is the most fundamental ILP-oriented phase” [Josh Fisher et al., “Embedded Computing”] • The scheduler tries to find an instruction order that minimizes pipeline stalls • The schedule must preserve the program’s semantics and honor hardware constraints

  4. Elements of Instruction Scheduling • Region Formation • Schedule Construction (the focus of our research)

  5. Region Formation • Scheduler’s scope is a sub-graph of the program’s control flow graph (CFG) • Local scheduling: single basic block • Global scheduling: multiple basic blocks (trace; superblock and hyperblock; treegion; general acyclic, e.g. Wavefront (2000))

  6. Schedule Construction • NP-Hard problem for realistic machines • Heuristic solutions: virtually all production compilers and most research • Optimal approaches: recent research (local: Integer Programming and enumeration; global: Integer Programming)

  7. The Superblock • Single-entry multiple-exit sequence of basic blocks • Data and control dependencies and allowed code motions are represented by a Directed Acyclic Graph (DAG)

  8. Example Superblock DAG • [Figure: DAG over instructions A through I with latency-labeled edges; side exits C (probability 0.3) and F (probability 0.2), final exit I (probability 0.5)]

  9. List Scheduling • Most common method in practice • Approximate greedy algorithm that runs fast in practice • Data-ready instructions stored in a priority list • Priorities assigned according to heuristics • If ready list is not empty, schedule top priority instruction • Else schedule a stall • Advance to next issue slot
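The list-scheduling loop described on this slide can be sketched as follows. This is a minimal illustration, assuming a single-issue machine and a critical-path priority; the DAG, the function names, and the tie-breaking rule (alphabetical) are hypothetical and are not taken from the slides.

```python
def critical_path_priorities(succs):
    """Priority of a node = longest latency-weighted path from it to any leaf."""
    memo = {}

    def height(node):
        if node not in memo:
            memo[node] = max(
                (lat + height(s) for s, lat in succs[node]), default=0
            )
        return memo[node]

    return {node: height(node) for node in succs}


def list_schedule(succs, priority):
    """Single-issue list scheduling. Returns one entry per cycle (None = stall)."""
    preds_left = {n: 0 for n in succs}
    for edges in succs.values():
        for s, _lat in edges:
            preds_left[s] += 1
    earliest = {n: 0 for n in succs}  # first cycle at which operands are ready
    ready = [n for n in succs if preds_left[n] == 0]
    schedule = []
    done = 0
    while done < len(succs):
        cycle = len(schedule)
        candidates = [n for n in ready if earliest[n] <= cycle]
        if not candidates:
            schedule.append(None)  # ready list is empty: schedule a stall
            continue
        # Top priority first; ties broken alphabetically (an assumption).
        pick = min(candidates, key=lambda n: (-priority[n], n))
        ready.remove(pick)
        schedule.append(pick)
        done += 1
        for s, lat in succs[pick]:
            earliest[s] = max(earliest[s], cycle + lat)
            preds_left[s] -= 1
            if preds_left[s] == 0:
                ready.append(s)
    return schedule


# Hypothetical DAG (not the one on the slides): node -> [(successor, latency)]
example_dag = {"A": [("B", 1), ("C", 3)], "B": [("D", 1)], "C": [("D", 1)], "D": []}
example_schedule = list_schedule(example_dag, critical_path_priorities(example_dag))
print(example_schedule)  # ['A', 'B', None, 'C', 'D']
```

The stall in cycle 2 appears because C must wait three cycles after A, and no other instruction is data-ready at that point.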

  10. Critical-Path Heuristic • [Figure: example DAG annotated with critical-path priorities: A 5, B 4, G 4, C 3, H 3, D 3, E 1, F 0, I 0] • Resulting schedule (cycle: instruction): 0: A, 1: B, 2: G, 3: C, 4: D, 5: H, 6: E, 7: F, 8: I

  11. Superblock Heuristics • Critical Path • Successive Retirement • Dependence height and speculative yield (DHASY) • G* • Speculative Hedge • Balance Scheduling

  12. Optimal Scheduling • Can improve on heuristic schedules • Accurate heuristic methods are already complex • In some applications, longer compile times can be tolerated • Serves as a reference for evaluating the accuracy of heuristics and studying ILP limits

  13. Objective • Find a schedule S with minimum cost, where Cost(S) = sum over exits i of P_i × D_i • S: a given schedule • P_i: probability of exit i • D_i: delay of exit i from its lower bound L_i • E: number of side exits

  14. Cost Function Example: CP • [Figure: example DAG annotated with [lower bound, upper bound] issue-cycle ranges: A [0,0], B [1,2], G [1,4], C [2,3], H [2,5], D [3,4], E [3,6], F [6,7], I [8,8]] • CP schedule (cycle: instruction): 0: A, 1: B, 2: G, 3: C, 4: D, 5: H, 6: E, 7: F, 8: I • Cost = 0.3*1 + 0.2*1 + 0.5*0 = 0.5
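The cost arithmetic in this example can be checked with a few lines of code. This is a small sketch that assumes the cost is the probability-weighted sum of exit delays, as the slide's arithmetic suggests; the function name `superblock_cost` and the dictionary layout are illustrative choices, while the exit probabilities, lower bounds, and issue cycles are the ones shown in the example.

```python
def superblock_cost(issue_cycle, lower_bound, exit_prob):
    """Sum over exits of probability * (issue cycle - lower bound)."""
    return sum(
        prob * (issue_cycle[exit_] - lower_bound[exit_])
        for exit_, prob in exit_prob.items()
    )


# Values read from the example: exits C, F, I with their probabilities
# and lower bounds on their issue cycles.
exit_prob = {"C": 0.3, "F": 0.2, "I": 0.5}
lower_bound = {"C": 2, "F": 6, "I": 8}

# Issue cycles of the exits in the critical-path (CP) schedule.
cp_cycles = {"C": 3, "F": 7, "I": 8}
cp_cost = superblock_cost(cp_cycles, lower_bound, exit_prob)
print(round(cp_cost, 10))  # 0.5
```

A schedule that issues every exit at its lower bound would have cost 0, which is why a zero-cost heuristic schedule needs no further search.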

  15. Optimal Algorithm • [Flowchart: Lower Bounds → Heuristic Solution → Cost = 0? YES: Done; NO: Fix Branches → Enumerate → Feasible? YES: Done; NO: back to Fix Branches]

  16. Enumeration • List scheduling with backtracking • Explores one target length at a time • A subset of instructions can be fixed • Branch-and-Bound approach with four feasibility tests (pruning techniques): node superiority; LB tightening; history-based domination; relaxed scheduling
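The idea of list scheduling with backtracking for one target length can be sketched as below. This is only an illustration under stated assumptions: a single-issue machine, a hypothetical DAG, and a single simple pruning rule (not enough issue slots left) standing in for the four feasibility tests named on the slide, which are not implemented here.

```python
def enumerate_schedule(succs, target_length):
    """Search for a schedule that fits in target_length cycles.

    Returns a list with one entry per cycle (None = stall), or None if no
    schedule of that length exists.
    """
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s, lat in succs[n]:
            preds[s].append((n, lat))

    issue = {}   # instruction -> cycle at which it was issued
    slots = []   # partial schedule, one entry per cycle

    def search(cycle):
        if len(issue) == len(nodes):
            return True
        # Simple pruning stand-in: more instructions left than slots left.
        if len(nodes) - len(issue) > target_length - cycle:
            return False
        ready = [
            n for n in nodes
            if n not in issue
            and all(p in issue and issue[p] + lat <= cycle for p, lat in preds[n])
        ]
        for choice in ready + [None]:  # try each ready instruction, then a stall
            slots.append(choice)
            if choice is not None:
                issue[choice] = cycle
            if search(cycle + 1):
                return True
            slots.pop()                # infeasible: backtrack
            if choice is not None:
                del issue[choice]
        return False

    return list(slots) if search(0) else None


# Hypothetical DAG (not the one on the slides): node -> [(successor, latency)]
example_dag = {"A": [("B", 1), ("C", 3)], "B": [("D", 1)], "C": [("D", 1)], "D": []}
print(enumerate_schedule(example_dag, 4))  # None
print(enumerate_schedule(example_dag, 5))  # ['A', 'B', None, 'C', 'D']
```

Calling the function with increasing target lengths mirrors the "one target length at a time" strategy: the first length for which a schedule is returned is the shortest feasible one for this simplified model.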

  17. Enumeration Example • [Figure: DAG over instructions I1 to I5 with latency-2 edges, and the enumeration tree for target length = 4; one path needs a stall, is found infeasible, and the enumerator backtracks]

  18. Branch Combinations & Subset Sum • Branch Combination Problem is NP-Complete! • Can be reduced to Subset Sum • In practice, the number of branches is small and their ranges are narrow • Solved efficiently using Dynamic Programming
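A Subset-Sum-style dynamic program over branch delays might look like the sketch below. This is not necessarily the authors' formulation: it assumes integer weights (exit probabilities scaled by 10, so 0.3 becomes 3) and a maximum delay per branch, and it computes which total costs some branch combination can produce. The name `reachable_costs` is hypothetical.

```python
def reachable_costs(weights, max_delays):
    """Return the sorted total costs reachable by choosing, for each branch,
    a delay between 0 and its maximum, where cost = sum(weight * delay).

    Boolean table over cost values, filled one branch at a time, in the
    style of the classic Subset Sum dynamic program. Weights are assumed
    to be non-negative integers.
    """
    total = sum(w * r for w, r in zip(weights, max_delays))
    reachable = [True] + [False] * total
    for w, r in zip(weights, max_delays):
        updated = reachable[:]
        for cost in range(total + 1):
            if reachable[cost]:
                for delay in range(1, r + 1):
                    updated[cost + w * delay] = True
        reachable = updated
    return [cost for cost, ok in enumerate(reachable) if ok]


# Two branches with scaled weights 3 and 2, each delayed by at most 1 cycle.
print(reachable_costs([3, 2], [1, 1]))  # [0, 2, 3, 5]
```

The table has one entry per possible cost value, so the running time grows with the number of branches and the size of their ranges, both of which the slide describes as small in practice.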

  19. Complete Example • [Figure: example DAG annotated with issue-cycle ranges: A [0,0], B [1,2], G [1,4], C [2,3], H [2,5], D [3,4], E [3,6], F [6,7], I [8,8]] • Start with CP heuristic: Cost = 0.5 • Only length 8 is interesting • Branch combinations (delay of C, delay of F): (0, 0): C = 2, F = 6, Cost 0.0; (0, 1): C = 2, F = 7, Cost 0.2; (1, 0): C = 3, F = 6, Cost 0.3
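The branch-combination table in this example can be regenerated by brute force, which is practical when there are only a few branches with narrow ranges. The sketch below assumes that only combinations cheaper than the heuristic cost are worth trying; the function name and tuple layout are illustrative, and the ranges and probabilities are those of side exits C and F in the example.

```python
from itertools import product


def branch_combinations(branches, best_cost):
    """List delay combinations whose cost beats best_cost, cheapest first.

    branches: list of (name, lower_bound, upper_bound, probability).
    Returns tuples (delays, issue_cycles, cost).
    """
    rows = []
    delay_ranges = [range(ub - lb + 1) for _name, lb, ub, _p in branches]
    for delays in product(*delay_ranges):
        cost = sum(p * d for (_n, _lb, _ub, p), d in zip(branches, delays))
        if cost < best_cost - 1e-9:  # keep only improvements on the heuristic
            cycles = tuple(lb + d for (_n, lb, _ub, _p), d in zip(branches, delays))
            rows.append((delays, cycles, round(cost, 10)))
    rows.sort(key=lambda row: row[2])
    return rows


# Side exits from the example: C in [2,3] with probability 0.3,
# F in [6,7] with probability 0.2. Heuristic (CP) cost is 0.5.
side_exits = [("C", 2, 3, 0.3), ("F", 6, 7, 0.2)]
table = branch_combinations(side_exits, best_cost=0.5)
for delays, cycles, cost in table:
    print(delays, cycles, cost)
```

Sorting by cost means the first combination for which the enumerator finds a feasible schedule is optimal, so the search can stop there.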

  20. Branch Combination (0,0), Cost = 0.0 • [Figure: example DAG with ranges tightened for C = 2 and F = 6: A [0,0], B [1,1], G [1,4], C [2,2], H [2,5], D [3,3], E [3,5], F [6,6], I [8,8]] • Relaxed schedule (cycle: instruction): 0: A, 1: B, 2: C, 3: D, 4: G, 5: E and H both compete for the slot • Infeasible

  21. Branch Combination (0,1), Cost = 0.2 • [Figure: example DAG with ranges tightened for C = 2 and F = 7: A [0,0], B [1,1], G [1,4], C [2,2], H [2,5], D [3,4], E [3,6], F [7,7], I [8,8], shown with the enumeration tree] • Optimal schedule: A, B, C, G, D, H, E, F, I with cost 0.2

  22. Experimental Results • Superblocks imported from GCC using SPEC CPU2000, FP and INT • Scheduled for 4 machine models: single-issue, dual-issue, quad-issue, and six-issue • Time limit set to 1 second per problem

  23. Superblock Statistics

  24. INT2000 Results

  25. Summary & Future Work • An optimal superblock scheduling technique has been developed • About 99% of hard problems solved within 1 sec • 80% improved • Next goal: explore other global regions; trace is the strongest candidate
