A Global Progressive Register Allocator
David Ryan Koes, Seth Copen Goldstein
Carnegie Mellon University
{dkoes,seth}@cs.cmu.edu

Register Allocation Problem
• unbounded number of program variables
• limited number of processor registers (eax, ebx, ecx, edx, esi, edi, esp, ebp) + slow memory
• example code:
    …
    v = 1
    w = v + 3
    x = w + v
    u = v
    t = u + x
    print(x); print(w); print(t); print(u);
    …
[Diagram: register allocator surrounded by spill code optimization, register preferences, rematerialization, live range splitting, memory operands]

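To make the pressure in the fragment above concrete, here is a minimal sketch of a backward liveness pass over it. It assumes the fragment is straight-line code and that nothing is live after the last print; the encoding of instructions as (definition, uses) pairs and the function names are illustrative, not taken from the talk.

```python
# Each instruction is (defined variable or None, list of used variables).
program = [
    ("v", []),           # v = 1
    ("w", ["v"]),        # w = v + 3
    ("x", ["w", "v"]),   # x = w + v
    ("u", ["v"]),        # u = v
    ("t", ["u", "x"]),   # t = u + x
    (None, ["x"]),       # print(x)
    (None, ["w"]),       # print(w)
    (None, ["t"]),       # print(t)
    (None, ["u"]),       # print(u)
]

def live_sets(program):
    """Return the set of variables live *after* each instruction."""
    live = set()          # assumption: nothing is live after the last instruction
    live_after = []
    for dest, uses in reversed(program):
        live_after.append(set(live))
        live.discard(dest)    # a definition ends liveness above this point
        live.update(uses)     # a use makes the variable live above this point
    live_after.reverse()
    return live_after

def max_pressure(program):
    """Largest number of simultaneously live variables."""
    return max(len(s) for s in live_sets(program))

print(max_pressure(program))
```

Under these assumptions four variables (u, t, w, x) are live at once after `t = u + x`, so a machine with fewer free registers has to spill or split something.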
A More Principled Register Allocator
• fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture
• optimal solutions
[Diagram labels: reg alloc, machine description]

Multi-commodity Network Flow: An Expressive Model
• Given network (directed graph) with
  • cost and capacity on each edge
  • sources & sinks for multiple commodities
• Find lowest cost flow of commodities
• NP-complete for integer flows
Example: edges have unit capacity
[Figure: example network with commodities a and b; edge costs 1 and 0]

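The problem statement above can be illustrated with a brute-force sketch on a tiny network. The network, its node names, and the function names are hypothetical (they are not the slide's figure), and the sketch assumes each commodity is one unit of flow routed along a single path. Exhaustive enumeration like this only works for toy instances.

```python
from itertools import product

# edge -> (cost, capacity); a made-up network where both commodities
# would like to use the free edge m -> n, which has room for only one.
EDGES = {
    ("sa", "m"): (0, 1),
    ("sb", "m"): (0, 1),
    ("m", "n"): (0, 1),
    ("sa", "n"): (1, 1),
    ("sb", "n"): (1, 1),
    ("n", "ta"): (0, 1),
    ("n", "tb"): (0, 1),
}

def simple_paths(src, dst, path=None):
    """Yield every simple path from src to dst as a list of edges."""
    path = path or [src]
    if src == dst:
        yield list(zip(path, path[1:]))
        return
    for (u, v) in EDGES:
        if u == src and v not in path:
            yield from simple_paths(v, dst, path + [v])

def min_cost_integer_flow(commodities):
    """Try every combination of paths; keep the cheapest that respects capacities."""
    names = list(commodities)
    options = [list(simple_paths(*commodities[n])) for n in names]
    best = None
    for choice in product(*options):
        used = {}
        for p in choice:
            for e in p:
                used[e] = used.get(e, 0) + 1
        if any(used[e] > EDGES[e][1] for e in used):
            continue  # some edge is over capacity
        cost = sum(EDGES[e][0] for p in choice for e in p)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, choice)))
    return best

cost, routes = min_cost_integer_flow({"a": ("sa", "ta"), "b": ("sb", "tb")})
print(cost, routes)
```

Each commodity alone could be routed for free, but the shared unit-capacity edge forces one of them onto a cost-1 edge, so the best joint cost here is 1.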
Register Allocation as a MCNF
  Register allocation                     ->  MCNF
  Variables                               ->  Commodities
  Variable Definition                     ->  Source
  Variable Last Use                       ->  Sink
  Allocation Classes (Reg/Mem/Const)      ->  Nodes
  Register Limits                         ->  Node Capacities
  Spill Costs                             ->  Edge Costs
  Allocation                              ->  Flow
Also need anti-variables to model persistent memory
[Figure: flow of variable a through nodes r0, r1, and mem, with register capacities of 1 and edge costs 1 and 3]

Example
Source Code:
    int example(int a, int b)
    {
      int d = 1;
      int c = a - b;
      return c+d;
    }
Pre-alloc Assembly:
    MOVE 1 -> d
    SUB a,b -> c
    ADD c,d -> c
    MOVE c -> r0
[Figure: MCNF for this code, with callouts for load cost, insn pref cost, and mem access cost]

Control Flow
• MCNF can only represent straight-line code
• need to link together networks from basic blocks
New nodes to handle block entry/exit constraints
[Figure: Split, Normal, and Merge nodes; edges labeled a: %eax and a: mem]

A More Principled Register Allocator
• fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
• optimal solutions: NP-hard, so use progressive solution technique
• Technique: Lagrangian relaxation directed allocators
[Diagram labels: reg alloc, machine description; plot axes: Allocation Quality, Compile Time]

Solution Procedure
• Compute Lagrangian prices using iterative subgradient optimization
  • guaranteed to converge to “optimal” prices
  • for linear relaxation of the problem
• Prices used by allocator to find solution
  • solution improves as prices converge
  • two allocators
    • iterative heuristic allocator
    • simultaneous heuristic allocator

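The price computation described above can be sketched on a deliberately tiny stand-in problem: several variables compete for one register, and each pays a fixed cost if it lives in memory instead. This is not the talk's global MCNF model. The 1/k step size, the sign convention for prices, and all names are assumptions chosen for illustration.

```python
def solve_with_prices(mem_cost, price):
    """Relaxed problem: each variable independently picks the cheaper of
    the register (paying the current price) or memory (paying its cost)."""
    choice = {v: ("reg" if price < c else "mem") for v, c in mem_cost.items()}
    value = sum(min(price, c) for c in mem_cost.values())
    return choice, value

def subgradient(mem_cost, capacity=1, iterations=50):
    """Adjust the register's price until demand matches capacity.
    Returns the final price and the best Lagrangian lower bound seen."""
    price, best_bound = 0.0, float("-inf")
    for k in range(1, iterations + 1):
        choice, value = solve_with_prices(mem_cost, price)
        # Lagrangian value: relaxed cost minus price times capacity.
        best_bound = max(best_bound, value - price * capacity)
        usage = sum(1 for c in choice.values() if c == "reg")
        # Subgradient step: raise the price when the register is over-subscribed,
        # lower it (never below zero) when it is under-used.
        price = max(0.0, price + (1.0 / k) * (usage - capacity))
    return price, best_bound

price, bound = subgradient({"a": 4, "b": 2})
print(price, bound)
```

With memory costs of 4 for a and 2 for b, the price settles between 2 and 4, at which point only a still wants the register, and the lower bound reaches 2, which equals the cost of the best integer solution (spill b).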
Solution Procedure
• Advantages
  • iterative nature makes the allocator progressive
  • Lagrangian relaxation theory provides means for computing a good lower bound
  • Can compute optimality bound
• Disadvantages
  • No guarantee of finding optimal solution
  • Optimality bound poor if integrality gap large
99% of the time integrality gap = 0

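One way to read "optimality bound": since the optimum can never be cheaper than the Lagrangian lower bound, the gap between a found solution and that bound limits how far the solution can be from optimal. The sketch below assumes a positive lower bound and a relative (fractional) measure; the function name and the numbers are made up for illustration.

```python
def optimality_bound(solution_cost, lower_bound):
    """Proven maximum fraction by which solution_cost can exceed the optimum.

    Because lower_bound <= optimum <= solution_cost, the true relative
    distance (solution - optimum) / optimum is at most
    (solution - lower_bound) / lower_bound when lower_bound > 0.
    """
    if lower_bound <= 0:
        raise ValueError("this sketch assumes a positive lower bound")
    return (solution_cost - lower_bound) / lower_bound

# Hypothetical numbers: a solution of cost 105 against a lower bound of 100
# is provably within 5% of optimal, even if the true optimum is unknown.
print(optimality_bound(105, 100))
```

A result of 0 proves the solution optimal; a large integrality gap keeps the lower bound below the true optimum, which is why the bound can be pessimistic in that case.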
Iterative Heuristic Allocator
• Edges to/from memory cost 3
• Allocation order: a, b, c, d
• Cost: a: 0, b: 4, c: 0, d: -2
• Total: 2

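The one-variable-at-a-time idea can be sketched as a shortest-path search per variable over a layered network, with earlier variables consuming capacity. This is a simplified stand-in, not the allocator from the talk: it has a single register, no Lagrangian prices, and a hypothetical `MEM_USE_COST` penalty for using a variable while it sits in memory. Only the move cost of 3 comes from the slide.

```python
INF = float("inf")
MOVE_COST = 3      # edges to/from memory cost 3 (from the slide)
MEM_USE_COST = 1   # hypothetical penalty for a use while in memory

def shortest_allocation(points, uses, reg_free):
    """Cheapest reg/mem class at each program point for one variable."""
    first = points[0]
    cost = {"reg": 0 if reg_free[first] else INF,
            "mem": MEM_USE_COST if first in uses else 0}
    back = []
    for t in points[1:]:
        new, prev = {}, {}
        for cls in ("reg", "mem"):
            if cls == "reg" and not reg_free[t]:
                new[cls], prev[cls] = INF, None   # register already taken here
                continue
            node = MEM_USE_COST if (cls == "mem" and t in uses) else 0
            step = lambda p: cost[p] + (0 if p == cls else MOVE_COST)
            best_prev = min(("reg", "mem"), key=step)
            new[cls], prev[cls] = step(best_prev) + node, best_prev
        back.append(prev)
        cost = new
    cls = min(cost, key=cost.get)
    total, path = cost[cls], [cls]
    for prev in reversed(back):       # walk back pointers to recover the path
        cls = prev[cls]
        path.append(cls)
    return total, path[::-1]

def iterative_allocate(variables, order, num_points):
    """Allocate variables in the given order; each claims the register
    at the points where its shortest path uses it."""
    reg_free = [True] * num_points
    result = {}
    for name in order:
        points, uses = variables[name]
        total, path = shortest_allocation(points, uses, reg_free)
        for t, cls in zip(points, path):
            if cls == "reg":
                reg_free[t] = False
        result[name] = (total, path)
    return result

# Hypothetical live ranges: name -> (program points, points with a use).
variables = {"a": ([0, 1, 2, 3], {1, 2, 3}), "b": ([1, 2], {2})}
print(iterative_allocate(variables, ["a", "b"], 4))
print(iterative_allocate(variables, ["b", "a"], 4))
```

In this toy instance the order a, b gives a total cost of 1 while b, a gives 3, which shows how sensitive a greedy order is when nothing steers early variables away from contended registers.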
Simultaneous Heuristic Allocator
• Edges to/from memory cost 3
[Figure: allocation network with two edges marked X; current cost values shown: -1, -3, -2]

Evaluation
• Implemented in gcc 3.4.3 targeting x86
• Optimize for code size
  • perfect static evaluation
  • important metric in its own right
• MediaBench, MiBench, Spec95, Spec2000
  • over 10,000 functions

Progressiveness
[Plot legend: default allocator: 1121; graph allocator: 1422; CPLEX]

Progressiveness
[Plot legend: graph allocator; default allocator; CPLEX]

Code Size
[Plot annotation: Progressive!]

Optimality
[Plot labels: Proven maximum distance from optimal; Proven optimality]

Compile Time Slowdown :-(
• 10x slower

A More Principled Register Allocator
• fully utilize machine description
• explicit and expressive model of costs of allocation for given architecture: Global MCNF
• optimal solutions: approach optimality using progressive solution technique: Lagrangian directed allocators
[Diagram labels: reg alloc, machine description]

Questions?

Accuracy of the Model
• Global MCNF model correctly predicts costs of register allocation within 2% for 71% of functions compiled

Compile Time
• Asymptotic Complexity
  • one iteration: O(nv)