Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982
Motivation • Before the register allocation phase, the compiler assumes there is an unlimited number of general-purpose registers • The symbolic registers must be mapped to real registers in a way that avoids conflicts • Symbolic registers that cannot be mapped to real registers must be spilled to memory • We need an algorithm that maps registers with minimal spilling cost
Paper Overview • Register allocation overview • Subsumption algorithm • Interference graph coloring algorithm • Spilling algorithm
Register Allocation Steps • Determine which registers are live at any point in the intermediate language (IL) program • Build a register interference graph • Nodes represent symbolic registers • Edges represent a conflict between symbolic registers • Subsumption: eliminate unnecessary register copies • Find a 32-coloring of the interference graph • Decide which registers to spill if necessary
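As a rough illustration of the first two steps, here is a minimal Python sketch of interference-graph construction, assuming liveness analysis has already produced a live-out set per IL instruction; `Instr`, `defs`, and `live_out` are illustrative names, not the paper's.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    defs: frozenset   # symbolic registers written by this IL instruction
    uses: frozenset   # symbolic registers read by it

def build_interference_graph(instrs, live_out):
    """Two symbolic registers interfere if one is defined at a point
    where the other is live.  Returns {register: set(neighbours)}."""
    adj = {}
    for instr, live in zip(instrs, live_out):
        for d in instr.defs:
            adj.setdefault(d, set())
            for other in live:
                if other != d:
                    adj[d].add(other)
                    adj.setdefault(other, set()).add(d)
    return adj
```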
Subsumption • If the source and destination of a register copy do not interfere, they may be coalesced into a single node • For each register copy in IL, determine whether the registers interfere • If not, coalesce the two nodes into one • After first pass, rewrite IL code • Repeat until no more coalescing is possible
Subsumption Example • (Diagram: separate nodes A, B, C, D before coalescing)
Subsumption Example • (Diagram: coalesced nodes AD and BC)
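The coalescing pass can be sketched as follows (illustrative Python, not the paper's implementation); `adj` is the interference graph as a dict of neighbour sets and `copies` lists the (dst, src) register copies found in the IL.

```python
def coalesce_copies(adj, copies):
    """Subsumption sketch: merge the destination and source of each
    register copy whenever they do not interfere."""
    alias = {}                       # merged register -> its representative

    def find(r):
        while r in alias:
            r = alias[r]
        return r

    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            a, b = find(dst), find(src)
            if a == b or b in adj.get(a, set()):
                continue             # already merged, or they interfere
            # Merge b into a: b's neighbours become a's neighbours.
            for n in adj.pop(b, set()):
                adj[n].discard(b)
                adj[n].add(a)
                adj.setdefault(a, set()).add(n)
            alias[b] = a
            changed = True
    return alias                     # used afterwards to rewrite the IL

# Example mirroring the diagram above: copies A=D and B=C; only A and B interfere.
adj = {"A": {"B"}, "B": {"A"}, "C": set(), "D": set()}
print(coalesce_copies(adj, [("A", "D"), ("B", "C")]))   # {'D': 'A', 'C': 'B'}
print(adj)                                              # {'A': {'B'}, 'B': {'A'}}
```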
Finding a 32-Coloring • Each symbolic register is assigned a color representing a real register • If no adjacent nodes have the same color, then the coloring succeeds • Assume that G has a node N with degree < 32 • Then G is 32-colorable iff the reduced graph from which N and all its edges have been omitted is 32-colorable • Algorithm throws away nodes of degree < 32 until all nodes have been removed • Algorithm fails if no node has degree < 32
3-coloring example • (Diagram: nodes A, B, C, D)
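A compact sketch of the reduction just described, with k = 32 for the 32 real registers (k = 3 reproduces the small example above); this is a generic rendering of the technique, not the paper's implementation.

```python
def color_graph(adj, k=32):
    """Chaitin-style reduction: repeatedly remove a node of degree < k and
    push it on a stack; then pop nodes and give each a color unused by its
    neighbours.  Returns {node: color}, or None if spilling is needed.
    `adj` is {node: set(neighbours)}, as in the earlier sketch."""
    work = {n: set(ns) for n, ns in adj.items()}   # don't destroy the input
    stack = []
    while work:
        node = next((n for n in work if len(work[n]) < k), None)
        if node is None:
            return None            # every node has degree >= k: must spill
        stack.append((node, work.pop(node)))
        for ns in work.values():
            ns.discard(node)
    coloring = {}
    while stack:
        node, neighbours = stack.pop()
        used = {coloring[n] for n in neighbours}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring

# 3-coloring example: the 4-cycle A-B-C-D-A is 2-colorable, so k=3 succeeds.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
print(color_graph(cycle, k=3))     # prints one valid 3-coloring of the cycle
```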
Spilling • If the 32-coloring fails, then nodes must be spilled to memory • Spilled registers are stored to memory and reloaded just before each use, so they are live only momentarily • Every time spill code is generated, the interference graph must be rebuilt • Recoloring usually succeeds after spilling, but sometimes several passes are required
Spilling • Finding an optimal spill set is NP-complete • Heuristic: spill the node that minimizes (cost of spilling) / (degree of node) • Cost of spilling: the sum, over all definition and use points of the register, of each point's estimated execution frequency • In some cases a spilled register can be reloaded once and kept for an extended interval instead of being reloaded at every use
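Under those assumptions, choosing a spill candidate might look like this sketch; the frequency lists are invented stand-ins for def/use points weighted by estimated execution frequency.

```python
def spill_cost(freqs):
    """Spill cost sketch: sum the estimated execution frequency of every
    definition and use point of the register."""
    return sum(freqs)

def choose_spill_node(adj, def_use_freqs):
    """Pick the node minimizing spill cost / degree in the interference graph."""
    return min(adj, key=lambda r: spill_cost(def_use_freqs[r]) / max(len(adj[r]), 1))

# Example: B is used often but has few neighbours; C is cheap and crowded.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
freqs = {"A": [10, 10], "B": [100], "C": [1, 1], "D": [5]}
print(choose_spill_node(adj, freqs))   # 'C': lowest cost per blocked neighbour
```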
Conclusion • The graph coloring and spilling algorithms should produce faster code • The register allocation algorithm is efficient • Graph coloring is O(N) • But uses O(N²) space
Compile-time Copy Elimination Peter Schnorf, Mahadevan Ganapathi, John Hennessy Stanford, 1993
Motivation • Single assignment languages simplify dependency checking • Which simplifies automatic detection and exploitation of parallelism • But single-assignment languages require a large number of copies • Previous implementations eliminate copies at runtime • Increased efficiency if copies can be eliminated at compile time
Paper Overview • Single-assignment languages • Code generation • Compile-time copy elimination techniques • Substitution • Pattern matching • Substructure sharing • Substructure targeting • Results – success! • Eliminated all copies in bubble sort
Single-assignment languages • Functional languages (LISP, Haskell, SISAL) • Simpler dependency checking • True dependencies – write, read • b = f(c), a = f(b) • Anti-dependencies – read, write • a = f(b), b = f(c) • Output dependencies – write, write • a = f(b), a = f(c) • Aliasing • caused by pointers, array indexes • To avoid aliasing, all inputs and outputs are passed by value
Example – Swap(A,i,j) • Data flow diagram: edges transport values, simple nodes are operations • Pick any feasible node evaluation order at random • Naïve implementation: each edge has its own memory, so Swap uses 5 array copies! • Optimized implementation: the array updates are done in place • (Diagram: Input array flowing through two AElement and two AReplace nodes)
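To make the copy overhead concrete, here is an illustrative Python rendering of the naïve and optimized translations; the function bodies are stand-ins for the generated C, and the exact copy count depends on how the edges are materialized.

```python
def swap_naive(A, i, j):
    """Naïve value semantics: every edge gets its own memory, so each
    AReplace (and the input/output boundary) copies the whole array."""
    a = list(A)                 # copy: array passed by value
    x, y = a[i], a[j]           # AElement, AElement (scalar reads)
    b = list(a); b[i] = y       # AReplace: fresh copy of the array
    c = list(b); c[j] = x       # AReplace: another fresh copy
    return list(c)              # copy on the output edge

def swap_optimized(A, i, j):
    """After substitution analysis both updates are done in place."""
    A[i], A[j] = A[j], A[i]
    return A
```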
Example: BubbleSort(A) • Compound nodes represent control flow • Loops are implemented using recursion to avoid multiple assignment of the iteration variable • Naïve implementation: bubble sort requires O(n²) array copies • Optimized implementation: all array updates are done in place, but parallelism is decreased
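A hypothetical sketch of the loop-as-recursion encoding, showing where a naïve translation copies the array on every update.

```python
def bubble_pass(A, i=0):
    """One bubble-sort pass with the loop encoded as recursion: the
    'iteration variable' i is bound once per call, never reassigned."""
    if i + 1 >= len(A):
        return A
    if A[i] > A[i + 1]:
        B = list(A)                         # naïve translation: AReplace copies A
        B[i], B[i + 1] = A[i + 1], A[i]
        return bubble_pass(B, i + 1)
    return bubble_pass(A, i + 1)

def bubble_sort_naive(A, n=None):
    """n-1 passes; on reversed input the copies add up to O(n²)."""
    n = len(A) if n is None else n
    return A if n <= 1 else bubble_sort_naive(bubble_pass(A), n - 1)

print(bubble_sort_naive([3, 1, 2]))   # [1, 2, 3]
```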
Code Generation Overview • Input is from the compiler front-end • IF1: intermediate data-flow graph representation • The code generator eliminates copies • Output is C code • Compiled into machine code using an optimizing C compiler
Vertical Substitution • If an input and an output of a node have the same type and size, they can share memory • Updates are done in place • (Diagram: Input array flowing through nodes 1 AElement, 2 AElement, 3 AReplace, 4 AReplace)
Horizontal Substitution • If an output has several destinations, the output edges can share memory • (Diagram: the same graph, nodes 1 AElement, 2 AElement, 3 AReplace, 4 AReplace)
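A very loose sketch of what vertical and horizontal substitution decide, using an invented `Edge` record rather than the paper's IF1 structures: each edge starts out owning its own memory, and substitution makes edges share slots.

```python
class Edge:
    """Illustrative data-flow edge: carries a value of some type and size."""
    def __init__(self, typ, size):
        self.typ, self.size = typ, size
        self.slot = self          # naïve scheme: every edge owns its own memory

def vertical_substitution(in_edge, out_edge):
    """If a node's array input and output agree in type and size, let the
    output edge reuse the input edge's memory (the update is done in place)."""
    if in_edge.typ == out_edge.typ and in_edge.size == out_edge.size:
        out_edge.slot = in_edge.slot

def horizontal_substitution(out_edges):
    """All destination edges carrying the same output value can share one slot."""
    for e in out_edges[1:]:
        e.slot = out_edges[0].slot
```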
Horizontal and Vertical Substitution • Horizontal and vertical substitution can interfere with each other • Interference occurs when a node along the substitution chain modifies the shared object before its last use • Edges can be marked read-only if they are shared and this is not their last use
Horizontal and Vertical Substitution • (Diagrams: the same Input graph of two AElement and two AReplace nodes shown under two different node evaluation orders)
Interprocedural Substitution • Previous discussion concerned simple nodes that can be analyzed at compiler design time • Information about a function is needed in order to use substitution • Does the function modify an input? • Will an input be chained to an output?
Intersubgraph Substitution • Substitution analysis is done for each construct • Same basic principles
Determining the Evaluation Order • Evaluation order can impact the efficiency of substitution • The naïve implementation selects the next node to evaluate at random • Hints tell the algorithm which nodes should be evaluated before and after other nodes if possible • Hints are ad hoc?
Pattern Matching • Replace hard-to-optimize pieces of code • Patterns are language-specific • Patterns are detected using “ad hoc” methods
Substructure Sharing • Allow substructures to be referenced without copies • AElement can be treated as a NoOp • Happens after substitution analysis – less important • Same principles as substitution analysis
Substructure Targeting • Allow structures to be built from substructures without copies • Similar to substructure sharing
Results • Compared the optimizations against the naïve implementation • The optimizations eliminate all copies for bubble sort • An informal comparison to a run-time optimizer shows improvements
Conclusions • Substitution, pattern matching, and substructure sharing can eliminate almost all unnecessary copies in a single-assignment language. • Copy elimination no longer has to be done at run time. • Single-assignment languages should be more efficient for parallel programs.