Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982
Motivation • Before the register allocation phase, the compiler assumes there is an unlimited number of general-purpose registers • The symbolic registers must be mapped to real registers in a way that avoids conflicts • Symbolic registers that cannot be mapped to real registers must be spilled to memory • We need an algorithm that maps registers with minimal spilling cost
Paper Overview • Register allocation overview • Subsumption algorithm • Interference graph coloring algorithm • Spilling algorithm
Register Allocation Steps • Determine which registers are live at any point in the intermediate language (IL) program • Build a register interference graph • Nodes represent symbolic registers • Edges represent a conflict between symbolic registers • Subsumption: eliminate unnecessary register copies • Find a 32-coloring of the interference graph • Decide which registers to spill if necessary
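As a rough illustration of the first two steps, here is a minimal Python sketch of interference-graph construction, assuming liveness analysis has already produced a live-out set per IL instruction; `Instr`, `defs`, and `live_out` are illustrative names, not the paper's.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    defs: frozenset   # symbolic registers written by this IL instruction
    uses: frozenset   # symbolic registers read by it

def build_interference_graph(instrs, live_out):
    """Two symbolic registers interfere if one is defined at a point
    where the other is live.  Returns {register: set(neighbours)}."""
    adj = {}
    for instr, live in zip(instrs, live_out):
        for d in instr.defs:
            adj.setdefault(d, set())
            for other in live:
                if other != d:
                    adj[d].add(other)
                    adj.setdefault(other, set()).add(d)
    return adj
```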
Subsumption • If the source and destination of a register copy do not interfere, they may be coalesced into a single node • For each register copy in IL, determine whether the registers interfere • If not, coalesce the two nodes into one • After first pass, rewrite IL code • Repeat until no more coalescing is possible
Subsumption Example • (Diagram: separate nodes A, B, C, D before coalescing)
Subsumption Example • (Diagram: coalesced nodes AD and BC)
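The coalescing pass can be sketched as follows (illustrative Python, not the paper's implementation); `adj` is the interference graph as a dict of neighbour sets and `copies` lists the (dst, src) register copies found in the IL.

```python
def coalesce_copies(adj, copies):
    """Subsumption sketch: merge the destination and source of each
    register copy whenever they do not interfere."""
    alias = {}                       # merged register -> its representative

    def find(r):
        while r in alias:
            r = alias[r]
        return r

    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            a, b = find(dst), find(src)
            if a == b or b in adj.get(a, set()):
                continue             # already merged, or they interfere
            # Merge b into a: b's neighbours become a's neighbours.
            for n in adj.pop(b, set()):
                adj[n].discard(b)
                adj[n].add(a)
                adj.setdefault(a, set()).add(n)
            alias[b] = a
            changed = True
    return alias                     # used afterwards to rewrite the IL

# Example mirroring the diagram above: copies A=D and B=C; only A and B interfere.
adj = {"A": {"B"}, "B": {"A"}, "C": set(), "D": set()}
print(coalesce_copies(adj, [("A", "D"), ("B", "C")]))   # {'D': 'A', 'C': 'B'}
print(adj)                                              # {'A': {'B'}, 'B': {'A'}}
```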
Finding a 32-Coloring • Each symbolic register is assigned a color representing a real register • If no adjacent nodes have the same color, then the coloring succeeds • Assume that G has a node N with degree < 32 • Then G is 32-colorable iff the reduced graph from which N and all its edges have been omitted is 32-colorable • Algorithm throws away nodes of degree < 32 until all nodes have been removed • Algorithm fails if no node has degree < 32
3-coloring example • (Diagram: nodes A, B, C, D)
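A compact sketch of the reduction just described, with k = 32 for the 32 real registers (k = 3 reproduces the small example above); this is a generic rendering of the technique, not the paper's implementation.

```python
def color_graph(adj, k=32):
    """Chaitin-style reduction: repeatedly remove a node of degree < k and
    push it on a stack; then pop nodes and give each a color unused by its
    neighbours.  Returns {node: color}, or None if spilling is needed.
    `adj` is {node: set(neighbours)}, as in the earlier sketch."""
    work = {n: set(ns) for n, ns in adj.items()}   # don't destroy the input
    stack = []
    while work:
        node = next((n for n in work if len(work[n]) < k), None)
        if node is None:
            return None            # every node has degree >= k: must spill
        stack.append((node, work.pop(node)))
        for ns in work.values():
            ns.discard(node)
    coloring = {}
    while stack:
        node, neighbours = stack.pop()
        used = {coloring[n] for n in neighbours}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring

# 3-coloring example: the 4-cycle A-B-C-D-A is 2-colorable, so k=3 succeeds.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
print(color_graph(cycle, k=3))     # prints one valid 3-coloring of the cycle
```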
Spilling • If the 32-coloring fails, then nodes must be spilled to memory • Spilled registers are stored to memory and reloaded just before each use, so they are live only momentarily • Every time spill code is generated, the interference graph must be rebuilt • Recoloring usually succeeds after spilling, but sometimes several passes are required
Spilling • Finding an optimal spill set is NP-complete • Heuristic: spill the node that minimizes (cost of spilling) / (degree of node) • Cost of spilling: the sum, over all definition and use points of the register, of each point's estimated execution frequency • In some cases a spilled register can be reloaded once and kept for an extended interval instead of being reloaded at every use
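Under those assumptions, choosing a spill candidate might look like this sketch; the frequency lists are invented stand-ins for def/use points weighted by estimated execution frequency.

```python
def spill_cost(freqs):
    """Spill cost sketch: sum the estimated execution frequency of every
    definition and use point of the register."""
    return sum(freqs)

def choose_spill_node(adj, def_use_freqs):
    """Pick the node minimizing spill cost / degree in the interference graph."""
    return min(adj, key=lambda r: spill_cost(def_use_freqs[r]) / max(len(adj[r]), 1))

# Example: B is used often but has few neighbours; C is cheap and crowded.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
freqs = {"A": [10, 10], "B": [100], "C": [1, 1], "D": [5]}
print(choose_spill_node(adj, freqs))   # 'C': lowest cost per blocked neighbour
```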
Conclusion • The graph coloring and spilling algorithms should produce faster code • The register allocation algorithm is efficient • Graph coloring is O(N) • But uses O(N²) space
Compile-time Copy Elimination Peter Schnorf, Mahadevan Ganapathi, John Hennessy Stanford, 1993
Motivation • Single assignment languages simplify dependency checking • Which simplifies automatic detection and exploitation of parallelism • But single-assignment languages require a large number of copies • Previous implementations eliminate copies at runtime • Increased efficiency if copies can be eliminated at compile time
Paper Overview • Single-assignment languages • Code generation • Compile-time copy elimination techniques • Substitution • Pattern matching • Substructure sharing • Substructure targeting • Results – success! • Eliminated all copies in bubble sort
Single-assignment languages • Functional languages (LISP, Haskell, SISAL) • Simpler dependency checking • True dependencies – write, read • b = f(c), a = f(b) • Anti-dependencies – read, write • a = f(b), b = f(c) • Output dependencies – write, write • a = f(b), a = f(c) • Aliasing • caused by pointers, array indexes • To avoid aliasing, all inputs and outputs are passed by value
Example – Swap(A,i,j) • Data flow diagram: edges transport values, simple nodes are operations • Pick any feasible node evaluation order at random • Naïve implementation: each edge has its own memory, so Swap uses 5 array copies! • Optimized implementation: the array updates are done in place • (Diagram: Input array flowing through two AElement and two AReplace nodes)
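To make the copy overhead concrete, here is an illustrative Python rendering of the naïve and optimized translations; the function bodies are stand-ins for the generated C, and the exact copy count depends on how the edges are materialized.

```python
def swap_naive(A, i, j):
    """Naïve value semantics: every edge gets its own memory, so each
    AReplace (and the input/output boundary) copies the whole array."""
    a = list(A)                 # copy: array passed by value
    x, y = a[i], a[j]           # AElement, AElement (scalar reads)
    b = list(a); b[i] = y       # AReplace: fresh copy of the array
    c = list(b); c[j] = x       # AReplace: another fresh copy
    return list(c)              # copy on the output edge

def swap_optimized(A, i, j):
    """After substitution analysis both updates are done in place."""
    A[i], A[j] = A[j], A[i]
    return A
```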
Example: BubbleSort(A) • Compound nodes represent control flow • Loops are implemented using recursion to avoid multiple assignment of the iteration variable • Naïve implementation: bubble sort requires O(n²) array copies • Optimized implementation: all array updates are done in place, but parallelism is decreased
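A hypothetical sketch of the loop-as-recursion encoding, showing where a naïve translation copies the array on every update.

```python
def bubble_pass(A, i=0):
    """One bubble-sort pass with the loop encoded as recursion: the
    'iteration variable' i is bound once per call, never reassigned."""
    if i + 1 >= len(A):
        return A
    if A[i] > A[i + 1]:
        B = list(A)                         # naïve translation: AReplace copies A
        B[i], B[i + 1] = A[i + 1], A[i]
        return bubble_pass(B, i + 1)
    return bubble_pass(A, i + 1)

def bubble_sort_naive(A, n=None):
    """n-1 passes; on reversed input the copies add up to O(n²)."""
    n = len(A) if n is None else n
    return A if n <= 1 else bubble_sort_naive(bubble_pass(A), n - 1)

print(bubble_sort_naive([3, 1, 2]))   # [1, 2, 3]
```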
Code Generation Overview • Input is from the compiler front-end • IF1: intermediate data-flow graph representation • The code generator eliminates copies • Output is C code • Compiled into machine code using an optimizing C compiler
Vertical Substitution • If an input and an output of a node have the same type and size, they can share memory • Updates are done in place • (Diagram: Input array flowing through nodes 1 AElement, 2 AElement, 3 AReplace, 4 AReplace)
Horizontal Substitution • If an output has several destinations, the output edges can share memory • (Diagram: the same graph, nodes 1 AElement, 2 AElement, 3 AReplace, 4 AReplace)
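A very loose sketch of what vertical and horizontal substitution decide, using an invented `Edge` record rather than the paper's IF1 structures: each edge starts out owning its own memory, and substitution makes edges share slots.

```python
class Edge:
    """Illustrative data-flow edge: carries a value of some type and size."""
    def __init__(self, typ, size):
        self.typ, self.size = typ, size
        self.slot = self          # naïve scheme: every edge owns its own memory

def vertical_substitution(in_edge, out_edge):
    """If a node's array input and output agree in type and size, let the
    output edge reuse the input edge's memory (the update is done in place)."""
    if in_edge.typ == out_edge.typ and in_edge.size == out_edge.size:
        out_edge.slot = in_edge.slot

def horizontal_substitution(out_edges):
    """All destination edges carrying the same output value can share one slot."""
    for e in out_edges[1:]:
        e.slot = out_edges[0].slot
```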
Horizontal and Vertical Substitution • Horizontal and vertical substitution can interfere with each other • Interference occurs when a node along the substitution chain modifies the shared object before its last use • Edges can be marked read-only if they are shared and this is not their last use
Horizontal and Vertical Substitution • (Diagrams: the same Input graph of two AElement and two AReplace nodes shown under two different node evaluation orders)
Interprocedural Substitution • Previous discussion concerned simple nodes that can be analyzed at compiler design time • Information about a function is needed in order to use substitution • Does the function modify an input? • Will an input be chained to an output?
Intersubgraph Substitution • Substitution analysis is done for each construct • Same basic principles
Determining the Evaluation Order • Evaluation order can impact the efficiency of substitution • The naïve implementation selects the next node to evaluate at random • Hints tell the algorithm which nodes should be evaluated before and after other nodes if possible • Hints are ad hoc?
Pattern Matching • Replace hard-to-optimize pieces of code • Patterns are language-specific • Patterns are detected using “ad hoc” methods
Substructure Sharing • Allow substructures to be referenced without copies • AElement can be treated as a NoOp • Happens after substitution analysis – less important • Same principles as substitution analysis
Substructure Targeting • Allow structures to be built from substructures without copies • Similar to substructure sharing
Results • Compared the optimizations against the naïve implementation • The optimizations eliminate all copies for bubble sort • An informal comparison to a run-time optimizer shows improvements
Conclusions • Substitution, pattern matching, and substructure sharing can eliminate almost all unnecessary copies in a single-assignment language. • Copy elimination no longer has to be done at run time. • Single-assignment languages should be more efficient for parallel programs.