Register Allocation: Graph Coloring

Compiler, Baojian Hua, bjhua@ustc.edu.cn

Middle and Back End: AST → (translation) → IR1 → (translation) → IR2 → (other IRs and translations) → asm

Back-end Structure: IR → instruction selector → Assem → register allocator → TempMap → instruction scheduler → Assem

Instruction Selection

Source:
int f (int x, int y)
{
  int a; int b; int c; int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = c / 8;
  return d;
}

After instruction selection (x: 8(%ebp), y: 12(%ebp)):
int f (int x, int y){
  int a,b,c,d;
  int t1, t2;
  pushl %ebp            // prolog
  movl %esp, %ebp
  movl 8(%ebp), t1
  movl 12(%ebp), t2
  movl t1, a
  addl t2, a
  movl a, b
  addl $4, b
  movl b, %eax
  imull $2
  movl %eax, c
  movl c, %eax
  cltd
  idivl $8
  movl %eax, d
  movl d, %eax
  leave                 // epilog
  ret
}

The positions of a, b, c and d cannot be determined during this phase.

Register allocation
• After instruction selection, some variables (temporaries) may be left in the code
• Basic idea:
  • put as many of these variables as possible into registers
  • speed!
  • put them into memory only if registers run out
• This process is called register allocation
  • one of the most important optimizations in modern compilers

Register Allocation
For the code produced by instruction selection, suppose that register allocation determines the following mapping (we will discuss how to do this a little later):
a => %eax
b => %eax
c => %eax
d => %eax
t1 => %eax
t2 => %edx
This data structure is called a temp map.

Rewriting
.text
.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), t1      // => movl 8(%ebp), %eax
  movl 12(%ebp), t2     // => movl 12(%ebp), %edx
  movl t1, a            // => movl %eax, %eax
  addl t2, a            // => addl %edx, %eax
  movl a, b
  addl $4, b
  movl b, %eax
  imull $2
  movl %eax, c
  movl c, %eax
  cltd
  idivl $8
  movl %eax, d
  movl d, %eax
  leave
  ret

With the given temp map (a => %eax, b => %eax, c => %eax, d => %eax, t1 => %eax, t2 => %edx), we can rewrite the code accordingly to generate the final assembly code. The first four instructions are rewritten above; the rest are left to you!
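The rewriting step can be sketched in a few lines of Python. This is a minimal sketch, not the course's lab code. It assumes each instruction is a plain AT&T-syntax string and the temp map is a dict; the names `TEMP_MAP` and `rewrite` are hypothetical.

```python
# Temp map from the slide: temporary -> physical register.
TEMP_MAP = {"a": "%eax", "b": "%eax", "c": "%eax", "d": "%eax",
            "t1": "%eax", "t2": "%edx"}


def rewrite(instr, temp_map):
    """Replace each operand that names a temporary by its register.

    Assumes operands are separated by commas and that no operand itself
    contains a comma (no scaled-index addressing such as 8(%ebp,%ecx,4))."""
    parts = instr.split(None, 1)          # [mnemonic, operand text]
    if len(parts) == 1:                   # no operands: cltd, leave, ret
        return instr
    mnemonic, operand_text = parts
    operands = [op.strip() for op in operand_text.split(",")]
    operands = [temp_map.get(op, op) for op in operands]
    return mnemonic + " " + ", ".join(operands)


selected = ["movl 8(%ebp), t1", "movl 12(%ebp), t2",
            "movl t1, a", "addl t2, a", "cltd"]
for instr in selected:
    print(rewrite(instr, TEMP_MAP))
```

For example, `movl t1, a` becomes `movl %eax, %eax`. This leftover self-move is what the peephole pass removes.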

Peephole Optimization
.text
.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %eax
  movl 12(%ebp), %edx
  movl %eax, %eax
  addl %edx, %eax
  movl %eax, %eax
  addl $4, %eax
  movl %eax, %eax
  imull $2
  movl %eax, %eax
  movl %eax, %eax
  cltd
  idivl $8
  movl %eax, %eax
  movl %eax, %eax
  leave
  ret

Peephole optimizations try to improve the code by examining it through a small code window, so they work locally. For example, a code window of width 1 is enough to eliminate the obvious redundancy of the form: movl r, r
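A width-1 peephole pass for the `movl r, r` pattern might look like the following Python sketch. The string-based instruction format and the names `is_self_move` and `peephole` are assumptions made for illustration.

```python
def is_self_move(instr):
    """Window of width 1: does this instruction have the form 'movl r, r'?"""
    parts = instr.split(None, 1)
    if len(parts) != 2 or parts[0] != "movl":
        return False
    operands = [op.strip() for op in parts[1].split(",")]
    return len(operands) == 2 and operands[0] == operands[1]


def peephole(code):
    """Slide the width-1 window over the code, dropping every 'movl r, r'."""
    return [instr for instr in code if not is_self_move(instr)]


rewritten = ["movl 8(%ebp), %eax", "movl 12(%ebp), %edx",
             "movl %eax, %eax", "addl %edx, %eax",
             "movl %eax, %eax", "addl $4, %eax"]
print(peephole(rewritten))
```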

Final Assembly
int f (int x, int y)
{
  int a; int b; int c; int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = c / 8;
  return d;
}

// This function does NOT need a (stack) frame!
.text
.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %eax
  movl 12(%ebp), %edx
  addl %edx, %eax
  addl $4, %eax
  imull $2
  cltd
  idivl $8
  leave
  ret

Register Allocation
For the same code, register allocation determines a temp map:
a => %eax, b => %eax, c => %eax, d => %eax, t1 => %eax, t2 => %edx
How can we generate such a temp map?
Key observation: two variables can reside in one register if and only if they are NOT live simultaneously.

Liveness Analysis
So we can perform liveness analysis on the same code to compute the live-variable information. On the slide, the liveOut set is marked between every two statements: {…} for the earlier statements, then {eax} after idivl $8, {d} after movl %eax, d, {eax} after movl d, %eax, and {eax} after leave.
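For straight-line code like f, liveness is a single backward pass. The rule is in = use ∪ (out − def), and the liveOut of each statement is the liveIn of the next.

The Python sketch below uses hand-written def/use sets under these assumptions:
- Only the temporaries and %eax are tracked; %ebp, %esp and %edx are ignored.
- `ret` is taken to use %eax as the return value.

`F_CODE` and `live_out_sets` are hypothetical names.

```python
# Each entry: (instruction, defs, uses). Only temporaries and %eax are
# tracked; %ebp, %esp and %edx are ignored in this sketch.
F_CODE = [
    ("pushl %ebp",        set(),   set()),
    ("movl %esp, %ebp",   set(),   set()),
    ("movl 8(%ebp), t1",  {"t1"},  set()),
    ("movl 12(%ebp), t2", {"t2"},  set()),
    ("movl t1, a",        {"a"},   {"t1"}),
    ("addl t2, a",        {"a"},   {"a", "t2"}),
    ("movl a, b",         {"b"},   {"a"}),
    ("addl $4, b",        {"b"},   {"b"}),
    ("movl b, %eax",      {"eax"}, {"b"}),
    ("imull $2",          {"eax"}, {"eax"}),
    ("movl %eax, c",      {"c"},   {"eax"}),
    ("movl c, %eax",      {"eax"}, {"c"}),
    ("cltd",              set(),   {"eax"}),
    ("idivl $8",          {"eax"}, {"eax"}),
    ("movl %eax, d",      {"d"},   {"eax"}),
    ("movl d, %eax",      {"eax"}, {"d"}),
    ("leave",             set(),   set()),
    ("ret",               set(),   {"eax"}),   # return value in %eax
]


def live_out_sets(code):
    """Backward pass: in[i] = use[i] | (out[i] - def[i]), out[i] = in[i+1]."""
    live = set()                  # nothing is live after the last instruction
    outs = [None] * len(code)
    for i in range(len(code) - 1, -1, -1):
        _, defs, uses = code[i]
        outs[i] = set(live)       # liveOut of instruction i
        live = uses | (live - defs)
    return outs


for (instr, _, _), out in zip(F_CODE, live_out_sets(F_CODE)):
    print(f"{instr:20} liveOut = {sorted(out)}")
```

Under these assumptions, the last four liveOut sets before `ret` come out as {eax}, {d}, {eax}, {eax}. These match the sets marked on the slide.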

Interference Graph (IG)
For the same code, register allocation determines the temp map:
a => %eax, b => %eax, c => %eax, d => %eax, t1 => %eax, t2 => %edx
(figure: the IG has an edge between t1 and t2 and an edge between a and t2; b, c and d have no edges. Nodes a, b, c, d and t1 are colored %eax, node t2 is colored %edx)

Steps in Register Allocator
• Do liveness analysis
• Build the interference graph (IG)
  • draw an edge between any two variables that are live simultaneously
• Color the IG with K colors (registers)
  • K is the number of available registers on the machine
  • a classical problem in graph theory
  • NP-complete (for K >= 3), so one must use heuristics
• Allocate physical registers to variables according to their colors
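A common way to build the IG is sketched below in Python. At every instruction, add an edge between each variable it defines and every other variable in its liveOut set; there is no special case for moves here.

The `(defs, liveOut)` pairs for f are worked out by hand from the code of f. Only the temporaries a, b, c, d, t1 and t2 are nodes; %eax is left out. `build_ig` and `F_DEFS_LIVE_OUT` are hypothetical names.

```python
def build_ig(defs_and_live_out):
    """At each instruction, every variable it defines interferes with every
    other variable in that instruction's liveOut set."""
    edges = set()
    for defs, live_out in defs_and_live_out:
        for d in defs:
            for v in live_out:
                if v != d:
                    edges.add(frozenset((d, v)))   # undirected edge
    return edges


# (defs, liveOut) for each instruction of f that defines a temporary,
# worked out by hand; %eax is deliberately left out of the graph.
F_DEFS_LIVE_OUT = [
    ({"t1"}, {"t1"}),            # movl 8(%ebp), t1
    ({"t2"}, {"t1", "t2"}),      # movl 12(%ebp), t2
    ({"a"}, {"a", "t2"}),        # movl t1, a
    ({"a"}, {"a"}),              # addl t2, a
    ({"b"}, {"b"}),              # movl a, b
    ({"b"}, {"b"}),              # addl $4, b
    ({"c"}, {"c"}),              # movl %eax, c
    ({"d"}, {"d"}),              # movl %eax, d
]

for edge in sorted(tuple(sorted(e)) for e in build_ig(F_DEFS_LIVE_OUT)):
    print(edge)
```

This yields exactly two edges: t1 with t2, and a with t2. So two registers suffice: t2 gets %edx, and every other temporary can share %eax, as in the temp map.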

History
• Early work by Cocke suggested that register allocation can be viewed as a graph coloring problem (1971)
• The first working allocator was Chaitin's, for the IBM PL/1 compiler (1981)
  • later used in the IBM PL.8 compiler
  • had some impact on the design of RISC machines

History, cont
• A more recent graph coloring allocator is due to Briggs (1992)
• For now, graph coloring is the most popular approach, used in many production compilers
  • e.g., GCC
• But more advanced allocators have been invented in recent years
  • so, is graph coloring an obsolete lesson?
  • more in the next few lectures …

Graph coloring
• Once we have the interference graph, we try to color it with K colors
  • K: number of machine registers
  • adjacent nodes must get different colors
• But this is an NP-complete problem (for K >= 3)
  • so we must use heuristics

Kempe’s Allocator

Kempe’s Theorem
• [Kempe] Given a graph G with a node n such that degree(n) < K, G is K-colorable iff G − {n} is K-colorable (G − {n}: remove n and all edges incident to n)
• Proof? (figure: a node n with degree(n) < K and its neighbors …)
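The "Proof?" question has a short answer. One way to write it out:

```latex
\textbf{Claim.} Let $\deg_G(n) < K$. Then $G$ is $K$-colorable $\iff$ $G - \{n\}$ is $K$-colorable.

($\Rightarrow$) A $K$-coloring of $G$, restricted to the nodes of $G - \{n\}$,
is still proper: deleting $n$ and its edges creates no new adjacencies.

($\Leftarrow$) Let $c$ be a $K$-coloring of $G - \{n\}$ and let $N(n)$ be the
set of neighbors of $n$. Then
\[
  \bigl|\{\, c(m) : m \in N(n) \,\}\bigr| \;\le\; |N(n)| \;=\; \deg_G(n) \;\le\; K - 1,
\]
so at least one of the $K$ colors is unused on $N(n)$; assigning it to $n$
extends $c$ to a $K$-coloring of $G$. $\blacksquare$
```

This argument also explains the select phase. It puts nodes back in reverse order of removal, so it always finds a free color for a node that was removed with degree < K.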

Kempe’s Algorithm
kempe(graph G, int K)
  while (there is any node n with degree(n) < K)
    remove this node n
    assign a color to the removed node n   // greedy
  if (G is empty)   // i.e., G is K-colorable
    return success;
  return failure;

Example (K = 4, colors 1, 2, 3, 4; nodes a, b, c, d, e)
• degree(a) = 3 < 4: remove node “a”, assign the first available color
• degree(b) = 2 < 4: remove node “b”, assign the first available color
  • Here, we want to choose the node with the lowest degree; what kind of data structure should we use?
• degree(c) = 2 < 4: remove node “c”, assign the first available color
• degree(d) = 1 < 4: remove node “d”, assign the first available color
• degree(e) = 0 < 4: remove node “e”, assign the first available color
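For the data-structure question above, one reasonable answer is a binary min-heap keyed on current degree. The Python sketch below uses the standard `heapq` module with lazy deletion. When a neighbor's degree drops, a fresh entry is pushed, and outdated entries are skipped when popped.

The graph `ADJ` is a small hypothetical example, not the graph from the slides. `removal_order` only computes a removal order; it does not assign colors.

```python
import heapq


def removal_order(adj, k):
    """Repeatedly remove a node of minimum current degree, as long as that
    degree is < k. Returns the removal order, or None if the loop gets stuck
    (every remaining node has degree >= k)."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    heap = [(d, n) for n, d in degree.items()]
    heapq.heapify(heap)
    removed, order = set(), []
    while heap:
        d, n = heapq.heappop(heap)
        if n in removed or d != degree[n]:
            continue                      # outdated entry: skip it
        if d >= k:
            return None                   # even the minimum is significant
        removed.add(n)
        order.append(n)
        for m in adj[n]:
            if m not in removed:
                degree[m] -= 1
                heapq.heappush(heap, (degree[m], m))   # fresh entry
    return order


# Hypothetical graph (not the one on the slides); edges are symmetric.
ADJ = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "c", "e"},
    "e": {"c", "d"},
}
print(removal_order(ADJ, 3))
print(removal_order(ADJ, 2))
```

Degrees only decrease, so an outdated heap entry always has a larger degree than the node's current one, and comparing against `degree[n]` is enough to recognize it. Bucketing nodes by degree is another common choice; it gives constant-time updates.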

Example (K = 3, colors 1, 2, 3)
So this graph is 3-colorable. But if we have only three colors, we can NOT apply the Kempe algorithm. (Why?)
We can refine it to the following one:
kempe(graph G, int K)
  stack = [];
  while (G is not empty)
    if (there is a node n with degree(n) < K)
      remove n and push it onto the stack;
    else
      remove some node n with degree(n) >= K and push it as well;
  pop the stack and assign colors
Essentially, this is a lazy algorithm!
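The refined, lazy algorithm can be sketched in Python as follows:
- **Simplify** pushes a node of degree < K when one exists. Otherwise it optimistically pushes a significant node anyway; this sketch picks the one of highest degree, which is an arbitrary choice. This optimistic push is the idea usually credited to Briggs.
- **Select** pops nodes and gives each the lowest color in 1..K that no already-colored neighbor uses.

The wheel graph `WHEEL` and the name `color_graph` are hypothetical. In `WHEEL` every node has degree ≥ 3, so the plain Kempe loop cannot start with K = 3, yet the graph is 3-colorable.

```python
def color_graph(adj, k):
    """Simplify, then select, with an optimistic push when stuck.
    Returns (coloring, uncolored): coloring maps node -> color in 1..k."""
    degree = {n: len(adj[n]) for n in adj}
    remaining = set(adj)
    stack = []
    # Simplify: remove every node, pushing it onto the stack.
    while remaining:
        low = [n for n in remaining if degree[n] < k]
        if low:
            n = min(low, key=lambda x: (degree[x], x))        # lowest degree
        else:
            n = max(remaining, key=lambda x: (degree[x], x))  # optimistic push
        remaining.remove(n)
        stack.append(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    # Select: pop nodes and give each the lowest color its neighbors lack.
    coloring, uncolored = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in adj[n] if m in coloring}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            uncolored.append(n)   # select failed: no register left for n
    return coloring, uncolored


# Hypothetical wheel graph: hub "a" joined to the 4-cycle b-c-d-e-b.
WHEEL = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "e"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"a", "b", "d"},
}
print(color_graph(WHEEL, 3))
print(color_graph(WHEEL, 2))
```

With K = 3 every node gets a color. With K = 2, select fails on the hub `a`, which is the failure case the next slides turn to.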

Example (K = 3, colors 1, 2, 3; nodes a, b, c, d, e)
Simplify:
• remove node “a”, push onto the stack (a is significant: its degree is not < 3)
• remove node “b”, push onto the stack
• remove node “c”, push onto the stack
• remove node “d”, push onto the stack
• remove node “e”, push onto the stack
Stack (top first): e, d, c, b, a (significant)
Select: pop the stack, assign suitable colors
• pop “e”, pop “d”, pop “c”, pop “b”, pop “a”

Moral
• Kempe’s algorithm:
  • step #1: simplify
    • remove graph nodes, be optimistic
  • step #2: select
    • assign a color to each node, be lazy
• You should use this algorithm for your lab6 first
• But what if the select phase fails?
  • not enough colors (registers)!

Example (K = 2, colors 1, 2; nodes a, b, c, d, e)
• remove node “a”, push onto the stack

Failure
• Kempe’s algorithm often fails
  • the IG is not K-colorable
• The basic idea is to generate spill code
  • some variables are put into memory instead of into registers
  • usually, spilled variables reside in the call stack
• Code using such variables must be modified:
  • for each variable use: read it from memory
  • for each variable def: store it into memory
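Spill-code insertion for three-address statements can be sketched in Python. The sketch assumes:
- Each statement is a pair (defined variable, used operands), read as dst = src1 + src2.
- Each spilled variable gets its own 4-byte stack slot at a negative offset from %ebp.

The `load`/`store` notation, the slot offsets, fresh names such as `c_1`, and the name `insert_spill_code` are all hypothetical. The demo spills c and d in the small program from the Chaitin example later in this lecture.

```python
from itertools import count


def insert_spill_code(code, spilled):
    """code: list of (dst, srcs) pairs, read as dst = srcs[0] + srcs[1] + ...
    Every use of a spilled variable is preceded by a load into a fresh
    temporary; every def writes a fresh temporary that is then stored."""
    # Hypothetical frame layout: one 4-byte slot per spilled variable.
    slot = {v: -4 * (i + 1) for i, v in enumerate(sorted(spilled))}
    fresh = count(1)
    out = []
    for dst, srcs in code:
        new_srcs = []
        for s in srcs:
            if s in spilled:                        # use: load before
                t = f"{s}_{next(fresh)}"
                out.append(f"{t} = load {slot[s]}(%ebp)")
                new_srcs.append(t)
            else:
                new_srcs.append(s)
        rhs = " + ".join(new_srcs)
        if dst in spilled:                          # def: store after
            t = f"{dst}_{next(fresh)}"
            out.append(f"{t} = {rhs}")
            out.append(f"store {t}, {slot[dst]}(%ebp)")
        else:
            out.append(f"{dst} = {rhs}")
    return out


PROGRAM = [("a", ["1"]), ("b", ["2"]), ("c", ["a", "b"]),
           ("d", ["a", "c"]), ("e", ["a", "b"]), ("f", ["d", "e"])]
for line in insert_spill_code(PROGRAM, {"c", "d"}):
    print(line)
```

Each fresh temporary lives only from its definition to the adjacent load or store. This is how spilling shortens live ranges; the allocator then starts over on the rewritten code.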

Spill code generation • The effect of spill code is to turn long live range into shorter ones • This may introduce more temporaries • The register allocator should start over, after generating spill code • We’ll talk about this shortly

Chaitin’s Allocator

Chaitin’s Algorithm
• Build: build the interference graph (IG)
• Simplify: simplify the graph
• Spill: when only significant nodes remain, mark one as a potential spill (ps), remove it and continue
• Select: pop nodes and try to assign colors
  • if this fails for a potential spill node, mark it as an actual spill and continue
• Start over: generate spill code for the actual spills and start over from step #1 (build)

Chaitin’s Algorithm (flow diagram): build → simplify → potential spill → select → actual spill (after actual spills, start over at build)

Step 1: build the IG (K = 2, colors 1, 2)
a = 1
b = 2
c = a+b
d = a+c
e = a+b
f = d+e
(figure: the interference graph over nodes a, b, c, d, e, f)
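Liveness and the IG for this straight-line program can be computed together in one backward pass. This Python sketch assumes nothing is live after f = d+e, since the slide shows no later use. Each statement is represented as (defined variable, used variables); `PROGRAM` and `interference_graph` are hypothetical names.

```python
# Each statement: (defined variable, variables it uses).
PROGRAM = [
    ("a", []),           # a = 1
    ("b", []),           # b = 2
    ("c", ["a", "b"]),   # c = a+b
    ("d", ["a", "c"]),   # d = a+c
    ("e", ["a", "b"]),   # e = a+b
    ("f", ["d", "e"]),   # f = d+e
]


def interference_graph(program, live_at_exit=()):
    """One backward pass: 'live' holds the liveOut set of the current
    statement; its def interferes with every other variable in that set."""
    adj = {dst: set() for dst, _ in program}
    live = set(live_at_exit)
    for dst, uses in reversed(program):
        for v in live:
            if v != dst:
                adj[dst].add(v)
                adj.setdefault(v, set()).add(dst)
        live = (live - {dst}) | set(uses)   # liveIn of this statement
    return adj


for node, nbrs in sorted(interference_graph(PROGRAM).items()):
    print(node, sorted(nbrs))
```

The result has six edges: a with b, c and d; b with c and d; d with e. Node f is isolated.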

Step 2: simplification (same program, K = 2, colors 1, 2)
• push f; stack (top first): f
• push e; stack: e, f
• no node of degree < 2 remains: push c as a potential spill (ps); stack: c (ps), e, f
• push d as a potential spill; stack: d (ps), c (ps), e, f
• push a; stack: a, d (ps), c (ps), e, f
• push b; stack: b, a, d (ps), c (ps), e, f
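The whole simplify / spill / select round can be sketched in Python and run on this example. The sketch makes these assumptions:
- The IG edges are those obtained from the program's liveness, with nothing live after the last statement.
- The potential spill is the remaining node with the fewest occurrences (defs + uses), ties broken by name. This is a simplified stand-in for Chaitin's own heuristic, which divides a spill cost by the node's degree.
- Colors are tried in the order 1..K.

`chaitin`, `EXAMPLE_IG` and `COST` are hypothetical names.

```python
def chaitin(adj, k, cost):
    """One round: simplify / potential spill / select.
    Returns (stack, potential_spills, coloring, actual_spills)."""
    degree = {n: len(adj[n]) for n in adj}
    remaining, stack, potential = set(adj), [], set()
    while remaining:
        low = [n for n in remaining if degree[n] < k]
        if low:                                   # simplify
            n = min(low, key=lambda x: (degree[x], x))
        else:                                     # potential spill (ps)
            n = min(remaining, key=lambda x: (cost[x], x))
            potential.add(n)
        remaining.remove(n)
        stack.append(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    coloring, actual = {}, []
    for n in reversed(stack):                     # select: pop order
        used = {coloring[m] for m in adj[n] if m in coloring}
        free = [c for c in range(1, k + 1) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            actual.append(n)                      # actual spill
    return stack, potential, coloring, actual


# IG of the example (nothing live after f = d+e) and occurrence counts.
EXAMPLE_IG = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b"},
    "d": {"a", "b", "e"}, "e": {"d"}, "f": set(),
}
COST = {"a": 4, "b": 3, "c": 2, "d": 2, "e": 2, "f": 1}

stack, potential, coloring, actual = chaitin(EXAMPLE_IG, 2, COST)
print("stack (bottom to top):", stack)
print("potential spills:", sorted(potential))
print("coloring:", coloring, "actual spills:", actual)
```

With these choices the stack comes out as f, e, c (ps), d (ps), a, b, the same order as on the slides. Select then proceeds as follows:
- It colors b and a.
- It finds no free color for either potential spill, so d and c become actual spills.
- It colors e and f.

A node pushed with degree < K can never become an actual spill, because fewer than K of its neighbors are colored before it.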
