520 likes | 693 Views
CSE P501 – Compiler Construction. Optimization: Goals Eliminating Redundancy & Value N umbering Control flow graphs (CFGs ) Optimization Scopes: local, regional, global, whole-program Dominators See Cooper&Torczon chapter 8, 9. Code Improvement. Pick a better algorithm(!)
E N D
CSE P501 – Compiler Construction Optimization: Goals Eliminating Redundancy & Value Numbering Control flow graphs (CFGs) Optimization Scopes: local, regional, global, whole-program Dominators See Cooper&Torczon chapter 8, 9 Jim Hogg - UW - CSE - P501
Code Improvement • Pick a better algorithm(!) • Eg: Quicksort rather than BubbleSort • Compilers ain't smart enough to choose better algorithms • Use machine resources effectively • Instruction selection & scheduling • Register allocation • More about these later… Jim Hogg - UW - CSE - P501
“Optimization” • No optimizations are actually “optimal” • Best we can do is to improve things • Most (much?) (some?) of the time • Realistically: try to do better for common idioms both in the code and on the machine • Never to make it worse! Jim Hogg - UW - CSE - P501
Example Optimization • Classic example: Array references in a loop • for (k = 0; k < n; k++) a[k] = 0; • Naive codegen for a[k] = 0 in loop body • moveax, 4// elemsize = 4 bytes • imuleax, [ebp+offsetk]// k * elemsize • add eax, [ebp+offseta]// &a[0] + k * elemsize • mov [eax], 0 // a[k] = 0 • Better! • moveax, [ebp+offseta] // &a[0], once-off • mov [eax], 0 // a[0]=0, a[1]=0, etc • add eax, 4 // eax = &a[1], &a[2], etc Note: pointers allow a user to do this directly in C or C++ Eg: for (p = a; p < a + n; ) *p++ = 0; Jim Hogg - UW - CSE - P501
Example Optimization: A[row, col] column row-major layout row int A[row, col] = @A + (row * numcols + col) * 4 // 0-based A[row, col] = @A + (row – rowmin) * (colmax – colmin + 1) * elemsize + (col – colmin) * elemsize Jim Hogg - UW - CSE - P501
Optimize A[,] = 0 for (i = 0; i < ncols; i++) { for (j = 0; j < nrows; j++) { a[i, j] = 0; } } • A[i, j] = @A + (i* ncols+ j) * 4 • Naive codegen for A[i, j] = 0 • moveax, [ebp+offseti] • imuleax, [ebp+offsetncols] • add eax, [ebp+offsetj] • imuleax, 4 • lea edx, [ebp+offsetA] • add eax, edx • mov [eax], 0 • Better! • lea eax, [ebp+offsetA] // &A[0], once-off • mov [eax], 0 // A[i, j] = 0 • add eax, 4 • Strength Reduction • C programmers can do this with pointers Jim Hogg - UW - CSE - P501
Optimization Phase • Goal • Discover, at compile time, information about the runtime behavior of the program, and use that information to improve the generated code Jim Hogg - UW - CSE - P501
Common Problems in Optimization • Strategy is typical of most compiler optimizations • Discover opportunities through program analysis • Modify the IR to take advantage of the opportunities • Historically, goal usually was to decrease execution time • Other possibilities: reduce space, power, … Jim Hogg - UW - CSE - P501
Issues (1) • Safety – transformation must not change program meaning • Must generate correct results • Can’t generate spurious errors (ie, report error when there is none) • Optimizations must be conservative, and always correct • never 'take a risk': 99% != 100% Eg: • fetch &o.mand remember m's address • if call to m then call remembered m, so long as obj.vtbl == o.vtbl • Large part of analysis goes towards proving safety • However, it’s not black&white. Eg: • hoist bounds-check • elide exception thru DCE Jim Hogg - UW - CSE - P501
Issues (2) • Profitability • If a transformation is possible, is it profitable? • Example: loop unrolling • Can increase efficiency by reducing loop overhead (fewer tests & branches) • Can eliminate duplicate operations done on separate iterations • But, increases code size Jim Hogg - UW - CSE - P501
Issues (3) • Downside risks • Even if a transformation is worthwhile, consider its side-effects • Eg: • Transformation might need more temporaries, increasing register pressure • Increased code size could cause cache misses • Increased code size could increase paging (a disaster) Jim Hogg - UW - CSE - P501
Eliminating Redundancy • An expression x+y is redundant at a program point iff: • along every path from the procedure’s entry, it has been evaluated and its constituent sub-expressions (x and y) have not been re-defined (ie, assigned new values) since that evaluation • If the compiler can prove the expression is redundant • Can store the result of the earlier evaluation: t123 = x + y; • Can replace the redundant computation with a reference to the earlier (stored) result, as in = t123 ... Jim Hogg - UW - CSE - P501
Value Numbering • Technique for eliminating redundant expressions • Assign an identifying number VN(n) to each expression • VN(x+y) == VN(j) if x+y and j have the same value • Use hashing over value numbers for efficiency • But hashing is not crucial to the algorithm • Old idea (Balke 1968, Ershov 1954) • Invented for low-level, linear IRs • Equivalent methods exist for tree IRs. Eg: build a DAG Jim Hogg - UW - CSE - P501
Redundancies? Basic Block a = x * y b = x * y a = 7 c = x * y x = 3 d = x * y 1 2 3 4 5 6 • Are any of these IR instructions "redundant"? • In 2, can I re-use a in place of b? • In 4, can I re-use a in place of c? • In 6, can I re-use a in place of d? Jim Hogg - UW - CSE - P501
Local Value Numbering (LVN) a3 = x1 * y2 b3 = x1 * y2 a4 = 74 c3 = x1 * y2 x5 = 35 d6 = x5 * y2 • VN of a variable is not the actual value of that variable at runtime! • Two variables with same VN are guaranteed to have the same value at runtime • Two variables with different VN may have same value at runtime, by happenstance, but it's not guaranteed Jim Hogg - UW - CSE - P501
Transform Code Remove Redundancies VN'd a3 = x1 * y2 b3 = x1 * y2 a4 = 74 c3 = x1 * y2 x5 = 35 d6 = x5 * y2 a3 = x1 * y2 b3= a3 a4 = 74 c3 = a3 x5 = 35 d6 = x5 * y2 • Oops!a3 was overwritten Jim Hogg - UW - CSE - P501
Solution- SSA • Could remember a3 into a temporary • But realize that a4 is really a different variable from a3 • Eg: we could store a3 in register R7 and a4 in a different register R2; they are totally unrelated • Use subscript to denote different instances of a variable • Bump the lifetime subscript each time we assign into that variable • This is the essence of SSA form - see later a03 = x01 * y02 b03= a03 a14 = 74 c03 = a03 x15 = 35 d06 = x15 * y02 • Fixed: a03 still alive
Extensions to LVN • Constant Folding • Note whether each operand is a constant • Evaluate any arithmetic of those constants at compile-time • Replace instruction with a load-immediate (eg: moveax, 42) • Algebraic Identities • x+0 = x, x - x = 0, etc, etc • Humans don't write these - but may be created by index calculation, etc • Commutativity • x8 = a5 + b7 • y9 = b7 + a5 • Recognize a+b = b+a; eg: order by increasing VNs a03 = x01 * y02 b03= a03 a14 = 74 c13 = a03 x15 = 35 d06 = x15 * y02
LVN in Detail (1) Basic Block g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v 1 2 3 4 5 6 7 8 • Are any of these IR instructions "redundant"? • In 3 can I re-use already-calculated value of g? • What about 5 - can I replace x with a re-use of h from 2? • Any others? Jim Hogg - UW - CSE - P501
LVN in Detail (2) Expression ValueNumber 1 2 3 3 x y <1,+,2> g Basic Block u v <4,-,5> h 4 5 6 6 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v 1 2 3 3 x y <1,+,2> i • VN of a variable is not the actual value of that variable at runtime! • Two variables with same VN are guaranteed to have the same value at runtime • Two variables with different VN may have same value at runtime, by happenstance, but it's not guaranteed Jim Hogg CSE P 501
LVN in Detail (3) Expression ValueNumber 6 2 3 3 x y <1,+,2> g g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v u v <4,-,5> h 4 7 5 6 6 1 2 3 3 x y <1,+,2> i 7 3 Jim Hogg CSE P 501
LVN in Detail (4) Expression Expression ValueNumber ValueNumber 6 8 2 3 3 8 x y <1,+,2> g <7,-,5> <3,+,6> g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v 9 10 <3,+,8> u v <4,-,5> h 47 9 5 10 6 6 11 11 <9,+,10> w 1 2 3 3 x y <1,+,2> i 7 3 Jim Hogg CSE P 501
LVN in Detail: Summary Exp VN g h i u v w x y 3 <1,+,2> <3,+,6> <3,+,8> <4,-,5> <7,-,5> <9,+,10> Basic Block 3 6 3 47 9 5 10 11 1 8 2 7 3 9 10 6 8 11 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v Jim Hogg CSE P 501
LVN in Detail - Superscript Notation Exp VN Basic Block g h i u v w x y <1,+,2> <3,+,6> <3,+,8> <4,-,5> <7,-,5> <9,+,10> 3 6 3 47 9 5 10 11 1 8 2 3 9 10 6 8 11 g3 = x1 + y2 h6 = u4 - v5 i3 = x1 + y2 u7 = 37 x8 = u7 – v5 u9 = g3 + h6 v10 = i3 + x8 w11 = u9 + v10 Jim Hogg CSE P 501
LVN Algorithm foreachinstr in BasicBlock, with instr ≡ lhs = <rhs1,op,rhs2> lookup rhs1 in EVN-Table, giving vn1 if vn1 < 0 then {vn1 = ++V; insert <rhs1,vn1> into EVN-Table} lookup rhs2in EVN-Table, giving vn2 if vn2 < 0 then {vn2= ++V; insert <rhs2,vn2>into EVN-Table} lookup <vn1,op,vn2> in EVN-Table, giving vn if vn < 0 then insert <<vn1,op,vn2>,vn>into EVN-Table else update EVN-Table to hold <<vn1,op,vn2>,vn> next V is a sequence counter, starting at 1 BasicBlock EVN-Table g = x + y h = u - v . . . g h ... <1,+,2> <3,+,6> ... 3 6 ... 3 9 ... For efficiency, use hash table for EVN-Table Jim Hogg CSE P 501
Basic Block • A basic block is a maximal length sequence of straight-line code • A branch-free sequence of instructions • Properties • Statements are executed sequentially • If any statement executes, they all do (barring exceptions) • In linear IR, first instruction of a basic block is called a leader • Procedure entry; jump targets; instruction following jump or call • Implementation • Array of quads (d = x op y) • Linked list of quads Jim Hogg - UW - CSE - P501
Reminder: Flowgraph B0 B2 • A Flowgraph is: • Directed graph of Nodes and Edges • Node = Basic Block • Edge = any possible flow between nodes B3 B1 B4 B5 B6 Jim Hogg - UW - CSE - P501
ENTRY Real Example Flowgraph B1 i = 1 j = 1 B2 Control Flow Graph ("CFG", again!) B3 t1 = 10 * i t2 = t1 + j t3 = 8 * t2 t4 = t3 - 88 a[t4] = 0 j = j + 1 if j <= 10 goto B3 • 3 loops total • 2 of the loops are nested B4 i = i + 1 if i <= 10 goto B2 B5 i = 1 B6 t5 = i - 1 t6 = 88 * t5 a[t6] = 1 i = i + 1 if i <= 10 goto B6 Most of the executions likely spent in loop bodies; that's where to focus efforts at optimization Jim Hogg - UW - CSE - P501 EXIT
EBBs: Formal Definition • A set of blocks b1, b2, …, bn where b1 has multiple predecessors and each of the remaining blocks bi (2≤i≤n) have only bi-1 as its unique predecessor • The EBB is entered only at b1, but may have multiple exits • A single block bi can be the head of multiple EBBs (these EBBs form a tree rooted at bi) Jim Hogg - UW - CSE - P501
Example Extended Basic Blocks (EBBs) B0 Intuition: Maximal straight-line sequence of Blocks, with no join points B2 EBBs cannot represent loops! B3 B1 B4 • { B0, B1 } • { B0, B2, B3 } • { B0, B2, B4 } • { B5 } • { B6 } B5 B6 Jim Hogg - UW - CSE - P501
Optimization Scopes • Local • Within a Basic Block • Regional • Bigger than block, smaller than function • Eg: loop or Extended Basic Block (EBB) • Global • Whole function • Whole Program • Spans two or more functions • Maybe large part of a program (excludes libraries) • BasicBlock< Region < Function < Program Jim Hogg - UW - CSE - P501
Local Optimization • Local optimizations ≡ within a Basic Block • Algebraic simplifications (eg: 2*x = x+x; x+0 = x) • Constant folding (eg: FeetPerMile/MinutesPerHour = 88) • Common sub-expression elimination (CSE) • Dead code elimination (DCE) (unreachable or unused) • Specialize computation based on context • Etc Jim Hogg - UW - CSE - P501
Regional Optimization • Bigger than a Basic-Block; less than whole function • Eg: Loops • Hoist invariant code out of loop • Unroll-and-jam (unroll outer loop; merge inners) • Loop Interchange, for better caching • Array-index strength reduction • Eg: Extended-Basic-Blocks (EBBs) - no-join paths • Super-Local Value Numbering (SVN) • Etc EBB Jim Hogg - UW - CSE - P501
Global Optimization • Misnomer - really means whole-function • Global common sub-expression elimination (CSE) • Register allocation • Choose which variables, for how long, are enregistered • Choose which registers - not so important with 'orthogonal' ISAs • Result memoization / caching • Remember last 50, say, results calculated; re-use where possible • Eg: can speed a naive Fibonacci calculation 10x or more • Etc Jim Hogg - UW - CSE - P501
Whole-Program Optimization (WPO) • a.k.a Inter-Procedural Optimization (IPO) • Spans all, or much, of a program; certainly >1 function • At-odds with "separate compilation" • Inlining (assisted by profile 'training') • Hot/Cold code separation • Whole-Program DCE, or "tree shaking" • No program uses an entire library! • Remove unused classes and/or methods • Can make drastic reductions in program size Jim Hogg - UW - CSE - P501
Local Value Numbering 1 block at a time Strong local results No cross-block effects LVN finds these redundancies LVN misses these Super-Local Value Numbering (SVN) A m = a + b n = a + b B C p = c + d r = c + d q = a + b r = c + d D E e = b + 18 s = a + b u = e + f e = a + 17 t = c + d u = e + f F v = a + b w = c + d x = e + f G y = a + b z = c + d Jim Hogg - UW - CSE - P501
Apply LVN to EBBs: {A,B}, {A,C,D}, {A,C,E} Value Numbering over EBBs A m = a + b n = a + b • Final info from A is initial info for B, C • Final info from C is initial for D, E B C p = c + d r = c + d q = a + b r = c + d • Avoid re-analyzing A, C • Doesn’t help with F, G D E e = b + 18 s = a + b u = e + f e = a + 17 t = c + d u = e + f F v = a + b w = c + d x = e + f • LVN • SVN • Missed G y = a + b z = c + d Jim Hogg - UW - CSE - P501
SSA Name Space (from before) a03 = x01 * y02 b03 = a03 a14 = 74 c03 = a03 x15 = 35 d06 = x15 * y02 • Unique name for each definition • Subscript denotes variable's SSA number • The variable's "generation" number • c03 = a03 now works, since a1= 7 happened to a 'different' variable
SSA Name Space • Two Principles • Each name is defined (assigned-to) by exactly one instruction • Each operand refers to exactly one definition • Need to deal with merge points • Add Φ functions at merge points to reconcile names • Use subscripts on variable names for uniqueness Jim Hogg - UW - CSE - P501
φ Functions a3 a6 A B a7 = φ(a3, a6) C • Result of a7 = φ(a3, a6) is either a3 or a6 depending on whether we reached node C from node A or from node B • Not really a mathematical function at all Jim Hogg - UW - CSE - P501
Insert φ functions at join nodes F, G Little extra cost Still does nothing for F and G SVN with Bells & Whistles A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501
Still have not helped F, G Problem: multiple predecessors Must decide what facts hold in F and in G For G, combine B & F? Merging states is expensive Fall back on what we know Larger Scopes A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501
Dominators • Definition • xdominatesyiff every path from the entry of the flowgraph to y includes x • By definition, x dominates x • Associate a Dom set with each node • | Dom(x) |≥ 1 • Many uses in analysis and transformation • Finding loops, building SSA form, code motion Jim Hogg - UW - CSE - P501
Dominator Sets A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 Dom(n) n A B C D E F G {A} {A,B} {A,C} {A,C,D} {A,C,E} {A,C,F} {A,G} D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501
Dominator Tree & IDom Dom(n) IDom(n) n A A B C D E F G {A} {A,B} {A,C} {A,C,D} {A,C,E} {A,C,F} {A,G} - A A C C C A B C G D E F • Closest dominator is called the Immediate Dominator • Denoted IDom(n) Jim Hogg - UW - CSE - P501
Still looking for a way to handle F, G Use info from IDom(x) to start analysis of x Use C for F and A for G Dominator VNTechnique (DVNT) Dominator Value Numbering A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501
DVN algorithm • Use SVN on EBBs • Use scoped hash tables & SSA namespace as before • Start each node with EVN-Table from its IDOM • No values flow along back edges (i.e., loops) • Constant folding, algebraic identities as before Jim Hogg - UW - CSE - P501
Advantages Finds more redundancies Little extra cost Shortcomings Misses some opportunities (common calculations in ancestors that are not IDOMs) Doesn’t handle loops or other back edges Dominator Value Numbering A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 LVN SVN DVN Missed G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Jim Hogg - UW - CSE - P501
The Story So Far… • Local algorithm: LVN • Superlocal extension: SVN • Some local methods extend cleanly to superlocal scopes • Dominator VN Technique (DVN) • All of these propagate along forward edges • None are global (can't cope with loops) Jim Hogg - UW - CSE - P501
Next • Dataflow analysis • Global solution to redundant expression analysis • Transformations • Catalog of transforms a compiler can do with dataflow analysis Jim Hogg - UW - CSE - P501