CSE P501 – Compiler Construction

Available Expressions, Dataflow Analysis, Aliasing
Jim Hogg - UW - CSE - P501

The Story So Far…
• Redundant expression elimination
  • Local Value Numbering (LVN)
  • Super-local Value Numbering (SVN)
    • Extends LVN to EBBs
    • SSA-like namespace
  • Dominator Value Numbering (DVN)
• All of these propagate along forward edges
• None are global
  • In particular, none can handle back edges and loops

Dominator Value Numbering
• Most sophisticated algorithm so far
• Still misses some opportunities
• Can't handle loops
Flowgraph blocks (SSA names):
  A: m0 = a0 + b0; n0 = a0 + b0
  B: p0 = c0 + d0; r0 = c0 + d0
  C: q0 = a0 + b0; r1 = c0 + d0
  D: e0 = b0 + 18; s0 = a0 + b0; u0 = e0 + f0
  E: e1 = a0 + 17; t0 = c0 + d0; u1 = e1 + f0
  F: e2 = Φ(e0,e1); u2 = Φ(u0,u1); v0 = a0 + b0; w0 = c0 + d0; x0 = e2 + f0
  G: r2 = Φ(r0,r1); y0 = a0 + b0; z0 = c0 + d0
[Figure annotation: missed opportunities]

Available Expressions
• Goal: use dataflow analysis to find common subexpressions (CSEs) that span basic blocks
• Idea: calculate available expressions at beginning of each block (rather than just the Value-Numbers for variables)
• Having found an expression that is already available, there's no need to re-evaluate it: use a copy instead

Available Expressions: It's Simple!
Flowgraph blocks (one per line):
  a=b+c; d=e+f; f=a+c
  g=a+d; h=b+c        (b+c is available here)
  g=a+c
  j=a+b+c+d
• b+c was calculated earlier
• neither b nor c has been assigned-to since
• so replace h=b+c with h=a
• No Value Numbers (super-scripts)
  • ie: trying to work out whether two variables hold same value
• No SSA Numbers (sub-scripts)
  • ie: recording the life or instantiation of each variable

“Available” and Other Terms
• An expression e is defined at point p in the flowgraph if its value is computed at p
  • Sometimes called definition site, or simply "def"
  • eg: x = a+b ; expression a+b is defined here
• An expression e is killed at point p if one of its operands is defined at p
  • Sometimes called kill site, or simply "kill"
  • eg: x = a+b ; def site
        b = 7   ; kill site: kills every expression involving b!
• An expression e is available at point p if every path leading to p contains a prior definition of e and e is not killed between that definition and p
• Simply: an available expression is one you don't need to re-calculate

Available Expressions - Intuition
[Figure: a flowgraph in which several blocks compute =a+b, one block assigns a=, and later blocks ask "a+b?"]
• A def of a+b must reach each "a+b?"
• So, every path from start to "a+b?" must include a def for a+b
• Any assignment to a or b kills that available expression, throughout the procedure!

Available Expressions: Flowgraph
Blocks (one per line):
  a=b+c; d=e+f; f=a+c
  g=a+d; h=b+c
  g=a+c
  j=a+b

Number the Expressions
Blocks (one per line, expression number after #):
  a=b+c #1; d=e+f #2; f=a+c #3
  g=a+d #5; h=b+c #6
  g=a+c #4
  j=a+b #7
• Start by assigning (arbitrary) numbers to every expression in the function
• Pay no attention to what each expression is, just number it!
• Implementation: a map between expression number and location - eg, expression #6 = instruction #3 in basic-block #4

def & kill for each Instruction
  instruction    def    kill
  a=b+c  #1      {1}    {3,4,5,7}
  d=e+f  #2      {2}    {5}
  f=a+c  #3      {3}    {}
Other blocks: g=a+d #5; h=b+c #6 | g=a+c #4 | j=a+b #7
• Eg: a = b + c
  • defs the expression b+c
  • kills every expression that uses a
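The per-instruction rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the three-instruction function below is hypothetical (it is not the slide's flowgraph), every instruction has the form `target = left + right`, and the helper name `def_and_kill` is invented here.

```python
# Hypothetical function: each entry is (target, left operand, right operand).
# Expression numbers are simply the 1-based instruction positions.
instructions = [
    ("t", "p", "q"),   # expression 1: p + q
    ("u", "t", "r"),   # expression 2: t + r
    ("p", "u", "q"),   # expression 3: u + q
]

def def_and_kill(instructions):
    """Return one (def, kill) pair of sets per instruction."""
    result = []
    for number, (target, _, _) in enumerate(instructions, start=1):
        # The instruction defs its own expression.
        defs = {number}
        # It kills every expression in the function that reads the assigned variable.
        kills = {
            other
            for other, (_, left, right) in enumerate(instructions, start=1)
            if target in (left, right)
        }
        result.append((defs, kills))
    return result

for (target, left, right), (defs, kills) in zip(instructions, def_and_kill(instructions)):
    print(f"{target} = {left} + {right}: def={sorted(defs)} kill={sorted(kills)}")
```

Note that kill is computed against every expression in the function, not just those in the same block, which is why the numbering step has to come first.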

Summarize DEF & KILL for Basic Block
  instruction    def    kill
  a=b+c  #1      {1}    {3,4,5,7}
  d=e+f  #2      {2}    {5}
  f=a+c  #3      {3}    {}
  block totals:  DEF = {1,2}    KILL = {3,4,5,7}
Other blocks: g=a+d #5; h=b+c #6 | g=a+c #4 | j=a+b #7
  KILL = {}
  foreach instruction i
      KILL = KILL ∪ kill_i
  DEF = {}
  foreach instruction i
      DEF = (DEF ∪ gen_i) - kill_i      (gen_i is the def set of instruction i)
• Union all the defs: {1,2,3}. Remove any which appear in KILL => {1,2}
• def and kill ~ instruction
• DEF and KILL ~ basic block
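As a rough sketch, the two loops translate directly into Python sets. The function name `summarize_block` and the sample input are assumptions made for illustration; the input is a hypothetical three-instruction block, not the one on the slide.

```python
def summarize_block(per_instruction):
    """Fold per-instruction (def_i, kill_i) pairs, in program order, into block DEF and KILL."""
    block_def, block_kill = set(), set()
    for def_i, kill_i in per_instruction:
        # KILL accumulates everything any instruction in the block kills.
        block_kill |= kill_i
        # DEF adds this instruction's expression, then drops whatever it kills.
        block_def = (block_def | def_i) - kill_i
    return block_def, block_kill

# Hypothetical block: t = p + q (#1); u = t + r (#2); p = u + q (#3)
per_instruction = [({1}, {2}), ({2}, {3}), ({3}, {1})]
block_def, block_kill = summarize_block(per_instruction)
print("DEF =", sorted(block_def), "KILL =", sorted(block_kill))
```

In this sample, expression #1 (p + q) drops out of DEF because the last instruction assigns p, while #2 and #3 survive to the end of the block. The loop is order-sensitive: an expression computed after the instruction that killed it stays in DEF.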

Summarize DEF & KILL for Flowgraph
  instruction    def    kill         block DEF    block KILL
  a=b+c  #1      {1}    {3,4,5,7}
  d=e+f  #2      {2}    {5}          {1,2}        {3,4,5,7}
  f=a+c  #3      {3}    {}
  g=a+d  #5      {5}    {}           {5,6}        {}
  h=b+c  #6      {6}    {}
  g=a+c  #4      {4}    {}           {4}          {}
  j=a+b  #7

Available Expression Sets
• For each block b, define
  • DEF(b) – the set of expressions defined in b and not subsequently killed in b
    • ie: defined in b and survives to its end
    • can construct this by inspecting b in isolation - never changes
  • KILL(b) – the set of all expressions in the entire procedure that are killed in b
    • can construct this by inspecting b in isolation - never changes
  • AVAIL(b) – the set of expressions available on entry to b
    • find by solving a set of equations
• Implementation: assign a number to each expression and track its availability via one bit in a (large) bit-vector representing the set of all expressions in the function
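The bit-vector idea can be illustrated with Python integers standing in for the bit-vectors. This is a sketch under assumptions: the helpers `to_bits` and `to_set` are invented here, the DEF and KILL values are those shown for the first block of the example flowgraph, and the AVAIL value `{6}` is made up purely to exercise the operations.

```python
NUM_EXPRESSIONS = 7                       # expressions are numbered 1..7
UNIVERSE = (1 << NUM_EXPRESSIONS) - 1     # all bits set: every expression

def to_bits(numbers):
    """Pack a set of expression numbers into an integer, bit n-1 for expression n."""
    bits = 0
    for n in numbers:
        bits |= 1 << (n - 1)
    return bits

def to_set(bits):
    """Unpack an integer bit-vector back into a set of expression numbers."""
    return {i + 1 for i in range(bits.bit_length()) if (bits >> i) & 1}

def_p = to_bits({1, 2})
kill_p = to_bits({3, 4, 5, 7})
avail_p = to_bits({6})                    # hypothetical value, for illustration only

# Set difference is AND-NOT, union is OR, intersection is AND.
out_p = def_p | (avail_p & ~kill_p)
print(sorted(to_set(out_p)))
```

Each set operation in the dataflow equations becomes a single bitwise operation per machine word, which is why this representation is attractive for functions with many expressions.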

Computing Available Expressions
• $\mathrm{AVAIL}(b) = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}(p) \cup ( \mathrm{AVAIL}(p) - \mathrm{KILL}(p) ) \big)$
• preds(b) is the set of b's predecessors in the flowgraph
• works for all flows, including loops
• defines a system of simultaneous equations – a dataflow problem
[Figure: predecessors p1, p2, p3 of block b; each p contributes AVAIL(p), GEN(p) (its DEF set) and KILL(p) to AVAIL(b)]

Computing Available Expressions
Given DEFb and KILLb for each basic block, b, in the procedure:
  foreach block b {
      set AVAILb = {all expressions in function} = U
  }
  worklist = {all blocks in function}
  while (worklist ≠ {}) {
      remove a block b from worklist
      recompute AVAILb
      if AVAILb changed {
          worklist = worklist ∪ successorsb
      }
  }
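A minimal Python sketch of the worklist algorithm follows. Several things here are assumptions rather than facts from the slides: the diamond-shaped edges (B1 to B2 and B3, both to B4), the DEF set `{7}` for B4, and the explicit rule that AVAIL of the entry block is empty (nothing is available before the function starts). The function and variable names are invented for illustration.

```python
from collections import deque

def available_expressions(blocks, preds, succs, entry, DEF, KILL, universe):
    """Solve AVAIL(b) = intersection over preds p of DEF(p) | (AVAIL(p) - KILL(p))."""
    avail = {b: set(universe) for b in blocks}   # optimistic start: everything available
    avail[entry] = set()                         # assumption: nothing available on entry
    worklist = deque(blocks)
    while worklist:
        b = worklist.popleft()
        if b == entry:
            continue
        new = set(universe)
        for p in preds[b]:
            new &= DEF[p] | (avail[p] - KILL[p])
        if new != avail[b]:
            avail[b] = new
            # AVAIL(b) changed, so the successors of b must be recomputed.
            worklist.extend(s for s in succs[b] if s not in worklist)
    return avail

blocks = ["B1", "B2", "B3", "B4"]
succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}   # assumed edges
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
DEF = {"B1": {1, 2}, "B2": {5, 6}, "B3": {4}, "B4": {7}}             # B4 value assumed
KILL = {"B1": {3, 4, 5, 7}, "B2": set(), "B3": set(), "B4": set()}

avail = available_expressions(blocks, preds, succs, "B1", DEF, KILL, set(range(1, 8)))
for b in blocks:
    print(b, sorted(avail[b]))
```

Under these assumptions expression #1 (b+c) is in AVAIL(B2), which matches the earlier observation that h=b+c need not be recomputed.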

Name Space?
• In previous value-numbering algorithms, we used an SSA-like renaming to keep track of versions
• In global dataflow problems, we use the original namespace
  • we require that a+b have the same value along all paths to its use
  • if a or b is updated along any path to its use, then a+b has the 'wrong' value, so must recalculate its value
  • so original names are exactly what we want
• KILL captures when an expression becomes no longer "available"

Global CSE using Available Expressions
• Phase I
  • Number each expression in procedure
  • For each block b, compute DEFb and KILLb - once off
  • Initialize AVAILb = {all expressions in procedure} = U
  • For each block b, compute AVAILb, by iterating until fixed point
• Phase II
  • For each block b, value-number the block starting with AVAILb
  • Replace expressions in AVAILb with references to the previously computed values
• Also called "Global Redundancy Elimination" or GRE

Comparing Algorithms
• LVN – Local Value Numbering
• SVN – Superlocal Value Numbering
• DVN – Dominator-based Value Numbering
• GRE – Global Redundancy Elimination
Flowgraph blocks:
  A: m = a + b; n = a + b
  B: p = c + d; r = c + d
  C: q = a + b; r = c + d
  D: e = b + 18; s = a + b; u = e + f
  E: e = a + 17; t = c + d; u = e + f
  F: v = a + b; w = c + d; x = e + f
  G: y = a + b; z = c + d

Comparing Algorithms (2)
• LVN <= SVN <= DVN form a strict hierarchy
  • later algorithms find a superset of previous information
• Global Redundancy Elimination, via Available Expressions, finds a different set
  • Discovers e+f in F (computed in both D and E)
  • Misses identical values if they have different names
    • eg: a+b and c+d when a=c and b=d
    • Value Numbering catches this
Flowgraph fragment:
  D: e = b + 18; s = a + b; u = e + f
  E: e = a + 17; t = c + d; u = e + f
  F: v = a + b; w = c + d; x = e + f

Scope of Analysis
• Larger context (EBBs, regions, global, inter-proc) may help
  • More opportunities for optimizations
• But not always
  • Introduces uncertainties about flow of control
  • Usually only allows weaker analysis
  • Sometimes has unwanted side effects
    • Can create additional pressure on registers, for example

Code Replication
• Sometimes replicating code increases opportunities
  • modify code to create larger regions with simple control flow
• Two examples
  • Cloning
  • Inline substitution

Cloning: Before & After
[Figure: the original flowgraph (blocks A to G) beside the cloned version, in which copies of F and G are appended along separate paths to form larger blocks]
• Even LVN can optimize these larger basic blocks
• Larger code size => increased I-cache pressure

Inline Substitution ("inlining")
Calling a function can be expensive!
• Global optimizer must assume the callee can modify all reachable data:
  • In MiniJava, all fields of all objects
  • In C/C++, additionally all "global" data
  • So the call kills many available expressions
• Must save/restore caller registers across call
• Calling the function imposes its own overhead
Solution
• Inlining: replace each call with a copy of the callee
• Introduces more opportunities for optimization

Inline Substitution - "inlining"
Class with trivial getter:
    class C {
        int x;
        int getx() { return x; }
    }
Method f calls getx:
    class X {
        void f() {
            C c = new C();
            int total = c.getx() + 42;
        }
    }
Compiler inlines body of getx into f:
    class X {
        void f() {
            C c = new C();
            int total = c.x + 42;
        }
    }
• Eliminates call overhead
• Opens opportunities for more optimizations
• Can be applied to large method bodies too
• Aggressive optimizer will inline 2 or more deep
• Increases total code size
• With care, is a huge win for OO code
• Recompile if caller or callee changes!

Dataflow Analysis
• "Available Expressions" is a first example of dataflow analysis
  • It supports the optimization called "Global Redundancy Elimination", or GRE
• Many similar problems can be expressed in same framework
  • No limit to the number of execution paths thru a function
  • No limit to the length of an execution path
  • And yet, Dataflow Analysis infers a finite number of facts about the function
  • Dataflow Analysis does not distinguish among the paths taken to any point
    • eg: it assumes both arms of an IF-THEN-ELSE can be taken
  • We then use these facts to transform and optimize the IR
• Example facts about a single function
  • Variable x has the constant value 42 at every point
  • At point p, variable x has same value as variable y
  • At point p, value of x could have been defined only at point q
  • At point p, the value of x is no longer required

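Dataflow Equations - Overview
• Available Expressions
  • $\mathrm{AVAIL}_b = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}_p \cup ( \mathrm{AVAIL}_p - \mathrm{KILL}_p ) \big)$
• Live Variables
  • $\mathrm{LIVEIN}_b = \mathrm{USE}_b \cup ( \mathrm{LIVEOUT}_b - \mathrm{DEF}_b )$
• Reaching Defs
  • $\mathrm{REACH}_b = \bigcup_{p \in \mathrm{preds}(b)} \big( \mathrm{DEFOUT}_p \cup ( \mathrm{REACH}_p \cap \mathrm{SURVIVED}_p ) \big)$
• Anticipatable (Very Busy) Expressions
  • $\mathrm{ANTIC}_b = \bigcap_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}_s \cup ( \mathrm{ANTIC}_s - \mathrm{KILLED}_s ) \big)$
• Generic
  • $\mathrm{OUT}_b = \mathrm{GEN}_b \cup ( \mathrm{IN}_b - \mathrm{KILL}_b )$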

Dataflow Analysis
• Set of techniques for compile-time reasoning about runtime values
• Need to build a graph
  • Trivial for basic blocks
  • Flowgraph for whole-function (global) analysis
  • Callgraph for whole-program analysis
• Limitations
  • Assumes all paths are taken (eg: both arms of IF-THEN-ELSE)
  • Infers facts about a function, rather than actual runtime values
    • eg: x+y is redundant
  • Arrays – treats array as one variable
    • eg: don't know, in general, whether a[i] == a[j]
  • Pointers – difficult, expensive to analyze
    • eg: *p = 1; *q = 2; return *p; // same as return 1?

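Recap: Available Expressions
• Same analysis we did earlier to eliminate redundant expressions
• $\mathrm{AVAIL}_b = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}_p \cup ( \mathrm{AVAIL}_p - \mathrm{KILL}_p ) \big)$
[Figure: predecessors p1, p2, p3 of block b; each p contributes AVAILp, DEFp and KILLp to AVAILb]
Spring 2014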

Characterizing Dataflow Analysis
• All dataflow algorithms involve sets of facts about each block b
  • INb – facts true on entry to b
  • OUTb – facts true on exit from b
  • GENb – facts created and not killed in b
  • KILLb – facts killed in b
• These are related by the equation
  • $\mathrm{OUT}_b = \mathrm{GEN}_b \cup ( \mathrm{IN}_b - \mathrm{KILL}_b )$
  • Solve this iteratively for all blocks
  • Sometimes facts propagate forward (eg: available expressions)
  • Sometimes facts propagate backward (eg: live variables)
[Figure: block b with INb on entry, GENb and KILLb inside, OUTb on exit]
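One way to picture the shared framework is a single solver parameterized by how incoming facts are merged. The sketch below is an assumption-laden illustration, not a standard API: `solve`, `inputs`, `meet` and `start` are hypothetical names, iteration is a simple round-robin, and the two-block example with facts `d1` and `d2` is invented.

```python
def solve(blocks, inputs, gen, kill, meet, start):
    """Iterate OUT_b = GEN_b | (IN_b - KILL_b) until nothing changes.

    inputs[b] lists the blocks whose OUT feeds IN_b: predecessors for a
    forward problem, successors for a backward one (where the roles of
    IN and OUT are swapped relative to the block's entry and exit).
    """
    out = {b: set(start) for b in blocks}
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            feeds = [out[x] for x in inputs[b]]
            in_[b] = meet(feeds) if feeds else set()
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out

def union(sets):
    return set().union(*sets)

# Hypothetical forward "union" problem: B1 -> B2, and B2 loops back to itself.
blocks = ["B1", "B2"]
inputs = {"B1": [], "B2": ["B1", "B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}}
kill = {"B1": set(), "B2": {"d1"}}
in_, out = solve(blocks, inputs, gen, kill, union, start=set())
print(sorted(in_["B2"]), sorted(out["B2"]))
```

For an "intersection" problem such as available expressions, `meet` would instead be something like `lambda sets: set.intersection(*sets)` and `start` would be the full universe of facts.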

Live Variables (or "liveness")
• A variable v is live at point p if there is any path from p to a use of v along which v is not redefined
  • ie: a variable is live here if some later code uses its value there
• Some uses:
  • Register allocation – only live variables need a register
  • Only live variables need be stored back to memory
  • Detect use of uninitialized variables - how?
  • Improve SSA construction – only need Φ-function for variables that are live in a block (later)

Liveness Analysis Sets
• For each block b, define the sets:
  • USEb = variables used (ie, read-from) in b before any def
  • DEFb = variables defined (ie, assigned-to) in b & not subsequently killed in b
  • INb = variables live on entry to b
  • OUTb = variables live on exit from b

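Liveness - Intuition
[Figure: three small flowgraphs]
• Block B contains x= (DEFB = {x}) and successor C contains =x (USEC = {x}): x is "liveout"
• Block B contains x= (DEFB = {x}): x is not "liveout"
• Block B contains =x (USEB = {x}) and block C contains x= (DEFC = {x}): x is "livein"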

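Liveness Equations
• $\mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$
• $\mathrm{IN}_b = \mathrm{USE}_b \cup ( \mathrm{OUT}_b - \mathrm{DEF}_b )$
• Set INb = OUTb = {}
• Update IN and OUT until no change
• "backwards" dataflow analysis
[Figure: block b with USEb and DEFb, INb on entry and OUTb on exit; successors s1 and s2 contribute INs1 and INs2]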


Liveness Calculation
• $\mathrm{IN}_b = \mathrm{USE}_b \cup ( \mathrm{OUT}_b - \mathrm{DEF}_b )$
• $\mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$
  1: a = 0
  2: b = a + 1
  3: c = c + b
  4: a = b * 2
  5: a < N
  6: return c
• Work backwards from 6 to 1
• Note c is livein for block 1 - uninitialized!

Liveness Calculation
• $\mathrm{IN}_b = \mathrm{USE}_b \cup ( \mathrm{OUT}_b - \mathrm{DEF}_b )$
• $\mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$
  1: a = 0
  2: b = a + 1
  3: c = c + b
  4: a = b * 2
  5: a < N
  6: return c
• Work backwards from 6 to 1
• Only change in iteration 2 - a is livein for block 5
• Stops changing after 2 iterations
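The calculation can be reproduced with a short Python sketch. Two details are assumptions, since the flowgraph figure is not in the text: statement 5 is taken to branch back to statement 2 (and otherwise fall through to 6), and N is treated as a constant rather than a variable. Each statement is treated as its own block; the function name `liveness` is invented here.

```python
# Successors of each statement; the back edge 5 -> 2 is an assumption.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}

def liveness(succ, use, defs):
    """Iterate OUT_b = union of IN_s and IN_b = USE_b | (OUT_b - DEF_b) to a fixed point."""
    live_in = {b: set() for b in succ}
    live_out = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        # Visit blocks from 6 down to 1, since facts flow backwards.
        for b in sorted(succ, reverse=True):
            out_b = set().union(*(live_in[s] for s in succ[b]))
            in_b = use[b] | (out_b - defs[b])
            if out_b != live_out[b] or in_b != live_in[b]:
                live_out[b], live_in[b], changed = out_b, in_b, True
    return live_in, live_out

live_in, live_out = liveness(succ, use, defs)
for b in sorted(succ):
    print(b, "IN =", sorted(live_in[b]), "OUT =", sorted(live_out[b]))
```

Under these assumptions c ends up in the IN set of statement 1, which is the uninitialized-variable signal noted on the slide.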


Alternate Liveness Equations
• Many problems have more than one formulation
• Different books use different sets:
  • USED[b] – variables used in b before being defined in b
  • NOTDEF[b] – variables not defined in b
  • LIVE[b] – variables live on exit from b
• Equation
  • $\mathrm{LIVE}[b] = \bigcup_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}[s] \cup ( \mathrm{LIVE}[s] \cap \mathrm{NOTDEF}[s] ) \big)$
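The two formulations describe the same sets. A short derivation, assuming V denotes the set of all variables in the function (a symbol introduced here) and reading LIVE[b] as OUT, USED as USE:

```latex
\begin{align*}
\mathrm{NOTDEF}[s] &= V - \mathrm{DEF}[s] \\
\mathrm{LIVE}[s] \cap \mathrm{NOTDEF}[s] &= \mathrm{LIVE}[s] - \mathrm{DEF}[s]
  && \text{since } \mathrm{LIVE}[s] \subseteq V \\
\mathrm{LIVE}[b] &= \bigcup_{s \in \mathrm{succ}(b)}
  \big( \mathrm{USED}[s] \cup ( \mathrm{LIVE}[s] - \mathrm{DEF}[s] ) \big) \\
 &= \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s = \mathrm{OUT}_b
\end{align*}
```

So the single LIVE equation is just the OUT equation with the IN equation substituted into it.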

Reaching Defs
• A definition of variable v at L1 reaches instruction at L2 if that instruction uses v and there is a path from L1 to L2 that does not re-define v
• Use:
  • Find all possible defs for a variable in an expression - great debugger plugin when looking for 'culprit'

Equations for Reaching Defs
• Sets
  • DEFOUTb – set of defs in b that reach the end of b
  • SURVIVEDb – set of all defs not killed by a re-def in b
  • REACHb – set of defs that reach b
• Equation
  • $\mathrm{REACH}_b = \bigcup_{p \in \mathrm{preds}(b)} \big( \mathrm{DEFOUT}_p \cup ( \mathrm{REACH}_p \cap \mathrm{SURVIVED}_p ) \big)$
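A small Python sketch of the equation, on a made-up example: block B1 holds d1: x = 0 and d2: y = 1, block B2 holds d3: x = x + y and loops back to itself, and B3 follows B2. The flowgraph, the def labels and the function name `reaching_defs` are all assumptions for illustration.

```python
def reaching_defs(preds, defout, survived):
    """Iterate REACH_b = union over preds p of DEFOUT_p | (REACH_p & SURVIVED_p)."""
    reach = {b: set() for b in preds}
    changed = True
    while changed:
        changed = False
        for b in preds:
            new = set()
            for p in preds[b]:
                new |= defout[p] | (reach[p] & survived[p])
            if new != reach[b]:
                reach[b], changed = new, True
    return reach

preds = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
defout = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": set()}
# B1 redefines x and y, so no def survives it; B2 redefines only x, so d2 (a def of y) survives.
survived = {"B1": set(), "B2": {"d2"}, "B3": {"d1", "d2", "d3"}}

reach = reaching_defs(preds, defout, survived)
for b in preds:
    print(b, sorted(reach[b]))
```

Because of the loop, both d1 and d3 reach the use of x at the top of B2, while only d3 reaches B3.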

Anticipated Expressions
• Also known as "Very Busy" Expressions
• Expression x+y is anticipated at point p if
  • all paths from p eventually compute x+y, using values of x and y as they exist at p
• Use:
  • Code hoisting – move x+y to p
  • reduces code size; no effect on execution time

Equations for: Anticipated Expressions
• Sets
  • USEDb – expressions used in b before they are killed
  • KILLEDb – expressions def'd in b before they are used
  • ANTICb – anticipated expressions on exit from b
• Equation
  • $\mathrm{ANTIC}_b = \bigcap_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}_s \cup ( \mathrm{ANTIC}_s - \mathrm{KILLED}_s ) \big)$
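The equation can be tried out on a hypothetical diamond: B1 branches to B2 and B3, which both fall into B4. Both arms compute x+y, and only B2 computes x*z. The flowgraph, the choice to give exit blocks an empty ANTIC set, and the name `anticipated` are assumptions of this sketch.

```python
def anticipated(succs, used, killed, universe):
    """Iterate ANTIC_b = intersection over succs s of USED_s | (ANTIC_s - KILLED_s)."""
    # Blocks with no successors anticipate nothing on exit; others start optimistic.
    antic = {b: (set(universe) if succs[b] else set()) for b in succs}
    changed = True
    while changed:
        changed = False
        for b in succs:
            if not succs[b]:
                continue
            new = set(universe)
            for s in succs[b]:
                new &= used[s] | (antic[s] - killed[s])
            if new != antic[b]:
                antic[b], changed = new, True
    return antic

succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
used = {"B1": set(), "B2": {"x+y", "x*z"}, "B3": {"x+y"}, "B4": set()}
killed = {b: set() for b in succs}

antic = anticipated(succs, used, killed, {"x+y", "x*z"})
for b in succs:
    print(b, sorted(antic[b]))
```

Here x+y is anticipated on exit from B1, so it is a candidate for hoisting into B1; x*z is not, because the path through B3 never computes it.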

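Dataflow Equations - Recap
• Available Expressions
  • $\mathrm{AVAIL}_b = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}_p \cup ( \mathrm{AVAIL}_p - \mathrm{KILL}_p ) \big)$
• Live Variables
  • $\mathrm{LIVEIN}_b = \mathrm{USE}_b \cup ( \mathrm{LIVEOUT}_b - \mathrm{DEF}_b )$
• Reaching Defs
  • $\mathrm{REACH}_b = \bigcup_{p \in \mathrm{preds}(b)} \big( \mathrm{DEFOUT}_p \cup ( \mathrm{REACH}_p \cap \mathrm{SURVIVED}_p ) \big)$
• Anticipated Expressions
  • $\mathrm{ANTIC}_b = \bigcap_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}_s \cup ( \mathrm{ANTIC}_s - \mathrm{KILLED}_s ) \big)$
• Generic
  • $\mathrm{OUT}_b = \mathrm{GEN}_b \cup ( \mathrm{IN}_b - \mathrm{KILL}_b )$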

Efficiency of Dataflow Analysis
• The algorithms eventually terminate
  • but reduce time needed by picking a good order to visit nodes in the flowgraph
  • depends on how information flows
    • Forward problems – reverse post-order
    • Backward problems - post-order
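Both visiting orders come from one depth-first walk of the flowgraph. The sketch below uses a hypothetical four-block graph (a diamond whose join block branches back to the entry); the block names and the function name `postorder` are assumptions.

```python
def postorder(succs, entry):
    """Depth-first walk from entry; a block is emitted after all its unvisited successors."""
    seen, order = set(), []

    def visit(b):
        seen.add(b)
        for s in succs[b]:
            if s not in seen:
                visit(s)
        order.append(b)

    visit(entry)
    return order

succs = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["entry"],     # back edge forming a loop
}

post = postorder(succs, "entry")
reverse_post = list(reversed(post))
print("post-order:        ", post)          # suits backward problems
print("reverse post-order:", reverse_post)  # suits forward problems
```

In reverse post-order a block is visited before its successors (back edges aside), so a forward analysis usually sees up-to-date predecessor facts on the first pass. The recursive walk is fine for a sketch; a very deep flowgraph would call for an explicit stack.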

Using Dataflow Facts
Some possible Transformations/Optimizations ...

CSE Elimination
• x+y is defined at L1 and available at L2
  • ie: neither x nor y is re-defined between L1 and L2
before:
    L1: a = x+y
    L2:   = x+y
after:
    L1: a = x+y
        t = a       ; save calculation into temp t
    L2:   = t       ; use t, rather than re-calculate
• Analysis: Available Expressions
• Code runs faster by not re-calculating x+y

Constant Prop.
• c is a constant
• a reaches L2 (a is not re-defined between L1 and L2)
before:
    L1: a = c
    L2:   = a
after:
    L1: a = c
    L2:   = c       ; propagate c, not a
• Analysis: Reaching Defs
• Code runs faster because c is embedded into the instruction

Copy Prop.
• x is a variable
• a reaches L2 (a is not re-defined between L1 and L2), and is the only def of a to reach L2
before:
    L1: a = x
    L2:   = a
after:
    L1: a = x
    L2:   = x       ; propagate x, not a
• Analysis: Reaching Defs
• Code may run faster: uses of a become uses of x, which can leave the copy a = x dead and removable

Copy Prop. Tradeoffs
• Downside: can lengthen lifetime of variable x
  • => more register pressure or memory traffic thru spilling
  • not worth doing if only reason is to eliminate copies – let the register allocator deal with that
• Upside: may expose other optimizations. Eg:
before:
    a = y + z
    u = y
    c = u + z
after:
    a = y + z
    u = y
    c = y + z       ; now reveals CSE y+z

Dead Code Elimination (DCE)
• a is dead after L1
• statement at L1 has no side-effects (output, exceptions, etc)
before:
    L1: a = x
after:
    (statement at L1 deleted)
• Analysis: Liveness
• Code runs faster because there is one fewer statement
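Within one basic block the transformation can be sketched as a single backward scan that tracks liveness as it goes. This is an illustration under assumptions: every statement is a side-effect-free assignment written as `(target, operands)`, the set of variables live on exit from the block is supplied by the caller, and the name `eliminate_dead` is invented here.

```python
def eliminate_dead(block, live_out):
    """Drop assignments whose target is not live immediately after the statement."""
    live = set(live_out)
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # the value before this def is not needed
            live.update(operands)     # but the operands are
        # otherwise the target is dead after this statement, so it is deleted
    kept.reverse()
    return kept

# Hypothetical block: a = x; b = y + z; a = b, with only a live on exit.
block = [("a", ("x",)), ("b", ("y", "z")), ("a", ("b",))]
optimized = eliminate_dead(block, live_out={"a"})
print(optimized)
```

The first assignment to a is removed because a is overwritten before any later statement reads it.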
