Optimizing Compilers, CISC 673, Spring 2011
Static Single Assignment II
John Cavazos, University of Delaware
What is SSA?
• Many data-flow problems have been formulated
• To limit the number of analyses, use a single analysis to perform multiple transformations → SSA
• Compiler optimization algorithms enhanced or enabled by SSA:
  • Constant propagation
  • Dead code elimination
  • Global value numbering
  • Partial redundancy elimination
  • Register allocation
SSA Construction Algorithm (High-level sketch)
1. Insert φ-functions
2. Rename values
SSA Construction Algorithm (Less high-level sketch)
• Insert φ-functions at every join for every name
• Solve reaching definitions
• Rename each use to the def that reaches it (will be unique)
This builds maximal SSA.
What's wrong with this approach?
• Too many φ-functions!
To do better, we need a more complex approach.
SSA Construction Algorithm (Less high-level sketch)
1. Insert φ-functions
   a.) calculate dominance frontiers
   b.) find global names; for each name, build a list of blocks that define it
   c.) insert φ-functions
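Step (b) can be sketched in a few lines. The version below is a minimal illustration, not the course's code: it treats a name as global when it is used in a block before any assignment in that block, and it records the defining blocks of every name. The `(target, operands)` encoding, the function name `find_global_names`, and the choice to model the elided right-hand sides ("...") as having no operands are all assumptions.

```python
def find_global_names(blocks):
    """blocks maps a block label to its operations, each a (target, operands) pair."""
    global_names = set()
    defining_blocks = {}                 # name -> set of blocks that assign it
    for label, ops in blocks.items():
        defined_here = set()             # names already assigned earlier in this block
        for target, operands in ops:
            for name in operands:
                if name not in defined_here:
                    global_names.add(name)   # value must flow in from another block
            defined_here.add(target)
            defining_blocks.setdefault(target, set()).add(label)
    return global_names, defining_blocks

# Hypothetical encoding of the deck's running example (B0 to B7).
# Elided right-hand sides are modelled as empty operand lists (an assumption).
example = {
    "B0": [("i", [])],
    "B1": [("a", []), ("c", [])],
    "B2": [("b", []), ("c", []), ("d", [])],
    "B3": [("a", []), ("d", [])],
    "B4": [("d", [])],
    "B5": [("c", [])],
    "B6": [("b", [])],
    "B7": [("y", ["a", "b"]), ("z", ["c", "d"]), ("i", ["i"])],
}
global_names, defining_blocks = find_global_names(example)
```

Under these assumptions `y` and `z` are assigned but never used before a local definition, so they stay out of `global_names` and receive no φ-functions.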
Insert φ-functions
for each global name n
    worklist ← Block(n) // blocks in which n is assigned
    for each block b ∈ worklist
        for each block d in b's dominance frontier
            insert a φ-function for n in d
            add d to worklist
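The worklist loop above might look as follows in Python. This is a sketch under stated assumptions: `insert_phis` and its dictionary-based inputs are hypothetical names, a block is re-queued only when a φ-function is newly placed there (so the loop terminates), and the frontier table is typed in by hand. DF(4), DF(6), DF(7) and DF(1) follow the values the deck gives for its example; the other entries are assumed from a CFG inferred from the renaming walkthrough.

```python
def insert_phis(global_names, defining_blocks, df):
    """Return a map block -> set of names that need a φ-function there."""
    phis = {}
    for n in global_names:
        worklist = list(defining_blocks.get(n, ()))   # blocks in which n is assigned
        while worklist:
            b = worklist.pop()
            for d in df[b]:
                names = phis.setdefault(d, set())
                if n not in names:
                    names.add(n)          # the φ is itself a new definition of n in d
                    worklist.append(d)    # so d's frontier must be visited too
    return phis

# Hand-written inputs for the deck's example (partly assumed, see above).
df = {"B0": set(), "B1": set(), "B2": {"B7"}, "B3": {"B7"},
      "B4": {"B6"}, "B5": {"B6"}, "B6": {"B7"}, "B7": {"B1"}}
defining_blocks = {"a": {"B1", "B3"}, "b": {"B2", "B6"}, "c": {"B1", "B2", "B5"},
                   "d": {"B2", "B3", "B4"}, "i": {"B0", "B7"}}
phis = insert_phis({"a", "b", "c", "d", "i"}, defining_blocks, df)
```

With these inputs the result places φ-functions for a, b, c, d and i in B1, for c and d in B6, and for a, b, c and d in B7.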
Computing Dominance Frontiers
• Only join points are in DF(n) for some n
• Simple algorithm for computing dominance frontiers:
  For each join point x (i.e., |preds(x)| > 1)
    For each CFG predecessor of x
      Run up to IDOM(x) in the dominator tree,
      adding x to DF(n) for each n between x and IDOM(x)
      (that is, each n on the walk from the predecessor up to, but not including, IDOM(x))
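A direct transcription of this walk, as a hedged sketch: the function name, the `preds` and `idom` dictionaries, and the eight-block CFG are assumptions (the CFG is inferred from the φ-arguments in the deck's renaming walkthrough, since the figure itself is not reproduced in the text). Note that, as written, the walk places a loop header such as B1 in its own frontier, whereas the deck's example lists DF(1) = Ø; φ placement is the same either way.

```python
def dominance_frontiers(preds, idom):
    """preds: block -> list of CFG predecessors; idom: block -> immediate dominator."""
    df = {b: set() for b in preds}
    for x, ps in preds.items():
        if len(ps) > 1:                      # only join points appear in any DF set
            for p in ps:
                runner = p
                while runner != idom[x]:     # climb the dominator tree up to IDOM(x)
                    df[runner].add(x)
                    runner = idom[runner]
    return df

# Assumed CFG: B0->B1, B1->{B2,B3}, B2->B7, B3->{B4,B5}, B4->B6, B5->B6, B6->B7, B7->B1
preds = {"B0": [], "B1": ["B0", "B7"], "B2": ["B1"], "B3": ["B1"],
         "B4": ["B3"], "B5": ["B3"], "B6": ["B4", "B5"], "B7": ["B2", "B6"]}
idom = {"B0": None, "B1": "B0", "B2": "B1", "B3": "B1",
        "B4": "B3", "B5": "B3", "B6": "B3", "B7": "B1"}
df = dominance_frontiers(preds, idom)
```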
Dominance Frontiers
For each join point x
  For each CFG pred of x
    Run to IDOM(x) in dom tree,
    add x to DF(n) for each n between x and IDOM(x)
[Figure: two panels, "Flow Graph" and "Dominance Tree", over blocks B0 to B7]
Dominance Frontiers & φ-Function Insertion
• A definition at n forces a φ-function at m iff
  • n ∉ DOM(m) but n ∈ DOM(p) for some p ∈ preds(m)
• DF(n) is the fringe just beyond the region n dominates
Dominance Frontiers
• DF(4) is {6}, so ← in 4 forces a φ-function in 6
• ← in 6 forces a φ-function in DF(6) = {7}
• ← in 7 forces a φ-function in DF(7) = {1}
• ← in 1 forces a φ-function in DF(1) = Ø (halt)
For each assignment, we insert the φ-functions
[Figure: flow graph B0 to B7 with x ← ... in B4 and x ← φ(...) in B6, B7, and B1]
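The "iff" condition can also be checked literally. The sketch below is illustrative only: `dom_set`, `df_by_definition`, and the CFG are assumptions (the CFG is inferred from the deck's walkthrough), and DOM(m) is taken to be the set of blocks that dominate m, including m itself. Read this way, the condition reproduces the four values listed on the slide, including DF(1) = Ø; the common variant based on strict dominance would instead put the loop header B1 in its own frontier.

```python
def dom_set(idom, b):
    """All dominators of b (b included), found by climbing the dominator tree."""
    out = set()
    while b is not None:
        out.add(b)
        b = idom[b]
    return out

def df_by_definition(preds, idom):
    dom = {b: dom_set(idom, b) for b in preds}
    return {
        n: {m for m in preds
            if n not in dom[m]                       # n does not dominate m ...
            and any(n in dom[p] for p in preds[m])}  # ... but dominates a pred of m
        for n in preds
    }

# Assumed CFG and dominator tree for blocks B0 to B7.
preds = {"B0": [], "B1": ["B0", "B7"], "B2": ["B1"], "B3": ["B1"],
         "B4": ["B3"], "B5": ["B3"], "B6": ["B4", "B5"], "B7": ["B2", "B6"]}
idom = {"B0": None, "B1": "B0", "B2": "B1", "B3": "B1",
        "B4": "B3", "B5": "B3", "B6": "B3", "B7": "B1"}
df_def = df_by_definition(preds, idom)
```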
B0: i ← ...
B1: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); i ← φ(i,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Assume a, b, c, & d defined before B0
• With all the φ-functions
  • Lots of new ops
  • Renaming is next
• Excluding local names avoids φ's for y & z
SSA Construction Algorithm (Less high-level sketch)
2. Rename variables in a pre-order walk over the dominator tree (uses a counter and a stack per global name)
Starting with the root block, b:
   a.) generate unique names for each φ-function and push them on the appropriate stacks
SSA Construction Algorithm (Less high-level sketch)
• Rename variables (cont'd)
   b.) rewrite each operation in the block
       i. Rewrite uses of global names with the current version (from the stack)
       ii. Rewrite the definition by creating & pushing a new name
   c.) fill in φ-function parameters of successor blocks
   d.) recurse on b's children in the dominator tree
   e.) <on exit from block b> pop names generated in b from stacks
SSA Construction Algorithm
Adding all the details ...

for each global name i
    counter[i] ← 0
    stack[i] ← Ø
call Rename(n0)

NewName(n)
    i ← counter[n]
    counter[n] ← counter[n] + 1
    push n_i onto stack[n]
    return n_i

Rename(b)
    for each φ-function in b, x ← φ(…)
        rename x as NewName(x)
    for each operation "x ← y op z" in b
        rewrite y as top(stack[y])
        rewrite z as top(stack[z])
        rewrite x as NewName(x)
    for each successor of b in the CFG
        rewrite appropriate φ parameters
    for each successor s of b in dom. tree
        Rename(s)
    for each operation "x ← y op z" and each φ-function "x ← φ(…)" in b
        pop(stack[x])
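The pseudocode above can be exercised on the deck's example with a short Python sketch. Everything about the encoding is an assumption made for illustration: the dictionary layout for blocks and φ-functions, the helper names, the CFG and dominator-tree child order (inferred from the order of the walkthrough slides), and the modelling of elided right-hand sides as operations with no operands. Names defined before B0 are given version 0 up front.

```python
def rename(blocks, succs, preds, dom_children, global_names, entry, predefined=()):
    counter = {n: 0 for n in global_names}
    stack = {n: [] for n in global_names}

    def new_name(n):
        name = f"{n}{counter[n]}"
        counter[n] += 1
        stack[n].append(name)
        return name

    def current(n):
        # Local names have no stack and are left untouched.
        return stack[n][-1] if n in stack else n

    def walk(b):
        pushed = []                                    # names created while in b
        for phi in blocks[b]["phis"]:
            pushed.append(phi["var"])
            phi["dest"] = new_name(phi["var"])
        renamed_ops = []
        for target, operands in blocks[b]["ops"]:
            operands = [current(y) for y in operands]  # rewrite uses first
            if target in stack:
                pushed.append(target)
                target = new_name(target)              # then the definition
            renamed_ops.append((target, operands))
        blocks[b]["ops"] = renamed_ops
        for s in succs[b]:
            slot = preds[s].index(b)                   # which φ argument b feeds
            for phi in blocks[s]["phis"]:
                phi["args"][slot] = current(phi["var"])
        for child in dom_children[b]:
            walk(child)
        for n in pushed:                               # on exit, undo b's pushes
            stack[n].pop()

    for n in predefined:                               # names defined before entry
        new_name(n)
    walk(entry)
    return counter

def phi(var):
    # Hypothetical φ record for a two-predecessor join block.
    return {"var": var, "dest": var, "args": [var, var]}

blocks = {
    "B0": {"phis": [], "ops": [("i", [])]},
    "B1": {"phis": [phi(v) for v in "abcdi"], "ops": [("a", []), ("c", [])]},
    "B2": {"phis": [], "ops": [("b", []), ("c", []), ("d", [])]},
    "B3": {"phis": [], "ops": [("a", []), ("d", [])]},
    "B4": {"phis": [], "ops": [("d", [])]},
    "B5": {"phis": [], "ops": [("c", [])]},
    "B6": {"phis": [phi("d"), phi("c")], "ops": [("b", [])]},
    "B7": {"phis": [phi(v) for v in "abcd"],
           "ops": [("y", ["a", "b"]), ("z", ["c", "d"]), ("i", ["i"])]},
}
succs = {"B0": ["B1"], "B1": ["B2", "B3"], "B2": ["B7"], "B3": ["B4", "B5"],
         "B4": ["B6"], "B5": ["B6"], "B6": ["B7"], "B7": ["B1"]}
preds = {"B0": [], "B1": ["B0", "B7"], "B2": ["B1"], "B3": ["B1"],
         "B4": ["B3"], "B5": ["B3"], "B6": ["B4", "B5"], "B7": ["B2", "B6"]}
dom_children = {"B0": ["B1"], "B1": ["B2", "B3", "B7"], "B2": [],
                "B3": ["B4", "B5", "B6"], "B4": [], "B5": [], "B6": [], "B7": []}
counters = rename(blocks, succs, preds, dom_children, set("abcdi"), "B0",
                  predefined="abcd")
```

The walk pops φ-function targets as well as ordinary definitions on exit from a block, which is what keeps the stacks correct when the traversal moves from B6 back up to B7.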
Before processing B0
B0: i ← ...
B1: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); i ← φ(i,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=1, b=1, c=1, d=1, i=0
Stacks (top at right): a: a0 | b: b0 | c: c0 | d: d0 | i: (empty)
Assume a, b, c, & d defined before B0; i has not been defined.
End of B0
B0: i0 ← ...
B1: a ← φ(a0,a); b ← φ(b0,b); c ← φ(c0,c); d ← φ(d0,d); i ← φ(i0,i); a ← ...; c ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=1, b=1, c=1, d=1, i=1
Stacks (top at right): a: a0 | b: b0 | c: c0 | d: d0 | i: i0
End of B1
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b ← ...; c ← ...; d ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a,a); b ← φ(b,b); c ← φ(c,c); d ← φ(d,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=3, b=2, c=3, d=2, i=2
Stacks (top at right): a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1
End of B2
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=3, b=3, c=4, d=3, i=2
Stacks (top at right): a: a0 a1 a2 | b: b0 b1 b2 | c: c0 c1 c2 c3 | d: d0 d1 d2 | i: i0 i1
Before starting B3
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a ← ...; d ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1
Edge labels: i > 100, i ≤ 100
Counters: a=3, b=3, c=4, d=3, i=2
Stacks (top at right): a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1
End of B3
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d ← ...
B5: c ← ...
B6: d ← φ(d,d); c ← φ(c,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=4, b=3, c=4, d=4, i=2
Stacks (top at right): a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 d3 | i: i0 i1
End of B4
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c ← ...
B6: d ← φ(d4,d); c ← φ(c2,c); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=4, b=3, c=4, d=5, i=2
Stacks (top at right): a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 d3 d4 | i: i0 i1
End of B5
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d ← φ(d4,d3); c ← φ(c2,c4); b ← ...
B7: a ← φ(a2,a); b ← φ(b2,b); c ← φ(c3,c); d ← φ(d2,d); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=4, b=3, c=5, d=5, i=2
Stacks (top at right): a: a0 a1 a2 a3 | b: b0 b1 | c: c0 c1 c2 c4 | d: d0 d1 d3 | i: i0 i1
End of B6
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a ← φ(a2,a3); b ← φ(b2,b3); c ← φ(c3,c5); d ← φ(d2,d5); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=4, b=4, c=6, d=6, i=2
Stacks (top at right): a: a0 a1 a2 a3 | b: b0 b1 b3 | c: c0 c1 c2 c5 | d: d0 d1 d3 d5 | i: i0 i1
Before B7
B0: i0 ← ...
B1: a1 ← φ(a0,a); b1 ← φ(b0,b); c1 ← φ(c0,c); d1 ← φ(d0,d); i1 ← φ(i0,i); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a ← φ(a2,a3); b ← φ(b2,b3); c ← φ(c3,c5); d ← φ(d2,d5); y ← a+b; z ← c+d; i ← i+1
Edge label: i > 100
Counters: a=4, b=4, c=6, d=6, i=2
Stacks (top at right): a: a0 a1 a2 | b: b0 b1 | c: c0 c1 c2 | d: d0 d1 | i: i0 i1
End of B7
B0: i0 ← ...
B1: a1 ← φ(a0,a4); b1 ← φ(b0,b4); c1 ← φ(c0,c6); d1 ← φ(d0,d6); i1 ← φ(i0,i2); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a4 ← φ(a2,a3); b4 ← φ(b2,b3); c6 ← φ(c3,c5); d6 ← φ(d2,d5); y ← a4+b4; z ← c6+d6; i2 ← i1+1
Edge label: i > 100
Counters: a=5, b=5, c=7, d=7, i=3
Stacks (top at right): a: a0 a1 a2 a4 | b: b0 b1 b4 | c: c0 c1 c2 c6 | d: d0 d1 d6 | i: i0 i1 i2
• After renaming
• Semi-pruned SSA form
• We're done …
B0: i0 ← ...
B1: a1 ← φ(a0,a4); b1 ← φ(b0,b4); c1 ← φ(c0,c6); d1 ← φ(d0,d6); i1 ← φ(i0,i2); a2 ← ...; c2 ← ...
B2: b2 ← ...; c3 ← ...; d2 ← ...
B3: a3 ← ...; d3 ← ...
B4: d4 ← ...
B5: c4 ← ...
B6: d5 ← φ(d4,d3); c5 ← φ(c2,c4); b3 ← ...
B7: a4 ← φ(a2,a3); b4 ← φ(b2,b3); c6 ← φ(c3,c5); d6 ← φ(d2,d5); y ← a4+b4; z ← c6+d6; i2 ← i1+1
Edge label: i > 100
Semi-pruned ⇒ only names live in 2 or more blocks are "global names".
SSA Construction Algorithm (Pruned SSA)
What's this "pruned SSA" stuff?
• Minimal SSA still contains extraneous φ-functions
• Inserts some φ-functions where they are dead
• Would like to avoid inserting them
SSA Construction Algorithm (Two Ideas)
• Semi-pruned SSA: discard names used in only one block
  • Significant reduction in total number of φ-functions
  • Needs only local Live information (cheap to compute)
• Pruned SSA: only insert φ-functions where their value is live
  • Inserts even fewer φ-functions, but costs more to do
  • Requires global Live variable analysis (more expensive)
In practice, both are simple modifications to step 1.
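To illustrate how small the change to step 1 can be, here is a hedged sketch of the pruned variant: the worklist loop is the same, but a φ-function is placed only where the name is live on entry. The function name, the `live_in` table (assumed to come from a separately computed global liveness analysis), and the four-block diamond CFG are all hypothetical.

```python
def insert_phis_pruned(defining_blocks, df, live_in):
    """Place a φ for n in d only if n is live on entry to d."""
    phis = {}
    for n, def_blocks in defining_blocks.items():
        worklist = list(def_blocks)
        while worklist:
            b = worklist.pop()
            for d in df[b]:
                # Liveness is tested first, so dead names never create an entry.
                if n in live_in[d] and n not in phis.setdefault(d, set()):
                    phis[d].add(n)
                    worklist.append(d)
    return phis

# Hypothetical diamond: B0 -> {B1, B2} -> B3.
# x and t are both assigned in B1 and B2, but only x is used in B3.
df = {"B0": set(), "B1": {"B3"}, "B2": {"B3"}, "B3": set()}
defining_blocks = {"x": {"B1", "B2"}, "t": {"B1", "B2"}}
live_in = {"B0": set(), "B1": set(), "B2": set(), "B3": {"x"}}
phis = insert_phis_pruned(defining_blocks, df, live_in)
```

In this toy case the frontier-only placement would put φ-functions for both x and t in B3; the liveness test drops the one for t.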
SSA Deconstruction
At some point, we need executable code
• Few machines implement φ operations
• Need to fix up the flow of values
Basic idea
• Insert copies in the φ-function's predecessors
• Simple algorithm
  • Works in most cases
• Adds lots of copies
  • Many of them coalesce away
[Figure: x17 ← φ(x10, x11), followed by a use of x17, is replaced by the copies x17 ← x10 and x17 ← x11 in the two predecessor blocks]
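The basic idea can be sketched as below. This is only the naive scheme, under loudly stated assumptions: each block's `ops` list excludes its terminating branch (so appending a copy is safe), the CFG has no critical edges, and the block labels `P1`, `P2`, `J` and the function name are hypothetical. The cases where simple copy insertion goes wrong are not handled.

```python
def remove_phis(blocks, preds):
    """Replace each φ with one copy at the end of every predecessor block."""
    for label, blk in blocks.items():
        for dest, args in blk["phis"]:
            for p, arg in zip(preds[label], args):
                blocks[p]["ops"].append((dest, [arg]))   # copy: dest ← arg
        blk["phis"] = []
    return blocks

# Hypothetical encoding of the slide's example: x17 ← φ(x10, x11).
blocks = {
    "P1": {"phis": [], "ops": [("x10", [])]},
    "P2": {"phis": [], "ops": [("x11", [])]},
    "J":  {"phis": [("x17", ["x10", "x11"])], "ops": [("y", ["x17"])]},
}
preds = {"P1": [], "P2": [], "J": ["P1", "P2"]}
remove_phis(blocks, preds)
```

After the call, each predecessor ends with a copy into x17 and the join block uses x17 directly; a later coalescing pass could remove many such copies.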