Optimizations using SSA

Optimizations using SSA CS 671 March 25, 2008

Last Time – SSA Form
• Generating SSA form
• Inserting φ-functions using dominance frontiers
• Renaming variables

Before SSA:
    B1: if (…)
    B2: X ← 5
    B3: X ← 3
    B4: Y ← X

After SSA:
    B1: if (…)
    B2: X0 ← 5
    B3: X1 ← 3
    B4: X2 ← φ(X0, X1)
        Y0 ← X2

[Figure: SSA renaming example. A CFG with blocks B0–B7 and its dominance tree. φ-functions for a, b, c, d and i are placed at the join blocks (for example a1 ← φ(a0, a4) and i1 ← φ(i0, i2) in B1, d5 ← φ(d4, d3) and c5 ← φ(c2, c4) at an inner join). The last block computes y ← a4+b4, z ← c6+d6, i2 ← i1+1 and tests i > 100. The renaming pass keeps one counter and one stack per variable a, b, c, d, i.]

Today – Using SSA in Optimizations • SSA simplifies many optimization algorithms • Simplifies def-use chains • Examples: Dead code elimination, constant propagation

Dead-Code Elimination
• Dead code is either:
  • Unreachable code
  • Assignments where the result is never used
• Examples
  • “y in 1” is dead
  • “x in 1” is partially dead: dead along path 1-2-4 but not along 1-3-4
  • “z in 4, 5” is never used in relevant computations (here, only x and y are relevant)

[Figure: CFG with nodes 1–6. Node 1: x = a+b; y = c+d. Nodes 2 and 3 are the two branches, one of which redefines x (x = …). Node 4: y = …; z = x+y. Node 5: z = z+1. Node 6: out(x, y).]

Dead-Code Elimination
• SSA makes dead-code analysis particularly simple
• Defn: A variable is live at its definition iff its list of uses is not empty
  • There can be no other definition of the variable
  • The definition of a variable dominates every use
  • So there must be a path from definition to use
• Algorithm:
    while (there is some variable v with no uses &&
           the statement that defines v has no side effects)
        delete the statement that defines v
• When deleting v ← x op y or v ← φ(x, y), remove the deleted statement from the use lists of x and y

Dead-Code Elimination in SSA Form
    W ← a list of all variables in the SSA program
    while W is not empty
        remove some variable v from W
        if v’s list of uses is empty
            let S be v’s statement of definition
            if S has no other side effects
                delete S from the program
                for each variable xi used by S
                    delete S from the list of uses of xi
                    W ← W ∪ {xi}
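
The work-list algorithm on this slide can be sketched in Python. This is a minimal sketch under assumptions that are not in the slides: each SSA statement is represented only by the variable it defines and the variables it uses, and a statement with side effects (such as an output call) is modeled as a pseudo-definition listed in `side_effects`. The names `dead_code_elimination`, `stmts` and `side_effects` are hypothetical.

```python
from collections import defaultdict

def dead_code_elimination(stmts, side_effects=frozenset()):
    """stmts maps each SSA variable to the tuple of variables that its
    defining statement uses. Returns the statements that survive."""
    stmts = dict(stmts)
    # uses[x] = set of variables whose defining statement uses x
    uses = defaultdict(set)
    for v, operands in stmts.items():
        for x in operands:
            uses[x].add(v)
    worklist = list(stmts)                 # W <- all variables
    while worklist:
        v = worklist.pop()
        if v not in stmts or uses[v] or v in side_effects:
            continue                       # undefined here, still used, or must stay
        operands = stmts.pop(v)            # delete S from the program
        for x in operands:
            uses[x].discard(v)             # delete S from the use list of x
            worklist.append(x)             # x may have just become dead
    return stmts

# Hypothetical program: z1 and z2 feed nothing relevant, y1 only feeds z1.
program = {
    "x1": ("a", "b"),
    "y1": ("c", "d"),
    "z1": ("x1", "y1"),
    "z2": ("z1",),
    "out": ("x1",),        # pseudo-definition standing for out(x1)
}
remaining = dead_code_elimination(program, side_effects={"out"})
print(sorted(remaining))   # ['out', 'x1']
```

Deleting z2 empties the use list of z1, which in turn frees y1; x1 survives because the output statement still uses it.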

Simple Constant Propagation
• For any statement of the form v ← c for some constant c
  • Any use of v can be replaced with a use of c
• Any φ-function of the form v ← φ(c1, c2, …, cn) where all the ci are equal can be replaced by v ← c
• Easy to detect using SSA
• Easy to implement using a work-list algorithm

Simple Constant Prop in SSA Form
    W ← a list of all statements in the SSA program
    while W is not empty
        remove some statement S from W
        if S is v ← φ(c, c, …, c) for some constant c
            replace S by v ← c
        if S is v ← c for some constant c
            delete S from the program
            for each statement T that uses v
                substitute c for v in T
                W ← W ∪ {T}
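
A minimal Python sketch of this work-list algorithm follows. The statement encoding is an assumption made for illustration: each SSA variable maps to `(op, args)`, where `op` is one of the hypothetical tags "copy", "phi" or "add" and `args` mixes integer constants with variable names. Only the two rules on the slide are applied, so an "add" is never folded even when its operands become constant.

```python
def simple_constant_propagation(stmts):
    """stmts maps each SSA variable to (op, args). Returns (constants,
    remaining) where constants maps deleted variables to their value."""
    stmts = {v: (op, list(args)) for v, (op, args) in stmts.items()}
    constants = {}
    worklist = list(stmts)                    # W <- all statements
    while worklist:
        v = worklist.pop()
        if v not in stmts:
            continue
        op, args = stmts[v]
        # v <- c, or v <- phi(c, c, ..., c): the statement defines a constant.
        is_constant = (op in ("copy", "phi")
                       and all(isinstance(a, int) for a in args)
                       and len(set(args)) == 1)
        if not is_constant:
            continue
        c = args[0]
        constants[v] = c
        del stmts[v]                          # delete S from the program
        for t, (_, t_args) in stmts.items():
            if v in t_args:                   # T uses v
                t_args[:] = [c if a == v else a for a in t_args]
                worklist.append(t)            # W <- W U {T}
    return constants, stmts

# The running example of these slides, without its branches.
program = {
    "i1": ("copy", [1]), "j1": ("copy", [1]), "k1": ("copy", [0]),
    "j2": ("phi", ["j4", "j1"]), "k2": ("phi", ["k4", "k1"]),
    "j3": ("copy", ["i1"]), "k3": ("add", ["k2", 1]),
    "j5": ("copy", ["k2"]), "k5": ("add", ["k2", 2]),
    "j4": ("phi", ["j3", "j5"]), "k4": ("phi", ["k3", "k5"]),
}
constants, remaining = simple_constant_propagation(program)
print(sorted(constants.items()))   # [('i1', 1), ('j1', 1), ('j3', 1), ('k1', 0)]
```

Note that j2 stays in `remaining` as φ(j4, 1): this simple algorithm cannot tell that j is always 1.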

Other Transformations … • Can be incorporated into the work-list algorithm • All can be done in linear time • Examples • Copy propagation • Constant conditions • Unreachable code

Copy Propagation
• A single-argument φ-function x ← φ(y) or a copy assignment x ← y can be deleted and y substituted for every use of x

[Figure: CFG of the example program in SSA form, containing the statements:]
    i1 ← 1
    j1 ← 1
    k1 ← 0
    j2 ← φ(j4, j1)
    k2 ← φ(k4, k1)
    if k2 < 100
    if j2 < 20
    return j2
    j3 ← i1
    k3 ← k2 + 1
    j5 ← k2
    k5 ← k2 + 2
    j4 ← φ(j3, j5)
    k4 ← φ(k3, k5)

Constant Conditions
• if (a < b) goto L1 else L2, where a and b are constant, becomes goto L1 (or goto L2)
• The extraneous control-flow edge must be deleted
• φ-functions must be adjusted (the block that lost the edge now has one fewer predecessor)

[Figure: a block containing j = 1; if (j < 20) goto L1 else goto L2, with successor blocks L1 and L2.]

Unreachable Code
• Deleting a predecessor may cause L2 to become unreachable
  • All statements in L2 can be deleted
  • Use-lists of all variables used in L2 must be adjusted
  • L2 can be deleted (and its successors updated)

[Figure: the same block after the branch is folded: j = 1; goto L1. L1 is still a successor; L2 has lost its incoming edge.]
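
Finding the blocks that have become unreachable is a plain graph search from the entry block. The sketch below assumes a hypothetical CFG encoding (block name mapped to a list of successor names) and only drops the unreachable blocks; adjusting use-lists and φ-functions, as the slide requires, is not modeled.

```python
def remove_unreachable(cfg, entry):
    """cfg maps each block name to the list of its successors.
    Returns a copy of cfg restricted to blocks reachable from entry."""
    reachable, stack = {entry}, [entry]
    while stack:
        for succ in cfg[stack.pop()]:
            if succ not in reachable:
                reachable.add(succ)
                stack.append(succ)
    return {b: succs for b, succs in cfg.items() if b in reachable}

# After "if (j < 20) goto L1 else goto L2" is folded to "goto L1",
# block B no longer has an edge to L2. Block names are illustrative.
cfg = {"B": ["L1"], "L1": ["exit"], "L2": ["exit"], "exit": []}
pruned = remove_unreachable(cfg, "B")
print(sorted(pruned))   # ['B', 'L1', 'exit']
```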

Conditional Constant Propagation
• Is j always equal to 1?
• Simple constant propagation missed this opportunity!

[Figure: CFG of the example program before SSA conversion, containing the statements:]
    i ← 1
    j ← 1
    k ← 0
    if k < 100
    if j < 20
    return j
    j ← i
    k ← k + 1
    j ← k
    k ← k + 2

SSA Conditional Constant Propagation
• Keeps track of the result of conditional branches
• Only propagates definitions along parts of the flow graph that are marked executable
• When propagating constants, ignores edges at join nodes that are not executable
• Does not assume that a variable is non-constant until there is evidence
• Does not assume that we execute a given block until there is evidence

SSA Conditional Constant Propagation
• Uses a lattice:
  • [x] = ⊤   No evidence that any assignment to x is executed
  • [x] = 4   Evidence of x ← 4 has been seen
  • [x] = ⊥   Evidence that x may have two different values
• Tracks the run-time value of variables
• New information can only move a variable down the lattice

[Figure: three-level lattice]
    ⊤                                Never defined
    ...  ci  cj  ck  cl  cm  cn ...  Defined as c
    ⊥                                Overdefined

Constant Propagation (cont.)
• Side effect of the meet operator ∩ at z = φ(x, y):

    ∩  | ⊤  | c0                  | ⊥
    ---+----+---------------------+---
    ⊤  | ⊤  | c0                  | ⊥
    c1 | c1 | (c0 = c1) ? c0 : ⊥  | ⊥
    ⊥  | ⊥  | ⊥                   | ⊥
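
The meet operator is small enough to write out directly. In this sketch the lattice elements ⊤ and ⊥ are represented by the hypothetical sentinel strings `TOP` and `BOTTOM`, and constants are plain Python integers.

```python
TOP = "TOP"          # never defined
BOTTOM = "BOTTOM"    # overdefined

def meet(a, b):
    """Combine the lattice values of two phi operands."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    # Both are constants or BOTTOM: equal constants survive,
    # anything else is overdefined.
    return a if a == b else BOTTOM

print(meet(TOP, 4), meet(4, 4), meet(4, 5))   # 4 4 BOTTOM
```

Because ⊤ is the identity of the meet, a φ operand that has not been seen yet does not prevent the result from being a constant.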

Executability
• Also track the executability of each block:
  • [B] = false   We have seen no evidence that block B can ever be executed
  • [B] = true    We have seen evidence that block B can be executed
• Start with all blocks: [B] = false
• The start block B1 is executable: [B1] = true
• For any executable block B with one successor C: [C] = true
• For executable branches if x < y goto L1 else L2:
  • If [x] = ⊥ or [y] = ⊥, then [L1] = true and [L2] = true

An Example
• Start with all variables: [x] = ⊤
• Start with all blocks: [B] = false
• Calculate the variable values and block executability

    Block 1:  i1 ← 1
              j1 ← 1
              k1 ← 0
    Block 2:  j2 ← φ(j4, j1)
              k2 ← φ(k4, k1)
              if k2 < 100
    Block 3:  if j2 < 20
    Block 4:  return j2
    Block 5:  j3 ← i1
              k3 ← k2 + 1
    Block 6:  j5 ← k2
              k5 ← k2 + 2
    Block 7:  j4 ← φ(j3, j5)
              k4 ← φ(k3, k5)
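
The example can be worked through mechanically. The sketch below is not the work-list formulation; it is a simple round-robin iteration to a fixed point that applies the same rules (lattice values, executable blocks, and φ operands ignored on non-executable edges). The encoding is hypothetical, and the CFG edges are an assumption, since the figure's edges do not survive in the transcript: block 2 branches to 3 (true) or 4 (false), block 3 branches to 5 (true) or 6 (false), blocks 5 and 6 jump to 7, and 7 jumps back to 2.

```python
TOP, BOTTOM = "TOP", "BOTTOM"    # never defined / overdefined

def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def value(x, val):
    return x if isinstance(x, int) else val[x]

def evaluate(op, args, val, block, edges):
    if op == "phi":
        result = TOP
        for pred, x in args:
            if (pred, block) in edges:         # ignore non-executable edges
                result = meet(result, value(x, val))
        return result
    vals = [value(x, val) for x in args]
    if BOTTOM in vals:
        return BOTTOM
    if TOP in vals:
        return TOP
    return vals[0] if op == "copy" else vals[0] + vals[1]   # "copy" or "add"

def successors(term, val):
    if term[0] == "jump":
        return [term[1]]
    if term[0] == "return":
        return []
    _, x, y, if_true, if_false = term          # if x < y goto if_true else if_false
    vx, vy = value(x, val), value(y, val)
    if BOTTOM in (vx, vy):
        return [if_true, if_false]
    if TOP in (vx, vy):
        return []                              # no evidence yet
    return [if_true] if vx < vy else [if_false]

def sccp(blocks, entry):
    val = {d: TOP for stmts, _ in blocks.values() for d, _, _ in stmts}
    executable, edges = {entry}, set()
    changed = True
    while changed:
        changed = False
        for b, (stmts, term) in blocks.items():
            if b not in executable:
                continue
            for dest, op, args in stmts:
                new = evaluate(op, args, val, b, edges)
                if new != val[dest]:
                    val[dest], changed = new, True
            for succ in successors(term, val):
                if (b, succ) not in edges:
                    edges.add((b, succ))
                    executable.add(succ)
                    changed = True
    return val, executable

BLOCKS = {
    1: ([("i1", "copy", [1]), ("j1", "copy", [1]), ("k1", "copy", [0])], ("jump", 2)),
    2: ([("j2", "phi", [(7, "j4"), (1, "j1")]),
         ("k2", "phi", [(7, "k4"), (1, "k1")])], ("branch", "k2", 100, 3, 4)),
    3: ([], ("branch", "j2", 20, 5, 6)),
    4: ([], ("return", "j2")),
    5: ([("j3", "copy", ["i1"]), ("k3", "add", ["k2", 1])], ("jump", 7)),
    6: ([("j5", "copy", ["k2"]), ("k5", "add", ["k2", 2])], ("jump", 7)),
    7: ([("j4", "phi", [(5, "j3"), (6, "j5")]),
         ("k4", "phi", [(5, "k3"), (6, "k5")])], ("jump", 2)),
}
val, executable = sccp(BLOCKS, 1)
print(val["j2"], sorted(executable))   # 1 [1, 2, 3, 4, 5, 7]
```

Under these assumptions block 6 is never marked executable, so j5 never reaches the φ-function for j4, and j2 is found to be the constant 1, while k2 becomes overdefined.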

Using SSA – Dead code elimination • Dead code elimination • Conceptually similar to mark-sweep garbage collection • Mark useful operations • Everything not marked is useless • Need an efficient way to find and to mark useful operations • Start with critical operations • Work back up SSA edges to find their antecedents • Define critical • I/O statements, linkage code (entry & exit blocks), return values, calls to other procedures • Algorithm will use post-dominators & reverse dominance frontiers

Using SSA – Dead code elimination

Mark
    for each op i
        clear i’s mark
        if i is critical then
            mark i
            add i to WorkList
    while (WorkList ≠ Ø)
        remove i from WorkList          (i has form “x ← y op z”)
        if def(y) is not marked then
            mark def(y)
            add def(y) to WorkList
        if def(z) is not marked then
            mark def(z)
            add def(z) to WorkList
        for each b ∈ RDF(block(i))
            mark the block-ending branch in b
            add it to WorkList

Sweep
    for each op i
        if i is not marked then
            if i is a branch then
                rewrite with a jump to i’s nearest useful post-dominator
            if i is not a jump then
                delete i

• Notes:
  • Eliminates some branches
  • Reconnects dead branches to the remaining live code
  • Find useful post-dominator by walking post-dom tree
  • Entry & exit nodes are useful
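
The Mark phase can be sketched in a few lines of Python. Everything about the encoding is an assumption made for illustration: `ops` maps an operation name to its block and to the operations that define its operands, `rdf` is a precomputed reverse-dominance-frontier map, and `branch_of` names the block-ending branch of each block. The Sweep phase, which needs the post-dominator tree, is not shown. Unlike the slide, the sketch checks the mark before queueing a branch so that no operation is processed twice.

```python
def mark(ops, critical, rdf, branch_of):
    """ops: op name -> (block, names of the ops that define its operands).
    rdf: block -> blocks in its reverse dominance frontier.
    branch_of: block -> name of that block's ending branch.
    Returns the set of useful (marked) operations."""
    marked = set(critical)
    worklist = list(critical)
    while worklist:
        i = worklist.pop()
        block, operand_defs = ops[i]
        needed = list(operand_defs)                       # def(y), def(z)
        needed += [branch_of[b] for b in rdf.get(block, ())]
        for d in needed:
            if d not in marked:
                marked.add(d)
                worklist.append(d)
    return marked

# Hypothetical diamond: B0 branches on c to B1 or B2, which join in B3.
ops = {
    "c":   ("B0", []),
    "br":  ("B0", ["c"]),
    "x1":  ("B1", []),
    "t":   ("B1", ["x1"]),            # result never used
    "x2":  ("B2", []),
    "x3":  ("B3", ["x1", "x2"]),      # x3 <- phi(x1, x2)
    "ret": ("B3", ["x3"]),            # critical: return value
}
rdf = {"B1": {"B0"}, "B2": {"B0"}}
marked = mark(ops, critical=["ret"], rdf=rdf, branch_of={"B0": "br"})
print(sorted(set(ops) - marked))      # ['t']
```

The branch in B0 is marked only because the useful definitions x1 and x2 are control dependent on it; t is the one operation left for Sweep to delete.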

Using SSA – Dead code elimination
• When is a branch useful?
  • When a useful operation depends on its existence
  • j is control dependent on i: one path from i leads to j, one doesn’t
  • This is the reverse dominance frontier of j (RDF(j))
• Algorithm uses RDF(n) to mark branches as live

In the CFG, j is control dependent on i if
1. there exists a non-null path p from i to j such that j post-dominates every node on p after i
2. j does not strictly post-dominate i

Using SSA – Dead Code Elimination • What’s left? • Algorithm eliminates useless definitions & some useless branches • Algorithm leaves behind empty blocks & extraneous control-flow • Two more issues • Simplifying control-flow • Eliminating unreachable blocks • Both are CFG transformations (no need for SSA)

Eliminating Useless Control Flow
Transformations: Eliminating redundant branches
• Both sides of branch target Bi
  • Neither block needs to be empty
  • Replace it with a jump to Bi
  • Simple rewrite of last op in B1
• How does this happen?
  • Rewriting other branches
• How do we find it?
  • Check each branch

[Figure: B1 ends in a branch (not a jump) whose two targets are both B2; after the rewrite, B1 ends in a jump to B2.]

Eliminating Useless Control Flow
Transformations: Eliminating empty blocks
• Merging an empty block
  • Empty B1 ends in a jump
  • Coalesce B1 with B2
  • Move B1’s incoming edges
  • Eliminates extraneous jump
  • Faster, smaller code
• How does this happen?
  • Eliminate operations in B1
• How do we find it?
  • Test for empty block

[Figure: an empty block B1 jumps to B2; after the rewrite, B1’s incoming edges go directly to B2.]

Eliminating Useless Control Flow
Transformations: Combining non-empty blocks
• Coalescing blocks
  • Neither block needs to be empty
  • B1 ends with a jump
  • B2 has 1 predecessor
  • Combine the two blocks
  • Eliminates a jump
• How does this happen?
  • Simplifying edges out of B1
• How do we find it?
  • Check |preds| of the jump’s target

* B1 and B2 should be a single basic block. If one executes, both execute, in linear order.

[Figure: B1 jumps to B2, its only successor, and B2 has no other predecessor; after the rewrite, B1 and B2 form one block.]

Eliminating Useless Control Flow
Transformations: Hoisting branches from empty blocks
• Jump to a branch
  • B1 ends with jump, B2 is empty
  • Eliminates pointless jump
  • Copy branch into end of B1
  • Might make B2 unreachable
• How does this happen?
  • Eliminating operations in B2
• How do we find this?
  • Jump to empty block

[Figure: B1 jumps to an empty block B2 that ends in a branch; after the rewrite, B1 ends in a copy of that branch.]

Eliminating Useless Control Flow
The Algorithm

OnePass()
    for each block i, in postorder
        if i ends in a conditional branch then
            if both targets are identical then
                replace the branch with a jump
        if i ends in a jump to j then
            if i is empty then
                replace transfers to i with transfers to j
            if j has only one predecessor
                coalesce i and j
            if j is empty & j ends in a conditional branch then
                rewrite i’s jump with j’s branch

Clean()
    until CFG stops changing
        compute postorder
        OnePass()
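
A compact Python sketch of OnePass and Clean is given below. The CFG encoding is hypothetical: each block has a list `ops` of non-control operations and a terminator `term` that is `("jump", j)`, `("branch", t, f)` or `("return",)`. The sketch departs from the pseudocode in three labeled ways: the cases under "i ends in a jump" are tried as alternatives (at most one applies per visit), the entry block is never removed, and a block that jumps to itself is left alone.

```python
def postorder(cfg, entry):
    seen, order = set(), []
    def visit(b):
        seen.add(b)
        for s in cfg[b]["term"][1:]:          # successors of b
            if s not in seen:
                visit(s)
        order.append(b)
    visit(entry)
    return order

def preds(cfg, b):
    return [p for p, blk in cfg.items() if b in blk["term"][1:]]

def one_pass(cfg, entry):
    changed = False
    for i in postorder(cfg, entry):
        if i not in cfg:
            continue                          # removed earlier in this pass
        term = cfg[i]["term"]
        if term[0] == "branch" and term[1] == term[2]:
            term = cfg[i]["term"] = ("jump", term[1])    # redundant branch
            changed = True
        if term[0] != "jump" or term[1] == i:
            continue
        j = term[1]
        if not cfg[i]["ops"] and i != entry:
            # Empty block: replace transfers to i with transfers to j.
            for p in preds(cfg, i):
                t = cfg[p]["term"]
                cfg[p]["term"] = (t[0],) + tuple(j if s == i else s for s in t[1:])
            del cfg[i]
            changed = True
        elif len(preds(cfg, j)) == 1 and j != entry:
            # j has only one predecessor: coalesce i and j.
            cfg[i]["ops"] += cfg[j]["ops"]
            cfg[i]["term"] = cfg[j]["term"]
            del cfg[j]
            changed = True
        elif not cfg[j]["ops"] and cfg[j]["term"][0] == "branch":
            cfg[i]["term"] = cfg[j]["term"]   # hoist j's branch into i
            changed = True
    return changed

def clean(cfg, entry):
    while one_pass(cfg, entry):               # until the CFG stops changing
        pass
    return cfg

# Hypothetical input: B0 branches to B1 on both sides, B1 is empty.
cfg = {
    "B0": {"ops": ["a"], "term": ("branch", "B1", "B1")},
    "B1": {"ops": [],    "term": ("jump", "B2")},
    "B2": {"ops": ["b"], "term": ("return",)},
}
result = clean(cfg, "B0")
print(result)   # {'B0': {'ops': ['a', 'b'], 'term': ('return',)}}
```

In this run the empty block B1 is bypassed first, which turns B0's branch into a redundant one; folding it to a jump then lets B0 and B2 coalesce within the same pass.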

Eliminating Useless Control Flow
What about an empty loop?
• By itself, CLEAN cannot eliminate the loop
  • Loop body branches to itself
    • Branch is not redundant (it targets two distinct blocks!)
    • Doesn’t end with a jump
• Key is to eliminate self-loop
  • Add a new transformation?
  • Then, B1 merges with B2
• New transformation must recognize that B1 is empty. Presumably, it has code to test exit condition & (probably) increment an induction variable. This requires looking at code inside B1 and doing some sophisticated pattern matching. This is awfully complicated.

[Figure: B0 → B1 → B2, where B1 also branches back to itself; with the self-loop removed, B1 can merge with B2.]


Eliminating Useless Control Flow
What about an empty loop?
• How to eliminate <B1,B1>?
  • Pattern matching?
  • Useless code elimination?
• What does DEAD do to B1?
  • Remember, it is empty
  • So, B1 ∉ RDF(B2)
  • B1’s branch is useless
  • DEAD rewrites it as a jump
• DEAD converts it to a form where CLEAN handles it

[Figure: before DEAD, B0 → B1 → B2 with a self-loop on B1; after DEAD, B1 ends in a jump to B2 and the self-loop is gone.]

Dead Code Elimination Summary
• Useless computations → DEAD (Mark and Sweep)
• Useless control flow → CLEAN
• Unreachable blocks → Execution counts
Other Techniques
• Constant propagation can eliminate branches
• Algebraic identities eliminate some operations
• Redundancy elimination
  • Creates useless operations, or
  • Eliminates them

Using SSA • In general, using SSA leads to • Cleaner formulations • Better results • Faster algorithms • We’ve seen two SSA-based algorithms. • Dead-code elimination • Constant propagation • These optimizations leave behind other inefficiencies
