Optimization algorithms using SSA
Software Optimizations & Restructuring Research Group
School of Electronical Engineering, Seoul National University
2006-21166 Wonsub Kim
Using static single assignment form (SSA)
• Review of SSA translation
  • Place Φ functions using the dominance frontier
• Overview of SSA form (why SSA form?)
  • SSA form transformation steps
  • Role and benefits of SSA form
• Further optimizations using SSA form
  • Constant propagation, dead code elimination, induction variable reduction, and other issues
Review of SSA form translation
• Observation
  • Node X does not strictly dominate Z
  • Node X dominates an immediate predecessor of Z
• Key observation
  • Insert the Φ function at the first node Z that is also reached by another path originating in a node Y with an assignment to V
  • This description of node Z is the same as the dominance frontier definition
  • Place Φ functions in the dominance frontier nodes of every node that contains a definition of V
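The placement rule above can be sketched as a small worklist loop. This is a minimal sketch, assuming the dominance frontiers are already available as a dictionary; the function name `place_phis` and the block names are hypothetical. Because an inserted Φ is itself a new definition of V, the frontier of each Φ block is processed too.

```python
def place_phis(def_blocks, df):
    """Return the blocks that need a phi function for one variable V.

    def_blocks: blocks containing an assignment to V (assumed input)
    df: dict mapping block -> set of blocks in its dominance frontier
    """
    has_phi = set()
    seen = set(def_blocks)
    worklist = list(def_blocks)
    while worklist:
        block = worklist.pop()
        for frontier_block in df.get(block, ()):
            if frontier_block not in has_phi:
                has_phi.add(frontier_block)
                # The new phi defines V as well, so visit its frontier later.
                if frontier_block not in seen:
                    seen.add(frontier_block)
                    worklist.append(frontier_block)
    return has_phi


# Diamond CFG: A -> B, A -> C, B -> D, C -> D, with V assigned in B and C.
diamond_df = {"A": set(), "B": {"D"}, "C": {"D"}, "D": set()}
diamond_phis = place_phis({"B", "C"}, diamond_df)

# Loop CFG: header H, body B, with V assigned only in the body.
loop_df = {"H": {"H"}, "B": {"H"}}
loop_phis = place_phis({"B"}, loop_df)
```

In both cases the Φ lands at the join point: D for the diamond and the loop header H for the loop.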
Review of SSA form translation
• Dominance frontier definition
  • Definition: the dominance frontier of node A is the set of nodes Y such that A dominates an immediate predecessor of Y but does not strictly dominate Y.
  • From A's point of view, these are the nodes at which other control paths that don't go through A make their earliest appearance.
  • Case 1: a definition reaches every node that it dominates.
  • Case 2: only after leaving those nodes and entering the dominance frontier do we have to consider flows that carry other definitions of the same variable.
• How about assignments in a loop?
  • They also need a Φ function, which merges the multiple definitions
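The definition can be turned into code almost literally. The sketch below is an illustration under stated assumptions, not the algorithm a production compiler would use: it computes dominator sets with the simple iterative method and then applies the definition directly. The CFG encoding (a dict of successor lists in which every node is a key and every node is reachable from the entry) and the function names are assumptions made for the example.

```python
def dominators(succ, entry):
    """Iteratively compute dom[n] = set of nodes that dominate n."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def dominance_frontier(succ, entry):
    """DF(x) = nodes y where x dominates a predecessor of y
    but x does not strictly dominate y."""
    dom, preds = dominators(succ, entry)
    df = {n: set() for n in succ}
    for y in succ:
        for p in preds[y]:
            for x in dom[p]:                   # x dominates a predecessor of y
                if x == y or x not in dom[y]:  # x does not strictly dominate y
                    df[x].add(y)
    return df


# Loop: entry -> H, H -> B, B -> H, H -> X
loop_cfg = {"entry": ["H"], "H": ["B", "X"], "B": ["H"], "X": []}
loop_frontier = dominance_frontier(loop_cfg, "entry")
```

For the loop, both the body B and the header H have H in their frontier, which is why assignments inside a loop lead to a Φ at the loop header.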
Static single assignment (SSA) form
• Single assignment form
  • In multiple assignment (MA) form, a variable is a memory location, not a value.
  • In single assignment (SA) form, a variable is a value, which simplifies the properties of a variable.
  • Data structure: only a single assignment to each variable (a single definition) but many uses of it (one def site, a list of use sites)
  • A def must dominate all of its uses!
  • Variables are renamed to remove multiple assignments
• Role of SSA form in optimization
  • Data flow analysis and compiler optimizations are more efficient with SA form, and SSA form simplifies many optimizations
  • Separate use-def chains are no longer needed, because that information appears explicitly in the program.
  • The number of use-def chains drops from quadratic, O(n^2), to linear, O(n)
  • Eliminates false dependences (simplifying the context)
Why SSA form?
• SSA form gives the compiler an answer to the question "which definitions of a variable reach the points where it is used?" -> it makes def-use chains explicit
  • Every definition knows its uses and every use knows its single definition
• Makes dataflow optimizations easier and faster
  • For most optimizations it reduces space/time requirements
  • DU chains in SSA form save more space, but spend more
• In MA form, the number of chains for V variables
  • Def-use chains are expensive!
  • Worst case # of chains = O(# of defs(v) * # of uses(v)) for each variable v, so at most O(E*E*V) in total
  • # of defs(v) is proportional to E, and # of uses(v) is also proportional to E
• In SSA form, # of chains = O(E * V)
Space reduction in DU chain
• The def-use chain structure becomes simpler
• [Figure: MA form: three definitions "x := …" and three uses "… := x", each use linked to every definition (multi-step search). SSA form: definitions "x1 := …", "x2 := …", "x3 := …" merged by "x4 := Φ(x1,x2,x3)", followed by three uses "… := x4" (one-step search).]
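The saving in the figure can be checked with a little arithmetic. The sketch below assumes the situation drawn there (every definition reaches every use, and in SSA form a single Φ merges all definitions); the function names are hypothetical.

```python
def chains_in_ma_form(num_defs, num_uses):
    # Every use must be linked to every definition that may reach it.
    return num_defs * num_uses


def chains_in_ssa_form(num_defs, num_uses):
    # Each definition feeds the phi once, and each use points at the phi result.
    return num_defs + num_uses


ma_chains = chains_in_ma_form(3, 3)
ssa_chains = chains_in_ssa_form(3, 3)
```

With three definitions and three uses this gives 9 chains in MA form against 6 in SSA form, and the gap widens as the counts grow.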
SSA transform step
• SSA transformation steps
  • 1. To gain the efficiency of SA form, translate the original code into SA form statically.
  • 2. SSA form is not executable because of its pseudo-instructions (it is a compiler-internal form), so it must be translated back to MA form for execution.
• [Figure: Original Code (MA form) -> SSA form -> Optimizations (code motion, redundancy elimination, constant propagation, …) -> Optimized Code (MA form)]
Static single assignment form
• Further optimizations made possible by SSA
  • Constant propagation (simple constants, conditional constants)
  • Dead code elimination
  • Induction variable identification
  • Global value numbering (p349~355), data dependences, ...
  • Register allocation
• Other considerations
  • More variables, and larger code because of the Φ functions
  • But the growth is only linear (in practice, SSA is 0.6-2.4 times larger)
  • Some optimizations become more awkward
  • But on the whole, a win for compilers
• How does the Φ function choose which xi to use?
  • We don't really care about it
Dead code elimination (1)
• Problem definition
  • An assignment to a variable that has no uses can be removed
• SSA structure
  • Only one definition site and a list of use sites (liveness is easy to check)
• Worklist algorithm
  • W <- a list of all variables in the SSA program (after the SSA transformation)
  • While W is not empty
    • Remove some variable v from W
    • If v's list of uses is empty (the use list of its def-use chain is empty)
      • Let S be v's statement of definition
      • If S has no side effects other than the assignment to v
        • Delete S from the program
        • For each variable xi used by S (found through the use-def information; keep track of the last use!)
          • Delete S from the list of uses of xi
          • W <- W union {xi}
Dead code elimination (2)
• Just check which variable values Vi never show up as a use
• Remove the Φ function or statement that creates Vi
• Repeat, because removing it may make other values Vi dead
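The worklist algorithm can be sketched as follows. This is a minimal sketch under assumptions that are not in the slides: the program is encoded as a dictionary from each SSA variable to the operands and side-effect flag of its defining statement, and variables consumed by non-assignment statements (returns, branches, stores) are passed in as `roots`. The example statements follow the SSA code of the dead code elimination example; treating only z3 as used afterwards is an assumption made for the test.

```python
def eliminate_dead_code(defs, roots=()):
    """defs: var -> (operands, has_side_effects). Returns the surviving vars."""
    # uses[x] = variables whose defining statement reads x
    uses = {v: set() for v in defs}
    for v, (operands, _) in defs.items():
        for x in operands:
            if x in uses:
                uses[x].add(v)

    live = dict(defs)
    worklist = list(defs)
    while worklist:
        v = worklist.pop()
        if v not in live or uses[v] or v in roots:
            continue                      # already deleted, or still needed
        operands, has_side_effects = live[v]
        if has_side_effects:
            continue                      # must keep the statement
        del live[v]                       # delete S from the program
        for x in operands:
            if x in uses:
                uses[x].discard(v)        # delete S from the uses of x
                worklist.append(x)        # x may have become dead
    return set(live)


ssa_program = {
    "x1": ([], False),            # x1 := 4
    "y2": (["y1"], False),        # y2 := y1 + 3
    "z1": (["y2"], False),        # z1 := y2 * 2
    "x2": (["y2"], False),        # x2 := y2 + 9
    "x3": (["x1", "x2"], False),  # x3 := phi(x1, x2)
    "z2": (["z0", "z1"], False),  # z2 := phi(z0, z1)
    "z3": (["y2"], False),        # z3 := y2 * 5
}
survivors = eliminate_dead_code(ssa_program, roots={"z3"})
```

Deleting x3 and z2 empties the use lists of x1, x2 and z1, so they are removed in turn; only y2 and z3 remain.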
Dead code elimination example
• DU chains in MA form
  • Removing statements needs an extra operation to avoid dangling pointers into the DU chains of the removed statements
  • Original code
      x := 4
      y := y+3
      goto L on x<5
      z := y*2
      x := y+9
      L: z := y*5
  • After constant propagation
      x := 4
      y := y+3
      goto L on 4<5
      z := y*2
      x := y+9
      L: z := y*5
  • After dead code elimination
      x := 4
      y := y+3
      L: z := y*5
• SSA forms
  • Original code
      x1 := 4
      y2 := y1+3
      goto L on x1<5
      z1 := y2*2
      x2 := y2+9
      L: x3 := phi(x1,x2)
      z2 := phi(z0,z1)
      z3 := y2*5
  • After constant propagation
      x1 := 4
      y2 := y1+3
      goto L on 4<5
      z1 := y2*2
      x2 := y2+9
      L: x3 := phi(x1,x2)
      z2 := phi(z0,z1)
      z3 := y2*5
  • After dead code elimination
      x1 := 4
      y2 := y1+3
      L: x3 := x1
      z2 := z0
      z3 := y2*5
Constant propagation
• Evaluate expressions at compile time, eliminate dead code, and improve the effectiveness of other optimizations
• What counts as a constant?
  • Simple constant: constant along all paths through the program
    • Faster algorithm: the sparse simple constants algorithm, using SSA translation
  • Conditional constant: constant along the paths that can actually be taken
    • Faster algorithm: the sparse conditional constants algorithm, using SSA translation
• Key point
  • Traversing the sparse SSA edges is more efficient than traversing the ordinary CFG edges
Simple constant propagation using SSA
• Standard worklist algorithm
• Identify simple constants
  • If a variable is defined from a single constant value, it is a simple constant
  • If a variable's definition is a Φ function whose arguments are all the same constant, it is a simple constant
• Simple constants in the example
  • First iteration: i1, j1, k1
  • Second iteration: j3 is added
• The standard algorithm traverses and processes all edges in the CFG, which is inefficient
• Example (SSA form)
    B1: i1 <- 1; j1 <- 1; k1 <- 0
    B2: j2 <- phi(j4, j1); k2 <- phi(k4, k1); if k2 < 100
    B3: if j2 < 20
    B4: return j2
    B5: j3 <- i1; k3 <- k2 + 1
    B6: j5 <- k2; k5 <- k2 + 2
    B7: j4 <- phi(j3, j5); k4 <- phi(k3, k5)
Sparse simple constant propagation
• The standard simple constant propagation algorithm is inefficient
  • For each program point (each node connected by CFG edges), it maintains one constant value for every variable
  • O(EV), where E is the number of edges in the CFG and V is the number of variables
  • Inefficient, because a constant may have to be propagated through irrelevant nodes -> exploit sparse dependences instead
• Exploit SSA edges (which connect defs with uses explicitly)
  • Iterate over SSA edges instead of over all CFG edges; the SSA graph has fewer edges than the def-use graph
Sparse simple constant propagation
• Sparse simple worklist algorithm
  • W <- list of all statements in the SSA program
  • While W is not empty
    • Remove a statement S from W
    • If S is v <- phi(c1, c2, c3, …) and ci == cj for all i, j (every argument is the same constant c)
      • Replace S by v <- c
    • If S is v <- c for some constant c
      • Delete S from the program
      • For each statement T that uses v
        • Substitute c for v in T
        • W <- W union {T}
• In effect, searching only along the use edges of constant definitions turns this into a sparse search algorithm.
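The worklist algorithm above might be sketched like this. The encoding is an assumption made for illustration: each SSA variable maps to an operator and an operand list, where operands are either integers or variable names. Two small liberties are taken and should be read as such: a `copy` operator stands for statements like `j3 <- i1`, and operators with all-constant operands are folded, which the slide's algorithm does not spell out. Instead of deleting `v <- c`, the sketch records it in the returned dictionary. The example uses the SSA program of the simple constant propagation slide, with k1 taken as 0 to match the `phi(k3, 0)` shown in the conditional constant propagation result.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def sparse_simple_constants(program):
    """program: var -> (op, operands). Returns var -> proven constant value."""
    stmts = {v: (op, list(args)) for v, (op, args) in program.items()}
    users = {v: set() for v in stmts}          # SSA edges: def -> using stmts
    for v, (_, args) in stmts.items():
        for a in args:
            if a in users:
                users[a].add(v)

    constants = {}
    worklist = list(stmts)
    while worklist:
        v = worklist.pop()
        if v in constants:
            continue
        op, args = stmts[v]
        if not all(isinstance(a, int) for a in args):
            continue                            # some operand is still unknown
        if op in ("const", "copy"):
            value = args[0]
        elif op == "phi":
            if len(set(args)) != 1:
                continue                        # arguments disagree
            value = args[0]
        else:
            value = OPS[op](*args)              # constant folding (an addition)
        constants[v] = value
        for t in users[v]:                      # follow SSA edges only
            t_op, t_args = stmts[t]
            stmts[t] = (t_op, [value if a == v else a for a in t_args])
            worklist.append(t)
    return constants


example = {
    "i1": ("const", [1]), "j1": ("const", [1]), "k1": ("const", [0]),
    "j2": ("phi", ["j4", "j1"]), "k2": ("phi", ["k4", "k1"]),
    "j3": ("copy", ["i1"]), "k3": ("+", ["k2", 1]),
    "j5": ("copy", ["k2"]), "k5": ("+", ["k2", 2]),
    "j4": ("phi", ["j3", "j5"]), "k4": ("phi", ["k3", "k5"]),
}
found = sparse_simple_constants(example)
```

Only i1, j1, k1 and j3 are found, as on the slide; j2 and j4 stay unknown because this analysis does not know that the else branch is never executed.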
Conditional constant propagation
• Delete branches that are infeasible because of discovered constants
• Data flow analysis
  • Lattice (overdefined, defined, never defined)
  • Executability: is there evidence that block B can ever be executed?
  • Executable assignment: an assignment in an executable block B
  • Processed at compile time, not at run time
• Algorithm
  • Finds constants and eliminates infeasible branches simultaneously
  • First, find the executable blocks using the following observations
    • For "if x < y goto L1 else L2": if V[x] = Top or V[y] = Top, both L1 and L2 are executable
    • For "if x < y goto L1 else L2": if V[x] = c1 and V[y] = c2, only the target selected by comparing c1 with c2 (L1 or L2) is executable
  • Second, for every executable assignment, compute the lattice value of the assigned variable
  • Block executability and variable lattice values are updated repeatedly until nothing changes
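Two building blocks of the algorithm are the lattice combination used at Φ functions and the branch rule. The sketch below shows only these two pieces, not the full analysis; the names `TOP`, `BOTTOM`, `join` and `executable_targets` are hypothetical, with `TOP` standing for "overdefined" and `BOTTOM` for "never defined".

```python
TOP = "overdefined"        # may hold more than one value at run time
BOTTOM = "never defined"   # no executable assignment seen so far


def join(a, b):
    """Combine two lattice values, e.g. two phi arguments."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == TOP or b == TOP:
        return TOP
    return a if a == b else TOP


def executable_targets(vx, vy, then_label, else_label):
    """Targets of `if x < y goto then_label else else_label` that may run."""
    if vx == BOTTOM or vy == BOTTOM:
        return set()                       # nothing known yet
    if vx == TOP or vy == TOP:
        return {then_label, else_label}    # cannot decide at compile time
    return {then_label} if vx < vy else {else_label}


# j2 <- phi(j4, j1) where only the edge carrying j1 = 1 is known so far
j2 = join(BOTTOM, 1)
# if j2 < 20 goto B5 else B6
targets = executable_targets(j2, 20, "B5", "B6")
```

With j2 equal to 1, only B5 is marked executable, so B6 is never analysed and its assignments never pull j4 up to `TOP`.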
Conditional constant propagation example
• The analysis shows that j2, j3, and j5 are always 1 and that the else part is not reachable
• The unreachable code is removed by dead code elimination
• [Figure: Source code -> SSA form transform -> Analysis for conditional constant propagation -> Dead code elimination]
Conditional constant propagation result
• Before (SSA form)
    B1: i1 <- 1; j1 <- 1; k1 <- 0
    B2: j2 <- phi(j4, j1); k2 <- phi(k4, k1); if k2 < 100
    B3: if j2 < 20
    B4: return j2
    B5: j3 <- i1; k3 <- k2 + 1
    B6: j5 <- k2; k5 <- k2 + 2
    B7: j4 <- phi(j3, j5); k4 <- phi(k3, k5)
• Annotations
  • j2: constant propagation
  • B6: dead block
  • j4: constant propagation
  • k4: the Φ function has only one argument left, so copy propagation applies
• After
    k2 <- phi(k3, 0)
    if k2 < 100
    k3 <- k2 + 1
    return 1
Induction variable identification
• Induction variable reduction
  • Optimize on the SSA graph rather than handling the CFG directly
  • The SSA graph makes the link from each use of data to its definition explicit
  • The value of an induction variable increases or decreases by a constant in each iteration.
    • I0: value on first entry to the loop
    • I2: value after going through the loop
    • RC: loop-invariant expression
Induction variable identification
• SSA-based algorithm
  • Build the SSA representation
  • Iterate from the innermost CFG loop to the outermost loop (the same order as the loop-invariant code motion search: innermost -> outermost)
  • Find the SSA cycles
    • A cycle may be a basic induction variable if a variable in the cycle is a function of loop invariants and of its own value on the current iteration (how is this condition detected?)
    • The Φ function in the cycle has as its inputs one def from inside the loop and one def from outside the loop
    • The def inside the loop (the Φ function's input) is part of the cycle; it takes one operand from the Φ function, and all its other operands are loop invariant
  • Find the derived induction variables
Induction variable identification example
• Transform the source code to SSA form
• Build the SSA value graph
• Find a cycle in the SSA graph
• The Φ function in the cycle satisfies the conditions stated earlier, so it is a basic induction variable
• [Figure: the case of variable x; SSA value graph]
Induction variable identification example
• 1. Transform to SSA form
• 2. Find the SSA cycles
  • i2 and m2 each lie on an SSA cycle <- candidates
• 3. For each candidate, check the basic induction variable condition
  • The i2 case
    • i1 is defined outside the loop, i3 inside
    • i3 (the inside def) takes one operand from the Φ function and the other from a constant -> basic induction variable
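The check in step 3 can be sketched as a small predicate. This is a simplified illustration under assumptions: statements are encoded as a dictionary from SSA variable to operator and operands, `loop_defs` lists the variables defined inside the loop (including the Φ itself), and only a direct `i3 <- i2 + c` or `i3 <- i2 - c` cycle is recognised. The update of m in the example is hypothetical, since the slides do not show it.

```python
def is_basic_induction_variable(phi_var, defs, loop_defs):
    """True if phi_var = phi(outside, inside) and inside = phi_var +/- invariant."""
    def invariant(x):
        # Constants and values defined outside the loop do not change in it.
        return isinstance(x, int) or x not in loop_defs

    op, args = defs[phi_var]
    if op != "phi" or len(args) != 2:
        return False
    inside = [a for a in args if a in loop_defs]
    if len(inside) != 1:
        return False          # need exactly one def inside and one outside

    step_op, step_args = defs.get(inside[0], (None, []))
    if len(step_args) != 2:
        return False
    if step_op == "+":
        others = [a for a in step_args if a != phi_var]
        return len(others) == 1 and invariant(others[0])
    if step_op == "-":
        # i - c is an induction step, c - i is not
        return step_args[0] == phi_var and invariant(step_args[1])
    return False


defs = {
    "i2": ("phi", ["i1", "i3"]),
    "i3": ("+", ["i2", 1]),
    "m2": ("phi", ["m1", "m3"]),
    "m3": ("*", ["m2", "i2"]),   # hypothetical update, not from the slides
}
loop_defs = {"i2", "i3", "m2", "m3"}
i2_is_biv = is_basic_induction_variable("i2", defs, loop_defs)
m2_is_biv = is_basic_induction_variable("m2", defs, loop_defs)
```

Here i2 passes because i3 adds a constant to the Φ result, while the hypothetical m2 is rejected because its update is not an addition of a loop invariant.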
Global value numbering
• Global value numbering (GVN)
  • Symbolic evaluation (not run-time evaluation): if the symbolic numbers are the same, the two computations are equal.
  • A compiler optimization based on the SSA IR
  • Build the value graph from the SSA form
  • Prevents false mappings between variable names and value names
  • More powerful than global common subexpression elimination (CSE) in some cases
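The idea that equal symbolic numbers mean equal computations can be illustrated with a hash-based sketch. Note that this is a simpler technique than the value-graph construction named above: it walks SSA statements in an order where every operand is defined before it is used, and it does not handle Φ functions or cycles. The statement encoding and the function name are assumptions made for the example.

```python
from itertools import count


def value_numbers(statements):
    """statements: list of (target, op, operands) in SSA form.
    Returns a dict mapping each variable or constant to its value number."""
    fresh = count()
    number_of = {}   # variable or constant -> value number
    table = {}       # (op, operand value numbers) -> value number

    def operand_number(x):
        if x not in number_of:
            number_of[x] = next(fresh)
        return number_of[x]

    for target, op, operands in statements:
        nums = [operand_number(x) for x in operands]
        if op == "copy":
            number_of[target] = nums[0]   # a copy shares its source's number
            continue
        if op in ("+", "*"):
            nums.sort()                   # commutative: operand order is irrelevant
        key = (op, tuple(nums))
        if key not in table:
            table[key] = next(fresh)
        number_of[target] = table[key]
    return number_of


code = [
    ("a1", "+", ["x0", "y0"]),
    ("b1", "+", ["y0", "x0"]),
    ("c1", "copy", ["a1"]),
    ("d1", "*", ["c1", 2]),
    ("e1", "*", ["b1", 2]),
]
vn = value_numbers(code)
```

Because SSA names never change meaning, d1 and e1 receive the same number even though they are written with different variable names, so one of the two multiplications is redundant.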
Value numbering example
• [Figure: source code and its SSA transform]
Dependency issues
• In optimization, parallelization, and scheduling, dependence checking is important
• Three kinds of data dependence: true (read after write), anti (write after read), and output (write after write)
• In SSA form, true dependences are evident (def site, use list)
• There are no anti or output dependences in SSA form
  • SSA form has a single definition of each variable
  • So a write-after-read dependence can't occur. Why?
  • So a write-after-write dependence can't occur. Why?
  • A variable's definition dominates all uses of it.
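A small renaming experiment suggests an answer to both questions. The sketch below handles straight-line code only (no branches, so no Φ functions are needed); the statement encoding, the function name, and the convention that values live on entry get version 0 are assumptions made for illustration.

```python
def rename_straight_line(statements):
    """statements: list of (target, operands). Returns the SSA-renamed list."""
    version = {}
    renamed = []
    for target, operands in statements:
        # Uses refer to the latest version; version 0 means "value on entry".
        new_operands = [f"{x}{version.get(x, 0)}" for x in operands]
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}{version[target]}", new_operands))
    return renamed


original = [
    ("a", ["b", "c"]),   # a := b + c
    ("b", ["d"]),        # b := d      (anti dependence with the read of b above)
    ("a", ["e"]),        # a := e      (output dependence with the first write of a)
]
ssa = rename_straight_line(original)
```

After renaming, the second statement writes b1 while the first reads b0, and the two writes of a become a1 and a2. No name is ever written twice, so only true (read after write) dependences remain.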
Other issues
• Copy propagation
  • In MA form: check the live range of each variable, and replace the uses of the target variable that follow the copy operation with the source variable while the target variable is live.
  • In SSA form, each variable is guaranteed to have a single static value
  • So there is no need to check the live range; simply replace it!
• Register allocation, common subexpression elimination
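Copy propagation in SSA form then reduces to a substitution, as the sketch below suggests. It assumes statements are listed so that every definition appears before its uses, and it drops each copy once all later uses are rewritten; the encoding and the function name are hypothetical.

```python
def propagate_copies(statements):
    """statements: list of (target, op, operands); a 'copy' has one operand.
    Returns the statements with copies removed and their uses rewritten."""
    source_of = {}   # copy target -> ultimate source
    result = []
    for target, op, operands in statements:
        # No live-range check: an SSA name always denotes the same value.
        operands = [source_of.get(x, x) for x in operands]
        if op == "copy":
            source_of[target] = operands[0]
            continue                     # the copy itself is no longer needed
        result.append((target, op, operands))
    return result


before = [
    ("b1", "copy", ["a1"]),
    ("c1", "copy", ["b1"]),
    ("d1", "+", ["c1", "b1"]),
]
after = propagate_copies(before)
```

Chains of copies collapse because each copy's operand is rewritten before the copy is recorded, so d1 ends up reading a1 twice.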