Classical Optimization
• Types of classical optimizations
  • Operation level: optimize one operation in isolation
  • Local: optimize pairs of operations in the same basic block (with or without dataflow analysis), e.g. peephole optimization
  • Global: optimize pairs of operations spanning multiple basic blocks; dataflow analysis is required in this case, e.g. reaching definitions, UD/DU chains, or SSA form
  • Loop: optimize loop bodies and nested loops

Local Constant Folding
• Goal: eliminate unnecessary operations
• Rules:
  • X is an arithmetic operation
  • If src1(X) and src2(X) are both constants, replace X with a move of the computed result
• Example (X is r7 = 4 + 1, with src1(X) = 4 and src2(X) = 1; folding yields r7 = 5):
    r7 = 4 + 1
    r5 = 2 * r4
    r6 = r5 * 2
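
The folding rule can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the slides: operations are modelled as hypothetical tuples `(dest, opcode, src1, src2)`, integer sources are literals, string sources are registers, and `fold_constants` is a made-up helper name.

```python
import operator

# Assumed IR (not from the slides): (dest, opcode, src1, src2).
# Integer sources are literals; string sources are register names.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    """Replace arithmetic ops whose two sources are literals with a move."""
    out = []
    for dest, op, s1, s2 in block:
        if op in ARITH and isinstance(s1, int) and isinstance(s2, int):
            # Both sources are known: apply the operation at compile time.
            out.append((dest, "mov", ARITH[op](s1, s2), None))
        else:
            out.append((dest, op, s1, s2))
    return out

block = [
    ("r7", "+", 4, 1),
    ("r5", "*", 2, "r4"),
    ("r6", "*", "r5", 2),
]
folded = fold_constants(block)
```

Only the first operation changes; the two multiplications each have a register source and are left alone.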

Local Constant Combining
• Goal: eliminate unnecessary operations
  • First operation often becomes dead
• Rules:
  • Operations X and Y are in the same basic block
  • X and Y each have at least one literal src
  • Y uses dest(X)
  • None of the srcs of X have defs between X and Y (excluding Y)
• Example:
    r7 = 5
    r5 = 2 * r4
    r6 = r5 * 2      →  r6 = r4 * 4
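
A minimal sketch of constant combining, restricted to pairs of multiplications for brevity. The tuple IR `(dest, opcode, src1, src2)` and the helper names `reg_and_literal` and `combine_multiplies` are assumptions made for this illustration, not part of the slides.

```python
def reg_and_literal(s1, s2):
    """Return (register, literal) if exactly one source is an int literal."""
    if isinstance(s1, str) and isinstance(s2, int):
        return s1, s2
    if isinstance(s1, int) and isinstance(s2, str):
        return s2, s1
    return None

def combine_multiplies(block):
    """Rewrite Y: e = d * c2 as e = r * (c1 * c2) when X: d = r * c1 reaches Y."""
    out = list(block)
    for j, (ydest, yop, ys1, ys2) in enumerate(out):
        y = reg_and_literal(ys1, ys2) if yop == "*" else None
        if y is None:
            continue
        yreg, ylit = y
        # X is the closest earlier definition of the register that Y reads.
        i = next((k for k in range(j - 1, -1, -1) if out[k][0] == yreg), None)
        if i is None:
            continue
        xdest, xop, xs1, xs2 = out[i]
        x = reg_and_literal(xs1, xs2) if xop == "*" else None
        if x is None:
            continue
        xreg, xlit = x
        # The register source of X must not be redefined at X or before Y.
        if any(out[k][0] == xreg for k in range(i, j)):
            continue
        out[j] = (ydest, "*", xreg, xlit * ylit)
    return out

block = [
    ("r7", "mov", 5, None),
    ("r5", "*", 2, "r4"),
    ("r6", "*", "r5", 2),
]
combined = combine_multiplies(block)
```

After the rewrite, `r5 = 2 * r4` is kept; it becomes a candidate for dead code elimination if nothing else uses r5.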

Local Strength Reduction
• Goal: replace expensive operations with cheaper ones
• Rules (example):
  • X is a multiplication operation where src1(X) or src2(X) is an integer literal of the form 2^k
  • Change X to a shift operation
  • For k = 1, an add can be used instead
• Example:
    r7 = 5
    r5 = 2 * r4      →  r5 = r4 + r4
    r6 = r4 * 4      →  r6 = r4 << 2
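
The multiply-to-shift rule can be sketched as follows. The tuple IR `(dest, opcode, src1, src2)` and the function name `reduce_strength` are assumptions for illustration only; the power-of-two test uses the usual `lit & (lit - 1)` bit trick.

```python
def reduce_strength(block):
    """Turn multiplications by 2^k into an add (k = 1) or a left shift (k > 1)."""
    out = []
    for dest, op, s1, s2 in block:
        new_op = (dest, op, s1, s2)
        if op == "*":
            # Multiplication commutes, so the literal may be in either slot.
            if isinstance(s1, str) and isinstance(s2, int):
                reg, lit = s1, s2
            elif isinstance(s1, int) and isinstance(s2, str):
                reg, lit = s2, s1
            else:
                reg, lit = None, 0
            # lit is a power of two greater than 1 iff exactly one bit is set.
            if lit > 1 and (lit & (lit - 1)) == 0:
                k = lit.bit_length() - 1
                new_op = (dest, "+", reg, reg) if k == 1 else (dest, "<<", reg, k)
        out.append(new_op)
    return out

block = [
    ("r7", "mov", 5, None),
    ("r5", "*", 2, "r4"),
    ("r6", "*", "r4", 4),
]
reduced = reduce_strength(block)
```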

Local Constant Propagation
• Goal: replace register uses with literals (constants) in a single basic block
• Rules:
  • Operation X is a move to a register with src1(X) literal
  • Operation Y uses dest(X)
  • There is no def of dest(X) between X and Y (excluding defs at X and Y)
  • Replace dest(X) in Y with src1(X)
• Example:
    r1 = 5
    r2 = _x
    r3 = 7
    r4 = r4 + r1
    r1 = r1 + r2
    r1 = r1 + 1
    r3 = 12
    r8 = r1 - r2
    r9 = r3 + r5
    r3 = r2 + 1
    r7 = r3 - r1
    M[r7] = 0
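
One way to apply these rules is a single forward pass that remembers which registers currently hold a literal. The sketch below assumes a hypothetical tuple IR `(dest, opcode, src1, src2)` and runs on a shortened variant of the example; `propagate_constants` is an illustrative name.

```python
def propagate_constants(block):
    """Forward pass over one basic block, replacing register uses by literals."""
    known = {}  # register -> literal it currently holds
    out = []
    for dest, op, s1, s2 in block:
        # Y reads its sources before it writes dest(Y), so substitute first.
        s1 = known.get(s1, s1) if isinstance(s1, str) else s1
        s2 = known.get(s2, s2) if isinstance(s2, str) else s2
        out.append((dest, op, s1, s2))
        if op == "mov" and isinstance(s1, int):
            known[dest] = s1          # X: move of a literal
        else:
            known.pop(dest, None)     # any other def kills the constant
    return out

block = [
    ("r1", "mov", 5, None),
    ("r3", "mov", 7, None),
    ("r4", "+", "r4", "r1"),
    ("r1", "+", "r1", "r2"),
    ("r8", "-", "r1", "r2"),
    ("r9", "+", "r3", "r5"),
]
propagated = propagate_constants(block)
```

The use of r1 in `r8 = r1 - r2` is not replaced, because `r1 = r1 + r2` redefines r1 between the move and that use.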

Local Common Subexpression Elimination (CSE)
• Goal: eliminate recomputations of an expression
  • More efficient code
  • Resulting moves can get copy propagated (see later)
• Rules:
  • Operations X and Y have the same opcode and Y follows X
  • src(X) = src(Y) for all srcs
  • For all srcs, no def of a src between X and Y (excluding Y)
  • No def of dest(X) between X and Y (excluding X and Y)
  • Replace Y with move dest(Y) = dest(X)
• Example:
    r1 = r2 + r3
    r4 = r4 + 1
    r1 = 6
    r6 = r2 + r3
    r2 = r1 - 1
    r5 = r4 + 1
    r7 = r2 + r3
    r5 = r1 - 1
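
The CSE rules can be sketched with a table of available expressions that is pruned at every def. The tuple IR `(dest, opcode, src1, src2)` and the name `local_cse` are assumptions for this illustration; operands are matched textually, so `r2 + r3` and `r3 + r2` are treated as different expressions.

```python
def local_cse(block):
    """Replace a recomputed expression with a move from the register holding it."""
    avail = {}  # (opcode, src1, src2) -> register currently holding that value
    out = []
    for dest, op, s1, s2 in block:
        key = (op, s1, s2)
        if op != "mov" and key in avail:
            new_op = (dest, "mov", avail[key], None)   # Y becomes dest(Y) = dest(X)
        else:
            new_op = (dest, op, s1, s2)
        out.append(new_op)
        # A def of dest kills expressions that read dest or are held in dest.
        avail = {k: r for k, r in avail.items()
                 if r != dest and dest not in k[1:]}
        # Record the expression unless the op overwrote one of its own sources.
        if new_op[1] != "mov" and dest not in (s1, s2):
            avail[key] = dest
    return out

block = [
    ("r1", "+", "r2", "r3"),
    ("r4", "+", "r4", 1),
    ("r1", "mov", 6, None),
    ("r6", "+", "r2", "r3"),
    ("r2", "-", "r1", 1),
    ("r5", "+", "r4", 1),
    ("r7", "+", "r2", "r3"),
    ("r5", "-", "r1", 1),
]
optimized = local_cse(block)
```

Under this sketch only the last operation is rewritten (to `r5 = r2`); every other apparent repeat is blocked by an intervening def of a source or of dest(X).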

Dead Code Elimination
• Goal: eliminate any operation whose result is never used
• Rules (dataflow required)
  • X is an operation with no use in its DU chain, i.e. dest(X) is not live
  • Delete X if removable (not a mem store or branch)
• Rules too simple!
  • Misses deletion of r4, even after deleting r7, since r4 is live in loop
  • Better is to trace UD chains backwards from "critical" operations
• Example flow graph, one basic block per line:
    r1 = 3; r2 = 10
    r4 = r4 + 1; r7 = r1 * r4
    r3 = r3 + 1
    r2 = 0
    r3 = r2 + r1
    M[r1] = r3
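
Tracing backwards from critical operations can be sketched as a mark-and-sweep pass. The version below makes several simplifying assumptions that are not in the slides: operations are hypothetical tuples `(dest, opcode, src1, src2)`, stores are the only critical operations, control flow is ignored, and a UD chain is approximated conservatively by "every def of that register in the region".

```python
def mark_and_sweep(ops):
    """Keep only operations that (transitively) feed a critical operation."""
    # Critical operations: stores here; branches would be added in a real pass.
    live = {i for i, (_, op, _, _) in enumerate(ops) if op == "store"}
    worklist = list(live)
    while worklist:
        _, _, s1, s2 = ops[worklist.pop()]
        for src in (s1, s2):
            if not isinstance(src, str):
                continue  # literals and empty slots have no defs
            # Stand-in for a UD chain: every def of src anywhere in the region.
            for i, (dest, _, _, _) in enumerate(ops):
                if dest == src and i not in live:
                    live.add(i)
                    worklist.append(i)
    return [op for i, op in enumerate(ops) if i in live]

ops = [
    ("r1", "mov", 3, None),
    ("r2", "mov", 10, None),
    ("r4", "+", "r4", 1),
    ("r7", "*", "r1", "r4"),
    ("r3", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r3", "+", "r2", "r1"),
    (None, "store", "r1", "r3"),   # M[r1] = r3
]
kept = mark_and_sweep(ops)
```

Because nothing critical depends on r4 or r7, both `r4 = r4 + 1` and `r7 = r1 * r4` are dropped in one pass, even though r4 feeds its own increment.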

Local Backward Copy Propagation
• Goal: propagate LHS of moves backward
  • Eliminates useless moves
• Rules (dataflow required)
  • X and Y are in the same block
  • Y is a move to a register
  • dest(X) is a register that is not live out of the block
  • Y uses dest(X)
  • dest(Y) is not used or defined between X and Y (excluding X and Y)
  • No uses of dest(X) after the first redef of dest(Y)
  • Replace src(Y), i.e. dest(X), on the path from X to Y with dest(Y) and remove Y
• Example:
    r1 = r8 + r9
    r2 = r9 + r1
    r4 = r2
    r6 = r2 + 1
    r9 = r1
    r7 = r6
    r5 = r6 + 1
    r4 = 0
    r8 = r2 + r7

Global Constant Propagation
• Goal: globally replace register uses with literals
• Rules (dataflow required)
  • X is a move to a register with src1(X) literal
  • Y uses dest(X)
  • dest(X) has only one def at X for UD chains to Y
  • Replace dest(X) in Y with src1(X)
• Example flow graph, one basic block per line:
    r1 = 4; r2 = 10
    r5 = 2; r7 = r1 * r5
    r3 = r3 + r5
    r2 = 0
    r3 = r2 + r1; r6 = r7 * r4
    M[r1] = r3

Global Constant Propagation with SSA
• Goal: globally replace register uses with literals
• Rules (high level)
  • For operation X with a register src(X)
  • Find the def of src(X) in the chain
  • If the def is a move of a literal, src(X) is constant: done
  • If the RHS of the def is an operation, including a φ node, recurse on all srcs
  • Apply the rule for the operation to determine whether src(X) is constant
• Note: abstract values ⊤ (top) and ⊥ (bottom) are often used to indicate unknown values
• Example flow graph, one basic block per line:
    r1 = 4; r2 = 10
    r5 = 2; r7 = r1 * r5
    r3 = r3 + r5
    r2 = 0
    r3 = r2 + r1; r6 = r7 * r4
    M[r1] = r3
• Exercise: compute SSA form and propagate constants
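
The role of the two abstract values can be sketched with a small lattice "meet" used to evaluate a φ node. This assumes one common convention, which the slides do not spell out: ⊤ means "no information yet" and ⊥ means "not a constant". The SSA names `r2_1`, `r2_2`, `r1_1`, `r1_2` and the helpers `meet` and `eval_phi` are hypothetical.

```python
TOP, BOTTOM = "TOP", "BOTTOM"   # assumed encodings of the abstract values

def meet(a, b):
    """Combine two lattice values arriving at a merge point."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    if a == BOTTOM or b == BOTTOM:
        return BOTTOM
    return a if a == b else BOTTOM   # two different constants: not constant

def eval_phi(args, values):
    """Value of a phi node: the meet over all of its incoming SSA names."""
    result = TOP
    for name in args:
        result = meet(result, values.get(name, TOP))
    return result

# Two defs of r2 with different literals merge to "not a constant".
values = {"r2_1": 10, "r2_2": 0, "r1_1": 4, "r1_2": 4}
phi_r2 = eval_phi(["r2_1", "r2_2"], values)
phi_r1 = eval_phi(["r1_1", "r1_2"], values)
```

A φ node is constant only when every incoming value that has been resolved so far is the same literal.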

Forward Copy Propagation
• Goal: globally propagate RHS of moves forward
  • Reduces dependence chain
  • May be possible to eliminate moves
• Rules (dataflow required)
  • X is a move with src1(X) register
  • Y uses dest(X)
  • dest(X) has only one def at X for UD chains to Y
  • src1(X) has no def on any path from X to Y
  • Replace dest(X) in Y with src1(X)
• Example flow graph, one basic block per line:
    r1 = r2; r3 = r4
    r6 = r3 + 1
    r2 = 0
    r5 = r2 + r3
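
Restricted to a single straight-line block, the copy propagation rules reduce to a forward pass with a table of valid copies. The sketch below is a local simplification of the global rules, under the assumption of a hypothetical tuple IR `(dest, opcode, src1, src2)`; the final `r7 = r1 + 1` is added here only to show a blocked case.

```python
def propagate_copies(block):
    """Replace uses of dest(X) by src1(X) while the copy X is still valid."""
    copy_of = {}  # dest(X) -> src1(X) for register moves X that are still valid
    out = []
    for dest, op, s1, s2 in block:
        s1 = copy_of.get(s1, s1) if isinstance(s1, str) else s1
        s2 = copy_of.get(s2, s2) if isinstance(s2, str) else s2
        out.append((dest, op, s1, s2))
        # A def of dest invalidates copies that write dest or read dest.
        copy_of = {d: s for d, s in copy_of.items() if d != dest and s != dest}
        if op == "mov" and isinstance(s1, str) and s1 != dest:
            copy_of[dest] = s1
    return out

block = [
    ("r1", "mov", "r2", None),
    ("r3", "mov", "r4", None),
    ("r6", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r5", "+", "r2", "r3"),
    ("r7", "+", "r1", 1),      # extra use, not in the slide example
]
result = propagate_copies(block)
```

Uses of r3 become r4, but the use of r1 after `r2 = 0` is left alone because src1(X) = r2 has been redefined.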

Global Common Subexpression Elimination (CSE)
• Goal: eliminate recomputations of an expression
• Rules:
  • X and Y have the same opcode and X dominates Y
  • src(X) = src(Y) for all srcs
  • For all srcs, no def of a src on any path between X and Y (excluding Y)
  • Insert rx = dest(X) immediately after X for new register rx
  • Replace Y with move dest(Y) = rx
• Example flow graph, one basic block per line:
    r1 = r2 * r6; r3 = r4 / r7
    r2 = r2 + 1
    r1 = r3 * 7
    r5 = r2 * r6; r8 = r4 / r7
    r9 = r3 * 7

Loop Optimizations
• Loops are the most important target for optimization
  • Programs spend much of their time in loops
• Loop optimizations
  • Invariant code removal (a.k.a. code motion)
  • Global variable migration
  • Induction variable strength reduction
  • Induction variable elimination

Code Motion
• Goal: move loop-invariant computations to the preheader
• Rules:
  • Operation X is in a block that dominates all exit blocks
  • X is the only operation to modify dest(X) in the loop body
  • All srcs of X have no defs in any of the basic blocks in the loop body
  • Move X to the end of the preheader
• Note 1: if one src of X is a memory load, need to check for stores in the loop body
• Note 2: X must be movable and not cause exceptions
• Example flow graph, one basic block per line:
    preheader:  r1 = 0
    header:     r4 = M[r5]; r7 = r4 * 3
                r8 = r2 + 1; r7 = r8 * r4
                r3 = r2 + 1
                r1 = r1 + r7
                M[r1] = r3
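
The src and dest rules can be sketched as a filter over the operations of a loop body. This is a partial check only: the dominance rule and Note 2 are not modelled and would have to be verified separately. The tuple IR `(dest, opcode, src1, src2)`, the `load`/`store` opcodes and the name `invariant_candidates` are assumptions for illustration, and loads are handled conservatively (any store in the loop blocks them).

```python
from collections import Counter

def invariant_candidates(loop_ops):
    """Ops whose srcs have no defs in the loop and whose dest has a single def."""
    def_count = Counter(dest for dest, _, _, _ in loop_ops if dest is not None)
    has_store = any(op == "store" for _, op, _, _ in loop_ops)
    result = []
    for dest, op, s1, s2 in loop_ops:
        if dest is None or def_count[dest] != 1:
            continue  # stores, or dest modified by more than one operation
        if any(isinstance(s, str) and def_count[s] > 0 for s in (s1, s2)):
            continue  # some src is defined inside the loop
        if op == "load" and has_store:
            continue  # Note 1, conservatively: a store might alias the load
        result.append((dest, op, s1, s2))
    return result

loop_ops = [
    ("r4", "load", "r5", None),    # r4 = M[r5]
    ("r7", "*", "r4", 3),
    ("r8", "+", "r2", 1),
    ("r7", "*", "r8", "r4"),
    ("r3", "+", "r2", 1),
    ("r1", "+", "r1", "r7"),
    (None, "store", "r1", "r3"),   # M[r1] = r3
]
candidates = invariant_candidates(loop_ops)
```

Under these assumptions only `r8 = r2 + 1` and `r3 = r2 + 1` pass the filter; whether each may actually move still depends on its block dominating all loop exits.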

Global Variable Migration
• Goal: assign a global variable to a register for the entire duration of a loop
• Rules:
  • X is a load or store to M[x]
  • Address x of M[x] is not modified in the loop
  • Replace all M[x] in the loop by a new register rx
  • Add rx = M[x] to the preheader
  • Add M[x] = rx to each loop exit
• Memory disambiguation is required: all mem ops in the loop whose address can equal x must use the same address x
• Example flow graph, one basic block per line:
    r4 = M[r5]; r4 = r4 + 1
    r8 = M[r5]; r7 = r8 * r4
    M[r5] = r4
    M[r5] = r7

Loop Strength Reduction (1)
• Goal: create basic IVs from derived IVs
• Rules
  • X is a *, <<, +, or - operation
  • src1(X) is a basic IV
  • src2(X) is invariant
  • No other ops modify dest(X)
  • dest(X) != src(X) for all srcs
  • dest(X) is a register
• Example flow graph, one basic block per line (X is r7 = r4 * r9, with src1(X) = r4, src2(X) = r9, dest(X) = r7):
    preheader:  (empty)
    header:     r5 = r4 - 3; r4 = r4 + 1
                r7 = r4 * r9
                r6 = r4 << 2
• Basic IV r4 has triple (r4, 1, ?)

Loop Strength Reduction (2)
• Transformation
  • Insert into the bottom of the preheader: new_reg = RHS(X)
  • If opcode(X) is not + or -, then insert into the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X)
  • Else new_inc = inc(src1(X))
  • Insert at each update of src1(X): new_reg += new_inc
  • Change X by: dest(X) = new_reg
• Example flow graph after the transformation, one basic block per line:
    preheader:  r1 = r4 * r9; r2 = 1 * r9
    header:     r5 = r4 - 3; r4 = r4 + 1; r1 = r1 + r2
                r7 = r1
                r6 = r4 << 2
• Exercise: apply strength reduction to r5 and r6
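
A quick way to convince yourself that the transformation preserves r7 is to simulate both versions of the loop. The two Python functions below are a hand-written model of the example, not compiler output; the function names, the trip count and the starting values are arbitrary assumptions.

```python
def original_loop(r4, r9, trips):
    """Loop body before the transformation: r7 = r4 * r9 on every iteration."""
    seen = []
    for _ in range(trips):
        r4 = r4 + 1
        r7 = r4 * r9
        seen.append(r7)
    return seen

def reduced_loop(r4, r9, trips):
    """Loop body after the transformation: the multiply becomes an add."""
    r1 = r4 * r9   # preheader: new_reg = RHS(X)
    r2 = 1 * r9    # preheader: new_inc = inc(r4) * src2(X)
    seen = []
    for _ in range(trips):
        r4 = r4 + 1
        r1 = r1 + r2   # inserted at the update of r4
        r7 = r1        # X is now a move
        seen.append(r7)
    return seen

before = original_loop(3, 5, 4)
after = reduced_loop(3, 5, 4)
```

Both lists hold the same sequence of r7 values, while the reduced loop performs no multiplication inside the loop.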

IV Elimination (1)
• Goal: remove unnecessary basic IVs from the loop by substituting uses with another basic IV
• Rules for IVs with same increment and initial value:
  • Find two basic IVs x and y
  • x and y are in the same family and have the same increment and initial values
  • They are incremented at the same place
  • x is not live at loop exit
  • For each basic block where x is defined, there are no uses of x between first/last def of x and last/first def of y
  • Replace uses of x with y
• Example flow graph, one basic block per line:
    r1 = 0; r2 = 0
    r1 = r1 - 1; r2 = r2 - 1
    r9 = r2 + r4
    r7 = r1 * r9
    r4 = M[r1]
    M[r2] = r7
• Exercise: apply IV elimination

IV Elimination (2)
• Many variants, from simple to complex:
  1. Trivial case: an IV that is never used except by its increment operations and is not live at loop exit
  2. IVs with the same increment and the same initial value
  3. IVs with the same increment whose initial values are a known constant offset from each other
  4. IVs with the same increment, but initial values unknown
  5. IVs with different increments and no info on initial values
• Methods 1 and 2 are virtually free, so they are always applied
• Methods 3 to 5 require preheader operations

IV Elimination (3)
• Example for method 4
• Before:
    preheader:  r1 = ?; r2 = ?
    loop:       r3 = M[r1+4]
                r4 = M[r2+8]
                …
                r1 = r1 + 4
                r2 = r2 + 4
• After:
    preheader:  r1 = ?; r2 = ?; r5 = r2-r1+8
    loop:       r3 = M[r1+4]
                r4 = M[r1+r5]
                …
                r1 = r1 + 4
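
The method 4 rewrite works because r2 - r1 stays constant when both IVs advance by the same increment, so r2 + 8 always equals r1 + r5. The sketch below checks this by recording the addresses touched by both versions of the loop; the function names and the starting values are arbitrary assumptions.

```python
def addresses_before(r1, r2, trips):
    """Addresses loaded by the original loop, which keeps both IVs."""
    trace = []
    for _ in range(trips):
        trace.append((r1 + 4, r2 + 8))   # r3 = M[r1+4]; r4 = M[r2+8]
        r1 += 4
        r2 += 4
    return trace

def addresses_after(r1, r2, trips):
    """Addresses loaded after r2 is eliminated in favour of r1."""
    r5 = r2 - r1 + 8                      # preheader operation
    trace = []
    for _ in range(trips):
        trace.append((r1 + 4, r1 + r5))   # r3 = M[r1+4]; r4 = M[r1+r5]
        r1 += 4
    return trace

old = addresses_before(100, 260, 3)
new = addresses_after(100, 260, 3)
```

The price of removing one increment from the loop is the single subtract-and-add in the preheader, which is why this method is not free.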