720 likes | 881 Views
Introduction to Compilers. Ben Livshits. Based in part of Stanford class slides from http ://infolab.stanford.edu/~ullman/dragon/w06/w06.html. Organization. Really basic stuff Flow Graphs Constant Folding Global Common Subexpressions Induction Variables/Reduction in Strength
E N D
Introduction to Compilers Ben Livshits Based in part of Stanford class slides from http://infolab.stanford.edu/~ullman/dragon/w06/w06.html
Organization • Really basic stuff • Flow Graphs • Constant Folding • Global Common Subexpressions • Induction Variables/Reduction in Strength • Data-flow analysis • Proving Little Theorems • Data-Flow Equations • Major Examples • Pointer analysis
Really Basic Stuff Flow Graphs Constant Folding Global Common Subexpressions Induction Variables/Reduction in Strength
Dawn of Code Optimization • A never-published Stanford technical report by Fran Allen in 1968 • Flow graphs of intermediate code • Key things worth doing
Intermediate Code for (i=0; i<n; i++) A[i] = 1; • Intermediate code exposes optimizable constructs we cannot see at source-code level. • Make flow explicit by breaking into basic blocks = sequences of steps with entry at beginning, exit at end.
i = 0 if i>=n goto … t1 = 8*i A[t1] = 1 i = i+1 Basic Blocks for (i=0; i<n; i++) A[i] = 1;
Induction Variables • x is an induction variable in a loop if it takes on a linear sequence of values each time through the loop. • Common case: loop index like i and computed array index like t1. • Eliminate “superfluous” induction variables. • Replace multiplication by addition (reduction in strength ).
i = 0 if i>=n goto … t1 = 8*i A[t1] = 1 i = i+1 Example t1 = 0 n1 = 8*n if t1>=n1 goto … A[t1] = 1 t1 = t1+8
Loop-Invariant Code Motion • Sometimes, a computation is done each time around a loop. • Move it before the loop to save n-1 computations. • Be careful: could n=0? I.e., the loop is typically executed 0 times.
Example i = 0 i = 0 t1 = y+z if i>=n goto … if i>=n goto … t1 = y+z x = x+t1 i = i+1 x = x+t1 i = i+1
Constant Folding • Sometimes a variable has a known constant value at a point. • If so, replacing the variable by the constant simplifies and speeds-up the code. • Easy within a basic block; harder across blocks.
i = 0 n = 100 if i>=n goto … t1 = 8*i A[t1] = 1 i = i+1 Example t1 = 0 if t1>=800 goto … A[t1] = 1 t1 = t1+8
Global Common Subexpressions • Suppose block B has a computation of x+y. • Suppose we are sure that when we reach this computation, we are sure to have: • Computed x+y, and • Not subsequently reassigned x or y. • Then we can hold the value of x+y and use it in B.
a = x+y t = x+y a = t b = x+y t = x+y b = t c = x+y c = t Example
t = x+y a = t t = x+y b = t c = t Example --- Even Better t = x+y a = t b = t t = x+y b = t c = t
Data-Flow Analysis Proving Little Theorems Data-Flow Equations Major Examples
An Obvious Theorem boolean x = true; while (x) { . . . // no change to x } • Doesn’t terminate. • Proof: only assignment to x is at top, so x is always true.
As a Flow Graph x = true if x == true “body”
Formulation: Reaching Definitions • Each place some variable x is assigned is a definition. • Ask: for this use of x, where could x last have been defined. • In our example: only at x=true.
d2 d1 Example: Reaching Definitions d1: x = true d1 d2 if x == true d1 d2: a = 10
Clincher • Since at x == true, d1 is the only definition of x that reaches, it must be that x is true at that point. • The conditional is not really a conditional and can be replaced by a branch.
Not Always That Easy int i = 2; int j = 3; while (i != j) { if (i < j) i += 2; else j += 2; } • We’ll develop techniques for this problem, but later …
d1 d2 d3 d4 d2, d3, d4 d1, d3, d4 d1, d2, d3, d4 d1, d2, d3, d4 The Flow Graph d1: i = 2 d2: j = 3 if i != j d1, d2, d3, d4 if i < j d3: i = i+2 d4: j = j+2
DFA Is Sometimes Insufficient • In this example, i can be defined in two places, and j in two places. • No obvious way to discover that i!=j is always true. • But OK, because reaching definitions is sufficient to catch most opportunities for constant folding (replacement of a variable by its only possible value).
Be Conservative! • (Code optimization only) • It’s OK to discover a subset of the opportunities to make some code-improving transformation. • It’s notOK to think you have an opportunity that you don’t really have.
Example: Be Conservative boolean x = true; while (x) { . . . *p = false; . . . } • Is it possible that p points to x?
Another def of x d2 As a Flow Graph d1: x = true d1 if x == true d2: *p = false
Possible Resolution • Just as data-flow analysis of “reaching definitions” can tell what definitions of x might reach a point, another DFA can eliminate cases where p definitely does not point to x. • Example: the only definition of p is p = &y and there is no possibility that y is an alias of x.
Reaching Definitions Formalized • A definition d of a variable x is said to reach a point p in a flow graph if: • Every path from the entry of the flow graph to p has d on the path, and • After the last occurrence of d there is no possibility that x is redefined.
Data-Flow Equations --- (1) • A basic block can generate a definition. • A basic block can either • Kill a definition of x if it surely redefines x. • Transmit a definition if it may not redefine the same variable(s) as that definition.
Data-Flow Equations --- (2) • Variables: • IN(B) = set of definitions reaching the beginning of block B. • OUT(B) = set of definitions reaching the end of B.
Data-Flow Equations --- (3) • Two kinds of equations: • Confluence equations : IN(B) in terms of outs of predecessors of B. • Transfer equations : OUT(B) in terms of of IN(B) and what goes on in block B.
{d1, d2} {d2, d3} Confluence Equations IN(B) = ∪predecessors P of B OUT(P) P1 P2 {d1, d2, d3} B
Transfer Equations • Generate a definition in the block if its variable is not definitely rewritten later in the basic block. • Kill a definition if its variable is definitely rewritten in the block. • An internal definition may be both killed and generated.
Example: Gen and Kill IN = {d2(x), d3(y), d3(z), d5(y), d6(y), d7(z)} Kill includes {d1(x), d2(x), d3(y), d5(y), d6(y),…} d1: y = 3 d2: x = y+z d3: *p = 10 d4: y = 5 Gen = {d2(x), d3(x), d3(z),…, d4(y)} OUT = {d2(x), d3(x), d3(z),…, d4(y), d7(z)}
Transfer Function for a Block • For any block B: OUT(B) = (IN(B) – Kill(B)) ∪Gen(B)
Iterative Solution to Equations • For an n-block flow graph, there are 2n equations in 2n unknowns. • Alas, the solution is not unique. • Use iterative solution to get the least fixed-point. • Identifies any def that might reach a point.
Iterative Solution --- (2) IN(entry) = ∅; for each block B do OUT(B)= ∅; while (changes occur) do for each block B do { IN(B) = ∪predecessors P of B OUT(P); OUT(B) = (IN(B) – Kill(B)) ∪Gen(B); }
IN(B1) = {} OUT(B1) = { IN(B2) = { d1, OUT(B2) = { IN(B3) = { d1, OUT(B3) = { Example: Reaching Definitions d1: x = 5 B1 d1} d2} if x == 10 B2 d1, d2} d2} d2: x = 15 B3 d2}
Aside: Notice the Conservatism • Not only the most conservative assumption about when a def is killed or gen’d. • Also the conservative assumption that any path in the flow graph can actually be taken.
Everything Else About Data Flow Analysis Flow- and Context-Sensitivity Logical Representation Pointer Analysis Interprocedural Analysis
Three Levels of Sensitivity • In DFA so far, we have cared about where in the program we are. • Called flow-sensitivity. • But we didn’t care how we got there. • Called context-sensitivity. • We could even care about neither. • Example: where could x ever be defined in this program?
Flow/Context Insensitivity • Not so bad when program units are small (few assignments to any variable). • Example: Java code often consists of many small methods. • Remember: you can distinguish variables by their full name, e.g., class.method.block.identifier.
Context Sensitivity • Can distinguish paths to a given point. • Example: If we remembered paths, we would not have the problem in the constant-propagation framework where x+y = 5 but neither x nor y is constant over all paths.
The Example Again x = 2 y = 3 x = 3 y = 2 z = x+y
An Interprocedural Example int id(int x) {return x;} void p() {a=2; b=id(a);…} void q() {c=3; d=id(c);…} • If we distinguish p calling id from q calling id, then we can discover b=2 and d=3. • Otherwise, we think b, d = {2, 3}.
Context-Sensitivity --- (2) • Loops and recursive calls lead to an infinite number of contexts. • Generally used only for interprocedural analysis, so forget about loops. • Need to collapse strong components of the calling graph to a single group. • “Context” becomes the sequence of groups on the calling stack.
Example: Calling Graph t Contexts: Green Green, pink Green, yellow Green, pink, yellow r s p q main
Comparative Complexity • Insensitive: proportional to size of program (number of variables). • Flow-Sensitive: size of program, squared (points times variables). • Context-Sensitive: worst-case exponential in program size (acyclic paths through the code).