330 likes | 910 Views
Data-Flow Analysis II. CS 671 March 13, 2008. Data-Flow Analysis. Gather conservative, approximate information about what a program does Result: some property that holds every time the instruction executes The Data-Flow Abstraction Execution of an instruction transforms program state
E N D
Data-Flow Analysis II CS 671 March 13, 2008
Data-Flow Analysis • Gather conservative, approximate information about what a program does • Result: some property that holds every time the instruction executes • The Data-Flow Abstraction • Execution of an instruction transforms program state • To analyze a program, we must consider all possible sequences of program points (paths) • Summarize all possible program states with finite set of facts • Limitation: may consider some infeasible paths
The General Approach • Setting up and solving systems of equations that relate information at various points in the program • such as out[S] = gen[S] È ( in[S] - kill[S] ) where • S is a statement • in[S] and out[S] are information before and after S • gen[S] and kill[S] are information generated and killed by S • definition of in, out, gen, and kill depends on the desired information
Data-Flow Analysis (cont.) • Properties: • either a forward analysis (out as function of in) or • a backward analysis (in as a function of out). • either an “along some path” problem or • an “along all paths” problem. • Data-flow analysis must be conservative • Definitions: • point between two statements (or before the first statements and after the last) • path is a sequence of consecutive points in the control-flow graph
Example – Live Variables • Steps: • Set up live sets for each program point • Instantiate equations • Solve equations if (c) x = y+1 y = 2*z if (d) x = y+z z = 1 z = x
Example • Program points L1 if (c) L2 L3 x = y+1 y = 2*z if (d) L4 L5 L6 L7 x = y+z L8 L9 z = 1 L10 L11 z = x L12
Example L1 if (c) 1 L2 L3 x = y+1 y = 2*z if (d) 2 L4 3 L5 4 L6 L7 5 x = y+z L8 L9 z = 1 6 L10 L11 7 z = x L12
in[I] = ( out[I] – def[I] ) use[I] out[B] = in[B’] B’ succ(B) Example L1 = L2 = L3 = L4 = L5 = L6 = L7 = L8 = L9 = L10 = L11 = L12 = L1 = { } if (c) 1 L2 = { } L3 = { } x = y+1 y = 2*z if (d) 2 L4 = { } 3 L5 = { } 4 L6 = { } L7 = { } 5 x = y+z L8 = { } L9 = { } z = 1 6 L10 = { } L11 = { } 7 z = x L12 = { }
More Terminology • Successors • Succ(B1) = • Succ(B2) = • Succ(B3) = • Predecessors • Pred(B2) = • Pred(B3) = • Pred(B4) = B1 B2 B3 B4 • Branch node – more than one successor • Join node – more than one predecessor
Dominators • Dominance is a binary relation on the flow graph nodes that allows us to easily find loops • Node d dominates node i (d dom i) if every possible execution path from entry to i includes d • Dominance is: • Reflexive – every node dominates itself • Transitive – if a dom b and b dom c, then a dom c • Antisymmetric – if a dom b and b dom a then a=b • dom(entry) = • dom(b1) = • dom(b2) = • dom(b3) = • dom(b4) = • dom(b5) = • dom(b6) = • dom(exit) = entry B1 B2 B3 B4 B5 B6 exit
Immediate dominators • Idom(b) – a iff (a b) and (a dom b) and there does not exist a node c such that (a dom c) and (c dom b) with c different than a and b • Idom of a node is unique • Idom relationship forms a tree whose root is the entry node • idom(b1) = • idom(b2) = • idom(b3) = • idom(b4) = • idom(b5) = • idom(b6) = • idom(exit) = entry B1 B2 B3 B4 B5 B6 exit • Flow graph
Strict Dominators and Postdominators • (d sdom i) if d dominates i and d i • (p pdom i) if every possible execution path from i to exit includes p entry • pdom(entry) = • pdom(b1) = • pdom(b2) = • pdom(b3) = • pdom(b4) = • pdom(b5) = • pdom(b6) = B1 B2 B3 B4 B5 B6 exit • Flow graph
Loops • Back edge – edge whose head dominates its tail • Loop containing this type of back edge is a natural loop • i.e. it has a single external entry point • For back edge b c the loop header is c entry B1 • Natural loops = • Loop header (B3 B1) = • Loop header (B2 B2) = B2 B3 exit
Quicksort Example • How might we optimize this code? i := m-1 j := n t1 := 4*n v := a[t1] b1 i := i+1 t2 := 4*i t3 := a[t2] if t3 < v goto b2 b2 j := j-1 t4 := 4*j t5 := a[t4] if t5 > v goto b3 b3 t6 :=4*i x := a[t6] t7 := 4*i t8 := 4*j t9 := a[t8] a[t7] :=t9 t10 := 4*j a[t10] := x t11 := 4*i x := a[t11] t12 := 4*i t13 := 4*n t14 := a[t13] a[t12] := t14 t15 := 4*n a[t15] := x if i >= j goto b6 b4 b5 b6 [Quicksort] (i, j, v, x variables are needed outside)
Reaching Definitions • Informally: • determine if a particular definition (e.g. “x” in “x = 5”) may reach a given point in the program • Why reaching definitions may be useful: x := 5 y := x + 2 if “x := 5” is the only definition reaching “y := x+2”, it can be simplified to “y := 7” (constant propagation)
Reaching Definitions • Definition of a variable X: • is a statements that assigns (or may assign) a value to X • unambiguous: X := 3 • ambiguous: foo(X) or *Y := 3 • A definition d reaches a point p : • if there is a path from the point immediately following d to p, • such that d is not killed along that path. • A definition d of variable X is killed along path p • if there is another definition of X along p.
Reaching Definitions (cont.) • Has the following properties: • forward analysis • “along some path” problem • Is conservative in that: • definition d may not define variable X • along a path p, there is another definition of X, but this other definition is ambiguous • definition d may be killed along infeasible paths
1 1 2 3 1-2-3 2-3 Data-Flow Analysis: Structured Programs • Most programs are structured: • sequence of statements • if-then-else construct • while-loops (including for-loops, loops with breaks,...) • For these programs, we may use an inductive (syntax driven) approach: 1-2-3
d: a=b+c S S Reaching Definitions for Structured Programs gen[S] = {d} kill[S] = All-defs-of-a - {d} out[S] = gen[S]È ( in[S] - kill[S] ) gen[S] = gen[S2]È ( gen[S1] - kill[S2] ) kill[S] = kill[S2]È ( kill[S1] - gen[S2] ) in[S1] = in[S] in[S2] = out[S1] out[S] = out[S2] S1 S2
S S Reaching Definitions for Structured Programs (cont.) gen[S] = gen[S1]Ègen[S2] kill[S] = kill[S1]Ç kill[S2] in[S1] = in[S2] = in[S] out[S] = out[S1]È out[S2] S1 S2 gen[S] = gen[S1] kill[S] = kill[S1] in[S1] = in[S]È gen[S1] out[S] = out[S1] S1
Iterative Solution: Data-Flow Equations • Inductive approach only applicable to structured programs • because utilizes the structure of the program to synthesize & distribute the data-flow information • Need a general technique: Iterative Approach • compute the gen/kill sets of each statement / basic block • initialize the in/out sets • repetitively compute out/in sets until a steady state is reached
Reaching Definitions • Reaching definitions: • set of definitions that may reach (along one or more paths) a given point • gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S. • kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning. • Equations: • in[S] =È (P a predecessor of S) out[P ] • out[S] = gen[S] È ( in[S] - kill[S] )
Reaching Definitions (cont.) • Algorithm: for each basic block B: out[B] := gen[B]; (1) do change := false; for each basic block B do in[B] =È (P a predecessor of B) out[P ]; (2) old-out = out[B]; (3) out[B] = gen[B] È (in[B] - kill[B]); (4) if (out[B] != old-out) then change := true; (5) end while change
gen[b1] := {d1, d2, d3} kill[b1] := {d4, d5, d6, d7} i := m-1 d1 j := n d2 a := u1 d3 b1 gen[b2] := {} kill[b2] := {} gen[b3] := {} kill[b3] := {} i := i+1 d4 j := j-1 d5 b2 gen[b4] := {} kill[b4] := {} a := u2 d6 b3 i := u3 d7 b4 Example for Reaching Definitions Compute gen/kill and iterate (visiting order: b1, b2, b3, b4) b1 b2 b3 b4 initial in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000 pass1 in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000 pass2 in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000 pass3 in[B] 000 0000 000 0000 000 0000 000 0000 out[B] 000 0000 000 0000 000 0000 000 0000
Generalizations: Other Data-Flow Analyses • Reaching definitions is a (forward; some-path) analysis • For backward analysis: • interchange in / out sets in the previous algorithm, lines (1-5) • For all-path analysis: • intersection is substituted for union in line (2)
Common Subexpression Elimination • Rule used to eliminate subexpression within a basic block • The subexpression was already defined • The value of the subexpression is not modified • i.e. none of the values needed to compute the subexpression are redefined • What about eliminating subexpressions across basic blocks?
Available Expressions • An expression x+y is available at a point p: • if every path from the initial node to p evaluates x+y, and • after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y. • Definitions: • forward, all-path, • e-gen[S]: expressions definitely generated by S, • e.g. “z := x+y”: expression “x+y” is generated • e-kill[S]: expressions that may be killed by S • e.g. “z := x+y”: all expression containing “z” are killed. • order: compute e-gen and then e-kill, e.g. “x:= x+y”
Available Expressions (cont.) • Algorithm: for each basic block B: out[B] := e-gen[B]; (1) do change := false; for each basic block B do in[B] = Ç (P a predecessor of B) out[P]; (2) old-out = out[B]; (3) out[B] = e-gen[B] È (in[B] - e-kill[B]); (4) if (out[B] != old-out) then change := true; (5) end while change difference: line (2), use intersection instead of union
Pointer Analysis • Identify the memory locations that may be addressed by a pointer • may be formalized as a system of data-flow equations. • Simple programming model: • pointer to integer (or float, arrays of integer, arrays of float) • no pointer to pointers allowed • Definitions: • in[S]: the set of pairs (p, a), where p is a pointer, a is a variables, and p might point to a before statement S. • out[S]: the set of pairs (p, a), where p might point to a after statement S. • gen[S]: the new pairs (p, a) generated by the statement S. • kill[S]: the pairs (p, a) killed by the statement S.
S: a=b+c S: p = &a S: p = q Pointer Analysis (cont.) input set gen[S ] = { } kill[S ] = { } input set gen[S ] = { (p, a) } kill[S, input set ] = { (p, b) | (p, b) is in input set } input set gen[S, input set ] = { (p, b) | (q, b) is in input set } kill[S, input set ] = { (p, b) | (p, b) is in input set }
Pointer Analysis (cont.) • Algorithm: for each basic block B: out[B] := gen[B ]; (1) do change := false; for each basic block B do in[B] =È (P a predecessor of B) out[P]; (2) old-out = out[B]; (3) out[B] = gen[B, in[B] ] È (in[B] - kill[B, in[B] ] ) (4) if (out[B] != old-out) then change := true; (5) end while change difference: line (4): gen and kill are functions of B and in[B].
Performance of Iterative Solutions • Global analysis may be memory-space / computing intensive • May be reduced by • using bitvector representations for sets • analyzing only relevant variables • e.g. temporary variables may be ignored • synthesizing data-flow within basic block • mixing inductive and iterative solutions • suitably ordering the basic block • e.g. depth first order is good for forward analysis • limiting scope • may reduce the precision of analysis
Summary • Iterative algorithm: • solve data-flow problem for arbitrary control flow graph • To solve a new data-flow problem: • define gen/kill accordingly • determine properties: • forward / backward • some-path / all-path