Learn about using lattices for dataflow analysis, how to deal with termination worries, and achieve fixpoints in program analyzers. Explore lattice concepts and techniques to optimize program analysis.
CSE 231 : Advanced Compilers Building Program Analyzers
http://blog.ezyang.com/2011/04/hoopl-dataflow-lattices/
http://lambda-the-ultimate.org/node/3557
http://research.microsoft.com/en-us/um/people/simonpj/papers/c--/dfopt.pdf
Started with sets. Termination worries.
    for edge e in CFG:
        m[e] = EMPTY
    for node n in CFG:
        q.push(n)
    while not q.empty():
        n = q.pop()
        info_in = m[n.in_edges]
        info_out = F(n, info_in)
        for i from 0 to len(info_out) - 1:
            e = n.out_edges[i]
            new_info = m[e] UNION info_out[i]
            if m[e] != new_info:
                m[e] = new_info
                q.push(e.dest)
Foundations : Lattices
A lattice is (S, ⊑, ⊥, ⊤, ⊔, ⊓) where:
    (S, ⊑) is a poset
    ⊥ is the smallest thing in S
    ⊤ is the biggest thing in S
    lub(a, b) and glb(a, b) always exist
    a ⊔ b = lub(a, b)
    a ⊓ b = glb(a, b)
Foundations : Lattices (Formally)
A lattice is (S, ⊑, ⊥, ⊤, ⊔, ⊓) where:
    (S, ⊑) is a poset
    ∀ a ∈ S . ⊥ ⊑ a
    ∀ a ∈ S . a ⊑ ⊤
    ∀ a, b ∈ S . ∃ c . c = lub(a, b) ∧ a ⊔ b = c
    ∀ a, b ∈ S . ∃ c . c = glb(a, b) ∧ a ⊓ b = c
Foundations : Fancy Lattice Names ⊥ is “bottom” ⊤ is “top” ⊔ is “join” ⊓ is “meet”
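As a concrete instance of the definition, here is a minimal Python sketch of one familiar lattice: all subsets of a small set, ordered by inclusion. The universe `{"d1", "d2", "d3"}` and every function name are assumptions made for illustration, not part of the slides. The brute-force checker only confirms the lub/glb properties on this one tiny lattice.

```python
from itertools import combinations

# Hypothetical example lattice: all subsets of a small universe, ordered by inclusion.
UNIVERSE = frozenset({"d1", "d2", "d3"})
BOTTOM = frozenset()   # smallest element of S
TOP = UNIVERSE         # biggest element of S

def leq(a, b):
    """a ⊑ b, here subset inclusion."""
    return a <= b

def join(a, b):
    """a ⊔ b (least upper bound), here set union."""
    return a | b

def meet(a, b):
    """a ⊓ b (greatest lower bound), here set intersection."""
    return a & b

def elements():
    """Enumerate every element of S, i.e. every subset of UNIVERSE."""
    items = sorted(UNIVERSE)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_lattice_laws():
    """Brute-force check of the defining properties on this small lattice."""
    S = elements()
    for a in S:
        assert leq(BOTTOM, a) and leq(a, TOP)
        for b in S:
            j, m = join(a, b), meet(a, b)
            # j is an upper bound of a and b, and lies below every other upper bound
            assert leq(a, j) and leq(b, j)
            assert all(leq(j, c) for c in S if leq(a, c) and leq(b, c))
            # m is a lower bound of a and b, and lies above every other lower bound
            assert leq(m, a) and leq(m, b)
            assert all(leq(c, m) for c in S if leq(c, a) and leq(c, b))
    return True
```

Calling `check_lattice_laws()` walks all 8 elements and all pairs, which is only feasible because the example is tiny.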
Port to lattices. Small patch set.
    for edge e in CFG:
        m[e] = BOTTOM
    for node n in CFG:
        q.push(n)
    while not q.empty():
        n = q.pop()
        info_in = m[n.in_edges]
        info_out = F(n, info_in)
        for i from 0 to len(info_out) - 1:
            e = n.out_edges[i]
            new_info = m[e] JOIN info_out[i]
            if m[e] != new_info:
                m[e] = new_info
                q.push(e.dest)
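The lattice version of the worklist algorithm can be sketched in runnable Python roughly as follows. The CFG (`NODES`, `EDGES`), the `GEN`/`KILL` tables, and the reaching-definitions flow function are all hypothetical, chosen only to have something small to run; the slides do not fix a concrete program.

```python
from collections import deque

# Hypothetical CFG: a defines x (d1), then a loop h <-> b where b redefines x (d2).
NODES = ["a", "h", "b", "exit"]
EDGES = [("a", "h"), ("h", "b"), ("b", "h"), ("h", "exit")]
GEN = {"a": frozenset({"d1"}), "b": frozenset({"d2"})}
KILL = {"a": frozenset({"d1", "d2"}), "b": frozenset({"d1", "d2"})}

IN_EDGES = {n: [e for e in EDGES if e[1] == n] for n in NODES}
OUT_EDGES = {n: [e for e in EDGES if e[0] == n] for n in NODES}

def flow(n, info_in):
    """Assumed flow function F(n, info_in): one result per outgoing edge of n."""
    merged = frozenset().union(*info_in)
    out = (merged - KILL.get(n, frozenset())) | GEN.get(n, frozenset())
    return [out] * len(OUT_EDGES[n])

def worklist(bottom, join):
    m = {e: bottom for e in EDGES}          # m[e] = BOTTOM
    q = deque(NODES)                        # every node starts on the queue
    while q:
        n = q.popleft()
        info_in = [m[e] for e in IN_EDGES[n]]
        info_out = flow(n, info_in)
        for e, out in zip(OUT_EDGES[n], info_out):
            new_info = join(m[e], out)      # m[e] JOIN info_out[i]
            if m[e] != new_info:
                m[e] = new_info
                q.append(e[1])              # re-process the edge's destination
    return m

result = worklist(bottom=frozenset(), join=lambda a, b: a | b)
```

On this example both definitions reach the loop head's outgoing edges, while only `d2` flows along the back edge from `b`.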
Termination. But can we do better?
    while not q.empty():
        n = q.pop()
        info_in = m[n.in_edges]
        info_out = F(n, info_in)
        for i from 0 to len(info_out) - 1:
            e = n.out_edges[i]
            new_info = m[e] JOIN info_out[i]
            if m[e] != new_info:
                m[e] = new_info
                q.push(e.dest)
Finite lattice height implies termination. Get rid of that JOIN right in the middle? Yes. The trick is in the flow functions.
Safely Getting Rid Of The Join In general, we can’t remove the join and still guarantee termination. So, when is it OK? Port our algorithm to math to figure it out. Build in terms of fixpoints.
Fixpoints Are Easy Fixpoint: an input equal to the output. A fixpoint of F is any X such that F(X) = X. “Best” answer, since repeating yields the same result.
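A tiny Python sketch of the definition, using a made-up function `add_zero` on sets (an assumption for illustration only). It also shows that a function can have more than one fixpoint, which is why the choice of fixpoint matters later.

```python
def is_fixpoint(f, x):
    """x is a fixpoint of f exactly when f(x) == x."""
    return f(x) == x

def add_zero(xs):
    """Hypothetical example function: add the element 0 to a set."""
    return xs | frozenset({0})

# Any set that already contains 0 is a fixpoint of add_zero.
small = frozenset({0})
large = frozenset({0, 5})
empty = frozenset()
```

Here `small` and `large` are both fixpoints of `add_zero`, while `empty` is not.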
Using Fixpoints In Analysis Goal: compute map m from CFG edges to dataflow information Strategy: define a global flow function F as follows: F takes a map m as a parameter and returns a new map m’, in which individual local flow functions have been applied
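One way to picture the global flow function is the Python sketch below. It assumes a hypothetical four-node CFG and reaching-definitions `GEN`/`KILL` tables; `local_flow` and `global_F` are illustrative names, not from the slides. Note that `global_F` builds m’ purely from m, with no join against the old value on each edge.

```python
# Hypothetical CFG: a defines x (d1), then a loop h <-> b where b redefines x (d2).
EDGES = [("a", "h"), ("h", "b"), ("b", "h"), ("h", "exit")]
GEN = {"a": frozenset({"d1"}), "b": frozenset({"d2"})}
KILL = {"a": frozenset({"d1", "d2"}), "b": frozenset({"d1", "d2"})}

def local_flow(n, info_in):
    """Assumed local flow function for node n (reaching definitions)."""
    merged = frozenset().union(*info_in)
    return (merged - KILL.get(n, frozenset())) | GEN.get(n, frozenset())

def global_F(m):
    """Return m': each edge gets the result of its source node's local
    flow function applied to that node's incoming edges in m."""
    m_new = {}
    for (src, dst) in EDGES:
        info_in = [m[e] for e in EDGES if e[1] == src]
        m_new[(src, dst)] = local_flow(src, info_in)
    return m_new

bottom_map = {e: frozenset() for e in EDGES}   # every edge mapped to the empty set
m1 = global_F(bottom_map)
m2 = global_F(m1)
```

On this example `m2` is already a fixed point: applying `global_F` to it again returns the same map.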
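The Big F [Figure: F takes the map m and produces the map m’]
Just Regular Flow Funcs Inside [Figure: inside F, the local flow functions f1, f2, f3 take m to m’]
Goal: Find Fixpoint of F [Figure: F applied repeatedly, starting from m]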
Fixpoint of F Goal: a fixed point of F, i.e. m where m = F(m) How should we do this? Let ⊥ (on maps) be ⊥ lifted to a map: ⊥ = λ e . ⊥, the map sending every edge e to ⊥ Compute F(⊥), then F(F(⊥)), then F(F(F(⊥))), ... until the result doesn’t change … but how long could that take ???
Fixpoint of F : Formal Solution
Solution: ⊔_{i=0}^{∞} F^i(⊥)
We want F^1(⊥) ⊑ F^2(⊥) ⊑ F^3(⊥) … ⊑ F^k(⊥)
Allows us to eliminate the big join.
Just require F to be monotonic: ∀ a b, a ⊑ b ➞ F(a) ⊑ F(b)
Back To Termination Solution: if F is monotonic, we have it. With finite lattice height, we get termination without joins! OK. But how do we know F is monotonic? F is monotonic if the individual flow functions are monotonic.
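The termination argument can be watched in action with a small Python sketch. The function `F` below is a made-up monotonic function on subsets of {0, ..., 4} (an assumption for illustration; it is not a flow function from the slides), and `kleene_chain` is a hypothetical helper that records ⊥, F(⊥), F(F(⊥)), ... and checks that each step moves up in the ordering.

```python
def kleene_chain(F, bottom, leq, max_steps=1000):
    """Iterate F from bottom with no join, recording the chain of values."""
    chain = [bottom]
    for _ in range(max_steps):
        nxt = F(chain[-1])
        # For monotonic F the chain must be ascending.
        assert leq(chain[-1], nxt), "chain is not ascending; is F monotonic?"
        if nxt == chain[-1]:
            return chain            # reached a fixed point
        chain.append(nxt)
    raise RuntimeError("no fixed point within max_steps")

def F(xs):
    """Hypothetical monotonic function: always contains 0, and the
    successor of every element below 4."""
    return frozenset({0}) | frozenset(x + 1 for x in xs if x < 4)

chain = kleene_chain(F, frozenset(), leq=lambda a, b: a <= b)
```

The lattice of subsets of a 5-element set has height 5, so the chain can strictly grow at most 5 times before it must stabilize, which is what happens here.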
Another benefit of monotonicity • Suppose Martians came to earth, and miraculously gave you a fixed point of F, call it fp. • Then: ⊥ ⊑ fp, so by monotonicity F(⊥) ⊑ F(fp) = fp, and by induction F^i(⊥) ⊑ fp for every i.
Another benefit of monotonicity • We are computing the least fixed point...
Recap • Let’s do a recap of what we’ve seen so far • Started with worklist algorithm for reaching definitions
Worklist algorithm for reaching defns
    let m: map from edge to computed value at edge
    let worklist: work list of nodes
    for each edge e in CFG do
        m(e) := ∅
    for each node n do
        worklist.add(n)
    while (worklist.empty.not) do
        let n := worklist.remove_any;
        let info_in := m(n.incoming_edges);
        let info_out := F(n, info_in);
        for i := 0 .. info_out.length - 1 do
            let new_info := m(n.outgoing_edges[i]) ∪ info_out[i];
            if (m(n.outgoing_edges[i]) ≠ new_info)
                m(n.outgoing_edges[i]) := new_info;
                worklist.add(n.outgoing_edges[i].dst);
Generalized algorithm using lattices
    let m: map from edge to computed value at edge
    let worklist: work list of nodes
    for each edge e in CFG do
        m(e) := ⊥
    for each node n do
        worklist.add(n)
    while (worklist.empty.not) do
        let n := worklist.remove_any;
        let info_in := m(n.incoming_edges);
        let info_out := F(n, info_in);
        for i := 0 .. info_out.length - 1 do
            let new_info := m(n.outgoing_edges[i]) ⊔ info_out[i];
            if (m(n.outgoing_edges[i]) ≠ new_info)
                m(n.outgoing_edges[i]) := new_info;
                worklist.add(n.outgoing_edges[i].dst);
Next step: removed outer join • Wanted to remove the outer join, while still providing termination guarantee • To do this, we re-expressed our algorithm more formally • We first defined a “global” flow function F, and then expressed our algorithm as a fixed point computation
Guarantees • If F is monotonic, don’t need outer join • If F is monotonic and height of lattice is finite: iterative algorithm terminates • If F is monotonic, the fixed point we find is the least fixed point. • Any questions?
What about if we start at top? • What if we start with ⊤, F(⊤), F(F(⊤)), F(F(F(⊤))) • We get the greatest fixed point • Why do we prefer the least fixed point? • More precise
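The difference between the two starting points can be seen on a toy example. The sketch below uses a made-up monotonic function `F` on subsets of {0, 1, 2, 3} (an assumption chosen so that the two answers differ); `iterate` is a hypothetical helper that simply applies `F` until nothing changes.

```python
UNIVERSE = frozenset({0, 1, 2, 3})   # top of this hypothetical lattice

def F(xs):
    """Hypothetical monotonic function: always produces 0, and lets
    2 and 3 survive only if they were already present."""
    return frozenset({0}) | (xs & frozenset({2, 3}))

def iterate(f, start, max_steps=100):
    """Apply f repeatedly from start until the value stops changing."""
    x = start
    for _ in range(max_steps):
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt
    raise RuntimeError("did not converge")

lfp = iterate(F, frozenset())   # start at bottom
gfp = iterate(F, UNIVERSE)      # start at top
```

Both results are fixed points of `F`, but starting from bottom yields `{0}` while starting from top yields `{0, 2, 3}`: the extra elements 2 and 3 are only there because they were assumed at the start, which is the sense in which the least fixed point is more precise.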
Graphically [Figure: plot with x and y axes, each marked at 10]
Another example: constant prop
• Set D = 2^{ x → N | x ∈ Vars ∧ N ∈ Z }
• Statement x := N (incoming information in, outgoing information out):
    F_{x := N}(in) = in - { x → * } ∪ { x → N }
• Statement x := y op z:
    F_{x := y op z}(in) = in - { x → * } ∪ { x → N | (y → N1) ∈ in ∧ (z → N2) ∈ in ∧ N = N1 op N2 }
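The two flow functions above can be sketched in Python by representing an element of D as a frozenset of `(variable, constant)` pairs. The names `kill`, `flow_const`, and `flow_binop`, and the three-statement example program, are assumptions for illustration.

```python
import operator

def kill(info, x):
    """in - { x → * }: drop every binding for x."""
    return frozenset((v, c) for (v, c) in info if v != x)

def flow_const(info, x, n):
    """Flow function for x := N."""
    return kill(info, x) | {(x, n)}

def flow_binop(info, x, y, op, z):
    """Flow function for x := y op z: bind x only when both y and z
    have known constants in the incoming information."""
    results = {(x, op(n1, n2))
               for (v1, n1) in info if v1 == y
               for (v2, n2) in info if v2 == z}
    return kill(info, x) | results

# Hypothetical straight-line program: y := 2; z := 3; x := y + z; x := y + w
s0 = frozenset()
s1 = flow_const(s0, "y", 2)
s2 = flow_const(s1, "z", 3)
s3 = flow_binop(s2, "x", "y", operator.add, "z")
s4 = flow_binop(s3, "x", "y", operator.add, "w")   # w has no known constant
```

After the third statement `x` is known to be 5; after the fourth, the binding for `x` is killed and nothing replaces it, because `w` has no entry in the incoming information.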
Another example: constant prop
• Statement x := *y (incoming information in, outgoing information out):
    F_{x := *y}(in) = in - { x → * } ∪ { x → N | ∀ z ∈ may-point-to(y) . (z → N) ∈ in }
• Statement *x := y:
    F_{*x := y}(in) = in - { z → * | z ∈ may-point-to(x) } ∪ { z → N | z ∈ must-point-to(x) ∧ (y → N) ∈ in } ∪ { z → N | (y → N) ∈ in ∧ (z → N) ∈ in }
Another example: constant prop
• Statement *x := *y + *z (incoming information in, outgoing information out):
    F_{*x := *y + *z}(in) = F_{a := *y; b := *z; c := a + b; *x := c}(in)
• Statement x := f(...):
    F_{x := f(...)}(in) = ∅
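Another example: constant prop [Figure: a branch node s: if (...) with incoming information in and outgoing information out[0], out[1]; a merge node with incoming information in[0], in[1] and outgoing information out]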