CSE 231 : Advanced Compilers

CSE 231 : Advanced Compilers Building Program Analyzers

dataflow analysis

http://blog.ezyang.com/2011/04/hoopl-dataflow-lattices/

http://blog.ezyang.com/2011/04/hoopl-dataflow-lattices/ http://lambda-the-ultimate.org/node/3557

http://research.microsoft.com/en-us/um/people/simonpj/papers/c--/dfopt.pdfhttp://research.microsoft.com/en-us/um/people/simonpj/papers/c--/dfopt.pdf http://blog.ezyang.com/2011/04/hoopl-dataflow-lattices/ http://lambda-the-ultimate.org/node/3557

Now where were we…

for edge e in CFG: m[e] = EMPTY for node n in CFG: q.push(n) while not q.empty(): n = q.pop() info_in = m[n.in_edges] info_out = F(n, info_in) for i from 0 to len(info_out): e = n.out_edges[i] new_info = m[e] UNION info_out[i] if m[e] != new_info: m[e] = new_info q.push(e.dest) Started with sets. Termination worries.

Foundations : Lattices A lattice is (S, ⊑, ⊥, ⊤, ⊔, ⊓) where: (S, ⊑) is a poset ⊥ is the smallest thing in S ⊤ is the biggest thing in S lub(a, b) and glb(a, b) always exist a⊔ b = lub(a, b) a⊓ b = glb(a, b)

Foundations : Lattices (Formally) A lattice is (S, ⊑, ⊥, ⊤, ⊔, ⊓) where: (S, ⊑) is a poset ∀ a ∈ S . ⊥ ⊑ a ∀ a ∈ S . a ⊑ ⊤ ∀ a, b ∈ S . ∃ c . c = lub(a, b) /\ a⊔ b = c ∀ a, b ∈ S . ∃ c . c = glb(a, b) /\ a⊓ b = c

Foundations :Fancy Lattice Names ⊥ is “botom” ⊤ is “top” ⊔ is “join” ⊓ is “meet”

for edge e in CFG: m[e] = BOTTOM for node n in CFG: q.push(n) while not q.empty(): n = q.pop() info_in = m[n.in_edges] info_out = F(n, info_in) for i from 0 to len(info_out): e = n.out_edges[i] new_info = m[e] JOIN info_out[i] if m[e] != new_info: m[e] = new_info q.push(e.dest) Port to lattices. Small patch set.

Termination. while not q.empty(): n = q.pop() info_in = m[n.in_edges] info_out = F(n, info_in) for i from 0 to len(info_out): e = n.out_edges[i] new_info = m[e] JOIN info_out[i] if m[e] != new_info: m[e] = new_info q.push(e.dest) Finite lattice height implies termination.

Termination. But can we do better? while not q.empty(): n = q.pop() info_in = m[n.in_edges] info_out = F(n, info_in) for i from 0 to len(info_out): e = n.out_edges[i] new_info = m[e] JOIN info_out[i] if m[e] != new_info: m[e] = new_info q.push(e.dest) Finite lattice height implies termination. Get rid of that JOIN right in the middle?

Termination. But can we do better? while not q.empty(): n = q.pop() info_in = m[n.in_edges] info_out = F(n, info_in) for i from 0 to len(info_out): e = n.out_edges[i] new_info = m[e] JOIN info_out[i] if m[e] != new_info: m[e] = new_info q.push(e.dest) Get rid of that JOIN right in the middle? Yes. The trick is in the flow functions.

Safely Getting Rid Of The Join In general, can’t remove join and have termination. So, when is it OK?

Safely Getting Rid Of The Join In general, can’t remove join and have termination. So, when is it OK? Port our algorithm to math to figure it out. Build in terms of fixpoints.

Fixpoints Are Easy Fixpoint: an input equal the output. A fixpoint of F is any X such that F(X) = X. “Best” answer since repeating yields same result.

Using Fixpoints In Analysis Goal: compute map m from CFG edges to dataflow information Strategy: define a global flow function F as follows: F takes a map m as a parameter and returns a new map m’, in which individual local flow functions have been applied

The Big F m’ m F

Just Regular Flow Funcs Inside m’ m f1 f2 f3

Goal: Find Fixpoint of F m m’ m’ F F F …

Fixpoint of F Goal: a fixed point of F, i.e. m where m = F(m) How should we do this?

Fixpoint of F Goal: a fixed point of F, i.e. m where m = F(m) How should we do this? Let ⊥ be ⊥lifted to a map: ⊥ =  e . ⊥ Compute F(⊥), then F(F(⊥)), then F(F(F(⊥))), ... until the result doesn’t change

Fixpoint of F Goal: a fixed point of F, i.e. m where m = F(m) How should we do this? Let ⊥ be ⊥lifted to a map: ⊥ =  e . ⊥ Compute F(⊥), then F(F(⊥)), then F(F(F(⊥))), ... until the result doesn’t change … but how long could that take ???

Fixpoint of F : Formal Solution Solution: ⊔ Fi(⊥) i = 0

Fixpoint of F : Formal Solution Solution: We want F1(⊥) ⊑F2(⊥) ⊑F3(⊥) … ⊑Fk(⊥) Allows us to eliminate the big join. Just require F to be monotonic: ∀ a b, a ⊑ b ➞ F(a) ⊑ F(b) ⊔ Fi(⊥) i = 0

Back To Termination Solution: if F is monotonic, we have it. Finite lattice height  termination w/out joins! OK. But how do we know F is monotonic? F is monotonic if flow functions monotonic.

Another benefit of monotonicity • Suppose Marsians came to earth, and miraculously give you a fixed point of F, call it fp. • Then:

Another benefit of monotonicity • We are computing the least fixed point...

Recap • Let’s do a recap of what we’ve seen so far • Started with worklist algorithm for reaching definitions

Worklist algorithm for reaching defns let m: map from edge to computed value at edge let worklist: work list of nodes for each edge e in CFG do m(e) := ∅ for each node n do worklist.add(n) while (worklist.empty.not) do let n := worklist.remove_any; let info_in := m(n.incoming_edges); let info_out := F(n, info_in); for i := 0 .. info_out.length do let new_info := m(n.outgoing_edges[i]) ∪ info_out[i]; if (m(n.outgoing_edges[i])  new_info]) m(n.outgoing_edges[i]) := new_info; worklist.add(n.outgoing_edges[i].dst);

Generalized algorithm using lattices let m: map from edge to computed value at edge let worklist: work list of nodes for each edge e in CFG do m(e) := ⊥ for each node n do worklist.add(n) while (worklist.empty.not) do let n := worklist.remove_any; let info_in := m(n.incoming_edges); let info_out := F(n, info_in); for i := 0 .. info_out.length do let new_info := m(n.outgoing_edges[i]) ⊔ info_out[i]; if (m(n.outgoing_edges[i])  new_info]) m(n.outgoing_edges[i]) := new_info; worklist.add(n.outgoing_edges[i].dst);

Next step: removed outer join • Wanted to remove the outer join, while still providing termination guarantee • To do this, we re-expressed our algorithm more formally • We first defined a “global” flow function F, and then expressed our algorithm as a fixed point computation

Guarantees • If F is monotonic, don’t need outer join • If F is monotonic and height of lattice is finite: iterative algorithm terminates • If F is monotonic, the fixed point we find is the least fixed point. • Any questions?

What about if we start at top? • What if we start with ⊤, F(⊤), F(F(⊤)), F(F(F(⊤)))

What about if we start at top? • What if we start with ⊤, F(⊤), F(F(⊤)), F(F(F(⊤))) • We get the greatest fixed point • Why do we prefer the least fixed point? • More precise

Graphically y 10 10 x

Another example: constant prop • Set D = in x := N Fx := N(in) = out in x := y op z Fx := y op z(in) = out

Another example: constant prop • Set D = 2 { x ! N | x 2 Vars Æ N 2 Z } in x := N Fx := N(in) = in – { x ! * } [ { x ! N } out in x := y op z Fx := y op z(in) = in – { x ! * } [ { x ! N | ( y ! N1 ) 2 in Æ ( z ! N2 ) 2 in Æ N = N1 op N2 } out

Another example: constant prop in Fx := *y(in) = x := *y out in F*x := y(in) = *x := y out

Another example: constant prop in Fx := *y(in) = in – { x ! * } [ { x ! N | 8 z 2 may-point-to(x) . (z ! N) 2 in } x := *y out in F*x := y(in) = in – { z ! * | z 2 may-point(x) } [ { z ! N | z 2 must-point-to(x) Æ y ! N 2 in } [ { z ! N | (y ! N) 2 in Æ (z ! N) 2 in } *x := y out

Another example: constant prop in *x := *y + *z F*x := *y + *z(in) = out in x := f(...) Fx := f(...)(in) = out

Another example: constant prop in *x := *y + *z F*x := *y + *z(in) = Fa := *y;b := *z;c := a + b; *x := c(in) out in x := f(...) Fx := f(...)(in) = ; out

Another example: constant prop in s: if (...) out[0] out[1] in[0] in[1] merge out

CSE 231 : Advanced Compilers

CSE 231 : Advanced Compilers

Presentation Transcript

Why STMs Need Compilers (a war story)

Languages and Compilers (SProg og Oversættere)

Beyond Auto-Parallelization: Compilers for Many-Core Systems

E6998-2: Advanced Topics in Programming Languages and Compilers

Interpreters and Compilers

Parallel Sessions: Compilers

Compilers and Application Security

CSE P501 – Compiler Construction

Compilers

Programming Languages and Compilers (CS 421)

Chapter 2 Language Processors

Advanced Compilers CMPSCI 710 Spring 2003 Data flow analysis

Programming Languages and Compilers (CS 421)

G-RSM CVS System at ECPC

Compilers for DSP Processors and Low-Power

Assemblers and Compilers

Software Tools Using PBS

Programming Languages and Compilers (CS 421)

Programming Languages and Compilers (CS 421)

Programming Languages and Compilers (CS 421)

COMPILERS

Languages and Compilers (SProg og Oversættere)