CSE 231 : Advanced Compilers

CSE 231 : Advanced Compilers Building Program Analyzers

Foundations

Foundations : Relations Relation R over set S is just a set of pairs from S: R ⊆S xS a R b means (a, b)∈R Example: < = { (a, b) | b – a is positive } a < b means (a, b)∈<

Foundations : Types of Relations • reflexive • ∀a ∈S . a R a • transitive • ∀a, b, c ∈S. a R b /\b R c a R c • symmetric • ∀a, b ∈S . a R b b R a • anti-symmetric • ∀a, b ∈S . a R b /\ b R a a = b

Foundations : Anti-Symmetry • anti-symmetric • ∀a, b ∈S . a R b /\ b R a a = b • Anti-symmetry is slightly weird. • Essentially: • “If you start at X and only follow R to new elements, you will never get back to X.”

Foundations : Types of Relations equivalence reflexive, transitive, symmetric defines equivalence classes partitions domain partial order reflexive, transitive, anti-symmetric defines a partial order on domain

Foundations : Posets partially ordered set (poset) defined by set S and partial order ·

Foundations : Posets partially ordered set (poset) defined by set S and partial order · Example: (2{x, y, z},⊆)

Foundations : Posets partially ordered set (poset) defined by set S and partial order · Examples: (2S, ⊆) (Z, <) (Z, divides)

Foundations : Least Upper Bounds Assume poset (S, ⊑) least upper bound (lub) of a and b is c where a ⊑c b ⊑c ∀d . (a ⊑d /\b ⊑d) c ⊑d Upper Least

Foundations : Greatest Lower Bounds Assume poset (S, ⊑) greatest lower bound (glb) of a and b is c where c ⊑a c ⊑b ∀d . (d ⊑a /\d ⊑b) d ⊑c Lower Greatest

Foundations : Bounds Assume poset (S, ⊑) Essentially: lub(a, b) = smallest thing bigger than a and b glb(a, b) = biggest thing smaller than a and b Question: Do lub and glb always exist?

Foundations : Bounds Assume poset (S, ⊑) Essentially: lub(a, b) = smallest thing bigger than a and b glb(a, b) = biggest thing smaller than a and b Question: Do lub and glb always exist? Answer: No!

Foundations : Bounds Question: Do lub and glb always exist? Answer: No! glb( , ) = ???

Foundations : Lattices A lattice is (S, ⊑, ⊥, ⊤, ⊔, ⊓) where: (S, ⊑) is a poset ⊥ is the smallest thing in S ⊤ is the biggest thing in S lub(a, b) and glb(a, b) always exist a⊔ b = lub(a, b) a⊓ b = glb(a, b)

Foundations : Lattices (Formally) A lattice is (S, ⊑, ⊥, ⊤, ⊔, ⊓) where: (S, ⊑) is a poset ∀ a ∈ S . ⊥ ⊑ a ∀ a ∈ S . a ⊑ ⊤ ∀ a, b ∈ S . ∃ c . c = lub(a, b) /\ a⊔ b = c ∀ a, b ∈ S . ∃ c . c = glb(a, b) /\ a⊓ b = c

Foundations :Fancy Lattice Names ⊥ is “botom” ⊤ is “top” ⊔ is “join” ⊓ is “meet”

Examples of lattices • Powerset lattice

Examples of lattices • Booleans expressions

End Foundations, Now Build

Analysis on Sets let m: map from edge to computed value at edge let worklist: work list of nodes for each edge e in CFG do m(e) := ∅ for each node n do worklist.add(n) while (worklist.empty.not) do let n := worklist.remove_any; let info_in := m(n.incoming_edges); let info_out := F(n, info_in); for i := 0 .. info_out.length do let new_info := m(n.outgoing_edges[i]) ∪ info_out[i]; if (m(n.outgoing_edges[i])  new_info]) m(n.outgoing_edges[i]) := new_info; worklist.add(n.outgoing_edges[i].dst);

Port Analysis To Run On Lattices Formalize domain with a powerset lattice. What should top and bottom be?

Port Analysis To Run On Lattices Formalize domain with a powerset lattice. What should top and bottom be? Does it matter? Should we even care?

Port Analysis To Run On Lattices Formalize domain with a powerset lattice. What should top and bottom be? Does it matter? Should we even care? Yes. A notion of approximation shows up in the lattice.

Lattice Direction Unfortunate name clashes: dataflow analysis picked one direction abstract interpretation picked the other We work in the abstract interpretation direction: ⊥(bottom) is most precise (optimistic) ⊤(top) is most imprecise (conservative)

Lattice Direction Always safe to go up in the lattice. can always set the result to ⊤ (top) Hard to go down in the lattice. So: ⊥(bottom) will be empty set in reaching defns

Building an Analysis: worklist + lattice let m: map from edge to computed value at edge let worklist: work list of nodes for each edge e in CFG do m(e) := ⊥ for each node n do worklist.add(n) while (worklist.empty.not) do let n := worklist.remove_any; let info_in := m(n.incoming_edges); let info_out := F(n, info_in); for i := 0 .. info_out.length do let new_info := m(n.outgoing_edges[i]) ⊔ info_out[i]; if (m(n.outgoing_edges[i])  new_info]) m(n.outgoing_edges[i]) := new_info; worklist.add(n.outgoing_edges[i].dst);

Termination? For reaching definitions, it terminates. Why?

Termination? For reaching definitions, it terminates. Why? Because lattice is finite. Can we loosen this requirement? Yes, only require the lattice to have a finite height. Height of a lattice: length of longest ascending or descending chain

Termination? Height of lattice (2S, ⊆) = ???

Termination? Height of lattice (2S, ⊆) = | S |

Termination. But can we do better? Still annoying to perform join in the worklist algo: It would be nice to get rid of it. Is there a property of the flow functions that can help? while (worklist.empty.not) do let n := worklist.remove_any; let info_in := m(n.incoming_edges); let info_out := F(n, info_in); for i := 0 .. info_out.length do let new_info := m(n.outgoing_edges[i]) ⊔ info_out[i]; if (m(n.outgoing_edges[i])  new_info]) m(n.outgoing_edges[i]) := new_info; worklist.add(n.outgoing_edges[i].dst);

Crank Up the Formality… To reason even more precisely about termination, we port our worklist algorithm to math. Fixed points underlie our new algorithm.

Back to Foundations

Fixpoints What is a fixpoint?

Fixpoints What is a fixpoint? An input to a function that equals its output. So a fixpoint for f is any input x such that: f(x) = x Easy!

Fixpoints Goal: compute map m from CFG edges to dataflow information Strategy: define a global flow function F as follows: F takes a map m as a parameter and returns a new map m’, in which individual local flow functions have been applied

Fixpoints • Recall, we are computing m, a map from edges to dataflow information • Define a global flow function F as follows: F takes a map m as a parameter and returns a new map m’, in which individual local flow functions have been applied

Fixpoints • We want to find a fixed point of F, that is, a map m such that m = F(m) • Approach to doing this? • Define ⊥, which is ⊥ lifted to be a map: ⊥ =  e. ⊥ • Compute F(⊥), then F(F(⊥)), then F(F(F(⊥))), ... until the result doesn’t change

CSE 231 : Advanced Compilers