800 likes | 958 Views
Lecture 03 – Background + intro AI + dataflow and connection to AI. Program analysis & Synthesis. Eran Yahav. Previously…. structural operational semantics trace semantics meaning of a program as the set of its traces abstract interpretation abstractions of the trace semantics.
E N D
Lecture 03 – Background + intro AI + dataflow and connection to AI Program analysis & Synthesis EranYahav
Previously… • structural operational semantics • trace semantics • meaning of a program as the set of its traces • abstract interpretation • abstractions of the trace semantics Example: Cormac’s PLDI’08 Example: KupersteinPLDI’11
Today • abstract interpretation • some basic math background • lattices • functions • Galois connection • A little more dataflow analysis • monotone framework
Trace Semantics [z := 1]2 [y := x]1 [y := x]1; [z := 1]2; while [y > 0]3 ( [z := z * y]4; [y := y − 1]5; ) [y := 0]6 < 1,{ x42, y0, z0 } > < 2,{ x42, y42, z0 } > < 3,{ x42, y42, z1 } > < 4,{ x42, y42, z1 }> … • note that input (x) is unknown • while loop? • trace semantics is not computable [y > 0]3 [y := x]1 [z := 1]2 < 1,{ x73, y0, z0 } > < 2,{ x73, y73, z0 } > < 3,{ x73, y73, z1 } > < 4,{ x73, y73, z1 }> … [y > 0]3 …
Abstract Interpretation abstract state abstract state S • ’ abstract C’’ S C C’ concrete set of states set of states
Dataflow Analysis { (x,?), (y,?), (z,?) } { (x,?), (y,1), (z,?) } 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0 • Reaching Definitions • The assignment lab: var := exp reaches lab’ if there is an execution where var was last assigned at lab { (x,?), (y,1), (z,2) } { (x,?), (y,1), (z,2) } { (x,?), (y,?), (z,4) } { (x,?), (y,5), (z,4) } { (x,?), (y,1), (z,2) } { (x,?), (y,?), (z,?) } (adapted from Nielson, Nielson & Hankin)
Dataflow Analysis { (x,?), (y,?), (z,?) } { (x,?), (y,1), (z,?) } 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0 • Reaching Definitions • The assignment lab: var := exp reaches lab’ if there is an execution where var was last assigned at lab { (x,?), (y,1), (z,2), (y,5), (z,4) } { (x,?), (y,1), (z,2), (y,5), (z,4) } { (x,?), (y,?), (z,4), (y,5) } { (x,?), (y,5), (z,4) } { (x,?), (y,1), (z,2), (y,5), (z,4) } { (x,?), (y,6), (z,2), (z,4) } (adapted from Nielson, Nielson & Hankin)
Dataflow Analysis • Build control-flow graph • Assign transfer functions • Compute fixed point
Control-Flow Graph 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0 1: y:=x 2: z:=1 3: y > 0 6: y:=0 4: z=z*y 5: y=y-1
Transfer Functions 1: y:=x in(1) = { (x,?), (y,?), (z,?) } in(2) = out(1) in(3) = out(2) U out(5) in(4) = out(3) in(5) = out(4) in(6) = out(3) out(1) = in(1) \ { (y,l) | l Lab } U { (y,1) } 2: z:=1 out(2) = in(2) \ { (z,l) | l Lab } U { (z,2) } 3: y > 0 out(3) = in(3) 6: y:=0 4: z=z*y out(6) = in(6) \ { (y,l) | l Lab } U { (y,6) } out(4) = in(4) \ { (z,l) | l Lab } U { (z,4) } 5: y=y-1 out(5) = in(5) \ { (y,l) | l Lab } U { (y,5) }
System of Equations in(1) = { (x,?), (y,?), (z,?) } in(2) = out(1) in(3) = out(2) U out(5) in(4) = out(3) in(5) = out(4) In(6) = out(3) out(1) = in(1) \ { (y,l) | l Lab } U { (y,1) } out(2) = in(2) \ { (z,l) | l Lab } U { (z,2) } out(3) = in(3) out(4) = in(4) \ { (z,l) | l Lab } U { (z,4) } out(5) = in(5) \ { (y,l) | l Lab } U { (y,5) } out(6) = in(6) \ { (y,l) | l Lab } U { (y,6) } F: ((Var x Lab) )12 ((Var x Lab) )12
System of Equations in(1) = { (x,?), (y,?), (z,?) } in(2) = out(1) in(3) = out(2) U out(5) in(4) = out(3) in(5) = out(4) In(6) = out(3) out(1) = in(1) \ { (y,l) | l Lab } U { (y,1) } out(2) = in(2) \ { (z,l) | l Lab } U { (z,2) } out(3) = in(3) out(4) = in(4) \ { (z,l) | l Lab } U { (z,4) } out(5) = in(5) \ { (y,l) | l Lab } U { (y,5) } out(6) = in(6) \ { (y,l) | l Lab } U { (y,6) } … F: ((Var x Lab) )12 ((Var x Lab) )12 in(1)={(x,?),(y,?),(z,?)} , in(2)=out(1), in(3)=out(2) U out(5), in(4)=out(3), in(5)=out(4),In(6) = out(3) out(1) = in(1) \ { (y,l) | l Lab } U { (y,1) }, out(2) = in(2) \ { (z,l) | l Lab } U { (z,2) } out(3) = in(3), out(4) = in(4) \ { (z,l) | l Lab } U { (z,4) } out(5) = in(5) \ { (y,l) | l Lab } U { (y,5) }, out(6) = in(6) \ { (y,l) | l Lab } U { (y,6) }
System of Equations … F: ((Var x Lab) )12 ((Var x Lab) )12 RD RD’ when i: RD(i) RD’(i)
Monotonicity F: ((Var x Lab) )12 ((Var x Lab) )12 … out(1) = {(y,1)} when (x,?) out(1) in(1) \ { (y,l) | l Lab } U { (y,1) }, otherwise (silly example just for illustration)
Convergence n n … … … 2 2 2 1 1 1 0 0 0
Least Fixed Point • We will see later why it exists • For now, mostly informally… Fn(RD) F: ((Var x Lab) )12 ((Var x Lab) )12 RD RD’ when i: RD(i) RD’(i) F is monotone: RD RD’ implies that F(RD) F(RD’) RD = (, ,…,) F(RD), F(F(RD) , F(F(F(RD)), … Fn(RD) Fn+1(RD) = Fn(RD) … F(F(RD)) F(RD) RD
Partially Ordered Sets (posets) • Partial ordering is a relation : L L { true, false} that is: • Reflexive (l: l l) • Transitive (l1,l2,l3: l1 l2 l2 l3 l1 l3) • Anti-symmetric ( l1 l2 l2 l1 l1 = l2) • A partially ordered set (L, ) is a set L equipped with a partial ordering • For example: ((S), ) • Captures intuition of implication between facts • l1 l2 will intuitively mean l1 l2 • Later, in abstract domains , when l1 l2 , we will say that l1 is “more precise” than l2 (represents fewer concrete states)
Bounds in Posets • Given a poset(L, ) • A subset Y of L has l L as an upper bound if l’ Y : l’ l • A subset Y of L has l L as a lower bound if l’ Y : l’ l (we write l’ l when l l’) • A least upper bound (lub) l of Y is an upper bound of Y that satisfies l l’ whenever l’ is another upper bound of Y • A greatest lower bound (glb) l of Y is a lower bound of Y that satisfies l l’ whenever l’ is another lower bound of Y • For any subset Y L • If the lub exists then it is unique, and denoted by Y (join) • We often write l1 l2 for { l1, l2} • If the glb exists then it is unique, and denoted by Y (meet) • We often write l1 l2 for { l1, l2}
Complete Lattices • A complete lattice (L, ) is a poset such that all subsets have least upper bounds as well as greatest lower bounds • In particular • = = L is the least element (bottom) • = L = is the greatest element (top)
Complete Lattices: Examples {1,2,3} {1,3} {1,2} {2,3} - + {1} {3} {2} 0
Complete Lattices: Examples [- ,] … … [- ,2] [-2,] … … [-2,2] [-,1] [-1,] … … [- ,0] [0,] [-2,1] [-1,2] … … [-,-1] [1,] [-1,1] [0,2] [-2,0] … … [- ,-2] [2,] [-1,0] [-2,-1] [0,1] [1,2] … … [-2,-2] [-1,-1] [0,0] [1,1] [2,2] … … …
Moore Families • A Moore family is a subset Y of a complete lattice (L,) that is closed under greatest lower bounds • Y’ Y: Y’ Y • Hence, a Moore family is never empty • Always contains least element Y and a greatest element • Why do we care? • We will see that Moore families correspond to choices of abstractions (as targets of upper closure operators) • More on this… later…
Moore Families: Examples {1,2,3} {1,2,3} {1,3} {1,2} {1,2} {2,3} {2,3} {1} {3} {2} {2} {1,2,2}
Chains • In program analysis we will often look at sequences of properties of the form l0 l1 … ln • Given a poset (L, ) a subset Y L is a chain if every two elements in Y are ordered • l1,l2 Y: (l1 l2) or (l2 l1) • When Y is a finite subset we say that the chain is a finite chain • A sequence of elements l0 l1 … L is an ascending chain • A sequence of elements l0 l1 … L is a descending chain • L has finite height when all chains in L are finite • A poset (L, ) satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes, that is s N: n N: n s ln = ls
Chains: Example Posets 0 … 2 -1 -1 0 1 … - … 1 -2 … 0 - (a) (b) (c)
Properties of Functions • A function f: L1 L2 between two posets (L1,1) and (L2, 2) is • Monotone (order-preserving) • l,l’ L1: l 1 l’ f(l) 2 f(l’) • Special case f: L L, l,l’ L: l l’ f(l) f(l’) • Distributive (join morphism) • l,l’ L1 f(l l’) = f(l) f(l’)
Function Properties: Examples {1,2,3} {1,3} {1,2} {2,3} {1} {3} {2}
Function Properties: Examples {1,2,3} {1,2,3} {1,2} {1,2} {2,3} {2,3} {2} {2}
Function Properties: Examples {1,2,3} {1,2,3} {1,2} {1,2} {2,3} {2,3} {2} {2}
Function Properties: Examples {1,2,3} {1,2,3} {1,3} {1,3} {1,2} {2,3} {1,2} {2,3} {1} {3} {1} {3} {2} {2}
Fixed Points • Consider a complete lattice L = (L,,,,,) and a monotone function f: L L • An element l L is • A fixed point iff f(l) = l • A pre-fixedpointiff l f(l) • A post-fixedpointiff l f(l) • Fix(f) = set of all fixed points • Ext(f) = set of all pre-FP • Red(f) = set of all post-FP Red(f) fn() gfp Fix(f) lfp fn() Ext(f)
Tarski’s Theorem • Consider a complete lattice L = (L,,,,,) and a monotone function f: L L, then lfp(f) and gfp(f) exist and • lfp(f) = Red(f) = Fix(f) • gfp(f) = Ext(f) = Fix(f)
Kleene’sFixedpoint Theorem • Consider a complete lattice L = (L,,,,,) and a continuous function f: L L thenlfp(f) = n N fn() • A function f is continuous if for every increasing chain Y L, f(Y) = { f(y) | y Y } • Monotone functions on posets satisfying ACC are continuous • Every ascending chain eventually stabilizes l0 l1 … ln = ln+1 = … hence ln is the least upper bound of { l0,l1,…,ln } , thus f(Y) = f(ln) • From monotonicity of f f(l0) f(l1) … f(ln) = f(ln+1) = … Hence f(ln) is the least upper bound of { f(l0),f(l1), … ,f(ln) } , thus { f(y) | y Y } = f(ln)
An Algorithm for Computing lfp • Kleene’s fixed point theorem gives a constructive method for computing the lfp lfp(f) = n N fn() l = while f(l) l do l = f(l) • In our case, can compute efficiently using a worklist algorithm for “chaotic iteration”
Reaching Definitions… in(1) = { (x,?), (y,?), (z,?) } in(2) = out(1) in(3) = out(2) U out(5) in(4) = out(3) in(5) = out(4) In(6) = out(3) out(1) = in(1) \ { (y,l) | l Lab } U { (y,1) } out(2) = in(2) \ { (z,l) | l Lab } U { (z,2) } out(3) = in(3) out(4) = in(4) \ { (z,l) | l Lab } U { (z,4) } out(5) = in(5) \ { (y,l) | l Lab } U { (y,5) } out(6) = in(6) \ { (y,l) | l Lab } U { (y,6) } F: ((Var x Lab) )12 ((Var x Lab) )12 l = while F(l) l do l = F(l)
Algorithm Revisited • Input: equation system for RD • Output: lfp(F) • Algorithm: • RD1 = , … , RD12 = • While RDj Fj(RD1,…,RD12) for some j • RDj = Fj(RD1,…,RD12) • Idea: non-deterministically select which component of RD should make a step • Can make selection efficiently based on dependencies • We have information on what equations depend on Fj that has been updated, need only recompute these on every step
But… • where did the transfer functions come from? • e.g., out(1) = in(1) \ { (y,l) | l Lab } U { (y,1) } • how do we know transfer functions really over-approximate the concrete operations?
Trace Semantics [z := 1]2 [y := x]1 [y := x]1; [z := 1]2; while [y > 0]3 ( [z := z * y]4; [y := y − 1]5; ) [y := 0]6 < 1,{ x42, y0, z0 } > < 2,{ x42, y42, z0 } > < 3,{ x42, y42, z1 } > < 4,{ x42, y42, z1 }> … • note that input (x) is unknown • while loop? • trace semantics is not computable [y > 0]3 [y := x]1 [z := 1]2 < 1,{ x73, y0, z0 } > < 2,{ x73, y73, z0 } > < 3,{ x73, y73, z1 } > < 4,{ x73, y73, z1 }> … [y > 0]3 …
Instrumented Trace Semantics • on every assignment, record the label in which the assignment was performed
Abstract Interpretation 1: y := x; 2: z := 1; 3: while y > 0 { 4: z := z * y; 5: y := y − 1 } 6: y := 0 (x,?)(y,?)(z,?)(y,1)(z,2)(y,6) • Instrumented trace semantics (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(y,6) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4)(y,5)(y,6) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4)(y,5)(z,4)(y,5)(y,6) …
Collecting Semantics (x,?)(y,?)(z,?) (x,?)(y,?)(z,?) 1: y:=x (x,?)(y,?)(z,?)(y,1) (x,?)(y,?)(z,?)(y,1) 2: z:=1 (x,?)(y,?)(z,?) (y,1)(z,2) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5) 3: y > 0 (x,?)(y,?)(z,?)(y,1)(z,2) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5) 6: y:=0 4: z=z*y (x,?)(y,?)(z,?)(y,1)(z,2) (y,6) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(y,6) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4)(y,5)(y,6) 5: y=y-1 (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4)(y,5)
System of Equations in(1) = { (x,?)(y,?)(z,?) } in(2) = out(1) in(3) = out(2) U out(5) in(4) = out(3) in(5) = out(4) In(6) = out(3) out(1) = { (y,1) | in(1) } out(2) = { (z,2) | in(2) } out(3) = in(3) out(4) = { (z,4) | in(4) } out(5) = { (y,5) | in(5) } out(6) = { (y,6) | in(6) } lfp(G) … Gn(TR) … G(G(TR)) G(TR) Trace = (Var x Lab)* TR G: ((Trace) )12 ((Trace) )12
Galois Connection (A) A C (C) (C) A C (A)
Reaching Definitions: Abstraction • DOM() = variables in • SRD()(v) = l iffrightmost pair for v in is (v,l) • (C) = { (v,SRD()(v)) | v DOM() C } • (A) = { | v DOM() : (v,SRD()(v) ) A } A (A) Set of Traces Set of (var,lab) pairs C (x,?)(y,?)(z,?)(y,1)(z,2)(y,6) (C) (x,?),(z,2),(y,6) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(y,6) (x,?),(z,2),(z,4),(y,6) (x,?),(z,4),(y,6) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4)(y,5)(y,6) (x,?),(z,4),(y,6)
Reaching Definitions: Concretization • DOM() = variables in • SRD()(v) = l iff rightmost pair for v in is (v,l) • (C) = { (v,SRD()(v)) | v DOM() C } • (A) = { | v DOM() : (v,SRD()(v) ) A } A (x,?),(z,4),(y,6) (A) C (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(y,6) (C) (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)(z,4)(y,5)(y,6) … (x,?)(y,?)(z,?)(y,1)(z,2)(z,4)(y,5)…(z,4)(y,5)(y,6)
Best (Induced) Transformer C’ S ? A’ C S# A How do we figure out the abstract effect S#of a statement S ?
Sound Abstract Transformer A’ C’ Aopt S S# C A S (A) S#(A)
Soundness of Induced Analysis lfp(G) lfp(F) … Gn(TR) … (lfp(G)) … F(F(RD)) G(G(TR)) F(RD) G(TR) RD TR (lfp(G)) lfp( G ) lfp(F)