Spring 2014 Program Analysis and Verification Lecture 7: Static Analysis I

Roman Manevich, Ben-Gurion University

Syllabus

Previously
• Axiomatic verification
• Weakest precondition calculus
• Strongest postcondition calculus
• Total correctness

Axiomatic semantics for While

[assp]   { P[a/x] } x := a { P }

[skipp]  { P } skip { P }

[compp]  { P } S1 { Q },  { Q } S2 { R }
         -------------------------------
         { P } S1; S2 { R }

[ifp]    { b ∧ P } S1 { Q },  { ¬b ∧ P } S2 { Q }
         ----------------------------------------
         { P } if b then S1 else S2 { Q }

[whilep] { b ∧ P } S { P }
         ------------------------------
         { P } while b do S { ¬b ∧ P }

[consp]  { P' } S { Q' }
         ---------------   if P ⇒ P' and Q' ⇒ Q
         { P } S { Q }

Strongest postcondition
• A forward-going predicate transformer
• The strongest postcondition of P with respect to C is defined by: σ' ⊨ sp(C, P) if and only if there exists σ such that σ ⊨ P and ⟨C, σ⟩ → σ'
• Propositions:
  - ⊨p { P } C { sp(C, P) }
  - If ⊨p { P } C { Q } then sp(C, P) ⇒ Q

Calculating sp
sp(skip, P) = P
sp(x := a, P) = ∃v. x = a[v/x] ∧ P[v/x]
sp(S1; S2, P) = sp(S2, sp(S1, P))
sp(if b then S1 else S2, P) = sp(S1, b ∧ P) ∨ sp(S2, ¬b ∧ P)
sp(while b do {θ} S, P) = θ ∧ ¬b  where {b ∧ θ} S {θ} and P ∧ b ⇒ θ
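A worked instance of the assignment rule may help. The statement x := x + 1 and the precondition x = k ∧ k < N are chosen here for illustration; they match the loop-body step in the array-max proof.

```latex
\begin{align*}
sp(x := x + 1,\; x = k \land k < N)
  &= \exists v.\; x = (x + 1)[v/x] \land (x = k \land k < N)[v/x] \\
  &= \exists v.\; x = v + 1 \land v = k \land k < N \\
  &\equiv x = k + 1 \land k < N
\end{align*}
```

The existentially quantified v stands for the old value of x; once v = k is substituted, the quantifier disappears.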

Today • Static analysis for compiler optimization • Common Subexpression Elimination • Available Expression domain • Develop a static analysis for Simple Available Expressions

Array-max example: Post3
nums : array
{ N≥0 ∧ 0≤m<N } // N stands for the length of nums
x := 0
{ x=0 }
res := nums[0]
{ x=0 ∧ res=nums(0) }
Inv = { 0≤m<x ⇒ nums(m)≤res }
while x < N
  { x=k ∧ res=oRes ∧ 0≤m<k ⇒ nums(m)≤oRes }
  if nums[x] > res then
    { nums(x)>oRes ∧ res=oRes ∧ x=k ∧ 0≤m<k ⇒ nums(m)≤oRes }
    res := nums[x]
    { res=nums(x) ∧ nums(x)>oRes ∧ x=k ∧ 0≤m<k ⇒ nums(m)≤oRes }
    { x=k ∧ 0≤m≤k ⇒ nums(m)≤res }
  { (x=k ∧ 0≤m≤k ⇒ nums(m)≤res) ∨ (oRes≥nums(x) ∧ res=oRes ∧ x=k ∧ res=oRes ∧ 0≤m<k ⇒ nums(m)≤oRes) }
  { x=k ∧ 0≤m≤k ⇒ nums(m)≤res }
  x := x + 1
  { x=k+1 ∧ 0≤m≤k ⇒ nums(m)≤res }
  { 0≤m<x ⇒ nums(m)≤res }
{ x=N ∧ 0≤m<x ⇒ nums(m)≤res }
[univp] { ∀m. 0≤m<N ⇒ nums(m)≤res }

Can we find this proof automatically?
nums : array
N : int
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ x=0 }
Inv = { x≤N }
while x < N
  { x=k ∧ k<N }
  if nums[x] > res then
    { x=k ∧ k<N }
    res := nums[x]
    { x=k ∧ k<N }
  { x=k ∧ k<N }
  x := x + 1
  { x=k+1 ∧ k<N }
{ x≥N ∧ x≤N }
{ x=N }
Observation: predicates in the proof have the general form ⋀ constraint, where each constraint has the form X - Y ≤ c or ±X ≤ c

Look under the street lamp … We may move the lamp a bit
[Image: By Infopablo00 (Own work), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons]

Zone Abstract Domain
• Developed by Antoine Miné in his Ph.D. thesis
• Uses constraints of the form X - Y ≤ c and ±X ≤ c

Analysis with Zone abstract domain

Static Analysis with Zone Abstraction:
nums : array
N : int
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ N≥0 ∧ x=0 }
Inv = { N≥0 ∧ 0≤x≤N }
while x < N
  { N≥0 ∧ 0≤x<N }
  if nums[x] > res then
    { N≥0 ∧ 0≤x<N }
    res := nums[x]
    { N≥0 ∧ 0≤x<N }
  { N≥0 ∧ 0≤x<N }
  x := x + 1
  { N≥0 ∧ 0<x≤N }
{ N≥0 ∧ 0≤x ∧ x=N }

Manual Proof:
nums : array
N : int
{ N≥0 }
x := 0
{ N≥0 ∧ x=0 }
res := nums[0]
{ x=0 }
Inv = { x≤N }
while x < N
  { x=k ∧ k<N }
  if nums[x] > res then
    { x=k ∧ k<N }
    res := nums[x]
    { x=k ∧ k<N }
  { x=k ∧ k<N }
  x := x + 1
  { x=k+1 ∧ k<N }
{ x≥N ∧ x≤N }
{ x=N }

Static analysis for compiler optimizations

Motivating problem: optimization
• A compiler optimization is defined by a program transformation: T : Stmt → Stmt
• The transformation is semantics-preserving: ∀s. Ssos⟦C⟧ s = Ssos⟦T(C)⟧ s
• The transformation is applied to the program only if an enabling condition is met
• We use static analysis for inferring enabling conditions

Common Subexpression Elimination
op ∈ {+, -, *, ==, <=}
• If we have two variable assignments
  x := a op b
  …
  y := a op b
  and the values of x, a, and b have not changed between the assignments, rewrite the code as
  x := a op b
  …
  y := x
• Eliminates useless recalculation
• Paves the way for more optimizations (e.g., dead code elimination)

What do we need to prove?

Before CSE:
{ true }
C1
x := a op b
C2
{ x = a op b }
y := a op b
C3

After CSE:
{ true }
C1
x := a op b
C2
{ x = a op b }
y := x
C3

The assertion localizes the decision

A simplified problem

Before CSE:
{ true }
C1
x := a + b
C2
{ x = a + b }
y := a + b
C3

After CSE:
{ true }
C1
x := a + b
C2
{ x = a + b }
y := x
C3
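The rewrite step itself is small once the assertion before the statement is known. The sketch below is illustrative only: the name cse_rewrite and the tuple encoding of statements and facts are assumptions, not notation from the lecture.

```python
def cse_rewrite(stmt, available):
    """Rewrite y := a + b as y := x when the fact x = a + b is available.

    stmt is (target, operands): ("a", "b") encodes a + b, ("a",) encodes a copy.
    available is a set of facts in the same encoding, assumed to hold just
    before stmt (this encoding is a hypothetical choice for the sketch).
    """
    target, operands = stmt
    if len(operands) == 2:
        # sorted() only makes the choice deterministic when several facts match
        for x, rhs in sorted(available):
            if rhs == operands and x != target:
                return (target, (x,))
    return stmt


before = {("x", ("a", "b"))}                    # the assertion { x = a + b }
print(cse_rewrite(("y", ("a", "b")), before))   # ('y', ('x',))
print(cse_rewrite(("y", ("a", "c")), before))   # ('y', ('a', 'c'))
```

The function never looks at the rest of the program: the assertion carries everything the decision needs.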

Available Expressions analysis
• A static analysis that infers for every program point a set of facts of the form
  AV = { x = y | x, y ∈ Var } ∪ { x = op y | x, y ∈ Var, op ∈ {-, !} } ∪ { x = y op z | x, y, z ∈ Var, op ∈ {+, -, *, <=} }
• For every program with n=|Var| variables the number of possible facts is finite: |AV| = O(n³)
• Yields a trivial algorithm …
• Is it efficient?

Simple Available Expressions
• Define atomic facts (for SAV) as Δ = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
• For n=|Var| the number of atomic facts is O(n³)
• Define sav-predicates as 𝔻 = 2^Δ
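The cubic bound can be checked by enumerating the atomic facts for a small variable set. This is a minimal sketch; the function name atomic_facts and the tuple encoding (x, (y,)) for x = y and (x, (y, z)) for x = y + z are assumptions made for illustration.

```python
from itertools import product


def atomic_facts(variables):
    """Enumerate every SAV atomic fact over the given variables."""
    vs = sorted(variables)
    equalities = {(x, (y,)) for x, y in product(vs, repeat=2)}      # x = y
    additions = {(x, (y, z)) for x, y, z in product(vs, repeat=3)}  # x = y + z
    return equalities | additions


facts = atomic_facts({"a", "b", "c"})
print(len(facts))  # 3**2 + 3**3 = 36
```

With n variables the count is n² + n³, so the number of sav-predicates (sets of facts) is exponential in that, which is why enumerating predicates directly is not practical.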

Notation for conjunctive sets of facts
• For a set of atomic facts D ⊆ Δ, we define Conj(D) = ⋀D
• E.g., if D = {a=b, c=b+d, b=c} then Conj(D) = (a=b) ∧ (c=b+d) ∧ (b=c)
• Notice that for two sets of facts D1 and D2, Conj(D1 ∪ D2) = Conj(D1) ∧ Conj(D2)
• What does Conj({}) stand for…?

Towards an automatic proof
• Goal: automatically compute an annotated program proving as many facts as possible of the form x = y and x = y + z
• Decision 1: develop a forward-going proof
• Decision 2: draw predicates from a finite set 𝔻
  - "looking under the light of the lamp"
  - A compromise that simplifies the problem by focusing attention, at the cost of possibly missing some facts that hold
• Challenge 1: handle straight-line code
• Challenge 2: handle conditions
• Challenge 3: handle loops

Challenge 1: handling straight-line code
[Image: By Zachary Dylan Tax (Zachary Dylan Tax), GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-3.0 (http://creativecommons.org/licenses/by/3.0), via Wikimedia Commons]

Straight line code example
{ }
x := a + b
{ x=a+b }
z := a + c
{ x=a+b, z=a+c }
b := a * c
{ z=a+c }
Find a proof that satisfies both conditions

Straight line code example
{ }
x := a + b
{ x=a+b }
z := a + c
{ x=a+b, z=a+c }
b := a * c
{ z=a+c }
Rules applied: sp, cons, Frame
• Can we make this into an algorithm?
• What do we need to ensure for each triple?

Goal
• Given a program of the form x1 := a1; …; xn := an
• Find predicates P0, …, Pn such that
  - {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof, that is: sp(xi := ai, Pi-1) ⇒ Pi
  - Each Pi has the form Conj(Di) where Di is a set of atomic facts

Algorithm for straight-line code
• Goal: find predicates P0, …, Pn such that
  - {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof, that is: sp(xi := ai, Pi-1) ⇒ Pi
  - Each Pi has the form Conj(Di) where Di is a set of atomic facts
• Idea: define a function FSAV[x:=a] : 𝔻 → 𝔻 such that if FSAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
• We call F the abstract transformer of x:=a
• Unless D0 is given, initialize D0 = {}
• For each i = 1, …, n: compute Di = FSAV[xi := ai](Di-1)
• Finally Pi = Conj(Di)


Defining an SAV abstract transformer
Goal: define a function FSAV[x:=a] : 𝔻 → 𝔻 such that if FSAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule
[kill-lhs]   { x=α } x := a { }
[kill-rhs-1] { y=x+w } x := a { }
[kill-rhs-2] { y=w+x } x := a { }
[preserve]   { y=z+w } x := a { y=z+w }
[gen]        { } x := α { x=α }
α is either a variable v or an addition expression v+w
In [preserve], x is distinct from y, z, and w; [gen] is sound only when x does not occur in α

SAV abstract transformer example
{ }
x := a + b
{ x=a+b }
z := a + c
{ x=a+b, z=a+c }
b := a * c
{ z=a+c }
[kill-lhs]   { x=α } x := aexpr { }
[kill-rhs-1] { y=x+w } x := aexpr { }
[kill-rhs-2] { y=w+x } x := aexpr { }
[preserve]   { y=z+w } x := aexpr { y=z+w }
[gen]        { } x := α { x=α }
α is either a variable v or an addition expression v+w
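The kill, preserve, and gen rules can be sketched as a single function over sets of facts. The name f_sav and the tuple encodings are assumptions for illustration. The sketch adds one side condition that the rules leave implicit: [gen] is skipped when x occurs in its own right-hand side, since x = x + y does not follow from x := x + y.

```python
def f_sav(stmt, facts):
    """Abstract transformer sketch for x := rhs over SAV facts.

    stmt is (x, op, args): op is None for a copy x := y, otherwise the
    operator symbol. A fact (l, (y,)) encodes l = y; (l, (y, z)) encodes
    l = y + z. These encodings are hypothetical choices for this sketch.
    """
    x, op, args = stmt
    # [kill-lhs], [kill-rhs-1], [kill-rhs-2]: drop every fact mentioning x.
    # [preserve]: all other facts survive unchanged.
    kept = {(lhs, rhs) for (lhs, rhs) in facts if lhs != x and x not in rhs}
    # [gen]: only for a variable or an addition, and only if x is not in rhs.
    if op in (None, "+") and x not in args:
        kept.add((x, args))
    return frozenset(kept)


program = [("x", "+", ("a", "b")), ("z", "+", ("a", "c")), ("b", "*", ("a", "c"))]
facts = frozenset()          # D0 = {}
history = [facts]
for stmt in program:
    facts = f_sav(stmt, facts)
    history.append(facts)

for d in history:
    print(sorted(d))
```

The last assignment writes b, so x = a + b is killed, and a * c is not an addition, so nothing is generated; only z = a + c remains.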

Problem 1: large expressions
{ }
x := a + b + c
{ }
y := a + b + c
{ }
Missed CSE opportunity
• Large expressions on the right-hand sides of assignments are problematic
  - Can miss optimization opportunities
  - Require complex transformers
• Solution: transform code to normal form where right-hand sides have bounded size


Three-address code

Original:
{ }
x := a + b + c
{ }
y := a + b + c
{ }

Three-address form:
{ }
i1 := a + b
{ i1=a+b }
x := i1 + c
{ i1=a+b, x=i1+c }
i2 := a + b
{ i1=a+b, x=i1+c, i2=a+b }
y := i2 + c
{ i1=a+b, x=i1+c, i2=a+b, y=i2+c }

Need to infer i1=i2
• Main idea: simplify expressions by storing intermediate results in new temporary variables
• Number of variables in simplified statements ≤ 3
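The normalization can be sketched as a recursive walk that introduces one temporary per nested operation. The function name to_three_address, the nested-tuple expression encoding, and the temporary names i1, i2, … are assumptions for illustration; the sketch also assumes the program has no variables that clash with the generated temporaries.

```python
from itertools import count


def to_three_address(target, expr, fresh):
    """Flatten target := expr into statements with at most three variables.

    expr is a variable name or a tuple (op, left, right); fresh is a shared
    counter used to name temporaries. Each emitted statement is
    (lhs, op, left, right), with op None for a plain copy.
    """
    code = []

    def atom(e):
        # Return a variable holding e, emitting code for nested operations.
        if isinstance(e, str):
            return e
        op, left, right = e
        l, r = atom(left), atom(right)
        tmp = f"i{next(fresh)}"
        code.append((tmp, op, l, r))
        return tmp

    if isinstance(expr, str):
        code.append((target, None, expr, None))
    else:
        op, left, right = expr
        l, r = atom(left), atom(right)
        code.append((target, op, l, r))
    return code


fresh = count(1)
a_plus_b_plus_c = ("+", ("+", "a", "b"), "c")
code_x = to_three_address("x", a_plus_b_plus_c, fresh)
code_y = to_three_address("y", a_plus_b_plus_c, fresh)
print(code_x)  # [('i1', '+', 'a', 'b'), ('x', '+', 'i1', 'c')]
print(code_y)  # [('i2', '+', 'a', 'b'), ('y', '+', 'i2', 'c')]
```

Sharing one counter across statements keeps temporaries distinct, which is exactly why the analysis then has to discover i1 = i2 on its own.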

Problem 2: transformer precision
{ }
i1 := a + b
{ i1=a+b }
x := i1 + c
{ i1=a+b, x=i1+c }
i2 := a + b
{ i1=a+b, x=i1+c, i2=a+b }
y := i2 + c
{ i1=a+b, x=i1+c, i2=a+b, y=i2+c }
Need to infer i1=i2
• Our transformer only infers syntactically available expressions, that is, ones that appear in the code explicitly
• We want a transformer that looks deeper into the semantics of the predicates
  - Takes equalities into account

Defining a semantic reduction
• Idea: make as many implicit facts explicit by using
  - Symmetry and transitivity of equality
  - Commutativity of addition
  - Meaning of equality: equal variables can be substituted for each other
• For an SAV-predicate P = Conj(D) define Explicate(D) = the minimal set D* such that:
  1. D ⊆ D*
  2. x=y ∈ D* implies y=x ∈ D*
  3. x=y ∈ D* and y=z ∈ D* implies x=z ∈ D*
  4. x=y+z ∈ D* implies x=z+y ∈ D*
  5. x=y ∈ D* and x=z+w ∈ D* implies y=z+w ∈ D*
  6. x=y ∈ D* and z=x+w ∈ D* implies z=y+w ∈ D*
  7. x=z+w ∈ D* and y=z+w ∈ D* implies x=y ∈ D*
• Notice that Conj(Explicate(D)) ⇔ Conj(D)
• Explicate is a special case of a semantic reduction
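The closure rules can be run to a fixed point. The sketch below uses a hypothetical encoding, (x, (y,)) for x = y and (x, (y, z)) for x = y + z, and leaves out trivial facts of the form x = x, which the lecture's examples also omit; that omission is an assumption of this sketch rather than part of the definition.

```python
def explicate(facts):
    """Close a set of SAV facts under the seven Explicate rules."""
    closed = set(facts)
    while True:
        eqs = [(x, r[0]) for x, r in closed if len(r) == 1]
        adds = [(x, r) for x, r in closed if len(r) == 2]
        new = set()
        for x, y in eqs:
            new.add((y, (x,)))                                       # rule 2
            new.update((x, (z,)) for y2, z in eqs if y2 == y)        # rule 3
            new.update((y, r) for lhs, r in adds if lhs == x)        # rule 5
            new.update((lhs, (y, w)) for lhs, (z, w) in adds if z == x)  # rule 6
        for x, (y, z) in adds:
            new.add((x, (z, y)))                                     # rule 4
            new.update((x, (x2,)) for x2, r in adds if r == (y, z))  # rule 7
        new = {(l, r) for l, r in new if r != (l,)}   # skip trivial x = x
        if new <= closed:
            return frozenset(closed)
        closed |= new


d = {("i1", ("a", "b")), ("x", ("i1", "c")), ("i2", ("a", "b"))}
for fact in sorted(explicate(d)):
    print(fact)
```

Termination follows from finiteness: each round only adds facts over the variables already present, and there are at most n² + n³ of those.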

Sharpening the transformer
{ }
i1 := a + b
{ i1=a+b, i1=b+a }
x := i1 + c
{ i1=a+b, i1=b+a, x=i1+c, x=c+i1 }
i2 := a + b
{ i1=a+b, i1=b+a, x=i1+c, x=c+i1, i2=a+b, i2=b+a, i1=i2, i2=i1, x=i2+c, x=c+i2 }
y := i2 + c
{ ... }
Since sets of facts and their conjunction are isomorphic we will use them interchangeably
Define: F*[x:=aexpr] = Explicate ∘ FSAV[x:=aexpr]

An algorithm for annotating straight-line programs (SLP)
Annotate(P, x:=aexpr) =
  {P} x:=aexpr {F*[x:=aexpr](P)}
Annotate(P, S1; S2) =
  let Annotate(P, S1) be {P} A1 {Q1}
  let Annotate(Q1, S2) be {Q1} A2 {Q2}
  return {P} A1; {Q1} A2 {Q2}

Challenge 2: handling conditions

Goal
• Annotate a program if bexpr then S1 else S2 with predicates from 𝔻
• Assumption 1: P is given (otherwise use true)
• Assumption 2: bexpr is a simple binary expression, e.g., x=y, x≠y, x<y (why?)

{ P }
if bexpr then
  { bexpr ∧ P } S1 { Q1 }
else
  { ¬bexpr ∧ P } S2 { Q2 }
{ Q }

[ifp]  { bexpr ∧ P } S1 { Q },  { ¬bexpr ∧ P } S2 { Q }
       ------------------------------------------------
       { P } if bexpr then S1 else S2 { Q }

Joining predicates

{ P }
if bexpr then
  { bexpr ∧ P } S1 { Q1 }
else
  { ¬bexpr ∧ P } S2 { Q2 }
{ Q }

[ifp]  { bexpr ∧ P } S1 { Q },  { ¬bexpr ∧ P } S2 { Q }
       ------------------------------------------------
       { P } if bexpr then S1 else S2 { Q }

Each of bexpr and ¬bexpr is possibly an SAV-fact
• Start with P or { bexpr ∧ P } and annotate S1 (yielding Q1)
• Start with P or { ¬bexpr ∧ P } and annotate S2 (yielding Q2)
• How do we infer a Q such that Q1 ⇒ Q and Q2 ⇒ Q?
  Q1 = Conj(D1), Q2 = Conj(D2)
  Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2)

Joining predicates
The join operator for SAV: for Q1 = Conj(D1) and Q2 = Conj(D2), define Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2)

Joining predicates
• Q1 = Conj(D1), Q2 = Conj(D2)
• We want to soundly approximate Q1 ∨ Q2 in 𝔻
• Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2)
• Notice that Q1 ⇒ Q and Q2 ⇒ Q, meaning Q1 ∨ Q2 ⇒ Q

Simplifying conditions
• Extend While with
  - Non-determinism (or) and
  - An assume statement: ⟨assume b, s⟩ ⇒sos s if B⟦b⟧ s = tt
• Now, the following two statements are equivalent
  - if b then S1 else S2
  - (assume b; S1) or (assume ¬b; S2)

Handling conditional expressions
• We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in 𝔻
• Define β(bexpr) = if bexpr is factoid then {bexpr} else {}
• Define F[assume bexpr](D) = D ∪ β(bexpr)
• Can sharpen: F*[assume bexpr] = Explicate ∘ FSAV[assume bexpr]

Handling conditional expressions
• Notice bexpr ⇒ β(bexpr)
• Examples
  - β(y=z) = {y=z}
  - β(y<z) = {}

An algorithm for annotating conditions
let Pt = F*[assume bexpr] P
let Pf = F*[assume ¬bexpr] P
let Annotate(Pt, S1) be {Pt} A1 {Q1}
let Annotate(Pf, S2) be {Pf} A2 {Q2}
return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
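The recursive annotation of loop-free code can be sketched as follows. All names (annotate, f_sav, beta, show, NEGATE) and the tuple-based syntax tree are assumptions for illustration. To keep the sketch short it uses the plain transformer without Explicate, so it derives fewer facts than F* would, and it skips [gen] when x occurs in its own right-hand side.

```python
NEGATE = {"=": "!=", "!=": "=", "<": ">=", ">=": "<"}


def f_sav(x, op, args, facts):
    # Kill every fact mentioning x, then generate x = args for copies/additions.
    kept = {(l, r) for (l, r) in facts if l != x and x not in r}
    if op in (None, "+") and x not in args:
        kept.add((x, args))
    return frozenset(kept)


def beta(bexpr):
    # Only an equality between variables is an SAV fact; anything else gives {}.
    rel, l, r = bexpr
    return frozenset({(l, (r,))}) if rel == "=" else frozenset()


def show(facts):
    return "{ " + ", ".join(sorted(f"{x}={'+'.join(r)}" for x, r in facts)) + " }"


def annotate(pre, stmt, indent=""):
    """Return (annotated lines, postcondition) for a loop-free statement."""
    kind = stmt[0]
    if kind == "assign":
        _, x, op, args = stmt
        post = f_sav(x, op, args, pre)
        rhs = f" {op} ".join(args) if op else args[0]
        return [f"{indent}{x} := {rhs}", indent + show(post)], post
    if kind == "seq":
        _, s1, s2 = stmt
        lines1, mid = annotate(pre, s1, indent)
        lines2, post = annotate(mid, s2, indent)
        return lines1 + lines2, post
    if kind == "if":
        _, (rel, l, r), s1, s2 = stmt
        pt = pre | beta((rel, l, r))            # F[assume bexpr]
        pf = pre | beta((NEGATE[rel], l, r))    # F[assume not bexpr]
        lines1, q1 = annotate(pt, s1, indent + "  ")
        lines2, q2 = annotate(pf, s2, indent + "  ")
        post = q1 & q2                          # join = intersection of fact sets
        lines = [f"{indent}if {l} {rel} {r} then", f"{indent}  {show(pt)}", *lines1,
                 f"{indent}else", f"{indent}  {show(pf)}", *lines2,
                 indent + show(post)]
        return lines, post
    raise ValueError(f"unknown statement kind: {kind}")


then_branch = ("seq", ("assign", "a", "+", ("b", "c")), ("assign", "d", "-", ("b", "c")))
else_branch = ("seq", ("assign", "a", "+", ("b", "c")), ("assign", "d", "+", ("b", "c")))
program = ("if", ("=", "x", "y"), then_branch, else_branch)

lines, post = annotate(frozenset(), program)
print(show(frozenset()))
print("\n".join(lines))
```

Only a = b + c holds at the end of both branches, so it is the only fact that survives the join.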

Example
{ }
if (x = y)
  { x=y, y=x }
  a := b + c
  { x=y, y=x, a=b+c, a=c+b }
  d := b - c
  { x=y, y=x, a=b+c, a=c+b }
else
  { }
  a := b + c
  { a=b+c, a=c+b }
  d := b + c
  { a=b+c, a=c+b, d=b+c, d=c+b, a=d, d=a }
{ a=b+c, a=c+b }


Recap • We now have an algorithm for soundly annotating loop-free code • Generates forward-going proofs • Algorithm operates on abstract syntax tree of code • Handles straight-line code by applying F* • Handles conditions by recursively annotating true and false branches and then intersecting their postconditions

