Software Verification 1 Deductive Verification

Prof. Dr. Holger Schlingloff, Institut für Informatik der Humboldt Universität und Fraunhofer Institut für Rechnerarchitektur und Softwaretechnik

Predicate Logic
• used to formalize mathematical reasoning
• dates back to Frege (1879), „Begriffsschrift“, subtitled „Eine der arithmetischen nachgebildete Formelsprache des reinen Denkens“ ("a formula language of pure thought, modeled on that of arithmetic")
• individuals, predicates (sets of individuals), relations (sets of pairs), ...
• quantification of statements (quantum = how much)
  • all, none, at least one, at most one, some, most, many, ...
• need for variables to denote “arbitrary” objects
• In contrast to propositional logic, first-order logic adds
  • structure to basic propositions
  • quantification on (infinite) domains

FOL: Syntax
• New syntactic elements
  • R is a set of relation symbols, where each p ∈ R has an arity n ∈ ℕ₀
  • V is a denumerable set of (first-order or individual) variables
  • An atomic formula is p(x1,…,xn), where p ∈ R is n-ary and (x1,…,xn) ∈ Vⁿ
• Syntax of first-order logic: FOL ::= R(Vⁿ) | ⊥ | (FOL → FOL) | ∃V FOL

FOL: Syntax
• Abbreviations and parentheses as in PL
• Of course, ∀x φ = ¬∃x ¬φ
• Propositions = 0-ary relations; predicates = 1-ary relations
• if all predicates are propositions, then FOL = PL
• Examples
  • ∀x∃x∀x (p() → ∃x (q() → p()))
  • ∀x∃x∀y ¬p(x)
  • ∀x∀y (p(x,y) → p(y,x))
  • (∃x∀y p(x,y) → ∀y∃x p(x,y))
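
The grammar can be mirrored as a small abstract syntax tree. The following is a minimal sketch, assuming the primitives are ⊥, → and ∃, with ¬ and ∀ introduced as abbreviations; the class and function names (Atom, Bottom, Implies, Exists, Not, Forall, show) are illustrative and not part of the lecture.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:                      # p(x1,...,xn); a proposition has no arguments
    rel: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Bottom:                    # ⊥
    pass


@dataclass(frozen=True)
class Implies:                   # (φ → ψ)
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Exists:                    # ∃x φ
    var: str
    body: "Formula"


Formula = Union[Atom, Bottom, Implies, Exists]


def Not(phi):
    """Abbreviation ¬φ = (φ → ⊥)."""
    return Implies(phi, Bottom())


def Forall(x, phi):
    """Abbreviation ∀x φ = ¬∃x ¬φ."""
    return Not(Exists(x, Not(phi)))


def show(phi):
    """Render a formula using only the primitive connectives."""
    if isinstance(phi, Atom):
        return f"{phi.rel}({','.join(phi.args)})"
    if isinstance(phi, Bottom):
        return "⊥"
    if isinstance(phi, Implies):
        return f"({show(phi.left)} → {show(phi.right)})"
    if isinstance(phi, Exists):
        return f"∃{phi.var} {show(phi.body)}"
    raise TypeError(phi)
```

Calling show(Forall("x", Atom("p", ("x",)))) expands the abbreviation down to ∃, → and ⊥, which makes visible that ∀ adds no expressive power to the core grammar.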

Typed FOL
• Often, types/sorts are used to differentiate domains
• Signature Σ = (𝒟, F, R), where
  • 𝒟 is a (finite) set of domain names
  • F is a set of function symbols, where each f ∈ F has an arity n ∈ ℕ₀ and a type D ∈ 𝒟ⁿ⁺¹
    • 0-ary functions are called constants
  • R is a set of relation symbols, where each p ∈ R has an arity n ∈ ℕ₀ and a type D ∈ 𝒟ⁿ
    • unary relations are called predicates
    • propositions can be seen as 0-ary relations
• Remark: domains and types are for ease of use only (they can be simulated in an untyped setting by additional predicates)

Terms and Formulas
• Let again V be a (denumerable) set of (first-order) variables, where each variable has a type D ∈ 𝒟 (written as x:D); for any type, there is an unlimited supply of variables of that type
• The notions Term and Atomic Formula (AtF) are defined recursively:
  • each variable of type D is a term of type D
  • if f is an n-ary function symbol of type (D1,…,Dn,Dn+1) and t1, …, tn are terms of type D1, …, Dn, then f(t1,…,tn) is a term of type Dn+1
  • if p is an n-ary relation symbol of type (D1,…,Dn) and t1, …, tn are terms of type D1, …, Dn, then p(t1,…,tn) is an atomic formula
• Revised syntax of first-order logic: FOL ::= AtF | ⊥ | (FOL → FOL) | ∃V:D FOL
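
The recursive definition of typed terms translates directly into a type checker. The sketch below assumes a small hypothetical signature (zero, succ, plus, equals over a single domain Int) and represents a term either as a variable name or as a tuple (f, t1, ..., tn); none of these encodings come from the lecture.

```python
from typing import Dict, Tuple

# Hypothetical signature: function symbol -> (argument types, result type)
FUNCS: Dict[str, Tuple[Tuple[str, ...], str]] = {
    "zero": ((), "Int"),
    "succ": (("Int",), "Int"),
    "plus": (("Int", "Int"), "Int"),
}
# Hypothetical signature: relation symbol -> argument types
RELS: Dict[str, Tuple[str, ...]] = {"equals": ("Int", "Int")}


def type_of(term, var_types):
    """Return the type of a term, or raise TypeError if it is ill-typed.

    A term is a variable name (str) or a tuple (f, t1, ..., tn).
    """
    if isinstance(term, str):                 # variable x:D has type D
        return var_types[term]
    f, *args = term
    arg_types, result = FUNCS[f]
    actual = tuple(type_of(a, var_types) for a in args)
    if actual != arg_types:
        raise TypeError(f"{f} expects {arg_types}, got {actual}")
    return result                             # f(t1,...,tn) has type Dn+1


def check_atom(atom, var_types):
    """Check that p(t1,...,tn) is a well-typed atomic formula."""
    p, *args = atom
    actual = tuple(type_of(a, var_types) for a in args)
    if actual != RELS[p]:
        raise TypeError(f"{p} expects {RELS[p]}, got {actual}")
    return True
```

For instance, check_atom(("equals", ("zero",), ("succ", "x")), {"x": "Int"}) accepts the atom equals(zero(), succ(x)), while applying succ to a variable of another type is rejected.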

Examples
• ∀x:Boy ∃y:Girl loves(x,y)
• ∀x:Human ∀y:Human (needs(x,y) → loves(y,x))
• ∀x,y:Int equals(plus(x,y), plus(y,x))
• ∀x:Int ¬equals(zero(), succ(x))
• …

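FOL: Models
• (We give the typed semantics only)
• First-Order Model
  • Let the universe U be some nonempty set, and for every D ∈ 𝒟 let Dᵁ ⊆ U be the domain of D
  • Interpretation I: assigns to each n-ary function symbol in F a function Uⁿ → U (viewed as a relation, a subset of Uⁿ⁺¹) and to each n-ary relation symbol in R a subset of Uⁿ
  • Valuation V: assigns to each variable an element of U
  • Interpretations and valuations must respect typing
  • Model M: (U, I, V)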

FOL: Semantics
• Given a model M: (U,I,V), the value tᴹ of a term t (of type D) can be defined inductively
  • if t = x ∈ V, then tᴹ = V(x)
  • if t = f(t1,…,tn), then tᴹ = I(f)(t1ᴹ,…,tnᴹ)
• Likewise, the satisfaction relation ⊨ between a model M and a formula φ
  • M ⊨ p(t1,…,tn) if (t1ᴹ,…,tnᴹ) ∈ I(p)
  • M ⊭ ⊥; M ⊨ (φ → ψ) if M ⊨ φ implies M ⊨ ψ
  • M ⊨ ∃x φ if M′ ⊨ φ for some M′ which differs from M at most in V(x)
• Validity and satisfiability are defined as in the propositional case
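
Over a finite universe, the inductive definitions of tᴹ and ⊨ can be executed directly. This is a minimal sketch under simplifying assumptions: a single untyped domain, formulas encoded as tagged tuples ("atom", "bot", "imp", "ex"), and an example model (succ modulo 3, a relation lt) invented for illustration.

```python
def eval_term(t, I, V):
    """Value of a term: a variable name (str) or a tuple (f, t1, ..., tn)."""
    if isinstance(t, str):
        return V[t]                                   # t = x: look up V(x)
    f, *args = t
    return I[f](*(eval_term(a, I, V) for a in args))  # I(f)(t1^M, ..., tn^M)


def holds(phi, U, I, V):
    """Decide M ⊨ φ for the finite model M = (U, I, V)."""
    tag = phi[0]
    if tag == "atom":                                 # ("atom", p, t1, ..., tn)
        _, p, *ts = phi
        return tuple(eval_term(t, I, V) for t in ts) in I[p]
    if tag == "bot":                                  # ⊥ never holds
        return False
    if tag == "imp":                                  # ("imp", φ, ψ)
        return (not holds(phi[1], U, I, V)) or holds(phi[2], U, I, V)
    if tag == "ex":                                   # ("ex", x, φ)
        _, x, body = phi
        # M' differs from M at most in the value of x
        return any(holds(body, U, I, {**V, x: u}) for u in U)
    raise ValueError(f"unknown formula tag: {tag}")


def forall(x, body):
    """Abbreviation ∀x φ = ¬∃x ¬φ, with ¬φ encoded as (φ → ⊥)."""
    neg = lambda f: ("imp", f, ("bot",))
    return neg(("ex", x, neg(body)))


# Illustrative model: U = {0, 1, 2}, succ is successor modulo 3, lt is "<".
U = {0, 1, 2}
I = {"succ": lambda n: (n + 1) % 3, "lt": {(0, 1), (0, 2), (1, 2)}}
```

In this model, holds(("ex", "y", ("atom", "lt", "y", ("succ", "y"))), U, I, {}) is true, whereas the universally quantified version fails at y = 2 because succ wraps around to 0.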

Examples
• ⊨ ∀x φ → ∃x φ
• ⊨ (∀x φ ∧ ∀x ψ) ↔ ∀x (φ ∧ ψ)
• ⊨ (∃x φ ∨ ∃x ψ) ↔ ∃x (φ ∨ ψ)
• ⊨ ∃x ∀y φ → ∀y ∃x φ
• ⊨ ∀x φ → φ(x:=t)
• If ⊨ φ, then ⊨ ∀x φ
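
Validities of this kind can be sanity-checked by exhaustive search over small finite models. The sketch below takes the quantifier-swap law ∃x∀y p(x,y) → ∀y∃x p(x,y) as its example and searches all binary relations on universes of size 1 to 3 for a counterexample; the helper names are illustrative. Finding none is only evidence, not a proof, since validity quantifies over all models, including infinite ones.

```python
from itertools import product


def relations(n):
    """Yield every binary relation on {0, ..., n-1} as a set of pairs."""
    pairs = [(a, b) for a in range(n) for b in range(n)]
    for bits in product([False, True], repeat=len(pairs)):
        yield {pr for pr, bit in zip(pairs, bits) if bit}


def ex_all(p, n):
    """∃x ∀y p(x,y)"""
    return any(all((x, y) in p for y in range(n)) for x in range(n))


def all_ex(p, n):
    """∀y ∃x p(x,y)"""
    return all(any((x, y) in p for x in range(n)) for y in range(n))


def counterexample(premise, conclusion, max_size=3):
    """Return (n, p) falsifying premise → conclusion, or None if none is found."""
    for n in range(1, max_size + 1):
        for p in relations(n):
            if premise(p, n) and not conclusion(p, n):
                return n, p
    return None
```

counterexample(ex_all, all_ex) finds nothing, while the converse direction counterexample(all_ex, ex_all) already fails on a two-element universe, which shows that the implication cannot be strengthened to an equivalence.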

FOL: Calculus
• A sound and complete axiom system for FOL:
  • all substitution instances of axioms of PL
  • modus ponens: φ, (φ → ψ) ⊢ ψ
  • ⊢ (φ(x:=t) → ∃x φ) (instantiation)
  • (φ → ψ) ⊢ (∃x φ → ψ) if x doesn't occur in ψ (particularization)
• Relaxation: particularization may be applied if there is no free occurrence of x in ψ; i.e., x may occur in ψ inside the scope of a quantification
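
As a worked example of the calculus, here is a sketch of how the familiar universal instantiation law can be derived from the instantiation axiom, assuming the abbreviation ∀x φ = ¬∃x ¬φ and that contraposition and double negation elimination are available as propositional steps.

```latex
\begin{align*}
1.\;& \vdash \neg\varphi(x{:=}t) \to \exists x\,\neg\varphi
      && \text{instantiation, applied to } \neg\varphi \\
2.\;& \vdash \neg\exists x\,\neg\varphi \to \neg\neg\varphi(x{:=}t)
      && \text{contraposition (PL) on 1} \\
3.\;& \vdash \neg\exists x\,\neg\varphi \to \varphi(x{:=}t)
      && \text{double negation (PL) on 2} \\
4.\;& \vdash \forall x\,\varphi \to \varphi(x{:=}t)
      && \text{abbreviation } \forall x\,\varphi = \neg\exists x\,\neg\varphi
\end{align*}
```

Step 1 uses the fact that substitution commutes with negation, so (¬φ)(x:=t) and ¬(φ(x:=t)) are the same formula.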

FOL: Completeness
• As in the propositional case, correctness is easy (⊢ φ ⇒ ⊨ φ, “every derivable formula is valid”)
• Completeness (⊨ φ ⇒ ⊢ φ, “every valid formula is derivable”) follows with a similar proof as previously: given a consistent formula, construct a model satisfying it (⊬ ¬φ ⇒ ⊭ ¬φ)
• Extension lemma: If Φ is a finite consistent set of formulæ and φ is any formula, then Φ ∪ {φ} or Φ ∪ {¬φ} is consistent
• Needs additionally: If Φ is any consistent set of formulæ and ∀x φ is a formula in Φ, then Φ ∪ {φ(t)} is consistent for any term t
• From this, a canonical model can be constructed as before

Example
• Consider the formula
    ∀x∀y∀z ((p(x, y) ∧ p(y, z)) → p(x, z)) ∧ ∀x ¬p(x, x) ∧ ∀x p(x, f(x))
  This formula is satisfiable only in infinite models
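
The claim can be illustrated by brute force: the sketch below enumerates every interpretation of p and f on a universe of n elements and looks for one that makes p transitive, irreflexive, and such that p(x, f(x)) holds for all x. The function name has_model is illustrative. The search only covers the sizes it is run on; the general argument is that following f in a finite universe must eventually revisit an element, and transitivity then forces p(x, x).

```python
from itertools import product


def has_model(n):
    """Search all p ⊆ U×U and f: U → U on U = {0, ..., n-1} for a model."""
    U = range(n)
    pairs = [(a, b) for a in U for b in U]
    for bits in product([False, True], repeat=len(pairs)):
        p = {pr for pr, bit in zip(pairs, bits) if bit}
        # ∀x ¬p(x, x)
        if any((x, x) in p for x in U):
            continue
        # ∀x∀y∀z ((p(x, y) ∧ p(y, z)) → p(x, z))
        if not all((x, z) in p
                   for (x, y) in p for (y2, z) in p if y == y2):
            continue
        # ∀x p(x, f(x)); the tuple f lists the values f(0), ..., f(n-1)
        for f in product(U, repeat=n):
            if all((x, f[x]) in p for x in U):
                return True
    return False
```

Running has_model(n) for n = 1, 2, 3 returns False each time. An infinite model is easy to give: the natural numbers with p interpreted as < and f as the successor function.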

FOL: Undecidability
• Completeness means the set of valid formulæ can be recursively enumerated
• Turing showed that the set of invalid formulæ is not r.e.; hence there is no algorithm deciding whether a formula is valid or not
• strictly speaking, this holds for FOL= with at least one binary relation
• certain sublanguages of FOL are still decidable

FOL=
• Equality is not definable in FOL
• First order logic with equality contains an additional (binary) relation == which is always interpreted as equality of domain elements
• Written in infix notation, i.e. (x==y) for ==(x,y)
• Axioms
  • (x==x) reflexivity
  • (x==y → (y==z → x==z)) transitivity
  • (x==y → y==x) symmetry
  • (x==y → (φ → φ(y:=x))) substitution
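
For comparison, the four equality axioms can be stated and checked in a proof assistant. The following Lean 4 sketch assumes Lean's built-in equality = plays the role of == and models the formula φ as a predicate p on an arbitrary type α; it illustrates the axioms and is not part of the lecture's calculus.

```lean
variable {α : Type}

-- reflexivity: x == x
example (x : α) : x = x := rfl

-- transitivity: x == y → (y == z → x == z)
example (x y z : α) (h1 : x = y) (h2 : y = z) : x = z := h1.trans h2

-- symmetry: x == y → y == x
example (x y : α) (h : x = y) : y = x := h.symm

-- substitution: x == y → (φ → φ(y := x)), with φ given as a predicate p
example (p : α → Prop) (x y : α) (h : x = y) (hp : p y) : p x := by
  subst h
  exact hp
```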

Presburger arithmetic
• Given a signature (N, 0, ´, +) of FOL=, define
  • ∀n ¬(n´==0)
  • ∀m ∀n (m´==n´ → m==n)
  • (p(0) ∧ ∀n (p(n) → p(n´))) → ∀n p(n)
• If the third axiom holds for all p, then this uniquely characterizes the natural numbers (“monomorphic”)
  • ∀n (n+0==n)
  • ∀m ∀n (m+n´ == (m+n)´)
• This theory is decidable!
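
A short worked example shows how the two addition axioms compute a sum. It assumes the axioms in their usual successor form, n+0 == n and m+n´ == (m+n)´, and derives 1+1 == 2 written with successors.

```latex
\begin{align*}
0' + 0' &== (0' + 0)' && \text{by } m + n' == (m + n)' \text{ with } m := 0',\; n := 0 \\
        &== (0')'     && \text{by } n + 0 == n \text{ with } n := 0'
\end{align*}
```

The second step rewrites inside the successor, which relies on the substitution axiom of equality; the chain is then closed by transitivity.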

Peano arithmetic
• Given the signature (N, 0, ´, +, *) and the above axioms, plus
  • ∀n (n*0==0)
  • ∀m ∀n (m*n´ == (m*n)+m)
• This theory is undecidable
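
Read from left to right, the axioms for + and * are recursive definitions over the second argument. The sketch below assumes the successor form of the addition axiom, m+n´ == (m+n)´, and uses Python integers only as a stand-in for numerals built from 0 and ´; the function names are illustrative.

```python
def succ(n):
    """Successor n´."""
    return n + 1


def add(m, n):
    """Addition defined by m+0 == m and m+n´ == (m+n)´."""
    if n == 0:
        return m
    return succ(add(m, n - 1))


def mul(m, n):
    """Multiplication defined by m*0 == 0 and m*n´ == (m*n)+m."""
    if n == 0:
        return 0
    return add(mul(m, n - 1), m)
```

Both functions agree with the built-in operators on small inputs, for example mul(3, 4) evaluates to 12 using nothing but succ and the two recursion schemes.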

Formalizing C in FOL
• Consider the following C program
    int gcd (int a, int b) {
      int c;
      while ( a != 0 ) {
        c = a; a = b%a; b = c;
      }
      return b;
    }
• Consider the following FOL formula φ:
    ∀t:N ( (¬(a(t)==0) → (c(t+1)==a(t) ∧ a(t+1)==b(t)%a(t) ∧ b(t+1)==c(t+1)))
         ∧ (a(t)==0 → (a(t+1)==a(t) ∧ b(t+1)==b(t) ∧ c(t+1)==c(t))) )
• In which way are these equivalent?
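
One way to explore the question is to compare the program with the time-indexed reading of its variables. The sketch below assumes non-negative inputs (C and Python differ on % for negative operands), picks an arbitrary initial value 0 for the uninitialized c, and reads the formula so that b(t+1) receives the value stored in c during the same step. The names gcd_c, trace and result_from_trace are illustrative.

```python
def gcd_c(a, b):
    """Direct transliteration of the C loop (non-negative inputs only)."""
    while a != 0:
        c = a
        a = b % a
        b = c
    return b


def trace(a0, b0, steps, c0=0):
    """Return lists a, b, c where a[t], b[t], c[t] are the values at time t."""
    a, b, c = [a0], [b0], [c0]
    for t in range(steps):
        if a[t] != 0:
            # one loop iteration: c = a; a = b % a; b = c
            c.append(a[t])
            a.append(b[t] % a[t])
            b.append(c[t + 1])
        else:
            # the loop has exited: all variables keep their values
            a.append(a[t])
            b.append(b[t])
            c.append(c[t])
    return a, b, c


def result_from_trace(a0, b0, steps=64):
    """Value of b at the first time point t with a(t) == 0."""
    a, b, _ = trace(a0, b0, steps)
    t = a.index(0)  # raises ValueError if the loop has not exited within `steps`
    return b[t]
```

For small inputs, result_from_trace(x, y) and gcd_c(x, y) agree, which suggests that the sequences a(t), b(t), c(t) described by the formula are exactly the successive states of the loop, extended by stuttering once the loop has terminated.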

Correctness
From this formalization φ, we expect that
• φ ⊨ ∀t (a(t)==0 → b(t)==gcd(a(0),b(0))) (partial correctness)
• φ ⊨ ∃t (a(t)==0 ∧ b(t)==gcd(a(0),b(0))) (total correctness)
Can we prove these statements?
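
Before attempting a proof, both statements can be tested on bounded inputs. The sketch below assumes non-negative inputs, uses Python's math.gcd as the reference for the gcd symbol in the specification, and inspects each run only up to a fixed horizon of time points; step and check are illustrative names. Passing such a test is not a proof, since the statements quantify over all t and all initial values.

```python
from math import gcd


def step(state):
    """One time step of the loop on a state (a, b, c)."""
    a, b, c = state
    if a != 0:
        return (b % a, a, a)   # c = a; a = b % a; b = c
    return state               # after termination nothing changes


def check(a0, b0, horizon=64):
    """Return (partial, total) for the run from (a0, b0), up to `horizon` steps.

    partial: every inspected t with a(t) == 0 has b(t) == gcd(a0, b0)
    total:   some inspected t has a(t) == 0 and b(t) == gcd(a0, b0)
    """
    state = (a0, b0, 0)        # the initial value of c is arbitrary
    partial, total = True, False
    for _ in range(horizon + 1):
        a, b, _c = state
        if a == 0:
            ok = (b == gcd(a0, b0))
            partial = partial and ok
            total = total or ok
        state = step(state)
    return partial, total
```

For all starting values below 25, check returns (True, True). The two flags mirror the difference between the statements: partial correctness is vacuously true for a run that never reaches a(t) == 0, whereas total correctness also demands that such a time point exists.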

First order theorem proving
• Despite the undecidability of first order logic, provers have reached a remarkable proficiency
  • SPASS
  • Vampire
  • Otter, Prover9
• Need (some) arithmetic solver
