410 likes | 548 Views
The Benefits of Exposing Calls and Returns. Rajeev Alur University of Pennsylvania. CONCUR/SPIN, August 2005. Software Model Checking. Control flow graph + Boolean vars (Pushdown automata). Observables. Predicate abstraction. Abstractor. Code. Model. Temporal logics/Automata
E N D
The Benefits of Exposing Calls and Returns Rajeev Alur University of Pennsylvania CONCUR/SPIN, August 2005
Software Model Checking Control flow graph + Boolean vars (Pushdown automata) Observables Predicate abstraction Abstractor Code Model Temporal logics/Automata Regular! Verifier Counter-example Specification Yes On-the-fly explicit state Symbolic fixpoint evaluation
Regular specifications not expressive enough • Classical Hoare-style pre/post conditions • If p holds when procedure A is invoked, q holds upon return • Total correctness: every invocation of A terminates • Integral part of emerging standard JML • Stack inspection properties: security/access control • If a setuuid bit is being set, process root must be in the call stack • Inter-procedural data-flow analysis • An expression e is very busy at a control point p if on all paths from p, e will be used before any of its variables (possibly local in current procedure) are modified Need matching of calls with returns, or finding pending calls, or local paths --- Context-free properties!
Checking Context-free Specifications • Obstcales • Context-free languages are not closed under intersection • Checking context-free properties against context-free models is undecidable • However, many such properties are verifiable • Existing work in security that handles some stack inspection properties[JMT99,JKS03] • Adding assert statements in the program (with additional local variables, if needed), and then checking regular properties (e.g. reachability) amounts to checking context-free properties • Inter-procedural data-flow analysis algorithms [RHS95]
Exposing Calls and Returns • What’s common to the checkable properties? • Both model and property have their own stack, but the two stacks are synchronized and grow/shrink together! • As a generator, program exposes its calls and returns, and as an acceptor, property pushes on calls and pops on returns Formalization of this intuition: Visibly Pushdown Languages A surprisingly robust class of languages with properties like the regular languages and potentially many applications
Talk Outline • Visibly Pushdown Languages • Temporal Logic CaRet and Model Checking • Ongoing Work References: • Visibly pushdown languages Alur, Madhusudan; STOC’04 • A temporal logic of nested calls and returns Alur, Etessami, Madhusudan; TACAS’04 • Congruences for visibly pushdown languages Alur, Kumar, Madhusudan, Viswanthan; ICALP’05
Context-free Languages: Recap (1/2) • Given an alphabet S, a language L is a set of finite words over S • A pushdown automaton (PDA) has a finite control and a stack, and while reading a word, it can push/pop stack symbols while updating control state • Configuration of a PDA: control state + a string of stack symbols • Acceptance defined by empty stack or final state • A language L is a context-free language (CFL) if there is a pushdown automaton that accepts it • Sample CFLs • All regular languages • Set of words of the form an bn, for some n • Set of words with equal number of a and b symbols • Non-CFL: Set of words of the form an bn cn
Context-free Languages: Recap (2/2) • Alternative characterization: Context-free grammars • Natural and popular for defining syntax • Nondeterministic PDAs are more expressive than deterministic ones • Emptiness of a PDA solvable in polynomial-time • Closed under union, but not closed under intersection or complementation • Language inclusion, emptiness of intersection undecidable • Applications: Parsing, Natural language processing, Program analysis…
Exposing Calls and Returns • Pushdown alphabet: partitioned into 3 disjoint sets Σ = SpushSpopSlocal • Pushdown words: finite words over pushdown Σ • A visibly pushdown automaton over a pushdown alphabet Σ is a pushdown automaton that • pushes a symbol onto the stack on a symbol in Spush • pops the stack on a symbol in Spop • cannot change the stack on a symbol in Slocal Key: Stack size at any time is determined by the input wordbut not control state or stack content
Visibly pushdown languages (VPL) • A language L is a VPL over a pushdown alphabet Σ, if there is a visibly pushdown automaton that accepts it (acceptance by final state) • The language {an bn | for some n} • VPL if a is in Spush and b is in Spop • Not a VPL for other partitions • The language of words with equal number of a and b symbols is not a VPL (independent of partition) • Every regular language L is a VPL independent of partitioning • Dyck language (words with well-balanced parantheses) is a VPL provided left/right parantheses are in Spush/Spop resp
VPLs in Program Analysis Program Analysis bool P(u:int) { global int x; local int y; … a: if Q { x = (x+y) }; … } bool Q { local int y; if { …. y++; return 1;} else return P(x) } To figure out whether the expression e=(x+y) is very busy at program point a, Spush = {call-p, call-q} Spop = {ret-p, ret-q} Slocal = {used-e, mod-x, mod-y, skip} Executions are pushdown words, e.g. call-q, skip, mod-y, ret-q, used-e, mod-x, skip, ret-p Set of executions starting at a location a is a VPL: La Set of executions in which e is very-busy is also a VPL: Le e is very busy at a if La is included in Le
VPLs for Document Processing XML Document Query Processing <conference> <name> CONCUR 2005 </name> <location> <city> San Francisco </city> <hotel> Stanford Court </hotel> </location> <sponsor> CISCO </sponsor> <sponsor> Microsoft </sponsor> … </conference> Pushdown alphabet Spush = {<name>, <location>, …} Spop = {</name>, </location>, …} Slocal = {San Francisco, Microsoft, …} A document d is a pushdown word Sample Query: Find documents related to conferences sponsored by Microsoft in San Francisco Specify query as a VPL: L Analysis: Membership question Does document d satisfy query L ? Use VPAs instead of tree automata! (typically, no recursion, but only hierarchy)
Closure Properties Note: can’t combine languages wrt different partitions • Closed under intersection: Given two VPAs A and B, build a product C accepting intersection of L(A) and L(B) • State of C: (state of A, state of B) • Stack symbol of C: (stack symbol of A, stack symbol of B) • C can simulate the stacks of A and B together • Closed under union • Closed under complementation • Closed under concatenation and Kleene-* • Closed under partition-preserving homomorphisms
Determinization • Given a nondeterministic VPA A, we can construct a deterministic VPA B that accepts the same language and has size exponential in A • Potentially useful for building runtime monitors for checking program executions, and online algorithms for XML query processing • VPLs are a subclass of DCFLs (languages defined by deterministic PDAs) • DCFLs not closed under union • Equivalence problem for DCFLs decidable, but complex
Determinization: Sketch of the construction • Determinization of nondeterministic automata uses subset construction: a state R of B is a set of states of A (the states that A can be, having read the word w so far) • Subset construction does not apply to stack • But we can do subsets of summaries: if w is a well-matched word, (q,q’) is a summary of A on w, if A can go from (q,$) to (q’,$), where $ is stack bottom • More precisely, if w=w1c1w2c2…cnwn+1, where ci’s are calls and wi’s are well-matched words, then after reading w, determinized automaton B has • Stack is (Sn,Rn,cn),….(S1,R1,c1)$ • Control state is (Sn+1,Rn+1) • Ri = Set of all states A can be in after w1c1…wi • Si= Set of all summaries of A on the segment wi
Decision Problems • Emptiness: Given a VPA A, is its language empty? • Same as for PDAs: Polynomial-time complete (cubic) • Language inclusion (or equivalence): Given VPAs A and B, is language of A contained in that of B? • Determinize B, take its complement, take product with A, and test for emptiness • Exponential-time complete • Recall: Inclusion is PSPACE-complete for (nondeterministic) finite automata, and undecidable for PDAs
VPL Properties Summary Emptiness Inclusion L Regular Yes Yes Yes NLOG Pspace CFL Yes No No PTIME Undec DCFL Undec No No Yes PTIME Yes Yes Yes Exptime VPL PTIME
Pushdown Words as Binary Trees Let w = i5c1i1 c2i4 i3 i3 r2c1i1 r1r1i5 i3 i5 c1 r1 i1 Stack trees i5 c2 i3 r2 i4 c1 i3 i1 r1 i3
VPL: Connection to tree languages Tree-language characterization: Let L be a set of pushdown words and let ST(L) be the set of stack trees that correspond to L. Theorem: L is a VPL iff ST(L) is a regular tree language Note: It is well-known that the set of parse trees corresponding to a context-free grammar is a regular tree language
Finite word automata that can jump Let w = i5c1i1 c2i4 i3 i3 r2c1i1 r1r1i5 i3 • Summary Automata • Finite-state automaton that reads pushdown word • While reading a call, can send a copy to matching return • d(q,a) is a set of pairs of states if a is in Spush • Nondeterministic summary automata are expressively equivalent to VPAs • Deterministic VPA (= VPL) > Deterministic summary automata > Deterministic tree automata (on stack trees)
Robustness: Alternative Characterizations • Monadic second order logic with matching predicate • m(x,y) means x is a call and y is matching return • Sample formula: forall x. if p(x) then exists y,z. ( q(y) and x<y<z and m(x,z) ) • Thm. MSO + matching predicate interpreted over pushdown words is expressively equivalent to VPLs • Thm: Every CFL is a homomorphic image of a VPL • Context-free grammar based characterization • Two types of non-terminals V0 (matched words) and V1 • All productions are of the form X a if X is in V0 then a must be local X a Y b Za is a call, b is a return, Y is in V0 if X is in V0 then Z must be in V0
“Regular-like” properties continue.. • Congruences and minimization (Myhill-Nerode Theorem) central to theory of regular languages • Given a language L, for well-matched words u and v, define u ~L v iff for all words x and y, xuy in L iff xvy in L • Theorem: A language L of well-matched words is a VPL iff the congruence ~L is of finite index • Minimization • No unique minimal deterministic VPA in general, but… • Minimization of (single-entry) RSMs (i.e. procedural boolean programs) possible. Partitioning into k procedures/modules is adequate to get canonicity!
ω-VPL - Extension to Infinite Words • A Büchi VPA: • VPA over infinite pushdown words • A word is accepted if along a run, the set Fis seen infinitely often • ω-VPL – class of languages accepted by Büchi VPAs • ω-VPL is closed under all Boolean operations Characterization using regular trees and MSO characterization hold. • However, ω-VPLs are not determinizable! • Let L be set of all words such that the stack is repeatedly bounded i.e. for some n, the stack depth is n infinitely often. • L is an ω-VPLbut there is no deterministic (Muller) VPA for it • Language inclusion and equivalence are still decidable
Talk Outline • Visibly Pushdown Languages • Temporal Logic CaRet and Model Checking • Ongoing Work
Software Model Checking Control flow graph + Boolean vars (Pushdown automata) Observables Predicate abstraction Abstractor Code Model CaRet/VPAs Verifier Counter-example Specification Yes
bool bx, by; if bx { ……… by=true ……… } else { ………… by={true,false} ………. } Boolean Program Abstracting Software int x, y; if x>0 { ……. y=x+1 .…… } else { …… y=x+1 …… } bx: x>0 by: y>0 Program
Abstracting Modular Programs Program Recursive State Machine (RSM)/ Pushdown automaton main() { bool y; … x = P(y); … z = P(x); … } bool P(u: bool) { … return Q(u); } bool Q(w: bool) { if … else return P(~w) } A1 A2 A2 A2 A3 A3 Box (function-calls) A3 A1 Entry/Inputs Exit/outputs
Linear-time Propositional Temporal Logic Q ::- p | not Q | Q or Q’ | Next Q | Always Q | Eventually Q | Q Until Q’ Interpreted over (infinite) sequences. Models of an LTL formula is a w-regular language. Useful for stating sequencing properties: • If req happens, then req holds until it is granted: Always ( req → (req Until grant) ) • An exception is never raised: Always ( not Exception )
CARET CARET: A temporal logic for Calls and Returns Expresses context-free properties A B C A …………. Global successor used by LTL
CARET CARET: A temporal logic for Calls and Returns Expresses context-free properties A B C D …………. Global successor used by LTL Local successor: Jump from calls to returns Otherwise global successor at the same level
CARET CARET: A temporal logic for Calls and Returns Expresses context-free properties A B C A …………. Global successor used by LTL Local successor: Jump from calls to returns Otherwise global successor at the same level
CARET CARET: A temporal logic for Calls and Returns Expresses context-free properties Local path A B C A …………. Global successor used by LTL Local successor: Jump from calls to returns Otherwise global successor at the same level
CARET CARET: A temporal logic for Calls and Returns Expresses context-free properties A B C A …………. Global successor used by LTL Local successor: Jump from calls to returns Otherwise global successor at the same level Caller modality: Jump to the caller of the current module Defined for every position except top-level ones
CARET CARET: A temporal logic for Calls and Returns Expresses context-free properties A B C A Caller path gives the stack content! …………. Global successor used by LTL Abstract successor: Jump from calls to returns Otherwise global successor at the same level Caller modality: Jump to the caller of the current module Defined for every position except top-level ones
CARET Definition Syntax: Q ::- p | not Q | Q or Q’ | Next Q | Always Q | Eventually Q | Q Until Q’ Local-Next Q | Local-Always Q Local-Eventually Q | Q Local-Until Q’ Caller Q | CallerPath-Always Q CallerPath-Eventually Q | Q CallerPath-Until Q’ • Local- and Caller- versions of all temporal operators • All these operators can be nested
Expressing properties in Caret Pre-post conditions: If P holds when A is called, then Q must hold when the call returns Always ( (P and call-to-A) Local-Next Q ) Q P A Integrating Manna/Pnueli-style reasoning for reactive computations with Hoare-style reasoning for structured programs
Expressing properties in Caret If A is called with low priority, then it cannot access the file Always ( call-to-A and low-priority Local-Always ( not access-file ) ) A low-priority A high-priority access-file
Expressing properties in Caret Stack inspection properties If a variable x is accessed, then A must be on the call stack Always ( access-to-x CallerPath-Eventually call-to-A ) A access-to-x
Model checking CARET • Given: A (boolean) recursive state machine/ visibly pushdown automaton M A CARET formula Q • Model-checking: Do all runs of M satisfy the specification Q? CARET can be model-checked in time that is polynomial in M and exponential in Q. |M|3 . 2O(|Q|) Complexity class same as that for LTL ! Generalization of Vardi-Wolper construction
Model-checking CARET: Intuition • The specification matches calls and returns of the program, so the runs of the program and models of the formula are both visibly pushdown languages • Given M and formula Q, • Build a Buchi pushdown automaton that accepts words exhibited by M that satisfy (not Q) • Check this pushdown automaton for emptiness • Construction builds on the classical tableaux for LTL Local-Next Q1 Pop s and Q1 ; Check Q1 s Push s and Q1 s, Q1
Conclusions and Ongoing Work • Exposing calls and returns lets you hide the stack! • VPLs seem robust and adequate to model software analysis problems • VPL-triggered research • Dynamic logic with VPL (Loding,Serre) • Visibly pushdown games (Loding,Madhusudan,Serre) • XML query processing (Pitcher) • Third-order Algol with iteration (Murawski,Walukiewicz) • Active area of current research • DTDs, XML, and query languages • Branching-time logics, Fixpoint calculus, and visibly pushdown tree automata (Alur, Chaudhuri, Madhusudan) • Expressive completeness of temporal operators • Implementing a model checker for VPL monitors