Adding Structure to Words

Rajeev Alur
University of Pennsylvania
http://www.cis.upenn.edu/~alur/
NEVER, March 2006

Software Model Checking Challenges
• Automatic abstraction
• Efficient analysis
  • Explicit state
  • Symbolic model checking
  • SAT solvers
• Checking abstract counter-examples
• Abstraction refinement
• Component-based checks
• Application domain
• Choosing specification and modeling languages
[Figure: software model checking loop with Specification, Program, Abstractor, Model, Verifier and Debugger; outcomes Yes/proof, or No/bug with a counter-example]

Classical Model Checking
• Both model M and specification S are finite-state, and define regular languages
  • M as a generator of all possible behaviors
  • S as an acceptor of “good” behaviors (verification is language inclusion of M in S) or as an acceptor of “bad” behaviors (verification is checking emptiness of the intersection of M and S)
• Typical specifications (using automata or temporal logic)
  • Safety: lock and unlock operations alternate
  • Liveness: every request has an eventual response
• The theory of regular languages provides a robust foundation
• For liveness properties, one needs to consider automata over infinite words, but the corresponding theory of ω-regular languages is well developed and well understood
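
The second verification style (S accepts the "bad" behaviors) reduces to a reachability search in the product of M and S. Below is a minimal Python sketch for deterministic automata; the dictionary encoding, the state names and both example automata are hypothetical, chosen to match the safety property above. A non-empty intersection is returned as a counter-example word.

```python
from collections import deque

# A DFA is (initial_state, transitions, accepting_states); transitions maps
# (state, symbol) -> state, and a missing entry means "no move".
def intersection_witness(m, s, alphabet):
    """Return a word accepted by both DFAs m and s, or None if L(m) ∩ L(s) is empty."""
    (m0, m_delta, m_final), (s0, s_delta, s_final) = m, s
    start = (m0, s0)
    parent = {start: None}               # product state -> (previous product state, symbol)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        pm, ps = state
        if pm in m_final and ps in s_final:
            word = []                    # walk parent links back to the start
            while parent[state] is not None:
                state, sym = parent[state]
                word.append(sym)
            return word[::-1]
        for a in alphabet:
            if (pm, a) in m_delta and (ps, a) in s_delta:
                nxt = (m_delta[(pm, a)], s_delta[(ps, a)])
                if nxt not in parent:
                    parent[nxt] = (state, a)
                    queue.append(nxt)
    return None


ALPHABET = ["lock", "unlock"]
# Hypothetical program model: every prefix is a behavior, so all states accept.
buggy = (0, {(0, "lock"): 1, (1, "unlock"): 0, (1, "lock"): 2, (2, "unlock"): 0}, {0, 1, 2})
fixed = (0, {(0, "lock"): 1, (1, "unlock"): 0}, {0, 1})
# Spec as an acceptor of bad behaviors: lock and unlock fail to alternate.
bad = ("u", {("u", "lock"): "l", ("u", "unlock"): "err",
             ("l", "unlock"): "u", ("l", "lock"): "err",
             ("err", "lock"): "err", ("err", "unlock"): "err"}, {"err"})

print(intersection_witness(buggy, bad, ALPHABET))  # ['lock', 'lock']
print(intersection_witness(fixed, bad, ALPHABET))  # None
```

Breadth-first search returns a shortest counter-example, which is convenient for the debugging step of the model checking loop.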

Checking Structured Programs
• Control flow requires a stack, so the model M defines a context-free language
• Algorithms exist for checking regular specifications against context-free models
  • Emptiness of pushdown automata is solvable
  • The product of a regular language and a context-free language is context-free
• But checking a context-free spec against a context-free model is undecidable!
  • Context-free languages are not closed under intersection
  • Inclusion as well as emptiness of intersection are undecidable
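
Emptiness of a pushdown automaton is commonly decided through an equivalent context-free grammar. The sketch below swaps in that grammar-side check rather than working on a PDA directly: it computes the productive nonterminals as a least fixpoint. The dictionary encoding of grammars is an assumption made for illustration.

```python
def is_empty(grammar, start):
    """Return True iff the grammar derives no terminal string from `start`.

    grammar maps each nonterminal to a list of right-hand sides (lists of symbols);
    any symbol that is not a key of grammar is treated as a terminal.
    """
    productive = set()
    changed = True
    while changed:                       # iterate until no new productive nonterminal appears
        changed = False
        for nt, rhss in grammar.items():
            if nt in productive:
                continue
            for rhs in rhss:
                # a right-hand side is productive if all its symbols are terminals or productive
                if all(sym in productive or sym not in grammar for sym in rhs):
                    productive.add(nt)
                    changed = True
                    break
    return start not in productive


# Well-matched brackets: S -> [ S ] S | epsilon
dyck = {"S": [["[", "S", "]", "S"], []]}
print(is_empty(dyck, "S"))                    # False
print(is_empty({"A": [["a", "A"]]}, "A"))     # True: A never terminates
```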

Are Context-free Specs Interesting?
• Classical Hoare-style pre/post conditions
  • If p holds when procedure A is invoked, q holds upon return
  • Total correctness: every invocation of A terminates
  • Integral part of the emerging standard JML
• Stack inspection properties (security/access control)
  • If the setuid bit is being set, root must be in the call stack
• Interprocedural data-flow analysis
• All these need matching of calls with returns, or finding unmatched calls
• Recall: the language of words over [, ] such that brackets are well matched is not regular, but context-free
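
The stack inspection property can be checked at run time by mirroring the call stack, which is exactly the unbounded memory a finite automaton over plain words lacks. A minimal sketch, assuming a hypothetical event encoding with ("call", name), ("ret",) and single-name events such as ("setuid",):

```python
def stack_inspection_violations(trace, operation="setuid", required="root"):
    """Return positions where `operation` occurs while `required` is not on the call stack.

    trace: list of events ("call", name), ("ret",) or (operation_name,).
    """
    call_stack, violations = [], []
    for i, event in enumerate(trace):
        kind = event[0]
        if kind == "call":
            call_stack.append(event[1])
        elif kind == "ret":
            if call_stack:               # this sketch ignores unmatched returns
                call_stack.pop()
        elif kind == operation and required not in call_stack:
            violations.append(i)
    return violations


trace = [("call", "root"), ("call", "helper"), ("setuid",), ("ret",), ("ret",),
         ("call", "user"), ("setuid",), ("ret",)]
print(stack_inspection_violations(trace))  # [6]
```

The first setuid is allowed because root is still pending below helper; the second happens inside user after root has returned.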

Checking Context-free Specs
• Many tools exist for checking specific properties
  • Security research on stack inspection properties
  • Annotating programs with asserts and local variables
  • Interprocedural data-flow analysis algorithms
• What’s common to checkable properties?
  • Both model M and spec S have their own stacks, but the two stacks are synchronized
• As a generator, the program should expose the matching structure of calls and returns
Solution: nested words and the theory of regular languages over nested words

Program Executions as Nested Words
Program:
    bool P() {
      local int x, y;
      …
      x = 3;
      if (Q()) x = y;
      …
    }
    bool Q() {
      local int x;
      …
      x = 1;
      return (x==0);
    }
[Figure: the same execution drawn as a word and as a nested word, with summary edges from calls to returns. Symbols: w: write x, r: read x, s: other]
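
The summary edges can be recovered from an execution in which every position is tagged as a call, a return or an internal action. A minimal sketch, assuming a hypothetical (tag, symbol) encoding; the example trace loosely follows P calling Q above, with the w/r/s symbols from the figure legend:

```python
def nesting_edges(word):
    """Compute the summary edges of a tagged word.

    word: list of (tag, symbol) with tag in {"call", "ret", "int"}.
    Returns (edges, pending_calls, pending_returns); edges are (call_pos, ret_pos) pairs.
    """
    stack, edges, pending_returns = [], [], []
    for i, (tag, _symbol) in enumerate(word):
        if tag == "call":
            stack.append(i)
        elif tag == "ret":
            if stack:
                edges.append((stack.pop(), i))
            else:
                pending_returns.append(i)   # return with no matching call
    return edges, stack, pending_returns    # calls left on the stack never return


# Hypothetical trace: P writes x, calls Q (Q writes and reads its local x), then P writes x.
trace = [("int", "s"), ("int", "w"), ("call", "s"), ("int", "w"),
         ("int", "r"), ("ret", "s"), ("int", "w"), ("int", "s")]
print(nesting_edges(trace))  # ([(2, 5)], [], [])
```

With the edge (2, 5) exposed, a property about P's own x can jump from position 2 to position 5 and never look at Q's local x.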

Finite State Automata
Nested word:
• Linear sequence + well-nested edges
• Positions labeled with symbols in Σ
Finite-state acceptors:
• Finitely many states
• Start in the initial state
• Must end in one of the final states
• The transition function gives the state as a function of the current symbol and the states on all incident edges
[Figure: a run with states q0 to q9 over a nested word a1 to a9; at an internal position q6 = δ(q5, a6); at the return a9, q9 = δ(q2, q8, a9) combines the state q2 on the nesting edge from the matching call with the state q8 on the linear edge]
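
To make the transition rule concrete, here is a minimal Python sketch of a deterministic run, assuming the convention read off the figure: the state produced at a call labels both the linear edge and the nesting edge, and a return combines that call state with its linear predecessor. The tagged-word encoding and the example property are illustrative assumptions.

```python
def run_nwa(word, q0, d_int, d_call, d_ret):
    """Run a deterministic nested word automaton; return the list of states q0, ..., qn.

    word is a well-matched list of (tag, symbol) pairs with tag in {"int", "call", "ret"}.
    """
    states, stack, q = [q0], [], q0
    for tag, a in word:
        if tag == "int":
            q = d_int(q, a)                 # like q6 = d(q5, a6)
        elif tag == "call":
            q = d_call(q, a)                # labels both the linear and the nesting edge
            stack.append(q)
        else:                               # "ret"
            q = d_ret(stack.pop(), q, a)    # like q9 = d(q2, q8, a9)
        states.append(q)
    return states


# Hypothetical property: "if p holds at a call, q holds at the matching return".
# A state is (error_seen, this_call_had_p); the flag only matters at call states.
def d_int(state, a):
    return (state[0], False)

def d_call(state, a):
    return (state[0], a == "p")

def d_ret(call_state, state, a):
    violated = call_state[1] and a != "q"
    return (state[0] or violated, False)

def accepts(word):
    return not run_nwa(word, (False, False), d_int, d_call, d_ret)[-1][0]


print(accepts([("call", "p"), ("int", "-"), ("ret", "q")]))                  # True
print(accepts([("call", "p"), ("call", "-"), ("ret", "q"), ("ret", "-")]))   # False
```

In the second word the inner return carries q, but the call labeled p is matched by the outer return, so the obligation is not discharged.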

Regular Languages of Nested Words
• A set of nested words is regular if there is a finite-state automaton that accepts it
• Nondeterministic automata over nested words can be determinized
  • Like the subset construction for classical automata
  • Involves exponential blow-up
• The set of runs of a (nondeterministic) pushdown automaton, with the summary edges added, is regular
  • A structured program can now be modeled as a regular language of nested words
• Price: the program must expose its call/return structure
  • Exposure can depend on specifications of interest
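
One way to see the determinization: the deterministic state is a set of pairs (h, q), read as "from state h at the innermost pending call, the automaton can now be in q", so there are at most 2^(|Q|²) deterministic states. The sketch below simulates that construction on the fly instead of building it. It assumes the call-state convention from the figure, accepts only well-matched input, and uses a hypothetical example automaton.

```python
def nnwa_accepts(word, initial, final, int_succ, call_succ, ret_succ):
    """Decide acceptance of a well-matched nested word by a nondeterministic NWA.

    S is a set of pairs (h, q): h is the state produced at the innermost pending call
    (or an initial state at top level), q a state reachable from h at this point.
    Successor functions return sets; ret_succ(h, q, a) receives the call state h.
    """
    S = {(q0, q0) for q0 in initial}
    stack = []                                   # (S before the call, call symbol)
    for tag, a in word:
        if tag == "int":
            S = {(h, q2) for (h, q) in S for q2 in int_succ(q, a)}
        elif tag == "call":
            stack.append((S, a))
            S = {(q2, q2) for (_, q) in S for q2 in call_succ(q, a)}
        else:                                    # "ret": glue the callee summary to the caller
            saved, c = stack.pop()
            S = {(h, q3)
                 for (h, q) in saved
                 for q1 in call_succ(q, c)
                 for (h1, q_end) in S if h1 == q1
                 for q3 in ret_succ(q1, q_end, a)}
    return any(q in final for (_, q) in S)


# Hypothetical NWA guessing a call labeled "p" whose matching return is labeled "q".
# States: 0 = searching, 1 = call state of the guessed call, 3 = inside it, 2 = found.
def g_int(q, a):
    return {3} if q in (1, 3) else {q}

def g_call(q, a):
    if q == 0:
        return {0, 1} if a == "p" else {0}       # nondeterministic guess
    return {3} if q in (1, 3) else {2}

def g_ret(h, q, a):
    if h == 1:
        return {2} if a == "q" else set()        # the guess succeeds only on a "q" return
    if h == 0:
        return {q}                               # keep searching, or keep a success found inside
    return {h}                                   # h == 3 stays 3, h == 2 stays 2

def has_pq(word):
    return nnwa_accepts(word, {0}, {2}, g_int, g_call, g_ret)


print(has_pq([("call", "x"), ("call", "p"), ("ret", "q"), ("ret", "y")]))  # True
```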

Specifications
Intuition: keeping track of context is easy; just skip over a call using its summary edge
• Finite-state properties of paths, where a path can be a local path, a global path, or a mixture
Sample regular properties:
• If p holds at a call, q should hold at the matching return
• If x is being written, procedure P must be in the call stack
• Within a procedure, an unlock must follow a lock
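
The third property is about local paths. The sketch below reads it (our assumption) as "every lock must be followed by an unlock before the same procedure invocation returns". The caller's local state is saved at the call and restored at the matching return, which is the "skip using a summary edge" intuition; the string event encoding is hypothetical.

```python
def unreleased_locks(trace):
    """Return positions of locks not followed by an unlock in the same procedure invocation.

    trace: list of event names such as "call", "ret", "lock", "unlock" (others are ignored).
    Assumes every "ret" has a matching "call".
    """
    held = None          # position of the most recent outstanding lock in this invocation
    saved = []           # caller's local state, saved at each call
    violations = []
    for i, event in enumerate(trace):
        if event == "lock":
            held = i
        elif event == "unlock":
            held = None
        elif event == "call":
            saved.append(held)
            held = None                  # the callee starts with its own local state
        elif event == "ret":
            if held is not None:
                violations.append(held)  # callee returns while still holding its lock
            held = saved.pop()           # resume the caller's local path after the call
    if held is not None:
        violations.append(held)
    return violations


print(unreleased_locks(["lock", "call", "unlock", "ret", "unlock"]))  # []
print(unreleased_locks(["lock", "call", "ret"]))                      # [0]
```

An unlock performed by the callee does not count for the caller, because the two events lie on different local paths.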

Closure Properties
The class of regular languages of nested words is closed under:
• Intersection
• Union
• Complementation
• Concatenation and Kleene-*
  • Natural way to define concatenation of nested words
• Homomorphisms
• Reversal
  • Reverse of a nested word: reverse all edges
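
Reversal on the tagged encoding is a one-liner: reverse the positions and swap calls with returns, so a nesting edge (i, j) of a word of length n becomes (n-1-j, n-1-i). A minimal sketch; the (tag, symbol) encoding is an assumption carried over for illustration.

```python
def reverse_nested_word(word):
    """Reverse a tagged nested word: reverse the positions and swap calls with returns."""
    swap = {"call": "ret", "ret": "call", "int": "int"}
    return [(swap[tag], symbol) for tag, symbol in reversed(word)]

def nesting_edges(word):
    """Sorted (call position, matching return position) pairs of a well-matched tagged word."""
    stack, edges = [], []
    for i, (tag, _) in enumerate(word):
        if tag == "call":
            stack.append(i)
        elif tag == "ret":
            edges.append((stack.pop(), i))
    return sorted(edges)


w = [("int", "a"), ("call", "b"), ("int", "c"), ("ret", "d")]
print(reverse_nested_word(w))
# [('call', 'd'), ('int', 'c'), ('ret', 'b'), ('int', 'a')]
print(nesting_edges(w), nesting_edges(reverse_nested_word(w)))  # [(1, 3)] [(0, 2)]
```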

Decision Problems
• Membership: given a nested word w and an automaton A, does A accept w?
  • Linear in the size of w
• Emptiness: given an automaton A over nested words, is its language empty?
  • Cubic algorithm, similar to the one for pushdown automata
• Inclusion (equivalence): given automata A and B, is the language of A contained in (the same as) the language of B?
  • If B is nondeterministic, determinize it
  • Complement B
  • Take the product with A and check for emptiness
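
Emptiness can be decided with summaries: pairs (p, q) such that some well-matched nested word leads from p to q. The naive fixpoint below illustrates that idea but makes no attempt at the cubic bound. It assumes the call-state convention from the figure, restricts attention to well-matched words, and uses a hypothetical set-of-tuples encoding of transitions.

```python
def nwa_nonempty(states, initial, final, delta_int, delta_call, delta_ret):
    """Return True iff the NWA accepts at least one well-matched nested word.

    delta_int: set of (q, a, q2); delta_call: set of (q, a, qc), where qc labels both
    the linear and the nesting edge; delta_ret: set of (qc, q_end, a, q2).
    """
    # (p, q) in summary: some well-matched nested word leads from p to q.
    summary = {(q, q) for q in states}
    while True:
        new = set()
        for (p, q) in summary:
            for (src, _a, dst) in delta_int:         # extend by one internal position
                if src == q:
                    new.add((p, dst))
            for (src, _c, qc) in delta_call:         # extend by a whole call ... return block
                if src != q:
                    continue
                for (h, q_end) in summary:
                    if h != qc:
                        continue
                    for (rh, rl, _b, dst) in delta_ret:
                        if rh == qc and rl == q_end:
                            new.add((p, dst))
        if new <= summary:                           # least fixpoint reached
            break
        summary |= new
    return any((i, f) in summary for i in initial for f in final)


# Accepts exactly the nested word "call a, ret b" (hypothetical automaton).
print(nwa_nonempty({0, 1, 2}, {0}, {2}, set(), {(0, "a", 1)}, {(1, 1, "b", 2)}))  # True
```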

Robust Expressiveness
• Monadic second-order logic of nested words
  • First-order variables and quantifiers
  • Base relations: i<j, i=j+1, m(i,j) (a nesting edge from i to j), p(i)
  • Unary relation variables and quantifiers
• A set of nested words is definable in MSO iff it is regular
• Syntactic congruence-based characterization
  • Myhill-Nerode-style theorem equating regularity with the existence of congruences of finite index
• Minimization is also possible (with some caveats)
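
As an illustration (ours, not from the talk), the sample property "if p holds at a call, q should hold at the matching return" needs only first-order quantifiers and the nesting relation m(i,j), reading p(i) as "p holds at position i":

```latex
\forall i\,\forall j\;\bigl(m(i,j) \wedge p(i) \rightarrow q(j)\bigr)
```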

Relating to Word Languages
Moving nesting structure from shape to labels
• The alphabet is structured: it partitions positions into calls, returns, and local actions (this implicitly defines the summary edges)
Visibly Pushdown Automata
• A pushdown automaton that must push at calls and pop at returns
• Word languages accepted by VPA: Visibly Pushdown Languages
• VPL is a subclass of deterministic CFL with appealing closure and algorithmic properties
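
A minimal sketch of a deterministic VPA run, where the input symbol alone decides whether to push, pop or leave the stack alone. The table encoding and the example language (brackets of two kinds, each return closing the letter of its call) are assumptions for illustration; unlike the general definition, this sketch rejects unmatched returns and pending calls.

```python
def run_vpa(word, calls, returns, q0, final, d_call, d_ret, d_int):
    """Run a deterministic visibly pushdown automaton; return True iff it accepts.

    d_call[(q, a)] = (q2, gamma)   push gamma on a call symbol a
    d_ret[(q, a, gamma)] = q2      pop gamma on a return symbol a
    d_int[(q, a)] = q2             internal symbols leave the stack untouched
    """
    q, stack = q0, []
    for a in word:
        if a in calls:
            if (q, a) not in d_call:
                return False
            q, gamma = d_call[(q, a)]
            stack.append(gamma)                  # must push exactly one symbol at a call
        elif a in returns:
            if not stack or (q, a, stack[-1]) not in d_ret:
                return False                     # unmatched or mismatched return
            q = d_ret[(q, a, stack.pop())]       # must pop exactly one symbol at a return
        else:
            if (q, a) not in d_int:
                return False
            q = d_int[(q, a)]
    return q in final and not stack


CALLS, RETURNS = {"<a", "<b"}, {"a>", "b>"}
D_CALL = {("q", "<a"): ("q", "A"), ("q", "<b"): ("q", "B")}
D_RET = {("q", "a>", "A"): "q", ("q", "b>", "B"): "q"}
D_INT = {("q", "x"): "q"}

def dyck2(word):
    return run_vpa(word, CALLS, RETURNS, "q", {"q"}, D_CALL, D_RET, D_INT)


print(dyck2(["<a", "x", "<b", "b>", "a>"]))  # True
print(dyck2(["<a", "b>"]))                   # False
```

This language is context-free but not regular, yet every push and pop is visible in the input, which is what keeps VPLs closed under intersection.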

Relating to Tree Languages
A binary tree is hiding in a nested word
• At calls, the left subtree encodes what happens in the called procedure, and the right subtree gives what happens after the return
Why not use the tree encoding and tree automata?
• Nesting is encoded, but linear structure is lost
• Deterministic (top-down) tree automata are not expressive
• The XML literature has lots of (uncompelling) attempts to address this deficiency: tree-walking automata, automata with pebbles…
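
A sketch of the encoding described above, assuming a well-matched input in the (tag, symbol) format; putting the matching return at the root of the call's right subtree is our choice of convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    tag: str
    symbol: str
    left: Optional["Node"] = None   # for a call: what happens inside the called procedure
    right: Optional["Node"] = None  # next position (for a call: its matching return)


def to_tree(word):
    """Encode a well-matched nested word [(tag, symbol), ...] as a binary tree."""
    match, stack = {}, []
    for i, (tag, _) in enumerate(word):
        if tag == "call":
            stack.append(i)
        elif tag == "ret":
            match[stack.pop()] = i

    def build(start, end):
        # Tree for positions start .. end-1.
        if start >= end:
            return None
        tag, symbol = word[start]
        if tag == "call":
            j = match[start]
            return Node(tag, symbol, left=build(start + 1, j), right=build(j, end))
        return Node(tag, symbol, right=build(start + 1, end))

    return build(0, len(word))


t = to_tree([("int", "a"), ("call", "b"), ("int", "c"), ("ret", "d"), ("int", "e")])
print(t.right.left.symbol, t.right.left.right)  # c None
```

In the example, the linear successor of c is d, yet d is not a child of c; it is the right child of the call b. This is the loss of linear structure noted above.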

Summary
• Allowing a program to expose call-return summary edges leads to nested words
• Robust theory of regular languages of nested words
  • Deterministic acceptors (visibly pushdown automata)
  • Suitable for algorithmic analysis
  • Extends to ω-regular languages of infinite nested words
• Branching-time properties: nested trees
  • Powerful theory of alternating tree automata and fixpoint logics over nested trees
• Foundation for next-generation query languages
  • Inter-procedural program analysis, software model checking, runtime monitoring
  • Tool development in progress
• Beyond program analysis: document processing (streaming) and XML query languages
