Learning Universally Quantified Invariants of Linear Data Structures

Pranav Garg (1), Christof Löding (2), P. Madhusudan (1) and Daniel Neider (2)
(1) University of Illinois at Urbana-Champaign
(2) RWTH Aachen, Germany

Black-box learning of invariants
• Renewed interest in applying learning to invariant synthesis [Sharma et al. CAV-12], [Sharma et al. SAS-13], [Kong et al. APLAS-10].
• Advantages of black-box learning over white-box techniques:
  - verification of complex programs that have simple invariants
  - generalization
  - application of highly scalable machine learning algorithms to verification
[Figure: learner, teacher and program. The learner sends a hypothesis H to the teacher, which checks it against the program.]

Active Learning and Passive Learning
• Active learning:
  - the learner queries the teacher with membership and equivalence queries
• Passive learning:
  - given a sample S = (examples, counter-examples), learn the simplest concept consistent with it
[Figure: an active learner exchanging membership/equivalence queries and yes/no answers with a teacher; a passive learner receiving a sample S.]

Overview
• Build active learning algorithms for learning quantified formulas over linear data structures (arrays/lists).
  - introduce Quantified Data Automata (QDAs), a normal form for such invariants
  - build an active learning algorithm for QDAs
• Build a passive learning algorithm on top of the active learning algorithm.
  - based on an imprecise teacher that answers queries with respect to the samples
• Introduce elastic QDAs (EQDAs), which translate to decidable logics.
  - develop learning algorithms for EQDAs
Example invariant: the list pointed to by head is sorted.
[Figure: the list head → 5 → 7 → 8 → 9.]

Program Configuration/Data words
[Figure: a program configuration (a list with the pointer variables head and i and the data values 8 9 3 2 4 7) and the corresponding data word.]

Quantified Data Automata
• QDAs represent universally quantified properties of linear data structures.
[Figure: example QDA with transitions on head, y1 and y2 and a state labeled data(y1) <= data(y2).]

Quantified Data Automata
Fix P: the set of program pointer variables
Fix Y: the set of universally quantified variables
Fix F: a numerical abstract domain over data formulas
• A QDA over linear data structures:
  - reads a data word annotated with the pointers P and the variables Y
  - checks whether the data stored at these positions satisfy a data property
• A QDA accepts a data word w with pointers P if it accepts all possible extensions of w with valuations for Y.
[Figure: example QDA with transitions on head, y1 and y2 and a state labeled data(y1) <= data(y2).]

Valuation words
• Valuation word = data word over P + valuation for Y
[Figure: a data word with the pointers head and i, and several valuation words that place y1 and y2 at different positions (for example y1 on the head cell, or y2 on the cell of i).]
Universal quantification: a QDA accepts a data word iff it accepts ALL corresponding valuation words.
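The universal-quantification semantics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a data word is modeled as a list of data values plus a map from pointer variables to positions, the quantified variables are assumed to range over all positions of the word, and the names `valuation_words`, `accepts_data_word` and `sorted_property` are hypothetical.

```python
from itertools import product

def valuation_words(data, pointers, quantified):
    """Yield every extension of a data word with positions for the quantified variables.

    data       -- list of data values, one per cell of the list/array
    pointers   -- dict: pointer variable -> position, e.g. {"head": 0, "i": 3}
    quantified -- names of the universally quantified variables, e.g. ["y1", "y2"]
    """
    for positions in product(range(len(data)), repeat=len(quantified)):
        valuation = dict(pointers)
        valuation.update(zip(quantified, positions))
        yield data, valuation

def accepts_data_word(accepts_valuation_word, data, pointers, quantified):
    """A data word is accepted iff ALL of its valuation words are accepted."""
    return all(accepts_valuation_word(d, v)
               for d, v in valuation_words(data, pointers, quantified))

def sorted_property(data, v):
    """Hypothetical per-valuation check: if y1 is not after y2, then data(y1) <= data(y2)."""
    return v["y1"] > v["y2"] or data[v["y1"]] <= data[v["y2"]]

print(accepts_data_word(sorted_property, [5, 7, 8, 9], {"head": 0}, ["y1", "y2"]))
print(accepts_data_word(sorted_property, [8, 9, 3, 2, 4, 7], {"head": 0, "i": 3}, ["y1", "y2"]))
```

The first call checks a sorted list and the second an unsorted one; a single rejected valuation word (for instance y1 on the cell holding 9 and y2 on the cell holding 3) is enough to reject the whole data word.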

Quantified Data Automata
• Deterministic, finite register automata over words
  - each state q is labeled with a data formula f(q)
• For a valuation word, the QDA reads the pointer and universal variables and stores the data values at their positions in the register reg.
• At the final state q, the QDA checks whether these data values satisfy the formula labeling the state:
  - reg satisfies f(q): accepts the valuation word
  - reg does not satisfy f(q): rejects the valuation word
[Figure: run on a valuation word with head, y1, i and y2; final register reg: head = 2, y1 = 4, i = 8, y2 = 8; f(q) = data(y1) <= data(y2).]
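A run of such an automaton can be sketched as follows. The encoding is an assumption made for illustration: a valuation word is a list of (symbol, data value) pairs, a symbol is the set of variables sitting on that cell (the empty set is the blank symbol), and a missing transition means rejection. The class `QDA`, the helper `sym` and the hand-built sortedness automaton over P = {head}, Y = {y1, y2} are hypothetical, not taken from the talk.

```python
BLANK = frozenset()

def sym(*names):
    """Symbol = set of pointer/quantified variables placed on one cell."""
    return frozenset(names)

class QDA:
    def __init__(self, initial, transitions, formulas):
        self.initial = initial
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.formulas = formulas        # dict: state -> predicate over the register

    def accepts(self, valuation_word):
        state, reg = self.initial, {}
        for symbol, value in valuation_word:
            for var in symbol:          # store the data value for each variable read
                reg[var] = value
            state = self.transitions.get((state, symbol))
            if state is None:           # no transition: reject
                return False
        return self.formulas[state](reg)  # check f(q) on the register at the final state

always = lambda reg: True
leq = lambda reg: reg["y1"] <= reg["y2"]

# Sortedness: if y1 is not after y2, require data(y1) <= data(y2); otherwise true.
sorted_qda = QDA(
    "q0",
    {
        ("q0", sym("head")): "qh", ("q0", sym("head", "y1")): "q1",
        ("q0", sym("head", "y2")): "q2", ("q0", sym("head", "y1", "y2")): "q12",
        ("qh", BLANK): "qh", ("qh", sym("y1")): "q1",
        ("qh", sym("y2")): "q2", ("qh", sym("y1", "y2")): "q12",
        ("q1", BLANK): "q1", ("q1", sym("y2")): "q12",
        ("q2", BLANK): "q2", ("q2", sym("y1")): "q21",
        ("q12", BLANK): "q12", ("q21", BLANK): "q21",
    },
    {"q0": always, "qh": always, "q1": always, "q2": always, "q12": leq, "q21": always},
)

word = [(sym("head"), 5), (sym("y1"), 7), (BLANK, 8), (sym("y2"), 9)]
print(sorted_qda.accepts(word))
```

Every blank transition in this example is a self-loop, so the automaton cannot count the distance between y1 and y2.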

Learning QDAs
• QDAs are finite automata that output data formulas.
• Lift Angluin's L* algorithm for learning DFAs to learn QDAs.
• Given a teacher, the unique minimal QDA can be learned in time polynomial in the size of this minimal QDA.
[Figure: example QDA over head, y1 and y2; regular expressions over these symbols are mapped to outputs such as data(y1) <= data(y2).]
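To make the idea of lifting L* to automata with outputs concrete, here is a compact sketch. It is not the authors' algorithm: it uses the Maler-Pnueli variant of L* (every suffix of a counterexample becomes a new table column, so no consistency check is needed), treats the data formula as an opaque output label, and replaces a real equivalence oracle with a bounded exhaustive comparison. The names `lstar`, `bounded_equivalence`, `target` and the alphabet are hypothetical.

```python
from itertools import product

def lstar(alphabet, membership, equivalence):
    """Learn a minimal automaton with outputs; words are tuples of symbols."""
    S, E, cache = [()], [()], {}      # access prefixes, distinguishing suffixes, query cache

    def out(w):
        if w not in cache:
            cache[w] = membership(w)
        return cache[w]

    def row(s):
        return tuple(out(s + e) for e in E)

    while True:
        # Close the table: each one-symbol extension of a row must equal some row of S.
        changed = True
        while changed:
            changed = False
            known = {row(s) for s in S}
            for s in list(S):
                for a in alphabet:
                    if row(s + (a,)) not in known:
                        S.append(s + (a,))
                        known.add(row(s + (a,)))
                        changed = True
        # Hypothesis: one state per distinct row.
        delta = {(row(s), a): row(s + (a,)) for s in S for a in alphabet}
        initial = row(())

        def run(word, delta=delta, initial=initial):
            state = initial
            for a in word:
                state = delta[(state, a)]
            return state[0]           # column of the empty suffix = output of the state

        cex = equivalence(run)
        if cex is None:
            return len(S), run
        # Refine: add every suffix of the counterexample as a new column.
        for i in range(len(cex) + 1):
            if cex[i:] not in E:
                E.append(cex[i:])

def bounded_equivalence(alphabet, target, max_len):
    """Assumed stand-in for an equivalence oracle: compare on all words up to max_len."""
    def check(hypothesis):
        for n in range(max_len + 1):
            for w in product(alphabet, repeat=n):
                if hypothesis(w) != target(w):
                    return w
        return None
    return check

ALPHABET = ("y1", "y2", "b")          # "b" stands for the blank symbol

def target(word):
    """Output the sortedness formula iff y1 first occurs before y2 first occurs."""
    if "y1" in word and "y2" in word and word.index("y1") < word.index("y2"):
        return "data(y1) <= data(y2)"
    return "true"

n_states, learned = lstar(ALPHABET, target, bounded_equivalence(ALPHABET, target, 5))
print(n_states, learned(("b", "y1", "b", "y2")))
```

Because rows in S stay pairwise distinct, the number of access prefixes equals the number of states of the learned automaton.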

Elastic Quantified Data Automata (EQDA)
• Subclass of QDAs that translate to decidable logics
  - Array Property Fragment (APF) [Bradley et al. VMCAI-06]
  - decidable fragment of Strand over lists [Madhusudan et al. POPL-11]
• EQDAs cannot test whether two universal variables are a bounded distance apart.
• Restriction for EQDAs: all transitions on blank symbols (no pointer or universal variable) must be self-loops.
[Figure: a QDA outside APF next to an EQDA inside APF, both over y1 and y2.]
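The elasticity restriction is a purely syntactic check on the transition table. A minimal sketch, assuming transitions are stored as a dict from (state, symbol) to state and that the blank symbol is written "b"; the function name `is_elastic` and both example automata are hypothetical.

```python
BLANK = "b"

def is_elastic(transitions):
    """True iff every transition on the blank symbol is a self-loop."""
    return all(src == dst
               for (src, symbol), dst in transitions.items()
               if symbol == BLANK)

# Distinguishes "y2 directly after y1" from "y2 later": the blank step q1 -> q2 counts distance.
counting = {
    ("q0", "y1"): "q1",
    ("q1", "y2"): "q3",
    ("q1", BLANK): "q2",
    ("q2", BLANK): "q2",
    ("q2", "y2"): "q4",
}

# Only remembers the order of y1 and y2; blanks never change the state.
elastic = {
    ("q0", BLANK): "q0",
    ("q0", "y1"): "q1",
    ("q1", BLANK): "q1",
    ("q1", "y2"): "q2",
    ("q2", BLANK): "q2",
}

print(is_elastic(counting), is_elastic(elastic))
```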

Elastic Quantified Data Automata (EQDA)
Unique minimal over-approximation theorem: a QDA A can be uniquely minimally over-approximated by a language of valuation words that is accepted by an EQDA Ael.
• The construction of Ael from a given QDA A is called elastification.
• Learning EQDAs <= learning QDAs + elastification.
[Figure: A and its elastic over-approximations Ael, Bel and Cel, with Ael the smallest.]

Passively learning QDAs
Given the samples S+ and S-, the teacher uses them to answer the active learner. The teacher wants the active learner to construct a QDA that includes S+ and excludes S-.
• Membership query for a word s:
  - if s belongs to S+, return yes
  - if s belongs to S-, return no
  - otherwise, return no (errs on the side of keeping the learned concept semantically small)
• Equivalence query:
  - checks whether the conjectured invariant is consistent with S+ and S-
The learned QDA might be non-optimal (usually small). Running time is polynomial in the size of the learned QDA.
[Figure: the passive learner wraps the active learner and a teacher built from the sample S+, S-.]
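The sample-based teacher described on this slide can be sketched directly. Assumptions for illustration: words are opaque hashable values (tuples of integers in the example), a hypothesis is any callable that returns True for accepted words, and the class name `SampleTeacher` is hypothetical.

```python
class SampleTeacher:
    """Imprecise teacher that answers queries only from the samples S+ and S-."""

    def __init__(self, positives, negatives):
        # Sorted lists make the choice of counterexample deterministic.
        self.positives = sorted(set(positives))
        self.negatives = sorted(set(negatives))

    def membership(self, word):
        # yes for S+, no for S-, and no for every unknown word
        # (keeps the learned concept semantically small)
        return word in self.positives

    def equivalence(self, accepts):
        """Return a sample word the hypothesis gets wrong, or None if it is consistent."""
        for word in self.positives:
            if not accepts(word):
                return word
        for word in self.negatives:
            if accepts(word):
                return word
        return None

teacher = SampleTeacher(positives={(1, 2, 3), (0, 0)}, negatives={(3, 1)})
is_sorted = lambda w: list(w) == sorted(w)
accept_all = lambda w: True

print(teacher.membership((1, 2, 3)), teacher.membership((5, 6)))
print(teacher.equivalence(is_sorted), teacher.equivalence(accept_all))
```

Note that (5, 6) is sorted but is still answered with no, because it does not occur in S+; this is the sense in which the teacher is imprecise.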

Experiments
• Run the program on arrays/lists of small bounded sizes, with data values from a bounded data domain, e.g. {0, 1, 2}.
• Extract the concrete data structures that manifest at loop headers.
• Obtain the set S+ on which passive learning is performed.
  - fix F to the Cartesian lattice of atomic formulas over the relations {=, <, ≤}
• Learn QDAs using Angluin's algorithm.
  - the learner never asks long membership queries
  - the teacher thus often has correct answers
• The learned QDA is over-approximated by an elastic QDA to obtain a quantified invariant in decidable Strand or APF.
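The sample-collection step can be illustrated with a small sketch. Insertion sort is chosen here only as an example program and is not claimed to be one of the talk's benchmarks; the function names and the representation of a configuration as (array contents, value of i) are assumptions.

```python
from itertools import product

def insertion_sort_configs(values):
    """Run insertion sort and snapshot (array, i) each time the outer loop header is reached."""
    a, configs, i = list(values), [], 1
    while True:
        configs.append((tuple(a), i))     # concrete configuration at the loop header
        if i >= len(a):
            return configs
        j = i
        while j > 0 and a[j - 1] > a[j]:  # insert a[i] into the sorted prefix
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
        i += 1

def collect_positive_samples(max_len=3, domain=(0, 1, 2)):
    """S+: all loop-header configurations over arrays of bounded size and bounded data domain."""
    samples = set()
    for n in range(1, max_len + 1):
        for values in product(domain, repeat=n):
            samples.update(insertion_sort_configs(values))
    return samples

s_plus = collect_positive_samples()
print(insertion_sort_configs((2, 1, 0)))
```

Every collected configuration satisfies the loop invariant that the prefix before i is sorted, which is the kind of quantified property the learner is meant to recover from S+.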


Related Work
• Daikon [Ernst et al. ICSE-00]
  - conjunctive Boolean learning
  - learns quantified invariants over arrays, to some extent
• Applications of learning in verification
  - rely-guarantee contracts [Cobleigh et al. TACAS-03, Alur et al. CAV-05]
  - stateful interfaces [Alur et al. POPL-05]
  - learning quantified invariants over predicates [Kong et al. APLAS-10]
• Machine learning algorithms for invariant synthesis [Sharma et al. CAV-12, SAS-13, ESOP-13]

Conclusion
• Learning universally quantified invariants over linear data structures
  - Quantified Data Automata (QDAs) / elastic QDAs
  - active learning for QDAs
  - unique elastification
  - algorithm for passively learning QDAs/EQDAs
  - experimental validation
Future Work:
• Extensions to trees, to capture universally quantified properties such as binary-search-tree, max-heap, …
• Combining automata-based structural learning with machine learning algorithms for learning data formulas
Thank you!
