410 likes | 519 Views
Syntax-Guided Synthesis. Rajeev Alur. Joint work with R.Bodik , G.Juniwal , M.Martin , M.Raghothaman , S.Seshia , R.Singh , A.Solar-Lezama , E.Torlak , A.Udupa. Program Verification. Does a program P meet its specification j, where j is written as a logical formula?
E N D
Syntax-Guided Synthesis Rajeev Alur Joint work with R.Bodik, G.Juniwal, M.Martin, M.Raghothaman, S.Seshia, R.Singh, A.Solar-Lezama, E.Torlak, A.Udupa
Program Verification • Does a program P meet its specification j, where j is written as a logical formula? • Motivation: Correctness of systems, finding bugs • Program verification is hard! • Formalizing a structured program into logical formulas • Using tools (SMT solvers) to verify whether the formalized program meets its specification. • SMT-LIB – common standards and library of benchmarks of SMT solvers.
Program Synthesis • Automatically synthesize a program Pthat satisfies a given specification j • Can potentially have greater impact than program verification • Program synthesis is hard! • Let’s provide a syntactic template for the program – Syntax-Guided Synthesis (SyGuS) • Works on special cases already exist (e.g. Sketch 2008) • Let’s build a common standard and benchmarks for SyGuS solvers (SYNTH-LIB)
Talk Outline • Background: SMT Solvers • Formalization of SyGuS • Solution Strategies • Conclusions + SyGuS Competition
What is SMT? Satisfiability Modulo Theories + Magnus Madsen
Recall SAT The Boolean SATisfiability Problem: • A=TRUE, =FALSE, =FALSE literal or negated literal Magnus Madsen
Recall SAT • SAT is NP-complete (solveable in exponential time) • Many SAT solvers exist • DPLL (1962) • Chaff (2001) • MiniSAT (2004) • Some do remarkably well. Magnus Madsen
What is an SMT instance? A logical formula built using • negation, conjunction and disjuction • e.g. • e.g. • theory specific operators • e.g. , • e.g. theory of bitwise operators theory of integers Magnus Madsen
Q: Why not encode every formula in SAT? A: Theory solvers have very efficient algorithms • Graph Problems: • Shortest-Path • Minimum Spanning Tree • Optimization: • Max-Flow • Linear Programming • (just to name a few) Magnus Madsen
Q: But then, Why not get rid of the SAT solver? A: SAT solvers are being studied for a long time Magnus Madsen
Formula SMT Solver SAT Theory YES YES NO add clause: Magnus Madsen
Theories Theory of: • Difference Arithemetic • Linear Arithmetic • Arrays • Bit Vectors • Algebraic Datatypes • Uninterpreted Functions Magnus Madsen
Equivalence Checking of Program Fragments int fun1(int y) { int x, z; z = y; y = x; x = z; return x*x; } SMT formula Satisfiableiff programs non-equivalent ( z = y y1 = x x1 = z ret1 = x1*x1) ( ret2 = y*y ) ( ret1 ret2 ) int fun2(int y) { return y*y; } What if we use SAT to check equivalence? ICCAD 2009 Tutorial
Equivalence Checking of Program Fragments SMT formula Satisfiableiff programs non-equivalent ( z = y y1 = x x1 = z ret1 = x1*x1) ( ret2 = y*y ) ( ret1 ret2 ) int fun1(int y) { int x, z; z = y; y = x; x = z; return x*x; } Using SAT to check equivalence (w/ Minisat) 32 bits for y: Did not finish in over 5 hours 16 bits for y: 37 sec. 8 bits for y: 0.5 sec. int fun2(int y) { return y*y; } ICCAD 2009 Tutorial
Equivalence Checking of Program Fragments int fun1(int y) { int x, z; z = y; y = x; x = z; return x*x; } SMT formula ’ ( z = y y1 = x x1 = z ret1 = sq(x1) ) ( ret2 = sq(y) ) ( ret1 ret2 ) int fun2(int y) { return y*y; } Using EUF solver: 0.01 sec ICCAD 2009 Tutorial
Verification Synthesis Program Verification: Does P meet spec j ? Program Synthesis: Find P that meets spec j SMT: Is jsatisfiable ? Syntax-Guided Synthesis SMT-LIB/SMT-COMP Standard API Solver competition Plan for SyGuS-comp
Talk Outline • Formalization of SyGuS • Solution Strategies • Conclusions + SyGuS Competition
Syntax-Guided Synthesis (SyGuS) Problem • Fix a background theory T: fixes types and operations • Function to be synthesized: name f along with its type • Inputs to SyGuS problem: • Specification j (semantic constraint) Typed formula using symbols in T + symbol f • Set E of expressions given by a context-free grammar Set of candidate expressions that use symbols in T (syntactic constraint) • Computational problem: Output e in E such that j[f/e] is valid (in theory T)
SyGuS Example • Theory QF-LIA Types: Integers and Booleans Logical connectives, Conditionals, and Linear arithmetic Quantifier-free formulas • Function to be synthesized f (int x, int y) :int • Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) • Candidate Implementations: Linear expressions LinExp := x | y | Const | LinExp + LinExp | LinExp - LinExp • No solution exists
SyGuS Example • Theory QF-LIA • Function to be synthesized: f (int x, int y) :int • Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) • Candidate Implementations: Conditional expressions with comparisons Term := x | y | Const | If-Then-Else (Cond, Term, Term) Cond := Term <= Term | Cond & Cond | ~ Cond | (Cond) • Possible solution: If-Then-Else (x ≤ y, y, x) …. Solving SyGus is hard!
Talk Outline • Solution Strategies • Conclusions + SyGuS Competition
Solving SyGuS as Active Learning: Counter-Example Guided Inductive Synthesis Initial examples I Candidate Expression Learning Algorithm Verification Oracle Counterexample Fail Success Concept class: Set E of expressions Examples: Concrete input values
CEGIS Example • Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) • Set E: All expressions built from x,y,0,1, Comparison, +, If-Then-Else Examples = { } Candidate f(x,y) = x Learning Algorithm Verification Oracle Example (x=0, y=1)
CEGIS Example • Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) • Set E: All expressions built from x,y,0,1, Comparison, +, If-Then-Else Examples = {(x=0, y=1) } Candidate f(x,y) = y Learning Algorithm Verification Oracle Example (x=1, y=0)
CEGIS Example • Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) • Set E: All expressions built from x,y,0,1, Comparison, +, If-Then-Else Examples = {(x=0, y=1) (x=1, y=0) (x=0, y=0) (x=1, y=1)} Candidate ITE (x ≤ y, y,x) Learning Algorithm Verification Oracle Success
SyGuS Solutions • CEGIS approach (Solar-Lezama, Seshia et al) • Related work: Similar strategies for solving quantified formulas and invariant generation • Coming up: Learning strategies based on: • Enumerative (search with pruning): Udupa et al (PLDI’13) • Symbolic (solving constraints): Gulwani et al (PLDI’11) • Stochastic (probabilistic walk): Schkufza et al (ASPLOS’13)
Enumerative Learning • Find an expression consistent with a given set of concrete examples • Enumerate expressions in increasing size, and evaluate each expression on all concrete inputs to check consistency • Key optimization for efficient pruning of search space (examples): • Expressions e1 and e2 are equivalent if e1(a,b)=e2(a,b) on all concrete values (x=a,y=b) in Examples • Only one representative among equivalent subexpressions needs to be considered for building larger expressions • Fast and robust for learning expressions with ~ 15 nodes
Symbolic Learning • Each production in the grammar is thought of as a component. Input and Output ports of every component are typed. • Use a constraint solver for both the synthesis and verification steps Term Term Cond ITE >= + Term Cond Term Term Term Term Term Term Term Term Term 0 1 x y • A well-typed loop-free program comprising these component corresponds to an expression DAG from the grammar.
Symbolic Learning • Start with a library consisting of some number of occurrences of each component. n1 n10 n9 n8 n7 n5 n6 n3 n2 n4 • Synthesis Constraints: • Shape is a DAG, Types are consistent • Spec j[f/e] is satisfied on every concrete input values in Examples 0 y 1 x + y >= ITE x + • Use an SMT solver (Z3) to find a satisfying solution. • If synthesis fails, try increasing the number of occurrences of components in the library in an outer loop
Symbolic Learning - example • Iteration 1: x • Learned counter-example: <x= -1, y=0>
Symbolic Learning - example • Iteration 2: x y ≥ ITE x x • Learned counter-example: <x= 0, y=-1>
Symbolic Learning - example • Iteration 3: x y ≥ ITE x y • Learned counter-example: -
Stochastic Learning • Idea: Use the Metropolis-Hastings Method to find desired expression e by probabilistic walk on graph where nodes are expressions and edges capture single-edits. …..in simple words: • Let En be the expressions of size n (n picked randomly). • For every expression e in En set Score(e) between 0 and 1(“Extent to which e meets the spec φ”) • Score(e) = exp( - 0.5 Wrong(e)), where Wrong(e) = No of examples in I for which ~ j [f/e] • Score(e) is large when Wrong(e) is small. Expressions e with Wrong(e) = 0 more likely to be chosen in the limit than any other expression
Stochastic Learning • Initial candidate expression e sampled uniformly from En • When Score(e) = 1, return e • Pick node v in parse tree of e uniformly at random. Replace subtree rooted at e with subtree of same size, sampled uniformly e e’ + + z + z - • With probability min{ 1, Score(e’)/Score(e) }, replace e with e’ • Repeat until finding e’ such that Score(e’) = 1. • Outer loop responsible for updating expression size n y x 1 z
Stochastic Learning - example • Suppose n = 6 ; there are 768 expressions of size 6 • e = ITE(x ≤ 0,y,x) is picked with probability 1/768 • The condition x ≤ 0 is mutated to y ≤ 0 with probability 1/6 X 1/48 • Suppose the set of concrete examples is: • {(-1,-4),(-1, 3),(-1, 2),(1,1),(1,2)} • Then Score(e)=exp(-0.5X2), and Score(e’)=exp(-0.5X4) • As e’ is replaced with e with probability exp(-0.5X2) • Note that for e’’ = ITE(x<= y,y,x) we have Score(e’’)=1. • Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) • Set E: All expressions built from x,y,0,1, Comparison, +, If-Then-Else
Benchmarks and Implementation • Prototype implementation of Enumerative/Symbolic/Stochastic CEGIS • Benchmarks: • Bit-manipulation programs from Hacker’s delight • Integer arithmetic: Find max, search in sorted array • Challenge problems such as computing Morton’s number • Multiple variants of each benchmark by varying grammar • Results are not conclusive as implementations are unoptimized, but offers first opportunity to compare solution strategies
Evaluation Summary • Enumerative CEGIS has best performance, and solves many benchmarks within seconds Potential problem: Synthesis of complex constants • Symbolic CEGIS is unable to find answers on most benchmarks Caveat: Sketch succeeds on many of these • Choice of grammar has impact on synthesis time When E is set of all possible expressions, solvers struggle • None of the solvers succeed on some benchmarks Morton constants, Search in integer arrays of size > 4 • Bottomline: Improving solvers is a great opportunity for research !
Plan for SyGuS-Comp • Proposed competition of SyGuS solvers at FLoC, July 2014 • Organizers: Alur, Fisman (Penn) and Singh, Solar-Lezama (MIT) • Website: excape.cis.upenn.edu/Synth-Comp.html • Mailing list: synthlib@cis.upenn.edu • Call for participation: • Join discussion to finalize synth-lib format and competition format • Contribute benchmarks • Build a SyGuS solver