This talk uses active learning to synthesize models of applications that access databases. An existing program serves as the specification and is regenerated for new platforms, which also supports reverse engineering and simplifying legacy code. The technique observes the inputs, outputs, and database traffic of data retrieval applications.
Using Active Learning to Synthesize Models of Applications That Access Databases
Jiasi Shen, Martin Rinard
MIT EECS & CSAIL
PLDI '19
Motivation
• I/O examples often underspecify the program behavior
• I/O examples are not necessarily easier to write than the program
[Diagram: Input/output examples → Synthesizer → several candidate synthesized programs (?)]
Motivation
• Leverage a program as the specification
• Use active learning to select inputs that eliminate uncertainty
[Diagram: the synthesizer (inference and regeneration) chooses inputs for the program (black box), observes its outputs, and produces the regenerated program]
Why synthesize another program?
• Migrate implemented functionality between platforms / languages
• Write seed program, then regenerate for new platforms / languages [Rinard et al., Onward! '18]
• Reverse engineering when source code is unavailable or obfuscated
• Rewrite overly engineered legacy code with simple core functionality
Observe component interactions in addition to final outputs
[Diagram: choose inputs for the program (black box); observe traffic between its components (?) as well as its outputs; inference and regeneration produces the regenerated program]
Data retrieval applications
• Prevalent
• Potentially complex implementation
• Simple core functionality
[Diagram: application (?) sends an SQL query to the DB and receives the retrieved data]
Example: Student registration app
• Input s (ID)
• Input p (Password)
• Database tables: students, teachers, courses, registration
if student s exists: if student s has password p: Retrieve registration records
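A minimal sketch of what such a seed program could look like, written against Python's standard `sqlite3` module. The slides name only the four tables and the student columns `id`, `password`, `firstname`, `lastname`; every other column name, the final registration query, and the function names `make_db` and `registration_app` are assumptions made for illustration.

```python
import sqlite3

def make_db():
    """Create an in-memory database with the four tables named on the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students (id TEXT PRIMARY KEY, password TEXT,
                               firstname TEXT, lastname TEXT);
        CREATE TABLE teachers (id TEXT PRIMARY KEY, name TEXT);       -- hypothetical columns
        CREATE TABLE courses (id TEXT PRIMARY KEY, title TEXT);       -- hypothetical columns
        CREATE TABLE registration (student_id TEXT, course_id TEXT);  -- hypothetical columns
    """)
    return conn

def registration_app(conn, s, p):
    """Return the registration rows of student s if password p matches."""
    if not conn.execute("SELECT * FROM students WHERE id = ?", (s,)).fetchall():
        return []  # student s does not exist
    if not conn.execute(
        "SELECT * FROM students WHERE id = ? AND password = ?", (s, p)
    ).fetchall():
        return []  # student s exists but the password is wrong
    # Hypothetical third query: the slide only says "retrieve registration records".
    return conn.execute(
        "SELECT * FROM registration WHERE student_id = ?", (s,)
    ).fetchall()
```

Each branch of the program is decided by whether a query returned rows, which is the property the later slides rely on.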
Observations of data retrieval apps
• Data flow often manifests as SQL queries
• Control flow largely depends on query results
• Observe database queries during program execution
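One way to picture "observing database queries" is a thin recording layer between the application and the database. The sketch below is an illustration only, not the mechanism the authors use: `RecordingConnection` and `login_check` are hypothetical names, and the application is assumed to issue its queries through the wrapper.

```python
import sqlite3

class RecordingConnection:
    """Wraps a sqlite3 connection and logs every query with its row count."""

    def __init__(self, conn):
        self._conn = conn
        self.trace = []  # list of (sql, params, number_of_rows)

    def execute(self, sql, params=()):
        rows = self._conn.execute(sql, params).fetchall()
        self.trace.append((sql, tuple(params), len(rows)))
        return rows

def login_check(db, s, p):
    """Hypothetical black-box app: the second lookup runs only if the first finds a row."""
    if db.execute("SELECT * FROM student WHERE id = ?", (s,)):
        db.execute("SELECT * FROM student WHERE id = ? AND password = ?", (s, p))

raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE student (id TEXT, password TEXT)")
raw.execute("INSERT INTO student VALUES ('0', '2')")

db = RecordingConnection(raw)
login_check(db, "0", "1")
for sql, params, n in db.trace:
    print(n, "rows:", sql, params)
```

The recorded sequence of queries and row counts is the raw material from which both data flow and control flow are later inferred.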
Konure
[Diagram: application (?), Konure, DB]
Conure image: https://www.petco.com/shop/en/petcostore/product/bird/live-birds/sun-conure
Konure overview
• Given: input parameter format, database schema
• Choose inputs and choose DB values
• Observe outputs and observe DB traffic
• Produce the regenerated program
Infeasible
if x == 23076821 then A else B
Degenerate solution
if x == i1 then o1 else if x == i2 then o2 else if x == i3 then o3 ...
Use DSL to precisely capture programs that can be inferred
• Rule out uninferable programs
• Rule out degenerate solutions
• Design DSL and inference algorithm together
• Restrictive: if a program is expressible in the DSL, correct inference is guaranteed
• Expressive: the DSL supports applications of practical interest (data retrieval apps)
Each statement performs a query
y ← select from (joined) tables the rows that satisfy an expression
• Retrieve data, store data in y, reference y later
• Expressions
  • Reference retrieved data: Col = y.Col
  • Reference input parameter: Col = x
  • Compare columns: Col = Col
  • Conjunctions: Expr ∧ Expr
Control flow directly tied to query results
if y ← select … then {code if y is nonempty} else {code if y is empty}
for y ← select … do {code for each row in y} else {code if y is empty}
• Observe control flow by observing DB traffic
• Force execution down a path by populating DB with chosen values
Dependency complications
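The four statement forms can be sketched as a small abstract syntax tree with an interpreter that records which queries run. This is a simplification for illustration: queries are reduced to opaque names such as "Q1", variable binding (y) is omitted, and the class names `Eps`, `Seq`, `If`, `For` and the function `run` are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Eps:   # empty program
    pass

@dataclass
class Seq:   # y <- Q, then the rest of the program
    query: str
    rest: "Prog"

@dataclass
class If:    # if y <- Q then ... else ...
    query: str
    then: "Prog"
    orelse: "Prog"

@dataclass
class For:   # for y <- Q do ... else ...
    query: str
    body: "Prog"
    orelse: "Prog"

Prog = Union[Eps, Seq, If, For]

def run(prog, rows_for, trace):
    """Execute prog; rows_for(query_name) returns the rows that query retrieves."""
    if isinstance(prog, Eps):
        return
    trace.append(prog.query)
    rows = rows_for(prog.query)
    if isinstance(prog, Seq):
        run(prog.rest, rows_for, trace)
    elif isinstance(prog, If):
        run(prog.then if rows else prog.orelse, rows_for, trace)
    else:  # For
        if not rows:
            run(prog.orelse, rows_for, trace)
        for _ in rows:
            run(prog.body, rows_for, trace)

prog = If("Q1", If("Q2", For("Q3", Seq("Q4", Eps()), Eps()), Eps()), Eps())
trace = []
run(prog, {"Q1": [1], "Q2": [1], "Q3": [1, 2], "Q4": []}.get, trace)
print(trace)
```

Because every branch is chosen by a query result, the trace alone reveals which path was taken, and changing the rows each query returns steers execution down a different path.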
Konure inference algorithm
Two aspects to infer from program executions
• Concrete SQL query with concrete values ⇢ Abstract query template with variable references
• Unstructured sequence of queries ⇢ Structured control flow of the program
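The first aspect can be illustrated with a small string rewrite: literals in an observed query that equal an input value are replaced by the parameter name. This is a sketch under strong assumptions (quoted literals only, pairwise distinct input values, no references to previously retrieved rows); the function name `abstract_query` is hypothetical.

```python
import re

def abstract_query(sql, inputs):
    """Replace quoted literals that equal an input value with the parameter name."""
    by_value = {}
    for name, value in inputs.items():
        if value in by_value:
            # Two inputs with the same value cannot be told apart in the query.
            raise ValueError("ambiguous: two inputs share the value %r" % value)
        by_value[value] = name

    def swap(match):
        literal = match.group(1)
        # Leave literals that do not correspond to any input untouched.
        return by_value.get(literal, match.group(0))

    return re.sub(r"'([^']*)'", swap, sql)

print(abstract_query(
    "SELECT * FROM student WHERE id = '0' AND password = '1'",
    {"s": "0", "p": "1"},
))
```

The `ValueError` branch hints at why the chosen inputs and database values should be distinct: if two sources share a value, the origin of a literal in the query is ambiguous.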
Represent hypothesis in DSL sentential form
Examples:
Prog
y1 ← Q1 y2 ← Q2 y3 ← Q3 Prog
for y1 ← Q1 do { if y2 ← Q2 then 𝜖 else Prog } else Prog
Resolve each Prog nonterminal by applying the appropriate production
Execution E0
• Initial hypothesis: Prog
• Inputs: s = 0, p = 1
• DB contents: students, teachers, courses, registration: all Empty
• Observed query: SELECT * FROM student WHERE id = '0'
• Result: Empty (0 rows)
• Q1: select student.* where id = s
Candidate productions for Prog, given Q1: select student.* where id = s
• Prog := 𝜖? 𝜖
• Prog := Seq? y ← Q1 Prog
• Prog := If? if y ← Q1 then Prog else Prog
• Prog := For? for y ← Q1 do Prog else Prog
Executions: E0 (Q1 gets 0 rows), E1 (Q1 gets 1+ rows), E2 (Q1 gets 2+ rows)
Can we make Q1 retrieve rows?
Ask for three executions to resolve Prog
• Prog := Seq: Execution E0 (0 rows): Q1 Q2; Execution E1 (1+ rows): Q1 Q2
• Prog := If: Execution E0 (0 rows): Q1 Q2; Execution E1 (1+ rows): Q1 Q3
• Prog := For: Execution E2 (N ≥ 2 rows): Q1 [rep1] ... [repN]
DSL restrictions
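The case analysis on this slide can be approximated by comparing the queries that follow Q1 in each execution. The sketch below is a deliberately simplified reading, not the paper's algorithm: it assumes Q1 was observed (so the 𝜖 case is excluded), that E1 retrieves exactly one row and E2 exactly two, and that a loop body issues the same queries on every iteration. The function name `resolve` is hypothetical.

```python
def resolve(e0, e1, e2):
    """Pick a production for Prog from the queries observed after Q1.

    e0, e1, e2 are lists of query names seen after Q1 when Q1 retrieved
    0 rows, 1 row, and 2 rows; None means no such execution exists
    (for example, the path constraints were unsatisfiable).
    """
    if e1 and e2 is not None and e2 == e1 + e1:
        return "For"  # twice the rows produced twice the repetitions
    if e0 == e1:
        return "Seq"  # what follows does not depend on Q1's result
    return "If"       # what follows differs between empty and nonempty

# Student example: nothing follows Q1 when it is empty, Q2 follows when it
# retrieves a row, and no execution with two rows exists.
print(resolve([], ["Q2"], None))
```

The three branches mirror the three columns on the slide: identical continuations indicate a sequence, differing continuations indicate a conditional, and repetition proportional to the row count indicates a loop.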
Status of the three executions for Q1: select student.* where id = s
• E0 (Q1 gets 0 rows): Previous execution
• E2 (Q1 gets 2+ rows): Unsat
• E1 (Q1 gets 1+ rows): Next execution…
Execution E1
• Inputs: s = 0, p = 1
• DB contents: students: id = 0, password = 2, firstname = 3, lastname = 4; teachers, courses, registration: Empty
• Observed query: SELECT * FROM student WHERE id = '0'
• Result: student: id = 0, password = 2, firstname = 3, … (1 row)
• Q1: select student.* where id = s
Execution E1 (continued)
• Observed query: SELECT * FROM student WHERE id = '0' AND password = '1'
• Result: Empty (0 rows)
• Q1: select student.* where id = s (1 row)
• Q2: select student.* where id = s ∧ password = p (0 rows)
Resolving Prog from the executions
• Execution E0: Q1: select student.* where id = s (0 rows)
• Execution E1: Q1 (1 row), then Q2: select student.* where id = s ∧ password = p (0 rows)
• Execution E2: Does not exist (Unsat)
• Prog := If: Prog ⇒ if y1 ← Q1 then Prog else Prog, that is, if student s exists then (E1) else (E0)
Resolve subprograms recursively (no backtracking)
• Prog := If: if y1 ← Q1 then Prog else Prog ⇒ if y1 ← Q1 then { if y2 ← Q2 then Prog else Prog } else Prog
• Prog := For: ⇒ if y1 ← Q1 then { if y2 ← Q2 then { for y3 ← Q3 do Prog else Prog } else Prog } else Prog
• Prog := Seq: ⇒ if y1 ← Q1 then { if y2 ← Q2 then { for y3 ← Q3 do { y4 ← Q4 Prog } else Prog } else Prog } else Prog
Choosing inputs
• Encode paths as quantifier-free SMT formulas
• Solve for inputs and DB values to force execution down this path
• Complications:
  • Dependency
  • Ambiguity
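To make the idea of solving for inputs and DB values concrete without depending on a particular solver, the sketch below swaps the SMT solver for brute-force enumeration over a tiny integer domain. It encodes the path "Q1 retrieves a row, Q2 retrieves none" for a single candidate student row. The names `solve`, `q1_hit_q2_miss`, `row_id`, and `row_password` are hypothetical, and a real encoding would cover whole tables rather than one row.

```python
from itertools import product

def solve(path_constraint, names, domain=range(4)):
    """Return the first assignment over `domain` that satisfies the constraint."""
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if path_constraint(env):
            return env
    return None  # unsatisfiable within the domain

def q1_hit_q2_miss(v):
    """Path: Q1 (id = s) retrieves the row, Q2 (id = s and password = p) does not."""
    q1 = v["row_id"] == v["s"]
    q2 = v["row_id"] == v["s"] and v["row_password"] == v["p"]
    return q1 and not q2

assignment = solve(q1_hit_q2_miss, ["s", "p", "row_id", "row_password"])
print(assignment)
```

A result of `None` plays the role of the "Unsat" outcome in the walkthrough: no choice of inputs and database contents can drive the program down that path.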