440 likes | 653 Views
Program Analysis and Testing using Satisfiability Modulo Theories Yandex 2 October 2012, Moscow. Nikolaj Bjørner Senior Researcher Microsoft Research. Agenda. Context : Software Engineering Research @ Microsoft Application : Fuzzing and Test C ase G eneration
E N D
Program Analysis and Testing using SatisfiabilityModulo TheoriesYandex2 October 2012, Moscow Nikolaj Bjørner Senior Researcher Microsoft Research
Agenda Context: Software Engineering Research @ Microsoft Application: Fuzzing and Test Case Generation Application: Program Verification & Bit precise Analysis Application: String analysis - Formal Language Theory for Security Technology: Z3 – An Efficient SMT Solver - Basics and Research Propaganda: Software Engineering Research Tools
Context Team An Efficient SMT Solver Leonardo de Moura, Nikolaj Bjørner, Christoph Wintersteiger
Context Research in Software EngineeringImprove Software Development Productivity Group
Context Biology Communication & Collaboration Computational Linguistics Systems and Networking Economics Education Gaming Graphics and Multimedia Theory Hardware and Devices Health and Well-being Human-computer Interaction Information Retrieval & Management Machine Learning Security and Privacy Social Science Software Engineering OrganizationMicrosoft Research
Microsoft Research Labs Context Research :1% R & D ~40000 Company
Application: Fuzzing and Testing Fuzzing and Test Case Generation Dr. Strangelove? Bug: ***433 “2/29/2012 3:41 PM Edited by ***** SubStatus -> Local Fix I think the fuzzers are starting to become sentient. We must crush them before it is too late. In this case, the fuzzer figured out that if [X was between A and B then Y would get set to Z triggering U and V to happen……] ….. And if this fuzzer asks for the nuclear launch codes, don’t tell it what they are …” SAGE Internal. For Security Fuzzing Runs on x86 instructions External. For Developers Runs on .NET code Try it on: http://pex4fun.com Finding security bugs before the hackers black hat
Application: Fuzzing and Testing SAGE by numbers 100s CPU-years - largest dedicated fuzz lab in the world 100s apps - fuzzed using SAGE 100s previously unknown bugs found Billion+ computers updated with bug fixes Millions of $ saved for Users and Microsoft 10s of related tools (incl. Pex), 100s DART citations 3+ Billion constraints - largest usage for any SMT solver Adapted from [Patrice Godefroid, ISSTA 2010]
Application: Fuzzing and Testing Test case generation (y0 > 0) and (m0 = x0 % y0) and not (m0 = 0) and (x1 = y0) and (y1 = m0) and (m1 = x1 % y1) and (m1 = 0) • x0 = 2 • y0 = 4 • m0 = 2 • x1 = 4 • y1 = 2 • m1 = 0 SSA Solver We want a trace where the loop is executed twice. unsigned GCD(x, y) { requires(y > 0); while (true) { unsigned m = x % y; if (m == 0) return y; x = y; y = m; } }
Application: Fuzzing and Testing Test Case Generation Procedure Run Test and Monitor Path Condition Execution Path TestInputs KnownPaths seed New input Constraint System Solve Unexplored path
Application: Scalable bit-precise analysis What is wrong here? -INT_MIN= INT_MIN (INT_MAX+1)/2 +(INT_MAX+1)/2 = INT_MIN void itoa(int n, char* s) { if (n < 0) { *s++ = ‘-’; n = -n; } // Add digits to s …. intbinary_search(int[] arr,intlow, inthigh, int key) while (low <= high) { // Find middle value int mid = (low + high) / 2; intval = arr[mid];if (val == key) return mid;if (val < key) low = mid+1; else high = mid-1; }return -1; } Book: Kernighan and Ritchie Function: itoa (integer to ascii) Package: java.util.Arrays Function: binary_search
Application: Scalable bit-precise analysis Bit-precise analysis + 0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 = Vector Segments Bit-wise operations Concatenation Bit-wise and = 0 1 0 [4:2] = 1 0 1 0 1 1 = Vector Segments Modular arithmetic Extraction Addition
Application: Verification Hypervisor Verification (2007 – 2010) with Hypervisor Hardware Partners: European Microsoft Innovation Center Microsoft Research Microsoft’s Windows Division Universität des Saarlandes co-funded by the German Ministry of Education and Research http://www.verisoftxt.de
Application: Verification Microsoft Verifying C Compiler
Application: Verification SAT/SMT progress driven by applications:VCC Performance Trends Nov 08 – Mar 09 Modification in invariant checking Switch to Z3 v2 Z3 v2 update Attempt to improve Boogie/Z3 interaction Switch to Boogie2
Application: Verification Verification Attempt Time vs.Satisfaction and Productivity By Michal Moskal (VCC Designer and Software Verification Expert), Language quiz: “loose” or “lose” ?
Application: Verification The Importance of Speed
Application: Verification Building Verve Kernel.cs Source file Verification tool Compilation tool C# compiler Verified Nucleus.bpl (x86) Kernel.obj (x86) 9 person-months Boogie/Z3 TAL checker Translator/ Assembler Linker/ISO generator Verve.iso Safe to the Last Instruction / Jean Yang & Chris Hawbliztl PLDI 2010
Application: String Analysis Why string analysis?(motivating scenario) Tomcat v. < 6.0.18 req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/ • security check: reqmust not contain "../" • dir= utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/" Analysis question: Does utf8decode reject overlongutf8-encodings such as "%C0%AE" for '.'? access granted to "../../private/" Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938
Application: String Analysis Relativized Formal Language Theory string transformation SymbolicWord Transducers Classical Word Transducers modulo Th() Classical Word Transducers (e.g. decoding automata, rational transductions) Classical I/O Automata (e.g. Mealy machine) Symbolic Word Acceptors ClassicalWord Acceptors (NFA, DFA) Classical Word Acceptors modulo Th() regex matching
Application: String Analysis Rex & Bek – Symbolic RegEx & Transducers Margus Veanes
Application: String Analysis Symbolic Finite Transducer (SFT) • Classical transducer modulo a rich label theory • Core Idea: represent labels with guarded transformation functions • Separation of concerns: finite graph / theory of labels Concrete transitions: Symbolic transition: guard 1920transitions p p x. 8016≤ x ≤ 7FF16/ [C016|x10,6, 8016|x5,0] … ‘\x7FF’/ “\xDF\xBF” ‘\x80’/“\xC2\x80” bitvector operations q q
Technology SMT: Satisfiability Modulo Theories Solution/Model sat, unsat, Proof Is execution path P feasible? Is assertion X violated? WI TNESS SAGE Is Formula FSatisfiable (over Theory of Reals)?
Technology SMT: Satisfiability Modulo Theories Array Theory Arithmetic Uninterpreted Functions
Job Shop Scheduling Technology Machines Tasks Jobs P = NP? Laundry
Technology Job Shop Scheduling Constraints: Precedence: between two tasks of the same job Resource: Machines execute at most one job at a time 3 1 2 4
Technology Job Shop Scheduling Constraints: Encoding: Precedence: - start time of job 2 on mach 3 - duration of job 2 on mach 3 Resource: 3 1 2 4 Not convex
Job Shop Scheduling Technology
Job Shop Scheduling Technology Efficient solvers: - Floyd-Warshal algorithm - Ford-Fulkerson algorithm case split case split
Technology Microsoft Tools using Z3 is used by many research groups More than 19k downloads Z3 places 1st in most categories in SMT competitions Z3 used to check Azure Firewall Policies HAVOC SAGE SecGuru Vigilante Z3 solved more than 3 billion constraints created by SAGE Checking Win8 and Office. Z3 ships in Windows Server with the Static Driver Verifier
Technology Research Areas Practical problems often have structure that can be exploited. Undecidable (FOL + LIA) Algorithms Decidable Fragments Semi Decidable (FOL) Essentially Uninterpreted Formulas NEXPTIME (EPR) Data structures Heuristics PSPACE (QBF) NP (SAT) Generalized array theory Quantified Bit-Vector Logic Logic is “The Calculus of Computer Science” Zohar Manna
Little Engines of Proof Technology Freely available from http://research.microsoft.com/projects/z3
Research around Z3 Technology Decision Procedures • Modular Difference Logic is Hard TR 08 B, Blass Gurevich, Muthuvathi. • Linear Functional Fixed-points. CAV 09 B. & Hendrix. • A Priori Reductions to Zero for Strategy-Independent Gröbner Bases SYNASC 09 M& Passmore. • Efficient, Generalized Array Decision Procedures FMCAD 09 M & B • Quantifier Elimination as an Abstract Decision Procedure IJCAR 10, B • Cutting to the Chase CADE 11, Jojanovich, M • Polynomials IJCAR 12, Jojanovich, M Combining Decision Procedures • Model-based Theory Combination SMT 07 M & B. . • Proofs, Refutations and Z3 IWIL 08 M & B • On Locally Minimal Nullstellensatz Proofs. SMT 09 M & Passmore. • A Concurrent Portfolio Approach to SMT Solving CAV 09 Wintersteiger, Hamadi & M • Conflict Directed Theory Resolution Cambridge Univ. Press 12, M & B Quantifiers, quantifiers, quantifiers • Efficient E-matching for SMT Solvers. CADE 07 M & B. • Relevancy Propagation. TR 07 M & B. • Deciding Effectively Propositional Logic using DPLL and substitution sets IJCAR 08 M & B. • Engineering DPLL(T) + saturation. IJCAR 08 M & B. • Complete instantiation for quantified SMT formulas CAV 09 Ge & M. • On deciding satisfiability by DPLL(+ T) and unsound theorem proving. CADE 09 Bonachina, M & Lynch. • Generalized PDR SAT 12 Hoder & B.. • . • . • .
Introductory Background Reading September 2011
Technology Mile High: Modern SAT/SMT search Backjump Models literal assignments Proofs Conflict Clauses Conflict Resolution Propagate
Core Engine in Z3: Modern DPLL/CDCL Technology One SAT expert to another: “It took me a year to understand the Mini-SAT FUIP code” Mate Soos to NiklasSörenson over ice-cream at SAT 2012 in Trento Model Proof Conflict Resolution [Nieuwenhuis, Oliveras, Tinelli J.ACM 06] customized
Technology Mile High: Modern SMT procedures Efficiently Backtrack to equi-satisfiable state A way to certify satisfiability Backjump Models values to satisfy formula Learn new fact that prune as many dead branches as possible Proofs Conflict Lemmas Efficient indexing for propagating consequences Conflict Resolution A way to certify unsatisfiability Propagate
Technology Research: Solving Horn Clauses mc(x) = x-10 if x > 100 mc(x) = mc(mc(x+11)) if x 100 assert (x ≤ 101 mc(x) = 91) mc() mc() mc() mc() mc() Solver finds solution for mc KrystofHoder & Nikolaj Bjorner, SAT 2012 Bjorner, McMillan, Rybalchenko, SMT 2012
Technology Research: SolvingR Efficiently A key idea: Use partial solution to guide the search Feasible Region Starting search Partial solution: What is the core? Can we extend it to ? DejanJojanovich & Leonardo de Moura, IJCAR 2012
Propaganda .com
Propaganda Core Expertise Empirical Software Engineering Foundations:Logic Program Analysis: Performance, Reliability, Security Programming Languages Design & Implementation
Propaganda Academic Interns
Summary An outline of – an efficient SMT solver Efficient logic solver for SE tools tackling intractable problems http://research.microsoft.com/projects/z3 Software Engineering Research @ Microsoft http://rise4fun.com Academic internships http://research.microsoft.com/en-us/jobs/intern Contact http://research.microsoft.com/~nbjornernbjorner@microsoft.com