Program Analysis and Testing using Satisfiability Modulo Theories Yandex 2 October 2012, Moscow

Program Analysis and Testing using SatisfiabilityModulo TheoriesYandex2 October 2012, Moscow Nikolaj Bjørner Senior Researcher Microsoft Research

Agenda Context: Software Engineering Research @ Microsoft Application: Fuzzing and Test Case Generation Application: Program Verification & Bit precise Analysis Application: String analysis - Formal Language Theory for Security Technology: Z3 – An Efficient SMT Solver - Basics and Research Propaganda: Software Engineering Research Tools

Context Team An Efficient SMT Solver Leonardo de Moura, Nikolaj Bjørner, Christoph Wintersteiger

Context Research in Software EngineeringImprove Software Development Productivity Group

Context Biology Communication & Collaboration Computational Linguistics Systems and Networking Economics Education Gaming Graphics and Multimedia Theory Hardware and Devices Health and Well-being Human-computer Interaction Information Retrieval & Management Machine Learning Security and Privacy Social Science Software Engineering OrganizationMicrosoft Research

Microsoft Research Labs Context Research :1% R & D ~40000 Company

Application: Fuzzing and Testing Fuzzing and Test Case Generation Dr. Strangelove? Bug: ***433 “2/29/2012 3:41 PM Edited by ***** SubStatus -> Local Fix I think the fuzzers are starting to become sentient. We must crush them before it is too late. In this case, the fuzzer figured out that if [X was between A and B then Y would get set to Z triggering U and V to happen……] ….. And if this fuzzer asks for the nuclear launch codes, don’t tell it what they are …” SAGE Internal. For Security Fuzzing Runs on x86 instructions External. For Developers Runs on .NET code Try it on: http://pex4fun.com Finding security bugs before the hackers black hat

Application: Fuzzing and Testing SAGE by numbers 100s CPU-years - largest dedicated fuzz lab in the world 100s apps - fuzzed using SAGE 100s previously unknown bugs found Billion+ computers updated with bug fixes Millions of $ saved for Users and Microsoft 10s of related tools (incl. Pex), 100s DART citations 3+ Billion constraints - largest usage for any SMT solver Adapted from [Patrice Godefroid, ISSTA 2010]

Application: Fuzzing and Testing Test case generation (y0 > 0) and (m0 = x0 % y0) and not (m0 = 0) and (x1 = y0) and (y1 = m0) and (m1 = x1 % y1) and (m1 = 0) • x0 = 2 • y0 = 4 • m0 = 2 • x1 = 4 • y1 = 2 • m1 = 0 SSA Solver We want a trace where the loop is executed twice. unsigned GCD(x, y) { requires(y > 0); while (true) { unsigned m = x % y; if (m == 0) return y; x = y; y = m; } }

Application: Fuzzing and Testing Test Case Generation Procedure Run Test and Monitor Path Condition Execution Path TestInputs KnownPaths seed New input Constraint System Solve Unexplored path

Application: Scalable bit-precise analysis What is wrong here? -INT_MIN= INT_MIN (INT_MAX+1)/2 +(INT_MAX+1)/2 = INT_MIN void itoa(int n, char* s) { if (n < 0) { *s++ = ‘-’; n = -n; } // Add digits to s …. intbinary_search(int[] arr,intlow, inthigh, int key) while (low <= high) { // Find middle value int mid = (low + high) / 2; intval = arr[mid];if (val == key) return mid;if (val < key) low = mid+1; else high = mid-1; }return -1; } Book: Kernighan and Ritchie Function: itoa (integer to ascii) Package: java.util.Arrays Function: binary_search

Application: Scalable bit-precise analysis Bit-precise analysis  +  0 0 1 1 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 = Vector Segments Bit-wise operations Concatenation Bit-wise and = 0 1 0 [4:2] = 1 0 1 0 1 1 = Vector Segments Modular arithmetic Extraction Addition

Application: Verification Hypervisor Verification (2007 – 2010) with Hypervisor Hardware Partners: European Microsoft Innovation Center Microsoft Research Microsoft’s Windows Division Universität des Saarlandes co-funded by the German Ministry of Education and Research http://www.verisoftxt.de

Application: Verification Microsoft Verifying C Compiler

Application: Verification SAT/SMT progress driven by applications:VCC Performance Trends Nov 08 – Mar 09 Modification in invariant checking Switch to Z3 v2 Z3 v2 update Attempt to improve Boogie/Z3 interaction Switch to Boogie2

Application: Verification Verification Attempt Time vs.Satisfaction and Productivity By Michal Moskal (VCC Designer and Software Verification Expert), Language quiz: “loose” or “lose” ?

Application: Verification The Importance of Speed

Application: Verification Building Verve Kernel.cs Source file Verification tool Compilation tool C# compiler Verified Nucleus.bpl (x86) Kernel.obj (x86) 9 person-months Boogie/Z3 TAL checker Translator/ Assembler Linker/ISO generator Verve.iso Safe to the Last Instruction / Jean Yang & Chris Hawbliztl PLDI 2010

Application: String Analysis Why string analysis?(motivating scenario) Tomcat v. < 6.0.18 req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/ • security check: reqmust not contain "../" • dir= utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/" Analysis question: Does utf8decode reject overlongutf8-encodings such as "%C0%AE" for '.'? access granted to "../../private/" Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

Application: String Analysis Relativized Formal Language Theory string transformation SymbolicWord Transducers  Classical Word Transducers modulo Th() Classical Word Transducers (e.g. decoding automata, rational transductions) Classical I/O Automata (e.g. Mealy machine) Symbolic Word Acceptors ClassicalWord Acceptors (NFA, DFA)  Classical Word Acceptors modulo Th() regex matching

Application: String Analysis Rex & Bek – Symbolic RegEx & Transducers Margus Veanes

Application: String Analysis Symbolic Finite Transducer (SFT) • Classical transducer modulo a rich label theory • Core Idea: represent labels with guarded transformation functions • Separation of concerns: finite graph / theory of labels Concrete transitions: Symbolic transition: guard 1920transitions p p  x. 8016≤ x ≤ 7FF16/ [C016|x10,6, 8016|x5,0] … ‘\x7FF’/ “\xDF\xBF” ‘\x80’/“\xC2\x80” bitvector operations q q

Technology SMT: Satisfiability Modulo Theories Solution/Model sat, unsat, Proof Is execution path P feasible? Is assertion X violated? WI TNESS SAGE Is Formula FSatisfiable (over Theory of Reals)?

Technology SMT: Satisfiability Modulo Theories Array Theory Arithmetic Uninterpreted Functions

Job Shop Scheduling Technology Machines Tasks Jobs P = NP? Laundry

Technology Job Shop Scheduling Constraints: Precedence: between two tasks of the same job Resource: Machines execute at most one job at a time 3 1 2 4

Technology Job Shop Scheduling Constraints: Encoding: Precedence: - start time of job 2 on mach 3 - duration of job 2 on mach 3 Resource: 3 1 2 4 Not convex

Job Shop Scheduling Technology

Job Shop Scheduling Technology Efficient solvers: - Floyd-Warshal algorithm - Ford-Fulkerson algorithm case split case split

Technology Microsoft Tools using Z3 is used by many research groups More than 19k downloads Z3 places 1st in most categories in SMT competitions Z3 used to check Azure Firewall Policies HAVOC SAGE SecGuru Vigilante Z3 solved more than 3 billion constraints created by SAGE Checking Win8 and Office. Z3 ships in Windows Server with the Static Driver Verifier

Technology Research Areas Practical problems often have structure that can be exploited. Undecidable (FOL + LIA) Algorithms Decidable Fragments Semi Decidable (FOL) Essentially Uninterpreted Formulas NEXPTIME (EPR) Data structures Heuristics PSPACE (QBF) NP (SAT) Generalized array theory Quantified Bit-Vector Logic Logic is “The Calculus of Computer Science” Zohar Manna

Little Engines of Proof Technology Freely available from http://research.microsoft.com/projects/z3

Research around Z3 Technology Decision Procedures • Modular Difference Logic is Hard TR 08 B, Blass Gurevich, Muthuvathi. • Linear Functional Fixed-points. CAV 09 B. & Hendrix. • A Priori Reductions to Zero for Strategy-Independent Gröbner Bases SYNASC 09 M& Passmore. • Efficient, Generalized Array Decision Procedures FMCAD 09 M & B • Quantifier Elimination as an Abstract Decision Procedure IJCAR 10, B • Cutting to the Chase CADE 11, Jojanovich, M • Polynomials IJCAR 12, Jojanovich, M Combining Decision Procedures • Model-based Theory Combination SMT 07 M & B. . • Proofs, Refutations and Z3 IWIL 08 M & B • On Locally Minimal Nullstellensatz Proofs. SMT 09 M & Passmore. • A Concurrent Portfolio Approach to SMT Solving CAV 09 Wintersteiger, Hamadi & M • Conflict Directed Theory Resolution Cambridge Univ. Press 12, M & B Quantifiers, quantifiers, quantifiers • Efficient E-matching for SMT Solvers. CADE 07 M & B. • Relevancy Propagation. TR 07 M & B. • Deciding Effectively Propositional Logic using DPLL and substitution sets IJCAR 08 M & B. • Engineering DPLL(T) + saturation. IJCAR 08 M & B. • Complete instantiation for quantified SMT formulas CAV 09 Ge & M. • On deciding satisfiability by DPLL(+ T) and unsound theorem proving. CADE 09 Bonachina, M & Lynch. • Generalized PDR SAT 12 Hoder & B.. • . • . • .

Introductory Background Reading September 2011

Technology Mile High: Modern SAT/SMT search Backjump Models literal assignments Proofs Conflict Clauses Conflict Resolution Propagate

Core Engine in Z3: Modern DPLL/CDCL Technology One SAT expert to another: “It took me a year to understand the Mini-SAT FUIP code” Mate Soos to NiklasSörenson over ice-cream at SAT 2012 in Trento Model Proof Conflict Resolution [Nieuwenhuis, Oliveras, Tinelli J.ACM 06] customized

Technology Mile High: Modern SMT procedures Efficiently Backtrack to equi-satisfiable state A way to certify satisfiability Backjump Models values to satisfy formula Learn new fact that prune as many dead branches as possible Proofs Conflict Lemmas Efficient indexing for propagating consequences Conflict Resolution A way to certify unsatisfiability Propagate

Technology Research: Solving Horn Clauses mc(x) = x-10 if x > 100 mc(x) = mc(mc(x+11)) if x  100 assert (x ≤ 101  mc(x) = 91)  mc()  mc()  mc() mc() mc() Solver finds solution for mc KrystofHoder & Nikolaj Bjorner, SAT 2012 Bjorner, McMillan, Rybalchenko, SMT 2012

Technology Research: SolvingR Efficiently A key idea: Use partial solution to guide the search Feasible Region Starting search Partial solution: What is the core? Can we extend it to ? DejanJojanovich & Leonardo de Moura, IJCAR 2012

Propaganda .com

Propaganda Core Expertise Empirical Software Engineering Foundations:Logic Program Analysis: Performance, Reliability, Security Programming Languages Design & Implementation

http://rise4fun.com/z3py

Propaganda Academic Interns

Summary An outline of – an efficient SMT solver Efficient logic solver for SE tools tackling intractable problems http://research.microsoft.com/projects/z3 Software Engineering Research @ Microsoft http://rise4fun.com Academic internships http://research.microsoft.com/en-us/jobs/intern Contact http://research.microsoft.com/~nbjornernbjorner@microsoft.com

Program Analysis and Testing using Satisfiability Modulo Theories Yandex 2 October 2012, Moscow

Program Analysis and Testing using Satisfiability Modulo Theories Yandex 2 October 2012, Moscow

Presentation Transcript

Satisfiability Modulo Theories: A Calculus of Computation PUC, Rio de Janeiro, 2009

Satisfiability Modulo Theories: An Appetizer SBMF 2009 - Gramado

Satisfiability Modulo Theories solvers in Program Analysis and Verification

Scalable Program Analysis Using Boolean Satisfiability: The Saturn Project

Satisfiability Modulo Theories

Introduction to Satisfiability Modulo Theories (SMT)

Engineering Satisfiability Modulo Theories Solvers for I ntractable P roblems

October 18 2012 Moscow, Russia

Satisfiability modulo theories

Satisfiability Modulo Theories and Network Verification

Symbolic program analysis as Satisfiability Modulo Theories

Program Analysis via Satisfiability Modulo Path Programs

Satisfiability Modulo Theories and Network Verification

October 2, 2012 Percent using proportions

Satisfiability Modulo Theories and Network Verification

DPLL-based Checkers for Satisfiability Modulo Theories

Automating Software Testing Using Program Analysis

DPLL-based Checkers for Satisfiability Modulo Theories

A Progressive Approach for Satisfiability Modulo Theories