Compiler Assisted Software Verification Using Plug-Ins Radu Grosu SUNY at Stony Brook

Joint work with S. Callanan, X. Huang, S. A. Smolka and E. Zadok

System-Software
• Difficult to develop and maintain:
  - Concurrent and distributed (OS, ES, middleware),
  - Complicated by data structures that improve performance (locks, reference counts, ...),
  - Mostly written in the C programming language.
• Has to be high-confidence:
  - Provides the critical infrastructure for all applications,
  - Failures are very costly (business, reputation),
  - Has to protect against cyber-attacks.

What is High-Confidence?
Ability to guarantee that system software S satisfies temporal property φ:
• Safety: something bad never happens,
• Liveness: something good eventually happens.

Checking for High-Confidence (in-principle)
• Every LTL formula $\varphi$ can be translated to a finite-state automaton over infinite executions $B_\varphi$ (a looping program) such that $L(\varphi) = L(B_\varphi)$.
• Automata-theoretic approach (infinite behaviors):
  - $S \models \varphi$ iff $L(B_S) \subseteq L(B_\varphi)$ iff $L(B_S \times B_{\neg\varphi}) = \emptyset$.
• Checking non-emptiness is equivalent to finding a reachable accepting cycle (lasso, faulty PHASE!).

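Checking for High-Confidence (in-principle)
[Diagram: the LTL property $\varphi$ is translated to the Büchi automaton (BA) $B_{\neg\varphi}$; the instrumenter builds the product $B_S \times B_{\neg\varphi}$ from $B_S$ and $B_{\neg\varphi}$; the execution engine explores the product and reports either "all lassos non-accepting" or an accepting lasso L.]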

Checking for High-Confidence (in-practice)
• Combine static and runtime verification techniques:
  - Abstract interpretation (sequential IS programs),
  - Model checking (concurrent FS programs),
  - Runtime analysis (sequential program optimization).
• Make scalability a priority:
  - Open source compiler technology has started to mature,
  - Apply techniques to source code rather than models,
  - Models can be obtained by abstraction-refinement techniques,
  - Probabilistic techniques trade off precision against effort.

GCC Compiler
• Early stages: a modest C compiler.
  - Translation: source code translated directly to RTL.
  - Optimization: at the low RTL level.
  - High-level information lost: calls, structures, fields, etc.
• Nowadays: a full-blown, multi-language compiler generating code for more than 30 architectures.
  - Input: C, C++, Objective-C, Fortran, Java and Ada.
  - Tree-SSA: added the GENERIC, GIMPLE and SSA ILs.
  - Optimization: at the GENERIC, GIMPLE, SSA and RTL levels.
  - Verification: the Tree-SSA API is suitable for verification, too.

GCC Compilation Process
[Diagram: C, C++ and Java files → C, C++ and Java parsers → parse tree → Genericize → GENERIC AST → Gimplify → GIMPLE AST → Build CFG → SSA/GIMPLE CFG → rest of compilation → RTL code → code generation → object code.]

GCC Compilation Process
[Diagram: the GCC compilation pipeline (parse, Genericize, Gimplify, Build CFG, rest of compilation, code generation), extended with a plug-in API.]

Plug-In Support
• GCC and Builder modified to load plug-ins that:
  - Analyze or modify the GCC representation,
  - Have access to the internal APIs of GCC,
  - Are developed independently from GCC,
  - Need no GCC recompilation.

C Program and its GIMPLE IL

C program:

    int main() {
        int a, b, c;
        a = 5;
        b = a + 10;
        c = a + foo(a, b);
        if (a > c)
            c = b++ / a + b * a;
        bar(a, b, c);
    }

GIMPLE IL (after Gimplify):

    int main() {
        int a, b, c;
        int T1, T2, T3, T4;
        a = 5;
        b = a + 10;
        T1 = foo(a, b);
        T2 = a + T1;
        c = T2;
        if (a <= T2) goto fi;   /* skip the body unless a > c */
        T3 = b / a;
        T4 = b * a;
        c = T3 + T4;            /* c = b++/a + b*a */
        b = b + 1;
    fi: bar(a, b, c);
    }

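Associated GIMPLE CFG
[Diagram: the FUNCTION_DECL of main with its int locals a, b, c, T1, T2, T3 and T4, and the CFG from Entry to Exit, with each statement drawn as an expression tree (for example, the call expression foo(a, b) assigned to T1).]
• Block A: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; c = T2; if (a <= T2) goto B;
• Block C (false branch of A): T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;
• Block B (true branch of A, also reached from C): bar(a,b,c); return;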

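Checking for High-Confidence (in-practice)
[Diagram: GCC gimplifies the system software S into the CFG $B_S$; the instrumenter combines it with the LTL property $\varphi$ into the CFG of $B_S \times B_{\neg\varphi}$; the verifier checks this product either statically, on GAM, or at run time, after the rest of compilation and linking with the dispatcher, on the HWM.]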

GSRV Platform
• GSRV suite:
  - Static and runtime verification tools we are developing for GCC.
• General purpose (plug-ins):
  - Verbose-dump: recursively traverses and prints the CFG,
  - Intra/inter-procedural slicer: in progress,
  - Code instrumenter: constructs the product machine.
• Static verification tools (plug-ins):
  - Symbolic (BDD) execution engine: for boolean C programs,
  - GAM: CFG-GIMPLE abstract machine,
  - Monte Carlo MC: statistical algorithm for LTL-MC.
• Runtime verification tools (static libraries):
  - Dispatcher: catches and dispatches events to RV,
  - Monte Carlo RV: statistical algorithm for LTL-RV.

Instrumentation Plug-Ins
• Ref-Counts: detects misuse of reference counts
  - Instruments: inc(rc), dec(rc),
  - Checks: st-inv (rc ≥ 0), tr-inv (|rc′ − rc| = 1), leak-inv (rc > 0 ~> rc = 0),
  - Maintains: a list of reference counts and their container type.
• Malloc: detects allocation bugs at runtime
  - Instruments: malloc() and free() function calls,
  - Checks sequences: free() free(), $ free() and malloc() $ (with $ marking the start or end of the trace),
  - Maintains: a list of existing allocations.
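
The three reference-count checks can be sketched as a small monitor. This is only an illustration under stated assumptions, not the GSRV plug-in's code: the names rc_monitor, rc_monitor_update and rc_monitor_finish are hypothetical, and the liveness property leak-inv is approximated on a finite trace by requiring the count to be zero when the trace ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical monitor state for one reference-counted object. */
typedef struct {
    long rc;        /* last observed reference count */
    bool violated;  /* set once any invariant has failed */
} rc_monitor;

void rc_monitor_init(rc_monitor *m) {
    assert(m != NULL);
    m->rc = 0;
    m->violated = false;
}

/* Called after each instrumented inc(rc)/dec(rc) with the new value.
 * Returns true iff both invariants hold for this step. */
bool rc_monitor_update(rc_monitor *m, long new_rc) {
    bool st_inv = new_rc >= 0;                /* st-inv: rc >= 0        */
    bool tr_inv = labs(new_rc - m->rc) == 1;  /* tr-inv: |rc' - rc| = 1 */
    if (!st_inv || !tr_inv)
        m->violated = true;
    m->rc = new_rc;
    return st_inv && tr_inv;
}

/* Finite-trace approximation of leak-inv (rc > 0 ~> rc = 0):
 * the trace is clean only if the count has returned to zero. */
bool rc_monitor_finish(const rc_monitor *m) {
    return !m->violated && m->rc == 0;
}
```

A trace 1, 2, 1, 0 passes all three checks; a jump from 1 to 3 fails tr-inv, and a trace that ends at 1 fails the leak check.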

Instrumentation Plug-Ins
• Bounds: checks for invalid memory access
  - Instruments: malloc(), free() and f(a),
  - Checks: accesses to non-allocated areas,
  - Maintains: heap, stack and text allocations,
  - Higher accuracy than ElectricFence-like libraries.
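
A minimal sketch of the bookkeeping such a bounds checker needs, assuming a fixed-size table of live regions. The names region_table, region_add, region_remove and region_check are hypothetical, and a real plug-in would also have to track stack and text regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64

/* One live allocation covering [base, base + size). */
typedef struct { uintptr_t base; size_t size; } region;

typedef struct { region live[MAX_REGIONS]; size_t count; } region_table;

/* Record an allocation, as an instrumented malloc() would report it. */
bool region_add(region_table *t, uintptr_t base, size_t size) {
    assert(t != NULL);
    if (t->count == MAX_REGIONS)
        return false;
    t->live[t->count].base = base;
    t->live[t->count].size = size;
    t->count++;
    return true;
}

/* Forget an allocation, as an instrumented free() would report it.
 * Returns false for a double free or a free of an unknown address. */
bool region_remove(region_table *t, uintptr_t base) {
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->live[i].base == base) {
            t->live[i] = t->live[t->count - 1];
            t->count--;
            return true;
        }
    }
    return false;
}

/* True iff the len-byte access at addr lies inside one live region. */
bool region_check(const region_table *t, uintptr_t addr, size_t len) {
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        const region *r = &t->live[i];
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len)
            return true;
    }
    return false;
}
```

Because the table records exact sizes, an access that overruns a 16-byte region by a single byte is rejected.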

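Monte Carlo Approach
[Diagram: the computation tree (CT) for the LTL property, whose branches are lassos bounded by the recurrence diameter; at each state a k-sided coin is flipped to pick the next successor.]
• Explore $N(\varepsilon, \delta)$ independent lassos in the CT,
• Error margin $\varepsilon$ and confidence ratio $\delta$.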

Taking $N(\varepsilon, \delta)$ Independent Lassos (error margin $\varepsilon$ and confidence ratio $\delta$)
• Lasso sampling reduces overhead:
  - Static verification: reduces the space overhead,
  - Runtime verification: dynamically adjusts the sampling rate.
• Lasso sampling weakened for RV:
  - Reference counts: from zero up and back to zero.
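
Drawing a single lasso can be sketched as a random walk that stops at the first repeated state. The explicit-state automaton type and the function name below are assumptions made for illustration; the actual tools work on the GIMPLE CFG, not on an adjacency table, and rand() % k stands in for the k-sided coin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STATES 32
#define MAX_SUCC   4

/* Hypothetical explicit-state automaton; every state needs a successor. */
typedef struct {
    int  num_states;
    int  num_succ[MAX_STATES];
    int  succ[MAX_STATES][MAX_SUCC];
    bool accepting[MAX_STATES];
} automaton;

/* Sample one lasso by a random walk from init.
 * Returns true iff the cycle of the lasso contains an accepting state. */
bool sample_lasso_is_accepting(const automaton *a, int init) {
    int path[MAX_STATES];  /* states in visiting order */
    int pos[MAX_STATES];   /* index of each state on the path, -1 if unvisited */
    int len = 0;

    assert(a != NULL && a->num_states > 0 && a->num_states <= MAX_STATES);
    assert(init >= 0 && init < a->num_states);
    for (int s = 0; s < a->num_states; s++)
        pos[s] = -1;

    int cur = init;
    while (pos[cur] < 0) {
        pos[cur] = len;
        path[len++] = cur;
        assert(a->num_succ[cur] > 0 && a->num_succ[cur] <= MAX_SUCC);
        /* Flip a k-sided coin to choose among the k successors. */
        cur = a->succ[cur][rand() % a->num_succ[cur]];
    }

    /* cur was seen before: the cycle is path[pos[cur]] .. path[len - 1]. */
    for (int i = pos[cur]; i < len; i++)
        if (a->accepting[path[i]])
            return true;
    return false;
}
```

An accepting state on the stem of the lasso does not count; only states on the cycle are visited infinitely often.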

Geometric Random Variable
• Value of a geometric random variable X with parameter p:
  - Number of independent samples until success.
• Probability mass function:
  - $p(N) = P[X = N] = q^{N-1} p$, where $q = 1 - p$
• Cumulative distribution function:
  - $F(N) = P[X \le N] = \sum_{i \le N} p(i) = 1 - q^N = 1 - (1-p)^N$

How Many Lassos?
• Requiring $1 - (1-p)^N = 1 - \delta$ yields: $N = \ln(\delta) / \ln(1-p)$
• Lower bound on the number of trials N needed to achieve success with confidence ratio δ.
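
The step from the requirement to the bound on N, written out, followed by one illustrative pair of values (p = δ = 0.1 is chosen here only as an example):

```latex
\begin{align*}
1 - (1-p)^N &= 1 - \delta \\
(1-p)^N &= \delta \\
N \ln(1-p) &= \ln \delta \\
N &= \frac{\ln \delta}{\ln(1-p)}
\end{align*}
% Example: p = 0.1, \delta = 0.1 gives
% N = \ln(0.1) / \ln(0.9) \approx 21.85, so 22 trials suffice.
```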

What If p Is Unknown?
• Requiring $p \ge \varepsilon$ yields: $M = \ln(\delta) / \ln(1-\varepsilon) \ge N = \ln(\delta) / \ln(1-p)$, and therefore $P[X \le M] \ge 1 - \delta$.
• Lower bound on the number of trials M needed to achieve success with confidence ratio δ and error margin ε.
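
A small sketch of how M could be computed in practice. The function name num_samples is hypothetical. Instead of evaluating the logarithms, it multiplies out (1 − ε)^M until the value drops to δ, which gives the smallest integer M at or above ln(δ)/ln(1 − ε) up to floating-point rounding; for very small ε the closed form would be the faster choice.

```c
#include <assert.h>

/* Smallest integer M with (1 - eps)^M <= delta. */
long num_samples(double eps, double delta) {
    assert(eps > 0.0 && eps < 1.0);
    assert(delta > 0.0 && delta < 1.0);

    long m = 0;
    double miss = 1.0;  /* (1 - eps)^m: chance of m failures when p = eps */
    while (miss > delta) {
        miss *= 1.0 - eps;
        m++;
    }
    return m;
}
```

For example, num_samples(0.1, 0.1) returns 22, since 0.9^21 is still above 0.1 while 0.9^22 is below it.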

Statistical Hypothesis Testing
• Null hypothesis $H_0$: $p \ge \varepsilon$
• Alternative hypothesis $H_1$: $p < \varepsilon$
• If no success after M trials, then reject $H_0$
  - In RV: adjust the sampling rate.
• Type I error: $\alpha = P[X > M \mid H_0] \le \delta$
  - Since: $P[X \le M \mid H_0] \ge 1 - \delta$
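
The test can be sketched as a loop that stops at the first success or after M failures. Everything named here (trial_fn, verdict, monte_carlo_test, succeed_on_kth) is hypothetical; in the model-checking setting one trial would be one sampled lasso, and a success would be an accepting lasso.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One independent sample; returns true on success. */
typedef bool (*trial_fn)(void *ctx);

typedef enum { FOUND_COUNTEREXAMPLE, REJECT_H0 } verdict;

/* Run up to M trials, M being the smallest integer with
 * (1 - eps)^M <= delta. Rejecting H0 (p >= eps) after M failures is
 * wrong with probability at most delta. */
verdict monte_carlo_test(trial_fn trial, void *ctx,
                         double eps, double delta, long *trials_used) {
    assert(trial != NULL);
    assert(eps > 0.0 && eps < 1.0 && delta > 0.0 && delta < 1.0);

    long n = 0;
    double miss = 1.0;  /* (1 - eps)^n after n failed trials */
    while (miss > delta) {
        n++;
        if (trial(ctx)) {
            if (trials_used != NULL)
                *trials_used = n;
            return FOUND_COUNTEREXAMPLE;
        }
        miss *= 1.0 - eps;
    }
    if (trials_used != NULL)
        *trials_used = n;
    return REJECT_H0;
}

/* Toy trial for illustration: succeeds on the k-th call, *ctx holding k. */
bool succeed_on_kth(void *ctx) {
    long *remaining = ctx;
    return --(*remaining) == 0;
}
```

With ε = δ = 0.1 the loop gives up after 22 failed trials, matching M for those values.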

Model Checking Results
• TCAS:
  - Safe/best/optimal advisory selection,
  - No/avoid-unnecessary crossing.
• Dining Philosophers:
  - (Un)Symmetric and (Un)Fair versions.
• Needham-Schroeder Protocol:
  - Quite sophisticated C implementation.

Runtime Verification (Reference Counts)
• Check Linux file system cache objects:
  - inodes: on-disk files,
  - dentries: namespace nodes.
• Optionally, log all events.
• Simple per-category sampling policy:
  - Initially: sample all objects,
  - Hypothesize: $\varepsilon > 10^{-5}$ and $\delta = 10^{-5}$,
  - Stop sampling: if the hypothesis is false.

RV of RC: Results
[Chart: logging results, with labels ~10x, ~3x and 1.33x.]

Results
[Chart: checking results, with labels ~2x, 1.1x and 1.33x.]

Ongoing and Future Work
• Static verification: open source software MC for GCC
  - Abstraction/refinement/interpolation techniques,
  - Directed MC combined with Monte Carlo MC: linked GAM with CVS Light.
• Runtime verification: open source software RV for GCC
  - Develop: new plug-ins and a property (monitoring) language,
  - Explore: novel sampling techniques, e.g. based on phases,
  - Apply: Monte Carlo Decision Processes for optimal sampling.

Ongoing Instrumentation Plug-Ins
• CFG-duplicator: replicates each function's CFG
  - Splits each basic block into two parts:
    - Uninstrumented block: no change (except labels),
    - Instrumented block: instrumentation applied.
  - Inserts selectors (if statements) before each pair,
  - Block instrumentation can be toggled at run-time.
• Multi-core: moves checking code into a separate thread
  - Puts relevant information into a shared buffer,
  - Shadow thread reads and parses information,
  - Low latency: 65 cycles between cores on a 1.65 GHz Power5.
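
What the CFG-duplicator's output might look like at source level, as a hand-written sketch. The plug-in works on the GIMPLE CFG, so this C rendering, the flag instrumentation_enabled and the callback on_event are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Run-time switch read by every selector (hypothetical name). */
bool instrumentation_enabled = false;

/* Stands in for the real checking code: counts observed events. */
long events_seen = 0;

void on_event(int value) {
    (void)value;
    events_seen++;
}

/* A loop body after duplication: a selector picks one of two copies. */
int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++) {
        if (instrumentation_enabled) {
            /* Instrumented copy of the block. */
            total += i;
            on_event(total);
        } else {
            /* Uninstrumented copy: original code, no monitoring cost. */
            total += i;
        }
    }
    return total;
}
```

Both copies compute the same result; flipping the flag only changes whether on_event runs, which is what allows the sampling rate to be adjusted while the program executes.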

Future Instrumentation Plug-Ins
• FE-tracer: records function calls and parameters
  - Can be easily applied to both user and kernel code,
  - Provides valuable trace information to guide debugging.
• DS-access-logger: records what data went where
  - Faster than trap-based methods: no context switches,
  - We can exploit type information to provide visual representations of data structures and their links.
• Thread-DL-detector: detects circular dependencies
  - Extracts the loop conditions for each loop,
  - Finds variables that would be written if the loop exited,
  - If two threads are blocking on each other, flags a deadlock.
