Compiler Assisted Software Verification Using Plug-Ins
Radu Grosu, SUNY at Stony Brook
Joint work with S. Callanan, X. Huang, S. A. Smolka and E. Zadok
System-Software
• Difficult to develop and maintain:
  • Concurrent and distributed (OS, ES, middleware),
  • Complicated by data structures that improve performance (locks, reference counts, ...),
  • Mostly written in the C programming language.
• Has to be high-confidence:
  • Provides the critical infrastructure for all applications,
  • Failures are very costly (business, reputation),
  • Has to protect against cyber-attacks.
What is High-Confidence?
Ability to guarantee that system software S satisfies temporal property φ
• Safety: something bad never happens
• Liveness: something good eventually happens
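As a notational illustration, the two property classes are commonly written in LTL with the operators G (always) and F (eventually). The atomic propositions bad and good are placeholders, and the third line is one possible LTL reading of a leads-to requirement on a reference count rc; none of these formulas is taken from the slides.

```latex
\begin{align*}
\text{Safety:}   &\quad \mathbf{G}\,\neg\mathit{bad} \\
\text{Liveness:} &\quad \mathbf{F}\,\mathit{good} \\
\text{Leads-to:} &\quad \mathbf{G}\,\bigl(rc > 0 \rightarrow \mathbf{F}\,(rc = 0)\bigr)
\end{align*}
```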
Checking for High-Confidence (in-principle)
• Every LTL formula $\varphi$ can be translated to a finite-state automaton (FSA) with infinite executions $B_\varphi$ (a looping program) such that $L(\varphi) = L(B_\varphi)$.
• Automata-theoretic approach (infinite behaviors):
  • $S \models \varphi$ iff $L(B_S) \subseteq L(B_\varphi)$ iff $L(B_S \times B_{\neg\varphi}) = \emptyset$
• Checking non-emptiness is equivalent to finding a reachable accepting cycle (a lasso, i.e. a faulty phase).
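The search for a reachable accepting cycle can be sketched with the classic nested depth-first search. This is a minimal illustration, not the tool described in the slides: the `Product` type, the adjacency-matrix encoding, the bound `MAX_STATES` and the convention that state 0 is initial are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_STATES 16   /* assumed bound, for illustration only */

/* Hypothetical explicit-state product automaton; state 0 is initial. */
typedef struct {
    int  n;                              /* number of states           */
    bool edge[MAX_STATES][MAX_STATES];   /* edge[u][v]: transition u→v */
    bool accepting[MAX_STATES];          /* accepting states           */
} Product;

typedef struct {
    const Product *p;
    bool blue[MAX_STATES];   /* visited by the outer search   */
    bool red[MAX_STATES];    /* visited by any inner search   */
    int  seed;               /* accepting state being closed  */
} Search;

/* Inner search: is there a path from u back to the seed state? */
static bool dfs_red(Search *s, int u) {
    s->red[u] = true;
    for (int v = 0; v < s->p->n; v++) {
        if (!s->p->edge[u][v]) continue;
        if (v == s->seed) return true;               /* cycle closed */
        if (!s->red[v] && dfs_red(s, v)) return true;
    }
    return false;
}

/* Outer search: explore reachable states; in post-order, start an
   inner search from every accepting state. */
static bool dfs_blue(Search *s, int u) {
    s->blue[u] = true;
    for (int v = 0; v < s->p->n; v++) {
        if (s->p->edge[u][v] && !s->blue[v] && dfs_blue(s, v)) return true;
    }
    if (s->p->accepting[u]) {
        s->seed = u;
        if (dfs_red(s, u)) return true;
    }
    return false;
}

/* True iff some accepting state is reachable from state 0 and lies on a cycle. */
bool has_accepting_lasso(const Product *p) {
    assert(p->n > 0 && p->n <= MAX_STATES);
    Search s;
    memset(&s, 0, sizeof s);
    s.p = p;
    return dfs_blue(&s, 0);
}
```

A result of `true` means the product language is non-empty, i.e. the property is violated by some lasso-shaped execution.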
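Checking for High-Confidence (in-principle)
[Figure: the system automaton B_S and the automaton B_¬φ obtained from the LTL property (LTL-P) enter the Instrumenter (Product), which builds B_S × B_¬φ; the Execution Engine explores its lassos and reports either "all lassos non-accepting" or an accepting lasso L.]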
Checking for High-Confidence (in-practice)
• Combine static and runtime verification techniques:
  • Abstract interpretation (sequential infinite-state programs),
  • Model checking (concurrent finite-state programs),
  • Runtime analysis (sequential program optimization).
• Make scalability a priority:
  • Open-source compiler technology has started to mature,
  • Apply techniques to source code rather than models,
  • Models can be obtained by abstraction-refinement techniques,
  • Probabilistic techniques trade off precision against effort.
GCC Compiler
• Early stages: a modest C compiler.
  • Translation: source code translated directly to RTL.
  • Optimization: at the low RTL level.
  • High-level information lost: calls, structures, fields, etc.
• Nowadays: a full-blown, multi-language compiler generating code for more than 30 architectures.
  • Input: C, C++, Objective-C, Fortran, Java and Ada.
  • Tree-SSA: added the GENERIC, GIMPLE and SSA ILs.
  • Optimization: at the GENERIC, GIMPLE, SSA and RTL levels.
  • Verification: the Tree-SSA API is suitable for verification, too.
GCC Compilation Process
[Figure: C, C++ and Java files (among others) pass through their language parsers to a parse tree; Genericize produces the GENERIC AST; Gimplify produces the GIMPLE AST; Build CFG produces the SSA/GIMPLE CFG; the rest of compilation produces RTL code; code generation produces object code. A second version of the figure adds a plug-in attached to this pipeline through an API.]
Plug-In Support
• GCC and Builder modified to load plug-ins that:
  • Analyze or modify the GCC representation,
  • Have access to the internal APIs of GCC,
  • Are developed independently from GCC,
  • Require no GCC recompilation.
C Program and its GIMPLE IL

C program:

    int main() {
        int a, b, c;
        a = 5;
        b = a + 10;
        c = a + foo(a, b);
        if (a > c)
            c = b++ / a + b * a;
        bar(a, b, c);
    }

GIMPLE IL (after Gimplify):

    int main() {
        int a, b, c;
        int T1, T2, T3, T4;
        a = 5;
        b = a + 10;
        T1 = foo(a, b);
        T2 = a + T1;
        if (a <= T2) goto fi;   /* skip the then-branch when a > T2 is false */
        T3 = b / a;
        T4 = b * a;
        c = T3 + T4;
        b = b + 1;
    fi: bar(a, b, c);
    }
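Associated GIMPLE CFG
[Figure: control-flow graph of the GIMPLE code. The FUNCTION DECL node lists the int variables a, b, c, T1, T2, T3 and T4. Entry leads to block A (a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; if (a > T2) goto B;). The true edge leads to block B (T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;), the false edge to block C (bar(a,b,c); return;), which leads to Exit. Individual statements are also drawn as expression trees, e.g. the assignments a = 5 and b = a + 10, the call expression T1 = foo(a,b) and the condition a > T2.]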
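Checking for High-Confidence (in-practice)
[Figure: tool flow inside GCC. Gimplify turns the system software S into its CFG (B_S); the Instrument step combines it with the automaton B_¬φ obtained from the LTL property (LTL-P) into the product CFG B_S × B_¬φ, which is passed to the Verifier. The static path uses GAM; the run-time path goes through the rest of compilation (Rst-Comp), the Linker and the Dispatcher. The figure also carries the labels SS and HWM.]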
GSRV Platform
• GSRV suite:
  • Static and runtime verification tools we are developing for GCC.
• General purpose (plug-ins):
  • Verbose-dump: recursively traverses and prints the CFG,
  • Intra/inter-procedural slicer: in progress,
  • Code instrumenter: constructs the product machine.
• Static verification tools (plug-ins):
  • Symbolic (BDD) execution engine: for Boolean C programs,
  • GAM: CFG-GIMPLE abstract machine,
  • Monte Carlo MC: statistical algorithm for LTL model checking.
• Runtime verification tools (static libraries):
  • Dispatcher: catches and dispatches events to RV,
  • Monte Carlo RV: statistical algorithm for LTL runtime verification.
Instrumentation Plug-Ins
• Ref-Counts: detects misuse of reference counts
  • Instruments: inc(rc), dec(rc),
  • Checks: st-inv (rc ≥ 0), tr-inv (|rc′ − rc| = 1), leak-inv (rc > 0 ~> rc = 0),
  • Maintains: a list of reference counts and their container type.
• Malloc: detects allocation bugs at runtime
  • Instruments: malloc() and free() function calls,
  • Checks sequences: free()free(), $free() and malloc()$,
  • Maintains: a list of existing allocations.
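The reference-count checks can be pictured as a small monitor that is fed each new counter value. The sketch below is an assumption-laden illustration, not the plug-in's code: `RcMonitor`, `rc_observe` and `rc_leaked` are hypothetical names, the state invariant is read as rc ≥ 0, and the leads-to (leak) invariant is approximated on a finite trace by checking that the counter is back at zero when the trace ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Verdicts of the hypothetical monitor. */
typedef enum { RC_OK, RC_NEGATIVE, RC_BAD_STEP } RcVerdict;

/* Last observed value of one reference count. */
typedef struct { long rc; } RcMonitor;

/* Assumed to be called by instrumentation after every inc(rc)/dec(rc)
   with the counter's new value. */
RcVerdict rc_observe(RcMonitor *m, long new_rc) {
    assert(m != NULL);
    long step = new_rc - m->rc;
    m->rc = new_rc;
    if (new_rc < 0) return RC_NEGATIVE;                /* st-inv: rc >= 0       */
    if (step != 1 && step != -1) return RC_BAD_STEP;   /* tr-inv: |rc' - rc| = 1 */
    return RC_OK;
}

/* Finite-trace approximation of leak-inv: a positive count at the end
   of the trace never returned to zero. */
bool rc_leaked(const RcMonitor *m) {
    assert(m != NULL);
    return m->rc > 0;
}
```

Keeping one `RcMonitor` per tracked object corresponds to the "list of reference counts" the slide says the plug-in maintains.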
Instrumentation Plug-Ins
• Bounds: checks for invalid memory access
  • Instruments: malloc(), free() and f(a),
  • Checks: accesses to non-allocated areas,
  • Maintains: heap, stack and text allocations,
  • Higher accuracy than ElectricFence-like libraries.
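Monte Carlo Approach
[Figure: the computation tree (CT) for the LTL property, with depth bounded by the recurrence diameter and paths ending in lassos; at each state the successor is chosen by flipping a k-sided coin.]
• Explore N(ε, δ) independent lassos in the CT
• Error margin ε and confidence ratio δ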
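Drawing one random lasso can be sketched as a random walk that stops at the first repeated state and then inspects the cycle it closed. The `Graph` encoding, the bound `MAX_STATES`, the start state 0 and the function names are assumptions for the example, and `rand() % k` stands in for the k-sided coin (it has a slight modulo bias).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_STATES 16   /* assumed bound, for illustration only */

/* Hypothetical explicit product graph; state 0 is initial. */
typedef struct {
    int  n;                             /* number of states            */
    int  nsucc[MAX_STATES];             /* successor count per state   */
    int  succ[MAX_STATES][MAX_STATES];  /* successor lists             */
    bool accepting[MAX_STATES];
} Graph;

/* Random walk from state 0 until a state repeats; returns true iff the
   cycle part of the resulting lasso contains an accepting state. */
bool random_lasso_is_accepting(const Graph *g) {
    assert(g->n > 0 && g->n <= MAX_STATES);
    int pos[MAX_STATES];    /* pos[s]: index of s on the path, or -1 */
    int path[MAX_STATES];
    int len = 0, s = 0;

    for (int i = 0; i < g->n; i++) pos[i] = -1;

    while (pos[s] < 0) {
        pos[s] = len;
        path[len++] = s;
        assert(g->nsucc[s] > 0);                  /* assume no dead ends */
        s = g->succ[s][rand() % g->nsucc[s]];     /* k-sided coin flip   */
    }
    /* The cycle is path[pos[s]] .. path[len - 1]. */
    for (int i = pos[s]; i < len; i++) {
        if (g->accepting[path[i]]) return true;
    }
    return false;
}

/* Draw up to n independent lassos; true as soon as one is accepting. */
bool monte_carlo_finds_error(const Graph *g, long n) {
    for (long i = 0; i < n; i++) {
        if (random_lasso_is_accepting(g)) return true;
    }
    return false;
}
```

Only the current path is stored, which matches the slides' point that lasso sampling reduces space overhead in static verification.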
Taking N(ε, δ) Independent Lassos
(error margin ε and confidence ratio δ)
• Lasso sampling reduces overhead:
  • Static verification: reduces the space overhead,
  • Runtime verification: dynamically adjusts the sampling rate.
• Lasso sampling weakened for RV:
  • Reference counts: from zero up and back to zero.
Geometric Random Variable
• Value of a geometric random variable $X$ with parameter $p$:
  • Number of independent samples until success.
• Probability mass function (with $q = 1 - p$):
  • $p(N) = P[X = N] = q^{N-1} p$
• Cumulative distribution function:
  • $F(N) = P[X \le N] = \sum_{i \le N} p(i) = 1 - q^N = 1 - (1 - p)^N$
How Many Lassos?
• Requiring $1 - (1 - p)^N = 1 - \delta$ yields: $N = \ln(\delta) / \ln(1 - p)$
• Lower bound on the number of trials $N$ needed to achieve success with confidence ratio $\delta$.
What If p Is Unknown?
• Requiring $p \ge \varepsilon$ yields: $M = \ln(\delta) / \ln(1 - \varepsilon) \ge N = \ln(\delta) / \ln(1 - p)$ and therefore $P[X \le M] \ge 1 - \delta$
• Lower bound on the number of trials $M$ needed to achieve success with confidence ratio $\delta$ and error margin $\varepsilon$.
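The bound on the number of samples can be computed directly. The sketch below uses a hypothetical helper `lasso_samples` and finds the smallest integer M with (1 − ε)^M ≤ δ by repeated multiplication, which equals ln(δ) / ln(1 − ε) rounded up (up to floating-point rounding) and needs no math library.

```c
#include <assert.h>

/* Smallest M such that (1 - eps)^M <= delta.
   eps is the error margin, delta the confidence ratio. */
long lasso_samples(double eps, double delta) {
    assert(eps > 0.0 && eps < 1.0);
    assert(delta > 0.0 && delta < 1.0);

    double q = 1.0 - eps;   /* probability that one sample misses */
    double miss = 1.0;      /* q^M: all M samples miss            */
    long m = 0;

    while (miss > delta) {
        miss *= q;
        m++;
    }
    return m;
}
```

For example, with ε = 0.1 and δ = 0.05 the loop stops at M = 29, because 0.9^28 ≈ 0.052 is still above δ while 0.9^29 ≈ 0.047 is not.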
Statistical Hypothesis Testing
• Null hypothesis $H_0$: $p \ge \varepsilon$
• Alternative hypothesis $H_1$: $p < \varepsilon$
• If no success after $M$ trials, then reject $H_0$
  • In RV: adjust the sampling rate.
• Type I error: $\alpha = P[X > M \mid H_0] < \delta$
  • Since: $P[X \le M \mid H_0] \ge 1 - \delta$
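The bound on the type I error follows from the geometric distribution in two steps. The derivation below is a sketch that takes M to be at least ln(δ) / ln(1 − ε); it yields a non-strict bound, which becomes strict whenever p > ε or M is rounded up past the exact quotient.

```latex
\begin{align*}
\alpha = P[X > M \mid H_0] &= (1 - p)^M \\
  &\le (1 - \varepsilon)^M && \text{since } p \ge \varepsilon \text{ under } H_0 \\
  &\le \delta && \text{since } M \ge \ln(\delta) / \ln(1 - \varepsilon)
\end{align*}
```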
Model Checking Results
• TCAS:
  • Safe/best/optimal advisory selection,
  • No/avoid-unnecessary crossing.
• Dining Philosophers:
  • (Un)symmetric and (un)fair versions.
• Needham-Schroeder Protocol:
  • Quite sophisticated C implementation.
Runtime Verification (Reference Counts)
• Check Linux file system cache objects
  • inodes: on-disk files
  • dentries: namespace nodes
• Optionally, log all events
• Simple per-category sampling policy
  • Initially: sample all objects
  • Hypothesize: $\varepsilon > 10^{-5}$ and $\delta = 10^{-5}$
  • Stop sampling: if hypothesis is false.
RV of RC: Results (Logging)
[Chart: values of ~10x, ~3x and 1.33x; the bar labels are not recoverable.]
RV of RC: Results (Checking)
[Chart: values of ~2x, 1.1x and 1.33x; the bar labels are not recoverable.]
Ongoing and Future Work
• Static verification: open-source software MC for GCC
  • Abstraction/refinement/interpolation techniques,
  • Directed MC combined with Monte Carlo MC:
    • Linked GAM with CVS Light.
• Runtime verification: open-source software RV for GCC
  • Develop: new plug-ins and a property (monitoring) language,
  • Explore: novel sampling techniques, e.g. based on phases,
  • Apply: Monte Carlo Decision Processes for optimal sampling.
Ongoing Instrumentation Plug-Ins
• CFG-duplicator: replicates each function's CFG
  • Splits each basic block into two parts:
    • Uninstrumented block: no change (except labels)
    • Instrumented block: instrumentation applied
  • Inserts selectors (if statements) before each pair
  • Block instrumentation can be toggled at run-time
• Multi-core: moves checking code into a separate thread
  • Puts relevant information into a shared buffer
  • Shadow thread reads and parses information
  • Low latency: 65 cycles between cores on 1.65 GHz Power5
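At source level, the effect of the CFG-duplicator on a single basic block might look like the sketch below. Everything here is hypothetical: the flag `instrumentation_on`, the callback `monitor_event` and the one-statement block `x = x + 1` are stand-ins chosen for illustration, and the real plug-in operates on GIMPLE rather than on C source.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical run-time switch read by every selector. */
volatile bool instrumentation_on = false;

/* Hypothetical monitor state: number of events delivered so far. */
long events_seen = 0;

/* Stand-in for the checking code attached to the instrumented copy. */
static void monitor_event(int value) {
    (void)value;
    events_seen++;
}

/* Original block: x = x + 1. After duplication, a selector chooses
   between the instrumented and the uninstrumented copy. */
int step(int x) {
    assert(x < INT_MAX);              /* avoid signed overflow */
    if (instrumentation_on) {         /* selector before the pair */
        monitor_event(x);             /* instrumented copy        */
        x = x + 1;
    } else {
        x = x + 1;                    /* uninstrumented copy      */
    }
    return x;
}
```

Flipping `instrumentation_on` at run time switches monitoring on or off without recompiling, at the cost of one branch per block.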
Future Instrumentation Plug-Ins
• FE-tracer: records function calls and parameters
  • Can be easily applied to both user and kernel code
  • Provides valuable trace information to guide debugging
• DS-access-logger: records what data went where
  • Faster than trap-based methods: no context switches
  • We can exploit type information to provide visual representations of data structures and their links
• Thread-DL-detector: detects circular dependencies
  • Extracts the loop conditions for each loop
  • Finds variables that would be written if the loop exited
  • If two threads are blocking on each other, flags a deadlock