Static and Runtime Verification: A Monte Carlo Approach

Radu Grosu, State University of New York at Stony Brook, grosu@cs.sunysb.edu

Embedded Software Systems • Difficult to develop and maintain: • Concurrent and distributed (OS, ES, middleware), • Complicated by data structures that improve performance (locks, RC, ...), • Mostly written in the C programming language. • Have to be high-confidence: • Provide the critical infrastructure for all applications, • Failures are very costly (business, reputation), • Have to protect against cyber-attacks.

What is High-Confidence? The ability to guarantee that the system software S satisfies an LTL property φ.

Automata-Theoretic Approach • Every LTL formula φ can be translated to a Büchi automaton B_φ such that L(φ) = L(B_φ). • Büchi automaton: NFA over ω-words with acceptance condition: a final state must be visited infinitely often (ω-often). • The state transition graph of S can also be viewed as a Büchi automaton B_S. • Satisfaction reduces to language emptiness: • S ⊨ φ ≡ L(B_S × B_¬φ) = ∅

Checking Non-Emptiness • Lassos in the computation tree (CT) of B, bounded by the recurrence diameter. • Explore all lassos in the CT. • DDFS, SCC: time efficient. • DFS: memory efficient.
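
The exhaustive lasso search can be illustrated with a nested (double) depth-first search. This is only a minimal sketch under assumptions: the automaton is given as a dictionary of successor lists plus a set of accepting states, and the names `find_accepting_lasso`, `succ` and `accepting` are hypothetical, not taken from any tool mentioned in these slides.

```python
def find_accepting_lasso(init, succ, accepting):
    """Return (stem, cycle) for a reachable accepting lasso, or None."""
    visited_outer = set()
    visited_inner = set()  # shared by all inner searches

    def inner(seed, state, path):
        # Second DFS: look for a cycle leading back to the accepting seed.
        for nxt in succ.get(state, []):
            if nxt == seed:
                return path + [nxt]
            if nxt not in visited_inner:
                visited_inner.add(nxt)
                found = inner(seed, nxt, path + [nxt])
                if found:
                    return found
        return None

    def outer(state, stem):
        # First DFS: enumerate reachable states.
        visited_outer.add(state)
        for nxt in succ.get(state, []):
            if nxt not in visited_outer:
                found = outer(nxt, stem + [nxt])
                if found:
                    return found
        # Post-order: only now try to close a cycle through this state.
        if state in accepting:
            cycle = inner(state, state, [state])
            if cycle:
                return stem, cycle
        return None

    return outer(init, [init])


# Hypothetical 4-state automaton; state 3 is accepting and lies on a cycle.
example_succ = {1: [2], 2: [3, 4], 3: [2], 4: [4]}
result = find_accepting_lasso(1, example_succ, {3})
```

The recursion keeps the sketch short; a real checker would use explicit stacks and compact state storage, since the state space is typically far too large for this representation.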

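Checking for High-Confidence (in-principle) [Figure: the Büchi automaton B_S of the system and the LTL property φ enter the Instrumenter (Product), which builds the Büchi automaton B_S × B_¬φ; the Execution Engine explores it and reports either "all lassos non-accepting" or an accepting lasso L.]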

Randomized Algorithms • Huge impact on CS: (distributed) algorithms, complexity theory, cryptography, etc. • The next step the algorithm takes may depend on a random choice (coin flip). • Benefits of randomization include simplicity, efficiency, and symmetry breaking.

Randomized Algorithms • Monte Carlo: may produce an incorrect result, but with bounded error probability. • Example: election result prediction • Las Vegas: always gives the correct result, but its running time is a random variable. • Example: randomized Quicksort

Monte Carlo Approach • Lassos in the computation tree (CT) of B, bounded by the recurrence diameter. • At each step, flip a k-sided coin. • Explore N(ε,δ) independent lassos in the CT. • Error margin ε and confidence ratio δ.

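Lassos Probability Space [Figure: example automaton with states 1 to 4 and its lassos, labeled with probabilities ½, ¼, ⅛ and ⅛; accepting lassos have total probability p_Z = 1/8, non-accepting lassos q_Z = 7/8.]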

Geometric Random Variable • Value of a geometric RV X with parameter p_Z: • Number of independent trials (lassos) until success • Cumulative distribution function: • P[X ≤ N] = 1 - (1 - p_Z)^N
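
The cumulative distribution function can be cross-checked by simulation. The sketch below is illustrative only: the helper names are hypothetical, and p_Z = 1/8 is borrowed from the lasso probability space example.

```python
import random


def geometric_cdf(p, n):
    """P[X <= n] for a geometric RV with success probability p."""
    return 1.0 - (1.0 - p) ** n


def trials_until_success(p, rng):
    """Draw one value of X: number of Bernoulli(p) trials up to the first success."""
    trials = 1
    while rng.random() >= p:  # failure with probability 1 - p
        trials += 1
    return trials


def empirical_cdf(p, n, runs, seed=0):
    """Fraction of simulated runs that succeed within n trials."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(runs) if trials_until_success(p, rng) <= n)
    return hits / runs


exact = geometric_cdf(1 / 8, 10)
estimate = empirical_cdf(1 / 8, 10, runs=20000)
```

With enough runs, `estimate` should land close to `exact`, which is the property the Monte Carlo bound relies on.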

How Many Lassos? • Requiring 1 - (1 - p_Z)^N = 1 - δ yields: N = ln(δ) / ln(1 - p_Z) • Lower bound on the number of trials N needed to achieve success with confidence ratio δ.
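
The intermediate steps of this derivation, followed by a worked instance, are spelled out below. The value p_Z = 1/8 comes from the lasso probability space example; δ = 0.05 is an assumed value chosen purely for illustration.

```latex
\begin{align*}
1 - (1 - p_Z)^N &= 1 - \delta \\
(1 - p_Z)^N &= \delta \\
N \,\ln(1 - p_Z) &= \ln \delta \\
N &= \frac{\ln \delta}{\ln(1 - p_Z)} \\[1ex]
\text{For } p_Z = \tfrac{1}{8},\ \delta = 0.05: \quad
N &= \frac{\ln 0.05}{\ln 0.875} \approx \frac{-2.996}{-0.1335} \approx 22.4
\end{align*}
```

Rounding up, 23 lassos suffice in this instance: (7/8)^23 ≈ 0.046 is below 0.05, whereas (7/8)^22 ≈ 0.053 is not.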

What If p_Z Is Unknown? • Requiring p_Z ≥ ε yields: M = ln(δ) / ln(1 - ε) ≥ N = ln(δ) / ln(1 - p_Z) and therefore P[X ≤ M] ≥ 1 - δ • Lower bound on the number of trials M needed to achieve success with confidence ratio δ and error margin ε.
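
The bound M is easy to evaluate numerically. A minimal sketch, assuming the result is rounded up to an integer number of trials; the function name `num_samples` is hypothetical.

```python
import math


def num_samples(epsilon, delta):
    """Smallest integer M such that (1 - epsilon)**M <= delta."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    # log1p(-epsilon) computes ln(1 - epsilon) accurately for small epsilon.
    return math.ceil(math.log(delta) / math.log1p(-epsilon))


m_small = num_samples(0.125, 0.05)   # error margin equal to p_Z = 1/8
m_tight = num_samples(0.01, 0.01)    # tighter margin, more samples
```

Note that M depends only on ε and δ, not on the size of the automaton, and grows as either parameter shrinks.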

Statistical Hypothesis Testing • Null hypothesis H0: p_Z ≥ ε • Inequality becomes: P[X ≤ M | H0] ≥ 1 - δ • If no success after M trials, i.e., X > M, then reject H0

Monte Carlo Verification (MV)
    input: B = (Σ, Q, Q0, δ, F), ε, δ
    N = ln(δ) / ln(1 - ε)
    for (i = 1; i ≤ N; i++)
        if (RL(B) == 1) return (1, error-trace);
    return (0, "reject H0 with α = Pr[X > N | H0] < δ");
RL(B): performs a uniform random walk through B, storing the states encountered in a hash table, to obtain a random sample (lasso).
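
The MV loop and the random-lasso routine might look as follows in executable form. This is a sketch under stated assumptions, not the jMocha implementation: the automaton is a dictionary of successor lists, every state has at least one successor, N is rounded up, and the names `random_lasso` and `monte_carlo_verify` are hypothetical.

```python
import math
import random


def random_lasso(init, succ, accepting, rng):
    """Random walk until a state repeats; return (is_accepting, trace)."""
    position = {}  # hash table: state -> index on the current walk
    trace = []
    state = init
    while state not in position:
        position[state] = len(trace)
        trace.append(state)
        state = rng.choice(succ[state])  # the k-sided coin flip
    loop_start = position[state]
    # The lasso is accepting if its cycle contains an accepting state.
    is_accepting = any(s in accepting for s in trace[loop_start:])
    return is_accepting, trace + [state]


def monte_carlo_verify(init, succ, accepting, epsilon, delta, seed=None):
    """Return (True, error_trace) or (False, None) after N sampled lassos."""
    rng = random.Random(seed)
    n = math.ceil(math.log(delta) / math.log(1.0 - epsilon))
    for _ in range(n):
        found, trace = random_lasso(init, succ, accepting, rng)
        if found:
            return True, trace  # counterexample: accepting lasso
    return False, None  # reject H0 with significance below delta


found, trace = monte_carlo_verify(0, {0: [1], 1: [0]}, {1}, 0.1, 0.1, seed=1)
```

A `False` answer is a statistical statement, not a proof: it only says that accepting lassos, if any exist, are unlikely to have probability ε or more.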

Model Checking [ISOLA’04, TACAS’05] • Implemented DDFS and MV in the jMocha model checker for synchronous systems specified using Reactive Modules. • Performance and scalability of MV compare very favorably to DDFS.

DPh: Symmetric Unfair Version (Deadlock freedom)

Checking for High-Confidence (in-practice) • Make scalability a priority: • Open-source compiler technology has started to mature, • Apply techniques to source code rather than models, • Models can be obtained by abstraction-refinement techniques, • Probabilistic techniques trade off precision against effort.

GCC Compiler • Early stages: a modest C compiler. • Translation: source code translated directly to RTL. • Optimization: at the low RTL level. • High-level information lost: calls, structures, fields, etc. • Nowadays: a full-blown, multi-language compiler • generating code for more than 30 architectures. • Input: C, C++, Objective-C, Fortran, Java and Ada. • Tree-SSA: added the GENERIC, GIMPLE and SSA ILs. • Optimization: at the GENERIC, GIMPLE, SSA and RTL levels. • Verification: the Tree-SSA API is suitable for verification, too.

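GCC Compilation Process [Figure: C, C++ and Java files (among others) pass through their respective parsers to a parse tree; Genericize produces the GENERIC AST, Gimplify produces the GIMPLE AST, and Build CFG produces the SSA/GIMPLE CFG; the rest of the compilation produces RTL code, and code generation produces object code.]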

GCC Compilation Process [Figure: the same compilation pipeline as on the preceding slide, now with an API plug-in added.]

C Program and its GIMPLE IL
C program:
    int main() {
      int a, b, c;
      a = 5;
      b = a + 10;
      c = a + foo(a, b);
      if (a > c)
        c = b++ / a + b * a;
      bar(a, b, c);
    }
GIMPLE IL (after Gimplify):
    int main() {
      int a, b, c;
      int T1, T2, T3, T4;
      a = 5;
      b = a + 10;
      T1 = foo(a, b);
      T2 = a + T1;
      if (a <= T2) goto fi;
      T3 = b / a;
      T4 = b * a;
      c = T3 + T4;
      b = b + 1;
    fi:
      bar(a, b, c);
    }

Associated GIMPLE CFG [Figure: the FUNCTION DECL node for main with its int variables a, b, c, T1, T2, T3, T4 and three basic blocks between Entry and Exit. Block A: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; if (a > T2) goto B. Block B (true branch): T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1. Block C (false branch and join): bar(a,b,c); return. The expression trees of the assignments, the call and the condition in block A are drawn next to the block.]

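Static Verification of ESS [SOFTMC’05, NGS’06] [Figure: GCC gimplifies the system software S into the CFG B_S; the instrumenter combines it with the LTL property φ into the product CFG B_S × B_¬φ; the static verifier runs Monte Carlo model checking (MC) on top of the GAM.]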

Monte Carlo Algorithm • Input: a set of CFGs. • Main function: a specifically designated CFG. • Random walks in the Büchi automaton: generated on the fly. • Initial state: that of the main routine, plus bookkeeping information. • Next state: choose a process and call GAM on its CFG. • Processes: created by using the fork primitive. • Optimization: GAM returns only upon a context switch. • Lassos: detected by using a hierarchical hash table. • Local variables: removed upon return from a procedure.

GIMPLE Abstract Machine (GAM) • Interprets GIMPLE statements according to their semantics. Of particular interest: • Inter-procedural: call(), return(). Manipulate the frame stack. • Catches and interprets function calls to various modeling and concurrency primitives: • Modeling: toss(), assert(). Nondeterminism and checks. • Processes: fork(), … Manipulate the process list. • Communication: send(), recv(). Manipulate shared variables. May involve a context switch.

Results: TCAS

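Runtime Verification of ESS [MBT’06, NGS’06] [Figure: as in the static case, GCC gimplifies the system software S into the CFG B_S and the instrumenter combines it with the LTL property φ into the product CFG B_S × B_¬φ; the rest of the compilation (Rst-Comp) and the linker then produce the executable, and at run time the dispatcher feeds events to the Monte Carlo (MC) verifier. Other labels recovered from the figure: GAM, static, HWM.]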

Runtime Verification Challenges • Inserting instrumentation code • Verifying states and transitions • Reducing overheads

Inserting Instrumentation Code
    struct inode *my_inode;
    atomic_t *my_atomic;
    my_atomic = &my_inode->i_count;
    /* inserted instrumentation */
    if (instrument)
        log_event(ATOMIC_INC, INODE, my_atomic);
    atomic_inc(my_atomic);

Instrumentation Plug-Ins • Ref-Counts: detects misuse of reference counts • Instruments: inc(rc), dec(rc), • Checks: st-inv (rc ≥ 0), tr-inv (|rc′ - rc| = 1), leak-inv (rc > 0 ~> rc = 0), • Maintains: a list of reference counts and their container types. • Malloc: detects allocation bugs at runtime • Instruments: malloc() and free() function calls, • Checks sequences: free() free(), $ free(), and malloc() $, • Maintains: a list of existing allocations.
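
The state and transition invariants of the Ref-Counts plug-in can be illustrated with a small monitor. This is a hedged sketch, not the plug-in itself: the class `RefCountMonitor`, its `observe` method and the event format are assumptions, and the leak invariant is left out because it needs a notion of time (a timeout).

```python
class RefCountMonitor:
    """Checks st-inv (rc >= 0) and tr-inv (|rc' - rc| == 1) per object."""

    def __init__(self):
        self.counts = {}      # object id -> (container type, last seen count)
        self.violations = []  # (object id, invariant name, offending count)

    def observe(self, obj_id, container, new_count):
        previous = self.counts.get(obj_id)
        # State invariant: a reference count is never negative.
        if new_count < 0:
            self.violations.append((obj_id, "st-inv", new_count))
        # Transition invariant: consecutive values differ by exactly one.
        if previous is not None and abs(new_count - previous[1]) != 1:
            self.violations.append((obj_id, "tr-inv", new_count))
        self.counts[obj_id] = (container, new_count)


monitor = RefCountMonitor()
for value in (1, 2, 4, 3):        # the jump 2 -> 4 breaks tr-inv
    monitor.observe("ino1", "INODE", value)
for value in (0, -1):             # reaching -1 breaks st-inv
    monitor.observe("d1", "DENTRY", value)
```

Keeping the container type next to each count mirrors the "list of reference counts and their container types" that the plug-in maintains.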

RC Runtime Verification • Lasso concept weakened (abstracted): • Execution where: RC varies 0 ↗ … ↘ 0 • State: may include FS caches, HW regs, etc. • Lasso sampling used to reduce overhead: • Check: for acceptance (error) • Dynamically adjust: sampling rate

Sampling Granularity Sample

State and Transition Invariants [Figure: reference-count trace with the violations marked: change > 1, change < 1, value < 0.]

The Leak Invariant [Figure: reference-count trace with two timeouts marked.]

Proof of Concept • Checked Linux file system cache objects • inodes: on-disk files • dentries: name-space nodes • Optionally, log all events • Simple per-category sampling policy • Initially: sample all objects • Hypothesize: error rate ε > 10^-5 and confidence ratio δ = 10^-5 • Stop sampling: if the hypothesis is false.
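
One way to read the per-category sampling policy is sketched below. It is an interpretation under assumptions rather than the actual implementation: the class `CategorySampler` is hypothetical, the sample budget reuses the bound ln(δ) / ln(1 - ε) rounded up, and resetting the counter when an error is seen is an assumed design choice.

```python
import math


class CategorySampler:
    """Samples one object category until the error-rate hypothesis is rejected."""

    def __init__(self, epsilon=1e-5, delta=1e-5):
        # Number of consecutive error-free samples needed to reject H0.
        self.budget = math.ceil(math.log(delta) / math.log1p(-epsilon))
        self.clean = 0
        self.sampling = True  # initially: sample all objects

    def record(self, error_found):
        if not self.sampling:
            return
        if error_found:
            self.clean = 0  # assumption: restart the count after an error
            return
        self.clean += 1
        if self.clean >= self.budget:
            self.sampling = False  # hypothesis rejected: stop sampling


sampler = CategorySampler(epsilon=0.1, delta=0.05)  # small budget for the demo
for _ in range(sampler.budget - 1):
    sampler.record(False)
still_sampling = sampler.sampling
sampler.record(False)
```

With the values on the slide (ε = δ = 10^-5) the budget is roughly 1.15 million error-free samples per category.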

Benchmarks • Directory traversal benchmark • Create a directory tree (depth 5, degree 6) • Traverse the tree • Recursively delete the tree • Also tested GNU tar compilation • Testbed: • 1.7 GHz Pentium 4 (256 KB cache) • 1 GB RAM • Linux 2.6.10

Results [Figure: benchmark results; recovered overhead labels: Logging: ~10x, ~3x, 1.33x.]

Conclusions • GSRV is a novel (and growing) tool suite for randomized static and runtime verification of ESS • General-purpose tools (plug-ins): • Code instrumenter: constructs the product BA • Intra/inter-procedural slicer: in progress • Static verification tools (plug-ins): • GAM: CFG-GIMPLE abstract machine • Monte Carlo MC: statistical algorithm for LTL-MC • Runtime verification tools (static libraries): • Dispatcher: catches and dispatches events to RV • Monte Carlo RV: statistical algorithm for LTL-RV
