Static and Runtime Verification: A Monte Carlo Approach

Radu Grosu, State University of New York at Stony Brook, grosu@cs.sunysb.edu

Talk Outline • Embedded Software Systems • Automata-Theoretic Verification • Monte Carlo Verification • Monte Carlo Model Checking • Static Verification of Software Systems • Dynamic Verification of Software Systems

Embedded Software Systems • Systems with ongoing interaction with their environment. • Termination is an error rather than expected behavior. • Becoming an integral part of nearly every engineered product. • They control:

Embedded Systems • Commercial aircraft • Telecommunication • Household devices • Automobiles • Nuclear power plants • Medical devices

Boeing 777: Super Computers with Wings • Has > 4M lines of code and > 1K embedded processors in order to: - control subsystems, - aid pilots in flight management. • A great challenge of software engineering: • hard real-time deadlines, • mission- and safety-critical, • complex and embedded within another complex system, • interacts with humans in a sophisticated way.

Embedded Software Systems • Difficult to develop & maintain: • Concurrent and distributed (OS, ES, middleware), • Complicated by DS improving performance (locks, RC, ...), • Mostly written in the C programming language. • Have to be high-confidence: • Provide the critical infrastructure for all applications, • Failures are very costly (business, reputation), • Have to protect against cyber-attacks.

Temporal Properties • Safety (something bad never happens): • Airborne planes are at least 1 mile apart • Nuclear reactor core never overheats • Gamma knife never exceeds prescribed dose • Liveness (something good eventually happens): • Core eventually reaches nominal temperature • Dishwasher tank is eventually full • Airbag inflates within 5ms of collision

Linear Temporal Logic • An LTL formula is made up of atomic propositions p, boolean connectives ¬, ∧, ∨, and temporal modalities X (neXt) and U (Until). • Safety: “nothing bad ever happens” • E.g. G(¬(pc1 = cs ∧ pc2 = cs)), where G is a derived modality (Globally). • Liveness: “something good eventually happens” • E.g. G(req → F serviced), where F is a derived modality (Finally).

LTL Semantics • Semantics given in terms of the inductively defined entailment relation σ ⊨ φ. • σ is an infinite word (execution) over the power set of the set of atomic propositions. • φ is an LTL formula.
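The inductive definition referred to here is usually written as below. This is the standard textbook formulation rather than a transcription of the slide; the notation is assumed: σ = σ_0 σ_1 … is an infinite word over the power set of the atomic propositions, and σ^i is its suffix starting at position i.

```latex
\begin{align*}
\sigma \models p &\iff p \in \sigma_0 \\
\sigma \models \neg\varphi &\iff \sigma \not\models \varphi \\
\sigma \models \varphi \lor \psi &\iff \sigma \models \varphi \ \text{or}\ \sigma \models \psi \\
\sigma \models X\,\varphi &\iff \sigma^{1} \models \varphi \\
\sigma \models \varphi\,U\,\psi &\iff \exists i \ge 0.\ \sigma^{i} \models \psi
    \ \text{and}\ \forall j < i.\ \sigma^{j} \models \varphi \\
F\,\varphi &\equiv \mathit{true}\;U\;\varphi, \qquad G\,\varphi \equiv \neg F\,\neg\varphi
\end{align*}
```

The last line gives the derived modalities F (Finally) and G (Globally) in terms of U.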

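LTL Semantics (timeline diagrams) • X p: p holds in the next state. • p U q: p holds in every state until a state in which q holds. • F p: p holds in some state, now or later. • G p: p holds in every state.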
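The timelines can be made concrete by evaluating formulas on a lasso-shaped word, that is, a finite prefix followed by a cycle repeated forever. The following is a minimal sketch, not part of the talk: the tuple encoding of formulas and the function name `holds` are assumptions made for illustration, and the cycle is assumed to be non-empty.

```python
def holds(formula, prefix, cycle):
    """Return True if the word prefix . cycle^omega satisfies the formula.

    prefix and cycle are lists of sets of atomic propositions (cycle non-empty).
    Formulas are nested tuples, e.g. ("U", ("ap", "p"), ("ap", "q")).
    """
    word = list(prefix) + list(cycle)
    n = len(word)
    loop_start = len(prefix)

    def succ(i):
        # Position after i; the last position loops back to the cycle start.
        return i + 1 if i + 1 < n else loop_start

    def sat(f):
        # Returns, for every position, whether the suffix there satisfies f.
        op = f[0]
        if op == "true":
            return [True] * n
        if op == "ap":
            return [f[1] in letter for letter in word]
        if op == "not":
            return [not v for v in sat(f[1])]
        if op == "or":
            a, b = sat(f[1]), sat(f[2])
            return [x or y for x, y in zip(a, b)]
        if op == "and":
            a, b = sat(f[1]), sat(f[2])
            return [x and y for x, y in zip(a, b)]
        if op == "X":
            a = sat(f[1])
            return [a[succ(i)] for i in range(n)]
        if op == "U":
            # Least fixpoint of Z = b or (a and X Z).
            a, b = sat(f[1]), sat(f[2])
            result = list(b)
            changed = True
            while changed:
                changed = False
                for i in range(n):
                    if not result[i] and a[i] and result[succ(i)]:
                        result[i] = True
                        changed = True
            return result
        if op == "F":
            return sat(("U", ("true",), f[1]))
        if op == "G":
            return sat(("not", ("F", ("not", f[1]))))
        raise ValueError(f"unknown operator: {op}")

    return sat(formula)[0]


p, q = ("ap", "p"), ("ap", "q")
# The word p p q q q ... satisfies p U q but not G p.
until_holds = holds(("U", p, q), [{"p"}, {"p"}], [{"q"}])
globally_holds = holds(("G", p), [{"p"}, {"p"}], [{"q"}])
```

Because a lasso word has finitely many distinct positions, the Until case can be computed by simple fixpoint iteration.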

What is High-Confidence? • Ability to guarantee that system software S satisfies LTL property φ, i.e., S ⊨ φ.

Checking if S ⊨ φ • Statically (at compile time): • Abstract interpretation (sequential IS programs), • Model checking (concurrent FS programs). • Dynamically (at run time): • Runtime analysis (sequential program optimization). • Basic Idea: • Intelligently explore S’s state space in an attempt to establish that S ⊨ φ.

Automata-Theoretic Approach • Büchi automaton: NFA over ω-words with acceptance condition: a final state must be visited infinitely often. • Every LTL formula φ can be translated to a Büchi automaton B_φ such that L(φ) = L(B_φ). • The state transition graph of S can also be viewed as a Büchi automaton.

Automata-Theoretic Approach • Satisfaction reduced to language emptiness: • S ⊨ φ ≅ L(B_S) ⊆ L(B_φ) ≅ L(B_S) ∩ (Σ^ω ∖ L(B_φ)) = ∅ • ≅ L(B_S) ∩ L(B_¬φ) = ∅ ≅ L(B_S × B_¬φ) = ∅

Büchi Automata • Finite automata over infinite words. • [Figure: two example automata A and B over the letters a and b, each with states 1 and 2; L(A) = {ab^ω}, L(B) = ∅.] • Checking non-emptiness is equivalent to finding a reachable accepting cycle (lasso).

Checking Non-Emptiness • [Figure: lassos in the computation tree (CT) of B; the depth of the tree is the recurrence diameter.] • Explore all lassos in the CT. • DDFS, SCC: time efficient. • DFS: memory efficient.
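As a point of comparison for the randomized method that follows, an exhaustive search for an accepting lasso can be sketched as a nested (double) depth-first search. This is a minimal illustration, assuming DDFS denotes the classical nested DFS; the function name, the dictionary encoding of the automaton, and the four-state example are hypothetical.

```python
def find_accepting_lasso(succ, init, accepting):
    """Return (prefix, cycle) for a reachable accepting cycle, or None.

    succ maps each state to a list of successor states.
    """
    visited_outer, visited_inner = set(), set()
    stack = []  # current path of the outer search, from init

    def inner(state, seed, path):
        # Second search: look for a path from the accepting seed back to itself.
        visited_inner.add(state)
        for nxt in succ.get(state, []):
            if nxt == seed:
                return path + [nxt]
            if nxt not in visited_inner:
                found = inner(nxt, seed, path + [nxt])
                if found:
                    return found
        return None

    def outer(state):
        visited_outer.add(state)
        stack.append(state)
        for nxt in succ.get(state, []):
            if nxt not in visited_outer:
                found = outer(nxt)
                if found:
                    return found
        # Start the inner search in postorder, only from accepting states.
        if state in accepting:
            cycle = inner(state, state, [state])
            if cycle:
                return list(stack), cycle
        stack.pop()
        return None

    return outer(init)


# Hypothetical example: state 2 is accepting and lies on the cycle 2 -> 3 -> 1 -> 2.
example = {1: [1, 2], 2: [3, 4], 3: [1, 4], 4: [4]}
lasso = find_accepting_lasso(example, 1, {2})
```

The search visits every reachable state in the worst case, which is the cost the Monte Carlo approach tries to avoid. The recursion is kept for brevity; a large state space would need an explicit stack.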

Randomized Algorithms • Huge impact on CS: (distributed) algorithms, complexity theory, cryptography, etc. • The next step the algorithm takes may depend on a random choice (coin flip). • Benefits of randomization include simplicity, efficiency, and symmetry breaking.

Randomized Algorithms • Monte Carlo: may produce an incorrect result, but with bounded error probability. • Example: predicting election results • Las Vegas: always gives the correct result, but the running time is a random variable. • Example: randomized quicksort

Monte Carlo Approach • [Figure: lassos in the computation tree (CT) of B, whose depth is the recurrence diameter; each step of a walk flips a k-sided coin.] • Explore N(ε, δ) independent lassos in the CT. • Error margin ε and confidence ratio δ.

Lassos Probability Space • Sample space: lassos in B_S × B_¬φ • Bernoulli random variable Z (coin flip): • Outcome = 1 if the randomly chosen lasso is accepting • Outcome = 0 otherwise • p_Z = ∑ p_i Z_i (expectation of an accepting lasso), where p_i is the probability of lasso i under a uniform random walk and Z_i is the outcome for lasso i.

Example: Lassos Probability Space • [Figure: a four-state automaton and its lassos, with probabilities ½, ¼, ⅛, ⅛.] • p_Z = 1/8, q_Z = 7/8
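The numbers in this example can be reproduced by enumerating every lasso of a uniform random walk together with its probability. The sketch below is illustrative only: the transition structure `succ` and the accepting set `{2}` are a hypothetical four-state automaton chosen so that the lasso probabilities are ½, ¼, ⅛, ⅛, and the function names are assumptions. Every state is assumed to have at least one successor.

```python
from fractions import Fraction


def lasso_probabilities(succ, init):
    """Enumerate all lassos from init with their uniform-random-walk probability."""
    lassos = []

    def walk(path, prob):
        choices = succ[path[-1]]
        for nxt in choices:
            p = prob / len(choices)  # each successor is equally likely
            if nxt in path:
                lassos.append((path + [nxt], p))  # a state repeats: lasso closed
            else:
                walk(path + [nxt], p)

    walk([init], Fraction(1))
    return lassos


def is_accepting(lasso, accepting):
    """A lasso is accepting if an accepting state lies on its cycle."""
    cycle_start = lasso.index(lasso[-1])
    return any(state in accepting for state in lasso[cycle_start:])


# Hypothetical automaton: lassos 1-1, 1-2-4-4, 1-2-3-1, 1-2-3-4-4.
succ = {1: [1, 2], 2: [3, 4], 3: [1, 4], 4: [4]}
all_lassos = lasso_probabilities(succ, 1)
p_z = sum(p for lasso, p in all_lassos if is_accepting(lasso, {2}))
q_z = 1 - p_z
```

With these assumptions only the lasso 1, 2, 3, 1 is accepting, so `p_z` is 1/8 and `q_z` is 7/8.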

Geometric Random Variable • Value of geometric RV X with parameter p_Z: • Number of independent trials (lassos) until success • Probability mass function: • p(N) = P[X = N] = q_Z^(N−1) · p_Z • Cumulative distribution function: • F(N) = P[X ≤ N] = ∑_{i ≤ N} p(i) = 1 − q_Z^N

How Many Lassos? • Requiring 1 − q_Z^N = 1 − δ yields: N = ln(δ) / ln(1 − p_Z) • Lower bound on the number of trials N needed to achieve success with confidence ratio δ.
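The step from the requirement to the formula for N is a one-line logarithm manipulation, spelled out here for completeness (q_Z = 1 − p_Z as in the example above):

```latex
\begin{align*}
1 - q_Z^{N} = 1 - \delta
  &\iff q_Z^{N} = \delta \\
  &\iff N \ln q_Z = \ln \delta \\
  &\iff N = \frac{\ln \delta}{\ln (1 - p_Z)}
\end{align*}
```

For instance, with p_Z = 1/8 and δ = 0.1, N = ln(0.1) / ln(0.875) ≈ 17.2, so 18 sampled lassos suffice.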

What If p_Z Is Unknown? • Requiring p_Z ≥ ε yields: M = ln(δ) / ln(1 − ε) ≥ N = ln(δ) / ln(1 − p_Z), and therefore P[X ≤ M] ≥ 1 − δ • Lower bound on the number of trials M needed to achieve success with confidence ratio δ and error margin ε.
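The bound translates directly into a sample-size calculation. The helper below is a small sketch; the name `num_samples` and the choice to round up to the next integer are assumptions made here, not taken from the talk.

```python
import math


def num_samples(epsilon, delta):
    """Smallest integer M with (1 - epsilon)**M <= delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1 - epsilon))


# With error margin 1/8 and confidence ratio 0.1, 18 lassos are enough.
m = num_samples(0.125, 0.1)
```

Note that the sample size depends only on ε and δ, not on the size of the automaton.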

Statistical Hypothesis Testing • Null hypothesis H0: p_Z ≥ ε • Inequality becomes: P[X ≤ M | H0] ≥ 1 − δ • If no success after M trials, i.e., X > M, then reject H0 • Type I error: α = P[X > M | H0] < δ

Monte Carlo Verification (MV)
    input: B = (Σ, Q, Q0, δ, F), ε, δ
    N = ln(δ) / ln(1 − ε)
    for (i = 1; i ≤ N; i++)
        if (RL(B) == 1) return (1, error-trace);
    return (0, “reject H0 with α = Pr[X > N | H0] < δ”);
• RL(B): performs a uniform random walk through B, storing the states encountered in a hash table, to obtain a random sample (lasso).
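The MV procedure can be sketched in a few lines of Python. This is an illustrative reading of the pseudocode, not the jMocha implementation: the automaton is assumed to be given as a successor dictionary, an initial state, and an accepting set, and the names `random_lasso` and `monte_carlo_verify` are hypothetical. Every state is assumed to have at least one successor.

```python
import math
import random


def random_lasso(succ, init, accepting, rng):
    """Uniform random walk until a state repeats; return (is_accepting, lasso)."""
    position = {}          # hash table: state -> index of its first visit
    path = []
    last_accepting = -1    # index of the most recent accepting state on the path
    state = init
    while state not in position:
        position[state] = len(path)
        if state in accepting:
            last_accepting = len(path)
        path.append(state)
        state = rng.choice(succ[state])
    path.append(state)
    # The cycle starts at position[state]; accept if an accepting state lies on it.
    return last_accepting >= position[state], path


def monte_carlo_verify(succ, init, accepting, epsilon, delta, rng=None):
    """Return (True, lasso) if an accepting lasso is sampled, else (False, None)."""
    rng = rng or random.Random()
    n = math.ceil(math.log(delta) / math.log(1 - epsilon))
    for _ in range(n):
        found, lasso = random_lasso(succ, init, accepting, rng)
        if found:
            return True, lasso   # counterexample (error trace)
    return False, None           # reject H0: p_Z >= epsilon


# Hypothetical automaton whose only lasso is accepting.
result = monte_carlo_verify({1: [2], 2: [1]}, 1, {2}, epsilon=0.1, delta=0.01)
```

Each sample stores only the states of the current walk, which is why the memory use is bounded by the length of a lasso rather than by the size of the state space.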

Correctness of MV • Theorem: Given a Büchi automaton B, error margin ε, and confidence ratio δ, if MV rejects H0, then its type I error has probability α = P[X > M | H0] < δ, where M = ln(δ) / ln(1 − ε) is the number of lassos MV samples.

Complexity of MV • Theorem: Given a Büchi automaton B having diameter D, error margin ε, and confidence ratio δ, MV runs in time O(N·D) and uses space O(D), where N = ln(δ) / ln(1 − ε). • Cf. DDFS, which runs in O(2^(|S|+|φ|)) time for B = B_S × B_¬φ.

Model Checking [ISOLA’04, TACAS’05] • Implemented DDFS and MV in the jMocha model checker for synchronous systems specified using Reactive Modules. • Performance and scalability of MV compare very favorably to DDFS.

Dining Philosophers

DPh: Symmetric Unfair Version (Deadlock freedom)

DPh: Symmetric Unfair Version (Starvation freedom)

DPh: Asymmetric Fair Version (Deadlock freedom) • δ = 10^-1, ε = 1.8·10^-3, N = 1278

DPh: Asymmetric Fair Version (Starvation freedom) • δ = 10^-1, ε = 1.8·10^-3, N = 1278

Related Work • Random walk testing: • Heimdahl et al: Lurch debugger • Random walks to sample system state space: • Mihail & Papadimitriou (and others) • Monte Carlo Model Checking of Markov Chains: • Herault et al: LTL-RP, bounded MC, zero/one ET • Younes et al: Time-Bounded CSL, sequential analysis • Sen et al: Time-Bounded CSL, zero/one ET • Probabilistic Model Checking of Markov Chains: • ETMCC, PRISM, PIOAtool, and others.

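Checking for High-Confidence (in-principle) • [Figure: tool flow. The system S is turned into a Büchi automaton B_S and the LTL property φ into a Büchi automaton B_¬φ; the Instrumenter builds the product B_S × B_¬φ; the Execution Engine reports either that all lassos are non-accepting or an accepting lasso L.]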

Checking for High-Confidence (in-practice) • Combine static & runtime verification techniques: • Abstract interpretation (sequential IS programs), • Model checking (concurrent FS programs), • Runtime analysis (sequential program optimization). • Make scalability a priority: • Open-source compiler technology has started to mature, • Apply techniques to source code rather than models, • Models can be obtained by abstraction-refinement techniques, • Probabilistic techniques trade off precision against effort.

GCC Compiler • Early stages: a modest C compiler. • Translation: source code translated directly to RTL. • Optimization: at low RTL level. • High-level information lost: calls, structures, fields, etc. • Nowadays: a full-blown, multi-language compiler generating code for more than 30 architectures. • Input: C, C++, Objective-C, Fortran, Java and Ada. • Tree-SSA: added GENERIC, GIMPLE and SSA ILs. • Optimization: at GENERIC, GIMPLE, SSA and RTL levels. • Verification: Tree-SSA API suitable for verification, too.

GCC Compilation Process • [Figure: C, C++, Java, ... files → language parsers → parse tree → Genericize → GENERIC AST → Gimplify → GIMPLE AST → Build CFG → SSA/GIMPLE CFG → rest of compilation → RTL code → code generation → object code.]

GCC Compilation Process with API Plug-In • [Figure: the same compilation pipeline, extended with a plug-in attached through the API.]

C Program and its GIMPLE IL
C program:
    int main() {
      int a, b, c;
      a = 5;
      b = a + 10;
      c = a + foo(a, b);
      if (a > c)
        c = b++ / a + b * a;
      bar(a, b, c);
    }
GIMPLE IL (after Gimplify):
    int main() {
      int a, b, c;
      int T1, T2, T3, T4;
      a = 5;
      b = a + 10;
      T1 = foo(a, b);
      T2 = a + T1;
      if (a > T2) goto fi;
      T3 = b / a;
      T4 = b * a;
      c = T3 + T4;
      b = b + 1;
      fi: bar(a, b, c);
    }

Associated GIMPLE CFG • [Figure: FUNCTION DECL with int variables a, b, c, T1, T2, T3, T4, and a CFG from Entry to Exit. Block A: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; if (a > T2) goto B; with a true edge to block B and a false edge to block C. Block C: T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;. Block B: bar(a,b,c); return;. Each statement is drawn as an expression tree, e.g. an assignment node over a and 5, or a call expression for foo(a, b).]

Static Verification of ESS [SOFTMC’05, NGS’06] • [Figure: tool flow. System software S → GCC (Gimplify) → CFG for B_S → Instrument with LTL property φ → CFG for B_S × B_¬φ → GAM Verifier (static MC).]

Monte Carlo Algorithm • Input: a set of CFGs. • Main function: a specifically designated CFG. • Random walks in the Büchi automaton: generated on-the-fly. • Initial state: of the main routine + bookkeeping information. • Next state: choose process + call GAM on its CFG. • Processes: created by using the fork primitive. • Optimization: GAM returns only upon context switch. • Lassos: detected by using a hierarchical hash table. • Local variables: removed upon return from a procedure.

Program State • Shared variables valuation (channels & semaphores) • List of process states: p1, p2, p3, … • Each process state: control state and data state • Control state: CFG name, statement #

Program State • Shared variables valuation (channels & semaphores) • List of process states: p1, p2, p3, … • Each process state: control state and data state • Data state: heap, global variables valuation, frame stack f1, f2, … • Each frame: return control state, local variables valuation
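One way to picture this layout is as a set of immutable records that can be used directly as hash-table keys for lasso detection. The sketch below is an assumption-laden illustration, not the tool's data structure: all class and field names are hypothetical, valuations are simplified to tuples of (name, value) pairs, and a single flat hash over the whole state stands in for the hierarchical hash table mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Tuple

Valuation = Tuple[Tuple[str, int], ...]  # sorted (variable name, value) pairs


@dataclass(frozen=True)
class ControlState:
    cfg_name: str    # which CFG (procedure) is executing
    statement: int   # statement number within that CFG


@dataclass(frozen=True)
class Frame:
    return_to: ControlState   # where to resume after the call returns
    local_vars: Valuation


@dataclass(frozen=True)
class ProcessState:
    control: ControlState
    heap: Valuation
    global_vars: Valuation
    frames: Tuple[Frame, ...]  # frame stack f1, f2, ...


@dataclass(frozen=True)
class ProgramState:
    shared: Valuation                    # channels and semaphores
    processes: Tuple[ProcessState, ...]  # p1, p2, p3, ...


def make_state(statement):
    """Build a one-process program state paused at the given statement."""
    proc = ProcessState(ControlState("main", statement), (), (("a", 5),), ())
    return ProgramState((("sem", 1),), (proc,))


# A repeated state closes a lasso: the visited table finds it by hash.
visited = {make_state(0): 0, make_state(1): 1}
cycle_start = visited.get(make_state(0))
```

Frozen dataclasses derive equality and hashing from their fields, so two structurally identical states are detected as the same state.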
