272: Software Engineering Fall 2012

272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 3: Modular Verification with Magic, Predicate Abstraction

Modular verification with Magic • MAGIC: Modular Analysis of proGrams In C • Goal: Automated verification of C programs against finite state machine specifications (given as labeled transition systems) • Checks that the behavior of the C program conforms to the behavior of the state machine • It is a modular verification approach: the decomposition of the verification task follows the modularity in the code • The procedure that is being analyzed can invoke other procedures which are themselves specified as state machines • It uses predicate abstraction to automatically generate procedure abstractions and then checks conformance of the extracted procedure abstraction to the specification • It uses the abstract-verify-refine approach • If the conformance check fails, the procedure abstraction can be refined

Labeled transition systems as specifications • A labeled transition system (LTS) M is a 4-tuple (S, S0, Act, T) where • S is a finite, non-empty set of states • S0 ⊆ S is the set of initial states • Act is the set of actions • T ⊆ S × Act × S is the transition relation • Assume that there is a special type of state called a STOP state • A STOP state has no outgoing transitions • (s, a, s’) ∈ T is also written as s →a s’ • s ⇒a s’ means that s’ is reachable from s by following exactly one a-transition and an arbitrary number of ε-transitions • ε is a special action in Act. It corresponds to a silent action (like skip)

Example LTS • There is a textual language (called Finite State Processes, FSP) for specifying labeled transition systems • For the LTS shown in the figure, the FSP specification would be: MyLock = { lock -> return {$0 == 0} -> STOP | return {$0 == 1} -> STOP } . [Figure: the LTS MyLock, with transitions labeled lock, return[0], and return[1], and a STOP state]

An example LTS and an example procedure • The goal is to check the conformance between the C procedures and the specification LTSs • Example procedure: int proc() { if (do_lock()) return 0; else return 1; } [Figure: the LTS MyLock, with transitions labeled lock, return[0], and return[1], and a STOP state]

Procedure Abstractions • They define a procedure abstraction (PA) as a set of LTSs. • A PA is a tuple <d, l> where • d is the declaration for the procedure (as it appears in a C header file) • l is a finite list <g1, M1> , …, <gn, Mn> where each gi is a guard formula ranging over the parameters of the procedure and each Mi is an LTS with a single initial state • The guards are mutually exclusive • A PA is an abstraction of a procedure, if, for all i between 1 and n, when the guard gi evaluates to true over the actual parameters passed to the procedure, the procedure conforms to the LTS Mi

Procedure Abstractions • Procedure abstractions serve two purposes • They are used to specify desired behavior of the procedures • They present automated extraction techniques to automatically extract a PA from a given procedure • They are used to achieve modular verification • During verification of a procedure, the behaviors of procedures that are called by that procedure are abstracted as PAs

Conformance as Weak Simulation • Once a PA is extracted from a given procedure, we want to check if the extracted PA conforms to the given LTS specification • In order to do this we need to formalize what it means to “conform” to a given LTS specification • They do this by using weak simulation • Weak simulation preserves LTLX properties • LTLX is the temporal logic LTL without the next state operator X • So, • if we verify an LTLX property on the specification LTS, and • show that the procedure conforms to the specification LTS, then • we can conclude that the procedure also satisfies the LTLX property

Conformance as Weak Simulation • Given two LTSs M = (S, S0, Act, T) and M’ = (S’, S0’, Act, T’) • M’ weakly simulates M if and only if there exists a weak simulation relation E ⊆ S × S’ such that • For all s ∈ S0 there exists an s’ ∈ S0’ such that (s, s’) ∈ E • (s, s’) ∈ E implies that for all actions a ∈ Act \ {ε}, if s ⇒a s1 then there exists an s1’ ∈ S’ such that s’ ⇒a s1’ and (s1, s1’) ∈ E
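
The definition above can be made concrete with a small sketch. The C code below is not MAGIC's algorithm (the slides say MAGIC reduces the check to SAT); it is a plain greatest-fixed-point computation that starts from all pairs and removes the pairs that violate the second condition. The array bounds, the action numbering, and the two models of proc() are assumptions made up for illustration; only the MyLock LTS is taken from the FSP text.

```c
#include <stdbool.h>

#define MAX_S 8   /* assumed bound on the number of states (sketch only) */
#define MAX_T 16  /* assumed bound on the number of transitions */
enum { EPS = 0, LOCK, RET0, RET1, N_ACT };  /* EPS is the silent action */

typedef struct { int from, act, to; } Trans;
typedef struct {
    int n_states, n_init, n_trans;
    int init[MAX_S];
    Trans trans[MAX_T];
} LTS;

/* c[s][t] becomes true iff t is reachable from s by zero or more EPS steps. */
static void eps_closure(const LTS *m, bool c[MAX_S][MAX_S]) {
    for (int s = 0; s < MAX_S; s++)
        for (int t = 0; t < MAX_S; t++) c[s][t] = (s == t);
    for (bool changed = true; changed; ) {
        changed = false;
        for (int s = 0; s < m->n_states; s++)
            for (int i = 0; i < m->n_trans; i++) {
                const Trans *t = &m->trans[i];
                if (t->act == EPS && c[s][t->from] && !c[s][t->to]) {
                    c[s][t->to] = true;
                    changed = true;
                }
            }
    }
}

/* Weak step s =a=> s2: EPS steps, then one a-transition, then EPS steps. */
static bool weak_step(const LTS *m, bool c[MAX_S][MAX_S], int s, int a, int s2) {
    for (int i = 0; i < m->n_trans; i++) {
        const Trans *t = &m->trans[i];
        if (t->act == a && c[s][t->from] && c[t->to][s2]) return true;
    }
    return false;
}

/* Returns true iff spec weakly simulates impl (greatest fixed point). */
bool weakly_simulates(const LTS *spec, const LTS *impl) {
    bool ci[MAX_S][MAX_S], cs[MAX_S][MAX_S], E[MAX_S][MAX_S];
    eps_closure(impl, ci);
    eps_closure(spec, cs);
    for (int s = 0; s < MAX_S; s++)
        for (int t = 0; t < MAX_S; t++) E[s][t] = true;
    for (bool changed = true; changed; ) {
        changed = false;
        for (int s = 0; s < impl->n_states; s++)
            for (int t = 0; t < spec->n_states; t++)
                for (int a = 1; a < N_ACT; a++)
                    for (int s1 = 0; s1 < impl->n_states; s1++) {
                        if (!E[s][t] || !weak_step(impl, ci, s, a, s1)) continue;
                        bool matched = false;
                        for (int t1 = 0; t1 < spec->n_states; t1++)
                            if (E[s1][t1] && weak_step(spec, cs, t, a, t1))
                                matched = true;
                        if (!matched) { E[s][t] = false; changed = true; }
                    }
    }
    /* Every initial state of impl must be related to an initial state of spec. */
    for (int i = 0; i < impl->n_init; i++) {
        bool ok = false;
        for (int j = 0; j < spec->n_init; j++)
            if (E[impl->init[i]][spec->init[j]]) ok = true;
        if (!ok) return false;
    }
    return true;
}

/* MyLock from the FSP text: states 0 = MyLock, 1 = after lock, 2 = STOP. */
LTS my_lock_spec(void) {
    LTS m = { .n_states = 3, .n_init = 1, .n_trans = 3, .init = {0},
              .trans = { {0, LOCK, 1}, {1, RET0, 2}, {0, RET1, 2} } };
    return m;
}

/* Hypothetical model of proc(): lock then return 0, or skip and return 1. */
LTS proc_model(void) {
    LTS m = { .n_states = 5, .n_init = 1, .n_trans = 5, .init = {0},
              .trans = { {0, LOCK, 1}, {1, EPS, 2}, {2, RET0, 3},
                         {0, EPS, 4}, {4, RET1, 3} } };
    return m;
}

/* Hypothetical buggy variant: returns 1 after taking the lock. */
LTS buggy_model(void) {
    LTS m = { .n_states = 3, .n_init = 1, .n_trans = 2, .init = {0},
              .trans = { {0, LOCK, 1}, {1, RET1, 2} } };
    return m;
}
```

Calling weakly_simulates with the MyLock specification and proc_model() should report conformance, while buggy_model() should be rejected because the specification has no return[1] transition after lock.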

Weak Simulation • The existence of a simulation relation between two labeled transition systems can be checked by reducing the problem to an instance of Boolean satisfiability • Due to the specific structure of the SAT instances produced in this reduction, their satisfiability can be decided in linear time • Weak simulation is the conformance criterion that is used in Magic: • A procedure conforms to an LTS if the LTS can weakly simulate the procedure • This means that the implementation (the C procedure) is safely abstracted by its specification (the LTS)


Overall Approach • Given a specification Mspec for a procedure: • First, extract Mimp which abstracts the behavior of the procedure • During the abstraction process, the procedures that are called by the procedure that is being analyzed are modeled using a set of given procedure abstractions (which are called assumption PAs) • The procedure abstraction is automatically generated using the given assumption PAs and predicate abstraction • Then, check if Mimp conforms to Mspec (via weak simulation) • If Mimp conforms to Mspec then verification is successful and we are done • If Mimp does not conform to Mspec then we check the cause for non-conformance • If it is a bug in the implementation, then we found an error and we are done • If it is not a bug, but non-conformance is due to imprecision in the abstraction Mimp, then refine Mimp and repeat the process

Model Extraction • Extraction of Mimp relies on the following principles: • Every state of Mimp models a state during execution of the procedure, so every state is composed of a control component and a data component • The control components intuitively represent the values of the program counter and are formally obtained from the control flow graph (CFG) • The data components are abstract representations of the memory state of the procedure and are obtained using predicate abstraction • The transitions between states of Mimp are derived from the transitions in the control flow graph, taking into account the assumption PAs and the predicate abstraction

Inlining assumption PAs • During the model extraction, assumption PAs are used to handle procedure calls • If the procedure that is being abstracted calls another procedure p, then the PA for p is inlined by • creating a copy of the LTS for p • inserting an ε-transition from the call location to the initial state of the LTS for p • inserting ε-transitions from the STOP states of the LTS for p to the statement right after the call statement

Experiments with MAGIC • OpenSSL is an open source implementation of the publicly available SSL specification • The SSL protocol is used by a client (typically a web browser) and a server to establish a secure socket connection over a malicious network using public and symmetric key cryptography • A critical component of the protocol is the handshake • Check if the openssl-0.9.6c implementation of the server-side handshake conforms to its specification • Implementation is encapsulated in a single procedure with 347 lines of C code • They wrote the Mspec manually (an LTS with 28 states and 67 transitions) • Check if the client-side implementation conforms to the specification • Implementation is encapsulated in a single procedure with 345 lines of C code • Mspec is an LTS with 28 states and 60 transitions

Experiments with MAGIC • They provided 18 predicates for abstraction and provided the PAs for 12 library routines • Server-side verification took 255 seconds and 130MB of memory • Client-side verification took 226 seconds and 107MB of memory • They then changed the specification model to see if their approach can catch errors • Server-side error was found in 247 seconds using 130MB of memory • Client-side error was found in 227 seconds using 11MB of memory

Predicate Abstraction • In the following slides I will give an overview of the predicate abstraction technique

Abstraction (A simplified view) • How do we generate an abstract transition system? • Merge states in the concrete transition system (based on some criteria) • This reduces the number of states, so it should be easier to do verification • Do not eliminate transitions • This will make sure that the paths in the abstract transition system subsume the paths in the concrete transition system

Abstraction (A simplified view) • For every path in the concrete transition system, there is a corresponding path in the abstract transition system • If no path in the abstract transition system violates a property, then no path in the concrete system can violate the property • Using this reasoning we can verify properties in the abstract transition system • If the property holds on the abstract transition system, we are sure that the property holds in the concrete transition system • If the property does not hold in the abstract transition system, then we are not sure if the property holds or not in the concrete transition system
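
A minimal sketch of this reasoning, using a made-up five-state transition system: states are merged through a map h, every concrete transition is kept, and reachability of a "bad" state is compared before and after. The system, the merge map, and all names are hypothetical and chosen only to show both outcomes.

```c
#include <stdbool.h>

enum { N = 5, K = 4 };  /* numbers of concrete and abstract states (made up) */

/* Concrete transitions 0->1, 1->2, 3->4, stored row-major; state 4 is "bad". */
static const bool CONC[N * N] = {
    [0 * N + 1] = true, [1 * N + 2] = true, [3 * N + 4] = true
};
/* Merge map: concrete states 1 and 3 are merged into abstract state 1. */
static const int H[N] = { 0, 1, 2, 1, 3 };

/* Keep every concrete transition s->t as an abstract transition h(s)->h(t). */
void abstract_relation(const bool *conc, int n, const int *h, bool *abs_rel, int k) {
    for (int i = 0; i < k * k; i++) abs_rel[i] = false;
    for (int s = 0; s < n; s++)
        for (int t = 0; t < n; t++)
            if (conc[s * n + t]) abs_rel[h[s] * k + h[t]] = true;
}

/* Depth-first search: is `to` reachable from `from`? Assumes n <= 16. */
bool reachable(const bool *rel, int n, int from, int to) {
    bool seen[16] = { false };
    int stack[16], top = 0;
    stack[top++] = from;
    seen[from] = true;
    while (top > 0) {
        int s = stack[--top];
        if (s == to) return true;
        for (int t = 0; t < n; t++)
            if (rel[s * n + t] && !seen[t]) { seen[t] = true; stack[top++] = t; }
    }
    return false;
}
```

In this example the bad state is unreachable from state 0 in the concrete system but reachable in the abstract one, because merging states 1 and 3 creates a path that no concrete execution follows. This is the inconclusive case: the abstract violation says nothing definite about the concrete system.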

Abstraction (A simplified view) • If the property does not hold in the abstract transition system, what can we do? • We can refine the abstract transition system (split some states that we merged) • We have to make sure that the refined transition system is still an abstraction of the concrete transition system • Then, we can recheck the property on the refined transition system • If the property still does not hold, we can refine again

Predicate Abstraction • An automated abstraction technique which can be used to reduce the state space of a program • The basic idea in predicate abstraction is to remove some variables from the program by just keeping information about a set of predicates about them • For example, a predicate such as x = y may be the only information about variables x and y that is needed to determine the behavior of the program • In that case we can just store a boolean variable which corresponds to the predicate x = y and remove variables x and y from the program • Predicate abstraction is a technique for doing such abstractions automatically

Predicate Abstraction • Given a program and a set of predicates, predicate abstraction abstracts the program so that only the information about the given predicates is preserved • The abstracted program adds nondeterminism since in some cases it may not be possible to figure out what the next value of a predicate will be based on the predicates in the given set • One needs an automated theorem prover to compute the abstraction

Predicate Abstraction, A Very Simple Example • Assume that we have two integer variables x, y • We want to abstract the program using a single predicate “x=y” • We will divide the states of the program into two sets: • The states where “x=y” is true • The states where “x=y” is false, i.e., “x≠y” • We will then merge all the states in the same set • This is an abstraction • Basically, we forget everything except the value of the predicate “x=y”

Predicate Abstraction, A Very Simple Example • We will represent the predicate “x=y” as the boolean variable B in the abstract program • “B=true” will mean “x=y” and • “B=false” will mean “x≠y” • Assume that we want to abstract the following program which contains only one statement: y := y+1

Predicate Abstraction, Step 1 • Calculate preconditions based on the predicate • {x = y + 1} y := y + 1 {x = y} • Using our temporal logic notation we can say something like: {x=y+1} ⇒ AX{x=y} • This is the precondition for B being true after executing the statement y:=y+1 • {x ≠ y + 1} y := y + 1 {x ≠ y} • Again, using our temporal logic notation: {x≠y+1} ⇒ AX{x≠y} • This is the precondition for B being false after executing the statement y:=y+1
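
The two preconditions can be obtained mechanically from the standard assignment rule of Hoare logic: substitute the assigned expression for the variable in the postcondition. The wp notation below is an addition for illustration and does not appear on the slides.

```latex
\begin{align*}
\mathit{wp}(y := y + 1,\; x = y)    &= (x = y)[\,y + 1 / y\,]    = (x = y + 1) \\
\mathit{wp}(y := y + 1,\; x \neq y) &= (x \neq y)[\,y + 1 / y\,] = (x \neq y + 1)
\end{align*}
```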

Predicate Abstraction, Step 2 • Use decision procedures to determine if the predicates used for abstraction imply any of the preconditions • x = y ⇒ x = y + 1 ? No • x ≠ y ⇒ x = y + 1 ? No • x = y ⇒ x ≠ y + 1 ? Yes • x ≠ y ⇒ x ≠ y + 1 ? No

Predicate Abstraction, Step 3 • Generate abstract code • Summary of predicate abstraction wrt the predicate “x=y” for the statement y := y + 1: • 1) Compute preconditions: {x = y + 1} y := y + 1 {x = y} and {x ≠ y + 1} y := y + 1 {x ≠ y} • 2) Check implications: x = y ⇒ x = y + 1 ? No • x ≠ y ⇒ x = y + 1 ? No • x = y ⇒ x ≠ y + 1 ? Yes • x ≠ y ⇒ x ≠ y + 1 ? No • 3) Generate abstract code: IF B THEN B := false ELSE B := true | false (where “true | false” is a nondeterministic choice)
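
The three steps can be replayed in a few lines of C. This sketch replaces the theorem prover with an exhaustive check over a small integer range. That is an assumption made only to keep the example self-contained, and it is not a proof for all integers. All function and type names are hypothetical.

```c
#include <stdbool.h>

#define LO (-20)
#define HI 20   /* assumed finite range standing in for a theorem prover */

typedef bool (*Pred)(int x, int y);

/* The abstraction predicate and its negation. */
bool p_eq(int x, int y)  { return x == y; }
bool p_neq(int x, int y) { return x != y; }

/* Step 1: preconditions of y := y + 1 for B ending up true or false. */
bool pre_b_true(int x, int y)  { return x == y + 1; }
bool pre_b_false(int x, int y) { return x != y + 1; }

/* Step 2: does hyp imply concl for every x, y in [LO, HI]? */
bool implies_on_range(Pred hyp, Pred concl) {
    for (int x = LO; x <= HI; x++)
        for (int y = LO; y <= HI; y++)
            if (hyp(x, y) && !concl(x, y)) return false;
    return true;
}

typedef enum { SET_TRUE, SET_FALSE, SET_EITHER } Update;

/* Step 3: the abstract update of B, given its current value b. */
Update abstract_update(bool b) {
    Pred cur = b ? p_eq : p_neq;
    if (implies_on_range(cur, pre_b_true))  return SET_TRUE;
    if (implies_on_range(cur, pre_b_false)) return SET_FALSE;
    return SET_EITHER;   /* neither implication holds: nondeterministic */
}
```

With these definitions, abstract_update(true) yields SET_FALSE and abstract_update(false) yields SET_EITHER, which matches the abstract code IF B THEN B := false ELSE B := true | false.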

Checking conformance to a state machine • We want to check if this procedure conforms to this LTS

    void example() {
      do {
        A: KeAcquireSpinLock();
        nPacketsOld = nPackets;
        req = devExt->WLHV;
        if(req && req->status){
          devExt->WLHV = req->Next;
          B: KeReleaseSpinLock();
          irp = req->irp;
          if(req->status > 0){
            irp->IoS.Status = SUCCESS;
            irp->IoS.Info = req->Status;
          } else {
            irp->IoS.Status = FAIL;
            irp->IoS.Info = req->Status;
          }
          SmartDevFreeBlock(req);
          IoCompleteRequest(irp);
          nPackets++;
        }
      } while(nPackets!=nPacketsOld);
      C: KeReleaseSpinLock();
    }

[Figure: the LTS SpinLock, with transitions labeled KeAcquireSpinLock(), KeReleaseSpinLock(), and return, and a STOP state]

Converting a C program to a state machine • We can convert a C program to a state machine • The control component of the state machine will be the states of the control flow graph • The data component of the state machine will be the values of the predicates used for predicate abstraction

C Code:

    void example() {
      do {
        A: KeAcquireSpinLock();
        nPacketsOld = nPackets;
        req = devExt->WLHV;
        if(req && req->status){
          devExt->WLHV = req->Next;
          B: KeReleaseSpinLock();
          irp = req->irp;
          if(req->status > 0){
            irp->IoS.Status = SUCCESS;
            irp->IoS.Info = req->Status;
          } else {
            irp->IoS.Status = FAIL;
            irp->IoS.Info = req->Status;
          }
          SmartDevFreeBlock(req);
          IoCompleteRequest(irp);
          nPackets++;
        }
      } while(nPackets!=nPacketsOld);
      C: KeReleaseSpinLock();
    }

State Machine (as a program):

    void example()
    begin
      do
        A: KeAcquireSpinLock();
        skip;
        if (*) then
          skip;
          B: KeReleaseSpinLock();
          skip;
          if (*) then
            skip;
          else
            skip;
          fi
          skip;
        fi
      while (*);
      C: KeReleaseSpinLock();
    end

• Other than the statements labeled A, B and C, all the rest are ε-transitions

Abstraction Preserves Correctness • The state machine that is generated with predicate abstraction is non-deterministic (the branches labeled “*” are non-deterministic choices) • Non-determinism is used to handle the cases where the predicates used during predicate abstraction are not sufficient to determine which branch will be taken • If we find no error in the generated state machine then we are sure that there are no errors in the original program • The abstract state machine allows more behaviors than the original program due to non-determinism • Hence, if the abstract state machine is correct then the original program is also correct

Counter-Example Guided Abstraction Refinement (CEGAR) • However, if we find an error in the abstract state machine this does not mean that the original program is incorrect. • The erroneous behavior in the abstract state machine could be an infeasible execution path that is caused by the non-determinism introduced during abstraction. • Counter-example guided abstraction refinement is a technique used to iteratively refine the abstract state machine in order to remove the spurious counter-example traces

CEGAR • The basic idea in counter-example guided abstraction refinement is the following: • First look for an error in the abstract program (if there are no errors, we can terminate since we know that the original program is correct) • If there is an error in the abstract program, generate a counter-example path on the abstract program • Check if the generated counter-example path is feasible using a theorem prover • If the generated path is feasible, it corresponds to a real error in the original program and we can report it • If the generated path is infeasible, add the predicate from the branch condition where an infeasible choice is made to the predicate set and generate a new abstract program using predicate abstraction

CEGAR

Abstraction:

    void example()
    begin
      do
        A: KeAcquireSpinLock();
        skip;
        if (*) then
          skip;
          B: KeReleaseSpinLock();
          skip;
          if (*) then
            skip;
          else
            skip;
          fi
          skip;
        fi
      while (*);
      C: KeReleaseSpinLock();
    end

Refined Abstraction (using the predicate (nPackets = nPacketsOld)): the boolean variable b represents the predicate (nPackets = nPacketsOld)

    void example()
    begin
      do
        A: KeAcquireSpinLock();
        b := T;
        if (*) then
          skip;
          B: KeReleaseSpinLock();
          skip;
          if (*) then
            skip;
          else
            skip;
          fi
          b := b ? F : *;
        fi
      while (!b);
      C: KeReleaseSpinLock();
    end
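
To see why the extra predicate matters, the two boolean programs can be explored exhaustively. The sketch below is a hand translation of the two programs into C, not SLAM's model checker. It assumes that the locking rule forbids acquiring a lock that is already held and releasing a lock that is not held. It enumerates every nondeterministic choice and tracks the lock state at the loop head.

```c
#include <stdbool.h>

typedef struct { bool locked, error; } Lock;

/* Assumed locking rule: no double acquire and no double release. */
static void acquire(Lock *l) { if (l->locked) l->error = true; l->locked = true; }
static void release(Lock *l) { if (!l->locked) l->error = true; l->locked = false; }

/* Returns true if some execution of the boolean program breaks the rule.
   refined == false: loop condition is '*'; refined == true: it is !b. */
bool error_reachable(bool refined) {
    bool head_seen[2] = { false, false };  /* lock states seen at loop head */
    bool pending[2]   = { true, false };   /* the program starts unlocked */
    for (;;) {
        int cur = -1;
        for (int i = 0; i < 2; i++)
            if (pending[i] && !head_seen[i]) cur = i;
        if (cur < 0) return false;         /* nothing new left to explore */
        head_seen[cur] = true;
        /* Enumerate: branch taken, '*' in the update of b, '*' in the loop. */
        for (int choice = 0; choice < 8; choice++) {
            bool take = choice & 1, star = choice & 2, loop_star = choice & 4;
            Lock l = { cur == 1, false };
            acquire(&l);                   /* A */
            bool b = true;                 /* b := T */
            if (take) {
                release(&l);               /* B */
                b = b ? false : star;      /* b := b ? F : * */
            }
            bool again = refined ? !b : loop_star;
            if (!again) release(&l);       /* C */
            if (l.error) return true;
            if (again) pending[l.locked] = true;
        }
    }
}
```

Under these assumptions the first abstraction has an error path (for example A, B, then C, which releases twice), while the refined abstraction has none: leaving the loop requires b to be true, which only happens when the branch containing B was skipped.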

CEGAR • Using counter-example guided abstraction refinement we iteratively create more and more refined abstractions • This iterative abstraction refinement loop is not guaranteed to converge for infinite domains • This is not surprising since automated verification for infinite domains is undecidable in general • The challenge in this approach is automatically choosing the right set of predicates for abstraction refinement • This is similar to finding a loop invariant that is strong enough to prove the property of interest

SLAM Project • SLAM project at Microsoft Research • Verification of C programs • Can handle unbounded recursion but does not handle concurrency • Uses predicate abstraction and CEGAR • The SLAM toolkit was developed to find errors in Windows device drivers • The predicate abstraction example in my slides is from: • “The SLAM Toolkit”, Thomas Ball and Sriram K. Rajamani, CAV 2001 • Windows device drivers are required to interact with the Windows kernel according to certain interface rules • The SLAM toolkit has an interface specification language called SLIC (Specification Language for Interface Checking) which is used for writing these interface rules (which are state machines) • The SLAM toolkit checks if the driver code conforms to these interface specifications
