Secure Compiler Seminar 11/7 • Survey: Modular Development of Certified Program Verifiers with a Proof Assistant • Toshihiro YOSHINO (D1, Yonezawa Lab.) <tossy-2@yl.is.s.u-tokyo.ac.jp>
Today’s Paper • A. Chlipala (UC Berkeley). Modular Development of Certified Program Verifiers with a Proof Assistant. ICFP ’06. • The implementation can be downloaded from the web site below: ⇒ http://proofos.sourceforge.net/
Overview of the Paper • A case study in developing a certified program verifier with Coq • Verifies memory safety of x86 machine code • Its soundness is machine-checked • Modular development with reusable functors • A new verifier based on a different type system can be created at low cost
Constructing Certified Verifiers • Design and implement the verifier in Coq • Use Coq’s “extraction” feature to obtain a working verifier • A verifier can be formalized as a term of type forall (p : program), [[safe (load p)]], where: • load: program -> state loads a program • The type program represents the binary file format • safe: state -> Prop is the safety property we wish to verify for programs • [[P]] is notation for poption P • poption is the analogue of option (O’Caml) or Maybe (Haskell) for the domain Prop
Constructing Certified Verifiers • Abstraction refinement in multiple stages • Each stage (component) is a functor that transforms target states into source states • Later components reason at higher levels of abstraction • Coq’s module system is used to implement this modular design
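As a rough illustration of stages as functors, here is a small O'Caml sketch (O'Caml rather than Coq, so there are no proofs). All names in it are hypothetical: `MACHINE` is reduced to `init` and a partial `step`, `Counter` is a toy base machine, and the stage `TwoStep` merely coarsens step granularity instead of performing a real abstraction. What it is meant to show is the shape: each stage maps a module of one signature to a module of the same signature, so stages can be stacked.

```ocaml
(* Hypothetical signature shared by every stage of the pipeline. *)
module type MACHINE = sig
  type state
  val init : state
  val step : state -> state option  (* None: the machine is stuck *)
end

(* Toy base machine: a counter that gets stuck once it reaches 10. *)
module Counter = struct
  type state = int
  let init = 0
  let step n = if n < 10 then Some (n + 1) else None
end

(* A stage: consumes a MACHINE and produces a coarser MACHINE in which one
   step covers two steps of the lower machine.  If the second lower step
   gets stuck, the coarse step is stuck too. *)
module TwoStep (M : MACHINE) : MACHINE with type state = M.state = struct
  type state = M.state
  let init = M.init
  let step s =
    match M.step s with
    | None -> None
    | Some s' -> M.step s'
end

(* Counts how many steps a machine takes from init before getting stuck. *)
module Run (M : MACHINE) = struct
  let rec length s =
    match M.step s with
    | None -> 0
    | Some s' -> 1 + length s'
  let steps = length M.init
end

module Fine = Run (Counter)
module Coarse = Run (TwoStep (Counter))
module Coarser = Run (TwoStep (TwoStep (Counter)))
```

In a real pipeline each stage would also change what a state is; here the state type is passed through unchanged to keep the sketch short.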
Formalization of x86 Instruction Set • PCC-style formalization • A subset of the x86 instruction set plus an ERROR instruction • mov, jcc, … • Safety ≡ ERROR is unreachable • Combined with assertions, many properties can be proven • Safety can be formalized coinductively • Copes with infinite derivations (non-terminating executions)
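The “safety ≡ ERROR is unreachable” idea can be illustrated with a hypothetical toy machine in O'Caml; the instruction names and the accumulator model are assumptions, not the paper's x86 subset. Since the coinductive definition talks about infinite runs, an executable check can only explore a bounded prefix: `safe_for` below checks that ERROR is not reached within a given number of steps, which is weaker than the coinductive notion.

```ocaml
(* Hypothetical toy instruction set: a tiny x86-like subset plus ERROR. *)
type instr =
  | Mov of int   (* acc := n *)
  | Dec          (* acc := acc - 1 *)
  | Jnz of int   (* jump to the given address if acc <> 0 *)
  | Jmp of int   (* unconditional jump *)
  | ERROR        (* reaching this instruction is the only unsafe event *)

type state = { pc : int; acc : int }

type outcome = Running of state | Halted | Unsafe

(* One execution step; leaving the program is treated as halting. *)
let step (prog : instr array) (s : state) : outcome =
  if s.pc < 0 || s.pc >= Array.length prog then Halted
  else
    match prog.(s.pc) with
    | ERROR -> Unsafe
    | Mov n -> Running { pc = s.pc + 1; acc = n }
    | Dec -> Running { pc = s.pc + 1; acc = s.acc - 1 }
    | Jnz t -> Running { s with pc = (if s.acc <> 0 then t else s.pc + 1) }
    | Jmp t -> Running { s with pc = t }

(* Bounded approximation of "ERROR is unreachable": explores at most
   [fuel] steps, so a run still going when fuel runs out counts as safe. *)
let rec safe_for (fuel : int) (prog : instr array) (s : state) : bool =
  if fuel = 0 then true
  else
    match step prog s with
    | Halted -> true
    | Unsafe -> false
    | Running s' -> safe_for (fuel - 1) prog s'
```

For example, `[| Mov 3; Dec; Jnz 1; Jmp 5; ERROR |]` jumps over the ERROR after its loop, while `[| Jmp 0; ERROR |]` loops forever and is still safe, which is exactly the kind of infinite run the coinductive formulation has to cope with.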
Types and Extraction in Coq • At its core, Coq manipulates terms of a dependently typed lambda calculus • A proposition is represented as a type, and its proof as a term of that type • This is the well-known Curry-Howard isomorphism • Proving corresponds to constructing a term of the goal type • Given a goal, refine it interactively into subgoals, filling in holes until none remain • The rules used for these steps are called tactics
Types and Extraction in Coq • Program extraction from Coq code • In short, extraction erases terms of sorts other than Set, such as proofs (whose sort is Prop) • Brief example: isEven

```coq
Definition isEven : forall (n:nat), poption (even n).
  refine (fix isEven (n:nat) : poption (even n) :=
    match n return … with
    | O => PSome _ _
    | S (S n) => …
    | _ => PNone _
    end); auto.
Qed.
```

Extracted O’Caml code:

```ocaml
(* nat, as extraction also emits it *)
type nat = O | S of nat

let rec isEven (n:nat) =
  match n with
  | O -> true
  | S (S n) -> isEven n
  | _ -> false
```
poption: “option” for Domain “Prop” • Two constructors: PNone and PSome • PSome is given a proof of P • Literally, PSome means “P holds and I have a proof of it”, and PNone means “I am not sure” • Can be used as a failure monad • PNone >>= _ = PNone, PSome p >>= f = f p • In extraction, PSome corresponds to true, and PNone to false
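After extraction the proof inside PSome is erased, so a poption carries exactly one bit. The following O'Caml sketch renders the failure-monad laws above in that extracted form, assuming poption is mapped to `bool` as the slide states; `check_positive`, `check_even` and `check_both` are hypothetical examples, not functions from the paper's library. It shows that bind degenerates to a short-circuiting conjunction.

```ocaml
(* Extracted poption: PSome becomes true ("proved"), PNone becomes false. *)
type poption = bool

(* PNone >>= _ = PNone;  PSome p >>= f = f p.
   The proof p is erased by extraction, so f receives only (). *)
let ( >>= ) (x : poption) (f : unit -> poption) : poption =
  if x then f () else false

(* Hypothetical decision procedures, each answering "proved" or "not sure". *)
let check_positive (n : int) : poption = n > 0
let check_even (n : int) : poption = n mod 2 = 0

(* Both checks in sequence; the second runs only if the first succeeded. *)
let check_both (n : int) : poption =
  check_positive n >>= fun () -> check_even n
```

In other words, `x >>= fun () -> y` behaves like `x && y`, which is why chaining verifier checks costs nothing after extraction.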
soption • soption extends poption with a value: besides a proof, it carries a term x of a domain T (of sort Set), and the proposition is about x • soption, too, can be used as a failure monad • In the paper’s theoretical part, written as {{ x : T | P }}
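By analogy, an extracted soption would keep the Set-sorted value and drop only the proof, so it behaves like O'Caml's `option`. The sketch below assumes that erasure; `sbind`, `find_even` and `half_of_first_even` are hypothetical names. The property that `{{ x : T | P }}` would guarantee (here, that the returned element is even and in the list) cannot be stated in O'Caml and is only checked by tests.

```ocaml
(* Extracted soption: the proof is erased, the witness of type 'a survives. *)
type 'a soption = 'a option

(* Failure-monad bind: a missing result short-circuits the rest. *)
let sbind (x : 'a soption) (f : 'a -> 'b soption) : 'b soption =
  match x with
  | None -> None
  | Some v -> f v

(* In Coq this might be typed  forall l, {{ x : nat | In x l /\ even x }};
   after extraction only the search itself remains. *)
let rec find_even (l : int list) : int soption =
  match l with
  | [] -> None
  | x :: rest -> if x mod 2 = 0 then Some x else find_even rest

(* Two partial steps chained: find an even element, then halve it. *)
let half_of_first_even (l : int list) : int soption =
  sbind (find_even l) (fun x -> Some (x / 2))
```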
Coq’s Module System • Used to build reusable verification components • A frequent pattern:

```coq
Module Type MACHINE.
  Parameter mstate : Set.
  Parameter minitState : mstate -> Prop.
  …
End MACHINE.

Record state : Set := { stRegs32 : regs32; … }.
Inductive instr : Set := Arith : … | … .
Inductive exec : … := … .

Module M86 <: MACHINE.
  Definition mstate := state.
  Definition minstr := instr.
  …
End M86.
```
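O'Caml's module system, where the extracted verifier ends up, offers a close analogue of this pattern. In the sketch below the signature and the `state`/`instr` types are simplified, hypothetical stand-ins for the elided Coq definitions, and `minitState` returns `bool` instead of `Prop`. One design difference is worth noting: Coq's `<:` checks a module against a signature while keeping its definitions visible, whereas O'Caml's `module M : S = ...` seals them, so the check is done on a separate copy.

```ocaml
(* Simplified stand-in for the MACHINE signature; most members elided. *)
module type MACHINE = sig
  type mstate
  type minstr
  (* In Coq a predicate (mstate -> Prop); here a decidable check. *)
  val minitState : mstate -> bool
end

(* Toy definitions standing in for the x86 model's state and instructions. *)
type state = { stRegs32 : int array; stPc : int }
type instr = Arith of int * int | Jmp of int

(* Analogue of "Module M86 <: MACHINE": the definitions stay visible. *)
module M86 = struct
  type mstate = state
  type minstr = instr
  let minitState (s : mstate) = s.stPc = 0
end

(* Checking against the signature.  Unlike Coq's <:, this seals the copy,
   so M86_sealed.mstate is abstract while M86.mstate is not. *)
module M86_sealed : MACHINE = M86

(* Because M86 itself is unsealed, clients can still build its states. *)
let initial : M86.mstate = { stRegs32 = Array.make 8 0; stPc = 0 }
```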
Module ModelCheck • Provides the fundamental methods of model checking • Methods to prove theorems about infinite-state systems through exhaustive exploration • Each of the following stages refines the model (figure: the stages arranged on a scale from Concrete to Abstract)
Module ModelCheck: Introduced Elements • absState: a set of abstract states • An abstract state carries “hypotheses”, i.e., states that are known to be safe • Hypotheses are used, for example, to formalize the return pointer of a function • A relation describes the correspondence between machine states and abstract states • The context (Γ) is erased when the verifier is extracted • init is a set (actually a list) of initial abstract states • It must be a set because one real machine state may correspond to multiple abstract states • At least one element of init must have no hypotheses
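Module ModelCheck: Introduced Elements • step describes an execution step • It executes an instruction from the specified abstract state • soption is used because execution may get stuck • Progress and Preservation must hold (informally: a machine state represented by an abstract state can take a step, and the resulting machine state is represented by one of the successor abstract states)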
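The exhaustive exploration behind ModelCheck can be sketched as a worklist algorithm over abstract states. This is an illustrative O'Caml version under our own assumptions, not the paper's Coq functor: hypotheses and Γ are omitted, abstract states are compared structurally, and `step` returns `None` when execution may get stuck (the role soption plays above) or else the list of successor abstract states. The toy domain (a program counter plus the sign of one register, with the hypothetical programs A and B) is invented for the example. In the real development soundness comes from proving progress and preservation for `step`, which this sketch does not attempt.

```ocaml
(* Toy abstract domain: a program counter plus the sign of one register. *)
type sign = Neg | Zero | Pos
type abs_state = { pc : int; r0 : sign }

(* Worklist exploration of every abstract state reachable from [init].
   [step s] is None when execution from s may get stuck, otherwise the
   successor abstract states ([] for a halted state).  Terminates when the
   reachable abstract state space is finite. *)
let model_check (step : 'a -> 'a list option) (init : 'a list) : bool =
  let rec explore visited = function
    | [] -> true
    | s :: rest ->
      if List.mem s visited then explore visited rest
      else (
        match step s with
        | None -> false
        | Some succs -> explore (s :: visited) (succs @ rest))
  in
  explore [] init

(* r0 := unknown input: the abstract successors cover every sign. *)
let havoc pc = List.map (fun r0 -> { pc; r0 }) [ Neg; Zero; Pos ]

(* Program A:  0: r0 := input   1: x := x / r0   2: halt *)
let step_unsafe s =
  match s.pc with
  | 0 -> Some (havoc 1)
  | 1 -> if s.r0 = Zero then None else Some [ { s with pc = 2 } ]
  | _ -> Some []

(* Program B:  0: r0 := input   1: if r0 = 0 goto 3
               2: x := x / r0   3: goto 0   (loops forever) *)
let step_safe s =
  match s.pc with
  | 0 -> Some (havoc 1)
  | 1 -> Some [ { s with pc = (if s.r0 = Zero then 3 else 2) } ]
  | 2 -> if s.r0 = Zero then None else Some [ { s with pc = 3 } ]
  | _ -> Some [ { s with pc = 0 } ]

let init = [ { pc = 0; r0 = Zero } ]
```

Program B never terminates concretely, yet its abstract state space has at most twelve elements, so the exploration finishes; that is the sense in which exhaustive exploration can establish properties of infinite-state systems.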
Module ModelCheck: The Concept Illustrated • (Figure: the state space of a real machine, supplied by MACHINE as the input to the module, with its initial states, and the abstract states (absState) connected by step)
Module Reduction • Translates x86 machine language into a simpler, RISC-style instruction set (SAL) • x86 machine language is too complex and not suitable for verification purposes • One instruction may perform several basic operations • The same basic operations show up in the semantics of many instructions • The Reduction module also provides a model-checking layer for SAL programs
Module Reduction: SAL (Simplified Assembly Language) • Named after the language used in Proof-Carrying Code [Necula 1997] • RISC-style instruction set • Arithmetic is extended to allow expressions with parentheses and infix operators • Additional temporary registers TMPi
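To illustrate why reduction helps, here is a hypothetical O'Caml sketch that decomposes one x86-style read-modify-write instruction (`add [base], src`) into SAL-like micro-operations through a temporary register, plus a tiny interpreter to run them. The constructor names, the single `TMP 0` register and the flat integer memory are assumptions made for illustration; they are not SAL's actual definitions.

```ocaml
(* Hypothetical SAL-like syntax: nested infix expressions plus TMPi registers. *)
type reg = EAX | EBX | TMP of int

type exp =
  | Const of int
  | Reg of reg
  | Add of exp * exp

type sal =
  | Assign of reg * exp   (* reg := exp *)
  | Load of reg * exp     (* reg := mem[exp] *)
  | Store of exp * exp    (* mem[exp] := exp *)

(* x86 "add [base], src" is one instruction but three basic operations. *)
let reduce_add_mem (base : reg) (src : reg) : sal list =
  [ Load (TMP 0, Reg base);
    Assign (TMP 0, Add (Reg (TMP 0), Reg src));
    Store (Reg base, Reg (TMP 0)) ]

(* Minimal interpreter: registers as an association list (newest binding
   first, unset registers read as 0), memory as a mutable hash table. *)
let run (prog : sal list) (regs : (reg * int) list) (mem : (int, int) Hashtbl.t) =
  let rec eval regs = function
    | Const n -> n
    | Reg r -> (try List.assoc r regs with Not_found -> 0)
    | Add (a, b) -> eval regs a + eval regs b
  in
  List.fold_left
    (fun regs instr ->
      match instr with
      | Assign (r, e) -> (r, eval regs e) :: regs
      | Load (r, e) ->
        let v = (try Hashtbl.find mem (eval regs e) with Not_found -> 0) in
        (r, v) :: regs
      | Store (a, e) ->
        Hashtbl.replace mem (eval regs a) (eval regs e);
        regs)
    regs prog
```

Once every x86 instruction is expressed with these few operations, later components only need rules for `Assign`, `Load` and `Store` rather than for each x86 opcode.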
Module FixedCode • Ensures that the code region is never overwritten by the program itself • This simplifies the verification framework • Defined in the form of a ModelCheck component • An additional check is performed only on memory stores
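The FixedCode idea, that only stores need an extra check, might look like this in a hypothetical O'Caml sketch: the code region is modeled as a half-open address interval, and a store is accepted only when the bytes it writes do not overlap that interval. The record fields, the byte-level model and the `op` type are assumptions, not the proofos definitions.

```ocaml
(* Hypothetical code region: the half-open address range [code_start, code_end). *)
type region = { code_start : int; code_end : int }

(* A store of [width] bytes at [addr] writes [addr, addr + width);
   it is allowed only if that range lies entirely outside the code region. *)
let store_ok (r : region) ~(addr : int) ~(width : int) : bool =
  width > 0 && (addr + width <= r.code_start || addr >= r.code_end)

(* Only stores get the extra check; every other instruction passes. *)
type op = Store of int * int (* address, width *) | Other

let check_op (r : region) (o : op) : bool =
  match o with
  | Store (addr, width) -> store_ok r ~addr ~width
  | Other -> true
```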
Module TypeSystem • Supports a standard approach to type systems • A set of types is introduced, and typing rules for values are described • A subtype relation is also introduced • The definition in the figure suffices because Coq takes care of that part • Each register is associated with a type
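A standard type system of this kind might be sketched as follows in hypothetical O'Caml (not the paper's Coq): a few value types, a reflexive and transitive subtype relation, a typing rule for values given which addresses hold which cells, and a register typing that assigns each register a type. All type and function names are assumptions for illustration; in O'Caml the relation must be written out by hand, whereas in Coq the inductive definitions do part of the work.

```ocaml
(* Hypothetical value types for 32-bit words. *)
type ty =
  | Top                (* any word *)
  | Int
  | NonNull of ty      (* pointer to a cell of the given type, never null *)
  | Nullable of ty     (* same, but may also be null (0) *)

(* Subtype relation: reflexive, everything <: Top, NonNull t <: Nullable t.
   Pointee types must match exactly (no covariance through mutable cells). *)
let subtype (a : ty) (b : ty) : bool =
  a = b
  || (match a, b with
      | _, Top -> true
      | NonNull t, Nullable u -> t = u
      | _ -> false)

(* Typing rule for values, given which addresses hold cells of which type. *)
let has_type (cells : (int * ty) list) (v : int) (t : ty) : bool =
  match t with
  | Top | Int -> true
  | NonNull u -> List.assoc_opt v cells = Some u
  | Nullable u -> v = 0 || List.assoc_opt v cells = Some u

(* Each register is associated with a type; check the whole register file. *)
type reg = EAX | EBX | ECX

let regs_ok (cells : (int * ty) list) (rtypes : (reg * ty) list)
    (value_of : reg -> int) : bool =
  List.for_all (fun (r, t) -> has_type cells (value_of r) t) rtypes
```

The subtype relation is chosen so that it respects value typing: any value of type `NonNull t` also satisfies `Nullable t` and `Top`.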
Module TypeSystem • viewShift represents a shift in the view of types • Occurs where a program crosses an abstraction boundary • For example, at function calls, when the stack frame changes • Introducing an existential type is also a kind of view shift
Module WeakUpdate • Introduces a weak-update type system • Each memory cell has an associated type, and this type does not change during a run • A cell can be overwritten only with a value of its type • Dynamic memory management is out of scope • In real settings, memory is frequently reclaimed and reused • By a garbage collector, or via malloc/free
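Weak update can be sketched as a store check in hypothetical O'Caml: the memory typing is fixed for the whole run, so a store is allowed only when the stored value already has the cell's type, and the typing itself is never updated. This is why the invariant survives every step, and also why reusing a cell for a value of a different type, as malloc/free or a garbage collector would, cannot be expressed. The types and names below are assumptions for illustration.

```ocaml
(* Hypothetical types: Ptr t is a non-null pointer to a cell of type t. *)
type ty = Int | Ptr of ty

(* Memory typing: a fixed map from cell address to type, never updated. *)
type mem_typing = (int * ty) list

(* Value typing under the fixed memory typing. *)
let has_type (mt : mem_typing) (v : int) (t : ty) : bool =
  match t with
  | Int -> true
  | Ptr u -> List.assoc_opt v mt = Some u

(* Weak update: writing [value] to [addr] is allowed only if the value
   has the cell's fixed type; the memory typing stays the same. *)
let store_ok (mt : mem_typing) ~(addr : int) ~(value : int) : bool =
  match List.assoc_opt addr mt with
  | None -> false                    (* not a typed cell *)
  | Some t -> has_type mt value t
```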
The Rest of the Modules • Module StackTypes • Keeps track of the types of stack slots • Module SimpleFlags • Keeps track of flag values • x86 (too) has no general instruction that tests a condition and jumps in one step: one instruction sets the flags, and a later jcc reads them • Crucial for ensuring that a pointer is valid (not null) or that an array access is within bounds
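Because a null check is split into a flag-setting `test` and a later conditional jump, a verifier must remember what the flags say about a register. The hypothetical O'Caml sketch below tracks a single fact, “ZF is set iff register r is zero”, from `test r, r` to a following `jz`/`jnz`, and reports on which branch r is known to be non-null. The instruction subset and the one-fact abstraction are assumptions for illustration, not the SimpleFlags module itself.

```ocaml
type reg = EAX | EBX

type instr =
  | Test of reg * reg    (* sets ZF := (r1 land r2 = 0) *)
  | Mov of reg * int     (* overwrites the register; flags unchanged *)
  | Jz of int            (* jump if ZF is set *)
  | Jnz of int           (* jump if ZF is clear *)

(* What is known about the flags: nothing, or "ZF set iff r = 0". *)
type flags = Unknown | ZeroIff of reg

let after_instr (f : flags) (i : instr) : flags =
  match i with
  | Test (r1, r2) when r1 = r2 -> ZeroIff r1
  | Test _ -> Unknown
  | Mov (r, _) -> (match f with ZeroIff r' when r' = r -> Unknown | _ -> f)
  | Jz _ | Jnz _ -> f

(* Registers known to be non-null after a conditional jump:
   (when the jump is taken, when it falls through). *)
let nonnull_after_branch (f : flags) (i : instr) : reg list * reg list =
  match f, i with
  | ZeroIff r, Jz _ -> ([], [ r ])   (* falls through only if r <> 0 *)
  | ZeroIff r, Jnz _ -> ([ r ], [])  (* jumps only if r <> 0 *)
  | _ -> ([], [])
```

Overwriting the tested register between the `test` and the jump discards the fact, which is exactly the case where a verifier must not conclude anything about the pointer.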
Case Study: A Verifier for Algebraic Datatypes • Implemented the library and a sample verifier in Coq • http://proofos.sourceforge.net/ • Approx. 20K(+α) LoC • The main implementation consists of only 600 LoC • 7,000 LoC for library components • 10,000 for generic utilities • 1,000 for bitvectors and fixed-precision arithmetic • 1,000 for a subset of x86 machine code • An auxiliary library implemented in O’Caml (not counted here) • x86 binary parsing, etc.
Related Work • Foundational PCC [Appel 2001] • Reduces the TCB and improves the flexibility of PCC by building the system on a general logical framework • However, efficiency is sacrificed for generality • Theoretical issues seem to take priority over pragmatics • Epigram [McBride, McKinna 2004], ATS [Chen, Xi 2005], RSP [Westbrook et al. 2005] and GADTs [Sheard 2004] • Incorporate dependent types into programming languages • But Coq’s implementation and metatheory rest on simpler foundations than theirs
Summary (of the Paper) • Designed a structure for modular certified verifiers • Components are reusable functors • Pipeline-style design • Implemented library components in Coq • As a case study, a memory-safety verifier for x86 machine code was constructed
Relevance to My Research • I have been studying a framework for building verifiers for low-level languages • First, formalize a common language, ADL • Verification is done on the translated program (in ADL) • Trying to prove the correctness of the translation • Currently in progress, with Coq
Relevance to My Research • Both take very similar approaches • ADL and SAL are both designed with minimalist criteria • Verification logic is built on top of the common language’s semantics • To achieve high portability and flexibility • From this viewpoint, my project is covered by his… (x_x) • Correctness of the translation is also proven in Coq in proofos • On the positive side, my direction was not so wrong
Relevance to My Research • Comparison of two projects…