Using Coq to generate and reason about x86 systems code

Using Coq to generate and reason about x86 systems code. Andrew Kennedy & Nick Benton (MSR Cambridge) Jonas Jensen (ITU Copenhagen). The big picture. Compositional specification and verification of high-level behavioural properties of low-level systems code

  1. Using Coq to generate and reason about x86 systems code • Andrew Kennedy & Nick Benton (MSR Cambridge) • Jonas Jensen (ITU Copenhagen)

  2. The big picture • Compositional specification and verification of high-level behavioural properties of low-level systems code • Previous work by Benton et al. employed idealized machine code • Simple design • Infinite memory; pointers are natural numbers • It’s time to get real(ish): hence, x86

  3. Overview of talk • Modelling x86: bits, bytes, instructions, execution • Generating x86: assembling & compiling • Reasoning about x86: logic & proofs • Discussion

  4. Our approach • Clean slate: the trusted base is just the hardware and its model in Coq. † • No dependencies on legacy code, languages, compilers, or software architectures • Verify everything, including (at some point) the loader-verifier • Do everything in Coq, making effective use of computation, notation, type classes, tactics, etc. • No dependencies on external tools • Coq as the “world’s best macro assembler” • † And a small boot loader

  5. Modelling x86

  6. Bits, bytes and words • We want to compute correctly and efficiently inside Coq • Proper modelling of n-bit words: arithmetic with carry, sign and overflow, rotates, shifts, padding, the lot, all O(n) • Generic over word length, so the type is indexed by n : nat • We also want to reason soundly inside Coq • Associativity, commutativity, order properties, etc. • Compute here: n-tuples of bools • Reason here: 'Z_(2^n) from the ssreflect library, reusing its lemmas
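As a rough illustration only (Python rather than Coq, and not the authors' definitions), an n-bit word can be modelled as a tuple of n booleans that we compute on directly, together with a map into the integers modulo 2^n that plays the role of toZp for reasoning. The names `Word`, `from_int` and `to_zp`, and the least-significant-bit-first ordering, are assumptions of this sketch.

```python
from typing import Tuple

# Hypothetical representation: an n-bit word, least-significant bit first.
Word = Tuple[bool, ...]

def from_int(n: int, v: int) -> Word:
    """Truncate v to n bits (negative v gives its two's-complement bits)."""
    return tuple(bool((v >> i) & 1) for i in range(n))

def to_zp(w: Word) -> int:
    """Interpret w as an element of Z/(2^n), represented as 0 <= x < 2^n."""
    return sum(1 << i for i, b in enumerate(w) if b)
```

For instance, `from_int(8, 300)` keeps only the low eight bits, so `to_zp` of it is 44, and the zero-length word `()` is the n = 0 case.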

  7. Example: definition of addition • Effective use of dependent types • The definition is very algorithmic, so we can compute! • Performance inside Coq? On this machine, about 2000 additions a second
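A minimal sketch of why an algorithmic definition is easy to execute: ripple-carry addition over a bool-tuple representation makes one pass over the bits, so it is O(n). This is a hypothetical Python analogue, not the definition shown on the slide; the name `adc` and the `(carry_out, sum)` result are our own choices.

```python
from typing import Tuple

Word = Tuple[bool, ...]  # hypothetical: n-bit word, least-significant bit first

def adc(x: Word, y: Word, carry_in: bool = False) -> Tuple[bool, Word]:
    """Ripple-carry addition in O(n); returns (carry_out, sum)."""
    assert len(x) == len(y), "words must have the same length n"
    out = []
    c = carry_in
    for a, b in zip(x, y):
        out.append(a ^ b ^ c)                 # sum bit
        c = (a and b) or (c and (a ^ b))      # carry = majority(a, b, c)
    return c, tuple(out)
```

Plain n-bit addition is then just the sum component of `adc(x, y)`, and the carry-out is exactly what an x86 model needs for the carry flag.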

  8. Example: proofs about addition • 1. Deal with the n = 0 case • 2. Apply injectivity of toZp to work in 'Z_(2^n): forall x y, toZp x = toZp y -> x = y • 3. Rewrite using homomorphism lemmas, e.g. toZp (addB p1 p2) = (toZp p1 + toZp p2)%R • 4. Apply the ssreflect “ring” lemma for 'Z_(2^n)
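The proof strategy above cannot be replayed in Python, but its two key facts can be tested exhaustively for small n: `to_zp` is injective, and it maps the bitwise adder to addition modulo 2^n. This is a hypothetical sanity check, not a proof; the Coq lemmas hold for every n, and the helper names are our own.

```python
from itertools import product
from typing import Tuple

Word = Tuple[bool, ...]  # hypothetical: least-significant bit first

def to_zp(w: Word) -> int:
    return sum(1 << i for i, b in enumerate(w) if b)

def add(x: Word, y: Word) -> Word:
    """Ripple-carry addition, discarding the final carry."""
    out, c = [], False
    for a, b in zip(x, y):
        out.append(a ^ b ^ c)
        c = (a and b) or (c and (a ^ b))
    return tuple(out)

def check(n: int) -> bool:
    words = list(product([False, True], repeat=n))
    # Injectivity: distinct words have distinct images in Z/(2^n).
    injective = len({to_zp(w) for w in words}) == len(words)
    # Homomorphism: to_zp(add x y) == (to_zp x + to_zp y) mod 2^n.
    hom = all(to_zp(add(x, y)) == (to_zp(x) + to_zp(y)) % (1 << n)
              for x in words for y in words)
    return injective and hom
```

Together these two facts are what let a law such as associativity of `add` be transferred from the ring 'Z_(2^n) back to words.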

  9. Machine state • Register state is just a total function • Flags can take an undefined value (see later) • Abstractly, memory is a partial map DWORD ⇀ BYTE • Partiality represents whether memory is mapped and accessible • Concretely, for efficiency, a trie-like structure
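A Python sketch of the state described above, under our own simplifying assumptions: a total register map (including EIP), flags that may be `None` for "undefined", and memory as a partial map stored in a four-level trie keyed by the bytes of the address. The class and method names are hypothetical, and the trie used in the Coq development may be organised differently.

```python
from typing import Dict, Optional

# Hypothetical register and flag names for the 32-bit subset.
REGS = ["EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP", "EIP"]
FLAGS = ["CF", "PF", "ZF", "SF", "OF"]

class Memory:
    """Partial map from 32-bit addresses to bytes, stored as a four-level trie
    keyed by the address bytes (most significant first). Absent = unmapped."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    @staticmethod
    def _path(addr: int):
        return [(addr >> shift) & 0xFF for shift in (24, 16, 8, 0)]

    def read(self, addr: int) -> Optional[int]:
        *prefix, last = self._path(addr)
        node = self.root
        for key in prefix:
            node = node.get(key)
            if node is None:
                return None              # unmapped: a real access would fault
        return node.get(last)

    def map_byte(self, addr: int, byte: int) -> None:
        """Make addr mapped and give it an initial value (set-up only)."""
        *prefix, last = self._path(addr)
        node = self.root
        for key in prefix:
            node = node.setdefault(key, {})
        node[last] = byte & 0xFF

    def write(self, addr: int, byte: int) -> bool:
        """Write to an already-mapped address; False signals a fault."""
        if self.read(addr) is None:
            return False
        self.map_byte(addr, byte)
        return True

class State:
    def __init__(self) -> None:
        self.regs = {r: 0 for r in REGS}  # total: every register has a value
        self.flags: Dict[str, Optional[bool]] = {f: None for f in FLAGS}  # None = undefined
        self.mem = Memory()
```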

  10. x86 instructions • x86 is notoriously large and baroque (the instruction set manual alone is 1640 pages long) • Subset only: no legacy 16-bit mode, flat memory model (no segment nonsense), no floating point, no SIMD instructions, no protected-mode instructions, no 64-bit mode (yet) • Actually, not too bad: it is possible to factor things so that the Coq datatype is “total” (no junk)

  11. Addressing modes • e.g. ADD EBX, [EDI + EDX*4 + 12]
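A memory operand of this kind is computed as base + index*scale + displacement, wrapping at 32 bits. A small hypothetical helper (`effective_address` is our name, not one from the development) makes the arithmetic concrete:

```python
from typing import Dict, Optional

def effective_address(regs: Dict[str, int], base: Optional[str] = None,
                      index: Optional[str] = None, scale: int = 1,
                      disp: int = 0) -> int:
    """Compute base + index*scale + disp modulo 2^32 (32-bit addressing)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index] * scale
    return ea & 0xFFFFFFFF
```

With EDI = 0x1000 and EDX = 3, the memory operand of the example instruction is at 0x1000 + 3*4 + 12 = 0x1018.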

  12. Instruction format • The manuals don’t reveal much of the “structure” (such as it is) in the instruction format • But it can be discerned, and utilised for concise decoding functions
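One piece of structure that is easy to exploit is the ModRM byte, which splits into a 2-bit mod field, a 3-bit reg field and a 3-bit r/m field. The sketch below (hypothetical helper names; 32-bit addressing only; SIB and displacement bytes are not read) shows how a few bit-masks classify the r/m operand:

```python
# Register numbering used in x86 encodings.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def split_modrm(b: int):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (b >> 6) & 0b11, (b >> 3) & 0b111, b & 0b111

def describe_rm(b: int) -> str:
    """Describe the r/m operand selected by a ModRM byte."""
    mod, _, rm = split_modrm(b)
    if mod == 0b11:
        return REG32[rm]                 # register-direct operand
    if rm == 0b100:
        return "SIB byte follows"        # scaled-index forms
    if mod == 0b00 and rm == 0b101:
        return "[disp32]"                # absolute address, no base
    disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    return f"[{REG32[rm]}{disp}]"
```

For example, the bytes 01 D8 encode ADD EAX, EBX: opcode 01 is ADD r/m32, r32, and ModRM 0xD8 has mod = 11, reg = 3 (EBX), r/m = 0 (EAX).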

  13. Instruction decoding • Uses monadic syntax; the reader reads from memory and advances the pointer • Note: there may be many instruction formats for the same instruction
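Python has no monadic syntax, so this sketch threads the reader's position through a small mutable object instead; each read returns data and advances the pointer. It decodes a tiny, hypothetical subset of opcodes, including two formats for ADD EAX, imm32 (the short form 05 id and the general form 81 /0 id). Addresses are assumed to be offsets into the byte buffer.

```python
from typing import Optional, Tuple

REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

class Reader:
    """Stateful stand-in for a monadic reader: each read returns data from
    memory and advances the pointer. Addresses are offsets into `mem`."""

    def __init__(self, mem: bytes, pos: int = 0) -> None:
        self.mem, self.pos = mem, pos

    def byte(self) -> int:
        b = self.mem[self.pos]       # IndexError stands in for "unmapped"
        self.pos += 1
        return b

    def dword(self) -> int:
        return int.from_bytes(bytes(self.byte() for _ in range(4)), "little")

def decode(r: Reader) -> Optional[Tuple]:
    """Decode one instruction from a tiny, hypothetical subset."""
    op = r.byte()
    if op == 0x90:
        return ("NOP",)
    if op == 0xC3:
        return ("RET",)
    if op == 0xE8:                   # CALL rel32, relative to the next instruction
        rel = r.dword()
        return ("CALL", (r.pos + rel) & 0xFFFFFFFF)   # absolute target
    if op == 0x05:                   # short format: ADD EAX, imm32
        return ("ADD", "EAX", r.dword())
    if op == 0x81:                   # general format: 81 /0 id = ADD r/m32, imm32
        modrm = r.byte()
        if (modrm >> 6) == 0b11 and ((modrm >> 3) & 0b111) == 0:  # register operand only
            return ("ADD", REG32[modrm & 0b111], r.dword())
    return None                      # outside this subset
```

Both `05 01 00 00 00` and `81 C0 01 00 00 00` decode to the same instruction, ADD EAX, 1.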

  14. Instruction execution • Example fragment: call and return • Currently a partial function from State to State • Implemented in monadic style, using “primitive” operations to read and write registers, flags, memory, etc. • Factored to re-use common patterns, e.g. evalMemSpec, evalSrc
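A sketch of CALL and RET as partial functions from state to state, with `None` standing for a fault. It is written over hypothetical "primitive" 32-bit memory reads and writes that fail on unmapped bytes; a real model would also decode the instruction and track the rest of the machine state, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MASK = 0xFFFFFFFF

@dataclass
class State:
    eip: int
    esp: int
    mem: Dict[int, int] = field(default_factory=dict)   # partial: absent = unmapped

def read32(s: State, addr: int) -> Optional[int]:
    """Primitive: read a little-endian DWORD, or None (fault) if any byte is unmapped."""
    bs = [s.mem.get((addr + i) & MASK) for i in range(4)]
    if any(b is None for b in bs):
        return None
    return int.from_bytes(bytes(bs), "little")

def write32(s: State, addr: int, v: int) -> Optional[Dict[int, int]]:
    """Primitive: return an updated memory, or None (fault) if any byte is unmapped."""
    addrs = [(addr + i) & MASK for i in range(4)]
    if any(a not in s.mem for a in addrs):
        return None
    mem = dict(s.mem)
    for a, b in zip(addrs, (v & MASK).to_bytes(4, "little")):
        mem[a] = b
    return mem

def exec_call(s: State, target: int, next_eip: int) -> Optional[State]:
    """CALL target: push the return address next_eip, then jump to target."""
    esp = (s.esp - 4) & MASK
    mem = write32(s, esp, next_eip)
    return None if mem is None else State(target, esp, mem)

def exec_ret(s: State) -> Optional[State]:
    """RET: pop the return address into EIP."""
    ret = read32(s, s.esp)
    return None if ret is None else State(ret, (s.esp + 4) & MASK, s.mem)
```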

  15. Non-determinism & under-specification

  16. Non-determinism & under-specification

  17. Representing non-determinism and under-specification • For sequential x86, for the subset we care about, behaviour is almost completely deterministic • Flags are the main issue • Introduce an “undefined” state for flags • Instructions that depend on a flag whose value is undefined (e.g. branch-on-carry) then have unspecified behaviour • An alternative would be to set flags non-deterministically (cf. RockSalt)
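A sketch of the "undefined" flag value using Python's `None`. After an unsigned 32-bit MUL, the Intel manual specifies CF and OF but leaves SF, ZF, AF and PF undefined, so a later branch on ZF has unspecified behaviour; here that is signalled by raising an exception. The function and exception names are hypothetical.

```python
from typing import Dict, Optional

Flag = Optional[bool]          # True / False, or None for "undefined"

class Unspecified(Exception):
    """Raised when behaviour depends on an undefined flag."""

def mul32_flags(eax: int, src: int) -> Dict[str, Flag]:
    """Flags after unsigned 32-bit MUL: CF = OF = (high half nonzero);
    SF, ZF, AF and PF are undefined."""
    high = ((eax & 0xFFFFFFFF) * (src & 0xFFFFFFFF)) >> 32
    cf = high != 0
    return {"CF": cf, "OF": cf, "SF": None, "ZF": None, "AF": None, "PF": None}

def branch_target(flags: Dict[str, Flag], flag: str, taken: int, fallthrough: int) -> int:
    """Conditional jump on `flag` (e.g. JC on CF, JZ on ZF)."""
    v = flags[flag]
    if v is None:
        raise Unspecified(f"branch depends on undefined flag {flag}")
    return taken if v else fallthrough
```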

  18. Generating x86: Assembling and Compiling

  19. Instruction encoding • Directly represent the encoding as a list of bytes • Note: the encoding is position-dependent • In future we might mirror decoding, using a monadic style
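Why the encoding is position-dependent: a near JMP is encoded as E9 followed by a 32-bit displacement relative to the next instruction, so the same absolute target yields different bytes at different addresses. The helper name is hypothetical:

```python
from typing import List

def encode_jmp(pos: int, target: int) -> List[int]:
    """Encode JMP to an absolute target as E9 rel32, where rel32 is relative
    to the address of the next instruction (pos + 5)."""
    rel = (target - (pos + 5)) & 0xFFFFFFFF
    return [0xE9] + list(rel.to_bytes(4, "little"))
```

A jump to 0x100 placed at address 0 has displacement 0xFB, while the same jump placed at 0x100 (a jump to itself) has displacement -5.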

  20. Jumps and labels • Targets of jumps and branches are just absolute addresses in the Instr type • To write assembler code we want labels; for this we use a kind of HOAS (higher-order abstract syntax) type
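A rough Python analogue of HOAS-style labels: a `Local` node holds a function from a label to the rest of the program, so label scoping is handled by the host language's own binding. Instantiating each binder with a fresh name gives printable assembler text. All constructor names here are hypothetical, not the Coq datatype.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Iterator, List, Optional, Union

@dataclass
class Jmp:
    target: object          # a label; in the real Instr type, an absolute address

@dataclass
class Nop:
    pass

@dataclass
class Label:
    name: object            # marks the current position as this label

@dataclass
class Local:
    body: Callable[[object], "Prog"]   # HOAS binder: the body is a function of the label

Prog = Union[Jmp, Nop, Label, Local, List["Prog"]]

def show(p: Prog, fresh: Optional[Iterator[str]] = None) -> List[str]:
    """Render a program as assembler text, inventing a fresh name per binder."""
    fresh = fresh if fresh is not None else (f"L{i}" for i in count())
    if isinstance(p, list):
        return [line for q in p for line in show(q, fresh)]
    if isinstance(p, Local):
        return show(p.body(next(fresh)), fresh)
    if isinstance(p, Label):
        return [f"{p.name}:"]
    if isinstance(p, Jmp):
        return [f"  JMP {p.target}"]
    return ["  NOP"]
```

`show(Local(lambda l: [Label(l), Nop(), Jmp(l)]))` renders a tight loop; because labels are bound by Python lambdas, nested binders can never capture each other's labels.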

  21. Syntax matters • Cute use of notation in Coq: we can write assembler code (label binding, labels, a while macro) more or less in the syntax of real assemblers! • But we can also make use of Coq definitions and “macros”

  22. Assembling • Given an assembler program and an address at which to locate it, we can produce a sequence of bytes in the usual “two-pass” way
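A minimal sketch of the two-pass scheme, assuming (unlike real x86 assemblers, which may choose between short and near jumps) that every instruction has a fixed size. Pass 1 instantiates each local label with a placeholder just to learn where its mark lands; pass 2 re-instantiates the same binders, in the same order, with the real addresses and emits bytes. Only NOP (90) and JMP rel32 (E9) are supported, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Dict, Iterator, List, Optional, Union

@dataclass
class Instr:
    op: str              # "NOP" (1 byte) or "JMP" (5 bytes, E9 rel32)
    target: int = 0      # absolute target address for JMP

@dataclass
class Mark:
    label: int           # the label denotes the address of this point

@dataclass
class Local:
    body: Callable[[int], "Prog"]   # HOAS binder for a fresh local label

Prog = Union[Instr, Mark, Local, List["Prog"]]
SIZE = {"NOP": 1, "JMP": 5}

def encode(i: Instr, pos: int) -> List[int]:
    if i.op == "NOP":
        return [0x90]
    rel = (i.target - (pos + 5)) & 0xFFFFFFFF
    return [0xE9] + list(rel.to_bytes(4, "little"))

def walk(p: Prog, pos: int, labels: Iterator[int], marks: Dict[int, int],
         out: Optional[List[int]] = None) -> int:
    """Traverse p from address pos, recording marks and optionally emitting bytes."""
    if isinstance(p, list):
        for q in p:
            pos = walk(q, pos, labels, marks, out)
        return pos
    if isinstance(p, Local):
        return walk(p.body(next(labels)), pos, labels, marks, out)
    if isinstance(p, Mark):
        marks[p.label] = pos
        return pos
    if out is not None:
        out.extend(encode(p, pos))
    return pos + SIZE[p.op]

def assemble(p: Prog, offset: int) -> List[int]:
    # Pass 1: the i-th binder gets placeholder i; fixed sizes give each mark's address.
    marks: Dict[int, int] = {}
    supply = count()
    walk(p, offset, supply, marks)
    n_locals = next(supply)          # number of placeholders handed out
    missing = [i for i in range(n_locals) if i not in marks]
    if missing:
        raise ValueError(f"local labels never marked: {missing}")
    # Pass 2: same binders, same order, now instantiated with real addresses.
    out: List[int] = []
    walk(p, offset, iter([marks[i] for i in range(n_locals)]), {}, out)
    return out
```

Assembling a backward jump at 0x1000, `Local(lambda l: [Mark(l), Instr("NOP"), Instr("JMP", l)])`, yields 90 E9 FA FF FF FF.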

  23. Round-trip theorem • Roughly: if memory between offset and endpos contains bytes (the assembler’s output), then memory between offset and endpos decodes to prog • The statement of correctness uses an overloaded “points-to” predicate, described later

  24. Little languages Instead of trusting – or modelling – existing languages such as C, we plan to develop little languages inside Coq. We have experimented with a tiny imperative language and its “compiler”, proved correct in Coq

  25. Code demo!

  26. Reasoning about x86: Logic and Proof

  27. Big picture • Assertion logic: predicates on partial states, with the usual connectives plus separating conjunction • Specification logic over this, incorporating step-indexing and framing, with corresponding later and frame connectives • Safety specification used to give rules for instructions, in CPS style, packaged as Hoare-style triples for non-jumpy instructions • Our treatment of labels makes for elegant definitions and rules for macros (e.g. while, if)

  28. Partial states • Partiality denotes partial description, as usual for separation logic • Not to be confused with use of partiality for flags (undefined state) and memory (un-mapped or inaccessible)

  29. Assertion logic • Assertions (= SPred) are predicates on partial states • We define a separation logic of assertions, with the usual connectives • The points-to predicate for memory is overloaded for different “decoders” of memory: x could be a BYTE, a DWORD, a seq BYTE or even an Instr • Core definition: memory from p to q “decodes” to value x
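To make "predicates on partial states" concrete, here is a small Python sketch in which a partial state is a dictionary from resources (register names or addresses) to values, and P * Q holds when the state splits into two disjoint parts satisfying P and Q. This flattens the real machine state and uses a naive exponential search over splits; `points_to`, `emp` and `star` are illustrative names only.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable

State = Dict[Hashable, int]          # a partial state: only some resources present
SPred = Callable[[State], bool]      # an assertion is a predicate on partial states

def points_to(a: Hashable, v: int) -> SPred:
    """a |-> v: the state is exactly the single resource a holding v."""
    return lambda s: s == {a: v}

def emp(s: State) -> bool:
    """The empty assertion: holds only of the empty partial state."""
    return not s

def star(p: SPred, q: SPred) -> SPred:
    """p * q: the state splits into two disjoint parts satisfying p and q."""
    def holds(s: State) -> bool:
        keys = list(s)
        for k in range(len(keys) + 1):
            for left in combinations(keys, k):
                s1 = {x: s[x] for x in left}
                s2 = {x: s[x] for x in keys if x not in s1}
                if p(s1) and q(s2):
                    return True
        return False
    return holds
```

Because the two parts must be disjoint, `star(points_to("EAX", 1), points_to("EAX", 1))` is unsatisfiable: a resource cannot be owned twice.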

  30. Safety • Machine code does not “finish”, so the standard Hoare triple does not suit; also, code is mixed up with the store • So we define safe k P to mean “runs without faulting for k steps from any state satisfying P” • Example: tight loop • Example: jmp
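The definition of safe k P can be illustrated on a toy machine whose only state is EIP, with code stored in a dictionary and a fault whenever the machine fetches from an address holding no code. Here P is a plain set of states rather than an assertion on partial states, and the instruction set and names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Instr = Tuple  # ("NOP",) or ("JMP", target): a hypothetical toy instruction set

def step(code: Dict[int, Instr], eip: int) -> Optional[int]:
    """One step of a machine whose only state is EIP; None means a fault
    (here: fetching from an address that holds no code)."""
    instr = code.get(eip)
    if instr is None:
        return None
    if instr[0] == "JMP":
        return instr[1]
    return eip + 1                    # NOP occupies one byte

def safe(k: int, code: Dict[int, Instr], states: Iterable[int]) -> bool:
    """safe k P: from every state in P, k steps run without faulting."""
    for s in states:
        cur: Optional[int] = s
        for _ in range(k):
            cur = step(code, cur)
            if cur is None:
                return False
    return True
```

The tight loop `{0x10: ("JMP", 0x10)}` is safe from 0x10 for every k, whereas a NOP followed by a jump into empty memory is safe for two steps but not three.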

  31. Specification logic • It’s painful working directly with safe: we must work explicitly with the “step index” k and the “frame” R • Instead, we define a specification logic in which a spec is a set S of pairs (k, R) of a step index and a frame; in other words, it builds in steps and frames

  32. Connectives for spec logic • We define a frame connective • It gives us a “frame rule” for specs, and distributes over other connectives • To hide explicit step indices, we use a later connective and the Löb rule
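For orientation, the later connective and the Löb rule are usually stated as follows in step-indexed logics, here phrased over specs as sets of pairs (k, R); this is the standard form, not necessarily the exact formulation used in this development:

```latex
% Later: \triangleright S holds at step index k iff S holds at every smaller index.
(k, R) \in \triangleright S \iff \forall j < k.\; (j, R) \in S
\qquad
\frac{S' \wedge \triangleright S \;\vdash\; S}{S' \;\vdash\; S}\;(\text{L\"ob})
```

The rule is sound because ▷S holds vacuously at index 0, and at index k+1 it needs S only at indices up to k, so S follows at every index by induction.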

  33. Basic blocks • Given our definitions of safety and points-to for instructions, we can mimic Hoare-style triples for basic blocks • We can then derive familiar rules such as framing • This is useful when proving straight-line machine code

  34. Rules for instructions (I) • No control flow • Use a Hoare-like triple

  35. Rules for instructions (II) • Control flow • Two possible continuations • Explicit CPS-like use of safe

  36. Reasoning with labels • We overload “points-to” on assembler programs

  37. Macros Our representation of scoped labels makes it easy to define macros that make use of labels internally – and derive rules for them.
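As an illustration, here is a hypothetical `while_` macro in the same HOAS style, as a Python sketch: it allocates two private labels with nested binders, so every use of the macro gets fresh labels that cannot clash with the caller's. The instruction constructors, the printer and the jump-to-test loop shape are our own assumptions.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Iterator, List, Optional, Union

@dataclass
class Op:
    text: str                       # an opaque instruction, e.g. "DEC ECX"

@dataclass
class Jmp:
    cc: str                         # "" for unconditional, else a condition such as "NZ"
    target: object

@dataclass
class Label:
    name: object

@dataclass
class Local:
    body: Callable[[object], "Prog"]   # HOAS binder for a fresh label

Prog = Union[Op, Jmp, Label, Local, List["Prog"]]

def while_(test: Prog, cc: str, body: Prog) -> Prog:
    """Loop while, after `test`, condition `cc` holds; both labels are private."""
    return Local(lambda top: Local(lambda tst: [
        Jmp("", tst),               # jump straight to the test
        Label(top), body,
        Label(tst), test,
        Jmp(cc, top),               # loop back while the condition holds
    ]))

def show(p: Prog, fresh: Optional[Iterator[str]] = None) -> List[str]:
    """Render as assembler text, naming each bound label L0, L1, ..."""
    fresh = fresh if fresh is not None else (f"L{i}" for i in count())
    if isinstance(p, list):
        return [line for q in p for line in show(q, fresh)]
    if isinstance(p, Local):
        return show(p.body(next(fresh)), fresh)
    if isinstance(p, Label):
        return [f"{p.name}:"]
    if isinstance(p, Jmp):
        return [f"  J{p.cc or 'MP'} {p.target}"]
    return [f"  {p.text}"]
```

Using the same `loop = while_(Op("CMP ECX, 0"), "NZ", Op("DEC ECX"))` value twice, as in `show([loop, loop])`, gives labels L0/L1 for the first copy and L2/L3 for the second.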

  38. Putting it together: A spec for a memory allocator

  39. Trivial implementation of allocator

  40. Proof support • Very painful to work with assertions and specs using only primitive rules • We have built Coq tactic support for • Basic simplification of formulae (AC of *, etc.) • Pulling out existential quantifiers automatically • Greatly simplifies proving!

  41. Proof demo!

  42. Status • We can generate and prove correct tiny programs written in “Coq” assembler and a small while-language • Binaries generated by Coq can be run on “raw metal” (booted off a CD!) • Next steps • Model of I/O, e.g. screen/keyboard; currently our only “observable” is faulting • High-level model of processes • Build and verify OS components such as a scheduler, allocator and loader • Eventual aim: a process isolation theorem
