Formal Methods for Minimizing the DHOSA Trusted Computing Base

Formal Methods for Minimizing the DHOSA Trusted Computing Base Greg Morrisett, Harvard University with A.Chlipala, P.Govereau, G.Malecha, G.Tan, J.Tassorati, & J.B.Tristan

DHOSA Technologies We are investigating a variety of techniques to defend hosts: • Binary Translation & Instrumentation • High-Level Program Analysis & Rewriting • LLVM & Secure Virtual Architecture • New Hardware architectures How can we minimize the need to trust these components?

The role of formal methods • Ideally, we should have proofs that the tools are “correct”. • The consumer should be able to independently validate the proofs against the working system. • This raises three hard problems: • We need formal models of system components. • We need formal statements of “correctness”. • We need proofs that our enforcement/rewriting/analysis code (or hardware) are correct.

Some of our activities • Tools for formal modeling of machine architectures • Domain-specific languages embedded into Coq. • Give us declarative specs of machine-level syntax & semantics. • Give us executable specifications for model validation. • Give us the ability to formally reason about machine code. • Tools for proving correctness of binary-validation • Specifically, that a binary will respect an isolation policy. • e.g., SFI, CFI, XFI, NaCL, TAL, etc. • Tools for proving correctness of compilers. • New techniques for scalable proofs of correctness. • New techniques for legacy compilers.

Modeling Machine Architectures • Real machines (e.g., Intel’s IA64) are messy. • Even decoding instructions is hard to get right. • The semantics are not explained well (and not always understood.) • There are actually many different versions. • Yet to prove that a compiler or analysis or rewriting tool is correct, we need to be able to reason about real machine architectures. • And of course, we don’t just want Intel IA64. • Need IA32, AMD, ARM, … • And of course the specialized hardware that DHOSA is considering!

Currently • Various groups are building models of machines. • ACL2 group doing FP verification • Cambridge group studying relaxed memory models • NICTA group doing L4 verification • Inria group doing compiler verification • However, none of them really supports everything we need: • declarative formulation – crucial for formal reasoning • efficiently executable – crucial for testing and validation • completeness – crucial for systems-level work • reuse in reasoning – crucial for modeling many architectures

Our Approach • Two domain-specific languages (DSLs) • One for binary de-coding (parsing): bits -> ASTs • One for semantics: ASTs -> behavior • The DSLs are inspired by N. Ramsey’s work. • Sled andλ-RTL. • Ramsey’s work intended for generating compiler back-ends. • Our focus is on reasoning about compiler-like tools. • The DSLs are embedded into Coq. • lets us reason formally (in Coq) about parsing, semantics. • e.g., is decoding deterministic? • e.g., will this binary, when executed in this state, respect SFI? • the encoding lets us extract efficient ML code (i.e., a simulator)

Decoding??

Yacc in Coq via Combinators Definition CALL_p : parser instr := "1110" $"1000" $word @(fun w => CALL (Imm_opw) None) || "1111" $ "1111" $ ext_op_modrm ("010” ||"011”) @ (fun op => CALL op None) || "1001" $ "1010" $ halfword$ word @ (fun p => CALL (Imm_op (sndp)) (Some (fstp))).

X86 Integer Instruction Decoder Definition instr_parser:= AAA_p || AAD_p || AAM_p || AAS_p || ADC_p || ADD_p || AND_p || CMP_p || OR_p || SBB_p || SUB_p || XOR_p || ARPL_p || BOUND_p || BSF_p || BSR_p || BSWAP_p || BT_p || BTC_p || BTR_p || BTS_p || CALL_p || CBW_p || CDQ_p || CLC_p || CLD_p || CLI_p || CMC_p || CMPS_p || CMPXCHG_p || CPUID_p || CWD_p || CWDE_p || DAA_p || DAS_p || DEC_p || DIV_p || HLT_p || IDIV_p || IMUL_p || IN_p || INC_p || INS_p || INTn_p || INT_p || INTO_p || INVD_p || INVLPG_p || IRET_p || Jcc_p || JCXZ_p || JMP_p || LAHF_p || LAR_p || LDS_p || LEA_p || LEAVE_p || LES_p || LFS_p || LGDT_p || LGS_p || LIDT_p || LLDT_p || LMSW_p || LOCK_p || LODS_p || LOOP_p || LOOPZ_p || LOOPNZ_p || LSL_p || LSS_p || LTR_p || MOV_p || MOVCR_p || MOVDR_p || MOVSR_p || MOVBE_p || MOVS_p || MOVSX_p || MOVZX_p || MUL_p || NEG_p || NOP_p || NOT_p || OUT_p || OUTS_p || POP_p || POPSR_p || POPA_p || POPF_p || PUSH_p || PUSHSR_p || PUSHA_p || PUSHF_p || RCL_p || RCR_p || RDMSR_p || RDPMC_p || RDTSC_p || RDTSCP_p || REPINS_p || REPLODS_p || REPMOVS_p || REPOUTS_p || REPSTOS_p || REPECMPS_p || REPESCAS_p || REPNECMPS_p || REPNESCAS_p || RET_p || ROL_p || ROR_p || RSM_p || SAHF_p || SAR_p || SCAS_p || SETcc_p || SGDT_p || SHL_p || SHLD_p || SHR_p || SHRD_p || SIDT_p || SLDT_p || SMSW_p || STC_p || STD_p || STI_p || STOS_p || STR_p || TEST_p || UD2_p || VERR_p || VERW_p || WAIT_p || WBINVD_p || WRMSR_p || XADD_p || XCHG_p || XLAT_p.

Parsing Semantics • The declarative syntax helps get things right. • we can literally scrape manuals to get decoders. • though it’s far from sufficient – manuals have bugs! • It’s possible to give a simple functional interpretation of the parsing combinators (a la Haskell). • parser T := string -> FinSet(string * T) • Makes it very easy to reason about parsers and prove things like || is associative and commutative. • or e.g., that Intel’s manuals are deterministic (they are not). • But it’s not very efficient. • in essence, does backtracking search. • and is working at the bit level. • we want to be able to extract efficient code.

Proven Correct Parser Generators • So as in Yacc or other parser generator tools, we are compiling the DSL for syntax specification into an efficient program. • We use on-the fly calculation and memoization of parsing derivatives a la Brzozowski and more recently, Might & Darais. • In essence, lazily construct the DFA. • Importantly, we are able to prove the correctness of this translation within Coq. • To be honest, we’ve only done recognition, not parsing so far. • And are still working at the bit-level instead of byte level. • Bottom line: don’t have to trust that the “yacc” compilation is right.

Semantics The usual style for machines is a small-step, operational semantics. M(R1(pc)) = a parse(M,a) = i (M,R1,i) (M’,R1’) (M,R1 || R2 || … || Rn)  (M’,R1’ || R2 || … || Rn) This makes it easy to specify non-determinism and reason about the fine-grained behavior of the machine. But doesn’t really give us an efficient executable. Nor reusable reasoning.

Our approach Write a monadic denotational semantics for instructions: Definitionstep_AND(op1 op2:operand) := w1 <- get_op32 op1 ; w2 <- get_op32 op2 ; let res := Word32.Int.and w1 w2 in set_op32 op1 res ;; set_flag OF false ;; set_flag CF false ;; set_flagZF (is_zero32 res) ;; set_flag SF (is_signed32 res) ;; set_flag PF (parity res) ;; b <- next_oracle_bit ; set_flag AF b

Reasoning versus Validation • The monadic operations can be interpreted as pure functions over oracles and machine states. • The monadic operations are essentially RTLs over bit-vectors. • The infrastructure can be re-used across a wide variety of machine architectures. • i.e., defining and reasoning about machine architecture semantics becomes relatively easy. • But we can extract efficient ML code for testing the model against other simulators & real machines. • e.g., in-place updates for state changes instead of functional data structures.

Using the models: SFI • Stephen McCamant showed how to do software-based fault isolation on the x86 a few years ago. • mask the high-bits of every store/jump to ensure a piece of untrusted code stays in its sandbox. • tricky: must consider every parse of the x86 code. • by enforcing an alignment convention, ensures there’s only one parse. • security depends on the “checker” which verifies these properties. • Google adopts SFI for Native Client. • Our goal: produce a proof that the checker only says “ok” on code which, when executed, respects the sandbox policy.

Thus far… • Focus: Formal methods for modeling real machines. • DSLs for instruction decoding, instruction semantics. • Yield both formal reasoning & efficient execution. • Allows us to prove correctness of binary-level tools like the SFI checker. • Another Focus: compiler correctness • Crucial for eliminating language-based techniques from TCB. • For example, the Illinois group’s secure virtual architecture depends upon the correctness of the LLVM compiler.

To Date • Gold standard was Leory’sCompcert Compiler • (mildly) optimizing compiler for C to x86, ARM, PPC • models of these languages & architectures • proof of correctness • See J.Regher’s compiler bug paper at PLDI. • However: • machine models are incomplete, unvalidated • optimization at O1 levels but not O3 • proofs are roughly 17x the size of the code!

Last Time Post Doc Adam Chlipala’s work on lambda-tamer: • compiler from core-ML to MIPS-like machine • transformations like CPS and closure-conversion • breakthrough: proofs are roughly same size as code • clever language representations avoid tedious proofs about variables, scope, binding. • clever language semantics makes reasoning simpler, more uniform. • clever tactic-based reasoning makes proofs mostly automatic, and far more extensible.

Current Work: • We have built a version of LLVM where the optimizer is provably correct (see PLDI’11 paper). • to be fair, only intra-procedural optimizations • but includes global value numbering, sparse conditional constant propagation, advanced dead code elimination, loop invariant code motion, loop deletion, loop unrolling, and dead-store elimination. • The “proof” is completely automated. • in essence, we have a way to effectively prove that the input to the optimizer has the same behavior as the output. • or more properly, when we can’t, we don’t optimize the code. • The prover knows nothing about the internals of the LLVM optimizer. • so it’s easy to change LLVM, or add new optimizations.

LLVM Translation Validation LLVM Optimizer LLVM front-ends code generator equivalence checker

How do we do this? • Convert LLVM’sSSA-based intermediate language into a categorical value graph representation. • similar to circuit representations (think BDDs). • but incorporates loops by lifting everything to the level of streams of values. • allows us to reason equationally about both data and control. • similar to work of Sorin Lerner on PEGs. • Take advantage of category theory to normalize the input and output graphs, and check for equivalence. • this gives us many equivalences for free, such as common sub-expressions and loop-invariant computations. • but still need to normalize underlying scalar computations. • The key challenge is getting this to scale to big functions.

% of Functions Validated on all Opts. Fail: we fail to translate LLVM’sIR into our representation Alarm: we fail to validate the translation OK: we validate the translation and there are significant differences Boring: we validate but the differences are minimal

Quick Recap • DHOSA relies upon compilers, rewriting, analysis, and other software tools to provide protection. • Our goal is to increase assurance in these tools. • provide detailed formal models of machines • prove correctness of key components • find techniques for automating proofs • The hope is that these investments will pay off, not just for this project but others. • e.g., IARPAStonesup, DARPA CRASH

Formal Methods for Minimizing the DHOSA Trusted Computing Base