Formal Methods for Minimizing the DHOSA Trusted Computing Base

Formal Methods for Minimizing the DHOSA Trusted Computing Base Greg Morrisett, Harvard University with A.Chlipala, P.Govereau, G.Malecha, G.Tan, J.Tassorati, & J.B.Tristan

SVA Cryptographic secure computation e.g., Enforce properties on a malicious OS Binary translation andemulation Data-centric security e.g., Enable complex distributed systems, with resilience to hostile OS’s Formal methods Secure browser appliance transformation Hardware support for isolation Secure servers e.g., Prevent dataexfiltration Dealing with malicious hardware web-based architectures HARDWARE SYstem architectures

DHOSA Technologies We are investigating a variety of techniques to defend hosts: • Binary Translation & Instrumentation • LLVM & Secure Virtual Architecture • New Hardware architectures How can we minimize the need to trust these components?

The role of formal methods • Ideally, we should have proofs that the tools are “correct”. • The consumer should be able to independently validate the proofs against the working system. • This raises three hard problems: • We need formal models of system components. • We need formal statements of “correctness”. • We need proofs that our enforcement/rewriting/analysis code (or hardware) are correct.

Some of our activities • Tools for formal modeling of machine architectures • Domain-specific languages embedded into Coq. • Give us declarative specs of machine-level syntax & semantics. • Give us executable specifications for model validation. • Give us the ability to formally reason about machine code. • Tools for proving correctness of binary-validation • Specifically, that a binary will respect an isolation policy. • e.g., SFI, CFI, XFI, NaCL, TAL, etc. • Tools for proving correctness of compilers. • New techniques for scalable proofs of correctness. • New techniques for legacy compilers.

Modeling Machine Architectures • Real machines (e.g., Intel’s IA64) are messy. • Even decoding instructions is hard to get right. • The semantics are not explained well (and not always understood.) • There are actually many different versions. • Yet to prove that a compiler or analysis or rewriting tool is correct, we need to be able to reason about real machine architectures. • And of course, we don’t just want Intel IA64. • Need IA32, AMD, ARM, … • And of course the specialized hardware that DHOSA is considering!

Currently • Various groups are building models of machines. • ACL2 group doing FP verification • Cambridge group studying relaxed memory models • NICTA group doing L4 verification • Inria group doing compiler verification • However, none of them really supports everything we need: • declarative formulation – crucial for formal reasoning • efficiently executable – crucial for testing and validation • completeness – crucial for systems-level work • reuse in reasoning – crucial for modeling many architectures

Our Approach • Two domain-specific languages (DSLs) • One for binary de-coding (parsing): bits -> ASTs • One for semantics: ASTs -> behavior • The DSLs are inspired by N. Ramsey’s work. • Sled andλ-RTL. • Ramsey’s work intended for generating compiler back-ends. • Our focus is on reasoning about compiler-like tools. • The DSLs are embedded into Coq. • lets us reason formally (in Coq) about parsing, semantics. • e.g., is decoding deterministic? • e.g., will this binary, when executed in this state, respect SFI? • the encoding lets us extract efficient ML code (i.e., a simulator)

Decoding??

Yacc in Coq via Combinators Definition CALL_p : parser instr := "1110" $ "1000" $ word @(fun w => CALL (Imm_opw) None) || "1111" $ "1111" $ ext_op_modrm (str ”010” || str ”011”) @ (fun op => CALL op None) || "1001" $ "1010" $ halfword$$ word @ (fun p => CALL (Imm_op (sndp)) (Some (fstp))).

X86 Integer Instruction Decoder Definition instr_parser := AAA_p || AAD_p || AAM_p || AAS_p || ADC_p || ADD_p || AND_p || CMP_p || OR_p || SBB_p || SUB_p || XOR_p || ARPL_p || BOUND_p || BSF_p || BSR_p || BSWAP_p || BT_p || BTC_p || BTR_p || BTS_p || CALL_p || CBW_p || CDQ_p || CLC_p || CLD_p || CLI_p || CMC_p || CMPS_p || CMPXCHG_p || CPUID_p || CWD_p || CWDE_p || DAA_p || DAS_p || DEC_p || DIV_p || HLT_p || IDIV_p || IMUL_p || IN_p || INC_p || INS_p || INTn_p || INT_p || INTO_p || INVD_p || INVLPG_p || IRET_p || Jcc_p || JCXZ_p || JMP_p || LAHF_p || LAR_p || LDS_p || LEA_p || LEAVE_p || LES_p || LFS_p || LGDT_p || LGS_p || LIDT_p || LLDT_p || LMSW_p || LOCK_p || LODS_p || LOOP_p || LOOPZ_p || LOOPNZ_p || LSL_p || LSS_p || LTR_p || MOV_p || MOVCR_p || MOVDR_p || MOVSR_p || MOVBE_p || MOVS_p || MOVSX_p || MOVZX_p || MUL_p || NEG_p || NOP_p || NOT_p || OUT_p || OUTS_p || POP_p || POPSR_p || POPA_p || POPF_p || PUSH_p || PUSHSR_p || PUSHA_p || PUSHF_p || RCL_p || RCR_p || RDMSR_p || RDPMC_p || RDTSC_p || RDTSCP_p || REPINS_p || REPLODS_p || REPMOVS_p || REPOUTS_p || REPSTOS_p || REPECMPS_p || REPESCAS_p || REPNECMPS_p || REPNESCAS_p || RET_p || ROL_p || ROR_p || RSM_p || SAHF_p || SAR_p || SCAS_p || SETcc_p || SGDT_p || SHL_p || SHLD_p || SHR_p || SHRD_p || SIDT_p || SLDT_p || SMSW_p || STC_p || STD_p || STI_p || STOS_p || STR_p || TEST_p || UD2_p || VERR_p || VERW_p || WAIT_p || WBINVD_p || WRMSR_p || XADD_p || XCHG_p || XLAT_p.

Parsing Semantics • The declarative syntax helps get things right. • we can literally scrape manuals to get decoders. • though it’s far from sufficient – manuals have bugs! • It’s possible to give a simple functional interpretation of the parsing combinators (a la Haskell). • parser T := string -> FinSet(string * T) • Makes it very easy to reason about parsers and prove things like || is associative and commutative. • or e.g., that Intel’s manuals are deterministic (they are not). • But it’s not very efficient. • in essence, does backtracking search. • and is working at the bit level. • we want to be able to extract efficient code.

Proven Correct Parser Generators • So as in Yacc or other parser generator tools, we are compiling the DSL for syntax specification into an efficient program. • We use on-the fly calculation and memoization of parsing derivatives a la Brzozowski and more recently, Might & Darais. • In essence, lazily construct the DFA. • Importantly, we are able to prove the correctness of this translation within Coq. • To be honest, we’ve only done recognition, not parsing so far. • And are still working at the bit-level instead of byte level. • Bottom line: don’t have to trust that the “yacc” compilation is right.

Semantics The usual style for machines is a small-step, operational semantics. M(R1(pc)) = a parse(M,a) = i (M,R1,i)  (M’,R1’) (M,R1 || R2 || … || Rn)  (M’,R1’ || R2 || … || Rn) This makes it easy to specify non-determinism and reason about the fine-grained behavior of the machine. But doesn’t really give us an efficient executable. Nor reusable reasoning.

Our approach Write a monadic denotational semantics for instructions: Definition step_AND(op1 op2:operand) := w1 <- get_op32 op1 ; w2 <- get_op32 op2 ; let res := Word32.Int.and w1 w2 in set_op32 op1 res ;; set_flag OF false ;; set_flag CF false ;; set_flagZF (is_zero32 res) ;; set_flag SF (is_signed32 res) ;; set_flag PF (parity res) ;; b <- next_oracle_bit ; set_flag AF b

Reasoning versus Validation • The monadic operations can be interpreted as pure functions over oracles and machine states. • The monadic operations are essentially RTLs over bit-vectors. • The infrastructure can be re-used across a wide variety of machine architectures. • i.e., defining and reasoning about machine architecture semantics becomes relatively easy. • But we can extract efficient ML code for testing the model against other simulators & real machines. • e.g., in-place updates for state changes instead of functional data structures. • In the next talk, you’ll hear one way we can validate our semantics against other simulators.

Using the models: SFI • SFI is a simple kind of binary-rewriting for enforcing an isolation policy. • good baseline for us to study • mask the high-bits of every store/jump to ensure a piece of untrusted code stays in its sandbox. • tricky: must consider every parse of the x86 code. • by enforcing an alignment convention, ensures there’s only one parse. • security depends on the “checker” which verifies these properties. • Our goal: produce a proof that the checker only says “ok” on code which, when executed, respects the sandbox policy.

Thus far… • Focus: Formal methods for modeling real machines. • DSLs for instruction decoding, instruction semantics. • Yield both formal reasoning & efficient execution. • Allows us to prove correctness of binary-level tools like the SFI checker. • Another Focus: compiler correctness • Crucial for eliminating language-based techniques from TCB. • For example, the Illinois group’s secure virtual architecture depends upon the correctness of the LLVM compiler.

To Date • Gold standard was Leroy’s Compcert Compiler • (mildly) optimizing compiler for C to x86, ARM, PPC • models of these languages & architectures • proof of correctness • See J.Regher’s compiler bug paper at PLDI. • However: • machine models are incomplete, unvalidated • optimization at O1 levels but not O3 • proofs are roughly 17x the size of the code!

Last Time Post Doc Adam Chlipala’s work on lambda-tamer: • compiler from core-ML to MIPS-like machine • transformations like CPS and closure-conversion • breakthrough: |proofs| ≈ |code| • clever language representations avoid tedious proofs about variables, scope, binding. • clever language semantics makes reasoning simpler, more uniform. • clever tactic-based reasoning makes proofs mostly automatic, and far more extensible.

Current Work: • We have built a version of LLVM where the optimizer is provably correct (see PLDI’11 paper). • to be fair, only intra-procedural optimizations • but includes global value numbering, sparse conditional constant propagation, advanced dead code elimination, loop invariant code motion, loop deletion, loop unrolling, and dead-store elimination. • The “proof” is completely automated. • in essence, we have a way to effectively prove that the input to the optimizer has the same behavior as the output. • or more properly, when we can’t, we don’t optimize the code. • The prover knows nothing about the internals of the LLVM optimizer. • so it’s easy to change LLVM, or add new optimizations.

LLVM Translation Validation LLVM Optimizer LLVM front-ends code generator equivalence checker

How do we do this? • Convert LLVM’sSSA-based intermediate language into a categorical value graph representation. • similar to circuit representations (think BDDs). • but incorporates loops by lifting everything to the level of streams of values. • allows us to reason equationally about both data and control. • similar to work of Sorin Lerner on PEGs. • Take advantage of category theory to normalize the input and output graphs, and check for equivalence. • this gives us many equivalences for free, such as common sub-expressions and loop-invariant computations. • but still need to normalize underlying scalar computations. • The key challenge is getting this to scale to big functions.

% of Functions Validated on all Opts. Fail: we fail to translate LLVM’sIR into our representation Alarm: we fail to validate the translation OK: we validate the translation and there are significant differences Boring: we validate but the differences are minimal

Quick Recap • DHOSA relies upon compilers, rewriting, analysis, and other software tools to provide protection. • Our goal is to increase assurance in these tools. • provide detailed formal models of machines • prove correctness of key components • find techniques for automating proofs • The hope is that these investments will pay off, not just for this project but others. • e.g., IARPAStonesoup, DARPA CRASH

Formal Methods for Minimizing the DHOSA Trusted Computing Base