Formal Methods for Minimizing the DHOSA Trusted Computing Base Greg Morrisett, Harvard University with A.Chlipala, P.Govereau, G.Malecha, G.Tan, J.Tassoratti, & J.B.Tristan
[Figure: DHOSA overview diagram. Labels: SVA; cryptographic secure computation; binary translation and emulation; data-centric security; formal methods; secure browser appliance; transformation; hardware support for isolation; secure servers; dealing with malicious hardware; web-based architectures; hardware; system architectures. Example goals: enforce properties on a malicious OS; enable complex distributed systems, with resilience to hostile OS’s; prevent data exfiltration.]
DHOSA Technologies We are investigating a variety of techniques to defend hosts: • Binary Translation & Instrumentation • LLVM & Secure Virtual Architecture • New Hardware architectures How can we minimize the need to trust these components?
The role of formal methods • Ideally, we should have proofs that the tools are “correct”. • The consumer should be able to independently validate the proofs against the working system. • This raises three hard problems: • We need formal models of system components. • We need formal statements of “correctness”. • We need proofs that our enforcement/rewriting/analysis code (or hardware) is correct.
Some of our activities • Tools for formal modeling of machine architectures • Domain-specific languages embedded into Coq. • Give us declarative specs of machine-level syntax & semantics. • Give us executable specifications for model validation. • Give us the ability to formally reason about machine code. • Tools for proving correctness of binary-validation • Specifically, that a binary will respect an isolation policy. • e.g., SFI, CFI, XFI, NaCl, TAL, etc. • Tools for proving correctness of compilers. • New techniques for scalable proofs of correctness. • New techniques for legacy compilers.
Modeling Machine Architectures • Real machines (e.g., Intel’s IA64) are messy. • Even decoding instructions is hard to get right. • The semantics are not explained well (and not always understood.) • There are actually many different versions. • Yet to prove that a compiler or analysis or rewriting tool is correct, we need to be able to reason about real machine architectures. • And of course, we don’t just want Intel IA64. • Need IA32, AMD, ARM, … • And of course the specialized hardware that DHOSA is considering!
Currently • Various groups are building models of machines. • ACL2 group doing FP verification • Cambridge group studying relaxed memory models • NICTA group doing L4 verification • Inria group doing compiler verification • However, none of them really supports everything we need: • declarative formulation – crucial for formal reasoning • efficiently executable – crucial for testing and validation • completeness – crucial for systems-level work • reuse in reasoning – crucial for modeling many architectures
Our Approach • Two domain-specific languages (DSLs) • One for binary decoding (parsing): bits -> ASTs • One for semantics: ASTs -> behavior • The DSLs are inspired by N. Ramsey’s work. • SLED and λ-RTL. • Ramsey’s work was intended for generating compiler back-ends. • Our focus is on reasoning about compiler-like tools. • The DSLs are embedded into Coq. • lets us reason formally (in Coq) about parsing, semantics. • e.g., is decoding deterministic? • e.g., will this binary, when executed in this state, respect SFI? • the encoding lets us extract efficient ML code (i.e., a simulator)
Yacc in Coq via Combinators
    Definition CALL_p : parser instr :=
        "1110" $ "1000" $ word @ (fun w => CALL (Imm_op w) None)
     || "1111" $ "1111" $ ext_op_modrm (str "010" || str "011") @ (fun op => CALL op None)
     || "1001" $ "1010" $ halfword $$ word @ (fun p => CALL (Imm_op (snd p)) (Some (fst p))).
X86 Integer Instruction Decoder Definition instr_parser := AAA_p || AAD_p || AAM_p || AAS_p || ADC_p || ADD_p || AND_p || CMP_p || OR_p || SBB_p || SUB_p || XOR_p || ARPL_p || BOUND_p || BSF_p || BSR_p || BSWAP_p || BT_p || BTC_p || BTR_p || BTS_p || CALL_p || CBW_p || CDQ_p || CLC_p || CLD_p || CLI_p || CMC_p || CMPS_p || CMPXCHG_p || CPUID_p || CWD_p || CWDE_p || DAA_p || DAS_p || DEC_p || DIV_p || HLT_p || IDIV_p || IMUL_p || IN_p || INC_p || INS_p || INTn_p || INT_p || INTO_p || INVD_p || INVLPG_p || IRET_p || Jcc_p || JCXZ_p || JMP_p || LAHF_p || LAR_p || LDS_p || LEA_p || LEAVE_p || LES_p || LFS_p || LGDT_p || LGS_p || LIDT_p || LLDT_p || LMSW_p || LOCK_p || LODS_p || LOOP_p || LOOPZ_p || LOOPNZ_p || LSL_p || LSS_p || LTR_p || MOV_p || MOVCR_p || MOVDR_p || MOVSR_p || MOVBE_p || MOVS_p || MOVSX_p || MOVZX_p || MUL_p || NEG_p || NOP_p || NOT_p || OUT_p || OUTS_p || POP_p || POPSR_p || POPA_p || POPF_p || PUSH_p || PUSHSR_p || PUSHA_p || PUSHF_p || RCL_p || RCR_p || RDMSR_p || RDPMC_p || RDTSC_p || RDTSCP_p || REPINS_p || REPLODS_p || REPMOVS_p || REPOUTS_p || REPSTOS_p || REPECMPS_p || REPESCAS_p || REPNECMPS_p || REPNESCAS_p || RET_p || ROL_p || ROR_p || RSM_p || SAHF_p || SAR_p || SCAS_p || SETcc_p || SGDT_p || SHL_p || SHLD_p || SHR_p || SHRD_p || SIDT_p || SLDT_p || SMSW_p || STC_p || STD_p || STI_p || STOS_p || STR_p || TEST_p || UD2_p || VERR_p || VERW_p || WAIT_p || WBINVD_p || WRMSR_p || XADD_p || XCHG_p || XLAT_p.
Parsing Semantics • The declarative syntax helps get things right. • we can literally scrape manuals to get decoders. • though it’s far from sufficient – manuals have bugs! • It’s possible to give a simple functional interpretation of the parsing combinators (a la Haskell). • parser T := string -> FinSet(string * T) • allows us to extract executable code for testing. • Makes it very easy to reason about parsers and prove things like || is associative and commutative. • or e.g., that Intel’s manuals are deterministic (they are not).
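The functional reading of the combinators, parser T := string -> FinSet(string * T), can be sketched outside Coq. The following is a minimal Python illustration, not the Coq development itself: the names bits, alt, seq, fmap and is_deterministic_on are hypothetical stand-ins for the slide's literals, ||, $$ and @, and the two toy instructions use the one-byte x86 encodings of NOP (0x90) and HLT (0xF4).

```python
from typing import Any, Callable, FrozenSet, Tuple

# A parser maps a bit-string to the finite set of (remaining input, value) results.
Parser = Callable[[str], FrozenSet[Tuple[str, Any]]]

def bits(pattern: str) -> Parser:
    """Match a literal bit pattern and return it as the parsed value."""
    def run(s: str):
        if s.startswith(pattern):
            return frozenset({(s[len(pattern):], pattern)})
        return frozenset()
    return run

def alt(p: Parser, q: Parser) -> Parser:
    """Choice (the slide's ||): the union of both result sets."""
    return lambda s: p(s) | q(s)

def seq(p: Parser, q: Parser) -> Parser:
    """Sequencing (the slide's $$): run q on what p leaves, pairing the values."""
    return lambda s: frozenset(
        (rest2, (v1, v2)) for rest1, v1 in p(s) for rest2, v2 in q(rest1)
    )

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Semantic action (the slide's @): apply f to every parsed value."""
    return lambda s: frozenset((rest, f(v)) for rest, v in p(s))

def is_deterministic_on(p: Parser, s: str) -> bool:
    """A parser is deterministic on s if it yields at most one result."""
    return len(p(s)) <= 1

# Toy decoder for two one-byte instructions.
nop = fmap(seq(bits("1001"), bits("0000")), lambda _: "NOP")
hlt = fmap(seq(bits("1111"), bits("0100")), lambda _: "HLT")
decoder = alt(nop, hlt)

# A deliberately ambiguous grammar: "10" parses two different ways.
ambiguous = alt(fmap(bits("1"), lambda _: "A"), fmap(bits("10"), lambda _: "B"))
```

Because results are sets, alt(p, q) and alt(q, p) agree on every input by commutativity of set union, which is the kind of fact the slide says becomes easy to prove. The ambiguous grammar shows how a determinism check can fail: on input "10" it returns two results.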
Semantics The usual style for machines is a small-step, operational semantics.
\[
\frac{M(R_1(\mathit{pc})) = a \qquad \mathit{parse}(M, a) = i \qquad (M, R_1, i) \longrightarrow (M', R_1')}
     {(M,\; R_1 \parallel R_2 \parallel \cdots \parallel R_n) \longrightarrow (M',\; R_1' \parallel R_2 \parallel \cdots \parallel R_n)}
\]
This makes it easy to specify non-determinism and reason about the fine-grained behavior of the machine. But it doesn’t really give us an efficient executable, nor reusable reasoning.
Our approach Write a monadic denotational semantics for instructions:
    Definition step_AND (op1 op2 : operand) :=
      w1 <- get_op32 op1 ;
      w2 <- get_op32 op2 ;
      let res := Word32.Int.and w1 w2 in
      set_op32 op1 res ;;
      set_flag OF false ;;
      set_flag CF false ;;
      set_flag ZF (is_zero32 res) ;;
      set_flag SF (is_signed32 res) ;;
      set_flag PF (parity res) ;;
      b <- next_oracle_bit ;
      set_flag AF b.
Reasoning versus Validation • The monadic operations can be interpreted as pure functions over oracles and machine states. • The monadic operations are essentially RTLs over bit-vectors. • The infrastructure can be re-used across a wide variety of machine architectures. • i.e., defining and reasoning about machine architecture semantics becomes relatively easy. • But we can extract efficient ML code for testing the model against other simulators & real machines. • e.g., in-place updates for state changes instead of functional data structures. • in particular, we can leverage the work that Stephen talked about to do better validation.
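To make the "pure functions over oracles and machine states" reading concrete, here is a small Python sketch of the AND step as a function from one state to the next. It is an illustration under stated assumptions, not the Coq model: the State type and step_and are hypothetical, operands are restricted to registers, and parity follows the x86 convention (PF is set when the low byte of the result has an even number of 1 bits). The undefined AF flag is filled from an explicit oracle, mirroring next_oracle_bit on the slide.

```python
from typing import Dict, List, NamedTuple

MASK32 = 0xFFFFFFFF

class State(NamedTuple):
    regs: Dict[str, int]     # 32-bit register file
    flags: Dict[str, bool]   # status flags
    oracle: List[bool]       # remaining oracle bits, consumed front to back

def parity(word: int) -> bool:
    """x86 PF convention: true when the low byte has an even number of 1 bits."""
    return bin(word & 0xFF).count("1") % 2 == 0

def step_and(state: State, dst: str, src: str) -> State:
    """Pure step function for AND dst, src; returns a new state, never mutates."""
    res = state.regs[dst] & state.regs[src] & MASK32
    af, rest = state.oracle[0], state.oracle[1:]   # AF is undefined: ask the oracle
    regs = {**state.regs, dst: res}
    flags = {
        **state.flags,
        "OF": False,
        "CF": False,
        "ZF": res == 0,
        "SF": bool(res >> 31),
        "PF": parity(res),
        "AF": af,
    }
    return State(regs, flags, rest)

s0 = State({"EAX": 0xF0F0F0F0, "EBX": 0x0F0F0F0F}, {}, [True, False])
s1 = step_and(s0, "EAX", "EBX")
```

Running the same step with a different oracle yields every behavior the model allows for AF, which is what makes the nondeterminism explicit without giving up executability. An extracted simulator could instead update state in place, as the slide notes.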
Example Application: Google’s NaCl • NaCl uses software-fault isolation (SFI) to enforce an isolation policy. • good baseline for us to study • mask the high-bits of every store/jump to ensure a piece of untrusted code stays in its sandbox. • tricky: must consider every parse of the x86 code. • by enforcing an alignment convention, ensures there’s only one parse (McCamant). • security depends on the “checker” which verifies these properties. • Our goal: build and prove correctness of the checker.
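The masking idea can be illustrated with a few lines of Python. This is a generic SFI sketch rather than NaCl's actual instruction sequences: SANDBOX_BASE, SANDBOX_BITS and the helper names are hypothetical, and the 32-byte bundle size is used as the alignment unit for jump targets.

```python
SANDBOX_BASE = 0x10000000   # hypothetical base, aligned to the sandbox size
SANDBOX_BITS = 24           # hypothetical 16 MiB sandbox
BUNDLE = 32                 # jump targets must be bundle-aligned

def in_sandbox(addr: int) -> bool:
    """True when addr lies inside [SANDBOX_BASE, SANDBOX_BASE + 2**SANDBOX_BITS)."""
    return SANDBOX_BASE <= addr < SANDBOX_BASE + (1 << SANDBOX_BITS)

def mask_store(addr: int) -> int:
    """Force the high bits of a store address to the sandbox base."""
    return SANDBOX_BASE | (addr & ((1 << SANDBOX_BITS) - 1))

def mask_jump(addr: int) -> int:
    """Force a jump target into the sandbox and onto a bundle boundary."""
    return mask_store(addr) & ~(BUNDLE - 1)
```

Whatever address untrusted code computes, the masked address stays inside the sandbox, and a masked jump always lands on a bundle boundary, so execution can only resume at an offset the checker has already parsed. An address that is already in the sandbox passes through mask_store unchanged.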
Our Verified Checker • We generated a checker that is: • declarative • easy to update • provably correct w.r.t. our x86 model • except that it contains ~80 lines of trusted C code • smaller and faster than Google’s checker • Google’s checker about 600 lines of trusted C code • about 3x faster on a 200Kloc C program • Basic idea: • generate a DFA that accepts only correctly rewritten programs. • the DFA is encoded as a set of tables, which are proven correct. • only the DFA driver is trusted.
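The split between proven tables and a small trusted driver can be sketched as follows. This is a toy Python illustration, not the generated checker: the three-symbol alphabet (M for a mask, J for an indirect jump, O for any other instruction), the TABLE contents and run_dfa are all hypothetical, and the policy encoded is simply "every indirect jump is immediately preceded by a mask".

```python
REJECT = -1

# Toy transition table. State 0: no pending mask. State 1: a mask was just seen.
TABLE = {
    0: {"M": 1, "O": 0, "J": REJECT},   # a jump without a preceding mask is rejected
    1: {"M": 1, "O": 0, "J": 0},        # a jump right after a mask is allowed
}
ACCEPTING = {0, 1}

def run_dfa(table, accepting, start, symbols) -> bool:
    """Table-driven DFA driver: the only logic that would need to be trusted."""
    state = start
    for sym in symbols:
        state = table[state].get(sym, REJECT)   # unknown symbols are rejected
        if state == REJECT:
            return False
    return state in accepting

def check(program: str) -> bool:
    return run_dfa(TABLE, ACCEPTING, 0, program)
```

The driver is a loop over table lookups and knows nothing about the policy, so changing the policy means regenerating (and re-proving) the tables while the trusted code stays the same.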
Thus far… • Focus: Formal methods for modeling real machines. • DSLs for instruction decoding, instruction semantics. • Yield both formal reasoning & efficient execution. • Allows us to prove correctness of binary-level tools like the SFI checker. • Another Focus: compiler correctness • Crucial for eliminating language-based techniques from TCB. • For example, the Illinois group’s secure virtual architecture depends upon the correctness of the LLVM compiler.
To Date • Gold standard was Leroy’s CompCert compiler • (mildly) optimizing compiler for C to x86, ARM, PPC • models of these languages & architectures • proof of correctness • See J. Regehr’s compiler bug paper at PLDI. • However: • machine models are incomplete, unvalidated • optimization at O1 levels but not O3 • proofs are roughly 17x the size of the code!
Earlier Work Post-Doc (now MIT faculty member) Adam Chlipala’s work on lambda-tamer: • compiler from core-ML to MIPS-like machine • transformations like CPS and closure-conversion • breakthrough: |proofs| ≈ |code| • clever language representations avoid tedious proofs about variables, scope, binding. • clever language semantics makes reasoning simpler, more uniform. • clever tactic-based reasoning makes proofs mostly automatic, and far more extensible.
Current Work: • We have built a version of LLVM where the optimizer is provably correct (see PLDI’11 paper). • to be fair, only intra-procedural optimizations • but includes global value numbering, sparse conditional constant propagation, advanced dead code elimination, loop invariant code motion, loop deletion, loop unrolling, and dead-store elimination. • The “proof” is completely automated. • in essence, we have a way to effectively prove that the input to the optimizer has the same behavior as the output. • or more properly, when we can’t, we don’t optimize the code. • The prover knows nothing about the internals of the LLVM optimizer. • so it’s easy to change LLVM, or add new optimizations.
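The "when we can't, we don't optimize" policy amounts to wrapping an untrusted optimizer in a checker. The Python sketch below shows only that control structure under loud assumptions: validated_optimize and equivalent_8bit are hypothetical names, programs are modeled as functions on 8-bit words, and equivalence is decided by exhaustive enumeration, which is a stand-in for the symbolic equivalence checking the slides describe.

```python
def equivalent_8bit(f, g) -> bool:
    """Toy equivalence check: compare f and g on every 8-bit input."""
    return all((f(x) & 0xFF) == (g(x) & 0xFF) for x in range(256))

def validated_optimize(program, optimizer, equivalent):
    """Run an untrusted optimizer; keep its output only if it validates."""
    candidate = optimizer(program)
    return candidate if equivalent(program, candidate) else program

original = lambda x: (x * 2) & 0xFF

# The optimizers are black boxes to the validator.
good_opt = lambda prog: (lambda x: (x << 1) & 0xFF)   # strength reduction, correct
bad_opt = lambda prog: (lambda x: (x + 2) & 0xFF)     # miscompilation

accepted = validated_optimize(original, good_opt, equivalent_8bit)
fallback = validated_optimize(original, bad_opt, equivalent_8bit)
```

The validator never inspects how the optimizer works, so adding or changing an optimization needs no new proof. A buggy optimization costs performance (the original code is kept) rather than correctness.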
LLVM Translation Validation [Figure: pipeline diagram with boxes labeled LLVM front-ends, LLVM optimizer, code generator, and equivalence checker.]
How do we do this? • Convert LLVM’s SSA-based intermediate language into a categorical value graph representation. • similar to circuit representations (think BDDs). • but incorporates loops by lifting everything to the level of streams of values. • allows us to reason equationally about both data and control. • Take advantage of category theory to normalize the input and output graphs, and check for equivalence. • this gives us many equivalences for free, such as common sub-expressions and loop-invariant computations. • but still need to normalize underlying scalar computations. • The key challenge is getting this to scale to big functions.
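A heavily simplified Python sketch of the normalize-and-compare idea for straight-line code follows. It is not the categorical construction from the slides: loops and streams are omitted, only the commutative operators add and mul are handled, and norm and value_of are hypothetical names. Expressions are nested tuples, so two uses of the same subexpression are structurally equal and common subexpressions come for free.

```python
def norm(e):
    """Normalize an expression: fold constants, drop identities, order operands."""
    if not isinstance(e, tuple):
        return e                       # a variable name or an integer constant
    op, a, b = e[0], norm(e[1]), norm(e[2])
    if isinstance(a, int) and isinstance(b, int):
        return a + b if op == "add" else a * b     # constant folding
    if op == "add" and 0 in (a, b):
        return b if a == 0 else a                  # x + 0 = x
    if op == "mul" and 0 in (a, b):
        return 0                                   # x * 0 = 0
    if op == "mul" and 1 in (a, b):
        return b if a == 1 else a                  # x * 1 = x
    a, b = sorted((a, b), key=repr)                # canonical order (commutativity)
    return (op, a, b)

def value_of(ssa, result):
    """Expand SSA-style assignments (name, op, x, y) into one normalized value."""
    env = {}
    for name, op, x, y in ssa:
        env[name] = (op, env.get(x, x), env.get(y, y))
    return norm(env[result])

before = [("t1", "add", "a", "b"), ("t2", "add", "a", "b"), ("r", "mul", "t1", "t2")]
after = [("t1", "add", "b", "a"), ("r", "mul", "t1", "t1")]   # after CSE
wrong = [("t1", "add", "a", "b"), ("r", "add", "t1", "t1")]   # not equivalent
```

Here value_of(before, "r") and value_of(after, "r") normalize to the same value, so the common-subexpression elimination validates, while the wrong rewrite does not. The sketch has no rule for associativity, so a real validator needs a much richer rule set; as the slide says, normalizing the underlying scalar computations is a separate piece of work.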
[Chart: % of functions validated on all optimizations.] Legend: • Fail: we fail to translate LLVM’s IR into our representation • Alarm: we fail to validate the translation • OK: we validate the translation and there are significant differences • Boring: we validate but the differences are minimal
Quick Recap • DHOSA relies upon compilers, rewriting, analysis, and other software tools to provide protection. • Our goal is to increase assurance in these tools. • provide detailed formal models of machines • prove correctness of key components • find techniques for automating proofs • The hope is that these investments will pay off, not just for this project but others. • e.g., IARPA STONESOUP, DARPA CRASH