
Secure Compiler Seminar 9/12 Survey on Design of Secure Low-Level Languages

This seminar explores the need for a secure low-level language, along with the existing research on secure compilers and verifiers, and the limitations of Java bytecode verification. It also discusses strategies to maximize the machine-independent part and reduce the Trusted Computing Base (TCB) size.



Presentation Transcript


  1. Secure Compiler Seminar 9/12: Survey on Design of Secure Low-Level Languages • Toshihiro YOSHINO, Yonezawa Lab. <tossy-2@yl.is.s.u-tokyo.ac.jp>

  2. A Secure Low-Level Language is Needed • As the target language of a secure compiler • “Secure” means it has a method to prove a program’s properties • Memory safety, control-flow safety, … • For this, a concrete formal model of the language should be given • It should also be low-level • To reduce the complexity of the JIT compiler (in other words, of the TCB) • [Figure: High-Level Program, Secure Low-Level Language, Machine Executable; labels: Secure Compiler, Verifier, JIT compilation]

  3. Existing Research • Two major approaches • TAL, PCC • Extensions to conventional assembly languages • Use a logic (such as a type theory) to prove safety • Virtual machines • Introduce intermediate languages of their own • Many of them adopt a safe-by-construction design, e.g., the Java VM is semantically safe in its memory operations • Java VM, Microsoft CIL, mvm [Franz et al. 2003], Jinja [Klein et al. 2006], ADL [Yoshino 2006], …

  4. Comparison: PCC vs. Java VM • PCC • (Extended) machine code + proof • Generated code and proof are machine-dependent • Requires one VC implementation for each architecture • Java VM • Verifier ensures type and control-flow safety • It often restricts optimizations • Leads to performance degradation • High cost to perform verification • Stack is nothing more than a set of untyped (variable-number) registers • [Figure: axis running from Machine Independent to Machine Dependent]

  5. Limitations of Java Bytecode Verification • Initialization check is incomplete • Correlation with other variables is not taken into account • Example:

       class Test {
         int test(boolean b) {
           int i;
           try {
             if (b) return 1;
             i = 2;
           } finally {
             if (b) i = 3;
           }
           return i;
         }
       }

     • Incomplete common subexpression elimination • Cannot eliminate common subexpressions in address calculations (array references)

  6. Then What Should We Do? • Maximize the machine-independent part • Avoids the cost of porting the system • Trade-off against the size of the TCB (Trusted Computing Base) • But recent work on PCC and TAL (e.g., Foundational PCC) aims solely at minimizing the TCB • Reduce proof size and generation cost • PCC requires much effort to produce a proof, because the target’s level is very low • Registers and memory are untyped, etc.

  7. mvm [Franz et al. 2003] • Aimed to find the semantic level that: • Is effective at supporting proof-carrying code • Can also be translated efficiently into high-performance native code (on many platforms) • Separates the design into a VM layer and a PCC layer • [1] M. Franz et al. A Portable Virtual Machine Target for Proof-Carrying Code. IVME ’03.

  8. mvm: Virtual Machine Design • Register-based architecture • The number of registers is not bounded • Registers are categorized by the type of value they hold: Integer, Boolean, Pointer, Address • Pointer registers store pointers to heap objects (more specifically, array heads) • Address registers store the results of address arithmetic • Address arithmetic performs no bounds check, so bounds have to be checked in a higher layer • The heap is used to store objects • The heap model is explained next

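  9. mvm: Virtual Machine Design • [Figure: the mvm virtual machine, with Integer, Boolean, Pointer and Address register files (sample contents 1, 42, false, true), a program made of labeled instruction sequences (label1:, label2:), and the heap]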
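The register model on slides 8 and 9 can be sketched roughly as follows. This is only an illustration, not mvm's actual implementation: the class `RegisterFiles`, its method names and the `add` helper are hypothetical, and the banks are modeled as dictionaries because the number of registers is unbounded.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterFiles:
    """Hypothetical model of four separate, unbounded register banks."""
    integer: dict = field(default_factory=dict)
    boolean: dict = field(default_factory=dict)
    pointer: dict = field(default_factory=dict)  # references to array heads
    address: dict = field(default_factory=dict)  # results of address arithmetic

    def set_int(self, n, value):
        # bool is a subclass of int in Python, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("integer registers hold integers only")
        self.integer[n] = value

    def set_bool(self, n, value):
        if not isinstance(value, bool):
            raise TypeError("boolean registers hold booleans only")
        self.boolean[n] = value


def add(regs, dst, a, b):
    """i_dst = i_a + i_b: the operands can only name integer registers."""
    regs.set_int(dst, regs.integer[a] + regs.integer[b])
```

Because an instruction names a bank implicitly, an integer addition can never read a boolean or a pointer, which is the reason the deck later says operations on primitives need no type-safety proof.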

  10. mvm: Heap Model • The mvm heap consists of arrays of objects • Object representation in mvm • Each object is tagged • The tag can only be written by the new operation and is immutable after creation • The data area has two sections: values and pointers • Integers and booleans are stored in the first section • Pointers are stored in the second section • [Figure: example object; visible contents: 1, 42]

  11. mvm: Heap Model • A type is associated with its tag value, layout and structure • This association is managed by the compiler • Layout describes the sizes of the data sections • Structure describes the possible substructure inside the pointer section • Example of type information: datatype T = Int of int | Pair of int * int … T list … • In the slide’s notation, {} means a disjunction and <> means a tuple

  12. mvm: Heap Model • Example of a T list object tree • [Figure: object tree; node labels: int*int, T, T list]
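A minimal sketch of the heap model from slides 10 to 12, under stated assumptions: the tag numbers, the `LAYOUTS` table and the class names are invented for illustration, and the slides do not give the concrete encoding of `T` or `T list`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    size_v: int  # slots in the value section (integers, booleans)
    size_p: int  # slots in the pointer section


# Hypothetical compiler-managed association: tag -> layout
LAYOUTS = {
    0: Layout(size_v=1, size_p=0),  # assumed encoding of "Int of int"
    1: Layout(size_v=2, size_p=0),  # assumed encoding of "Pair of int * int"
    2: Layout(size_v=0, size_p=2),  # assumed list cell: element and rest
}


class HeapObject:
    def __init__(self, tag):
        layout = LAYOUTS[tag]
        self._tag = tag                      # written once, at creation
        self.values = [0] * layout.size_v    # first section
        self.pointers = [None] * layout.size_p  # second section

    @property
    def tag(self):
        # No setter is defined, so the tag cannot be reassigned
        return self._tag


def new(tag, count):
    """Create an array of `count` objects that all carry `tag`."""
    return [HeapObject(tag) for _ in range(count)]
```

Keeping values and pointers in separate sections means a collector or verifier can tell which slots may hold references by looking only at the layout.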

  13. mvm: Instructions • Arithmetic and logical operations • Similar to many other languages ;-) • Branches • Unconditional: goto label • Conditional: brtrue b_i, label / brfalse b_i, label • The condition must be taken from a boolean register • Jumps are allowed only to labels • Conditional on object tag (RTTI): iftag

  14. mvm: Instructions • Object creation and access • p_j = new(tag, i_k) • Creates an array of i_k objects of type tag • r = load([size_v, size_p] | tag, p_k, offset) • store([size_v, size_p] | tag, p_k, offset, r) • The sizes and tag are used to check memory safety • Pointer registers and address registers • Object access also permits address registers a_k • This distinction exists to support garbage collection • Address registers always contain “derived” pointers

  15. mvm: Instructions • Accessing arrays • a_n = adda([size_v, size_p] | tag, p_k, i_l) • Calculates the address of the i_l-th element in an array of type tag stored at p_k • i_n = getlen(p_k) • Guards • Bounds checking: CHECKLEN(p_k, i_l) • Validity checking: CHECKNOTNULL(p_k) • Type checking: CHECKTAG(p_i, [size_v, size_p] | tag) • These guards are inserted when static checking fails
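The interplay between unchecked address arithmetic and the three guards might look like the sketch below. Everything here is an assumption made for illustration: the `Array` record, the `GuardFailure` exception, the extra size arguments to `new`, and the `guarded_load` helper that simply runs all three guards before a load of a value slot.

```python
from dataclasses import dataclass


class GuardFailure(Exception):
    """Raised when a dynamic guard rejects an access."""


@dataclass
class Array:
    tag: int
    size_v: int
    size_p: int
    elems: list  # one (values, pointers) pair per object


def new(tag, size_v, size_p, count):
    # Hypothetical: sizes are passed directly instead of being looked up by tag
    return Array(tag, size_v, size_p,
                 [([0] * size_v, [None] * size_p) for _ in range(count)])


def check_not_null(p):
    if p is None:
        raise GuardFailure("null pointer")


def check_len(p, i):
    if not 0 <= i < len(p.elems):
        raise GuardFailure("index out of bounds")


def check_tag(p, size_v, size_p, tag):
    if (p.tag, p.size_v, p.size_p) != (tag, size_v, size_p):
        raise GuardFailure("unexpected tag or layout")


def adda(size_v, size_p, tag, p, i):
    """Address arithmetic only: no bounds check; yields a derived pointer."""
    return (p, i)


def guarded_load(size_v, size_p, tag, p, i, offset):
    """Load value slot `offset` of element `i`, with every guard inserted."""
    check_not_null(p)
    check_tag(p, size_v, size_p, tag)
    check_len(p, i)
    base, index = adda(size_v, size_p, tag, p, i)
    return base.elems[index][0][offset]
```

According to the slide, a guard is only emitted where static checking fails, so a real compiler would omit whichever of the three calls it can discharge at compile time.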

  16. An Example mvm Program

  17. Type Safety in mvm Programs • Operations on primitives are all type-safe • Because values of different primitive types are held in distinct registers • Type-safety proofs are needed only for non-primitive operations • Pointers, arrays and records • For every pointer operation, check that the resulting pointer: • Points to the beginning of an array, record or value • Points to an object of the correct type

  18. Jinja [Klein et al. 2006] • A Java-like programming language formalized in Isabelle/HOL • Formal descriptions of the Jinja language, the Jinja VM and the compiler are given • Several properties were machine-checked • Big-step evaluation and small-step evaluation (atomic operations) are equivalent • Compiler correctness • [2] G. Klein, T. Nipkow. A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler. TOPLAS 28(4), 2006.

  19. Jinja Language • “Jinja is not Java” • Object-oriented language with exceptions • A program is a set of class definitions, and a class consists of several fields and methods • A method body is an expression • Overriding is supported as in Java • But overloading is not, because it is complicated • The language is statically typed • The type system ensures that the execution of a well-typed program never gets stuck

  20. Jinja Language: Language Elements • Values • Boolean Bool b, integer Intg i, reference Addr a • Null reference Null, dummy value Unit • Expressions • Val v, binary operations e1 op e2, Var V, V := e, e1; e2, … • Conditional: if (e) e1 else e2, while (e) e’ / Block: {V:T, e} • Object construction: new C • Casting: Cast C e • Field access: e.F{D}, e.F{D} := e • D is an annotation added in preprocessing (e.g., by the typechecker) • Method call: e.M(e, e, …) • Exceptions: throw e, try e1 catch(C V) e2

  21. Jinja Language: Semantics • Big-step semantics • Typical operational semantics • State = <Heap, Local Variables> • Details omitted because there is nothing special • Small-step semantics • Finer-grained semantics • One-step evaluation • Useful for formalizing parallelism (?) • Each (small) operation is considered atomic • Not discussed in the paper

  22. Jinja Language: Semantics • The big-step and small-step semantics are proven to be equivalent • wwf-J-prog means “weak well-formedness”, which is defined by the following properties: • The number of parameter types equals the number of parameter names • “this” is not included in the parameter list • Free variables in a method body refer only to this or to the method’s parameters
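To make the two styles of semantics concrete, here is a toy sketch for a fragment with only values, addition and conditionals. It is not Jinja's definition: the tuple encoding of expressions is an assumption, and the state (heap and local variables) is left out entirely, so the agreement it demonstrates is just a tested instance, not the machine-checked theorem.

```python
def big_step(e):
    """Evaluate an expression to its final value in one (recursive) judgment."""
    kind = e[0]
    if kind == "Val":
        return e[1]
    if kind == "Add":
        return big_step(e[1]) + big_step(e[2])
    if kind == "If":
        return big_step(e[2]) if big_step(e[1]) else big_step(e[3])
    raise ValueError(f"unknown expression: {kind}")


def small_step(e):
    """Perform exactly one atomic reduction and return the new expression."""
    kind = e[0]
    if kind == "Add":
        _, left, right = e
        if left[0] != "Val":
            return ("Add", small_step(left), right)
        if right[0] != "Val":
            return ("Add", left, small_step(right))
        return ("Val", left[1] + right[1])
    if kind == "If":
        _, cond, then_e, else_e = e
        if cond[0] != "Val":
            return ("If", small_step(cond), then_e, else_e)
        return then_e if cond[1] else else_e
    raise ValueError("values cannot take a step")


def run_small(e):
    """Iterate small steps until a value is reached."""
    while e[0] != "Val":
        e = small_step(e)
    return e[1]
```

For any expression in this fragment, `big_step(e)` and `run_small(e)` return the same value; the small-step version additionally exposes every intermediate expression.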

  23. Jinja VM • Similar to the Java VM • Stack-based machine with a heap • State = <addr option, heap, frame list> • The first element is a possibly raised exception • The third element is the call stack • Frame = <stack, registers, cname, mname, pc>, where stack = value list and registers = value list • Operands are evaluated on the stack • Registers store local variables

  24. Jinja VM: Instructions • Basic operations • Push v / Pop • Register operations: Load n / Store n • Arithmetic: IAdd, … • Logical operations: CmpEq, … • Object manipulation • Construction: New cname • Casting: Checkcast cname • Field access: Getfield vname cname / Putfield vname cname • Method invocation: Invoke mname n

  25. Jinja VM: Instructions • Control flow operations • Branching: Goto n / IfFalse n • n is a relative offset from the instruction • Exit from a method: Return • Exceptions • Throwing an exception: Throw • Information about exception handlers (try-catch) is attached to method declarations • A handler is retrieved from there when needed
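The stack-and-register execution model from slides 23 to 25 can be illustrated with a small interpreter. This is a sketch under simplifying assumptions: a single frame, no heap, objects, method calls or exceptions, and instructions encoded as Python tuples; the function name `run` is hypothetical.

```python
def run(code, registers):
    """Execute one frame of a tiny Jinja-VM-like instruction list."""
    stack, regs, pc = [], list(registers), 0
    while True:
        op, *args = code[pc]
        if op == "Push":
            stack.append(args[0])
        elif op == "Pop":
            stack.pop()
        elif op == "Load":            # register -> stack
            stack.append(regs[args[0]])
        elif op == "Store":           # stack -> register
            regs[args[0]] = stack.pop()
        elif op == "IAdd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)       # no runtime type check, as on slide 26
        elif op == "CmpEq":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        elif op == "Goto":
            pc += args[0]             # offset is relative to this instruction
            continue
        elif op == "IfFalse":
            if stack.pop() is False:
                pc += args[0]
                continue
        elif op == "Return":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1


# Counts register 0 up to the value of register 1, then returns it.
COUNT_UP = [
    ("Load", 0), ("Load", 1), ("CmpEq",), ("IfFalse", 3),
    ("Load", 0), ("Return",),
    ("Load", 0), ("Push", 1), ("IAdd",), ("Store", 0), ("Goto", -10),
]
```

Calling `run(COUNT_UP, [0, 3])` walks the loop three times; the backward `Goto -10` shows why offsets are interpreted relative to the branching instruction itself.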

  26. Jinja VM: Semantics • Please refer to the paper for details • Basically straightforward and intuitive • At this level, there are no runtime checks • For example, IAdd (integer addition) does not check whether its arguments are really integers • If they are not, the result is unspecified • Checks of this kind are performed by a bytecode verifier

  27. Jinja VM Bytecode Verification • The JVM (here, the Jinja VM) relies on the following assumptions: • Types are correct • No overflow or underflow of the stack • Code containment • Registers are initialized before use • Just the same as in the Java VM • The bytecode verifier statically ensures these assumptions

  28. Jinja VM Bytecode Verification • Abstract interpretation • Instead of values, consider only types • [Figure: a JVM program shown with its concrete states (values such as Intg 1, Addr 1, Addr 2, Addr 3) next to the corresponding state types (Int, Class A, Class B)]
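The idea of running the program on types instead of values can be sketched as below. This is a deliberately simplified stand-in for a real verifier: it covers only a handful of instructions encoded as tuples, has just the types "Int" and "Bool" (with `None` for an uninitialized register), and rejects a program whenever two paths reach the same instruction with different state types rather than computing a join. The names `verify` and `type_of` are hypothetical.

```python
def type_of(value):
    # bool is tested first because bool is a subclass of int in Python
    return "Bool" if isinstance(value, bool) else "Int"


def verify(code, reg_types, max_stack=8):
    """Return True if `code` passes this simplified type-level execution."""
    states = {0: ((), tuple(reg_types))}  # pc -> (stack types, register types)
    work = [0]
    while work:
        pc = work.pop()
        stack, regs = states[pc]
        op, *args = code[pc]
        succs = [pc + 1]
        if op == "Push":
            stack = stack + (type_of(args[0]),)
        elif op == "Pop":
            if not stack:
                return False                      # stack underflow
            stack = stack[:-1]
        elif op == "Load":
            n = args[0]
            if n >= len(regs) or regs[n] is None:
                return False                      # register not initialized
            stack = stack + (regs[n],)
        elif op == "Store":
            n = args[0]
            if not stack or n >= len(regs):
                return False
            regs = regs[:n] + (stack[-1],) + regs[n + 1:]
            stack = stack[:-1]
        elif op == "IAdd":
            if stack[-2:] != ("Int", "Int"):
                return False                      # type error or underflow
            stack = stack[:-2] + ("Int",)
        elif op == "CmpEq":
            if len(stack) < 2:
                return False
            stack = stack[:-2] + ("Bool",)
        elif op == "IfFalse":
            if stack[-1:] != ("Bool",):
                return False
            stack = stack[:-1]
            succs = [pc + 1, pc + args[0]]
        elif op == "Goto":
            succs = [pc + args[0]]
        elif op == "Return":
            if not stack:
                return False
            succs = []
        else:
            return False
        if len(stack) > max_stack:
            return False                          # stack overflow
        for nxt in succs:
            if not 0 <= nxt < len(code):
                return False                      # code containment
            if nxt not in states:
                states[nxt] = (stack, regs)
                work.append(nxt)
            elif states[nxt] != (stack, regs):
                return False                      # no join in this sketch
    return True
```

Each of the four assumptions listed on slide 27 corresponds to one family of `return False` branches; once a program passes, an interpreter without runtime checks can execute it without hitting the unspecified cases.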

  29. Jinja Compiler and its Correctness • Two-stage compilation • Stage 1: map parameter names to register indices • Assign local variables to registers • Gather variable occurrences and use them for lookup • Stage 2: code generation • Expression → instruction list (compE2) • Straightforward definition • Exception table generation (compEx2) • Separated from compE2, because the exception table must contain global addresses

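  30. Jinja Compiler and its Correctness • Correctness of compilation • If a program is weakly well-formed, then: • [Figure: diagram relating the Jinja program and the compiled JVM bytecode; the Jinja program goes from state (Heap, Vars) to (Heap, Vars), and the bytecode goes from (Heap, [Frame]) to (Heap, [])]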
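The two compilation stages and the shape of the correctness statement can be illustrated on a toy expression fragment. This is a sketch, not the Jinja compiler: the tuple encoding, the function names `stage1`, `comp_e2`, `evaluate` and `run`, and the restriction to values, variables, addition, equality and conditionals are all assumptions. The check at the end compares source-level evaluation with execution of the compiled code on sample inputs, which is testing, not proof.

```python
def stage1(e, index):
    """Stage 1: replace variable names by register indices."""
    kind = e[0]
    if kind == "Val":
        return e
    if kind == "Var":
        return ("Var", index[e[1]])
    return (kind,) + tuple(stage1(sub, index) for sub in e[1:])


def comp_e2(e):
    """Stage 2: translate an indexed expression into an instruction list."""
    kind = e[0]
    if kind == "Val":
        return [("Push", e[1])]
    if kind == "Var":
        return [("Load", e[1])]
    if kind == "Add":
        return comp_e2(e[1]) + comp_e2(e[2]) + [("IAdd",)]
    if kind == "Eq":
        return comp_e2(e[1]) + comp_e2(e[2]) + [("CmpEq",)]
    if kind == "If":
        c, t, f = (comp_e2(sub) for sub in e[1:])
        # Relative offsets: skip the then-branch plus its Goto, or the else-branch
        return c + [("IfFalse", len(t) + 2)] + t + [("Goto", len(f) + 1)] + f
    raise ValueError(f"unknown expression: {kind}")


def evaluate(e, env):
    """Source-level (big-step) evaluation over named variables."""
    kind = e[0]
    if kind == "Val":
        return e[1]
    if kind == "Var":
        return env[e[1]]
    if kind == "Add":
        return evaluate(e[1], env) + evaluate(e[2], env)
    if kind == "Eq":
        return evaluate(e[1], env) == evaluate(e[2], env)
    return evaluate(e[2] if evaluate(e[1], env) else e[3], env)


def run(code, regs):
    """Execute compiled code; the result is left on top of the stack."""
    stack, pc = [], 0
    while pc < len(code):
        op, *args = code[pc]
        if op == "Push":
            stack.append(args[0])
        elif op == "Load":
            stack.append(regs[args[0]])
        elif op in ("IAdd", "CmpEq"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "IAdd" else a == b)
        elif op == "Goto":
            pc += args[0]
            continue
        elif op == "IfFalse" and stack.pop() is False:
            pc += args[0]
            continue
        pc += 1
    return stack.pop()
```

For a source expression `e` with variables `x` and `y`, the expected relation is `evaluate(e, {"x": a, "y": b}) == run(comp_e2(stage1(e, {"x": 0, "y": 1})), [a, b])`, mirroring the diagram in which both routes end in corresponding final states.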

  31. Implementation of Jinja • http://afp.sourceforge.net/entries/Jinja.shtml • About 20kLoC in Isabelle/HOL • Over 1,000 theorems are defined • It takes about 25 min. to process these proofs on a 3GHz Pentium 4 machine with 1GB RAM

  32. Summary of Today’s Talk • We need a secure low-level language as the target of a secure compiler • Minimize the machine-dependent part to reduce implementation cost • Also reduce the cost of proof generation • To address this, two VM projects were surveyed • mvm • Aimed to find the “sweet spot” that reconciles high performance and small type-safety proofs • Jinja • Constructed a unified formal model of a Java-like language, the underlying VM and the compiler • In contrast to mvm, this research is oriented toward higher-level languages and properties of the compiler

  33. How about ADL [Yoshino 2006] …? • The position of ADL is close to that of mvm • It provides a common basis for implementing verifiers for low-level languages • The assumed translation direction is the opposite • mvm is an intermediate code of compilation • ADL is designed to simulate real machines • [Figure labels: JVM, mvm; Machine Code; ADL; Secure L3]

  34. How about ADL [Yoshino 2006] …? • ADL takes a minimalist approach • Only 7 kinds of commands • Instead, an expression-based design allows complex formulae to be written easily • Can ADL be used as an intermediate language? • Probably some modification is needed • Register allocation is already done, except for variables • A minimalist design, however, may increase the complexity of constructing a verification logic • Abstract interpretation is often not sufficient, so a verification logic may want to calculate exact values

  35. More References • LLVM Project [Lattner 2000] • http://www.llvm.org/ • Uses a VM for interprocedural optimization • SafeTSA [Amme et al. 2001] • SSA-based language for mobile code security • Dis virtual machine [Winterbottom et al. 1997] • Omniware system [Adl-Tabatabai et al. 1996]

  36. (Typical) Compiler Construction and Several Intermediate Languages • [Figure: pipeline from High-Level Program to Machine Executable with the phases Lexing/Parsing, Type Checking, Normalize (SSA, etc.), Intermediate Code Generation, Optimize, Target Code Generation, Register Allocation and Pretty Printing; intermediate languages placed along it: LLVM, SafeTSA; Java, CIL, Jinja (VM); TAL, PCC; mvm; ADL]
