New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security

Michael Franz, University of California, Irvine. July 2001. UC Irvine project transprose: transporting programs securely.

Presentation Transcript


  1. New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security • UC Irvine – project transprose: transporting programs securely • Michael Franz, University of California, Irvine • July 2001

  2. Introduction • mobile code is an enabling technology • download functionality as needed • handheld, untethered devices, “information appliances” • platform-independent => identical code can run on PDAs, desktop machines, even supercomputers • but, many unresolved issues with respect to • performance of the mobile program (on the target) • performance of the mobile code distribution mechanism • protecting the host against malicious mobile programs • [guarding a mobile program’s secrets against a malicious host]

  3. Guiding Overall Objective • make mobile code practical, so that • eventually, native code will need to exist only transiently, created on-the-fly and consumed on the spot • while mobile code will be used as the storage and distribution medium

  4. Context • dynamic code-generation technology is approaching maturity and processors are becoming fast enough to sustain it (in real time) • this is rapidly diminishing the value of “binary compatibility” • moreover, dynamic optimization techniques yield better code than static compilation • exploit actual processor parameters (caches, …) • “live” profiling data may be available • => “mobile code will define future platform(s)”

  5. Mobile Code Security • most approaches are based on some type-safe programming language • host systems publish their policies in terms of type-safe APIs • conformance to that interface is then guaranteed by the mobile code transportation scheme • semantically equivalent to transporting source code • however, for efficiency and quality of dynamic code generation, usually want to transport a format “closer to the machine” while still preserving source-program type-safety semantics

  6. Existing Practice: Java • the Java Virtual Machine is the de-facto standard format for distributing mobile programs • the JVM has an instruction set that has been designed specifically for representing Java programs • interestingly enough, there still are JVM programs for which no legal equivalent Java source program exists • there are also legal Java programs that are rejected by all possible JVM bytecode verifiers [Staerk’00] • security is obtained by verifying the JVM bytecode, essentially a symbolic execution of the program

  7. Security vs. Efficiency • the Java Virtual Machine's instruction format is poorly suited to transporting the results of program analyses and optimizations • as a consequence, when Java byte-code is transmitted, each recipient must repeat most of the analyses and optimizations that could have been performed just once at the origin • Java byte-code has these deficiencies mainly because it was designed to allow verification by the recipient

  8. Security vs. Efficiency • for example, a code producer often has information about the redundancy of a type or index check • but this fact cannot be communicated safely to the code consumer - not in a manner that the recipient can be sure that this is not a false claim inserted by a malicious third party • similar concerns inhibit common compiler optimizations such as common subexpression elimination at the code producer’s side

  9. An Alternative Approach: PCC • instead of executing the program symbolically at the receiver’s site (which is time consuming and complex), the code producer attaches a “proof” that the code is correct • the “proof” shortcuts the verification: checking a given solution is often much simpler than finding it in the first place • the Java KVM for embedded devices uses a kind of PCC (“stack maps”) that may become a standard for Java

  10. A Third Approach • instead of verifying or checking, we have been investigating a class of mobile code representations that can provably encode only “legal” programs • security is obtained by construction • the need for verification disappears • our approach can provide the same security guarantees as the Java Virtual Machine, but it can express most of them statically as a well-formedness property of the encoding itself • in our solution, an incoming mobile program may not do the intended task, but it will not do anything “bad” - for any definition of “bad” that can be cast into a type system • interestingly enough, such “intrinsically secure” mobile code is also denser than virtual machine code, and allows better object code to be generated, and faster

  11. A Third Approach: Two Variants • we have in fact designed not just one, but two alternative mobile-code representations, both of which provide “security by construction” • they differ in the semantic level at which they describe the mobile program • “high-level”: close to the source language but with supporting compiler-related information • “low-level”: as close as possible to what a modern code generator back-end needs without being target-machine specific

  12. Rationale for Multi-Track Approach • the relative trade-offs (encoding density vs. decoding/dynamic compilation speed vs. code quality) are completely unknown and can only be determined by collecting experience with actual prototypes • by implementing both the “high-level” and the “low-level” solution, we are exploring the design space rather than designing an ad-hoc solution

  13. Low-Level Encoding [PLDI01] • SafeTSA preserves control and dataflow information as well as full typing for each intermediate result • it is based on SSA form, a representation that is also used internally by a number of important state-of-the-art research compilers for Java, e.g., • IBM T.J. Watson Lab: Jalapeño • Microsoft: Marmot • Sun Microsystems: HotSpot Server • SafeTSA is far easier to parse into a form useful for code optimization than JVM-code

  14. Current Status and Results • based on Martin Odersky’s Pizza front-end • can compile all of Java to SafeTSA • prototype run-time environment almost finished; will provide full interoperability between SafeTSA and JVM-based class files • can mix and match both formats with dynamic loading • call-backs from JVM to SafeTSA are ugly • SafeTSA representation is surprisingly small

  15. High-Level Encoding [Babel01] • ultra-compact representation using grammar-based compression of abstract syntax trees • goal is to transport the source program along with as much compiler-related support information as possible

  16. Schematic Overview • [figure: Source → Parser (“classic Frontend”) → AST → Encoder → PPM-Model & Arithmetic Encoder → 011000101010… → PPM-Model & Arithmetic Decoder → Decoder → AST → Code Generator (“classic Backend”); the encoder/decoder stages are labelled Compression / Decompression]

  17. Compression Overview • Parsing: get AST from source • Serialize: get stream of symbols from AST • Modeling: use context and abstract grammar to build predictive statistical model • Coding: use arithmetic coding with model

  18. Types of nodes in AST • String, Integer, Terminal • List : e.g. Block = BlockStatement* • Aggregate : e.g. IF = cond thenbranch elsebranch • Choice : e.g. BinOp = Plus | Minus | … • Information is in choice nodes • want to guess which choice is taken

  19. Transmitting an AST • any predefined serialization will do • we use depth first (pre-order) • when serialized, most info in AST is redundant, e.g. • order and kind of kids of aggregate nodes known • this is because we use knowledge of the grammar • must encode index of choice made at choice nodes
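
The serialization described on this slide can be sketched as follows. This is a minimal illustration, not the project's code: the toy grammar, the tuple-based tree layout, and the names `GRAMMAR`, `serialize`, and `deserialize` are all assumptions made for the example. It shows that, given the grammar, only the indices chosen at choice nodes (plus constants) need to be sent.

```python
# Hypothetical toy grammar (NOT the Java abstract grammar used in the project).
# rule name -> (kind, parts)
GRAMMAR = {
    "Stmt":   ("choice",    ["Assign", "If"]),
    "Assign": ("aggregate", ["Var", "Expr"]),
    "If":     ("aggregate", ["Expr", "Stmt", "Stmt"]),
    "Expr":   ("choice",    ["Var", "Binary"]),
    "Binary": ("aggregate", ["BinOp", "Expr", "Expr"]),
    "BinOp":  ("choice",    ["Plus", "Minus"]),
    "Plus":   ("terminal",  []),
    "Minus":  ("terminal",  []),
    "Var":    ("string",    []),
}

def serialize(expected, node, out=None):
    """Pre-order walk; emits only what the grammar cannot predict."""
    out = [] if out is None else out
    kind, parts = GRAMMAR[expected]
    if kind == "choice":
        index = parts.index(node[0])        # which alternative was taken
        out.append(("choice", index))
        serialize(parts[index], node, out)
    elif kind == "aggregate":
        # order and kind of the children are fixed by the grammar
        for part, child in zip(parts, node[1]):
            serialize(part, child, out)
    elif kind == "string":
        out.append(("string", node[1]))
    # terminals carry no information at all
    return out

def deserialize(expected, symbols):
    """Rebuild the tree from the symbol stream using the same grammar."""
    kind, parts = GRAMMAR[expected]
    if kind == "choice":
        _, index = next(symbols)
        return deserialize(parts[index], symbols)
    if kind == "aggregate":
        return (expected, [deserialize(p, symbols) for p in parts])
    if kind == "string":
        _, value = next(symbols)
        return (expected, value)
    return (expected,)

# i = i + j, written as nested (rule, payload) tuples
AST = ("Assign", [("Var", "i"),
                  ("Binary", [("Plus",), ("Var", "i"), ("Var", "j")])])
STREAM = serialize("Stmt", AST)
```

In this sketch `STREAM` holds five choice indices and three names, and `deserialize("Stmt", iter(STREAM))` reproduces `AST`; node kinds and child order never appear in the stream.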

  20. Prediction by Partial Match (PPM) • dynamically maintain counts of characters seen after various contexts • contexts may be of various lengths • e.g. for “abcd”, contexts for “d” are: • length 1 context: “c” • length 2 context: “bc” • length 3 context: “abc” • predict characters in current context by looking at what occurred previously
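
The context bookkeeping on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names and the `max_order` parameter are made up for the example, and a full PPM model would also assign escape probabilities for symbols not yet seen in a context, which is omitted here.

```python
from collections import Counter, defaultdict

def build_context_counts(text, max_order=3):
    """Count which character followed each context of length 1..max_order."""
    counts = defaultdict(Counter)
    for i, ch in enumerate(text):
        for k in range(1, max_order + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][ch] += 1
    return counts

def predict(counts, history, max_order=3):
    """Use the longest previously seen context to estimate the next symbol."""
    for k in range(min(max_order, len(history)), 0, -1):
        context = history[-k:]
        if context in counts:
            total = sum(counts[context].values())
            return {sym: n / total for sym, n in counts[context].items()}
    return {}   # nothing seen yet; a real PPM model would escape to order 0

COUNTS = build_context_counts("abcd")
```

For the input “abcd”, `COUNTS` records “d” once after each of the contexts “c”, “bc”, and “abc”, matching the three contexts listed on the slide.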

  21. Maintaining Contexts • [figure: context tree rooted at * with nodes for the symbols a, b, c, d]

  22. Adapting PPM To Work On Trees • each node is a symbol • the context is the path from the root to the current node in the AST • problem: in DFS, what happens when we reach a leaf node and go back up to an ancestor? • pop context – all active nodes are moved up one position to their parents (in context tree)
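
A minimal sketch of the tree adaptation, assuming a simple `(label, children)` tuple layout for AST nodes and a bounded context depth (`max_depth`); both are choices made for this example only. An explicit ancestor stack stands in for the context-tree pointers mentioned on the slide: it is pushed on the way down and popped when the depth-first traversal returns to an ancestor.

```python
from collections import Counter, defaultdict

def count_tree_contexts(tree, max_depth=2):
    """Count node labels seen under each root-path suffix of length 1..max_depth."""
    counts = defaultdict(Counter)

    def visit(node, path):
        label, children = node
        for k in range(1, min(max_depth, len(path)) + 1):
            counts[tuple(path[-k:])][label] += 1
        path.append(label)        # descend: current node joins the context
        for child in children:
            visit(child, path)
        path.pop()                # ascend: pop the context again

    visit(tree, [])
    return counts

# Hypothetical AST fragment for an assignment with a binary right-hand side
TREE = ("Assign", [("VarAccess", []),
                   ("Binary", [("VarAccess", []), ("Literal", [])])])
TREE_COUNTS = count_tree_contexts(TREE)
```

Because the stack is popped on return, the second child of `Assign` is predicted from the context `("Assign",)` again rather than from its already visited sibling.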

  23. Encoding • PPM is used to model the choices made at choice nodes, i.e. associate a probability with every choice • these probabilities are used to drive an arithmetic coder to output bits

  24. Compressing Constants • constants (strings, integers, names) are a significant fraction of source • to compress: make table of constants, and refer to them by their index in this table • further compress: maintain different tables for strings, names etc. – reduces number of bits in index • currently exploring more sophisticated context modeling ideas for compressing constants
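
The table scheme for constants might look like the sketch below. The class name `ConstantPool` and its methods are hypothetical, and fixed-width indices are assumed purely to make the bit counts easy to see.

```python
class ConstantPool:
    """One table per kind of constant; a constant is sent as its table index."""

    def __init__(self):
        self.tables = {}

    def intern(self, kind, value):
        table = self.tables.setdefault(kind, [])
        if value not in table:
            table.append(value)
        return table.index(value)

    def index_bits(self, kind):
        """Bits needed for a fixed-width index into the table for `kind`."""
        n = len(self.tables.get(kind, []))
        return max(1, (n - 1).bit_length()) if n else 0

POOL = ConstantPool()
INDICES = [POOL.intern("name", "i"), POOL.intern("name", "j"),
           POOL.intern("name", "i"), POOL.intern("string", "hello"),
           POOL.intern("string", "world")]
```

With two names and two strings, each per-kind table needs a 1-bit index, whereas a single shared table of four entries would need 2 bits per reference.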

  25. AST Compression: Example • AST for: i = i + 1 • Relevant grammar rules: • Stmt = If | While | Assign | … • Assign = Lvalue Expr • Lvalue = Field | VarAccess | … • Expr = Unary | Binary | … • Binary = BinOp Expr Expr • [figure: AST with choice nodes marked] • Preorder traversal: Stmt, Assign, Lvalue, VarAccess, i, Expr, Binary, Expr, VarAccess, i, Expr, Literal, IntLiteral, 1, BinOp, +

  26. AST Compression: Example • AST for: i = i + 1 • [figure: context tree]

  27. AST Compression: Example • AST for: i = i + 1 • [figure: context tree]

  28. AST Compression: Example • AST for: i = i + 1 • [figure: context tree] • Model: Prob(j) = 0.3, Prob(k) = 0.5, Prob(i) = 0.2 • Send model and choice “i” to arithmetic coder
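
What the arithmetic coder does with this model can be sketched as interval narrowing. This is an idealized illustration using exact fractions, not a production coder: the helper names and the symbol order (j, k, i) in the cumulative table are assumptions, and a practical coder would use fixed-precision integers with renormalization.

```python
import math
from fractions import Fraction

# Model from the slide; the ordering of the symbols is an assumption.
MODEL = [("j", Fraction(3, 10)), ("k", Fraction(1, 2)), ("i", Fraction(1, 5))]

def narrow(interval, model, choice):
    """Shrink [low, high) to the sub-interval that the model assigns to `choice`."""
    low, high = interval
    width = high - low
    cumulative = Fraction(0)
    for symbol, prob in model:
        if symbol == choice:
            return (low + width * cumulative, low + width * (cumulative + prob))
        cumulative += prob
    raise KeyError(choice)

def encode(steps):
    """steps: iterable of (model, choice) pairs, one per choice node."""
    interval = (Fraction(0), Fraction(1))
    for model, choice in steps:
        interval = narrow(interval, model, choice)
    return interval

def ideal_bits(interval):
    low, high = interval
    return -math.log2(float(high - low))

INTERVAL_I = encode([(MODEL, "i")])
```

Under these assumptions, choice “i” narrows the interval to [4/5, 1), which costs about 2.32 bits; the more probable choice “k” would cost exactly 1 bit. Better predictions therefore translate directly into fewer output bits.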

  29. Status and Results • compressor/decompressor prototype written in Python • completely generic – can be used with any abstract grammar • have implemented the Java abstract grammar • works with single Java source files as well as entire packages. • comparison for Java class-file compression with Pugh’s results (best published Java compressor)

  30. Results: Classes Classes from Sun’s javac package - all sizes in bytes

  31. Results: Archives compressed collections of classes - all sizes in bytes • compressed ASTs are 5-50% smaller than Pugh’s • 3-8 times smaller than uncompressed class files or JAR files

  32. Performance-Enhancing Information • now raise the semantic level of the grammar • e.g. “Escape Analysis” • an object that doesn’t “escape” its defining scope can be allocated on the stack rather than on the heap • this optimization alone can often double performance • the analysis itself is very difficult to do, but the results of the analysis are easy to verify • augment the type system by “escaping/non-escaping” • make this part of the encoding scheme itself • e.g., => a non-escaping object cannot be assigned to a variable from an enclosing scope
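
The example rule at the end of this slide can be illustrated with a small check. This sketch only demonstrates the rule itself; in the approach described here the rule would be built into the encoding so that violating programs cannot be expressed at all. The `Var` record, its fields, and `assignment_ok` are hypothetical names for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    scope_depth: int   # 0 = outermost scope, larger = more deeply nested
    escaping: bool     # annotation carried by the augmented type

def assignment_ok(target: Var, source: Var) -> bool:
    """A non-escaping object may not be stored in a variable of an enclosing scope."""
    if source.escaping:
        return True
    return target.scope_depth >= source.scope_depth

OUTER = Var("result", scope_depth=0, escaping=True)
LOCAL_TMP = Var("tmp", scope_depth=2, escaping=False)
LOCAL_OTHER = Var("other", scope_depth=2, escaping=False)
```

Here `assignment_ok(OUTER, LOCAL_TMP)` is rejected, because `tmp` is marked non-escaping and `result` lives in an enclosing scope, while an assignment between the two locals is accepted, so `tmp` can safely be allocated on the stack.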

  33. Insights So Far • abstract syntax trees viable as a mobile code format • can be highly compressed • Java archives by factor of 3-8 • 5-50% better than Java bytecode specific compression by Pugh

  34. Overall Project Achievements • lead the way to a genuine improvement over virtual machine transportation formats • security without need for validation • tamper-proof performance-improving information • innovative and generic program compression method as a useful by-product of this effort

  35. Task Schedule • Y1 Milestones: • source-level representation => Java compression • low-level representation • core calculus representation • Y2 Milestones: • system prototypes • trade-off analysis • encoding format comprehensive definition • End of Project: • system deliverable • comprehensive documentation • [timeline: 1999, 2000, 2001, 2002] • investigate: • multiple source languages • graph-based encoding schemes • proof-carrying code • investigate: • requirements of optimizing code generators • integration of security vs. compiler-related data • investigate: • mutual interaction of security, efficiency, and compression density • security of system

  36. Mobile Code Security Revisited • provided through type-safe programming language and type-safe APIs • semantically equivalent to transporting source code (everybody does it this way) • but many policies currently cannot be expressed in terms of a type system and hence need to be implemented inside the library • “open only files in directory X” • “initiate connections only with IP addresses in range […]” • “execute no more than N instructions between OS calls” • “do not send on network after reading local files” • => security automata • need to represent these properties directly and support them along the whole pipeline from code producer to code consumer • => some other PIs in Oasis are working on these themes and their work can be directly beneficial to this project
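
A security automaton for one of the policies above, “do not send on network after reading local files”, can be sketched as a two-state monitor. The event names `read_file` and `send` and the class name are assumptions for this example; the slide does not specify how such automata would be represented in the mobile-code format.

```python
class NoSendAfterRead:
    """Two-state monitor: 'clean' until a local file is read, then 'has_read'."""

    def __init__(self):
        self.state = "clean"

    def step(self, event):
        """Return True if the event is permitted in the current state."""
        if event == "read_file":
            self.state = "has_read"
        elif event == "send" and self.state == "has_read":
            return False          # policy violation: halt the program here
        return True

def allowed(trace):
    monitor = NoSendAfterRead()
    return all(monitor.step(event) for event in trace)
```

For instance, `allowed(["send", "read_file"])` holds, whereas `allowed(["read_file", "send"])` does not. The policy depends on the history of events, which is why it is hard to express in a conventional type system.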

  37. Transition of Technology • our prototype implementation(s) will be made available in source form • the idea is to create a “turnkey” replacement for current Java compilers and JVM runtime systems • you simply take your code and recompile using our compiler • it will then run on our runtime • our runtime will also run your old JVM class files • you can even mix our stuff with JVM class files • => we simply provide a new (better!) mobile code transportation layer without changing anything else

  38. Thank You
