380 likes | 502 Views
New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security. UC Irvine – project trans pro se: transporting programs securely. Michael Franz University of California, Irvine July 2001. Introduction. mobile code is an enabling technology
E N D
New Approaches to Mobile Code:Reconciling Execution Efficiencywith Provable Security UC Irvine – project transprose: transporting programs securely Michael Franz University of California, Irvine July 2001
Introduction • mobile code is an enabling technology • download functionality as needed • handheld, untethered devices, “information appliances” • platform-independent fi identical code can run on PDAs, desktop machines, even supercomputers • but, many unresolved issues with respect to • performance of the mobile program (on the target) • performance of the mobile code distribution mechanism • protecting the host against malicious mobile programs • [guarding a mobile program’s secrets against a malicious host]
Guiding Overall Objective • make mobile code practical, so that • eventually, native code will need to exist only transiently, created on-the-fly and consumed on the spot • while mobile code will be used as the storage and distribution medium
Context • dynamic code-generation technology is approaching maturity and processors are becoming fast enough to sustain it (in real time) • this is rapidly diminishing the value of “binary compatibility” • moreover, dynamic optimization techniques yield better code than static compilation • exploit actual processor parameters (caches, …) • “live” profiling data may be available • => “mobile code will define future platform(s)”
Mobile Code Security • most approaches are based on some type-safe programming language • host systems publish their policies in terms of type-safe APIs • conformance to that interface is then guaranteed by the mobile code transportation scheme • semantically equivalent to transporting source code • however, for efficiency and quality of dynamic code generation, usually want to transport a format “closer to the machine” while still preserving source-program type-safety semantics
Existing Practice: Java • the Java Virtual Machine is the de-facto standard format for distributing mobile programs • the JVM has an instruction set that has been designed specifically for representing Java programs • interestingly enough, there still are JVM programs for which no legal equivalent Java source program exists • there are also legal Java programs that are rejected by all possible JVM bytecode verifiers [Staerk’00] • security is obtained by verifying the JVM bytecode, essentially a symbolic execution of the program
Security vs. Efficiency • the Java Virtual Machine's instruction format is not very capable in transporting the results of program analyses and optimizations • as a consequence, when Java byte-code is transmitted, each recipient must repeat most of the analyses and optimizations that could have been performed just once at the origin • the main reason why Java byte-code has these deficiencies is to allow verification by the recipient
Security vs. Efficiency • for example, a code producer often has information about the redundancy of a type or index check • but this fact cannot be communicated safely to the code consumer - not in a manner that the recipient can be sure that this is not a false claim inserted by a malicious third party • similar concerns inhibit common compiler optimizations such as common subexpression elimination at the code producer’s side
An Alternative Approach: PCC • instead of executing the program symbolically at the receiver’s site (which is time consuming and complex), the code producer attaches a “proof” that the code is correct • the “proof” shortcuts the verification: checking a given solution is often much simpler than finding it in the first place • the Java KVM for embedded devices uses a kind of PCC (“stack maps”) that may become a standard for Java
A Third Approach • instead of verifying or checking, we have been been investigating a class of mobile code representations that can provably encode only “legal” programs • security is obtained by construction • the need for verification disappears • our approach can provide the identical security guarantees as the Java Virtual Machine, but it can express most of them statically as a well-formedness property of the encoding itself • in our solution, an incoming mobile program may not do the intended task, but it will not do anything “bad” - for any definition of “bad” that can be cast into a type system • interestingly enough, such “intrinsically secure” mobile code is also denser than virtual machine code, and permits to generate better object code, and faster
A Third Approach: Two Variants • we have in fact designed not just one, but two alternative mobile-code representations, both of which provide “security by construction” • they differ in the semantic level at which they describe the mobile program • “high-level”: close to the source language but with supporting compiler-related information • “low-level”: as close to what a modern code generator back-end needs without being target-machine specific
Rationale for Multi-Track Approach • the relative trade-offs (encoding density vs. decoding/dynamiccompilation speed vs. code quality) are completely unknown and can only be determined by collecting experience with actual prototypes • by implementing both the “high-level” and the “low-level” solution, we are exploring the design space rather than designing an ad-hoc solution
Low-Level Encoding [PLDI01] • SafeTSA preserves control and dataflow information as well as full typing for each intermediate result • it is based on SSA form, a representation that is also used internally by a number of important state-of-the-art research compilers for Java, e.g., • IBM T.J. Watson Lab: Jalapeño • Microsoft: Marmot • Sun Microsystems: HotSpot Server • SafeTSA is far easier to parse into a form useful for code optimization than JVM-code
Current Status and Results • based on Martin Odersky’s Pizza front-end • can compile all of Java to safeTSA • prototype run-time environment almost finished; will provide full interoperability between safeTSA and JVM-based class files • can mix and match both formats with dynamic loading • call-backs from JVM to safeTSA are ugly • safeTSA representation is surprisingly small
High-Level Encoding [Babel01] • ultra-compact representation using grammar-based compression of abstract syntax trees • goal is to transport the source program along with as much compiler-related support information as possible
Schematic Overview Source Parser CodeGenerator “classic Frontend” AST Encoder “classic Backend” AST Decoder PPM-Model & Arithmetic Encoder PPM-Model & Arithmetic Decoder 011000101010… Compression / Decompression
Compression Overview • Parsing: get AST from source • Serialize: get stream of symbols from AST • Modeling: use context and abstract grammar to build predictive statistical model • Coding: use arithmetic coding with model
Types of nodes in AST • String, Integer, Terminal • List : e.g. Block = BlockStatement* • Aggregate : e.g. IF = cond thenbranch elsebranch • Choice : e.g. BinOp = Plus | Minus | … • Information is in choice nodes • want to guess which choice is taken
Transmitting an AST • any predefined serialization will do • we use depth first (pre-order) • when serialized, most info in AST is redundant, e.g. • order and kind of kids of aggregate nodes known • this is because we use knowledge of the grammar • must encode index of choice made at choice nodes
Prediction by Partial Match (PPM) • dynamically maintain counts of characters seen after various contexts • contexts may be of various lengths • eg. for “abcd”, contexts for “d” are : • length 1 context : “c” • length 2 context : “bc” • length 3 context : “abc” • predict characters in current context by looking at what occurred previously
a b c d b c d d c d Maintaining Contexts * a b c d
Adapting PPM To Work On Trees • each node is a symbol • the context is path from root to the current node in the AST • problem: in DFS, what when we reach leaf node and go back up to ancestor? • pop context – all active nodes moved up one position to their parents (in context tree)
Encoding • PPM is used to model the choices made at choice nodes, i.e. associate a probability with every choice • these probabilities are used to drive an arithmetic coder to output bits
Compressing Constants • constants (strings, integers, names) are a significant fraction of source • to compress: make table of constants, and refer to them by their index in this table • further compress: maintain different tables for strings, names etc. – reduces number of bits in index • currently exploring more sophisticated context modeling ideas for compressing constants
AST Compression: Example AST for: i = i + 1 Relevant grammar rules Stmt = If|While|Assign|…. Assign = Lvalue Expr Lvalue = Field|VarAccess|… Expr = Unary|Binary|… Binary = BinOp Expr Expr Choice nodes Preorder traversal Stmt AssignLvalueVarAccess iExpr Binary ExprVarAccess iExprLiteral IntLiteral 1BinOp +
AST Compression: Example Context tree AST for: i = i + 1
AST Compression: Example AST for: i = i + 1 Context tree
AST Compression: Example AST for: i = i + 1 Context tree Model: Prob(j) = 0.3 Prob(k) = 0.5 Prob(i) = 0.2 Send model and choice “i” to arithmetic coder
Status and Results • compressor/decompressor prototype written in Python • completely generic – can be used with any abstract grammar • have implemented the Java abstract grammar • works with single Java source files as well as entire packages. • comparison for Java class-file compression with Pugh’s results (best published Java compressor)
Results: Classes Classes from Sun’s javac package - all sizes in bytes
Results: Archives compressed collections of classes - all sizes in bytes • compressed ASTs are 5-50% smaller than Pugh’s • 3-8 times smaller than uncompressed class files or JAR files
Performance-Enhancing Information • now raise the semantic level of the grammar • e.g. “Escape Analysis” • an object that doesn’t “escape” its defining scope can be allocated on the stack rather than on the heap • this optimization alone can often double performance • the analysis itself is very difficult to do, but the results of the analysis are easy to verify • augment the type system by “escaping/non-escaping” • make this part of the encoding scheme itself • e.g., => a non-escaping object cannot be assigned to a variable from an enclosing scope
Insights So Far • abstract syntax trees viable as a mobile code format • can be highly compressed • Java archives by factor of 3-8 • 5-50% better than Java bytecode specific compression by Pugh
Overall Project Achievements • lead the way to a genuine improvement over virtual machine transportation formats • security without need for validation • tamper-proof performance-improving information • innovative and generic program compression method as a useful by-product of this effort
Task Schedule • Y1 Milestones: • source-level representation => Java compression • low-level representation • core calculus representation • Y2 Milestones: • system prototypes • trade-off analysis • encoding format comprehensive definition • End of Project: • system deliverable • comprehensive documentation 1999 2000 2001 2002 • investigate: • multiple source languages • graph-based encoding schemes • proof-carrying code • investigate: • requirements ofoptimizing code generators • integration of security vs. compiler-related data • investigate: • mutual interaction of security, efficiency, and compression density • security of system
Mobile Code Security Revisited • provided through type-safe programming language and type-safe APIs • semantically equivalent to transporting source code (everybody does it this way) • but many policies currently cannot be expressed in terms of a type system and hence need to be implemented inside the library • “open only files in directory X” • “initiate connections only with IP addresses in range […]” • “execute no more than N instructions between OS calls” • “do not send on network after reading local files” • => security automata • need to represent these properties directly and support them along the whole pipeline from code producer to code consumer • => some other PIs in Oasis are working on these themes and their work can be directly beneficial to this project
Transition of Technology • our prototype implementation(s) will be made available in source form • the idea is to create a “turnkey” replacement to current Java compilers and JVM runtime systems • you simply take your code and recompile using our compiler • it will then run on our runtime • our runtime will also run your old JVM class files • you can even mix our stuff with JVM class files • => we simply provide a new (better!) mobile code transportation layer without changing anything else