An Introduction to Proof-Carrying Code
Peter Lee, Carnegie Mellon University
Lecture 1, October 29, 2001
ConCert Meeting
Plan • Today: Show and tell. • Cartoons • Some history • Special J compiler • Demo • Next time: Technical details. • LFi and Oracle-based checking • Safety policies • Compiler strategy and annotations • Engineering considerations • Ideas for ConCert-related projects
Ariane 5 • On June 4, 1996, the Ariane 5 took off on its maiden flight. 40 seconds into its flight it veered off course and exploded. It was later found to be an error in reuse of a software component. For the next two years, virtually every research presentation used this picture.
“Better, Faster, Cheaper” • In 1999, NASA lost both the Mars Polar Lander and the Climate Orbiter. • Later investigations determined software errors were to blame. • Orbiter: Component reuse error. • Lander: Precondition violation.
USS Yorktown “After a crew member mistakenly entered a zero into the data field of an application, the computer system proceeded to divide another quantity by that zero. The operation caused a buffer overflow, in which data leaked from a temporary storage space in memory, and the error eventually brought down the ship's propulsion system. The result: the Yorktown was dead in the water for more than two hours.”
Programmable mobile devices • By 2003, one in five people will own a mobile communications device. • Nokia expects to sell 500M Java-enabled phones in 2003. • Most of these devices will be power and memory limited.
Security Attacks • According to CERT, the majority of security attacks exploit • input validation failure • buffer overflow • VBS http://www.cert.org/summaries/CS-2000-04.html
Observations • Failures often due to simple problems “in the details.” • Reuse is critical but perilous. • Performance still matters a lot.
Safety Engineering • Small theorems about large programs would be useful. • Need clearly specified interfaces and checking of interface compliance. • Must not sacrifice performance.
The Code Safety Problem [Cartoon: “Please install and execute this.”]
Code Safety [Diagram: untrusted code arrives at a trusted host (CPU). Is this safe to execute?]
Approach 4: Formal Verification [Diagram: a theorem prover checks the code before the trusted host (CPU) executes it.] Flexible and powerful, but really, really, really hard, and the prover itself must be correct.
A Key Idea: Explicit Proofs [Diagram: a certifying prover produces an explicit proof for the code; a proof checker on the trusted host (CPU) validates the proof before the code is run.]
A Key Idea: Explicit Proofs [Same diagram, annotated: the certifying prover now sits outside the trusted host, so we no longer need to trust that component.]
Proof-Carrying Code [Necula & Lee, OSDI ’96] [Diagram: party A ships party B a package containing the code (typically native or VM code) together with a formal proof, or “explanation”, of its safety, depicted as the string “rlrrllrrllrlrlrllrlrrllrrll…”.]
Proof-Carrying Code [Same diagram, annotated: the certifying prover no longer needs to be trusted; the attached proof is reasonable in size (0-10% of the code); and the proof checker on the trusted host (CPU) is simple, small (<52KB), and fast.]
Automation via Certifying Compilation [Diagram: source code is translated by a certifying compiler, and a certifying prover generates the proof for the resulting object code; the host’s proof checker (CPU) then validates it.] The whole thing looks and smells like a compiler: % spjc foo.java bar.class baz.c -ljdk1.2.2
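To make the trust boundary concrete, here is a minimal Java sketch of the PCC package and the host-side check. This is not the Special J implementation; the names (CodeWithProof, ProofChecker, Host) and method signatures are hypothetical.

    // A minimal sketch of the PCC trust boundary; all names are hypothetical.
    import java.util.Arrays;

    final class CodeWithProof {
        final byte[] code;    // typically native or VM code, from the certifying compiler
        final byte[] proof;   // the safety proof from the certifying prover (opaque bytes here)
        CodeWithProof(byte[] code, byte[] proof) {
            this.code = Arrays.copyOf(code, code.length);
            this.proof = Arrays.copyOf(proof, proof.length);
        }
    }

    // The only trusted piece on the host: a small, fast proof checker.
    interface ProofChecker {
        boolean proves(byte[] proof, byte[] code, String safetyPolicy);
    }

    final class Host {
        static void installAndRun(CodeWithProof pkg, ProofChecker checker, String policy) {
            // The compiler and prover that built pkg are untrusted; only this check matters.
            if (!checker.proves(pkg.proof, pkg.code, policy)) {
                throw new SecurityException("rejected: proof does not establish safety");
            }
            execute(pkg.code);   // safe to hand to the CPU once the proof checks out
        }
        private static void execute(byte[] code) { /* hand off to the CPU */ }
    }

The point of the shape: the expensive, complicated tools (compiler, prover) stay on the producer side and may be arbitrarily buggy; only the small checker and the execution path need to be trusted.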
The Role of Programming Languages • Civilized programming languages can provide “safety for free”. • Well-formed/well-typed implies safe. • Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.
The Role of Java in this Short Course • In recent years, Java has been the main focus of my work. • Java is just barely a civilized programming language. • We routinely do better than this.
Java • Java is probably a worthwhile subject of research. • However, it contains many outrageous and mostly inexcusable design errors. • As researchers, we should not forget that we have already done much better, and must continue to do better in the future.
Note • Our current approach seems to work for many problems. • But it is the only one we have tried — there are many others. • PCC is a general concept and we have just barely scratched the surface.
Overview of Our Approach [Cartoon: a dialogue between the code producer and the host.]
• Producer: “Please install and execute this.”
• Host: “OK, but let me quickly look over the instructions first.”
• Host: “This store instruction is dangerous!”
• Host: “Can you prove that it is always safe?”
• Producer: “Yes! Here’s the proof I got from my certifying Java compiler!”
• Host: “Your proof checks out. I believe you because I believe in logic.”
History: early 90’s • Fox project starts building the FoxNet • Need to control memory layout of data • Words, bytes, etc. (endianness? alignment?) • Boxed vs unboxed data (efficiency? control?) • Packet headers (how to write packet filters?) • ML not expressive enough, and compiler technology is inadequate • Harper invents intensional polymorphism, typed intermediate languages, and type-directed compiling • Biagioni, et al., extend SML design
History: mid 90’s • Question: Can these ideas be used in a “production-quality” compiler for a big language like ML? • Morrisett and Tarditi build TIL • General hints on IL design • Encouraging signs that optimizations are OK • Stone and Harper design the MIL • Lots of work, world-wide, on type-directed compiling • Work begins on TILT
History: mid 90’s • An easy observation in 1995: • Types in TIL are not carried all the way down to the final target code • The idea of enclosing LF encodings of proofs with code is “floating around” • Lee and Necula work on this, but get nowhere • Many problems, such as optimizations • Necula goes to DEC SRC to intern with Detlefs and Nelson • Works on extending ESC to catch memory leaks in Modula-3 programs • The next Fall, takes Frank’s Constructive Logic course
History: 1996 • Necula and Lee write several standard BPF packet filters in hand-optimized Alpha assembly code. • Simple operational semantics for a core “safe Alpha” • Checks safety conditions for each instruction execution • Proof system for “real Alpha” • Encoded in LF • Proofs generated and checked using Elf • Results in “self-certified code”, later “proof-carrying code” • Plus proof representations, certifying compilation, safety policies (incl. resource bounds) • Inspires significant follow-on and new work at Cornell, Princeton, INRIA, and many other places
History: 1999 • CMU releases PCC to Cedilla Systems Incorporated. • Patent 6,128,774. Oct. 2000, Safe to execute verification of software (Necula and Lee) • Patent 6,253,370. June 2001, Method and apparatus for annotating a computer program to facilitate subsequent processing of the program (Abadi, Ghemawat, and Stata) • In less than 26 months, a complete optimizing “ahead-of-time” PCC compiler for Java.
History: Today • Strong similarities in TILT, PCC, TAL, … • Compiler design is changing • Some day, all compilers will be certifying
History: Today • Are proofs really necessary? • Probably not • And they are messy, compared to types • But as a verification mechanism, proof checking seems to have some possibly significant engineering advantages over type checking
The primary contribution • “Proof engineering”. • PCC more clearly defined the proof-engineering problem • How to do checking • with minimal overhead and restriction on programs, • with minimal time and space overhead in checking, • with minimal size and complexity of the checker, • and with minimal need for changes when the proof system changes
K Virtual Machine • Designed to support the CLDC. • Must fit into <128KB. • Must have fast bytecode verification. • kJava class files must be Java-compatible. • Divides bytecode verification into two stages.
kJava and KVM [Diagram: source code is compiled by the kJava compiler into byte codes; the kJava preverifier adds annotations (Annot); the verifier on the device (CPU) checks the annotated byte codes at load time.]
KVM Verification • “Preverification” is performed by the code producer. • Uses global (iterative) analysis to compute the types of stack slots and local vars at every join point. • Second stage is performed by class loader. • Simple linear scan verifies correctness of join-point annotations.
KVM Example [from Frank Yellin]

Source:

    static void test(Long x) {
        Number y = x;
        while (y.intValue() != 0) {
            y = nextValue(y);
        }
        return;
    }

Byte codes, with the preverifier’s join-point typing annotations (local variable types | operand stack; <> = empty stack):

     0. aload_0
     1. astore_1
     2. goto 10
            Long Number | <>
     5. aload_1
     6. invokeStatic nextValue(Number)
     9. astore_1
            Long Number | <>
    10. aload_1
    11. invokeVirtual intValue()
    14. ifne 5
    17. return
KVM Verification • The second stage verifier is a 10KB program that requires • a single scan of the code, and • <100 bytes of run-time storage. • Impressive! • This is Java verification done right.
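As a rough sketch of what that single scan can look like (not the KVM’s actual code): the second-stage verifier tracks the current types of locals and stack and, at each join point, simply compares its tracked state against the preverifier’s recorded annotation instead of iterating to a fixed point. The Insn, TypeState, and stackMaps names are hypothetical, and a real verifier would also check every branch’s target against the recorded map at that target, which is omitted here.

    import java.util.Map;

    // Hypothetical types: an instruction, the verifier's abstract state, and the
    // preverifier's join-point annotations keyed by bytecode offset.
    interface Insn { int offset(); boolean isJoinTarget(); }
    interface TypeState {
        TypeState step(Insn i);                      // transfer function; throws VerifyError on ill-typed use
        boolean compatibleWith(TypeState recorded);  // subtype-wise comparison
    }

    final class SecondStageVerifier {
        // One linear pass; the only mutable state is the current TypeState.
        static void verify(Insn[] code, Map<Integer, TypeState> stackMaps, TypeState entry) {
            TypeState current = entry;
            for (Insn insn : code) {
                if (insn.isJoinTarget()) {
                    TypeState recorded = stackMaps.get(insn.offset());
                    if (recorded == null || !current.compatibleWith(recorded)) {
                        throw new VerifyError("bad annotation at offset " + insn.offset());
                    }
                    current = recorded;          // continue from the (now verified) annotation
                }
                current = current.step(insn);    // also checks the instruction itself
            }
        }
    }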
Join-Point Annotations • All of these approaches to certified code make use of join-point typing annotations to reduce code verification to a simple problem. • They are essentially the classical loop invariants of the Dijkstra/Hoare program verification approach.
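For reference, the classical while rule of Hoare logic, with I as the loop invariant; in the KVM example above, the annotation recorded at the loop head plays exactly this role, stated here informally as a typing predicate:

    \frac{\{\, I \land b \,\}\; S \;\{\, I \,\}}
         {\{\, I \,\}\; \mathbf{while}\ b\ \mathbf{do}\ S \;\{\, I \land \lnot b \,\}}
    \qquad
    I \;\equiv\; x : \mathtt{Long} \;\land\; y : \mathtt{Number} \;\land\; \mathit{stack} = \langle\rangle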
Overheads • In TAL and PCC we observe relatively large annotation sizes (~10-20%), sometimes much more. • Unknown for kJava. • Research question: • Can we reduce this size? • Checking speed and storage space are also a problem.
High-Level Architecture [Diagram: the agent supplies code together with an explanation (proof); on the host, a verification condition generator processes the code according to the safety policy, and the checker validates the explanation against the result.]
The VCGen • The verification condition generator (VCGen) examines each instruction. • It is a symbolic evaluator that essentially implements the operational semantics of a “safe” version of the machine language. • It checks some simple properties directly. • E.g., direct jumps go to legal addrs. • Informally, it invokes the Checker when “dangerous” instructions are encountered.
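As a rough illustration of that structure (not the Special J implementation), a VCGen can be written as a loop over the instructions that performs the easy checks directly and emits a proof obligation for each dangerous instruction, such as a memory write. The instruction forms and names below (Instr, Jump, Store, Obligation, VcGen) are hypothetical, and the symbolic-state bookkeeping is omitted.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical instruction forms for a tiny "safe machine language" (Java 17+ records).
    sealed interface Instr permits Jump, Store, Other {}
    record Jump(int target) implements Instr {}
    record Store(String addrExpr) implements Instr {}
    record Other(String text) implements Instr {}

    // A proof obligation: a formula the accompanying proof must establish.
    record Obligation(int pc, String formula) {}

    final class VcGen {
        // Walk the code like a symbolic evaluator: check simple properties directly,
        // and emit an obligation for each "dangerous" instruction (here, memory writes).
        static List<Obligation> run(List<Instr> code) {
            List<Obligation> obligations = new ArrayList<>();
            for (int pc = 0; pc < code.size(); pc++) {
                Instr instr = code.get(pc);
                if (instr instanceof Jump j) {
                    // Simple property checked directly: direct jumps must stay inside the code.
                    if (j.target() < 0 || j.target() >= code.size()) {
                        throw new SecurityException("illegal jump target at pc " + pc);
                    }
                } else if (instr instanceof Store s) {
                    // Dangerous instruction: ask the Checker (via the proof) to show the
                    // written address is one the safety policy allows.
                    obligations.add(new Obligation(pc, "safe_to_write(" + s.addrExpr() + ")"));
                }
                // A real VCGen would also update a symbolic state per instruction (omitted).
            }
            return obligations;
        }
    }

Roughly speaking, the conjunction of the emitted obligations is the verification condition that the Checker must see discharged by the enclosed proof.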