Credible Compilation With Pointers
Martin Rinard and Darko Marinov
Laboratory for Computer Science
Massachusetts Institute of Technology
Goal
[Diagram: Source Code (C, Java) enters the Compiler, which emits Object Code together with a Proof that Object Code Implements Source Code. A Proof Checker examines the result and answers Yes or No.]
Proposed Approach
[Diagram: Source → Parser → Internal Representation → Analysis & Transformation → Internal Representation → Analysis & Transformation → Internal Representation → Simple Code Generator → Binary Code. Each Analysis & Transformation step also emits an Equivalence Proof.]
Key Aspects
• Majority of compiler structured as a sequence of transformations on a standard intermediate format
• Compiler generates a proof for each transformation
• Separate verification frameworks for
  • Parser
  • Transformations
  • Code generator
• Today: Proving Transformations Correct
Overview of Framework
• Compiler operates on compilation units
  • procedure, loop nest, method
• Set of externally visible variables
  • global variables, instance variables
• Correctness condition
  • transformation preserves final values of externally visible variables
  • original and transformed programs terminate under same conditions
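Example
Externally visible variable: g
[Diagram: two control flow graphs. Original program: i ← 0; i ← i + 3; g ← 2 * i; branch on i < 24; exit. Transformed program: g ← 0; g ← g + 6; branch on g < 48; exit.]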
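The two programs in the example can be written out and run side by side. This is a minimal sketch, assuming each branch loops back to the increment while its condition holds (the loop edges are inferred from the flattened diagram), with `original` and `transformed` as hypothetical function names.

```python
def original():
    """Original program: g is recomputed from i on every iteration."""
    i = 0
    while True:
        i = i + 3
        g = 2 * i
        if not (i < 24):      # assumed back edge: repeat while i < 24
            return g          # final value of the externally visible g


def transformed():
    """Transformed program: i has been eliminated, g is stepped directly."""
    g = 0
    while True:
        g = g + 6
        if not (g < 48):      # assumed back edge: repeat while g < 48
            return g


# Both programs terminate and leave the same final value in g.
print(original(), transformed())
```

Running both programs only illustrates the correctness condition for this one input-free example; it is a test, not the proof the framework asks the compiler to produce.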
Structure of a Transformation
• Analysis to discover program properties
  • reaching definitions
  • available expressions
• Program transformation
  • constant and copy propagation
  • common subexpression elimination
• Correctness of transformation often depends on correctness of analysis result
Two Stage Proof Structure
• Prove analysis results correct
  • Classic Floyd approach for proving program properties
• Use analysis results to prove simulation relations between original and transformed programs
  • State equality of expressions at corresponding program points under certain execution conditions
Standard and Simulation Invariants
• Invariants for program analysis results
  • <c>p: “The condition c is always true at the program point p”
• Simulation invariants for transformations
  • <c1,e1>p1 - <c2,e2>p2: “For all executions of the transformed program that reach the program point p2 with the condition c2 true, there exists an execution of the original program that reaches p1 with c1 true such that e1 = e2”
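Example
[Diagram: original program with points 1: i ← 0; 2: i ← i + 3; 3: g ← 2 * i; 4: i < 24; 5: exit. Transformed program with points a: g ← 0; b: g ← g + 6; c: g < 48; d: exit.]
• Standard invariant: <g = 2 * i> 4
• Simulation invariants: <true, 2 * i> 2 - <true, g> b and <2 * i> 4 - <g> c
• Correctness condition: <g> 5 - <g> d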
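The invariants listed for this example can be checked against recorded executions. The sketch below is an illustration under several assumptions: a program point means "just before the node runs", variables start at 0, the loops have the back edges inferred from the diagram, and "there exists an execution of the original program" is approximated by set inclusion over the values seen in one run. The helper names (`trace_original`, `trace_transformed`, `values`) are hypothetical.

```python
def trace_original():
    """Record (point, state) just before each node of the original program runs."""
    s, trace = {"i": 0, "g": 0}, []
    at = lambda p: trace.append((p, dict(s)))
    at(1); s["i"] = 0
    while True:
        at(2); s["i"] = s["i"] + 3
        at(3); s["g"] = 2 * s["i"]
        at(4)
        if not (s["i"] < 24):
            break
    at(5)
    return trace


def trace_transformed():
    """Record (point, state) just before each node of the transformed program runs."""
    s, trace = {"g": 0}, []
    at = lambda p: trace.append((p, dict(s)))
    at("a"); s["g"] = 0
    while True:
        at("b"); s["g"] = s["g"] + 6
        at("c")
        if not (s["g"] < 48):
            break
    at("d")
    return trace


def values(trace, point, expr):
    """Set of values that expr takes over all visits to the given point."""
    return {expr(s) for p, s in trace if p == point}


orig, trans = trace_original(), trace_transformed()

# Standard invariant <g = 2 * i> 4
standard_ok = all(s["g"] == 2 * s["i"] for p, s in orig if p == 4)

# Simulation invariants: every value seen on the transformed side
# is also seen on the original side at the paired point.
sim_2_b = values(trans, "b", lambda s: s["g"]) <= values(orig, 2, lambda s: 2 * s["i"])
sim_4_c = values(trans, "c", lambda s: s["g"]) <= values(orig, 4, lambda s: 2 * s["i"])
sim_5_d = values(trans, "d", lambda s: s["g"]) <= values(orig, 5, lambda s: s["g"])

print(standard_ok, sim_2_b, sim_4_c, sim_5_d)
```

A check like this can only falsify an invariant on the executions it observes; the proof rules in the following slides are what establish it for all executions.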
Proving Standard Invariants
• Proof rules propagate invariants backwards through control flow graph
  • Substitution at assignment statements
  • Add the branch condition at conditional branches
  • Propagate along all edges at join points
[Diagram: propagating <g = 2 * i> 4 backwards through node 3 (g ← 2 * i) yields <2 * i = 2 * i> 3.]
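The substitution rule for assignments can be sketched in a few lines. This is an illustrative sketch only: conditions are plain Python expression strings, `substitute` and `through_assignment` are hypothetical helpers, and the final check samples integers rather than proving the condition. It covers the assignment rule, not the branch or join rules.

```python
import re


def substitute(cond, var, expr):
    """Replace every whole-word occurrence of var in cond by (expr)."""
    pattern = rf"\b{re.escape(var)}\b"
    return re.sub(pattern, lambda m: f"({expr})", cond)


def through_assignment(cond, var, expr):
    """Condition that must hold before `var = expr` so that cond holds after it."""
    return substitute(cond, var, expr)


# Propagate <g = 2 * i> 4 backwards through node 3, g <- 2 * i.
pre = through_assignment("g == 2 * i", "g", "2 * i")
print(pre)

# Sanity check on sample integers (a test, not a proof).
holds = all(eval(pre, {}, {"i": i}) for i in range(-100, 100))
print(holds)
```

The propagated condition no longer mentions g, which matches the slide's <2 * i = 2 * i> 3: it is trivially true, so the invariant at point 4 is established by the assignment itself.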
Proving Simulation Invariants
• Each proof rule propagates one of the two sides of invariant (<c1,e1>p1 - <c2,e2>p2)
• Right side propagated in transformed program
  • Propagate along all edges at join points
• Left side propagated in original program
  • Propagate along one edge at join points
• Can use other invariants (both standard and simulation) to prove propagated invariant
Simulation Invariant Proof Example
[Diagram: original program with points 1: i ← 0; 2: i ← i + 3; 3: g ← 2 * i; 4: i < 24; 5: exit. Transformed program with points a: g ← 0; b: g ← g + 6; c: g < 48; d: exit.]
Given { <2 * i> 2 - <g> b, <2 * i> 4 - <g> c, <g> 5 - <g> d } and { <g = 2 * i> 4 }
Prove <2 * i> 2 - <g> b
Propagate Right Hand Side
[Diagram: transformed program with points a: g ← 0; b: g ← g + 6; c: g < 48; d: exit.]
To prove <2 * i> 2 - <g> b
Propagate RHS to a, c
Must prove
• <2 * i> 2 - <0> a
• <2 * i> 2 - <g < 48, g> c
Propagate Left Hand Side
[Diagram: original program with points 1: i ← 0; 2: i ← i + 3; 3: g ← 2 * i; 4: i < 24; 5: exit.]
To prove <2 * i> 2 - <0> a
• Propagate LHS to 1: <2 * 0> 1 - <0> a
To prove <2 * i> 2 - <g < 48, g> c
• Propagate LHS to 4: <i < 24, 2 * i> 4 - <g < 48, g> c
Use Other Invariants
• Want to prove <i < 24, 2 * i> 4 - <g < 48, g> c
• But <2 * i> 4 - <g> c is one of given invariants
• So we can substitute 2 * i in for g, and reduce our problem to proving
  • 2 * i < 48 implies i < 24
  • 2 * i = 2 * i
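The two remaining obligations are small enough to state and discharge mechanically. The Lean 4 sketch below assumes that i ranges over unbounded integers (machine arithmetic with overflow would need a different statement) and that the toolchain provides the `omega` tactic for linear integer arithmetic.

```lean
-- Obligation 1: the transformed branch condition, with g replaced by 2 * i,
-- implies the original branch condition.
example (i : Int) (h : 2 * i < 48) : i < 24 := by omega

-- Obligation 2: after the substitution the two expressions are identical.
example (i : Int) : 2 * i = 2 * i := rfl
```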
Primary Advantages
• Open Compilation Framework
  • Anyone can provide transformations
  • No need to trust provider
• Buggier Compilers
  • Incorrect compilation much less serious problem
  • Compiler writers can focus on
    • Aggressive optimizations
    • Latest language developments
    • NOT on correct compilation in all cases
Key Challenge
Compiler must be able to control machine at very low level for efficiency
• Pointers
• Register Allocation
• Instruction Selection
• Condition Codes
Pointer Problem
• Cannot use simple expression substitution in presence of pointers
  • Aliasing may cause substitution to produce incorrect result
• Solution:
  • Define substitution in the presence of a set of aliases
  • Compiler uses pointer analysis to produce alias invariants at each program point
  • Proof rules use alias invariants
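A small example shows why purely syntactic substitution breaks under aliasing. The sketch below is one possible reading of the idea, not the rule used in the framework: memory is a dictionary from addresses to values, the statement is `*p = value`, the expression of interest is `*q`, and the `aliased` flag stands in for an alias invariant that a pointer analysis would supply. All function names are hypothetical.

```python
def store_then_read(memory, p, q, value):
    """Ground truth: execute `*p = value`, then evaluate `*q`."""
    memory = dict(memory)
    memory[p] = value
    return memory[q]


def naive_substitution(memory, p, q, value):
    """Syntactic substitution: `*q` does not mention `*p`, so it is left unchanged."""
    return memory[q]


def alias_aware_substitution(memory, p, q, value, aliased):
    """Substitution under an alias invariant: `*q` becomes `value` when p and q alias."""
    return value if aliased else memory[q]


memory = {100: 0, 200: 0}

# Unaliased case: p and q point to different cells, both approaches agree.
print(store_then_read(memory, 100, 200, 7),
      naive_substitution(memory, 100, 200, 7),
      alias_aware_substitution(memory, 100, 200, 7, aliased=False))

# Aliased case: p and q point to the same cell, naive substitution is wrong.
print(store_then_read(memory, 100, 100, 7),
      naive_substitution(memory, 100, 100, 7),
      alias_aware_substitution(memory, 100, 100, 7, aliased=True))
```

When the analysis cannot decide whether p and q alias, a proof would have to cover both values of `aliased`.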
Pointer Details
• Handling aliasing uncertainty
  • Must prove result for both aliased and unaliased cases
• Flow-insensitive analyses
  • Change semantics of source language slightly to make analysis results valid
  • Provide derived rules specifically to support validation of flow-insensitive results
Low Level Details
• Register Allocation
• Instruction Selection
• Condition Codes
Standard Compiler Structure
[Diagram: Source → Parser → Machine-Independent Representation (Machine-Independent Analyses and Transformations) → Lowering → Machine-Dependent Representation (Machine-Specific Analyses and Transformations) → Code Generation → Binary Code.]
Proposed Compiler Structure
[Diagram: Source → Parser → Single Standard Representation (Analyses and Transformations) → Lowering → Single Standard Representation (Analyses and Transformations) → Code Generation → Binary Code.]
Simple Code Generation
• Basic Approach
  • Each node in control flow graph
  • Corresponds to single instruction in generated code
  • Code generator can be very simple
• Issues:
  • Registers in machine-independent IR
  • Implicitly set state (condition codes)
  • Instruction selection
Register Allocation
• Have a dedicated variable represent each register
  • Not semantically distinct from other variables in intermediate representation
  • But code generator for the specific machine understands representation
  • Allocates dedicated variables in registers
• Can represent results of register allocation in an ostensibly machine-independent IR
Condition Codes
• Instruction may have multiple effects
  • Write registers, set condition codes
• Solution: Macro Instructions
  • Define macro instructions as a sequence of basic nodes in IR
  • System automatically derives proof rules for macro instructions
  • Code generator produces one instruction for each macro instruction
• Approach works for instruction selection
Limitation
• Proof rules are based on concepts of partial correctness
• Not designed to prove equivalences that depend on termination of loops
[Diagram: one program with g ← 0; g ← g + 6; branch on g < 48; exit, and one program with g ← 48; exit.]
Proved Transformations
• Constant Propagation and Folding
• Copy Propagation
• Dead Code Elimination
• Branch Movement
• Induction Variable Elimination
• Loop Unrolling
• Branch Elimination
Related Work
• Totally Correct Compilation
  • Verifix
  • Guttman, Ramsdell, Wand
• Synchronous Languages
  • Cimatti, Giunchiglia, Pecchiari, Pietra, Profeta, Romano, Traverso, Yu
  • Pnueli, Siegel, Singerman
• Proof-Carrying Code
  • Necula, Lee
Conclusion
• Credible Compilation for Imperative Languages
• Basic Concepts
  • Standard Invariants
  • Simulation Invariants
  • Proof rules for propagating invariants
• Support for low-level details
  • Pointers
  • Registers and Condition Codes
  • Instruction Selection