Building “Correct” Compilers K. Vikram and S. M. Nazrul A.
Outline • Introduction: Setting the high level context • Motivation • Detours • Automated Theorem Proving • Compiler Optimizations thru Dataflow Analysis • Overview of the Cobalt System • Forward optimizations in cobalt • Proving Cobalt Optimizations Correct • Profitability Heuristics • Pure Analyses • Concluding Remarks
Introduction The Seven Grand Challenges • In Vivo In Silico • Science for Global Ubiquitous Computing • Memories for Life • Scalable Ubiquitous Computing Systems • The Architecture of the Brain and Mind • Dependable Systems Evolution • Journeys in Non-classical computations
Introduction Dependable Systems Evolution • A long standing problem • Loss of financial resources, human lives • Compare with other engineering fields! • Non-functional requirements • Safety, Reliability, Availability, Security, etc.
Introduction Why the sudden interest? • Was difficult so far, but now … • Greater Technology Push • Model checkers, theorem provers, programming theories and other formal methods • Greater Market Pull • Increased dependence on computing
Introduction A small but significant step Building Correct Compilers
Motivation Why are correct compilers hard to build? • Bugs don’t manifest themselves easily • Where is the bug – program or compiler? • Possible solutions • Check semantic equivalence of the two programs (translation validation, etc.) • Prove compilers sound (manually) • Drawbacks? • Conservative, Difficult, Actual code not verified
Motivation Testing [Diagram: Source → compiler → Compiled Prog; run on input; DIFF the output against the expected output] • To get benefits, must: • run over many inputs • compile many test cases • No correctness guarantees: • neither for the compiled prog • nor for the compiler
Motivation Verify each compilation [Diagram: Source → compiler → Compiled Prog; a Semantic DIFF compares the source with the compiled program] • Translation validation [Pnueli et al 98, Necula 00] • Credible compilation [Rinard 99] • Compiler can still have bugs. • Compile time increases. • “Semantic Diff” is hard.
Motivation Proving the whole compiler correct [Diagram: Source → compiler → Compiled Prog; a Correctness checker is applied to the compiler itself]
Motivation Proving the whole compiler correct • Option 1: Prove compiler correct by hand. • Proofs are long… • And hard. • Compilers are proven correct as written on paper. What about the implementation? [Diagram: compiler, Correctness checker, and hand-written Proofs; the link between the proofs and the implementation is in question]
Motivation gcc-bugs mailing list Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results: • c/9525: incorrect code generation on SSE2 intrinsics • target/7336: [ARM] With -Os option, gcc incorrectly computes the elimination offset • optimization/9325: wrong conversion of constants: (int)(float)(int) (INT_MAX) • optimization/6537: For -O (but not -O2 or -O0) incorrect assembly is generated • optimization/6891: G++ generates incorrect code when -Os is used • optimization/8613: [3.2/3.3/3.4 regression] -O2 optimization generates wrong code • target/9732: PPC32: Wrong code with -O2 -fPIC • c/8224: Incorrect joining of signed and unsigned division • … And this is only for February 2003! On a mature compiler!
Motivation Need for Automation • This approach: proves compiler correct automatically. [Diagram: compiler → Correctness checker, backed by an Automatic Theorem Prover]
The Challenge This seems really hard! [Diagram: the complexity of proving a compiler correct, compared with the complexity that an automatic theorem prover can handle]
Automated Theorem Proving Brief detour thru ATP • Started with AI applications • Reasoning in FOL: sound and complete proof procedures • 1965: Unification and Resolution • Combinatorial Explosion. SAT (NP-complete) and FOL (undecidable in general, only semi-decidable) • Refinements of Resolution, Term Rewriting, Higher-order Logics • Interactive Theorem Proving • Efficient Implementation Techniques • Coq, Nuprl, Isabelle, Twelf, PVS, Simplify, etc.
Optimizations Focus on Optimizations • Optimizations are the most error prone • Only phase that performs transformations that can potentially change semantics • Front-end and back-end are relatively static
Optimizations Common Optimizations • Constant Propagation: replace constant valued variables with constants • Common sub-expression elimination: avoid recomputing value if value has been computed earlier in the program • Loop invariant removal: move computations into less frequently executed portions of the program • Strength Reduction: replace expensive operations (multiplication) with simpler ones (addition) • Dead code removal: eliminate unreachable code and code that is irrelevant to the output of the program
Optimizations Constant Propagation Examples
Optimizations Constant Propagation Condition • Suppose x is used at program point p • If • on all possible execution paths from START of procedure to p • x has constant value c at p • then replace x by c
Optimizations The Analysis Algorithm • Build the control flow graph (CFG) of the program • Make flow of control explicit • Perform symbolic evaluation to determine constants • Replace constant-valued variable uses by their values and simplify expressions and control flow
Optimizations Building the CFG
Optimizations Building the CFG • Composed of Basic Blocks • Straight line code without any branches or merges of control flow • Nodes of CFG • Statements (basic blocks)/switches/merges • Edges of CFG • Possible control flow sequence
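The basic-block construction described above can be sketched in Python. This is a minimal illustration, not the representation used by the authors: the tuple-based instruction format and the opcode names ("assign", "goto", "branch", "ret") are hypothetical, and jump targets are assumed to be valid indices into the instruction list.

```python
def split_basic_blocks(instrs):
    """Partition a list of instructions into basic blocks.

    Hypothetical instruction format: ("assign", text), ("goto", target),
    ("branch", target) for a conditional jump, and ("ret",).
    Each block is returned as a list of instruction indices.
    """
    if not instrs:
        return []
    leaders = {0}                              # the first instruction starts a block
    for i, ins in enumerate(instrs):
        if ins[0] in ("goto", "branch"):
            leaders.add(ins[1])                # a jump target starts a block
        if ins[0] in ("goto", "branch", "ret") and i + 1 < len(instrs):
            leaders.add(i + 1)                 # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


def block_edges(instrs, blocks):
    """Return the CFG edges as sorted (from_block, to_block) pairs."""
    block_of = {b[0]: n for n, b in enumerate(blocks)}   # leader index -> block number
    edges = set()
    for n, b in enumerate(blocks):
        last = instrs[b[-1]]
        if last[0] in ("goto", "branch"):
            edges.add((n, block_of[last[1]]))  # explicit jump
        if last[0] not in ("goto", "ret") and n + 1 < len(blocks):
            edges.add((n, n + 1))              # fall through to the next block
    return sorted(edges)


program = [
    ("assign", "x := 1"),      # 0
    ("branch", 4),             # 1: if (...) goto 4
    ("assign", "y := 2"),      # 2
    ("goto", 5),               # 3
    ("assign", "y := 3"),      # 4
    ("assign", "z := x + y"),  # 5
    ("ret",),                  # 6
]
blocks = split_basic_blocks(program)
edges = block_edges(program, blocks)
```

For this if/else-shaped program the sketch yields four blocks: the branch block, the two arms, and the merge block, with edges from the branch block to each arm and from each arm to the merge.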
Optimizations Symbolic Evaluation • Assign each variable the bottom value initially • Propagate changes in variable values as statements are executed • Based on the idea of Abstract Interpretation
Optimizations Symbolic Evaluation • Flow Functions • x := e: state@out = state@in{eval(e, state@in)/x} • Confluence Operation • join over all incoming edges
Optimizations Symbolic Evaluation • Flow Functions • x := e: state@out = ƒ (state@in) • Confluence Operation • join over all incoming edges
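The flow function and the confluence operation can be sketched as follows for constant propagation. This is a hedged illustration: the three-level lattice (bottom, integer constants, top), the names `BOTTOM`, `TOP`, `join`, `join_states`, `eval_abstract` and `flow_assign`, and the tuple encoding of expressions are all assumptions made for the example, not definitions taken from the slides.

```python
BOTTOM = "bottom"   # no information has reached this point yet
TOP = "top"         # the variable is not a known constant


def join(a, b):
    """Confluence of two abstract values: bottom <= any constant <= top."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def join_states(s1, s2):
    """Join two state vectors pointwise; missing variables count as bottom."""
    return {v: join(s1.get(v, BOTTOM), s2.get(v, BOTTOM))
            for v in s1.keys() | s2.keys()}


def eval_abstract(e, state):
    """Evaluate an expression over abstract values.

    An expression is an int constant, a variable name, or (op, lhs, rhs).
    """
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return state.get(e, BOTTOM)
    op, lhs, rhs = e
    a, b = eval_abstract(lhs, state), eval_abstract(rhs, state)
    if BOTTOM in (a, b):
        return BOTTOM
    if TOP in (a, b):
        return TOP
    return {"+": a + b, "-": a - b, "*": a * b}[op]


def flow_assign(x, e, state_in):
    """Flow function for x := e: state@out = state@in{eval(e, state@in)/x}."""
    state_out = dict(state_in)
    state_out[x] = eval_abstract(e, state_in)
    return state_out
```

For example, `flow_assign("x", ("+", "y", 1), {"y": 5})` maps `x` to 6, and joining two incoming states that disagree on `y` sends `y` to `TOP` while leaving agreed-upon constants intact.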
Optimizations The Dataflow analysis algorithm • Associate one state vector with each edge of CFG. Initialize all entries to ⊥ • Set all entries on outgoing edge from START to ⊤ • Evaluate the expression and update the output edge • Continue till a fixed point is reached
Optimizations Example Evaluation
Optimizations Termination Condition • If each flow function ƒ is monotonic • i.e. x ≤ y => ƒ (x) ≤ ƒ (y) • And if the lattice is of finite height • The dataflow algorithm terminates
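A worklist version of the dataflow algorithm can be sketched as below. Several things here are assumptions for illustration: the CFG encoding (`node -> (statement, successors)`), the function names, the choice to store one state per node (the state on all of its outgoing edges) instead of one per edge, and the assumption that every variable is non-constant (top) on entry. Termination follows the argument above: each update can only move a value up a lattice of height two, so each node's state changes a bounded number of times.

```python
from collections import deque

BOTTOM, TOP = "bottom", "top"   # assumed lattice: bottom <= constants <= top


def join(a, b):
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def eval_abstract(e, state):
    """e is an int, a variable name, or ("+", lhs, rhs)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return state[e]
    _, lhs, rhs = e
    a, b = eval_abstract(lhs, state), eval_abstract(rhs, state)
    if BOTTOM in (a, b):
        return BOTTOM
    if TOP in (a, b):
        return TOP
    return a + b


def constant_propagation(cfg, variables, start="START"):
    """cfg maps node -> (stmt, successors); stmt is None or (var, expr)."""
    preds = {n: [] for n in cfg}
    for n, (_, succs) in cfg.items():
        for s in succs:
            preds[s].append(n)
    out = {n: {v: BOTTOM for v in variables} for n in cfg}
    out[start] = {v: TOP for v in variables}      # assumption: unknown at entry
    worklist = deque(cfg[start][1])
    while worklist:
        n = worklist.popleft()
        state = {v: BOTTOM for v in variables}
        for p in preds[n]:                        # confluence over incoming edges
            state = {v: join(state[v], out[p][v]) for v in variables}
        stmt = cfg[n][0]
        if stmt is not None:                      # flow function for var := expr
            var, expr = stmt
            state[var] = eval_abstract(expr, state)
        if state != out[n]:                       # changed: revisit successors
            out[n] = state
            worklist.extend(cfg[n][1])
    return out


cfg = {
    "START": (None, ["a"]),
    "a": (("x", 1), ["b"]),
    "b": (None, ["c", "d"]),                      # two-way branch
    "c": (("y", 2), ["e"]),
    "d": (("y", 3), ["e"]),
    "e": (("w", ("+", "x", 1)), ["f"]),           # merge point
    "f": (("z", ("+", "x", "y")), []),
}
result = constant_propagation(cfg, ["w", "x", "y", "z"])
```

In this example `x` is the constant 1 on every path to the merge, so `w := x + 1` evaluates to 2, while `y` receives different constants on the two arms and therefore joins to top, as does `z`.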
Optimizations Other Optimizations [Table: dataflow analyses classified by direction (Forward Flow, Backward Flow) and by path condition (All Paths, Any Path)]
Overview Making the problem easier [Diagram: the task of proving the compiler correct is narrowed to the task of proving the optimizer correct, which is handed to an Automatic Theorem Prover] • Only prove optimizer correct. • Trust front-end and code-generator. • Write optimizations in Cobalt, a domain-specific language. • Separate correctness from profitability. • Factor out the hard and common parts of the proof, and prove them once by hand.
Overview The Design [Diagram labels: Cobalt Program, Interpreter, Input, Output]
Overview The Compiler [Diagram: Source Code (if (…) { x := …; } else { y := …; } …;) → Front End → Back End → Binary Executable (10011011 00010100 01101101)]
Overview Results • Cobalt language • realistic C-like IL, operates on a CFG • implemented const prop and folding, branch folding, CSE, PRE, DAE, partial DAE, and simple forms of points-to analyses • Correctness checker for Cobalt opts • using the Simplify theorem prover • Execution engine for Cobalt opts • in the Whirlwind compiler
Overview Cobalt Rhodium ?
Overview Caveats • May not be able to express your opt in Cobalt: • no interprocedural optimizations for now. • optimizations that build complicated data structures may be difficult to express. • A sound Cobalt optimization may be rejected by the correctness checker. • Trusted computing base (TCB) includes: • front-end and code-generator, execution engine, correctness checker, proofs done by hand once
Forward Optimizations Constant Prop (straight-line code) [Diagram: statement y := 5, then statements that don’t define y, then statement x := y; REPLACE x := y with x := 5]
Forward Optimizations Adding arbitrary control flow [Diagram: several paths, each containing y := 5, reach x := y; REPLACE x := y with x := 5] • if statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5
Forward Optimizations Constant prop in English: if statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5
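The English rule can be sketched in Python for the straight-line case. This is not Cobalt syntax and not the Cobalt execution engine: the statement encoding as `(lhs, rhs)` pairs and the function name `constant_prop_straight_line` are hypothetical, and the sketch only handles right-hand sides that are a single integer constant or a single variable. It applies the rule repeatedly in one pass, so a rewritten `x := 5` can itself enable later rewrites of uses of `x`.

```python
def constant_prop_straight_line(stmts):
    """Apply the rule to a straight-line list of (lhs, rhs) assignments.

    rhs is an int constant or a variable name. A use of y is replaced by c
    when "y := c" precedes it and no statement in between defines y.
    """
    known = {}      # variable -> constant from the latest "y := c", not redefined since
    result = []
    for lhs, rhs in stmts:
        if isinstance(rhs, str) and rhs in known:
            rhs = known[rhs]            # transform x := y to x := c
        if isinstance(rhs, int):
            known[lhs] = rhs            # lhs now holds a known constant
        else:
            known.pop(lhs, None)        # lhs redefined with a non-constant value
        result.append((lhs, rhs))
    return result


before = [
    ("y", 5),
    ("z", "w"),     # does not define y
    ("x", "y"),     # rule applies: becomes x := 5
    ("y", "w"),     # redefines y
    ("v", "y"),     # rule does not apply: y is no longer known
]
after = constant_prop_straight_line(before)
```

The last statement is left unchanged because `y := w` intervenes, which is exactly the "statements that don’t define y" side condition of the rule.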