Translation Validation for an Optimizing Compiler

Guy Erez
Based on George C. Necula's article (ACM SIGPLAN 2000)
Advanced Programming Languages Seminar, Winter 2000

In a Nutshell
• The problem: verify that the optimized code and the source code are equivalent
• Partial (heuristic) solution: independently prove the validity of each translation pass
• Motivation: optimizer testing

Outline
• Introduction
• Intermediate language
• An extended example
• Simulation relation
• Execution pair
• Equivalence checking
• Branch navigation
• Results and limitations

Methods of Proving Compiler Correctness
• Prove the general correctness of the compiler:
  • absolute
  • tedious
  • impractical for large programs
  • highly dependent on the compiler code

Methods of Proving Compiler Correctness (cont.)
• Show that each translation phase was valid:
  • weaker
  • a separate proof for each program
  • applicable to large programs
  • independent of the compiler code

Compilation Process
Source Code → Intermediate Language (IL) → Target Code

Optimization Process
IL Code 0 → IL Code 1 → … → IL Code n
• Each arrow is one optimization pass, and the validator checks each pass

The IL in GNU C (subset)
• Instructions:
• Expressions:
• Operators:

An Example
extern int g;
extern int a[…];

int main(void)
{
    int n = …;  /* n contains the length of the array */
    int i;
    for (i = 0; i < n; i++)
        a[i] = g * i + 3;
    return i;
}

And in IL…
    for (i = 0; i < n; i++) a[i] = g * i + 3;
    return i;
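The IL listing from the original slide did not survive. As a rough stand-in, the sketch below lowers the example loop by hand into the label-and-conditional-jump shape that compiler ILs typically use. It is not gcc's actual IL. The fixed size `N_MAX`, the function names, and passing `n` as a parameter are assumptions added so that the code compiles.

```c
#define N_MAX 16          /* hypothetical array size; the slide leaves it elided */

int g;                    /* stands in for the slide's `extern int g` (zero unless set) */
int a[N_MAX];             /* stands in for the slide's `extern int a[]` */

/* The loop exactly as in the source program; requires n <= N_MAX. */
int source_loop(int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = g * i + 3;
    return i;
}

/* The same loop lowered by hand into an IL-like shape:
   only assignments, labels, and (conditional) jumps. */
int il_loop(int n)
{
    int i = 0;
top:
    if (!(i < n))
        goto done;        /* exit test at the top, as in a while loop */
    a[i] = g * i + 3;
    i = i + 1;
    goto top;
done:
    return i;
}
```

Calling `source_loop(n)` and `il_loop(n)` with the same `n` and `g` leaves the same contents in `a` and returns the same value.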

After Transformation…
• Use registers
• Transform the while loop into a repeat loop
• Open question: which parts of the source and the target correspond (? ⇔ ?)
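The following C-level sketch assumes that only the two listed transformations are applied. It keeps `i` in a `register` variable (`r_i`, a hypothetical name). It also turns the top-tested loop into a bottom-tested repeat (do-while) loop, guarded by one test so that `n <= 0` still runs zero iterations. `same_behavior` compares the two versions on one input only. Passing such tests is not a proof, and that gap is what translation validation addresses.

```c
#include <string.h>

#define N_MAX 16          /* hypothetical array size; n must not exceed it */

static int g = 7;         /* hypothetical value for the extern g */
static int a_src[N_MAX];  /* array written by the source version */
static int a_tgt[N_MAX];  /* array written by the target version */

/* Source: top-tested loop (a while loop in IL terms). */
static int source_version(int n)
{
    int i;
    for (i = 0; i < n; i++)
        a_src[i] = g * i + 3;
    return i;
}

/* Target: i lives in a register and the loop is a repeat (do-while) loop.
   The guard before the loop preserves behavior when n <= 0. */
static int target_version(int n)
{
    register int r_i = 0;
    if (r_i < n) {
        do {
            a_tgt[r_i] = g * r_i + 3;
            r_i++;
        } while (r_i < n);
    }
    return r_i;
}

/* Runs both versions on one input and compares the return value and the
   array contents. This checks a single input; it proves nothing in general. */
int same_behavior(int n)
{
    memset(a_src, 0, sizeof a_src);
    memset(a_tgt, 0, sizeof a_tgt);
    if (source_version(n) != target_version(n))
        return 0;
    return memcmp(a_src, a_tgt, sizeof a_src) == 0;
}
```

For example, `same_behavior(0)` and `same_behavior(5)` both return 1. Without the guard before the `do`, the `n <= 0` case would write `a_tgt[0]` and return 1 instead of 0.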

Equivalence
• $x_1, \dots, x_n$: variables in the source
• $y_1, \dots, y_m$: variables in the target
• Variable equivalence: $x_1 = y_3$
• Expression equivalence: $x_1 + x_2 = y_3 + 6$
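Here is a minimal sketch of these two kinds of equivalence. It assumes they are linear with integer coefficients and that each program has at most four variables. Each equivalence is stored as a linear form over the source variables x1..x4 and target variables y1..y4 that must equal zero, and `equiv_holds` evaluates it on one concrete pair of states. The `Equiv` type and the bounds `N_SRC` and `N_TGT` are hypothetical.

```c
#define N_SRC 4   /* hypothetical bound on source variables x1..x4 */
#define N_TGT 4   /* hypothetical bound on target variables y1..y4 */

/* An equivalence written as: sum sx[k]*x(k+1) + sum ty[k]*y(k+1) + c == 0 */
typedef struct {
    long sx[N_SRC];   /* coefficients of the source variables */
    long ty[N_TGT];   /* coefficients of the target variables */
    long c;           /* constant term */
} Equiv;

/* Returns 1 if the equivalence holds for the given concrete states. */
int equiv_holds(const Equiv *e, const long x[N_SRC], const long y[N_TGT])
{
    long v = e->c;
    for (int k = 0; k < N_SRC; k++)
        v += e->sx[k] * x[k];
    for (int k = 0; k < N_TGT; k++)
        v += e->ty[k] * y[k];
    return v == 0;
}

/* x1 = y3, i.e. x1 - y3 == 0 */
const Equiv var_equiv = { {1, 0, 0, 0}, {0, 0, -1, 0}, 0 };
/* x1 + x2 = y3 + 6, i.e. x1 + x2 - y3 - 6 == 0 */
const Equiv expr_equiv = { {1, 1, 0, 0}, {0, 0, -1, 0}, -6 };
```

With x1 = 5 and x2 = 9, the expression equivalence holds when y3 = 8 and fails when y3 = 5.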

Simulation Relation
• A set of equivalences between a source block and a target block

Execution Pair
• Definition: an execution path in the source together with its corresponding path in the target

Checking Equivalence
• Equivalence is checked at the end of a specific execution pair
• The value of a variable after the run is marked with a prime
• Symbolic substitution, e.g. $x' = x + 1$ and $y' = y \cdot 3$
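Symbolic substitution can be sketched for the special case where each variable's effect over the run is affine, such as x' = x + 1 and y' = y * 3. An expression over the primed (end-of-run) variables is rewritten into one over the start-of-run variables. The restriction to affine expressions and the `Lin` representation are simplifying assumptions, much narrower than what the article's checker handles.

```c
#define NV 2   /* index 0 = x (source variable), index 1 = y (target variable) */

/* Affine expression: coef[0]*x + coef[1]*y + c */
typedef struct { long coef[NV]; long c; } Lin;

/* e is written over the primed variables x', y'. defs[k] gives the k-th
   primed variable as an affine expression over the unprimed variables.
   Returns e with every primed variable replaced, expanded and collected. */
Lin substitute(Lin e, const Lin defs[NV])
{
    Lin r = { {0}, e.c };
    for (int k = 0; k < NV; k++) {
        for (int j = 0; j < NV; j++)
            r.coef[j] += e.coef[k] * defs[k].coef[j];
        r.c += e.coef[k] * defs[k].c;
    }
    return r;
}

/* The effects used above: x' = x + 1 and y' = y * 3 */
const Lin slide_defs[NV] = { { {1, 0}, 1 }, { {0, 3}, 0 } };
```

With these effects, substituting into x' - y' gives x + 1 - 3y. The result has coefficients {1, -3} and constant 1.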

Equivalence Simplification
• An equivalence can be simplified using:
  • arithmetic rules
  • already proven equivalences
• Example: if $x' = x + 1$ and $y' = y \cdot 5$, then $3x' = y'$ becomes $3(x + 1) = 5y$, i.e. $3x + 3 = 5y$
• An equivalence holds if it can be simplified to an already proven equivalence
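The simplification rule can be sketched in the same affine setting. Each equivalence is written as "expression == 0". After substitution, the goal holds if it is identically zero or a nonzero multiple of an already proven equivalence. This proportionality test is an assumed, deliberately weak stand-in for "arithmetic rules": it is sound but misses many equivalences. The known fact 3x + 3 = 5y is likewise assumed to have been proven earlier.

```c
#define NV 2   /* index 0 = x (source), index 1 = y (target) */

typedef struct { long coef[NV]; long c; } Lin;   /* coef[0]*x + coef[1]*y + c */

/* Symbolic substitution: replace each primed variable by its definition. */
static Lin substitute(Lin e, const Lin defs[NV])
{
    Lin r = { {0}, e.c };
    for (int k = 0; k < NV; k++) {
        for (int j = 0; j < NV; j++)
            r.coef[j] += e.coef[k] * defs[k].coef[j];
        r.c += e.coef[k] * defs[k].c;
    }
    return r;
}

static int is_zero(Lin e)
{
    return e.coef[0] == 0 && e.coef[1] == 0 && e.c == 0;
}

/* Two affine forms are proportional iff all 2x2 minors of (coef, c) vanish. */
static int proportional(Lin a, Lin b)
{
    long va[3] = { a.coef[0], a.coef[1], a.c };
    long vb[3] = { b.coef[0], b.coef[1], b.c };
    for (int i = 0; i < 3; i++)
        for (int j = i + 1; j < 3; j++)
            if (va[i] * vb[j] != va[j] * vb[i])
                return 0;
    return 1;
}

/* goal == 0 holds if, after substitution, it simplifies to 0 == 0 or to a
   nonzero multiple of an already proven equivalence known[k] == 0. */
int equivalence_holds(Lin goal, const Lin defs[NV], const Lin *known, int nknown)
{
    Lin e = substitute(goal, defs);
    if (is_zero(e))
        return 1;
    for (int k = 0; k < nknown; k++)
        if (!is_zero(known[k]) && proportional(e, known[k]))
            return 1;
    return 0;
}

/* The example: x' = x + 1, y' = y * 5; assumed proven: 3x + 3 - 5y == 0 */
const Lin example_defs[NV] = { { {1, 0}, 1 }, { {0, 5}, 0 } };
const Lin example_known[1] = { { {3, -5}, 3 } };
```

With `example_defs` and `example_known`, the goal 3x' = y' (coefficients {3, -1}, constant 0) simplifies to 3x - 5y + 3 and is accepted. The goal 3x' + 1 = y' is rejected.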

Checking Simulation Relations
• A relation is correct if, for each execution pair entering it, all of its equivalences hold
• Example relation: $x = y + 1$
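Using the same hypothetical affine representation, the check on this slide becomes a loop: every element of the relation must hold at the end of every execution pair entering it. The sketch assumes that the relation x = y + 1 was already known at the start of each pair, as it would be around a loop. Each pair is given by the affine effects of its source and target paths. Both choices are illustrative, not data from the article.

```c
#define NV 2   /* index 0 = x (source), index 1 = y (target) */

typedef struct { long coef[NV]; long c; } Lin;   /* coef[0]*x + coef[1]*y + c */

static Lin substitute(Lin e, const Lin defs[NV])
{
    Lin r = { {0}, e.c };
    for (int k = 0; k < NV; k++) {
        for (int j = 0; j < NV; j++)
            r.coef[j] += e.coef[k] * defs[k].coef[j];
        r.c += e.coef[k] * defs[k].c;
    }
    return r;
}

static int is_zero(Lin e)
{
    return e.coef[0] == 0 && e.coef[1] == 0 && e.c == 0;
}

static int proportional(Lin a, Lin b)
{
    long va[3] = { a.coef[0], a.coef[1], a.c };
    long vb[3] = { b.coef[0], b.coef[1], b.c };
    for (int i = 0; i < 3; i++)
        for (int j = i + 1; j < 3; j++)
            if (va[i] * vb[j] != va[j] * vb[i])
                return 0;
    return 1;
}

/* e == 0 holds if it is trivially zero or a nonzero multiple of a known fact. */
static int holds(Lin e, const Lin *known, int nknown)
{
    if (is_zero(e))
        return 1;
    for (int k = 0; k < nknown; k++)
        if (!is_zero(known[k]) && proportional(e, known[k]))
            return 1;
    return 0;
}

/* The relation (elements rel[r] == 0 over primed variables) is correct if
   every element holds at the end of every entering execution pair, given
   the equivalences known at the start of the run. pairs[p] holds the
   affine effects x', y' of pair p. */
int relation_correct(const Lin *rel, int nrel, const Lin pairs[][NV], int npairs,
                     const Lin *known, int nknown)
{
    for (int p = 0; p < npairs; p++)
        for (int r = 0; r < nrel; r++)
            if (!holds(substitute(rel[r], pairs[p]), known, nknown))
                return 0;
    return 1;
}

/* x = y + 1, written as x - y - 1 == 0 (also assumed known at entry). */
const Lin rel_xy[1] = { { {1, -1}, -1 } };
/* One pair where both sides increment: x' = x + 1, y' = y + 1. */
const Lin pairs_ok[1][NV] = { { { {1, 0}, 1 }, { {0, 1}, 1 } } };
/* The same pair plus one that breaks the relation: x' = x + 2, y' = y + 1. */
const Lin pairs_bad[2][NV] = { { { {1, 0}, 1 }, { {0, 1}, 1 } },
                               { { {1, 0}, 2 }, { {0, 1}, 1 } } };
```

With `pairs_ok`, where both sides increment, the relation is accepted. Adding the pair x' = x + 2, y' = y + 1 in `pairs_bad` makes it fail, because x - y no longer simplifies to a known equivalence.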

Something Fishy
• What's the point of proving something using the same rules that created it?
• The validator is much simpler than the optimizer
• It provides an independent perspective on the final code

Showtime…
A. Element #1 holds
B. There is only one execution pair (no conditions)
C. Prove element #2 (trivial)

Element #5
• Two execution pairs:
  • b3-b1-b2 and b7-b5

Element #5 (cont.)
• The other pair: b3-b1-b3 and b7-b7
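Execution pairs are built from paths that run from one block carrying a simulation relation to the next such block. The sketch below enumerates these paths by depth-first search. The two tiny control-flow graphs are hypothetical reconstructions made only from the block names b1, b2, b3, b5, and b7 above. The code assumes every cycle passes through a block with a relation, as loop heads do.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS 8
#define MAX_PATHS  8
#define MAX_LEN    64

/* A tiny control-flow graph: each block has at most two successors. */
typedef struct {
    int nsucc[MAX_BLOCKS];
    int succ[MAX_BLOCKS][2];
    int is_cut[MAX_BLOCKS];   /* 1 if a simulation relation is attached to the block */
} Cfg;

typedef struct {
    char path[MAX_PATHS][MAX_LEN];
    int count;
} PathList;

/* Extends the path in `prefix` with block b and stops at the first cut
   point after the start. Assumes every cycle passes through a cut point. */
static void walk(const Cfg *g, int b, const char *prefix, int first, PathList *out)
{
    char buf[MAX_LEN];
    snprintf(buf, sizeof buf, "%s%sb%d", prefix, first ? "" : "-", b);
    if (!first && g->is_cut[b]) {
        if (out->count < MAX_PATHS)
            strcpy(out->path[out->count++], buf);
        return;
    }
    for (int k = 0; k < g->nsucc[b]; k++)
        walk(g, g->succ[b][k], buf, 0, out);
}

/* Returns all paths from cut point `start` to the next cut point.
   The result lives in a static buffer that the next call overwrites. */
const PathList *execution_paths(const Cfg *g, int start)
{
    static PathList result;
    result.count = 0;
    walk(g, start, "", 1, &result);
    return &result;
}

/* Hypothetical CFGs, reconstructed from the block names on the slides only. */
const Cfg source_cfg = {
    .nsucc  = { [1] = 2, [3] = 1 },
    .succ   = { [1] = {2, 3}, [3] = {1} },
    .is_cut = { [2] = 1, [3] = 1 },
};
const Cfg target_cfg = {
    .nsucc  = { [7] = 2 },
    .succ   = { [7] = {5, 7} },
    .is_cut = { [5] = 1, [7] = 1 },
};
```

`execution_paths(&source_cfg, 3)` yields b3-b1-b2 and b3-b1-b3, and `execution_paths(&target_cfg, 7)` yields b7-b5 and b7-b7. Deciding which source path goes with which target path is then a branch-matching question.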

Known Equivalences
• Equivalences from the start of the run:
• Equivalences at the end of the run:

Need to Prove
• The path condition is correct:
• The equivalences hold, mainly:

Element #5: Path Condition

Element #5: The Equivalence
• Proved using distributivity and commutativity
Q.E.D.

Algorithm Parts
• Inferring simulation relations
• Finding execution pairs
• Solving constraints

Navigating Branches
• An optimizer might eliminate or reverse branches
• Problem: did branch B′ in the target originate from branch B in the source?
• Solution: use heuristics

A Typical Case

Similarity
• The similarity between two branches depends on the similarity of their:
  • preceding instruction sequences
  • boolean conditions
  • two branching sequences

Similarity (cont.)
• Formally:
  • ~ is a numeric relation with values in [0, 1]
  • "and" is multiplication
  • "or" is maximum
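The exact formula from the slide was lost, so the sketch below gives one plausible reading, stated as an assumption. The score of a branch pair is the instruction similarity times the best ("or" = maximum) of two matchings. In one matching, the target branch keeps the source condition. In the other, the optimizer reversed it. Each matching is a product ("and" = multiplication) of the similarities of the successors. The `Branch` fields, the 0/1 condition test, and the 0.5 penalty for differing call counts are hypothetical.

```c
#include <stddef.h>

/* A branch node; all fields are hypothetical and for illustration only. */
typedef struct Branch {
    int cond;                     /* condition id (nonzero); -id is the negated condition */
    int calls;                    /* function calls in the preceding instruction sequence */
    const struct Branch *t, *f;   /* next branch on the true/false edge (NULL = end) */
} Branch;

static double max2(double a, double b) { return a > b ? a : b; }

/* Hypothetical instruction similarity: equal call counts -> 1.0, else 0.5. */
static double instr_sim(const Branch *s, const Branch *t)
{
    return s->calls == t->calls ? 1.0 : 0.5;
}

/* Similarity in [0, 1] of source branch s and target branch t.
   "and" is multiplication, "or" is maximum over the two matchings. */
double branch_sim(const Branch *s, const Branch *t)
{
    if (s == NULL && t == NULL)
        return 1.0;
    if (s == NULL || t == NULL)
        return 0.0;
    double same     = (s->cond == t->cond)  ? 1.0 : 0.0;
    double reversed = (s->cond == -t->cond) ? 1.0 : 0.0;
    double direct   = same     * branch_sim(s->t, t->t) * branch_sim(s->f, t->f);
    double swapped  = reversed * branch_sim(s->t, t->f) * branch_sim(s->f, t->t);
    return instr_sim(s, t) * max2(direct, swapped);
}

const Branch src       = {  1, 0, NULL, NULL };   /* source: if (c1) */
const Branch tgt_rev   = { -1, 0, NULL, NULL };   /* target: if (!c1), branch reversed */
const Branch tgt_calls = {  1, 2, NULL, NULL };   /* target: extra calls before the branch */
```

A reversed condition still scores 1.0 (`branch_sim(&src, &tgt_rev)`). Extra calls before the branch halve the score, so `branch_sim(&src, &tgt_calls)` is 0.5.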

Boolean Similarity
• Two branches are similar if one condition can be simplified into the other using simple transforms, such as:
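The slide's list of transforms is missing. This sketch assumes that conditions compare two variables and that only two rewrites are allowed. The first swaps the operands of an ordering test (a > b is b < a). The second negates a condition, as when the optimizer reverses the branch. Both conditions are brought into a canonical form before comparison. `Cond`, `Match`, and the enum names are hypothetical.

```c
typedef enum { LT, LE, GT, GE, EQ, NE } Op;

typedef struct { Op op; int lhs, rhs; } Cond;   /* lhs op rhs; operands are variable ids */

typedef enum { DIFFERENT, SAME, NEGATED } Match;

/* Logical negation: !(a < b) is a >= b, and so on. */
static Op negate(Op op)
{
    switch (op) {
    case LT: return GE;
    case LE: return GT;
    case GT: return LE;
    case GE: return LT;
    case EQ: return NE;
    default: return EQ;   /* NE */
    }
}

/* Canonical form: a > b becomes b < a and a >= b becomes b <= a, and the
   symmetric operators EQ and NE list the smaller operand id first. */
static Cond canon(Cond c)
{
    int tmp;
    if (c.op == GT || c.op == GE) {
        tmp = c.lhs; c.lhs = c.rhs; c.rhs = tmp;
        c.op = (c.op == GT) ? LT : LE;
    } else if ((c.op == EQ || c.op == NE) && c.lhs > c.rhs) {
        tmp = c.lhs; c.lhs = c.rhs; c.rhs = tmp;
    }
    return c;
}

static int cond_equal(Cond a, Cond b)
{
    return a.op == b.op && a.lhs == b.lhs && a.rhs == b.rhs;
}

/* Compares a source and a target branch condition after simple rewrites. */
Match bool_similar(Cond src, Cond tgt)
{
    Cond neg = { negate(tgt.op), tgt.lhs, tgt.rhs };
    if (cond_equal(canon(src), canon(tgt)))
        return SAME;
    if (cond_equal(canon(src), canon(neg)))
        return NEGATED;
    return DIFFERENT;
}
```

Take variable id 0 for i and 1 for n. Then `i < n` matches `n > i` as SAME and `i >= n` as NEGATED, while `i == n` is DIFFERENT.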

Instruction Similarity
• Instruction sequences are compared by:
  • the number of function calls
  • whether they lead to already related branches (in that case, the similarity is 1.0)

Instruction Similarity (cont.)
• gcc-specific features:
  • IL instruction serial numbers
  • source line number information (to detect code duplication)
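The instruction-side factors can be sketched the same way. The sketch assumes each sequence is summarized by its number of function calls and the source line number of each instruction. It also assumes duplicated code keeps the line numbers of the code it was copied from. The `Seq` type, the 0.5 penalty, and the line-overlap ratio are hypothetical choices, not the article's exact scoring.

```c
/* Hypothetical summary of one instruction sequence (e.g. before a branch). */
typedef struct {
    int calls;           /* number of function calls in the sequence */
    const int *lines;    /* source line number attached to each instruction */
    int nlines;
} Seq;

static int contains(const int *xs, int n, int x)
{
    for (int k = 0; k < n; k++)
        if (xs[k] == x)
            return 1;
    return 0;
}

/* Fraction of target instructions whose source line also occurs in the
   source sequence (duplicated code is assumed to keep its line numbers). */
static double line_overlap(const Seq *s, const Seq *t)
{
    if (t->nlines == 0)
        return 1.0;
    int hits = 0;
    for (int k = 0; k < t->nlines; k++)
        hits += contains(s->lines, s->nlines, t->lines[k]);
    return (double)hits / t->nlines;
}

/* Factors are combined with "and" = multiplication. Sequences that lead to
   already related branches get similarity 1.0, as on the slide. */
double instr_similarity(const Seq *s, const Seq *t, int leads_to_related)
{
    if (leads_to_related)
        return 1.0;
    double calls = (s->calls == t->calls) ? 1.0 : 0.5;   /* hypothetical penalty */
    return calls * line_overlap(s, t);
}

static const int src_lines[] = { 12, 12, 13 };
static const int tgt_lines[] = { 12, 13, 13, 20 };
const Seq src_seq = { 1, src_lines, 3 };
const Seq tgt_seq = { 1, tgt_lines, 4 };
const Seq tgt_seq_more_calls = { 2, tgt_lines, 4 };
```

In the example data, three of the four target instructions come from lines present in the source sequence, so the score is 0.75. The score drops to 0.375 when the call counts also differ.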

Results
• Detected a known bug in gcc 2.7.2.2
• Used on large programs
• Increased compile time by a factor of 4

Limitations
• Cannot handle loop unrolling
• Cannot resolve all types of equivalences
• Produces several false alarms (e.g., the gcc bug was accompanied by 3 false alarms)

Conclusion
• Automatically infers equivalences
• Uses:
  • simple rules and substitution
  • heuristics
• Good results
• Problems:
  • false alarms
  • runtime overhead
