Translation Validation for an Optimizing Compiler Guy Erez Based on George C. Necula article (ACM SIGPLAN 2000) Advanced Programming Languages Seminar, Winter 2000
In a Nutshell • The Problem: Verify that the optimized and source code are equivalent • Partial (heuristic) Solution: Independently prove the validity of each translation pass • Motivation: Optimizer Testing
Outline • Introduction • Intermediate Language • An extensive example • Simulation Relation • Execution Pair • Equivalence Checking • Branch Navigation • Results and Limitations
Methods of Proving Compiler Correctness • Prove general compiler correctness: • absolute • tedious • impractical for large programs • highly dependent on compiler code
Methods of Proving Compiler Corr. (cont.) • Show that each translation phase was valid • weaker • proof per program • applicable for large programs • independent of compiler code
Compilation Process • Source Code → Intermediate Language (IL) → Target Code
Optimization Process • IL Code 0 → IL Code 1 → … → IL Code n, each step produced by an optimization pass • The validator checks each pass
The IL in GNU C (subset) • Instructions: • Expressions: • Operators:
An Example
    extern int g;
    extern int a[…];
    main() {
        int n = …;  /* n contains the length of the array */
        int i;
        for (i = 0; i < n; i++)
            a[i] = g*i + 3;
        return i;
    }
And in IL… for (i=0;i<n; i++) a[i]=g*i+3;return i;
After Transformation… • Use registers • Transform the while loop into a repeat loop • Open question after each step: is the result equivalent (<==>) to the previous version?
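The while-to-repeat transformation can be illustrated with a small sketch that runs both loop shapes on the same inputs. This is only testing on sample values, not the symbolic validation the talk describes, and the function names `source_loop` and `target_loop` are hypothetical.

```python
def source_loop(n, g):
    """Original shape: test the condition before every iteration (while loop)."""
    a = [0] * max(n, 0)
    i = 0
    while i < n:
        a[i] = g * i + 3
        i += 1
    return i, a


def target_loop(n, g):
    """Transformed shape: one guard up front, then test after each body (repeat loop)."""
    a = [0] * max(n, 0)
    i = 0
    if i < n:
        while True:
            a[i] = g * i + 3
            i += 1
            if not (i < n):
                break
    return i, a


# Compare the two shapes on a handful of inputs, including the empty array.
for n in range(0, 5):
    assert source_loop(n, 7) == target_loop(n, 7)
```

The up-front guard matters: without it, the repeat loop would execute its body once even when n is 0.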
Equivalence • x1,…,xn – variables in source • y1,…,ym – variables in target • Variable Equivalence: x1 = y3 • Expression Equivalence: x1+x2 = y3+6
Simulation Relation • A set of equivalences between a source block and a target block
Execution Pair • Definition: An execution path in the source and its corresponding path in the target
Checking Equivalence • Equivalence is checked at the end of a specific execution pair • A variable's value after the run is marked with a prime • Symbolic substitution, e.g. x’ = x+1 in the source and y’ = y*3 in the target
Equivalence Simplification • An equivalence can be simplified using: • Arithmetic rules • Already proven equivalences • Example: If x’ = x+1 and y’ = y*5, then 3*x’ = y’ becomes 3*(x+1) = y*5, which becomes 3*x+3 = y*5 • An equivalence holds if it can be simplified to an already proven equivalence
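The substitute-then-simplify step can be sketched for linear expressions only. The sketch assumes each expression is stored as a dictionary from variable name to coefficient, with the hypothetical key `CONST` for the constant term. The helpers `add`, `substitute` and `holds` are illustrative names, not part of the validator described in the talk, and `holds` only recognizes a goal that matches a proven equivalence exactly after normalization.

```python
CONST = "1"  # hypothetical key for the constant term of a linear expression


def add(a, b, scale=1):
    """Return a + scale*b, dropping terms whose coefficient becomes zero."""
    out = dict(a)
    for var, coef in b.items():
        out[var] = out.get(var, 0) + scale * coef
    return {var: coef for var, coef in out.items() if coef != 0}


def substitute(expr, post_state):
    """Rewrite primed variables in terms of the values at the start of the run."""
    out = {}
    for var, coef in expr.items():
        out = add(out, post_state.get(var, {var: 1}), coef)
    return out


def holds(lhs, rhs, post_state, proven):
    """Check lhs = rhs at the end of a run against already proven equivalences."""
    goal = substitute(add(lhs, rhs, -1), post_state)  # lhs - rhs over start-of-run values
    return any(goal == add(p_lhs, p_rhs, -1) for p_lhs, p_rhs in proven)


# x' = x + 1 and y' = y * 5, as in the example above.
post_state = {"x'": {"x": 1, CONST: 1}, "y'": {"y": 5}}
# Assumed to be already proven at the start of the run: 3*x + 3 = y*5.
proven = [({"x": 3, CONST: 3}, {"y": 5})]

print(holds({"x'": 3}, {"y'": 1}, post_state, proven))
```

Keeping every expression in one normal form (here, a coefficient table) is what lets distributivity and commutativity be applied implicitly rather than as separate rewrite steps.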
Checking Simulation Relations • A relation is correct if for each execution pair entering it, all of its equivalences hold • Example equivalence: x = y+1 (x in the source, y in the target)
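The rule "for each execution pair entering the relation, all equivalences hold" can be sketched as a double loop. As a loudly stated simplification, this sketch checks the rule on sample states instead of proving it symbolically, and it models an execution pair as two hypothetical Python functions (one source step, one target step) together with the relation assumed on entry.

```python
def relation_is_correct(relation, entering_pairs, samples):
    """Return False if some sampled run of an execution pair breaks the relation."""
    for entry_relation, src_step, tgt_step in entering_pairs:
        for src, tgt in samples:
            if not entry_relation(src, tgt):
                continue  # only states related on entry are relevant
            if not relation(src_step(src), tgt_step(tgt)):
                return False
    return True


def x_equals_y_plus_1(src, tgt):
    """The equivalence x = y + 1 between source state and target state."""
    return src["x"] == tgt["y"] + 1


# Sample states that satisfy x = y + 1 on entry.
samples = [({"x": y + 1}, {"y": y}) for y in range(-3, 4)]

# One execution pair: the source runs x' = x + 1, the target runs y' = y + 1.
good_pair = (x_equals_y_plus_1,
             lambda s: {"x": s["x"] + 1},
             lambda t: {"y": t["y"] + 1})

print(relation_is_correct(x_equals_y_plus_1, [good_pair], samples))
```

Passing on samples does not establish correctness; it only shows the shape of the check that the symbolic procedure performs once per execution pair.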
Something fishy • What’s the point of proving something using the same rules that created it? • Simpler • Provides an independent perspective on the final code
Showtime… A. Element #1 holds B. There is only one execution pair (no cond.) C. Prove elem. #2 (Trivial)
Element #5 • Two execution pairs: • b3-b1-b2 and b7-b5
Element #5 (cont.) • The other pair: • b3-b1-b3 and b7-b7
Known Equivalences • Equivalences from the start of the run: • Equivalences at the end of run:
Need to Prove • The path condition is correct: • The equivalences hold, mainly:
Elem #5: The Equivalence • Proved using distributivity and commutativity • Q.E.D
Algorithm Parts • Inferring Simulation Relations • Finding execution pairs • Solving Constraints
Navigating Branches • An optimizer might eliminate or reverse branches • Problem: did branch B’ originate from branch B in the source? • Solution: Use heuristics
Similarity • The similarity between two branches depends on the similarity of their: • preceding instruction sequences • boolean conditions • two branching sequences
Similarity (cont.) • Formally: • ~ is a numeric relation with values in 0..1 • “and” is multiplication • “or” is maximum
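The two combination rules can be written down directly. In the sketch below, `sim_and` and `sim_or` follow the slide (multiplication and maximum), while `branch_similarity` is a purely hypothetical way of combining the three ingredients from the previous slide. The formula on the original slide is not preserved here, so it should not be read as the author's definition.

```python
def sim_and(*scores):
    """'and' of similarity scores in 0..1: their product."""
    result = 1.0
    for score in scores:
        result *= score
    return result


def sim_or(*scores):
    """'or' of similarity scores in 0..1: their maximum (0.0 if none given)."""
    return max(scores, default=0.0)


def branch_similarity(prefix, cond, same_order, swapped_order):
    """Hypothetical combination: prefix and condition must both match, and the
    branch targets must match either in the same order or swapped (reversed branch)."""
    return sim_and(prefix, cond, sim_or(same_order, swapped_order))


print(branch_similarity(prefix=1.0, cond=0.5, same_order=0.3, swapped_order=0.8))
```

With multiplication as "and", a single poorly matching ingredient pulls the overall score down, while maximum as "or" lets the best alternative win.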
Boolean Similarity • Branches are similar if: • one can be simplified into the other using simple transforms, such as:
Instruction Similarity • Instruction sequences are compared by: • the number of function calls they contain • whether they lead to already related branches (in that case, similarity is 1.0)
Instruction Similarity… • gcc-specific features • IL instruction serial numbers • source line number information (for code duplication detection)
Results • Detected a known bug in gcc 2.7.2.2 • Used on large programs: • Increased compile time by a factor of 4
Limitations • Cannot handle loop unrolling • Cannot resolve all types of equivalences • Produces several false alarms (e.g., the gcc bug was accompanied by 3 false alarms)
Conclusion • Automatically infers equivalences • Uses: • simple rules and substitution • heuristics • Good results • Problems: • false alarms • runtime overhead