Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken DDEC: Data-Driven Equivalence Checking
Equivalence checking • Prove two programs are equivalent • Compiler optimizations • Validate refactorings • Cross-check different implementations • Old and well-studied problem • Undecidable in general • Major challenge: prove equivalence of loops • Straight-line programs are relatively easy
Motivating applications • Prove equivalence of two binaries built from the same program (… while … …) • Trustworthy compiler: CompCert, gcc -O0 • Optimizing compiler: gcc -O3, icc -O3 • Goal: confidence of the trustworthy compiler, performance of the optimizing compiler
Stochastic superoptimization • Straight-line code • … while … … • Trustworthy compiler: CompCert, gcc -O0 • STOKE (ASPLOS 13): random mutations
Previous work • Do not support “while” loops: [CHR00], [FH02], [FH05], [AEF+05], [SBC+05], [MSF06] • Do not reason about termination: [SDE+08], [GS09], [RE11], [LHM+12], [PY13], [LMS+13] • Translation validation: [Nec00],[GZB05], … • Need information from the compiler
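Simulation relation • Decompose proof • Target: movq 8(rsp), rdi ; #rdi != 0 ; movq 8(rsp), rdi ; decq rdi ; movq rdi, 8(rsp) ; retq • Rewrite: movq 8(rsp), r9 ; #r9 != 0 ; decq r9 ; retq • Target program points a, b, c are paired with rewrite program points a', b', c' • (a, a'): states equal • (b, b'): 8(rsp) = rdi = r9' • (c, c'): live outs equal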
Inference • Given a simulation relation, proofs for loops reduce to proofs for loop-free fragments • Use decision procedures • Main challenge: infer a simulation relation • Infer synchronization points • Infer invariants • We use compilers as black boxes • Mine relations from concrete executions
Runtime information • Run some tests to get data • From executions, unit tests, random tests, etc.
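Runtime information • Ensure the loops run for the same number of iterations • Use data to align the loop bodies: when the target body B runs 2n times and the rewrite body B' runs n times, align B;B with B' so that both loops iterate n times before retq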
Runtime information • Attempt to detect synchronization points • Number of times program points are executed • Values align • [Figure: the target and rewrite listings annotated with execution counts of 1, n, and n+1 at their program points]
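The counting heuristic on this slide can be sketched in a few lines. This is a minimal illustration and not the DDEC implementation: the trace format (one list of visited program-point labels per test), the labels themselves, and the function name `candidate_sync_points` are all assumptions made for the example.

```python
from collections import Counter

def candidate_sync_points(target_traces, rewrite_traces):
    """Pair program points whose execution counts agree on every test.

    Each trace is a list of program-point labels in visit order
    (hypothetical format). Traces at the same index come from the same test.
    """
    t_points = sorted(set().union(*map(set, target_traces)))
    r_points = sorted(set().union(*map(set, rewrite_traces)))
    t_counts = [Counter(t) for t in target_traces]
    r_counts = [Counter(r) for r in rewrite_traces]
    pairs = []
    for t in t_points:
        for r in r_points:
            # keep the pair only if the counts match on all tests
            if all(tc[t] == rc[r] for tc, rc in zip(t_counts, r_counts)):
                pairs.append((t, r))
    return pairs

def make_trace(entry, loop, exit_, n):
    # entry once, loop point n+1 times, exit once (assumed shape)
    return [entry] + [loop] * (n + 1) + [exit_]

# Two tests, with n = 2 and n = 3 loop iterations.
target_traces = [make_trace("a", "b", "c", n) for n in (2, 3)]
rewrite_traces = [make_trace("a'", "b'", "c'", n) for n in (2, 3)]
pairs = candidate_sync_points(target_traces, rewrite_traces)
print(pairs)
```

Counts alone leave ambiguity: the entry and exit points both run once, so (a, c') survives alongside (a, a'). That is where the second bullet, checking that the observed values align, would prune the candidates.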
Invariants • Invariants are restricted to equalities • Infer invariants from observed data values • Target: movq 8(rsp), rdi ; #rdi != 0 ; movq 8(rsp), rdi ; decq rdi ; movq rdi, 8(rsp) ; retq
Invariants • Invariants are restricted to equalities • Infer invariants from observed data values • Rewrite: movq 8(rsp), r9 ; #r9 != 0 ; decq r9 ; retq
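Linear algebra • Mine all equalities • Build a data matrix A with one row per observed state and one column per variable (plus a constant column of 1s) • Find all w s.t. A w = 0 • The solutions form the nullspace, or kernel, of A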
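A small worked instance of mining equalities as a nullspace may help. This is a sketch under stated assumptions, not the authors' code: the observed values are invented, the column order (rdi, 8(rsp), r9, constant 1) is chosen for the example, and the elimination runs over the rationals with Python's `fractions` module instead of an integer matrix library.

```python
from fractions import Fraction

def nullspace(rows):
    """Return a basis of {w : A w = 0} using Gauss-Jordan elimination."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(a[0])
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(n_cols):
        if r == len(a):
            break
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        pivot = a[r][c]
        a[r] = [x / pivot for x in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    # one basis vector per free (non-pivot) column
    basis = []
    for fc in (c for c in range(n_cols) if c not in pivots):
        w = [Fraction(0)] * n_cols
        w[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            w[pc] = -a[row_idx][fc]
        basis.append(w)
    return basis

# Hypothetical observations at a loop synchronization point.
# Columns: rdi, 8(rsp), r9, constant 1.
rows = [
    [5, 5, 5, 1],
    [4, 4, 4, 1],
    [3, 3, 3, 1],
]
basis = nullspace(rows)
for w in basis:
    print([int(x) for x in w])
```

Each basis vector w is read as the equality w[0]*rdi + w[1]*8(rsp) + w[2]*r9 + w[3] = 0. For this data the two vectors encode 8(rsp) = rdi and r9 = rdi, which together give the relation 8(rsp) = rdi = r9' from the simulation relation slide. Equalities mined this way are only candidates until they are checked.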
Check simulation relation • The executions are synchronized • The invariants are maintained • Target: movq 8(rsp), rdi ; #rdi != 0 ; movq 8(rsp), rdi ; decq rdi ; movq rdi, 8(rsp) ; retq • Rewrite: movq 8(rsp), r9 ; #r9 != 0 ; decq r9 ; retq • (a, a'): states equal • (c, c'): live outs equal
Check simulation relation • The executions are synchronized • The invariants are maintained • Queries in quantifier free bitvector arithmetic • Complete SMT solvers! • Incorporate counter-examples in relations • Sound but not complete • If checking succeeds then equivalent • Can fail to infer a sound simulation relation
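The two checks on this slide, synchronization and invariant maintenance, can be illustrated on the running example. The sketch below swaps the SMT query for exhaustive enumeration over a toy 4-bit word size, which is an assumption made to keep the example dependency-free; it is not how the talk's tool works. It also assumes that `#rdi != 0` and `#r9 != 0` are the loop guards and that the three instructions after the guard form the target loop body.

```python
from itertools import product

WIDTH = 4                  # toy word size (assumption); the real registers are 64-bit
MASK = (1 << WIDTH) - 1

def target_body(rdi, mem):
    # movq 8(rsp), rdi ; decq rdi ; movq rdi, 8(rsp)
    rdi = mem
    rdi = (rdi - 1) & MASK
    mem = rdi
    return rdi, mem

def rewrite_body(r9):
    # decq r9
    return (r9 - 1) & MASK

def check(invariant):
    """Return None if the candidate invariant is inductive, else a counterexample."""
    for rdi, mem, r9 in product(range(MASK + 1), repeat=3):
        if not invariant(rdi, mem, r9):
            continue
        # synchronization: both loops must take the same branch
        if (rdi != 0) != (r9 != 0):
            return (rdi, mem, r9)
        if rdi != 0:
            rdi2, mem2 = target_body(rdi, mem)
            r92 = rewrite_body(r9)
            # maintenance: the invariant must hold again after one iteration
            if not invariant(rdi2, mem2, r92):
                return (rdi, mem, r9)
    return None

strong = check(lambda rdi, mem, r9: rdi == mem == r9)
weak = check(lambda rdi, mem, r9: rdi == r9)
print(strong, weak)
```

The full relation 8(rsp) = rdi = r9' passes, while the weaker candidate rdi = r9' fails because the target reloads rdi from memory. The returned state is the kind of counterexample that can be added to the data before inferring the relation again.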
Limitations • Insufficient data to infer a sound relation • Expressiveness of invariants • Inequalities, quantifiers, etc. • Expressiveness of SMT solver • Floating point, multiply, divide, etc.
Implementation • Run tests and generate data • https://github.com/eschkufz/x64asm • Nullspace computation • libIML: integer matrix library • SMT solver: Z3
Case studies • Compute kernel inside OpenSSL • Validating CompCert against gcc • Stochastic optimization for loops
OpenSSL • Multiplication kernel • Extensive performance tests • Run the kernel ~15 million times • Choose 16 random tests for inference • Compile with gcc -O0 and gcc -O3 • Successfully prove equivalence
Conclusion • Prove equivalence of loops in two stages • Infer simulation relation • Check the inferred relation using SMT solvers • Use runtime data for inference • No change required to the compilers • Better verifiers lead to better optimizers
Inference from concrete states • M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program., 69(1-3):35–45, 2007 • T. Nguyen, D. Kapur, W. Weimer, and S. Forrest. Using dynamic analysis to discover polynomial and array invariants. ICSE 2012 • P. Garg, C. Löding, P. Madhusudan, D. Neider: Learning Universally Quantified Invariants of Linear Data Structures. CAV 2013 • R. Sharma, S. Gupta, B. Hariharan, A. Aiken, P. Liang, A. V. Nori: A Data Driven Approach for Algebraic Loop Invariants. ESOP 2013 • R. Sharma, S. Gupta, B. Hariharan, A. Aiken, A. V. Nori: Verification as Learning Geometric Concepts. SAS 2013 • A.V. Nori, R. Sharma: Termination proofs from tests. ESEC/SIGSOFT FSE 2013