Intensive reading report of ESEC/FSE 2013: Formal Reasoning

Intensive reading report of ESEC/FSE 2013: Formal Reasoning Li Jiayao 2013/11/01

Papers of Formal Reasoning • Bayesian Inference using Data Flow Analysis • Guillaume Claret, Sriram K. Rajamani, Aditya V. Nori, Andrew D. Gordon, and Johannes Borgström • Second-Order Constraints in Dynamic Invariant Inference • Kaituo Li, Christoph Reichenbach, Yannis Smaragdakis, and Michal Young • Z3-str: A Z3-Based String Solver for Web Application Analysis • Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh

Background • Dynamic Invariant Inference • An invariant: a property that holds at a certain point or points in a program; often seen in assert statements, documentation, and formal specifications. Example: “array a is sorted”. • Used for systematically understanding program properties. • Software testing, documentation, and maintenance benefit directly from dynamic invariant inference. • Daikon • An implementation of dynamic detection of likely invariants. • Works by monitoring a large number of program executions and heuristically inferring abstract logical properties of the program, expressed as invariants. http://groups.csail.mit.edu/pag/daikon/
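The idea of inferring likely invariants from observed executions can be illustrated with a minimal sketch. This is not Daikon's implementation: the candidate set `CANDIDATES`, the function `infer_invariants`, and the sample values are all hypothetical, chosen only to show how candidates are discarded once an observed sample falsifies them.

```python
from typing import Callable, Dict, List

Sample = Dict[str, int]

# Hypothetical candidate invariants over a single variable i.
CANDIDATES: Dict[str, Callable[[Sample], bool]] = {
    "i >= 0": lambda s: s["i"] >= 0,
    "i == 2": lambda s: s["i"] == 2,
    "i != 0": lambda s: s["i"] != 0,
}


def infer_invariants(samples: List[Sample]) -> List[str]:
    """Keep every candidate that no observed sample falsifies."""
    return [
        name
        for name, holds in CANDIDATES.items()
        if all(holds(s) for s in samples)
    ]


# Values of i observed at one program point across several runs (made up).
observed = [{"i": 2}, {"i": 5}, {"i": 7}]
likely = infer_invariants(observed)
print(likely)
```

With only one sample, `i == 2` would survive as well, which is the kind of spurious invariant that a thin test suite produces.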

Problem & Solution • Problem • How to improve the quality of produced invariants? • How to reduce erroneous and noisy invariants that are inconsistent with program semantics or programmer knowledge, and how to derive more relevant invariants? • Solution • An annotation mechanism for high-level constraints (a.k.a. “meta-invariants” or “second-order constraints”). • Benefit 1: Improves the quality of produced invariants. • Benefit 2: Dynamically inferred “second-order constraints” serve as concise, deeper documentation of program behavior.

Vocabulary of second-order constraints • A vocabulary to describe common second-order constraints. • Subdomain(processDiagonal, processUpperTriangular) • Subrange(listTail, listRange) • CanFollow(open, write) • Follows(add, remove) • Concord(triangularMultiply, matrixMultiply) • For methods that specialize other methods. • OnlyCareAboutVariable(<var>), OnlyCareAboutField(<fld>) • Each constraint expresses a relationship between the conditions (pre- or postconditions) at two program points.
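One way to read these constraints is as implications between conditions at two program points. The sketch below assumes that Subdomain(A, B) means "the precondition of A implies the precondition of B"; that reading, the helper names `implies` and `subdomain`, and the toy domain of 2x2 matrices over {0, 1} are assumptions made for illustration, not definitions taken from the slides.

```python
from itertools import product


def implies(p, q, domain):
    """Check p(x) => q(x) for every x in a finite domain."""
    return all(q(x) for x in domain if p(x))


def is_diagonal(m):
    # Stand-in for the precondition of processDiagonal.
    return m[0][1] == 0 and m[1][0] == 0


def is_upper_triangular(m):
    # Stand-in for the precondition of processUpperTriangular.
    return m[1][0] == 0


# All 2x2 matrices with entries in {0, 1}, as a tiny exhaustive domain.
DOMAIN = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]


def subdomain(pre_a, pre_b, domain):
    """Assumed reading of Subdomain(A, B): pre(A) implies pre(B)."""
    return implies(pre_a, pre_b, domain)


holds = subdomain(is_diagonal, is_upper_triangular, DOMAIN)
reverse = subdomain(is_upper_triangular, is_diagonal, DOMAIN)
print(holds, reverse)
```

Under the same reading, Subrange would compare two postconditions and CanFollow would compare the postcondition of the first method with the precondition of the second; only the pair of conditions passed to `implies` changes.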

Daikon implementation • Easy for Subdomain, Subrange, CanFollow, Follows: • Example: Subdomain(foo(int), bar(int)) • A bit complex for Concord. • [Figure: invariants at foo(int i) and bar(int i); labels shown: i >= 0, i == 2, i >= 0]
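The Subdomain(foo(int), bar(int)) example can be sketched as follows. The mechanism shown, treating every sample seen at foo's entry as if it had also been seen at bar's entry, is an assumption about how such a constraint might be enforced, not a description of the actual Daikon changes; `infer`, the candidate set, and the sample values are hypothetical.

```python
# Hypothetical candidate invariants over the parameter i.
CANDIDATES = {
    "i >= 0": lambda s: s["i"] >= 0,
    "i == 2": lambda s: s["i"] == 2,
}


def infer(samples):
    """Keep the candidates that hold on every sample."""
    return [n for n, holds in CANDIDATES.items() if all(holds(s) for s in samples)]


# Made-up observations at the entry of each method.
foo_entry = [{"i": 0}, {"i": 3}, {"i": 9}]
bar_entry = [{"i": 2}]  # bar is exercised only once by the tests

# Without the constraint, bar's precondition is over-fitted to one sample.
before = infer(bar_entry)

# Assumed enforcement of Subdomain(foo, bar): every valid input of foo
# is also a valid input of bar, so foo's samples count for bar as well.
after = infer(bar_entry + foo_entry)

print(before, after)
```

In this sketch the spurious `i == 2` disappears from bar while `i >= 0` survives, which is consistent with the labels left over from the slide's figure.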

Evaluation(1) • Do second-order constraints aid the inference of better first-order invariants? • Reduce spurious invariants? • Add correct and insightful invariants? • 3 case studies. • StackAr, a small example application with a relatively thorough test suite. • Apache Commons Collections and the AspectJ Compiler, with their actual test suites. • Second-order constraints are determined manually. • Diff the inferred invariants before and after applying second-order constraints.

Evaluation(1)(Continued) • Case Study #1: StackAr • An array-based, fixed-size stack implementation that ships with Daikon as a benchmark and demonstration example.

Evaluation(1)(Continued) • Experiment #1: add the constraints: • Subdomain(StackAr.topAndPop(), StackAr.pop()), • Subdomain(StackAr.pop(), StackAr.top()), and • Subdomain(StackAr.top(), StackAr.topAndPop()). • Result: eliminates 5 spurious invariants from pop: • this has only one value • this.theArray has only one value • size(this.theArray[]) == 100 • this.theArray[this.topOfStack] != null • this.topOfStack < size(this.theArray[])-1 • And adds 2 correct invariants: • this.topOfStack <= size(this.theArray[])-1 • this.DEFAULT_CAPACITY != size(this.theArray[])

Evaluation(1)(Continued) • Case Study #2: Apache Commons Collections • 356 classes, of which 18 are used explicitly. • With actual test suites. • Result: • All 35 removed invariants were false. • 26 invariants added, 25 of which are true.

Dynamic inference of “second-order constraints” • Key problem: detecting implication relationships between the pre/postconditions P and Q at two different program points. • Success rate: • N: the number of invariants in Q that are implied by P. • M: the number of invariants in Q. • Second-order constraint confidence: • Uses the confidences of the invariants of P or Q. • Z: the number of invariants in P. • Set SA and MA thresholds to filter out constraints.
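The success rate can be sketched under the assumption that it is the ratio N / M, which the definitions of N and M suggest but the slide does not state explicitly. The function `success_rate`, the finite integer domain used to test implication, and the example invariants are hypothetical; the confidence measure is not sketched here because its formula is not given.

```python
def conjunction(invariants):
    """Combine a list of predicates into a single predicate."""
    return lambda x: all(inv(x) for inv in invariants)


def success_rate(p_invs, q_invs, domain):
    """Assumed definition: N / M, where N counts the invariants of Q
    implied by P (checked over a finite domain) and M = len(q_invs)."""
    if not q_invs:
        return 0.0
    p = conjunction(p_invs)
    n = sum(1 for q in q_invs if all(q(x) for x in domain if p(x)))
    return n / len(q_invs)


# Made-up invariant sets at two program points.
P = [lambda i: i >= 0, lambda i: i < 10]
Q = [lambda i: i >= 0, lambda i: i != 5, lambda i: i > -3]

rate = success_rate(P, Q, range(-20, 21))
print(rate)
```

Here P implies `i >= 0` and `i > -3` but not `i != 5`, so two of the three invariants of Q are implied. A constraint whose rate falls below a chosen threshold would then be filtered out.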

Evaluation(2) • Evaluate the success of our dynamic process of inferring second-order constraints: • in documenting the program behavior on their own, • in offering the programmer a set of mostly-correct second-order constraints, and • in finding bugs in manually written second-order constraints.

Evaluation(2)(Continued) • Evaluate the correctness of inferred constraints. • All generated second-order constraints were verified manually. • 99% precision. • 5 false constraints, due to the low quality of Daikon invariants. • Not all inferred constraints are necessarily interesting. • Reflects the absence of other useful vocabulary. • E.g., for immutable classes.

Evaluation(2)(Continued) • Inferred vs. Manual Constraints. • Helps in eliminating 12 erroneous manually written constraints. • Of the 52 constraints in Evaluation(1), the inference produced 37 and missed 15. • The misses are due to noisy invariants, absence of data samples, and unimplemented vocabulary.

Reflection • A novel idea that goes beyond Dynamic Invariant Inference. • Practical for describing the semantics of programs. • Benefits testing, maintenance, and documentation. • A more systematic vocabulary is necessary.

Thanks! return 0;
