This research explores a puzzle-based testing approach to generate tests for complex object-oriented programs by turning constraints into puzzles for non-experts to solve, improving coverage and software quality.
Puzzle-based automatic testing: bringing humans into the loop by solving puzzles
Ning Chen and Sunghun Kim
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE), ACM, 2012
Presented on 2016/12/02 by Thomas (M2)
Authors
• Ning Chen: PhD student at the time, now works at Google
• Sunghun Kim: Associate Professor at the Hong Kong University of Science and Technology
  • Research interests: software engineering, programmer productivity
Abstract
• Many automatic test generation tools exist: Randoop [1], Pex [2], jCUTE [3]
• However, their coverage results are not satisfactory when applied to complex object-oriented programs
• This work: transform the problem into human-readable puzzles to be solved by non-experts, to overcome those tools' limits
Outline • Introduction • Related Work • Motivating example • Design and Implementation • Evaluation • Discussion • Conclusion and future work
1. Introduction: Challenges
• Software testing is difficult but essential
• Main challenges in automatic test generation for object-oriented programs:
  • Insufficient objects: instantiating a valid object can be complicated
  • Complex constraint solving: how do we reach a given branch?
  • Object mutation: bringing objects to a certain state is non-trivial
1. Introduction: Humans are slow but smart
• Computers are bad at handling particular types of problems, such as non-linear constraints
• A condition that is complex for a computer to analyze can be trivial for a human:
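As a hedged illustration (this is not the paper's actual example), consider a branch guarded by a non-linear constraint: a solver restricted to linear arithmetic may give up on the product term, while a human spots satisfying inputs at a glance.

```java
// Hypothetical branch guarded by a non-linear constraint.
// A linear-arithmetic solver cannot reason about x * y, yet a human
// immediately sees inputs such as x = 4, y = 25.
public class NonLinearBranch {
    public static boolean reachesTarget(int x, int y) {
        if (x * y == 100 && x < y) {
            return true; // hard-to-cover target branch
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(reachesTarget(4, 25)); // reaches the branch
        System.out.println(reachesTarget(5, 5));  // does not
    }
}
```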
1. Introduction: Contributions
• A puzzle-based testing environment: generates puzzles from unsolved constraints and uses human-supplied solutions to generate tests
• An implementation: PAT
• An experimental evaluation
2. Related work
• Automatic test generation:
  • Random: Randoop [1], GADGET [4]
  • Dynamic symbolic execution: Pex [2]
  • Object extraction: MSeqGen [5]
  • All these methods have limitations; PAT is designed to complement them
• Human computation: reCAPTCHA, Foldit, the ESP game, verification games [6]
3. Motivating example
A motivating example from Apache Commons Math
3. Motivating example: Constraint-solving challenge
• Constraints to reach the target branch:
• Constraint (3) is non-linear, and is thus not supported by some state-of-the-art SMT solvers
• Some classes of constraints have even been shown to be undecidable
3. Motivating example: Object-mutation challenge
• A model that satisfies the constraints:
• If sums and n are not public and there are no setters, then bringing them to these values is not trivial
• By observation and semantic reasoning, humans can find the right method-call sequence to reach this state
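The class below is a minimal sketch of that mutation challenge, with invented names (RunningSum, increment) standing in for the Commons Math code: the fields are private with no setters, so the required state is only reachable through the right call sequence.

```java
// Hypothetical sketch: sums and n are private with no setters, so a
// test generator cannot assign them directly; state can only be
// changed through increment().
public class RunningSum {
    private double sums = 0.0; // no public setter
    private int n = 0;         // no public setter

    public void increment(double value) {
        sums += value;
        n++;
    }

    public boolean inTargetState() {
        // The target branch requires this specific internal state.
        return n == 2 && sums == 3.0;
    }

    public static void main(String[] args) {
        // Call sequence a human might induce from the model
        // n = 2, sums = 3.0:
        RunningSum r = new RunningSum();
        r.increment(1.0);
        r.increment(2.0);
        System.out.println(r.inTargetState()); // target state reached
    }
}
```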
4. Design and implementation: Up-front testing
• Run tests generated by other tools (Randoop, …)
• Compute coverage
• Collect dynamic information (object instances, call sequences)
• Identify branches that are not covered
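The up-front bookkeeping can be sketched as follows (this is not PAT's actual code; branch ids are invented): branches hit by the tool-generated tests are recorded, and the remainder become candidates for puzzle generation.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Minimal coverage tracker: all branch ids are registered up front,
// executed branches are marked, and the set difference is the
// uncovered set handed to the later phases.
public class CoverageTracker {
    private final Set<String> allBranches = new HashSet<>();
    private final Set<String> covered = new HashSet<>();

    public void register(String branchId) { allBranches.add(branchId); }
    public void hit(String branchId) { covered.add(branchId); }

    public Set<String> uncovered() {
        Set<String> result = new TreeSet<>(allBranches);
        result.removeAll(covered);
        return result;
    }

    public static void main(String[] args) {
        CoverageTracker t = new CoverageTracker();
        t.register("Variance.evaluate:12/true");  // hypothetical ids
        t.register("Variance.evaluate:12/false");
        t.hit("Variance.evaluate:12/false");      // hit by up-front tests
        System.out.println(t.uncovered());
    }
}
```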
4. Design and implementation: Path computation
• Look for a feasible execution path to each uncovered branch:
  • Transform the program into static single assignment (SSA) form
  • Perform backward symbolic execution up to a public entry point
  • Collect the path conditions corresponding to the backtracked path
  • Try to solve the conditions to obtain a model that reaches the branch
  • Repeat until x models (here x = 10) are obtained
• Thresholds ensure termination
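A hedged sketch of what "path conditions" means here, using a made-up method: the condition for the innermost branch is the conjunction of the guards along the backtracked path, and any model of it (e.g. a = 3, b = 4) covers the branch.

```java
// Hypothetical method under test with a nested target branch.
public class PathConditionDemo {
    public static String classify(int a, int b) {
        if (a > 0) {
            if (b > a) {
                return "target"; // branch to cover
            }
            return "other";
        }
        return "non-positive";
    }

    // Path condition collected by walking backward from the target
    // branch to the public entry point: (a > 0) AND (b > a).
    public static boolean pathCondition(int a, int b) {
        return a > 0 && b > a;
    }

    public static void main(String[] args) {
        // A model of the path condition indeed reaches the branch.
        System.out.println(pathCondition(3, 4));
        System.out.println(classify(3, 4));
    }
}
```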
4. Design and implementation: Mutation puzzle – sub-models
• For branches where a valid model exists but no collected object satisfies it, generate a mutation puzzle
• Models are separated into sub-models, one per object reference
• Solutions to each sub-model can be combined, if they are independent, into a solution to the full model
• A sub-model can be part of multiple models; the more models a sub-model belongs to, the higher its priority
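The splitting step can be sketched as grouping a model's constraints by the object reference they mention (the constraint and reference names below are invented, not the paper's):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: a model is a list of (reference, condition) pairs; the
// sub-model for each reference is the group of conditions on it.
public class SubModelSplit {
    public record Constraint(String reference, String condition) {}

    public static Map<String, List<String>> split(List<Constraint> model) {
        Map<String, List<String>> subModels = new LinkedHashMap<>();
        for (Constraint c : model) {
            subModels.computeIfAbsent(c.reference(), k -> new ArrayList<>())
                     .add(c.condition());
        }
        return subModels;
    }

    public static void main(String[] args) {
        List<Constraint> model = List.of(
            new Constraint("stats", "stats.n == 2"),
            new Constraint("stats", "stats.sums == 3.0"),
            new Constraint("window", "window.size > 0"));
        // Independent solutions to the two sub-models can then be
        // combined into a solution to the full model.
        System.out.println(split(model));
    }
}
```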
4. Design and implementation: Mutation puzzle – puzzle generation and hints
4. Design and implementation: Constraint-solving puzzle – generating puzzles
• If the constraints for a path are unsolved because of an SMT solver error, generate a constraint-solving puzzle
• Extract only the erroneous constraints for puzzle generation; create a partial model for the others
4. Design and implementation: Constraint-solving puzzle – prioritizing puzzles
• Constraint sets can be semantically equivalent even if literally different
• Equivalent sets are presented as a single puzzle
• Puzzles that lead to a higher potential gain in coverage are prioritized over others
4. Design and implementation: Constraint-solving puzzle – solving puzzles
4. Design and implementation: Constructing test cases from solutions
• Constraint-puzzle solutions are directly used to create models
• If the created model is non-trivial, new mutation puzzles can be created from it
• Mutation-puzzle solutions are transformed into call sequences used to generate test cases
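The last step can be sketched as emitting the human-found call sequence as the source of an ordinary test method (the class and method names, and the emitted test shape, are invented; a real PAT test targets the actual subject program):

```java
import java.util.List;

// Sketch: a mutation-puzzle solution is a call sequence; the test
// generator turns it into the source code of a test method.
public class TestEmitter {
    public static String emitTest(String receiver, String type,
                                  List<String> calls, String assertion) {
        StringBuilder sb = new StringBuilder();
        sb.append("public void testGenerated() {\n");
        sb.append("    ").append(type).append(' ').append(receiver)
          .append(" = new ").append(type).append("();\n");
        for (String call : calls) { // replay the human-found sequence
            sb.append("    ").append(receiver).append('.').append(call)
              .append(";\n");
        }
        sb.append("    assertTrue(").append(assertion).append(");\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical solution for the model n = 2, sums = 3.0.
        System.out.print(emitTest("s", "Stats",
            List.of("increment(1.0)", "increment(2.0)"),
            "s.reachesTargetBranch()"));
    }
}
```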
5. Evaluation: Setup
• Subject programs:
• Baseline: Randoop + symbolic execution
5. Evaluation: RQ1 – setup
• How many object-mutation puzzles and constraint-solving puzzles can humans solve?
• Crowd: 8 graduate computer-science students
• Subjects: the top 100 constraint and mutation puzzles for ACM
5. Evaluation: RQ1 – results
• On average, participants spent one minute to solve or pass a puzzle
• Participants solved constraint puzzles more easily than mutation puzzles
5. Evaluation: RQ2 – setup
• How many people would play PAT voluntarily?
• Crowd: volunteers from Twitter
• Subjects: the top 100 constraint and mutation puzzles for ACC
5. Evaluation: RQ2 – results
• On average, participants spent one minute to solve or pass a puzzle
• Participants solved constraint puzzles more easily than mutation puzzles, and solved fewer mutation puzzles than in the first experiment
• Higher participation might be achievable with gaming mechanisms
5. Evaluation: RQ3
• How much is test coverage improved by PAT's puzzle solutions?
5. Evaluation: RQ4 – setup
• How much manual test-writing effort can be saved with the help of PAT?
• One student wrote tests to cover 10 randomly chosen branches that were not covered by the automatic tools
6. Discussion
• The authors evaluate the applicability of their method on only two subject programs; the evaluation's scale is very small and might not be representative of the general problem
• The authors claim no expert knowledge is needed, but the workers' identities are either unknown or the workers are computer-science students, and thus arguably "experts"
7. Conclusion and future work
• The authors introduce a new representation that lets humans solve constraint and mutation problems that resist state-of-the-art methods
• The method does not require expert knowledge from the workers
• The authors propose to extend the method with new puzzle types
Thank you for your attention. Questions? Remarks?
References
• [1] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball. Feedback-directed random test generation. In Proc. ICSE, pages 75–84, Minneapolis, MN, USA, May 23–25, 2007.
• [2] N. Tillmann and J. de Halleux. Pex: white box test generation for .NET. In Proc. TAP, pages 134–153, 2008.
• [3] K. Sen and G. Agha. CUTE and jCUTE: Concolic unit testing and explicit path model-checking tools. In Proc. CAV, 2006.
• [4] C. C. Michael, G. McGraw, and M. A. Schatz. Generating software test data by evolution. IEEE Trans. Softw. Eng., 27:1085–1110, December 2001.
• [5] S. Thummalapenta, T. Xie, N. Tillmann, P. de Halleux, and W. Schulte. MSeqGen: Object-oriented unit-test generation via mining source code. In Proc. ESEC/FSE, August 2009.
• [6] W. Dietl, S. Dietzel, M. D. Ernst, N. Mote, B. Walker, S. Cooper, T. Pavlik, and Z. Popović. Verification games: Making verification fun. In FTfJP 2012: 14th Workshop on Formal Techniques for Java-like Programs, Beijing, China, June 2012.