Automated Developer Testing: Achievements and Challenges

Automated Developer Testing:Achievements and Challenges • Tao Xie • North Carolina State University • In collaboration with Nikolai Tillmann, Peli de Halleux, Wolfram Schulte @Microsoft Research and students @NCSU ASE

Why Automate Testing? • Software testing is important • Software errors cost the U.S. economy about $59.5 billion each year (0.6% of the GDP) [NIST 02] • Improving testing infrastructure could save 1/3 cost [NIST 02] • Software testing is costly • Account for even half the total cost of software development [Beizer 90] • Automated testing reduces manual testing effort • Test execution: JUnit, NUnit, xUnit, etc. • Test generation: Pex, AgitarOne, ParasoftJtest, etc. • Test-behavior checking: Pex, AgitarOne, ParasoftJtest, etc.

Automation in Developer Testing • Developer testing • http://www.developertesting.com/ • Kent Beck’s 2004 talk on “Future of Developer Testing”http://www.itconversations.com/shows/detail301.html • This talk focuses on tool automation indevelopertesting (e.g., unit testing) • Not system testing etc. conducted by testers

? = Software Testing Setup + Expected Outputs Test inputs Program Outputs Test Oracles

? = Software Testing Problems + Expected Outputs Test inputs Program Outputs Test Oracles • Test Generation • Generating high-quality test inputs (e.g., achieving high code coverage)

? = Software Testing Problems + Expected Outputs Test inputs Program Outputs Test Oracles • Test Generation • Generating high-quality test inputs (e.g., achieving high code coverage) • Test Oracles • Specifying high-quality test oracles (e.g., guarding against various faults)

The Recipe of Unit Testing • Three essential ingredients: • Data • Method Sequence • Assertions void Add() { int item = 3; var list = new List(); list.Add(item); var count = list.Count; Assert.AreEqual(1, count); }

The (problem with) Data list.Add(3); • Which value matters? • Bad choices cause incomplete test suites. • Hard-coded values get stale when product code changes. • Why pick a value if it doesn’t matter?

Parameterized Unit Testing [Tillmann&Schulte ESEC/FSE 05] • Parameterized Unit Test = Unit Test with Parameters • Separation of concerns • Data is generated by a tool • Developer can focus on functional specification void Add(List list, int item) { var count = list.Count; list.Add(item); Assert.AreEqual(count + 1, list.Count); }

Parameterized Unit Tests areAlgebraic Specifications • A Parameterized Unit Test can be read as a universally quantified, conditional axiom. void ReadWrite(string name, string data) {Assume.IsTrue(name != null && data != null); Write(name, data);varreadData = Read(name); Assert.AreEqual(data, readData); }  string name, string data: name ≠ null ⋀ data ≠ null ⇒ equals( ReadResource(name,WriteResource(name,data)), data)

Parameterized Unit Testingis going mainstream Parameterized Unit Tests (PUTs) commonly supported by various test frameworks • .NET: Supported by .NET test frameworks • http://www.mbunit.com/ • http://www.nunit.org/ • … • Java: Supported by JUnit 4.X • http://www.junit.org/ Generating test inputs for PUTs supported by tools • .NET: Supported by Microsoft Research Pex • http://research.microsoft.com/Pex/ • Java: Supported by AgitarAgitarOne • http://www.agitar.com/

Test Generation • Human • Expensive, incomplete, … • Brute Force • Pairwise, predefined data, etc… • Random: • Cheap, Fast • “It passed a thousand tests” feeling • Dynamic Symbolic Execution: Pex, CUTE,EXE • Automated white-box • Not random – Constraint Solving

Dynamic Symbolic Execution Choose next path • Code to generate inputs for: Solve Execute&Monitor void CoverMe(int[] a) { if (a == null) return; if (a.Length > 0) if (a[0] == 1234567890) throw new Exception("bug"); } Negated condition a==null F T a.Length>0 T F Done: There is no path left. a[0]==123… F T Data null {} {0} {123…} Observed constraints a==null a!=null && !(a.Length>0) a!=null && a.Length>0 && a[0]!=1234567890 a!=null && a.Length>0 && a[0]==1234567890 Constraints to solve a!=null a!=null && a.Length>0 a!=null && a.Length>0 && a[0]==1234567890

Challenges of DSE • Loops • Fitnex [Xie et al. DSN 09] • Generic API functions e.g., RegEx matching IsMatch(s1,regex1) • Reggae [Li et al. ASE 09-sp] • Method sequences • MSeqGen [Thummalapenta et al. ESEC/FSE 09] • Environments e.g., file systems, network, db, … • Parameterized Mock Objects [MarriAST 09] Opportunities • Regression testing [Taneja et al. ICSE 09-nier] • Developer guidance (cooperative developer testing)

NCSU Industry Tech Transfer • Loops • Fitnex [Xie et al. DSN 09] • Generic API functions e.g., RegEx matching IsMatch(s1,regex1) • Reggae [Li et al. ASE 09-sp] • Method sequences • MSeqGen [Thummalapenta et al. ESEC/FSE 09] • Environments e.g., file systems, network, db, … • Parameterized Mock Objects [Marri AST 09] Applications • Test network app at Army division@Fort Hood, Texas • Test DB app of hand-held medical assistant device at FDA

Pex on MSDN DevLabsIncubation Project for Visual Studio • Download counts (20 months)(Feb. 2008 - Oct. 2009 ) • Academic: 17,366 • Devlabs: 13,022 • Total: 30,388

NCSU Industry Tech Transfer • Loops • Fitnex [Xie et al. DSN 09] • Generic API functions e.g., RegEx matching IsMatch(s1,regex1) • Reggae [Li et al. ASE 09-sp] • Method sequences • MSeqGen [Thummalapenta et al. ESEC/FSE 09] • Environments e.g., file systems, network, db, … • Parameterized Mock Objects [Marri AST 09] Applications • Test network app at Army division@Fort Hood, Texas • Test DB app of hand-held medical assistant device at FDA

Explosion of Search Space There are decision procedures for individual path conditions, but… • Number of potential paths grows exponentially with number of branches • Reachable code not known initially • Without guidance, same loop might be unfolded forever Fitnex search strategy [Xie et al. DSN 09]

DSE Example TestLoop(0, {0}) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } Path condition: !(x == 90) ↓ New path condition: (x == 90) ↓ New test input: TestLoop(90, {0})

DSE Example TestLoop(90, {0}) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } Path condition: (x == 90) && !(y[0] ==15) ↓ New path condition: (x == 90) && (y[0] ==15) ↓ New test input: TestLoop(90, {15})

Challenge in DSE TestLoop(90, {15}) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } Path condition: (x == 90) && (y[0] ==15) && !(x+1 == 110) ↓ New path condition: (x == 90) && (y[0] ==15) && (x+1 == 110) ↓ New test input: No solution!?

A Closer Look TestLoop(90, {15}) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } Path condition: (x == 90) && (y[0] ==15) && (0 < y.Length) && !(1 < y.Length) && !(x+1 == 110) ↓ New path condition: (x == 90) && (y[0] ==15) && (0 < y.Length) && (1 < y.Length)  Expand array size

A Closer Look TestLoop(90, {15}) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } We can have infinite paths! Manual analysis  need at least 20 loop iterations to cover the target branch Exploring all paths up to 20 loop iterations is infeasible: 220 paths

Fitnex: Fitness-Guided Exploration [Xie et al. DSN 2009] TestLoop(90, {15, 0}) TestLoop(90, {15, 15}) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } Key observations: with respect to the coverage target • not all paths are equally promising for branch-node flipping • not all branch nodes are equally promising to flip • Our solution: • Prefer to flip branch nodes on the most promising paths • Prefer to flip the most promising branch nodes on paths • Fitness function to measure “promising” extents

Fitness Function • FF computes fitness value (distance between the current state and the goal state) • Search tries to minimize fitness value [Tracey et al. 98, Liu at al. 05, …]

Fitness Function for (x == 110) public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } Fitness function: |110 – x |

Compute Fitness Values for Paths Fitness Value public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } (x, y) (90, {0}) 20 (90, {15}) 19 (90, {15, 0}) 19 (90, {15, 15}) 18 (90, {15, 15, 0}) 18 (90, {15, 15, 15}) 17 (90, {15, 15, 15, 0}) 17 (90, {15, 15, 15, 15}) 16 (90, {15, 15, 15, 15, 0}) 16 (90, {15, 15, 15, 15, 15}) 15 … Fitness function: |110 – x | Give preference to flip paths with better fitness values We still need to address which branch node to flip on paths …

Compute Fitness Gains for Branches Fitness Value public boolTestLoop(int x, int[] y) { if (x == 90) { for (int i = 0; i < y.Length; i++) if (y[i] == 15) x++; if (x == 110) return true; } return false; } (x, y) (90, {0}) 20 (90, {15})  flip b4 19 (90, {15, 0})  flip b2 19 (90, {15, 15})  flip b4 18 (90, {15, 15, 0})  flip b2 18 (90, {15, 15, 15})  flip b4 17 (90, {15, 15, 15, 0})  flip b2 17 (90, {15, 15, 15, 15})  flip b4 16 (90, {15, 15, 15, 15, 0})  flip b2 16 (90, {15, 15, 15, 15, 15})  flip b4 15 … Fitness function: |110 – x | Branch b1: i < y.Length Branch b2: i >= y.Length Branch b3: y[i] == 15 Branch b4: y[i] != 15 • Flipping Branch b4 (b3) gives us average 1 (-1) fitness gain (loss) • Flipping branch b2 (b1) gives us average 0 fitness gain (loss)

Compute Fitness Gain for Branches cont. • For a flipped node leading to Fnew, find out the old fitness value Fold before flipping • Assign Fitness Gain (Fold – Fnew)for the branch of the flipped node • Assign Fitness Gain (Fnew – Fold )for the other branch of the branch of the flipped node • Compute the average fitness gain for each branch over time

Search Frontier • Each branch node candidate for being flipped is prioritized based on its composite fitness value: • (Fitness value of node – Fitness gain of its branch) • Select first the one with the best composite fitness value • To avoid local optimal or biases, the fitness-guided strategy is integrated with Pex’s search strategies

Object Creation • Pex normally uses public methods to configure non-public object fields • Heuristics built-in to deal with common types • User can help if needed void (Foofoo) { if (foo.Value == 123) throw … [PexFactoryMethod] Foo Create(Bar bar) { return new Foo(bar);}

QuickGraph Example • A graph example from QuickGraph library • interface IGraph • { • /* Adds given vertex to the graph */ • void AddVertex(IVertex v); • /* Creates a new vertex and adds it to the graph */ • IVertexAddVertex(); • /* Adds an edge to the graph. Both vertices should • already exist in the graph */ • IEdgeAddEdge(IVertex v1, Ivertex v2); • } 32 32

Method Under Test • Desired object state for reaching targets 1 and 2: graph object should contain vertices and edges method sequence • Class SortAlgorithm • { • IGraph graph; • public SortAlgorithm(IGraphgraph) { • this.graph= graph; • } • public void Compute (IVertex s) { • foreach(IVertex u in graph.Vertices) • { • //Target 1 • } • foreach(IEdge e in graph.Edges) • { • //Target 2 • } • } • }

Method Under Test • Applying Randoop, a random testing approach that constructs test inputs by randomly selecting method calls Example sequence generated by Randoop VertexAndEdgeProvider v0 = new VertexAndEdgeProvider(); Boolean v1 = false; BidirectionalGraph v2 = new BidirectionalGraph((IVertexAndEdgeProvider)v0, v1); IVertex v3 = v2.AddVertex(); IVertex v4 = v0.ProvideVertex(); IEdge v15 = v2.AddEdge(v3, v4); v4 not in the graph, so edge cannot be added to graph. • Achieved 31.82% (7 of 22) branch coverage • Reason for low coverage: Not able to generate graph with vertices and edges

New MSeqGen Approach • Mine sequences from existing code bases • Reuse mined sequences for achieving desired object states A Mined sequence from an existing codebase VertexAndEdgeProvider v0; boolbVal; IGraphag = new AdjacencyGraph(v0, bVal); IVertex source = ag.AddVertex(); IVertex target = ag.AddVertex(); IVertex vertex3 = ag.AdVertex(); IEdge edg1 = ag.AddEdge(source, target); IEdge edg2 = ag.AddEdge(target, vertex3); IEdge edg3 = ag.AddEdge(source, vertex3); Graph objectincludes both vertices and edges • Use mined sequences to assist Randoop and Pex • Both Randoop and Pex achieved 86.40% (19 of 22) branch coverage with assistance from MSeqGen

Challenges Addressed by MSeqGen • Existing codebases are often large and complete analysis is expensive •  Search and analyze only relevant portions • Concrete values in mined sequences may be different from desired values •  Replace concrete values with symbolic values and use dynamic symbolic execution • Extracted sequences individually may not be sufficient to achieve desired object states •  Combine extracted sequences to generate new sequences

MSeqGen: Code Searching • Problem: Existing code bases are often large and complete analysis is expensive • Solution: • Use keyword search for identifying relevant method bodies using target classes • Analyze only those relevant method bodies Target classes: System.Collections.Hashtable • QuickGraph.Algorithms.TSAlgorithm Keywords: Hashtable, TSAlgorithm Shortnames of target classes are used as keywords

MSeqGen: Sequence Generalization • Problem: Concrete values in mined sequences are different from desired values to achieve target states • Solution: Generalize sequences by replacing concrete values with symbolic values Method Under Test Class A { int f1 { set; get; } int f2 { set; get; } void CoverMe() { if (f1 != 10) return; if (f2 > 25) throw new Exception(“bug”); } } Mined Sequence for A A obj = new A(); obj.setF1(14); obj.setF2(-10); obj.CoverMe(); Sequence cannot help in exposing bug since desired values are f1=10 and f2>25

MSeqGen: Sequence Generalization • Replace concrete values 14 and -10 with symbolic values X1 and X2 Generalized Sequence for A Mined Sequence for A A obj = new A(); obj.setF1(14); obj.setF2(-10); obj.CoverMe(); int x1 = *, x2 = *; A obj = new A(); obj.setF1(x1); obj.setF2(x2); obj.CoverMe(); • Use DSE for generating desired values for X1 and X2 • DSE explores CoverMemethod and generates desired values (X1 = 10 and X2 = 35)

Improvement of State-of-the-Art • Randoop • Without assistance from MSeqGen: achieved 32% branch coverage  achieved 86%branch coverage • In evaluation, help Randoop achieve 8.7% (maximum 20%) higher branch coverage • Pex • Without assistance from MSeqGen: achieved 45% branch coverage  achieved 86%branch coverage • In evaluation, help Pex achieve 17.4% (maximum 22.5%) higher branch coverage 40 40

Test Oracles • Write assertions and Pex will try to break them • Without assertions, Pex can only find violations of runtime contracts causing NullReferenceException, IndexOutOfRangeException, etc. • Assertions leveraged in product and test code • Pex can leverage Code Contracts

? = Summary:Automated Developer Testing + Expected Outputs Test inputs Program Outputs Test Oracles Division of Labors • Test Generation • Test inputs for PUT generated by tools (e.g., Pex) • Fitnex: guided exploration of paths [DSN 09] • MSeqGen: exploiting real-usage sequences [ESEC/FSE 09] • Test Oracles • Assertions in PUT specified by developers

Thank you http://research.microsoft.com/pex http://pexase.codeplex.com/ https://sites.google.com/site/asergrp/

Code Contracts • http://research.microsoft.com/en-us/projects/contracts/ • Library to state preconditions, postconditions, invariants • Supported by two tools: • Static Checker • Rewriter: turns Code Contracts into runtime checks • Pex analyses the runtime checks • Contracts act as Test Oracle • Pex may find counter examples for contracts • Missing Contracts may be suggested

Example: ArrayList Class invariant specification: public class ArrayList { private Object[] _items; private int _size; ... [ContractInvariantMethod] // attribute comes with Contracts protected void Invariant() { Contract.Invariant(this._items != null); Contract.Invariant(this._size >= 0); Contract.Invariant(this._items.Length >= this._size); }

ParameterizedModels

Unit Testing vs. Integration Testing • Unit test: while it is debatable what a ‘unit’ is, a ‘unit’ should be small. • Integration test: exercises large portions of a system. • Observation: Integration tests are often “sold” as unit tests • White-box test generation does not scale well to integration test scenarios. • Possible solution: Introduce abstraction layers, and mock components not under test

ExampleTesting with Interfaces AppendFormat(null, “{0} {1}!”, “Hello”, “World”);  “Hello World!” .Net Implementation: public StringBuilder AppendFormat( IFormatProvider provider, char[] chars, params object[] args){ if (chars == null || args == null) throw new ArgumentNullException(…); int pos = 0; int len = chars.Length; char ch = '\x0'; ICustomFormatter cf = null; if (provider != null) cf = (ICustomFormatter)provider.GetFormat(typeof(ICustomFormatter)); …

Stubs / Mock Objects • Introduce a mock class which implements the interface. • Write assertions over expected inputs, provide concrete outputs public class MFormatProvider : IFormatProvider { public object GetFormat(Type formatType){ Assert.IsTrue(formatType != null); return new MCustomFormatter(); } } • Problems: • Costly to write detailed behavior by example • How many and which mock objects do we need to write?

Parameterized Mock Objects - 1 • Introduce a mock class which implements the interface. • Let an oracle provide the behavior of the mock methods. public class MFormatProvider : IFormatProvider { public object GetFormat(Type formatType) { … object o = call.ChooseResult<object>(); return o; } } • Result: Relevant result values can be generated by white-box test input generation tool, just as other test inputs can be generated!

Automated Developer Testing: Achievements and Challenges