Automated Software Testing and Mining Software Engineering Data: Achievements and Challenges. Tao Xie, North Carolina State University. In collaboration with Nikolai Tillmann, Peli de Halleux, Wolfram Schulte @Microsoft Research, Suresh Thummalapenta, and students @NCSU ASE.
Why Automate Testing? • Software testing is important • Software errors cost the U.S. economy about $59.5 billion each year (0.6% of the GDP) [NIST 02] • Improving testing infrastructure could save about one third of that cost [NIST 02] • Software testing is costly • It accounts for as much as half of the total cost of software development [Beizer 90] • Automated testing reduces manual testing effort • Test execution: JUnit, NUnit, xUnit, etc. • Test generation: Pex, AgitarOne, Parasoft Jtest, etc. • Test-behavior checking: Pex, AgitarOne, Parasoft Jtest, etc.
Software Testing Setup • Test inputs + Program → Outputs • Test oracles: Outputs =? Expected Outputs
Software Testing Problems • Test inputs + Program → Outputs • Test oracles: Outputs =? Expected Outputs • Test Generation • Generating high-quality test inputs (e.g., achieving high code coverage) • Test Oracles • Specifying high-quality test oracles (e.g., guarding against various faults)
Test Generation • Human • Expensive, incomplete, … • Brute Force • Pairwise, predefined data, etc. • Random • Cheap, fast • “It passed a thousand tests” feeling • Dynamic Symbolic Execution: Pex, CUTE, EXE • Automated white-box • Not random: constraint solving
Dynamic Symbolic Execution
Loop: Choose next path → Solve → Execute & Monitor
Code to generate inputs for:
void CoverMe(int[] a) {
  if (a == null) return;
  if (a.Length > 0)
    if (a[0] == 1234567890)
      throw new Exception("bug");
}
Execution tree: a==null (T/F), then a.Length>0 (T/F), then a[0]==123… (T/F)
Runs (constraints to solve → data → observed constraints); each constraint to solve contains a negated condition from an earlier path:
• (none) → null → a==null
• a!=null → {} → a!=null && !(a.Length>0)
• a!=null && a.Length>0 → {0} → a!=null && a.Length>0 && a[0]!=1234567890
• a!=null && a.Length>0 && a[0]==1234567890 → {123…} → a!=null && a.Length>0 && a[0]==1234567890
Done: There is no path left.
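The "Execute & Monitor" step above can be sketched in Java. This is only an illustration under stated assumptions: `CoverMeTrace` and `run` are hypothetical names, CoverMe is ported from C# by hand, and the four inputs are copied from the slide rather than produced by a constraint solver. The final exception is replaced by simply recording the branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: CoverMe instrumented to record the branch
// conditions observed along the executed path (the "observed constraints").
final class CoverMeTrace {
    static List<String> run(int[] a) {
        List<String> path = new ArrayList<>();
        if (a == null) {
            path.add("a==null");
            return path;
        }
        path.add("a!=null");
        if (a.length > 0) {
            path.add("a.length>0");
            if (a[0] == 1234567890) {
                path.add("a[0]==1234567890"); // the original code throws here
            } else {
                path.add("a[0]!=1234567890");
            }
        } else {
            path.add("!(a.length>0)");
        }
        return path;
    }

    public static void main(String[] args) {
        // Inputs taken from the slide; a real DSE tool would obtain each one
        // by solving the previous path condition with its last conjunct negated.
        int[][] inputs = { null, {}, {0}, {1234567890} };
        for (int[] in : inputs) {
            System.out.println(String.join(" && ", run(in)));
        }
    }
}
```

Each printed line corresponds to one row of the "observed constraints" column above.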
Challenges of DSE • Loops • Fitnex [Xie et al. DSN 09] • Generic API functions e.g., RegEx matching IsMatch(s1,regex1) • Reggae [Li et al. ASE 09-sp] • Method sequences • MSeqGen [Thummalapenta et al. ESEC/FSE 09] • Environments e.g., file systems, network, db, … • Parameterized Mock Objects [Marri et al. AST 09] Opportunities • Regression testing [Taneja et al. ICSE 09-nier] • Developer guidance (cooperative developer testing) [Xiao et al. ICSE 11]
NCSU Industry Tech Transfer • Loops • Fitnex [Xie et al. DSN 09] • Generic API functions e.g., RegEx matching IsMatch(s1,regex1) • Reggae [Li et al. ASE 09-sp] • Method sequences • MSeqGen [Thummalapenta et al. ESEC/FSE 09] • Environments e.g., file systems, network, db, … • Parameterized Mock Objects [Marri et al. AST 09] Applications • Test network app at Army division@Fort Hood, Texas • Test DB app of hand-held medical assistant device at FDA
Pex Visual Studio Power Tool • Download counts (20 months, Feb. 2008 - Oct. 2009) • Academic: 17,366 • Devlabs: 13,022 • Total: 30,388
Pex for Fun: Web-based Learning Tool • 257,766 clicks on 'Ask Pex!' since summer 2010
Explosion of Search Space There are decision procedures for individual path conditions, but… • Number of potential paths grows exponentially with number of branches • Without guidance, same loop might be unfolded forever Fitnex search strategy [Xie et al. DSN 09]
DSE Example
Test input: TestLoop(0, {0})
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Path condition: !(x == 90) ↓ New path condition: (x == 90) ↓ New test input: TestLoop(90, {0})
DSE Example
Test input: TestLoop(90, {0})
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Path condition: (x == 90) && !(y[0] == 15) ↓ New path condition: (x == 90) && (y[0] == 15) ↓ New test input: TestLoop(90, {15})
Challenge in DSE
Test input: TestLoop(90, {15})
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Path condition: (x == 90) && (y[0] == 15) && !(x+1 == 110) ↓ New path condition: (x == 90) && (y[0] == 15) && (x+1 == 110) ↓ New test input: No solution!?
A Closer Look
Test input: TestLoop(90, {15})
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Path condition: (x == 90) && (y[0] == 15) && (0 < y.Length) && !(1 < y.Length) && !(x+1 == 110) ↓ New path condition: (x == 90) && (y[0] == 15) && (0 < y.Length) && (1 < y.Length) → Expand array size
A Closer Look
Test input: TestLoop(90, {15})
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
• We can have infinitely many paths, of unbounded length
• Manual analysis shows that at least 20 loop iterations are needed to cover the target branch
• Exploring all paths up to 20 loop iterations is practically infeasible: 2^20 paths
Fitnex: Fitness-Guided Exploration
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Test input: TestLoop(90, {15, 15})
Key observations: with respect to the coverage target, • not all paths are equally promising for flipping nodes • not all nodes are equally promising to flip
Our solution: • Prefer to flip nodes on the most promising path • Prefer to flip the most promising nodes on a path • Use a fitness function as a proxy for "promising"
Fitness Function • The fitness function (FF) computes a fitness value (the distance between the current state and the goal state) • Search tries to minimize the fitness value [Tracey et al. 98, Liu et al. 05, …]
Fitness Function for (x == 110)
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Fitness function: |110 - x|
Compute Fitness Values for Paths
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Fitness function: |110 - x|
(x, y)                          Fitness value
(90, {0})                       20
(90, {15})                      19
(90, {15, 0})                   19
(90, {15, 15})                  18
(90, {15, 15, 0})               18
(90, {15, 15, 15})              17
(90, {15, 15, 15, 0})           17
(90, {15, 15, 15, 15})          16
(90, {15, 15, 15, 15, 0})       16
(90, {15, 15, 15, 15, 15})      15
…
• Give preference to flipping a node in paths with better fitness values
• We still need to address which node to flip on those paths …
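The fitness values in the table can be reproduced with a small Java sketch. It assumes the initial x is 90, so that the loop is entered, and evaluates |110 - x| on the value of x that reaches the target check; `FitnessDemo`, `xAtTarget`, and `fitness` are hypothetical names, not part of Pex.

```java
import java.util.Arrays;

// Hypothetical illustration of the fitness function |110 - x| for the
// target branch (x == 110) in TestLoop.
final class FitnessDemo {
    // Replays the loop of TestLoop and returns the value of x that reaches
    // the check `x == 110`. Only meaningful when the initial x is 90.
    static int xAtTarget(int x, int[] y) {
        for (int i = 0; i < y.length; i++) {
            if (y[i] == 15) x++;
        }
        return x;
    }

    // Distance between the current state and the goal state x == 110.
    static int fitness(int x, int[] y) {
        return Math.abs(110 - xAtTarget(x, y));
    }

    public static void main(String[] args) {
        int[][] ys = { {0}, {15}, {15, 0}, {15, 15}, {15, 15, 0}, {15, 15, 15} };
        for (int[] y : ys) {
            System.out.println("(90, " + Arrays.toString(y) + ") -> " + fitness(90, y));
        }
    }
}
```

A fitness value of 0 means the target branch is covered, which for this program requires twenty elements equal to 15.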
Compute Fitness Gains for Branches
public bool TestLoop(int x, int[] y) {
  if (x == 90) {
    for (int i = 0; i < y.Length; i++)
      if (y[i] == 15) x++;
    if (x == 110) return true;
  }
  return false;
}
Fitness function: |110 - x|
Branch b1: i < y.Length
Branch b2: i >= y.Length
Branch b3: y[i] == 15
Branch b4: y[i] != 15
(x, y)                          Flipped branch   Fitness value
(90, {0})                                        20
(90, {15})                      flip b4          19
(90, {15, 0})                   flip b2          19
(90, {15, 15})                  flip b4          18
(90, {15, 15, 0})               flip b2          18
(90, {15, 15, 15})              flip b4          17
(90, {15, 15, 15, 0})           flip b2          17
(90, {15, 15, 15, 15})          flip b4          16
(90, {15, 15, 15, 15, 0})       flip b2          16
(90, {15, 15, 15, 15, 15})      flip b4          15
…
• Flipping a branch node of b4 (b3) gives an average fitness gain (loss) of 1 (-1)
• Flipping a branch node of b2 (b1) gives an average fitness gain (loss) of 0 (0)
Compute Fitness Gains for Branches • Paths p and p’ share branch node n: p takes branch b at n, and p’ takes the other branch b’ • F(p) is the fitness value of p • F(p’) is the fitness value of p’ • Fitness gains: • FGain(b) := F(p) - F(p’) • FGain(b’) := F(p’) - F(p) • Compute the average fitness gain for each program branch over time
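The averaging of fitness gains per branch can be sketched as follows. This is a minimal illustration, not Pex's implementation: `FitnessGains`, `recordFlip`, and `average` are hypothetical names, and the sample flips are read off the TestLoop example (b4 is y[i] != 15, b3 is y[i] == 15, b2 is i >= y.Length, b1 is i < y.Length).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for average fitness gains per program branch.
final class FitnessGains {
    // branch id -> { sum of gains, number of observations }
    private final Map<String, double[]> stats = new HashMap<>();

    // Path p took `branch` at some node and has fitness fP; the flipped
    // path p' took `sibling` at the same node and has fitness fPPrime.
    void recordFlip(String branch, String sibling, double fP, double fPPrime) {
        add(branch, fP - fPPrime);   // FGain(b)  := F(p)  - F(p')
        add(sibling, fPPrime - fP);  // FGain(b') := F(p') - F(p)
    }

    private void add(String branch, double gain) {
        double[] s = stats.computeIfAbsent(branch, k -> new double[2]);
        s[0] += gain;
        s[1] += 1;
    }

    // Average gain observed so far; 0 for a branch that was never observed.
    double average(String branch) {
        double[] s = stats.get(branch);
        return s == null ? 0.0 : s[0] / s[1];
    }

    public static void main(String[] args) {
        FitnessGains gains = new FitnessGains();
        gains.recordFlip("b4", "b3", 20, 19); // (90,{0})  -> (90,{15})
        gains.recordFlip("b2", "b1", 19, 19); // (90,{15}) -> (90,{15,0})
        gains.recordFlip("b4", "b3", 19, 18); // (90,{15,0}) -> (90,{15,15})
        for (String b : new String[] {"b1", "b2", "b3", "b4"}) {
            System.out.println(b + ": " + gains.average(b));
        }
    }
}
```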
Implementation in Pex • Pex maintains global search frontier • All discovered branch nodes are added to frontier • Frontier may choose next branch node to flip • Fully explored branch nodes are removed from frontier • Pex has a default search frontier • It tries to create diversity across different coverage criteria • Frontiers can be combined in a fair round-robin scheme
Implementation in Pex We implemented a new search frontier, “Fitnex”: • Nodes to flip are prioritized by their composite fitness value: F(p_n) - FGain(b_n), where • p_n is the path of node n • b_n is the explored outgoing branch of n • Fitnex always picks the node with the lowest composite fitness value to flip • To avoid local optima or biases, the fitness-guided strategy is combined with Pex’s other search strategies
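The node selection rule can be sketched in Java as picking the frontier entry with the lowest composite fitness value. `FitnexFrontier`, `Node`, and `pick` are hypothetical names for illustration only; Pex's actual frontier interface is not shown in these slides.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of choosing the next branch node to flip.
final class FitnexFrontier {
    static final class Node {
        final String id;
        final double pathFitness; // F(p_n): fitness of the path containing n
        final double branchGain;  // FGain(b_n): average gain of n's explored branch

        Node(String id, double pathFitness, double branchGain) {
            this.id = id;
            this.pathFitness = pathFitness;
            this.branchGain = branchGain;
        }

        // Composite fitness value: lower is more promising.
        double composite() {
            return pathFitness - branchGain;
        }
    }

    // Returns the node with the lowest composite fitness value.
    static Node pick(List<Node> frontier) {
        return frontier.stream()
                .min(Comparator.comparingDouble(Node::composite))
                .orElseThrow(() -> new IllegalArgumentException("empty frontier"));
    }

    public static void main(String[] args) {
        List<Node> frontier = Arrays.asList(
                new Node("n1", 19, 0),  // loop-exit node on a path with fitness 19
                new Node("n2", 19, 1),  // y[i] != 15 node on the same path
                new Node("n3", 20, 1)); // y[i] != 15 node on a worse path
        System.out.println(pick(frontier).id);
    }
}
```

In the sample frontier, n2 has composite value 18 and is selected ahead of n1 and n3, which both have 19.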
Evaluation Subjects
A collection of micro-benchmark programs routinely used by the Pex developers to evaluate Pex’s performance, extracted from real, complex C# programs
• Ranging from string matching like
if (value.StartsWith("Hello") &&
    value.EndsWith("World!") &&
    value.Contains(" ")) { … }
• to a small parser for a Pascal-like language where the target is to create a legal program.
Search Strategies Under Comparison • Pex with the Fitnex strategy • Pex without the Fitnex strategy: Pex’s previous default strategy • Random: a strategy where branch nodes to flip are chosen randomly in the already explored execution tree • Iterative Deepening: a strategy where breadth-first search is performed over the execution tree
Evaluation Results • Metric: number of runs/iterations required to cover the target • Pex w/o Fitnex: avg. improvement of factor 1.9 over Random • Pex w/ Fitnex: avg. improvement of factor 5.2 over Random
Summary: Automated Developer Testing • Test inputs + Program → Outputs • Test oracles: Outputs =? Expected Outputs • Division of Labor • Test Generation • Test inputs for PUT generated by tools (e.g., Pex) • Fitnex: guided exploration of paths [DSN 09] • MSeqGen: exploiting real-usage sequences [ESEC/FSE 09] • Test Oracles • Assertions in PUT specified by developers
Motivation: New Trends in Development • Exponential increase in libraries or frameworks • Proprietary, e.g., .NET or Java SDK • Open source, e.g., Eclipse • Sourceforge.net hosts nearly 230,000 projects with two million users • [Diagram label: Build applications from scratch] • API: Application Programming Interface
References: 1. J. Hammond. What developers think, 2010. http://www.drdobbs.com/architect/222301141/ 2. Black duck’s web page with koders usage information, 2010. http://corp.koders.com/about/
Major Problems • Programmers face difficulties in using APIs of libraries or frameworks • Lack of documentation [1] • Outdated documentation [2] • Complexity [3]: the .NET library provides nearly 10,000 classes
References: 1. Jan Bosch, Peter Molin, Michael Mattsson, and PerOlof Bengtsson. Object-oriented framework-based software development: problems and experiences. ACM Comput. Surv., 2000. 2. Timothy C. Lethbridge, Janice Singer, and Andrew Forward. How software engineers use documentation: The state of the practice. IEEE Software, 2003. 3. D. Kirk, M. Roper, and M. Wood. Identifying and addressing problems in object-oriented framework reuse. Journal of Empirical Soft. Eng., 12(3):243-274, 2007.
Consequences • Programmers spend more effort in understanding APIs [1] • Reducing productivity • Programmers introduce defects while using APIs [2] • Reducing quality
References: 1. Martin P. Robillard. What makes APIs hard to learn? Answers from developers. IEEE Software, 2009. 2. Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Bugs as deviant behavior: A general approach to inferring errors in systems code. SOSP 2001.
Example Programming Task
• Task: How to parse code in a dirty editor of the Eclipse IDE?
• A dirty editor is represented by IEditorPart
• Parsing code requires ICompilationUnit
• Query: IEditorPart -> ICompilationUnit
• An example solution:
IEditorPart iep = ...
IEditorInput editorInp = iep.getEditorInput();
IWorkingCopyManager wcm = JavaUI.getWorkingCopyManager();
ICompilationUnit icu = wcm.getWorkingCopy(editorInp);
• Challenges:
• Needs instances of IEditorInput and IWorkingCopyManager
• Needs to invoke a static method of JavaUI
Defects • A code example from the open source project HSqlDB • Defect: No rollback is done when an SQLException occurs (missing connection.rollback()) • Requires specification: FCc1 FCc2 FCc3 => FCe1 • FCc1 -> Connection conn = OracleDataSource.getConnection() • FCc2 -> Statement stmt = Connection.createStatement() • FCc3 -> stmt.executeUpdate() • FCe1 -> conn.rollback()
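A minimal Java sketch of code that satisfies this specification is shown below. It is not the HSqlDB code from the slide: `RollbackOnFailure` and `update` are hypothetical names, the connection (FCc1) is assumed to be obtained by the caller, and auto-commit is assumed to be disabled so that `rollback()` is meaningful.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration of the rule FCc1 FCc2 FCc3 => FCe1.
final class RollbackOnFailure {
    // Assumes the caller obtained `conn` (FCc1) and disabled auto-commit.
    static int update(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) { // FCc2
            return stmt.executeUpdate(sql);             // FCc3
        } catch (SQLException e) {
            // FCe1: the call that is missing in the defective code.
            conn.rollback();
            throw e;
        }
    }
}
```

The statement is closed by the try-with-resources block before the catch clause runs, so the rollback happens after the failed statement has been released.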
Solution: WebMiner Framework • Observation: source code already reusing APIs can be leveraged for improving both software productivity and quality • SEARCH: Collect SE data • ANALYZE: Resolve object types; generate candidates • MINE: Adopt/adapt/develop mining algorithm • APPLY: Postprocess/apply mining results
Key Idea of WebMiner Framework • Applications 1, 2, 3, … that use a library each yield pattern candidates • Mining the pattern candidates produces patterns • Observation: Frequent patterns are more likely to represent API usage specifications • Patterns support productivity • Defect detection techniques use patterns to find defects, supporting quality
Search Phase • Leverages code search engines such as Google code search • Collects relevant code examples for APIs under analysis • Addresses a major issue of “limited data points”
Search Phase: Limited Data Points • Previous approaches: mine patterns from one or two code repositories (Eclipse, Linux, …) • Often lack sufficient relevant data points (e.g., API call sites) • Missing patterns: affecting productivity • Missing related defects: affecting quality • WebMiner framework: searches code repositories 1, 2, …, N through a code search engine (e.g., open source code on the web), then mines patterns
Analyze Phase • Analyzes collected code examples • Generates pattern candidates • Addresses a major issue of partial and non-compilable code examples
Analyze Phase • Challenge: collected code examples are partial and non-compilable • Solution: partial-program analysis • Uses heuristics based on simple language semantics • Advantages: • Does not require code to be compilable • Highly scalable (96 MLOC analyzed in ~2 hours on a 3.0 GHz Xeon processor with 4 GB RAM)
Partial-Program Analysis Heuristics
• Example 1:
QueueConnection connect;
QueueSession session = connect.createQueueSession(false, int)
• How to get the return type of the createQueueSession method?
• No access to the method declaration
• The return type can be inferred from the type of the variable “session”
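The heuristic in Example 1 can be approximated with a regular expression over a single statement. This is only a sketch under stated assumptions: `ReturnTypeHeuristic` and `infer` are hypothetical names, and the actual analysis works on parsed code rather than raw strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: infer a method's return type from the declared type
// of the variable that receives its result.
final class ReturnTypeHeuristic {
    // Matches "<Type> <var> = <receiver>.<method>(" at the start of a statement.
    private static final Pattern DECL_WITH_CALL = Pattern.compile(
            "^\\s*([A-Za-z_][\\w.<>\\[\\]]*)\\s+(\\w+)\\s*=\\s*(\\w+)\\.(\\w+)\\s*\\(");

    // Returns "method -> type", or null when the statement has another shape.
    static String infer(String statement) {
        Matcher m = DECL_WITH_CALL.matcher(statement);
        return m.find() ? m.group(4) + " -> " + m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(infer(
                "QueueSession session = connect.createQueueSession(false, int)"));
        System.out.println(infer("connect.close();"));
    }
}
```

The statement does not need to compile: the undeclared argument `int` is ignored because only the left-hand side and the call name are inspected.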
Mine and Apply Phases • Specific to the SE task under analysis • Mines pattern candidates to identify frequent patterns • Suggests patterns to programmers or uses patterns to detect defects
Approaches based on WebMiner • Addresses queries of the form “Source -> Destination” • Helps identify where to start reusing a library • Detects exception-handling related defects • Detects missing condition checks around API method calls • Mines static traces and assists white-box test-generation approaches • Mines dynamic traces and generates regression tests • Industrial impact: DyGen was used at Microsoft Research to generate a regression test suite (~500,000 tests) for two core libraries of the .NET framework
Alattin: Motivation • Problem: Programming rules are often not well documented • General solution: • Mine frequent patterns across a large number of data points (e.g., code examples) • Use frequent patterns as programming rules to detect defects
Challenges addressed by Alattin • Limited data points • Existing approaches mine specifications from a few code bases and therefore miss specifications for lack of sufficient data points • Large number of false positives • Existing approaches produce a large number of false positives
Large Number of False Positives • A major observation: • Programmers often write code in different ways for achieving the same task • Some ways are more frequent than others • [Diagram: patterns are mined from the frequent ways; violation detection against the mined patterns flags the infrequent ways, which become false positives]
Example: java.util.Iterator.next()
Code Sample 1:
void PrintEntries1(ArrayList<String> entries) {
  …
  Iterator it = entries.iterator();
  if (it.hasNext()) {
    String last = (String) it.next();
  }
  …
}
Code Sample 2:
void PrintEntries2(ArrayList<String> entries) {
  …
  if (entries.size() > 0) {
    Iterator it = entries.iterator();
    String last = (String) it.next();
  }
  …
}
java.util.Iterator.next() throws NoSuchElementException when invoked on the iterator of a list without any elements
Example: java.util.Iterator.next()
Code Sample 1 (1218 / 1243 code examples):
void PrintEntries1(ArrayList<String> entries) {
  …
  Iterator it = entries.iterator();
  if (it.hasNext()) {
    String last = (String) it.next();
  }
  …
}
Code Sample 2 (6 / 1243 code examples):
void PrintEntries2(ArrayList<String> entries) {
  …
  if (entries.size() > 0) {
    Iterator it = entries.iterator();
    String last = (String) it.next();
  }
  …
}
• Mined pattern from existing approaches:
• “boolean check on return of Iterator.hasNext before Iterator.next”
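The counts above can be turned into support values to see why a frequency threshold keeps only the first way of guarding Iterator.next(). The sketch below is illustrative: `PatternSupport`, `support`, and `isFrequent` are hypothetical names, and the 0.4 minimum support is an assumed threshold, not a value from the slides.

```java
// Hypothetical illustration of frequency-based pattern selection.
final class PatternSupport {
    // Fraction of code examples that follow a given way of using the API.
    static double support(int occurrences, int total) {
        return (double) occurrences / total;
    }

    static boolean isFrequent(int occurrences, int total, double minSupport) {
        return support(occurrences, total) >= minSupport;
    }

    public static void main(String[] args) {
        int total = 1243;
        double minSupport = 0.4; // assumed threshold, for illustration only
        System.out.printf("hasNext check: %.3f frequent=%b%n",
                support(1218, total), isFrequent(1218, total, minSupport));
        System.out.printf("size() check:  %.3f frequent=%b%n",
                support(6, total), isFrequent(6, total, minSupport));
    }
}
```

Under any such threshold the size() check of Sample 2 is not mined as a pattern, so code written that way is reported as a violation even though it is safe.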