An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows IEEE Transactions on Software Engineering Wes Masri, Andy Podgurski, David Leon Presented by Jason R. Beck and Enrique G. Ortiz
Outline • Introduction and background definitions • Paper Objectives • Filtering Techniques • Profile Types and Tools • Empirical Study Description • Subject Programs • Results • Conclusion • Pros - Cons / Suggestions
Introduction • Information Flow • Important concept in software testing research • Describes complex interactions between different program elements. • Software failures • Often caused by untested information flows. • Why? • Information flows can be complex • Too many to make testing them all feasible.
Introduction • Test Case Filtering • Involves selecting a manageable number of test cases to use. • Software Profiles • Software profiles are recorded interactions during program operation. • Can describe control flow, data flow, input or variable values, object states, event sequences, and timing. • Profiles can be analyzed for how likely they are to reveal errors, and the most likely candidates can be tested further.
Why Filter Test Cases? • Reduce the number of test cases to be executed. • Reduce the number of test executions that need manual interpretation of correct output. • Anything that requires human interpretation of results as part of the test involves much effort. • This effort can be eliminated if test cases are automated and self-validating.
Paper Objectives • Presents the results of an empirical study using many test case filtering techniques. • Evaluates techniques for their ability to reveal defects in programs. • Information flow profiles were created using a tool developed by the authors.
Information Flow Based Testing • Generally graph theory models showing information flow in the software. • Many proposed techniques • Authors focus on … • Information flow between objects. • Data driven • Dynamic program slicing. • Program statement driven (think stack trace when debugging). • Both have static and runtime versions.
Filtering Techniques • Two families of techniques compared • Each driven by execution profiles which indicate execution frequency of program elements. • Coverage-Based Techniques • Distribution-Based Techniques
Coverage-Based Techniques • “Select test cases to maximize the proportion of program elements of a given type” • Attempts to cover as many elements of the program as possible with as few test cases as possible. • Instance of the set-cover problem. • Algorithm • Each iteration selects the test case which covers the largest number of program elements not covered by the previously selected tests.
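The greedy algorithm described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary layout, and the toy coverage data are all assumptions made for the example, and ties are broken simply by dictionary order.

```python
def greedy_coverage_filter(coverage):
    """coverage: dict mapping test name -> set of covered program elements.

    Returns the tests chosen by the greedy set-cover heuristic, in order.
    """
    # Everything that at least one test covers still needs to be covered.
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # Pick the test that covers the most still-uncovered elements.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical coverage data: four tests, five program elements.
coverage = {
    "t1": {"a", "b", "c"},
    "t2": {"c", "d"},
    "t3": {"d", "e"},
    "t4": {"a"},
}
selected = greedy_coverage_filter(coverage)
print(selected)  # ['t1', 't3']
```

Here t1 is picked first because it covers three new elements, and t3 is picked next because it covers both remaining elements, so two of the four tests are enough to match the coverage of the full suite.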
Distribution-Based Techniques • Clustering technique • Test cases are clustered and a test case from each cluster can be selected to represent the group. • Clusters are created by treating execution profiles as patterns with n dimensions. • Each dimension represents the execution count of a basic block of code. • Also uses failure-pursuit sampling • Audits test cases near failures using a k-nearest neighbor approach. • This allows test cases similar to known failures to be checked.
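Failure-pursuit sampling can be sketched roughly as below. Several things here are assumptions for illustration only: profiles are plain tuples of execution counts, distance is Euclidean (the slides do not say which metric the k-nearest-neighbor step uses), the pursuit repeats for every newly found failure, and `fails` is a hypothetical oracle that reports whether a test fails.

```python
import math


def nearest_neighbors(profiles, test, k):
    """Return the k tests whose profiles are closest to the given test's."""
    others = [t for t in profiles if t != test]
    # Assumed metric: Euclidean distance between count vectors.
    others.sort(key=lambda t: math.dist(profiles[test], profiles[t]))
    return others[:k]


def failure_pursuit(profiles, initial_sample, fails, k=5):
    """Audit the initial sample, then the k nearest neighbors of each failure."""
    audited = set(initial_sample)
    pending = [t for t in initial_sample if fails(t)]
    while pending:
        failing_test = pending.pop()
        for neighbor in nearest_neighbors(profiles, failing_test, k):
            if neighbor not in audited:
                audited.add(neighbor)
                if fails(neighbor):
                    # A new failure: its neighbors get audited as well.
                    pending.append(neighbor)
    return audited


# Hypothetical two-dimensional profiles (two basic-block counts per test).
profiles = {"t1": (0, 0), "t2": (1, 0), "t3": (3, 0), "t4": (10, 10), "t5": (11, 10)}
failing = {"t1", "t2"}
audited = failure_pursuit(profiles, ["t1", "t4"], lambda t: t in failing, k=1)
print(sorted(audited))  # ['t1', 't2', 't4']
```

Starting from the sample {t1, t4}, the failure of t1 pulls in its nearest neighbor t2, which also fails. The nearest neighbor of t2 is t1, already audited, so the pursuit stops there.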
Profile Types and Profiling Tools • Profiles characterize test executions by keeping track of execution frequencies of program elements. • The study takes into account eight types of profiles. • Generated using Byte Code Engineering Library to examine the byte code of Java programs. • It also uses an existing tool the authors created for dynamic information flow analysis.
Profile Types • Method Calls (MC) • contains a count of how many times a method M was called. • Method Call Pairs (MCP) • a count of how many times a method M1 called a method M2. • Basic Blocks (BB) • A count of how many times a given basic block of code was executed.
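As a small illustration of the first two profile types, the sketch below builds MC and MCP counts from a recorded call trace. The trace format (a list of caller/callee pairs) and the method names are hypothetical. The study itself instruments Java byte code, which this example does not attempt to reproduce.

```python
from collections import Counter


def method_call_profiles(call_events):
    """call_events: list of (caller, callee) pairs observed during one test run."""
    # MC: how many times each method was called.
    mc = Counter(callee for _, callee in call_events)
    # MCP: how many times each caller invoked each callee.
    mcp = Counter(call_events)
    return mc, mcp


# Hypothetical trace of a single test execution.
trace = [
    ("main", "parse"),
    ("parse", "readToken"),
    ("parse", "readToken"),
    ("main", "emit"),
]
mc, mcp = method_call_profiles(trace)
print(mc["readToken"])              # 2
print(mcp[("parse", "readToken")])  # 2
```

The BB and BBE profiles follow the same pattern, with basic blocks in place of methods and branches in place of calls.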
Profile Types • Basic Block Edges (BBE) • A count of how many times a basic block B1 branches to basic block B2. • Def-use pairs (DUP) • A count of how many times a variable is defined and then later used. • All of the above combined (ALL) • Combination of all the above profile types.
Profile Types • More complex profile types • Information flow pairs (IFP) • Count of how many times a variable x flowed into variable y. • Slice Pairs (SliceP) • Records, for each statement pair s1 and s2, whether s1 occurs before s2 in at least one slice.
Empirical Study • Basic Coverage Maximization • Cluster Filtering (One-per cluster sampling) • Failure-Pursuit Sampling • Simple Random Sampling
Basic Coverage Maximization • Ties • “different tests that each covers the maximal number of program elements not covered by previously selected tests” • Ran 1,000 times per program/profile type • Randomly selected order of the tests • Recorded • Number of tests selected • How many failures and defects detected
Cluster Filtering and Failure Pursuit Sampling • Proportional Binary Metric and Agglomerative Hierarchical Clustering • Number of clusters varied to correspond to a range of percentages of the size of the test suite • Procedure • Clustered into c clusters based on their profiles • One test randomly selected from each cluster • Recorded number of failures and defects • Run 1,000 times • Failure Pursuit: Check 5 nearest neighbors
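The one-per-cluster selection step can be sketched as follows, assuming the clustering has already been done elsewhere. The cluster assignment, the function name, and the seeded random generator are illustrative assumptions. The dissimilarity metric and the agglomerative clustering itself are not shown.

```python
import random
from collections import defaultdict


def one_per_cluster_sample(cluster_of, rng):
    """cluster_of: dict mapping test name -> cluster id (clustering done elsewhere)."""
    clusters = defaultdict(list)
    for test, cluster_id in cluster_of.items():
        clusters[cluster_id].append(test)
    # Draw exactly one test at random from every cluster.
    return [rng.choice(clusters[cid]) for cid in sorted(clusters)]


# Hypothetical assignment of six tests to three clusters.
cluster_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 2, "t5": 2, "t6": 2}
sample = one_per_cluster_sample(cluster_of, random.Random(0))
print(sample)
```

Repeating this with different random seeds gives the replicated runs described on the slide, with the number of failures and defects recorded for each run.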
Simple Random Sampling • Randomly select tests without replacement • Record number of failure-inducing tests and defects • Ran 1,000 times
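The baseline is easy to sketch. The example below uses the javac suite sizes reported in the slides (3,140 tests, 233 failing), but which tests fail is a placeholder, and the sample size of 100 is an arbitrary choice for illustration.

```python
import random


def random_sampling_trials(tests, failing, sample_size, trials, rng):
    """Count failure-inducing tests in each of several random samples."""
    counts = []
    for _ in range(trials):
        # random.Random.sample draws without replacement.
        sample = rng.sample(tests, sample_size)
        counts.append(sum(1 for t in sample if t in failing))
    return counts


tests = list(range(3140))   # suite size from the javac slide
failing = set(range(233))   # placeholder: which tests fail is assumed
counts = random_sampling_trials(tests, failing, 100, 1000, random.Random(0))
print(sum(counts) / len(counts))  # average failures found per sample
```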
javac Java Compiler • 28,639 lines of code • Jacks Test Suite • Tests compliance with the Java Language Specification • 3,140 tests • 233 cause failures
Xerces XML parser • 52,528 lines of code • XML Conformance Test Suite • Used 1,667 tests of 2,000 • Difficult to determine pass/fail of dropped tests • 10 cause failures • Only checks syntax
Tidy HTML Syntax Checker • 1,000 files (tests) from Google Groups • Failed on 47 of the test cases
Analysis • Defects that caused errors were traced • Results: • Average percentage of defects that each technique revealed over a number of replicated applications, viewed as a function of the number of tests selected • Techniques were also compared with respect to how often they revealed specific defects
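The two measures on this slide can be computed as sketched below. The defect identifiers and the per-replication results are made up for illustration and are not data from the study.

```python
from collections import Counter


def average_percent_revealed(revealed_per_run, total_defects):
    """Mean percentage of known defects revealed across replications."""
    percents = [100.0 * len(r) / total_defects for r in revealed_per_run]
    return sum(percents) / len(percents)


def reveal_frequency(revealed_per_run):
    """How many replications revealed each specific defect."""
    return Counter(d for run in revealed_per_run for d in run)


# Hypothetical outcome of four replications with four known defects.
runs = [{"d1", "d2"}, {"d1"}, {"d1", "d3"}, set()]
print(average_percent_revealed(runs, total_defects=4))  # 31.25
print(reveal_frequency(runs)["d1"])                     # 3
```

In this toy data, d4 is never revealed, so its frequency is 0. Comparing such frequencies across techniques shows whether one technique reveals defects that another one misses.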
Results • Several defects revealed in 1,000 replications • Some defects only revealed when SliceP and IFP maximized • “Maximization with one type of profile revealed defects that were not revealed with another type of profile that seems to be more detailed.”
Anomalies • Simpler profile types (i.e. MC, MCP, BB, BBE, and DUP) revealed more defects than IFP • “Information Flow Pairs are recorded only when a variable is actually defined (assigned a value), but some defects may be triggered without executing such an operation.”
Threats to Validity • Programs too broad • Did not debug programs enough • Wrongly classified defects • Assumes size of the final set of tests is an accurate measure of cost
Observations: Cost and Analysis • Time and space increase with the level of profile detail • Collecting profile information takes longer than analyzing it
Conclusions • Coverage maximization, one-per-cluster sampling, and failure-pursuit sampling were more effective than random sampling when the proportion of failures was high • Coverage maximization based on complex profiles revealed the most defects
Conclusions • One-per-cluster sampling and failure pursuit did not clearly perform better than coverage maximization • No clear performance difference between one-per-cluster and failure pursuit sampling
Conclusions • Empirically evaluated test case filtering techniques • Compared them with respect to: • Effectiveness for revealing defects • Simple Random Sampling • Complex profiles such as IFP and SliceP are justifiable when a large number of tests is necessary
Pros and Cons • Pros • Describes a good way to analyze programs. • Uses profiles to help reduce complexity by focusing on only the most meaningful code chunks. • Cons • Programs tested were just compilers and syntax checkers. • Graphs could have better captions explaining what is occurring.
Suggestions • Have only one Test Suite • Several different program types that can be tested with same suite • Eliminates an additional variable • Select several types of programs