Improving Cluster Selection Techniques of Regression Testing by Slice Filtering Yongwei Duan, Zhenyu Chen, Zhihong Zhao, Ju Qian and Zhongjun Yang Software Institute, Nanjing University, Nanjing, China http://software.nju.edu.cn/zychen
Outline • Introduction • Our Approach • Experiment and Evaluation • Future Work
Introduction • Test selection techniques • Cluster selection techniques • Problems
Test selection techniques • Rerunning all existing test cases is costly in regression testing • Test selection techniques choose a subset of the test cases to rerun
Cluster Selection • Run test cases → collect execution profiles (basic-block level) → clustering → clusters of test cases → sampling → a reduced test suite (Figure: Cluster selection overview)
Problems • Too much data to cluster • Huge volume of execution traces • Profiles are always high-dimensional • Solution: focus only on the code fragments that are actually relevant to the program modification!
Our approach • Overview • Slice filtering • Clustering analysis • Sampling
Our approach • Overview: run test cases → execution traces → trace filtering → filtered traces → cluster analysis → clusters → sampling → reduced test suite
Slice filtering • Raw execution traces are too detailed to be used directly in clustering analysis • We use a program slice to filter out trace fragments that are irrelevant to the program modification.
Slice filtering cont’d • Statement 2 is changed from ‘if(m<n)’ to ‘if(m<=n)’. • We compute a program slice with respect to statement 2 and intersect it with each execution trace. • Given 3 test cases, we compare their execution traces with their filtered execution traces.
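The intersection step can be sketched as follows. This is a minimal illustration, assuming a trace is an ordered list of executed statement numbers and the static slice is a set of statement numbers; the helper name `filter_trace`, the slice, and the three traces are hypothetical and do not reproduce the slide's example program. Execution order is preserved so that the later condensing step still works.

```python
def filter_trace(trace, slice_stmts):
    """Keep only the executed statements that belong to the program slice.

    trace: ordered list of executed statement numbers (repeats allowed).
    slice_stmts: set of statement numbers in the static slice computed
    with respect to the modified statement.
    """
    return [s for s in trace if s in slice_stmts]


# Hypothetical slice with respect to the modified statement 2.
slice_wrt_stmt2 = {1, 2, 3, 5}

# Hypothetical execution traces of three test cases.
traces = {
    "t1": [1, 2, 3, 4, 2, 3, 4, 2, 6],
    "t2": [1, 2, 5, 6, 7, 8],
    "t3": [1, 2, 5, 7, 9],
}

filtered = {t: filter_trace(tr, slice_wrt_stmt2) for t, tr in traces.items()}
for t, ft in filtered.items():
    print(t, ft)
```

In this toy input the traces of t2 and t3 differ before filtering but become identical after it, while t1 stays distinct.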
Slice filtering cont’d • Execution traces are much smaller after program slice filtering. • After filtering, the traces of t2 and t3 become identical, while the difference between t1 and t2 is magnified. • To condense the traces further, adjacent statements within a basic block are combined into one entry. • Patterns are easier to reveal in simpler execution traces.
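A possible way to combine adjacent statements of the same basic block is sketched below. It assumes a hypothetical mapping `block_of` from statement number to basic-block id, and it assumes statements inside a block are numbered in execution order, so that a fresh execution of the same block (for example, a loop back to it) is recognized when the statement number stops increasing.

```python
def condense(trace, block_of):
    """Collapse runs of adjacent statements from the same basic block.

    trace: filtered trace (ordered statement numbers).
    block_of: dict mapping statement number -> basic block id.
    Assumption: statements inside a block are numbered in execution order,
    so a non-increasing statement number marks a new execution of a block.
    """
    condensed = []
    prev_stmt = None
    for s in trace:
        continues_run = (
            prev_stmt is not None
            and block_of[s] == block_of[prev_stmt]
            and s > prev_stmt
        )
        if not continues_run:
            # Start of a new block execution: record the block once.
            condensed.append(block_of[s])
        prev_stmt = s
    return condensed


# Hypothetical block layout.
block_of = {10: "B1", 11: "B1", 12: "B1", 20: "B2", 21: "B2"}
print(condense([10, 11, 12, 20, 21, 10, 11, 12], block_of))
print(condense([20, 21, 20, 21], block_of))
```

The second call shows why the ordering check matters: two consecutive executions of B2 stay as two entries instead of being merged into one.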
Slice filtering cont’d • But the number of test cases is still large. • If a trace is too small (below a threshold) after intersection with the program slice, the test case is unlikely to be fault-revealing, so we remove it from the test suite.
Slice filtering cont’d • Filtering rate • We define the filtering rate FR as follows: if the threshold is M and the size of the program slice is N, then FR = M / N * 100%. • The lower FR is, the weaker the filtering effect, i.e., fewer test cases are filtered out.
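The threshold rule and the filtering rate fit together as sketched below, with M = FR * N. Two points are assumptions, not taken from the slides: the size of a filtered trace is measured as the number of distinct slice statements it covers (so it never exceeds N), and the helper `slice_filter` and its inputs are hypothetical.

```python
from fractions import Fraction


def slice_filter(filtered_traces, slice_size, fr):
    """Drop test cases whose filtered trace is smaller than M = FR * N.

    filtered_traces: dict test id -> filtered trace (statement numbers).
    slice_size: N, the number of statements in the program slice.
    fr: filtering rate as a fraction, e.g. 0.3 for 30%.
    Assumption: trace size = number of distinct slice statements covered.
    """
    # Fraction keeps M exact, avoiding float round-off in FR * N.
    threshold = Fraction(str(fr)) * slice_size
    return {
        t: tr
        for t, tr in filtered_traces.items()
        if len(set(tr)) >= threshold
    }


filtered = {
    "t1": [1, 2, 3, 2, 3, 2],
    "t2": [1, 2, 5],
    "t4": [1],
}
print(sorted(slice_filter(filtered, slice_size=10, fr=0.3)))  # ['t1', 't2']
print(sorted(slice_filter(filtered, slice_size=10, fr=0.5)))  # []
```

With N = 10, FR = 0.3 gives M = 3 and removes only t4, while FR = 0.5 gives M = 5 and removes every test case in this toy input, matching the slide's point that a lower FR filters less.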
Slice filtering cont’d • Why not just use dynamic slicing? • Computing dynamic slices is complex and time-consuming • Effective dynamic slicing tools are hard to come by
Clustering analysis • Distance measure • For a filtered trace f_i = <a_i1, a_i2, …, a_in>, where a_ij is the execution count of basic block j, the distance between two filtered traces f_i and f_j is:
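The slide's distance formula did not survive extraction. Purely for illustration, the sketch below builds the execution-count vectors f_i from condensed traces and compares them with Euclidean distance; Euclidean distance is an assumed stand-in, not necessarily the metric used in this work, and `count_vector`, `euclidean`, and the block ids are hypothetical.

```python
import math
from collections import Counter


def count_vector(condensed_trace, blocks):
    """Build f_i = <a_i1, ..., a_in>: execution count of each basic block.

    condensed_trace: list of basic block ids, one entry per execution.
    blocks: fixed ordering of the n basic blocks considered.
    """
    counts = Counter(condensed_trace)
    return [counts[b] for b in blocks]


def euclidean(fi, fj):
    """Illustrative distance only: Euclidean distance is an assumption."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fi, fj)))


blocks = ["B1", "B2", "B3"]
f1 = count_vector(["B1", "B2", "B1", "B2", "B1"], blocks)  # [3, 2, 0]
f2 = count_vector(["B1", "B3"], blocks)                    # [1, 0, 1]
print(euclidean(f1, f2))  # 3.0
```

Because every vector is indexed by the same block ordering, any other vector distance could be substituted for `euclidean` without changing `count_vector`.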
Sampling • We use adaptive sampling in our approach • We first sample a certain number of test cases from each cluster. If a sampled test case is fault-revealing, the entire cluster from which it was sampled is selected. This strategy favors small clusters and has a high probability of selecting fault-revealing test cases.
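Adaptive sampling over the clusters might look like the sketch below. The function name, the `per_cluster` parameter, and the fixed random seed are hypothetical. One further assumption: the tests sampled from clusters that turn out clean are also kept in the reduced suite, since they were already run; the slides do not say whether they count as selected.

```python
import random


def adaptive_sample(clusters, is_fault_revealing, per_cluster=1, seed=0):
    """Adaptive sampling over clusters of test cases.

    clusters: list of lists of test ids.
    is_fault_revealing: callable that runs a test and returns True on failure.
    per_cluster: number of tests sampled first from each cluster.
    Returns the reduced test suite as a list of test ids.
    """
    rng = random.Random(seed)
    selected = []
    for cluster in clusters:
        sample = rng.sample(cluster, min(per_cluster, len(cluster)))
        if any(is_fault_revealing(t) for t in sample):
            # A sampled test failed: select the whole cluster.
            selected.extend(cluster)
        else:
            # Assumption: keep the already-run samples of clean clusters.
            selected.extend(sample)
    return selected


clusters = [["t1"], ["t2", "t3", "t4"], ["t5", "t6"]]
failing = {"t1", "t5", "t6"}  # hypothetical test outcomes
suite = adaptive_sample(clusters, lambda t: t in failing)
print(sorted(suite))
```

With one sample per cluster, the singleton cluster containing t1 is always found, which illustrates why the strategy favors small, well-isolated clusters of fault-revealing tests.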
Experiment & Evaluation • Subject program • space, from SIR (Software-artifact Infrastructure Repository) • 5902 LOC • 1533 basic blocks • 38 modified versions (each version contains one real fault) • 13585 test cases
Experiment & Evaluation • Subject program • Measurements • Experimental results • Observations
Experiment & Evaluation • 3 measurements • Precision • Reduction • Recall
Experiment & Evaluation • Precision • If, in a given run, the technique selects a subset of N test cases, of which M are fault-revealing, the precision of the technique is M / N * 100%. • Precision measures the extent to which a selection method omits non-fault-revealing test cases in a run.
Experiment & Evaluation • Reduction • If, in a given run, a selection technique selects M test cases out of all N existing test cases, the reduction of the technique is M / N * 100%. • Reduction measures the extent to which a technique can reduce the size of the original test suite. • A low reduction means that a selection technique greatly reduces the original test suite.
Experiment & Evaluation • Recall • If, in a given run, a selection technique selects M fault-revealing test cases out of N existing fault-revealing test cases, the recall of the technique is M / N * 100%. • Recall measures the extent to which a selection technique includes fault-revealing test cases. • Recall indicates the fault-detection capability of a technique. A safe selection technique achieves 100% recall.
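The three measurements can be computed together from a single selection run, as in the minimal sketch below. The function name and the toy suite are hypothetical; the sketch assumes the selection and the set of fault-revealing tests are both non-empty, since precision and recall are undefined otherwise.

```python
def evaluate(selected, all_tests, fault_revealing):
    """Precision, reduction and recall (in percent) of one selection run.

    selected: test ids chosen by the technique (assumed non-empty).
    all_tests: the original test suite.
    fault_revealing: ids of the tests that reveal the fault (non-empty).
    """
    selected = set(selected)
    hits = selected & set(fault_revealing)
    precision = 100.0 * len(hits) / len(selected)
    reduction = 100.0 * len(selected) / len(all_tests)  # lower is better
    recall = 100.0 * len(hits) / len(fault_revealing)
    return precision, reduction, recall


all_tests = [f"t{i}" for i in range(100)]
fault_revealing = {"t0", "t1", "t2", "t3"}
selected = {"t0", "t1", "t2", "t50", "t51"}
print(evaluate(selected, all_tests, fault_revealing))  # (60.0, 5.0, 75.0)
```

Here 3 of the 5 selected tests reveal the fault (precision 60%), 5 of 100 tests are rerun (reduction 5%), and 3 of the 4 fault-revealing tests are caught (recall 75%), so the selection is not safe.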
Experiment & Evaluation • Experimental results • A comparison between our approach and Dejavu, which is known for its high precision in test selection. • A comparison between two filtering rates: FR = 0.3 and FR = 0.5
Experiment & Evaluation Comparison of precision between our approach (FR = 0.3) and Dejavu
Experiment & Evaluation Comparison of reduction between our approach (FR = 0.3) and Dejavu
Experiment & Evaluation Our approach achieves some improvement in recall over Dejavu, except on versions 13, 25, 26, 35, 37, and 38. Comparison of recall between our approach (FR = 0.3) and Dejavu
Experiment & Evaluation • Analysis • The key to our approach is isolating the fault-revealing test cases in small clusters. • The failures detected on versions 13, 25, 26, 35, 37, and 38 are mostly memory access violations, which cause premature termination of the execution flow. • Program slicing cannot predict runtime changes in execution flow, so it cannot provide enough information to differentiate these test cases and separate them into different clusters.
Experiment & Evaluation Comparison of precision between FR = 0.3 and FR = 0.5
Experiment & Evaluation Comparison of reduction between FR = 0.3 and FR = 0.5
Experiment & Evaluation Raising FR to 0.5 yields some improvement in precision, reduction, and recall. Comparison of recall between FR = 0.3 and FR = 0.5
Experiment & Evaluation • Observations • For most versions, our approach achieves higher precision and lower reduction (lower is better) than Dejavu. That is, it selects fault-revealing test cases from the original test suite while selecting relatively few non-fault-revealing test cases.
Experiment & Evaluation • Observations • The effectiveness of our approach depends largely on how well the fault-revealing test cases are isolated. Choosing appropriate parameters, such as the filtering rate, sampling rate, and initial number of clusters, can improve this isolation.
Future work • We will try to answer the following questions in future work: • How do distance metrics and clustering algorithms affect the results of cluster selection techniques? • Given a program, how can we find the best filtering rate and other parameters?