Bug Prediction for Fine-Grained Source Code Changes Zi Yuan, Lili Yu, Chao Liu Software Engineering Institute Beihang University, Beijing, China
Outline • Introduction • Bug Prediction • Experiments and Evaluation • Threats to Validity • Conclusions and Future Work
Introduction • A bug residing in software: • Increases the cost of bug-fixing effort • Causes new bugs • Damages the software's reputation
Introduction • Knowing that bugs exist as early as possible is the precondition for removing residing bugs in a timely manner.
Introduction • Software is constructed by a series of changes and each change has a risk of introducing bugs
Introduction • [Figure: a learner classifies each change as clean or buggy]
Introduction • Granularity of change, from coarser to finer: commit-level, file-level, statement-level. S. Kim, E. J. Whitehead Jr., and Y. Zhang, “Classifying Software Changes: Clean or Buggy?”, IEEE Transactions on Software Engineering, vol. 34, 2008. A. Mockus and D. M. Weiss, “Predicting risk of software changes”, Bell Labs Technical Journal, vol. 5, 2000.
Introduction • [Figure: a learner classifies each statement change as clean or buggy]
Introduction • [Figure: statement1 followed across revisions Rev1 to Rev4, with the change that introduces a bug marked]
Bug Prediction • Problem Definition • Fine-Grained Source Code Change • Indication Labeling • Learning Task • Model Definition • Feature Definition • Classifiers
Bug Prediction • Problem Definition • Fine-Grained Source Code Change (SCC): • Type: 48 change types defined by tree edit operations on the AST. • Entity: 104 types of language constructs provided by an object-oriented programming language. • Timestamp: the time at which the 𝑆𝐶𝐶 occurs. B. Fluri, M. Würsch, M. Pinzger, and H. Gall, “Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction”, IEEE Transactions on Software Engineering, vol. 33, 2007.
Bug Prediction • Problem Definition • Bug Introducing 𝑆𝐶𝐶: We consider an 𝑆𝐶𝐶 to be bug-introducing if it leads to at least one bug report and a later fix during the software development process.
[Figure: Class A at Rev 1.1, Rev 1.2, and Rev 1.3. The SCCs shown are a parameter insert (int sum(int a) becomes int sum(int a, int b)), a statement insert, and a condition expression update (obj = null becomes obj != null). Bug Report #132156 leads to a fix, and the earlier SCCs responsible are marked as Bug Introducing SCC.]
Bug Prediction • Problem Definition • Indication Labeling: • Given a set of fine-grained source code changes $C = (SCC_1, SCC_2, \ldots, SCC_n)$, the label of $SCC_i$ $(i = 1, 2, \ldots, n)$ is defined as $l(SCC_i) \in \{clean, buggy\}$, where $clean$ denotes that no bug is introduced by $SCC_i$ and $buggy$ denotes that at least one bug is introduced by $SCC_i$.
Bug Prediction • Problem Definition • Learning task: • Given a set of fine-grained source code change ($SCC$) features $F = (f_1, f_2, \ldots, f_m)$, our goal is to learn a prediction model $BP$ that predicts whether an $SCC$ has bugs or not. Formally, we have: $BP(SCC_i \mid f_1, f_2, \ldots, f_m) \rightarrow l(SCC_i), \quad i = 1, 2, \ldots, n$
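The problem definition can be made concrete with a small sketch. The `SCC` record, its field names, and the `bug_introducing_ids` input are illustrative assumptions rather than the authors' data model; how the bug-introducing set is obtained (tracing bug fixes back through history) is outside this snippet.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Set


@dataclass(frozen=True)
class SCC:
    """One fine-grained source code change (field names are hypothetical)."""
    scc_id: int
    change_type: str     # one of the 48 change types
    entity_type: str     # one of the 104 entity types
    timestamp: datetime  # when the change happened


def label(scc: SCC, bug_introducing_ids: Set[int]) -> str:
    """l(SCC_i): 'buggy' if the change introduced at least one bug, else 'clean'."""
    return "buggy" if scc.scc_id in bug_introducing_ids else "clean"


# Two made-up changes; only the second is assumed to be bug-introducing.
changes = [
    SCC(1, "Parameter Insert", "Method Declaration", datetime(2010, 3, 1, 10, 0)),
    SCC(2, "Condition Expression Change", "IF Statement", datetime(2010, 3, 5, 2, 30)),
]
labels = [label(c, bug_introducing_ids={2}) for c in changes]
```

A prediction model in this setting is then any classifier trained on pairs of (feature vector of an SCC, its label).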
Bug Prediction • Model Definition • Feature definition • Features of an SCC come from four dimensions: where, what, who, and when.
Bug Prediction • Model Definition • Feature definition • Feature groups have a two-level structure: each Feature Group contains Sub-Feature Groups.
Bug Prediction • Model Definition • Feature definition • where: • Definition: the context of a fine-grained change. The characteristics of the source code files touched by an 𝑆𝐶𝐶 reflect its context information. • Rationale: If a source code file is difficult to understand thoroughly, the risk of changing it is high.
where • Sub-feature groups: History Metrics, Halstead Metrics (based on operators and operands such as +, *, /, %, &&, +=, <=, <<, var1, var2, …), Network Metrics, McCabe Metric, Topic Diversity (over topics such as PARSER, XML::READING, DRAWING::SHAPES, …)
Bug Prediction • Topic Diversity: it indicates the domain breadth of a source code file, measured by the entropy of the file's topic distribution: $$Diversity(d) = -\sum_{k=1}^{|T|} p(t_k \mid d) \cdot \log p(t_k \mid d)$$ where $d$ denotes a source code file, $T = (t_1, t_2, \ldots, t_{|T|})$ denotes the set of topics, and $p$ denotes the distribution of topics in a source code file.
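As a minimal sketch of the entropy computation, assuming the topic distribution of a file is already available as a list of probabilities. The natural logarithm is an assumption (the slide does not state the base), and zero-probability topics are treated as contributing nothing.

```python
import math


def topic_diversity(topic_dist):
    """Entropy of a file's topic distribution p(t_k | d).

    Topics with zero probability are skipped, following the usual
    convention that 0 * log(0) counts as 0.
    """
    return sum(-p * math.log(p) for p in topic_dist if p > 0)


focused = topic_diversity([1.0, 0.0, 0.0, 0.0])    # file about a single topic
broad = topic_diversity([0.25, 0.25, 0.25, 0.25])  # file spread evenly over four topics
```

A file concentrated on one topic gets diversity 0, while a file spread evenly over four topics gets log 4, the maximum for four topics.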
Bug Prediction • Model Definition • Feature definition • what • Definition: the actual content of the fine-grained source code change. • Rationale: • Because edit operations on the AST differ in logical complexity, some types of fine-grained source code changes (e.g. condition expression change) and some types of fine-grained entities (e.g. if statement) are more likely to have bugs than others. • Changes related to some semantic topics (e.g. “compiler” or “multithread”) are more likely to have bugs than others.
what • Entity Type (104): IF Statement, Method Declaration, Class Declaration, Return Statement, … • Change Type (48): Additional Functionality, Condition Expression Change, Else-Part Insert, Attribute Type Change, … • Topic Bug Prone Metric: PARSER [0.368] (high), XML:READING [0.026] (low), DRAW:SHAPES [0.123] (medium), …
Bug Prediction • Topic Bug Prone Metric: a metric that describes the bug proneness of the topics touched by an $SCC$. • Firstly, define a metric $ratio$ for each delta file $\delta_j$ $(j = 1, 2, \ldots, N)$: $$ratio(\delta_j) = \frac{dirtyLength(\delta_j)}{Length(\delta_j)}$$ where $dirtyLength(\delta_j)$ denotes the number of words in $\delta_j$ that are located on buggy lines and $Length(\delta_j)$ denotes the total number of words in $\delta_j$.
Bug Prediction • Secondly, extract the topic distribution of each $\delta_j$ $(j = 1, 2, \ldots, N)$ and define the Bug Prone Metric for each topic $t_k$ $(k = 1, 2, \ldots, |T|)$ and each $SCC_i$ $(i = 1, 2, \ldots, n)$: $$bugProne(t_k, SCC_i) = \frac{\sum_{j \in H_{SCC_i}} p(t_k \mid \delta_j) \cdot ratio(\delta_j)}{|H_{SCC_i}|}$$ where $p$ denotes the distribution of topics in the delta file and $H_{SCC_i}$ denotes the set of indexes of delta files that correspond to changes preceding $SCC_i$.
Bug Prediction • Finally, assume $SCC_i$ $(i = 1, 2, \ldots, n)$ touches some words in the source code file, denoted as $W = (w_1, w_2, \ldots, w_p)$, and each word is assigned to a topic. All the topics related to $SCC_i$ are denoted as $T_r = (t_{r_1}, t_{r_2}, \ldots, t_{r_q})$. The Topic Bug Prone Metric of $SCC_i$ is obtained by summing the Bug Prone Metric of each related topic: $$topicBugProne(SCC_i) = \sum_{s=1}^{q} bugProne(t_{r_s}, SCC_i)$$
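The three steps can be sketched as follows. It assumes that the Bug Prone Metric of a topic is the average of p(topic | delta file) times ratio(delta file) over the delta files preceding the SCC. The dictionary layout of a delta file and the handling of empty inputs are illustrative assumptions, not part of the original definition.

```python
def ratio(dirty_length, length):
    """Share of the words in a delta file that sit on buggy lines."""
    return dirty_length / length if length else 0.0  # empty file: assumed 0


def bug_prone(topic, history):
    """Average of p(topic | delta_j) * ratio(delta_j) over preceding delta files."""
    if not history:
        return 0.0  # no preceding changes: assumed 0
    total = sum(
        d["topics"].get(topic, 0.0) * ratio(d["dirty"], d["length"])
        for d in history
    )
    return total / len(history)


def topic_bug_prone(related_topics, history):
    """Sum of the Bug Prone Metric over the topics touched by one SCC."""
    return sum(bug_prone(t, history) for t in related_topics)


# Two hypothetical delta files that precede the SCC under consideration.
history = [
    {"topics": {"PARSER": 0.8, "XML": 0.2}, "dirty": 10, "length": 40},
    {"topics": {"PARSER": 0.5, "XML": 0.5}, "dirty": 0, "length": 30},
]
score = topic_bug_prone(["PARSER", "XML"], history)
```

Only the first delta file contains buggy words here, so the PARSER topic, which dominates that file, contributes most of the score.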
Bug Prediction • Model Definition • Feature definition • who • Definition: the familiarity and experience of the developer with the software system and the touched file. • Rationale: Changes made by developers with little experience may be bug-prone because those developers may not understand the project thoroughly.
Bug Prediction • who • File Experience: the number of times a developer has changed a file • Project Experience: the number of times a developer has changed the project as a whole
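Both experience counts can be derived from a chronological change log. The sketch below assumes the log is a list of (developer, file path) pairs and that only changes made before the current one are counted; both the input shape and that counting convention are assumptions.

```python
from collections import Counter


def experience_features(change_log):
    """Return (file experience, project experience) for each change.

    change_log: iterable of (developer, file_path) pairs, oldest first.
    Each count covers only the developer's earlier changes.
    """
    file_exp = Counter()
    project_exp = Counter()
    features = []
    for developer, path in change_log:
        features.append((file_exp[(developer, path)], project_exp[developer]))
        file_exp[(developer, path)] += 1
        project_exp[developer] += 1
    return features


# Hypothetical log: developer names and file names are made up.
log = [("ann", "A.java"), ("ann", "B.java"), ("bob", "A.java"), ("ann", "A.java")]
feats = experience_features(log)
```

Counting only earlier changes keeps the feature computable at the moment the change is made, which is when a prediction would be needed.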
Bug Prediction • Model Definition • Feature definition • when • Definition: it captures a developer’s habits and work cycle. • Rationale: • Changes made at certain times of day may be more bug-prone (e.g. between midnight and 4 AM in Eyolfson’s work [1]). • Changes made on certain days of the week may be more bug-prone (e.g. changes for Eclipse and Mozilla were found to be buggiest on Fridays in Sliwerski’s work [2]). [1] J. Eyolfson, L. Tan, and P. Lam, “Do time of day and developer experience affect commit bugginess”, in Proc. MSR, 2011. [2] J. Sliwerski, T. Zimmermann, and A. Zeller, “When do changes induce fixes?”, ACM SIGSOFT Software Engineering Notes, vol. 30, 2005.
Bug Prediction • when • Time of Day: 00, 01, 02, …, 24 • Day of Week: Mon., Tues., Wed., Thur., Fri., Sat., Sun.
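Both features can be read directly off the change timestamp. This is a minimal sketch that assumes the timestamp is already expressed in the developer's local time; the function name and the returned keys are hypothetical.

```python
from datetime import datetime

# Day labels as used on the slide, indexed by datetime.weekday() (Monday is 0).
DAYS = ["Mon.", "Tues.", "Wed.", "Thur.", "Fri.", "Sat.", "Sun."]


def when_features(timestamp):
    """Extract the hour of day and the day of week from a change timestamp."""
    return {
        "time_of_day": timestamp.hour,  # integer hour, 0 to 23
        "day_of_week": DAYS[timestamp.weekday()],
    }


feats = when_features(datetime(2011, 5, 20, 2, 15))
```

If repository timestamps are stored in UTC, they would need converting to local time first, otherwise a "midnight to 4 AM" effect would be attributed to the wrong hours.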
Bug Prediction • Model Definition • Classifiers • Ensemble: Random Forest, Bagging • Instance-Based: K-Nearest Neighbor
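To illustrate the instance-based option, here is a minimal K-Nearest Neighbor sketch in plain Python. It is not the implementation used in the study (the slides do not say which toolkit was used), and the three-element feature vectors are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    train: list of (feature_vector, label) pairs with numeric features.
    Distance is plain Euclidean distance on the raw feature values.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical vectors: (topic diversity, file experience, hour of day).
train = [
    ((1.2, 0, 2), "buggy"),
    ((1.1, 1, 3), "buggy"),
    ((0.9, 2, 1), "buggy"),
    ((0.2, 30, 14), "clean"),
    ((0.3, 25, 10), "clean"),
    ((0.1, 40, 16), "clean"),
]
prediction = knn_predict(train, (1.0, 1, 2), k=3)
```

Because the features are on very different scales, a realistic setup would normalise them before computing distances; categorical features such as change type would also need a numeric encoding.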
Experiments and Evaluation • Dataset Description
Experiments and Evaluation • Data Collection • [Figure: data collection pipeline. A history extractor reads versions and log entries from cvs, svn, and git repositories into a db, from which features and labels are computed. where: McCabe Metric, Halstead Metrics, History Metrics, Network Metrics, Topic Diversity. what: Change Type, Entity Type, Topic Bug Prone. who: File Experience, Project Experience. when: Time of Day, Day of Week. label: {Clean, Buggy}. Tools shown: Prest [1], Graph builder [2], mallet [3], Change distiller [4] (AST compare between versions 1.1 and 1.2), sDiff [5], SZZ.]
Experiments and Evaluation • Tools • [1] Prest: http://svn.cmpe.boun.edu.tr/svn/softlab/prest/trunk/Executable/PrestTool.rar • [2] Graph builder: http://code.google.com/p/dependency-graph-builder • [3] mallet: http://mallet.cs.umass.edu/ • [4] Change distiller: http://www.ifi.uzh.ch/seal/research/tools/changeDistiller.html • [5] sDiff: http://code.google.com/p/sdiff
Experiments and Evaluation • Performance analysis
Experiments and Evaluation • Feature group analysis
Experiments and Evaluation • Sub-Feature Group Analysis
Threats to Validity • Construct Validity • First, the accuracy of bug-fix change identification depends on the quality of the projects’ change logs. • Second, there are two problems in tracking bug-introducing 𝑆𝐶𝐶𝑠. One is that the bug-fix and bug-introducing changes can be at different locations. The other is that some bug-fix changes only add several lines of code, which current approaches miss. However, analyzing commit messages to identify bug-fix changes and tracking bug-introducing changes with SZZ are common procedures and reflect the state of the art.
Threats to Validity • External Validity • Only projects following the open source development methodology have been examined in our study. Different development processes used in industry could lead to different bug-introducing 𝑆𝐶𝐶 patterns. However, all projects are independently developed and come from different domains. Moreover, although open source, the two projects from the Eclipse community have an industrial background.
Conclusions and Future Work • Conclusions • Performance: • The model using the full feature combination and the Random Forest classifier achieves the best performance. On average, it predicts bugs with 78% precision, 71% recall, and 75% F-measure.
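For reference, the balanced F-measure is the harmonic mean of precision and recall. The sketch below assumes that this balanced form is the one reported; the slides do not define it.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Applied to the average precision and recall quoted above.
f_of_averages = f_measure(0.78, 0.71)
```

Applied to the averaged precision and recall this gives roughly 0.74. A value slightly different from the reported average F-measure is not necessarily inconsistent: if the F-measure is computed per project and then averaged, it need not equal the F-measure of the averaged precision and recall. Whether that is how the reported figure was obtained is an assumption.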
Conclusions and Future Work • Conclusions • Feature Group: • Among the four feature groups, where has the strongest discriminative power in predicting bugs, when has the weakest, and what and who perform moderately.
Conclusions and Future Work • Conclusions • Sub-Feature Group: • Among the twelve sub-feature groups, Topic Diversity, Halstead, and Topic Bug Prone are the three most powerful prediction factors, followed by Project Experience and McCabe. The two least powerful factors are Time of Day and Change Type.
Conclusions and Future Work • Future Work • Exploring more sophisticated 𝑆𝐶𝐶 labeling algorithms that use control dependency, data dependency, and logical coupling. • Exploring how the influential features vary over time and constructing incremental learning algorithms. • Generating new features for the feature group when that are not restricted to absolute time. Several relative time metrics (e.g. after a large-scale refactoring or during a period of change bursts) will be considered.
Thanks! Contact us • Email: yuanzi@sei.buaa.edu.cn • Address: Software Engineering Institute, Beihang University, XueYuan Road No. 37, HaiDian District, Beijing, China • Tel: +861082317641 • Fax: +861082317641