Fuzzy Set and Cache-based Approach for Bug Triaging Ahmed Y. Tamrawi Electrical and Computer Engineering Department Iowa State University 2011
{ Introduction }
Software Bugs
• Bugs can occur in any software, ranging from operating systems and flight auto-pilot software to a simple arithmetic program.
• Software bugs cost about 60 billion US$ per year.
The term “Bug” (September 9, 1947)
Definition (Software Bug): A common term used to describe a flaw, mistake, or failure in a computer system that produces an incorrect or unexpected result, or causes it to behave in unintended ways.
More Bugs
Bug Repository
• Software users and developers report bugs so that software developers can fix them.
• Bugs are reported through bug reports, which are added to an issue tracking system or bug repository.
[Figure: bugs are reported through an interface to the Bugs Repository, where they are stored.]
Bug Triaging
Definition (Bug Triaging): Assigning a bug to the most appropriate/capable developer who will fix it.
• Manual bug triaging is a difficult, expensive, and lengthy process, because the bug triager must manually read and analyze each newly reported bug and assign a fixer to it.
Bug Triaging
[Figure: bug assignment. Labels: New Bug Reports, Bugs Repository, Bug Triager, Software Developers.]
Bug Triaging
• Bug triager challenges:
  • Knowledge about the system/project;
  • Descriptiveness of the bug report;
  • Rate of reporting bugs;
  • Many developers, different projects, and various expertise!
• Why not automate the bug triaging process?
  • Improve software quality;
  • Reduce cost and time.
[Figure: Eclipse, Feb 2011]
{ Motivation }
Example
Assigned to: James Moody
Summary: New Repository wizard follows implementation model, not user model.
Description: The new CVS Repository Connection wizard's layout is confusing. This is because it follows the implementation model of the order of fields in the full CVS location path rather than the user model...
Assigned to: James Moody
Summary: Opening repository resources doesn't honor type.
Description: Opening repository resource always open the default text editor and doesn't honor any mapping between resource types and editors. As a result it is not possible to view the contents of an image (*.gif file) in a sensible way....
Technical Aspect: Version Control Management (VCM)
This aspect is concerned with the various Concurrent Versions System (CVS) repository features and operations within the Eclipse project.
Technical Aspects & Terms
• A software system has many technical aspects.
• Technical aspects are described via the technical terms extracted from software artifacts.
• A bug report describes issues related to technical aspects via its terms.
Automatic Bug Triaging
Key Philosophy for Automatic Bug Triaging: The developers who have the most bug-fixing capability/expertise with respect to the technical aspect(s) reported in a given bug report should be its fixer(s).
{ Bugzie Model }
Problem Definition
Problem (Automatic Bug Assignment): In a software system, given a bug report B and a set of developers D who have past fixing activity, find the developer(s) with the most fixing expertise with respect to the technical aspect(s) reported in B.
[Figure labels: New Bug Report B, Bugs Repository, Software Developers.]
Bugzie Overview
• Bugzie considers the problem as a ranking problem.
• State-of-the-art approaches view the problem as a classification problem.
• For a bug report, Bugzie determines a ranked list of the developers most capable of handling the reported issue(s).
Bugzie Overview
• Bugzie uses fuzzy set theory to rank the fixing expertise of developers toward the technical aspects.
• Bugzie models the association between a developer and technical aspects.
• A developer with a higher fixing association with a technical aspect has higher expertise and a higher rank for that aspect.
Association of Fixer & Term
Definition (Capable Fixer toward a Term): For a technical term t, a fuzzy set $C_t$, with associated membership function $\mu_t$, represents the set of developers who have the bug-fixing expertise relevant to the technical aspect(s) described by t.
• A developer with a higher membership score in $C_t$ is more capable in the issues related to t than a developer with a lower score.
Association of Fixer & Term
• The membership score of a developer d toward a term t is:
  $\mu_t(d) = \dfrac{|D_d \cap D_t|}{|D_d \cup D_t|}$
• $D_d$: Bug reports d has fixed.
• $D_t$: Bug reports containing t.
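As a minimal sketch of the membership score, the snippet below measures the overlap between the set of reports a developer has fixed and the set of reports containing a term. It assumes the score is the ratio of shared reports to reports in either set; the function name `membership_score` and the report ids are hypothetical and used only for illustration.

```python
def membership_score(fixed_by_dev, containing_term):
    """Overlap between D_d (reports fixed by d) and D_t (reports containing t).

    Assumption: the score is |D_d & D_t| / |D_d | D_t|, a value in [0, 1].
    """
    union = fixed_by_dev | containing_term
    if not union:
        return 0.0  # no evidence at all for this developer/term pair
    return len(fixed_by_dev & containing_term) / len(union)


# Hypothetical bug-report ids, for illustration only.
d_d = {"b1", "b2", "b3", "b4"}  # reports fixed by developer d
d_t = {"b3", "b4", "b5"}        # reports containing term t

score = membership_score(d_d, d_t)  # 2 shared reports out of 5 in the union
```

A score near 1 means that almost every report mentioning the term was fixed by this developer and that the developer fixes little else; a score of 0 means the two sets never overlap.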
Association of Fixer & Bug Report
[Figure: Bug Report (B) with terms t1, t2, ..., tn, combined into the fuzzy set CB.]
Association of Fixer & Bug Report
• In fuzzy set theory, union is a flexible combination.
• Strong membership in one or more of the sub-fuzzy sets implies strong membership in the combined fuzzy set.
• After calculating the membership scores of the developers toward the bug report, Bugzie recommends the top-scored ones as fixers for the bug report.
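The combination step can be sketched as follows. This is an illustration under a stated assumption: the fuzzy union is taken as the algebraic sum, 1 − ∏(1 − μ), so that one strong per-term score is enough to produce a strong combined score. The names `report_score` and `rank_developers`, and the per-term scores, are hypothetical.

```python
from math import prod


def report_score(term_scores):
    """Combine per-term membership scores into one score for a bug report.

    Assumption: the union is the algebraic sum 1 - prod(1 - score).
    """
    return 1.0 - prod(1.0 - s for s in term_scores)


def rank_developers(term_scores_by_dev, top_n):
    """Return the top_n developers, highest combined score first."""
    ranked = sorted(
        term_scores_by_dev,
        key=lambda dev: report_score(term_scores_by_dev[dev]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical per-term scores of three developers for a two-term report.
scores = {
    "alice": [0.5, 0.5],  # combined: 1 - 0.5 * 0.5 = 0.75
    "bob": [0.9, 0.0],    # combined: 1 - 0.1 * 1.0 = 0.90
    "carol": [0.1, 0.1],  # combined: 1 - 0.9 * 0.9 = 0.19
}
top2 = rank_developers(scores, 2)
```

Here `bob` ranks first even though one of his per-term scores is zero, which reflects the point that strong membership in a single sub-fuzzy set carries over to the combined set.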
Bugzie Model
[Figure: Bugzie workflow. Labels: Bugs Repository, Initial Training, Bug Report (B), Pre-processing (terms t1, t2, ..., tn), Recommendation, Recommendation List (in descending order of score), Updating.]
Bugzie Caching
• Fixer candidates selection (Developers Caching).
• Significant terms selection (Terms Caching).
[Figure labels: Bugs Repository, Initial Training, Developers Cache F(x), Terms Cache T(k).]
Data Collection
• We collected all fixed bug reports from 7 bug repositories.
• For each bug report, we extracted and merged the summary and description.
• For each system, we pre-processed these reports: stemming, stop-word removal, etc.
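The pre-processing steps above can be sketched as below. This is a simplified illustration, not the pipeline used in the study: the stop-word list is a small hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "it", "not"}


def naive_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary, description):
    """Merge summary and description, then return the set of stemmed terms."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return {naive_stem(tok) for tok in tokens if tok not in STOP_WORDS}


terms = preprocess(
    "Opening repository resources",
    "The editor is not opening images",
)
```

Returning a set rather than a list fits a model in which only the presence of a term in a report matters, not how often it occurs.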
Locality of Fixing Activity
[Figure: bug reports on a timeline from 2006 to 2010.]
Locality of Fixing Activity
Hypothesis (Locality of Fixing Activity): Developers who have fixed bugs recently are likely to fix bug reports in the near future.
• For a bug report B fixed by developer d, if d belongs to the developers cache F(x), we count this as a hit.
[Figure: fixing timeline, 2006 to 2010. Of all developers who have been fixing before B, the most recent x% form the Developers Cache F(x).]
Locality of Fixing Activity
[Figure: locality of fixing activity; annotated ranges: 96% - 99% and 94% - 98%.]
Selection of Fixer Candidates
• The locality of fixing activity suggests that the actual fixer of a given bug report is likely to be a developer with recent fixing activity.
• For each bug report, Bugzie chooses the top x% of developers, sorted by their fixing time (most recent first), as the fixer candidates F(x).
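A minimal sketch of the candidate selection, assuming each developer is summarized by the time of their latest fix and that the cache size is rounded up to at least one developer. The function name `developers_cache`, the rounding rule, and the sample data are assumptions made for illustration.

```python
import math


def developers_cache(last_fix_time, x_percent):
    """Return the x% of developers with the most recent fixing activity.

    last_fix_time maps a developer to the time of their latest fix;
    any sortable value (year, timestamp) works.
    """
    by_recency = sorted(last_fix_time, key=last_fix_time.get, reverse=True)
    size = max(1, math.ceil(len(by_recency) * x_percent / 100))
    return by_recency[:size]


# Hypothetical developers and the year of their latest fix.
last_fix = {"alice": 2010, "bob": 2006, "carol": 2009, "dave": 2007}

cache = developers_cache(last_fix, 50)  # the two most recent fixers
is_hit = "carol" in cache               # a hit if the actual fixer is cached
```

Counting how often `is_hit` is true over all bug reports gives the kind of hit rate used to check the locality hypothesis.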
Developers Caching
[Figure: Bugzie workflow with the Developers Cache F(x). Labels: Bugs Repository, Initial Training, Developers Cache F(x), Bug Report (B), Pre-processing (terms t1, t2, ..., tn), Recommendation, Recommendation List (in descending order of score), Updating.]
Selection of Descriptive Terms
Recall: For a developer d and a term t, the higher their association score, the more significant t is in describing the technical aspects in which d has fixing expertise.
Selection of Descriptive Terms
[Figure: all terms sorted in descending order of association score.]
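Selecting the most descriptive terms for one developer can be sketched as a top-k cut over that developer's association scores. The function name `terms_cache` and the scores below are hypothetical.

```python
def terms_cache(term_scores, k):
    """Return the k terms with the highest association score for a developer."""
    ranked = sorted(term_scores.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Hypothetical association scores of one developer.
scores_for_dev = {"cvs": 0.42, "repository": 0.35, "wizard": 0.12, "error": 0.03}

top_terms = terms_cache(scores_for_dev, 2)
```

Terms outside the cache are ignored when scoring that developer, which both reduces noise from generic words and shrinks the amount of work per recommendation.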
Terms Caching
[Figure: Bugzie workflow with the Terms Cache T(k). Labels: Bugs Repository, Initial Training, Terms Cache T(k), Bug Report (B), Pre-processing (terms t1, t2, ..., tn), Recommendation, Recommendation List (in descending order of score), Updating.]
{ Empirical Evaluation }
Empirical Evaluation
• We evaluated Bugzie on our collected datasets.
• Experiments:
  • Selection of fixer candidates;
  • Selection of terms;
  • Selection of developers and terms;
  • Comparison with state-of-the-art approaches.
Experiment Setup
[Figure: creation timeline divided into frames 0 to 10.]
1. Bugzie uses frame 0 for initial training.
2. Using the training data, Bugzie recommends the top-n developers to fix bug report B (a recommendation list for B in descending order of score).
3. Bugzie updates the training data with the tested bug report B, then moves to the next bug report.
• Bugzie repeats steps 2 and 3 until it has consumed all bug reports.
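The train, recommend, update loop can be sketched as below. This is an illustrative stand-in rather than the Bugzie implementation: the class `IncrementalRanker` and its counters are hypothetical, and the scoring assumes an overlap-ratio membership score per term combined by an algebraic-sum union.

```python
from collections import defaultdict
from math import prod


class IncrementalRanker:
    """Hypothetical count-based ranker that can be updated one report at a time."""

    def __init__(self):
        self.n_dev = defaultdict(int)   # reports fixed per developer
        self.n_term = defaultdict(int)  # reports containing each term
        self.n_both = defaultdict(int)  # reports per (developer, term) pair

    def update(self, terms, fixer):
        self.n_dev[fixer] += 1
        for term in terms:
            self.n_term[term] += 1
            self.n_both[(fixer, term)] += 1

    def score(self, dev, terms):
        parts = []
        for term in terms:
            shared = self.n_both.get((dev, term), 0)
            union = self.n_dev[dev] + self.n_term.get(term, 0) - shared
            parts.append(shared / union if union else 0.0)
        return 1.0 - prod(1.0 - p for p in parts)

    def recommend(self, terms, top_n):
        ranked = sorted(self.n_dev, key=lambda d: self.score(d, terms), reverse=True)
        return ranked[:top_n]


def run_experiment(frames, top_n):
    """Train on frame 0, then test and update on every later report in order."""
    model = IncrementalRanker()
    for terms, fixer in frames[0]:
        model.update(terms, fixer)
    hits = total = 0
    for frame in frames[1:]:
        for terms, fixer in frame:
            hits += fixer in model.recommend(terms, top_n)  # step 2
            total += 1
            model.update(terms, fixer)                      # step 3
    return hits / total if total else 0.0


# Hypothetical frames of (term set, actual fixer) pairs in creation order.
frames = [
    [({"cvs", "wizard"}, "alice"), ({"editor", "image"}, "bob")],
    [({"cvs", "repository"}, "alice"), ({"editor", "font"}, "bob")],
]
accuracy = run_experiment(frames, top_n=1)
```

Because each tested report is added to the model only after its recommendation is made, no report is ever used to predict its own fixer.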
Prediction Accuracy
• If the recommendation list for a bug report contains its actual fixer, we count this as a hit (i.e., a correct recommendation).
• For each frame under test, we calculated the Prediction Accuracy (PA).
• For example, if we have 100 bugs and the actual fixing developer appears in our Top-2 list for 60 of them, then the Top-2 prediction accuracy is 60%.
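The Top-n prediction accuracy described above reduces to a hit count divided by the number of tested bugs. The sketch below reproduces the 60-out-of-100 example; the function name `prediction_accuracy` and the developer names are hypothetical.

```python
def prediction_accuracy(recommendations, actual_fixers):
    """Percentage of bugs whose actual fixer appears in the recommendation list."""
    if not actual_fixers:
        return 0.0
    hits = sum(
        1 for recs, fixer in zip(recommendations, actual_fixers) if fixer in recs
    )
    return 100.0 * hits / len(actual_fixers)


# 100 hypothetical bugs, each with the same Top-2 list; the actual fixer
# is in the list for the first 60 bugs and absent for the remaining 40.
top2_lists = [["alice", "bob"]] * 100
actual = ["alice"] * 60 + ["carol"] * 40

pa = prediction_accuracy(top2_lists, actual)
```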
Selection of Fixer Candidates
[Figure: Bugzie workflow with the Developers Cache F(x), where F(x) holds the most recent x% of all developers who have been fixing before bug report B.]
Selection of Fixer Candidates
[Figure: two panels, Top-1 Prediction Accuracy and Top-5 Prediction Accuracy.]
• Firefox: at x = 10%, PA = 72.4%; at x = 100%, PA = 70.7%.
Selection of Fixer Candidates
• Selecting a suitable portion of recent fixers does not reduce accuracy much, and sometimes improves it, as in the cases of Firefox, Eclipse, etc.
• Selecting only a portion of the available developers as candidates also improves time efficiency.
Selection of Terms
[Figure: Bugzie workflow with the Terms Cache T(k).]
Selection of Terms
[Figure: two panels, Top-1 Prediction Accuracy and Top-5 Prediction Accuracy, each with a marked peak range.]
• Eclipse: at k = 16, PA = 80%; at k = All Terms, PA = 72%.
Selection of Terms
• Selecting terms can substantially improve prediction accuracy.
• The results suggest that a small yet significant set of terms per developer is enough to describe that developer's bug-fixing expertise.
Selection of Developers & Terms
• We studied the combined impact of developer selection (x) and term selection (k).
[Figure: two panels, Eclipse and Firefox.]
Selection of Developers & Terms
• Base: Base model with all developers and all terms
• C.S.: Candidate Selection
• T.S.: Terms Selection
• Both: The best PA when applying both C.S. and T.S.
Comparison
• We compared Bugzie's results with state-of-the-art approaches.
• We used Weka to re-implement those approaches.
Comparison
• Some of the approaches (C4.5 - Decision Trees) cannot scale up well to our dataset.
• We prepared a smaller dataset: 3-year histories of the full dataset.
Comparison Results
[Table: comparison results. Time units: (d) days, (h) hours, (m) minutes, (s) seconds.]
{ Conclusions }
Conclusions
• Bugzie achieves higher accuracy and efficiency than state-of-the-art approaches.
• Bugzie can accommodate the locality of fixing activity and software evolution with flexible caching of developers and terms.
Thesis Contributions
• Bugzie, a scalable, fuzzy set and cache-based automatic bug triaging approach, which is significantly more efficient and accurate than existing state-of-the-art approaches.
• The finding of the locality of fixing activity.
• A comprehensive evaluation of the efficiency and correctness of Bugzie in comparison with state-of-the-art approaches.
• An observation/method to capture a small and significant set of terms describing developers’ bug-fixing expertise.
Future Work
• Use different caching mechanisms for developers and terms.
• Explore the usage of other textual and non-textual contents of bug reports for bug triaging.
• Use other software artifacts to accurately measure the developer’s expertise.
Thank You!