Open Source Java Bug Study: Understanding where help is needed
Tim Halloran
SSSG, 6 Nov 2003
Carnegie Mellon University
Motivation: Why study open source Java bugs?
• Technology: Chains of evidence (CoE)
  • Extra-linguistic program assurance (Lock, Uniqueness)
  • Bureaucratic (mechanical)
• Question: Can this have a positive impact on practice?
[Figure: Fluid Assurance Tool; "What can be assured" overlaps "Where is help needed", overlap marked "+impact"]
• Goal: determine, empirically, how useful CoE is (how common and “costly” are the defects CoE could help prevent)
• Study defect reports on, and code changes made (fixes) to, widely deployed open source Java projects
This talk
• Methodology
• Data collection
  • Selected Java projects
  • Tool data and limitations (and solutions)
• Variable creation
• Variable reduction
• Summary
• Questions & discussion
Methodology
(1) Bug selection (focus today)
• Data collection
• Variable creation
• Variable reduction / exploratory data analysis
• Sampling
(2) Expert analysis
• Develop definitions (bureaucratic)
• Inter-rater reliability?
• Expert judgment, for example: Is the bug/fix bureaucratic? Could CoE have helped? Semantic category? Do we understand the bug/fix?
• Results analysis
Data collection: 3 Java projects investigated
• Ant (64 kSLOC)
  • A Java-based build tool
• Struts (40 kSLOC)
  • Framework for building Java web applications based on a variation of the classic MVC design paradigm
• Tomcat (65 kSLOC Java)
  • The official reference implementation of the Java Servlet and JavaServer Pages technologies (web server)
Selection: widely used Java software (external validity?)
Data collection: Tool data used
• Software defect (“bug”) data
  • Off-line copy of the Apache Software Foundation (ASF) Bugzilla MySQL database
  • Ant: 2,230 bugs (7-Sep-00 to 16-May-03)
  • Struts: 1,473 bugs (19-Oct-00 to 16-May-03)
  • Tomcat: 4,052 bugs (26-Aug-00 to 16-May-03)
• Code changes
  • CVS commit logs
  • Ant: 9,565 commits (13-Jan-00 to 4-Jun-03)
  • Struts: 3,610 commits (31-May-00 to 4-Jun-03)
  • Tomcat: 14,833 commits (10-Oct-99 to 4-Jun-03)
Data collection: Limitations of ASF tool data
Goal: link the code changes made for each bug to that bug, adding code-change information to the bug information
[Diagram: Bugzilla bugs linked to CVS commit logs]
• Problems
  • No link from bugs to commits
  • Informal links from commits to bugs
  • Informal identity management
Data examples
Problem: Informal links from commits to bugs
------------------ CVS commit log 1272 at 2001-02-01 15:37:28 by Nico Seessle ------------------
Fixed Bug #378. ExecuteOn (and Apply) have a default-value of false for their parallel-attribute.
Problem: Informal identity management
Commit email | Real name | Bugzilla Id
craigmcc@locus.apache.org, craigmcc@hyperreal.org, craigmcc@apache.org, craigmcc@daedelus.apache.org | Craig R. McClanahan | craig.mcclanahan@sun.com
rleland@apache.org | Rob Leland | strutsbugs@freetocreate.org, struts-bug@freetocreate.org, rleland@apache.org
Solution: 1st, manual identity determination
• Manual building of project committer identities
  • 99 individuals identified
  • Used:
    • ASF web pages
    • Google, etc.
    • Dates of actions
    • Project mailing lists (headers noting real names)
Very manual, but high confidence in the links: an “anchor” for linking bugs to commit logs
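The manually built identity table could be held in a simple alias map. The sketch below is hypothetical (the class and method names are not from the study); it assumes each alias, whether a CVS commit email or a Bugzilla login, belongs to exactly one person, and that addresses compare case-insensitively.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of the manually curated committer-identity table:
 * every known alias (CVS commit email or Bugzilla login) maps to one real name.
 */
final class CommitterIdentities {
    private final Map<String, String> aliasToPerson = new HashMap<>();

    /** Records that all of the given aliases belong to the same real person. */
    void addPerson(String realName, String... aliases) {
        for (String alias : aliases) {
            aliasToPerson.put(normalize(alias), realName);
        }
    }

    /** Returns the real name behind an alias, or empty if it was never identified. */
    Optional<String> resolve(String alias) {
        return Optional.ofNullable(aliasToPerson.get(normalize(alias)));
    }

    /** True only if both aliases were identified and map to the same person. */
    boolean samePerson(String aliasA, String aliasB) {
        Optional<String> a = resolve(aliasA);
        return a.isPresent() && a.equals(resolve(aliasB));
    }

    // Assumption: aliases differ only in case/whitespace noise, never in meaning.
    private static String normalize(String alias) {
        return alias.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, `addPerson("Craig R. McClanahan", "craigmcc@locus.apache.org", "craigmcc@apache.org")` lets a later linking step treat commits from either address as coming from the same committer. Unknown aliases resolve to empty rather than to a guess, which keeps the "high confidence" property of the manual table.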
Solution: 2nd, semi-automated linking of bugs to commits
• Wrote Java code to assist in linking CVS commits to individual Bugzilla bugs
  • Extracts all numbers from the CVS commit log
  • Checks whether each number is a bug for the project
    • These become the set of possible bugs
  • Checks whether the commit falls within the lifetime of each possible bug
  • Checks whether the committer was “involved” with the bug
    • Bugs passing both checks become the inferred set of bugs
• If the set of possible bugs matches the inferred set, the entry is made automatically; otherwise the researcher is shown all the information and asked to correct the inferred set (if necessary)
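The linking heuristic can be sketched roughly as follows (Java 16+ records). This is a minimal sketch, not the study's actual tool: the `Bug`, `Commit`, `LinkResult`, and `BugCommitLinker` types are hypothetical, a bug's lifetime is assumed to run from its creation to its last Bugzilla activity, and "involved" is assumed to mean the committer's real name (resolved via the identity table) appears among the people who acted on the bug.

```java
import java.time.LocalDateTime;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical view of a Bugzilla bug: its lifetime and the people who acted on it. */
record Bug(int id, LocalDateTime created, LocalDateTime lastActivity, Set<String> participants) {}

/** Hypothetical view of a CVS commit; the committer is already resolved to a real name. */
record Commit(String committer, LocalDateTime time, String log) {}

/** Possible bugs (ids found in the log) versus bugs inferred to be addressed by the commit. */
record LinkResult(Set<Integer> possible, Set<Integer> inferred) {
    /** A link is recorded automatically only when both sets agree. */
    boolean automatic() {
        return possible.equals(inferred);
    }
}

final class BugCommitLinker {
    // A run of 1 to 9 digits not embedded in a longer run, so "Bug15799" yields 15799.
    private static final Pattern NUMBER = Pattern.compile("(?<!\\d)\\d{1,9}(?!\\d)");

    private final Map<Integer, Bug> projectBugs;

    BugCommitLinker(Map<Integer, Bug> projectBugs) {
        this.projectBugs = projectBugs;
    }

    LinkResult link(Commit commit) {
        // Step 1: every number in the log that is a bug id of this project is a possible bug.
        Set<Integer> possible = new LinkedHashSet<>();
        Matcher m = NUMBER.matcher(commit.log());
        while (m.find()) {
            int id = Integer.parseInt(m.group());
            if (projectBugs.containsKey(id)) {
                possible.add(id);
            }
        }
        // Step 2: keep bugs whose lifetime contains the commit and whose participants include the committer.
        Set<Integer> inferred = new LinkedHashSet<>();
        for (int id : possible) {
            Bug bug = projectBugs.get(id);
            boolean duringLifetime = !commit.time().isBefore(bug.created())
                    && !commit.time().isAfter(bug.lastActivity());
            boolean involved = bug.participants().contains(commit.committer());
            if (duringLifetime && involved) {
                inferred.add(id);
            }
        }
        return new LinkResult(possible, inferred);
    }
}
```

In a case like the Tomcat commit 13662 example, the numbers 207, 660, and 371 all match project bug ids but only 371 survives both checks, so the sets differ and the researcher is asked to decide.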
Example: Automatic link
"struts" bug 15799 found : created 2003-01-04 15:12:17
(15799) Bugzilla description: Nested tags picks up wrong bean for values
(15799) 2003-01-05 22:13:43 David Morris 4 1.0 Beta 3 1.1 Beta 3
(15799) 2003-02-04 21:03:34 James Mitchell 4 1.1 Beta 3 Nightly Build
(15799) 2003-02-05 02:40:54 James Turner 15 struts-dev@jakarta.apache.org
(15799) 2003-02-05 03:36:34 Ted Husted 4 Nightly Build 1.1 Beta 3
(15799) 2003-02-06 00:36:48 Arron Bates 8 NEW RESOLVED
(15799) 2003-02-06 00:36:48 Arron Bates 11 FIXED
------------------ CVS commit log 27541 at 2003-02-05 16:26:11 by Arron Bates ------------------
Committed patch Bug15799, reported and patched by David Morris. IDEA also told me to remove a redundant class cast ( ...a fashionable thing to do it seems :)
Inferred set [15799] = [15799]
No decision required by researcher
Example: Manual link
"tomcat" bug 207 found : created 2000-10-28 11:58:02
(207) Bugzilla description: mod_jk.conf-auto is not generated when tomcat is started BugRat Report#319
Not adding bug 207 to inferred set [:log time after bug lifetime:comitter not in bug group]
"tomcat" bug 660 found : created 2001-02-21 03:04:15
(660) Bugzilla description: Bad context on Authentication Form Page
Not adding bug 660 to inferred set [:log time after bug lifetime:comitter not in bug group]
"tomcat" bug 371 found : created 2000-12-22 20:24:31
(371) Bugzilla description: Webdav status code 207 not present in core/LocalStrings.properties BugRat Report#660
------------------ CVS commit log 13662 at 2001-03-15 12:15:21 by Marc Saegesser ------------------
Added 207 result code for WEBDAV. PR: 660/Bugzilla 371
Submitted by: dsklar@mediaone.net (David F. Sklar)
Inferred set [371]
Link bug ids (c to clear)[207, 660, 371] 371
MANUAL INPUT
Decision required by researcher: 207 is a result code (not a bug reference), and 660 is the id from the pre-Bugzilla Jakarta bug system
Noting and linking outside contributions: not done (yet)
• Linking contributions by non-committers to bug fixes (or enhancements) between CVS and Bugzilla
  • Committers often commit code changes contributed by non-committers
  • No standard approach in CVS logs for indicating such a contribution (informal references to known contributors)
  • Obscuring of email addresses (to fight spam) has reached open source logs, e.g.:
    Testcase submitted by: Martijn Kruithof <martijn at kruithof.xs4all.nl>
• Linking contributor names to Bugzilla Ids would face the same issues noted for committers
  • Larger scale, and less “context” with which to manually build a case linking an identity to identifiers
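Although this linking was not done, the kind of log parsing it would need can be sketched. The helper below is hypothetical: it assumes contributor credits follow a "... submitted by: ..." phrasing like the examples in these logs, and it undoes only the simple "user at host" obfuscation; a real tool would need many more patterns.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for spotting outside contributors credited in CVS log lines. */
final class ContributorNotes {
    // "Submitted by: ...", "Testcase submitted by: ...", etc. (case-insensitive).
    private static final Pattern SUBMITTED_BY =
            Pattern.compile("submitted by:\\s*(.+)", Pattern.CASE_INSENSITIVE);

    // "user at host.domain", the spam-avoiding form of "user@host.domain".
    // Caveat: ordinary prose such as "look at foo.bar" would also be rewritten.
    private static final Pattern OBFUSCATED_EMAIL =
            Pattern.compile("([\\w.+-]+)\\s+at\\s+([\\w-]+(?:\\.[\\w-]+)+)", Pattern.CASE_INSENSITIVE);

    /** Returns the credited contributor text, with obfuscated addresses restored. */
    static Optional<String> contributor(String logLine) {
        Matcher m = SUBMITTED_BY.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(deobfuscate(m.group(1).trim()));
    }

    /** Rewrites "user at host.domain" as "user@host.domain". */
    static String deobfuscate(String text) {
        return OBFUSCATED_EMAIL.matcher(text).replaceAll("$1@$2");
    }
}
```

Even with the address restored, the result still has to be matched against Bugzilla identities, which is where the scale and missing-context problems above come in.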
Variable creation: Narrowing bug focus
• Total bugs to fixed bugs?
• Fixed bugs to fixed bugs with Java changes?
• Examined 20 bugs:
Variable creation: Per-bug variables
[Table of per-bug variables not recoverable; surviving annotations: "subsets", "non-normal"]
Variable reduction (preliminary): Principal components analysis
Factor 1: Public interest
• Public_LN (0.7)
• COMMsize_LN (0.6)
• DUPcount_BI (0.6)
• STATUSchanges_BI (0.3)
Factor 2: Java code changed
• JavaCUchange (0.9)
• JavaPKchange (0.8)
• JavaSLOCchange (0.7)
Factor 3: Committer interest
• Pcommit_BI (0.9)
• Pasf_BI (-0.9)
Factor 4: Effort/Time
• Dtotal_nonLATER_LN (0.7)
• PRIORITY_BI (0.7)
• STATUSchanges_BI (0.6)
• SEVERITY_BI (-0.3)
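The parenthesized numbers on this slide appear to be component loadings. As a rough illustration of where such numbers come from, the sketch below computes loadings for the first, unrotated principal component using power iteration on the correlation matrix. It is not the procedure or software used in the study; `FirstComponent` is a hypothetical name, and the reported factors may come from a rotated solution, which would give different values.

```java
/**
 * Illustrative sketch: loadings of the first (unrotated) principal component,
 * found by power iteration on the correlation matrix of the variables.
 */
final class FirstComponent {
    /**
     * @param data rows are observations (bugs), columns are variables;
     *             assumes at least two rows and no constant column
     * @return one loading per variable: eigenvector entry times sqrt(eigenvalue)
     */
    static double[] loadings(double[][] data) {
        int n = data.length;
        int p = data[0].length;

        // Standardize every column to mean 0 and sample standard deviation 1.
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            double ss = 0;
            for (double[] row : data) ss += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(ss / (n - 1));
            for (int i = 0; i < n; i++) z[i][j] = (data[i][j] - mean) / sd;
        }

        // Correlation matrix R = Z^T Z / (n - 1).
        double[][] r = new double[p][p];
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += z[i][a] * z[i][b];
                r[a][b] = s / (n - 1);
            }
        }

        // Power iteration from a unit start vector; converges to the eigenvector of the
        // largest eigenvalue, assuming the start vector is not orthogonal to it.
        double[] v = new double[p];
        for (int k = 0; k < p; k++) v[k] = k + 1.0;
        double startNorm = Math.sqrt(dot(v, v));
        for (int k = 0; k < p; k++) v[k] /= startNorm;
        for (int iter = 0; iter < 1000; iter++) {
            double[] w = multiply(r, v);
            double norm = Math.sqrt(dot(w, w));
            if (norm == 0) break;
            for (int k = 0; k < p; k++) v[k] = w[k] / norm;
        }
        double lambda = dot(v, multiply(r, v)); // Rayleigh quotient: the largest eigenvalue

        // Eigenvectors are defined only up to sign; make the largest-magnitude entry positive.
        int biggest = 0;
        for (int k = 1; k < p; k++) if (Math.abs(v[k]) > Math.abs(v[biggest])) biggest = k;
        double sign = v[biggest] < 0 ? -1.0 : 1.0;
        double[] loadings = new double[p];
        for (int k = 0; k < p; k++) loadings[k] = sign * v[k] * Math.sqrt(lambda);
        return loadings;
    }

    private static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < m.length; i++) out[i] = dot(m[i], v);
        return out;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }
}
```

With one column equal to twice another and a third column uncorrelated with both, the first component loads about 1.0 on the two linked columns and about 0 on the third; strongly negative loadings such as Pasf_BI (-0.9) arise the same way when a variable moves opposite to the others on a component.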
Summary
• We have a reasonable set of “synthetic” measures of some of the important characteristics of bugs and their fixes
  • How “costly” in several dimensions (time, public interest, etc.)
• Next step: identify, via expert judges, bugs for which CoE would have been effective
• Combination with results so far will provide some understanding of how
Questions & Discussion
• Questions?
• Issues:
  • Approach to study
  • Definitions
    • Bureaucratic (mechanical) vs. functional program properties
  • NetBeans data