This presentation reviews a paper on defect prediction that mines CVS data for characteristics such as lines added for bug fixes, co-changed files, and modifications without a commit message. The paper then turns these evolution attributes into "value series", which are relative measures. The results are validated with a correlation coefficient, mean absolute error, and mean squared error. The presenter liked the series measures and the high correlation found for authors and commit messages, but found the reading bland and wanted more justification for how the series attributes were chosen.
Quality Assessment based on Attribute Series of Software Evolution
Paper Presentation for CISC 864
Lionel Marks
What is this paper about?
• Defect prediction
• Uses CVS data to analyze characteristics such as:
  • Number of lines added for bug fixes
  • Number of co-changed files
  • Number of modifications without a commit message
• The authors then took the analysis further
The “further” analysis
• Took “value series” of evolution attributes
• These are relative measures
• For example, for the number of lines of code deleted for bug fixes, the “value series” version would be:
  • Number of lines deleted for bug fixes / Number of lines deleted (any type)
Examples of Evolution Attributes
• Lines Added/Deleted
• Number of Changes
• Number of Authors
• Co-Changed Files
• Co-Changed New Files
  • Number of files that were created together with a change to the investigated file
Examples of Corresponding Value Series
• Lines Added
  • Lines added within a day / Total lines of code until this day
• Number of Changes
  • Number of changes within a day / Total number of changes in the history file until this day
• Number of Authors
  • Number of authors within a day / Number of changes within this day (Interesting!)
• Co-Changed New Files
  • Number of newly introduced files that are co-changed files / Number of co-changed files
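The ratios above can be sketched in a few lines of Python. This is only an illustration, not the paper's tooling: the record layout `(day, author, lines_added)` and the function name `value_series` are hypothetical, and "total lines of code until this day" is approximated by the cumulative lines added up to and including that day.

```python
from collections import defaultdict
from datetime import date

# Hypothetical change records for one file: (day, author, lines_added).
changes = [
    (date(2007, 1, 1), "alice", 40),
    (date(2007, 1, 1), "bob", 10),
    (date(2007, 1, 2), "alice", 50),
    (date(2007, 1, 2), "alice", 0),
    (date(2007, 1, 2), "alice", 100),
]

def value_series(records):
    """Return one dict of relative measures per day, in date order."""
    by_day = defaultdict(list)
    for day, author, lines_added in records:
        by_day[day].append((author, lines_added))

    series = []
    total_changes = 0  # changes in the history up to and including this day
    total_lines = 0    # assumption: cumulative lines added stands in for total LOC
    for day in sorted(by_day):
        entries = by_day[day]
        n_changes = len(entries)
        n_authors = len({author for author, _ in entries})
        lines = sum(added for _, added in entries)
        total_changes += n_changes
        total_lines += lines
        series.append({
            "day": day,
            "changes": n_changes / total_changes,
            "authors": n_authors / n_changes,
            "lines_added": lines / total_lines if total_lines else 0.0,
        })
    return series

series = value_series(changes)
```

On the second day in this toy history one author makes three changes, so the authors ratio drops to 1/3 even though the absolute number of authors is still 1.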
Validation
• Distance Equation: sum of squares of actual minus estimated
  • y is the actual value
  • w is the weight
  • a is the attribute
  • Over k instances
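The equation itself is not reproduced in the slide text, so the sketch below rests on an assumption: the estimate for each instance is a weighted sum of its attribute values, which is what the symbols y, w and a suggest. The function name `squared_distance` and the sample numbers are hypothetical.

```python
def squared_distance(actual, attributes, weights):
    """Sum over all instances of (actual - estimated) squared.

    Assumption: the estimate for an instance is sum_j w_j * a_j,
    i.e. a linear model over its attribute values.
    """
    total = 0.0
    for y, attrs in zip(actual, attributes):
        estimate = sum(w * a for w, a in zip(weights, attrs))
        total += (y - estimate) ** 2
    return total

# Three instances, each with a constant 1.0 (intercept) and one attribute.
attributes = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
actual = [2.0, 4.0, 5.5]
weights = [0.5, 1.0]  # estimates: 2.5, 3.5, 5.5

distance = squared_distance(actual, attributes, weights)
```

Fitting a model then means choosing the weights that make this distance as small as possible.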
Correlation Coefficient
• p bar is the average of the predicted values
• a bar is the average of the actual values
• Value of 1 = perfect positive correlation
• Value of 0 = no linear correlation
• Value of -1 = perfect inverse correlation
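A minimal sketch of the coefficient, assuming it is the usual Pearson correlation between predicted values p and actual values a (the slide's formula is not in the text). The function name `correlation` is hypothetical.

```python
from math import sqrt

def correlation(predicted, actual):
    """Pearson correlation between predicted and actual values."""
    n = len(predicted)
    p_bar = sum(predicted) / n  # average of the predicted values
    a_bar = sum(actual) / n     # average of the actual values
    s_pa = sum((p - p_bar) * (a - a_bar) for p, a in zip(predicted, actual))
    s_p = sum((p - p_bar) ** 2 for p in predicted)
    s_a = sum((a - a_bar) ** 2 for a in actual)
    # The 1/(n-1) factors cancel between numerator and denominator.
    return s_pa / sqrt(s_p * s_a)

same_direction = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_direction = correlation([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

If either series is constant the denominator is zero and the coefficient is undefined; this sketch raises `ZeroDivisionError` in that case.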
Mean Absolute Error
• Value of 0 means the predictions match the actual values exactly
• Value greater than zero is the absolute error averaged over the number of points in the data set
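As a quick illustration with made-up numbers (the function name `mean_absolute_error` is hypothetical):

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Absolute errors are 1, 0 and 2, so the mean is 3 / 3.
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```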
Mean Squared Error
• Instead of using absolute value bars, squares are used to emphasize error more when there are large deviations
• Still averaged out over the number of points in the data set
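The emphasis on large deviations shows up when two sets of predictions carry the same total absolute error but distribute it differently. The numbers and the function name `mean_squared_error` below are made up for illustration.

```python
def mean_squared_error(predicted, actual):
    """Average of (predicted - actual) squared over all points."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

actual = [0.0, 0.0, 0.0, 0.0]
evenly_off = [2.0, 2.0, 2.0, 2.0]  # four errors of 2, total absolute error 8
one_outlier = [0.0, 0.0, 0.0, 8.0]  # one error of 8, total absolute error 8

mse_even = mean_squared_error(evenly_off, actual)      # 16 / 4
mse_outlier = mean_squared_error(one_outlier, actual)  # 64 / 4
```

Both sets would have the same mean absolute error, but the single large miss quadruples the mean squared error.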
Results
• Comm. had a lot of errors in their system
• Authors best indicator overall
• TLinesAdd a solid metric as well:
  • # of lines added in all co-changed files / # of couplings
Likes and Dislikes of This Paper
• Likes
  • The different series measures were interesting
  • Very impressed with the high correlation found in Authors and Commit Messages
  • Nicely related to course work
• Dislikes
  • Found the reading bland
  • Would have preferred more support for the choice of series attributes, in particular more discussion of how the denominators were chosen
  • Did not find many unique points in the paper