EEL 6883 Research Paper Presentation. Empirical Validation of Three Software Metrics Suites to Predict Fault-Proneness of Object-Oriented Classes Developed Using Highly Iterative or Agile Software Development Processes. Hector M.Olague, Letha H.Etzkorn, Sampson Gholston, Stephen Quattlebaum
EEL 6883 Research Paper Presentation. Empirical Validation of Three Software Metrics Suites to Predict Fault-Proneness of Object-Oriented Classes Developed Using Highly Iterative or Agile Software Development Processes. Hector M. Olague, Letha H. Etzkorn, Sampson Gholston, Stephen Quattlebaum. IEEE Transactions on Software Engineering, Vol. 33, No. 6, June 2007. Mustafa Ilhan Akbas – Omer Bilal Orhan
Introduction • OO metrics have been developed to assess design quality • A measure must be correct both theoretically and practically. • Empirical validation is necessary to demonstrate the usefulness of a metric in practical applications
Earlier Studies • El Emam et al. give a literature survey. • Many studies emphasize validation of the CK metrics suite; additional metrics are also considered in validation studies. • Using OO class metrics as quality predictors may be useful when a highly iterative or agile SW process is employed.
Case Study • Empirical validation of OO class metrics. • Goal: To assess the ability of OO metrics to identify fault-prone components in different software development environments. • New: Agile SW process. • Questions to be answered: 1. Can the OO metrics suites employed identify fault-prone classes in software developed using a highly iterative, or agile, development process during its initial delivery? 2. Can the OO metrics suites employed identify fault-prone classes in multiple, sequential releases of software developed using a highly iterative or agile process?
Test Case • Test case: Multiple versions of Rhino. • Rhino may be considered an example of the use of the agile software development model in open source software. • New enhancements -> new defects. • Metrics suites validated: • The Chidamber and Kemerer (CK) metrics • Brito e Abreu’s Metrics for Object-Oriented Design (MOOD) • Bansiya and Davis’ Quality Model for Object-Oriented Design (QMOOD)
Chidamber and Kemerer’s (CK) Metrics • Originally 1991, revised in 1994
Brito e Abreu’s MOOD Metrics • A small suite of metrics that could be used to evaluate systems designed with an “outside-in” methodology whereby early preparation and planning for development are of special importance. • The metrics that do not depend to a great extent on the definitions of functions can be collected early in the design phase. • The metrics should be easy to compute and should have a formal definition not tied to any particular OO language, should use consistent units, and should result in numbers independent of the system size. • All of the metrics result in a probability value between 0 and 1.
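To illustrate why a MOOD metric always lands between 0 and 1, here is a minimal sketch of one of them, the Method Inheritance Factor (MIF), which divides the number of inherited methods by the number of available (defined plus inherited) methods summed over all classes. The `Shape`, `Circle`, and `Square` classes are hypothetical, and using Python introspection is an assumption made for this example; it is not the SSML tool chain used in the paper.

```python
import inspect

def defined_methods(cls):
    """Names of the methods declared directly in cls."""
    return {name for name, val in vars(cls).items() if inspect.isfunction(val)}

def inherited_methods(cls):
    """Names of methods inherited from ancestors and not overridden in cls."""
    inherited = set()
    for base in cls.__mro__[1:]:
        if base is not object:
            inherited |= defined_methods(base)
    return inherited - defined_methods(cls)

def mif(classes):
    """Method Inheritance Factor: inherited methods / available methods."""
    inherited = sum(len(inherited_methods(c)) for c in classes)
    available = sum(len(inherited_methods(c)) + len(defined_methods(c))
                    for c in classes)
    return inherited / available if available else 0.0

# Hypothetical system of three classes.
class Shape:
    def area(self): return 0.0
    def describe(self): return "shape"

class Circle(Shape):
    def area(self): return 3.14159      # overrides Shape.area
    def perimeter(self): return 6.28318

class Square(Shape):
    def area(self): return 1.0          # overrides Shape.area

print(mif([Shape, Circle, Square]))
```

Because the numerator counts a subset of what the denominator counts, the ratio cannot leave the interval [0, 1] and does not grow with system size.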
Bansiya and Davis’s Quality Model for Object-Oriented Design (QMOOD) Metrics • The QMOOD metrics calculated on a system can be used to compute a kind of supermetric, the Total Quality Index. • QMOOD metrics are defined to be computable early in the design process. • Bansiya and Davis first decided on a set of design quality attributes, based loosely on the attributes defined in the ISO 9126 standard: reusability, flexibility, understandability, functionality, extendibility, and effectiveness. • Then, they identified a set of object-oriented design properties that support the design quality attributes. • There are 11 of these design properties; for each one, Bansiya and Davis identified a metric that captures the design property.
Examined SW Data • Mozilla Rhino: Open-source implementation of JavaScript by Netscape. • 6 releases were analyzed. • Metrics collection tool used: Software System Markup Language (SSML) tool chain: Set of tools working together to translate source code into an intermediate source-language independent representation and then perform analysis on this representation. • One of the tools in the SSML tool chain yielded the metrics used in the paper.
12 Principles by the Agile Alliance 1. Early and continuous delivery of SW. 2. Welcome changing requirements, even late. 3. Deliver working SW frequently. 4. Business people and developers must work together daily. 5. Build projects around motivated individuals. 6. The most efficient and effective method to convey information is face-to-face conversation. 7. Working software is the primary measure of progress. 8. Agile processes promote sustainable development. 9. Continuous attention to technical excellence enhances agility. 10. Simplicity is essential. 11. The best designs emerge from self-organizing teams. 12. The team regularly reflects on how to become more effective and adjusts its behavior accordingly. Open source notes: • The users/clients of the software are typically the developers. • Users provide updates continually as the software is used. • The team performs regression testing and relies on user feedback. • Users would perform improvements only to their own work. • Users who need the updates are the teams.
Analysis Methods • Fault data was collected and analyzed for Rhino. • SSML tools were used to collect OO metrics from the source code for each version of Rhino
Analysis • The metrics from the 3 suites are checked for relationships to each other: do they measure different dimensions of OO class quality, or do they measure the same thing? • First, a calibration: the intercorrelations between the CK class metrics were analyzed and compared with the results of previous case studies. • Then, to determine which metrics can be used as fault predictors, a bivariate correlation between defects and the individual metrics from the 3 metrics suites was performed. This is new for the QMOOD and MOOD metrics suites.
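A bivariate correlation between a metric and defect counts can be sketched as follows. This is a minimal example that assumes Spearman's rank correlation as the coefficient (the slides do not say which coefficient was used), and the metric values and defect counts are made up for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-class data: a metric value and the defects found.
rfc_values = [12, 30, 7, 45, 22, 18]
defects = [1, 3, 0, 4, 1, 2]
print(spearman(rfc_values, defects))
```

A coefficient near +1 or -1 suggests the metric moves with (or against) the defect count; a value near 0 suggests the metric carries little information about defects on its own.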
Analysis • Models were developed using the different metrics suites to predict faults. • In earlier studies, a lack of variability in the response variable has been the principal motivation to forgo traditional linear regression techniques in favor of logistic regression. • The distribution of the number of defects found in the classes of Rhino was examined; because the 3 later versions showed good variability in the response variable, multivariate linear regression models were developed for them. • These yielded poor models, so binary logistic regression analysis was employed to develop models to predict faults.
BLR • Performed univariate binary logistic regression (UBLR) of metrics versus faults to determine which variables were statistically significant quality indicators. • Performed a collinearity analysis to determine which variables to include in the multivariate binary logistic regression (MBLR) models. • Developed three models for the CK metrics, two models for the MOOD metrics, and two models for the QMOOD metrics. • The models were validated using a simple holdout method.
Results • Comparison of CK to previous studies • Bivariate Correlation between defects and models. • Logistic Regression Analysis
Comparison of CK Results in Rhino to Previous Statistical Studies • CK has 6 metrics, and these results show the intercorrelation between them. (Table 1) • WMC: Weighted Methods per Class • DIT: Depth of Inheritance Tree • NOC: Number of Children • CBO: Coupling Between Objects • RFC: Response for a Class • LCOM: Lack of Cohesion of Methods
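Three of the CK metrics listed above are simple enough to sketch directly. The example below assumes unit method weights for WMC (so it reduces to a method count), treats Python's built-in `object` as the root at depth 0 for DIT, and counts only immediate subclasses for NOC. The class names are hypothetical and the introspection approach is an illustration, not the tool used in the paper.

```python
import inspect

def wmc(cls):
    """Weighted Methods per Class with every method weighted 1."""
    return sum(1 for val in vars(cls).values() if inspect.isfunction(val))

def dit(cls):
    """Depth of Inheritance Tree: longest path from cls up to `object`."""
    if cls is object:
        return 0
    return 1 + max(dit(base) for base in cls.__bases__)

def noc(cls):
    """Number of Children: count of immediate subclasses."""
    return len(cls.__subclasses__())

# Hypothetical class hierarchy.
class Node:
    def evaluate(self): return None
    def children(self): return []

class Leaf(Node):
    def evaluate(self): return 1

class Branch(Node):
    def evaluate(self): return 2
    def add(self, child): return None

class Twig(Branch):
    pass

for c in (Node, Leaf, Branch, Twig):
    print(c.__name__, "WMC =", wmc(c), "DIT =", dit(c), "NOC =", noc(c))
```

CBO, RFC, and LCOM need call-graph and attribute-usage information, so they are left out of this sketch.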
Bivariate Correlation Between Defects and Metrics From Models • Similar results were found for the CK metrics, but few for the other suites. • Only 3 versions of Rhino are shown here because the number of faults is higher in these versions (i.e., 178, 198, 201).
Bivariate Correlation Between Defects and Metric Components • CK RFC and WMC show a good positive correlation. • QMOOD DAM shows a consistent negative correlation. • The QMOOD NOM result might be wrong?
Logistic Regression Analysis • Binary logistic regression analysis is performed on the different Rhino versions. • First, univariate BLR determines which metrics are good indicators of quality; then multivariate BLR is applied to CK and QMOOD.
Univariate Binary Logistic Regression: Metrics versus Faults • The measures of association used are: • Log likelihood (LL) • P-value • Odds ratio • Test statistic (G) • Hosmer-Lemeshow (HL) • CK-CBO is significant in 5/6 versions. • CK-LCOM98 is significant in 4/6 versions. • CK-RFC and QMOOD-CIS are significant in all 6 versions. • MOOD metrics were significant in only 2/6 of the versions.
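Three of the measures listed above (log likelihood, odds ratio, and the G statistic) can be sketched for a single metric as follows. This is a minimal pure-Python Newton-Raphson fit, not the statistics package the authors used; the data are made up, and G is computed as twice the gain in log likelihood over the intercept-only model, which is the usual likelihood-ratio definition.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_univariate_logistic(x, y, iterations=25):
    """Fit P(faulty) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total

# Hypothetical data: a metric value per class and whether it was faulty.
metric = [1, 2, 3, 4, 5, 6, 7, 8]
faulty = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_univariate_logistic(metric, faulty)
ll_model = log_likelihood(metric, faulty, b0, b1)
p_bar = sum(faulty) / len(faulty)
ll_null = sum(math.log(p_bar) if yi else math.log(1.0 - p_bar) for yi in faulty)
g_stat = 2.0 * (ll_model - ll_null)   # likelihood-ratio test statistic G
odds_ratio = math.exp(b1)             # odds multiplier per unit of the metric
print(ll_model, g_stat, odds_ratio)
```

An odds ratio above 1 means higher metric values go with higher odds of a class being faulty. The P-value would come from comparing G against a chi-square distribution with one degree of freedom, which is left out here to keep the sketch free of dependencies.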
Multivariate Binary Logistic Regression • A collinearity analysis was performed to determine which variables to include in the MBLR models. • Two variables are correlated in the models of every metrics suite, which indicates a potential collinearity problem (i.e., the model contains linearly dependent predictors). • To remove the dependent predictors and build a model from independent variables, the VIF (variance inflation factor) of all candidate regressors is computed. Those with a multicollinearity problem are removed and the model is reevaluated. • The condition number (CN), another indicator of multicollinearity, is also examined (CN = largest eigenvalue / each of the remaining eigenvalues).
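For the simplest case of two candidate predictors, both collinearity indicators have closed forms, which makes them easy to sketch. With correlation r between the predictors, each has VIF = 1 / (1 - r^2), and the 2x2 correlation matrix has eigenvalues 1 + |r| and 1 - |r|, so the eigenvalue ratio is (1 + |r|) / (1 - |r|). The metric values below are made up, and the restriction to two predictors is a simplification for this example; note that some texts report the square root of this ratio as the condition number.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(r):
    """VIF of either predictor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)

def eigenvalue_ratio(r):
    """Largest / smallest eigenvalue of the 2x2 correlation matrix."""
    return (1.0 + abs(r)) / (1.0 - abs(r))

# Hypothetical per-class values of two metrics that tend to move together.
wmc_values = [4, 7, 9, 12, 15, 20]
rfc_values = [10, 15, 22, 25, 33, 41]

r = pearson(wmc_values, rfc_values)
print("r =", r)
print("VIF =", vif_two_predictors(r))
print("eigenvalue ratio =", eigenvalue_ratio(r))
```

Both indicators grow without bound as |r| approaches 1, which is why a strongly correlated pair is a reason to drop one of the two variables before fitting the multivariate model.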
MBLR Parameter Selection: Model 3 • Similar selections are made for QMOOD and MOOD, and 2 models from each suite are selected and created.
MBLR Results • 3 models using CK, 2 using MOOD, and 2 using QMOOD were created. • Model 1 for CK was successful, but univariate BLR for CK-WMC was also successful, so why use multivariate? • Some other parameters give significant results for different versions. • The models for MOOD were unsuccessful. • Both QMOOD models were significant, and 2 different parameters were important in the 2 models, so using multivariate models might help.
MBLR Model Validation • The results are shown separately for small and large classes, because the size of a class might bias the errors in its code. Only the concordant values (defectives remain defective) are shown. • Results: • The models are able to classify fault-prone classes. • There is a general deterioration in the effectiveness of the metrics as the software progresses through its versions. • The CK and QMOOD models performed better than the MOOD models.
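The per-size-group reporting described above can be sketched as follows. This example assumes that "concordant" means a class that was actually faulty and that the model also classified as faulty, that a probability threshold of 0.5 separates the two outcomes, and that each held-out class has already been labeled "small" or "large"; the records are made up, and none of these choices is confirmed by the slides.

```python
def concordance_by_group(records, threshold=0.5):
    """Fraction of actually faulty classes predicted faulty, per size group.

    records: iterable of (group, actually_faulty, predicted_probability).
    """
    hits, totals = {}, {}
    for group, actually_faulty, probability in records:
        if not actually_faulty:
            continue                      # only faulty classes are counted
        totals[group] = totals.get(group, 0) + 1
        if probability >= threshold:
            hits[group] = hits.get(group, 0) + 1
    return {group: hits.get(group, 0) / totals[group] for group in totals}

# Hypothetical holdout results: (size group, faulty?, model probability).
holdout = [
    ("small", True, 0.7),
    ("small", True, 0.3),
    ("small", False, 0.9),
    ("large", True, 0.8),
    ("large", True, 0.6),
]
print(concordance_by_group(holdout))
```

Computing the rate separately per group keeps a model that only works on large classes from hiding weak performance on small ones.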
MBLR Model Validation • Effectiveness also deteriorates faster across successive versions for small classes.
Conclusion • Authors conducted a statistical analysis of the CK, MOOD, and QMOOD OO class metrics suites using six versions of Mozilla’s Rhino open source software • Primary contribution is using OO metrics to predict defects in agile SW. • Another contribution is the empirical study of the QMOOD and MOOD metrics • CK-WMC, CK-RFC, QMOOD-CIS, and QMOOD-NOM are consistent predictors of class quality (error-proneness).
Conclusion • The MOOD metrics were not useful as predictors of OO class quality. • CK metrics suite produced the best three models for predicting OO class quality, followed closely by one QMOOD model. • CK metrics have been shown to be better and more reliable predictors of fault-proneness than the MOOD or QMOOD metrics. • Class size can impact metric performance. • There are practical limitations to the effectiveness of the metrics over the course of several software iterations as the software matures and the dynamic nature of the software development process subsides.
Future Work • Complexity-related measures may be effective in detecting error-prone classes in highly iterative or agile processes. • Decision trees may be more effective than binary logistic regression in detecting error-prone classes in highly iterative or agile processes. • Various aspects of OO complexity proposed in previous studies and implemented in various metrics suites may be better predictors of OO class quality in highly iterative or agile systems. • Decision trees could be built using the metrics from this study.