An Empirical Study of the Relationship Between Code Bad Smells and Software Faults Min Zhang School of Computer Science University of Hertfordshire
Introduction • What is a Code Bad Smell? • Problems using Code Bad Smells • An overview of the empirical study • Code Bad Smell detection • Fault identification • Result and discussion • Conclusion • Q/A
Code Bad Smells • The 22 Code Bad Smells are problematic structures in source code, identified informally by Fowler et al. (1999). • Fowler et al. (1999) suggest that Code Bad Smells can give "indications that there is trouble that can be solved by a refactoring". • They are widely used for detecting refactoring opportunities in software (Mens and Tourwe, 2004).
Problems in Using Code Bad Smells • Fowler et al. (1999) claim that Code Bad Smells are structures that have detrimental effects on software. However, little empirical evidence for this claim has been provided. • Most existing Code Bad Smell detection tools are metric-based, and we question their accuracy.
An Empirical Study of the Relationship between Code Bad Smells and Faults • Objective: Capture the relationship between Code Bad Smells and faults • Targeted Code Bad Smells: Data Clumps, Message Chains, Middle Man, Speculative Generality, and Switch Statements • Research Data: • Eclipse Core Packages (Releases 3.0, 3.0.1, 3.0.2, 3.1 and 3.2) • Apache Commons Packages (Commons IO, Commons Logging, Commons Codec, Commons DbUtils, Commons DBCP, and Commons Net)
Code Bad Smell Detection • Pattern-based Code Bad Smell detection • Define each Code Bad Smell as a particular code pattern • Based on ideas from Gamma et al.'s (1995) definitions of the GoF Design Patterns • Use the Recoder API to analyse Java source code
An Example: The Pattern-based Definition of the Message Chains Bad Smell • [Figure: The Pattern-based Definition of the Message Chains Bad Smell]
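The figure with the study's pattern-based definition did not survive, so the sketch below is not that definition. It makes one assumption: it takes the simplest reading of Fowler et al.'s description, in which a Message Chain is a sequence of calls where each call is made on the object returned by the previous one, as in `a.getB().getC().getD()`. Several names are hypothetical: the toy `Var`/`Call` expression model, the helper names, and the threshold `MIN_CHAIN_LENGTH = 3`. A real detector would presumably walk the Java syntax tree built by Recoder instead.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """A variable or field used as the first receiver, e.g. `a`."""
    name: str


@dataclass(frozen=True)
class Call:
    """A no-argument call `target.method()` (toy model, not Recoder's AST)."""
    target: Union["Var", "Call"]
    method: str


Expr = Union[Var, Call]

# Hypothetical threshold: the slides do not say how long a chain must be.
MIN_CHAIN_LENGTH = 3


def chain_length(expr: Expr) -> int:
    """Count calls made directly on the result of the previous call."""
    length = 0
    while isinstance(expr, Call):
        length += 1
        expr = expr.target
    return length


def is_message_chain(expr: Expr, min_length: int = MIN_CHAIN_LENGTH) -> bool:
    """Flag the expression when its call chain reaches the threshold."""
    return chain_length(expr) >= min_length


def render(expr: Expr) -> str:
    """Print the expression back in Java-like syntax for reporting."""
    if isinstance(expr, Var):
        return expr.name
    return f"{render(expr.target)}.{expr.method}()"
```

For example, `Call(Call(Call(Var("a"), "getB"), "getC"), "getD")` has a chain length of 3 and renders as `a.getB().getC().getD()`. A single call such as `a.getB()` is not flagged.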
Fault Identification • Zimmermann et al.'s (2007) fault identification approach: • Locate the tokens "bug", "fix(ed)" and "update(d)" in CVS comment messages. • If a version entry in CVS contains one or more of these tokens and they are followed by numbers, the entry is treated as a bug-fixing update. • Those numbers are treated as bug IDs. • Confirm each bug ID against the Bugzilla database.
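The steps above can be sketched as a regular-expression scan of commit messages. This is a minimal sketch under stated assumptions, not Zimmermann et al.'s tool. The exact pattern is an assumption: it allows optional spaces, `#` or `:` between a token and its number. The parameter `bugzilla_ids` is a hypothetical stand-in for the lookup against the Bugzilla database.

```python
import re
from typing import AbstractSet, Set

# Tokens from the slide: "bug", "fix"/"fixed", "update"/"updated", each
# followed by a number (assumed: optionally after spaces, '#' or ':').
BUG_FIX_PATTERN = re.compile(
    r"\b(?:bug|fix(?:ed)?|updated?)\b[\s#:]*(\d+)",
    re.IGNORECASE,
)


def candidate_bug_ids(message: str) -> Set[int]:
    """Numbers that directly follow a bug-fix token in a CVS comment."""
    return {int(number) for number in BUG_FIX_PATTERN.findall(message)}


def is_bug_fixing_update(message: str) -> bool:
    """A version entry is a bug-fixing update if any token is followed by a number."""
    return bool(candidate_bug_ids(message))


def confirmed_bug_ids(message: str, bugzilla_ids: AbstractSet[int]) -> Set[int]:
    """Keep only candidate IDs that exist in the (hypothetical) Bugzilla ID set."""
    return candidate_bug_ids(message) & set(bugzilla_ids)
```

The word boundaries keep tokens embedded in longer words, such as "debug" or "prefix", from matching.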
Results and Discussion: Binary Coding of the Existence of Code Bad Smells (1)
Result and Discussion: Binary Coding of the Existence of Code Bad Smells (2)
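The binary-coding tables are not reproduced here. The sketch below assumes one reading of the idea: each of the five targeted smells occupies one bit of a profile number. Under that assumption, profile zero means none of the smells is present, and the five single-smell profiles are powers of two. The bit order in `SMELLS` and the helper names are assumptions; the slides do not state them.

```python
from typing import Iterable, List

# Assumed bit order for illustration: bit 0 = Data Clumps, ..., bit 4 = Switch Statements.
SMELLS = (
    "Data Clumps",
    "Message Chains",
    "Middle Man",
    "Speculative Generality",
    "Switch Statements",
)


def profile_of(present: Iterable[str]) -> int:
    """Encode the set of smells found in a code sample as a 5-bit profile number."""
    found = set(present)
    unknown = found - set(SMELLS)
    if unknown:
        raise ValueError(f"unknown smells: {sorted(unknown)}")
    return sum(1 << bit for bit, smell in enumerate(SMELLS) if smell in found)


def smells_in(profile: int) -> List[str]:
    """Decode a profile number back into the smells it contains."""
    if not 0 <= profile < (1 << len(SMELLS)):
        raise ValueError(f"profile out of range: {profile}")
    return [smell for bit, smell in enumerate(SMELLS) if (profile >> bit) & 1]


def single_smell_profiles() -> List[int]:
    """Profiles in which exactly one of the five smells is present."""
    return [1 << bit for bit in range(len(SMELLS))]
```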
Result and Discussion: One-way Analysis of Variance Eclipse Data (1)
Result and Discussion: One-way Analysis of Variance Eclipse Data (2) • The five profiles that each indicate the existence of exactly one of the five Code Bad Smells have a significantly lower mean number of faults than profile zero. • Every profile with a higher mean number of faults than profile zero contains both the Message Chains and the Switch Statements Bad Smells.
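A one-way ANOVA tests whether the mean number of faults differs across profiles. The sketch below computes the F statistic from scratch after grouping (profile, fault_count) records by profile. It is a textbook formulation, not the study's analysis script; the record layout and helper names are hypothetical. `scipy.stats.f_oneway` returns the same F statistic together with a p-value. The F test on its own only shows that some profile means differ. Statements such as "lower than profile zero" rest on comparing each profile's mean with profile zero, and the slides do not say which follow-up test was used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def group_by_profile(records: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group (profile, fault_count) pairs into one fault list per profile."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for profile, faults in records:
        groups[profile].append(faults)
    return dict(groups)


def mean_faults(groups: Dict[int, List[int]]) -> Dict[int, float]:
    """Mean number of faults per profile, for comparison with profile 0."""
    return {profile: sum(f) / len(f) for profile, f in groups.items()}


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Typical usage is `one_way_anova_f(list(group_by_profile(records).values()))`.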
Result and Discussion: the Message Chains and Switch Statements
Result and Discussion: the Message Chains and Switch Statements • All source code samples associated with more than 10 faults contain the Message Chains Bad Smell. • The Switch Statements Bad Smell does not show a clear relationship with a high number of faults.
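The first bullet is a check over every sample: no sample with more than 10 faults may lack the Message Chains Bad Smell. A minimal sketch of that check follows. The (name, smells, fault_count) record layout and the function name are hypothetical, and any sample data used with it is illustrative rather than taken from the study.

```python
from typing import Iterable, List, Set, Tuple


def high_fault_samples_all_have(
    samples: Iterable[Tuple[str, Set[str], int]],
    smell: str = "Message Chains",
    fault_threshold: int = 10,
) -> Tuple[bool, List[str]]:
    """Check that every sample with more than `fault_threshold` faults contains `smell`.

    Returns the verdict and the names of any counter-examples.
    """
    counter_examples = [
        name
        for name, smells, faults in samples
        if faults > fault_threshold and smell not in smells
    ]
    return not counter_examples, counter_examples
```

Returning the counter-examples, rather than only a boolean, makes it easy to inspect any sample that breaks the pattern.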
Result and Discussion: One-way Analysis of Variance Apache Data (1)
Result and Discussion: One-way Analysis of Variance Apache Data (2) • The five profiles that each indicate the existence of exactly one of the five Code Bad Smells have a lower mean number of faults than profile zero. • None of the profiles containing the Message Chains Bad Smell shows a higher mean number of faults than profile zero.
A Detailed Investigation of Message Chains • Objective: • To test whether the Message Chains Bad Smell is directly associated with faults. • To test whether the Message Chains Bad Smell is directly associated with particular types of faults. • Method: • Manually investigate 20 source code samples from the Eclipse project
A Detailed Investigation of Message Chains: Direct Association with Faults
A Detailed Investigation of Message Chains: Fault Classification • Classification Schema: An adapted version of Seaman et al.'s (2008) fault classification schema • Results:
A Detailed Investigation of Message Chains: Result • The Message Chains Bad Smell is unlikely to be directly associated with faults, but it indicates a complicated software context. • The Message Chains Bad Smell is likely to be associated with Algorithm/Method faults.
Conclusion • Source code containing only one of the five Code Bad Smells is not likely to be fault-prone. • The Message Chains Bad Smell could cause a high number of faults and is likely to be associated with Algorithm/Method faults, so it deserves further attention. • The Message Chains Bad Smell may not be directly associated with faults, but it may indicate a complicated software context.
References • FOWLER, M., BECK, K., BRANT, J., OPDYKE, W. & ROBERTS, D. (1999) Refactoring: Improving the Design of Existing Code, Addison-Wesley. • GAMMA, E., HELM, R., JOHNSON, R. & VLISSIDES, J. (1995) Design Patterns: Elements of Reusable Object-Oriented Software, Reading, Mass., Addison-Wesley. • MENS, T. & TOURWÉ, T. (2004) A survey of software refactoring. IEEE Transactions on Software Engineering, 30, 126-139. • SEAMAN, C. B., SHULL, F., REGARDIE, M., ELBERT, D., FELDMANN, R. L., GUO, Y. & GODFREY, S. (2008) Defect categorization: making use of a decade of widely varying historical data. Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, Kaiserslautern, Germany, ACM. • ZIMMERMANN, T., PREMRAJ, R. & ZELLER, A. (2007) Predicting Defects for Eclipse. In PREMRAJ, R. (Ed.) International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).