An Empirical Study of the Relationship Between Code Bad Smells and Software Faults

Min Zhang, School of Computer Science, University of Hertfordshire

Introduction • What is a Code Bad Smell? • Problems using Code Bad Smells • An overview of the empirical study • Code Bad Smell detection • Fault identification • Results and discussion • Conclusion • Q/A

Code Bad Smells • The 22 Code Bad Smells are bad structures in source code informally identified by Fowler et al. (1999). • Fowler et al. (1999) suggest that Code Bad Smells can give “indications that there is trouble that can be solved by a refactoring”. • They are widely used for detecting refactoring opportunities in software (Mens and Tourwe, 2004).

Problems in Using Code Bad Smells • Fowler et al. (1999) claim that Code Bad Smells are structures that have detrimental effects on software. However, little empirical evidence has been provided to support this claim. • Most existing Code Bad Smell detection tools are metric-based. We question their accuracy.

An Empirical Study of the Relationship between Code Bad Smells and Faults • Objective: Capture the relationship between Code Bad Smells and faults • Targeted Code Bad Smells: Data Clumps, Message Chains, Middle Man, Speculative Generality, and Switch Statements • Research Data: • Eclipse Core Packages (Releases 3.0, 3.0.1, 3.0.2, 3.1 and 3.2) • Apache Commons Packages (Commons IO, Commons Logging, Commons Codec, Commons DbUtils, Commons DBCP, and Commons Net)

Code Bad Smell Detection • Pattern-based Code Bad Smell detection • Define each Code Bad Smell as particular code patterns • Ideas from Gamma et al.’s (1995) definition of the GoF Design Patterns • Use Recoder API to analyse Java source code

An Example: The Pattern-based Definition of the Message Chains Bad Smell
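The pattern definition shown on this slide is not recoverable from the transcript. As a rough illustration of what pattern-based detection of Message Chains might look like, the sketch below flags expressions in Java source text where one receiver is followed by several consecutive method calls. It is a text-level approximation in Python, not the Recoder-based analysis used in the study; the function name `find_message_chains` and the `min_calls` threshold of 3 are assumptions made for this example.

```python
import re

# One call link such as ".getOrders()" or ".find(x, y)".
# Assumption: arguments containing nested parentheses are not handled.
_CALL = r"\.\s*[A-Za-z_]\w*\s*\([^()]*\)"


def find_message_chains(java_source, min_calls=3):
    """Return expressions where a receiver is followed by at least
    `min_calls` consecutive method calls (hypothetical threshold)."""
    pattern = re.compile(r"[A-Za-z_]\w*(?:%s){%d,}" % (_CALL, min_calls))
    return [m.group(0) for m in pattern.finditer(java_source)]


sample = (
    "Order o = customer.getAccount().getOrders().getLatest();\n"
    "int n = list.size();\n"
    "String s = a.b().c();\n"
)
chains = find_message_chains(sample)
```

A real detector would work on the syntax tree rather than on text, so that comments, string literals and fluent builder APIs can be treated separately.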

Fault Identification • Zimmermann et al.’s (2007) fault identification approach: • Locate the tokens “bug”, “fix(ed)” and “update(d)” in CVS comment messages. • If a version entry in CVS contains one or more of these tokens and the tokens are followed by numbers, the version entry is treated as a bug-fixing update. • Those numbers are treated as bug IDs. • Confirm each bug ID against the Bugzilla database.
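The token-matching steps above can be sketched as follows. This is a minimal illustration, not the tooling used in the study: the exact regular expression, the function names, and the use of an in-memory set `known_bug_ids` in place of a real Bugzilla lookup are all assumptions.

```python
import re

# "bug", "fix"/"fixed" or "update"/"updated", followed by a number.
# Assumption: whitespace, ':' or '#' may separate the token and the number.
_TOKEN_ID = re.compile(
    r"\b(?:bug|fix(?:ed)?|updated?)\b[\s:#]*(\d+)", re.IGNORECASE
)


def candidate_bug_ids(comment):
    """Extract numbers that follow a bug-related token in a CVS comment."""
    return [int(n) for n in _TOKEN_ID.findall(comment)]


def confirmed_bug_ids(comment, known_bug_ids):
    """Keep only candidates present in `known_bug_ids`, a stand-in for
    checking each ID against the Bugzilla database."""
    return [b for b in candidate_bug_ids(comment) if b in known_bug_ids]


ids = candidate_bug_ids("Fixed 12345: NPE in parser; see also bug #678")
```

A comment such as "Updated copyright notice" yields no candidates, because the token is not followed by a number.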

Results and Discussion: Binary Coding of the Existence of Code Bad Smells (1)

Results and Discussion: Binary Coding of the Existence of Code Bad Smells (2)
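The coding tables from these two slides did not survive in the transcript. One plausible way to code the presence or absence of the five targeted smells as a profile is sketched below. The bit order and the resulting profile numbers are assumptions for illustration; the only property taken from the slides is that profile zero means no smell was detected.

```python
# Order of the smells within the code is an assumption for this sketch.
SMELLS = (
    "Data Clumps",
    "Message Chains",
    "Middle Man",
    "Speculative Generality",
    "Switch Statements",
)


def profile_code(detected):
    """Return a 5-character string with '1' where a smell is present."""
    return "".join("1" if smell in detected else "0" for smell in SMELLS)


def profile_number(detected):
    """Interpret the binary code as an integer; 0 means no smell found."""
    return int(profile_code(detected), 2)


code = profile_code({"Message Chains", "Switch Statements"})
```

With five binary flags there are 32 possible profiles, five of which contain exactly one smell.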

Results and Discussion: One-way Analysis of Variance Eclipse Data (1)

Results and Discussion: One-way Analysis of Variance Eclipse Data (2) • The five profiles that each indicate the existence of a single one of the five Code Bad Smells have a significantly lower mean number of faults than profile zero. • All profiles that have a higher mean number of faults than profile zero contain the Message Chains and the Switch Statements Bad Smells.
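For readers unfamiliar with the test, the F statistic of a one-way analysis of variance compares the variation between group means with the variation within groups. The sketch below computes it for fault counts grouped by profile. The numbers are made-up toy values chosen only to make the arithmetic easy to follow; they are not data from the study, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    Assumes at least two groups and non-zero within-group variation.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy fault counts for three hypothetical profiles (not study data).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

A large F relative to the F distribution with the returned degrees of freedom indicates that at least one profile's mean number of faults differs from the others; a follow-up comparison is needed to say which.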

Results and Discussion: the Message Chains and Switch Statements

Results and Discussion: the Message Chains and Switch Statements • All source code samples associated with more than 10 faults contain the Message Chains Bad Smell. • The Switch Statements Bad Smell does not show a clear relationship with a high number of faults.

Results and Discussion: One-way Analysis of Variance Apache Data (1)

Results and Discussion: One-way Analysis of Variance Apache Data (2) • The five profiles that each indicate the existence of a single one of the five Code Bad Smells have a lower mean number of faults than profile zero. • None of the profiles containing the Message Chains Bad Smell shows a higher mean number of faults than profile zero.

A Detailed Investigation of Message Chains • Objective: • To test whether the Message Chains Bad Smell is directly associated with faults. • To test whether the Message Chains Bad Smell is directly associated with particular types of faults. • Method: • Manually investigate 20 source code samples from the Eclipse project

A Detailed Investigation of Message Chains: Direct Association with Faults

A Detailed Investigation of Message Chains: Fault Classification • Classification Schema: An adapted version of Seaman et al.’s (2008) fault classification schema • Results:

A Detailed Investigation of Message Chains: Result • The Message Chains Bad Smell is not likely to be directly associated with faults, but it indicates a complicated software context. • The Message Chains Bad Smell is likely to be associated with Algorithm/Method faults.

Conclusion • Source code containing only one of the five Code Bad Smells is not likely to be fault-prone. • The Message Chains Bad Smell could cause a high number of faults and is likely to be associated with Algorithm/Method faults, so it deserves further attention. • The Message Chains Bad Smell may not be directly associated with faults, but it may indicate a complicated software context.

Q/A

References • FOWLER, M., BECK, K., BRANT, J., OPDYKE, W. & ROBERTS, D. (1999) Refactoring: Improving the Design of Existing Code, Addison-Wesley. • GAMMA, E., HELM, R., JOHNSON, R. & VLISSIDES, J. (1995) Design Patterns: Elements of Reusable Object-Oriented Software, Reading, Mass., Addison-Wesley. • MENS, T. & TOURWE, T. (2004) A survey of software refactoring. IEEE Transactions on Software Engineering, 30, 126-139. • SEAMAN, C. B., SHULL, F., REGARDIE, M., ELBERT, D., FELDMANN, R. L., GUO, Y. & GODFREY, S. (2008) Defect categorization: making use of a decade of widely varying historical data. Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. Kaiserslautern, Germany, ACM. • ZIMMERMANN, T., PREMRAJ, R. & ZELLER, A. (2007) Predicting Defects for Eclipse. IN PREMRAJ, R. (Ed.) Predictor Models in Software Engineering, 2007. PROMISE'07: ICSE Workshops 2007. International Workshop on.
