Prediction of fault-proneness at early phase in object-oriented development

Toshihiro Kamiya†, Shinji Kusumoto†, and Katsuro Inoue†‡ († Osaka University, ‡ Nara Institute of Science and Technology)

Background • Complexity metrics are used to estimate the fault-proneness of software components. • Based on the metric values, we can allocate review/testing effort to the fault-prone components. • Chidamber and Kemerer’s metrics are representative complexity metrics for object-oriented software.

Chidamber and Kemerer’s metrics [1] • C&K metrics evaluate the complexity of classes from the following three viewpoints: • Inheritance complexity • ・ DIT (Depth of inheritance tree) • ・ NOC (Number of children) • Coupling complexity • ・ RFC (Response for a class) • ・ CBO (Coupling between object classes) • Class internal complexity • ・ WMC (Weighted methods per class) • ・ LCOM (Lack of cohesion in methods) • [1] Chidamber, S. R. and Kemerer, C. F., A Metrics Suite for Object Oriented Design, IEEE Trans. on Software Eng., Vol. 20, No. 6, (1994) 476-492.
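DIT and NOC depend only on the inheritance structure, so they can be computed from a simple map of each class to its direct base classes. The sketch below is a minimal illustration, not the tooling used in the study: the class names are hypothetical, a root class is assumed to have depth 0, and under multiple inheritance (allowed in C++) DIT is assumed to be the longest path to a root.

```python
# Minimal sketch: DIT and NOC from an inheritance map.
# Assumptions (not from the slides): hypothetical class names, root depth = 0,
# and DIT under multiple inheritance = longest path up to a root class.

# Each class maps to the list of its direct base classes.
PARENTS = {
    "Message": [],
    "Attachment": [],
    "MailMessage": ["Message"],
    "MimeMessage": ["MailMessage"],
    "SignedMessage": ["MailMessage"],
}


def dit(cls: str, parents: dict[str, list[str]]) -> int:
    """Depth of inheritance tree: longest path from cls up to a root class."""
    bases = parents[cls]
    if not bases:
        return 0
    return 1 + max(dit(base, parents) for base in bases)


def noc(cls: str, parents: dict[str, list[str]]) -> int:
    """Number of children: classes that list cls as a direct base class."""
    return sum(1 for bases in parents.values() if cls in bases)


if __name__ == "__main__":
    for name in PARENTS:
        print(name, "DIT =", dit(name, PARENTS), "NOC =", noc(name, PARENTS))
```

Counting only direct subclasses for NOC matches the "number of children" wording; descendants further down the tree are reflected in their own DIT instead.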

Evaluation of C&K metrics • Several studies have evaluated the usefulness of C&K metrics. • ・ Chidamber and Kemerer confirmed that C&K metrics satisfy Weyuker’s properties [1]. • ・ Basili et al. showed empirically that the C&K metric suite is a better predictor of the fault-proneness of classes than traditional code metrics [2]. • ・ Briand et al. discussed several design metrics, including C&K metrics [3]. • [2] Basili, V. R., Briand, L. C., and Mélo, W. L., A Validation of Object-Oriented Design Metrics as Quality Indicators, IEEE Trans. on Software Eng., Vol. 22, No. 10, (1996) 751-761. • [3] Briand, L. C., Daly, J. W., and Wüst, J. K., A Unified Framework for Coupling Measurement in Object-Oriented Systems, IEEE Trans. on Software Eng., Vol. 25, No. 1, (1999) 91-121.

Difficulty in applying C&K metrics to design • In previous research, C&K metrics were applied to source code, because some of them need information, such as algorithms or call relationships, that is determined only late in the design phase. • To allocate review and testing effort efficiently, it is preferable to estimate the fault-prone classes (components) early.

Proposed method • We propose a method to predict fault-proneness at an early phase of object-oriented development. • 1. Introduce four checkpoints into the design and implementation phases. • 2. Determine the metric set available at each checkpoint. • 3. Estimate the fault-proneness of the classes (components) at each checkpoint by multivariate logistic regression analysis. • We empirically evaluate how well the metric sets predict fault-prone classes at each checkpoint.

Introduced checkpoints • CP1: Associations and attributes of classes are determined. • CP2: Derivation, interfaces (methods), and reused classes are determined. • CP3: The algorithm of each method is developed. • CP4: Source code is written. (Figure: development timeline with the phases Analysis, System Design, Object Design, and Implementation.)

Metrics • We use the following metrics in this study. • C&K metrics • ・ DIT, NOC, RFC, CBO, WMC, and LCOM • ・ CBON (Coupling to newly developed classes) • ・ CBOR (Coupling to reused classes) • CBO = CBON + CBOR • Other metrics • ・ NIV (Number of instance variables) • ・ SLOC (Source lines of code)
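The split CBO = CBON + CBOR can be illustrated by partitioning each class's coupled classes into newly developed and reused ones. This is a minimal sketch assuming the set of classes each class is coupled to is already known and that reused classes are listed explicitly; the class names and the helper `coupling_metrics` are hypothetical.

```python
# Minimal sketch: splitting CBO into CBON and CBOR.
# Assumption: for each class we already know the set of other classes it is
# coupled to, and which classes are reused. All names are hypothetical.

COUPLED_TO = {
    "MailServer": {"Mailbox", "Message", "Socket", "Logger"},
    "Mailbox": {"Message", "List"},
}
REUSED = {"Socket", "Logger", "List"}


def coupling_metrics(cls: str, coupled_to: dict[str, set[str]],
                     reused: set[str]) -> tuple[int, int, int]:
    """Return (CBON, CBOR, CBO) for cls; CBO = CBON + CBOR by construction."""
    partners = coupled_to.get(cls, set()) - {cls}  # ignore self-coupling
    cbor = len(partners & reused)   # coupled classes that are reused
    cbon = len(partners - reused)   # coupled classes that are newly developed
    return cbon, cbor, cbon + cbor


if __name__ == "__main__":
    print(coupling_metrics("MailServer", COUPLED_TO, REUSED))  # prints (2, 2, 4)
```

Because every coupled class is either reused or newly developed, the two counts always add up to CBO, which is why CBON alone can be measured before reuse decisions are made.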

Checkpoints and metric sets • CP1: Associations and attributes of classes are determined. • { CBON, NIV } • CP2: Derivation, interfaces (methods), and reused classes are determined. • { CBON, NIV, CBOR, CBO, WMC, DIT, NOC } • CP3: The algorithm of each method is developed. • { CBON, NIV, CBOR, CBO, WMC, DIT, NOC, RFC, LCOM } • CP4: Source code is written. • { CBON, NIV, CBOR, CBO, WMC, DIT, NOC, RFC, LCOM, SLOC } (Figure: development timeline with the phases Analysis, System Design, Object Design, and Implementation.)
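Each checkpoint's metric set is the previous set plus the metrics that first become measurable at that checkpoint. A minimal sketch of that cumulative structure follows; the metric names and groupings come from the slide, while the helper functions and the example metric record are hypothetical.

```python
# Minimal sketch: the cumulative metric sets available at each checkpoint.
# Metric names and groupings follow the slide; the helpers and the example
# record are hypothetical.

# Metrics that first become available at each checkpoint.
NEW_AT_CHECKPOINT = {
    "CP1": ["CBON", "NIV"],
    "CP2": ["CBOR", "CBO", "WMC", "DIT", "NOC"],
    "CP3": ["RFC", "LCOM"],
    "CP4": ["SLOC"],
}


def metric_sets(new_at: dict[str, list[str]]) -> dict[str, list[str]]:
    """Each checkpoint's set = the previous checkpoint's set + newly available metrics."""
    sets: dict[str, list[str]] = {}
    available: list[str] = []
    for checkpoint, new_metrics in new_at.items():  # dicts keep insertion order
        available = available + new_metrics  # new list, earlier sets stay unchanged
        sets[checkpoint] = available
    return sets


def select_metrics(record: dict[str, float], checkpoint: str,
                   sets: dict[str, list[str]]) -> dict[str, float]:
    """Keep only the metric values that can be measured at the given checkpoint."""
    return {name: record[name] for name in sets[checkpoint]}


if __name__ == "__main__":
    sets = metric_sets(NEW_AT_CHECKPOINT)
    record = {"CBON": 3, "NIV": 5, "CBOR": 2, "CBO": 5, "WMC": 8, "DIT": 1,
              "NOC": 0, "RFC": 14, "LCOM": 6, "SLOC": 220}  # hypothetical class
    print(select_metrics(record, "CP2", sets))
```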

Estimation of fault-proneness of classes • “Multivariate logistic regression is a standard technique based on maximum likelihood estimation, to analyze the relationships between measures and fault-proneness of classes.” • Example with two metrics: $$P_1 = \frac{e^{C_0 + C_1 \cdot \mathrm{CBO} + C_2 \cdot \mathrm{NIV}}}{1 + e^{C_0 + C_1 \cdot \mathrm{CBO} + C_2 \cdot \mathrm{NIV}}}$$ • P1: fault-proneness (probability that a fault is detected in the class) • CBO, NIV: metric values • C0, C1, C2: coefficients • If P1 of the target class > 0.5, the class is predicted to be faulty.
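The prediction rule can be written directly as code: compute P1 from the coefficients and metric values, then compare it with the 0.5 threshold. This is a minimal sketch; the coefficient values in the example are hypothetical placeholders, not the fitted coefficients from the study, and the function names are illustrative.

```python
import math

# Minimal sketch of the logistic prediction rule. The coefficients used in the
# example are hypothetical placeholders, not values fitted in the study.


def fault_proneness(c0: float, coefs: list[float], values: list[float]) -> float:
    """P1 = e^z / (1 + e^z) with z = C0 + C1*x1 + C2*x2 + ...

    The two branches are algebraically equal; choosing by sign of z avoids
    overflow in math.exp for large |z|.
    """
    if len(coefs) != len(values):
        raise ValueError("need exactly one coefficient per metric value")
    z = c0 + sum(c * x for c, x in zip(coefs, values))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predicted_faulty(p1: float, threshold: float = 0.5) -> bool:
    """A class is predicted faulty when P1 exceeds the threshold (0.5 on the slide)."""
    return p1 > threshold


if __name__ == "__main__":
    # Hypothetical C0, C1 (for CBO), C2 (for NIV); the class has CBO = 4, NIV = 5.
    p1 = fault_proneness(-2.0, [0.5, 0.1], [4, 5])
    print(round(p1, 3), predicted_faulty(p1))  # prints 0.622 True
```

The same function covers every checkpoint: only the length of the coefficient and value lists changes with the metric set.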

Outline of the experiment • We empirically evaluate the proposed method using data collected from an experimental project. • ・ The experimental project was carried out at a computer company over five days in August 1997. • ・ The developers were new employees who had completed on-the-job training in object-oriented design and C++ programming. • ・ The developer teams each developed the same e-mail delivery system in C++.

Experimental data • Fault tracking data • ・ Location • ・ Type • ・ Effort to fix • Metric data • ・ Metric values of the developed classes • As a result, data on 141 classes and the 80 faults found in them were collected.

Statistics of empirical data

Prediction by metrics (1/2) • Using the collected data, we predict the fault-prone classes at each checkpoint. (Figure: prediction at CP1)
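Estimating the coefficients C0, C1, ... from collected data is a maximum-likelihood problem. The sketch below fits them with plain batch gradient ascent on the log-likelihood, a deliberately simple substitute for the Newton-Raphson (IRLS) solvers that statistics packages typically use. The toy data are invented for illustration and are not the study's data; `fit_logistic` and `log_likelihood` are hypothetical helper names.

```python
import math

# Minimal sketch: maximum-likelihood fit of a logistic model by batch gradient
# ascent. Toy data are invented; they are NOT the data from the study.


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(w: list[float], X: list[list[float]], y: list[int]) -> float:
    """Log-likelihood of labels y (1 = faulty) under coefficients w = [C0, C1, ...]."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = _sigmoid(w[0] + sum(c * x for c, x in zip(w[1:], xi)))
        total += math.log(p) if yi else math.log(1.0 - p)
    return total


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.1, epochs: int = 5000) -> list[float]:
    """Return [C0, C1, ...] approximately maximising log_likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - _sigmoid(w[0] + sum(c * x for c, x in zip(w[1:], xi)))
            grad[0] += err                  # d(log L)/dC0
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj     # d(log L)/dCj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy data: one metric value (e.g. CBON) per class, 1 if a fault was found.
    X = [[0], [1], [2], [3], [4], [5], [6], [7]]
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    w = fit_logistic(X, y)
    print("C0 = %.3f, C1 = %.3f" % (w[0], w[1]))
```

Fitting one such model per checkpoint, each with that checkpoint's metric set as the columns of X, mirrors the per-checkpoint estimation described above.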

Prediction by metrics (2/2)

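Indicators for evaluation • To illustrate the precision of the estimation, two indicators are used [2]. • Completeness: the percentage of actually faulty classes that are correctly predicted to be faulty. $$\text{Completeness} = \frac{\#(\text{predicted faulty} \cap \text{actually faulty})}{\#\,\text{actually faulty}} \times 100\%$$ • Correctness: the percentage of classes predicted to be faulty that are actually faulty. $$\text{Correctness} = \frac{\#(\text{predicted faulty} \cap \text{actually faulty})}{\#\,\text{predicted faulty}} \times 100\%$$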
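Both indicators share the same numerator, the classes that are predicted faulty and actually faulty, and differ only in the denominator. A minimal sketch with classes represented by name follows; the class names in the example are hypothetical.

```python
# Minimal sketch of completeness and correctness over sets of class names.
# The class names in the example are hypothetical.


def completeness(predicted: set[str], actual: set[str]) -> float:
    """Percentage of actually faulty classes that were predicted faulty."""
    if not actual:
        raise ValueError("completeness is undefined when no class is actually faulty")
    return 100.0 * len(predicted & actual) / len(actual)


def correctness(predicted: set[str], actual: set[str]) -> float:
    """Percentage of predicted-faulty classes that are actually faulty."""
    if not predicted:
        raise ValueError("correctness is undefined when no class is predicted faulty")
    return 100.0 * len(predicted & actual) / len(predicted)


if __name__ == "__main__":
    predicted = {"Mailbox", "Parser", "Smtp", "Config"}
    actual = {"Mailbox", "Parser", "Queue"}
    print(completeness(predicted, actual))  # 2 of 3 faulty classes found
    print(correctness(predicted, actual))   # 2 of 4 predictions correct
```

Predicting every class as faulty drives completeness to 100% while lowering correctness, so the two are read together.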

Precision of estimation • On the whole, the precision of estimation improves as the process progresses. • Correctness is relatively high at all checkpoints, so the estimation can be used to ‘seed’ the faulty classes. • Completeness improves at later checkpoints. The estimation at CP2 does well.

Conclusion • We have proposed a method to predict fault-proneness at an early phase of object-oriented development and evaluated it empirically. • As future work, we plan to: • ・ Use other metrics in the proposed method. • ・ Develop a tool that supports the proposed method.

Weyuker’s properties [4] • Let (c) denote a measurement of metric  for class c, and p + q denote the combined class of class p and q, • W1pq, (p) (q). • W2pq, (p) = (q), and p differs from q. • W3pq, (p) (q), and p's functionality is equal to q's (but p's design differs from q's). • W4pq, (p) (p + q), and (q) (p + q). • W5pq  r, (p) = (q), and (p + r) (q + r). • W6pq, (p) + (q) (p + q). • Chidamber and Kemerer proved that each metric WMC, DIT, NOC, CBO, RFC, and LCOM satisfies W1, ..., W6, except for NOC and LCOM which do not satisfy W4. • [4]Weyuker, E. J., Evaluating software complexity measures, IEEE Trans. on Software Eng. Vol. 14, No. 9, (1998), 1357-1365.

Coefficients at each checkpoint
