CLUSTERING SUPPORT FOR FAULT PREDICTION IN SOFTWARE
Maria La Becca
Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy
marialabecca@gmail.com
Fault Prediction Approaches (INTRODUCTION)
• Process Metrics
• Product Metrics
• Component || Package Level Fault Predictors
• SW • Quality • Testing • Refactoring
GOAL
• NEW SW CLUSTERING APPROACH: LEXICAL & STRUCTURAL INFORMATION
• Cluster Level Predictor
• FAULT PREDICTION MODELS
Software Clustering Approach: Steps (SOFTWARE CLUSTERING)
Information used:
• Lexical
• Structural
Steps:
• 1 - CORPUS CREATION: Identifiers & Comment Terms of the OO SW System, collected as documents (Terms D1, Terms D2, ..., Terms Di, ..., Terms Dn)
• 2 - CORPUS NORMALIZATION: Splitting Identifiers, Special Token Elimination, Stop Word Removal, Stemming
• 3 - CORPUS INDEXING: Vector Space Model (VSM), Term by Document Matrix
• 4 - COMPUTING SIMILARITIES: Lexically Similar
• 5 - EXTRACTING DEPENDENCIES: JRipples, Structurally Dependent
• 6 - CLUSTERING: G’ = (V, E, ω), BorderFlow Algorithm
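Steps 2 to 4 can be illustrated with a minimal sketch in plain Python. It is not the tooling used in the study: the stop-word list, the crude suffix-stripping `stem` function, the use of raw term frequencies, cosine similarity as the lexical measure, and the three toy classes are all assumptions made for illustration. Dependency extraction and the BorderFlow clustering (steps 5 and 6) are not covered.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (language keywords plus common English words).
STOP_WORDS = {"the", "a", "of", "to", "and", "is", "in", "get", "set",
              "class", "void", "int", "string"}

def split_identifier(identifier):
    """Split camelCase and snake_case identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return [p.lower() for p in re.split(r"[^A-Za-z]+", spaced) if p]

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Tokenize, split identifiers, drop stop words and special tokens, stem."""
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        for word in split_identifier(token):
            if word not in STOP_WORDS and len(word) > 1:
                terms.append(stem(word))
    return terms

def term_document_matrix(corpus):
    """Map each document name to a term-frequency vector over a shared vocabulary."""
    counts = {name: Counter(normalize(text)) for name, text in corpus.items()}
    vocabulary = sorted(set().union(*counts.values()))
    vectors = {name: [c[t] for t in vocabulary] for name, c in counts.items()}
    return vocabulary, vectors

def cosine(u, v):
    """Cosine similarity between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpus: identifiers and comments of three hypothetical classes.
corpus = {
    "XmlParser": "class XmlParser { // parses the xml file\n void parseFile(String fileName) }",
    "XmlWriter": "class XmlWriter { // writes the xml file\n void writeFile(String fileName) }",
    "Account": "class Account { // bank account balance\n int getBalance() }",
}
vocabulary, vectors = term_document_matrix(corpus)
sim_xml = cosine(vectors["XmlParser"], vectors["XmlWriter"])
sim_other = cosine(vectors["XmlParser"], vectors["Account"])
print(f"XmlParser~XmlWriter: {sim_xml:.2f}, XmlParser~Account: {sim_other:.2f}")
```

Similarities of this kind could serve as the weights ω on the lexical edges of the graph G’ = (V, E, ω), though how the weights are actually derived is not stated on the slide.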
Fault Prediction Models (FAULT PREDICTION MODELS)
• Classes
• Lexically Similar
• Structurally Dependent
• Product Metrics
• Multivariate Linear Regression
• Logistic Regression
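As an illustration of the logistic regression family of models, the sketch below fits a fault classifier on two product metrics by batch gradient descent. The metric names, the scaled values, the learning rate, and the number of epochs are hypothetical, and the fitting procedure is an assumption chosen for brevity, not the one used in the study.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    """Probability that an entity with metric vector x is faulty."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit [bias, w1, ..., wn] by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            grad[0] += error
            for j, value in enumerate(x):
                grad[j + 1] += error * value
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical product metrics per cluster, scaled to [0, 1]: (size, coupling).
metrics = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
faulty = [0, 0, 0, 1, 1, 1]  # 1 = at least one fault recorded

weights = fit_logistic(metrics, faulty)
predictions = [1 if predict_proba(weights, x) > 0.5 else 0 for x in metrics]
print(predictions)
```

The same fitting code applies whether a row describes a class, a package, or a cluster; only the granularity at which the metrics are collected changes.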
Definition and Context (CASE STUDY)
Baseline Approach (Class & Package) VS SW Clustering Approach, each producing Fault Prediction Models
Different (!=):
• Cluster Granularity Level
• Class & Package Granularity Level
Same (=):
• Metrics
• SWLR - LGR
RQ: Does the cluster-level approach improve fault prediction as compared with the baseline (i.e., class and package level)?
Dataset:
• Source Code
• 15 Release & Popularity
• SW Metrics & Fault
Planning (CASE STUDY)
• OO SW System
• Previous Knowledge
• Selected Variables
• Fault Prediction
• Empiric Evaluation
• INTRA and INTER analysis: Training Set and Test Set drawn from releases X.0 and X.1
Validation and Evaluation: Intra- & Inter-Release Analysis (CASE STUDY)
• K-Fold Cross Validation: K rounds, with results averaged over the rounds
• Purpose: to assess and compare the SWLR and LGR predictors
• Intra-Release: Training Set and Test Set both come from the dataset of Version X.0
• Inter-Release: Training Set and Test Set come from the datasets of V X.0 and V X.1
Measures (for SWLR Models, SWLR Predictors, LGR Models, LGR Predictors):
• SAR
• Kendall τ & Spearman ρ, in [-1; +1] (close to 1)
• Precision, Recall, F-measure
• AIC & RD (lower values indicate better goodness of fit)
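The intra-release protocol (K rounds, measures averaged over the rounds) can be sketched as follows. The data and the threshold predictor are hypothetical stand-ins, not SWLR or LGR, and the folds are taken in order without shuffling, which is a simplifying assumption.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for k rounds; each item is tested once."""
    indices = list(range(n_items))
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def precision_recall_f(actual, predicted):
    """Precision, recall and F-measure for binary labels (1 = faulty)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical data: one metric value and one fault label per entity of a release.
metric = [3, 9, 1, 7, 2, 8, 4, 10, 5, 6]
faulty = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = []
for train, test in k_fold_splits(len(metric), k=5):
    # Toy predictor (an assumption): flag values above the training-set mean.
    threshold = sum(metric[i] for i in train) / len(train)
    predicted = [1 if metric[i] > threshold else 0 for i in test]
    actual = [faulty[i] for i in test]
    scores.append(precision_recall_f(actual, predicted))

# Average each measure over the k rounds.
avg_precision, avg_recall, avg_f = (sum(col) / len(scores) for col in zip(*scores))
print(avg_precision, avg_recall, avg_f)
```

Values of precision, recall, and F-measure lie in [0, 1], with values close to 1 indicating better predictors.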
Results (RESULTS)
• OO Software System, INTRA- and INTER-RELEASE
• 6 PREDICTORS: 3 SWLR (CLUSTER + BASELINE) and 3 LGR (CLUSTER + BASELINE)
• Legend: Best Values, Worst Values, No Prevalence
Thanks (CONCLUSION)
Acknowledgements: Carmine Gravino, Andrian Marcus, Tim Menzies, Giuseppe Scanniello