Increasing the accuracy of software development effort estimation

Greg Somers, Alex Plachkov • Submitted to Professor Shervin Shirmohammadi in partial fulfillment of the requirements for the course ELG 5100

Problem Statement • Share of successful projects in the software industry (according to the Standish Chaos Report) was 32% in 2009 • To do well, a company needs • Mature development process • Refined over many iterations of completed (and failed) projects • Skilled and experienced managers • Able to draw parallels between upcoming and past projects • Reliable size/effort estimation tools • COnstructive COst MOdel (COCOMO) • Skilled and experienced developers • Usually not the issue

Why Do Software Projects Fail? • Software projects have a dynamic and uncertain nature • Possess features that are hard to understand • Typically, there is a lack of information during the early stages of a project • Can lead to wrong estimates, which in turn can lead to project failure • Conclusion: effort is hard to model with fixed algorithmic approaches • How do we predict software development effort?

Background Info & Related Work • To overcome the aforementioned challenges, non-algorithmic approaches have been employed: • Analogy-Based Estimation (ABE) • Artificial Neural Network (ANN)

1. Analogy-Based Estimation (ABE) • Compiled dataset with data from previous projects • Selecting relevant features (e.g. LOC, FP) • Estimate required • Represent each project as a point in multi-dimensional space (feature vector) • Calculate similarity level between new project and the existing projects (similarity functions) • Euclidean Similarity • Manhattan Similarity • Estimate the new project’s effort - solution functions • Most Similar Project • Median of Similar Projects • Average of Similar Projects
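The ABE steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the inverse-distance form of the similarity functions, the `eps` constant, the choice of `k`, and the toy project data are all assumptions made for the example, and the features are assumed to be already normalised to comparable scales.

```python
import math
from statistics import mean, median

def euclidean_similarity(a, b, eps=1e-9):
    """Similarity as inverse Euclidean distance (assumed form; eps avoids division by zero)."""
    return 1.0 / (math.dist(a, b) + eps)

def manhattan_similarity(a, b, eps=1e-9):
    """Similarity as inverse Manhattan distance (assumed form)."""
    return 1.0 / (sum(abs(x - y) for x, y in zip(a, b)) + eps)

def estimate_effort(target, history, k=3, similarity=euclidean_similarity, solution=mean):
    """Rank historical projects by similarity to the target, then apply a
    solution function (mean, median, ...) to the efforts of the k most similar.

    history is a list of (feature_vector, effort) pairs.
    """
    ranked = sorted(history, key=lambda p: similarity(target, p[0]), reverse=True)
    return solution([effort for _, effort in ranked[:k]])

# Hypothetical historical projects: (feature vector, effort)
history = [
    ((10.0, 100.0), 1000.0),
    ((12.0, 110.0), 1200.0),
    ((50.0, 400.0), 5000.0),
    ((11.0, 105.0), 1100.0),
]
target = (11.0, 104.0)

most_similar = estimate_effort(target, history, k=1)                # most similar project
median_est = estimate_effort(target, history, k=3, solution=median)  # median of similar projects
mean_est = estimate_effort(target, history, k=3, solution=mean)      # average of similar projects
```

Passing the similarity and solution functions as parameters mirrors the slide's structure: each combination of the two lists gives a different ABE variant.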

2. Artificial Neural Network (ANN) • Many different types of ANN used in the literature • Projects represented as points in multi-dimensional space • Network learns the features present in the historical project dataset (training) • Attempts to predict the difficulty of a new project (estimation) • Suffers from inaccurate estimation when inconsistent projects (ones that are not alike) are present in the historical project dataset

Hybrid Solution • Fuzzy C-Means (FCM) Clustering • All historical projects are clustered • Projects that are alike belong to the same cluster • Each cluster is marked (ABE vs ANN) according to the number of features and the number of projects found within it • Target project is compared to the cluster centres using the Euclidean similarity (ES) function • Depending on the cluster mark, the ABE or ANN technique is employed • Neural networks perform well when they are trained using consistent (non-contradictory) training data -- projects found in a single cluster provide this type of data
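The clustering and dispatch idea can be sketched as follows. This is a rough illustration under stated assumptions, not the method from the paper: the farthest-point initialisation, the fuzzifier `m = 2`, and in particular the marking rule (`min_projects_per_feature`) are hypothetical, since the slide only says that marking depends on the number of features and projects in the cluster. The data points are invented.

```python
import math

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Minimal Fuzzy C-Means; returns the cluster centres."""
    n, dim = len(points), len(points[0])
    # Deterministic farthest-point initialisation (an assumption; random init is common).
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centres))
        centres.append(list(far))
    for _ in range(iters):
        # Membership of each point in each cluster, from relative distances.
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centres]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for j in range(c)])
        # Centres become membership-weighted means of the points.
        new_centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centres.append([sum(w[i] * points[i][f] for i in range(n)) / total
                                for f in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres

def nearest_cluster(point, centres):
    """Index of the closest centre (smallest Euclidean distance = highest similarity)."""
    return min(range(len(centres)), key=lambda j: math.dist(point, centres[j]))

def mark_clusters(points, centres, min_projects_per_feature=2):
    """Hypothetical rule: ANN for well-populated clusters, ABE otherwise."""
    counts = [0] * len(centres)
    for p in points:
        counts[nearest_cluster(p, centres)] += 1
    dim = len(points[0])
    return ["ANN" if n >= min_projects_per_feature * dim else "ABE" for n in counts]

# Invented, already-normalised projects: five alike, plus two that differ from them.
projects = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
            (9.0, 9.0), (9.2, 8.8)]
centres = fuzzy_c_means(projects, c=2)
marks = mark_clusters(projects, centres)
technique = marks[nearest_cluster((8.5, 9.0), centres)]  # technique chosen for a new project
```

In this toy data the dense cluster is marked for the ANN and the sparse one for ABE, so a new project near the sparse cluster is routed to ABE.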

Advantages & Disadvantages • Observed advantages of approach • The consistency of the training data for the Neural Network is improved • More accurate predictions • Offers a solution for both high-population and low-population clusters • Observed disadvantages of approach • Large number of projects needed in the historical dataset • Large amount of information per project • There can be cases where a new project is not similar to any existing ones • Still need to make some estimates about the new project • Relevant features

Evaluation Procedure • Datasets • Desharnais dataset • Based on 77 completed software projects from 1989 • 8 features / project • Team experience, manager’s experience, project length, programming language, etc. • Maxwell dataset • Based on 62 software projects from 2002 • 26 features / project • Database, user interface, standards use, install requirements, staff skills, tools use

Evaluation Procedure • Setup • Clustering - Cross validated • Analogy-Based Estimation • Euclidean similarity function & Inverse distance weighted mean • Artificial Neural Network • Feed forward function with 2 levels
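The inverse distance weighted mean named in the ABE setup can be illustrated with a short sketch. The exact weighting used in the evaluation is not given on the slide, so the `1 / distance` weights, the `eps` constant, and the sample analogues below are assumptions.

```python
import math

def idw_mean_effort(target, analogues, eps=1e-9):
    """Inverse-distance-weighted mean of the analogues' efforts.

    analogues is a list of (feature_vector, effort) pairs; closer projects
    get larger weights. eps avoids division by zero for identical projects.
    """
    weights = [1.0 / (math.dist(target, features) + eps) for features, _ in analogues]
    total = sum(weights)
    return sum(w * effort for w, (_, effort) in zip(weights, analogues)) / total

# Two hypothetical analogues at distances 1 and 3 from the target.
analogues = [((0.0, 0.0), 100.0), ((4.0, 0.0), 200.0)]
estimate = idw_mean_effort((1.0, 0.0), analogues)  # approximately 125
```

The closer analogue carries three times the weight of the farther one, so the estimate sits nearer to 100 than to 200.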

Evaluation Procedure • Cross Validation - 3 trials • Each trial uses different training subsets and testing subsets • Comparisons • Compared with 4 variations of ABE, ANN, multiple linear regression (MLR), stepwise regression (SWR), and CART • Performance • Mean magnitude of relative error (MMRE) – lower result is better • Percentage of the prediction (PRED) – higher result is better • PRED(X) = A / N, where N is the number of estimated projects and A is the number of projects with MRE less than or equal to X
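The two performance measures can be computed as in the sketch below. It assumes the usual definition MRE = |actual - estimated| / actual, which the slide does not spell out, and uses invented effort values.

```python
def mre(actual, estimated):
    """Magnitude of relative error for one project (assumed standard definition)."""
    return abs(actual - estimated) / actual

def mmre(actuals, estimates):
    """Mean MRE over all estimated projects; lower is better."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)

def pred(actuals, estimates, x=0.25):
    """PRED(x) = A / N: fraction of projects with MRE <= x; higher is better."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(1 for err in errors if err <= x) / len(errors)

# Invented actual and estimated efforts for four projects.
actuals = [100.0, 200.0, 400.0, 800.0]
estimates = [110.0, 150.0, 400.0, 400.0]

mmre_value = mmre(actuals, estimates)        # MREs are 0.10, 0.25, 0.00, 0.50
pred_25 = pred(actuals, estimates, x=0.25)   # 3 of 4 projects fall within 25%
```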

Scientific Contributions / Results • Desharnais – Mean magnitude of relative error (MMRE) results

Scientific Contributions / Results • Desharnais – Mean magnitude of relative error (MMRE) results

Scientific Contributions / Results • Desharnais – Percentage of the prediction within 25% results

Scientific Contributions / Results • Desharnais – Magnitude of relative error (MRE) box plot

Scientific Contributions / Results • Maxwell – Mean magnitude of relative error (MMRE) results

Scientific Contributions / Results • Maxwell – Percentage of the prediction within 25% results

Scientific Contributions / Results • Maxwell – Magnitude of relative error (MRE) box plot

Scientific Contributions / Results • Percentage Improvement

References • [1] Khatibi Bardsiri, V.; Jawawi, D.N.A.; Hashim, S.Z.M.; Khatibi, E., "Increasing the accuracy of software development effort estimation using projects clustering," IET Software, vol. 6, no. 6, pp. 461-473, Dec. 2012 • [2] D. Galorath (2012, June 7). Software Project Failure Costs Billions, Better Estimation & Planning Can Help [Online]. Available: http://www.galorath.com/wp/software-project-failure-costs-billions-better-estimation-planning-can-help.php • [3] Desharnais, J.: ‘Analyse statistique de la productivitie des projets informatique a partie de la technique des point des fonction’ (Master of Science, University of Montreal, 1989) • [4] Maxwell, K.: ‘Applied statistics for software managers’ (Prentice-Hall, Englewood Cliffs, NJ, 2002)

Questions?
