A Systematic Review of Cross- vs. Within-Company Cost Estimation Studies

EEL 6883 Research Paper Presentation A Systematic Review of Cross- vs. Within-Company Cost Estimation Studies Barbara Kitchenham, Emilia Mendes, Guilherme Travassos IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 33, NO. 5, MAY 2007 Mustafa Ilhan Akbas – Omer Bilal Orhan

Outline • Motivation • Objective • Method • Results • Conclusions • Comments

Motivation Cross- vs. within-company cost estimation • Early studies suggested calibrating general-purpose cost estimation models using only single-company data. BUT collecting such data has drawbacks: • Time required to collect data • Older projects may not reflect current technology • Care is necessary in data collection • Therefore, cross-company models are favored. BUT the empirical evidence is mixed:

Motivation • 1999 Maxwell: within-company model is more accurate • 1999 Briand: cross-company model could be as accurate • 2000 Briand (with Maxwell data): cross-company model can be as good as within-company model • 2002 Wieczorek and Ruhe: same trend with Briand data • 2005 Mendes: same trend with another data set BUT: • 2000, 2001 Jeffery: within-company models are superior • 2003 Lefley and Shepperd: within-company model is more accurate with Briand data • 2004 Mendes and Kitchenham: within-company models are significantly better

Motivation The evidence on whether cross-company models can be applied to estimate effort for single-company projects is contradictory.

Objective • To determine under what conditions individual organisations are able to rely on cross-company-based estimation models • To provide advice to researchers about the value of cross-company models.

Method • Conduct a systematic review to determine the factors that influence the outcomes of the studies. • Discuss variations in experimental procedure between studies

Research Questions For the review, the authors follow the approach in Kitchenham’s report “Procedures for Performing Systematic Reviews”. The review is framed by three research questions: • Question one: • What evidence is there that cross-company estimation models are not significantly different from within-company estimation models for predicting effort for software/Web projects?

Research Questions • Question two: • What characteristics of the study data sets and the data analysis methods used in the study affect the outcome of within-company and cross-company effort estimation accuracy studies? • Question three: • Which experimental procedure is most appropriate for studies comparing within-company and cross-company effort estimation models?

Method: Population, Intervention, Comparison, Outcome • Population: Cross-company benchmarking databases of Web and software projects • Intervention: Effort estimation models constructed from cross-company data, used to predict effort for single-company projects • Comparison Intervention: Effort estimation models constructed from the within-company data only • Outcome: The accuracy of the cross- and within-company models

Search Strategy used for Primary Studies • The search terms were constructed using the following strategy: • Derive major terms from the questions by identifying the population, intervention and outcome; • Identify alternative spellings and synonyms for major terms, consulting field experts and/or subject librarians to identify the terms; • Check the keywords in any relevant papers we already have; • Use the Boolean OR to incorporate alternative spellings and synonyms; • Use the Boolean AND to link the major terms from population, intervention and outcome.

The main search terms: • Population: software, Web, project. • Intervention: cross-company, project, effort, estimation, model. • Comparison: single-company, project, effort, estimation, model. • Outcomes: prediction, estimate, accuracy

Sample search string: (software OR application OR product OR Web) AND (method OR process OR system OR technique OR methodology OR procedure) AND (cross company OR multi organisation OR within organisation OR single company OR single-organisational OR company-specific) AND (model) AND (effort OR cost) AND (estimation OR prediction OR assessment) The complete set of search strings is given in the paper.
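The OR-within-groups, AND-across-groups rule above can be sketched in a few lines. This is a minimal illustration, not the authors’ tooling: the function names are hypothetical, the term groups are a subset of the sample string, and quoting multi-word phrases is an assumption (each database has its own phrase syntax).

```python
def or_group(terms):
    """Join synonyms/alternative spellings with OR.
    Assumption: multi-word phrases are quoted; real database syntax varies."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_search_string(groups):
    """Link one OR-group per major term (population, intervention, outcome) with AND."""
    return " AND ".join(or_group(g) for g in groups)


# Term groups taken from the sample search string (subset, for illustration).
groups = [
    ["software", "application", "product", "Web"],
    ["cross company", "multi organisation", "single company", "company-specific"],
    ["effort", "cost"],
    ["estimation", "prediction", "assessment"],
]
print(build_search_string(groups))
```

Keeping each synonym list separate makes it easy to add a term suggested by an expert or found in a known paper without touching the rest of the query.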

Initial Search Phase • Identification of candidate primary sources based on the authors’ knowledge and on searches of electronic databases using the derived search strings • 1344 papers were retrieved; 25 of them corresponded to the 10 papers already known to the authors. • Manual scan of the titles and/or abstracts of all 1344 papers

Databases/Journals Searched (from an earlier work) • Electronic Databases • INSPEC • Ei Compendex • Science Direct • Web of Science • IEEE Xplore • ACM Digital Library • Individual journals (J) and conference proceedings (C) • Empirical Software Engineering (J) • Information and Software Technology (J) • Software Process Improvement and Practice (J) • Management Science (J) • International Software Metrics Symposium (C) • International Conference on Software Engineering (C) • Evaluation and Assessment in Software Engineering (manual search) (C)

Secondary search phase Has two sub-phases: • Review the references of each primary source to find further candidate primary sources, repeating until no further relevant documents are found. • Contact researchers who authored the primary sources found in the first phase, or who might be working on the topic. Six researchers were contacted; none of them was working in the area.

Study Selection • Criteria for including a primary study: • Any study that compared predictions of cross-company models with predictions of within-company models, based on analysis of single-company project data. • Criteria for excluding a primary study: • Projects were collected from only a small number of different sources • Models derived from a within-company data set were compared with predictions from a general cost estimation model.

Study Quality Assessment • Part 1: The quality of the study itself. • Has four top-level questions and an additional quality issue related to data set size. (Weight: 1.5) • Is the data analysis process appropriate? • Did studies carry out a sensitivity or residual analysis? • Were accuracy statistics based on the raw data scale? • How good was the study comparison method?

Study Quality Assessment • Part 2: The quality of the provided reporting. • Has four top-level questions. (Weight: 1) • Is it clear what projects were used to construct each model? • Is it clear how accuracy was measured? • Is it clear what cross-validation method was used? • Were all model construction methods fully defined?
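A weighted quality score can be sketched from the two parts above. This is a hypothetical illustration: the Yes/Partly/No answer scale and the rule of multiplying each part’s total by its weight are assumptions, not the review’s exact scoring procedure.

```python
# ASSUMPTION: hypothetical answer scale; the review's own scoring rules may differ.
ANSWER_SCORE = {"yes": 1.0, "partly": 0.5, "no": 0.0}
STUDY_WEIGHT = 1.5      # Part 1: quality of the study itself
REPORTING_WEIGHT = 1.0  # Part 2: quality of the reporting


def quality_score(study_answers, reporting_answers):
    """Weighted sum of checklist answers for one primary study."""
    part1 = sum(ANSWER_SCORE[a] for a in study_answers)
    part2 = sum(ANSWER_SCORE[a] for a in reporting_answers)
    return STUDY_WEIGHT * part1 + REPORTING_WEIGHT * part2


def max_score(n_study_items=5, n_reporting_items=4):
    """Best possible score: Part 1 has four top-level questions plus the
    data-set-size issue; Part 2 has four questions."""
    return STUDY_WEIGHT * n_study_items + REPORTING_WEIGHT * n_reporting_items


# Hypothetical answers for one study.
score = quality_score(["yes", "yes", "partly", "no", "yes"],
                      ["yes", "yes", "no", "partly"])
print(score, "of", max_score())
```

Dividing by `max_score()` gives a 0 to 1 value that is easier to compare across studies.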

Quality • Quality is used in two different ways: • as a score, to ensure that the results are not largely confounded with quality; • as an indicator of possible sources of difference between studies. • Quality refers to the quality of the study, not of the model used. • The overall quality is good. • The factors that varied between papers were data set size, the prediction method, and whether sensitivity analyses were performed.

Data Extraction Strategy • For each paper, reviewers were nominated at random as data extractor, checker, and adjudicator. • Extractor: reads the paper and completes the form • Checker: reads the paper and verifies the correctness of the form • Adjudicator: if the first two disagree, reads the paper and gives the final decision.

Data Extraction Strategy • Roles were assigned at random with the following restrictions: • No one should be data extractor on a paper he/she authored. • All reviewers should have an equal work load (as far as possible).
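The two restrictions above can be sketched as a constrained random assignment. This is a minimal sketch, not the authors’ procedure: reviewer names, paper IDs and the authorship map are hypothetical, and workload is balanced greedily on the extractor role only, so the balance is “as far as possible” rather than guaranteed.

```python
import itertools
import random

ROLES = ("extractor", "checker", "adjudicator")


def assign_roles(papers, reviewers, authors, seed=None):
    """Randomly assign three distinct reviewers to the three roles for each paper.

    papers:    list of paper IDs
    reviewers: list of at least three reviewer names
    authors:   dict paper -> set of reviewers who co-authored it (hypothetical)
    """
    rng = random.Random(seed)
    extract_load = {r: 0 for r in reviewers}
    assignment = {}
    for paper in papers:
        own = authors.get(paper, set())
        # Every ordering of reviewers onto (extractor, checker, adjudicator)
        # in which the extractor did not author the paper.
        candidates = [p for p in itertools.permutations(reviewers, len(ROLES))
                      if p[0] not in own]
        if not candidates:
            raise ValueError(f"no eligible extractor for {paper}")
        # Keep workload even: prefer extractors with the fewest extractions so far.
        lightest = min(extract_load[p[0]] for p in candidates)
        candidates = [p for p in candidates if extract_load[p[0]] == lightest]
        chosen = rng.choice(candidates)
        extract_load[chosen[0]] += 1
        assignment[paper] = dict(zip(ROLES, chosen))
    return assignment, extract_load


reviewers = ["R1", "R2", "R3"]
papers = [f"S{i}" for i in range(1, 11)]
authors = {"S2": {"R1"}, "S5": {"R2"}, "S9": {"R3"}}  # hypothetical authorship
assignment, load = assign_roles(papers, reviewers, authors, seed=1)
print(load)
```

Filtering ineligible permutations before the random draw keeps the choice random among the assignments that satisfy both restrictions.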

Results: Question 1 What evidence is there that cross-company estimation models are not significantly different from within-company estimation models for predicting effort for software/Web projects?

Results: Question 1 The studies are organized into three groups: • Cross-company models are not significantly different from within-company models (4 out of 10). • Cross-company models are significantly worse than within-company models; all accuracy statistics are better for the within-company models (4 out of 10). • Studies that did not undertake formal statistical testing are inconclusive (2 out of 10: S1 and S7).

Results: Question 1 • The four studies stating that cross-company models are not significantly different use leave-one-out cross-validation, which is biased in favour of within-company models. • S6 is not independent (it uses S2 data), so it cannot be used as evidence for group 1. • S1 and S7 did not test statistical significance; they are regarded as inconclusive and cannot be used as evidence either.
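The asymmetry between the two kinds of prediction can be sketched as follows: under leave-one-out, the within-company model is refitted on n−1 of the company’s own projects for each prediction, while the cross-company model is fitted once on external data. This is a minimal sketch assuming a single size measure and a log-log regression (effort = e^a · size^b); it is not the method of any particular primary study, and the naive back-transformation applies no log-bias correction.

```python
import math


def fit_loglog(sizes, efforts):
    """OLS fit of ln(effort) = a + b * ln(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict(model, size):
    """Predict effort on the raw scale (naive back-transform, no bias correction)."""
    a, b = model
    return math.exp(a + b * math.log(size))


def loo_within_predictions(sizes, efforts):
    """Leave-one-out: refit the within-company model without project i, then predict i."""
    preds = []
    for i in range(len(sizes)):
        train_s = sizes[:i] + sizes[i + 1:]
        train_e = efforts[:i] + efforts[i + 1:]
        preds.append(predict(fit_loglog(train_s, train_e), sizes[i]))
    return preds


def cross_predictions(cross_sizes, cross_efforts, company_sizes):
    """Cross-company model: fitted once on external data, applied to every company project."""
    model = fit_loglog(cross_sizes, cross_efforts)
    return [predict(model, s) for s in company_sizes]
```

Both functions return raw-scale predictions, so their residuals can be compared directly.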

Results: Question 2 • What characteristics of the study data sets and the data analysis methods used in the study affect the outcome of within-company and cross-company effort estimation accuracy studies? • S10 contradicts the claim that quality control makes cross-company models as good as within-company models. • S3 and S1 take a different view of quality control (ESA database): quality control isn’t reliable. • S2 and S6 both state that stringent quality control was applied to data collection. Quality control cannot ensure that cross-company models perform as well as within-company models.

Results: Question 2 • No consistent evidence that the quality of the studies influences the results • S2 and S3 have lower scores • S10 has the highest quality score

Results: Question 2 • Number of projects in the within-company models. • There is a noticeable difference in this number when S2, S3, S10 (median 63) and S4, S5, S8, S9 (median 10) are compared. • All the studies where within-company predictions were significantly better than cross-company predictions used small within-company data sets of fair quality. • A similar pattern applies to the range of effort values for the entire database.

Results: Question 2 • Number of projects in the within-company models. • No clear patterns were observed for the size metrics used, nor for the procedure used to build the within-company model

Results: Question 2 The relationship between within-company and cross-company projects. Tukutuku suggests that the greater the difference between projects, the less likely it is that the cross-company model will provide accurate predictions for a single-company project. There is no clear indication that the strength of the cross-company relationship is a major factor in determining whether cross-company prediction models are as good as within-company models.

Results: Question 3 Which experimental procedure is most appropriate for studies comparing within-company and cross-company estimation models? There is a large variation in the adopted procedures.

Results: Question 3 Studies aimed at assessing the conditions that would favor (or not) the use of a cross-company model should adopt the following procedure: • Use new within-company data sets independent of existing cross-company data sets • Perform sensitivity analysis using residual analysis for non-regression-based methods and influence analysis for regression-based methods. • Use regression analysis as the default model construction method. • Use a stepwise approach on the cross-company data based on the variables collected in the within-company data set. • Apply data transformations appropriate to the specific application • Perform statistical tests based on the absolute residuals on the raw data scale. • Report the residuals for each model or the effort.
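The step “statistical tests based on the absolute residuals on the raw data scale” can be sketched as a paired comparison over the same single-company projects. Studies commonly use the Wilcoxon signed-rank test for this (for example `scipy.stats.wilcoxon`); to stay dependency-free, the sketch below swaps in a simpler exact two-sided sign test. The data values are hypothetical.

```python
from math import comb


def abs_residuals(actual, predicted):
    """Absolute residuals |actual - predicted| on the raw effort scale."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def sign_test(res_a, res_b):
    """Exact two-sided paired sign test on two lists of absolute residuals.

    Ties (equal residuals) are dropped. Returns (wins_a, wins_b, p_value),
    where a "win" means that model's residual is smaller for that project.
    """
    wins_a = sum(1 for x, y in zip(res_a, res_b) if x < y)
    wins_b = sum(1 for x, y in zip(res_a, res_b) if y < x)
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


# Hypothetical single-company projects (raw effort, e.g. person-hours).
actual = [100, 200, 300, 400, 500, 600]
within = [110, 190, 310, 390, 505, 610]
cross = [150, 260, 220, 480, 650, 500]
print(sign_test(abs_residuals(actual, within), abs_residuals(actual, cross)))
```

Working with absolute residuals rather than ratio-based measures such as MRE keeps the comparison on the raw data scale, as the procedure recommends.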

Results: Question 3 The authors are unable to provide definitive advice on cross-validation, but they believe that leave-one-out cross-validation is not a sufficiently stringent criterion.

Conclusions Some organizations would benefit from models derived from cross-company databases, while others would not. The review cannot conclusively explain why, but it shows some trends.

Conclusions Some trends: • In all cases where within-company models significantly outperformed cross-company models, the within-company data sets were small and the cross-validation method was not very stringent. • In all studies that found no significant difference between the two, the within-company data set was a subset of the cross-company data set. • In contrast, in half of the studies that found within-company models significantly better, the within-company data sets had been collected separately.

Conclusions Authors’ advice: Consider the similarity of the projects in the cross-company data set to your own project and the characteristics of your own company. Further research is required. To researchers: come to a consensus about the appropriate experimental procedure for this type of study (the authors suggest their own procedure).

Comments • No previous reviews on the same topic had been conducted. • The review criteria are not well defined. • Only 6 of the 10 studies give results for Q1. • There are no definitive results. • There is no information about company size for some projects. • If the projects undertaken in a company are similar to those in the cross-company data set, the cross-company model can be used; but deciding this similarity is another problem. • The authors contributed to 3 of the 10 studies. • The paper does not move far beyond its starting point.
