Software Cost Estimation
Strictly speaking, effort!
Dept. of Computer Engineering, Gangneung National University, Kwon Ki Tae
Agenda
1. Background
2. “Current” techniques
3. Machine learning techniques
4. Assessing prediction systems
5. Future avenues
1. Background
Scope:
• software projects
• early estimates
• effort ≠ cost
• estimate ≠ expected answer
The Problem
Software developers need to predict, e.g.:
• effort, duration, number of features
• defects and reliability
But ...
• little systematic data
• noise and change
• complex interactions between variables
• poorly understood phenomena
So What is an Estimate?
An estimate is a prediction based upon a probabilistic assessment.
[Figure: probability (p) against effort, marking the most likely value and the value with equal probability of under- and over-estimation]
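Why does "estimate ≠ expected answer"? For a right-skewed effort distribution the most likely value, the 50/50 value and the expected value all differ. A minimal numeric sketch, assuming a hypothetical lognormal effort distribution (the parameters below are invented):

```python
# For a right-skewed distribution: mode (most likely value),
# median (equal chance of under/over-estimating) and mean
# (expected value) are three different numbers.
import math

mu, sigma = math.log(100), 0.6          # invented distribution parameters

mode   = math.exp(mu - sigma ** 2)      # most likely outcome
median = math.exp(mu)                   # equal probability of under / over
mean   = math.exp(mu + sigma ** 2 / 2)  # expected value

print(f"mode={mode:.1f}  median={median:.1f}  mean={mean:.1f}")
# mode=69.8  median=100.0  mean=119.7  -> estimate != expected answer
```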
Some Causes of Poor Estimation • We don’t cope with political problems that hamper the process. • We don’t develop estimating expertise. • We don’t systematically use past experience. Tom DeMarco, Controlling Software Projects: Management, Measurement and Estimation, Yourdon Press, New York, 1982.
2. “Current” Techniques
Essentially, a software cost estimation system maps an input vector of project characteristics to an output: the estimate.
• expert judgement
• COCOMO
• function points
• DIY models
Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.
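As a sketch of that input-vector-to-output framing (the feature names and the toy productivity figure below are hypothetical, not from any published model):

```python
# A cost estimation system viewed abstractly: a function from a vector
# of project characteristics to an effort figure.
from typing import Callable, Mapping

ProjectFeatures = Mapping[str, float]          # e.g. {"kloc": 12.0}
EstimationModel = Callable[[ProjectFeatures], float]

def naive_model(project: ProjectFeatures) -> float:
    """Toy model: assumes a flat 2 person-months per KLOC."""
    return 2.0 * project["kloc"]

model: EstimationModel = naive_model
print(model({"kloc": 12.0}))                   # 24.0 person-months
```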
2.1 Expert Judgement • Most widely used estimation technique • No consistently “best” prediction system • Lack of historical data • Need to “own” the estimate • Experts plus … ?
Expert Judgement Drawbacks
BUT ...
• Lack of objectivity
• Lack of repeatability
• Lack of recall / awareness
• Lack of experts!
Preferable to use more than one expert.
What Do We Know About Experts? • Most commonly practised technique. • Dutch survey revealed 62% of estimators used intuition supplemented by remembered analogies. • UK survey - time to estimate ranged from 5 minutes to 4 weeks. • US survey found that the only factor with a significant positive relationship with accuracy was responsibility.
Information Used • Design requirements • Resources available • Base product/source code (enhancement projects) • Software tools available • Previous history of product • ...
Information Needed • Rules of thumb • Available resources • Data on past projects • Feedback on past estimates • ...
Delphi Techniques?
Methods for structuring group communication processes to solve complex problems.
Characterised by:
• iteration
• anonymity
Devised by the Rand Corporation (1948). Refined by Boehm (1981).
Stages for Delphi Approach
1. Experts receive spec + estimation form
2. Discussion of product + estimation issues
3. Experts produce individual estimate
4. Estimates tabulated and returned to experts
5. Only each expert's personal estimate is identified
6. Experts meet to discuss results
7. Estimates are revised
8. Cycle continues until an acceptable degree of convergence is obtained
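A toy simulation of steps 4-8 (an assumed behavioural model for illustration only; real experts do not simply average): each expert revises part-way toward the tabulated median until the spread is acceptably small.

```python
# Toy Delphi loop: tabulate, feed back the median, revise, repeat until
# an acceptable degree of convergence (small relative spread) is reached.
from statistics import median

def delphi(estimates, pull=0.5, tolerance=0.1, max_rounds=10):
    for round_no in range(1, max_rounds + 1):
        med = median(estimates)
        # each expert moves part-way toward the group median
        estimates = [e + pull * (med - e) for e in estimates]
        spread = (max(estimates) - min(estimates)) / med
        print(f"round {round_no}: median={med:.0f} spread={spread:.2f}")
        if spread <= tolerance:
            break
    return median(estimates)

delphi([10.0, 25.0, 30.0, 45.0, 200.0])   # invented expert estimates
```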
Wideband Delphi Form
Project: X134  Date: 9/17/03  Estimator: Hyolee  Estimation round: 1
[Scale from 0 to 50, with each estimate marked as an x]
Key: x = estimate; x* = your estimate; x! = median estimate
Observing Delphi Groups • Four groups of MSc students • Developing a C++ prototype for some simple scenarios • Requested to estimate the size of the prototype (number of delimiters) • Initial estimates followed by 2 group discussions • Recorded group discussions plus scribes
Delphi Size Estimation Results
Absolute errors:
Estimation   Mean   Median   Min   Max
Initial       371   160.5     23   2249
Round 1       219    40       23    749
Round 2       271    40        3    949
Converging Group
[Figure: group estimates converging toward the true size over successive rounds]
A Dominant Individual
[Figure: group estimates over successive rounds relative to the true size, with a dominant individual present]
2.2 COCOMO
Best known example of an algorithmic cost model.
A series of three models: basic, intermediate and detailed.
The models assume relationships between:
• size (KDSI) and effort
• effort and elapsed time
Barry Boehm, “Software Engineering Economics,” IEEE Transactions on Software Engineering, vol. 10, pp. 4-21, 1984.
http://sunset.usc.edu/COCOMOII/cocomo.html
COCOMO contd.
Model coefficients are dependent upon the type of project:
• organic: small teams, familiar application
• semi-detached: intermediate between the two
• embedded: complex organisation, software and/or hardware interactions
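For concreteness, the basic model can be sketched with the standard published coefficients for the three modes (effort in person-months, size in KDSI, elapsed time in months):

```python
# Basic COCOMO: effort = a * size**b, elapsed time = c * effort**d,
# with the published coefficients for each development mode.
BASIC_COCOMO = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kdsi: float, mode: str = "organic"):
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * kdsi ** b           # person-months
    duration = c * effort ** d       # elapsed months
    return effort, duration

pm, months = basic_cocomo(32, "organic")
print(f"{pm:.0f} PM over {months:.0f} months")   # ~91 PM over ~14 months
```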
COCOMO Cost Drivers
• product attributes
• computer attributes
• personnel attributes
• project attributes
The drivers are hard to validate empirically. Many are inappropriate for the 1990s, e.g. database size. The drivers are not independent, e.g. MODP and TOOL.
COCOMO Assessment
• Very influential, non-proprietary model.
• Drivers help the manager understand the impact of different factors upon project costs.
• Hard to port to different development environments without extensive re-calibration.
• Vulnerable to mis-classification of development type.
• Hard to estimate KDSI at the start of a project.
2.3 What are Function Points?
A synthetic (indirect) measure of the attribute functionality, derived from a software requirements specification. This conforms closely to our notion of specification size.
Uses:
• effort prediction
• productivity
Function Points (a brief history)
Albrecht developed FPs in the mid 1970s at IBM.
A measure of system functionality as opposed to size.
A weighted count of function types derived from the specification:
• interfaces
• inquiries
• inputs / outputs
• files
A. Albrecht and J. Gaffney, “Software function, source lines of code, and development effort prediction: a software science validation,” IEEE Transactions on Software Engineering, vol. 9, pp. 639-648, 1983.
C. Symons, “Function Point Analysis: Difficulties and Improvements,” IEEE Transactions on Software Engineering, vol. 14, pp. 2-11, 1988.
Function Point Rules
A weighted count of different types of functions:
• external input types (4), e.g. file names
• external output types (5), e.g. reports, msgs.
• inquiries (4), i.e. interactive inputs needing a response
• external files (7), i.e. files shared with other software systems
• internal files (10), i.e. invisible outside the system
The unadjusted function count (UFC) is the weighted sum of the counts of each type of function.
Function Types
Type               Simple   Average   Complex
External input        3        4         6
External output       4        5         7
Logical int. file     7       10        15
Ext. interface        5        7        10
Ext. inquiry          3        4         6
Adjusted FPs
14 factors contribute to the technical complexity factor (TCF), e.g. performance, on-line update, complex interface.
Each factor is rated from 0 (n.a.) to 5 (essential).
TCF = 0.65 + (sum of factors)/100
Thus TCF may range from 0.65 to 1.35, and FP = UFC * TCF
Technical Complexity Factors
• Data communications
• Distributed functions
• Performance
• Heavily used configuration
• Transaction rate
• Online data entry
• End user efficiency
• Online update
• Complex processing
• Reusability
• Installation ease
• Operational ease
• Multiple sites
• Facilities change
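Putting the last three slides together, a minimal sketch of the calculation (the function counts and factor ratings below are invented for illustration):

```python
# Unadjusted function count (UFC) from weighted function-type counts,
# then FP = UFC * TCF with TCF = 0.65 + (sum of 14 ratings) / 100.
FP_WEIGHTS = {                      # (simple, average, complex)
    "external_input":  (3, 4, 6),
    "external_output": (4, 5, 7),
    "logical_file":    (7, 10, 15),
    "ext_interface":   (5, 7, 10),
    "ext_inquiry":     (3, 4, 6),
}

def ufc(counts, complexity=1):      # 0 = simple, 1 = average, 2 = complex
    return sum(n * FP_WEIGHTS[ftype][complexity]
               for ftype, n in counts.items())

def adjusted_fp(unadjusted, ratings):
    tcf = 0.65 + sum(ratings) / 100             # 14 ratings, each 0..5
    return unadjusted * tcf                     # TCF in [0.65, 1.35]

u = ufc({"external_input": 10, "external_output": 7,
         "logical_file": 5, "ext_inquiry": 12})
print(u, adjusted_fp(u, [3] * 14))              # 173, ~185 FP
```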
Function Points and LOC
Language          LOC per FP
Assembler            320
C                    150 (128)
COBOL                106 (105)
Modula-2              71 (80)
4GL                   40 (20)
Query languages       16 (13)
Spreadsheet            6
Behrens (1983), IEEE TSE 9(6); C. Jones, Applied Software Measurement, McGraw-Hill (1991).
FP Based Predictions
Simplest form is: effort = FC + p * FP
Need to determine local productivity p and fixed costs FC.
[Figure: effort plotted against FPs at XYZ Bank]
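A sketch of that calibration step: recover p and FC from completed projects by least squares (the project data below is invented):

```python
# Fit effort = FC + p * FP to historical projects by least squares.
import numpy as np

fps    = np.array([250.0, 480.0, 900.0, 1400.0, 2000.0])
effort = np.array([3000.0, 5500.0, 9800.0, 15000.0, 21000.0])  # hours

p, fc = np.polyfit(fps, effort, 1)   # slope = productivity, intercept = FC
print(f"effort ~ {fc:.0f} + {p:.2f} * FP")

print(f"1000 FP project: {fc + p * 1000:.0f} hours")
```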
Productivity figures in FPs per 1000 hours:
IBM      29.6
Finnish  99.5
Canada   58.9
Mermaid  37.0
US       28.5
All environments are not equal:
• training
• personnel
• management techniques
• tools
• applications
• etc.
Function Point Users
Widely used (e.g. government, financial organisations) with some success:
• monitoring team productivity
• cost estimation
Most effective where there is a homogeneous environment.
Variants include Mk II Function Points and Feature Points.
Function Point Weaknesses
• Subjective counting (Low and Jeffery report 30% variation between different analysts).
• Hard to automate.
• Hard to apply to maintenance work.
• Not based upon organisational needs, e.g. is it productive to produce functions irrelevant to the user?
• Oriented to traditional DP-type applications.
• Hard to calibrate.
• Frequently leads to inaccurate prediction systems.
Function Point Strengths
• The necessary data can be available early in a project.
• Language independent.
• Layout independent (unlike LOC).
• More accurate than estimated LOC?
• What is the alternative?
2.4 DIY models Predicting effort using number of files
A Non-linear Model
To introduce economies or diseconomies of scale, an exponent is added: effort = p * S^e, where e > 0.
An empirical study of 60 projects at IBM Federal Systems Division during the mid 1970s concluded that effort could be modelled as:
effort (PM) = 5.2 * KLOC^0.91
Productivity and Size
Productivity and project size using the Walston and Felix model:
Effort (PM)   Size (KLOC)   KLOC/PM
    42.27          10         0.24
    79.42          20         0.25
   182.84          50         0.27
   343.56         100         0.29
  2792.57        1000         0.36
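The table follows directly from the model. Note that the exponent below 1 implies economies of scale: productivity (KLOC per person-month) rises with size.

```python
# Reproduce the Walston and Felix figures: effort = 5.2 * KLOC**0.91.
def walston_felix(kloc: float) -> float:
    return 5.2 * kloc ** 0.91          # effort in person-months

for kloc in (10, 20, 50, 100, 1000):
    pm = walston_felix(kloc)
    print(f"{pm:8.2f} PM  {kloc:5d} KLOC  {kloc / pm:.2f} KLOC/PM")
```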
Bespoke is Better!
Model                  Researcher        MMRE
Basic COCOMO           Kemerer           601%
FP                     Kemerer           103%
SLIM                   Kemerer           772%
ESTIMACS               Kemerer            85%
COCOMO                 Miyazaki & Mori   166%
Intermediate COCOMO    Kitchenham        255%
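MMRE here is the mean magnitude of relative error: the mean of |actual - predicted| / actual over the validation projects. A small sketch with invented numbers:

```python
# MMRE: average relative error magnitude; 100% means predictions are
# off by the size of the actual value on average.
def mmre(actuals, predicted):
    return sum(abs(a - p) / a for a, p in zip(actuals, predicted)) / len(actuals)

print(f"{mmre([100, 200, 400], [230, 390, 820]):.0%}")   # 110%
```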
So Where Are We? • A major research topic. • Poor results “off the shelf”. • Accuracy improves with calibration but still mixed. • Needs accurate, (largely) quantitative inputs.
3. Machine Learning Techniques • A new area but demonstrating promise. • System “learns” how to estimate from a training set. • Doesn’t assume a continuous functional relationship. • In theory more robust against outliers, more flexible types of relationship. Du Zhang and Jeffrey Tsai, “Machine Learning and Software Engineering,” Software Quality Journal, vol. 11, pp. 87-119, 2003.
Different ML Techniques • Case based reasoning (CBR) or analogical reasoning • Neural nets • Neuro-fuzzy systems • Rule induction • Meta-heuristics e.g. GAs, simulated annealing
Using CBR • Characterise a project e.g. • no. of interrupts • size of interface • development method • Find similar completed projects • Use completed projects as a basis for estimate (with adaptation)
Problems
• Finding the analogy, especially in a large organisation.
• Determining how good the analogy is.
• Need for domain knowledge and expertise for case adaptation.
• Need for systematically structured data to represent each case.
ANGEL
ANaloGy Estimation tooL (ANGEL)
http://dec.bmth.ac.uk/ESERG/ANGEL/
ANGEL Features
• Shell
• n features (continuous or categorical)
• Brute force search for the optimal subset of features: O(2^n - 1) subsets
• Measures Euclidean distance (standardised dimensions)
• Uses k nearest cases
• Simple adaptation strategy (weighted mean); with k=1 it becomes a nearest-neighbour technique
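A simplified sketch of this style of analogy estimation (not the actual ANGEL implementation: it standardises features, takes the k nearest cases by Euclidean distance, and adapts with a plain rather than weighted mean here; the project data is invented):

```python
import math

def estimate_by_analogy(target, cases, k=2):
    """cases: list of (feature_vector, effort); target: feature_vector."""
    dims = len(target)
    vectors = [f for f, _ in cases] + [target]
    lo = [min(v[i] for v in vectors) for i in range(dims)]
    hi = [max(v[i] for v in vectors) for i in range(dims)]

    def standardise(v):  # scale each dimension to [0, 1]
        return [(v[i] - lo[i]) / ((hi[i] - lo[i]) or 1) for i in range(dims)]

    t = standardise(target)
    scored = sorted((math.dist(t, standardise(f)), e) for f, e in cases)
    nearest = scored[:k]                       # k most similar projects
    return sum(e for _, e in nearest) / k      # simple mean adaptation

past = [((120, 5), 2000), ((300, 9), 5200), ((150, 6), 2500)]
print(estimate_by_analogy((140, 6), past))    # -> 2250.0
```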
CBR Results
A study of 275 projects from 9 datasets suggests that CBR outperforms more traditional statistical methods, e.g. stepwise regression.
M. Shepperd and C. Schofield, IEEE Transactions on Software Engineering, vol. 23, no. 11, pp. 736-743, 1997.