INFO 636 Software Engineering Process I
Prof. Glenn Booker
Weeks 4-5 – Estimating Software Size
INFO636 Weeks 4-5
Why Plan?
• As emphasized earlier, we need a good estimate of the amount of work to be performed in order to predict effort and time accurately (per Boehm)
• Estimation is one of the most challenging aspects of managing software development, hence our substantial focus on it here
Estimation Example
• Other fields have well-established formulas for estimating work
• Construction knows the cost per square foot of various types of construction
• More complex projects look at the linear amount of walls, and the areas of various parts (walls, ceilings, etc.) to develop good estimates
Size Estimation Process
• The framework, or process, for planning a project was covered last lecture
  • Define system requirements
  • Produce conceptual design
  • Estimate product size
  • Estimate resources and schedule
  • Develop the product
  • Refine basis for later estimates
Estimation Tools
• Most software estimation tools have been calibrated to use software size as an input, and produce effort and schedule as outputs
  • COCOMO, SLIM, PriceS, and McConnell’s tables in Rapid Development
• Often start at fairly large project sizes, e.g. 10,000 LOC and up
Estimation Tools
• We need a basis for estimation which works for an individual (programmer)
• Most organizations use either no estimation methods, or use terribly unreliable ones
• 100% error is far too common
Desired Estimation Goals
• Criteria for a good estimation method include:
  • Use structured and trainable methods
  • Should apply to both development and maintenance
  • Should be able to handle all aspects of development, not just code
Desired Estimation Goals
• It should be suitable for statistical analysis
• It should be adaptable to future types of work
• It should be possible to judge the accuracy of your work (and hence refine the model)
• We’ll briefly cover four estimation methods, then explain the proxy-based PROBE approach
Estimation Methods
• Wideband-Delphi Method
• Fuzzy Logic Method
• Standard Component Method
• Function Point Method
• Proxy-based Estimating
Wideband-Delphi Method
• This method was developed by the RAND Corporation
• It uses several people to estimate the same task, then applies a Delphi method to get a consensus estimate
• The process is:
  • Discuss the problem
  • Get anonymous estimates, and hand them to a moderator
  • Find the median estimate, and show everyone the set of estimates
  • Discuss the results, to uncover different views of the project scope
  • Repeat the process until estimates converge to within a predefined range
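The moderator's bookkeeping for each round can be sketched in a few lines of Python. This is only an illustration: the function name, the sample estimates, and the convergence rule (spread relative to the median, 20% by default) are assumptions, since the slides only say "a predefined range".

```python
from statistics import median


def delphi_round_summary(estimates, max_spread=0.20):
    """Summarize one round of anonymous estimates.

    max_spread is an assumed convergence rule: the round has converged
    when (max - min) / median is at most this fraction.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    mid = median(estimates)
    spread = (max(estimates) - min(estimates)) / mid
    return {"median": mid, "spread": spread, "converged": spread <= max_spread}


# Hypothetical estimates (LOC) from four people over three rounds.
rounds = [
    [400, 900, 1500, 650],
    [600, 800, 950, 700],
    [700, 760, 800, 740],
]
for number, estimates in enumerate(rounds, start=1):
    summary = delphi_round_summary(estimates)
    print(number, summary)
    if summary["converged"]:
        break
```

The discussion between rounds is the part that matters; the code only decides when to stop.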
Fuzzy Logic Method
• This approach uses historic data to arrive at some meaningful estimates based on qualitative descriptions
  • Size categories such as Very Small, Small, Medium, Large, and Very Large
• How data are divided into these categories depends on the type of data
Fuzzy Logic Method
• Data with a small range (say, a factor of five from very small to very large) can use linear divisions
• Data with a large range can use a base-10 logarithmic division (as shown in the text)
Fuzzy Logic Method
• Linear division breaks up sizes into evenly divided pieces
• Here’s an example for the N track
• If your work to read the text involves chapters from 23 to 75 pages long (I made those numbers up), then the range of sizes is 75-23=52 pages
• Five category midpoints have four gaps between them, so divide that range by four to get the step: 52/4 = 13
Fuzzy Logic Method
• The midpoints of each size are just the lowest size, then add the 13 four times
  • Very Small midpoint = 23 pages
  • Small midpoint = 23+13=36 pages
  • Medium midpoint = 23+13*2=49 pages
  • Large midpoint = 23+13*3=62 pages
  • Very Large midpoint = 23+13*4=75 pages (which equals the largest chapter size)
Fuzzy Logic Method
• Use half of 13, or 6.5, to find the ranges for each size
  • Very Small range is up to 23+6.5=29.5 pages
  • Small range is 29.5 to 36+6.5=42.5 pages
  • Medium range is 42.5 to 49+6.5=55.5 pages
  • Large range is 55.5 to 62+6.5=68.5 pages
  • Very Large range is 68.5 pages and up
• Notice the range of each interior category (Small, Medium, Large) is also 13 pages, since we have linear divisions
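The linear division above is easy to script. The sketch below reproduces the chapter-length example; the function and variable names are mine, and open-ended ranges are shown as None.

```python
CATEGORIES = ["Very Small", "Small", "Medium", "Large", "Very Large"]


def linear_size_categories(smallest, largest, names=CATEGORIES):
    """Return (name, midpoint, lower, upper) for evenly spaced categories.

    The first category has no lower bound and the last has no upper
    bound; both are represented by None.
    """
    step = (largest - smallest) / (len(names) - 1)
    rows = []
    for i, name in enumerate(names):
        midpoint = smallest + i * step
        lower = None if i == 0 else midpoint - step / 2
        upper = None if i == len(names) - 1 else midpoint + step / 2
        rows.append((name, midpoint, lower, upper))
    return rows


for name, midpoint, lower, upper in linear_size_categories(23, 75):
    print(f"{name:10s} midpoint={midpoint:5.1f} range=({lower}, {upper})")
```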
Fuzzy Logic Method
• The logarithmic version is messier, since we have to
  • Convert the sizes to their base-10 logarithms
  • Follow the linear approach using the logarithms
  • Raise 10 to the power of each result to convert it back to the original units
Fuzzy Logic Method
• The example in the book has LOC ranging from 173 to 10,341 LOC
  • The log10 of 173 is 2.238
  • The log10 of 10,341 is 4.014
  • The difference is 4.014 - 2.238 = 1.776
  • Divide the difference by four to get the interval 1.776/4=0.444
• Mimic the linear midpoint calculation (slide 15) to find the midpoints
Fuzzy Logic Method
• The midpoints of each size are just the lowest size, then add the 0.444 four times
  • Very Small midpoint = 2.238
  • Small midpoint = 2.238 + 0.444 = 2.682
  • Medium midpoint = 2.238 + 0.444*2 = 3.126
  • Large midpoint = 2.238 + 0.444*3 = 3.570
  • Very Large midpoint = 2.238 + 0.444*4 = 4.014 (which equals the largest code size)
• Mimic the linear range calculation (slide 16) to find the ranges of each size category
Fuzzy Logic Method
• Use half of 0.444, or 0.222, to find the ranges for the first size (then just keep adding 0.444 to each range boundary)
  • Very Small range is up to 2.238+0.222=2.460
  • Small range is 2.460 to 2.460+0.444=2.904
  • Medium range is 2.904 to 2.904+0.444=3.348
  • Large range is 3.348 to 3.348+0.444=3.792
  • Very Large range is 3.792 and up
Fuzzy Logic Method
• Now take 10 to the power of the logarithms to find the actual LOC
  • Very Small range is up to 10^2.460=288 LOC
  • Small range is 288 to 10^2.904=802 LOC
  • Medium range is 802 to 10^3.348=2228 LOC
  • Large range is 2228 to 10^3.792=6194 LOC
  • Very Large range is 6194 LOC and up
• This is the basis for the poorly labeled table at the bottom of page 104 in the text
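The logarithmic version follows the same recipe in log10 space. Here is a minimal sketch with names of my own choosing. Because it keeps full precision instead of rounding the logarithms to three decimals, expect the boundaries to differ from the slide values by a few LOC.

```python
import math


def log_size_boundaries(smallest, largest, categories=5):
    """Return the upper boundary of every category except the last.

    Midpoints are evenly spaced in log10 space; each boundary lies half a
    step above a midpoint and is converted back to the original units.
    """
    low, high = math.log10(smallest), math.log10(largest)
    step = (high - low) / (categories - 1)
    return [10 ** (low + (i + 0.5) * step) for i in range(categories - 1)]


for boundary in log_size_boundaries(173, 10341):
    print(round(boundary))
```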
Fuzzy Logic Method
• An aside… Tables 5.2 in the text divide each of the five basic categories (Very Small, etc.) into five more “subranges”
• This follows the same approach, just adding more detail to each category
• It’s unlikely you’ll have enough data to worry about subranges
Standard Component Method
• The Standard Component Method, by Putnam, assumes you have a substantial database from which to make your estimates
• Make a realistic estimate of how many screens you think will be in your system
• Estimate the lowest and highest possible numbers of screens you could imagine will be in your system
Standard Component Method
• For actual estimation, use n = (lowest number + highest number + 4*realistic number)/6
• The idea is to try to account for possible error in your estimate
• Repeat this process for each type of component in your system
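As a quick illustration of the weighted formula, the sketch below applies it to two component types. The component names and counts are made up for the example.

```python
def weighted_count(lowest, realistic, highest):
    """n = (lowest + highest + 4 * realistic) / 6, as on the slide."""
    if not lowest <= realistic <= highest:
        raise ValueError("expected lowest <= realistic <= highest")
    return (lowest + highest + 4 * realistic) / 6


# Hypothetical (lowest, realistic, highest) counts per component type.
components = {"screens": (8, 12, 25), "reports": (3, 5, 10)}
for name, (low, likely, high) in components.items():
    print(name, weighted_count(low, likely, high))
```

Note how a long upper tail (25 screens) pulls the estimate above the realistic guess of 12.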
Function Point Method
• The function point approach uses “function points” as a proxy for the complexity of the system, independent of the programming language used
Function Point Method
• Each input or output function, interface, file, and inquiry is judged on a fixed complexity scale of small to large (not shown in the Humphrey text), and assigned some number of function points
• The total number of function points is adjusted for 14 “influence” factors, such as the developers’ expertise, business environment, etc.
Function Point Method
• While a great language-independent method for judging the complexity of a program, it isn’t as reliable for estimating development effort
• See IFPUG for more details
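To make the counting scheme concrete, here is a rough sketch. It is not from the Humphrey text: it assumes the commonly published IFPUG weights (low/average/high per function type) and the usual adjustment factor of 0.65 + 0.01 × (sum of the 14 ratings, each 0 to 5). Check the current IFPUG manual before relying on any of these numbers; the example counts are invented.

```python
# Assumed weights per function type: (low, average, high).
WEIGHTS = {
    "external_input": (3, 4, 6),
    "external_output": (4, 5, 7),
    "external_inquiry": (3, 4, 6),
    "internal_file": (7, 10, 15),
    "external_interface": (5, 7, 10),
}
LEVEL = {"low": 0, "average": 1, "high": 2}


def unadjusted_function_points(counts):
    """counts maps (function_type, complexity) -> number of such functions."""
    return sum(
        WEIGHTS[kind][LEVEL[level]] * number
        for (kind, level), number in counts.items()
    )


def adjusted_function_points(counts, influence_ratings):
    """Scale the unadjusted total by the 14 influence ratings (0 to 5 each)."""
    if len(influence_ratings) != 14:
        raise ValueError("expected exactly 14 influence ratings")
    if not all(0 <= rating <= 5 for rating in influence_ratings):
        raise ValueError("each rating must be between 0 and 5")
    adjustment = 0.65 + 0.01 * sum(influence_ratings)
    return unadjusted_function_points(counts) * adjustment


# Hypothetical system: 5 simple inputs, 4 average outputs, 2 average files.
example_counts = {
    ("external_input", "low"): 5,
    ("external_output", "average"): 4,
    ("internal_file", "average"): 2,
}
print(unadjusted_function_points(example_counts))
print(adjusted_function_points(example_counts, [3] * 14))
```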
Proxy-based Estimating
• We are trying to predict the final size of a software product
• Measuring or estimating that directly is tricky at best, so we use proxies to help get there
• A proxy is an intermediate concept or substitute for what we really want to predict
Proxy-based Estimating
• The overall process is like this:
  • We want to take the conceptual design, and break it into parts which correspond to the proxies available
  • Estimate each part of the system, based on the proxies
  • Add them up to get the overall product size
Choosing a Proxy
• The proxy size should correspond to the development effort size
• Proxy content should be countable and easy to visualize
• Proxy must be customizable
• The proxy should be sensitive to the same factors which affect development
Possible Proxies
• In a manner similar to function points, any characteristic of the system could serve as a proxy
  • Input screens, output reports, data files
  • Objects or classes
• The fuzzy logic and function point concepts are essentially blended to produce the PROBE approach
PROBE Method
• PROxy-Based Estimation (PROBE) uses objects as proxies
  • See also Appendix C, Tables C36 and C40
• First choose appropriate proxy categories (e.g. Table 5.7, p. 117)
  • For code: calculation, data, I/O, control, print, etc. might be suitable proxies
  • For the N track: reading, discussion, homework, …
PROBE Method
• Choose reasonable size options for the proxies
  • For class, you might only have enough data for three sizes instead of five
• Analyze your historic data to determine approximate sizes (LOC) for each proxy
  • For N track, the amount of effort needed
PROBE Method
• Now start using your method for a given assignment
  • Develop a conceptual design for the solution
  • Use your proxies to estimate the amount of code or effort needed to develop them
• The example on page 120 is the first use of form C39 (p. 683)
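The proxy lookup amounts to a small table and a sum. In the sketch below everything is hypothetical: the categories, the three size options, the LOC-per-method values, and the design itself are placeholders standing in for your own historic data, not values from Table 5.7.

```python
# Placeholder LOC-per-method table: category -> {relative size: LOC per method}.
LOC_PER_METHOD = {
    "calculation": {"small": 6, "medium": 12, "large": 25},
    "data": {"small": 4, "medium": 8, "large": 17},
    "io": {"small": 9, "medium": 15, "large": 26},
}


def estimate_object_loc(category, methods, relative_size, table=LOC_PER_METHOD):
    """Estimated LOC for one object: method count times LOC per method."""
    return methods * table[category][relative_size]


# Hypothetical conceptual design: (object name, category, methods, relative size).
design = [
    ("Parser", "io", 3, "medium"),
    ("Stats", "calculation", 5, "small"),
    ("Store", "data", 4, "large"),
]
total = sum(
    estimate_object_loc(category, methods, size)
    for _name, category, methods, size in design
)
print(total)
```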
A Course Note
• P track students will use the estimating method pretty much as written in the text
  • Our forms are slightly different
• N track students will develop their own proxies to correspond to their weekly activities, and create a custom form N39 to follow a similar process
PROBE Method
• The BASE PROGRAM section of C39 is a summary of the expected changes to the preexisting code
  • Base Size (B) is the amount of code already present
  • LOC Deleted (D) is how much existing code you plan to remove
  • LOC Modified (M) is how much existing code you expect to change
PROBE Method
• The PROJECTED LOC section contains:
  • Base Additions (BA) are planned additions to existing code (new lines within existing modules)
  • New Objects (NO) are new modules or classes which will need to be implemented
• Your proxy structure is used to describe the Type, Methods, and Relative Size of the changes to BA and NO
PROBE Method
• The REUSED OBJECTS (R) section of C39 is used to describe
  • Code you’ll reuse from another preexisting source
  • Code you’ll create during this assignment which will be reusable
• These tend to be rare during the course
PROBE Method
• Now comes the number crunching part
• The Projected LOC (P) is the total amount of new development for this assignment; P = BA + NO
• The terms β0 (hereafter beta0) and β1 (beta1) are linear regression parameters from your work history
• By now you have a history of planned LOC or effort, and actual values
PROBE Method
• What are beta0 and beta1?
• The classic equation for a line is y = mx + b
  • ‘m’ is the slope, which corresponds to beta1
  • ‘b’ is the y-intercept, which is beta0
• Here the ‘x’ axis is the planned LOC or effort, and the ‘y’ axis has actual values
PROBE Method
[Figure: scatter plot of Actual LOC (Y) against Planned LOC (X). The x marks are data points from weekly assignments, with a linear regression line fitted through them; Beta1 is the line's slope and Beta0 is its y-intercept.]
PROBE Method
• See “regression” handout for an example of calculating beta0 and beta1
  • Note that $\sum x_i^2$ means $\sum (x_i^2)$, not $(\sum x_i)^2$
• When you use this, make sure the formulas are correct
• ‘n’ changes each week as new data is created
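If you would rather check the handout arithmetic in code, the sketch below uses the standard least-squares formulas for a line. The function name is mine, and I am assuming (not quoting) that the handout uses the same formulas; the sample data are made up.

```python
def fit_beta(planned, actual):
    """Least-squares fit of actual = beta0 + beta1 * planned."""
    n = len(planned)
    if n != len(actual) or n < 2:
        raise ValueError("need at least two (planned, actual) pairs")
    sum_x = sum(planned)
    sum_y = sum(actual)
    sum_xy = sum(x * y for x, y in zip(planned, actual))
    sum_x2 = sum(x * x for x in planned)  # sum of squares, NOT (sum of x) squared
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all planned values are identical")
    beta1 = (n * sum_xy - sum_x * sum_y) / denominator
    beta0 = (sum_y - beta1 * sum_x) / n
    return beta0, beta1


# Hypothetical history: actual size always 10 more than twice the plan.
print(fit_beta([1, 2, 3], [12, 14, 16]))
```

Feeding in a history where planned and actual are identical returns an intercept of 0 and a slope of 1, the "perfect estimator" case.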
PROBE Method
• Incidentally, if your estimates were always perfect, you’d have beta1 = 1 and beta0 = 0 (why?)
• Once you have beta0 and beta1, find:
  • New and Changed LOC (N) = beta0 + beta1*(P + M)
• It’s critical to note that later calculations for prediction interval use ‘N’, not ‘P’
PROBE Method
• The expected size of the application after this project is
  • Total LOC (T) = N + B - D - M + R
• The Total New Reused is the sum of the code flagged (with a *) in the New Objects section which is being reused
• You won’t need to use this very often
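Putting the C39 quantities together, the P, N, and T formulas can be sketched as follows. The function name and sample numbers are invented for illustration.

```python
def probe_size_summary(base, deleted, modified, base_additions,
                       new_objects, reused, beta0, beta1):
    """Compute Projected (P), New and Changed (N), and Total (T) LOC."""
    projected = base_additions + new_objects                      # P = BA + NO
    new_and_changed = beta0 + beta1 * (projected + modified)      # N
    total = new_and_changed + base - deleted - modified + reused  # T
    return {"P": projected, "N": new_and_changed, "T": total}


# Hypothetical assignment: 500 LOC base, small changes, two new classes.
print(probe_size_summary(base=500, deleted=20, modified=30,
                         base_additions=40, new_objects=150,
                         reused=0, beta0=10, beta1=1.25))
```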
PROBE Method
• Then we get to the Range calculation
• We have a refined estimate of the size of the system, but want to establish a prediction interval in which the real outcome is likely to fall
• See the PSP_Calculation_Example.xls spreadsheet
PROBE Method
• To find the Range, we start with a parameter from the ‘t’ distribution
• Called ‘t(α/2, n-2)’ where
  • α/2 is the prediction interval percent (the chance that the actual value falls inside the interval), generally 70% or 90%
  • ‘n-2’ is the number of degrees of freedom; again, ‘n’ is the number of data pairs
• In Excel, use TINV(1 - α/2, n - 2)
PROBE Method
• Next we need the standard deviation, σ
  • That’s why column G adds up $(Y_i - \beta_0 - \beta_1 X_i)^2$
  • $\sigma = \sqrt{\sum (Y_i - \beta_0 - \beta_1 X_i)^2 / (n-2)}$
• Now there’s a new term, $x_k$
  • $x_k = P + M$
  • This is the same term used in the N formula – the projected and modified LOC
PROBE Method
• Now use this to plug into formula 5.3 on page 124
  • I’m not going to copy it here
• Notice in the spreadsheet the column H calculation of $(X_i - X_{avg})^2$, which is also used to find the Range
PROBE Method
• Finally, find the Upper and Lower Prediction Intervals (UPI and LPI)
  • UPI = N + Range
  • LPI = N – Range
• The Prediction Interval Percent is either 70% or 90%, the value used to find ‘t’
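For those who want to cross-check the spreadsheet, here is a sketch of the Range calculation. Formula 5.3 is not reproduced in these slides, so the code assumes it is the standard regression prediction-interval half-width, t · σ · sqrt(1 + 1/n + (x_k - x_avg)² / Σ(x_i - x_avg)²). The t value is passed in rather than computed; look it up in a table or with Excel's TINV. Function names and data are hypothetical.

```python
import math


def prediction_range(planned, actual, beta0, beta1, x_k, t_value):
    """Half-width of the prediction interval around the estimate N."""
    n = len(planned)
    if n != len(actual) or n < 3:
        raise ValueError("need at least three (planned, actual) pairs")
    # Column G in the spreadsheet: squared residuals about the fitted line.
    residual_ss = sum((y - beta0 - beta1 * x) ** 2
                      for x, y in zip(planned, actual))
    sigma = math.sqrt(residual_ss / (n - 2))
    # Column H in the spreadsheet: squared deviations of X from its mean.
    x_avg = sum(planned) / n
    spread = sum((x - x_avg) ** 2 for x in planned)
    if spread == 0:
        raise ValueError("all planned values are identical")
    return t_value * sigma * math.sqrt(1 + 1 / n + (x_k - x_avg) ** 2 / spread)


def prediction_interval(n_estimate, range_):
    """Return (LPI, UPI) = (N - Range, N + Range)."""
    return n_estimate - range_, n_estimate + range_


# Hypothetical history; beta0 and beta1 would come from your regression.
planned = [100, 200, 300, 400]
actual = [110, 190, 320, 380]
t_value = 1.386  # assumed 70% value for 2 degrees of freedom; confirm with TINV(0.30, 2)
rng = prediction_range(planned, actual, beta0=0.0, beta1=1.0,
                       x_k=250, t_value=t_value)
print(prediction_interval(250, rng))
```

Notice that the Range grows as x_k moves away from the average of your historic planned values, so estimates far outside your experience get wider intervals.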
PROBE Method
• If Range is comparable to N in magnitude
  • Choose a Prediction Interval Percent of 70% to keep Range smaller, and/or
  • Look for data fliers (outliers) which can have a strong influence on sigma (σ)
    • E.g. data points with relatively large value of $(Y_i - \beta_0 - \beta_1 X_i)^2$