STAR Seeking New Frontiers in Cost Modeling. Tim Menzies (WVU) Jairus Hihn (JPL) Oussama Elrawas (WVU) Karen Lum (JPL) Dan Baker (WVU). 22nd International Forum on COCOMO and Systems/Software Cost Modeling (2007) . STAR has three key advancements over traditional methods and even 2cee
STAR: Seeking New Frontiers in Cost Modeling
Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Karen Lum (JPL), Dan Baker (WVU)
22nd International Forum on COCOMO and Systems/Software Cost Modeling (2007)
STAR
STAR is an abductive inference engine that applies simulated annealing to a treatment learner (TAR).
STAR has three key advancements over traditional methods and even 2CEE:
• Provides an integrated set of COCOMO models
  • COCOMO II
  • COQUALMO
  • COCOMO II Risk (threats) Assessment Model
• Can be used to systematically analyze strategic and tactical policy decisions
  • Searches for the optimal combination of inputs that jointly reduce effort, defect rates and threats
  • Uses constraints to restrict the search: Free, Floating, Fixed
• Can be tuned/calibrated with constraint sets instead of traditional historical data records
  • Seeks stable conclusions in the space of all tunings
  • Abduction: view it as an alternative to Bayesian methods
Note
• This talk is an extension of material presented in “The Business Case for Automated Software Engineering”
• IEEE ASE 2007
• Menzies, Elrawas, Hihn, Feather, Madachy, Boehm
• http://menzies.us/pdf/07casease.pdf
Method
• Stagger across the space of known tunings and inputs (Monte Carlo)
• For N staggers, score N runs by an index we call energy:
  Ef = (effort - minEffort) / (maxEffort - minEffort)
  De = (defects - minDefects) / (maxDefects - minDefects)
  Th = (threats - minThreats) / (maxThreats - minThreats)
• Normalization: 0 <= x <= 1 for each of Ef, De, Th
• Save the one with the lowest energy index
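The scoring step can be sketched in Python as follows. The three normalized terms come from the slide. The slide does not show how Ef, De and Th are combined into one index, so the Euclidean combination divided by sqrt(3) used here is an assumption, as are the function names and the dictionary layout of a run.

```python
import math

def normalize(value, lo, hi):
    """Min-max normalize value into [0, 1]; a degenerate range maps to 0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)

def energy(run, bounds):
    """Combine normalized effort, defects and threats into one index in [0, 1].

    ASSUMPTION: the combination rule is not given on the slide; this sketch
    uses the Euclidean distance from the origin, divided by sqrt(3).
    """
    ef = normalize(run["effort"], *bounds["effort"])
    de = normalize(run["defects"], *bounds["defects"])
    th = normalize(run["threats"], *bounds["threats"])
    return math.sqrt(ef ** 2 + de ** 2 + th ** 2) / math.sqrt(3)

def observed_bounds(runs):
    """Observed (min, max) of each score over all N runs."""
    keys = ("effort", "defects", "threats")
    return {k: (min(r[k] for r in runs), max(r[k] for r in runs)) for k in keys}

def best_run(runs):
    """Score every run and return (energy, run) for the lowest energy."""
    bounds = observed_bounds(runs)
    return min(((energy(r, bounds), r) for r in runs), key=lambda pair: pair[0])

# Made-up runs, for illustration only.
runs = [
    {"effort": 100, "defects": 10, "threats": 5},
    {"effort": 200, "defects": 30, "threats": 15},
    {"effort": 150, "defects": 20, "threats": 10},
]
lowest_energy, lowest_run = best_run(runs)
```

Dividing by sqrt(3) keeps the combined index in the same 0 to 1 range as its three components, so runs from different experiments stay comparable.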
How to Stagger
• Simulated annealing (von Neumann)
• Pick input ranges and internal values at random
• Do many runs, starting from “boiling hot” (when you stagger around like a drunk) and moving to “cooler” (no staggering: walk straight to your destination)
• Keep track of multiple solutions: Current, New, Best
[Figure: sample runs from STAR, with labels Bad, Good, Best 10%; after 500 runs, little improvement]
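A minimal Python sketch of the loop described above, tracking the Current, New and Best solutions. The linear cooling schedule, the exp(-delta/T) acceptance rule, and the toy one-dimensional problem are textbook choices made for illustration; the slides do not say which schedule or acceptance rule STAR uses.

```python
import math
import random

def anneal(energy_fn, neighbor_fn, start, runs=500, seed=1):
    """Simulated annealing that keeps current, new and best solutions.

    ASSUMPTIONS: linear cooling and the exp(-delta / T) acceptance rule
    are generic textbook choices, not taken from the slides.
    """
    rng = random.Random(seed)
    current, e_current = start, energy_fn(start)
    best, e_best = current, e_current
    for k in range(runs):
        # Temperature falls from 1 ("boiling hot") toward 0 ("cooler").
        temperature = max(1e-9, 1.0 - k / runs)
        new = neighbor_fn(current, rng)
        e_new = energy_fn(new)
        if e_new < e_best:
            best, e_best = new, e_new
        delta = e_new - e_current
        # Always accept improvements; accept worse moves less often as it cools.
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = new, e_new
    return best, e_best

# Toy problem (hypothetical): minimize (x - 0.3)^2 for x in [0, 1].
def toy_energy(x):
    return (x - 0.3) ** 2

def toy_neighbor(x, rng):
    return min(1.0, max(0.0, x + rng.uniform(-0.1, 0.1)))

best_x, best_e = anneal(toy_energy, toy_neighbor, start=0.9)
```

Early in the run the high temperature lets the search accept worse solutions (the drunken stagger); late in the run almost only improvements are accepted.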
Staggering the Tunings
• COCOMO effort estimation: effort multipliers are straight(ish) lines
• When EM = 3 = nominal, multiply effort by one (i.e. do nothing); i.e. the lines pass through the point {3,1}
• Range of effort multipliers (COCOMO):
  • Decrease effort: acap, apex, ltex, pcap, pcon, plex, sced, site, tool
  • Increase effort: cplx, data, docu, pvol, rely, ruse, stor, time
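One way to picture this staggering, as a Python sketch: each effort multiplier is a straight line pinned to the point {3,1}, and a stagger picks a random slope for it. The two attribute groups are from the slide. The slope bound `max_slope` and the function names are hypothetical; the slides do not give the actual slope ranges.

```python
import random

DECREASE_EFFORT = ["acap", "apex", "ltex", "pcap", "pcon", "plex", "sced", "site", "tool"]
INCREASE_EFFORT = ["cplx", "data", "docu", "pvol", "rely", "ruse", "stor", "time"]

def effort_multiplier(level, slope):
    """Straight line through the nominal point (3, 1)."""
    return 1.0 + slope * (level - 3)

def stagger_tunings(rng, max_slope=0.2):
    """Pick one random slope per attribute.

    ASSUMPTION: max_slope is a made-up bound for illustration only.
    Attributes that increase effort get a positive slope, and
    attributes that decrease effort get a negative slope.
    """
    tunings = {}
    for name in INCREASE_EFFORT:
        tunings[name] = rng.uniform(0.0, max_slope)
    for name in DECREASE_EFFORT:
        tunings[name] = -rng.uniform(0.0, max_slope)
    return tunings

rng = random.Random(0)
tunings = stagger_tunings(rng)
# At the nominal level every multiplier is exactly one, whatever the slope.
nominal = {name: effort_multiplier(3, slope) for name, slope in tunings.items()}
```

Because every line passes through {3,1}, a stagger only changes how strongly an attribute pushes effort up or down away from nominal, never the nominal estimate itself.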
After staggering, select best things
• Sort all ranges by their “goodness”
• Try the first ranked range, then the first and second, then the first, second and third, and so on
• Seek the “policy”: the fewest ranges that most reduce threats, effort, defects
[Figure labels: Bad, Good; 38 not-so-good ideas, 22 good ideas]
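The incremental search over ranked ranges can be sketched in Python as below. The `evaluate` callback is hypothetical: it stands in for whatever STAR does to score a candidate policy (for example, re-running the models with those ranges imposed), and lower scores are assumed to be better. The toy gains are made up.

```python
def select_policy(ranked_ranges, evaluate):
    """Try the top 1, top 2, top 3, ... ranges and keep the best prefix.

    ranked_ranges: ranges already sorted best-first by "goodness".
    evaluate: hypothetical callback that scores a list of ranges
              (lower is better).
    Ties keep the shorter prefix, so the result is the fewest ranges
    that achieve the lowest score seen.
    """
    best_prefix, best_score = [], evaluate([])
    for size in range(1, len(ranked_ranges) + 1):
        prefix = ranked_ranges[:size]
        score = evaluate(prefix)
        if score < best_score:  # strict comparison: ties keep fewer ranges
            best_prefix, best_score = prefix, score
    return best_prefix, best_score

# Toy example with made-up gains: "c" adds nothing and "d" makes things worse.
gain = {"a": 0.3, "b": 0.2, "c": 0.0, "d": -0.1}

def toy_evaluate(ranges):
    return 1.0 - sum(gain[r] for r in ranges)

policy, score = select_policy(["a", "b", "c", "d"], toy_evaluate)
```

In the toy run the policy stops growing once extra ranges stop helping, which mirrors the slide's split between good ideas and not-so-good ideas.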
Staggering the inputs: 5 different ways
• COCOMO II: stagger over entire model input space
• “Values” = fixed
• “Ranges” = loose (select within these ranges)
[Figure: panels numbered 1 to 5]
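The distinction between fixed values and loose ranges can be sketched in Python as follows. The constraint format (an int for a fixed value, a `(lo, hi)` tuple for a range) and the example settings are assumptions made for illustration; they are not the actual inputs of any of the five cases.

```python
import random

def stagger_inputs(constraints, rng):
    """Draw one project description from a set of constraints.

    constraints maps an attribute name to either
      - an int: a fixed "value", used as-is, or
      - a (lo, hi) tuple: a loose "range", sampled uniformly (inclusive).
    ASSUMPTION: this encoding is hypothetical, chosen for the sketch.
    """
    project = {}
    for name, spec in constraints.items():
        if isinstance(spec, tuple):
            lo, hi = spec
            project[name] = rng.randint(lo, hi)
        else:
            project[name] = spec
    return project

# Hypothetical constraint set: rely is fixed, acap and cplx are loose.
constraints = {"rely": 5, "acap": (3, 5), "cplx": (1, 6)}
rng = random.Random(42)
samples = [stagger_inputs(constraints, rng) for _ in range(100)]
```

Tightening a range toward a single value moves an input from loose to fixed, which is how a more constrained project narrows the space the search can stagger over.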
Making Strategic Decisions
[Figure: two panels, “Constrained by Jairus’ guess at JPL environment” and “Full range of model”]
Results: OSP
• One advantage of this output display: if you can’t accept the full policy, you can see what trade-offs arise with some partial policy
• But partial policies cannot include many choices. For example, note the missing values:
  • Peer reviews < 6
  • Execution testing & tools < 6
  • Automated analysis < 5
Results: OSP2
• OSP2 was a more constrained environment, as it was a follow-on from OSP and “inherited” the team, development environment, design, etc.
• Again note the missing values:
  • Peer reviews < 6
  • Execution testing & tools < 6
  • Automated analysis < 6
Results: all experiments
• No point in half-hearted defect removal. Never found in any policy:
  • Peer reviews in 1,2,3,4
  • Execution testing & tools in 1,2,3,4
  • Automated analysis in 1,2,3,4
• Beware spurious generalities
  • X = one of {cocomo, osp, osp2, flight, ground}
  • Y = one of {cocomo, osp, osp2, flight, ground}
  • Not(X = Y)
  • X’s best policy is not Y’s best policy
• Exception: use more automated analysis (model checking, etc.)
  • Automated analysis = 5 or 6 always in best policy
Calibrating/Tuning Models
Current cost models are tuned to local contexts.
• Traditional approach: LC (Boehm, 1981)
  • Tuned to local data using LC
  • Hard to tell when old data is no longer locally relevant
  • Suffers from the “large outlier problem”
  • Row pruning done heuristically
• Next step: 2CEE (Menzies, Jalali, Baker, Hihn, Lum, 2007)
  • Tuned to local data using LC
  • Tunes and validates every time it runs
  • Tames outliers primarily with column pruning
  • Uses Nearest Neighbor for row pruning
    • Not all flight software is equal
    • Culls old data that is no longer relevant
• Both of these approaches require you to get more data, which may be hard to obtain
• STAR
  • Current research results suggest that we may be able to estimate almost as well without local data and LC
  • Use est vs actual instead of energy as the evaluation metric
  • Constrain parameter ranges based on:
    • the project being estimated
    • knowledge of what typically varies in your environment
  • Assumes basic COCOMO tunings are “representative”
    • Seems reasonable
Comparisons
• MRE = abs(predicted - actual) / actual
• Diff = ∑ MRE(LC) / ∑ MRE(STAR)
• Table legend: same at 95% confidence (MWU); same at 99% confidence (MWU)
• Very little difference
  • Half the time: insignificantly different
  • Otherwise, median diffs = +/- 25%
• Why so little difference?
  • Most influential inputs tightly constrained
  • Most of the variance comes from uncertainty in the SLOC, not from noise of internal staggering
[Table: comparison outcomes read diff, diff, diff, diff, same, diff, same, same, same, same]
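The two measures defined on this slide can be written out in Python as a short sketch. The formulas follow the slide; the function names and the project data are made up for illustration, and the Mann-Whitney U significance test is not reproduced here.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(actuals, lc_predictions, star_predictions):
    """Ratio of summed MRE for LC to summed MRE for STAR.

    Values below 1 mean LC had the smaller total error;
    values above 1 mean STAR had the smaller total error.
    """
    lc_total = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star_total = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc_total / star_total

# Made-up efforts for two projects, for illustration only.
actuals = [100.0, 200.0]
lc_predictions = [110.0, 180.0]    # MRE 0.1 and 0.1
star_predictions = [120.0, 160.0]  # MRE 0.2 and 0.2
ratio = diff(actuals, lc_predictions, star_predictions)
```

Because Diff is a ratio of sums, one project with a very large relative error can dominate it, which is one reason to pair it with a rank-based test such as MWU.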