
The deviance problem in effort estimation


Presentation Transcript


  1. The deviance problem in effort estimation. And defect prediction. And software engineering. tim@menzies.us, PROMISE-2.

  2. Software effort estimation. Jorgensen: most effort estimation is “expert-based”, so “model-based” estimation is a waste of time. Model-based vs expert-based studies: 5 better, 5 even, 5 worse.
  Software defect prediction. Shepperd & Ince: static code measures are uninformative for software quality. Dumb LOC vs McCabe studies: 6 better, 6 even, 6 worse.
  I smell a rat. Is variance confusing the prior results?
  References: “Selecting Best Practices for Effort Estimation”, Menzies, Chen, Hihn, Lum, IEEE TSE 200X; “Data Mining Static Code Attributes to Learn Defect Predictors”, Menzies, Greenwald, Frank, IEEE TSE 200X.

  3. What you never want to hear… • “This isn't right. This isn't even wrong.” • Wolfgang Pauli

  4. Standard disclaimer • An excessive focus on empiricism … • … stunts the development of novel, pre-experimental speculations • But currently: • there is no danger of an excess of empiricism in SE • SE = a field flooded by pre-experimental speculations.

  5. Sample experiments: effort estimation and defect prediction.
  • Use public-domain data
  • Don’t test using your training data: N-way cross-val, M * randomized order
  • Include a straw man
  • Feature subset selection
  • Thou shalt script: you will run it again
  • Study mean and variance over the M * N results
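A minimal sketch of the M * N cross-validation loop described on slide 5, assuming a hypothetical learner(train) function that returns a model and a score(model, test) function that returns one performance number; none of these names come from the deck.

    # Illustrative M x N cross-validation harness (hypothetical interfaces).
    import random, statistics

    def m_by_n_crossval(data, learner, score, M=10, N=10, seed=1):
        """Repeat N-way cross-validation M times on shuffled copies of the data."""
        rng, results = random.Random(seed), []
        for _ in range(M):
            rows = data[:]
            rng.shuffle(rows)                       # M * randomize order
            folds = [rows[i::N] for i in range(N)]  # N roughly equal bins
            for i in range(N):
                test = folds[i]
                train = [r for j, f in enumerate(folds) if j != i for r in f]
                model = learner(train)              # never train on the test rows
                results.append(score(model, test))
        return statistics.mean(results), statistics.stdev(results)

The returned mean and standard deviation are exactly the “mean and variance over M * N” that the slide says to study.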

  6. Massive FSS: singletons, including LOC, are not enough. Data summarization: K.I.S.S.
  • Combine PD/PF
  • Compute and sort the combined performance deltas, method A vs all others
  • Summarize as quartiles
  • 400,000 runs
  • nb = naïve Bayes; J48 = entropy-based decision-tree learner; oneR = straw man; logNums = log the numerics
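A hedged sketch of the data-summarization step on slide 6: given one combined PD/PF score per run for each method, compute the deltas of method A against every rival, sort them, and report quartiles. The dict layout and equal run counts are assumptions for illustration.

    # Illustrative quartile summary of performance deltas, method A vs all others.
    import statistics

    def delta_quartiles(results, method_a):
        """results: dict name -> list of combined PD/PF scores (one per run)."""
        deltas = []
        for other, scores in results.items():
            if other == method_a:
                continue
            # pair up runs and record how much method A wins (or loses) by
            deltas.extend(a - b for a, b in zip(results[method_a], scores))
        deltas.sort()
        return statistics.quantiles(deltas, n=4)   # 25th, 50th, 75th percentiles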

  7. Software effort estimation. Jorgensen: most effort estimation is “expert-based”, so “model-based” estimation is a waste of time. Model-based vs expert-based studies: 5 better, 5 even, 5 worse.
  Software defect prediction. Shepperd & Ince: static code measures are uninformative for software quality. Dumb LOC vs McCabe studies: 6 better, 6 even, 6 worse.
  I smell a rat. Is variance confusing the prior results?

  8. Sources of variance.
  Software effort estimation (target class: continuous):
  30 * { shuffle; test = data[1..10]; train = data - test; <a,b> = LC(train); MRE = Estimate(a,b,test) }
  Large deviations confuse comparisons of competing methods.
  Software defect prediction (target class: discrete):
  10 * { randomly select 90% of data; score each attribute via INFOGAIN }
  Numerous candidates for “most informative” attributes; can be reduced by FSS.
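A hedged reconstruction of the effort-estimation loop on slide 8, assuming LC(train) calibrates a two-parameter model effort = a * size^b by least squares in log space, and that MRE is the usual magnitude of relative error; the (size, effort) row layout is an assumption, not something the deck specifies.

    # Illustrative 30-repeat shuffle/hold-out loop; rows are (size, actual_effort).
    import math, random, statistics

    def LC(train):
        """Least-squares fit of log(effort) = log(a) + b*log(size)."""
        xs = [math.log(size) for size, _ in train]
        ys = [math.log(effort) for _, effort in train]
        n = len(train)
        xbar, ybar = sum(xs) / n, sum(ys) / n
        b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
            sum((x - xbar) ** 2 for x in xs)
        a = math.exp(ybar - b * xbar)
        return a, b

    def mre(a, b, test):
        """Mean magnitude of relative error on the held-out rows."""
        return statistics.mean(abs(e - a * s ** b) / e for s, e in test)

    def thirty_repeats(data, holdout=10, repeats=30, seed=1):
        rng, scores = random.Random(seed), []
        for _ in range(repeats):
            rows = data[:]
            rng.shuffle(rows)                             # shuffle
            test, train = rows[:holdout], rows[holdout:]  # test = data[1..10]
            a, b = LC(train)                              # <a,b> = LC(train)
            scores.append(mre(a, b, test))                # MRE = Estimate(a,b,test)
        return statistics.mean(scores), statistics.stdev(scores)

The large run-to-run deviations the slide warns about show up directly in the standard deviation returned here.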

  9. What is feature subset selection?
  a = 10.1 + 0.3x + 0.9y - 1.2z
  “Wiggle” in x, y, z causes “wiggle” in a; removing x, y, z reduces the “wiggle” in a, but can damage mean performance.
  • INFOGAIN: fastest; useful for defect detection, e.g. 10,000 modules in defect logs
  • WRAPPER: slowest but performs best; practical for effort estimation, e.g. dozens of past projects in company databases
  • PCA: worse (empirically)
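A minimal sketch of INFOGAIN-style attribute ranking (the filter option on slide 9), assuming discrete attribute values and class labels stored as tuples; a WRAPPER would instead re-run the learner on candidate subsets, which is why it is slower but often better.

    # Illustrative information-gain scoring of discrete attributes.
    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, attr, klass=-1):
        """How much does splitting on column 'attr' reduce class entropy?"""
        base = entropy([r[klass] for r in rows])
        for value in set(r[attr] for r in rows):
            subset = [r[klass] for r in rows if r[attr] == value]
            base -= len(subset) / len(rows) * entropy(subset)
        return base

    def rank_attributes(rows, n_attrs):
        """Most informative attributes first."""
        return sorted(range(n_attrs), key=lambda a: info_gain(rows, a), reverse=True)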

  10. Warning: no single “best” theory, for either effort estimation or defect prediction.

  11. Committee-based learning. Ensemble-based learning: bagging, boosting, stacking, etc. Conclusions are drawn by voting across a committee. Ten identical experts are a waste of money; ten slightly different experts can offer different insights into a problem.
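A minimal bagging sketch for the committee idea on slide 11, assuming a hypothetical learner(train) that returns a model callable on a single row; bootstrap resampling is what makes the ten experts “slightly different”.

    # Illustrative bagging committee: bootstrap resamples plus majority vote.
    import random
    from collections import Counter

    def bagged_committee(train, learner, n_members=10, seed=1):
        """Train each member on a bootstrap resample of the training data."""
        rng = random.Random(seed)
        return [learner([rng.choice(train) for _ in train]) for _ in range(n_members)]

    def committee_predict(committee, row):
        """Each member votes; the majority class wins."""
        return Counter(model(row) for model in committee).most_common(1)[0][0]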

  12. Using committees.
  Classification ensembles: “Majority vote does as good as anything else” (Tom Dietterich).
  Numeric prediction ensembles can use other measures: “heuristic rejection rules”. Theorists: “gasp, horror”. Seasoned cost-estimation practitioners: “of course”.
  Standard statistics are failing here: t-tests report that none of these are “worse”.

  13. For any pair of treatments:
  • If one is “worse”, vote it off
  • Repeat till none is “worse”; the rest are the survivors
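A sketch of that elimination loop, assuming a pairwise worse(a, b) rejection rule (for example, one of slide 12’s “heuristic rejection rules”) that returns True when treatment a clearly loses to treatment b; whatever is left is the set of survivors.

    # Illustrative round-robin elimination of "worse" treatments.
    def survivors(treatments, worse):
        """treatments: dict name -> list of scores; worse(a, b) -> bool."""
        alive = dict(treatments)
        changed = True
        while changed:
            changed = False
            for name in list(alive):
                rivals = [r for r in alive if r != name]
                if any(worse(alive[name], alive[r]) for r in rivals):
                    del alive[name]   # vote it off
                    changed = True
                    break             # restart the round with the survivors
        return list(alive)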

  14. So… those M*N-way cross-vals: time to use them. New research area: automatic model selection methods are now required (cf. data fusion in biometrics). The technical problem is not the challenge; the open issues are explanation and expectation.

  15. Why so many unverified ideas in software engineering? • Humans use language to mark territory • Repeated effect: linguistic drift • Villages, separated by just a few miles, evolve different dialects • Language choice = who you talk to, what tools you buy • US vs THEM: Sun built Java as a weapon against Microsoft • Result: never-ending stream of new language systems • Vendors want to sell new tools, not assess them.

  16. Text mining of NFRs, traceability:
  • “The Detection and Classification of Non-Functional Requirements”, Cleland-Huang, Settimi, Zou, Solc, IEEE RE’06 (Minneapolis, 2006)
  • “Advancing Candidate Link Generation for Requirements Tracing”, Hayes, Dekhtyar, Sundaram, IEEE TSE, Jan 2006, pp. 4-19
  Software effort estimation:
  • “Selecting Best Practices for Effort Estimation”, Menzies, Chen, Hihn, Lum, IEEE TSE 200?
  Software defect prediction:
  • “Data Mining Static Code Attributes to Learn Defect Predictors”, Menzies, Greenwald, Frank, IEEE TSE 200?
  But, the tide is turning. Best paper. Yes Timmy, senior forums endorse empirical rigor.
