From myths and fashions to evidence-based software engineering


Presentation Transcript


  1. From myths and fashions to evidence-based software engineering Magne Jørgensen

  2. Most of the methods below have once been (some still are) fashionable ... • The Waterfall model, the sashimi model, agile development, rapid application development (RAD), unified process (UP), lean development, modified waterfall model, spiral model development, iterative and incremental development, evolutionary development (EVO), feature driven development (FDD), design to cost, 4 cycle of control (4CC) framework, design to tools, reuse-based development, rapid prototyping, timebox development, joint application development (JAD), adaptive software development, dynamic systems development method (DSDM), extreme programming (XP), pragmatic programming, scrum, test driven development (TDD), model-driven development, agile unified process, behavior driven development, code and fix, design driven development, V-model-based development, solution delivery, cleanroom development, .....

  3. The paper clip was invented by a Norwegian

  4. Short men are more aggressive (The Napoleon complex)

  5. Most (93%) of our communication is non-verbal

  6. There was/is a software crisis (page 13 of their 1994 report): “We then called and mailed a number of confidential surveys to a random sample of top IT executives, asking them to share failure stories.”

  7. 45% of features of “traditional projects” are never used (source: The Standish Group, XP 2002). No one seems to know (and the Standish Group does not tell) anything about this study! Why do so many believe (and use) this non-interpretable, non-validated claim? They benefit from it (the agile community) + confirmation bias (we all know at least one instance that fits the claim).

  8. 14% of Waterfall and 42% of Agile projects are successful (source: The Standish Group, The Chaos Manifesto 2012). Successful = “on cost, on schedule and with specified functionality”. Can you spot a serious error in this comparison?

  9. The number one in the stink parade …

  10. The ease of creating myths: Are risk-willing or risk-averse developers better? Study design: research evidence + self-generated argument. Question: Based on your experience, do you think that risk-willing programmers are better than risk-averse programmers? 1 (totally agree) – 5 (no difference) – 10 (totally disagree). Group A: average 3.3 initially, 3.5 at debriefing, 3.5 two weeks later. Group B: average 5.4 initially, 5.0 at debriefing, 4.9 two weeks later. Neutral group: average 5.0.

  11. “I see it when I believe it” vs “I believe it when I see it” • 26 experienced software managers • Different preferences on contract types: Fixed price or per hour • Clients tended to prefer fixed price, while providers were more in favor of per hour • Presentation of a data set of 16 projects with information about contract type and project outcome (client benefits and cost-efficiency of the development work) • Results: Chi-square of independence gives p=0.01
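The chi-square test of independence behind that p-value can be sketched in a few lines. The 2×2 counts below are made up for illustration (the slide reports only the p-value, not the table); rows are the managers' preferred contract type, columns which contract type each manager judged to come out better in the presented 16-project data set.

```python
from math import erfc, sqrt

# Hypothetical counts for 26 managers (the real table is not given on the slide):
a, b = 10, 3   # prefers fixed price: judged fixed price better / per hour better
c, d = 3, 10   # prefers per hour:    judged fixed price better / per hour better

n = a + b + c + d
# Chi-square of independence for a 2x2 table (no continuity correction)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# A chi-square variable with 1 df is a squared standard normal, so:
p = erfc(sqrt(chi2 / 2))
```

A small p here suggests the managers' reading of the same 16 projects depended on their prior contract preference, which is the slide's point: "I see it when I believe it."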

  12. Bias among researchers … Effect size = MMRE_analogy – MMRE_regression. [Figure: distribution of effect sizes, ranging from “regression-based models better” to “analogy-based models better”]

  13. Development of own analogy-based model (vested interests). Effect size = MMRE_analogy – MMRE_regression. [Figure: distribution of effect sizes, ranging from “regression-based models better” to “analogy-based models better”]

  14. How many results are incorrect? The effect of low power, researcher bias and publication bias

  15. 1000 statistical tests: 500 true relationships and 500 false relationships. Statistical power is 30% -> 150 true positives (green). Significance level is 5% -> 25 false positives (red). Proportion of expected statistically significant results: (150+25)/1000 = 17.5%. Correct test results: (150 + 475)/1000 = 62.5%. Correct positive tests: 150/(150+25) = 85.7% (the probability of the null hypothesis being true when p<0.05 is 14.3%, not 5%).
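The arithmetic on this slide can be reproduced directly; a minimal sketch, assuming 1000 tests split evenly between true and false relationships:

```python
# Baseline scenario: 1000 tests, half on true relationships,
# 30% statistical power, 5% significance level.
tests, true_frac = 1000, 0.5
power, alpha = 0.30, 0.05

true_rel = tests * true_frac      # 500 true relationships
false_rel = tests - true_rel      # 500 false relationships

tp = power * true_rel             # 150 true positives
fp = alpha * false_rel            # 25 false positives
tn = false_rel - fp               # 475 true negatives
fn = true_rel - tp                # 350 false negatives

sig = (tp + fp) / tests           # expected significant results: 17.5%
correct = (tp + tn) / tests       # correct test outcomes: 62.5%
ppv = tp / (tp + fp)              # positives that are correct: 85.7%
```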

  16. We observe about 50% p<0.05 in published SE experiments • We should expect 17.5% • Maximum 30%, if we only test true relationships • Researcher and publication bias

  17. The effect of adding 20% researcher bias and 30% publication bias

  18. 1000 statistical tests: statistical power is 30% -> 150 true positives (green); significance level is 5% -> 25 false positives (red). Researcher bias of 20% -> 70 more true positive tests and 95 more false positive tests (blue). Publication bias of 30% -> removes 78 negative tests among the true relationships and 114 among the false ones. Result: 42% positive tests. Correct positive tests: 65%, so one third of the reported positive tests are incorrect! Correct test results: 61% (just above half of the tests).
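The bias mechanism can be sketched the same way. Assumptions: researcher bias converts 20% of the remaining negative tests into positives, and publication bias then drops 30% of the negatives that are left. The intermediate counts differ slightly from the slide's (this model removes 84 rather than 78 negatives on the true-relationship side), but the headline percentages come out the same.

```python
# Start from the baseline: power 30% on 500 true relationships,
# alpha 5% on 500 false relationships.
tp, fn = 150.0, 350.0     # true positives / false negatives
fp, tn = 25.0, 475.0      # false positives / true negatives

r_bias, p_bias = 0.20, 0.30

# Researcher bias: 20% of the negatives on each side flip to positive.
tp, fn = tp + r_bias * fn, fn * (1 - r_bias)   # 220 / 280
fp, tn = fp + r_bias * tn, tn * (1 - r_bias)   # 120 / 380

# Publication bias: 30% of the remaining negative results never get published.
fn *= (1 - p_bias)
tn *= (1 - p_bias)

published = tp + fp + fn + tn
pos_frac = (tp + fp) / published   # about 42% positive tests
ppv = tp / (tp + fp)               # about 65% of positives correct
correct = (tp + tn) / published    # about 61% of published results correct
```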

  19. Low proportion of correct results! We need to improve statistical research practices in software engineering! In particular, we need to increase statistical power (larger sample sizes).

  20. Have you heard about the assumption of fixed variables?

  21. Illustration: Salary discrimination? • Assume an IT company that: • Has 100 different tasks it wants completed and for each task hires one male and one female employee (200 workers). • The “base salary” of a task varies (randomly) from 50,000 to 60,000 USD and is the same for the male and the female employee. • The actual salary is the base salary plus a random, gender-independent bonus, drawn with a “lucky wheel” giving amounts between 0 and 10,000 USD. • This should lead to (on average): salary of female = salary of male. • Let’s do a regression analysis with: “Salary of female = a + b × Salary of male”. • b < 1 means that women are discriminated against. • The regression analysis gives b = 0.56. Strong discrimination against women!? • Let’s repeat the analysis on the same data with the model: “Salary of male = a* + b* × Salary of female”. • The regression analysis gives b* = 0.56. Strong discrimination against men????

  22. [Two scatter plots of the same data: salary of women against salary of men, and salary of men against salary of women]

  23. How would you interpret these data? (from a published study) CR duration = actual duration (effort) to complete a change request. Interpretation by the author of the paper: larger tasks are more under-estimated.

  24. What about these data? They are from the exact same data set! The only difference is the use of the estimated instead of the actual duration as the task size variable.

  25. Economy of scale? Probably not ... (M. Jørgensen and B. Kitchenham. Interpretation problems related to the use of regression models to decide on economy of scale in software development, Journal of Systems and Software, 85(11):2494-2503, 2012.)
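The flip between slides 23 and 24 can be reproduced with a simulation. Assumptions (mine, for illustration): change requests have a lognormally distributed true size, and both the estimate and the actual duration equal that size times independent noise. Regressing the estimation error on actual duration then yields a positive slope ("larger tasks are more under-estimated"), while regressing it on estimated duration yields a negative slope, from the same data.

```python
import random

random.seed(2)

# Hypothetical change requests: true size, with independent multiplicative
# noise in both the estimate and the actual duration.
n = 500
true_size = [random.lognormvariate(3, 0.5) for _ in range(n)]
est = [s * random.lognormvariate(0, 0.3) for s in true_size]
act = [s * random.lognormvariate(0, 0.3) for s in true_size]
err = [a - e for a, e in zip(act, est)]   # positive = task was under-estimated

def slope(x, y):
    """OLS slope b in y = a + b*x."""
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

b_act = slope(act, err)   # positive: "larger (actual) tasks more under-estimated"
b_est = slope(est, err)   # negative: the opposite conclusion, same data
```

The effect is mechanical: the noise in the actual duration appears with a positive sign in the error, and the noise in the estimate with a negative sign, so whichever variable is used as "task size" drags the regression its own way.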

  26. How to make software engineering more evidence-based?

  27. Evidence-based software engineering (EBSE) The main steps of EBSE are as follows: • Convert a relevant problem or need for information into an answerable question. • Search the literature and practice-based experience for the best available evidence to answer the question. (+ create own local evidence, if needed) • Critically appraise the evidence for its validity, impact, and applicability. • Integrate the appraised evidence with practical experience and the client's values and circumstances to make decisions about practice. • Evaluate performance in comparison with previous performance and seek ways to improve it.

  28. The software industry should learn to formulate questions meaningful for their context/challenge/problem The question “Is Agile better than Traditional methods?” is NOT answerable. • What is agile? • What is traditional? • What is better? • What is the context?

  29. Learn to be more critical (myth busting) when claims are made • Find out what is meant by the claim. • Is it possible to falsify the claim? If not, what is the function of the claim? • Put yourself in a ”critical mode” • Raise the awareness of the tendency to accept claims, even without valid evidence, when you agree or the claim seems intuitively correct. • Reflect on what you would consider valid evidence for the claim. • Vested interests? • Do you agree because of the source? • Collect and evaluate evidence • Research-based, practice-based, and “own” evidence • Synthesize the evidence and conclude (if possible)

  30. Learn to question what statements and claims mean

  31. Learn how to evaluate argumentation [Diagram: Claim, Data, Warrant, Backing, Qualifier, Reservation]

  32. Learn how to use Google Scholar (or similar sources of research-based evidence)

  33. Learn how to collect and evaluate practice-based experience • Methods similar to evaluation of research-based evidence and claims • Be aware of “organizational over-learning”

  34. Learn how to create local evidence • Experimentation is simpler than you think • Pilot studies • Trial-sourcing • Controlled experiments

  35. Is it realistic to achieve an evidence-based software engineering profession? • Yes, but there are challenges. • Main challenges: • Not much research. • A high number of different contexts. • Much research has low reliability, which is sometimes hard to identify. • Opportunities: • More and better use of practice-based evidence • More experimenting in local contexts

  36. Coffee dehydrates your body?
