Evidence-Based Software Engineering: A Paradigm for the Future? Tore Dybå Chief Scientist SINTEF ICT tore.dyba@sintef.no JavaZone, Oslo, 15 September 2005
Outline • The problem • How do I know that what I know is true? • An example from object-oriented design • The evidence-based paradigm • Evidence-Based Software Engineering (EBSE) • Practical and scientific challenges • Conclusion • Bibliography
The problem • Software development organizations are employing methods and tools that frequently lack sufficient evidence regarding their suitability, limits, qualities, costs, and inherent risks. • Here’s a message from software practitioners to software researchers: • “We need your help. We need some better advice on how and when to use methodologies”. (Robert L. Glass, Communications of the ACM, vol. 47, no. 5, May 2004, pp. 19-21).
How do I know that what I know is true? • There are three basic answers to this question: • Authority – truth is given to us by someone more knowledgeable than ourselves • Omniscient authority (the authority is God) • Human authority (the authority is a human expert) • Reason – what is true is that which can be proven using the rules of deductive logic • Experience – what is true is that which can be encountered through one or more of the senses • Anecdotal experience • Empirical evidence
Relative strengths of these approaches • Omniscient authority provides absolute truth; • if there is a God and He has spoken on something, • then what He says must, by definition, be true. • Reason yields conditional absolute truth; • if the premises of a valid deductive argument are known to be true, • then the conclusion of the argument must also be true. • Empirical evidence provides probable truth; • if controlled experiments are designed properly and repeated enough times, • then it is highly probable that the results accurately describe reality. • Anecdotal experience yields possible truth; • if something happened for one person, • it is possible it might happen to others also. • Finally, human authority provides opinion.
On which approaches is SE mostly based? • The software engineering literature is filled with pronouncements about how software should be developed. • Representative comments include the following: • “The best way to develop reusable software is to use object-oriented design” • “To create reliable software, you must use n-version programming” • “The only way to ensure reliable software is to use formal methods” • “Rapid prototyping is the best way to get the specifications right” • “Functional programming is the only way to go” • “GOTOs are harmful” • “A developer is unsuited to test his or her code” • “Adding manpower to a late project makes it later”
Delegated vs. Centralized control – expert opinions • The Delegated Control Style: • Rebecca Wirfs-Brock: A delegated control style ideally has clusters of well defined responsibilities distributed among a number of objects. To me, a delegated control architecture feels like object design at its best… • Alistair Cockburn: [The delegated coffee-machine design] is, I am happy to see, robust with respect to change, and it is a much more reasonable “model of the world.” • The Centralized Control Style: • Rebecca Wirfs-Brock: A centralized control style is characterized by single points of control interacting with many simple objects. To me, centralized control feels like a “procedural solution” cloaked in objects… • Alistair Cockburn: Any oversight in the “mainframe” object (even a typo!) [in the centralized coffee-machine design] means potential damage to many modules, with endless testing and unpredictable bugs.
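To make the two control styles concrete, here is a minimal sketch of a coffee machine written both ways. It is not the design used by the quoted authors or in the experiment; every class and method name below is hypothetical and chosen only for illustration.

```python
# Hypothetical coffee-machine sketch; all names are illustrative assumptions.

class CashBox:
    def __init__(self):
        self.credit = 0

    def deposit(self, amount):
        self.credit += amount

    def try_pay(self, price):
        """Used only by the delegated design: the cash box applies its own rule."""
        if self.credit < price:
            return False
        self.credit -= price
        return True


class Dispenser:
    def __init__(self, stock):
        self.stock = stock

    def try_dispense(self):
        """Used only by the delegated design: the dispenser checks its own stock."""
        if self.stock == 0:
            return False
        self.stock -= 1
        return True


class CentralizedMachine:
    """Centralized control: one object inspects its collaborators' state
    and makes every decision itself."""

    def __init__(self, price, stock):
        self.price = price
        self.cash_box = CashBox()
        self.dispenser = Dispenser(stock)

    def buy(self):
        if self.cash_box.credit < self.price:
            return "insufficient credit"
        if self.dispenser.stock == 0:
            return "sold out"
        self.cash_box.credit -= self.price
        self.dispenser.stock -= 1
        return "coffee"


class DelegatedMachine:
    """Delegated control: the machine only coordinates; each collaborator
    owns the rule for its own responsibility."""

    def __init__(self, price, stock):
        self.price = price
        self.cash_box = CashBox()
        self.dispenser = Dispenser(stock)

    def buy(self):
        if not self.cash_box.try_pay(self.price):
            return "insufficient credit"
        if not self.dispenser.try_dispense():
            self.cash_box.deposit(self.price)  # refund the payment
            return "sold out"
        return "coffee"
```

Both machines behave identically from the outside. The difference is where the rules live: in `CentralizedMachine.buy` all logic sits in one place, while in the delegated variant a reader has to follow calls into `CashBox` and `Dispenser` to see the full behaviour.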
Evaluating the Effect of a Delegated versus Centralized Control Style on the Maintainability of Object-Oriented Software* “Assuming that it is not only highly skilled experts who are going to maintain an object-oriented system, a viable conclusion from the controlled experiment reported in this paper is that a design with a centralized control style may be more maintainable than is a design with a delegated control style.” *Erik Arisholm and Dag Sjøberg, IEEE Transactions on Software Engineering, vol. 30, no. 8, August 2004, pp. 521-534.
“Pair programming leads to faster, better, and cheaper software” – or …
The Pair Programming Experiment* • 295 junior, intermediate and senior professional Java consultants in total • 99 individuals (conducted in 2001) • 98 pairs (conducted in 2004/2005): Norway 41, Sweden 28, UK 29 *Hans Gallis, Erik Arisholm, Tore Dybå and Dag Sjøberg, Work in progress, Simula Research Laboratory and SPIKE project (NFR).
Junior developers benefit most from pair programming, in particular on complex tasks!* • The effect of pair programming on duration, effort and correctness depends on the expertise of the developers and the type of task to be solved: • The benefits of pair programming are reduced with increasing programming skills of the individuals • The benefits of pair programming are reduced with decreasing task complexity • Pair programming requires significantly more effort than individual programming (regardless of programmer category) • Juniors: • Pair programming significantly improves correctness (fewer logical errors) • The effect of pair programming on correctness is highest for the tasks based on the delegated control style • Junior pairs obtain almost the same correctness as intermediate and senior pairs (≈ 80 percent correct) • Seniors: • No clear benefits of pair programming *Hans Gallis, Erik Arisholm, Tore Dybå and Dag Sjøberg, Work in progress, Simula Research Laboratory and SPIKE project (NFR).
Archie Cochrane “It is surely a great criticism of our profession that we have not organised a critical summary, by specialty or subspecialty, adapted periodically, of all relevant randomised controlled trials.”
The Evidence-Based Paradigm • Evidence-Based Medicine (EBM) has changed research practices* • Medical researchers found • Failure to organize existing medical research cost lives • Clinical judgment of experts worse than systematic reviews • Evidence-based paradigm adopted by many other disciplines providing service to public • Social policy • Education • Psychiatry *D.L. Sackett, S.E. Straus, W.S. Richardson, W. Rosenberg, and R.B. Haynes, “Evidence-Based Medicine: How to Practice and Teach EBM”, Second Edition, Churchill Livingstone: Edinburgh, 2000.
What is evidence? • Systematic reviews • Methodologically rigorous synthesis of all available research relevant to a specific research question • Not ad hoc literature reviews • Best systematic reviews based on Randomized Controlled Trials (RCTs) • Not laboratory experiments • Trials of real treatments on real patients in a clinical setting
Integrating evidence • Medical researchers & practitioners construct practitioner-oriented guidelines • Assess the evidence • Determine strength of evidence (type of study) • Size of effects (practical not just statistical) • Relevance (appropriateness of outcome measures) • Assess applicability to other settings • Summarize benefits & harms • Present the evidence to stakeholders • Balance sheet of evidence & harms
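The point about the size of effects being "practical not just statistical" can be illustrated with a standardized effect size. The sketch below computes Cohen's d with a pooled standard deviation, which is one common measure and an assumption here, since the slides do not name a specific statistic. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference between two independent samples,
    using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Arbitrary placeholder numbers, not data from any study.
example = cohens_d([2, 4, 6], [1, 3, 5])
print(f"d = {example:.2f}")
```

A result can be statistically significant with a very small d when the sample is large, which is why the size of the effect is assessed separately from the p-value.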
Goal of EBSE • EBM: Integration of best research evidence with clinical expertise and patient values • EBSE: Adapted from Evidence-Based Medicine • To provide the means by which current best evidence from research can be integrated with practical experience and human values in the decision-making process regarding the development and maintenance of software • EBSE sets requirements on practitioners and researchers: • Practitioners need to track down and use best evidence in the context of practice • Researchers need to provide best evidence
The steps of EBSE EBSE is a process involving five steps: • Converting a relevant problem or information need into an answerable question. • Searching the literature for the best available evidence to answer the question. • Critically appraising the evidence for its validity, impact, and applicability. • Integrating the appraised evidence with practical experience and the values and circumstances of the customer to make decisions about practice. • Evaluating performance and seeking ways to improve it.
Step 1: Asking an answerable question • The first step in EBSE is to convert a relevant problem or information need into an answerable question. • Typical questions ask for specific knowledge about how to appraise and apply methods, tools, and techniques in practice. • Well formulated questions usually have three components: • The main intervention or action we are interested in. • The context or specific situations of interest. • The main outcomes or effects of interest. • Example: • “Does the use of pair programming lead to improved code quality when practiced by professional software developers?”
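The three components of a well-formulated question can be captured as a small record. This is only a sketch; the class and field names are assumptions made for illustration, and the sentence template simply reproduces the pair programming example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerableQuestion:
    intervention: str  # the main intervention or action of interest
    context: str       # the context or specific situation of interest
    outcome: str       # the main outcome or effect of interest

    def as_text(self) -> str:
        return (
            f"Does the use of {self.intervention} lead to {self.outcome} "
            f"when practiced by {self.context}?"
        )


question = AnswerableQuestion(
    intervention="pair programming",
    context="professional software developers",
    outcome="improved code quality",
)
print(question.as_text())
```

Writing the components down separately makes it easier to spot a question that is missing its context or its outcome before any literature search begins.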
Step 2: Finding the best evidence • Finding an answer to our question includes selecting an appropriate information resource and executing a search strategy. • The main source of research-based evidence is articles published in scientific journals. Examples of databases that index published articles include: • IEEE Xplore, http://ieeexplore.ieee.org • ACM Digital Library, http://www.acm.org/dl • ISI Web of Science, http://isiknowledge.com • For a general overview of the latest developments within software engineering, reading magazines such as Communications of the ACM, IEEE Computer, IEEE Software, and IT Professional is often enough.
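A search strategy usually boils down to a boolean query: synonyms for each concept joined with OR, and the concepts joined with AND. The helper below is a hypothetical sketch of that idea. The exact query syntax differs between databases, so the output is a starting point to adapt and not something any of the listed services is guaranteed to accept verbatim. The search terms are illustrative.

```python
def build_query(term_groups):
    """Join synonyms within a group with OR and join the groups with AND."""
    clauses = []
    for group in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Illustrative terms for the pair programming question.
query = build_query([
    ["pair programming", "pair-programming"],
    ["code quality", "defects"],
])
print(query)
```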
Step 3: Critically appraising the evidence • Unfortunately, published research isn’t always of good quality; the problem under study might be unrelated to practice or the research method could have weaknesses so that the results cannot be trusted. • To assess whether research is of good quality and can be applied to practice, we must be able to critically appraise the evidence. • Is there any vested interest? • Is the evidence valid? • Is the evidence important? • Can the evidence be used in practice? • Is the evidence in this study consistent with the evidence in other available studies?
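The appraisal questions above can be kept as a simple checklist that is filled in per study. The sketch below is an assumption about how one might record the answers, not a validated appraisal instrument; the names are hypothetical. Each question is mapped to the answer that counts in the study's favour, so a "yes" on vested interest is treated as a concern.

```python
# Each question maps to the answer that counts in the study's favour.
CHECKLIST = {
    "Is there any vested interest?": False,
    "Is the evidence valid?": True,
    "Is the evidence important?": True,
    "Can the evidence be used in practice?": True,
    "Is the evidence consistent with other available studies?": True,
}


def open_concerns(answers):
    """Return the questions that are unanswered or answered unfavourably."""
    return [
        question
        for question, favourable in CHECKLIST.items()
        if answers.get(question) != favourable
    ]


# Hypothetical appraisal of one study: everything favourable except that
# applicability to practice has not been judged yet.
answers = dict(CHECKLIST)
del answers["Can the evidence be used in practice?"]
print(open_concerns(answers))
```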
Step 4: Applying the evidence • Active use of new knowledge is characterized by applying or adapting specific evidence to a specific situation in practice. • Therefore, in order to practice EBSE, the individual software developer must commit himself or herself to actively engage in a learning process, combining the externally transmitted evidence with prior knowledge and experience. • It is at this point that EBSE needs to be integrated with software process improvement (SPI). • EBSE should provide the scientific basis for undertaking specific process changes, while SPI should manage the process of introducing a new technology.
Step 5: Evaluating performance • We need to consider how well we perform each step in the EBSE process and how we might improve our use of EBSE. • In particular, we should ask ourselves how well we are integrating evidence with practical experience, customer requirements, and our knowledge of the specific circumstances. • Following SPI practice, we also need to assess whether process change has been effective. • This might include After Action Reviews, Postmortem Analyses, and organization-wide measurement programs.
Software engineering challenges • No research infrastructure comparable to that of medicine. • No agreed standards for empirical studies • A proposal for formal experiments and surveys • Nothing for qualitative or observational studies • No agreed standards for systematic reviews • Few software engineering guidelines based on empirical evidence. • Challenges in addressing software engineering specifics • The skill factor • The lifecycle issue • The context dependencies
Is evidence worth waiting for?* *Scott Ambler, Answering the "Where is the Proof That Agile Methods Work" Question, http://www.agilemodeling.com/essays/proof.htm
Conclusion • Evidence-based practice works in medicine. • Experience from undertaking empirical studies, conducting systematic reviews, and teaching EBSE to students gives some confidence that it will work within software engineering as well. • However, EBSE lacks the infrastructure required to support the evidence-based paradigm • Would need financial support to put in place appropriate infrastructure • Need to develop appropriate protocols for SE studies • Some aspects of EBSE are easy to adopt, e.g. systematic reviews • EBSE needs to be tested on real and relevant problems • Guidelines for practice based on systematic reviews • Improvement strategies need to take more than scientific evidence into consideration and must be balanced according to your specific situation.
Bibliography • Barbara A. Kitchenham, Tore Dybå and Magne Jørgensen, “Evidence-Based Software Engineering,” Proceedings of the 26th International Conference on Software Engineering (ICSE 2004), Edinburgh, Scotland, 23-28 May, IEEE Computer Society, 2004, pp. 273-281. • Tore Dybå, Barbara A. Kitchenham and Magne Jørgensen, “Evidence-Based Software Engineering for Practitioners,” IEEE Software, vol. 22, no. 1, January/February, 2005, pp. 58-65. • Magne Jørgensen, Tore Dybå and Barbara A. Kitchenham, “Teaching Evidence-Based Software Engineering to University Students,” Proceedings of the 11th International Software Metrics Symposium (METRICS 2005), Como, Italy, 19-22 September, 2005. • D.L. Sackett, S.E. Straus, W.S. Richardson, W. Rosenberg, and R.B. Haynes, Evidence-Based Medicine: How to Practice and Teach EBM, Second Edition, Churchill Livingstone: Edinburgh, 2000.