Experimentation in Computer Science and Software Engineering • Kavi Khedo, Senior Lecturer, Department of Computer Science and Engineering, Faculty of Engineering, University of Mauritius • k.khedo@uom.ac.mu • http://khedo.wordpress.com
References • Tichy, W.F., “Should Computer Scientists Experiment More?”, IEEE Computer, May 1998. • Zelkowitz, M.V., and Wallace, D.R., “Experimental Models for Validating Technology”, IEEE Computer, May 1998.
Outline • Nature of computing • Why experiment? • Methods of experimentation • Issues and possible approaches • Looking ahead • Conclusion
Nature of Computing • Science or engineering? • Computers and programs are human creations. • CS is not a natural science in the traditional sense. • Computers and software • The subject of enquiry is not just technical issues, but models of information and information processes.
Computer Science • “A science is any discipline in which the fool of this generation can go beyond the point reached by the genius of the last generation.” Max Gluckman • Computer science is a young and constantly evolving discipline. It is therefore viewed in different ways by different people, leading to different perceptions of whether it is a “science” at all.
Modeling information processes • Are information processes artificial? • Where and how do they occur? • Computer models compare poorly with information processes found in nature. • e.g., nervous systems, immune systems, genetic processes, brains of programmers and users, etc.
Why experiment? • Experiments don’t prove a thing! • View of mathematicians • No amount of experimentation provides proof with absolute certainty • Experiments show the presence of errors but not their absence • A theory can be shot down by contrary evidence • Experiments test theoretical predictions against reality • A theory gets accepted if all known facts in its domain can be deduced from it and are verified by experiments • e.g., astrophysics
Why experiment? • Example of a failed theory: • The failure probability of a multi-version program is the product of the failure probabilities of the individual versions. • Experiments by Knight and Leveson showed significantly higher failure rates than predicted. • False assumption detected by experiment: that faults in program versions are statistically independent.
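The gap between the independence prediction and what an experiment can reveal may be illustrated with a small simulation. This is only a sketch under invented assumptions, not a reconstruction of the Knight and Leveson experiment: two hypothetical program versions share a small class of "hard" inputs on which both are more likely to fail, and all probabilities (`p_hard`, `p_fail_hard`, `p_fail_easy`) are made up for illustration.

```python
import random


def simulate(trials, p_hard, p_fail_hard, p_fail_easy, seed=0):
    """Estimate failure rates of two versions that share 'hard' inputs.

    All parameters are hypothetical. Each version fails independently
    *given* the input class, but failures are correlated overall because
    both versions struggle on the same hard inputs.
    """
    rng = random.Random(seed)
    fail_a = fail_b = fail_both = 0
    for _ in range(trials):
        hard = rng.random() < p_hard
        p = p_fail_hard if hard else p_fail_easy
        a = rng.random() < p  # version A fails on this input
        b = rng.random() < p  # version B fails on this input
        fail_a += a
        fail_b += b
        fail_both += a and b
    return fail_a / trials, fail_b / trials, fail_both / trials


if __name__ == "__main__":
    pa, pb, both = simulate(200_000, p_hard=0.1, p_fail_hard=0.5, p_fail_easy=0.01)
    print(f"predicted under independence: {pa * pb:.5f}")
    print(f"observed joint failure rate:  {both:.5f}")
```

With these made-up numbers the observed joint failure rate comes out several times higher than the product of the individual rates, which is the kind of discrepancy only measurement would expose.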
Why experiment? • Another example: • Artificial neural networks were originally discarded on theoretical grounds. • Experiments showed properties better than predicted. • Researchers have since developed better theories to explain what is observed.
Benefits of experimentation • Help build a reliable base of knowledge. • Reduce uncertainty about the adequacy of theories, methods and tools. • Lead to new, useful and unexpected insights. • Open new areas of investigation. • Accelerate progress by eliminating fruitless approaches, erroneous assumptions and fads.
How to experiment • General categories of experiments: • Scientific method. • Engineering method. • Empirical method.
Scientific method • Develop a theory to explain a phenomenon. • Propose a hypothesis and test alternative variations of it. • Collect data to verify or refute claims of the hypothesis.
Engineering method • Develop and test a solution to a hypothesis. • Based on the results of the test, improve the solution. • Iterate until no further improvement is needed.
Empirical method • A statistical method is proposed as a means to validate a hypothesis. • There may not be a formal model or theory describing the hypothesis. • Data are collected to verify the hypothesis.
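As one possible instance of the empirical method, the sketch below uses a permutation test to check whether two groups of measurements differ in their means. The choice of test, the function name `permutation_test`, and the sample data (imagined task completion times for two tools) are all assumptions made for illustration; the slides do not prescribe a particular statistical method.

```python
import random


def permutation_test(a, b, rounds=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the fraction of random relabelings
    whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(a)

    def mean_gap(xs):
        left, right = xs[:n_a], xs[n_a:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    pooled = list(a) + list(b)
    observed = mean_gap(pooled)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # random reassignment of group labels
        if mean_gap(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    # Hypothetical completion times in minutes; not real data.
    tool_x = [10, 11, 12, 13, 14, 15, 16, 17]
    tool_y = [20, 21, 22, 23, 24, 25, 26, 27]
    print("estimated p-value:", permutation_test(tool_x, tool_y))
```

A permutation test is used here because it needs no formal model of the data, which matches the slide's point that the empirical method may proceed without an underlying theory.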
A comparison of the scientific method (on the left) with the role of experimentation in system design (on the right).
Other important aspects • Replication • Other researchers must be able to reproduce the experiments. • Influence • Impact of experimental design on the result. • Temporal properties • Historical or current data? • Is any required information missing?
Lack of validation in CS and SE • 40% of papers requiring empirical evaluation had none, in a sample of 400 papers published by the ACM in 1993. • The figure was 50% in software-related journals. • 40-50% of SE papers were found to be unvalidated in the study by Zelkowitz and Wallace (Computer, May 1998). • The percentage is much smaller in disciplines such as physics, psychology and anthropology.
Argument: Experiments do not prove anything. • Response: • True, experiments only provide evidence for or against a theory; they cannot prove or disprove it. • However, experiments are used for theory testing, and for exploration leading to theory development. • A theory is accepted gradually by the community as evidence accumulates. • (Note the importance of repeatability.)
Argument: Traditional scientific method is not applicable • Response: • Applicability is identical; only the target object/subject changes. • We’re dealing partly with human processes and activities; these have clearly been amenable to experimentation in other disciplines. • Likewise, encodings of processes (e.g., programs) can be investigated.
Argument: The current level of experimentation is sufficient • Response: • Not when compared with other sciences • Tichy: 50% vs. 15% of papers with unsupported claims • Zelkowitz/Wallace: 40-50% unvalidated papers • Note: Tichy is not advocating replacing theory and engineering with experiment, but advocating balance.
Argument: Experiments are expensive • Response: • So what!? • It depends on the importance of the research questions; some are clearly important enough. • There’s a spectrum of experimental approaches, differing in cost, from which to choose. • Benchmarks could amortize costs. • Other scientific disciplines accept this.
Cost of experiments • Require more resources than theory. • So what? • Example: • A significant segment of the software industry switched from C to C++ at a substantial cost. • There is no solid evidence that C++ is superior to C for programmer productivity and software quality.
Benchmarks • A sample of the task domain • Effective and affordable way to experiment • Well-defined performance measurements • Used in several areas: • Speech understanding, information retrieval, pattern recognition, data warehousing and OLAP, etc. • Help to eliminate unpromising approaches and exaggerated claims
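A benchmark in this sense can be sketched as a fixed, reproducible sample of tasks plus a well-defined measurement. The harness below is a hypothetical illustration only: the task (sorting integer lists), the names `make_workload` and `run_benchmark`, and the workload sizes are assumptions, not part of any established benchmark suite.

```python
import random
import time
from statistics import median


def make_workload(n_cases, size, seed=42):
    """Build a fixed sample of the task domain; the seed makes it repeatable."""
    rng = random.Random(seed)
    return [[rng.randrange(10_000) for _ in range(size)] for _ in range(n_cases)]


def insertion_sort(xs):
    """A deliberately simple candidate to compare against the built-in sort."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def run_benchmark(candidates, workload, repeats=5):
    """Return the median wall-clock time per candidate over the whole workload."""
    results = {}
    for name, fn in candidates.items():
        # Verify correctness before timing, so wrong answers cannot "win".
        for case in workload:
            assert fn(case) == sorted(case), f"{name} gave a wrong answer"
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            for case in workload:
                fn(case)
            timings.append(time.perf_counter() - start)
        results[name] = median(timings)
    return results


if __name__ == "__main__":
    workload = make_workload(n_cases=20, size=300)
    for name, seconds in run_benchmark(
        {"builtin": sorted, "insertion": insertion_sort}, workload
    ).items():
        print(f"{name:10s} {seconds:.4f} s")
```

Reporting the median of several repeats, rather than a single run, is one simple way to reduce the influence of timing noise; absolute numbers will vary between machines.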
Argument: Demos are sufficient • Demos provide proofs of concept in the engineering sense. • They illustrate a potential, but depend on the observers’ imagination and extrapolation. • They do not produce solid evidence. • Not a substitute for the scientific process. • Satisfactory when presenting a radically new idea or a significant breakthrough. • e.g., first compiler, time-sharing system, OO language, web browser, etc. • Demos don’t investigate cause and effect, and don’t provide (statistically) quantifiable results.
Examples of questions for experimentation • Introduce theories of how requirements are refined into programs and test them. • Deeper understanding of what intelligence is. • Quality of human-computer interactions. • Relative merits of parallel machine models and algorithms. • Behavior of algorithms on typical problems.
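The last question, the behavior of algorithms on typical problems, lends itself to a small experiment. The sketch below counts the comparisons made by a textbook quicksort that always picks the first element as pivot, on random inputs versus an already sorted input. The pivot rule, the input size and the use of random permutations as "typical" inputs are assumptions chosen for illustration.

```python
import random


def quicksort_comparisons(xs):
    """Count element-to-pivot comparisons for first-element-pivot quicksort.

    Uses an explicit stack instead of recursion so that sorted inputs
    do not run into Python's recursion limit.
    """
    comparisons = 0
    stack = [list(xs)]
    while stack:
        part = stack.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # each remaining element meets the pivot once
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons


def average_on_random(n, trials=20, seed=0):
    """Average comparison count over random permutations of 0..n-1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += quicksort_comparisons(perm)
    return total / trials


if __name__ == "__main__":
    n = 200
    print("sorted input: ", quicksort_comparisons(range(n)))
    print("random inputs:", average_on_random(n))
```

For a sorted input of distinct values this variant performs n(n-1)/2 comparisons, while random inputs need far fewer on average, so measurements on representative inputs can tell a different story from worst-case analysis alone.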
Argument: Too much noise (too many variables to control) • Too many variables make experimentation hard. • Response: no more than in other fields; this is just laziness. • Human-subject experiments are particularly difficult, but other fields have developed many techniques for addressing these difficulties. • Benchmarking can simplify many questions in CS. • The composition of a benchmark is subjective, and so is the weakest link. • Is the benchmark representative enough? • Benchmarks should evolve over time to stay close to what needs to be tested.
Argument: Progress will slow • (e.g., requiring experimentation with every paper will prevent ideas from emerging.) • We are wasting time by targeting unproductive research and development; productivity might actually improve given more experimentation. • There’s no reason to prohibit conceptual papers and papers formulating new theories or hypotheses. (It’s a question of balance.)
Argument: Technology changes too fast • Technology changes too fast; experiments are irrelevant by the time they’ve been completed. • Response: • The experiment’s focus is then too narrow. • Consider instead the bigger picture (e.g., fundamental underlying questions, not ephemeral concerns).
Argument: You’ll never get it published. • Response: • Can be true, especially when you run into reviewers who don’t understand empirical science! • But this has been changing. Still, a painful process of education in empirical research methods continues to be needed.
Potential Substitutes for Experimentation • Feature comparison • Okay sometimes, but it isn’t science. • Intuition • There are plenty of examples of times when intuition has been wrong • Expert judgment • Get real. Science is built on skepticism.
Concepts vs. Experiments • Rapid publication of novel concepts and new hypotheses is important. • But questionable ideas need to be weeded out by meaningful validation. • Then scientists can concentrate on promising approaches. • Need for balance.
Problems with experiments • Unrealistic assumptions, manipulated data • Failure to provide details for repeating experiments • Results over-interpreted, or do not generalise • Scientific process can self-correct errors, hoaxes and even fraud.
CS as a harder science • Most papers take small steps forward. • Scientists should create models, formulate hypotheses and test them using experiments. • Competing theories: a new theory replacing an old one leads to a paradigm shift. • This happens in physics, but is not so evident in CS. • Physical symbol system theory vs. knowledge processing theory in AI. • A theory is needed for the behavior of algorithms on typical problems.
Conclusion • CS research used to rely far less on experiments than most other disciplines. • A good case exists for more experimentation. • Conventional scientific methods have made CS a ‘hard’ science. • Balance between theory, engineering and experimentation needed.