A Taxonomy of Evaluation Approaches in Software Engineering

A. Chatzigeorgiou, T. Chaikalis, G. Paschalidou, N. Vesyropoulos, C. K. Georgiadis, E. Stiakakis. University of Macedonia, Greece. BCI 2015, Craiova, Romania, September 2015

…we regret to inform you… …the evaluation of your approach is rather weak… …unfortunately we had to reject a number of good papers… …the proposed approach lacks a thorough evaluation… …we would like to thank you for your submission, BUT… …further evaluation is required… …Congratulations, your paper has been accepted… …evaluation is backed up by systematic statistical results…

I need some proof = EVALUATION!!

Taxonomies Taxonomy: Τάξις (Arrangement) + Νόμος (Law, Method). A taxonomy “aims at organizing a collection of objects in a hierarchical manner to provide a conceptual framework for discussion and analysis”.

Goal of Study To build a taxonomy of evaluation approaches in Software Engineering

Context of Study 3 PhD students, 3 faculty members. Journals examined: TSE (IEEE Transactions on Software Engineering), TOSEM (ACM Transactions on Software Engineering and Methodology), JSS (Elsevier's Journal of Systems and Software). Articles considered: those that appeared in the corresponding 2012 volume.

Context of Study (2) Recorded for each article: Title, Authors, Journal, Issue; Free Keywords & Classification (ACM); Employed Evaluation Approach; Pages devoted to the evaluation; Total #pages. TSE: 81 articles, TOSEM: 24 articles, JSS: 207 articles. Filtered: articles that clearly did not belong in the SE domain, and Empirical Studies (Systematic Literature Reviews, surveys, mapping studies). Result: 133 articles (TSE: 58 articles, TOSEM: 22 articles, JSS: 53 articles).

Key Terms
Performance: The most typical definition of performance originates from computer architecture: performance refers to the amount of work that a system/computer/program can perform in a given time or with given resources.
Effectiveness: By effectiveness we refer to the extent to which a proposed technique/methodology accomplishes the desired goal. For example, a testing approach is effective if it reveals a large number of bugs.
Benchmark: A benchmark is a standard, acknowledged data set (consisting of tasks, collections of items, software etc.) designed with the purpose of being representative of problems that occur frequently in real domains.

Proposed Taxonomy

The goal is to make clear the advantages and disadvantages over previous work, and usually to highlight the added value of the proposed technique.

By formal treatment we mean the use of a mathematically-based approach for proving theorems, properties, invariants or the correctness of a system. Not all software engineering research can benefit from the application of formal methods. The criterion is related to the completeness of the proof: 1. the mathematical reasoning validates the entire approach; 2. the mathematical reasoning ensures the fulfillment of certain properties.

Application of the proposed tool, algorithm or technique on artificially constructed or selected case studies. Results are obtained and discussed to demonstrate the feasibility, performance or effectiveness of the approach. Empirical Evaluation, Case Studies, Case Study Evaluation, Empirical Results, Experiments, Experimental Results, …

Extent of Evaluation Papers with just one page devoted to the evaluation and papers with as many as 24 pages have been encountered.

Availability of Data

Validation of the Taxonomy
• By definition, it is difficult to assess whether taxonomies are valid, since their construction relies on the subjective interpretation of categories
• We have applied the taxonomy to articles which had not been considered during its development
• We have classified the papers from the Main Track of the 34th International Conference on Software Engineering (ICSE'2012)
• 87 articles have been considered
• We recorded:
  • Whether the paper actually introduces any technique
  • Whether the paper could be mapped to any of the derived classification categories
  • The corresponding category code

Validation of the Taxonomy (2)

Correlation between evaluation and area
RQ1: Is the evaluation approach correlated to the area of research?
H0: Variables "Area of Research" and "Evaluation Type" are independent
H1: Variables "Area of Research" and "Evaluation Type" are dependent
Areas of research correspond to a second-level classification based on the 2012 ACM Computing Classification System. A chi-square test revealed that there is no statistically significant correlation between “Evaluation Type” and “Area of Research”.
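A chi-square test of independence of the kind named in RQ1 can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `chi_square_independence` and the sample labels are hypothetical, and no data from the study is used. It computes only the Pearson statistic and the degrees of freedom; obtaining a p-value additionally needs the chi-square distribution (for instance via `scipy.stats.chi2_contingency`).

```python
from collections import Counter


def chi_square_independence(pairs):
    """Return (statistic, degrees_of_freedom) for Pearson's chi-square test.

    `pairs` is an iterable of (area, evaluation_type) labels, one per paper.
    """
    pairs = list(pairs)
    n = len(pairs)
    row_totals = Counter(area for area, _ in pairs)       # papers per area
    col_totals = Counter(ev for _, ev in pairs)           # papers per evaluation type
    observed = Counter(pairs)                             # papers per (area, type) cell

    statistic = 0.0
    for area, r in row_totals.items():
        for ev, c in col_totals.items():
            expected = r * c / n                          # count expected under H0
            statistic += (observed[(area, ev)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical, made-up labels purely for illustration.
sample = [("Area A", "Type x")] * 10 + [("Area B", "Type y")] * 10
stat, dof = chi_square_independence(sample)
```

H0 is rejected at the 0.05 level when the statistic exceeds the chi-square critical value for the given degrees of freedom (3.841 for one degree of freedom). The test is usually considered reliable only when expected cell counts are not too small, which matters when many categories share few papers.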

In Software Testing there is a tendency to employ case studies and analysis of effectiveness (i.e. how well a testing strategy achieves its goals)

Correlation between evaluation and area
RQ2: Is the extent of the evaluation correlated to the evaluation approach?
H0: The distribution of "Extent of Evaluation" is the same across categories of "Evaluation Type"
H1: The distribution of "Extent of Evaluation" is not the same across categories of "Evaluation Type"
We applied the non-parametric Independent-Samples Kruskal-Wallis test to compare the distributions across groups formed by the evaluation type variable. The result is significant at the 0.05 level. In other words, the extent of evaluation is affected by the employed evaluation strategy.
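The Kruskal-Wallis test used for RQ2 ranks all observations together and checks whether the rank sums differ between groups. The sketch below computes the H statistic with the standard tie correction; the function name `kruskal_wallis_h` and the sample values are hypothetical and are not taken from the study. A library routine such as `scipy.stats.kruskal` would also return the p-value.

```python
from itertools import chain


def kruskal_wallis_h(groups):
    """Return (H, degrees_of_freedom) for a list of groups of numbers.

    Each group holds the "extent of evaluation" values (e.g. page counts)
    of the papers sharing one evaluation type.
    """
    values = sorted(chain.from_iterable(groups))
    n = len(values)

    rank_of = {}      # value -> average rank among all observations
    tie_term = 0      # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                                  # number of tied observations
        rank_of[values[i]] = (i + 1 + j) / 2       # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")

    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / correction, len(groups) - 1


# Hypothetical page counts for three evaluation types, for illustration only.
h, dof = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For larger samples H is compared against the chi-square distribution with (number of groups - 1) degrees of freedom; with two degrees of freedom the 0.05 critical value is 5.991.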

Evaluation of efficiency on case studies, relying on explicitly stated research questions (E3.3.1.1) devotes a large percentage of the paper to the evaluation.

Conclusion In software engineering there is a vast number of different evaluation techniques, designed and executed to serve the needs of each particular piece of research. We have attempted to introduce a taxonomy of evaluation approaches. We identified 17 evaluation types, which any approach can adopt either individually or in combination with other types, and 8 axes according to which evaluation approaches can be classified.

So, the next time you receive a review pointing to the strengths or weaknesses of the evaluation approach… “We are glad to inform you that your paper … has been ACCEPTED by BCI 2015 Program Committee. Review 1: … the authors have done good job in supporting their methodology by a convincing evaluation approach …” You might be able to classify your approach based on the proposed taxonomy!

Thank you for your attention!!
