Editorial
Evaluation and Assessment in Software Engineering (EASE 05)

There is ample evidence that Software Engineering and Computer Science researchers undertake relatively little empirical validation of their research. For example, Zelkowitz and Wallace [4] classified 612 software engineering papers and found that 20% included no validation and about one third had only a weak form of validation (i.e. assertions). They also confirmed that this result was worse than in other scientific fields. Glass et al. [1] surveyed papers in six leading Software Engineering journals and found that only 14% were evaluative. More recently, in a survey of papers published in the Journal of Empirical Software Engineering (a journal not surveyed by Glass et al.), Segal et al. [3] found that, in spite of the journal's name, nearly half of the papers were not evaluative. In addition, they found the scope of these papers "somewhat narrow in topic, with measurement/metrics, maintenance, and review and inspection accounting for almost half the papers". They also suggested that we need more field studies to balance the emphasis on laboratory experiments.

So why do we not do more evaluation? Tichy [2] identified the arguments computer scientists use to defend their lack of experimentation, for example, that experimentation is inappropriate, too difficult, useless and even harmful. He refuted all of those arguments and confirmed the importance of experimentation, not least because appropriate experimentation can give industry a 3–5 year competitive advantage. However, the lack of experimentation remains a problem. Tichy suggested that software engineers find experiments difficult to perform. I have a good deal of sympathy with this viewpoint. Valid experimentation is difficult. Furthermore, it is usually easy to find flaws in any experiment, no matter how well conducted, making it difficult to get empirical papers accepted by referees.

The goal of the EASE Conference (Evaluation and Assessment in Software Engineering, http://ease.cs.keele.ac.uk) is to address this problem. EASE aims to encourage software engineers to undertake more empirical studies and provides a forum for researchers to discuss practical and methodological issues associated with evaluation. This special issue introduces four empirical studies that were first presented at the EASE Conference held in 2005. Addressing one of the issues raised by Segal et al. [3], the scope of the papers includes two field studies as well as two laboratory experiments. Only one of the papers relates to metrics, and it is not about developing new metrics; rather, it compares the effectiveness of metrics plans constructed using
the GQM (Goal Question Metric) method with metrics plans constructed using an extension to the GQM method. None of the papers is about inspections. In addition to methods for preparing metrics plans, the papers evaluate pair design in a laboratory setting, trust in outsourcing relationships by means of structured interviews, and the effectiveness of a method for undertaking lightweight process evaluations by means of qualitative analysis of evaluation reports and feedback forms. The paper on metrics planning is also a replication study.

None of these papers is flawless; conducting a flawless empirical study is virtually impossible. Rather, a good empirical study is one that reports its limitations, not one that hides them. Furthermore, all of the studies presented here provide interesting insights into topics that are of current importance to software engineering practitioners. I hope these papers will give you a flavour of the variety of topics that can be addressed by empirical studies and of the variety of empirical methodologies available to researchers. Next time you propose a new method, technique or procedure, I encourage you to think about the implications of your research and then to undertake empirical studies to confirm the value of your work. If we, as researchers, cannot be bothered to evaluate our research results, why should practitioners take the risk of adopting them?

References

[1] R.L. Glass, I. Vessey, V. Ramesh, Research in software engineering: an analysis of the literature, Information and Software Technology 44 (2002) 491–506.
[2] W.F. Tichy, Should computer scientists experiment more?, Computer 31 (5) (1998) 32–40.
[3] J. Segal, A. Grinyer, H. Sharp, The type of evidence produced by empirical software engineers, in: Proceedings of the Workshop on Realising Evidence-Based Software Engineering, ICSE 2005, http://portal.acm.org/dl.cfm.
[4] M.V. Zelkowitz, D. Wallace, Experimental validation in software engineering, Information and Software Technology 39 (11) (1997) 735–743.
Barbara Ann Kitchenham*
Department of Computer Science, Keele University,
Keele Village, Stoke-on-Trent, Staffordshire ST5 5BG, UK
E-mail address: [email protected]

Received 26 September 2005; Accepted 28 September 2005

* Tel.: +44 1622 820484.