International Journal of Nursing Studies 42 (2005) 373–376
www.elsevier.com/locate/ijnurstu
Editorial
Science and art in reviewing literature

Keywords: Systematic review; Literature review; Academic review
As educators and researchers, we spend much time scrutinising work that is either published or will eventually find an outlet as papers in scholarly journals, such as the International Journal of Nursing Studies (IJNS). Over time, certain trends become apparent in the accepted approach to presenting such work. Some of these trends are to be applauded and encouraged. The CONSORT guidelines for reporting randomised controlled trials (Moher et al., 2001a, b), for example, specify essential information that must be reported in order to allow a reader to judge the validity of the study. This Journal has recently started referring authors and reviewers to these and to similar guidelines, such as QUOROM for the reporting of systematic reviews and meta-analyses of studies of effectiveness (Moher et al., 1999; www.consort-statement.org/QUOROM.pdf), and we hope that the result will be an increase in the quality not only of the papers published but also of the underlying research. There is some evidence that this may have happened in medical journals when such guidelines were adopted (Moher et al., 2001a, b).

However, not all such trends are for the good. One such is the tendency to treat all reviews of the literature as if they were, or should be, systematic reviews. Indeed there is a subtext to this, which is to equate the systematic review with a 'good' review and to regard other reviews as substandard and unfit for academic journals. At the moment we can offer little formal evidence for this trend beyond our personal experience. It is most strongly embodied, in our experience, in the literature review chapters that Doctoral or Master's degree students prepare as part of their theses. An increasing number of students begin these chapters with a statement of a search strategy: usually a series of keywords that were searched on one or more biomedical databases, with a reference to an appendix showing how these terms were combined to focus the search. Similarly, it is embodied in papers submitted to this Journal that call themselves 'systematic reviews' on the basis of little 'system' other than the listing of the terms searched.
This editorial is about this trend, which has the potential to take the art and scholarship out of reviewing and reduce it to a formulaic exercise. It argues that not all scholarly reviews need be 'systematic reviews' in the now accepted sense, and it presents an invitation for more (and, we hope, better) reviews of all kinds to be submitted to the IJNS, which has hitherto published a lower proportion of review papers than most other top ten non-specialist academic nursing journals (as rated by impact factor).

This argument and this invitation may seem contradictory, given the approving reference to the QUOROM guidelines above. But in endorsing those guidelines we are not making an argument against the traditional review. Quite the reverse is true. Traditional reviews can also be scholarly and 'systematic' in the traditional dictionary sense of "methodical, not casual or sporadic or unintentional" (Fowler and Fowler, 1964), and such reviews can continue to make an important contribution to the development of nursing science. This becomes clear through consideration of the 'science' of systematic review.

A systematic review is defined as "a review of the evidence on a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant primary research, and to extract and analyse data from the studies that are included in the review" (NHS Centre for Reviews and Dissemination, 2001). Many publications in US English use 'systematic review' and 'meta-analysis' interchangeably, although European usage tends to reserve 'meta-analysis' for the statistical synthesis of results. Most, but by no means all, attention has been focused on systematic reviews that answer questions of effectiveness. The argument for such reviews is compelling. Antman et al. (1992) demonstrated persuasively how traditional narrative reviews compiled by experts failed to accurately represent the evidence of effectiveness for a number of treatments of acute myocardial infarction.
They conducted systematic reviews of studies of effectiveness and presented the results of a statistical meta-analysis of the findings incrementally, year by year. They compared the available evidence from randomised controlled trials at each point in time with the recommendations in scholarly review articles or textbooks published in that year. For a number of treatments, these recommendations lagged far behind the available evidence, with delays of 10 years or more between evidence becoming available and a majority of reviews reflecting that evidence.

The reasons for such delay may be complex. Delays in getting reviews published may be a contributory factor, but they do not provide a full explanation. The key conclusion is that the authors of these reviews simply did not go out to look for, evaluate and synthesise the relevant evidence. Rather, they selectively quoted evidence with which they were already familiar in order to support recognised or favoured treatments, and they fell prey to the fallacy of treating statistical non-significance as proof of clinical ineffectiveness, assuming that a series of inconclusive (statistically non-significant) small early studies proved that a treatment was ineffective and could thus be ignored.
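The mechanics of such a cumulative meta-analysis can be sketched briefly. The fragment below is our illustration, not a reproduction of the Antman et al. analysis: it pools invented log odds ratios by inverse-variance weighting, trial by trial in publication-year order, and shows how a series of individually non-significant small studies can nonetheless yield a clearly significant pooled estimate.

```python
# Illustrative sketch only: a cumulative fixed-effect meta-analysis in
# the style Antman et al. reported, pooling log odds ratios by
# inverse-variance weighting in year order. The trial data are invented;
# each hypothetical trial is non-significant on its own (|z| < 1.96).
import math

trials = [  # (year, log odds ratio, standard error)
    (1975, -0.30, 0.40),
    (1978, -0.25, 0.35),
    (1981, -0.40, 0.30),
    (1984, -0.20, 0.25),
    (1987, -0.35, 0.30),
]

sum_w = 0.0         # running sum of inverse-variance weights
sum_w_effect = 0.0  # running sum of weighted effects
for year, log_or, se in trials:
    w = 1.0 / se ** 2
    sum_w += w
    sum_w_effect += w * log_or
    pooled = sum_w_effect / sum_w
    pooled_se = math.sqrt(1.0 / sum_w)
    z = pooled / pooled_se  # |z| > 1.96 indicates p < 0.05
    print(f"to {year}: pooled log OR {pooled:+.2f} "
          f"(SE {pooled_se:.2f}), z = {z:+.2f}")

# No single trial reaches significance, yet by the final year the pooled
# estimate does (z ~ -2.1): precisely the information that is lost when a
# reviewer dismisses a run of small 'negative' trials one at a time.
```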
Before hands are thrown up in horror at this apparent dereliction of scholarly duty, it is worth noting that much of this process is precisely how nascent scholars are trained to review the literature. Scholars are required to seek out evidence to support their points and are thus encouraged to quote findings selectively. There is a clear imperative that they should consider counter-evidence, but when seeking to develop a scholarly argument there is a tendency to start with supportive evidence, to consider (but not necessarily actively seek out) counter-evidence, and to find reasons to dismiss it. As students we are rewarded for showing our facility with this process. The premium on determining the actual truth is relatively low, particularly for questions and issues that do not lend themselves to simple answers and are inherently value laden. It should not be too surprising if vestiges of this process remain.

However, when it comes to health care decisions based upon matters of fact (or, rather, requiring quantifiable uncertainty), the stakes are high. Finding some evidence to support a particular treatment is a beginning. But not all evidence is equal, and certain characteristics of studies are important in determining the value of the answers they give. All things being equal, large studies are better than small ones. Studies in which a high proportion of participants have been properly followed up are to be valued above those in which many are lost. Studies in which the sampling frame and population are clear are more useful than those in which they are not, because it is then clear whether or not the sample may be biased and precisely to which population the results might generalise. Where the question is one of treatment effect, studies in which individual differences in response are properly accounted for are to be strongly preferred, because few health care interventions are so powerful that such individual variation can be ignored. This is primarily, but not exclusively, achieved by removing the possibility of systematic allocation of individuals to treatments through the use of randomisation. Other issues commonly identified as key determinants of the quality of a study (and hence the validity of its findings) include the concealment of allocation sequences and the blinding of outcome assessors to information that might influence their judgement (Jüni et al., 2001). Different criteria apply to different questions, and a particular criterion may matter more under some circumstances than others. The fundamental issue, however, is that when what is asked of the literature, or offered as an answer, is a quantifiable parameter, such as the prevalence of a problem, the impact of a treatment or the accuracy of a diagnostic or screening instrument, selective quotation of the literature is inadequate.

The systematic review thus aims to identify all valid answers from existing research to such focused questions. Explicit methods are used to judge the quality of the literature and (crucially) the same criteria are applied to all studies (see, for example, Du Moulin et al. (2005) in this month's Journal). The search for evidence is based upon a highly structured approach to searching (largely electronic) databases. Search strategies are designed by breaking the question down into its component parts using, for example, the 'problem, intervention, comparison and outcome' or 'PICO' format. Some or all of these parts are used to develop a series of searches based on synonyms and index terms, each of which aims to be maximally sensitive for its component. Finally, these sensitive searches for each part of the question are combined (using the Boolean operator AND) to achieve a search that aims to be both sensitive and specific (Glasziou et al., 2001). Clear description and documentation of this process is important in that it allows the reader to judge the comprehensiveness of the search. Recent examples published in this Journal have incorporated extended descriptions of search strategies (Griffiths et al., 2005; Hanafin et al., 2004). However, even for focused systematic reviews, such clear and reproducible aspects of the search are only partial, and processes such as 'snowballing' (checking the references of papers), 'hand searching' key journals (essentially checking tables of contents) and contacting authors for details of other published and unpublished research are also crucial to a comprehensive search (Glasziou et al., 2001).
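To make the construction concrete, the sketch below assembles such a Boolean search string in the PICO style just described. The components and synonym lists are invented for illustration (loosely echoing the continence topic of Du Moulin et al.) and are not a validated search filter for any real database.

```python
# Illustrative sketch only: building a Boolean query from hypothetical
# PICO components. Synonyms are ORed within a component (sensitivity);
# components are ANDed together (specificity).

def or_block(terms: list[str]) -> str:
    """OR together the synonyms/index terms for one PICO component,
    aiming for a maximally sensitive search for that component."""
    return "(" + " OR ".join(terms) + ")"

def pico_query(components: dict[str, list[str]]) -> str:
    """AND together the per-component blocks so that the combined
    search aims to be both sensitive and specific."""
    return " AND ".join(or_block(terms) for terms in components.values())

# Hypothetical question: nurse-led continence care in the community.
pico = {
    "problem": ['"urinary incontinence"', "incontinen*"],
    "intervention": ['"nurse led"', '"community nursing"'],
    "outcome": ["continence", '"quality of life"'],
}

print(pico_query(pico))
# ("urinary incontinence" OR incontinen*) AND ("nurse led" OR
# "community nursing") AND (continence OR "quality of life")
```

In a real strategy each block would be far larger, mixing free-text synonyms with controlled index terms, which is precisely why such strategies are usually relegated to an appendix.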
However, not all good reviews need to look like this. Form should follow function, and much depends on the purpose of the review. The literature review for a Doctoral thesis, for example, has a much broader purpose than providing an answer to a focused question. Here the goal is "…a thorough analysis of all up-to-date works concerning the research project…" (Burton, 2000). This broader purpose requires that the student survey the field in order to set the context of a study. The student needs to: identify and explore key concepts and theories; identify specific problems in a broad field; and ascertain both what is known and what is not known on the topic. Further, such reviews need to identify appropriate methods for researching the outstanding problems. To approach such a review as a series of precisely formulated questions, each in need of a systematic review, is potentially crippling and, although there have been attempts to abbreviate the process to make the task more manageable (Hanafin et al., 2004), the problem remains.

In a similar fashion, broader topic reviews that explore issues in bodies of research can make valuable contributions to scholarly journals; one such example, a discussion of the measurement of quality of life, is published in this month's IJNS (Holmes, 2005). The need for a full systematic review where there is a clearly focused question in need of empirical answers is clear. For broad topics such as this the argument is far less compelling, provided that there is no attempt to estimate empirical parameters. There is benefit in a statement of the scope of the review and of the type of material considered, but a series of formal statements of inclusion criteria and search strategies would make for an unreadable paper. Ultimately, there is an art to writing that could be lost if all scholarly overviews conformed to a single formal scientific method of reporting.

In other situations the focused systematic review is simply not an appropriate tool, even for questions of effectiveness. Greenhalgh et al. (2003) describe the process of acquiring the evidence for developing online postgraduate medical education. A narrowly focused systematic review of randomised controlled trials of effectiveness would have revealed a single study. Approaching a review, and presenting it, in a way that gives confidence that this is indeed the full extent of the studies of effect is valuable, but the resulting review would hardly illuminate anyone not contemplating a straight replication of that particular intervention. It would add little to the scientific literature beyond a reiteration of what was already published, whereas the goal of the systematic review is (or should be) to transcend the individual studies. However, a more considered approach to reviewing the topic led to the development of multiple searches for evidence based on an analysis of multiple relevant questions, such as "what is a quality experience in online education?" and "how can it be delivered effectively?". Again, the scholarly process of identifying and selecting such questions requires sensitivity to the intricacies of experience and is as much an art as a science.

In other cases there may reasonably be a less formal approach to defining questions in advance, combined nonetheless with an attempt to review all of the literature on a focused topic.
The literature review for a Ph.D. thesis, which was subsequently published as a paper, provides one such example (Griffiths and Wilson-Barnett, 1998). This review of the literature on what was then referred to as 'nursing beds' (latterly, nursing-led intermediate care) had a simple search strategy, informed by the 'PICO' format, which was briefly described in the paper. However, because neither the topic nor the relevant terminology was easy to define precisely, the strategy was broad, and much of the material was retrieved purely on the basis of prior knowledge of key authors and the use of citation searching on the Science and Social Science Citation Indices. This process, of following up the works of key authors by searching for other works that cite them, is often overlooked in systematic reviews. Yet for some topics, where terminology is imprecise or concepts are defined by a small number of key thinkers, limiting the search, and the description of the search, to keywords simply demonstrates a lack of knowledge of the subject, even though it may appear more 'systematic'.

That review addressed multiple topics, including descriptions of the clinical services, the concepts and theory underlying them, and the methods used to study them. The review also attempted to examine the evidence for effectiveness from experimental studies. This part of the paper could probably have benefited from a more formal statement of quality and selection criteria: a systematic review within the paper. However, the broader paper retains its value, in our view, and it is worth noting that it was a further 6 years before a full and formal systematic review was completed (Griffiths et al., 2004, 2005).

Not all systematic reviews need be so complex or so time consuming, and elsewhere one of us (PG) has outlined the 'mini-review' approach (Griffiths, 2002), which attempts to retain much of the virtue of the systematic review in terms of focus and the attempt to minimise bias. However, the mini-review recognises the limited resources available to many scholars, emphasises priority setting in terms of questions (so fewer outcomes may be addressed) and puts the priority on critically appraising the evidence that is found rather than on exhaustive searching. This approach seems to be supported by evidence suggesting that the quality of the review process is key and that, in general, omitting studies that are typically difficult to locate does not necessarily introduce bias (Egger et al., 2003; Moher et al., 2003), although the potential should not be discounted (McAuley et al., 2000).

Not all reviews need be systematic reviews. There is much to be learnt from the formal processes of systematic review, but it is clear that the system must vary for different reviews. There is virtue in diversity. A formal statement of search strategies and the application of selection criteria are crucial in a systematic review. In other reviews, the priorities differ. We can offer no better suggestion than that given in an editorial from another journal:
"It may be helpful to include search terms, but these will always be tempered by the sieving process undertaken by the author, using that unsurpassed instrument, the brain, to decide if the paper identified is relevant to the subject or not." (Vetter, 2003)

Rather than describing this sieving as formal inclusion/exclusion criteria, many reviews would be improved by a qualitative description of the scope and nature of the material considered. This does not, however, make them systematic reviews in the now accepted sense, and that term should be reserved for papers which meet the definition given above and which utilise those formal processes. By pointing our authors to the QUOROM statement, we hope to ensure that systematic reviews are properly conducted and well reported, so that they can serve their purpose of giving reliable and valid answers to focused questions; the science of nursing needs such focused summations of evidence, and they should be undertaken in a rigorous scientific manner. But the IJNS encourages the submission of other types of review too, and would not want all published reviews to conform to a single model. Broader, discursive reviews can and should be undertaken with rigour, and these too can make an important contribution to the development of nursing as a practice discipline and to the quality of patient care.
References

Antman, E.M., Lau, J., Kupelnick, B., Mosteller, F., Chalmers, T.C., 1992. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268 (2), 240–248.

Burton, D., 2000. Writing a thesis. In: Burton, D. (Ed.), Research Training for Social Scientists. Sage, London.

Du Moulin, M.F.M.T., Hamers, J.P.H., Paulus, A., Berendsen, C., Halfens, R., 2005. The role of the nurse in community continence care: a systematic review. International Journal of Nursing Studies, this issue, doi:10.1016/j.ijnurstu.2004.08.002.

Egger, M., Jüni, P., Bartlett, C., Holenstein, F., Sterne, J., 2003. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technology Assessment 7 (1), 1–76.

Fowler, H.W., Fowler, F.G. (Eds.), 1964. The Concise Oxford Dictionary. Oxford University Press, Oxford.

Glasziou, P., Irwig, L., Bain, C., Colditz, G., 2001. Systematic Reviews in Health Care: A Practical Guide. Cambridge University Press, Cambridge.

Greenhalgh, T., Toon, P., Russell, J., Wong, G., Plumb, L., Macfarlane, F., 2003. Transferability of principles of evidence based medicine to improve educational quality: systematic review and case study of an online course in primary health care. British Medical Journal 326 (7381), 142–145.

Griffiths, P., 2002. Introducing the mini review. British Journal of Community Nursing 7 (1), 38.

Griffiths, P., Wilson-Barnett, J., 1998. The effectiveness of 'nursing beds': a review of the literature. Journal of Advanced Nursing 27 (6), 1184–1192.

Griffiths, P., Edwards, M., Forbes, A., Harris, R., Ritchie, G., 2004. Intermediate care in nursing-led in-patient units: effects on health care outcomes and resources. The Cochrane Database of Systematic Reviews 2004 (4), Art. No.: CD002214.pub2.

Griffiths, P., Edwards, M., Forbes, A., Harris, R., 2005. Post-acute intermediate care in nursing-led units: a systematic review of effectiveness. International Journal of Nursing Studies 42 (1), 107–116.

Hanafin, S., Cowley, S., Griffiths, P., 2004. An application of the mini review to a complex methodological question: how best to research public health nursing and service quality? International Journal of Nursing Studies 41 (7), 799–811.

Holmes, S., 2005. Assessing the quality of life: reality or impossible dream? A discussion paper. International Journal of Nursing Studies 42 (4), 493–501.

Jüni, P., Altman, D.G., Egger, M., 2001. Systematic reviews in health care: assessing the quality of controlled clinical trials. British Medical Journal 323 (7303), 42–46.

McAuley, L., Pham, B., Tugwell, P., Moher, D., 2000. Does the inclusion of grey literature influence estimates of intervention effectiveness reported in meta-analyses? Lancet 356 (9237), 1228–1231.

Moher, D., Cook, D., Eastwood, S., Olkin, I., Rennie, D., Stroup, D., 1999. Improving the quality of reports of meta-analyses and randomised controlled trials: the QUOROM statement. Lancet 354, 1896–1900.

Moher, D., Jones, A., Lepage, L., for the CONSORT Group, 2001a. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 285 (15), 1992–1995.

Moher, D., Schulz, K., Altman, D., 2001b. The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomized trials. BMC Medical Research Methodology 1, 2.

Moher, D., Lawson, M., Klassen, T., 2003. The inclusion of reports of randomised trials published in languages other than English in systematic reviews. Health Technology Assessment 7 (4), 1–90.

NHS Centre for Reviews and Dissemination, 2001. Undertaking Systematic Reviews of Research on Effectiveness: CRD Guidelines for Those Carrying Out or Commissioning Reviews. CRD, York.

Vetter, N., 2003. What is a clinical review? Reviews in Clinical Gerontology 13, 103–105.
Peter Griffiths (Deputy Editor), Ian Norman (Editor-in-Chief)
Florence Nightingale School of Nursing & Midwifery, King's College London, James Clerk Maxwell Building, 57 Waterloo Road, London SE1 8WA, UK
E-mail addresses: [email protected] (P. Griffiths), [email protected] (I. Norman)