Journal of Clinical Epidemiology 59 (2006) 980–983
Publication bias affected the estimate of postoperative nausea in an acupoint stimulation systematic review
Anna Lee a,*, John B. Copas b, Masayuki Henmi b, Tony Gin a, Raymond C.K. Chung a
a Department of Anaesthesia and Intensive Care, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong
b Department of Statistics, University of Warwick, Coventry CV4 7AL, United Kingdom
Accepted 6 December 2005
Abstract
Background and Objective: To assess the effect of publication bias and country effect on the results and conclusion of a systematic review of wrist P6 acupoint stimulation for the prevention of postoperative nausea and vomiting.
Methods: Reanalysis of a systematic review of 26 randomized trials comparing P6 acupoint stimulation with sham, published in the Cochrane Database of Systematic Reviews, using Copas’ sensitivity approach.
Results: If it is assumed that all studies that have ever been carried out are included, or that those selected for review are truly representative of all such studies, then the estimated relative risk (RR) for nausea was 0.71 (95% CI: 0.58 to 0.88, P < .01) and for vomiting was 0.70 (95% CI: 0.56 to 0.88, P < .01) after adjusting for country effect. For nausea, adjustment for publication bias suggests that the risk has been overestimated. If around 33% of studies have been unpublished, the RR of nausea (0.92, 95% CI: 0.80 to 1.06, P = .25) is no longer significant. For vomiting, however, there is no strong evidence of publication bias. The number of unpublished studies required to substantially overturn the above significant result is implausibly large.
Conclusion: Publication bias affects the published estimate of postoperative nausea, not vomiting.
© 2006 Elsevier Inc. All rights reserved.
Keywords: Acupuncture; Methodology; Publication bias; Systematic review
1. Introduction
The presence of publication bias is a major threat to the validity of a meta-analysis [1] because it can lead to inappropriate clinical decision making and health policies. Publication bias exists because research with results that are significant and of higher quality is more likely to be submitted, published, or published more rapidly than work without such characteristics [2]. Various methods for detecting and correcting for publication bias have been reviewed [3]; each has specific limitations in its design. Although publication bias may be present to some degree in about 50% of meta-analyses, missing studies changed the conclusions in less than 10% of meta-analyses [4]. Nevertheless, researchers should always check for the presence of publication bias and perform a sensitivity analysis to assess the potential impact of missing studies [4].
* Corresponding author. Tel.: +852 2732 2735; fax: +852 2637 2422. E-mail address: [email protected] (A. Lee).
0895-4356/06/$ – see front matter © 2006 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2006.02.003
In complementary medicine, studies from Asian countries tend to report unusually high proportions of positive results in controlled clinical trials of acupuncture [5]. In Vickers et al.’s paper [5], the proportion of positive trials comparing acupuncture with controls was 100% for trials originating from China, Taiwan, Japan, Hong Kong, and Vietnam; in comparison, the proportion was 60% from 21 western developed countries such as the United States, Sweden, United Kingdom, Denmark, Germany, and Canada. Publication bias is also common in randomized controlled trials of traditional Chinese medicine [6]. Lee and Done [7] reviewed the results of 26 trials of wrist P6 acupoint stimulation to prevent postoperative nausea and vomiting in the Cochrane Database of Systematic Reviews (2004, issue 3). There were significant risk reductions of nausea (relative risk [RR] = 0.72, 95% confidence interval [CI]: 0.59 to 0.89, P = .002) and vomiting (RR = 0.71, 95% CI: 0.56 to 0.91, P = .007) despite moderate amounts of heterogeneity between the trials of P6 acupoint stimulation vs. sham. However, the approach used by Lee and Done [7] did not allow for the possibility of publication
A. Lee et al. / Journal of Clinical Epidemiology 59 (2006) 980–983
bias and the country effect on heterogeneity. We reanalyzed the results to assess the effect of publication bias and country effect on the results and conclusion of the above meta-analysis.
2. Methods
Figures 1 and 2 show the log relative risks of nausea and vomiting from the randomized controlled trials plotted against a measure of uncertainty (standard error) in that log relative risk. These plots are similar to the funnel plots used to assess the presence of publication bias. The uncertainty increases as the size of the study decreases, so that large studies are on the left of the plot and small studies on the right. Informal visual examination of Fig. 1 shows a trend, with the smaller studies (larger standard errors) giving more positive results than the larger studies. This suggests that publication bias is present. Formal statistical tests to assess the asymmetry of the funnel plot show discordant results (Begg’s test [8] P = 0.04, Egger’s test [9] P = 0.13). In Fig. 2, the trend is less obvious, and formal statistical tests also show discordant results (Begg’s test [8] P = 0.82, Egger’s test [9] P = 0.06). Both Begg’s and Egger’s tests have low power [10], and there is almost no agreement beyond chance between the two tests [11]. More importantly, these tests do not address the problem of how to proceed if publication bias is suspected [1].
2.1. Copas’ sensitivity approach
Therefore, we used Copas’ sensitivity approach to assess the effect of different levels of publication bias on the pooled estimate [12]. The method describes the apparent relationship between relative risk and study size by a fitted line. We extended the model to include a covariate (country of publication) to assess the effect of publication bias and country effect on the results and conclusion of the wrist P6 acupoint stimulation systematic review [7]. We assumed no difference in the extent of publication bias between trials from Asian and non-Asian countries. The model is based on two ideas:
1. The original distribution of the outcome from each study (i.e., log relative risk) is described by the standard random effects model with the covariate indicating country effect, and
2. The process of study selection is described by a regression model with residuals that are correlated with study outcome.
We used the following model in our sensitivity method:
$$ y_i = \mu_i + \alpha (c_i - \bar{c}) + \sigma_i \varepsilon_i, \quad \varepsilon_i \sim N(0, 1), \quad \mu_i \sim N(\mu, \tau^2), \tag{1} $$
$$ z_i = a + b / s_i + \delta_i, \quad \delta_i \sim N(0, 1), \quad \mathrm{corr}(\varepsilon_i, \delta_i) = \rho. \tag{2} $$
Fig. 1. Plot of log relative risk of nausea vs. standard error. Overall weighted average of relative risk is shown as solid line and fitted value (study selection at maximum probability of 0.80 and minimum probability of 0.60) as dashed line. Open circle represents a study from a non-Asian country; filled circle represents a study originating from Asia.
Fig. 2. Plot of log relative risk of vomiting vs. standard error. Overall weighted average of relative risk is shown as solid line and fitted value (study selection at maximum probability of 0.80 and minimum probability of 0.60) as dashed line. Open circle represents a study from a non-Asian country; filled circle represents a study originating from Asia.
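To make models (1) and (2) concrete, the following is a minimal simulation sketch, not the authors' fitting procedure. It draws hypothetical studies from model (1), applies the selection rule of model (2), and compares a simple inverse-variance pooled estimate and an Egger-type regression intercept before and after selection. Every numerical value (number of studies, range of standard errors, mu, tau, alpha, a, b, rho) and every function name is an assumption chosen for illustration. The sketch also takes sigma_i equal to s_i, and uses a negative rho so that studies with a more negative log relative risk (an apparently larger benefit) are more likely to be selected.

```python
import math
import random


def simulate_studies(n, mu, tau, alpha, a, b, rho, seed=1):
    """Draw n hypothetical studies; return a list of (y, s, selected) tuples."""
    rng = random.Random(seed)
    s = [rng.uniform(0.1, 1.4) for _ in range(n)]  # assumed standard errors
    c = [rng.randint(0, 1) for _ in range(n)]      # assumed 0/1 country indicator
    c_bar = sum(c) / n
    studies = []
    for s_i, c_i in zip(s, c):
        mu_i = rng.gauss(mu, tau)                  # mu_i ~ N(mu, tau^2)
        eps = rng.gauss(0.0, 1.0)
        # delta ~ N(0, 1), constructed so that corr(eps, delta) = rho
        delta = rho * eps + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y_i = mu_i + alpha * (c_i - c_bar) + s_i * eps  # model (1), sigma_i = s_i
        z_i = a + b / s_i + delta                       # model (2)
        studies.append((y_i, s_i, z_i > 0))             # observed only if z_i > 0
    return studies


def pooled_estimate(studies, tau):
    """Inverse-variance weighted mean with weights 1 / (s^2 + tau^2)."""
    num = den = 0.0
    for y, s, _ in studies:
        w = 1.0 / (s * s + tau * tau)
        num += w * y
        den += w
    return num / den


def egger_intercept(studies):
    """Ordinary least squares intercept from regressing y/s on 1/s."""
    x = [1.0 / s for _, s, _ in studies]
    t = [y / s for y, s, _ in studies]
    n = len(x)
    x_bar, t_bar = sum(x) / n, sum(t) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
    return t_bar - (sxy / sxx) * x_bar


# Illustrative values only; none of them is taken from the paper.
all_studies = simulate_studies(n=5000, mu=math.log(0.9), tau=0.2,
                               alpha=-0.3, a=0.21, b=0.063, rho=-0.9)
published = [st for st in all_studies if st[2]]

print("fraction published:", len(published) / len(all_studies))
print("pooled log RR, all studies:", pooled_estimate(all_studies, 0.2))
print("pooled log RR, published only:", pooled_estimate(published, 0.2))
print("Egger-type intercept, all studies:", egger_intercept(all_studies))
print("Egger-type intercept, published only:", egger_intercept(published))
```

With a nonzero rho, the published subset should give a pooled log relative risk further from zero than the full set, and a clearly nonzero intercept, which is the small-study trend that a funnel plot displays. Setting rho to 0 in this sketch removes both effects.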
Model (1) describes the complete data, including the missing studies (i.e., both published and unpublished studies in the P6 acupoint stimulation meta-analysis). Here, $y_i$ is the estimated treatment effect (e.g., a log relative risk) observed in the $i$th study, $\mu$ is the overall treatment effect in the sense of averaging the effects in Asian and non-Asian countries, $\tau^2$ is the heterogeneity variance, and $\sigma_i^2$ is the within-study sampling variance. The variable $c_i$ is a binary
Table 1
Estimated overall relative risk of nausea, α coefficient for Asian country effect, and number of unpublished studies for various probabilities of study selection

| Maximum probability (a) | Minimum probability (b) | Overall relative risk (95% CI) | P-value | Asian effect (α, 95% CI) | Goodness-of-fit of model (P-value) | Number of unpublished studies |
|---|---|---|---|---|---|---|
| 1.00 | 1.00 | 0.71 (0.58 to 0.88) | 0.002 | −0.38 (−0.95 to 0.19) | 0.070 | 0 |
| 0.99 | 0.80 | 0.75 (0.63 to 0.89) | 0.001 | −0.29 (−0.87 to 0.29) | 0.256 | 2 |
| 0.80 | 0.60 | 0.92 (0.80 to 1.06) | 0.253 | −0.15 (−0.59 to 0.29) | 0.953 | 8 |
| 0.60 | 0.30 | 0.90 (0.66 to 1.21) | 0.466 | −0.20 (−0.79 to 0.38) | 0.789 | 26 |
| 0.40 | 0.10 | 0.90 (0.66 to 1.24) | 0.529 | −0.20 (−0.79 to 0.39) | 0.934 | 88 |
| 0.20 | 0.01 | 0.90 (0.66 to 1.24) | 0.535 | −0.20 (−0.79 to 0.39) | 0.988 | 683 |

(a) Selection probability of the study with the smallest observed standard error (0.10).
(b) Selection probability of the study with the largest observed standard error (1.39).
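The maximum and minimum probabilities that index the rows of Table 1 can be translated into the selection parameters a and b of model (2). Because the residual in model (2) is standard normal, the marginal probability that a study with standard error s is selected is Phi(a + b/s), where Phi is the standard normal distribution function. Fixing this probability at the smallest and largest observed standard errors gives two linear equations in a and b. The sketch below solves them for the 0.80/0.60 row, using the standard errors 0.10 and 1.39 from the table footnotes. The function names are hypothetical, and the resulting a and b are only what these two equations imply; the paper does not report the values it used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def selection_params(p_max, p_min, s_min, s_max):
    """Solve Phi(a + b/s_min) = p_max and Phi(a + b/s_max) = p_min for (a, b).

    Both probabilities must lie strictly between 0 and 1, so the
    1.00/1.00 row (no selection at all) cannot be expressed this way.
    """
    u_max = STD_NORMAL.inv_cdf(p_max)
    u_min = STD_NORMAL.inv_cdf(p_min)
    b = (u_max - u_min) / (1.0 / s_min - 1.0 / s_max)
    a = u_max - b / s_min
    return a, b


def selection_probability(a, b, s):
    """Marginal probability P(z > 0) for a study with standard error s."""
    return STD_NORMAL.cdf(a + b / s)


a, b = selection_params(p_max=0.80, p_min=0.60, s_min=0.10, s_max=1.39)
print("a =", a, "b =", b)
print("selection probability at s = 0.50:", selection_probability(a, b, 0.50))
```

Because b is positive whenever the maximum probability exceeds the minimum, the implied selection probability falls steadily as the standard error grows, so any study with an intermediate standard error has a selection probability between the two specified values.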
(0/1) indicator of Asian versus non-Asian country, and $\bar{c}$ is the mean of the $c_i$. Model (2) describes the study-selection process. Here, $s_i$ is the standard error of $y_i$. The variable $z_i$ is a latent one, whose role is that $y_i$ is observed only when $z_i > 0$. The residuals $(\varepsilon_i, \delta_i)$ are assumed to be jointly normal with correlation $\rho$, which controls the extent of publication bias. (If $\rho = 0$, then there is no publication bias.) In our sensitivity method, we assign several possible values to the parameters $a$ and $b$ rather than estimating them. We then see how the inference for $\alpha$, $\mu$, and so on changes as the values of $a$ and $b$ vary. In Tables 1 and 2, we determine the values of $a$ and $b$ by specifying values of the “maximum probability” and “minimum probability.”
In our original model [12], the second term on the right-hand side of (1) does not exist. However, it can easily be added, as we have done. By adding this term ($\alpha$), we can investigate whether the difference between Asian and non-Asian countries (or any other study-level attribute) contributes to the heterogeneity between these studies.
3. Results
Tables 1 and 2 give the estimates of the overall relative risk for various assumptions on the probability of study selection (overall in the sense of averaging the effects in Asian and non-Asian countries). The maximum probability and minimum probability indicate the assumed selection probabilities of the study with the smallest observed standard error and of the study with the largest observed standard error. For example, maximum and minimum probabilities of 0.80 and 0.60 mean that 80% of the trials ever conducted with the smallest observed standard error and 60% of those with the largest observed standard error are in the meta-analysis; the remaining 20% and 40%, respectively, are unpublished and thus missing from the analysis. An increasing amount of publication bias is represented by decreasing these minimum and maximum probabilities. If the maximum and minimum probabilities are both 1.00, which indicates that all studies that have ever been carried out are included, then there is no publication bias. In this case, the estimated overall RR for nausea was 0.71 (95% CI: 0.58 to 0.88, P < .01) and for vomiting was 0.70 (95% CI: 0.56 to 0.88, P < .01), both highly significant. For any given values of the maximum and minimum probabilities, it is possible to estimate the number of studies that have been undertaken but not published. These estimates are shown in the last column of each table. For nausea, if eight studies were unpublished against the 16 observed studies (maximum probability = 0.80, minimum probability = 0.60), the overall RR (0.92, 95% CI: 0.80 to 1.06) would no longer be significant (P = .25). This means that a plausibly moderate number of unpublished studies can overturn
Table 2
Estimated overall relative risk of vomiting, α coefficient for Asian country effect, and number of unpublished studies for various probabilities of study selection

| Maximum probability (a) | Minimum probability (b) | Overall relative risk (95% CI) | P-value | Asian effect (α, 95% CI) | Goodness-of-fit of model (P-value) | Number of unpublished studies |
|---|---|---|---|---|---|---|
| 1.00 | 1.00 | 0.70 (0.56 to 0.88) | 0.003 | −0.60 (−1.42 to 0.22) | 0.314 | 0 |
| 0.99 | 0.80 | 0.74 (0.58 to 0.93) | 0.009 | −0.53 (−1.35 to 0.29) | 0.416 | 3 |
| 0.80 | 0.60 | 0.79 (0.62 to 1.01) | 0.057 | −0.47 (−1.28 to 0.34) | 0.610 | 10 |
| 0.60 | 0.30 | 0.82 (0.62 to 1.09) | 0.164 | −0.43 (−1.24 to 0.39) | 0.794 | 36 |
| 0.40 | 0.10 | 0.83 (0.61 to 1.11) | 0.209 | −0.42 (−1.24 to 0.40) | 0.870 | 126 |
| 0.20 | 0.01 | 0.83 (0.61 to 1.12) | 0.217 | −0.42 (−1.24 to 0.40) | 0.901 | 999 |

(a) Selection probability of the study with the smallest observed standard error (0.09).
(b) Selection probability of the study with the largest observed standard error (1.64).
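The last column of Tables 1 and 2 can be read with the help of a simple approximation. If a published study with standard error s_i had selection probability p_i, then on average (1 − p_i)/p_i comparable studies went unpublished for each such study that appears. Summing this quantity over the observed studies gives an expected number of unpublished studies. The paper does not state the formula it used, so this rule is an assumption of the sketch, as are the function names and the evenly spaced standard errors, which are hypothetical stand-ins for the 20 vomiting trials and span the range given in the footnotes of Table 2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_unpublished(std_errors, a, b):
    """Sum of (1 - p_i) / p_i with p_i = Phi(a + b / s_i) (assumed rule)."""
    total = 0.0
    for s in std_errors:
        p = STD_NORMAL.cdf(a + b / s)
        total += (1.0 - p) / p  # unpublished studies per published study
    return total


def fraction_unpublished(n_unpublished, n_observed):
    """Share of all conducted studies that are missing from the review."""
    return n_unpublished / (n_unpublished + n_observed)


# Hypothetical standard errors: 20 evenly spaced values from 0.09 to 1.64.
s_min, s_max, n_trials = 0.09, 1.64, 20
std_errors = [s_min + k * (s_max - s_min) / (n_trials - 1) for k in range(n_trials)]

# a and b implied by maximum probability 0.80 and minimum probability 0.60.
u_max, u_min = STD_NORMAL.inv_cdf(0.80), STD_NORMAL.inv_cdf(0.60)
b = (u_max - u_min) / (1.0 / s_min - 1.0 / s_max)
a = u_max - b / s_min

estimate = expected_unpublished(std_errors, a, b)
print("expected unpublished studies (hypothetical SEs):", estimate)
print("fraction unpublished, nausea:", fraction_unpublished(8, 16))
print("fraction unpublished, vomiting:", fraction_unpublished(10, 20))
```

Whatever the actual standard errors are, every selection probability in this row lies between 0.60 and 0.80, so each term lies between 0.25 and about 0.67. For 20 observed trials the sum is therefore between 5 and roughly 13.3, and for 16 trials between 4 and roughly 10.7, which brackets the tabulated values of 10 and 8. Both tabulated values correspond to one third of all conducted studies being unpublished, the "around 33%" quoted in the text.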
the result. On the other hand, for vomiting, the same maximum and minimum probabilities give 10 unpublished studies against the 20 observed studies, and the overall RR (0.79, 95% CI: 0.62 to 1.01) has P = .06, not quite significant but almost so. To reach the same P-value as in the nausea analysis (P = .25), the number of unpublished studies would have to be implausibly large, as shown in Table 2. There was no significant evidence of a country effect on heterogeneity, as indicated by the 95% confidence intervals around the $\alpha$ values in Tables 1 and 2, all of which include zero. The dashed lines in Figures 1 and 2 show the fit from the Copas’ sensitivity models when one assumes around 66% of studies have been included; these lines fit the available evidence well (P-values for testing goodness of fit of the model were 0.95 and 0.61 for nausea and vomiting, respectively). The existence of publication bias can be tested by the goodness-of-fit test for the model when the maximum and minimum probabilities are both 1.00. There was some suggestion that publication bias exists for nausea (P = .07) but not for vomiting (P = .31), supporting the results of the informal visual inspection of the funnel plots outlined above. To best detect the possibility of publication bias using the funnel plot, we followed the guidelines on the choice of axis [13], according to which funnel plots should use standard error as the measure of study size and ratio measures of treatment effect.
4. Conclusion
Although publication bias is prevalent [4], a modest degree of publication bias (where around 33% of studies are not found) can lead to nonsignificant reduction of the relative risk in nausea even after allowing for the possibility of a country effect. On the other hand, for vomiting, there is no strong evidence of publication bias, and an implausibly large number of unpublished studies is required to substantially overturn the result.
This suggests that the published estimate for nausea [7] may need to be interpreted with some caution, but that the published estimate for vomiting is likely to be still significant given plausible levels of publication bias.
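The significance statements above can be cross-checked from the reported relative risks and confidence intervals alone. The sketch below assumes a Wald-type interval that is symmetric on the log scale, recovers the standard error of the log relative risk from the interval width, and converts the resulting z statistic into a two-sided P-value. This is a generic normal approximation with a hypothetical function name, not necessarily the way the intervals in Tables 1 and 2 were computed, so the results need not agree exactly with the tabulated P-values.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(rr, lower, upper, level=0.95):
    """Two-sided P-value for log(RR) = 0, assuming a Wald interval on the log scale."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    se_log_rr = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(rr) / se_log_rr
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


# Values as reported in Tables 1 and 2.
p_nausea_no_bias = wald_p_from_ci(0.71, 0.58, 0.88)   # max = min = 1.00
p_nausea_adjusted = wald_p_from_ci(0.92, 0.80, 1.06)  # max 0.80, min 0.60
p_vomiting_adjusted = wald_p_from_ci(0.79, 0.62, 1.01)

print("nausea, no selection:", p_nausea_no_bias)
print("nausea, 0.80/0.60:", p_nausea_adjusted)
print("vomiting, 0.80/0.60:", p_vomiting_adjusted)
```

Under this approximation the unadjusted nausea estimate stays well below the .05 level, the adjusted nausea estimate does not, and the adjusted vomiting estimate falls just above it, which is the pattern the conclusion describes.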
Acknowledgments
A.L. initiated the project, contributed to the project design, data analysis, and interpretation, wrote the first draft of the article, and is the guarantor for the article. Reanalysis of the data was done jointly by J.B.C. and M.H. T.G. and R.C.K. assisted with the interpretation of the results. The article was revised jointly by all authors. This study was funded by internal funds from the Department of Anaesthesia and Intensive Care, The Chinese University of Hong Kong.
References
[1] Sutton AJ, Song F, Gilbody SM, Abrams KR. Modelling publication bias in meta-analysis: a review. Stat Methods Med Res 2000;9:421–45.
[2] Sterne JAC, Egger M, Smith GD. Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001;323:101–5.
[3] Thornton A, Lee P. Publication bias in meta-analysis: its causes and consequences. J Clin Epidemiol 2000;53:207–16.
[4] Sutton AJ, Duval SJ, Tweedie RL, Abrams KR, Jones DR. Empirical assessment of effect of publication bias on meta-analyses. BMJ 2000;320:1574–7.
[5] Vickers A, Goyal N, Harland R, Rees R. Do certain countries produce only positive results? A systematic review of controlled trials. Control Clin Trials 1998;19:159–66.
[6] Tang JL, Zhan SY, Ernst E. Review of randomised controlled trials of traditional Chinese medicine. BMJ 1999;319:160–1.
[7] Lee A, Done ML. Stimulation of the wrist acupuncture point P6 for preventing postoperative nausea and vomiting. Cochrane Database Syst Rev 2004;CD003281.
[8] Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994;50:1088–101.
[9] Egger M, Davey SG, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315:629–34.
[10] Macaskill P, Walter SD, Irwig L. A comparison of methods to detect publication bias in meta-analysis. Stat Med 2001;20:641–54.
[11] Pham B, Platt R, McAuley L, Klassen TP, Moher D. Is there a “best” way to detect and minimize publication bias? An empirical evaluation. Eval Health Prof 2001;24:109–25.
[12] Copas JB, Shi JQ. A sensitivity analysis for publication bias in systematic reviews. Stat Methods Med Res 2001;10:251–65.
[13] Sterne JAC, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol 2001;54:1046–55.