Journal of Clinical Epidemiology 61 (2008) 1083–1084
INVITED COMMENTARY
Response to commentary: dealing with heterogeneity in meta-analyses of diagnostic test accuracy

Roger Harbord (a,*), Penny Whiting (a), Matthias Egger (b), Jonathan J. Deeks (c), Aijing Shang (b), Lucas M. Bachmann (d), Jonathan A.C. Sterne (a)

(a) Department of Social Medicine, University of Bristol, UK
(b) Department of Social and Preventive Medicine, University of Bern, Switzerland
(c) Department of Public Health and Epidemiology, University of Birmingham, UK
(d) Horten Centre, University of Zurich, Switzerland
DOI of original article: 10.1016/j.jclinepi.2008.05.011.
* Corresponding author. Tel.: +44-117-9287289; fax: +44-117-9287325. E-mail address: [email protected] (R. Harbord).
0895-4356/08/$ – see front matter © 2008 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2008.05.010

We thank Dr Begg for his thoughtful commentary on our paper. We agree with several of the points he makes:

The bivariate/HSROC method that we advocate is statistically more complex than its competitors. We started our study hoping that an alternative, simpler method could be recommended, but were unable to find one that consistently gave similar results. We hope that a new Stata command (metandi) that we have developed will further facilitate the use of the method. Although the bivariate/HSROC method is still relatively new, it has been used at least a dozen times in applied work in recent years [1–12].

We thank Dr Begg for pointing out that biases due to flaws in the conduct of test accuracy studies are a major cause of between-study heterogeneity: this point was not sufficiently emphasized in our paper. We agree that although covariate analyses (inclusion of study characteristics in meta-analytic models) can provide evidence for such biases, they cannot resolve them. Tools such as QUADAS [13] are useful for the careful examination of study features advocated by Dr Begg and by us. When flaws in the conduct of studies are identified, the best approach is to restrict meta-analyses to studies at low risk of bias. For example, in a meta-analysis of magnetic resonance imaging for the diagnosis of multiple sclerosis [9], we found that diagnostic odds ratios were more than 20 times greater in 11 studies with an inappropriate patient spectrum (mainly diagnostic case–control studies) than in 18 diagnostic cohort studies. We therefore restricted our main analyses to diagnostic cohort studies.

Even when meta-analyses are restricted to studies at low risk of bias, heterogeneity (due to variation in both threshold and accuracy) often remains. As in conventional random-effect meta-analysis, the bivariate/HSROC method assumes a normal distribution for the random effects. Although this assumption is hard to validate, sensitivity to distributional assumptions can readily be evaluated using the Bayesian MCMC approaches to estimation implemented in WinBUGS software. It would be desirable to evaluate whether the shape of the summary ROC curve, a feature unique to diagnostic meta-analysis, is sensitive to these distributional assumptions.

There are great dangers in using a fixed-effect framework given the substantial between-study heterogeneity present in the diagnostic meta-analyses included in our study. In particular, confidence intervals for summary estimates will be too narrow, giving a misleading impression of the accuracy with which sensitivity and specificity have been estimated. The prediction regions shown in Figure 3 of our paper give a realistic impression of uncertainty in predicting sensitivity and specificity in a new study, and can only be constructed in a random-effect framework.

For two of the data sets (BAV and OAR), estimates of sensitivity and specificity were located in a relatively small portion of ROC space. These were also the data sets for which there were the most notable differences between ROC curves estimated using the different methods. Because extrapolation outside the range of the data is problematic, we denoted the range of the data by gray rectangles on the ROC plots. Nonetheless, there were clear differences between the ROC curves within the range of the data, and this should be of concern for anyone using one of the simpler methods such as the Littenberg-Moses method for which, as we noted in our paper, the assumptions of linear regression (constant variance, covariate measured without error) do not hold.
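For readers unfamiliar with it, the Littenberg-Moses approach mentioned above can be sketched in a few lines. The following Python is a minimal illustration only, assuming the commonly described unweighted form of the method with a 0.5 continuity correction; the function names and the study counts are hypothetical and are not taken from our paper or from any software package.

```python
import math


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def moses_littenberg(studies):
    """Fit D = a + b*S by unweighted least squares.

    studies: list of (tp, fp, fn, tn) counts, one tuple per study.
    Returns the intercept a and slope b.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        # Assumed convention: add 0.5 to every cell so that zero
        # counts do not produce infinite logits.
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
        tpr = tp / (tp + fn)  # sensitivity
        fpr = fp / (fp + tn)  # 1 - specificity
        d_vals.append(logit(tpr) - logit(fpr))  # D: log diagnostic odds ratio
        s_vals.append(logit(tpr) + logit(fpr))  # S: proxy for threshold
    n = len(studies)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(a, b, fpr):
    """Sensitivity on the fitted summary ROC curve at a given false-positive rate.

    Rearranging D = a + b*S gives
    logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b).
    """
    lt = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-lt))


# Hypothetical counts (tp, fp, fn, tn), invented purely for illustration.
example_studies = [
    (45, 8, 5, 92),
    (30, 15, 10, 85),
    (60, 20, 15, 105),
    (25, 4, 12, 70),
    (80, 30, 10, 120),
]
a, b = moses_littenberg(example_studies)
```

The sketch makes the difficulty visible: S is computed from the same estimated proportions as D, so the covariate is itself subject to sampling error, and the variance of D differs between studies of different size, yet every study receives equal weight in the regression.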
On the technical matter of the assumption of statistical independence of the “cutpoint” and “accuracy” parameters of the HSROC model, we have shown that the HSROC model chooses the “shape/asymmetry/scale” parameter so as to make the correlation between the “cutpoint” and “accuracy” parameters zero (which is equivalent to statistical independence once the additional assumption that they both have normal distributions is made) [14]. The apparent assumption of statistical independence of these parameters is thus an inevitable consequence of the model; the HSROC and bivariate models are mathematically equivalent, so share the same assumptions.

We agree that further simulation studies assessing the performance of hierarchical and other models are desirable. In such a study, which appeared after we submitted our original manuscript and examined 10 as well as 50 studies per meta-analysis, Riley et al. [15] found that the bivariate/HSROC method (which they referred to as a generalized BRMA [bivariate random-effect model] for proportions) outperformed the other methods assessed (though not entirely without bias). Though there remains scope for more comprehensive studies, we consider that available theoretical, simulation and empirical evidence justifies recommending that the bivariate/HSROC method be used in preference to other currently available methods.

References

[1] Glas AS, Roos D, Deutekom M, Zwinderman AH, Bossuyt PM, Kurth KH. Tumor markers in the diagnosis of primary bladder cancer. A systematic review. J Urol 2003;169:1975–82.
[2] Gilbody S, Richards D, Brealey S, Hewitt C. Screening for depression in medical settings with the patient health questionnaire (PHQ): a diagnostic meta-analysis. J Gen Intern Med 2007;22:1596–602.
[3] Nandalur KR, Dwamena BA, Choudhri AF, Nandalur MR, Carlos RC. Diagnostic performance of stress cardiac magnetic resonance imaging in the detection of coronary artery disease: a meta-analysis. J Am Coll Cardiol 2007;50:1343–53.
[4] Kwee TC, Kwee RM. MR angiography in the follow-up of intracranial aneurysms treated with Guglielmi detachable coils: systematic review and meta-analysis. Neuroradiology 2007;49:703–13.
[5] Shaheen AAM, Myers RP. Diagnostic accuracy of the aspartate aminotransferase-to-platelet ratio index for the prediction of hepatitis C-related fibrosis: a systematic review. Hepatology 2007;46:912–21.
[6] Thangaratinam S, Daniels J, Ewer AK, Zamora J, Khan KS. Accuracy of pulse oximetry in screening for congenital heart disease in asymptomatic newborns: a systematic review. Arch Dis Child Fetal Neonatal Ed 2007;92:F176–80.
[7] Di Nisio M, Squizzato A, Rutjes AWS, Buller HR, Zwinderman AH, Bossuyt PMM. Diagnostic accuracy of D-dimer test for exclusion of venous thromboembolism: a systematic review. J Thromb Haemost 2007;5:296–304.
[8] Williams GJ, Macaskill P, Chan SF, Karplus TE, Yung W, Hodson EM, et al. Comparative accuracy of renal duplex sonographic parameters in the diagnosis of renal artery stenosis: paired and unpaired analysis. Am J Roentgenol 2007;188:798–811.
[9] Whiting P, Harbord R, Main C, Deeks JJ, Filippini G, Egger M, et al. Accuracy of magnetic resonance imaging for the diagnosis of multiple sclerosis: systematic review. Br Med J 2006;332:875–84.
[10] Bipat S, Phoa SSKS, van Delden OM, Bossuyt PMM, Gouma DJ, Lameris JS, et al. Ultrasonography, computed tomography and magnetic resonance imaging for diagnosis and determining resectability of pancreatic adenocarcinoma: a meta-analysis. J Comput Assist Tomogr 2005;29:438–45.
[11] Bipat S, Glas AS, Slors FJM, Zwinderman AH, Bossuyt PMM, Stoker J. Rectal cancer: local staging and assessment of lymph node involvement with endoluminal US, CT, and MR imaging: a meta-analysis. Radiology 2004;232:773–83.
[12] Stein PD, Hull RD, Patel KC, Olson RE, Ghali WA, Brant R, et al. D-dimer for the exclusion of acute venous thrombosis and pulmonary embolism: a systematic review. Ann Intern Med 2004;140:589–602.
[13] Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.
[14] Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 2007;8:239–51.
[15] Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol 2007;7:3.