Treatment of multiple test readers in diagnostic accuracy systematic reviews-meta-analyses of imaging studies


European Journal of Radiology 93 (2017) 59–64


Trevor A. McGrath (a), Matthew D.F. McInnes (b,*), Felipe W. Langer (c), Jiho Hong (a), Daniël A. Korevaar (d), Patrick M.M. Bossuyt (d)

(a) Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
(b) University of Ottawa, Department of Radiology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Room c159, Ottawa Hospital Civic Campus, 1053 Carling Ave., Ottawa, ON, K1Y 4E9, Canada
(c) Faculty of Medicine, Federal University of Santa Maria, Rio Grande do Sul, Brazil
(d) Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

* Corresponding author. E-mail addresses: [email protected] (T.A. McGrath), [email protected] (M.D.F. McInnes), [email protected] (F.W. Langer), [email protected] (J. Hong), [email protected] (D.A. Korevaar), [email protected] (P.M.M. Bossuyt).

http://dx.doi.org/10.1016/j.ejrad.2017.05.032
Received 2 March 2017; Received in revised form 4 May 2017; Accepted 23 May 2017
0720-048X/ © 2017 Published by Elsevier Ireland Ltd.

Keywords: Diagnostic imaging; Research design; Systematic review; Evidence-based medicine; Data reporting

Abstract

Objective: To evaluate the handling of multiple readers in imaging diagnostic accuracy systematic reviews-meta-analyses.

Methods: A search was performed for imaging diagnostic accuracy systematic reviews that performed meta-analysis, published from 2005 to 2015. Handling of multiple readers was classified as: 1) averaged; 2) 'best' reader; 3) 'most experienced' reader; 4) each reader counted individually; 5) random; 6) other; or 7) not specified. The incidence and reporting of multiple reader data were assessed in the primary diagnostic accuracy studies included in a random sample of reviews.

Results: Only 28/296 (9.5%) meta-analyses specified how multiple readers were handled: 7/28 averaged results, 2/28 selected the 'best' or 'most experienced' reader, 14/28 treated each reader as a separate dataset, 1/28 randomly selected a reader, and 4/28 used other methods. A random sample of 27 of the 268 'not specified' reviews yielded 442 primary studies. 270/442 (61%) primary studies had multiple readers: 164/442 (37%) reported consensus reading, 87/442 (20%) reported inter-observer variability, and 9/442 (2%) reported independent datasets for each reader. 26/27 (96%) meta-analyses contained at least one primary study with multiple readers.

Conclusions: Reporting of how multiple readers were treated in imaging systematic reviews-meta-analyses is uncommon, and the methods used varied widely. This may result from a lack of guidance, the unavailability of appropriate statistical methods for handling multiple readers in meta-analysis, and sub-optimal primary study reporting.

1. Introduction

Many studies that evaluate the diagnostic accuracy of imaging modalities involve multiple readers, meaning that more than one physician interprets each examination. This is commonly done to assess inter-observer variability, or to examine the impact of reader experience or expertise on diagnostic accuracy. Multiple independent readers are preferred to a single reader, as a single reader may have expertise that is difficult for others to reproduce. Reporting measures of inter-observer variability can help assess the generalizability of a diagnostic test to clinical practice [1]. However, the presence of multiple readers in primary diagnostic accuracy studies presents unique challenges when researchers try to synthesize the available evidence, such as in systematic reviews of imaging studies and corresponding meta-analyses [2–4].

When faced with this challenge, authors of systematic reviews-meta-analyses have several options: 1) use an average of the diagnostic accuracy results across readers within a study [5]; 2) select the 'best' reader within a study (i.e., the reader who reached the highest accuracy) [6]; 3) select the 'most experienced' reader within a study (i.e., the most years of clinical experience) [7]; 4) count each reader within a study as an 'individual study' [8]; or 5) randomly select one reader within a study [9]. At present, there are no recommendations regarding which strategy is optimal [10–12].
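To make the differences between these options concrete, the following minimal sketch shows how each strategy produces a different dataset from the same primary study. The per-reader 2x2 counts and all names are hypothetical, invented for illustration; they are not drawn from any of the cited reviews.

    import random

    # Hypothetical 2x2 counts (TP, FP, FN, TN) for three readers of one study.
    readers = {
        "reader_1": {"tp": 45, "fp": 5,  "fn": 5,  "tn": 45},
        "reader_2": {"tp": 40, "fp": 8,  "fn": 10, "tn": 42},
        "reader_3": {"tp": 48, "fp": 12, "fn": 2,  "tn": 38},
    }

    def sensitivity(d):
        return d["tp"] / (d["tp"] + d["fn"])

    def specificity(d):
        return d["tn"] / (d["tn"] + d["fp"])

    # Strategy 1: average the accuracy estimates across readers.
    avg_sens = sum(sensitivity(d) for d in readers.values()) / len(readers)
    avg_spec = sum(specificity(d) for d in readers.values()) / len(readers)

    # Strategy 2: keep only the 'best' reader (ranked here by Youden's index;
    # reviews rarely state which ranking criterion they used).
    best = max(readers.values(), key=lambda d: sensitivity(d) + specificity(d) - 1)

    # Strategy 3 ('most experienced') would need experience metadata per reader.

    # Strategy 4: every reader enters the meta-analysis as its own 'study'.
    as_separate_studies = list(readers.items())

    # Strategy 5: one randomly selected reader.
    random_reader = random.choice(list(readers.values()))

Note how strategies 2 and 5 discard data, strategy 1 collapses inter-reader spread, and strategy 4 triples the apparent number of studies; these are exactly the trade-offs discussed below.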


Each of these strategies has potential disadvantages. If the results of multiple readers are averaged, heterogeneity from inter-observer variability may be minimized; however, the impact of averaging is smaller when inter-observer agreement is high. Therefore, if results are averaged, reporting of inter-observer variability is crucial for evaluating the meta-analysis results. If only the best or most experienced readers are selected, the test's diagnostic accuracy may be overestimated and may not reflect what is achievable in daily practice; this effect is less pronounced when all readers perform similarly, but will likely consistently overestimate test accuracy. If each reader is treated as an individual study, the results of a single study will be over-represented in the sample, the biases inherent to the study design will be magnified in the pooled results, and additional statistical challenges arise from the paired nature of the data; no straightforward statistical solutions currently exist for this. When a reader is chosen at random, there is a risk of bias due to sampling error: if, by chance, the best or worst readers were selected across studies, the results of the systematic review-meta-analysis would be biased.

The purpose of our study was to evaluate current reporting of how multiple reader data are handled in systematic reviews-meta-analyses of diagnostic accuracy studies in imaging.

2. Materials and methods

2.1. Identification of systematic reviews-meta-analyses

Medline was searched through PubMed, applying the database's systematic review filter, combined with a previously published search filter for diagnostic accuracy studies (Appendix 1 in Supplementary data) [14,15]. The search was restricted to "radiology, nuclear medicine & medical imaging" journals, as defined by Thomson Reuters' Journal Citation Reports [16]. A list of the 127 included journals is available in Appendix 2 in Supplementary data. Searches were performed on May 31, 2015, and limited to articles published from January 1, 2005. A ten-year period was selected to obtain a substantial sample of relatively recent reviews without an inordinate number of results. No language restrictions were applied. An analysis of the same set of systematic reviews-meta-analyses, conducted for a different purpose, was reported previously [17].

Inclusion criteria: only systematic reviews that performed meta-analyses of diagnostic accuracy imaging studies were included. One author (XX, second year medical student) reviewed retrieved titles and abstracts and, if potentially relevant, assessed the full texts. In case of any doubt, eligibility of the article was discussed with a second author (YY, fellowship-trained abdominal radiologist with 10 years of clinical experience and 6 years of experience in performing systematic reviews).

2.2. Data extraction

Two authors (XX and YY) independently extracted the following data from each included systematic review-meta-analysis: first author, journal of publication, year of publication, country and continent of the corresponding author, subspecialty area, and review title. They also independently extracted the methods each systematic review-meta-analysis reported for handling multiple readers within a primary study, classified as: (1) results were averaged across readers; (2) results of the 'best' reader were selected; (3) results of the 'most experienced' reader were selected; (4) results of all readers were selected and treated as individual studies; (5) results of one randomly selected reader were used; (6) another type of strategy; or (7) not specified. Extraction relied on full reading of the methods and results sections, as well as full text keyword searches including "reader", "observer", "rater" and "multiple".

Additionally, one author (AA, second year medical student) extracted whether included systematic reviews-meta-analyses reported inter-observer variability statistics (e.g., Cohen's kappa coefficient or intraclass correlation coefficient (ICC)) within primary studies, by reading the methods and results sections along with full text keyword searches including "kappa", "inter", "observer" and "rater". Extraction was verified by XX by repeating 20% of all extractions performed, and discrepancies were resolved through discussion.
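For reference, Cohen's kappa quantifies agreement between two readers beyond what chance alone would produce. A minimal sketch for binary ratings follows; the data and function name are illustrative only and are not drawn from any included study.

    def cohens_kappa(ratings_a, ratings_b):
        # Observed agreement minus chance agreement, scaled by the
        # maximum possible agreement beyond chance.
        n = len(ratings_a)
        observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        p_pos_a = sum(ratings_a) / n
        p_pos_b = sum(ratings_b) / n
        expected = p_pos_a * p_pos_b + (1 - p_pos_a) * (1 - p_pos_b)
        return (observed - expected) / (1 - expected)

    # Two readers' binary calls (1 = positive) on ten hypothetical exams.
    reader_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    reader_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print(cohens_kappa(reader_a, reader_b))  # 0.6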

2.3. Post hoc primary study assessment

Because reporting of multiple reader handling in the included systematic reviews-meta-analyses was scarce, a post hoc assessment was undertaken to assess the incidence of multiple readers at the primary study level in a subset of the included systematic reviews-meta-analyses, to determine whether review authors were under-reporting or whether the data were absent at the primary study level. Systematic reviews-meta-analyses reporting on the methods used for handling multiple readers, and thus clearly containing primary studies with multiple readers, were excluded from this post hoc assessment. To balance sufficient sample size against the manual effort required, a random 10% sample of the systematic reviews-meta-analyses for which multiple reader handling was categorized as "not specified" was selected using a random number generator [18]. From this sample, full texts of all primary studies included in the reviews were retrieved and assessed for: (1) the presence of multiple readers; (2) reporting of a single consensus reading from multiple readers (i.e., multiple readers interpret images, discuss discrepant results, reach consensus, and report one final dataset); (3) reporting of inter-observer variability (e.g., Cohen's kappa coefficient or intraclass correlation coefficient (ICC)); and (4) reporting of independent datasets for each reader. Two authors (AA and BB, first year medical student) each performed half of this data extraction from primary studies. Extraction was verified by XX by repeating 20% of all extractions performed, and discrepancies were resolved through discussion.
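A 10% sample of this kind can be drawn reproducibly in a few lines. The sketch below uses our own seed and placeholder identifiers, not the actual procedure or review list from the paper.

    import random

    random.seed(2015)  # arbitrary seed, our choice, for reproducibility
    not_specified_reviews = [f"review_{i:03d}" for i in range(268)]
    sample = random.sample(not_specified_reviews,
                           k=round(0.10 * len(not_specified_reviews)))
    print(len(sample))  # 27, matching the 10% sample size reported below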

3. Results

3.1. Identification of systematic reviews-meta-analyses

The literature search yielded 839 articles. After screening titles and abstracts, 532 articles were excluded because they did not meet the inclusion criteria. Full texts of the remaining 307 articles were retrieved and assessed for inclusion. Eleven articles were excluded at the full text level for reasons specified in Fig. 1 and Appendix 3 in Supplementary data, yielding 296 systematic reviews for the study, published in 45 different journals. Summary demographic data of the included reviews are presented in Table 1. Characteristics of the individual included reviews are available in Appendix 4 in Supplementary data.

Fig. 1. Study flow diagram outlining search and study selection.

3.2. Multiple reader handling in systematic reviews-meta-analyses

28 of the 296 included systematic reviews-meta-analyses (9.5%) reported their methods for handling multiple readers in primary studies: 7/28 (25%) averaged the results from multiple readers within a study, 1/28 (3.5%) selected only the 'best' reader within a study, 1/28 (3.5%) selected only the 'most experienced' reader within a study, 14/28 (50%) treated each individual reader within a study as a separate dataset, 1/28 (3.5%) randomly selected one reader within a study, and 4/28 (14%) used 'other' treatments. Of the reviews that used 'other' methods, 3 used data from the 'first' reader reported within a study. These data are summarized in Table 2 and depicted in Fig. 2.

5/296 systematic reviews-meta-analyses reported statistics on inter-observer variability (kappa or ICC) among included primary studies in some capacity; none of these included the inter-observer variability statistics in their formal data analysis.


3.3. Multiple readers in primary studies


A random sample of 27 reviews for which multiple reader handling was classified as 'not specified' was selected, together including 496 primary diagnostic accuracy studies. Of these primary studies, 8 were excluded as duplicates, 39 because they were published in languages other than English, and 7 because full texts could not be obtained (Fig. 1), leaving 442 primary studies for assessment. Duplicate extraction of 20% of these studies identified discrepancies in 1% (3/258) of extracted data elements. 270/442 (61%) of the included primary studies reported the presence of multiple readers of the index test: 164/442 (37%) reported reading by consensus, 87/442 (20%) reported an assessment of inter-observer variability, and 9/442 (2%) reported independent datasets for each reader; these classifications were not mutually exclusive.

77/442 (17%) reported only that multiple readers were used, without reporting independent datasets or how the multiple reader data were combined into a single dataset. A summary of reporting is provided in Table 3.

Twenty-six of the 27 (96%) systematic reviews-meta-analyses assessed included at least one primary study reporting multiple readers: 24/27 (89%) included one or more primary studies that reported consensus reading, 21/27 (78%) included one or more primary studies that reported on inter-observer variability, and 8/27 (30%) included one or more primary studies that reported independent reader datasets. Therefore, of the 26 systematic reviews-meta-analyses assessed that contained at least one primary study with multiple readers, 8 (30%) actually had to take multiple reader data into account in their analyses but did not report how they did so.


Table 1
Demographic data of included diagnostic accuracy systematic reviews of imaging studies.

  Demographic                                    # of Systematic Reviews
  Total reviews                                  296
  Year
    2005                                         12
    2006                                         4
    2007                                         11
    2008                                         24
    2009                                         12
    2010                                         20
    2011                                         30
    2012                                         48
    2013                                         48
    2014                                         57
    2015                                         30
  Subspecialty
    Breast                                       13
    Cardiac                                      48
    Gastrointestinal                             50
    Genitourinary                                27
    Head & Neck                                  13
    Musculoskeletal                              15
    Neuroradiology                               15
    Nuclear Medicine                             76
    Obstetrical                                  18
    Other/Not Applicable                         4
    Pediatric                                    2
    Thoracic                                     10
    Vascular & Interventional                    5
  Continent (a)
    Asia                                         129
    Australia                                    6
    Europe                                       119
    North America                                42
  Journal (b)
    Academic Radiology                           16
    American Journal of Roentgenology            15
    Clinical Nuclear Medicine                    11
    European Journal of Nuclear Medicine and Molecular Imaging   12
    European Journal of Radiology                35
    European Radiology                           37
    Nuclear Medicine Communications              21
    Radiology                                    31
    Ultrasound in Obstetrics and Gynecology      20
    All journals with < 10 studies               98

(a) Continent was determined by the address of the corresponding author.
(b) Only journals that published ≥ 10 systematic reviews of diagnostic accuracy studies are presented individually.

Table 2
Reporting of multiple reader handling in 296 diagnostic accuracy systematic reviews of imaging studies.

  Specified Multiple Reader Handling             # of Total Systematic Reviews (%)
  Yes                                            28 (9.5)
  No                                             268 (90.5)

  Method of Multiple Reader Handling             # of Specified Systematic Reviews (%)
  Averaged                                       7 (25)
  'Best' reader                                  1 (3.5)
  'Most experienced' reader                      1 (3.5)
  Treated individually as separate dataset       14 (50)
  Randomly selected reader                       1 (3.5)
  Other method                                   4 (14)

Fig. 2. Frequency of methods for handling multiple readers (January 2005–May 2015) in diagnostic accuracy systematic reviews of imaging studies.

Table 3
Reporting of multiple readers in 442 primary diagnostic accuracy studies of imaging modalities.

  Multiple Readers Present                       # of Total Primary Studies (%)
  Yes                                            270 (61)
  No                                             172 (39)

  Multiple Reader Reporting                      # of Multiple Reader Primary Studies (%)
  Consensus only                                 106 (39)
  Inter-observer variability only                21 (8)
  Consensus + inter-observer variability         57 (21)
  Inter-observer variability + independent multiple reader data   8 (3)
  Consensus + inter-observer variability + independent multiple reader data   1 (< 1)
  No additional information on multiple reader data   77 (28)

4. Discussion

We found that diagnostic accuracy systematic reviews-meta-analyses of imaging studies infrequently report on methods for handling multiple readers, even though almost all such reviews must deal with primary studies that included multiple readers; this may be compounded by a lack of clear reporting of multiple reader data in the included primary diagnostic accuracy studies. Among the few systematic reviews-meta-analyses that do report on multiple readers, methods for handling them vary considerably. Although guidance for performing systematic reviews-meta-analyses of diagnostic accuracy studies is available, there are presently no recommendations for handling or reporting multiple readers in such reviews [20,21]. This lack of guidance may be responsible for the variability in reporting and in the methods used.

No currently available method for treating multiple readers in systematic reviews-meta-analyses is without shortcomings. If the results of multiple readers are averaged, heterogeneity from inter-observer variability may be obscured. If only the best or most experienced readers are selected, the test's diagnostic accuracy may be overestimated. If only a random or 'first' reader is chosen, data are omitted from the meta-analysis and reporting is incomplete. If each reader is treated as an individual study, the results of that study may be over-represented in the sample, additional statistical challenges arise, and the biases inherent to the study design will be magnified in the pooled results.

A novel hierarchical statistical model may need to be developed to overcome this hurdle in meta-analyses of diagnostic test accuracy. A hierarchical model that accounts for between-observer variability within studies and between-study variability would be feasible if multiple reader data were reported consistently at the primary study level. Such a technique has the potential to account for inter-observer variability within a study and for the correlation between sensitivity and specificity using a bivariate framework. Using such a model, all readers would be included in the meta-analysis, inter-observer variability at the primary study level would not be lost, and no single study would be over-represented. Unfortunately, to our knowledge, such a model does not currently exist. Until one is developed, the best option may be for systematic review authors to disclose their methodology for handling multiple readers, so that the inherent bias of the chosen method can be taken into account by readers of the systematic review-meta-analysis.
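As a notational illustration only (this is our own sketch, not an established or validated model), such a hierarchy might nest reader-level accuracy within study-level means in a bivariate framework:

    \begin{aligned}
    \bigl(\operatorname{logit} Se_{ij},\ \operatorname{logit} Sp_{ij}\bigr)^{\top}
      &\sim \mathcal{N}\bigl(\boldsymbol{\theta}_i,\ \Sigma_{\mathrm{reader}}\bigr)
      && \text{(reader } j \text{ within study } i\text{)} \\
    \boldsymbol{\theta}_i
      &\sim \mathcal{N}\bigl(\boldsymbol{\mu},\ \Sigma_{\mathrm{study}}\bigr)
      && \text{(between-study variation)}
    \end{aligned}

Here the observed true-positive and true-negative counts for each reader would follow binomial likelihoods given Se_ij and Sp_ij, Sigma_reader captures inter-observer variability within a study, and Sigma_study captures the usual between-study heterogeneity and sensitivity-specificity correlation. A full model would additionally need to handle the correlation induced by readers interpreting the same patients, the paired-data problem noted above, which this naive sketch does not address.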
For authors of systematic reviews-meta-analyses to take multiple reader data into account in their analyses, they rely on reporting in primary diagnostic accuracy studies. Our findings suggest that the majority of primary imaging diagnostic accuracy studies have multiple reader data, but only a small minority provide information on inter-observer variability and individual reader datasets that systematic reviews could use. Given the small number of primary studies reporting multiple reader data, attempting to determine what effect the various methods of treating multiple readers have on summary estimates of diagnostic accuracy is, at present, not feasible in a meaningful manner.

While not formally addressed in this study, reporting of intra-observer variability (if assessed) and of all applicable datasets is equally important at the primary study level. Multiple datasets may not arise from multiple readers per se, but many of the same problems and principles apply to reporting and analyzing multiple datasets from a single reader in a meta-analysis.

Guidance on how to address multiple reader data in primary diagnostic accuracy studies is currently lacking. The STAndards for Reporting of Diagnostic accuracy studies (STARD) statement was updated in 2015 [22].




This 30-item checklist contains essential items that should be reported in any diagnostic accuracy study report. While these guidelines serve to elevate the baseline level of reporting in diagnostic test accuracy studies, none of the items on the checklist is specific to imaging studies. The presence of multiple readers is highly prevalent among imaging studies and may not be encountered in other fields of diagnostic testing. An extension of the STARD 2015 checklist providing reporting guidance specific to imaging studies could help address the multiple readers issue and other issues specific to imaging test accuracy studies.

The shortcomings of using consensus reads for multiple readers in primary diagnostic accuracy studies have been described previously [1]. Methods exist to generate average test accuracy measures and to assess variability between readers within primary studies; however, our findings suggest that consensus readings still appear frequently in the literature, without reporting of inter-observer variability [23,24]. A single reader may have expertise that is difficult for others to duplicate, and multiple readers in a study can help uncover this. Reporting multiple readings as a consensus may obscure variability between readers. Moreover, unless multiple readers in clinical practice read the studied test in consensus, the reported accuracy measures may not be generalizable to routine clinical practice [1]. Thus, from a clinical practice perspective, it is useful to include multiple independent readers in a primary diagnostic accuracy study, to report on the inter-observer variability, and to provide independent datasets for each reader. Unfortunately, this was the case in only 9 of the 442 primary studies analyzed.

These reporting deficiencies at the primary study level are being perpetuated at the level of the systematic review-meta-analysis. While this practice is worrisome at the primary study level, systematic reviews are generally considered a higher level of evidence, making the perpetuation of these reporting deficiencies potentially more problematic. For the sake of transparency and reproducibility, if multiple readers are used in a study, we recommend that independent datasets be reported; and if systematic reviews encounter primary studies with independent reader datasets, they should explicitly state how many studies had multiple readers, how many readers there were per study, and how the data were treated in the analysis, so that the biases introduced by the chosen method are apparent to the reader.
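As an illustration of the kind of structured per-reader reporting we have in mind, a primary study record might look like the following. All field names and values are our own invention, purely hypothetical, and not a proposed standard.

    # Hypothetical per-study record illustrating the recommended level of
    # detail: consensus result, agreement statistic, and per-reader 2x2 data.
    study_report = {
        "study_id": "example-2017-01",
        "n_readers": 2,
        "consensus_2x2": {"tp": 44, "fp": 6, "fn": 6, "tn": 44},  # if performed
        "inter_observer_kappa": 0.72,                             # if assessed
        "per_reader_2x2": [
            {"reader": "R1", "experience_years": 12,
             "tp": 45, "fp": 5, "fn": 5, "tn": 45},
            {"reader": "R2", "experience_years": 3,
             "tp": 42, "fp": 9, "fn": 8, "tn": 41},
        ],
    }

With data reported at this granularity, meta-analysts could apply, and readers could scrutinize, any of the handling strategies discussed above.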


Our study is not without limitations. We were unable to make statements about the impact of each strategy for handling multiple readers, as only a small minority of systematic reviews reported their methods. Until reporting of independent datasets for each reader in primary studies becomes prevalent, the impact of multiple readers on systematic reviews will be difficult to determine. Failure to report the handling of multiple readers is a deficit in completeness of reporting, which is a noted deficiency among imaging systematic reviews [25,26]. In addition, our search covers only a 10-year period and is limited to imaging diagnostic accuracy systematic reviews published in imaging journals; excluding studies published before 2005 and in other journals is a potential source of bias. A planned evaluation of variability in the handling of multiple readers by journal, subspecialty, and year of publication could not be performed because so few systematic reviews-meta-analyses reported how they handled multiple readers.

5. Conclusions

A minority of diagnostic accuracy systematic reviews-meta-analyses of imaging studies report on methods for handling multiple readers in primary studies, and only a minority of primary studies report the data of individual readers. For completeness of reporting, and to allow assessment of the clinical applicability of systematic review results, authors of systematic reviews-meta-analyses are encouraged to report the incidence and handling of multiple reader data. Authors of primary studies are encouraged to report the results of individual readers.

Conflicts of interest

The authors have no relevant conflicts of interest to declare.

Acknowledgements

We would like to acknowledge Mrs. Alexandra Davis, librarian at the Ottawa Hospital, for her assistance in the design of our search strategy and the retrieval of selected articles.


Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ejrad.2017.05.032.

References

[1] D. Levine, A. Bankier, E. Halpern, Submissions to radiology: our top ten list of statistical errors, Radiology (2009) 288–290.
[2] T. Hodgdon, M.D. McInnes, N. Schieda, T.A. Flood, L. Lamb, R.E. Thornhill, Can quantitative CT texture analysis be used to differentiate fat-poor renal angiomyolipoma from renal cell carcinoma on unenhanced CT images? Radiology 276 (3) (2015) 787–796.
[3] N. Schieda, M. Al-Subhi, T.A. Flood, M. El-Khodary, M.D. McInnes, Diagnostic accuracy of segmental enhancement inversion for the diagnosis of renal oncocytoma using biphasic computed tomography (CT) and multiphase contrast-enhanced magnetic resonance imaging (MRI), Eur. Radiol. 24 (11) (2014) 2787–2794.
[4] N. Schieda, C.B. van der Pol, B. Moosavi, M.D. McInnes, K.T. Mai, T.A. Flood, Intracellular lipid in papillary renal cell carcinoma (pRCC): T2 weighted (T2W) MRI and pathologic correlation, Eur. Radiol. 25 (7) (2015) 2134–2142.
[5] Y.J. Lee, J.M. Lee, J.S. Lee, et al., Hepatocellular carcinoma: diagnostic performance of multidetector CT and MR imaging—a systematic review and meta-analysis, Radiology (2015) 140690.
[6] M.C. de Jong, T.S. Genders, R.J. van Geuns, A. Moelker, M.G. Hunink, Diagnostic performance of stress myocardial perfusion imaging for coronary artery disease: a systematic review and meta-analysis, Eur. Radiol. 22 (9) (2012) 1881–1895.
[7] S.K. Das, X.K. Niu, J.L. Wang, et al., Usefulness of DWI in preoperative assessment of deep myometrial invasion in patients with endometrial carcinoma: a systematic review and meta-analysis, Cancer Imaging 14 (2014) 32.
[8] M. Dave, B.J. Elmunzer, B.A. Dwamena, P.D. Higgins, Primary sclerosing cholangitis: meta-analysis of diagnostic performance of MR cholangiopancreatography, Radiology 256 (2) (2010) 387–396.
[9] A. Andreano, G. Rechichi, P. Rebora, S. Sironi, M.G. Valsecchi, S. Galimberti, MR diffusion imaging for preoperative staging of myometrial invasion in patients with endometrial cancer: a systematic review and meta-analysis, Eur. Radiol. 24 (6) (2014) 1327–1338.
[10] P. Macaskill, C. Gatsonis, J.J. Deeks, R.M. Harbord, Y. Takwoingi, Chapter 10: analysing and presenting results, in: J.J. Deeks, P.M. Bossuyt, C. Gatsonis (Eds.), Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, The Cochrane Collaboration, 2010.

[11] A. Liberati, D.G. Altman, J. Tetzlaff, et al., The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration, J. Clin. Epidemiol. 62 (10) (2009) e1–e34.
[12] B.J. Shea, J.M. Grimshaw, G.A. Wells, et al., Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews, BMC Med. Res. Methodol. 7 (2007) 10.
[14] W.A. van Enst, E. Ochodo, R.J. Scholten, L. Hooft, M.M. Leeflang, Investigation of publication bias in meta-analyses of diagnostic test accuracy: a meta-epidemiological study, BMC Med. Res. Methodol. 14 (2014) 70.
[15] W.L. Devillé, P.D. Bezemer, L.M. Bouter, Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy, J. Clin. Epidemiol. 53 (1) (2000) 65–69.
[16] Thomson Reuters, Science Citation Index Expanded – Radiology, Nuclear Medicine & Medical Imaging – Journal List, Thomson Reuters, 2015 [cited 15 April 2015]; available from: http://science.thomsonreuters.com/cgi-bin/jrnlst/jlresults.cgi?PC=D&SC=VY.
[17] T.A. McGrath, M.D. McInnes, D.A. Korevaar, P.M. Bossuyt, Meta-analyses of diagnostic accuracy in imaging journals: analysis of pooling techniques and their effect on summary estimates of diagnostic accuracy, Radiology (2016) 152229.
[18] Randomness and Integrity Services Ltd., True Random Number Generator, 2015 [cited 15 September 2015].
[20] J. Deeks, P. Bossuyt, C. Gatsonis, Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, version 1.0.0, The Cochrane Collaboration, 2013.
[21] D. Moher, A. Liberati, J. Tetzlaff, D.G. Altman, the PRISMA Group, Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement, BMJ 339 (2009) b2535.
[22] P.M. Bossuyt, J.B. Reitsma, D.E. Bruns, et al., STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies, Radiology 277 (3) (2015) 826–832.
[23] N.A. Obuchowski, S.V. Beiden, K.S. Berbaum, et al., Multireader, multicase receiver operating characteristic analysis: an empirical comparison of five methods, Acad. Radiol. 11 (9) (2004) 980–995.
[24] N.A. Obuchowski, New methodological tools for multiple-reader ROC studies, Radiology 243 (1) (2007) 10–12.
[25] M.D. McInnes, P.M. Bossuyt, Pitfalls of systematic reviews and meta-analyses in imaging research, Radiology 277 (1) (2015) 13–21.
[26] A.S. Tunis, M.D. McInnes, R. Hanna, K. Esmail, Association of study quality with completeness of reporting: have completeness of reporting and quality of systematic reviews and meta-analyses in major radiology journals changed since publication of the PRISMA statement? Radiology 269 (2) (2013) 413–426.
