Original Investigation
Has the Objective Quality of Evidence in Imaging Papers Changed Over the Last 20 Years?

Danielle E. Kostrubiak, MD; Renee F. Cattell, BA; Franco Momoli, MSc, PhD; Mark E. Schweitzer, MD

From the University of Vermont Medical Center, 111 Colchester Ave, Burlington, VT 05401 (D.E.K.); the Department of Radiology, Health Sciences Center, Stony Brook University School of Medicine, Stony Brook, New York (R.F.C., M.E.S.); and the Centre for Practice-Changing Research, University of Ottawa, Ottawa, Ontario, Canada (F.M.). Received July 24, 2017; revised December 1, 2017; accepted December 27, 2017. Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Address correspondence to: D.E.K.

Rationale and Objectives: We aimed to determine whether both the evidence level (EL) and the clinical efficacy (CE) of imaging manuscripts have changed over the last 20 years.

Materials and Methods: Because this was a review of the medical literature, Institutional Review Board approval was waived and no informed consent was required. Using Web of Science, we determined the 10 highest impact factor imaging journals. For each journal, the 10 most cited and 10 average cited papers were compared for the following years: 1994, 1998, 2002, 2006, 2010, and 2014. EL was graded using the same criteria as the Journal of Bone and Joint Surgery (Wright et al., 2003). CE was graded using the criteria of Fryback and Thornbury (1991). The statistical software R and the package lme4 were used to fit mixed regression models with fixed effects for group and year and a random effect for journal.

Results: EL improved by an average of 0.03 points per year (year effect = −0.03, P < .001). The more cited papers had better ELs (group effect = −0.23, SE = 0.09, P = .011). CE was lower in top cited than in average cited articles, although the difference was not statistically significant (group effect = −0.14, SE = 0.09, P = .16). CE level increased modestly in both groups over this 20-year period (0.06 per year, SE = 0.007, P < .001).

Conclusion: Over the last 20 years, imaging journal articles have improved modestly in quality of evidence, as measured by EL and CE.

Key Words: Evidence-based medicine; radiology; research design; quality improvement; radiologic technology.

© 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved. https://doi.org/10.1016/j.acra.2017.12.026
INTRODUCTION
Government bodies and insurance companies often rely on scientific papers to make recommendations about best-quality care, and these recommendations influence reimbursement decisions for medical and imaging procedures. To a large degree, the support for these decisions is based on the strength of the available scientific evidence (1,2). For example, in the United Kingdom, the National Institute for Health and Care Excellence provides "technology appraisal guidance," which includes a systematic review of the available evidence with a preference for higher grade evidence such as randomized controlled trials (2,3). UK National Health Service organizations are mandated by law to provide payment for those technologies that have been vetted through this system (4). In France, the Haute Autorite de Sante (HAS), the High Authority of Health, serves
a similar role, providing recommendations to UNCAM (Union Nationale des Caisses d'Assurance Maladie), an organization that then uses these recommendations to determine reimbursement rates (3,5,6). Both of these authorities grade the available evidence for a particular treatment or diagnostic modality according to a scale of "evidence level" (EL), with the highest ELs corresponding to the most rigorous studies (2,3,5).

In the United States, with a multipayer system that includes both government and private entities, reimbursement recommendations are much more complex and are often not generated by the payers themselves (7). The US Preventive Services Task Force (USPSTF) is responsible for creating recommendations, which payers may or may not choose to support with reimbursement decisions, with grades from A to I based upon the strength and quality of the published evidence (8). Meanwhile, the Medicare Payment Advisory Commission makes recommendations on Medicare reimbursements based upon these grades and other quality metrics, with private payers often following suit (9).

These organizations, and others like them, evaluate the quality of research papers based upon several subjective and objective factors, including EL and clinical efficacy (CE) (2,8,9). EL is defined by the strength of the methods and study design, including sample size, selection, randomization, blinding, data collection, and follow-up (10,11). It was first described as a metric for evaluating existing data by the Canadian Task Force
on the Periodic Health Examination, which established a grading system of evidence with grade I as evidence from a randomized controlled trial and grade III as expert opinion (10). Since that time, journals and professional organizations have adopted their own grading systems for ELs. The Oxford Centre for Evidence-based Medicine, for example, grades the ELs of manuscripts and research according to their methods, with systematic reviews of randomized controlled trials as the highest EL (11). In various fields of medicine, journals and medical societies have also created similar standards for ELs; for example, the North American Spine Society created the "Levels of Evidence for Primary Research Question," which has since been adopted by the Journal of Bone and Joint Surgery, Elsevier, and other top journals and publishers (1,12). In radiology, journals such as the Journal of Magnetic Resonance Imaging have followed suit, seeking to improve the quality of the research that they publish (13).

Another metric for measuring the quality of imaging studies is the level of CE that they assess. In 1991, Fryback and Thornbury described a hierarchy of 6 levels of efficacy for evaluating medical imaging systems: technical efficacy, diagnostic accuracy, diagnostic thinking efficacy, therapeutic efficacy, patient outcome, and societal efficacy (14). Their definition of efficacy is "the probability of benefit to individuals in a defined population from a medical technology applied for a given medical problem under ideal conditions of use" (14). It is important to evaluate imaging systems not only on a technical level but also in terms of their effect on clinicians' decision-making and their contribution to society as a whole (14).

As there is increasing pressure from governmental and corporate funders to provide high-quality, high-value care informed by the available scientific data, it is important to evaluate the quality and strength of those data. One way to do this is to use standardized measures such as EL and CE. Over the past 15 years or so, the concept of assessing the EL and CE of publications has been introduced, and many fields of medicine have studied their literature using these metrics in an effort to improve it (1,15–21). The purpose of our study was to determine whether both the EL and the CE of imaging manuscripts have changed over the last 20 years.

MATERIALS AND METHODS

We performed a review of the existing medical literature without human subjects; therefore, Institutional Review Board approval was waived and no informed consent was required. Using Web of Science (on February 16, 2016), we determined the 10 highest impact factor (IF) imaging journals: Journal of the American College of Cardiology-Cardiovascular Imaging, Radiology, Neuroimage, Journal of Nuclear Medicine, Human Brain Mapping, Circulation-Cardiovascular Imaging, European Journal of Nuclear Medicine and Molecular Imaging, Journal of Cardiovascular Magnetic Resonance, Investigative Radiology, and European Radiology, all with impact factors greater than 4 (22). Web of Science determines impact factor based on the frequency of citation of the average article in each journal
(22). For each journal, the 10 most cited and 10 average cited papers were compared for each of the following publication years: 1994, 1998, 2002, 2006, 2010, and 2014. The number of citations was determined using the Web of Science "times cited count" for each year. The 10 average cited papers were chosen based on the average citations per item for that year from the Citation Report on Web of Science. This was found by searching a specific publication year (eg, 2014) and a specific journal in Advanced Search on Web of Science and then selecting "citation report" for these results. The average citations per item was noted, and the 10 papers closest to that average value were then selected by sorting by number of citations.

The metrics for evaluation were EL and CE. EL was graded on a scale of 1–5 using the same criteria as the Journal of Bone and Joint Surgery (1): level 1, the best EL, comprised prospective randomized trials with an excellent reference standard as well as systematic reviews of randomized controlled trials; level 2 included prospective studies with lesser reference standards; level 3 included nonconsecutive cohort studies; level 4 included retrospective case series; and level 5, the lowest EL, included "expert opinions," commentaries, and editorials. We chose these criteria because they are used by many publishers, including Elsevier, as well as top journals (1,12). CE was graded on a scale of 1–6 based on the criteria of Fryback and Thornbury (14): level 1, the lowest, focused on image quality; level 2 focused on accuracy, sensitivity, and specificity; level 3 included the effect on pre- and post-test diagnostic probabilities and the usefulness of the test in clinical diagnosis; level 4 included the usefulness of the test in the management of care; level 5 focused on the clinical outcomes of the test at the patient level, including risk/benefit analysis; and level 6, the highest level, included the cost and societal impact of the test. Note that the CE scale rates 6 as the highest level, whereas the EL scale rates 1 as the highest.

One researcher read and analyzed all of these papers and assigned the EL and CE ratings for each manuscript according to the scales outlined earlier. Some papers did not fit into the ELs as outlined, as they were basic science, computer algorithm, letters to the editor, or educational papers; these were excluded from the final counts and averages. Likewise, some basic science and computer algorithm papers did not fit into the CE levels and were excluded from the final counts and averages. The original researcher re-graded a random subset (10%) of the papers 6 months after the original analysis, and a second researcher also graded this subset, allowing assessment of intraobserver and interobserver concordance. Kappa reliability coefficients were calculated for these re-graded subsets and interpreted according to Altman's criteria (23).
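The agreement step can be illustrated with a minimal R sketch (this is not the authors' code; the data frame regrade and its columns rater1 and rater2 are hypothetical placeholders for the re-graded subset, and the irr package is only one of several ways to compute Cohen's kappa in R):

    library(irr)

    # 'regrade' holds the 10% re-graded subset, one row per paper,
    # with the two observers' grades in columns 'rater1' and 'rater2'
    k <- kappa2(regrade[, c("rater1", "rater2")], weight = "unweighted")
    k$value  # unweighted kappa, as reported in the Results

    # Descriptive interpretation against Altman's bands
    # (0.41-0.60 "moderate"; 0.81-1.00 the top band, termed "excellent" in the text)
    altman_band <- function(kappa) {
      cut(kappa,
          breaks = c(-Inf, 0.20, 0.40, 0.60, 0.80, 1.00),
          labels = c("poor", "fair", "moderate", "good", "excellent"))
    }
    altman_band(k$value)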
A weighted average of scores was derived for each journal, for each year, and for top vs average cited papers in order to create a linear mixed model for analysis. The statistical software R (version 3.2.4) and the package lme4 were used to fit mixed regression models with fixed effects for group (average vs top cited) and year and a random effect for journal. Analysis of variance with a likelihood ratio test was used for the mixed model, with P < .05 considered statistically significant.
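A minimal R sketch of this modeling step is given below; it is not the authors' code. The data frame papers, with columns journal, year, group, and score, is a hypothetical placeholder, and a plain mean stands in for the weighted average described above.

    library(lme4)

    # 'papers': one row per graded paper, with columns journal, year (numeric),
    # group ("top" or "average"), and score (the EL or CE grade for that paper)

    # Collapse to one score per journal x year x group cell
    cell_means <- aggregate(score ~ journal + year + group, data = papers, FUN = mean)

    # Mixed regression: fixed effects for group and year, random intercept for journal
    fit <- lmer(score ~ group + year + (1 | journal), data = cell_means, REML = FALSE)
    summary(fit)  # coefficient estimates and standard errors, as in Table 5

    # Likelihood ratio tests of the fixed effects via nested-model comparison
    anova(fit, update(fit, . ~ . - year))   # year effect
    anova(fit, update(fit, . ~ . - group))  # group effect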
RESULTS

The unweighted kappa for interobserver concordance was 0.50 (0.45–0.69) for EL and 0.49 (0.35–0.63) for CE, corresponding to "moderate" reliability according to the Altman criteria (0.41 ≤ κ ≤ 0.60) (23). The intraobserver kappa was 0.88 (0.81–0.96) for EL and 0.97 (0.93–1.00) for CE, corresponding to excellent reliability (0.81 ≤ κ ≤ 1.00).

For average cited papers, the average EL in 2014 was 3.10, better (ie, numerically lower, because level 1 represents the strongest evidence) than in each preceding year (Table 1); the P value for the year effect was <.001. In 1994, 1998, and 2002, no papers fit the EL 1 category, which comprises prospective randomized trials with an excellent reference standard, whereas in 2014 there were 4 average cited papers in this category (Table 1). Likewise, for top cited papers, the average EL in 2014 was 2.89 compared to 3.84 in 1994; there were no EL 1 papers in 1994 and 8 in 2014 (Table 2).

For CE level, the scale is reversed (higher is better). In 2014, the average CE level for average cited papers was 3.05, compared to 1.72 in 1994 (Table 3). No average cited papers reached the highest CE level of 6, which includes cost and societal efficacy (Table 3). For top cited papers, the average CE in 2014 was 2.64, up from 1.42 in 1994 (Table 4). There was 1 CE level 6 paper among the top cited articles in 2010, but none in any other year (Table 4).
TABLE 1. Average Cited Papers—Number in Each Evidence Level for Each Year

Year    EL 1    EL 2    EL 3    EL 4    EL 5    Average for Year
2014    4       26      28      15      14      3.10
2010    8       20      14      25      17      3.27
2006    0       8       15      34      12      3.72
2002    0       8       17      25      10      3.61
1998    0       5       11      35      8       3.78
1994    0       5       9       18      10      3.78

EL, evidence level.
TABLE 2. Top Cited Papers—Number in Each Evidence Level for Each Year

Year    EL 1    EL 2    EL 3    EL 4    EL 5    Average for Year
2014    8       38      24      17      13      2.89
2010    8       26      23      14      26      3.25
2006    5       19      22      7       13      3.06
2002    4       19      14      14      19      3.35
1998    1       11      12      22      11      3.54
1994    0       3       12      18      11      3.84

EL, evidence level.
TABLE 3. Average Cited Papers—Number in Each Clinical Efficacy Level for Each Year

Year    CE 1    CE 2    CE 3    CE 4    CE 5    CE 6    Average for Year
2014    12      14      28      10      16      0       3.05
2010    11      32      30      8       16      0       2.86
2006    25      23      7       10      6       0       2.28
2002    39      21      2       5       5       0       1.83
1998    29      20      2       9       2       0       1.95
1994    34      14      3       3       3       0       1.72

CE, clinical efficacy.
TABLE 4. Top Cited Papers—Number in Each Clinical Efficacy Level for Each Year

Year    CE 1    CE 2    CE 3    CE 4    CE 5    CE 6    Average for Year
2014    17      33      30      9       11      0       2.64
2010    13      27      28      10      21      1       3.02
2006    21      30      13      9       0       0       2.14
2002    33      23      8       5       5       0       2.00
1998    34      19      3       5       1       0       1.71
1994    40      16      1       1       1       0       1.42

CE, clinical efficacy.
TABLE 5. Evidence Level and Clinical Efficacy—Results of Mixed Regression Analysis, Fixed Effects

Group Effect                          Estimate    Standard Error    P Value
Evidence level
  Top vs average cited papers        −0.23        0.09              .011*
  Journal year                       −0.03        0.01              <.001*
Clinical efficacy
  Top vs average cited papers        −0.14        0.09              .16
  Journal year                       0.06         0.01              <.001*

* Statistical significance.
The EL of manuscripts has improved over time by an average of 0.03 points per year on the 1–5 scale, as shown by the fixed effect of year of −0.03 (P < .001) (Table 5). In 1994, the average EL across both top and average cited papers was 3.81, and in 2014 it was 2.99 (Fig 1). Furthermore, the more cited papers had better ELs than the average cited papers (group effect = −0.23, SE = 0.09, P = .011) (Table 5). For example, in 2006, the average EL for average cited papers was 2.85 and for top cited papers was 2.24. Interestingly, the levels of CE were lower in top cited than in average cited articles, although the difference was not statistically significant (group effect = −0.14, SE = 0.09, P = .16) (Table 5). However, CE level did increase modestly over this 20-year period, rising from an average score of 1.57 in 1994 to 2.83 in 2014 (0.06 per year, SE = 0.007, P < .001) (Fig 2, Table 5). Note that because the EL scale is reversed relative to the CE scale, the plotted trends move in opposite directions, although both improved over time.
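As a rough back-of-the-envelope check, the per-year fixed effects can be extrapolated across the 20-year study window:

    \Delta\mathrm{EL} \approx -0.03 \times 20 = -0.6, \qquad \Delta\mathrm{CE} \approx 0.06 \times 20 = 1.2

These extrapolated shifts are of the same order as the changes in the raw group means noted above (EL: 3.81 to 2.99; CE: 1.57 to 2.83), although the raw EL change is somewhat larger than the model-based estimate.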
Figure 1. Average evidence level of top and average cited papers over 20 years, showing trend of improvement over time.
Figure 2. Average clinical efficacy of top and average cited papers over 20 years, showing trend of improvement over time.
DISCUSSION

Compared to average cited imaging papers, top cited papers have better ELs, with no statistically significant difference in CE levels. Additionally, over time, the EL of imaging papers has improved in these top 10 journals, as have the levels of CE. This suggests that the quality of imaging research has improved over time and that top cited papers are of equal if not higher quality than average cited papers in these journals.

There was modest interobserver variability (a moderate kappa score) in our results, which is expected in any scoring system of this nature. This is likely due to differing interpretations of the various EL and CE levels, as some ambiguity exists and judgment calls need to be made with each assignment. Observers were also not blinded to the year and journal, which could have introduced some bias. Additionally, the observers did not discuss the papers to reach a consensus, but rather graded the entire set individually.
Other fields of medicine have sought to determine whether their published literature is of high quality based on EL. For example, in plastic surgery, a recent study of 4 journals found an average EL of 3.2 (using the Oxford evidence-based medicine levels) (24). Studies of manuscripts in the fields of ophthalmology and urology have produced similar results (25,26). Similar to our study, researchers in other fields of medicine have sought to determine whether high impact journals and manuscripts are correlated with higher EL. In the field of plastic surgery, a 2011 study found this to be the case, with higher impact factor journals publishing higher EL manuscripts (27). Likewise, a study of 2011–2012 pediatric literature showed that articles published in top medical journals such as the New England Journal of Medicine had a higher EL than those published in general pediatric journals (28). These findings correlate with our results: the top cited articles had higher EL in the imaging journals we studied.

In other fields, the question of whether EL has improved over time has been examined as well. In the Journal of Bone and Joint Surgery, Hanzlik et al. found an improvement in EL from 3.72 to 2.90 (on the same scale our study used, on which lower scores indicate stronger evidence) from 1975 to 2005, which is similar to our finding of an improvement from 3.84 to 2.89 in top cited articles from 1994 to 2014 (29). Similarly, the sports medicine literature (from 1995 to 2010) and the pediatric surgery literature (from 1998 to 2007) have reported improvements in EL over time (18,19). Conversely, the neurosurgery literature reported a decrease in EL over time, using the Detsky quality of reporting scale, indicating that from 1999 to 2010 there was a decrease in level I manuscripts in 3 neurosurgical journals (17). This reported decrease is discordant with our findings, although other fields have had results similar to ours.

Our results regarding CE are, overall, echoed in the existing literature. The Standards for Reporting Diagnostic Accuracy (STARD) statement was developed in 2003 to improve the quality of published literature focused on testing diagnostic accuracy, which is largely CE level 2 (14,16). This statement includes a checklist of 25 items that a publication should contain to be considered a high-quality diagnostic accuracy study (16). The literature suggests a modest improvement in STARD adherence over time: a study of publications from the year 2000 reported a mean STARD score of 11.9 of 25 items, whereas a later study of articles from 2003 to 2012 reported a mean STARD score of 15.5 of 25 (30,31). Although the study by Smidt et al. focused on an aspect of CE different from ours, the upward trend in quality is similar. Likewise, the recent study by Dilauro et al. (31) found a weak, statistically nonsignificant positive correlation between STARD score and citations, as well as impact factor, similar to the weak, statistically nonsignificant positive association between CE and top cited vs average cited articles that we found. Overall, we found a preponderance of low CE level studies, with almost no CE level 6 papers, as others have previously suggested in the literature, although without a formal analysis.

Hollingworth and Jarvik (21) refer to technology assessment (TA) and create a "TA hierarchy" that is similar to the
CE scale of Fryback and Thornbury (14). They note that, overall, TA or efficacy research tends to be largely lower level, focused on basic diagnostic accuracy, possibly because of the cost associated with studies higher up the CE scale or TA hierarchy (21). Journal editors have also commented on the lack of high CE studies; for example, Schulze, the editor of Dentomaxillofacial Radiology, published a formal request for higher CE studies (20). Clearly, others have noted the same trends in CE as our study and have observed that steps are being taken in the right direction to improve the quality of published research.

Our study is limited by the somewhat subjective rating systems used, although the use of established scales for EL and CE level does make these measures somewhat more objective. Our interobserver kappa score was moderate, which can be considered a limitation related to this subjective rating system. Our study is also limited by sample size: although we analyzed 1200 manuscripts, we combined the measures into an average for each journal, for each year, and for top vs average cited papers, effectively reducing the sample size to closer to 120. This was done for the purpose of statistical modeling; to increase the sample size, more journals and more years of publications would need to be analyzed. Only the top impact factor imaging journals were evaluated in this study, which may not be representative of the field as a whole, although they do represent the most read and cited studies overall. We chose to use Web of Science's impact factor measurements, determined by the frequency of citation of the "average article" in each journal, to select the top 10 imaging journals (22). This may be another limitation, as there are many ways to measure journal impact, and using a different metric might influence the results. Additionally, we may have missed some of the most cited imaging articles by focusing on imaging journals, as some important imaging articles are published in journals from other fields.

In conclusion, over the last 20 years, imaging journal articles, in particular those published in the top impact factor journals, have improved modestly in evidence quality, as measured by EL and CE. Additionally, top cited articles are of higher EL than average cited articles. Our findings are, to a large degree, similar to studies of other medical fields' scientific data quality. This improvement in quality has important ramifications for the clinical knowledge base, patient care, and future research. As imagers, we are very dependent on payers' decisions for our financial futures; hence, it behooves us to be cognizant of improving the evidence and efficacy levels of our literature.

REFERENCES

1. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am 2003; 85-A:1–3.
2. Into Practice. NICE: National Institute for Health and Care Excellence, 2016. Available at: https://www.nice.org.uk/. Accessed August 2016.
3. Hull S. Global evidence and reimbursement strategies. In: Consulting HAM, editor. 2011.
4. International Society for Pharmacoeconomics and Outcomes Research. United Kingdom (England and Wales) reimbursement process, 2008. [updated October 2008]. Available at: https://www.ispor.org/htaroadmaps/uk.asp.
5. Haute Autorite de Sante. HAS—Haute Autorite de Sante, 2015. Available at: http://www.has-sante.fr/portail/. Accessed August 2016.
6. France—Pharmaceuticals: International Society for Pharmacoeconomics and Outcomes Research, 2009. [updated October 2009]. Available at: https://www.ispor.org/HTARoadMaps/France.asp. Accessed August 2016.
7. Connelly P, ELS, CMR International Institute for Regulatory Science. Review and reimbursement: aligning the needs and requirements in clinical development. Workshop Report. Washington, DC, 2010. Available at: http://cirsci.org/publications/1007_March_2010_Workshop_Review_and_Reimbursement.pdf.
8. About the USPSTF. US Preventive Services Task Force, 2016. Available at: http://www.uspreventiveservicestaskforce.org/. Accessed August 2016.
9. MEDPAC: The Medicare Payment Advisory Commission, 2016.
10. Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J 1979; 121:1193–1254.
11. Oxford Centre for Evidence-based Medicine—levels of evidence (March 2009). Centre for Evidence-based Medicine, 2016. Available at: http://www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009/. Accessed August 1, 2016.
12. Levels of evidence for clinical studies. Elsevier, 2016. Available at: https://www.elsevier.com/__data/promis_misc/Levels_of_Evidence.pdf.
13. Schweitzer ME. Evidence level. J Magn Reson Imaging 2016; 43:543.
14. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991; 11:88–94.
15. Gazelle GS, Kessler L, Lee DW, et al. A framework for assessing the value of diagnostic imaging in the era of comparative effectiveness research. Radiology 2011; 261:692–698.
16. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 2003; 138:W1–W12.
17. Yarascavitch BA, Chuback JE, Almenawer SA, et al. Levels of evidence in the neurosurgical literature: more tribulations than trials. Neurosurgery 2012; 71:1131–1137, discussion 7–8.
18. Al-Harbi K, Farrokhyar F, Mulla S, et al. Classification and appraisal of the level of clinical evidence of publications from the Canadian Association of Pediatric Surgeons for the past 10 years. J Pediatr Surg 2009; 44:1013–1017.
19. Grant HM, Tjoumakaris FP, Maltenfort MG, et al. Levels of evidence in the clinical sports medicine literature: are we getting better over time? Am J Sports Med 2014; 42:1738–1742.
20. Schulze R. The efficacy of diagnostic imaging. Dentomaxillofac Radiol 2012; 41:443.
21. Hollingworth W, Jarvik JG. Technology assessment in radiology: putting the evidence in evidence-based radiology. Radiology 2007; 244:31–38.
22. Web of Science. Thomson Reuters, 2016. Available at: http://apps.webofknowledge.com. Accessed February 2016–February 2017.
23. Altman DG. Practical statistics for medical research. 1st ed. London: Chapman & Hall, 1991.
24. Sinno H, Neel OF, Lutfy J, et al. Level of evidence in plastic surgery research. Plast Reconstr Surg 2011; 127:974–980.
25. Borawski KM, Norris RD, Fesperman SF, et al. Levels of evidence in the urological literature. J Urol 2007; 178(4 Pt 1):1429–1433.
26. Lai TY, Leung GM, Wong VW, et al. How evidence-based are publications in clinical ophthalmic journals? Invest Ophthalmol Vis Sci 2006; 47:1831–1838.
27. Rodrigues MA, Tedesco AC, Nahas FX, et al. Journal impact factor versus the evidence level of articles published in plastic surgery journals. Plast Reconstr Surg 2014; 133:1502–1507.
28. Jacobson DA, Bhanot K, Yarascavitch B, et al. Levels of evidence: a comparison between top medical journals and general pediatric journals. BMC Pediatr 2015; 15:3.
29. Hanzlik S, Mahabir RC, Baynosa RC, et al. Levels of evidence in research published in The Journal of Bone and Joint Surgery (American Volume) over the last thirty years. J Bone Joint Surg Am 2009; 91:425–428.
30. Smidt N, Rutjes AW, van der Windt DA, et al. Quality of reporting of diagnostic accuracy studies. Radiology 2005; 235:347–353.
31. Dilauro M, McInnes MD, Korevaar DA, et al. Is there an association between STARD statement adherence and citation rate? Radiology 2016; 280:62–67.