Improving Accuracy in Reporting CT Scans of Oncology Patients: Assessing the Effect of Education and Feedback Interventions on the Application of the Response Evaluation Criteria in Solid Tumors (RECIST) Criteria

Henry Andoh, BA, Nancy J. McNulty, MD, Petra J. Lewis, MB, BS

Rationale and Objectives: In February 2010, our radiology department adopted the Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 criteria for newly diagnosed oncology patients. Before staff began using RECIST 1.1, we hypothesized that education and feedback interventions could help clarify the differences between RECIST 1.0 and the newly adopted RECIST 1.1 guidelines and result in appropriate and accurate use of both reporting systems. This study evaluates the effect of education and feedback interventions on the accuracy of computed tomography (CT) reporting using RECIST criteria.

Materials and Methods: Consecutive CT scan reports and images were retrospectively reviewed during three different periods to assess compliance and adherence to RECIST guidelines. Data collected included interpreting faculty, resident, type, and total number of errors per report. Significance testing of differences between cohorts was performed using an unequal variance t-test. Group 1 (baseline): RECIST 1.0 used; prior to adoption of RECIST 1.1 criteria. Group 2 (post distributed educational materials): following adoption of RECIST 1.1 criteria and distribution of educational materials. Group 3 (post audit and feedback): following the audit and feedback intervention.

Results: The percentage of reports with errors decreased from 30% (baseline) to 28% (group 2) to 22% (group 3). Only the difference in error rate between the baseline and group 3 was significant (P = .03).

Conclusion: The combination of distributed educational materials and audit and feedback interventions improved the quality of radiology reports requiring RECIST criteria by reducing the number of studies with errors.

Key Words: RECIST; quality; report; CT scans.

©AUR, 2013
From the Geisel School of Medicine at Dartmouth, Hanover, NH (H.A., N.J.M., P.J.L.), and the Department of Radiology, Dartmouth Hitchcock Medical Center, 1 Medical Center Drive, Lebanon, NH 03756 (N.J.M., P.J.L.). Received May 16, 2012; accepted December 8, 2012. Address correspondence to: N.J.M. e-mail: [email protected]. Acad Radiol 2013; 20:351–357. http://dx.doi.org/10.1016/j.acra.2012.12.002

The quality and accuracy of the radiology report is critical for the appropriate management of oncology patients, both on and off clinical trials. The Response Evaluation Criteria in Solid Tumors (RECIST 1.0), developed by a multidisciplinary group of physicians, were initially published in 2000 to provide a standardized, simplified set of rules for measuring and reporting tumor burden in oncology patients. This facilitates accurate determination of tumor response to therapy to direct future treatment decisions (1). The RECIST 1.0 criteria were subsequently widely adopted by academic institutions, cooperative groups, and the pharmaceutical industry. It was later determined that tumor response could be assessed accurately using fewer lesions, and in an effort to improve the accuracy of choosing and measuring appropriate lymph node target lesions, the original criteria were revised by the RECIST Working Group, yielding the RECIST 1.1 criteria (2) (Table 1).

The RECIST 1.0 criteria were adopted at our institution in 2005 following a didactic educational session and were used exclusively until February 2010. In February 2010, the RECIST 1.1 criteria were adopted for use in the reports of any newly diagnosed oncology patients, while the reports of any patient who had imaging prior to that date would continue to use RECIST 1.0 criteria. Our departmental computed tomography (CT) report standard includes a table of RECIST-defined indicator lesion measurements identified by lesion number, series number, image number, and size in mm. The table reports the current indicator lesion measurements as well as the corresponding measurements from the most recent comparison CT scan. Although adoption of both sets of RECIST criteria was viewed positively by our oncologists, the departmental application of the RECIST 1.0 criteria had not been consistent, and low-volume readers had difficulty applying the specifics of the criteria. Because of this, education was deemed necessary to increase departmental accuracy using RECIST, and a formal quality assurance (QA) assessment was initiated.
TABLE 1. Summary Comparison of Guideline Characteristics in RECIST 1.0 and RECIST 1.1

Characteristic | RECIST 1.0 | RECIST 1.1
Maximum number of target lesions | 10 | 5
Maximum number of target lesions per organ | 5 | 2
Axis to measure lymph nodes | Long | Short
Minimal lymph node size for target lesion in millimeters (mm) | 10 | 15
Minimum size for target lesion (non-lymph node) (mm) | 10 | 10

RECIST, Response Evaluation Criteria in Solid Tumors.
Educational interventions such as distribution of educational materials and audit with feedback have demonstrated the ability to improve physician practice by improving process outcomes (3,4). Distributed educational materials (DEM) represent a passive dissemination strategy that may use monographs, electronic publications in peer-reviewed journals, clinical practice guidelines, or audiovisual materials to improve knowledge, awareness, professional skills, or patient outcomes (3,6). Audit and feedback (A & F) is defined as any summary of clinical performance of health care over a specified period, given in a written, electronic, or verbal format (5). Before adopting RECIST 1.1 criteria and transitioning staff to the utilization of both RECIST criteria, we hypothesized that educational interventions such as these could improve the application and accuracy of reporting. Given that DEM and A & F represent the two most studied forms of educational intervention, we employed these in our study (3,4). This study evaluates whether the distribution of educational materials and audit with feedback intervention significantly improved the accuracy of reporting CT scans for oncology patients.
MATERIALS AND METHODS

The study was approved by the Institutional Committee for the Protection of Human Subjects. The CT scan images and reports of all oncology scans performed over three 1-month periods were evaluated for adherence to RECIST guidelines. The three periods (cohorts) were: 1) pre-RECIST 1.1 introduction, 2) post RECIST 1.1 adoption and distribution of educational materials intervention, and 3) post audit and feedback intervention.

Scan Review
All CT scan reports and axial images for each of the study groups were retrospectively reviewed by a medical student (H.A.) trained to identify specific types of errors. All errors recorded by the student were subsequently reviewed and confirmed by two faculty radiologists (N.M., P.L.) experienced in body imaging and in the application of the RECIST criteria, with 9 and 14 years of post-residency experience, respectively. Data collected included interpreting faculty, resident, each specific type of error made, and total number of errors made per study (Table 2). Each error type counted equally toward the total error score for each reader. Readers were not penalized for making the same type of error repeatedly within the same study. Errors were also subcategorized into major versus minor errors. A major error was defined as one that could result in misinterpretation of disease response, such as using lesions <10 mm or measuring a lymph node in the incorrect axis. Minor errors were defined as those unlikely to result in misinterpretation of disease response, such as measuring a lesion using the wrong window/level settings or slice thickness.

Educational Interventions

DEM. The DEMs created for this study included a concise summary of RECIST 1.0 and 1.1 criteria and guidelines for utilization, including appropriate indicator lesion selection, measurement techniques, and reporting standards. These were provided in the following formats: a two-page summary handout was printed and placed in radiologists' mailboxes, sent via e-mail, posted in all radiology reading rooms, and placed in the departmental Google documents folder. Staff radiologists and trainees also attended a 1-hour audiovisual presentation that highlighted the key features and differences of RECIST 1.0 and 1.1 and the rationale for revision of the criteria, and provided a detailed summary of proper utilization and application of the RECIST 1.0 and RECIST 1.1 criteria. The presentation consisted of 76 slides, with multiple imaging examples of common errors made and how to avoid them. In addition, the presentation was electronically distributed to all staff radiologists and trainees.

A & F. One month after the DEM intervention, the reports and images of all CT scans of oncology patients over a 1-month period were reviewed and analyzed on a picture archive and communication system (PACS) workstation. Errors in the application of the RECIST criteria were tabulated. Each staff radiologist then received an e-mail providing a summary of their clinical accuracy in applying the RECIST criteria over this audit period, including the total number of scans read, number of scans with errors, total number of errors, and a list of all specific errors committed with the accompanying scan accession numbers to enable review.

Cohorts
The study population was generated from a PACS search using the institutional oncology provider names. Each cohort was defined by all CT scans performed on oncology patients over selected 1-month periods.

Group A, baseline. All CT scans performed on oncology patients over a 1-month period during which RECIST 1.0 were the sole criteria used in the institution (prior to introduction of the RECIST 1.1 criteria).
TABLE 2. Types of Errors Recorded for RECIST 1.0 and RECIST 1.1

RECIST 1.0 Errors | RECIST 1.1 Errors | Error Type: Major* (2) vs. Minor† (1)
RECIST table omitted inappropriately | RECIST table omitted inappropriately | 1
Inappropriate choice of indicator lesion(s) in first scan | Inappropriate choice of indicator lesion(s) in first scan | 2
>10 total lesions reported | >5 total lesions reported | 2
>5 lesions/organ reported | >2 lesions per organ reported | 2
Lesion <10 mm measured | Lesion <10 mm measured | 2
Measurement(s) not saved in PACS | Measurement(s) not saved in PACS | 2
Incorrect measurement unit (eg, report centimeters [cm] instead of millimeters [mm]) | Incorrect measurement unit (eg, cm) | 2
Inconsistent measurement angle | Inconsistent measurement angle | 2
Measurement not accurate | Measurement not accurate | 1
Poor lesion conspicuity‡ | Poor lesion conspicuity‡ | 1
LN <10 mm measured | LN <15 mm measured | 1
LN measured in short axis | LN measured in long axis | 2
Non-measurable disease used | Non-measurable disease used | 1
Indicator lesion numbering altered from previous RECIST report | Indicator lesion numbering altered from previous RECIST report | 1
Indicator lesion dropped | Indicator lesion dropped | 2
Nonaxial plane used | Nonaxial plane used | 1
Wrong window used | Wrong window used | 1
Wrong slice thickness used | Wrong slice thickness used | 1
Measured through intervening bowel or vessel | Measured through intervening bowel or vessel | 2
Multiple nodes grouped when measuring | Multiple nodes grouped when measuring | 2
Impression used the terms: complete response, partial response, progressive disease, stable disease | Impression used the terms: complete response, partial response, progressive disease, stable disease | 2
Other error (ie, incorrect series or slice number referenced for a lesion, failure to add new lesion to RECIST table, providing both short and long axis measurements for a lesion) | Other error (ie, incorrect series or slice number referenced for a lesion, failure to add new lesion to RECIST table, providing both short and long axis measurements for a lesion) | 1
n/a | Incorrect RECIST version used | 1
n/a | RECIST version used not indicated | 1

LN, lymph node; PACS, picture archive and communication system; RECIST, Response Evaluation Criteria in Solid Tumors.
*Error could result in misinterpretation of disease response.
†Error unlikely to result in misinterpretation of disease response.
‡Poorly defined lesion margins resulting in measurements that are difficult to reproduce.
No educational intervention had been implemented, although reporting using RECIST 1.0 criteria had been in effect for 5 years (following an initial didactic educational session).

Group B, post DEM. All CT scans performed on oncology patients over a 1-month period beginning a month after the introduction of the RECIST 1.1 criteria and the DEM intervention. During this period, scans were reported using either RECIST 1.0 criteria (follow-up studies) or RECIST 1.1 criteria (new diagnoses).

Group C, post A & F. All CT scans performed on oncology patients over a 1-month period beginning 3 months after the A & F intervention. A 3-month delay was chosen to allow faculty time to review their audit data and the cases in which they had made errors. Scans were reported using either RECIST 1.0 criteria (follow-up studies) or RECIST 1.1 criteria (new diagnoses).
Statistical Analysis
The percentage of studies with errors and the total number of errors committed were compared between the study cohorts using an unequal variance t-test. A P value of less than .05 was considered significant. Error rates and the total number of errors in studies read with and without a resident/fellow were also assessed.
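As an illustration of this comparison, the following sketch applies an unequal variance (Welch) t-test to per-report binary error indicators. The cohort counts are taken from Table 4, but the assumption that the test was run on per-report indicators is ours for illustration and is not stated by the authors.

# Minimal sketch of the cohort comparison described above, assuming the
# unequal variance (Welch) t-test was applied to per-report binary error
# indicators. Counts are taken from Table 4 (groups A and C); the exact
# analysis unit is an assumption, not the authors' published code.
from scipy import stats

# 1 = report contained at least one RECIST error, 0 = error-free report
group_a = [1] * 75 + [0] * (246 - 75)   # baseline cohort
group_c = [1] * 47 + [0] * (218 - 47)   # post audit-and-feedback cohort

t_stat, p_value = stats.ttest_ind(group_a, group_c, equal_var=False)
print(f"Welch t = {t_stat:.2f}, two-sided P = {p_value:.3f}")

With the group A and group C counts from Table 4, this yields a two-sided P value close to the reported .03.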
RESULTS

Total Number of Errors by Cohort
The error types and totals for each cohort are listed in Table 3. The baseline group (A) consisted of 246 consecutive CT scans reported by 20 different staff radiologists, with a total of 96 errors committed.
TABLE 3. RECIST Errors Committed by Cohort

RECIST Error Type | Group A (n = 246) | Group B (n = 246) | Group C (n = 218)
RECIST version not indicated | n/a | 15 | 4
RECIST table omitted inappropriately | 34 | 37 | 10
Incorrect RECIST version used | n/a | 4 | 1
Inappropriate choice of indicator lesions in first scan | n/a | 0 | 0
Too many total lesions used | 0 | 1 | 0
Too many lesions/organ | 0 | 2 | 0
Lesion <10 mm used | 0 | 1 | 2
Measurements not saved | 5 | 4 | 5
Incorrect measurement unit (eg, cm) | 3 | 1 | 0
Inconsistent measurement angle | 0 | 0 | 0
Measurement not accurate | 3 | 1 | 3
Poor lesion conspicuity | 1 | 1 | 0
LN too small measured | 0 | 1 | 0
LN measured in incorrect axis | 5 | 5 | 12
Non-measurable disease used | 0 | 2 | 0
Indicator lesions numbered wrong | 0 | 2 | 1
Indicator lesion dropped | 2 | 0 | 4
Nonaxial plane used | 0 | 0 | 0
Wrong window | 4 | 0 | 4
Wrong slice thickness | 2 | 1 | 1
Measured through intervening bowel or vessel | 0 | 2 | 0
Multiple nodes grouped | 2 | 2 | 0
Impression used complete response, partial response or progressive disease, stable disease | 12 | 4 | 2
Other error | 23 | 7 | 19
Total errors | 96 | 93 | 68

LN, lymph node; RECIST, Response Evaluation Criteria in Solid Tumors.
The post DEM group (B) consisted of 246 consecutive CT scans reported by 21 different staff radiologists, with 93 total errors committed. The post A & F group (C) consisted of 218 consecutive CT scans reported by 24 different staff radiologists, with 68 total errors committed. There was a trend of decreasing total number of errors over the study period (from 96 to 93 to 68); however, neither the decrease in the total number of errors following DEM (P = .85) nor that following A & F (P = .22) was significant compared to baseline. In addition, there was no significant difference in the total number of errors between groups B and C (P = .35) (Table 4).

Error Rate and Mean Errors per Report by Cohort
The percent of CT scan reports with errors in application of the RECIST criteria decreased from 30% (group A) to 28% (group B) to 22% (group C). However, only the difference in error rate between group C and group A was significant (P = .03). There was no significant difference in the percent of studies with errors between groups B and C (P = .11) (Table 4). There was considerable variation in the number of scans read per radiologist per cohort, ranging from 1 to 106. The highest-volume reader's error rate changed only slightly during the study: 27% in group A, 28% in group B, and 23% in group C. The mean number of errors committed per report on studies requiring RECIST decreased from 0.93 (group A) to 0.80 (group B) and 0.82 (group C). None of these differences was statistically significant compared to baseline, and the difference between groups B and C was also not significant (P = .88).
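The 95% confidence intervals reported alongside these percentages in Table 4 are consistent with a simple normal-approximation interval for a proportion; the short sketch below reproduces the baseline interval under that assumption (the interval method is not stated in the paper).

# Illustrative sketch: reproducing the Table 4 baseline confidence interval,
# assuming a normal-approximation (Wald) interval for a proportion; the
# interval method is our assumption, not the authors' stated method.
import math

errors_a, n_a = 75, 246                       # group A: reports with errors / total reports
p = errors_a / n_a                            # observed error proportion (about 0.30)
se = math.sqrt(p * (1 - p) / n_a)             # standard error of the proportion
lo, hi = p - 1.96 * se, p + 1.96 * se         # 95% confidence bounds
print(f"{p:.0%} (95% CI {lo:.0%}-{hi:.0%})")  # prints 30% (95% CI 25%-36%), matching Table 4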
Trends in Types of Errors across Cohorts

The most common error committed at baseline was the inappropriate omission of the RECIST table. The DEM intervention failed to decrease this error. Although the A & F intervention did result in a reduction of this error, it should be noted that this cohort had fewer scans that required RECIST. The number of major errors remained fairly constant, from 23 in group A to 19 in group B and 22 in group C. Of note, the most common major error made in group A was "Impression used the terms complete response, partial response, progressive disease, or stable disease" (n = 12), which decreased to 4 in group B and 2 in group C (Table 3). One major error, "failing to measure the lymph node in the correct axis," actually increased over the series of interventions, from 3 in group A to 5 in group B to 12 in group C (Table 5).

Error Rates between Staff Only and Staff Plus Resident/Fellow Reads
The involvement of a resident or fellow did not appear to impact accuracy. The percent of reports with errors and the total number of errors were not significantly different between cases read by staff only and those read in conjunction with residents or fellows (P = .94 and P = .52, respectively) (Table 6).
DISCUSSION

Accurate interpretations of imaging studies and clear communication of their results are imperative to appropriately determine disease response to therapy for routine clinical care and clinical trials. In the 1990s, the RECIST criteria were created by an international working group to standardize and simplify tumor response reporting (1). The goals were to standardize the methods of obtaining tumor measurements and the definitions of response, and to increase accuracy in determining tumor response to specific therapy. We adopted these criteria for reporting of all CT oncology studies at the request of our oncologists in 2005. The criteria were revised and RECIST 1.1 was published in January 2009 following an analysis of a database of prospectively documented tumor measurements
of 6500 patients and >18,000 indicator lesions (2). It was determined that the total number of lesions and the number per organ could be reduced without reducing the accuracy of the assessment of tumor response. In addition, the technique to measure and report lymph nodes was changed to be concordant with the imaging literature, which recommends short axis measurements. The minimum size of a lymph node to be used as an indicator lesion was increased to 15 mm in short axis, thus ensuring that selected indicator lesions were likely to be metastatic.

Implementation of RECIST 1.1 was difficult because patients who had been followed with the original criteria continued to be reported using version 1.0 and any new oncology cases were reported using version 1.1. The differences between RECIST 1.0 and RECIST 1.1 (Table 1) were difficult to keep track of in a busy clinical environment. In addition, in our small academic practice, multiple physicians interpret CT scans with varying frequency (ranging from 1 to 106 studies read per radiologist per cohort period in this study). The low-volume readers in particular found it difficult to recall the specifics of each set of criteria. Because of these issues, two educational interventions were used to reinforce the correct reporting guidelines and to increase our departmental accuracy in using the RECIST criteria.

The effective dissemination of clinical practice guidelines is becoming increasingly important as organizations implement various quality assurance initiatives to improve quality, safety, and efficiency. Radiology, like other specialties, continues to seek methods to increase adherence to these institution-specific or national guidelines to ensure provision of care consistent with the current standard (7–10). Given the potentially high resource requirements and costs of implementing quality improvement/educational initiatives, it is important to understand the effectiveness of different guideline dissemination strategies in altering physician behavior. Currently, no strong evidence exists to support which type of guideline dissemination strategy is more effective in a given setting (3).

Two Cochrane reviews have examined the effect of educational materials and audit with feedback on process outcomes, including physician behavior. The educational materials review showed that when compared to no intervention, DEM have small beneficial effects on professional practice (4). The audit with feedback review showed that this strategy can improve professional practice; however, the effects are variable and, when effective, audit with feedback results in a small to moderate effect on professional practice (5). Both reviews commented that there is limited scientific evidence to determine which guideline dissemination strategy is likely to be more effective in a given circumstance or how the different strategies can be optimized (4–6). In designing this study, DEM were used as one educational intervention given their low cost and ability to be rapidly disseminated. A & F was the second educational intervention used because it has been shown to have some effect on clinical practice despite requiring costly resources such as physician time (5).
Our study demonstrated that following both DEM and A & F, the percent of studies with errors and the total number of errors decreased compared to baseline. However, only the decrease in the percent of studies with errors when comparing the post-A & F cohort to baseline was significant (P = .03). This suggests that the use of two different educational interventions failed to demonstrate a linear dose-response curve, because there were no significant differences in the percent of studies with errors or the total number of errors when comparing the post-DEM cohort with baseline or the post-DEM cohort with the post-A & F cohort. We do not know how the DEM intervention impacted the results of the post-A & F cohort. There is conflicting evidence regarding the effectiveness of multifaceted interventions compared to single interventions. Though some systematic reviews have suggested a dose-response curve, a recent systematic review posits that multifaceted interventions are no more effective than single interventions and that no evidence exists to support the dose-response theory, which is consistent with the results of our study (3).

A primary goal of the educational interventions was to reduce the rate of major errors, given that a major error in the application and reporting of RECIST criteria could result in misinterpretation of disease response. Although our major error rate did increase over the study period, we were able to identify that the primary source of this was incorrect lymph node measuring and reporting. This accounted for only 3 of the 23 major errors in group A, but 5 of the 19 in group B and 12 of the 22 in group C. The method for measuring and reporting lymph nodes changed significantly between RECIST 1.0 and 1.1, and the number of exams reported using the RECIST 1.1 criteria increased from 0 in group A to 60% of studies in group C, thereby affecting the rate of this specific type of error. Radiologists, who traditionally measure and report lymph nodes in the short axis to determine the likelihood that a node is pathologic, had changed their thought processes and reporting of lymph nodes to the long axis to comply with RECIST 1.0. With the implementation of RECIST 1.1, the scheme changed back to reporting lymph nodes in the short axis, resulting in much confusion and inaccuracies in lymph node measurements. Excluding major errors resulting from lymph node misreporting, the major error rate decreased from 20 (21%) in group A to 14 (15%) in group B and 10 (15%) in group C.

In addition to evaluating the effect of the educational interventions on the appropriate and accurate use of RECIST criteria, we examined whether scans read by staff in conjunction with a resident or fellow were prone to higher error rates than those read by staff alone. We hypothesized that because trainees rotate through CT only periodically, they may be less familiar with and less experienced in applying the RECIST criteria and therefore more prone to making errors. As far as the study could determine, we found no confounding effect from the presence of residents.

Potential shortcomings of this study include the small cohort sizes, which may have reduced the ability to detect smaller significant effects.
TABLE 4. Error Rates by Cohort

Measure | Group A (Baseline) | Group B (Post DEM) | Group C (Post A & F)
Total number of studies interpreted | 246 | 246 | 218
Number of studies requiring use of RECIST | 103 | 113 | 83
Number of RECIST-requiring studies with error | 75 | 67 | 47
Percent of studies with error (95% CI) | 30% (25%–36%) | 28% (22%–34%) | 22% (16%–27%)*
Total number of errors | 96 | 93 | 68
Mean number of errors in RECIST-applicable studies (SD) | 0.93 (0.72) | 0.80 (1.03) | 0.82 (0.93)
Major (2) error rate | 23 (24%) | 19 (20%) | 22 (32%)

A & F, audit and feedback; CI, confidence interval; RECIST, Response Evaluation Criteria in Solid Tumors; SD, standard deviation.
*Group A vs. group C, P = .03.
TABLE 5. Number of Major and Minor Errors Committed by Cohort

Cohort | Major Errors Committed | Minor Errors Committed | Most Common Major Error
Group A | 23 | 73 | Impression used "complete response, partial response or progressive disease, stable disease"
Group B | 19 | 74 | Impression used "complete response, partial response or progressive disease, stable disease"
Group C | 22 | 46 | Lymph node measured in incorrect axis
TABLE 6. Comparison of Error Rates in Resident/Fellow vs. Attending-only Reports

Measure | Resident/Fellow | Attending Only
Total number of studies | 206 | 504
Number of studies requiring RECIST | 79 | 219
Number of RECIST-requiring studies with errors | 54 | 135
Percent of studies read with error(s) (95% CI) | 26% (21%–33%) | 27% (23%–31%)
Total number of errors | 81 | 176
Mean number of errors in RECIST-applicable studies (SD) | 1.01 (1.15) | 0.79 (0.79)

CI, confidence interval; RECIST, Response Evaluation Criteria in Solid Tumors; SD, standard deviation.
We were also unable to control for unintended selection bias. The last point is particularly important because changes in the composition of radiologists or in the number of studies read by specific radiologists may have resulted in an improvement unrelated to the interventions. The number of studies read per radiologist within a cohort ranged from 1 to 106. The highest-volume reader read a significantly larger number of studies than any other, and although this reader's error rate decreased over the study period, it remained fairly high and may have driven our overall findings. Perhaps intensive education of the higher-volume readers would yield better overall results. In addition, the introduction of RECIST 1.1 and the use of both RECIST 1.0 and 1.1 in the post-DEM and post-A & F cohorts may have served as a confounding variable
because only RECIST 1.0 was used at baseline. This confounding variable may have affected the error rate in each group and thereby reduced the apparent effects of our educational interventions.

Our department is a small academic practice, requiring radiologists to interpret studies in a variety of subspecialty areas. A total of 24 different attending radiologists read CT scans of oncology patients during our study period. The baseline accuracy of physician application of the RECIST criteria and the effectiveness of the performed educational interventions may be different if applied to a smaller cohort of cross-sectional fellowship-trained radiologists; thus, our results may not be applicable to larger academic centers with more subspecialty focus. Our practice model more closely resembles many private practice groups and is likely more representative of how work is divided throughout the nonacademic world.

Limitations also include the difference in time delay to evaluating accuracy: 1 month after DEM versus 3 months after A & F. Although this was felt necessary to allow the faculty time to review the cases in which they had made errors, it may have impacted the results. Cohort B may have performed better because of the "honeymoon" effect that may occur immediately after training. There was neither a carrot nor a stick (ie, no tangible reward or penalty) for producing consistent, accurate reports, and this remains a potential avenue to explore. Last, an economic analysis of the costs of guideline dissemination and audit and feedback was not performed, although it should be noted that the physician time involved in providing the A & F intervention was extensive. Significant time and energy were put into the A & F, with only a relatively small incremental gain in accuracy achieved. The postintervention error rate (22%) remained higher than desirable.
CONCLUSION

The results of our study are consistent with the conflicting evidence base regarding the effectiveness of DEM and A & F. This study highlights some of the difficulties of training a variable cohort of radiologists to use a standard image reporting format. Despite extensive passive and active individual-based educational initiatives, some of which required significant physician time and resources, we were able to improve the accuracy of reports only minimally. Although this adds to the evidence base regarding the effectiveness of DEM and A & F as guideline dissemination strategies, further randomized controlled trials that reduce study bias and shed light on effect modifiers and confounders are needed.

REFERENCES

1. Therasse P, Arbuck SG, Eisenhauer EA, et al. New guidelines to evaluate the response to treatment in solid tumors (RECIST Guidelines). J Natl Cancer Inst 2000; 92:205–216.
2. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009; 45:228–247.
3. Grimshaw J, Eccles M, Thomas R, et al. Toward evidence-based quality improvement: evidence (and its limitations) of the effectiveness of guideline dissemination and implementation strategies 1966–1998. J Gen Intern Med 2006; 21:S14–S20.
4. Farmer AP, Légaré F, Turcot L, et al. Printed educational materials: effects on professional practice and health care outcomes. Cochrane Database Syst Rev 2008; (3):CD004398. doi:10.1002/14651858.CD004398.pub2.
5. Jamtvedt G, Young JM, Kristoffersen DT, et al. Audit and feedback: effects on professional practice and health care outcomes. Cochrane Database Syst Rev 2006; (2):CD000259. doi:10.1002/14651858.CD000259.pub2.
6. Foy R, Eccles M, Jamtvedt G, et al. What do we know about how to do audit and feedback? Pitfalls in applying evidence from a systematic review. BMC Health Serv Res 2005; 5:50.
7. Kielar AZ, McInnes M, Quan M, et al. Introduction of QUIP (quality information program) as a semi-automated quality assessment endeavor allowing retrospective review of errors in cross-sectional abdominal imaging. Acad Radiol 2011; 18:1358–1364.
8. Carney PA, Abraham L, Cook A, et al. Impact of an educational intervention designed to reduce unnecessary recall during screening mammography. Acad Radiol 2012; 19:1114–1120.
9. Carney PA, Geller BM, Sickles EA, et al. Feasibility and satisfaction with a tailored web-based audit intervention for recalibrating radiologists' thresholds for conducting additional work-up. Acad Radiol 2011; 18:369–376.
10. Chabi ML, Borget I, Ardiles R, et al. Evaluation of the accuracy of a computer-aided diagnosis (CAD) system in breast ultrasound according to the radiologist's experience. Acad Radiol 2012; 19:311–319.