LETTERS TO THE EDITOR

Re: "Frequency and Spectrum of Errors in Final Radiology Reports Generated With Automatic Speech Recognition Technology"

I am prompted to write this letter after reading the interesting article by Quint et al [1] from the University of Michigan. They reported an error rate of 22% in finalized reports generated with voice recognition (VR) software and found that radiologists underestimated their own error rates.

As a radiology resident at the University of California, San Diego, my colleagues and I have considerable experience using a VR system (Talk Station version 3.1.4; Agfa, Mortsel, Belgium). To gauge the perceived impact of using VR, I conducted a brief survey of our department. I received replies from 34 radiologists (14 residents, 5 fellows, and 15 faculty members). Respondents answered each question on a 5-point, Likert-type scale: 1 = "strongly disagree," 2 = "disagree," 3 = "neutral," 4 = "agree," 5 = "strongly agree." The questions and their mean responses were as follows:

1. I am satisfied with our VR software: 2.2
2. Our VR software increases my productivity/efficiency at work: 1.9
3. Our VR software decreases my productivity/efficiency at work: 4.2
4. Using our VR software introduces errors in communication: 4.2
5. I have seen errors in prior reports that I attribute to our VR software (not diagnostic errors): 4.4
6. I have been notified by a clinician because of an error due to our VR software: 3.8
7. I have seen an error attributable to our VR software that has significant clinical implications: 3.8
8. I have had to addend a report because of an error generated by our VR software: 4.1
9. Macros in our VR software increase completeness: 2.9
10. Macros in our VR software decrease accuracy: 3.6
11. Using our VR software enhances patient care and safety: 2.3
12. Our VR software increases the potential for malpractice: 3.8
13. Our VR software allows more time for teaching, learning, and attending conferences: 2.0
14. When reviewing reports generated by our VR software (before finalizing them), I notice errors that need correcting: 4.1
15. Compared with alternative systems (i.e., human transcription services), our VR software is . . .
    a. Faster: 1.8
    b. More accurate: 1.8
    c. More likely to lead to confusing reports: 3.6

From these responses, I have concluded that satisfaction with our VR software is poor and that there are concerns about the clinical safety associated with its use. The time it takes to correct our VR-generated reports was felt to negatively affect the residency educational experience. Reports in the literature corroborate these findings [2-5]. The authors of these articles concluded that VR software is a distraction from reading studies, more prone to cause errors, less cost-effective, and more frustrating to use than human transcription services. One study reported that 90% of VR-generated reports contained errors, and fully
35% of finalized VR-generated reports contained errors [5]. The more recent report by Quint et al [1] suggests a finalized VR-generated report error rate of 22% and that radiologists underestimate their own error rates. It has been calculated that the time spent correcting VR-generated reports is worth about $80,000 per radiologist annually [5].

Recently at the University of California, San Diego, a medicine team came to discuss a patient. "Reactive lymph nodes" had been finalized as "metastatic lymph nodes," and before starting therapy, the team wanted to know the full extent of the malignancy. After I explained that this was a VR error, the medicine residents revealed something disturbing: each morning, they gathered the radiology reports from the previous day to find the most ridiculous VR error. Rarely did they bother to clarify the situation with the radiologists so that the reports could be corrected.

We are at a time when external pressures are mounting for radiology: dwindling reimbursement, increased imaging competition from nonradiologists, the potential to entirely outsource a largely anonymous profession, and so on. Given these threats, we should strive to make our ultimate product, the written report, as accurate, useful, and timely as possible. Does our current system truly maximize our collective strengths and minimize our weaknesses with these goals in mind? Does it make sense for radiologists to divert their gaze and attention to a third screen to edit thousands of words of text daily?

Peter Andrew Marcovici, MD
University of California, San Diego, Medical Center
Department of Radiology
200 W Arbor Drive, Dept 8756
San Diego, CA 92103-8756
e-mail:
[email protected]
REFERENCES

1. Quint LE, Quint DJ, Myles JD. Frequency and spectrum of errors in final radiology reports generated with automatic speech recognition technology. J Am Coll Radiol 2008;5:1196-9.
2. Hansen GC, Falkenbach KH, Yaghmai I. Voice recognition system. Radiology 1988;169:580.
3. Heilman RS. Voice recognition transcription: surely the future but is it ready? Radiographics 1999;19:2.
4. Leeming BW, Porter D, Jackson JD, Bleich HL, Simon M. Computerized radiologic reporting with voice data-entry. Radiology 1981;138:585-8.
5. Pezzullo JA, Tung GA, Rogg JM, Davis LM, Brody JM, Mayo-Smith WW. Voice recognition dictation: radiologist as transcriptionist. J Digit Imaging 2007;21:384-9.

DOI 10.1016/j.jacr.2009.01.002 ● S1546-1440(09)00003-9
Authors' Reply

We thank Peter Marcovici, MD, for his comments reflecting his and his department's experience with speech recognition (SR). We completely agree with the points made by Dr. Marcovici: it takes too long to generate and edit reports with SR; there is a high error rate in reports generated using SR; time spent generating an SR report takes away from time spent looking at images or teaching; and suboptimally worded or edited reports reflect poorly on us as radiologists.

SR has the potential to shorten turnaround times. This is an easy metric to measure, and one that is often impressive to people who measure things. However, rewarding short turnaround times does not reward the generation of correct reports. Imagine what might happen if compensation were based on the readability and correctness of a report rather than on the speed at which it was signed.

Another concern needing attention is departmental morale. Although morale is tougher to quantify, a walk through any reading room or a brief conversation with members of a department using SR reveals the overwhelming sense that SR makes the job tougher and the workday longer. This is the last thing we need, as everyone will be expected to further increase productivity in the future.

Because SR technology is easily integrated into PACS and RIS networks and may someday become 99+% accurate, we do not feel that it should be abandoned. The best solution would seem to be to take advantage of the benefits of SR while not distracting radiologists from doing their primary job in the most efficient and cost-effective manner. This could be accomplished by having transcriptionists edit the reports first, before final editing and signing by the radiologist. In this scenario, SR could increase the speed of transcriptionists (and thus report turnaround time) without reducing radiologist productivity. However, it remains to be seen whether this type of workflow would reduce the report error rate compared with its current status.

Douglas J. Quint, MD
Leslie E. Quint, MD
Department of Radiology
University of Michigan Health System
1500 E. Medical Center Drive
Ann Arbor, MI 48109-0030
e-mail:
[email protected] DOI 10.1016/j.jacr.2009.02.005 ● S1546-1440(09)00074-X