Point-Counterpoint
The USMLE Step 1 Pass/Fail Reporting Proposal: Another View

J. Bryan Carmody, MD, MPH; David Sarkany, MD; Darel E. Heitkamp, MD

Acad Radiol 2019; 26:1403–1406. From the Eastern Virginia Medical School, Department of Pediatrics, Division of Nephrology, Norfolk, Virginia (J.B.C.); Staten Island University Hospital, Northwell Health, Department of Radiology, 475 Seaview Avenue, Staten Island, NY 10305 (D.S.); AdventHealth Orlando, Department of Radiology, Orlando, Florida (D.E.H.). Received May 27, 2019; revised June 13, 2019; accepted June 13, 2019. Address correspondence to: D.S., e-mail: [email protected]
The Association of Program Directors in Radiology recently issued a statement endorsing continued reporting of results of the United States Medical Licensing Examination (USMLE) as a three-digit score. While this position was approved by the Association of Program Directors in Radiology Board of Directors, it does not reflect the opinions of all radiology program directors. Here, we present an argument in support of reporting USMLE results as pass/fail. As a psychometric instrument, the USMLE Step 1 is designed to assess basic science knowledge and is intended to inform a binary decision on licensure. Owing to a steadily increasing burden of applications to review, program directors have increasingly relied upon scores for candidate screening. Such use has multiple adverse consequences. Student focus on Step 1 systematically devalues educational content not evaluated on the exam, and the reliance on Step 1 scores almost certainly works against efforts to increase workforce diversity. Moreover, the increasing pressure of "Step 1 Mania" has negative consequences for trainee mental health and wellness. Despite the widespread use of Step 1 scores to select applicants, there are few data correlating scores with meaningful outcomes related to patient care or clinical practice. We find the current situation untenable, and believe a necessary first step toward reform is making Step 1 a pass/fail only examination.

Key Words: United States Medical Licensing Examination; USMLE; Step 1; National Board of Medical Examiners; NBME; Medical Student Education; Pass/Fail; Score; Wellness; Diversity.

© 2019 Published by Elsevier Inc. on behalf of The Association of University Radiologists. https://doi.org/10.1016/j.acra.2019.06.002
Recently, the Association of Program Directors in Radiology issued a statement of support for continued reporting of results of the United States Medical Licensing Examination (USMLE) as a numeric score (1). While this position was approved by the Association of Program Directors in Radiology Board of Directors, it does not reflect the opinions of all radiology program directors. As program directors (D.E.H. and D.S.) and medical educators (J.B.C.), we have sincere concerns about "Step 1 Mania" and the current system of residency selection, and we support a move to a pass/fail USMLE Step 1.

HOW DID WE GET HERE?

The USMLE was first administered in 1992, the end result of decades of effort by the National Board of Medical Examiners (NBME) to simplify the process of interstate physician licensure. The exam, however, was never intended to function as a "Residency Aptitude Test." In fact, the NBME previously issued a disclaimer that its examinations were not
developed for the purpose of assessing preparation for postgraduate medical training (2).

Coincidentally, 1992 was also the last year in which there were more PGY-1 training positions available than there were applicants for those positions. By 1996, there were only 0.83 positions available for every applicant in the Match, and though the numbers of both positions and applicants have grown since, the mismatch persists, with 0.84 positions per active applicant in 2019 (3). For fourth-year students confronting this statistical reality, going unmatched represents a potentially devastating outcome. When modeled using game theory, the situation approximates a "prisoner's dilemma," in which an applicant can gain a slight advantage over his or her peers by applying to more programs, making "overapplication" a dominant strategy (4).

Simultaneously, in 1996, the Association of American Medical Colleges introduced the Electronic Residency Application Service (ERAS). This system allowed applicants to instantly send applications to residency programs with the click of a button, further enabling students to overapply. Unsurprisingly, given the lowered barriers to overapplication and the continued incentive to do so, the number of applications submitted by medical students has steadily increased. Today, US seniors applying to radiology apply to an average of 48.7 programs each, a statistic that translates into a substantial burden for radiology program directors. In the 2018–2019 application season, radiology programs received an average of 422 applications from US seniors and 162 from international medical graduates, a total increase of over 30% from just 2013 (5).
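The dynamics described in reference (4) can be made concrete with a toy simulation. The following is our illustrative sketch, not the NRMP matching algorithm or the cited study's model: programs with one slot each simply take the strongest unmatched applicant who applied to them, and the applicant counts, application numbers, and slot ratio (0.84 positions per applicant) are assumptions chosen for illustration.

```python
# Toy model of overapplication: one deviant doubling their applications
# raises their own odds of matching, but the overall match rate is pinned
# near the ratio of positions to applicants regardless.
import random

def run_match(n_applicants=1000, n_programs=840, k_apps=10, deviant_extra=0):
    """Simulate one match. Applicant 0 optionally sends `deviant_extra`
    additional applications. Returns (overall match rate, applicant 0 matched?)."""
    quality = [random.random() for _ in range(n_applicants)]
    applications = [[] for _ in range(n_programs)]
    for a in range(n_applicants):
        k = k_apps + (deviant_extra if a == 0 else 0)
        for p in random.sample(range(n_programs), k):
            applications[p].append(a)
    matched = set()
    # Each one-slot program greedily accepts its strongest unmatched applicant.
    for p in random.sample(range(n_programs), n_programs):
        candidates = [a for a in applications[p] if a not in matched]
        if candidates:
            matched.add(max(candidates, key=lambda a: quality[a]))
    return len(matched) / n_applicants, 0 in matched

random.seed(42)
trials = 200
base = [run_match() for _ in range(trials)]
dev = [run_match(deviant_extra=10) for _ in range(trials)]
print("overall match rate, everyone sends 10 apps: %.3f" % (sum(r for r, _ in base) / trials))
print("overall match rate, one deviant sends 20:   %.3f" % (sum(r for r, _ in dev) / trials))
print("deviant's own match rate at 10 apps:        %.2f" % (sum(m for _, m in base) / trials))
print("deviant's own match rate at 20 apps:        %.2f" % (sum(m for _, m in dev) / trials))
```

Under these assumptions the aggregate match rate barely moves when an individual overapplies, while the individual's own odds improve: exactly the incentive structure of a prisoner's dilemma, in which defection (overapplication) dominates even though universal defection leaves everyone reviewing more applications for the same number of matches.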
Against this backdrop, a simple numeric metric to screen applications indeed becomes very seductive—and program directors have increasingly turned to Step 1 scores to limit the number of applications they fully review. In 2010, 69% of radiology program directors reported using Step 1 scores to identify candidates to interview (6). By 2018, that figure had increased to 95% (7). But as program directors, we must ask ourselves: do Step 1 scores truly identify the type of candidates we seek to train? Is this metric capable of predicting which applicants will succeed in tomorrow's complex radiology practice environment, or is it simply one of convenience? And, by choosing to rely on Step 1 scores, are we doing more harm to medical education than what we gain from the information the scores provide?

WHAT DOES STEP 1 PREDICT?

USMLE Step 1 is a multiple-choice test of basic science. While an understanding of scientific principles is an appropriate requirement for all physicians, it is difficult to argue that much of the material covered on Step 1 is relevant to the practice of radiology. Even if it were, much of the basic science covered on Step 1 is poorly retained, with substantial decreases in examinee performance after just 1–2 years (8,9).

Studies show that there is little correlation between Step 1 scores and meaningful outcomes related to patient care or clinical practice (10). The strongest correlates of Step 1 scores are scores on other standardized tests such as the Medical College Admission Test, the American College of Radiology In-Training Exam, and the American Board of Radiology (ABR) Core Exam (11,12). Such associations raise the question of whether these instruments are truly independent measures of knowledge, or whether we are simply repeatedly assessing a skill in test taking. While success on standardized tests is a skill prized in our society, it is not necessarily one that adds value to patient care. In an era when medical knowledge is more accessible than ever before, it seems curious that we have chosen to prioritize a measure of basic science memorization over higher-level analysis and critical thinking.

Since program accreditation (and residency recruitment) may be impacted by poor resident performance on board certification exams, some program directors highlight the association between Step 1 scores and specialty board certification. Yet the predictive value of this association is often overstated. Although a recent study demonstrated a general correlation between higher USMLE scores and increased odds of passing the ABR Core Examination (12), more detailed analyses in other disciplines have shown that the relationship between Step 1 scores and board passage is nonlinear: scores as low as 200–210 correspond to a high likelihood of success, with diminishing returns thereafter (13–18).

WHAT'S THE HARM IN USING STEP 1 SCORES?

If there is some utility for Step 1 scores in predicting board passage or selecting candidates to interview, program
directors might reasonably choose to continue using them. However, does this benefit offset the harms of such use? If we as program directors argue for maintaining the status quo, we must at least be aware of the externalities imposed upon medical students and medicine as a whole.

A veritable industry has cropped up around Step 1 preparation to capitalize on the rise in student anxiety. The typical medical student budget for Step 1 includes registration for the test ($630), at least 3–4 proprietary test prep resources ($100–$500 each), and approximately three practice exams sold by the NBME ($60 each) (19–21). The average student thus likely incurs $1500–$2000 of expenses related to Step 1. For many students, these costs are financed at prevailing student loan interest rates.

More worrisome than monetary costs are the changes happening to preclinical medical education as a result of the exam. Because of the importance program directors place on Step 1 scores, preclinical students today prioritize exam preparation at the expense of other elements of medical training (19,20). Content not explicitly tested on Step 1 is systematically devalued—content which includes skills recognized by the Accreditation Council for Graduate Medical Education as core competencies in graduate medical education, such as communication, professionalism, and patient care. Students' decisions to forgo these experiences in favor of Step 1 preparation may have significant ramifications for their professional development. A recent paper authored by medical students called attention to this issue, arguing that commercial test prep resources have become "the de facto national curriculum of preclinical medical education" (20).

Maintaining a scored Step 1 may also work against efforts to improve radiology workforce diversity. Women and underrepresented minorities remain disproportionately scarce in our field. In fact, among the 20 largest residency training specialties, diagnostic radiology ranks 17th and 20th for female and underrepresented minority representation, respectively, with no trend toward improvement in recent years (22). However, these underrepresented groups score lower on Step 1. A recent NBME study showed that, when compared to a white male reference group, female students scored 5.9 points lower on USMLE Step 1, while Asian, Hispanic, and black test-takers scored 4.5, 12.1, and 16.6 points lower, respectively (23). Often, Step 1 scores are held up as a surrogate for a student's work ethic, determination, or time management skills—yet these seem poor explanations for such systematic differences. In our current environment—when 67% of radiology program directors use specific target scores to screen candidates (7)—diverse candidates are almost certainly being screened out.

Yet perhaps the most troubling effect of Step 1 Mania is its impact on student mental health. A 2014 study comparing medical students with the general public found significantly higher rates of burnout and depression in medical students than in age-matched samples of college graduates (24). Step 1 is a substantial source of medical student anxiety (25), as a recent
commentary authored by students and residents from six US medical schools vividly describes:

"For students, the Step 1 climate is less about learning than keeping their heads above water in a cutthroat profession. At stake is choice of specialty, residency location, and even self-worth. It is not surprising that students push themselves to their physical, psychological, and interpersonal limits to succeed in this environment. In our view, the Step 1 climate contributes to the ongoing mental health crisis affecting the medical community, characterized by increased rates of anxiety, depression, burnout, and suicide among physicians and physicians-in-training" (20).

The increasing pressure on students to perform well on Step 1 is driven in part by the fact that the national mean Step 1 score has been rising at approximately 0.9 points per year—meaning that students must score higher and higher to distinguish themselves from their peers. When the USMLE began in 1992, the mean score was 200. Today, a Step 1 score of 200 would fall at the 9th percentile (26), while the average score for a US student who successfully matches is 233 (27). In this population already known to be at increased risk of burnout, depression, and suicide (28,29), shouldn't program directors seize opportunities to improve medical student well-being rather than enabling an arms race that pushes students harder and harder each year?
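These figures hang together arithmetically. As a rough consistency check (our arithmetic, not the article's, assuming the reported drift is approximately linear and a score standard deviation near 20, a figure in the range the USMLE has historically reported):

$$\bar{s}_{2019} \approx \bar{s}_{1992} + (2019 - 1992)\times 0.9 \approx 200 + 24 \approx 224$$

$$z(200) = \frac{200 - 224}{20} = -1.2$$

A z-score of −1.2 sits near the 11th percentile of a normal distribution, broadly consistent with the reported 9th percentile: yesterday's average score is today's red flag.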
WHAT WOULD PROGRAM DIRECTORS DO WITHOUT STEP 1 SCORES?

It is sometimes suggested that without Step 1 scores, program directors will be left without sufficient data with which to make selection decisions. Are such concerns justified? After all, if there are no data that Step 1 scores reliably identify the qualities we seek in our trainees, or that patients seek in their physicians, how could other metrics perform worse?

Although other data used to inform candidate selection are imperfect, some highly competitive programs have nonetheless successfully identified candidates to interview using an algorithmic approach built on existing metrics, without a USMLE cut score (30). Moreover, if we agreed to eliminate Step 1 score reporting in 3–5 years, the medical education community would work hard to devise meaningful replacements. The only way that other, more informative measures will not arise is if we as program directors fail to request them.

Radiology is a competitive discipline, and highly motivated students will seek to distinguish themselves by whatever measures we tell them are important. Other specialties have already requested other measures. For instance, some surgical fellowships use tests of situational judgment and technical skills (31). In emergency medicine, program directors review a standardized letter of evaluation—which they cited as the most important factor in determining candidates to interview, ahead of USMLE scores (32).
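To make the idea of algorithmic, multi-metric screening concrete, consider the sketch below. It is our hypothetical construction, not the STAR algorithm of reference (30): the domains, normalizations, and weights are all assumptions that a program would set to reflect its own mission.

```python
# Hypothetical composite-screening sketch: rank applicants on several
# weighted domains instead of a single USMLE cut score.
from dataclasses import dataclass

@dataclass
class Applicant:
    name: str
    clerkship_honors: int      # 0-7 core clerkships honored
    research_items: int        # abstracts, papers, presentations
    letters_strength: float    # 0-1, from structured letter review
    service_leadership: float  # 0-1, from holistic application review

# Illustrative weights a program might choose to reflect its mission.
WEIGHTS = {"clerkship_honors": 0.35, "research_items": 0.15,
           "letters_strength": 0.30, "service_leadership": 0.20}

def composite_score(a: Applicant) -> float:
    """Normalize each domain to 0-1, then apply mission-driven weights."""
    domains = {
        "clerkship_honors": a.clerkship_honors / 7,
        "research_items": min(a.research_items, 10) / 10,  # cap outliers
        "letters_strength": a.letters_strength,
        "service_leadership": a.service_leadership,
    }
    return sum(WEIGHTS[d] * v for d, v in domains.items())

pool = [
    Applicant("A", clerkship_honors=5, research_items=2,
              letters_strength=0.9, service_leadership=0.6),
    Applicant("B", clerkship_honors=3, research_items=8,
              letters_strength=0.7, service_leadership=0.9),
]
for a in sorted(pool, key=composite_score, reverse=True):
    print(f"{a.name}: {composite_score(a):.2f}")
```

The appeal of such a composite is transparency of design: a program must articulate and defend why each domain carries the weight it does, rather than inheriting whatever a single licensing examination happens to measure.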
The process of holistic review, in which the applications of all academically qualified candidates are carefully evaluated across all domains and measured against the unique needs of the institution, has been championed by the Association of American Medical Colleges as the model for medical school admissions (33,34). Admittedly, such a process requires reviewers to make subjective judgments—just as radiologists must do every day in clinical practice.

The issue of overapplication deserves special consideration. If the burden of application review necessitates the use of non-evidence-based screening metrics, is the best solution to continue to use those metrics—or to insist upon a more sensible application policy? Moreover, analysis shows that while overapplication provides a relative advantage for individual candidates, there is no improvement in the overall match rate (4). It seems unlikely that program directors would have tolerated the rise in overapplication were it not for the availability of Step 1 scores.

THE PROGRAM DIRECTOR'S DILEMMA

In the absence of high-quality evidence that Step 1 scores predict which applicants will be safe, effective radiologists, and amidst growing concern over the harms of Step 1 Mania, what are program directors to do? Granted, if a program director's goal is simply to recruit residents capable of performing well on the ABR Core Exam, then the Step 1 score may be sufficient. But if radiologists are serious about adding value through communication, professionalism, and interdisciplinary patient care, then we must be more deliberate in our recruitment and gatekeeping. Our field would benefit more from holistic review (to identify candidates with excellence in domains such as ethics, leadership, cultural competency, diversity, communication, healthcare disparities, patient- and family-centered care, and innovation) than by continuing to rely on a one-size-fits-all standardized test.

Program directors have incredible leverage in this debate. We cannot abdicate our power to insist upon reform just to make our lives easier during interview season. Overapplication is a phenomenon born of the Step 1 culture that we have unwittingly helped to create. The use of Step 1 scores presents a dilemma for program directors and anyone else who considers themselves a trainee advocate. We must admit our role in perpetuating this problem and address the adverse effects of Step 1 Mania on medical education, diversity in medicine, and trainee wellness.

THE PATH FORWARD

We believe that we have reached a tipping point in this debate. As easy as it would be for program directors to continue outsourcing candidate evaluation to the NBME, from our standpoint the status quo has become untenable. While reporting USMLE Step 1 results as pass/fail is not a cure-all, it is a necessary first step. Rather than endorsing the current system, program directors should lead the change by insisting
upon application reform, meaningful selection metrics, holistic candidate review, and a refocus on preclinical medical education that will enable students to prepare for the challenges they will face in independent practice.
REFERENCES

1. Rozenshtein A, Mullins ME, Marx MV. The USMLE Step 1 pass/fail reporting proposal: the APDR position. Acad Radiol 2019; 26(10):1400–1402.
2. National Board of Medical Examiners. National Board examinations, use of scores in residency selection, and staff changes. Natl Board Examiner 1988; 35(1):3.
3. National Resident Matching Program. Results and Data: 2019 Main Residency Match. Washington, DC: National Resident Matching Program, 2019.
4. Weissbart SJ, Kim SJ, Feinn RS, et al. Relationship between the number of residency applications and the yearly match rate: time to start thinking about an application limit. J Grad Med Educ 2015; 7(1):81–85.
5. ERAS Statistics - Preliminary Data (ERAS 2019). Online at: https://www.aamc.org/services/eras/stats/359278/stats.html. Accessed May 22, 2019.
6. National Resident Matching Program. Data Release and Research Committee: Results of the 2010 NRMP Program Director Survey. Washington, DC: National Resident Matching Program, 2010. Online at: https://mk0nrmpcikgb8jxyd19h.kinstacdn.com/wp-content/uploads/2013/08/programresultsbyspecialty2010v3.pdf. Accessed June 10, 2019.
7. National Resident Matching Program. Data Release and Research Committee: Results of the 2018 NRMP Program Director Survey. Washington, DC: National Resident Matching Program, 2018. Online at: https://mk0nrmpcikgb8jxyd19h.kinstacdn.com/wp-content/uploads/2018/07/NRMP-2018-Program-Director-Survey-for-WWW.pdf. Accessed June 10, 2019.
8. Swanson DB, Case SM, Luecht RM, et al. Retention of basic science information by fourth-year medical students. Acad Med 1996; 71(10):S80–S82.
9. Ling Y, Swanson DB, Holtzman K, et al. Retention of basic science information by senior medical students. Acad Med 2008; 83(10):S82–S85.
10. McGaghie WC, Cohen ER, Wayne DB. Are United States Medical Licensing Exam Step 1 and 2 scores valid measures for postgraduate medical residency selection decisions? Acad Med 2011; 86(1):48–52.
11. Boyse TD, Petterson SK, Cohan RH, et al. Does medical school performance predict radiology resident performance? Acad Radiol 2002; 9:437–445.
12. Calisi N, Gondi KT, Asmar J, et al. Predictors of success on the ABR Core Examination. J Am Coll Radiol 2019. In press.
13. McCaskill QE, Kirk JJ, Barata DM, et al. USMLE Step 1 scores as a significant predictor of future board passage in pediatrics. Ambul Pediatr 2007; 7(2):192–195.
14. Kay C, Jackson JL, Frank M. The relationship between internal medicine residency graduate performance on the ABIM certifying examination, yearly in-service training examinations, and the USMLE Step 1 examination. Acad Med 2015; 90(1):100–104.
15. Shellito JL, Osland JS, Helmer SD, et al. American Board of Surgery examinations: can we identify surgery residency applicants and residents who will pass the examinations on the first attempt? Am J Surg 2010; 199(2):216–222.
16. Armstrong A, Alvero R, Nielsen P, et al. Do U.S. Medical Licensure Examination Step 1 scores correlate with Council on Resident Education in Obstetrics and Gynecology in-training examination scores and American Board of Obstetrics and Gynecology written examination performance? Mil Med 2007; 172(6):640–643.
17. Swanson DB, Sawhill A, Holtzman KZ, et al. Relationship between performance on Part I of the American Board of Orthopaedic Surgery certifying examination and scores on USMLE Steps 1 and 2. Acad Med 2009; 84(10 Suppl):S21–S24.
18. Dillon GF, Swanson DB, McClintock JC, et al. The relationship between the American Board of Anesthesiology Part 1 certifying examination and the United States Medical Licensing Examination. J Grad Med Educ 2013; 5(2):276–283.
19. Prober CG, Kolars JC, First LR, et al. A plea to reassess the role of United States Medical Licensing Examination Step 1 scores in residency selection. Acad Med 2016; 91(1):12–15.
20. Chen DR, Priest KC, Batten JN, et al. Student perspectives on the "Step 1 climate" in preclinical medical education. Acad Med 2019; 94(3):303–304.
21. Burk-Rafel J, Santen SA, Purkiss J. Study behaviors and USMLE Step 1 performance: implications of a student self-directed parallel curriculum. Acad Med 2017; 92:S67–S74.
22. Chapman CH, Hwang WT, Both S, et al. Current status of diversity by race, Hispanic ethnicity, and sex in diagnostic radiology. Radiology 2014; 270(1):232–240.
23. Rubright JD, Jodoin M, Barone MA. Examining demographics, prior academic performance, and United States Medical Licensing Examination scores. Acad Med 2019; 94(3):364–370.
24. Dyrbye LN, West CP, Satele D, et al. Burnout among US medical students, residents, and early career physicians relative to the general US population. Acad Med 2014; 89(3):443–451.
25. Slavin S. Reflections on a decade leading a medical student well-being initiative. Acad Med 2019; 94:771–774.
26. USMLE Score Interpretation Guidelines. Online at: https://www.usmle.org/pdfs/transcripts/USMLE_Step_Examination_Score_Interpretation_Guidelines.pdf. Accessed May 23, 2019.
27. National Resident Matching Program. Charting Outcomes in the Match: U.S. Allopathic Seniors, 2018. Washington, DC: National Resident Matching Program, 2018. Online at: https://mk0nrmpcikgb8jxyd19h.kinstacdn.com/wp-content/uploads/2018/06/Charting-Outcomes-in-the-Match-2018-Seniors.pdf. Accessed June 10, 2019.
28. Dyrbye LN, Massie FS, Eaker A, et al. Relationship between burnout and professional conduct and attitudes among US medical students. JAMA 2010; 304(11):1173–1180.
29. Schwenk TL, Davis L, Wimsatt LA. Depression, stigma, and suicidal ideation in medical students. JAMA 2010; 304(11):1181–1190.
30. Villwock JA, Hamill CS, Sale KA, et al. Beyond the USMLE: the STAR algorithm for initial residency applicant screening and interview selection. J Surg Res 2019; 235:447–452.
31. Gardner AK, Dunkin BJ. Pursuing excellence: the power of selection science to provide meaningful data and enhance efficiency in selecting surgical trainees. Ann Surg 2018. doi:10.1097/SLA.0000000000002806. Epub ahead of print.
32. Love JN, Smith J, Weizberg M, et al. Council of Emergency Medicine Residency Directors' standardized letter of recommendation: the program director's perspective. Acad Emerg Med 2014; 21(6):680–687.
33. Conrad SC, Addams AN, Young GH. Holistic review in medical school admissions and selection: a strategic, mission-driven response to shifting societal needs. Acad Med 2016; 91(11):1472–1474.
34. Kirch DG. Transforming admissions: the gateway to medicine. JAMA 2012; 308:2250–2251.