Accepted Manuscript
Does the NBME Surgery Shelf exam constitute a “double jeopardy” of USMLE Step 1 performance?
Michael S. Ryan, Jorie M. Colbert-Getz, Salem N. Glenn, Joel D. Browning, Rahul J. Anand
PII: S0002-9610(16)30978-3
DOI: 10.1016/j.amjsurg.2016.11.045
Reference: AJS 12205
To appear in: The American Journal of Surgery
Received Date: 7 March 2016
Revised Date: 21 October 2016
Accepted Date: 29 November 2016
Please cite this article as: Ryan MS, Colbert-Getz JM, Glenn SN, Browning JD, Anand RJ, Does the NBME Surgery Shelf exam constitute a “double jeopardy” of USMLE Step 1 performance?, The American Journal of Surgery (2017), doi: 10.1016/j.amjsurg.2016.11.045. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
DOES THE NBME SURGERY SHELF EXAM CONSTITUTE A “DOUBLE JEOPARDY” OF USMLE STEP 1 PERFORMANCE?
Michael S Ryan MD, MEHP (a); Jorie M Colbert-Getz MS, PhD (b); Salem N Glenn BS (a); Joel D Browning BS (a); Rahul J Anand MD (a)
a. Virginia Commonwealth University School of Medicine, 1201 East Marshall Street, Richmond, Virginia 23298
b. University of Utah School of Medicine, 30 N 1900 E, Salt Lake City, Utah 84132
Corresponding Author: Rahul J Anand MD FACS
[email protected]
Background:
Scores from the NBME Subject Examination in Surgery (Surgery Shelf) positively correlate with United States Medical Licensing Examination Step 1 (Step 1). Based on this relationship, the authors evaluated the predictive value of Step 1 on the Surgery Shelf.
Methods:
Step 1 standard scores were substituted for Surgery Shelf standard scores for 395 students in 2012-2014 at one medical school. Linear regression was used to determine how well Step 1 scores predicted Surgery Shelf scores. Percent match between original (with Shelf) and modified (with Step 1) clerkship grades was computed.
Results:
Step 1 scores significantly predicted Surgery Shelf scores, R² = 0.42, P < 0.001. For every point increase in Step 1, a Surgery Shelf score increased by 0.30 points. Seventy-seven percent of original grades matched the modified grades.
Conclusion:
Replacing Surgery Shelf scores with Step 1 scores did not have an effect on the majority of final clerkship grades. This observation raises concern over use of Surgery Shelf scores as a measure of knowledge obtained during the Surgery clerkship.
Keywords: Surgery Clerkship, Surgery Shelf Exam, USMLE Step 1 exam
Funding:
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Introduction
Most Surgery Clerkships across the United States use the National Board of Medical Examiners (NBME) Subject Examination in Surgery (henceforth referred to as Surgery Shelf) as a component of final grade determination. Institutions vary in the weight afforded to the Surgery Shelf for final grade calculations, but it commonly accounts for at least a quarter of the final grade [1]. The Surgery Shelf appeals to clerkship directors because of both its ease of administration and its value as a nationally developed measure of knowledge acquisition [2]. In contrast to other assessment methods such as global rating forms for clinical performance, the Surgery Shelf provides a more objective measure of clerkship-based performance [3].
The United States Medical Licensing Examination (USMLE) Step 1 has been shown to correlate positively with Shelf exams in Surgery [4,5], Internal Medicine [5], Pediatrics [5], Psychiatry [5], Obstetrics and Gynecology [5,6], and Family Medicine [5,7]. This correlation has several implications for the Surgery clerkship. First, it suggests that scores on USMLE Step 1 may be helpful in identifying students at risk for failing the Surgery Shelf and/or receiving a poor grade in the clerkship. Second, it has raised concern that the Surgery Shelf may signify a clerk’s general test-taking capabilities rather than his/her knowledge related to surgery [8], thus serving as “double jeopardy” for students who perform poorly on other standardized tests.
The purpose of this study was to assess the value of USMLE Step 1 in predicting performance on the Surgery Shelf and, consequently, final clerkship grades. We compared performance on the Surgery Shelf to USMLE Step 1 scores and then replaced Surgery Shelf scores with scores from USMLE Step 1 to determine the impact the substitution would have on Surgery clerkship grades. We expected that USMLE Step 1 scores would be predictive of scores on the Surgery Shelf and that final clerkship grades would be unaffected if we substituted USMLE Step 1 scores for Surgery Shelf scores.
Materials and Methods:
This was a retrospective study. We collected USMLE Step 1 and USMLE Step 2 Clinical Knowledge (CK) scores, Surgery Shelf scores, and Surgery Clerkship grades for all medical students who completed the Surgery Clerkship from 2012 to 2014 at the Virginia Commonwealth University School of Medicine (VCU-SOM).
The Surgery Clerkship at VCU-SOM is an 8-week experience in which students spend 4 weeks on a General Surgery service (Gastrointestinal and Bariatric Surgery, Trauma and Emergency Surgery, Surgical Oncology, Transplantation Surgery, Pediatric Surgery, or Veterans Affairs Hospital) and two rotations of 2 weeks each on a surgical subspecialty service (Cardiothoracic Surgery, Neurosurgery, Ophthalmology, Orthopedic Surgery, Otolaryngology, Plastic Surgery, Urology, Community Rotation, and Vascular Surgery). Final grades on the clerkship are based on preceptors’ ratings of clinical performance (30% from General Surgery and 15% from each subspecialty experience), Surgery Shelf performance (30%), completion of online assignments (10%), and completion of other non-graded components such as written History and Physicals (Pass/Fail).
To compute a final grade, VCU-SOM converts each assessment score into a T-score. T-scores are standard scores that have a mean of 50 and a standard deviation of 10 and allow for comparison between assessment types that use different scales (e.g. 0-100%, 1-4). At VCU-SOM, thresholds for Honors, High Pass, and Pass are suggested using norm-referenced T-scores from the previous academic cycle. The threshold for Pass is identified by subtracting 2 standard deviations from the mean T-score from the previous academic year. The threshold for High Pass includes all T-score values above the mean but below the top 15%. All scores in the top 15% meet the threshold for Honors. Final grades are assigned by a grading committee, which takes into account the suggested clerkship grade as well as comments from preceptors. For the purpose of this study, the suggested T-score grade was termed the “original” grade.
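The grade computation described above can be sketched in Python. This is an illustrative sketch only, not the software used at VCU-SOM: the function names, the component labels, the use of the population standard deviation, and the index-based cut for the top 15% are all assumptions. Only the T-score definition (mean 50, SD 10), the component weights, and the threshold rules come from the text.

```python
from statistics import mean, pstdev

# Component weights as stated in the text; the key names are hypothetical labels.
WEIGHTS = {
    "general_surgery": 0.30,
    "subspecialty_1": 0.15,
    "subspecialty_2": 0.15,
    "shelf": 0.30,
    "online_assignments": 0.10,
}

def t_scores(raw):
    """Rescale raw scores to T-scores (mean 50, SD 10).

    Assumption: the cohort's population SD is used.
    """
    m, sd = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / sd for x in raw]

def composite(student_t):
    """Weighted sum of one student's component T-scores."""
    return sum(w * student_t[name] for name, w in WEIGHTS.items())

def suggest_grade(score, previous_year):
    """Suggest a grade from the previous year's composite scores."""
    m, sd = mean(previous_year), pstdev(previous_year)
    ranked = sorted(previous_year)
    # Assumption: the top 15% starts at this index of the sorted scores.
    honors_cut = ranked[(85 * len(ranked)) // 100]
    if score >= honors_cut:
        return "Honors"
    if score > m:
        return "High Pass"
    if score >= m - 2 * sd:
        return "Pass"
    return "Fail"
```

For example, a student with a T-score of 50 on every component receives a composite of 50, which falls in the Pass band whenever the previous year's mean composite is 50 or higher and its SD is nonzero.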
Each student’s Surgery Shelf T-score was theoretically replaced with their USMLE Step 1 T-score in the final grade computation to determine a “modified grade” using the same thresholds for Honors, High Pass, and Pass. The frequency of each grade type was computed. Mean scores and standard deviations for USMLE Step 1 and the Surgery Shelf were also computed. Linear regression was employed to determine how well raw USMLE Step 1 scores predicted raw Surgery Shelf scores. A squared correlation coefficient (R²) between raw USMLE Step 1 scores and raw Surgery Shelf scores was computed to determine the effect size of the regression equation. We also used linear regression to determine how well raw Surgery Shelf scores predicted USMLE Step 2 CK scores.
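The two analyses in this section, grade matching and simple linear regression, can be sketched as follows. The helper names and the five-student data set are hypothetical and chosen only to make the arithmetic visible. The statistical software used in the study is not stated, so this is a plain-Python illustration of ordinary least squares rather than the authors' actual procedure.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x.

    Returns (intercept a, slope b, squared correlation r2).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

def percent_match(original, modified):
    """Percentage of students whose original and modified grades agree."""
    same = sum(o == m for o, m in zip(original, modified))
    return 100 * same / len(original)

# Hypothetical scores for five students (not study data).
step1 = [200, 210, 220, 230, 240]
shelf = [66, 70, 72, 77, 80]
a, b, r2 = fit_line(step1, shelf)  # slope b is 0.35 for these made-up numbers

# Hypothetical grades: three of four students keep the same grade.
match = percent_match(["H", "HP", "P", "P"], ["H", "P", "P", "P"])  # 75.0
```

In a simple regression with one predictor, the squared correlation coefficient equals the proportion of variance in the outcome explained by the fitted line, which is why it serves as the effect size here.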
Results:
Three hundred and ninety-five medical students completed the Surgery Clerkship from 2012 to 2014. Descriptive demographic information for these students is provided in Table 1. Figure 1 illustrates the median Surgery Shelf scores by rotation over the two academic years studied. During the 2012-2013 academic year, median scores improved with each rotation; however, this did not occur in the 2013-2014 academic year.
Figure 2 provides a comparison of original and modified final Surgery clerkship grades. There were no failures among either the original or the modified grades. Original and modified grades matched for 77% (304) of the students. Of the 91 unmatched cases, 46% (42) were modified grades that were lower than original grades (e.g. HP instead of H) and 54% (49) were modified grades that were higher than original grades (e.g. H instead of HP). Ninety-five percent (86) of the grade changes moved up or down by one grade (e.g. H to HP), with only 5% (5) moving up or down by two grades (e.g. H to P).
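As a quick arithmetic check, the match rate and the direction of the grade changes can be recomputed from the reported counts. The variable and function names are illustrative; the counts (395 students, 304 matches, 42 lower and 49 higher modified grades) are taken from the text.

```python
TOTAL_STUDENTS = 395
MATCHED = 304
LOWER, HIGHER = 42, 49  # modified grade lower / higher than original

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

unmatched = TOTAL_STUDENTS - MATCHED       # 91
match_rate = pct(MATCHED, TOTAL_STUDENTS)  # 77
lower_share = pct(LOWER, unmatched)        # 46
higher_share = pct(HIGHER, unmatched)      # 54
```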
Students averaged 227 (SD = 18.67) on USMLE Step 1 and a scaled score of 75 (SD = 9) on the Surgery Shelf. Linear regression analysis showed that USMLE Step 1 scores significantly predicted Surgery Shelf scores, R² = 0.42, P < 0.001. The R² value indicated that USMLE Step 1 scores accounted for 42% of the variance in Shelf scores. Figure 3 plots USMLE Step 1 scores against Shelf scores with the regression line. The regression equation (Y = 62.22 + 0.30x) was interpreted with 185 as a baseline score on Step 1, so x is the Step 1 score minus 185. Thus, a Step 1 score of 185 would predict a Shelf score of 62, and for every 1-point increase in Step 1 a Shelf score would increase by 0.30 points.
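The regression equation can be applied as a small worked example. The function and constant names below are illustrative; the intercept, slope, and baseline of 185 come from the text.

```python
BASELINE_STEP1 = 185  # Step 1 score treated as x = 0
INTERCEPT = 62.22     # predicted Shelf score at the baseline
SLOPE = 0.30          # Shelf points gained per Step 1 point

def predicted_shelf(step1_score):
    """Predicted Surgery Shelf score from Y = 62.22 + 0.30x, x = Step 1 - 185."""
    return INTERCEPT + SLOPE * (step1_score - BASELINE_STEP1)

at_baseline = predicted_shelf(185)  # about 62.22
at_mean = predicted_shelf(227)      # about 74.82
```

At the cohort's mean Step 1 score of 227 the equation gives about 74.8, close to the reported mean Shelf score of 75. That is the behavior expected of a least-squares line, which passes through the means of both variables.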
Step 2 CK scores were unavailable for 5 students. Linear regression analysis showed that Shelf scores significantly predicted Step 2 CK scores, R² = 0.44, P < 0.001. The regression equation (Y = 206.69 + 1.32x) was interpreted with 50 as a baseline score on the Shelf. Thus, a Shelf score of 50 would predict a Step 2 CK score of 207, and for every 1-point increase in the Shelf score a Step 2 CK score would increase by 1.32 points.
Discussion:
The current study demonstrated that performance on the Surgery Shelf examination could be predicted from USMLE Step 1 performance and that Shelf scores shared a large percentage of variance with USMLE Step 1 scores. Additionally, theoretically replacing Surgery Shelf scores with USMLE Step 1 scores did not have an effect on a majority of final Surgery clerkship grades. These results reinforce concerns expressed by others regarding the use of the Surgery Shelf as a measure of knowledge acquired during the Surgery clerkship.
Surgery clerkship grades are commonly determined by a constellation of faculty and/or resident evaluations of a student’s clinical performance, the Surgery Shelf, and in some settings, oral examinations, objective structured clinical examinations, encounter notes, or other assignments. Numerous studies have illustrated poor concordance between clinical evaluations and performance on the Surgery Shelf, leading to the conclusion that clinical evaluations do not effectively measure medical knowledge [3,9-11]. However, this conclusion rests on the assumption that the Surgery Shelf does, in fact, measure knowledge acquisition during the clerkship.
In the recent text from the Alliance for Clinical Education (ACE), a section is devoted to the utility of the NBME subject examinations in assessing knowledge acquisition [2]. As the authors describe, Shelf exams “reflect cumulative knowledge, including knowledge acquired from basic science and prior clinical experiences” (p. 187). This statement is supported by substantial literature, which reflects the impact of clerkship timing on performance on the subject exams. Specific to the Surgery Shelf, several authors have made the same observations [12-14]. For example, Gerhardt and colleagues compared surgical clerkship students on “slow” and “busy” clinical services and found that characteristics of clinical rotations were less influential on Surgery Shelf scores than time of year [14].
The NBME also acknowledges the notion that Shelf examinations may reflect cumulative knowledge. While it highlights that these examinations are intended to reflect learning in the context of the respective course or clerkship, it goes on to share that students’ scores also relate to their overall progression throughout the course of medical school [15].
The acknowledgements from ACE and the NBME, together with our own observations, are somewhat problematic for both clerkship directors and medical students. The NBME establishes suggested thresholds for “Honors” and passing using common standard-setting methods from a panel of experts. However, the use of these guidelines and the weight afforded to the Surgery Shelf are at the discretion of the local clerkship and/or School of Medicine. At our institution the Surgery Shelf counts for 30% of the final grade, but the weighting is variable across the country, with some institutions reportedly using the Shelf for up to 70% of the final grade [1]. At this time there is no consistency in Surgery Clerkship grading systems across medical schools [16].
For medical students, there are additional concerns. Outside of professionalism, performance on USMLE Step 1 and “Honors” in the Surgery clerkship are considered the two most important factors in selecting applicants to interview for surgical residency [17]. While these are considered in the larger view of the applicant, one may suppose that a student can offset poor performance on one with an outstanding performance on the other. However, if “Honors” on the Surgery clerkship is heavily influenced by the Surgery Shelf exam, and performance on this exam is influenced by previous performance, such as USMLE Step 1 performance, this raises concern for inherent “double jeopardy.”
Though our data do raise concern over the use of Surgery Shelf scores in determining surgical knowledge obtained during the clerkship, one must also consider the consequences of removing the Surgery Shelf as a component of the final grade. Rockney and colleagues assessed the impact of dropping the Pediatric Shelf examination from the clerkship at a single institution and found that students performed worse on USMLE Step 2 CK as a result [18]. Similarly, the Shelf has also been used effectively to identify students who are at risk for failure of USMLE Step 2 [19]. The results of the current study suggest that performance on the USMLE Step 2 CK could be predicted from Surgery Shelf scores and that USMLE Step 2 CK scores shared a large percentage of variance with Shelf scores. Therefore, the Surgery Shelf can be viewed, at a minimum, as a method of preparing for USMLE Step 2 CK in a low(er) stakes setting, depending on how much weight it carries in final grade computations of the Surgery clerkship.
In addition to concerns expressed over potential for worse outcomes on USMLE Step 2 CK, one must consider the alternative methods of assessment available. Measures such as global rating forms for clinical performance are fraught with tendencies for evaluators to confuse attitudes such as motivation with fund of knowledge, skills in the operating room, or potential as a future surgeon [3]. Newer assessment tools such as virtual patient cases may be worth considering in the future [20]. However, regardless of the assessment method chosen, there are always trade-offs, which is why multiple assessment tools and multiple observations of performance are needed in clerkships to determine final grades. As the NBME itself states, “the results of the subject exams should not be viewed as the beginning and end of evaluation” [15].
There are several limitations to this study. First, we used data from a single institution in which the Surgery Shelf counts for almost a third of the grade. The impact of replacing Surgery Shelf scores with USMLE Step 1 scores on final grades would presumably vary depending on how much weight the Shelf contributes to the grade across institutions. Further investigation with multiple medical schools is needed to determine if “double jeopardy” is a national issue or just a local issue. Second, while approximately three quarters of modified grades “matched” the original grades, we did not find a match for one quarter of final grades. Our study was not designed to evaluate data such as career interest, study habits, or demographic variables, which may have led to better or worse performance on the Surgery Shelf exam than expected based on USMLE Step 1 scores.
Conclusions
Performance on the Surgery Shelf examination can be predicted using USMLE Step 1 scores. This result questions the use of the Surgery Shelf as a specific examination for knowledge acquired during the Surgery Clerkship and may have significant implications for Surgery clerkship directors.
References:
1. National Board of Medical Examiners. Characteristics of clinical clerkships. http://www.nbme.org/PDF/SubjectExams/Clerkship Survey Summary.pdf. Accessed January 1 2016.
2. Sisson T, Grum C. Clerkship examinations. In: Pangaro LN and McGaghie WC, eds. Alliance for Clinical Education: Handbook on Medical Student Evaluation and Assessment. North Syracuse: Gegensatz Press, 2015: 177-190.
3. Awad SS, Liscum KR, Aoki N et al. Does the subjective evaluation of Medical Student Surgical Knowledge Correlate with Written and Oral Exam Performance? J Surg Research 2002;104:36-39.
4. Kozar RA, Kao LS, Miller CC et al. Preclinical Predictors of Surgery NBME Exam Performance. J Surg Research 2007;140:204-7.
5. Zahn CM, Saguil A, Artino AR Jr, et al. Correlation of National Board of Medical Examiners scores with United States Medical Licensing Examination Step 1 and Step 2 scores. Acad Med. 2012;87:1348–1354.
6. Ogunyemi D, De Taylor-Harris S. NBME obstetrics and gynecology clerkship final examination scores: Predictive value of standardized tests and demographic factors. J Reprod Med. 2004;49:978-998.
7. Myles T, Galvez-Myles R. USMLE Step 1 and 2 scores correlate with family medicine clinical and examination scores. Fam Med. 2003;35:510–513.
8. Hermanson B, Firpo M, Cochran A et al. Does the National Board of Medical Examiners’ Surgery Subtest level the playing field? Am J Surg. 2004;188:520-1.
9. Goldstein SD, Lindeman B, Colbert-Getz J et al. Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores. Am J Surg. 2014; 207: 231-5.
10. Lawrence PF, Nelson EW, Cockayne TW. Assessment of medical student fund of knowledge in surgery. Surgery. 1985; 97: 745-9.
11. Farrell TM, Kohn GP, Owen SM, et al. Low correlation between subjective and objective measures of knowledge on surgery clerkships. J Am Coll Surg. 2010; 210: 680-5.
12. Ripkey DR, Case SM, Swanson DB. Predicting performances on the NBME surgery subject test and USMLE Step 2: The effects of surgery clerkship timing and length. Acad Med. 1997; 72: S31-33.
13. Baciewicz FA Jr, Arent L, Weaver M, et al. Influence of clerkship structure and timing on individual student performance. Am J Surg. 1990; 159: 265-8.
14. Gerhardt JD, Filipi CJ, Watson P, et al. Are long hours and hard work detrimental to end-clerkship examination scores? Am J Surg. 1999; 177: 132-135.
15. National Board of Medical Examiners. Subject Exams. http://www.nbme.org/schools/Subject-Exams/index.html. Accessed January 1 2016.
16. Ravelli C, Wolfson P. What is the “Ideal” Grading System for the Junior Surgery Clerkship? Am J Surg 1999;177:140-44.
17. National Resident Matching Program. http://www.nrmp.org/wpcontent/uploads/2014/09/PD-Survey-Report-2014.pdf. Accessed January 1 2016.
18. Rockney RM, Allister RG. Dropping the Shelf examination: does it affect student performance on the United States Medical Licensure Examination Step 2? Ambul Pediatr 2005;5:240-43.
19. Ripkey DR, Case SM, Swanson DB. Identifying students at risk for poor performance on USMLE Step 2. Acad Med 1999;74:45-48.
20. Yang RL, Hashimoto DA, Predina JD et al. The Virtual-Patient Pilot: Testing a New Tool for Undergraduate Surgical Education and Assessment. J Surg Education; 70:394-400.
Table 1: Demographic Information for 395 Surgery Clerkship Students
Characteristic                   Value
# Females                        181 (46%)
# Males                          214 (54%)
Age Range                        25 - 42 years
Average Age                      28 years
# Caucasian                      194 (49%)
# Asian                          100 (25%)
# Underrepresented Minorities    24 (6%)
Figure 1: Surgery Shelf Examination Scores by Rotation Number
See separate file
Figure 2: Percentage of students receiving modified grades of “Honors,” “High Pass,” and “Pass” compared to their original grades in the Surgery Clerkship
See separate file
Figure 3: Step 1 Scores Minus 185 by Surgery Shelf Examination Scores for 395 Students
[Figure 1 plot: Shelf Exam Raw Score (y-axis, 66 to 82) by Rotation Number (x-axis, 1 to 6); legend: 2012-2013, 2013-2014.]
[Figure plot: Modified Grade (n receiving) (y-axis, 0 to 200) by Original Grade (x-axis: Honors, High Pass, Pass); legend: Honors, High Pass, Pass. Bar labels in extraction order, with assignment to bars not recoverable: 128, 23, 3, 17, 5, 34, 20, 22, 167.]
[Figure plot: percentages of Honors, High Pass, and Pass by Original Grades (x-axis: Honors, High Pass, Pass). Segment labels in extraction order: Pass 3%, High Pass 12%, Pass 26%, High Pass 11%, Honors 30%, High Pass 44%, Pass 87%, Honors 85%, Honors 2%.]