The Journal of Emergency Medicine, Vol. -, No. -, pp. 1–6, 2016
© 2016 Elsevier Inc. All rights reserved. 0736-4679/$ - see front matter
http://dx.doi.org/10.1016/j.jemermed.2016.09.018
Education

FACULTY EVALUATIONS CORRELATE POORLY WITH MEDICAL STUDENT EXAMINATION PERFORMANCE IN A FOURTH-YEAR EMERGENCY MEDICINE CLERKSHIP

Nicole M. Dubosh, MD,*† Jonathan Fisher, MD, MPH,‡ Jason Lewis, MD,*† and Edward A. Ullman, MD*†

*Department of Emergency Medicine, Harvard Medical School, Boston, Massachusetts, †Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, and ‡Department of Emergency Medicine, Maricopa Medical Center, Phoenix, Arizona

Corresponding Address: Nicole M. Dubosh, MD, Department of Emergency Medicine, Beth Israel Deaconess Medical Center, One Deaconess Road, W-CC2, Boston, MA 02215

Reprints are not available from the authors.

RECEIVED: 6 September 2016; ACCEPTED: 12 September 2016
Abstract—Background: Clerkship directors routinely evaluate medical students using multiple modalities, including faculty assessment of clinical performance and written examinations. Both forms of evaluation often play a prominent role in the final clerkship grade. The degree to which these modalities correlate in an emergency medicine (EM) clerkship is unclear. Objective: We sought to correlate faculty clinical evaluations with medical student performance on a written, standardized EM examination of medical knowledge. Methods: This is a retrospective study of fourth-year medical students in a 4-week EM elective at one academic medical center. EM faculty performed end-of-shift evaluations of students via a blinded online system using a 5-point Likert scale for 8 domains: data acquisition, data interpretation, medical knowledge base, professionalism, patient care and communication, initiative/reliability/dependability, procedural skills, and overall evaluation. All students completed the National EM M4 Examination. Means, medians, and standard deviations for end-of-shift evaluation scores were calculated, and correlations with examination scores were assessed using Spearman's rank correlation coefficient. Results: Thirty-nine medical students with 224 discrete faculty evaluations were included. The median number of evaluations completed per student was 6. The mean score (±SD) on the examination was 78.6% ± 6.1%. The examination score correlated poorly with faculty evaluations across all 8 domains (r = 0.074–0.316). Conclusion: Faculty evaluations of medical students across multiple domains of competency correlate poorly with written examination performance during an EM clerkship. Educators need to consider the limitations of examination scores in assessing students' ability to provide quality clinical care. © 2016 Elsevier Inc. All rights reserved.

Keywords—evaluation; medical student clerkships; written examinations
INTRODUCTION

Emergency medicine (EM) is gaining prominence in medical school curricula and is now a required clerkship at 52% of United States (US) allopathic medical schools (1). Its increased importance within undergraduate medical education warrants examination of current student evaluation and grading systems, given the strong implications for determination of competency and residency placement. Organizations including the Task Force on the National Fourth Year Medical Student Emergency Medicine Curriculum and the Accreditation Council for Graduate Medical Education Outcomes Project recommend multisource evaluation methods to improve the reliability and scope of
assessment of students in clerkships (2–4). The unique work structure of EM means that students are often assigned to a particular faculty member for an individual shift. One common form of evaluation is the end-of-shift evaluation, or so-called "shift card." A recent national survey of clerkship directors found that end-of-shift faculty evaluations and written examinations are the 2 most common means of assessment in the EM clerkship (1). This is consistent across medical schools, as subjective faculty ratings and written examinations are common tools of assessment in multiple clinical settings (5). Developed by the Clerkship Directors in Emergency Medicine (CDEM) Testing Committee, the National EM M4 Examination is a 50-question multiple-choice examination derived from the national fourth-year curriculum in EM (2,3,6). It currently exists in 2 versions (V1 and V2) and was validated for objective content by the CDEM Testing Committee. Student performance on both versions was found to correlate with performance on the National Board of Medical Examiners (NBME) Advanced Clinical Examination in EM, the only other published, standardized medical student EM examination currently in use (7). It is unclear how test performance compares with faculty subjective assessment of student performance. The objective of this study was to correlate faculty clinical evaluation scores across 8 domains, including medical knowledge, with medical student performance on the National EM M4 Examination.

MATERIALS AND METHODS

Study Design

This was a retrospective study at an academic, urban, level 1 trauma center with an annual emergency department (ED) patient volume of 56,000 and home to a 3-year EM residency program.
Study Setting and Population

The study population consisted of fourth-year US medical students enrolled in a 4-week EM elective clerkship at our institution. All US medical students rotating in our ED from July 2013 through October 2013 were included. The clerkship requirements include completion of 14 clinical shifts and 1 nursing shift, attendance at weekly dedicated student lectures at the medical school, attendance at weekly resident didactics, four 3-h simulation and procedure sessions, and a brief end-of-rotation scholarly presentation on the student's topic of choice. Students complete their clinical shifts at the tertiary medical center and a community-affiliated ED, where they are supervised by academic faculty in EM and senior EM residents. All students use the online material at CDEMCurriculum.org as their required reading.

Assessment and Evaluation

Students are evaluated based on end-of-shift faculty evaluations and performance on V2 of the National EM M4 Examination. Faculty complete end-of-shift evaluations electronically using a 5-point Likert scale, with "1" being unsatisfactory and "5" being outstanding, across the following 8 domains: data acquisition, data interpretation, medical knowledge base, professionalism, patient care and communication, initiative/reliability/dependability, procedural skills, and overall evaluation. Faculty members are asked to complete an evaluation after each shift during which they worked with a medical student. All students complete the National EM M4 Examination during the final week of their rotation. The test is administered electronically in a proctored setting by the clerkship coordinator. Students were assigned a number for deidentification purposes. Only the study investigators had access to the identification key, and all evaluation and test score data were kept confidential. The study was reviewed by the institutional review board at our institution and determined to be exempt from further review.

Data Analysis

Data regarding student demographic information, faculty evaluation scores, and National EM M4 Examination scores were collected and analyzed using Excel (Microsoft, Redmond, WA) and SPSS software (version 21.0; SPSS, Inc, Armonk, NY). Means, medians, and standard deviations were calculated, and correlations between test scores and faculty evaluation scores across the 8 domains were assessed using Spearman's rank correlation coefficient.
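For readers who want the correlation step spelled out, the following is a minimal illustrative sketch of how per-domain Spearman correlations of the kind reported in Table 2 could be computed. It is not the authors' SPSS analysis: the shift-card record layout, student identifiers, ratings, and examination scores are hypothetical, and averaging each student's ratings before correlating is only one plausible reading of the Methods.

```python
# Illustrative sketch only: hypothetical shift-card records and exam scores,
# not the study data or the authors' SPSS analysis. Requires scipy.
from collections import defaultdict
from statistics import mean
from scipy.stats import spearmanr

DOMAINS = [
    "data_acquisition", "data_interpretation", "medical_knowledge_base",
    "professionalism", "patient_care_communication",
    "initiative_reliability_dependability", "procedural_skills",
    "overall_evaluation",
]

def shift_card(student, ratings):
    """One end-of-shift evaluation: a 1-5 Likert rating for each of the 8 domains."""
    return {"student": student, **dict(zip(DOMAINS, ratings))}

# Hypothetical evaluations (the study collected 224 across 39 students).
evaluations = [
    shift_card("S01", [4, 4, 3, 5, 4, 5, 4, 4]),
    shift_card("S01", [3, 4, 4, 5, 4, 4, 3, 4]),
    shift_card("S02", [5, 5, 4, 5, 5, 5, 4, 5]),
    shift_card("S03", [3, 3, 3, 4, 4, 4, 3, 3]),
]

# Hypothetical National EM M4 Examination scores (percent correct) per student.
exam_scores = {"S01": 82.0, "S02": 76.0, "S03": 71.0}

for domain in DOMAINS:
    # Average each student's ratings for this domain across their evaluations,
    # then correlate the averaged rating with the examination score.
    by_student = defaultdict(list)
    for ev in evaluations:
        by_student[ev["student"]].append(ev[domain])
    students = sorted(set(by_student) & set(exam_scores))
    x = [mean(by_student[s]) for s in students]
    y = [exam_scores[s] for s in students]
    rho, p = spearmanr(x, y)  # Spearman's rho and 2-tailed p-value
    print(f"{domain:38s} rho={rho: .3f}  p={p:.3f}")
```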
RESULTS

During the study period, 39 medical students completed the clerkship, and 33 faculty physicians completed 224 discrete evaluations. The subject characteristics are shown in Table 1.

Table 1. Characteristics of Study Subjects

Female, %                                       30.1
Visiting from other US medical schools, %       87.7
No. of home medical schools represented         26
Students who matched in emergency medicine, %   87.7

US = United States.
[Figure 1 near here. Chart title: Distribution of Evaluations; y-axis: percentage of evaluations (0–45%); series: Likert ratings 1 through 5.]
Figure 1. Distribution of faculty evaluation scores.
The median number of evaluations completed per student was 6. The distribution of evaluation scores is shown in Figure 1. The mean (±SD) score on the National EM M4 Examination was 78.6% ± 6.1%. Written examination scores correlated poorly with faculty evaluations across each of the 8 domains (Table 2).

DISCUSSION

Our study shows a poor correlation between faculty evaluations and examination performance of fourth-year medical students in an EM clerkship. In particular, there is a poor correlation between faculty assessment of medical knowledge and examination scores. To the best of our knowledge, this is the first study assessing this correlation among medical students in EM. Previous studies across
different medical specialties have shown mixed results with regard to this association. Faculty evaluations were found to correlate poorly with learner performance on standardized, specialty-specific objective tests of medical knowledge among medical students and residents in surgery, internal medicine, and radiology clerkships (8–10). Aldeen et al. found that EM faculty predictions of EM residents' scores on the EM In-Training Examination correlated only moderately with the residents' actual scores (11). Education faculty had a greater correlation compared with general EM faculty (11). Conversely, during a 9-week pediatric rotation, Dudas et al. found convergent validity among resident and faculty global assessments of students' medical knowledge and performance on the NBME examination in pediatrics (12). In a similar study of surgical clerkship students, faculty evaluations of students' medical knowledge correlated weakly with scores on the NBME Surgical Examination; however, the correlation was higher when faculty evaluated students over a 4-week period compared with 2 weeks (13).
Table 2. Correlation between Emergency Medicine Fourth-Year Examination Scores and Faculty Evaluations

Domain                                  Spearman rho   Significance (2-tailed)
Data acquisition                        .244*          .000
Data interpretation                     .301*          .000
Medical knowledge base                  .316*          .000
Professionalism                         .179*          .007
Patient care and communication          .158†          .018
Initiative/reliability/dependability    .213*          .001
Procedural skills                       .074           .271
Overall evaluation                      .271*          .000

EM = emergency medicine; M4 = 4th year medical student.
* Correlation is significant at the 0.01 level (2-tailed).
† Correlation is significant at the 0.05 level (2-tailed).
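As background on the significance values in Table 2, the 2-tailed p-value for a Spearman coefficient is commonly assessed with the large-sample t approximation sketched below; the paper does not state whether the paired observations here are the 39 students or the 224 individual evaluations, so no specific n is assumed.

\[
t = r_s \sqrt{\frac{n-2}{1-r_s^{2}}}
\]

where \(r_s\) is the Spearman coefficient and \(n\) is the number of paired observations; \(t\) is referred to a t distribution with \(n-2\) degrees of freedom for a 2-tailed test.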
Our results likely have multiple explanations. Because of the shift structure in EM, students in our clerkship typically work shifts with a variable number of faculty members, limiting the total time with 1 attending. This is in contrast to the majority of other specialty clerkships, which typically have fewer faculty physicians but increased total time with individual students. As shown by Farrell et al., increasing the time faculty members spend with an individual student may result in more accurate evaluations of medical knowledge (13). Changing the clerkship structure to one in which each student works multiple shifts with fewer faculty members may be 1 way to gather more meaningful information on student clinical performance. In addition, the faculty evaluation scores for each student in our study are averaged across all faculty members, including those who have educational roles in the program and those who do not. As suggested by Aldeen et al., faculty members with leadership positions within education may be able to provide more accurate subjective assessment of learners' medical knowledge base (11). Assigning students to work with faculty who are more invested in education might lead to more consistent assessment; however, this strategy may not be practical given the variability of attending physician scheduling. Providing faculty development in the area of assessment may be more effective, given the relatively large volume of students who rotate through EM clerkships. In addition, our findings highlight the need for educators to use caution in relying too heavily on standardized written examinations. Our results show that there is poor correlation between examination scores and faculty assessment of medical knowledge in the clinical setting. While written examinations are easy to administer, there is no evidence to show that those who attain higher scores will provide better patient care. Faculty evaluations of clinical performance have been used as the primary means of assessing medical students on clerkships for decades (14). While difficult to measure, they are perhaps the most direct means by which to determine how students will care for patients. With the increased focus on competency-based assessment in undergraduate and graduate medical education, it is important to reevaluate assessment tools to ensure that they are measuring actual performance (15–18).

Limitations

This study has several limitations. First, this is a single-center study with a relatively small sample size that
may limit extrapolation to other institutions. We believe, however, that the large number of faculty evaluators included in our study, and the fact that the medical students in our clerkship come from 26 different medical schools, adds heterogeneity to our population that enhances generalizability. Second, not all faculty completed evaluations of the students with whom they worked. While each student completed 14 clinical shifts during the rotation, the median number of evaluations each student received was 6. Faculty members often worked multiple shifts with 1 student but provided only a single evaluation. Third, there are intrinsic limitations to multiple-choice tests. While test writers attempt to create an appropriate medical student test, there can be inherent flaws in questions that result in examination failure despite an appropriate fund of knowledge (19,20). The National EM M4 Examination in particular contains questions written according to item-writing guidelines and has been shown to demonstrate content validity, adequate question discriminatory ability, and an appropriate level of difficulty (6). Test performance, however, does not necessarily translate to clinical competency and acumen at the bedside. Finally, many students complete multiple clerkships in EM and may have taken this examination previously.

CONCLUSION

Faculty evaluations of medical students across multiple competency domains correlate poorly with written examination performance during an EM clerkship. Educators need to understand the limitations of standardized examinations with regard to assessing the ability of students to provide competent clinical care. Additional studies should focus on understanding the differences and gaps between faculty assessment and written examinations.
REFERENCES

1. Khandelwal S, Way DP, Wald DA, et al. State of undergraduate education in emergency medicine: a national survey of clerkship directors. Acad Emerg Med 2014;21:92–5.
2. Manthey DE, Coates WC, Ander DS, et al. Report of the task force on national fourth year medical student emergency medicine curriculum guide. Ann Emerg Med 2006;47:E1–7.
3. Manthey DE, Ander DS, Gordon DC, et al. Emergency medicine clerkship curriculum: an update and revision. Acad Emerg Med 2010;17:638–43.
4. Swing SR. The ACGME outcome project: retrospective and prospective. Med Teach 2007;29:648–54.
5. Mavis BE, Cole BL, Hoppe RB. A survey of student assessment in U.S. medical schools: the balance of breadth versus fidelity. Teach Learn Med 2001;13:74–9.
6. Senecal EL, Heitz C, Beeson MS. Creation and implementation of a national emergency medicine fourth-year student examination. J Emerg Med 2013;45:924–34.
7. Hiller K, Miller ES, Lawson L, et al. Correlation of the NBME advanced clinical examination in EM and the EM M4 exams. West J Emerg Med 2015;16:138–42.
8. Goldstein SD, Lindeman B, Colbert-Getz J, et al. Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores. Am J Surg 2014;207:231–5.
9. Kolars JS, McDonald FS, Subhiyah RG, et al. Knowledge base evaluation of medicine residents on the gastroenterology service: implications for competency assessments by faculty. Clin Gastroenterol Hepatol 2003;1:64–8.
10. Wise S, Stagg PL, Szucs R, et al. Assessment of resident knowledge: subjective assessment versus performance on the ACR in-training examination. Acad Radiol 1999;6:66–71.
11. Aldeen AZ, Salzman DH, Gisondi MA, et al. Faculty prediction of in-training examination scores of emergency medicine residents. J Emerg Med 2014;46:390–5.
12. Dudas RA, Cobert JM, Goldstein S, et al. Validity of faculty and resident global assessment of medical students' clinical knowledge during their pediatrics clerkship. Acad Pediatr 2012;12:138–41.
13. Farrell TM, Kohn GP, Owen SM, et al. Low correlation between subjective and objective measures of knowledge on surgery clerkships. J Am Coll Surg 2010;210:680–3.
14. Kassenbaum DG, Eaglen RH. Shortcomings in the evaluation of students' clinical skills and behaviors in medical school. Acad Med 1999;74(7):842–9.
15. The Accreditation Council for Graduate Medical Education website. Emergency medicine core competencies. Available at: http://www.acgme.org/acgmeweb/Portals/0/PDFs/FAQ/110_emergency_medicine_FAQs_07012013.pdf. Accessed July 12, 2015.
16. Beeson MS, Carter WA, Christopher TA, et al. The development of the emergency medicine milestones. Acad Emerg Med 2013;20:724–9.
17. Association of American Medical Colleges Drafting Panel website. Core entrustable physician activities for entering residency (updated). Available at: https://www.mededportal.org/icollaborative/resource/887. Accessed July 12, 2015.
18. Santen SA, Peterson WJ, Khandelwal S, et al. Medical student milestones in emergency medicine. Acad Emerg Med 2014;21:905–11.
19. Downing SM. The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract 2005;10:133–43.
20. Senecal EL, Askew K, Gorney B, et al. Anatomy of a clerkship test. Acad Emerg Med 2010;17:S31–7.
ARTICLE SUMMARY

1. Why is this topic important?
Medical students in emergency medicine clerkships are evaluated by multiple modalities, including written examinations and faculty evaluations of clinical performance. Clerkship grades have strong implications for both determination of competence and residency selection.

2. What does this study attempt to show?
This study assesses the correlation between faculty clinical evaluations and medical student performance on the National EM M4 Examination in emergency medicine.

3. What are the key findings?
Faculty evaluations of medical students across multiple domains of competency correlate poorly with written examination performance in an emergency medicine clerkship.

4. How is patient care impacted?
By understanding the utility of different medical student evaluation modalities, educators are better able to determine medical students' competency in providing patient care.