Resident self-assessment of operative performance

The American Journal of Surgery 185 (2003) 521–524

Association for Surgical Education

Mylène Ward, M.D.,a Helen MacRae, M.D.,a,b,* Christopher Schlachta, M.D.,b Joseph Mamazza, M.D.,b Eric Poulin, M.D.,b Richard Reznick, M.D.,a,b Glenn Regehr, Ph.D.a

a Centre for Research in Education, University Health Network, and Department of Surgery, University of Toronto, Toronto, Ontario, Canada
b University of Toronto, Department of Surgery, Mount Sinai Hospital, 600 University Ave., Room 1514, Toronto, Ontario M5G 1X5, Canada

Manuscript received September 3, 2002; revised manuscript October 9, 2002
Presented at the 22nd Annual Meeting of the Association of Surgical Education, Baltimore, Maryland, April 4–6, 2002

Abstract

Background: In medicine, the development of expertise requires the recognition of one's capabilities and limitations. This study aimed to verify the accuracy of self-assessment for the performance of a surgical task, and to determine whether self-assessment may be improved through self-observation or exposure to relevant standards of performance.

Methods: Twenty-six senior surgical residents were videotaped performing a laparoscopic Nissen fundoplication in a pig. Experts rated the videos using two scoring systems. Subjects evaluated their performances after performance of the Nissen, after self-observation of their videotaped performance, and after review of four videotaped "benchmark" performances.

Results: Expert interrater reliability was 0.66 (intraclass correlation coefficient). The correlation between experts' evaluations and residents' self-evaluations was initially moderate (r = 0.50, P <0.01), increasing significantly after the residents reviewed their own videotaped performance to r = 0.63 (Δr = 0.13, P <0.01), yet did not change after review of the benchmarks.

Conclusions: Self-observation of videotaped performance improved the residents' ability to self-evaluate. © 2003 Excerpta Medica, Inc. All rights reserved.

Keywords: Self-assessment; Self-evaluation; Medical education; Technical skill

The competent physician is a lifelong learner, but effective learning depends upon the ability to recognize one's strengths and weaknesses [1]. Consistent with this principle, professional self-regulation in medicine is predicated upon the assumption that physicians are cognizant of their capabilities and limitations. Despite this theoretical argument for the critical importance of accurate self-assessment in medicine, many studies suggest that medical trainees may not be aware of what they do not know. A meta-analysis of 44 self-assessment studies in higher education reported a mean correlation between self- and expert assessments of 0.39 [2]. Subsequently, a review by Gordon [3] of 18 studies in the health professions yielded similar correlations. Since the publication of these two reviews, studies continue to corroborate this finding. The correlations between self-ratings generated by health professional trainees and expert ratings are generally low [4–10] or fail to be statistically different from zero [11–13].

In view of the importance of self-assessment to clinical practice, methods to improve learners' self-assessment abilities warrant investigation. Several authors have hypothesized that self-observation through videotape review would lead to more accurate self-assessments. Hays [14] reported that this technique brought family medicine residents' self-assessment scores more in line with expert evaluations of performance on a clinical interview. However, in a group of physical therapy students, videotape playback only served to confirm the students' impressions of their performance [15]. Martin and colleagues [16] hypothesized that the opportunity to benchmark one's performance by comparing it to other performances at varying levels of competence would improve self-assessment. They reported a statistically significant increase in the correlation between self- and expert evaluations after exposure to the videotaped benchmark performances, lending support to this hypothesis.

* Corresponding author. Tel.: +1-416-586-4800, ext. 2836; fax: +1-416-586-8644. E-mail address: [email protected]

0002-9610/03/$ – see front matter © 2003 Excerpta Medica, Inc. All rights reserved. doi:10.1016/S0002-9610(03)00069-2


For practicing surgeons, the safe adoption and incorporation of new techniques or procedures into practice necessitate accurate self-assessment. Studies that have considered self-assessment in the context of a surgical rotation have also reported low correlations between self- and expert evaluations [8,9,13,17]. It must be noted, however, that performance evaluation in these studies encompassed multiple domains of clinical competence beyond technical ability. The current study investigated the following questions: (1) What is the accuracy of self-assessment for the performance of a laparoscopic operation? (2) Do interventions, namely self-observation of videotaped performance and review of benchmark performances, lead to an improvement in self-assessment ability?

Methods

Subjects

Twenty-seven senior (postgraduate years 3 to 5) general surgery residents and one general surgery fellow at the University of Toronto participated in this study. The purpose of the study was explained to all residents, and informed consent was obtained prior to participation. The research protocol was approved by the Research Ethics Committee at the University of Toronto.

Task

Each resident completed a questionnaire to determine previous experience with simple and advanced laparoscopic operations. A 10-minute videotape of an expert performing the essential steps of a Nissen fundoplication on a live, anesthetized pig was shown to familiarize all residents with the procedure. Participants performed a laparoscopic Nissen fundoplication on live, anesthetized pigs. The Animal Care Committee (University Health Network, Toronto) granted ethical approval for the study. Fifty-kilogram pigs were anesthetized by a licensed veterinarian using a standard protocol. Participants were instructed in the positioning of the laparoscopic ports and established pneumoperitoneum. Participants then performed a standard fundoplication with a 360-degree wrap. No time limit was set. A second surgical resident or a medical student assisted during each operation. Each station was provided with an adequate supply of laparoscopic instruments. Expert laparoscopic surgeons were on hand to provide instruction when needed; however, experts were not permitted to provide feedback or criticism. The laparoscopic video signal was used to record the operations.

Performance evaluation

Two scoring systems were used for both expert and self-evaluation of operative performance: the global rating scale (GRS) and the operative component rating scale (OCRS). The global rating form, previously published and validated, captures general constructs of performance common to all surgical procedures [18,19]. The standard global rating form was modified slightly for this study by removing the item "knowledge of instruments." Two additional items were included at the end of the GRS, both rated on anchored 5-point scales: "overall performance" and "quality of final product." The OCRS assesses component steps of the operation and has demonstrated high interrater reliability (0.73 to 0.96) in the hands of expert surgeons [20]. Residents had the opportunity to review both rating forms prior to their performance of the Nissen fundoplication.

Benchmark videotapes

The study's design required videotapes of four individuals, each performing a Nissen fundoplication on a porcine model. Thirty videotapes of senior residents performing this procedure were available from a separate study. Four videotapes were selected by two masked expert laparoscopists to represent a range of ability on the task. The selected performances were placed in a random sequence in the final tape to be shown to the residents. The four benchmark videotapes were given to three expert raters for evaluation. They independently completed the GRS and OCRS for all four performances. High interrater reliability (with a range of 0.87 to 0.99) was taken as evidence that the four selected performances were suitable as a set of benchmarks for comparison (Table 1).

Table 1
Interrater reliabilities of examiner pairs for assessment of the benchmark videotapes

Expert rater pair    First set of ratings    Second set of ratings
1 and 2              0.99                    0.80
2 and 3              0.87                    0.48
1 and 3              0.88                    0.70

Expert assessment

Expert laparoscopists, masked as to the identity of the operator, evaluated the videotapes of the operations using the GRS and OCRS. Care was taken to ensure that expert raters were not matched to the videotapes of operative performances that they had observed in the animal laboratory.
Examiners were given permission to cue forward through the videotapes at their discretion, deciding for themselves which portions of the operation allowed them to rate the operative performance to their satisfaction. This technique, used in a previous study at the University of Toronto, substantially reduces the time needed for assessment while demonstrating good overall reliability. Interrater reliability of experts in using this technique for the evaluation of laparoscopic Nissen fundoplications previously ranged from 0.73 to 0.96 [20]. Three expert laparoscopists each rated eight or nine resident videotapes plus all four benchmark videotapes. The benchmark videotapes could not be distinguished from the other videotapes. Thus, one expert rating was obtained for each resident operative performance and three expert ratings were again obtained for the benchmark videotapes.
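The pairwise interrater reliabilities reported in Table 1 can be illustrated with a short sketch. This is our own illustration, not the authors' analysis code: the scores below are hypothetical, and we assume the pairwise reliabilities are correlations computed over each expert's overall scores for the four benchmark videotapes.

```python
# Sketch: pairwise interrater reliability as Pearson correlations between
# experts' overall scores for the four benchmark tapes (hypothetical data).
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical overall scores (mean of the 15 GRS + OCRS items) that each
# of the three experts assigned to the four benchmark videotapes.
ratings = {
    1: [2.1, 3.4, 4.0, 4.6],
    2: [2.3, 3.2, 4.2, 4.4],
    3: [1.9, 3.6, 3.8, 4.7],
}

for a, b in combinations(sorted(ratings), 2):
    print(f"raters {a} and {b}: r = {pearson_r(ratings[a], ratings[b]):.2f}")
```

Each pair's coefficient corresponds to one row of Table 1; with only four benchmark performances, such pairwise coefficients are quite sensitive to a single discrepant rating, consistent with the drop seen in the second set of ratings.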

Resident self- and video assessment

The residents evaluated their performances at three intervals: immediately after performance of the Nissen fundoplication, after self-observation of their videotaped performance, and after review of the four videotaped benchmark performances. Residents were given permission to cue forward at their own discretion. Each resident reviewed their own videotaped performance and all four benchmark performances at a single session.

Statistical analyses

An overall score was generated for each performance evaluation by calculating the average of all 15 items comprising both scoring systems. To calculate self-assessment accuracy, the experts' scores for each individual were correlated with the scores the residents assigned to themselves at each of the three self-assessment intervals.

Results

Gold standard

Initially, high interrater reliabilities between pairs of examiners for the four benchmark videotapes served to confirm our selection by showing strong agreement as to what constituted a poor or good performance. When the benchmark videotapes were viewed a second time, within the context of the other resident videotaped performances, interrater reliabilities between examiner pairs were somewhat lower (Table 1). The intraclass correlation coefficient (ICC) for the final set of benchmark evaluations, which provides a measure of overall expert interrater reliability, was 0.66.

Self-assessment accuracy

Two residents were excluded from the data analysis, one due to loss of the video signal during the procedure and the other due to failure to complete the videotape assessment session. Therefore, data from 26 subjects were available for analysis. The correlation between experts' evaluations and residents' initial self-evaluations was moderate (r = 0.50, P <0.01). The correlation between experts' assessments and self-assessments after self-observation of videotaped performance showed a statistically significant increase to 0.63 (Δr = 0.13, P <0.01), suggesting that the opportunity to view one's own performance improved self-assessment accuracy. However, the correlation between experts' assessments and residents' self-assessments after the residents had viewed the benchmark videotapes was not significantly different, at r = 0.66 (Δr = 0.03, not significant).

Comments

It is generally accepted that self-appraisal of one's strengths and weaknesses has important implications for safe surgical performance. This study aimed to verify the accuracy of self-assessment for the performance of a surgical task, and to determine whether self-assessment abilities may be improved through self-observation or exposure to relevant standards of performance. Our findings suggest that senior surgical residents are fairly accurate judges of their technical performance in a laparoscopic model. The initial measure of self-assessment accuracy (r = 0.50) was higher than expected compared with the literature, where correlations between self- and expert evaluations have generally been much lower [2,3]. In this study, self-observation of videotaped performance further improved the residents' ability to self-evaluate.

Several possible explanations may account for the better than expected ability to self-assess. The participants in this study were more senior than the subjects involved in many other self-assessment studies to date. Exposure to a large volume of operative cases may enable senior surgical residents to readily identify expert performance. Other authors have hypothesized that poor self-assessment abilities may reflect a lack of experience in the particular domain of interest [21,22]. Alternatively, these findings may reflect self-assessment abilities for the performance of a focused, technical task, which may lend itself to more objective self-evaluation. Another possibility is that observation of performance and continuous informal feedback may occur more often for technical skills than for other clinical skills. This feedback may lead to better self-evaluation.

In the current study, self-observation of videotaped performance improved the residents' ability to self-evaluate. Reviewing one's operative performance on videotape may aid the development of more accurate self-representations of performance by stimulating focused self-reflection once completion of the task no longer constitutes a distraction. This potential training tool is particularly well suited to the review of laparoscopic procedures, which may be videotaped with minimal effort.

On the other hand, unlike previous studies, we found that the opportunity to view benchmark performances of the same procedure did not improve self-assessment ability. This finding lends further credence to the speculation that the ability to self-assess among senior surgical residents is related to surgical experience. Review of four benchmark performances may exercise little influence relative to the wide exposure to surgical performance that occurs through residency training.

The measurement of self-assessment accuracy rests on the assumption that the designated external measure of performance represents the "true" measure of performance. In an effort to establish the quality of our "gold standard," we calculated the interrater reliability of the three expert laparoscopists for the four benchmark videotapes. The observed interrater reliability (ICC = 0.66) lies within the range of reliabilities that have been previously observed with the evaluation of operative tasks [18]. Therefore, expert assessment of videotaped performance constitutes a reasonable gold standard against which to compare self-assessment. However, it should be noted that any degree of expert unreliability functions as a theoretical upper limit on the correlation of the self-evaluations with the gold standard [23]. The ability to detect an improvement in self-assessment after review of the benchmark videotapes may have been compromised by this ceiling effect. After self-observation of videotaped performance, the correlation between self- and expert assessment was 0.63, which was similar to the degree of agreement between experts themselves (0.66).
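The overall expert reliability above is an intraclass correlation coefficient. The paper does not state which ICC form was used, so the following is a minimal sketch assuming the common two-way random-effects, absolute-agreement form, ICC(2,1), applied to hypothetical ratings of the four benchmark tapes:

```python
# Sketch of an ICC(2,1) computation (two-way random effects, absolute
# agreement, single rater). The ICC form and the scores below are
# assumptions for illustration; the study's raw data are not published.

def icc_2_1(scores):
    """scores[i][j] = overall score given by rater j to videotape i."""
    n, k = len(scores), len(scores[0])          # targets, raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_tot = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_tot - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                     # between-targets mean square
    msc = ss_cols / (k - 1)                     # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))          # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical overall scores from three raters for four benchmark tapes.
benchmark = [
    [2.1, 2.3, 1.9],
    [3.4, 3.2, 3.6],
    [4.0, 4.2, 3.8],
    [4.6, 4.4, 4.7],
]
print(f"ICC(2,1) = {icc_2_1(benchmark):.2f}")
```

Unlike the pairwise coefficients of Table 1, a single ICC summarizes agreement across all three raters at once, and the absolute-agreement form also penalizes systematic leniency or severity differences between raters.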

Conclusions Our research demonstrates that senior surgical residents are capable of discriminating strong versus weak operative performances and are better at self-assessment of technical skill than would be expected from the literature. Future investigation is required to determine whether the demonstrated self-assessment abilities are specific to the evaluation of technical performance or whether more senior trainees are better self-assessors in general.

Acknowledgments This work was supported by a grant from the Association for Surgical Education.

References

[1] Spencer JA, Jordan RK. Learner centred approaches in medical education. BMJ 1999;318:1280–8.
[2] Falchikov N, Boud D. Student self-assessment in higher education: a meta-analysis. Rev Educ Res 1989;59:395–430.
[3] Gordon MJ. A review of the validity and accuracy of self-assessments in health professions training. Acad Med 1991;66:762–9.
[4] Daniel SJ, Scruggs RR, Grady JJ. Accuracy of student self-evaluations of dental sealants. J Dent Hyg 1990;64:339–42.
[5] Das M, Mpofu D, Dunn E, Lanphear JH. Self and tutor evaluations in problem-based learning tutorials: is there a relationship? Med Educ 1998;32:411–8.
[6] Fitzgerald JT, Gruppen LD, White CB. The influence of task formats on the accuracy of medical students' self-assessments. Acad Med 2000;75:737–41.
[7] Gruppen LD, Garcia J, Grum CM, et al. Medical students' self-assessment accuracy in communication skills. Acad Med 1997;72(suppl 1):S57–9.
[8] Harrington JP, Murnaghan JJ, Regehr G. Applying a relative ranking model to the self-assessment of extended performances. Adv Health Sci Educ 1997;2:17–25.
[9] Herbert WNP, McGaghie WC, Droegemueller W, et al. Student evaluation in obstetrics and gynecology: self- versus departmental assessment. Obstet Gynecol 1990;76:458–61.
[10] Wooliscroft JO, TenHaken J, Smith J, Calhoun JG. Medical students' clinical self-assessments: comparisons with external measures of performance and the students' self-assessments of overall performance and effort. Acad Med 1993;68:285–94.
[11] Antonelli MAS. Accuracy of second-year medical students' self-assessment of clinical skills. Acad Med 1997;72(suppl 1):S63–5.
[12] Johnson D, Cujec B. Comparison of self, nurse, and physician assessment of residents rotating through an intensive care unit. Crit Care Med 1998;26:1811–6.
[13] Risucci DA, Tortolani AJ, Ward RJ. Ratings of surgical residents by self, supervisors and peers. Surg Gynecol Obstet 1989;169:519–26.
[14] Hays RB. Self-evaluation of videotaped consultations. Teach Learn Med 1990;2:232–6.
[15] Palmer PB, Henry JN, Rohe DA. Effect of videotape replay on the quality and accuracy of student self-evaluation. Phys Ther 1985;65:497–501.
[16] Martin D, Regehr G, Hodges B, McNaughton N. Using videotaped benchmarks to improve the self-assessment ability of family practice residents. Acad Med 1998;73:1201–6.
[17] Morton JB, MacBeth WA. Correlations between staff, peer and self assessments of fourth-year students in surgery. Med Educ 1977;11:167–70.
[18] Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–8.
[19] Reznick R, Regehr G, MacRae H, et al. Testing technical skill via an innovative "bench station" examination. Am J Surg 1996;172:226–30.
[20] Dath D, MacRae H, Birch D, et al. Towards reliable operative assessment: the reliability and feasibility of videotaped assessment of laparoscopic technical skills. Presented at the meeting of the Association for Surgical Education, Nashville, Tennessee, April 2001.
[21] Calhoun JG, Ten Haken JD, Wooliscroft JO. Medical students' development of self- and peer-assessment skills: a longitudinal study. Teach Learn Med 1990;2:25–9.
[22] Sclabassi SE, Woelfel SK. Development of self-assessment skills in medical students. Med Educ 1984;84:226–31.
[23] Ward M, Gruppen L, Regehr G. Measuring self-assessment: current state of the art. Adv Health Sci Educ 2000;7:63–80.
Using videotaped benchmarks to improve the self-assessment ability of family practice residents. Acad Med 1998;73:1201– 6. [17] Morton JB, MacBeth WA. Correlations between staff, peer and self assessments of fourth-year students in surgery. Med Educ 1977;11: 167–70. [18] Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273– 8. [19] Reznick R, Regehr G, MacRae H, et al. Testing technical skill via an innovative “bench station” examination. Am J Surg 1996;172:226 – 30. [20] Dath D, MacRae H, Birch D, et al. Towards reliable operative assessment: the reliability and feasibility of videotaped assessment of laparoscopic technical skills. Presented at the meeting of the Association for Surgical Education, Nashville, Tennessee, April 2001. [21] Calhoun JG, Ten Haken JD, Wooliscroft JO. Medical students’ development of self- and peer-assessment skills: a longitudinal study. Teach Learn Med 1990;2:25–9. [22] Sclabassi SE, Woelfel SK. Development of self-assessment skills in medical students. Med Educ 1984;84:226 –31. [23] Ward M, Gruppen L, Regehr G. Measuring self-assessment: current state of the art. Adv Health Sci Educ 2000;7:63– 80.