The American Journal of Surgery 182 (2001) 137–142
Association for Surgical Education
Laparoscopic skills training

Daniel J. Scott, M.D.,a William N. Young, B.S.,a Seifu T. Tesfay, R.N.,a William H. Frawley, Ph.D.,b Robert V. Rege, M.D.,a Daniel B. Jones, M.D.a,*

a Department of Surgery, Southwestern Center for Minimally Invasive Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75235-9092, USA
b Department of Biostatistics, University of Texas Southwestern Medical Center, Dallas, TX, USA

Manuscript received October 19, 2000; revised manuscript April 9, 2001
Abstract

Background: The purpose of this study was to quantify the learning curve of a previously validated laparoscopic skills curriculum.
Methods: Second-year medical students (MS2, n = 11) and second-year (PGY2, n = 11) and third-year (PGY3, n = 6) surgery residents were enrolled into a curriculum using five video-trainer tasks. All subjects underwent baseline testing, training (30 minutes per day for 10 days), and final testing. Scores were based on completion time. The relationship between task completion time and the number of practice repetitions was examined. Improvement (the difference in baseline and final performance) among groups was compared by one-way analysis of variance using the baseline score as a covariate; P <0.05 indicated significance.
Results: Baseline scores were not significantly different. Final scores were significantly better for MS2s versus PGY3s. Adjusted improvement was significantly larger for the MS2s compared with PGY2s and PGY3s, and for PGY2s compared with PGY3s. The mean number of repetitions corresponding to a predicted 90th percentile score was 32.
Conclusions: Inexperienced subjects benefit the most from skills training. For maximal benefit, we recommend that each task be practiced for at least 30 to 35 repetitions. © 2001 Excerpta Medica, Inc. All rights reserved.

Keywords: Laparoscopy; Learning curve; Skills training; Skills curriculum; Surgical education
Financial pressures limit the use of operating room time for the purpose of training surgery residents [1,2]. Training residents outside of the operating room may decrease patient complications [3] and enhance efficiency [4]. Efforts to train residents using a skills laboratory have begun to gain momentum for laparoscopic surgery [5–14]. However, curriculums may vary considerably, and no consensus exists as to what type of training is appropriate and how much training is necessary. Despite numerous studies showing the effectiveness of curriculums in improving performance in the laboratory setting, data validating curriculums in the context of clinical performance are lacking [7,9,15,16].

We previously showed that operative performance measurably improved after a 2-week period of training using five basic video-trainer tasks [17,18]. We wanted to further characterize this curriculum in order to better understand the effects of training on skill development. The purpose of this study was to determine how much practice was needed to reach a satisfactory level of performance, and to compare the relative benefit of training for subjects with different levels of experience.

* Corresponding author. Tel.: +1-214-648-9000; fax: +1-214-648-9448. E-mail address: [email protected]
Presented in part at the Annual Meeting of the Association for Surgical Education, Toronto, Ontario, Canada, May 4, 2000.
Methods

Second-year medical students (MS2, n = 11) and second-year (PGY2, n = 11) and third-year (PGY3, n = 6) surgery residents were enrolled into a standardized 4-week curriculum (consisting of 2 testing weeks and 2 training weeks) using a laparoscopic video-trainer (Fig. 1). MS2 subjects participated during July 1999 and were recruited as volunteers; surgery residents participated between August 1998 and June 1999 as part of the residency program. All subjects were questioned as to prior video-game, video-trainer, and laparoscopic experience. Age and sex were recorded.
Fig. 1. Multistation video-trainer and five laparoscopic tasks: Checkerboard, Bean Drop, Running String, Block Move, and Suture Foam.
Skills testing and training

The standardized curriculum was based on five laparoscopic drills (Fig. 1) suitable for novice surgeons [6,14,17,18]. The five tasks included Checkerboard, Bean Drop, Running String, Block Move, and Suture Foam, as previously described [17,18]. Trainees met under the direct supervision of an instructor (DJS) during all testing and practice sessions. Scores were based on completion times according to discrete starting and stopping points for each task. Trainees recorded their own scores throughout the study using digital stopwatches. Data sheets were collected, and scores were entered into a database immediately after each session.

During week 1, initial testing was performed during a single morning session lasting about 60 minutes. Immediately before testing, trainees watched an instructor (DJS) perform each task one time, and the rules were explained in detail, including the standardized starting and stopping points used for scoring. No practice was allowed prior to testing except for the Suture Foam drill, for which a single supervised (untimed) practice was allowed so that trainees could become familiar with the device. The initial test consisted of three repetitions of each of the five tasks. In order to test up to five individuals at one time, each task was set up at a single video-trainer station, and the trainees rotated among the stations until three successive repetitions of each task were performed. The choice of which task to start with was left up to the trainee, and the trainee rotated to any available station until testing was complete. A composite score was defined as the sum of the five individual task scores.

During weeks 2 and 3, subjects met in groups consisting of their own peers and practiced the five tasks for 30 minutes per day for 10 consecutive weekdays. The choice of which tasks to practice was left up to the trainee, but the trainees were encouraged to practice all five tasks during each session. Additional demonstration of tasks was rarely given and only upon specific request. All practice attempts were timed and the scores were recorded.

During week 4, all subjects met during a single morning session lasting about 45 minutes and were again tested. Practice was not allowed during posttesting, and repeat demonstrations of the tasks were not performed. Posttesting was conducted in a rotating manner among stations as described above for pretesting. Trainees performed each task three times and recorded their scores. Self-reported interval laparoscopic operative experience was also documented.

Statistical analysis

Improvement was determined for each subject by subtracting final from baseline scores. Since the amount of improvement varied with baseline performance, a linear covariance adjustment was used to compensate for differences in baseline scores, as described by Fleiss [19]. Group baseline and final scores, covariance-adjusted improvement, baseline and interval laparoscopic operative experience, number of repetitions, and participant age were subjected to one-way analysis of variance (ANOVA). Post hoc analysis was conducted using the Student-Newman-Keuls test. Proportions were subjected to chi-square analysis. Differences were considered significant for P <0.05. Values are reported as mean ± SD.
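For illustration only, the covariance-adjusted comparison described above can be sketched in a few lines of Python. This is our reconstruction under stated assumptions, not the authors' software: the toy scores, column names, and the use of statsmodels and scipy are all hypothetical.

    # Sketch of a Fleiss-style covariance adjustment of improvement scores.
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Hypothetical composite scores in seconds (baseline and final tests).
    df = pd.DataFrame({
        "group":    ["MS2"] * 4 + ["PGY2"] * 4 + ["PGY3"] * 4,
        "baseline": [430, 455, 410, 440, 400, 415, 390, 405, 395, 410, 390, 400],
        "final":    [165, 175, 160, 172, 200, 210, 195, 205, 230, 240, 225, 238],
    })
    df["improvement"] = df["baseline"] - df["final"]  # lower time = better

    # Within-group slope of improvement on baseline, with group as a factor.
    fit = smf.ols("improvement ~ C(group) + baseline", data=df).fit()
    b = fit.params["baseline"]

    # Adjust each subject as if their baseline equaled the grand mean baseline.
    grand_mean = df["baseline"].mean()
    df["adj_improvement"] = df["improvement"] - b * (df["baseline"] - grand_mean)

    # Compare adjusted improvement across groups by one-way ANOVA.
    samples = [g["adj_improvement"].values for _, g in df.groupby("group")]
    F, p = stats.f_oneway(*samples)
    print(f"slope b = {b:.2f}, F = {F:.2f}, p = {p:.4f}")

A Student-Newman-Keuls procedure (or another post hoc test) would then be applied to locate pairwise differences; that step is omitted here for brevity.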
The learning curve for each task was constructed as follows. Individual task scores from consecutive repetitions were plotted for each group. Plots were truncated when data were no longer available for three or more subjects from all groups. Spline curves were fit to the data (Figs. 2 to 6), and a best ultimate score was determined. A point along the curve corresponding to 90% of the best ultimate score was identified. This point was projected down to the x-axis to identify the corresponding number of repetitions required to reach the 90th percentile level of performance.
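The learning-curve estimation also lends itself to a short sketch. The code below is a hypothetical reconstruction, assuming the 90th percentile level means the point at which 90% of the total improvement (from the initial fitted score down to the plateau) has been achieved; the synthetic data and the smoothing parameter are illustrative, not taken from the study.

    # Sketch: fit a smoothing spline to completion times over repetitions,
    # estimate the plateau, and project the 90% point onto the x-axis.
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Hypothetical mean completion times (seconds) for one task.
    rng = np.random.default_rng(0)
    reps = np.arange(1, 41, dtype=float)
    times = 30 + 60 * np.exp(-reps / 8) + rng.normal(0, 2, reps.size)

    spline = UnivariateSpline(reps, times, s=reps.size * 4.0)
    fitted = spline(reps)

    best = fitted.min()                      # best ultimate (plateau) score
    target = fitted[0] - 0.90 * (fitted[0] - best)

    # First repetition at which the fitted curve reaches the 90% level.
    n90 = int(reps[np.argmax(fitted <= target)])
    print(f"plateau ~ {best:.1f} s; about {n90} repetitions to the 90% level")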
Fig. 2. Checkerboard learning curve: video-trainer task scores (completion time) versus number of consecutive repetitions for second-year medical student (MS II), second-year (R2), and third-year (R3) surgery resident subjects. The number of repetitions required to achieve a 90th percentile performance level is projected onto the x-axis for each group.

Fig. 3. Bean Drop learning curve with the 90th percentile number of repetitions projected onto the x-axis.

Fig. 4. Running String learning curve with the 90th percentile number of repetitions projected onto the x-axis.
Results

Eleven MS2, 11 PGY2, and 6 PGY3 subjects completed training. No subject had prior exposure to the video-trainer or to laparoscopic drills. Age, sex, and video-game and operative experience are listed in Table 1. MS2 subjects were significantly younger than PGY2 and PGY3 subjects. No significant difference between groups was demonstrated in the number of video-game players, defined as playing video games at least once per week, although MS2s had the highest reported number. Laparoscopic experience was defined in terms of the self-reported number of laparoscopic cases as surgeon. Baseline laparoscopic experience was considered the number of cases performed prior to enrollment in the study and was significantly different for all groups. Similarly, interval laparoscopic experience (the number of cases performed between the pretest and posttest intervals) was significantly different for all groups.

The number of repetitions for each task is listed in Table 2. There were no significant differences between groups for the number of repetitions for each task and for the total number of practice attempts, except for the PGY2 group, which performed significantly more Block Move repetitions than the PGY3 group.

MS2, PGY2, and PGY3 baseline composite scores were not significantly different (Table 3). MS2 final scores were significantly better than PGY3 final scores. Adjusted improvement was significantly larger for the MS2 group compared with both the PGY2 and PGY3 groups. Adjusted improvement was also significantly larger for the PGY2 group compared with the PGY3 group.

Learning curves from consecutive task repetitions are shown in Figures 2 through 6. For all groups and tasks combined, the mean number of repetitions required to achieve a 90th percentile score was 32.
Fig. 5. Block Move learning curve with the 90th percentile number of repetitions projected onto the x-axis.
Table 2
Number of repetitions

Task             MS2 (n = 11)   PGY2 (n = 11)   PGY3 (n = 6)
Checkerboard     29 ± 4         34 ± 7          34 ± 16
Bean Drop        34 ± 7         40 ± 11         32 ± 8
Running String   32 ± 5         32 ± 11         25 ± 9
Block Move       30 ± 6         37 ± 13*        23 ± 7
Suture Foam      49 ± 14        51 ± 11         42 ± 13
Total            178 ± 27       194 ± 34        156 ± 23

Values are mean ± SD. * P <0.05, PGY2 versus PGY3, one-way analysis of variance with Student-Newman-Keuls post hoc test. MS2 = second-year medical students; PGY2 = second-year surgery residents; PGY3 = third-year surgery residents.

Fig. 6. Suture Foam learning curve with the 90th percentile number of repetitions projected onto the x-axis.
Table 1
Age, sex, and video-game and operative experience

                                          MS2 (n = 11)   PGY2 (n = 11)   PGY3 (n = 6)
Age (years)                               25 ± 3*†       28 ± 2          29 ± 2
Female                                    2              3               0
Male                                      9              8               6
Video-game player                         8              4               1
Laparoscopic experience (no. of cases)
  Baseline                                0*†            9 ± 6‡          18 ± 10
  Interval                                0*†            6 ± 4‡          10 ± 5

Values are mean ± SD. One-way analysis of variance with Student-Newman-Keuls post hoc test: * P <0.05, MS2 versus PGY2; † P <0.05, MS2 versus PGY3; ‡ P <0.05, PGY2 versus PGY3. MS2 = second-year medical students; PGY2 = second-year surgery residents; PGY3 = third-year surgery residents.
Comments

This study documented the laparoscopic performance of three groups of subjects with different levels of experience over the course of a previously validated curriculum. Two fundamental questions about skills training were examined: who benefits the most from training, and how much training is enough?

To determine the benefit of training, improvement was defined as the difference between baseline and final performance. Improvement was adjusted using a covariance analysis to compensate for differences in baseline performance [19]. A subject who performed poorly at baseline had a greater opportunity to improve than did a subject who did well on initial assessment. The statistical technique of covariance analysis provided a means of compensating for individual differences in baseline test scores, so that groups with different baseline performance could be compared. The net effect of this analysis was to adjust each subject's improvement score to an equivalent score, as if each baseline test score had been equal to the overall baseline test average.
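One standard way to write such an adjustment, following the general approach of Fleiss [19] (the symbols below are ours for illustration and do not appear in the study), is

    D_i(adj) = D_i − b (x_i − x̄)

where D_i = baseline_i − final_i is the raw improvement for subject i, x_i is that subject's baseline score, x̄ is the overall mean baseline score, and b is the pooled within-group regression slope of improvement on baseline. A subject whose baseline equals the overall average is left unchanged, which is the sense in which each score is treated as if the baseline had been average.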
Table 3
Composite scores: time (seconds) for task completion

                        MS2 (n = 11)   PGY2 (n = 11)   PGY3 (n = 6)
Baseline                432 ± 69       403 ± 105       398 ± 33
Final                   169 ± 21†      203 ± 47        236 ± 46
Adjusted improvement§   249 ± 22*†     207 ± 33‡       173 ± 39

Values are mean ± SD. One-way analysis of variance with Student-Newman-Keuls post hoc test: * P <0.05, MS2 versus PGY2; † P <0.05, MS2 versus PGY3; ‡ P <0.05, PGY2 versus PGY3. § Adjusted improvement defined as baseline minus final score, calculated individually for each subject, adjusted by linear analysis of covariance for differences in baseline scores. MS2 = second-year medical students; PGY2 = second-year surgery residents; PGY3 = third-year surgery residents.
Our results suggest that the magnitude of adjusted improvement was inversely related to trainee experience level and independent of interval laparoscopic operative experience. The only statistically significant difference in the number of task repetitions was between the PGY2 and PGY3 groups for the Block Move. The total number of repetitions was not significantly different between groups. All groups received the same standardized training, but the medical students seemed to learn the most. It seems intuitive that the least experienced subjects would derive the most benefit from training, but until now no data supported this concept.

Several limitations in the design of this study must, however, be mentioned. First, although this study was prospective, the groups were not randomized. Second, the medical students were volunteers, whereas the residents were required to participate as part of the residency program. Voluntary participation may have enhanced the enthusiasm of the medical students. Additionally, residents, having already had operative experience, may have been disinterested in simulated tasks. Furthermore, residents were subjected to the stresses of long working hours and being on call. Responsibilities for patient care may have distracted residents from concentrating on skills training.

On the other hand, we have shown that video-trainer skills are transferable to the operating room [17,18], and we expected the residents, with more interval operative experience, to show greater improvement and to outperform the medical students at posttesting. As this was not the case, several explanations may be offered. The medical students initially realized large improvements in performance; positive reinforcement may have motivated the novice subjects to work harder on training than the experienced subjects. Medical students were also younger than the residents, and although statistical significance was not reached, a greater number of medical students tended to play video games than residents. The younger, technology-minded medical students may have adapted more easily to the simulated laparoscopic environment of the skills laboratory. Despite the limitations of this study, it seems reasonable to conclude that basic laparoscopic training should commence as early in residency as possible.

Other studies have documented the learning process associated with laparoscopic skills training. Rosser et al [5] showed steady improvement over 10 repetitions for three tasks performed by practicing surgeons. In a later study, Rosser et al [6] showed differences in the time required to perform 10 repetitions of five tasks for residents and practicing surgeons. Compared with practicing surgeons, residents had better performance on one task, equal performance on two tasks, and worse performance on two tasks. Derossis et al [8] showed improvement for PGY3 residents practicing seven tasks, but a plateau in performance was not demonstrated over the course of seven repetitions. Shapiro et al [11] showed improvement in four of five tasks for practicing surgeons who underwent 5 hours of training. Melvin et al [10] showed that performance on four tasks improved for PGY2 and PGY3 residents after 4 hours of training. All of these studies emphasized the need for skills curriculums and documented improvement in skill with training. However, no consensus exists on which tasks are suitable, how much training is needed, and who should be trained.

In our study, we graphically analyzed the large number of task repetitions performed by trainees at various levels in an attempt to quantify the learning process. Until now, our training efforts have been based solely on training duration. We generated the learning curves (Figs. 2 to 6) to examine skill acquisition in relation to the number of repetitions. Spline curves were used to show overall trends while minimizing fluctuations between adjacent data points. The slopes of the curves were generally steeper for the MS2 group than for the PGY2 and PGY3 groups, demonstrating more rapid improvement for the MS2 group. Fortunately, a plateau in scores occurred near the end of training, which indicated that ten 30-minute sessions was a reasonably efficient amount of training. Had fewer repetitions been performed, the subjects would not have reached their full potential; with additional repetitions, improvement diminished until a plateau occurred and no further improvement was achieved. To predict where the plateau was being approached, the 90th percentile score was chosen. This cutoff level was arbitrary but provided a reasonable estimate of the performance that could be expected for these tasks. We were thus able to predict the number of repetitions required to achieve a specified level of performance. For all groups and tasks combined, a mean of 32 repetitions was required to achieve a predicted 90th percentile score. In light of these findings, we currently recommend that subjects practice each task at least 30 to 35 times, rather than requiring a specified training duration. At the University of Texas Southwestern Medical Center, we are changing our video-trainer curriculum to incorporate this approach.

Caution regarding the interpretation of skills training data is warranted. Our data are encouraging for using the skills laboratory to train residents; using the laboratory to assess performance may be a more complex issue. Previously, we showed that skills testing correlates with operative performance [20], and both skills testing and intraoperative assessment may be useful evaluation modalities. However, as seen in the present study, isolated skills testing scores may not be indicative of operative ability. The medical student who outperforms the PGY3 resident on the video-trainer may have a superior score but is not technically capable of performing an operation and lacks surgical judgment. Technical skill is simply one component of competency, and skills testing needs further validation before being used for assessment purposes.

Laparoscopic skills training is an established part of our residency program. Data now support that the curriculum we use is effective and relevant to clinical practice. Training should commence as early as possible. Thirty to 35 repetitions over a 2-week period is recommended for novices and junior surgery residents to reach an adequate level of performance. Technological developments, including virtual reality and procedure-specific simulators [21], may further enhance surgical education and will require validation. Additionally, the durability of skills acquired in the skills laboratory is unknown and will need to be further investigated.
Acknowledgments

This study was supported by a grant from the Association for Surgical Education. Funding was also provided by the Southwestern Center for Minimally Invasive Surgery as supported in part by an educational grant from United States Surgical, a division of Tyco Healthcare Group. The video-trainer was provided by Karl Storz Endoscopy.
References

[1] Bridges M, Diamond D. The financial impact of teaching surgical residents in the operating room. Am J Surg 1999;177:28–32.
[2] Schwartz RW, Sloan DA, Griffen WO, et al. The necessity, practicality, and feasibility of modern educational and evaluative methods for residency training: economic and governing body perspectives, the 1994 Association for Academic Surgery panel on education. Curr Surg 1997;54:261–9.
[3] Martin M, Vashisht B, Frezza E, et al. Competency-based instruction in critical invasive skills improves both resident performance and patient safety. Surgery 1998;124:313–17.
[4] Reznick RK. Teaching and testing technical skills. Am J Surg 1993;165:358–61.
[5] Rosser JC, Rosser LE, Savalgi RS. Skill acquisition and assessment for laparoscopic surgery. Arch Surg 1997;132:200–4.
[6] Rosser JC, Rosser LE, Savalgi RS. Objective evaluation of a laparoscopic surgical skill program for residents and senior surgeons. Arch Surg 1998;133:657–61.
[7] Derossis AM, Fried GM, Abrahamowicz M, et al. Development of a model for training and evaluation of laparoscopic skills. Am J Surg 1998;175:482–7.
[8] Derossis AM, Bothwell J, Sigman HH, Fried GM. The effect of practice on performance in a laparoscopic simulator. Surg Endosc 1998;12:1117–20.
[9] Keyser EJ, Derossis AM, Antoniuk M, et al. A simplified simulator for the training and evaluation of laparoscopic skills. Surg Endosc 2000;14:149–53.
[10] Melvin WS, Johnson JA, Ellison EC. Laparoscopic skills enhancement. Am J Surg 1996;172:377–9.
[11] Shapiro SJ, Paz-Partlow M, Daykhovsky L, Gordon LA. The use of a modular skills center for the maintenance of laparoscopic skills. Surg Endosc 1996;10:816–19.
[12] Macmillan AIM, Cuschieri A. Assessment of innate ability and skills for endoscopic manipulations by the Advanced Dundee Endoscopic Psychomotor Tester: predictive and concurrent validity. Am J Surg 1999;177:274–7.
[13] Chung JY, Sackier JM. A method of objectively evaluating improvements in laparoscopic skills. Surg Endosc 1998;12:1111–16.
[14] Jones DB, Brewer JD, Soper NJ. The influence of three-dimensional video systems on laparoscopic task performance. Surg Laparosc Endosc 1996;6:191–7.
[15] Gagner M. Objective evaluation of a laparoscopic surgical skill program. Arch Surg 1998;133:911–12.
[16] Reznick RK. Virtual reality surgical simulators: feasible but valid? J Am Coll Surg 1999;189:127–8.
[17] Scott DJ, Bergen PC, Euhus DM, et al. Intense laparoscopic skills training improves operative performance of surgery residents. Am Coll Surg Surg Forum 1999;L:670–1.
[18] Scott DJ, Bergen PC, Rege RV, et al. Laparoscopic training on bench models: better and more cost effective than operating room experience? J Am Coll Surg 2000;191:272–83.
[19] Fleiss JL. Design and analysis of clinical experiments. New York: Wiley, 1986.
[20] Scott DJ, Valentine RJ, Bergen PC, et al. Evaluating surgical competency using ABSITE, skill testing, and intra-operative assessment. Surgery 2000;128:613–22.
[21] Scott DJ, Rege RV, Tesfay ST, Jones DB. Development of a laparoscopic TEP hernia repair simulator for training surgery residents [abstract]. Surg Endosc 2000;14:S217.