EDUCATION
Evaluation of Resident Laparoscopic Performance Using Global Operative Assessment of Laparoscopic Skills

Andrew A Gumbs, MD, Nancy J Hogle, MS, Dennis L Fowler, MD, FACS

BACKGROUND: The Global Operative Assessment of Laparoscopic Skills (GOALS), developed by Vassiliou and colleagues, has construct validity in the assessment of surgical residents' laparoscopic skills in dissection of the gallbladder from the liver bed. We hypothesized that GOALS would have construct validity for the entire laparoscopic cholecystectomy procedure and also for laparoscopic appendectomy.

METHODS: Using GOALS, attending surgeons evaluated PGY1 through PGY5 surgical resident performance during laparoscopic cholecystectomy (LC, n = 51) and laparoscopic appendectomy (LA, n = 43). Scores for five domains (depth perception, bimanual dexterity, efficiency, tissue handling, and autonomy) were recorded on a Web-based operative report generator at the conclusion of all cases. Domain scores were recorded using a 5-point Likert scale. Difficulty of the case was similarly rated on a 5-point scale. For analysis, residents were divided into two groups: novice (PGY1 to 3) and experienced (PGY4 to 5). Biostatistical analysis was performed using a two-sample t-test. A paired t-test was used to compare mean scores of residents who performed both LA and LC.

RESULTS: For both LC and LA, the experienced group scored higher than novices did in all five domains, and the differences were significant in all domains. Using the mean of the scores from all five domains, experienced residents scored significantly better than novices did for both procedures (LC 3.93 versus 2.76, p < 0.001; LA 4.22 versus 2.75, p < 0.001). No significant differences were noted in difficulty of the cases (p = 0.060 for LC and p = 0.19 for LA).

CONCLUSIONS: This study provides additional evidence in support of GOALS as an assessment tool for objectively measuring technical skills in laparoscopic surgery. (J Am Coll Surg 2007;204:308–313. © 2007 by the American College of Surgeons)
With the growing technical complexity of today's general surgical training and the limitation of the 80-hour workweek, there is an urgent need for a standard method to train and assess surgical residents adequately. Teaching minimally invasive techniques in the operating room has become increasingly difficult in the litigious and economically compromised environment of modern medicine.1-4 Multiple simulators and scoring systems have been devised to assess the technical skills of surgeons in training, but they have been fraught with problems of applicability in the operating room, cost, and time consumption.2,5-8 The Global Operative Assessment of Laparoscopic Skills (GOALS), developed at McGill University in Montreal by Vassiliou and colleagues,2 has construct validity in the assessment of surgical residents' intraoperative performance of dissection of the gallbladder from the liver bed. Construct validity is demonstrated when a test has the ability to discriminate between groups with assumed differences, such as novice and expert: surgeons with more experience or more training (experts) should perform better than surgeons with less experience or less training (novices).2,9 Additionally, Fried and colleagues10 showed that GOALS has concurrent validity by comparing GOALS scores with assessment of skills using a simulator. Concurrent validity is another method for demonstrating construct validity.
Competing Interests Declared: None.
Received May 6, 2006; Revised November 6, 2006; Accepted November 14, 2006.
From the Minimal Access Surgery Center, Columbia College of Physicians and Surgeons, New York, NY.
Correspondence address: Dennis Fowler, MD, Minimal Access Surgery Center, Columbia College of Physicians and Surgeons, 622 W 168th St, PH 12-126, New York, NY 10032. email:
[email protected]
Abbreviations and Acronyms
GOALS = Global Operative Assessment of Laparoscopic Skills
LA = laparoscopic appendectomy
LC = laparoscopic cholecystectomy
GOALS is based on the concept that performance can be evaluated in several categories called domains. It evaluates performance in five domains (depth perception, bimanual dexterity, efficiency, tissue handling, and autonomy), and each domain is scored with an integer rating from 1 to 5. A descriptive anchor is provided for scores of 1, 3, and 5 for each domain (Table 1). We used GOALS in the evaluation of novice and experienced surgical residents and hypothesized that GOALS would have construct validity for the evaluation of residents' performance of entire laparoscopic procedures, specifically laparoscopic cholecystectomy (LC) and laparoscopic appendectomy (LA).

METHODS

To provide documentation of the performance of each resident, the GOALS evaluation tool was installed in a Web-based operative report generator, and each attending surgeon was expected to complete a GOALS evaluation of the resident at the end of each laparoscopic procedure. GOALS evaluation is required by the departmental leadership as a teaching requirement, and review of these data was identified as a quality improvement initiative. At our institution, quality improvement initiatives do not require Institutional Review Board approval. For the purpose of this study, the first GOALS evaluation of the academic year for each resident for LC and LA was collected. The performance of the resident in each of the five domains during LC or LA was tabulated and analyzed. The residents were divided into two categories: novices (PGY1, PGY2, and PGY3) and experienced (PGY4 and PGY5). We included PGY3s in the novice group because we still consider them to be junior residents with minimal opportunity for autonomy compared with PGY4 and PGY5 residents at our institution. Any resident could be reevaluated for inclusion in the study if at least 6 months had passed since the previously included evaluation and if the resident was in a new PGY year. Scores of novices were compared with scores of experienced residents for each domain and also for the mean of all five domains for both LC and LA.
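As a minimal illustration of the scoring scheme just described, each evaluation can be reduced to a small record: five domain scores and a difficulty score, each an integer from 1 to 5, plus the resident's postgraduate year, from which an experience group and a composite (mean-of-five-domains) score follow. The Python sketch below restates that scheme; the names used (GoalsEvaluation, composite_score, experience_group) are hypothetical and do not correspond to the actual operative report generator.

```python
# Hypothetical sketch of the GOALS scoring scheme described above.
from dataclasses import dataclass
from statistics import mean

DOMAINS = ("depth_perception", "bimanual_dexterity", "efficiency",
           "tissue_handling", "autonomy")

@dataclass
class GoalsEvaluation:
    pgy_year: int            # postgraduate year of the resident (1 to 5)
    procedure: str           # "LC" or "LA"
    depth_perception: int    # each GOALS domain is scored 1 to 5 (Likert)
    bimanual_dexterity: int
    efficiency: int
    tissue_handling: int
    autonomy: int
    difficulty: int          # case difficulty, also rated 1 to 5

    def composite_score(self) -> float:
        """Mean of the five domain scores used in the analysis."""
        return mean(getattr(self, d) for d in DOMAINS)

    def experience_group(self) -> str:
        """Novice = PGY1-3, experienced = PGY4-5, as defined in Methods."""
        return "experienced" if self.pgy_year >= 4 else "novice"

# Hypothetical example: a PGY2 resident performing laparoscopic cholecystectomy.
ev = GoalsEvaluation(pgy_year=2, procedure="LC", depth_perception=3,
                     bimanual_dexterity=2, efficiency=3, tissue_handling=3,
                     autonomy=2, difficulty=2)
print(ev.experience_group(), round(ev.composite_score(), 2))  # novice 2.6
```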
Level of difficulty of each procedure was documented using a 5-point Likert scale. Descriptors/anchors were defined for different degrees of difficulty (Table 1). Biostatistical analysis was performed using a two-sample t-test. A paired t-test was used to compare mean scores of residents who performed both LA and LC.

RESULTS

A total of 13 attending surgeons experienced in LC and LA performed the assessments at 2 sites of 1 major academic center. The attendings and residents were the same in both hospitals. In total, for both LC and LA, only 4 residents of 94 (4%) were counted twice. Between July 2004 and April 2006, 30 novices and 21 experienced residents were evaluated for LC. The experienced group scored significantly higher for each of the five domains (Table 2). A two-sample t-test analysis of the mean of all 5 domains for LC revealed that the difference between novices and experienced residents was also significant (2.76 versus 3.93, p < 0.001). Difficulty of operation was evaluated in 35 of 51 patients (69%), and there was no significant difference between the difficulty of the cases for novices (range 1 to 4; mean 2.3) and the difficulty of cases for experienced residents (range 1 to 5; mean 3.0; p = 0.060).

Twenty-five novices and 18 experienced residents were evaluated for their performance during LA. The experienced group scored significantly higher for each of the five domains (Table 3). A significant difference was again noted when comparing the mean of all 5 domains for the novices with the mean of all 5 domains for the experienced residents (2.75 versus 4.22, p < 0.001). Difficulty of operation was evaluated in 30 of 43 patients (70%), and no significant difference was noted when comparing the difficulty between the 2 groups of residents (novices: range 1 to 5, mean 2.3; experienced: range 1 to 5, mean 2.9; p = 0.19).

Thirty-five of 43 residents who performed LA also performed LC in this study. Using a paired t-test, we compared the mean score of the residents in each group for LA with their mean score for LC. For novices, there was no difference between the scores for LA and LC for depth perception (p = 0.73), bimanual dexterity (p = 0.67), efficiency (p = 0.48), tissue handling (p = 0.57), and autonomy (p = 0.60). For the experienced group, there was no difference in scores between those for LA and LC for depth perception (p = 0.55), bimanual dexterity (p = 0.30), efficiency (p = 0.72), tissue handling (p = 0.62), and autonomy (p = 0.21) (Table 4).
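As a minimal sketch of the two comparisons described in Methods, the code below runs an independent two-sample t-test of composite scores between groups and a paired t-test of each resident's LA score against the same resident's LC score, using scipy. The scores shown are placeholders, not the study data, and scipy's default equal-variance assumption for the two-sample test may or may not match the original analysis.

```python
# Illustrative sketch of the statistical comparisons described in Methods.
from scipy import stats

# Composite (mean-of-five-domains) GOALS scores; values are placeholders.
novice_composites = [2.4, 2.8, 3.0, 2.6, 3.2, 2.2]
experienced_composites = [3.8, 4.2, 4.0, 3.6, 4.4]

# Two-sample t-test: independent groups (novice vs experienced residents).
t_ind, p_ind = stats.ttest_ind(novice_composites, experienced_composites)
print(f"two-sample t-test: t = {t_ind:.2f}, p = {p_ind:.4f}")

# Paired t-test: the same residents scored on both procedures,
# one (LA, LC) pair per resident, listed in the same order.
la_scores = [3.0, 2.6, 3.4, 2.8]
lc_scores = [2.8, 2.4, 3.0, 3.0]
t_rel, p_rel = stats.ttest_rel(la_scores, lc_scores)
print(f"paired t-test: t = {t_rel:.2f}, p = {p_rel:.4f}")
```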
Table 1. Global Operative Assessment of Laparoscopic Skills Domains
Anchor descriptors are provided for scores of 1, 3, and 5; scores of 2 and 4 are intermediate between the adjacent anchors.

Depth perception
  1 = Constantly overshooting target, hits backstop, wide swings, slow to correct
  3 = Some overshooting or missing plane but corrects quickly
  5 = Accurately directs instruments in correct plane to target

Bimanual dexterity
  1 = Use of one hand, ignoring nondominant hand, poor coordination between hands
  3 = Use of both hands but does not optimize interactions between hands to facilitate conduct of operation
  5 = Expertly uses both hands in a complementary manner to provide optimal working exposure

Efficiency
  1 = Uncertain, much wasted effort, many tentative motions, constantly changing focus of operation, or persisting at a task without progress
  3 = Slow, but planned and reasonably organized
  5 = Confident, efficient and safe conduct of operation, maintaining focus on component of procedure until better done by another approach

Tissue handling
  1 = Rough, tears tissue by excessive traction, injures adjacent structures, poor control of coagulation device (recoil), grasper frequently slips off
  3 = Handles tissues reasonably well, with some minor trauma to adjacent tissues (eg, coagulation of liver causes unnecessary bleeding, occasional slipping of grasper)
  5 = Handles tissues very well with appropriate traction on tissues and negligible injury of adjacent structures; uses energy sources appropriately but not excessively

Autonomy
  1 = Unable to complete entire procedure, even in a straightforward case and with extensive verbal guidance
  3 = Able to complete operation safely with moderate prompting
  5 = Able to complete operation independently without prompting

Level of difficulty
  1 = Easy exploration and dissection
  3 = Moderate difficulty (eg, mild inflammation, scarring, adhesions, obesity, or severity of disease)
  5 = Extremely difficult (eg, severe inflammation, scarring, adhesions, obesity, or severity of disease)

Adapted from Vassiliou et al, American Journal of Surgery,2 with permission.
Table 2. Two-Sample t-Test for Laparoscopic Cholecystectomy, Novice Versus Experienced

Domain                  Novice (PGY1-3, n = 30*)    Experienced (PGY4-5, n = 21†)    p Value
                        Mean‡ (95% CI)              Mean‡ (95% CI)
Depth perception        2.83 (2.52-3.14)            4.29 (3.83-4.74)                 <0.001
Bimanual dexterity      2.53 (2.16-2.88)            3.81 (3.41-4.21)                 <0.001
Efficiency              2.53 (2.20-2.87)            3.90 (3.45-4.36)                 <0.001
Tissue handling         3.03 (2.72-3.35)            3.86 (3.33-4.38)                 0.005
Autonomy                2.87 (2.59-3.14)            3.81 (3.34-4.28)                 <0.001
Average                 2.76 (2.50-3.02)            3.93 (3.53-4.33)                 <0.001
Degree of difficulty    2.32 (1.84-2.80)            3.06 (2.35-3.77)                 0.060

*PGY1, n = 3; PGY2, n = 14; PGY3, n = 13.
†PGY4, n = 10; PGY5, n = 11.
‡Highest = best score (5 = best performance; 1 = worst performance).
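The 95% confidence intervals in Tables 2 and 3 are consistent with the usual t-based interval for a group mean; the paper does not state how they were computed, so the sketch below is only an assumption, with placeholder scores.

```python
# Assumed t-based 95% CI for a group mean; not necessarily the authors' method.
from math import sqrt
from statistics import mean, stdev
from scipy import stats

scores = [2.4, 2.8, 3.0, 2.6, 3.2, 2.2]   # placeholder composite scores for one group
n, m, s = len(scores), mean(scores), stdev(scores)

# Two-sided 95% CI for the mean: m +/- t(0.975, n-1) * s / sqrt(n)
half_width = stats.t.ppf(0.975, df=n - 1) * s / sqrt(n)
print(f"mean = {m:.2f}, 95% CI = {m - half_width:.2f} to {m + half_width:.2f}")
```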
DISCUSSION

Interest in the development of surgical simulators and assessment tools to evaluate the skills and performance of laparoscopic surgeons has increased in recent years, if only because of the increased publicity about medical errors. Additionally, the public's perception of an improvement in airline safety can, at least in part, be attributed to the implementation of standardized assessment for airline pilots.11 Application of this model to the operating room has been fraught with difficulties, including high cost, time consumption, and the lack of a validated assessment tool. Surgeons have lacked not only a validated simulator with which to train and assess trainees, but also a validated assessment tool of any type other than global assessment. Early simulators used inanimate surgical models, and more recently developed simulators use a computer to create a virtual environment, but none of these devices has achieved wide applicability to date.11-20
GOALS, developed at McGill University, has been shown to have construct validity in the assessment of surgical residents' laparoscopic skills in a portion of laparoscopic cholecystectomy.2 We hypothesized that GOALS would have construct validity for the entire laparoscopic procedure. To identify any differences in difficulty of the cases between groups, we assessed the degree of difficulty with a 5-point Likert scale and scored it similarly to the other domains on a scale of 1 to 5.2 A Likert scale was chosen because the GOALS evaluation was computer based and use of a visual analog scale was not possible. Although the Likert scale is not validated, we believe the well-defined anchors should lead to reproducible documentation of the difficulty of the procedure.

Our analysis revealed that GOALS was able to differentiate between groups of novice and experienced resident surgeons in the assessment of entire laparoscopic procedures.
Table 3. Two-Sample t-Test for Laparoscopic Appendectomy, Novice Versus Experienced

Domain                  Novice (PGY1-3, n = 25*)    Experienced (PGY4-5, n = 18†)    p Value
                        Mean‡ (95% CI)              Mean‡ (95% CI)
Depth perception        2.96 (2.59-3.33)            4.50 (4.15-4.85)                 <0.001
Bimanual dexterity      2.64 (2.15-3.13)            4.11 (3.70-4.53)                 <0.001
Efficiency              2.64 (2.21-3.07)            4.11 (3.66-4.56)                 <0.001
Tissue handling         2.76 (2.31-3.21)            4.17 (3.65-4.69)                 <0.001
Autonomy                2.76 (2.33-3.19)            4.22 (3.79-4.66)                 <0.001
Average                 2.75 (2.37-3.13)            4.22 (3.85-4.59)                 <0.001
Degree of difficulty    2.31 (1.67-2.95)            2.93 (2.07-3.79)                 0.19

*PGY1, n = 3; PGY2, n = 13; PGY3, n = 9.
†PGY4, n = 7; PGY5, n = 11.
‡Highest = best score (5 = best performance; 1 = worst performance).
Table 4. Paired t-Test for Novice and Experienced Residents Who Completed Both Laparoscopic Cholecystectomy and Laparoscopic Appendectomy

                        Novice (PGY1-3) (n = 22)        Experienced (PGY4-5) (n = 13)
Domain                  LA      LC      p Value         LA      LC      p Value
Depth perception        2.95    2.86    0.73            4.54    4.38    0.55
Bimanual dexterity      2.64    2.50    0.67            4.08    3.77    0.30
Efficiency              2.68    2.50    0.48            4.00    3.92    0.72
Tissue handling         2.82    3.00    0.57            4.08    3.85    0.62
Autonomy                2.77    2.82    0.60            4.08    3.69    0.21

LA, laparoscopic appendectomy; LC, laparoscopic cholecystectomy.
When we analyzed the means of the scores for all domains, performances of experienced residents were significantly better than those of novice residents for both LC and LA (p < 0.001). Mean difficulty scores for both operations were not significantly different between experienced and novice residents, although cases done by experienced residents had somewhat higher mean difficulty scores and, for LC, a higher range. This lends support to the use of GOALS to differentiate performance between novice and experienced residents. The lack of a significant difference between scores for LA and LC for the residents in each group who performed both operations provides additional evidence to support our premise that GOALS is a useful tool to evaluate a surgeon's technical skills regardless of the procedure being evaluated. In fact, clinically, there was only a small difference for each domain.

By evaluating every resident on every case, GOALS can be used to identify areas of skill deficiency that require improvement. With appropriate feedback to the program director and resident, additional training and mentoring can be offered to address the skills deficiency. Additionally, residents' performances can be evaluated in relation to the mean of the performances of other residents in the same postgraduate year, or against a yet-to-be-defined benchmark.

Other methods to objectively score surgeons' performances during entire laparoscopic procedures have been developed, but widespread application has been hampered by the complexity of the assessment tools. One tool was a scoring system that was able to differentiate year of experience, but it required 3 experienced surgeons reviewing videotapes of LC and assigning scores to 23 different steps.12 Another tool consisted of video review that also had construct validity, but its hierarchical task analysis was specific to LC and not applicable to other laparoscopic procedures.13 Animal and inanimate surgical models have also been developed to evaluate surgical skills of residents.14
A 6-year study of the Objective Structured Assessment of Technical Skills recently found that residents' surgical skills improve with time, but residents who received more laboratory training showed no demonstrable improvement over those who received less.14 This implies that although laboratory training will make residents better at performing laboratory skills, it will not necessarily give them better skills in the operating room. Another limitation to widespread use of the Objective Structured Assessment of Technical Skills is cost: each examination costs $40 to $150 per resident.11 Intraoperative assessment with GOALS costs relatively little (the evaluator's time) and can be completed at the end of each case; as a result, no clinical time is lost for residents who already have a curtailed workweek. Nonetheless, benefits of the Objective Structured Assessment of Technical Skills include standardization, exposure of deficiencies in residency programs, and evaluation of minimal technical standards in residency programs.11

There is considerable evidence supporting the reliability and construct validity of virtual reality simulators.18,21,22 Interest in virtual reality simulators exists because of their similarity to existing models in other industries, the ability to obtain standardization, and the ability to use the simulators as teaching instruments.11,18 Critics of virtual reality simulators note that real laparoscopic instruments are not used, current haptic technology has limitations, and high cost limits widespread applicability.19

GOALS is an easy tool to use and, in this study, attendings completed it at the end of the cases. At our institution, it is a computerized form that appears at the end of each operative note. All operative notes must be completed at the end of each procedure, and the computerized, automatic appearance of the GOALS form at the end of every operative note accounts for the high number of residents evaluated over a relatively short time.
The form requires less than 2 minutes to complete, and data collection is simplified because the data are already in a computer database.15

Our study has several limitations, including the lack of blinding of the evaluators and the lack of proof of interrater reliability. Although evaluators were aware of the general level of training of the residents, not all evaluators correctly identified the postgraduate year of the resident. Evaluators did not have information about the residents' case logs, their actual earlier experience, or their training outside the operating room. In reviewing all LC and LA evaluations, we noticed that if a resident was evaluated twice in the same day, on the same type of case, by the same evaluator, the scores were not necessarily identical. It was not possible in this study to document interrater reliability because we used only one evaluator per resident; however, Vassiliou and colleagues documented interrater reliability in previous work.2 In future studies, the use of GOALS to assess the entire LC and LA will be assessed for external validity at other institutions. Face and content validity of the instrument remain to be proved.

In conclusion, this study provides additional evidence in support of GOALS as an assessment tool for objectively measuring technical skills in laparoscopic surgery. This study documents that GOALS is an appropriate assessment tool for evaluating surgery residents' performances during basic laparoscopic procedures.

REFERENCES

1. Moorthy K, Munz Y, Sarker SK, Darzi A. Objective assessment of technical skills in surgery. BMJ 2003;327:1032–1037.
2. Vassiliou MC, Feldman LS, Andrew CG, et al. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg 2005;190:107–113.
3. Haluck RS, Krummel TM. Computers and virtual reality for surgical education in the 21st century. Arch Surg 2000;135:786–792.
4. Bridges M, Diamond DL. The financial impact of teaching surgical residents in the operating room. Am J Surg 1999;177:28–32.
5. Gallagher AG, McClure N, McGuigan J, et al. Virtual reality training in laparoscopic surgery: a preliminary assessment of minimally invasive surgical trainer virtual reality (MIST VR). Endoscopy 1999;31:310–313.
6. Scott DJ, Bergen PC, Rege RV, et al. Laparoscopic training on bench models: better and more cost effective than operating room experience? J Am Coll Surg 2000;191:272–283.
7. Rogers DA, Elstein AS, Bordage G. Improving continuing medical education for surgical techniques: applying the lessons learned in the first decade of minimal access surgery. Ann Surg 2001;233:159–166.
8. Derossis AM, Fried GM, Abrahamowicz M, et al. Development of a model for training and evaluation of laparoscopic skills. Am J Surg 1998;175:482–487.
9. Gallagher AG, Ritter EM, Satava RM. Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training. Surg Endosc 2003;17:1525–1529.
10. Fried GM, Feldman LS, Vassiliou MC, et al. Proving the value of simulation in laparoscopic surgery. Ann Surg 2004;240:518–525; discussion 525–528.
11. Goff B, Mandel L, Lentz G, et al. Assessment of resident surgical skills: is testing feasible? Am J Obstet Gynecol 2005;192:1331–1338; discussion 1338–1340.
12. Eubanks TR, Clements RH, Pohl D, et al. An objective scoring system for laparoscopic cholecystectomy. J Am Coll Surg 1999;189:566–574.
13. Sarker SK, Chang A, Vincent C, Darzi SA. Development of assessing generic and specific technical skills in laparoscopic surgery. Am J Surg 2006;191:238–244.
14. Lentz GM, Mandel LS, Goff BA. A six-year study of surgical teaching and skills evaluation for obstetric/gynecologic residents in porcine and inanimate surgical models. Am J Obstet Gynecol 2005;193:2056–2061.
15. Larson JL, Williams RG, Ketchum J, et al. Feasibility, reliability and validity of an operative performance rating system for evaluating surgery residents. Surgery 2005;138:640–647; discussion 647–649.
16. Mandel LS, Goff BA, Lentz GM. Self-assessment of resident surgical skills: is it feasible? Am J Obstet Gynecol 2005;193:1817–1822.
17. Youngblood PL, Srivastava S, Curet M, et al. Comparison of training on two laparoscopic simulators and assessment of skills transfer to surgical performance. J Am Coll Surg 2005;200:546–551.
18. Vassiliou MC, Ghitulescu GA, Feldman LS, et al. The MISTELS program to measure technical skill in laparoscopic surgery: evidence for reliability. Surg Endosc 2006;20:744–747.
19. Adrales GL, Chu UB, Hoskins JD, et al. Development of a valid, cost-effective laparoscopic training program. Am J Surg 2004;187:157–163.
20. Sarker SK, Vincent C, Darzi AW. Assessing the teaching of technical skills. Am J Surg 2005;189:416–418.
21. Duffy AJ, Hogle NJ, McCarthy H, et al. Construct validity for the LAPSIM laparoscopic surgical simulator. Surg Endosc 2005;19:401–405.
22. Fraser SA, Klassen DR, Feldman LS, et al. Evaluating laparoscopic skills: setting the pass/fail score for the MISTELS system. Surg Endosc 2003;17:964–967.