Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment


ORIGINAL REPORTS

S. Swaroop Vedula, MBBS, PhD,* Anand Malpani, MSE,* Narges Ahmidi, PhD,* Sanjeev Khudanpur, PhD,† Gregory Hager, PhD,* and Chi Chiung Grace Chen, MD‡

*Department of Computer Science, Johns Hopkins University, Baltimore, Maryland; †Department of Electrical & Computer Engineering, Johns Hopkins University, Baltimore, Maryland; and ‡Department of Gynecology & Obstetrics, Johns Hopkins University School of Medicine, Baltimore, Maryland

OBJECTIVE: Task-level metrics of time and motion efficiency are valid measures of surgical technical skill. Metrics may be computed for segments (maneuvers and gestures) within a task after hierarchical task decomposition. Our objective was to compare task-level and segment (maneuver and gesture)-level metrics for surgical technical skill assessment.

DESIGN: Our analyses include predictive modeling using data from a prospective cohort study. We used a hierarchical semantic vocabulary to segment a simple surgical task of passing a needle across an incision and tying a surgeon’s knot into maneuvers and gestures. We computed time, path length, and movements for the task, maneuvers, and gestures using tool motion data. We fit logistic regression models to predict experience-based skill using the quantitative metrics. We compared the area under a receiver operating characteristic curve (AUC) for task-level, maneuver-level, and gesture-level models.

SETTING: Robotic surgical skills training laboratory.

PARTICIPANTS: In total, 4 faculty surgeons with experience in robotic surgery and 14 trainee surgeons with no or minimal experience in robotic surgery.

RESULTS: Experts performed the task in shorter time (49.74 s; 95% CI = 43.27-56.21 vs. 81.97; 95% CI = 69.71-94.22), with shorter path length (1.63 m; 95% CI = 1.49-1.76 vs. 2.23; 95% CI = 1.91-2.56), and with fewer movements (429.25; 95% CI = 383.80-474.70 vs. 728.69; 95% CI = 631.84-825.54) than novices. Experts differed from novices on metrics for individual maneuvers and gestures. The AUCs were 0.79; 95% CI = 0.62-0.97 for task-level models, 0.78; 95% CI = 0.6-0.96 for maneuver-level models, and 0.7; 95% CI = 0.44-0.97 for gesture-level models. There was no statistically significant difference in AUC between task-level and maneuver-level (p = 0.7) or gesture-level models (p = 0.17).

CONCLUSIONS: Maneuver-level and gesture-level metrics are discriminative of surgical skill and can be used to provide targeted feedback to surgical trainees. (J Surg Ed 73:482-489. © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.)

KEY WORDS: task decomposition, segment-level skill metrics, objective skill assessment, task-level skill metrics, robotic surgical skills

COMPETENCIES: Patient Care, Medical Knowledge, Practice-Based Learning and Improvement

Funding from the following sources supported the design, conduct, and analysis of the study described in this manuscript: NSF-NRI Award 1227277, NSF-CPS 0931805, NSF-CDI-II 0941362, NIH 1R21EB009143-01A1, Intuitive Surgical, Inc. (Sunnyvale, CA), and the Swirnow Family Foundation.

Correspondence: Inquiries to S. Swaroop Vedula, MBBS, PhD, 3400 N Charles Street, Hackerman Hall 200, Baltimore, MD 21218; e-mail: [email protected]

INTRODUCTION

Delineating effective methods to teach and assess surgical technical skill is especially critical in the current milieu of work hour restrictions and an increasingly limited number of cases for resident training. Surgical technical skill acquisition involves graduated learning of tasks such as suturing, dissection, or retraction. Traditionally, early surgical technical education and assessment is done at the task level, where trainees are taught specific tasks such as suturing. This education may be more effective when it is augmented with teaching focused on discrete segments that comprise tasks. Surgical tasks can be decomposed into segments, which include maneuvers and gestures.1-4 Surgical tasks are composed of repetitions of discrete maneuvers, which are further constituted by multiple atomic activity segments called gestures. For example, the task of suturing a wound closed comprises maneuvers such as passing a needle across the incision and tying a knot. In turn, the maneuver of passing the needle across the incision comprises gestures such as driving the needle through the tissue, grasping the needle, and rotating it out of the tissue (Fig. 1).

FIGURE 1. Decomposition of a simple surgical task into maneuvers and gestures.

Conventional motion analysis involves computing quantitative metrics for overall tasks (task-level metrics),5-8 which ignore the inherent structure of surgical activities. Task-level metrics contain limited information about how segments (maneuvers and gestures) comprising the task were performed. Task-level metrics also do not explain whether surgeons are uniformly skillful in performing individual segments comprising the task.7,9 There is evidence that a granular motion analysis uncovers task segments that are specific indicators of skill, and may identify segments that are difficult to master or that merit additional practice.6-9 Assessing skill at the segment level (maneuver and gesture level) also informs trainees on which parts of the task they should improve through deliberate practice. However, the value of motion analysis at the segment level for surgical technical skill assessment is unclear. Our objective was to assess different granularities of assessment with respect to their ability to distinguish surgical technical skill, and to compare quantitative metrics at the task, maneuver, and gesture levels for technical skill assessment.

Journal of Surgical Education. © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved. 1931-7204/$30.00 http://dx.doi.org/10.1016/j.jsurg.2015.11.009

METHODS

We used a data set comprising surgical tool motion and video data captured from the da Vinci surgical robot (da Vinci Surgical Systems, Intuitive Surgical, Inc., Sunnyvale, CA), as operators performed the study task comprising a needle passing followed by a 2-1 surgeon’s knot on an inanimate bench-top model.10,11 The data set included a total of 135 repetitions of a study task performed by 4 experts (experienced attending surgeons with robotic surgery practices) and 14 novices (surgical trainees) in 2 academic surgical programs. Every surgeon performed at least 3 repetitions of the study task.

We manually annotated all 135 repetitions of the study task for constituent maneuvers and gestures using a prespecified vocabulary. The maneuvers in the vocabulary for our study task included passing the needle across the incision (or a suture throw), running suture to extract it out of the incision, tying a knot using 2 loops or a single loop, and intermaneuver segments (activities performed to prepare for the next maneuver, Fig. 1). The gestures in our vocabulary included driving the needle to enter the skin or tissue, grasping the needle, rotating or pulling the needle out of the skin or tissue, looping the suture around a needle driver to make a knot, grasping and pulling the tail of suture through the loop, and pulling the 2 suture ends taut to tighten the knot. Our vocabulary also included adjustment gestures that were performed in preparation for the next gesture within a maneuver, such as repositioning the needle on the needle driver within the suture throw maneuver. In a separate study (unpublished work), we observed moderate to high agreement between investigators using the vocabulary to annotate the study task (Cohen κ = 0.99 for maneuvers and approximately 0.8 for gestures).

For our current analyses, we studied only the following maneuvers: suture throw, 2-loop knot, 1-loop knot, and intermaneuver segments (performed to prepare for the next maneuver); and the following gestures: drive needle across tissue, grasp needle, rotate needle out of tissue, loop suture around needle driver (twice in a 2-loop knot and once in a 1-loop knot), grasp suture tail through loop in the knot, pull suture tail through loop in the knot, pull suture ends taut to tighten knot, and adjustment gestures.
We used tool motion data to compute 3 metrics for the overall task, and for each maneuver and gesture within a task: (1) time to completion in seconds, (2) path length (distance traveled by the tool tips in meters), and (3) number of movements (each movement defined as a peak in speed).1-4 All 3 metrics have previously been shown to be valid measures of surgical technical skill.1-4

Journal of Surgical Education, Volume 73/Number 3, May/June 2016

For purposes of exploratory analyses, we used an expert-assigned global rating score (GRS) with the objective structured assessment of technical skills approach as the ground truth for determining skill level.12 There are no previously established cutoffs to specify expert, intermediate, and novice levels of skill using GRS. For our descriptive analyses, we used arbitrary cutoffs of GRS > 22 as expert, ≤ 14 as novice, and the remaining as intermediate skill. We computed the 95% CI for the average of each metric using bootstrap samples of the data set with surgeon as the resampling unit (constrained to resample 4 expert and 14 novice surgeons).

We fit logistic generalized estimating equations (GEE) models to predict the skill level (expert vs. novice), separately using quantitative metrics computed for the task, maneuvers, and gestures. GEE models allowed us to account for clustering of repetitions of the study task by the same surgeons. To obtain a parsimonious prediction model with a good fit, we implemented a systematic approach to select from among 12 predictors for maneuver-level metrics and 36 predictors for gesture-level metrics. First, we used Markov blankets, which is an established method for variable selection in data sets where the number of predictors is large relative to the number of observations.13 A Markov blanket for the outcome (skill level) variable is the set of predictor variables conditional on which the outcome variable is independent of all other predictors in the data set. Then we applied a forward selection approach to identify the set of variables that yielded good model fit and explained most of the variation in the data.14-17

For purposes of evaluating our predictive models, we considered attending surgeons as experts and trainee surgeons as novices (experience-based skill level).
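The three metrics defined above (time to completion, path length, and number of movements) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `task_metrics` and `segment_metrics`, the input layout (a list of tool-tip coordinates with matching timestamps), and the choice to count a movement as a strict local maximum of unsmoothed speed are all assumptions made for illustration. Real kinematic data would likely need smoothing before peaks are counted.

```python
import math


def task_metrics(positions, timestamps):
    """Return (time_s, path_length_m, n_movements) for one tool-tip trajectory.

    positions  -- list of (x, y, z) tool-tip coordinates in meters (assumed layout)
    timestamps -- list of sample times in seconds, strictly increasing
    """
    if len(positions) != len(timestamps) or len(positions) < 2:
        raise ValueError("need at least two time-aligned samples")

    # Time to completion: elapsed time between the first and last sample.
    time_s = timestamps[-1] - timestamps[0]

    # Path length: sum of straight-line distances between consecutive samples.
    steps = [math.dist(p, q) for p, q in zip(positions, positions[1:])]
    path_length_m = sum(steps)

    # Average speed over each sampling interval.
    speeds = [d / (t1 - t0)
              for d, t0, t1 in zip(steps, timestamps, timestamps[1:])]

    # Movements: count strict local maxima of speed (assumed reading of
    # "a peak in speed"; no smoothing is applied in this sketch).
    n_movements = sum(
        1 for i in range(1, len(speeds) - 1)
        if speeds[i - 1] < speeds[i] > speeds[i + 1]
    )
    return time_s, path_length_m, n_movements


def segment_metrics(positions, timestamps, segments):
    """Apply task_metrics to each annotated segment.

    segments -- list of (label, start_index, end_index), end index exclusive
                (hypothetical annotation format for maneuvers or gestures)
    """
    return [(label, task_metrics(positions[a:b], timestamps[a:b]))
            for label, a, b in segments]
```

The same function serves all three granularities: it is applied once to the whole trajectory for task-level metrics and once per annotated interval for maneuver-level or gesture-level metrics.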
We evaluated validity of the prediction model using a 4-fold cross-validation setup. We first fit the model on the training data comprising 3 folds (partitions) of the data and predicted the skill level in the fourth (test) fold. We computed the area under the receiver operating characteristic curve (AUC) separately for the task-level, maneuver-level, and gesture-level GEE prediction models. We used the DeLong method to compare the task-level vs. maneuver-level curves and task-level vs. gesture-level curves.18 We tested the null hypotheses that metrics computed at the task level were not different from metrics computed at the maneuver and gesture levels. We used Matlab 2015a (MathWorks, Inc., Natick, MA) and R (version 3.1.3)19 for our analyses.

FIGURE 2. Task-level metrics by skill levels.
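The 4-fold cross-validation and AUC computation described in the Methods can be sketched as follows. Several simplifications are assumed here and should not be read as the study's procedure: a plain logistic regression fit by gradient descent stands in for the GEE model (so clustering of repetitions within surgeon is ignored), folds are assigned by shuffling repetitions at random, and the AUC is computed once on the pooled out-of-fold predictions. All function names are hypothetical, and features are assumed to be on a roughly standardized scale.

```python
import math
import random


def predict_proba(w, b, x):
    """Logistic model probability for one feature vector x."""
    z = b + sum(wk * xk for wk, xk in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain logistic regression by batch gradient descent (stand-in for GEE)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for k in range(n_features):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=4, seed=0):
    """Fit on k-1 folds, score the held-out fold, pool scores, return AUC."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = predict_proba(w, b, X[i])
    return auc(scores, y)
```

With labels coded 1 for expert and 0 for novice, `cross_validated_auc(X, y)` returns a single pooled AUC for one set of metrics; running it separately on task-level, maneuver-level, and gesture-level feature sets gives the three values to be compared.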

RESULTS

At the task level, all 3 metrics were discriminative of experience-based skill (Fig. 2). Experts, when compared with novices, performed the task in shorter time, with shorter path length, and with fewer movements. When considering GRS-based skill at the task level, the differences in time and movement metrics among experts, intermediates, and novices were statistically significant (Fig. 2). Path length (distance traveled by the tools) showed no statistically significant difference between experts and intermediates. In contrast, the tools traveled statistically significantly longer distances for novices than for experts and intermediates.

At the maneuver level, when using experience-based skill, there were statistically significant differences in all 3 metrics between experts and novices for all 4 maneuver segments in our analysis (Fig. 3). Experts performed the individual maneuvers faster, with tools traveling shorter distances, and with fewer movements compared with novices. Additionally, when using GRS-based skill, experts performed the task in significantly less time and more efficiently (shorter path length and fewer movements) than did novices and intermediates (Fig. 3). Intermediates performed the suture throw maneuver in significantly more time, with longer path length, and with more movements than experts, but the difference between intermediates and novices was not statistically significant. In contrast, intermediates performed the 2-loop knot maneuver in significantly less time, with shorter path length, and with fewer movements than novices, but the difference between intermediates and experts was not statistically significant. Experts spent less time and were more efficient in performing intermaneuver segments than intermediates and novices.

At the gesture level, when using experience-based skill, experts were different from novices for nearly all gestures with respect to time and number of movements but for only a few gestures with respect to path length (Supplementary Table 1). Experts performed some gestures such as driving needle through tissue, rotating it out of the tissue, and looping suture around the needle driver 2 or 3 times faster and with half to one-third the number of movements compared with novices. Similarly, when using GRS-based skill, most differences between experts and novices were observed for time and number of movements while performing the gestures (Supplementary Table 2).

Our variable selection approach yielded the following 6 metric-maneuver combinations for inclusion in the GEE model: (1) time to complete suture throw, (2 and 3) time and number of movements for the first knot, (4 and 5) time and path length for the second knot, and (6) time for intermaneuver segments. Our analysis identified the following 8 gesture-metric combinations for inclusion in the GEE model: (1) drive needle across the incision (time), (2 and 3) rotate needle out of tissue (time and movements), (4 and 5) grasp suture tail through loop in a 2-loop knot and 1-loop knot (time), (6) pull suture ends taut to tighten a 1-loop knot (path length), and (7 and 8) adjustment gestures (time and path length).

We did not observe statistically significant differences in AUC (computed using the GEE models) for predicting experience-based skill when comparing task-level vs. maneuver-level and task-level vs. gesture-level models (Fig. 4). All 3 models had similar classification accuracy, with an AUC of 0.79 (95% CI = 0.62-0.97) for the task-level model, 0.78 (95% CI = 0.6-0.96) for the maneuver-level model, and 0.7 (95% CI = 0.44-0.97) for the gesture-level model. The accuracy for the task-level model was not statistically significantly different from that of the maneuver-level model (p = 0.7) or the gesture-level model (p = 0.17). Our findings do not indicate that metrics computed at the maneuver and gesture levels differ from task-level metrics in their ability to discriminate between experts and novices.
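As a rough illustration of how two correlated AUCs, such as the task-level and maneuver-level values above, can be compared, the following sketch follows the nonparametric approach of DeLong et al.18 for two sets of scores on the same cases. It is a simplified sketch rather than the analysis code used in the study: the function name `delong_test` is hypothetical, observations are treated as independent (repetitions by the same surgeon are not modeled as clustered), and the p value is two-sided under a normal approximation.

```python
import math


def _psi(x, y):
    """Pairwise kernel: 1 if the positive outranks the negative, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(pos, neg):
    """Per-case placement values for positives (v10) and negatives (v01)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Compare AUCs of two score sets on the same cases.

    Returns (auc_a, auc_b, z, p_two_sided). Labels: 1 = positive, 0 = negative.
    """
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    m, n = len(pos), len(neg)

    v10_a, v01_a = _placements([scores_a[i] for i in pos],
                               [scores_a[i] for i in neg])
    v10_b, v01_b = _placements([scores_b[i] for i in pos],
                               [scores_b[i] for i in neg])
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m

    # Variance of each AUC and their covariance from the placement values.
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n

    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        raise ValueError("variance of the AUC difference is not positive")

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return auc_a, auc_b, z, p
```

Because both score sets are evaluated on the same cases, the covariance term reduces the variance of the difference; ignoring it would treat the two AUCs as if they came from independent samples.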

DISCUSSION

Our study is a comprehensive, quantitative, comparative analysis of surgical skill at the level of maneuvers and gestures comprising a task. In this study, we established the ability of quantitative metrics to effectively classify or discriminate surgical skill (instead of a simple descriptive analysis comparing metrics between surgeons with different levels of experience), and we extended previous research by additionally investigating maneuver-level and gesture-level metrics. Overall, the 3 quantitative metrics we studied (time, path length, and movements) were only moderately predictive of experience-based skill at the task level, maneuver level, and gesture level. Additionally, there were no statistically significant differences in predictive value among task-level, maneuver-level, and gesture-level metrics for assessing experience-based skill.

Metrics computed at the maneuver and gesture levels did provide additional insights into how surgeons performed different segments of the task. This is critical for educators to provide targeted feedback on which parts of the task the trainees have to improve. For example, all 3 metrics (time, path length, and movements) for intermaneuver segments, and time and movements for adjustment gestures, were significantly lower for expert compared with novice surgeons. The intermaneuver segments and adjustment gestures represent what surgeons do to position themselves for the subsequent maneuver or gesture, respectively. Our observations reinforce a traditional notion that what demonstrates expertise among surgeons (and in other activity-based domains such as sports) is not simply what operators do but what they do when they plan for the next steps (i.e., when they “do not do anything”). In the context of intermaneuver segments, targeted feedback may involve trainees learning how expert surgeons efficiently manipulate surgical tools to position themselves for the next activity.

FIGURE 3. Maneuver-level metrics by skill levels.

FIGURE 4. Receiver operating characteristic curves for task-level, maneuver-level, and gesture-level metrics to predict experience-based skill (expert or novice).

Targeted feedback is necessary because the difficulty in learning may vary across different maneuvers and gestures within a task. Previous research suggests that trainees subjectively find some tasks within surgical procedures more difficult to master than others.20 Our analyses provide data to show that some maneuvers and gestures within a simple surgical task might be more difficult to learn and perform than others. For example, intermediates performed the suture throw maneuver and intermaneuver segments with efficiency (time, path length, and movements) similar to that of novices, and both categories were significantly less efficient than experts. In contrast, intermediates performed knot-tying maneuvers with efficiency similar to that of experts, and both categories were significantly more efficient than novices. Examining gestures reveals a similarly descriptive picture. Intermediates performed certain gestures, such as grasping the needle after passing it across the incision and rotating it out of the tissue, with efficiency similar to that of novices, and both were significantly less efficient than experts. In contrast, intermediates performed other gestures, such as looping the suture around a needle driver to make a knot, with efficiency similar to that of experts, and both were significantly more efficient than novices. These observations may also be interpreted as a reflection of difficulty in mastering different steps of the task; therefore, targeted feedback during surgical skills training may be focused on the more difficult steps of the task.

Our study is limited in that we relied on a single data set for selecting variables and evaluating predictive accuracy of our regression models. To avoid bias, our model selection criteria were based on model fit and not related to predictive accuracy. However, future work should perform model selection and evaluation of predictive accuracy on different partitions of the data set. Also, our data set included a small sample of 18 surgeons performing a single task based on an inconsistent implementation of instructions. Some surgeons performed a 2-1 surgeon’s knot, whereas others performed a 1-1 square knot or a 2-2 surgeon’s knot. A larger sample of surgeons performing multiple structured tasks based on uniform instructions would provide more generalizable and actionable feedback at the maneuver and gesture levels for trainees. Error assessment and an individual’s ability to detect errors in their performance have been shown to be predictive of surgical skill.21,22 We did not consider error assessment in our study. We also assumed that technical skill at the maneuver and gesture levels corresponds to technical skill assessed at the task level using a GRS. This approach is consistent with clinical practice, and previous research has established the association between maneuver-level and task-level technical skill.23 Our analyses considered only technical skill and did not include assessment of nontechnical skills such as decision-making or surgical judgment, which perhaps cannot be assessed for simple surgical tasks such as those studied here.

Our findings pose an important question for further research regarding the educational value of targeted feedback within surgical skills training curricula through maneuver-level and gesture-level skill assessment. Typically, feedback from instructors to trainees acquiring technical skill is aimed at the gesture level. For example, trainees learning to suture tissue are given feedback on holding the needle in an appropriate location to ensure a smooth drive through the tissue. But technical skill is usually assessed at a global level, for example, at the task level. Our analyses in this study and prior research show that maneuvers and gestures contain information on the skill with which they were performed.23 Thus, determining whether maneuver-level and gesture-level skill assessment and feedback translate into quicker and more effective acquisition (longer retention) of surgical technical skill can inform how curricula may evolve to better address time and resource constraints.

Implementing segment-level skill assessment and feedback on a large scale within a surgical skills training curriculum would require efficient techniques to annotate components within structured surgical tasks. Recent technological and methodological advances have automated or semiautomated segmentation of tasks into constituent maneuvers and gestures. For example, several techniques have been developed for automatically segmenting structured surgical tasks into constituent gestures using tool motion, video data, or both.4,24-28 Crowdsourcing is another alternative method for efficiently annotating surgical tasks into maneuvers and gestures. Others have shown that crowdsourcing yields valid responses from untrained individuals for a variety of situations, including surgical skill assessment and annotation or segmentation of video images.29-31 But crowdsourcing for surgical activity annotation has yet to be explored.

In conclusion, our study is a novel investigation into differences in how maneuvers and gestures within a surgical task are performed by expert and novice surgeons. Maneuver-level and gesture-level quantitative metrics are both predictive of surgical technical skill. Our study supports the relevance and importance of maneuver-level and gesture-level assessment in addition to task-level evaluation. Skill assessment at the maneuver and gesture levels can be used to provide targeted feedback to trainees, that is, to inform trainees on which parts of the task they need to improve.

REFERENCES

1. Lin HC. Structure in surgical motion. Baltimore, MD: The Johns Hopkins University; 2010.

2. Reiley CE, Lin HC, Yuh DD, Hager GD. Review of methods for objective surgical skill evaluation. Surg Endosc. 2011;25(2):356-366.

3. Varadarajan B. Learning and inference algorithms for dynamical system models of dextrous motion. Baltimore, MD: The Johns Hopkins University; 2011.

4. Varadarajan B, Reiley C, Lin H, Khudanpur S, Hager G. Data-derived models for segmentation with application to surgical assessment and training. In: Yang G-Z, Hawkes D, Rueckert D, Noble A, Taylor C, editors. MICCAI 2009, Part I, LNCS, 5761. Berlin, Heidelberg: Springer, 2009. p. 426-434.

5. Bann SD, Khan MS, Darzi AW. Measurement of surgical dexterity using motion analysis of simple bench tasks. World J Surg. 2003;27(4):390-394.

6. Aggarwal R, Dosis A, Bello F, Darzi A. Motion tracking systems for assessment of surgical skill. Surg Endosc. 2007;21(2):339.

7. Datta V, Mackay S, Mandalia M, Darzi A. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg. 2001;193(5):479-485.

8. Mackay S, Datta V, Mandalia M, Bassett P, Darzi A. Electromagnetic motion analysis in the assessment of surgical skill: relationship between time and movement. ANZ J Surg. 2002;72(9):632-634.

9. Reiley CE, Hager GD. Task versus subtask surgical skill evaluation of robotic minimally invasive surgery. In: Yang G-Z, Hawkes D, Rueckert D, Noble A, Taylor C, editors. MICCAI 2009, Part I, LNCS, 5761. Berlin, Heidelberg: Springer, 2009. p. 435-442.

10. Kumar R, Jog A, Vagvolgyi B, et al. Objective measures for longitudinal assessment of robotic surgery training. J Thorac Cardiovasc Surg. 2012;143(3):528-534.

11. Kumar R, Jog A, Malpani A, et al. Assessing system operation skills in robotic surgery trainees. Int J Med Robot. 2012;8(1):118-124.

12. Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84(2):273-278.

13. Freno A. Selecting Features by Learning Markov Blankets. In: Apolloni B, Howlett RJ, Jain L, editors. Knowledge-based intelligent information and engineering systems. Berlin, Heidelberg: Springer, 2007. p. 69-76.

14. Schemper M. Predictive accuracy and explained variation. Stat Med. 2003;22(14):2299-2308.

15. Pan W. Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120-125.

16. Scutari M. Learning Bayesian networks with the bnlearn R package. J Stat Softw. 2010;35(3):1-22.

17. Sauerbrei W, Schumacher M. Bootstrap and cross-validation to assess complexity of data-driven regression models. In: Brause RW, Hanisch E, editors. Medical Data Analysis. Berlin: Springer, 2000. p. 234-241.

18. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837-845.

19. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: 〈http://www.R-project.org/〉; 2014.

20. Dooley IJ, O’Brien PD. Subjective difficulty of each stage of phacoemulsification cataract surgery performed by basic surgical trainees. J Cataract Refract Surg. 2006;32(4):604-608.

21. Bann S, Khan M, Datta V, Darzi A. Surgical skill is predicted by the ability to detect errors. Am J Surg. 2005;189(4):412-415.

22. Bann S, Datta V, Khan M, Darzi A. The surgical error examination is a novel method for objective technical knowledge assessment. Am J Surg. 2003;185(6):507-511.

23. Malpani A, Vedula SS, Chen CC, Hager GD. A study of crowdsourced segment-level surgical skill assessment using pairwise rankings. Int J Comput Assist Radiol Surg. 2015;10(9):1435-1447.

24. Tao L, Elhamifar E, Khudanpur S, Hager GD, Vidal R. Sparse hidden Markov models for surgical gesture classification and skill evaluation. In: Abolmaesumi P, Joskowicz L, Navab N, Jannin P, editors. Information Processing in Computer-Assisted Interventions, 167. Berlin, Heidelberg: Springer, 2012. p. 177.

25. Ahmidi N, Gao Y, Béjar B, et al. String motif-based description of tool motion for detecting skill and gestures in robotic surgery. In: Mori K, Sakuma I, Sato Y, Barillot C, Navab N, editors. MICCAI 2013, Part I, LNCS, 8149. Berlin, Heidelberg: Springer-Verlag, 2013. p. 26-33.

26. Zappella L, Béjar B, Hager G, Vidal R. Surgical gesture classification from video and kinematic data. Med Image Anal. 2013;17(7):732-745.

27. Haro BB, Zappella L, Vidal R. Surgical gesture classification from video data. In: Ayache N, Delingette H, Golland P, Mori K, editors. MICCAI 2012, Part I, LNCS, 7510. Berlin, Heidelberg: Springer-Verlag, 2012. p. 34-41.

28. Tao L, Zappella L, Hager GD, Vidal R. Surgical gesture segmentation and recognition. In: Mori K, Sakuma I, Sato Y, Barillot C, Navab N, editors. MICCAI 2013, Part III, LNCS, 8151. Berlin, Heidelberg: Springer, 2013. p. 339-346.

29. Chen C, White L, Kowalewski T, et al. Crowd-sourced assessment of technical skills: a novel method to evaluate surgical performance. J Surg Res. 2014;187(1):65-71.

30. Brady CJ, Villanti AC, Pearson JL, Kirchner TR, Gupta OP, Shah CP. Rapid grading of fundus photographs for diabetic retinopathy using crowdsourcing. J Med Internet Res. 2014;16(10):e233.

31. Maier-Hein L, Mersmann S, Kondermann D, et al. Can masses of non-experts train highly accurate image classifiers? In: Golland P, Hata N, Barillot C, Hornegger J, Howe R, editors. MICCAI 2014, Part II, LNCS, 8674. Switzerland: Springer International Publishing, 2014. p. 438-445.

SUPPLEMENTARY INFORMATION

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jsurg.2015.11.009.