CHAPTER 37
Interventional procedures training Tamas Ungia , Matthew Holdena , Boris Zevinb , Gabor Fichtingera a School
of Computing, Queen’s University, Kingston, ON, Canada of Surgery, Queen’s University, Kingston, ON, Canada
b Department
Contents

37.1. Introduction 909
37.2. Assessment 911
37.2.1 Rating by expert reviewers 912
37.2.2 Real-time spatial tracking 913
37.2.3 Automatic video analysis 915
37.2.4 Crowdsourcing 916
37.3. Feedback 917
37.3.1 Feedback in complex procedures 917
37.3.2 Learning curves and performance benchmarks 918
37.4. Simulated environments 919
37.4.1 Animal models 920
37.4.2 Synthetic models 920
37.4.3 Box trainers 921
37.4.4 Virtual reality 923
37.5. Shared resources 924
37.6. Summary 925
References 925
37.1. Introduction

Medical education and competency assessment have been criticized for focusing too heavily on lexical knowledge and written tests. Investing in better education and evaluation of procedural skills would likely result in better patient care [5]. However, studies suggest that most medical students leave school with insufficient surgical and procedural skills to perform basic procedures without putting patients at risk [9,40]. Although this gap in education has been recognized by both practitioners and researchers, implementing change in the education system is challenging for medical schools. Education could be improved by increasing exposure to practical procedural education sessions in medical schools, but such a change would require fundamental changes in the current curriculum structure, and it would probably require significantly more faculty resources. In fact, the fundamentals of medical education have not changed much in the past centuries. The apprenticeship model with one-on-one training is still the
dominant teaching method for procedural medical skills [43]. More practice opportunities for students would require proportionally more supervision by experienced physicians, which is simply not feasible given the limited time physicians can dedicate to teaching while balancing their main responsibility of patient care. To make a meaningful change in procedural skills education, supervision by experienced physicians needs to be at least partially replaced by new technologies. That is why computer-guided medical training could bring the long-awaited transformation in medical education to reality.

Many procedures have been taught and performed for decades without computer-assisted training, but recent changes in social awareness of patient safety and physician liability have increased the need for more objective measures in training. Computer-assisted training technology offers many advantages in making the training process more standardized and objective. Increased awareness of patient safety demands that physicians develop means for quality assurance in their training programs. Professional organizations recommend documented proof of trainees' skills before they perform procedures on patients. This is a daunting task for most medical specialties: in many of them it is hard to put objective measures on the interactions between physicians and patients. But interventional procedures have great potential to exploit computers and technology to objectively measure and document the performance of technical skills.

Computers can help resolve two problematic aspects of skill assessment. First, computers are inherently objective. By design, they ignore potential bias based on, e.g., race or gender, and they always perform skill analysis based on the same preprogrammed metrics. Second, they are more accessible than experienced physician observers, which is important for equity in training. Students can use computer-based training as much as they need. Today, computers are often more accessible for medical students than experienced physician mentors; especially in remote or underserved regions, physicians may not be available for students because they are overburdened with patient care. Using computer-based simulators and trainers, students can practice procedures until they meet expected benchmarks, and computerized skill assessment can inform them continuously by providing feedback on their progress towards these benchmarks.

Besides objective skill assessment and accessibility, computer-assisted training has yet another potential in medical education. Feedback is the most important element of training: any new technology for training should reinforce good practices and point out mistakes to avoid, because without feedback, progress along the learning curve is not guaranteed. Algorithm development therefore also needs to focus on giving useful feedback to trainees. This is an especially exciting field of research, as new visualization devices supporting augmented and virtual reality have recently become affordable for almost everyone. Today, any medical school can improve the learning environment for surgical and other procedural skills with virtual reality technology that was available in only a few research laboratories just a decade ago.
In addition to improving the learning experience for existing procedures, computerized training has another important aspect for researchers of new intervention guidance technologies. Training can be the key to the acceptance and success of new technologies; therefore, development of training methods should be an integral part of all innovation. Computer-assisted intervention guidance procedures are rarely intuitive enough to be performed without specific training. Every new surgical navigation and guidance method needs to eventually be evaluated by clinical practitioners, and they will only find it helpful if they can learn to use the new technique quickly and effortlessly. Researchers of new assisted interventions should keep training in mind from the onset of development of new assistive technologies. Training materials can be updated as the procedure evolves through cycles of testing by clinical users.

Medical technology is often criticized for lagging behind other technology-intensive application areas like the military or aviation. The entertainment industry seems to be decades of evolution ahead of medicine when we compare its virtual and augmented reality visualization to radiological and surgical navigation displays. The technology seems to be available for medical applications as well: it is enough to visit a major conference for anyone to see that cutting-edge technology is just waiting for clinical translation. The few clinicians involved in a development process typically have no issues during the first evaluation in patients, but evidence-based clinical research requires multicenter trials for the adoption of new methods. Any new technology will only have a chance to be used outside the place where it was invented if it comes with excellent training support for effortless reproduction of results.

In this chapter we review four aspects of computer-assisted training technology for medical interventions. The first section discusses various ways to obtain data while trainees practice different procedures, and how these data can be used to assess the skills of trainees. The second section reviews available technology for generating feedback for trainees on their performance and on ways to improve their procedures. The third section discusses simulation environments for the deployment of training systems. Finally, shared software and data resources are discussed that enable quick entry for new researchers to this field and facilitate collaboration between existing research groups. We end this chapter with a short summary.
37.2. Assessment

Performance assessment is the first task that any surgical or interventional training software is required to perform, after simulating the medical procedure itself. There is a wide variety of sensors available to collect data during practice procedures, and every training system uses one or a combination of these sensors. Some sensors, like pupillometry or sweat sensors, provide general information on the stress levels of trainees, but in this chapter we focus on devices more specific to medical procedures.
There is no commonly accepted system for data collection. Instrument-heavy procedures like laparoscopic and robotic surgery are often best analyzed by accurate spatial tracking, while freehand and open procedures may require different sensing modalities. It is beyond the scope of this chapter to define in general what makes physicians good at their profession. We only discuss expertise in procedural skills, so we can refer to features that are easier to define and quantify, like positional accuracy, appropriate force, or timeliness. But we often see the same procedure assessed by different metrics in different published research. A common taxonomy of metrics was proposed by Schmitz et al. [46] as a resource for designing new training systems and methods. It is important to refer to these common definitions to make systems and research results comparable with each other; without fair comparison between different research prototypes, there is little hope for this area to make significant progress towards translation to users. The next subsections discuss the most commonly used assessment methods: expert observers, spatial tracking, automatic video analysis, and crowdsourcing.
37.2.1 Rating by expert reviewers

Assessment of surgical or interventional skills by experienced clinicians has been practiced since the introduction of the apprenticeship model in medical education. Historically, assessments of learners in surgery and interventional procedures were primarily subjective in nature (i.e., did the trainee do a "good job" at a particular task); however, subjective assessments are prone to assessment bias, where trainees can be unfairly penalized on the basis of gender, race, religion, or socioeconomic status. Objective assessment scales for surgical and interventional skills were developed to overcome this limitation. Objective assessment can (1) aid learning by the provision of constructive feedback, (2) determine the level that a trainee has achieved, (3) check whether a trainee has made progress, (4) ensure patient safety before a trainee performs an unsupervised procedure, and (5) certify completion of training [4]. As these assessment scales are the currently accepted standard of practice, they are important resources and reference points for anyone developing computerized training technology. Currently available objective assessment scales can be grouped into the categories of checklists, global rating scales, procedure-specific assessment scales, and rubrics.

Checklists are now available for common surgical procedures such as appendectomy, cholecystectomy, Nissen fundoplication, ventral hernia repair, and others [3]. The main advantages of checklists include objectivity, unambiguous expectations, and the opportunity to provide immediate and relevant feedback [42]. The main disadvantage is that a checklist for one procedure cannot be used for the assessment of another procedure. A checklist is completed by observing a surgical or interventional procedure and checking off the items on the list as "completed" or "not completed". Checklists turn examiners into observers of behavior rather than interpreters of behavior, thereby making the assessment more objective.
Global rating scales (GRS) offer another method for objective assessment of surgical and interventional procedural skills. Global rating scales are not procedure-specific, which makes them a convenient tool for educators, with results that are comparable across procedures. Examples of GRS for the assessment of technical skills include the Objective Structured Assessment of Technical Skill Global Rating Scale (OSATS GRS) [37] and the Global Operative Assessment of Laparoscopic Skills (GOALS) [54]. The main advantage of global rating scales is their generalizability across different procedures. Global rating scales are designed to assess generic surgical and interventional skills, such as tissue handling, knowledge of instruments, use of assistants, etc. These generic skills are transferable from one procedure to the next. Checklists, on the other hand, evaluate whether specific operative steps were completed correctly, making them not transferable from one procedure to the next [18]. The three main disadvantages of global rating scales are (1) the requirement to train the raters in the use of the scale [15], (2) the challenge of using the scores on global rating scales for formative feedback, and (3) the subjectivity introduced when judging how a trainee performs on a particular component of the scale.

Procedure-specific rating scales form the middle ground between checklists and global rating scales [41]. They are specific enough to provide useful formative feedback to the individual being assessed; however, they are not as prescriptive as checklists. They provide formative feedback via the identification of specific areas of weakness, while offering some flexibility with regard to the completion of specific operative steps. An example of a procedure-specific rating scale is the BOSATS scale [57].

Rubrics are similar to rating scales, but instead of characterizing skill levels with numerical values, they provide explicit definitions and criteria for skill levels. In contrast to global rating scales, rubrics with descriptions of skill levels make standards of performance explicit [48]. Advantages of rubrics for surgical and interventional skills include (1) shared frames of reference for raters through the use of a clear assessment framework and scoring guide; (2) improvement in the consistency, reliability, and efficiency of scoring for both single and multiple assessors; (3) providing trainees with immediate feedback on performance; and (4) improvement in trainees' ability to self-assess performance [24]. An example of a rubric for objective assessment of surgical skill is the Surgical Procedure Feedback Rubric developed by Toprak et al. [48].

Rating scales have been validated and used consistently in medical education [37]. These rating scales are trusted tools; therefore, computerized assessment methods should be validated against them.
37.2.2 Real-time spatial tracking

Motion analysis by real-time tracking devices has been used to supplement human observer analysis of surgical motions. Tool tracking was first applied in laparoscopic surgery, because laparoscopic instruments are easy to equip with additional motion sensors, and they restrict the freedom of motion in ways that make motion analysis simpler.
The Imperial College Surgical Assessment Device (ICSAD) was one of the first systems used for motion analysis during surgical training. It was first used in laparoscopic procedures [55], then in open surgical tasks [8,39]. It is still used as an objective and quantitative assessment tool in interventional procedure training [6]. Since the introduction of ICSAD, similar systems have been developed to measure procedural skills based on motion analysis.

Hand and instrument motion data can be used in various ways to estimate the skill levels of trainees. ICSAD offers motion quantification by total path length and motion count, but other metrics, such as idle time (hand velocity under 20 mm/s for over 0.5 s), were also found to correlate with experience level; novices were found to pause their motions more often than others [7]. Motion metrics are usually computed directly from the raw motion data, and threshold values are applied either to the metrics themselves or to a mathematical combination of the metrics. Decisions based on metric values lead to the classification of expertise levels. A recently developed artificial neural network-based algorithm seems to be a better choice for determining skill levels than simpler machine learning methods [45]. Convolutional neural networks are effective in processing laparoscopic video data, and can even classify the operator's skill level with high accuracy [13]. However, this area needs further research to collect sufficient evidence for acceptance by medical schools.

Hand motion is most commonly tracked by electromagnetic sensors, because their small size enables seamless integration into existing setups. Wires to the sensors can usually be managed, and they avoid the line-of-sight issues of optical trackers. But a wide variety of other devices can also be used to track instrument or hand motions. Active optical markers require a line of sight between the tracked markers and the camera, but they provide very high accuracy and precision [7]. Optical markers can be active LED lights, passive light-reflective objects, or even simple printed optical patterns (Fig. 37.1). Accelerometers and gyroscopes are affordable, small, and wireless sensors that provide rotational data, but absolute pose (position and orientation) cannot be measured with such devices. Rotational data may still be enough to recognize and classify surgical skills [32]. Most hand motion tracking systems sense only one or a few points on each hand, but gloves can be equipped with flex sensors to track each joint of the hand [45].

A common issue with all of these tracking devices is that they require an extra device, a sensor or a marker, to be added to the existing surgical instruments or to the hands of the operator. While this is usually feasible without much interference with the normal procedure, especially in simulated environments, there is a need for automatically tracking and recognizing operator actions without altering the operative field. This is especially important for studying the translation of procedural skills from simulators to clinical environments. The rapid evolution of automatic video processing may take over some territory from optical and electromagnetic tracking devices.
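As an illustration of how such motion metrics can be computed, the following minimal sketch derives total path length, idle time, and pause count from a stream of tracked tool-tip positions. The 20 mm/s and 0.5 s thresholds follow the idle-time definition cited above [7]; the function name and input format are illustrative choices, not part of any cited system.

```python
import numpy as np

def motion_metrics(positions, timestamps, idle_speed=20.0, idle_duration=0.5):
    """Compute simple motion metrics from tracked tool-tip positions.

    positions: (N, 3) array of tool-tip coordinates in mm
    timestamps: (N,) array of sample times in seconds
    idle_speed: speed in mm/s below which the hand counts as idle
    idle_duration: minimum idle period in seconds to count as a pause
    """
    deltas = np.diff(positions, axis=0)        # (N-1, 3) displacements
    dists = np.linalg.norm(deltas, axis=1)     # per-sample distance, mm
    dts = np.diff(timestamps)                  # per-sample duration, s
    speeds = dists / dts                       # instantaneous speed, mm/s

    total_path_length = dists.sum()

    # Idle time: contiguous runs of low speed lasting at least idle_duration
    idle_time, pause_count, run = 0.0, 0, 0.0
    for speed, dt in zip(speeds, dts):
        if speed < idle_speed:
            run += dt
        else:
            if run >= idle_duration:
                idle_time += run
                pause_count += 1
            run = 0.0
    if run >= idle_duration:
        idle_time += run
        pause_count += 1

    return {"path_length_mm": total_path_length,
            "idle_time_s": idle_time,
            "pause_count": pause_count}
```

Thresholds on these values, or on a combination of them, could then separate novices from experts, as described above.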
Figure 37.1 Surgical instruments equipped with optical markers for spatial tracking. 3D-printed attachments are designed for minimal interference with normal handling of the instruments.
37.2.3 Automatic video analysis

Video camera technology has undergone staggering improvement in the past decade. Cameras of minimal size and under-$100 price provide 1080p resolution at 30 fps and beyond. The accessibility of cameras has contributed to renewed attention to computer vision and video processing algorithms, and an impressive body of research has been published recently on video-based assessment of medical interventions training. This may be just the beginning of transformative changes, because automatic video processing is already at a more advanced stage in other industries, such as surveillance or autonomous driving. When that level of video-based decision making is adapted to medical training, it may not only enable new procedures to be taught by computers, but may also make many tracker-based training systems obsolete.

Video recordings of learners can also be reviewed by expert practitioners. In fact, this is often done in medical schools today: trainees and teachers review video recordings of procedures to discuss, give feedback, or grade the skills of trainees. The main limitations of this learning method are the cost, availability, and potential subjectivity of expert physicians. To address the problem of limited resources, researchers experimented with rating recorded procedures based on only parts of the full procedures, but they found that the reliability of ratings dropped with shorter video segments [44]. Automatic video processing seems to be the only way to scale up video-based skill assessment to become a universal teaching tool in medical education. While video review and feedback by experts is available only a few times for each medical trainee, computerized assessment is virtually unlimited. It could guide the trainee all the way through the learning process, regardless of how long it takes to master the necessary skills.

Standard tools in well-defined environments make video processing easier for algorithms. Therefore, the most popular application of this technology is laparoscopic surgery. It has been shown that motion economy and smoothness are characteristic of skill [17].
While most methods require some user interaction or special markers to identify surgical tools, modern machine learning algorithms are capable of reliably identifying surgical instruments in a fully automated way in real-time video sequences [25].

Automatic video review for rating surgical skills has not only been applied in laparoscopic procedures. It has also been shown that videos from overhead-mounted cameras are helpful in rating suturing skills [38]. When practicing periodic motions like running sutures or repeated knot tying, entropy-based webcam video analysis methods seem to perform extremely well in classifying skill level: organized motions in videos are associated with less entropy of the video image content over time [58]. Videos of eye surgery were also classified for skill level using convolutional neural networks on features extracted from the videos [31].

After about a decade of research in automatic analysis of trainee videos, it seems clear that this is a very promising avenue of research, but it can only be made robust for widespread use if the research community works together on collecting training data for machine learning methods. State-of-the-art machine learning is extremely dependent on the amount of training data. Current efforts already target such wide collaborations: public databases are created for challenge events at research conferences, where teams productively compete in the analysis of these public datasets. These efforts will hopefully be nurtured and grown into global initiatives, engaging even more researchers and creating large public datasets.
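The following sketch illustrates one simple variant of the entropy idea described above: it accumulates a histogram of frame-to-frame intensity changes over a clip and reports its Shannon entropy, so organized, repetitive motion tends to yield lower values. It uses OpenCV only for video decoding; the exact features and entropy formulation in the cited work [58] differ, so this is an illustrative approximation, not a reimplementation.

```python
import cv2
import numpy as np

def temporal_entropy(video_path, bins=32, max_frames=600):
    """Shannon entropy (in bits) of per-pixel intensity change over a clip.

    A crude proxy for motion disorder: economical, repeated motions
    concentrate frame-to-frame changes into few histogram bins,
    which lowers the entropy.
    """
    cap = cv2.VideoCapture(video_path)
    prev = None
    hist = np.zeros(bins)
    count = 0
    while count < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            diff = np.abs(gray - prev)                    # per-pixel change
            h, _ = np.histogram(diff, bins=bins, range=(0, 255))
            hist += h
        prev = gray
        count += 1
    cap.release()
    p = hist / hist.sum()                                 # normalize to a PMF
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

A lower score would then be interpreted as more organized motion; any decision threshold would have to be calibrated against rated recordings.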
37.2.4 Crowdsourcing

In the past decades, a new concept emerged for accomplishing large tasks that require human observer input. The concept, called crowdsourcing, is based on the idea of distributing work to an open community through the Internet. The lack of quality control over individuals in the crowd is compensated by aggregating the results of parallel processing by multiple people. Crowd work is managed online through websites designed specifically for these tasks. General sites like Amazon Mechanical Turk can be used to hire workers to watch and label video segments, usually for a small financial compensation. It has been shown that a group of 32 random crowd workers provides a more reliable assessment of surgical skills from urethrovesical anastomosis videos than faculty experts [36], and, unlike most experts, online workers were available at any time of day to perform this task. To make the classification of video segments easier, it is recommended to offer a few keywords to reviewers and ask them to pick the ones most characteristic of the video they are watching [12]. These labels can later be used to classify the video into skill levels.

Crowdsourcing may solve assessment tasks in medical training in the future, but controversies still exist about the feasibility of this method for some procedures. It was found that complications after intracorporeal urinary diversion during robotic radical cystectomy were not predicted by crowd workers after reviewing video recordings [20].
However, experienced surgeons were also unable to predict complications from the same videos. Algorithms and computers can already outperform human observers in some video processing tasks, and the use of machine learning technology in nonmedical applications has also proven extremely useful. Crowdsourcing may be affordable, but a minimal cost will always remain proportional to the amount of work. Trainees may also be concerned about data protection and privacy issues when video recordings of their performance are shared with a global network of raters. Therefore, there are plenty of reasons to believe that automatic video processing will play a much more significant role in interventional procedures training in the future.
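The core idea of crowdsourced assessment, compensating for unreliable individual raters by aggregating many parallel ratings, can be illustrated with a trimmed mean over a batch of worker scores. This is a minimal sketch with hypothetical ratings; real studies use more elaborate aggregation and worker quality control.

```python
import numpy as np

def aggregate_crowd_scores(scores, trim=0.1):
    """Aggregate many crowd-worker ratings into one robust score.

    Individual workers are unreliable, so a trimmed mean discards the
    most extreme ratings from each tail before averaging.
    scores: per-worker ratings on a common scale (e.g., 1-5)
    trim: fraction of ratings to drop from each tail
    """
    s = np.sort(np.asarray(scores, dtype=float))
    k = int(len(s) * trim)
    trimmed = s[k:len(s) - k] if k > 0 else s
    return trimmed.mean()

# e.g., 32 workers rating one video segment on a 5-point scale
ratings = [3, 4, 4, 3, 5, 4, 4, 3, 4, 4, 2, 4, 5, 4, 3, 4,
           4, 4, 3, 4, 5, 4, 4, 1, 4, 4, 3, 4, 4, 5, 4, 4]
print(aggregate_crowd_scores(ratings))
```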
37.3. Feedback

Feedback is the key element of learning: it gives the trainee hints on how to modify their actions to achieve better performance. Feedback in training systems can be implemented in many ways. Providing a trainee with a score as an objective measurement of skill level is often insufficient feedback in training [51]. This may be the reason why the apprenticeship model has survived so long despite its known limitations; the verbal feedback of an experienced observer is extremely hard to substitute.

Interventional procedures are often best described with graphical representations rather than text. Pictures, videos, or real-life demonstrations better illustrate the spatial relationship between anatomy and instruments, or the pace of a procedure. One way to provide trainees with graphical feedback on their performance is through an augmented reality display. This shows trainees perspectives of the surgical scene that cannot be seen by the naked eye or through traditional medical imaging modalities. While this feedback is not quantitative, it facilitates trainees' understanding of their movements in space relative to the anatomy. Augmented reality graphical feedback has been shown to improve trainee performance when provided in real time, especially for percutaneous interventions [28,30]. As augmented reality technologies become increasingly available to consumers, such graphical feedback will become even more effective and accessible.

The next two sections discuss two aspects of performance feedback: first, how to define simple metrics for complex medical procedures, and second, how to establish benchmarks and analyze learning curves.
37.3.1 Feedback in complex procedures

Most surgical and procedural skill assessment systems focus on a single task, like suturing or needle placement. But real-life procedures often involve a series of tasks to be performed before and after the actual intervention. These additional tasks include disinfecting the skin, applying anesthesia, or delivering an injection. Failure to perform any procedural step appropriately defeats the purpose of accurate needle placement.
Therefore, training systems need to be prepared to handle multistep processes, detect whether the steps are performed in the correct order, and give feedback to the trainee on procedural mistakes.

Teaching is most effective when learners realize what specific mistakes they have made and how to correct them. Systems that analyze trainee inputs to assess skill levels without domain-specific knowledge are helpful in assessing trainee performance, but they do not accelerate the learning process and cannot substitute for an experienced trainer [29]. Domain-specific knowledge about each aspect of the intervention may be incorporated into computerized assessment through expert-defined performance metrics. Domain-specific knowledge about the workflow of an intervention may be modeled based on expert consultation using Surgical Process Models (SPM) [34]. When expert-defined performance metrics are used for assessment, each metric acts as a quality indicator for some aspect of the intervention. As a result, these metrics can be translated into feedback using expert domain knowledge [22]. Of course, these metrics must be interpreted in the context of the expert population. In addition, machine learning methods can be applied to provide specific feedback to trainees via OSATS-like ratings [52].

Given a model of the surgical process, motion analysis or video analysis can be used to recognize individual activities during interventions training in real time. This has commonly been achieved using Hidden Markov Models [11,23] to represent the transitions between activities, but methods using Dynamic Time Warping [1] have also been implemented. In recent years, deep learning-based video assessment has become the most popular method [53,26]. By recognizing the ongoing activity, information on how to best complete the activity can be provided in real time to the trainee. Furthermore, the subsequent activity can be automatically predicted [27,16] to provide information on which activity to perform next. In the case where the sequence of activities is linear, like central line catheterization, information about the ongoing activity can be used in real time to remind the trainee of specific workflow mistakes and how to correct them [21] (Fig. 37.2). Finally, at the end of the intervention, this activity recognition may be combined with expert-defined performance metrics to provide specific feedback on how to improve in each phase of the intervention.
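To illustrate the workflow recognition idea, the sketch below runs the standard HMM forward recursion over a toy four-step procedure model. The step names, transition matrix, and emission probabilities are invented for illustration; real systems such as those in [11,23] learn these parameters from recorded procedures.

```python
import numpy as np

# Toy workflow model: states are procedure steps, observations are
# discretized sensor events (three hypothetical event types).
steps = ["prep", "anesthesia", "needle_insertion", "dressing"]

A = np.array([[0.90, 0.10, 0.00, 0.00],   # transition probabilities:
              [0.00, 0.85, 0.15, 0.00],   # mostly stay in a step,
              [0.00, 0.00, 0.90, 0.10],   # sometimes advance to the next
              [0.00, 0.00, 0.00, 1.00]])

B = np.array([[0.7, 0.2, 0.1],            # emission probabilities:
              [0.2, 0.7, 0.1],            # P(observed event | current step)
              [0.1, 0.2, 0.7],
              [0.6, 0.2, 0.2]])

def filter_step(observations):
    """Return the most likely current step after each observation,
    using the HMM forward (filtering) recursion."""
    belief = np.array([1.0, 0.0, 0.0, 0.0])    # procedure starts in "prep"
    estimates = []
    for obs in observations:
        belief = (belief @ A) * B[:, obs]      # predict, then update
        belief /= belief.sum()                 # normalize to probabilities
        estimates.append(steps[int(np.argmax(belief))])
    return estimates

print(filter_step([0, 0, 1, 1, 2, 2, 2, 0]))
```

The same filtering loop, run on live sensor data, is what allows a system to remind the trainee which step they are in and flag out-of-order steps.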
37.3.2 Learning curves and performance benchmarks

It is also important to provide trainees with feedback on their progression towards proficiency. This can be done by monitoring trainees' learning curves over time, including both their overall performance learning curves and their learning curves with respect to each aspect of the intervention. Learning curves can be analyzed by visual inspection of learning curve plots or quantitatively through cumulative sum analysis [10].
Figure 37.2 Training system for central line catheterization. Electromagnetic tracking sensors are complemented by webcam-based object recognition. Procedure steps are automatically recognized and displayed to guide trainees through the early stages of learning.
To interpret one's own learning curve, it must be put into the perspective of expectations. Proficiency benchmarks must be established so that trainees can see how much they need to learn before achieving the expected level of skill in a given training environment. Establishing benchmarks is often a challenging process, because the skill levels of novices and experts are usually too diverse and too far apart. The benchmark may also depend on the stage of training and the training environment. While there exist both participant-centered approaches (e.g., the contrasting groups method) and item-centered approaches (e.g., the Angoff method with domain experts), no standard method for setting performance benchmarks has been established by the community [19]. Regardless of the method used to establish the performance benchmark, it must be justified empirically, and every trainee must meet the benchmark prior to patient encounters. It is good practice to validate benchmarks by parallel measurement of trainee performance by an automatic method and an already accepted expert review; benchmark performance metrics can then be adjusted based on which procedures were labeled as proficient by the expert reviewers.
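As an illustration of the cumulative sum analysis mentioned above, the sketch below computes one common CUSUM formulation over a trainee's sequence of pass/fail attempts. The acceptable and unacceptable failure rates are hypothetical parameters that a training program would set; crossing a decision limit on the resulting curve would trigger review or remediation.

```python
import numpy as np

def cusum_learning_curve(outcomes, p_acceptable=0.05, p_unacceptable=0.15):
    """CUSUM statistic over a sequence of attempts (True = failure).

    The statistic rises on failures and falls on successes (floored at
    zero); a sustained rise suggests the failure rate is closer to the
    unacceptable level than to the acceptable one.
    """
    # Increments derived from the log-likelihood ratio of the two rates
    s_fail = np.log(p_unacceptable / p_acceptable)            # positive
    s_pass = np.log((1 - p_unacceptable) / (1 - p_acceptable))  # negative
    curve, s = [], 0.0
    for failed in outcomes:
        s = max(0.0, s + (s_fail if failed else s_pass))
        curve.append(s)
    return curve

# e.g., a trainee who fails early attempts, then improves
print(cusum_learning_curve([True, True, False, True, False,
                            False, False, False, False, False]))
```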
37.4. Simulated environments

When implementing training environments for medical procedures, the first difficult choice teachers have to make is often how to simulate the patient and the interventional tools. The most realistic environments may be the most expensive, but they may not be the most effective in training. Animal models are often the first choice when practice on human subjects is not safe or feasible. Synthetic models of various fidelity are also commonly used. Recent developments in virtual and augmented reality provide an emerging technology that may replace many more animal and synthetic models in the future. In the next sections, we discuss these options for simulation.
37.4.1 Animal models

Just a few decades ago, it was considered standard practice for medical students to learn their practical skills on patients. Today that is not an accepted learning method, because it conflicts with patient rights and safety. Animals became an alternative learning model, and most medical schools are currently equipped with animal operating rooms and facilities for conducting research on animals. However, the cost of care, animal rights awareness, and long-term planning requirements impose significant limitations on how much procedural training can be performed on animals. It has also been shown that synthetic models offer similar value in training [37]; therefore, today the vast majority of medical interventions training and testing of new methods is done on synthetic phantom models before transitioning into the clinical environment.
37.4.2 Synthetic models

Different kinds of plastic materials are suitable for manufacturing anatomical models for practicing procedures. The most common material in research settings is silicone, because it is simple to handle and does not require special laboratory equipment. Making an entire anatomical region from silicone with realistic visual and tactile properties of different tissues is possible, e.g., for practicing oncoplastic surgery [35].

There are a few cases where silicone cannot be used. These include ultrasound-guided procedures. The ultrasound image is generated under the assumption that echo time is proportional to the depth of tissue boundaries in the body. But the speed of sound in silicone is only around 900 m/s, while it is typically 1540 m/s in human soft tissue. The slow speed of sound delays echoes, causing objects to appear in ultrasound much deeper than they really are. Although some ultrasound machines allow the user to set the speed of sound, silicone also attenuates the ultrasound signal much more than human tissue. Therefore, silicone is only suitable for very superficial ultrasound-guided procedures. Alternatives to silicone for ultrasound-guided procedures include organic materials like gelatine and agar gel, and synthetic materials like polyvinyl chloride and polyvinyl alcohol. Phantom models from organic materials can be made at a lower cost, but they dry out and decay over time. Synthetic plastics typically require more time and special equipment to prepare, but they last longer, and they are reusable for minimally invasive procedures like needle insertions.

Recent improvements in rapid prototyping technology and the availability of low-cost 3D printers allow researchers to efficiently design and print mold shapes for soft tissue models. 3D-printed bone models are also commonly used, and they can be combined with soft tissue models (Fig. 37.3). The best synthetic models work with multiple modalities, like ultrasound and multispectral imaging, while providing realistic tactile feedback.
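A short worked example quantifies the depth distortion described above, using the speed-of-sound values from the text; the 20 mm reflector depth is an arbitrary choice for illustration.

```python
# Apparent depth of a reflector in a silicone phantom when the scanner
# assumes soft-tissue speed of sound (values from the text above).
c_tissue = 1540.0    # m/s, speed assumed by the ultrasound machine
c_silicone = 900.0   # m/s, actual speed of sound in the silicone phantom

true_depth_mm = 20.0
echo_time = 2 * (true_depth_mm / 1000) / c_silicone   # round-trip time, s
apparent_depth_mm = c_tissue * echo_time / 2 * 1000   # machine's estimate

print(f"{apparent_depth_mm:.1f} mm")   # about 34.2 mm, i.e., ~1.7x too deep
```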
Figure 37.3 3D-printed lumbar spine model that can be filled with soft gel for simulated lumbar puncture procedures. The plastic tube can be filled with water to simulate cerebrospinal fluid.
Creating anatomical models often requires the combination of various software and hardware tools. The rapidly developing field of 3-dimensional design requires continuous self-teaching from researchers to be able to effectively exploit these new technologies. Fortunately, there is a selection of resources available for design procedures, and often an abundance of community-created training materials on video sharing sites and other websites. Much design software is free, or free for academic use.

The design process typically starts with a conventional volumetric medical imaging modality, like a computed tomography (CT) or magnetic resonance imaging (MRI) scan of the region. Volumetric images of anatomical regions are first segmented to create organ shapes. Although automatic segmentation methods are available both in the research literature and in open-source and commercial software products, it is best to use manual segmentation for creating training models, because these models require high accuracy and anatomical correctness. After segmentation, the anatomical surface models may be combined with parts designed in mechanical engineering applications, and finally translated to instruction sets for 3D printers (Fig. 37.4). Manufacturing should be considered from the first step of the design process. Anatomical images are best rotated before segmentation so that their main axes align with the 3D printer axes at the end of the design process. Complex shapes are best prepared in parts that can be printed without much support material. Good models often require many iterations of testing on prototypes, which takes considerable time.
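A minimal sketch of the middle of this pipeline, turning a binary segmentation volume into an STL surface ready for CAD work or 3D printing, can be written with the open-source scikit-image and numpy-stl packages. It assumes the segmentation is already available as a NumPy array; production pipelines would typically also smooth and decimate the surface.

```python
import numpy as np
from skimage import measure
from stl import mesh   # provided by the numpy-stl package

def label_volume_to_stl(label_volume, spacing, out_path):
    """Convert a binary segmentation to an STL surface model.

    label_volume: 3D numpy array, 1 inside the organ and 0 outside
    spacing: voxel size in mm along each axis, e.g., (0.5, 0.5, 1.0)
    out_path: output STL file path
    """
    # Extract the boundary surface between 0 and 1 voxels
    verts, faces, _, _ = measure.marching_cubes(
        label_volume.astype(np.float32), level=0.5, spacing=spacing)
    # Pack the triangles into an STL mesh and save it
    surface = mesh.Mesh(np.zeros(faces.shape[0], dtype=mesh.Mesh.dtype))
    surface.vectors[:] = verts[faces]   # (n_faces, 3, 3) triangle vertices
    surface.save(out_path)
```

The resulting STL file can then be imported into a CAD application, combined with designed parts such as mold walls, and sent to the 3D printer.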
37.4.3 Box trainers

Simplified environments can be built for practicing specific procedures or skills. A popular training environment is the laparoscopic box trainer, a simple design with holes to hold trocars for practicing tasks and skills on various synthetic models (Fig. 37.5). Similar trainers exist for other specialties, like colonoscopy or ureteroscopy. These trainers typically have low fidelity, and the procedures performed with them have little in common with clinical procedures.
Figure 37.4 Typical design process of anatomical phantom models. Anatomical shapes are segmented, and imported into professional computer-assisted design (CAD) software. Final models are translated to 3D printer instruction sets.
Figure 37.5 Box trainer for laparoscopic surgery skills.
But they teach exactly those skills, like hand–eye coordination, that can easily be complemented with anatomy and pathology knowledge to be applied effectively in clinical procedures later. The popularity of box trainers proves that fidelity is not the most important aspect of medical procedure simulators. It is more important to find the critical tasks and challenging features that need the most practice in a procedure. If a simple simulated environment can capture those features, then adding more realism may not improve the effectiveness of the training system.
Figure 37.6 A medical student reviews his procedure using a virtual reality display after practicing ultrasound-guided needle targeting. Only the virtual reality display is visible to the student; this picture combines the virtual reality view with a photograph for illustration.
37.4.4 Virtual reality

When the environment is hard to simulate with either biological or synthetic materials, virtual reality environments may help. Virtual reality head-mounted displays have become affordable and can be driven by personal computers; even high-end cell phones are capable of rendering virtual reality scenes. Consumer virtual reality will probably always have its technical limitations, and tactile feedback is often not part of these systems. But head-mounted displays are a great adjunct or alternative to simulated environments in some procedures. Entire operating rooms or emergency scenarios could be shown in virtual reality to improve realism and simulate stress factors when practicing on simulated models. Haptic interfaces are also added to high-end simulators, but current haptics technology can only simulate rigid tools like laparoscopic instruments and needles. Simulating tactile sensation for the hands would be important for surgical training, and hopefully future technology will make it available.

Virtual reality also offers visualization overlaid on physical models, which helps trainees understand anatomical relations and hand–eye coordination. Current consumer augmented reality displays lack the spatial registration accuracy required for the guidance of precise surgical tools, but they are already useful for giving trainees a more understandable picture of spatial relationships than they could ever experience without virtual reality (Fig. 37.6). The rapid development of virtual and augmented reality technologies will inevitably result in their wider use in medical training. Creating virtual reality educational content requires special skills, including artistic ones. Hiring professional modelers is often beyond the budget of research projects, but many models are already freely available online for fantasy and hobby projects, and hopefully models for medical education will also be shared in larger numbers in the future.
37.5. Shared resources

Shared software is key to productive collaboration between researchers in computer-assisted intervention training. Training systems typically include components for sensing trainee actions, data processing for performance evaluation, computer graphics for feedback visualization, and other smaller components. Development of such a system from scratch takes excessive resources away from research. In addition, independent software tools make the results of different research teams harder to compare. Shared software enables effortless data sharing and the reuse of earlier results by other teams.

Open-source initiatives have been the most successful models for creating research software platforms. Visualization of 3-dimensional graphics is almost exclusively performed with the open-source Visualization Toolkit (VTK, www.vtk.org) in both research and commercial projects. It would be inconceivable today for a researcher to have the time to become familiar with computer graphics at the level that VTK provides for free. Other components of intervention training systems are also provided by open-source software. Communication with hardware devices such as tracking sensors or imaging systems is another major task that requires significant technical background. The PLUS toolkit (www.PlusToolkit.org) implements communication interfaces with a wide range of devices, even complete surgical navigation systems. Data acquired from devices are stored in a standardized format and provided in real time to user interface applications. Typically, a PLUS server runs in the background and applications receive real-time data through the OpenIGTLink (www.OpenIgtLink.org) network protocol [47]. Applications implementing graphical user interfaces only need to integrate OpenIGTLink to be able to communicate with hundreds of devices and to benefit from the data filtering, synchronization, and calibration features of PLUS.

User interface applications are also typically built on an application platform. The most popular open-source application platforms implementing OpenIGTLink are 3D Slicer (www.slicer.org) [14] and MITK (www.mitk.org) [33]. 3D Slicer provides an example of incremental development towards specific applications. It was originally developed as an advanced viewer for medical images. Adding OpenIGTLink to 3D Slicer allowed the development of the SlicerIGT extension (www.SlicerIGT.org) for using real-time tracking and imaging data [49]. Algorithms specifically implemented for training medical interventions were added as the Perk Tutor extension (www.PerkTutor.org) [50]. When a new training application is developed for a specific medical intervention, e.g., for central line catheterization, only minimal coding is required due to the multitude of software features already implemented in the underlying software toolkits [21,56]. The proportions of code in the different software architectural layers show that, with proper use of open-source platforms, researchers can focus their time on just the critical parts of their applications (Fig. 37.7).
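As an example of how little code a training application needs for device communication in this ecosystem, the sketch below receives real-time tracking transforms from a PLUS server through OpenIGTLink using the open-source pyigtl package. The device name "StylusToReference" is a hypothetical transform name that would be defined in the PLUS configuration, and the port is the common OpenIGTLink default; both are assumptions, not prescribed values.

```python
import pyigtl  # open-source Python OpenIGTLink client

# Connect to a PLUS server running on the local machine
client = pyigtl.OpenIGTLinkClient(host="127.0.0.1", port=18944)

# Receive and print tool-tracking transforms as they arrive
while True:
    message = client.wait_for_message("StylusToReference", timeout=5)
    if message is None:
        break   # no data within the timeout; stop listening
    print(message.matrix)   # 4x4 homogeneous tool-to-reference transform
```

All device-specific configuration, filtering, and calibration stays on the PLUS side, so the training application itself remains a thin layer on top of the shared toolkits.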
Figure 37.7 Proportions in lines of software code in a typical training simulator for central line insertion.
Shared software tools are a great step towards productive collaboration between research groups. But data is becoming increasingly important too, especially with the data-demanding machine learning algorithms gaining popularity. Shared data can be used to fairly compare different algorithms with each other, and to create benchmark levels in training [2].
37.6. Summary

There is high demand for better education in surgical and interventional skills in medicine. Computerized training methods have been developed, but they are not yet the standard method of education worldwide. Future research needs more focus on objective measurement of competency, on creating and analyzing shared databases, and on building shared platforms to facilitate wider dissemination of research outcomes. Only synchronized efforts and multicenter research studies can provide enough evidence to convince medical educators, on a global level, of the benefits of computer-assisted training methods. These studies require large international trainee cohorts and many years of follow-up. Primary outcomes need to focus directly on the reduction of clinical complications and improvement in patient outcomes. If researchers take advantage of new technologies and collaborate on shared platforms, their work can achieve these ambitious goals. Good research practices and timely dissemination of knowledge will change the daily practice of medical education to the benefit of healthcare on a global scale.
References [1] S.A. Ahmadi, T. Sielhorst, R. Stauder, M. Horn, H. Feussner, N. Navab, Recovery of surgical workflow without explicit models, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Berlin, Heidelberg, Oct. 2006, pp. 420–428.
[2] N. Ahmidi, L. Tao, S. Sefati, Y. Gao, C. Lea, B.B. Haro, L. Zappella, S. Khudanpur, R. Vidal, G.D. Hager, A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery, IEEE Transactions on Biomedical Engineering 64 (9) (Sept. 2017) 2025–2041. [3] American College of Surgeons, ACS/APDS Surgery Resident Skills Curriculum, retrieved from http://www.facs.org/education/surgicalskills.html, July 2018. [4] J.D. Beard, Assessment of surgical competence, British Journal of Surgery 94 (11) (Nov. 2007) 1315–1316. [5] C.H. Bisgaard, S.L.M. Rubak, S.A. Rodt, J.A.K. Petersen, P. Musaeus, The effects of graduate competency-based education and mastery learning on patient care and return on investment: a narrative review of basic anesthetic procedures, BMC Medical Education 18 (1) (June 2018) 154. [6] M.A. Corvetto, J.C. Pedemonte, D. Varas, C. Fuentes, F.R. Altermatt, Simulation-based training program with deliberate practice for ultrasound-guided jugular central venous catheter placement, Acta Anaesthesiologica Scandinavica 61 (9) (Oct. 2017) 1184–1191. [7] A.L. D’Angelo, D.N. Rutherford, R.D. Ray, S. Laufer, C. Kwan, E.R. Cohen, A. Mason, C.M. Pugh, Idle time: an underdeveloped performance metric for assessing surgical skill, The American Journal of Surgery 209 (4) (Apr. 2015) 645–651. [8] V. Datta, S. Mackay, M. Mandalia, A. Darzi, The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model, Journal of the American College of Surgeons 193 (5) (Nov. 2001) 479–485. [9] C.R. Davis, E.C. Toll, A.S. Bates, M.D. Cole, F.C. Smith, Surgical and procedural skills training at medical school–a national review, International Journal of Surgery 12 (8) (Aug. 2014) 877–882. [10] G.R. de Oliveira Filho, P.E. Helayel, D.B. da Conceição, I.S. Garzel, P. Pavei, M.S. Ceccon, Learning curves and mathematical models for interventional ultrasound basic skills, Anesthesia and Analgesia 106 (2) (Feb. 2008) 568–573. [11] A. Dosis, F. Bello, D. Gillies, S. Undre, R. Aggarwal, A. Darzi, Laparoscopic task recognition using hidden Markov models, Studies in Health Technology and Informatics 111 (2005) 115–122. [12] M. Ershad, R. Rege, A.M. Fey, Meaningful assessment of robotic surgical style using the wisdom of crowds, International Journal of Computer Assisted Radiology and Surgery 24 (Mar. 2018) 1–2. [13] H.I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P.A. Muller, Evaluating surgical skills from kinematic data using convolutional neural networks, in: A. Frangi, J. Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, MICCAI 2018, in: Lecture Notes in Computer Science, vol. 11073, Springer, Cham, 2018, pp. 214–221. [14] A. Fedorov, R. Beichel, J. Kalpathy-Cramer, J. Finet, J.C. Fillion-Robin, S. Pujol, C. Bauer, D. Jennings, F. Fennessy, M. Sonka, J. Buatti, S. Aylward, J.V. Miller, S. Pieper, R. Kikinis, 3D Slicer as an image computing platform for the Quantitative Imaging Network, Magnetic Resonance Imaging 30 (9) (Nov. 2012) 1323–1341. [15] M. Feldman, Rater training to support high-stakes simulation-based assessments, Journal of Continuing Education in the Health Professions 32 (4) (2012) 279–286. [16] G. Forestier, F. Petitjean, L. Riffaud, P. Jannin, Automatic matching of surgeries to predict surgeons’ next actions, Artificial Intelligence in Medicine 81 (Sept. 2017) 3–11. [17] S. Ganni, S.M. Botden, M. Chmarra, R.H. Goossens, J.J. 
Jakimowicz, A software-based tool for video motion tracking in the surgical skills assessment landscape, Surgical Endoscopy 32 (6) (June 2018) 2994–2999. [18] I. Ghaderi, F. Manji, Y.S. Park, D. Juul, M. Ott, I. Harris, T.M. Farrell, Technical skills assessment toolbox: a review using the unitary framework of validity, Annals of Surgery 261 (2) (Feb. 2015) 251–262. [19] M.G. Goldenberg, A. Garbens, P. Szasz, T. Hauer, T.P. Grantcharov, Systematic review to establish absolute standards for technical performance in surgery, British Journal of Surgery 104 (1) (Jan. 2017) 13–21.
[20] M.G. Goldenberg, J. Nabhani, C.J. Wallis, S. Chopra, A.J. Hung, A. Schuckman, H. Djaladat, S. Daneshmand, M.M. Desai, M. Aron, I.S. Gill, Feasibility of expert and crowd-sourced review of intraoperative video for quality improvement of intracorporeal urinary diversion during robotic radical cystectomy, Canadian Urological Association Journal 11 (10) (Oct. 2017) 331. [21] R. Hisey, T. Ungi, M. Holden, Z. Baum, Z. Keri, C. McCallum, D.W. Howes, G. Fichtinger, Real-time workflow detection using webcam video for providing real-time feedback in central venous catheterization training, in: Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10576, International Society for Optics and Photonics, Mar. 2018, 1057620. [22] M.S. Holden, Z. Keri, T. Ungi, G. Fichtinger, Overall proficiency assessment in point-of-care ultrasound interventions: the stopwatch is not enough, in: Imaging for Patient-Customized Simulations and Systems for Point-of-Care Ultrasound, vol. 14, Springer, Cham, Sept. 2017, pp. 146–153. [23] M.S. Holden, T. Ungi, D. Sargent, R.C. McGraw, E.C. Chen, S. Ganapathy, T.M. Peters, G. Fichtinger, Feasibility of real-time workflow segmentation for tracked needle interventions, IEEE Transactions on Biomedical Engineering 61 (6) (June 2014) 1720–1728. [24] J.J. Isaacson, A.S. Stacy, Rubrics for clinical evaluation: objectifying the subjective experience, Nurse Education in Practice 9 (2) (Mar. 2009) 134–140. [25] A. Jin, S. Yeung, J. Jopling, J. Krause, D. Azagury, A. Milstein, L. Fei-Fei, Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, Mar. 2018, pp. 691–699. [26] Y. Jin, Q. Dou, H. Chen, L. Yu, J. Qin, C.W. Fu, P.A. Heng, SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network, IEEE Transactions on Medical Imaging 37 (5) (May 2018) 1114–1126. [27] D. Katić, A.L. Wekerle, F. Gärtner, H. Kenngott, B.P. Müller-Stich, R. Dillmann, S. Speidel, Ontology-based prediction of surgical events in laparoscopic surgery, in: Medical Imaging 2013: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 8671, International Society for Optics and Photonics, Mar. 2013, 86711A. [28] Z. Keri, D. Sydor, T. Ungi, M.S. Holden, R. McGraw, P. Mousavi, D.P. Borschneck, G. Fichtinger, M. Jaeger, Computerized training system for ultrasound-guided lumbar puncture on abnormal spine models: a randomized controlled trial, Canadian Journal of Anesthesia (Journal canadien d'anesthésie) 62 (7) (July 2015) 777–784. [29] A. Khan, S. Mellor, E. Berlin, R. Thompson, R. McNaney, P. Olivier, T. Plötz, Beyond activity recognition: skill assessment from accelerometer data, in: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ACM, Sept. 2015, pp. 1155–1166. [30] E.J. Kim, J. Min, J. Song, K. Song, J.H. Song, H.J. Byon, The effect of electromagnetic guidance system on early learning curve of ultrasound for novices, Korean Journal of Anesthesiology 69 (1) (Feb. 2016) 15–20. [31] T.S. Kim, M. O'Brien, S. Zafar, G.D. Hager, S. Sikder, S.S. Vedula, Objective assessment of intraoperative technical skill in capsulorhexis using videos of cataract surgery, International Journal of Computer Assisted Radiology and Surgery 14 (6) (June 2019) 1097–1105, https://doi.org/10.1007/s11548-019-01956-8. [32] G.S. Kirby, P. Guyver, L. Strickland, A. Alvand, G.Z. Yang, C.
Hargrove, B.P. Lo, J.L. Rees, Assessing arthroscopic skills using wireless elbow-worn motion sensors, The Journal of Bone and Joint Surgery 97 (13) (July 2015) 1119–1127. [33] M. Klemm, T. Kirchner, J. Gröhl, D. Cheray, M. Nolden, A. Seitel, H. Hoppe, L. Maier-Hein, A.M. Franz, MITK-OpenIGTLink for combining open-source toolkits in real-time computer-assisted interventions, International Journal of Computer Assisted Radiology and Surgery 12 (3) (Mar. 2017) 351–361. [34] F. Lalys, P. Jannin, Surgical process modelling: a review, International Journal of Computer Assisted Radiology and Surgery 9 (3) (May 2014) 495–511.
[35] D.R. Leff, G. Petrou, S. Mavroveli, M. Bersihand, D. Cocker, R. Al-Mufti, D.J. Hadjiminas, A. Darzi, G.B. Hanna, Validation of an oncoplastic breast simulator for assessment of technical skills in wide local excision, British Journal of Surgery 103 (3) (Feb. 2016) 207–217. [36] T.S. Lendvay, K.R. Ghani, J.O. Peabody, S. Linsell, D.C. Miller, B. Comstock, Is crowdsourcing surgical skill assessment reliable? An analysis of robotic prostatectomies, Journal of Urology 197 (4) (Apr. 2017) E890–E891. [37] J.A. Martin, G. Regehr, R. Reznick, H. Macrae, J. Murnaghan, C. Hutchison, M. Brown, Objective structured assessment of technical skill (OSATS) for surgical residents, British Journal of Surgery 84 (2) (Feb. 1997) 273–278. [38] B. Miller, D. Azari, R. Radwin, B. Le, MP01-07 use of computer vision motion analysis to aid in surgical skill assessment of suturing tasks, The Journal of Urology 199 (4) (2018) e4. [39] K. Moorthy, Y. Munz, S.K. Sarker, A. Darzi, Objective assessment of technical skills in surgery, BMJ 327 (7422) (Nov. 2003) 1032–1037. [40] M. Morris, A. O’Neill, A. Gillis, S. Charania, J. Fitzpatrick, A. Redmond, S. Rosli, P.F. Ridgway, Prepared for practice? Interns’ experiences of undergraduate clinical skills training in Ireland, Journal of Medical Education and Curricular Development 3 (2016), JMECD-S39381. [41] V.N. Palter, T.P. Grantcharov, A prospective study demonstrating the reliability and validity of two procedure-specific evaluation tools to assess operative competence in laparoscopic colorectal surgery, Surgical Endoscopy 26 (9) (Sep. 2012) 2489–2503. [42] G. Regehr, H. MacRae, R.K. Reznick, D. Szalay, Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination, Academic Medicine 73 (9) (Sept. 1998) 993–997. [43] R.K. Reznick, H. MacRae, Teaching surgical skills—changes in the wind, The New England Journal of Medicine 355 (25) (Dec. 2006) 2664–2669. [44] J.M. Sawyer, N.E. Anton, J.R. Korndorffer Jr., C.G. DuCoin, D. Stefanidis, Time crunch: increasing the efficiency of assessment of technical surgical skill via brief video clips, Surgery 163 (4) (Apr. 2018) 933–937. [45] L. Sbernini, L.R. Quitadamo, F. Riillo, N. Di Lorenzo, A.L. Gaspari, G. Saggio, Sensory-glove-based open surgery skill evaluation, IEEE Transactions on Human-Machine Systems 48 (2) (Apr. 2018) 213–218. [46] C.C. Schmitz, D. DaRosa, M.E. Sullivan, S. Meyerson, K. Yoshida, J.R. Korndorffer Jr, Development and verification of a taxonomy of assessment metrics for surgical technical skills, Academic Medicine 89 (1) (Jan. 2014) 153–161. [47] J. Tokuda, G.S. Fischer, X. Papademetris, Z. Yaniv, L. Ibanez, P. Cheng, H. Liu, J. Blevins, J. Arata, A.J. Golby, T. Kapur, OpenIGTLink: an open network protocol for image-guided therapy environment, The International Journal of Medical Robotics and Computer Assisted Surgery 5 (4) (Dec. 2009) 423–434. [48] A. Toprak, U. Luhanga, S. Jones, A. Winthrop, L. McEwen, Validation of a novel intraoperative assessment tool: the surgical procedure feedback rubric, The American Journal of Surgery 211 (2) (Feb. 2016) 369–376. [49] T. Ungi, A. Lasso, G. Fichtinger, Open-source platforms for navigated image-guided interventions, Medical Image Analysis 33 (Oct. 2016) 181–186. [50] T. Ungi, D. Sargent, E. Moult, A. Lasso, C. Pinter, R.C. McGraw, G. Fichtinger, Perk Tutor: an opensource training platform for ultrasound-guided needle insertions, IEEE Transactions on Biomedical Engineering 59 (12) (Dec. 
2012) 3475–3481. [51] M.C. Porte, G. Xeroulis, R.K. Reznick, A. Dubrowski, Verbal feedback from an expert is more effective than self-accessed feedback about motion efficiency in learning new surgical skills, The American Journal of Surgery 193 (1) (Jan. 2007) 105–110.
[52] Y. Sharma, T. Plötz, N. Hammerld, S. Mellor, R. McNaney, P. Olivier, S. Deshmukh, A. McCaskie, I. Essa, Automated surgical OSATS prediction from videos, in: Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on, IEEE, Apr. 2014, pp. 461–464. [53] A.P. Twinanda, S. Shehata, D. Mutter, J. Marescaux, M. De Mathelin, N. Padoy, Endonet: a deep architecture for recognition tasks on laparoscopic videos, IEEE Transactions on Medical Imaging 36 (1) (Jan. 2017) 86–97. [54] M.C. Vassiliou, L.S. Feldman, C.G. Andrew, S. Bergman, K. Leffondré, D. Stanbridge, G.M. Fried, A global assessment tool for evaluation of intraoperative laparoscopic skills, The American Journal of Surgery 190 (1) (July 2005) 107–113. [55] J.D. Westwood, H.M. Hoffman, D. Stredney, S.J. Weghorst, Validation of virtual reality to teach and assess psychomotor skills in laparoscopic surgery: results from randomised controlled studies using the MIST VR laparoscopic simulator, in: Medicine Meets Virtual Reality: Art, Science, Technology: Healthcare and Evolution, 1998, p. 124. [56] S. Xia, Z. Keri, M.S. Holden, R. Hisey, H. Lia, T. Ungi, C.H. Mitchell, G. Fichtinger, A learning curve analysis of ultrasound-guided in-plane and out-of-plane vascular access training with Perk Tutor, in: Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10576, International Society for Optics and Photonics, Mar. 2018, 1057625. [57] B. Zevin, E.M. Bonrath, R. Aggarwal, N.J. Dedy, N. Ahmed, T.P. Grantcharov, Development, feasibility, validity, and reliability of a scale for objective assessment of operative performance in laparoscopic gastric bypass surgery, Journal of the American College of Surgeons 216 (5) (May 2013) 955–965. [58] A. Zia, Y. Sharma, V. Bettadapura, E.L. Sarin, I. Essa, Video and accelerometer-based motion analysis for automated surgical skills assessment, International Journal of Computer Assisted Radiology and Surgery 13 (3) (Mar. 2018) 443–455.