Accepted Manuscript

Validation of a Novel Cognitive Simulator for Orbital Floor Reconstruction

Renata Khelemsky, DDS, MD, Resident; Brianna Hill, BA, Medical Student; Daniel Buchbinder, DMD, MD, Chairman

PII: S0278-2391(16)31208-3
DOI: 10.1016/j.joms.2016.11.027
Reference: YJOMS 57560
To appear in: Journal of Oral and Maxillofacial Surgery
Received Date: 30 August 2016
Revised Date: 20 November 2016
Accepted Date: 24 November 2016

Please cite this article as: Khelemsky R, Hill B, Buchbinder D, Validation of a Novel Cognitive Simulator for Orbital Floor Reconstruction, Journal of Oral and Maxillofacial Surgery (2017), doi: 10.1016/j.joms.2016.11.027.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Title: Validation of a Novel Cognitive Simulator for Orbital Floor Reconstruction

Authors: Renata Khelemsky, DDS MD£; Brianna Hill, BA¥; Daniel Buchbinder, DMD MD€

£ Resident, Department of Otolaryngology-Head and Neck Surgery, Division of Oral & Maxillofacial Surgery, Mount Sinai Beth Israel and Icahn School of Medicine at Mount Sinai, New York, NY.

¥ Medical Student, The George Washington University School of Medicine and Health Sciences.

€ Chairman, Department of Otolaryngology-Head and Neck Surgery, Division of Oral & Maxillofacial Surgery, Mount Sinai Beth Israel and Icahn School of Medicine at Mount Sinai, New York, NY.

Corresponding author:
Renata Khelemsky
10 Union Square E, Suite 5B
New York, NY 10003
Fax: (212) 844-6975
Email: [email protected]

Conflict of interest: None

Financial Disclosures: None

Acknowledgements: The authors would like to acknowledge Ali Bahsoun, Jean Nehme, and Andre Chow of Touch Surgery™ for their help with data acquisition and unwavering cooperation in the methodology of this study, as well as Jennifer Reingle Gonzalez, PhD, for her chief statistical support in analyzing the data.
Abstract

Purpose: The increasing focus on patient safety in current medical practice has promoted the development of surgical simulation technology in the form of virtual reality (VR) training, designed largely to improve technical skills and, to a lesser extent, non-technical aspects of surgery such as decision-making and material knowledge. The present study investigates the validity of a novel cognitive VR simulator, Touch Surgery™, for a core maxillofacial surgical procedure: orbital floor reconstruction (OFR).

Methods: A cross-sectional study was carried out on two groups of participants with differing experience levels. Novice graduate dental students and expert surgeons were recruited from a local dental school and academic residency programs, respectively. All participants completed the OFR module on Touch Surgery™. The primary outcome variable was simulator performance score. Post-module questionnaires rating specific aspects of the simulation experience were completed by both groups and served as the secondary outcome variables. The age and sex of participants were considered additional predictor variables. From these data, conclusions were drawn regarding three types of validity (face, content, and construct) for the Touch Surgery™ simulator. Dependent-samples t-tests were used to explore the consistency of simulation performance scores across Phases 1 and 2 by experience level. Two multivariate ordinary least squares regression models were fit to estimate the relationship between experience and Phase 1 and 2 scores.

Results: A total of 39 novices and 10 experts, all naïve to Touch Surgery™, were included in the study. Experts outperformed novices on both phases of the OFR module (p<0.001), which provided the measure of construct validation. Responses to the questionnaire items used to assess face validity were favorable from both groups. Positive questionnaire responses were also recorded from experts alone on items assessing the content validity of the module. Participant age and sex were not significant predictors of performance scores.

Conclusion: Construct, content, and face validity were observed for the OFR module on a novel cognitive simulator, Touch Surgery™. OFR simulation on the smart-device platform could therefore serve as a useful cognitive training and assessment tool in maxillofacial surgery residency programs.
Introduction

Traditionally, a surgical procedure like orbital floor reconstruction (OFR) would be learned by textbook reading and repeated observation, followed by supervised practice in a surgical residency program.1 However, changes in medical economics and residency standards over the past decade have focused on reduced resident work hours, shortened hospital stays, outpatient procedures, and minimally invasive techniques.2,3 While oral and maxillofacial surgery (OMS) training does not fall under the mandates of the Accreditation Council for Graduate Medical Education (ACGME), the initial 80-hour workweek conditions set forth by the 1989 Bell Commission were extended to OMS residents in New York State.4 A significant portion of OMS training also takes place alongside other surgical fields in a hospital setting, which realistically implicates OMS in the pool of training issues. Relatively recent ACGME initiatives to protect resident wellbeing and patient safety have led to objective improvements in clinical outcomes, leaving OMS residencies free to adopt restricted work hours, whether for consistency with other surgical departments or for their own reasons.4-6 This climate of heightened safety has also affected conventional training methods in surgical residencies.7 For a surgical trainee, the increasing emphasis on patient safety may be directly at odds with the shrinking number of opportunities to observe and practice surgery.2,8,9 Moreover, competency-based evaluations and standardized assessments of residents have increased, adding to the demand for more standardized practice opportunities.10
Orbital floor fractures are frequently the result of sports injuries, traffic accidents, falls, and interpersonal violence-related facial trauma.11 Given the complex anatomy of the bony orbit and the orbit's role in the support and function of the globe, repair of this facial injury demands keen surgical ability.12,13 Fractures of the orbital floor can result in bony defects that require reconstruction rather than reduction and fixation, placing both the function and the aesthetic appearance of the eye at risk. It is therefore essential that repair be undertaken by highly skilled and knowledgeable surgeons to decrease the risk of complications such as globe malposition, diplopia, corneal injury, retrobulbar hematoma, and blindness.13,14
Surgical simulation as an educational tool has somewhat abated the concern surrounding the shortage of experience-based opportunities for surgical trainees. That experience is a prerequisite of expertise is well known; the notion of the "right" type of experience, aimed at improving performance, can potentially be fostered with virtual reality (VR) simulation.15 VR simulation combines several positive educational components of experience, such as repetition, formative feedback, and immediate review of outcomes, in a safe, controlled, and self-directed setting that is ideal for adult learning.16-18 VR simulation has been implemented in other fields, such as aviation, as a tool to expedite mastery without risking safety.1 Surgical simulation has focused on improving technical skills with the idea that the early learning curve can be completed on a virtual trainer rather than a live patient. Indeed, a randomized double-blinded study demonstrated reduced error rates during laparoscopic surgery for trainees who learned the procedure with simulation compared with those who used traditional methods.19 Still, cognitive qualities such as automated problem-solving processes and a rich professional memory are key to surgical excellence, and thus represent an important first phase of skill development.10,20
For the trainee learning from an expert surgeon, it can be a challenge to learn each minute step involved in a procedure, partly because experts tend not to articulate the unconscious, automated knowledge they recall and apply during surgery.21 To accumulate hands-on practice outside the operating room, maxillofacial residents have resorted to cadaver labs and benchtop bone model exercises, which are difficult to access, expensive, and lead to highly variable experiences.22 The use of more advanced simulation technologies is no longer novel to the practice of craniomaxillofacial surgery and has already been demonstrated with tools like virtual preoperative planning, haptics-enabled temporal bone drilling, and high-fidelity bimanual neurosurgery simulators.23-25 Moreover, adult learners benefit from self-directed engagement with a task,19 such as that provided by VR simulation, wherein residents engage in deliberate practice prior to encountering a live patient. Simulation research has consequently focused on technical training, with evidence of improved clinical performance following simulator exposure.26-29 Simulation-based training focusing on the cognitive, decision-making aspects of surgery, however, is relatively new and largely underutilized within head and neck surgical curricula.20,21
Touch Surgery™ is a novel, readily available, free application (Kinosis Limited, London, UK) designed for smart devices (ie, smartphones and tablets) and intended to teach surgical procedures by utilizing cognitive task analysis (CTA) theory, which holds that a performable task consists of a series of basic cognitive operations. This theory is framed on understanding how an experienced operator executes complex tasks during surgery. Such tasks require both controlled (conscious) and automated (unconscious) knowledge and belong to a three-part system wherein skill-based, rule-based, and knowledge-based behaviors create the sum of intraoperative decision-making.21 The Touch Surgery™ platform is a hybrid of CTA and VR simulation whereby trainees interact with 3D animation to rehearse a given surgery in a stepwise fashion, gaining access to the necessary tools, anatomical landmarks, and pitfalls of the chosen surgical procedure.
Surgical education research enforces rigorous validation testing and methodologies for any novel assessment tool in order to examine its effectiveness in simulation-based training.30 The most commonly encountered measures are face and content validity: the former describes the apparent appropriateness of the tool, while the latter, assessed by a group of experts, describes whether the tool's integral makeup measures what it is supposed to measure. A third benchmark of validation research is construct validity, defined as the extent to which a tool can measure variations in performance between subjects having different levels of a defined construct.30,31 Previous validation studies for Touch Surgery™ have been conducted in orthopedic surgery.32,33
The purpose of this study was to perform a validation study of the OFR module on the Touch Surgery™ platform and to assess the utility of this novel cognitive simulator for residency curricula applications such as training and assessment. The specific aims of the study were to measure objective Touch Surgery™ performance metrics in groups of novice and expert participants with respect to their experience level in performing an OFR procedure, enabling a measure of construct validity. Additional aims were to collect subjective questionnaire responses regarding the acceptability of the module from both groups, and the accuracy of the module from the experts alone, enabling measures of face and content validity, respectively. The investigators hypothesized that this novel cognitive trainer would demonstrate all three forms of validity and thus have a purposeful role in residency curriculum development and competency-based training. This is the first validation study for Touch Surgery™ in the scope of maxillofacial surgery.

Methods

Study Design & Sample

To address the research purpose, the investigators designed and implemented a cross-sectional study. The study population was composed of novice graduate dental students from Columbia University College of Dental Medicine and local expert surgeons in the greater New York City region. Both groups of participants were recruited by voluntary enrollment through email invitations in June 2016. To be included in the study sample, dental students were required to have completed a comprehensive head and neck anatomy course. Students were excluded from the novice group if they had previously observed or participated in an OFR, had prior exposure to Touch Surgery™, or did not complete both Phases 1 and 2 of the OFR module. To be considered an expert in this study, surgeons were required to perform OFR surgery independently. Exclusion criteria for expert participants were prior exposure to Touch Surgery™ and failure to complete both Phases 1 and 2 of the OFR module. No participant from either group was involved in the development or authorship of the module.
Variables

The primary predictor variable in this study was experience level, as defined in the inclusion criteria for novice and expert participants. The primary outcome variable was simulator performance score, calculated out of a maximum of 100 points. Secondary outcome variables comprised answers to post-module questionnaire items addressing the face validity (for both groups) and content validity (for experts only) of the simulator. Additional variables that may have affected the outcomes of interest, and were therefore also considered, were the age and sex of participants.

Data Collection Methods
Enrollment and Location – Due to the educational nature of the study and the de-identification of all enrolled subjects, an exemption was obtained from the Institutional Review Board of the Icahn School of Medicine at Mount Sinai. Novices were recruited by voluntary open enrollment through email invitations sent to the Columbia University College of Dental Medicine's second-, third-, and fourth-year classes. Experts were recruited by similar email invitations sent to local New York City training programs in both OMS and otolaryngology departments. A description of the study purpose and design was provided; however, because of the defined exclusion criteria, the name of the software application was not disclosed. A dedicated testing site, a quiet room in the graduate school's library, was reserved for the novice group, and makeup sessions took place in the same library under the same conditions. Experts who enrolled were asked to arrange a time and quiet location of their choice in which to complete the module. A research investigator observed all novices and experts during their participation to ensure basic compliance and study integrity.
Touch Surgery Simulator – On the day of the study, participants downloaded the free Touch Surgery™ app onto their own smart devices, which included Apple iPads and iPhones (Apple, Cupertino, CA, USA) and Android phones. Participants randomly selected a paper card with login information from a box of pre-generated email addresses with which to register the software. The investigators provided chargers and a backup Apple iPad for anyone whose device malfunctioned during the study. Free wireless access was provided for participants. The OFR module was authored by Dr. Khelemsky and Dr. Buchbinder of Mount Sinai Beth Israel, NY, and developed by Touch Surgery™ Labs in June 2015.

The app can be used in two modes: Learn and Test. Subjects were not exposed to the Learn mode during this study (Figures 1a and 2a). The Test mode measures the user's knowledge of a procedure through a series of multiple-choice questions (Figure 1b) and intermittent prompts to drag a virtual instrument to the correct anatomical space using a swipe motion (Figure 2b), both of which occur during a continuous 3D animation of the surgical procedure. In Test mode, the user must make the single appropriate choice or swipe before advancing to the next procedural step and before obtaining a final performance score.

Both Phase 1 and Phase 2 of the OFR module were completed back to back in Test mode. Phase 1 depicted the sequence of steps required to prep and drape a patient, followed by exposure of the orbital floor fracture via a transconjunctival approach with lateral canthotomy. Phase 2 related to the insertion and fixation of a preformed titanium orbital implant and closure of the soft tissue incisions.
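To make the linear Test-mode flow concrete, the sketch below models a phase as an ordered list of steps, each requiring the single correct choice or swipe before the module advances. This is a minimal illustration in Python under our own assumptions about the app's behavior; the names and structure are hypothetical, not Touch Surgery™ code.

```python
# Minimal, hypothetical model of Test-mode progression (not Touch Surgery(TM) code).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    prompt: str    # a multiple-choice question or a swipe target
    correct: str   # the single response that advances the module

def run_phase(steps: list[Step], get_response: Callable[[str], str]) -> list[bool]:
    """Present each step in its fixed, authored order; the module advances only
    once the correct choice or swipe is made. Returns first-attempt correctness,
    which the performance score is assumed to be based on."""
    first_attempt_correct = []
    for step in steps:
        response = get_response(step.prompt)
        first_attempt_correct.append(response == step.correct)
        while response != step.correct:            # retry until correct...
            response = get_response(step.prompt)   # ...before advancing
    return first_attempt_correct
```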
Questionnaires – All participants were required to complete pre-test and post-test questionnaires. The pre-test questionnaire was used to screen participants for inclusion and exclusion criteria. Gender, age, and experience level (graduate year for novices, and surgical experience in 5-year ranges for experts) were optional items. Post-test questionnaires collected data about the acceptability and realism of the simulator from both groups of participants by rating statements on a 5-point Likert scale in the categories of visual appearance, utility for training, interest in learning more procedures, and interest for use in training programs. Participants in the expert group were given additional questionnaire items that assessed the anatomic accuracy, surgical sequence, and instrumentation of the depicted procedure (Figure 3).
Data Collection – Each participant underwent a rigorous de-identification process such that data were securely recorded in real time after logging in with the randomly selected, pre-generated email addresses. The Touch Surgery™ server could identify only which users were novices and which were experts, based on the login email addresses; all remaining subject data remained completely de-identified.

Participants were evaluated based on the number of points they earned while completing the module. An individual score for each of Phase 1 and Phase 2 was calculated by the software based on correct multiple-choice answers and correct swipe choices. Once the correct answer or swipe was selected, the module progressed to the next authored step. The module did not support alternative approaches or modification of steps during the virtual simulation. A single point was gained with each correct choice, while an incorrect choice caused a failure to accumulate a point, leading to a lower cumulative score. The data were stored on the secure Touch Surgery™ server as two performance scores, one each for Phase 1 and Phase 2.
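Read literally, this scheme reduces to one point per step answered correctly on the first attempt, scaled to the stated 100-point maximum per phase. A minimal sketch, assuming that scaling (the exact formula is not published):

```python
# Sketch of the assumed per-phase scoring: one point per first-attempt-correct
# step, scaled so a perfect phase scores 100. The exact formula is not published.
def phase_score(first_attempt_correct: list[bool]) -> float:
    return 100.0 * sum(first_attempt_correct) / len(first_attempt_correct)

# Example: 30 of 40 steps correct on the first attempt -> 75.0
print(phase_score([True] * 30 + [False] * 10))
```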
Data Analysis

A Shapiro-Wilk test was used to assess normality of the score distribution and continuous covariates. Chi-squared and t-tests were used to examine differences between novices and experts on categorical and continuous measures, respectively. Pearson's r was used to assess the magnitude of the correlation between simulation performance scores by module phase and subject age (the only two continuous covariates measured among all participants). Dependent-samples t-tests were used to explore the consistency of simulation performance scores across Phases 1 and 2 by experience level. Bivariate and multivariate ordinary least squares (OLS) regression models were fit to estimate the relationship between experience and Phase 1 and 2 scores, controlling for age and sex.

To examine whether content and face validation measures were equal across novices and experts, the non-parametric two-sample Kolmogorov-Smirnov test was used. Because these measures comprised multiple Likert scales, the distributions were not normal and medians are therefore presented. For all models, item-level listwise deletion was used because data were missing completely at random. Alpha was set a priori at .05.
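For readers who want to reproduce this style of analysis, the pipeline above can be sketched in Python with SciPy and statsmodels. This is a hedged sketch with hypothetical column names (group, sex, age, phase1, phase2, and one Likert item), not the authors' actual analysis code:

```python
# Sketch of the statistical pipeline described above; column names are hypothetical.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_csv("ofr_scores.csv")  # assumed columns: group, sex, age, phase1, phase2, visual_appearance

# Shapiro-Wilk normality check on the score distribution.
print(stats.shapiro(df["phase1"].dropna()))

# Novice-vs-expert differences: t-test (continuous) and chi-squared (categorical).
nov, exp = df[df["group"] == "novice"], df[df["group"] == "expert"]
print(stats.ttest_ind(nov["age"], exp["age"], nan_policy="omit"))
print(stats.chi2_contingency(pd.crosstab(df["group"], df["sex"])))

# Pearson's r between age and performance score.
sub = df.dropna(subset=["age", "phase1"])
print(stats.pearsonr(sub["age"], sub["phase1"]))

# Dependent-samples t-test: Phase 1 vs Phase 2 within one experience level.
paired = nov.dropna(subset=["phase1", "phase2"])
print(stats.ttest_rel(paired["phase1"], paired["phase2"]))

# Multivariate OLS: experience predicting score, controlling for age and sex.
# (The formula interface drops rows with missing values, i.e., listwise deletion.)
print(smf.ols("phase1 ~ C(group) + age + C(sex)", data=df).fit().summary())

# Two-sample Kolmogorov-Smirnov test on a Likert questionnaire item.
print(stats.ks_2samp(nov["visual_appearance"].dropna(), exp["visual_appearance"].dropna()))
```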
Results
Table I summarizes the demographics of the study population. The study initially recruited 42 novices and 10 experts. Data for three novices were collected but later excluded because these participants reported having observed an OFR. The final sample was composed of 39 novices, of whom 18 (47%) were male, 20 (53%) were female, and 1 chose not to identify. The mean age of novice participants was 24.8 ± 2.5 years; 12 (30%) were second-year, 13 (33%) were third-year, and 14 (36%) were fourth-year dental students. Due to a technical server error in data collection, performance scores were not recorded for four participants in the novice group.

The second group consisted of 10 experts, of whom 9 (90%) were male and 1 (10%) was female. The mean age of experts was 46.6 ± 9.8 years. Compared with novices, experts were significantly older (p<0.001) and more frequently male (p=0.02). The majority of experts (80%) had more than 10 years of independent surgical experience, while 1 expert (10%) had 1-5 years and 1 expert (10%) had 6-10 years of experience.
Table II displays the bivariate evaluation of the relationships between age, sex, and performance scores across study phases. Age was positively correlated with performance scores at both time points, indicating that older participants had higher performance scores (r = 0.72 for Phase 1 and r = 0.63 for Phase 2; p<0.001). No sex differences in mean performance scores were identified in either phase.
Table III depicts the relationship between experience and performance scores for each phase of the module. Within each experience level, no significant differences in performance scores emerged between Phases 1 and 2. The mean performance score for novices was slightly higher in Phase 1 (mean 44) than in Phase 2 (mean 42); this difference did not approach statistical significance (p=0.397). Among experts, scores for Phase 2 were marginally higher than for Phase 1 (means 78 vs. 76, respectively); this difference was also not statistically significant (p=0.709).
Table IV presents the results of bivariate OLS regression models examining the unadjusted relationships between experience, sex, age, and performance score. In both phases, experts had significantly higher performance scores than novices. In Phase 1, experts scored 32.21 units (95% CI 25.40-39.01) higher on average than novices; in Phase 2, experts scored 35.71 units (95% CI 25.46-45.95) higher. These differences by experience level were statistically significant (p<0.001). In both phases, age was positively associated with performance scores: each unit increase in age was associated with a 1.13-unit (95% CI 0.79-1.47) increase in Phase 1 performance score and a 1.23-unit (95% CI 0.76-1.70) increase in Phase 2 performance score. Male participants also scored significantly higher on average than female participants across both phases: in Phase 1, males scored 10.87 units (95% CI 1.39-20.45) higher than females (p=0.026); in Phase 2, males scored 16.20 units (95% CI 4.27-28.13) higher (p=0.009).
Table V presents the results of two OLS regression models examining the relationship between experience and performance score, controlling for sex and age. In both phases, experts had significantly higher performance scores than novices. In Phase 1, experts scored 32.91 units (95% CI 16.87-48.95) higher on average than novices, controlling for age and sex; in Phase 2, experts scored 33.85 units (95% CI 10.48-57.22) higher. These differences by experience level were statistically significant. Notably, sex and age were not significantly associated with performance scores in the multivariate models.
Table VI summarizes secondary outcome data from questionnaires, reported as median values on a 5-point Likert scale. Responses to the four items rating face validity were equally favorable from both groups; no significant differences emerged between novices and experts. For each item, median Likert values were at or above 4 from experts and 5 from novices. Experts also rated the questionnaire items related to content validity favorably, with median values of 4 for surgical instrumentation, 3.5 for surgical sequence, and 4 for anatomic accuracy.
Discussion
The purpose of the present study was to perform a validation study of a common maxillofacial surgical procedure on a novel cognitive VR simulator and to assess its utility as a training and assessment tool in the context of a maxillofacial training curriculum. Three types of validity were under investigation (construct, face, and content), and to provide objective measures of each, primary and secondary outcome variables were defined as simulator performance score and numerical questionnaire answers, respectively. The investigators hypothesized that the experience level of the user would correlate with simulator performance scores, such that increased experience leads to higher performance scores. This was indeed the key finding of the study: experts significantly outperformed novices on both phases of the module. Since construct validity pertains to the ability of a simulator to reliably measure a difference in expertise between two groups,27 the present findings designate construct validation to the Touch Surgery™ simulator.
The investigators also found that both the bivariate and multivariate regression models revealed a positive association between experience and performance score across both phases. In the bivariate models, sex and age were significantly associated with performance scores; however, the multivariate results suggest that experience was the predictor that explained the variation in performance scores, with sex and age confounding these effects. These findings are to be expected, as experts were older and more likely to be male than novices. The data analysis confirms our hypothesis that the Touch Surgery™ simulator maintains construct validity in that it measures what it is intended to measure: a level of knowledge that is proportional to experience.
Additional findings of this study show that both study groups answered favorably (ratings of 4 or 5) on questions dealing with the simulator's realism and acceptability; these data upheld the presence of face validity. Moreover, experts alone answered favorably on items pertaining to the objective accuracy and content of the module; from these data, we corroborated content validity. Given the observation of construct, face, and content validity, the investigators propose the incorporation of cognitive surgical simulation, such as that offered by the Touch Surgery™ platform, into the field of OMS for both training and assessment purposes.
Touch Surgery™ is a mobile VR simulator that provides trainees with the opportunity to rehearse a series of goal-oriented steps that make up a larger operative task and to train surgical decision-making skills using the principle of cognitive task analysis. Our results for construct validation are similar to previously published data by Sugand et al., who used the same software for an orthopedic procedure and reported median scoring ranges of 41-60% for novices and 72-77.5% for experts,32 compared with our median ranges of 44-47% and 77-77.5%, respectively. These data demonstrate a measurable gap in the construct of interest (surgical proficiency) between two dissimilar groups, supporting Touch Surgery™ as a useful assessment tool to distinguish between users of varying skill level.
Face validity confirms the simulator's acceptance to the degree that it appears realistic and useful in portraying the surgical procedure.27,34 We support the presence of face validity in this study with median questionnaire ratings of 4 or greater from both experts and novices. Subjects in both groups aided in the measurement of face validity, since both experts and novices had adequate prior exposure to the anatomic region of interest by way of surgery and/or anatomy education and dissection. The questionnaire items mainly addressed visual appearance, utility for training, and interest in using the software for future education. These categories were chosen to assess whether this simulation platform creates a virtual environment that is appropriately realistic for learning surgical procedures. Previous studies reporting on face validation used similar questionnaire-based methods addressing such categories on a numerical analog or Likert scale.29,32,35
On the other hand, only experts with independent experience in OFR were given questionnaire items to assess content validity, which measures the appropriateness of Touch Surgery™ as a tool that delivers and achieves a particular goal.34 While validity in a general sense refers to an instrument's ability to measure what it intends to measure, content validity derives exclusively from experts, who offer their opinions in order to bring diverse concepts to a working level.36 Early literature on the development of content validity argues that meeting the standards of content validation requires a two-stage process in which wide-ranging content is first developed and then quantified into discrete testable items, ideally with an expert panel to identify any areas of omission.34 While the authors of the present study did not adhere to this rigorous methodology, the present study is the first to show a positive rating of the anatomic accuracy, surgical sequence, and depicted instrumentation of an OFR by what is essentially a small group of ten maxillofacial surgery "experts." A lower rating for surgical instrumentation can be explained by the fact that more than one set of instruments is suitable for common surgical steps, and personal preferences are likely to emerge in a relatively small number of expert participants.
Several limitations of this study are worth mentioning. First, we did not include an intermediate group between novices and experts, such as OMS residents with early exposure to surgery. Arguably, this would help further corroborate the construct validity of the OFR module, particularly for Phase 1, where performance was in part related to knowledge of patient positioning and preparation that is obtained early in training. While non-surgeon graduate students share only the anatomical basis for surgery with their expert counterparts, second-year OMS residents would be proportionally more experienced than our chosen novices and would thus be expected to score in the intermediate range. The choice of subjects in this study limits construct validation to a gross measure of the simulator's ability to detect a difference at two extremes, rather than a more graded assessment of knowledge that would support a more sensitive and widely applicable assessment tool.
Second, the assessment of face validity in our study is based on subjective answers that are at risk of response bias by nature of the tested population: graduate students are typically grateful for the opportunity to sample advanced training and free education, whereas experts have varying exposure to technically innovative tools. However, we observed median scores well above 3.0 out of 5.0; it is therefore unlikely that this issue influenced our conclusions. Third, we did not evaluate whether cognitive skill acquisition persisted over time, and future studies in this field should investigate how long and how often simulation modules should be repeated to maintain a satisfactory proficiency level among trainees. The learning curve associated with repeated use of the OFR module should also be explored, as was done for an orthopedic surgery procedure,33 which would allow fine-tuning of the content and multiple-choice questions to develop a valid as well as an efficient training tool.
Fourth, the Touch Surgery™ simulator consists of fairly basic functions designed to measure a finite amount of knowledge rather than the intricate technical skills needed to respond to changing surgical conditions (decreased visibility, fracture severity, anatomical variants, etc.). This limitation should be weighed against the overall context of OFR, which can be performed in a variety of ways that are often dictated by complex intraoperative needs. The OFR module as authored for Touch Surgery™ is committed to a linear progression of steps without accounting for these special circumstances, which may limit users from practicing more complex cognitive functions. While a high-fidelity interactive simulator experience is beyond the intention of the Touch Surgery™ platform, the investigators do support increased development of functions such as decision trees, ongoing revision of specific steps, and residency communication channels, both internal and external, that invite differing professional views about debated steps or tools for a given surgical procedure.
The most substantial limitation is the lack of evidence of predictive validity. Prior studies attempting to show valuable data in this realm mostly rely on small subject numbers for their conclusions.37 A systematic review of 21 qualified papers on VR simulators for otolaryngology procedures identified 7 predictive validity studies, which showed improved cadaveric temporal bone dissection following use of a virtual temporal bone simulator, significantly increased surgical confidence following practice on an endoscopic sinus surgery simulator, and reduced overall operating time, fewer surgical errors, and better overall performance among simulation-trained residents.27 A recent study enrolled 17 trainees and 4 expert robotic surgeons and showed a significant positive relationship between simulated VR training tasks and intraoperative robotic performance.26 These data underscore that the results of the present study cannot support the assertion that exposure to this simulator will improve clinical performance. To become a useful training tool, the OFR module on the Touch Surgery™ platform must provide the training functions needed to reduce errors, provide rapid feedback with unlimited practice of the correct surgical sequence, and ultimately improve clinical outcomes for patients. Although it may be cumbersome to perform a VR-to-OR study given the limited evidence supporting the use of a new tool and the ethical barriers involving human subjects, predictive validity studies still carry the burden of proof to justify implementing this technology in resident training curricula.37
An area of recent interest is the ability to assess and teach surgical judgment. At a cognitive level, expert surgical judgment is rooted in automated representations of a given surgery, with free cognitive capacity available to anticipate cues and problems rather than merely react to them.38 While dexterity and manual sensitivity can be rehearsed on technical simulators without the presence of an expert,26 this alone may amount to insufficient training. Pugh et al. interviewed 13 experienced surgeons and performed a rigorous CTA-based analysis of their thought processes during critical, error-prone surgical moments. The researchers identified several variables affecting intraoperative decision-making, indicating that "knowing" what to do in a crisis is a key player in task completion, and leading the authors to conclude that knowledge-based behaviors outweigh skill-based ones in error rescue scenarios.21 These studies argue for the development of knowledge-based surgical simulators, which may help to equalize decision-making processes between surgeons and operative team members and to promote situational awareness.21,38 A simulator platform like Touch Surgery™ that uses cognitive virtual task training is therefore consistent with the teaching of knowledge-based, successful intraoperative decision-making.

Conclusion
The present study establishes the presence of face, content, and construct validity for the OFR module on the Touch Surgery™ cognitive task simulator, asserting Touch Surgery™ as a user-friendly platform that is well adapted for rehearsal of key surgical progressions. The investigators also believe that simulation offers an opportunity to create standardized competency goals for basic surgical procedures, one of which is OFR in maxillofacial surgery. While improvements in training methodologies to protect patient safety have received tremendous focus in health policy, VR simulation has not yet been incorporated into most training programs.39 A validated assessment tool would allow residency programs to use a set of metrics to better quantify the amount of experience a trainee has received before operating on patients. On a larger scale, we propose a "blended" curriculum strategy that incorporates simulation into the current training schedule alongside conventional intraoperative guidance, allowing trainees to remediate areas of deficiency, focus on specific portions of challenging procedures, and partake in meaningful feedback with mentors as they progress toward becoming mature, experienced surgeons.

Presently, no other VR simulators use cognitive task analysis to enhance decision-making skills in maxillofacial surgery training. Future work is required to validate this platform as a useful training tool by testing for predictive validity and describing the learning curve associated with repeated exposure. The ideal time point within postgraduate training to initiate simulation exposure for maximum resident benefit (early versus late with respect to operative experience) also requires further investigation. Given the potential positive impact on education, medical costs, and patient safety,3 it is reasonable to defend the interest in VR simulation for expanding surgical training on both a national and global level.
References

1. Dawson S. Procedural simulation: a primer. Radiology 241:17, 2006.
2. Philibert I, Friedman P, Williams WT. New requirements for resident duty hours. JAMA 288:1112, 2002.
3. Millenson ML. Pushing the profession: how the news media turned patient safety into a priority. Qual Saf Health Care 11:57, 2002.
4. Chahal HS. Work hour regulations and training of residents. J Oral Maxillofac Surg 65:154, 2007.
5. Fisher EL, Blakey GH. Perspective on work-hour restrictions in oral and maxillofacial surgery: the argument against adopting duty hours regulations. J Oral Maxillofac Surg 70:1249, 2012.
6. Cunningham LL, Salman SO, Savit E. Limiting resident work hours: the case for the 80-hour work week. J Oral Maxillofac Surg 70:1246, 2012.
7. Eckert M, Cuadrado D, Steele S, Brown T, Beekley A, Martin M. The changing face of the general surgeon: national and local trends in resident operative experience. Am J Surg 199:652, 2010.
8. Brennan TA, Leape LL, Laird NM, Hebert L, Localio AR, Lawthers AG, Newhouse JP, Weiler PC, Hiatt HH. Incidence of adverse events and negligence in hospitalized patients: results of the Harvard Medical Practice Study. Qual Saf Health Care 13:145, 2004.
9. Verma SP, Dailey SH, McMurray JS, Jiang JJ, McCulloch TM. Implementation of a program for surgical education in laryngology. Laryngoscope 120:2241, 2010.
10. Accreditation Council for Graduate Medical Education. ACGME Common Program Requirements, Section III.B. Accessed August 1, 2016.
11. Zaleckas L, Pečiulienė V, Gendvilienė I, Pūrienė A, Rimkuvienė J. Prevalence and etiology of midfacial fractures: a study of 799 cases. Medicina 51:222, 2015.
12. Perry M. Maxillofacial trauma - developments, innovations and controversies. Injury 40:1252, 2009.
13. Converse JM, Smith B, Obear MF, Wood-Smith D. Orbital blowout fractures: a ten-year survey. Plast Reconstr Surg 39:20, 1967.
14. Losee JE, Gimbel ML, Rubin JP, Wallace CG, Wei F-C. Plastic and reconstructive surgery. In: Schwartz's Principles of Surgery, 10th ed [online]. New York, NY, McGraw-Hill Education, 2014, Ch. 45.
15. Hall JC, Ellis C, Hamdorf J. Surgeons and cognitive processes. Br J Surg 90:10, 2003.
16. Ericsson KA. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med 79:S70, 2004.
17. Tsuda S, Scott D, Doyle J, Jones DB. Surgical skills training and simulation. Curr Probl Surg 46:271, 2009.
18. Goldman S. The educational kanban: promoting effective self-directed adult learning in medical education. Acad Med 84:927, 2009.
19. Seymour NE, Gallagher AG, Roman SA, O'Brien MK, Bansal VK, Andersen DK, Satava RM. Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Ann Surg 236:458, 2002.
20. Hamdorf JM, Hall JC. Acquiring surgical skills. Br J Surg 87:28, 2000.
21. Pugh CM, Santacaterina S, DaRosa DA, Clark RE. Intra-operative decision making: more than meets the eye. J Biomed Inform 44:486, 2011.
22. Salma A, Chow A, Ammirati M. Setting up a microneurosurgical skull base lab: technical and operational considerations. Neurosurg Rev 34:317, 2011.
23. Morris D, Sewell C, Barbagli F, Salisbury K, Blevins NH, Girod S. Visuohaptic simulation of bone surgery for training and evaluation. IEEE Comput Graph Appl 26:48, 2006.
24. Pflesser B, Petersik A, Tiede U, Hohne KH, Leuwer R. Volume cutting for virtual petrous bone surgery. Comput Aided Surg 7:74, 2002.
25. Dubois L, Jansen J, Schreurs R, Habets PE, Reinartz SM, Gooris PJ, Becking AG. How reliable is the visual appraisal of a surgeon for diagnosing orbital fractures? J Craniomaxillofac Surg 44:1015, 2016.
26. Aghazadeh MA, Mercado MA, Pan MM, Miles BJ, Goh AC. Performance of robotic simulated skills tasks is positively associated with clinical robotic surgical performance. BJU Int 118:475, 2016.
27. Arora A, Lau LY, Awad Z, Darzi A, Singh A, Tolley N. Virtual reality simulation training in Otolaryngology. Int J Surg 12:87, 2014.
28. Tanaka A, Graddy C, Simpson K, Perez M, Truong M, Smith R. Robotic surgery simulation validity and usability comparative analysis. Surg Endosc 30:3720, 2015.
29. Kelly DC, Margules AC, Kundavaram CR, Narins H, Gomella LG, Trabulsi EJ, Lallas CD. Face, content, and construct validation of the da Vinci Skills Simulator. Urology 79:1068, 2011.
30. Van Nortwick SS, Lendvay TS, Jensen AR, Wright AS, Horvath KD, Kim S. Methodologies for establishing validity in surgical simulation studies. Surgery 147:622, 2010.
31. Gallagher AG, Ritter EM, Satava RM. Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training. Surg Endosc 17:1525, 2003.
32. Sugand K, Mawkin M, Gupte C. Validating Touch Surgery™: a cognitive task simulation and rehearsal app for intramedullary femoral nailing. Injury 46:2212, 2015.
33. Sugand K, Mawkin M, Gupte C. Training effect of using Touch Surgery™ for intramedullary femoral nailing. Injury 47:448, 2016.
34. Lynn MR. Determination and quantification of content validity. Nurs Res 35:382, 1986.
35. Xu S, Perez M, Perrenot C, Hubert N, Hubert J. Face, content, construct, and concurrent validity of a novel robotic surgery patient-side simulator: the Xperience™ Team Trainer. Surg Endosc 30:3334, 2016.
36. Slocumb EM, Cole FL. A practical approach to content validation. Appl Nurs Res 4:192, 1991.
37. Hogle NJ, Widmann WD, Ude AO, Hardy MA, Fowler DL. Does training novices to criteria and does rapid acquisition of skills on laparoscopic simulators have predictive validity or are we just playing video games? J Surg Educ 65:431, 2008.
38. Kempton SJ, Bentz ML. Making master surgeons out of trainees: part I. Teaching surgical judgment. Plast Reconstr Surg 137:1646, 2016.
39. Yule S, Flin R, Paterson-Brown S, Maran N. Non-technical skills for surgeons in the operating room: a review of the literature. Surgery 139:140, 2006.
Figure Legend

Figure 1a. Screen shot of the animation for the cantholysis surgical step during simulation in "Learn" mode (participants were not exposed to this mode in the present study).

Figure 1b. Corresponding multiple-choice question for the cantholysis step in "Test" mode. A single correct answer choice must be chosen to proceed to the next step. The cumulative score is displayed in the top right corner of the screen.

Figure 2a. Screen shot of a swipe interaction during simulation in "Learn" mode. The user is instructed to perform a manual swipe from the depicted blue circle to the dotted purple circle, representing the correct anatomical area for the proposed instrument.

Figure 2b. Sample swipe interaction for the same step as in Figure 2a, now shown during the simulation module in "Test" mode. The dotted purple circle is no longer present, and the user is asked to "drag" the displayed instrument to the correct anatomical area to proceed to the next step.

Figure 3. Likert scale and items included in the post-module questionnaire for both novice and expert participants.
Table I. Descriptive analysis of novice and expert characteristics.

| Study variable                  | Novices, N (%) or Mean (SD) | Experts, N (%) or Mean (SD) | p-value |
|---------------------------------|-----------------------------|-----------------------------|---------|
| Sample size (n)                 | 39                          | 10                          |         |
| Age in years¥                   | 24.8 ± 2.5                  | 46.6 ± 9.8                  | <0.001  |
| Sex*                            |                             |                             | 0.02    |
|   Male                          | 18 (47%)                    | 9 (90%)                     |         |
|   Female                        | 20 (53%)                    | 1 (10%)                     |         |
| Student graduate year(a)        |                             |                             |         |
|   2                             | 12 (30%)                    | -                           |         |
|   3                             | 13 (33%)                    | -                           |         |
|   4                             | 14 (36%)                    | -                           |         |
| Expert years of experience(b)   |                             |                             |         |
|   1–5                           | -                           | 1 (10%)                     |         |
|   6–10                          | -                           | 1 (10%)                     |         |
|   11–15                         | -                           | 4 (40%)                     |         |
|   >15                           | -                           | 4 (40%)                     |         |

¥ Mean ± SD. * One participant chose not to identify. (a) Student graduate year data were collected only among novices. (b) Years of experience data were collected only among experts.
Table II. Bivariate correlations between age and performance scores, and chi-squared analysis of sex and performance scores.

| Study variable       | Performance score, Phase I | Performance score, Phase II | p-value |
|----------------------|----------------------------|-----------------------------|---------|
| Age                  | r = 0.72                   | r = 0.63                    | <0.001  |
| Sex, Mean (SD)       |                            |                             |         |
|   Male               | 56 ± 18.06                 | 57 ± 21.14                  | 0.226   |
|   Female             | 45 ± 11.98                 | 41 ± 15.8                   | 0.714   |

r = correlation coefficient.
Table III. Bivariate relationship between experience and performance scores per phase of the module (dependent-samples t-tests comparing Phase I and Phase II within each experience level).

| Experience | Performance score, Phase I | Performance score, Phase II | p-value |
|------------|----------------------------|-----------------------------|---------|
| Novice     | 44 ± 8.9                   | 42 ± 15.6                   | 0.397   |
| Expert     | 76 ± 4.8                   | 78 ± 5.9                    | 0.709   |
Table IV. Bivariate ordinary least squares regression models examining the unadjusted relationships between experience, sex, age, and performance score; n=46 (Phase 1) and n=44 (Phase 2).

| Variable     | Phase 1 Beta coefficient (95% CI) | p-value | Phase 2 Beta coefficient (95% CI) | p-value |
|--------------|-----------------------------------|---------|-----------------------------------|---------|
| Experience   |                                   |         |                                   |         |
|   Novice     | Referent                          |         | Referent                          |         |
|   Expert     | 32.21 (25.40-39.01)               | <0.001  | 35.71 (25.46-45.95)               | <0.001  |
| Age          | 1.13 (0.79-1.47)                  | <0.001  | 1.23 (0.76-1.70)                  | <0.001  |
| Male         | 10.87 (1.39-20.45)                | 0.026   | 16.20 (4.27-28.13)                | 0.009   |
Table V. Multivariate ordinary least squares regression models testing the magnitude of the relationship between experience and Phase 1 and 2 scores, controlling for age and sex; n=46 (Phase 1) and n=44 (Phase 2).

| Variable     | Phase 1 Beta coefficient (95% CI) | p-value | Phase 2 Beta coefficient (95% CI) | p-value |
|--------------|-----------------------------------|---------|-----------------------------------|---------|
| Experience   |                                   |         |                                   |         |
|   Novice     | Referent                          |         | Referent                          |         |
|   Expert     | 32.91 (16.87-48.95)               | <0.001  | 33.85 (10.48-57.22)               | 0.006   |
| Age          | -0.05 (-0.68 to 0.56)             | 0.846   | -0.05 (-0.95 to 0.85)             | 0.910   |
| Male         | 0.97 (-5.49 to 7.43)              | 0.763   | 6.14 (-3.51 to 15.79)             | 0.206   |
Table VI. Descriptive and median comparison tests of validity assessments across novices and experts.

| Questionnaire items                      | Novices* | Experts*  | p-value |
|------------------------------------------|----------|-----------|---------|
| Face validity                            |          |           |         |
|   Visual appearance                      | 5 (4-5)  | 4.5 (3-5) | 0.930   |
|   Utility for training                   | 5 (3-5)  | 4.5 (4-5) | 0.493   |
|   Interest to rehearse more procedures   | 5 (3-5)  | 4.5 (3-5) | 0.732   |
|   Interest for use in training programs  | 5 (3-5)  | 4 (3-5)   | 0.756   |
| Content validity                         |          |           |         |
|   Anatomic accuracy                      | -        | 4 (3-5)   | -       |
|   Surgical sequence                      | -        | 3.5 (3-5) | -       |
|   Instrumentation                        | -        | 4 (3-5)   | -       |

* Reported as Median (Range).