Basic Laparoscopic Skills Assessment Study – Validation and Standard Setting among Canadian Urology Trainees

Jason Y Lee, Sero Andonian, Kenneth T Pace, Ethan Grober

Corresponding Author:
Jason Y Lee
61 Queen St E – Suite 2-012
Toronto, Ontario M5C 2T2
(office) 416.867.3735
(fax) 416.867.7433
(email) [email protected]
* Study was partially funded by a Canadian Urological Association (CUA) Scholarship Fund grant
ABSTRACT
Word count: 248
Introduction
As urology training programs move to a competency-based medical education model, iterative assessments with objective standards will be required. To develop a valid set of technical skills standards, we initiated a national skills assessment study, focusing initially on laparoscopic skills.
Methods
Between February 2014 and March 2016, the basic laparoscopic skill of Canadian urology trainees and attending urologists was assessed using 4 standardized tasks from the AUA BLUS curriculum: peg transfer, pattern cutting, suturing & knot tying, and vascular clip application. All performances were video-recorded and assessed using 3 methods: time + error-based scoring (TE), expert global rating scores (GRS), and a novel crowd-sourced assessment platform (CSATS). Different methods of standard setting were employed to develop pass-fail cut points.
Results
Ninety-nine trainees and 6 attending urologists completed testing. Reported laparoscopic experience and training level both correlated with performance (p<0.01), and attending urologists performed significantly better than trainees (p<0.05), demonstrating construct validity evidence for the 4 AUA BLUS tasks. The CSATS method of assessment correlated well with the traditional methods (TE, GRS). Both relative and absolute standard setting methods were used to define pass-fail cut points for all 4 AUA BLUS tasks.
Conclusions
The 4 AUA BLUS tasks demonstrated good construct validity evidence for use in assessing basic laparoscopic skill. Performance scores from the novel CSATS platform correlated well with traditional, time-consuming methods of assessment. Various standard setting methods were used to develop pass-fail cut points for educators to use when making formative and summative assessments of basic laparoscopic skill.
INTRODUCTION
The traditional Halstedian apprenticeship model of surgical training has served us well over the past several decades. Contemporary surgical training, however, is undergoing a major paradigm shift, with increased emphasis on objective, iterative assessments, increased utilization of simulation-based training and assessment methods, and an overall shift to a competency-based training model (1,2).
In Canada, the Royal College of Physicians and Surgeons of Canada has mandated that all training programs fully transition to a competency-based medical education (CBME) curriculum within the next few years, a national initiative dubbed "Competence By Design" (3).
As part of this new CBME model, in addition to knowledge-based competencies, programs will be required to objectively assess technical skills competencies. This will require the validation of assessment tools and the setting of competency standards for these technical skills.
Laparoscopy has become the gold standard approach in the management of various urologic conditions and is an essential skill set for the contemporary urologic surgeon in Canada. The aim of our research was to validate the American Urological Association (AUA) Basic Laparoscopic Urologic Surgery (BLUS) skill tasks (4) for use in assessing the basic laparoscopic skill of Canadian urology trainees, to compare traditional and novel technical skill assessment methods, and to set pass-fail standards for basic laparoscopic skills competency among Canadian urology trainees.
METHODS
After obtaining research ethics board approval, between February 2014 and March 2016, the basic laparoscopic skill of Canadian urology trainees was assessed using 4 basic laparoscopic skill tasks: peg transfer (PT), pattern cutting (PC), suturing & knot-tying (SKT), and vascular clip application (VCA). These 4 tasks have been validated by the AUA as part of the BLUS curriculum (4,5), but have never been validated within a Canadian cohort. The same testing materials included in the BLUS curriculum were used for this study, though we did not include tool motion and force data assessments; the EDGE simulator was not used.
Study participants included urology residents at various levels of training: final year medical students applying for urology residency (PGY0), junior residents (PGY3), and senior residents (PGY5). Several Canadian attending urologists with expertise in laparoscopy were also included in the study. Participants completed all 4 AUA BLUS tasks using standardized equipment, and all performances were video-recorded without identifying data. Each performance was subsequently scored using different assessment methods: traditional time + error penalty-based (TE) scoring, a global rating score as assessed by two blinded expert faculty (GRS), and a global rating score as assessed by a novel crowd-sourcing platform (CSATS) (6). The TE scores were calculated by adding a time penalty for predefined errors (e.g., 1 error = an additional 3 seconds) to the total time to complete the task; a lower TE score was therefore a better score. Each TE score was calculated by a single, blinded and trained rater. Using the same validated GOALS assessment tool (7), both the 2 expert faculty raters (GRS) and the crowd-sourced assessment platform (CSATS) provided a global rating score for each performance; scores ranged from 4 to 20, and a higher score was a better score. Additionally, the expert faculty raters provided an overall pass-fail rating for each performance.
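To make the TE arithmetic concrete, the minimal sketch below implements the calculation as described; the 3-second penalty is taken from the example in the text, while the function name and default are our own illustration, not part of the BLUS curriculum materials.

```python
def te_score(completion_time_s: float, num_errors: int,
             penalty_s: float = 3.0) -> float:
    """Time + error (TE) score: task completion time plus a fixed
    time penalty per predefined error. Lower scores are better.
    The 3-second default penalty follows the example in the text;
    actual per-task penalties may differ."""
    return completion_time_s + penalty_s * num_errors

# Example: a 90-second peg transfer with 2 errors scores 96.0.
assert te_score(90.0, 2) == 96.0
```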
The CSATS platform (6,8) is a web-based, crowd-sourced assessment program that allows educators and learners to upload performance videos for the purpose of being assessed using a validated scoring tool. Uploaded videos are scored anonymously by online "workers", who have been oriented to the type of performance videos they will be rating (e.g., laparoscopic basic skills, robotic surgery, open surgery) and are trained on how to use the predetermined scoring tool. "Workers" are subsequently paid for each performance video scored.
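As an illustration only, the sketch below shows one plausible way crowd ratings could be reduced to a single GOALS score. CSATS's actual aggregation and worker-screening logic are not described in this paper, so the simple averaging and range check here are assumptions, not the platform's method.

```python
from statistics import mean

def aggregate_crowd_goals(worker_scores: list[int]) -> float:
    """Illustrative aggregation of crowd-worker GOALS ratings into one
    global rating score (4-20 scale, higher is better). Simple averaging
    is an assumption for illustration; CSATS's real pipeline may weight
    or screen workers differently."""
    valid = [s for s in worker_scores if 4 <= s <= 20]  # GOALS range check
    if not valid:
        raise ValueError("no valid GOALS ratings")
    return mean(valid)

# Example: five hypothetical workers rate one suturing video.
print(aggregate_crowd_goals([14, 16, 15, 13, 17]))  # 15.0
```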
All performance scores (TE, GRS, CSATS) were then analyzed and compared, along with basic demographic data provided by all participants. Using expert faculty pass-fail ratings, two different standard setting methods were utilized and compared: a norm-referenced method (1 standard deviation from the expert mean scores) and a criterion-referenced method (contrasting groups).

All statistical analysis was conducted using SPSS v21. Performance scores for each level of training were analyzed and compared using t-tests and ANOVA as indicated, with a p-value threshold of <0.05 to determine statistical significance. Performance scores and demographic data were correlated using Spearman's correlation for ordinal data. Reliability of GRS scores was assessed using the intraclass correlation coefficient. Comparison of the various assessment methods was performed using Pearson's correlation coefficient for the interval data. Correlation coefficients greater than 0.7 were considered highly correlated (9).
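The two standard-setting calculations can be sketched as follows. This is our own illustration (the study used SPSS): the norm-referenced cut is 1 SD from the expert mean in the "easier" direction, and the contrasting-groups cut separates performances the expert faculty rated "fail" from those rated "pass". The discrete sweep used for the contrasting-groups cut is one simple approximation of locating the intersection of the two score distributions.

```python
import numpy as np

def norm_referenced_cut(expert_scores, higher_is_better=True):
    """Relative standard: 1 SD from the expert mean (below the mean
    for GRS, where higher is better; above it for TE, where lower is
    better)."""
    m = np.mean(expert_scores)
    sd = np.std(expert_scores, ddof=1)
    return m - sd if higher_is_better else m + sd

def contrasting_groups_cut(fail_scores, pass_scores, higher_is_better=True):
    """Absolute standard (contrasting groups): sweep candidate cut
    points and keep the one minimizing total misclassification between
    expert-rated fail and pass groups."""
    fail_scores = np.asarray(fail_scores, dtype=float)
    pass_scores = np.asarray(pass_scores, dtype=float)
    candidates = np.unique(np.concatenate([fail_scores, pass_scores]))

    def misclassified(c):
        if higher_is_better:
            # failing performances at/above the cut + passing ones below it
            return np.sum(fail_scores >= c) + np.sum(pass_scores < c)
        return np.sum(fail_scores <= c) + np.sum(pass_scores > c)

    return min(candidates, key=misclassified)
```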
RESULTS
A total of 99 trainees and 6 attending urologists completed the study with fair representation from across Canada (Table 1).
The 4 AUA BLUS skill tasks demonstrated construct validity among our Canadian cohort, as both participant level of training and self-reported laparoscopic surgery experience correlated well with performance scores (Table 2). In particular, the more difficult tasks, PC and SKT, correlated most strongly (correlation coefficients >0.70). Mean attending urologist performance on all 4 tasks, as measured by any of the three assessment methods, was significantly better than trainee performance (Table 3). Amongst the trainees only, most of the BLUS tasks demonstrated good construct validity. Performance on the SKT task did not differ between PGY5s and PGY3s on the GRS score (13.3 vs 13.1, p=0.92), though there was a significant difference for both groups when compared to PGY0s. Performance on the vascular clip applying task did not differ between PGY5s and PGY3s for the GRS score (14.2 vs 13.1, p=0.15), nor between PGY3s and PGY0s for the TE score (64.6 vs 71.0, p=0.24). Trainee self-reported access to laparoscopic training tools correlated well with task performance, particularly for the SKT task (Table 4).
Performance scores from the 2 expert faculty raters (GRS) demonstrated excellent reliability; the intraclass correlation coefficient was 0.877. Performance scores on all 4 tasks as assessed using the novel CSATS crowd-sourcing platform correlated well with the more traditional methods of assessment, TE and GRS (Table 5), though the strength of correlation for the VCA task was only moderate (correlation coefficient <0.7). There were 19 incongruent pass-fail ratings (4.5%) made by the two expert faculty raters; these were manually reviewed and resolved by consensus.
There were no differences in trainee performance on any of the 4 AUA BLUS tasks based on gender or region of training (p>0.05).
Using a norm-referenced standard setting method, we determined cut points for all 4 BLUS tasks. The standard pass-fail cut point was calculated as 1 standard deviation below the mean score for attending urologists (Figure 1). The contrasting groups criterion-referenced standard setting method provided pass-fail cut points for each of the 4 BLUS tasks using expert pass-fail ratings (Figures 2 and 3). Comparing both methods of standard setting, the norm-referenced method resulted in much higher standards (better scores) for both the TE score (lower score) and the GRS score (higher score). For example, the pass-fail cut point for the PT task was 64.2 for the norm-referenced method vs 101.0 for the criterion-referenced method. The GRS pass-fail cut point for the SKT task was 15.8 vs 13.25 for the norm- and criterion-referenced methods, respectively.
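Because TE and GRS run in opposite directions (lower TE is better, higher GRS is better), applying a cut point requires a direction-aware comparison. A minimal sketch using the PT and SKT cut points reported above; the helper function is our own illustration:

```python
def passes(score: float, cut: float, higher_is_better: bool) -> bool:
    """Direction-aware pass/fail decision against a cut point."""
    return score >= cut if higher_is_better else score <= cut

# PT task, TE score (lower is better): the norm-referenced cut (64.2)
# is stricter than the criterion-referenced cut (101.0).
print(passes(80.0, 64.2, higher_is_better=False))   # False: fails norm-referenced
print(passes(80.0, 101.0, higher_is_better=False))  # True: passes criterion-referenced

# SKT task, GRS score (higher is better): cut points 15.8 vs 13.25.
print(passes(14.0, 15.8, higher_is_better=True))    # False
print(passes(14.0, 13.25, higher_is_better=True))   # True
```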
DISCUSSION
Our study has provided further construct validity evidence for the AUA BLUS (4) basic laparoscopic skill tasks: peg transfer, pattern cutting, suturing & knot tying, and vascular clip applying. This is the first study to confirm validity evidence for the use of these specific assessment tasks in Canadian urology trainees.
As we move towards a competency-based medical education paradigm, it will be important for urology training programs to have the ability to perform iterative, objective assessments, for both formative and summative purposes (10). The results of this study provide further support for the use of these 4 AUA BLUS tasks to assess the basic laparoscopic skill of urology trainees. The more challenging tasks of pattern cutting (PC) and suturing & knot tying (SKT) were particularly good at discerning expert laparoscopic urologists from less-experienced trainees. The SKT task was similarly found to demonstrate excellent construct validity evidence in the AUA validation study (5).
Reliable and valid assessment methods must be employed in order to assess surgical competence (11). Ideally, the assessment method should be timely and should not be resource-intense. Traditional methods of technical skills assessment, specifically for laparoscopy, include time + error-based scoring methods as employed by the Fundamentals of Laparoscopic Surgery (FLS) curriculum (12) and global rating scores by expert faculty (7). Both methods are time-consuming and require significant commitments from expert faculty raters. Our study results demonstrated that a novel, crowd-sourced platform (CSATS) (6) could also provide reliable and valid assessments of basic laparoscopic skill. The CSATS platform provides more timely assessments of competence and does not require expert faculty to make the assessments, which may be important in a CBME training model where assessments are meant to be frequent and iterative. Previous studies have shown CSATS assessments to be approximately 4- to 6-fold less expensive and more timely than those done by expert faculty (13).
Finally, in order to make assessments of competency, defensible standards (pass-fail cut points) need to be established (14). While various methods of standard setting have been described (14-16), standard setting for surgical competence falls into two main categories: norm-referenced (relative) methods and criterion-referenced (absolute) methods. Our study provides basic laparoscopic skill competency cut points using both methods of standard setting.

Norm-referenced standard setting methods have the advantage of being simple and reproducible, but have been criticized for their variability between test administrations. In addition, standards are significantly impacted by the performance characteristics of the reference standard group (e.g., expert attending urologists). As such, cut points set using relative standard setting methods are best employed for formative assessments (14,17,18). For summative assessments of surgical competence, criterion-referenced standard setting methods are preferred (17,19). The contrasting groups method used in this study has the advantage of allowing the pass-fail cut point to be adjusted to minimize the risk of misjudging incompetent performances as competent in high-stakes assessments. Also, while this method does require the opinions of expert faculty, it still utilizes an evidence-based scoring system, and so may be the more valid method for educators to utilize.
It is interesting to note that in our study, the pass-fail cut scores established using the norm-referenced standard setting method were much higher than the cut scores established using the absolute method. This is a direct result of the level of expertise demonstrated by the reference expert attending urologist group. Given that only 6 expert attending urologists were included in the testing, each of whom had significant advanced laparoscopic experience, the reference group scores were significantly better than trainee scores, with small variance in performance between experts.
There are several limitations to our study. While a large cohort of participants was included, there were unequal numbers of trainees at each level assessed and only 6 attending urologists, potentially introducing selection bias into the study results. We also included only 2 expert faculty reviewers, and while they demonstrated excellent reliability (ICC >0.8), the inclusion of more expert raters could have improved the robustness of our data. Additionally, a single rater assessed all TE scores; while the rater was trained and oriented to the standardized definition of an "error" for each task, inter-rater reliability for TE scoring cannot be confirmed for our study. Lastly, while we have determined basic laparoscopic skill competency standards, it is unknown whether performance on these 4 AUA BLUS tasks will correlate with procedural or intra-operative competence.
CONCLUSIONS
The 4 AUA BLUS tasks demonstrated construct validity among Canadian urology trainees and expert attending urologists, providing evidence to support the use of these tasks to assess basic laparoscopic skill. Performance scores obtained using a novel crowd-sourced assessment platform (CSATS) correlated well with traditional, more time-consuming methods of assessment. We have now established a valid set of standards for basic laparoscopic skills assessments using the 4 AUA BLUS tasks. Various standard setting methods can be used to develop pass-fail cut points, which can aid educators in making formative and summative assessments of basic laparoscopic skill among Canadian urology trainees.

REFERENCES

1. Aggarwal R, Darzi A. Technical-skills training in the 21st century. N Engl J Med. 2006 Dec 21;355(25):2695–6.
2. Frank JR, Snell LS, Cate OT, et al. Competency-based medical education: theory to practice. Med Teach. 2010;32(8):638–45.
3. Royal College of Physicians and Surgeons of Canada. Competence by Design. 2016. http://www.royalcollege.ca/rcsite/cbd/competence-by-design-cbd-e.
4. Sweet RM, Beach R, Sainfort F, et al. Introduction and validation of the American Urological Association Basic Laparoscopic Urologic Surgery skills curriculum. J Endourol. 2012 Feb;26(2):190–6.
5. Kowalewski TM, Sweet R, Lendvay TS, et al. Validation of the AUA BLUS tasks. J Urol. 2016 Apr;195(4 Pt 1):998–1005.
6. Chen C, White L, Kowalewski T, et al. Crowd-sourced assessment of technical skills: a novel method to evaluate surgical performance. J Surg Res. 2014 Mar;187(1):65–71.
7. Vassiliou MC, Feldman LS, Andrew CG, et al. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg. 2005 Jul;190(1):107–13.
8. Powers MK, Boonjindasup A, Pinsky M, et al. Crowdsourcing assessment of surgeon dissection of renal artery and vein during robotic partial nephrectomy: a novel approach for quantitative assessment of surgical performance. J Endourol. 2016 Apr;30(4):447–52.
9. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. Boston: Houghton Mifflin; 2016.
10. Epstein RM. Assessment in medical education. N Engl J Med. 2007 Jan 25;356(4):387–96.
11. Hamstra SJ. Keynote address: the focus on competencies and individual learner assessment as emerging themes in medical education research. Acad Emerg Med. 2012 Dec;19(12):1336–43.
12. Peters JH, Fried GM, Swanstrom LL, et al. Development and validation of a comprehensive program of education and assessment of the basic fundamentals of laparoscopic surgery. Surgery. 2004 Jan;135(1):21–7.
13. White LW, Kowalewski TM, Dockter RL, et al. Crowd-sourced assessment of technical skill: a valid method for discriminating basic robotic surgery skills. J Endourol. 2015 Nov;29(11):1295–301.
14. Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med. 2006;18(1):50–7.
15. Yudkowsky R, Downing SM, Popescu M. Setting standards for performance tests: a pilot study of a three-level Angoff method. Acad Med. 2008 Oct;83(10 Suppl):S13–6.
16. Norcini JJ, McKinley DW. Assessment methods in medical education. Teach Teach Educ. 2007 Apr;23(3):239–50.
17. Cendan J, Wier D, Behrns K. A primer on standards setting as it applies to surgical education and credentialing. Surg Endosc. 2013 Jul;27(7):2631–7.
18. Goldenberg MG, Garbens A, Szasz P, et al. Systematic review to establish absolute standards for technical performance in surgery. Br J Surg. 2016 Sep 30.
19. Norcini JJ. Setting standards on educational tests. Med Educ. 2003 May;37(5):464–9.
ABBREVIATIONS

AUA – American Urological Association
BLUS – Basic Laparoscopic Urological Surgery
CBME – competency-based medical education
TE – time + error penalty method of scoring
GRS – expert faculty global rating scale scores
CSATS – crowd-sourced assessments of technical skill global rating scale scores
GOALS – global operative assessment of laparoscopic skill
FLS – Fundamentals of Laparoscopic Surgery
PGY – post-graduate year
PT – peg transfer task
PC – pattern cutting task
SKT – suturing & knot-tying task
VCA – vascular clip applying task