Development and Pilot Testing of an Assessment Tool for Performance of Anatomic Lung Resection


Journal Pre-proof

PII: S0003-4975(19)31624-8
DOI: https://doi.org/10.1016/j.athoracsur.2019.09.052
Reference: ATS 33193
To appear in: The Annals of Thoracic Surgery
Received Date: 22 May 2019
Revised Date: 15 August 2019
Accepted Date: 16 September 2019

Please cite this article as: Turner SR, Lai H, Nasir BS, Yasufuku K, Schieman C, Huang J, Bédard ELR, Development and Pilot Testing of an Assessment Tool for Performance of Anatomic Lung Resection, The Annals of Thoracic Surgery (2019), doi: https://doi.org/10.1016/j.athoracsur.2019.09.052.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 by The Society of Thoracic Surgeons

Running Head: Performance of Anatomic Lung Resection

Simon R. Turner MD MEd*1, Hollis Lai PhD2, Basil S. Nasir MD3, Kazuhiro Yasufuku MD4, Colin Schieman MD5, James Huang MD6, Eric L.R. Bédard MD MSc1

1 Thoracic Surgery, University of Alberta, Edmonton, Canada
2 Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada
3 Thoracic Surgery, Université de Montréal, Montréal, Canada
4 Thoracic Surgery, University of Toronto, Toronto, Canada
5 Thoracic Surgery, University of Calgary, Calgary, Canada
6 Thoracic Surgery, Memorial Sloan Kettering Cancer Center, New York, USA

* Simon R. Turner MD MEd, 421 Community Services Centre, 10240 Kingsway Ave, Edmonton, Canada, T5H3V9, email: [email protected]

Presented at the American Association for Thoracic Surgery, San Diego, CA, May 2018.

Word count: 4350


Abstract

Background: To meet the need for competency assessment in thoracic surgery education, we developed and tested an instrument to assess trainees’ ability to perform anatomic lung resection for cancer.

Methods: The Thoracic Competency Assessment Tool-Anatomic Resection for lung Cancer (TCAT-ARC) was developed through a multi-step process involving logical analysis, expert review, and simulation-based and clinical pilot testing. Validity evidence was gathered during a six-month clinical study of trainees performing anatomic lung resections and assessments of practicing surgeons. Feedback was gathered via post-encounter questionnaires.

Results: A 35-item instrument was developed and was tested in the clinical validation study. Seven trainees in four North American institutions participated and completed 64 anatomic lung resections. Reliability was high (α=0.93). Inter-observer reliability (k=0.73) and correlation with an existing global competency scale (k=0.68) were moderately high. Item analysis revealed the most difficult and discriminatory items, which matched well with a conceptual understanding of lung resection. Both trainees and assessors viewed the instrument as highly educationally effective and user-friendly. Practicing surgeons outperformed trainees.

Conclusions: TCAT-ARC demonstrated early evidence of validity and reliability in assessing performance of anatomic lung resection. The instrument may be most useful early in training and as a means for providing fine-grained formative feedback about which steps have been mastered and which still require improvement. TCAT-ARC may be employed in training programs to aid in the development of trainees’ competency and as a part of an aggregate assessment of trainees’ overall mastery of the procedure and readiness for independent practice.


Competency based medical education (CBME) is now a major paradigm in the assessment of medical trainees (1,2). The need to determine and document the ability of trainees to independently perform complex operations is paramount for patient safety. In Canada, the Competence by Design initiative has mandated the creation of specialty-specific milestones to guide the assessment of key competencies (3). Similar initiatives are underway in the USA via the ACGME Milestones and the Next Accreditation System (4) and in Europe via the Bologna declaration (5). However, valid and reliable tools for assessing competence in thoracic surgical procedures are lacking.

Thoracic surgeons must be able to perform a variety of anatomic lung resections, from segmentectomies and lobectomies, to pneumonectomies, for cancers in each of five lobes via open and minimally invasive approaches. The operation carries a significant risk of morbidity and must adhere to key oncologic principles. It is important, therefore, to ensure that thoracic surgery trainees can perform anatomic lung resection in a safe, efficient and effective manner.

Our group is currently developing a suite of Thoracic Competency Assessment Tools (TCATs) for core thoracic surgery operations. The TCAT for invasive mediastinal staging has been described (6). The objective of this study was to develop an instrument to assess the competencies required to perform any anatomic lung resection for cancer and collect evidence of validity and reliability via pilot-testing.

Material and Methods

The Thoracic Competency Assessment Tool-Anatomic Resection for lung Cancer (TCAT-ARC), a comprehensive competency assessment instrument, was developed in a multi-step process involving item development, item refinement through expert review, and simulated and clinical pilot testing (Figure 1). The guiding principle was that the instrument should be applicable to any anatomic lung resection for cancer, regardless of location, approach (thoracotomy or video-assisted thoracic surgery, VATS) or surgical technique. All participants provided informed consent. Approval was granted by the University of British Columbia Research Ethics Board.

Item development

An initial version of TCAT-ARC was developed using a logical analysis. The instrument’s psychometric domain was defined as the complete set of steps that must be completed in any safe, oncologically sound, anatomic lung resection for primary lung cancer. The goal was to create a list fully representative and relevant to this domain, including technical and non-technical skills. A set of 35 steps (items) was generated, in 5 competency areas: general, pre-operative, VATS (if applicable), resection and post-resection.

Expert Review

The initial version of TCAT-ARC was distributed to members of the Canadian Association of Thoracic Surgeons (CATS, n=86) via online survey in 2014-2015. Respondents were asked whether each item should be included in a competency assessment tool for anatomic resection for lung cancer, indicating agreement on a five-point Likert scale (1=strongly disagree, 5=strongly agree). Respondents were invited to comment on items and asked to provide any items they believed were missing. Mean responses for each item were calculated and each respondent’s total deviation from the mean was determined. Respondents with the lowest deviation from the mean were selected as the most representative group of judges in order to remove outliers for further rounds of instrument refinement until consensus was reached (7-9). We pre-set a minimum level of agreement to determine consensus of mean >4.5/5 and mode = 5/5. Items not meeting this threshold were included in subsequent rounds of review with the relevant comments from the previous round, along with novel items suggested by respondents.
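The selection of representative judges and the consensus rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the text does not specify how "total deviation from the mean" was computed, so the sum of absolute deviations is an assumption, as is treating a tied mode as not meeting consensus. All function names and the sample ratings are hypothetical.

```python
from statistics import mean, multimode


def item_means(responses):
    """Mean Likert rating per item; `responses` holds one list of ratings per respondent."""
    n_items = len(responses[0])
    return [mean(r[i] for r in responses) for i in range(n_items)]


def total_deviation(ratings, means):
    """Assumed metric: sum of absolute deviations of one respondent from the item means."""
    return sum(abs(x - m) for x, m in zip(ratings, means))


def most_representative(responses, k):
    """Indices of the k respondents whose ratings deviate least from the item means."""
    means = item_means(responses)
    order = sorted(range(len(responses)),
                   key=lambda j: total_deviation(responses[j], means))
    return sorted(order[:k])


def meets_consensus(item_ratings, min_mean=4.5, required_mode=5):
    """Consensus rule from the text: mean > 4.5/5 and mode = 5/5.
    Assumption: a tie for the mode does not count as consensus."""
    return mean(item_ratings) > min_mean and multimode(item_ratings) == [required_mode]


# Hypothetical ratings: four respondents, three items.
sample = [[5, 5, 4], [5, 4, 4], [5, 5, 5], [1, 2, 5]]
print(most_representative(sample, 3))
print(meets_consensus([5, 5, 5, 4]))
```

Under this rule an item with a mean of exactly 4.5 does not reach consensus, because the stated threshold is a strict inequality.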


Pilot Testing-Simulation

The refined instrument was then pilot tested in two phases. The first was conducted in July 2015 at the University of Toronto Interventional Thoracic Surgery training course, which serves as an introductory “boot camp” for thoracic surgery trainees from across Canada (10). At this course trainees performed simulated VATS lobectomy using a porcine model, under the supervision of a thoracic surgeon, who assessed their performance using TCAT-ARC. The assessment process was observed to judge how the instrument functioned and to note any unexpected deficiencies encountered, so that items could be further refined. A post-encounter feedback questionnaire was distributed to trainees and assessors at the course. A case-difficulty rating scale from 1 to 3 was added to the instrument to allow educators to better understand the results of a given assessment. The Likert scale anchors that were ultimately selected for use in the final version of the instrument were inspired by aviation industry training programs: 1=unable or unsafe to attempt, 2=attempted, failed to execute, 3=executed with major correction by surgeon, 4=executed with minor correction by surgeon, 5=executed without need for input from surgeon. The final revised instrument (Figures 2a/b) was used in the clinical pilot study.

Pilot Testing-Clinical

A clinical pilot study was conducted at four thoracic surgery training programs: two Canadian and two American. Trainees were voluntarily recruited to have their performance of anatomic lung resection for cancer assessed and self-assessed using TCAT-ARC at least twice a month for six months. Each performance was also simultaneously assessed using the Objective Structured Assessment of Technical Skills (OSATS) global competency instrument as a comparison for external validity (11). Assessments were done live and in-person. Detailed instructions were provided in writing to all trainees and surgeons participating in the trial to ensure consistency. It was left to the participating trainees and surgeons to select appropriate cases for assessment. Participants were encouraged to have the assessments and self-assessments completed as soon as possible after completion of the case. Participants were administered a post-encounter questionnaire.

Practicing Surgeon Assessment

Separately, a group of five practicing general thoracic surgeons assessed each other’s performance of anatomic lung resection using TCAT-ARC during live case observation. Assessments were submitted anonymously. Each surgeon was assessed three times for a total of fifteen assessments. All surgeons had been in practice at least six years.

Statistical Analysis

Results of the clinical validation study were analyzed using item analysis and inter-observer reliability. For item analysis, data from all participants were pooled and dichotomized (1-4 vs 5). Item difficulty was calculated as 1 minus the proportion of scores of 5/5, and discrimination (the ability of one item to differentiate between trainees of overall low and high ability) was calculated as the point biserial statistic (which correlates the individual item score with the overall test score). Item scores were correlated with overall case difficulty using Pearson’s correlation statistic. Reliability, a measure of how well a test measures a single psychometric construct, also known as internal consistency, was calculated using Cronbach’s alpha. Inter-observer reliability was estimated by correlating the scores on the assessment performed by the supervising surgeon with the self-assessment performed by the trainee on each given case.
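The item statistics defined above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the text does not say whether the total score used for the point biserial was computed from raw or dichotomized ratings, or whether the item itself was excluded from the total, so the caller supplies the totals. Population variances are used throughout; for Cronbach’s alpha the result is the same as with sample variances because the factor cancels in the ratio. Function names and sample data are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def dichotomize(score):
    """Map a 1-5 rating to 1 (score of 5) or 0 (score of 1-4), as described in the text."""
    return 1 if score == 5 else 0


def item_difficulty(item_scores):
    """Difficulty = 1 minus the proportion of 5/5 scores for one item."""
    return 1 - mean(dichotomize(s) for s in item_scores)


def point_biserial(binary_item, totals):
    """Pearson correlation between a 0/1 item and the overall score.
    Undefined (division by zero) if either input has no variance."""
    mx, my = mean(binary_item), mean(totals)
    cov = mean((x - mx) * (y - my) for x, y in zip(binary_item, totals))
    return cov / (pstdev(binary_item) * pstdev(totals))


def cronbach_alpha(score_matrix):
    """Internal consistency; rows are assessments, columns are items."""
    k = len(score_matrix[0])
    item_vars = [pvariance([row[i] for row in score_matrix]) for i in range(k)]
    total_var = pvariance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four assessments of a three-item instrument.
scores = [[5, 5, 5], [5, 4, 5], [4, 3, 4], [3, 3, 4]]
second_item = [row[1] for row in scores]
totals = [sum(row) for row in scores]
print(item_difficulty(second_item))
print(point_biserial([dichotomize(s) for s in second_item], totals))
print(cronbach_alpha(scores))
```

A real analysis would also need to decide how to handle items scored "not applicable", which this sketch does not address.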

Results


Expert Review

The initial expert review survey was returned by 46/86 CATS members (response rate=54%), representing 8 of 10 Canadian provinces and a range of experience, with a similar proportion in practice less than five years (22%) and over twenty years (29%). Most respondents had experience with supervising thoracic surgery (66%) and non-thoracic surgery trainees (66%) in the OR. Respondents performed on average 55 anatomic lung resections per year (range 15-100). Most (76%) had some specialized training in VATS, and 76% performed most of their resections by VATS. A second review survey was distributed to 22 respondents with the most representative responses (lowest deviation from the mean). There were no significant differences in the demographics or experience levels of these selected respondents and the remainder, though their average volume of lung resections per year was slightly higher (58 vs 48). Twenty of these 22 responded to the second-round survey of 36 items (response rate=91%). This second-round survey consisted of nine items that previously had not met the consensus threshold and were therefore edited to reflect respondent feedback, and an additional item suggested by respondents. After the second round, 36 items were retained, with mean agreement of 4.87 for all items, with median and mode of 5 for each, meaning a third round was not necessary.

Pilot Testing-Simulation

Seven trainees were assessed during simulated VATS lobectomy by a single surgeon. Instances were identified where the wording of items or organization of items into sections was not clear and intuitive. The wording of several items was modified and one was removed, leaving 35 items.


Six trainees out of seven completed the post-pilot encounter questionnaire (response rate 86%). For all six items in the questionnaire, the median score on a five-point Likert scale was 4 or higher (Figure 3). One surgeon-assessor completed the post-encounter questionnaire, rating all six items 5.

Pilot Testing-Clinical

Seven trainees were assessed and self-assessed during 64 live anatomic lung resections for cancer at four participating institutions (Table 1). Assessments were performed by sixteen general thoracic surgeons. There were 54 lobectomies, 7 segmentectomies, and 3 bilobectomies. There were 49 VATS cases, 14 thoracotomy and 1 robotic. The mean difficulty rating was 1.9/3. The pooled performance over time for assessments by surgeons and self-assessments is displayed in Figure 4. Overall performance appeared to plateau at approximately 16 months from the trainees’ start of training, or about a third of the way through year two. At this point, trainees’ average item scores approached 5. Reliability of the instrument (Cronbach’s alpha) was high, 0.93. Inter-observer reliability between assessments and self-assessments was moderately high (k=0.73). Correlation of TCAT-ARC with OSATS was moderately high (k=0.68). The item analysis revealed that the highest difficulty items tended to be those in the “resection” section, including division of the pulmonary artery, bronchus and vein (Table 2). Several of these difficult items were the most negatively correlated with the overall case difficulty, such that in more difficult cases, scores decreased on these items. This included division of the lung parenchyma in the fissure, identification of bronchial and vascular anatomy and division of the pulmonary artery. Other difficult items, including appropriate use of assistants and division of the pulmonary vein had correlation with overall case difficulty close to zero. Scores on these items tended to be low regardless of whether the case was difficult or not. The most discriminatory items tended to be those with the lowest difficulty scores, for example ensuring proper prep and drape of the patient and visualizing lung re-expansion.


Practicing Surgeon Assessments

Five general thoracic surgeons performed fifteen anatomic lung resections (4 segmentectomies, 9 lobectomies (1 with bronchoplasty), 1 bilobectomy and 1 pneumonectomy, of which 8 cases were by VATS). Average difficulty rating for these cases was 2.3/3. Average item score for practicing surgeons was 4.96, with 9/15 performances scoring 5/5 on each item of the case.

Post-Encounter Questionnaire

Six trainees (response rate 88%) and twelve surgeons (response rate 75%) completed the post-encounter feedback questionnaire (Figure 5). Both trainees and assessors rated measures of user-friendliness and educational effectiveness highly. Neither group thought the instrument was too time consuming or confusing to use. Trainees stated they would want to be assessed with TCAT-ARC every few months (3/5, 60%) or every month (2/5, 40%). Surgeons stated they would want to assess their trainees monthly (8/12, 67%), weekly (2/12, 17%) or with each procedure (2/12, 17%). Trainees commented that TCAT-ARC facilitated and motivated their assessor to “provide organized constructive feedback” by laying out the individual components of an anatomic lung resection, “help(ing) the instructor think of which steps are performed adequately or not.” Some felt that the tool was most useful early in training, stating that when the trainee attained a high level of competence and consistently scored well on most steps, comments from their assessor became more useful to them than their score. It is important, then, that assessors felt TCAT-ARC “provided a framework to discuss a trainee’s strengths and deficiencies” and that it “makes you think about the steps in the procedure. Previously we assumed trainees understood these aspects, but it’s clear when you ask there are some things they don’t know.” Most thought that the tool had a positive impact on their trainee and some felt that it “makes me a better educator.”


Comment

The utility of an educational assessment can be judged using the framework proposed by van der Vleuten, composed of reliability, validity, educational impact, acceptability and cost effectiveness (12). This study, the largest study to date of a lung resection competence assessment, provides evidence for TCAT-ARC in each of these domains (with the exception of cost). Created using a multi-step process involving expert review and simulated and clinical pilot testing, TCAT-ARC demonstrated evidence of high reliability and both internal and external validity when evaluated with trainees from a range of experience levels and in two different countries. Importantly, it was also highly rated by both trainees and educators as both user-friendly and educationally effective.

Some of the strongest validity evidence comes from the item analysis. Items within TCAT-ARC function in a way that makes logical and intuitive sense to us as surgical educators. The most challenging items in the instrument were some of the most high-stakes and vital steps of lung resection: division of the pulmonary vasculature and bronchus. The least challenging items were those that would be expected to be easy for any thoracic surgery trainee. These items may still be useful because they were highly discriminatory (e.g. ability to prep and drape a patient: if a trainee can’t complete this step they are unlikely to perform well at any step). That is, these easy items are more specific for identifying trainees who are struggling overall. Conversely, the more difficult items tend to be difficult in most assessments: even if a trainee has good skills overall, division of the artery and bronchus can still present a challenge, meaning these items are not highly correlated with overall score but are still crucial to determining whether a trainee is ready to perform the procedure.

Items in the resection section were the most inversely correlated with overall case difficulty. As the overall case becomes harder, performance on these items tended to decline, as expected. An exception is the division of the pulmonary vein, which is typically the easiest of the hilar structures to divide. We hypothesize that in the minds of trainees and assessors, the overall rating of the case difficulty was dominated more by difficulty with the artery and bronchus, as the two highest-stakes steps during anatomic lung resection.

As expected, practicing surgeons had average item scores approaching 5/5, with most assessments scoring 5/5 on all items. Trainees achieved surgeon-level proficiency sporadically throughout the study period, but this occurred more consistently as time in training increased, and the average trainee performance approached but did not reach the average score of experienced surgeons by the end of the third year. This suggests that the best utility for this instrument may be for trainees early on in their training where there is more still to master, rather than after a baseline of proficiency has been attained. This is supported by comments from trainees who felt TCAT-ARC was most useful early in training.

The high mean item scores may be partly due to the Hawthorne effect. Alternatively, trainees may only have been allowed to attempt steps that their supervisor thought they were likely to succeed at, for reasons of patient safety or efficiency, meaning steps they did not attempt received a score of “not applicable” and did not factor into the mean score. Furthermore, a score of 5/5 denotes the ability to execute the step safely without assistance. This scale is not intended to capture other factors that would discriminate this proficiency (attainable by a trainee) from mastery (as may be exhibited by a practicing surgeon).

Moreover, the overall score is a poor measure of overall competence as it is insensitive to a single, potentially disastrous error. One case in the clinical pilot test highlights this principle: the trainee performing a VATS left upper lobe segmentectomy caused an injury to the pulmonary vein, requiring conversion to thoracotomy to repair.
The trainee performed well and scored highly on all other steps of the procedure (and 5/5 on all components of OSATS) but had an isolated low score on the item related to dissection of the vein. This could rightly be judged to be a “fail” on performance of the segmentectomy, even though the overall score was high. This illustrates the importance of focusing on individual item performance rather than total score, and the inability of generic instruments like OSATS to identify these types of isolated, serious deficiencies. For this reason we did not attempt to set a threshold score that would define a minimum level of competence. Nor do we believe that an average item score is a useful measure of overall trainee competence, in part because some items (e.g. division of the pulmonary artery) clearly deserve more weight than others, and also because a mean item score does not reflect when a trainee was unable to complete a case and the surgeon had to take over.

TCAT-ARC is intended to allow assessment of trainee abilities on a step-by-step, skill-by-skill level by reviewing a trainee’s performance on each item individually. This allows educators and the trainees themselves to better understand which steps they have mastered and which still need further work before entrustment for independent performance can occur.

Feedback from those who used the instrument was consistently positive. They felt TCAT-ARC provided useful information about the trainee’s competence to both assessor and trainee, at least in part by providing a framework on which to structure feedback on individual steps of the operation. TCAT-ARC thus seems more appropriate and useful for formative feedback, throughout training, than as a summative assessment of ultimate competence. Ultimately, the final judgement of whether a trainee is ready to perform an anatomic lung resection independently should take into account as many data points as possible, including multiple competence assessments spread across time as well as more subjective, gestalt assessments of a trainee’s readiness for practice (13).

This study’s major limitation is the sample size. Seven trainees participated in the clinical validation trial, but sixteen surgeons provided assessments. Five of seven trainees were assessed by more than one surgeon. The majority of the procedures were performed by a single trainee, though seven different surgeons provided assessments for that trainee. Results were similar when excluding that trainee, suggesting their data did not introduce significant bias. Other limitations include response and recall bias.
The data presented here are not enough to recommend that TCAT-ARC be used alone for high-stakes summative competency judgements, such as the decision to promote or graduate a trainee, or to certify a surgeon for independent practice. The optimal use of TCAT-ARC may be to provide ongoing, fine-grained formative feedback to trainees throughout their development. This addresses an important need in how we assess our thoracic surgery trainees, and offers the benefit of being able to guide the progress of our trainees while they are still in development, rather than waiting to assess them once training is completed.

The robust development process and early evidence of reliability, as well as internal and external validity, suggest that TCAT-ARC may have a role to play in the assessment of competence in anatomic lung resection, though more data are needed. In particular, TCAT-ARC helped facilitate specific and detailed formative feedback, which in turn may help trainees develop their skills further. Given its user-friendliness and the consistently positive feedback from both trainees and surgeons alike, this instrument could be easily incorporated into training programs to supplement existing competency assessment strategies.
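The point made in the Comment, that a high average can hide a single serious deficiency, can be illustrated with a short sketch. Everything here is hypothetical: the function name, the sample scores, and the `flag_below` cut-off, which is only a display filter for structuring feedback and not a competence threshold (the authors explicitly did not set one). Items recorded as "not applicable" are represented as `None` and excluded from the mean, mirroring the behaviour described in the text.

```python
from statistics import mean


def review_assessment(item_scores, flag_below=4):
    """Return (mean item score, items scored below `flag_below`).
    `item_scores` maps item name to a 1-5 rating, or None for "not applicable"."""
    rated = {name: s for name, s in item_scores.items() if s is not None}
    flagged = sorted(name for name, s in rated.items() if s < flag_below)
    return mean(list(rated.values())), flagged


# Hypothetical assessment: strong overall, one isolated low item.
case = {"Artery": 5, "Vein": 2, "Bronchus": 5, "Hemostasis": 5, "Chest drain": None}
average, needs_feedback = review_assessment(case)
print(average, needs_feedback)
```

Reporting the flagged items alongside the mean keeps the isolated low score visible, in line with the item-by-item review the instrument is intended to support.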


The authors acknowledge the mentorship of the Thoracic Education Cooperative Group (TECoG) and the assistance of the surgeons participating in the study: MSKCC: Drs. Adusumilli, Bains, Bott, Downey, Isbell, Jones, Molena and Sihag; SMCCI: Drs. Aye, Costas and Vallières; UA: Drs. Laing, Johnson, Stewart and Valji; UWO: Drs. Fortin, Frechette and Malthaner.


References

[1] ten Cate O. Competency-based postgraduate medical education: Past, present and future. GMS J Med Educ 2017;34:Doc69.
[2] Frank JR, Mungroo R, Ahmad Y, Wang M, De Rossi S, Horsley T. Toward a definition of competency-based education in medicine: A systematic review of published definitions. Med Teach 2010;32:631-7.
[3] Harris K, Frank J, eds. The Royal College of Physicians and Surgeons of Canada. Competence by Design: Reshaping Canadian Medical Education. Ottawa: The Royal College of Physicians and Surgeons of Canada 2014.
[4] Nasca TJ, Philibert I, Brigham T, Flynn TC. The next GME accreditation system-Rationale and benefits. N Engl J Med 2012;366:1051-6.
[5] Cumming A, Ross M. The Tuning Project for Medicine-Learning outcomes for undergraduate medical education in Europe. Med Teach 2007;29:636-41.
[6] Turner SR, Nasir BS, Lai H, Yasufuku K, Schieman C, Louie BE, Bédard EL. Thoracic competency assessment tool for invasive staging (TCAT-IS): Development and pilot testing. Ann Thorac Surg. (in press).
[7] Keeny S, Hasson F, McKenna H: The Delphi Technique in Nursing and Health Research. 1st ed. Chichester, UK: Wiley-Blackwell, 2011.
[8] Scott EA, Black N. When does consensus exist in expert panels. J Pub Health Med. 1991;13:35-9.
[9] Shapley L, Grofman B. Optimizing group judgemental accuracy in the presence of interdependencies. Pub Choice 1984;43:329-43.
[10] Schieman C, Ujie H, Donahoe L, Hanna W, Malthaner R, Turner S, et al. Developing a national, simulation-based, surgical skills bootcamp in general thoracic surgery. J Surg Educ 2018;75:1106-12.
[11] Martin JA, Regehr G, Reznik R, MacRae H, Murnaghan J, Hutchison C, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273-8.
[12] van der Vleuten CP. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Ed. 1996;1:41-67.
[13] Hodges B. Assessment in the post-psychometric era: Learning to love the subjective and collective. Med Teach 2013;35:564-8.


Table 1. Clinical pilot test trainees.

| Participant | Type of Training | Country | Year of Training | Study Cases Performed |
|---|---|---|---|---|
| 1 | General Thoracic | Canada | 1 | 6 |
| 2 | General Thoracic (non-accredited) | USA | 1 | 4 |
| 3 | General Thoracic | Canada | 2 | 2 |
| 4 | Cardiothoracic (thoracic track) | USA | 2 | 3 |
| 5* | General Thoracic | Canada | 2 | 32 |
| 5* | Thoracic Fellowship‡ | USA | 3 | 10 |
| 6 | Thoracic Fellowship‡ | USA | 3 | 4 |
| 7 | Thoracic Fellowship‡ | USA | 3 | 3 |

All trainees completed general surgery residency before entering thoracic training.
*One trainee participated in two institutions, in the second and third year of training respectively.
‡Trainees completing an additional year of training after completing two years of accredited general thoracic or cardiothoracic training.


Table 2. Item analysis for each item in the TCAT-ARC instrument. Items are listed in instrument order under the sections Preoperative, General, Thoracoscopic, Resection and Post-Resection.

| Item | Brief Descriptor | Point Biserial | Difficulty | Correlation With Case Difficulty |
|---|---|---|---|---|
| 1 | Suitability | 0.64 | 0.37 | -0.08 |
| 2 | Consent | 0.70 | 0.26 | -0.08 |
| 3 | Imaging | 0.46 | 0.31 | 0.00 |
| 4 | Approach | 0.71 | 0.24 | -0.08 |
| 5 | Lung isolation | 0.59 | 0.19 | 0.02 |
| 6 | Positioning | 0.71 | 0.19 | -0.05 |
| 7 | Prep and drape | 0.80 | 0.15 | -0.02 |
| 8 | Time-out | 0.62 | 0.17 | -0.05 |
| 9 | Incisions | 0.50 | 0.37 | -0.03 |
| 10 | Examination | 0.68 | 0.15 | -0.02 |
| 11 | Tissue handling | 0.49 | 0.35 | -0.07 |
| 12 | Hand motions | 0.55 | 0.56 | -0.43 |
| 13 | Use of assistants | 0.42 | 0.56 | -0.02 |
| 14 | Assisting | 0.52 | 0.39 | -0.14 |
| 15 | Communication | 0.67 | 0.15 | -0.17 |
| 16 | Ports | 0.59 | 0.31 | -0.11 |
| 17 | Thoracoscopy | 0.58 | 0.35 | 0.03 |
| 18 | Pre-op conditions | 0.57 | 0.33 | -0.01 |
| 19 | Intra-op conditions | 0.63 | 0.44 | -0.27 |
| 20 | PA injury demo | 0.50 | 0.46 | 0.07 |
| 21 | Mobilization | 0.62 | 0.35 | -0.13 |
| 22 | Lung parenchyma | 0.53 | 0.48 | -0.34 |
| 23 | Hilar anatomy | 0.54 | 0.54 | -0.47 |
| 24 | Artery | 0.46 | 0.63 | -0.37 |
| 25 | Vein | 0.36 | 0.56 | -0.07 |
| 26 | Bronchus | 0.41 | 0.69 | -0.36 |
| 27 | Specimen removal | 0.66 | 0.17 | -0.11 |
| 28 | Nodal anatomy | 0.51 | 0.48 | -0.34 |
| 29 | Node harvest | 0.48 | 0.59 | -0.45 |
| 30 | Order of steps | 0.74 | 0.28 | -0.26 |
| 31 | Air leak check | 0.11 | 0.70 | 0.06 |
| 32 | Hemostasis | 0.66 | 0.24 | -0.08 |
| 33 | Anesthesia | 0.05 | 0.65 | -0.18 |
| 34 | Chest drain | 0.77 | 0.13 | -0.17 |
| 35 | Lung re-expansion | 0.79 | 0.13 | -0.25 |


Figure Legends

Figure 1. Study schema for instrument development and initial collection of validity evidence.

Figure 2. (A, B) The Thoracic Competency Assessment Tool-Anatomic Lung Resection for Cancer (TCAT-ARC)

Figure 3. Average responses to simulated lobectomy pilot test feedback questionnaire (5-point Likert).

Figure 4. Performance of trainees over time (assessments and self-assessments). Dotted line = trend line of best fit (least squares method), dashed line = practicing surgeon average score.

Figure 5. Average responses to clinical validation study post-encounter feedback questionnaire (5-point Likert).
