Journal Pre-proof

The Canada Lymph Node Score For Prediction of Malignancy in Mediastinal Lymph Nodes During Endobronchial Ultrasound

Danielle A. Hylton, BSc., MSc., Simon Turner, MD., Biniam Kidane, MD., MSc., Jonathan Spicer, MD., PhD., Feng Xie, MSc., PhD., Forough Farrokhyar, MPhil., PhD., Kazuhiro Yasufuku, MD., PhD., John Agzarian, MD., MPH., Waël C. Hanna, MDCM., MBA, Canadian Association of Thoracic Surgery (CATS) Working Group

PII: S0022-5223(19)33477-4
DOI: https://doi.org/10.1016/j.jtcvs.2019.10.205
Reference: YMTC 15399
To appear in: The Journal of Thoracic and Cardiovascular Surgery
Received Date: 22 May 2019
Revised Date: 23 October 2019
Accepted Date: 29 October 2019
Please cite this article as: Hylton DA, Turner S, Kidane B, Spicer J, Xie F, Farrokhyar F, Yasufuku K, Agzarian J, Hanna WC, and the Canadian Association of Thoracic Surgery (CATS) Working Group, The Canada Lymph Node Score For Prediction of Malignancy in Mediastinal Lymph Nodes During Endobronchial Ultrasound, The Journal of Thoracic and Cardiovascular Surgery (2019), doi: https://doi.org/10.1016/j.jtcvs.2019.10.205.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Copyright © 2019 Published by Elsevier Inc. on behalf of The American Association for Thoracic Surgery
Title: The Canada Lymph Node Score For Prediction of Malignancy in Mediastinal Lymph Nodes During Endobronchial Ultrasound

Short Title: Canada Lymph Node Score: Malignancy Scoring System

Authors: Danielle A. Hylton, BSc., MSc.,a Simon Turner, MD.,b Biniam Kidane, MD., MSc.,c Jonathan Spicer, MD., PhD.,d Feng Xie, MSc., PhD.,a Forough Farrokhyar, MPhil., PhD.,a Kazuhiro Yasufuku, MD., PhD.,e John Agzarian, MD., MPH.,f and Waël C. Hanna, MDCM., MBA,a,f and the Canadian Association of Thoracic Surgery (CATS) Working Group

a Department of Health Research Methods, Evidence, and Impact, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada, L8S 4L8
b Division of Thoracic Surgery, Department of Surgery, University of Alberta, WC Mackenzie Health Sciences Centre, 8440 112 St. NW, Edmonton, Alberta, Canada, T6G 2R7
c Division of Thoracic Surgery, Department of Surgery, University of Manitoba, Health Sciences Centre, 820 Sherbrook Street, Winnipeg, Manitoba, Canada, R3A 1R9
d Division of Thoracic Surgery and Upper Gastrointestinal Surgery, Department of Surgery, McGill University, Montreal General Hospital, 1650 Cedar Avenue, Montreal, Quebec, Canada, H3G 1A4
e Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto General Hospital, 200 Elizabeth Street, Toronto, Ontario, Canada, M5G 2C4
f Division of Thoracic Surgery, Department of Surgery, McMaster University, St. Joseph’s Healthcare Hamilton, 50 Charlton Avenue East, Hamilton, Ontario, Canada, L8N 4A6
Corresponding Author Details: Dr. Waël C. Hanna, [email protected], Division of Thoracic Surgery, Department of Surgery, McMaster University, 50 Charlton Avenue East, Hamilton, Ontario, Canada, L8N 4A6

Conflicts of Interest:

The following authors report no conflicts of interest: DAH, ST, BK, JT, FX, FF, and JA. JS reports the following: Dr. Spicer is the principal investigator for the Checkmate 816 trial with Bristol-Myers-Squibb, a member of the Speaker’s Bureau for Bristol-Myers-Squibb, a member of the advisory board for AstraZeneca, and holds a research grant with Pattern Pharma. WCH reports the following: Dr. Hanna is a member of the Speaker’s Bureau with Minogue Medical, a member of the Data and Safety Monitoring Board for Roche/Genentech, and holds a research grant with Intuitive Surgical. KY reports the following: Dr. Yasufuku has received unrestricted grants from Olympus Medical Systems for continuing medical education.

Funding: This work was supported by the McMaster Surgical Associates (MSA) grant 2016.

Clinical Trial Registration Number: NCT02793713

Date of IRB Approval & IRB Number: July 25th, 2016; Hamilton Integrated Research Ethics Board (2016-1876-GRA).

Text word count: 3,453
Abstract word count: 220
Number of References: 17
Number of Figures: 4
Number of Tables: 4
Number of Videos: 1
Number of Supplementary Figures: 3
Number of Supplementary Tables: 4

Keywords: endobronchial ultrasound, lung cancer, lymph nodes
Glossary of Abbreviations:

1. CI = Confidence interval
2. CLNS = Canada Lymph Node Score
3. CT = Computed tomography
4. EBUS-TBNA = Endobronchial ultrasound trans-bronchial needle aspiration
5. MRI = Magnetic resonance imaging
6. PET = Positron emission tomography
7. ROC = Receiver operator characteristic

Central Message:
Using a four-point score, lymph nodes scoring ≥ 3 indicate malignancy and may inform decision-making for biopsy, repeat biopsy, or mediastinoscopy if initial results are inconclusive.

Perspective Statement:
Ultrasonographic features of mediastinal lymph nodes can predict malignancy during endobronchial ultrasound. When initial biopsy results are insufficient, application of an ultrasonographic score can determine when a repeat biopsy is necessary. This score may also guide decision-making regarding which lymph nodes should be biopsied, with the potential to reduce the number of biopsies required.
Abstract:

Objective(s): During endobronchial ultrasound (EBUS) staging, ultrasonographic features can be used to predict mediastinal lymph node (LN) malignancy. We sought to develop the Canada Lymph Node Score (CLNS), a tool capable of predicting LN metastasis at the time of EBUS.

Methods: Patients undergoing EBUS staging for lung and esophageal cancer were prospectively enrolled. Features were identified in real-time by an endoscopist and video-recorded. Videos were sent to raters. Pathological specimens from biopsies/surgical resections were used as the gold-standard reference test. Logistic regression, receiver operator characteristic curve, and Gwet’s AC1 analyses were used to test the performance, discrimination, and inter-rater reliability, respectively.

Results: In total, 300 LNs from 140 patients were analyzed by 12 endoscopists (raters) across 7 Canadian centres. Beta-coefficients from a multivariate regression model were used to create a 4-point score: short-axis diameter, margins, central hilar structure, and necrosis. The model showed good discriminatory power (c-statistic = 0.72 ± 0.04, 95%CI: 0.64-0.80; bias-corrected c-statistic: 0.66, 95%CI: 0.55-0.76). LNs scoring 3/4 or 4/4 had odds ratios of 15.17 (p<0.0001) and 50.56 (p=0.001) for predicting malignancy, respectively. Inter-rater reliability for a score ≥ 3 was 0.81 ± 0.02 (95%CI: 0.77-0.85).

Conclusions: The CLNS is a 4-point score demonstrating excellent performance in identifying malignant LNs during EBUS. A cut-off of ≥ 3 may inform decision-making regarding biopsy, repeat biopsy, or mediastinoscopy if the initial results are inconclusive.
Introduction:

Endobronchial ultrasound transbronchial needle aspiration (EBUS-TBNA) is the first-line investigation for mediastinal staging in patients with lung cancer. Unlike mediastinoscopy, EBUS allows for the visualization and identification of several ultrasonographic nodal features that can be predictive of malignancy. These features include small axis diameter (size), central hilar structure, central necrosis, and margin status (Figure 1).2,3,4 These features become particularly useful when lymph node biopsies are inadequate or inconclusive, as is the case in as many as 42% of EBUS-TBNA procedures,5 when a repeat EBUS-TBNA or mediastinoscopy is mandated.6

There have been several attempts to develop diagnostic scores for predicting malignancy based on these ultrasonographic features. While those scores remain useful in clinical situations, there have been some limitations to their widespread applicability, including lack of weighting to reflect the features most predictive of malignancy,7 lack of external validation,8 and overestimation of the possibility of malignancy.9,10 As such, lymph node scoring systems have not been incorporated into clinical guidelines for lung cancer staging, and have not gained widespread acceptance by clinicians.

Building on previous work, we hypothesized that a novel scoring system, called the Canada Lymph Node Score (CLNS), could accurately and reliably predict malignancy in mediastinal lymph nodes at the time of EBUS staging for lung and esophageal cancer.

Materials and Methods:

Design and Patient Selection:
To improve upon the previous malignancy scores, we endeavoured to develop a predictive scoring tool using prospective data collection, model calibration, discrimination assessment, complete inter-rater reliability evaluation, and a Canada-wide external validation assessment. A prospective cohort validation design was used and the study was divided into two parts.

Part I: Data Collection and Assessment of Validity:

A combination of consecutive and convenience sampling was used to recruit patients undergoing EBUS for staging of suspected or confirmed lung or esophageal cancer. Informed consent was obtained prior to enrolment. The only exclusion criterion was neoadjuvant chemotherapy, in order to avoid nodal down-staging as a confounding variable. Lymph nodes were included in the study whether positive or negative on PET scan.

At the time of EBUS, all ultrasonographic features were identified and a score was assigned to each lymph node in real-time (details below). A screen-recording device was used to record each lymph node assessment for reliability and validation experiments. Lymph nodes were then biopsied after completion of ultrasound assessment. Features of malignancy for each lymph node were then compared to the gold standard of cytological specimen (if the node was not excised surgically) or operative pathological specimen (if the node was excised surgically during subsequent lung resection). Lymph node stations were identified using the International Association for the Study of Lung Cancer Lymph Node Mapping Nomenclature.11

Part II: Assessment of external reliability:
Thoracic surgeons and interventional respirologists (raters) from across Canada were invited to participate in identifying ultrasonographic features from the lymph nodes included in Part I. Prior to participating, all raters were required to complete an online education module. Only raters who successfully passed the competency test (correctly scoring a lymph node) at the end of the module were eligible to participate in the study. The education module was specially designed using competency-enhancing educational theories to encourage mastery of the core concepts related to correctly identifying ultrasonographic features.12 The module comprised six sections: 1) pre-educational assessment, 2) educational videos, 3) ultrasonographic criteria identification with corrective feedback, 4) practice with corrective feedback, 5) summary of learned concepts, and 6) post-education assessment. Raters were then asked to watch the lymph node videos from Part I and assign a score to each lymph node. Raters were blinded to all information from other staging tests and final pathology results of each lymph node.

This study was conducted in accordance with the Declaration of Helsinki and approved by the Hamilton Integrated Research Ethics Board (2016-1876-GRA).

Endobronchial Ultrasound Trans-Bronchial Needle Aspiration:

In Part I, EBUS-TBNA, ultrasonographic feature identification, and video-recording were completed by the same thoracic surgeon (WCH) at a tertiary centre for thoracic oncology. Procedures were performed via the oral route under deep sedation with midazolam and fentanyl, using an Olympus endoscope (Olympus, Shinjuku-ku, Tokyo, Japan) with convex probe EBUS and the EU-ME1 transducer (Olympus, Shinjuku, Tokyo, Japan). After identification of lymph node stations using anatomical landmarks in the airway and mediastinum, the dimensions (axes lengths) of the lymph nodes were measured using frozen images. The other ultrasonographic features were identified and recorded in real-time. Transbronchial needle aspiration with a 22-gauge needle was then performed in the usual manner under ultrasound guidance. The specimen was spread onto glass slides, fixed, and air-dried. The dried slides were evaluated immediately by a cytology technician via rapid on-site examination to determine whether they were adequate for pathological analysis.

Analysis of Ultrasound Images and Score Assignment:

In Part II, the following ultrasonographic features were identified according to the strict definitions by Fujiwara et al. (2010)4:

1. Round shape: defined as the ratio of the long axis to the small axis being less than 1.5.
2. Well-defined margins: > 50% echogenic line delimiting the lymph node.
3. Heterogeneous echogenicity: presence of non-uniform echogenic patterns.
4. Absence of a central hilar structure: missing a flat, central, echogenic structure in the lymph node.
5. Small axis > 10 mm: presence of a small axis length greater than 10 mm.
6. Presence of central necrosis: presence of a central hypoechoic structure in the lymph node.
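As a rough illustration, the six feature definitions above can be encoded as simple checks, and the four features retained in the CLNS (short-axis diameter, margins, central hilar structure, and necrosis, per the Abstract) can be summed into a 4-point score with the ≥ 3 cut-off. This is a minimal sketch, not the study's software: the class and function names are hypothetical, and it assumes that each retained feature contributes one point in the direction listed above (small axis > 10 mm, well-defined margins, absent central hilar structure, central necrosis present).

```python
from dataclasses import dataclass


@dataclass
class LymphNodeFindings:
    """Hypothetical container for one node's ultrasonographic findings."""
    long_axis_mm: float
    small_axis_mm: float
    well_defined_margins: bool            # > 50% echogenic line delimiting the node
    heterogeneous_echogenicity: bool      # non-uniform echogenic patterns
    central_hilar_structure_present: bool
    central_necrosis_present: bool


def is_round(node: LymphNodeFindings) -> bool:
    # Round shape: long-axis to small-axis ratio below 1.5.
    return node.long_axis_mm / node.small_axis_mm < 1.5


def clns(node: LymphNodeFindings) -> int:
    # Assumption: one point per retained feature, maximum of 4.
    return sum([
        node.small_axis_mm > 10,
        node.well_defined_margins,
        not node.central_hilar_structure_present,
        node.central_necrosis_present,
    ])


def meets_cutoff(node: LymphNodeFindings, cutoff: int = 3) -> bool:
    # A score >= 3 is the cut-off reported for predicting malignancy.
    return clns(node) >= cutoff
```

Shape and echogenicity are recorded but do not enter the score, which mirrors the four-feature composition described in the Abstract.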
Eligible raters were given access to an online survey system housed by the Research Electronic Data Capture system. This system dispatched 10 lymph node videos per survey to each rater. The system continued to send batches of 10 new lymph node videos (one survey) to each rater until they either opted out of the study or reached the total sample of videos.

Statistical Analysis:

The lymph node was used as the unit of analysis. The data are presented as mean ± standard deviation, median (range), or number (percentage). Negative likelihood ratios, positive likelihood ratios, sensitivity, specificity, positive predictive value, and negative predictive value were calculated using pathological diagnosis from surgical specimen and/or lymph node aspirates as the gold standard. A cut-off of ≥ 3 was used to calculate sensitivity, specificity, negative predictive value, and positive predictive value. Pearson’s chi-square analysis was used to assess categorical variables. Univariate binary logistic regression was completed for each ultrasonographic feature (shape, echogenicity, central hilar structure, small axis length, and central necrosis) (significance level α = 0.05). Binary logistic regression was used to identify features significantly predictive of malignancy and build a prediction model. A combination of backwards and stepwise automatic variable selection procedures and clinical judgment was used to build the final multivariate model. Receiver operating characteristic (ROC) curve and concordance statistic (c-statistic) were used to evaluate the model’s discriminatory capability. Bootstrapping (using 1000 repetitions) was used where appropriate to calculate optimism-corrected 95% CIs. Model calibration was assessed using the Hosmer-Lemeshow test.

Gwet’s AC1 coefficient was used to calculate inter-rater reliability based on expert evaluation by 12 endoscopists from 7 Canadian centres. Gwet’s AC1 coefficient is an alternative to the commonly used Cohen’s Kappa for determining inter-rater reliability. Gwet’s AC1 calculates coefficients across more than two raters and avoids Kappa’s Paradox. It was therefore deemed most appropriate for this study design. Interpretation of Gwet’s AC1 is similar to that of Cohen’s Kappa: higher coefficient values (>0.6) indicate greater inter-rater reliability. To develop the CLNS, a previously published approach by Han et al. was used.13 This method uses the beta-coefficients from the multivariate logistic regression to determine the scoring values for each included ultrasonographic feature.12 The sample size was calculated based on confidence intervals rather than power and using a normal approximation to a binomial distribution.14 Assuming 90.00% sensitivity and specificity, a prevalence of 30% for malignant lymph nodes, and an alpha error of 0.05 for EBUS, 300 lymph nodes would provide adequate precision with confidence intervals of ± 3.50% for diagnostic properties. Statistical analyses were performed with STATA 15 (StataCorp 15, 2017, College Station, Texas, United States of America) and levels of significance were set at p<0.05. The funding agency for this study, McMaster Surgical Associates, was not involved in any statistical analysis or data interpretation related to this study.

Results:

Patient Characteristics and Histology Results:

In total, 140 patients were recruited. The majority (55.0%, n=77) had lung cancer, 22.9% (n=32) had esophageal cancer, and 22.1% (n=31) had benign disease. The average age of participants was 68±10.6 years and 54.3% (n=76) were male (Table 1). Clinical staging with chest Computed Tomography (CT) and PET scans was performed in 99.3% of patients prior to EBUS. A median of 3 (1-4) lymph nodes were sampled per lung cancer patient. Of the 300 lymph nodes sampled, the most commonly biopsied stations were 7 (43.0%) and 4R (28.7%). The prevalence of malignant disease in the lymph nodes was 18.0% (Figure 2). Of the lymph nodes pathologically determined to be malignant (n=54), 96.3% were initially confirmed as malignant during EBUS. After surgical resection, an additional two lymph nodes were confirmed as malignant (2/54, 3.7%).
Ultrasonographic Feature Relationship with Lymph Node Malignancy:

Analysis of the ultrasonographic features of lymph nodes is shown in e-Table 1. The 4 features of small axis length, absence of central hilar structure, margins, and central necrosis were predictive of malignancy on Pearson’s chi-square test, whereas the 2 features of echogenicity and shape were not. Backwards elimination and stepwise modeling (e-Table 2) both demonstrated that central hilar structure, small axis length, and margins were significant predictors of malignancy, but central necrosis was not (p=0.142) (Table 2). However, because of the low prevalence of this finding (4.7%, n=14) and overwhelming literature that suggests its association with malignancy, it was kept in the model on clinical grounds.4 The final multivariate model included the following features: central hilar structure, small axis length, margins, and central necrosis.

Model Calibration & Discrimination:

The Hosmer-Lemeshow test was used to assess the calibration of the model. The resulting chi-square value was 2.99, corresponding to a p-value of 0.89 (calibration plot depicted in e-Figure 1). The high p-value means that the null hypothesis (that the model fits the data well) is retained. Figure 3 depicts the ROC curve for the multivariate model. The c-statistic of 0.72 ± 0.04 (normal 95%CI: 0.64-0.80) (bias-corrected c-statistic: 0.66, 95%CI: 0.55-0.76) indicates good discriminatory capability for distinguishing between malignant and benign lymph nodes.

Development of the Novel Predictive Score:

Using formulae from Han et al. (2016), the smallest beta coefficient from the multivariate model (central hilar structure β=0.81) was used as a base constant.12 The beta-coefficients for each feature in the multivariate model were then divided by the base constant. The resulting quotient provided the score value. The results are described in e-Table 3. Based on these calculations, each covariate was allotted a score of 1, resulting in a maximum score of 4 for each lymph node. Lymph nodes had the following CLNS distribution: 0 = 34.3% (n=103), 1 = 37.0% (n=111), 2 = 20.0% (n=60), 3 = 6.7% (n=20), 4 = 2.0% (n=6). This distribution corresponds with the pathological gold-standard analysis, which found 82.0% (n=246) of the study sample to be benign.
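The point-allocation step described above (divide each beta-coefficient by the smallest one, then use the quotient as the score value) can be sketched as follows. Only the central hilar structure coefficient (0.81) comes from the text; the other three coefficients are hypothetical placeholders chosen for illustration, since the actual values are in e-Table 3, and rounding the quotient to the nearest integer is an assumption about how the quotient becomes a point value. The second half simply checks the reported score distribution against the 300-node total.

```python
def points_from_betas(betas: dict[str, float]) -> dict[str, int]:
    """Divide each coefficient by the smallest one and round to whole points."""
    base = min(betas.values())
    return {feature: round(beta / base) for feature, beta in betas.items()}


# 0.81 is reported in the text; the other values are PLACEHOLDERS, not study data.
example_betas = {
    "central_hilar_structure": 0.81,
    "small_axis_length": 1.05,   # hypothetical
    "margins": 0.95,             # hypothetical
    "central_necrosis": 1.10,    # hypothetical
}
points = points_from_betas(example_betas)
max_score = sum(points.values())

# Reported CLNS distribution (score -> number of lymph nodes).
distribution = {0: 103, 1: 111, 2: 60, 3: 20, 4: 6}
total_nodes = sum(distribution.values())
nodes_at_or_above_cutoff = sum(n for score, n in distribution.items() if score >= 3)
percentages = {score: round(100 * n / total_nodes, 1) for score, n in distribution.items()}
```

With any set of coefficients lying within about 1.5 times the smallest, every feature rounds to one point, which is consistent with the 4-point maximum reported above.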
A logistic regression was completed to evaluate the CLNS (Table 2). A score of one (1/4) was not a statistically significant predictor of malignancy, whereas scores ≥ 2 were significant. e-Figure 2 describes the increase in probability as the lymph node scores increase. Lymph nodes scoring 0-2 had low probabilities of being malignant, whereas lymph nodes scoring 3-4 had a higher likelihood of being malignant (e-Figure 2). This figure also illustrates why, although a score of 2 was a significant predictor of malignancy, a score ≥ 3 was considered the cut-off for malignancy. The steep increase in probability for lymph nodes scoring ≥ 3 suggests greater clinical significance than a score of two. Table 3 summarizes the sensitivity, specificity, and positive and negative likelihood ratios for each possible CLNS score. A lymph node scoring four had a specificity of 99.59% and a positive likelihood ratio of 22.78. The CLNS, when using ≥ 3 as the cut-off, was associated with an overall sensitivity, specificity, positive predictive value, and negative predictive value of 31.5%, 96.3%, 65.4%, and 86.5%, respectively.

External Evaluation of the CLNS Across Canada:

A range of 10-300 lymph nodes were evaluated per rater. Reliability for the individual features ranged from 0.25±0.03 (95%CI: 0.18-0.31) for echogenicity to 0.77±0.02 (95%CI: 0.72-0.82) for central necrosis (e-Table 4). The agreement between raters (n=12) on the raw scores was 0.29±0.02 (95%CI: 0.25-0.33). Reliability improved to 0.74±0.02 (95%CI: 0.70-0.79) when only the raters who reviewed all 300 lymph nodes were considered. The inter-rater reliability of a CLNS≥3 was 0.81±0.02 (95%CI: 0.77-0.85) (Table 4).

Discussion:
The current standard of care at the time of EBUS staging of the mediastinum for lung cancer is comprehensive and systematic EBUS-TBNA sampling of all the relevant nodal stations, with a minimum of three stations (4R, 4L, and 7).6 In reality, this happens less than 50% of the time, likely due to technical difficulties, patient factors, small lymph nodes, inadequate samples, and inconclusive pathological interpretation.15 In those cases, where a pathological diagnosis is unobtainable, ultrasonographic assessment with a malignancy score can serve as a helpful adjunct to guide clinical decisions. However, ultrasonographic features are not frequently reported by clinicians, likely due to the complexity of previous scores and to the lack of high-quality evidence confirming their reliability and validity. In this work, we address this problem by developing a 4-point score that was tested with robust statistical methodology and widely validated across Canada (Figure 4).

To the best of our knowledge, there exist four other scores using ultrasonographic features to predict malignancy (e-Figure 3). Each of these studies was retrospective, and only one included a prospective internal validation. Three did not use beta-coefficients to develop their scoring mechanisms;7,8,9 instead, arbitrary weights were assigned for each score component. Development of the CLNS incorporated the magnitude of the relationship for each feature based on their beta-coefficients, thus improving the accuracy of the score. The analysis by Evison et al. found three significant predictors of malignancy, with only one being an ultrasonographic feature (echogenicity) and the remaining two being related to PET,10 limiting its use in PET-negative lymph nodes. The CLNS strictly includes features that can be assessed during EBUS procedures; therefore, lymph nodes not imaged on PET can still be accurately assessed. The study by Alici et al. was the only one to not report a formal inter-rater reliability assessment.9 Schmid-Bindert et al. and Shafiek et al. reported raw agreements for each ultrasonographic feature; however, these tend to overestimate the true level of agreement.7,8,16 Evison et al. assessed inter-rater reliability using Cohen’s kappa statistic; however, the results were likely influenced by trait prevalence and base-rates, making these results incomparable across different populations.10,17,18 Unlike previous published studies, we formally assessed the reliability of our predictive score in a sample of 12 raters and determined that the CLNS has a high level of agreement among raters, especially when expert raters who looked at 300 lymph nodes were considered. We acknowledge that the external reliability assessment for ultrasonographic features had a relatively high degree of variability. We hypothesize that this observed variability was due to the operator-dependent nature of ultrasound assessment and the varying level of expertise of the raters. These two factors represent the reality of ultrasound assessment, which further reinforces the generalizability of the Canada Lymph Node Score.
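The contrast drawn above between Cohen’s kappa and Gwet’s AC1 under skewed trait prevalence can be illustrated with a small sketch. The functions below follow the commonly published formulas (kappa for two raters; AC1 for any number of raters, with chance agreement based on average category prevalence); they are an illustration with made-up ratings, not the study’s STATA analysis, and all names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters rating the same subjects."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] / n * freq_b[k] / n for k in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)


def gwet_ac1(ratings: list, categories: list) -> float:
    """Gwet's AC1; `ratings` holds one list of labels per subject (None = not rated)."""
    q = len(categories)
    agreement_terms = []
    prevalence = {k: 0.0 for k in categories}
    n_rated = 0
    for subject in ratings:
        labels = [r for r in subject if r is not None]
        r_i = len(labels)
        if r_i == 0:
            continue
        n_rated += 1
        counts = Counter(labels)
        for k in categories:
            prevalence[k] += counts[k] / r_i
        if r_i >= 2:  # pairwise agreement needs at least two ratings
            pairs = sum(c * (c - 1) for c in counts.values())
            agreement_terms.append(pairs / (r_i * (r_i - 1)))
    observed = sum(agreement_terms) / len(agreement_terms)
    chance = sum((p / n_rated) * (1 - p / n_rated) for p in prevalence.values()) / (q - 1)
    return (observed - chance) / (1 - chance)


# Made-up example: 100 nodes, 90% raw agreement, positives are rare.
rater_a = [0] * 90 + [1] * 5 + [0] * 5
rater_b = [0] * 90 + [0] * 5 + [1] * 5
kappa = cohen_kappa(rater_a, rater_b)
ac1 = gwet_ac1([[a, b] for a, b in zip(rater_a, rater_b)], categories=[0, 1])
```

In this made-up example the two raters agree on 90% of nodes, yet kappa is slightly negative while AC1 is close to 0.89, which is the prevalence-driven paradox that motivated the choice of AC1.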
Clinically, the results of the CLNS suggest that the malignancy status of lymph nodes can be predicted based on ultrasonographic features. The pathological information generated from using the CLNS may have a profound impact in the event of insufficient biopsy results. In such instances, it is possible that clinicians may be able to determine whether a repeat EBUS procedure needs to be completed based on the CLNS. However, this will require external validation before it becomes a reality. Alternatively, the CLNS can be used prior to EBUS-TBNA to score lymph nodes, and those scoring ≤ 2 may not require biopsy as they are unlikely to be malignant.

The CLNS is a high-specificity, low-sensitivity test. This, in combination with the high negative predictive value and positive likelihood ratio (for CLNS >2), suggests that the CLNS is highly capable of detecting benign disease and ruling in malignancy. The low sensitivity of the CLNS suggests that it should be used as a second-line diagnostic tool and could potentially reduce or eliminate the need for mediastinoscopy after initial EBUS. Current standard of care guidelines mandate that at least three mediastinal stations need to be biopsied for complete staging. We posit that the CLNS can be used to document the score of each node prior to biopsy; however, all nodes must still be biopsied in accordance with guidelines. In instances of inconclusive biopsy results, the CLNS can be used to determine the necessity of repeat EBUS-TBNA or mediastinoscopy. Research is ongoing to determine the clinical utility of using the CLNS to decide whether a lymph node should be biopsied, and of integrating it with CT/PET results (clinicaltrials.gov: NCT03859349).
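The high-specificity, low-sensitivity profile discussed above can be reproduced from a 2×2 table. The cell counts below are not reported in the text; they are back-calculated here from the reported totals (300 nodes, 54 malignant, 26 scoring ≥ 3, 6 scoring 4) and the reported percentages, so they should be read as an assumption that happens to be consistent with the published figures. Function and variable names are hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard accuracy measures from a 2x2 table against the gold standard."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Back-calculated (assumed) counts for the cut-off CLNS >= 3:
# 26 nodes at or above the cut-off, 54 malignant and 246 benign nodes in total.
cutoff_3 = diagnostic_metrics(tp=17, fp=9, fn=37, tn=237)

# Back-calculated (assumed) counts for a score of 4 (6 nodes in total).
score_4 = diagnostic_metrics(tp=5, fp=1, fn=49, tn=245)

summary = {name: round(100 * cutoff_3[name], 1)
           for name in ("sensitivity", "specificity", "ppv", "npv")}
```

Under these assumed counts, `summary` matches the reported 31.5%, 96.3%, 65.4%, and 86.5%, and the score-of-4 table gives the reported specificity of 99.59% and positive likelihood ratio of 22.78.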
This study is not without limitations. First, it included only patients with suspected or confirmed lung or esophageal cancer. Lymphadenopathy is not only present in patients with cancer. Mediastinal lymph nodes in patients with sarcoidosis, certain autoimmune diseases, and tuberculosis can also exhibit ultrasonographic features similar to those presented above.9,4 In this study, we have isolated the ultrasonographic features predictive of malignancy and developed a predictive tool without considering benign causes of lymphadenopathy, and therefore the applicability of the CLNS is limited in these conditions. Second, the CLNS did not include any features of preoperative CT or PET/CT, which are important staging studies that determine pretest probability of malignancy. We also acknowledge that our population only includes patients that have had previous PET and CT imaging, therefore predisposing this research to work-up bias. However, we are cognizant that mediastinal staging also occurs after initial imaging has ruled out distant disease. As such, in real-world situations most patients undergoing EBUS-TBNA for staging will have already completed a full imaging work-up, and this study population reflects that fact. Third, despite logistic regression analysis suggesting that lymph nodes scoring 2 and 3 were highly significant and predictive of malignancy, the corresponding probability of malignancy and positive predictive value are not clinically significant. To determine the true clinical importance of lymph nodes scoring 2 and 3 on the CLNS, further validation with larger sample sizes will be required. This work has laid the foundation for future research into predictive scores using ultrasonographic features. Future endeavours related to this research will need to include formal external validation with different clinicians and patients to understand the true clinical utility of the CLNS. This external validation step will determine the generalizability of the CLNS, which is necessary for any scoring system. Until such external validation is conducted, pathologic diagnosis must remain the gold standard. Furthermore, implementation science methodology should be incorporated in future application of this research to accurately assess the potential for systematic integration of the CLNS. Other interesting research avenues include correlation of the score with CT/PET results (clinicaltrials.gov: NCT03859349).
The Canada Lymph Node Score can be used as a tool to enhance prediction of malignancy status of mediastinal lymph nodes during EBUS staging of the mediastinum for lung and esophageal cancers. In the event of inconclusive biopsies, the CLNS can be used to determine whether a repeat EBUS-TBNA or mediastinoscopy is required. The CLNS may also assist in decision-making at the time of EBUS with respect to which lymph nodes need to be biopsied.

Acknowledgements:

WCH is the senior author of this manuscript, contributed significantly to study inception, study design, statistical analysis, and revising the manuscript, and accepts responsibility for the accuracy of the data analysis (guarantor). DAH had full access to the data, contributed substantially to the study design, data collection, statistical analysis and interpretation, and writing of the manuscript, and accepts responsibility for the accuracy of the data analysis (guarantor). JH, ST, DF, CW, JM, BK, JS, JT, C Finley, YS, C Fahim, FX, FF, KY, and JA contributed substantially to study design, data collection, and revising the manuscript.

This work was supported by the McMaster Surgical Associates (MSA) grant 2016.

The following authors report no conflicts of interest: DAH, JH, ST, DF, CW, JM, BK, JT, C Finley, YS, C Fahim, FX, FF, JA, and WCH. JS reports the following: Dr. Spicer is the principal investigator for the Checkmate 816 trial with Bristol-Myers-Squibb, a member of the Speaker’s Bureau for Bristol-Myers-Squibb, a member of the advisory board for AstraZeneca, and holds a research grant with Pattern Pharma. KY reports the following: Dr. Yasufuku has received unrestricted grants from Olympus Medical Systems for continuing medical education.

References:

1. Silvestri GA, Gonzales AV, Jantz MA, Margolis ML, Gould MK, Tanoue LT, et al.
Methods of staging non-small cell lung cancer: Diagnosis and management of lung
475
cancer, 3rd edn. American College of Chest Physicians Evidence-Based Clinical Practice
476
Guidelines. Chest. 2013;143(Suppl):e211S–e250S. doi: 10.1378/chest.12-2355.

2. Akissue de Camargo Teixeira P, Chala LF, Shimizu C, Filassi JR, Maesaka JY, de Barros N. Axillary Lymph Node Sonographic Features and Breast Tumor Characteristics as Predictors of Malignancy: A Nomogram to Predict Risk. Ultrasound Med Biol. 2017;43(9):1837-1845. doi: 10.1016/j.ultrasmedbio.2017.05.003.

3. Sun YS, Lyu HJ, Zhao YR, Zhang SS, Bai YX, Shi BY. Risk factors for central neck lymph node metastases of papillary thyroid carcinoma. Zhonghua Er Bi Yan Hou Tou Jing Wai Ke Za Zhi. 2017;52(6):421-25.

4. Fujiwara T, Yasufuku K, Nakajima T, Chiyo M, Yoshida S, Suzuki M, et al. The Utility of Sonographic Features During Endobronchial Ultrasound Guided Transbronchial Needle Aspiration for Lymph Node Staging in Patients with Lung Cancer: A Standard Endobronchial Ultrasound Image Classification System. Chest. 2010;138(3):641-647.

5. Ortakoylu M, Iliaz S, Bahadir A, Aslan A, Iliaz R, Ozgul MA, et al. Diagnostic value of endobronchial ultrasound-guided transbronchial needle aspiration in various lung diseases. J Bras Pneumol. 2015;41(5):410-14.

6. National Institute for Health and Care Excellence. Lung cancer: diagnosis and management. Retrieved from: https://www.nice.org.uk/guidance/cg121/chapter/1-Guidance#diagnosis-and-staging; 2011. Accessed 15 October 2018.

7. Schmid-Bindert G, Jiang H, Kähler G, Saur J, Hanzler T, Wang H, et al. Predicting malignancy in mediastinal lymph nodes by endobronchial ultrasound: a new ultrasound scoring system. Respirology. 2012;17:1190-8.

8. Shafiek H, Fiorentino F, Peralta AD, Serra E, Esteban B, Martinez R, et al. Real-time prediction of mediastinal lymph node malignancy by endobronchial ultrasound. Arch Bronconeumol. 2014;50(6):228-234.

9. Alici I, Demirci N, Yilmaz A, Karakaya J, Ozaydin E. The sonographic features of malignant mediastinal lymph nodes and a proposal for an algorithmic approach for sampling during endobronchial ultrasound. Clin Respir J. 2016;10:606-613.

10. Evison M, Morris J, Martin J, Shah R, Barber PV, Booton R, et al. Nodal Staging in Lung Cancer: A Risk Stratification Model for Lymph Nodes Classified as Negative by EBUS-TBNA. J Thorac Oncol. 2015;10:126-133. doi: 10.1097/JTO.0000000000000348.

11. El-Sherief AH, Lau CT, Wu CC, Drake RL, Abbott GF, Rice TW. International Association for the Study of Lung Cancer (IASLC) Lymph Node Map: Radiologic Review with CT Illustration. RadioGraphics. 2014;34(6):1681-91. doi: 10.1148/rg.346130097.

12. Hylton D, Fahim C, Shargall Y, Finley C, Agzarian J, Hanna WC. A Novel Online Education Module to Teach Clinicians How to Correctly Identify Ultrasonographic Features of Mediastinal Lymph Nodes During Endobronchial Ultrasound. Canadian Journal of Surgery [Accepted for publication]. 2019.

13. Han K, Song K, Choi B. How to Develop, Validate, and Compare Clinical Prediction Models Involving Radiological Parameters: Study Design and Statistical Methods. Korean J Radiol. 2016;17(3):339-350. doi: 10.3348/kjr.2016.17.3.339.

14. Jiroutek M, Muller K, Kupper L, Stewart P. A New Method for Choosing Sample Size for Confidence Interval-Based Inferences. Biometrics. 2003;59(3):580-590. doi: 10.1111/1541-0420.00068.

15. Boffa D, Fernandez F, Kim S, Kosinski A, Onaitis MW, Cowper P, et al. Surgically Managed Clinical Stage IIIA-Clinical N2 Lung Cancer in The Society of Thoracic Surgeons Database. Ann Thorac Surg. 2017;104:395-403. doi: 10.1016/j.athoracsur.2017.02.031.

16. Hallgren KA. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor Quant Methods Psychol. 2012;8(1):23-4.

17. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology. 1990;43(6):543-9.

18. Thompson WD, Walter SD. Kappa and the concept of independent errors. Journal of Clinical Epidemiology. 1988;41:969-70.

Tables:

Table 1. Patient baseline demographics and pathological diagnosis of biopsied and scored lymph nodes

| Characteristic | Population Size (n = 140) |
|---|---|
| Age (years) [mean ± SD] | 68 ± 10.6 |
| Males: n (%) / females: n (%) | 76 (54.3%) / 64 (45.7%) |
| Pre-planned imaging studies completed | |
| MRI, n (%) | 27 (19.3%) |
| Head CT, n (%) | 10 (7.1%) |
| Chest CT or PET, n (%) | 139 (99.3%) |
| Median (range) of Lymph Nodes Scored/Biopsied per Patient | 3 (1-4) |
| Biopsied Lymph Nodes (n = 300) | |
| 7, n (%) | 129 (43.0%) |
| 4R, n (%) | 86 (28.7%) |
| 4L, n (%) | 54 (18.0%) |
| 10, n (%) | 13 (4.3%) |
| 11, n (%) | 6 (2.0%) |
| Other (1, 2R, 2L, 12), n (%) | 12 (4.0%) |
| Pathology Diagnosis: Malignant Cases | |
| Primary lung cancer | n = 77 (55.0%) |
| Adenocarcinoma, n (%) | 36 (46.8%) |
| Squamous cell carcinoma, n (%) | 25 (32.5%) |
| Other, n (%) | 16 (20.8%) |
| Primary esophageal cancer | n = 32 (22.9%) |
| Adenocarcinoma, n (%) | 28 (87.5%) |
| Squamous cell carcinoma, n (%) | 3 (9.4%) |
| Other, n (%) | 1 (3.1%) |
| Pathology Diagnosis: Benign Cases | 31 (22.1%) |
| Pathological Diagnosis: Lymph Nodes | |
| Malignant, n (%) | n = 54 (18.0%) |
| Benign, n (%) | n = 246 (82.0%) |

SD = Standard deviation
MRI = Magnetic resonance imaging
CT = Computed tomography

Table 2. Multivariate Analyses for Ultrasonographic Features with Logistic Regression

| Ultrasonographic Features | OR | Beta Coefficients | OR Standard Error | Z Score | P-Value | OR 95% Confidence Interval | Bias Corrected 95% Confidence Interval |
|---|---|---|---|---|---|---|---|
| Central Hilar Structure (Absence vs. Presence) | 2.25 | 0.81 | 0.83 | 2.21 | 0.027 | 1.09 to 4.63 | 0.04 to 1.64 |
| Small Axis Length (≥ 10 mm vs. < 10 mm) | 2.51 | 0.92 | 0.84 | 2.74 | 0.006 | 1.30 to 4.84 | 0.22 to 1.61 |
| Margin (Well-defined vs. Ill-defined) | 2.94 | 1.08 | 1.10 | 2.88 | 0.004 | 1.41 to 6.11 | 0.35 to 1.80 |
| Central Necrosis (Presence vs. Absence) | 2.48 | 0.91 | 1.53 | 1.47 | 0.142 | 0.74 to 8.33 | -0.49 to 2.29 |
| Constant | 0.07 | -2.71 | 0.02 | -8.46 | <0.0001 | 0.04 to 0.12 | -3.62 to -2.09 |

OR = Odds ratio

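As an illustration only, the following Python sketch shows how the coefficients in Table 2 could be turned into a score and a predicted probability. The feature names, the `clns` and `feature_model_probability` helpers, and the example node are hypothetical and are not part of the study. The coefficients are the rounded values printed in the table, so the outputs are approximate, and central necrosis is included even though its individual P-value was 0.142.

```python
import math

# Rounded beta coefficients as printed in Table 2 (assumed inputs).
INTERCEPT = -2.71
BETAS = {
    "hilar_absent": 0.81,         # central hilar structure absent
    "short_axis_ge_10mm": 0.92,   # small axis length >= 10 mm
    "margin_well_defined": 1.08,  # well-defined margin
    "necrosis_present": 0.91,     # central necrosis present
}


def clns(features):
    """Count how many of the four malignant features are present (0 to 4)."""
    return sum(1 for name in BETAS if features.get(name, False))


def feature_model_probability(features):
    """Predicted probability of malignancy from the feature-level logistic model."""
    logit = INTERCEPT + sum(
        beta for name, beta in BETAS.items() if features.get(name, False)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical node: hilar structure absent and short axis >= 10 mm only.
node = {
    "hilar_absent": True,
    "short_axis_ge_10mm": True,
    "margin_well_defined": False,
    "necrosis_present": False,
}
print(clns(node), round(feature_model_probability(node), 3))
```

Because the score simply counts features while the regression weights them slightly differently, two nodes with the same score can have somewhat different model probabilities in this sketch.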
Table 3. Canada Lymph Node Score Logistic Regression & Diagnostic Statistics

| Canada Score Values | Odds Ratio | Standard Error | P-Value | 95% Confidence Interval | Bias Corrected 95% Confidence Interval | Sensitivity (%) | Specificity (%) | Positive Likelihood Ratio | Negative Likelihood Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1 (vs. 0) | 1.39 | 0.70 | 0.52 | 0.51 to 3.76 | -0.53 to 1.51 | 83.33 | 36.99 | 1.32 | 0.45 |
| 2 (vs. 0) | 3.48 | 1.74 | 0.012 | 1.31 to 9.26 | 0.25 to 2.23 | 61.11 | 77.64 | 2.73 | 0.51 |
| 3 (vs. 0) | 15.67 | 10.39 | <0.0001 | 4.27 to 57.49 | 1.55 to 4.05 | 31.48 | 96.34 | 8.60 | 0.71 |
| 4 (vs. 0) | 52.22 | 39.21 | <0.0001 | 11.98 to 227.47 | 2.92 to 6.89 | 9.26 | 99.59 | 22.78 | 0.91 |
| Constant | 0.10 | 0.03 | <0.0001 | 0.05 to 0.19 | -3.16 to -1.71 | N/A | N/A | N/A | N/A |

N/A = Not applicable

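The likelihood ratios in Table 3 follow from the sensitivity and specificity columns, and a post-test probability can be derived from them. The Python sketch below is illustrative only: the helper names are hypothetical, the inputs are the rounded values printed in the table, and the pre-test probability is assumed to be the proportion of malignant nodes in this cohort (54 of 300, Table 1), which will not hold in other populations.

```python
# Sensitivity and specificity (%) per score value, as printed in Table 3.
DIAGNOSTICS = {
    1: (83.33, 36.99),
    2: (61.11, 77.64),
    3: (31.48, 96.34),
    4: (9.26, 99.59),
}

# Assumption: pre-test probability equals the cohort's nodal malignancy rate.
PRE_TEST_PROBABILITY = 54 / 300


def likelihood_ratios(sensitivity_pct, specificity_pct):
    """Return (positive LR, negative LR) from percentages."""
    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    return sens / (1.0 - spec), (1.0 - sens) / spec


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


for score, (sens, spec) in DIAGNOSTICS.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    prob = post_test_probability(PRE_TEST_PROBABILITY, lr_pos)
    print(f"score {score}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, "
          f"post-test probability {prob:.2f}")
```

Small differences from the printed likelihood ratios (most visibly for a score of 4) are expected, because the table's sensitivity and specificity are themselves rounded.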
Table 4. Reliability Assessment for the Canada Lymph Node Score

| Comparison | Percent Agreement | Gwet’s AC1 Value | 95% Confidence Interval |
|---|---|---|---|
| 3 Rater Comparison (n=900) | 76.6% | 0.74 ± 0.02 | 0.70 - 0.79 |
| 12 Rater Comparison | 41.7% | 0.29 ± 0.02 | 0.25 - 0.33 |
| 3 Rater Comparison (n=900) (≥3 cut-off) | 84.4% | 0.80 ± 0.02 | 0.75 - 0.85 |
| 12 Rater Comparison (≥3 cut-off) | 85.4% | 0.81 ± 0.02 | 0.77 - 0.85 |

Figure Legends

Figure 1. Ultrasonographic features of mediastinal lymph nodes. Comparison of benign and malignant versions of ultrasonographic features commonly identified during endobronchial ultrasound (EBUS) procedures. These ultrasonographic features comprise the Canada Lymph Node Score.

Figure 2. Standards for Reporting of Diagnostic Accuracy Studies (STARD) diagram reporting the flow of participants through the study. A total of 300 lymph nodes were analyzed, the majority being benign (n=246/300).

Figure 3. Discriminatory power of the Canada Lymph Node Score. Analysis of the Canada Lymph Node Score multivariate logistic regression model via receiver operating characteristic curve generated an acceptable c-statistic = 0.72 (standard error = 0.04, 95% confidence interval = 0.64-0.80) (bias-corrected c-statistic: 0.66, 95% CI: 0.55-0.76).

Figure 4. The Canada Lymph Node Score (CLNS) project methodology. In total, 300 lymph nodes (LN) from 140 patients were biopsied and assessed for ultrasonographic features. These data were used to develop the Canada Lymph Node Score, which was later validated across seven Canadian centres. OR = odds ratio.

Supplemental Figure Legend

e-Figure 1. Model calibration plot. Graphical depiction of the Hosmer-Lemeshow goodness of fit test for the Canada Lymph Node Score logistic regression model. Chi-square value was 2.99 (p-value = 0.89), suggesting a well calibrated model. The red line represents perfect model calibration; the green line shows the Canada Lymph Node Score calibration.

e-Figure 2. Predicted probability of malignancy. The associated probability of malignancy for each possible scoring value of the Canada Lymph Node Score (0-4).

e-Figure 3. Commonly reported ultrasonographic features. During endobronchial ultrasound procedures, there are six ultrasonographic features commonly reported in the literature as being predictive of malignancy: heterogeneous echogenicity, central hilar structure absence, central necrosis presence, well-defined margins, round shape, and small axis length ≥ 10 mm.

Video Legend:

Video 1. Endobronchial ultrasound transbronchial needle aspiration procedure. All four malignant ultrasonographic features (small axis length greater than 10 mm, central hilar structure absent, central necrosis present, and well-defined margins) are present, corresponding to a 4/4 score on the Canada Lymph Node Score. Pathology confirmed the lymph node was positive for malignancy.
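
The per-score probabilities summarized in e-Figure 2 can be approximated from the Table 3 regression output. The Python sketch below is a rough illustration under two assumptions: that the "Constant" row reports the baseline odds of malignancy for a score of 0, and that the rounded odds ratios printed in the table are adequate inputs. The function name is hypothetical, and the results may differ from the values plotted in the figure.

```python
# Assumption: the Table 3 "Constant" is the baseline odds at a score of 0.
BASELINE_ODDS = 0.10

# Odds ratios versus a score of 0, as printed in Table 3.
ODDS_RATIOS = {0: 1.0, 1: 1.39, 2: 3.48, 3: 15.67, 4: 52.22}


def probability_for_score(score):
    """Approximate predicted probability of malignancy for a CLNS value."""
    odds = BASELINE_ODDS * ODDS_RATIOS[score]
    return odds / (1.0 + odds)


for score in sorted(ODDS_RATIOS):
    print(f"CLNS {score}: predicted probability "
          f"{probability_for_score(score):.2f}")
```

The conversion is odds = baseline odds × odds ratio, followed by probability = odds / (1 + odds), which is the standard way to read a logistic regression reported on the odds scale.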