Machine learning algorithm as a diagnostic tool for hypoadrenocorticism in dogs


Domestic Animal Endocrinology xxx (2019) 106396

Contents lists available at ScienceDirect

Domestic Animal Endocrinology
journal homepage: www.journals.elsevier.com/domestic-animal-endocrinology

Machine learning algorithm as a diagnostic tool for hypoadrenocorticism in dogs

K.L. Reagan a, B.A. Reagan b, C. Gilor a,*

a Department of Veterinary Medicine and Epidemiology, University of California, Davis, 1 Shields Ave, Davis, CA 95616, USA
b Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA

Article info

Article history: Received 26 February 2019; received in revised form 3 July 2019; accepted 6 September 2019

Keywords: Addison's; AdaBoost; Boosted tree; Artificial intelligence; Canine

Abstract

Canine hypoadrenocorticism (CHA) is a life-threatening condition that affects approximately 3 of 1,000 dogs. It has a wide array of clinical signs and is known to mimic other disease processes, including kidney and gastrointestinal diseases, creating a diagnostic challenge. Because CHA can be fatal if not appropriately treated, there is risk to the patient if the condition is not diagnosed. However, the prognosis is excellent with appropriate therapy. A major hurdle to diagnosing CHA is lack of awareness and a low index of suspicion. Once CHA is suspected, the application and interpretation of conclusive diagnostic tests are relatively straightforward. In this study, machine learning methods were employed to aid in the diagnosis of CHA using routinely collected screening diagnostics (complete blood count and serum chemistry panel). These data were collected for 908 control dogs (suspected to have CHA, but disease ruled out) and 133 dogs with confirmed CHA. A boosted tree algorithm (AdaBoost) was trained with 80% of the collected data, and the remaining 20% was used as test data to assess performance. Algorithm learning was demonstrated as the training set was increased from 0 to 600 dogs. The developed algorithm model has a sensitivity of 96.3% (95% CI, 81.7%–99.8%), specificity of 97.2% (95% CI, 93.7%–98.8%), and an area under the receiver operator characteristic curve of 0.994 (95% CI, 0.984–0.999), and it outperforms other screening methods, including logistic regression analysis. An easy-to-use graphical interface allows the practitioner to implement this technology to screen for CHA, leading to improved outcomes for patients and owners. © 2019 Elsevier Inc. All rights reserved.

* Corresponding author. Tel.: 530 752 1363. E-mail address: [email protected] (C. Gilor).

1. Introduction

Canine hypoadrenocorticism (CHA) is typically caused by an immune-mediated destruction of the cortex of the adrenal gland, resulting in a deficiency of both glucocorticoids and mineralocorticoids (GMDH). Less commonly, with secondary CHA, segmental sparing of the zona glomerulosa occurring with atrophy of the zona reticularis and fasciculata results in glucocorticoid deficiency alone (GDH) [1–4]. Hypoadrenocorticism in dogs poses a significant diagnostic challenge because of its wide array of potential clinical presentations that may mimic kidney disease, gastrointestinal disease, hepatic insufficiency, and other diseases [5,6]. It is critical to correctly and promptly identify patients and institute therapy to increase the likelihood of a positive outcome. A major hurdle to making a diagnosis of hypoadrenocorticism is recognizing patients that may have the disease and pursuing confirmatory diagnostics, as mild clinical signs or subtle diagnostic abnormalities can be overlooked [1,7,8]. Canine hypoadrenocorticism causes characteristic biochemical and hematologic changes: hyponatremia, hyperkalemia, and azotemia associated with mineralocorticoid deficiency, and hypocholesterolemia, hypoglycemia, lack of a lymphopenia, and eosinophilia associated with cortisol deficiency [9,10].

0739-7240/$ – see front matter © 2019 Elsevier Inc. All rights reserved. https://doi.org/10.1016/j.domaniend.2019.106396

FLA 5.6.0 DTD  DAE106396_proof  1 October 2019  11:02 pm  ce


K.L. Reagan et al. / Domestic Animal Endocrinology xxx (2019) 106396

However, these parameters can be within the normal reference ranges in many patients with confirmed CHA [7,11,12]. Assessing changes to the CBC and serum chemistry (SC) as a screening tool for CHA is common, and many guidelines have been proposed that should prompt the veterinarian to consider CHA as a diagnosis and pursue confirmation by performing an ACTH stimulation test [5,9,11,13–15]. A low sodium-to-potassium ratio (Na:K) is commonly associated with CHA and is reported routinely by most veterinary laboratories as a screening test for CHA. However, CHA represents only 9% of the diagnoses in patients with this low ratio, and only 80% of dogs suffering from CHA have sodium and potassium abnormalities [6,12]. Thus, as screening tests, electrolyte abnormalities are neither sensitive nor specific for CHA diagnosis, especially for patients with GDH [1,9,14].

Mathematical and statistical models have been used to further evaluate the biochemical parameter changes noted with CHA. A combination of Na:K and leukocyte parameters has also been evaluated as a screening tool for CHA [9]. When sensitivity was set to 100%, the specificity of these parameters individually was between 15% and 35%, with an area under the receiver operator characteristic curve (AUC) of 0.631–0.873. When the parameters were combined using binary logistic regression modeling, AUC improved to 0.935 [11]. Adding more parameters to a logistic regression model (combining corticosteroid-induced alkaline phosphatase, Na:K, creatine kinase, BUN, and albumin) achieved a sensitivity of 98%, specificity of 100%, and AUC of 0.994 [15]. Furthermore, clinical decision trees that use several of these parameters in sequence have been constructed and achieve high accuracy [5]. However, in these studies, the proposed models were only assessed on data that were also used to create the model; therefore, the performance may be overestimated. These results do, however, show that changes in the CBC and SC can be harnessed as a screening tool for CHA, regardless of the status as GMDH or GDH.

Machine learning, a branch of artificial intelligence, has been used extensively in the medical field as a tool to aid in diagnoses of medical conditions and make diagnostic predictions [17,18]. Supervised machine learning algorithms such as the boosted tree algorithm, AdaBoost, consistently have superior accuracy compared with other machine learning algorithms and logistic regression models when asked to assign a classification to data [19].

The goal of this study was to create an accurate tool to screen for CHA exploiting routine diagnostics, the CBC and SC, which are often the first-line diagnostics performed when an ill dog is presented to the veterinarian. We aimed to create a machine learning model that uses these parameters to classify dogs as positive or negative for CHA regardless of GMDH or GDH status, and to validate this model with naïve data not used in the training of the algorithm. The secondary aim was to create a user-friendly interface that can be easily applied in the clinical setting to screen dogs for CHA using the machine learning algorithm.

2. Methods

2.1. Data set

A search in the University of California Davis Veterinary Medical Teaching Hospital electronic medical record was conducted for the years 2010 to 2017 for all dogs that had a resting cortisol measurement (Immulite 2000; Siemens, Erlangen, Germany) performed. Patients were enrolled in the study if complete blood count (Advia 120; Siemens, Erlangen, Germany) and serum chemistry (Cobas c501/6000 series; Roche Diagnostics, Indianapolis, IN) were available within the same visit or within 1 wk of cortisol measurement. All blood work was conducted at the Clinical Laboratory Services at the University of California Davis Veterinary Medical Teaching Hospital. Exclusion criteria included history of hyperadrenocorticism and administration of corticosteroids, trilostane, mitotane, or ketoconazole in the previous 6 mo. Patients had an ACTH stimulation test performed at the clinician's discretion. Those with resting cortisol concentrations <2 μg/dL were excluded if an ACTH stimulation test was not performed. For all cases and controls, data extracted from the medical record included signalment, cortisol concentrations before and (when available) after ACTH stimulation test, and CBC and SC at the time cortisol was measured. If multiple CBC or SC were available, the first results during the patient's visit were used for analysis.

2.2. Hypoadrenocorticism group

A classification of CHA was made if patients did not meet the aforementioned exclusion criteria and had a post-ACTH stimulation test cortisol of <2 μg/dL. These dogs were subcategorized as having either normal (>27) or low (≤27) Na:K ratio.

2.3. Control group

Dogs were classified as controls if they did not meet the aforementioned exclusion criteria and their resting cortisol or cortisol post-ACTH stimulation test was >2 μg/dL. Patients' signalment, CBC, SC, and ACTH stimulation test results were collected from the medical record.

2.4. Machine learning

The results of routine CBC and SC from the electronic medical record were used as inputs for machine learning algorithms in this study (Table 1). Any patient with incomplete results was excluded from the data set. Eighty percent of the data was randomly assigned to a training set, with the remaining 20% set aside to test the model (Fig. 1). The machine learning classification scheme used here was based on adaptive boosting with decision trees as the weak learners using the AdaBoost algorithm [20]. For each training group size in Figure 2, 30 predictive models were trained using Bayesian optimization with different training parameters, including number of weak learners, and using cross validation to avoid excessive overfitting. Models with up to a maximum of 500 trees were trained. The best performing model, as determined by overall correct classification rate within the training group, was retained in each case. The selected optimal model consisted of 93 small decision trees.

The algorithm was evaluated using accuracy, sensitivity, and specificity as compared to the standard reference ACTH stimulation test for the identification of CHA positive cases. In addition, receiver operating characteristic (ROC) curves were generated (Prism 7; GraphPad, La Jolla, CA) for Na:K ratio, lymphocyte count, resting cortisol, and logistic regression models that have been previously described [11,15]. Youden's index was calculated for each screening tool to determine the optimum cut point. MATLAB software (MathWorks, Natick, MA) was used for algorithm construction, logistic regression, statistical analysis, and construction of a graphical user interface. ROC curves were compared using previously described methods (MedCalc, Ostend, Belgium) [21]. Sensitivity and specificity were compared at some cut points using a chi-squared test. Clinicopathologic features were compared using the Mann-Whitney test with a Bonferroni correction.

Table 1
Input parameters: blood work features used as input for machine learning algorithm.

CBC                                                        Serum chemistry
RBC (M/μL)                                                 Anion gap (mmol/L)
Hemoglobin (g/dL)                                          Sodium (mmol/L)
Hematocrit (%)                                             Potassium (mmol/L)
Mean corpuscular volume (MCV) (fL)                         Chloride (mmol/L)
Mean cell hemoglobin (MCH) (pg)                            Bicarbonate (mmol/L)
Mean corpuscular hemoglobin concentration (MCHC) (g/dL)    Phosphorus (mg/dL)
RBC distribution width (RDW) (%)                           Calcium (mg/dL)
WBC (/μL)                                                  Urea nitrogen (BUN) (mg/dL)
Neutrophils (/μL)                                          Creatinine (mg/dL)
Lymphocytes (/μL)                                          Glucose (mg/dL)
Monocytes (/μL)                                            Total protein (g/dL)
Eosinophils (/μL)                                          Albumin (g/dL)
Platelets (/μL)                                            Globulins (g/dL)
Mean platelet volume (MPV) (fL)                            Alanine aminotransferase (ALT) (IU/L)
Plasma protein (g/dL)                                      Aspartate aminotransferase (AST) (IU/L)
                                                           Alkaline phosphatase (ALP) (IU/L)

Fig. 1. Experimental design.

Fig. 2. Evidence of learning. As the sample size of the training set increases there is improvement in sensitivity (blue circles), specificity (red diamonds), and AUC (black squares) for this algorithm. Abbreviation: AUC, area under the receiver operator characteristic curve. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

3. Results

3.1. Demographic information

The population consisted of 1,041 dogs that were suspected to have CHA by attending clinicians. One hundred thirty-three dogs were identified with CHA based on post-ACTH stimulation cortisol of <2 μg/dL, of which 74 had an Na:K < 27. In the remaining 908 dogs, CHA was suspected and ultimately ruled out; these dogs made up the control set. Characteristics of this population are summarized in Table 2. Fifty different breeds are represented in the CHA set, and 124 breeds are represented in the control set. The 4 most common breeds are the same for both the CHA and control groups: mixed breed (31/133 CHA, 203/908 control, P = 0.8), Labrador retriever (11/133 CHA, 60/908 control, P = 0.4), standard poodle (7/133 CHA, 32/908 control, P = 0.3), and Chihuahua (5/133 CHA, 30/908 control, P = 0.8) (Table S1).

Table 2
Population demographics.

Characteristics     Control dogs    Canine hypoadrenocorticism dogs
Sample size         908             133
Male sex (n, %)     437 (48%)       70 (52%)
Castrated (n, %)    763 (84%)       116 (87%)
Weight (kg)         19 ± 15         24 ± 17
Age (yr)            7 ± 4           6 ± 3

Values are represented in mean ± standard deviation or n (%).

3.2. Clinicopathologic results

Resting cortisol, CBC, and SC results are available for all patients, and a post-ACTH stimulation cortisol is available in patients with resting cortisol <2 μg/dL. Significant differences were noted in sodium, potassium, chloride, bicarbonate, phosphorus, calcium, blood urea nitrogen, creatinine, glucose, albumin, globulin, aspartate aminotransferase (AST), ALP, cholesterol, mean corpuscular volume of the red blood cells, mean corpuscular hemoglobin concentration of the red blood cells, lymphocyte concentration, and eosinophil concentration. The median values and interquartile ranges of the CBC, SC, and cortisol results are summarized in Table 3.

Table 3
Clinicopathologic values.

Characteristic                                             Control dogs a      Canine hypoadrenocorticism dogs a    P value b
Resting cortisol (μg/dL)                                   4 (2.5–6.5)         0.3 (0.2–0.3)                        <0.0035
Anion gap (mmol/L)                                         19 (17–22)          19 (16–24)                           1
Sodium (mmol/L)                                            146 (143–148)       147 (130–145)                        <0.0035
Potassium (mmol/L)                                         4.3 (4.0–4.8)       5.5 (4.6–6.9)                        <0.0035
Chloride (mmol/L)                                          110 (107–113)       106 (98–113)                         <0.0035
Bicarbonate (mmol/L)                                       20 (18–22)          17 (14–20)                           <0.0035
Phosphorus (mg/dL)                                         4.1 (3.4–5.1)       5.4 (4.3–7.4)                        <0.0035
Calcium (mg/dL)                                            10 (9.4–11)         11 (9.9–12)                          <0.0035
BUN (mg/dL)                                                16 (11–25)          29 (21–53)                           <0.0035
Creatinine (mg/dL)                                         0.9 (0.7–1.2)       1.4 (1–2)                            <0.0035
Glucose (mg/dL)                                            98 (88–110)         86 (74–100)                          <0.0035
Total protein (g/dL)                                       5.9 (5.2–6.4)       6.1 (5.3–6.6)                        1
Albumin (g/dL)                                             3.4 (2.8–3.7)       2.8 (2.3–3.3)                        <0.0035
Globulins (g/dL)                                           2.4 (2–2.9)         3.1 (2.5–3.6)                        <0.0035
Alanine aminotransferase (ALT) (IU/L)                      47 (30–83)          55 (41–76)                           0.8225
Aspartate aminotransferase (AST) (IU/L)                    35 (27–51)          56 (40–91)                           <0.0035
Alkaline phosphatase (ALP) (IU/L)                          66 (36–149)         37 (26–55)                           <0.0035
GGT (IU/L)                                                 3 (3–5)             3.0 (2–3)                            0.007
Cholesterol (mg/dL)                                        195 (143–256)       113 (92–160)                         <0.0035
Bilirubin (mg/dL)                                          0.2 (0.1–0.2)       0.2 (0.1–0.2)                        1
RBC (M/μL)                                                 6.5 (5–7.2)         6.8 (5.4–7.7)                        1
Hemoglobin (g/dL)                                          15 (13–17)          15 (12–18)                           1
Hematocrit (%)                                             44 (39–49)          43 (35–50)                           1
Mean corpuscular volume (MCV) (fL)                         69 (67–72)          65 (63–68)                           <0.0035
Mean cell hemoglobin (MCH) (pg)                            24 (23–24)          23 (22–24)                           0.0105
Mean corpuscular hemoglobin concentration (MCHC) (g/dL)    34 (33–35)          35 (34–36)                           <0.0035
RBC distribution width (RDW) (%)                           13 (12–14)          13 (12–15)                           1
WBC (10^6/mL)                                              10.2 (7.8–14.9)     11.8 (9.5–16.2)                      0.0105
Neutrophils (10^6/mL)                                      7.1 (5.2–11.2)      6.8 (5.2–10.0)                       1
Lymphocytes (10^6/mL)                                      1.5 (1–2.1)         2.9 (2.0–4.4)                        <0.0035
Monocytes (10^6/mL)                                        0.6 (0.4–0.9)       0.6 (0.4–0.9)                        1
Eosinophils (10^6/mL)                                      0.3 (0.1–0.5)       0.7 (0.4–1.3)                        <0.0035
Platelets (10^6/mL)                                        272 (204–371)       286 (218–372)                        1
Mean platelet volume (MPV) (fL)                            11 (9.3–12)         10 (9.3–12)                          1
Plasma protein (g/dL)                                      6.4 (5.6–7.1)       6.0 (5.2–6.7)                        0.014

a Values reported are median and interquartile range.
b P values reported with Bonferroni correction, P < 0.0014 significant.

3.3. Training the algorithm

The training set included 106 CHA cases (58 of which had a Na:K < 27) and 727 control dogs. The remaining 20% of the cases were withheld to be used as a testing set to determine the performance of the algorithm (Fig. 1). To determine the optimal training set size, the algorithm was trained on increasing sample sizes starting at 100. As the training set size was increased, the performance of the algorithm improved and then plateaued when the training set reached 600, with no further apparent benefit as the training set sample size approached 800, based on the learning curve presented in Figure 2.

3.4. Classification test and algorithm performance

The trained algorithm was applied to the test set, which consisted of 27 CHA cases (16 of which had a Na:K < 27) and 181 controls that were not a part of the original training data. The algorithm correctly classified 202 of 208 test cases for an error rate of 2.9% and accuracy of 97.1% (Table 4). When the algorithm was applied to the entire data set (training set and test set), the error rate decreased to 0.6% with an overall accuracy of 99.4%.
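The boosting scheme named in Section 2.4 (decision trees as weak learners, combined by AdaBoost, then scored on a held-out 20%) can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: it uses one-split trees (stumps), a fixed 20 boosting rounds instead of Bayesian optimization and cross validation, and invented two-feature synthetic data. All function names and the synthetic distributions below are assumptions made for illustration only.

```python
import math
import random


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-split rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Rule: predict `sign` when x_j > thr, otherwise `-sign`.
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (sign if row[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign


def adaboost_fit(X, y, n_rounds=20):
    """Discrete AdaBoost; labels must be +1 (case) or -1 (control)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


def make_synthetic_dogs(n_cases, n_controls, rng):
    """Invented two-feature data (Na:K ratio, lymphocytes/uL); NOT study data."""
    X = [[rng.gauss(24, 4), rng.gauss(3000, 900)] for _ in range(n_cases)]
    X += [[rng.gauss(33, 4), rng.gauss(1500, 700)] for _ in range(n_controls)]
    y = [1] * n_cases + [-1] * n_controls
    return X, y


def run_demo(seed=0):
    rng = random.Random(seed)
    X, y = make_synthetic_dogs(40, 160, rng)
    order = list(range(len(X)))
    rng.shuffle(order)
    cut = int(0.8 * len(order))                     # 80% train, 20% test
    train, test = order[:cut], order[cut:]
    model = adaboost_fit([X[i] for i in train], [y[i] for i in train])
    hits = sum(adaboost_predict(model, X[i]) == y[i] for i in test)
    return hits / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {run_demo():.2f}")
```

The key design point the sketch shares with the study is that accuracy is reported only on samples the model never saw during fitting; a production model would use all inputs in Table 1 and tune the number of trees rather than fixing it.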

Table 4
Confusion matrix.

                        Actual disease status
Algorithm prediction    CHA+      CHA−
CHA+                    26        5
CHA−                    1         176

Classification results of the machine learning algorithm are listed for the test data set.
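The headline test-set figures follow directly from the four counts in Table 4. The sketch below recomputes them and attaches a Wilson score interval to each proportion. The Wilson interval is an assumption chosen for illustration; the source does not state which interval method its statistical software used, so the bounds computed here need not match the published confidence intervals. The function names are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def screening_metrics(tp, fn, fp, tn):
    """Point estimates and intervals from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),            # CHA dogs flagged as CHA+
        "specificity": tn / (tn + fp),            # control dogs flagged as CHA-
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Counts from Table 4 (test set): 26 true positives, 1 false negative,
# 5 false positives, 176 true negatives.
metrics = screening_metrics(tp=26, fn=1, fp=5, tn=176)
print({k: v for k, v in metrics.items() if not k.endswith("_ci")})
# sensitivity is about 0.963, specificity about 0.972, accuracy about 0.971
```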

Further analysis of these data on the test set shows a high sensitivity of 96.3% (95% CI, 81.7%–99.8%) and specificity of 97.2% (95% CI, 93.7%–98.8%) when the optimal cut point was applied. The AUC of the ROC for this diagnostic is 0.994 (95% CI, 0.984–0.999) (Fig. 3a). When all cases (training set and test set) were classified by the algorithm, a sensitivity of 99.5% (95% CI, 97.1%–99.9%), specificity of 99.4% (95% CI, 98.6%–99.8%) at the optimal cut point (Table 5), and AUC of 0.999 (95% CI, 0.998–1) were achieved (Fig. 3b). 3.5. Comparison with alternate screening tools The Na:K ratio, lymphocyte count, and resting cortisol were assessed as screening tools in the total study popuQ 8 lation to compare with the algorithm performance. ROC curves were generated with and AUC for Na:K ratio, lymphocyte count, and resting cortisol are presented in Figure 4. The ROC curves for the algorithm (test set) was significantly different from the ROC curves for Na:K ratio and lymphocyte count (P < 0.001). There was no statistical difference between the ROC curve for resting cortisol and the algorithm (test set). The optimal cut point was determined to be <28.2 for the Na/K ratio, >2,207 cells/mL for lymphocyte count, and <1.25 mg/dL for resting cortisol for a diagnosis of CHA. The sensitivity and specificity for these cut points as well as a cut point that allows for 100% sensitivity for each screening tool are listed in Table 5. Specificity at the optimal cut point was significantly different when comparing the algorithm (test set) to Na/K and lymphocyte count (P < 0.001). When the cut point for the resting cortisol and algorithm (test set) were adjusted to allow for a 100% sensitivity, the specificity for resting cortisol was 87.88% (95% CI, 85.56%–89.94%) and

Fig. 3. Receiver operator characteristic (ROC) curves. ROC curves representing algorithm performance of the test with zoom-in view of the upper left corner of the ROC curve (inset).

5

539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 3.6. Discordant algorithm results 557 When the algorithm was tested on patient data that 558 were not used as part of the training set, there were 6 pa- 559 tients with discordant classifications, including 1 false- 560 negative and 5 false-positive results. To better understand 561 the discordant results, the cases are briefly described here. 562 The first false-positive case was an 11-yr-old male intact 563 Pekingese that presented for idiopathic pericardial effusion 564 and suspected congestive heart failure and was treated 565 with furosemide before presentation. On intake, a CBCQ 9566 showed a moderate microcytic [MCV 61.6 fL (65–75 fL)] 567 anemia with hematocrit of 33.4% (40%–55%) and a normal 568 leukogram. SC showed a hyponatremia of 141 mmol/L 569 (143–151 mmol/L), hyperkalemia of 5 mmol/L (3.6– 570 4.8 mmol/L), hypochloremia of 103 mmol/L (108– 571 116 mmol/L), elevated BUN of 34 mg/dL (11–33 mg/dL), 572 hypoglycemia of 80 mg/dL (86–118 mg/dL), and hypo- 573 albuminemia of 3.3 g/dL (3.4–4.3 g/dL). After an anesthetic 574 procedure and pericardiectomy, the patient developed 575 right-sided heart failure. ACTH stimulation test was per-Q 10576 formed because of the patient’s clinical worsening condi- 577 tion and concern for GI bleeding. Cortisol post ACTH was 578 6.1 mg/dL, inconsistent with CHA. The following day the 579 patient was euthanized and a postmortem examination 580 was performed. Histopathology of the adrenal gland 581 revealed severe adrenal cortical necrosis with hemorrhage 582 583 and lymphoplasmacytic adrenalitis. The second false-positive case is a 2-yr-old female 584 spayed German shepherd dog that presented for chronic 585 intermittent diarrhea and weight loss. 
Diagnostics per-Q 11586 formed revealed a normal CBC (lymphocytes ¼ 2,343/mL, RI 587 1,000–4,000/mL) and SC with hypoalbuminemia of 3 g/dL 588 (3.4–4.3 g/dL), hypocholesterolemia of 128 mg/dL (139– 589 353 mg/dL), and high AST of 55 (20–49 IU/L). Abdominal 590 ultrasound revealed small adrenal glands (right 0.398 cm, 591 left 0.381 cm) bilaterally. Resting cortisol was 1.7 mg/dL, and 592 post-ACTH stimulation cortisol was 10 mg/dL, ruling out 593 CHA. The dog was diagnosed with inflammatory bowel 594 disease as an underlying cause of the diarrhea based on 595 endoscopically obtained GI biopsies. The patient has been 596 managed with cyclosporine for multisystemic immune- 597 mediated disease with vasculitis causing dermatologic 598 and ocular lesions. This patient is currently being treated 599

specificity for the algorithm (test set) was 92.27% (95% CI, 87.36%–95.71%); however, the difference was not significant (P ¼ 0.08). Sensitivity at the optimal cut point was significantly different between the algorithm (test set) and the Na:K and lymphocyte count (P < 0.01), but not different when compared with the resting cortisol. The logistic regression model described previously utilizing Na:K and lymphocyte count was applied to our data set [11]. The ROC curve was generated (Fig. 4) with an AUC of 0.8680 (95% CI, 0.8475–0.8885), which was significantly different from the ROC obtained from the algorithm (test set) (P ¼ 0.037). At an optimal cut point, the sensitivity and specificity were determined and compared with the algorithm (test set) and found to be 77.44% (95% CI, 69.39%– 84.23%, P ¼ 0.02) and 83.59% (95% CI, 81.23%–85.72%, P < 0.001) respectively.

FLA 5.6.0 DTD  DAE106396_proof  1 October 2019  11:02 pm  ce

K.L. Reagan et al. / Domestic Animal Endocrinology xxx (2019) 106396

661 662 663 Screening test Cut point (if applicable) Sensitivity Specificity AUC 664 a,b,c Algorithm – test set Optimal 96.3% (81.7–99.8) 97.2% (93.7–98.8) 0.994 (0.984–0.999) 665 High sensitivity 100% (87.23–100) 92.27% (87.36–95.71) – Algorithm–total data set Optimal 99.5% (97.1–99.97) 99.4% (98.6–99.8) 0.999 (0.998–1) 666 High sensitivity 100% (97.26–100) 98.28% (97.29–98.95) – 667 a Na/K (ratio) <28.42 66.92% (58.2–74.83)* 84.8% (82.3–87.08) 0.808 (0.761–0.854) 668 <52.3 100% (97.26–100) 0.77% (0.31–1.58) – 669 >2,207 72.18% (63.75–79.6)* 79.19% (76.4–81.78) 0.787 (0.741–0.834)b Lymphocyte count (cells/mL) 670 >229 100% (97.26–100) 1.92% (1.18–3.11) – Resting cortisol (mg/dL) <1.25 96.24% (91.44–98.77) 97.11% (95.79–98.1) 0.993 (0.989–0.997) 671 <2 100% (97.26–100) 87.88% (85.56–89.94) – 672 c Na/K and lymphocyte logistic Optimal 77.44% (69.39–84.23)* 83.59% (81.23–85.72) 0.868 (0.8475–0.8885) 673 regression [8] High Sensitivity 100% (97.26–100) 1.54% (0.91–2.49) – 674 The calculated sensitivity, specificity, accuracy, and area under the receiver operator characteristic curve (AUC) for the algorithm is listed for the test data set 675 and the total data set. Pairwise comparison of algorithm – test set receiver operator characteristic curve to various screening tests a,b P < 0.0001, c P ¼ 0.037. 676 Chi-square analysis between proportions as compared with algorithm – test set sensitivity (*) or specificity ( ) with P < 0.05. 677 678 142 mmol/L (143–151 mmol/L), hypoalbuminemia of 1.4 g/ with topical corticosteroids, and therefore follow-up CHA 679 dL (3.4–4.3 g/dL), and hypocholesterolemia of 90 mg/dL testing is not feasible. 680 (139–353 mg/dL). Resting cortisol of 2.6 mg/dL ruled out The third false-positive case was a 7-yr-old female 681 spayed Doberman pinscher that presented for right-sided CHA. The patient was discharged from the hospital after 682 congestive heart failure due to dilated cardiomyopathy surgery and was later lost to follow-up. 
with abdominal effusion and evidence of pulmonary edema on thoracic radiographs. CBC showed low normal MCV of 65.5 fL (65–75 fL), a mild neutrophilia of 14,671/μL (3,000–10,500/μL), and normal lymphocytes of 1,737/μL (1,000–4,000/μL). On SC, there was severe hyperkalemia of 6.7 mmol/L (3.6–4.8 mmol/L), hyponatremia of 128 mmol/L (143–151 mmol/L), hypochloremia of 96 mmol/L (108–116 mmol/L), low bicarbonate of 12 mmol/L (20–29 mmol/L), hyperphosphatemia of 9.5 mg/dL (2.6–5.2 mg/dL), hypoalbuminemia of 2.5 g/dL (3.4–4.3 g/dL), and hypocholesterolemia of 119 mg/dL (139–353 mg/dL). Resting cortisol of 9.1 μg/dL ruled out CHA. This patient passed away from their disease, and no postmortem examination was performed.

The fourth false-positive case was a 3-yr-old female spayed beagle that presented for myelopathy secondary to intervertebral disc disease. She also had a history of chronic diarrhea. On CBC and SC before hemilaminectomy, changes were noted that could be consistent with CHA including a mild nonregenerative anemia of 38.7% (40%–55%), lack of a lymphopenia 2,519/μL (1,000–4,000/μL), hyponatremia of

Fig. 4. Comparison of screening tools. Receiver operator characteristic (ROC) curves generated for screening tools including resting cortisol (black dotted line), lymphocyte count (green dash and dotted line), sodium-to-potassium ratio (orange solid line), logistic regression model that includes Na-K ratio and lymphocyte count (blue dash and dotted line), and the machine learning algorithm (red solid line). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 5. Performance of screening tools.

The final false-positive case is a 7-yr-old female spayed Chihuahua cross that presented for an acute kidney injury of unknown etiology. CBC and SC at intake revealed a microcytosis with MCV of 63 fL (65–75 fL), lymphopenia of 942/μL, hyponatremia of 132 mmol/L, hyperphosphatemia of 9.4 mg/dL (2.6–5.2 mg/dL), azotemia with BUN of 132 mg/dL (11–33 mg/dL) and creatinine of 3.9 mg/dL (0.8–1.5 mg/dL), hyperglycemia of 149 mg/dL (86–118 mg/dL), and hyperalbuminemia of 3.3 g/dL (1.7–3.1 g/dL). Resting cortisol of 8.5 μg/dL ruled out CHA. This patient recovered from the AKI with supportive care that included intravenous fluid therapy but not mineralocorticoid or corticosteroid therapy. The patient no longer has any clinical signs of illness; however, no further blood work results are available and the patient was lost to further follow-up.

The single false-negative classification was a 5-yr-old male castrated Samoyed dog that was diagnosed with splenic and omental abscessation that was thought to be secondary to a grass awn migration. CBC showed a severe microcytic anemia with MCV of 58 fL (65–75 fL) and a hematocrit of 22.4% (40%–55%), a marked neutrophilia of 25,256/μL (3,000–10,500/μL) and an otherwise normal leukogram. SC showed a mild hyponatremia of 143 mmol/L (145–154 mmol/L), hypoalbuminemia of 2.3 g/dL (3.4–4.3 g/dL), and hyperglobulinemia of 5.1 g/dL (1.8–3.9 g/dL) with the remainder unremarkable. The dog was categorized as having CHA (GDH) by the ACTH stimulation test, with post-ACTH cortisol of 1.5 μg/dL, indicating this patient had some residual capacity to secrete cortisol. This dog was started on corticosteroid therapy and was then lost to follow-up.

3.7. Construction of user interface

A graphical user interface was created that allows for easy implementation in a clinical setting (Fig. 5). A text input box is present, to input patient’s data from the CBC and SC results. A percent chance that the patient has CHA is displayed along with an overall positive or negative result
based on the optimal discriminator. A graphical display plots the patient's data (black open circle) against the training data, with CHA cases (red closed circles) and control cases (blue closed circles) shown for several CBC and SC parameters, so that the patient's results can be viewed in the context of blood work values from the training set.

Fig. 5. Graphical user interface. Graphical user interface being used in the clinic for a dog that presented with gastrointestinal signs. The user types the patient's medical record number into the input box and clicks the “Run” button. The CBC and SC results are displayed on the left from the medical record. The algorithm prediction is displayed in the upper right; in this case the dog is suspected to have hypoadrenocorticism and the user is prompted to pursue further diagnostics. Abbreviations: CBC, complete blood count; SC, serum chemistry.

4. Discussion

CHA has a wide variety of clinical presentations, and recognition of the disease as a differential diagnosis can be a barrier to making a final diagnosis. In addition, the gold standard confirmatory test, the ACTH stimulation test, has several shortcomings as a screening test including high cost to clients and an anticipated shortage of the synthetic ACTH needed to perform the test. This has led to the investigation of alternative modalities to aid in the diagnosis and screening of CHA. Several attempts to find a diagnostic tool utilizing routine blood work and/or resting cortisol have been investigated with variable success [15,22–24]. We demonstrate that our machine learning algorithm can be utilized to predict a diagnosis of CHA with high accuracy, sensitivity, and specificity using only CBC and SC results regardless of subclassification of CHA type (GMDH or GDH). The algorithm has superior performance to other screening modalities that have been proposed utilizing CBC and SC data in this study population. Moreover, the performance of previously described screening tests was not tested on naïve populations, skewing their results toward high performance. In contrast, our algorithm was trained on one data set and then tested on another, still showing high performance.

An important addition to this study is that dogs were included in the CHA group with a post-ACTH stimulation cortisol of up to 2 μg/dL, whereas previous studies only included dogs with a post-ACTH stimulation cortisol of <1 μg/dL [11]. Dogs with cortisol between 1 and 2 μg/dL have some residual cortisol activity, and their clinical signs or biochemical parameter changes might be more difficult for the clinician to recognize. Therefore, proposed screening tools ideally would be able to discriminate between healthy dogs and those with CHA that retain some cortisol activity.

The use of Na/K ratio or lymphocyte count alone has been proposed as a screening tool for CHA. The Na/K ratio in our study had an optimal cut point of <28.42, similar to other studies [25]. The sensitivity of a Na/K ratio <28.42 for detecting CHA in this population was 67%, which is similar to, albeit slightly lower than, previous reports that have sensitivities ranging from 70% to 90% [11,25]. A lymphocyte count of >229/μL was identified as 100% sensitive for detection of CHA in this population; however, the specificity was low at 2%. This is similar to previous findings that show a low specificity of 35% at a sensitivity of 100% [11].

Previously, Na/K ratio and lymphocyte count were assessed together in a logistic regression model and had high diagnostic accuracy with an AUC of 0.935 [11]. The published logistic regression model was applied to the population in the present study and achieved good diagnostic accuracy with an AUC of 0.8680. The slightly decreased performance here is expected, as the previously published AUC of the logistic regression model was determined using data from the same population from which the model was created, and it illustrates a recognized limitation of applying logistic regression models to novel populations. Here, patients from a different population with blood work performed in a different laboratory were used to validate the model with good success, supporting the hypothesis that biochemical parameter changes can be useful for CHA screening. However, this model still had inferior performance when compared with the proposed algorithm.

Resting cortisol measurement is a test specifically ordered by the clinician for patients with suspicious clinical signs of CHA and consistent changes in routine laboratory tests. At a cut point of <2 μg/dL, resting cortisol is an effective test to rule out CHA with 100% sensitivity and 88%
specificity in our patient population, similar to previous reports [26]. Although fairly accurate as such, it requires the clinician to actively pursue the diagnosis of CHA and it adds cost to the diagnostic process. In contrast, when set at a sensitivity of 100%, the algorithm developed here has a higher specificity (98%) compared with resting cortisol, and it can be run automatically on CBC and SC results without requiring the clinician to suspect the disease and without adding significant cost to the diagnostic process.

Review of the falsely classified animals shows that the algorithm might be predisposed to misclassify in some unique clinical presentations such as chronic effusions. It also shows that the algorithm may have correctly classified animals that were misclassified by the ACTH stimulation test, such as the dog with adrenal necrosis on postmortem examination, a finding associated with CHA, whose cortisol concentration traditionally would have ruled out CHA [27,28]. Some of these patients were very ill and may have also been afflicted with critical illness-associated adrenal insufficiency, which is difficult to assess with the reference standard, resting cortisol, or ACTH stimulation test [29,30]. The one patient that had a false negative from the algorithm had residual cortisol on post-ACTH stimulation. Most dogs with CHA have post-ACTH cortisol levels below the limit of detection (<0.2 μg/dL). This may indicate that the algorithm discriminates between patients that have no measurable cortisol and those that have measurable cortisol, as most CHA patients in the training set did not have residual cortisol measured. However, other patients with residual cortisol were correctly classified by the algorithm.

A limitation of this study is that all cases enrolled were initially identified by clinicians as possibly having CHA based on traditional diagnostic criteria of CHA, creating a bias against more unusual clinical presentations. This is compounded in veterinary medicine by the need to prioritize diagnostic tests based on limited financial resources, making it less likely that a relatively low yield test (resting cortisol) would be performed in a patient with a less than classical presentation. In addition, all data were collected from a referral hospital, which may not reflect the general canine population. The dogs in a referral population may have more comorbidities such as renal or gastrointestinal disease that could change biochemical parameters. Furthermore, dogs with more typical CHA biochemical parameter changes may have been diagnosed by their referring veterinarian, therefore removing those dogs from the study population.

5. Conclusions

The use of this boosted tree machine learning algorithm, AdaBoost, yielded a highly sensitive, specific, and accurate tool to screen patients for CHA. This tool is superior to other tools that are available using the most commonly collected biochemical parameter data, the CBC and SC, and has similar performance to the resting cortisol. This tool has the potential to improve clinical outcomes by identifying patients that should have confirmatory testing (ACTH stimulation test) for CHA performed regardless of GDH or GMDH status. Prospective studies are needed to validate this method in a wider array of patients.
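The Discussion compares screening tools by fixing each one at 100% sensitivity and then reading off its specificity. As a rough illustration of that calculation only, the Python sketch below uses made-up scores and labels. The function names (`sens_spec`, `threshold_for_full_sensitivity`) and the toy data are assumptions for this example; they are not taken from the study's software or data.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], has_cha: Sequence[bool],
              threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_cha):
        flagged = score >= threshold
        if diseased:
            tp += flagged        # confirmed case that was flagged
            fn += not flagged    # confirmed case that was missed
        else:
            fp += flagged        # control dog that was flagged
            tn += not flagged    # control dog correctly left unflagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one control dog")
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_full_sensitivity(scores: Sequence[float],
                                   has_cha: Sequence[bool]) -> float:
    """Highest threshold that still flags every confirmed case."""
    return min(score for score, diseased in zip(scores, has_cha) if diseased)


# Hypothetical predicted probabilities for 3 confirmed cases and 5 controls.
toy_scores = [0.92, 0.81, 0.40, 0.55, 0.10, 0.05, 0.30, 0.02]
toy_has_cha = [True, True, True, False, False, False, False, False]

cutoff = threshold_for_full_sensitivity(toy_scores, toy_has_cha)
sensitivity, specificity = sens_spec(toy_scores, toy_has_cha, cutoff)
print(f"cutoff={cutoff:.2f} sensitivity={sensitivity:.0%} specificity={specificity:.0%}")
```

For a marker where lower values are more suspicious, such as the Na/K ratio, the values could be negated before being passed in so that the same "score at or above threshold" rule applies.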

Uncited reference

[16].

Acknowledgments

There are no conflicts of interest for any of the authors. The authors received no specific funding for this work.

Author contribution: K.R. contributed to conceptualization, data curation, formal analysis, investigation, methodology, writing the original draft, and review and editing. B.R. contributed to formal analysis, methodology, and review and editing. C.G. contributed to conceptualization, formal analysis, investigation, writing the original draft, and review and editing and was responsible for supervision.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.domaniend.2019.106396.

References

[1] Feldman EC, Nelson RW, Reusch C, Scott-Moncrieff JCR. Canine & feline endocrinology. 4th ed. St. Louis, MO: Elsevier Saunders; 2015.
[2] Kooistra HS, Rijnberk A, van den Ingh TS. Polyglandular deficiency syndrome in a boxer dog: thyroid hormone and glucocorticoid deficiency. Vet Q 1995;17:59–63.
[3] Adissu HA, Hamel-Jolette A, Foster RA. Lymphocytic adenohypophysitis and adrenalitis in a dog with adrenal and thyroid atrophy. Vet Pathol 2010;47:1082–5.
[4] Frank CB, Valentin SY, Scott-Moncrieff JCR, Miller MA. Correlation of inflammation with adrenocortical atrophy in canine adrenalitis. J Comp Pathol 2013;149:268–79.
[5] Zeugswetter F, Schwendenwein I. Diagnostic efficacy of the leukogram and the chemiluminometric ACTH measurement to diagnose canine hypoadrenocorticism. Tierarztl Prax Ausg K Kleintiere Heimtiere 2014;42:223–30.
[6] Klein SC, Peterson ME. Canine hypoadrenocorticism: part I. Can Vet J 2010;51:63.
[7] Peterson ME, Kintzer PP, Kass PH. Pretreatment clinical and laboratory findings in dogs with hypoadrenocorticism: 225 cases (1979-1993). J Am Vet Med Assoc 1996;208:85–91.
[8] Podell M. Canine hypoadrenocorticism. Diagnostic dilemmas associated with the “great pretender”. Probl Vet Med 1990;2:717–37.
[9] Adler JA, Drobatz KJ, Hess RS. Abnormalities of serum electrolyte concentrations in dogs with hypoadrenocorticism. J Vet Intern Med 2007;21:1168–73.
[10] Syme HM, Scott-Moncrieff JC. Chronic hypoglycaemia in a hunting dog due to secondary hypoadrenocorticism. J Small Anim Pract 1998;39:348–51.
[11] Seth M, Drobatz KJ, Church DB, Hess RS. White blood cell count and the sodium to potassium ratio to screen for hypoadrenocorticism in dogs. J Vet Intern Med 2011;25:1351–6.
[12] Thompson AL, Scott-Moncrieff JC, Anderson JD. Comparison of classic hypoadrenocorticism with glucocorticoid-deficient hypoadrenocorticism in dogs: 46 cases (1985-2005). J Am Vet Med Assoc 2007;230:1190–4.
[13] Chastain CB, Madsen RW, Franklin RT. A screening evaluation for endogenous glucocorticoid deficiency in dogs: a modified thorn test. J Am Anim Hosp Assoc 1965;25:18–22.
[14] Nielsen L, Bell R, Zoia A, Mellor DJ, Neiger R, Ramsey I. Low ratios of sodium to potassium in the serum of 238 dogs. Vet Rec 2008;162:431–5.
[15] Borin-Crivellenti S, Garabed RB, Moreno-Torres KI, Wellman ML, Gilor C. Use of a combination of routine hematologic and biochemical test results in a logistic regression model as a diagnostic aid for the diagnosis of hypoadrenocorticism in dogs. Am J Vet Res 2017;78:1171–81.
[16] Roth L, Tyler RD. Evaluation of low sodium: potassium ratios in dogs. J Vet Diagn Invest 1999;11:60–4.
[17] Mani S, Ozdas A, Aliferis C, Varol HA, Chen Q, Carnevale R, et al. Medical decision support using machine learning for early detection of late-onset neonatal sepsis. J Am Med Inform Assoc 2014;21:326–36.
[18] Lapuerta P, Azen SP, Labree L. Use of neural networks in predicting the risk of coronary artery disease. Comput Biomed Res 1995;28:38–52.
[19] Schapire RE. The boosting approach to machine learning: an overview. New York, NY: Springer; 2003. p. 149–71.
[20] Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 1997;55:119–39.
[21] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837.
[22] Lathan P, Scott-Moncrieff JC, Wills RW. Use of the cortisol-to-ACTH ratio for diagnosis of primary hypoadrenocorticism in dogs. J Vet Intern Med 2014;28:1546–50.
[23] Boretti FS, Meyer F, Burkhardt WA, Riond B, Hofmann-Lehmann R, Reusch CE, et al. Evaluation of the cortisol-to-ACTH ratio in dogs with hypoadrenocorticism, dogs with diseases mimicking hypoadrenocorticism and in healthy dogs. J Vet Intern Med 2015;29:1335–41.
[24] Bovens C, Tennant K, Reeve J, Murphy KF. Basal serum cortisol concentration as a screening test for hypoadrenocorticism in dogs. J Vet Intern Med 2014;28:1541–5.
[25] Adler JA, Drobatz KJ, Hess RS. Abnormalities of serum electrolyte concentrations in dogs with hypoadrenocorticism. J Vet Intern Med 2007;21:1168–73.
[26] Lennon EM, Boyle TE, Hutchins RG, Friedenthal A, Correa MT, Bissett SA, et al. Use of basal serum or plasma cortisol concentrations to rule out a diagnosis of hypoadrenocorticism in dogs: 123 cases (2000-2005). J Am Vet Med Assoc 2007;231:413–6.
[27] Chapman PS, Kelly DF, Archer J, Brockman DJ, Neiger R. Adrenal necrosis in a dog receiving trilostane for the treatment of hyperadrenocorticism. J Small Anim Pract 2004;45:307–10.
[28] Frank CB, Valentin SY, Scott-Moncrieff JCR, Miller MA. Correlation of inflammation with adrenocortical atrophy in canine adrenalitis. J Comp Pathol 2013;149:268–79.
[29] Cooper MS, Stewart PM. Corticosteroid insufficiency in acutely ill patients. N Engl J Med 2003;348:727–34.
[30] Marik PE. Critical illness-related corticosteroid insufficiency. Chest 2009;135:181–93.