Vol. 60 No. 1 July 2020
Journal of Pain and Symptom Management
Original Article
Comparing an Artificial Neural Network to Logistic Regression for Predicting ED Visit Risk Among Patients With Cancer: A Population-Based Cohort Study
Rinku Sutradhar, PhD, and Lisa Barbera, MD
Department of Biostatistics (R.S.), Dalla Lana School of Public Health, University of Toronto, Toronto; ICES (R.S., L.B.), Toronto; Institute of Health Policy, Management and Evaluation (R.S.), University of Toronto, Toronto; and Department of Oncology (L.B.), Tom Baker Cancer Centre, University of Calgary, Calgary, Canada
Abstract
Context. Prior work using symptom burden to predict emergency department (ED) visits among patients with cancer has used traditional statistical methods such as logistic regression (LR). Machine learning approaches to prediction, such as artificial neural networks (ANNs), are gaining attention but have yet to be commonly applied in practice.
Objectives. To compare an ANN with LR for predicting ED visit risk among patients with cancer.
Methods. This was a population-based study of patients diagnosed with cancer between 2007 and 2015 in Ontario, Canada. After the cohort was split into training and test sets, an ANN model and an LR model were developed on the training cohort to predict the risk of an ED visit within seven days after an assessment of symptom burden. The predictive performance of each risk model was assessed on the test cohort and compared with respect to area under the curve and calibration.
Results. The training cohort consisted of 170,092 patients undergoing 1,015,125 symptom assessments; the remaining 42,523 patients, undergoing 252,169 symptom assessments, were set aside as the test cohort. Both models performed similarly with respect to specificity (ANN 67.0%; LR 67.3%) and accuracy (ANN 67.1%; LR 67.2%). The ANN model showed only minor improvement over the LR model in sensitivity (ANN 68.9%; LR 67.1%), discrimination (ANN 74.3%; LR 73.7%), and calibration. The most notable improvement in calibration was among patients in the highest ED visit risk percentile.
Conclusion. Although both models had similar predictive performance on our data, ANNs have an important role in prediction because of their flexible structure and data-driven, distribution-free nature, and should thus be considered as a potential modeling approach when developing a prediction tool.
J Pain Symptom Manage 2020;60:1-9. © 2020 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Key Words
Emergency department, symptom severity, risk prediction models, artificial neural networks, logistic regression, area under curve, calibration
Address correspondence to: Rinku Sutradhar, PhD, Institute for Clinical Evaluative Sciences, G1-06 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada.
Accepted for publication: February 12, 2020.
0885-3924/$ - see front matter
https://doi.org/10.1016/j.jpainsymman.2020.02.010

Introduction
Clinical Background
Patients diagnosed with cancer often experience numerous emergency department (ED) visits throughout their disease trajectory.1,2 Compared with the general population, the rate of ED use is more than double among patients with cancer.3 ED visits during the phase of cancer treatment can be particularly burdensome. Not only are EDs crowded, with long wait times, but patients with cancer who are immunocompromised, or who suffer from adverse symptoms and toxicities stemming from their treatments, face additional health concerns.3 ED visits are often viewed as preventable health system use; although patients with cancer are expected to require more health services to meet their health needs, some of these ED visits may be preventable by providing care in alternate settings.3 Reducing ED visits remains an important part of improving the quality of cancer care. Better prediction models for forecasting ED use can serve as an early warning indicator, facilitating timely interventions in clinic or at home by cancer care providers for individuals at high risk, which in turn may reduce a patient's need to visit the ED.
Prior work has shown that worsening of physical symptoms, such as pain, nausea, and shortness of breath, along with constitutional symptoms such as well-being, fatigue, and appetite, contributes to ED visits in the cancer population.1 Recent research demonstrated not only that high symptom burden is associated with greater risk of an ED visit but also that information on symptom burden improves our ability to predict ED visit risk.4 In Ontario, Canada, patients make regular visits to outpatient cancer clinics to receive treatments and assessments by their care providers. Because patient-reported information such as symptom burden is often collected during these visits, they are an opportune time to flag patients who may be at high risk for short-term adverse outcomes, such as ED visits, and who could thus benefit from timely intervention and symptom management.
Methodological Background
Traditional statistical techniques such as logistic or time-to-event regression models are often used for ED visit prediction.4-7 With the rise of machine learning techniques for making predictions with health data, it is of interest to determine whether these learning algorithms can be used to predict ED use in the cancer population, and how they compare in predictive performance with traditional statistical methods. We are specifically interested in the artificial neural network (ANN) framework; ANNs are mathematical constructs modeled on interconnected nodes. The generic structure of a basic ANN consists of a series of nodes arranged in three layers (input, hidden, and output layers). Fig. 1 illustrates an ANN with one hidden layer; it has 10 neurons in the input layer, four neurons in the hidden layer, and one neuron in the output layer (also known as a 10-4-1 ANN model). The input nodes and output node of an ANN correspond to the predictor variables and outcome variable, respectively. The nodes in the hidden layer are intermediate unobserved values that allow the ANN to model complex nonlinear relationships between the input nodes and the output node.8 The nodes in different layers are connected by weights. The process of determining the appropriate number of hidden neurons and estimating the optimal weights for reliable outcome prediction is called learning or training.9 Several authors have compared ANN against logistic regression (LR) using clinical data, where input variables were provided based on clinical reasoning. There was no general consensus on the superiority of one model over the other, because model performance depended strongly on the specific clinical question and the type of data available.8,10-12

Fig. 1. Visualization of a 10-4-1 neural network: one input layer consisting of 10 nodes (I1-I10), one hidden layer consisting of four nodes (H1-H4), and one output layer consisting of one node (O1). B1 and B2 are the sets of bias terms. The black lines represent the connections (weights) that need to be estimated.

Study Challenges and Goals
Using information on symptom severity, we are interested in predicting the risk of an ED visit within one week after a symptom assessment among patients with cancer. This poses several challenges: 1) symptoms are known to behave in clusters, meaning there may be interplay between symptoms that is difficult to capture explicitly;13,14 2) there may be complex nonlinear relationships between the covariates and ED visit risk; and 3) our information arises from administrative data in which all covariates are selected and measured based on clinical reasoning, which may limit our prediction ability, especially if the prior challenges are not adequately accounted for or if other nonobvious characteristics are not considered. In an attempt to overcome these barriers, the purpose of this study is to develop an ANN to predict the risk of ED visits among patients with cancer using information on symptom burden, and to compare the predictive performance of the ANN against that of the traditional LR model.
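To make the 10-4-1 structure of Fig. 1 concrete, here is a minimal Python sketch of how one set of weights and bias terms turns a covariate vector into a predicted probability. It assumes logistic (sigmoid) activations at the hidden and output nodes, which the text does not specify; the function names and the random initialization are hypothetical, and the sketch does not reproduce the authors' training procedure.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic activation (assumed here; the text does not name the activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def init_network(n_inputs: int, n_hidden: int, seed: int = 0):
    """Randomly initialize a hypothetical n_inputs-n_hidden-1 network.

    W1[j][k]: weight from input node k to hidden node j
    b1[j]:    bias of hidden node j (the B1 terms in Fig. 1)
    w2[j]:    weight from hidden node j to the output node
    b2:       bias of the output node (the B2 term in Fig. 1)
    """
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    b2 = rng.gauss(0.0, 1.0)
    return W1, b1, w2, b2


def forward(x, W1, b1, w2, b2) -> float:
    """Predicted outcome probability for one covariate vector x."""
    hidden = [
        sigmoid(bias + sum(w * xi for w, xi in zip(row, x)))
        for row, bias in zip(W1, b1)
    ]
    return sigmoid(b2 + sum(w * h for w, h in zip(w2, hidden)))


def n_parameters(n_inputs: int, n_hidden: int) -> int:
    """Weights plus bias terms to be estimated in an n_inputs-n_hidden-1 network."""
    return n_hidden * n_inputs + n_hidden + n_hidden + 1


W1, b1, w2, b2 = init_network(n_inputs=10, n_hidden=4)
print(forward([0.5] * 10, W1, b1, w2, b2))  # some probability strictly between 0 and 1
print(n_parameters(10, 4))  # 40 + 4 + 4 + 1 = 49
```

If each of the 60 covariates described in the Methods maps to one input node, the final model with four hidden nodes has `n_parameters(60, 4)`, or 249, weights and biases to estimate.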
Methods
Study Design, Population, and Observation Period
This was a population-based retrospective cohort study using linked administrative health care databases. The cohort consisted of patients who were newly diagnosed with a primary cancer and had at least one assessment of symptom burden completed using the Edmonton Symptom Assessment System (ESAS) between January 1, 2007, and December 31, 2015, in Ontario, Canada. This cohort has been used in prior work.4 Patients had to be eligible for the Ontario Health Insurance Plan (OHIP) and at least 18 years of age at the time of diagnosis. OHIP is Ontario's universal health care insurance program, which is available to essentially all Ontario residents. The Ontario Cancer Registry, which contains all incident cases of cancer in Ontario since 1964, was used to determine the diagnosis date.15 To capture an ambulatory cohort, patients were included only if their ESAS assessments occurred in a regional cancer center or partner hospital. Starting from diagnosis, every patient was observed until the first of the following occurred: a subsequent cancer diagnosis, loss of OHIP eligibility, entry into a long-term care facility, death, or the study end date of December 31, 2015. Administrative databases were linked using unique encoded identifiers and analyzed at ICES (formerly known as the Institute for Clinical Evaluative Sciences). ICES is an independent, nonprofit research institute whose legal status under Ontario's health information privacy law allows it to collect and analyze health care and demographic data, without consent, for health system evaluation and improvement. The use of data in this project was authorized under Section 45 of Ontario's Personal Health Information Protection Act, which does not require review by a Research Ethics Board.
Index Dates
The index dates for each individual were the dates of their ESAS assessments occurring during the course of observation, provided there was at least a seven-day gap between consecutive assessments. The ESAS assessment dates were retrieved from the Symptom Management Reporting Database held by Cancer Care Ontario.
Outcome (ANN Output Node)
For each patient, after every ESAS assessment, the outcome was defined as the occurrence (yes/no) of at least one ED visit within the next seven days. This seven-day window was chosen a priori based on clinical reasoning: it was long enough for the provider to potentially respond to the symptom screen and short enough that the ED visit could plausibly be linked to the screening.1,4 All ED visits were captured through the Canadian Institute for Health Information's National Ambulatory Care Reporting System database.16
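The index-date rule and the seven-day outcome can be sketched in Python for a single patient as below. Two details are assumptions not fixed by the text: a new index date is kept only if it falls at least seven days after the previously kept one, and an ED visit counts if it occurs on days 1 through 7 after the assessment (a same-day visit is excluded). The function names are hypothetical.

```python
from datetime import date


def select_index_dates(assessment_dates, min_gap_days=7):
    """Keep assessment dates at least min_gap_days after the previously kept date.

    Assumption: one reading of 'at least a seven-day gap between consecutive assessments'.
    """
    kept = []
    for d in sorted(assessment_dates):
        if not kept or (d - kept[-1]).days >= min_gap_days:
            kept.append(d)
    return kept


def ed_within_window(index_date, ed_dates, window_days=7):
    """1 if any ED visit falls on days 1..window_days after index_date, else 0.

    Assumption: a same-day ED visit (day 0) is not counted.
    """
    return int(any(1 <= (e - index_date).days <= window_days for e in ed_dates))


def label_patient(assessment_dates, ed_dates, window_days=7):
    """(index_date, outcome) pairs for one patient's ESAS assessments."""
    return [
        (d, ed_within_window(d, ed_dates, window_days))
        for d in select_index_dates(assessment_dates)
    ]


esas = [date(2015, 1, 1), date(2015, 1, 5), date(2015, 1, 10), date(2015, 2, 1)]
ed = [date(2015, 1, 15)]
print(label_patient(esas, ed))  # Jan 5 is dropped; outcomes are 0, 1, 0 for Jan 1, Jan 10, Feb 1
```

Each labeled pair then becomes one row of the modeling data, with the covariates measured at that index date.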
Covariates (ANN Input Nodes)
As there were 60 unique covariates, information on these variables is provided in point form below. Complete details can be found in prior work.4
The following variables were measured once at diagnosis for each patient:
- Sex (binary measure)
- Year of cancer diagnosis (continuous measure)
- Type of cancer diagnosis (categorical measure: lung, breast, gastrointestinal, genitourinary, hematology, or other)
- Stage at cancer diagnosis (categorical measure: one, two, three, four, or unknown)
- Neighborhood median income quintile representing socioeconomic status (five-level categorical measure)
- Rurality determined by postal code (binary: rural/urban)
- Total number of ED visits in the two years before cancer diagnosis (continuous measure)
- Total number of hospitalizations in the two years before cancer diagnosis (continuous measure)
The following variables were updated and measured at every ESAS assessment for each patient:
- Age at ESAS assessment (continuous measure)
- ESAS scores for each of the nine symptoms (anxiety, appetite, depression, drowsiness, nausea, pain, shortness of breath, fatigue, and well-being). Symptom scores were examined in both a continuous and a categorical manner (none is a score of 0, mild is 1 to 3, moderate is 4 to 6, and severe is 7 to 10).17
- Number of months since diagnosis (continuous measure)
- Number of prior ESAS assessments since diagnosis (continuous measure)
- Receipt of chemotherapy within 30 days before the ESAS assessment
- Receipt of radiation within 30 days before the ESAS assessment
- Receipt of surgery before the ESAS assessment (looking back to the date of diagnosis)
- Total number of clinic visits to a radiation oncologist within 30 days before the ESAS assessment
- Total number of clinic visits to a medical oncologist within 30 days before the ESAS assessment
- Total number of visits to a family physician within 30 days before the ESAS assessment
- Burden of comorbidity for each patient, derived using all diagnoses in the two years before the ESAS assessment based on 32 aggregated diagnosis groups (every patient had 32 aggregated diagnosis group comorbidity indicator variables that were updated at each ESAS assessment)
- Level of health care utilization in the two years before the ESAS assessment, captured using resource utilization bands, which are quintiles of expected resource use (five-level categorical measure)
- Phase of care/phase of cancer management at the time of ESAS assessment (categorical measure: initial, continuing, or palliative)
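The categorical form of the ESAS symptom scores listed above can be sketched in Python as follows. The indicator coding uses "none" as the reference level, matching the reference category shown for the symptom rows of Table 2; the function names are hypothetical.

```python
LEVELS = ("mild", "moderate", "severe")  # "none" is the reference level


def esas_category(score: int) -> str:
    """Map a 0-10 ESAS symptom score to none/mild/moderate/severe."""
    if not 0 <= score <= 10:
        raise ValueError(f"ESAS scores range from 0 to 10, got {score}")
    if score == 0:
        return "none"
    if score <= 3:
        return "mild"
    if score <= 6:
        return "moderate"
    return "severe"


def esas_indicators(score: int) -> dict:
    """Indicator (dummy) coding of one symptom score, with 'none' as the reference."""
    category = esas_category(score)
    return {level: int(category == level) for level in LEVELS}


print([esas_category(s) for s in (0, 3, 4, 7)])  # ['none', 'mild', 'moderate', 'severe']
print(esas_indicators(5))  # {'mild': 0, 'moderate': 1, 'severe': 0}
```

The continuous form is simply the raw 0-10 score, so each symptom can enter a model either as one numeric input or as three indicators.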
Statistical Analyses
Descriptive Analyses. The distributions of characteristics in the training and test cohorts were explored; continuous measures were described with medians and interquartile ranges, and categorical measures with frequencies and percentages.
ANN Prediction Model. We constructed a three-layer ANN model consisting of an input layer, one hidden layer, and an output layer. One layer of hidden neurons is generally sufficient for classifying noncomplex data.10 All 60 covariates described previously were represented as nodes in the input layer, and the output layer consisted of a single node representing our binary outcome (whether an ED visit occurred or not). To start, we included two nodes in the single hidden layer of the neural network. The weights of the neural network were estimated using the 80% training cohort by resilient backpropagation with weight backtracking, minimizing a cross-entropy error function.18 The area under the receiver operating characteristic curve (AUC) was then calculated for the training cohort to assess discrimination. An AUC of 0.5 indicates a model that discriminates no better than chance.19 In theory, a perfectly discriminating model (AUC of 1.0) would assign a higher event probability to everyone who experienced the event than to any individual who did not. We then repeated this process, separately including three, four, five, and six nodes in the single hidden layer of the ANN. The AUC for the training cohort increased with each additional node; however, problems with convergence arose once we reached five nodes. As a result, the final ANN model included four nodes in its single hidden layer.
LR Prediction Model. A traditional multivariable LR model to predict ED visit risk was built using the training cohort. The model started with all covariates listed previously. To increase model flexibility, two-way interactions between symptoms and polynomial terms for continuous covariates were also explored. Backward selection with a P-value cutoff of 0.05 was used to derive the final multivariable LR model.
Comparing the ANN Model Against the LR Model. The estimated weights from the final ANN model and the estimated regression coefficients from the final LR model were used to make predictions for each patient in the 20% test cohort (which had been completely set aside up to this point). Calibration plots were constructed under each of the two models using the test cohort. This was done by grouping patients into percentiles (100 groups) based on their predicted risk and then plotting the observed ED risk within each percentile against the corresponding mean predicted risk within that percentile.20 Points closer to the 45° line indicate better calibration. In addition, the predicted number of outcomes was compared with the actual number of outcomes in the test cohort by constructing a confusion matrix. In the test cohort, under each of the two models, we calculated sensitivity (true positive fraction), specificity (true negative fraction), accuracy (fraction of correct classifications, positive or negative), and discrimination (measured using the AUC).
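The test-cohort evaluation described above can be sketched in Python as follows. `auc` uses the rank (Mann-Whitney) form, which equals the probability that a randomly chosen assessment followed by an ED visit receives a higher predicted risk than one that was not, matching the interpretation given in the ANN paragraph. The `threshold` in `classification_metrics` is a hypothetical parameter, because the text does not state the cut-off used for the confusion matrix, and `calibration_groups` mirrors the 100-group percentile grouping behind the calibration plots. All names are illustrative.

```python
def auc(y_true, y_score):
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank."""
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def classification_metrics(y_true, y_score, threshold):
    """Sensitivity, specificity, and accuracy from the confusion matrix at a cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def calibration_groups(y_true, y_score, n_groups=100):
    """(mean predicted risk, observed event rate) per group, ordered by predicted risk."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    n = len(order)
    points = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if members:
            mean_pred = sum(y_score[i] for i in members) / len(members)
            observed = sum(y_true[i] for i in members) / len(members)
            points.append((mean_pred, observed))
    return points


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))  # 0.75
print(classification_metrics(y, p, threshold=0.35))  # {'sensitivity': 1.0, 'specificity': 0.5, 'accuracy': 0.75}
```

Plotting the `calibration_groups` output (observed against mean predicted) with a 45° reference line gives the kind of calibration plot shown in Fig. 2.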
All analyses were conducted using R, Version 3.2.3, statistical software (R Foundation for Statistical Computing, Vienna, Austria).21
Results
The study population consisted of 212,615 unique patients, with a total of 1,267,294 ESAS assessments. The median (interquartile range) number of ESAS assessments per patient was four (two to eight). Before modeling, the study population was randomly divided into two mutually exclusive cohorts: 80% of patients (n = 170,092, undergoing 1,015,125 ESAS assessments) made up the training cohort, and the remaining 20% of patients (n = 42,523, undergoing 252,169 ESAS assessments) were set aside as the test cohort. Among the 1,015,125 ESAS assessments in the training cohort, 31,961 (3.15%) were followed by an ED visit within seven days after assessment.
The distributions of characteristics at the baseline ESAS assessment in our training and test cohorts are presented in Table 1. Both cohorts were well balanced with respect to each of the characteristics. The median time between diagnosis and baseline ESAS assessment was 3.5 months. Anxiety, fatigue, and lack of well-being were the most prominent symptoms at that time. Most patients were diagnosed with breast, gastrointestinal, or genitourinary cancer.
The results from the LR model fitted on the training cohort are provided in Table 2. Unlike ANN weights, the LR model coefficient estimates are directly interpretable. Phase of care at the time of assessment, cancer type, cancer stage, and rurality were the strongest determinants of whether an ED visit occurred within seven days after an assessment. We also found that worsening of appetite, pain, dyspnea, fatigue, or well-being was associated with a strong, statistically significant increase in the risk of an ED visit.4
Table 3 compares the predictive performance of the ANN and LR models (both derived from the training cohort) on the test cohort, reporting sensitivity, specificity, accuracy, and discrimination. The ANN model demonstrated marginal improvement over the LR model with respect to sensitivity and discrimination; both models demonstrated similar specificity and accuracy.
Calibration plots are given in Fig. 2. The red dots and blue plus signs represent the calibration results on the test cohort under the ANN and LR models, respectively. Overall, the red dots lie marginally closer to the 45° line than the blue markers, indicating a minor improvement in calibration under the ANN model compared with the LR model. The greatest difference in distance from the 45° line is seen among patients in the highest risk percentile (points at the right of the plot); among these patients, the ANN model is superior to the LR model in predicting ED use. The LR model overestimates the risk of ED use in this group, as its mean predicted ED risk is larger than the observed ED risk.
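Because each patient contributes several ESAS assessments (median four), the 80%/20% division described at the start of the Results was made over patients rather than over assessments, so no patient's assessments appear in both cohorts. The Python sketch below illustrates such a patient-level split, assuming records are (patient_id, assessment) tuples; the function name and seed are hypothetical, and the authors' actual randomization procedure is not described beyond this.

```python
import random


def split_by_patient(records, train_frac=0.8, seed=2020):
    """Randomly assign whole patients to a training or test cohort.

    records: iterable of (patient_id, assessment) tuples. Every assessment of a
    given patient ends up in the same cohort, so the cohorts are mutually exclusive.
    """
    records = list(records)
    patient_ids = sorted({pid for pid, _ in records})
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rng.shuffle(patient_ids)
    train_ids = set(patient_ids[:round(train_frac * len(patient_ids))])
    train = [rec for rec in records if rec[0] in train_ids]
    test = [rec for rec in records if rec[0] not in train_ids]
    return train, test


# Toy data: patient p contributes (p % 3) + 1 assessments.
toy = [(p, k) for p in range(10) for k in range(p % 3 + 1)]
train, test = split_by_patient(toy)
print(len({p for p, _ in train}), len({p for p, _ in test}))  # 8 2
```

With the study's 212,615 patients, `round(0.8 * 212_615)` equals 170,092, the reported size of the training cohort; the number of assessments in each cohort then follows from which patients were drawn.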
Discussion
This article develops an ANN model to predict the risk of ED visits among patients with cancer using information on symptom burden and compares the predictive performance of the ANN model against that of the traditional LR model. Both models performed similarly with respect to specificity and accuracy, and the ANN model showed only minor improvement over the LR model in sensitivity, discrimination, and calibration. The LR model was computationally more efficient, converging faster than the ANN model. Although both models had similar predictive performance on our data, we argue that ANNs have an important role in prediction and should still be considered as a possible approach. Parameter estimation under LR requires distributional assumptions, whereas an ANN is predominantly a distribution-free, data-driven approach. An ANN requires neither a priori knowledge of the relationships between the predictors and the outcome nor a priori knowledge of the presence of interactions among predictors. These networks are designed to identify and model the underlying and possibly complex (or arbitrary) relationships and can implicitly detect interactions. In contrast, LR can only model such relationships and interactions if they are explicitly specified by the user.
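Because ED visits followed only about 3% of assessments, accuracy is dominated by specificity, which is why each model's accuracy in Table 3 sits so close to its specificity. The identity below makes this explicit, with $\pi$ denoting the event prevalence; the numerical check assumes the test-cohort event rate is close to the training-cohort rate of 3.15%, since the text does not report the test-cohort rate.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{N}
  = \underbrace{\frac{TP}{TP + FN}}_{\text{Sensitivity}} \cdot \frac{TP + FN}{N}
  + \underbrace{\frac{TN}{TN + FP}}_{\text{Specificity}} \cdot \frac{TN + FP}{N} \\
 &= \text{Sensitivity}\cdot\pi + \text{Specificity}\cdot(1 - \pi),
  \qquad \pi = \frac{TP + FN}{N}, \quad N = TP + TN + FP + FN. \\[4pt]
\text{ANN: } & 0.0315 \times 68.9 + 0.9685 \times 67.0 \approx 67.06 \\
\text{LR: }  & 0.0315 \times 67.1 + 0.9685 \times 67.3 \approx 67.29
\end{aligned}
```

Both values agree with the reported accuracies (67.1% and 67.2%) to within the rounding of the published sensitivities and specificities and the assumed prevalence.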
Both models can provide individual-level risk predictions based on a subject's covariate profile, and both have the potential to be used as decision support tools once they are integrated into clinical practice.8 For studies using administrative data, where the inputs (covariates) are initially selected based on clinical reasoning, ANN models may be superior to traditional statistical models for outcome prediction if any of the following applies: 1) there are complex nonlinear relationships between the covariates and the outcome that are difficult to capture explicitly, even with techniques such as polynomial terms or cubic splines; 2) there is complex interplay between covariates that is difficult to capture through multiway interactions; 3) repeated covariate and outcome measurements are taken on the same individual over time, which may result in within-individual correlation; or 4) interest lies in predicting multiple (and possibly correlated) types of outcomes on the same individual. It can further be argued that, at minimum, ANN models will not be inferior in prediction to LR models, similar to what we found in this work. If no complex relationships exist to be captured or learned by the hidden layers, then the weights in an ANN model will collapse to an LR model. With respect to clinical interpretation, LR models are highly useful for researchers interested in examining associations between covariates and outcomes. The estimated regression coefficients provide user-friendly odds ratios that are convenient for identifying important predictors of risk. These estimated regression coefficients also provide actionable information without requiring further calculation of an absolute ED risk score. For example, further clinical assessment or interventions during a clinic visit may be considered for patients who have high-risk characteristics (as determined from the model estimates). In contrast, with ANN models, further clinical assessment or interventions during a clinic visit can only be considered once the individual's absolute risk score is calculated. ANNs are often referred to as black-box models, because the estimated weights in an ANN do not have a direct real-life interpretation, making it difficult to pinpoint the risk predictors. However, in scenarios where risk prediction and outcome classification performance are more important than determining associations or interpreting the model, ANNs should be considered as a prediction modeling option.8

Table 1
Distributions of Characteristics at Baseline ESAS Assessment Among the Training and Test Cohorts
80% Training Cohort (n = 170,092); 20% Test Cohort (n = 42,523)
Characteristic | Value | Training Frequency/Median | Training Percentage/IQR | Test Frequency/Median | Test Percentage/IQR
Anxiety | Continuous | 1 | 0-4 | 1 | 0-4
Depression | Continuous | 0 | 0-2 | 0 | 0-2
Drowsiness | Continuous | 0 | 0-3 | 0 | 0-3
Appetite | Continuous | 0 | 0-3 | 0 | 0-3
Nausea | Continuous | 0 | 0-0 | 0 | 0-0
Pain | Continuous | 0 | 0-3 | 0 | 0-3
Dyspnea | Continuous | 0 | 0-2 | 0 | 0-2
Fatigue | Continuous | 2 | 0-5 | 2 | 0-5
Well-being | Continuous | 2 | 0-5 | 2 | 0-5
Months from diagnosis | Continuous | 3.5 | 1.4-14.5 | 3.5 | 1.4-14.6
Number of prior ESAS assessments | Continuous | | | |
Age at assessment | Continuous | 64 | 55-73 | 64 | 55-73
Receipt of chemotherapy within prior 30 days | Yes | 20,059 | 11.9 | 5015 | 11.8
Receipt of radiation within prior 30 days | Yes | 15,123 | 9 | 3781 | 8.9
Receipt of surgery since diagnosis | Yes | 84,049 | 49.2 | 21,012 | 49.4
Number of radiation oncologist visits within prior 30 days | Continuous | 0 | 0-0 | 0 | 0-0
Number of medical oncologist visits within prior 30 days | Continuous | 0 | 0-0 | 0 | 0-0
Number of PCP visits within prior 30 days | Continuous | 0 | 0-1 | 0 | 0-1
RUB | 0 | 713 | 0.4 | 178 | 0.4
 | 1 | 912 | 0.5 | 228 | 0.5
 | 2 | 4971 | 2.8 | 1243 | 2.9
 | 3 | 69,167 | 40.7 | 17,292 | 40.9
 | 4 | 49,511 | 29.2 | 12,378 | 29.1
 | 5 | 44,818 | 26.3 | 11,204 | 26.5
Phase of care | Initial | 88,357 | 51.9 | 22,076 | 51.9
 | Continuing | 32,306 | 19.0 | 8161 | 19.2
 | Palliative | 49,429 | 29.1 | 12,286 | 28.9
Sex (female/male) | Female | 91,709 | 53.9 | 22,956 | 54.0
Year of diagnosis | Continuous | 2012 | 2009-2014 | 2012 | 2009-2014
Cancer type | Breast | 37,505 | 22 | 9376 | 22.1
 | Central nervous system | 2458 | 1.4 | 615 | 1.2
 | Gastrointestinal | 29,115 | 17.1 | 7279 | 17.3
 | Genitourinary | 29,301 | 17.2 | 7325 | 17.2
 | Gynecologic | 14,130 | 8.3 | 3533 | 8.2
 | Hematology | 20,273 | 11.9 | 5068 | 11.7
 | Head and neck | 8887 | 5.2 | 2222 | 5.1
 | Other | 2734 | 1.6 | 683 | 1.7
 | Primary unknown | 1042 | 0.6 | 261 | 0.6
 | Skin | 4894 | 2.9 | 1223 | 2.7
 | Lung | 19,753 | 11.6 | 4938 | 11.7
Stage at diagnosis | 0 | 329 | 0.2 | 79 | 0.2
 | 1 | 35,747 | 21.0 | 8967 | 21.1
 | 2 | 42,141 | 24.8 | 10,514 | 24.7
 | 3 | 29,395 | 17.3 | 7335 | 17.2
 | 4 | 24,589 | 14.5 | 6039 | 14.2
 | Unknown | 37,891 | 22.3 | 9589 | 22.6
Income quintile | 1 | 29,007 | 17.3 | 7252 | 17.1
 | 2 | 32,974 | 19.5 | 8244 | 19.4
 | 3 | 33,560 | 19.8 | 8390 | 19.7
 | 4 | 36,765 | 21.4 | 9191 | 21.6
 | 5 | 37,786 | 22.3 | 9446 | 22.2
Rural residence | Yes | 23,763 | 14.0 | 6019 | 14.2
Number of ED visits two years before diagnosis | Continuous | 0 | 0-1 | 0 | 0-1
Number of hospitalizations two years before diagnosis | Continuous | 0 | 0-0 | 0 | 0-0
ESAS = Edmonton Symptom Assessment System; IQR = interquartile range; PCP = primary care physician; RUB = resource utilization band; ED = emergency department. Information on the 32 aggregated diagnosis group distributions is not shown here because of space restrictions. Median and IQR are provided for continuous covariates; frequencies and percentages are provided for binary or categorical covariates.

Table 2
Results From LR Model (Using the Training Cohort)
Characteristic | Value | Reference | OR Estimate | P
Anxiety | Mild | None | 0.97 | 0.0788
 | Moderate | None | 1.04 | 0.0777
 | Severe | None | 1.03 | 0.2611
Depression | Mild | None | 0.93 | <0.0001
 | Moderate | None | 0.87 | <0.0001
 | Severe | None | 0.79 | <0.0001
Drowsiness | Mild | None | 1.06 | 0.0003
 | Moderate | None | 1.1 | <0.0001
 | Severe | None | 1.17 | <0.0001
Appetite | Mild | None | 1.2 | <0.0001
 | Moderate | None | 1.37 | <0.0001
 | Severe | None | 1.62 | <0.0001
Nausea | Mild | None | 1.02 | 0.2031
 | Moderate | None | 1.08 | 0.0006
 | Severe | None | 1.14 | <0.0001
Pain | Mild | None | 1.12 | <0.0001
 | Moderate | None | 1.27 | <0.0001
 | Severe | None | 1.41 | <0.0001
Dyspnea | Mild | None | 1.01 | 0.3977
 | Moderate | None | 1.12 | <0.0001
 | Severe | None | 1.39 | <0.0001
Fatigue | Mild | None | 1.09 | <0.0001
 | Moderate | None | 1.21 | <0.0001
 | Severe | None | 1.37 | <0.0001
Well-being | Mild | None | 1.11 | <0.0001
 | Moderate | None | 1.24 | <0.0001
 | Severe | None | 1.49 | <0.0001
Months from diagnosis | Continuous | | 1.01 | 0.0001
Number of prior ESAS assessments | Continuous | | elim | elim
Age at assessment | Continuous | | elim | elim
Receipt of chemotherapy within prior 30 days | Yes | No | 1.23 | <0.0001
Receipt of radiation within prior 30 days | Yes | No | elim | elim
Receipt of surgery since diagnosis | Yes | No | 0.95 | 0.0002
Number of radiation oncology visits within prior 30 days | Continuous | | 1.06 | <0.0001
Number of medical oncology visits within prior 30 days | Continuous | | 1.22 | <0.0001
Number of PCP visits within prior 30 days | Continuous | | 1.17 | <0.0001
RUB | 0 | 5 | elim | elim
 | 1 | 5 | elim | elim
 | 2 | 5 | elim | elim
 | 3 | 5 | elim | elim
 | 4 | 5 | elim | elim
Phase of care | Initial | Continuing | 1.48 | <0.0001
 | Palliative | Continuing | 1.93 | <0.0001
Sex | Female | Male | 0.92 | <0.0001
Year of diagnosis | Continuous | | 1.04 | <0.0001
Cancer type | Breast | Lung | 0.8 | <0.0001
 | Central nervous system | Lung | 1.04 | 0.4495
 | Gastrointestinal | Lung | 0.97 | 0.181
 | Genitourinary | Lung | 0.8 | <0.0001
 | Gynecologic | Lung | 0.87 | <0.0001
 | Hematology | Lung | 0.77 | <0.0001
 | Head and neck | Lung | 0.67 | <0.0001
 | Other | Lung | 0.86 | 0.0048
 | Primary unknown | Lung | 0.8 | 0.0024
 | Skin | Lung | 0.88 | 0.0054
Stage at diagnosis | 2 | 1 | 1.13 | <0.0001
 | 3 | 1 | 1.25 | <0.0001
 | 4 | 1 | 1.34 | <0.0001
 | Unknown | 1 | 1.36 | <0.0001
Income quintile | 1 | 5 | 1.12 | <0.0001
 | 2 | 5 | 1.05 | 0.0075
 | 3 | 5 | 1.01 | 0.5843
 | 4 | 5 | 1.03 | 0.1053
Rural residence | Yes | No | 1.36 | <0.0001
Number of ED visits two years before diagnosis | Continuous | | 1.07 | <0.0001
Number of hospitalizations two years before diagnosis | Continuous | | 0.97 | 0.001
LR = logistic regression; OR = odds ratio; ESAS = Edmonton Symptom Assessment System; PCP = primary care physician; RUB = resource utilization band; ED = emergency department. "elim" indicates that the variable was eliminated during the backward selection model-building process; results for the 32 aggregated diagnosis groups are not shown here because of space restrictions.

Table 3
Prediction Performance Under the ANN and LR Risk Prediction Models (on the Test Cohort)
Performance Measure (%) | ANN Risk Prediction Model | LR Risk Prediction Model
Sensitivity | 68.9 | 67.1
Specificity | 67.0 | 67.3
Accuracy | 67.1 | 67.2
Area under ROC (discrimination) | 74.3 | 73.7
ANN = artificial neural network; LR = logistic regression; ROC = receiver operating characteristic.

This article has numerous strengths. This province-wide study of more than 200,000 patients with cancer and more than 1.2 million assessments of symptom severity made it possible to build and test our prediction models on large training and test cohorts. We were able to measure an extensive list of covariates, including information on physical and constitutional symptoms, comorbidities, cancer type, cancer stage, treatments, and various forms of prior health care utilization. The cohort had few exclusion criteria, making our results generalizable to other cancer patient populations in similar single-payer health care systems.
This study also had several limitations. Symptom assessments were only taken among ambulatory patients with cancer during their visits to a cancer clinic, and thus our ED visit prediction model may not be applicable to nonambulatory individuals in hospital or hospice.
In multiple-payer health care systems, it may be difficult to obtain longitudinal information on symptom severity for patients with cancer. In addition, discontinuity in care may make it difficult to measure, for example, the number of ED visits in the two years before cancer diagnosis, which we know is also an important factor in predicting future ED use. We believe that variation in both data availability and data quality, especially for these key determinants of ED visit risk, would reduce the predictive ability of our models in such settings. Our prediction models were not able to distinguish between ED visits occurring earlier vs. later within the seven-day outcome window. Prediction performance (under both the ANN and LR models) may be further improved with information on measures such as types of chemotherapy drugs, availability of caregiver support at home, functional status, and the presence of a primary care physician; however, these data were not available.
This study demonstrates similar prediction performance between ANN and LR models in determining ED visit risk. Although each model has its own strengths and limitations, ANNs should be considered as a risk modeling option when interest lies solely in prediction. Researchers should be open to exploring several risk modeling options to achieve the best prediction performance; this in turn can have meaningful clinical impact. For example, accurate prediction of ED visits at the time of symptom assessment can serve as an early warning tool. Such a tool can support discussions between oncologists and patients so that timely interventions and symptom management can be implemented to reduce the burden of these unwanted adverse outcomes, ultimately improving cancer care and quality of life.
Fig. 2. Calibration plot (on the test cohort) under the artificial neural network (ANN) risk prediction model (red dots) and under the logistic regression (LR) risk prediction model (blue +). For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.
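The two models in this study were compared on the test cohort with respect to discrimination (area under the curve), sensitivity, specificity, and calibration. As a minimal sketch of how such a comparison can be computed from each model's predicted risks, the Python below implements a rank-based (Mann-Whitney) AUC, sensitivity and specificity at a single risk cutoff (the kind of flag an early warning tool would raise), and a grouped calibration table of mean predicted risk versus observed ED visit rate, which is the information a calibration plot such as Fig. 2 displays. It is not the authors' code: the function names, the equal-size risk groups, the 0.4 cutoff, and the toy numbers are hypothetical.

```python
"""Sketch: compare two risk models' predicted probabilities on a test set.

Hypothetical illustration only: the function names, the equal-size risk
grouping, the 0.4 cutoff, and the toy numbers are assumptions, not the
study's code, settings, or data.
"""
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney) formula; tied risks get average ranks."""
    n = len(risk)
    order = sorted(range(n), key=lambda i: risk[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied predicted risks.
        while j + 1 < n and risk[order[j + 1]] == risk[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie run
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def sens_spec(y_true: Sequence[int], risk: Sequence[float],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when risk >= cutoff is flagged as high risk."""
    tp = sum(1 for y, r in zip(y_true, risk) if y == 1 and r >= cutoff)
    fn = sum(1 for y, r in zip(y_true, risk) if y == 1 and r < cutoff)
    tn = sum(1 for y, r in zip(y_true, risk) if y == 0 and r < cutoff)
    fp = sum(1 for y, r in zip(y_true, risk) if y == 0 and r >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_table(y_true: Sequence[int], risk: Sequence[float],
                      n_groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed event rate) for equal-size groups ordered by risk.

    Pairs close to predicted == observed indicate good calibration; these
    pairs are what a calibration plot displays.
    """
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # more groups than observations
        mean_pred = sum(risk[i] for i in idx) / len(idx)
        observed = sum(y_true[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed))
    return rows


if __name__ == "__main__":
    # Made-up outcomes (1 = ED visit within 7 days) and risks from two models.
    y = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
    models = {
        "LR": [0.12, 0.30, 0.55, 0.20, 0.45, 0.70, 0.35, 0.40, 0.15, 0.50],
        "ANN": [0.10, 0.25, 0.60, 0.22, 0.50, 0.75, 0.30, 0.45, 0.18, 0.42],
    }
    for name, risk in models.items():
        sens, spec = sens_spec(y, risk, cutoff=0.4)
        print(f"{name}: AUC={auc(y, risk):.3f} sens={sens:.2f} spec={spec:.2f}")
        for pred, obs in calibration_table(y, risk, n_groups=5):
            print(f"  mean predicted {pred:.2f} vs observed {obs:.2f}")
```

With real data, each model's predicted risks would be passed for the same test-cohort assessments so that the comparison is paired. Finer groupings (for example, risk percentiles rather than fifths) need large samples to give stable observed rates within each group.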
Disclosures and Acknowledgments
This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care. Parts of this material are based on data and information compiled and provided by the Ontario Ministry of Health and Long-Term Care, Cancer Care Ontario, and the Canadian Institute for Health Information. The analyses, conclusions, opinions, and statements expressed herein are solely those of the authors and do not reflect those of the funding or data sources; no endorsement is intended or should be inferred. The authors declare no conflicts of interest. Ethical approval: This study involved secondary data analyses only and was therefore exempt from Research Ethics Board approval because ICES is a designated 45.1 entity under the Personal Health Information Protection Act, which enables its use of personal health information.
References
1. Barbera L, Taylor C, Dudgeon D. Why do cancer patients visit the emergency department near the end of life? CMAJ 2010;182:563–568.
2. Mayer DK, Travers D, Wyss A, et al. Why do patients with cancer visit emergency departments? Results of a 2008 population study in North Carolina. J Clin Oncol 2011;29:2683–2688.
3. Lash RS, Bell JF, Bold RJ, et al. Emergency department use by recently diagnosed cancer patients in California. JCSO 2017;15:95–102.
4. Sutradhar R, Rostami M, Barbera L. Patient-reported symptoms improve performance of risk prediction models for ED among patients with cancer: a population-wide study in Ontario using administrative data. J Pain Symptom Manage 2019;58:745–755.
5. Barbera L, Atzema C, Sutradhar R, et al. Do patient-reported symptoms predict emergency department visits in cancer patients? A population-based analysis. Ann Emerg Med 2013;61:427–437.
6. Barbera L, Sutradhar R, Howell D, et al. Does routine symptom screening with ESAS decrease ED visits in breast cancer patients undergoing adjuvant chemotherapy? Support Care Cancer 2015;23:3025–3032.
7. Seow H, Barbera L, Pataky R, et al. Does increasing homecare nursing reduce emergency department visits at the end of life? A population-based cohort study of cancer decedents. J Pain Symptom Manage 2016;51:204–212.
8. Ayer T, Chhatwal J, Alagoz O, et al. Comparison of logistic regression and artificial neural network models in breast cancer risk estimation. Radiographics 2010;20:13–22.
9. Haykin SO. Neural networks and learning machines, 3rd ed. Upper Saddle River, NJ: Pearson Education, Inc, 2009.
10. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002;35:352–359.
11. Eftekhar B, Mohammad K, Ardebil HE, et al. Comparison of artificial neural network and logistic regression models for predicting mortality in head trauma based on initial clinical data. BMC Med Inform Decis Mak 2005;5:3.
12. Rodrigo H, Tsokos CP. Artificial neural network model for predicting lung cancer survival. J Data Anal Inf Process 2017;5:33–47.
13. Cheung WY, Barmala N, Zarinehbaf S, et al. The association of physical and psychological symptom burden with time to death among palliative cancer outpatients. J Pain Symptom Manage 2009;37:297–304.
14. Fan G, Filipczak L, Chow E. Symptom clusters in cancer patients: a review of the literature. Curr Oncol 2007;14:173–179.
15. Clarke EA, Marrett LD, Kreiger N. Cancer registration in Ontario: a computer approach. In: Jensen OM, Parkin DM, MacLennan R, Muir CS, Skeet RG, eds. Cancer registration principles and methods. Lyon, France: IARC Publications, 1991:246–257.
16. Institute for Clinical Evaluative Sciences. ICES Data Dictionary. Toronto, ON: Institute for Clinical Evaluative Sciences. Available from https://datadictionary.ices.on.ca/Applications/DataDictionary/Default.aspx. Accessed January 23, 2018.
17. Selby D, Cascella A, Gardiner K, et al. A single set of numerical cutpoints to define moderate and severe symptoms for the Edmonton Symptom Assessment System. J Pain Symptom Manage 2010;39:241–249.
18. Gunther F, Fritsch S. neuralnet: training of neural networks. R J 2010;2:30–38.
19. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21:128–138.
20. Yi M, Meric-Bernstam F, Kuerer HM, et al. Evaluation of a breast cancer nomogram for predicting risk of ipsilateral breast tumor recurrences in patients with ductal carcinoma in situ after local excision. J Clin Oncol 2012;30:600–607.
21. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2009. Available from http://www.R-project.org. Accessed July 28, 2014.