Mathematical and Computer Modelling 46 (2007) 88–94 www.elsevier.com/locate/mcm
Artificial neural networks and risk stratification: A promising combination

M. De Beule (a, ∗), E. Maes (a), O. De Winter (b), W. Vanlaere (a), R. Van Impe (a)

(a) Ghent University, Department of Structural Engineering, Faculty of Engineering, 9052 Zwijnaarde, Belgium
(b) Ghent University, Department of Radiotherapy and Nuclear Medicine, Faculty of Medicine and Health Sciences, 9000 Gent, Belgium
Received 18 May 2006; accepted 15 December 2006
Abstract
A brief overview of the principles of artificial neural networks (ANNs) is presented, followed by a review of the state of the art of ANNs for the diagnosis of cardiovascular diseases. Next, ANNs are applied to model the risk stratification according to D’Agostino et al. [R.B. D’Agostino, M.W. Russell, D.M. Huse, et al., Primary and subsequent coronary risk appraisal: New results from the Framingham study, American Heart Journal 139 (2000) 272–281]. The performance of the network demonstrates its ability to find non-linear relationships in (medical) data, and some factors that are important for obtaining an accurate and reliable network are derived. Finally, an ANN is designed to investigate the predictive quality of a set of carefully chosen risk factors for secondary prevention. The performance of the resulting network is put into perspective and some aspects that need further study are identified.
© 2007 Elsevier Ltd. All rights reserved.
Keywords: Neural networks; Diagnosis; Cardiology; Framingham; D’Agostino
∗ Corresponding author. E-mail address: [email protected] (M. De Beule).
0895-7177/$ - see front matter doi:10.1016/j.mcm.2006.12.024

1. Introduction

Epidemiologists seek to understand the predictive quality of risk factors for risk prevention, generally using classical determination methods. Since the 1960s, many mathematical predictive models have been developed on the basis of the Framingham study for primary prevention, i.e. predicting the risk of a new coronary heart disease (CHD) event. In secondary prevention, the risk of a new CHD event is calculated for persons with a history of coronary or cardiovascular disease. By subsequently evaluating which risk factor contributed most to each patient's global risk, secondary prevention can be supported effectively. The risk factors considered in the analysis of D’Agostino et al. [1] are age, the log ratio of total to HDL cholesterol, and diabetes for men; for women, log-transformed systolic blood pressure (SBP) and smoking are included in addition to these.

After a concise explanation of the ANN technique, an overview of the current use of ANNs for diagnosis in cardiology is presented; this survey is not exhaustive and should be seen as a general picture. Subsequently, a basic ANN that models the risk stratification according to D’Agostino is used to derive some important factors for obtaining an accurate and reliable network, and some of these recommendations are applied in the design of a second network. Using classical epidemiological insights, the authors selected 10 (secondary and possible) risk factors: age, diabetes, smoking, personal history, heart rate at rest, systolic blood pressure at rest, total cholesterol value, left ventricular ejection fraction (LVEF), end systolic volume (determined, together with LVEF, by QGS myocardial gated SPECT software) and defect extent in stress minus defect extent at rest. The aim was to see whether these factors could be used in a neural network to predict patient outcome. This article aims to encourage the use of ANNs in epidemiology, where these networks can provide new insights into the predictive quality of risk factors for risk prevention and complement classical determination methods.

The paper is organized as follows. Section 2 presents a brief overview of the principles of ANNs. Section 3 reviews the state of the art of ANNs for the diagnosis of cardiovascular diseases. Sections 4 and 5 discuss the results of applying ANNs to a data set of 273 patients with a history of ischaemic heart disease and an ejection fraction of less than 40%, collected over a period of about four years at Ghent University Hospital. Finally, Section 6 draws conclusions and proposes future work.

2. Artificial neural network principles

Artificial neural networks belong to the field of artificial intelligence and are inspired by the structure of the human brain. An ANN is a data processing technique capable of detecting, learning and predicting complex relations between quantities.
A remarkable feature of ANNs is that the relations are taught by repeatedly presenting examples of the relationship to the network. An ANN is a collection of a large number of units called neurons. Some of these neurons are connected to others by links through which information is sent: each neuron receives information through its incoming links and sends its results to other neurons. These links are central to an ANN. Each link is given an interconnection weight $w_{ji}^{l}$ that is continually adapted during the learning or training process. For the weights $w_{ji}^{l}$, the upper index $l$ stands for the layer number, while the lower indices indicate which neurons are connected by the link: $w_{ji}^{l}$ connects neuron $j$ in layer $l-1$ to neuron $i$ in layer $l$. The information that a neuron receives through its incoming links is processed in a first phase by calculating the weighted sum of that information:

$$ a_i^{l+1} = \sum_j w_{ji}^{l+1}\, x_j^{l} + b_i^{l+1} \qquad (1) $$
where $x_j^{l}$ is the output of neuron $j$ in layer $l$ and $b_i^{l+1}$ is a bias term. This weighted sum $a_i^{l+1}$ is known as the activation value of the neuron. This value is used inside the neuron as the input of the activation function, which is the second phase of the information processing. A frequently used activation function is the sigmoid function, given by

$$ x_i^{l} = f(a_i^{l}) \quad \text{with} \quad f(x) = \frac{1}{1 + e^{-x}} \qquad (2) $$
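Eqs. (1) and (2) can be illustrated with a minimal NumPy sketch of one fully connected layer. The matrix layout W[j, i] = w_ji (row = sending neuron j, column = receiving neuron i) is an assumption chosen to mirror the index order of Eq. (1), and the function names are illustrative only.

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    """Activation function of Eq. (2): f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def layer_forward(x_prev: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Process the outputs x^l of layer l into layer l + 1.

    x_prev: outputs x_j^l, shape (n_l,)
    W:      weights with W[j, i] = w_ji^{l+1}, shape (n_l, n_{l+1})
    b:      biases b_i^{l+1}, shape (n_{l+1},)
    Returns the activation values a^{l+1} (Eq. 1) and the outputs x^{l+1} (Eq. 2).
    """
    a = x_prev @ W + b        # a_i = sum_j w_ji x_j + b_i
    return a, sigmoid(a)

# Two inputs feeding one neuron: a = 2(0.5) + 1(-1.0) + 0 = 0, so f(a) = 0.5.
a, x_next = layer_forward(np.array([0.5, -1.0]), np.array([[2.0], [1.0]]), np.array([0.0]))
print(a, x_next)  # [0.] [0.5]
```

Stacking such calls, each layer's outputs feeding the next, gives the feedforward pass through the whole network.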
The result of this function is the output of the neuron. Since a network consists of multiple neurons, these have to cooperate in one way or another. For this reason, the neurons are grouped into layers. The most common network architecture is the feedforward structure, in which three kinds of layers are considered: the input layer, one or more hidden layers and the output layer. The links mentioned above only connect neurons in neighbouring layers. When such a network is used to detect the relation between multiple variables on the one hand and a corresponding function value on the other, the values of the variables are presented to the network through the input layer. This information travels through the network via the neurons and the links and reaches the output neurons in processed form. The values of these output neurons can then be interpreted as the prediction of the network.

Of course, an ANN cannot produce meaningful predictions immediately after its architecture has been determined. First, the relation under consideration has to be taught to the network by repeatedly presenting known examples of it. Each time, the prediction of the network (output $x^{L}$) is compared with the correct output value (target $T$), and the interconnection weights $w_{ji}^{l}$ are adapted based on the deviation error $E$:

$$ E = \frac{1}{2 N_{set} N_{out}} \sum_{p=1}^{N_{set}} \sum_{i=1}^{N_{out}} \left( T_{i,p} - x_{i,p}^{L} \right)^2 \qquad (3) $$
In this expression, $N_{set}$ stands for the number of examples in the training set, $N_{out}$ for the number of output neurons, and $L$ indicates the output layer. A training algorithm is needed to carry out the training process. A frequently used one is the backpropagation algorithm [2], in which the weights are adjusted using

$$ \begin{aligned} \Delta w_{ji}^{l} &= \eta\, \delta_{i,p}^{l}\, x_{j,p}^{l-1} \\ \delta_{i,p}^{L} &= f'(a_{i,p}^{L}) \left( T_{i,p} - x_{i,p}^{L} \right) \\ \delta_{i,p}^{l} &= f'(a_{i,p}^{l}) \sum_{r} \delta_{r,p}^{l+1}\, w_{ir}^{l+1}, \qquad l = 1, \ldots, L-1 \end{aligned} \qquad (4) $$
where $\eta$ is the learning rate. Once the prediction error $E$ is small enough, training can be considered complete and the network is ready to give predictions for unknown input vectors. For more detailed information on ANNs, we refer the reader to Rojas [3] and Vanlaere [4,5].

3. Artificial neural networks and diagnosis in cardiology

Ortiz et al. [6] assessed the usefulness and accuracy of ANNs in the prognosis of one-year mortality of patients with heart failure. The database was divided randomly into a training data set (47 cases, 8 deaths) and a testing data set (48 cases, 7 deaths). The results of ANN classification were compared with those of linear discriminant analysis, clinical judgement and conventional heuristically based programs. All four methods performed comparably and predicted more non-survivors than actually occurred. The best ANN predicted the outcome with an accuracy of 90%, a specificity of 93% and a sensitivity of 71.4%. Ortiz et al. attributed this low sensitivity to the limited number of non-survivors in the database and to the data not being well spread over the input space.

Reategui et al. [7] presented an approach for integrating case-based reasoning (CBR) with a neural network in diagnostic systems. When solving a new problem, the neural network is used to make hypotheses and to guide the CBR module in the search for a similar previous case that supports one of these hypotheses. The neural network was trained with 143 cases and validated with 71 cases; its accuracy reached 87.32%, which was considered a very good result.

Budde et al. [8] developed a prognostic computer model for individually predicting post-procedural complications in interventional cardiology (the Intervent Project).
This model forecasts these complications (death, abrupt vessel closure with myocardial infarction, and haemodynamic disorders) with an accuracy of over 95%. The ANN was based on 455 patients who had undergone percutaneous transluminal coronary angioplasty (PTCA). Freeman et al. [9] carried out a similar study comparing artificial neural networks with logistic regression for predicting in-hospital death after PTCA. They concluded that ANNs could model this complication when variable selection was guided (i.e. when the set of input variables was determined from a univariate analysis), but that their performance was not better than that of traditional modelling techniques. To predict the risk of major in-hospital complications following percutaneous coronary intervention accurately, Resnic et al. [10] developed a simplified risk score model and compared it with traditional multivariate logistic regression and ANN models. All three methods led to similar conclusions. It must be emphasized that no external validation of these results was carried out and that such a validation is necessary before extending these findings to other settings. In the Prospective Cardiovascular Münster Study (PROCAM), Voss et al. [11] used ANNs to predict the risk of coronary events in middle-aged men. They compared three methods: logistic regression, a multi-layer perceptron (MLP) network and probabilistic neural networks. The MLP network achieved a remarkably higher sensitivity (74.5%) than the other models (33.5% for logistic regression and 45.8% for the probabilistic neural networks), and receiver operating characteristic (ROC) analysis showed the superior performance of neural network analysis over logistic regression in predicting coronary events among middle-aged men in PROCAM. Scales et al. [12] designed a system architecture intended to serve as a diagnostic tool for predicting cardiovascular disease from medical data sets.
A fuzzy logic system, based on the domain knowledge provided by the ANN, achieves an overall accuracy of 92%. Other applications of ANNs in cardiology include the interpretation of ECGs, the analysis of medical images and the dosing of cardiovascular medication.

4. Artificial neural networks and D’Agostino

One way to demonstrate the ability of ANNs to deal with complex non-linear data sets is to check whether an ANN can predict the calculated risk scores (in %) reasonably accurately (e.g. with an absolute error below 3%). For this purpose, a data set of 273 patients from Ghent University Hospital is studied. The data were collected over a period of about four years, and all patients had a history of ischaemic heart disease and an ejection fraction of less than 40%. First, the secondary risk is calculated for each patient; then a network is trained, validated and tested with the software package JavaNNS [13] to see whether the network's risk score (output) corresponds to the calculated risk values. The total database is divided randomly into a training set of 200 patients and a validation set of 73 patients. The application range of the network for the non-discrete parameters is: age [39.7, 89.1] years, TC/HDL [2.017, 10.05], SBP [80, 190] mmHg and secondary risk [4.02, 33.49]%. All data are scaled as symmetrically as possible within the range [−1, 1].

Table 1
Values for the parameters used in the test set

Gender    Age (years)   Diabetes   Smoking   TC/HDL   SBP (mmHg)
Male      50            Yes        Yes       4        120
Female    70            No         No        6        140

In the training phase there are two possible options: searching for a network with an acceptable accuracy, or seeking the optimal network with the highest possible accuracy. For both options, all parameters (e.g. number of hidden nodes, number of hidden layers, training algorithm, learning rate) have to be varied in search of the lowest error on the output. Finding the optimal network, however, requires considering many more combinations of these parameters than finding an acceptable one.
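The search described above was carried out in JavaNNS. As a rough illustration only, the following NumPy sketch implements Eqs. (3) and (4) for a network with a single sigmoid hidden layer, adds a momentum term (the update Δw(t) = ηδx + αΔw(t−1) used here is a common form and an assumption, not taken from the JavaNNS manual), scales the inputs to [−1, 1] with min-max scaling (one possible reading of "as symmetrically as possible"), and keeps the hidden-layer size and learning rate that give the lowest validation error. The data are synthetic, and the grid values, epoch count and weight initialisation are arbitrary choices.

```python
import numpy as np

def sigmoid(a):
    """Logistic activation f(a) = 1 / (1 + exp(-a)) of Eq. (2)."""
    return 1.0 / (1.0 + np.exp(-a))

def scale_to_symmetric_range(X, lo, hi):
    """Map each column linearly from [lo, hi] onto [-1, 1] (min-max scaling, assumed)."""
    return 2.0 * (X - lo) / (hi - lo) - 1.0

def error(T, Y):
    """Eq. (3): E = sum_p sum_i (T_ip - x_ip^L)^2 / (2 N_set N_out)."""
    return float(np.sum((T - Y) ** 2) / (2 * T.shape[0] * T.shape[1]))

def predict(params, X):
    """Forward pass through one sigmoid hidden layer and a sigmoid output layer."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

def train(X, T, n_hidden, eta, alpha=0.1, epochs=100, seed=0):
    """Pattern-by-pattern backpropagation (Eq. 4) plus a momentum term alpha."""
    rng = np.random.default_rng(seed)
    params = [rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1])), np.zeros(T.shape[1])]
    steps = [np.zeros_like(p) for p in params]      # previous weight changes
    for _ in range(epochs):
        for p in rng.permutation(len(X)):
            W1, b1, W2, b2 = params
            h = sigmoid(X[p] @ W1 + b1)              # hidden-layer outputs
            y = sigmoid(h @ W2 + b2)                 # output-layer outputs x^L
            d_out = y * (1 - y) * (T[p] - y)         # delta^L; sigmoid f'(a) = x(1 - x)
            d_hid = h * (1 - h) * (W2 @ d_out)       # delta^l = f'(a^l) sum_r delta_r w_ir
            # Weight changes eta*delta*x; biases act as weights on a constant input of 1.
            grads = [np.outer(X[p], d_hid), d_hid, np.outer(h, d_out), d_out]
            for k in range(4):
                steps[k] = eta * grads[k] + alpha * steps[k]   # add momentum term
                params[k] += steps[k]
    return params

# Hypothetical synthetic data (not the hospital data): a smooth non-linear target in (0, 1).
rng = np.random.default_rng(42)
raw = rng.uniform([40.0, 80.0], [90.0, 190.0], size=(200, 2))   # age-like and SBP-like inputs
X = scale_to_symmetric_range(raw, raw.min(axis=0), raw.max(axis=0))
T = 0.5 + 0.4 * np.sin(2 * X[:, :1]) * X[:, 1:]
X_tr, T_tr, X_va, T_va = X[:150], T[:150], X[150:], T[150:]

# Keep the configuration with the lowest validation error.
best = None
for n_hidden in (2, 5):
    for eta in (0.1, 0.3):
        e_va = error(T_va, predict(train(X_tr, T_tr, n_hidden, eta), X_va))
        if best is None or e_va < best[0]:
            best = (e_va, n_hidden, eta)
print("lowest validation E = %.5f with %d hidden nodes and eta = %.1f" % best)
```

Searching for an "acceptable" rather than optimal network corresponds here to stopping at a coarse grid; a finer grid (or more hidden layers) trades computing time for a possibly lower error.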
In this study an acceptable network was sought, since it already gave reasonably accurate results. An independent performance check requires testing the network with ‘virgin’ data, i.e. data not used for training or validation. As all patient data had already been used, the authors generated the test set as follows. For each of the six parameters, two values lying within the application range are considered, as listed in Table 1. Taking all $2^6 = 64$ combinations of these values and calculating the corresponding secondary risk yields a test set of 64 cases.

In Fig. 1 the target values are compared with the output values of the network; the bisector represents a perfect match between target and output. The figure shows that the network predicts the secondary risk very accurately for risks below 17%. For higher risks the maximum absolute error is about 3%, so the resulting network can be considered acceptable for predicting the secondary risk according to D’Agostino’s method.

Fig. 1. Performance of the resulting network.

Note, however, that the absolute errors on the test set are much higher than those on the validation set. A plausible cause for this discrepancy is that the data are not well spread over the whole range: for example, women make up 14.5% of the training set, 2.8% of the validation set and 50% of the test set. To make accurate predictions with ANNs, the network has to be trained with examples that are evenly distributed over the range of application of each input parameter. Otherwise, the network can generate untrustworthy results in regions where few examples were available. One way to remedy this is a careful study of the distribution of the data and a good (re)definition of the network's range of applicability (e.g. excluding all women from the data set).

5. Artificial neural networks and risk stratification

A second ANN is developed to predict the occurrence of a total event (i.e. cardiac death, non-fatal myocardial infarction, coronary artery bypass grafting, percutaneous transluminal coronary angioplasty or hospitalization after heart failure). Using this ANN, each patient can be categorized as a high risk person (possible occurrence of an event) or a low risk person (no predicted event). Given the risk factors available in the Ghent University Hospital database, the designed network is a tool for secondary prevention.

As mentioned earlier, accurate predictions require training examples that are evenly distributed over the range of application of each input parameter. For this reason, 200 male patients aged between 45 and 85 years were selected from the database of 273 persons. Their heart rate at rest varies from 40 to 100 beats/min, their systolic blood pressure from 90 to 180 mmHg and their total cholesterol value from 120 to 280 mg/dl. All of them have an ejection fraction between 15% and 40% and an end systolic volume of 50–300 ml. These ranges determine the field of applicability of the network. A scatterplot is a useful tool for defining these ranges; Fig. 2 shows such a plot for the ejection fraction (EF). The changes in slope of the curve indicate how the data are spread and explain why patients with an EF below 15% were excluded.

Fig. 2. Spread of the ejection fraction for all male data.

The feedforward network is trained with the backpropagation algorithm with a momentum term, using the software package JavaNNS. In search of a network with acceptable accuracy, numerous training runs are carried out while varying several parameters (e.g. number of hidden layers, nodes per hidden layer, momentum term). The characteristics of the most accurate network (the one producing the highest sensitivity and specificity on the validation set) are shown in Table 2. The ratio of the number of training examples to the degrees of freedom of the designed network is 2.5, which approaches the value of 3 recommended by Freeman et al. [9].
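The input ranges listed for the 200 selected men define where the second network may be trusted. A minimal sketch of a guard that flags a new patient's inputs outside those ranges before a prediction is used; the dictionary keys are hypothetical names, not fields of the study's database, and inputs not listed with a range in the text are not checked.

```python
# Ranges of the second network's field of applicability, as listed in Section 5.
# The dictionary keys are hypothetical names, not taken from the study's database.
APPLICABILITY_RANGES = {
    "age_years": (45, 85),
    "heart_rate_rest_bpm": (40, 100),
    "sbp_rest_mmhg": (90, 180),
    "total_cholesterol_mg_dl": (120, 280),
    "ejection_fraction_pct": (15, 40),
    "end_systolic_volume_ml": (50, 300),
}

def outside_applicability(patient: dict) -> list:
    """Return the inputs of a patient that lie outside the training ranges.

    An empty list is necessary but not sufficient for trusting a prediction:
    sparsely populated regions inside a range can still be unreliable.
    """
    flagged = [name for name, (lo, hi) in APPLICABILITY_RANGES.items()
               if not lo <= patient[name] <= hi]
    if patient.get("sex", "male") != "male":     # the network was trained on men only
        flagged.append("sex")
    return flagged

patient = {"age_years": 62, "heart_rate_rest_bpm": 72, "sbp_rest_mmhg": 130,
           "total_cholesterol_mg_dl": 210, "ejection_fraction_pct": 12,
           "end_systolic_volume_ml": 180}
print(outside_applicability(patient))  # ['ejection_fraction_pct']
```

Rejecting or flagging such cases mirrors the exclusion of patients with an EF below 15% from the training data.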
Since all data are used for the training and validation sets, the network cannot be validated with an independent test set. The performance (in %) of the network is summarized in Table 3. The accuracy (Acc) is the percentage of correct classifications; the sensitivity (Se) is the fraction of high risk persons who test positive (i.e. are classified as high risk, or occurrence of an event); the specificity (Sp) is the fraction of low risk persons who test negative (i.e. are classified as low risk, or no event). The positive predictive value (PPV) is the ratio of the number of correct high risk classifications to all high risk classifications, and the negative predictive value (NPV) is the ratio of the number of correct low risk classifications to all low risk classifications.
Table 2
Parameters of the resulting network

Parameter                          Value
Number of data                     150 (training set), 50 (validation set)
Number of hidden nodes, layer 1    5
Number of hidden nodes, layer 2    2
Learning rate                      0.3
Momentum term                      0.1
Number of training cycles          1500

Table 3
Performance of the trained and validated network

Data set          Acc (%)   Se (%)   Sp (%)   PPV (%)   NPV (%)
Training set      89.33     66.67    98.15    93.33     88.33
Validation set    82.00     64.29    88.89    69.23     86.49
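The definitions of Acc, Se, Sp, PPV and NPV can be written as a small function of the confusion-matrix counts. The counts in the example are back-calculated from the percentages in Table 3 and are not reported in the paper: 9 true positives, 5 false negatives, 32 true negatives and 4 false positives among the 50 validation patients reproduce the validation row, and 28, 14, 106 and 2 among the 150 training patients reproduce the training row.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Percentages as defined in the text (high risk = positive, low risk = negative)."""
    total = tp + fn + tn + fp
    return {
        "Acc": 100 * (tp + tn) / total,   # correct classifications
        "Se": 100 * tp / (tp + fn),       # high-risk persons classified as high risk
        "Sp": 100 * tn / (tn + fp),       # low-risk persons classified as low risk
        "PPV": 100 * tp / (tp + fp),      # correct among all high-risk classifications
        "NPV": 100 * tn / (tn + fn),      # correct among all low-risk classifications
    }

# Back-calculated counts (an inference from Table 3, not data from the study).
validation = classification_metrics(tp=9, fn=5, tn=32, fp=4)
training = classification_metrics(tp=28, fn=14, tn=106, fp=2)
print({k: round(v, 2) for k, v in validation.items()})
# {'Acc': 82.0, 'Se': 64.29, 'Sp': 88.89, 'PPV': 69.23, 'NPV': 86.49}
```

Seen as counts, the low validation sensitivity amounts to 5 of the 14 patients with an event being classified as low risk.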
The designed ANN recognizes low risk patients with an acceptable accuracy (a specificity of 88.89% on the validation set). Support for the remaining group could then be intensified in secondary prevention, so that “health care resources” are spent more efficiently and economically. Before this model can be used in clinical practice, a reliable evaluation of the network is needed, consisting of an external and a temporal evaluation. The external check can be done by testing the performance on another database, which indicates whether the network remains valuable for slightly different populations. A temporal evaluation means testing the network after a certain period of time to see whether the model remains valid. In addition, the network has to be compared with existing classical methods for secondary prevention. If all tests are positive, the effect of the network on the decision making strategy of the medical staff has to be investigated; only if this last evaluation is also positive could the implementation of such a network in clinical practice be considered. The sensitivity obtained by our network is rather low. Since the number of training examples is small compared with the number of input parameters, (k-fold) cross-validation might help to increase performance. As mentioned in the introduction, the input parameters (or risk factors) of our network were chosen using classical epidemiological insights. An alternative would be a full investigation of the complete data set (e.g. principal component analysis) to see whether the same risk factors are obtained; if this revealed new risk factors, their predictive value could be tested with ANNs. Another possibility would be to reduce the number of input parameters and study the effect on the network.
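A minimal sketch of the (k-fold) cross-validation suggested above; k = 5 and the seed are arbitrary choices, not values from the study. The patients are shuffled once and split into k folds, and each fold serves once as validation set while the remaining folds are used for training, so every patient contributes to both training and validation.

```python
import numpy as np

def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Example with the 200 selected patients and k = 5 (both illustrative).
for train_idx, val_idx in k_fold_indices(200, k=5):
    print(len(train_idx), len(val_idx))   # 160 40, five times
```

With relatively few events, a stratified variant that keeps the proportion of patients with an event equal across folds would usually give more stable sensitivity estimates.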
Further research could lead to a network with two output nodes: one that predicts the occurrence of total events and another that determines the time to this event.

6. Conclusions

The present work aims to encourage the use of ANNs in epidemiology, where these networks can provide new insights into the predictive quality of risk factors for risk prevention and can complement classical determination methods. To reach this goal, a careful study of the distribution of the data and a good definition of the network's range of applicability are crucial. The designed ANN recognizes low risk patients with an accuracy of 88.89% (the specificity on the validation set), which offers interesting perspectives for an efficient and economical use of “health care resources”. No external or temporal evaluation of the network has been carried out to date, so the network is not (yet) ready for clinical practice. Besides these evaluations, subjects for further research are:
• comparing the network's performance with existing classical methods for secondary prevention;
• studying the effect of the network on the decision making strategy of the medical staff;
• increasing the network's sensitivity;
• predicting both the occurrence of and the time to an event.
Acknowledgements
The authors would like to acknowledge Prof. Dr. T. Gillebert, Prof. Dr. J. De Sutter and MSc. M. De Buyzere from Ghent University Hospital for their valuable support.

References
[1] R.B. D’Agostino, M.W. Russell, D.M. Huse, et al., Primary and subsequent coronary risk appraisal: New results from the Framingham study, American Heart Journal 139 (2000) 272–281.
[2] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[3] R. Rojas, Neural Networks: A Systematic Introduction, Springer-Verlag, Berlin, Heidelberg, 1996.
[4] W. Vanlaere, Toepassing van neurale netwerken bij de berekening van constructies [Application of neural networks in the calculation of structures], Master Thesis, Ghent University, 2001.
[5] W. Vanlaere, P. Buffel, G. Lagae, R. Van Impe, J. Belis, Neural networks for assessing the failure load of a construction, Journal of Computational and Applied Mathematics 168 (1–2) (2004) 501–508.
[6] J. Ortiz, G.C.M. Ghefter, C.E.S. Silva, One-year mortality prognosis in heart failure: A neural network approach based on electrocardiographic data, Journal of the American College of Cardiology 26 (1995) 1586–1593.
[7] E.B. Reategui, L. Ohno-Machado, B.F. Leao, Combining a neural network with case-based reasoning in a diagnostic system, Artificial Intelligence in Medicine 9 (1997) 5–27.
[8] T. Budde, M. Haude, W. Höpp, A prognostic computer model to individually predict post-procedural complications in interventional cardiology, European Heart Journal 20 (1999) 354–363.
[9] R.V. Freeman, K.A. Eagle, E.R. Bates, Comparison of artificial neural networks with logistic regression in prediction of in-hospital death after percutaneous transluminal coronary angioplasty, American Heart Journal 140 (2000) 511–520.
[10] F.S. Resnic, L. Ohno-Machado, A. Selwyn, Simplified risk score models accurately predict the risk of major in-hospital complications following percutaneous coronary intervention, American Journal of Cardiology 88 (2001) 5–9.
[11] R. Voss, P. Cullen, H. Schulte, Prediction of risk of coronary events in middle-aged men in the Prospective Cardiovascular Münster Study (PROCAM) using neural networks, International Journal of Epidemiology 31 (2002) 1253–1262.
[12] R. Scales, M. Embrechts, Computational intelligence techniques for medical diagnostics, in: Proc. Walter Lincoln Hawkins ’32 Graduate Research Conference, New York, 2002.
[13] Java Neural Network Simulator, User Manual, Version 1.1, Fischer, Hennecke, Bannes and Zell, University of Tübingen.