Predictions of coronary artery stenosis by artificial neural network


Artificial Intelligence in Medicine 18 (2000) 187 – 203 www.elsevier.com/locate/artmed

Bert A. Mobley a,*, Eliot Schechter b, William E. Moore c, Patrick A. McKee b, June E. Eichner c

a Department of Physiology, College of Medicine, University of Oklahoma, Oklahoma City, OK 73190, USA
b Department of Medicine, College of Medicine, University of Oklahoma, Oklahoma City, OK 73190, USA
c Department of Biostatistics and Epidemiology, College of Public Health, University of Oklahoma, Oklahoma City, OK 73190, USA

Received 6 July 1999; received in revised form 28 August 1999; accepted 7 September 1999

Abstract

Data from angiography patient records comprised 14 input variables of a neural network. Outcomes (coronary artery stenosis or none) formed both supervisory and output variables. The network was trained by backpropagation on 332 records, optimized on 331 subsequent records, and tested on the final 100 records. If 0.40 was chosen as the output distinguishing stenosis from no stenosis, all 81 patients who had stenosis would have been identified, while 9 of 19 patients who did not have stenosis might have been spared angiography. The results demonstrated that artificial neural networks could identify some patients who do not need coronary angiography. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Artificial neural networks; Coronary angiography; Coronary artery disease; Coronary artery stenosis; Outcome predictions

* Corresponding author. Tel.: +1-405-2712284; fax: +1-405-2713181. E-mail address: [email protected] (B.A. Mobley).
0933-3657/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0933-3657(99)00040-8

1. Introduction

Relatively few individuals referred for coronary angiography are found to be free from coronary artery stenosis, but some negative angiography results are unavoidable because the non-invasive diagnostic systems used prior to angiography are not perfectly sensitive and specific. Furthermore, the patient, medical, legal and business communities are understandably more accepting of a few false positive pre-angiographic diagnoses than of any false negative diagnoses. Nevertheless, a number of systems have been proposed to reduce the number of false positives, i.e. individuals who, in retrospect, may have been unnecessarily subjected to the concerns, risks and expenses of the invasive procedure of coronary angiography [5,10,13,20,22,23,28].

Artificial neural networks produce complicated nonlinear models relating the inputs, i.e. the independent variables of a system, to the outputs, e.g. the dependent predictive variables. Neural networks can be trained to recognize patterns, and the nonlinear models developed during training also allow neural networks to generalize their conclusions and to apply them to patterns not previously encountered. A potential application of neural networks is predicting medical outcomes such as coronary artery stenosis.

Demographic and medical data from a cross-sectional study of coronary angiography patients were collected and entered into a database. Then a neural network was developed to predict the presence or absence of coronary artery stenosis. The network was trained, cross-validated and tested on patient records from the database, records that contained the correct answers regarding stenosis (supervisory variables) as a result of angiography. The result was a prototype of a neural network system that might safely spare some patients from coronary angiography without also endangering those patients with coronary artery stenosis.

2. Methods

2.1. Patient data and neural network model variables

Data from 763 consecutive coronary angiography patients at the Veterans Affairs Medical Center (VAMC) and University Hospital (UH) in Oklahoma City were collected and recorded from November 1992 to March 1994. Patients underwent angiography because of suspected angina, a past or recent history of myocardial infarction, a positive stress test, or as an adjunct to other hemodynamic assessments (see Section 4). The Institutional Review Board approved the study. Cardiology Fellows invited patients to participate in the study and obtained their informed consent.

Table 1 shows the variables used in the neural network model, which was designed to be a predictive system for coronary artery stenosis. Outcomes defined by at least one measurable coronary stenosis or none (revealed by angiography and represented dichotomously by a ‘1’ or a ‘0’, respectively) were used as supervisory variables. The supervisory variables were used during the training, cross-validation and testing of the neural network and were compared to the network predictions, i.e. real numbers ranging from 0 to 1, which were the dependent output variables of the neural network.

Table 1
CAS-model

Supervisory variable
Variable | Definition | Range | Mean | S.D. | Median
1. CAS | Coronary artery stenosis (Y/N, 1/0) | 0, 1 | 0.85 | 0.35 | NA

Dependent output variable
Variable | Definition | Range | Mean | S.D. | Median
1. Network prediction | Prediction of coronary artery stenosis (Y/N) depending on the cutoff prediction chosen between 0 and 1 | 0–1 | NA | NA | NA

Independent input variables
Variable | Units | Range | Mean | S.D. | Median
1. Age | years | 22–83 | 58 | 11 | 59
2. Sex | M(1), F(0) | 0, 1 | 0.82 | 0.38 | NA
3. Race | NW(1), W(0) | 0, 1 | 0.10 | 0.31 | NA
4. Smoking | Current (2), former (1), never (0) | 0, 1, 2 | 1.43 | 0.71 | NA
5. Diabetes | Yes (3), maybe (2), maybe not (1), no (0) | 0, 1, 2, 3 | 1.02 | 1.35 | NA
6. Hypertension | Yes (3), maybe (2), maybe not (1), no (0) | 0, 1, 2, 3 | 1.90 | 1.41 | NA
7. Body-mass index | kg/m2 | 16–56 | 28 | 6 | 28
8. Creatinine | mg/dl | 0.4–19 | 1.2 | 1.1 | 1.1
9. Triglycerides | mg/dl | 25–1551 | 184 | 142 | 150
10. Cholesterol | mg/dl | 81–636 | 220 | 58 | 213
11. HDL | mg/dl | 10–102 | 33 | 10 | 31
12. Cholesterol/HDL | ratio | 1.9–27.7 | 7 | 3 | 7
13. Fibrinogen | mg/dl | 79–999 | 367 | 135 | 339
14. Lipoprotein(a) | mg/dl | 0–272 | 36 | 39 | 21

Fourteen independent input variables, essentially risk factors for coronary artery disease, were incorporated in the model. All variables and their units or modes of representation are shown in Table 1. A patient was deemed to be a former smoker and was represented by a one if he or she had stopped smoking more than a year before the angiography.

If the diabetic status of a patient at angiography was recorded as unknown, no or left blank, but the fasting blood sugar was found to be greater than 140 mg/dl, then, for purposes of the neural network model, the diabetic status was classified as ‘maybe’ and represented by a two. If the diabetic status recorded at angiography was unknown or left blank, but the fasting blood sugar was less than or equal to 140 mg/dl, then the status for the network model was classified as ‘maybe not’ and represented by a one. (At the time these data were collected, the criterion for diagnosing diabetes was a fasting blood sugar of 140 mg/dl.)

If the hypertensive status of a patient at angiography was marked unknown, no or left blank, but either the systolic blood pressure was greater than 140 mmHg or the diastolic blood pressure was greater than 90 mmHg, then the status for the network was considered to be ‘maybe’ and represented by a two. If the hypertensive status was initially marked unknown or left blank, but the systolic pressure was 140 mmHg or less and the diastolic pressure was 90 mmHg or less, then the category ‘maybe not’, represented by a one, was used.

Twenty-nine patients underwent angiography more than once during the period when data were collected. However, only records from the first angiography of each patient were included in this analysis. Seven hundred and sixty-three patients had complete records of the output and input variables in Table 1. Four hundred and sixty-two individuals were patients of the VAMC, and 301 were patients of the UH.

Cineangiograms were obtained in two views for each vessel. The degree of each obstruction was estimated visually in the view in which the stenosis was judged most severe. For the purposes of this study, a patient was deemed to have coronary artery stenosis if any stenosis, mild or severe, was noted on the angiography report (see Section 4).

When patients gave informed consent to participate in the study and allow examination of their medical records, they also agreed to contribute a sample of blood for the laboratory analyses listed in Table 1. Before heparin was administered, 55 ml of blood was taken through the arterial sheath. The laboratory procedures for the analysis of blood have been described [11,12].
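The coding rules for diabetic and hypertensive status described above can be sketched as a small function. This is only an illustration: the function names and the string labels for the recorded status ("yes", "no", "unknown", and None for a blank entry) are hypothetical, and the sketch assumes that a recorded "yes" is always coded as 3.

```python
def encode_status(recorded, measurement_high):
    """Map a recorded status and a measurement flag to the 0-3 code of Table 1.

    recorded: "yes", "no", "unknown" or None (left blank); hypothetical labels.
    measurement_high: True if the relevant measurement exceeded its threshold.
    """
    if recorded == "yes":
        return 3  # assumption: a recorded "yes" is kept regardless of the measurement
    if measurement_high and recorded in ("unknown", "no", None):
        return 2  # "maybe"
    if recorded in ("unknown", None):
        return 1  # "maybe not"
    return 0      # recorded "no" with the measurement at or below the threshold


def encode_diabetes(recorded, fasting_glucose_mg_dl):
    # Threshold from the text: fasting blood sugar greater than 140 mg/dl.
    return encode_status(recorded, fasting_glucose_mg_dl > 140)


def encode_hypertension(recorded, systolic_mmhg, diastolic_mmhg):
    # Threshold from the text: systolic > 140 mmHg or diastolic > 90 mmHg.
    return encode_status(recorded, systolic_mmhg > 140 or diastolic_mmhg > 90)


print(encode_diabetes("no", 150))              # recorded "no", high glucose
print(encode_hypertension(None, 140, 90))      # blank entry, normal pressures
```

Keeping one shared helper for both variables reflects that the two rules in the text differ only in the measurement and its threshold.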

2.2. Patient files

In order that the patient records from this cross-sectional study could be used to simulate an implemented predictive system, records for the training, cross-validation and test files were selected from patients in sequence rather than in random order. Records from the first 332 patients to undergo coronary angiography composed a training file for the neural network. A cross-validation file was formed from records of the next 331 patients [26,19], and a test file was composed of records from the final 100 patients in the series. Table 2 shows the number of patients with and without coronary artery stenosis in all three files. A chi-square test revealed no statistically significant difference in the proportion of controls (angiography patients with no measurable stenosis) in the three files (P = 0.21).

Table 2
Files*

File name (consecutive patients) | Coronary stenosis patients | Controls (no stenosis) | Total patients
Training | 291 | 41 (12%) | 332
Cross-validation | 280 | 51 (15%) | 331
Test | 81 | 19 (19%) | 100

* P = 0.214 (chi square).
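The chi-square comparison of the three files can be reproduced from the counts in Table 2. The sketch below is an illustration rather than the authors' procedure (the statistical software they used for this test is not stated here); it assumes a Pearson chi-square test of independence on the 3 × 2 table and uses the fact that, for 2 degrees of freedom, the chi-square tail probability is exp(−x/2).

```python
import math

# Counts from Table 2: (stenosis patients, controls) for each file.
observed = {
    "training": (291, 41),
    "cross-validation": (280, 51),
    "test": (81, 19),
}


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a rows x columns table."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(observed)
assert dof == 2  # the closed form below is valid only for 2 degrees of freedom
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, dof = {dof}, P = {p_value:.3f}")
```

Under these assumptions the computed P value should agree with the value given in the footnote of Table 2.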

2.3. Artificial neural network

The database was converted to spreadsheet format and applied to the neural network (NeuroShell 2, Ward Systems Group, Inc., Frederick, MD). The network was composed of three layers (Fig. 1). The input layer had 14 elements with linear transfer functions, corresponding to the 14 independent variables. The middle or hidden layer had 26 elements with logistic transfer functions, and the output layer had one logistic element corresponding to the single dependent variable. The single output element predicted coronary artery stenosis by virtue of its numerical output, which ranged from 0 to 1.

Fig. 1. A representation of the artificial neural network used in this project. As the training file was processed, the difference between each dichotomous supervisory variable, CAS (1/0, stenosis/no stenosis), and each corresponding dependent output variable produced by the network (0–1) was the basis for modifying the 390 multiplicative weights.

Although the elements of the input layer had linear transfer functions, the magnitudes of inputs were nevertheless constrained between 0 and 1 by a normalizing procedure; each entry was divided by the maximum entry for that variable from any record. The number of elements in the middle layer was calculated as the square root of 332 (the number of patient records in the training file) plus half the sum of the number of input and output elements (7.5), i.e. approximately 18.2 + 7.5 = 25.7, which corresponds to the 26 elements used. The number of elements in the middle layer and its calculation, as described, was the recommendation of the producers of the neural network software. Furthermore, no other number of elements in the middle layer produced better network results on the test file (see Section 4).

The network was trained, i.e. the magnitudes of the network weights illustrated in Fig. 1 were determined, by the method of backpropagation of errors with momentum [24,25]. The learning constants and momentum constants of the network were each set to 0.1, as was the maximum of the initial randomized weights in all 390 pathways to all elements in the second and third layers of the network (14 × 26 = 364 pathways to the second layer and 26 × 1 = 26 to the third). The training records in each epoch of 332 were always presented randomly to the network, and the performance of the network on the cross-validation file was examined after the network processed each epoch of 332 training records. Network weights (the 390 weights on the elements of the second and third layers of the neural network) were retained each time the total mean squared error between the supervisory variables and the dependent output variables from the cross-validation file (331 records) became less than the previous minimum. Therefore, the software ultimately saved the version of the trained network that minimized the mean squared error on the cross-validation file. As a precaution against overlooking a trained network which would perform optimally on the cross-validation file, the search for an optimal network was not halted until 1 000 000 training records were processed subsequent to saving the last set of network weights. However, the optimally trained network was actually achieved after only 105 epochs of the training file were processed, i.e. after 34 860 training iterations (105 × 332 = 34 860).

The optimally trained network was then tested for its ability to predict stenosis in the test file of 100 records. However, when considering the performance of the network on the test file, the prediction of stenosis or no stenosis for individuals was the sole concern; the mean squared error for records of the entire test file was of no consequence per se. Before any output of the network (0–1) could be interpreted as a prediction of stenosis or none, a cutoff prediction between 0 and 1 had to be chosen. Network outputs greater than the cutoff prediction were interpreted to be predictions of stenosis, while outputs equal to or less than the cutoff prediction were interpreted as no stenosis.
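The stopping rule described above (keep the weights whenever the cross-validation error reaches a new minimum, and halt only after a fixed number of training records have been processed without improvement) can be sketched generically. This is not the NeuroShell 2 implementation; the function name and the two callbacks are hypothetical, and the toy demonstration replaces real training with a counter and a U-shaped error curve whose minimum is placed at epoch 105 purely for illustration.

```python
import copy


def train_with_checkpointing(weights, train_one_epoch, validation_mse,
                             epoch_size=332, patience_records=1_000_000):
    """Train until `patience_records` records have been processed since the
    last improvement of the error on the cross-validation file.

    train_one_epoch(weights) -> weights   hypothetical callback, one shuffled pass
    validation_mse(weights)  -> float     hypothetical callback
    """
    best_weights = copy.deepcopy(weights)
    best_mse = float("inf")
    best_epoch = 0
    epoch = 0
    records_since_best = 0
    while records_since_best < patience_records:
        weights = train_one_epoch(weights)
        epoch += 1
        records_since_best += epoch_size
        mse = validation_mse(weights)
        if mse < best_mse:  # new minimum on the cross-validation file: save it
            best_mse = mse
            best_weights = copy.deepcopy(weights)
            best_epoch = epoch
            records_since_best = 0
    return best_weights, best_mse, best_epoch


# Toy demonstration: the "weights" are just an epoch counter and the
# validation error is a U-shaped curve with its minimum at epoch 105.
best, mse, epoch = train_with_checkpointing(
    weights=0,
    train_one_epoch=lambda w: w + 1,
    validation_mse=lambda w: (w - 105) ** 2,
    patience_records=10 * 332,  # shortened patience so the toy run is quick
)
print(epoch, epoch * 332)
```

Counting patience in training records rather than epochs mirrors the wording of the text; the returned weights are those of the best checkpoint, not of the final epoch.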

2.4. ROC analysis and logistic regression

Because no single network output between 0 and 1 served as a perfect cutoff prediction for stenosis, receiver operating characteristic (ROC) analysis was applied to the output data of the test file. Twenty-one equally spaced cutoff predictions were examined in the range 0.00–1.00. The true and false positives and negatives were calculated for each cutoff prediction, and the sensitivity, specificity, and positive and negative predictive values were also determined for each cutoff. The cumulative ROC area was also calculated. Patient records were further tested with logistic regression. Both ROC analysis and logistic regression were accomplished with NCSS Statistical Software (Kaysville, UT).
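The tabulation over 21 cutoff predictions can be sketched as follows. This is an illustration, not the NCSS procedure: the function names are hypothetical, the outcomes and network outputs are made up, and the only rule taken from the text is that an output strictly greater than the cutoff counts as a prediction of stenosis.

```python
def confusion_counts(outcomes, predictions, cutoff):
    """Count TP, FP, FN, TN; outputs strictly greater than the cutoff mean stenosis."""
    tp = fp = fn = tn = 0
    for cas, output in zip(outcomes, predictions):
        predicted_stenosis = output > cutoff
        if predicted_stenosis and cas == 1:
            tp += 1
        elif predicted_stenosis:
            fp += 1
        elif cas == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return None instead of dividing by zero (e.g. PPV when nothing is called positive)."""
    return numerator / denominator if denominator else None


def roc_table(outcomes, predictions, n_cutoffs=21):
    """One row of counts and derived measures per equally spaced cutoff in [0, 1]."""
    rows = []
    for i in range(n_cutoffs):
        cutoff = i / (n_cutoffs - 1)
        tp, fp, fn, tn = confusion_counts(outcomes, predictions, cutoff)
        rows.append({
            "cutoff": cutoff, "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": safe_ratio(tp, tp + fn),
            "specificity": safe_ratio(tn, tn + fp),
            "ppv": safe_ratio(tp, tp + fp),
            "npv": safe_ratio(tn, tn + fn),
        })
    return rows


# Made-up outcomes (1 = stenosis, 0 = none) and network outputs, for illustration only.
outcomes = [1, 1, 1, 0, 0]
predictions = [0.90, 0.60, 0.45, 0.50, 0.20]
table = roc_table(outcomes, predictions)
for row in table:
    print(row)
```

Returning None for an undefined ratio is a design choice of this sketch; a published table may instead print such a cell as zero.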

3. Results

3.1. Network predictions

Fig. 2 shows the results of predictions of the neural network concerning records from the test file, records from the final 100 patients in the series of 763 patients to undergo coronary angiography. The axis labeled CAS shows the correct answers, i.e. the supervisory variables (1/0, stenosis/no stenosis), indicative of the results of coronary angiography. The axis labeled NETWORK PREDICTIONS shows the predictive outputs from the network, i.e. real numbers ranging from 0 to 1.

Fig. 2. Scatterplot showing the performance of the trained neural network on the test file of 100 patient records. The axis, CAS (Coronary Artery Stenosis) represents the dichotomous results of coronary angiography (1/0, stenosis/no stenosis) on the patients (supervisory variables). The axis, NETWORK PREDICTIONS, represents the magnitudes of the dependent output variable produced by the trained neural network on each of those patients.


In general, the smaller numerical predictions were associated with records of patients having no stenosis, while the larger numerical predictions were associated with records noting stenosis.

3.2. ROC analysis

Table 3 shows the results of ROC analysis on the network predictions shown in Fig. 2. The true positives were those patients whose predictions were greater than the cutoff prediction, who were therefore predicted to have stenosis, and who were shown by angiography to have stenosis. False positives were those patients predicted by the network to have stenosis and who were shown by angiography to have no stenosis. True negatives were those patients predicted by the network to have no stenosis (predictions less than or equal to the cutoff prediction) and who were shown by angiography to have no stenosis. False negatives were those patients predicted by the network to have no stenosis but who were shown by angiography to have stenosis.

A cutoff prediction of 0.40 is the largest at which the sensitivity was still 1.00, i.e. all patients with stenosis were identified (see Fig. 2 as well as Table 3). Consequently, there were no false negatives at a cutoff prediction of 0.40, and therefore the negative predictive value of the network was also 1.00, i.e. all patients identified as having no stenosis indeed had no stenosis. The cutoff prediction of 0.40 identified nine of 19 patients without stenosis.

The cumulative ROC area of 89% is illustrated in Fig. 3; it should be compared to 100%, which would be the ROC area if at least one cutoff prediction discriminated perfectly between all records of patients with and without stenosis. The ROC area of 89% should also be compared to a value of 50% (the diagonal line in Fig. 3), which would be the area expected for a device that randomly assigned patient records to the categories of stenosis or no stenosis.

3.3. Variable ranking

Table 4 shows a relative ranking of the 14 independent variables in the trained neural network model. The numbers that determine the rankings are the sums of the absolute values of the multiplicative weights in each set of 52 (26 + 26) connective pathways that intervened between each of the 14 input elements and the single output element. Age and sex had the largest cumulative weights in the trained network, while creatinine and triglycerides had the smallest cumulative weights. Twelve of the 19 patients in the test file who did not have stenosis were women. All of the nine true negatives identified at a cutoff prediction of 0.40 were women.

Another network was run without creatinine and triglycerides as independent input variables. That network of 12 independent variables was less effective at identifying the true negatives in the test file than the network of 14 independent variables described above.
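One reading of the ranking measure described above is sketched below: for each input element, sum the absolute values of its 26 weights into the hidden layer and add the absolute values of the 26 weights from the hidden layer to the output element. That interpretation is an assumption, as are the function names; the weights shown are made up for a toy network with three inputs and two hidden elements. Under this reading the hidden-to-output term is the same for every input, so the ordering is determined by the input-to-hidden weights alone.

```python
def cumulative_weights(input_to_hidden, hidden_to_output):
    """Sum of absolute weights on the pathways between each input and the output.

    input_to_hidden: one list of weights per input element (one weight per hidden element).
    hidden_to_output: one weight per hidden element.
    """
    shared = sum(abs(w) for w in hidden_to_output)  # identical for every input
    return [sum(abs(w) for w in row) + shared for row in input_to_hidden]


def rank_variables(names, input_to_hidden, hidden_to_output):
    """Return (name, cumulative weight) pairs, largest cumulative weight first."""
    totals = cumulative_weights(input_to_hidden, hidden_to_output)
    return sorted(zip(names, totals), key=lambda pair: pair[1], reverse=True)


# Made-up weights for a toy network: 3 inputs, 2 hidden elements, 1 output.
names = ["a", "b", "c"]
input_to_hidden = [[0.5, -1.0], [0.125, 0.25], [-2.0, 0.25]]
hidden_to_output = [1.0, -0.5]
ranking = rank_variables(names, input_to_hidden, hidden_to_output)
print(ranking)
```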

Table 3
Test results by ROC

Cutoff prediction | True pos | False pos | False neg | True neg | Sensitivity | Specificity | PPV a | NPV b | Cumulative ROC area
0.00 | 81 | 18 | 0 | 1 | 1.00 | 0.05 | 0.82 | 1.00 | 0.05
0.05 | 81 | 18 | 0 | 1 | 1.00 | 0.05 | 0.82 | 1.00 | 0.05
0.10 | 81 | 18 | 0 | 1 | 1.00 | 0.05 | 0.82 | 1.00 | 0.05
0.15 | 81 | 17 | 0 | 2 | 1.00 | 0.11 | 0.83 | 1.00 | 0.11
0.20 | 81 | 16 | 0 | 3 | 1.00 | 0.16 | 0.84 | 1.00 | 0.16
0.25 | 81 | 15 | 0 | 4 | 1.00 | 0.21 | 0.84 | 1.00 | 0.21
0.30 | 81 | 15 | 0 | 4 | 1.00 | 0.21 | 0.84 | 1.00 | 0.21
0.35 | 81 | 13 | 0 | 6 | 1.00 | 0.32 | 0.86 | 1.00 | 0.32
0.40 | 81 | 10 | 0 | 9 | 1.00 | 0.47 | 0.89 | 1.00 | 0.47
0.45 | 79 | 9 | 2 | 10 | 0.98 | 0.53 | 0.90 | 0.83 | 0.53
0.50 | 78 | 9 | 3 | 10 | 0.96 | 0.53 | 0.90 | 0.77 | 0.53
0.55 | 77 | 8 | 4 | 11 | 0.95 | 0.58 | 0.91 | 0.73 | 0.58
0.60 | 73 | 7 | 8 | 12 | 0.90 | 0.63 | 0.91 | 0.60 | 0.62
0.65 | 71 | 6 | 10 | 13 | 0.88 | 0.68 | 0.92 | 0.57 | 0.67
0.70 | 69 | 6 | 12 | 13 | 0.85 | 0.68 | 0.92 | 0.52 | 0.67
0.75 | 65 | 6 | 16 | 13 | 0.80 | 0.68 | 0.92 | 0.45 | 0.67
0.80 | 62 | 3 | 19 | 16 | 0.77 | 0.84 | 0.95 | 0.46 | 0.80
0.85 | 55 | 2 | 26 | 17 | 0.68 | 0.89 | 0.96 | 0.40 | 0.83
0.90 | 43 | 1 | 38 | 18 | 0.53 | 0.95 | 0.98 | 0.32 | 0.87
0.95 | 29 | 0 | 52 | 19 | 0.36 | 1.00 | 1.00 | 0.27 | 0.89
1.00 | 0 | 0 | 81 | 19 | 0.00 | 1.00 | 0.00 | 0.19 | 0.89

a PPV: positive predictive value.
b NPV: negative predictive value.

Fig. 3. The receiver operating characteristic (ROC) curve representing the neural network performance on the test file of 100 patient records. Sensitivity is plotted against 1 − specificity for each of the 21 cutoff predictions listed in Table 3. The curve encloses a cumulative ROC area of 0.89 (0.03 SEM). The diagonal line has a cumulative area of 0.50 and would be representative of a device predicting stenosis randomly.
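The ROC area quoted in the caption can be checked against the true positive and false positive counts of Table 3. The sketch below assumes the area was accumulated by the trapezoidal rule, starting from the corner where every record is called positive; the method actually used by the statistical software is not stated in the text, so this is a consistency check rather than a reconstruction.

```python
# True positive and false positive counts from Table 3, cutoffs 0.00 to 1.00.
TRUE_POS = [81] * 9 + [79, 78, 77, 73, 71, 69, 65, 62, 55, 43, 29, 0]
FALSE_POS = [18, 18, 18, 17, 16, 15, 15, 13, 10, 9, 9, 8, 7, 6, 6, 6, 3, 2, 1, 0, 0]
N_STENOSIS, N_CONTROLS = 81, 19  # composition of the test file (Table 2)


def trapezoid_auc(true_pos, false_pos, n_pos, n_neg):
    """Trapezoidal area under the curve of sensitivity against 1 - specificity."""
    # Start from the corner (1, 1), where every record is called positive.
    points = [(1.0, 1.0)]
    points += [(fp / n_neg, tp / n_pos) for tp, fp in zip(true_pos, false_pos)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x0 - x1) * (y0 + y1) / 2
    return area


auc = trapezoid_auc(TRUE_POS, FALSE_POS, N_STENOSIS, N_CONTROLS)
print(round(auc, 2))
```

Under this assumption the result rounds to the area given in the caption and in the last column of Table 3.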

Still another network model was also trained and tested. That network included a 15th independent input variable, which was assigned a ‘1’ or ‘0’ randomly. The performance of the network with the added random dichotomous variable was exactly the same as that of the network with 14 input variables described above, and the added random variable had the lowest relative ranking of all 15 variables.

Table 4
Variable rankings

Independent variable | Sum of weights
Age | 8.05
Sex | 7.19
Fibrinogen | 6.73
Cholesterol | 6.66
Race | 5.88
Cholesterol/HDL | 5.83
Smoking | 4.97
HDL | 4.51
Body-mass index | 4.20
Diabetes | 4.01
Lipoprotein(a) | 3.42
Hypertension | 2.69
Triglycerides | 1.84
Creatinine | 1.61

3.4. Advance determination of cutoff

If a neural network system similar to the one described herein were implemented to assist clinical decision-making, a method of approximating the optimum cutoff prediction in advance (e.g. 0.40 for this test file) would be critical. Of course, one could retain a test file, even in an implemented system, and use that test file for the sole purpose of determining the optimum cutoff prediction, i.e. the cutoff to be used when making actual predictions on subsequent individual patients. Alternatively, we examined the possibility that the performance of the trained network on the training file itself might provide a vehicle for approximating the optimum cutoff prediction in advance. With this latter method, the records in the test file of this study became analogous to individual patients on whom advance predictions of stenosis or none could be made by the neural network system. The largest cutoff that identified as many true negatives in the training file as possible without including a false negative in the grouping was 0.30. In turn, the application of 0.30 as the cutoff for the network predictions in the test file would have allowed the neural network to identify four of the 19 patients without coronary stenosis without recommending that any patients with stenosis not undergo angiography.
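The training-file approach to choosing a cutoff in advance can be sketched as below. The function names and the data are hypothetical; the sketch assumes the same grid of 21 cutoffs used in the ROC analysis and the rule that outputs less than or equal to the cutoff are read as no stenosis, so a cutoff produces no false negatives exactly when it lies below the lowest output of any stenosis record.

```python
def cutoff_grid(n_cutoffs=21):
    """Equally spaced cutoffs from 0.00 to 1.00."""
    return [i / (n_cutoffs - 1) for i in range(n_cutoffs)]


def largest_safe_cutoff(outcomes, predictions, cutoffs=None):
    """Largest grid cutoff that produces no false negatives on the given file.

    Assumes the file contains at least one stenosis record (outcome 1).
    """
    cutoffs = cutoff_grid() if cutoffs is None else cutoffs
    lowest_stenosis_output = min(p for cas, p in zip(outcomes, predictions) if cas == 1)
    safe = [c for c in cutoffs if c < lowest_stenosis_output]
    return max(safe) if safe else None


def count_spared(outcomes, predictions, cutoff):
    """Return (controls at or below the cutoff, stenosis records at or below the cutoff)."""
    spared = sum(1 for cas, p in zip(outcomes, predictions) if cas == 0 and p <= cutoff)
    missed = sum(1 for cas, p in zip(outcomes, predictions) if cas == 1 and p <= cutoff)
    return spared, missed


# Made-up training and test outputs, for illustration only.
train_outcomes = [1, 1, 1, 0, 0, 0]
train_outputs = [0.33, 0.70, 0.90, 0.10, 0.28, 0.60]
test_outcomes = [1, 1, 0, 0]
test_outputs = [0.45, 0.80, 0.25, 0.35]

cutoff = largest_safe_cutoff(train_outcomes, train_outputs)
spared, missed = count_spared(test_outcomes, test_outputs, cutoff)
print(cutoff, spared, missed)
```

Because the number of true negatives cannot decrease as the cutoff rises, the largest cutoff without a false negative is also the one that identifies the most true negatives on that file.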

3.5. Logistic regression

Finally, the predictive ability of the neural network was compared to the predictive ability of logistic regression. Multivariable logistic regression was performed on the 663 records from what were the combined training and cross-validation files of the neural network. Because the records of the cross-validation file were not needed to determine when to stop the logistic regression, in contrast to the case of the neural network, the logistic regression was performed on all 663 records. When the coefficients of logistic regression were applied to the records of the test file, a cutoff prediction of 0.20 was the largest at which the sensitivity was 1.00, there were no false negatives and the negative predictive value was 1.00. However, a cutoff prediction of 0.20 identified only two of 19 patients without stenosis in the test file.

4. Discussion

4.1. Summary

The data for this project were taken from a cross-sectional study of coronary angiography patients. Samples of blood as well as demographic data and a limited medical history were taken at the time of angiography. Nevertheless, the study was conducted as if the data had been collected prospectively, i.e. the 100 records of the test file were made analogous to records of patients on whom predictions of coronary stenosis were to be made. Consequently, it was important that the records of the test file be records from the latest patients to undergo angiography rather than records selected randomly from the entire database.


A cutoff prediction of 0.40 by the network, when applied to the records in the test file, identified nine of 19 patients without stenosis, while zero false negatives were included in the group whose network predictions were 0.40 or less. In contrast, logistic regression identified only two of the 19 patients found to be free of coronary stenosis under similar circumstances. Theoretically, the nine patients (women) identified by the neural network to be without stenosis might have been spared the invasive procedure of coronary angiography with its associated expenses and concerns, but no patient with stenosis would have been advised to forgo angiography.

Artificial neural networks have been used to predict acute myocardial infarction in prospective studies [3,4]. However, the small prevalence of acute myocardial infarction in patients presenting with anterior chest pain caused the predictive power of a positive test by the network to be low even though the sensitivity, specificity and predictive power of a negative test were impressive. By contrast, as a result of pre-angiographic evaluation and testing, most of the patients referred for coronary angiography in this study did indeed have coronary artery stenosis; therefore it was the low prevalence of controls, i.e. individuals without coronary artery stenosis, that made the predictive power of a negative test the major concern in this study. Nevertheless, we found that the neural network could indeed recommend sparing the procedure of coronary angiography to some patients discovered to be without stenosis (nine of 19), without recommending that angiography be denied to any patients discovered to have stenosis.

4.2. Conservative approach

The neural network system was appropriately conservative in terms of the standards used in its development and in terms of its implied future application. The degree of stenosis used to train the network was the notation of any measurable stenosis in each angiography, not, for example, the 50% or greater stenosis generally regarded as likely to produce ischemic symptoms. Furthermore, a network cutoff prediction was sought which identified as many false positives of the pre-angiographic evaluation (patients referred for angiography but found to have no stenosis) as possible without risking the denial of angiography to anyone having stenosis (however slight). For example, the sum of network sensitivity and specificity was greatest (1.61) at a cutoff prediction of 0.80. However, Table 3 reveals nineteen false negatives at a cutoff prediction of 0.80, clearly an unacceptable result. Additional conservative suggestions would be that network models be trained and tested in the local environment (physician group, hospital) in which they are to be implemented and that updating the training and testing of the network become routine.

An advantage of the network system (true for any formalized mathematical or statistical system) is that all of the input variables contribute to a decision regarding the recommendation of angiography for each patient and that all of the variables contribute objectively, i.e. to their trained degree. Nevertheless, even if the system were implemented and employed routinely, any decision of the network would only provide additional information to the physician and the patient, based on evidence from the training on all variables. The physician would remain free to recommend, and the patient to make, a decision regarding angiography based on an emphasis of one or more of the variables included in the network model or on variables not included in the model. Sometimes, following an assessment of the chest pain, the risk factors for coronary artery disease and even other tests such as a stress test, there nonetheless remains a concern about whether angiography is necessary. Such situations of uncertainty probably occur more frequently in women than in men. The evidence-based neural network system in this study is proposed for such occasions.
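The contrast drawn in this section, between the cutoff that maximizes the sum of sensitivity and specificity and the cutoff that spares the most patients without any false negative, can be checked directly against the counts in Table 3. The sketch below is only a worked illustration of that arithmetic; the variable names are hypothetical.

```python
# Counts from Table 3 for the 21 cutoffs 0.00, 0.05, ..., 1.00.
CUTOFFS = [i / 20 for i in range(21)]
TRUE_POS = [81] * 9 + [79, 78, 77, 73, 71, 69, 65, 62, 55, 43, 29, 0]
TRUE_NEG = [1, 1, 1, 2, 3, 4, 4, 6, 9, 10, 10, 11, 12, 13, 13, 13, 16, 17, 18, 19, 19]
N_STENOSIS, N_CONTROLS = 81, 19

rows = []
for cutoff, tp, tn in zip(CUTOFFS, TRUE_POS, TRUE_NEG):
    sensitivity = tp / N_STENOSIS
    specificity = tn / N_CONTROLS
    false_neg = N_STENOSIS - tp
    rows.append((cutoff, sensitivity + specificity, false_neg, tn))

# Cutoff with the greatest sum of sensitivity and specificity.
best_by_sum = max(rows, key=lambda r: r[1])

# Conservative choice: most true negatives among cutoffs with no false negatives.
safe_rows = [r for r in rows if r[2] == 0]
best_safe = max(safe_rows, key=lambda r: r[3])

print("max sum:", best_by_sum[0], round(best_by_sum[1], 2), "false negatives:", best_by_sum[2])
print("conservative:", best_safe[0], "true negatives:", best_safe[3])
```

The first criterion treats both error types equally, whereas the conservative criterion accepts a lower specificity in exchange for missing no patient with stenosis.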

4.3. Cutoff prediction

No theoretical work defines how the appropriate cutoff prediction for network processing of a test file or for actual predictive records should be determined. We examined an empirical approach and used the performance of the trained network on the training file to determine an approximation to the optimum cutoff for the test file. The optimum cutoff resulting from the network processing of the training file was only 0.30. A cutoff of 0.30, if applied to the patient records of the test file, might have spared four of 19 patients from angiography. While it is disappointing that a cutoff closer to 0.40 was not obtained from the training file, it is nonetheless important to note that the performance of the network on the training file did not recommend a cutoff greater than 0.40. A cutoff greater than 0.40, if applied to the test file, would have caused the inclusion of false negatives in the group whose network predictions were less than or equal to that larger cutoff, meaning in effect that the network would have recommended denial of angiography to individuals with stenosis.

Additional studies are needed to determine whether network performance on the training file is suitable for defining the cutoff prediction to be used in making predictions of stenosis. Alternatively, a test file could be retained as part of a system implemented to predict stenosis. The optimum cutoff prediction revealed by network performance on the test file could then be used in making predictions of stenosis.

4.4. Catheterization patients

Judged by the suspicion of angina or history of myocardial infarction, 704 (92.3%) of the patients in this study were referred for angiography because of suspected coronary stenosis. Additionally, 39 records (5.1%) had no entry in the angina category and no record of myocardial infarction, although some of those patients may have been referred for angiography as a result of a positive stress test. However, some patients (20, 2.6%) denied angina, had no record of myocardial infarction, and were probably referred for catheterization for hemodynamic assessment and not for suspected coronary artery stenosis. Two of the nine patients correctly identified from the test file by a cutoff of 0.40 as not having stenosis were referred for catheterization in anticipation of valve replacement; coronary artery disease was not suspected.


4.5. Variables

Family history of coronary artery disease is a proven risk factor for coronary stenosis and was an obvious candidate for inclusion as an independent variable in this study [8]. Initially, we provided for the inclusion of a dichotomous variable of family history. However, late in the study we became aware that not all patients were asked about family history, nor were questions and standards on the subject always posed in the same manner. Consequently, this important variable was excluded from the model.

The ratio, cholesterol/HDL, was employed as an independent variable (see Table 1 and Table 4) in order to provide some approximate information about LDL. Measurements of LDL were missing from many records, i.e. those in which the triglycerides were greater than 400 mg/dl [11]. Information about family history, LDL, homocysteine [21], hormone replacement therapy [18], proteinuria [29], Chlamydia and cytomegalovirus [14,27] as well as data from the electrocardiogram, echocardiogram, stress test [17], chest X-ray and characteristics of chest pain are candidates for inclusion as independent variables in future studies utilizing neural networks to predict coronary artery stenosis.

The ranking of variables shown in Table 4, while interesting, does not necessarily reveal simple relationships regarding the importance of the individual variables with respect to the output predictions. As noted, the mathematical relationship describing the neural network model is nonlinear and complicated. At present there is no reason to think that the relative importance of the independent variables in this study would be replicated in similar studies of other angiography patients; additional investigation is needed.

4.6. Overfitting

A neural network is subject to memorization of the training data, known statistically as overfitting. If a network overfits or memorizes the training data, its generalized performance on other sample populations, such as the test file in this project or records for which prospective predictions are to be made, is likely to be severely compromised. Several methods have been proposed by which neural network systems can avoid overfitting the training data. Those methods involve calculations which quantify a trade-off between increasing the size of the training file and limiting the number of elements in the hidden layer of the network [2,6,7]. Preliminary experiments in which those methods were tested showed that they were less effective than the mechanism we finally adopted, namely the introduction of a cross-validation file on which the performance of the network was optimized during training [26,19]. The performance of the network on the training file always continued to improve significantly after the optimum performance on the cross-validation file was achieved (see Section 2). Although a cross-validation file was used to avoid overfitting the training data, the number of elements in the middle layer of the network, calculated as recommended by the producers of the neural network software, nonetheless depended on the number of training records (see Section 2).

Although no theoretical justification exists, it is often asserted that multivariable regression, of which neural network analysis is a type, requires sufficient data to provide at least ten outcome events per independent variable [9]. Table 2 reveals that while the ratio of records of stenosis in the training file to independent variables is 21, the ratio of controls, i.e. patient records without stenosis, to independent variables is only three. The importance of the ‘rule of ten’ in neural network studies, and in particular in systems which employ cross-validation files to avoid the effects of overfitting, has not yet been systematically investigated.

Fifty-three of 137 women in this study (17/50 in the Training File, 24/63 in the Cross-Validation File and 12/24 in the Test File) were without stenosis (39%), while only 58 of 626 men in the study (24/282 in the Training File, 27/268 in the Cross-Validation File and 7/76 in the Test File) were without stenosis (9%). It has been shown that women are referred for angiography less frequently than men [15] and that women hospitalized for coronary heart disease undergo fewer major diagnostic procedures than men [1]. However, another study has demonstrated that women were more likely than men to have normal coronary arteries at angiography [16]. Depending on the cutoff prediction chosen, four to nine of the 12 women in the test file who did not have stenosis were identified by the neural network, while none of the seven men who did not have stenosis were identified. Apparently, too few records of men without stenosis were present in the training file to train the network to identify men without stenosis in the test file.
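
The counts quoted in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the per-file counts are copied from the text, the figure of 14 independent variables is taken from the abstract, and the variable and function names are our own.

```python
# Number of independent (input) variables, as stated in the abstract.
N_INPUTS = 14

# (records without stenosis, total records) per file, as quoted in the text.
women = {"training": (17, 50), "cross-validation": (24, 63), "test": (12, 24)}
men = {"training": (24, 282), "cross-validation": (27, 268), "test": (7, 76)}


def totals(group):
    """Sum the without-stenosis and total counts over the three files."""
    without = sum(w for w, _ in group.values())
    total = sum(t for _, t in group.values())
    return without, total


w_without, w_total = totals(women)   # 53 of 137 women
m_without, m_total = totals(men)     # 58 of 626 men

pct_women_without = 100 * w_without / w_total   # about 39%
pct_men_without = 100 * m_without / m_total     # about 9%

# Events per independent variable in the training file ("rule of ten").
train_total = women["training"][1] + men["training"][1]      # 332 records
train_controls = women["training"][0] + men["training"][0]   # 41 without stenosis
train_stenosis = train_total - train_controls                # 291 with stenosis

stenosis_per_variable = train_stenosis / N_INPUTS   # about 21
controls_per_variable = train_controls / N_INPUTS   # about 3
```

The training file therefore comfortably exceeds ten records per variable for patients with stenosis, but falls well short of it for controls, and only 24 of the 41 training controls are men.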
An extensive prospective trial devoted to predicting coronary stenosis in women would be important and might provide the foundation for a clinically useful system.
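
The use of a cross-validation file to limit overfitting, as described in this section, can be sketched as follows. This is a minimal illustration and not the commercial software used in the study: the single hidden layer of four sigmoid elements, the learning rate, the mean-squared-error loss, and all function names are assumptions made for the example. The network is trained by backpropagation on the training file, and the weights that give the lowest error on the cross-validation file are retained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Return hidden activations and network output (values in (0, 1))."""
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)
    return H, sigmoid(H @ W2 + b2)


def mse(params, X, y):
    """Mean squared error between network output and outcome (0 or 1)."""
    return float(np.mean((forward(params, X)[1] - y) ** 2))


def train_early_stopping(X_tr, y_tr, X_cv, y_cv,
                         n_hidden=4, lr=0.5, epochs=500, seed=0):
    """Full-batch backpropagation; keep the weights that are best on the
    cross-validation file. n_hidden, lr and epochs are illustrative only."""
    rng = np.random.default_rng(seed)
    n_in = X_tr.shape[1]
    params = [rng.normal(0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden),
              rng.normal(0, 0.5, (n_hidden, 1)), np.zeros(1)]
    best_cv, best_params, best_epoch = np.inf, None, -1
    history = []  # (training error, cross-validation error) per epoch
    for epoch in range(epochs):
        W1, b1, W2, b2 = params
        H, out = forward(params, X_tr)
        # Gradient of the MSE through the sigmoid output element.
        d_out = (out - y_tr) * out * (1 - out) * (2.0 / len(X_tr))
        # Error propagated back to the hidden layer.
        d_hid = (d_out @ W2.T) * H * (1 - H)
        params = [W1 - lr * X_tr.T @ d_hid, b1 - lr * d_hid.sum(axis=0),
                  W2 - lr * H.T @ d_out, b2 - lr * d_out.sum(axis=0)]
        tr_err = mse(params, X_tr, y_tr)
        cv_err = mse(params, X_cv, y_cv)
        history.append((tr_err, cv_err))
        if cv_err < best_cv:  # new optimum on the cross-validation file
            best_cv, best_epoch = cv_err, epoch
            best_params = [p.copy() for p in params]
    return best_params, best_epoch, history
```

Here `y_tr` and `y_cv` are column vectors of outcomes coded 0 or 1. The returned `history` makes it possible to compare the training and cross-validation errors epoch by epoch, and `best_epoch` marks the point at which the retained weights were recorded.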

5. Conclusions

This study produced a prototype of a neural network system that might safely spare some patients from coronary angiography and yet not endanger anyone with coronary artery stenosis by recommending that they not undergo angiography. The results demonstrated that, given a multiplicity of input data or risk factors for coronary artery disease, artificial neural networks could nonetheless become valuable tools in identifying patients who do not need coronary angiography.

Acknowledgements

Collection of the data was supported by the Oklahoma Center for the Advancement of Science and Technology, Grant HR2-025, and the American Heart Association/Oklahoma Affiliate, Grant 93078550, to Dr Eichner.


References

[1] Ayanian JZ, Epstein AM. Differences in the use of procedures between women and men hospitalized for coronary heart disease. N Engl J Med 1991;325:221–5.
[2] Baum EB, Haussler D. What size net gives valid generalization. Neural Comp 1989;1:151–60.
[3] Baxt WG. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann Int Med 1991;115:843–8.
[4] Baxt WG, Skora J. Prospective validation of artificial neural network trained to identify acute myocardial infarction. Lancet 1996;347:12–5.
[5] Bobbio M, Fubini A, Detrano R, Shandling AH, Ellestad MH, Brezden O, et al. Diagnostic accuracy of predicting coronary artery disease related to patients’ characteristics. J Clin Epidemiol 1994;47:389–95.
[6] Carpenter WC, Barthelemy JF. Common misconceptions about neural networks as approximators. J Comp Civ Engr 1994;8:345–58.
[7] Carpenter WC, Hoffman ME. Training backprop neural networks. AI Expert 1995;30–33.
[8] Ciruzzi M, Schargrodsky H, Rozlosnik J, Pramparo P, Delmonte H, Rudich V, et al. for the Argentine FRICAS Investigators. Frequency of family history of acute myocardial infarction in patients with acute myocardial infarction. Am J Cardiol 1997;80:122–127.
[9] Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Int Med 1993;118:201–10.
[10] Do D, West JA, Morise A, Atwood JE, Froelicher V. An agreement approach to predict severe angiographic coronary artery disease with clinical and exercise test data. Am Heart J 1997;134:672–9.
[11] Eichner JE, Moore WE, McKee PA, Schechter E, Reynolds DW, Qi H, et al. Fibrinogen levels in women having coronary angiography. Am J Cardiol 1996;78:15–8.
[12] Eichner JE, Moore WE, Schechter E, Reynolds DW, Morrissey JH, Comp PC. Activated factor VII levels in patients with angiographically confirmed coronary artery disease. Am J Cardiol 1997;80:217–9.
[13] Graboys TB, Biegelsen B, Lampert S, Blatt CM, Lown B. Results of a second-opinion trial among patients recommended for coronary angiography. J Am Med Assoc 1992;268:2537–40.
[14] Gura T. Infections: a cause of artery-clogging plaques. Science 1998;281:35–7.
[15] Heller LI. Diagnostic evaluation of women with suspected coronary artery disease. Cardiology 1995;86:318–23.
[16] Jong P, Mohammed S, Sternberg L. Sex differences in the features of coronary artery disease of patients undergoing coronary angiography. Can J Cardiol 1996;12:671–7.
[17] Khaw KT, Barrett-Connor E. Sex differences, hormones, and coronary heart disease. In: Coronary Heart Disease Epidemiology: From Aetiology to Public Health. England: Oxford University Press, 1992:274–86.
[18] Kukar M, Kononenko I, Groselj C, Kralj K, Fettich J. Analysing and improving the diagnosis of ischaemic heart disease with machine learning. Art Intell Med 1999;16(1):25–50.
[19] Leahy K. The overfitting problem in perspective. AI Expert 1994;35–36.
[20] Maddahi J, Gambhir SS. Cost-effective selection of patients for coronary angiography. J Nucl Cardiol 1997;4:S141–51.
[21] McCully KS. The Homocysteine Revolution. New Canaan, CT: Keats Publishing, 1997.
[22] Ouzan J, Chapoutot L, Carre E, Liehn JC, Elaerts J. A multivariate analysis of the diagnostic values of clinical examination, exercise testing and exercise radionuclide angiography in coronary artery disease. Cardiology 1993;83:197–204.
[23] Pashkow FJ. Diagnostic evaluation of the patient with coronary artery disease. Cleve Clin J Med 1994;61:43–8.
[24] Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, editors. Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 1. Cambridge, MA: MIT Press, 1986:318–62.
[25] Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533–6.
[26] Rumelhart DE, Widrow B, Lehr MA. The basic ideas in neural networks. Comm ACM 1994;37:87–92.
[27] Sadegh-Zadeh K. Fundamentals of clinical methodology: 2. Etiology. Art Intell Med 1998;12(3):227–70.
[28] Yamada H, Do D, Morise A, Atwood JE, Froelicher V. Review of studies using multivariable analysis of clinical and exercise test data to predict angiographic coronary artery disease. Prog Cardiovasc Dis 1997;39:457–81.
[29] Yudkin JS, Forrest RD, Jackson CA. Microalbuminuria as predictor of vascular disease in non-diabetic subjects. Lancet 1988;3:530–3.
