Annals of Oncology 4 (Suppl. 4): S31-S34, 1993. © 1993 Kluwer Academic Publishers. Printed in the Netherlands.
Symposium article

Neural network analysis to predict treatment outcome

H. J. Kappen¹ & J. P. Neijt²

¹ University of Nijmegen, Laboratorium for Medical and Biophysics and Dutch Foundation for Neural Networks, Nijmegen; ² Department of Internal Medicine, Section of Oncology, Utrecht University Hospital, Utrecht, The Netherlands
Summary

Background: Quantitative methods for the analysis of prognostic information are important in order to use this knowledge optimally. The neural network is a new quantitative method where the fundamental building blocks are units which can be likened to neurons, and weighted connections which can be likened to synapses. The more the hidden units, the more complex the patterns that can be learnt.

Materials and methods: Data from two Dutch studies in ovarian cancer were used to compare the previously reported survival rates predicted by the Cox's prognostic index with the prediction obtained by a neural network.

Results: Both the Cox's analysis and the neural network agreed on residual tumour size, stage, and performance status as being important for survival. The neural network identified additional predictive factors such as place of diagnosis and age. As the Cox's prognostic index has not been tested to predict survival on an independent data set a comparison with the results obtained in the neural network test set could not be performed.

Conclusions: Neural networks perform at least as well as Cox's method for the prediction of survival, and prognostic factors can easily be identified. The analysis not only revealed the predictive power of some characteristics, but also the non-predictive power of the others.

Key words: ovarian cancer, prognostic factors, neural network
Introduction

At present one of the leading issues in the treatment of cancer is the identification of subgroups of patients with a good or poor prognosis. A better definition of these subgroups leads to the creation of less toxic treatment schedules for good risk patients and new treatment strategies for the bad risk groups. The assessment of prognosis is valuable not only in the planning of treatment for individual patients but also for the stratification of patients in clinical trials. Quantitative methods for analysis of prognostic information are important for the optimal use of this knowledge. However, little research has been done on quantification of the prognosis of individual patients and on using the quantified prognosis for selection of treatment. We employed a new quantitative method to define the prognosis of cancer patients: an artificial neural network. Neural networks are powerful algorithms that can be used to learn relations in a database, and which can predict output parameters on novel data. Recently, several researchers have successfully used neural networks to analyze data in cancer research [1, 2].

Neural networks
Neural computing is a relatively recent development in the information sciences, an outgrowth of artificial intelligence research in the 1950s and 1960s. Neural networks are so named because they exhibit certain analogies, at least superficially, to the way in which arrays of neurons probably function in biological learning and memory. They differ from the usual computer programs in that they 'learn' from a set of examples rather than being programmed to get the right answer [3-5]. The information is encoded in the strength of the network's 'synaptic' connections. At present neural networks are rapidly making their way into various areas of the biomedical sciences. They are being applied to pattern recognition or decision making in diverse fields including nucleic acid sequence analysis, protein sequence analysis, quantitative cytology, diagnostic imaging, breast cancer research and cancer drug development [1, 2, 6-15]. Only in the last few years have the algorithms advanced to the point at which they would be useful for the prediction of prognosis in cancer treatment.

To predict survival for patients with advanced ovarian cancer, we developed neural networks such as that shown in Fig. 1. The fundamental building blocks are units which can be likened to neurons, and weighted connections which can be likened to synapses. The networks used have a variable number of input units (depending on the number of patient and treatment characteristics) and one output unit. The output unit
Downloaded from http://annonc.oxfordjournals.org/ at New York University on June 30, 2015
may either encode a continuous value, such as patient survival in months, or a discrete value, such as survival for a period longer or shorter than a number of years. Between the inputs and outputs of the network, we placed one hidden layer of neurons whose number of units could be varied. As shown in Fig. 1, each hidden unit is connected to all inputs and to all outputs. The more hidden units, the more complex the patterns that can be learned. In addition, there are direct connections between input and output units. Using a set of patients whose survival duration is known, we can train the network to predict survival. This training procedure is done using the 'backpropagation algorithm' for the network with continuous output, or Boltzmann Machine learning for the network with a discrete output [16-18]. Neural networks used in this way can be shown to be able to approximate any problem arbitrarily well, given enough hidden units [19]. Part of the data is not presented during training. After training, the performance of the network is verified on this independent test set.

Fig. 1. Artificial neural network architecture. The links (synapses) between the units (cell bodies) have different weighting factors (a value that determines the strength of the connection). For instance, $W_{ij}$ indicates the weight between cells i and j.

Neural networks for prediction of survival

We have tested the performance of several neural networks to predict survival of patients with ovarian cancer after 1, 2, ..., 6 years. We used a set of data from The Netherlands Joint Study Group for Ovarian Cancer to train the network. This data set contained data from two studies initiated in 1979 and 1981, respectively. The first study compared a combination of hexamethylmelamine, cyclophosphamide, methotrexate, and 5-fluorouracil (Hexa-CAF) with cyclophosphamide and hexamethylmelamine alternating with doxorubicin and a 5-day course of cisplatin (CHAP-5) in 186 patients with advanced epithelial ovarian carcinoma [20]. In the second study, initiated in 1981, 191 eligible patients were enrolled and treated with either CHAP-5 or cyclophosphamide and cisplatin (CP), both administered intravenously on a single day at 3-week intervals [21]. Protocol entry criteria, pretreatment staging, histology grading, the randomization procedure, assessment and definitions of tumour response, evaluation and statistical methods were all the same in both studies. To be able to make an exact comparison with the Cox's analysis reported by Van Houwelingen, we considered only patients who had received cisplatin based treatment (269 patients) [22].

One can use Cox's Prognostic Index as calculated by Van Houwelingen to predict whether patients will survive a number of years. However, one must keep in mind that this is not a true prediction, since the 'prediction' is done on the same patient data that were used to construct the prognostic index. The result is given in Fig. 2a in the top line, where the percentage of correctly predicted patients is given after each year. For instance, Cox's Prognostic Index predicts correctly in almost 70% of the cases whether or not patients will survive 2 years.

The bottom line in Fig. 2a is the score that is obtained when one predicts on the assumption that there is no relation between patient characteristics and survival (No Prognostic Index). One simply assumes that everybody survives after 1 and 2 years, and that nobody survives 3 years. Fig. 2a shows that Cox's Prognostic Index contains significant information for 2-4 year survival prediction, but no information for 1 or 6 year survival.

In Fig. 2b, the graphs of Fig. 2a are redrawn (dashed). In addition, the prediction of a Boltzmann Machine with 0 hidden units is given. The top graphs present the 'prediction' performance on the training set, i.e. those patients on which the network was trained (90% of the data), which is not a true prediction. The bottom graph gives the real prediction performance on the test set, i.e. those patients which the network had not seen during training (10% of the data). Since no data are available on the performance of Cox's Prognostic Index on independent test data, we cannot compare the predictive power of the Cox's method with the Boltzmann Machine.

In Fig. 2c, the training set and test set results of a Boltzmann Machine with 5 hidden units are shown (comparison with Cox's Prognostic Index and independent assumption as reference graphs). As we expect, the training performance increases. However, this is not accompanied by a better prediction on the independent test set.

To improve the generalization performance, larger data sets must be used for training the network. In subsequent studies, data from the EORTC will be made available.

Neural networks to identify new prognostic factors

In patients with ovarian cancer conventional statistical analyses revealed sets of independent prognostic factors such as the performance status, the grade of the tumour, the size of the residual tumour prior to the initiation of chemotherapy, the FIGO stage and the presence or absence of ascites [23, 24]. From the data of The Netherlands Joint Study Group several pretreatment characteristics with prognostic significance were identified in the patients treated with a platinum com-
Table 1. Neural network prognostic index obtained from 17 pretreatment characteristics, based on 2 year survival.

Pretreatment characteristics    Rescaled average value and variance over the data
Residual tumour size            -0.24 ± 0.03
FIGO stage                      -0.18 ± 0.01
Karnofsky index                  0.14 ± 0.01
Place of diagnosis              -0.14 ± 0.02
Age                             -0.13 ± 0.02
Haemoglobin                      0.08 ± 0.01
Leucocytes                      -0.08 ± 0.01
Thrombocytes                    -0.08 ± 0.02
Broders' grade                  -0.07 ± 0.02
Hospital experience*            -0.06 ± 0.02
Weight                           0.05 ± 0.01
Length                           0.05 ± 0.02
Cell type 1                      0.03 ± 0.02
Cell type 2                      0.00 ± 0.02
Cell type 3                     -0.03 ± 0.01
Cell type 4                      0.00 ± 0.03
Serum creatinine**               0.00 ± 0.02

To assess the relative importance of the different characteristics, each characteristic was rescaled, so that its average value and variance over the data set would be 0 and 1, respectively. The prognostic values of the different characteristics, $w_1, \dots, w_{17}$, are given in descending order. The error bars indicate variations obtained from different training and test sets. For a patient with rescaled characteristics $(x_1, \dots, x_{17})$, her probability of 2 year survival is given by $p = \tfrac{1}{2}\left(1 + \tanh(w_1 x_1 + w_2 x_2 + \dots + w_{17} x_{17} + 0.1)\right)$. Tumour size is in cm (tumours smaller than average are encoded as negative). Place of diagnosis is a binary value: 0 means the patient was diagnosed and treated in the same hospital, 1 means she was diagnosed and treated in different hospitals.
* Hospital experience = number of patients from the database that are treated in that hospital.
** Serum creatinine = serum creatinine prior to treatment.
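The prognostic index in the footnote of Table 1 can be evaluated directly. The following is a minimal sketch in Python, not the software used in the study: the weights and the constant 0.1 are copied from Table 1, while the function names, the dictionary keys, the use of the population standard deviation for rescaling, and the convention that an unspecified characteristic equals the data-set average (rescaled value 0) are assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev

# Weights w1..w17 as printed in Table 1 (error bars omitted), in table order.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "residual_tumour_size": -0.24,
    "figo_stage": -0.18,
    "karnofsky_index": 0.14,
    "place_of_diagnosis": -0.14,
    "age": -0.13,
    "haemoglobin": 0.08,
    "leucocytes": -0.08,
    "thrombocytes": -0.08,
    "broders_grade": -0.07,
    "hospital_experience": -0.06,
    "weight": 0.05,
    "length": 0.05,
    "cell_type_1": 0.03,
    "cell_type_2": 0.00,
    "cell_type_3": -0.03,
    "cell_type_4": 0.00,
    "serum_creatinine": 0.00,
}
BIAS = 0.1  # constant term in the Table 1 formula


def rescale(values):
    """Rescale raw values to mean 0 and variance 1 over the data set.

    Assumption: the population standard deviation is used.
    """
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def two_year_survival_probability(x):
    """Evaluate p = 1/2 * (1 + tanh(sum_i w_i * x_i + 0.1)).

    `x` maps characteristic names to rescaled values. Assumption: a
    missing characteristic is treated as average, i.e. rescaled value 0.
    """
    z = BIAS + sum(w * x.get(name, 0.0) for name, w in WEIGHTS.items())
    return 0.5 * (1.0 + math.tanh(z))


# A patient who is average on every characteristic.
print(round(two_year_survival_probability({}), 3))
# A residual tumour one standard deviation larger than average lowers p.
print(round(two_year_survival_probability({"residual_tumour_size": 1.0}), 3))
```

Because $\tfrac{1}{2}(1 + \tanh z) = 1/(1 + e^{-2z})$, the index is a logistic function of a weighted sum, which is consistent with its derivation from a network without hidden units.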
bination. In a Cox multivariate analysis, the Karnofsky index, the site of metastases (FIGO stage), the size of residual tumour, the histological (Broders') grade and the presence of ascites all predicted survival independently of each other [22].

An advantage of neural networks is that the influence of a large number of potentially relevant patient and treatment characteristics can be assessed in a single network. Table 1 contains a list of pretreatment characteristics that were inputs to the neural network. Each characteristic was rescaled, so that its average value and variance over the data set would be 0 and 1, respectively. As can be seen from the preliminary results in Table 1, the neural network agreed with Cox's analysis on the importance of tumour size, FIGO stage and Karnofsky index for survival prediction (in that order). The neural network ascribed less importance to Broders' grading than does Cox's analysis. Two additional predictive factors were identified as compared to those revealed by Cox's analysis: place of diagnosis (indicating that patients diagnosed and treated in different hospitals have different prognoses compared to patients diagnosed and treated in the same hospital) and age (younger patients do better). Ascites was not considered in the neural network analysis. Its importance will be assessed in a future study.

Fig. 2. Percentage of correctly predicted survival of patients for different years. a) Top line: Cox's Prognostic Index. Bottom line: No Prognostic Index. b) Boltzmann Machine with 0 hidden units (comparison with Cox's Prognostic Index and No Prognostic Index, dashed). Top solid line: results on the training set (90% of the data). Bottom solid line: results on the test set (10% of the data). c) Boltzmann Machine with 5 hidden units (comparison with Cox's Prognostic Index and No Prognostic Index, dashed). Top solid line: results on the training set (90% of the data). Bottom solid line: results on the test set (10% of the data).

Conclusions

From this experiment we conclude that neural net-
References

1. Ravdin PM, Clark GM, Hilsenbeck SG et al. A demonstration that breast cancer recurrence can be predicted by neural network analysis. Breast Cancer Research and Treatment 1992; 21: 47-53.
2. Weinstein JN, Kohn KW, Grever MR et al. Neural computing in cancer drug development: Predicting mechanism of action. Science 1992; 258: 447-51.
3. Heskes TM, Kappen HJ. Learning processes in neural networks. Phys Rev A 1991; 44: 2718-26.
4. Heskes T, Slijpen E, Kappen B. Learning in neural networks with local minima. Phys Rev A 1992; 46: 5221-31.
5. Heskes T, Kappen HJ. On-line learning processes in artificial neural networks. In Taylor J (ed): Mathematical foundations of neural networks. Amsterdam: Elsevier 1993 (in press).
6. Brunak S, Engelbrecht J, Knudsen S. Neural network detects errors in the assignment of mRNA splice sites. Nucleic Acids Res 1990; 18: 4797-801.
7. O'Neill MC. Training back-propagation neural networks to define and detect DNA-binding sites. Nucleic Acids Res 1991; 19: 313-8.
8. Holley LH, Karplus M. Protein secondary structure prediction with a neural network. Proc Natl Acad Sci USA 1989; 86 (1): 152-6.
9. Kneller DG, Cohen FE, Langridge R. Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 1990; 214 (1): 171-82.
10. McGregor MJ, Flores TP, Sternberg MJE. Prediction of beta-turns in proteins using neural networks. Protein Eng 1989; 2 (7): 521-6.
11. Wied GL, Dytch H, Bibbo M et al. Artificial intelligence-guided analysis of cytologic data. Anal Quant Cytol Histol 1990; 12 (6): 417-28.
12. Wolberg WH, Mangasarian OL. Computer-aided diagnosis of breast aspirates via expert systems. Anal Quant Cytol Histol 1990; 12: 314-20.
13. Dytch E, Wied GL. Artificial neural networks and their use in quantitative pathology. Anal Quant Cytol Histol 1990; 12: 379-93.
14. Boone JM, Sigillito VG, Shaber GS. Neural networks in radiology: An introduction and evaluation in a signal detection task. Med Phys 1990; 17: 234-41.
15. DaPonte JS, Sherman P. Classification of ultrasonic image texture by statistical discriminant analysis of neural networks. Comput Med Imag Graph 1991; 15: 3-9.
16. Rumelhart D, Hinton G, Williams R. Learning representations by back-propagating errors. Nature 1986; 323: 533-6.
17. Ackley D, Hinton G, Sejnowski T. A learning algorithm for Boltzmann Machines. Cognitive Science 1985; 9: 147-69.
18. Kappen HJ. Using Boltzmann Machines for probability estimation. In Gielen and Kappen (eds): Proceedings ICANN'93. London: Springer Verlag 1993 (in press).
19. Nilsson N. Learning Machines. New York: McGraw-Hill 1965.
20. Neijt JP, ten Bokkel Huinink WW, van der Burg MEL et al. Randomised trial comparing two combination chemotherapy regimens (Hexa-CAF vs CHAP-5) in advanced ovarian carcinoma. The Lancet 1984; 2: 594-600.
21. Neijt JP, ten Bokkel Huinink WW, van der Burg MEL et al. Randomized trial comparing two combination chemotherapy regimens (CHAP-5 vs CP) in advanced ovarian carcinoma. J Clin Oncol 1987; 5: 1157-68.
22. van Houwelingen JC, ten Bokkel Huinink WW, van der Burg MEL et al. Predictability of the survival of patients with advanced ovarian cancer. J Clin Oncol 1989; 7: 769-73.
23. Redman JR, Petroni GR, Saigo PE et al. Prognostic factors in advanced ovarian carcinoma. J Clin Oncol 1986; 4: 515-23.
24. Swenerton KD, Hislop TG, Spinelli J et al. Ovarian carcinoma: A multivariate analysis of prognostic factors. Obstet Gynecol 1985; 65: 264-70.
Correspondence to:
H. J. Kappen, Ph.D.
Dutch Foundation for Neural Networks
Geert Grooteplein 21
6525 EZ Nijmegen
The Netherlands
works perform at least as well as Cox's method for the prediction of patient survival. In order to assess this fully, the prediction performance of Cox's method on an independent test set should be measured. In addition, it was shown that prognostic factors can be easily identified by using neural networks. The advantages of neural networks over Cox's method are:
1. By using the paradigm of training set and test set, large numbers of potentially relevant patient and treatment characteristics can be included and their predictive importance can be assessed.
2. The neural network can easily be extended to include non-linear relations, although in the present study this does not lead to improved generalization. This advantage should be fully exploited in larger databases.

We conclude that neural networks can be trained to predict survival, progression-free survival, and response of patients with ovarian cancer. The inputs to the model may consist of treatment related factors and all patient characteristics registered. The network is allowed to search for possible complex time-dependent interactions of the prognostic factors. When trained on a large database the network has the potential to identify high risk groups, quantify the prognostic information, establish the relative importance of the drugs used in combination schedules, and establish the relative importance of dose-intensity.

The prognostic index presented in this paper was derived from a neural network without hidden units. For larger databases, it is expected that the optimal network will have hidden units. The translation of this network structure to a prognostic index would be more complex, and would need further research.

After learning, the synaptic connections of the neural network will have a tolerance around an average value, which can be calculated [3]. These tolerances can be used to assess the statistical significance of the connections, and to 'prune' the network, i.e. to remove those synaptic connections that hardly affect the predictions. After sufficient pruning, the network can be interpreted as a set of fuzzy rules whose statistical significance is guaranteed by the method. If the method proves to be successful in this field of cancer research it can be used to analyze databases of other tumour types as well.
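The pruning step described in the conclusions might be sketched as follows. This is an illustration under stated assumptions, not the procedure of [3]: the weights and error bars are those printed in Table 1, where the error bars reflect variation over different training and test sets; they are used here as a stand-in for the tolerance of each connection. The rule "keep a connection only if |w| exceeds k times its error bar" and the threshold k = 2 are hypothetical choices made for this sketch.

```python
# (characteristic, weight, error bar) as printed in Table 1.
TABLE_1 = [
    ("Residual tumour size", -0.24, 0.03),
    ("FIGO stage", -0.18, 0.01),
    ("Karnofsky index", 0.14, 0.01),
    ("Place of diagnosis", -0.14, 0.02),
    ("Age", -0.13, 0.02),
    ("Haemoglobin", 0.08, 0.01),
    ("Leucocytes", -0.08, 0.01),
    ("Thrombocytes", -0.08, 0.02),
    ("Broders' grade", -0.07, 0.02),
    ("Hospital experience", -0.06, 0.02),
    ("Weight", 0.05, 0.01),
    ("Length", 0.05, 0.02),
    ("Cell type 1", 0.03, 0.02),
    ("Cell type 2", 0.00, 0.02),
    ("Cell type 3", -0.03, 0.01),
    ("Cell type 4", 0.00, 0.03),
    ("Serum creatinine", 0.00, 0.02),
]


def prune(connections, k=2.0):
    """Split connections into (kept, pruned) names.

    Hypothetical rule: a connection is kept only if |w| > k * error bar.
    """
    kept, pruned = [], []
    for name, w, err in connections:
        (kept if abs(w) > k * err else pruned).append(name)
    return kept, pruned


kept, pruned = prune(TABLE_1)
print("kept:", len(kept))
print("pruned:", pruned)
```

With k = 2 this rule removes four inputs (cell types 1, 2 and 4, and serum creatinine) and keeps the remaining 13. A stricter or looser threshold would change that split, so the result should be read as an example of the mechanism rather than as a finding.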