THE USE OF NEURAL NETWORKS AND LOGISTIC REGRESSION ANALYSIS FOR PREDICTING PATHOLOGICAL STAGE IN MEN UNDERGOING RADICAL PROSTATECTOMY: A POPULATION BASED STUDY


0022-5347/01/1665-1672/0 THE JOURNAL OF UROLOGY® Copyright © 2001 by AMERICAN UROLOGICAL ASSOCIATION, INC.®

Vol. 166, 1672–1678, November 2001 Printed in U.S.A.

A. BORQUE,* G. SANZ, C. ALLEPUZ,* L. PLAZA, P. GIL* AND L. A. RIOJA†

From the Urology Department, Miguel Servet University Hospital and Department of Statistical Methods, University of Zaragoza, Zaragoza, Spain

ABSTRACT

Purpose: Clinical understaging occurs in 40% to 60% of patients who undergo radical prostatectomy for prostate cancer. To decrease understaging several methods of predicting pathological stage preoperatively have been developed based on logistic regression analysis and neural networks. To our knowledge none has been validated in our homogeneous regional patient population to date. We created logistic regression and neural network models, and implemented and adapted them into our practice. We also compared the 2 methods to determine their value and practicality in daily clinical practice. We present the results of our novel approach for predicting pathological staging of prostate adenocarcinoma.

Materials and Methods: Between 1986 and 1999, 600 white men from the Aragon region of Spain underwent surgery for prostate cancer, of whom 468 were selected for study. Predictive study variables included patient age, clinical stage, biopsy Gleason score and preoperative prostate specific antigen (PSA). The predicted result included in analysis was organ confined or nonorgan confined disease. Data were analyzed by multivariate logistic regression and supervised neural networks (multilayer perceptron and radial basis function). Models were compared by the areas under the receiver operating characteristics curves.

Results: We generated 5 logistic regression models. The model created with clinical staging, Gleason biopsy score and PSA distributed in 5 categories (p <0.001) with an area under the receiver operating characteristics curve of 0.840 proved to be most predictive of pathological stage. Similarly of the 6 neural network models evaluated the radial basis function model, which included age, clinical stage, Gleason biopsy score and preoperative PSA distributed in 5 categories with an area under the curve of 0.882, proved the most predictive of the neural networks but was not superior to the logistic regression model. The difference in the area under the curves in the 2 chosen models was 0.042 (p = 0.1).

Conclusions: It is possible to generate useful predictive models of organ confined disease using logistic regression or neural networks with high indexes of clinical and statistical validity. However, using these variables neural networks did not prove to be better than logistic regression analysis. Therefore, better predictive variables must be identified, preferably with nonlinear characteristics with respect to the probability of organ confined tumor, to generate better predictive models using neural networks.

KEY WORDS: prostate, prostatic neoplasms, neural networks (computer), neoplasm staging, logistic models

Prostate cancer is the second most common cause of male cancer death in the Western hemisphere after bronchopulmonary cancer and, therefore, it is a matter of social and medical concern. The mortality rate due to this illness in our area is 28.4/100,000 male inhabitants yearly.1 However, environmental factors also have a role with specific characteristics according to the population and area under consideration,2 as evidenced by the fact that migrant populations eventually acquire the patterns of prostate cancer specific to the area of settlement.3 This finding justifies differential study in various areas and populations with restricted application of the results of each study to other areas even within the same country without at least previous validation.4 However, as a matter separate from possible geographical differences, there exists a uniform pattern with respect to clinical understaging, which occurs in 40% to 60% of cases in large series. Thus, between 4 and 6 of every 10 males undergoing radical prostatectomy for clinically suspected, organ confined tumor are not cured by this surgery since histopathological study of the lesion reveals nonorgan confined disease. In these cases other palliative therapeutic alternatives are more indicated, such as radiotherapy, hormone therapy and so forth. To decrease this level of clinical understaging a number of predictive models have been designed. Particular attention must be paid to the Partin nomogram5 obtained by the log linear regression method and to recent developments in the field of artificial intelligence in the form of neural networks designed with the same objectives.6, 7 However, because to our knowledge none has been

Accepted for publication June 1, 2001.
Supported by the Miguel Servet Research Foundation and Urology Research Foundation of the Spanish Association of Urology.
* Financial interest and/or other relationship with Miguel Servet Research Foundation and Urology Research Foundation, Spanish Association of Urology.
† Financial interest and/or other relationship with Urology Research Foundation, Spanish Association of Urology.

USE OF NEURAL NETWORKS FOR PREDICTING PATHOLOGICAL STAGE IN PROSTATECTOMY


validated in our area, we believed that it was appropriate to develop models for the population in this area. They were constructed using logistic regression and neural networks, which we compared to determine which was most applicable to daily clinical practice. Our working hypothesis was that it is possible to predict organ confined prostate adenocarcinoma from preoperative clinical parameters. We set 3 objectives to test this hypothesis. We constructed a predictive model of organ confined tumor using logistic regression with the preoperative parameters of patient age, clinical stage, Gleason biopsy score and preoperative prostate specific antigen (PSA). We also designed a neural network as a predictive model for organ confined disease using the same preoperative parameters. We then compared the 2 predictive models to determine which is more applicable to daily clinical practice.

MATERIAL AND METHODS

Study population. Of the 600 patients who underwent surgery for adenocarcinoma of the prostate between June 1986 and December 1999 at our institution radical prostatectomy was done in 502, seminal vesicle biopsies in 242 (followed by radical prostatectomy in 145 since they were negative for cancer), and laparoscopic lymphadenectomy in 86 (followed by radical prostatectomy in 75 since it was negative for cancer). Figures 1 and 2 show the indications for these procedures, which are part of the clinical staging protocol at our center.8, 9 The total number of operations was more than 600 because the same patient may have undergone seminal vesicle biopsy and laparoscopic lymphadenectomy before proceeding to radical prostatectomy when 1 or each of the initial procedures was negative. However, each patient with his respective preoperative variables was considered a single case for the prediction of organ confined tumor whether 1 to 3 operations were performed. Clinical assessment. Inclusion Criteria: We obtained certain data on all males in the study, including age at diagnosis, clinical stage defined by rectal examination in addition to transrectal ultrasound when details were available and the results of biopsy or previous prostatic surgery according to the 1997 TNM classification.10 Serum PSA was measured by the Tandem-E method (Hybritech Beckman-Coulter Corp., San Diego, California) to June 1998 and the Elecsys 2010

FIG. 1. Indications for seminal vesicle biopsy and laparoscopic lymphadenectomy up to 1997.

FIG. 2. Indications for seminal vesicle biopsy and laparoscopic lymphadenectomy since 1997.

method (Elecsys 2010, Roche Diagnostics, Mannheim, Germany) after July 1998 (correlation coefficient between the 2 methods at our laboratory r = 0.98), while avoiding the possible influence of drugs, inflammatory processes, prostate or rectal examination. Prostate adenocarcinoma was histologically confirmed in the biopsy cores, transurethral resection fragments or prostatectomy specimen and graded using the Gleason method.11 Pathological classification of the radical prostatectomy specimen was done according to the 1997 TNM method,10 so that each case was considered organ confined, stage pT2a or pT2b, or nonorgan confined, stage pT3a, pT3b or pN1 disease. Patients who did not undergo radical prostatectomy due to tumor positive biopsy of the seminal vesicles or laparoscopic lymphadenectomy were always considered to have nonorgan confined, stage pT3b or pN1 disease.

Exclusion Criteria: Patients were excluded from analysis due to previous radiotherapy treatment, which may distort pathological interpretation of the surgical specimen, and neoadjuvant hormone therapy, which may affect the pathological interpretation. However, 33 patients were entered into the study although they had received neoadjuvant hormone therapy. In these cases the preoperative parameters of clinical stage, PSA and Gleason score were known before the administration of neoadjuvant therapy. Pathological study of the specimen confirmed nonorgan confined disease, indicating that neoadjuvant therapy had not lowered the pathological classification to organ confined cancer. After applying these study inclusion and exclusion criteria our initial sample of 600 patients was decreased to 468, including 391, 69 and 8 who underwent radical prostatectomy, seminal vesicle biopsy only and laparoscopic lymphadenectomy only due to tumor positive biopsy, respectively.

Statistical assessment. Predictive variables were patient age at diagnosis in years, considered a continuous variable, and clinical stage in the 5 categories T1a to T1b, T1c, T2a, T2b and T3a to T3b according to rectal examination, ultrasound findings when available and the results of unilateral or bilateral prostate biopsy. We also considered preoperative serum PSA as a continuous variable or distributed into the 5 categories 0 to 4, 4.01 to 10, 10.01 to 20, 20.01 to 30 and greater than 30 ng./ml. The result variable was organ confined prostate adenocarcinoma, considered a binary or yes/no variable. Predictive methods involved multivariate logistic regression assisted by a commercially available software program and supervised neural networks (multilayer perceptron


and radial basis function) using a commercially available software program. We compared results by comparing the areas under the receiver operating characteristics (ROC) curve according to Hanley and McNeil.12, 13

RESULTS

Table 1 lists descriptive statistics on the 468 patients 45 to 84 years old at diagnosis (mean age plus or minus standard deviation 64.65 ± 5.46, median 65, interquartile range 8) selected for the prediction of organ confined tumor. As clinical stage advanced, we noted an increase in the percent of patients with nonorgan confined disease on pathological examination. Again, there was an increase in the prevalence of nonorgan confined disease with increasing Gleason score. Preoperatively median PSA was 10.89 ng./ml. (range 0.49 to 485.1, interquartile range 13.89, mean 22.08 ± 43.33). There was the same directly proportional increase in the prevalence of nonorgan confined disease that we observed for the previous variables.

Logistic regression enabled a formula to be generated for predicting nonorgan confined disease using the predictive variables of clinical stage, Gleason score and preoperative PSA distributed into categories (fig. 3). Table 2 shows the application of this formula. A ROC curve for the predictive model was generated after establishing various cutoff points (table 3). Mean area under the curve was 0.840 ± 0.019 (fig. 4). Table 4 lists the odds ratios per predictive variable category. The 3 predictive variables used to generate the model, namely Gleason score, preoperative PSA and clinical stage, were statistically significant (p <0.001). Overall the model showed chi-square 196.334 for 10 degrees of freedom (p <0.001) with a −2 log likelihood of 452.315, confirming its high statistical significance. In addition to our initial model 1, 4 other logistic regression models were constructed for predicting nonorgan confined disease. In model 2 the predictive variables were clinical stage and Gleason score distributed in categories and preoperative PSA as a continuous variable. This model showed chi-square 200.787 for 7 degrees of freedom (p <0.001) with a −2 log likelihood of 447.862 and a mean area under the ROC curve of 0.843 ± 0.018.
Comparison with model 1, in which PSA was distributed in 5 categories, revealed no statistically significant difference (p = 0.7). In model 3 the predictive variables were clinical stage, Gleason score and preoperative PSA distributed in categories, and age at diagnosis as a continuous variable. This model showed chi-square 203.103 for 11 degrees of freedom (p <0.001) with a −2 log likelihood of 445.546 and a mean area under the ROC curve of 0.849 ± 0.018. No statistically significant difference was noted when compared with the initial model in which age was not included (p = 0.1). In model 4 the predictive variables were clinical stage, Gleason score and preoperative PSA distributed in categories as well as the interactions clinical stage × Gleason score, Gleason score × preoperative PSA and clinical stage × preoperative PSA. This model showed chi-square 221.078 for 36 degrees of freedom (p <0.001) with a −2 log likelihood of 427.571, which was not statistically significantly different compared with the initial model (p = 0.5). In model 5 the predictive variables were clinical stage, Gleason score and preoperative PSA distributed in categories with the same interactions as in model 4 but with the addition of the interaction clinical stage × Gleason score × preoperative PSA. This model showed chi-square 232.980 for 46 degrees of freedom (p <0.001) with a −2 log likelihood of 415.669, which did not reach statistical significance compared with the initial model (p = 0.4).

TABLE 1. Distribution of the study sample

                                            No. Postop. Pathological Findings (%)
                            No. Pts. (%)    Stage pT2       Greater Than Stage pT2
Age:
  41–50                       5 (1.1)         2 (40.0)        3 (60.0)
  51–60                      98 (20.9)       63 (64.3)       35 (35.7)
  61–70                     305 (65.2)      147 (48.2)      158 (51.8)
  71–80                      59 (12.6)       26 (44.1)       33 (55.9)
  Older than 80               1 (0.2)         0               1 (100)
Clinical stage:
  cT1a–cT1b                  11 (2.3)        10 (90.9)        1 (9.1)
  cT1c                      116 (24.8)       90 (77.6)       26 (22.4)
  cT2a                      175 (37.4)      101 (57.7)       74 (42.3)
  cT2b                      125 (26.7)       36 (28.8)       89 (71.2)
  cT3a–cT3b                  41 (8.8)         1 (2.4)        40 (97.6)
Preop. Gleason score:
  2–4                        33 (7.0)        25 (75.8)        8 (24.2)
  5–6                       327 (69.9)      196 (59.9)      131 (40.1)
  7                          78 (16.7)       14 (17.9)       64 (82.1)
  8–10                       30 (6.4)         3 (10.0)       27 (90.0)
Preop. PSA (ng./ml.):
  0–4                        35 (7.5)        30 (85.7)        5 (14.3)
  4.01–10                   184 (39.3)      126 (68.5)       58 (31.5)
  10.01–20                  129 (27.5)       60 (46.5)       69 (53.5)
  20.01–30                   41 (8.8)        13 (31.7)       28 (68.3)
  Greater than 30            79 (16.9)        9 (11.4)       70 (88.6)
Totals                      468 (100)       238 (50.9)      230 (49.1)

A neural network was also developed for predicting nonorgan confined disease. Of the sample 328 cases (70%) were used to train the system, 47 (10%) were used for validation and the remaining 93 (20%) were used for the test phase. Three models were defined for each of the 2 supervised neural network types (multilayer perceptron and radial basis function) to obtain the corresponding predictive models. They were tested using the test group of 93 patients, including 41 with organ confined and 52 with nonorgan confined disease, to generate the respective ROC curves.
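The 70%/10%/20% partition just described corresponds to 328, 47 and 93 of the 468 cases. The sketch below only illustrates that arithmetic; it assumes simple random assignment, which the text does not state, and the function name, the seed and the use of Python are hypothetical choices made for illustration.

```python
import random


def split_indices(n_cases, train_frac=0.70, valid_frac=0.10, seed=0):
    """Shuffle case indices and cut them into training, validation and test groups.

    Random assignment is an assumption; the paper does not say how cases
    were allocated to each group.
    """
    n_train = round(n_cases * train_frac)
    n_valid = round(n_cases * valid_frac)
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains after the first 2 groups
    return train, valid, test


train, valid, test = split_indices(468)
print(len(train), len(valid), len(test))  # 328 47 93
```

Taking the test group as the remainder, rather than rounding 20% separately, guarantees that every case is used exactly once.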
In model 1 the predictive variables were clinical stage, Gleason score and preoperative PSA distributed in categories. For multilayer perceptron the mean area under the ROC curve was 0.826 ± 0.045 and for radial basis function it was 0.749 ± 0.053. In model 2 the predictive variables were the same as in model 1 with the addition of age at diagnosis as a continuous variable. The mean area under the ROC curve for multilayer perceptron was 0.861 ± 0.041 and for radial basis function it was 0.882 ± 0.038. In model 3 the predictive variables were the same as in model 1 but with PSA as a continuous variable. The mean area under the ROC curve for multilayer perceptron was 0.847 ± 0.043 and for radial basis function it was 0.788 ± 0.049. When we compared the multilayer perceptron model with the largest area under the curve with the radial basis function model with the largest area (model 2 in each case) the difference in areas was 0.021, which was not statistically significant (p = 0.5). Furthermore, we compared the areas under the curve for the best neural network and logistic regression models. The best neural network corresponded to radial basis function model 2 (mean area under the curve 0.882 ± 0.038, fig. 4). Table 5 shows the cutoff points for the parameters used to generate this model. The chosen logistic regression model was model 1, which was generated by clinical stage, Gleason score and preoperative PSA distributed in categories (mean area under the curve 0.840 ± 0.019). The difference in areas was 0.042, which was not statistically significant (p = 0.1).

DISCUSSION

The idea of combining preoperative variables for predicting the final pathological stage after radical prostatectomy is not new. The first published reference on this subject dates from 1987, when Oesterling et al combined clinical stage, Gleason biopsy score and prostatic acid phosphatase, and performed multivariate logistic regression to predict pathological stage after radical prostatectomy in 275 patients with clinically localized prostate adenocarcinoma.14 There have been many


FIG. 3. Probability of nonorgan confined disease

TABLE 2. Probability (%) of nonorgan confined tumor

                            Preop. PSA (ng./ml.)
Clinical Stage         0–4    4.01–10    10.01–20    20.01–30    Greater Than 30
Gleason biopsy score 2–6:
  T1a–T1b                3        7         12          18            36
  T1c                    6       14         22          32            54
  T2a                   13       30         42          55            75
  T2b                   24       47         60          72            86
  T3a–T3b               74       89         93          96            98
Gleason biopsy score 7:
  T1a–T1b               11       26         37          50            71
  T1c                   21       43         55          68            84
  T2a                   41       65         77          84            93
  T2b                   59       80         87          92            97
  T3a–T3b               93       97         98          99           100
Gleason biopsy score 8–10:
  T1a–T1b               14       31         42          56            76
  T1c                   25       48         61          72            87
  T2a                   46       70         80          87            94
  T2b                   64       83         89          93            97
  T3a–T3b               94       98         99          99           100
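Because model 1 contains only main effects for 3 categorical variables, the entries of table 2 can be approximated by multiplying a baseline odds by the adjacent-category odds ratios of table 4. The sketch below is an illustration under stated assumptions, not the authors' fitted equation: the baseline of 3% is the rounded reference cell of table 2 (stage T1a to T1b, PSA 0 to 4 ng./ml., Gleason 2 to 6), the odds ratios are the rounded values of table 4, and all names are hypothetical.

```python
from math import prod

# Ordered categories and adjacent-category odds ratios copied from table 4.
# Each ratio compares a category with the one immediately before it.
STAGES = ["T1a-T1b", "T1c", "T2a", "T2b", "T3a-T3b"]
STAGE_OR = [2.1, 2.6, 2.1, 8.7]
PSA_BANDS = ["0-4", "4.01-10", "10.01-20", "20.01-30", ">30"]
PSA_OR = [2.8, 1.7, 1.7, 2.5]
GLEASON = ["2-6", "7", "8-10"]
GLEASON_OR = [4.5, 1.3]

# Assumption: rounded probability of the reference cell in table 2.
BASELINE_PROBABILITY = 0.03


def cumulative_odds_ratio(levels, steps, level):
    """Odds ratio of `level` versus the first level (product of adjacent steps)."""
    return prod(steps[:levels.index(level)])


def prob_nonorgan_confined(stage, psa_band, gleason):
    """Approximate probability of nonorgan confined disease for one patient profile."""
    odds = BASELINE_PROBABILITY / (1.0 - BASELINE_PROBABILITY)
    odds *= cumulative_odds_ratio(STAGES, STAGE_OR, stage)
    odds *= cumulative_odds_ratio(PSA_BANDS, PSA_OR, psa_band)
    odds *= cumulative_odds_ratio(GLEASON, GLEASON_OR, gleason)
    return odds / (1.0 + odds)


# Stage T1c, PSA 0 to 4, Gleason 2 to 6: table 2 lists 6%.
print(round(100 * prob_nonorgan_confined("T1c", "0-4", "2-6"), 1))
```

Exact agreement with table 2 is not expected because both the baseline and the odds ratios are rounded, so the sketch reproduces the table only approximately.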

TABLE 3. Internal validity of the predictive equation for different cutoff points for greater than stage pT2 disease

Cutoff Point   % Accuracy   % Sensitivity   % Specificity   % Pos. Predictive Value   % Neg. Predictive Value
0.1              51.07         100              3.78              50.10                    100
0.2              64.53          94.34          35.71              58.64                     86.73
0.3              73.72          82.17          65.54              69.74                     79.18
0.4              74.57          81.73          67.64              70.94                     79.31
0.5              77.56          69.56          85.29              82.05                     74.35
0.6              75.21          59.13          90.75              86.07                     69.67
0.7              75.64          57.39          93.27              89.18                     69.37
0.8              68.80          40.86          95.79              90.38                     62.63
0.9              63.68          26.95          99.15              96.87                     58.41
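Each row of table 3 follows from a 2 × 2 classification table at the given cutoff. As a worked illustration, the sketch below recomputes the 0.5 cutoff row. The 4 cell counts are not given in the paper; they are back-calculated here, as an assumption, from the reported percentages and the 230 nonorgan confined and 238 organ confined cases of table 1. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and predictive values in percent."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "pos_predictive_value": 100.0 * tp / (tp + fp),
        "neg_predictive_value": 100.0 * tn / (tn + fn),
    }


# Assumed counts for the 0.5 cutoff, with nonorgan confined disease as the
# positive class: 160 of 230 detected, 203 of 238 organ confined cases
# correctly classified. These counts are reconstructed, not reported.
metrics = diagnostic_metrics(tp=160, fn=70, tn=203, fp=35)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values agree with the table 3 row to within the last printed digit, which suggests that some entries in the table were truncated rather than rounded.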

later improvements,5, 15–24 due not only to larger sample size, but also to the use of PSA testing as a better tumor marker than prostatic acid phosphatase. Of all later developments the model developed by Partin et al with a large sample of 4,133 patients must be mentioned.5 However, it has not been validated in our area and we have made certain modifications to the design, including use of the 1997 instead of the 1992 TNM classification, and clinical staging based on rectal examination and prostatic biopsy, not rectal examination only. We believe that when only 1 lobe is suspicious on rectal examination but biopsy is positive in each lobe, disease should be clinically considered stage T2b and not T2a, which would otherwise lead to even further understaging. Up to 42% of patients with palpable tumor in 1 lobe have bilateral disease, although ultrasound of the lobe that is not suspicious on rectal examination is normal in 57%.25 Similarly Narayan et al reported better results in their logistic regression predictive model when clinical stage was based on the results of ultrasound guided biopsy than in the model in which it was based on rectal examination alone.19 Also, with reference to PSA Partin et al used the value determined at least 4 weeks after prostatic biopsy, transurethral prostatic resection or the 2 procedures.5 Although we consider that 4 weeks are sufficient time for PSA normalization after prostatic manipulation,26 we believe that in cases in which the diagnosis of prostate cancer was an incidental finding after transurethral resection PSA 4 weeks after operation is likely to be low due to the reduction in prostatic tissue after transurethral prostatic resection. Therefore, predictive value would be altered by using this lower PSA, which may explain why in the Partin nomogram stage T1b disease (nonpalpable tumor that is not visible on imaging, with normal PSA) always had a worse prognosis than stage T1c disease (nonpalpable tumor that is not visible on imaging but involves increased PSA), which is clinically more advanced.5

Our study is particularly interesting due to the homogeneity of the study population with respect to racial distribution. All patients were white males from our health district, which is of particular importance due to the known effect that race has on the incidence of prostate cancer, particularly with reference to the black American population.27, 28 This factor may not have been considered in predictive models widely used to date and it may have distorted the results.5 We believe that models generated in areas of greater racial diversity should have considered race as another predictive variable or should have sought a more homogeneous sample with respect to this variable. Again, it is not advisable to use these models in areas with populations different than that for which they were designed, at least not without previous validation.

Regarding the use of neural networks in urology and more specifically in prostate cancer, the first example dates from 1994 when Snow et al tried to predict biopsy results and recurrence after radical prostatectomy from clinical parameters.29 With specific reference to prostate cancer staging in 1998 Tewari et al initially attempted to predict the existence


FIG. 4. ROC curves for logistic regression and neural network. AUC, area under curve

TABLE 4. Odds ratios of the predictive variable categories for predicting nonorgan confined disease

Factor                          Odds Ratio
Gleason score:
  7/2–6                            4.5
  8–10/7                           1.3
PSA (ng./ml.):
  4.01–10/0–4                      2.8
  10.01–20/4.01–10                 1.7
  20.01–30/10.01–20                1.7
  Greater than 30/20.01–30         2.5
Clinical stage:
  T1c/T1a–b                        2.1
  T2a/T1c                          2.6
  T2b/T2a                          2.1
  T3a–b/T2b                        8.7

of positive margins and seminal vesicle or lymph node involvement using a neural network with a large number of predictive variables, including race.30 More recently Crawford et al studied the prediction of lymph node involvement.7 However, to our knowledge no study has been done along these lines in our area. Our report also represents the first study in which logistic regression options were compared with neural network options for predicting the pathological stage of prostate adenocarcinoma.

Logistic regression. Preoperative PSA was distributed into categories (model 1) and used as a continuous variable (model 2). We noted slight improvement in the area under the curve in the latter method, although statistical significance was not reached (p = 0.7). For this reason using PSA as a continuous variable was discarded, so that we would not complicate the construction and interpretation of the nomograms (table 2). Others have reported similar results using PSA as a continuous variable or distributed into categories.18 Age was considered a continuous variable because the age range of our study sample at prostate cancer diagnosis was narrow with 95% of patients between 54 and 75 years old. Thus, stratification into categories was impractical. However, comparing the areas under the ROC curves showed that the contribution of age (model 3) led to only minimal improvement in the area under the curve compared with our initial model 1, which did not include age. Again, the difference was not statistically significant (p = 0.1). Therefore, as in the previous model, age was not included in the final equation for prediction. Others similarly discarded age from their equations for predicting the pathological stage of prostate adenocarcinoma.17, 20 In contrast, Gilliland et al assigned predictive capacity to age for estimating nonorgan confined disease.24 It should be noted that this multicenter predictive model was constructed using PSA, Gleason biopsy score and age but excluding clinical stage as a predictive variable due to a lack of uniformity in the criteria applied by the various study groups.

As defined by integrating rectal examination, transrectal ultrasound and prostate biopsy findings, clinical stage was distributed into 5 categories using the updated 1997 TNM classification after assuming that separation into a larger number of categories had no clinical usefulness. Gleason biopsy score was also distributed into the categories 2 to 6, 7 and 8 to 10. The classical stratification into Gleason 2 to 4, 5 to 7 and 8 to 10 was discarded due to the special significance of Gleason 7 today, with a worse prognostic significance than Gleason 5 to 6, although it is not as bad as Gleason 8 to 10.31 Gleason suggested forming Gleason 2 to 6 and 7 to 10 groups due to the particular prognostic implications.32 It did not appear appropriate to decrease stratification to this extreme, and so we initially used the 4 groups Gleason 2 to 4, 5 to 6, 7 and 8 to 10. However, later statistical calculations indicated that the first 2 groups should be unified into the single group Gleason 2 to 6 because far fewer patients had Gleason 2 to 4 than 5 to 6 disease (7% versus 70%, table 1). The particular importance of a Gleason score of 7 with respect to a poor prognosis was clear when the odds ratios of the logistic regression model were reviewed. With all other variables equal a patient with a Gleason biopsy score of 7 had 4.5-fold greater odds of nonorgan confined disease than a patient with a score of between 2 and 6 (table 4).
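The comparisons of models 4 and 5 with model 1 reported in the Results can be checked with a likelihood ratio test, since model 1 is nested in both. The sketch below is an assumption about the method, as the text does not name the test used: the statistic is the difference in −2 log likelihood and the degrees of freedom are the difference in model degrees of freedom. To stay within the Python standard library it uses a closed-form chi-square tail that is valid only for even degrees of freedom, which covers both comparisons here. All function names are hypothetical.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form requires even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return exp(-half) * total


def likelihood_ratio_test(neg2ll_reduced, df_reduced, neg2ll_full, df_full):
    """Compare a reduced model with a fuller model that contains it."""
    statistic = neg2ll_reduced - neg2ll_full
    extra_df = df_full - df_reduced
    return statistic, extra_df, chi2_sf_even_df(statistic, extra_df)


# -2 log likelihood and degrees of freedom as reported in the Results.
MODEL_1 = (452.315, 10)
results = {
    "model 4": likelihood_ratio_test(*MODEL_1, 427.571, 36),
    "model 5": likelihood_ratio_test(*MODEL_1, 415.669, 46),
}
for label, (statistic, extra_df, p_value) in results.items():
    print(label, round(statistic, 3), extra_df, round(p_value, 2))
```

The statistics are 24.744 on 26 degrees of freedom and 36.646 on 36 degrees of freedom, and the resulting p values are close to the reported 0.5 and 0.4, which is consistent with, but does not prove, the use of this test.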
We also assessed models 4 and 5 with interaction among the variables, which showed no statistically significant differences compared with model 1 (p = 0.5 and 0.4, respectively). Therefore, we constructed our nomograms using our initial model for which the statistical parameters described confirmed high statistical significance (table 2). In our opinion it represents a valid predictive model in this study population sample for predicting organ confined tumor.

Neural networks. As indicated, model 2 had the largest area under the curve for multilayer perceptron and radial basis function, with a mean area under the curve of 0.861 ± 0.041 and 0.882 ± 0.038, respectively. The 0.021 difference in these areas was not statistically significant (p = 0.5). Evaluation of the logistic regression method showed that using age and PSA as continuous variables led to improvement in the area under the curve compared with the model in which the 3 predictive variables of clinical stage, Gleason score and PSA were distributed in categories. However, since it did not achieve statistical significance, we did not think that it was clinically justified to complicate the graphic construction and interpretation of the nomogram by introducing the additional variable age or by including a variable as a continuous variable, such as PSA. From a practical point of view this difficulty does not exist for neural networks because a computer is required for application and there is no problem when including more variables, continuous or otherwise, if it leads to improved predictive capacity of the model. Thus, for comparing neural networks with logistic regression we chose neural network model 2 with the largest area under the curve for radial basis function. However, improvement in the area under the curve in this model versus the other neural network models was not statistically significant and, therefore, not clinically relevant. We did not establish parameters for comparison for our neural network model and previously published models since we identified none that was specifically designed to predict nonorgan confined disease.

TABLE 5. Validity of the model 2 radial basis function neural network in the test group

Cutoff Point   % Accuracy   % Sensitivity   % Specificity   % Pos. Predictive Value   % Neg. Predictive Value
0.1              54.84         100             19.23              49.40                    100
0.2              63.44         100             34.62              54.67                    100
0.3              67.74          97.56          44.23              57.97                     95.83
0.4              74.19          92.68          59.62              64.41                     91.18
0.5              78.49          92.68          67.31              69.09                     92.11
0.6              80.65          80.49          80.77              76.74                     84.00
0.7              75.27          48.78          96.15              90.91                     70.42
0.8              63.44          19.51          98.08              88.89                     60.71
0.9              55.91           2.44          98.08              50.00                     56.04

Logistic regression versus neural networks. Figure 4 shows the comparison of the model 1 logistic regression (mean area under the curve 0.840 ± 0.019) and the model 2 radial basis function neural network (mean area under the curve 0.882 ± 0.038). The 0.042 difference in area favored the latter, although it did not achieve statistical significance (p = 0.1).
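The comparison of 2 areas under the curve according to Hanley and McNeil reduces to a z test on the difference in areas, with a correction for the correlation r between areas derived from the same cases. The sketch below is illustrative only. The value of r is not reported, so r = 0 is a placeholder assumption, and the output should not be expected to reproduce the published p value. The function name is hypothetical.

```python
from math import erfc, sqrt


def compare_auc(auc_a, se_a, auc_b, se_b, r=0.0):
    """z test for the difference between 2 areas under ROC curves.

    r is the correlation between the 2 areas; r = 0 treats them as
    independent, which is a placeholder assumption here.
    """
    se_diff = sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    z = (auc_b - auc_a) / se_diff
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # 2-sided normal tail probability
    return z, p_two_sided


# Logistic regression model 1 versus radial basis function model 2.
z, p = compare_auc(0.840, 0.019, 0.882, 0.038, r=0.0)
print(round(z, 3), round(p, 3))
```

A positive correlation between the 2 areas shrinks the standard error of the difference and lowers the p value, so the choice of r matters when the difference is borderline.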
In other words, logistic regression and neural networks gave rise to predictive models with high levels of validity in terms of precision, sensitivity, specificity, predictive value and area under the ROC curve, in which the advantage of the neural network was minimal, no doubt because of the high levels already attained by logistic regression. These high levels may possibly have been due to the use of few variables and the relatively linear relationship of these variables to the variable to be predicted, as indicated by others.33 Despite the promising expectations that arise when considering the application of artificial intelligence to the field of medicine, its theoretical advantages of relative simplicity compared with conventional statistical methods and the capacity to recognize unknown patterns of relationships among variables, especially nonlinear relationships,34 are often overestimated. As disadvantages, a neural network does not clearly express the relationships established among variables or indicate which carry greater weight, and it can be overtrained on the training group and, thus, lose the ability to generalize to new samples.34 What is worse, in a number of publications the good results of neural networks are shown without comparing them with the recognized predictive methods of traditional statistics, namely logistic regression, which would adapt better in terms of design to the problems that some attempt to solve by neural networks.35–37 On the few occasions in which this comparison is shown it is often presented by comparing the accuracy, sensitivity, specificity and predictive value at certain cutoff points, and not by ROC curves, which are the accepted means of comparing diagnostic methods. For neural networks as well as logistic regression ROC curves should be the elements of comparison,35–37 which would enable us to establish the true superiority of 1 method over another. Thus, we used ROC curves as our method of comparison and found no statistically significant or clinically relevant difference between neural networks and logistic regression for our problem. When properly compared, this equality in the 2 methods is not new in the literature, with a growing number of precedents in various biomedical areas.38–45 However, publications also exist on the superiority of neural networks over logistic regression,46, 47 which reflects the necessity of not assuming in advance the superiority of neural networks over traditional statistics and of requiring a comparison in each case. In our opinion this comparison must be done using ROC curves. Therefore, neural networks are at least as useful as logistic regression for predicting nonorgan confined disease according to clinical stage, Gleason biopsy score and preoperative PSA, although they are technically more complex in design and use. These models of artificial intelligence would benefit from the inclusion of a larger number of predictive variables, particularly those without a linear relationship with the variable to be predicted. In this way we can achieve models with a greater predictive capacity that may become standard in clinical practice, displacing logistic regression.

CONCLUSIONS

Using logistic regression and neural networks, we constructed predictive models for organ confined prostate cancer in our area that have high statistical validity and clinical usefulness. The neural networks were not superior to logistic regression in terms of predictive capacity. To construct even better predictive models using neural networks, more predictive variables must be identified and included, particularly those without a linear relationship with organ confined prostate cancer.

REFERENCES

1. Instituto Nacional de Estadística. Available at http://www.ine.es/prensa/np155.htm. Accessed January 5, 2001
2. Boyle, P. and Zaridze, D. G.: Risk factors for prostate and testicular cancer. Eur J Cancer, 29A: 1048, 1993
3. Haenszel, W. and Kurihara, M.: Studies of Japanese migrants. I. Mortality from cancer and other diseases among Japanese in the United States. J Natl Cancer Inst, 40: 43, 1968
4. Kattan, M. W., Stapleton, A. M., Wheeler, T. M. et al: Evaluation of a nomogram used to predict the pathologic stage of clinically localized prostate carcinoma. Cancer, 79: 528, 1997
5. Partin, A. W., Kattan, M. W., Subong, E. N. et al: Combination of prostate-specific antigen, clinical stage, and Gleason score to predict pathological stage of localized prostate cancer. A multiinstitutional update. JAMA, 277: 1445, 1997
6. Tewari, A. and Narayan, P.: Novel staging tool for localized prostate cancer: a pilot study using genetic adaptive neural networks. J Urol, 160: 430, 1998

7. Crawford, E. D., Batuello, J. T., Snow, P. et al: The use of artificial intelligence technology to predict lymph node spread in men with clinically localized prostate carcinoma. Cancer, 88: 2105, 2000
8. Allepuz Losa, C. A., Sanz Velez, J. I., Gil Sanz, M. J. et al: Seminal vesicle biopsy in prostate cancer staging. J Urol, 154: 1407, 1995
9. Blas Marín, M., Allepuz Losa, C., Rioja Sanz, C. et al: Valor y secuencia de la biopsia de vesículas seminales y la linfadenectomía pelviana laparoscópica en el estadiaje del cáncer de próstata. Actas Urol Esp, 21: 874, 1997
10. TNM Classification of Malignant Tumours, 5th ed. Edited by L. H. Sobin and C. Wittekind. New York: John Wiley & Sons, p. 170, 1997
11. Gleason, D. F.: Histological grading of prostatic carcinoma. In: Pathology of the Prostate. Edited by D. G. Bestwick. New York: Churchill-Livingstone, chapt. 6, p. 83, 1990
12. Hanley, J. A. and McNeil, B. J.: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148: 839, 1983
13. Hanley, J. A. and McNeil, B. J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143: 29, 1982
14. Oesterling, J. E., Brendler, C. B., Epstein, J. I. et al: Correlation of clinical stage, serum prostatic acid phosphatase and preoperative Gleason grade with final pathological stage in 275 patients with clinically localized adenocarcinoma of the prostate. J Urol, 138: 92, 1987
15. Kleer, E., Larson-Keller, J. J., Zincke, H. et al: Ability of preoperative serum prostate-specific antigen value to predict pathologic stage and DNA ploidy. Influence of clinical stage and tumor grade. Urology, 41: 207, 1993
16. Partin, A. W., Yoo, J., Carter, H. B. et al: The use of prostate-specific antigen, clinical stage, and Gleason score to predict pathological stage in men with localized prostate cancer. J Urol, 150: 110, 1993
17. Bluestein, D. L., Bostwick, D. G., Bergstralh, E. J. et al: Eliminating the need for bilateral pelvic lymphadenectomy in select patients with prostate cancer. J Urol, 151: 1315, 1994
18. Sands, M. E., Zagars, G. K., Pollack, A. et al: Serum prostate-specific antigen, clinical stage, pathologic grade, and the incidence of nodal metastases in prostate cancer. Urology, 44: 215, 1994
19. Narayan, P., Gajendran, V., Taylor, S. P. et al: The role of transrectal ultrasound-guided biopsy-based staging, preoperative serum prostate-specific antigen, and biopsy Gleason score in prediction of final pathologic diagnosis in prostate cancer. Urology, 46: 205, 1995
20. Bostwick, D. G., Qian, J., Bergstralh, E. et al: Prediction of capsular perforation and seminal vesicle invasion in prostate cancer. J Urol, 155: 1361, 1996
21. Puppo, P. and Perachino, M.: Clinical stage, prostate-specific antigen and Gleason grade to predict extracapsular disease or nodal metastasis in men with newly diagnosed, previously untreated prostate cancer. A multicenter study. A. Ur. O. Cooperative Group. Eur Urol, 32: 273, 1997
22. Villavicencio Marrich, H., Millán Rodriguez, F., Chéchile Toniolo, G. et al: Factores pronósticos y tablas predictivas del cáncer de próstata no localizado que excluirían la realización de la prostatectomía radical. Actas Urol Esp, 22: 581, 1998
23. Zudaire Bergera, J. J., Martín-Marquina Aspiunza, A., Sánchez Zalabardo, D. et al: Prostatectomía radical en adenocarcinoma de próstata. Factores clínicos influyentes en el estadio patológico. Modelo diagnóstico. Actas Urol Esp, 23: 694, 1999
24. Gilliland, F. D., Hoffman, R. M., Hamilton, A. et al: Predicting extracapsular extension of prostate cancer in men treated with radical prostatectomy: results from the population based prostate cancer outcomes study. J Urol, 162: 1341, 1999
25. Daniels, G. F., Jr., McNeal, J. E. and Stamey, T. A.: Predictive value of contralateral biopsies in unilaterally palpable prostate cancer. J Urol, part 2, 147: 870, 1992
26. Yuan, J. J., Coplen, D. E., Petros, J. A. et al: Effects of rectal examination, prostatic massage, ultrasonography and needle biopsy on serum prostate specific antigen levels. J Urol, part 2, 147: 810, 1992
27. Hayes, R. B., Ziegler, R. G., Gridley, G. et al: Dietary factors and risks for prostate cancer among blacks and whites in the United States. Cancer Epidemiol Biomarkers Prev, 8: 25, 1999
28. Brawley, O. W., Knopf, K. and Thompson, I.: The epidemiology of prostate cancer part II: the risk factors. Semin Urol Oncol, 16: 193, 1998
29. Snow, P. B., Smith, D. S. and Catalona, W. J.: Artificial neural networks in the diagnosis and prognosis of prostate cancer: a pilot study. J Urol, part 2, 152: 1923, 1994
30. Tewari, A. and Narayan, P.: Novel staging tool for localized prostate cancer: a pilot study using genetic adaptive neural networks. J Urol, 160: 430, 1998
31. Epstein, J. I.: Pathology of adenocarcinoma of the prostate. In: Campbell's Urology, 7 ed. Edited by P. C. Walsh, A. B. Retik, E. D. Vaughan, Jr. et al. Philadelphia: W. B. Saunders, chapt. 81, p. 2497, 1998
32. Gleason, D. F.: Histologic grading of prostate cancer: a perspective. Hum Pathol, 23: 273, 1992
33. Partin, A. W., Murphy, G. P. and Brawer, M. K.: Report on prostate cancer tumor marker workshop 1999. Cancer, 88: 955, 2000
34. Douglas, T. H. and Moul, J. W.: Applications of neural networks in urologic oncology. Semin Urol Oncol, 16: 35, 1998
35. Meistrell, M. L.: Evaluation of neural networks performance by receiver operating characteristic (ROC) analysis: examples from the biotechnology domain. Comput Methods Programs Biomed, 32: 73, 1990
36. Woods, K. and Bowyer, K. W.: Generating ROC curves for artificial neural networks. IEEE Trans Med Imaging, 16: 329, 1997
37. Duh, M.-S., Walker, A. M., Pagano, M. et al: Prediction and cross-validation of neural networks versus logistic regression: using hepatic disorders as an example. Am J Epidemiol, 147: 407, 1998
38. Schwarzer, G., Vach, W. and Schumacher, M.: On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat Med, 19: 541, 2000
39. Clermont, G., Angus, D. C., DiRusso, S. M. et al: Predicting hospital mortality for patients in the intensive care unit: a comparison of artificial neural networks with logistic regression models. Crit Care Med, 29: 291, 2001
40. Freeman, R. V., Eagle, K. A., Bates, E. R. et al: Comparison of artificial neural networks with logistic regression in prediction of in-hospital death after percutaneous transluminal coronary angioplasty. Am Heart J, 140: 511, 2000
41. Walker, A. J., Cross, S. S. and Harrison, R. F.: Visualisation of biomedical datasets by use of growing cell structure networks: a novel diagnostic classification technique. Lancet, 354: 1518, 1999
42. Rowland, T., Ohno-Machado, L. and Ohrn, A.: Comparison of multiple prediction models for ambulation following spinal cord injury. Proc AMIA Symp, p. 528, 1998
43. Timmerman, D., Verrelst, H., Bourne, T. H. et al: Artificial neural network models for the preoperative discrimination between malignant and benign adnexal masses. Ultrasound Obstet Gynecol, 13: 17, 1999
44. Duh, M. S., Walker, A. M., Pagano, M. et al: Prediction and cross-validation of neural networks versus logistic regression: using hepatic disorders as an example. Am J Epidemiol, 147: 407, 1998
45. Tu, J. V., Weinstein, M. C., McNeil, B. J. et al: Predicting mortality after coronary artery bypass surgery: what do artificial neural networks learn? Steering Committee of the Cardiac Care Network of Ontario. Med Decis Making, 18: 229, 1998
46. Zernikow, B., Holtmannspoetter, K., Michel, E. et al: Artificial neural network for predicting intracranial haemorrhage in preterm neonates. Acta Paediatr, 87: 969, 1998
47. Dybowski, R., Weller, P., Chang, R. et al: Prediction of outcome in critically ill patients using artificial neural network synthesised by genetic algorithm. Lancet, 347: 1146, 1996