Expert Systems with Applications 36 (2009) 11341–11346
Comparison of Bayesian survival analysis and Cox regression analysis in simulated and breast cancer data sets

Imran Kurt Omurlu a,*, Kazim Ozdamar b, Mevlut Ture c

a Trakya University Medical Faculty, Department of Biostatistics, 22030 Edirne, Turkey
b Eskişehir Osmangazi University Medical Faculty, Department of Biostatistics, Eskişehir, Turkey
c Adnan Menderes University Medical Faculty, Department of Biostatistics, Aydın, Turkey
Keywords: Cox regression; Bayesian survival; Survival; Breast cancer; Markov Chain Monte Carlo; Simulation
Abstract

We aimed to compare the performance of Cox regression analysis (CRA) and Bayesian survival analysis (BSA) using simulations and breast cancer data. The simulation study was carried out with two algorithms, one using noninformative and one using informative priors. In the real-data application, a breast cancer data set on disease-free survival (DFS), obtained from 423 breast cancer patients diagnosed between 1998 and 2007, was used. In the simulations, BSA with noninformative priors and CRA performed similarly in terms of convergence to the simulation parameter. In the informative-prior simulations, BSA with a correctly specified informative prior performed well, with very little bias. The bias of BSA increased as the priors moved further from the true parameter value, at all sample sizes. In addition, given correctly specified priors, BSA produced estimates with smaller bias and standard error than CRA in both small and large samples. In the breast cancer data set, age, tumor size, hormonal therapy and axillary nodal status were statistically significant prognostic factors for DFS in stepwise CRA and in BSA with informative and noninformative priors. Furthermore, the standard errors of the estimates were slightly smaller in BSA with informative priors. BSA therefore performed better than CRA when the analysis incorporated expert opinion and historical knowledge about the parameters. Consequently, BSA should be preferred when reliable informative priors exist; otherwise, CRA should be preferred.

© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Survival analysis is a family of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs. The most popular survival procedure is Cox regression analysis (CRA), a semiparametric method for investigating the effect of several variables on the time a specified event takes to happen (Kleinbaum & Klein, 1996). Over the last few years, however, interest in survival analysis based on Bayesian methodology has increased. Bayesian analysis has been used relatively rarely in medical studies because of its complex theory, but Bayesian analysis of survival data has recently received much attention owing to advances in computational and modeling techniques (Ibrahim, Chen, & Sinha, 2001). Bayesian survival analysis (BSA) provides inferences that are exact, whereas CRA bases its maximum likelihood estimates of the parameters on asymptotic considerations (Calle, Hough, Curia, & Gómez, 2006; SAS Institute, 2006).

* Corresponding author (I.K. Omurlu). Tel.: +90 284 2357641/1633; fax: +90 284 2357652.
doi:10.1016/j.eswa.2009.03.058
BSA combines the data with prior information. It draws conclusions by synthesizing new information from the observed data with historical knowledge or expert opinion. Historical knowledge from similar past studies can be very helpful in interpreting the results of the current study, so BSA reflects the researchers' subjective beliefs. Prior elicitation plays the most crucial role in BSA, and BSA cannot be used for any modeling without a prior distribution (Ibrahim et al., 2001; SAS Institute, 2006). Relatively few studies have applied the BSA method. Yin and Ibrahim (2006) carried out a simulation study using BSA with varying sample sizes, 1000 replications, 5000 Gibbs samples and 200 burn-in samples, and analyzed a real data set from a melanoma clinical trial. Calle et al. (2006) analyzed data from sensory shelf-life studies. Wong, Lam, and Lo (2005) used BSA to investigate the effectiveness of silver diamine fluoride and sodium fluoride varnish in arresting active dentin caries in Chinese pre-school children. The purpose of this study was to compare the performance of CRA and BSA under varying sample sizes using Monte Carlo simulation, and to apply CRA and BSA to disease-free survival (DFS) in breast cancer patients.
2. Material and methods

2.1. Cox regression analysis

The CRA is the most general of the regression models because it makes no assumptions about the nature or shape of the underlying survival distribution (Ahmed, Vos, & Holbert, 2007), and it is the most widely used method of survival analysis. Survival analysis typically examines the relationship of the survival distribution to covariates. Most commonly, this examination entails specifying a linear-like model for the log hazard. The Cox model may be written as

$$h(t, x) = h_0(t)\, e^{\beta' x}$$

where x is the covariate vector, β is the unknown parameter vector and h_0(t) is called the baseline hazard (the hazard for an individual when all covariate values are equal to zero). h(t, x) denotes the resulting hazard, given the covariate values for the respective case and the survival time t. The method estimates the parameters by maximizing the partial likelihood function, which is given by

$$L(\beta) = \prod_{i=1}^{k} \frac{\exp\left(\beta' x_{(i)}\right)}{\sum_{l \in R(t_{(i)})} \exp\left(\beta' x_l\right)}$$
where the summation in the denominator is over all subjects in the risk set at time t_(i), denoted by R(t_(i)), the product is over the k distinct ordered survival times, and x_(i) denotes the covariate value for the subject with ordered survival time t_(i) (Hosmer & Lemeshow, 1999; Kleinbaum & Klein, 1996). Although no assumption is made about the shape of the underlying hazard function, the CRA makes two assumptions. First, the model specifies a multiplicative relationship between the underlying hazard function and the log-linear function of the covariates (the proportional hazards assumption). Second, there is a log-linear relationship between the independent variables and the underlying hazard function (Hosmer & Lemeshow, 1999; Kleinbaum & Klein, 1996).

2.2. Bayesian survival analysis

Bayesian analysis draws conclusions by synthesizing new information from the observed data with previous knowledge or external evidence (Wong et al., 2005). In classical approaches such as maximum likelihood, inference is based on the likelihood of the data alone. In Bayesian models, the likelihood of the observed data x given parameters β, denoted p(x | β) or equivalently L(β), is used to update the prior beliefs p(β), and the updated knowledge is summarized in a posterior density p(β | x). The relationship between these densities is:
$$p(\beta \mid x) \propto L(\beta)\, p(\beta)$$

Thus, updated beliefs are a function of prior knowledge and the evidence in the sample data. From the Bayesian perspective, the likelihood is viewed as a function of β given fixed data x, so elements of the likelihood that are not functions of β are absorbed into the proportionality constant. Here L(β) is the partial likelihood function with the regression coefficients β as parameters (Congdon, 2003, 2006; Ibrahim et al., 2001; SAS Institute, 2006). In complex models, posterior densities are often too difficult to work with directly, and updating knowledge about the parameters requires the ability to sample from the posterior density. With the Markov Chain Monte Carlo (MCMC) method, it is possible to generate samples from a posterior density and to use these samples to approximate expectations of quantities of interest. MCMC samples successively from a target distribution, with each sample depending on the previous one. The Gibbs sampler is an MCMC method and a very powerful simulation algorithm. It can be efficient when the parameters are not highly dependent on each other and the full conditional distributions are easy to sample from (Robert & Casella, 2004; SAS Institute, 2006). The Gibbs sampler works as follows (Ibrahim et al., 2001; Robert & Casella, 2004; SAS Institute, 2006):

1. Set m = 0 (iterations m = 1, 2, …, M) and choose an arbitrary initial value $\beta^{(0)} = \{\beta_1^{(0)}, \beta_2^{(0)}, \ldots, \beta_p^{(0)}\}'$.
2. Generate each component of $\beta^{(m+1)} = \{\beta_1^{(m+1)}, \beta_2^{(m+1)}, \ldots, \beta_p^{(m+1)}\}'$ as follows:
   Draw $\beta_1^{(m+1)}$ from $p(\beta_1 \mid \beta_2^{(m)}, \ldots, \beta_p^{(m)}, x)$
   Draw $\beta_2^{(m+1)}$ from $p(\beta_2 \mid \beta_1^{(m+1)}, \beta_3^{(m)}, \ldots, \beta_p^{(m)}, x)$
   …
   Draw $\beta_p^{(m+1)}$ from $p(\beta_p \mid \beta_1^{(m+1)}, \beta_2^{(m+1)}, \ldots, \beta_{p-1}^{(m+1)}, x)$
3. Set m = m + 1 and return to step 2.

Convergence diagnostics help determine whether the Markov chain has reached its stationary distribution. Many diagnostic tests (Gelman–Rubin, Geweke, autocorrelation and so forth) are designed to verify a necessary but not sufficient condition for convergence. With some models, certain parameters can appear to have very good convergence behavior, but this could be misleading because of the slow convergence of other parameters. If some of the parameters mix poorly, posterior inference for the parameters is unreliable (Congdon, 2003; SAS Institute, 2006).

In Bayesian analysis, prior elicitation plays the most crucial role, and Bayesian analysis cannot be used for any modeling without a prior distribution. Either noninformative (objective) or informative (subjective) priors are used in inference. An informative prior is obtained from previous studies, past experience or expert opinion; it is not dominated by the likelihood and has an impact on the posterior distribution. Sometimes no prior information is available about any of the model parameters, and what is often referred to as a noninformative prior density is used. A noninformative prior has minimal impact on the posterior distribution of β, but it can lead to improper posteriors. Moreover, although noninformative priors are very popular in some applications, they are not always easy to construct (Gelman, 2002a, 2002b; SAS Institute, 2006).

2.3. Simulation algorithms

Our interest in this study was to compare the parameter estimates from CRA and BSA under different conditions. The models developed here have the same multiplicative structure as the Cox regression model. We used two simulation algorithms, each based on a probability model with one explanatory variable, and applied the following steps.

Algorithm I: CRA was compared with BSA using a noninformative prior.
(1) Set a value of the parameter β.
(2) Set a value of the sample size.
(3) Set a value of the baseline hazard function h_0(t).
(4) Generate the variable E from an exponential distribution, E ~ Exponential(1).
(5) Generate the explanatory variable x from a uniform distribution on (0, 1).
(6) Generate the survival time $t = E / \left(h_0(t)\, e^{\beta' x}\right)$ (Bender, Augustin, & Blettner, 2005) using the values obtained in steps 1–5.
(7) To introduce censoring, generate the variable s_time from an exponential distribution, s_time ~ Exponential(1); an observation is coded as uncensored (1) if t ≤ s_time.
(8) Perform CRA and BSA on the generated data.
(9) Record the parameter estimates.
Steps 4–8 were replicated 1000 times, so 1000 parameter estimates were obtained from each analysis.

Algorithm II: CRA was compared with BSA using an informative prior, following the steps of Algorithm I.

In both simulation studies, sample sizes of n = 30, 100, 250 and 500 were used, with β = 1 and h_0(t) = 0.005. In Algorithm I, the noninformative prior was a multivariate normal distribution with mean vector 0 and covariance matrix 10^6 I, where I is the identity matrix. In Algorithm II, the informative prior was a normal distribution with mean β_p = 1, 1.1, 1.5, 2 and variance σ²_p = 0.01, 0.05, 0.1, 0.5. In both algorithms, we assumed that the chain would have reached its target distribution after 2000 iterations; we therefore discarded a burn-in of 2000 samples and based the posterior estimates on 10,000 Markov chain samples. Simulations and analyses were performed using the SAS 9.1.3 macro language and the SAS PHREG, TPHREG and BPHREG procedures. For each setting, 1000 simulations were performed. After the analyses based on Algorithms I and II, the mean of the 1000 parameter estimates was calculated, and we evaluated how close this average was to the value of β set in step 1.

2.4. Breast cancer data

A retrospective analysis was performed on 547 breast cancer patients diagnosed between 1998 and 2007. Complete data on the prognostic factors investigated (age, menopausal status, age at first delivery, family history of cancer, histologic tumor type, quadrant of tumor, tumor size, estrogen and progesterone receptor status, axillary nodal status, pericapsular involvement of lymph nodes, radiotherapy and hormonal therapy) were available for 423 patients, who form the basis of this study. Descriptive statistics of the clinical and pathologic data for the entire patient population are listed in Table 1. We used classical statistical tests to examine differences in the distribution of variables between patients with and without recurrence. The Kolmogorov–Smirnov test with Lilliefors correction was used to assess the normality of numeric variables. For numeric variables that were not normally distributed, the two groups were compared with the Mann–Whitney U test, and results were expressed as median and interquartile range (IQR). The association of recurrence with nominal variables was assessed using the chi-square test. Results of previous studies on age (Yu et al., 1995), tumor size (Foekens et al., 1992), menopausal status (Gasparini et al., 1997), progesterone receptor status (Pinto, Andre, & Soares, 1999), estrogen receptor status (Pinto et al., 1999), radiotherapy (Rowlings et al., 1999), axillary nodal status (Foekens et al., 1992) and histologic tumor type (Gasparini et al., 1997) with regard to DFS in breast cancer patients were used to construct the informative prior for calculating the DFS posterior distributions. For the other parameters, we used a normal prior distribution with mean 0 and variance 10^6.

3. Results

3.1. Simulations

We analyzed the generated data with CRA and with BSA using a noninformative prior under Algorithm I. The values averaged over the 1000 simulations are reported in Table 2 for varying sample sizes. Biases were calculated as β̂ − β. In both CRA and BSA with a noninformative prior, the estimated parameter was close to the simulation parameter (β = 1) at all sample sizes, and BSA estimated the model parameter with a slightly smaller bias.
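A single replication of the simulation design in Section 2.3 can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' SAS macro: CRA is fitted by Newton–Raphson on the partial likelihood of Section 2.1 (no tied times assumed), and BSA uses a random-walk Metropolis sampler on the partial likelihood multiplied by a normal prior, swapped in for the Gibbs sampler described in Section 2.2. The observed time is assumed to be min(t, s_time), and `censor_mean` is a hypothetical parameter: with h_0(t) = 0.005 and a unit-mean censoring time, only about 1% of observations would be uncensored, so the sketch uses a larger mean.

```python
import numpy as np


def simulate_algorithm_i(n, beta=1.0, h0=0.005, censor_mean=200.0, rng=None):
    """One data set following steps 4-7 of Algorithm I.

    censor_mean is a hypothetical parameter (mean of the exponential
    censoring time); the observed time min(t, s_time) is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    e = rng.exponential(1.0, size=n)               # step 4: E ~ Exponential(1)
    x = rng.uniform(0.0, 1.0, size=n)              # step 5: x ~ Uniform(0, 1)
    t = e / (h0 * np.exp(beta * x))                # step 6: t = E / (h0 exp(beta x))
    s_time = rng.exponential(censor_mean, size=n)  # step 7: censoring time
    event = (t <= s_time).astype(float)            # 1 = uncensored, 0 = censored
    return np.minimum(t, s_time), x, event


def sort_by_decreasing_time(obs_time, x, event):
    """After this sort, a cumulative sum up to index i covers the risk set R(t_i)."""
    order = np.argsort(-obs_time)
    return x[order], event[order]


def log_partial_likelihood(beta, xs, d):
    """Cox log partial likelihood for one covariate (no tied times assumed)."""
    eta = beta * xs
    log_risk_sum = np.logaddexp.accumulate(eta)    # log sum_{l in R(t_i)} exp(beta x_l)
    return np.sum(d * (eta - log_risk_sum))


def cox_fit(obs_time, x, event, tol=1e-10, max_iter=50):
    """CRA: maximize the partial likelihood by Newton-Raphson; returns (beta, SE)."""
    xs, d = sort_by_decreasing_time(obs_time, x, event)
    beta = 0.0
    for _ in range(max_iter):
        w = np.exp(beta * xs)
        s0, s1, s2 = np.cumsum(w), np.cumsum(w * xs), np.cumsum(w * xs**2)
        xbar = s1 / s0                             # risk-set weighted mean of x
        score = np.sum(d * (xs - xbar))
        info = np.sum(d * (s2 / s0 - xbar**2))     # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / np.sqrt(info)


def bayes_cox(obs_time, x, event, prior_mean=0.0, prior_var=1e6,
              n_samples=10_000, burn_in=2_000, step=0.3, rng=None):
    """BSA: random-walk Metropolis on partial likelihood x N(prior_mean, prior_var)."""
    rng = np.random.default_rng() if rng is None else rng
    xs, d = sort_by_decreasing_time(obs_time, x, event)

    def log_post(b):
        return (log_partial_likelihood(b, xs, d)
                - 0.5 * (b - prior_mean) ** 2 / prior_var)

    b, lp = 0.0, log_post(0.0)
    draws = np.empty(n_samples)
    for i in range(burn_in + n_samples):
        proposal = b + step * rng.normal()
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            b, lp = proposal, lp_prop
        if i >= burn_in:                           # discard the burn-in draws
            draws[i - burn_in] = b
    return draws.mean(), draws.std(ddof=1)         # posterior mean and SD


rng = np.random.default_rng(2009)
obs_time, x, event = simulate_algorithm_i(500, rng=rng)
beta_cra, se_cra = cox_fit(obs_time, x, event)
beta_bsa, sd_bsa = bayes_cox(obs_time, x, event, rng=rng)  # noninformative prior
beta_inf, sd_inf = bayes_cox(obs_time, x, event, prior_mean=1.0,
                             prior_var=0.01, rng=rng)      # informative prior
print(f"uncensored: {int(event.sum())} of {event.size}")
print(f"CRA:              beta = {beta_cra:.3f}, SE = {se_cra:.3f}")
print(f"BSA (N(0, 1e6)):  beta = {beta_bsa:.3f}, SD = {sd_bsa:.3f}")
print(f"BSA (N(1, 0.01)): beta = {beta_inf:.3f}, SD = {sd_inf:.3f}")
```

Repeating the replication 1000 times and averaging β̂ gives summaries of the kind reported in Tables 2 and 3; `prior_mean` and `prior_var` play the roles of β_p and σ²_p. With the vague N(0, 10^6) prior the posterior mean should sit very close to the partial-likelihood estimate, which is the pattern the simulations describe.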
Table 1
Clinical and laboratory characteristics of the study groups.

Independent variable | Category | Recurrence absent (n = 303) | Recurrence present (n = 120) | Test statistic | p
Age (year), median (IQR) | | 49 (15) | 52 (17) | z = 1.029 | 0.304
Tumor size (cm), median (IQR) | | 3 (2) | 3.5 (2.9) | z = 3.769 | <0.001
 | | n (%) | n (%) | |
Age at first delivery (year) | ≥30 | 13 (4.3) | 7 (5.8) | χ² = 0.176 | 0.675
 | <30 | 290 (95.7) | 113 (94.2) | |
Menopausal status | Post-menopausal | 142 (46.9) | 64 (53.3) | χ² = 1.440 | 0.230
 | Pre- + peri-menopausal | 161 (53.1) | 56 (46.7) | |
Progesterone receptor status | Negative | 76 (25.1) | 46 (38.3) | χ² = 7.354 | 0.007
 | Positive | 227 (74.9) | 74 (61.7) | |
Estrogen receptor status | Negative | 73 (24.1) | 40 (33.3) | χ² = 3.749 | 0.053
 | Positive | 230 (75.9) | 80 (66.7) | |
Radiotherapy | Absent | 60 (19.8) | 17 (14.2) | χ² = 1.474 | 0.225
 | Present | 243 (80.2) | 103 (85.8) | |
Hormonal therapy | Absent | 59 (19.5) | 44 (36.7) | χ² = 13.796 | <0.001
 | Present | 244 (80.5) | 76 (63.3) | |
Family history of cancer | Absent | 215 (71.0) | 77 (64.2) | χ² = 1.858 | 0.395
 | Breast cancer | 25 (8.3) | 12 (10.0) | |
 | Other cancers | 63 (20.8) | 31 (25.8) | |
Axillary nodal status | Negative | 139 (45.9) | 27 (22.5) | χ² = 32.613 | <0.001
 | 1–3 lymph nodes positive | 85 (28.1) | 28 (23.3) | |
 | ≥4 lymph nodes positive | 79 (26.1) | 65 (54.2) | |
Quadrant of tumor | Unicentric | 276 (91.1) | 96 (80.0) | χ² = 8.950 | 0.003
 | Multicentric | 27 (8.9) | 24 (20.0) | |
Pericapsular involvement of lymph nodes | Negative | 215 (71.0) | 55 (45.8) | χ² = 23.501 | <0.001
 | Positive | 88 (29.0) | 65 (54.2) | |
Histologic tumor type | Ductal | 243 (80.2) | 104 (86.7) | χ² = 2.021 | 0.155
 | Non-ductal | 60 (19.8) | 16 (13.3) | |

z, Mann–Whitney U test statistic; χ², chi-square test statistic.
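The chi-square statistics in Table 1 can be recomputed directly from the reported counts. The sketch below uses SciPy's `chi2_contingency` on the hormonal therapy and axillary nodal status rows, assuming Pearson's statistic without Yates' continuity correction (the text does not say which variant was used); under that assumption it reproduces the reported values 13.796 and 32.613 to within rounding.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from Table 1; rows: recurrence absent (n = 303), recurrence present (n = 120)
hormonal = np.array([[59, 244],      # hormonal therapy: absent, present
                     [44, 76]])
nodal = np.array([[139, 85, 79],     # axillary nodes: negative, 1-3 positive, >=4 positive
                  [27, 28, 65]])

# correction=False: Pearson statistic without Yates' continuity correction
# (assumed; SciPy applies Yates' correction only to tables with one degree of freedom)
chi2_hormonal, p_hormonal, dof_hormonal, _ = chi2_contingency(hormonal, correction=False)
chi2_nodal, p_nodal, dof_nodal, _ = chi2_contingency(nodal, correction=False)

print(f"Hormonal therapy:      chi2 = {chi2_hormonal:.3f}, df = {dof_hormonal}, p = {p_hormonal:.2g}")
print(f"Axillary nodal status: chi2 = {chi2_nodal:.3f}, df = {dof_nodal}, p = {p_nodal:.2g}")
```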
Furthermore, according to the two-proportion t-test, the parameter estimates of both methods converged to the simulation parameter (p > 0.05). The simulations thus showed that BSA with a noninformative prior gave results similar to CRA and did not outperform it at any sample size.

We then analyzed the generated data with CRA and with BSA using an informative prior under Algorithm II. The values averaged over the 1000 simulations are reported in Table 3 for varying sample sizes. As the sample size increased, the parameter estimates obtained from CRA and BSA became less biased and had smaller standard errors. With the correctly specified informative prior (β_p = 1 and σ²_p = 0.01, 0.05, 0.1, 0.5), BSA performed better than CRA at all four sample sizes. In particular, with a correctly specified informative prior, the bias of the BSA estimates decreased as the prior variance decreased. With misspecified informative priors (β_p = 1.1, 1.5, 2), the bias of the BSA estimates increased. However, as the sample size and the prior variance (σ²_p = 0.01, 0.05, 0.1, 0.5) increased, the bias of the estimates decreased even though the prior was misspecified. In both simulation algorithms, the Geweke diagnostic test and autocorrelations indicated reasonably good mixing of the Markov chain (p > 0.05).

Table 2
Parameter estimates, biases and standard errors obtained from 1000 Monte Carlo simulations for β = 1.

n | CRA β̂ | CRA σ̂ | CRA Bias | CRA p | BSA β̂ | BSA σ̂ | BSA Bias | BSA p
30 | 1.0515 | 0.7495 | 0.0515 | 0.709 | 1.0567 | 0.7551 | 0.0567 | 0.941
100 | 1.0433 | 0.5653 | 0.0433 | 0.939 | 1.0417 | 0.5367 | 0.0417 | 0.938
250 | 1.0165 | 0.3349 | 0.0165 | 0.961 | 1.0151 | 0.3337 | 0.0151 | 0.964
500 | 1.0100 | 0.2360 | 0.0100 | 0.966 | 1.0083 | 0.2350 | 0.0083 | 0.972

σ̂, standard error of β̂. BSA uses the noninformative prior of Algorithm I.

Table 3
Parameter estimates, biases and standard errors obtained from 1000 Monte Carlo simulations for β = 1 and n = 30, 100, 250, 500.

n = 30
Method | β_p | σ²_p | β̂ | σ̂ | Bias | p
CRA | – | – | 1.0604 | 1.1207 | 0.0604 | 0.957
BSA | 1 | 0.01 | 0.9985 | 0.0103 | −0.0015 | 0.885
BSA | 1 | 0.05 | 0.9965 | 0.0493 | −0.0035 | 0.944
BSA | 1 | 0.1 | 0.9953 | 0.0938 | −0.0047 | 0.960
BSA | 1 | 0.5 | 0.9968 | 0.3408 | −0.0032 | 0.993
BSA | 1.1 | 0.01 | 1.0974 | 0.0103 | 0.0974 | <0.001
BSA | 1.1 | 0.05 | 1.0917 | 0.0495 | 0.0917 | 0.074
BSA | 1.1 | 0.1 | 1.0862 | 0.0941 | 0.0862 | 0.367
BSA | 1.1 | 0.5 | 1.0640 | 0.3419 | 0.0640 | 0.853
BSA | 1.5 | 0.01 | 1.4935 | 0.0105 | 0.4935 | <0.001
BSA | 1.5 | 0.05 | 1.4728 | 0.0504 | 0.4728 | <0.001
BSA | 1.5 | 0.1 | 1.4502 | 0.0958 | 0.4502 | <0.001
BSA | 1.5 | 0.5 | 1.3338 | 0.3476 | 0.3338 | 0.345
BSA | 2 | 0.01 | 1.9888 | 0.0109 | 0.9888 | <0.001
BSA | 2 | 0.05 | 1.9502 | 0.0520 | 0.9502 | <0.001
BSA | 2 | 0.1 | 1.9071 | 0.0989 | 0.9071 | <0.001
BSA | 2 | 0.5 | 1.6741 | 0.3575 | 0.6741 | 0.069

n = 100
Method | β_p | σ²_p | β̂ | σ̂ | Bias | p
CRA | – | – | 1.0321 | 0.5435 | 0.0321 | 0.953
BSA | 1 | 0.01 | 0.9993 | 0.0181 | −0.0007 | 0.969
BSA | 1 | 0.05 | 1.0003 | 0.0795 | 0.0003 | 0.997
BSA | 1 | 0.1 | 1.0022 | 0.1383 | 0.0022 | 0.987
BSA | 1 | 0.5 | 1.0137 | 0.3412 | 0.0137 | 0.968
BSA | 1.1 | 0.01 | 1.0958 | 0.0181 | 0.0958 | <0.001
BSA | 1.1 | 0.05 | 1.0853 | 0.0797 | 0.0853 | 0.287
BSA | 1.1 | 0.1 | 1.0762 | 0.1386 | 0.0762 | 0.584
BSA | 1.1 | 0.5 | 1.0503 | 0.3418 | 0.0503 | 0.883
BSA | 1.5 | 0.01 | 1.4824 | 0.0184 | 0.4824 | <0.001
BSA | 1.5 | 0.05 | 1.4261 | 0.0806 | 0.4261 | <0.001
BSA | 1.5 | 0.1 | 1.3732 | 0.1403 | 0.3732 | 0.009
BSA | 1.5 | 0.5 | 1.1972 | 0.3447 | 0.1972 | 0.569
BSA | 2 | 0.01 | 1.9663 | 0.0187 | 0.9663 | <0.001
BSA | 2 | 0.05 | 1.8545 | 0.0825 | 0.8545 | <0.001
BSA | 2 | 0.1 | 1.7474 | 0.1435 | 0.7474 | <0.001
BSA | 2 | 0.5 | 1.3823 | 0.3496 | 0.3823 | 0.277

n = 250
Method | β_p | σ²_p | β̂ | σ̂ | Bias | p
CRA | – | – | 1.0007 | 0.3308 | 0.0007 | 0.998
BSA | 1 | 0.01 | 0.9982 | 0.0272 | −0.0018 | 0.947
BSA | 1 | 0.05 | 0.9962 | 0.1022 | −0.0038 | 0.970
BSA | 1 | 0.1 | 0.9957 | 0.1560 | −0.0043 | 0.978
BSA | 1 | 0.5 | 0.9972 | 0.2703 | −0.0028 | 0.992
BSA | 1.1 | 0.01 | 1.0899 | 0.0273 | 0.0899 | 0.001
BSA | 1.1 | 0.05 | 1.0653 | 0.1026 | 0.0653 | 0.525
BSA | 1.1 | 0.1 | 1.0485 | 0.1565 | 0.0485 | 0.757
BSA | 1.1 | 0.5 | 1.0155 | 0.2707 | 0.0155 | 0.954
BSA | 1.5 | 0.01 | 1.4575 | 0.0279 | 0.4575 | <0.001
BSA | 1.5 | 0.05 | 1.3427 | 0.1044 | 0.3427 | 0.001
BSA | 1.5 | 0.1 | 1.2607 | 0.1589 | 0.2607 | 0.102
BSA | 1.5 | 0.5 | 1.0892 | 0.2724 | 0.0892 | 0.744
BSA | 2 | 0.01 | 1.9186 | 0.0287 | 0.9186 | <0.001
BSA | 2 | 0.05 | 1.6927 | 0.1075 | 0.6927 | <0.001
BSA | 2 | 0.1 | 1.5288 | 0.1628 | 0.5288 | 0.001
BSA | 2 | 0.5 | 1.1818 | 0.2749 | 0.1818 | 0.509

n = 500
Method | β_p | σ²_p | β̂ | σ̂ | Bias | p
CRA | – | – | 1.0069 | 0.2221 | 0.0069 | 0.975
BSA | 1 | 0.01 | 0.9993 | 0.0338 | −0.0007 | 0.983
BSA | 1 | 0.05 | 1.0003 | 0.1049 | 0.0003 | 0.998
BSA | 1 | 0.1 | 1.0014 | 0.1424 | 0.0014 | 0.992
BSA | 1 | 0.5 | 1.0040 | 0.1999 | 0.0040 | 0.984
BSA | 1.1 | 0.01 | 1.0840 | 0.0340 | 0.0840 | 0.014
BSA | 1.1 | 0.05 | 1.0529 | 0.1053 | 0.0529 | 0.616
BSA | 1.1 | 0.1 | 1.0371 | 0.1429 | 0.0371 | 0.795
BSA | 1.1 | 0.5 | 1.0140 | 0.2001 | 0.0140 | 0.944
BSA | 1.5 | 0.01 | 1.4238 | 0.0348 | 0.4238 | <0.001
BSA | 1.5 | 0.05 | 1.2645 | 0.1072 | 0.2645 | 0.014
BSA | 1.5 | 0.1 | 1.1807 | 0.1448 | 0.1807 | 0.213
BSA | 1.5 | 0.5 | 1.0542 | 0.2009 | 0.0542 | 0.787
BSA | 2 | 0.01 | 1.8510 | 0.0360 | 0.8510 | <0.001
BSA | 2 | 0.05 | 1.5317 | 0.1103 | 0.5317 | <0.001
BSA | 2 | 0.1 | 1.3619 | 0.1476 | 0.3619 | 0.015
BSA | 2 | 0.5 | 1.1046 | 0.2020 | 0.1046 | 0.605

β_p and σ²_p, mean and variance of the normal informative prior; –, not applicable (CRA).

3.2. Evaluation on breast cancer data

Table 4 gives the estimates, standard errors, Wald test statistics and hazard ratios of the regression coefficients in the stepwise CRA and stepwise BSA with forward selection for DFS time. In CRA and in BSA with a noninformative prior, age (p = 0.003), tumor size (p < 0.001), hormonal therapy (p = 0.009) and ≥4 positive lymph nodes (p < 0.001) had significant effects on DFS. In BSA with an informative prior, age (p = 0.002), tumor size (p < 0.001), hormonal therapy (p = 0.010), 1–3 positive lymph nodes (p < 0.001) and ≥4 positive lymph nodes (p < 0.001) had significant effects on DFS (Table 4). The parameter estimates and their standard errors for CRA and BSA are shown in Fig. 1. The estimates from BSA with an informative prior, which used historical knowledge as prior information for the parameters, had slightly smaller standard errors than those from CRA and BSA with a noninformative prior. Fig. 2 shows the survival curves for CRA and BSA; the curves obtained from CRA and from BSA with a noninformative prior overlapped. In BSA, the Geweke diagnostic test and autocorrelations indicated reasonably good mixing of the Markov chain for all parameters (p > 0.05).

4. Discussion

We compared the CRA and BSA methods under varying sample sizes using Monte Carlo simulation and applied both methods to identify risk factors for DFS.
Table 4
Results obtained from stepwise analyses for DFS in breast cancer patients.

Method | Independent variable | β̂ | σ̂ | Wald chi-square | p | Hazard ratio
CRA | Age (year) | 0.0226 | 0.0076 | 8.8274 | 0.003 | 1.023
 | Tumor size (cm) | 0.1314 | 0.0360 | 13.3168 | <0.001 | 1.140
 | Hormonal therapy: present | −0.5037 | 0.1917 | 6.9051 | 0.009 | 0.604
 | Axillary nodal status: 1–3 lymph nodes positive | 0.5194 | 0.2731 | 3.6178 | 0.057 | 1.681
 | Axillary nodal status: ≥4 lymph nodes positive | 1.1819 | 0.2430 | 23.6517 | <0.001 | 3.261
BSA with noninformative prior | Age (year) | 0.0225 | 0.0077 | 8.5867 | 0.003 | 1.023
 | Tumor size (cm) | 0.1289 | 0.0365 | 12.4488 | <0.001 | 1.138
 | Hormonal therapy: present | −0.5005 | 0.1920 | 6.7933 | 0.009 | 0.606
 | Axillary nodal status: 1–3 lymph nodes positive | 0.5252 | 0.2724 | 3.7190 | 0.054 | 1.691
 | Axillary nodal status: ≥4 lymph nodes positive | 1.1982 | 0.2433 | 24.2468 | <0.001 | 3.314
BSA with informative prior | Age (year) | 0.0237 | 0.0075 | 10.0554 | 0.002 | 1.024
 | Tumor size (cm) | 0.1655 | 0.0303 | 29.9290 | <0.001 | 1.180
 | Hormonal therapy: present | −0.4893 | 0.1908 | 6.5785 | 0.010 | 0.613
 | Axillary nodal status: 1–3 lymph nodes positive | 0.6462 | 0.1364 | 22.4580 | <0.001 | 1.908
 | Axillary nodal status: ≥4 lymph nodes positive | 1.2006 | 0.1330 | 81.4679 | <0.001 | 3.322
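Each Wald statistic in Table 4 is the squared ratio of an estimate to its standard error, referred to a chi-square distribution with one degree of freedom, and each hazard ratio is exp(β̂). The short SciPy check below recomputes both for the CRA rows; because it starts from the rounded table values, the Wald statistics agree with the reported ones only to about two decimals.

```python
import numpy as np
from scipy.stats import chi2

# CRA rows of Table 4: (covariate, beta_hat, SE, reported Wald, reported HR)
cra_rows = [
    ("Age (year)",                 0.0226, 0.0076,  8.8274, 1.023),
    ("Tumor size (cm)",            0.1314, 0.0360, 13.3168, 1.140),
    ("Hormonal therapy: present", -0.5037, 0.1917,  6.9051, 0.604),
    ("1-3 lymph nodes positive",   0.5194, 0.2731,  3.6178, 1.681),
    (">=4 lymph nodes positive",   1.1819, 0.2430, 23.6517, 3.261),
]

results = {}
for name, beta, se, wald_reported, hr_reported in cra_rows:
    wald = (beta / se) ** 2     # Wald chi-square statistic, 1 degree of freedom
    p = chi2.sf(wald, df=1)     # upper-tail p-value
    hr = np.exp(beta)           # hazard ratio = exp(beta_hat)
    results[name] = (wald, p, hr)
    print(f"{name:27s} Wald = {wald:8.4f} (reported {wald_reported:8.4f})  "
          f"p = {p:.3f}  HR = {hr:.3f} (reported {hr_reported:.3f})")
```

A negative coefficient, as for hormonal therapy, corresponds to a hazard ratio below 1, that is, a lower hazard of recurrence.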
[Fig. 1 plot: parameter estimates with ±SE bars for Age, Tumor Size, Hormonal Therapy, Lymph Nodes Positive 1–3 and Lymph Nodes Positive ≥4; series: CRA, BSA-I, BSA-II]
Fig. 1. Estimates and their standard errors (SE) of regression coefficients obtained from stepwise CRA, BSA with informative prior (BSA-I) and BSA with noninformative prior (BSA-II) for DFS in breast cancer patients.
Fig. 2. Survival curves obtained from stepwise CRA, BSA with informative prior (BSA-I) and BSA with noninformative prior (BSA-II) for DFS in breast cancer patients.
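The paper does not state how the survival curves in Fig. 2 were computed. One common approach for a fitted Cox model is S(t | x) = exp{−H_0(t) exp(βx)}, with the Breslow estimator of the cumulative baseline hazard H_0(t). The sketch below assumes a single covariate, no tied event times and hypothetical toy data (not the study data); plugging in a BSA posterior mean for β is a simplification of averaging the curve over posterior draws.

```python
import numpy as np


def breslow_survival(time, x, event, beta, x_new):
    """S(t | x_new) = exp(-H0(t) * exp(beta * x_new)) with a Breslow baseline.

    Assumes one covariate and no tied event times.
    """
    order = np.argsort(time)                      # ascending time
    t, xs, d = time[order], x[order], event[order]
    w = np.exp(beta * xs)
    risk_sum = np.cumsum(w[::-1])[::-1]           # sum of exp(beta x_l) over {l: t_l >= t_i}
    H0 = np.cumsum(d / risk_sum)                  # increments only at event times
    return t, np.exp(-H0 * np.exp(beta * x_new))


# Hypothetical toy data (not from the study)
toy_time = np.array([5.0, 8.0, 12.0, 15.0, 20.0, 22.0, 30.0, 41.0])
toy_x = np.array([0.2, 0.9, 0.5, 0.1, 0.7, 0.3, 0.8, 0.4])
toy_event = np.array([1, 1, 0, 1, 1, 0, 1, 0])
beta_hat = 1.0  # e.g. a CRA estimate or a BSA posterior mean

t, s_low = breslow_survival(toy_time, toy_x, toy_event, beta_hat, x_new=0.0)
_, s_high = breslow_survival(toy_time, toy_x, toy_event, beta_hat, x_new=1.0)
for ti, a, b in zip(t, s_low, s_high):
    print(f"t = {ti:5.1f}  S(t | x=0) = {a:.3f}  S(t | x=1) = {b:.3f}")
```

When two methods give nearly identical β̂, as CRA and BSA with a noninformative prior do here, curves built this way will also nearly coincide.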
One major distinction between BSA and CRA is their interpretation. The BSA method reflects subjective beliefs. CRA assumes that unknown parameters are fixed constants, and it defines probability by using limiting relative frequencies. BSA treats parameters as random variables and defines probability as degrees of belief. It provides inferences that are conditional on the data and Bayesian inference is exact rather than asymptotic (Gelfand & Mallick, 1995; SAS Institute, 2006).
Prior information plays the most crucial role in BSA, and BSA cannot be used for any modeling without a prior distribution. When useful prior information about the coefficients is available, it should be used. Sometimes no prior information is available about any of the model parameters, and what is often referred to as a noninformative prior density is used. Calle et al. (2006) and Ibrahim et al. (2001) reported that Bayesian and classical approaches usually lead to similar conclusions when no additional external information is available. In our simulation studies, when prior information (historical knowledge and expert opinion) was not available and a noninformative prior was used for inference, BSA produced results very similar to those of CRA for varying sample sizes. Ibrahim et al. (2001) noted that although noninformative and improper priors may be useful and easier to specify for certain problems, they cannot be used in all applications (for example, model selection or comparison). Few studies have been published on the BSA method. Yin and Ibrahim (2006) carried out a simulation study using BSA with varying sample sizes, 1000 replications, 5000 Gibbs samples and 200 burn-in samples, and analyzed a real data set from a melanoma clinical trial. They found that the posterior standard deviation increases as the censoring rate increases. Calle et al. (2006) used results for fat-free yogurt to construct prior information for calculating the posterior distributions for whole-fat yogurt, which led to small improvements in the posterior distributions. Wong et al. (2005) used BSA to investigate the effectiveness of silver diamine fluoride and sodium fluoride varnish in arresting active dentin caries in Chinese pre-school children. They warned that the additional complexity of Bayesian methods could lead to inappropriate data analysis if the methods are not used correctly.
In these simulation studies, prior information played a crucial role in estimating the simulation parameter. The performance of BSA improved when correctly specified prior information with a small variance was used. Conversely, the bias of the BSA estimate increased when misspecified prior information was used, at all sample sizes, and it increased further when the misspecified prior had a small variance. Gelfand and Mallick (1995) stated that, for smaller data sets, the Bayesian approach would be expected to provide more believable estimates of variability than likelihood analysis. Gelman (2002a) noted that the prior distribution is a key part of Bayesian inference: with well-identified parameters and large sample sizes, reasonable choices of prior distribution have minor effects on posterior inferences, but if the sample size is small or the available data provide only indirect information about the parameters of interest, the prior distribution becomes more important. Similarly, in this study, we found that
BSA performed best when a correctly specified informative prior was used for smaller data sets. For smaller data sets, both the bias and the standard error of the parameter estimate were lower in BSA than in CRA. Ibrahim et al. (2001) and Wong et al. (2005) reported that BSA is more advantageous than CRA in terms of flexibility of model building for complex data; in our simulations and breast cancer study, BSA was more advantageous than CRA when it used informative, correctly specified prior information. However, BSA that used misspecified prior information had no advantage over CRA. Informative prior information used for analyzing data with BSA should therefore always be correctly specified. As a result, BSA performed better than CRA when the analysis incorporated expert opinion and historical knowledge about the parameters. Consequently, BSA should be preferred when reliable, correctly specified informative priors exist; otherwise, CRA should be preferred.

Acknowledgement

We would like to express our gratitude to Dr. Fusun Tokatli for providing the breast cancer data and for his useful comments.

References

Ahmed, F. E., Vos, P. W., & Holbert, D. (2007). Modeling survival in colon cancer: A methodological review. Molecular Cancer, 6, 15.
Bender, R., Augustin, T., & Blettner, M. (2005). Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 24(11), 1713–1723.
Calle, M. L., Hough, G., Curia, A., & Gómez, G. (2006). Bayesian survival analysis modeling applied to sensory shelf life of foods. Food Quality and Preference, 17(3–4), 307–312.
Congdon, P. (2003). Applied Bayesian modeling. England: John Wiley & Sons.
Congdon, P. (2006). Bayesian statistical modelling. England: John Wiley & Sons.
Foekens, J. A., Schmitt, M., Van Putten, W. L. J., Peters, H. A., Bontenbal, M., Janicke, F., et al. (1992). Prognostic value of urokinase-type plasminogen activator in 671 primary breast cancer patients. Cancer Research, 52, 6101–6105.
Gasparini, G., Toi, M., Gion, M., Verderio, P., Dittadi, R., Hanatani, M., et al. (1997). Prognostic significance of vascular endothelial growth factor protein in node-negative breast carcinoma. Journal of the National Cancer Institute, 89(2), 139–147.
Gelfand, A. E., & Mallick, B. K. (1995). Bayesian analysis of proportional hazards models built from monotone functions. Biometrics, 51(3), 843–852.
Gelman, A. (2002a). Posterior distribution. Encyclopedia of Environmetrics, 3, 1627–1628.
Gelman, A. (2002b). Prior distribution. Encyclopedia of Environmetrics, 3, 1627–1628.
Hosmer, D. W., & Lemeshow, S. (1999). Applied survival analysis: Regression modeling of time to event data. Canada: John Wiley & Sons.
Ibrahim, J. G., Chen, M. H., & Sinha, D. (2001). Bayesian survival analysis. New York: Springer-Verlag.
Kleinbaum, D. G., & Klein, M. (1996). Survival analysis: A self-learning text. USA: Springer.
Pinto, A. E., Andre, S., & Soares, J. (1999). Short-term significance of DNA ploidy and cell proliferation in breast carcinoma: A multivariate analysis of prognostic markers in a series of 308 patients. Journal of Clinical Pathology, 52, 604–611.
Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods. New York: Springer.
Rowlings, P. A., Williams, S. F., Antman, K. H., Fields, K. K., Fay, J. W., Reed, E., et al. (1999). Factors correlated with progression-free survival after high-dose chemotherapy and hematopoietic stem cell transplantation for metastatic breast cancer. JAMA, 282, 1335–1343.
SAS Institute. (2006). Preliminary capabilities for Bayesian analysis in SAS/STAT® Software. Cary, NC, USA: SAS Institute Inc.
Wong, M. C. M., Lam, K. F., & Lo, E. C. M. (2005). Bayesian analysis of clustered interval-censored data. Journal of Dental Research, 84(9), 817–821.
Yin, G., & Ibrahim, J. G. (2006). Bayesian transformation hazard model. In Proceedings of the Second Lehmann Symposium: Optimality, IMS Lecture Notes–Monograph Series (Vol. 49, pp. 170–182).
Yu, H., Giai, M., Diamandis, E. P., Katsaros, D., Sutherland, D. J., Levesque, M. A., et al. (1995). Prostate-specific antigen is a new favorable prognostic indicator for women with breast cancer. Cancer Research, 55(10), 2104–2110.