MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Mathematical Biosciences xxx (2013) xxx–xxx 1
Contents lists available at SciVerse ScienceDirect
Mathematical Biosciences journal homepage: www.elsevier.com/locate/mbs
Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy
2 3 4
Q1
a
5 6
b
7 8
Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
a r t i c l e
1 2 0 2 11 12 13 14 15 16 17 18 19 20 21
Taiji Suzuki a, Kazuyuki Aihara a,b,⇑
i n f o
Article history: Available online xxxx
Q3
Keywords: Hormone therapy Prostate cancer Intermittent androgen suppression System identification Variational Bayes Gaussian process Piecewise affine system
a b s t r a c t These days prostate cancer is one of the most common types of malignant neoplasm in men. Androgen ablation therapy (hormone therapy) has been shown to be effective for advanced prostate cancer. However, continuous hormone therapy often causes recurrence. This results from the progression of androgen-dependent cancer cells to androgen-independent cancer cells during the continuous hormone therapy. One possible method to prevent the progression to the androgen-independent state is intermittent androgen suppression (IAS) therapy, which ceases dosing intermittently. In this paper, we propose two methods to estimate the dynamics of prostate cancer, and investigate the IAS therapy from the viewpoint of optimality. The two methods that we propose for dynamics estimation are a variational Bayesian method for a piecewise affine (PWA) system and a Gaussian process regression method. We apply the proposed methods to real clinical data and compare their predictive performances. Then, using the estimated dynamics of prostate cancer, we observe how prostate cancer behaves for various dosing schedules. It can be seen that the conventional IAS therapy is a way of imposing high cost for dosing while keeping the prostate cancer in a safe state. We would like to dedicate this paper to the memory of Professor Luigi M. Ricciardi. Ó 2013 Elsevier Inc. All rights reserved.
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
39 40
1. Introduction
41
The purpose of this paper is to propose methods to predict the behavior of prostate cancer by identifying its dynamics. On the basis of the identified dynamics, we analyze the effect of intermittent androgen suppression (IAS) therapy in comparison to continuous androgen suppression (CAS) therapy on the trade-off between the amounts of cancer cells and dosing. In IAS therapy, dosing is intermittently ceased to avoid the proliferation of drug-resistant cancer cells. The American Cancer Society estimates that prostate cancer strikes approximately 16% of American men [29]. Prostate cancer continues to be the most common type of cancer in men and the second leading cause of cancer death, after lung cancer, in the United States. Androgen ablation therapy has been the mainstay of treatment for advanced prostate cancer, following the pioneering work of Huggins and Hodges [19,20], and it makes use of the amenability of prostate cancer to androgen. Secretory epithelial cells included in the prostate have androgen receptors, which are responsible
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
⇑ Corresponding author at: Institute of Industrial Science, The University of
Q2
Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan. Tel.: +81 3 5452 6691; fax: +81 3 5452 6692. E-mail address:
[email protected] (K. Aihara).
for the androgen dependence of the growth of prostate cancer. When androgen is withdrawn, the secretory epithelial cells go into apoptosis [7,26]. This phenomenon is a key point of androgen ablation therapy. In the treatment, the amount of androgen is decreased through medical castration, and the withdrawal causes a decrease of androgen-dependent tumor cells. In fact, CAS therapy for 3 months lets about 90% of patients have normal levels of the serum prostate-specific antigen (PSA), which is a tumor marker. However, the CAS therapy often results in recurrence, where the growth of prostate cancer cannot be managed by the hormone therapy. The main reasons for the ineffectiveness of the CAS therapy are adaptation and mutation to androgen-independent cells as well as clonal selection. Upon complete androgen withdrawal, the surviving tumor cells generally adapt to the androgen-depleted environment and progress to an androgen-independent state. Furthermore, permanent androgen ablation might even promote growth and progression of androgen-independent cells [11,25]. However, if androgen ablation therapy is stopped before the clonogenic tumor cells progress to an androgen-independent state, we may retain the amenability of the surviving cells to androgen and avoid the selective survival of preexisting androgen-independent cells and the mutation of androgen-dependent cells to the androgen-independent state [14]. Since excellent treatment involving pharmacological castration has been developed, we can stop and restart androgen ablation at any time. Recently, many
0025-5564/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.mbs.2013.04.007
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
138 139 140 141 142 143 144 145 146
2
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
articles have shown the effectiveness of the IAS therapy [2,10,12,14,15,17,21,23] in clinical studies, and the methodology of the treatment has been well established. Another reason why intermittent therapy is preferred is because it prevents side effects. Androgen withdrawal can cause several side effects such as hot flashes, loss of libido, impotence, adynamia, osteoporosis, fatigue, cognitive dysfunction, and depression. Severe side effects pose serious problems since they can impair the quality of life. Thus, it is noteworthy that well-manipulated IAS can not only lead to prolongation of the androgen-dependent state but also improve the quality of life. Along with clinical study, it would be useful to devise a method for identifying the dynamics of the tumors and predicting their future behavior [18,22]. Toward this purpose, we utilize techniques developed in computer science. In this paper, we consider prostate cancer as a system that is driven by unknown dynamics. The identification of unknown dynamics of a system is called ‘‘system identification.’’ There are two main approaches for general system identification. One is a bottom–up approach, which is used to construct a model from previous knowledge about an objective system. This is strongly dependent on the pertinency of the modeler’s knowledge. If the objective system is not well investigated or the model is not sufficiently polished, this approach fails. The other approach is a black-box (or a non-parametric) approach, which is used to construct a model by investigating only input– output relations with minimum assumptions on an objective system. In other words, the black-box approach aims to directly predict the future values of PSA given the medical history observed so far. This is essentially the non-parametric estimation of a functional that maps an input sequence (medical history) to an output sequence (future values of PSA). Each of these two approaches has both pros and cons. In this paper, we investigate the black-box approach. In particular, we discuss two methods, namely, a variational Bayesian (VB) method and a Gaussian process method. In the former method, we employ a piecewise Affine (PWA) system to model the dynamics of prostate cancer. The PWA system is one of the hybrid systems [1,16,30,34]. Recently, a method with the optimal control of a PWA hybrid system model was proposed to control the behavior of prostate cancer [31]. The Gaussian process method, on the other hand, is based on the Gaussian process regression technique [27]. Gaussian process regression is one of the most popular methods for non-parametric regression in statistics and machine learning. This method is computationally efficient and has a solid theoretical basis. We compare these two methods in terms of their predictive accuracy. The second purpose of this paper is to evaluate the conventional IAS therapy from the viewpoint of optimality. We will show that the system identification methods described above can reconstruct the real evolution of the PSA well. On the basis of the model constructed in the black-box approach, we compare the IAS therapy with the CAS therapy by determining a weighted sum of the PSA value and the dosing amount. It is observed that the conventional IAS treatment in a clinical situation can be said to be a therapy that minimizes dosing while keeping the PSA in a safe state.
2. Modeling prostate cancer’s dynamics with a nonlinear autoregressive eXogeneous model In this section, we discuss the modeling of the dynamics of the PSA with a nonlinear Auto-Regressive eXogeneous (ARX) model (see Fig. 1 for actually observed data). Then, we show two non-parametric system identification methods based on a nonlinear ARX model: a variational Bayesian method utilizing PWA regression and a Gaussian process regression method. Roughly speaking, the nonlinear ARX model describes a nonlinear relation between the future
PSA values and the medical history (PSA and dosing history). The system identification problem of a nonlinear ARX model can be reduced to a (nonlinear) regression problem. The first approach with the variational Bayesian method can be used along with the PWA regression model for solving the regression problem. The PWA model is particularly important in the context of optimal control [4,5]. In the second approach with the Gaussian process method, a smooth function is fit to the data in a nonparametric Bayesian manner with Gaussian process priors.
147
2.1. ARX model for PSA dynamics
156
Here, we introduce the ARX model that describes the dynamics of the PSA. Now, we assume that patients’ PSA values are observed on a regular schedule.1 We denote by qt the actual PSA value at the tth time step. Let yt 2 R be the log of the PSA value at time step t and let it be given by yt ¼ logðqt Þ.2 Let ut 2 f0; 1g represent the on–off state of the androgen ablation: ut ¼ 1 for on treatment and ut ¼ 0 for off treatment. Then the nonlinear ARX model can be described as
157
yt ¼ f ðyt1 ; yt2 ; . . . ; ytm ; ut ; . . . ; utm Þ þ t ;
148 149 150 151 152 153 154 155
158 159 160 161 162 163
164 166
where m is the maximum time delay, f is an unknown nonlinear function, and t is i.i.d. Gaussian noise with mean 0 and unknown variance. By identifying f, it becomes possible to predict the future value of PSA given the medical history. The estimation of the dynamics is reduced to the estimation of the nonlinear function f (and the unknown variance of the noise), which is a regression problem. We estimate f from the observed data D ¼ fyt ; ut gTt¼1 . We propose two approaches based on Bayesian estimation to estimate f.
167
2.2. Piecewise affine ARX model by variational Bayesian inference
175
Here, we introduce the PWA model for the auto-regressive function f. Let xt be given by
176
xt ¼ ½yt1 ; yt2 ; . . . ; ytm ; ut ; ut1 ; . . . ; utm > : The ordinary (linear) ARX model employs a linear function as the function f as follows:
where w 2 R2mþ1 . In the case of the PWA model that is of interest to us, f is given as
f ðxt Þ ¼
w>i
xt 1
169 170 171 172 173 174
177
178 180 181 182
183 185
f ðxt Þ ¼ w> xt ;
168
186 187
188
;
if xt 2 X i ;
190
where wi 2 R2mþ2 ; fX i gM i¼1 is a polyhedral decomposition of the state S 2mþ1 space R2mþ1 , that is, M ; X i \ X j ¼ ;ði – jÞ, and each X i i¼1 X i ¼ R
191
can be represented as X i ¼ fz 2 R2mþ1 jHi z 6 K i ; F i z < Gi g for appropriate Hi ; K i ; F i , and Gi . It is known that many kinds of problems appearing in the analysis of a PWA hybrid system can be reduced to mixed integer programming (MIP) problems. In particular, the optimal control problem of a PWA hybrid system is reduced to a MIP problem [4,5]. We need to determine the number M of polyhedral regions, the M polyhedral decompositon fX i gM i¼1 , and the parameters fwi gi¼1 . Here, we propose a method that uses a variational Bayesian technique to identify the PWA system. Now, let d ¼ 2m þ 1 and
193
1
In this paper, the time interval between each observation is 21 days. Because of the log-transformation, yt can take any real value while the PSA value qt ¼ expðyt Þ is always ensured to be positive. Therefore, if we estimate the dynamics of yt instead of that of qt , we do not need to impose any constraint on the dynamics such as positivity. This gives us much flexibility to develop a non-parametric system identification method. This technique is often utilized to model a positive time series, e.g., stock prices. 2
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
192
194 195 196 197 198 199 200 201 202
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1
3
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
Fig. 1. The evolution of the PSA and on–off of androgen ablation. The blue solid line shows the PSA, the red dashed line shows the on–off of androgen ablation (up is on, and down is off), and the gray broken line indicates the end of the estimation period. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)
203 204 205 206
207
209 210 211 212 213 214 215 216 217
218
N ¼ T m. By abuse of notation, we denote by fðxn ; yn ÞgNn¼1 the observed input–output pairs. First, we model the distribution of the input vector x as the Gaussian mixture distribution as follows:
pðxjW; MÞ ¼
M X
1 i ; Si Þ;
ui N ðxjl
ð1Þ
i¼1
PM M i ; Si gi¼1 ; i¼1
where W ¼ fui ; l ui ¼ 1; ui P 0 (i ¼ 1; . . . ; M), and N ðxjl; VÞ is a Gaussian distribution with mean l and covariance matrix V. This is a mixture of M Gaussian components N ðxjli ; S1 i Þ (i ¼ 1; . . . ; M). Then, we introduce a latent variable, Z ¼ fzni gM;N i¼1;n¼1 , that indicates which component the nth sample xn belongs to. If the nth sample belongs to the ith Gaussian component, then zni ¼ 1; otherwise zni ¼ 0. Thus, the distribution of xn for zni ¼ 1 is given as 1 i ; Si Þ:
220
pðxn jzni
221
Second, we model the distribution of the output (yn ) conditioned by the input (xn ) and the latent variable (fzni gM i¼1 ) as
222
223 225 226 227
228
230 231 232 233 234 235
¼ 1; WÞ ¼ N ðxn jl
pðyn jxn ; zni ¼ 1; HÞ ¼ N ðyn jw>i ½xn ; 1; b1 i Þ; w> i
ð2Þ
fwi ; bi gM i¼1 .
dþ1
where 2 R ; bi 2 Rþ , and H ¼ Then, we obtain the (joint) likelihood of D and Z as
pðD; ZjW; H; MÞ ¼
N Y M n Y
o zn i 1 T ui N ðxn jli ; S1 : i ÞN ðyn jwi ½xn ; 1; bi Þ
n¼1 i¼1
On the basis of this likelihood, we estimate the parameters W; H, and Z in a Bayesian manner. As a prior distribution, we employ a so-called conjugate prior that is convenient to compute the posterior (see B for the details). Since the analytical form of the posterior distribution is not available despite the availability
of a conjugate prior, we need to use a computational method to obtain the posterior distribution. Roughly speaking, there are two types of computational methods to obtain the Bayesian estimator; one involves a Gibbs sampler (Markov chain Monte Carlo approach) and the other involves a variational Bayesian (VB) estimator. In this paper, we employ the latter method since the VB approach is computationally more efficient (but less accurate). M The technical detail to estimate fwi gM i¼1 and fX i gi¼1 is described in B.
236
2.3. Nonlinear ARX model by Gaussian process regression
245
Here, we introduce the second approach based on Gaussian process regression [27]. In the Gaussian process regression, we put a non-parametric Gaussian process prior GP on a functional space. A zero-mean Gaussian process ~f GP is a random variable on a function space (e.g., L1 ðRd Þ-space) such that each finite subset ð~f ðx1 Þ; . . . ; ~f ðxj ÞÞðj ¼ 1; 2; . . .Þ obeys a zero-mean multivariate normal distribution. The behavior of the Gaussian process is determined by the kernel function k : Rd Rd ! R, that is, the covariance function defined by
246
h i kðx; x Þ :¼ E~f GP ~f ðxÞ~f ðx0 Þ :
255
0
237 238 239 240 241 242 243 244
247 248 249 250 251 252 253 254
257
Then, letting f ¼ ½~f ðx1 Þ; . . . ; ~f ðxn Þ and f ¼ ~f ðxÞ for a newly ob> served point x 2 Rd , the joint prior distribution of ½f f > is given as
258
K 11 > pð½f f Þ ¼ N ½f f > 0; K >12
260
>
> >
K 12 K 22
ðkðxn ; xn0 ÞÞN;N n¼1;n0 ¼1 ; K 12
;
262 >
where K 11 ¼ ¼ ½kðx1 ; xÞ; kðx2 ; xÞ; . . . ; kðxN ; xÞ , and K 22 ¼ kðx; xÞ. Now, since we assumed noise t to be Gaussian,
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
259
263 264
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1
4
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
266
the likelihood function of ~f is also given by a Gaussian distribution as follows:
269
pðyjfxn gNn¼1 ; ~f Þ ¼ N ðyjf; r2 IÞ;
270
where y ¼ ½y1 ; . . . ; yN > , and r is the standard deviation of n . Note that the likelihood function depends only on f. Thus, through some calculations, we observe that the posterior distribution of f obeys the Gaussian distribution, with the mean f and variance var ðf Þ given by
265
267
271 272 273 274
275
f ¼ K > ðK 11 þ r2 IÞ1 y; 12 277 278 279 280 281 282 283
284
var ðf Þ ¼ K22
K>12 ðK11
2
1
þ r IÞ K12 :
P Here, the posterior mean f can be rewritten as f ¼ Nn¼1 kðx; xn Þan , 2 1 where a ¼ ðK 11 þ r IÞ y. Therefore, the estimated value of f ðxÞ can be calculated by simply summing up the values kðx; xn Þ of the kernel function. In our experiments, we employed a Gaussian kernel as the kernel function as follows: 0 2
!
286
kx x k kðx; x Þ ¼ exp 2r02
287
The Gaussian width
288
from the center such that r0
0
:
r0 was set to be the mean squared distance
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi P P ¼ N1 Nn¼1 kxn N1 Nn0 ¼1 xn0 k2 . Moreover
r was estimated by 5-fold cross validation. 2
289
the noise variance
290
3. Long-term and short-term predictability
291
In this section, we evaluate the predictability of the proposed methods for both long-term and short-term horizons. Fig. 1 shows the evolution of the PSA and on–off of androgen ablation for five patients.3 The PSA values were observed in discrete times. We connected the observed data point with solid lines in Fig. 1. Using the data shown in Fig. 1, we estimated the dynamics of each patient with the two methods introduced in Section 2. We discretized the observation time. We employed 21 days as the time interval between each time step, that is, y1 is the log of the PSA value on the day 0, y2 is the log of the PSA value on day 21, yt is the log of the PSA value on day 21 ðt 1Þ, and so on. Then, we used the first T est time steps for the estimation of the dynamics, and the rest est of the data were used for the evaluation, i.e., fðyt ; ut ÞgTt¼1 were used T end for estimation and fðyt ; ut Þgt¼T est þ1 were used for evaluation, where T end is the entire length of the observed data. In Fig. 1, T est for each patient is indicated by the gray broken line. The actual values of T est are T est ¼ 62; 62; 82; 67, and 51 (these T est values correspond to 1281; 1281; 1701; 1386, and 1050 days, respectively). Moreover, we employed m ¼ 5 since that gave the best predictive accuracy. To evaluate how accurately the methods estimate the dynamics, we considered the predictive accuracies for two cases: a long-term horizon and a short-term horizon.
292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312
313
3.1. Long-term prediction
314
First, let us evaluate the predictive accuracies for a long-term T end horizon. We estimated the PSA value ðqt Þt¼T after the estimation est þ1 period by running the estimated dynamics, with the initial setting determined by the observed values: xT est þ1 ¼ ½yT est ; . . . ; yT est mþ1 ;
315 316 317
3
The data in this paper were taken from a prospective phase II clinical trial of IAS for men with evidence of serum PSA recurrence after radiotherapy for locally advanced cancer [9]. This trial was approved by the Health Protection Branch of the Ministry of Health of Canada and by the Clinical Research Ethics Review Boards of the participating centres. All patients provided written informed consent in accordance with institutional guidelines. The data are available at http://www.nicholasbruchovsky.com/clinicalResearch.html. The patients shown in the figure correspond to patient numbers 1, 28, 37, 55, and 91 in the data.
Table 1 Predictive errors for the long-term horizon. For the variational Bayes approach, the average error along with the standard deviation over 10 repetitions is shown. Patient no.
Variational Bayes
1 2 3 4 5
0.67 1.94 1.21 0.76 0.76
(0.09) (0.40) (0.09) (0.17) (0.08)
Gaussian process 0.78 2.03 1.02 0.99 0.55
uT est þ1 ; . . . ; uT est mþ1 . We used the actual input ut to run the estiend ^t ÞTt¼T mated dynamics. Let the estimated PSA sequence be ðq . est þ1 We then evaluated the predictive error as the following sum of the absolute values of the discrepancy from the actual PSA values:
predictive error ¼
T end X
^t qt j: jq
318 319 320 321
322
ð3Þ
t¼T est þ1
324
The predictive errors of the VB approach and Gaussian process approach are presented in Table 1. Since the result of the VB approach varied with the initial setting (because of local optimum), we randomly generated the initial settings 10 times and averaged the predictive errors over the 10 repetitions (in Table 1, the standard deviation over the 10 repetition is also presented in the parentheses). We can see that the better method depends on the patients. With regard to the total error across different patients, the VB approach is slightly better. Examples of the estimated time series of the PSA are shown in Fig. 2. It was reported that patient 1 changed the medicine for androgen ablation in the last cycle, therefore, making a prediction for the last cycle should be a hard problem.
325
3.2. Short-term prediction
338
Next, we evaluate the predictive accuracies for a short-term horizon. We iteratively estimate the actually observed PSA qt by ^t ¼ ^f ðxt Þ, using xt given by the actually observed yt and ut as q where ^f is the estimated auto-regressive function. This situation corresponds to a one-step prediction. Then, as in the case of the long-term prediction, we evaluate the predictive error by using Eq. (3). The short-term predictive errors of the VB approach and Gaussian process approach are presented in Table 2. Here, we again randomly generated the initial settings of the VB approach 10 times and averaged the predictive errors over the 10 repetitions (the standard deviation over the 10 repetition is also presented in the parenthesis). We can see that overall, the Gaussian process approach shows better performances than the VB approach. This is different from the result of the long-term prediction, where both methods showed almost similar performances. As expected, the predictive errors for the short-term horizon is smaller than those for the long-term horizon (Table 2).
339
4. Optimality of IAS
357
In this section, we discuss optimality of the conventional IAS by investigating what kind of criterion IAS optimizes. For the investigation, we perform a simulation on the estimated dynamics for various types of dosing schedules and compare the values of some cost function. We examine patient 1 in Fig. 1 and apply the variational Bayesian system identification method to the data of patient 1. We use the entire time series for the estimation, and the parameter settings are the same as those in Section 3. Fig. 3 shows the simulated sequence. The on–off of androgen ablation is switched when the PSA value crosses the magenta border lines (13 and 1.8). Here since the observation time is
358
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
326 327 328 329 330 331 332 333 334 335 336 337
340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356
359 360 361 362 363 364 365 366 367 368
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
5
Fig. 2. Estimated PSA time series. The blue solid line shows the actually observed PSA, the green solid line shows the estimated time series of the PSA, the red dashed line shows the on–off of androgen ablation (up is on, and down is off), and the gray broken line indicates the end of the estimation period. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)
374 Table 2 Predictive errors for the short-term horizon. For the variational Bayes approach, the average error along with the standard deviation over 10 repetitions is shown. Patient no.
Variational Bayes
1 2 3 4 5
0.17 1.03 0.40 0.40 0.41
(0.03) (0.47) (0.09) (0.06) (0.08)
Gaussian process 0.06 0.87 0.24 0.21 0.19
Fig. 3. Simulated PSA time series on the estimated system. The blue solid line shows the simulated PSA, the red dashed line shows the on–off of androgen ablation (up is on, and down is off), and the magenta broken lines indicate the threshold values of the on–off switching of androgen ablation. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)
369 370 371 372 373
discrete where the time interval is 21 days long, there appear delays of switching. We can see from Fig. 3 that the estimated system reproduces the behavior of the actually observed data well. Now, we consider the following cost function of an input (on–off of androgen ablation) sequence ðut ÞTt¼1 :
T X ^t þ ð1 kÞut g; Xk ðuÞ ¼ fkq
ð4Þ
t¼1
^t ÞTt¼1 ðq
where is the simulated PSA sequence for the input ðut ÞTt¼1 ; k 2 ½0; 1 gives the weight for the cost, and T is the number of time steps. We generated five input sequences ðut ÞTt¼1 by adjusting the on and off periods of androgen ablation. The input sequences were de^t does not exceed 20, namely the oversigned so that the PSA q growth of prostate cancer is inhibited. Then, we observed the cost function of Eq. (4) for these five generated input sequences ^t 6 20 under the settings of k ¼ 0 or 0:1 and T ¼ 80. The with q low value of the cost function with k ¼ 0 points to the regulation ^t in a safe region while suppressing therapy as much as possiof q ble. In contrast, when k ¼ 0:1, in addition to suppressing dosing, ^t should also be suppressed to obtain a small value the PSA value q of the cost function. The results are shown in Fig. 4, in which the five input sequences are aligned according to the values of the corresponding cost functions. The input sequence corresponding to the top-left and right-bottom panels in Fig. 4 (the lowest cost function for k ¼ 0 and the highest cost function for k ¼ 0:1) is an imitation of the actually treated input sequence in the clinical data. The on and off periods are set to be the average ones of the clinical data. Since this input sequence minimizes the cost function Xk ðuÞ for k ¼ 0, we can say that the conventional IAS is a way of making the PSA stay in a safe region while skipping the therapy as long as possible. On the other hand, we can see that the short-term cessation of androgen ablation gives a larger cost for k ¼ 0 and a smaller cost for k ¼ 0:1. Such a treatment suppresses the PSA value very well. However, it imposes a larger amount of dosing, which might diminish the quality of life of the patient. Here, note that since the original clinical data do not show recurrence, the identified system also does not show recurrence. Therefore, permanent androgen ablation does not lead to
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1
6
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
Fig. 4. Ordering various input sequences according to their cost function values. The left column corresponds to k ¼ 0, and the right column corresponds to k ¼ 0:1. From top to bottom, the cost function increases. 409 410 411 412
recurrence in the identified system. We should point out that although the CAS therapy minimizes the cost function with k ¼ 0:1, there is a high possibility that it finally causes irreversible recurrence.
413
5. Summary and discussion
414
5.1. Summary
415
428
In this paper, we proposed two methods to estimate the dynamics of prostate cancer, namely the variational Bayesian (VB) method for a piecewise affine (PWA) system and the Gaussian process regression method. We compared the two methods in two cases: namely, long-term prediction and short-term prediction. In the case of long-term prediction, both methods showed almost similar performances. In the case of short-term prediction, on the other hand, the Gaussian process regression method slightly outperformed the VB method. Moreover, on the basis of an estimated system, we investigated the optimality of the conventional IAS therapy. For various dosing schedules, the conventional IAS therapy was observed to be an optimal method that minimizes the amount of dosing while keeping the PSA value below the risky level.
429
5.2. Discussion and future work
416 417 418 419 420 421 422 423 424 425 426 427
430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445
A favorable point of the methods presented in this paper is the computational efficiency. The parametric model proposed by [18] needs a non-convex optimization, that requires much more computational cost. On the other hand, our approaches cannot predict what has not occurred so far, because they do not utilize any other knowledge. One possible solution to overcome this drawback is to construct a hierarchical Bayes model to transfer the information observed in other patients. We have observed that the predictive accuracy for the shortterm horizon is better than that for the long-term horizon. Moreover, in the long-term horizon, unpredictable recurrence might occur as stated above. Thus the proposed methods are suited for a short- or middle-term prediction. A practical usage of this kind of prediction is to alert if the actual PSA value goes far outside the confidence interval of the predicted PSA value. This should give helpful clinical information to make a medical decision.
Acknowledgement
446
The authors thank Dr. N. Bruchovsky and Dr. Y. Hirata for their valuable discussions. The authors would also like to thank anonymous reviewers for their helpful comments. This research is partially supported by the Japan Society for the Promotion of Science (JSPS) through its ‘‘Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program),’’ initiated by the Council for Science and Technology Policy (CSTP).
447
Appendix A. State space embedding
454
Here we discuss validity of an ARX model compared with a state space model. One might favor a state space model rather than an ARX model from the viewpoint of interpretability. However, we can find a nice connection between ARX models and state space models through the Takens embedding theorem [24,28,32]. Let M be a d-dimensional compact C k -manifold, and let Diff ðMÞ and C k ðM; RÞ denote the space of C k diffeomorphisms from M to M and that of C k maps from M to R, respectively.
455
Theorem 1. Takens Embedding Theorem [24,28,32]. Let M be a compact C k -manifold, and let d be the dimension of M. Let W be a Takens map:
463
W:M ! 2
2
x
#
R
448 449 450 451 452 453
456 457 458 459 460 461 462
464 465
466
2dþ1
ðhðxÞ; h /ðxÞ; . . . ; h /2d ðxÞÞ:
Then, the set
468 469
470
E ¼ fð/; hÞ 2 Diff ðMÞ C k ðM; RÞj TheTakensmapWforð/; hÞisembeddingofMg is an open dense subset of Diff ðMÞ C k ðM; RÞ with the Whitney C k topology. h
472 473 474 475
This theorem tells us that from the series of delayed observations ðhðxÞ; h /ðxÞ; . . . ; h /2d ðxÞÞ, the state space can be reconstructed for almost every ð/; hÞ, and the 2d þ 1 dimension is
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
476 477 478
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1
481
large enough to embed the d-dimensional manifold. Therefore, for sufficiently large m, the ARX model gives an equivalent representation of the state space model.
482
Appendix B. Variational Bayesian inference
483
B.1. Variational Bayes estimation for PWA model
484
In the VB approach, we approximate the true posterior distribution by the product of each probability function corresponding to each parameter as follows:
479 480
485 486
487 489
490 491 492 493 494 495 496
497 499 500
501 502 503
504 506
pðW; H; ZjD; MÞ ’ Q ðWjMÞQ ðHjMÞQ ðZjMÞ: The VB minimizes the discrepancy between the true posterior distribution and the approximated posterior distribution in terms of the KL-divergence. The VB technique used to calculate Q is based on [33]. The details are given in the following Sections (B.2 and B.3). Using the approximated posterior, the affine function corresponding to each Gaussian component can be estimated as the posterior mean as follows:
i ¼ w
Z
wi Q ðwi jMÞd wi :
Finally, let us compute the polynomial decomposition fX i gM i¼1 . From the approximated posterior distribution Q ðZjMÞ of the latent variable, one can obtain the expectation of the latent variable as follows:
zni ¼
Z
zni Q ðzni jMÞdzni ¼ Q ðzni ¼ 1jMÞ:
513
Then, a cluster label such as cn ¼ arg maxi¼1;...;M zni is assigned to each sample xn . On the basis of the N labeled pairs fðxn ; cn ÞgNn¼1 , we consider the problem of obtaining a polyhedral decomposition as a multicategory classification problem. In our implementation, we employed Multicategory SVM (M-SVM) [6,8]. The M-SVM gives M couples of a vector and a real fð/i ; gi ÞgM i¼1 that separate the M polyhedral regions. For fixed i, if x 2 Rd satisfies
516
x> /i gi > x> /j gj
517
520
then x is classified in the ith polyhedral region X i . In other words, x 2 Rd is classified in the i th polyhedral region, where i ¼ arg maxi¼1;...;M fx> /i gi g. Thus, the polyhedral decomposition of the region is achieved.
521
B.2. General framework
522
538
Here, we present the general framework of variational Bayes. Let D ¼ fxn gNn¼1 be the observed data that are independently identically distributed with an unknown true distribution. The aim of statistical inference is to estimate the true distribution that has generated D from observed data. For the estimation, we utilize a statistical model M that describes the true distribution well, and the model has a parameter h that specifies the likelihood (probability density) pðDjh; MÞ of D. In the mixture model, as in our setting, we identify the model M as the number of components. Moreover, we are interested in the latent variable Z ¼ fzni gN;M n¼1;i¼1 that indicates which component each sample xn belongs to. The maximum likelihood framework maximizes the likelihood, i.e., it estimates the true parameter value and the latent variable from arg maxh;Z pðDjh; Z; MÞ. In contrast, the Bayesian inference does not try to identify the unknown variables with one point, but estimates the posteriori distribution pðh; ZjD; MÞ, which is defined as
541
pðDjh; MÞpðhjMÞpðZjMÞ pðh; ZjD; MÞ ¼ ; pðDjMÞ
507 508 509 510 511 512
514
518 519
523 524 525 526 527 528 529 530 531 532 533 534 535 536 537
539
7
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
where pðhjMÞ and pðZjMÞ are the prior distributions of parameters h and Z. A disadvantage of the Bayesian inference is that the computation of the normalization constant pðDjMÞ is difficult, and thus, the inference on the basis of the posterior is usually computationally difficult. For example, the posterior mean of the parameter R h ¼ hpðhjD; MÞdh is not easily obtained. To cope with this problem, there are mainly three approaches, i.e., the Markov chain Monte Carlo (MCMC) method, the Laplace approximation method, and the variational Bayes (VB) method [3,35]. We employ the VB method in this study. Let # be a set of hyperparameters. We then consider the following marginal log-likelihood of D that does not depend on the parameters and model structures:
542 543 544 545 546 547 548 549 550 551 552 553 554 555
556
LðDÞ ¼ log pðDÞ XX ZZ ¼ log pðD; Zjh; MÞpðhj#; MÞpð#jMÞpðMÞdhd#; M
Z
where pð#jMÞ and pðMÞ are the prior distributions of # and M. Here, note that LðDÞ is a constant variable given data D. The VB method overcomes the computational difficulty by approximating the true a posteriori distribution to an alternate distribution Q that is easy to treat (e.g., having an analytic form). The approximated distribution Q is called a variational posterior distribution. Here, we restrict the class of Q to a set of distributions that can be factorized as Q ðZ; h; #; MÞ ¼ Q ðZjMÞQ ðhjMÞQ ð#jMÞQ ðMÞ. In the case where pðhj#; MÞ and pð#jMÞ can be factorized by indepenQ dent parameters, i.e., pðhj#; MÞ ¼ Ii¼1 pðhi j#i ; MÞ and pð#jMÞ ¼ QI i¼1 pð#i jMÞ, we also assume that Q can be factorized as Q Q Q ðhjMÞ ¼ Ii¼1 Q ðhi jMÞ and Q ð#jMÞ ¼ Ii¼1 Q ð#i jMÞ. This is the case we consider in this study. By introducing the new distribution Q, we can rewrite LðDÞ as follows:
LðDÞ ¼
XX ZZ M
ð8j – iÞ;
560 561 562 563 564 565 566 567 568 569 570 571 572 573
Q ðZ; h; #; MÞ log pðDÞdhd#
Z
where we define
576 577
def
KLðQ ðZ; h; #; MÞjjpðZ; h; #; MjDÞÞ ¼
X ZZ
578
Q ðZ; h; #; MÞ
M;Z
QðZ; h; #; MÞ dhd#; log pðZ; h; #; MjDÞ ZZ X pðD; Z; h; #; MÞ def dhd#: Q ðZ; h; #; MÞ log F ðQ Þ ¼ Q ðZ; h; #; MÞ M;Z KLðQ jjpÞ measures the discrepancy between probabilistic distributions Q and p, and it is non-negative. The purpose of using the VB is to make Q as close to p as possible, i.e., to minimize KLðQ jjpÞ over Q among the set of factorized distributions. Now LðDÞ is constant, and thus, the purpose can be achieved by maximizing F ðQ Þ. Here, we set F M ðQ Þ as
pðD; Zjh; MÞ def F M ðQ Þ¼ log Q ðZjMÞ QðZjMÞ;Q ðhjMÞ M M X X pðhi j#i ; MÞ pð#i jMÞ þ log þ log Qðhi jMÞ Q ðhi jMÞ;Q ð#i jMÞ i¼1 Q ð#i jMÞ Q ð#i jMÞ i¼1 pðMÞ ; Q ðMÞ
where hf ðXÞiQ ðXÞ is the expectation of f with respect to the distribution Q ðXÞ, namely EQ ðXÞ ½f ðXÞ. Then, we obtain
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
559
574
¼ KLðQ ðZ; h; #; MÞjjpðZ; h; #; MjDÞÞ þ F ðQ Þ;
þ log
558
580 581 582 583 584 585 586
587
589 590 591
592
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1
594 595 596 597 598 599
600
602 603
604
8
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx
F ðQ Þ ¼
X
QðMÞF M ðQÞ:
ð5Þ
M
To maximize F ðQ Þ, the VB iteratively maximizes over a distribution of each parameter while the distributions corresponding to the other parameters are fixed. By the variational method, the (t þ 1) th iteration procedure is as follows: VB E-step:
Q ðZjMÞ
ðtþ1Þ
607 608 609 610 611
612 614
Q ðMÞ ¼
pðfu
/ pð#i jMÞ expfhlog pðhi j#i ; MÞiQðhi jMÞðtþ1Þ g;
1 ðM ¼ M Þ; 0
ðotherwiseÞ;
619 620
623 624 625 626
627 629
630 631
632
634 635 636 637 638
639
I Y
1 exp Tr ðB0 SÞ ; 2
i¼1
udi 0 1 d
I X
ui ¼ 1&ui P 0 ;
b xa1 expfbxg: CðaÞ
644
The Wishart distribution is used for the distribution of a covariance matrix of a Gaussian distribution, and the Dirichlet distribution is used for a mixture proportion f/gM i¼1 .
645
Next, we introduce hyperpriors faj gM j¼1 , which control the vari-
646
ances of fwj gJj¼1 , i.e., the inverse of Kj ¼ diag ðaj;1 ; . . . ; aj;dþ1 Þ corre-
642 643
647 648 649 650
pðfwj ; bj gIj¼1 jfaj gIj¼1 ; MÞ ¼
664
J Y N ðwj j0; ðbj Kj Þ1 ÞGðbj jq0 ; k0 Þ; j¼1
dþ1 Y
Gðaj;k jj0 ; f0 Þ;
where Kj ¼ diagðaj;1 ; . . . ; aj;dþ1 Þ: M (the number of components):
( pðMÞ ¼
1 ; 15
if M 6 15;
0;
otherwise:
The remaining hyperparameters corresponding to each prior distribution are as follows:
In our inference, we employed the following setting for the hyperparameters:
666 667
668
670 671 672
674 673 676 677 678
679 681
As for B0 , we first perform k-means clustering and set B0 as the within-class variance–covariance matrix. Details of the calculation of the variational posterior can be found in [33].
682
References
686
i¼1
a
641
663
683 684 685
!
where c0 is a normalizing constant and dð Þ is an indicator function that returns 1 if the statement in the round bracket is true and 0 otherwise. Gamma distribution:
Gðxja; bÞ ¼
662
pðbj Þ ¼ Gðbj jq0 ; k0 Þ;
m0 ¼ 0; n0 ¼ 1; g0 ¼ 10; d0 ¼ 1; q0 ¼ 10; k0 ¼ 1; j0 ¼ 1; and f0 ¼ 10:
where c is a normalizing constant. Dirichlet distribution:
Dðfui gIi¼1 jd0 Þ ¼ c0
659
k¼1
Wishart distribution: 2
658
pðwj jbj ; aj Þ ¼ N ðwj j0; ðbj Kj Þ Þ;
pðaj Þ ¼
Here, we describe the prior distributions that we employed in the inference. First, we define notations of some probability distributions.
WðSjg0 ; B0 Þ ¼ cjSi j
657
661
li : m0 ; n0 ; Si : g0 ; B0 ; ui : d0 ; bj : q0 ; k0 ; and aj : j0 ; f0 :
g0 d1
655
1
M
B.3. Prior distributions
618
¼ Dðfu
I i gi¼1 jd0 Þ:
a (the coefficient of each PWA component and the variance of noise (see Eq. (2))):
with M ¼ arg maxF M :
622
617
N ðli jm0 ; ðn0 Si Þ1 ÞWðSi jg0 ; B0 Þ:
fwj ; bj gJj¼1 ; f j gM j¼1
621
616
I Y i¼1
I i gi¼1 Þ
To obtain the optimal model, we compute the optimal F M ðQ Þ with respect to each M and then select the optimal model M . Here there still exists the problem of local minima. The VB procedure is highly likely to capture a local minima. To overcome this problem, Ueda and Ghahramani proposed a Bayesian SMEM algorithm for a mixture model that executes a split-and-merge procedure and avoids stacks at a local minimum [33].
615
pðfli ; Si gIi¼1 jMÞ ¼
where hi denotes fhj gj–i . The names ‘‘E-step’’ and ‘‘M-step,’’ are adopted on the basis of an analogy with an EM algorithm [13]. Next, we choose the optimal model. Remember that F ðQ Þ is the average of F M ðQ Þ over Q ðMÞ (see Eq. (5)). Thus, Q ðMÞ that maximizes F ðQ Þ is given by
653
656
n
Q ð#i jMÞ
652
pðSi Þ ¼ WðSi jg0 ; B0 Þ;
fui gM i¼1 (the mixture weight of each Gaussian component (see Eq. (1))):
VB M-step:
ðtþ1Þ
651
pðli jSi Þ ¼ N ðli jm0 ; ðn0 Si Þ1 Þ;
/ expfhlog pðD; Zjh; MÞiQðhjMÞðtÞ g;
Q ðhi j#i ; MÞðtþ1Þ / exp hlog pðD; Zjh; MÞiQðZjMÞðtÞ ;Qðhi jMÞðtÞ o þhlog pðhi j#i ; MÞiQð#i jMÞðtÞ ; 606
fli ; Si gM i¼1 (the mean and variance–covariance matrix of each Gaussian component (see Eq. (1))):
def
sponds to the variance of wj . By introducing aj , the influence of each component of wj can be modified more flexibly. Then we employ the following conjugate priors as the prior distributions for parameters.
[1] K. Aihara (Ed.), Theme Issue, Theory of hybrid dynamical systems and its applications to biological and medical systems, Philosophical Transactions of The Royal Society A 368 (1930) (2010). [2] K. Akakura, N. Bruchovsky, S.L. Goldenberg, P.S. Rennie, A.R. Buckley, L.D. Sullivan, Effects of intermittent androgen suppression on androgen-dependent tumors-apoptosis and serum prostate-specific antigen, Cancer 71 (9) (1993) 2782. [3] H. Attias, Inferring parameters and structure of latent variable models by variational Bayes, in: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Stockholm, Sweden, 1999, pp. 21–30. [4] A. Bemporad, F. Borrelli, M. Morari, Optimal controllers for hybrid systems: stability and piecewise linear explicit form, in: Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, Australia, 2001, pp. 1810–1815. [5] A. Bemporad, F. Borrelli, M. Morari, On the optimal control law of linear discrete time hybrid systems, Lecture Notes in Computer Science 2289 (2002) 105. [6] K.P. Bennett, O.L. Mangasarian, Multicategory discrimination via linear programming, Optim. Methods Softw. 3 (1994) 27. [7] R.R. Berges, Y. Furuya, L. Remington, H.F. English, T. Jacks, J.T. Isaacs, Cell proliferation, DNA repair, and p53 function are not required for programmed death of prostatic glandular cells induced by androgen ablation, Proc. Natl. Acad. Sci. USA 90 (19) (1993) 8910. [8] E.J. Bredensteiner, Multicategory classification by support vector machines, Comput. Optim. Appl. 12 (1999) 53.
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710
MBS 7350
No. of Pages 9, Model 5G
10 May 2013 Q1 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742
T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx [9] N. Bruchovsky, L. Klotz, J. Crook, S. Malone, L. Charles, W.J. Morris, M.E. Gleave, S.L. Goldenberg, Final results of the Canadian prospective phase II trial of intermittent androgen suppression for men in biochemical recurrence after radiotherapy for locally advanced prostate cancer, Cancer 107 (2006) 385. [10] N. Bruchovsky, B. Lesser, E. Van Doorn, S. Craven, Hormonal effects on cell proliferation in rat prostate, Vitam. Horm. 33 (1975) 61. [11] N. Bruchovsky, P.S. Rennie, A.J. Coldman, S.L. Goldenberg, M. To, D. Lawson, Effects of androgen withdrawal on the stem cell composition of the Shionogi carcinoma, Cancer Res. 50 (1990) 2275. [12] J.M. Crook, E. Szumacher, S. Malone, S. Huan, R. Segal, Intermittent androgen suppression in the management of prostate cancer, Urology 53 (3) (1999) 530. [13] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodological) 39 (1) (1977) 1. [14] A.M. Evens, T.M. Lestingi, J.D. Bitran, Intermittent androgen suppression as a treatment for prostate cancer: a review, The Oncologist 3 (6) (1998) 419. [15] S.L. Goldenberg, M.E. Gleave, L.D. Sullivan, K. Akakura, Intermittent androgen suppression in the treatment of prostate cancer: a preliminary report, Urology 45 (5) (1995) 839. [16] W.P.M.H. Heemels, B. De Schutter, A. Bemporad, Equivalence of hybrid dynamical models, Automatica 37 (2001) 1085. [17] C.S. Higano, W. Ellis, K. Russell, P.H. Lange, Intermittent androgen suppression with leuprolide and flutamide for prostate cancer: a pilot study, Urology 49 (1997) 79. [18] Y. Hirata, N. Bruchovsky, K. Aihara, Development of a mathematical model that predicts the outcome of hormone therapy for prostate cancer, J. Theor. Biol. 264 (2010) 517. [19] C. Huggins, C.V. Hodges, Studies on prostatic cancer. I. The effect of castration, estrogen and of androgen injection on serum phosphatases in metastatic carcinoma of the prostate, Cancer Res. 1 (1941) 293. [20] C. Huggins, R.E. Stevens, C.V. Hodges, Studies on prostatic cancer. II. The effect of castration on advanced carcinoma of the prostate gland, Arch. Surg. 43 (1941) 209.
9
[21] L.H. Klotz, H.W. Herr, M.J. Morse, W.F. Whitmore, Intermittent endocrine therapy for advanced prostate-cancer, Cancer 58 (11) (1986) 2546. [22] H. Kuramae, Y. Hirata, N. Bruchovsky, K. Aihara, H. Suzuki, Nonlinear systems identification by combining regression with bootstrap resampling, Chaos 21 (2011) 043121. [23] N. Kyprianou, N.T. Isaacs, Quantal relationship between prostatic dihydrotestosterone and prostatic cell content: critical threshold concept, Prostate 11 (1987) 41. [24] L. Noakes, The Takens embedding theorem, Int. J. Bifur. Chaos 1 (4) (1991) 867. [25] R.L. Noble, Hormonal control of growth and progression tumors of NB rats and a theory of action, Cancer Res. 37 (1) (1977) 82. [26] M.H. Rashid, U.H. Chaudhary, Intermittent androgen deprivation therapy for prostate cancer, The Oncologist 9 (2004) 295. [27] C.E. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. [28] T. Sauer, J.A. Yorke, M. Casdagli, Embedology, J. Stat. Phys. 65 (3/4) (1991) 579. [29] R. Siegel, D. Naishadham, A. Jemal, Cancer statistics, A Cancer J. Clin. 62 (2012) 10. [30] E.D. Sontag, Non-linear regulation: the piecewise linear approach, IEEE Trans. Autom. Control 26 (2) (1981) 346. [31] T. Suzuki, N. Bruchovsky, K. Aihara, Piecewise affine systems modelling for optimizing hormonal therapy of prostate cancer, Philos. Trans. A R. Soc. 368 (2010) 5045. [32] F. Takens, Detecting strange attractors in turbulence, Lecture Notes Math. 898 (1981) 366. [33] N. Ueda, Z. Ghahramani, Bayesian model search for mixture models based on optimizing variational bounds, Neural Netw. 15 (2002) 1223. [34] A.J. Van Der Schaft, J.M. Schumacher, Complementarity modelling of hybrid systems, IEEE Trans. Autom. Control 43 (1998) 483. [35] S.R. Waterhouse, D. MacKay, A.J. Robinson, Bayesian methods for mixture of experts, Adv. Neural Inf. Proces. Syst. NIPS8 (1995) 351–357.
Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007
743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774