Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy

Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy

MBS 7350 No. of Pages 9, Model 5G 10 May 2013 Mathematical Biosciences xxx (2013) xxx–xxx 1 Contents lists available at SciVerse ScienceDirect Mat...

1MB Sizes 0 Downloads 34 Views

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Mathematical Biosciences xxx (2013) xxx–xxx 1

Contents lists available at SciVerse ScienceDirect

Mathematical Biosciences journal homepage: www.elsevier.com/locate/mbs

Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy

2 3 4

Q1

a

5 6

b

7 8

Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan

a r t i c l e

1 2 0 2 11 12 13 14 15 16 17 18 19 20 21

Taiji Suzuki a, Kazuyuki Aihara a,b,⇑

i n f o

Article history: Available online xxxx

Q3

Keywords: Hormone therapy Prostate cancer Intermittent androgen suppression System identification Variational Bayes Gaussian process Piecewise affine system

a b s t r a c t These days prostate cancer is one of the most common types of malignant neoplasm in men. Androgen ablation therapy (hormone therapy) has been shown to be effective for advanced prostate cancer. However, continuous hormone therapy often causes recurrence. This results from the progression of androgen-dependent cancer cells to androgen-independent cancer cells during the continuous hormone therapy. One possible method to prevent the progression to the androgen-independent state is intermittent androgen suppression (IAS) therapy, which ceases dosing intermittently. In this paper, we propose two methods to estimate the dynamics of prostate cancer, and investigate the IAS therapy from the viewpoint of optimality. The two methods that we propose for dynamics estimation are a variational Bayesian method for a piecewise affine (PWA) system and a Gaussian process regression method. We apply the proposed methods to real clinical data and compare their predictive performances. Then, using the estimated dynamics of prostate cancer, we observe how prostate cancer behaves for various dosing schedules. It can be seen that the conventional IAS therapy is a way of imposing high cost for dosing while keeping the prostate cancer in a safe state. We would like to dedicate this paper to the memory of Professor Luigi M. Ricciardi. Ó 2013 Elsevier Inc. All rights reserved.

23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

39 40

1. Introduction

41

The purpose of this paper is to propose methods to predict the behavior of prostate cancer by identifying its dynamics. On the basis of the identified dynamics, we analyze the effect of intermittent androgen suppression (IAS) therapy in comparison to continuous androgen suppression (CAS) therapy on the trade-off between the amounts of cancer cells and dosing. In IAS therapy, dosing is intermittently ceased to avoid the proliferation of drug-resistant cancer cells. The American Cancer Society estimates that prostate cancer strikes approximately 16% of American men [29]. Prostate cancer continues to be the most common type of cancer in men and the second leading cause of cancer death, after lung cancer, in the United States. Androgen ablation therapy has been the mainstay of treatment for advanced prostate cancer, following the pioneering work of Huggins and Hodges [19,20], and it makes use of the amenability of prostate cancer to androgen. Secretory epithelial cells included in the prostate have androgen receptors, which are responsible

42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58

⇑ Corresponding author at: Institute of Industrial Science, The University of

Q2

Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan. Tel.: +81 3 5452 6691; fax: +81 3 5452 6692. E-mail address: [email protected] (K. Aihara).

for the androgen dependence of the growth of prostate cancer. When androgen is withdrawn, the secretory epithelial cells go into apoptosis [7,26]. This phenomenon is a key point of androgen ablation therapy. In the treatment, the amount of androgen is decreased through medical castration, and the withdrawal causes a decrease of androgen-dependent tumor cells. In fact, CAS therapy for 3 months lets about 90% of patients have normal levels of the serum prostate-specific antigen (PSA), which is a tumor marker. However, the CAS therapy often results in recurrence, where the growth of prostate cancer cannot be managed by the hormone therapy. The main reasons for the ineffectiveness of the CAS therapy are adaptation and mutation to androgen-independent cells as well as clonal selection. Upon complete androgen withdrawal, the surviving tumor cells generally adapt to the androgen-depleted environment and progress to an androgen-independent state. Furthermore, permanent androgen ablation might even promote growth and progression of androgen-independent cells [11,25]. However, if androgen ablation therapy is stopped before the clonogenic tumor cells progress to an androgen-independent state, we may retain the amenability of the surviving cells to androgen and avoid the selective survival of preexisting androgen-independent cells and the mutation of androgen-dependent cells to the androgen-independent state [14]. Since excellent treatment involving pharmacological castration has been developed, we can stop and restart androgen ablation at any time. Recently, many

0025-5564/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.mbs.2013.04.007

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137

138 139 140 141 142 143 144 145 146

2

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

articles have shown the effectiveness of the IAS therapy [2,10,12,14,15,17,21,23] in clinical studies, and the methodology of the treatment has been well established. Another reason why intermittent therapy is preferred is because it prevents side effects. Androgen withdrawal can cause several side effects such as hot flashes, loss of libido, impotence, adynamia, osteoporosis, fatigue, cognitive dysfunction, and depression. Severe side effects pose serious problems since they can impair the quality of life. Thus, it is noteworthy that well-manipulated IAS can not only lead to prolongation of the androgen-dependent state but also improve the quality of life. Along with clinical study, it would be useful to devise a method for identifying the dynamics of the tumors and predicting their future behavior [18,22]. Toward this purpose, we utilize techniques developed in computer science. In this paper, we consider prostate cancer as a system that is driven by unknown dynamics. The identification of unknown dynamics of a system is called ‘‘system identification.’’ There are two main approaches for general system identification. One is a bottom–up approach, which is used to construct a model from previous knowledge about an objective system. This is strongly dependent on the pertinency of the modeler’s knowledge. If the objective system is not well investigated or the model is not sufficiently polished, this approach fails. The other approach is a black-box (or a non-parametric) approach, which is used to construct a model by investigating only input– output relations with minimum assumptions on an objective system. In other words, the black-box approach aims to directly predict the future values of PSA given the medical history observed so far. This is essentially the non-parametric estimation of a functional that maps an input sequence (medical history) to an output sequence (future values of PSA). Each of these two approaches has both pros and cons. In this paper, we investigate the black-box approach. In particular, we discuss two methods, namely, a variational Bayesian (VB) method and a Gaussian process method. In the former method, we employ a piecewise Affine (PWA) system to model the dynamics of prostate cancer. The PWA system is one of the hybrid systems [1,16,30,34]. Recently, a method with the optimal control of a PWA hybrid system model was proposed to control the behavior of prostate cancer [31]. The Gaussian process method, on the other hand, is based on the Gaussian process regression technique [27]. Gaussian process regression is one of the most popular methods for non-parametric regression in statistics and machine learning. This method is computationally efficient and has a solid theoretical basis. We compare these two methods in terms of their predictive accuracy. The second purpose of this paper is to evaluate the conventional IAS therapy from the viewpoint of optimality. We will show that the system identification methods described above can reconstruct the real evolution of the PSA well. On the basis of the model constructed in the black-box approach, we compare the IAS therapy with the CAS therapy by determining a weighted sum of the PSA value and the dosing amount. It is observed that the conventional IAS treatment in a clinical situation can be said to be a therapy that minimizes dosing while keeping the PSA in a safe state.

2. Modeling prostate cancer’s dynamics with a nonlinear autoregressive eXogeneous model In this section, we discuss the modeling of the dynamics of the PSA with a nonlinear Auto-Regressive eXogeneous (ARX) model (see Fig. 1 for actually observed data). Then, we show two non-parametric system identification methods based on a nonlinear ARX model: a variational Bayesian method utilizing PWA regression and a Gaussian process regression method. Roughly speaking, the nonlinear ARX model describes a nonlinear relation between the future

PSA values and the medical history (PSA and dosing history). The system identification problem of a nonlinear ARX model can be reduced to a (nonlinear) regression problem. The first approach with the variational Bayesian method can be used along with the PWA regression model for solving the regression problem. The PWA model is particularly important in the context of optimal control [4,5]. In the second approach with the Gaussian process method, a smooth function is fit to the data in a nonparametric Bayesian manner with Gaussian process priors.

147

2.1. ARX model for PSA dynamics

156

Here, we introduce the ARX model that describes the dynamics of the PSA. Now, we assume that patients’ PSA values are observed on a regular schedule.1 We denote by qt the actual PSA value at the tth time step. Let yt 2 R be the log of the PSA value at time step t and let it be given by yt ¼ logðqt Þ.2 Let ut 2 f0; 1g represent the on–off state of the androgen ablation: ut ¼ 1 for on treatment and ut ¼ 0 for off treatment. Then the nonlinear ARX model can be described as

157

yt ¼ f ðyt1 ; yt2 ; . . . ; ytm ; ut ; . . . ; utm Þ þ t ;

148 149 150 151 152 153 154 155

158 159 160 161 162 163

164 166

where m is the maximum time delay, f is an unknown nonlinear function, and t is i.i.d. Gaussian noise with mean 0 and unknown variance. By identifying f, it becomes possible to predict the future value of PSA given the medical history. The estimation of the dynamics is reduced to the estimation of the nonlinear function f (and the unknown variance of the noise), which is a regression problem. We estimate f from the observed data D ¼ fyt ; ut gTt¼1 . We propose two approaches based on Bayesian estimation to estimate f.

167

2.2. Piecewise affine ARX model by variational Bayesian inference

175

Here, we introduce the PWA model for the auto-regressive function f. Let xt be given by

176

xt ¼ ½yt1 ; yt2 ; . . . ; ytm ; ut ; ut1 ; . . . ; utm > : The ordinary (linear) ARX model employs a linear function as the function f as follows:

where w 2 R2mþ1 . In the case of the PWA model that is of interest to us, f is given as

f ðxt Þ ¼

w>i

xt 1

169 170 171 172 173 174

177

178 180 181 182

183 185

f ðxt Þ ¼ w> xt ;



168

186 187

188

 ;

if xt 2 X i ;

190

where wi 2 R2mþ2 ; fX i gM i¼1 is a polyhedral decomposition of the state S 2mþ1 space R2mþ1 , that is, M ; X i \ X j ¼ ;ði – jÞ, and each X i i¼1 X i ¼ R

191

can be represented as X i ¼ fz 2 R2mþ1 jHi z 6 K i ; F i z < Gi g for appropriate Hi ; K i ; F i , and Gi . It is known that many kinds of problems appearing in the analysis of a PWA hybrid system can be reduced to mixed integer programming (MIP) problems. In particular, the optimal control problem of a PWA hybrid system is reduced to a MIP problem [4,5]. We need to determine the number M of polyhedral regions, the M polyhedral decompositon fX i gM i¼1 , and the parameters fwi gi¼1 . Here, we propose a method that uses a variational Bayesian technique to identify the PWA system. Now, let d ¼ 2m þ 1 and

193

1

In this paper, the time interval between each observation is 21 days. Because of the log-transformation, yt can take any real value while the PSA value qt ¼ expðyt Þ is always ensured to be positive. Therefore, if we estimate the dynamics of yt instead of that of qt , we do not need to impose any constraint on the dynamics such as positivity. This gives us much flexibility to develop a non-parametric system identification method. This technique is often utilized to model a positive time series, e.g., stock prices. 2

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

192

194 195 196 197 198 199 200 201 202

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1

3

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

Fig. 1. The evolution of the PSA and on–off of androgen ablation. The blue solid line shows the PSA, the red dashed line shows the on–off of androgen ablation (up is on, and down is off), and the gray broken line indicates the end of the estimation period. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)

203 204 205 206

207

209 210 211 212 213 214 215 216 217

218

N ¼ T  m. By abuse of notation, we denote by fðxn ; yn ÞgNn¼1 the observed input–output pairs. First, we model the distribution of the input vector x as the Gaussian mixture distribution as follows:

pðxjW; MÞ ¼

M X

1 i ; Si Þ;

ui N ðxjl

ð1Þ

i¼1

PM M i ; Si gi¼1 ; i¼1

where W ¼ fui ; l ui ¼ 1; ui P 0 (i ¼ 1; . . . ; M), and N ðxjl; VÞ is a Gaussian distribution with mean l and covariance matrix V. This is a mixture of M Gaussian components N ðxjli ; S1 i Þ (i ¼ 1; . . . ; M). Then, we introduce a latent variable, Z ¼ fzni gM;N i¼1;n¼1 , that indicates which component the nth sample xn belongs to. If the nth sample belongs to the ith Gaussian component, then zni ¼ 1; otherwise zni ¼ 0. Thus, the distribution of xn for zni ¼ 1 is given as 1 i ; Si Þ:

220

pðxn jzni

221

Second, we model the distribution of the output (yn ) conditioned by the input (xn ) and the latent variable (fzni gM i¼1 ) as

222

223 225 226 227

228

230 231 232 233 234 235

¼ 1; WÞ ¼ N ðxn jl

pðyn jxn ; zni ¼ 1; HÞ ¼ N ðyn jw>i ½xn ; 1; b1 i Þ; w> i

ð2Þ

fwi ; bi gM i¼1 .

dþ1

where 2 R ; bi 2 Rþ , and H ¼ Then, we obtain the (joint) likelihood of D and Z as

pðD; ZjW; H; MÞ ¼

N Y M n Y

o zn i 1 T ui N ðxn jli ; S1 : i ÞN ðyn jwi ½xn ; 1; bi Þ

n¼1 i¼1

On the basis of this likelihood, we estimate the parameters W; H, and Z in a Bayesian manner. As a prior distribution, we employ a so-called conjugate prior that is convenient to compute the posterior (see B for the details). Since the analytical form of the posterior distribution is not available despite the availability

of a conjugate prior, we need to use a computational method to obtain the posterior distribution. Roughly speaking, there are two types of computational methods to obtain the Bayesian estimator; one involves a Gibbs sampler (Markov chain Monte Carlo approach) and the other involves a variational Bayesian (VB) estimator. In this paper, we employ the latter method since the VB approach is computationally more efficient (but less accurate). M The technical detail to estimate fwi gM i¼1 and fX i gi¼1 is described in B.

236

2.3. Nonlinear ARX model by Gaussian process regression

245

Here, we introduce the second approach based on Gaussian process regression [27]. In the Gaussian process regression, we put a non-parametric Gaussian process prior GP on a functional space. A zero-mean Gaussian process ~f  GP is a random variable on a function space (e.g., L1 ðRd Þ-space) such that each finite subset ð~f ðx1 Þ; . . . ; ~f ðxj ÞÞðj ¼ 1; 2; . . .Þ obeys a zero-mean multivariate normal distribution. The behavior of the Gaussian process is determined by the kernel function k : Rd  Rd ! R, that is, the covariance function defined by

246

h i kðx; x Þ :¼ E~f GP ~f ðxÞ~f ðx0 Þ :

255

0

237 238 239 240 241 242 243 244

247 248 249 250 251 252 253 254

257

Then, letting f ¼ ½~f ðx1 Þ; . . . ; ~f ðxn Þ and f ¼ ~f ðxÞ for a newly ob>  served point x 2 Rd , the joint prior distribution of ½f f > is given as

258

  K 11 >   pð½f f  Þ ¼ N ½f f > 0; K >12

260



>

>  >

K 12 K 22

ðkðxn ; xn0 ÞÞN;N n¼1;n0 ¼1 ; K 12

 ;

262 >

where K 11 ¼ ¼ ½kðx1 ; xÞ; kðx2 ; xÞ; . . . ; kðxN ; xÞ , and K 22 ¼ kðx; xÞ. Now, since we assumed noise t to be Gaussian,

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

259

263 264

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1

4

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

266

the likelihood function of ~f is also given by a Gaussian distribution as follows:

269

pðyjfxn gNn¼1 ; ~f Þ ¼ N ðyjf; r2 IÞ;

270

where y ¼ ½y1 ; . . . ; yN > , and r is the standard deviation of n . Note that the likelihood function depends only on f. Thus, through some  calculations, we observe that the posterior distribution of f obeys    the Gaussian distribution, with the mean f and variance var ðf Þ given by

265

267

271 272 273 274

275

f  ¼ K > ðK 11 þ r2 IÞ1 y; 12 277 278 279 280 281 282 283

284



var ðf Þ ¼ K22 

K>12 ðK11

2

1

þ r IÞ K12 :

P Here, the posterior mean f  can be rewritten as f  ¼ Nn¼1 kðx; xn Þan , 2 1 where a ¼ ðK 11 þ r IÞ y. Therefore, the estimated value of f ðxÞ can be calculated by simply summing up the values kðx; xn Þ of the kernel function. In our experiments, we employed a Gaussian kernel as the kernel function as follows: 0 2

!

286

kx  x k kðx; x Þ ¼ exp  2r02

287

The Gaussian width

288

from the center such that r0

0

:

r0 was set to be the mean squared distance

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi P P ¼ N1 Nn¼1 kxn  N1 Nn0 ¼1 xn0 k2 . Moreover

r was estimated by 5-fold cross validation. 2

289

the noise variance

290

3. Long-term and short-term predictability

291

In this section, we evaluate the predictability of the proposed methods for both long-term and short-term horizons. Fig. 1 shows the evolution of the PSA and on–off of androgen ablation for five patients.3 The PSA values were observed in discrete times. We connected the observed data point with solid lines in Fig. 1. Using the data shown in Fig. 1, we estimated the dynamics of each patient with the two methods introduced in Section 2. We discretized the observation time. We employed 21 days as the time interval between each time step, that is, y1 is the log of the PSA value on the day 0, y2 is the log of the PSA value on day 21, yt is the log of the PSA value on day 21  ðt  1Þ, and so on. Then, we used the first T est time steps for the estimation of the dynamics, and the rest est of the data were used for the evaluation, i.e., fðyt ; ut ÞgTt¼1 were used T end for estimation and fðyt ; ut Þgt¼T est þ1 were used for evaluation, where T end is the entire length of the observed data. In Fig. 1, T est for each patient is indicated by the gray broken line. The actual values of T est are T est ¼ 62; 62; 82; 67, and 51 (these T est values correspond to 1281; 1281; 1701; 1386, and 1050 days, respectively). Moreover, we employed m ¼ 5 since that gave the best predictive accuracy. To evaluate how accurately the methods estimate the dynamics, we considered the predictive accuracies for two cases: a long-term horizon and a short-term horizon.

292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312

313

3.1. Long-term prediction

314

First, let us evaluate the predictive accuracies for a long-term T end horizon. We estimated the PSA value ðqt Þt¼T after the estimation est þ1 period by running the estimated dynamics, with the initial setting determined by the observed values: xT est þ1 ¼ ½yT est ; . . . ; yT est mþ1 ;

315 316 317

3

The data in this paper were taken from a prospective phase II clinical trial of IAS for men with evidence of serum PSA recurrence after radiotherapy for locally advanced cancer [9]. This trial was approved by the Health Protection Branch of the Ministry of Health of Canada and by the Clinical Research Ethics Review Boards of the participating centres. All patients provided written informed consent in accordance with institutional guidelines. The data are available at http://www.nicholasbruchovsky.com/clinicalResearch.html. The patients shown in the figure correspond to patient numbers 1, 28, 37, 55, and 91 in the data.

Table 1 Predictive errors for the long-term horizon. For the variational Bayes approach, the average error along with the standard deviation over 10 repetitions is shown. Patient no.

Variational Bayes

1 2 3 4 5

0.67 1.94 1.21 0.76 0.76

(0.09) (0.40) (0.09) (0.17) (0.08)

Gaussian process 0.78 2.03 1.02 0.99 0.55

uT est þ1 ; . . . ; uT est mþ1 . We used the actual input ut to run the estiend ^t ÞTt¼T mated dynamics. Let the estimated PSA sequence be ðq . est þ1 We then evaluated the predictive error as the following sum of the absolute values of the discrepancy from the actual PSA values:

predictive error ¼

T end X

^t  qt j: jq

318 319 320 321

322

ð3Þ

t¼T est þ1

324

The predictive errors of the VB approach and Gaussian process approach are presented in Table 1. Since the result of the VB approach varied with the initial setting (because of local optimum), we randomly generated the initial settings 10 times and averaged the predictive errors over the 10 repetitions (in Table 1, the standard deviation over the 10 repetition is also presented in the parentheses). We can see that the better method depends on the patients. With regard to the total error across different patients, the VB approach is slightly better. Examples of the estimated time series of the PSA are shown in Fig. 2. It was reported that patient 1 changed the medicine for androgen ablation in the last cycle, therefore, making a prediction for the last cycle should be a hard problem.

325

3.2. Short-term prediction

338

Next, we evaluate the predictive accuracies for a short-term horizon. We iteratively estimate the actually observed PSA qt by ^t ¼ ^f ðxt Þ, using xt given by the actually observed yt and ut as q where ^f is the estimated auto-regressive function. This situation corresponds to a one-step prediction. Then, as in the case of the long-term prediction, we evaluate the predictive error by using Eq. (3). The short-term predictive errors of the VB approach and Gaussian process approach are presented in Table 2. Here, we again randomly generated the initial settings of the VB approach 10 times and averaged the predictive errors over the 10 repetitions (the standard deviation over the 10 repetition is also presented in the parenthesis). We can see that overall, the Gaussian process approach shows better performances than the VB approach. This is different from the result of the long-term prediction, where both methods showed almost similar performances. As expected, the predictive errors for the short-term horizon is smaller than those for the long-term horizon (Table 2).

339

4. Optimality of IAS

357

In this section, we discuss optimality of the conventional IAS by investigating what kind of criterion IAS optimizes. For the investigation, we perform a simulation on the estimated dynamics for various types of dosing schedules and compare the values of some cost function. We examine patient 1 in Fig. 1 and apply the variational Bayesian system identification method to the data of patient 1. We use the entire time series for the estimation, and the parameter settings are the same as those in Section 3. Fig. 3 shows the simulated sequence. The on–off of androgen ablation is switched when the PSA value crosses the magenta border lines (13 and 1.8). Here since the observation time is

358

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

326 327 328 329 330 331 332 333 334 335 336 337

340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356

359 360 361 362 363 364 365 366 367 368

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

5

Fig. 2. Estimated PSA time series. The blue solid line shows the actually observed PSA, the green solid line shows the estimated time series of the PSA, the red dashed line shows the on–off of androgen ablation (up is on, and down is off), and the gray broken line indicates the end of the estimation period. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)

374 Table 2 Predictive errors for the short-term horizon. For the variational Bayes approach, the average error along with the standard deviation over 10 repetitions is shown. Patient no.

Variational Bayes

1 2 3 4 5

0.17 1.03 0.40 0.40 0.41

(0.03) (0.47) (0.09) (0.06) (0.08)

Gaussian process 0.06 0.87 0.24 0.21 0.19

Fig. 3. Simulated PSA time series on the estimated system. The blue solid line shows the simulated PSA, the red dashed line shows the on–off of androgen ablation (up is on, and down is off), and the magenta broken lines indicate the threshold values of the on–off switching of androgen ablation. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)

369 370 371 372 373

discrete where the time interval is 21 days long, there appear delays of switching. We can see from Fig. 3 that the estimated system reproduces the behavior of the actually observed data well. Now, we consider the following cost function of an input (on–off of androgen ablation) sequence ðut ÞTt¼1 :

T X ^t þ ð1  kÞut g; Xk ðuÞ ¼ fkq

ð4Þ

t¼1

^t ÞTt¼1 ðq

where is the simulated PSA sequence for the input ðut ÞTt¼1 ; k 2 ½0; 1 gives the weight for the cost, and T is the number of time steps. We generated five input sequences ðut ÞTt¼1 by adjusting the on and off periods of androgen ablation. The input sequences were de^t does not exceed 20, namely the oversigned so that the PSA q growth of prostate cancer is inhibited. Then, we observed the cost function of Eq. (4) for these five generated input sequences ^t 6 20 under the settings of k ¼ 0 or 0:1 and T ¼ 80. The with q low value of the cost function with k ¼ 0 points to the regulation ^t in a safe region while suppressing therapy as much as possiof q ble. In contrast, when k ¼ 0:1, in addition to suppressing dosing, ^t should also be suppressed to obtain a small value the PSA value q of the cost function. The results are shown in Fig. 4, in which the five input sequences are aligned according to the values of the corresponding cost functions. The input sequence corresponding to the top-left and right-bottom panels in Fig. 4 (the lowest cost function for k ¼ 0 and the highest cost function for k ¼ 0:1) is an imitation of the actually treated input sequence in the clinical data. The on and off periods are set to be the average ones of the clinical data. Since this input sequence minimizes the cost function Xk ðuÞ for k ¼ 0, we can say that the conventional IAS is a way of making the PSA stay in a safe region while skipping the therapy as long as possible. On the other hand, we can see that the short-term cessation of androgen ablation gives a larger cost for k ¼ 0 and a smaller cost for k ¼ 0:1. Such a treatment suppresses the PSA value very well. However, it imposes a larger amount of dosing, which might diminish the quality of life of the patient. Here, note that since the original clinical data do not show recurrence, the identified system also does not show recurrence. Therefore, permanent androgen ablation does not lead to

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1

6

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

Fig. 4. Ordering various input sequences according to their cost function values. The left column corresponds to k ¼ 0, and the right column corresponds to k ¼ 0:1. From top to bottom, the cost function increases. 409 410 411 412

recurrence in the identified system. We should point out that although the CAS therapy minimizes the cost function with k ¼ 0:1, there is a high possibility that it finally causes irreversible recurrence.

413

5. Summary and discussion

414

5.1. Summary

415

428

In this paper, we proposed two methods to estimate the dynamics of prostate cancer, namely the variational Bayesian (VB) method for a piecewise affine (PWA) system and the Gaussian process regression method. We compared the two methods in two cases: namely, long-term prediction and short-term prediction. In the case of long-term prediction, both methods showed almost similar performances. In the case of short-term prediction, on the other hand, the Gaussian process regression method slightly outperformed the VB method. Moreover, on the basis of an estimated system, we investigated the optimality of the conventional IAS therapy. For various dosing schedules, the conventional IAS therapy was observed to be an optimal method that minimizes the amount of dosing while keeping the PSA value below the risky level.

429

5.2. Discussion and future work

416 417 418 419 420 421 422 423 424 425 426 427

430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445

A favorable point of the methods presented in this paper is the computational efficiency. The parametric model proposed by [18] needs a non-convex optimization, that requires much more computational cost. On the other hand, our approaches cannot predict what has not occurred so far, because they do not utilize any other knowledge. One possible solution to overcome this drawback is to construct a hierarchical Bayes model to transfer the information observed in other patients. We have observed that the predictive accuracy for the shortterm horizon is better than that for the long-term horizon. Moreover, in the long-term horizon, unpredictable recurrence might occur as stated above. Thus the proposed methods are suited for a short- or middle-term prediction. A practical usage of this kind of prediction is to alert if the actual PSA value goes far outside the confidence interval of the predicted PSA value. This should give helpful clinical information to make a medical decision.

Acknowledgement

446

The authors thank Dr. N. Bruchovsky and Dr. Y. Hirata for their valuable discussions. The authors would also like to thank anonymous reviewers for their helpful comments. This research is partially supported by the Japan Society for the Promotion of Science (JSPS) through its ‘‘Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program),’’ initiated by the Council for Science and Technology Policy (CSTP).

447

Appendix A. State space embedding

454

Here we discuss validity of an ARX model compared with a state space model. One might favor a state space model rather than an ARX model from the viewpoint of interpretability. However, we can find a nice connection between ARX models and state space models through the Takens embedding theorem [24,28,32]. Let M be a d-dimensional compact C k -manifold, and let Diff ðMÞ and C k ðM; RÞ denote the space of C k diffeomorphisms from M to M and that of C k maps from M to R, respectively.

455

Theorem 1. Takens Embedding Theorem [24,28,32]. Let M be a compact C k -manifold, and let d be the dimension of M. Let W be a Takens map:

463

W:M ! 2

2

x

#

R

448 449 450 451 452 453

456 457 458 459 460 461 462

464 465

466

2dþ1

ðhðxÞ; h  /ðxÞ; . . . ; h  /2d ðxÞÞ:

Then, the set

468 469

470

E ¼ fð/; hÞ 2 Diff ðMÞ  C k ðM; RÞj TheTakensmapWforð/; hÞisembeddingofMg is an open dense subset of Diff ðMÞ  C k ðM; RÞ with the Whitney C k topology. h

472 473 474 475

This theorem tells us that from the series of delayed observations ðhðxÞ; h  /ðxÞ; . . . ; h  /2d ðxÞÞ, the state space can be reconstructed for almost every ð/; hÞ, and the 2d þ 1 dimension is

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

476 477 478

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1

481

large enough to embed the d-dimensional manifold. Therefore, for sufficiently large m, the ARX model gives an equivalent representation of the state space model.

482

Appendix B. Variational Bayesian inference

483

B.1. Variational Bayes estimation for PWA model

484

In the VB approach, we approximate the true posterior distribution by the product of each probability function corresponding to each parameter as follows:

479 480

485 486

487 489

490 491 492 493 494 495 496

497 499 500

501 502 503

504 506

pðW; H; ZjD; MÞ ’ Q ðWjMÞQ ðHjMÞQ ðZjMÞ: The VB minimizes the discrepancy between the true posterior distribution and the approximated posterior distribution in terms of the KL-divergence. The VB technique used to calculate Q is based on [33]. The details are given in the following Sections (B.2 and B.3). Using the approximated posterior, the affine function corresponding to each Gaussian component can be estimated as the posterior mean as follows:

i ¼ w

Z

wi Q ðwi jMÞd wi :

Finally, let us compute the polynomial decomposition fX i gM i¼1 . From the approximated posterior distribution Q ðZjMÞ of the latent variable, one can obtain the expectation of the latent variable as follows:

zni ¼

Z

zni Q ðzni jMÞdzni ¼ Q ðzni ¼ 1jMÞ:

513

Then, a cluster label such as cn ¼ arg maxi¼1;...;M zni is assigned to each sample xn . On the basis of the N labeled pairs fðxn ; cn ÞgNn¼1 , we consider the problem of obtaining a polyhedral decomposition as a multicategory classification problem. In our implementation, we employed Multicategory SVM (M-SVM) [6,8]. The M-SVM gives M couples of a vector and a real fð/i ; gi ÞgM i¼1 that separate the M polyhedral regions. For fixed i, if x 2 Rd satisfies

516

x> /i  gi > x> /j  gj

517

520

then x is classified in the ith polyhedral region X i . In other words,  x 2 Rd is classified in the i th polyhedral region, where  i ¼ arg maxi¼1;...;M fx> /i  gi g. Thus, the polyhedral decomposition of the region is achieved.

521

B.2. General framework

522

538

Here, we present the general framework of variational Bayes. Let D ¼ fxn gNn¼1 be the observed data that are independently identically distributed with an unknown true distribution. The aim of statistical inference is to estimate the true distribution that has generated D from observed data. For the estimation, we utilize a statistical model M that describes the true distribution well, and the model has a parameter h that specifies the likelihood (probability density) pðDjh; MÞ of D. In the mixture model, as in our setting, we identify the model M as the number of components. Moreover, we are interested in the latent variable Z ¼ fzni gN;M n¼1;i¼1 that indicates which component each sample xn belongs to. The maximum likelihood framework maximizes the likelihood, i.e., it estimates the true parameter value and the latent variable from arg maxh;Z pðDjh; Z; MÞ. In contrast, the Bayesian inference does not try to identify the unknown variables with one point, but estimates the posteriori distribution pðh; ZjD; MÞ, which is defined as

541

pðDjh; MÞpðhjMÞpðZjMÞ pðh; ZjD; MÞ ¼ ; pðDjMÞ

507 508 509 510 511 512

514

518 519

523 524 525 526 527 528 529 530 531 532 533 534 535 536 537

539

7

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

where pðhjMÞ and pðZjMÞ are the prior distributions of parameters h and Z. A disadvantage of the Bayesian inference is that the computation of the normalization constant pðDjMÞ is difficult, and thus, the inference on the basis of the posterior is usually computationally difficult. For example, the posterior mean of the parameter R  h ¼ hpðhjD; MÞdh is not easily obtained. To cope with this problem, there are mainly three approaches, i.e., the Markov chain Monte Carlo (MCMC) method, the Laplace approximation method, and the variational Bayes (VB) method [3,35]. We employ the VB method in this study. Let # be a set of hyperparameters. We then consider the following marginal log-likelihood of D that does not depend on the parameters and model structures:

542 543 544 545 546 547 548 549 550 551 552 553 554 555

556

LðDÞ ¼ log pðDÞ XX ZZ ¼ log pðD; Zjh; MÞpðhj#; MÞpð#jMÞpðMÞdhd#; M

Z

where pð#jMÞ and pðMÞ are the prior distributions of # and M. Here, note that LðDÞ is a constant variable given data D. The VB method overcomes the computational difficulty by approximating the true a posteriori distribution to an alternate distribution Q that is easy to treat (e.g., having an analytic form). The approximated distribution Q is called a variational posterior distribution. Here, we restrict the class of Q to a set of distributions that can be factorized as Q ðZ; h; #; MÞ ¼ Q ðZjMÞQ ðhjMÞQ ð#jMÞQ ðMÞ. In the case where pðhj#; MÞ and pð#jMÞ can be factorized by indepenQ dent parameters, i.e., pðhj#; MÞ ¼ Ii¼1 pðhi j#i ; MÞ and pð#jMÞ ¼ QI i¼1 pð#i jMÞ, we also assume that Q can be factorized as Q Q Q ðhjMÞ ¼ Ii¼1 Q ðhi jMÞ and Q ð#jMÞ ¼ Ii¼1 Q ð#i jMÞ. This is the case we consider in this study. By introducing the new distribution Q, we can rewrite LðDÞ as follows:

LðDÞ ¼

XX ZZ M

ð8j – iÞ;

560 561 562 563 564 565 566 567 568 569 570 571 572 573

Q ðZ; h; #; MÞ log pðDÞdhd#

Z

where we define

576 577

def

KLðQ ðZ; h; #; MÞjjpðZ; h; #; MjDÞÞ ¼

X ZZ

578

Q ðZ; h; #; MÞ

M;Z



 QðZ; h; #; MÞ dhd#;  log pðZ; h; #; MjDÞ   ZZ X pðD; Z; h; #; MÞ def dhd#: Q ðZ; h; #; MÞ log F ðQ Þ ¼ Q ðZ; h; #; MÞ M;Z KLðQ jjpÞ measures the discrepancy between probabilistic distributions Q and p, and it is non-negative. The purpose of using the VB is to make Q as close to p as possible, i.e., to minimize KLðQ jjpÞ over Q among the set of factorized distributions. Now LðDÞ is constant, and thus, the purpose can be achieved by maximizing F ðQ Þ. Here, we set F M ðQ Þ as

 pðD; Zjh; MÞ def F M ðQ Þ¼ log Q ðZjMÞ QðZjMÞ;Q ðhjMÞ  M M  X X pðhi j#i ; MÞ pð#i jMÞ þ log þ log Qðhi jMÞ Q ðhi jMÞ;Q ð#i jMÞ i¼1 Q ð#i jMÞ Q ð#i jMÞ i¼1 pðMÞ ; Q ðMÞ

where hf ðXÞiQ ðXÞ is the expectation of f with respect to the distribution Q ðXÞ, namely EQ ðXÞ ½f ðXÞ. Then, we obtain

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

559

574

¼ KLðQ ðZ; h; #; MÞjjpðZ; h; #; MjDÞÞ þ F ðQ Þ;

þ log

558

580 581 582 583 584 585 586

587

589 590 591

592

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1

594 595 596 597 598 599

600

602 603

604

8

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx

F ðQ Þ ¼

X

QðMÞF M ðQÞ:

ð5Þ

M

To maximize F ðQ Þ, the VB iteratively maximizes over a distribution of each parameter while the distributions corresponding to the other parameters are fixed. By the variational method, the (t þ 1) th iteration procedure is as follows: VB E-step:

Q ðZjMÞ

ðtþ1Þ

607 608 609 610 611

612 614

Q ðMÞ ¼

pðfu 

/ pð#i jMÞ expfhlog pðhi j#i ; MÞiQðhi jMÞðtþ1Þ g;

1 ðM ¼ M Þ; 0

ðotherwiseÞ;

619 620

623 624 625 626

627 629

630 631

632

634 635 636 637 638

639

I Y

1 exp  Tr ðB0 SÞ ; 2

i¼1

udi 0 1 d

I X

ui ¼ 1&ui P 0 ;

b xa1 expfbxg: CðaÞ

644

The Wishart distribution is used for the distribution of a covariance matrix of a Gaussian distribution, and the Dirichlet distribution is used for a mixture proportion f/gM i¼1 .

645

Next, we introduce hyperpriors faj gM j¼1 , which control the vari-

646

ances of fwj gJj¼1 , i.e., the inverse of Kj ¼ diag ðaj;1 ; . . . ; aj;dþ1 Þ corre-

642 643

647 648 649 650

pðfwj ; bj gIj¼1 jfaj gIj¼1 ; MÞ ¼

664

J Y N ðwj j0; ðbj Kj Þ1 ÞGðbj jq0 ; k0 Þ; j¼1

dþ1 Y

Gðaj;k jj0 ; f0 Þ;

where Kj ¼ diagðaj;1 ; . . . ; aj;dþ1 Þ:  M (the number of components):

( pðMÞ ¼

1 ; 15

if M 6 15;

0;

otherwise:

The remaining hyperparameters corresponding to each prior distribution are as follows:

In our inference, we employed the following setting for the hyperparameters:

666 667

668

670 671 672

674 673 676 677 678

679 681

As for B0 , we first perform k-means clustering and set B0 as the within-class variance–covariance matrix. Details of the calculation of the variational posterior can be found in [33].

682

References

686

i¼1

a

641

663

683 684 685

!

where c0 is a normalizing constant and dð Þ is an indicator function that returns 1 if the statement in the round bracket is true and 0 otherwise.  Gamma distribution:

Gðxja; bÞ ¼

662

pðbj Þ ¼ Gðbj jq0 ; k0 Þ;

m0 ¼ 0; n0 ¼ 1; g0 ¼ 10; d0 ¼ 1; q0 ¼ 10; k0 ¼ 1; j0 ¼ 1; and f0 ¼ 10:

where c is a normalizing constant.  Dirichlet distribution:

Dðfui gIi¼1 jd0 Þ ¼ c0

659

k¼1

 Wishart distribution: 2

658

pðwj jbj ; aj Þ ¼ N ðwj j0; ðbj Kj Þ Þ;

pðaj Þ ¼

Here, we describe the prior distributions that we employed in the inference. First, we define notations of some probability distributions.

WðSjg0 ; B0 Þ ¼ cjSi j

657

661

li : m0 ; n0 ; Si : g0 ; B0 ; ui : d0 ; bj : q0 ; k0 ; and aj : j0 ; f0 :

g0 d1

655

1

M

B.3. Prior distributions

618

¼ Dðfu

I i gi¼1 jd0 Þ:

a (the coefficient of each PWA component and the variance of noise (see Eq. (2))):

with M  ¼ arg maxF M :

622

617

N ðli jm0 ; ðn0 Si Þ1 ÞWðSi jg0 ; B0 Þ:

fwj ; bj gJj¼1 ; f j gM j¼1



621

616

I Y i¼1

I i gi¼1 Þ

To obtain the optimal model, we compute the optimal F M ðQ Þ with respect to each M and then select the optimal model M . Here there still exists the problem of local minima. The VB procedure is highly likely to capture a local minima. To overcome this problem, Ueda and Ghahramani proposed a Bayesian SMEM algorithm for a mixture model that executes a split-and-merge procedure and avoids stacks at a local minimum [33].

615

pðfli ; Si gIi¼1 jMÞ ¼

where hi denotes fhj gj–i . The names ‘‘E-step’’ and ‘‘M-step,’’ are adopted on the basis of an analogy with an EM algorithm [13]. Next, we choose the optimal model. Remember that F ðQ Þ is the average of F M ðQ Þ over Q ðMÞ (see Eq. (5)). Thus, Q ðMÞ that maximizes F ðQ Þ is given by



653

656

n

Q ð#i jMÞ

652

pðSi Þ ¼ WðSi jg0 ; B0 Þ;

 fui gM i¼1 (the mixture weight of each Gaussian component (see Eq. (1))):

VB M-step:

ðtþ1Þ

651

pðli jSi Þ ¼ N ðli jm0 ; ðn0 Si Þ1 Þ;

/ expfhlog pðD; Zjh; MÞiQðhjMÞðtÞ g;

Q ðhi j#i ; MÞðtþ1Þ / exp hlog pðD; Zjh; MÞiQðZjMÞðtÞ ;Qðhi jMÞðtÞ o þhlog pðhi j#i ; MÞiQð#i jMÞðtÞ ; 606

 fli ; Si gM i¼1 (the mean and variance–covariance matrix of each Gaussian component (see Eq. (1))):

def

sponds to the variance of wj . By introducing aj , the influence of each component of wj can be modified more flexibly. Then we employ the following conjugate priors as the prior distributions for parameters.

[1] K. Aihara (Ed.), Theme Issue, Theory of hybrid dynamical systems and its applications to biological and medical systems, Philosophical Transactions of The Royal Society A 368 (1930) (2010). [2] K. Akakura, N. Bruchovsky, S.L. Goldenberg, P.S. Rennie, A.R. Buckley, L.D. Sullivan, Effects of intermittent androgen suppression on androgen-dependent tumors-apoptosis and serum prostate-specific antigen, Cancer 71 (9) (1993) 2782. [3] H. Attias, Inferring parameters and structure of latent variable models by variational Bayes, in: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Stockholm, Sweden, 1999, pp. 21–30. [4] A. Bemporad, F. Borrelli, M. Morari, Optimal controllers for hybrid systems: stability and piecewise linear explicit form, in: Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, Australia, 2001, pp. 1810–1815. [5] A. Bemporad, F. Borrelli, M. Morari, On the optimal control law of linear discrete time hybrid systems, Lecture Notes in Computer Science 2289 (2002) 105. [6] K.P. Bennett, O.L. Mangasarian, Multicategory discrimination via linear programming, Optim. Methods Softw. 3 (1994) 27. [7] R.R. Berges, Y. Furuya, L. Remington, H.F. English, T. Jacks, J.T. Isaacs, Cell proliferation, DNA repair, and p53 function are not required for programmed death of prostatic glandular cells induced by androgen ablation, Proc. Natl. Acad. Sci. USA 90 (19) (1993) 8910. [8] E.J. Bredensteiner, Multicategory classification by support vector machines, Comput. Optim. Appl. 12 (1999) 53.

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710

MBS 7350

No. of Pages 9, Model 5G

10 May 2013 Q1 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742

T. Suzuki, K. Aihara / Mathematical Biosciences xxx (2013) xxx–xxx [9] N. Bruchovsky, L. Klotz, J. Crook, S. Malone, L. Charles, W.J. Morris, M.E. Gleave, S.L. Goldenberg, Final results of the Canadian prospective phase II trial of intermittent androgen suppression for men in biochemical recurrence after radiotherapy for locally advanced prostate cancer, Cancer 107 (2006) 385. [10] N. Bruchovsky, B. Lesser, E. Van Doorn, S. Craven, Hormonal effects on cell proliferation in rat prostate, Vitam. Horm. 33 (1975) 61. [11] N. Bruchovsky, P.S. Rennie, A.J. Coldman, S.L. Goldenberg, M. To, D. Lawson, Effects of androgen withdrawal on the stem cell composition of the Shionogi carcinoma, Cancer Res. 50 (1990) 2275. [12] J.M. Crook, E. Szumacher, S. Malone, S. Huan, R. Segal, Intermittent androgen suppression in the management of prostate cancer, Urology 53 (3) (1999) 530. [13] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodological) 39 (1) (1977) 1. [14] A.M. Evens, T.M. Lestingi, J.D. Bitran, Intermittent androgen suppression as a treatment for prostate cancer: a review, The Oncologist 3 (6) (1998) 419. [15] S.L. Goldenberg, M.E. Gleave, L.D. Sullivan, K. Akakura, Intermittent androgen suppression in the treatment of prostate cancer: a preliminary report, Urology 45 (5) (1995) 839. [16] W.P.M.H. Heemels, B. De Schutter, A. Bemporad, Equivalence of hybrid dynamical models, Automatica 37 (2001) 1085. [17] C.S. Higano, W. Ellis, K. Russell, P.H. Lange, Intermittent androgen suppression with leuprolide and flutamide for prostate cancer: a pilot study, Urology 49 (1997) 79. [18] Y. Hirata, N. Bruchovsky, K. Aihara, Development of a mathematical model that predicts the outcome of hormone therapy for prostate cancer, J. Theor. Biol. 264 (2010) 517. [19] C. Huggins, C.V. Hodges, Studies on prostatic cancer. I. The effect of castration, estrogen and of androgen injection on serum phosphatases in metastatic carcinoma of the prostate, Cancer Res. 1 (1941) 293. [20] C. Huggins, R.E. Stevens, C.V. Hodges, Studies on prostatic cancer. II. The effect of castration on advanced carcinoma of the prostate gland, Arch. Surg. 43 (1941) 209.

9

[21] L.H. Klotz, H.W. Herr, M.J. Morse, W.F. Whitmore, Intermittent endocrine therapy for advanced prostate-cancer, Cancer 58 (11) (1986) 2546. [22] H. Kuramae, Y. Hirata, N. Bruchovsky, K. Aihara, H. Suzuki, Nonlinear systems identification by combining regression with bootstrap resampling, Chaos 21 (2011) 043121. [23] N. Kyprianou, N.T. Isaacs, Quantal relationship between prostatic dihydrotestosterone and prostatic cell content: critical threshold concept, Prostate 11 (1987) 41. [24] L. Noakes, The Takens embedding theorem, Int. J. Bifur. Chaos 1 (4) (1991) 867. [25] R.L. Noble, Hormonal control of growth and progression tumors of NB rats and a theory of action, Cancer Res. 37 (1) (1977) 82. [26] M.H. Rashid, U.H. Chaudhary, Intermittent androgen deprivation therapy for prostate cancer, The Oncologist 9 (2004) 295. [27] C.E. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. [28] T. Sauer, J.A. Yorke, M. Casdagli, Embedology, J. Stat. Phys. 65 (3/4) (1991) 579. [29] R. Siegel, D. Naishadham, A. Jemal, Cancer statistics, A Cancer J. Clin. 62 (2012) 10. [30] E.D. Sontag, Non-linear regulation: the piecewise linear approach, IEEE Trans. Autom. Control 26 (2) (1981) 346. [31] T. Suzuki, N. Bruchovsky, K. Aihara, Piecewise affine systems modelling for optimizing hormonal therapy of prostate cancer, Philos. Trans. A R. Soc. 368 (2010) 5045. [32] F. Takens, Detecting strange attractors in turbulence, Lecture Notes Math. 898 (1981) 366. [33] N. Ueda, Z. Ghahramani, Bayesian model search for mixture models based on optimizing variational bounds, Neural Netw. 15 (2002) 1223. [34] A.J. Van Der Schaft, J.M. Schumacher, Complementarity modelling of hybrid systems, IEEE Trans. Autom. Control 43 (1998) 483. [35] S.R. Waterhouse, D. MacKay, A.J. Robinson, Bayesian methods for mixture of experts, Adv. Neural Inf. Proces. Syst. NIPS8 (1995) 351–357.

Q1 Please cite this article in press as: T. Suzuki, K. Aihara, Nonlinear system identification for prostate cancer and optimality of intermittent androgen suppression therapy, Math. Biosci. (2013), http://dx.doi.org/10.1016/j.mbs.2013.04.007

743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774