MATHEMATICAL COMPUTER MODELLING Mathematical and Computer Modelling 33 (2001) 1323-1345
PERGAMON
www.elsevier.nl/locate/mcm
Stochastic Modeling of Carcinogenesis by State Space Models: A New Approach WAI-YUAN
TAN
Department of Mathematical Sciences The University of Memphis Memphis, TN 38152-6429, U.S.A.
[email protected] CHAO W. CHEN AND WEI WANG National Center for Environmental Assessment USEPA, Washington, D.C. 20460, U.S.A.
Abstract-In
this paper, we have developed some state space models for carcinogenesis involving multievent models and multiple pathways models. In these state space models, the stochastic system models are stochastic models of carcinogenesis expressed in terms of stochastic differential equations, whereas the observation models are statistical models based on the observed number of detectable preneoplastic lesions per individual over time and the observed number of detectable cancer tumors per individual over time. In this paper, we have applied some of the theories to some animal papillomas data from some initiation-promotion experiments on skin cancer in mice to estimate some unknown parameters. For this data set we have obtained excellent fit by a model with three piece-wise intervals. @ 2001 Elsevier Science Ltd. All rights reserved.
Keywords-Multievent model of carcinogenesis, Multiple pathways model of carcinogenesis, Ob servation model, Stochastic differential equations, Stochastic system model.
1. INTRODUCTION To develop stochastic models of carcinogenesis, the traditional approach is by way of Markov theories. The basic approach along this line consists of four basic steps: (a) (b) (c) (d)
derive derive derive obtain
the the the the
probability generating function (PGF) of the number of tumors, incidence function of tumors, probability distribution of time to tumor onset, and probabilities of the number of tumors.
This approach has been described and illustrated in [I]. The problem with such a modeling approach is that if the model is more complex than the framework of the two-stage model, the modeling process becomes too complicated to be useful. Even for the basic two-stage (MVIC) model, an assumption that a single malignant cell is equivalent to a tumor must be made in order to simplify mathematics. Recent studies by cancer researchers have shown that the process
0895-7177/01/S - see front matter @ 2001 Elsevier Science Ltd. All rights reserved. PII: s0895-7177(00)00319-8
Typeset by dn/zs-w
1324
W.-Y.
TAN
et al.
of carcinogenesis could be very complex, and may be far beyond the framework of the MVK two-stage model. (1) The carcinogenic process may be time-dependent (i.e., nonhomogeneous). This is especially true in initiation and promotion experiments in which animals were exposed to different chemicals with different dose rates at different time. (2) In many practical situations, the process may involve k-stages with k > 2. For example, in colon cancer, it is now recognized that three-stage models are more appropriate than twostage models [2]. Similarly, in leukemia, Little et al. [3] have shown that the three-stage models are more appropriate. (3) In many cases, the carcinogenesis may involve more than two pathways. when BGC3Fl mice were exposed to dichloroacetic acid (DCA),
For example,
carcinomas, adenomas,
hyperplastic nodules, and other preneoplastic lesions were induced in liver and stained positive with different markers [4]. DeAngelo [5] hypothesized that at least four pathways are involved for the DCA-induced
Hepatocellular carcinomas in B6C3Fl
mice; that is,
a carcinoma may be preceded directly by hyperplastic nodules, adenomas, or dysplastic foci, and through an indirect pathway, carcinomas may also evolve from adenomas that have evolved from hyperplastic nodules. Because of the complicated nature of many carcinogenesis models, to develop general theories would call for some feasible new approaches which are much more easier to handle mathematically and can be applied to animal carcinogenicity data caswell as other data. For achieving these purposes, in this paper we will propose an alternative approach by stochastic differential equations. By using this approach, we will then develop some Kalman filter models for carcinogenesis. The stochastic differential equation approach is equivalent to the traditional approach of Markov theories and is the consequence of the observation that the stochastic birth-death process and the mutation process is equivalent to multinomial distributions over small time intervals. In Section 2, we will extend the multievent model of carcinogenesis to involve cancer progression. We will illustrate how to derive stochastic equations for the numbers of normal stem cells, initiated cells, and cancer tumors as well as the incidence functions of detectable cancer tumors. In Section 3, we will derive stochastic equations and incidence function of the multiple pathways model proposed by DeAngelo [5]. In Section 4, we will derive some state space models (Kalman filter models) for these models. As an application, in Section 5, we will apply some of the results to some initiation-promotion data to illustrate how to estimate the unknown paramet,ers.
‘2. THE
EXTENDED
MULTIEVENT
MODEL
The k-stage multievent model with k 2 2 was first proposed by Chu [G]. When k = 2, this reduces to the MVK two-stage model; see [l, Chapter 31. To illustrate, let 1” denote normal stem cells, and 1.j the jt” stage initiated cells arising from the (J’ - l)t” stage initiated cells (.i = l,... , k) by mutation or some genetic changes. Then the model is specified by t,he transitions I” + Ii -+ ... --+ 1k:, with the I3 cells (j = 0, 1, . . . , k) subject to stochastic proliferation (birth) and differentiation (death) (for details, see [l, Chapter G] and [3,7]). In these models, an assumption that the kt” stage initiated cells (Ik cells) are identical with tumors has been made, thus ignoring the stochastic proliferation (birth) and differentiation (death) of IA. cells. Hence, when cancer tumors arise from Ik cells through some stochastic birth-death processes, the mathematical theories of these models developed by Tan [l] and Little [7] through numerical integration are no longer valid. (Note that an Ik cell may give rise to zero tumors under stochast,ic birt,h and death.) To relax this assumption, we follow Yang and Chen [g] to assume thatPcancer tumors derive through clonal expansion of primary Ik cells (tumor cells), where primary 1~.cells are Ik cells which arise directly by mutation or other genetic changes from Ik-i cells. We will refer these models as the extended multievent models to be distinguished from the multievent models defined in [G].
Stochastic
2.1.
1325
Modeling of Carcinogenesis
Stochastic Differential Equations
Let bj (t), dj (t), and aj (t) denote the birth rate, the death rate, and the mutation rate of the Ij cells, respectively, for j = 0, 1, . . . , k - 1. That is, the probabilities that an 1j cell at time t will give rise to two Ij cells, zero 1j cells, and one Ij cell and one 1j+l cell at time t + At are given, respectively, by bj(t)At + o(At), dj(t)At + o(At), and aj(t)At + o(At), where o(At) is defined by limht_+o o(At)/At = 0. To derive stochastic differential equations for Ij, j = 0, 1, . . . , k - 1, denote by Bj (t) = number of new Ij cells generated by stochastic cell proliferation and differentiation ofIj
cellsduring(t,t+At],j=O,l,...,
k-l;
Dj(t)=~~ulnberofdeathofI~cellsduring(t,t+dt],j=O,l,...,k-l; A’Ij(t) = number of new Ij cells arising from Ij- 1 cells by mutation or some genetic changes during (t, t + At], j = 1,. . . , k (note that the Mk(t) cells are the primary cancer tumor cells generated during (t, t + dt]). Then, for j = 0, 1, . . . , k- 1, to order o(At) the conditional distributions of Xj(t) = [Bj(t), Dj(t), slj+,(t)] given Ij(t) at time t are multinomial with parameters {13(t), bj(t)At, dj(t)At, aj(t)At}. Further. we have the following stochastic equations for N(t)
= lo(t) and Ii(t),
i = 1,. . . , k - 1:
lo(t + dt) = lo(t) + Be(t) - Do(t), Ij(t + dt) = Ij(t) + iI+
+ B,(t)
(1) j=l,...,k-1.
- Dj(t),
(2)
Let eo(t)At = [Be(t) - lo(t)bo - [Do(t) - lo(t)do(t)At] and for j = 1,. . . , k- 1, ej(t)At = [Mj(t) - Ij_~(t)q_l(t)At] + [Bj(t) - Ij(t)bj(t)At] - [Dj(t) - Ij(t)dj(t)At]. Then, denoting by AX(t) = X(t + At) - X(t), we have the following stochastic differential equations for Ij(t)? j=O.l,...,
k-l: Alo
= lo(t)eo(t)At
Al,(t)
= {I,_l(t)+l(t)
(3)
+ eo(t)At,
+ Ij(t)q(t)}
where cj(t) = bj(t) - dj(t), j = O,l,. . . , k - 1. In equations (3),(4) , given the numbers E(t)
j=
At + ej(t)At, = {Io(t),lj(t),
j = l,..
l,...,k-1,
(4)
., k - 1) at time t, the
random noises {ej (t), j = 0, 1, . . . , k - 1) are residuals of linear combinations of multinomial random variables from its conditional mean values; hence, these random noises have expectation zero conditionally. It follows that E(ej(t)) = 0 for j = 0, 1, . . . , k - 1. Using the basic formulae Cov(X,Y) = E{Cov[(X,Y) 1 Z]} + Cov[E(X 1 Z),E(Y 1 Z)], it is also obvious that the ej(t)s are uncorrelated with elements of U(t). Further, to order o(At), t,he variances aucl covariances of elements j = O.l,...,
of the ej(t)s
are easily obtained as COV[e,(t)At,ej(t)At]
k, Qjj(t) Q.i.j(t) = 0, for i # j.
= E{[l
2.2.
Distribution Preneoplastic
The Probability and Detectable
- do~]lj_l(t)Qj-l(t)
+ Ij(t)[bj(t)
of the Number Lesions
= Qi3(t)At.
where for
(j = 071,. . . 7k - l),
+ dj(t)]}
of Detectable
Cancer
Tumors
Let T(t) denote the number of detectable cancer tumors at time t. To derive stochastic equatious for T(t), let the birth rate and death rate at time t of the In: cells which arise from the Ik-1 cells at time s be given by b~(s, t) and dT(s, t), respectively. Let PT(s, t) be the probability t,hat, a primary cancer tumor cell arising at time s will develop into a detectable cancer t,umor by t,ime t. Assume that a cancer tumor is detectable only if it contains at least NT cancer tumor cells. Then, by using results from [9], we obtain 1 PT(s, t, = <(s, t) + q(s, t) where <(s,t)
= exp{-
J,“[~T(s,z)
77(% t) as,
t) + rl(s, t) >
NT-1 ’
- &r(S,Z)] dz} and a(~, t) = Jf &r(s, Y)<(s~y) dy.
(5)
132G
W.-Y.
If the
TAN et al.
of tumor cells follow a stochastic
growth
bTexp(-6(t--s))
and &(s,t)
exp{&
- s))-l]}
[exp(-6(t
Gompertz birth-death
= drexp(--6(t--s)),
with& = (bT-d~)Jd~d q(s,t) - <(t - s)) so that ~T(s, t) reduces to
[bT/(bT - dT)](l
where ,~~(t) = (1 - &)/(I described by Gompertz
= q(t-s)
= S,fD~(s,~)E(s,y)dg
- s) = bT -(N’-l)UT(t - s) (1 - t@(t -
= &(t
PT(S,t)
- &&(t - S)) birth-death
with
6,
process, then b~(s, t) =
wh ere 6 > 0. In this case, <(s, t) = E(t - s) =
dT/bT.
=
SillCe the
=
S)}NT-l,
growth Of tUlllOr
CdlS
process [9], we will assume the above stochastic
best
iS
Gompertz
birth-death process for the growth of tumor cells so that pT(S, t) = PT(t - s). To derive the probability distribution of T(t), let T(s, t) be the number of detectable
cancer
tumors at time t arising from primary tumor cells which arose from 1k_i cells at time s. Then, given Ik_i(s)
I&i
cells at time s, to order o(As),
T(s, t) is binomial with parameters 6T(%s,t (u
-
1 Ik-l(S))
of
l)ak_l(s)P~(t
conditional
-
{Ik_i(s),
@VeU
T(%t)
Ik_l(s)1k_l
s)As} II‘- l(S).
the conditional
a&l(S)PT(t CdS
Let 4T(U;t)
PGF of T(t) given Lk_1 = {Ik_i(s),
-
s)As}.
at tilde
probability
distribution
Thus, the conditional
S iS $T(‘U:S,t
be the PGF of T(t)
1 Ik__l(S))
and ~$T(qt
=
of
PGF (1
f
1 IA.-~) the
to < s I t}. Then, @T(U;t) = E&(u.>
1 L,_,),
and we have dT
(‘u,t1
Lk-1)
=
,Fy,
fi
4T (u; to +
sAs, t 14-1
(to + sAs))
s=o = *hno
fi {I + (U - l)crk-i s=o
(to + sAs) PT (t - (to + sAs)) As}‘~-~(~~+~*~)
&_i(.r)(wk-i(%)PT(t
- X) dx
. >
From above, the conditional distribution of T(t) given I& 1 is a Poisson variable with conditional mean given by J:i I~-~(z)cY~_~(s)PT(~
- Z) dx. It follows that one may write T(t) as
t T(t)
=
J to
Ik-Irk-l(x)PT(t
-
(7)
2) da: + ET(t),
where ET(t) has expected value zero and covariance
cov(ET(t),ET(T))
=
I
t
dx,
EI!+l(X)ar,_l(X)PT(t-X)
to
where t = Min(t, T). In many animal carcinogenicity experiments, very often some data on the number of preneoplastic lesions are available. For example, in animal initiation-promotion experiments for mice skin cancer, very often data on the number of papillomas per mouse and the number of carcinom
=
>
’
(8)
Stochastic
Modeling of Carcinogenesis
1327
WI, dr(t)
•~~-
Papillomas
Figure 1. A two-stage (multievent with k = 2) model for carcinogenesis promotion experiments.
in initiation-
where NI is the minimum number of 11 cells in the preneoplastic lesion for its detection, g(s, t) = exp{- s,” [bl (z) - 4 @)I ds} and h(s, t) = s,” bl (y)g(s, 9) dy. Using exactly the same approach as above, the conditional PGF &,(u, t 1 I_,) of T,(t)
given
{11(s), s 5 t} is obtained as & (u,t I {I)
= ev
{(u - 1) 1: ~~(~h(~Pd~,~)
dx}
.
Thus, the conditional distribution of T,,(t) given (11 (s), s 5 t} is Poisson with mean t ~l(~)w(~)JWc,~)
s to
dx.
t
It follows that Tp(t) can be expressed as
Tp@)= where Ed
J
Il(x)a~(x)Pr(x, t) dx +
Ez(t)t
(9)
to
has expected value zero and covariance
cov(~l(t), cz(T)) = /i El1 (xh (~)J’I (x:,f) dx, to
where t = Min(t, T).
2.3. The Incidence Function of Detectable Tumors Denote by QT(t) = exp{-S,toIk-l(T)ak_l(x)~~(t incidence function X,(t) of detectable tumors is
X7-(t) =-
$ $T(O;t) q5T(0, t)
=
- z)dx}.
Using the PGF &r(u,t),
t
&E
{
I~-~(x)cY~--~(x)
Qdt) J to
2 Er(t
1
- 2) dx .
the
(10)
To compute the above incidence function for given parameter values, we will
respectively, by -
2
Ik_l(i)ak_1(i)I+(t
-i)
i=to+1
I
and AT(t)
=
&E
QT(t)
2 j=to+1
I/c-l(j)%-l(j)
f h(t
-d
(11)
W.-Y.
132x
TAN
et al.
Then one may use the stochastic equations in (I)+,(2) to generate a random sample for { Ik.- i (s), s
XT(t)
. , IV} and compute &r(t) by the approximation
-$ ik-,(j)cw(~) $ mt - j)Y
N
(12)
j=to+1
where
To illustrate, we consider an extended multievent model with k = 3 and assume the parameter values as follows:
(1) for
10 cells, do(t) = 0.289
&Ne-60t,
ho(t) - do(t) = a0
where &N = 0.0005, X0 = lo-‘, for an initiator; (2) for
x0 + zox1,
=
Xi = lo-“,
60 = 10e5, zo = 100, where to is the dose level
i = 1,2, h(t)
= UJ (bi + po (1 + ZI)f‘l) )
d&(t) = UOdi, (ul = WI)(0.0001 + 0.0001Z~) ! a:! = g-7 (1 - exp(-l)),
where bi = b2 = 0.5100, di = dz = 0.4899, wg = l/3.65, 0,50,100, where TVis the dose level of a promoter: (3) for primary tumor cells, bT(s, t) = bTes(t-S),
dT(s,t)
ILO= 0.01, ,u~ = 0.01. t1 =
= dTes(t-S),
where bT = 1, dT = 0.05, and S = 0.1. The incidence functions are plotted in Figure 2.
3. AN
EXTENDED MODEL OF
MULTIPLE PATHWAYS CARCINOGENESIS
At EPA, currently experiments are being conducted for assessing risk of dichloroacet,ic acid (DCA) on induction of hepatocellular carcinomas in mice. In these experiments. mice were
Stochastic
--------
low middle
------
high
dose
Modeling of Carcinogenesis
1329
le&l dose
dose
‘Igvel level
Figure 2. Plots showing the incidence with parameter values in Section 3.
function of a three-stage
exposed to DCA and preceding to carcinomas,
hyperplastic
multievent
model
nodules, dysplastic
nodules, and
adenomas were observed; furthermore, adenomas might also be evolved from hyperplastic nodules and dysplastic nodules. Let N = Ia denote normal stem cells and Tp the tumor cells; let II, Ig, and 1s denote initiated cells which lead to hyperplastic nodules, dysplastic nodules, and adenomas through clonal expansion, respectively. Then, based on studies by DeAngelo [5], the model can be presented schematically as in Figure 2. Let Ij(t) (j = 0,1,2,3) be the numbers of I3 (j = 0,1,2,3) cells at time t, respectively; let Tr (t), Tz(t), and Ta(t) be the numbers of hyperplastic nodules, dysplastic nodules, and adenomas at time t, respectively. To derive stochastic equations for these variables, we let the transition rates of various paths during (t, t + At] be given in Table 1. Further, as in Section 2, we follow [8] to assume that carcinomas derive by clonal expansion of primary tumor cells (T,, cells) and that hyperplastic nodules, dysplastic nodules, and adenomas derive from primary II, 12, and Is cells, respectively, by clonal expansion. (Primary Is cells are Is cells arising from {I,, i = 0, 1,2} cells directly by mutation or some genetic changes; the primary Ij cells for j = 1,2 are Ij cells arising from 10 cells directly by mutation or some genetic changes.) 3.1.
Stochastic
Differential
Equations
for Ij(t)
Denote by BJ (t) = number of new I3 cells generated by stochastic cell proliferation and differentiation of Ij cells during (t, t + At], j = 0, 1,2,3; Dj (t) = number of death of Ij cells during (t, t + dt], j = 0, 1,2,3; Mj((t) = number of new Ij cells arising from N = 10 cells by mutation or some genetic changes during (t, t + At], j = 1,2,3; Ni(t) = number of new 1s cells arising from Ii cells by mutation or some genetic changes during (t, t + At], i = 1,2; Tr,j(t) = number of new Tp tumor cells arising from I3 cells by mutation or some genetic changes during (t, t + At], j = 1,2,3.
W.-Y.
1330 Table
1. Transition
TAN et al.
rates of various paths for the multiple pathways
model in Sec-
tion 3.
Transition Trausition Probabilities
tth
day
During
(t + l)t”
1)
day
110
-
1flJ
--*
llo
-
1 1091 II
110
-
1 lo,1
I2
az(t)
llo
-
1 zo.1 13
a3(t)
1II
4
2 11
121
-
death
111
-
lZl,l
13
Pl @I
1II
-
1 Zl,l
Tp
n(t)
II2
-
112
-
122
+
lZ2,l
112
-
1 1271 Tp
113
-
2 13
113
-
death
d3(t)
113
-
11391 Tp
73(t)
2 10 death
ho(t) do(t) w(t)
h(t)
&(t)
12
bz(t)
death
dz(t)
2
13
Then, as in Section 2, we have the following stochastic Ii(t),
[t,t +
02(t) -Yz(t) b3(t)
difference equations for N(t) = lo(t) and
i = 1,2,3:
lo(t + dt) = lo(t) + Be(t) - Do(t), 1jj(t + dt) = I#)
+ hlj@)
+ B,(t)
13(t + dt) = 13(t) + MS(t) + 2
(13) - Dj(t),
j = 1,2,
(14)
Nj(t) + E&(t) - Ds(t),
(15)
j=l
T&
+ dt)
=
2
T&(t).
(16)
j=l
In equations (l3)-(lG),
the probability distributions
of the random variables are given as fol-
lows. (1) To order o(At), the conditional distribution of X0(t) = [Be(t), Do(t), Ail,(t), j = 1,2,3] given N(t) = 10(t) at time t is multinomial with parameters {lo(t);bo(t)At,do(t)At, a,(t)At,
j = 1,2,3}. X0(t)
That is,
1lo(t) - ML {l,,(t);
b,,(t)At, do(t)
cxj(t)At,
j = 1,2,3} .
(2) X3(t)
=
(3) For j = 1,2,
[B3(t)r
03(t),
Tp3(t)l
I 13(t)
N ML
{13(t);
b3(t)M
d3(t)&
r3(t)At}.
StochasticModelingof Carcinogenesis
1331
These distribution results lead to the following stochastic differential equations for Ij(t), j = 0, 1,2,3:
AI,(t) = Io(t)eo(t)At + eo(t)At, Al,(t)
= {lo(t)q(t)
a&(t)
=
(17)
+ Ij(t)q(t)}
r,(t)&)
+ &)&(t)
At + ej(t)At, + k&k&)
j = 1,2,
(18)
At + es(t)Ak
(19)
j=l
where Ej(t) = bj (t) - dj (t), j = 0, 1,2,3 and where the random noises ej are defined by
[Be(t) - Io(t)bo(t)Atl - [Do(t)- lo(t)do(t)Atl
eo(t)At
=
~j(t)At
= [n/Ij((t) - Io(t)aj((t)At]
~3(t)At = (AIs
+ [Bj(t) - Ij(t)bj(t)At]
- lo(t)~~s(t)At] + 2
for j = 1,2,
,
- [oj(t)
[Nj(t) - Ij(t)Pj(t)At]
- Ij(t)dj(t)At]
1
and
+ [B3(t) - Is(t)bs(t)At]
j=l -
[Wt)
-
M~MW~l.
Note that, given the numbers y(t)
j = O,l, 2,3} at time t, the random noises {ej(t),
= {Ij(t)l
j = O,l, 2,3} in equations (17)-(19) are residuals of linear combinations of multinomial random va.riables from its conditional mean values; hence, these random noises have expectation zero conditionally and are uncorrelated with elements of V(t). Further, to order o(At), the variances and covariances of elements of the ej(t)s are easily obkined where, for j = 0, 1,2,3,
Qjj(t)
= E
[ e&j15
[l - bj] Io(t)aj(t) + 1 -
i=o
j=O,l,...,
k - 1, Qij(t)
as COV[e,(t)At,
ej(t)At]
= Qij(t)At,
Iu(t)Pu(t) + Ij(t) [bj(t) + dj(t)]
3
u=l
= 0, for i # j.
3.2. The Probability Distributions of {Tj(t), j = 1,2,3} and the Number T(t) of Detectable Carcinomas Define the probability Pj(s, t) by 1 Pj(S,
t)
gj(slt)
R.,-1
hj(S, t)
=
+ hj(sd)
Sj(S,
t) -t hj(S,
t)
>
’
(20)
where R,i is the minimum number of cells in the preneoplastic lesion (the hyperplastic nodules, or dysplastic nodules, or adenomas) for its detection, gj(s, t) = exp{-
S,t[bj(Z) - dj (x)] dx} and
hj(s:t) = J~‘j(y)gj(s,y)dy* Then P’(s, t), j = 1,2,3, are the probabilities that a primary Ij cell, j = 1,2,3, at time s will give rise to a detectable hyperplastic nodule, a detectable dysplastic nodule, and a detectable adenomas through clonal expansion, respectively. Let
denote the conditional
PGF of Tj(t),
j = 1.2 given {Io(s),s
the conditional PGF of 7’s(t) given {Ij(S),
j = 0,1,2,
the conditional PGF of T(t) given {Ij(s), j = 0,1,2,3, approach from Section 2, we obtain the following.
s 5 s 5
5 t), &(u, t ( Jj, j = 0,1,2)
t}, t}.
and @T(u, t 1 Lj. j = 0,1,2,3) Then, using exactly t,he same
W.-Y.
1332
TAN
et al.
For j = 1,2,
4T
(%t 10,$,
j=l,2,3)
=exp{(li-l)~~~i(i:)h(i)R(r.t)d.j;
where PT (x, t) = PT (t - x) is given in equation From the above {k(s), tional
results,
it follows that
s I t) is P oisson with mean
the conditional
C;=, W-c)Pi(z)]fi(z, t) dx; th e conditional is Poisson with mean C&, St’, li(x)ri(x)&(x, tions for {Tj(t),
t J
j = 1,2,3}
Tj(t) =
s 5 t} is Poisson
j = 0,1,2,
of T’(t)
distribution
t) d2, respectively.
J:, Io(z)aj(s)Pj(x,
of 7’s(t) given {Ij(s),
distribution
(6). the condi-
with mean j~~[I~(~)os(~)
of T(t) given {Ij(s),
distribution
(j = 1,2) given
Similarly,
j = 0, 1,2,3, s 5 t}
t) dx. Thus, we have the following stochastic
t J[
l~(z)~j(z)pj(z,
for j = 1,2,
t) dx + &j(t),
4(l)a3(~)+~4(.)/3~(2)
to
T(t)=
2
(21)
where ~j (t), j = 1,2,3
I
(23)
to
and ET(t) have expected
value zero. The covariances
t J[
cov(cj(t),ej(T)) = JtE[~o(r)]aj(x)~j(x.i) dx, to
cov
(22)
and
P3(x,t)dx+E3(t),
j=l
Jtli(x)71(T)R(x,t)dx+&T(t),
j=l
(Q(t),&(T))
equa-
and for T(t).
to
Ts(t) =
+
=
E
Io(z)c~~(x)+ eIj(X)q(X)
to
j=l
are given by
for j = 1,2,
1
I'3(X,f) dx,
and
rjlx)h(x,fldx, j=lJ’E to[Ij(x)]
COV (ET(t), ET(T)) = 2
where t = Min( t? T) . 3.3.
The Incidence Function of Detectable
Denote by QT(t) = exp{of detectable tumors is
c,“=,
Tumors
A: Ij(x)yj(x)PT(x,
t) dx}. Tl len, the incidence
function
AT(t)
(24) Assuming a small time interval such as one hour or 0.1 day as one time unit, XT(t) can be approximated, respectively, by
-
2
eIj(i)^i3(i)J)T(t
r=to+l j=l
-i)
then
C&-(t) and
Stochastic
Modeling of Carcinogenesis
1333
and (25) To compute for {Ij(s), s
XT(t), one may use the stochastic equations in (13)-( 15)to generate a random sample j = 1,2,3, s < t} of size N. Denoting the generated sample by {[ii)(s), j = 1,2,3, for , N}, then Qdt) and b(t) are computed by the following approximations:
XT(t)
N
(26)
e ki,(i)r,(i) $
PT(t -i),
j=l
%=to+l
where
ij(i)
=
&
and
$ @?(t)Ij’)(i), r-1 -
k
$Ij”(i)*li(i)I+(t
i=to+l
-
i)
j=l
To illustrate, we assume some parameter values for the extended multiple pathways model in Figure 3 and compute the above incidence functions. The parameter values are assumed as follows. (1) For 1, cells, do(t) = 0.289(0.485 +0.05/(1-t t)), be(t) -do(t) = &Nembot, cq = O.O003w0&, i = 1,2,3, where &N = 0.0005, 60 = 10e5, wg = l/3.65, X1 = 0.509, AZ = 0.5112, XJ = 0.5001. (2) For Ii, i = 1,2,3 cells, hi(t) = wo(bi + /.~o(l + ZI)~‘), di(t) = wodi, PI = 02 = wo(lO-” + 5-*zr),y1 = 72 = wo(lO-"+ 10e5zr), 73 = wo(lO-"+ lO-'z~), ZI = 0,50,100, where bl = 0.509, dl = 0.48, b2 = 0.5112, d2 = 0.4888, b3 = 0.5001, d3 = 0.4999, p. = 0.01, and p1 = 0.1, where ZI is the dose level of a promoter. (3) For the primary tumor cells, bT(s,t) = b,ea(t-s), dT(s,t) = dTea(t-s), where bT = 1, dT = 0.05, and 6 = 0.1. The incidence functions are plotted in Figure 4.
-
Hyperplastic
0.9 -
l
* .. l
Figure 3. A multiple pathways dichloroacetic acid (DCA).
Nodule
Adenomas
.. -
Carcinomas
Dysplastic Nodule
model for hepatocellular
carcinomas
induced by
W.-Y.
Figure 4. parameter
TAN et al.
Plots showing the incidence function of a multiple pathway values in Section 3.
4. SOME KALMAN FILTER OF CARCINOGENESIS
model with
MODELS
Kahnan filter models (state space models) are stochastic models consisting of two submodelst,he stochastic system model which is the stochastic model of the system and the observation model which is a statistical model based on some data from the system. For carcinogenesis, the stochastic system model of the Kalman filter model is the set of stochastic differential equations for the numbers of normal stem cells and initiated cells as described in Sections 2 and 3; the observation model of the Kalman filter model is a statistical model based on some available cancer data from the experiment. To derive the observation model, we assume that data on the numbers of detectable preneoplastic lesion and of detectable cancer tumors per individual are available. For skin cancer, one may associate preneoplastic lesion with papillomas and associate cancer tumor with carcinomas. Then such data are available in several animal initiation and promotion experiments; see [lO,ll]. 4.1. Some Kalman Filter Models for the Multievent Model of Carcinogenesis For the multievent model described in Section 2, the stochastic system model of the Kalman filter model is represented by the stochastic equations (3),(4). To describe the observation model, let Yl (i, j) and Yz(i, j) denote the observed numbers of detectable preneoplastic lesion per individual and of detectable cancer tumors per individual of the it” individual at time tij (i = 1,. . . , m; j = l,..., ni), respectively. Then, the observation model is specified by the following statistical equations (stochastic equations): q(i,.?)
= TP (&j) + el(i,.G,
(27)
fi(i,j)
= T (&j) + en(i,j),
(28)
Stochastic Modelingof Carcinogenesis
1335
(U = 1,2) is the where T(t) and Tp(t) are given by equations (7) and (9), respectively, and e,(i,j) random measurement error associated with measuring Y,(i, j) (U = 1,2). As usual, it is assumed = az(i,j) (U = 1,2). One may assume that that E[e,,(i,j)] = 0 (U = 1,2) and VAR(e,(i,j)) these measurement errors are uncorrelated with one another and uncorrelated with the random noises in the stochastic system equations. 4.2.
Some Kalman Filter Models for the Multiple Pathways Model
of Carcinogenesis
For the multiple pathways model given in Section 3, the stochastic system model is specified by the stochastic equations (13)-(16) given in Section 3; for the observation model, the statistical equations are represented by Tj(t) (j = 1,2,3) and T(t) given by equations (21)-(23), Let Yj(t) (j = 1,2,3)
be the observed numbers of T’(t)
ti, J t*.i J
(j = 1,2,3),
respectively.
respectively, and Y(t) the
observed number of T(t). Then the observed equations are given by
K(i,j) = &(i,j)
=
YT(i,j) =
Io(x)c~~(x)
to
[
3
rti.i
C J
for
+ 5
Ii(x)&(x)
i=l
[~i(x)Ti(x>l
to
i=l
7‘= 1,2,
dx + [4&j) + e,(i,j)l ,
~~(x)c+(x)P,-(x,&j)
to
pT
1
J’3 (x,hj)
da: +
[~3(hj)
(x, tij) dx + [ET(hj) + ed&.dl
+ e3(%j)]
(29)
,
(30)
(31)
I
i = 1,. . . ,172, j = 1,. . . , lli. For implementing the above model to derive some optimal estimates of the state variables and
the unknown the time unit for j - 1 5 t equations in
parameters, we assume a very small time interval such as one day or one hour as and let At m 1 for this time unit. For such a time unit, one may aSsume Ii(t) = lij 5 j = j - 1 + At, i = 0, 1, . . . , k - 1. Then one may approximate the observation (27),(28) for the k-stage multievent model by the following stochastic difference
equations:
Y~(i,j)
t1.l C lo(S)Cro(s)J?
2
(sltij)
+
[El
(tij)
+
+
[&2 (kj)
(32)
el(i,j)],
s=to-t-1 t,j yZ(i,j)
%
C
Ik-I(s)Q~-l(s)&
(3,&j)
+ ez(i,j)],
(33)
s=to+1
for i = 1, . . . ) 112,j = 1, . . . , IQ. Similarly, one may approximate the observation equations for the multiple pathways models in Section 3 by the stochastic difference equations t;, Y,(i,j)
z
Y3(irj) =
C lo(S)W(S)p, s=to+1 t,; C
10(s)a3(s)
s=to+1
(s,tij)
+ [ET(hj)
I
+ 54(s)B(s)
fi
1=1
[
t*, YT(i,j)
E=
c
2
[Il(S)?‘l(S)]
PT
(sltij)
+
+
r = 1,2,
e,(i,j)l,
(S.&j)
[ET (bj)
+ [&3 (hj)
+ eT(&j)]
1
+ e3(Cj)],
(34)
(35)
(36)
s=to+11=1
1,. . , m, j = 1,. . . ,1zi. Given the above approximation, general theorems have been developed in [12] to derive the linear best estimates of the numbers of normal stem cells and the initiated cells given data for i =
W.-Y.
1336
TAN et al.
given the parameter values. To illustrate the application of these theorems, we assume an extended multievent model with k = 2 as given in Section 2 and generate some Monte Carlo and
data to estimate the numbers of normal stem cells and initiated cells over time. The assumed parameter values are as follows. (1) For normal stem cells, do = 0.289(0.485 + 0.05/(1 + t)), be - do = &NemdNt, LYN= QO+ ~13~1, where &N = 10m4, bN = 10P5, (~0 = 10m7, or = 1O-G, and z = 50. (2) For the initiated cells, by
= wyr, dl(t)
= wp~, XI = lo-“,
w = wa + zrwr, where
~1 = 0.50, ~1 = 0.4889, wa = 113.65, wr = 10m5, and zr = 50. (3) For tumor cells, b~(s,t)
= bTe-6(t-S),
d~(s,t)
= dTemGctmS),where by = 1, dT = 0.05,
and 6 = 10P3. Table 2. Generated number of N(t) and their Kalman filter estimators
fi(t).
Time(Week)
0
2
4
6
8
10
12
N(t)
1000000.0
1000913.0
1001568.0
1000716.0
997162.0
998408.0
1000183.0
J+(t)
1000000.0
1000012.1
999992.4
999969.8
999969.6
999981.7
999997.0
Time( Week)
14
16
18
20
22
24
26
N(t)
1000000.0
1000012.1
999992.4
999969.8
1007272.0
1005793.0
1005341.0
s(t)
1000013.6
1000030.9
1000048.4
1000066.0
1000084.0
1000102.4
1000120.8
Time( Week)
28
30
32
34
36
38
40
N(t)
1004058.0
1001504.0
1001922.0
999118.0
994664.0
1000077.0
994378.0
a(t)
1000139.0
1000156.5
1000174.0
1000191.5
1000208.6
1000225.3
1000241.9
Time(Week)
42
44
46
48
50
52
N(t)
994611.0
1000701.0
1000061.0
995300.0
994549.0
999390.0
Nt)
1000258.5
1000274.3
1000290.1
1000305.8
1000321.3
1000099.2
Figure 5. Plots showing the generated number of Z cell and the corresponding filter estimates over time.
Kahnan
Stochastic
Modeling of Carcinogenesis
1337
The observed number of detectable cancer tumors per individual were generated by adding a Gaussian random error to the generated number, where the Gaussian random error has mean zero and variance M(t)a2
with M(t)
being the number of the generated detectable number of
cancer tunlors per individual at time t and with u2 = 10. Given in Table 2 are the generated numbers and the corresponding Kalman filter estimates of the normal stem cells over time. Given in Figure 5 are the plots showing the generated numbers and t,he corresponding Kalman filter estimates of the initiated cells over time. The results in Table 2 and Figure 5 show clearly that the Kalman filter estimates can trace the generated numbers very closely, indicating the usefulness of these theorems. 5.
APPLICATIONS
TO INITIATION-PROMOTION
For t,he purpose of illustration, consider an initiation-promotion
DATA
experiment for skin cancer
in mice (see [lO,ll]). In these experiments, at time to similar mice are treated by an initiator for a very short period (normally a few days); these treated animals are then promoted during the period (tm,te] by a promoter.
At a number of fixed times after initiation, a fixed number
(say, ‘/rj at time tj) of treated animals are sacrificed and autopsies performed. The observed data are as follows: (1) the number of animals with papillomas at time tj among the sacrificed animals at tj, j = 1,. . . , k; (2) given that the animal has papillomas, the number of papillomas per animal are then
counted; see [lo]. Given in Table 3 are a set of data from (lo]. In some experiments, the number of animals with carcinomas at time tj among the sacrificed animals are also counted; given that the animal has carcinomas, then the number of carcinomas per animal at tj are counted. (Such data have been reported in [ll].) To model carcinogenesis for these experiments, let Z,(i, j) be the number of papillomas per mouse and Z,(i, j) the number of carcinomas per mouse for the ith mouse sacrificed at time tj. Then, assuming a two-stage model as given in Figure 1, the state variables for this animal are described by lo(i,j),&(t) (to < t 5 tj), Z,,(i,j) and ZC(i,j), w 1lere lo(i, j) denotes the number of initiated cells by the initiator in this animal, and Ii,j (t) the number of I cells at time t for this animal. As usual [I], we assume that the numbers of normal stem cells (say NO at time to) are very large, the mutation rate (say CrN(to)) of normal stem cells at to very small but X = NOON (to) is finite; in practice, it may also be assumed that the spontaneous mutation rate of normal stem cells are negligible so that almost all papillomas arise by clonal expansion of initiated cells induced by the initiator. Then the &(i, j)s are distributed independently as Poisson with mean X; the probability distributions of the state variables are derived in the following section. 5.1.
The Probability
Distributions
of the State Variables
Let one time unit be a small interval such as one hour or 0.1 day. During (t, t + l], let the probabilities of birth, death, and mutation from I Tp of the I cells be given by br(t), dI(t), and aI( respectively. Similarly, let b~(.s, t) and &(s, t) denote the probabilities of birth and death during (t, t + l] of primary tumor cells which arise from 1 cells by mutation at time s. For the 9” animal sacrificed at time tj, let Bi,j(t), Di,j(t), and &(t) denote the total numbers of birt,h. de&h, and mutation (from I Tp) of I cells during (t, t + 11, respectively. Then, we have lo(i,j) Z,(i,j)
N Poisson(X),
independently for i = 1, . . . , nj ,
( Io(i,j)
- B{Io(i,j);PI(to,tj)},
{Bi,j(t),
oi,j(t),
bfi,j(t))
I 4.j(t)
j = 1,. . . , k:
independently for i = l,...,?zj; N ML {Ii,j(t);bl(t),dl(t),crr(t)}
independently for i = 1, . . . ,1x3.
and 3
W.-Y. TAN et al.
338 Table 3. Observed Nissan as initiator.
mean number of papillomas
per mouse from EPA data using
1
A: Low Dose Level Time(Week)
1
2
3
4
5
G
7
8
9
10
11
12
13
l’(t)
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.01
0.11
0.15
0.06
0.18
0.27
(80)
(80)
(80)
(80)
(80)
(80)
(89)
(80)
(80)
(80)
(80)
(80)
(80)
Time(Week)
14
15
16
17
18
19
20
21
22
23
24
25
26
Y(t)
0.35
0.28
0.31
0.25
0.27
0.35
0.32
0.32
0.41
0.35
0.46
0.49
0.44
(80)
(80)
(80)
(80)
(79)
(79)
(79)
(79)
(79)
(79)
(79)
(79)
(78)
Time(Week)
27
28
29
30
31
32
33
34
35
3G
l’(t)
0.47
0.51
0.42
0.44
0.56
0.50
0.57
0.63
0.60
0.61
0.61
0.65
0.73
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(76)
(76)
(76)
(76)
(76)
(74)
40
41
42
43
44
45
46
47
48
49
50
51
52
Time(Week) l’(t)
0.62
0.68
0.59
0.75
0.54
0.64
0.58
0.72
0.84
0.66
0.64
0.66
0.69
(74)
(74)
(74)
(68)
(6’3)
(64)
(GO)
(59)
(58)
(57)
(54)
(54)
(52)
B: Middle Dose Level Time(Week)
1
2
3
4
5
G
7
8
9
10
11
12
13
l’(t)
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.02
0.07
0.17
0.2G
0.39
0.39
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(76)
Time(Week)
14
15
1G
17
18
19
20
21
22
23
24
25
2B
l’(t)
0.59
0.61
0.84
0.81
0.97
0.95
1.07
1.07
1.09
1.17
1.23
1.X
1.01
(76)
(76)
(76)
(76)
(76)
(76)
(76)
(7(i)
(76)
(76)
(76)
(76)
(75)
27
28
29
30
31
32
33
34
35
36
37
38
39
1.38
1.52
1.41
1.40
1.38
1.44
1.41
1.40
1.62
1.42
1.52
1.23
1.36
Time(Week) I’(t)
(75) Time(Week) Y(t)
(75)
(75)
(75)
(75)
(75)
(74)
(71)
(70)
(‘38)
(68)
(68)
(68)
40
41
42
43
44
45
4G
47
48
49
50
51
52
1.28
1.40
1.32
1.38
1.45
1.29
1.34
1.39
1.42
1.26
1.31
1.35
1.40
(67)
(62)
(62)
(60)
(60)
(55)
(53)
(51)
(49)
(49)
(47)
(45)
(44)
C: High Dose Level Time(Week)
1
2
3
4
5
G
7
8
9
10
11
12
13
y(t)
0.00
0.00
0.00
0.00
0.00
0.01
0.15
0.23
0.66
0.84
1.14
1.79
1.83
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(78)
(77)
14
15
16
17
18
19
20
21
22
23
24
25
26
Time(Week) l’(t)
Time(Week) l’(t)
2.57
3.32
3.53
3.51
3.77
3.80
3.64
3.74
3.87
4.31
4.51
4.76
5.52
(77)
(77)
(77)
(77)
(77)
(77)
(77)
(77)
(77)
(76)
(7’3)
(76)
(75)
27
28
29
30
31
32
33
34
35
36
37
38
39
5.34
5.51
5.51
4.85
4.58
4.61
5.06
5.14
5.10
4.95
5.18
4.55
4.43
(75)
(74)
(74)
(74)
(73)
(73)
(72)
(71)
(69)
(67)
(65)
(G5)
(65)
Time(Week)
40
41
42
43
44
45
46
47
48
49
50
51
52
y(t)
4.88
4.49
4.00
4.35
4.18
4.42
4.04
4.62
4.31
4.3B
3.6I.i
3.tiG
4.47
(62)
(59)
(57)
(54)
(54)
(52)
(48)
(45)
(45)
(44)
(39)
(39)
(3(i)
The number in parentheses is the total number of mice examined
StochasticModelingof Carcinogenesis
1339
Furthermore, for to < t < tj,
A.j(t +
1) = ~t,,tIO(i,j)
+
Ii,j(t) +&j(t)
with Ii,j (to) = 0,
- oi,j(t),
(37)
where 6i.j is the Kronecker’s 6. Using these results, we have that Z,(i, j) is distributed as Poisson with mean {A Pl(to, tj)} independently for i = 1, . . . , tlj. To derive the probability distribution of Z,(i, j), let Z,(i; s, tj) denote the number of detectable carcinomas of the it” animal at time tj arising from primary tumor cells at time s. Then, &(i, j) = C?!=,,+, Z,(i; s, tj) given Ii,j(S) I cells at time S is
Z,(i; s, tj) and the conditional distribution of
2, (6 s,tj) 1Iz,j(s) m B {IQ(S); al(s)h(tj
- s)}
independently for i = 1,. . . , IQ and s = to + 1,. . . , tj. Thus, the conditional PGF of Z,(i; s, tj) is (1 + (IL
7
l)O!I(S)PT
(tj
-
S)}““(‘) = exp Z
for small al(s). with mean Ci&+,
{I.;,j(S)
exp{(u
-
[l + (U - l)aI(s)PT
log
l)li,j(S)(.rI(S)PT(tj
S)]}
(tj -
-S)}
It follows that the conditional distribution of ZC(i, j) is approximately Poisson Ii,j(s)al(S)PT(tj
- s) independently for (i = 1,. . . , TIj, j = 1,. . . , k) and for
small a~@). 5.2. The Kalman Filter Model Let Yp( i, j) and Y,(i, j) denote the observed numbers of papillomas per mouse and of carcinomas per mouse for the it” animal sacrificed at tj. Let one time unit be a very small interval such as one hour or 0.1 day. Then, with the distribution theories given above, we may define a discrete-time state space model (Kalman filter model) for the papillomas and carcinomas. In this st,ate space model, the stochastic system model is specified by the state variables {lo(i, j), Ii,,j(t) w hereas the observation model is specified by Yp(i,j) and Y,(i, j). (to 5 t L tj), -&(i,j), &(i,j)} With iid denoting independently and identically distributed, the stoch,astic system model and the observation model of the state space model are given by equations in (A) and (B), respectively. (A) STOCHASTIC SYSTEM MODEL.
Io(i,j)
‘2 Poisson(X) , i=
-G(i,j)
I Mi,d
l,...,llj;
j = l,...,k;
N B(Mi,j);P~
(38)
(to,tj)),
independentlyfor i = 1,. . . , Ilj, Ii.j(t + 1) = sj.,,lo(i,j)
I A,j(t)
(39)
+ Ii.j(t) + Bv,j(t) - oi.j(t),
with I;,j(to) = 0, [B;.J(t).D,,.j(t)?nr,,~(t)]
j = l,...,k;
and
(40)
N n,rL{l;,(t);br(t),dr(t),nr(t)}, independently for i = 1, . . . , Itj;
&(i,j)
=
5
(41)
T,(s,t,).
s=to+l
where T, (s,tj)
1 Afi,j(s)
-
B
{k&(s);
independently for i = 1, . . . ,?lj.
PT(tj
-
s)} . (42) (43)
W.-Y.
1340
(B)
TAN
et al.
OBSERVATION MODEL.
for i = 1,. . . , nj, j = 1,. . . , k, where e,(i, j) and eC(i, j) are the measurement random errors for measuring Yr,(;,(i,j) and Y,(i, j), respectively. (One may assume that these measurement errors are independently
distributed
of one another and of the random noise variables in the system
model, and have expected values zero and variances Var{e,(i,
= a; and Var{e,(i, j)}
j)}
= 0,’
for ep(i, j) and eC(i, j), respectively.) If one has data only on Z,,(i,j) variables
and if one is only interested
in the state space model are lo(i, j) and 2,(&j),
in papillomas,
then the state
and the observation
model is only
specified by Y,(i, j). 5.3.
The Probability
Distribution
of the Number of Animals with Papillomas or Carcinomas In initiation-promotion experiments, data on the numbers of animals with papillomas or carcinomas may also be available. In these cases, one would need to obtain probability distributions of these numbers. To derive these probability distributions, assume that IZj animals are sacrificed at, time tj. Among these sacrificed animals, let Yp2(j) be the observed number of animals with papillomas and YC.(j) the observed number of animals with carcinom,as. For an animal sacrificed at time tj and initiated by the initiator at time to, the probability that this animal will develop at least one papillomas by time tj is Qp(j) = [l - exp(-A)]-’
cexp(-x)$2
(1)
[pr
(to, $)I”
[I -
pr (to, Q)]‘-’
i=l
r=l
= [l - exp(-A)]-‘$Jexp(-X)$
{I-
[l - PI
(tj)l’l
r=l =
[i -
(1 - exp [-API
exp(-A)]-1
($)I).
It follows that
pr {5,2(j)
= 112) =
5 (‘y)[l -
exp (-X)li
[exp (-X)1”.‘-”
i=m X
1 - exp [-X PI
i m (>(
(tj)]
1 -exp(-X)
i-m
9n >
l
_
1 -
w
[-A
PI
1 - exp(-X)
(
(t.;)] )
To derive Pr{YC2(j) = m}, partition the time interval (to, tj] into Ilaj nonoverlapping subin~nj with length As SO that AS = (tj - to)/,ttzj. Then, given that this tervals L,, s = l,..., animal has I(s) I cells at time s, the probability that these I(s) I cells will develop at, leasst one carcinoma by time tj is
(ll(s, tj) =
ICs)I(s)
C
( i ) [alas]’
[l -
QI(s)As]~(~)-’ 2 (i) [PT (S.
r=l =
tj)]'"
u=l
[al(s)A
= I - {I - al(s)&
[l -
cr~(s)As]‘(~)-” { 1 - [1 -
(S,tj) AS}‘(‘).
PT
(s,tj)]‘}
[l - PT (S, tj)]"-'"
Stochastic
Modeling of Carcinogenesis
1341
Thus, the probability that an initiated animal at time to will give rise to at least one carcinomas isL ‘%
l-E1
lim
As-0
where Q(tj)
n
mj
lim
[l - $ (s,tj)]
As-+0
.9=1
= Er exp{-
n
$i I(s)a~(s)P~(s,
(1 -(rr(s)P~(s,t~)As}“~’
s=l
tj) ds} with Er denoting taking expectation
over
{I@), to < ‘U5 tj}. It follows that Pr {Y&)
= ??2} = 2
(7)
i=m 5.4. Fitting
of Some
[1 - exp (-A)]” [exp (-A)]“‘-”
Papilloma
x (iI)
[I - Q ($)I’”
[Q
(h)li-” .
Data
For the papilloma data given in Table 3, one may use the Kalman filter model to estimate the mutation rate of normal stem cells and the birth rate and death rate of the initiated cells. Given in the Appendix is a procedure for achieving these purposes. To fit the data in Table 3, assume b,(s,t)
= br, dr(s,t)
dr(s,t) = d2 if Tr I t < T2; bl(s,t) = bar dl(s,t) Then the estimated values are as follows.
= dr if time t < Tr; bI(s,t)
= 02,
= d3 if t > Ts.
the high dose level, Tl = 15 weeks, Ts = 28 weeks; X = 350, bl = 0.509, dl = 0.490, bz = 0.501, d2 = 0.497, b3 = 0.503, d3 = 0.504. (2) For the middle dose level, Tl = 14 weeks, T2 = 26 weeks; X = 150, bl = 0.508, dl = 0.490, b2 = 0.502, d2 = 0.499, b3 = 0.503, d3 = 0.5028. (3) For the low dose level, Tl = 14 weeks, TZ = 26 weeks; bl = 0.509, dl = 0.490, b2 = 0.500, d2 = 0.499, b3 = 0.503, d3 = 0.500.
(1) For
XX‘Ic1Ia-r-
X-W-I
Figure 6. Plots showing observed and fitted mean numbers of papillomas per mouse for data in Table 3.
W.-Y.
1342
I-o...,
90 7-I-e
TAN et al. clor-
1-w-.
‘s-a
-e-Cc
nn
Figure 6. (cont.)
Plotted in Figure 6 are the fitted and the observed mean numbers of papillomas. The results revealed good fit of the data by the proposed model.
6. CONCLUSIONS
AND
DISCUSSION
As we have noted in the introduction, in many practical problems the process of carcinogenesis may be very complicated, far beyond the framework of the MVK two-stage model. Furthermore, an assumption that cancer cells are identical to cancer tumors has to be made. In these situations, the standard approach via Markov theories appears to be too complicated to be of much use. In order to extend the stochastic models, in this paper we have thus proposed an alternative approach through the stochastic differential equations. By using these stochastic differential equations
APPENDIX A PROCEDURE PARAMETERS
FOR ESTIMATING UNKNOWN FROM PAPILLOMA DATA
For the Kalman filter model given in Section 5, the following procedure can be used to estimate the mutation rate of N cells and the birth rate and death rate of I cells. The general theories
StochasticModelingof Carcinogenesis
1343
of this procedure are given in [13] and [14]. The procedure consists of two major steps: Step A generates the state variables (i.e., X = {10(&j), Z,(i,j), i = 1,. . . ,TZ~,j = 1,. . . , k}) from the conditional distribution Pr{X ] Y, 0) of X given the observed data Y and the parameter values; and Step B generates the parameter values from the posterior distribution of 8. These are the so-called multilevel Gibbs sampler method [16]. By applying the weighted bootstrap method (see [17]), Step A is achieved by the following substeps. STEP A. The following procedure is a modification of the method given in [13,14].
(1) One (2)
first generates N independent sequences
{I,$‘)(i, j),
i = 1,. . .) nj 7 j = 1,. . . , k}, T =
1,...1 N, by using (1:’ (i, J’) N Poisson(X), independently for i = 1,. . . , IZ~, j = 1, . . . , k}, r=l,...,N. Given {IF’(i,j), i = 1,. . . ,nj, j = 1,. . . ,k}, r = 1,. . . ,N, generate ZF’(i,j) by USing binomial distribution Z,(i,j)
1 Ir’(i,j)
where Pl(tj,E)
N B(~~‘(i,j),Pr(tj,B)),
=
PI(tO,tj). (3) Compute
w,(j) = exp
[-& (Y(j)- z:)(j))21 , and
JJJm
a-m=
pJ
.
El wu(.i) (4
Draw a random sample from the population with elements {El, . . . , EN} with probability qr(j) associating with drawing element E,. If the outcome is E,, then i = l,... l,..., k} is the random sample from Pr{X ] Y, E}. ,12j,j
{IF)(i,j),
Zr’(i,j)?
=
STEP B. Let P(X, 19)= P(X)P(E)
be the prior distribution of {X, E}. Let X = {ic(i,j),
.&,(i,.j),
i = l,..., k] be the {~~(i,.d,&(i,.d) s generated in Step A. Then the posterior distribution of #J= {X, E} is P(X ] X, Y)P(f ] X, Y) l,...,?Zj,j
=
P(X/-tY) cc exp(-nX)XioP(X), wllere n = c,“=, Izj, io = C:=, lo(j), and d(j)
where 9(X,!)
= C~=l{~~(j)logP,(tj,B)
= CYLI io(Cj)l
+ (is(j)
- i,(j))log[l
and
- PI(tj? f)]}
with 21j(j)
=
C:“& .&W). Given the above specifications, from P(z
] X,Y).
Step B then generates X from P(X ] X,Y)
and generates c
(T o improve efficiency, one may generate a large number of samples and use
sample mean as estimates.) To implement the above method for generating X, one may assume P(X) as Gamma with the two hyperparameters being determined by the estimate x of X and the estimate of the standard error of i in some previous studies. To implement the above method for generating 8, one must first find 8 for which g(X, 0) is closely approximated by a Taylor series expansion to second order
1344
W.-Y.
TAN et at.
where
where 6
= i + (P + 61)-l?.
If P(0) is noninformative
or normal, then P(B 12, Y) is normal
so that-h can readily be generated. To find t so that g(_$, f) is closely approximated by a Taylor series expansion to second order, one may proceed as follows. (1) Given 19 , compute “j
forsomeS>O;j=O,l,.... (2) Compute Rj = zj’(F Ej
=
+ SI)-lEj,
j = O,l,.
.
. . If Rj 5 E for some small E, then choose
Es
REFERENCES 1. W.Y. Tan, Stochastic Models of Carcinogenesis,Marcel Dekker, NewYork, (1991). 2. S.H. Moolgavkar, A population perspective on multistage carcinogenesis, In Multistage Carcinogenesis, (Edited by CC. Harris et al.), CRC Press, Boca Raton. FL, (1992). 3. M.P. Little, C.R. Muirhead and C.A. Stiller, Modelling lymphocytic leukaemia incidence in England and Wales using generalizations of the two-mutation model of carcinogenesis of Moolgavkar. Venzon and Knudson. Statistics in Medicine 15, 1003-1022, (1996). 4. R. Richmond, A. DeAngelo, C. Potter and F. Daniel, The role of nodules in dichloroacetic acid-induced hepatocarcinogenesis in B6C3Fl male mice, Carcinogenesis 12, 1383-1387, (1991). 5. A. DeAngelo, Dichloro
Stochastic
Modeling of Carcinogenesis
1345
12. W.Y. Tan and C.W. Chen, Stochastic modeling of carcinogenesis: Some new insights, Mathl. Comput. Modelling 28 (ll), 49-71, (1998). 13. G. Kitagawa, A self organizing state space model, Jour. American Statist. Association 93, 1203-1215, (1998). 14. J.S. Liu and R. Chen, Sequential Monte Carlo method for dynamic systems, Jour. American Statist. Assoczation 93, 1032-1044, (1998). 15. W.Y. Tan and Z. Ye, Estimation of HIV infection and HIV incubation via state space models, 1I4uth. Biosciences (to appear). 16. N. Shephard, Partial non-Gaussian state space, Biometrika 81, 115-131, (1994). 17. A.F.M. Smith and A.E. Gelfand, Bayesian statistics without tears: A sampling-resampling perspectives, American Statistician 46, 84-88, (1992).