Copyright © IFAC 12th Triennial World Congress. Sydney. Australia. 1993
TUMOR DYNAMICS: SURVIVAL MODEL AND DATA PROCESSING A.L. Asachenkov International Institute/or Applied Systems Analysis. A-2361 lAxenburg. Austria
Abstract. The survival model with quadratic mortality rate, problem of pa.ramete~ estimation and clini~al data processing for the patients after radical surgery to remove a solid tumor are discussed. The respective problems arise from applied motivations that come from medical issues. Key words. Survival model, systems analysis, parameter estimation, medical inormation processing, modeling, random processes.
1
Introduction
To model the individual trajectories of these indices which have presumably stochastic character, the ODE with random perturbations of parameters can be used
The state of the organism during disease is assessed according to the laboratory tests. The disorders in normal functioning of main homeostatic systems, which are caused by the disease, lead to a deviation of these indices from the values which correspond to the state of the healthy body. A systematic disorder of the homeostasis which acquires a stable and uncompensated character considerably raises the probability of the patient's death. This general definition permits us to construct a model of the tumor dynamics as it links two of its most important properties: the rate of growth, i.e. the activity of the disease; and the intensity of decreasing the number of patients surviving for certain periods of time; Asachenkov et al. (1989) and Marchuk et al. (1989). Indeed, when assessing the tumor dynamics using laboratory tests, the totality of individual trajectories of physiological indices can be put into correspondence with the survival function. Thereby, the dynamics of variation in the death probability can be determined by statistical characteristics. Note that mortality characteristics can be accurately observed whereas the homeostatic disorders can be assessed only indirectly using the available clinical data. More details are presented in Asachenkov et al. (1993), Asachenkov and Sobolev (1990).
2
~x~ = f(x~, 0" + {I{.), t E [O,T].
x:
is a perturbed solution, c: > 0 is a small paramHere eter, {, is a stochastic process such that E{, = 0 and cov({t, {HT) - 0 as T -+ 00. These trajectories can be considered as a result of small perturbations of the dynamic system. The perturbed motion described by this model is the fast random fluctuation along the reference trajectory x(t,O'·). Let Y/ = x; - x(t, 0") be a deviation between the perturbed motion and reference trajectory. Then, the process Y,' can be approximated by the linear stochastic differential (Asachenkov et al. (1993)).
dYc = i.f(x"O'°)Ycdt
ax
Y(O)
= 0,
+ aa0' f(XI,O")dw"
t E [O,T],
where x, = x(t,O'°) is a support solution, and w, is a Gaussian process with independent increments, such that Ew, = 0, cov (wt, w,) = ft, where r is a matrix of intensities of ~t.
State equation 3
Let us consider the dynamics of observed indices from the patients after surgery. Let t = 0 be the instant of surgery. Denote x(t) E Rn vector measured in clinic indices. Let the dynamics of the clinically measured indices, on the average, be described by the equation
dx(t) - f( . ) dt X,O',
x(O)
= Xo ~ 0,
t
~
0,
Mortality dynamics
Assume, for simplicity, that we deal with just one cohort (a closed group of patients all of which have the same time entry to study) and there exist so-called failue or termination time T for each trajectory. This failure is associated with death or specific health changes and can be described by a set of trajectories (Xt(Wi), i = 1, ... , N) and the corresponding set of termination times 0 = {TJ, . .. , T N }, where N is a sample size. The trajectory set generates the probability density function of failure times, and we can study the relation between the trajectory set and probability distribution of random time of failure.
(1)
A solution of equation (1) x(t,O") which describes the average trajectory in the group of patients with ufavorable clinical history" is called a support solution or a reference trajectory, and the vector 0" is called a reference or support vector.
299
Let T be a nonnegative continuous random variable (termination time) with a probability density p(t). Then the distribution function F(t) defines the probability that an individual trajectory will fail at or before time t. It is possible to define a continuous function of time s(t) = I - F(t)
4
Unfortunately, there are no general recommendations for the choice of the analytical form of .\(t) as a function of measurable variables. The choice of corresponding analytical relation is determined by the analysis of experimental data and the ease of mathematical manipulations. The simplest approximation of an unknown function is the Taylor series. This yields a quadratic failure rate
which represents the probability that the individual will survive to time t. The function s(t) is called a survival function. If the probability density p(t) exists, then
s(t) =
[0 p(v)dv,
s(O)
= I.
Po(l)
Another important characteristic in survival analysis is the mortality intensity .\(1). It is simple to obtain the failure Tale form for intensity as
A(O) = 0,
and may be interpreted as a pathological pressure upon the organism caused by the disease up to the instant of time t. Let us consider the relationship between the deviations of measurable indices from reference trajectory x(I,O") and a survival function s(t). This deviations are caused by unmeasured endogenous and exogenous factors and can be considered as Gaussian-Markov process which satisfies the stochastic differential
+ D(I)dwl
dY; = a(t)Y;dt
Yo = 0,
+ yT(t)Q(t)Y(t),
where Po(t) is a function which determines the standard death hazard not related to a given disease, and Q(t) is a n x n-symmetrical positive definite matrix. On the other hand, the quadratic failure rate arises from applied motivations. According to Asachenkov el al. (1989) the variance of deviations of clinical indices from the reference trajectories is inversely proportional to the life span of patients. The quadratic form of failure rate represents increased risk at both high and low physiological values. It specifies a range of values as optimal in the sense of survival. The statistical properties of the random process {Y;, t :::: O} with quadratic failure selection and marginal distribution of the failure time is given in
The total mortality intensity in the interval [0, t] is
A(I) = !o'.\(u)du = logs(t),
Survival model with quadratic mortality rate
Proposition 4.1 (Yashin, 1985). Let Ihe random p1'Ocess
{Y,(w) E Rn,t:::: O,w E n} salisfya linear SDE; the individual failure rate p(t, Ytl = Po(l) + yT(t)Q(t)Y(t) satisfies quadralic hazard; and assume Ihat Yo is distributed as N(mo,-Yo). Under Ihese condilions fts(t) = -(po(t)+m T (I)Q(I)m(t)+lr[Q(t)-y(t)])s(t),(3)
(2)
where a(l) is a n x n matrix, D(I) is a n x m matrix, DDT = b(t)rbT(t), Wl is a Wiener process. Y; is distributed as N(m(t),-y(t)) where a means and variance satisfy the equations
s(O) = I, a(t)m(l), a(t)-y(t)
and Y; is distributed as N(m(t),-y(t)IT > t), where m(t) and -y(I) are the solutions of the following ODE
+ -y(t)aT(t) + b(t)rbT(t).
d dim(t) = a(t)m(t) - 2-y(t)Q(t)m(t),
Wood bury and Manton (1977) suggested to consider the failure rate conditional on the path of physiological covariates over time (individual mortality rate in Yashin el al.,
m(O) = mo
1986)
a(t)-y(t) p(t,Y,)
.
I
hm -Pr{T E (t, t + ~I)IT
-y(0) = -Yo
> t,(Yu,O:Sv:St)}, where
as a function which defines the survival chances for an individual with trajectory (Yv , 0 v t) and
:s :s
-l
+ -y(t)a(tf + b(t)fbT(t)
2-y(t)Q(t)-y(t),
61-0 ~t
s(tlY;) = exp{
(4)
m(t) = E{Y,IT :::: t}, -y(I) = E{(Y; - m(I))(Y; - m(I))TIT:::: t}.
p(u, Yu)du}.
0
Because s(t) is the marginal probability of survival to time I, we will refer to .\(t) as the marginal cohort failure rate. Similarly N(m(t), -y(t)IT > I) will be called as the marginal distribution of Y(t) among the survivors at time t. Now the marginal probability density is
However, this exponential formula does not necessarily hold without some conditions. Necessary and sufficient conditions for this expression were founded by Yashin and Arjas (1988). The relationship between conditional and unconditional failure rate was presented by Yashin (1985). Hence, the mortality dynamics observed in the group of patients is determined by
p(t) = .\(t)exp[-!o'.\(v)dv]. Thus, the mortality intensity observed in the group of patients is related to the dynamics of clinically controlled indicators by the formula
.\(1) = E{p(Y;)IT > t} where E is averaging over all individual trajectories.
.\(t) = Po(t)
+ mT(t)Q(t)m(l) + tr[Q(t)-y(I)].
As a rule, the model parameters have a practical interpretation. For example, the elements of matrix Q define the degree of influence on the failure dynamics of each state variable. Parameter estimation over clinical data
300
helps to study the role of various factors which are related to failure. Moreover, the total mortality intensity A(t) = (Y(t), Q, Po) as a function of Y(t) (under the known matrix Q and a function Ilo(t)) can be used to check clinically non-expressed tumor growth on line (Asachenkov et
6.1
Marginal density function
Proposition 6.1 (Sobolev, 1989). Let T 1 , ••• , T N be non-
negative independent random variables with the probability density p(t,f3'),t ~ 0,
al., 1993).
p(t,f3) = A(t,f3)exp{-A(t,f3)} where
Maximum likelihood estimate
5
A(t, f3) =
Assume, for simplicity, that matrices Q and r are diagonal. Introduce a new 2m x 1 vector, f3, of unknown parameters f3 = ( diag Q, diag
and the functions m(t,f3),,(t,f3),A(t,f3) are solutions of problem (3-4). If [[ is a compact subset of B which contains f3', then there exists local M LE in I-I, ~N, such that with probability one ~N -> f3' as N ->
rf·
Let f3' be the true value. Using a reference trajectory for a group of N patients, we can obtain the set of trajectory deviations YN = {Vi, i = 1,2, ... , N}. Other available information is the failure times 0 = {T[, ... , TN}. The solution of the problem (3-4) can be used to define the unconditional or marginal probability density for random failure times
p(t,f3)
The proof of the proposition is based on Pitman's theorem (Pitman, 1979). It means that for a great N, with proba~ility one, function PN(X[, .. . , XN, f3) will have a maximum f3N E [[ in some neighborhood of point f3' with arbitrary small radius.
= A(t,f3)exp[-A(t,f3)].
6.2
According to the likelihood principle, we have a log-likelihood function, for example
y i = {Y,~ E Rn,t k < Ti,k = 1, ... ,nol· Denote the joint distribution function at to, t[, ... , T i as
An estimate of the unknown vector f3 is given by
p(Y)
~ = arg max 1(f3),
= p(Yo, Y[, . .. , Yn,;
to, t[, . .. , tn,)'
To simplify the notation we write Yk instead of Y, •. The independence of the individual trajectories means that the likelihood function can be defined as
(JEB
where
L
Joint probability density
For the i-th patient we have a set of measurements
L(f3) = logp(0,f3).
1(f3) =
+ mT(t)Q(t)m(t) + tr[Q(th(t)]
110
N
log A(t, f3) - A(t, f3).
£(0, Y, f3)
tEe
= IT p(T;iyi)p(y i ), i=1
The difficulty of this procedure is associated with the functions m(t,f3),,(t,f3),A(t,f3) which are the functions of unknown parameter f3. There are two variants of thi~ estimation problem:
where Y = (Y', i = 1, ... , N}. The functions p(TIY) and p(Y) are defined by Proposition 6.2 (Yashin et al. 1986). Let a process {Y" t ~ O} satisfy a linear stochastic differential (2). The failure
(i) we can use the failure times only
rate is assumed to be a quadratic function
(ii) we can use the conditional form of the likelihood function.
+ yT(t)QY(t). Q, r be a diagonal,
p(t, Y,) = Po(t)
The consistency of MLEs and a numerical algorithm for searching such estimates are given by Asachenkov and Sobolev (1993).
Let the matrices function
conditional survival
s(t, Vd) = P{T > tlYd} and the failul'e rate conditional on trajectory yi be
6
Consistency of Maximum Likelihood Estimates
D
.
Ai(t) = - Dt logs(t, V'). Then the functions Ai(t) and p(Y,') can be represented as
Denote the joint density function for N independent and identically distributed (i.i.d.) random values with the probability density pC, f3) by PNC, f3) and the likelihood function for x[, . .. , XN by
n
Ai(t) = Po(t)
+ L qibii(t) + m i (tIYf)2), i:l
N
PN(X\, . .. , XN, f3)
= IT p(Xi, f3). 1=1
Any ~N(X) E B, which maximizes PN over f3 E B is called a maximum likelihood estimate (MLE). Therefore the MLE is a solution of the problem
where m(tlYf) and ,(t) are sectionally continuous functions on the intervals tk ::; t < tk+\ satisfies the equations
max{PN(x[, ... , XN, f3)j f3 E B}. Often it is convenient to maximize log PN in place of ]IN· It should be mentioned that MLEs do not always exist.
d. . . dim(tlyn = a\(t)m(tlYn - 2,(t)Qm(tlyn,
Moreover, if MLEs exist, they are not necessarily unique.
d di,(t) = al(th(t)
301
+ ,(t)a;(t) + b(t)rbT(t) -
2,(t)Q,(t)
jority of acute infectious diseases, which are characterized by a significant deviation of clinical indicators from their normal values, the tumor dynamics shows an irregular deviation of the clinical data from the reference trajectory that seems to testify the systematic stable influence of the tumor on the homeostatic system and in the long run promotes an increase in the death probability. The results presented here can be treated as a tool for obtaining a working knowledge about the process in question and as an introduction to the theory of clinical data processing on the basis of available information.
with the initial values for h, k = 1, ... ,ni,
o For discrete observations we can write p(y i ) =
IT
N(Y:lm(t;lYk_I);,(t;)),
{k,t.
where N(Y;lm(t k IYk- I ); ,(t;)) is the conditional Gaussian density with the means m(t;IYLI) and variances ,(t;) (Yashin and Arjas, 1988).
8
Asachenkov, A., B. Sobolev and Ye. Smolianinov. Mathematical modeling and data analysis from immunological tests in oncological patients, IIASA Working Paper WP-89-32, 1989.
Assume that there exists vector f3* E B such that a.e. p(t; (3')
= p(t) and
pry; (3')
Asachenkov, A.L. and B.G. Sobolev. Models and Data Analysis of Cancer Patients. In Selected Topics on Mathematical Models in Immunology and Medicine (R. Mohler and A. Asachenkov eds.), IIASA Paper, CP-90-007, 1990, pp. 61-69.
= pry)·
According to the maximum likelihood principle the desired estimate is a solution of the problem
max{L(lI,Y, (3), (3 E B}.
Asachenkov, A.L. and N. Grigorenko. Applications of Control Theory in Cancer Research, IIASA Working Paper, WP-92-56, 1992, 30pp.
Proposition 6.3 (Sobolev, 1989). The solution ~N of the system I
Asachenkov, A.L., G.1. Marchuk, R.R.Mohler, S.M. Zuev. Disease Dynamics, Birkhauser, Boston-Basel-Berlin, (1993) forthcoming.
fj!::i.{JL(lI,Y,(3) = 0 with probability one, converges to vector (3' as N --+ 00. 0
Asachenkov, A.L. and B. Sobolev. Parameter Estimation of Survival Model, IIASA Working Paper, WP-93-1, 1993,24pp.
The sufficient condition for the uniqueness of this solution is the non-singularity of the matrix
1 1 rjJ((3, fj) = !::i.{J[fj!::i.{JL(lI,Y,(3)]
Manton, K.G. and E. Stallard. Recent Trends in Mortality Analysis. Academic Press, Orlando, 1984.
at ((3.,0). This condition was given by Zuev (1988).
1
Marchuk, G.I., A.L. Asachenkov, Ye.S. Smolianinov and B.G. Sobolev. On the problem of processing clinical and laboratory data on oncological patients, Sov. J. Numer. Anal. Math. Modelling, 4, 5(1989).
Conclusion
The application of this model for stomach cancer is discussed by Marchuk, G., et al. (1989) and Asachenkov et al. (1993). The pathological pressure upon the organism due to the disease as a function of measurable parameters Y(t) is characterized by the expression Ai(t)
= l[J1.o(u) + y;T(u)Q(u)Y;(u)du = l
Pitman, E.J.G. Some Basic Theory of Statistical Inference. Chapman and Hall, London, 1979. Sobolev, B.G. Statistical estimating of coefficients in cancer disease model, Dep. Numer. Math. Preprint, 227 (1989), (in Russian)
J1.i(u)du
Woodbury, M.A. and K.G. Manton. A Random-Walk Model of Human Mortality and Aging, Theor. Popul. Bioi., 11(1977).
and for discrete observations it can be approximated by . I 1\'(t) = 2
L
References
.. (tk - tk-tl(J1.:. + J1.:.-I)·
k:tk$t
Yashin, A.1. Dynamics in survival analysis: Conditional Gaussian property versus the Cameron-Martin formula, Steklov Seminar (I
Here the function J1.o(t) and the matrix Q(t) are known. A close correlation between this index and the life span was shown by Asachenkov et al. (1989). This index can be considered as an integral characteristics of the state of patient. It provides the possibility to check the development of a clinically non-expressed tumor growth according to clinically measured indices and can be used for the selection of an adequate scheme of treatment. The minimization of this index is equivalent to the maximization of the life span after the beginning of treatment. Thus the index could be considered as a cost function for the optimal control problem. A simple example was studied by Asachenkov and Grigorenko, 1992.
Yashin, A.I., K.G. Manton and E. Stallard. Evaluating the Effects of Observed and Unobserved Diffusion Processes in Survival Analysis of Longitudinal Data, Mathematical Modelling, 7 (1986). Yashin, A.1. and E. Arjas. A note on random intensities and conditional survival function, J. Appl. Prob., 25(1988). Zuev, S.M. Statistical Estimating Parameters of Mathematical Models of Disease. Nauka, Moscow, 1988, (in Russian).
It should be emphasized that in the index construction no expert assessments have been used. In contrast to the ma-
302