Accepted Manuscript
Identifiability of cure models revisited
Leonid Hanin, Li-Shan Huang
PII: S0047-259X(14)00132-8
DOI: http://dx.doi.org/10.1016/j.jmva.2014.06.002
Reference: YJMVA 3760
To appear in:
Journal of Multivariate Analysis
Received date: 8 January 2014 Please cite this article as: L. Hanin, L.-S. Huang, Identifiability of cure models revisited, Journal of Multivariate Analysis (2014), http://dx.doi.org/10.1016/j.jmva.2014.06.002 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
IDENTIFIABILITY OF CURE MODELS REVISITED
Leonid Hanin (a,∗) and Li-Shan Huang (b)
(a) Department of Mathematics, Idaho State University, 921 S. 8th Avenue, Stop 8085, Pocatello, ID 83209-8085, USA
(b) Institute of Statistics, National Tsing-Hua University, 101, Section 2, Kuang-Fu Road, Hsin-Chu 30013, Taiwan
(∗) Corresponding author. E-mail: [email protected]; phone: +1-208-282-3293
Summary. We obtain results on identifiability of mixture, mixture proportional hazards and bounded cumulative hazards (or Yakovlev) models of survival in the presence of a cured (or nonsusceptible) subpopulation. These results specify conditions under which model parameters can, or cannot, be estimated from the observed, potentially censored, survival times and thus may guide statistical modeling. The results are formulated for larger classes of models and in greater generality than previously, and they correct some misconceptions that exist in the statistical literature on the subject. All results are supplied with rigorous self-contained proofs.
Keywords: Mixture model; model identifiability; proportional hazards model; scalable family of functions; Yakovlev model.
1. Introduction

Estimation of numerical and functional parameters of statistical models allowing for population heterogeneity, including various models arising in survival analysis, depends critically on model identifiability. The latter property postulates that no two sets of parameters may generate the same model output. Thus, identifiability is a structural property of the model. However, it has far-reaching implications for model-based statistical inference, for if a model is non-identifiable then estimation of its parameters from a sample of model outputs is impossible and, if attempted, would lead to instability of the estimates. For an extensive discussion of identification of stochastic models, the reader is referred to
Hanin (2002). Survival models incorporating a subpopulation that is not susceptible to the event of interest (for example, a subpopulation of immune, unexposed or cured individuals, or perhaps of long-term survivors) have recently become an important statistical tool for analyzing survival data in the fields where the nature of the disease (or other condition) or advances in medicine make the existence of such a subpopulation plausible. Two major approaches to constructing such models have been proposed: the mixture model (Boag, 1949; Berkson and Gage, 1952; Farewell, 1982; Maller and Zhou, 1996; Peng and Dear, 2000; Lu and Ying, 2004; Tournoud and Ecochard, 2008, to mention a few) and the Yakovlev (or bounded cumulative hazards, BCH) model (Yakovlev et al., 1993; Yakovlev, 1994; Yakovlev and Tsodikov, 1996; Chen, Ibrahim and Sinha, 1999; Tsodikov, Ibrahim and Yakovlev, 2003; Zeng, Yin and Ibrahim, 2006; Cooner et al., 2007; Ma and Ying, 2008, among others). The component of the mixture model accounting for a susceptible subpopulation can be endowed with the proportional hazards structure, which leads to the mixture proportional hazards (PH) cure models (Kuk and Chen 1992; Sy and Taylor, 2000). The PH structure can also be embedded in the Yakovlev model of the survival function for the entire population. The main focus of this work is on identifiability of mixture, mixture PH and Yakovlev cure models. These models and issues related to their identification were discussed in an extensively cited paper by Li, Taylor and Sy (2001). 
However, a close examination of that work reveals that (1) the formulation of the results and the proofs need clarification; (2) a few critical conditions (in particular, those of scalability and weak scalability, see Section 2) indispensable for carrying out the proofs were omitted, while some assumptions made by the authors are unnecessary; and (3) all results are formulated in a generality insufficient for many applications (in particular, the spaces of covariates, also termed design spaces, are assumed one-dimensional or empty, and no sharing of covariates between various components of the models is allowed). The goal of this work is three-fold: (1) to present concepts and results pertaining to identification of mixture, mixture PH and BCH cure models correctly, clearly, and in a generality sufficient for most applications; (2) to supply all results with rigorous self-contained proofs; and (3) to develop both a general framework and specific technical tools that may prove useful for analyzing identifiability properties of many other models.
As shown in Hanin (2002), any non-identifiable model depending on finitely many parameters can be made identifiable through re-parameterization by a smaller number of parameters represented as functions of the original parameters at the expense of potentially losing their natural meaning. However, finding these identifiable combinations of parameters may present significant difficulties.

The paper is organized as follows. In Section 2, we introduce the mixture cure model and define its identifiability. In this section, we also discuss various geometric properties of the design spaces and scalability of families of functions that set up the stage for our results on identification of the mixture cure model in Section 3. In Sections 4 and 5, we introduce mixture PH and BCH cure models, respectively, and prove results on identification of these models and some of their important special cases. Finally, in Section 6 we briefly summarize our results and discuss directions of further study.

2. Mixture Cure Model and its Identification: Preliminaries

The general mixture cure model has the form
$$\bar{G}(t; x, y, z) = 1 - p(x, y) + p(x, y)\,S(t; x, z), \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T, \tag{1}$$
where $x, y, z$ are vectors of covariates from the respective non-empty sets (design spaces) $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$; function $p(x, y)$ is the fraction of individuals in a population that are susceptible to an event of interest; $S(t; x, z)$ is the survival function of the time to the event for the sub-population of susceptible individuals; function $\bar{G}(t; x, y, z)$ represents the overall survival function in the population in question; and $T > 0$ is the length of the observation period (including the possibility of $T = \infty$). Alternatively, function $1 - p$ can be interpreted as the fraction of cured individuals (or the probability of cure, also termed cure rate) while $S$ can be viewed as the survival function for the time to the event in the sub-population of non-cured (or non-curable) individuals. Note that vector $x$ contains covariates that are shared by the probability function $p$ and the survival function $S$ for susceptible individuals, whereas $y$ and $z$ are the respective vectors of their own covariates. Vectors $x, y, z$ may contain variables descriptive of individuals, their disease (or other condition) and treatment (broadly construed). Model (1) can also be used in various settings beyond survival analysis, in which case interpretation of model components may be quite different. However, we will adhere to the terminology commonly used in survival analysis.
Observe that the survival function S is not assumed to be necessarily proper, i.e. to satisfy the condition S(T −; x, z) = 0 for all x ∈ X and z ∈ Z. Consideration of improper survival functions allows one to accommodate censoring and the possibility of insufficiently long follow-up time for all, or some, susceptible individuals. It is assumed that functions p : X × Y → [0, 1] and S : [0, T ) × X × Z → [0, 1] in model
(1) belong to certain families that will be denoted P and S, respectively. Identifiability of model (1) will be defined in terms of these families, and it depends critically on their size and various properties. Consideration of families of functions p and S reflects the assumption that variation of the values of these functions within the population in question is not necessarily determined by the covariates; in fact, various groups of individuals may have the same covariates but differ in susceptibility to the event of interest and distribution of the time to this event. The two simplest versions of the mixture model (1) arise when p(x, y) = p for all (x, y) ∈ X × Y, where 0 < p < 1, and when the survival function S(t; x, z) is the same function S(t) independent of covariates x, z. Another important example of the family of functions P arises in the logistic regression setting introduced by Farewell (1982), where it
is assumed that the set, $\mathcal{U}$, of covariates for function $p$ is a subset of $\mathbb{R}^d$ and
$$\log \frac{p(u)}{1 - p(u)} = a + b \cdot u,$$
where $a \in \mathbb{R}$ and $b \in \mathbb{R}^d$, so that the vector of parameters is $\theta = (a, b)$. Then
$$p(u, \theta) = \frac{\exp(a + b \cdot u)}{1 + \exp(a + b \cdot u)}. \tag{2}$$
As a preview of a more general notion of identifiability discussed in what follows, we define it here for parameterization (2).

Definition 1. Model (2) is identifiable if the validity of the equation $p(u, \theta_1) = p(u, \theta_2)$ for all $u \in \mathcal{U}$ implies that $\theta_1 = \theta_2$.

A simple criterion of identifiability of model (2) is given below. To formulate it, we first need to define the affine dimension of a set in $\mathbb{R}^d$.

Definition 2. Let $\mathcal{U}$ be a subset of $\mathbb{R}^d$. The affine dimension of $\mathcal{U}$ (notation: $\mathrm{adim}(\mathcal{U})$) is the maximum number $n$ for which there are points $u_0, u_1, \ldots, u_n \in \mathcal{U}$ such that vectors $u_1 - u_0, u_2 - u_0, \ldots, u_n - u_0$ are linearly independent.

It follows from this definition that $\mathcal{U} \subset u_0 + L$, where $u_0 \in \mathcal{U}$ and $L$ is a vector subspace of $\mathbb{R}^d$ with $\dim(L) = \mathrm{adim}(\mathcal{U})$. The affine dimension of $\mathcal{U}$ introduced above is an extension
of the linear dimension of $\mathcal{U}$ (notation: $\mathrm{ldim}(\mathcal{U})$) defined as the dimension of the linear span of $\mathcal{U}$. Clearly, $\mathrm{adim}(\mathcal{U}) \le \mathrm{ldim}(\mathcal{U})$. However, this inequality may be strict: think of a line $L$ in $\mathbb{R}^2$ not passing through the origin, in which case $\mathrm{adim}(L) = 1$ and $\mathrm{ldim}(L) = 2$.

Proposition 1. 1. Model (2) is identifiable if and only if $\mathrm{adim}(\mathcal{U}) = d$.
2. Model (2) with $a = 0$ is identifiable if and only if $\mathrm{ldim}(\mathcal{U}) = d$.

Proof. 1. Let $\mathrm{adim}(\mathcal{U}) = d$. Suppose that $p(u, \theta_1) = p(u, \theta_2)$ for all $u \in \mathcal{U}$, where $\theta_i = (a_i, b_i)$, $i = 1, 2$. Equivalently,
$$a_1 + b_1 \cdot u = a_2 + b_2 \cdot u, \quad u \in \mathcal{U}. \tag{3}$$
According to the definition of the affine dimension, there are points $u_0, u_1, \ldots, u_d \in \mathcal{U}$ such that vectors $u_1 - u_0, u_2 - u_0, \ldots, u_d - u_0$ are linearly independent in $\mathbb{R}^d$. Subtracting equation (3) for point $u_0$ from the same equation for points $u_1, \ldots, u_d$ we find that $(b_1 - b_2) \cdot (u_k - u_0) = 0$, $1 \le k \le d$. Therefore, $b_1 = b_2$, and hence from (3) we obtain $a_1 = a_2$. Thus, model (2) is identifiable.

Conversely, let parameterization (2) be identifiable. Suppose $n := \mathrm{adim}(\mathcal{U}) < d$. Then there is a point $u_0 \in \mathcal{U}$ and a vector $b \in \mathbb{R}^d$, $b \ne 0$, such that $b \cdot (u - u_0) = 0$ for all $u \in \mathcal{U}$. Pick any number $a_1 \in \mathbb{R}$ and vector $b_1 \in \mathbb{R}^d$, and set $b_2 := b_1 + b$, $a_2 := a_1 - b \cdot u_0$. Notice that $b_2 \ne b_1$. For all $u \in \mathcal{U}$ we have
$$a_2 + b_2 \cdot u = a_1 - b \cdot u_0 + (b_1 + b) \cdot u = a_1 + b_1 \cdot u + b \cdot (u - u_0) = a_1 + b_1 \cdot u,$$
which implies that model (2) is non-identifiable. This contradiction shows that $\mathrm{adim}(\mathcal{U}) = d$.

2. The proof for the no-intercept case $a = 0$ is obtained from the above proof for the general case by a minor modification. □

In what follows, when dealing with parameterization (2) of function $p$ (or its no-intercept version) we will be assuming that the affine (or linear) dimension of $\mathcal{U}$ equals $d$. Observe also that the condition $\mathrm{ldim}(\mathcal{U}) = d$ can be assumed without loss of generality by reducing the dimension of the Euclidean space that contains $\mathcal{U}$.

To formulate one of our results on identifiability of the mixture cure model in Section 3, we need to define the local dimension of a set $\mathcal{X}$ at a point.
Definition 3. (1) Let $x_0$ be a limit point of a set $\mathcal{X} \subset \mathbb{R}^d$. A unit vector $h \in \mathbb{R}^d$ is called a (generalized) tangent vector to the set $\mathcal{X}$ at the point $x_0$ if there is a sequence $\{x_k\} \subset \mathcal{X}$ such that $x_k \ne x_0$ for all $k$, $x_k \to x_0$ and $(x_k - x_0)/|x_k - x_0| \to h$ as $k \to \infty$.
(2) The maximum number of linearly independent tangent vectors to $\mathcal{X}$ at $x_0$ is called the local dimension of the set $\mathcal{X}$ at the point $x_0$ and denoted $\dim_{x_0}(\mathcal{X})$.

Notice that if $\mathcal{X}$ is a smooth $k$-dimensional manifold in $\mathbb{R}^d$ (in particular, a $k$-dimensional affine or linear subspace of $\mathbb{R}^d$) then the local dimension at any point of $\mathcal{X}$ equals $k$. Also, it follows from Definition 2 that $\mathrm{adim}(\mathcal{X}) \ge \dim_{x_0}(\mathcal{X})$ for every limit point $x_0 \in \mathcal{X}$.
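For a finite design set, the affine and linear dimensions of Definition 2 reduce to matrix ranks, so the criterion of Proposition 1 can be checked numerically. The following is a minimal sketch, assuming the design points are stored as rows of a NumPy array; the function names `linear_dimension` and `affine_dimension` are illustrative and do not come from the paper.

```python
import numpy as np

def linear_dimension(points):
    """ldim(U): dimension of the linear span, i.e. rank of the matrix of points."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    return int(np.linalg.matrix_rank(pts))

def affine_dimension(points):
    """adim(U): rank of the differences u_k - u_0 for a fixed base point u_0.

    The base point itself is kept as a zero row, which does not change the rank.
    """
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    return int(np.linalg.matrix_rank(pts - pts[0]))

# Three points on the line {(s, 1)} in R^2, which does not pass through the origin.
line = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
# Three points in R^2 that are not collinear.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

print(affine_dimension(line), linear_dimension(line))
print(affine_dimension(triangle), linear_dimension(triangle))
```

Under Proposition 1, the logistic parameterization (2) with intercept would be non-identifiable on `line` (affine dimension below $d = 2$) and identifiable on `triangle`, while the no-intercept version would be identifiable on `line`, whose linear dimension equals 2.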
Model (1) can be represented in an equivalent form through the corresponding cumulative distribution functions (cdf's):
$$G(t; x, y, z) = p(x, y)\,F(t; x, z), \tag{4}$$
where $G(t; x, y, z) = 1 - \bar{G}(t; x, y, z)$ and $F(t; x, z) = 1 - S(t; x, z)$. Denote by $\mathcal{F}$ the family of cdf's corresponding to the family $\mathcal{S}$.

We will be assuming throughout that
(i) $0 < p(x, y) < 1$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and $p \in \mathcal{P}$;
(ii) for each $x \in \mathcal{X}$, $z \in \mathcal{Z}$ and $F \in \mathcal{F}$, the function $t \mapsto F(t; x, z)$ is non-decreasing, continuous from the right at each point $t \in [0, T)$, satisfies the conditions $F(0; x, z) = 0$, $F(T-; x, z) > 0$, and in the case $T = \infty$ is proper, i.e. $F(\infty; x, z) = 1$.

We now define identifiability of the mixture cure model.
Definition 4. Model (4) is identifiable within families $\mathcal{P}$ and $\mathcal{F}$ if the equality
$$G_1(t; x, y, z) = G_2(t; x, y, z), \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T,$$
where
$$G_i(t; x, y, z) = p_i(x, y)\,F_i(t; x, z), \quad i = 1, 2,$$
for some functions $p_1, p_2 \in \mathcal{P}$ and $F_1, F_2 \in \mathcal{F}$ implies that $p_1(x, y) = p_2(x, y)$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and $F_1(t; x, z) = F_2(t; x, z)$ for all $x \in \mathcal{X}$, $z \in \mathcal{Z}$ and $0 \le t < T$.
Identifiability of model (4) makes it possible to distinguish between the effects on the observed survival of the presence of non-susceptible (or curable) individuals accounted for by function $p$ and of insufficiently long follow-up or censoring for susceptible individuals encapsulated by function $F$. Clearly, if model (4) is identifiable within certain families $\mathcal{P}$ and $\mathcal{F}$ then the same is true for any of their subfamilies. The problem, then, is to find the largest families $\mathcal{P}$ and $\mathcal{F}$ for which model (4) is still identifiable. We start with the following simple observation.

Proposition 2. Suppose that family $\mathcal{F}$ consists of proper cdf's, i.e. $F(T-; x, z) = 1$ for all $x \in \mathcal{X}$, $z \in \mathcal{Z}$ and $F \in \mathcal{F}$. Then model (4) is identifiable.

Proof. Suppose that
$$p_1(x, y)\,F_1(t; x, z) = p_2(x, y)\,F_2(t; x, z), \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T, \tag{5}$$
where $p_1, p_2 \in \mathcal{P}$ and $F_1, F_2 \in \mathcal{F}$. Letting $t \to T-$ we conclude from (5) that $p_1(x, y) = p_2(x, y)$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$. Then (5) implies that $F_1(t; x, z) = F_2(t; x, z)$ for all $x \in \mathcal{X}$, $z \in \mathcal{Z}$ and $0 \le t < T$. □
It follows from Proposition 2 that in the case T = ∞ model (4) is identifiable. Therefore,
below we will focus our attention on the case T < ∞ and assume that some of the cdf’s
in the family $\mathcal{F}$ are improper. The assumption of proper cdf's $F$ is similar to the zero-tail constraint in Taylor (1995), Peng and Dear (2000), Liu and Shen (2009) and other works.

Our results on identification of mixture and other cure models discussed in Sections 3-5 below suggest that the following two related properties of function families are critical for model identifiability.

Definition 5. 1. A family $\mathcal{P}$ of functions $p : \mathcal{X} \times \mathcal{Y} \to (0, 1)$ is called scalable if together with any function $p$ it also contains every scalar multiple $cp$, $c > 0$, provided that it satisfies condition (i) stated above. The family $\mathcal{P}$ is termed weakly scalable if it contains two (distinct) functions $p_1, p_2$ such that $p_2 = c p_1$ for some positive constant $c \ne 1$.
2. Similarly, a family $\mathcal{F}$ of cdf type functions $F : [0, T) \times \mathcal{X} \times \mathcal{Z} \to [0, 1]$ satisfying the conditions listed in (ii) is scalable if together with any function $F$ it contains all its scalar multiples $cF$, provided that $cF(T-; x, z) \le 1$ for all $x \in \mathcal{X}$ and $z \in \mathcal{Z}$. The family $\mathcal{F}$ is weakly scalable if $F_2 = cF_1$ for some functions $F_1, F_2 \in \mathcal{F}$ and constant $c > 0$, $c \ne 1$.

Note that every scalable family of functions is weakly scalable. Also, it is easy to see that for a given function $p : \mathcal{X} \times \mathcal{Y} \to (0, 1)$ from a scalable family, the set of admissible values for $c$ is $0 < c < 1/\|p\|$ in the case where the supremum-norm $\|p\| := \sup\{p(x, y) : x \in \mathcal{X}, y \in \mathcal{Y}\}$ is attained at some point in $\mathcal{X} \times \mathcal{Y}$ and $0 < c \le 1/\|p\|$ in case it is not attained. Similarly, the set of admissible values of $c$ for a function $F$ from a scalable family $\mathcal{F}$ is $(0, 1/\|F\|]$, where $\|F\| := \sup\{F(T-; x, z) : x \in \mathcal{X}, z \in \mathcal{Z}\}$.

Finally, observe that the set of all functions $p : \mathcal{X} \times \mathcal{Y} \to (0, 1)$ is scalable, and so is the set of all cdf type functions $F : [0, T) \times \mathcal{X} \times \mathcal{Z} \to [0, 1]$ subject to conditions (ii).
Many natural nonparametric families $\mathcal{F}$ on a finite interval $[0, T)$ are scalable. For example, this is the case for the families of all piecewise constant cdf's, in particular, those resulting from the Kaplan-Meier estimator. Likewise, various families of piecewise linear and, more generally, piecewise polynomial cdf's, continuous or otherwise, are all scalable. Some parametric families, such as the family of cdf's on $[0, T)$ represented by polynomials of degree $\le m$,
$$F(t) = \sum_{k=1}^{m} c_k t^k, \quad 0 \le t < T,$$
where $c_k \ge 0$, $1 \le k \le m$, and $\sum_{k=1}^{m} c_k T^k \le 1$, are scalable as well. The following proposition shows, however, that many other commonly used parametric families of distributions are non-scalable. Consider, for example, the family of absolutely continuous distributions on $(0, \infty)$ with the probability density function (pdf) given by
$$f(t; \alpha, \beta, \gamma) = C\, t^{\alpha - 1} \exp\{-\beta t^{\gamma}\}, \quad t > 0, \tag{6}$$
where $\alpha, \beta, \gamma > 0$ and
$$C = C(\alpha, \beta, \gamma) = \frac{\gamma \beta^{\alpha/\gamma}}{\Gamma(\alpha/\gamma)}$$
is the normalization constant. Clearly, this class of distributions contains exponential, gamma and Weibull families.

Proposition 3. The family $\mathcal{F}$ of cdf's $F$ given by
$$F(t) = \int_0^t f(u)\,du, \quad 0 \le t < T,$$
where pdf $f$ is specified in (6), is not weakly scalable.

Proof. Suppose family $\mathcal{F}$ is weakly scalable. Then there is a cdf $F_0 \in \mathcal{F}$ parameterized by $\alpha_0, \beta_0, \gamma_0$ and a positive constant $c \ne 1$ such that $cF_0 \in \mathcal{F}$. Let $\alpha, \beta, \gamma$ be parameters of this cdf. Thus,
$$\int_0^t f(u)\,du = c \int_0^t f_0(u)\,du, \quad 0 \le t < T,$$
where $f_0(t) = f(t; \alpha_0, \beta_0, \gamma_0)$ and $f(t) = f(t; \alpha, \beta, \gamma)$. Differentiating the above equation we have $f = c f_0$, that is,
$$C\, t^{\alpha - 1} \exp\{-\beta t^{\gamma}\} = c\, C_0\, t^{\alpha_0 - 1} \exp\{-\beta_0 t^{\gamma_0}\}, \quad 0 < t < T, \tag{7}$$
where $C = C(\alpha, \beta, \gamma)$ and $C_0 = C(\alpha_0, \beta_0, \gamma_0)$. Since the limit at 0 of the ratio of two distinct power functions is either 0 or infinity, we conclude from (7) that $\alpha = \alpha_0$. Then equation (7) reduces to
$$\beta t^{\gamma} + k = \beta_0 t^{\gamma_0}, \quad 0 < t < T, \tag{8}$$
where $k = \log(c\,C_0/C)$. Letting $t \to 0$ in (8) gives $k = 0$. Again, comparing limiting behavior at 0 of the two sides of equation (8) we find that $\gamma = \gamma_0$. Then, equation (8) implies that $\beta = \beta_0$. Therefore, functions $f$ and $f_0$ are identical, and hence $c = 1$. This contradiction leads to the required conclusion that family $\mathcal{F}$ is not weakly scalable. □

3. Identifiability of Mixture Cure Model: Results

Our first result concerns identifiability of the general mixture cure model. Among the two equivalent model formulations, we refer to the one given by formula (4). Let
$$\|p\|_x := \sup\{p(x, y) : y \in \mathcal{Y}\}, \quad p \in \mathcal{P},\ x \in \mathcal{X},$$
and
$$\|F\|_x := \sup\{F(T-; x, z) : z \in \mathcal{Z}\}, \quad F \in \mathcal{F},\ x \in \mathcal{X}. \tag{9}$$
Theorem 1. 1. Suppose that $\|p\|_x = 1$ for all $p \in \mathcal{P}$ and $x \in \mathcal{X}$ or $\|F\|_x = 1$ for all $F \in \mathcal{F}$ and $x \in \mathcal{X}$. Then model (4) is identifiable within families $\mathcal{P}$ and $\mathcal{F}$.
2. Suppose that functions in the families $\mathcal{P}$ and $\mathcal{F}$ do not share covariates and that at least one of these families is not weakly scalable. Then model (4) is identifiable.
3. Suppose that families $\mathcal{P}$ and $\mathcal{F}$ are scalable. If $\|p\| < 1$ for some $p \in \mathcal{P}$ or $\|F\| < 1$ for some $F \in \mathcal{F}$ then model (4) is non-identifiable.
4. If family $\mathcal{P}$ consists of constant functions $p$, $0 < p < 1$, and family $\mathcal{F}$ is weakly scalable then model (4) is non-identifiable.

Proof. 1. Suppose either of the two conditions on the families $\mathcal{P}$ and $\mathcal{F}$ is satisfied. To show that model (4) is identifiable, suppose that (5) holds. Let
$$\tau_i(x, z) := \sup\{t : 0 \le t < T,\ F_i(t; x, z) = 0\}, \quad i = 1, 2. \tag{10}$$
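It follows from (5) that $\tau_1(x, z) = \tau_2(x, z)$; we denote the common value of these quantities by $\tau(x, z)$ and note that $0 \le \tau(x, z) < T$. Then, equation (5) can be represented in the form
$$\frac{p_1(x, y)}{p_2(x, y)} = \frac{F_2(t; x, z)}{F_1(t; x, z)}, \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ \tau(x, z) < t < T.$$
Since the left-hand side does not depend on $t$ and $z$ while the right-hand side does not depend on $y$, this implies that there is a function $c(x) > 0$ such that
$$p_1(x, y) = c(x)\,p_2(x, y), \quad x \in \mathcal{X},\ y \in \mathcal{Y}, \tag{11}$$
and
$$F_2(t; x, z) = c(x)\,F_1(t; x, z), \quad x \in \mathcal{X},\ z \in \mathcal{Z},\ \tau(x, z) < t < T. \tag{12}$$
Notice that in view of (10), equation (12) is satisfied for all $t \in [0, T)$. From equality (11) we derive that
$$\|p_1\|_x = c(x)\,\|p_2\|_x, \quad x \in \mathcal{X}, \tag{13}$$
while (12) yields
$$\|F_2\|_x = c(x)\,\|F_1\|_x, \quad x \in \mathcal{X}. \tag{14}$$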
Suppose that ∥p∥x = 1 for all p ∈ P and x ∈ X . Then (13) implies that c(x) = 1 for all
x ∈ X , which, according to equations (11) and (12), means that model (4) is identifiable. Similarly, if ∥F ∥x = 1 for all F ∈ F and x ∈ X then (14) again implies c(x) = 1 for all
x ∈ X , i.e. identifiability of model (4).
2. In the case where functions $p$ and $F$ do not share covariates equation (5) takes on the form
$$p_1(y)\,F_1(t; z) = p_2(y)\,F_2(t; z), \quad y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T.$$
As shown in the proof of statement 1, this implies the existence of a constant $c > 0$ such that $p_1(y) = c\,p_2(y)$ for all $y \in \mathcal{Y}$ and $F_2(t; z) = c\,F_1(t; z)$ for all $z \in \mathcal{Z}$ and $0 \le t < T$. From the assumed non-weak scalability of at least one of the families $\mathcal{P}$ and $\mathcal{F}$ we deduce that $c = 1$. Thus, $p_1 = p_2$ and $F_1 = F_2$, so that model (4) is identifiable.

3. Assume that $\mathcal{P}$ and $\mathcal{F}$ are scalable families and model (4) is identifiable. Suppose that $\|p\| < 1$ for some $p \in \mathcal{P}$. Then one can find $c > 1$ such that $cp \in \mathcal{P}$. Pick an arbitrary $F \in \mathcal{F}$, then $F/c \in \mathcal{F}$. Setting $\tilde{p} = cp$ and $\tilde{F} = F/c$ we have $\tilde{p}\tilde{F} = pF$, which implies that model (4) is non-identifiable. Similarly, if $\|F\| < 1$ for some $F \in \mathcal{F}$ then there would exist $c > 1$ such that $\tilde{F} := cF \in \mathcal{F}$. Notice that $\tilde{p} := p/c \in \mathcal{P}$. Hence again $\tilde{p}\tilde{F} = pF$, and thus model (4) is non-identifiable. This proves the third statement of Theorem 1.

4. Because family $\mathcal{F}$ is weakly scalable, there are functions $F, \tilde{F} \in \mathcal{F}$ and a constant $c > 1$ such that $\tilde{F} = cF$. Pick any $p \in (0, 1)$ and set $\tilde{p} := p/c$. Then $\tilde{p}\tilde{F} = pF$, which shows that model (4) is non-identifiable. □

Notice that the condition $\|p\| < 1$ implies the presence of non-susceptible (or curable) individuals. Similarly, the condition $\|F\| < 1$ means that either time $T$ is insufficient for manifestation of the event of interest in all susceptible individuals or that such a manifestation was prevented by censoring. According to Theorem 1, either of the inequalities $\|p\| < 1$ or $\|F\| < 1$ brings about non-identifiability of the cure model (4) within any scalable families $\mathcal{P}$ and $\mathcal{F}$. Also, according to statement 2 of Theorem 1, if the family of cdf's associated with pdf's (6) (or any of its subfamilies) shares no covariates with functions $p$, then the mixture cure model (4) is identifiable. Finally, statements 2 and 4 of Theorem 1 imply that if family $\mathcal{P}$ consists of constant functions $p$, $0 < p < 1$, then model (4) is identifiable if and only if family $\mathcal{F}$ is not weakly scalable.
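The rescaling used in the proofs of statements 3 and 4 of Theorem 1 is easy to see numerically. The sketch below is only an illustration under assumptions that are not in the paper: there are no covariates, $T = 5$, the cdf $F$ is an exponential cdf restricted to $[0, T)$ (hence improper on that interval), and both $F$ and $cF$ are taken to belong to the scalable family of all cdf type functions satisfying conditions (ii).

```python
import numpy as np

T = 5.0                                          # assumed length of follow-up
t = np.linspace(0.0, T, 200, endpoint=False)     # grid on [0, T)

def F(t, rate=0.3):
    """Exponential cdf restricted to [0, T); improper there since F(T-) < 1."""
    return 1.0 - np.exp(-rate * t)

p = 0.6        # susceptible fraction (constant function p)
c = 1.2        # scaling constant, chosen so that c * F(T-) <= 1

F_tilde = c * F(t)      # rescaled cdf, still bounded by 1 on [0, T)
p_tilde = p / c         # compensating susceptible fraction, still in (0, 1)

G = p * F(t)                    # model (4) with the pair (p, F)
G_tilde = p_tilde * F_tilde     # model (4) with the pair (p_tilde, F_tilde)

print(np.max(np.abs(G - G_tilde)))   # the two observable cdf's coincide on the grid
```

The two parameter pairs differ (a susceptible fraction of 0.6 versus 0.5) yet produce the same observable function $G$, which is exactly the non-identifiability described above.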
We now turn to a special case where dependence of the probability $p$ on covariates is given by formula (2). In this case, identifiability of model (4) depends on the structure of the design space $\mathcal{X}$. Note that the latter played no role in Theorem 1. In the next theorem, we will be assuming that model (2) is non-degenerate in the sense that $b \ne 0$.

Theorem 2. Let $\mathcal{X} \subset \mathbb{R}^d$ and $\mathcal{P}$ be the family of all functions given by the non-degenerate model (2). Suppose that functions in the families $\mathcal{P}$ and $\mathcal{F}$ do not share covariates.
1. If $d = 1$ and the set $\mathcal{X}$ contains at least three points then model (4) is identifiable for any family $\mathcal{F}$.
2. If $d = 1$ and the set $\mathcal{X}$ consists of one or two points (in particular, if the covariate of function $p$ is a binary variable) then model (4) is non-identifiable for any weakly scalable family $\mathcal{F}$.
3. Suppose $d \ge 2$ and $\dim_{x_0}(\mathcal{X}) = d$ for some limit point $x_0 \in \mathcal{X}$. Then model (4) is identifiable for any family $\mathcal{F}$.
Proof. Suppose that equation (5) holds for some $F_1, F_2 \in \mathcal{F}$ and
$$p_i(x) = \frac{\exp(a_i + b_i \cdot x)}{1 + \exp(a_i + b_i \cdot x)}, \quad i = 1, 2,$$
where $a_1, a_2 \in \mathbb{R}$ and $b_1, b_2$ are non-zero vectors in $\mathbb{R}^d$. The argument used in the beginning of the proof of statement 1 of Theorem 1 shows that $p_1 = c\,p_2$ and $F_2 = c\,F_1$ for some constant $c > 0$. Solving the equation
$$\frac{\exp(a_1 + b_1 \cdot x)}{1 + \exp(a_1 + b_1 \cdot x)} = c\,\frac{\exp(a_2 + b_2 \cdot x)}{1 + \exp(a_2 + b_2 \cdot x)}, \quad x \in \mathcal{X}, \tag{15}$$
for $\exp(-b_2 \cdot x)$ we find that
$$e^{-b_2 \cdot x} = A\,e^{-b_1 \cdot x} + B, \quad x \in \mathcal{X}, \tag{16}$$
where coefficients
$$A = c\,e^{a_2 - a_1} \quad \text{and} \quad B = (c - 1)\,e^{a_2} \tag{17}$$
are independent of $x$.

1. Suppose that $d = 1$ and $\mathcal{X}$ contains three distinct points, say $x_0, x_1, x_2$. Setting $x = x_0, x_1, x_2$ in (16) and subtracting the equation for $x_0$ from those for $x_1$ and $x_2$ we have
$$e^{-b_2 x_i} - e^{-b_2 x_0} = A\,(e^{-b_1 x_i} - e^{-b_1 x_0}), \quad i = 1, 2. \tag{18}$$
Using the notation $u_i = \exp(-b_1 x_i)$, $i = 0, 1, 2$, we conclude from (18) that
$$\frac{u_1^{\alpha} - u_0^{\alpha}}{u_1 - u_0} = \frac{u_2^{\alpha} - u_0^{\alpha}}{u_2 - u_0}, \tag{19}$$
where points $u_0, u_1, u_2$ are distinct and $\alpha = b_2/b_1$. Note that $\alpha \ne 0$. Because the function $u \mapsto u^{\alpha}$ is strictly concave for $0 < \alpha < 1$ and strictly convex for $\alpha < 0$ and $\alpha > 1$, equation (19) implies that $\alpha = 1$, i.e. $b_1 = b_2$. Next, it follows from (18) that $A = 1$, and then (16) yields $B = 0$. Finally, from equations (17) we derive that $c = 1$ and $a_1 = a_2$. Thus, in the case $d = 1$ model (4) is identifiable.

2. We continue to assume that $d = 1$. Suppose $\mathcal{X}$ contains no more than two points.
Since family $\mathcal{F}$ is weakly scalable, there exist a function $F \in \mathcal{F}$ and a constant $c > 1$ such that the function $F/c$ belongs to $\mathcal{F}$. It is sufficient to find pairs $(a_1, b_1)$ and $(a_2, b_2)$ of real numbers with $b_1, b_2 \ne 0$ such that equation (15) (or equivalently equation (16) with coefficients $A$ and $B$ specified in (17)) is satisfied for all $x \in \mathcal{X}$. Because $a_i + b_i x = a_i + b_i x_0 + b_i (x - x_0)$, $i = 1, 2$, where $x_0 \in \mathcal{X}$, we may assume without loss of generality that $0 \in \mathcal{X}$. Also, in the case where $\mathcal{X}$ is a two-point set, by scaling coefficients $b_1, b_2$ we may assume that the second point in $\mathcal{X}$ is 1, so that covariate $x$ is binary.

If $\mathcal{X} = \{0\}$ then we only have to satisfy equation $A + B = 1$, see (16). Thus, we pick any number $A$, $0 < A < 1$, set $B := 1 - A$ and, for the number $c > 1$ selected above, find unique numbers $a_1, a_2$ for which equations (17) are met. In this case, the choice of numbers $b_1, b_2$ does not matter. In the case $\mathcal{X} = \{0, 1\}$ equation (16) takes on the form
$$A + B = 1, \quad e^{-b_2} = A\,e^{-b_1} + B. \tag{20}$$
Thus, we pick an arbitrary number $A$, $0 < A < 1$, any $b_1 \ne 0$ and set $B := 1 - A$. Since $A e^{-b_1} + 1 - A$ is a positive number different from 1, we find a unique number $b_2 \ne 0$ for which the second equation in (20) is met. Finally, we solve equations (17) to find $a_1$ and $a_2$.

3. Suppose now that $d \ge 2$ and $\dim_{x_0}(\mathcal{X}) = d$ at some limit point $x_0 \in \mathcal{X}$. Let $h$ be
a unit tangent vector to the set $\mathcal{X}$ at $x_0$. Then there is a sequence of points $\{x_k\} \subset \mathcal{X}$ different from $x_0$ such that $x_k \to x_0$ and $(x_k - x_0)/|x_k - x_0| \to h$. Subtracting equation (16) for $x = x_0$ from that for $x \in \mathcal{X}$ we have
$$e^{-b_2 \cdot x} - e^{-b_2 \cdot x_0} = A\,(e^{-b_1 \cdot x} - e^{-b_1 \cdot x_0}), \quad x \in \mathcal{X}. \tag{21}$$
Setting in (21) $x = x_k$ we represent this equation in the form
$$e^{(b_1 - b_2) \cdot x_0}\,[1 - e^{-b_2 \cdot (x_k - x_0)}] = A\,[1 - e^{-b_1 \cdot (x_k - x_0)}].$$
Expanding the exponential functions we get
$$e^{(b_1 - b_2) \cdot x_0}\,[b_2 \cdot (x_k - x_0) + o(x_k - x_0)] = A\,[b_1 \cdot (x_k - x_0) + o(x_k - x_0)].$$
Dividing this equation by $|x_k - x_0|$ and taking limit as $k \to \infty$ we have
$$[A\,b_1 - e^{(b_1 - b_2) \cdot x_0}\,b_2] \cdot h = 0.$$
Because this is true for $d$ linearly independent vectors $h$ we conclude that $A\,b_1 = \exp[(b_1 - b_2) \cdot x_0]\,b_2$. Thus, $b_2 = \lambda b_1$, where $\lambda = A \exp[(b_2 - b_1) \cdot x_0]$. This allows us to represent equation (21) as
$$u^{\lambda} - u_0^{\lambda} = A\,(u - u_0), \tag{22}$$
where $u_0 = \exp(-b_1 \cdot x_0)$ and $u = \exp(-b_1 \cdot x)$.

Since the local dimension of $\mathcal{X}$ at the point $x_0$ is $d \ge 2$, there is a unit tangent vector $h$ to the set $\mathcal{X}$ at $x_0$ that is not orthogonal to $b_1$. Let $\{x_k\} \subset \mathcal{X}$ be a sequence as above (i.e. $x_k \to x_0$, $x_k \ne x_0$ for all $k$, and $(x_k - x_0)/|x_k - x_0| \to h$). We claim that
$$b_1 \cdot x_k \ne b_1 \cdot x_0 \tag{23}$$
for all sufficiently large $k$. In fact, if this were false then we would have $b_1 \cdot (x_{k_m} - x_0) = 0$ for a subsequence $\{k_m\}$. Dividing this equation by $|x_{k_m} - x_0|$ and taking limit as $m \to \infty$ would yield $b_1 \cdot h = 0$, which is a contradiction.

Pick a point $x_k$ that satisfies condition (23). Since $b_1 \cdot x_m \to b_1 \cdot x_0$ as $m \to \infty$, the latter condition implies that $b_1 \cdot x_k \ne b_1 \cdot x_m$ for some $m > k$, which may be chosen so that $x_m$ also satisfies (23). Thus, for $u_1 = \exp(-b_1 \cdot x_k)$ and $u_2 = \exp(-b_1 \cdot x_m)$ we have $u_1 \ne u_2$, $u_1 \ne u_0$ and $u_2 \ne u_0$. Then our argument in the case $d = 1$ applied to equation (22) would show that $\lambda = 1$, i.e. $b_2 = b_1$. Then, according to equation (21), we have $A = 1$. As it was argued in the proof of statement 1 of this theorem, this implies identifiability of model (4). □

It is interesting to mention that allowing parameterization (2) to degenerate may lead to the loss of identifiability of model (4) regardless of the size and geometry of the design space $\mathcal{X}$.
Proposition 4. Let P be the family of functions (2) where b = 0 is allowed. Suppose
F is a weakly scalable family of functions that do not share covariates with functions from the family P. Then model (4) is non-identifiable.
Proof. There exist functions $F_1, F_2 \in \mathcal{F}$ and constant $c > 1$ such that $F_2 = cF_1$. To show that model (4) is non-identifiable, we set $b_1 = b_2 = 0$ and find numbers $a_1, a_2 \in \mathbb{R}$ such that
$$\frac{e^{a_1}}{1 + e^{a_1}} = c\,\frac{e^{a_2}}{1 + e^{a_2}}$$
or equivalently
$$e^{-a_2} = c\,e^{-a_1} + c - 1.$$
Because $c\,e^{-a_1} + c - 1 > 0$, the latter equation has, for any given $a_1$, a unique solution $a_2 \ne a_1$, as required. □
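Both non-identifiability constructions for the logistic family, the binary-covariate case $\mathcal{X} = \{0, 1\}$ in statement 2 of Theorem 2 and the degenerate case of Proposition 4, can be reproduced numerically. The sketch below follows equations (17) and (20) and the equation $e^{-a_2} = c\,e^{-a_1} + c - 1$; the function names and the default values of $A$, $b_1$, $a_1$ and $c$ are arbitrary illustrative choices, not values from the paper.

```python
import math

def logistic(a, b, x):
    """Parameterization (2) for a scalar covariate x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def binary_counterexample(A=0.5, b1=1.0, c=2.0):
    """Two parameter pairs with p1 = c * p2 on X = {0, 1} (Theorem 2, statement 2)."""
    B = 1.0 - A                               # first equation in (20)
    b2 = -math.log(A * math.exp(-b1) + B)     # second equation in (20)
    a2 = math.log(B / (c - 1.0))              # from B = (c - 1) * exp(a2) in (17)
    a1 = a2 + math.log(c / A)                 # from A = c * exp(a2 - a1) in (17)
    return (a1, b1), (a2, b2)

def degenerate_counterexample(a1=0.0, c=2.0):
    """Two intercepts with b1 = b2 = 0 and p1 = c * p2 (Proposition 4)."""
    a2 = -math.log(c * math.exp(-a1) + c - 1.0)
    return a1, a2

c = 2.0
(a1, b1), (a2, b2) = binary_counterexample(c=c)
for x in (0.0, 1.0):
    print(x, logistic(a1, b1, x), c * logistic(a2, b2, x))

d1, d2 = degenerate_counterexample(c=c)
print(logistic(d1, 0.0, 0.0), c * logistic(d2, 0.0, 0.0))
```

In each case $p_1 = c\,p_2$ on the design space, so pairing $p_1$ with $F_1$ and $p_2$ with $F_2 = cF_1$ from a weakly scalable family gives the same function $G$ in model (4) for two distinct parameter sets.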
We now consider identification of model (4) with functions $p$ from the no-intercept version of family (2) ($a = 0$). In accordance with Proposition 1, we assume that the design space $\mathcal{X} \subset \mathbb{R}^d$ of model (2) satisfies the condition $\mathrm{ldim}(\mathcal{X}) = d$. In particular, in the case $d = 1$, we assume that $\mathcal{X} \ne \{0\}$. A fairly simple modification of the proof of Theorem 2 leads to the following result.
Theorem 3. Let X ⊂ Rd and P be the family of all functions given by the no-intercept
version of non-degenerate model (2). Suppose that functions in the families P and F do not share covariates. 1. If d = 1 and the set X contains at least two points then model (4) is identifiable for
any family F.
2. If d = 1 and the set X consists of one (non-zero) point then model (4) is non-
identifiable for any weakly scalable family F.
3. Suppose d ≥ 2 and dimx0 (X ) = d for some limit point x0 ∈ X . Then model (4) is
identifiable for any family F.
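The source does not spell out the proof of Theorem 3. For statement 2, one possible construction (our reading, not taken from the paper) is to fix the single design point $x_0 \ne 0$, a slope $b_2 \ne 0$ and the constant $c > 1$ supplied by weak scalability of $\mathcal{F}$, and then solve $p_1(x_0) = c\,p_2(x_0)$ for $b_1$. The function name and default values below are hypothetical.

```python
import math

def no_intercept_counterexample(x0=1.0, b2=-1.0, c=1.5):
    """Find b1 with sigmoid(b1 * x0) = c * sigmoid(b2 * x0) at a single point x0 != 0."""
    p2 = 1.0 / (1.0 + math.exp(-b2 * x0))
    target = c * p2                      # required value of p1 at x0
    if not 0.0 < target < 1.0:
        raise ValueError("c * p2(x0) must lie in (0, 1)")
    if target == 0.5:
        raise ValueError("c * p2(x0) = 1/2 would force the degenerate slope b1 = 0")
    b1 = math.log(target / (1.0 - target)) / x0   # invert the no-intercept logistic model
    return b1, b2

x0, c = 1.0, 1.5
b1, b2 = no_intercept_counterexample(x0=x0, c=c)
p1 = 1.0 / (1.0 + math.exp(-b1 * x0))
p2 = 1.0 / (1.0 + math.exp(-b2 * x0))
print(b1, b2, p1, c * p2)
```

With $F_2 = cF_1$ for some $F_1, F_2 \in \mathcal{F}$, the pairs $(p_1, F_1)$ and $(p_2, F_2)$ then give the same $G$ in model (4) although $b_1 \ne b_2$.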
In many applications, survival function $S$ in the mixture cure model (1) is a piecewise constant function estimated nonparametrically from the data through the Kaplan-Meier estimator; then so is the corresponding cdf $F$. This motivates consideration of the family $\mathcal{F}$ of all cdf's $F$ of the form
$$F(t) = \sum_{i=0}^{N-1} b_i\,\mathbf{1}_{[a_i, a_{i+1})}(t). \tag{24}$$
Here $N \ge 2$; $\mathbf{1}_A$ stands for the indicator function of a set $A$; $\{a_i : 0 \le i \le N\}$ is an increasing sequence such that $a_0 = 0$ and $a_N = T$; and $\{b_i : 0 \le i \le N - 1\}$ is an increasing sequence such that $b_0 = 0$ and $0 < b_{N-1} \le 1$. Note that cdf $F$, and thereby number $N$ and both sequences $\{a_i\}$ and $\{b_i\}$, may depend on covariates.
Theorem 4. Suppose that the cdf in model (4) is of the form (24). Then:
1. The number $N$ and the sequence $\{a_i\}$ are identifiable.
2. Suppose that $\mathcal{Y} = \emptyset$, so that all covariates of functions $p \in \mathcal{P}$ are also covariates of functions $F \in \mathcal{F}$. If the family $\mathcal{P}$ contains functions $p$ and $\tilde p$ such that $p(x) < \tilde p(x)$ for all $x \in \mathcal{X}$ (in particular, if the family $\mathcal{P}$ is weakly scalable) then the sequence $\{b_i\}$ is not identifiable.
Proof. 1. Suppose that for certain cdf's $F, \tilde F \in \mathcal{F}$, some probability functions $p, \tilde p \in \mathcal{P}$ and for all values of the covariates (which we suppress notationally)
$$\tilde p \tilde F(t) = p F(t), \quad 0 \le t < T. \tag{25}$$
Assume that $F(t)$ is given by (24) and similarly
$$\tilde F(t) = \sum_{i=0}^{\tilde N - 1} \tilde b_i \mathbf{1}_{[\tilde a_i, \tilde a_{i+1})}(t).$$
We fix the values of all the covariates and assume without loss of generality that $N \le \tilde N$.
We now show using mathematical induction that $\tilde a_i = a_i$, $0 \le i \le N$. For $i = 0$ we have by definition $\tilde a_0 = a_0 = 0$. Suppose that $\tilde a_k = a_k$ for some $k$, $0 \le k \le N - 2$, and show that $\tilde a_{k+1} = a_{k+1}$. Suppose that $\tilde a_{k+1} \ne a_{k+1}$, say, $a_{k+1} < \tilde a_{k+1}$. Then for $t \in (a_k, a_{k+1})$ equation (25) yields $\tilde p \tilde b_k = p b_k$, while for $t \in [a_{k+1}, \tilde a_{k+1})$ we have $\tilde p \tilde b_k = p b_{k+1}$. Therefore, $p b_{k+1} = p b_k$, which in view of our general assumption $p > 0$ implies that $b_{k+1} = b_k$, contrary to the assumption $b_k < b_{k+1}$ built into the definition of the piecewise constant function (24). This contradiction shows that $\tilde a_{k+1} = a_{k+1}$. Thus, $\tilde a_i = a_i$, $0 \le i \le N$. However, because $a_N = T$, we must also have $\tilde a_N = T$, which implies that $\tilde N = N$.
2. Fix the values of the covariates and a sequence $\{a_i : 0 \le i \le N\}$ such that $0 = a_0 < a_1 < \dots < a_{N-1} < a_N = T$. Pick probabilities $\tilde p$ and $p$ with $0 < p < \tilde p < 1$ and a sequence $\{b_i : 0 \le i \le N - 1\}$ such that $0 = b_0 < b_1 < \dots < b_{N-1} \le 1$. Define $\tilde b_i = p b_i / \tilde p$, $0 \le i \le N - 1$. Such a sequence $\{\tilde b_i\}$ is increasing and distinct from $\{b_i\}$, has the properties $\tilde b_0 = 0$, $\tilde b_{N-1} < 1$ and depends on an allowable set of covariates. Also, the corresponding cdf $\tilde F$ satisfies (25). This means that the set of values $\{b_i\}$ taken by the cdf (24) is not identifiable from the overall survival (1). □

4. Identifiability of Mixture Proportional Hazards Cure Model

This class of cure models is a generalization of model (1) along the lines of proportional hazards. We start with the model
$$\bar G(t; x, y, z, u, v, s, w) = 1 - p(x, y, u, v) + p(x, y, u, v)\,[S(t; x, y, z, s)]^{r(x, z, u, w)} \tag{26}$$
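that assumes the most general form of dependence of functions $p$, $S$ and $r$ on covariates. Here $0 \le t < T$; $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$, $u \in \mathcal{U}$, $v \in \mathcal{V}$, $w \in \mathcal{W}$, $s \in \Sigma$, where $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \mathcal{U}, \mathcal{V}, \mathcal{W}, \Sigma$ are the ranges for the respective covariates; and $r(x, z, u, w) > 0$ for all $(x, z, u, w) \in \mathcal{X} \times \mathcal{Z} \times \mathcal{U} \times \mathcal{W}$. We continue to assume that functions $p$ and $S$ satisfy the general conditions spelled out in Section 2. One of the widely used families $\mathcal{R}$ is
$$r(\alpha) = \exp\{a + b \cdot \alpha\}, \quad \alpha \in A, \tag{27}$$
where $A \subset \mathbb{R}^d$, $a \in \mathbb{R}$ and $b \in \mathbb{R}^d$. Notice that the family of functions (27) is scalable. Another useful family is the no-intercept version of (27) ($a = 0$) arising from the Cox model:
$$r(\alpha) = \exp\{b \cdot \alpha\}, \quad \alpha \in A. \tag{28}$$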
To ensure identifiability of models (27) and (28), it should be assumed, in accordance with Proposition 1, that $\mathrm{adim}(A) = d$ and $\mathrm{ldim}(A) = d$, respectively. Scalability properties of the family (28) have the following characterization.
Proposition 5. Suppose that $\mathrm{ldim}(A) = d$. (1) If $\mathrm{adim}(A) < d$ then model (28) is scalable. (2) If model (28) is weakly scalable then $\mathrm{adim}(A) < d$.
Proof. Let $n := \mathrm{adim}(A)$. By definition of the affine dimension (see Section 2), there exist a point $\alpha_0 \in A$ and a vector subspace $L$ in $\mathbb{R}^d$ such that $A \subset \alpha_0 + L$ and $\dim(L) = n$. Let $\alpha_2$ be the orthogonal projection of $\alpha_0$ onto $L$ and $\alpha_1 := \alpha_0 - \alpha_2$. Then $A \subset \alpha_1 + L$.
1. Suppose that $n < d$. To show that model (28) is scalable we have to find, for any $b \in \mathbb{R}^d$ and constant $c > 0$, $c \ne 1$, a vector $\tilde b \in \mathbb{R}^d$ such that $\exp\{\tilde b \cdot \alpha\} = c \exp\{b \cdot \alpha\}$ for all $\alpha \in A$ or equivalently
$$(\tilde b - b) \cdot \alpha = k, \quad \alpha \in A, \tag{29}$$
where $k = \log c$. Since $\mathrm{ldim}(A) > \dim(L)$, the set $A$ is not contained in $L$, which implies that $\alpha_1 \ne 0$. Set $\tilde b := b + k\alpha_1 / \|\alpha_1\|^2$ and observe that the vector $\tilde b - b$ is orthogonal to $L$. Representing any vector $\alpha \in A$ in the form $\alpha = \alpha_1 + u$, where $u \in L$, we have $(\tilde b - b) \cdot \alpha = (\tilde b - b) \cdot \alpha_1 = k$, as required.
2. Conversely, suppose that model (28) is weakly scalable. Then for some $b \in \mathbb{R}^d$ and $k \ne 0$ there exists a vector $\tilde b \in \mathbb{R}^d$, $\tilde b \ne b$, satisfying condition (29). It follows from (29) that $(\tilde b - b) \cdot (\alpha - \alpha_0) = 0$ for all $\alpha \in A$. Since the vector space $L$ is spanned by the set $\{\alpha - \alpha_0 : \alpha \in A\}$, we conclude that $n = \dim(L) < d$. □
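The construction in part 1 of the proof of Proposition 5 can be checked numerically. The sketch below, in plain Python, builds $\alpha_1$ as the component of $\alpha_0$ orthogonal to $L$ and then sets $\tilde b = b + k\alpha_1/\|\alpha_1\|^2$. The helper names (`dot`, `orthonormal_basis`, `rescaling_vector`) and the finite design set are illustrative assumptions, not part of the original text.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coef = dot(w, e)
            w = [wi - coef * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def rescaling_vector(A, b, c, tol=1e-12):
    """Return b_tilde with exp(b_tilde . alpha) = c * exp(b . alpha) on A.

    A is a finite list of design points; the construction needs
    adim(A) < ldim(A), i.e. A must not lie in a linear subspace of
    dimension adim(A).
    """
    alpha0 = A[0]
    # L is spanned by the differences alpha - alpha0.
    L = orthonormal_basis([[x - y for x, y in zip(al, alpha0)] for al in A])
    # alpha1 = alpha0 minus its orthogonal projection onto L.
    alpha1 = list(alpha0)
    for e in L:
        coef = dot(alpha0, e)
        alpha1 = [x - coef * ei for x, ei in zip(alpha1, e)]
    norm_sq = dot(alpha1, alpha1)
    if norm_sq < tol:
        raise ValueError("alpha1 = 0: the family is not scalable on this set")
    k = math.log(c)
    return [bi + k * x / norm_sq for bi, x in zip(b, alpha1)]

# Hypothetical design set in R^3 lying in the affine plane {third coordinate = 2}:
# adim(A) = 2 < 3 = ldim(A).
A = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (3.0, -1.0, 2.0)]
b = [0.5, -0.2, 0.1]
c = 3.0
b_tilde = rescaling_vector(A, b, c)
ok = all(
    math.isclose(math.exp(dot(b_tilde, al)), c * math.exp(dot(b, al)))
    for al in A
)
```

In this example $\alpha_1 = (0, 0, 2)$, so only the third component of $b$ is shifted (by $k/2$), and `ok` confirms relation (29) on every design point.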
It follows from Proposition 5 that for the family of functions (28) scalability and weak scalability are equivalent, and in the case $\mathrm{ldim}(A) = d$ the criterion of scalability is given by the condition $\mathrm{adim}(A) < d$.
Definition 6. Model (26) is identifiable within families $\mathcal{P}$, $\mathcal{S}$ and $\mathcal{R}$ if validity of the equality
$$\bar G_1(t; x, y, z, u, v, s, w) = \bar G_2(t; x, y, z, u, v, s, w)$$
for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$, $u \in \mathcal{U}$, $v \in \mathcal{V}$, $w \in \mathcal{W}$, $s \in \Sigma$ and $0 \le t < T$, where
$$\bar G_i(t; x, y, z, u, v, s, w) = 1 - p_i(x, y, u, v) + p_i(x, y, u, v)\,[S_i(t; x, y, z, s)]^{r_i(x, z, u, w)}, \quad i = 1, 2;$$
$p_1, p_2 \in \mathcal{P}$; $S_1, S_2 \in \mathcal{S}$; $r_1, r_2 \in \mathcal{R}$, implies that $p_1(x, y, u, v) = p_2(x, y, u, v)$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $u \in \mathcal{U}$, $v \in \mathcal{V}$; $S_1(t; x, y, z, s) = S_2(t; x, y, z, s)$ for all $0 \le t < T$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$, $s \in \Sigma$; and $r_1(x, z, u, w) = r_2(x, z, u, w)$ for all $x \in \mathcal{X}$, $z \in \mathcal{Z}$, $u \in \mathcal{U}$, $w \in \mathcal{W}$.
Our first result concerning models (26) deals with the semi-parametric case where the survival function $S$ is piecewise constant:
$$S(t) = \sum_{i=0}^{N-1} c_i \mathbf{1}_{[a_i, a_{i+1})}(t), \tag{30}$$
where $N \ge 2$; $\{a_i : 0 \le i \le N\}$ is an increasing sequence such that $a_0 = 0$ and $a_N = T$; and $\{c_i : 0 \le i \le N - 1\}$ is a decreasing sequence such that $c_0 = 1$ and $0 \le c_{N-1} < 1$.
Note that survival functions $S$, and thereby the number $N$ and both sequences $\{a_i\}$ and $\{c_i\}$, may depend on covariates. Denote by $\mathcal{S}_0$ the family of all proper survival functions (30), that is, satisfying the condition $c_{N-1} = 0$. The first two statements of the following theorem are parallel to the respective claims in Theorem 4.
Theorem 5. Suppose that survival functions $S$ in model (26) are of the form (30). Then:
1. The number $N$ and the sequence $\{a_i\}$ are identifiable.
2. Suppose that $\mathcal{U} = \mathcal{V} = \emptyset$, so that all covariates of functions $p \in \mathcal{P}$ are also covariates of functions $S \in \mathcal{S}$. If the family $\mathcal{P}$ contains functions $p$ and $\tilde p$ such that $p < \tilde p$ for all values of relevant covariates (in particular, if $\mathcal{P}$ is weakly scalable) then the sequence $\{c_i\}$ is not identifiable.
3. Within the family $\mathcal{S}_0$ of proper survival functions (30),
(a) function $p$ is identifiable;
(b) function $r \in \mathcal{R}$ is not identifiable if the family $\mathcal{R}$ contains two distinct functions (in particular, if the family $\mathcal{R}$ is weakly scalable);
(c) function $S$ is non-identifiable if the family $\mathcal{R}$ contains two distinct functions depending on covariates $x, z$ alone.
Proof. 1. Suppose that for some pairs of functions $S, \tilde S \in \mathcal{S}$; $p, \tilde p \in \mathcal{P}$; $r, \tilde r \in \mathcal{R}$ and for all values of the covariates (which we suppress notationally)
$$1 - \tilde p + \tilde p\,[\tilde S(t)]^{\tilde r} = 1 - p + p\,[S(t)]^{r}, \quad 0 \le t < T. \tag{31}$$
Assume that $S(t)$ is given by (30) and similarly
$$\tilde S(t) = \sum_{i=0}^{\tilde N - 1} \tilde c_i \mathbf{1}_{[\tilde a_i, \tilde a_{i+1})}(t).$$
We fix the values of all the covariates and assume without loss of generality that $N \le \tilde N$.
We will show using mathematical induction that $\tilde a_i = a_i$, $0 \le i \le N$. For $i = 0$ we have $\tilde a_0 = a_0 = 0$. Suppose that $\tilde a_k = a_k$ for some $k$, $0 \le k \le N - 2$, and show that $\tilde a_{k+1} = a_{k+1}$. Suppose that $\tilde a_{k+1} \ne a_{k+1}$, say, $a_{k+1} < \tilde a_{k+1}$. Then for $t \in (a_k, a_{k+1})$ we have from (31) $1 - \tilde p + \tilde p \tilde c_k^{\tilde r} = 1 - p + p c_k^{r}$, while for $t \in [a_{k+1}, \tilde a_{k+1})$ we have $1 - \tilde p + \tilde p \tilde c_k^{\tilde r} = 1 - p + p c_{k+1}^{r}$. Therefore, $p c_{k+1}^{r} = p c_k^{r}$. In view of our assumptions that $p, r > 0$ this implies that $c_{k+1} = c_k$, which is a contradiction. Therefore, $\tilde a_{k+1} = a_{k+1}$. This proves that $\tilde a_i = a_i$, $0 \le i \le N$. However, because $a_N = T$, we also have $\tilde a_N = T$, which implies that $\tilde N = N$. This establishes statement 1 of the theorem.
2. Fix the values of the covariates and a sequence $\{a_i : 0 \le i \le N\}$ such that $0 = a_0 < a_1 < \dots < a_{N-1} < a_N = T$. Pick probabilities $\tilde p$ and $p$ with $0 < p < \tilde p < 1$, a function $r \in \mathcal{R}$ and a sequence $\{c_i : 0 \le i \le N - 1\}$ with the properties $1 = c_0 > c_1 > \dots > c_{N-1} \ge 0$. Define
$$\tilde c_i = \Big[1 - \frac{p}{\tilde p}\,(1 - c_i^{r})\Big]^{1/r}, \quad 0 \le i \le N - 1. \tag{32}$$
It is easy to see that the sequence $\{\tilde c_i\}$ is decreasing, $\tilde c_0 = 1$ and $\tilde c_{N-1} > 0$. We rewrite equation (32) in the form $\tilde p(1 - \tilde c_i^{r}) = p(1 - c_i^{r})$, $0 \le i \le N - 1$, to conclude that sequences $\{\tilde c_i\}$ and $\{c_i\}$ are distinct. Also, the latter equation is equivalent to $1 - \tilde p + \tilde p \tilde c_i^{r} = 1 - p + p c_i^{r}$ for all $i$. Thus, for functions $S, \tilde S \in \mathcal{S}$ of the type (30) whose values are related through equation (32) and for any $r \in \mathcal{R}$ we have
$$1 - \tilde p + \tilde p\,[\tilde S(t)]^{r} = 1 - p + p\,[S(t)]^{r}, \quad 0 \le t < T.$$
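The rescaling (32) is easy to verify numerically. The following Python sketch computes $\tilde c_i$ from given $c_i$, $p$, $\tilde p$ and a fixed value of $r$, and checks the properties claimed above; the function names and the numerical values are hypothetical.

```python
import math

def rescaled_heights(c, p, p_tilde, r):
    """Heights c_tilde_i = [1 - (p / p_tilde) * (1 - c_i**r)]**(1 / r), as in (32)."""
    if not 0 < p < p_tilde < 1 or r <= 0:
        raise ValueError("need 0 < p < p_tilde < 1 and r > 0")
    return [(1 - (p / p_tilde) * (1 - ci ** r)) ** (1 / r) for ci in c]

def overall_survival(ci, p, r):
    """Value of 1 - p + p * c**r on an interval where S equals c."""
    return 1 - p + p * ci ** r

# Hypothetical proper step survival function: c_0 = 1 > ... > c_{N-1} = 0.
c = [1.0, 0.7, 0.4, 0.0]
p, p_tilde, r = 0.5, 0.8, 1.5
c_tilde = rescaled_heights(c, p, p_tilde, r)

# Both parameter sets give the same overall survival on every interval.
matches = all(
    math.isclose(overall_survival(ct, p_tilde, r), overall_survival(ci, p, r))
    for ct, ci in zip(c_tilde, c)
)
decreasing = all(x > y for x, y in zip(c_tilde, c_tilde[1:]))
```

Note that although the original $S$ is proper ($c_{N-1} = 0$), the rescaled function has $\tilde c_{N-1} > 0$, so it is improper.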
This means that the set of values $\{c_i\}$ of a survival function of the form (30) is not identifiable from the overall survival function (26).
3. Suppose that equation (31) holds for some $p, \tilde p \in \mathcal{P}$, $S, \tilde S \in \mathcal{S}_0$ and $r, \tilde r \in \mathcal{R}$. Letting $t \to T-$ in this equation we get $1 - \tilde p = 1 - p$ so that $\tilde p = p$. This proves statement 3 (a). To show that functions $r$ are in general non-identifiable, pick an arbitrary function $p \in \mathcal{P}$; any piecewise constant survival function $S$ that takes two values, 1 and 0 ($N = 2$); two distinct functions $r, \tilde r \in \mathcal{R}$; and set $\tilde p = p$, $\tilde S = S$. Then equation (31) is clearly satisfied, which proves our claim 3 (b). Finally, if there are distinct functions $r, \tilde r \in \mathcal{R}$ that are independent of covariates $u$ and $w$ then for any piecewise constant survival function $S \in \mathcal{S}_0$ with $N \ge 3$, we may set $\tilde S := S^{r/\tilde r}$ to obtain a piecewise constant function from the family $\mathcal{S}_0$ that depends on a proper set of covariates and is distinct from $S$ so that equation (31) is satisfied. This shows our claim 3 (c). □
We will now discuss an important particular case of model (26) where function $r$ shares no covariates with functions $p$ and $S$. Thus, the model we deal with becomes
$$\bar G(t; x, y, z, u, v) = 1 - p(x, y) + p(x, y)\,[S(t; x, z)]^{r(u, v)}, \quad 0 \le t < T. \tag{33}$$
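As an illustration of model (33) with covariates held fixed, the Python sketch below evaluates $1 - p + p\,[S(t)]^{r}$ and shows why the exponent $r$ leaves no trace when $S$ takes only the values 1 and 0 (the situation of statement 3 (b) of Theorem 5), whereas it does matter for a survival function with intermediate values. All function names and numbers are hypothetical.

```python
import math

def ph_cure_survival(p, S, r):
    """Overall survival 1 - p + p * S(t)**r of model (33), covariates fixed."""
    def G(t):
        return 1 - p + p * S(t) ** r
    return G

def two_valued_S(a1):
    """Proper step survival function: 1 on [0, a1), 0 on [a1, T)."""
    return lambda t: 1.0 if t < a1 else 0.0

def exponential_S(rate):
    """Survival function exp(-rate * t), taking values strictly between 0 and 1 for t > 0."""
    return lambda t: math.exp(-rate * t)

times = [0.0, 1.0, 1.99, 2.0, 3.5]
p = 0.6

# With a 0/1-valued S the exponent r has no effect: 1**r = 1 and 0**r = 0.
G_r1 = ph_cure_survival(p, two_valued_S(2.0), 1.0)
G_r3 = ph_cure_survival(p, two_valued_S(2.0), 3.0)
r_invisible = all(G_r1(t) == G_r3(t) for t in times)

# With intermediate values of S, different exponents give different curves.
H_r1 = ph_cure_survival(p, exponential_S(0.5), 1.0)
H_r3 = ph_cure_survival(p, exponential_S(0.5), 3.0)
r_visible = any(not math.isclose(H_r1(t), H_r3(t)) for t in times)
```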
We will also assume that there is a point $(u_0, v_0)$ in the covariate space $\mathcal{U} \times \mathcal{V}$ where $r(u_0, v_0) = 1$ for all $r \in \mathcal{R}$. Then $S(\cdot; x, z)$ represents the baseline survival function. Notice that in this case model (33) is identical to model (1). If for all values of covariates $x, z$ function $S(t)$ is absolutely continuous and non-vanishing on $[0, T)$ with the corresponding hazard function $h_0(t)$ then the hazard function for susceptible individuals associated with model (33) is $h(t) = r h_0(t)$. This is why model (33) is also called the mixture proportional hazards (PH) cure model.
Clearly, identifiability of model (33) within certain families $\mathcal{P}, \mathcal{S}, \mathcal{R}$ implies identifiability of the underlying model (1) within the families $\mathcal{P}, \mathcal{S}$. The converse is not necessarily true. For example, the most basic model $\bar G(t) = 1 - p + pS(t)$, where $0 < p < 1$ and $S$ is a piecewise constant survival function taking values 1 and 0, is identifiable; however, according to statement 3 (b) of Theorem 5, its PH version is not. Conditions under which identifiability of model (33) follows from identifiability of model (1) are addressed in the following proposition, where we exclude the trivial case $\mathcal{R} = \{1\}$ where both models are identical.
Proposition 6. Suppose that the family $\mathcal{R}$ contains a function different from 1. Suppose also that there is a covariate vector $(x_0, z_0) \in \mathcal{X} \times \mathcal{Z}$ such that every function $S(\cdot; x_0, z_0) \in \mathcal{S}$ takes values different from 0 and 1 (in particular, this is the case if all such functions are improper). Then identifiability of model (1) within families $\mathcal{P}$ and $\mathcal{S}$ implies identifiability of model (33) within the families $\mathcal{P}$, $\mathcal{S}$ and $\mathcal{R}$.
Proof. Suppose that model (1) is identifiable. To show identifiability of model (33), let
$$1 - \tilde p(x, y) + \tilde p(x, y)\,[\tilde S(t; x, z)]^{\tilde r(u, v)} = 1 - p(x, y) + p(x, y)\,[S(t; x, z)]^{r(u, v)} \tag{34}$$
for some $p, \tilde p \in \mathcal{P}$; $S, \tilde S \in \mathcal{S}$; $r, \tilde r \in \mathcal{R}$ and for all values of covariates and $0 \le t < T$. Setting here $(u, v) = (u_0, v_0)$ we conclude, due to the assumed identifiability of model (1), that $\tilde p = p$ and $\tilde S = S$. According to our assumption there is a point $t_0 \in (0, T)$ such that $0 < S(t_0; x_0, z_0) < 1$. We deduce from (34) that the number $s_0 := S(t_0; x_0, z_0)$ satisfies the condition $s_0^{\tilde r(u, v)} = s_0^{r(u, v)}$ for all $(u, v) \in \mathcal{U} \times \mathcal{V}$, which implies $\tilde r = r$. Therefore, model (33) is identifiable. □
Finally, we show that model (33) is fully identifiable for a very general family of survival functions that are not piecewise constant; in particular, this family contains all absolutely continuous survival functions, proper or improper. Furthermore, this is even true for a somewhat more general model allowing functions $p$ and $r$ to share covariates:
$$\bar G(t; x, y, z, u, v) = 1 - p(x, y, u) + p(x, y, u)\,[S(t; x, z)]^{r(u, v)}, \quad 0 \le t < T. \tag{35}$$
Theorem 6. Suppose that for every $(x, z) \in \mathcal{X} \times \mathcal{Z}$ and each $S(\cdot; x, z) \in \mathcal{S}$, the discrete part of the (proper or improper) probability distribution corresponding to $S$ consists of finitely many (perhaps zero) atoms while the continuous (but not necessarily absolutely continuous) part of this distribution is non-zero. Then model (35) is fully identifiable.
Proof. Suppose that for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$, $u \in \mathcal{U}$, $v \in \mathcal{V}$ and $t \in [0, T)$
$$1 - \tilde p(x, y, u) + \tilde p(x, y, u)\,[\tilde S(t; x, z)]^{\tilde r(u, v)} = 1 - p(x, y, u) + p(x, y, u)\,[S(t; x, z)]^{r(u, v)}.$$
Then
$$\tilde p(x, y, u)\big(1 - [\tilde S(t; x, z)]^{\tilde r(u, v)}\big) = p(x, y, u)\big(1 - [S(t; x, z)]^{r(u, v)}\big). \tag{36}$$
Let $\tau(x, z) := \sup\{t : 0 \le t < T,\ S(t; x, z) = 1\}$ and similarly $\tilde\tau(x, z) := \sup\{t : 0 \le t < T,\ \tilde S(t; x, z) = 1\}$. It follows from (36) that $\tau(x, z) = \tilde\tau(x, z)$. We rewrite (36) in the form
$$\frac{p(x, y, u)}{\tilde p(x, y, u)} = \frac{1 - [\tilde S(t; x, z)]^{\tilde r(u, v)}}{1 - [S(t; x, z)]^{r(u, v)}}, \quad \tau(x, z) < t < T.$$
Thus, both sides of this equation are represented by a positive function of covariates $x$ and $u$ alone; we call this function $c(x, u)$. Then
$$p(x, y, u) = c(x, u)\,\tilde p(x, y, u), \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ u \in \mathcal{U}. \tag{37}$$
Solving the equation
$$\frac{1 - [\tilde S(t; x, z)]^{\tilde r(u, v)}}{1 - [S(t; x, z)]^{r(u, v)}} = c(x, u)$$
for $[\tilde S(t; x, z)]^{\tilde r(u, v)}$ we have:
$$[\tilde S(t; x, z)]^{\tilde r(u, v)} = 1 - c(x, u) + c(x, u)\,[S(t; x, z)]^{r(u, v)}. \tag{38}$$
Observe that this equation holds for all $t \in [0, T)$. Setting $(u, v) = (u_0, v_0)$ and denoting $c_0(x) := c(x, u_0)$ yields
$$\tilde S(t; x, z) = 1 - c_0(x) + c_0(x)\,S(t; x, z). \tag{39}$$
This allows us to represent equation (38) in the form
$$[1 - c_0(x) + c_0(x)\,S(t; x, z)]^{\tilde r(u, v)} = 1 - c(x, u) + c(x, u)\,[S(t; x, z)]^{r(u, v)}.$$
We fix covariates $x, y, z, u, v$ and for simplicity of notation suppress them in the last equation to obtain
$$[1 - c_0 + c_0 S(t)]^{\tilde r} = 1 - c + c\,S^{r}(t), \quad 0 \le t < T. \tag{40}$$
According to the main premise of Theorem 6, the image of the interval $[0, T)$ under function $S$ contains an open interval, say $(\alpha, \beta)$, where $\alpha < \beta$. Then from equation (40) we deduce that
$$(1 - c_0 + c_0 w)^{\tilde r} = 1 - c + c\,w^{r}, \quad \alpha < w < \beta.$$
As functions of $w$, both sides of this equation are analytic in the half-plane $H = \{\mathrm{Re}\, w > 0\}$. By the uniqueness principle for analytic functions they are identical on $H$. In particular, they coincide on the entire set of positive real numbers:
$$(1 - c_0 + c_0 w)^{\tilde r} = 1 - c + c\,w^{r}, \quad w > 0.$$
We differentiate this equation $m$ times in variable $w$ to obtain
$$c_0^{m}\,\tilde r(\tilde r - 1) \cdots (\tilde r - m + 1)(1 - c_0 + c_0 w)^{\tilde r - m} = c\,r(r - 1) \cdots (r - m + 1)\,w^{r - m}.$$
In particular, setting here $w = 1$ we have:
$$c_0^{m}\,\tilde r(\tilde r - 1) \cdots (\tilde r - m + 1) = c\,r(r - 1) \cdots (r - m + 1), \quad m \ge 1.$$
For $m = 1, 2, 3$ these equations are
$$c_0 \tilde r = c\,r, \qquad c_0^{2}\,\tilde r(\tilde r - 1) = c\,r(r - 1), \qquad c_0^{3}\,\tilde r(\tilde r - 1)(\tilde r - 2) = c\,r(r - 1)(r - 2). \tag{41}$$
It follows from the second equation that if $r = 1$ then also $\tilde r = 1$ and vice versa. Assuming that $r, \tilde r \ne 1$ and dividing the second equation by the first equation and the third equation by the second equation we find that
$$c_0(\tilde r - 1) = r - 1 \quad \text{and} \quad c_0(\tilde r - 2) = r - 2. \tag{42}$$
Subtraction of the second equation in (42) from the first one yields $c_0 = c_0(x) = 1$ for all $x \in \mathcal{X}$. Then (42) gives $\tilde r = r$, and the first equation in (41) yields $c = c(x, u) = 1$ for all $x \in \mathcal{X}$, $u \in \mathcal{U}$. From (39) we derive that $\tilde S = S$. Finally, equation (37) implies that $\tilde p = p$. Taken together, these conclusions show that model (35) is fully identifiable. □
Notice that identifiability of model (35) follows, without any change to the proof, from a weaker condition on the survival functions $S$, specifically that the range of every function $S$ in the family $\mathcal{S}$ contains a limit point.
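The key step in the proof of Theorem 6 is that the identity $(1 - c_0 + c_0 w)^{\tilde r} = 1 - c + c\,w^{r}$ on an interval of values $w$, obtained from (40), forces $c_0 = c = 1$ and $\tilde r = r$. The Python sketch below is only a numerical illustration of this rigidity, not a proof: it evaluates the largest discrepancy between the two sides on a grid for a few hypothetical parameter choices. The function name, the grid, and the restriction to $c_0 \le 1$ (so that the base stays positive) are assumptions made for the example.

```python
def max_residual(c0, c, r, r_tilde, lo=0.2, hi=0.9, n=50):
    """Largest |(1 - c0 + c0*w)**r_tilde - (1 - c + c*w**r)| over a grid in (lo, hi).

    Assumes 0 < c0 <= 1 so that 1 - c0 + c0*w > 0 for every grid point w.
    """
    worst = 0.0
    for i in range(1, n):
        w = lo + (hi - lo) * i / n
        lhs = (1 - c0 + c0 * w) ** r_tilde
        rhs = 1 - c + c * w ** r
        worst = max(worst, abs(lhs - rhs))
    return worst

# Trivial solution: c0 = c = 1 and r_tilde = r give identical sides.
res_trivial = max_residual(1.0, 1.0, 2.0, 2.0)

# c0 = c = 0.5 with equal exponents: the sides differ by 0.25 * (1 - w)**2.
res_scaled = max_residual(0.5, 0.5, 2.0, 2.0)

# First derivatives matched at w = 1 (c0 * r_tilde = c * r), yet the sides still differ.
res_matched = max_residual(0.5, 1.0, 1.0, 2.0)
```

In these runs only the trivial parameter choice makes the residual vanish on the grid, in line with the conclusion $c_0 = c = 1$, $\tilde r = r$.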
5. Identifiability of the Yakovlev Model

In this section, we address identification of a non-mixture BCH model introduced in Yakovlev et al. (1993) and Yakovlev (1994) (see also Yakovlev and Tsodikov 1996) and alternatively called the Yakovlev model. According to this model, the survival function is given by
$$\bar G(t; x, y, z) = \exp\{-\theta(x, y)F(t; x, z)\}, \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T, \tag{43}$$
where θ(x, y) is a positive function of covariates (x, y) ∈ X × Y from a family Θ and F (t; x, z) is a cdf type function on [0, T ), not necessarily proper, from a family F.
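For intuition, the following Python sketch evaluates model (43) with covariates held fixed and illustrates the scaling ambiguity that the identifiability conditions in this section are designed to rule out: replacing $\theta$ by $c\theta$ and $F$ by $F/c$ leaves $\bar G$ unchanged. The linear improper cdf and all numerical values are hypothetical choices made for the example.

```python
import math

def yakovlev_survival(theta, F):
    """Survival function exp(-theta * F(t)) of model (43), covariates fixed."""
    return lambda t: math.exp(-theta * F(t))

def F(t, T=10.0, mass=0.8):
    """Hypothetical improper cdf on [0, T): linear, with total mass 0.8 < 1."""
    return mass * t / T

theta, c = 1.5, 2.0
G = yakovlev_survival(theta, F)
# Rescaled pair: c * theta together with the cdf-type function F / c.
G_alt = yakovlev_survival(c * theta, lambda t: F(t) / c)

times = [0.0, 2.5, 5.0, 9.9]
indistinguishable = all(math.isclose(G(t), G_alt(t)) for t in times)
```

Whether such a rescaled pair is admissible depends on whether both $c\theta$ and $F/c$ remain in the respective families, which is exactly what the scalability conditions control.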
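Definition 7. Model (43) is identifiable within the families $\Theta$ and $\mathcal{F}$ if the equality
$$\bar G_1(t; x, y, z) = \bar G_2(t; x, y, z), \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T,$$
where
$$\bar G_i(t; x, y, z) = \exp\{-\theta_i(x, y)F_i(t; x, z)\}, \quad i = 1, 2,$$
for some functions $\theta_1, \theta_2 \in \Theta$ and $F_1, F_2 \in \mathcal{F}$ implies that $\theta_1(x, y) = \theta_2(x, y)$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and $F_1(t; x, z) = F_2(t; x, z)$ for all $0 \le t < T$, $x \in \mathcal{X}$ and $z \in \mathcal{Z}$.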
In the case where cdf's $F \in \mathcal{F}$ are proper, i.e. $F(T-; x, z) = 1$ for all $x \in \mathcal{X}$, $z \in \mathcal{Z}$ and $F \in \mathcal{F}$, we have the following elementary result that can be proved similarly to Proposition 2.
Proposition 7. If all cdf's in the family $\mathcal{F}$ are proper then model (43) is identifiable.
In view of Proposition 7 we assume below that T < ∞ and that family F contains
improper cdf’s. We recall that for F ∈ F and x ∈ X the quantity ∥F ∥x is defined in (9). Our main result regarding general model (43) is as follows.
Theorem 7. 1. Suppose that $\|F\|_x = 1$ for all $x \in \mathcal{X}$ and $F \in \mathcal{F}$. Then model (43) is identifiable.
2. Suppose that functions in the families $\Theta$ and $\mathcal{F}$ do not share covariates. If either of the families $\Theta$ or $\mathcal{F}$ is not weakly scalable then model (43) is identifiable.
3. Model (43) is non-identifiable if one of the families $\Theta$ and $\mathcal{F}$ is scalable and the other is weakly scalable.
Proof. 1. Suppose that
$$\theta_1(x, y)F_1(t; x, z) = \theta_2(x, y)F_2(t; x, z). \tag{44}$$
Recalling the definition of $\tau_1(x, z)$ and $\tau_2(x, z)$, see equation (10), we conclude from (44) that $\tau_1(x, z) = \tau_2(x, z)$. Denote the common value of these quantities by $\tau(x, z)$. Then
$$\frac{\theta_1(x, y)}{\theta_2(x, y)} = \frac{F_2(t; x, z)}{F_1(t; x, z)}, \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z},\ \tau(x, z) < t < T.$$
Therefore, there is a positive function $c(x)$ such that
$$\theta_1(x, y) = c(x)\,\theta_2(x, y), \quad x \in \mathcal{X},\ y \in \mathcal{Y}, \tag{45}$$
and
$$F_2(t; x, z) = c(x)\,F_1(t; x, z), \quad x \in \mathcal{X},\ z \in \mathcal{Z},\ 0 \le t < T. \tag{46}$$
Equation (46) yields $\|F_2\|_x = c(x)\|F_1\|_x$. According to our assumption, $\|F_1\|_x = \|F_2\|_x = 1$. Hence $c(x) = 1$ for all $x \in \mathcal{X}$, which in view of equations (45) and (46) implies that model (43) is identifiable.
2. If functions $\theta$ and $F$ do not share covariates then (44) becomes
$$\theta_1(y)F_1(t; z) = \theta_2(y)F_2(t; z), \quad y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T.$$
As shown in the proof of statement 1, there exists a constant $c > 0$ such that $\theta_1(y) = c\,\theta_2(y)$ for all $y \in \mathcal{Y}$ and $F_2(t; z) = c\,F_1(t; z)$ for all $z \in \mathcal{Z}$ and $0 \le t < T$. Non-weak scalability of at least one of the families $\Theta$ and $\mathcal{F}$ implies that $c = 1$. Thus, model (43) is identifiable.
3. Suppose family F is weakly scalable and family Θ is scalable. Then for some F ∈ F
and c > 1 we have F/c ∈ F and cθ ∈ Θ. This implies non-identifiability of model (43).
The argument for the case where family $\mathcal{F}$ is scalable and family $\Theta$ is weakly scalable is very similar. □
Finally, suppose that functions in the family $\Theta$ are parameterized through the Cox model or its extension, see equations (27) and (28). To ensure identifiability of parameterizations (27) or (28) of the family $\Theta$, we will assume in accordance with Proposition 1 that $\mathrm{adim}(A) = d$ or $\mathrm{ldim}(A) = d$, respectively, where $A = \mathcal{X} \times \mathcal{Y} \subset \mathbb{R}^d$.
Theorem 8. 1. Model (43) with function $\theta$ of the form (27) is non-identifiable within any weakly scalable family $\mathcal{F}$.
2. If $\mathrm{adim}(\mathcal{X} \times \mathcal{Y}) < d$ then model (43) with function $\theta$ of the form (28) with $A = \mathcal{X} \times \mathcal{Y}$ is non-identifiable within any weakly scalable family $\mathcal{F}$.
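3. Suppose that functions $\theta$ and $F$ do not share covariates. Let $\theta$ be given by (28), where $A = \mathcal{Y}$ is the set of covariates of $\theta$ and $\mathrm{ldim}(\mathcal{Y}) = d$. If $\mathrm{adim}(\mathcal{Y}) = d$ then model (43) is identifiable within any family $\mathcal{F}$.
Proof. 1. Pick a function $F \in \mathcal{F}$ such that $\tilde F := F/c \in \mathcal{F}$ for some $c > 1$. Let $\theta$ be any function of the form (27). Setting $\tilde a = a + \log c$ and $\tilde\theta(\alpha) = \exp\{\tilde a + b \cdot \alpha\}$ we have $\tilde\theta = c\theta$, so that $\tilde\theta \tilde F = \theta F$. This shows that model (43) is not identifiable.
2. Select a function $F \in \mathcal{F}$ and a constant $c > 0$, $c \ne 1$, such that $\tilde F := F/c \in \mathcal{F}$. According to Proposition 5, condition $\mathrm{adim}(A) < d$ means that model (28) is scalable. Therefore, there exist vectors $b, \tilde b \in \mathbb{R}^d$ such that functions $\theta(\alpha) = \exp\{b \cdot \alpha\}$ and $\tilde\theta(\alpha) = \exp\{\tilde b \cdot \alpha\}$ satisfy the relation $\tilde\theta = c\theta$. Therefore, $\tilde\theta \tilde F = \theta F$, and hence model (43) is not identifiable.
3. In the case at hand model (43) takes on the form
$$\bar G(t; y, z) = \exp\{-\theta(y)F(t; z)\}, \quad (y, z) \in \mathcal{Y} \times \mathcal{Z}, \tag{47}$$
where $\mathcal{Y} \subset \mathbb{R}^d$ and $\theta(y) = \exp\{b \cdot y\}$ with $b \in \mathbb{R}^d$. Suppose that for some functions $F, \tilde F \in \mathcal{F}$ and vectors $b, \tilde b \in \mathbb{R}^d$
$$\exp\{\tilde b \cdot y\}\tilde F(t; z) = \exp\{b \cdot y\}F(t; z), \quad y \in \mathcal{Y},\ z \in \mathcal{Z},\ 0 \le t < T.$$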
As we have shown above on several occasions, this implies the existence of a constant $c > 0$ such that $F(t; z) = c\tilde F(t; z)$ for all $z \in \mathcal{Z}$ and $t \in [0, T)$ and $\exp\{\tilde b \cdot y\} = c \exp\{b \cdot y\}$ for all $y \in \mathcal{Y}$. The assumption $\mathrm{adim}(\mathcal{Y}) = d$ implies, by Proposition 5, that model (28) is not weakly scalable. Therefore, $c = 1$, which implies that $\tilde F = F$. Also, from $\mathrm{ldim}(\mathcal{Y}) = d$ we conclude that model (28) is identifiable, and hence $\tilde b = b$. Thus, model (47) with function $\theta$ of the form (28) is identifiable. □

6. Discussion

In this article, we studied identification of various classes of nonparametric, semiparametric and fully parametric cure models whose components may depend in a very general way on covariates. Our main focus was on mixture models, mixture PH models and their extensions as well as on BCH (or Yakovlev) models. Identifiability of these models (or lack thereof) is a complex phenomenon that depends on the type of the model, its covariate and parametric structure, the size and geometry of the design spaces, and scalability properties of the associated families of mixture, survival and hazard functions. We found that the use of proper survival functions typically entails identifiability of cure models. By contrast, the following properties of cure models may prevent identifiability: (1) the presence of improper survival functions resulting from insufficient follow-up time and/or censoring; (2) the presence, for all values of covariates, of non-susceptible, unexposed, curable or cured subpopulations (see statement 3 of Theorem 1); (3) sharing of covariates between various components of cure models; (4) scalability or weak scalability of the appropriate families of functions involved in cure models; and (5) insufficient size or geometric degeneracy of the design spaces for covariates of cure models.
In the present work, we developed a general theory of identification for some commonly used basic cure models. Consideration of particular models of these and other similar types will undoubtedly add further richness to the general theory presented above. In particular, one can investigate special types of informative or non-informative censoring and different ways in which it is incorporated into cure models. Our approach to analyzing identifiability can be applied to proportional odds cure models (Mao and Wang, 2010), frailty cure models (Peng and Zhang, 2008; Tournoud and Ecochard, 2008) as well as to models combining longitudinal and survival data (Yu et al., 2008). Another possibility for extending the results of this work is to consider more general joint design spaces that are not necessarily represented as the product of design spaces for particular covariates.
This work was motivated by the paper by Li, Taylor and Sy (2001) and the desire to overcome its shortcomings and broadly generalize its results.
Specifically, the aforementioned paper (1) unnecessarily restricted design spaces to intervals of the real line; (2) prohibited sharing of covariates by various components of cure models and dependence of the survival function of the latency time in mixture and mixture PH models as well as cdf in Yakovlev model on covariates, which is unrealistic in many practical settings; (3) disregarded, or did not explicitly articulate, the notion of scalability and weak scalability of function families that, as we showed above, are critical for identification of cure models; and (4) did not fully uncover substantial differences, mostly manifesting through scalability, between discrete and absolutely continuous latency time distributions as far as identification of cure models is concerned. All these deficiencies were addressed in the present work. Specifically, we dealt with more general design spaces, allowed for an arbitrary dependence of the components of cure models on covariates, provided not only sufficient but also necessary conditions for identifiability of cure models, took full and explicit account of scalability properties, and established results on identifiability of cure models for piecewise constant and absolutely continuous survival functions, the two classes most commonly used in applications.

Acknowledgments

The work was conceived during the visit of the first author to the Institute of Statistics of the Taiwan National Tsing-Hua University (Hsin-Chu, Taiwan). The visit was supported by the Visiting Scholar grant 101-14 from the Mathematics Research Promotion Center in Taiwan awarded to the second author. This support is greatly appreciated.

References

[1] Berkson, J. and Gage, R.P. (1952). Survival curves for cancer patients following treatment. Journal of the American Statistical Association 47, 501–515.
[2] Boag, J.M. (1949). Maximum likelihood estimates of the proportions of patients cured by cancer therapy. Journal of the Royal Statistical Society, Series B 11, 15–53.
[3] Chen, M.H., Ibrahim, J.G. and Sinha, D. (1999). A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association 94, 909–919.
[4] Cooner, F., Banerjee, S., Carlin, B.P. and Sinha, D. (2007). Flexible cure rate modeling under latent activation schemes. Journal of the American Statistical Association 102, 560–572.
[5] Farewell, V.T. (1982). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38, 1041–1046.
[6] Hanin, L.G. (2002). Identification problem for stochastic models with application to carcinogenesis, cancer detection and radiation biology. Discrete Dynamics in Nature and Society 7, 177–189.
[7] Kuk, A.Y.C. and Chen, C.-H. (1992). A mixture model combining logistic regression with proportional hazards regression. Biometrika 79, 531–541.
[8] Li, C.-S., Taylor, J.M.G. and Sy, J.P. (2001). Identifiability of cure models. Statistics and Probability Letters 54, 389–395.
[9] Lu, W. and Ying, Z. (2004). On semiparametric transformation cure models. Biometrika 91, 331–343.
[10] Ma, Y. and Yin, G. (2008). Cure rate model with mismeasured covariates under transformation. Journal of the American Statistical Association 103, 743–756.
[11] Maller, R.A. and Zhou, S. (1996). Survival Analysis With Long-Term Survivors, Wiley, Chichester, U.K.
[12] Mao, M. and Wang, J.-L. (2010). Semiparametric efficient estimation for a class of generalized proportional odds cure models. Journal of the American Statistical Association 105, 302–311.
[13] Peng, Y. and Dear, K.B.G. (2000). A nonparametric mixture model for cure rate estimation. Biometrics 56, 237–243.
[14] Peng, Y. and Zhang, J. (2008). Identifiability of a mixture cure frailty model. Statistics and Probability Letters 78, 2604–2608.
[15] Sy, J.P. and Taylor, J.M.G. (2000). Estimation in a Cox proportional hazards cure model. Biometrics 56, 227–236.
[16] Taylor, J.M.G. (1995). Semi-parametric estimation in failure time mixture models. Biometrics 51, 899–907.
[17] Tournoud, M. and Ecochard, R. (2008). Promotion time models with time-changing exposure and heterogeneity: Application to infectious diseases. Biometrical Journal 50, 395–407.
[18] Tsodikov, A.D., Ibrahim, J.G. and Yakovlev, A.Y. (2003). Estimating cure rates from survival data: An alternative to two-component mixture models. Journal of the American Statistical Association 98, 1063–1078.
[19] Yakovlev, A.Y. (1994). Letter to the Editor. Statistics in Medicine 13, 983–986.
[20] Yakovlev, A.Y., Asselain, B., Bardou, V.-J., Fourquet, A., Hoang, T., Rochefordière, A. and Tsodikov, A.D. (1993). A simple stochastic model of tumor recurrence and its application to data on premenopausal breast cancer. In: Asselain, B., Boniface, M., Duby, C., Lopez, C., Masson, J.-P. and Tranchefort, J., eds, Biométrie et Analyse de Données Spatio-Temporelles, vol. 12, Rennes, France: Société Française de Biométrie, ENSA, 66–82.
[21] Yakovlev, A.Y. and Tsodikov, A.D. (1996). Stochastic Models of Tumor Latency and Their Biostatistical Applications, World Scientific, Singapore.
[22] Yu, M., Taylor, J.M.G. and Sandler, H. (2008). Individualized prediction in prostate cancer studies using a joint longitudinal-survival-cure model. Journal of the American Statistical Association 103, 178–187.
[23] Zeng, D., Yin, G. and Ibrahim, J.G. (2006). Semiparametric transformation models for survival data with a cure fraction. Journal of the American Statistical Association 101, 670–684.