Journal of Statistical Planning and Inference 74 (1998) 31– 49
Ancillarity properties of generalized residuals with applications in failure time models

Inmaculada B. Aban^a, Edsel A. Peña^b,*

^a Department of Mathematics, University of Nevada, Reno, Nevada, USA
^b Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, Ohio 43403, USA

* Corresponding author.
Received 15 April 1996; received in revised form 17 February 1998; accepted 18 February 1998
Abstract

This paper addresses the issue of when residuals from failure time models, which are useful in model validation and diagnostics, possess a conditional ancillarity property. This property states that the distribution of the residuals depends on the model parameters only through a many-to-one function of these parameters, which in certain models turns out to be the censoring proportion. Concrete results are obtained for models which possess an invariance structure, and these results are applied to commonly used failure time models. Aside from furthering our understanding of the distributional structure of residuals, this conditional ancillarity property can be exploited to study in a more efficient manner the distributional properties of residuals, either analytically and/or through numerical methods. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Accelerated failure time model; Censored data; Conditional ancillarity; Equivariance; Invariant statistical models; Model diagnostics and validation; Proportional hazards model; Type II censoring
1. Introduction and motivation

Generalized residuals, which were first studied by Cox and Snell (1968), play an important role in statistical hypothesis testing such as in goodness-of-fit tests, model validation and diagnostics. Exact and asymptotic distributional properties of residuals arising in conventional linear regression models are well known (cf., Cook and Weisberg, 1982). For residuals arising in survival models (cf., Kay, 1977; Crowley and Hu, 1977; Kalbfleisch and Prentice, 1980; Cox and Oakes, 1984), where data are typically censored, some of their properties have been studied recently. Horowitz and Neumann (1992) obtained the asymptotic properties of test statistics when applied to residuals arising from the Cox proportional hazards model (Cox, 1972),
while Baltazar-Aban and Peña (1995) and Peña (1995) obtained exact distributional results for residuals under the random and Type II censorship models, respectively. In Peña (1995) the asymptotic distributions of a class of test statistics when applied to residuals from a Type II censored model were also obtained. In the present paper conditional ancillarity properties of residuals from failure time models are presented.

To motivate the problem dealt with in this paper from a survival analysis viewpoint, let us consider the random censorship model. In this model we have T_1,...,T_n independent and identically distributed (i.i.d.) random variables from a hazard function Λ_T(·; θ) and C_1,...,C_n are i.i.d. random variables from a hazard function Λ_C(·; γ), with the T_i's and the C_i's independent. The observables are (Z, δ) = {(Z_i, δ_i): i = 1,...,n}, where Z_i = T_i ∧ C_i and δ_i = I{T_i ≤ C_i}, with '∧' and I{·} denoting minimum and indicator function, respectively. The true hazard-based right-censored residual vector is defined by (R^0, δ) ≡ {(R_i^0, δ_i): i = 1,...,n}, where

  R_i^0 = Λ_T(Z_i; θ) = Λ_T(T_i; θ) ∧ Λ_T(C_i; θ).   (1.1)

Note that δ_i = I{Λ_T(T_i; θ) ≤ Λ_T(C_i; θ)}. There could be other ways of defining the true residuals under this model (e.g., R_i^0 = Λ_Z(Z_i; θ, γ)). However, since interest usually focuses on assessing the model associated with the failure times and not particularly the model associated with the censoring times, the definition in Eq. (1.1) is the most natural. This is also the definition adopted in the literature (cf., Kay, 1977; Crowley and Hu, 1977). Since the Λ_T(T_i; θ)'s are i.i.d. from an exponential distribution with mean unity by the unit exponentiality property of hazard functions, the distribution of (R^0, δ) depends on (θ, γ) only through the dependence of the distribution of Λ_T(C_i; θ) on (θ, γ). But

  P{Λ_T(C_i; θ) > w} = exp{−Λ_C[Λ_T^{-1}(w; θ); γ]},

so that the distribution of (R^0, δ) depends on (θ, γ) only through the function

  Λ*(·; θ, γ) = Λ_C[Λ_T^{-1}(·; θ); γ].   (1.2)

Given the function Λ*(·; θ, γ), the true hazard-based residuals (R^0, δ) are therefore conditionally ancillary quantities in the sense that their distributions depend on (θ, γ) only through Λ*(·; θ, γ). In the special case where Λ_C(·; γ) = 0, i.e., no censoring, the vector of true hazard-based residuals has a distribution that does not depend on (θ, γ), so that the residuals are ancillary quantities. For example, if Λ_T(t; θ) = θt and Λ_C(c; γ) = γc, then Λ*(t; θ, γ) = (γ/θ)t, which is equivalent to π(θ, γ) ≡ π = θ/(θ + γ), the probability of an uncensored observation. Thus, in this example, the true residuals (R^0, δ) are conditionally ancillary given π.

The true hazard-based residuals (R^0, δ) defined in Eq. (1.1) are, however, not observable since the true values of (θ, γ) are not known but are known only to belong to some parameter space Θ × Γ. Rather, what is observable is the (estimated) residual vector (R, δ) ≡ {(R_i, δ_i): i = 1,...,n}, where

  R_i = Λ_T(Z_i; θ̂),   (1.3)
with θ̂ = θ̂(Z, δ) an estimator of θ based on (Z, δ). In the concrete example above, if one uses the maximum likelihood estimator (MLE) of θ, which is given by θ̂ = D/Σ_{i=1}^n Z_i, where D = Σ_{i=1}^n δ_i, it was shown in Baltazar-Aban and Peña (1995) that (R, δ) =_d (DV, δ). Here '=_d' denotes 'equal in distribution', V = (V_1,...,V_n) has a singular Dirichlet distribution with parameter vector (1, 1,...,1), δ is a vector of i.i.d. Bernoulli variables with parameter π, D is binomial with parameters n and π, and V and δ are independent. Thus, in this example, (R, δ) is a conditionally ancillary statistic given the censoring parameter π. Note that without censoring, δ is a vector of 1's, and hence the residuals are ancillary statistics.

Motivated by this and similar observations in other settings (cf., Baltazar-Aban and Peña, 1995; Peña, 1995), it is of interest to address the following question. If (R^0, δ) is a vector of conditionally ancillary quantities given Λ*(·; θ, γ), does it imply that (R, δ) is a vector of conditionally ancillary statistics given Λ*(·; θ, γ)? Such a conditional ancillarity property can be exploited when one is studying the distributional properties of residuals, either analytically or through Monte Carlo methods. The idea is that if this property holds, then it is sufficient to consider one arbitrary value of (θ, γ) for each of the equivalence classes generated by the function Λ*; a small simulation sketch illustrating this idea in the exponential example is given at the end of this section.

We now outline the contents of this paper. In Section 2 we formulate the problem more generally and characterize statistical models where the above phenomenon holds. The abstract formulation enables us to obtain general results which are applicable in other models, aside from failure time models, and facilitates a more efficient proof of the results. In Section 3 we apply the general results to residuals arising from commonly encountered failure time models. A summary is provided in Section 4.
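To make the preceding remark concrete, the following Python sketch (our illustration, not part of the original paper; the sample size, parameter values and function names are arbitrary) simulates the estimated residuals (R, δ) under the exponential random censorship model for two different values of (θ, γ) that share the same uncensoring probability π = θ/(θ + γ) = 0.75. By the conditional ancillarity of (R, δ) given π, the two simulated residual distributions should agree up to Monte Carlo error.

    # Illustrative sketch (not from the paper); parameter values are arbitrary.
    import numpy as np

    def estimated_residuals(theta, gamma, n=50, n_rep=20000, seed=None):
        """Simulate (R, delta) with R_i = theta_hat * Z_i, where theta_hat = D / sum(Z_i)
        is the MLE of theta under the exponential random censorship model."""
        rng = np.random.default_rng(seed)
        T = rng.exponential(scale=1.0 / theta, size=(n_rep, n))   # failure times, hazard theta
        C = rng.exponential(scale=1.0 / gamma, size=(n_rep, n))   # censoring times, hazard gamma
        Z = np.minimum(T, C)
        delta = (T <= C).astype(float)
        theta_hat = delta.sum(axis=1) / Z.sum(axis=1)             # MLE of theta based on (Z, delta)
        return theta_hat[:, None] * Z, delta                      # estimated residuals R and delta

    # Two parameter points in the same equivalence class: pi = theta/(theta + gamma) = 0.75
    for theta, gamma in [(1.0, 1.0 / 3.0), (4.0, 4.0 / 3.0)]:
        R, delta = estimated_residuals(theta, gamma, seed=1)
        print(theta, gamma, delta.mean(), R.mean(), np.quantile(R, 0.9))

The printed uncensoring proportions and residual summaries should be essentially identical for the two parameter points, so a study of the distribution of (R, δ) need only vary π rather than all of (θ, γ).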
2. Conditional ancillarity in general models

The general characterization results in this section are not limited to failure time models. In order to gain this generality, we therefore treat the problem by considering abstract statistical models. Thus, let (X, A, P = {P_{(θ,γ)}: (θ,γ) ∈ Θ × Γ}) be a statistical model, i.e., (X, A) is some measurable space, and for each (θ,γ) ∈ Θ × Γ, P_{(θ,γ)} is a probability measure on (X, A). The parameter θ is considered to be the parameter of interest, while the parameter γ is viewed as a nuisance parameter. These parameters may be finite dimensional or infinite dimensional. Let ε = {ε_θ: θ ∈ Θ} be such that the mapping (x, θ) ↦ ε_θ(x) from (X × Θ, A ⊗ σ(Θ)) to (Y, B) is measurable, and such that, for each (θ,γ) ∈ Θ × Γ,

  P_{(θ,γ)} ε_θ^{-1} = Q_{ψ(θ,γ)},   (2.1)

where ψ: Θ × Γ → R for some space R, and {Q_r: r ∈ R} is a collection of probability measures on (Y, B) parametrized by r ∈ R. Let θ̂ be an estimator of θ, i.e., θ̂ is a measurable mapping from (X, A) into (Θ, σ(Θ)). For x ∈ X, θ̂(x) represents the associated estimate of θ. The main problem
dealt with in this section is to examine conditions on ε and θ̂ such that, for each (θ,γ) ∈ Θ × Γ,

  P_{(θ,γ)} ε_{θ̂}^{-1} = Q*_{ψ(θ,γ)},   (2.2)

where {Q*_r: r ∈ R} is some collection of probability measures on (Y, B) parametrized by r ∈ R, which may be different from {Q_r: r ∈ R}. Since the mapping ε_{θ̂}: (X, A) → (Y, B) is measurable, it is a statistic. If ψ(θ,γ) is constant in (θ,γ), Eq. (2.2) is equivalent to stating that ε_{θ̂} is an ancillary statistic. For our purposes we allow the more general situation where ψ(θ,γ) is nonconstant in (θ,γ). In particular, the case where ψ: Θ × Γ → R is a many-to-one mapping is of interest. The following notion will be of interest.

Definition 2.1. The statistic ε_{θ̂}: X → Y is said to be conditionally ancillary, given ψ(θ,γ), if Eq. (2.2) is satisfied.

Note that there is an asymmetry in the two parameters θ and γ since θ is involved in the class of transformations ε and is also viewed as the parameter of interest. In view of this asymmetry, it is of interest to consider reparametrizations of (θ,γ) which leave θ unchanged. This notion is formally introduced in the following definition.

Definition 2.2. The mapping τ: Θ × Γ → Θ × R is said to be a Θ-invariant reparametrization if it is injective and for each (θ,γ) ∈ Θ × Γ, τ(θ,γ) = (θ, γ*(θ,γ)), where γ*: Θ × Γ → R. The new parameter γ*(θ,γ) will simply be denoted by γ*.

The question of which Θ-invariant reparametrization is most useful for examining the conditional ancillarity of ε_{θ̂} will now be addressed. To do this, certain invariance assumptions need to be imposed on the statistical model. Recall that in classical settings, ancillarity and model invariance are intrinsically related (cf., Lehmann, 1986). In this case, it turns out that invariance also plays a role in the development of the concept of conditional ancillarity. Recall the definition of model invariance.

Definition 2.3. Let G = {g: (X, A) → (X, A)} be a group of measurable transformations on (X, A), and Ḡ = {ḡ: Θ × Γ → Θ × Γ} be a group of transformations on Θ × Γ such that there is a homomorphism g ↦ ḡ from G to Ḡ. The statistical model (X, A, P = {P_{(θ,γ)}: (θ,γ) ∈ Θ × Γ}) is said to be (G, Ḡ)-invariant if

  P_{(θ,γ)}{g^{-1}A} = P_{ḡ(θ,γ)}{A}   for every (θ,γ) ∈ Θ × Γ, A ∈ A, and g ∈ G.

The group Ḡ induces a partition of Θ × Γ consisting of the orbits of Ḡ, i.e., O_Ḡ = {[θ,γ]: (θ,γ) ∈ Θ × Γ}, where [θ,γ] = {ḡ(θ,γ): ḡ ∈ Ḡ}. We say that (θ_1,γ_1) is Ḡ-equivalent to (θ_2,γ_2), denoted by (θ_1,γ_1) ~^Ḡ (θ_2,γ_2), if (θ_2,γ_2) ∈ [θ_1,γ_1]. Furthermore, recall that a mapping ψ: Θ × Γ → R is Ḡ-invariant if ψ(θ,γ) = ψ(ḡ(θ,γ)) for each (θ,γ) ∈ Θ × Γ, ḡ ∈ Ḡ, and is maximal Ḡ-invariant if it is Ḡ-invariant and ψ(θ_1,γ_1) = ψ(θ_2,γ_2) implies that (θ_1,γ_1) ~^Ḡ (θ_2,γ_2). Since in this setting the parameter θ is of interest, a question that arises is whether it is possible to have a Θ-invariant reparametrization (θ,γ) ↦ (θ, γ*(θ,γ)) such that γ*: Θ × Γ → R is maximal Ḡ-invariant. The following proposition addresses this issue.
Proposition 2.1. Let (X, A, P = {P_{(θ,γ)}: (θ,γ) ∈ Θ × Γ}) be (G, Ḡ)-invariant. If ḡ(θ,γ) = (θ,γ) for some (θ,γ) ∈ Θ × Γ implies that ḡ = 1_Ḡ (the identity in Ḡ), then there exists a Θ-invariant reparametrization (θ,γ) ↦ (θ, γ*(θ,γ)) such that γ* is maximal Ḡ-invariant. In particular, one may define γ* to be γ*(θ,γ) = [θ,γ].

Proof. Define τ*(θ,γ) = (θ, γ*(θ,γ)), where γ*(θ,γ) = [θ,γ]. Suppose that (θ_1, [θ_1,γ_1]) = (θ_2, [θ_2,γ_2]). Then θ_1 = θ_2 and so [θ_1,γ_1] = [θ_1,γ_2]. This implies that there exists a ḡ ∈ Ḡ such that ḡ(θ_1,γ_1) = (θ_1,γ_2), which by the condition of the proposition implies that ḡ = 1_Ḡ, so that γ_1 = γ_2. Thus (θ_1, [θ_1,γ_1]) = (θ_2, [θ_2,γ_2]) implies that (θ_1,γ_1) = (θ_2,γ_2), proving the injectiveness of τ*. Thus, τ* is a Θ-invariant reparametrization and γ* is maximal Ḡ-invariant.

The function γ* induces an equivalence relation defined as follows. If (θ_1,γ_1) and (θ_2,γ_2) are such that γ*(θ_2,γ_2) = γ*(θ_1,γ_1), then (θ_1,γ_1) and (θ_2,γ_2) are γ*-equivalent, denoted by (θ_1,γ_1) ~^{γ*} (θ_2,γ_2). Clearly, under the conditions of Proposition 2.1, the equivalence relations ~^Ḡ and ~^{γ*} coincide.

From Proposition 2.1, if (X, A, P = {P_{(θ,γ)}: (θ,γ) ∈ Θ × Γ}) is (G, Ḡ)-invariant, and if ḡ(θ,γ) = (θ,γ) for some (θ,γ) ∈ Θ × Γ implies ḡ = 1_Ḡ, then one could reparametrize P using (θ, γ*) such that γ* is maximal Ḡ-invariant. To see this, define a family of probability measures on (X, A) given by P* = {P*_{(θ,γ*)}: (θ,γ*) ∈ Θ × R} where

  P*_{(θ,γ*)}(A) = P_{(θ,γ)}(A),   ∀A ∈ A,   (2.3)

whenever γ* = [θ,γ]. Note that the defining relation for P* in Eq. (2.3) is well defined since, for given θ, the map that brings γ to γ* is a bijection. For g ∈ G, let ḡ ≡ (ḡ_1, ḡ_2) be its homomorphic image. If γ* = [θ,γ], then we have that

  P*_{(θ,γ*)}(g^{-1}A) = P_{(θ,γ)}(g^{-1}A) = P*_{(ḡ_1(θ,γ), [ḡ_1(θ,γ), ḡ_2(θ,γ)])}(A) = P*_{(ḡ_1(θ,γ), [ḡ(θ,γ)])}(A) = P*_{(ḡ*_1(θ,γ*), γ*)}(A),

since [ḡ(θ,γ)] = [θ,γ] = γ* and with ḡ*_1: Θ × R → Θ defined according to ḡ*_1(θ,γ*) = ḡ_1(θ,γ) for γ* = [θ,γ]. Letting ḡ*: Θ × R → Θ × R with ḡ*(θ,γ*) = (ḡ*_1(θ,γ*), γ*) = (ḡ_1(θ,γ), [θ,γ]) for γ* = [θ,γ], then for g ∈ G, (θ,γ*) ∈ Θ × R and A ∈ A,

  P*_{(θ,γ*)}(g^{-1}A) = P*_{ḡ*(θ,γ*)}(A) = P*_{(ḡ*_1(θ,γ*), γ*)}(A).   (2.4)

Since the mapping g ↦ ḡ* from G into Ḡ* = {ḡ*: g ∈ G} is a homomorphism, it follows that the reparametrized model (X, A, P*) is (G, Ḡ*)-invariant. Indeed, if
τ*: Θ × Γ → Θ × R is defined via τ*(θ,γ) = (θ, [θ,γ]), then Ḡ* = τ* Ḡ (τ*)^{-1} = {τ* ḡ (τ*)^{-1}: ḡ ∈ Ḡ}. The model invariance condition in Eq. (2.4) can be viewed as a conditional invariance property (on γ*). The important thing to note is that Eq. (2.4) indicates the asymmetry in viewing the parameters θ and γ. In the reparametrized model, the γ is concealed in the maximal invariant parameter γ*. At this stage of our development, there is the temptation to make the notation more compact by writing ḡ*_1(θ,γ*) = ḡ*(θ) for some ḡ*: Θ → Θ. However, this is counterproductive as it could lead to confusion, so we therefore employ the present notation which explicitly shows the possible dependence of the transformed θ-value on γ*. For the models of interest to us, which are those where generalized residuals and/or pivotal quantities with respect to the parameter of interest are defined, it will be seen later that the relevant ḡ*_1 needs to be constant in γ*.

In terms of the reparametrized statistical model one would like to obtain equivalent forms of Eqs. (2.1) and (2.2). In fact, one would like to know when the ψ(θ,γ) in Eq. (2.1) will coincide (or be equivalent) with the maximal Ḡ-invariant parameter γ* in Proposition 2.1, the second parameter in the reparametrized model. For this goal we extend the equivariance notion to models with nuisance parameters.

Definition 2.4. A (measurable) mapping ε: X × Θ → Y will be said to be (G, Ḡ*)-invariant if P*_{(θ,γ*)}{x: ε_θ(x) = ε_{ḡ*_1(θ,γ*)}(gx)} = 1, for every (θ,γ*) ∈ Θ × R, g ∈ G; while a (measurable) mapping θ̂: X → Θ will be said to be (G, Ḡ*)-equivariant if P*_{(θ,γ*)}{x: θ̂(gx) = ḡ*_1(θ̂(x), r)} = 1, for every (θ,γ*) ∈ Θ × R, r ∈ R, g ∈ G.
The definition we are adopting for the (G, Ḡ*)-invariance of the mapping (x, θ) ↦ ε_θ(x) has the consequence that for some invariant statistical models there will be no (nontrivial) mappings ε: X × Θ → Y which are (G, Ḡ*)-invariant. The reason for this is that in order to obtain nontrivial invariant ε_θ(x) according to Definition 2.4, the invariant statistical model should have the property that the mappings ḡ*_1: Θ × R → Θ are constant in γ*, or equivalently, the mappings ḡ_1: Θ × Γ → Θ are constant in γ. Consequently, the results presented later will not be of relevance to invariant models which do not satisfy the just mentioned property concerning ḡ*_1. Note, however, that this is not a restriction since the models of interest are those where generalized residuals and/or pivotal quantities with respect to the parameter of interest exist, and these are models where an invariant ε: X × Θ → Y exists, hence the appropriate ḡ*_1 will be constant in γ*. In some applications in Section 3, we will therefore simply write ḡ*_1(θ,γ*) ≡ ḡ*θ.

The above-mentioned exclusion of some invariant statistical models in the applications of the results is vividly illustrated using a model presented by a referee. This model is given by

  X ≡ (X_1, X_2)' ∼ Normal( (0, 0)', [σ_1, μ; μ, σ_2] ),   (μ, σ_1, σ_2) ∈ ℝ × ℝ²_+.   (2.5)
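As a numerical aside (ours, not the authors'), the following sketch checks the invariance structure described in the next paragraph: if (X_1, X_2) follows the model in Eq. (2.5) with parameters (μ, σ_1, σ_2), then the sheared vector (X_1 + aX_2, X_2) again follows the model, with parameters (μ + aσ_2, σ_1 + 2aμ + a²σ_2, σ_2). The parameter values below are arbitrary.

    # Illustrative sketch (not from the paper); arbitrary parameters with |mu| <= sqrt(s1*s2).
    import numpy as np

    mu, s1, s2, a = 0.5, 2.0, 1.5, 0.8
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[s1, mu], [mu, s2]], size=200000)

    # Shear (x1, x2) -> (x1 + a*x2, x2) and estimate the covariance of the transformed data
    Xg = np.column_stack([X[:, 0] + a * X[:, 1], X[:, 1]])
    print(np.cov(Xg, rowvar=False))
    # Covariance implied by the induced parameter map (mu, s1, s2) -> (mu + a*s2, s1 + 2*a*mu + a^2*s2, s2)
    print(np.array([[s1 + 2 * a * mu + a ** 2 * s2, mu + a * s2],
                    [mu + a * s2, s2]]))

The two printed matrices agree up to sampling error, which is exactly the (G_1, Ḡ_1)-invariance of the model for the groups defined next.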
Note that the parameters satisfy the inequality |μ| ≤ √(σ_1σ_2). Defining the groups

  G_1 = {(x_1, x_2) ↦ (x_1 + ax_2, x_2): a ∈ ℝ},
  Ḡ_1 = {(μ, σ_1, σ_2) ↦ (μ + aσ_2, σ_1 + 2aμ + a²σ_2, σ_2): a ∈ ℝ},

then the model is (G_1, Ḡ_1)-invariant. With μ as the parameter of interest and (σ_1, σ_2) as the nuisance parameters, and letting

  (μ, σ_1, σ_2) ↦ γ* = {(μ + aσ_2, σ_1 + 2aμ + a²σ_2, σ_2): a ∈ ℝ},

then Eq. (2.4) is satisfied with ḡ_1(μ, σ_1, σ_2) = μ + aσ_2. For this (G_1, Ḡ_1)-invariant model there are, however, no nontrivial mappings ((x_1, x_2), μ) ↦ ε_μ(x_1, x_2) which are (G_1, Ḡ_1)-invariant, hence there are no nontrivial pivotal quantities for μ nor nontrivial generalized residuals involving (x_1, x_2) and μ only. This is therefore a model which is not of particular relevance to us from the perspective of the study of generalized residuals.

However, if we restrict the parameter space to (ℝ\{0}) × ℝ²_+ and now define the groups of transformations

  G_2 = {(x_1, x_2) ↦ (ax_1, ax_2): a ∈ ℝ\{0}},
  Ḡ_2 = {(μ, σ_1, σ_2) ↦ (a²μ, a²σ_1, a²σ_2): a ∈ ℝ\{0}},

then the model is (G_2, Ḡ_2)-invariant. Note that the mapping (μ, σ_1, σ_2) ↦ a²μ does not depend on the nuisance parameters (σ_1, σ_2). The Ḡ_2-induced equivalence classes are [μ, σ_1, σ_2] = {(a²μ, a²σ_1, a²σ_2): a ∈ ℝ\{0}}, which are in one-to-one correspondence with γ* = (σ_1/μ, σ_2/μ), so the latter could serve as the maximal invariant parameter. Given a b = (b_1, b_2)' ∈ ℝ², we may define the mapping

  ((x_1, x_2), μ) ↦ ε_μ((x_1, x_2); b) = b'x/√μ = (b_1x_1 + b_2x_2)/√μ,   (2.6)

which satisfies ε_{a²μ}((ax_1, ax_2); b) = ε_μ((x_1, x_2); b), hence is (G_2, Ḡ_2)-invariant. In Example 2.1, we will apply our main result (Theorem 2.1) when the μ in Eq. (2.6) is replaced by an estimator.

These bivariate normal examples with two different invariance structures illustrate the need to have ḡ*_1 be constant in γ* in order for an invariant ε: X × Θ → Y to exist. This constancy condition on ḡ*_1 is therefore a natural consequence of our interest in models where there is a parameter of interest, θ, and a nuisance parameter, γ, and the generalized residual is obtained by first defining a mapping (x, θ) ↦ ε_θ(x) whose distribution depends on (θ, γ) only through some function ψ(θ, γ), and then finally obtaining the observable generalized residual by replacing θ by an estimator based on x. Such statistical models are prevalent in survival, reliability and econometric models where the observable data are usually right-censored. In such situations, the generalized residuals are used in model validation and diagnostics (cf., Crowley and Hu, 1977; Kay, 1977; Kalbfleisch and Prentice, 1980; Cox and Oakes, 1984), although in many of these applications, the fact that the θ is replaced by an estimator θ̂(x) in ε_θ(x) is usually not taken into account. It was pointed out in Baltazar-Aban and Peña (1995) that without proper recognition of the effects of such a substitution, misleading
conclusions from these residual-based validation procedures could arise. We expect that the results of the present paper will contribute to an enhanced understanding of distributional properties of generalized residuals.

Proposition 2.2. Under the conditions of Proposition 2.1 and if the members of ε = {ε_θ: θ ∈ Θ} are (G, Ḡ*)-invariant, then the probability measures {P_{(θ,γ)} ε_θ^{-1}: (θ,γ) ∈ Θ × Γ} on (Y, B) depend on (θ,γ) only through the maximal Ḡ-invariant γ*, i.e., ε: X × Θ → Y is a conditionally ancillary quantity given the maximal Ḡ-invariant γ*.

Proof. Consider (θ_1,γ_1) and (θ_2,γ_2) with γ* = [θ_1,γ_1] = [θ_2,γ_2], so that for some ḡ ∈ Ḡ, (θ_2,γ_2) = ḡ(θ_1,γ_1). Recalling that τ*: Θ × Γ → Θ × R is defined via τ*(θ,γ) = (θ, [θ,γ]) and letting ḡ* = τ* ḡ (τ*)^{-1}, then (θ_2,γ*) = ḡ*(θ_1,γ*). Consequently,

  P_{(θ_2,γ_2)} ε_{θ_2}^{-1}(B) = P*_{(θ_2,γ*)} ε_{θ_2}^{-1}(B)
    = P*_{ḡ*(θ_1,γ*)} ε_{ḡ*_1(θ_1,γ*)}^{-1}(B)
    = P*_{ḡ*(θ_1,γ*)} {x: ε_{ḡ*_1(θ_1,γ*)}(x) ∈ B}
    = P*_{ḡ*(θ_1,γ*)} {x: ε_{θ_1}(g^{-1}x) ∈ B}    [by the (G, Ḡ*)-invariance of ε]
    = P*_{ḡ*(θ_1,γ*)} {gx: ε_{θ_1}(x) ∈ B}
    = P*_{(θ_1,γ*)} {x: ε_{θ_1}(x) ∈ B}    [by the model invariance]
    = P_{(θ_1,γ_1)} ε_{θ_1}^{-1}(B).

Therefore, the value of P_{(θ,γ)} ε_θ^{-1}(B) depends on (θ,γ) only through γ* = [θ,γ].
Proposition 2.2 simply states that the distribution of ε_θ depends only on γ*. It is therefore conceivable that further reduction is possible in the sense that the distribution of ε_θ depends on γ* through h ∘ γ*, where h: R → H and '∘' denotes composition of functions. However, the determination of such an h may have to be done directly. We take this possibility into account in the following theorem, which provides an answer to the question posed in this section with regard to the conditional ancillarity of ε_{θ̂} under an invariant statistical model.

Theorem 2.1. Assume the conditions of Proposition 2.1. If the members of ε = {ε_θ: θ ∈ Θ} are (G, Ḡ*)-invariant and satisfy the condition that, for every (θ,γ) ∈ Θ × Γ,

  P_{(θ,γ)}({x: ε_θ^{-1}(ε_θ(x)) ≠ {x}}) = 0;

if the probability measures {P_{(θ,γ)} ε_θ^{-1}: (θ,γ) ∈ Θ × Γ} depend on (θ,γ) only through h ∘ γ*; and if θ̂ is (G, Ḡ*)-equivariant; then {P_{(θ,γ)} ε_{θ̂}^{-1}: (θ,γ) ∈ Θ × Γ} depends on (θ,γ) only through h ∘ γ*, so that ε_{θ̂}: X → Y is conditionally ancillary given h ∘ γ*.
Proof. For every (θ,γ) ∈ Θ × Γ, B ∈ B, g ∈ G, and with h* ≡ h ∘ γ*, we have the following sequence of equalities:

  P_{(θ,γ)} ε_{θ̂}^{-1}(B)
    = Q_{h*θ} ε_{θ̂}^{-1}(B),   where Q_{h*θ} = P_{(θ,γ)}
    = Q_{h*θ} {x: ε_{θ̂(x)}(x) ∈ B}
    = Q_{h* ḡ*_1(θ,γ*)} {gx: ε_{θ̂(x)}(x) ∈ B}    [by the (G, Ḡ*)-invariance of the model, Eq. (2.4)]
    = Q_{h* ḡ*_1(θ,γ*)} {x: ε_{θ̂(g^{-1}x)}(g^{-1}x) ∈ B}
    = Q_{h* ḡ*_1(θ,γ*)} {x: ε_{(ḡ*_1)^{-1}(θ̂(x),γ*)}(g^{-1}x) ∈ B}    [since θ̂ is (G, Ḡ*)-equivariant]
    = Q_{h* ḡ*_1(θ,γ*)} {x: ε_{θ̂(x)}(x) ∈ B}    [by the (G, Ḡ*)-invariance of ε]
    = Q_{h* ḡ*_1(θ,γ*)} ε_{θ̂}^{-1}(B).

(Writing P_{(θ,γ)} as Q_{h*θ}, a measure indexed only by (h*, θ), is legitimate because, by the almost sure injectivity of ε_θ and the assumption that {P_{(θ,γ)} ε_θ^{-1}} depends on (θ,γ) only through h*, the measure P_{(θ,γ)} is determined by (θ, h*).) Consequently, since h*(θ,γ) = h*(ḡ(θ,γ)) for every ḡ ∈ Ḡ, then for some fixed (θ_0,γ_0),

  P_{ḡ(θ_0,γ_0)} ε_{θ̂}^{-1}(B) = Q_{h*(θ_0,γ_0) θ_0} ε_{θ̂}^{-1}(B)   for every ḡ ∈ Ḡ,

implying that P_{(θ,γ)} ε_{θ̂}^{-1} depends only on h*(θ,γ).

Remark 2.1. Another way of establishing the weaker result that ε_{θ̂} is conditionally ancillary given the maximal Ḡ-invariant γ* is by noting that under the conditions of Theorem 2.1, ε_{θ̂}: X → Y is G-invariant, i.e., ε_{θ̂(gx)}(gx) = ε_{ḡ*_1(θ̂(x),γ*)}(gx) = ε_{θ̂(x)}(x). Consequently, the distribution of ε_{θ̂(X)}(X), where X ~ P_{(θ,γ)}, will depend on (θ,γ) only through a maximal Ḡ-invariant parameter by using Theorem 6.3 in Lehmann (1986, p. 292). This theorem, though, considers the case where there are no nuisance parameters. It should be noted that the invariant statistic ε_{θ̂} of main concern in this paper resulted from transformations of the observations with the transformations depending on (θ,γ) only through θ, the parameter of interest. The constructive proof presented in Proposition 2.2 and Theorem 2.1 highlights what transpires when using residuals, which is to first come up with the class ε which depends on θ and satisfies Eq. (2.1), and then to replace θ by an estimator θ̂.

We first illustrate these concepts and results using models without censoring. The first example deals with the bivariate normal model in Eq. (2.5), while the second example uses the Weibull distribution.

Example 2.1. Let X_1,...,X_n be i.i.d. from the model specified in Eq. (2.5) with μ restricted to be nonzero, where X_i = (X_{i1}, X_{i2})', i = 1,...,n. Using the mapping in Eq. (2.6), we have by Proposition 2.2 that (b'X_1/√μ, ..., b'X_n/√μ) has distribution
that depends on (μ, σ_1, σ_2) only through (σ_1/μ, σ_2/μ). Since the natural estimator of μ given by

  μ̂(X_1,...,X_n) = S_{12} ≡ (1/(n−1)) Σ_{i=1}^n (X_{i1} − X̄_1)(X_{i2} − X̄_2),

where X̄_j = n^{-1} Σ_{i=1}^n X_{ij}, is (G_2, Ḡ_2)-equivariant, it follows from Theorem 2.1 that the distribution of

  (b'X_1/√S_{12}, ..., b'X_n/√S_{12})

depends on (μ, σ_1, σ_2) only through (σ_1/μ, σ_2/μ). Thus, if one wants to study this distribution, it suffices to focus on one particular value of μ, say μ = 1 (but not μ = 0), and vary (σ_1, σ_2).

Example 2.2. Let T be a random variable with survival function P_{(α,θ)}(T > t) ≡ F̄(t; α, θ) = exp{−(θt)^α}, t ∈ ℝ_+, (α, θ) ∈ ℝ²_+. Defining the groups G = {g_a: ℝ_+ → ℝ_+ | a ∈ ℝ_+} with g_a(t) = t^a, and Ḡ = {ḡ_a: ℝ²_+ → ℝ²_+ | a ∈ ℝ_+} with ḡ_a(α, θ) = (α/a, θ^a), then the model is (G, Ḡ)-invariant with maximal invariant parameter [α, θ] = {(α/a, θ^a): a ∈ ℝ_+}, which is in one-to-one correspondence with γ* = θ^α. Let α be the parameter of interest and θ be the nuisance parameter. The reparametrized model of interest becomes P*_{(α,γ*)}(T > t) ≡ F̄*(t; α, γ*) = exp{−γ*t^α}, t ∈ ℝ_+, (α, γ*) ∈ ℝ²_+, which is (G, Ḡ*)-invariant, where Ḡ* = {ḡ*_a: ℝ²_+ → ℝ²_+ | a ∈ ℝ_+} with ḡ*_a(α, γ*) = (α/a, γ*). Now, consider T = (T_1,...,T_n), a vector of i.i.d. random variables from F̄*(t; α, γ*). Define ε_α(t_1,...,t_n) = (t_1^α,...,t_n^α), which is (G, Ḡ*)-invariant. By Proposition 2.2, ε_α(T) has distribution that depends on (α, γ*) only through γ*. If we then consider the MLE α̂(t_1,...,t_n) of α, which solves the equation

  n/α̂ + Σ_{i=1}^n [1 − n t_i^{α̂}/Σ_{j=1}^n t_j^{α̂}] log t_i = 0,   (2.7)

we see that the MLE is (G, Ḡ*)-equivariant since α̂(t_1,...,t_n) = a α̂(t_1^a,...,t_n^a). Hence, it follows from Theorem 2.1 that ε_{α̂}(T_1,...,T_n) = (T_1^{α̂(T_1,...,T_n)},...,T_n^{α̂(T_1,...,T_n)}) has distribution that depends on (α, γ*) only through γ*, or in terms of the original parameters, only through θ^α. One could then analyze, either analytically or numerically, the distribution of ε_{α̂}(T_1,...,T_n) or its relevant functions by simply taking α = 1. This will greatly simplify such an analysis since T_1,...,T_n, when α = 1, are i.i.d. exponential variables with parameter θ.

Suppose now that the parameter of interest is θ and the nuisance parameter is α. Redefining the groups via G = {g_b: ℝ_+ → ℝ_+ | b ∈ ℝ_+} with g_b(t) = bt, and Ḡ = {ḡ_b: ℝ²_+ → ℝ²_+ | b ∈ ℝ_+} with ḡ_b(α, θ) = (α, θ/b), then the model is (G, Ḡ)-invariant with maximal invariant parameter [α, θ] = {(α, θ/b): b ∈ ℝ_+}, which is in one-to-one correspondence with γ* = α. Hence, the reparametrized model of interest P*_{(θ,γ*)}(T > t) is the same as the original model where α is replaced by γ* and Ḡ* = Ḡ. Given
T = (T_1,...,T_n), a vector of i.i.d. random variables from F̄(t; α, θ), define ε_θ(t_1,...,t_n) = θ(t_1,...,t_n) = (θt_1,...,θt_n), which is (G, Ḡ*)-invariant. By Proposition 2.2, ε_θ(T) has distribution that depends on (θ, γ*) only through γ*. Using the α̂ = α̂(t_1,...,t_n) that solves Eq. (2.7), the MLE θ̂(t_1,...,t_n) of θ is given by

  θ̂(t_1,...,t_n) = [n / Σ_{i=1}^n t_i^{α̂}]^{1/α̂},

which is (G, Ḡ*)-equivariant since θ̂(t_1,...,t_n) = b θ̂(bt_1,...,bt_n) for b ∈ ℝ_+, where we used the fact that α̂(bt_1,...,bt_n) = α̂(t_1,...,t_n). It then follows from Theorem 2.1 that ε_{θ̂}(T_1,...,T_n) = (θ̂T_1,...,θ̂T_n), or its relevant functions, have distributions that depend on (θ, γ*) only through γ*, or in terms of the original parameters, only through α. When examining the distributions of such statistics, it therefore suffices to take θ = 1.

3. Applications to residuals in failure time models

In this section we apply the general results in the preceding section to residuals arising from commonly used failure time models. To illustrate the generality of the results, we consider for our first model the accelerated failure time model or log-linear model, which postulates that the logarithm of a failure time variable is linearly related to a covariate vector (see Kalbfleisch and Prentice, 1980). The result for this model therefore applies also to the conventional linear regression model.

3.1. Accelerated failure time model

Let (Y_i, X_i'), i = 1,...,n, be 1 × (q+1) random vectors where Y_i is the logarithm of the failure time variable, and X_i is a q × 1 covariate vector. In the accelerated failure time model it is postulated that

  Y = X'β + σe,

where Y = (Y_1,...,Y_n)', X = (X_1,...,X_n), and e = (e_1,...,e_n)', and with these random vectors satisfying: (i) X and e are independent; (ii) X_1,...,X_n are i.i.d. with common joint distribution H(·) which belongs to H = {H a joint df on ℝ^q with ∫ ‖x‖² dH(x) < ∞}; (iii) e_1,...,e_n are i.i.d. with common distribution F(·) which belongs to F = {F a df on ℝ with ∫ e² dF(e) < ∞ and ∫ e dF(e) = 0}; and where β is a q × 1 regression parameter vector and σ is a positive scale parameter. The parameter vector of the model is therefore (β, σ, F, H), with (β, σ) being the parameters of interest, and (F, H) being nuisance parameters.

Let R^0 = ε_{(β,σ)}(Y; X) = (Y − X'β)/σ. Since the model is invariant with respect to the group consisting of mappings (y, x) ↦ (cy, x), c > 0, which induces the group of mappings on the parameter space given by (β, σ, F, H) ↦ (cβ, cσ, F, H), then by Proposition 2.2 and since ε_{(β,σ)} is invariant, the distribution of R^0 depends on (β, σ, F, H)
only through the maximal invariant parameter. The maximal invariant parameter is γ*(β, σ, F, H) = (β/σ, F, H). Indeed, by direct calculations, it is easy to see that the distribution of R^0 depends only on h*(β, σ, F, H) = F. For each (β, σ), ε_{(β,σ)} is an injective mapping. Therefore, according to Theorem 2.1, if (β̂, σ̂) is an equivariant estimator of (β, σ), then the residuals

  R = ε_{(β̂,σ̂)}(Y; X) = (Y − X'β̂)/σ̂

will have distribution that depends on (β, σ, F, H) only through h* = F. In the special case where F = F_0, a known distribution function, R is therefore an ancillary statistic. Otherwise, if F is not specified, then R is conditionally ancillary given F. For this linear model without censoring, this result is well known; the point of this example is to illustrate that this ancillarity is subsumed by the general result in Theorem 2.1. We also remark that finding the exact distribution of R is an entirely different matter and may not be feasible in practice. However, by virtue of the above result, if one wants to obtain the distribution of R, either analytically or by simulation, then one needs only to vary F with (β, σ, H) held fixed at some arbitrary value (β_0, σ_0, H_0). One possible estimator of (β, σ) that satisfies the equivariance condition is the least-squares estimator

  (β̂, σ̂²) = ((X'X)^{-1}X'Y, {Y'[I − X(X'X)^{-1}X']Y}/(n − q)),

so the residual vector becomes

  R = √(n − q) [I − X(X'X)^{-1}X']Y / √{Y'[I − X(X'X)^{-1}X']Y}.

Notice that, whereas the components of R^0 are i.i.d. from a common distribution which depends only on F, the components of R are no longer i.i.d. from that common distribution. Thus, checking that the components of R are i.i.d. from it, either graphically or formally, in order to validate the model could be a misleading procedure. Indeed, note that R is a highly dependent vector since R'R = n − q.

In many situations in clinical trials, reliability, and economics, it is not always possible to observe completely the exact values of Y since there is usually a vector C = (C_1,...,C_n)' which right-censors Y. The observable vector is (Z, δ, X) = {(Z_1, δ_1, X_1),...,(Z_n, δ_n, X_n)}, where Z_i = min{Y_i, C_i} and δ_i = I{Y_i ≤ C_i}. Let us assume that C is linearly related to X via

  C = X'α + τe*,

where α is a q × 1 vector, τ is a positive scale parameter, and e* = (e*_1,...,e*_n) is independent of (X, e), and has i.i.d. components with common distribution G belonging to some class of distributions. For the statistical model associated with this situation, the sample space is (ℝ × {0,1} × ℝ^q)^n, and the parameter vector is (β, σ, α, τ, F, G, H), with (β, σ) being the parameters of main interest. The model is invariant with respect to transformations of the form (z, δ, x) ↦ (cz, δ, x) for c > 0, and the induced transformations on the parameter space are of the form (β, σ, α, τ, F, G, H) ↦ (cβ, cσ, cα, cτ, F, G, H). The maximal invariant parameter is therefore given by γ*(β, σ, α, τ, F, G, H) = (β/σ, α/σ, τ/σ, F, G, H). Let

  (R^0, δ) = ε_{(β,σ)}(Z, δ; X) = {((Z_i − X_i'β)/σ, δ_i): i = 1,...,n}.
Clearly, ε_{(β,σ)}(z, δ, x) is invariant, so by Proposition 2.2, the distribution of (R^0, δ) will only depend on the parameters through γ*. Direct calculations yield

  P_{(β,σ,α,τ,F,G,H)}{R_i^0 > r, δ_i = 1} = ∫ ∫_r^∞ Ḡ( x'(β − α)/τ + (σ/τ)w ) dF(w) dH(x),

which depends on the parameters only through h* = ((β − α)/τ, σ/τ, F, G, H). Since ε_{(β,σ)}(·,·,·) is an injective mapping for every (β, σ), then by Theorem 2.1, if (β̂, σ̂) is an equivariant estimator of (β, σ), the residuals

  (R, δ) = ε_{(β̂,σ̂)}(Z, δ; X) = {((Z_i − X_i'β̂)/σ̂, δ_i): i = 1,...,n}

depend on the parameters only through h*. Again, a potential use of this result is that if one wants to determine the distribution of (R, δ), one need not consider all possible parameter values but only those associated with the equivalence classes induced by h*. One possible equivariant estimator of (β, σ) under this censored setting is that proposed by Buckley and James (1979).

Let us consider the special case where H(·) is degenerate at x = 0, so the model is the no-covariate model. Furthermore, suppose that F = F_0 and G = G_0, where F_0 and G_0 are completely specified. Then, since the distribution of (R^0, δ) is given by

  P_{(σ,τ)}{R_i^0 > r, δ_i = 1} = ∫_r^∞ Ḡ_0((σ/τ)w) dF_0(w) = ∫_{F_0(r)}^1 Ḡ_0[(σ/τ)F_0^{-1}(u)] du,

which is determined by the function φ(·; σ/τ) ≡ Ḡ_0[(σ/τ)F_0^{-1}(·)], the distribution of (R, δ) will depend on (σ, τ) only through the φ(·; σ/τ) function. Furthermore, if the mapping φ(·; σ/τ) ↦ p(σ/τ) ≡ ∫_0^1 φ(u; σ/τ) du is injective, then the distribution of (R, δ) will only depend on the (un)censoring probability p(σ/τ). Since p(c) = E{Ḡ_0(cW)}, where W ~ F_0(·), and Ḡ_0(·) is nonincreasing, then, if there exists an open interval O such that G_0(·) is nonconstant on O and ∫_O dF_0(w) > 0, the mapping φ(·; c) ↦ p(c) is injective, hence the distribution of (R, δ) depends only on the (un)censoring probability.

3.2. Proportional hazards model

A widely used regression model in biostatistical settings is the proportional hazards model of Cox (1972). This model postulates that the hazard function of a failure time variable T, given a q × 1 covariate vector X, is

  Λ_{T|X}(t | X) = Λ_0(t) exp{X'β},   (3.1)
where Λ_0(·) is a baseline hazard function, and β is a q × 1 regression coefficient vector. This model is equivalent to the accelerated failure time model whenever Λ_0(·) is a Weibull hazard function (cf., Kalbfleisch and Prentice, 1980).

Let (T_1, X_1),...,(T_n, X_n) be independent random vectors with the X_i's assumed fixed, and assume that the hazard function of T_i, given X_i, follows Eq. (3.1). The baseline hazard function Λ_0, though unknown, will be assumed to belong to a class C consisting of strictly increasing and continuous hazard functions. The parameter of the model is therefore (Λ_0, β) which takes values in C × ℝ^q. Let T denote the range of the failure time variable, which in most situations will be T = [0, ∞). Consider a group (with respect to the operation of composition of functions) G of strictly increasing and continuous functions g from T onto itself. For each g ∈ G, associate a mapping from T onto T defined via t ↦ g(t), and a mapping from C × ℝ^q onto itself defined via (Λ_0, β) ↦ (Λ_0 ∘ g^{-1}, β). Provided that for each Λ_0 ∈ C and g ∈ G, we have Λ_0 ∘ g^{-1} ∈ C, the resulting groups of mappings from T onto itself, and C × ℝ^q onto itself, formed by letting g vary over G, make the model invariant. If for each Λ_0^(1), Λ_0^(2) ∈ C, we have Λ_0^(2) ∘ (Λ_0^(1))^{-1} ∈ G, then a maximal invariant parameter is given by (Λ_0, β) ↦ γ*(Λ_0, β) = β. Let

  R^0 = ε_{(Λ_0,β)}(T | X) = {R_i^0 ≡ Λ_0(T_i) exp{X_i'β}: i = 1,...,n}.

Since ε_{(Λ_0∘g^{-1},β)}(g(t) | X) = ε_{(Λ_0,β)}(t | X) for every t ∈ T, (Λ_0, β) ∈ C × ℝ^q, and g ∈ G, it follows from Proposition 2.2 that the distribution of R^0 depends on (Λ_0, β) only through γ*(Λ_0, β) = β. By direct calculations, conditionally on X, R^0 is, in fact, a vector of i.i.d. unit exponential variates, so R^0 is an ancillary quantity. Since Λ_0 is strictly increasing and we have assumed the X_i's to be fixed, the mappings ε_{(Λ_0,β)}(· | X) are injective. Consequently, if (Λ̂_0, β̂) is an equivariant estimator of (Λ_0, β), then it follows from Theorem 2.1 that the residual vector

  R = ε_{(Λ̂_0,β̂)}(T | X) = {R_i ≡ Λ̂_0(T_i) exp{X_i'β̂}: i = 1,...,n}

is an ancillary statistic. One such equivariant estimator (Λ̂_0, β̂) of (Λ_0, β) is provided by taking β̂ to be the partial maximum likelihood estimator of β, which is the β that maximizes the partial likelihood function

  L_P(β) = ∏_{i=1}^n [ exp{X_i'β} / Σ_{j: T_j ≥ T_i} exp{X_j'β} ],

and by letting Λ̂_0 be the Aalen–Breslow estimator given by

  Λ̂_0(t) = Σ_{i: T_i ≤ t} [ 1 / Σ_{j: T_j ≥ T_i} exp{X_j'β̂} ].

Next, consider the situation where right-censored data (Z_1, δ_1, X_1),...,(Z_n, δ_n, X_n) are available, where Z_i = min{T_i, C_i} and δ_i = I{T_i ≤ C_i}, and with the X_i's considered fixed. We assume that, conditionally on X, the C_i's are independent and that
the T_i's and C_i's are also independent. To allow for the possibility that censoring depends on the covariates, we assume that the hazard function of C_i, given X_i, is Λ_{C|X}(c | X) = Λ_1(c) exp{X'α}, where Λ_1 is a baseline hazard function belonging to a class C_1 of strictly increasing and continuous hazard functions, and such that for every Λ_1 ∈ C_1 and g ∈ G, we have Λ_1 ∘ g^{-1} ∈ C_1. Note that if α = 0, then the censoring variables are i.i.d. For this censorship model, the parameter vector is (Λ_0, β, Λ_1, α), with (Λ_0, β) being the parameter of primary interest. For each g ∈ G, consider the transformations

  (z, δ) ↦ (g(z), δ)   and   (Λ_0, β, Λ_1, α) ↦ (Λ_0 ∘ g^{-1}, β, Λ_1 ∘ g^{-1}, α).

Then the groups obtained by letting g vary over G make the censored model invariant. If, for each Λ_0^(1), Λ_0^(2) ∈ C, we have Λ_0^(2) ∘ (Λ_0^(1))^{-1} ∈ G, then a maximal invariant parameter is (Λ_0, β, Λ_1, α) ↦ γ*(Λ_0, β, Λ_1, α) = (Λ_1 ∘ Λ_0^{-1}, β, α). Let

  (R^0, δ) = ε_{(Λ_0,β)}(Z, δ | X) = {(Λ_0(Z_i) exp{X_i'β}, δ_i): i = 1,...,n}.

Since ε_{(Λ_0,β)}(·,· | X) is invariant, it follows from Proposition 2.2 that the distribution of (R^0, δ) depends on (Λ_0, β, Λ_1, α) only through γ*(Λ_0, β, Λ_1, α) = (Λ_1 ∘ Λ_0^{-1}, β, α). Indeed, direct calculations show that

  P_{(Λ_0,β,Λ_1,α)}{R_i^0 > r, δ_i = 1 | X_i} = ∫_r^∞ exp{−[Λ_1 ∘ Λ_0^{-1}(u exp{−X_i'β})] exp{X_i'α}} exp(−u) du,

so the distribution of (R^0, δ) depends on the parameter vector only through the functions

  h*(Λ_0, β, Λ_1, α) = {(Λ_1 ∘ Λ_0^{-1}[· exp(−X_i'β)]) exp(X_i'α): i = 1,...,n}.   (3.2)

Let (Λ̂_0, β̂) be an estimator of (Λ_0, β) based on {(Z_i, δ_i, X_i): i = 1,...,n}, and define the residuals to be

  (R, δ) = ε_{(Λ̂_0,β̂)}(Z, δ | X) = {(Λ̂_0(Z_i) exp{X_i'β̂}, δ_i): i = 1,...,n}.
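To make the construction concrete, the following sketch (our own illustration; the data-generating design, sample size, starting values and helper-function names are arbitrary and not taken from the paper) computes these residuals from simulated right-censored data with a single covariate: β̂ is obtained by Newton–Raphson on the censored-data log partial likelihood, Λ̂_0 is the censored-data Aalen–Breslow estimator, and the residuals are R_i = Λ̂_0(Z_i) exp{X_i'β̂}, paired with δ_i.

    # Illustrative sketch (not from the paper); simulation design and values are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    n, beta_true = 200, 0.7
    x = rng.normal(size=n)                                  # fixed scalar covariates
    T = rng.exponential(scale=np.exp(-beta_true * x))       # failure times with baseline Lambda_0(t) = t
    C = rng.exponential(scale=1.5, size=n)                  # censoring times
    Z, delta = np.minimum(T, C), (T <= C).astype(float)

    def score_info(b):
        """Score and information of the censored-data log partial likelihood (scalar beta)."""
        U, I = 0.0, 0.0
        for i in np.where(delta == 1)[0]:
            xr = x[Z >= Z[i]]                               # covariates of the risk set at Z_i
            w = np.exp(b * xr)
            m1 = np.sum(w * xr) / np.sum(w)
            U += x[i] - m1
            I += np.sum(w * xr ** 2) / np.sum(w) - m1 ** 2
        return U, I

    beta_hat = 0.0
    for _ in range(25):                                     # Newton-Raphson iterations
        U, I = score_info(beta_hat)
        beta_hat += U / I

    def breslow(t):
        """Censored-data Aalen-Breslow estimator of the baseline cumulative hazard at t."""
        idx = np.where((Z <= t) & (delta == 1))[0]
        return sum(1.0 / np.sum(np.exp(beta_hat * x[Z >= Z[i]])) for i in idx)

    R = np.array([breslow(Z[i]) * np.exp(beta_hat * x[i]) for i in range(n)])
    print(beta_hat, delta.mean(), R.mean())   # (R, delta) behaves like a censored unit-exponential sample

Any other equivariant choice of (Λ̂_0, β̂) could be substituted; the sketch is only meant to make the residual construction explicit.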
Since (ε_{(Λ_0,β)}(·,· | X_1),...,ε_{(Λ_0,β)}(·,· | X_n)) is an injective map, then if (Λ̂_0, β̂) is equivariant, it follows from Theorem 2.1 that the distribution of (R, δ) depends on (Λ_0, β, Λ_1, α) only through the functions in Eq. (3.2). One such equivariant estimator of (Λ_0, β) is (Λ̂_0, β̂), where β̂ is the censored-data partial likelihood MLE of β, and Λ̂_0 is the censored-data version of the Aalen–Breslow estimator (cf., Kalbfleisch and Prentice, 1980).

The random right censorship model is obtained from the censored Cox model above by setting X_i = 0 (i = 1,...,n). In such a case, the distribution of the right-censored residuals {(R_i, δ_i) = (Λ̂_0(Z_i), δ_i): i = 1,...,n}, where Λ̂_0 is the Nelson–Aalen estimator (cf., Kalbfleisch and Prentice, 1980) of Λ_0 defined via

  Λ̂_0(t) = Σ_{i: Z_i ≤ t} δ_i/(n − rank(Z_i) + 1),
depends on (Λ_0, Λ_1) only through the function φ(·) = Λ_1 ∘ Λ_0^{-1}(·). For this model, the theoretical proportion of uncensored observations is p(Λ_0, Λ_1) = ∫_0^∞ exp[−φ(u)] exp(−u) du. Unless additional constraints are imposed on the classes C and C_1, this probability is not in one-to-one correspondence with the function φ(·), so we could not conclude that the distribution of (R, δ) depends only on the (un)censoring probability. One special case in which the uncensoring probability determines the distribution of the residuals is under the model of Koziol and Green (1976), where it is assumed that Λ_1(·) = cΛ_0(·). Under this model, φ(t) = Λ_1 ∘ Λ_0^{-1}(t) = ct, which is clearly in one-to-one correspondence with p(Λ_0, Λ_1) = 1/(1 + c), and indeed, the exact distribution of (R, δ) can be characterized as was done in Baltazar-Aban and Peña (1995). Though the Koziol–Green model is somewhat restrictive from a practical standpoint (cf., Csörgő, 1989), it could serve as a viable model in competing risks settings. A further utility is that it enables obtaining closed-form expressions in many instances, which allows the exact evaluation of efficiency losses of non- or semi-parametric procedures relative to parametric procedures. This model has been the subject of recent research; for instance, see the papers of Chen et al. (1982), Abdushukurov (1984), Cheng and Lin (1987), Csörgő (1988), Hollander and Peña (1989), Herbst (1992a, b, 1993), Stute (1992) and Ditka (1995).

3.3. Type II censorship model

In reliability life-testing studies, where it is usually desired to control the total number of failed units, an often-used model is the Type II censorship model. Under this model, T_1,...,T_n are i.i.d. with common unknown hazard function Λ(·) which belongs to a parametric class of hazard functions C = {Λ(·; θ): θ ∈ Θ}, where Θ is some subset of Euclidean space, and a d ∈ {1,...,n} is prespecified. The observable random variables are T_(1),...,T_(d), the first d order statistics of T_1,...,T_n. Let T be the range space of the failure time variable T, G be a group of transformations from T onto itself, Ḡ be a group of transformations from Θ onto itself, and such that there is a homomorphism h from G onto Ḡ. Suppose that (G, Ḡ) makes the uncensored model invariant, so that for every g ∈ G, θ ∈ Θ, if (T_1,...,T_n) are i.i.d. Λ(·; θ), then (g(T_1),...,g(T_n)) are i.i.d. Λ(·; ḡ(θ)), where ḡ = h(g). Then, with respect to (G, Ḡ), the Type II censored model is also invariant since, for every g ∈ G and θ ∈ Θ,

  P_θ{g(T_(1)) ≤ z_(1),...,g(T_(d)) ≤ z_(d)}
    = Σ_{i_1,...,i_n} P_θ{g(T_{i_1}) ≤ z_(1),...,g(T_{i_d}) ≤ z_(d), g(T_{i_{d+1}}) < ∞,...,g(T_{i_n}) < ∞}
    = Σ_{i_1,...,i_n} P_{ḡ(θ)}{T_{i_1} ≤ z_(1),...,T_{i_d} ≤ z_(d), T_{i_{d+1}} < ∞,...,T_{i_n} < ∞}
    = P_{ḡ(θ)}{T_(1) ≤ z_(1),...,T_(d) ≤ z_(d)}.

In this situation there are no nuisance parameters, and it is usually the case that the group Ḡ is transitive in the sense that, for each θ ∈ Θ, Θ = {ḡ(θ): ḡ ∈ Ḡ}, so
that the maximal invariant parameter is a constant. Below we shall assume that Ḡ is transitive.

Let R^0 = ε_θ(T_(1),...,T_(d)) = {R^0_(i) = Λ(T_(i); θ): i = 1,...,d}. If Λ(·,·) is (G, Ḡ)-invariant, that is, Λ(g(t); ḡ(θ)) = Λ(t; θ) for every g ∈ G, t ∈ T, and θ ∈ Θ, then it follows from Proposition 2.2 that R^0 is an ancillary quantity. If Λ(·; θ) is strictly increasing on T, then ε_θ(·,...,·) is an injective map, so by Theorem 2.1, if θ̂ is a (G, Ḡ)-equivariant estimator of θ based on T_(1),...,T_(d), then the residual vector

  R = ε_{θ̂}(T_(1),...,T_(d)) = {R_(i) = Λ(T_(i); θ̂): i = 1,...,d}

is an ancillary statistic. Thus, for the purpose of examining the distribution of R, either analytically or through numerical methods, it would suffice to consider one particular value of θ. This will make such an examination more efficient and simplified.

We provide two concrete examples to illustrate the above results. For our first example, consider the exponential model, so that C = {Λ(t; θ) = θt: θ ∈ ℝ_+}. The model becomes invariant with respect to the groups G = {g(t) = ct: c ∈ ℝ_+} and Ḡ = {ḡ(θ) = θ/c: c ∈ ℝ_+}. Also, clearly, Ḡ is transitive. The MLE of θ based on T_(1),...,T_(d) is θ̂ = d/[Σ_{i=1}^d T_(i) + (n − d)T_(d)], which is equivariant. Consequently, the residual vector R = (θ̂T_(1),...,θ̂T_(d)) is an ancillary statistic. The second example is the Weibull model where C = {Λ(t; α, θ) = (θt)^α: (α, θ) ∈ ℝ²_+}. The groups G = {g(t) = at^b: (a, b) ∈ ℝ²_+} and Ḡ = {ḡ(α, θ) = (α/b, θ^b/a): (a, b) ∈ ℝ²_+} make the Type II censored model invariant. Again, note that Ḡ is transitive. The MLE (α̂, θ̂) of (α, θ) based on (T_(1),...,T_(d)) is the unique solution of the set of equations

  Σ_{i=1}^d (θt_(i))^α + (n − d)(θt_(d))^α − d = 0,

  d/α − (n − d)(θt_(d))^α log(θt_(d)) + Σ_{i=1}^d log(θt_(i))[1 − (θt_(i))^α] = 0.
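As a computational aside (ours; the solver, sample size, parameter values and helper names are purely illustrative), the following sketch solves this pair of equations numerically for a simulated Type II censored Weibull sample and then forms the residual vector discussed next.

    # Illustrative sketch (not from the paper); arbitrary sample size and parameters.
    import numpy as np
    from scipy.optimize import fsolve

    rng = np.random.default_rng(0)
    n, d, alpha0, theta0 = 30, 20, 1.5, 2.0
    # T has survival function exp{-(theta0*t)^alpha0}; keep the first d order statistics
    t = np.sort(rng.weibull(alpha0, size=n) / theta0)[:d]

    def score(params, t, n):
        a, th = params
        d = len(t)
        s1 = np.sum((th * t) ** a) + (n - d) * (th * t[-1]) ** a - d
        s2 = (d / a - (n - d) * (th * t[-1]) ** a * np.log(th * t[-1])
              + np.sum(np.log(th * t) * (1.0 - (th * t) ** a)))
        return [s1, s2]

    alpha_hat, theta_hat = fsolve(score, x0=[1.0, 1.0], args=(t, n))
    R = (theta_hat * t) ** alpha_hat          # residual vector; ancillary by the result below
    print(alpha_hat, theta_hat, R[:5])

Because the residual vector is ancillary, repeating this with any other (α, θ), say (1, 1), yields residuals with the same joint distribution, which is what makes a purely numerical study of R feasible.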
It is easy to see that the MLE is equivariant, so the residual vector R = {R_(i) = [θ̂T_(i)]^{α̂}: i = 1,...,d} is an ancillary statistic. We point out that in this example the exact distribution of R is not in closed form; on the other hand, the preceding ancillarity result indicates that to examine the distribution of R, for example through numerical methods, it suffices to consider one particular value of (α, θ), say, (α, θ) = (1, 1).

Finally, we remark that in this Type II censorship model, a nonparametric specification where we allow Λ(·) to vary in the class of all continuous hazard functions is not of much interest. This is so since the residual vector R = {R_(i) = Λ̂(T_(i)): i = 1,...,d}, where Λ̂(·) is the Nelson–Aalen estimator of Λ(·) based on T_(1),...,T_(d), is degenerate at {Σ_{j=1}^i 1/(n − j + 1): i = 1,...,d}!

4. Summary

Generalized residuals arising in failure time models, and in other statistical models, are useful in goodness-of-fit tests, model validation and diagnostics. Their exact and
asymptotic distributional properties, however, have not yet been studied extensively. This paper addresses the question of when residuals possess a conditional ancillarity property given some many-to-one function of the model parameters. This property can be exploited to simplify the examination of the exact distributional properties of residuals via analytic and/or Monte Carlo methods. It was found that when the statistical model possesses an invariance structure, and if the estimators of the model parameters used in obtaining the residuals are equivariant, then the conditional ancillarity property obtains. The general characterization result, which provides sufficient conditions for conditional ancillarity to hold, is applied to the accelerated failure time model, the Cox proportional hazards model, which covers as a special case the random right censorship model without covariates, and Type II censored models.
Acknowledgements

Peña acknowledges the support of the BGSU Faculty Research Committee Basic Grant. The authors also wish to sincerely thank the two referees and Dr. Michael Sørensen for their very helpful comments and criticisms which led to clarifications and improvements.

References

Abdushukurov, A., 1984. On some estimates of the distribution function under random censorship (in Russian). Conf. of Young Scientists, Math. Inst. Acad. Sci. Uzbek SSR, Tashkent, VINITI No. 8756-V.
Baltazar-Aban, I., Peña, E., 1995. Properties of hazard-based residuals and implications in model diagnostics. J. Amer. Statist. Assoc. 90, 185–197.
Buckley, J., James, I., 1979. Linear regression with censored data. Biometrika 66, 429–436.
Chen, Y., Hollander, M., Langberg, N., 1982. Small-sample results for the Kaplan–Meier estimator. J. Amer. Statist. Assoc. 77, 726–743.
Cheng, P., Lin, G., 1987. Maximum likelihood estimation of survival function under the Koziol–Green proportional hazards model. Statist. Probab. Lett. 5, 75–80.
Cook, R., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman & Hall, New York.
Cox, D., 1972. Regression models and life tables (with discussion). J. Roy. Statist. Soc. B 34, 187–220.
Cox, D., Oakes, D., 1984. Analysis of Survival Data. Chapman & Hall, London.
Cox, D., Snell, E., 1968. A general definition of residuals. J. Roy. Statist. Soc. B 30, 248–275.
Crowley, J., Hu, M., 1977. Covariance analysis of heart transplant survival data. J. Amer. Statist. Assoc. 72, 27–36.
Csörgő, S., 1988. Estimation in the proportional hazards model of random censorship. Statistics 19, 437–463.
Csörgő, S., 1989. Testing for the proportional hazards model of random censorship. Proc. 4th Prague Symp. on Asymptotic Statistics (Prague 1988), Charles University, Prague, pp. 41–53.
Ditka, G., 1995. Asymptotic normality under the Koziol–Green model. Commun. Statist. – Theory Meth. 6, 1537–1549.
Herbst, T., 1992a. Test of fit with the Koziol–Green model for random censorship. Statist. Decisions 10, 163–171.
Herbst, T., 1992b. Estimation of moments under Koziol–Green model for random censorship. Commun. Statist. – Theory Meth. 21, 613–624.
Herbst, T., 1993. On estimation of residual moments under Koziol–Green model of random censorship. Commun. Statist. – Theory Meth. 22, 2403–2419.
Hollander, M., Peña, E., 1989. Families of confidence bands for the survival function under the general random censorship model and the Koziol–Green model. Canadian J. Statist. 17, 59–74.
Horowitz, J., Neumann, G., 1992. A generalized moments specification test of the proportional hazards model. J. Amer. Statist. Assoc. 87, 234–240.
Kalbfleisch, J., Prentice, R., 1980. The Statistical Analysis of Failure-Time Data. Wiley, New York.
Kay, R., 1977. Proportional hazard regression models and the analysis of censored survival data. Appl. Statist. 26, 227–237.
Koziol, J., Green, S., 1976. A Cramér–von Mises statistic for randomly censored data. Biometrika 63, 465–474.
Lehmann, E.L., 1986. Testing Statistical Hypotheses, 2nd ed. Wiley, New York.
Peña, E., 1995. Residuals from Type II censored samples. In: Balakrishnan, N. (Ed.), Recent Advances in Life-Testing and Reliability. CRC Press, Boca Raton, FL, Ch. 28, pp. 523–544.
Stute, W., 1992. Strong consistency under the Koziol–Green model. Statist. Probab. Lett. 14, 313–320.