Journal of Statistical Planning and Inference 134 (2005) 49 – 63 www.elsevier.com/locate/jspi
Estimation from retrospectively ascertained data—on what to condition? Ori Davidov Department of Statistics, University of Haifa, Mount Carmel, Haifa 31905, Israel Received 28 January 2003; accepted 25 January 2004 Available online 23 July 2004
Abstract This paper discusses the estimation of the sojourn time distribution in a latent state using observational data. We assume that observations are available only if an event has occurred. This type of sampling is referred to as retrospective ascertainment. Our focus is the information content of various constructions of the likelihood function. © 2004 Elsevier B.V. All rights reserved. MSC: 62B99; 62F12; 62N99 Keywords: Conditioning; Information inequalities; Latency period; Retrospective ascertainment; Semi-parametric efficiency; Truncated exponential distribution
E-mail address: [email protected] (O. Davidov). 0378-3758/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.01.021

1. Introduction and notation

Consider the random variables $S_1, S_2, \ldots$, which are a realization of a stochastic process with intensity function $\lambda(x)$ in the observation interval $[0, W]$. With each time $S_i$ we associate two positive and independent random variables $U_i$ and $V_i$, with probability density functions $q_U(x)$ and $q_V(x)$, respectively. The associated distribution functions are $Q_U(x)$ and $Q_V(x)$. Further define $T_i = S_i + U_i$ if $U_i \le V_i$ and $T_i = \infty$ otherwise. More concretely, the random variable $T$ represents the time at which a “disease” develops or is diagnosed. It is assumed that the disease is caused by an event, referred to as “infection”, that occurred at time $S$. The disease has a latent sojourn of $U$ time units. The random variable $V$ represents the ensemble of competing risks. Hence if $V < U$ then the disease does not develop and
the infection time $S$ is unobservable. We further assume that $(S, T)$ can be observed only if $T \le W$. This type of sampling is known as retrospective ascertainment.

This model was originally motivated by the problem of estimating the latency period of AIDS from blood transfusion data. In this context, $S$ is the time of infection with HIV, $T$ is the time of AIDS diagnosis, and $W$ is the time from the beginning of the epidemic. Obviously $U = T - S$ is the sojourn time in the incubation or latent phase of the disease and $V$ is the time to death from other causes. Note that most often the time of infection with HIV is unknown. However, if the infection is due to a blood transfusion then its date can be accurately determined. This setup has been investigated in numerous papers by a variety of methods. For more details and related problems, see the review articles by Donnelly and Cox (2001) and Becker and Marschner (2001) and the references therein.

In this communication, we derive the conditional distributions of the latency period. We condition on either the time of infection or the time of diagnosis and show that the resulting distributions differ. We compare the conditional approaches with the full likelihood with respect to the information about the parameter indexing the distribution of $U$.

2. The conditional-likelihood functions

Let $(s, t)$ denote a realization of the infection–disease process. Its intensity function is given by $f_{ST}(s, t) = \lambda(s) q_U(t - s) \bar{Q}_V(t - s)$ for $0 \le s \le t$, where $\bar{Q}_V(x) = 1 - Q_V(x)$ is the tail probability for the competing risks. A realization of this process is observed if and only if $T \le W$. Thus, the density of the observed process, which we denote by $f^o_{ST}(s, t)$, may be written as

\[ f^o_{ST}(s, t) := f_{ST}(s, t \mid T \le W) = \frac{\lambda(s)\, q_U(t - s)\, \bar{Q}_V(t - s)}{P_W}. \tag{2.1} \]

Note that $P_W$ is simply a normalizing constant and that $f^o_{ST}(s, t)$ integrates to one. Thus,

\[ P_W = [q_U \bar{Q}_V * \Lambda](W), \]
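where $*$ indicates a convolution and $\Lambda(t) = \int_0^t \lambda(u)\,du$. In what follows we assume that $T \le W$. Although strictly speaking $f^o_{ST}$ is a conditional density (as indicated by the superscript), in the sequel we will refer to it as the full likelihood.

Consider the transformations $(S, T) \to (U, S)$ and $(S, T) \to (U, T)$. In matrix notation they are expressed as

\[ \begin{pmatrix} U \\ S \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} S \\ T \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} U \\ T \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} S \\ T \end{pmatrix}. \]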
The Jacobian of both transformations is unity in absolute value. Hence, the joint densities of the transformed variables are easy to obtain. They are given by

\[ f^o_{US}(u, s) = \frac{\lambda(s)\, q_U(u)\, \bar{Q}_V(u)}{P_W}, \qquad f^o_{UT}(u, t) = \frac{\lambda(t - u)\, q_U(u)\, \bar{Q}_V(u)}{P_W}. \]
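Next, we calculate the conditional density of $U$ given $S$ and $T$, respectively. We derive that

\[ f^o_{U|S}(u \mid s) = \frac{f^o_{US}(u, s)}{f^o_S(s)} = \frac{q_U(u)\, \bar{Q}_V(u)}{\int_0^{W-s} q_U(x)\, \bar{Q}_V(x)\,dx}, \tag{2.2} \]

\[ f^o_{U|T}(u \mid t) = \frac{f^o_{UT}(u, t)}{f^o_T(t)} = \frac{\lambda(t - u)\, q_U(u)\, \bar{Q}_V(u)}{\int_0^{t} \lambda(t - x)\, q_U(x)\, \bar{Q}_V(x)\,dx}. \tag{2.3} \]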
Based on (2.2) and (2.3) and given the data $(S_1, T_1), \ldots, (S_N, T_N)$ it is possible to construct two likelihood functions for the parameter indexing the latency distribution. We call these the conditional likelihoods. We emphasize that these conditional distributions are derived from $f^o_{ST}$ by further conditioning, either on the time of infection or on the time of diagnosis. Note that although the pairs $(S_i, T_i)$ are identically distributed, the contribution (in terms of information) of each pair to either of the conditional likelihoods varies. Furthermore, it is clear that the distributions of $U$ and $V$ are not separately identifiable from this type of data. In other words, because $U$ and $V$ are not observed separately we cannot (without further assumptions or knowledge) estimate their distributions directly. However, since $U$ is observed conditional on $U \le V$ and $T \le W$ we can always estimate the density

\[ \frac{q_U(u)\, \bar{Q}_V(u)}{\int_0^W q_U(x)\, \bar{Q}_V(x)\,dx}. \tag{2.4} \]
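The sampling scheme described so far can be sketched as a small simulation. This is only an illustration under assumptions that are not part of the paper: infection times are taken uniform on $[0, W]$ (a constant intensity), and both $U$ and $V$ are taken exponential with hypothetical rates `theta` and `mu`. The function name `simulate_ascertained` is likewise made up for this example.

```python
import random

def simulate_ascertained(n_infections, W, theta, mu, rng):
    """Return the (s, t) pairs that survive retrospective ascertainment.

    Illustrative assumptions: S is uniform on [0, W] (constant intensity),
    U ~ Exp(theta) is the latency and V ~ Exp(mu) the competing risk.
    """
    observed = []
    for _ in range(n_infections):
        s = rng.uniform(0.0, W)        # infection time S
        u = rng.expovariate(theta)     # latent sojourn U
        v = rng.expovariate(mu)        # competing risk V
        # The pair is seen only if the disease develops (U <= V)
        # and is diagnosed inside the observation window (T <= W).
        if u <= v and s + u <= W:
            observed.append((s, s + u))
    return observed

rng = random.Random(1)
pairs = simulate_ascertained(10_000, W=10.0, theta=1.0, mu=0.2, rng=rng)
latencies = [t - s for s, t in pairs]
print(len(pairs), sum(latencies) / len(latencies))
```

The observed latencies are shorter on average than $1/\theta$, which reflects both the competing risk and the truncation at $W$; this is the distortion that the density (2.4) describes.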
If an independent estimate of $\bar{Q}_V$ is available (such an estimator may be obtained using life-table methodology), then $q_U$ may be estimated; it is obtained by normalizing the ratio of the estimator of (2.4) to the estimator of $\bar{Q}_V$. Alternatively, one may assume no competing risks. The latter approach is more common in practice.

Both parametric and non-parametric methods can be devised to estimate the latency distribution. For example, conditioning on the time of transfusion, Lui et al. (1986) assumed a Weibull distribution for the latency period. Similarly conditioning, Lagakos et al. (1988) estimated the latency distribution using non-parametric methods for truncated survival data. Their work was further generalized by Finkelstein et al. (1993), who fit a proportional hazards model. Medley et al. (1987) and Brookmeyer and Gail (1988) used the full likelihood (2.1) based on the joint distribution of $(S, T)$. Other authors (Kalbfleisch and Lawless, 1989; Wang and See, 1992, among others) calculate the joint distribution of $(N, S, T)$, where $N$ is the sample size. They assume that the sample size has a Poisson distribution with rate $P_W$. It has been shown that this construction leads to the same parameter estimates and inferences as (2.1).

In the following, we compare the likelihoods based on (2.1)–(2.3). All three approaches have merit. The likelihood (2.3), conditioning on the time of diagnosis, reflects the
mechanism by which the data have accrued; the joint likelihood (2.1) is the most informative, whereas (2.2) eliminates the nuisance function $\lambda$.
3. An illustrative example

Consider the following simplified situation. Assume that there are no competing risks, i.e., $\bar{Q}_V(t) = 1$ for all $t$, and that the latency period $U$ is exponentially distributed with mean $1/\theta$. Set $\beta \ge 0$, and let the intensity function of the infection process have the form $\lambda(s) = \exp(\pm\beta s)$. Note that a “+” in the exponent of $\lambda(s)$ indicates an increasing infection rate whereas a “−” indicates a decreasing infection rate. If $\beta = 0$ the infection rate is constant in time. A short calculation shows that

\[ f^o_{U|S}(u \mid s) = \frac{\theta \exp(-\theta u)}{1 - \exp(-\theta (W - s))} \tag{3.1} \]

and, provided $\beta \ne \theta$,

\[ f^o_{U|T}(u \mid t) = \frac{(\theta \pm \beta) \exp(-(\theta \pm \beta) u)}{1 - \exp(-(\theta \pm \beta) t)}. \tag{3.2} \]
Note that both (3.1) and (3.2) are the density functions of a truncated exponential random variable. Conditioning on $S = s$, the time of infection, as in (3.1), and looking forward in time amounts to observing an exponential random variable with hazard $\theta$ truncated at $W$. Conditioning on $T = t$, the time of diagnosis, as in (3.2), and looking back in time (now only $S$, which occurred prior to $T$, is random) amounts to observing an exponential random variable with hazard $\theta \pm \beta$ truncated at 0. Note that if the infection rate is decreasing then $\theta - \beta$ may be zero or negative. Hence, (3.2) belongs to the generalized truncated exponential family described by Hannon and Dahiya (1999). The case $\theta = \beta$ is special as $f^o_{U|T}(u \mid t) = 1/t$ for $0 \le u \le t$. This distribution provides no direct information about $\theta$. Note that the conditional distribution (3.2) involves $\beta$, and is thus confounded with the intensity of the infection process $\lambda(t)$, whereas the distribution (3.1) is not. Hence, the conditioning in (3.1) eliminates the infection process, which may be regarded as a nuisance parameter, from the likelihood. This is generally true as (2.3) involves the incidence of the infection process, whereas (2.2) does not.

We use $I_{U|s}(\theta)$ and $I_{U|t}(\theta)$ to denote the Fisher information about $\theta$ in (3.1) and (3.2), respectively. Their expected values are $I_{U|S}(\theta)$ and $I_{U|T}(\theta)$. Assume that the infection process is homogeneous in time, i.e., $\beta = 0$. Consequently both $U \mid S = s$ and $U \mid T = t$ are exponential random variables with hazard rate $\theta$, truncated at $W - s$ and $t$, respectively. Recall that the information in a single observation from an exponential distribution truncated at $x$ is

\[ \frac{1}{\theta^2} - \frac{x^2 \exp(-\theta x)}{(1 - \exp(-\theta x))^2}, \tag{3.3} \]
where x = W − s for (3.1) and x = t for (3.2). Clearly, the information in the observation (S, T ) is random. Note that T = W − S if and only if (S, T ) is symmetric about W/2.
This event occurs with probability zero. However, it is straightforward to establish that $f^o_T(x) = f^o_{W-S}(x)$ for all values of $x$ and all $W$. Hence, the expected information satisfies

\[ I_{U|S}(\theta) = E[I_{U|s}(\theta)] = E[I_{U|t}(\theta)] = I_{U|T}(\theta), \]

where the expectations are taken with respect to $S$ and $T$, respectively. Consequently the likelihoods (3.1) and (3.2) are equally informative about $\theta$ when the rate of the infection process is constant. The information in the conditional likelihoods can be calculated explicitly; it is given by

\[ \frac{1}{\theta^2} \left\{ 1 - \frac{\exp(\theta W)\,[J(1, \theta W) - J(2, \theta W)]}{(\theta W - 1) \exp(\theta W) + 1} \right\}, \]

where

\[ J(x, y) = \int_0^y \frac{s^2 \exp(-x s)}{(1 - \exp(-s))^2}\,ds. \]
With some algebra we can show that $J(1, \theta W) - J(2, \theta W)$ may be written as

\[ 2\zeta(3) - 2\theta W\, \mathrm{Li}(2, \exp(-\theta W)) - 2\, \mathrm{Li}(3, \exp(-\theta W)) + (\theta W)^2 \log(1 - \exp(-\theta W)), \]

where $\mathrm{Li}$ is the polylog function defined by $\mathrm{Li}(a, b) = \sum_{n=1}^{\infty} b^n / n^a$, and $\zeta(3) \approx 1.202$ is the value of Riemann's zeta function at 3. It is interesting to compare the conditional likelihoods (3.1) and (3.2) (calculated at $\beta = 0$) with the full likelihood

\[ f^o_{ST}(s, t) = \frac{\theta^2 \exp(-\theta (t - s))}{\theta W + \exp(-\theta W) - 1} \tag{3.4} \]

based on the joint distribution of $(S, T)$. Note that this is the distribution of a truncated (at $W$), randomly shifted (by $s$), exponential random variable. Here the information is given by

\[ I_{ST}(\theta) = \frac{2}{\theta^2} + \frac{W^2\,[(\theta W + 1) \exp(-\theta W) - 1]}{(\theta W + \exp(-\theta W) - 1)^2}. \]
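The two information numbers for a constant infection rate can be evaluated numerically. The sketch below is not the author's code: it averages (3.3) over the truncation point $x = W - s$ with a plain midpoint rule (using that, for a constant rate and an exponential latency, the density of $x$ is proportional to $1 - \exp(-\theta x)$) and uses the closed form for $I_{ST}$ given above. The function names and the grid size `n` are choices made for this illustration.

```python
import math

def truncated_exp_info(theta, x):
    """Fisher information (3.3) for an Exp(theta) variable truncated at x."""
    a = -math.expm1(-theta * x)              # 1 - exp(-theta * x)
    return 1.0 / theta**2 - x**2 * math.exp(-theta * x) / a**2

def conditional_info(theta, W, n=20000):
    """I_{U|S} for a constant infection rate: midpoint-rule average of (3.3)
    over x = W - s, whose density is proportional to 1 - exp(-theta * x)."""
    h = W / n
    num = den = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        w = -math.expm1(-theta * x)
        num += truncated_exp_info(theta, x) * w
        den += w
    return num / den

def joint_info(theta, W):
    """I_ST from the closed form that follows (3.4)."""
    a = theta * W
    d = a + math.expm1(-a)                   # theta*W + exp(-theta*W) - 1
    return 2.0 / theta**2 + W**2 * ((a + 1.0) * math.exp(-a) - 1.0) / d**2

for W in (0.01, 1.0, 10.0, 50.0):
    print(W, conditional_info(1.0, W) / joint_info(1.0, W))
```

With $\theta = 1$ the printed ratio should stay below one, lie near 3/4 for very short observation periods, and approach one as $W$ grows.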
In Fig. 1 we compare the information in the conditional and full likelihoods. Fig. 1a describes the effect of the observation period when the mean sojourn time ($1/\theta$) is 1. Fig. 1b describes the effect of the mean sojourn time when the observation period ($W$) is 10. Fig. 1a indicates that both $I_{U|S}$ (or $I_{U|T}$) and $I_{ST}$ approach unity, or more generally $1/\theta^2$, when the observation period becomes large. The asymptotic relative efficiency (ARE), comparing the conditional to the joint likelihood and given by $I_{U|S}/I_{ST}$, is smaller than unity for all values of $W$. In fact, we can show that for all $\theta$,

\[ \lim_{W \to 0} \frac{I_{U|S}}{I_{ST}} = \frac{3}{4}, \]
Fig. 1. (a) Information and relative efficiency for varying observation times (horizontal axis: observation time; vertical axis: information; curves: Conditional, Joint, ARE). (b) Information and relative efficiency for varying mean sojourn times (horizontal axis: hazard rate; vertical axis: information; curves: Conditional, Joint, ARE).
which shows that the conditional approach is less efficient for small observation periods. Moreover, although both information numbers approach $1/\theta^2$, we can show that

\[ \lim_{W \to \infty} \frac{I_{U|S}}{I_{ST}} = \lim_{W \to \infty} \left[ 1 - \frac{2(\zeta(3) - 1)}{\theta W} + O\!\left(\frac{1}{W^2}\right) \right] = 1; \]

hence for long observation periods the conditional approach is less efficient in second order. Fig. 1b shows that the information increases with the mean sojourn time, approaching $W^2/24$ and $W^2/18$, respectively. Moreover, similar limiting behavior is obtained with $1/\theta$ interchanged with $W$.

The situation when the infection rate is not constant is more complicated. In the following, we present some numerical calculations of the asymptotic relative efficiency comparing the conditional likelihoods and the full likelihood. We let the observation period be $W = 10$ and consider the values $\beta = 1/8, 1/4, 1/2, 1$ for the infection rate parameter and $1/\theta = 1, 2, 5, 10, 15$ for the mean sojourn time. Calculations for an increasing infection rate are displayed in Table 1 and for a decreasing rate in Table 2. The results indicate that using the joint likelihood (3.4) is more efficient than using either of the conditional likelihoods (3.1) or (3.2). Table 1 reveals that for an increasing infection process conditioning on the time of infection is the least informative. The ratio $I_{U|S}/I_{ST}$ varies between 50% and 90% and is decreasing as a function of the mean sojourn time and the infection rate. The ratio $I_{U|T}/I_{ST}$ behaves similarly as a function of $1/\theta$ and $\beta$ but is much higher, between 80% and 100%. Table 2 shows that if the infection rate is decreasing then $I_{U|S}$ is generally higher than $I_{U|T}$, i.e., conditioning on the time of infection is more informative than conditioning on the time of diagnosis. In fact it may be considerably higher, especially for large values of $\beta$ and long mean sojourn times. In these situations the
Table 1
The asymptotic relative efficiency comparing the full and conditional likelihoods for an infection process with increasing rate

                β = 1/8                                      β = 1/4
Mean sojourn    IU|S/IST     IU|T/IST     IU|S/IU|T          IU|S/IST     IU|T/IST     IU|S/IU|T
 1              .933         .978         .954               .906         .990         .915
 2              .879         .960         .916               .840         .979         .858
 5              .787         .917         .858               .736         .964         .763
10              .741         .892         .831               .687         .949         .724
15              .724         .883         .819               .668         .944         .708

                β = 1/2                                      β = 1
Mean sojourn    IU|S/IST     IU|T/IST     IU|S/IU|T          IU|S/IST     IU|T/IST     IU|S/IU|T
 1              .849         .998         .851               .768         1.00         .768
 2              .766         .997         .768               .678         1.00         .678
 5              .659         .994         .663               .588         1.00         .588
10              .606         .995         .609               .548         1.00         .548
15              .590         .992         .595               .533         1.00         .533
Table 2
The asymptotic relative efficiency comparing the full and conditional likelihoods for an infection process with decreasing rate

                β = 1/8                                      β = 1/4
Mean sojourn    IU|S/IST     IU|T/IST     IU|S/IU|T          IU|S/IST     IU|T/IST     IU|S/IU|T
 1              .976         .925         1.06               .988         .880         1.12
 2              .947         .851         1.11               .968         .760         1.27
 5              .886         .728         1.22               .922         .599         1.54
10              .851         .672         1.26               .893         .538         1.66
15              .836         .654         1.28               .884         .519         1.70

                β = 1/2                                      β = 1
Mean sojourn    IU|S/IST     IU|T/IST     IU|S/IU|T          IU|S/IST     IU|T/IST     IU|S/IU|T
 1              .997         .766         1.30               —            —            —
 2              —            —            —                  1.00         .238         4.20
 5              .964         .363         2.66               .990         .132         7.47
10              .946         .314         3.01               .984         .114         8.66
15              .940         .300         3.13               .981         .109         8.99
AREs $I_{U|S}/I_{ST}$ are close to unity. The missing entries indicate that $U \mid T$ follows a uniform distribution and therefore $I_{U|T} = 0$. The absolute as well as relative efficiency are functions of $\beta$. Large values of $\beta$ imply that $S$ is close to $W$, $T - S$ is small and hence little information is available about $\theta$. On
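the other hand, large negative values of $\beta$ imply that $S$ is close to 0, $T - S$ is as large as possible and therefore more information on $\theta$ is available. The relative efficiencies are also functions of $\beta$; for example, we can show that for fixed $W$,

\[ \lim_{\beta \to -\infty} \frac{I_{U|S}(\theta)}{I_{ST}(\theta)} = \lim_{\beta \to \infty} \frac{I_{U|T}(\theta)}{I_{ST}(\theta)} = 1 \]

and

\[ \lim_{\beta \to \infty} \frac{I_{U|S}(\theta)}{I_{ST}(\theta)} = \lim_{\beta \to -\infty} \frac{I_{U|T}(\theta)}{I_{ST}(\theta)} = 0. \]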
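The relative efficiencies for a non-constant infection rate can be approximated along the following lines. This is a sketch under the assumptions of this section (exponential latency, no competing risks, $\lambda(s) = \exp(\beta s)$ with a signed $\beta$), not the author's program. It uses the facts that $I_{U|S}$ and $I_{U|T}$ are averages of (3.3), with hazards $\theta$ and $\theta + \beta$, over the observed distributions of $W - S$ and $T$, and that $I_{ST}(\theta) = 1/\theta^2 + (d^2/d\theta^2) \log P_W$ when $\lambda$ is known. The names `trunc_info` and `informations` are made up here, and the integrals are plain midpoint sums.

```python
import math

def trunc_info(eta, x):
    """Information (3.3) with hazard eta and truncation point x;
    zero in the uniform case eta == 0."""
    if eta == 0.0:
        return 0.0
    a = -math.expm1(-eta * x)
    return 1.0 / eta**2 - x**2 * math.exp(-eta * x) / a**2

def informations(theta, beta, W, n=20000):
    """Return (I_{U|S}, I_{U|T}, I_ST) for lambda(s) = exp(beta * s).
    A negative beta corresponds to a decreasing infection rate."""
    h = W / n
    eta = theta + beta
    p = dp = d2p = us = ut = pt = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        x = W - s
        lam = math.exp(beta * s)
        e = math.exp(-theta * x)
        f_s = lam * -math.expm1(-theta * x)   # unnormalised density of S
        p += f_s
        dp += lam * x * e                     # theta-derivative of the P_W integrand
        d2p -= lam * x * x * e                # second theta-derivative
        us += trunc_info(theta, x) * f_s
        t = s                                 # reuse the grid point for T
        if eta == 0.0:
            f_t = theta * math.exp(-theta * t) * t
        else:
            f_t = theta * math.exp(-theta * t) * math.expm1(eta * t) / eta
        pt += f_t                             # unnormalised density of T
        ut += trunc_info(eta, t) * f_t
    i_st = 1.0 / theta**2 + (d2p * p - dp * dp) / (p * p)
    return us / p, ut / pt, i_st

i_us, i_ut, i_st = informations(theta=1.0, beta=1.0, W=10.0)
print(i_us / i_st, i_ut / i_st)
```

For $W = 10$, $\theta = 1$ and an increasing rate with $\beta = 1$, the two printed ratios can be compared with the corresponding row of Table 1; other rows are obtained by changing `theta` and `beta`.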
4. Main results

The results above hold more generally and do not depend on the exponentiality of $U$. To show this, first note that if the infection process is constant in time, with rate $\lambda$, then the joint density (2.1) reduces to

\[ f^o_{ST}(s, t) = \frac{q_U(t - s)}{R_W}, \tag{4.1} \]

where $R_W = P_W/\lambda = \int_0^W Q_U(s)\,ds$. From (2.2) and (2.3) it follows that $U \mid S$ and $U \mid T$ are distributed like $U$, truncated at $W - S$ and $T$, respectively. Hence, comparing the respective likelihoods amounts to comparing the truncation times. We observe that

\[ P^o[T \le W - S] = \iint_{s + t \le W} f^o_{ST}(s, t)\,dt\,ds = \frac{1}{R_W} \int_0^{W/2} \int_s^{W-s} q_U(t - s)\,dt\,ds = \frac{1}{R_W} \int_0^{W/2} Q_U(W - 2s)\,ds = \frac{1}{2 R_W} \int_0^W Q_U(s)\,ds = \frac{1}{2}. \]
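The identity $P^o[T \le W - S] = 1/2$ does not rely on the form of $q_U$, which a quick Monte Carlo check can illustrate. The sketch below assumes a constant infection rate, no competing risks and, purely as an example, a Weibull latency with scale 3 and shape 1.5; neither the distribution nor the function name comes from the paper.

```python
import random

def fraction_t_below_w_minus_s(n, W, draw_latency, rng):
    """Among ascertained pairs (T <= W), the fraction with T <= W - S."""
    kept = below = 0
    for _ in range(n):
        s = rng.uniform(0.0, W)          # constant infection rate
        t = s + draw_latency(rng)        # diagnosis time
        if t <= W:
            kept += 1
            if t <= W - s:
                below += 1
    return below / kept

rng = random.Random(2024)
frac = fraction_t_below_w_minus_s(
    200_000, 10.0, lambda r: r.weibullvariate(3.0, 1.5), rng)
print(round(frac, 3))
```

Up to simulation noise, the fraction should be close to one half.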
Thus, the information in (2.2) is larger/smaller than the information in (2.3) on average half of the time. In the following, we assume that the distribution of $U$ is indexed by a vector-valued parameter $\theta$. We investigate the relationship among the likelihoods (2.1)–(2.3). For completeness, we also investigate the likelihood $f^o_{U|R}(u \mid r)$, where $R = \max(T, W - S)$. We begin with the following intuitive lemma, which generalizes the proof of Kass and Vos (1997) given for exponential families.

Lemma 1. Let $X$ be a random variable whose distribution is indexed by a parameter $\theta$, and let $I_X(\theta)$ denote the information about $\theta$. Define $Y = g(X)$; then $I_X(\theta) \ge I_Y(\theta)$, with equality if and only if $Y$ is sufficient for $\theta$.

Proof. Let $l_X(\theta) = (d/d\theta) \log f_X(X; \theta)$ and $l_Y(\theta) = (d/d\theta) \log f_Y(Y; \theta)$, where $f_X(X; \theta)$ and $f_Y(Y; \theta)$ are the density functions of $X$ and $Y$, respectively. The information about $\theta$ in $X$ is

\[ I_X(\theta) = \mathrm{Var}[l_X(\theta)] = E[\mathrm{Var}[l_X(\theta) \mid Y]] + \mathrm{Var}[E[l_X(\theta) \mid Y]]. \tag{4.2} \]

Note that the first term of (4.2) is non-negative, i.e.,

\[ E[\mathrm{Var}[l_X(\theta) \mid Y]] \ge 0. \tag{4.3} \]
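From the second term of (4.2) we deduce that

\[ \begin{aligned} E[l_X(\theta) \mid Y \in C]\, P(Y \in C) &= \int_{Y^{-1}(C)} l_X(\theta) f_X(x; \theta)\,dx = \frac{\partial}{\partial\theta} \int_{Y^{-1}(C)} f_X(x; \theta)\,dx \\ &= \frac{\partial}{\partial\theta} \int_C f_Y(y; \theta)\,dy = \int_C \frac{\partial}{\partial\theta} f_Y(y; \theta)\,dy \\ &= \int_C l_Y(\theta) f_Y(y; \theta)\,dy = E[l_Y(\theta) \mid Y \in C]\, P(Y \in C), \end{aligned} \]

where $Y^{-1}(C)$ denotes the $X$ values for which $g(X) \in C$. Using the equality $E[l_X(\theta) \mid Y \in C] = E[l_Y(\theta) \mid Y \in C]$ and the properties of conditional expectations it immediately follows that

\[ l_Y(\theta) = E[l_X(\theta) \mid Y]. \]

The information about $\theta$ in $Y$ is

\[ I_Y(\theta) = \mathrm{Var}[l_Y(\theta)] = \mathrm{Var}[E[l_X(\theta) \mid Y]]. \tag{4.4} \]

Combining Eqs. (4.2)–(4.4) we conclude that $I_X(\theta) \ge I_Y(\theta)$. Equality is attained if and only if $\mathrm{Var}[l_X(\theta) \mid Y] = 0$. But

\[ \mathrm{Var}[l_X(\theta) \mid Y] = E[l_X(\theta)^2 \mid Y] - E[l_X(\theta) \mid Y]^2 \]

equals zero if and only if $l_X(\theta)$ is constant given $Y$. However, $l_X(\theta) = l_Y(\theta)$ if and only if $Y$ is sufficient for $\theta$. In this case $I_X(\theta) = I_Y(\theta)$.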
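A small worked case may help to fix ideas about Lemma 1. The example below is an assumption made for illustration and does not appear in the paper: $X$ is exponential with rate $\theta$ and $Y = 1\{X > c\}$ records only whether $X$ exceeds a cutoff $c$. Then $I_X(\theta) = 1/\theta^2$, while $Y$ is Bernoulli with $p = \exp(-\theta c)$ and has information $(dp/d\theta)^2 / (p(1 - p))$.

```python
import math

def info_exponential(theta):
    """Fisher information about theta in X ~ Exp(theta)."""
    return 1.0 / theta**2

def info_indicator(theta, c):
    """Fisher information in Y = 1{X > c}, a Bernoulli(p) variable with
    p = exp(-theta * c): I_Y = (dp/dtheta)^2 / (p * (1 - p))."""
    p = math.exp(-theta * c)
    dp = -c * p
    return dp**2 / (p * (1.0 - p))

for c in (0.5, 1.0, 2.0, 5.0):
    print(c, info_indicator(1.0, c), info_exponential(1.0))
```

Since the indicator is not sufficient for $\theta$, its information is strictly smaller than that of $X$ for every cutoff, in line with the lemma.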
Note that $X$ and $\theta$ in the proof above may be vector valued. We now apply the preceding result and compare the various information quantities; for notational convenience we suppress their dependence on $\lambda$.

Theorem 2. Assume that $\lambda(s) = \lambda$; then for all $\theta$ and $W$,

A: $I_{U|T}(\theta) = I_{U|S}(\theta)$,
B: $I_{U|S}(\theta) < I_{ST}(\theta)$.

Furthermore, for $R = \max(T, W - S)$, we have

C: $I_{U|S}(\theta) = I_{U|R}(\theta)$.
Proof. (A) From (2.2) and (2.3) it follows that

\[ f^o_{U|S}(u \mid s) = \frac{q_U(u)}{Q_U(W - s)}, \qquad f^o_{U|T}(u \mid t) = \frac{q_U(u)}{Q_U(t)}. \]

Thus, $U \mid S$ and $U \mid T$ are distributed like $U$, truncated at $W - S$ and $T$, respectively. The conditional information is denoted $I_{U|x}(\theta)$ for $x = T$ or $W - S$. Note that

\[ f^o_S(s) = \int_s^W f^o_{ST}(s, t)\,dt = \int_s^W \frac{q_U(t - s)}{R_W}\,dt = \frac{Q_U(W - s)}{R_W}, \]

\[ f^o_T(t) = \int_0^t f^o_{ST}(s, t)\,ds = \int_0^t \frac{q_U(t - s)}{R_W}\,ds = \frac{Q_U(t)}{R_W}. \]

Hence $f^o_T(x) = f^o_{W-S}(x)$ for all $x \le W$ and it follows that

\[ I_{U|T}(\theta) = \int_0^W I_{U|t}(\theta) f^o_T(t)\,dt = \int_0^W I_{U|W-s}(\theta) f^o_S(s)\,ds = I_{U|S}(\theta). \]

(B) The transformation $(S, T) \to (U, S)$ is linear and invertible; therefore $(U, S)$ is sufficient for $\theta$. Applying Lemma 1 we establish that $I_{ST}(\theta) = I_{US}(\theta)$. Moreover, since $f^o_{US}(u, s) = f^o_{U|S}(u \mid s) f^o_S(s)$ we have $I_{US}(\theta) = I_{U|S}(\theta) + I_S(\theta)$. The positivity of the information immediately implies that $I_{ST}(\theta) > I_{U|S}(\theta)$. Similar arguments establish the relation $I_{ST}(\theta) > I_{U|T}(\theta)$.

(C) We use the representation

\[ \begin{pmatrix} U \\ R \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} S \\ T \end{pmatrix} I_{\{S + T > W\}} + \left[ \begin{pmatrix} -1 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} S \\ T \end{pmatrix} + \begin{pmatrix} 0 \\ W \end{pmatrix} \right] I_{\{S + T \le W\}}. \]

Clearly, $(S, T) \to (U, R)$ is not invertible; therefore $(U, R)$ is not sufficient for $\theta$ and by Lemma 1 $I_{ST}(\theta) > I_{UR}(\theta) > I_{U|R}(\theta)$. Furthermore, we can show that for constant rate models

\[ f_{U|R}(u \mid r) = \frac{q_U(u)}{Q_U(2r - W)} \quad\text{and}\quad f_R(r) = \frac{2}{R_W} Q_U(2r - W), \quad\text{where } W/2 \le r \le W. \]

Algebraically evaluating the information we see that $I_{U|R}(\theta) = I_{U|S}(\theta)$; the calculations are omitted for brevity.

We note that (B) may be proved directly by observing that $I_{ST}(\theta)$ may be written as

\[ E_{ST}\left[ \left( \frac{d}{d\theta} \log q_U(t - s) \right)^2 \right] - \frac{R_W'^2}{R_W^2}, \]
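where $R_W' = (d/d\theta) R_W$. Similarly $I_{U|S}(\theta)$ may be written as

\[ E_{ST}\left[ \left( \frac{d}{d\theta} \log q_U(t - s) \right)^2 \right] - \frac{1}{R_W} \int_0^W \frac{Q_U'(W - s)^2}{Q_U(W - s)^2}\, Q_U(W - s)\,ds, \]

where $Q_U'(W - s) = (d/d\theta) Q_U(W - s)$. Therefore $I_{ST} - I_{U|S} > 0$ if and only if

\[ \int_0^W Q_U(s)\,ds \int_0^W \frac{Q_U'(s)^2}{Q_U(s)}\,ds > \left( \int_0^W Q_U'(s)\,ds \right)^2, \]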
which is nothing but the Cauchy–Schwarz inequality with $f = Q_U'(s)/\sqrt{Q_U(s)}$ and $g = \sqrt{Q_U(s)}$.

We conclude that when the infection rate is constant as a function of time, conditioning on the time of infection is equivalent to conditioning on the time of diagnosis. The reason is that the lengths of the time intervals $[S, W]$ and $[0, T]$ have the same marginal distribution. (C) above is rather surprising because it shows that conditioning on $R$, the maximum of $T$ and $W - S$, which is equivalent to conditioning on the longer of the two intervals, provides no systematic gain in terms of information. Finally, the conditional likelihoods are less efficient relative to the complete likelihood. The loss of efficiency is due to the fact that the location of $[S, T]$ within $[0, W]$ contains information on $\theta$. The next theorem explores the relationship when the infection process is non-constant.

Theorem 3. Let $\lambda(s)$ be arbitrary. Then for all $\theta$ and $W$, $I_{U|T}(\theta) \ge I_{U|S}(\theta)$ if and only if

\[ \int_0^W \lambda(W - s)\, \frac{\left( \int_0^s q_U'(u)\,du \right)^2}{\int_0^s q_U(u)\,du}\,ds \;\ge\; \int_0^W \frac{\left( \int_0^s \lambda(s - u)\, q_U'(u)\,du \right)^2}{\int_0^s \lambda(s - u)\, q_U(u)\,du}\,ds, \tag{4.5} \]

where $q_U'(u) = (d/d\theta) q_U(u)$.
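Proof. Lemma 1 implies that $I_{UT}(\theta) = I_{US}(\theta)$, and both equal $I_{ST}(\theta)$. It is easy to see that

\[ I_{US}(\theta) = I_{U|S}(\theta) + I_S(\theta), \qquad I_{UT}(\theta) = I_{U|T}(\theta) + I_T(\theta). \]

Hence, $I_{U|T}(\theta) \ge I_{U|S}(\theta)$ if and only if $I_T(\theta) \le I_S(\theta)$. Now,

\[ I_S = E\left[ -\frac{d^2}{d\theta^2} \log f^o_S(s) \right] \]

may be simplified to

\[ \frac{1}{P_W} \int_0^W \lambda(s)\, \frac{Q_U'(W - s)^2}{Q_U(W - s)}\,ds - \frac{P_W'^2}{P_W^2}, \]

where $P_W' = (d/d\theta) P_W$.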
Also, $I_T$ equals

\[ \frac{1}{P_W} \int_0^W \frac{\left( \int_0^t \lambda(s)\, q_U'(t - s)\,ds \right)^2}{\int_0^t \lambda(s)\, q_U(t - s)\,ds}\,dt - \frac{P_W'^2}{P_W^2}. \]

Comparing $I_S$ and $I_T$, the inequality (4.5) is now easily established.
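Both sides of (4.5) can be approximated by nested numerical integration. The sketch below is an illustration under the assumptions of Section 3 (exponential latency, $\lambda(s) = \exp(\beta s)$ with a signed $\beta$), with (4.5) read as the comparison of $P_W I_S$ (left) with $P_W I_T$ (right). The helper names `integrate` and `sides_of_4_5` and the grid size are choices made here.

```python
import math

def integrate(f, a, b, n):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def sides_of_4_5(theta, beta, W, n=200):
    """Return (LHS, RHS) of (4.5) for q_U(u) = theta * exp(-theta * u)
    and lambda(s) = exp(beta * s)."""
    q = lambda u: theta * math.exp(-theta * u)
    dq = lambda u: (1.0 - theta * u) * math.exp(-theta * u)  # d q_U / d theta
    lam = lambda s: math.exp(beta * s)

    def lhs_integrand(s):
        num = integrate(dq, 0.0, s, n)
        den = integrate(q, 0.0, s, n)
        return lam(W - s) * num**2 / den

    def rhs_integrand(s):
        num = integrate(lambda u: lam(s - u) * dq(u), 0.0, s, n)
        den = integrate(lambda u: lam(s - u) * q(u), 0.0, s, n)
        return num**2 / den

    return (integrate(lhs_integrand, 0.0, W, n),
            integrate(rhs_integrand, 0.0, W, n))

for beta in (0.0, 0.5, -0.5):
    print(beta, sides_of_4_5(1.0, beta, 10.0))
```

For $\beta = 0$ the two sides coincide; for the increasing rate $\beta = 0.5$ the left side is the larger one, and for the decreasing rate $\beta = -0.5$ the order is reversed, which matches the pattern seen in Tables 1 and 2.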
The condition (4.5) provides a way of determining on which event, infection or diagnosis, it is preferable to condition. Note that when $\lambda(s) = \lambda$ is constant, the RHS and LHS of (4.5) are equal and the conclusion of Theorem 2 holds. Note that the LHS of (4.5) may be viewed as a mean of a ratio, whereas the RHS is a ratio of means. In general, closed-form expressions for either side of (4.5) are not available even in the simple exponential case discussed in Section 3. Our numerical experiments indicate that $I_{U|T}(\theta)$ is generally, but not always, larger than $I_{U|S}(\theta)$ when $\lambda(s)$ is increasing, and the reverse when it is decreasing. This phenomenon is explained as follows. Let $f_S(s)$ denote the distribution of infection times in $[0, W]$. Recall that we observe $(S, T)$ only if $T \le W$. Therefore the distribution of the observed infection times, denoted by $f^o_S(s)$, is

\[ f^o_S(s) = \int_s^W f^o_{ST}(s, t)\,dt. \]

It is easy to show graphically that $f_S(s)$ is different from $f^o_S(s)$, especially near $W$. In fact this is the reason that $f^o_S(s)$ contains information about $\theta$. In other words, the information about $\theta$ increases with the “distance” between $f_S(s)$ and $f^o_S(s)$. An increasing infection rate implies that most infections occur near $W$. Therefore the distance between $f_S(s)$ and $f^o_S(s)$ increases with the infection rate because all the “action” is near $W$. Hence, the observed infection times are particularly informative when the infection rate is high. Recall that $I_{U|S}(\theta) + I_S(\theta) = I_{U|T}(\theta) + I_T(\theta)$. Since $I_S(\theta)$ must be large, $I_{U|S}(\theta)$ must be small and therefore $I_{U|T}(\theta) \ge I_{U|S}(\theta)$. We conclude by noting that the inequalities $\max(I_{U|T}(\theta), I_{U|S}(\theta)) \le I_{U|R}(\theta) \le I_{ST}(\theta)$ established in Theorem 2 hold in this situation as well.

4.1. Unknown infection rate

The preceding results apply when the infection rate process is either completely known or unknown up to a finite number of parameters. More generally, we would like to estimate the vector $\theta$ without any assumptions about $\lambda$. Suppose that $\lambda \in \mathcal{L}$, where $\mathcal{L}$ is an arbitrary (and infinite-dimensional) set of functions defined on $[0, W]$. Clearly, $\theta$ cannot be estimated as efficiently when $\lambda$ is unknown. A lower bound for the information about $\theta$ is found. Our development follows Van der Vaart (1998).

Theorem 4. The efficient score function for estimating $\theta$ is $(d/d\theta) \log f_{U|S}(\cdot; \theta, \lambda)$.
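Proof. Let $\{f_{ST}(\cdot; \theta, \lambda) : \theta \in \mathbb{R}^m, \lambda \in \mathcal{L}\}$ denote the model space for the observations $(S, T)$. The sufficiency principle implies that $I_{ST}(\theta, \lambda) = I_{US}(\theta, \lambda)$ for every fixed $\lambda \in \mathcal{L}$. Thus the models $f_{ST}(\cdot; \theta, \lambda)$ and $f_{US}(\cdot; \theta, \lambda)$ are equivalent in terms of information about $\theta$. Consider the collection of models $f_{US}(\cdot; \theta + \varepsilon h, \lambda + \varepsilon\gamma)$, $\varepsilon \in \mathbb{R}$, derived by a perturbation of $f_{US}(\cdot; \theta, \lambda)$. Then

\[ \frac{d}{d\varepsilon} \log f_{US}(\cdot; \theta + \varepsilon h, \lambda + \varepsilon\gamma) \Big|_{\varepsilon = 0} = h^t l_{US}(\theta, \lambda) + \psi_S(\cdot; \theta, \lambda), \]

where $l_{US}(\theta, \lambda) = (d/d\theta) \log f_{US}(\cdot; \theta, \lambda)$ is the score function for $\theta$ assuming that $\lambda$ is fixed. Using this notation it is clear that $l_{US}(\theta, \lambda) = l_{U|S}(\theta, \lambda) + l_S(\theta, \lambda)$. The function $\psi := \psi_S(\cdot; \theta, \lambda)$ is a score function for $\lambda$ assuming that $\theta$ is fixed. It satisfies the relationship

\[ \int \psi_S(s; \theta, \lambda) f_S(s; \theta, \lambda)\,ds = 0. \tag{4.6} \]

The set $\mathcal{T}_{\theta,\lambda}$ of functions satisfying (4.6) is known as the tangent space for $\lambda$. Let $\Pi_{\theta,\lambda}$ denote the orthogonal projection onto $\mathcal{T}_{\theta,\lambda}$. It is well known that the efficient score is the ortho-complement of the projection of $l_{US}(\theta, \lambda)$ onto $\mathcal{T}_{\theta,\lambda}$. Thus,

\[ \begin{aligned} l_{US}(\theta, \lambda) - \Pi_{\theta,\lambda} l_{US}(\theta, \lambda) &= l_{US}(\theta, \lambda) - \Pi_{\theta,\lambda} [l_{U|S}(\theta, \lambda) + l_S(\theta, \lambda)] \\ &= l_{US}(\theta, \lambda) - [\Pi_{\theta,\lambda} l_{U|S}(\theta, \lambda) + \Pi_{\theta,\lambda} l_S(\theta, \lambda)] \\ &= l_{US}(\theta, \lambda) - [0 + l_S(\theta, \lambda)] = l_{U|S}(\theta, \lambda), \end{aligned} \]

but $l_{U|S}(\theta, \lambda) = (d/d\theta) \log f_{U|S}(\cdot; \theta, \lambda)$, completing the proof.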
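To see what estimation from the conditional distribution of $U \mid S$ looks like in practice, the following sketch fits $\theta$ from simulated data without ever using the infection intensity. It is a toy example under assumptions not taken from the paper: exponential latency, no competing risks, candidate infection times drawn with density proportional to $\exp(\beta s)$, and a simple bisection on the score of the likelihood built from (3.1). All function names are hypothetical.

```python
import math
import random

def sample_ascertained(n, W, theta, beta, rng):
    """Draw n candidate infections with density proportional to exp(beta*s)
    on [0, W] (beta != 0), add Exp(theta) latencies, keep pairs with T <= W."""
    pairs = []
    for _ in range(n):
        v = rng.random()
        s = math.log1p(v * math.expm1(beta * W)) / beta   # inverse-CDF draw
        u = rng.expovariate(theta)
        if s < W and s + u <= W:
            pairs.append((s, s + u))
    return pairs

def conditional_score(theta, pairs, W):
    """Derivative in theta of the log-likelihood built from (3.1)."""
    total = 0.0
    for s, t in pairs:
        x = W - s
        # expm1(-theta*x) = -(1 - exp(-theta*x)), hence the plus sign.
        total += (1.0 / theta - (t - s)
                  + x * math.exp(-theta * x) / math.expm1(-theta * x))
    return total

def conditional_mle(pairs, W, lo=1e-6, hi=50.0, iters=60):
    """Bisection on the score, which is decreasing in theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if conditional_score(mid, pairs, W) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(7)
pairs = sample_ascertained(20_000, W=10.0, theta=1.0, beta=0.5, rng=rng)
theta_hat = conditional_mle(pairs, 10.0)
print(len(pairs), round(theta_hat, 3))
```

The estimator only needs the pairs $(S_i, T_i)$ and $W$; the value of `beta` is used to generate the data but never enters the fit.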
The theorem states that the score function based on the conditional distribution of $U \mid S$ is the semi-parametric score for estimating $\theta$. Hence, in a semi-parametric setting the distribution of the infection times $f_S$ provides no information about $\theta$.

5. Summary

This paper discusses the estimation of the sojourn time distribution from retrospectively ascertained data. Although the original motivation was the estimation of the latency period of AIDS from blood transfusion data, the results can be applied in a variety of settings in epidemiology, e.g., time to pregnancy studies (Scheike et al., 1999), as well as other disciplines. We compare three likelihood functions: the so-called full likelihood of the data, and two conditional likelihoods, one conditioning on the time of infection and the other on the time of diagnosis. All three approaches have merit. For example, conditioning on the time of diagnosis reflects the way the data have accrued, whereas conditioning on the time of infection eliminates the infection process from the likelihood. The full likelihood needs no justification. We investigate the likelihood functions with respect to their information content about $\theta$. We show that when the infection process is constant conditioning on the
time of infection is equivalent to conditioning on the time of diagnosis. However, when the infection process is increasing, conditioning on the time of infection is generally less efficient than conditioning on the time of diagnosis; the opposite is true for decreasing infection rates. We show that the full likelihood is more informative than either of the conditional approaches. Thus, the conditional likelihood based on $U \mid T$ is judged to be the least favorable approach because it is less informative than the joint likelihood and it does not, in general, eliminate the nuisance parameter. However, it should be realized that Theorem 3 indicates that under parametric modeling of an increasing infection rate the conditional likelihood based on $U \mid T$ may be more informative than the conditional likelihood based on $U \mid S$. Clearly, the comparison among the three likelihoods is of intrinsic interest.

The above comments are relevant when $\lambda$ is known in advance, or if it is unknown up to a finite number of parameters. However, as the dimension of the parameter set for $\lambda$ grows there is less information in the marginal distribution of $S$ about $\theta$. Finally, when $\lambda$ belongs to an infinite-dimensional set, $f_{ST}(\cdot; \theta, \lambda)$ and $f_{U|S}(\cdot; \theta, \lambda)$ are equally informative about $\theta$. Thus the conditionality principle (Brookmeyer and Gail, 1988; Cox and Hinkley, 1974) does not apply to this semi-parametric setting. This is important because it shows that in many practical situations where not much is known about the nuisance infection rate function there is no need to model it. In fact, by conditioning it out we gain two things: (a) robustness against model misspecification; and (b) computational simplicity, since the parameter indexing $\lambda$ need not be estimated. In addition, limited simulation results indicate better finite sampling properties. Thus we recommend using the conditional likelihood based on $U \mid S$.
Our results are related to those of Kalbfleisch and Lawless (1989), who used a special discrete non-parametric model for both the distribution of $U$ and the intensity function $\lambda$. They show that the resulting estimator coincides with the non-parametric estimator of Lagakos et al. (1988), which is calculated conditioning on the time of infection. In fact, Theorem 4 may be extended to the completely non-parametric situation, where $\theta$ itself is infinite dimensional. This will be pursued elsewhere.

References

Becker, N.G., Marschner, I.C., 2001. Advances in medical statistics arising from the AIDS epidemic. Statist. Methods Med. Res. 10, 117–140.
Brookmeyer, R., Gail, M.H., 1988. A method for obtaining short term projections and lower bounds on the size of the AIDS epidemic. J. Amer. Statist. Assoc. 83, 301–308.
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman & Hall, London.
Donnelly, C.A., Cox, D.R., 2001. Mathematical biology and medical statistics: contributions to understanding of the AIDS epidemiology. Statist. Methods Med. Res. 10, 141–154.
Finkelstein, D.M., Moore, D.F., Schoenfeld, D.A., 1993. A proportional hazards model for truncated AIDS data. Biometrics 49, 731–740.
Hannon, P.M., Dahiya, R.C., 1999. Estimation of parameters for the truncated exponential distribution. Comm. Statist.: Theory Methods 28, 2591–2612.
Kalbfleisch, J.D., Lawless, J.F., 1989. Inferences based on retrospective ascertainment: an analysis of the data on transfusion related AIDS. J. Amer. Statist. Assoc. 84, 360–372.
Kass, R.E., Vos, P.W., 1997. Geometrical Foundations of Asymptotic Inference. Wiley, New York.
Lagakos, S.W., Barraj, L.M., De Gruttola, V., 1988. Nonparametric analysis of truncated survival data with application to AIDS. Biometrika 75, 515–523.
Lui, K.J., Lawrence, D.N., Peterman, T.A., Haverkos, H.H., Bergman, D.J., 1986. A model-based approach for estimating the mean incubation period of transfusion-associated acquired immunodeficiency syndrome. Proc. Natl. Acad. Sci. 83, 2913–2917.
Medley, G.F., Billard, L., Cox, D.R., Anderson, R.M., 1987. The distribution of the incubation period AIDS in patients infected via blood transfusion. Nature 328, 719–721.
Scheike, T.H., Petersen, J.H., Martinussen, T., 1999. Retrospective ascertainment of recurrent events: an application to time to pregnancy. J. Amer. Statist. Assoc. 94, 713–725.
Van der Vaart, A.W., 1998. Asymptotic Statistics. Cambridge University Press, Cambridge.
Wang, M.C., See, L.C., 1992. N-Estimation from retrospectively ascertained events with application to AIDS. Biometrics 48, 129–141.