Journal of the Korean Statistical Society
Journal of the Korean Statistical Society journal homepage: www.elsevier.com/locate/jkss
Estimation of Kendall’s tau for bivariate doubly truncated data
Pao-sheng Shen
Department of Statistics, Tunghai University, Taichung, 40704, Taiwan
Article info
Article history: Received 5 January 2015; Accepted 25 July 2015; Available online xxxx
AMS 2000 subject classifications: primary 62N01; secondary 62N02
Abstract
In this article, we consider the estimation of Kendall’s tau for bivariate doubly truncated data, where two correlated event times are potentially observed only if both fall within subject-specific intervals of time. Using the inverse-probability-weighted (IPW) approach, we propose two nonparametric estimators of Kendall’s tau for bivariate doubly truncated data. The first estimator is based on V-statistics and the second estimator is based on weighted comparable pairs. The asymptotic properties of the proposed estimators are established. Simulation studies are conducted to investigate their finite sample performance. © 2015 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
Keywords: Association of variables; Bivariate truncation; Inverse-probability-weighted
1. Introduction
Double truncation of survival data occurs when only those individuals whose event times lie within a certain subject-specific observational window are observed. Doubly truncated data play an important role in the statistical analysis of survival times (Bilker & Wang, 1996; Moreira & de Uña-Álvarez, 2010a,b; Shen, 2010a,b; Zhu & Wang, 2012, 2014) as well as in other fields such as astronomy (Efron & Petrosian, 1999; Lynden-Bell, 1971) and economics. In this article, we consider bivariate doubly truncated data. Consider the following application.
Example: Age-of-onset anticipation (AOA)
The clinical phenomenon called age-of-onset anticipation (AOA) is defined as a decrease in age at onset and/or an increase in disease severity in successive generations of afflicted families. The cause of AOA has been identified as DNA instability, such as nucleotide repeats that change in length in subsequent generations, which alters the phenotype of the disease. Reports of AOA exist in the literature for disorders such as bipolar disorder (McInnis, McMahon, Stine, & Ross, 1993), facioscapulohumeral muscular dystrophy (Zatz et al., 1995), schizophrenia (Bassett & Honer, 1994) and rheumatoid arthritis (Deighton, Heslop, McDonagh, Walker, & Thomson, 1994). Recently, changes in disease phenotype in subsequent generations have also been identified in other disorders, such as colon cancer, breast cancer, Alzheimer disease and diabetes (Nilbert, Timshel, Bernstein, & Larsen, 2009; Paterson, Kennedy, & Petronis, 1996). The data sets used in testing for AOA usually consist of affected parent–child pairs whose onsets fall between two calendar times, say $\tau_1$ and $\tau_2$. Hence, the age-of-onset distributions of parents and of children are each doubly truncated relative to the population distribution.
For example, for the period 1992–2003 the Odense Pharmaco-epidemiological Database (OPED) (see Støvring, Andersen, Beck-Nielsen, Geen, & Vach, 2003, Støvring & Wang, 2007) contains subject information on all prescriptions for subsidized medications redeemed at any pharmacy in the County of Fyn, as well as information on births, deaths and
E-mail address:
[email protected]. http://dx.doi.org/10.1016/j.jkss.2015.07.005 1226-3192/© 2015 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
P.-s. Shen / Journal of the Korean Statistical Society
Fig. 1. Schematic depiction of bivariate doubly truncated data.
migration into and out of the County of Fyn. The tracking of individuals is based on the Civil Registration Number (CRN), which is assigned to everyone at birth or at first immigration into Denmark. Incident events (onset of diabetes) are defined as the first treatment event observed in the time window for subjects who did not have any previous events during a one-year run-in period. Assume that the minimal and maximal observable ages (in years) at diabetes onset before death are known and denoted by $\tau_0$ and $\tau_M$, respectively. Let $\tau_1 = 1992$ and $\tau_2 = 2003$. Define the target population as the individuals who were born after calendar time $\tau_1 - \tau_M$ (i.e. 1992 minus the maximum age, the year of birth of the oldest person) and before $\tau_2 - \tau_0$ (i.e. 2003 minus the minimum age, the year of birth of the youngest person), and who will be treated for diabetes before death. For a parent–child pair from the population defined above, let $\tau_{B1}$ and $\tau_{B2}$ be the calendar times (in years) of the initiating events (births) for the parent and the child, respectively. Similarly, let $\tau_{D1}$ and $\tau_{D2}$ be the calendar times (in years) at diabetes onset for the parent and the child, respectively. Let $T_1^* = \tau_{D1} - \tau_{B1}$ and $T_2^* = \tau_{D2} - \tau_{B2}$ be the ages (in years) at diabetes onset for the parent and the child. For $i = 1, 2$, let $U_i^* = \tau_1 - \tau_{Bi}$ and let $d_0 = \tau_2 - \tau_1$. Notice that $U_i^*$ and $U_i^* + d_0$ denote the ages (in years) at $\tau_1$ and $\tau_2$, respectively. Hence, we observe $(T_1^*, T_2^*)$ if and only if $U_1^* \le T_1^* \le U_1^* + d_0$ and $U_2^* \le T_2^* \le U_2^* + d_0$. To assess AOA, researchers are interested in the association between the age of onset of parents and that of their children. Thus, we need to estimate the association between $T_1^*$ and $T_2^*$ using bivariate doubly truncated data. Fig. 1 highlights all the different times for bivariate doubly truncated data as described in the example. For any distribution function $W$, denote the left and right endpoints of its support by $a_W = \inf\{t : W(t) > 0\}$ and $b_W = \inf\{t : W(t) = 1\}$, respectively.
For $i = 1, 2$, let $G_i(u) = P(U_i^* \le u)$ denote the distribution function of $U_i^*$. Throughout this article we assume that $a_{G_1} = a_{G_2} = a_G$, $a_{F_1} = a_{F_2} = a_F$ and
$$a_G \le a_F \le a_G + d_0 \quad \text{and} \quad b_G \le b_F \le b_G + d_0. \tag{1.1}$$
Under assumption (1.1), $F_i(t) = P(T_i^* \le t)$ and $G_i(u)$ are both identifiable (see Woodroofe, 1985). Furthermore, we assume that $(T_1^*, T_2^*)$ is independent of $(U_1^*, U_2^*)$. The measurement of association has been a major topic in bivariate survival analysis. Kendall’s tau (Kendall & Gibbons, 1990) is a popular measure of association and is suitable for lifetime data since it is rank invariant. Let $(T_{1i}^*, T_{2i}^*)$ and $(T_{1j}^*, T_{2j}^*)$ ($i \ne j$) be two independent realizations from $(T_1^*, T_2^*)$. The $(i, j)$th pair is called concordant if $(T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*) > 0$ and discordant if $(T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*) < 0$. The untruncated version of Kendall’s tau (denoted by $\tau$) is defined as the difference between the concordance and discordance probabilities of the $(i, j)$th pair, i.e.
$$\tau = P\big((T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*) > 0\big) - P\big((T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*) < 0\big) = 2P\big((T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*) > 0\big) - 1. \tag{1.2}$$
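As a small illustration of definition (1.2), the sketch below computes the sample analogue for complete (untruncated) data by averaging the sign of $(T_{1i} - T_{1j})(T_{2i} - T_{2j})$ over all pairs $i < j$. The function name `kendall_tau` and the toy data are assumptions made for this example only; they do not come from the article.

```python
import numpy as np

def kendall_tau(t1, t2):
    """Sample analogue of (1.2) for untruncated data:
    the mean of sgn((T1i - T1j) * (T2i - T2j)) over pairs i < j."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    d1 = t1[:, None] - t1[None, :]      # all pairwise differences in T1
    d2 = t2[:, None] - t2[None, :]      # all pairwise differences in T2
    s = np.sign(d1 * d2)                # +1 concordant, -1 discordant, 0 tie
    upper = np.triu_indices(len(t1), k=1)   # keep each pair i < j once
    return s[upper].mean()

# Hypothetical toy data: five concordant pairs and one discordant pair.
tau_hat = kendall_tau([1.0, 2.0, 3.0, 4.0], [1.5, 1.0, 3.5, 4.0])
```

With the toy data, five of the six pairs are concordant and one is discordant, so the estimate equals (5 − 1)/6.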
In Section 2, using the inverse-probability-weighted (IPW) approach, we propose two nonparametric estimators of Kendall’s tau for bivariate doubly truncated data. The first estimator is based on V-statistics and the second estimator is based on weighted comparable pairs. The asymptotic properties of the proposed estimators are established. In Section 3, a simulation study is conducted to investigate their performance in finite samples.
2. The proposed estimators
2.1. The V-statistics approach
When the $(T_{1i}^*, T_{2i}^*)$’s are continuous positive random variables, $\tau$ can be written as
$$\tau = 4P(T_{1i}^* > T_{1j}^*, T_{2i}^* > T_{2j}^*) - 1 = 4\int_0^\infty\!\!\int_0^\infty S(x, y)\,S(dx, dy) - 1,$$
where $S(x, y) = P(T_1^* > x, T_2^* > y)$ is the joint survival function of $T_1^*$ and $T_2^*$. Therefore, we can write $\tau = \Gamma(S)$, where $\Gamma : D[\mathcal{S}] \to \mathbb{R}$, $\mathcal{S} = \{(x, y) : S(x, y) > 0\}$ and $D[\mathcal{S}]$ is the space of cadlag functions on $\mathcal{S}$. When there is no truncation, i.e. the $(T_{1i}^*, T_{2i}^*)$’s are observable, $\tau$ can be estimated by $\Gamma(\hat S)$, where $\hat S$ is the empirical estimator of $S$. $\Gamma(\hat S)$ has the form of a so-called V-statistic, and $\Gamma(\hat S)$ will inherit nice properties of $\hat S$. However, when truncation is present, $\hat S$ is not consistent since it estimates the conditional probability $P(T_1^* > x, T_2^* > y \mid U_1^* \le T_1^* \le U_1^* + d_0, U_2^* \le T_2^* \le U_2^* + d_0)$. For bivariate doubly truncated data, when the right truncation times (denoted by $V_i^*$) are not completely dependent on the left truncation times, i.e. $V_i^* \ne U_i^* + d_0$, Shen (2013) proposed an algorithm to jointly compute the nonparametric maximum likelihood estimates (NPMLEs) of $F(t_1, t_2) = P(T_1^* \le t_1, T_2^* \le t_2)$ and $H(u_1, v_1, u_2, v_2) = P(U_1^* \le u_1, V_1^* \le v_1, U_2^* \le u_2, V_2^* \le v_2)$. Here, we derive the NPMLEs of $S(t_1, t_2)$ and $G(u_1, u_2) = P(U_1^* \le u_1, U_2^* \le u_2)$ under complete dependence, i.e. $V_i^* = U_i^* + d_0$ ($i = 1, 2$). Let $(T_{ij}, U_{ij})$ ($i = 1, 2$; $j = 1, \ldots, n$) denote the doubly truncated sample. Define $K(t_1, t_2) = P(U_1^* \le t_1 \le U_1^* + d_0, U_2^* \le t_2 \le U_2^* + d_0) = P(t_1 - d_0 \le U_1^* \le t_1, t_2 - d_0 \le U_2^* \le t_2)$, which can be written as $K(t_1, t_2) = G(t_1, t_2) - G((t_1 - d_0)-, t_2) - G(t_1, (t_2 - d_0)-) + G((t_1 - d_0)-, (t_2 - d_0)-)$. Consider the conditional survival function
$$\tilde S(t_1, t_2) = P(T_{1i} > t_1, T_{2i} > t_2) = p^{-1}\int_{t_2}^{b_F}\!\!\int_{t_1}^{b_F} P(U_1^* \le x \le U_1^* + d_0, U_2^* \le y \le U_2^* + d_0)\,S(dx, dy) = p^{-1}\int_{t_2}^{b_F}\!\!\int_{t_1}^{b_F} K(x, y)\,S(dx, dy),$$
where $p = P(U_1^* \le T_1^* \le U_1^* + d_0, U_2^* \le T_2^* \le U_2^* + d_0)$ denotes the un-truncation probability. Thus, we have
$$\tilde S(dt_1, dt_2) = p^{-1}K(t_1, t_2)\,S(dt_1, dt_2) \quad \text{and} \quad S(dt_1, dt_2) = p\,\frac{\tilde S(dt_1, dt_2)}{K(t_1, t_2)}.$$
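When $K(x, y)$ and $p$ are known, $S(t_1, t_2)$ can be estimated by
$$\hat S(t_1, t_2; K, p) = n^{-1}p\sum_{i=1}^{n}\frac{I_{[T_{1i} > t_1, T_{2i} > t_2]}}{K(T_{1i}, T_{2i})}.$$
Letting $t_1 \to 0$ and $t_2 \to 0$, so that $S(t_1, t_2) \to 1$, it follows that $p$ can be estimated by
$$\hat p(K) = n\left[\sum_{i=1}^{n}\frac{1}{K(T_{1i}, T_{2i})}\right]^{-1}.$$
Hence, given $K$, $S(t_1, t_2)$ can be estimated by
$$\hat S(t_1, t_2; K) = \left[\sum_{i=1}^{n}\frac{1}{K(T_{1i}, T_{2i})}\right]^{-1}\sum_{i=1}^{n}\frac{I_{[T_{1i} > t_1, T_{2i} > t_2]}}{K(T_{1i}, T_{2i})}.$$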
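The inverse-probability weighting in $\hat S(t_1, t_2; K)$ and $\hat p(K)$ can be sketched as follows, assuming the selection probability $K$ is known and supplied as a vectorised function. The names `ipw_survival`, `p_hat` and `K_const` are hypothetical and serve only this illustration.

```python
import numpy as np

def ipw_survival(t1, t2, T1, T2, K):
    """S_hat(t1, t2; K): weighted share of observations with T1i > t1 and
    T2i > t2, each observation weighted by 1 / K(T1i, T2i)."""
    T1 = np.asarray(T1, dtype=float)
    T2 = np.asarray(T2, dtype=float)
    w = 1.0 / K(T1, T2)                   # inverse selection probabilities
    inside = (T1 > t1) & (T2 > t2)        # indicator I[T1i > t1, T2i > t2]
    return np.sum(w * inside) / np.sum(w)

def p_hat(T1, T2, K):
    """p_hat(K) = n / sum_i 1 / K(T1i, T2i)."""
    T1 = np.asarray(T1, dtype=float)
    T2 = np.asarray(T2, dtype=float)
    return len(T1) / np.sum(1.0 / K(T1, T2))

# Hypothetical check: a constant K gives equal weights, so the estimator
# reduces to the ordinary empirical survival function.
K_const = lambda a, b: np.full_like(a, 0.5)
s_val = ipw_survival(1.5, 1.5, [1, 2, 3, 4], [1, 2, 3, 4], K_const)
p_val = p_hat([1, 2, 3, 4], [1, 2, 3, 4], K_const)
```

In practice $K$ is unknown and has to be estimated jointly with $S$, which is what the simultaneous equations later in this section address.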
Next, consider the conditional distribution function
$$\tilde G(u_1, u_2) = P(U_{1i} \le u_1, U_{2i} \le u_2) = p^{-1}P(U_1^* \le u_1, U_2^* \le u_2, U_1^* \le T_1^* \le U_1^* + d_0, U_2^* \le T_2^* \le U_2^* + d_0)$$
$$= p^{-1}\int_{a_G}^{u_2}\!\!\int_{a_G}^{u_1}\big[S(x-, y-) - S(x-, y + d_0) - S(x + d_0, y-) + S(x + d_0, y + d_0)\big]\,G(dx, dy).$$
Thus, we have
$$G(du_1, du_2) = p\,\frac{\tilde G(du_1, du_2)}{S(u_1-, u_2-) - S(u_1-, u_2 + d_0) - S(u_1 + d_0, u_2-) + S(u_1 + d_0, u_2 + d_0)}.$$
Thus, given $S(t_1, t_2)$ and $p$, $G(u_1, u_2)$ can be estimated by
$$\hat G(u_1, u_2; L, p) = n^{-1}p\sum_{i=1}^{n}\frac{I_{[U_{1i} \le u_1, U_{2i} \le u_2]}}{L(U_{1i}, U_{2i})},$$
where $L(U_{1i}, U_{2i}) = S(U_{1i}-, U_{2i}-) - S(U_{1i}-, U_{2i} + d_0) - S(U_{1i} + d_0, U_{2i}-) + S(U_{1i} + d_0, U_{2i} + d_0)$. Letting $u_1 \to \infty$ and $u_2 \to \infty$, it follows that $p$ can be estimated by
$$\hat p(L) = n\left[\sum_{i=1}^{n}\frac{1}{L(U_{1i}, U_{2i})}\right]^{-1}.$$
Thus, given $L$, $G(u_1, u_2)$ can be estimated by
$$\hat G(u_1, u_2; L) = \left[\sum_{i=1}^{n}\frac{1}{L(U_{1i}, U_{2i})}\right]^{-1}\sum_{i=1}^{n}\frac{I_{[U_{1i} \le u_1, U_{2i} \le u_2]}}{L(U_{1i}, U_{2i})}.$$
By the expressions of $\hat S(t_1, t_2; K)$ and $\hat G(u_1, u_2; L)$, $S(t_1, t_2)$ and $G(u_1, u_2)$ can be estimated by simultaneously solving the following two equations:
$$\hat S_n(t_1, t_2) = \left[\sum_{i=1}^{n}\frac{1}{\hat K_n(T_{1i}, T_{2i})}\right]^{-1}\sum_{i=1}^{n}\frac{I_{[T_{1i} > t_1, T_{2i} > t_2]}}{\hat K_n(T_{1i}, T_{2i})}, \tag{2.1}$$
and
$$\hat G_n(u_1, u_2) = \left[\sum_{i=1}^{n}\frac{1}{\hat L_n(U_{1i}, U_{2i})}\right]^{-1}\sum_{i=1}^{n}\frac{I_{[U_{1i} \le u_1, U_{2i} \le u_2]}}{\hat L_n(U_{1i}, U_{2i})}, \tag{2.2}$$
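where
$$\hat K_n(T_{1i}, T_{2i}) = \hat G_n(T_{1i}, T_{2i}) - \hat G_n((T_{1i} - d_0)-, T_{2i}) - \hat G_n(T_{1i}, (T_{2i} - d_0)-) + \hat G_n((T_{1i} - d_0)-, (T_{2i} - d_0)-),$$
and
$$\hat L_n(U_{1i}, U_{2i}) = \hat S_n(U_{1i}-, U_{2i}-) - \hat S_n(U_{1i}-, U_{2i} + d_0) - \hat S_n(U_{1i} + d_0, U_{2i}-) + \hat S_n(U_{1i} + d_0, U_{2i} + d_0).$$
Since
$$\int\!\!\int \hat S_n(x, y)\,\hat K_n(dx, dy) = \int\!\!\int \hat K_n(x, y)\,\hat S_n(dx, dy),$$
and
$$\int\!\!\int \hat S_n(x, y)\,\hat K_n(dx, dy) = \int\!\!\int \hat S_n(x, y)\,\hat G_n(dx, dy) - \int\!\!\int \hat S_n(x + d_0, y)\,\hat G_n(dx, dy) - \int\!\!\int \hat S_n(x, y + d_0)\,\hat G_n(dx, dy) + \int\!\!\int \hat S_n(x + d_0, y + d_0)\,\hat G_n(dx, dy),$$
we have
$$n\left[\sum_{i=1}^{n}\frac{1}{\hat K_n(T_{1i}, T_{2i})}\right]^{-1} = n\left[\sum_{i=1}^{n}\frac{1}{\hat L_n(U_{1i}, U_{2i})}\right]^{-1}. \tag{2.3}$$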
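The article does not spell out how Eqs. (2.1) and (2.2) are solved numerically. One possible approach, given here only as a sketch under that assumption, is an alternating fixed-point iteration on the point masses that $\hat S_n$ places on the observed $(T_{1i}, T_{2i})$ and that $\hat G_n$ places on the observed $(U_{1i}, U_{2i})$. The function name `npmle_weights`, the stopping rule and the toy data are hypothetical, and convergence is not guaranteed in general.

```python
import numpy as np

def npmle_weights(T, U, d0, max_iter=500, tol=1e-10):
    """Alternate between (2.1) and (2.2) on point masses.
    T, U: arrays of shape (n, 2). Returns (f, g), where f[i] is the mass
    of S_hat at T_i and g[j] is the mass of G_hat at U_j."""
    T = np.asarray(T, dtype=float)
    U = np.asarray(U, dtype=float)
    n = T.shape[0]
    # J[i, j] = 1 when T_i lies in the window [U_j, U_j + d0] in both
    # coordinates. The diagonal is 1 because each observed pair was selected.
    J = np.ones((n, n), dtype=bool)
    for k in range(2):
        Tk = T[:, k][:, None]
        Uk = U[:, k][None, :]
        J &= (Uk <= Tk) & (Tk <= Uk + d0)
    J = J.astype(float)
    f = np.full(n, 1.0 / n)
    g = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        K = J @ g                          # K_hat(T_i): G-mass of windows covering T_i
        f_new = (1.0 / K) / np.sum(1.0 / K)    # update from (2.1)
        L = J.T @ f_new                    # L_hat(U_j): S-mass inside window j
        g_new = (1.0 / L) / np.sum(1.0 / L)    # update from (2.2)
        change = max(np.abs(f_new - f).max(), np.abs(g_new - g).max())
        f, g = f_new, g_new
        if change < tol:
            break
    return f, g

# Hypothetical toy data: every window covers every T_i, so no reweighting occurs.
f, g = npmle_weights([[1, 1], [2, 2], [3, 3]],
                     [[0, 0], [0.5, 0.5], [1, 1]], d0=5.0)
```

When every window covers every observation, as in the toy data, both sets of masses stay uniform, which matches the no-truncation case.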
Let
$$\tilde S_n(x, y) = n^{-1}\sum_{i=1}^{n} I_{[T_{1i} > x, T_{2i} > y]} \quad \text{and} \quad \tilde G_n(x, y) = n^{-1}\sum_{i=1}^{n} I_{[U_{1i} \le x, U_{2i} \le y]}$$
denote the empirical estimators of $\tilde S(x, y)$ and $\tilde G(x, y)$, respectively. Similar to Theorem 2.1 of Shen (2013), we can show that $\hat S_n$ and $\hat G_n$ are the NPMLEs of $S$ and $G$. Plugging $\hat S_n$ into $\Gamma(\cdot)$, we obtain an estimator
$$\hat\tau_n^V = \Gamma(\hat S_n) = 4\int_0^\infty\!\!\int_0^\infty \hat S_n(x, y)\,\hat S_n(dx, dy) - 1,$$
where $\hat S_n(dx, dy) = \hat S_n(x-, y-) - \hat S_n(x, y-) - \hat S_n(x-, y) + \hat S_n(x, y)$. Next, we investigate the asymptotic properties of $\hat\tau_n^V$.
Theorem 1. Let $[a, b_F] = [a_1, b_F] \times [a_2, b_F]$, where $[a, b_F] \in [a_F, \infty) \times [a_F, \infty)$ is such that $L(u_1, v_1, u_2, v_2) > \delta > 0$ for $[u_1, v_1] \times [u_2, v_2] \in [a, b_F]$. Assume that (i) $\int_{a_2}^{b_F}\!\int_{a_1}^{b_F} S(dx, dy)/K(x, y) < \infty$ and (ii) $K(dx, dy)/S(dx, dy)$ is uniformly bounded on $[a, b_F]$. Then $\hat S_n(x, y)$ is uniformly consistent on $[a, b_F]$.
Proof. By Eqs. (2.1) and (2.2), the estimator $\hat S_n(t_1, t_2)$ is equivalent to the solution of $U(\hat S_n, \tilde S_n, \tilde G_n; t_1, t_2) = 0$, where
$$U(\hat S_n, \tilde S_n, \tilde G_n; t_1, t_2) = \hat S_n(t_1, t_2) - \int_{t_2}^{b_F}\!\!\int_{t_1}^{b_F}\frac{\tilde S_n(dx, dy)}{\int_{y - d_0}^{y}\int_{x - d_0}^{x}\frac{\tilde G_n(du_1, du_2)}{\hat L_n(u_1, u_2)}}. \tag{2.4}$$
The rest of the proof is similar to that of Theorem 2.2 of Shen (2013) and is omitted. Note that in Eq. (2.4), $\int_{y - d_0}^{y}\int_{x - d_0}^{x}\tilde G_n(du_1, du_2)/\hat L_n(u_1, u_2)$ converges to zero if $\min(x, y) \to a_G$. Thus, we require assumption (i) to control this singularity. Assumption (i) holds if $a_G < a_F$.
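Before turning to asymptotic normality, it may help to see how the plug-in estimator $\hat\tau_n^V = \Gamma(\hat S_n)$ reduces to a finite double sum once $\hat S_n$ is represented by point masses on the observed $(T_{1i}, T_{2i})$. The sketch below assumes such masses `f` are already available (for instance from an iterative solution of (2.1) and (2.2)). The function name `tau_v` is hypothetical.

```python
import numpy as np

def tau_v(T, f):
    """Gamma(S_hat) = 4 * sum_i f_i * S_hat(T1i, T2i) - 1, where
    S_hat(x, y) = sum_j f_j * I[T1j > x, T2j > y]."""
    T = np.asarray(T, dtype=float)
    f = np.asarray(f, dtype=float)
    # greater[i, j] = 1 when T_j exceeds T_i in both coordinates
    greater = ((T[None, :, 0] > T[:, None, 0]) &
               (T[None, :, 1] > T[:, None, 1])).astype(float)
    S_at_T = greater @ f                 # S_hat evaluated at each T_i
    return 4.0 * np.sum(f * S_at_T) - 1.0

# Hypothetical toy data: three perfectly concordant points with equal mass.
tau_val = tau_v([[1, 1], [2, 2], [3, 3]], [1 / 3, 1 / 3, 1 / 3])
```

For $n$ perfectly concordant points with equal masses this V-statistic equals $1 - 2/n$ rather than 1, because the double sum includes the diagonal terms $i = j$. The difference vanishes as $n$ grows.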
Theorem 2. Let $\tilde K(x, y) = \tilde G(x, y) - \tilde G(x, y-) - \tilde G(x-, y) + \tilde G(x-, y-)$ and $\tilde K_n(x, y) = \tilde G_n(x, y) - \tilde G_n(x, y-) - \tilde G_n(x-, y) + \tilde G_n(x-, y-)$. Under assumptions (i) and (ii) of Theorem 1, we assume that (iii) the class of functions $\mathcal{F}$, where $\mathcal{F}$ consists of functions with envelope $1/K(r, s)$, is a $\tilde F(r, s)$-Donsker class, and (iv)
$$\int_{u_2 - d_0}^{u_2}\!\!\int_{u_1 - d_0}^{u_1}\frac{\tilde F_n(dx, dy)}{K(x, y)\,\tilde G_n(x, y)} \le M_F(u_1, u_2)$$
with probability tending to 1, where $M_F$ is such that the class of functions with envelope $M_F$ is $\tilde G(u_1, u_2)$-Donsker. Then $\sqrt{n}(\hat S_n(t_1, t_2) - S(t_1, t_2))$ is asymptotically normal for every $t \in [a, b_F]$.
Proof. The proof is similar to that of Theorem 2.3 of Shen (2013) and is omitted.
Based on Theorems 1 and 2, the asymptotic properties of $\Gamma(\hat S_n)$ can be developed by applying a Taylor series expansion to $\Gamma(\hat S_n)$ (Fernholz, 1983). It is easy to show that $\Gamma(S) = 4\int S\,dS - 1$ is compactly differentiable with $S$ being a cadlag function of bounded variation. Hence, $\hat\tau_n^V = \Gamma(\hat S_n)$ will inherit the asymptotic properties of $\hat S_n$.
2.2. The estimator based on comparable pairs
In this section, we derive an alternative estimator for $\tau$ based on concordant pairs. When there is no truncation, based on (1.2), $\tau$ can be estimated by
$$\frac{2}{n(n - 1)}\sum_{i < j}\mathrm{sgn}\big((T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*)\big),$$
where $\mathrm{sgn}(u) = 1$ if $u > 0$ and $\mathrm{sgn}(u) = -1$ if $u < 0$. However, when truncation is present, a modified approach is needed since only the truncated sample is available. A conditional Kendall’s tau has been widely used for tests of quasi-independence for survival data under truncation (see Martin & Betensky, 2005; Tsai, 1990), based on comparability of truncated data. When $T_1^*$ is subject to double truncation and $T_2^*$ is subject to right censoring, Zhu and Wang (2012) considered semiparametric association estimation of $(T_1^*, T_2^*)$ based on a copula model. Using IPW to adjust for truncation bias, Zhu and Wang (2014) also proposed a nonparametric estimator of $\tau$. Here, we define comparable pairs for bivariate doubly truncated data as follows. A bivariate pair $(T_{1i}, T_{2i})$ and $(T_{1j}, T_{2j})$ is comparable if, for $k = 1, 2$,
$$U_{kij}^{\max} = \max(U_{ki}, U_{kj}) \le \min(T_{ki}, T_{kj}) = T_{kij}^{\min}$$
and
$$T_{kij}^{\max} = \max(T_{ki}, T_{kj}) \le \min(U_{ki} + d_0, U_{kj} + d_0) = V_{kij}^{\min}.$$
Let $p_{ij}$ be the conditional probability of a bivariate pair being comparable given $(T_{1i}, T_{2i})$ and $(T_{1j}, T_{2j})$, i.e.
$$p_{ij} = P(U_{kij}^{\max} \le T_{kij}^{\min},\ T_{kij}^{\max} \le V_{kij}^{\min},\ k = 1, 2 \mid T_{1i}, T_{2i}, T_{1j}, T_{2j}).$$
Under independence of $(T_1^*, T_2^*)$ and $(U_1^*, U_2^*)$, $p_{ij}$ can be expressed as $p_{ij} = W_{ij}^2$, where
$$W_{ij} = G(T_{1ij}^{\min}, T_{2ij}^{\min}) - G((T_{1ij}^{\max} - d_0), T_{2ij}^{\min}) - G(T_{1ij}^{\min}, (T_{2ij}^{\max} - d_0)) + G((T_{1ij}^{\max} - d_0), (T_{2ij}^{\max} - d_0)),$$
where $G(u_1, u_2) = P(U_1^* \le u_1, U_2^* \le u_2)$ is the joint distribution function of $U_1^*$ and $U_2^*$. Let $I_{ij}$ be the indicator for a bivariate comparable pair. Since
$$E[I_{ij}\,\mathrm{sgn}((T_{1i} - T_{1j})(T_{2i} - T_{2j}))] = p_{ij}E[\mathrm{sgn}((T_{1i} - T_{1j})(T_{2i} - T_{2j})) \mid I_{ij} = 1] = p_{ij}E[\mathrm{sgn}((T_{1i}^* - T_{1j}^*)(T_{2i}^* - T_{2j}^*))],$$
given $G$, $\tau$ can be estimated by
$$\left[\sum_{i < j}\frac{I_{ij}}{p_{ij}}\right]^{-1}\sum_{i < j}\frac{I_{ij}\,\mathrm{sgn}((T_{1i} - T_{1j})(T_{2i} - T_{2j}))}{p_{ij}}.$$
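The weighted comparable-pairs estimator for a given $G$ can be sketched as below. The sketch assumes $G$ is available as a function of two scalars (in practice it would be replaced by an estimate). The names `tau_comparable` and `G_unif`, and the toy data, are hypothetical.

```python
import numpy as np

def tau_comparable(T, U, d0, G):
    """Weighted comparable-pairs estimate of Kendall's tau for known G.
    T, U: arrays of shape (n, 2); G(u1, u2): joint cdf of (U1*, U2*)."""
    T = np.asarray(T, dtype=float)
    U = np.asarray(U, dtype=float)
    n = T.shape[0]
    num = 0.0
    den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            t_min = np.minimum(T[i], T[j])
            t_max = np.maximum(T[i], T[j])
            u_max = np.maximum(U[i], U[j])
            v_min = np.minimum(U[i], U[j]) + d0
            # comparable: both windows cover both event times, per coordinate
            if np.all(u_max <= t_min) and np.all(t_max <= v_min):
                W = (G(t_min[0], t_min[1]) - G(t_max[0] - d0, t_min[1])
                     - G(t_min[0], t_max[1] - d0)
                     + G(t_max[0] - d0, t_max[1] - d0))
                p = W ** 2                       # p_ij = W_ij^2
                s = np.sign((T[i, 0] - T[j, 0]) * (T[i, 1] - T[j, 1]))
                num += s / p
                den += 1.0 / p
    return num / den if den > 0 else float("nan")

# Hypothetical G: independent Uniform(0, 1) left-truncation times.
G_unif = lambda u1, u2: float(np.clip(u1, 0, 1) * np.clip(u2, 0, 1))
T_toy = [[1.0, 1.0], [1.2, 1.5], [1.4, 1.3]]
U_toy = [[0.2, 0.1], [0.5, 0.6], [0.3, 0.4]]
tau_cp = tau_comparable(T_toy, U_toy, d0=2.0, G=G_unif)
```

In the toy data all three pairs are comparable and every $W_{ij}$ equals 1, so the result is the unweighted value (2 − 1)/3.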
The following theorems establish the asymptotic properties of $\hat G_n$.
Theorem 3. Let $[b, b_G] = [b_1, b_G] \times [b_2, b_G] \in [0, \infty) \times [0, \infty)$ be such that $K(t_1, t_2) > \delta > 0$ for $[t_1, t_2] \in [b, b_G]$. Assume that (a) $\int_{b_2}^{b_G}\!\int_{b_1}^{b_G} G(dx, dy)/L(x, y) < \infty$ and (b) $L(dx, dy)/G(dx, dy)$ is uniformly bounded on $[b, b_G]$. Then $\hat G_n(x, y)$ is uniformly consistent on $[b, b_G]$.
Proof. The proof is deferred to Appendix.
Theorem 4. Let $\tilde W(x, y) = \tilde S(x-, y-) - \tilde S(x-, y + d_0) - \tilde S(x + d_0, y-) + \tilde S(x + d_0, y + d_0)$, and $\tilde W_n(x, y) = \tilde S_n(x-, y-) - \tilde S_n(x-, y + d_0) - \tilde S_n(x + d_0, y-) + \tilde S_n(x + d_0, y + d_0)$. Under assumptions (a) and (b) of Theorem 3, we assume that (c) the class of functions $\mathcal{G}$, where $\mathcal{G}$ consists of functions with envelope $1/L(r, s)$, is a $\tilde G(r, s)$-Donsker class, and (d)
$$\int_{u_2 - d_0}^{u_2}\!\!\int_{u_1 - d_0}^{u_1}\frac{\tilde G_n(dx, dy)}{L(x, y)\,\tilde W_n(x, y)} \le M_G(u_1, u_2)$$
with probability tending to 1, where $M_G(\cdot)$ is such that the class of functions with envelope $M_G(\cdot)$ is $\tilde S(u_1, u_2)$-Donsker. Then $\sqrt{n}(\hat G_n(t_1, t_2) - G(t_1, t_2))$ is asymptotically normal for every $t \in [b, b_G]$.
Proof. The proof is deferred to Appendix.
Now, we discuss the assumptions in Theorems 3 and 4. Assumption (a) holds if $b_G < b_F$, since $L(x, y)$ is then bounded away from zero on $[b, b_G]$. When $b_G = b_F$, assumption (a) is not easily justified in practice. However, it is the minimal condition under which our proof works. Since $\tilde G(dx, dy) = p^{-1}L(x, y)G(dx, dy)$, condition (d) is, up to the constant factor $p^{-1}$, asymptotically equivalent to
$$\int_{u_2 - d_0}^{u_2}\!\!\int_{u_1 - d_0}^{u_1}\frac{G(dx, dy)}{\tilde W(x, y)} \le M_G(u_1, u_2)$$
with probability tending to 1, where $M_G(\cdot)$ is such that the class of functions with envelope $M_G(\cdot)$ is $\tilde S(u_1, u_2)$-Donsker. Note that $\tilde S(du_1, du_2) = p^{-1}K(u_1, u_2)S(du_1, du_2)$ and $\tilde G(du_1, du_2) = p^{-1}L(u_1, u_2)G(du_1, du_2)$. Hence, we have
$$\int_{b_2}^{b_G}\!\!\int_{b_1}^{b_G}\frac{\tilde G(dx, dy)}{L(x, y)\,\tilde W(x, y)} = \int_{b_2}^{b_G}\!\!\int_{b_1}^{b_G}\frac{G(dx, dy)}{\int_{y}^{y + d_0}\int_{x}^{x + d_0}K(u_1, u_2)\,S(du_1, du_2)} \le M^*\int_{b_2}^{b_G}\!\!\int_{b_1}^{b_G}\frac{G(dx, dy)}{L(x, y)},$$
because $1/K(u_1, u_2) < M^*$. In other words, condition (d) is the empirical counterpart of the condition $\int_{b_2}^{b_G}\!\int_{b_1}^{b_G} G(dx, dy)/L(x, y) < \infty$, which is needed throughout. Similar to assumption (a), if $b_G < b_F$ then $L(x, y)$ is bounded away from zero on $[b, b_G]$, and hence condition (d) holds with $M(\tau) = M^*$. However, when $a_G = a_F = 0$, it is difficult to check whether condition (d) holds.
Based on $\hat G_n$, we can obtain a nonparametric estimator $\hat\tau_n$ as follows:
$$\hat\tau_n = \left[\sum_{i < j}\frac{I_{ij}}{\hat p_{ij}}\right]^{-1}\sum_{i < j}\frac{I_{ij}\,\mathrm{sgn}((T_{1i} - T_{1j})(T_{2i} - T_{2j}))}{\hat p_{ij}},$$
where $\hat p_{ij} = \hat W_{ij}^2$ and $\hat W_{ij}$ is $W_{ij}$ with $G$ replaced by $\hat G_n$.
Theorem 5. As $n \to \infty$, $\hat\tau_n$ is a consistent estimator of $\tau$ and $n^{1/2}(\hat\tau_n - \tau)$ converges weakly to a normal distribution with mean zero and variance $\sigma^2$.
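Proof. Similar to the approach of Zhu and Wang (2014), we define an un-rescaled estimator
$$\hat\tau_u = \binom{n}{2}^{-1}\sum_{i < j}\frac{I_{ij}\,\mathrm{sgn}((T_{1i} - T_{1j})(T_{2i} - T_{2j}))}{\hat p_{ij}}.$$
We write $\mathrm{sgn}((T_{1i} - T_{1j})(T_{2i} - T_{2j})) = s_{ij}$. Then
$$n^{1/2}(\hat\tau_u - \tau) = n^{1/2}\left[\binom{n}{2}^{-1}\sum_{i < j}\frac{I_{ij}s_{ij}}{\hat p_{ij}} - \tau\right].$$
Writing $1/\hat p_{ij} = 1/p_{ij} + (1/\hat p_{ij} - 1/p_{ij})$ splits the right-hand side into a first term with the true weights $p_{ij}$ and a second term involving $1/\hat p_{ij} - 1/p_{ij}$. Similar to the discussion in Lakhal-Chaieb, Cook, and Lin (2010), we can show that the concordance or discordance status is conditionally independent of the comparability. It follows that the first term is a zero-mean U-statistic of order 2. Next,
$$n^{1/2}\left(\frac{1}{\hat p_{ij}} - \frac{1}{p_{ij}}\right) = n^{1/2}\,\frac{2[W_{ij} - \hat W_{ij}]}{W_{ij}\hat W_{ij}^2} + o_p(1).$$
Note that $\hat W_{ij} - W_{ij}$ can be approximated by a sum of independent and identically distributed zero-mean terms. Thus, $n^{1/2}(\hat\tau_u - \tau)$ is asymptotically equivalent to a zero-mean U-statistic of order 2. Next,
$$n^{1/2}(\hat\tau_n - \tau) = n^{1/2}(\hat\tau_u - \tau) - n^{1/2}\tau\left[\binom{n}{2}^{-1}\sum_{i < j}\frac{I_{ij}}{\hat p_{ij}} - 1\right] + o_p(1). \tag{2.5}$$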
The second term in (2.5) is a sum of i.i.d. zero-mean terms and the third term in (2.5) is also asymptotically equivalent to a sum of i.i.d. zero-mean terms. Thus, $n^{1/2}(\hat\tau_n - \tau)$ converges to a mean-zero normal random variable.
It is difficult to obtain the explicit forms of the asymptotic variances of $\hat\tau_n^V$ and $\hat\tau_n$. Thus, we consider bootstrap estimation of the variances. For doubly truncated data, Moreira and de Uña-Álvarez (2010b) demonstrated that the simple bootstrap is a suitable method to estimate the finite sample distribution of the nonparametric maximum likelihood estimate. Here, we use the simple bootstrap method to estimate the variances of $\hat\tau_n^V$ and $\hat\tau_n$. For $b = 1, \ldots, B$, the bootstrap samples $(T_{11}^b, U_{11}^b, T_{21}^b, U_{21}^b), \ldots, (T_{1n}^b, U_{1n}^b, T_{2n}^b, U_{2n}^b)$ are generated using the empirical distribution that puts weight $1/n$ on each of the observations $(T_{1j}, U_{1j}, T_{2j}, U_{2j})$, $j = 1, \ldots, n$. Using the bootstrap samples, we obtain the bootstrap estimators $(\hat\tau_n^{V,1}, \ldots, \hat\tau_n^{V,B})$ for $\hat\tau_n^V$. Based on the bootstrap estimators $\hat\tau_n^{V,b}$, we estimate the standard deviation of $\hat\tau_n^V$ by
$$\mathrm{bstd} = \left[\frac{\sum_{i=1}^{B}\left(\hat\tau_n^{V,i} - \sum_{j=1}^{B}\hat\tau_n^{V,j}/B\right)^2}{B - 1}\right]^{1/2}.$$
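The simple bootstrap described above can be sketched generically as follows. The estimator is passed in as a function of the resampled data matrix, whose rows are assumed to be $(T_{1j}, U_{1j}, T_{2j}, U_{2j})$. The names `bootstrap_std` and `normal_ci`, the default seed and the rounded normal quantile 1.959964 are choices made for this illustration, not details taken from the article.

```python
import numpy as np

def bootstrap_std(data, estimator, B=100, seed=0):
    """bstd: sample standard deviation (divisor B - 1) of the estimator
    over B resamples of the rows of `data`, drawn with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # weight 1/n on each observation
        reps[b] = estimator(data[idx])
    return np.sqrt(np.sum((reps - reps.mean()) ** 2) / (B - 1))

def normal_ci(estimate, bstd, z=1.959964):
    """Normal-approximation interval: estimate +/- z * bstd."""
    return estimate - z * bstd, estimate + z * bstd

# Hypothetical usage with a stand-in estimator (the mean of the first column).
demo = np.random.default_rng(1).exponential(size=(50, 4))
sd = bootstrap_std(demo, lambda d: d[:, 0].mean(), B=100)
lo, hi = normal_ci(demo[:, 0].mean(), sd)
```

Any estimator of $\tau$ that accepts the resampled rows can be substituted for the stand-in function used in the usage lines.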
An approximate 95% confidence interval for $\tau$ is constructed as $\hat\tau_n^V \pm z_{0.025}\,\mathrm{bstd}$, where $z_{0.025}$ is the upper 0.025 percentile point of the standard normal distribution. Similarly, we can obtain the bootstrap estimator for $\hat\tau_n$.
3. Simulation results
A simulation study is conducted to investigate the performance of the proposed estimators $\hat\tau_n^V$ and $\hat\tau_n$.
Case 1: Clayton’s family
Failure times $(T_1^*, T_2^*)$ are generated from Clayton’s family with marginal survival functions $S_i(t) = e^{-t}$ ($i = 1, 2$) and joint survival function
$$S(t_1, t_2) = C_\alpha(S_1(t_1), S_2(t_2)) = \big(S_1(t_1)^{1 - \alpha} + S_2(t_2)^{1 - \alpha} - 1\big)^{1/(1 - \alpha)}, \quad \alpha > 1.$$
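Pairs from this Clayton model can be drawn by conditional inversion of the copula. The sketch below is an illustration and not necessarily the procedure used for the reported simulations. It is parameterised by `theta`, assumed here to equal $\alpha - 1$, for which Kendall’s tau equals $\theta/(\theta + 2)$. The function name `sample_clayton_exp` is hypothetical.

```python
import numpy as np

def sample_clayton_exp(n, theta, seed=0):
    """Draw n pairs (T1, T2) with unit-exponential margins whose joint
    survival function is the Clayton form with theta = alpha - 1 > 0."""
    rng = np.random.default_rng(seed)
    a = 1.0 - rng.random(n)          # uniform on (0, 1]; plays the role of S1(T1)
    w = 1.0 - rng.random(n)          # independent uniform on (0, 1]
    # conditional inversion: solve C(b | a) = w for b, which plays S2(T2)
    b = (a ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return -np.log(a), -np.log(b)    # invert S_i(t) = exp(-t)

# Hypothetical usage: theta = 2 corresponds to tau = 2 / (2 + 2) = 0.5.
t1_sim, t2_sim = sample_clayton_exp(1500, theta=2.0, seed=1)
```

Doubly truncated samples would then be obtained by generating the truncation times separately and keeping only the pairs that fall inside both observation windows.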
The values of $\alpha - 1$ are chosen as 0.6, 2.0 and 6.0 such that Kendall’s tau, $\tau = (\alpha - 1)/(\alpha + 1)$, is equal to 0.23, 0.5 and 0.75, respectively. The sample sizes are chosen as $n = 100$, $n = 200$ and $n = 400$, with 1000 replications. The left-truncation times $U_1^*$ and $U_2^*$ are generated as $U_1^* = U_2^* + D^*$, where $U_2^*$ is exponentially distributed with mean $\theta$ and $D^*$ is independent of $U_2^*$ and exponentially distributed with mean 0.1. The right-truncation times $V_1^*$ and $V_2^*$ are generated as $V_i^* = U_i^* + 5$ ($i = 1, 2$). The values of $\theta$ are chosen as 0.13 and 0.5 such that the proportions of truncation are equal to 0.25 and 0.51 for $\alpha - 1 = 0.6$, equal to 0.23 and 0.46 for $\alpha - 1 = 2.0$, and equal to 0.21 and 0.42 for $\alpha - 1 = 6.0$. In order to evaluate the performance of the bootstrap variance estimation, we also obtain the bootstrap standard deviation estimator
Table 1 Simulation results for $\hat\tau_n^V$ and $\hat\tau_n$ (Clayton’s family).
                          $\hat\tau_n^V$                        $\hat\tau_n$
τ     qT    cp    n       bias    std    bstd   cov       bias    std    bstd   cov
0.23  0.25  0.82  100    −0.015  0.143  0.122  0.932     −0.008  0.136  0.115  0.933
0.23  0.25  0.82  200    −0.010  0.116  0.104  0.942     −0.005  0.107  0.097  0.942
0.23  0.25  0.82  400    −0.009  0.083  0.077  0.947      0.008  0.076  0.071  0.948
0.23  0.51  0.67  100    −0.062  0.224  0.192  0.930     −0.054  0.195  0.172  0.931
0.23  0.51  0.67  200    −0.045  0.182  0.161  0.940     −0.037  0.167  0.153  0.941
0.23  0.51  0.67  400    −0.032  0.120  0.112  0.946     −0.021  0.108  0.100  0.946
0.50  0.23  0.82  100    −0.018  0.095  0.082  0.934     −0.014  0.092  0.080  0.935
0.50  0.23  0.82  200    −0.008  0.071  0.066  0.941     −0.007  0.068  0.061  0.941
0.50  0.23  0.82  400    −0.019  0.052  0.048  0.945      0.007  0.051  0.047  0.946
0.50  0.46  0.67  100    −0.065  0.132  0.109  0.928     −0.056  0.127  0.108  0.930
0.50  0.46  0.67  200    −0.041  0.105  0.096  0.940     −0.043  0.096  0.089  0.941
0.50  0.46  0.67  400    −0.027  0.076  0.071  0.944     −0.021  0.073  0.068  0.945
0.75  0.21  0.85  100    −0.018  0.071  0.064  0.935     −0.008  0.065  0.056  0.935
0.75  0.21  0.85  200    −0.011  0.053  0.049  0.942     −0.003  0.049  0.045  0.942
0.75  0.21  0.85  400     0.005  0.043  0.040  0.946      0.001  0.039  0.036  0.947
0.75  0.42  0.71  100    −0.067  0.090  0.085  0.932      0.050  0.078  0.069  0.932
0.75  0.42  0.71  200    −0.047  0.078  0.072  0.941      0.031  0.069  0.062  0.942
0.75  0.42  0.71  400    −0.026  0.065  0.061  0.945      0.020  0.060  0.056  0.946

Table 2 Simulation results for $\hat\tau_n^V$ and $\hat\tau_n$ (Frank’s family).
                          $\hat\tau_n^V$                        $\hat\tau_n$
τ     qT    cp    n       bias    std    bstd   cov       bias    std    bstd   cov
0.24  0.20  0.81  100    −0.016  0.147  0.125  0.936     −0.008  0.136  0.116  0.938
0.24  0.20  0.81  200    −0.009  0.115  0.104  0.942     −0.004  0.107  0.098  0.943
0.24  0.20  0.81  400    −0.006  0.080  0.075  0.949      0.008  0.076  0.071  0.948
0.24  0.44  0.66  100    −0.068  0.221  0.189  0.933     −0.054  0.195  0.168  0.934
0.24  0.44  0.66  200    −0.052  0.175  0.161  0.940     −0.037  0.167  0.152  0.940
0.24  0.44  0.66  400    −0.025  0.121  0.114  0.944     −0.018  0.116  0.109  0.945
0.50  0.21  0.82  100    −0.019  0.110  0.093  0.934     −0.012  0.105  0.097  0.935
0.50  0.21  0.82  200    −0.012  0.078  0.072  0.942     −0.007  0.074  0.068  0.943
0.50  0.21  0.82  400    −0.005  0.054  0.051  0.947      0.004  0.052  0.049  0.947
0.50  0.50  0.67  100    −0.065  0.157  0.133  0.934     −0.046  0.151  0.127  0.935
0.50  0.50  0.67  200    −0.041  0.108  0.097  0.940     −0.035  0.102  0.095  0.941
0.50  0.50  0.67  400    −0.020  0.077  0.072  0.944     −0.022  0.073  0.069  0.945
0.70  0.28  0.83  100    −0.014  0.072  0.062  0.937     −0.008  0.065  0.058  0.938
0.70  0.28  0.83  200    −0.010  0.053  0.048  0.942     −0.007  0.047  0.044  0.943
0.70  0.28  0.83  400    −0.005  0.044  0.040  0.948      0.001  0.039  0.036  0.950
0.70  0.51  0.70  100    −0.061  0.091  0.078  0.935      0.050  0.078  0.071  0.937
0.70  0.51  0.70  200    −0.038  0.076  0.070  0.941     −0.027  0.069  0.066  0.942
0.70  0.51  0.70  400    −0.026  0.067  0.063  0.945      0.018  0.060  0.057  0.947
and the average coverage probabilities of the 95% bootstrapped normal-approximation confidence intervals. Here, we choose $B = 100$. Table 1 shows the proportions of truncation (denoted by $q_T = 1 - p$), the biases, standard deviations (denoted by std), bootstrap standard deviation estimates (bstd), and empirical coverages (cov) of $\hat\tau_n^V$ and $\hat\tau_n$. Table 1 also shows the proportion of comparable pairs (denoted by cp).
Case 2: Frank’s family
Failure times $(T_1^*, T_2^*)$ are generated from Frank’s family
$$S(t_1, t_2) = -\frac{1}{\alpha}\log\left[1 + \frac{(e^{-\alpha S_1(t_1)} - 1)(e^{-\alpha S_2(t_2)} - 1)}{e^{-\alpha} - 1}\right],$$
with marginal survival functions $S_i(t) = e^{-t}$ ($i = 1, 2$). We use the algorithm of Genest (1987) to generate random pairs. The values of $\alpha$ are set as $\alpha = 2.30$, $\alpha = 5.65$ and $\alpha = 11.51$ such that $\tau$ is equal to 0.24, 0.5 and 0.70, respectively. The truncation times are generated in the same way as in Case 1. The values of $\theta$ are chosen as 0.13 and 0.5 such that the proportions of truncation are equal to 0.20 and 0.44 for $\alpha = 11.51$, equal to 0.21 and 0.50 for $\alpha = 5.65$, and equal to 0.28 and 0.51 for $\alpha = 2.30$. Table 2 shows the simulation results. Based on the results of Tables 1 and 2, we have the following conclusions:
(1) Simulation results indicate that the estimator $\hat\tau_n$ outperforms $\hat\tau_n^V$ in terms of biases and standard deviations. The advantage of using $\hat\tau_n$ is more pronounced when truncation is severe. One explanation for this result is that the estimator $\hat\tau_n^V$ involves integration, which may cause instability. When truncation is mild (i.e. $q_T \le 0.30$), the biases of both estimators are small. However, when truncation is severe (i.e. $q_T = 0.51$), the biases of $\hat\tau_n^V$ can be large.
(2) Given $n$ and $\tau$, the biases and the standard deviations of both estimators decrease as the proportion of truncation ($q_T$) decreases (i.e. as the proportion of comparable pairs (cp) increases).
(3) Given $n$ and $q_T$, the standard deviations of both estimators decrease as $\tau$ increases.
(4) When $n = 100$, since the bootstrap standard deviations underestimate the true standard deviations, the empirical coverage probabilities of the bootstrap-based confidence intervals are less than the nominal level 95%. The undercoverage improves as the sample size increases. When $n = 400$, the bootstrap-based confidence intervals have empirical coverage probabilities close to the nominal level.
4. Discussions
For bivariate doubly truncated data, using the IPW approach, we have proposed two nonparametric estimators of Kendall’s tau, $\hat\tau_n^V$ and $\hat\tau_n$. The first estimator $\hat\tau_n^V$ is based on V-statistics and its asymptotic properties rely on assumption (i) of Theorem 1, which holds if $a_G < a_F$. Simulation results indicate that when $a_G = a_F$ and truncation is severe, the biases of $\hat\tau_n^V$ can be large. The second estimator $\hat\tau_n$ is based on weighted comparable pairs and its asymptotic properties rely on assumption (a) of Theorem 3, which holds if $b_G < b_F$. Simulation results indicate that when $b_G = b_F = \infty$ the estimator $\hat\tau_n$ performs satisfactorily. In some cases, the joint distribution of $(U_1^*, U_2^*)$ can be parameterized as $G_\theta$. In this situation, we can obtain an estimator $\hat\theta_n$ by maximizing the conditional likelihood of the $(U_1^*, U_2^*)$’s, given the $(T_1^*, T_2^*)$’s (see Moreira & de Uña-Álvarez, 2010a; Qin & Wang, 2001; Shen, 2010b; Wang, 1989). The large-sample properties of the estimator $G_{\hat\theta_n}$ can be established using arguments similar to those of Wang (1989).
Based on $G_{\hat\theta_n}$, we may obtain a more efficient estimator $\tau_{\hat\theta_n}$ of Kendall’s tau. Further research is required on this issue.
Acknowledgments
The author would like to thank the associate editor and referees for their helpful and valuable comments and suggestions.
Appendix A. Proof of Theorem 3
By (2.1)–(2.3), the estimator $\hat G_n(t_1, t_2)$ is equivalent to the solution of $V(\hat G_n, \tilde S_n, \tilde G_n; t_1, t_2) = 0$, where
$$V(\hat G_n, \tilde S_n, \tilde G_n; t_1, t_2) = \hat G_n(t_1, t_2) - \int_{a_G}^{t_2}\!\!\int_{a_G}^{t_1}\frac{\tilde G_n(dx, dy)}{\int_{y}^{y + d_0}\int_{x}^{x + d_0}\frac{\tilde S_n(du_1, du_2)}{\hat K_n(u_1, u_2)}}, \tag{A.1}$$
where $\hat K_n(u_1, u_2) = \hat G_n(u_1, u_2) - \hat G_n((u_1 - d_0)-, u_2) - \hat G_n(u_1, (u_2 - d_0)-) + \hat G_n((u_1 - d_0)-, (u_2 - d_0)-)$. Note that the integral $\int_{y}^{y + d_0}\int_{x}^{x + d_0}\tilde S_n(du_1, du_2)/\hat K_n(u_1, u_2)$ in Eq. (A.1) converges to zero if $\max(x, y) \to b_G$. Since
$$\int_{t_2}^{b_G}\!\!\int_{t_1}^{b_G}\frac{\tilde G(dx, dy)}{\int_{y}^{y + d_0}\int_{x}^{x + d_0}\frac{\tilde S(du_1, du_2)}{K(u_1, u_2)}} \le \int_{t_2}^{b_G}\!\!\int_{t_1}^{b_G}\frac{G(dx, dy)}{L(x, y)},$$
this singularity is controlled by assumption (a). Let
$$\hat K_n(u_1, u_2; t_1, t_2) = \hat G_n(u_1 \wedge t_1, u_2 \wedge t_2) - \hat G_n(u_1 \wedge t_1, (u_2 \wedge t_2 - d_0)-) - \hat G_n((u_1 \wedge t_1 - d_0)-, u_2 \wedge t_2) + \hat G_n((u_1 \wedge t_1 - d_0)-, (u_2 \wedge t_2 - d_0)-)$$
and
$$Q(\hat G_n, \tilde S_n, \tilde G_n; t_1, t_2) = \int_0^\infty\!\!\int_0^\infty\frac{\hat K_n(u_1, u_2; t_1, t_2)}{\hat K_n(u_1, u_2)}\,\tilde S_n(du_1, du_2) - \tilde G_n(t_1, t_2).$$
It follows that $V(\hat G_n, \tilde S_n, \tilde G_n; t_1, t_2) = 0$ is equivalent to $Q(\hat G_n, \tilde S_n, \tilde G_n; t_1, t_2) = 0$. By the argument of Van der Laan (1996, p. 122), $\hat G_n$ has a convergent subsequence which converges uniformly to a $G_\infty$, which has the same support as $G$. Let $\hat G_{n(k)}$ be this convergent subsequence. Note that for $[u_1, u_2] \in [b, b_G]$, $\hat K_{n(k)}(u_1, u_2)$ is
uniformly bounded away from zero for n large enough and Kˆ n (u1 , u2 ) is of uniformly bounded sectional variation. Thus, by Lemmas 3.2 and 3.3 of Van der Laan (1996, p. 121), we have ∞
0
Kˆ n(k) (u1 , u2 ; t1 , t2 )
∞
Kˆ n(k) (u1 , u2 )
0
S˜n(k) (du1 , du2 ) ≤ C ∥S˜n(k) − S˜ ∥∞
ˆ n(k) to G∞ imply that Q (G∞ , S˜n , G˜ n ; t1 , t2 ) = 0 for a C < ∞. Empirical process theory and the uniform consistency of G ˜ ˜ ˜ ˜ for all t1 , t2 . Let S0 , G0 , G0 , K0 and L0 be the true functions of S, G, G, K and L, respectively. It remains to show that ˜ 0 ; t1 , t2 ) = 0 implies G = G0 , from which it follows that G∞ = G0 due to K (u1 , u2 ) > δ > 0. Hence, Gˆ n is Q (G, S˜0 , G uniformly consistent on [b, bF ]. ˜ 0 ; t1 , t2 ) = 0 is equivalent to V (G, S˜0 , G˜ 0 ; t1 , t2 ) = 0. Notice that Q (G, S˜0 , G ˜ 0 (du1 , du2 ) = p−1 L0 (u1 , u2 )G0 (du1 , du2 ) and S˜0 (dx, dy) = p−1 K0 (x, y)S0 (dx, dy), we have Since G ˜ 0 ; t1 , t2 ) − V (H0 , S˜0 , G˜ 0 ; t1 , t2 ) = G(t1 , t2 ) − G0 (t1 , t2 ) 0 = V (G, S˜0 , G t2
+ p−1
t1
y
K (u1 , u2 ) − K0 (u1 , u2 ) K (u1 , u2 )
x
y+d0 x+d0
where aG (x, y) = ∥∞ ) defined by:
x+d0
y
aG
aG
y+d0
S0 (du1 , du2 )
G0 (dx, dy) aG (x, y)
,
(A.2)
˜ (du1 , du2 ). Consider the linear operator I − AG,0 : (D[b, ∞], ∥ · ∥∞ ) → (D[b, ∞], ∥ ·
1 S K (u1 ,u2 ) 0
x
(I − AG,0 )(h)(t1 , t2 ) = h(t1 , t2 ) + p−1
t2
aG
t1
aG
y+d0
y
x+d0
x
H ( u1 , u2 ) K ( u1 , u2 )
S0 (du1 , du2 )
G0 (dx, dy) aG (x, y)
,
where $H(u_1,u_2) = h(u_1,u_2) - h(u_1,(u_2-d_0)-) - h((u_1-d_0)-,u_2) + h((u_1-d_0)-,(u_2-d_0)-)$. Eq. (A.2) tells us that $(I - A_{G,0})(G - G_0)(t_1,t_2) = 0$. If we can prove that the linear operator $I - A_{G,0}$ is one-to-one, then $G = G_0$ follows. The operator can be shown to be invertible in the same way as in Subsection 3.4 of Van der Laan (1996, p. 125). Let $\Lambda_1(du_1,du_2) = S_0(du_1,du_2)/K(u_1,u_2)$ and $\Lambda_2(dx,dy) = G_0(dx,dy)/a_G(x,y)$. Let
\[
\Lambda_1^*(x,y) = \int_{y}^{y+d_0}\!\int_{x}^{x+d_0}\Lambda_1(du_1,du_2).
\]
By the proof in Subsection 3.4 of Van der Laan (1996), we need $\Lambda_1^*(dx,dy)/\Lambda_2(dx,dy)$ to be uniformly bounded on $[b, b_G]$ and
\[
\int_{b_2}^{b_G}\!\int_{b_1}^{b_G}\Lambda_2(dx,dy) < \infty \quad\Bigl(\text{i.e., } \int_{b_2}^{b_G}\!\int_{b_1}^{b_G}\frac{G_0(dx,dy)}{a_G(x,y)} < \infty\Bigr).
\]
Because $K(u_1,u_2) > \delta > 0$ for $(u_1,u_2) \in [b, b_G]$, $\Lambda_1^*(dx,dy)/\Lambda_2(dx,dy)$ is uniformly bounded if assumption (b) holds, i.e. $L_0(dx,dy)/G_0(dx,dy)$ is uniformly bounded on $[b, b_G]$. Similarly, because
\[
a_G(x,y) = \int_{y}^{y+d_0}\!\int_{x}^{x+d_0}\frac{1}{K(u_1,u_2)}\,\tilde S_0(du_1,du_2) = p^{-1}\int_{y}^{y+d_0}\!\int_{x}^{x+d_0}\frac{K_0(u_1,u_2)}{K(u_1,u_2)}\,S_0(du_1,du_2) \ge p^{-1}\delta\,L_0(x,y),
\]
we only need
\[
\int_{b_2}^{b_G}\!\int_{b_1}^{b_G}\frac{G_0(dx,dy)}{L_0(x,y)} < \infty.
\]
The proof is completed.
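As a brief check of the last step of the proof above, the following display sketches how the integrability condition on $\Lambda_2$ reduces to the condition on $G_0/L_0$. It uses only the lower bound $a_G(x,y) \ge p^{-1}\delta L_0(x,y)$ already stated in the proof; no further assumption is introduced.

```latex
\[
\int_{b_2}^{b_G}\!\int_{b_1}^{b_G}\Lambda_2(dx,dy)
  = \int_{b_2}^{b_G}\!\int_{b_1}^{b_G}\frac{G_0(dx,dy)}{a_G(x,y)}
  \le \frac{p}{\delta}\int_{b_2}^{b_G}\!\int_{b_1}^{b_G}\frac{G_0(dx,dy)}{L_0(x,y)} .
\]
```

Finiteness of the right-hand integral is therefore enough for the condition on $\Lambda_2$.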
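Appendix B. Proof of Theorem 4
Let
\[
a_n(r,s) = \int_{s}^{s+d_0}\!\int_{r}^{r+d_0}\frac{1}{\hat K_n(u_1,u_2)}\,\tilde S(du_1,du_2), \qquad
a_{nn}(r,s) = \int_{s}^{s+d_0}\!\int_{r}^{r+d_0}\frac{1}{\hat K_n(u_1,u_2)}\,\tilde S_n(du_1,du_2),
\]
and
\[
a(r,s) = \int_{s}^{s+d_0}\!\int_{r}^{r+d_0}\frac{1}{K(u_1,u_2)}\,\tilde S(du_1,du_2).
\]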
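Define the empirical processes $h_{\tilde G_n}(x,y) = \sqrt{n}\,(\tilde G_n(x,y) - \tilde G(x,y))$ and $h_{\tilde S_n}(u_1,u_2) = \sqrt{n}\,(\tilde S_n(u_1,u_2) - \tilde S(u_1,u_2))$. We know that $(h_{\tilde G_n}, h_{\tilde S_n}) \xrightarrow{d} (h_{\tilde G}, h_{\tilde S})$ (jointly) in $(D[b, b_G], \|\cdot\|_\infty, \mathcal{B})$ for two Gaussian processes $h_{\tilde G}$ and $h_{\tilde S}$. By telescoping, it follows that
\[
\sqrt{n}\,\bigl(V(\hat G_n,\tilde S_n,\tilde G_n;t_1,t_2) - V(\hat G_n,\tilde S,\tilde G;t_1,t_2)\bigr)
= \int_{a_G}^{t_2}\!\int_{a_G}^{t_1}\frac{h_{\tilde G_n}(dr,ds)}{a_n(r,s)}
- \int_{a_G}^{t_2}\!\int_{a_G}^{t_1}\frac{1}{a_{nn}(r,s)\,a_n(r,s)}\left[\int_{s}^{s+d_0}\!\int_{r}^{r+d_0}\frac{1}{K(u_1,u_2)}\,h_{\tilde S_n}(du_1,du_2)\right]\tilde G_n(dr,ds). \tag{B.3}
\]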
First, both $1/a_n(r,s)$ and $1/a(r,s)$ fall in a class of functions with envelope $1/L(r,s)$ with probability tending to 1. Under assumption (c), both $1/a_n(r,s)$ and $1/a(r,s)$ fall in a $\tilde G$-Donsker class. Since
\[
|1/a_n(r,s) - 1/a(r,s)| \le M\,\|\hat G_n - G\|_\infty\,\frac{1}{L(r,s)},
\]
under assumption (a) of Theorem 2, we have
\[
\int_{a_G}^{t_2}\!\int_{a_G}^{t_1}\bigl[1/a_n(r,s) - 1/a(r,s)\bigr]^2\,\tilde G(dr,ds) \to 0.
\]
Hence, $\int_{a_G}^{t_2}\!\int_{a_G}^{t_1}\bigl(1/a_n(r,s) - 1/a(r,s)\bigr)\,h_{\tilde G_n}(dr,ds)$ converges to zero in probability. Since $\int_{a_G}^{t_2}\!\int_{a_G}^{t_1} 1/a(r,s)\,h_{\tilde G_n}(dr,ds)$ converges weakly, the weak convergence of the first term in (B.3) holds. By Fubini's theorem, we can rewrite the second term of (B.3) as
\[
\int_{a_G}^{\max(u_2-d_0,\,t_2)}\!\int_{a_G}^{\max(u_1-d_0,\,t_1)}\frac{1}{K(u_1,u_2)}\,q_n(u_1,u_2)\,h_{\tilde S_n}(du_1,du_2),
\]
where
\[
q_n(u_1,u_2) = \int_{u_2-d_0}^{u_2}\!\int_{u_1-d_0}^{u_1}\frac{1}{a_{nn}(r,s)\,a_n(r,s)}\,\tilde G_n(dr,ds).
\]
Under assumption (d), we have
\[
q_n(u_1,u_2) \le \int_{u_2-d_0}^{u_2}\!\int_{u_1-d_0}^{u_1}\frac{\tilde G_n(dr,ds)}{L(r,s)\,\tilde W_n(r,s)} \le M_G(u_1,u_2)
\]
with probability tending to 1, and both $q_n(u_1,u_2)$ and
\[
q(u_1,u_2) = \int_{u_2-d_0}^{u_2}\!\int_{u_1-d_0}^{u_1}\frac{1}{a^2(r,s)}\,\tilde G(dr,ds)
\]
fall with probability tending to 1 in a $\tilde S(u_1,u_2)$-Donsker class. Since $\sup_{u_1,u_2}|q_n(u_1,u_2) - q(u_1,u_2)| \to 0$, $(q_n(u_1,u_2) - q(u_1,u_2))^2 \le M_G^2(u_1,u_2)$ and
\[
\int_{a_G}^{b_2}\!\int_{a_G}^{b_1} M_G^2(u_1,u_2)\,\tilde S(du_1,du_2) < \infty,
\]
it follows that
\[
\int_{a_G}^{b_2}\!\int_{a_G}^{b_1}\bigl(q_n(u_1,u_2) - q(u_1,u_2)\bigr)^2\,\tilde S(du_1,du_2) \to 0 \quad\text{in probability.}
\]
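Hence, the weak convergence of the second term in (B.3) holds. This proves that
\[
Z_n(t_1,t_2) \equiv \sqrt{n}\,\bigl(V(\hat G_n,\tilde S_n,\tilde G_n;t_1,t_2) - V(\hat G_n,\tilde S,\tilde G;t_1,t_2)\bigr) \xrightarrow{d} Z(t_1,t_2)
\]
\[
\equiv \int_{a_G}^{t_2}\!\int_{a_G}^{t_1}\frac{h_{\tilde G}(dr,ds)}{a(r,s)}
- \int_{a_G}^{t_2}\!\int_{a_G}^{t_1}\frac{1}{a^2(r,s)}\left[\int_{s}^{s+d_0}\!\int_{r}^{r+d_0}\frac{1}{K(u_1,u_2)}\,h_{\tilde S}(du_1,du_2)\right]\tilde G(dr,ds).
\]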
Since $V(\hat G_n,\tilde S_n,\tilde G_n;t_1,t_2) = V(G,\tilde S,\tilde G;t_1,t_2) = 0$, this implies that $\sqrt{n}\,\bigl(V(\hat G_n,\tilde S,\tilde G;t_1,t_2) - V(G,\tilde S,\tilde G;t_1,t_2)\bigr) \equiv -Z_n(t_1,t_2)$ is asymptotically normal with mean zero. Since $G$ appears in $V(G,\tilde S,\tilde G)$ only as a function, it is straightforwardly verified (see Subsection 3.3 of Van der Laan (1996)) that $G \mapsto V(G,\tilde S,\tilde G)$ is Fréchet-differentiable: for any sequence $\hat G_n$ such that $\|\hat G_n - G\|_\infty \to 0$ we have
\[
\frac{1}{\|\hat G_n - G\|_\infty}\,\bigl(V(\hat G_n,\tilde S,\tilde G) - V(G,\tilde S,\tilde G) - d_1V(G,\tilde S,\tilde G)(\hat G_n - G)\bigr) \to 0
\]
with respect to the supnorm, where $d_1V(G,\tilde S,\tilde G)$ is a linear mapping. By the usual kind of argument for M-estimators (see Van der Vaart & Wellner, 1996), it now follows that
\[
d_1V(G,\tilde S,\tilde G)\bigl(\sqrt{n}\,(\hat G_n(t_1,t_2) - G(t_1,t_2))\bigr) = -Z_n(t_1,t_2) + o_p(1).
\]
Under assumptions (a) and (b) of Theorem 2, we can prove that $d_1V(G,\tilde S,\tilde G)$ has a bounded inverse in the same way as in Subsection 3.4 of Van der Laan (1996); then
\[
\sqrt{n}\,(\hat G_n(t_1,t_2) - G(t_1,t_2)) = -d_1V(G,\tilde S,\tilde G)^{-1}\bigl(Z_n(t_1,t_2)\bigr) + o_p(1).
\]
Hence, the weak convergence of $Z_n$ implies, by the continuous mapping theorem, weak convergence of $\sqrt{n}\,(\hat G_n - G)$. The proof is completed.
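The linearization step in the argument above can be summarized in one chain of equalities. The display below is only a sketch in the notation of this appendix (arguments $t_1, t_2$ suppressed); it assumes the Fréchet remainder is negligible at the $\sqrt{n}$ scale, which is part of the standard M-estimator argument cited above rather than something shown here.

```latex
\begin{align*}
0 &= V(\hat G_n,\tilde S_n,\tilde G_n) - V(G,\tilde S,\tilde G) \\
  &= \bigl\{V(\hat G_n,\tilde S_n,\tilde G_n) - V(\hat G_n,\tilde S,\tilde G)\bigr\}
   + \bigl\{V(\hat G_n,\tilde S,\tilde G) - V(G,\tilde S,\tilde G)\bigr\} \\
  &= n^{-1/2} Z_n + d_1V(G,\tilde S,\tilde G)(\hat G_n - G)
   + o\bigl(\|\hat G_n - G\|_\infty\bigr).
\end{align*}
```

Multiplying by $\sqrt{n}$ and applying the bounded inverse of $d_1V(G,\tilde S,\tilde G)$ gives the stated representation of $\sqrt{n}\,(\hat G_n - G)$.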
References
Bassett, A., & Honer, W. (1994). Evidence for anticipation in schizophrenia. The American Journal of Human Genetics, 54, 864–870.
Bilker, W. B., & Wang, M.-C. (1996). A semiparametric extension of the Mann–Whitney test for randomly truncated data. Biometrics, 52, 10–20.
Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141–151.
Deighton, C., Heslop, P., McDonagh, J., Walker, D., & Thomson, G. (1994). Does genetic anticipation occur in familial rheumatoid arthritis? Annals of the Rheumatic Diseases, 53, 833–835.
Efron, B., & Petrosian, V. (1999). Nonparametric methods for doubly truncated data. Journal of the American Statistical Association, 94, 824–834.
Fernholz, L. T. (1983). Von Mises calculus for statistical functions. New York: Springer-Verlag.
Genest, C. (1987). Frank's family of bivariate distributions. Biometrika, 74, 549–555.
Kendall, M., & Gibbons, J. D. (1990). Rank correlation methods (fifth ed.). London: Edward Arnold.
Lakhal-Chaieb, L., Cook, R., & Lin, X. (2010). Inverse probability of censoring weighted estimates of Kendall's τ for gap time analyses. Biometrics, 66, 1145–1152.
Lynden-Bell, D. (1971). A method of allowing for known observational selection in small samples applied to 3CR quasars. Monthly Notices of the Royal Astronomical Society, 155, 95–118.
Martin, E. C., & Betensky, R. A. (2005). Testing quasi-independence of failure and truncation times via conditional Kendall's tau. Journal of the American Statistical Association, 100, 484–492.
McInnis, M. G., McMahon, F. J., Stine, O. C., & Ross, C. A. (1993). Anticipation in bipolar affective disorder. The American Journal of Human Genetics, 53, 385–390.
Moreira, C., & de Uña-Álvarez, J. (2010a). A semiparametric estimator of survival for doubly truncated data. Statistics in Medicine, 29(30), 3147–3159.
Moreira, C., & de Uña-Álvarez, J. (2010b). Bootstrapping the NPMLE for doubly truncated data. Journal of Nonparametric Statistics, 22(5), 567–583.
Nilbert, M., Timshel, S., Bernstein, I., & Larsen, K. (2009). Role for genetic anticipation in Lynch syndrome. Journal of Clinical Oncology, 27, 360–364.
Paterson, A. D., Kennedy, J. L., & Petronis, A. (1996). Evidence for genetic anticipation in non-Mendelian diseases. The American Journal of Human Genetics, 59, 264–268.
Qin, J., & Wang, M.-C. (2001). Semiparametric analysis of truncated data. Lifetime Data Analysis, 7, 225–242.
Shen, P.-S. (2010a). Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics, 62(5), 835–853.
Shen, P.-S. (2010b). Semiparametric analysis of doubly truncated data. Communications in Statistics-Theory and Methods, 39, 3178–3190.
Shen, P.-S. (2013). Nonparametric estimation of the bivariate distribution function with doubly truncated data. Communications in Statistics-Theory and Methods, 42, 3805–3818.
Støvring, H., Andersen, M., Beck-Nielsen, H., Geen, A., & Vach, W. (2003). Rising prevalence of diabetes: evidence from a Danish pharmaco-epidemiological database. Lancet, 363, 537–538.
Støvring, H., & Wang, M.-C. (2007). A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incident events. BMC Medical Research Methodology, 7:53, 1–11.
Tsai, W.-Y. (1990). Testing the assumption of independence of truncation time and failure time. Biometrika, 77, 169–177.
Van der Laan, M. J. (1996). Nonparametric estimation of the bivariate survival function with truncated data. Journal of Multivariate Analysis, 58, 107–131.
Van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes with applications to statistics. New York: Springer.
Wang, M.-C. (1989). A semiparametric model for randomly truncated data. Journal of the American Statistical Association, 84, 742–748.
Woodroofe, M. (1985). Estimating a distribution function with truncated data. The Annals of Statistics, 13, 163–177.
Zatz, M., Maria, S. K., Passo-Bueno, M. A., Vainzof, M., Camplotto, S., Cerqueira, A., et al. (1995). High proportion of new mutations and possible anticipation in Brazilian facioscapulohumeral muscular dystrophy families. The American Journal of Human Genetics, 56, 99–105.
Zhu, H., & Wang, M.-C. (2012). Analysing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika, 99, 345–361.
Zhu, H., & Wang, M.-C. (2014). Nonparametric inference on bivariate survival data with interval sampling: association estimation and testing. Biometrika, http://dx.doi.org/10.1093/biomet/asu005.