Journal of Statistical Planning and Inference 142 (2012) 2913–2925

The weighted least square based estimators with censoring indicators missing at random

Xiayan Li (a), Qihua Wang (b,*)

(a) Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
(b) Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

(*) Corresponding author. E-mail: [email protected] (Q. Wang).
Abstract
In this paper, we study linear regression analysis when some of the censoring indicators are missing at random. Based on the weighted least squares approach of Stute (1993), we define a regression calibration estimator, an imputation estimator and an inverse probability weighted estimator of the regression coefficient vector, and prove that all three estimators are asymptotically normal. A simulation study was conducted to evaluate the finite sample properties of the proposed estimators, and a real data example is provided to illustrate the methods.
Keywords: Censoring indicator; Weighted least square; Imputation; Regression calibration; Augmented inverse probability weighting; Missing at random
1. Introduction

The linear regression model with censored failure time data is applied frequently in many statistical areas, and various estimating approaches have been suggested. By modifying the normal equations, Miller (1976) suggested slope and intercept estimators that are weighted linear combinations of the uncensored observations, where the weights are derived from the Kaplan–Meier product-limit estimator of a distribution function. However, the consistency of Miller's estimators requires restrictive conditions. To overcome the inconsistency problems in Miller's approach, Buckley and James (1979) developed an estimating approach that modifies the sum of squares of residuals instead of the normal equations; their estimator, however, requires a complicated iterative algorithm. Koul et al. (1981) (KSV) suggested a data transformation approach based on the least squares technique. Compared with the Buckley–James estimator, this method is easy to carry out because no iterations are required and standard least squares routines can be used once the observations of the response are transformed by the censoring information. Further, Stute (1993, 1996) suggested a weighted least squares (WLS) estimate, which has been shown to perform better than the KSV method (Bao et al., 2007).

The above-mentioned estimators and inference approaches require that the censoring indicator is always observed. In practice, however, deaths with an unknown cause (DUC) occur because information on the death is missing or incomplete, or because the clinicians or algorithms that interpret the information cannot agree on or determine a cause. The time to death from one cause can be censored by a death from a different cause when multiple causes of death are operating. For instance, in a clinical trial one might distinguish between deaths attributable to the disease of interest and deaths due to
all other causes. Hence, DUC implies that the censoring indicator is missing. To illustrate how common DUC can be, in a recent meta-analysis of 20 studies on childhood malaria mortality in Africa, 7.9% (1104/14,009) of all deaths had an unknown cause because the verbal autopsy interview forms were lost (Rowe et al., 2006). Another clinical trial example occurs when hospital case notes are inconclusive for some patients.

The naive way to deal with the problem is to simply ignore cases with missing data; the remaining dataset can then be analyzed using conventional statistical inference methods. However, this approach may be very inefficient, especially when there is a significant degree of missingness. The complete case estimator is consistent only when the ignored censoring indicators are missing completely at random (MCAR), where missingness is independent of all the observed and unobserved variables.

Some authors have done considerable work on regression analysis when some censoring indicators are missing, typically under the MCAR assumption. For example, Dinse (1982) obtained a nonparametric maximum likelihood estimator. Goetghebeur and Ryan (1995) derived a modified log-rank test for survival differences between groups, under a proportional hazards assumption. McKeague and Subramanian (1998) proposed an estimator by estimating the missing proportion. Zhou and Sun (2003) extended the idea of McKeague and Subramanian (1998) to the additive hazards regression model, also under the MCAR assumption. Wang and Dinse (2011) provided a modification of the KSV estimator to accommodate missing indicators: they defined an imputation estimator and an inverse probability weighted estimator and established their asymptotic normality. However, their results rest on the missingness mechanism assumption that missingness may depend on the observed response variable but not on the missing indicator variable or the observed covariable vector X, which is less restrictive than MCAR but stronger than missing at random (MAR). MAR implies that missingness can depend on the observed variables, including the covariables, but not on the missing censoring indicator variable.

As pointed out previously, the weighted least squares method improves on the KSV method in the absence of missing censoring indicators. We therefore focus on modifying the WLS method to accommodate missing censoring indicators, expecting better results than Wang and Dinse (2011). Moreover, we base our proposal on the more popular MAR missingness assumption, which is also less restrictive than the missing mechanism used in Wang and Dinse (2011).

This paper is organized as follows. Section 2 develops the approach based on weighted least squares estimation and gives its asymptotic properties. Sections 3 and 4 define imputation and inverse probability weighted estimators and present their asymptotic properties. In Section 5, we conduct a simulation study to evaluate the finite sample behavior of the estimators; the simulations also compare the finite sample performance of the proposed estimators with Wang and Dinse's (2011) estimators. Section 6 analyzes a real data set from a clinical trial to illustrate our methods.

2. WLS calibration estimator

Let $Y_i$ be a response variable and $X_i$ a vector of $d$ explanatory variables. Assume that $Y$ and $X$ follow the linear regression model

$$Y_i = X_i^\top \beta + e_i, \qquad i = 1,\ldots,n, \qquad (1)$$
where $\beta$ is a $d$-vector of unknown regression coefficients and the $e_i$ are independent error terms. We assume that, given $X_1,\ldots,X_n$, the errors $e_1,\ldots,e_n$ are independent and identically distributed with mean zero. Let $C_i$ be the censoring variable for the $i$th subject; in survival analysis, $Y_i$ is usually the logarithm of the failure time. Assume $Y_i$ and $C_i$ are independent conditional on $X_i$. For each $t$, define $1-G(t) = P(C \ge t)$. We assume that $Y_i$ can be randomly censored on the right by $C_i$. Let $\tilde Y_i = Y_i \wedge C_i$ and $\delta_i = I(Y_i \le C_i)$. Define a missingness indicator $\xi_i$ which is 1 if $\delta_i$ is observed and 0 otherwise. The observed data are $\{\tilde Y, X, \xi, \xi\delta\}$. For simplicity, set $Z = (\tilde Y, X)$. Throughout this paper, we assume $\delta$ is missing at random (MAR), that is,

$$P(\xi = 1 \mid \tilde Y, X, \delta) = P(\xi = 1 \mid \tilde Y, X).$$

This is a common assumption in missing data analysis and is reasonable in practice; see Little and Rubin (1987, Chapter 1).

Bao et al. (2007) showed, by comparing finite sample performances, that although the algorithm of the WLS estimate is almost as simple as that of the KSV estimate, the WLS approach performs much better than the KSV method with censored data. The WLS estimation approach for estimating $\beta$ was first proposed by Stute (1993), who also proved the consistency of the estimator; its asymptotic normality was studied by Stute (1996). The weighted least squares (WLS) estimator is given by

$$\hat\beta_{WLS} = \left(\sum_{i=1}^n \frac{\delta_i}{1-G(\tilde Y_i)} X_i X_i^\top\right)^{-1} \sum_{i=1}^n \frac{\delta_i X_i \tilde Y_i}{1-G(\tilde Y_i)}, \qquad (2)$$

which minimizes

$$Q(\beta) = \sum_{i=1}^n \frac{\delta_i}{1-G(\tilde Y_i)} (\tilde Y_i - \beta_1 X_{i1} - \cdots - \beta_d X_{id})^2. \qquad (3)$$
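As a concrete illustration, the following minimal sketch (Python with NumPy; the function name is ours, and the censoring distribution $G$ is treated as known here, whereas in practice it is estimated, e.g. by (7) below) solves the weighted normal equations behind (2) and (3):

```python
import numpy as np

def beta_wls(X, Y, delta, G_at_Y):
    """Stute's WLS estimator (2): solve (sum_i w_i X_i X_i^T) b = sum_i w_i X_i Y_i
    with weights w_i = delta_i / (1 - G(Y_i)); censored cases get weight 0."""
    w = delta / (1.0 - G_at_Y)
    A = (X * w[:, None]).T @ X      # sum_i w_i X_i X_i^T
    b = X.T @ (w * Y)               # sum_i w_i X_i Y_i
    return np.linalg.solve(A, b)
```

In practice the weights are truncated so that $1-G$ stays bounded away from zero, which is the role of $\tau_0$ in Condition C below.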
In the presence of missing indicators, the WLS estimator cannot be applied directly. However, we can modify the WLS method by replacing $\delta_i$ with its conditional expectation in (3), which leads to the weighted least squares estimating function

$$Q(\beta) = \sum_{i=1}^n \frac{m(Z_i)}{1-G(\tilde Y_i)} (\tilde Y_i - \beta_1 X_{i1} - \cdots - \beta_d X_{id})^2,$$

where $m(Z_i) = E(\delta \mid Z_i)$. Let $\hat\beta$ be the minimizer of $Q(\beta)$. Then we have

$$\hat\beta = \left(\sum_{i=1}^n \frac{m(Z_i)}{1-G(\tilde Y_i)} X_i X_i^\top\right)^{-1} \sum_{i=1}^n \frac{X_i \tilde Y_i\, m(Z_i)}{1-G(\tilde Y_i)}. \qquad (4)$$
In practice, $m(\cdot)$ is usually unknown. Naturally, we may define the estimator of $\beta$ by replacing the unknown $m(\cdot)$ in $\hat\beta$ by an estimator. One usual way is to estimate $m(Z)$ parametrically by assuming a parametric model $m(z) = m_0(z, \theta)$, where $m_0(\cdot,\cdot)$ is a known function and $\theta$ is an unknown parameter vector. The estimate of $\theta$, say $\hat\theta_n$, can be obtained by maximizing the likelihood

$$\prod_{i=1}^n m_0(Z_i, \theta)^{\xi_i \delta_i} \left(1 - m_0(Z_i, \theta)\right)^{\xi_i (1-\delta_i)}. \qquad (5)$$
Define $u(\tilde y) = E(\delta \mid \tilde Y = \tilde y)$. Then $u(\tilde y)$ can be estimated nonparametrically by

$$\hat u_n(\tilde y) = \frac{\sum_{i=1}^n \xi_i \delta_i K\left(\frac{\tilde y - \tilde Y_i}{h_n}\right)}{\sum_{i=1}^n \xi_i K\left(\frac{\tilde y - \tilde Y_i}{h_n}\right)}, \qquad (6)$$

where $K(\cdot)$ is a kernel function and $h_n$ is a bandwidth sequence. We adopt the following estimator of $G(\tilde y)$:

$$\hat G_n(\tilde y) = 1 - \prod_{i:\, \tilde Y_i \le \tilde y} \left(\frac{n - R_i}{n - R_i + 1}\right)^{1 - \hat u_n(\tilde Y_i)}, \qquad (7)$$

where $R_i$ denotes the rank of $\tilde Y_i$ $(i = 1,\ldots,n)$. By the fact that

$$\sup_{0 \le t \le \tau_0} |\hat u_n(t) - u(t)| \overset{a.s.}{\longrightarrow} 0, \qquad (8)$$

and arguments similar to Wang and Ng (2008), we can get

$$\sup_{0 \le t \le \tau_0} |\hat G_n(t) - G(t)| \overset{a.s.}{\longrightarrow} 0, \qquad (9)$$
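A minimal sketch of (6) and (7), assuming no ties among the $\tilde Y_i$; the quartic kernel is the one used in Section 5, and the floor on the kernel denominator is our numerical safeguard:

```python
import numpy as np

def u_hat(y, Y, xi, delta, h):
    """Kernel estimator (6) of u(y) = E(delta | Y~ = y), built from the
    validation cases (xi = 1) only."""
    K = lambda t: np.where(np.abs(t) <= 1, (15 / 16) * (1 - 2 * t**2 + t**4), 0.0)
    Kmat = K((np.asarray(y)[:, None] - Y[None, :]) / h)
    return (Kmat @ (xi * delta)) / np.maximum(Kmat @ xi, 1e-12)

def G_hat(Y, xi, delta, h):
    """Estimator (7) of the censoring distribution G at the sample points:
    a product-limit form whose jump at Y_(i) is tempered by 1 - u_hat(Y_i)."""
    n = len(Y)
    order = np.argsort(Y)
    R = np.empty(n, dtype=int)
    R[order] = np.arange(1, n + 1)                    # ranks R_i of Y_i
    u = u_hat(Y, Y, xi, delta, h)
    factor = ((n - R) / (n - R + 1.0)) ** (1.0 - u)   # ((n-R_i)/(n-R_i+1))^(1-u_i)
    surv = np.cumprod(factor[order])                  # product over {Y_i <= y}
    G = np.empty(n)
    G[order] = 1.0 - surv
    return G
```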
where $\tau_0$ is defined in Condition C (Appendix A). Now we obtain the following WLS calibration estimator of $\beta$:

$$\hat\beta_R = \left(\sum_{i=1}^n \frac{m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)} X_i X_i^\top\right)^{-1} \left(\sum_{i=1}^n \frac{m_0(Z_i;\hat\theta_n)\, X_i \tilde Y_i}{1-\hat G_n(\tilde Y_i)}\right). \qquad (10)$$
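Combining the pieces, a sketch of the calibration estimator (10): $m_0$ is the logistic model used in Section 5 (with $Z$ including an intercept column), fitted by Newton–Raphson over the complete cases to maximize (5); `G_hat` is the sketch above, and the floor on $1-\hat G_n$ is our stand-in for the truncation at $\tau_0$:

```python
import numpy as np

def fit_m0_logistic(Z, delta, xi, iters=25):
    """ML estimate of theta in the logistic model m0(z; theta): an ordinary
    logistic regression of delta on Z over the cases with xi = 1, i.e. the
    maximizer of the likelihood (5)."""
    Zc, dc = Z[xi == 1], delta[xi == 1]
    theta = np.zeros(Z.shape[1])
    for _ in range(iters):                            # Newton-Raphson iterations
        p = 1.0 / (1.0 + np.exp(-(Zc @ theta)))
        H = (Zc * (p * (1 - p))[:, None]).T @ Zc      # information matrix
        theta += np.linalg.solve(H, Zc.T @ (dc - p))  # score step
    return theta

def beta_R(X, Y, xi, delta, Z, h):
    """WLS regression calibration estimator (10)."""
    theta = fit_m0_logistic(Z, delta, xi)
    m0 = 1.0 / (1.0 + np.exp(-(Z @ theta)))           # m0(Z_i; theta_hat_n)
    w = m0 / np.maximum(1.0 - G_hat(Y, xi, delta, h), 1.0 / len(Y))
    A = (X * w[:, None]).T @ X
    return np.linalg.solve(A, X.T @ (w * Y))
```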
In Wang and Dinse (2011), the estimate of $G(\cdot)$ is defined in the following way. First, they define a Horvitz and Thompson (1952) type inverse probability weighted estimator of $u(\tilde y)$,

$$\tilde u_n(\tilde y) = \frac{\sum_{i=1}^n (\xi_i \delta_i / \pi_n(\tilde Y_i)) K\left(\frac{\tilde y - \tilde Y_i}{h_n}\right)}{\sum_{i=1}^n (\xi_i / \pi_n(\tilde Y_i)) K\left(\frac{\tilde y - \tilde Y_i}{h_n}\right)}, \qquad (11)$$

where $\pi_n(\cdot)$ is the Nadaraya–Watson kernel regression estimate of $\pi(\tilde y) = P(\xi = 1 \mid \tilde y)$. Then, they define an estimator of $G(\tilde y)$ by

$$\tilde G_n(\tilde y) = 1 - \prod_{i:\, \tilde Y_i \le \tilde y} \left(\frac{n - R_i}{n - R_i + 1}\right)^{1 - \tilde u_n(\tilde Y_i)}. \qquad (12)$$

However, the strong consistency of $\tilde G_n(\tilde y)$, and hence of the estimators of $\beta$ due to Wang and Dinse (2011), only holds under the missing mechanism assumption $E[\xi \mid \tilde Y, X, \delta] = E[\xi \mid \tilde Y]$, which is stronger than our MAR assumption. As it should be, all the asymptotic properties of the estimators due to Wang and Dinse (2011) are based on this assumption. The following theorem describes the asymptotic normality of $\hat\beta_R$.
Theorem 2.1. Assume $m_0(Z,\theta)$ is correctly specified, i.e., there exists a $\theta_0$ such that $m_0(Z;\theta_0) = E(\delta \mid Z)$. Under Condition C (listed in Appendix A), we have

$$\sqrt n\,(\hat\beta_R - \beta) \overset{L}{\longrightarrow} N(0, V_R),$$

where $V_R = \Sigma^{-1}\Omega_R\Sigma^{-1}$, $\Sigma = E[XX^\top m_0(Z;\theta_0)/(1-G(\tilde Y))]$ and $\Omega_R = \lim_{\tau_0\to\tau_H}\Omega_R(\tau_0)$ with $\Omega_R(\tau_0)$ defined in (B.27).
$\Omega_R$ has a complex structure and is hard to estimate well by combining the "plug-in" method and the sample moment method. Alternatively, we can estimate $V_R$ by $\hat V_{JR}$, a jackknife estimator (Peddada and Patwardhan, 1992) of the asymptotic variance of $\hat\beta_R$. Using some algebra, $\hat V_{JR}$ reduces to

$$\hat V_{JR} = \left(\sum_{i=1}^n \frac{m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)} X_iX_i^\top\right)^{-1}\left[\sum_{i=1}^n \tilde\omega_{i,R}^2(\tilde Y_i - X_i^\top\hat\beta_R)^2 X_iX_i^\top\right]\left(\sum_{i=1}^n \frac{m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)} X_iX_i^\top\right)^{-1}, \qquad (13)$$

where $\tilde\omega_{i,R} = m_0(Z_i;\hat\theta_n)/(1-\hat G_n(\tilde Y_i))$.
Theorem 2.2. Under the same assumptions as Theorem 2.1, we have

$$n\hat V_{JR} \overset{p}{\longrightarrow} V_R.$$
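In code, our reading of the sandwich form (13), with $\tilde\omega_{i,R}$ as defined there, is simply:

```python
import numpy as np

def V_jack_R(X, Y, beta_R_hat, w):
    """Jackknife variance estimate (13) in sandwich form A^{-1} B A^{-1}, with
    A = sum_i w_i X_i X_i^T and B = sum_i w_i^2 (Y_i - X_i^T beta)^2 X_i X_i^T,
    where w_i = m0(Z_i; theta_hat_n) / (1 - G_hat(Y_i))."""
    A = (X * w[:, None]).T @ X
    r2 = (Y - X @ beta_R_hat) ** 2
    B = (X * ((w**2) * r2)[:, None]).T @ X
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv      # Theorem 2.2: n * V_jack_R -> V_R
```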
3. WLS imputation estimator

In the analysis of missing data, imputation is a common method for handling datasets containing missing values. Once all missing values have been imputed, the dataset can be analyzed using standard techniques for complete data. Motivated by the fact that $E[\xi\delta + (1-\xi)m(Z)] = E[\delta]$ under the MAR assumption, we can impute each missing $\delta_i$ with $m_0(Z_i;\hat\theta_n)$ in expression (3). That is, we can define the following WLS estimating function:

$$Q_I(\beta) = \sum_{i=1}^n \omega_{i,I}\,(\tilde Y_i - \beta_1 X_{i1} - \cdots - \beta_d X_{id})^2,$$

where

$$\omega_{i,I} = \frac{\xi_i\delta_i + (1-\xi_i)\,m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)}.$$

Let $\hat\beta_I$ be the minimizer of $Q_I(\beta)$. Then

$$\hat\beta_I = \left(\sum_{i=1}^n \omega_{i,I} X_i X_i^\top\right)^{-1} \sum_{i=1}^n \omega_{i,I} X_i \tilde Y_i. \qquad (14)$$
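Relative to the calibration sketch, only the weights change: keep $\delta_i$ where it is observed and plug in $m_0(Z_i;\hat\theta_n)$ where it is missing. A sketch of (14), reusing `fit_m0_logistic` and `G_hat` from the earlier sketches:

```python
import numpy as np

def beta_I(X, Y, xi, delta, Z, h):
    """WLS imputation estimator (14)."""
    theta = fit_m0_logistic(Z, delta, xi)
    m0 = 1.0 / (1.0 + np.exp(-(Z @ theta)))
    d_imp = xi * delta + (1.0 - xi) * m0              # imputed indicator
    w = d_imp / np.maximum(1.0 - G_hat(Y, xi, delta, h), 1.0 / len(Y))
    A = (X * w[:, None]).T @ X
    return np.linalg.solve(A, X.T @ (w * Y))
```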
Theorem 3.1. Under the same assumptions as Theorem 2.1,

$$\sqrt n\,(\hat\beta_I - \beta) \overset{L}{\longrightarrow} N(0, V_I),$$

where $V_I = \Sigma^{-1}\Omega_I\Sigma^{-1}$, $\Omega_I = \Omega_R + \Omega_{I1}$, and $\Omega_{I1}$ is defined in (B.31) of the Appendix.

Similar to Section 2, we can estimate $V_I$ by the jackknife method. Clearly $\hat\beta_I$ has a larger asymptotic variance than $\hat\beta_R$; hence $\hat\beta_R$ is asymptotically more efficient than $\hat\beta_I$. But, as we will show in the simulation section, $\hat\beta_I$ performs much better than $\hat\beta_R$ in terms of bias. If all failure indicators are fully observed, that is, all $\xi_i$ equal 1, then $\hat\beta_I$ reduces to the WLS estimator of Stute (1996).
4. WLS inverse probability weighted estimator

An alternative approach to dealing with missing data is based on inverse probability weighting (IPW), which was proposed by Horvitz and Thompson (1952). Let $\pi(\tilde y) = E(\xi \mid \tilde Y = \tilde y)$, which can be estimated nonparametrically by

$$\pi_n(\tilde y) = \frac{\sum_{i=1}^n \xi_i W\left(\frac{\tilde y - \tilde Y_i}{b_n}\right)}{\sum_{i=1}^n W\left(\frac{\tilde y - \tilde Y_i}{b_n}\right)}, \qquad (15)$$
where $W(\cdot)$ is a kernel function and $b_n$ is a bandwidth sequence. Here, we substitute $\delta_i$ in expression (2) with

$$\frac{\xi_i\delta_i}{\pi_n(\tilde Y_i)} + \left(1 - \frac{\xi_i}{\pi_n(\tilde Y_i)}\right) m_0(Z_i;\hat\theta_n).$$

Define

$$\omega_{i,W} = \frac{\dfrac{\xi_i\delta_i}{\pi_n(\tilde Y_i)} + \left(1 - \dfrac{\xi_i}{\pi_n(\tilde Y_i)}\right) m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)}.$$

The IPW type estimator of $\beta$ is then

$$\hat\beta_W = \left(\sum_{i=1}^n \omega_{i,W} X_i X_i^\top\right)^{-1} \sum_{i=1}^n \omega_{i,W} X_i \tilde Y_i, \qquad (16)$$

and $\hat\beta_W$ is the vector that minimizes

$$Q_W(\beta) = \sum_{i=1}^n \omega_{i,W}\,(\tilde Y_i - \beta_1 X_{i1} - \cdots - \beta_d X_{id})^2.$$
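A sketch of (15)–(16), again reusing the earlier helpers; the rectangular kernel $W$ and bandwidth $b_n$ follow Section 5, and the floors on $\pi_n$ and $1-\hat G_n$ are our numerical safeguards:

```python
import numpy as np

def beta_W(X, Y, xi, delta, Z, h, b):
    """WLS inverse probability weighted estimator (16)."""
    W = lambda t: 0.5 * (np.abs(t) <= 1)                      # rectangular kernel
    Wmat = W((Y[:, None] - Y[None, :]) / b)
    pi_n = (Wmat @ xi) / np.maximum(Wmat.sum(axis=1), 1e-12)  # pi_n(Y_i), eq. (15)
    pi_n = np.maximum(pi_n, 1e-3)                             # keep 1/pi_n bounded
    theta = fit_m0_logistic(Z, delta, xi)
    m0 = 1.0 / (1.0 + np.exp(-(Z @ theta)))
    d_aug = xi * delta / pi_n + (1.0 - xi / pi_n) * m0        # augmented indicator
    w = d_aug / np.maximum(1.0 - G_hat(Y, xi, delta, h), 1.0 / len(Y))
    A = (X * w[:, None]).T @ X
    return np.linalg.solve(A, X.T @ (w * Y))
```

The double robustness discussed below is visible in `d_aug`: if $\pi_n$ is consistent, the $\xi\delta/\pi_n$ term alone centers correctly whatever $m_0$ is; if instead $m_0$ is correct, the augmentation term removes the error in $\pi_n$.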
The WLS IPW estimator has the "double robustness" property: $\hat\beta_W$ is consistent as long as $m_0(Z_i,\theta)$ is specified correctly or $E[\xi \mid \tilde Y] = E[\xi \mid \tilde Y, X, \delta]$. If $\pi_n(\tilde y)$ is replaced by the kernel estimator of $E[\xi \mid \tilde Y, X]$, the estimator $\hat\beta_W$ is always consistent even if $m_0(Z_i,\theta)$ is misspecified; however, this estimator suffers from the "curse of dimensionality".

Theorem 4.1. Under the same assumptions as Theorem 2.1,

$$\sqrt n\,(\hat\beta_W - \beta) \overset{L}{\longrightarrow} N(0, V_W),$$

where $V_W = \Sigma^{-1}\Omega_W\Sigma^{-1}$ and $\Omega_W = \Omega_R + \Omega_{W1}$ with $\Omega_{W1}$ defined in (B.34) in the Appendix.

Similarly, we can estimate $V_W$ by the jackknife method. Obviously, the asymptotic variance $V_W$ is larger than $\hat\beta_R$'s. Noting that $a^\top(\Omega_{W1} - \Omega_{I1})a > 0$ for any $d$-vector $a$, the asymptotic variance $V_W$ is also larger than $\hat\beta_I$'s. However, the simulations in Section 5 show that $\hat\beta_W$ has a smaller bias than $\hat\beta_I$ and $\hat\beta_R$. Compared to $\hat\beta_I$ and $\hat\beta_R$, $\hat\beta_W$ has the advantage of the double robustness property.

5. Simulation study

Simulation studies were conducted to examine the finite sample properties of the proposed estimators. In the study, we calculated the WLS calibration estimator $\hat\beta_R$, the WLS imputation estimator $\hat\beta_I$ and the WLS IPW estimator $\hat\beta_W$. For comparison, we also calculated the WLS estimator $\hat\beta_{WLS}$ and the estimator $\hat\beta_K$ suggested by Koul et al. (1981). Both $\hat\beta_{WLS}$ and $\hat\beta_K$ are actually unachievable, but we can treat them as "gold standards" for the purpose of comparison. Under the two cases of MAR and of a stronger assumption than MAR, $E[\xi \mid \tilde Y, X, \delta] = E[\xi \mid \tilde Y]$, we compare the proposed estimators with those due to Wang and Dinse (2011). We denote their estimators as follows: $\tilde\beta_R$, the KSV type regression calibration estimator; $\tilde\beta_I$, the KSV type imputation estimator; $\tilde\beta_W$, the KSV type IPW estimator. We first consider the following linear model:
$$\text{Model (a):}\qquad Y = \alpha + \beta X + \varepsilon,$$

where the true parameter values are $\alpha = 3$ and $\beta = 0.2$. The covariate $X$ was generated from $U(0,1)$. Let $C_0 \sim \exp(1)$; then $C$ was generated as $C_0 + \mu$, and we varied $\mu$ to obtain different censoring rates (CR). For Model (a), we conducted simulations in the following two cases to show how the estimators perform and to compare the proposed estimators with the existing estimators.

Case 1.1: The errors $\varepsilon$ are generated from $N(0,1)$. We considered $P(\xi = 1 \mid \tilde Y, X, \delta) = P(\xi = 1 \mid \tilde Y)$, a missing mechanism assumption stronger than MAR which was used in Wang and Dinse (2011), and assumed $\mathrm{logit}(E(\xi \mid \tilde Y = \tilde y)) = \mathrm{logit}(\pi(\tilde y)) = \theta_1 + \theta_2\tilde y$.

Case 1.2: The errors $\varepsilon$ are generated from $N(0,1)$. We considered the MAR missing mechanism, and assumed $\mathrm{logit}(E(\xi \mid \tilde Y = \tilde y, X = x)) = \theta_1 + \theta_2\tilde y + \theta_3 x^2$.

In each case, a different $\theta$ yields a different missing rate (MR); Table 1 lists all the combinations of $\mu$ and $\theta$ used in the simulation. $m_0(z,\theta)$ was taken to be the logistic model, with parameters estimated by maximum likelihood (ML). To calculate the proposed estimators, the kernel functions $W(\cdot)$ and $K(\cdot)$ were taken to be $W(u) = 1/2$ if $|u| \le 1$ and 0 otherwise, and $K(u) = (15/16)(1 - 2u^2 + u^4)$ if $|u| \le 1$ and 0 otherwise. The bandwidths $(h_n, b_n)$ were taken to be $(n^{-1/3}, n^{-1/3})$.
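For concreteness, one Monte Carlo replication under Case 1.1 can be generated as follows (a sketch; the function name and RNG plumbing are ours, with $\mu$ and $\theta$ taken from Table 1):

```python
import numpy as np

def simulate_case_1_1(n, mu, theta, rng):
    """Generate (X, Y~, delta, xi) from Model (a) under Case 1.1."""
    X = rng.uniform(0.0, 1.0, n)
    Y = 3.0 + 0.2 * X + rng.standard_normal(n)        # alpha = 3, beta = 0.2
    C = rng.exponential(1.0, n) + mu                  # C = C0 + mu, C0 ~ exp(1)
    Y_obs = np.minimum(Y, C)                          # observed Y~ = Y ^ C
    delta = (Y <= C).astype(float)                    # censoring indicator
    p = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * Y_obs)))
    xi = (rng.uniform(size=n) < p).astype(float)      # missingness indicator
    return X, Y_obs, delta, xi

# e.g. the (n, CR, MR) = (50, 20%, 20%) cell of Table 1:
# rng = np.random.default_rng(0)
# X, Y_obs, delta, xi = simulate_case_1_1(50, 3.27, (1.6, 0.06), rng)
```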
Table 1
Simulation parameters: μ in the exponential censoring variable distribution and the regression coefficients θ in the logistic missingness model for Cases 1.1 and 1.2.

Cases      CR (%)   μ      θ (MR = 20%)         θ (MR = 40%)
Case 1.1   20       3.27   (1.6, 0.06)          (0.485, 0.02)
           40       2.56   (0.89, 0.18)         (0.08, 0.12)
Case 1.2   20       3.27   (0.78, 0.02, 2.5)    (0.05, 0.07, 2.1)
           40       2.53   (0.75, 0.09, 3.8)    (0.05, 0.27, 3.8)
Table 2
Bias and square error (SE) of the estimators in Case 1.1.

n    CR (%)  MR (%)        β̂_K      β̃_R      β̃_I      β̃_W      β̂_WLS    β̂_R      β̂_I      β̂_W
50   20      20      Bias  0.0635   0.1316   0.0556   0.0362   0.0145   0.0467   0.0185   0.0128
                     SE    1.3100   1.0761   1.4367   1.5664   0.6249   0.4834   0.6268   0.6725
             40      Bias  0.0502   0.1143   0.0536   0.0127   0.0274   0.0547   0.0324   0.0195
                     SE    1.3157   1.2235   1.5166   1.8176   0.6271   0.4989   0.6229   0.7264
     40      20      Bias  0.1243   0.2453   0.1088   0.0811   0.0334   0.0866   0.0405   0.0314
                     SE    1.9859   1.5899   2.4024   2.6449   0.7519   0.4692   0.7864   0.8671
             40      Bias  0.1104   0.2173   0.1204   0.0658   0.0349   0.0832   0.0512   0.0360
                     SE    1.9632   1.8120   2.4302   3.0238   0.7561   0.4898   0.7462   0.9341
100  20      20      Bias  0.0603   0.1304   0.0537   0.0300   0.0137   0.0400   0.0150   0.0072
                     SE    1.0056   0.8015   1.1029   1.2173   0.4602   0.3432   0.4719   0.5143
             40      Bias  0.0569   0.1374   0.0692   0.0322   0.0103   0.0461   0.0196   0.0045
                     SE    1.0146   0.8938   1.1498   1.4212   0.4600   0.3525   0.4585   0.5509
     40      20      Bias  0.0803   0.2229   0.0633   0.0339   0.0346   0.0878   0.0392   0.0304
                     SE    1.5621   1.1453   1.8588   2.0573   0.5667   0.3299   0.6032   0.6655
             40      Bias  0.0821   0.2243   0.1107   0.0473   0.0318   0.0867   0.0433   0.0231
                     SE    1.5454   1.2944   1.8473   2.3326   0.5751   0.3417   0.5724   0.7214
Table 3
Bias and square error (SE) of the estimators in Case 1.2.

n    CR (%)  MR (%)   β̃_W Bias (SE)     β̂_W Bias (SE)
50   20      20       0.0882 (1.5666)   0.0461 (0.6766)
             40       0.3308 (1.8394)   0.0878 (0.7298)
     40      20       0.2161 (2.7028)   0.0514 (0.8554)
             40       0.2509 (3.0618)   0.0899 (0.9402)
100  20      20       0.0766 (1.2136)   0.0332 (0.5107)
             40       0.3064 (1.4677)   0.0824 (0.5538)
     40      20       0.2034 (2.0713)   0.0509 (0.6669)
             40       0.2444 (2.4279)   0.0865 (0.7402)
For each case, we generated 10,000 Monte Carlo random samples under every combination of sample size, censoring rate and missing rate. The simulation results are presented in Tables 2 and 3 to compare the bias and SE of the estimators mentioned above. From Table 2, it is clear that both $\hat\beta_K$ and $\hat\beta_{WLS}$ do not depend on the missing mechanism, which is reasonable as they use all the $\delta$ values. The bias and SE of all the estimators decrease as the sample size increases for every combination of censoring and missing rates. The proposed estimators uniformly outperform the estimators suggested by Wang and Dinse in terms of bias and standard error (SE), especially when the censoring or missing rate becomes higher or the sample size is smaller. Both $\hat\beta_W$ and $\hat\beta_I$ have smaller bias but larger SE than $\hat\beta_R$, and $\hat\beta_W$ performs better than $\hat\beta_I$ in terms of bias but has larger SE. It is interesting to find that the proposed estimators even perform better than the KSV estimator $\hat\beta_K$, one of the gold standards, and that $\hat\beta_R$ has smaller SE but larger bias than $\hat\beta_{WLS}$, the other gold standard. In particular, Wang and Dinse's (2011) estimators have serious bias when the censoring rate is large. From Table 3, $\hat\beta_W$ has far smaller bias and SE than $\tilde\beta_W$. We also ran simulations to see how the proposed estimators perform when the error distribution changes or the number of parameters increases, and obtained similar results; to save space, we do not present them here.
Table 4
The proposed parameter estimates (and jackknife estimates of squared errors) based on the breast cancer dataset.

Estimator   Treatment (β1)   ≥4 nodes positive (β2)   Estrogen receptor (+) (β3)   Primary tumor ≥3 cm (β4)
β̂_R         0.2316 (0.37)    0.6439 (0.36)            2.6091 (0.34)                0.3060 (0.36)
β̂_I         0.1801 (0.72)    0.6353 (0.53)            2.6443 (0.43)                0.2995 (0.66)
β̂_W         0.1515 (0.67)    0.7171 (0.54)            2.6766 (0.50)                0.2182 (0.62)
6. Breast cancer application

We consider the dataset in Cummings et al. (1986): data on 169 elderly women with stage II breast cancer from the ECOG clinical trial E1178 comparing tamoxifen and placebo. Among them, we restrict our attention to the 79 women who died by the end of the trial. Unfortunately, the cause-of-death information is incomplete: of these 79 women, 44 died from breast cancer, 17 died from other known causes, and the remaining 18 died of unknown causes. In this example, we use the logarithm of the survival days, denoted by $\tilde Y$. Let $\delta$ be an indicator showing whether death was caused by cancer, and $\xi$ an indicator of whether the cause of death was known. The data file has four covariates, $(X_1, X_2, X_3, X_4)$: $X_1$ indicates whether the patient received treatment (1, treatment; 0, placebo); $X_2$ denotes whether four or more axillary lymph nodes were positive (1, yes; 0, no); $X_3$ indicates whether estrogen receptor status is positive (1, yes; 0, unknown); $X_4$ indicates whether the primary tumor is 3 cm or larger (1, yes; 0, no). Therefore, the model is

$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon,$$

where $Y$ is the log of the time to death due to breast cancer. All the parameter estimates and jackknife variance estimates of $\beta$ are exhibited in Table 4. All the estimates are positive. Under the assumed model, time to breast-cancer death was not significantly affected by whether a woman received tamoxifen or placebo; whether the primary tumor is 3 cm or larger does not make a statistically significant difference either. This result is consistent with other researchers' results, for example Cummings et al. (1986), but our results give a direct inference on survival time. Although the proposed methods all lead to the same conclusion, the estimated standard error of the regression calibration estimator is the smallest and that of the inverse probability weighted estimator is the largest, which is consistent with the simulation results.
Acknowledgment

Wang's research was supported by the National Science Fund for Distinguished Young Scholars in China (10725106), the National Natural Science Foundation of China (10671198, 11171331), the National Science Fund for Creative Research Groups in China, a grant from the Key Lab of Random Complex Structures and Data Science, CAS, and the key grant from Yunnan Province (2010CC003).
Appendix A. Assumptions

Condition C: Define

$$\nabla[m(Z;\theta)] = \left(\frac{\partial m(Z;\theta)}{\partial\theta_1}, \frac{\partial m(Z;\theta)}{\partial\theta_2}, \ldots, \frac{\partial m(Z;\theta)}{\partial\theta_d}, \frac{\partial m(Z;\theta)}{\partial\theta_{d+1}}\right)^\top$$

and

$$I(\theta) = E\left\{\frac{\xi\,\nabla m_0(Z,\theta)\,\nabla^\top m_0(Z,\theta)}{m_0(Z,\theta)[1-m_0(Z,\theta)]}\right\},$$

and assume the matrix $I(\theta_0) > 0$.

(C.1) $E\left[\dfrac{\|X\|^2(\tilde Y - X^\top\beta)^2}{\bar H^2(\tilde Y)}\right] < \infty$, where $H(t) = P(\tilde Y \le t)$ and $\bar H(t) = 1 - H(t)$. Define $\tau_H = \inf\{t: H(t) = 1\}$. There exists a $\tau_0 \in (0,\tau_H)$ such that $H(t)$ is continuous on $[\tau_0,\tau_H)$, and the matrices $\Sigma_R(\tau)$, $\Sigma_I(\tau)$, $\Sigma_W(\tau)$ given in Appendix B are positive definite for each $\tau \in [\tau_0,\tau_H)$.
(C.2) $E\|X\|^2 < \infty$.
(C.3) $\pi(\cdot)$ has a bounded derivative of order 1.
(C.4) $m(\cdot)$ has a bounded derivative of order 1.
(C.5) $A(\theta)$ is a positive definite matrix.
(C.6) $W(\cdot)$ is a kernel function of order 1 with bounded support.
(C.7) $nh_n \to \infty$ and $nh_n^2 \to 0$.
(C.8) $nb_n \to \infty$ and $nb_n^2 \to 0$.
Appendix B. Proofs of Theorems 2.1, 2.2, 3.1 and 4.1

To prove Theorem 2.1, the following lemma is needed.

Lemma 1. Under Condition C,

$$\sup_{0\le t\le\tau_0}\left|\frac{\hat G_n(t)-G(t)}{1-\hat G_n(t)}\right| \overset{a.s.}{\longrightarrow} 0. \qquad (B.1)$$

Proof. Write $S(t) = 1-G(t)$ and $\hat S_n(t) = 1-\hat G_n(t)$, so that

$$\frac{1-G(t)}{1-\hat G_n(t)} = 1 + \frac{\hat G_n(t)-G(t)}{1-\hat G_n(t)} = \frac{S(t)}{\hat S_n(t)}.$$

By Taylor expansion, with $\Lambda(t) = -\log S(t)$,

$$\frac{S(t)}{\hat S_n(t)} = \exp\left(\log\frac{S(t)}{\hat S_n(t)}\right) = \exp(-\Lambda(t) - \log\hat S_n(t)) = 1 - (\log\hat S_n(t) + \Lambda(t)) + \frac{[\log\hat S_n(t) + \Lambda(t)]^2}{2}\exp(w_n^*(t)),$$

where $w_n^*(t) \in (0, -\Lambda(t) - \log\hat S_n(t))$. By Wang and Ng (2008), we have

$$\sup_{0\le t\le\tau_0}|\log\hat S_n(t) + \hat\Lambda_n(t)| \overset{a.s.}{\longrightarrow} 0, \qquad \sup_{0\le t\le\tau_0}|\hat\Lambda_n(t) - \Lambda(t)| \overset{a.s.}{\longrightarrow} 0.$$

Lemma 1 is then proved. □
Proof of Theorem 2.1. We can write

$$\sqrt n\,(\hat\beta_R - \beta) = S_n^{-1} A_n,$$

where

$$S_n = \frac1n\sum_{i=1}^n \frac{m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)} X_i X_i^\top, \qquad (B.2)$$

$$A_n = \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)\left(\frac{m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)}{1-\hat G_n(\tilde Y_i)} + \frac{m_0(Z_i;\theta_0)}{1-\hat G_n(\tilde Y_i)}\right). \qquad (B.3)$$

Define

$$S_n(\tau_0) = \frac1n\sum_{i=1}^n \frac{m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)} X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} = \frac1n\sum_{i=1}^n \left[\frac{m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)}{1-\hat G_n(\tilde Y_i)} + \frac{m_0(Z_i;\theta_0)}{1-\hat G_n(\tilde Y_i)}\right]X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} =: S_{n1}(\tau_0) + S_{n2}(\tau_0). \qquad (B.4)$$

We can show that

$$S_{n1}(\tau_0) = \frac1n\sum_{i=1}^n X_iX_i^\top[m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)]I_{\{\tilde Y_i\le\tau_0\}}\left[\frac1{1-G(\tilde Y_i)} + \frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{[1-G(\tilde Y_i)][1-\hat G_n(\tilde Y_i)]}\right] =: S_{n11}(\tau_0) + S_{n12}(\tau_0)$$

and

$$S_{n11}(\tau_0) = \frac1n\sum_{i=1}^n \frac{X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}}[m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)]}{1-G(\tilde Y_i)} = \frac1n\sum_{i=1}^n \frac{X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}}\nabla^\top m_0(Z_i;\theta_0)\,(\hat\theta_n - \theta_0)}{1-G(\tilde Y_i)} + o_p(1).$$

Since $\hat\theta_n$ is the ML estimator maximizing (5), by Taylor expansion we have

$$\hat\theta_n - \theta_0 = I^{-1}(\theta_0)\,\frac1n\sum_{i=1}^n \frac{\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} + o_p(1). \qquad (B.5)$$
Applying the law of large numbers, it then follows that

$$S_{n11}(\tau_0) = m_1(\tau_0)\,I^{-1}(\theta_0)\,\frac1n\sum_{i=1}^n \frac{\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} + o_p(1),$$

where

$$m_1(\tau_0) = E\left[\frac{XX^\top\,\nabla^\top m_0(Z;\theta_0)\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right]. \qquad (B.6)$$

According to our mechanism assumption, it is direct to prove

$$\frac1n\sum_{i=1}^n \frac{\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} = o_p(1).$$

This leads to $S_{n11}(\tau_0) = o_p(1)$. It is clear that

$$S_{n12}(\tau_0) \le \sup_{0\le\tilde Y_i\le\tau_0}\left|\frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{1-\hat G_n(\tilde Y_i)}\right| S_{n11}(\tau_0) = o_p(1).$$

Similarly,

$$S_{n2}(\tau_0) = \frac1n\sum_{i=1}^n X_iX_i^\top m_0(Z_i;\theta_0)\left[\frac1{1-G(\tilde Y_i)} + \frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{[1-G(\tilde Y_i)][1-\hat G_n(\tilde Y_i)]}\right]I_{\{\tilde Y_i\le\tau_0\}} =: S_{n21}(\tau_0) + S_{n22}(\tau_0). \qquad (B.7)$$

We have

$$S_{n21}(\tau_0) = E\left[\frac{XX^\top m_0(Z;\theta_0)\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right] + o_p(1), \qquad (B.8)$$

$$S_{n22}(\tau_0) \le \sup_{0\le\tilde Y_i\le\tau_0}\left|\frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{1-\hat G_n(\tilde Y_i)}\right|\cdot\left\{E\left[\frac{XX^\top m_0(Z;\theta_0)\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right] + o_p(1)\right\} = o_p(1). \qquad (B.9)$$

Eqs. (B.3)–(B.7) lead to

$$S_n^{-1}(\tau_0) = \left\{E\left[\frac{XX^\top m_0(Z;\theta_0)\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right]\right\}^{-1} + o_p(1). \qquad (B.10)$$
Define $A_n(\tau_0)$ as

$$A_n(\tau_0) = \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)I_{\{\tilde Y_i\le\tau_0\}}\left[\frac{m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)}{1-\hat G_n(\tilde Y_i)} + \frac{m_0(Z_i;\theta_0)}{1-\hat G_n(\tilde Y_i)}\right]$$
$$= \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)I_{\{\tilde Y_i\le\tau_0\}}\left[\frac{m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)} + \frac{m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\cdot\frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{1-\hat G_n(\tilde Y_i)} + \frac{m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\right] + o_p(1)$$
$$=: A_{n1}(\tau_0) + A_{n2}(\tau_0) + A_{n3}(\tau_0) + o_p(1).$$

Set

$$m_2(\tau_0) = E\left[\frac{X(\tilde Y - X^\top\beta)\,\nabla^\top m_0(Z;\theta_0)\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right].$$

Under conditions (C.4)–(C.6),

$$A_{n1}(\tau_0) = \frac1{\sqrt n}\sum_{i=1}^n \frac{m_2(\tau_0)\,I^{-1}(\theta_0)\,\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)\,I_{\{\tilde Y_i\le\tau_0\}}}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} + o_p(1). \qquad (B.11)$$

Under the assumed missingness mechanism, we have

$$E\left\{\frac{m_2(\tau_0)\,I^{-1}(\theta_0)\,\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)\,I_{\{\tilde Y_i\le\tau_0\}}}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]}\right\} = 0. \qquad (B.12)$$
By the central limit theorem,

$$A_{n1}(\tau_0) \overset{L}{\longrightarrow} N(0,\Omega_1(\tau_0)), \qquad (B.13)$$

where $\Omega_1(\tau_0) = m_2(\tau_0)\,I^{-1}(\theta_0)\,m_2^\top(\tau_0)$.

Set $\tilde H_0(t) = P(\tilde Y_i > t, \delta_i = 0)$, and define

$$\psi(\tilde Y_i,\delta_i,\xi_i;t) = \int_0^{\tilde Y_i\wedge t}\frac{d\tilde H_0(s)}{[1-H(s)]^2} - \frac{I_{[\tilde Y_i\le t,\,\delta_i=0]}}{1-H(\tilde Y_i)} + \frac{[\xi_i - \pi(\tilde Y_i)][\delta_i - m(\tilde Y_i)]\,I_{[\tilde Y_i\le t]}}{\pi(\tilde Y_i)[1-H(\tilde Y_i)]}. \qquad (B.14)$$

According to Wang and Ng (2008), we have

$$\hat G_n(t) - G(t) = \frac{1-G(t)}{n}\sum_{i=1}^n \psi(\tilde Y_i,\delta_i,\xi_i;t) + o_p(n^{-1/2}). \qquad (B.15)$$

$A_{n2}(\tau_0)$ can be written as

$$A_{n2}(\tau_0) = \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)I_{\{\tilde Y_i\le\tau_0\}}\frac{m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\cdot\frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{1-G(\tilde Y_i)} + o_p(1) = \frac1{n\sqrt n}\sum_{i=1}^n\sum_{j=1}^n X_i(\tilde Y_i - X_i^\top\beta)I_{\{\tilde Y_i\le\tau_0\}}\frac{m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\,\psi(\tilde Y_j,\delta_j,\xi_j;\tilde Y_i) + o_p(1).$$

Define

$$h(X_i,\tilde Y_i,\delta_i,\xi_i;X_j,\tilde Y_j,\delta_j,\xi_j;\tau_0) = X_i(\tilde Y_i - X_i^\top\beta)I_{\{\tilde Y_i\le\tau_0\}}\frac{m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\,\psi(\tilde Y_j,\delta_j,\xi_j;\tilde Y_i) + X_j(\tilde Y_j - X_j^\top\beta)I_{\{\tilde Y_j\le\tau_0\}}\frac{m_0(Z_j;\theta_0)}{1-G(\tilde Y_j)}\,\psi(\tilde Y_i,\delta_i,\xi_i;\tilde Y_j) \qquad (B.16)$$

and

$$U_n(\tau_0) = \frac1{n\sqrt n}\sum_{i<j} h(X_i,\tilde Y_i,\delta_i,\xi_i;X_j,\tilde Y_j,\delta_j,\xi_j;\tau_0). \qquad (B.17)$$

Under the assumed missingness mechanism,

$$E[\psi(\tilde Y_i,\delta_i,\xi_i;\tilde Y_j)\mid\tilde Y_j] = 0 \quad\text{for } i\ne j,$$

so for any $d$-vector of constants $a$, letting $h_a(\cdot) = a^\top h(\cdot)$, we have

$$E[h_a(X_1,\tilde Y_1,\delta_1,\xi_1;X_2,\tilde Y_2,\delta_2,\xi_2;\tau_0)] = 0.$$

As a result of the relationship

$$E[\psi^2(\tilde Y_i,\delta_i,\xi_i;\tilde Y_j)\mid\tilde Y_j] = [1-G(\tilde Y_j)]^2\int_0^{\tilde Y_j}\frac{d\tilde H_0(s)}{[\bar H(s)]^2} \le \frac{[1-G(\tilde Y_j)]^2}{\bar H^2(\tilde Y_j)} \quad\text{for } i\ne j,$$

under condition (C.1) it follows that

$$E[h_a^2(X_1,\tilde Y_1,\delta_1,\xi_1;X_2,\tilde Y_2,\delta_2,\xi_2;\tau_0)] < \infty.$$

Let $U_{na}(\tau_0) = a^\top U_n(\tau_0)$ for any $d$-vector $a$. By the central limit theorem for U-statistics, we have

$$U_{na}(\tau_0) \overset{L}{\longrightarrow} N(0, a^\top\Omega_2(\tau_0)a), \qquad (B.18)$$

where

$$\Omega_2(\tau_0) = E[g(X_1,\tilde Y_1,\delta_1,\xi_1;\tau_0)\,g^\top(X_1,\tilde Y_1,\delta_1,\xi_1;\tau_0)] \qquad (B.19)$$

with

$$g(X_1,\tilde Y_1,\delta_1,\xi_1;\tau_0) = E\left\{\frac{X_2\,m_0(Z_2;\theta_0)(\tilde Y_2 - X_2^\top\beta)\,I_{\{\tilde Y_2\le\tau_0\}}\,\psi(\tilde Y_1,\delta_1,\xi_1;\tilde Y_2)}{1-G(\tilde Y_2)}\;\Big|\;X_1,\tilde Y_1,\delta_1,\xi_1\right\}.$$

That is,

$$A_{n2}(\tau_0) \overset{L}{\longrightarrow} N(0,\Omega_2(\tau_0)). \qquad (B.20)$$

Note that $A_{n3}(\tau_0)$ is a sum of i.i.d. variables. By the central limit theorem, we get

$$A_{n3}(\tau_0) \overset{L}{\longrightarrow} N(0,\Omega_3(\tau_0)), \qquad (B.21)$$
where

$$\Omega_3(\tau_0) = E\left[\frac{m_0^2(Z;\theta_0)\,XX^\top(\tilde Y - X^\top\beta)^2\,I_{\{\tilde Y\le\tau_0\}}}{[1-G(\tilde Y)]^2}\right]. \qquad (B.22)$$

As for the covariances, under the assumed missingness mechanism we have

$$\mathrm{Cov}(A_{n1}(\tau_0),A_{n3}(\tau_0)) = I^{-1}(\theta_0)\,E\left[\frac{m_2(\tau_0)\,\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]}\cdot\frac{X_i^\top(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\,m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\right] = 0, \qquad (B.23)$$

$$\mathrm{Cov}(A_{n1}(\tau_0),A_{n2}(\tau_0)) = \frac{n-1}{n}\,I^{-1}(\theta_0)\,E\left[\frac{m_2(\tau_0)\,\xi_1[\delta_1 - m_0(Z_1;\theta_0)]\,\nabla m_0(Z_1;\theta_0)}{m_0(Z_1;\theta_0)[1-m_0(Z_1;\theta_0)]}\,h^\top(X_1,\tilde Y_1,\delta_1,\xi_1;X_2,\tilde Y_2,\delta_2,\xi_2;\tau_0)\right]$$
$$= \frac{n-1}{n}\,I^{-1}(\theta_0)\,E\left\{\frac{m_2(\tau_0)\,I_{\{\tilde Y_1\le\tau_0\}}\,\xi_1[\delta_1 - m_0(Z_1;\theta_0)]\,\nabla m_0(Z_1;\theta_0)}{m_0(Z_1;\theta_0)[1-m_0(Z_1;\theta_0)]}\,E\left[\frac{X_2^\top(\tilde Y_2 - X_2^\top\beta)\,I_{\{\tilde Y_2\le\tau_0\}}\,m_0(Z_2;\theta_0)}{1-G(\tilde Y_2)}\,\psi(\tilde Y_1,\delta_1,\xi_1;\tilde Y_2)\;\Big|\;X_1,\tilde Y_1,\delta_1,\xi_1\right]\right\}$$
$$\longrightarrow I^{-1}(\theta_0)\,E\left[\frac{m_0(Z_2;\theta_0)(\tilde Y_2 - X_2^\top\beta)\,X_2\,m_2^\top(\tau_0)\,I_{\{\tilde Y_1\le\tilde Y_2\}}\,I_{\{\tilde Y_1\le\tau_0\}}\,I_{\{\tilde Y_2\le\tau_0\}}}{[1-G(\tilde Y_2)][1-H(\tilde Y_1)]}\right] =: \Omega_{1,2}(\tau_0). \qquad (B.24)$$

$$\mathrm{Cov}(A_{n2},A_{n3}) = E\left[\frac1{\sqrt n}\sum_{i=1}^n\frac{m_0(Z_i;\theta_0)}{1-G(\tilde Y_i)}\,X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\cdot\frac1{n\sqrt n}\sum_{i<j}h^\top(X_i,\tilde Y_i,\delta_i,\xi_i;X_j,\tilde Y_j,\delta_j,\xi_j;\tau_0)\right]$$
$$= \frac{n(n-1)}{n^2}\,E\left\{\frac{X_1(\tilde Y_1 - X_1^\top\beta)\,I_{\{\tilde Y_1\le\tau_0\}}\,m_0(Z_1;\theta_0)}{1-G(\tilde Y_1)}\cdot\frac{X_2^\top(\tilde Y_2 - X_2^\top\beta)\,I_{\{\tilde Y_2\le\tau_0\}}\,m_0(Z_2;\theta_0)}{1-G(\tilde Y_2)}\,E[\psi(\tilde Y_2,\delta_2,\xi_2;\tilde Y_1)\mid\tilde Y_1]\right\}$$
$$\longrightarrow E\left\{\frac{X_1(\tilde Y_1 - X_1^\top\beta)\,m_0(Z_1;\theta_0)\,I_{\{\tilde Y_1\le\tau_0\}}}{1-G(\tilde Y_1)}\cdot\frac{X_2^\top(\tilde Y_2 - X_2^\top\beta)\,m_0(Z_2;\theta_0)\,I_{\{\tilde Y_2\le\tau_0\}}}{1-G(\tilde Y_2)}\left(\int_0^{\tilde Y_1\wedge\tilde Y_2}\frac{d\tilde H_0(s)}{[1-H(s)]^2} - \frac{I(\tilde Y_2\le\tilde Y_1,\,\delta_2=0)}{1-H(\tilde Y_1)}\right)\right\} =: \Omega_{2,3}(\tau_0). \qquad (B.25)$$

Combining Eqs. (B.10) and (B.19)–(B.24), we get

$$A_n(\tau_0) \overset{L}{\longrightarrow} N(0,\Omega_R(\tau_0)), \qquad (B.26)$$

where

$$\Omega_R(\tau_0) = \Omega_1(\tau_0) + \Omega_2(\tau_0) + \Omega_3(\tau_0) + 2\Omega_{2,3}(\tau_0) + 2\Omega_{1,2}(\tau_0). \qquad (B.27)$$

We adopt the notation $\Omega_1 := \Omega_1(\tau_H)$, $\Omega_2 := \Omega_2(\tau_H)$, $\Omega_3 := \Omega_3(\tau_H)$, $\Omega_{1,2} := \Omega_{1,2}(\tau_H)$, $\Omega_{2,3} := \Omega_{2,3}(\tau_H)$. Under Condition C, $A_n(\tau_0) - A_n(\tau_H) \overset{p}{\to} 0$, $\lim_{\tau_0\to\tau_H}\Omega_R(\tau_0) = \Omega_R$, and $\Sigma(\tau_0) - \Sigma \overset{p}{\to} 0$. This proves Theorem 2.1. □

Proof of Theorem 2.2. In the same spirit as the proof of Theorem 2.2 of Wang and Dinse (2011), we can prove Theorem 2.2. □

Proof of Theorem 3.1. We have
$$S_{nI}(\tau_0) = \frac1n\sum_{i=1}^n\frac{\xi_i\delta_i + (1-\xi_i)m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} = S_n(\tau_0) + \frac1n\sum_{i=1}^n\frac{\xi_i\delta_i - \xi_i m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}}$$
$$= S_n(\tau_0) + \frac1n\sum_{i=1}^n\frac{\xi_i[\delta_i - m_0(Z_i,\theta_0)]}{1-G(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} + \frac1n\sum_{i=1}^n\frac{\xi_i[m_0(Z_i,\theta_0) - m_0(Z_i,\hat\theta_n)]}{1-G(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} + o_p(1).$$

It is easy to see that

$$E\left[\frac{\xi_i[\delta_i - m_0(Z_i,\theta_0)]}{1-G(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}}\right] = 0.$$

By Taylor expansion and the asymptotic expression of $\hat\theta_n$, we have

$$\frac1n\sum_{i=1}^n\frac{\xi_i[m_0(Z_i,\hat\theta_n) - m_0(Z_i,\theta_0)]}{1-G(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} = \frac1n\sum_{i=1}^n\frac{m_3(\tau_0)\,I^{-1}(\theta_0)\,\xi_i I_{\{\tilde Y_i\le\tau_0\}}[\delta_i - m_0(Z_i;\theta_0)]}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} + o_p(1)$$
$$= E\left[\frac{m_3(\tau_0)\,I^{-1}(\theta_0)\,\xi\,I_{\{\tilde Y\le\tau_0\}}[\delta - m_0(Z;\theta_0)]}{m_0(Z;\theta_0)[1-m_0(Z;\theta_0)]}\right] + o_p(1) = o_p(1),$$

where

$$m_3(\tau_0) = E\left[\frac{XX^\top\,\nabla^\top m_0(Z;\theta_0)\,\xi\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right].$$

This shows

$$S_{nI}^{-1}(\tau_0) = S_n^{-1}(\tau_0) + o_p(1) = \Sigma^{-1}(\tau_0) + o_p(1). \qquad (B.28)$$

For $A_{nI}(\tau_0)$,

$$A_{nI}(\tau_0) = \frac1{\sqrt n}\sum_{i=1}^n\frac{\xi_i\delta_i + (1-\xi_i)m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)}\,X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}$$
$$= A_n(\tau_0) + \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\left\{\frac{\xi_i[\delta_i - m_0(Z_i;\theta_0)]}{1-G(\tilde Y_i)} + \frac{\xi_i[m_0(Z_i;\theta_0) - m_0(Z_i;\hat\theta_n)]}{1-G(\tilde Y_i)} + \frac{\xi_i[\delta_i - m_0(Z_i;\hat\theta_n)]}{1-G(\tilde Y_i)}\cdot\frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{1-\hat G_n(\tilde Y_i)}\right\} + o_p(1)$$
$$=: A_n(\tau_0) + B_{n1}(\tau_0) + B_{n2}(\tau_0) + B_{n3}(\tau_0) + o_p(1). \qquad (B.29)$$

Similar to $A_{n1}(\tau_0)$, we have

$$B_{n2}(\tau_0) = -\frac1{\sqrt n}\sum_{i=1}^n\frac{m_4(\tau_0)\,I^{-1}(\theta_0)\,\xi_i I_{\{\tilde Y_i\le\tau_0\}}[\delta_i - m_0(Z_i;\theta_0)]\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} + o_p(1),$$

where

$$m_4(\tau_0) = E\left[\frac{X(\tilde Y - X^\top\beta)\,\nabla^\top m_0(Z;\theta_0)\,\xi\,I_{\{\tilde Y\le\tau_0\}}}{1-G(\tilde Y)}\right].$$

By some calculations we have

$$B_{n1}(\tau_0) + B_{n2}(\tau_0) = \frac1{\sqrt n}\sum_{i=1}^n \xi_i I_{\{\tilde Y_i\le\tau_0\}}\left[\frac{X_i(\tilde Y_i - X_i^\top\beta)}{1-G(\tilde Y_i)} - \frac{m_4(\tau_0)\,I^{-1}(\theta_0)\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i,\theta_0)[1-m_0(Z_i,\theta_0)]}\right][\delta_i - m_0(Z_i;\theta_0)] + o_p(1) =: B_{n12}(\tau_0) + o_p(1).$$

By the central limit theorem, it follows that

$$B_{n12}(\tau_0) \overset{L}{\longrightarrow} N(0,\Omega_{I1}(\tau_0)), \qquad (B.30)$$

where

$$\Omega_{I1}(\tau_0) = E\{L(X,\tilde Y)\,L^\top(X,\tilde Y)\,I_{\{\tilde Y\le\tau_0\}}\,\pi(\tilde Y)\,m_0(Z;\theta)[1-m_0(Z;\theta)]\} \qquad (B.31)$$

and

$$L(X,\tilde Y) = \frac{X(\tilde Y - X^\top\beta)}{1-G(\tilde Y)} - \frac{m_4(\tau_0)\,I^{-1}(\theta_0)\,\nabla m_0(Z;\theta_0)}{m_0(Z;\theta_0)[1-m_0(Z;\theta_0)]}.$$

For $B_{n3}(\tau_0)$, we have

$$B_{n3}(\tau_0) = \frac1{n\sqrt n}\sum_{i=1}^n\sum_{j=1}^n X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\,\frac{\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\psi(\tilde Y_j,\delta_j,\xi_j;\tilde Y_i)}{1-G(\tilde Y_i)} + o_p(1) = \frac1{n\sqrt n}\sum_{i\ne j} X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\,\frac{\xi_i[\delta_i - m_0(Z_i;\theta_0)]\,\psi(\tilde Y_j,\delta_j,\xi_j;\tilde Y_i)}{1-G(\tilde Y_i)} + o_p(1).$$

Similar to Wang and Dinse (2011), we get

$$B_{n3}(\tau_0) = o_p(1). \qquad (B.32)$$

Under the assumed missingness mechanism, $\mathrm{cov}(A_{n1}(\tau_0),B_{n12}(\tau_0)) = 0$, $\mathrm{cov}(U_n,B_{n12}(\tau_0)) = 0$ and $\mathrm{cov}(A_{n3}(\tau_0),B_{n12}(\tau_0)) = 0$. Define $\Omega_I(\tau_0) = \Omega_R(\tau_0) + \Omega_{I1}(\tau_0)$ and $\Omega_I := \Omega_I(\tau_H)$. Under Condition C, $A_{nI}(\tau_0) - A_{nI}(\tau_H) \overset{p}{\to} 0$ and $\lim_{\tau_0\to\tau_H}\Omega_I(\tau_0) = \Omega_I$. This, together with (B.30)–(B.32), proves Theorem 3.1. □

Proof of Theorem 4.1. Similar to the proof of Theorem 2.1, we can show that
$$S_{nW}(\tau_0) = S_n(\tau_0) + \frac1n\sum_{i=1}^n\frac{\frac{\xi_i}{\pi(\tilde Y_i)}[\delta_i - m_0(Z_i;\theta_0)]}{1-G(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} + \frac1n\sum_{i=1}^n\frac{\frac{\xi_i}{\pi(\tilde Y_i)}[m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)]}{1-G(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} + \frac1n\sum_{i=1}^n\frac{\frac{\xi_i}{\pi(\tilde Y_i)}[\delta_i - m_0(Z_i;\theta_0)]}{1-G(\tilde Y_i)}\cdot\frac{\hat G_n(\tilde Y_i) - G(\tilde Y_i)}{1-\hat G_n(\tilde Y_i)}\,X_iX_i^\top I_{\{\tilde Y_i\le\tau_0\}} + o_p(1).$$

Similar to the proof of Theorem 3.1 in Wang and Dinse (2011), we can prove these remainder terms to be $o_p(1)$. So, $S_{nW}(\tau_0) = S_n(\tau_0) + o_p(1) = \Sigma(\tau_0) + o_p(1)$. For $A_{nW}(\tau_0)$, we have

$$A_{nW}(\tau_0) = \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\,\frac{\xi_i\delta_i/\hat\pi(\tilde Y_i) + [1 - \xi_i/\hat\pi(\tilde Y_i)]\,m_0(Z_i;\hat\theta_n)}{1-\hat G_n(\tilde Y_i)}$$
$$= A_n(\tau_0) + \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\,\frac{[\delta_i - m_0(Z_i;\theta_0)]\,\xi_i/\pi(\tilde Y_i)}{1-G(\tilde Y_i)} - \frac1{\sqrt n}\sum_{i=1}^n X_i(\tilde Y_i - X_i^\top\beta)\,I_{\{\tilde Y_i\le\tau_0\}}\,\frac{[m_0(Z_i;\hat\theta_n) - m_0(Z_i;\theta_0)]\,\xi_i/\pi(\tilde Y_i)}{1-G(\tilde Y_i)} + o_p(1)$$
$$=: A_n(\tau_0) + C_{n1}(\tau_0) + C_{n2}(\tau_0) + o_p(1),$$

where

$$C_{n2}(\tau_0) = -\frac1{\sqrt n}\sum_{i=1}^n\frac{\xi_i I_{\{\tilde Y_i\le\tau_0\}}[\delta_i - m_0(Z_i;\theta_0)]\,m_2(\tau_0)\,I^{-1}(\theta_0)\,\nabla m_0(Z_i;\theta_0)}{m_0(Z_i;\theta_0)[1-m_0(Z_i;\theta_0)]} + o_p(1). \qquad (B.33)$$

Hence, it follows from the central limit theorem that

$$C_{n1}(\tau_0) + C_{n2}(\tau_0) \overset{L}{\longrightarrow} N(0,\Omega_{W1}(\tau_0)),$$

where

$$\Omega_{W1}(\tau_0) = E\left\{\tilde L(X,\tilde Y,\tau_0)\,\tilde L^\top(X,\tilde Y,\tau_0)\,\frac{m_0(Z,\theta)[1-m_0(Z;\theta)]\,I_{\{\tilde Y\le\tau_0\}}}{\pi(\tilde Y)}\right\} \qquad (B.34)$$

and

$$\tilde L(X,\tilde Y,\tau_0) = \frac{X(\tilde Y - X^\top\beta)}{1-G(\tilde Y)} - \frac{\pi(\tilde Y)\,m_2(\tau_0)\,I^{-1}(\theta_0)\,\nabla m_0(Z;\theta_0)}{m_0(Z,\theta_0)[1-m_0(Z,\theta_0)]}.$$
It can be shown that $\mathrm{cov}(A_n(\tau_0), C_{n1}(\tau_0) + C_{n2}(\tau_0)) = 0$. Under condition (C.2), $A_{nW}(\tau_0) - A_{nW}(\tau_H) \overset{p}{\to} 0$ and $\lim_{\tau_0\to\tau_H}\Omega_W(\tau_0) = \Omega_W(\tau_H) := \Omega_W$. Since $\sqrt n\,(\hat\beta_W - \beta_0) = S_{nW}^{-1}(\tau_H)\,A_{nW}(\tau_H)$, Theorem 4.1 is proved. □

References

Bao, Y., He, S., Mei, C., 2007. The Koul–Susarla–Van Ryzin and weighted least square estimates for censored linear regression model: a comparative study. Computational Statistics & Data Analysis 51, 6488–6497.
Buckley, J., James, I., 1979. Linear regression with censored data. Biometrika 66, 429–436.
Cummings, F.J., Gray, R., Davis, T.E., Tormey, D.C., Harris, J.E., Falkson, G.G., Arseneau, J., 1986. Tamoxifen versus placebo: double-blind adjuvant trial in elderly women with stage II breast cancer. NCI Monographs 1, 119–123.
Dinse, G.E., 1982. Nonparametric estimation for partially-complete time and type of failure data. Biometrics 38, 417–431.
Goetghebeur, E.J., Ryan, L., 1995. Analysis of competing risks survival data when some failure types are missing. Biometrika 82, 821–833.
Horvitz, D.G., Thompson, D.J., 1952. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663–685.
Koul, H., Susarla, V., van Ryzin, J., 1981. Regression analysis with randomly right-censored data. Annals of Statistics 9, 1276–1288.
Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data, first ed. Wiley, New York.
McKeague, I.W., Subramanian, S., 1998. Product-limit estimators and Cox regression with missing censoring information. Scandinavian Journal of Statistics 25, 589–601.
Miller, R.G., 1976. Least squares regression with censored data. Biometrika 63, 449–464.
Peddada, S.D., Patwardhan, G., 1992. Jackknife variance estimators in linear models. Biometrika 79, 654–657.
Rowe, A.K., Rowe, S.Y., Snow, R.W., et al., 2006. The burden of malaria mortality among African children in the year 2000. International Journal of Epidemiology 35, 691–704.
Stute, W., 1993. Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis 45, 89–103.
Stute, W., 1996. Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics 23, 461–471.
Wang, Q., Dinse, G.E., 2011. Linear regression analysis with missing censoring indicators. Lifetime Data Analysis 17, 256–279.
Wang, Q., Ng, K.W., 2008. Asymptotically efficient product-limit estimators with censoring indicators missing at random. Statistica Sinica 18, 749–768.
Zhou, X., Sun, L., 2003. Additive hazards regression with missing censoring information. Statistica Sinica 13, 1237–1257.