Journal of Statistical Planning and Inference 139 (2009) 3588–3604
Checking the adequacy of a general linear model with responses missing at random

Zhihua Sun^{a,b}, Qihua Wang^{b,c,∗}

^a Department of Mathematics, Graduate University of Chinese Academy of Sciences, Beijing 100049, China
^b Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China
^c Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong, China
ARTICLE INFO

Article history: Received 20 February 2008; received in revised form 1 August 2008; accepted 12 April 2009; available online 12 May 2009.

Keywords: General linear model; Lack-of-fit test; Missing at random; Score-type test; Empirical process; Sensitivity analysis

ABSTRACT

In this paper, we consider the model checking problem for a general linear model with responses missing at random. First, two completed data sets are constructed by imputation and inverse probability weighting methods. Then two score-type and two empirical process based test statistics are proposed using the constructed data sets. The large sample properties of the test statistics under the null and local alternative hypotheses are investigated. It is shown that both score-type tests are consistent and can detect local alternatives approaching the null at the rate n^{−r} with 0 ≤ r ≤ 1/2, and that the empirical process based tests can detect local alternatives approaching the null at the rate n^{−1/2}. The power and the choice of the weighting function of the proposed tests are discussed. Simulation studies show that the tests perform well and outperform existing methods. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

In this paper, we consider the general linear model

Y = H(X)^⊤β + ε,    (1.1)
where Y is a scalar response, X is a d-variate random covariate vector, H(·) is a known p-dimensional vector function, β is an unknown p-dimensional parameter vector, and the model error ε satisfies E[ε|X] = 0 and E[ε²|X] = σ²(X) < ∞. Here, the superscript ⊤ in (1.1) denotes transpose. Clearly, if H(·) is a known polynomial function of the covariate X, then (1.1) reduces to a polynomial model; see Cheng and Van Ness (1998), Cheng and Schneeweiss (1998), and Cheng and Kukush (2004), among others. If H(X) = X, model (1.1) is the classical linear model, which has been studied extensively and applied widely. Compared with the traditional linear model Y = X^⊤β + ε with E(ε|X) = 0, model (1.1) allows for interactions and higher-order terms of the covariate and is therefore more flexible and applicable. Clearly, the general linear model (1.1) contains the traditional linear model as a special case, and a fitting method for the classical linear model can also be used for the general linear model after a simple data transformation. So, like the traditional linear model, the general linear model can be used widely in data analysis owing to its simplicity and flexibility. Of course, before any conclusion can be drawn from a model, it is necessary to ascertain whether the model fits the data; this is done by model checking. In fact, the model checking problem for the classical linear model has recently received considerable attention. When both Y and X are observed completely, Härdle and Mammen (1993) used the distance between the nonparametric kernel and parametric fits to build a test statistic and resorted to a wild bootstrap method to determine the

∗ Corresponding author at: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China. E-mail address:
[email protected] (Q. Wang). doi:10.1016/j.jspi.2009.04.024
critical value of the test; to avoid nonparametric fitting, Stute et al. (1998) proposed a test based on the empirical process of the regression residuals. Manteiga and González (2006) extended Härdle and Mammen's (1993) method to the missing response case with complete case analysis and imputation analysis. When the covariates are measured with error, assuming an additive error structure and known error variance, Zhu and Cui (2005) proposed a score-type test for the general linear model. In this paper, we investigate the model checking problem for (1.1) with Y missing; that is, we wish to test the null hypothesis

H₀ : E(Y|X = ·) = H(·)^⊤β    (1.2)
for some β and known H(·), with some Y's missing. Throughout this paper, we adopt the commonly used missing at random (MAR) assumption. Suppose we obtain a random sample of incomplete data

(Y_i, δ_i, X_i),  i = 1, 2, …, n,

from model (1.1), where δ_i = 0 if Y_i is missing and δ_i = 1 otherwise. The MAR assumption implies that δ and Y are conditionally independent given X; that is, P(δ = 1|Y, X) = P(δ = 1|X). MAR is a common assumption for statistical analysis with missing data and is reasonable in many practical situations; see Little and Rubin (1987, Chapter 1), Wang and Rao (2002), and Wang et al. (2004), among others. Under the MAR assumption, we first construct two completed data sets by the popular imputation and inverse probability weighting methods. Based on the completed data sets, two score-type and two empirical process based test statistics are then built. For the model checking problem of the classical linear model with response missing at random, seminal work is due to Manteiga and González (2006), who built test statistics based on the L² distance between the nonparametric and parametric fits. The general linear model (1.1) considered in this paper contains the classical linear model considered by Manteiga and González (2006) as a special case. Furthermore, instead of methods involving local nonparametric smoothing, as used by Manteiga and González (2006), we develop score-type and empirical process based test methods. Compared with the methods of Manteiga and González (2006), these have the following merits: first, some of them avoid the "curse of dimensionality" problem; second, they can detect local alternatives approaching the null model at a faster rate; third, the power of the proposed tests depends only slightly on the choice of smoothing parameters. The rest of this paper is organized as follows. In Section 2, we construct two completed data sets. In Section 3, two score-type tests are built and their asymptotic properties are investigated. The empirical process based testing methods are investigated in Section 4.
In Section 5, simulation studies are conducted to evaluate the proposed tests and compare them with existing methods; we also apply the proposed methods to a real data example. The proofs of the main results are presented in the Appendix.

2. Construction of completed data sets by imputation and the inverse probability weighted approach

We first estimate β using the completely observed data. From (1.1), we have
δ_i Y_i = δ_i H(X_i)^⊤β + δ_i ε_i.    (2.1)

Then we can estimate β by the least squares method:

β̂_n = [ (1/n) Σ_{i=1}^n δ_i H(X_i)H(X_i)^⊤ ]^{−1} (1/n) Σ_{i=1}^n δ_i H(X_i)Y_i.    (2.2)
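As a concrete illustration, the complete-case estimator (2.2) can be computed as follows. This is a minimal sketch with hypothetical names (`beta_hat`, `H`); it is not code from the paper.

```python
import numpy as np

# Minimal sketch of the complete-case least-squares estimator (2.2).
# H maps a covariate value to the p-vector H(X_i); delta marks observed responses.
def beta_hat(Y, X, delta, H):
    n = len(Y)
    Hx = np.array([H(x) for x in X])           # n x p design matrix of H(X_i)
    d = delta.astype(float)
    Sigma_n = (Hx * d[:, None]).T @ Hx / n     # (1/n) sum delta_i H(X_i) H(X_i)^T
    rhs = Hx.T @ (d * np.nan_to_num(Y)) / n    # (1/n) sum delta_i H(X_i) Y_i
    return np.linalg.solve(Sigma_n, rhs)
```

With noiseless responses the estimator reproduces β exactly, which gives a quick sanity check.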
Denote π(x) = P(δ = 1|X = x) and π̂_n(x) = Σ_{i=1}^n δ_i K((x − X_i)/b_n) / Σ_{i=1}^n K((x − X_i)/b_n), with K(·) a symmetric kernel function of order 2 and b_n a bandwidth sequence. Then, by the imputation and inverse probability weighting methods, we can construct two completed data sets, respectively, as follows:

(Ŷ_{ij}, X_i),  i = 1, 2, …, n, j = 1, 2,    (2.3)

where

Ŷ_{i1} = δ_i Y_i + (1 − δ_i) H(X_i)^⊤β̂_n,  i = 1, 2, …, n,    (2.4)

and

Ŷ_{i2} = (δ_i/π̂_n(X_i)) Y_i + (1 − δ_i/π̂_n(X_i)) H(X_i)^⊤β̂_n,  i = 1, 2, …, n.    (2.5)
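To make (2.4) and (2.5) concrete, the following sketch builds both completed response vectors. The function names and the quartic kernel are our own illustration; the paper only requires a symmetric second-order kernel.

```python
import numpy as np

def quartic(u):
    # symmetric second-order kernel K(u) = (15/16)(1 - 2u^2 + u^4) on |u| <= 1
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - 2 * u**2 + u**4), 0.0)

def pi_hat(x, X, delta, bn):
    # kernel estimate of the response probability pi(x) = P(delta = 1 | X = x)
    w = quartic((x - X) / bn)
    return (w * delta).sum() / max(w.sum(), 1e-12)

def completed_responses(Y, X, delta, H, beta_n, bn):
    fit = np.array([H(x) @ beta_n for x in X])   # H(X_i)^T beta_hat_n
    Yobs = delta * np.nan_to_num(Y)              # delta_i Y_i (missing -> 0)
    Y1 = Yobs + (1 - delta) * fit                # imputed data set (2.4)
    p = np.array([pi_hat(x, X, delta, bn) for x in X])
    w = delta / p
    Y2 = w * np.nan_to_num(Y) + (1 - w) * fit    # inverse-weighted set (2.5)
    return Y1, Y2
```

When no responses are missing, both completed data sets reduce to the original responses.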
3. Score-type tests and their asymptotic properties

Notice that under H₀ we have

E[(Y − H(X)^⊤β) W(X)] = 0    (3.1)

for any bounded weighting function W(·). We can therefore use the completed data (2.3) to construct two residual-based test statistics:

R^s_{nj} = n^{−1/2} Σ_{i=1}^n (Ŷ_{ij} − H(X_i)^⊤β̂_n) W_{sj}(X_i),  j = 1, 2,    (3.2)
where W_{sj}(·) is a bounded measurable weighting function. For R^s_{nj}, j = 1, 2, we have the following results.

Theorem 3.1. Under H₀ and Assumptions (A.a)–(A.e), we have

R^s_{nj} →_L N(0, V^s_j),  j = 1, 2,

where

V^s_1 = E[π(X)σ²(X)W²_{s1}(X)] + A_1Σ^{−1}ΓΣ^{−1}A_1^⊤ − 2A_1Σ^{−1}E[π(X)H(X)σ²(X)W_{s1}(X)]    (3.3)

and

V^s_2 = E[(σ²(X)/π(X))W²_{s2}(X)] + A_2Σ^{−1}ΓΣ^{−1}A_2^⊤ − 2A_2Σ^{−1}E[H(X)σ²(X)W_{s2}(X)]    (3.4)

with

A_1 = E[π(X)H(X)^⊤W_{s1}(X)],  Σ = E[δH(X)H(X)^⊤],  Γ = E[π(X)H(X)H(X)^⊤σ²(X)],  A_2 = E[H(X)^⊤W_{s2}(X)].
Since R^s_{nj}, j = 1, 2, is not scale-invariant, we define a standardized score-type test statistic with the following quadratic form:

T^s_{nj} = [R^s_{nj}]²/V^s_{nj} = (1/V^s_{nj}) [ n^{−1/2} Σ_{i=1}^n (Ŷ_{ij} − H(X_i)^⊤β̂_n) W_{sj}(X_i) ]²,  j = 1, 2,    (3.5)

where V^s_{nj} is a consistent estimator of the asymptotic variance V^s_j. For example, V^s_{n1} can be defined as follows. First, we estimate E[π(X)σ²(X)W²_{s1}(X)], A_1, Σ, Γ, and E[π(X)H(X)σ²(X)W_{s1}(X)], respectively, by

(1/n) Σ_{i=1}^n δ_i (Y_i − H(X_i)^⊤β̂_n)² W²_{s1}(X_i),
(1/n) Σ_{i=1}^n δ_i H(X_i)^⊤ W_{s1}(X_i),
(1/n) Σ_{i=1}^n δ_i H(X_i)H(X_i)^⊤,
(1/n) Σ_{i=1}^n δ_i H(X_i)H(X_i)^⊤ (Y_i − H(X_i)^⊤β̂_n)²,

and

(1/n) Σ_{i=1}^n δ_i H(X_i) (Y_i − H(X_i)^⊤β̂_n)² W_{s1}(X_i).

A consistent estimator of V^s_1 is then obtained by combining sample moments and the "plug-in" method. An advantage of this variance estimator is that it avoids estimating π(x): it requires no smoothing parameter and thus, to some extent, avoids the "curse of dimensionality" that may arise in estimating π(x). A consistent estimator of V^s_2 can be defined similarly. From Theorem 3.1, we have the following result.

Corollary 3.1. Under the conditions of Theorem 3.1, we have
T^s_{nj} →_L χ²_1,  j = 1, 2,

where χ²_1 is a central chi-squared random variable with 1 degree of freedom.
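As an illustration, the first standardized test (3.5) can be sketched as below for a user-supplied weight W_{s1}, using the plug-in variance estimator described above. The helper names are ours, 3.841 is the 0.95 quantile of χ²_1, and the weight should not lie in the linear span of the components of H(·), since the score degenerates there once β̂_n is the complete-case least-squares fit.

```python
import numpy as np

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-squared distribution, 1 d.f.

def score_test_1(Y, X, delta, H, beta_n, W):
    """Sketch of T^s_n1 in (3.5); W plays the role of W_s1 (names are ours)."""
    n = len(Y)
    Hx = np.array([H(x) for x in X])
    Wx = np.array([W(x) for x in X])
    res = delta * (np.nan_to_num(Y) - Hx @ beta_n)   # delta_i (Y_i - H(X_i)^T beta)
    R = (res * Wx).sum() / np.sqrt(n)                # R^s_n1 of (3.2)
    # plug-in moment estimates of Sigma, Gamma, A_1 and the cross term
    Sigma = (Hx * delta[:, None]).T @ Hx / n
    Gamma = (Hx * (res**2)[:, None]).T @ Hx / n
    A1 = (Hx * (delta * Wx)[:, None]).mean(axis=0)
    cross = (Hx * (res**2 * Wx)[:, None]).mean(axis=0)
    Si = np.linalg.inv(Sigma)
    V = (res**2 * Wx**2).mean() + A1 @ Si @ Gamma @ Si @ A1 - 2 * A1 @ Si @ cross
    T = R**2 / V
    return T, T > CHI2_1_95                          # statistic, reject flag
```

Under a grossly misspecified mean function the statistic grows with n and the test rejects.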
According to Corollary 3.1, we reject H₀ if T^s_{nj} is large enough. Next we investigate how sensitive the tests (3.5) are to the alternatives. Consider a sequence of models

H_{1n} : Y_i = H(X_i)^⊤β + C_n G(X_i) + ε_i,  i = 1, 2, …, n,    (3.6)

with E[ε_i|X] = 0 and some arbitrary function G(·) satisfying E[G(X)]² < ∞. Then we have the following results.

Theorem 3.2. Assume the same conditions as Theorem 3.1. Under H_{1n}, we have:

(i) if n^{1/2} C_n → 1, then

T^s_{nj} →_L χ²_1(D²_j),  j = 1, 2,

where χ²_1(D²_j) is a noncentral chi-squared random variable with 1 degree of freedom and noncentrality parameter

D²_1 = (1/V^s_1)[E(π(X)G(X)W_{s1}(X)) − A_1Σ^{−1}E(π(X)H(X)G(X))]²

and

D²_2 = (1/V^s_2)[E(G(X)W_{s2}(X)) − A_2Σ^{−1}E(π(X)H(X)G(X))]²;

(ii) if n^r C_n → a with 0 ≤ r < 1/2 and a ≠ 0, then

T^s_{nj} →_P ∞,  j = 1, 2.
These results imply that both tests T^s_{n1} and T^s_{n2} have asymptotic power 1 for global alternatives (C_n = C ≠ 0) and for local alternatives distinct from the null hypothesis at the rate n^{−r} with 0 < r < 1/2. The tests can also detect alternatives converging to the null hypothesis at the rate n^{−1/2}, the fastest possible rate for a lack-of-fit test.

From the above theorems, it can also be shown that, for j = 1, 2, the asymptotic power of T^s_{nj} against alternatives distinct from the null at the rate n^{−1/2} is 2 − Φ(z_{α/2} − D_j) − Φ(z_{α/2} + D_j), where Φ(·) is the standard normal distribution function and z_{α/2} is the upper α/2 quantile of the standard normal distribution. Hence, we can find the optimal weighting function by maximizing D²_j. Note that

D²_1 = (1/V^s_1)[E(π(X)G(X)W_{s1}(X)) − A_1Σ^{−1}E(π(X)H(X)G(X))]²
     = [E(π(X)G(X)W_{s1}(X)) − A_1Σ^{−1}E(π(X)H(X)G(X))]² / {E[π(X)σ²(X)W²_{s1}(X)] + A_1Σ^{−1}ΓΣ^{−1}A_1^⊤ − 2A_1Σ^{−1}E[π(X)H(X)σ²(X)W_{s1}(X)]}
     = {E[δ(W_{s1}(X) − A_1Σ^{−1}H(X))G(X)]}² / E{[δ(W_{s1}(X) − A_1Σ^{−1}H(X))(Y − H(X)^⊤β)]²}
     ≤ E[G²(X)/(Y − H(X)^⊤β)²]

by the Cauchy–Schwarz inequality. Equality holds only when

W_{s1}(X) = λG(X)/E[(Y − H(X)^⊤β)²|X] + A_1Σ^{−1}H(X)G(X)    (3.7)
with λ any constant not equal to 0. Hence the test with a weighting function W_{s1}(·) satisfying (3.7) is optimal, and the corresponding optimal asymptotic power is

2 − Φ( z_{α/2} − √(E[G²(X)/(Y − H(X)^⊤β)²]) ) − Φ( z_{α/2} + √(E[G²(X)/(Y − H(X)^⊤β)²]) ).
Similarly, for the test T^s_{n2}, maximizing the asymptotic power 2 − Φ(z_{α/2} − D_2) − Φ(z_{α/2} + D_2) yields the optimal weighting function W_{s2}(·) satisfying

W_{s2}(X) = λπ(X)G(X)/E[(Y − H(X)^⊤β)²|X] + π(X)A_2Σ^{−1}H(X)G(X),    (3.8)

and D²_2 attains the maximum E{G²(X)/(Y − H(X)^⊤β)²}. Clearly, when the weight functions W_{s1}(·) and W_{s2}(·) are chosen to be optimal, the tests T^s_{n1} and T^s_{n2} have the same optimal asymptotic power

2 − Φ( z_{α/2} − √(E[G²(X)/(Y − H(X)^⊤β)²]) ) − Φ( z_{α/2} + √(E[G²(X)/(Y − H(X)^⊤β)²]) ).

As for the choice of the optimal bandwidth, Zhu and Ng (2003) and Manteiga and González (2006) pointed out that this remains an open problem in the testing context. Manteiga and González (2006) also pointed out that the choice of bandwidth greatly influences their proposed tests, yet their paper gives no effective method for choosing it. For our tests, however, the influence of the bandwidth is less serious. Note that the first test T^s_{n1} does not depend on any bandwidth, while the second test T^s_{n2} depends on the bandwidth b_n through π̂_n(·). For T^s_{n2}, the optimal bandwidth should maximize the power of the test. Clearly, the power of T^s_{n2} is mainly determined by the distance between its distributions under H₀ and H₁, and the bandwidth does not affect the first-order term of this distance, although it may affect higher-order terms. This indicates that the effect of the bandwidth b_n on the test is not serious, which is also verified by simulation. The reason might be that the bandwidth affects T^s_{n2} globally through π̂_n(·), and its effects at the individual points X_i, i = 1, 2, …, n, can to some extent cancel each other out.
4. Empirical process based tests and their asymptotic properties

In this section, we develop testing methods based on empirical processes. Note that H₀ holds if and only if

E[(Y − H(X)^⊤β) W(X) I(X ≤ x)] = 0    (4.1)

for all x and any bounded weight function W(·). With the two completed data sets given in (2.3), we can construct an estimated empirical process for (4.1) as follows:

R^E_{nj}(x) = n^{−1/2} Σ_{i=1}^n (Ŷ_{ij} − H(X_i)^⊤β̂_n) W_{Ej}(X_i) I(X_i ≤ x),  j = 1, 2,    (4.2)

where W_{Ej}(·) is a bounded measurable weight function. A test statistic can then be defined by

T^E_{nj} = ∫ [R^E_{nj}(x)]² dF_n(x),  j = 1, 2,    (4.3)

where F_n is the empirical distribution based on (X_1, X_2, …, X_n). Note that

R^E_{n1}(x) = n^{−1/2} Σ_{i=1}^n δ_i (Y_i − H(X_i)^⊤β̂_n) W_{E1}(X_i) I(X_i ≤ x)    (4.4)

and

R^E_{n2}(x) = n^{−1/2} Σ_{i=1}^n (δ_i/π̂_n(X_i)) (Y_i − H(X_i)^⊤β̂_n) W_{E2}(X_i) I(X_i ≤ x).    (4.5)
Theorem 4.1. Under H₀ and Assumptions (A.a)–(A.e), we have:

(i)

R^E_{n1}(x) = n^{−1/2} Σ_{i=1}^n δ_i (Y_i − H(X_i)^⊤β) W_{E1}(X_i) I(X_i ≤ x) − B_1(x)Σ^{−1} n^{−1/2} Σ_{i=1}^n δ_i H(X_i)(Y_i − H(X_i)^⊤β) + o_p(1)    (4.6)

and

R^E_{n2}(x) = n^{−1/2} Σ_{i=1}^n (δ_i/π(X_i)) (Y_i − H(X_i)^⊤β) W_{E2}(X_i) I(X_i ≤ x) − B_2(x)Σ^{−1} n^{−1/2} Σ_{i=1}^n δ_i H(X_i)(Y_i − H(X_i)^⊤β) + o_p(1),    (4.7)

where B_1(x) = E[π(X)H(X)^⊤W_{E1}(X)I(X ≤ x)], B_2(x) = E[H(X)^⊤W_{E2}(X)I(X ≤ x)], and Σ is given in Theorem 3.1.
(ii) R^E_{nj} converges in distribution to R^E_j in the Skorohod space D[−∞, ∞]^d for j = 1, 2, where R^E_j is a centered Gaussian process with covariance function

Cov(R^E_1(x_1), R^E_1(x_2)) = E[π(X)σ²(X)W²_{E1}(X)I(X ≤ x)] + B_1(x_1)Σ^{−1}ΓΣ^{−1}B_1(x_2)^⊤
  − B_1(x_1)Σ^{−1}E[π(X)H(X)σ²(X)W_{E1}(X)I(X ≤ x_2)] − B_1(x_2)Σ^{−1}E[π(X)H(X)σ²(X)W_{E1}(X)I(X ≤ x_1)]    (4.8)

and

Cov(R^E_2(x_1), R^E_2(x_2)) = E[(σ²(X)/π(X))W²_{E2}(X)I(X ≤ x)] + B_2(x_1)Σ^{−1}ΓΣ^{−1}B_2(x_2)^⊤
  − B_2(x_1)Σ^{−1}E[H(X)σ²(X)W_{E2}(X)I(X ≤ x_2)] − B_2(x_2)Σ^{−1}E[H(X)σ²(X)W_{E2}(X)I(X ≤ x_1)],    (4.9)

where x = x_1 ∧ x_2 and Σ, Γ are as in Theorem 3.1.

(iii) T^E_{nj} converges in distribution to ∫[R^E_j(x)]² dF(x), where F(·) is the distribution of X.

Next we consider how sensitive the tests T^E_{nj}, j = 1, 2, are to the alternatives (3.6).
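For scalar X, the statistic (4.3) with weight W_{E1} ≡ 1 reduces to an average of squared cumulative sums of the complete-case residuals, as the following minimal sketch shows (the function name is ours).

```python
import numpy as np

def ep_test_1(Y, X, delta, H, beta_n):
    """Sketch of T^E_n1 = (1/n) sum_k R^E_n1(X_k)^2 for scalar X, weight W_E1 = 1."""
    n = len(Y)
    res = delta * (np.nan_to_num(Y) - np.array([H(x) @ beta_n for x in X]))
    order = np.argsort(X)
    # R^E_n1 at the ordered sample points: scaled cumulative residual sums, cf. (4.4)
    R = np.cumsum(res[order]) / np.sqrt(n)
    return (R**2).mean()    # integration against the empirical distribution F_n
```

The statistic is zero exactly when all observed residuals vanish and is positive otherwise for a constant deviation.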
Theorem 4.2. Under the assumptions of Theorem 4.1 and the alternative (3.6), if n^{1/2}C_n → 1, we have:

(i) R^E_{nj} converges in distribution to R^E_j + S_j, where R^E_j is the centered continuous Gaussian process of Theorem 4.1 and

S_1(x) = E[π(X)G(X)W_{E1}(X)I(X ≤ x)] − B_1(x)Σ^{−1}E(π(X)H(X)G(X)),
S_2(x) = E[G(X)W_{E2}(X)I(X ≤ x)] − B_2(x)Σ^{−1}E(π(X)H(X)G(X));

(ii) T^E_{nj} converges in distribution to ∫[R^E_j(x) + S_j(x)]² dF(x); if n^r C_n → a with 0 ≤ r < 1/2, then T^E_{nj} →_P ∞, for j = 1, 2.

The above theorem shows that the test T^E_{nj} has the same fine properties as T^s_{nj}: it has asymptotic power 1 for global alternatives (C_n = C ≠ 0) and for local alternatives distinct from the null hypothesis at the rate n^{−r} with 0 < r < 1/2, and it can also detect alternatives converging to the null hypothesis at the rate n^{−1/2}.

From Theorem 4.1, although the asymptotic covariance of R^E_{nj}(x) is available for j = 1, 2, the null distribution of the test statistic T^E_{nj} is very complex. To determine the critical value, we therefore consider a modified version of the wild bootstrap to approximate the null distribution of the tests. The method follows the idea of random symmetrization; see Pollard (1984). The procedure is as follows.
Step 1: Generate independent random variables e_i, i = 1, 2, …, n, with E(e_i) = 0 and E(e²_i) = 1, and denote E_n = (e_1, e_2, …, e_n). Let

R^{[BE]}_{n1}(x) = n^{−1/2} Σ_{i=1}^n e_i δ_i (Y_i − H(X_i)^⊤β̂_n) W_{E1}(X_i) I(X_i ≤ x) − B̂_{n1}(x)(Σ̂_n)^{−1} n^{−1/2} Σ_{i=1}^n e_i δ_i H(X_i)(Y_i − H(X_i)^⊤β̂_n)    (4.10)

and

R^{[BE]}_{n2}(x) = n^{−1/2} Σ_{i=1}^n e_i (δ_i/π̂_n(X_i)) (Y_i − H(X_i)^⊤β̂_n) W_{E2}(X_i) I(X_i ≤ x) − B̂_{n2}(x)(Σ̂_n)^{−1} n^{−1/2} Σ_{i=1}^n e_i δ_i H(X_i)(Y_i − H(X_i)^⊤β̂_n),    (4.11)

where

B̂_{n1}(x) = (1/n) Σ_{i=1}^n δ_i H(X_i)^⊤ W_{E1}(X_i) I(X_i ≤ x),
B̂_{n2}(x) = (1/n) Σ_{i=1}^n (δ_i/π̂_n(X_i)) H(X_i)^⊤ W_{E2}(X_i) I(X_i ≤ x)

are estimators of B_1(x) and B_2(x), respectively, and Σ̂_n = (1/n) Σ_{i=1}^n δ_i H(X_i)H(X_i)^⊤. The resulting conditional test statistic is

T^{[BE]}_{nj}(E_n) = ∫ [R^{[BE]}_{nj}(x)]² dF_n(x),  j = 1, 2.    (4.12)
Step 2: Generate m sets of E_n, denoted E_{ni}, i = 1, 2, …, m, and compute the m values T^{[BE]}_{nj}(E_{ni}), i = 1, 2, …, m.

Step 3: The 1 − α quantile of {T^{[BE]}_{nj}(E_{ni}), i = 1, 2, …, m} serves as the α-level critical value for the test T^E_{nj}, j = 1, 2.

For the bootstrap test statistic, we have the following result.

Theorem 4.3. Assume the conditions of Theorem 4.1 hold. Under the null hypothesis, or under the alternatives (3.6) with C_n = O(n^{−1/2}), for almost all sample sequences {(Y_1, X_1, δ_1), …, (Y_n, X_n, δ_n), …}, the conditional distribution of T^{[BE]}_{nj}(E_n) converges in distribution to the limiting null distribution of T^E_{nj}, for j = 1, 2.

Since the resampling procedure aims at determining the critical values, it is required that these critical values not be affected by the underlying model. The above theorem shows that the critical value determined by the proposed resampling method under local alternatives with C_n = O(n^{−1/2}) is asymptotically equal to that under the null hypothesis. In fact, from the proofs of Theorems 4.2 and 4.3, we can see that under local alternatives with C_n = O(n^{−1/2}), the cumulative sum process of residuals R^E_{nj}(x) has an extra nonrandom shift S_j for j = 1, 2, and the random symmetrization variables e_i in R^{[BE]}_{nj}(x) make this nonrandom shift vanish as n tends to infinity. However, when C_n ≠ O(n^{−1/2}), the random symmetrization variables e_i in R^{[BE]}_{nj}(x) cannot eliminate the nonrandom shift.
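Steps 1–3 can be sketched as follows for T^E_{n1} with scalar X and weight W_{E1} ≡ 1, using Rademacher multipliers (one valid choice with E(e_i) = 0 and E(e²_i) = 1); all names here are our own illustration.

```python
import numpy as np

def bootstrap_critical_value(Y, X, delta, H, beta_n, m=500, alpha=0.05, seed=0):
    """Wild-bootstrap alpha-level critical value for T^E_n1 (sketch, W_E1 = 1)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    Hx = np.array([H(x) for x in X])
    res = delta * (np.nan_to_num(Y) - Hx @ beta_n)
    order = np.argsort(X)
    Sigma_inv = np.linalg.inv((Hx * delta[:, None]).T @ Hx / n)
    # B_hat_n1 at the ordered sample points: cumulative averages of delta_i H(X_i)^T
    Bn1 = np.cumsum((delta[:, None] * Hx)[order], axis=0) / n
    stats = np.empty(m)
    for b in range(m):
        e = rng.choice([-1.0, 1.0], size=n)               # E(e_i) = 0, E(e_i^2) = 1
        t1 = np.cumsum((e * res)[order]) / np.sqrt(n)     # first term of (4.10)
        t2 = Bn1 @ Sigma_inv @ (Hx.T @ (e * res)) / np.sqrt(n)
        stats[b] = ((t1 - t2) ** 2).mean()                # T^[BE]_n1(E_n), cf. (4.12)
    return np.quantile(stats, 1 - alpha)                  # Step 3
```

The observed statistic is then compared against this critical value to decide the test.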
E for Next we investigate the choices of the weight functions WEj (·) for j = 1, 2. Theorems 4.1 and 4.2 show that, for the test Tnj j = 1, 2, the distance between the asymptotic distributions of the tests under the null hypothesis and that under local alternative hypothesis distinct from the null one at the rate n−1/2 can be measured by Sj . Hence it is reasonable to choose the weight function E for j = 1, 2. Exactly speaking, W (·) should be chosen by maximizing the standardized |S |. WEj (·) by maximizing |Sj | for the test Tnj Ej j By this rule and Cauchy–Schwarz inequality, the theoretical optimal weight functions WE1 (x) and WE2 (x) are obtained as follows:
WE1 (X) =
G(X)
[(Y − H(X) )]2
+ A1 −1 H(X)G(X)
(4.13)
+ (X)A2 −1 H(X)G(X).
(4.14)
and WE2 (X) =
(X)G(X)
[(Y − H(X) )]2
Both the theoretical optimal weight functions contain some unknown quantities. Hence, the empirical optimal weight functions can be obtained by replacing the unknown quantities by their estimators. In practice, the weight function can be chosen to be the constant 1 while it is somehow complicated, which means that the weight function is usually not considered. In fact, in literature such as Ha¨ rdle and Mammen (1993), Stute et al. (1998), Lin et al. (2002) and Dikta et al. (2006), the weight function is taken to be constant 1. In the simulation studies of Section 5, it is found that even if the weight function is taken to be 1, the proposed tests perform well. From Theorems 4.1 and 4.2, it can be shown that the bandwidth does not affect the first order term of the distribution distance between the null hypothesis and the local alternative hypothesis distinct from the null one at the rate n−1/2 , although it may E is very small, which is also affect the higher order terms. This shows that the effect of the bandwidth bn on the test based on Tn2 verified by our simulation results. 5. Numerical studies This section illustrates the performance of the proposed methods by using simulation and a real data example. 5.1. Simulations In the following, we conducted simulation studies to evaluate the proposed tests for finite sample sizes and compared the proposed method with the existing simulation results as shown in Manteiga and González (2006). Consider two testing problems as follows: A : H0 : Y = H(X) + 1 ,
H₁ : Y = H(X)^⊤β + C_n G(X) + ε_1    (5.1)
and

B : H₀ : Y = H(X)^⊤β + σ(X)ε_2,  H₁ : Y = H(X)^⊤β + C_n G(X) + σ(X)ε_2    (5.2)
with H(X) = 1 + X², G(X) = X³, β = 1, and σ(X) = X. The covariate X was generated from the uniform distribution on [0, 1], ε_1 from the normal distribution N(0, 1/4), and ε_2 from the normal distribution N(0, 3/4). The missing mechanism
[Fig. 1 appears here.]

Fig. 1. The estimated power curves of the tests T^s_{n2} and T^E_{n2} against the bandwidth b_n, with missing mechanism π_1(x) and sample size 100, under different choices of C_n for testing Problems A and B. (a) Model A, T^s_{n2}, C_n = 0; (b) Model A, T^s_{n2}, C_n = 2n^{−1/2}; (c) Model A, T^s_{n2}, C_n = 4n^{−1/2}; (d) Model A, T^E_{n2}, C_n = 0; (e) Model A, T^E_{n2}, C_n = 2n^{−1/2}; (f) Model A, T^E_{n2}, C_n = 4n^{−1/2}; (g) Model B, T^s_{n2}, C_n = 0; (h) Model B, T^s_{n2}, C_n = 2n^{−1/2}; (i) Model B, T^s_{n2}, C_n = 4n^{−1/2}; (j) Model B, T^E_{n2}, C_n = 0; (k) Model B, T^E_{n2}, C_n = 2n^{−1/2}; (l) Model B, T^E_{n2}, C_n = 4n^{−1/2}.
follows logit(π(x)) = θ_1 + θ_2 x with

π_1(x) : θ_1 = 0.3, θ_2 = 0.3,
π_2(x) : θ_1 = 1.0, θ_2 = 0.8,

which correspond to average response probabilities of 0.60 and 0.80, respectively. For each of the two cases, we generated 1000 Monte Carlo random samples of size n = 30, 60, and 100. To estimate the response probability π(X) = E(δ|X), the kernel function was taken to be K(u) = (15/16)(1 − 2u² + u⁴) if |u| ≤ 1, and 0 otherwise. The nominal level was chosen to be 0.05.
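The simulation design for Problem A under mechanism π_1(x) can be sketched as follows (function name ours):

```python
import numpy as np

def generate_sample(n, C=0.0, seed=None):
    """Draw (Y, X, delta) for Problem A: H(X) = 1 + X^2, beta = 1, G(X) = X^3,
    eps_1 ~ N(0, 1/4), and logistic missingness logit(pi(x)) = 0.3 + 0.3 x."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, n)
    Cn = C / np.sqrt(n)                                  # local alternative rate
    Y = (1 + X**2) + Cn * X**3 + rng.normal(0, 0.5, n)   # sd 0.5 = sqrt(1/4)
    pi = 1 / (1 + np.exp(-(0.3 + 0.3 * X)))              # response probability pi_1
    delta = (rng.uniform(size=n) < pi).astype(int)
    Y = np.where(delta == 1, Y, np.nan)                  # responses missing at random
    return Y, X, delta
```

Under π_1(x) the average response probability is close to the reported 0.60.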
Table 1
Sizes of the tests T^s_{nj} and T^E_{nj}, j = 1, 2, under different missing mechanisms and sample sizes for testing Problems A and B.

          π_1(x)                                  π_2(x)
    n     T^s_n1  T^s_n2  T^E_n1  T^E_n2          T^s_n1  T^s_n2  T^E_n1  T^E_n2
A   30    0.047   0.040   0.035   0.033           0.055   0.056   0.047   0.043
    60    0.055   0.043   0.040   0.046           0.053   0.049   0.055   0.047
    100   0.049   0.047   0.054   0.050           0.051   0.050   0.051   0.053
B   30    0.071   0.065   0.067   0.059           0.064   0.057   0.058   0.056
    60    0.066   0.057   0.590   0.058           0.052   0.055   0.055   0.047
    100   0.057   0.066   0.053   0.049           0.049   0.048   0.045   0.047
Table 2
Simulated powers of the tests T^s_{nj} and T^E_{nj}, j = 1, 2, under different missing mechanisms, sample sizes and choices of C_n for testing Problems A and B.

                         C_n = C·n^{−1/2}                     C_n = C·n^{−1/6}
               n     C=2     C=4     C=6     C=8          C=2     C=4     C=6     C=8
A  π_1(x)
   T^s_n1      30    0.227   0.573   0.834   0.932        0.860   0.960   0.971   0.983
               60    0.237   0.702   0.943   0.996        0.993   1       1       1
               100   0.284   0.756   0.970   0.998        1       1       1       1
   T^s_n2      30    0.169   0.426   0.667   0.758        0.683   0.826   0.845   0.850
               60    0.192   0.568   0.815   0.912        0.903   0.959   0.955   0.963
               100   0.240   0.637   0.903   0.966        0.979   0.989   0.991   0.995
   T^E_n1      30    0.193   0.583   0.821   0.926        0.852   0.982   0.984   0.988
               60    0.235   0.675   0.924   0.996        0.986   0.999   1       1
               100   0.229   0.698   0.965   1            1       1       1       1
   T^E_n2      30    0.179   0.551   0.788   0.898        0.798   0.958   0.971   0.974
               60    0.223   0.643   0.918   0.983        0.984   0.999   1       1
               100   0.221   0.686   0.954   0.997        1       1       1       1
A  π_2(x)
   T^s_n1      30    0.281   0.751   0.941   0.985        0.944   0.994   0.997   0.998
               60    0.999   1       1       1            0.993   1       1       1
               100   1       1       1       1            1       1       1       1
   T^s_n2      30    0.242   0.686   0.880   0.952        0.903   0.968   0.970   0.979
               60    0.291   0.771   0.966   0.993        0.988   0.998   0.999   0.999
               100   0.312   0.818   0.983   0.998        1       1       1       1
   T^E_n1      30    0.274   0.724   0.937   0.982        0.956   0.995   0.999   1
               60    0.299   0.793   0.984   0.998        0.999   1       1       1
               100   0.333   0.837   0.996   1            1       1       1       1
   T^E_n2      30    0.264   0.710   0.929   0.982        0.947   0.996   0.998   1
               60    0.290   0.782   0.984   0.998        0.997   1       1       1
               100   0.334   0.826   0.991   1            1       1       1       1
B  π_1(x)
   T^s_n1      30    0.106   0.311   0.531   0.689        0.540   0.905   0.965   0.972
               60    0.122   0.344   0.573   0.779        0.769   0.991   1       1
               100   0.124   0.306   0.562   0.812        0.887   1       1       1
   T^s_n2      30    0.098   0.244   0.433   0.584        0.425   0.763   0.846   0.848
               60    0.104   0.267   0.492   0.726        0.703   0.938   0.967   0.970
               100   0.105   0.257   0.512   0.755        0.837   0.988   0.995   0.998
   T^E_n1      30    0.117   0.290   0.525   0.684        0.518   0.902   0.972   0.983
               60    0.120   0.338   0.574   0.777        0.781   0.995   1       1
               100   0.126   0.310   0.575   0.815        0.895   1       1       1
   T^E_n2      30    0.102   0.270   0.501   0.608        0.487   0.885   0.964   0.968
               60    0.109   0.321   0.559   0.770        0.785   0.994   1       1
               100   0.124   0.294   0.574   0.817        0.897   1       1       1
B  π_2(x)
   T^s_n1      30    0.137   0.384   0.647   0.828        0.668   0.962   0.992   0.997
               60    0.145   0.396   0.687   0.879        0.888   0.999   1       1
               100   0.153   0.403   0.715   0.886        0.972   1       1       1
   T^s_n2      30    0.127   0.349   0.632   0.797        0.626   0.946   0.972   0.974
               60    0.139   0.389   0.679   0.872        0.886   0.998   1       1
               100   0.150   0.398   0.704   0.893        0.961   1       1       1
   T^E_n1      30    0.133   0.368   0.639   0.828        0.662   0.974   0.997   0.999
               60    0.142   0.396   0.691   0.889        0.898   1       1       1
               100   0.157   0.417   0.720   0.898        0.978   1       1       1
   T^E_n2      30    0.124   0.368   0.637   0.838        0.659   0.973   0.996   0.999
               60    0.139   0.399   0.703   0.891        0.910   1       1       1
               100   0.154   0.414   0.729   0.928        0.980   1       1       1
We first investigate the influence of the bandwidth b_n on the tests T^s_{n2} and T^E_{n2}. The bandwidths were taken to be j/100 for j = 1, …, 100. Based on 1000 simulations with sample size 100, missing mechanism π_1(x), and C_n = C·n^{−1/2} with C = 0, 2, 4, we plot the estimated power curves against this bandwidth sequence in Fig. 1. Note that C = 0 corresponds to the null hypothetical model, so the curves in subplots (a), (d), (g) and (j) of Fig. 1 are empirical size curves.
Table 3
Simulated powers of the four tests T^s_{n1}, T^s_{n2}, T^E_{n1}, T^E_{n2} proposed in this paper and the tests T_{n,S}, T_{n,I} proposed by Manteiga and González (2006) under different choices of σ(·) and C, with sample size 100.

                                σ(X) = 1                     σ(X) = 0.5
b_n                      C=0     C=1     C=5          C=0     C=1     C=5
0.1   T^s_n1             0.0490  0.1090  0.9060       0.0525  0.2850  1
      T^s_n2             0.0490  0.1245  0.9495       0.0535  0.3190  1
      T^E_n1             0.0460  0.1100  0.9460       0.0460  0.3170  1
      T^E_n2             0.0410  0.1140  0.9590       0.0510  0.3340  1
      T_{n,S}            0.0400  0.0470  0.4830       0.0400  0.0920  0.9150
      T_{n,I} h_n=0.1    0.0440  0.0580  0.5190       0.0440  0.1010  0.9330
      T_{n,I} h_n=0.2    0.0490  0.0730  0.6210       0.0440  0.1380  0.9610
      T_{n,I} h_n=0.3    0.0460  0.0670  0.6320       0.0460  0.1450  0.9580
0.2   T^s_n1             0.0470  0.1095  0.9225       0.0455  0.2780  1
      T^s_n2             0.0490  0.1150  0.9575       0.0570  0.3100  1
      T^E_n1             0.0440  0.1060  0.9480       0.0530  0.3310  1
      T^E_n2             0.0450  0.1140  0.9590       0.0510  0.3340  1
      T_{n,S}            0.0660  0.0980  0.7840       0.0660  0.2090  0.9970
      T_{n,I} h_n=0.1    0.0610  0.0980  0.7740       0.0610  0.2050  0.9920
      T_{n,I} h_n=0.2    0.0670  0.1020  0.7940       0.0670  0.2090  0.9970
      T_{n,I} h_n=0.3    0.0660  0.0990  0.8090       0.0660  0.2210  0.9990
0.3   T^s_n1             0.0495  0.1110  0.9040       0.0525  0.2760  1
      T^s_n2             0.0525  0.1275  0.9040       0.0535  0.3215  1
      T^E_n1             0.0490  0.1010  0.9500       0.0470  0.3050  1
      T^E_n2             0.0510  0.1080  0.9520       0.0540  0.3190  1
      T_{n,S}            0.0650  0.1080  0.8510       0.0650  0.2290  0.9990
      T_{n,I} h_n=0.1    0.0650  0.0990  0.8270       0.0650  0.2340  0.9990
      T_{n,I} h_n=0.2    0.0660  0.1100  0.8460       0.0660  0.2370  0.9990
      T_{n,I} h_n=0.3    0.0700  0.1100  0.8540       0.0700  0.2350  0.9990
From Fig. 1, it is easy to see that the bandwidth has little influence on either test T^s_{n2} or T^E_{n2}, as long as the bandwidth is not too small. In the following simulations, we choose b_n = n^{−1/3}.

From the 1000 simulated values, we calculated the empirical sizes of the tests T^s_{nj} and T^E_{nj}, j = 1, 2, under different sample sizes and missing mechanisms for the model checking Problems A and B in (5.1) and (5.2). The results are reported in Table 1. To show the power of the tests, we take C_n to be C·n^{−1/2} and C·n^{−1/6}, respectively, with C = 2, 4, 6, 8, which correspond to different alternative hypotheses. The estimated powers of the four tests under different sample sizes, missing mechanisms, and choices of C_n are reported in Table 2. We also conducted simulations under different bandwidth choices and obtained similar results. From Table 1, the significance level is apparently well attained. Table 2 shows that all tests work well when the sample sizes are not too small, and comparatively the tests T^s_{n1} and T^E_{n1} have higher power than the tests T^s_{n2} and T^E_{n2}. The larger C_n, the higher the power, which is reasonable.

Next we compare the finite sample properties of the proposed tests with those of the tests proposed by Manteiga and González (2006). We used the same simulation setting as Manteiga and González (2006), who considered the testing problem

H₀ : Y_i = 5X_i + σ(X_i)ε_i,  H₁ : Y_i = 5X_i + CX²_i + σ(X_i)ε_i,  1 ≤ i ≤ n,

where the X_i were generated from the uniform distribution on [0, 1] and ε_i ∼ N(0, 1). The sample size was taken to be 100, C = 0, 1, 5, and σ(X_i) had two possible choices: 0.5 and 1. The missing mechanism was taken to be π(x) = 1 − 0.4 exp{−5(x − 0.4)²}, and the nominal level was chosen to be 0.05. To estimate the response probability π(X) = E(δ|X), the kernel function was taken to be K(u) = (15/16)(1 − 2u² + u⁴)I(|u| ≤ 1), and the bandwidth b_n was chosen to be 0.1, 0.2, 0.3.

From the 2000 simulated values, we calculated the empirical sizes and powers of the tests T^s_{nj} and T^E_{nj}, j = 1, 2, which are presented in Table 3. Table 3 also lists the simulation results taken from Table 1 of Manteiga and González (2006, p. 185), where h_n is the bandwidth used in the nonparametric fit of the regression model, and T_{n,S} and T_{n,I} denote the two testing methods proposed by Manteiga and González (2006). From the results of Table 3, we make the following observations. All four proposed testing methods have higher power, and their sizes are closer to the nominal level 0.05; this suggests that the proposed methods outperform those of Manteiga and González (2006) in terms of both power and size. All four proposed methods depend only slightly on the choice of bandwidth b_n, while the methods of Manteiga and González (2006) depend more heavily on the two bandwidths b_n and h_n.
Fig. 2. (a) The scatter plots of Y against X using only the complete data. (b) The scatter plots of Y against X using all the data by simply making the missing response values zero. (c) The fitted curve. (d) The residual plot against X/std(X).
5.2. Applications

We use the proposed methods to analyze the foreign language study data from Schafer (1997). The sample size is n = 279. Here we study the model checking problem for the response variable Y, the current college grade point average, and the covariate X, the high school grade point average. There are 34 subjects with missing response values and one subject with a missing covariate value. We impute the missing covariate value by the sample mean and treat it as original data, since only one subject has a missing covariate value and the sample size is quite large (279); hence this treatment should have little influence on the relationship between the response and the covariate.

In Fig. 2(a) and (b), we plot the scatter plots of Y against X, respectively, using only the complete data and using all the data with the missing response values simply set to zero. Obviously, the two subplots indicate an apparent linear relationship between the covariate and the response. In the following, we further use the proposed methods to verify this somewhat subjective impression. That is, we aim to test the model structure H_0: E(Y|X) = β₁ + β₂X. From (2.2), we get the estimators of β₁ and β₂: β̂₁ = 2.2651, β̂₂ = 0.3699. Take the weight function W(x) = 1, the kernel K(u) = (15/16)(1 − 2u² + u⁴)I(|u| ≤ 1) and the bandwidth b_n = 2n^{-1/3}. We obtain T^s_{n1} = 0, T^s_{n2} = 0.0439, T^E_{n1} = 0.0136, and T^E_{n2} = 0.0208. By applying the asymptotic results for T^s_{n1}, T^s_{n2} and the resampling technique for T^E_{n1}, T^E_{n2}, we get the corresponding p-values 1, 0.998, 0.995 and 0.998, respectively. Hence it is safe to conclude that the linear model is credible. In Fig. 2(c), we add the fitted curve to Fig. 2(b). In Fig. 2(d), we plot the residuals against X/std(X). From Fig. 2(c) and (d), we can see that the model checking result is reasonable.

6. Concluding remarks

For the model checking problem of a general linear model with responses missing at random, we propose four methods: two score-type tests and two empirical process based tests. The asymptotic properties of the test statistics under the null and local alternative hypotheses are studied. The theory demonstrates that both score-type tests are consistent and can detect local alternatives approaching the null at the rate n^{-r} with 0 ≤ r ≤ 1/2, and the empirical process based tests can detect local alternatives approaching the null at the rate n^{-1/2}. The simulation studies show that the proposed methods depend only slightly on the choice of the bandwidth. This advantage makes the proposed methods comparatively easier to carry out and therefore more practical.
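To make the score-type construction concrete, the following sketch computes an inverse-probability-weighted score statistic of the kind studied in this paper (our own simplified version with weight W(x) = 1 and a naive plug-in variance rather than the paper's variance estimator; `ipw_score_test` and its arguments are hypothetical names, and the selection probabilities are taken as given):

```python
import numpy as np

def ipw_score_test(X, Y, delta, pi, H=lambda x: np.column_stack([np.ones_like(x), x])):
    """Schematic IPW score-type test of H0: E(Y|X) = H(X)'beta.

    beta is fitted by complete-case least squares; the statistic squares the
    normalized IPW-weighted residual sum and refers it to chi-square(1).
    The plug-in variance below is a simplification, not the paper's estimator."""
    Hm = H(np.asarray(X, dtype=float))
    d = np.asarray(delta, dtype=float)
    # complete-case least squares: beta = (sum_i d_i H_i H_i')^{-1} sum_i d_i H_i Y_i
    beta = np.linalg.solve((Hm * d[:, None]).T @ Hm, (Hm * d[:, None]).T @ Y)
    resid = Y - Hm @ beta
    score = d / pi * resid                  # delta_i / pi(X_i) * residual, W = 1
    Tn = (score.sum() / np.sqrt(len(Y)))**2 / score.var()
    return Tn, bool(Tn > 3.841)             # 3.841 = 0.95 quantile of chi-square(1)
```

Under the null, Tn behaves approximately like a chi-square(1) draw; the empirical process based statistics would instead be built from the partial sums of `score` over {X_i ≤ x}.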
Acknowledgments

The authors greatly appreciate the constructive comments and suggestions of the editor and two referees. Qihua Wang's research was supported by the National Science Fund for Distinguished Young Scholars in China (10725106), the National Natural Science Foundation of China (10671198), the National Science Fund for Creative Research Groups in China, and the Research Grants Council of Hong Kong (HKU 7050/06P). Zhihua Sun's research was supported by the President Fund of GUCAS and the China Postdoctoral Science Foundation.

Appendix A. Proof of the theorems

We begin this section by listing the conditions needed in the proofs of all the theorems.

(A.a) Σ = E[π(X)H(X)H(X)^⊤] is a positive definite matrix.
(A.b) (i) inf_x π(x) > 0; (ii) π(x) has bounded derivatives up to order 2.
(A.c) The density of X, say f(·), exists, has bounded derivatives up to order 2 and satisfies 0 < inf_x f(x) ≤ sup_x f(x) < ∞.
(A.d) (i) sup_x E[Y²|X = x] < ∞; (ii) E‖X‖² < ∞.
(A.e) (i) nb_n^d → ∞; (ii) b_n → 0.

Remark. Conditions (A.a) and (A.d) are necessary for the asymptotic normality of the least squares estimator. Conditions (A.b)(i) and (A.c) are imposed to avoid tedious proofs of the theorems. Conditions (A.e) and (A.b)(ii) are usual assumptions on the convergence rates of the kernel estimating method.

Lemma A.1. Under H_{1n}, assuming (A.a) and (A.d), we have

$$\sqrt{n}(\hat\beta_n-\beta)=\Sigma^{-1}\sqrt{n}\,C_nE[\pi(X)H(X)G(X)]+\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i+\left\{\left[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top\right]^{-1}-\Sigma^{-1}\right\}C_n\sqrt{n}\,E[\pi(X)H(X)G(X)]+o_p(1).\tag{A.1}$$
Proof. By some computations, we have

$$\begin{aligned}
\sqrt{n}(\hat\beta_n-\beta)&=\left[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top\right]^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)(Y_i-H(X_i)^\top\beta)\\
&=\Sigma^{-1}\sqrt{n}\,C_nE[\pi(X)H(X)G(X)]+\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i\\
&\quad+\left\{\left[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top\right]^{-1}-\Sigma^{-1}\right\}\frac{C_n}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)G(X_i)\\
&\quad+\left\{\left[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top\right]^{-1}-\Sigma^{-1}\right\}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i+o_p(1).
\end{aligned}\tag{A.2}$$
Obviously, $[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top]^{-1}-\Sigma^{-1}=o_p(1)$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i=O_p(1)$. Hence the fourth main term on the right-hand side of the last equality in (A.2) is o_p(1). The lemma then follows from (A.2) and the law of large numbers.

Next we prove the theorems. For simplicity, we only give the proofs of Theorem 3.1 for R^s_{n2}, Theorem 3.2 for T^s_{n2}, Theorems 4.1 and 4.2 for R^E_{n2} and T^E_{n2}, and Theorem 4.3 for T^{BE}_{n2}. The proofs of Theorem 3.1 for R^s_{n1}, Theorem 3.2 for T^s_{n1}, Theorems 4.1 and 4.2 for R^E_{n1} and T^E_{n1}, and Theorem 4.3 for T^{BE}_{n1} are similar.
Proof of Theorem 3.1 for R^s_{n2}. Clearly, it can be verified that

$$\begin{aligned}
R^s_{n2}&=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(Y_i-H(X_i)^\top\beta)W_{s2}(X_i)+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}H(X_i)^\top(\beta-\hat\beta_n)W_{s2}(X_i)\\
&\quad+\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_i(Y_i-H(X_i)^\top\beta)W_{s2}(X_i)+\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top(\beta-\hat\beta_n)W_{s2}(X_i)\\
&=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(Y_i-H(X_i)^\top\beta)W_{s2}(X_i)-E[H(X)^\top W_{s2}(X)]\sqrt{n}(\hat\beta_n-\beta)+o_p(1).
\end{aligned}\tag{A.3}$$

By the central limit theorem and the fact

$$\sqrt{n}(\hat\beta_n-\beta)=\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i+o_p(1),\tag{A.4}$$

Theorem 3.1 for R^s_{n2} is proved.
Proof of Theorem 3.2 for T^s_{n2}. Clearly,

$$\begin{aligned}
T^s_{n2}&=\frac{1}{\hat V^s_{n2}}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\hat\pi_n(X_i)}(Y_i-H(X_i)^\top\hat\beta_n)W_{s2}(X_i)\right]^2\\
&=\frac{1}{V^s_2}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\hat\pi_n(X_i)}(Y_i-H(X_i)^\top\hat\beta_n)W_{s2}(X_i)\right]^2+\frac{V^s_2-\hat V^s_{n2}}{\hat V^s_{n2}V^s_2}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\hat\pi_n(X_i)}(Y_i-H(X_i)^\top\hat\beta_n)W_{s2}(X_i)\right]^2\\
&:=\frac{1}{V^s_2}M_n^2+\frac{V^s_2-\hat V^s_{n2}}{\hat V^s_{n2}V^s_2}M_n^2+o_p(1)
\end{aligned}\tag{A.5}$$

with $M_n=\frac{1}{\sqrt{n}}\sum_{i=1}^n(\delta_i/\hat\pi_n(X_i))(Y_i-H(X_i)^\top\hat\beta_n)W_{s2}(X_i)$. Next we consider M_n. It is easy to see that

$$M_n=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(Y_i-H(X_i)^\top\hat\beta_n)W_{s2}(X_i)+\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_i(Y_i-H(X_i)^\top\hat\beta_n)W_{s2}(X_i):=M_{n1}+M_{n2}.\tag{A.6}$$

Obviously, when the local alternative model (3.6) is true, we have

$$\begin{aligned}
M_{n1}&=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(Y_i-H(X_i)^\top\beta)W_{s2}(X_i)+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}H(X_i)^\top(\beta-\hat\beta_n)W_{s2}(X_i)\\
&=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(H(X_i)^\top\beta+C_nG(X_i)+\varepsilon_i-H(X_i)^\top\beta)W_{s2}(X_i)-E(H(X)^\top W_{s2}(X))\sqrt{n}(\hat\beta_n-\beta)+o_p(1)\\
&=\sqrt{n}\,C_nE(G(X)W_{s2}(X))+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i\varepsilon_i}{\pi(X_i)}W_{s2}(X_i)-E(H(X)^\top W_{s2}(X))\sqrt{n}(\hat\beta_n-\beta)+o_p(1).
\end{aligned}\tag{A.7}$$
For M_{n2}, it can be validated that

$$M_{n2}=\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_i(Y_i-H(X_i)^\top\beta)W_{s2}(X_i)-\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top(\hat\beta_n-\beta)W_{s2}(X_i):=M_{n2,1}-M_{n2,2}.\tag{A.8}$$

For M_{n2,1}, it yields

$$M_{n2,1}=\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_i\varepsilon_iW_{s2}(X_i)+\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iC_nG(X_i)W_{s2}(X_i).\tag{A.9}$$
The first term can be proved to be o_p(1). Denote the second term by $\Lambda_2$. Then, we have

$$\begin{aligned}
\Lambda_2&=\frac{C_n}{\sqrt{n}}\sum_{i=1}^n\delta_iG(X_i)W_{s2}(X_i)\frac{\pi(X_i)-\hat\pi_n(X_i)}{\pi^2(X_i)}+o_p(1)\\
&=\frac{C_n}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_iG(X_i)W_{s2}(X_i)}{\pi^2(X_i)}\frac{\sum_{j=1}^n(\pi(X_i)-\delta_j)K\left(\frac{X_i-X_j}{b_n}\right)}{\sum_{j=1}^nK\left(\frac{X_i-X_j}{b_n}\right)}+o_p(1)\\
&=\frac{C_n}{n^{3/2}b_n^d}\sum_{i=1}^n\frac{\delta_iG(X_i)W_{s2}(X_i)}{\pi^2(X_i)f(X_i)}\sum_{j=1}^n(\pi(X_i)-\delta_j)K\left(\frac{X_i-X_j}{b_n}\right)+o_p(1)\\
&=\frac{C_n}{n^{3/2}b_n^d}\sum_{i=1}^n\frac{\delta_iG(X_i)W_{s2}(X_i)}{\pi^2(X_i)f(X_i)}\sum_{j=1}^n(\pi(X_j)-\delta_j)K\left(\frac{X_i-X_j}{b_n}\right)+o_p(1)\\
&=\frac{C_n}{n^{3/2}b_n^d}\sum_{j=1}^n(\pi(X_j)-\delta_j)\sum_{i=1}^n\frac{\delta_iG(X_i)W_{s2}(X_i)}{\pi^2(X_i)f(X_i)}K\left(\frac{X_i-X_j}{b_n}\right)+o_p(1)\\
&=\frac{C_n}{\sqrt{n}}\sum_{j=1}^n\frac{\pi(X_j)-\delta_j}{\pi(X_j)}G(X_j)W_{s2}(X_j)+o_p(1).
\end{aligned}$$

This proves

$$M_{n2,1}=\frac{C_n}{\sqrt{n}}\sum_{j=1}^n\frac{\pi(X_j)-\delta_j}{\pi(X_j)}G(X_j)W_{s2}(X_j)+o_p(1).\tag{A.10}$$
By Lemma A.1, it is easy to see that

$$\begin{aligned}
M_{n2,2}&=\frac{1}{n}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top W_{s2}(X_i)\Bigg\{\Sigma^{-1}\sqrt{n}\,C_nE[\pi(X)H(X)G(X)]\\
&\qquad+\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i+\left(\left[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top\right]^{-1}-\Sigma^{-1}\right)C_n\sqrt{n}\,E[\pi(X)H(X)G(X)]\Bigg\}+o_p(1)\\
&=\frac{C_n}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top W_{s2}(X_i)\Sigma^{-1}E[\pi(X)H(X)G(X)]\\
&\quad+\frac{1}{n}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top W_{s2}(X_i)\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i\\
&\quad+\frac{1}{n}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top W_{s2}(X_i)\left(\left[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top\right]^{-1}-\Sigma^{-1}\right)C_n\sqrt{n}\,E[\pi(X)H(X)G(X)]+o_p(1)\\
&=M^{[1]}_{n2,2}+M^{[2]}_{n2,2}+M^{[3]}_{n2,2}+o_p(1).
\end{aligned}\tag{A.11}$$

Applying the standard kernel method, we can get

$$M^{[1]}_{n2,2}=\frac{C_n}{\sqrt{n}}\sum_{i=1}^n\frac{\pi(X_i)-\delta_i}{\pi(X_i)}H(X_i)^\top W_{s2}(X_i)\Sigma^{-1}E[\pi(X)H(X)G(X)]+o_p(1).\tag{A.12}$$
Furthermore, it can be shown that

$$\frac{1}{n}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top W_{s2}(X_i)=o_p(1).$$

This together with the fact $\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i=O_p(1)$ proves

$$M^{[2]}_{n2,2}=o_p(1).\tag{A.13}$$

For $n^rC_n\to a$, $0\le r\le 1/2$, by noting that $[\frac{1}{n}\sum_{i=1}^n\delta_iH(X_i)H(X_i)^\top]^{-1}-\Sigma^{-1}=o_p(1)$, we have

$$M^{[3]}_{n2,2}=o_p(1).\tag{A.14}$$

From (A.12)–(A.14), we have

$$M_{n2,2}=\frac{C_n}{\sqrt{n}}\sum_{i=1}^n\frac{\pi(X_i)-\delta_i}{\pi(X_i)}H(X_i)^\top W_{s2}(X_i)\Sigma^{-1}E[\pi(X)H(X)G(X)]+o_p(1).\tag{A.15}$$

Then from (A.8), (A.10) and (A.15), we can obtain

$$M_{n2}=\frac{C_n}{\sqrt{n}}\sum_{j=1}^n\frac{\pi(X_j)-\delta_j}{\pi(X_j)}\{G(X_j)W_{s2}(X_j)-H(X_j)^\top W_{s2}(X_j)\Sigma^{-1}E[\pi(X)H(X)G(X)]\}+o_p(1).\tag{A.16}$$

Further, by (A.6), (A.7) and (A.16), it can be concluded that

$$\begin{aligned}
M_n&=\sqrt{n}\,C_nE(G(X)W_{s2}(X))+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i\varepsilon_i}{\pi(X_i)}W_{s2}(X_i)-E(H(X)^\top W_{s2}(X))\sqrt{n}(\hat\beta_n-\beta)\\
&\quad+\frac{C_n}{\sqrt{n}}\sum_{j=1}^n\frac{\pi(X_j)-\delta_j}{\pi(X_j)}\{G(X_j)W_{s2}(X_j)-H(X_j)^\top W_{s2}(X_j)\Sigma^{-1}E[\pi(X)H(X)G(X)]\}+o_p(1).
\end{aligned}\tag{A.17}$$

If $n^{1/2}C_n\to 1$, by Lemma A.1, we can see that

$$M_n=E[G(X)W_{s2}(X)]+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i\varepsilon_i}{\pi(X_i)}W_{s2}(X_i)-E(H(X)^\top W_{s2}(X))\left\{\Sigma^{-1}E(\pi(X)H(X)G(X))+\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i\right\}+o_p(1).\tag{A.18}$$

Recalling that $\hat V^s_{n2}$ is a consistent estimator of $V^s_2$, we have

$$\frac{V^s_2-\hat V^s_{n2}}{\hat V^s_{n2}V^s_2}M_n^2=o_p(1).\tag{A.19}$$

From (A.5), (A.18) and (A.19), we have that $T^s_{n2}\stackrel{L}{\longrightarrow}\chi^2_1(D_2^2)$ with the non-centrality parameter

$$D_2^2=\frac{1}{V^s_2}[E(G(X)W_{s2}(X))-E(H(X)^\top W_{s2}(X))\Sigma^{-1}E(\pi(X)H(X)G(X))]^2.$$

If $n^rC_n\to a$, $0\le r<1/2$, then $\sqrt{n}\,C_n\to\infty$ as $n\to\infty$. From (A.5), (A.17) and Lemma A.1, we have $T^s_{n2}\to\infty$. Theorem 3.2 for $T^s_{n2}$ is then proved.
Proof of Theorem 4.1 for R^E_{n2} and T^E_{n2}. From (2.5) and (4.2), by some computations, it yields

$$\begin{aligned}
R^E_{n2}(x)&=\frac{1}{\sqrt{n}}\sum_{i=1}^n\left[\frac{\delta_i}{\hat\pi_n(X_i)}Y_i+\left(1-\frac{\delta_i}{\hat\pi_n(X_i)}\right)H(X_i)^\top\hat\beta_n-H(X_i)^\top\hat\beta_n\right]W_{E2}(X_i)I(X_i\le x)\\
&=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(Y_i-H(X_i)^\top\beta)W_{E2}(X_i)I(X_i\le x)+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}H(X_i)^\top(\beta-\hat\beta_n)W_{E2}(X_i)I(X_i\le x)\\
&\quad+\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_i(Y_i-H(X_i)^\top\beta)W_{E2}(X_i)I(X_i\le x)\\
&\quad+\frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{\hat\pi_n(X_i)}-\frac{1}{\pi(X_i)}\right\}\delta_iH(X_i)^\top(\beta-\hat\beta_n)W_{E2}(X_i)I(X_i\le x)\\
&=:\sum_{j=1}^4O_{nj}(x).
\end{aligned}\tag{A.20}$$

By the asymptotic representation of $\hat\beta_n$ and the law of large numbers, it yields

$$O_{n2}(x)=-E[H(X)^\top W_{E2}(X)I(X\le x)]\sqrt{n}(\hat\beta_n-\beta)+o_p(1)=-E[H(X)^\top W_{E2}(X)I(X\le x)]\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i+o_p(1).\tag{A.21}$$

Furthermore, it can be validated that

$$O_{n3}(x)=o_p(1),\qquad O_{n4}(x)=o_p(1).\tag{A.22}$$

Hence,

$$R^E_{n2}(x)=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i}{\pi(X_i)}(Y_i-H(X_i)^\top\beta)W_{E2}(X_i)I(X_i\le x)-E[H(X)^\top W_{E2}(X)I(X\le x)]\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i+o_p(1).\tag{A.23}$$

For any fixed x, the main term on the right-hand side of (A.23) is essentially a normalized sum of n independent zero-mean random variables. It then follows from the multivariate central limit theorem that $R^E_{n2}(x)$ converges in finite-dimensional distributions to a zero-mean Gaussian process. Furthermore, the tightness of the first term of (A.23) follows from the fact that it consists of sums of monotone step functions, and the tightness of the second term follows from the asymptotic normality of $\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i$ and the fact that $E[H(X)^\top W_{E2}(X)I(X\le x)]$ is a function of x. Hence, $R^E_{n2}(x)$ converges weakly to a zero-mean Gaussian process with the covariance function given in (4.9). By the continuous mapping theorem, Theorem 4.1 for $T^E_{n2}$ is then proved.

Proof of Theorem 4.2 for R^E_{n2} and T^E_{n2}. When the local alternative model (3.6) is true, similarly to the proof of (A.17), it can be shown that
$$\begin{aligned}
R^E_{n2}(x)&=\sqrt{n}\,C_nE(G(X)W_{E2}(X)I(X\le x))+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i\varepsilon_i}{\pi(X_i)}W_{E2}(X_i)I(X_i\le x)-E(H(X)^\top W_{E2}(X)I(X\le x))\sqrt{n}(\hat\beta_n-\beta)\\
&\quad+\frac{C_n}{\sqrt{n}}\sum_{j=1}^n\frac{\pi(X_j)-\delta_j}{\pi(X_j)}\{G(X_j)W_{E2}(X_j)I(X_j\le x)-H(X_j)^\top W_{E2}(X_j)I(X_j\le x)\Sigma^{-1}E[\pi(X)H(X)G(X)]\}+o_p(1).
\end{aligned}\tag{A.24}$$
If $n^{1/2}C_n\to 1$, then it yields

$$R^E_{n2}(x)=E[G(X)W_{E2}(X)I(X\le x)]+\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\delta_i\varepsilon_i}{\pi(X_i)}W_{E2}(X_i)I(X_i\le x)-E(H(X)^\top W_{E2}(X)I(X\le x))\left\{\Sigma^{-1}E(\pi(X)H(X)G(X))+\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n\delta_iH(X_i)\varepsilon_i\right\}+o_p(1).\tag{A.25}$$

By a method similar to that in the proof of Theorem 4.1, the tightness of $R^E_{n2}(x)$ can be proved, and hence the results for $R^E_{n2}(x)$ follow; by the continuous mapping theorem, the result for $T^E_{n2}$ can be easily obtained. If $n^rC_n\to a$, $0\le r<1/2$, then it is easy to see that $\sqrt{n}\,C_n\to\infty$ as $n\to\infty$. From (A.24) and Lemma A.1, we have $T^E_{n2}\to\infty$. Theorem 4.2 for $T^E_{n2}$ is then proved.

Proof of Theorem 4.3 for R^{BE}_{n2} and T^{BE}_{n2}. If $C_n=0$ or $n^{1/2}C_n\to a$, it can be shown that

$$R^{BE}_{n2}(x)=\frac{1}{\sqrt{n}}\sum_{i=1}^ne_i\frac{\delta_i\varepsilon_i}{\pi(X_i)}W_{E2}(X_i)I(X_i\le x)-E(H(X)^\top W_{E2}(X)I(X\le x))\Sigma^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^ne_i\delta_iH(X_i)\varepsilon_i+o_p(1).\tag{A.26}$$

By arguments similar to those for Theorem 4.1, it can be concluded that $R^{BE}_{n2}(x)$ converges in distribution to a Gaussian process. Denote the main term on the right-hand side of (A.26) by $R^{BE}_2(x)$. It can be shown that $E[R^{BE}_2(x)]=0$ and that $\mathrm{cov}(R^{BE}_2(x_1),R^{BE}_2(x_2))$ converges to $\mathrm{cov}(R^E_2(x_1),R^E_2(x_2))$. Hence Theorem 4.3 for $R^{BE}_{n2}$ is proved. Theorem 4.3 for $T^{BE}_{n2}$ then follows from the continuous mapping theorem.

References

Cheng, C.L., Kukush, A., 2004. A goodness-of-fit test for a polynomial errors-in-variables model. Ukrainian Mathematical Journal 56, 527–543.
Cheng, C.L., Schneeweiss, H., 1998. Polynomial regression with errors in the variables. Journal of the Royal Statistical Society Series B 60, 189–199.
Cheng, C.L., Van Ness, J.W., 1998. Statistical Regression with Measurement Error. Oxford University Press, London.
Dikta, G., Kvesic, M., Schmidt, C., 2006. Bootstrap approximations in model checks for binary data. Journal of the American Statistical Association 101, 521–530.
Härdle, W., Mammen, E., 1993. Testing parametric versus nonparametric regression. Annals of Statistics 21, 1926–1947.
Lin, D.Y., Wei, L.J., Ying, Z., 2002. Model-checking techniques based on cumulative residuals. Biometrics 58, 1–12.
Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. Wiley, New York.
Manteiga, G.W., González, P.A., 2006. Goodness-of-fit tests for linear regression models with missing response data. The Canadian Journal of Statistics 34, 149–170.
Pollard, D., 1984. Convergence of Stochastic Processes. Springer, New York.
Schafer, J.L., 1997. Analysis of Incomplete Multivariate Data. Chapman & Hall, New York.
Stute, W., Manteiga, G.W., Quindimil, M.P., 1998. Bootstrap approximations in model checks for regression. Journal of the American Statistical Association 93, 141–149.
Wang, Q.H., Rao, J.N.K., 2002. Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics 30, 345–358.
Wang, Q.H., Linton, O., Härdle, W., 2004. Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association 99, 334–345.
Zhu, L.X., Cui, H.J., 2005. Testing lack-of-fit for general linear errors-in-variables models. Statistica Sinica 15, 1049–1068.
Zhu, L.X., Ng, K.W., 2003. Checking the adequacy of a partial linear model. Statistica Sinica 13, 763–781.