Goodness-of-fit tests for general linear models with covariates missed at random


Journal of Statistical Planning and Inference 142 (2012) 2047–2058


Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi

Xu Guo a,b, Wangli Xu b,*

a Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
b Center for Applied Statistics, School of Statistics, Renmin University of China, China

Article info

Article history: Received 18 July 2011; received in revised form 26 December 2011; accepted 13 February 2012; available online 23 February 2012.

Keywords: General linear model; Lack of fit test; Randomly missing covariates; Inverse probability weights

Abstract

In this paper, we consider a model checking problem for general linear models with randomly missing covariates. Two score-type tests based on inverse probability weighting, with the weights estimated by parametric and nonparametric methods respectively, are proposed for this goodness-of-fit problem. The asymptotic properties of the test statistics are developed under the null and local alternative hypotheses. A simulation study is carried out to assess the sizes and powers of the tests. We illustrate the proposed method with a data set on monozygotic twins. © 2012 Elsevier B.V. All rights reserved.

1. Introduction

A general linear regression model for the dependence of a scalar response $Y$ on covariates $X$ of dimension $m$ has the form

$$Y = \phi^\top(X)\beta + \varepsilon, \tag{1}$$

where $\phi(\cdot)$ is a known vector function of dimension $p$, and $\beta$ is an unknown parameter vector of dimension $p$. We assume that the error satisfies $E(\varepsilon \mid X) = 0$ and $E(\varepsilon^2 \mid X) = \sigma^2(X) < \infty$, and we write the transpose of $\phi(X)$ as $\phi^\top(X)$ in (1). Model (1) covers one important statistical model, the classical linear model with $\phi(X) \equiv X$, which has been studied extensively in the literature. Compared with the classical linear model, the general linear model (1) is more flexible and more widely applicable because it allows for interactions and higher-order terms of the covariates. Clearly, to prevent wrong conclusions, any statistical analysis within the model should be accompanied by a check of whether the hypothesized parametric model holds at all.

In the context of complete samples, many authors have investigated model checking for parametric regression models, and these tests may be readily applied to handle the goodness of fit of model (1). Among others, Härdle and Mammen (1993) compared parametric and nonparametric fits and used the wild bootstrap to compute the critical value of the test. Härdle et al. (1998) studied testing for parametric versus semiparametric modeling in generalized linear models, again using the wild bootstrap method. Stute et al. (1998) proposed an innovation process approach so as to obtain asymptotically distribution-free and optimal tests.

* Corresponding author. E-mail address: [email protected] (W. Xu).

0378-3758/$ - see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2012.02.039


X. Guo, W. Xu / Journal of Statistical Planning and Inference 142 (2012) 2047–2058

Stute and Zhu (2002) developed nonparametric tests for the validity of a generalized linear model, with the tests properly transformed to their innovation parts so that the resulting test statistics are distribution-free. Stute and Manteiga (1996) constructed a test based on the comparison of a fully nonparametric fit and a parametric estimator. When the covariates are measured with error, under the assumption of an additive error model structure and a known error variance, Zhu and Cui (2005) proposed a score-type test for the general linear model.

In practice, however, some values of the covariates, denoted by $U$ with $X = (U, T)$, may be missing for various reasons. For example, a covariate may be too expensive to measure for the full study cohort because of a limited budget, and there are many works that improve study efficiency through biased sampling when the full assessment of $U$ on the whole study cohort is not feasible. Among others, Zhou et al. (2002) proposed a two-component outcome-dependent sampling scheme to enhance efficiency, in which the data with $U$ observed consist of a simple random sample and a supplementary sample drawn based on information on the response at the first stage. Weaver and Zhou (2005) developed an estimated likelihood method for continuous-outcome regression models, where the data consist not only of the observations with $U$ observed but also of those with $U$ unobserved. Other sources of missingness are drop-outs due to serious side effects, refusals to reply to certain questions in a survey, errors in the measuring apparatus, and so forth. In fact, missing covariates are very common in clinical longitudinal studies, opinion polls, medical studies and other scientific experiments.

Some studies have investigated model checking when the response variable $Y$ is missing at random. Among others, Manteiga and González (2006) extended Härdle and Mammen's (1993) method to test the goodness of fit of a linear regression model with missing response data, and built test statistics based on the $L_2$ distance between the nonparametric and parametric fits. For the general linear model (1) with responses missing at random, Sun and Wang (2009) completed the incomplete observations by imputation and inverse probability weighting methods and then constructed two score-type tests and two empirical process tests with the completed samples. To our knowledge, little work has focused on goodness of fit with missing covariates. Our interest lies in obtaining a model check for model (1) adapted to the case where the covariate $U$ has missing data and the other variables have complete data, that is, under the null hypothesis

$$H_0 : E(Y \mid X) = \phi(X)^\top \beta \tag{2}$$

for some $\beta$ and known $\phi(\cdot)$ with the covariate $U$ missing. Two possible approaches, namely parametric and nonparametric, are considered to estimate the inverse probability function, and the two test statistics studied in this paper are based on these two estimators. Another point of interest is to ascertain which of the two proposed tests performs better. The simulation analysis shows that each test has its own advantages.

The rest of this paper is organized as follows. In Section 2, we construct the test statistics and derive their asymptotic properties under the null hypothesis and under local alternative hypotheses. In Section 3, some simulations are reported and a real data analysis is carried out to illustrate the proposed tests. The proofs of the asymptotic results are presented in the Appendix.

2. Test procedure

2.1. Construction of test statistics

For model (1), let $X = (U, T)$ and assume that the covariate $U$ is missing at random (MAR), while $Y$ and $T$ are fully observed. Here $U$ and $T$ are $d_1$- and $d_2$-dimensional random vectors respectively. Let $\delta_i$ be the missing indicator for the $i$th individual, with $\delta_i = 1$ if $u_i$ is observed and $\delta_i = 0$ otherwise. Then MAR implies

$$P(\delta = 1 \mid Y, U, T) = P(\delta = 1 \mid Y, T) = \pi(Z),$$

where $Z = (Y, T)$. MAR is a common assumption for statistical analysis with missing data and is applicable in many practical situations; see Little and Rubin (1987). Note that under $H_0$

$$E\left\{\frac{\delta}{\pi(Z)}\left(Y - \phi^\top(X)\beta\right)\right\} = 0,$$

so two residual-based test statistics are constructed as follows:

$$T_{n1} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)} \left(y_i - \phi^\top(x_i)\hat\beta_N\right), \tag{3}$$

$$T_{n2} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)} \left(y_i - \phi^\top(x_i)\hat\beta_P\right), \tag{4}$$

where $\hat\beta_N$ and $\hat\beta_P$ are estimators of $\beta$, and $\hat\pi(z_i)$ and $\pi(z_i, \hat\alpha)$ are the nonparametric and parametric estimators of $\pi(z_i)$ respectively. The estimators $\hat\beta_N$, $\hat\beta_P$, $\hat\pi(z_i)$ and $\pi(z_i, \hat\alpha)$ will be specified later.
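The moment identity behind (3) and (4) can be checked numerically. The following is a minimal sketch, not part of the proposed procedure: it assumes a scalar covariate with $\phi(x) = 1 + x^2$, $\beta = 1$, an error variance of 0.25 and a logistic missing mechanism that depends on $y$ only, and it uses the true $\pi$ and $\beta$ rather than estimates. The function name `ipw_vs_complete_case` is hypothetical.

```python
import numpy as np

def ipw_vs_complete_case(n=200_000, seed=0):
    """Compare the IPW-weighted and complete-case means of the residual under H0.

    All design choices below (phi, beta, pi) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    eps = rng.normal(0.0, 0.5, n)                   # error with variance 0.25
    y = (1.0 + x**2) * 1.0 + eps                    # H0 holds: phi(x) = 1 + x^2, beta = 1
    pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.8 * y)))     # P(delta = 1 | y): MAR, depends on y only
    delta = rng.uniform(size=n) < pi                # True where the covariate is observed
    resid = y - (1.0 + x**2)                        # residual at the true beta
    ipw_mean = np.mean(delta / pi * resid)          # sample version of E[delta / pi(Z) * residual]
    cc_mean = resid[delta].mean()                   # unweighted mean over complete cases
    return ipw_mean, cc_mean
```

Under these assumptions the weighted mean should be close to zero, while the unweighted complete-case mean is biased upward because observations with large $y$ are more likely to be retained.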


It is worth mentioning that our idea for constructing tests is to weight the observed residuals by the inverse probability function, while that in Sun and Wang (2009) is to impute the incomplete observations and then construct the tests with the completed samples. The probability $\pi(z_i)$ can be estimated nonparametrically, i.e.,

$$\hat\pi(z_i) = \frac{\sum_{j=1}^{n} \delta_j K_h(z_i - z_j)}{\sum_{j=1}^{n} K_h(z_i - z_j)}, \tag{5}$$

where $K_h(\cdot) = K(\cdot/h)/h$ with $K(\cdot)$ being a kernel function and $h$ a bandwidth. However, when the dimension of $Z$ is high, the kernel estimator may suffer from the curse of dimensionality faced by fully nonparametric models. In this case, a purely parametric model may be more applicable, that is, we assume that $\pi(Z) = \pi(Z, \alpha)$. Logistic regression based on $(\delta_i, y_i, t_i)$, $i = 1, \ldots, n$, yields consistent estimates of the regression coefficients in the model, provided that $\pi(z_i, \alpha)$ is correctly specified. More specifically, we suppose $\pi(z_i, \alpha) = \left(1 + \exp(-\alpha_0 - \alpha_1 y_i - \alpha_2^\top t_i)\right)^{-1}$, where $\alpha = (\alpha_0, \alpha_1, \alpha_2^\top)^\top$ is an unknown parameter vector. The least squares estimator $\hat\alpha$ of $\alpha$ is defined as

$$\hat\alpha = \arg\min_{\alpha} \sum_{i=1}^{n} \left(\delta_i - \pi(z_i, \alpha)\right)^2. \tag{6}$$

The corresponding estimator of $\pi(Z)$ is

$$\pi(z_i, \hat\alpha) = \left(1 + \exp(-\hat\alpha_0 - \hat\alpha_1 y_i - \hat\alpha_2^\top t_i)\right)^{-1}. \tag{7}$$
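The two estimators of $\pi$ can be sketched as follows. This is an illustrative sketch only: the kernel estimator is written for a scalar $z_i$ with the quartic kernel used later in Section 3.1, and the least squares problem (6) is solved by a damped Gauss-Newton iteration, which is an assumption of this sketch since the paper does not state which optimizer was used. All function names are hypothetical.

```python
import numpy as np

def quartic_kernel(u):
    """K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_pi_hat(z, delta, h):
    """Kernel estimate (5) of pi(z_i) for scalar z_i (assumption: dim(Z) = 1)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    w = quartic_kernel((z[:, None] - z[None, :]) / h) / h   # matrix of K_h(z_i - z_j)
    return (w @ delta) / w.sum(axis=1)                      # row sums are > 0 since K(0) > 0

def logistic_pi(G, alpha):
    """pi(z, alpha) = 1 / (1 + exp(-G alpha)), rows of G being (1, z_i)."""
    return 1.0 / (1.0 + np.exp(-(G @ alpha)))

def logistic_ls_fit(z, delta, n_iter=100):
    """Least squares fit (6) by damped Gauss-Newton (an illustrative optimizer choice)."""
    delta = np.asarray(delta, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(delta), -1)
    G = np.column_stack([np.ones(len(delta)), z])
    alpha = np.zeros(G.shape[1])
    sse = np.sum((delta - logistic_pi(G, alpha)) ** 2)
    for _ in range(n_iter):
        p = logistic_pi(G, alpha)
        J = (p * (1.0 - p))[:, None] * G                    # gradient of pi with respect to alpha
        step = np.linalg.lstsq(J, delta - p, rcond=None)[0]
        t = 1.0
        while t > 1e-8:                                     # halve the step until the SSE decreases
            cand = alpha + t * step
            cand_sse = np.sum((delta - logistic_pi(G, cand)) ** 2)
            if cand_sse < sse:
                alpha, sse = cand, cand_sse
                break
            t /= 2.0
        else:
            break                                           # no improving step found: stop iterating
    return alpha
```

The kernel estimator builds an $n \times n$ weight matrix, which is adequate for the sample sizes considered here ($n = 50$ or $100$) but would need a different implementation for large $n$.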

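The estimators $\hat\pi(z_i)$ and $\pi(z_i, \hat\alpha)$ in (3) and (4) are those given in (5) and (7) respectively. We estimate the regression parameter $\beta$ by the inverse probability weighting method, that is,

$$\hat\beta_N = \left(\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)} \phi(x_i)\phi^\top(x_i)\right)^{-1} \sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)} \phi(x_i) y_i$$

or

$$\hat\beta_P = \left(\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)} \phi(x_i)\phi^\top(x_i)\right)^{-1} \sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)} \phi(x_i) y_i,$$

depending on whether $\pi(Z)$ is estimated nonparametrically or parametrically.

2.2. Asymptotic behavior of the test statistics

Let $\pi'(Z, \alpha) = \operatorname{grad}_\alpha(\pi(Z, \alpha))$. Under mild conditions, see Jennrich (1969), we have

$$\sqrt{n}(\hat\alpha - \alpha) = \left\{E\left(\pi'^\top(Z, \alpha)\pi'(Z, \alpha)\right)\right\}^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \pi'^\top(z_i, \alpha)\left(\delta_i - \pi(z_i, \alpha)\right) + o_p(1). \tag{8}$$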
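Before turning to the limiting distributions, the weighted least squares estimator of Section 2.1 and the statistics (3) and (4) can be sketched numerically. The sketch below is a hedged illustration: the function names are hypothetical, the estimated probabilities are passed in as an array `pi_hat` (from either (5) or (7)), and rows of `phi` with $\delta_i = 0$ are assumed to hold a finite placeholder value, since they receive zero weight.

```python
import numpy as np

def ipw_beta(phi, y, delta, pi_hat):
    """Inverse-probability-weighted least squares estimate of beta.

    phi: (n, p) array with rows phi(x_i); rows with delta_i = 0 must be finite
    placeholders (their weight is zero, but NaN would propagate).
    """
    w = delta / pi_hat                       # weights delta_i / pi_hat_i
    A = (phi * w[:, None]).T @ phi           # sum_i w_i phi(x_i) phi(x_i)^T
    b = (phi * w[:, None]).T @ y             # sum_i w_i phi(x_i) y_i
    return np.linalg.solve(A, b)

def score_statistic(phi, y, delta, pi_hat):
    """T_n = n^{-1/2} sum_i delta_i / pi_hat_i * (y_i - phi(x_i)^T beta_hat)."""
    beta_hat = ipw_beta(phi, y, delta, pi_hat)
    resid = y - phi @ beta_hat
    return np.sum(delta / pi_hat * resid) / np.sqrt(len(y)), beta_hat
```

With complete data ($\delta_i = 1$ and $\hat\pi_i = 1$ for all $i$) the estimator reduces to ordinary least squares. Note also that if $\phi$ contained a constant component, the normal equations would force the weighted residuals to sum to zero and the statistic would vanish identically, so the sketch is only meaningful for $\phi$ without an intercept term.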

To state the theorems, we introduce some notation related to the limiting variances of the test statistics. Let $\Sigma = E(\phi(X)\phi^\top(X))$, $G = (1, Z)$, $M_1 = E((1 - \pi(Z))G\varepsilon)$, $M_2 = E((1 - \pi(Z))G\phi(X)\varepsilon)$ and $\Sigma_\alpha = E(\pi'^\top(Z, \alpha)\pi'(Z, \alpha))$. We now state the asymptotic properties of $T_{ni}$ ($i = 1, 2$) in (3) and (4).

Theorem 1. Under $H_0$ and the conditions in the Appendix, we have

$$T_{n1} \to N(0, V_1) \quad \text{and} \quad T_{n2} \to N(0, V_2),$$

where

$$V_1 = E\left[\frac{\varepsilon^2\left\{1 - E(\phi^\top(X))\Sigma^{-1}\phi(X)\right\}^2 - (1 - \pi(Z))\left\{E(\varepsilon \mid Z) - E(\phi^\top(X))\Sigma^{-1}E(\phi(X)\varepsilon \mid Z)\right\}^2}{\pi(Z)}\right],$$

$$V_2 = E\left\{\frac{\varepsilon\delta\left(1 - E(\phi^\top(X))\Sigma^{-1}\phi(X)\right)}{\pi(Z, \alpha)} - \left(M_1 - E(\phi^\top(X))\Sigma^{-1}M_2\right)\Sigma_\alpha^{-1}\pi'^\top(Z, \alpha)\left(\delta - \pi(Z, \alpha)\right)\right\}^2.$$

We now investigate the sensitivity of the tests to a sequence of local alternatives of the form

$$H_{1n} : Y = \phi^\top(X)\beta + C_n G(X) + \eta,$$

where $E(\eta \mid X) = 0$ and the function $G(\cdot)$ satisfies $E(G^2(X)) < \infty$. Then we have the following theorem:

Theorem 2. Assume the same hypotheses as in Theorem 1. Under the local alternatives $H_{1n}$, we have:

(i) If $n^{1/2}C_n \to 1$, then $T_{n1} \to N(\mu_1, V_1)$ and $T_{n2} \to N(\mu_2, V_2)$, where $\mu_1 = \mu_2 = E(G(X)) - E(\phi^\top(X))\Sigma^{-1}E(\phi(X)G(X))$.

(ii) If $n^{r}C_n \to a$ with $0 < r < 1/2$ and $a \neq 0$, then $T_{n1} \to \infty$ and $T_{n2} \to \infty$.

Theorem 2 suggests that the proposed tests have asymptotic power 1 for local alternatives that differ from the null hypothesis at the rate $n^{-r}$ with $0 < r < 1/2$. The tests can also detect alternatives converging to the null hypothesis at the rate $n^{-1/2}$, which is the fastest possible rate for lack-of-fit tests.
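The drift $\mu_1 = \mu_2$ in Theorem 2(i) is a simple functional of the moments of $\phi(X)$ and $G(X)$, and its sample analogue can be computed directly. The sketch below is illustrative only; the function name `drift` is hypothetical, and the design used in the usage note ($X \sim U(0,1)$, $\phi(X) = 1 + X^2$, $G(X) = X^3$) is an assumption chosen for the example.

```python
import numpy as np

def drift(phi, g):
    """Sample analogue of mu = E[G] - E[phi]^T Sigma^{-1} E[phi G].

    phi: (n, p) array with rows phi(x_i); g: (n,) array with entries G(x_i).
    """
    sigma = phi.T @ phi / len(g)                 # estimate of Sigma = E[phi phi^T]
    phi_g = phi.T @ g / len(g)                   # estimate of E[phi(X) G(X)]
    return g.mean() - phi.mean(axis=0) @ np.linalg.solve(sigma, phi_g)
```

For the assumed design, the exact moments are $E(G) = 1/4$, $E(\phi) = 4/3$, $E(\phi G) = 5/12$ and $\Sigma = 28/15$, so $\mu = 1/4 - 25/84 = -1/21$, and `drift` applied to a large sample should be close to this value. When $G$ lies in the span of $\phi$, the sample drift is zero up to rounding error.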


Remark 1. The asymptotic properties of the test $T_{n2}$ in Theorems 1 and 2 are based on the assumption that $\pi(Z, \alpha)$ is a logistic regression function; those for any other purely parametric function can be derived similarly. The parameter $\alpha$ can also be estimated by other well-known parametric methods, such as generalized estimating equations (GEE), maximum likelihood estimation (MLE) and restricted maximum likelihood estimation (RMLE). In this case, we only need to replace the asymptotic representation (8) of $\hat\alpha$ with that derived from the chosen estimation method, and the corresponding terms in Theorems 1 and 2 should be replaced accordingly. From Theorem 2, if $G(X) = \phi(X)^\top\gamma$ for some parameter $\gamma$, then $\mu_1 = \mu_2 = 0$; this is as expected, since the model under $H_{1n}$ then still has the form (1) and the null hypothesis holds.

3. Numerical analysis

3.1. Simulation study

This section presents the performance of our proposed test statistics through some simulation runs.

Study 1. We generate the data from the model

$$Y = \phi^\top(X)\beta + aG(X) + \varepsilon, \tag{9}$$

where $\phi(X) = 1 + X^2$ with $X \sim U(0, 1)$, $\beta = 1$, $G(X) = X^3$ and $\varepsilon \sim N(0, 0.25)$. For model (9), it is evident that the null hypothesis $H_0 : E(Y \mid X) = \phi^\top(X)\beta$ for some $\beta$ is valid if and only if $a = 0$. For this model with a univariate covariate, we assume that $X$ is missing at random. Two missing probability mechanisms are considered for model (9):

Case 1. $\pi_1(y) = P(\delta = 1 \mid Y = y) = 1/(1 + \exp(-(1 + 0.8y)))$.

Case 2. $\pi_2(y) = P(\delta = 1 \mid Y = y) = 1/(1 + 0.2|y|)$.

For the above cases, the mean response rates are $E\pi_1(y) \approx 0.88$ and $E\pi_2(y) \approx 0.79$ respectively. The kernel function is taken to be $K(u) = \frac{15}{16}(1 - u^2)^2$ if $|u| \le 1$, and 0 otherwise. As for bandwidth selection, as pointed out by Zhu and Ng (2003), how to select an optimal bandwidth is still an open problem in testing problems and deserves further study. In the simulations, we choose the bandwidth $h_0 = \hat\sigma(Y)n^{-1/3}$, with $\hat\sigma(Y)$ being the empirical estimator of the standard deviation of $Y$, which satisfies condition (4) in the Appendix. We also investigate the sensitivity to the bandwidth for three choices, that is, $h_0 = \hat\sigma(Y)n^{-1/3}$, $h_1 = 0.5\hat\sigma(Y)n^{-1/3}$ and $h_2 = 2\hat\sigma(Y)n^{-1/3}$.

The power performance is investigated by simulation runs with different alternatives, obtained by varying the value of $a$ in (9), different sample sizes $n = 50, 100$ and different missing mechanisms $\pi_i(y)$ ($i = 1, 2$). The simulation results with sample size $n = 100$ and missing mechanism $\pi_1(y)$ are presented in Table 1; the size and power in Table 1 are not very sensitive to the bandwidth $h = h_0, h_1, h_2$. For example, the power for $a = 0.9$ is 0.921, 0.923 and 0.923 for $h_1$, $h_0$ and $h_2$ respectively. For space considerations, for the other simulation runs we report only the results with bandwidth $h_0$, in Figs. 1 and 2. Fig. 1(a) and (b) shows the plots with sample size $n = 50$, while Fig. 2(a) and (b) shows the plots with sample size $n = 100$. From them, the empirical size of $T_{n2}$ is very close to the nominal level $\alpha = 0.05$, while that of $T_{n1}$ is slightly smaller. Under the alternative hypothesis, that is, $a \neq 0$, the power increases quickly as $a$ in (9) increases, i.e., the tests are very sensitive to the alternatives. Also, the power with $n = 100$ is higher than that with $n = 50$, and the power under the missing mechanism $\pi_1(y)$ is higher than that under $\pi_2(y)$, which is reasonable because the mean response rate of $\pi_1(y)$ is larger than that of $\pi_2(y)$.

Comparing the tests $T_{n1}$ and $T_{n2}$, $T_{n2}$ is more powerful than $T_{n1}$ when the missing mechanism is $\pi_1(y)$, while the opposite holds with $\pi_2(y)$. That is, $T_{n2}$, which is calculated from $\pi(Z, \hat\alpha)$ under the assumption of a logistic function, performs better for $\pi_1(y)$, which is indeed a logistic function. In contrast, $T_{n1}$, which is calculated from the nonparametric $\hat\pi(Z)$, performs better for $\pi_2(y)$, which is not a logistic function.

Table 1. Simulated size and power under sample size $n = 100$, missing mechanism $\pi_1(y)$, and different $a$ for Study 1.

| $a$ | $T_{n1}$ ($h_1$) | $T_{n1}$ ($h_0$) | $T_{n1}$ ($h_2$) | $T_{n2}$ |
|---|---|---|---|---|
| 0.000 | 0.043 | 0.042 | 0.038 | 0.050 |
| 0.300 | 0.160 | 0.166 | 0.170 | 0.244 |
| 0.600 | 0.582 | 0.592 | 0.614 | 0.691 |
| 0.900 | 0.921 | 0.923 | 0.923 | 0.946 |
| 1.200 | 0.996 | 0.997 | 0.996 | 0.997 |
| 1.500 | 1.000 | 1.000 | 1.000 | 1.000 |
| 1.800 | 1.000 | 1.000 | 1.000 | 1.000 |
| 2.100 | 1.000 | 1.000 | 1.000 | 1.000 |
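The Study 1 design can be reproduced approximately to check the quoted mean response rates and the bandwidth rule. The following is a minimal sketch under the assumption that $N(0, 0.25)$ denotes a variance of 0.25 (standard deviation 0.5); the function names are hypothetical and the Monte Carlo sample size is an arbitrary choice.

```python
import numpy as np

def mean_response_rates(a=0.0, n=400_000, seed=0):
    """Monte Carlo estimates of E[pi_1(Y)] and E[pi_2(Y)] under model (9)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    # Model (9) with phi(x) = 1 + x^2, beta = 1, G(x) = x^3; error sd 0.5 is an assumption
    y = (1.0 + x**2) + a * x**3 + rng.normal(0.0, 0.5, n)
    pi1 = 1.0 / (1.0 + np.exp(-(1.0 + 0.8 * y)))    # Case 1 (logistic)
    pi2 = 1.0 / (1.0 + 0.2 * np.abs(y))             # Case 2 (not logistic)
    return pi1.mean(), pi2.mean()

def bandwidth_h0(y):
    """Rule h0 = sigma_hat(Y) * n^{-1/3} used in the simulations."""
    y = np.asarray(y, dtype=float)
    return np.std(y, ddof=1) * len(y) ** (-1.0 / 3.0)
```

With $h = cn^{-1/3}$, one has $\sqrt{n}h^2 = c^2 n^{-1/6} \to 0$ and $\sqrt{n}h = cn^{1/6} \to \infty$, which is how this rule meets condition (4) in the Appendix.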

Fig. 1. Power of tests $T_{n1}$ and $T_{n2}$ with $n = 50$: (a) for $\pi_1(y)$ and Study 1; (b) for $\pi_2(y)$ and Study 1; (c) for $\pi_3(y, x_2)$ and Study 2; (d) for $\pi_4(y, x_2)$ and Study 2. The dotted curve is for test $T_{n1}$ and the dash-dotted curve is for test $T_{n2}$. Each panel, titled "Empirical power with $\pi_k$", plots the proportion of rejection against $a$.

Study 2. The data were generated from the following model:

$$Y = \phi^\top(X)\beta + aG(X) + \varepsilon, \tag{10}$$

where $\phi(X) = (X_1, X_2)^\top$ with $X_1 \sim U(0, 1)$, $X_2 \sim N(0, 1)$, $\beta = (1, 1)^\top$, $G(X) = \{(X_1 + 2X_2)/\sqrt{5}\}^2$ and $\varepsilon \sim N(0, 0.25)$. For model (10), the testing problem is $H_0 : E(Y \mid X) = \phi^\top(X)\beta$. It is evident that $a = 0$ corresponds to the null hypothesis and $a \neq 0$ to the alternatives. We assume that $X_1$ is missing at random, and the missing mechanisms are

Case 3. $\pi_3(y, x_2) = 1/(1 + |y|\exp(-y^2 - x_2^2))$.

Case 4. $\pi_4(y, x_2) = 1/(1 + 0.25|y/(y + x_2)|)$.

For the above cases, the mean response rates are $E\pi_3(y, x_2) \approx 0.86$ and $E\pi_4(y, x_2) \approx 0.84$ respectively. The results are presented in Fig. 1(c) and (d) with $n = 50$ and in Fig. 2(c) and (d) with $n = 100$. From the plots, we obtain results similar to those of Study 1, with the following exception. Although the missing probabilities $\pi_3(y, x_2)$ and $\pi_4(y, x_2)$ are not logistic functions, the performance of $T_{n1}$ is competitive with that of $T_{n2}$ even though the argument of $\pi_3(y, x_2)$ and $\pi_4(y, x_2)$ is two-dimensional.

To study the power performance of the proposed tests against high-frequency alternatives, as suggested by a referee, we carry out the following simulation study.

Study 3. We generate the data from the model

$$Y = \phi^\top(X)\beta + aG(X) + \varepsilon. \tag{11}$$

Fig. 2. Power of tests $T_{n1}$ and $T_{n2}$ with $n = 100$: (a) for $\pi_1(y)$ and Study 1; (b) for $\pi_2(y)$ and Study 1; (c) for $\pi_3(y, x_2)$ and Study 2; (d) for $\pi_4(y, x_2)$ and Study 2. The dotted curve is for test $T_{n1}$ and the dash-dotted curve is for test $T_{n2}$. Each panel, titled "Empirical power with $\pi_k$", plots the proportion of rejection against $a$.

Here, the settings for $\phi(X)$, $\beta$ and $\varepsilon$ in (11) are the same as those in (9), and $G(X) = \sin(2\pi X)$ is a high-frequency function. For model (11), the null hypothesis is valid if and only if $a = 0$, and we assume that $X$ is missing at random. The following two missing probability mechanisms are considered:

Case 5. $\pi_5(y) = P(\delta = 1 \mid Y = y) = 1/(1 + |y|\exp(-y))$.

Case 6. $\pi_6(y) = P(\delta = 1 \mid Y = y) = 1/(1 + \exp(-y^2))$.

For the above cases, the mean response rates are $E\pi_5(y) \approx 0.78$ and $E\pi_6(y) \approx 0.79$ respectively. The size and power of $T_{nj}$ are presented in Table 2. From it, we can see that the proposed tests still perform well when the alternative is a high-frequency function. Also, we can conclude that $T_{n2}$ is robust to nonparametric missing mechanisms but may lose some efficiency compared to $T_{n1}$.

3.2. Real data analysis

Lee and Scott (1986) and Xue (2009) analyzed a data set with sample size 50 on monozygotic twins. The data set includes a scalar response $Y$, the birth weight of a baby, and two covariates, $X_{AC}$ (= AC) for the abdominal circumference and $X_{BPD}$ (= BPD) for the biparietal (head) diameter. These data were used to illustrate Xue's (2009) methodology, which constructs confidence intervals and regions for the parameters of interest in linear regression models with missing response data.

Table 2. Simulated size and power under sample size $n = 100$, missing mechanisms $\pi_5(y)$ and $\pi_6(y)$, and different $a$ for Study 3.

| $a$ | $\pi_5(y)$: $T_{n1}$ | $\pi_5(y)$: $T_{n2}$ | $\pi_6(y)$: $T_{n1}$ | $\pi_6(y)$: $T_{n2}$ |
|---|---|---|---|---|
| 0.000 | 0.042 | 0.061 | 0.055 | 0.060 |
| 0.100 | 0.127 | 0.114 | 0.144 | 0.107 |
| 0.200 | 0.321 | 0.275 | 0.334 | 0.226 |
| 0.300 | 0.534 | 0.485 | 0.576 | 0.427 |
| 0.400 | 0.769 | 0.727 | 0.791 | 0.643 |
| 0.500 | 0.905 | 0.887 | 0.905 | 0.803 |
| 0.600 | 0.968 | 0.962 | 0.973 | 0.907 |
| 0.700 | 0.991 | 0.991 | 0.992 | 0.953 |
| 0.800 | 0.997 | 0.995 | 0.999 | 0.968 |
| 0.900 | 1.000 | 0.999 | 1.000 | 0.989 |
| 1.000 | 1.000 | 1.000 | 1.000 | 0.999 |

Fig. 3. The plot for the real data set: (a) $Y$ against $X_{AC}$; (b) $Y$ against $X_{BPD}$.

In this subsection, instead of deleting 20% of the responses as in Xue (2009), we illustrate our methods by randomly deleting 20% of the values of the covariate $X_{AC}$. The kernel function of Section 3.1 is used for computation. All the variables are centered first, and the same notation is retained without loss of generality. The scatter plots of the outcome against the covariates are shown in Fig. 3. The null hypothesis considered is

$$H_0 : E(Y \mid X) = X_{BPD}\beta_{BPD} + X_{AC}\beta_{AC} \tag{12}$$

for some $\beta_{BPD}$ and $\beta_{AC}$. That is, we want to check whether the model is linear or not. Since $\delta$ is generated randomly, the results are computed from 2000 simulation runs. The p-values for $T_{n1}$ and $T_{n2}$ are 0.478 and 0.562 respectively. As a result, the null hypothesis (12) cannot be rejected.

Acknowledgments

The research was supported by the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (No. 12XNI004).

Appendix. Proofs of the theorems

The following conditions are required for the theorems in Section 2:

(1) $\Sigma$ and $\Sigma_\alpha$ are positive definite matrices.

(2) $\pi(Z)$ has bounded partial derivatives up to order 2 almost surely.

(3) $\sup_x E(\varepsilon^2 \mid X = x) < \infty$, $E|X|^4 < \infty$ and $E|Y|^4 < \infty$.

(4) $\sqrt{n}h^2 \to 0$ and $\sqrt{n}h \to \infty$ as $n \to \infty$.


(5) The density of $Z$, denoted by $f(z)$ on support $C$, exists, has bounded derivatives up to order 2 and satisfies

$$0 < \inf_{z \in C} f(z) \le \sup_{z \in C} f(z) < \infty.$$

(6) The continuous kernel function $K(\cdot)$ satisfies: (i) the support of $K(\cdot)$ is the interval $[-1, 1]$; (ii) $K(\cdot)$ is symmetric about 0; (iii) $\int_{-1}^{1} K(u)\,du = 1$ and $\int_{-1}^{1} |u|K(u)\,du \neq 0$.

Remark 2. Conditions (4) and (6) are typical for obtaining convergence rates when nonparametric estimation is applied. Condition (2) is a common assumption in missing data studies, and is also used in Sun and Wang (2009), among others. Conditions (1) and (3) are necessary for the asymptotic normality of the least squares estimator. Condition (5) is imposed to avoid tedious proofs of the theorems. Without this condition, we would have to resort to some truncation technique, since some denominators may be zero.

We first give two lemmas which are used in the proofs of the theorems.

Lemma 1. Under conditions 1–7 and the alternative $H_{1n}$, the asymptotic representation of $\sqrt{n}(\hat\beta_N - \beta)$ based on $\hat\pi(z_i)$ is as follows:

$$\begin{aligned}
\sqrt{n}(\hat\beta_N - \beta) ={}& \frac{\Sigma^{-1}}{\sqrt{n}} \sum_{i=1}^{n} \left\{\frac{\delta_i}{\pi(z_i)}\phi(x_i)\eta_i + \frac{\pi(z_i) - \delta_i}{\pi(z_i)}E(\phi(X)\eta \mid z_i)\right\} + \Sigma^{-1}C_n\sqrt{n}E(\phi(X)G(X)) \\
&+ \left[\left(\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)}\phi(x_i)\phi^\top(x_i)\right)^{-1} - \Sigma^{-1}\right] C_n\sqrt{n}E(\phi(X)G(X)) + o_p(1).
\end{aligned}$$

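Proof of Lemma 1. Note that

$$\sqrt{n}(\hat\beta_N - \beta) = \left\{\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)}\phi(x_i)\phi(x_i)^\top\right\}^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)}\phi(x_i)\left(y_i - \phi(x_i)^\top\beta\right) = A_1^{-1}A_2,$$

so we consider the properties of $A_1$ and $A_2$ below. For $A_1$, we have

$$\begin{aligned}
A_1 &= \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i)}\phi(x_i)\phi(x_i)^\top + \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i) - \hat\pi(z_i)\right)\phi(x_i)\phi(x_i)^\top \\
&= \Sigma + \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)} \frac{\sum_{j=1}^{n}(\pi(z_i) - \delta_j)K_h(z_i - z_j)}{nf(z_i)}\phi(x_i)\phi^\top(x_i) + o_p(1) \\
&= \Sigma + \frac{1}{n}\sum_{j=1}^{n} \frac{\pi(z_j) - \delta_j}{\pi(z_j)}E(\phi(X)\phi^\top(X) \mid z_j) + o_p(1) \\
&= \Sigma + o_p(1),
\end{aligned}$$

where $\Sigma = E(\phi(X)\phi^\top(X))$. For $A_2$, it can be verified that

$$\begin{aligned}
A_2 &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)}\phi(x_i)\left(\eta_i + C_nG(x_i)\right) \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i)}\phi(x_i)\left(\eta_i + C_nG(x_i)\right) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i) - \hat\pi(z_i)\right)\phi(x_i)\eta_i \\
&\quad + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i) - \hat\pi(z_i)\right)\phi(x_i)C_nG(x_i) = B_1 + B_2 + B_3.
\end{aligned} \tag{13}$$

For the term $B_1$, it is evident that

$$B_1 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i)}\phi(x_i)\eta_i + C_n\sqrt{n}E(\phi(X)G(X)).$$

For the terms $B_2$ and $B_3$, we can derive

$$B_2 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)} \frac{\sum_{j=1}^{n}(\pi(z_i) - \delta_j)K_h(z_i - z_j)}{nf(z_i)}\phi(x_i)\eta_i + o_p(1) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} \frac{\pi(z_j) - \delta_j}{\pi(z_j)}E(\phi(X)\eta \mid z_j) + o_p(1)$$

and

$$B_3 = \frac{C_n}{\sqrt{n}}\sum_{j=1}^{n} \frac{\pi(z_j) - \delta_j}{\pi(z_j)}E(\phi(X)G(X) \mid z_j) + o_p(1).$$

Based on the fact that

$$E(B_3) = C_n\sqrt{n}\,E\left\{\frac{\pi(z_j) - \delta_j}{\pi(z_j)}E(\phi(X)G(X) \mid z_j)\right\} + o_p(1) = o_p(1), \tag{14}$$

$$\operatorname{Var}(B_3) = C_n^2\,E\left\{\frac{\pi(Z) - \delta}{\pi(Z)}E(\phi(X)G(X) \mid Z)\right\}^2 \to 0, \tag{15}$$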


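we conclude that

$$B_3 = o_p(1). \tag{16}$$

Based on Eqs. (13)–(16), we have

$$A_2 = C_n\sqrt{n}E(\phi(X)G(X)) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\phi(x_i)\eta_i + (\pi(z_i) - \delta_i)E(\phi(X)\eta \mid z_i)}{\pi(z_i)}.$$

As a result,

$$\begin{aligned}
\sqrt{n}(\hat\beta_N - \beta) &= \left(\left\{\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)}\phi(x_i)\phi(x_i)^\top\right\}^{-1} - \Sigma^{-1}\right)A_2 + \Sigma^{-1}A_2 \\
&= \left(\left\{\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\hat\pi(z_i)}\phi(x_i)\phi(x_i)^\top\right\}^{-1} - \Sigma^{-1}\right)C_n\sqrt{n}E(\phi(X)G(X)) + \Sigma^{-1}C_n\sqrt{n}E(\phi(X)G(X)) \\
&\quad + \frac{\Sigma^{-1}}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\phi(x_i)\eta_i + (\pi(z_i) - \delta_i)E(\phi(X)\eta \mid z_i)}{\pi(z_i)} + o_p(1),
\end{aligned} \tag{17}$$

and the last equation in (17) follows from $\{n^{-1}\sum_{i=1}^{n} \delta_i\phi(x_i)\phi(x_i)^\top/\hat\pi(z_i)\}^{-1} - \Sigma^{-1} = o_p(1)$ and $n^{-1/2}\sum_{i=1}^{n} \{\delta_i\phi(x_i)\eta_i + (\pi(z_i) - \delta_i)E(\phi(X)\eta \mid z_i)\}/\pi(z_i) = O_p(1)$. This finishes the proof of Lemma 1. □

Lemma 2. Under conditions 1–7 in the Appendix and the alternative $H_{1n}$, the asymptotic representation of $\sqrt{n}(\hat\beta_P - \beta)$ based on $\pi(z_i, \hat\alpha)$ is

$$\begin{aligned}
\sqrt{n}(\hat\beta_P - \beta) ={}& \frac{\Sigma^{-1}}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\eta_i + \Sigma^{-1}C_n\sqrt{n}E(\phi(X)G(X)) - \Sigma^{-1}E\left((1 - \pi(Z, \alpha))G\,E(\phi(X)\eta \mid Z)\right)\sqrt{n}(\hat\alpha - \alpha) \\
&+ \left[\left(\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)}\phi(x_i)\phi^\top(x_i)\right)^{-1} - \Sigma^{-1}\right]C_n\sqrt{n}E(\phi(X)G(X)) + o_p(1).
\end{aligned}$$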

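Proof of Lemma 2. Since

$$\sqrt{n}(\hat\beta_P - \beta) = \left\{\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)}\phi(x_i)\phi(x_i)^\top\right\}^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)}\phi(x_i)\left(y_i - \phi(x_i)^\top\beta\right) = C_1^{-1}C_2,$$

we only need to investigate the properties of $C_1$ and $C_2$. For the term $C_1$, note that $\sqrt{n}(\hat\alpha - \alpha) = O_p(1)$ and

$$\pi(z_i, \hat\alpha) - \pi(z_i, \alpha) = \pi'(z_i, \alpha)(\hat\alpha - \alpha) + o_p(n^{-1/2}) = \pi(z_i, \alpha)(1 - \pi(z_i, \alpha))G_i(\hat\alpha - \alpha) + o_p(n^{-1/2}),$$

so we have

$$\begin{aligned}
C_1 &= \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\phi(x_i)^\top - \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i, \hat\alpha) - \pi(z_i, \alpha)\right)\phi(x_i)\phi(x_i)^\top + o_p(1) \\
&= \Sigma - \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}(1 - \pi(z_i, \alpha))G_i(\hat\alpha - \alpha)\phi(x_i)\phi(x_i)^\top + o_p(1) = \Sigma + o_p(1).
\end{aligned}$$

For $C_2$, it can be verified that

$$\begin{aligned}
C_2 &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\left(\eta_i + C_nG(x_i)\right) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i, \hat\alpha) - \pi(z_i, \alpha)\right)\phi(x_i)\eta_i \\
&\quad - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i, \hat\alpha) - \pi(z_i, \alpha)\right)\phi(x_i)C_nG(x_i) + o_p(1) = D_1 - D_2 - D_3 + o_p(1).
\end{aligned} \tag{18}$$

Below we show the properties of $D_1$, $D_2$ and $D_3$ respectively. As for $D_1$, we have

$$D_1 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\eta_i + C_n\sqrt{n}E(\phi(X)G(X)). \tag{19}$$

For $D_2$, we have

$$D_2 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}(1 - \pi(z_i, \alpha))G_i(\hat\alpha - \alpha)\phi(x_i)\eta_i + o_p(1) = E\left((1 - \pi(Z))G\,E(\phi(X)\eta \mid Z)\right)\sqrt{n}(\hat\alpha - \alpha) + o_p(1). \tag{20}$$

It is easy to show that

$$D_3 = \frac{C_n}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}(1 - \pi(z_i, \alpha))G_i\phi(x_i)G(x_i)\sqrt{n}(\hat\alpha - \alpha) + o_p(1) = o_p(1). \tag{21}$$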


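Combining (18)–(21), we have

$$C_2 = C_n\sqrt{n}E(\phi(X)G(X)) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\eta_i - E\left((1 - \pi(Z))G\,E(\phi(X)\eta \mid Z)\right)\sqrt{n}(\hat\alpha - \alpha) + o_p(1).$$

Based on the above analysis and the fact that $\{n^{-1}\sum_{i=1}^{n} \delta_i\phi(x_i)\phi(x_i)^\top/\pi(z_i, \hat\alpha)\}^{-1} - \Sigma^{-1} = o_p(1)$ and $n^{-1/2}\sum_{i=1}^{n} \{\delta_i\phi(x_i)\eta_i/\pi(z_i, \alpha)\} = O_p(1)$, we have

$$\begin{aligned}
\sqrt{n}(\hat\beta_P - \beta) &= \left[\left\{\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)}\phi(x_i)\phi(x_i)^\top\right\}^{-1} - \Sigma^{-1}\right]C_2 + \Sigma^{-1}C_2 \\
&= \frac{\Sigma^{-1}}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\eta_i + \Sigma^{-1}C_n\sqrt{n}E(\phi(X)G(X)) - \Sigma^{-1}E\left((1 - \pi(Z))G\,E(\phi(X)\eta \mid Z)\right)\sqrt{n}(\hat\alpha - \alpha) \\
&\quad + \left[\left(\frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \hat\alpha)}\phi(x_i)\phi^\top(x_i)\right)^{-1} - \Sigma^{-1}\right]C_n\sqrt{n}E(\phi(X)G(X)) + o_p(1).
\end{aligned}$$

Thus Lemma 2 is proved. □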

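Proof of Theorem 1. For $T_{n1}$, it can be verified that

$$\begin{aligned}
T_{n1} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i)}\left(y_i - \phi^\top(x_i)\beta\right) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i)}\phi^\top(x_i)(\hat\beta_N - \beta) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i) - \hat\pi(z_i)\right)\left(y_i - \phi^\top(x_i)\beta\right) \\
&\quad - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i) - \hat\pi(z_i)\right)\phi^\top(x_i)(\hat\beta_N - \beta) + o_p(1) \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\varepsilon_i + (\pi(z_i) - \delta_i)E(\varepsilon \mid z_i)}{\pi(z_i)} - E(\phi^\top(X))\sqrt{n}(\hat\beta_N - \beta) + o_p(1).
\end{aligned}$$

By the fact that

$$\sqrt{n}(\hat\beta_N - \beta) = \frac{\Sigma^{-1}}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\phi(x_i)\varepsilon_i + (\pi(z_i) - \delta_i)E(\phi(X)\varepsilon \mid z_i)}{\pi(z_i)} + o_p(1),$$

we have

$$T_{n1} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\varepsilon_i\left\{1 - E(\phi^\top(X))\Sigma^{-1}\phi(x_i)\right\}}{\pi(z_i)} + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{(\pi(z_i) - \delta_i)\left\{E(\varepsilon \mid z_i) - E(\phi^\top(X))\Sigma^{-1}E(\phi(X)\varepsilon \mid z_i)\right\}}{\pi(z_i)} + o_p(1).$$

Thus the asymptotic properties of $T_{n1}$ follow from the central limit theorem. For $T_{n2}$, we have

$$\begin{aligned}
T_{n2} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\left(y_i - \phi^\top(x_i)\beta\right) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi^\top(x_i)(\hat\beta_P - \beta) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i, \hat\alpha) - \pi(z_i, \alpha)\right)\left(y_i - \phi^\top(x_i)\beta\right) \\
&\quad + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi^2(z_i)}\left(\pi(z_i, \hat\alpha) - \pi(z_i, \alpha)\right)\phi^\top(x_i)(\hat\beta_P - \beta) \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\varepsilon_i}{\pi(z_i, \alpha)} - E(\phi^\top(X))\sqrt{n}(\hat\beta_P - \beta) - E\left((1 - \pi(Z))G\,E(\varepsilon \mid Z)\right)\sqrt{n}(\hat\alpha - \alpha) + o_p(1).
\end{aligned}$$

Based on the equations

$$\sqrt{n}(\hat\beta_P - \beta) = \frac{\Sigma^{-1}}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i, \alpha)}\phi(x_i)\varepsilon_i - \Sigma^{-1}E\left((1 - \pi(Z))G\,E(\phi(X)\varepsilon \mid Z)\right)\sqrt{n}(\hat\alpha - \alpha) + o_p(1),$$

$$\sqrt{n}(\hat\alpha - \alpha) = \left\{E\left(\pi'(Z, \alpha)^\top\pi'(Z, \alpha)\right)\right\}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \pi'(z_i, \alpha)^\top\left(\delta_i - \pi(z_i, \alpha)\right) + o_p(1),$$

we get

$$T_{n2} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i\varepsilon_i\left(1 - E(\phi^\top(X))\Sigma^{-1}\phi(x_i)\right)}{\pi(z_i, \alpha)} + \left\{E(\phi^\top(X))\Sigma^{-1}M_2 - M_1\right\}\Sigma_\alpha^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \pi'(z_i, \alpha)^\top\left(\delta_i - \pi(z_i, \alpha)\right) + o_p(1).$$

Consequently, the asymptotic distribution of $T_{n2}$ follows from the central limit theorem. □

Proof of Theorem 2. Under the alternative $H_{1n}$, for the test $T_{n1}$, we have

$$T_{n1} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{\delta_i}{\pi(z_i)}\left(y_i - \phi^\top(x_i)\hat\beta_N\right) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left(\frac{\delta_i}{\hat\pi(z_i)} - \frac{\delta_i}{\pi(z_i)}\right)\left(y_i - \phi^\top(x_i)\hat\beta_N\right) = I_1 + I_2.$$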


We study the properties of I1 and I2 respectively. For I1, we have n n n pffiffiffi pffiffiffi 1 X di 1 X di > 1 X di > ðyi f ðxi ÞbÞ pffiffiffi f ðxi Þðb^ N bÞ ¼ pffiffiffi Z þ C n nEðGðXÞÞEðf> ðXÞÞ nðb^ N bÞ: I1 ¼ pffiffiffi n i ¼ 1 pðzi Þ n i ¼ 1 pðzi Þ n i ¼ 1 pðzi Þ i

For I2, it can be showed that   n  n  1 X di d 1 X di d > >  i ðyi f ðxi ÞbÞ pffiffiffi  i f ðxi Þðb^ N bÞ ¼ I21 I22 : I2 ¼ pffiffiffi n i ¼ 1 p^ ðzi Þ pðzi Þ n i ¼ 1 p^ ðzi Þ pðzi Þ

ð22Þ

ð23Þ

We can prove for I21 that Pn n 1 X di j ¼ 1 ðpðzi Þdj ÞK h ðzi zj Þ ðZi þ C n Gðxi ÞÞ þop ð1Þ I21 ¼ pffiffiffi nf ðzi Þ n i ¼ 1 p2 ðzi Þ n n n pðzj Þdj pðzj Þdj pðzj Þdj 1 X Cn X 1 X EðZ9zj Þ þ pffiffiffi EðGðXÞ9zj Þ þ op ð1Þ ¼ pffiffiffi EðZ9zj Þ þ op ð1Þ, ¼ pffiffiffi n j ¼ 1 pðzj Þ n j ¼ 1 pðzj Þ n j ¼ 1 pðzj Þ

ð24Þ

and the last equation follows because the expectation and variance of the second term tends to zero. For I22, based on Lemma 1 and the following expression: n n 1X di 1X pðzi Þdi > > Eðf ðXÞ9zi Þ þop ð1Þ ¼ op ð1Þ, ðpðzi Þp^ ðzi ÞÞf ðxi Þ ¼ 2 n i ¼ 1 p ðzi Þ n i ¼ 1 pðzi Þ

we can have I22 ¼

n n X pffiffiffi 1X di S1 C n pðzi Þdi > > Eðf ðXÞ9zi Þ þop ð1Þ ¼ op ð1Þ, ðpðzi Þp^ ðzi ÞÞf ðxi Þ nðb^ N bÞ ¼ pffiffiffi EðfðXÞGðXÞÞ n i ¼ 1 p2 ðzi Þ pðzi Þ n i¼1

ð25Þ

where the last equation is based on the fact that the expectation and variance of the first term tends to zero. According to (22)–(25), we have n pffiffiffi pffiffiffi di Zi þðpðzi Þdi ÞEðZ9zi Þ 1 X > þC n nEðGðXÞÞEðf ðXÞÞ nðb^ N bÞ: T n1 ¼ pffiffiffi p ðz Þ ni¼1 i

If n1=2 C n -1, by Lemma 1, we can have n di Zi þðpðzi Þdi ÞEðZ9zi Þ 1 X > þEðGðXÞÞEðf ðXÞÞ T n1 ¼ pffiffiffi pðzi Þ ni¼1 !  n  S1 X di pðzi Þdi EðfðXÞZ9zi Þ þ S1 EðfðXÞGðXÞÞ þ op ð1Þ:  pffiffiffi fðxi ÞZi þ pðzi Þ n i ¼ 1 pðzi Þ

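If $n^{r}C_n \to a$ with $0 < r < 1/2$, then $\sqrt{n}\,C_n \to \infty$ as $n \to \infty$. As a result, we have $T_{n1} \to \infty$.

We investigate the asymptotic properties of $T_{n2}$ below. Note that

$$T_{n2} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\bigl(y_i-\phi^{\top}(x_i)\hat\beta_P\bigr) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{\delta_i}{\pi(z_i,\hat\alpha)}-\frac{\delta_i}{\pi(z_i,\alpha)}\right)\bigl(y_i-\phi^{\top}(x_i)\hat\beta_P\bigr) = J_1 + J_2,$$

so we only need to study the properties of $J_1$ and $J_2$. For $J_1$, we have

$$J_1 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\bigl(y_i-\phi^{\top}(x_i)\beta\bigr) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\phi^{\top}(x_i)(\hat\beta_P-\beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\eta_i + C_n\sqrt{n}\,E(G(X)) - E(\phi^{\top}(X))\sqrt{n}(\hat\beta_P-\beta). \tag{26}$$

For $J_2$, it can be shown that

$$J_2 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{\delta_i}{\pi(z_i,\hat\alpha)}-\frac{\delta_i}{\pi(z_i,\alpha)}\right)\bigl(y_i-\phi^{\top}(x_i)\beta\bigr) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{\delta_i}{\pi(z_i,\hat\alpha)}-\frac{\delta_i}{\pi(z_i,\alpha)}\right)\phi^{\top}(x_i)(\hat\beta_P-\beta) = J_{21} - J_{22}. \tag{27}$$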
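The same bookkeeping can be illustrated for $T_{n2}$ with parametrically estimated weights. The sketch below is not taken from the paper. It assumes a logistic model for $\pi(z,\alpha)$ fitted by Newton-Raphson, an inverse-probability-weighted least squares estimate standing in for $\hat\beta_P$, and a hypothetical local alternative with $\phi(x)=x$, $G(x)=x^2$ and $C_n=n^{-1/2}$; the function name `fit_logistic` and all numerical values are illustrative.

```python
import numpy as np

def fit_logistic(zmat, delta, steps=25):
    """Newton-Raphson for the assumed model P(delta = 1 | z) = 1 / (1 + exp(-zmat @ alpha))."""
    alpha = np.zeros(zmat.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-zmat @ alpha))
        grad = zmat.T @ (delta - p)                        # score of the log-likelihood
        hess = zmat.T @ (zmat * (p * (1.0 - p))[:, None])  # negative Hessian
        alpha = alpha + np.linalg.solve(hess, grad)
    return alpha

rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
z = rng.normal(size=n)
zmat = np.column_stack([np.ones(n), z])
alpha_true = np.array([1.0, 0.5])
pi_true = 1.0 / (1.0 + np.exp(-zmat @ alpha_true))
delta = (rng.uniform(size=n) < pi_true).astype(float)

c_n = n ** -0.5                                    # local alternative with sqrt(n) * C_n = 1
y = 2.0 * x + c_n * x**2 + rng.normal(size=n)      # phi(x) = x, G(x) = x^2 (toy choices)
phi = x[:, None]

alpha_hat = fit_logistic(zmat, delta)
pi_hat = 1.0 / (1.0 + np.exp(-zmat @ alpha_hat))
w = delta / pi_hat
beta_hat = np.linalg.solve(phi.T @ (w[:, None] * phi), phi.T @ (w * y))

resid = y - phi @ beta_hat
t_n2 = np.sum(delta / pi_hat * resid) / np.sqrt(n)
j1 = np.sum(delta / pi_true * resid) / np.sqrt(n)
j2 = np.sum((delta / pi_hat - delta / pi_true) * resid) / np.sqrt(n)
print(t_n2, j1 + j2)                               # equal up to rounding
```

With these toy choices $E(G(X)) = 1$ and $E(\phi(X)G(X)) = 0$, so the drift term in the expansion does not vanish.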

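It can be proved for $J_{21}$ that

$$\begin{aligned}
J_{21} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\bigl(\pi(z_i,\alpha)-1\bigr)G_i\,\eta_i\,\sqrt{n}(\hat\alpha-\alpha) + \frac{C_n}{n}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\bigl(\pi(z_i,\alpha)-1\bigr)G_i\,G(x_i)\,\sqrt{n}(\hat\alpha-\alpha) + o_p(1) \\
&= -E\bigl((1-\pi(Z))\,G\,E(\eta\mid Z)\bigr)\sqrt{n}(\hat\alpha-\alpha) + o_p(1).
\end{aligned} \tag{28}$$

The last equality follows because the expectation and variance of the second term tend to zero.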
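The factor $(\pi(z_i,\alpha)-1)/\pi(z_i,\alpha)$ in the expansion of $J_{21}$ is what a first-order Taylor expansion of the inverse weight produces under a logistic model. The following worked step is an illustration under that assumption only; the vector $\tilde z_i = (1, z_i^{\top})^{\top}$ is a hypothetical stand-in, and whether it coincides with $G_i$ depends on the definitions given earlier in the paper.

```latex
\pi(z,\alpha) = \frac{1}{1+\exp(-\alpha^{\top}\tilde z)}
\;\Longrightarrow\;
\frac{\partial \pi(z,\alpha)}{\partial \alpha} = \pi(z,\alpha)\{1-\pi(z,\alpha)\}\,\tilde z ,
\\[4pt]
\frac{\delta_i}{\pi(z_i,\hat\alpha)}-\frac{\delta_i}{\pi(z_i,\alpha)}
  \approx -\frac{\delta_i}{\pi^{2}(z_i,\alpha)}\,
          \frac{\partial \pi(z_i,\alpha)}{\partial \alpha^{\top}}(\hat\alpha-\alpha)
  = \frac{\delta_i}{\pi(z_i,\alpha)}\{\pi(z_i,\alpha)-1\}\,\tilde z_i^{\top}(\hat\alpha-\alpha).
```

Multiplying by $y_i-\phi^{\top}(x_i)\beta = \eta_i + C_n G(x_i)$ and summing with the factor $n^{-1/2}$ gives two sums of the same shape as those in the expansion of $J_{21}$.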


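For the term $J_{22}$, based on Lemma 2, it can be shown that

$$\begin{aligned}
J_{22} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\bigl(\pi(z_i,\alpha)-1\bigr)G_i\,\phi^{\top}(x_i)(\hat\alpha-\alpha)\sqrt{n}(\hat\beta_P-\beta) \\
&= \frac{\Sigma^{-1}C_n}{\sqrt{n}}E(\phi(X)G(X))\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\bigl(\pi(z_i,\alpha)-1\bigr)G_i\,\phi^{\top}(x_i)\sqrt{n}(\hat\alpha-\alpha) + o_p(1) = o_p(1),
\end{aligned} \tag{29}$$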

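where the last equality follows because the expectation and variance of the first term tend to zero. According to (26)–(29), we have

$$T_{n2} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i\eta_i}{\pi(z_i,\alpha)} + C_n\sqrt{n}\,E(G(X)) - E(\phi^{\top}(X))\sqrt{n}(\hat\beta_P-\beta) - E\bigl((1-\pi(Z))\,G\,E(\eta\mid Z)\bigr)\sqrt{n}(\hat\alpha-\alpha) + o_p(1).$$

If $n^{1/2}C_n \to 1$, by Lemma 2,

$$\begin{aligned}
T_{n2} ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i\eta_i}{\pi(z_i,\alpha)} + E(G(X)) - E(\phi^{\top}(X))\left(\frac{\Sigma^{-1}}{\sqrt{n}}\sum_{i=1}^{n}\frac{\delta_i}{\pi(z_i,\alpha)}\phi(x_i)\eta_i + \Sigma^{-1}E(\phi(X)G(X)) - \Sigma^{-1}E\bigl\{(1-\pi(Z,\alpha))\,G\,E(\phi(X)\eta\mid Z)\bigr\}\sqrt{n}(\hat\alpha-\alpha)\right) \\
& - E\bigl((1-\pi(Z))\,G\,E(\eta\mid Z)\bigr)\sqrt{n}(\hat\alpha-\alpha) + o_p(1).
\end{aligned}$$

If $n^{r}C_n \to a$ with $0 < r < 1/2$, then $\sqrt{n}\,C_n \to \infty$ as $n \to \infty$. As a result, $T_{n2} \to \infty$. Theorem 2 is proved. □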
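The divergence claim in the case $n^{r}C_n \to a$ follows from a one-line rate calculation. The sketch below adds two assumptions that the proof leaves implicit: that $a \neq 0$, and that the drift coefficient multiplying $\sqrt{n}\,C_n$ is nonzero.

```latex
\sqrt{n}\,C_n = n^{1/2-r}\,\bigl(n^{r}C_n\bigr), \qquad
n^{1/2-r}\to\infty \ \text{ and } \ n^{r}C_n \to a \neq 0
\;\Longrightarrow\; \sqrt{n}\,|C_n| \to \infty ,
\\[4pt]
\text{drift term: } \sqrt{n}\,C_n\,\bigl\{E(G(X)) - E(\phi^{\top}(X))\,\Sigma^{-1}E(\phi(X)G(X))\bigr\}.
```

When the bracketed coefficient is nonzero, this term dominates the stochastically bounded remainder of the expansion, which is why both statistics diverge in this regime.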

References

Härdle, W., Mammen, E., 1993. Comparing nonparametric versus parametric regression fits. Annals of Statistics 21, 1926–1947.
Härdle, W., Mammen, E., Müller, M., 1998. Testing parametric versus semiparametric modeling in generalized linear models. Journal of the American Statistical Association 93, 1461–1474.
Jennrich, R.I., 1969. Asymptotic properties of non-least squares estimators. Annals of Mathematical Statistics 40, 633–643.
Lee, A.J., Scott, A.J., 1986. Ultrasound in ante-natal diagnosis. In: Brook, R.J., Arnold, G.C., Hassard, T.H., Pringle, R.M. (Eds.), The Fascination of Statistics, Marcel Dekker, New York, pp. 277–293.
Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. Wiley, New York.
Manteiga, G.W., González, P.A., 2006. Goodness-of-fit tests for linear regression models with missing response data. Canadian Journal of Statistics 34, 149–170.
Stute, W., Manteiga, G.W., 1996. NN goodness-of-fit tests for linear models. Journal of Statistical Planning and Inference 53, 75–92.
Stute, W., Thies, S., Zhu, L.X., 1998. Model checks for regression: an innovation process approach. Annals of Statistics 26, 1916–1934.
Stute, W., Zhu, L.X., 2002. Model checks for generalized linear models. Scandinavian Journal of Statistics 29, 535–545.
Sun, Z.H., Wang, Q.H., 2009. Checking the adequacy of a general linear model with responses missing at random. Journal of Statistical Planning and Inference 139, 3588–3604.
Weaver, M.A., Zhou, H., 2005. An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association 100, 459–469.
Xue, L.G., 2009. Empirical likelihood for linear models with missing responses. Journal of Multivariate Analysis 100, 1353–1366.
Zhou, H., Weaver, M.A., Qin, J., Longnecker, M.P., Wang, M.C., 2002. A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome. Biometrics 58, 413–421.
Zhu, L.X., Cui, H.J., 2005. Testing lack-of-fit for general linear errors in variables models. Statistica Sinica 15, 1049–1068.
Zhu, L.X., Ng, K.W., 2003. Checking the adequacy of a partial linear model. Statistica Sinica 13, 763–781.