Journal of Statistical Planning and Inference 142 (2012) 347–357
Contents lists available at ScienceDirect
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
Semiparametric estimation for weighted average derivatives with responses missing at random Wanrong Liu a, Xuewen Lu a,b,, Changchun Xie c a b c
HPCSIP Key Laboratory, Ministry of Education, Department of Statistics and Financial Mathematics, Hunan Normal University, Changsha 410081, China Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada T2N 1N4 Department of Medicine, Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada L8L 2X2
a r t i c l e i n f o
abstract
Article history: Received 11 February 2011 Received in revised form 28 July 2011 Accepted 29 July 2011 Available online 5 August 2011
When responses are missing at random, we propose a semiparametric direct estimator for the missing probability and density-weighted average derivatives of a general nonparametric multiple regression function. An estimator for the normalized version of the weighted average derivatives is constructed as well using instrumental variables regression. The proposed estimators are computationally simple and asymptotically normal, and provide a solution to the problem of estimating index coefficients of singleindex models with responses missing at random. The developed theory generalizes the method of the density-weighted average derivatives estimation of Powell et al. (1989) for the non-missing data case. Monte Carlo simulation studies are conducted to study the performance of the methods. & 2011 Elsevier B.V. All rights reserved.
Keywords: Kernel estimation Missing probability and density-weighted average derivatives Responses missing at random Single-index model
1. Introduction In studying the relationship between a response and a set of predictor variables, the mean response variable is often assumed to be a linear regression function of the predictor variables. Recently, many models have been developed in studying nonparametric estimation of the regression function and its derivatives, the latter is of special interest to econometricians and other applied researchers. A general nonparametric multiple regression model is usually defined as Y ¼ mðXÞ þ e,
ð1Þ
where Y is a response variable, X is a k-vector predictor variable and distributed with density f(x), mðÞ is an unknown regression function, e is a random error satisfying Eðe9X ¼ xÞ ¼ 0 and Eðe2 9X ¼ xÞ ¼ s2 ðxÞ. A non-constant variance function s2 ðxÞ represents possible heteroscedasticity. For any function s(x), let sð1Þ ðxÞ ¼ @sðxÞ=@x. In model (1), for an arbitrary weighting function w(x), the weighted average derivatives of m(x) are defined by
dw ¼ E½wðXÞmð1Þ ðXÞ:
ð2Þ
A primary motivation for weighted average derivatives is the single-index model given by Y ¼ gðX T bÞ þ e,
ð3Þ
Corresponding author at: Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada T2N 1N4. Tel.: þ 1 403 220 6620; fax: þ 1 403 282 5150. E-mail address:
[email protected] (X. Lu).
0378-3758/$ - see front matter & 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2011.07.024
348
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
when mðxÞ ¼ gðxT bÞ for an unknown link function gðÞ and a vector of regression coefficients b, where aT denotes the transpose of a column vector or a matrix a. The ‘‘single-index’’ form arises in various models of limited dependent variables, see Stoker (1986) for examples. This model generalizes many other models, such as classical linear models, generalized linear models and censored regression models. It has wide applications in various fields such as economics and ¨ biometrics. In addition to Hardle and Stoker (1989) and Powell et al. (1989), various authors have studied how to estimate ¨ average derivatives and single-index coefficients, for example, Horowitz and Hardle (1996), Hristache et al. (2001), Yu and Ruppert (2002), Li et al. (2003), GØrgens (2004), Lu and Burke (2005), Xia (2006), and Xue and Zhu (2006), among others. For the singe-index model (3), we see that @m @gðxT bÞ ¼ ¼ ½g ð1Þ ðxT bÞb @x @x then, the vector of weighted average derivatives defined in (2) becomes
dw ¼ E½wðXÞmð1Þ ðXÞ gw b, which is proportional to b, provided gw ¼ E½wðXÞg ð1Þ ðX T bÞa0. This ‘‘single-index’’ restriction on the regression gðÞ enables ¨ us to estimate b up to scale. For example, the vector of weighted average derivatives dw has been studied by Hardle and Stoker (1989) for w(x)¼1 and Powell et al. (1989) for w(x)¼f(x) respectively. In particular, the estimation presented in Powell et al. (1989) is called the density-weighted average derivative (DWAD) estimation. By integration by parts, we obtain
dw ¼ E½lðXÞmðXÞ, where lðXÞ ¼ wð1Þ ðXÞwðXÞf ð1Þ ðXÞ=f ðXÞ. Taking wðxÞ ¼ f ðxÞ, we have lðxÞ ¼ 2f ð1Þ ðxÞ. Then, the vector of density-weighted average derivatives can be written as
ddw ¼ E½f ðXÞmð1Þ ðXÞ ¼ 2E½f ð1Þ ðXÞmðXÞ: Therefore, an estimator of ddw is defined as
d^ dw ¼ 2n1
n X
ð1Þ f^ i ðXi ÞYi ,
i¼1 ð1Þ where f^ i ðxÞ is the leave-one-out kernel density derivative estimator of f ð1Þ ðxÞ given by k þ 1 n X ð1Þ xXj 1 1 f^ i ðxÞ ¼ : K ð1Þ n1 j ¼ 1,jai h h
Powell et al. (1989) have stated that the DWAD estimation procedure exhibits several attractive characteristics: for example, data summarization through interpretable coefficients, graphical depiction of possible nonlinearity between Y and X T b, easy and direct computations without maximization or minimization and iteration. Compared to many other approaches, the DWAD estimation procedure does not require the choice of a trimming bound and the estimator d^ dw does not involve a random denominator, it has fewer complex technical and practical problems. On the other hand, simulations studies have shown that it has better finite sample performance than other direct estimators. Geenens and Delecroix (2006) made some comparisons among different semiparametric estimators. They showed that in a simulation study, when the sample size was small, the DWAD estimation even outperformed some semiparametric efficient estimators such ¨ as the semiparametric least-squares estimator studied by Ichimura (1993) and Hardle et al. (1993). All the aforementioned estimation procedures are based on the condition that the data are completely observed. However, in practice, often not all responses are available for various reasons such as unwillingness of some subjects to supply the desired information and not fully conducted experiments to save expenses. Examples like these, for instance, can be found in Chen et al. (2006) and Liang et al. (2007), where the standard inference procedures cannot be used directly. In this paper, when there are missing responses, we are interested in inference for a new weighed average derivative of m(x) using both the missing probability and the density in the weights, where both quantities are modeled nonparametrically. In specific, suppose that we obtain the following incomplete i.i.d observations ðYi , xi ,Xi Þ,
i ¼ 1,2, . . . ,n
from model (1), where all the Xi ’s are observed, xi ¼ 0 if Yi is missing and xi ¼ 1 otherwise. We assume that the missing data mechanism follows the missing at random (MAR) in the sense that Pðx ¼ 19Y,XÞ ¼ Pðx ¼ 19XÞ ¼ pðXÞ for some unknown selection probability pðXÞ. Our objective is to estimate weighted average derivatives of m(x) in model (1) and b in model (3). For the regression problems with missing responses, many researchers have used semiparametric models and methods of data imputation for the estimation. For example, Wang et al. (2004) studied the partially linear model under the responses MAR. Under a different missing mechanism, i.e., the missingness structure is missing not at random (MNAR), Liang et al. (2007) considered a class of semiparametric estimators for the parameter of interest in that model. However,
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
349
when the responses are missing at random, there is little work on the estimation of weighted average derivatives and the ¨ single-index model. Although Hardle and Stoker (1989) and Powell et al. (1989) provided simple estimators, their methods cannot be applied directly to this situation. A naive method is to use the complete case (CC) data based on their methods by simply ignoring the missing data. For example, we could define a naive estimator adopted from the estimator given by Powell et al. (1989) as
d^ ndw ¼ 2n1
n X
ð1Þ f^ i ðXi Þxi Yi :
ð4Þ
i¼1
However, unlike the new estimator d^ mpdw defined in (6), this naive estimator estimates dndw ¼ 2E½f ð1Þ ðXÞfpðXÞmðXÞg ¼ E½f ðXÞfpðXÞmðXÞgð1Þ ¼ E½f ðXÞmðXÞpð1Þ ðXÞ þ E½f ðXÞpðXÞmð1Þ ðXÞ, which is not a weighted average derivative of m(x) when the average derivative of the non-missing probability pðxÞ ¼ Eðx9X ¼ xÞ is not proportional to that of m(x), then is not proportional to the index coefficients b in the single-index model, i.e., the estimator of b will be biased, unless the responses are missing completely at random (MCAR) in the sense that the missing mechanism is independent of any other variables or the missing indicator x follows a single-index model with the same index coefficients b. To construct a simple and meaningful estimator for weighted average derivatives dw and get an asymptotically unbiased estimator of b in the single-index model, in this paper, we will propose a direct estimator of the missing probability and density-weighted average derivatives and show that our estimator generalizes that of Powell et al. (1989) to the case of responses MAR and preserves the same features of their estimator. We also suggest an estimator for the rescaled weighted average derivatives through instrumental variables regression. For the single-index model (3), the estimators of the index coefficients b are obtained via the normalization of both the estimators. Theory for inferences is established for all the parameters. Recently, Wang et al. (2010) considered the estimation problems for the single-index model (3), they derived semiparametric nonlinear least-squares estimators by incorporating missing mechanism into the ¨ least-squares loss function proposed by Hardle et al. (1993) and minimizing the loss function with respect to the bandwidth and the parameters simultaneously. Their estimators are non-direct estimators, the algorithm is iterative and requires some additional care in choosing trimming bounds or sets and initial values. Our methods are direct and computationally simple, as useful alternative approaches, they can work not only for the single-index model (3) but also for a more general nonparametric multiple regression model (1). We will conduct simulation studies to compare the two different approaches for the estimation of the single-index model. Some advantages of the proposed methods are shown in these simulation studies. The plan of the paper is as follows. In Section 2, we present the proposed methods and main results. In Section 3, we report Monte Carlo simulation results. In Section 4, some concluding remarks are made. The assumptions needed for the main theorems and some technical details are placed as an appendix. 2. Methodology and main results 2.1. Missing probability and density-weighted average derivatives To motivate our new estimators for weighted average derivatives and single-index coefficients with responses missing ~ at random, letting wðxÞ ¼ p2 ðxÞf ðxÞ, we consider the following weighted average derivative: ð1Þ ~ dw~ E½xpðXÞf ðXÞmð1Þ ðXÞ ¼ E½p2 ðXÞf ðXÞmð1Þ ðXÞ ¼ E½wðXÞm ðXÞ:
Under Assumptions 1–3 in the Appendix, mðxÞf ðxÞ vanishes at the boundary of the support of X. Using integration by parts and noticing that the responses are missing at random, we have Z dw~ ¼ fpðxÞf ðxÞg2 mð1Þ ðxÞ dx Z ¼ 2 mðxÞpðxÞf ðxÞfpðxÞf ð1Þ ðxÞ þf ðxÞpð1Þ ðxÞg dx ¼ 2E½pðXÞmðXÞfpðXÞf ðXÞgð1Þ ¼ 2E½xYfpðXÞf ðXÞgð1Þ : On the other hand, for the single-index model (3), we have ð1Þ ~ dw~ ¼ E½wðXÞm ðXÞ gw~ b,
ð5Þ T
~ ðX bÞa0. This type of weighted average derivative dw~ has a which is proportional to b as well, provided gw~ ¼ E½wðXÞg ~ involves simple form for missing response data, it is of interest for us to investigate it. Since the new weight function w both pðxÞ and f(x), we name the new weighted average derivative dw~ as the missing probability and density-weighted average derivative (MPDWAD) and denoted it by dmpdw . This development makes it possible for us to estimate b asymptotically unbiasedly in the single-index model with responses missing at random. We summarize the proceeding observations in the following lemma. ð1Þ
350
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
Lemma 1. Let qðxÞ ¼ pðxÞf ðxÞ, then,
dmpdw ¼ E½xqðXÞmð1Þ ðXÞ ¼ E½fpðXÞqðXÞgmð1Þ ðXÞ ¼ 2E½qð1Þ ðXÞfxYg:
In the rest of the paper, our objectives are to develop new methods of estimation for dmpdw and b. 2.2. Estimation for MPDWAD Under the assumptions given in the Appendix, we will establish some analogous results to those of Powell et al. (1989). The detailed proofs can be obtained following the same lines in their proofs for the associated lemmas and theorems. For the sake of the simplicity, we outline our approaches only and omit details. In the similar spirit as in the DWAD estimation method by Powell et al. (1989) for complete data, we define the following MPDWAD estimator for dmpdw , then consider its asymptotic normality. An estimator of dmpdw is defined as
d^ mpdw ¼ 2n1
n X
ð1Þ q^ i ðXi Þxi Yi ,
ð6Þ
i¼1 ð1Þ where q^ i ðxÞ is the leave-one-out kernel estimator of qð1Þ ðxÞ given by k þ 1 n X xXj 1 1 ð1Þ q^ i ðxÞ ¼ , xj K ð1Þ n1 j ¼ 1,jai h h
ð7Þ
which is a consistent estimator of qð1Þ ðxÞ. In fact, we can write " #ð1Þ " # ð1Þ q^ i ðxÞ q^ i ðxÞ ^ ^ ^q ð1Þ f i ðxÞ þ f i ðxÞ : i ðxÞ ¼ ^ f i ðxÞ f^ i ðxÞ Since q^ i ðxÞ -P pðxÞ, f^ ðxÞ
f^ i ðxÞ-P f ðxÞ
and
ð1Þ f^ i ðxÞ-P f ð1Þ ðxÞ,
i
where -P denotes convergence in probability, we have ð1Þ q^ i ðxÞ-P pð1Þ ðxÞf ðxÞ þ f ð1Þ ðxÞpðxÞ ¼ ½pðxÞf ðxÞð1Þ ¼ qð1Þ ðxÞ:
Inserting (7) into (6), we obtain k þ 1 n n X Xi Xj 2 X 1 d^ mpdw ¼ K ð1Þ xi xj Yi : nðn1Þ i ¼ 1 j ¼ 1,jai h h Using the fact K ð1Þ ðuÞ ¼ K ð1Þ ðuÞ, we write d^ mpdw in the standard ‘‘U-statistic’’ form as k þ 1 1 X n1 X n n Xi Xj 1 d^ mpdw ¼ K ð1Þ xi xj ðYi Yj Þ: h h 2 i ¼ 1 j ¼ iþ1 To prove the asymptotic normality of d^ mpdw , we can use a general result on the asymptotic behavior of U-statistics given by Powell et al. (1989). Consider a general ‘‘second-order’’ U-statistic of the form 1 X n1 X n n Un ¼ Zn ðZi ,Zj Þ, 2 i ¼ 1 j ¼ iþ1 where fZi ,i ¼ 1, . . . ,ng is an i.i.d. random sample and Zn is a d-dimensional symmetric kernel; that is, Zn ðZi ,Zj Þ ¼ Zn ðZj ,Zi Þ. Define rn ðZi Þ ¼ E½Zn ðZi ,Zj Þ9Zi ,
yn ¼ E½rn ðZi Þ ¼ E½Zn ðZi ,Zj Þ, n 2X U^ n ¼ yn þ ½rn ðZi Þyn , ni¼1
where U^ n is the projection of the statistic Un. Powell et al. (1989) established the following asymptotic equivalence of Un and U^ n : pffiffiffi Lemma 2 (Powell et al., 1989, Lemma 3.1). If E½JZ ðZi ,Zj ÞJ2 ¼ oðnÞ, then nðUn U^ n Þ ¼ op ð1Þ. n
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
351
Now, we apply Lemma 2 to our case. Let Zi ¼ ðYi , xi ,XiT ÞT , and rewrite (6) as 1 X n1 X n n d^ mpdw ¼ Zn ðZi ,Zj Þ, 2 i ¼ 1 j ¼ iþ1 with k þ 1 Xi Xj 1 K ð1Þ xi xj ðYi Yj Þ: h h
Zn ðZi ,Zj Þ ¼
kþ2
Following the arguments in the proof of (3.13) in Powell et al. (1989), it is easy to show that E½JZn ðZi ,Zj ÞJ2 ¼ O½nðnh Þ1 , kþ2 2 hence, E½JZn ðZi ,Zj ÞJ ¼ oðnÞ if and only if nh -1 as h-0. Therefore, under this condition, we have the following theorem: Theorem 1. Given Assumptions 1–6 in the Appendix, if h obeys (a) n-1, h-0; (b) nh
kþ2
2r
-1; and (c) nh -0, then
n1=2 ðd^ mpdw dmpdw Þ-D Nð0, Smpdw Þ, T
where ‘‘-D ’’ means convergence in distribution, Smpdw ¼ 4E½rmpdw ðY, x,XÞrmpdw ðY, x,XÞT 4dmpdw dmpdw is the variance–covariance matrix of 2rmpdw ðY, x,XÞ, with rmpdw ðy, x,xÞ ¼ xqðxÞmð1Þ ðxÞxfymðxÞgqð1Þ ðxÞ:
Denote the sample analog of rmpdw ðZi Þ ¼ rmpdw ðYi , xi ,Xi Þ by k þ 1 n Xi Xj 1 X 1 K ð1Þ xi xj ðYi Yj Þ: r^ mpdw ðZi Þ ¼ n1 j ¼ 1,jai h h The asymptotic variance–covariance matrix Smpdw can be estimated using r^ mpdw ðZi Þ’s via Pn T ðZ Þr^ ðZ ÞT r^ S^ mpdw ¼ 4 i ¼ 1 mpdw i mpdw i 4d^ mpdw d^ mpdw : n
2.3. Estimation for the rescaled MPDWAD When model (1) is truly the single-index model (3), from (5) we see that any scaling normalization permits identification of the index coefficients b. Since the scaling normalization implicit in the definition of dmpdw ¼ E½fpðXÞqðXÞgmð1Þ ðXÞ may not always yield the most easily interpreted estimates, following Powell et al. (1989), we use the rescaled coefficients
dmprs dmpdw =E½pðXÞqðXÞ ¼ E½fpðXÞqðXÞgmð1Þ ðXÞ=E½pðXÞqðXÞ: It is seen that dmprs is a true weighted average of the ‘‘effects’’ mð1Þ ðXÞ and is more easily interpreted than dmpdw . For example, when the underlying model is a linear model, i.e., mðxÞ ¼ m0 þ xT b, it is obtained that mð1Þ ðxÞ ¼ b, dmpdw ¼ E½pðXÞqðXÞb and dmprs ¼ b, thus, dmprs is comparable to the regression coefficients b in a linear model. We use an instrumental variables regression to estimate dmprs . Define n X
d^ x,mpdw ¼ 2n1
ð1Þ q^ i ðXi Þxi XiT ,
i¼1
which consistently estimates Ik E½pðXÞqðXÞ ¼ 2E½pðXÞqð1Þ ðXÞX T , where Ik is the k k identity matrix. The proposed estimator of dmprs is then given by !1 n n X X ð1Þ ð1Þ 1 ^ T ^d mprs ¼ ðd^ ^ q ðX Þx X q^ ðX Þx Y : Þ d ¼ x,mpdw
mpdw
i
i
i i
i
i¼1
i
i i
i¼1
ð1Þ
In this estimator, q^ i ðXi Þ are the estimated derivatives of the missing probability weighted density. When Yi are missing at random, they are used as instrumental variables in the following linear equation: Yi ¼ XiT dmprs þ Ui : Therefore, we have 1 d^ mprs dmprs ¼ d^
x,mpdw
( 2n
1
n X i¼1
) ð1Þ q^ i ðXi Þxi Ui :
352
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
By Theorem 1, the term in the brackets consistently estimates the missing probability and density weighted average derivative of DðxÞ EðU9xÞ ¼ mðxÞxT dmprs . We see that its probability limit is E½pðXÞqðXÞDð1Þ ðXÞ ¼ E½pðXÞqðXÞmð1Þ ðXÞE½pðXÞqðXÞdmprs ¼ 0. Hence, d^ mprs is consistent. Moreover, we obtain " # n X pffiffiffi ^ pffiffiffi ð1Þ n½d mprs dmprs ¼ E½pðXÞqðXÞ1 n 2n1 q^ ðX Þx U þop ð1Þ: ð8Þ i
i
i
i
i¼1
From an immediate application of Theorem 1 with Y replaced by U ¼ YX T dmprs , we obtain the asymptotic normality of d^ mprs as follows: Theorem 2. Under the conditions in Theorem 1, we have n1=2 ðd^ mprs dmprs Þ!D Nð0, Smprs Þ, where Smprs ¼ 4E½rmprs ðY, x,XÞrmprs ðY, x,XÞT is the variance–covariance matrix of 2rmprs ðY, x,XÞ, with rmprs ðy, x,xÞ ¼ E½pðXÞqðXÞ1 ½xqðxÞfmð1Þ ðxÞdmprs gxfymðxÞgqð1Þ ðxÞ:
Theorem 2 is analogous to Corollary 4.1 of Powell et al. (1989). Let U^ i ¼ Yi XiT d^ mprs , i¼1,2, y, n, denote the estimated residuals from the model Yi ¼ XiT dmprs þUi , define 8 9 k þ 1 n < 1 X = 1 X X 1 i j K ð1Þ xi xj ðU^ i U^ j Þ : r^ mprs ðZi Þ ¼ d^ x,mpdw :n1 ; h h j ¼ 1,jai Then, the variance–covariance matrix Smprs can be consistently estimated by Pn ðZ Þr^ ðZ ÞT r^ S^ mprs ¼ 4 i ¼ 1 mprs i mprs i : n Without the above justification, simply substituting xYi for Yi in the normalized estimator given by Powell et al. (1989) for the non-missing case, using the estimator given by (4), a naive estimator of the rescaled weighted average derivative may be defined as !1 n n X X ð1Þ ð1Þ f^ ðX ÞX T f^ ðX Þx Y : d^ nrs ¼ ðd^ Þ1 d^ ¼ ð9Þ x,dw
ndw
i
i¼1
i
i
i
i
i i
i¼1
Again, the limit of this estimator may not be proportional to b and the estimator is biased, unless x is independent of X or follows a single-index model with the same index coefficient b as Y does. 2.4. Estimation for the index coefficients of the single-index model For the single-index model (3), we can obtain an estimator of the index coefficient b from either d^ mpdw or d^ mprs , given by
b^ ¼ d^ =Jd^ J, where d^ ¼ d^ mpdw or d^ mprs , the corresponding b^ ¼ b^ mpdw or b^ mprs . The asymptotic distributions of the estimators are given by the following theorem. Theorem 3. Under the conditions in Theorem 1, we have n1=2 ðb^ bÞ-D Nð0,ASAT Þ, T
where A ¼ JdJ1 fIk dd =JdJ2 g, d ¼ dmpdw or dmprs and S ¼ Smpdw or Smprs , depending on which d is estimated. The matrix A can T be consistently estimated by A^ ¼ Jd^ J1 fIk d^ d^ g=Jd^ J2 g.
The proof of Theorem 3 is obtained by the results of Theorems 1 and 2 and applying the delta method to the function hðyÞ ¼ y=JyJ for any k-dimensional vector y such that JyJa0. Theorems 1–3 can be used to make normal approximation inference about the weighted average derivatives and the single-index coefficients. They will be used in the simulation studies. 3. Monte Carlo simulation study We first conduct a Monte Carlo simulation study to compare the finite sample performance of the four estimators of the single-index coefficient b in model (3). They are the proposed asymptotically unbiased estimators b^ mpdw , b^ mprs , the naive estimators b^ ndw ¼ d^ ndw =Jd^ ndw J and b^ nrs ¼ d^ nrs =Jd^ nrs J, where d^ ndw and d^ nrs are defined in (4) and (9), respectively.
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
353
The model is a quadratic bivariate single-index regression model (k¼2), Y ¼ ðX T b0 1Þ2 =2 þ esðX T b0 Þ, pffiffiffi pffiffiffi pffiffiffi where the true index coefficient vector b0 ¼ ð2= 5,1= 5ÞT ¼ ð0:894,0:447ÞT , X ¼ ðX1 ,X2 ÞT , X1 ¼ ðX10 3Þ= 6, a standardized random variable of X10 , X10 follows w2 ð3Þ and X2 follows N(0,1), X1 and X2 are independent. For this model specification, the error distribution is multiplicatively heteroscedastic, that is, e is Nð0,0:52 Þ and independent of X, and s2 ðXÞ ¼ expðX T b þ kÞ, k is a constant chosen such that E½s2 ðXÞ ¼ 1. In the model we use, k ¼ 0:9702. The selection probability given X ¼x is taken as pðxÞ ¼ ½c0:1f9x1 9 þ 9x2 9gI½9x1 9 þ 9x2 9 r 4 þ 0:2I½9x1 9 þ9x2 9 4 4, where c¼ 0.9 and 0.7, respectively, the corresponding missing probability is mp¼ 26% and 46%. The sample size n has been chosen to be 100, 200, 400, respectively. To compute the estimates, we construct a fourth order kernel using the jackknifed techniques proposed by Powell et al. (1989). A modified ‘‘plug-in’’ estimator of the optimal bandwidth given by Powell and Stoker (1996) is used to select bandwidth for the estimators (see their Propositions 4.2 and 5.1), where an initial estimator value h0 is set to be h0 ¼ 0:8n1=6 , which satisfies the bandwidth conditions (at r ¼ 4 and k¼2) required by Theorems 1–3. Simulating the model for 1000 times, we report the results in Table 1, where BIAS is the estimated bias, SSE is the sample standard error of the point estimates, ESE is the mean of the estimated standard errors by the sample variance–covariance formulas and the CP is the 95% empirical coverage probability. Table 1 shows that the two estimators b^ mpdw and b^ mprs work well under different sample sizes and missing probabilities, the ESE agrees with SSE, the coverage probability is close to the nominal level. On the other hand, the instrumental variables regression estimator b^ mprs outperforms b^ mpdw in terms of both bias and standard error. The naive estimators produce large biases and standard errors, indicating inconsistency of the estimators and non-stable coverage probabilities. In the second simulation study, we adopt a sine model considered by Wang et al. (2010) and compare our methods with theirs. The model under their consideration is given by Y ¼ sinðX T b0 Þ þ 0:5e, Table 1 Summary statistics of the simulation study for the estimation of the single-index coefficients based on the proposed estimators and the naive estimators. n: sample size; mp: missing probability; Bias: bias of the estimator; SSE: sample standard error; ESE: mean of the estimated standard error; CP: 95% empirical coverage probability. n
mp
b
The proposed estimators
b^ mpdw BIAS 100
ESE
CP 0.950 0.927 0.946 0.934 0.948 0.945
0.002 0.0004 0.0002 0.002 0.0003 0.0004
0.026 0.052 0.017 0.035 0.014 0.027
0.026 0.052 0.018 0.036 0.013 0.026
0.968 0.964 0.944 0.948 0.943 0.949
46%
0.894 0.447 0.894 0.447 0.894 0.447
0.015 0.018 0.010 0.014 0.004 0.005
0.052 0.092 0.034 0.064 0.026 0.047
0.046 0.085 0.030 0.057 0.021 0.040
0.939 0.923 0.952 0.933 0.942 0.945
0.005 0.004 0.002 0.001 0.0005 0.0004
0.032 0.063 0.021 0.041 0.016 0.031
0.033 0.065 0.022 0.044 0.016 0.031
0.953 0.958 0.970 0.964 0.947 0.943
mp
b
The naive estimators
BIAS
b^ nrs SSE
ESE
CP
BIAS
SSE
ESE
CP
26%
0.894 0.447 0.894 0.447 0.894 0.447
0.040 0.045 0.026 0.036 0.018 0.027
0.097 0.133 0.057 0.098 0.046 0.076
0.076 0.123 0.051 0.088 0.035 0.065
0.933 0.899 0.941 0.919 0.938 0.917
0.018 0.013 0.009 0.007 0.007 0.009
0.082 0.115 0.044 0.085 0.033 0.063
0.065 0.116 0.043 0.082 0.031 0.061
0.931 0.951 0.946 0.955 0.941 0.948
46%
0.894 0.447 0.894 0.447 0.894 0.447
0.065 0.052 0.041 0.045 0.031 0.041
0.162 0.195 0.097 0.142 0.069 0.107
0.120 0.176 0.081 0.131 0.057 0.096
0.902 0.882 0.910 0.909 0.947 0.920
0.040 0.019 0.021 0.014 0.017 0.018
0.145 0.178 0.084 0.130 0.062 0.099
0.109 0.171 0.072 0.127 0.052 0.092
0.905 0.925 0.916 0.930 0.943 0.940
200 400
400
CP
0.037 0.069 0.025 0.047 0.017 0.033
b^ ndw
200
ESE
0.043 0.073 0.028 0.053 0.025 0.042
400
100
SSE
0.015 0.021 0.008 0.013 0.003 0.004
200
100
BIAS
0.894 0.447 0.894 0.447 0.894 0.447
400
n
SSE
26%
200
100
b^ mprs
354
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
Table 2 Summary statistics of the simulation study for the estimation of the single-index coefficients based on the proposed estimators and the estimator of Wang et al. (2010). n: sample size; mp: missing probability; Bias: bias of the estimator; SSE: sample standard error; ESE: mean of the estimated standard error; CP: 95% empirical coverage probability. n
mp
WSHW’s methoda
b
BIAS 100
32% (c¼ 0.5)
200 400 100
46% (c ¼1)
200 400 n
200 400 100
46% (c ¼1)
200 400
a
ESE
CP
BIAS
SSE
ESE
CP
0.005 0.006 0.001 0.003 0.002 0.001
0.089 0.088 0.056 0.057 0.038 0.038
0.100 0.100 0.077 0.077 0.054 0.054
0.882 0.894 0.934 0.924 0.962 0.962
0.006 0.006 0.003 0.003 0.002 0.003
0.093 0.093 0.068 0.068 0.051 0.057
0.090 0.090 0.069 0.068 0.050 0.050
0.912 0.928 0.948 0.934 0.934 0.946
0.707 0.707 0.707 0.707 0.707 0.707
0.003 0.012 0.0013 0.0087 0.002 0.0001
0.100 0.104 0.069 0.073 0.044 0.044
0.121 0.124 0.083 0.082 0.061 0.061
0.932 0.936 0.944 0.936 0.958 0.948
0.001 0.019 0.005 0.004 0.002 0.007
0.116 0.120 0.081 0.079 0.058 0.060
0.098 0.101 0.078 0.078 0.060 0.060
0.866 0.874 0.920 0.910 0.938 0.956
b
WSHW’s methoda BIAS
32% (c¼ 0.5)
SSE
0.707 0.707 0.707 0.707 0.707 0.707
mp
100
The proposed b^ mpdw
The proposed b^ mprs
SSE
ESE
CP
BIAS
SSE
ESE
CP
0.707 0.707 0.707 0.707 0.707 0.707
0.005 0.006 0.001 0.003 0.002 0.001
0.089 0.088 0.056 0.057 0.038 0.038
0.100 0.100 0.077 0.077 0.054 0.054
0.882 0.894 0.934 0.924 0.962 0.962
0.007 0.005 0.004 0.003 0.002 0.001
0.091 0.090 0.067 0.068 0.049 0.049
0.092 0.092 0.069 0.069 0.050 0.050
0.936 0.934 0.952 0.944 0.936 0.944
0.707 0.707 0.707 0.707 0.707 0.707
0.003 0.012 0.0013 0.0087 0.002 0.0001
0.100 0.104 0.069 0.073 0.044 0.044
0.121 0.124 0.083 0.082 0.061 0.061
0.932 0.936 0.944 0.936 0.958 0.948
0.0003 0.019 0.004 0.005 0.001 0.007
0.116 0.118 0.080 0.080 0.061 0.063
0.109 0.112 0.083 0.083 0.061 0.062
0.904 0.918 0.936 0.932 0.936 0.948
Results are taken from Tables 1–2 of Wang et al. (2010).
pffiffiffi pffiffiffi where b0 ¼ ð 2=2, 2=2ÞT ¼ ð0:707,0:707ÞT , e Nð0,1Þ, X ¼ ðX1 ,X2 ÞT has the two-dimensional standard normal distribution. The selection probability follows the model of Pðx ¼ 19X,YÞ ¼
1 : 1 þ c9X1 þX2 9
The results from Tables 1–2 of Wang et al. (2010) based on 500 replications are summarized in Table 2 in the left panel (abbreviated as WSHW’s method). We applied our methods to the same simulation procedure, the bandwidth selection method suggested in the first simulation study was used with h0 ¼ 1:4n1=6 . The results are presented in Table 2 in the right panel for the purpose of comparison. From these simulation results, we observe that, the biases of the three estimators are comparable, among the two proposed estimators, b^ mprs performs slightly better than b^ mpdw in terms of the coverage probability. It seems that the WSHW’s estimator has smaller SSE, and when the missing probability is high (mp¼46%), better coverage probabilities than the other two. But after carefully examining the SSE and ESE, we find that the WSHW’s method overestimates the standard error of the estimates quite a bit, which broadens the empirical coverage probability. Interestingly, these estimated standard errors are very close to those obtained from our methods. Another consequence of the overestimation of the standard error is that the coverage probability is poorer at the low missing probability mp¼32% than at the high missing probability mp¼46%, which is counterintuitive. While our method provides a consistent estimator of the standard error, the resultant coverage probability is more reliable and reasonable. 4. Concluding remarks Semiparametric direct estimators are proposed to estimate the missing probability and density-weighted average derivatives of a general nonparametric regression function and the index coefficients in a single-index model with responses missing at random. The proposed estimators do not require solving an optimization problem and is non-iterative, so the estimates can be computed very quickly. When the responses are not missing, the proposed estimators become the popular
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
355
direct estimators of Powell et al. (1989) for the density-weighted average derivatives. Asymptotic normality of the estimators are obtained. Monte Carlo simulation results indicate that the new estimators provide accurate inference results and are comparable to the estimator of Wang et al. (2010) derived from semiparametric nonlinear least-squares. In our model, the estimators of pðxÞ, f(x), q(x) and their derivatives are all fully nonparametric, when the dimension of X is high, they suffer the ‘‘curse of dimensionality’’. One remedy is to posit parametric or semiparametric structures on these functions. For example, pðxÞ can be modeled by logistic regression and f(x) can be modeled by copulas. Liebscher (2005) investigated semiparametric density estimators using copulas. How to incorporate copulas to our model remains an interesting topic in our future research.
Acknowledgements We are grateful to the Editor and a referee for their constructive comments, which have greatly helped to improve this article. Liu’s research is partly supported by Hunan Provincial Natural Science Foundation of China (08jj6039). Lu’s research is supported in part by NSERC of Canada. Appendix A A.1. Assumptions The following assumptions are needed for Theorems 1–3. They are modified from Powell et al. (1989) with some changes in Assumptions 3 and 5, where we replace f(x) by qðxÞ ¼ pðxÞf ðxÞ. Assumption 1. The support O of f is a convex, possibly unbounded subset of Rk with nonempty interior O0 . The underlying measure of (y,x) can be written as my mx , where mx is Lebesgue measure on Rk . Assumption 2. The density function f is continuous in the components of x for all x 2 Rk , so that f ðxÞ ¼ 0 for all x 2 @O, where @O denotes the boundary of O. Furthermore, f(x) is continuously differentiable in the components of x for all x 2 O0 and mðxÞ ¼ EðY9X ¼ xÞ is continuously differentiable in the components of x for all x 2 O , where O differs from O0 by a set of measure zero. Assumption 3. The components of the random vector mð1Þ ðXÞ and random matrix ½qð1Þ ðXÞðY,X T Þ have finite second moments. Also, qð1Þ ðxÞ and ðqmÞð1Þ ðxÞ satisfy the following Lipschitz conditions: For some v(x), Jqð1Þ ðx þ nÞqð1Þ ðxÞJ ovðxÞJnJ, JðqmÞð1Þ ðx þ nÞðqmÞð1Þ ðxÞJ ovðxÞJnJ, with E½ð1 þ 9Y9 þJXJÞvðXÞ2 o1. Finally, M2 ðxÞ ¼ EðY 2 9X ¼ xÞ is continuous in x and is bounded on O. Assumption 4. The support OK of K(u) is a convex (possibly unbounded) subset of Rk with nonempty interior, with the R R origin as an interior point. K(u) is a bounded differentiable function such that KðuÞ du ¼ 1 and uKðuÞ du ¼ 0. KðuÞ ¼ 0 for all u 2 @OK , where @OK denotes the boundary of OK . K(u) is a symmetric function, KðuÞ ¼ KðuÞ for all u 2 OK . Assumption 5. Let r ¼ ðkþ 4Þ=2 if k is even and r ¼ ðk þ 3Þ=2 if k is odd. All partial derivatives of q(x) of order r þ 1 exist. The expectation E½Yð@t qðxÞ=@xl1 @xlt Þ9x ¼ X exists for all t r r þ1. All moments of K(u) of order r exist. Assumption 6. The kernel function K(u) obeys Z KðuÞ du ¼ 1, Z
ul11 ul22 ulkk KðuÞ du ¼ 0,
l1 þl2 þ þ lk o r,
ul11 ul22 ulkk KðuÞ dua0,
l1 þ l2 þ þ lk ¼ r:
and Z
A.2. Proof of Theorem 1 kþ2
Under the condition that nh -1 as h-0, we have E½JZn ðZi ,Zj ÞJ2 ¼ oðnÞ. By Lemma 2, we know that P Un yn ¼ d^ mpdw Eðd^ mpdw Þ and U^ n yn ¼ ð2=nÞ ni¼ 1 ½rn ðZi ÞEfrn ðZi Þg have same asymptotic distribution. To characterize this distribution, define rmpdw ðzÞ ¼ rmpdw ðy, x,xÞ ¼ xqðxÞmð1Þ ðxÞxfymðxÞgqð1Þ ðxÞ:
356
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
Considering Zn ðZi ,Zj Þ, we have rn ðZi Þ ¼ EfZn ðZi ,Zj Þ9Zi g Z k þ 1 1 X x K ð1Þ i xi pðxÞfYi mðxÞgf ðxÞ dx ¼ h h Z 1 ð1Þ K ðuÞxi pðxÞfYi mðXi þ huÞgpðXi þ huÞf ðXi þhuÞ du ¼ h Z Z @ðmqÞðXi þ huÞ @qðXi þhuÞ KðuÞ duYi KðuÞ du ¼ xi @x @x Z Z @ðmqÞðXi þ huÞ @ðmqÞðXi Þ @qðXi þ huÞ @qðXi Þ ¼ rmpdw ðZi Þ þ xi KðuÞ duYi KðuÞ du @x @x @x @x rmpdw ðZi Þ þ tn ðZi Þ: Thus, we obtain n n n 2 X 2 X 2 X pffiffiffi ½rn ðZi ÞEfrn ðZi Þg ¼ pffiffiffi ½rmpdw ðZi ÞEfrmpdw ðZi Þgþ pffiffiffi ½tn ðZi ÞEftn ðZi Þg: ni¼1 ni¼1 ni¼1
If we can show that the last term of the above equation n 2 X Rn pffiffiffi ½tn ðZi ÞEftn ðZi Þg ¼ op ð1Þ ni¼1
pffiffiffi ^ pffiffiffi P we conclude that the limiting distribution of nfd mpdw Eðd^ mpdw Þg is the same as that of ð2= nÞ ni¼ 1 ½rmpdw ðZi Þ Efrmpdw ðZi Þg. Then, application of the Lindeberg–Levy central limit theorem yields the result in Theorem 1. In fact, by Assumption 3, the second moment of Rn is bounded by EðR2n Þ r4E½tn2 ðZi Þ Z 2 Z @ðmqÞðXi þ huÞ @ðmqÞðXi Þ 9KðuÞ9 du þ 9Yi 9 @qðXi þ huÞ @qðXi Þ 9KðuÞ9 du r4 @x @x @x @x Z 2 Z r 4E JhuJvðXi Þ9KðuÞ9 du þ9Yi 9 JhuJvðXi Þ9KðuÞ9 du 2 Z r 4E hð1 þ9Yi 9ÞvðXi Þ JuJ9KðuÞ9 du 2 Z 2 ¼ 4h2 E ð1 þ 9Yi 9ÞvðXi Þ JuJ9KðuÞ9 du ¼ Oðh2 Þ-0,
h-0,
which implies Rn ¼ op ð1Þ. This completes the proof of Theorem 1. A.3. Proof of Theorem 2 Noticing that U ¼ YX T dmprs and DðxÞ ¼ EðU9xÞ, if Y is missing, then U follows the same missing mechanism. We consider the following model similar to model (1): U ¼ DðXÞ þ e~ : Following the same arguments in the proof of Theorem 1 and replacing y by u, m(x) by D(x), and mð1Þ ðxÞ by pffiffiffi pffiffiffi P ð1Þ Dð1Þ ðxÞ ¼ mð1Þ ðxÞdmprs , respectively, we can show that n½2n1 ni¼ 1 q^ i ðXi Þxi Ui and 2= n½r~ mpdw ðZi ÞEfr~ mpdw ðZi Þg have the same limiting distribution, where r~ mpdw ðzÞ ¼ xqðxÞDð1Þ ðxÞxfuDðxÞgqð1Þ ðxÞ ¼ xqðxÞfmð1Þ ðxÞdmprs gxfymðxÞgqð1Þ ðxÞ: Thus, let rmprs ðzÞ ¼ rmprs ðy, x,xÞ ¼ E½pðXÞqðXÞ1 r~ mpdw ðzÞ and notice that E½rmprs ðZÞ ¼ 0, by (8), we obtain Theorem 2. References Chen, J., Fan, J., Li, K.-H., Zhou, H., 2006. Local quasi-likeilihood estimation with data missing at random. Statisica Sinica 16, 1071–1100. Geenens, G., Delecroix, M., 2006. A survey about single-index models theory. International Journal of Statistics and Systems 1, 203–230. GØrgens, T., 2004. Average derivatives for hazard functions. Econometric Theory 20, 437–463. ¨ Hardle, W., Hall, P., Ichimura, H., 1993. Optimal smoothing in single-index models. Annals of Statistics 21, 157–178. ¨ Hardle, W., Stoker, T.M., 1989. Investigating smooth multiple regression by the method of average derivative. Journal of American Statistical Association 84, 986–995. ¨ Horowitz, J.L., Hardle, W., 1996. Direct semiparametric estimation of single-index models with discrete covariates. Journal of American Statistical Association 91, 1632–1640.
W. Liu et al. / Journal of Statistical Planning and Inference 142 (2012) 347–357
357
Hristache, M., Juditsky, A., Spokoiny, V., 2001. Direct estimation of the index coefficients in a single-index model. Annals of Statistics 29, 595–623. Ichimura, H., 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58, 71–120. Li, Q., Lu, X., Ullah, A., 2003. Multivariate local polynomial regression for estimating average derivatives. Journal of Nonparametric Statistics 4–5, 607–624. Liang, H., Wang, S., Carroll, R.J., 2007. Partially linear models with missing response variables and error-prone covariates. Biometrika 94, 185–198. Liebscher, E., 2005. Semiparametric density estimators using copulas. Communications in Statistics—Theory and Methods 34, 59–71. Lu, X., Burke, M.D., 2005. Censored multiple regression by the method of average derivatives. Journal of Multivariate Analysis 95, 182–205. Powell, J.L., Stock, J.H., Stoker, T.M., 1989. Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430. Powell, J.L., Stoker, T.M., 1996. Optimal bandwidth choice for density-weighted average. Journal of Econometrics 75, 291–316. Stoker, T.M., 1986. Consistent estimation of scaled coefficients. Econometrica 54, 1461–1481. ¨ Wang, Q., Linton, O., Hardle, W., 2004. Semiparametric regression analysis with missing response at random. Journal of American Statistical Association 99, 334–345. Wang, Y., Shen, J., He, S., Wang, Q., 2010. Estimation of single index model with missing response at random. Journal of Statistical Planning and Inference 140, 1671–1690. Xia, Y., 2006. Asymptotic distributions for two estimators of the single-index models. Econometric Theory 22, 1112–1137. Xue, L.G., Zhu, L.X., 2006. Empirical likelihood for single-index models. Journal of Multivariate Analysis 97, 1295–1312. Yu, Y., Ruppert, D., 2002. Penalized spline estimation for partially linear single-index models. Journal of American Statistical Association 97, 1042–1054.