ARTICLE IN PRESS
Journal of Econometrics 144 (2008) 118–138 www.elsevier.com/locate/jeconom
Weak identification robust tests in an instrumental quantile model

Sung Jae Jun

Department of Economics, The Pennsylvania State University, 619 Kern Graduate Building, University Park, PA 16802, USA

Available online 20 January 2008
Abstract

We develop a testing procedure that is robust to identification quality in an instrumental quantile model. In order to reduce the computational burden, a multi-step approach is taken, and a two-step Anderson–Rubin (AR) statistic is considered. We then propose an orthogonal decomposition of the AR statistic, where the null distribution of each component does not depend on the assumption of a full rank of the Jacobian. Power experiments are conducted, and inferences on returns to schooling using the Angrist and Krueger data are considered as an empirical example.

© 2008 Elsevier B.V. All rights reserved.

JEL classification: C01; C12; C30

Keywords: Quantile regression; Instruments; GMM; Weak identification
1. Introduction

Quantile regression models are useful for analyzing the impact of treatment variables on the distribution of outcomes and have been used in many economic applications (see e.g., Buchinsky, 1994). However, in many cases, the treatment variables are endogenous, and conventional quantile regression is inappropriate for causal or structural analysis. To overcome this difficulty, several modified models have been proposed (see e.g., Amemiya, 1982; Abadie et al., 2002; Chernozhukov and Hansen, 2005, 2006; Imbens and Newey, 2002; Chesher, 2003; Ma and Koenker, 2006; Lee, 2007). Chernozhukov and Hansen take an instrumental variable approach. In contrast to Abadie et al. (2002), Chernozhukov and Hansen's model allows for a nonbinary endogenous variable. It is also more general than Amemiya (1982), which is a location model that does not allow for treatment effect heterogeneity. More importantly, this model can be cast in the framework of the generalized method of moments (GMM) in spite of its computational difficulty.

We take Chernozhukov and Hansen's approach to endogenous quantile models and consider testing the parameter of an endogenous variable without making the identification assumption in the GMM framework. Removing the identification assumption is particularly interesting in quantile analysis because defending instruments is not as easy as in a mean regression model, and the identification condition depends on what quantile effects are considered. It is never clear for what quantile effects the given instruments are most

E-mail address:
[email protected] 0304-4076/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2007.12.006
informative. In particular, when instruments are "weak," the identification of some quantile effects can be relatively harder than that of others. Since the mean effect is the integral of all the quantile effects, the identification of the mean effect is more demanding than that of a particular quantile effect.

As discussed in Staiger and Stock (1997), Stock and Wright (2000) and Stock et al. (2002), among others, the normal approximation of the GMM estimator can be extremely poor without the strong identification assumption, which is not always easy to verify in practice. One might want to pretest the quality of identification (see e.g., Hahn and Hausman, 2002), but this is not regarded as a desirable approach because subsequent inference should be conditional on the pretest. For this reason, many efforts have been made to develop testing procedures that are robust to the potential identification difficulty (Staiger and Stock, 1997; Stock and Wright, 2000; Moreira, 2003; Kleibergen, 2002, 2004, 2005). So far, these efforts seem to be quite successful. In particular, Kleibergen (2005) shows that the score statistic based on the continuous-updating GMM is robust to weak values of the expectation of the Jacobian, where he assumes the differentiability of the GMM objective function.1

The differentiability assumption is restrictive because it rules out interesting models such as the instrumental quantile model considered in this paper. The nonsmoothness of the objective function causes several problems: the computation becomes harder and the score statistic is not available. To avoid a high dimensional grid search, the coefficients of the exogenous variables are concentrated out under the null hypothesis of the structural parameter.
This approach significantly reduces the computational burden and results in a straightforward two-step Anderson–Rubin (AR) statistic, which is a joint test of the location of the parameter and the overidentifying restrictions (Anderson and Rubin, 1949; Stock and Wright, 2000; Honore and Hu, 2004).2 We then show how to obtain an orthogonal decomposition of the AR statistic. Simulation experiments demonstrate the power improvement achieved by this decomposition.

We then apply this method to the study of returns to schooling using the Angrist and Krueger data (e.g., Angrist and Krueger, 1991; Angrist et al., 1999). Confidence sets of returns to schooling for various quantiles are obtained by inverting the two-step AR test and its orthogonal decomposition. Although returns to schooling for the upper quantiles seem to be quite constant, the robust confidence sets for the lower quantiles are so wide that returns at those quantiles remain obscure, which differs from the results in Chernozhukov and Hansen (2006).

The paper is organized as follows. Section 2 presents the model and discusses weak identification. Section 3 focuses on the concentrated moment condition, and proposes a two-step AR statistic and its orthogonal decomposition. The simulation results are then provided, and the example of returns to schooling is discussed in Section 5. Conclusions are presented in the last section.

2. The model and weak identification

Following Chernozhukov and Hansen (2005, 2006), we consider a conditional moment restriction

$$E\left(1\{Y \le D\alpha + X'\beta\} - \tau \mid Z, X\right) = 0, \tag{1}$$

where $\alpha = \alpha(\tau)$, $\beta = \beta(\tau)$, $1\{\cdot\}$ is the standard indicator function, $Y$ is an outcome, $D \in \mathcal{D} \subset \mathbb{R}$ is a single endogenous regressor, and $X \in \mathcal{X} \subset \mathbb{R}^k$ is a vector of exogenous covariates. An unconditional moment restriction can then be obtained by

$$m_g(\alpha, \beta) = E\left(g(X, Z)\left(1\{Y \le D\alpha + X'\beta\} - \tau\right)\right) = 0, \tag{2}$$

where $\alpha = \alpha(\tau)$, $\beta = \beta(\tau)$, and $g(X, Z)$ is a known function with $\dim(g(X, Z)) \ge 1 + k$. As is well known, the optimal choice of $g$ is given by

$$g^*(X, Z) = E\left(\left.\begin{bmatrix} D f_{\eta_\tau}(0 \mid D, Z, X) \\ X f_{\eta_\tau}(0 \mid D, Z, X) \end{bmatrix} \,\right|\, Z, X\right) \frac{1}{\tau(1-\tau)}, \tag{3}$$

1 Local identification in general GMM requires full rank of the Jacobian of the expectation, which is well defined even if the expectation of the Jacobian is not (see e.g., Pakes and Pollard, 1989).

2 In the working paper version of Chernozhukov and Hansen (2006), they propose dual inference that is robust to the quality of instruments. This idea is in line with the AR statistic, which is a joint test of the location and the overidentifying restrictions. In fact, it is easy to show that this idea leads to the AR statistic in the standard linear mean regression model. See also Section 4.
where $f_{\eta_\tau}(\cdot \mid D, Z, X)$ is the conditional density of $\eta_\tau = Y - D\alpha(\tau) - X'\beta(\tau)$ given $D, Z, X$ (see e.g., Chamberlain, 1987; Chernozhukov et al., 2005). Note that the optimal choice $g^*$ is not a known function. Since the defining property of identification failure is that $g^*$ is not consistently estimable (or vanishes), we do not consider estimating $g^*$. Instead, we will focus on inference that is fully robust to identification failure for a given choice of $g$. In particular, as Chernozhukov et al. (2005) also suggest, we simply choose an identity function for $g$ throughout the paper so that we consider the moment condition

$$m(\alpha, \beta) = \begin{bmatrix} m_1(\alpha, \beta) \\ m_2(\alpha, \beta) \end{bmatrix} = \begin{bmatrix} E\left(X\left(1\{Y \le D\alpha + X'\beta\} - \tau\right)\right) \\ E\left(Z\left(1\{Y \le D\alpha + X'\beta\} - \tau\right)\right) \end{bmatrix} = 0, \tag{4}$$

where $\alpha = \alpha(\tau)$ and $\beta = \beta(\tau)$.

Although it is conceptually straightforward to obtain a GMM estimator from this moment condition, there are several difficulties. First, the indicator function is not a smooth function and the optimization requires a grid search over a potentially high dimensional space. Second, consistency and asymptotic normality of the GMM estimator require that $m(\alpha, \beta) = 0$ only when $\alpha = \alpha(\tau)$, $\beta = \beta(\tau)$ and that the Jacobian, $\partial m / \partial(\alpha, \beta')\,|_{\alpha = \alpha(\tau),\, \beta = \beta(\tau)}$, has full column rank. These are the global and the local identification conditions, respectively. When the parameters are not globally identified, consistent estimation is impossible. The lack of local identification when the Jacobian is equal to zero implies that estimating the parameters can be extremely difficult when the Jacobian is not zero but arbitrarily close to zero (see e.g., Dufour, 1997).

Stock and Wright (2000) propose an alternative asymptotic tool to capture this phenomenon by setting the Jacobian to be local to zero. They show that the GMM estimator is not consistent and not asymptotically normal under this alternative asymptotics, which they call the weak identification asymptotics.
The condition of the full column rank of the Jacobian is, however, just an assumption, and it is not clear how to check it in practice.3

Chernozhukov et al. (2005) propose an exact finite sample inference method based on the moment condition (2), or more practically on condition (4). They show how to obtain a valid confidence set for $\alpha(\tau)$ and $\beta(\tau)$ jointly, and suggest projecting out $\beta(\tau)$ to conduct marginal inference on $\alpha(\tau)$. They also propose a computational algorithm to implement this projection procedure conveniently. As we will discuss in the following subsection, this is asymptotically equivalent to the AR statistic that tests all the parameters jointly. When the interest is solely in the parameter $\alpha(\tau)$, starting from a joint confidence set makes inference unnecessarily conservative. See also Section 4 on simulations.

Kleibergen (2005) shows how to conduct robust inference in GMM against the potential identification difficulty. He assumes that the objective function is differentiable, and shows that a score statistic from a continuous updating objective function is robust to weak values of the expectation of the Jacobian. This approach does not apply to the current model because of the nonsmoothness of the indicator function. It should also be noted that what matters for identification is not the expectation of the Jacobian but the Jacobian of the expectation. The key point turns out to be not having the score of the objective function but estimating the Jacobian in such a way that it is asymptotically independent of $\sqrt{n}$ times the sample analog of $m(\alpha(\tau), \beta(\tau))$.

We give some remarks on the smoothed GMM approach (see e.g., Horowitz, 1992; MaCurdy and Timmins, 2001). One might think that Kleibergen's method can be directly applied by smoothing the indicator with a suitable distribution function (a cdf-type kernel).
However, this approach is not pursued in this paper because it makes the problem unnecessarily complicated: the bias and undersmoothing issues would have to be dealt with from the first step of defining the AR statistic.

2.1. A conservative approach: joint inference and projection

We first consider testing $\alpha(\tau)$ and $\beta(\tau)$ jointly. In this case, an exact finite sample inference method is available, which is asymptotically equivalent to an AR statistic (see Chernozhukov et al., 2005). When the interest is solely in $\alpha(\tau)$, a marginal confidence set can be obtained by projecting out the part of $\beta(\tau)$

3
In the linear mean location model, it is relatively easier to defend this condition because it is equivalent to the sufficiently strong correlation between the endogenous variable and the instruments. However, it is not clear how to check this condition in general nonlinear models. Honore and Hu (2004) write, ‘‘it is not clear exactly which (easily interpretable) assumptions would lead to this being true in general.’’
from the joint confidence set, although it is a conservative approach. In this section, we illustrate the conservative approach.

Let us start from the moment condition (4), and suppose that a random sample $\{(Y_i, D_i, X_i', Z_i')'\}$ of size $n$ is available. When we denote generic random variables, we will suppress the subscript $i$. We define a test statistic as follows:

$$AR_n(\alpha(\tau), \beta(\tau)) \equiv \left(\sqrt{n}\,\hat m(\alpha(\tau), \beta(\tau))\right)' W_n^{-1} \left(\sqrt{n}\,\hat m(\alpha(\tau), \beta(\tau))\right), \tag{5}$$

where $\hat m(\alpha(\tau), \beta(\tau)) = (1/n)\sum_{i=1}^n Q_{Xi}\left(1\{Y_i \le D_i\alpha(\tau) + X_i'\beta(\tau)\} - \tau\right)$ with $Q_{Xi} = [X_i' : Z_i']'$, and $W_n$ is a positive definite matrix such that $W_n \to_p \operatorname{Var}\left(Q_X\left(1\{Y \le D\alpha(\tau) + X'\beta(\tau)\} - \tau\right)\right)$.

Chernozhukov et al. (2005) note that the exact finite sample distribution of $AR_n$ conditional on $X_1, \ldots, X_n, Z_1, \ldots, Z_n$ can be easily simulated, because $1\{Y_i \le D_i\alpha(\tau) + X_i'\beta(\tau)\}$ is a Bernoulli-distributed random variable conditional on $X_1, \ldots, X_n, Z_1, \ldots, Z_n$. Since $AR_n$ clearly converges in distribution to a chi-squared distribution, the exact procedure is essentially a chi-squared test in large samples based on

$$AR_n(\alpha(\tau), \beta(\tau)) \to_d \chi^2(\dim(Q_X)), \tag{6}$$

which is the well-known AR test. Therefore, we can construct a joint confidence set for $\alpha(\tau)$ and $\beta(\tau)$ by inverting this test:

$$C_J = \left\{(\alpha(\tau), \beta(\tau)) : AR_n(\alpha(\tau), \beta(\tau)) < crt_\alpha\right\}, \tag{7}$$

where $crt_\alpha$ is an appropriately chosen critical value from either the chi-squared or the simulated exact distribution. In order to obtain a marginal confidence set for $\alpha(\tau)$, one can project out the part of $\beta(\tau)$ from $C_J$ as

$$C_{M,\alpha(\tau)} = \left\{\alpha(\tau) : \text{there is at least one } \beta(\tau) \text{ such that } (\alpha(\tau), \beta(\tau)) \in C_J;\ \text{i.e. } \inf_b AR_n(\alpha(\tau), b) < crt_\alpha\right\}. \tag{8}$$
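The joint statistic (5) and the projection step in (8) can be sketched numerically as follows. This is a minimal illustration rather than the paper's implementation: the function names are hypothetical, the weighting matrix uses the assumption that under the null the indicator is Bernoulli($\tau$) given $(X, Z)$ so that the variance is $\tau(1-\tau)E(Q_X Q_X')$, and the infimum over $b$ is approximated by a user-supplied grid of candidate values.

```python
import numpy as np

def joint_ar_statistic(y, d, X, Z, alpha, beta, tau):
    """Joint AR statistic (5) evaluated at hypothesized (alpha, beta)."""
    n = y.shape[0]
    Q_X = np.column_stack([X, Z])                      # Q_Xi = [X_i', Z_i']'
    ind = (y <= d * alpha + X @ beta).astype(float) - tau
    m_hat = Q_X.T @ ind / n                            # sample moment
    # ASSUMPTION (not from the paper's text): W_n = tau(1-tau) * mean(Q_X Q_X'),
    # which is the null variance when the indicator is Bernoulli(tau) given (X, Z).
    W_n = tau * (1.0 - tau) * (Q_X.T @ Q_X) / n
    return float(n * m_hat @ np.linalg.solve(W_n, m_hat))

def projected_ar(y, d, X, Z, alpha, tau, beta_grid):
    """Approximate inf_b AR_n(alpha, b) over a finite grid of candidate b vectors."""
    return min(joint_ar_statistic(y, d, X, Z, alpha, b, tau) for b in beta_grid)

def in_marginal_set(y, d, X, Z, alpha, tau, beta_grid, crit):
    """Membership of alpha in the projected set (8) for a given critical value."""
    return projected_ar(y, d, X, Z, alpha, tau, beta_grid) < crit

# Tiny deterministic illustration with a constant covariate and one instrument.
y = np.array([0.0, 1.0, 2.0, 3.0])
d = np.zeros(4)
X = np.ones((4, 1))
Z = np.array([[1.0], [-1.0], [1.0], [-1.0]])
ar_mid = joint_ar_statistic(y, d, X, Z, 0.0, np.array([1.5]), 0.5)
ar_far = joint_ar_statistic(y, d, X, Z, 0.0, np.array([10.0]), 0.5)
ar_inf = projected_ar(y, d, X, Z, 0.0, 0.5, [np.array([1.5]), np.array([10.0])])
```

Because the projection takes an infimum over all of $\beta$, the grid only approximates it from above; a coarse grid therefore tends to make the projected set look smaller than it is.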
Some comments are needed. First, since it is obtained by projecting out $\beta(\tau)$ from a joint confidence set, it is a conservative confidence region for the parameter $\alpha(\tau)$. Let $C_{M,\beta(\tau)}$ be a confidence region for $\beta(\tau)$ that is obtained by projecting out the part of $\alpha(\tau)$ from $C_J$ in a similar manner. It is easy to see that

$$(\alpha(\tau), \beta(\tau)) \in C_J \implies (\alpha(\tau), \beta(\tau)) \in C_{M,\alpha(\tau)} \times C_{M,\beta(\tau)} \implies \alpha(\tau) \in C_{M,\alpha(\tau)}, \tag{9}$$

which implies that the inference based on $C_{M,\alpha(\tau)}$ is conservative. Note that each step in (9) contributes to the cost of being conservative. Unless $C_J$ is rectangular, $C_J$ will be strictly contained in $C_{M,\alpha(\tau)} \times C_{M,\beta(\tau)}$. In addition, $\Pr((\alpha(\tau), \beta(\tau)) \in C_{M,\alpha(\tau)} \times C_{M,\beta(\tau)}) < \Pr(\alpha(\tau) \in C_{M,\alpha(\tau)})$, unless $\Pr(\beta(\tau) \in C_{M,\beta(\tau)} \mid \alpha(\tau) \in C_{M,\alpha(\tau)}) = 1$. In Section 4, we will illustrate the cost of the conservative approach by simulation experiments. In order to conduct inference in a less conservative manner, we will concentrate out $\beta(\tau)$ under the null of $\alpha(\tau)$, and propose a two-step AR statistic in later sections (see also Honore and Hu, 2004).

Second, since $AR_n$ is based on the moment condition (4), it wastes degrees of freedom when the model is highly overidentified. This issue can be addressed by decomposing the AR statistic, as Kleibergen (2005) explained. In the following sections, we consider a two-step AR statistic, and we will show how to achieve an orthogonal decomposition of the two-step AR statistic.

3. Concentrated moment conditions

Recall that the main interest of this paper is in testing the structural parameter $\alpha(\tau)$. At least conceptually, this can be done in a straightforward manner by estimating $\beta(\tau, \alpha(\tau))$ that solves $m(\alpha(\tau), \beta) = 0$ under the hypothesis of $\alpha(\tau)$. The practical difficulty, however, is that this is often computationally infeasible, because solving the sample analog of $m(\alpha(\tau), \beta) = 0$ requires a grid search over a potentially high dimensional space.

To overcome this computational difficulty, note that finding an estimator $\hat\beta(\tau, \alpha(\tau))$ using only the first set of moment conditions $m_1(\alpha(\tau), \beta) = 0$ does not require the grid search, because it is a conventional quantile regression problem that can be easily solved by linear programming (see Koenker and Bassett, 1978; Buchinsky, 1998, among others). Therefore, we take the following two-step strategy. First, define $\beta(\tau, \alpha)$ that
solves $m_1(\alpha, \beta) = 0$ for each $\alpha$, and we consider the concentrated moment condition

$$m_c(\alpha) \equiv m_2(\alpha, \beta(\tau, \alpha)) = E\left(Z_i\left(1\{Y_i \le D_i\alpha + X_i'\beta(\tau, \alpha)\} - \tau\right)\right), \tag{10}$$

which is equal to 0 when $\alpha = \alpha(\tau)$. The AR statistic testing $\alpha(\tau)$ can be easily obtained by this two-step strategy (Honore and Hu, 2004). However, the power issue still remains due to wasted degrees of freedom, and we will pursue an orthogonal decomposition of this two-step AR statistic. For this purpose, the Jacobian of the concentrated moment condition is considered. First, we make the following assumptions.

Assumption A. $\varepsilon_\tau \equiv Y - D\alpha(\tau) - X'\beta(\tau, \alpha(\tau))$ has a well-defined density $f_\tau(s \mid D, Z, X)$ conditional on $D, Z, X$, and $f_\tau(s \mid D, Z, X)$ is bounded away from 0 at $s = 0$, where $\beta(\tau, \alpha)$ solves $m_1(\alpha, \beta) = 0$ for each $\alpha$.

Assumption B. $E(XX' f_\tau(0 \mid D, Z, X))$ is nonsingular.

Assumption B ensures that $\beta(\tau, \alpha(\tau))$ is well identified from the first set of moment conditions, $m_1(\alpha, \beta(\tau, \alpha)) = 0$ for each $\alpha$.

Lemma 1. Under Assumptions A and B, we have

$$\Gamma \equiv \left.\frac{\partial m_c(\alpha)}{\partial \alpha}\right|_{\alpha = \alpha(\tau)} = E\left(Q D f_\tau(0 \mid D, Z, X)\right),$$

where $Q \equiv Z - AX$ with $A \equiv E(ZX' f_\tau(0 \mid D, Z, X))\, E(XX' f_\tau(0 \mid D, Z, X))^{-1}$.

Proof. See Appendix A.

Note that when $\Gamma = 0$, $\alpha(\tau)$ is not identified. In the following subsections, we first consider an AR statistic that is based on the concentrated moment condition (10). We will then achieve its orthogonal decomposition, and show that the distribution of each component is robust to weak values of $E(Q D f_\tau(0 \mid D, Z, X))$. Note also that $Q$ is the transformed instrument, which will be different from $Z$ unless $A = 0$. Although $A$ is unknown in practice, it can be easily estimated under the null of $\alpha(\tau)$.

3.1. A two-step AR statistic and its decomposition

We first consider a two-step AR statistic, which was also considered by Honore and Hu (2004). The main purpose of this subsection is to achieve the orthogonal decomposition of the AR statistic. As mentioned earlier, directly smoothing the objective function with a cdf-type kernel is not pursued, because it raises a bias issue from the first step of defining an AR statistic. Nevertheless, smoothing is useful for estimating the derivative $\Gamma = E(Q D f_\tau(0 \mid D, Z, X))$. Therefore, we take a hybrid approach, where the indicator is smoothed only to deal with the derivative part while the indicator in the moment condition itself is left unsmoothed. In the following discussion, the data $\{(Y_i, D_i, X_i', Z_i')'\}$ are assumed to be iid.

Assumption C. We have a kernel $k(v)$ such that $\sup |k(v)| < \infty$; $\int |k(v)|\,dv < \infty$; $\int k(v)\,dv = 1$; $\int k(v)^2\,dv < \infty$, and $|v||k(v)| \to 0$ as $|v| \to \infty$.

Assumption D. $f_\tau(s \mid D, Z, X)$ is twice continuously differentiable at $s = 0$.

Assumption E. There is an estimator $\hat\beta(\tau, \alpha(\tau))$ for $\beta(\tau, \alpha(\tau))$ that has an influence function $\Xi_i(\alpha(\tau))$:

$$\sqrt{n}\left(\hat\beta(\tau, \alpha(\tau)) - \beta(\tau, \alpha(\tau))\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \Xi_i(\alpha(\tau)) + o_p(1).$$

Assumptions C and D are standard for kernel methods. The density of the standard normal distribution is a common choice for $k(v)$. Under the null of $\alpha(\tau)$, $\hat\beta(\tau, \alpha(\tau))$ can be easily obtained by conventional quantile
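regression of $Y - D\alpha(\tau)$ on $X$. We now introduce some notation. $h_n$ denotes a bandwidth that goes to 0:

$$\hat m_c(\alpha(\tau), \beta) \equiv \frac{1}{n}\sum_{i=1}^n Z_i\left(1\{Y_i \le D_i\alpha(\tau) + X_i'\beta\} - \tau\right),$$

$$\hat\Gamma_{n,h_n}(\alpha(\tau), \beta) \equiv \frac{1}{nh_n}\sum_{i=1}^n Q_i D_i\, k\!\left(\frac{D_i\alpha(\tau) + X_i'\beta - Y_i}{h_n}\right),$$

$$\hat\Theta_{n,h_n}(\alpha(\tau), \beta) \equiv \frac{1}{nh_n}\sum_{i=1}^n Z_i X_i'\, k\!\left(\frac{D_i\alpha(\tau) + X_i'\beta - Y_i}{h_n}\right),$$

$$\hat\Delta_{n,h_n}(\alpha(\tau), \beta) \equiv \frac{1}{nh_n}\sum_{i=1}^n X_i X_i'\, k\!\left(\frac{D_i\alpha(\tau) + X_i'\beta - Y_i}{h_n}\right).$$

Similarly, we define

$$\Gamma(\alpha(\tau), \beta) \equiv E\left(Q_i D_i f_\tau(X_i'(\beta - \beta(\tau, \alpha(\tau))) \mid D_i, Z_i, X_i)\right),$$

$$\Theta(\alpha(\tau), \beta) \equiv E\left(Z_i X_i' f_\tau(X_i'(\beta - \beta(\tau, \alpha(\tau))) \mid D_i, Z_i, X_i)\right),$$

$$\Delta(\alpha(\tau), \beta) \equiv E\left(X_i X_i' f_\tau(X_i'(\beta - \beta(\tau, \alpha(\tau))) \mid D_i, Z_i, X_i)\right)$$

and

$$\hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta) = \begin{bmatrix} \hat\Gamma_{n,h_n}(\alpha(\tau), \beta) \\ \operatorname{vec}(\hat\Theta_{n,h_n}(\alpha(\tau), \beta)) \\ \operatorname{vec}(\hat\Delta_{n,h_n}(\alpha(\tau), \beta)) \end{bmatrix}, \qquad \Gamma^{*}(\alpha(\tau), \beta) = \begin{bmatrix} \Gamma(\alpha(\tau), \beta) \\ \operatorname{vec}(\Theta(\alpha(\tau), \beta)) \\ \operatorname{vec}(\Delta(\alpha(\tau), \beta)) \end{bmatrix}.$$

Now, let

$$\mathbb{G}_{n,h_n}(\alpha(\tau), \beta) \equiv \begin{bmatrix} \hat m_c(\alpha(\tau), \beta) \\ \hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta) - E\left(\hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau)))\right) \end{bmatrix},$$

$$\mathbb{G}_{h_n}(\alpha(\tau), \beta) \equiv \begin{bmatrix} m_c(\alpha(\tau), \beta) \\ E\left(\hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta)\right) - E\left(\hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau)))\right) \end{bmatrix},$$

$$\mathbb{G}(\alpha(\tau), \beta) \equiv \begin{bmatrix} m_c(\alpha(\tau), \beta) \\ \Gamma^{*}(\alpha(\tau), \beta) - \Gamma^{*}(\alpha(\tau), \beta(\tau, \alpha(\tau))) \end{bmatrix}.$$

Subtracting $E(\hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau))))$ is a normalization for convenience; note that $\mathbb{G}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau)))$ has mean zero and that $\mathbb{G}_{h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau))) = 0$. Finally, let

$$\Upsilon_{n,h_n} \equiv \begin{bmatrix} \sqrt{n}\, I_1 & 0 \\ 0 & \sqrt{nh_n}\, I_2 \end{bmatrix},$$

where $I_1$ and $I_2$ are identity matrices conformable with $\hat m_c$ and $\hat\Gamma^{*}_{n,h_n}$, respectively. Now, we make a few more assumptions.

Assumption F. $\mathbb{G}(\alpha(\tau), \beta)$ can be linearized with respect to $\beta$ around $\beta(\tau, \alpha(\tau))$. That is, $\mathbb{G}(\alpha(\tau), \beta) = [H_1(\alpha(\tau))' : H_2(\alpha(\tau))']'(\beta - \beta(\tau, \alpha(\tau))) + o(\|\beta - \beta(\tau, \alpha(\tau))\|)$ for some $H(\alpha(\tau)) = [H_1(\alpha(\tau))' : H_2(\alpha(\tau))']'$.

Assumption G. Let

$$\operatorname{Var}\left(\Upsilon_{n,h_n}\mathbb{G}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau))) + \begin{bmatrix} H_1(\alpha(\tau))\dfrac{1}{\sqrt{n}}\sum_{i=1}^n \Xi_i(\alpha(\tau)) \\ 0 \end{bmatrix}\right) = V_{h_n}(\alpha(\tau)),$$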
where

$$V_{h_n}(\alpha(\tau)) = \begin{bmatrix} V(\alpha(\tau)) & V_{12,h_n}(\alpha(\tau)) \\ V_{21,h_n}(\alpha(\tau)) & V_{22,h_n}(\alpha(\tau)) \end{bmatrix}$$

with $V(\alpha(\tau))$ positive definite. Then, for any sequence $h_n > 0$ with $h_n \to 0$, $nh_n \to \infty$, we have

$$V_{h_n}(\alpha(\tau))^{-1/2}\left(\Upsilon_{n,h_n}\mathbb{G}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau))) + \begin{bmatrix} H_1(\alpha(\tau))\dfrac{1}{\sqrt{n}}\sum_{i=1}^n \Xi_i(\alpha(\tau)) \\ 0 \end{bmatrix}\right) \to_d N(0, I).$$

Assumption H. For any sequence $h_n > 0$ with $h_n \to 0$, $nh_n \to \infty$ and $\sqrt{nh_n}\,h_n \to c_1$ (or $\sqrt{nh_n}\,h_n^2 \to c_2$ if $\int v k(v)\,dv = 0$), $B_{n,h_n}(\alpha(\tau), \beta) \equiv \sqrt{nh_n}\left(E(\hat\Gamma^{*}_{n,h_n}(\alpha(\tau), \beta)) - \Gamma^{*}(\alpha(\tau), \beta)\right) \to B(\alpha(\tau), \beta)$ is uniform in $\beta$ for some $B(\alpha(\tau), \beta)$ continuous at $\beta = \beta(\tau, \alpha(\tau))$.

Assumption I. For any $\delta_n \downarrow 0$ and for any $h_n > 0$ with $h_n \to 0$, $nh_n \to \infty$, we have

$$\sup_{\|\beta - \beta(\tau, \alpha(\tau))\| < \delta_n} \left\|\Upsilon_{n,h_n}\left(\mathbb{G}_{n,h_n}(\alpha(\tau), \beta) - \mathbb{G}_{h_n}(\alpha(\tau), \beta)\right) - \Upsilon_{n,h_n}\left(\mathbb{G}_{n,h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau))) - \mathbb{G}_{h_n}(\alpha(\tau), \beta(\tau, \alpha(\tau)))\right)\right\| = o_p(1).$$

Assumption F is satisfied when $\mathbb{G}(\alpha(\tau), \beta)$ is differentiable. Assumption G is a central limit theorem. Given a random sample, it is satisfied under mild conditions such as $\int k(v)^{2+\delta}\,dv < \infty$ for some $\delta > 0$ and the existence of enough moments. Assumption H says that the bias should behave well. This is satisfied e.g., when $f_\tau(s \mid D, Z, X)$ has a uniformly bounded first order (or second order when $\int v k(v)\,dv = 0$) derivative. Note that it also includes the uniform convergence of $\mathbb{G}_{h_n}(\alpha(\tau), \beta)$ to $\mathbb{G}(\alpha(\tau), \beta)$. Assumption I is stochastic equicontinuity with a Lindeberg-type condition (see e.g., Andrews, 1994; van der Vaart and Wellner, 1996, pp. 220 and 221). It is a restriction on the class of functions involved in $\mathbb{G}_{n,h_n}(\alpha(\tau), \beta)$. Since a uniform kernel or smooth kernels satisfying the Lipschitz condition in $\beta$ are typically allowed, this assumption holds with various choices of kernels.

Lemma 2. Suppose Assumptions A–I hold. Let

$$\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau), \hat\beta(\tau, \alpha(\tau))) = \frac{1}{nh_n}\sum_{i=1}^n \hat Q_i D_i\, k\!\left(\frac{D_i\alpha(\tau) + X_i'\hat\beta(\tau, \alpha(\tau)) - Y_i}{h_n}\right),$$

where $\hat Q_i = Z_i - \hat A_{n,h_n} X_i$ with $\hat A_{n,h_n} = \hat\Theta_{n,h_n}(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))\,\hat\Delta_{n,h_n}(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))^{-1}$. For any sequence $h_n > 0$ with $h_n \to 0$, $nh_n \to \infty$, and $\sqrt{nh_n}\,h_n \to c_1$ (or $\sqrt{nh_n}\,h_n^2 \to c_2$ and $\int v k(v)\,dv = 0$), we have

$$\begin{bmatrix} \sqrt{n}\,\hat m_c(\alpha(\tau), \hat\beta(\tau, \alpha(\tau))) \\ \sqrt{nh_n}\left(\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau), \hat\beta(\tau, \alpha(\tau))) - \Gamma(\alpha(\tau), \beta(\tau, \alpha(\tau)))\right) \end{bmatrix} \to_d N\left(\begin{bmatrix} 0 \\ B(\alpha(\tau)) \end{bmatrix}, \begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}\right)$$

for some $B(\alpha(\tau))$ and $V_{22}(\alpha(\tau))$.

Proof. See Appendix A.

Lemma 2 shows that we can obtain an AR statistic in two steps:

$$\widehat{AR}_n(\alpha(\tau)) = \left(\sqrt{n}\,\hat m_c(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))\right)' \hat V(\alpha(\tau))^{-1} \left(\sqrt{n}\,\hat m_c(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))\right) \to_d \chi^2(\dim(Z_i)),$$

where $\hat V(\alpha(\tau))$ is a consistent estimator for $V(\alpha(\tau))$, and $\hat\beta(\tau, \alpha(\tau))$ is obtained from quantile regression of $Y_i - D_i\alpha(\tau)$ on $X_i$. Note that $V(\alpha(\tau))$ takes into account the effect of $\Xi_i(\alpha(\tau))$, which reflects the fact that the true $\beta(\tau, \alpha(\tau))$ is replaced with its estimator $\hat\beta(\tau, \alpha(\tau))$. Adjusting the variance after the second step is standard in the two-step estimator theory (see e.g., Newey and McFadden, 1994). Estimation of $V(\alpha(\tau))$ can be done by either the bootstrap or kernel smoothing. The bootstrap is however not computationally attractive, because
$V(\alpha(\tau))$ is the continuous updating variance that depends on $\alpha(\tau)$. To construct a confidence set, the bootstrap would have to be repeated for every hypothesized value of $\alpha(\tau)$. For estimation of $V(\alpha(\tau))$ via kernel smoothing, see Appendix A.

Theorem 1. Suppose Assumptions A–I hold. Suppose that $\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))$ and $h_n > 0$ are given as in Lemma 2. Letting $\hat V(\alpha(\tau))$ be a consistent estimator for $V(\alpha(\tau))$, define

$$\hat K_n(\alpha(\tau)) = \left(\sqrt{n}\,\hat m_c(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))\right)' \hat V^{-1/2} P\, \hat V^{-1/2} \left(\sqrt{n}\,\hat m_c(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))\right),$$

where $P$ is the projection matrix onto the column space of $\hat V^{-1/2}(\alpha(\tau))\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau), \hat\beta(\tau, \alpha(\tau)))$. We then have

$$\hat K_n(\alpha(\tau)) \to_d \chi^2(1),$$

whether $\Gamma(\alpha(\tau), \beta(\tau, \alpha(\tau)))$ is (i) 0, (ii) $O(1/\sqrt{n})$, (iii) $O(1/\sqrt{nh_n})$, or (iv) of full rank.

Proof. See Appendix A.

Note that $\hat K_n(\alpha(\tau))$ is not a score statistic of the GMM objective function. Although the score of the objective function is not available, the score statistic of Kleibergen (2005) can be mimicked by using the score of the limit function. In fact, $\hat K_n(\alpha(\tau))$ is testing the first order condition of the GMM objective function in the limit, which can also be satisfied at points where the limit function is not actually minimized.4 Therefore, $\hat K_n(\alpha(\tau))$ will have spurious power declines when the derivative of the limit function is equal to zero. This problem was also pointed out by Kleibergen (2005), and he suggested using the other orthogonal component of the AR statistic. To be more concrete, note that $\hat K_n(\alpha(\tau))$ provides an orthogonal decomposition of $\widehat{AR}_n(\alpha(\tau))$:

$$\widehat{AR}_n(\alpha(\tau)) \equiv \hat K_n(\alpha(\tau)) + \hat J_n(\alpha(\tau)), \tag{11}$$

where $\hat K_n(\alpha(\tau))$ and $\hat J_n(\alpha(\tau))$ are orthogonal by construction. Even when $\hat K_n(\alpha(\tau))$ does not have power, $\hat J_n(\alpha(\tau))$ still does. Therefore, we can improve the power behavior of $\hat K_n(\alpha(\tau))$ by combining it with $\hat J_n(\alpha(\tau))$. Tests with size $\alpha$ can be conducted by using $\alpha_1$ and $\alpha - \alpha_1$ critical values from $\chi^2(1)$ and $\chi^2(\dim(Z_i) - 1)$ for $\hat K_n(\alpha(\tau))$ and $\hat J_n(\alpha(\tau))$, respectively.5

Kleibergen (2005) also considered a GMM counterpart of the conditional likelihood ratio statistic of Moreira (2003). He proposed conditioning on a rank statistic testing the reduced rank of $\Gamma(\alpha(\tau), \beta(\tau, \alpha(\tau)))$. In principle, this approach can be extended, but such an extension is not considered here.6

Lastly, we point out that the AR statistic is robust to reduced ranks of the Jacobian in general but that its decomposition is valid only under the local-to-zero asymptotics. In our case, $\Gamma(\alpha(\tau), \beta(\tau, \alpha(\tau)))$ is in fact a vector, so considering the local-to-zero asymptotics is not so restrictive. In a general case with multiple parameters, considering other forms of identification failure (e.g., reduced ranks of the Jacobian matrix) in improving the AR statistic is an important but difficult problem.

4. Simulations

For simulation purposes, the simplest location model is used. The experiments use the following design:

$$Y_i = \beta(U_i) + D_i\alpha_0, \qquad D_i = \gamma_1 + Z_i'\gamma_2 + V_i, \tag{12}$$

where $\beta(s) = \Phi^{-1}(s)$, $U_i \sim \text{uniform}(0, 1)$. This is a location model in the sense that $\alpha(\tau) = \alpha_0$ for all $\tau \in (0, 1)$. It is also assumed that $Z_i = [Z_{2i}, Z_{3i}, \ldots, Z_{9i}]' \sim \text{iid } N(0, I_8)$, where $I_k$ is a $k \times k$ identity matrix.

4 Since $\hat K_n(\alpha(\tau))$ tests the first order condition in the limit, $S = \{\alpha \in \mathbb{R} : \hat K_n(\alpha) = 0\}$ consists of those values satisfying the first order condition of the objective function in the limit. When the objective function is uniquely minimized, $S$ typically consists of a singleton when $n$ is large enough, which corresponds to the Hodges–Lehmann estimator. We thank an anonymous referee for pointing this out.

5 Let $crt_{\alpha,k}$ be a critical value from $\chi^2(k)$ at level $\alpha$. Note that $\Pr(\hat K_n(\alpha(\tau)) > crt_{\alpha_1, 1} \text{ or } \hat J_n(\alpha(\tau)) > crt_{\alpha - \alpha_1,\, \dim(Z_i) - 1}) \approx \alpha_1 + (\alpha - \alpha_1) - \alpha_1(\alpha - \alpha_1) \le \alpha$ by the independence of $\hat K_n(\alpha(\tau))$ and $\hat J_n(\alpha(\tau))$.

6 Such an extension will require estimating $V_{22}(\alpha(\tau))$.
We assume that

$$[\beta(U_i), V_i]' \sim \text{iid } N\left(0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right),$$

and that $Z_i$ and $[\beta(U_i), V_i]'$ are independent. Note that $\rho \ne 0$ implies endogeneity of $D_i$. The parameter of interest is the $\tau$th quantile effect of the exogenous changes of $D_i$. That is, the goal is testing $H_0: \alpha(\tau) = \alpha_H$ for a given $\tau$.7

If it is known a priori that $\alpha(\tau) = \alpha_0$ for all $\tau$, then this is a standard simultaneous equations model and there are many other methods to estimate and test $\alpha_0$, such as 2SLS and LIML. Nevertheless, this design is used for the following reasons. First, the instrumental quantile method is sometimes preferred even in this simple location model because of its robustness to outliers (Chen and Portnoy, 1996; Honore and Hu, 2004). Second, the possibility of quantile effect heterogeneity should be allowed for, and thus assuming a priori that $\alpha(\tau) = \alpha_0$ for all $\tau$ is not desirable. Third, the purpose of the experiments is to see the power properties of the developed methods under various values of the Jacobian of the limit function, and thus a sufficiently simple model with an analytically tractable Jacobian is preferred.

For a given $\tau \in (0, 1)$, let $m(\alpha, \beta) \equiv [m_1(\alpha, \beta)\ \ m_2(\alpha, \beta)']' = E\left(Q_{Xi}\left(1\{Y_i \le D_i\alpha + \beta\} - \tau\right)\right)$, where $Q_{Xi} = [1\ \ Z_i']'$. Define $\beta(\tau, \alpha)$ to be a solution to $m_1(\alpha, \beta) = 0$ for each $\alpha$, and then consider the concentrated moment condition: $m_c(\alpha) = m_2(\alpha, \beta(\tau, \alpha)) = 0$ when $\alpha = \alpha(\tau) = \alpha_0$ for any given $\tau$. Suppose that $\gamma_2 = 0$ so that $D$ and $Z$ are independent. We then have

$$m_c(\alpha) = E\left(Z\left(1\{Y_i \le D_i\alpha + \beta(\tau, \alpha)\} - \tau\right)\right) = E(Z)\,E\left(1\{Y_i \le D_i\alpha + \beta(\tau, \alpha)\} - \tau\right),$$

where $E\left(1\{Y_i \le D_i\alpha + \beta(\tau, \alpha)\} - \tau\right) = 0$ for any $\alpha \in \mathbb{R}$ by definition of $\beta(\tau, \alpha)$. Hence, $m_c(\alpha) = 0$ for every $\alpha$, and $\alpha(\tau)$ is not identified. Note that the designed setup is simple enough to analytically compute the Jacobian at $\alpha = \alpha_0$:

$$\left.\frac{\partial m_c}{\partial \alpha}\right|_{\alpha = \alpha_0} = \Gamma = (2\pi)^{-1/2}\exp\left(-\frac{\beta(\tau, \alpha_0)^2}{2}\right)\operatorname{Var}(Z_i)\,\gamma_2. \tag{13}$$

The full rank assumption of $\Gamma$ hinges on several points. The variation of $Z_i$ should be sufficiently rich and $\gamma_2$ should be nonzero. Given the variation of $Z_i$, when $\gamma_2$ is too close to 0, identification becomes weak.
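To make the design concrete, the following minimal sketch draws one sample from (12) and evaluates the Jacobian formula (13). The function names and the seed are hypothetical choices for illustration; the code uses $\operatorname{Var}(Z_i) = I_8$ and $\gamma_2 = w[1 \cdots 1]'$ as in the design, so each element of $\Gamma$ reduces to the standard normal density at $\Phi^{-1}(\tau)$ times $w$.

```python
import numpy as np
from statistics import NormalDist

def simulate_design(n, alpha0, w, rho, rng, gamma1=1.0, n_inst=8):
    """Draw (Y, D, Z) from design (12) with gamma_2 = w * [1, ..., 1]'."""
    Z = rng.standard_normal((n, n_inst))                # Z_i ~ N(0, I_8)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    errs = rng.multivariate_normal(np.zeros(2), cov, size=n)
    b_u, v = errs[:, 0], errs[:, 1]                     # b_u plays the role of beta(U_i)
    d = gamma1 + Z @ (w * np.ones(n_inst)) + v          # first-stage equation
    y = b_u + d * alpha0                                # outcome equation
    return y, d, Z

def jacobian_13(tau, w, n_inst=8):
    """Evaluate (13) when Var(Z_i) is the identity matrix."""
    b = NormalDist().inv_cdf(tau)                       # beta(tau, alpha_0) = Phi^{-1}(tau)
    phi_b = np.exp(-0.5 * b**2) / np.sqrt(2.0 * np.pi)  # standard normal density at b
    return phi_b * w * np.ones(n_inst)

rng = np.random.default_rng(0)                          # arbitrary seed
y, d, Z = simulate_design(n=500, alpha0=1.0, w=0.10, rho=0.8, rng=rng)
for tau in (0.25, 0.50, 0.75):
    print(tau, jacobian_13(tau, w=0.10)[0])
```

Setting `w=0.0` makes every element of the Jacobian exactly zero, which corresponds to the no-identification case in the experiments.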
Another interesting feature is that local identification is also affected by $\tau$. Recall that $\beta(\tau,\alpha_0)$ is the $\tau$th quantile of $Y_i - D_i\alpha_0 = \beta(U_i)$. Hence, it is easy to see that $\beta(\tau,\alpha_0) = \beta(\tau) = \Phi^{-1}(\tau)$. Since $\exp(-(\Phi^{-1}(\tau))^2/2)$ is maximized at $\tau = 0.5$, the median effect $\alpha(0.5)$ is best identified. When the given instruments are weak, identification of the tail quantile effects will suffer more seriously than that of the median effect. The fact that identification depends on $\tau$ is interesting. Although the median is better identified than the tails in this simple setup, this is purely due to the artificial design. In practice, it is never clear for what quantiles the given instruments will be most informative.

In the power experiments, the sample size was $n = 500$, and 1,000 Monte Carlo replications were made. $\rho$ was set to 0.8 so that $D_i$ is highly endogenous. $\gamma_1$ and $\gamma_2$ were set to 1 and $w[1 \cdots 1]'$, respectively, where $w \in \{0, 0.02, 0.05, 0.10, 0.20, 1.00\}$. When $w = 0$, there is no identification, and when $w = 1.00$, identification is strong. $\tau = 0.25$, $0.50$, and $0.75$ were considered. Figs. 1–3 illustrate the power curves of the various test statistics. $H_0: \alpha(\tau) = 1$ was tested while various values of the true $\alpha_0$ were tried. $h_n = n^{-1/5}$ was used for the bandwidth choice.8 For expository purposes, the power curves of the dual inference method9 are also illustrated. We also considered a conservative approach based on the joint test of $\alpha(\tau)$ and $\beta(\tau)$ (see Chernozhukov et al., 2005). Specifically, we rejected $H_0: \alpha(\tau) = 1$ when

$$\inf_{b} AR(1,b) > \mathrm{crt}_{0.05,9},$$

where $AR(1,b) = (\sqrt{n}\,\hat m(1,b))'\,W_n\,(\sqrt{n}\,\hat m(1,b))$ with $W_n \xrightarrow{p} \mathrm{Avar}(\sqrt{n}\,\hat m(1,b))^{-1}$, and $\mathrm{crt}_{0.05,9}$ is the 5% critical value from $\chi^2(9)$. Note that the exact procedure by Chernozhukov et al. (2005) is asymptotically equivalent to this chi-squared test.

7 We distinguish the true value $\alpha_0$ from the hypothesized value $\alpha_H$.
8 The bandwidth choice issue is not addressed here because it is beyond the scope of this paper.
9 Under $H_0: \alpha(\tau) = \alpha_H$, we ran the quantile regression of $Y - D\alpha_H$ on $X = 1$ and $Z$, and constructed the Wald statistic testing that the coefficients on $Z$ equal 0 (see e.g., Chernozhukov and Hansen, 2008).
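The dependence of identification strength on $\tau$ and on the first-stage coefficient can be made concrete with a small numerical sketch of the scalar factor in Eq. (13). The function name `jacobian_scale` is hypothetical, and the sketch assumes $\mathrm{Var}(Z_i) = I$ (an assumption made here only for illustration), so that every element of the Jacobian equals $\phi(\Phi^{-1}(\tau))\,w$.

```python
from statistics import NormalDist


def jacobian_scale(tau: float, w: float) -> float:
    """Scalar factor of Eq. (13): (2*pi)^(-1/2) * exp(-beta(tau)^2 / 2) * w.

    Assumes Var(Z_i) = I and gamma_2 = w * (1, ..., 1)', so each element of
    the Jacobian equals the returned value. Var(Z_i) = I is an illustrative
    assumption, not taken from the experimental design.
    """
    std_normal = NormalDist()
    beta_tau = std_normal.inv_cdf(tau)     # beta(tau) = Phi^{-1}(tau)
    return std_normal.pdf(beta_tau) * w    # standard normal density at beta(tau), times w


# Tabulate the factor over the design points used in the power experiments.
for w in (0.0, 0.02, 0.05, 0.10, 0.20, 1.00):
    row = [round(jacobian_scale(tau, w), 4) for tau in (0.25, 0.50, 0.75)]
    print(f"w = {w:4.2f}: {row}")
```

For any fixed $w > 0$ the factor is largest at $\tau = 0.5$ and symmetric around it, and it vanishes at $w = 0$ for every $\tau$, which matches the discussion above.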
Fig. 1. Power curves for testing $H_0: \alpha(0.25) = 1$. The x-axis shows various values of $\alpha_0$, and the y-axis shows the Monte Carlo rejection rates. The nominal size of the tests is 5%. C–H Dual is the dual method proposed by Chernozhukov and Hansen (2008), and Joint is the projection method based on the asymptotic version of Chernozhukov et al. (2005). For the K–J combination, the experiments used 4% and 1% critical values for the K and J components, respectively.
Fig. 2. Power curves for testing $H_0: \alpha(0.50) = 1$. The x-axis shows various values of $\alpha_0$, and the y-axis shows the Monte Carlo rejection rates. The nominal size of the tests is 5%. C–H Dual is the dual method proposed by Chernozhukov and Hansen (2008), and Joint is the projection method based on the asymptotic version of Chernozhukov et al. (2005). For the K–J combination, the experiments used 4% and 1% critical values for the K and J components, respectively.
Fig. 3. Power curves for testing $H_0: \alpha(0.75) = 1$. The x-axis shows various values of $\alpha_0$, and the y-axis shows the Monte Carlo rejection rates. The nominal size of the tests is 5%. C–H Dual is the dual method proposed by Chernozhukov and Hansen (2008), and Joint is the projection method based on the asymptotic version of Chernozhukov et al. (2005). For the K–J combination, the experiments used 4% and 1% critical values for the K and J components, respectively.
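The K–J combination described in the captions of Figs. 1–3 can be sketched as a simple decision rule: reject when either component exceeds its own critical value. The sketch below assumes that K is compared with a $\chi^2(1)$ critical value and J with a $\chi^2(k-1)$ critical value, where $k$ is the number of instruments; these degrees of freedom follow the usual decomposition of an AR-type statistic and are an assumption here. The helper names `chi2_cdf`, `chi2_ppf`, and `kj_reject` are hypothetical, and the chi-squared quantile is computed with a plain series and bisection rather than a statistical library.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total):
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def chi2_ppf(p: float, df: int) -> float:
    """Chi-squared quantile by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kj_reject(k_stat: float, j_stat: float, n_instruments: int,
              level_k: float = 0.04, level_j: float = 0.01) -> bool:
    """Reject H0 if K exceeds its 4% critical value or J exceeds its 1% one.

    Assumed degrees of freedom: 1 for K and n_instruments - 1 for J.
    """
    crit_k = chi2_ppf(1.0 - level_k, 1)
    crit_j = chi2_ppf(1.0 - level_j, n_instruments - 1)
    return k_stat > crit_k or j_stat > crit_j


print(kj_reject(k_stat=5.0, j_stat=0.0, n_instruments=8))
```

If the two components are asymptotically independent under the null, the rejection probability of this rule is $1 - 0.96 \times 0.99 \approx 0.0496$, so the overall size stays at or below the nominal 5%.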
The rejection rates of the test based on the projection scheme are all smaller than the nominal size of 5% under the null. When the parameter $\alpha(\tau)$ is weakly identified, the power curves of the test based on joint inference lie uniformly below those of the test statistics using the concentrated moment condition. This is not surprising, because projecting out $\beta(\tau)$ from the joint inference on $\alpha(\tau)$ and $\beta(\tau)$ is a conservative approach when the interest is solely in the parameter $\alpha(\tau)$.10 It is natural to expect that the cost of being conservative will increase as the number of (included) exogenous regressors becomes larger.

The dual method, AR, K, and the K–J combination give approximately correct size regardless of the quality of identification. Under a complete lack of identification, none of these statistics has any spurious power. The power declines of the K statistic can be explained by the shape of the limit function, since the statistic is based on the score of the limit function. Note that these power declines can easily be fixed by combining the K statistic with the J statistic, which is the other orthogonal component of the AR statistic. It is also worth noting that the power curve of the dual inference method is similar to that of the AR statistic. This is also expected, because both of them test the location and the overidentifying restrictions jointly.

5. Empirical example: returns to schooling

This section considers an empirical example using the data analyzed by Angrist and Krueger (1991), who used birth quarter as an instrument for estimating the effects of compulsory schooling on earnings, which sparked much controversy about weak instruments. The sample consists of 329,509 men born between 1930 and 1939 from the 1980 census. This data set was also used by Angrist et al. (1999) and is accessible on the web via the Journal of Applied Econometrics.
The dependent variable $Y$ is the log weekly wage, and the endogenous variable $D$ is the years of schooling. The exogenous variables $X$ consist of 10 dummy variables indicating the year of birth. The instruments $Z$ are the quarter of birth interacted with the year of birth. Hence, there are 30 instruments, 10 (included) exogenous variables, and a single endogenous variable. This is a simple specification of Angrist–Krueger, which does not include the state dummies and their interactions with the year of birth.

While Angrist et al. (1999) estimate the mean effect by conventional 2SLS, LIML, and the jackknife IV method, Honore and Hu (2004) and Chernozhukov and Hansen (2006) consider the effects of schooling on various quantiles. Chernozhukov and Hansen (2006) consider a different specification that includes the state dummies, and they use a different estimation method, which ignores the potential issue from weak instruments. Honore and Hu (2004) use the two-step AR statistic with a sub-sample of white men. Although these previous studies use a different sample or specification, it is interesting to compare them with the results obtained here because the potential weak instrument issue is explicitly considered here.11 Chernozhukov and Hansen (2006) found that education effects are larger at the lower end of the income distribution.12 We find, however, that these results should be interpreted with care because the data are not informative about the lower quantile effects.

As a bandwidth choice, $h_n = n^{-1/5}$ was used. The pdf of the standard normal distribution was chosen as a kernel. The 95% confidence sets are provided in Fig. 4. Returns to schooling for the $\tau$th quantile are defined as the effect of an exogenous change in education on the $\tau$th quantile response function of the potential log wage, which was denoted by $\alpha(\tau)$ in the previous sections.
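The robust confidence sets in this section are obtained by test inversion: a grid of hypothesized values is scanned, and every value that the test fails to reject is kept. A minimal sketch of that loop is given below. The names `invert_test` and `toy_statistic` are hypothetical, and the toy statistic (a squared t-ratio with made-up numbers) only stands in for the AR statistic or the K–J rule so that the example is self-contained.

```python
from typing import Callable, List, Sequence


def invert_test(grid: Sequence[float],
                statistic: Callable[[float], float],
                critical_value: float) -> List[float]:
    """Return the grid values of the hypothesized parameter that are not rejected."""
    return [a_h for a_h in grid if statistic(a_h) <= critical_value]


def toy_statistic(a_h: float, estimate: float = 0.08, std_error: float = 0.02) -> float:
    """Placeholder statistic: squared t-ratio, chi-squared(1) under the null.

    The estimate and standard error are illustrative numbers, not results
    from the paper.
    """
    return ((estimate - a_h) / std_error) ** 2


grid = [i / 1000 for i in range(0, 201)]           # hypothesized values 0.000, ..., 0.200
conf_set = invert_test(grid, toy_statistic, 3.841)  # 5% critical value of chi-squared(1)
print(min(conf_set), max(conf_set), len(conf_set))
```

The function returns the accepted grid points rather than a pair of endpoints because a confidence set built from an identification-robust test need not be a bounded interval; it can be disconnected or cover the whole grid when the instruments are uninformative.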
The values of $\alpha(\tau)$ that could not be rejected by the AR test and the K–J combination were collected to obtain the confidence sets. For the K–J combination, 4% and 1% critical values were used for the K and J components, respectively. The confidence sets found here are similar to those of Chernozhukov and Hansen (2006) for the upper quantiles. However, the lower quantiles are quite different. The robust confidence sets for the lower quantiles are much wider than the results in those studies. This indicates that the weak instrument problem is a concern at the lower end of the income distribution and that the tight confidence sets for these quantiles are misleading. Although education effects on the upper quantiles seem to be quite constant, the effects on the lower quantiles still remain obscure. The confidence sets for the upper quantiles could be tightened by using the K–J combination.

10 However, it should be emphasized that the projection approach allows exact finite sample inference. This is an important advantage of the projection method.
11 Honore and Hu (2004) noted the weak instrument problem. However, they only considered the 25th, 50th, and 75th percentiles with the AR statistic, which wastes many degrees of freedom.
12 The working paper version of Lee (2007) investigated the same sample and the same specification as those considered in this paper. It used a different method, which ignores the potential weak instrument issue, and reported results similar to those of Chernozhukov and Hansen (2006).

Fig. 4. 95% confidence sets for returns to schooling. The exogenous variables are 10 dummies indicating the years of birth, and the instruments are 30 dummies that consist of the quarters of birth interacted with the years of birth. For the K–J combination, 4% and 1% critical values were used for the K and J components, respectively.

6. Conclusion

We considered the linear instrumental quantile model proposed by Chernozhukov and Hansen (2005, 2006) and developed test procedures that are robust to potential identification difficulty (Dufour, 1997; Stock and Wright, 2000, among others). Specifically, we first considered a two-step AR statistic and obtained an orthogonal decomposition of the AR statistic, where the null distribution of each component is robust to weak values of the Jacobian of the concentrated moment condition. Removing the identification assumption is particularly interesting in quantile models because (1) defending instruments is not as easy as in mean regression models, and (2) given the instruments, identification depends on what quantile effects are examined. It is never clear for what quantile effects the given instruments are most informative. In particular, when the instruments are weak, the identification of some quantile effects can be relatively harder than that of others.

As an empirical example, returns to schooling using the Angrist–Krueger data were examined, and confidence sets for various quantile effects were obtained. Although returns to schooling for the upper quantiles seem to be quite constant, the robust confidence sets for the lower quantiles were far wider than those obtained by Chernozhukov and Hansen (2006). This indicates that the given instruments are not informative for the lower quantiles, and thus education effects on the lower part of the income distribution still remain obscure. The confidence sets for the upper quantiles could also be tightened by using the decomposition of the AR statistic.
As an alternative in the literature, a control function approach in a triangular system has also been suggested for endogenous quantile analysis (Imbens and Newey, 2002; Chesher, 2003; Ma and Koenker, 2006; Lee, 2007). The advantage of the GMM approach is that the weak identification problem is well understood, while there is little discussion of this topic in the control function approach. There is, however, an important limitation in the GMM approach: it only considers different quantiles of outcomes and does not explicitly model heterogeneous effects of instruments on an endogenous variable. In contrast, the triangular system directly models the first-stage equation and thus provides a better framework to discuss the idea of local instruments and identification pitfalls: the instruments do not affect a particular quantile of the endogenous variable although they do affect others. Therefore, identification, lack of identification, and inference in the triangular system are worthy of further study, and they are left for future research. Lastly, we point out that the methods proposed in this paper can apply to inference in other nondifferentiable models such as rank regressions.13

Appendix A

A.1. Proof of Lemma 1

Recall that $m_1(\alpha,\beta(\tau,\alpha)) = E(X(1\{Y \le D\alpha + X'\beta(\tau,\alpha)\} - \tau)) = 0$ by definition. Note that we can rewrite this as

$$m_1(\alpha,\beta(\tau,\alpha)) = E\big(X(F_{\varepsilon_\tau}(D(\alpha - \alpha(\tau)) + X'(\beta(\tau,\alpha) - \beta(\tau,\alpha(\tau))) \mid D,Z,X) - \tau)\big) = 0 \tag{14}$$

due to the law of iterated expectations, where $F_{\varepsilon_\tau}(\cdot \mid \cdot)$ is the conditional distribution function of $\varepsilon_\tau$. Under Assumption B, we apply the implicit function theorem to obtain

$$\left.\frac{\partial \beta(\tau,\alpha)}{\partial \alpha}\right|_{\alpha=\alpha(\tau)} = -E(XX'f_{\varepsilon_\tau}(0 \mid D,Z,X))^{-1}E(XDf_{\varepsilon_\tau}(0 \mid D,Z,X)). \tag{15}$$

Now, consider the concentrated moment function

$$m_c(\alpha) = m_2(\alpha,\beta(\tau,\alpha)) = E(Z(1\{Y \le D\alpha + X'\beta(\tau,\alpha)\} - \tau)) = E\big(Z(F_{\varepsilon_\tau}(D(\alpha - \alpha(\tau)) + X'(\beta(\tau,\alpha) - \beta(\tau,\alpha(\tau))) \mid D,Z,X) - \tau)\big) \tag{16}$$

by the definition of $\varepsilon_\tau$ and the law of iterated expectations. By the chain rule, it follows that

$$\left.\frac{\partial m_c}{\partial \alpha}\right|_{\alpha=\alpha(\tau)} = E(ZDf_{\varepsilon_\tau}(0 \mid D,Z,X)) - E(ZX'f_{\varepsilon_\tau}(0 \mid D,Z,X))\,E(XX'f_{\varepsilon_\tau}(0 \mid D,Z,X))^{-1}E(XDf_{\varepsilon_\tau}(0 \mid D,Z,X)) = E((Z - AX)Df_{\varepsilon_\tau}(0 \mid D,Z,X)), \tag{17}$$
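where $A = E(ZX'f_{\varepsilon_\tau}(0 \mid D,Z,X))\,E(XX'f_{\varepsilon_\tau}(0 \mid D,Z,X))^{-1}$. ∎

A.2. Lemmas for the Proof of Lemma 2

Lemma 3. For any sequence $h_n > 0$ with $h_n \to 0$, $nh_n \to \infty$, we have

$$\Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\hat\beta(\tau,\alpha(\tau))) = \Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\beta(\tau,\alpha(\tau))) + \begin{bmatrix} H_1(\alpha(\tau))\sqrt{n}\,(\hat\beta(\tau,\alpha(\tau)) - \beta(\tau,\alpha(\tau))) \\ 0 \end{bmatrix} + o_p(1).$$

Proof. For the sake of notational simplicity, we will write $\beta_\tau$ and $\hat\beta_\tau$ for $\beta(\tau,\alpha(\tau))$ and $\hat\beta(\tau,\alpha(\tau))$, respectively. Note that

$$\begin{aligned}
G_{n,h_n}(\alpha(\tau),\beta) &= G(\alpha(\tau),\beta) + G_{n,h_n}(\alpha(\tau),\beta_\tau) + (G_{n,h_n}(\alpha(\tau),\beta) - G_{h_n}(\alpha(\tau),\beta)) \\
&\quad - (G_{n,h_n}(\alpha(\tau),\beta_\tau) - G_{h_n}(\alpha(\tau),\beta_\tau)) + (G_{h_n}(\alpha(\tau),\beta) - G(\alpha(\tau),\beta)) \\
&= L_{n,h_n}(\alpha(\tau),\beta) + (G_{n,h_n}(\alpha(\tau),\beta) - G_{h_n}(\alpha(\tau),\beta)) - (G_{n,h_n}(\alpha(\tau),\beta_\tau) - G_{h_n}(\alpha(\tau),\beta_\tau)) \\
&\quad + (G_{h_n}(\alpha(\tau),\beta) - G(\alpha(\tau),\beta)) + o(\|\beta - \beta_\tau\|),
\end{aligned} \tag{18}$$

where $L_{n,h_n}(\alpha(\tau),\beta) \equiv H(\beta - \beta_\tau) + G_{n,h_n}(\alpha(\tau),\beta_\tau)$; note that we used the fact that $G_{h_n}(\alpha(\tau),\beta_\tau) = 0$. The intuition for this decomposition is that $G_{n,h_n}(\alpha(\tau),\beta)$ consists of the linear term, the stochastic equicontinuity term, the bias term, and the remainder. Therefore,

$$\begin{aligned}
\|\Upsilon_{n,h_n}(G_{n,h_n}(\alpha(\tau),\beta) - L_{n,h_n}(\alpha(\tau),\beta))\| &\le \|\Upsilon_{n,h_n}(G_{n,h_n}(\alpha(\tau),\beta) - G_{h_n}(\alpha(\tau),\beta)) - \Upsilon_{n,h_n}(G_{n,h_n}(\alpha(\tau),\beta_\tau) - G_{h_n}(\alpha(\tau),\beta_\tau))\| \\
&\quad + \|\Upsilon_{n,h_n}(G_{h_n}(\alpha(\tau),\beta) - G(\alpha(\tau),\beta))\| + o(\sqrt{n}\,\|\beta - \beta_\tau\|).
\end{aligned} \tag{19}$$

Note here that

$$\Upsilon_{n,h_n}(G_{h_n}(\alpha(\tau),\beta) - G(\alpha(\tau),\beta)) = \begin{bmatrix} 0 \\ B_{n,h_n}(\alpha(\tau),\beta) - B_{n,h_n}(\alpha(\tau),\beta_\tau) \end{bmatrix},$$

which implies that for any $\delta_n \downarrow 0$,

$$\begin{aligned}
\sup_{\|\beta - \beta_\tau\| \le \delta_n} \|\Upsilon_{n,h_n}(G_{h_n}(\alpha(\tau),\beta) - G(\alpha(\tau),\beta))\| &= \sup_{\|\beta - \beta_\tau\| \le \delta_n} \|B_{n,h_n}(\alpha(\tau),\beta) - B_{n,h_n}(\alpha(\tau),\beta_\tau)\| \\
&\le \sup_{\|\beta - \beta_\tau\| \le \delta_n} \|B_{n,h_n}(\alpha(\tau),\beta) - B(\alpha(\tau),\beta)\| + \sup_{\|\beta - \beta_\tau\| \le \delta_n} \|B_{n,h_n}(\alpha(\tau),\beta_\tau) - B(\alpha(\tau),\beta_\tau)\| \\
&\quad + \sup_{\|\beta - \beta_\tau\| \le \delta_n} \|B(\alpha(\tau),\beta) - B(\alpha(\tau),\beta_\tau)\| = o(1)
\end{aligned} \tag{20}$$

by Assumption H. Since $\hat\beta_\tau$ is consistent for $\beta_\tau$, we therefore know

$$\|\Upsilon_{n,h_n}(G_{h_n}(\alpha(\tau),\hat\beta_\tau) - G(\alpha(\tau),\hat\beta_\tau))\| = o_p(1) \tag{21}$$

with probability approaching 1. Therefore, combining Eqs. (19), (21), and Assumption I, we obtain

$$\|\Upsilon_{n,h_n}(G_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - L_{n,h_n}(\alpha(\tau),\hat\beta_\tau))\| = o_p(1) \tag{22}$$

with probability approaching 1. Therefore,

$$\begin{aligned}
\Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\hat\beta_\tau) &= \Upsilon_{n,h_n}L_{n,h_n}(\alpha(\tau),\hat\beta_\tau) + o_p(1) \\
&= \Upsilon_{n,h_n}H(\alpha(\tau))(\hat\beta_\tau - \beta_\tau) + \Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\beta_\tau) + o_p(1) \\
&= \Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\beta_\tau) + \begin{bmatrix} H_1(\alpha(\tau))\sqrt{n}\,(\hat\beta_\tau - \beta_\tau) \\ 0 \end{bmatrix} + o_p(1)
\end{aligned} \tag{23}$$

because $\sqrt{h_n}\,H_2(\alpha(\tau))\sqrt{n}\,(\hat\beta_\tau - \beta_\tau) = \sqrt{h_n}\,O_p(1) = o_p(1)$. ∎

13 We thank an anonymous referee for pointing this out.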
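Lemma 4.

$$\mathrm{vec}(\hat A_{n,h_n}) - \mathrm{vec}(A) = \left[\hat\Delta(\alpha(\tau),\hat\beta(\tau,\alpha(\tau)))^{-1} \otimes I \quad -\hat\Delta(\alpha(\tau),\hat\beta(\tau,\alpha(\tau)))^{-1} \otimes A\right] \begin{bmatrix} \mathrm{vec}(\hat\Theta(\alpha(\tau),\hat\beta(\tau,\alpha(\tau)))) - \mathrm{vec}(\Theta(\alpha(\tau),\beta(\tau,\alpha(\tau)))) \\ \mathrm{vec}(\hat\Delta(\alpha(\tau),\hat\beta(\tau,\alpha(\tau)))) - \mathrm{vec}(\Delta(\alpha(\tau),\beta(\tau,\alpha(\tau)))) \end{bmatrix}.$$

Proof. For the sake of simplicity, we will write $f_i$ and $\hat k_i$ for $f_{\varepsilon_\tau}(0 \mid D_i,Z_i,X_i)$ and $k((D_i\alpha(\tau) + X_i'\hat\beta(\tau,\alpha(\tau)) - Y_i)/h_n)$, respectively. Using $A = E(Z_iX_i'f_i)\,E(X_iX_i'f_i)^{-1}$, we first obtain that

$$\hat A_{n,h_n} - A = \left(\frac{1}{nh_n}\sum_{i=1}^n Z_iX_i'\hat k_i - E(Z_iX_i'f_i)\right)\left(\frac{1}{nh_n}\sum_{i=1}^n X_iX_i'\hat k_i\right)^{-1} - A\left(\frac{1}{nh_n}\sum_{i=1}^n X_iX_i'\hat k_i - E(X_iX_i'f_i)\right)\left(\frac{1}{nh_n}\sum_{i=1}^n X_iX_i'\hat k_i\right)^{-1}.$$

Then, take the vec-operator. ∎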
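The last step of the proof of Lemma 4 relies on the standard identity $\mathrm{vec}(AMC) = (C' \otimes A)\,\mathrm{vec}(M)$. The following sketch checks this identity numerically with small integer matrices. All helper names (`matmul`, `transpose`, `vec`, `kron`, `matvec`) and the matrices themselves are illustrative choices, not objects from the paper.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def vec(a: Matrix) -> List[int]:
    """Stack the columns of a matrix into one vector."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of a and b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matvec(m: Matrix, v: List[int]) -> List[int]:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


A = [[1, 2], [3, 4], [5, 6]]   # plays the role of A (3 instruments, 2 regressors)
M = [[1, 0], [2, 1]]           # plays the role of the difference of the Delta matrices
C = [[2, 1], [1, 3]]           # plays the role of the inverse of Delta-hat (symmetric)

lhs = vec(matmul(matmul(A, M), C))
rhs = matvec(kron(transpose(C), A), vec(M))
print(lhs == rhs)
```

Setting $A = I$ in the same identity gives the first block of the lemma, $\mathrm{vec}(MC) = (C' \otimes I)\,\mathrm{vec}(M)$; the transpose on $C$ is immaterial there because the kernel-weighted second moment matrix of $X_i$ is symmetric.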
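Lemma 5. For a sequence $h_n > 0$ with $h_n \to 0$, $nh_n \to \infty$, and $\sqrt{nh_n}\,h_n \to c_1$ (or $\sqrt{nh_n}\,h_n^2 \to c_2$ and $\int v\,k(v)\,dv = 0$), we have

$$\begin{bmatrix} \sqrt{n}\,\hat m_c(\alpha(\tau),\hat\beta(\tau,\alpha(\tau))) \\ \sqrt{nh_n}\,(\hat G_{n,h_n}(\alpha(\tau),\hat\beta(\tau,\alpha(\tau))) - G(\alpha(\tau),\beta(\tau,\alpha(\tau)))) \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ B(\alpha(\tau),\beta(\tau,\alpha(\tau))) \end{bmatrix}, \begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}\right)$$

for some $V_{22}(\alpha(\tau))$.

Proof. For the sake of simplicity, we will write $\beta_\tau$ and $\hat\beta_\tau$ for $\beta(\tau,\alpha(\tau))$ and $\hat\beta(\tau,\alpha(\tau))$. First, write

$$\Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\beta_\tau) + \begin{bmatrix} H_1(\alpha(\tau))\,\dfrac{1}{\sqrt{n}}\sum_{i=1}^n \Xi_i(\alpha(\tau)) \\ 0 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\sqrt{n}}\sum_{i=1}^n T_i(\alpha(\tau)) \\ \dfrac{1}{\sqrt{nh_n}}\sum_{i=1}^n S_{i,h_n}(\alpha(\tau)) \end{bmatrix},$$

where $T_i(\alpha(\tau)) = Z_i(1\{\varepsilon_{\tau i} \le 0\} - \tau) + H_1(\alpha(\tau))\,\Xi_i(\alpha(\tau))$ and $S_{i,h_n}(\alpha(\tau)) = \omega_i k(\varepsilon_{\tau i}/h_n) - E(\omega_i k(\varepsilon_{\tau i}/h_n))$ with $\omega_i = [\Pi_i'D_i \;\; \mathrm{vec}(Z_iX_i')' \;\; \mathrm{vec}(X_iX_i')']'$. Since the data are iid, it is easy to see that

$$V_{h_n}(\alpha(\tau)) \equiv \mathrm{Var}\left(\begin{bmatrix} \dfrac{1}{\sqrt{n}}\sum_{i=1}^n T_i(\alpha(\tau)) \\ \dfrac{1}{\sqrt{nh_n}}\sum_{i=1}^n S_{i,h_n}(\alpha(\tau)) \end{bmatrix}\right) = \begin{bmatrix} E(T_i(\alpha(\tau))T_i(\alpha(\tau))') - m_c(\alpha(\tau))m_c(\alpha(\tau))' & \dfrac{1}{\sqrt{h_n}}E(T_i(\alpha(\tau))S_{i,h_n}(\alpha(\tau))') \\ \dfrac{1}{\sqrt{h_n}}E(S_{i,h_n}(\alpha(\tau))T_i(\alpha(\tau))') & \dfrac{1}{h_n}E(S_{i,h_n}(\alpha(\tau))S_{i,h_n}(\alpha(\tau))') \end{bmatrix}.$$

Although $m_c(\alpha(\tau)) = 0$, we subtracted its outer product to emphasize that it is a continuous-updating variance which depends on the null.14 Note here that

$$\begin{aligned}
\frac{1}{h_n}E(S_{i,h_n}(\alpha(\tau))S_{i,h_n}(\alpha(\tau))') &= E\left(\omega_i\omega_i'\int \frac{1}{h_n}k\left(\frac{s}{h_n}\right)^2 f_{\varepsilon_\tau}(s \mid D_i,Z_i,X_i)\,ds\right) \\
&\quad - h_n\,E\left(\omega_i\int \frac{1}{h_n}k\left(\frac{s}{h_n}\right)f_{\varepsilon_\tau}(s \mid D_i,Z_i,X_i)\,ds\right)E\left(\omega_i\int \frac{1}{h_n}k\left(\frac{s}{h_n}\right)f_{\varepsilon_\tau}(s \mid D_i,Z_i,X_i)\,ds\right)' \\
&\to E\left(\omega_i\omega_i'f_{\varepsilon_\tau}(0 \mid D_i,Z_i,X_i)\right)\int k(s)^2\,ds \equiv V_{22}(\alpha(\tau)).
\end{aligned} \tag{24}$$

By the same change-of-variable technique, we also have

$$\begin{aligned}
\frac{1}{\sqrt{h_n}}E(T_i(\alpha(\tau))S_{i,h_n}(\alpha(\tau))') &= \sqrt{h_n}\,E\left(T_i(\alpha(\tau))\,\omega_i'\int \frac{1}{h_n}k\left(\frac{s}{h_n}\right)f(s \mid D_i,Z_i,X_i,T_i(\alpha(\tau)))\,ds\right) \\
&\quad - \sqrt{h_n}\,E(T_i(\alpha(\tau)))\,E\left(\omega_i'\int \frac{1}{h_n}k\left(\frac{s}{h_n}\right)f_{\varepsilon_\tau}(s \mid D_i,Z_i,X_i)\,ds\right) \to 0,
\end{aligned} \tag{25}$$

where $f(s \mid D_i,Z_i,X_i,T_i(\alpha(\tau)))$ is the conditional density of $\varepsilon_\tau$. Therefore, we now have

$$V_{h_n}(\alpha(\tau)) \to \begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}, \tag{26}$$

where $V(\alpha(\tau)) = E(T_i(\alpha(\tau))T_i(\alpha(\tau))') - m_c(\alpha(\tau))m_c(\alpha(\tau))'$. Combining Lemma 3 and (26), we obtain

$$\begin{aligned}
\begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}^{-1/2}\Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\hat\beta_\tau) &= \begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}^{-1/2}\left(\Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\beta_\tau) + \begin{bmatrix} H_1\,\dfrac{1}{\sqrt{n}}\sum_{i=1}^n \Xi_i(\alpha(\tau)) \\ 0 \end{bmatrix}\right) + o_p(1) \\
&= V_{h_n}^{-1/2}\left(\Upsilon_{n,h_n}G_{n,h_n}(\alpha(\tau),\beta_\tau) + \begin{bmatrix} H_1\,\dfrac{1}{\sqrt{n}}\sum_{i=1}^n \Xi_i(\alpha(\tau)) \\ 0 \end{bmatrix}\right) + o_p(1) \xrightarrow{d} N(0, I).
\end{aligned} \tag{27}$$

Now, recall that $\sqrt{nh_n}\,(E(\hat G(\alpha(\tau),\beta_\tau)) - G(\alpha(\tau),\beta_\tau)) \to B(\alpha(\tau),\beta_\tau)$ by Assumption H, and combine it with (27). ∎

14 Note that $\Xi_i(\alpha)$ and $S_{i,h_n}(\alpha)$ have mean zero for every $\alpha \in \mathbb{R}$ by definition.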
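It may help to check how the bandwidth $h_n = n^{-1/5}$ used in the experiments and in the empirical example relates to the rate conditions of Lemma 5. The sketch below evaluates the relevant products for a few sample sizes. It assumes that the second rate condition of Lemma 5 reads $\sqrt{nh_n}\,h_n^2 \to c_2$ (together with a kernel satisfying $\int v\,k(v)\,dv = 0$, which holds for the standard normal pdf); the function names `bandwidth` and `rate_terms` are hypothetical.

```python
import math
from typing import Dict


def bandwidth(n: int) -> float:
    """Bandwidth h_n = n^(-1/5), as used in the simulations and the application."""
    return n ** (-1.0 / 5.0)


def rate_terms(n: int) -> Dict[str, float]:
    """Evaluate the products that appear in the bandwidth conditions of Lemma 5."""
    h = bandwidth(n)
    root_nh = math.sqrt(n * h)
    return {
        "h_n": h,
        "n*h_n": n * h,                       # should diverge
        "sqrt(n*h_n)*h_n": root_nh * h,       # first condition: should tend to c_1
        "sqrt(n*h_n)*h_n^2": root_nh * h * h, # assumed second condition: should tend to c_2
    }


for n in (500, 329_509, 10**6):
    print(n, {name: round(value, 4) for name, value in rate_terms(n).items()})
```

With this bandwidth, $nh_n = n^{4/5}$ diverges and $\sqrt{nh_n}\,h_n^2 = 1$ for every $n$, while $\sqrt{nh_n}\,h_n = n^{1/5}$ also diverges, so under the reading assumed here it is the second condition, with a symmetric kernel, that the choice $h_n = n^{-1/5}$ satisfies.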
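A.3. Proof of Lemma 2

$\beta_\tau$ and $\hat\beta_\tau$ will continue to denote $\beta(\tau,\alpha(\tau))$ and $\hat\beta(\tau,\alpha(\tau))$ for simplicity. From Lemma 5, recall that

$$\begin{bmatrix} \sqrt{n}\,\hat m_c(\alpha(\tau),\hat\beta_\tau) \\ \sqrt{nh_n}\,(\hat\Gamma_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - \Gamma(\alpha(\tau),\beta_\tau)) \\ \sqrt{nh_n}\,(\mathrm{vec}(\hat\Theta_{n,h_n}(\alpha(\tau),\hat\beta_\tau)) - \mathrm{vec}(\Theta(\alpha(\tau),\beta_\tau))) \\ \sqrt{nh_n}\,(\mathrm{vec}(\hat\Delta_{n,h_n}(\alpha(\tau),\hat\beta_\tau)) - \mathrm{vec}(\Delta(\alpha(\tau),\beta_\tau))) \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ B_\Gamma \\ B_\Theta \\ B_\Delta \end{bmatrix}, \begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}\right),$$

where $B_\Gamma$, $B_\Theta$, $B_\Delta$ are the corresponding components of $B(\alpha(\tau),\beta_\tau)$. By Lemma 4, we now obtain

$$\begin{bmatrix} \sqrt{n}\,\hat m_c(\alpha(\tau),\hat\beta_\tau) \\ \sqrt{nh_n}\,(\hat\Gamma_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - \Gamma(\alpha(\tau),\beta_\tau)) \\ \sqrt{nh_n}\,(\mathrm{vec}(\hat A_{n,h_n}) - \mathrm{vec}(A)) \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ B_\Gamma \\ K_1B_\Theta + K_2B_\Delta \end{bmatrix}, \begin{bmatrix} V(\alpha(\tau)) & 0 & 0 \\ 0 & V_{22}(\alpha(\tau)) & V_{23}(\alpha(\tau)) \\ 0 & V_{32}(\alpha(\tau)) & V_{33}(\alpha(\tau)) \end{bmatrix}\right), \tag{28}$$

where $[K_1 \;\; K_2] = \operatorname{plim}\,[\hat\Delta(\alpha(\tau),\hat\beta_\tau)^{-1} \otimes I \quad -\hat\Delta(\alpha(\tau),\hat\beta_\tau)^{-1} \otimes A]$, and the $V_{\cdot\cdot}(\alpha(\tau))$ are defined from $K_1$, $K_2$, and $V_{22}(\alpha(\tau))$. Note that

$$\begin{aligned}
\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau) &= \hat\Gamma_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - (\hat A_{n,h_n} - A)\,\frac{1}{nh_n}\sum_{i=1}^n X_iD_i\hat k_i \\
&= \hat\Gamma_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - \left(\left(\frac{1}{nh_n}\sum_{i=1}^n X_iD_i\hat k_i\right)' \otimes I\right)\left(\mathrm{vec}(\hat A_{n,h_n}) - \mathrm{vec}(A)\right),
\end{aligned} \tag{29}$$

where $\hat k_i = k((D_i\alpha(\tau) + X_i'\hat\beta_\tau - Y_i)/h_n)$. Therefore, letting $K_3 = \operatorname{plim}\,(1/nh_n)\sum_{i=1}^n X_iD_i\hat k_i$, we know that

$$\sqrt{nh_n}\,(\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - \Gamma(\alpha(\tau),\beta_\tau)) = \sqrt{nh_n}\,(\hat\Gamma_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - \Gamma(\alpha(\tau),\beta_\tau)) - (K_3' \otimes I)\sqrt{nh_n}\,(\mathrm{vec}(\hat A_{n,h_n}) - \mathrm{vec}(A)) + o_p(1). \tag{30}$$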
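Combining (28) and (30) proves the lemma. ∎

A.4. Proof of Theorem 1

First, recall from Lemma 2 that

$$\begin{bmatrix} \sqrt{n}\,\hat m_c(\alpha(\tau),\hat\beta_\tau) \\ \sqrt{nh_n}\,(\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau) - \Gamma(\alpha(\tau),\beta_\tau)) \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \Psi_1 \\ \Psi_2 \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ B(\alpha(\tau)) \end{bmatrix}, \begin{bmatrix} V(\alpha(\tau)) & 0 \\ 0 & V_{22}(\alpha(\tau)) \end{bmatrix}\right). \tag{31}$$

Note here that $\Psi_1$ and $\Psi_2$ are independent. Define

$$S_n \equiv (\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau))'\,V(\alpha(\tau))^{-1}(\sqrt{n}\,\hat m_c(\alpha(\tau),\hat\beta_\tau)).$$

We will consider three cases separately.

Case 1: $\Gamma(\alpha(\tau),\beta_\tau) = 0$ or $\Gamma(\alpha(\tau),\beta_\tau) = C_1/\sqrt{n}$. Note that $\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau) \xrightarrow{d} \Psi_2$. It then follows that $S_n \xrightarrow{d} \Psi_2'V(\alpha(\tau))^{-1}\Psi_1$, which is $N(0, \Psi_2'V(\alpha(\tau))^{-1}\Psi_2)$ conditional on $\Psi_2$. Therefore, it follows that

$$S_n'(\Psi_2'V(\alpha(\tau))^{-1}\Psi_2)^{-1}S_n \xrightarrow{d} \chi^2(1) \quad \text{conditional on } \Psi_2. \tag{32}$$

Since the asymptotic distribution does not depend on $\Psi_2$, it is in fact unconditional. Replacing $\Psi_2$ with $\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau)$ yields

$$K_n(\alpha(\tau)) \equiv S_n'\left((\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau))'\,V(\alpha(\tau))^{-1}(\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau))\right)^{-1}S_n \xrightarrow{d} \chi^2(1). \tag{33}$$

Replacing $V(\alpha(\tau))$ with $\hat V(\alpha(\tau))$ completes the proof.

Case 2: $\Gamma(\alpha(\tau),\beta_\tau) = C_2/\sqrt{nh_n}$. Note that $\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau) \xrightarrow{d} \Psi_2 + C_2$. It then follows that $S_n \xrightarrow{d} (\Psi_2 + C_2)'V(\alpha(\tau))^{-1}\Psi_1$, which is $N(0, (\Psi_2 + C_2)'V(\alpha(\tau))^{-1}(\Psi_2 + C_2))$ conditional on $\Psi_2$. Therefore, it follows that

$$S_n'\left((\Psi_2 + C_2)'V(\alpha(\tau))^{-1}(\Psi_2 + C_2)\right)^{-1}S_n \xrightarrow{d} \chi^2(1) \quad \text{conditional on } \Psi_2. \tag{34}$$

Since the asymptotic distribution does not depend on $\Psi_2$, it is in fact unconditional. Replacing $\Psi_2 + C_2$ with $\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau)$ yields

$$K_n(\alpha(\tau)) \equiv S_n'\left((\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau))'\,V(\alpha(\tau))^{-1}(\sqrt{nh_n}\,\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau))\right)^{-1}S_n \xrightarrow{d} \chi^2(1). \tag{35}$$

Replacing $V(\alpha(\tau))$ with $\hat V(\alpha(\tau))$ completes the proof.

Case 3: $\Gamma(\alpha(\tau),\beta_\tau)$ has full rank. In this case, we have $\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau) \xrightarrow{p} \Gamma(\alpha(\tau),\beta_\tau)$. Therefore, we have

$$\frac{1}{\sqrt{nh_n}}S_n \xrightarrow{d} \Gamma(\alpha(\tau),\beta_\tau)'V(\alpha(\tau))^{-1}\Psi_1 \sim N(0, \Gamma(\alpha(\tau),\beta_\tau)'V(\alpha(\tau))^{-1}\Gamma(\alpha(\tau),\beta_\tau)).$$

It then follows that

$$\left(\frac{1}{\sqrt{nh_n}}S_n\right)'\left(\Gamma(\alpha(\tau),\beta_\tau)'V(\alpha(\tau))^{-1}\Gamma(\alpha(\tau),\beta_\tau)\right)^{-1}\left(\frac{1}{\sqrt{nh_n}}S_n\right) \xrightarrow{d} \chi^2(1). \tag{36}$$

Replacing $\Gamma(\alpha(\tau),\beta_\tau)$ and $V(\alpha(\tau))$ with $\hat{\hat\Gamma}_{n,h_n}(\alpha(\tau),\hat\beta_\tau)$ and $\hat V(\alpha(\tau))$ completes the proof. ∎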
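For a scalar $\alpha$, the statistic in the proof above simplifies because the bandwidth factors cancel: $K_n = n\,(\hat\Gamma'\hat V^{-1}\hat m_c)^2 / (\hat\Gamma'\hat V^{-1}\hat\Gamma)$. The sketch below computes this quantity from given inputs. It additionally assumes, as an illustration only, that the AR statistic takes the quadratic form $n\,\hat m_c'\hat V^{-1}\hat m_c$ and that the J component is the remainder AR minus K. The function names `solve`, `dot`, and `ar_k_j` are hypothetical, and the numerical inputs are made up.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def dot(u: List[float], v: List[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def ar_k_j(m_hat: List[float], gamma_hat: List[float],
           v_hat: List[List[float]], n: int) -> Tuple[float, float, float]:
    """Return (AR, K, J) for a scalar parameter.

    AR = n * m' V^{-1} m and J = AR - K are assumed forms used for illustration.
    K = n * (Gamma' V^{-1} m)^2 / (Gamma' V^{-1} Gamma).
    """
    v_inv_m = solve(v_hat, m_hat)
    v_inv_gamma = solve(v_hat, gamma_hat)
    ar = n * dot(m_hat, v_inv_m)
    k = n * dot(gamma_hat, v_inv_m) ** 2 / dot(gamma_hat, v_inv_gamma)
    return ar, k, ar - k


identity = [[1.0, 0.0], [0.0, 1.0]]
print(ar_k_j([0.1, 0.2], [1.0, 0.0], identity, n=100))
```

Because K depends on the Jacobian estimate only through its direction, rescaling `gamma_hat` leaves K unchanged, which is the scalar-parameter counterpart of the bandwidth factors cancelling in Eqs. (33), (35), and (36). By the Cauchy–Schwarz inequality K never exceeds AR under these assumed forms, so the remainder J is nonnegative.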
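A.5. First step quantile regression and $V(\alpha(\tau))$

For the sake of simplicity, we continue to write $\beta_\tau$ and $\hat\beta_\tau$ for $\beta(\tau,\alpha(\tau))$ and $\hat\beta(\tau,\alpha(\tau))$. Note first that $H_1(\alpha(\tau))$ is given by $\partial m_c(\alpha(\tau),\beta)/\partial\beta'|_{\beta=\beta_\tau} = E(Z_iX_i'f_{\varepsilon_\tau}(0 \mid D_i,Z_i,X_i))$. Suppose that $\hat\beta_\tau$ is obtained by standard quantile regression of $Y_i - D_i\alpha(\tau)$ on $X_i$. It then follows that

$$\Xi_i(\alpha(\tau)) = -E(X_iX_i'f_{\varepsilon_\tau}(0 \mid X_i))^{-1}X_i(1\{Y_i - D_i\alpha(\tau) - X_i'\beta_\tau \le 0\} - \tau).$$

In this case, $V(\alpha(\tau))$ is therefore given by

$$V(\alpha(\tau)) = \mathrm{Var}\left((Z_i - H_1(\alpha(\tau))H_x(\alpha(\tau))^{-1}X_i)(1\{Y_i - D_i\alpha(\tau) - X_i'\beta_\tau \le 0\} - \tau)\right), \tag{37}$$

where $H_x(\alpha(\tau)) = E(X_iX_i'f_{\varepsilon_\tau}(0 \mid X_i))$. Note here that $H_1(\alpha(\tau))$ and $H_x(\alpha(\tau))$ can be estimated by

$$\hat H_1(\alpha(\tau)) = \frac{1}{nh_n}\sum_{i=1}^n Z_iX_i'\,k\left(\frac{D_i\alpha(\tau) + X_i'\hat\beta_\tau - Y_i}{h_n}\right), \qquad \hat H_x(\alpha(\tau)) = \frac{1}{nh_n}\sum_{i=1}^n X_iX_i'\,k\left(\frac{D_i\alpha(\tau) + X_i'\hat\beta_\tau - Y_i}{h_n}\right), \tag{38}$$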
respectively. Therefore, under mild conditions such as those in Powell (1991), $V(\alpha(\tau))$ can be estimated by the sample variance of Eq. (37) by using $\hat H_1(\alpha(\tau))$, $\hat H_x(\alpha(\tau))$, and $\hat\beta_\tau$.

References

Abadie, A., Angrist, J., Imbens, G., 2002. Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70, 91–117.
Amemiya, T., 1982. Two stage least absolute deviations estimators. Econometrica 50, 689–711.
Anderson, T.W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20, 46–63.
Andrews, D.W.K., 1994. Empirical process methods in econometrics. In: Handbook of Econometrics, vol. 4, pp. 2248–2294 (Chapter 37).
Angrist, J.D., Krueger, A.B., 1991. Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics 106, 979–1014.
Angrist, J.D., Imbens, G.W., Krueger, A.B., 1999. Jackknife instrumental variables estimation. Journal of Applied Econometrics 14, 57–67.
Buchinsky, M., 1994. Changes in the U.S. wage structure 1963–1987: application of quantile regression. Econometrica 62, 405–458.
Buchinsky, M., 1998. Recent advances in quantile regression models: a practical guideline for empirical research. Journal of Human Resources 33, 88–126.
Chamberlain, G., 1987. Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34, 305–334.
Chen, L.-A., Portnoy, S., 1996. Two-stage regression quantiles and two-stage trimmed least squares estimators for structural equation models. Communications in Statistics: Theory and Methods 25, 1005–1032.
Chernozhukov, V., Hansen, C., 2005. An IV model of quantile treatment effects. Econometrica 73, 245–261.
Chernozhukov, V., Hansen, C., 2006. Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics 132, 491–525.
Chernozhukov, V., Hansen, C., 2008. Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics 142, 379–398.
Chernozhukov, V., Hansen, C., Jansson, M., 2005. Finite sample inference for quantile regression models. MIT Working Paper.
Chesher, A., 2003. Identification in nonseparable models. Econometrica 71, 1405–1441.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65, 1365–1388.
Hahn, J., Hausman, J., 2002. A new specification test for the validity of instrumental variables. Econometrica 70, 163–189.
Honore, B.E., Hu, L., 2004. On the performance of some robust instrumental variables estimators. Journal of Business and Economic Statistics 22, 30–39.
Horowitz, J.L., 1992. A smoothed maximum score estimator for the binary response model. Econometrica 60, 505–531.
Imbens, G.W., Newey, W.K., 2002. Identification and estimation of triangular simultaneous equation models without additivity. NBER Working Paper.
Kleibergen, F., 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70, 1781–1803.
Kleibergen, F., 2004. Pivotal statistics for testing subsets of structural parameters in the IV regression model. Review of Economics and Statistics 86, 418–423.
Kleibergen, F., 2005. Testing parameters in GMM without assuming that they are identified. Econometrica 73, 1103–1123.
Koenker, R., Bassett Jr., G., 1978. Regression quantiles. Econometrica 46, 33–50.
Lee, S., 2007. Endogeneity in quantile regression models: A control function approach. Journal of Econometrics 141, 1131–1158.
Ma, L., Koenker, R., 2006. Quantile regression methods for recursive structural equation models. Journal of Econometrics 134, 471–506.
MaCurdy, T., Timmins, C., 2001. Bounding the influence of attrition on intertemporal wage variation in the NLSY. Duke University Working Paper.
Moreira, M.J., 2003. A conditional likelihood ratio test for structural models. Econometrica 71, 1027–1048.
Newey, W.K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Handbook of Econometrics, vol. 4, pp. 2113–2148 (Chapter 36).
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57, 1027–1057.
Powell, J.L., 1991. Estimation of monotonic regression models under quantile restrictions. In: Nonparametric and Semiparametric Methods in Econometrics and Statistics. Cambridge University Press, Cambridge.
Staiger, D., Stock, J.H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557–586.
Stock, J.H., Wright, J.H., 2000. GMM with weak identification. Econometrica 68, 1055–1096.
Stock, J.H., Wright, J.H., Yogo, M., 2002. A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics 20, 518–529.
van der Vaart, A.W., Wellner, J.A., 1996. Weak Convergence and Empirical Processes. Springer, New York.