Weak convergence of local quantile treatment effect processes


Economics Letters 162 (2018) 49–52


Economics Letters journal homepage: www.elsevier.com/locate/ecolet

Ju Hyun Kim (a), Byoung G. Park (b, *)

(a) Department of Economics, University of North Carolina at Chapel Hill, USA
(b) Department of Economics, University at Albany, 1400 Washington Avenue, Albany, NY 12222, USA

Highlights

• The weak convergence of the local quantile treatment effect (LQTE) estimator is established.
• The empirical bootstrap is proposed to consistently estimate the limiting distribution of the LQTE process.
• Examples are given to illustrate the use of the limiting distribution of the LQTE process.

Article info

Article history: Received 11 August 2017; received in revised form 20 October 2017; accepted 23 October 2017; available online 4 November 2017.

Abstract

This paper considers a quantile regression process in the instrumental variable model of Abadie et al. (2002). We extend the pointwise analysis of local quantile treatment effects (LQTE) to the quantile process by establishing its weak convergence. We discuss the usefulness of our result in the context of hypothesis testing for the LQTE process. © 2017 Elsevier B.V. All rights reserved.

JEL classification: C13, C14, C15, C35, C36

Keywords: Treatment effects; Quantile regression; Endogeneity; Semiparametric model; Weak convergence; Bootstrap

* Corresponding author. E-mail addresses: [email protected] (J.H. Kim), [email protected] (B.G. Park). https://doi.org/10.1016/j.econlet.2017.10.021 0165-1765/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Since the seminal work of Imbens and Angrist (1994), the local treatment effects framework has drawn a considerable amount of attention in economics and econometrics. This approach has opened the way to allow for self-selection and unobserved heterogeneity in inference on causal impacts. Abadie et al. (2002) developed an easy-to-compute method of estimating the local quantile treatment effects (LQTE) in semiparametric models. They focus on pointwise inference at a given quantile level, and it has remained unanswered in the literature how to conduct inference on the quantile process over multiple quantile levels in the same setup. In this paper, we extend the pointwise analysis to the quantile process by establishing its weak convergence, and we propose inference methods for the process.

Information on quantile processes provides key answers to interesting questions about the distributional impacts of a policy. For example, researchers may be interested in whether people at all quantile levels benefit from the policy. This can be examined by a stochastic dominance test between the marginal distributions of the potential outcomes. They may also ask how the benefits of the policy are distributed across quantile levels, which can be tested by jointly comparing the impacts at different quantile levels.

There is a large literature on uniform inference for quantile regression processes, starting with pioneering papers including Gutenbrunner and Jurečková (1992) and Koenker and Xiao (2002). For quantile treatment effects, Firpo (2007), Firpo and Pinto (2016) and Ferreira et al. (2017) develop and apply estimators of unconditional distributional effects under the selection on observables assumption. Chernozhukov and Hansen (2004, 2007) consider uniform inference on conditional quantile treatment effects under selection on unobservables. While our work is closely related to this literature, it is the first paper to develop uniform inference theory for the LQTE process in the model of Abadie et al. (2002).

The remaining part of the paper is organized as follows. In Section 2, we discuss the LQTE setup. Section 3 proposes estimation and inference methods by establishing weak convergence of local quantile treatment effect processes. Section 4 reports Monte Carlo simulation results to investigate the finite sample performance of our proposed method for hypothesis testing.

2. Setup of the local quantile treatment effect model

We consider the potential outcome framework with a binary treatment and a binary instrument. Let D ∈ {0, 1} be the indicator for the treatment intake, and Yd ∈ R be the potential outcome associated with the treatment state d ∈ {0, 1}. The observed outcome Y can be written as Y = DY1 + (1 − D)Y0. Suppose that we observe a binary instrument Z ∈ {0, 1}. The instrument affects the choice of treatment D. Let Dz ∈ {0, 1} denote the potential treatment state when Z = z with z ∈ {0, 1}. X is a vector of observed characteristics. In sum, the underlying model consists of the variables (Y1, Y0, D1, D0, X, Z), and the econometrician only observes a random sample {(Yi, Di, Xi, Zi)}, i = 1, …, n. We impose the same assumptions as those in Abadie et al. (2002).

Assumption 1. The following assumptions hold conditional on X almost surely. (i) Independence: (Y1, Y0, D1, D0) is jointly independent of Z. (ii) Nontrivial Assignment: Pr(Z = 1|X) ∈ (0, 1). (iii) First-Stage: E[D1|X] ≠ E[D0|X]. (iv) Monotonicity: Pr(D1 ≥ D0|X) = 1.

Assumption 1 is a restatement of the assumptions in Abadie et al. (2002) for our model. Assumption 1(i) is a standard independence assumption in the heterogeneous treatment effect model. Assumption 1(ii) is unlikely to be controversial for a discrete instrument. Assumption 1(iii) is the relevance condition for the instrument. Assumption 1(iv) is the monotonicity assumption, which is also known as the "no defier" assumption. It is the key identifying assumption. The individuals with D1 > D0 are referred to as the compliers. Let QYd(τ|X, D1 > D0) denote the conditional τth quantile of Yd given X and D1 > D0 for d ∈ {0, 1} and τ ∈ (0, 1). It is well known that the conditional marginal distributions of the potential outcomes Y0 and Y1 given X are identified for compliers, which the following lemma in Abadie et al. (2002) formally states.

Lemma 1. Under Assumption 1, QYd(τ|X, D1 > D0) is identified for d ∈ {0, 1}, τ ∈ (0, 1), and almost any X.

Here we consider a known functional form as a semiparametric restriction, which is widely imposed in practice.

Assumption 2. Let T be a subinterval of (0, 1). For any τ ∈ T, there exists θ0(τ) ∈ Θ such that QY(τ|D, X, D1 > D0) = h(D, X, θ0(τ), τ), where h : {0, 1} × support(X) × Θ × T → R is a known function up to θ0.

One particular case of the functional form of h is a linear conditional quantile model

$$h(D, X, \theta_0(\tau), \tau) = \alpha_0(\tau) \cdot D + X^\top \beta_0(\tau). \tag{1}$$

Here, $\theta_0(\tau) = (\alpha_0(\tau), \beta_0(\tau)^\top)^\top$, and α0(τ) is the parameter representing the causal effect of D on the outcome in the model. Some interesting questions in policy evaluation involve hypothesis testing based on the quantile process α0(·) over the entire range of quantiles.

Example 1. A policy maker might be interested in testing whether the treatment is beneficial to everyone. This test involves the following hypotheses:

H0: α0(τ) ≥ 0 for all τ ∈ T,
H1: α0(τ) < 0 for some τ ∈ T.

Example 2. The second example is hypothesis testing for a constant quantile treatment effect. A constant effect across quantiles means that the treatment affects only the location of the outcome, but not any other moments, in which case α0(τ) is constant across all τ ∈ T. The hypotheses are then formulated as follows:

H0: α0(τ) is constant for all τ ∈ T,
H1: α0(τ) varies with τ ∈ T.

3. Estimation and inference

According to Theorem 3.3 in Koenker and Bassett (1978), the true parameter value θ0 satisfies

$$E[\psi(Y, D, X, \theta_0, \tau) \mid D_1 > D_0] = 0,$$

where ψ is the gradient of the check function given by

$$\psi(Y, D, X, \theta, \tau) = \frac{\partial h(D, X, \theta(\tau), \tau)}{\partial \theta(\tau)} \cdot \left(\tau - 1\{Y \le h(D, X, \theta(\tau), \tau)\}\right).$$
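As an illustration, the moment function ψ for the linear specification in Eq. (1) can be sketched in Python. This is a minimal sketch rather than the authors' code: the names `h_linear`, `psi_linear` and `check_loss` are hypothetical, the numerical values are made up, and the check function convention ρ_τ(u) = u · (τ − 1{u < 0}) is an assumption. Under that convention, ψ equals minus the derivative of ρ_τ(Y − h) with respect to θ, which the last lines check by a finite difference.

```python
def h_linear(d, x, alpha, beta):
    """Linear conditional quantile model of Eq. (1): alpha * D + X'beta."""
    return alpha * d + sum(xj * bj for xj, bj in zip(x, beta))


def psi_linear(y, d, x, alpha, beta, tau):
    """psi = dh/dtheta * (tau - 1{Y <= h}); in the linear model dh/dtheta = (D, X)."""
    indicator = 1.0 if y <= h_linear(d, x, alpha, beta) else 0.0
    return [g * (tau - indicator) for g in [d] + list(x)]


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0}) (assumed convention)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


# Hypothetical observation and parameter values, for illustration only.
y, d, x, alpha, beta, tau = 3.0, 1, [2.0], 1.0, [0.5], 0.25
psi = psi_linear(y, d, x, alpha, beta, tau)

# Central finite difference of the check loss with respect to alpha.
eps = 1e-6
fd_alpha = (check_loss(y - h_linear(d, x, alpha + eps, beta), tau)
            - check_loss(y - h_linear(d, x, alpha - eps, beta), tau)) / (2 * eps)
# fd_alpha is approximately -psi[0]: psi is the negative gradient of the loss.
```

The first component of `psi` corresponds to α and the remaining components to β, matching the ordering of θ in the text.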

Note that this problem cannot be solved directly because the group of compliers is not identified. To convert this into a problem involving observed quantities only, we use a weighting function proposed in Abadie (2003). Let

$$\kappa_0^*(D, X, Z) = 1 - \frac{D \cdot (1 - Z)}{1 - \pi_0(X)} - \frac{(1 - D) \cdot Z}{\pi_0(X)},$$

where π0(X) = Pr(Z = 1|X). The following lemma is given in Abadie (2003).

Lemma 2. Let g(Y, D, X) be any real function of (Y, D, X). Suppose that Assumption 1 holds and that E|g(Y, D, X)| < ∞. Then, we have

$$E[g(Y, D, X) \mid D_1 > D_0] = \frac{1}{\Pr(D_1 > D_0)} \cdot E\left[\kappa_0^*(D, X, Z) \cdot g(Y, D, X)\right].$$

Now we can rewrite the parameter θ0 as a solution to the following problem:

$$E\left[\kappa_0^*(D, X, Z) \cdot \psi(Y, D, X, \theta_0, \tau)\right] = 0 \tag{2}$$

for any τ ∈ T. However, the numerical algorithm does not ensure the global optimum because the weighting function κ0∗ turns negative when D ≠ Z, which poses a nonconvex optimization problem. To address this issue, Abadie et al. (2002) proposed a modified version of the weighting function:

$$\kappa_0(U) = E[\kappa_0^*(D, X, Z) \mid U] = 1 - \frac{D \cdot (1 - \nu_0(U))}{1 - \pi_0(X)} - \frac{(1 - D) \cdot \nu_0(U)}{\pi_0(X)},$$

for ν0(U) = Pr(Z = 1|U) and U = (Y, D, X). The modified κ0 is always nonnegative. By the law of iterated expectations, Eq. (2) can be rewritten as

$$E[\kappa_0(U) \cdot \psi(U, \theta_0, \tau)] = 0. \tag{3}$$
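The two weighting functions can be compared numerically. The sketch below is illustrative only: the function names and the values of π and ν are hypothetical, and nonnegativity of the modified weight is a property of the true ν0 and π0, not of arbitrary inputs. It shows that κ0∗ is negative exactly when D ≠ Z, and that averaging κ0∗ over Z given U reproduces the modified weight, which is the iterated-expectations step behind Eq. (3).

```python
def kappa_star(d, z, pi):
    """Weight of Abadie (2003): 1 - D(1 - Z)/(1 - pi) - (1 - D)Z/pi."""
    return 1.0 - d * (1 - z) / (1.0 - pi) - (1 - d) * z / pi


def kappa_modified(d, nu, pi):
    """Modified weight: 1 - D(1 - nu)/(1 - pi) - (1 - D)nu/pi, with nu = Pr(Z = 1 | U)."""
    return 1.0 - d * (1.0 - nu) / (1.0 - pi) - (1 - d) * nu / pi


pi = 0.5  # hypothetical value of Pr(Z = 1 | X)

# kappa_star on the four (D, Z) cells: equal to 1 when D == Z, negative otherwise.
cells = {(d, z): kappa_star(d, z, pi) for d in (0, 1) for z in (0, 1)}

# Hypothetical values of nu = Pr(Z = 1 | Y, D, X), one for each treatment status.
nu_by_d = {0: 0.3, 1: 0.8}
comparison = {}
for d, nu in nu_by_d.items():
    # E[kappa_star | U]: average over Z with Pr(Z = 1 | U) = nu.
    averaged = nu * kappa_star(d, 1, pi) + (1.0 - nu) * kappa_star(d, 0, pi)
    comparison[d] = (averaged, kappa_modified(d, nu, pi))
```

Each pair in `comparison` agrees up to floating-point error, while `cells` contains negative entries for the two cells with D ≠ Z.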

The estimator θ̂ for θ0 is obtained by numerically solving the sample counterpart of Eq. (3), which is a linear programming problem with a globally convex objective function. In the first stage, we estimate ν0 and π0 by nonparametric power series estimators. Let J be the number of power series terms used in the estimation and assume that J increases as the sample size n increases. We assume that the same number of basis functions is used for both ν0 and π0 for brevity of proof. Let π̂(X) and ν̂(U) be the fitted values for π0(X) and ν0(U), respectively. Using ν̂ and π̂, the weighting function is computed as

$$\hat{\kappa}(Y, D, X) = 1 - \frac{D \cdot (1 - \hat{\nu}(U))}{1 - \hat{\pi}(X)} - \frac{(1 - D) \cdot \hat{\nu}(U)}{\hat{\pi}(X)}.$$
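To make the second step concrete, consider the special case without covariates, h = α · D + β, with the fitted weights taken as given and nonnegative. In that case the weighted check-loss minimization separates by treatment arm, and each arm is solved by a weighted quantile. The sketch below is only an illustration of that special case, not the linear programming routine used for the general model; the function names and the data are hypothetical.

```python
def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight reaches tau times the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= tau * total:
            return value
    return pairs[-1][0]


def lqte_no_covariates(y, d, kappa, tau):
    """Return (alpha, beta) minimizing sum_i kappa_i * rho_tau(y_i - alpha * d_i - beta)."""
    q = {}
    for arm in (0, 1):
        ys = [yi for yi, di in zip(y, d) if di == arm]
        ws = [ki for ki, di in zip(kappa, d) if di == arm]
        q[arm] = weighted_quantile(ys, ws, tau)
    return q[1] - q[0], q[0]


def sample_moment(y, d, kappa, alpha, beta, tau):
    """Sample analogue of Eq. (3) for h = alpha * D + beta, whose gradient is (D, 1)."""
    n = len(y)
    terms = [ki * (tau - (yi <= alpha * di + beta)) for yi, di, ki in zip(y, d, kappa)]
    return (sum(t * di for t, di in zip(terms, d)) / n, sum(terms) / n)


# Hypothetical data; kappa stands in for fitted first-stage weights.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
kappa = [1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 0.5, 0.5]
alpha_hat, beta_hat = lqte_no_covariates(y, d, kappa, tau=0.5)
moment = sample_moment(y, d, kappa, alpha_hat, beta_hat, tau=0.5)
```

Because the indicator function is discontinuous, the sample moment is generally not exactly zero at the minimizer; it is zero only up to a term of order max κ̂i / n.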

The following assumptions are imposed on the power series estimators.

Assumption 3. (i) The support of X and the support of U are compact. (ii) For a positive integer s, π0(X) is s-times continuously differentiable in x, and ν0(U) is s-times continuously differentiable in x and y. (iii) π0(X) is bounded away from zero and one. (iv) $n \cdot J^{-2s/(k+2)} = o(1)$ and $n^{-1} \cdot J^{6} = o(1)$, where J is the number of power series terms used in estimation, s is given in Assumption 3(ii) and k is the dimension of the vector X.

Assumption 3 is imposed for the uniform convergence of κ̂. Assumption 3(iii) is to avoid zero denominators. Assumption 3(i), (ii) and (iv) are standard assumptions for the uniform convergence of power series estimators; see, for example, Newey (1997).

In what follows, we write κ̂i for κ̂(Ui) and ψi(θ, τ) for ψ(Ui, θ, τ). Similarly define κ0i, ν0i, π0i, ν̂i, and π̂i. θ̂(τ) is estimated using the sample counterpart of Eq. (3). We assume that the estimator θ̂(τ) satisfies

$$\sup_{\tau \in T} \left\| \frac{1}{n} \sum_{i=1}^{n} \hat{\kappa}_i \cdot \psi_i(\hat{\theta}, \tau) \right\| = o_p(n^{-1/2}).$$

Now we characterize the limiting distribution of the quantile process θ̂(·) − θ0(·), and moreover show that the bootstrap method consistently estimates the limiting distribution of θ̂(·) − θ0(·). We need the following assumptions to establish the limiting distribution of the quantile process.

Assumption 4. (i) T ⊂ (0, 1) is compact. Θ is compact. (ii) For each τ ∈ T, θ0(τ) is the unique solution to E[κ0(U) · ψ(U, θ, τ)] = 0 in Θ. (iii) θ0(τ) is continuous in τ. (iv) Conditional on X, each of Y0 and Y1 is continuously distributed with a density function bounded away from zero and infinity. (v) h(D, X, θ, τ) is twice continuously differentiable with respect to θ uniformly over τ ∈ T almost surely. The first and second derivatives are bounded almost surely. (vi) Let

$$V_0(\tau) = E\left[ f_{Y|DX}\big(h_i(\theta_0(\tau), \tau) \mid D_i, X_i\big) \cdot \frac{\partial h_i(\theta_0(\tau), \tau)}{\partial \theta} \cdot \frac{\partial h_i(\theta_0(\tau), \tau)}{\partial \theta'} \right],$$

where hi(θ, τ) denotes h(Di, Xi, θ, τ). The minimum eigenvalue of V0(τ) is bounded away from zero uniformly over τ ∈ T.

Assumption 4(i) and (iii) impose the compactness of T and the continuity with respect to τ. They are necessary for uniform inference with respect to τ. The compactness of Θ is a standard assumption for a parameter space. The role of Assumption 4(ii) is to guarantee the identifiability of θ0. Assumption 4(iv) and (v) are differentiability conditions used for the first order Taylor approximation. The derivative in the approximation, V0, is assumed to be nonsingular in Assumption 4(vi). The following theorem establishes the weak convergence of the LQTE process.

Theorem 1. Under Assumptions 1–4, √n(θ̂(·) − θ0(·)) weakly converges to G in ℓ∞(T), where G is a mean zero Gaussian process with covariance function defined by

$$E\left[G(\tau) G(\tau')^\top\right] = V_0^{-1}(\tau)\, E\left[\chi_i(\tau) \cdot \chi_i^\top(\tau')\right] V_0^{-1}(\tau')$$

with

$$\chi_i(\tau) = \kappa_{0i} \cdot \psi_i(\theta_0, \tau) - E\left[ \left( \frac{(1 - D_i) \cdot \nu_{0i}}{\pi_{0i}^2} - \frac{D_i \cdot (1 - \nu_{0i})}{(1 - \pi_{0i})^2} \right) \psi_i(\theta_0, \tau) \,\Big|\, X_i \right] \cdot (Z_i - \pi_{0i}).$$

If τ is fixed at a point, the result in Theorem 1(i) is equivalent to Theorem 3.1 in Abadie et al. (2002). Still, Theorem 1 in this paper is stronger than the pointwise analysis because it establishes the limiting distribution of the process over the entire range of quantiles. The weak convergence result is required, for example, when making inference on the global shape of the quantile process.

To numerically approximate the limiting distribution of θ̂, we propose the empirical bootstrap method. Define θ̂∗ to be the bootstrap estimator computed as follows. First, draw a bootstrap sample (n observations drawn with replacement) from the original data. Second, compute θ̂∗ in the same way θ̂ is computed, using the bootstrap sample. Iterate these steps to simulate the bootstrap distribution of θ̂. The following theorem shows the validity of the bootstrap method.



Theorem 2. Under Assumptions 1–4, √n(θ̂∗(·) − θ̂(·)) weakly converges to G in ℓ∞(T), which is the limiting distribution of √n(θ̂(·) − θ0(·)).

As an alternative, one may use the subsampling method. As a by-product of Theorem 1, the asymptotic influence function of θ̂(τ) is given by V0⁻¹(τ) · χi(τ). One can simulate the asymptotic distribution of θ̂(τ) by subsampling the estimated influence functions. There are two advantages of the subsampling method. First, one can avoid the computational cost of estimating the first-stage weighting functions in each iteration. Second, it is easier to accommodate restrictions on the parameter with the subsampling method. While the empirical bootstrap is easy and straightforward to implement, subsampling requires the choice of the subsample size and an estimator for the influence function. See Chernozhukov and Fernández-Val (2005) for more discussion.

We briefly show how the results can be applied to the examples considered above.

Example 1. (continued) To test the null hypothesis in Example 1, one may consider a Cramér–von Mises type test statistic given by

$$T_n = \sqrt{n} \int_T \min\{\hat{\alpha}(\tau), 0\}\, w(\tau)\, d\tau,$$

where α̂(τ) is the LQTE estimator for α(τ), and w(τ) is a nonnegative weighting function. The null limiting distribution of Tn follows from the weak convergence of α̂. It is easy to show that the test statistic diverges when the alternative hypothesis is true. To calculate the critical value, one can use the empirical bootstrap with an appropriate recentering method to mimic the least favorable null hypothesis, α(τ) = 0.
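The mechanics of the test in Example 1 can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the integral is approximated by the trapezoid rule on a grid of quantile levels, w(τ) = 1 by default, the recentering subtracts α̂ from each bootstrap draw, and `toy_quantile_process` is a hypothetical placeholder that should be replaced by an actual LQTE estimator. All function names and the simulated data are made up for the example.

```python
import math
import random


def cvm_one_sided(alpha_hat, taus, n, weight=lambda t: 1.0):
    """sqrt(n) * integral of min{alpha_hat(tau), 0} * w(tau) over the grid (trapezoid rule)."""
    vals = [min(a, 0.0) * weight(t) for a, t in zip(alpha_hat, taus)]
    integral = sum(0.5 * (vals[i] + vals[i + 1]) * (taus[i + 1] - taus[i])
                   for i in range(len(taus) - 1))
    return math.sqrt(n) * integral


def bootstrap_critical_value(data, estimator, taus, level=0.05, reps=200, seed=0):
    """Lower-tail bootstrap critical value, recentered at alpha_hat to mimic alpha(tau) = 0."""
    rng = random.Random(seed)
    n = len(data)
    alpha_hat = estimator(data, taus)
    draws = []
    for _ in range(reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # empirical bootstrap
        alpha_star = estimator(resample, taus)
        recentered = [s - a for s, a in zip(alpha_star, alpha_hat)]
        draws.append(cvm_one_sided(recentered, taus, n))
    draws.sort()
    return draws[int(level * reps)]


def toy_quantile_process(sample, taus):
    """Placeholder estimator (empirical quantiles of a scalar sample), NOT the LQTE estimator."""
    ordered = sorted(sample)
    return [ordered[min(len(ordered) - 1, int(t * len(ordered)))] for t in taus]


# Simulated scalar data, for illustration only.
rng = random.Random(1)
data = [rng.gauss(0.5, 1.0) for _ in range(50)]
taus = [0.1 * k for k in range(1, 10)]

t_stat = cvm_one_sided(toy_quantile_process(data, taus), taus, len(data))
crit = bootstrap_critical_value(data, toy_quantile_process, taus)
reject = t_stat < crit  # very negative values of the statistic are evidence against H0
```

The statistic is never positive, so the test rejects in the lower tail. The number of bootstrap replications and the grid of quantile levels are tuning choices in this sketch.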

Example 2. (continued) The hypothesis of a constant treatment effect can be tested using a Cramér–von Mises test statistic of the form

$$T_n = \sqrt{n} \int_T \left[\hat{\alpha}(\tau) - \bar{\alpha}\right]^2 w(\tau)\, d\tau,$$

where α̂(τ) is the LQTE estimator for α(τ), $\bar{\alpha} = \int_T \hat{\alpha}(\tau)\, d\tau$, and w(τ) is a nonnegative weighting function. As in the previous example, one can derive the null limiting distribution of the test statistic from the weak convergence of α̂. The critical value can be calculated by the bootstrap method with an appropriate recentering.

4. Monte Carlo simulations

In this section, we conduct a series of Monte Carlo simulations to examine the finite sample performance of our proposed bootstrap procedure. We consider data generating processes (DGPs) with the selection equation D = I{Z + 0.5X > V}, where X and V are i.i.d. draws from χ²(3), and Z is i.i.d. Bernoulli(0.5). All variables are mutually independent. We assume that the conditional quantile function of the outcome for compliers is given by QY(τ|D, X, D1 > D0) = α · τ + β(τ) · D + γ · τ · X, and the τth local quantile treatment effect is given by β(τ). We fix α = 0.7 and γ = 0.5 in all DGPs. Four different designs for β(τ) are considered. DGP 1 assumes β(τ) = 0 for all τ ∈ (0, 1). DGPs 2 through 4 assume β(τ) = 2τ, β(τ) = 2τ − 1, and β(τ) = 2τ − 2, respectively. We test the null hypothesis of nonnegative treatment effects over the entire range of quantiles:

H0: β(τ) ≥ 0 for all τ ∈ (0, 1),
H1: β(τ) < 0 for some τ ∈ (0, 1).

DGPs 1 and 2 satisfy the null hypothesis, but DGPs 3 and 4 do not.

We base our hypothesis testing on the Cramér–von Mises type test statistic

$$T = \int_0^1 \min\{\hat{\beta}(\tau), 0\}\, d\tau,$$

where β̂(τ) is the LQTE estimate for β(τ). We compute β̂(τ) for 49 equally spaced values of τ. We calculate the critical value using 500 bootstrap replications. The sample sizes are n = 100, 200, and 500, and the number of Monte Carlo replications is set to 1000.

In Table 1, we report the empirical rejection probabilities of our test under the DGPs at the 5% significance level. The table shows that the size of our test is close to 5% in DGP 1, but the rejection rate is smaller in DGP 2. This is because DGP 1 is the least favorable case under the null hypothesis and DGP 2 is an interior point of the null hypothesis. In DGP 3, the test has power against a DGP that violates the null hypothesis only on a subset of quantiles. The power increases in DGP 4 as the violation gets larger. In both DGPs, the power increases with the sample size, as expected.

Table 1
Rejection rates.

Sample size    DGP 1    DGP 2    DGP 3    DGP 4
100            0.069    0.007    0.120    0.662
200            0.057    0.012    0.185    0.908
500            0.042    0.015    0.492    0.999

References

Abadie, A., 2003. Semiparametric instrumental variable estimation of treatment response models. J. Econometrics 113 (2), 231–263.
Abadie, A., Angrist, J., Imbens, G., 2002. Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70 (1), 91–117.
Chernozhukov, V., Fernández-Val, I., 2005. Subsampling inference on quantile regression processes. Sankhya 67 (Part 2), 253–276.
Chernozhukov, V., Hansen, C., 2004. The impact of 401 K on savings: an IV-QR analysis. Rev. Econ. Stat. 86 (3), 735–751.
Chernozhukov, V., Hansen, C., 2007. Instrumental quantile regression: A robust inference approach. J. Econometrics 142 (1), 379–398.
Ferreira, F.H., Firpo, S., Galvao, A.F., 2017. Estimation and inference for actual and counterfactual growth incidence curves.
Firpo, S., 2007. Efficient semiparametric estimation of quantile treatment effects. Econometrica 75 (1), 259–276.
Firpo, S., Pinto, C., 2016. Identification and estimation of distributional impacts of interventions using changes in inequality measures. J. Appl. Econometrics 31, 457–486.
Gutenbrunner, C., Jurečková, J., 1992. Regression rank scores and regression quantiles. Ann. Statist. 20 (1), 305–330.
Imbens, G., Angrist, J., 1994. Identification and estimation of local average treatment effects. Econometrica 62 (2), 467–475.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46 (1), 33–50.
Koenker, R., Xiao, Z., 2002. Inference on the quantile regression process. Econometrica 70 (4), 1583–1612.
Newey, W.K., 1997. Convergence rates and asymptotic normality for series estimators. J. Econometrics 79 (1), 147–168.