Causal inference by quantile regression kink designs


Accepted Manuscript Causal inference by quantile regression kink designs Harold D. Chiang, Yuya Sasaki

PII: S0304-4076(19)30038-7
DOI: https://doi.org/10.1016/j.jeconom.2019.02.005
Reference: ECONOM 4642

To appear in: Journal of Econometrics

Received date: 31 August 2016
Revised date: 21 December 2018
Accepted date: 24 February 2019

Please cite this article as: H.D. Chiang and Y. Sasaki, Causal inference by quantile regression kink designs. Journal of Econometrics (2019), https://doi.org/10.1016/j.jeconom.2019.02.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Causal Inference by Quantile Regression Kink Designs

Harold D. Chiang∗ Yuya Sasaki†‡

Vanderbilt University

February 25, 2019

Abstract

The quantile regression kink design (QRKD) is proposed by empirical researchers, but its causal interpretation remains unknown. We show that the QRKD estimand measures a weighted average of heterogeneous marginal effects at respective conditional quantiles of outcome given a designed kink point. We also derive limit processes for the QRKD estimator to conduct statistical inference on heterogeneous treatment effects using the QRKD. Applying our methods to the Continuous Wage and Benefit History Project (CWBH) data, we find heterogeneous positive causal effects of unemployment insurance benefits on unemployment durations. These effects are larger for individuals with longer unemployment durations.

Keywords: causal inference, heterogeneous treatment effects, identification, regression kink design, quantile regression, unemployment duration.

JEL Classification: C14, C21

∗ Harold D. Chiang: [email protected]. Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA.

† Yuya Sasaki (corresponding author): [email protected]. Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA.

‡ We would like to thank Patty Anderson and Bruce Meyer for kindly agreeing to our use of the CWBH data. We benefited from very useful comments by Han Hong (the editor), the associate editor, anonymous referees, Matias Cattaneo, Andrew Chesher, Antonio Galvao, Emmanuel Guerre, Blaise Melly, Chris Taber, Jungmo Yoon, seminar participants at Academia Sinica, National Chengchi University, National University of Singapore, Otaru University of Commerce, Penn State University, Queen Mary University of London, Temple University, University of Pittsburgh, University of Surrey, University of Wisconsin-Madison and Vanderbilt University, and conference participants at AMES 2016, CeMMAP-SNU-Tokyo Conference on Advances in Microeconometrics, ESEM 2016, New York Camp Econometrics XI. All remaining errors are ours.

1 Introduction

Some recent empirical research papers, including Nielsen, Sørensen and Taber (2010), Landais (2015), Simonsen, Skipper and Skipper (2015), Card, Lee, Pei and Weber (2016), and Dong (2016), conduct causal inference via the regression kink design (RKD). A natural extension of the RKD with a flavor of unobserved heterogeneity is the quantile RKD (QRKD), which is the object that we explore in this paper. Specifically, consider the quantile derivative Wald ratio of the form

\[
\mathrm{QRKD}(\tau) = \frac{\lim_{x \downarrow x_0} \frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x) - \lim_{x \uparrow x_0} \frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x)}{\lim_{x \downarrow x_0} \frac{d}{dx} b(x) - \lim_{x \uparrow x_0} \frac{d}{dx} b(x)} \tag{1.1}
\]

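at a design point $x_0$ of a running variable $x$, where $Q_{Y|X}(\tau \mid x) := \inf\{y : F(y \mid x) \ge \tau\}$ denotes the $\tau$-th conditional quantile function of $Y$ given $X = x$, and $b$ is a policy function. Note that it is analogous to the RKD estimand of Card, Lee, Pei and Weber (2016):

\[
\mathrm{RKD} = \frac{\lim_{x \downarrow x_0} \frac{\partial}{\partial x} E[Y \mid X = x] - \lim_{x \uparrow x_0} \frac{\partial}{\partial x} E[Y \mid X = x]}{\lim_{x \downarrow x_0} \frac{d}{dx} b(x) - \lim_{x \uparrow x_0} \frac{d}{dx} b(x)}, \tag{1.2}
\]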

except that the conditional expectations in the numerator are replaced by the corresponding conditional quantiles.

While the QRKD estimand (1.1) is of potential interest in the empirical literature for assessing heterogeneous treatment effects, little seems to be known about its econometric theory. Specifically, Landais (2011) considers (1.1), but provides no formal theory of identification, estimation, or inference. This paper develops causal interpretation (identification) and estimation theories for the QRKD estimand (1.1). In addition, we present a practical guideline for robust inference by pivotal simulations, a procedure for bandwidth selection, and statistical tests of heterogeneous treatment effects based on the QRKD.

To understand our objective, consider a structural relation $y = g(b, x, \epsilon)$, where the outcome $y$ is determined by observed factors $(b, x)$ and unobserved factors $\epsilon$. The marginal causal effect of $b$ on $y$ for individual $i$ with $(b_i, x_i, \epsilon_i)$ is quantified by $g_1(b_i, x_i, \epsilon_i)$, where $g_1$ denotes the partial derivative of $g$ with respect to the first argument. An estimand $\theta$ has a causal interpretation at $(b, x)$ if it admits

\[
\theta = \int g_1(b, x, \epsilon) \, d\mu(\epsilon) \tag{1.3}
\]

for some probability measure $\mu$ whose support is contained in that of $\epsilon$. The literature has proposed this way of causal interpretation for major statistical estimands. Examples include the OLS slope (Yitzhaki, 1996), the two stage least squares estimand under multivalued discrete treatments (Angrist and Imbens, 1995), an IV estimand under partial equilibrium (Angrist, Graddy and Imbens, 2000), a list of most common treatment effects (Heckman and Vytlacil, 2005), and the slope of the quantile regression (Kato and Sasaki, 2017). In a similar spirit, we argue in the present paper that the QRKD estimand (1.1) can be reconciled with the causal interpretation of the form (1.3).

Making causal interpretations of the QRKD estimand (1.1) in the form (1.3) is perhaps more challenging than for the mean RKD estimand (1.2) because the differentiation operator $\frac{d}{dx}$ and the conditional quantile do not 'swap.' For the mean RKD estimand (1.2), the interchangeability of the differentiation operator and the expectation (integration) operator allows each term of the numerator in (1.2) to be additively decomposed into two parts, namely the causal effects and the endogeneity effects. Taking the difference of the two terms in the numerator then cancels out the endogeneity effects, leaving only the causal effects. This trick allows the mean RKD estimand (1.2) to have causal interpretations in the presence of endogeneity. Due to the lack of such interchangeability for the case of quantiles, this trick is not straightforwardly inherited by the quantile counterpart (1.1). Having said this, we show in Section 2 that a similar decomposition is possible for the QRKD estimand (1.1), and therefore argue that its causal interpretations are possible even under the lack of monotonicity. Specifically, we show that the QRKD estimand corresponds to the quantile marginal effect under monotonicity and to a weighted average of marginal effects under non-monotonicity.

For estimation of the causal effects, we propose a sample-counterpart estimator for the QRKD estimand (1.1) in Section 3. To derive its asymptotic properties, we take advantage of the existing literature on uniform Bahadur representations for quantile-type loss functions, including Kong, Linton and Xia (2010), Guerre and Sabbah (2012), Sabbah (2014), and Qu and Yoon (2015a). Qu and Yoon (2015b) apply the results of Qu and Yoon (2015a) to develop methods of statistical inference with quantile regression discontinuity designs (QRDD), which are closely related to our QRKD framework. We take a similar approach with suitable modifications to derive asymptotic properties of our QRKD estimator. Weak convergence results for the estimator as quantile processes are derived. Applying the weak convergence results, we propose procedures for testing treatment significance and treatment heterogeneity following Koenker and Xiao (2002), Chernozhukov and Fernández-Val (2005) and Qu and Yoon (2015b). Simulation studies presented in Section 4 support the theoretical properties.

Literature: The method studied in this paper falls in the broad framework of design-based causal inference, including RDD and RKD. There is an extensive body of literature on RDD by now; see a historical review by Cook (2008) and surveys in the special issue of Journal of Econometrics edited by Imbens and Lemieux (2008), Imbens and Wooldridge (2009; Sec. 6.4), Lee and Lemieux (2010), and Volume 38 of Advances in Econometrics edited by Cattaneo and Escanciano (2016), as well as the references cited therein. The first extension to quantile treatment effects in the RDD framework was made by Frandsen, Frölich and Melly (2012). More recently, Qu and Yoon (2015b) develop uniform inference methods with QRDD that empirical researchers can use to test a variety of important empirical questions on heterogeneous treatment effects. While the RDD has a rich set of empirical and theoretical results including the quantile extensions, the RKD method, which was developed more recently, does not yet have a quantile counterpart in the literature, despite potential demand for it by empirical researchers (e.g., Landais, 2011). Our paper can be seen as a quantile extension of Card, Lee, Pei and Weber (2016) and an RKD counterpart of Qu and Yoon (2015b).
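The cancellation argument for the mean RKD estimand (1.2) described in this introduction can be written out as a short sketch. The regularity conditions used here are assumptions made for illustration only (a conditional density $f_{\varepsilon|X}$ exists, differentiation under the integral sign is permitted, and $g$, $g_1$, $g_2$, $f_{\varepsilon|X}$ and $\partial f_{\varepsilon|X}/\partial x$ are continuous at $x_0$); $g_2$ denotes the partial derivative of $g$ with respect to its second argument. For $x \ne x_0$,

\[
\frac{\partial}{\partial x} E[Y \mid X = x]
= b'(x) \int g_1(b(x), x, e) f_{\varepsilon|X}(e \mid x) \, de
+ \int g_2(b(x), x, e) f_{\varepsilon|X}(e \mid x) \, de
+ \int g(b(x), x, e) \frac{\partial}{\partial x} f_{\varepsilon|X}(e \mid x) \, de .
\]

Only the first term involves $b'(x)$. Under the stated continuity, the second term (direct effect) and the third term (endogeneity effect) have the same limits from the right and from the left of $x_0$, so they drop out of the numerator of (1.2):

\[
\mathrm{RKD} = \int g_1(b(x_0), x_0, e) f_{\varepsilon|X}(e \mid x_0) \, de = E[g_1(b(x_0), x_0, \varepsilon) \mid X = x_0],
\]

which is of the form (1.3) with $\mu$ equal to the conditional distribution of $\varepsilon$ given $X = x_0$. The step that fails for quantiles is the first one, since a conditional quantile is not an integral of $g$ against $f_{\varepsilon|X}$.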

2 Causal Interpretation of the QRKD Estimand

In this section, we develop some causal interpretations of the QRKD estimand (1.1). For the purpose of illustration, we first present a simple case with rank invariance in Section 2.1. It is followed by a formal argument for general cases in Section 2.2.

2.1 Illustration: Causal Interpretation under Rank Invariance

The causal relation of interest is represented by the structural equation $y = g(b, x, \epsilon)$. The outcome $y$ is determined through the structural function $g$ by two observed factors, $b \in \mathbb{R}$ and $x \in \mathbb{R}$, and a scalar unobserved factor, $\epsilon \in \mathbb{R}$. We assume that $g$ is monotone increasing in $\epsilon$, effectively imposing rank invariance; causal interpretations in a more general setup with non-monotone $g$ and/or multivariate $\epsilon$ are established in Section 2.2. The factor $b$ is a treatment input, and is in turn determined by the running variable $x$ through the structural equation $b = b(x)$ for a known policy function $b$. We say that $b$ has a kink at $x_0$ if

\[
b'(x_0^+) := \lim_{x \to x_0^+} \frac{db(x)}{dx} \ne \lim_{x \to x_0^-} \frac{db(x)}{dx} =: b'(x_0^-)
\]

is true, where $x \to x_0^+$ and $x \to x_0^-$ mean $x \downarrow x_0$ and $x \uparrow x_0$, respectively. Throughout this paper, we assume that the location, $x_0$, of the kink is known from a policy-based research design, as is the case with Card, Lee, Pei and Weber (2016).

Assumption 1. $b'(x_0^+) \ne b'(x_0^-)$ holds, and $b$ is continuous on $\mathbb{R}$ and differentiable on $\mathbb{R} \setminus \{x_0\}$.

The structural partial effects are $g_1(b, x, \epsilon) := \frac{\partial}{\partial b} g(b, x, \epsilon)$, $g_2(b, x, \epsilon) := \frac{\partial}{\partial x} g(b, x, \epsilon)$ and $g_3(b, x, \epsilon) := \frac{\partial}{\partial \epsilon} g(b, x, \epsilon)$. In particular, a researcher is interested in $g_1$, which measures heterogeneous partial effects of the treatment intensity $b$ on an outcome $y$. While the structural partial effect $g_1$ is of interest, it is not clear if the QRKD estimand (1.1) provides any information about $g_1$. In this section, we argue that (1.1) does have a causal interpretation in the sense that it measures the structural causal effect $g_1(b(x_0), x_0, \epsilon)$ at the $\tau$-th conditional quantile of $\varepsilon$ given $X = x_0$. Under regularity conditions (to be discussed in detail in Section 2.2), some calculations yield the decomposition

\[
\frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x) = g_1(b(x), x, \epsilon) \cdot b'(x) + g_2(b(x), x, \epsilon) - \frac{\int_{-\infty}^{\epsilon} \frac{\partial}{\partial x} f_{\varepsilon|X}(e \mid x) \, de}{f_{\varepsilon|X}(\epsilon \mid x)} \cdot g_3(b(x), x, \epsilon), \tag{2.1}
\]

where $\tau = F_{\varepsilon|X}(\epsilon \mid x)$. The first term on the right-hand side is the partial effect of the running variable $x$ on the outcome $y$ through the policy function $b$. The second term is the direct partial effect of the running variable on the outcome $y$. The third term measures the effect of endogeneity in the running variable $x$. We can see that this third term is zero under exogeneity, $\frac{\partial}{\partial x} f_{\varepsilon|X} = 0$. In order to get the causal effect $g_1(b(x), x, \epsilon)$ of interest through the QRKD estimand (1.1), therefore, we want to remove the last two terms in (2.1). Suppose that the designed kink condition of Assumption 1 is true, but all the other functions, $g_1$, $g_2$, $g_3$, $1/f_{\varepsilon|X}$ and $\frac{\partial}{\partial x} f_{\varepsilon|X}$, on the right-hand side of (2.1) are continuous in $(b, x)$ at $(b(x_0), x_0)$. Then, (2.1) yields

\[
\frac{\frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x_0^-)}{b'(x_0^+) - b'(x_0^-)} = g_1(b(x_0), x_0, \epsilon), \tag{2.2}
\]

showing that the QRKD estimand (1.1) measures the structural causal effect $g_1(b(x_0), x_0, \epsilon)$ of $b$ on $y$ for the subpopulation of individuals at the $\tau$-th conditional quantile of $\varepsilon$ given $X = x_0$. This section provides only an informal argument for ease of exposition, but Section 2.2 provides a formal mathematical argument under a general setup without the rank invariance assumption.
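The identity (2.2) can be checked numerically in a toy model. The sketch below is illustrative only: the structural function `g`, the policy `policy`, the kink location `X0`, and the choice that ε given X = x is normal with mean 0.3x (so that the running variable is endogenous) are all hypothetical and are not taken from the paper. Because `g` is increasing in ε, the conditional quantile of Y equals `g` evaluated at the conditional quantile of ε, and one-sided finite differences approximate the two limits in (1.1).

```python
import math
from statistics import NormalDist

X0 = 0.0  # hypothetical kink location

def policy(x):
    """Hypothetical policy b(x): continuous, slope 0.5 left of X0 and 1.5 right of it."""
    return 1.0 + (1.5 if x > X0 else 0.5) * (x - X0)

def g(b, x, eps):
    """Hypothetical structure; strictly increasing in eps whenever b > 0."""
    return math.exp(eps) * b + 0.5 * x + eps

def cond_quantile(tau, x):
    """Q_{Y|X}(tau | x) when eps | X = x ~ N(0.3 x, 1), using rank invariance."""
    eps_tau = 0.3 * x + NormalDist().inv_cdf(tau)
    return g(policy(x), x, eps_tau)

def qrkd(tau, step=1e-6):
    """Finite-difference version of the Wald ratio (1.1) at X0."""
    right = (cond_quantile(tau, X0 + step) - cond_quantile(tau, X0)) / step
    left = (cond_quantile(tau, X0) - cond_quantile(tau, X0 - step)) / step
    return (right - left) / (1.5 - 0.5)  # denominator: b'(x0+) - b'(x0-)

def g1_at_quantile(tau):
    """g1(b(X0), X0, eps) = exp(eps) at the tau-th conditional quantile of eps."""
    return math.exp(NormalDist().inv_cdf(tau))

for tau in (0.1, 0.5, 0.9):
    print(tau, qrkd(tau), g1_at_quantile(tau))
```

In this toy model the direct effect of x and the endogeneity term enter the right and left derivatives identically, so the two printed columns should agree up to finite-difference error even though ε depends on x.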

2.2 General Result: Causal Interpretation without Rank Invariance

In this section, we continue to use the basic settings from Section 2.1 except that the unobserved factors $\epsilon$ are now allowed to be $M$-dimensional, as opposed to being a scalar, and that $g$ is now allowed to be non-monotone with respect to any coordinate of $\epsilon$. As such, we can consider general structural functions $g$ without rank invariance. In this case, there can exist multiple values of $\epsilon$ corresponding to a single conditional quantile $\tau$ of $Y$ given $X = x_0$, and therefore the simple identifying equality (2.2) for the case of rank invariance cannot be established in general. Furthermore, $\mathrm{QRKD}(\tau)$ even fails to equal the average of the structural derivatives $g_1(b(x_0), x_0, \epsilon)$ for those $\epsilon$ that coincide with the $\tau$-th conditional quantile of $Y$ given $X = x_0$. Nonetheless, we argue that $\mathrm{QRKD}(\tau)$ represents a weighted average of the structural derivatives $g_1(b(x_0), x_0, \epsilon)$ for those $\epsilon$ that coincide with the $\tau$-th conditional quantile of $Y$ given $X = x_0$.

Define the lower contour set of $\epsilon$ evaluated by $g(b(x), x, \cdot)$ below a given level of $y$ as follows: $V(y, x) = \{\epsilon \in \mathbb{R}^M \mid g(b(x), x, \epsilon) \le y\}$. Its boundary is denoted by $\partial V(y, x)$. Note that $\partial V(y, x)$ represents the boundary in the space of unobserved latent variable $\epsilon$ of those individuals residing at the $F_{Y|X}(y \mid x)$-th conditional quantile function $Q_{Y|X}$ at $(X, Y) = (x, y)$. Furthermore, the velocities of the boundary $\partial V(y, x)$ at $\epsilon$ with respect to a change in $y$ and a change in $x$ are denoted by $\partial \upsilon(y, x; \epsilon)/\partial y$ and $\partial \upsilon(y, x; \epsilon)/\partial x$, respectively. The velocity $\partial \upsilon(y, x; \epsilon)/\partial y$ measures the direction and the magnitude of change in the boundary $\partial V(y, x)$ in response to a change in the outcome $y$ while the running variable $x$ is fixed. For instance, as the outcome $y$ infinitesimally increases while $x$ is fixed, individuals with infinitesimally higher abilities $\epsilon$ may be included in the lower contour set $V(y, x)$, and hence the velocity $\partial \upsilon(y, x; \epsilon)/\partial y$ is supposedly a positive vector if the outcome tends to be increasing in the abilities $\epsilon$. Likewise, the velocity $\partial \upsilon(y, x; \epsilon)/\partial x$ measures the direction and the magnitude of change in the boundary $\partial V(y, x)$ in response to a change in the running variable $x$ while the outcome $y$ is fixed. Since the concept of quantile regressions is to measure the response of $y$ to $x$ while fixing the probability of the event $V(y, x)$, balancing the inflows of $\epsilon$ measured by the velocity $\partial \upsilon(y, x; \epsilon)/\partial y$ and the outflows of $\epsilon$ measured by the velocity $\partial \upsilon(y, x; \epsilon)/\partial x$ is the key to controlling for the unobservables $\epsilon$ to make a causal interpretation of the quantile regression derivative.

As shorthand notation, we write $h(x, \epsilon) = g(b(x), x, \epsilon)$ and $h_x(x, \epsilon) = \frac{\partial h(x, \epsilon)}{\partial x}$. Under regularity conditions to be stated below, the implicit function theorem allows the velocities defined above to be explicitly written as $\partial \upsilon(y, x; \epsilon)/\partial y = 1/\|\nabla_\epsilon h(x, \epsilon)\|$ and $\partial \upsilon(y, x; \epsilon)/\partial x = -h_x(x, \epsilon)/\|\nabla_\epsilon h(x, \epsilon)\|$ for all $\epsilon \in V(y, x)$. Let $\Sigma$ denote an $(M-1)$-dimensional rectangle, and we parameterize the manifold $\partial V(y, x)$ by $\Pi_{y,x} : \Sigma \to \partial V(y, x)$ for all $(y, x)$. We refer to Padula (2011) for further details of these objects and notations. Let $m^M$ and $H^{M-1}$ denote the Lebesgue measure on $\mathbb{R}^M$ and the Hausdorff measure¹ on $\partial V(y, x)$, respectively. Letting $\mathcal{X} = \mathrm{supp}(X)$, we make the following assumptions.

Assumption 2. (i) $h(\cdot, \epsilon)$ is continuously differentiable on $\mathcal{X} \setminus \{x_0\}$ for all $\epsilon \in \mathcal{E}$ and $h(x, \cdot)$ is continuously differentiable for all $x \in \mathcal{X}$. (ii) $\|\nabla_\epsilon h(x, \cdot)\| \ne 0$ on $\partial V(y, x)$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$. (iii) The conditional distribution of $\varepsilon$ given $X$ is absolutely continuous with respect to $m^M$, $f_{\varepsilon|X}$ is continuously differentiable, and $\mathcal{X} \ni x \mapsto f_{\varepsilon|X}(\cdot \mid x) \in L^1(\mathbb{R}^M, m^M)$ is continuous.² (iv) $\int_{\partial V(y,x)} f_{\varepsilon|X}(\epsilon \mid x) \, dH^{M-1}(\epsilon) > 0$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$.

Assumption 3. (i) For $M = 1$: $\partial V(y, x)$ is a finite set, and $h(x, \cdot)$ is locally invertible with a continuously differentiable local inverse function in a neighborhood of each point in $\partial V(y, x)$. (ii) For $M > 1$: $\Sigma \times \mathcal{X} \ni (s, x) \mapsto \Pi_{y,x}(s) \in \mathbb{R}^M$ is continuous for all $y \in \mathcal{Y}$, and $\Sigma \times \mathcal{Y} \ni (s, y) \mapsto \Pi_{y,x}(s) \in \mathbb{R}^M$ is continuous for all $x \in \mathcal{X}$. $\mathcal{X} \ni x \mapsto \partial \upsilon(y, x; \Pi_{y,x}(\cdot))/\partial x \in L^1(\Sigma, m^{M-1})$ is continuous for all $y \in \mathcal{Y}$.³ $\mathcal{Y} \ni y \mapsto \partial \upsilon(y, x; \Pi_{y,x}(\cdot))/\partial y \in L^1(\Sigma, m^{M-1})$ is continuous for all $x \in \mathcal{X}$.⁴

Assumption 4. Let $\gamma(x, \epsilon) := \|\nabla_\epsilon h(x, \epsilon)\|^{-1}$. There exist $p > 1$ and $q > 1$ satisfying $p^{-1} + q^{-1} = 1$ such that $\|\gamma(x, \cdot)\|_{L^p(\partial V(y,x), H^{M-1})} < \infty$ and $\|f_\varepsilon\|_{L^q(\partial V(y,x), H^{M-1})} < \infty$ hold for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$.

Assumption 5. (i) There exists $w_{y,x} \in L^1(\partial V(y, x), H^{M-1})$ such that $|\gamma(x, \epsilon) h_x(x, \epsilon) f_{\varepsilon|X}(\epsilon \mid x)| \le w_{y,x}(\epsilon)$ and $|\gamma(x, \epsilon) f_{\varepsilon|X}(\epsilon \mid x)| \le w_{y,x}(\epsilon)$ for all $\epsilon \in \partial V(y, x)$ for all $(y, x) \in \mathcal{Y} \times \mathcal{X}$. (ii) There exists $w_{y,x} \in L^1(V(y, x), m^{M-1})$ such that $|\partial f_{\varepsilon|X}(\epsilon \mid x)/\partial x| \le w_{y,x}(\epsilon)$ for all $\epsilon \in V(y, x)$ for all $(y, x) \in \mathcal{Y} \times \mathcal{X}$.

¹ The Hausdorff measure is defined as follows. Define a function $H_p^{M-1} : 2^{\mathbb{R}^M} \to \mathbb{R}$ by $H_p^{M-1}(S) = \sup_{\delta > 0} \inf \left\{ \sum_{i=1}^{\infty} (\mathrm{diam}\, S_i)^{M-1} \mid \cup_{i=1}^{\infty} S_i \supset S, \ \mathrm{diam}\, S_i < \delta \right\}$. The restriction $H^{M-1} : \mathcal{B}(V) \to \mathbb{R}$ of $H_p^{M-1}$ to the Borel sigma algebra $\mathcal{B}(V)$ of a metric space $V \subset \mathbb{R}^M$ is a measure, and we call it the $(M-1)$-dimensional Hausdorff measure. Intuitively, $H^{M-1}$ measures a scaled area of Borel subsets of the $(M-1)$-dimensional manifold $V \subset \mathbb{R}^M$.

² That is, for all $\delta_1 > 0$ there exists $\delta_2 > 0$ such that $|x' - x| < \delta_2$ implies $\int |f_{\varepsilon|X}(\epsilon \mid x') - f_{\varepsilon|X}(\epsilon \mid x)| \, dm^M(\epsilon) < \delta_1$.

³ That is, $\forall \delta_1 > 0$ $\exists \delta_2 > 0$ such that $|x' - x| < \delta_2$ implies $\int |\partial \upsilon(y, x; \Pi_{y,x'}(s))/\partial x - \partial \upsilon(y, x; \Pi_{y,x}(s))/\partial x| \, ds < \delta_1$.

⁴ That is, $\forall \delta_1 > 0$ $\exists \delta_2 > 0$ such that $|y' - y| < \delta_2$ implies $\int |\partial \upsilon(y, x; \Pi_{y',x}(s))/\partial y - \partial \upsilon(y, x; \Pi_{y,x}(s))/\partial y| \, ds < \delta_1$.

Assumptions 2, 3 and 4 are used to derive a structural decomposition of the quantile partial derivative; see Sasaki (2015) for detailed discussions of these assumptions. Assumption 3 branches into two cases, depending on (i) $M = 1$ or (ii) $M > 1$. We note that case (i) accommodates a non-monotone structure $g$ in a scalar unobservable $\varepsilon$, whereas case (ii) generally concerns non-monotonicity due to multi-dimensional unobservables $\varepsilon$. These two cases are stated separately because the restriction in case (ii) among others entails that $\partial V(y, x)$ is a connected set, which is too strong for case (i) with non-monotonicity. In Assumptions 2 (iv) and 4, the statements concern integration of $f_{\varepsilon|X=x}$ on $\partial V(y, x)$. This manifold $\partial V(y, x)$ has Lebesgue measure zero, i.e., $m^M(\partial V(y, x)) = 0$. On the other hand, the Hausdorff measure evaluates this Lebesgue null set positively, i.e., $H^{M-1}(\partial V(y, x)) > 0$. Hence these assumptions are nontrivial statements. The regularity conditions in Assumption 5 facilitate the dominated convergence theorem to make structural sense of the QRKD estimand (1.1). Specifically, by the dominated convergence theorem, Assumption 5 (i) and (ii) together with Assumptions 2 (iv) and 4 are sufficient for the existence of the reduced-form expressions $\lim_{x \to x_0^+} \frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x)$ and $\lim_{x \to x_0^-} \frac{\partial}{\partial x} Q_{Y|X}(\tau \mid x)$.

With $\mathcal{B}(y, x)$ denoting the collection of Borel subsets of $\partial V(y, x)$, we define the function $\mu^{M-1}_{y,x} : \mathcal{B}(y, x) \to \mathbb{R}$ by

\[
\mu^{M-1}_{y,x}(S) := \frac{\int_{S} \frac{1}{\|\nabla_\epsilon h(x, \epsilon)\|} f_{\varepsilon|X}(\epsilon \mid x) \, dH^{M-1}(\epsilon)}{\int_{\partial V(y,x)} \frac{1}{\|\nabla_\epsilon h(x, \epsilon)\|} f_{\varepsilon|X}(\epsilon \mid x) \, dH^{M-1}(\epsilon)} \quad \text{for all } S \in \mathcal{B}(y, x).
\]

Because the zero-dimensional Hausdorff measure $H^0$ is a counting measure, the case of $M = 1$ yields

\[
\mu^{0}_{y,x}(\{\epsilon\}) := \frac{\frac{1}{\|\nabla_\epsilon h(x, \epsilon)\|} f_{\varepsilon|X}(\epsilon \mid x)}{\sum_{\epsilon' \in \partial V(y,x)} \frac{1}{\|\nabla_\epsilon h(x, \epsilon')\|} f_{\varepsilon|X}(\epsilon' \mid x)} \quad \text{for all } \epsilon \in \partial V(y, x). \tag{2.3}
\]

The next theorem claims that this is a probability measure and gives weights with respect to which the QRKD estimand (1.1) measures the average structural causal effect of the treatment intensity $b$ on an outcome $y$ for those individuals at the $\tau$-th conditional quantile of $Y$ given $X = x_0$.

Theorem 1. Suppose that Assumptions 1, 2, 3, 4 and 5 hold. Let $\tau \in (0, 1)$ and $y = Q_{Y|X}(\tau \mid x_0)$. Then, $\mu^{M-1}_{y,x_0}$ is a probability measure on $\partial V(y, x_0)$, and

\[
\mathrm{QRKD}(\tau) = \int_{\partial V(y,x_0)} g_1(b(x_0), x_0, \epsilon) \, d\mu^{M-1}_{y,x_0}(\epsilon) = E_{\mu^{M-1}_{y,x_0}}\left[ g_1(b(x_0), x_0, \varepsilon) \right]. \tag{2.4}
\]

As is often the case in the treatment literature (e.g., Angrist and Imbens, 1995), this theorem shows a causal interpretation in terms of a weighted average. Specifically, (2.4) shows that the QRKD estimand (1.1) measures a weighted average of the heterogeneous causal effects $g_1(b(x_0), x_0, \epsilon)$ displayed on the right-hand side of (2.4). Since the weights are positive on the support of the conditional distribution of $\varepsilon$ given $X = x_0$, the QRKD estimand is a strict convex combination of the ceteris paribus causal effects of $b$ on $y$ for those individuals at the $\tau$-th conditional quantile of $Y$ given $X = x_0$.

The weights given in the definition of $\mu^{M-1}_{y,x_0}$ are proportional to $f_{\varepsilon|X}(\epsilon \mid x_0)/\|\nabla_\epsilon h(x_0, \epsilon)\|$. Since $f_{\varepsilon|X}(\epsilon \mid x_0)$ is the conditional density of the unobservables $\varepsilon$ given $X = x_0$, the discrepancy between the weighted and unweighted averages is imputed to the denominator, $\|\nabla_\epsilon h(x_0, \epsilon)\|$. For example, larger weights are assigned to those locations of $\epsilon \in \partial V(y, x_0)$ at which $\|\nabla_\epsilon h(x_0, \epsilon)\|$ is smaller. In other words, the QRKD emphasizes those locations of $\epsilon \in \partial V(y, x_0)$ at which the effects of unobservables $\varepsilon$ on the structure $g$ are smaller in magnitude. On the other hand, the QRKD de-emphasizes those locations of $\epsilon \in \partial V(y, x_0)$ at which the effects of unobservables $\varepsilon$ on the structure $g$ are larger in magnitude.

One may worry about the obscurity of the causal interpretations under the 'weighted' averages. Note that the weighted average becomes an unweighted average when $\|\nabla_\epsilon h(x_0, \epsilon)\|$ is constant in $\epsilon$. There are some cases where the weight is constant. As an example which is often relevant to empirical practice, the polynomial random coefficient model of the form

\[
g(b, x, \epsilon) = \epsilon_{00} + \sum_{\nu=1}^{p_b} \epsilon_{\nu 0} b^{\nu} + \sum_{\nu=1}^{p_x} \epsilon_{0 \nu} x^{\nu} + \sum_{\nu_b=1}^{p_b} \sum_{\nu_x=1}^{p_x} \epsilon_{\nu_b \nu_x} b^{\nu_b} x^{\nu_x} \tag{2.5}
\]

satisfies that $\|\nabla_\epsilon h(x_0, \epsilon)\|$ is constant in $\epsilon = (\epsilon_{00}, \epsilon_{10}, \ldots, \epsilon_{p_b 0}, \epsilon_{01}, \ldots, \epsilon_{0 p_x}, \epsilon_{11}, \ldots, \epsilon_{p_b p_x})$. Therefore, Theorem 1 entails an unweighted average causal interpretation for the QRKD estimand under this model with

\[
\mu^{M-1}_{y,x}(S) := \frac{\int_{S} f_{\varepsilon|X}(\epsilon \mid x) \, dH^{M-1}(\epsilon)}{\int_{\partial V(y,x)} f_{\varepsilon|X}(\epsilon \mid x) \, dH^{M-1}(\epsilon)} \quad \text{for all } S \in \mathcal{B}(y, x).
\]
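The reason the gradient norm is constant under the polynomial random coefficient model (2.5) can be spelled out in one step. The following is a sketch using only the definitions above, with the shorthand $b_0 := b(x_0)$ introduced here for readability. Since $h(x_0, \epsilon) = g(b_0, x_0, \epsilon)$ is linear in the coefficient vector $\epsilon$, each partial derivative with respect to a coefficient is the monomial it multiplies:

\[
\nabla_\epsilon h(x_0, \epsilon) = \left( 1, \ b_0, \ldots, b_0^{p_b}, \ x_0, \ldots, x_0^{p_x}, \ b_0 x_0, \ldots, b_0^{p_b} x_0^{p_x} \right),
\qquad
\|\nabla_\epsilon h(x_0, \epsilon)\|^2 = 1 + \sum_{\nu=1}^{p_b} b_0^{2\nu} + \sum_{\nu=1}^{p_x} x_0^{2\nu} + \sum_{\nu_b=1}^{p_b} \sum_{\nu_x=1}^{p_x} b_0^{2\nu_b} x_0^{2\nu_x}.
\]

Neither expression depends on $\epsilon$, so the factor $1/\|\nabla_\epsilon h(x_0, \epsilon)\|$ is a common constant that cancels between the numerator and the denominator of $\mu^{M-1}_{y,x_0}$, leaving weights proportional to the conditional density alone.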

When the unobservable $\varepsilon$ is a scalar random variable (i.e., $M = 1$), the Hausdorff measure $H^{M-1}$ becomes a counting measure $H^0$ on the zero-dimensional manifold $\partial V(y, x_0) \subset \mathbb{R}$. In that case, (2.4) may be rewritten as

\[
\mathrm{QRKD}(\tau) = \sum_{\epsilon \in \partial V(y,x_0)} g_1(b(x_0), x_0, \epsilon) \cdot \mu^{M-1}_{y,x_0}(\{\epsilon\}) = E_{\mu^{M-1}_{y,x_0}}\left[ g_1(b(x_0), x_0, \varepsilon) \right]. \tag{2.6}
\]

In particular, the case where $\partial V(y, x_0)$ is a singleton allows for the following straightforward causal interpretation for the QRKD estimand.

Corollary 1. Suppose that the assumptions for Theorem 1 hold. Let $\tau \in (0, 1)$ and $y = Q_{Y|X}(\tau \mid x_0)$. If $\varepsilon$ is a scalar random variable (i.e., $M = 1$) and $\partial V(y, x_0)$ is a singleton, then $\mathrm{QRKD}(\tau) = g_1(b(x_0), x_0, \epsilon(y, x_0))$, where $\epsilon(y, x_0)$ is the sole element of $\partial V(y, x_0)$.

Note that this corollary is a generalization of (2.2), and admits the straightforward causal interpretation $\mathrm{QRKD}(\tau) = g_1(b(x_0), x_0, \epsilon(y, x_0))$ without requiring the 'global' monotonicity of $g$ in $\epsilon$. To see the point, consider the structural function given by

1 1 g(b, x, ) = −9b + b3 − 9x + x3 . 3 3

If $b(x_0) + x_0 \ne 0$, then this structure is not globally monotone in $\epsilon$ at $x = x_0$. However, $\partial V(y, x_0)$ is a singleton (i.e., $g(b(x_0), x_0, \cdot)$ is locally monotone) for each value of $y \notin [-18, 18]$, and hence the causal interpretation $\mathrm{QRKD}(\tau) = g_1(b(x_0), x_0, \epsilon(y, x_0))$ of Corollary 1 applies. On the other hand, for each value of $y \in [-18, 18]$, we can interpret the QRKD at most in terms of the weighted sum of the form (2.6). In either of these cases, heterogeneity in values of the QRKD estimand across quantiles $\tau$ can be used as evidence for heterogeneity in treatment effects. Therefore, we can still conduct statistical inference for heterogeneous treatment effects based on the weak convergence results presented below in Section 3.
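The weights in (2.3) and the weighted sum (2.6) can be computed directly when ε is scalar. The sketch below is a hypothetical illustration and not the paper's example: the cubic `h`, the standard normal `density`, and the marginal effect `g1` are all assumptions chosen here so that the boundary set has three points for some levels of y and one point for others. Boundary points are located by bisection on each monotone branch of `h`.

```python
import math

def h(eps):
    """Hypothetical non-monotone structure at x0: h(x0, eps) = eps^3 / 3 - 9 eps."""
    return eps ** 3 / 3.0 - 9.0 * eps

def h_eps(eps):
    """Derivative of h with respect to eps (the gradient when M = 1)."""
    return eps ** 2 - 9.0

def density(eps):
    """Assumed conditional density of eps given X = x0: standard normal."""
    return math.exp(-0.5 * eps * eps) / math.sqrt(2.0 * math.pi)

def g1(eps):
    """Hypothetical marginal effect g1(b(x0), x0, eps)."""
    return 1.0 + eps ** 2

def boundary(y, tol=1e-12):
    """Points eps with h(eps) = y, searched on the three monotone branches of h."""
    roots = []
    for lo, hi in ((-10.0, -3.0), (-3.0, 3.0), (3.0, 10.0)):
        f_lo, f_hi = h(lo) - y, h(hi) - y
        if f_lo * f_hi > 0.0:
            continue  # no sign change, so no boundary point on this branch
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if (h(mid) - y) * f_lo <= 0.0:
                hi = mid
            else:
                lo, f_lo = mid, h(mid) - y
        roots.append(0.5 * (lo + hi))
    return roots

def weights(y):
    """Weights as in (2.3): proportional to density / |h_eps| at each boundary point."""
    pts = boundary(y)
    raw = [density(e) / abs(h_eps(e)) for e in pts]
    total = sum(raw)
    return pts, [r / total for r in raw]

def qrkd_weighted(y):
    """Weighted sum as in (2.6) for the hypothetical g1."""
    pts, w = weights(y)
    return sum(wi * g1(e) for e, wi in zip(pts, w))

print(boundary(0.0), qrkd_weighted(0.0))
print(boundary(30.0), qrkd_weighted(30.0))
```

The levels y at which `h_eps` vanishes on the boundary (tangency points of the cubic) are excluded by Assumption 2 (ii) and are not handled by this sketch. When the boundary is a singleton the single weight is one, which is the situation of Corollary 1.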

3 Estimation and Inference

3.1 The Estimator and Its Asymptotic Distribution

We propose to estimate the QRKD estimand (1.1) by its sample counterpart

\[
\widehat{\mathrm{QRKD}}(\tau) = \frac{\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)}{b'(x_0^+) - b'(x_0^-)}, \tag{3.1}
\]

where the two terms in the numerator are given by the $p$-th order local polynomial quantile smoothers

\[
\begin{aligned}
\hat\beta_1^+(\tau) &= \iota_2' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \ldots, \beta_p^+, \beta_p^-) \in \mathbb{R}^{2p+1}} \sum_{i=1}^{n} K\!\left( \frac{x_i - x_0}{h_{n,\tau}} \right) \rho_\tau\!\left( y_i - \alpha - \sum_{v=1}^{p} (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} \right), \\
\hat\beta_1^-(\tau) &= \iota_3' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \ldots, \beta_p^+, \beta_p^-) \in \mathbb{R}^{2p+1}} \sum_{i=1}^{n} K\!\left( \frac{x_i - x_0}{h_{n,\tau}} \right) \rho_\tau\!\left( y_i - \alpha - \sum_{v=1}^{p} (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} \right)
\end{aligned}
\]

for $\tau \in T$, where $T \subset (0, 1)$ is a closed interval, $K$ is a kernel function, $\rho_\tau(u) = u(\tau - 1\{u < 0\})$ for $u \in \mathbb{R}$, $d_i^+ = 1\{x_i > x_0\}$, $d_i^- = 1\{x_i < x_0\}$, and $\iota_2 = [0, 1, 0, 0, \ldots, 0]'$, $\iota_3 = [0, 0, 1, 0, \ldots, 0]' \in \mathbb{R}^{2p+1}$ for a fixed integer $p \ge 1$ of polynomial order. Following Landais (2011), we have the level parameter $\alpha$ common between the left ($x_i < x_0$) and the right ($x_i > x_0$), while higher-order coefficients are allowed to differ between the left and the right.

In the remainder of this section, we present weak convergence results for the quantile processes $\tau \mapsto (\hat\beta_1^+(\tau), \hat\beta_1^-(\tau))$, which in turn yield a weak convergence result for the quantile process of the

QRKD estimator of treatment effects. Using these results, we then propose methods to test hypotheses concerning heterogeneous treatment effects in Section 3.2.

Define the kernel-dependent constant matrix $N = \int \bar u \bar u' K(u) \, du$, where $\bar u = [1, u d_u^+, u d_u^-, \ldots, u^p d_u^+, u^p d_u^-]' \in \mathbb{R}^{2p+1}$, $d_u^+ = 1\{u > 0\}$ and $d_u^- = 1\{u < 0\}$. We assume that there exist constants $\underline{x} < x_0$ and $\bar x > x_0$ such that the following conditions are satisfied.

Assumption 6. (i) (a) The density function $f_X(\cdot)$ exists and is continuously differentiable in a neighborhood of $x_0$ and $0 < f_X(x_0) < \infty$. (b) $\{(y_i, x_i)\}_{i=1}^{n}$ is an i.i.d. sample of $n$ observations of the bivariate random vector $(Y, X)$. (ii) (a) $f_{Y|X}(Q_{Y|X}(\cdot \mid x_0) \mid x_0)$ is Lipschitz continuous on $T$. (b) There exist finite constants $f_L > 0$, $f_U > 0$, and $\xi > 0$, such that $f_{Y|X}(Q_{Y|X}(\tau \mid x) + \eta \mid x)$ lies between $f_L$ and $f_U$ for all $\tau \in T$, $|\eta| \le \xi$ and $x \in [\underline{x}, \bar x]$. (iii) (a) $Q_{Y|X}(\cdot \mid x_0)$, $\partial Q_{Y|X}(\cdot \mid x_0^+)/\partial \tau$, and $\partial Q_{Y|X}(\cdot \mid x_0^-)/\partial \tau$ exist and are Lipschitz continuous on $T$. (b) $Q_{Y|X}(\tau \mid \cdot)$ is continuous at $x_0$. For $v = 0, 1, \ldots, p+1$, $(x, \tau) \mapsto \partial^v Q_{Y|X}(\tau \mid x)/\partial x^v$ exists and is Lipschitz continuous on $\{(x, \tau) \mid x \in (x_0, \bar x], \tau \in T\}$ and $\{(x, \tau) \mid x \in [\underline{x}, x_0), \tau \in T\}$. (iv) The kernel $K$ is compactly supported, having finite first-order derivative and satisfying $K(\cdot) \ge 0$, $\int K(u) \, du = 1$, and $\int u K(u) \, du = 0$. The matrix $N$ is positive definite. (v) The bandwidths satisfy $h_{n,\tau} = c(\tau) h_n$, where $n h_n^3 \to \infty$ and $n h_n^{2p+3} \to 0$ as $n \to \infty$, and $c(\cdot)$ is Lipschitz continuous satisfying $0 < \underline{c} \le c(\tau) \le \bar c < \infty$ for all $\tau \in T$.

Parts (i)–(v) of this assumption correspond to Assumptions 1–5, respectively, of Qu and Yoon (2015a), adapted to our framework. Part (i) (a) also implies Assumption 6 of Qu and Yoon (2015a). Part (i) (a) requires smoothness of the density of the running variable. This assumption can be interpreted as the design requirement for absence of endogenous sorting across the kink point $x_0$. The i.i.d. assumption in part (i) (b) is usually considered to be satisfied for micro data of random samples. Part (ii) concerns regularities of the conditional density function of $Y$ given $X$. It requires sufficient smoothness, but does not rule out a quantile regression kink at $x_0$, which is the crucial assumption for our identification argument. Part (iii) concerns regularities of the conditional quantile function of $Y$ given $X$. Like part (ii), it does not rule out a quantile regression kink at $x_0$. For conciseness of the statements, we write parts (ii) and (iii) of Assumption 6 in terms of high-level objects, but we also provide sufficient conditions stated in terms of structural primitives in Appendix B.5. Part (iv) prescribes requirements for a kernel function to be chosen by a user. In Section 4 for simulation studies, we propose an example of such a choice to satisfy this requirement. Part (v) specifies admissible rates at which the bandwidth parameters diminish as the sample size becomes large. It obeys the standard rate for a first-order derivative estimation, but we also require its uniformity over quantiles $\tau$ in $T$. While $n h_n^{2p+3} \to 0$ is required for valid inference with higher order bias reduction, it is not necessary for the uniform Bahadur representation to hold. Under this set of assumptions, we obtain uniform Bahadur representations for the component

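estimators, $\hat\beta^+(\tau)$ and $\hat\beta^-(\tau)$, of our interest similarly to Qu and Yoon (2015a); see Lemma 1 in Appendix B.1. The following weak convergence result in turn derives from the uniform Bahadur representations.

Theorem 2. Suppose that Assumption 6 holds. Let $Z_n = Z_n(\cdot, \cdot)$ be defined by

\[
\begin{pmatrix} Z_n(\tau_1, 2) \\ Z_n(\tau_2, 3) \end{pmatrix}
=
\begin{pmatrix}
\sqrt{n h_{n,\tau_1}^3} \left( \hat\beta_1^+(\tau_1) - \dfrac{\partial Q_{Y|X}(\tau_1 \mid x_0^+)}{\partial x} - h_{n,\tau_1}^{p} \dfrac{\iota_2' N^{-1}}{(p+1)!} \displaystyle\int \bar u \left( \dfrac{\partial^{p+1} Q_{Y|X}(\tau_1 \mid x_0^+)}{\partial x^{p+1}} d_u^+ + \dfrac{\partial^{p+1} Q_{Y|X}(\tau_1 \mid x_0^-)}{\partial x^{p+1}} d_u^- \right) u^{p+1} K(u) \, du \right) \\[2ex]
\sqrt{n h_{n,\tau_2}^3} \left( \hat\beta_1^-(\tau_2) - \dfrac{\partial Q_{Y|X}(\tau_2 \mid x_0^-)}{\partial x} - h_{n,\tau_2}^{p} \dfrac{\iota_3' N^{-1}}{(p+1)!} \displaystyle\int \bar u \left( \dfrac{\partial^{p+1} Q_{Y|X}(\tau_2 \mid x_0^+)}{\partial x^{p+1}} d_u^+ + \dfrac{\partial^{p+1} Q_{Y|X}(\tau_2 \mid x_0^-)}{\partial x^{p+1}} d_u^- \right) u^{p+1} K(u) \, du \right)
\end{pmatrix}.
\]

We have

\[
\begin{pmatrix} Z_n(\cdot, 2) \\ Z_n(\cdot, 3) \end{pmatrix} \Rightarrow \begin{pmatrix} G(\cdot, 2) \\ G(\cdot, 3) \end{pmatrix}
\]

for a tight zero-mean Gaussian process $G : \Omega \mapsto \ell^\infty(T \times \{2, 3\})$ with covariance function given by

\[
E[G(\tau_1, j_1) G(\tau_2, j_2)] = \frac{\iota_{j_1}' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_{j_2} \, (\tau_1 \wedge \tau_2 - \tau_1 \tau_2)}{f_X(x_0) \, f_{Y|X}(Q_{Y|X}(\tau_1 \mid x_0) \mid x_0) \, f_{Y|X}(Q_{Y|X}(\tau_2 \mid x_0) \mid x_0)}
\]

for each $\tau_1, \tau_2 \in T$ and $j_1, j_2 \in \{2, 3\}$, where

\[
T(\tau_1, \tau_2) = (c(\tau_1) c(\tau_2))^{-1/2} \int \bar u(\tau_1) \bar u'(\tau_2) K\!\left( \frac{u}{c(\tau_1)} \right) K\!\left( \frac{u}{c(\tau_2)} \right) du
\quad \text{and} \quad
\bar u(\tau) = \left[ 1, \frac{u}{c(\tau)} d_u^+, \frac{u}{c(\tau)} d_u^-, \ldots, \frac{u^p}{c^p(\tau)} d_u^+, \frac{u^p}{c^p(\tau)} d_u^- \right]'.
\]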

This result can be established by adapting Qu and Yoon (2015a) to our framework, and a proof is provided in Appendix B.2. In this theorem, we explicitly write the $p$-th order bias terms in order to emphasize the order of the remaining bias. The next weak convergence result for the QRKD estimator (3.1) follows from Theorem 2.

Corollary 2. Suppose that Assumptions 1 and 6 hold. We have

\[
\sqrt{n h_{n,\tau}^3} \left( \widehat{\mathrm{QRKD}}(\tau) - \mathrm{QRKD}(\tau) \right) \Rightarrow Y(\tau) = \frac{G(\tau, 2) - G(\tau, 3)}{b'(x_0^+) - b'(x_0^-)}, \qquad \tau \in T,
\]

where $Y$ is a zero mean Gaussian process with covariance function given by

\[
E[Y(\tau_1) Y(\tau_2)] = (\tau_1 \wedge \tau_2 - \tau_1 \tau_2) \times \frac{\iota_2' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_2 + \iota_3' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_3 - \iota_2' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_3 - \iota_3' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_2}{f_X(x_0) \, f_{Y|X}(Q_{Y|X}(\tau_1 \mid x_0) \mid x_0) \, f_{Y|X}(Q_{Y|X}(\tau_2 \mid x_0) \mid x_0) \, (b'(x_0^+) - b'(x_0^-))^2}
\]

for all $\tau_1, \tau_2 \in T$.

The random process $Y(\cdot)$ has mean zero, as $G(\cdot, 2)$ and $G(\cdot, 3)$ do. In practice, we can compute its covariance structure by using the pivotal method suggested in Qu and Yoon (2015a); see Appendix C.2 for a practical guide on the implementation. To account for possibly higher variance from the conditional quantiles at the localities where the conditional density is small, we may also consider the following standardized version of the weak convergence results. Let $\sigma^s(\tau) := \{E Y^2(\tau)\}^{1/2}$, and let $\hat\sigma^s(\tau)$ be the uniformly consistent standard error estimate based on the pivotal method (see Appendix C.2). An application of Slutsky's theorem and the continuous mapping theorem to Corollary 2 leads to the next result.

Corollary 3. Suppose that Assumptions 1 and 6 hold. If $\sigma^s(\cdot)$ is uniformly bounded away from 0 on $T$, then we have

\[
\sqrt{n h_{n,\tau}^3} \left( \frac{\widehat{\mathrm{QRKD}}(\tau)}{\hat\sigma^s(\tau)} - \frac{\mathrm{QRKD}(\tau)}{\sigma^s(\tau)} \right) \Rightarrow Y^{std}(\tau) := \frac{Y(\tau)}{\sigma^s(\tau)} = \frac{G(\tau, 2) - G(\tau, 3)}{\sigma^s(\tau) \, (b'(x_0^+) - b'(x_0^-))}, \qquad \tau \in T,
\]

where $Y^{std}$ is a zero mean Gaussian process with covariance function given by

\[
E[Y^{std}(\tau_1) Y^{std}(\tau_2)] = (\tau_1 \wedge \tau_2 - \tau_1 \tau_2) \times \frac{\iota_2' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_2 + \iota_3' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_3 - \iota_2' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_3 - \iota_3' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_2}{\sigma^s(\tau_1) \sigma^s(\tau_2) \, f_X(x_0) \, f_{Y|X}(Q_{Y|X}(\tau_1 \mid x_0) \mid x_0) \, f_{Y|X}(Q_{Y|X}(\tau_2 \mid x_0) \mid x_0) \, (b'(x_0^+) - b'(x_0^-))^2}
\]

and $E(Y^{std}(\tau))^2 = 1$ for all $\tau_1, \tau_2 \in T$.

These weak convergence results are applicable for many purposes. They are readily applicable to computing uniform confidence bands for the QRKD. Of particular interest may be the uniform tests regarding heterogeneous treatment effects. We discuss these applications in Section 3.2.

3.2 Testing for Heterogeneous Treatment Effects

Researchers are often interested in the following hypotheses regarding heterogeneous treatment effects.

Treatment Significance $H_0^S$: $QRKD(\tau) = 0$ for all $\tau \in T$.

Treatment Heterogeneity $H_0^H$: $QRKD(\tau) = QRKD(\tau')$ for all $\tau, \tau' \in T$.

By the result in Section 2.1, under the case of rank invariance, these hypotheses regarding QRKD are equivalent to the corresponding structural hypotheses:

\[
H_0^S \iff g_1(b(x_0), x_0, Q_{\varepsilon|X=x_0}(\tau)) = 0 \quad \text{for all } \tau \in T,
\]
\[
H_0^H \iff g_1(b(x_0), x_0, Q_{\varepsilon|X=x_0}(\tau)) = g_1(b(x_0), x_0, Q_{\varepsilon|X=x_0}(\tau')) \quad \text{for all } \tau, \tau' \in T.
\]

Furthermore, by the result in Section 2.2, even under the general case without rank invariance, the hypotheses regarding QRKD are logically implied by the corresponding structural hypotheses, i.e.,

\[
H_0^S \impliedby g_1(b(x_0), x_0, \varepsilon) = 0 \quad \text{for all } \varepsilon \in \mathbb{R}^M,
\]
\[
H_0^H \impliedby g_1(b(x_0), x_0, \varepsilon) = g_1(b(x_0), x_0, \varepsilon') \quad \text{for all } \varepsilon, \varepsilon' \in \mathbb{R}^M.
\]

Therefore, by contrapositive logic, a rejection of the null hypothesis $H_0^S$ implies a rejection of the structural hypothesis of uniform zero. Likewise, a rejection of the null hypothesis $H_0^H$ implies a rejection of the structural hypothesis of homogeneity. Given these logical equivalences and implications, the hypotheses $H_0^S$ and $H_0^H$ may well be of practical interest.

Both hypotheses, $H_0^S$ and $H_0^H$, are considered in Koenker and Xiao (2002), Chernozhukov and Fernández-Val (2005) and Qu and Yoon (2015b), among others. Following the approach of these preceding papers, the two hypotheses may be tested using the statistics

\[
WS_n(T) = \sup_{\tau \in T} \left| \sqrt{n h_{n,\tau}^3}\, \widehat{QRKD}(\tau) \right| \quad \text{and}
\]
\[
WH_n(T) = \sup_{\tau \in T} \left| \sqrt{n h_{n,\tau}^3} \left[ \widehat{QRKD}(\tau) - |T|^{-1} \int_T \widehat{QRKD}(\tau')\, d\tau' \right] \right|,
\]

or their standardized versions

\[
WS_n^{std}(T) = \sup_{\tau \in T} \left| \sqrt{n h_{n,\tau}^3}\, \frac{\widehat{QRKD}(\tau)}{\hat\sigma^s(\tau)} \right| \quad \text{and}
\]
\[
WH_n^{std}(T) = \sup_{\tau \in T} \left| \sqrt{n h_{n,\tau}^3}\, \frac{\widehat{QRKD}(\tau) - |T|^{-1} \int_T \widehat{QRKD}(\tau')\, d\tau'}{\hat\sigma^h(\tau)} \right|,
\]

respectively, where $|T|$ denotes the length (Lebesgue measure) of the interval $T \subset (0, 1)$, and $\hat\sigma^h(\tau)$ denotes the uniformly consistent estimator of $\sigma^h(\tau) := \{E[\phi'_{QRKD}(Y)(\tau)]^2\}^{1/2}$ based on the pivotal method (Appendix C.2).

For the second term in the statistic $WH_n(T)$, we could also substitute a mean RKD estimator in place of $|T|^{-1} \int_T \widehat{QRKD}(\tau')\, d\tau'$. Nonetheless, we use the above definition for its convenient feature that it is written as a functional only of $\widehat{QRKD}(\cdot)$. Consequences of Corollaries 2 and 3 are the following asymptotic distributions of these test statistics. A proof is provided in Appendix B.4.

Corollary 4. Suppose that Assumptions 1 and 6 hold. If $\sigma^s(\cdot)$ and $\sigma^h(\cdot)$ are bounded away from zero uniformly on $T$, then

(i) $WS_n(T) \Rightarrow \sup_{\tau \in T} |Y(\tau)|$ and $WS_n^{std}(T) \Rightarrow \sup_{\tau \in T} |Y(\tau)/\sigma^s(\tau)|$ under the null hypothesis $H_0^S$; and

(ii) $WH_n(T) \Rightarrow \sup_{\tau \in T} |\phi'_{QRKD}(Y)(\tau)|$ and $WH_n^{std}(T) \Rightarrow \sup_{\tau \in T} |\phi'_{QRKD}(Y)(\tau)/\sigma^h(\tau)|$ under the null hypothesis $H_0^H$,

where $\phi'_{QRKD}(\lambda)(\tau) = \lambda(\tau) - |T|^{-1} \int_T \lambda(\tau')\, d\tau'$ for all $\lambda \in \ell^\infty(T)$, the space of all bounded, measurable, real-valued functions defined on $T$.
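As a rough illustration of how the non-standardized statistics of this subsection might be evaluated in practice, the following Python sketch approximates the supremum over T by a maximum over a finite grid of quantiles and the integral over T by the trapezoid rule. The function name, its arguments, and the grid approximation are assumptions made for this illustration and are not part of the paper. Critical values would still have to come from the pivotal method of Appendix C.2, which is not reproduced here.

```python
import math


def uniform_test_statistics(taus, qrkd_hat, n, bandwidths):
    """Grid versions of WS_n(T) and WH_n(T) (illustrative sketch).

    taus       -- increasing grid of quantiles covering the interval T
    qrkd_hat   -- QRKD estimates on that grid (hypothetical inputs)
    n          -- sample size
    bandwidths -- bandwidth h_{n,tau} at each grid point
    """
    if not (len(taus) == len(qrkd_hat) == len(bandwidths)) or len(taus) < 2:
        raise ValueError("need matching grids with at least two points")

    # |T|^{-1} times the integral of QRKD_hat over T, by the trapezoid rule.
    integral = sum(
        0.5 * (qrkd_hat[i] + qrkd_hat[i + 1]) * (taus[i + 1] - taus[i])
        for i in range(len(taus) - 1)
    )
    mean_effect = integral / (taus[-1] - taus[0])

    # sqrt(n h^3) scaling at each quantile, then the maximum over the grid.
    scale = [math.sqrt(n * h ** 3) for h in bandwidths]
    ws = max(abs(s * q) for s, q in zip(scale, qrkd_hat))
    wh = max(abs(s * (q - mean_effect)) for s, q in zip(scale, qrkd_hat))
    return ws, wh
```

With a constant vector of estimates the heterogeneity statistic is zero up to rounding, whereas the significance statistic is not, which mirrors the distinction between the two null hypotheses.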

3.3 Covariates

In empirical research, we often face circumstances where covariates are observed in addition to the basic variables. Under a mean regression setting, Calonico, Cattaneo, Farrell and Titiunik (2016) have investigated regression discontinuity using covariates. This subsection presents an extension of the baseline QRKD method and its asymptotic results to models with covariates. Let $W = (W_1, \ldots, W_k)'$ denote the covariate random vector of dimension $k \in \mathbb{N}$. We suppose that the model is compatible with the following partial linear structure:

\[
y = g(b(x), x, \varepsilon) + w'\theta(\varepsilon) = Q_{Y|X}(\varepsilon|x) + w'\theta(\varepsilon) = Q_{Y|X}(\varepsilon|x, w'), \qquad (3.2)
\]

where $\varepsilon$ is normalized to $\varepsilon \sim Uniform(0, 1)$. We focus on this simple quantile regression representation with additive covariates and a univariate $\varepsilon$ in this section to provide a practical solution in the presence of covariates. We could maintain the non-separability of covariates and the multi-dimensionality of $\varepsilon$ by naively extending the baseline framework, but such a naive extension would be impractical because of the curse of dimensionality. For the model (3.2), we are able to obtain the same convergence rate for the estimator as in the baseline estimator. Adding $W'\gamma$ to the baseline estimator, we propose

\[
\hat\beta_1^+(\tau) = \iota_2' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \ldots, \beta_p^+, \beta_p^-, \gamma')' \in \mathbb{R}^{1+2p+k}} \sum_{i=1}^n K\!\left(\frac{x_i - x_0}{h_{n,\tau}}\right) \rho_\tau\!\left( y_i - \alpha - \sum_{v=1}^p (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} - W_i'\gamma \right),
\]
\[
\hat\beta_1^-(\tau) = \iota_3' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \ldots, \beta_p^+, \beta_p^-, \gamma')' \in \mathbb{R}^{1+2p+k}} \sum_{i=1}^n K\!\left(\frac{x_i - x_0}{h_{n,\tau}}\right) \rho_\tau\!\left( y_i - \alpha - \sum_{v=1}^p (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} - W_i'\gamma \right).
\]

With these local linear estimators, the QRKD is estimated in turn by

\[
\widehat{QRKD}_{cov}(\tau) = \frac{\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)}{b'(x_0^+) - b'(x_0^-)}.
\]

To state the assumptions and results concisely, we introduce the following short-hand notations: $\tilde u = [1, u d_u^+, u d_u^-, \ldots, u^p d_u^+, u^p d_u^-, w']' \in \mathbb{R}^{1+2p+k}$ where $w = [w_1, \ldots, w_k]' \in \mathbb{R}^k$,

\[
R = \int_{\mathbb{R}^{k+1}} \tilde u \tilde u' K(u) f_{W|X}(w'|x_0)\, du\, dw_1 \cdots dw_k, \quad \text{and}
\]
\[
\Gamma(\tau) = \int_{\mathbb{R}^{k+1}} \tilde u \tilde u' K(u) f_{Y|WX}(g(b(x_0), x_0, \tau) + w'\theta(\tau)\,|\,w', x_0)\, f_{W|X}(w'|x_0)\, du\, dw_1 \cdots dw_k.
\]

Most of the required assumptions stated in Assumption 7 below are direct analogues of Assumption 6. Let

\[
\underline{y} \le \inf_{(\varepsilon, w', x) \in T \times supp(W) \times ([\underline{x}, x_0) \cup (x_0, \overline{x}])} g(b(x), x, \varepsilon) + w'\theta(\varepsilon) \quad \text{and} \quad \overline{y} \ge \sup_{(\varepsilon, w', x) \in T \times supp(W) \times ([\underline{x}, x_0) \cup (x_0, \overline{x}])} g(b(x), x, \varepsilon) + w'\theta(\varepsilon).
\]

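Assumption 7.

(i) (a) $\{(y_i, x_i, W_i')\}_{i=1}^n$ is an i.i.d. sample of n observations of the (k + 2)-dimensional random vector $(Y, X, W')$. The random vector $W$ has a compact support. (b) $f_{W|X}$ is continuously differentiable in x on $supp(W) \times [\underline{x}, x_0)$ and $supp(W) \times (x_0, \overline{x}]$. $f_X$ is continuously differentiable in a neighborhood of $x_0$ and $0 < f_X(x_0) < \infty$.

(ii) (a) $f_{Y|WX}$ is continuous on $[\underline{y}, \overline{y}] \times supp(W) \times [\underline{x}, \overline{x}]$ and is continuously differentiable and Lipschitz continuous on $[\underline{y}, \overline{y}] \times supp(W) \times [\underline{x}, x_0)$ and $[\underline{y}, \overline{y}] \times supp(W) \times (x_0, \overline{x}]$. (b) There exist finite constants $f_L > 0$, $f_U > 0$ and $\xi > 0$ such that $f_{Y|WX}(g(b(x), x, \varepsilon) + w'\theta(\varepsilon) + \eta\,|\,w', x)$ lies between $f_L$ and $f_U$ for all $\varepsilon \in T$, $|\eta| \le \xi$ and $(w', x) \in supp(W) \times [\underline{x}, \overline{x}]$.

(iii) (a) $g(b(x_0), x_0, \varepsilon)$ and $\frac{\partial}{\partial \varepsilon} g(b(x_0), x_0, \varepsilon)$ exist and are Lipschitz continuous in $\varepsilon$ on $T$. Each coordinate of $\theta(\varepsilon)$ is continuously differentiable and their derivatives are Lipschitz continuous in $\varepsilon$ on $T$. (b) $g(b(x), x, \varepsilon)$ is continuous in x at $x_0$. For $v = 0, 1, \ldots, p + 1$, $(x, \varepsilon) \mapsto \frac{\partial^v}{\partial x^v}[g(b(x), x, \varepsilon)]$ exists and is Lipschitz continuous on $\{(x, \varepsilon) \,|\, x \in (x_0, \overline{x}], \varepsilon \in T\}$ and $\{(x, \varepsilon) \,|\, x \in [\underline{x}, x_0), \varepsilon \in T\}$.

(iv) The kernel K is compactly supported, having finite first-order derivative and satisfying $K(\cdot) \ge 0$, $\int_{\mathbb{R}} K(u)\, du = 1$, and $\int_{\mathbb{R}} u K(u)\, du = 0$. The matrices $R$ and $\Gamma(\varepsilon)$ are positive definite for each $\varepsilon \in T$ and the entries of their inverse matrices are uniformly bounded functions of $\varepsilon$ on $T$.

(v) The bandwidths satisfy $h_{n,\varepsilon} = c(\varepsilon) h_n$, where $n h_n^3 \to \infty$ and $n h_n^{2p+3} \to 0$ as $n \to \infty$, and $c(\cdot)$ is Lipschitz continuous satisfying $0 < \underline{c} \le c(\varepsilon) \le \overline{c} < \infty$ for all $\varepsilon \in T$.

The following theorem states weak convergence results for the model (3.2) with covariates, analogously to Theorem 2 and Corollary 2 for the baseline model. The proofs are similar to their baseline counterparts and are therefore omitted.

Theorem 3. Suppose that Assumption 7 holds for (3.2). Define $X''_n = X''_n(\cdot, \cdot)$ by

\[
\begin{pmatrix} X''_n(\tau_1, 2) \\ X''_n(\tau_2, 3) \end{pmatrix}
=
\begin{pmatrix}
\sqrt{n h_{n,\tau_1}^3}\left[\hat\beta_1^+(\tau_1) - \dfrac{\partial Q_{Y|X}(\tau_1|x_0^+)}{\partial x}\right] \\[2ex]
\sqrt{n h_{n,\tau_2}^3}\left[\hat\beta_1^-(\tau_2) - \dfrac{\partial Q_{Y|X}(\tau_2|x_0^-)}{\partial x}\right]
\end{pmatrix}.
\]

We have

\[
\begin{pmatrix} X''_n(\cdot, 2) \\ X''_n(\cdot, 3) \end{pmatrix}
\Rightarrow
\begin{pmatrix} G_{cov}(\cdot, 2) \\ G_{cov}(\cdot, 3) \end{pmatrix}
\]

for a tight zero mean Gaussian process $G_{cov} : \Omega \mapsto \ell^\infty(T \times \{2, 3\})$ with covariance function given by

\[
E[G_{cov}(\tau_1, j_1) G_{cov}(\tau_2, j_2)] = \frac{\iota_{j_1}' (\Gamma(\tau_1))^{-1}\, \tilde T(\tau_1, \tau_2)\, (\Gamma(\tau_2))^{-1} \iota_{j_2}\, (\tau_1 \wedge \tau_2 - \tau_1 \tau_2)}{f_X(x_0)}
\]

for all $\tau_1, \tau_2 \in T$ and $j_1, j_2 \in \{2, 3\}$, where

\[
\tilde T(\tau_1, \tau_2) = (c(\tau_1) c(\tau_2))^{-1/2} \int \tilde u(\tau_1)\, \tilde u'(\tau_2)\, K(u/c(\tau_1))\, K(u/c(\tau_2))\, f_{W|X}(w_1, \ldots, w_k | x_0)\, du\, dw_1 \cdots dw_k
\]

and

\[
\tilde u(\tau) = \left[1,\ u d_u^+/c(\tau),\ u d_u^-/c(\tau),\ \ldots,\ (u d_u^+/c(\tau))^p,\ (u d_u^-/c(\tau))^p,\ w_1, \ldots, w_k\right]' \in \mathbb{R}^{1+2p+k}.
\]

Consequently, if Assumption 1 also holds, then we have

\[
\sqrt{n h_{n,\tau}^3}\left[\widehat{QRKD}_{cov}(\tau) - QRKD_{cov}(\tau)\right] \Rightarrow Y_{cov}(\tau) := \frac{G_{cov}(\tau, 2) - G_{cov}(\tau, 3)}{b'(x_0^+) - b'(x_0^-)}, \qquad \tau \in T.
\]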
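To make the structure of the covariate-augmented criterion of this subsection concrete, the following Python sketch evaluates the kernel-weighted check-function objective at a given parameter value. It is only an illustration: the function names, the argument layout, and the use of the tricube kernel as a default are assumptions made here, and no minimization routine is shown (in practice the objective would be passed to a quantile regression or linear programming solver).

```python
import math


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def tricube(u):
    """Tricube kernel, used here only as an example kernel."""
    return (70.0 / 81.0) * (1.0 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0


def qrkd_cov_objective(alpha, betas, gamma, data, tau, x0, h, kernel=tricube):
    """Kernel-weighted check loss with additive covariates.

    betas -- list of pairs (beta_v_plus, beta_v_minus) for v = 1, ..., p
    gamma -- list of k covariate coefficients
    data  -- iterable of observations (y_i, x_i, w_i) with w_i a list of length k
    """
    total = 0.0
    for y, x, w in data:
        d_plus = 1.0 if x > x0 else 0.0    # observation right of the kink
        d_minus = 1.0 if x < x0 else 0.0   # observation left of the kink
        poly = sum(
            (bp * d_plus + bm * d_minus) * (x - x0) ** v / math.factorial(v)
            for v, (bp, bm) in enumerate(betas, start=1)
        )
        covariate_part = sum(wj * gj for wj, gj in zip(w, gamma))
        residual = y - alpha - poly - covariate_part
        total += kernel((x - x0) / h) * check_loss(residual, tau)
    return total
```

Given minimizing values of the first-order slopes on each side, the estimate would then be formed as their difference divided by the change in the slope of the policy function at the kink.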

4 Simulation Studies

In this section, we report the performance of our causal inference methods using simulated data. The main building blocks for the model consist of the policy function b, the outcome production function g, and the joint distribution of (x, ε). Consider the following policy function with a kink at $x_0 = 0$.

\[
b(x) = \begin{cases} -x & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases}
\]

For convenience of assessing the performance of our estimator for homogeneous treatment effects and heterogeneous treatment effects, we consider the following three outcome structures.

Structure 0: $g(b, x, \varepsilon) = 0.0b + 1.0x + 0.1x^2 + \varepsilon$

Structure 1: $g(b, x, \varepsilon) = 0.5b + 1.0x + 0.1x^2 + \varepsilon$

Structure 2: $g(b, x, \varepsilon) = F_{\varepsilon|X=x_0}(\varepsilon)\, b + 1.0x + 0.1x^2 + \varepsilon$

where $F_{\varepsilon|X=x_0}$ denotes the conditional CDF of ε given $X = x_0$. Note that Structures 0 and 1 entail homogeneous treatment effects, while Structure 2 entails heterogeneous treatment effects across quantiles τ as follows.

Structure 0: $g_1(b, x, Q_{\varepsilon|X=x_0}(\tau)) = 0.0$

Structure 1: $g_1(b, x, Q_{\varepsilon|X=x_0}(\tau)) = 0.5$

Structure 2: $g_1(b, x, Q_{\varepsilon|X=x_0}(\tau)) = \tau$

To allow for endogeneity, we generate the primitive data according to

\[
\begin{pmatrix} x_i \\ \varepsilon_i \end{pmatrix} \overset{i.i.d.}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_\varepsilon \\ \rho \sigma_X \sigma_\varepsilon & \sigma_\varepsilon^2 \end{pmatrix} \right),
\]

where $\sigma_X = 1.0$ and $\sigma_\varepsilon = \rho = 0.5$. For estimation, we use the tricube kernel function K defined by

\[
K(u) = \frac{70}{81}\left(1 - |u|^3\right)^3 1\{|u| < 1\}.
\]
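The simulation design above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the seeding scheme are assumptions, and the conditional CDF used for Structure 2 relies on the standard fact that, under joint normality, ε given X = 0 is normal with mean zero and variance $\sigma_\varepsilon^2(1 - \rho^2)$.

```python
import math
import random

SIGMA_X, SIGMA_EPS, RHO = 1.0, 0.5, 0.5  # values stated in the text


def policy(x):
    """Policy function b(x): -x for x <= 0 and x for x > 0 (kink at 0)."""
    return abs(x)


def tricube(u):
    """Tricube kernel K(u) = (70/81) (1 - |u|^3)^3 on |u| < 1."""
    return (70.0 / 81.0) * (1.0 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0


def cond_cdf_eps_at_kink(e):
    """CDF of eps given X = 0, i.e. N(0, sigma_eps^2 (1 - rho^2))."""
    sd = SIGMA_EPS * math.sqrt(1.0 - RHO ** 2)
    return 0.5 * (1.0 + math.erf(e / (sd * math.sqrt(2.0))))


def outcome(structure, x, eps):
    """Outcome g(b(x), x, eps) under Structure 0, 1, or 2."""
    if structure == 0:
        slope = 0.0
    elif structure == 1:
        slope = 0.5
    elif structure == 2:
        slope = cond_cdf_eps_at_kink(eps)
    else:
        raise ValueError("structure must be 0, 1, or 2")
    return slope * policy(x) + 1.0 * x + 0.1 * x ** 2 + eps


def simulate(n, structure, seed=0):
    """Draw n pairs (y_i, x_i) with corr(x, eps) = RHO."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.gauss(0.0, SIGMA_X)
        z = rng.gauss(0.0, 1.0)
        # eps has variance SIGMA_EPS^2 and correlation RHO with x.
        eps = SIGMA_EPS * (RHO * x / SIGMA_X + math.sqrt(1.0 - RHO ** 2) * z)
        sample.append((outcome(structure, x, eps), x))
    return sample
```

Since the slope of b changes from -1 to 1 at the kink, the denominator of the QRKD estimand in this design equals 2.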

We set p = 2, and the bandwidths are selected with the choice rule based on the MSE minimization for the local linear estimator; see Appendix C.1 for details. Figure 1 shows simulated distributions of the QRKD estimates under Structure 1 (left) and Structure 2 (right). The top row, the middle row, and the bottom row report results for the sample sizes of N = 1,000, 2,000, and 4,000, respectively. In each graph, the horizontal axis measures quantiles τ, while the vertical axis measures the QRKD. The true QRKD is indicated by solid gray lines. Note that it is constant at 0.5 in the left column for Structure 1, while it is increasing in τ in the right column for Structure 2. The other broken curves indicate the 5-th, 10-th, 50-th, 90-th, and 95-th percentiles of the simulated distributions of the QRKD estimates based on 2,500 Monte Carlo iterations. Observe that the displayed distribution shrinks for each structure at each quantile τ as the sample size N increases. The biases appear to be minor relative to the variances, which is consistent with our use of the bias-corrected estimation approach.

In order to analyze the finite sample pattern more quantitatively, we summarize some basic statistics for the simulated distributions in Table 1 for Structure 1 (top panel) and Structure 2 (bottom panel). In each panel, the three column groups list the absolute biases (|Bias|), the standard deviations (SD), and the root mean squared errors (RMSE). For each structure at each quantile τ, we again observe that SD and RMSE decrease as the sample size N increases. The biases are minor relative to the variances. These patterns are consistent with our previous discussion of Figure 1.

Finally, we present uniform inference results using the techniques introduced in Section 3.2. Figure 2 shows acceptance probabilities for the 95% level uniform test of significance (panel A) and the 95% level uniform test of heterogeneity (panel B) based on 2,500 iterations. Panel A shows that the acceptance probability for the test of the null hypothesis of insignificance converges to the nominal probability 95% for Structure 0, while the acceptance probability decreases toward zero as the sample size increases for each of Structure 1 and Structure 2. These results are consistent with the construction of Structure 0, Structure 1, and Structure 2. Structure 0 exhibits the uniform zero QRKD, while neither Structure 1 nor Structure 2 has the uniform zero QRKD. Panel B shows that the acceptance probability for the test of the null hypothesis of homogeneity converges to the nominal probability 95% for Structure 0 and Structure 1, while the acceptance probability decreases toward zero as the sample size increases for Structure 2. These results are again consistent with the construction of Structure 0, Structure 1, and Structure 2. Each of Structure 0 and Structure 1 exhibits a constant QRKD across τ, while Structure 2 has non-constant QRKD across τ.

5 An Empirical Illustration

In labor economics, causal effects of the unemployment insurance (UI) benefits on the duration of unemployment are of interest from policy perspectives. Landais (2015) proposes an empirical strategy using the RKD to identify the causal effects of UI on the duration. Using the data set of the Continuous Wage and Benefit History Project (CWBH; see Moffitt, 1985), Landais estimates the effects of benefit amounts on the duration of unemployment. In this section, we apply our QRKD methods, and aim to discover potential heterogeneity in these causal effects. Using quantiles in this application also has the advantage of informing a likely direction of the selection bias of the mean RKD estimator that stems from not observing the mass of employed individuals at the low quantile (y = 0).

In all of the states in the United States, a compensated unemployed individual receives a weekly benefit amount b that is determined as a fraction τ1 of his or her highest earning quarter x in the base period (the last four completed calendar quarters immediately preceding the start of the claim) up to a fixed maximum amount bmax, i.e., b = min{τ1 · x, bmax}. Both parameters of the policy rule, τ1 and bmax, vary from state to state. Furthermore, the ceiling level bmax changes over time within a state. For these reasons, empirical analysis needs to be conducted for each state and for each restricted time period. The potential duration of benefits is determined in a somewhat more complicated manner. Yet, it can also be written as a piecewise linear and kinked function of a fraction of a running variable x in the CWBH data set.
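The kinked benefit schedule described above, together with the slope change that enters the denominator of the QRKD, can be written down directly. The following Python sketch is illustrative only; the function names and the numerical values used in any call are hypothetical and are not taken from the CWBH data.

```python
def weekly_benefit(x, rate, b_max):
    """Benefit b(x) = min(rate * x, b_max) for highest-quarter earnings x."""
    return min(rate * x, b_max)


def kink_point(rate, b_max):
    """Earnings level x0 at which the benefit cap starts to bind."""
    return b_max / rate


def slope_change(rate):
    """b'(x0+) - b'(x0-): the slope is `rate` below the kink and 0 above it."""
    return 0.0 - rate


def qrkd_from_slopes(beta_plus, beta_minus, rate):
    """Ratio of the quantile slope change to the policy slope change."""
    return (beta_plus - beta_minus) / slope_change(rate)
```

Because the slope of the schedule falls at the kink, the denominator is negative, so a drop in the slope of the conditional quantile of duration at the kink corresponds to a positive estimated effect of benefits.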


Following Landais (2015), we make our QRKD empirical illustration by using the CWBH data for Louisiana. The data cleaning procedure is conducted in the same manner as in Landais. As a result of the data processing, we obtain the same descriptive statistics (up to deflation) as those in Landais for those variables that we use in our analysis. For the dependent variable y, we consider both the claimed number of weeks of UI and the actually paid number of weeks. For the running variable x, we use the highest quarter wage in the base period. The treatment intensity b is computed by using the formula b(x) = min{(1/25) · x, bmax}, with a kink where the maximum amount is bmax = $4,575 for the period between September 1981 and September 1982 and bmax = $5,125 for the period between September 1982 and December 1983.

Table 2 summarizes empirical results for the time period between September 1981 and September 1982. Table 3 summarizes empirical results for the time period between September 1982 and December 1983. In each table, we display the RKD results by Landais (2015) for reference. In the following rows, the QRKD estimates are reported with respective standard errors in parentheses for quantiles τ ∈ {0.10, · · · , 0.90}. At the bottom of each table, we report the p-values for the test of significance and the test of heterogeneity.

Observe the following patterns in these result tables. First, the estimated causal effects have positive signs throughout all the quantiles but for one (τ = 0.10 in Table 2), implying that higher benefit amounts cause longer unemployment durations consistently across the outcome levels. Second, these causal effects are smaller at lower quantiles (e.g., τ = 0.10), while they are larger at middle and higher quantiles. This pattern implies that unemployed individuals who have longer unemployment durations tend to have larger unemployment elasticities with respect to benefit levels. The extent of this increase of the causal effects in quantiles is more prominent for the results in Table 2 (1981–1982) than in Table 3 (1982–1983).⁵ This result implies that the causal effects are heterogeneous. Third, the causal effects are very similar between the results for claimed UI as the outcome and the results for paid UI as the outcome variable. The respective standard errors are almost the same between these two outcome variables, but they are not exactly the same. Fourth, the uniform tests show that the causal effects are significantly different from zero for both time periods. Lastly, the uniform tests show that the causal effects are also significantly heterogeneous for both time periods. To be precise, the heterogeneity is insignificant in Table 3 (1982–1983) according to the non-standardized test statistics, but it is significant according to the standardized ones.

⁵ We remark that the qualitative differences in the results that we find between the non-recession period (1981–1982) and the recession period (1982–1983) may be useful for telling apart the two potential routes of the causal effects, namely the moral hazard and liquidity effects.

6 Summary

Economists have taken advantage of policy irregularities to assess causal effects of endogenous treatment intensities. A new approach along this line is the regression kink design (RKD) used by recent empirical papers, including Nielsen, Sørensen and Taber (2010), Landais (2015), Simonsen, Skipper and Skipper (2015), Card, Lee, Pei and Weber (2016), and Dong (2016). While the prototypical framework is only able to assess the average treatment effect at the kink point, inference for heterogeneous treatment effects using the RKD is of potential interest to empirical researchers (e.g., Landais (2011) considers it). In this light, this paper develops causal analysis and methods of inference for the quantile regression kink design (QRKD).

We first develop causal interpretations of the QRKD estimand. It is shown that the QRKD estimand measures the marginal effect of the treatment variable on the outcome variable at the conditional quantile of the outcome given the design point of the running variable, provided that the causal structure exhibits rank invariance. This result is generalized to the case of no rank invariance, where the QRKD estimand is shown to measure a weighted average of the marginal effects of the treatment variable on the outcome variable at the conditional quantile of the outcome given the design point of the running variable. Second, we propose a sample counterpart QRKD estimator, and develop its asymptotic properties for statistical inference of heterogeneous treatment effects. Under some extra assumptions, a variation of the QRKD estimand that accounts for covariates is also provided. We obtain weak convergence results for the QRKD estimators. Applying the weak convergence results, we propose procedures for statistical tests of treatment significance and treatment heterogeneity. Simulation studies support our theoretical results. Applying our methods to the Continuous Wage and Benefit History Project (CWBH) data, we find significantly heterogeneous causal effects of unemployment insurance benefits on unemployment durations in the state of Louisiana for the period between September 1981 and December 1983.

Appendix A

Proof of Theorem 1

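Proof. For the first part of the proof, we branch into two cases: (i) M = 1 and (ii) M > 1.

(i) For M = 1: That $\mu^{M-1}_{y,x}$ is a probability measure on $\partial V(y, x)$ follows from (2.3) under Assumption 4. By the Leibniz integral rule and the implicit function theorem under Assumptions 2, 3 (i) and 4, the QPD $\frac{\partial}{\partial x} Q(\tau \,|\, x)$ exists and

\[
\frac{\partial}{\partial x} Q(\tau \,|\, x) = \frac{\sum_{\varepsilon \in \partial V(y,x)} \frac{h_x(x,\varepsilon)}{|h_\varepsilon(x,\varepsilon)|} f_{\varepsilon|X}(\varepsilon \,|\, x) - \int_{V(y,x)} \frac{\partial}{\partial x} f_{\varepsilon|X}(\varepsilon \,|\, x)\, d\varepsilon}{\sum_{\varepsilon \in \partial V(y,x)} \frac{1}{|h_\varepsilon(x,\varepsilon)|} f_{\varepsilon|X}(\varepsilon \,|\, x)} = E_{\mu^0_{y,x}}[h_x(x, \varepsilon)] - A(y, x),
\]

where A is defined by

\[
A(y, x) := \frac{\int_{V(y,x)} \frac{\partial}{\partial x} f_{\varepsilon|X}(\varepsilon \,|\, x)\, d\varepsilon}{\sum_{\varepsilon \in \partial V(y,x)} \frac{1}{|h_\varepsilon(x,\varepsilon)|} f_{\varepsilon|X}(\varepsilon \,|\, x)}.
\]

(ii) For M > 1: That $\mu^{M-1}_{y,x}$ is a probability measure on $\partial V(y, x)$ follows from Lemma 2 of Sasaki (2015) under Assumption 4. Next, by Lemma 1 of Sasaki (2015) under Assumptions 2, 3 (ii) and 4, the QPD $\frac{\partial}{\partial x} Q(\tau \,|\, x)$ exists and

\[
\frac{\partial}{\partial x} Q(\tau \,|\, x) = \frac{\int_{\partial V(y,x)} \frac{h_x(x,\varepsilon)}{\|\nabla_\varepsilon h(x,\varepsilon)\|} \frac{f_{\varepsilon|X}(\varepsilon|x) \cdot M \pi^{(M-1)/2}}{2^{M-1} \Gamma(\frac{M+1}{2})}\, dH^{M-1}(\varepsilon) - \int_{V(y,x)} \frac{\partial}{\partial x} f_{\varepsilon|X}(\varepsilon \,|\, x)\, dm^M(\varepsilon)}{\int_{\partial V(y,x)} \frac{1}{\|\nabla_\varepsilon h(x,\varepsilon)\|} \frac{f_{\varepsilon|X}(\varepsilon|x) \cdot M \pi^{(M-1)/2}}{2^{M-1} \Gamma(\frac{M+1}{2})}\, dH^{M-1}(\varepsilon)} = E_{\mu^{M-1}_{y,x}}[h_x(x, \varepsilon)] - A(y, x),
\]

where Γ is the Gamma function and A is defined by

\[
A(y, x) := \frac{\int_{V(y,x)} \frac{\partial}{\partial x} f_{\varepsilon|X}(\varepsilon \,|\, x)\, dm^M(\varepsilon)}{\int_{\partial V(y,x)} \frac{1}{\|\nabla_\varepsilon h(x,\varepsilon)\|} \frac{f_{\varepsilon|X}(\varepsilon|x) \cdot M \pi^{(M-1)/2}}{2^{M-1} \Gamma(\frac{M+1}{2})}\, dH^{M-1}(\varepsilon)}.
\]

From this point on, we treat both cases (i) M = 1 and (ii) M > 1 together. Note that $g_2 = \frac{\partial g}{\partial x}$ is continuous in x by Assumption 2 (i). Also, $\mu^{M-1}_{y,x}(\varepsilon)$ is continuous in x for each fixed y according to parts (i), (ii) and (iii) of Assumption 2. Furthermore, Assumption 2 (i), (ii), (iii) and (iv) imply that A(y, x) is well-defined and is continuous in x for all $y \in \mathcal{Y}$. Therefore, applying the dominated convergence theorem under Assumptions 2 (iv), 4 and 5 yields

\[
\begin{aligned}
\lim_{x \to x_0^+} \frac{\partial}{\partial x} Q(\tau \,|\, x)
&= \lim_{x \to x_0^+} \int_{\partial V(y,x)} \{h_x(x, \varepsilon)\}\, d\mu^{M-1}_{y,x}(\varepsilon) - \lim_{x \to x_0^+} A(y, x) \\
&= \int_{\partial V(y,x_0)} \lim_{x \to x_0^+} \frac{\partial}{\partial x}\{g(b(x), x, \varepsilon)\}\, d\mu^{M-1}_{y,x_0}(\varepsilon) - A(y, x_0) \\
&= \int_{\partial V(y,x_0)} \lim_{x \to x_0^+} \{g_1(b(x), x, \varepsilon)\, b'(x) + g_2(b(x), x, \varepsilon)\}\, d\mu^{M-1}_{y,x_0}(\varepsilon) - A(y, x_0) \\
&= \int_{\partial V(y,x_0)} \{g_1(b(x_0), x_0, \varepsilon)\, b'(x_0^+) + g_2(b(x_0), x_0, \varepsilon)\}\, d\mu^{M-1}_{y,x_0}(\varepsilon) - A(y, x_0).
\end{aligned}
\]

Similarly, taking the limit from the left, we have

\[
\lim_{x \to x_0^-} \frac{\partial}{\partial x} Q(\tau \,|\, x) = \int_{\partial V(y,x_0)} \{g_1(b(x_0), x_0, \varepsilon)\, b'(x_0^-) + g_2(b(x_0), x_0, \varepsilon)\}\, d\mu^{M-1}_{y,x_0}(\varepsilon) - A(y, x_0).
\]

Taking the difference of the right and left limits eliminates $\int_{\partial V(y,x_0)} g_2(b(x_0), x_0, \varepsilon)\, d\mu^{M-1}_{y,x_0}(\varepsilon) - A(y, x_0)$, and thus produces

\[
\lim_{x \to x_0^+} \frac{\partial}{\partial x} Q(\tau \,|\, x) - \lim_{x \to x_0^-} \frac{\partial}{\partial x} Q(\tau \,|\, x) = [b'(x_0^+) - b'(x_0^-)]\, E_{\mu^{M-1}_{y,x_0}}[g_1(b(x_0), x_0, \varepsilon)].
\]

Finally, note that Assumption 1 has $b'(x_0^+) - b'(x_0^-) \ne 0$, and hence we can divide both sides of the above equality by $b'(x_0^+) - b'(x_0^-)$. This gives the desired result.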


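B Appendix for the Large Sample Results

In this appendix, we abbreviate $Q_{Y|X}$ as $Q$ to save space. In addition to the notations introduced in the main text, we define the linear extrapolation error and the estimation errors as follows.

\[
e_i(\tau) = \left[ Q(\tau|x_0) + \sum_{v=1}^p \frac{(x_i - x_0)^v}{v!} \left( \frac{\partial^v Q(\tau|x_0^+)}{\partial x^v} d_i^+ + \frac{\partial^v Q(\tau|x_0^-)}{\partial x^v} d_i^- \right) \right] - Q(\tau|x_i),
\]
\[
u_i(\tau) = y_i - \alpha(\tau) - \beta_1^+(\tau) d_i^+ (x_i - x_0) - \beta_1^-(\tau) d_i^- (x_i - x_0) - \sum_{v=2}^p \frac{(x_i - x_0)^v}{v!} [\beta_v^+(\tau) d_i^+ + \beta_v^-(\tau) d_i^-],
\]
\[
u_i^0(\tau) = y_i - Q(\tau|x_i).
\]

Furthermore, we define the vector of centered and rescaled candidate parameter values as

\[
\phi_n(\tau) = \sqrt{n h_{n,\tau}}
\begin{pmatrix}
\alpha(\tau) - Q(\tau|x_0) \\
h_{n,\tau}\left(\beta_1^+(\tau) - \frac{\partial Q(\tau|x_0^+)}{\partial x}\right) \\
h_{n,\tau}\left(\beta_1^-(\tau) - \frac{\partial Q(\tau|x_0^-)}{\partial x}\right) \\
\vdots \\
(h_{n,\tau}^p/p!)\left(\beta_p^+(\tau) - \frac{\partial^p Q(\tau|x_0^+)}{\partial x^p}\right) \\
(h_{n,\tau}^p/p!)\left(\beta_p^-(\tau) - \frac{\partial^p Q(\tau|x_0^-)}{\partial x^p}\right)
\end{pmatrix} \in \mathbb{R}^{1+2p}.
\]

Although it is not a direct object of interest, the level estimator is denoted by

\[
\hat\alpha(\tau) = \iota_1' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \ldots, \beta_p^+, \beta_p^-) \in \mathbb{R}^{1+2p}} \sum_{i=1}^n K\!\left(\frac{x_i - x_0}{h_{n,\tau}}\right) \rho_\tau\!\left( y_i - \alpha - \sum_{v=1}^p (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} \right),
\]

where $\iota_1 = [1, 0, \ldots, 0]' \in \mathbb{R}^{2p+1}$, $d_u^+ = 1\{u > 0\}$, $d_u^- = 1\{u < 0\}$, $K_{i,n,\tau} = K\!\left(\frac{x_i - x_0}{h_{n,\tau}}\right)$ and

\[
z_{i,n,\tau} = \left[1,\ \left(\tfrac{x_i - x_0}{h_{n,\tau}}\right) d_i^+,\ \left(\tfrac{x_i - x_0}{h_{n,\tau}}\right) d_i^-,\ \ldots,\ \left(\tfrac{x_i - x_0}{h_{n,\tau}}\right)^p d_i^+,\ \left(\tfrac{x_i - x_0}{h_{n,\tau}}\right)^p d_i^-\right]' \in \mathbb{R}^{1+2p}.
\]

With the similarly defined derivative estimators, $\hat\beta_1^+(\tau), \hat\beta_1^-(\tau), \ldots, \hat\beta_p^+(\tau), \hat\beta_p^-(\tau)$, we now define

\[
\hat\phi_n(\tau) = \sqrt{n h_{n,\tau}}
\begin{pmatrix}
\hat\alpha(\tau) - Q(\tau|x_0) \\
h_{n,\tau}\left(\hat\beta_1^+(\tau) - \frac{\partial Q(\tau|x_0^+)}{\partial x}\right) \\
h_{n,\tau}\left(\hat\beta_1^-(\tau) - \frac{\partial Q(\tau|x_0^-)}{\partial x}\right) \\
\vdots \\
(h_{n,\tau}^p/p!)\left(\hat\beta_p^+(\tau) - \frac{\partial^p Q(\tau|x_0^+)}{\partial x^p}\right) \\
(h_{n,\tau}^p/p!)\left(\hat\beta_p^-(\tau) - \frac{\partial^p Q(\tau|x_0^-)}{\partial x^p}\right)
\end{pmatrix}.
\]

Furthermore, the following notations will be used in the subsequent proofs.

\[
V_{n,\tau}(\phi_n(\tau)) = \sum_{i=1}^n \rho_\tau\!\left(u_i^0(\tau) - e_i(\tau) - (n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n(\tau)\right) K_{i,n,\tau} - \sum_{i=1}^n \rho_\tau\!\left(u_i^0(\tau) - e_i(\tau)\right) K_{i,n,\tau},
\]
\[
S_n(\tau, \phi_n(\tau), e_i(\tau)) = (n h_n)^{-1/2} \sum_{i=1}^n \left\{ P\!\left(u_i^0(\tau) \le e_i(\tau) + (n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n(\tau) \,\middle|\, x_i\right) - 1\!\left(u_i^0(\tau) \le e_i(\tau) + (n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n(\tau)\right) \right\} z_{i,n,\tau} K_{i,n,\tau}.
\]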

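B.1 Uniform Bahadur Representation

The following lemma states the uniform Bahadur representation from Qu and Yoon (2015a, Theorem 1) adapted to our framework.

Lemma 1. Under Assumption 6, we have

\[
\sqrt{n h_{n,\tau}^3}
\begin{pmatrix}
\hat\beta_1^+(\tau) - \dfrac{\partial Q(\tau|x_0^+)}{\partial x} - h_{n,\tau}^p \dfrac{\iota_2' N^{-1}}{(p+1)!} \displaystyle\int_{\mathbb{R}} \left(\dfrac{\partial^{p+1} Q(\tau|x_0^+)}{\partial x^{p+1}} d_u^+ + \dfrac{\partial^{p+1} Q(\tau|x_0^-)}{\partial x^{p+1}} d_u^-\right) u^{p+1}\, \bar u\, K(u)\, du \\[2ex]
\hat\beta_1^-(\tau) - \dfrac{\partial Q(\tau|x_0^-)}{\partial x} - h_{n,\tau}^p \dfrac{\iota_3' N^{-1}}{(p+1)!} \displaystyle\int_{\mathbb{R}} \left(\dfrac{\partial^{p+1} Q(\tau|x_0^+)}{\partial x^{p+1}} d_u^+ + \dfrac{\partial^{p+1} Q(\tau|x_0^-)}{\partial x^{p+1}} d_u^-\right) u^{p+1}\, \bar u\, K(u)\, du
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\iota_2' N^{-1} \sum_{i=1}^n z_{i,n,\tau} K_{i,n,\tau} (\tau - 1\{y_i \le Q(\tau|x_i)\})}{\sqrt{n h_{n,\tau}}\, f_X(x_0)\, f_{Y|X}(Q(\tau|x_0)|x_0)} + o_p(1) \\[2ex]
\dfrac{\iota_3' N^{-1} \sum_{i=1}^n z_{i,n,\tau} K_{i,n,\tau} (\tau - 1\{y_i \le Q(\tau|x_i)\})}{\sqrt{n h_{n,\tau}}\, f_X(x_0)\, f_{Y|X}(Q(\tau|x_0)|x_0)} + o_p(1)
\end{pmatrix}
\]

uniformly in $\tau \in T$.

For this lemma, we mostly follow the proof of Theorem 1 in Qu and Yoon (2015a). The major difference is that we define our $z_{i,n,\tau}$, $\phi_n(\tau)$, $u_i^0(\tau)$, $u_i(\tau)$ and $e_i(\tau)$ slightly differently in order to incorporate the constraint $Q(\tau|x_0^-) = Q(\tau|x_0) = Q(\tau|x_0^+)$.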


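Proof. We prove the lemma in three steps.

Step 1. Let $K_n = \log^{1/2}(n h_n)$ and $\Phi_n = \{(\tau, \phi_n(\tau)) : \tau \in T, \|\phi_n(\tau)\| \le \log^{1/2}(n h_n)\}$. In this step, we show that $\hat\phi_n(\tau) \in \Phi_n$ with probability $1 - o(1)$. Since $V_{n,\tau}(0) = 0$, $V_{n,\tau}(\hat\phi_n(\tau)) \le 0$ always holds for $\tau \in T$ and any n. We now claim that for each $\epsilon > 0$, there exist some finite constants $N_0$ and $\eta > 0$ independent of the quantile such that if $\|\phi_n(\tau)\| \ge K_n$ for some τ, then

\[
P\left( V_{n,\tau}(\phi_n(\tau)) > \eta K_n^2 \right) > 1 - \epsilon
\]

holds for $n \ge N_0$. It then suffices to show

\[
P\left( \inf_{\|\phi_n\| \ge K_n} \inf_{\tau \in T} V_{n,\tau}(\phi_n) > \eta K_n^2 \right) > 1 - \epsilon
\]

for all $n \ge N_0$. By convexity of $V_{n,\tau}(\cdot)$, it holds that $V_{n,\tau}(\gamma \phi_n) - V_{n,\tau}(0) \ge \gamma (V_{n,\tau}(\phi_n) - V_{n,\tau}(0))$ for any $\gamma > 1$. Thus, it suffices to show for all $n \ge N_0$

\[
P\left( \inf_{\|\phi_n\| = K_n} \inf_{\tau \in T} V_{n,\tau}(\phi_n) > \eta K_n^2 \right) > 1 - \epsilon.
\]

Using Knight's (1998) decomposition, we write $V_{n,\tau} = W_{n,\tau} + Z_{n,\tau}$ with the summands defined by

\[
W_{n,\tau}(\phi_n) = -(n h_{n,\tau})^{-1/2} \sum_{i=1}^n K_{i,n,\tau}\, \psi_\tau(u_i^0(\tau) - e_i(\tau))\, z_{i,n,\tau}' \phi_n \quad \text{and}
\]
\[
Z_{n,\tau}(\phi_n) = \sum_{i=1}^n K_{i,n,\tau} \int_0^{(n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n} \{1(u_i^0(\tau) - e_i(\tau) \le s) - 1(u_i^0(\tau) - e_i(\tau) \le 0)\}\, ds,
\]

where $\psi_\tau(u) = \tau - 1(u < 0)$. Therefore, we obtain

\[
\inf_{\|\phi_n\| = K_n} \inf_{\tau \in T} K_n^{-2} V_{n,\tau}(\phi_n) \ge \inf_{\|\phi_n\| = K_n} \inf_{\tau \in T} K_n^{-2} Z_{n,\tau}(\phi_n) - \sup_{\|\phi_n\| = K_n} \sup_{\tau \in T} |K_n^{-2} W_{n,\tau}(\phi_n)| = (I) - (II).
\]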
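The decomposition used above rests on Knight's identity, $\rho_\tau(u - v) - \rho_\tau(u) = -v\,\psi_\tau(u) + \int_0^v \{1(u \le s) - 1(u \le 0)\}\, ds$, which holds for $u \ne 0$ when $\psi_\tau(u) = \tau - 1(u < 0)$. The following Python sketch checks the identity numerically using a closed form of the integral term. The function names are chosen for this illustration only, and the check is a sanity test rather than a part of the proof.

```python
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def psi(u, tau):
    """psi_tau(u) = tau - 1{u < 0}."""
    return tau - (1.0 if u < 0 else 0.0)


def knight_remainder(u, v):
    """Closed form of the integral of 1{u <= s} - 1{u <= 0} over s from 0 to v."""
    if v > 0 and 0 < u < v:
        return v - u
    if v < 0 and v < u <= 0:
        return u - v
    return 0.0


def knight_gap(u, v, tau):
    """Difference between the two sides of Knight's identity."""
    lhs = check_loss(u - v, tau) - check_loss(u, tau)
    rhs = -v * psi(u, tau) + knight_remainder(u, v)
    return lhs - rhs


if __name__ == "__main__":
    rng = random.Random(0)
    worst = max(
        abs(knight_gap(rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(0.05, 0.95)))
        for _ in range(10000)
    )
    print("largest absolute gap:", worst)
```

Applying the identity with $u = u_i^0(\tau) - e_i(\tau)$ and $v = (n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n$, weighting by $K_{i,n,\tau}$, and summing over i gives the two summands: the linear term corresponds to $W_{n,\tau}$ and the nonnegative integral term to $Z_{n,\tau}$.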

First, (II) is bounded as

\[
(II) \le K_n^{-1} \sup_{\tau \in T} \left\| (n h_{n,\tau})^{-1/2} \sum_{i=1}^n \{\psi_\tau(u_i^0(\tau) - e_i(\tau)) - \psi_\tau(u_i^0(\tau))\}\, z_{i,n,\tau}' K_{i,n,\tau} \right\| + K_n^{-1} \sup_{\tau \in T} \left\| (n h_{n,\tau})^{-1/2} \sum_{i=1}^n \psi_\tau(u_i^0(\tau))\, z_{i,n,\tau}' K_{i,n,\tau} \right\|.
\]

Lemma 8 implies the first term is $O_p(K_n^{-1}) = o_p(1)$. An application of the Lindeberg-Feller CLT for each fixed τ combined with stochastic equicontinuity from Lemma 5 shows that the second term is $O_p(K_n^{-1}) = o_p(1)$.

Next, we show that (I) is strictly positive with probability $1 - o(1)$. First note that $Z_{n,\tau}(\phi_n)$ is nonnegative and satisfies

\[
\int_0^{(n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n} \{1(u_i^0(\tau) - e_i(\tau) \le s) - 1(u_i^0(\tau) - e_i(\tau) \le 0)\}\, ds \ge (n h_{n,\tau})^{-1/2} \frac{z_{i,n,\tau}' \phi_n}{2} \left\{ 1\!\left(u_i^0(\tau) - e_i(\tau) \le (n h_{n,\tau})^{-1/2} \frac{z_{i,n,\tau}' \phi_n}{2}\right) - 1\!\left(u_i^0(\tau) - e_i(\tau) \le 0\right) \right\}
\]

following the same argument as in Lemma A.1 of Oka and Qu (2011). Applying the inequality to $Z_{n,\tau}$ yields

\[
\begin{aligned}
K_n^{-2} Z_{n,\tau}(\phi_n) \ge{} & K_n^{-2} \sqrt{\frac{n h_n}{n h_{n,\tau}}}\, \frac{\phi_n'}{2} \{S_n(\tau, 0, e_i(\tau)) - S_n(\tau, \phi_n/2, e_i(\tau))\} \\
& + K_n^{-2} (n h_{n,\tau})^{-1/2} \frac{\phi_n'}{2} \sum_{i=1}^n \left\{ P\!\left(u_i^0(\tau) - e_i(\tau) \le (n h_{n,\tau})^{-1/2} \frac{z_{i,n,\tau}' \phi_n}{2} \,\middle|\, x_i\right) - P\!\left(u_i^0(\tau) - e_i(\tau) \le 0 \,\middle|\, x_i\right) \right\} z_{i,n,\tau} K_{i,n,\tau} \\
={} & (III) + (IV).
\end{aligned}
\]

Note that $(III) = O_p(K_n^{-1}) = o_p(1)$ because of Lemma 7, $\|\phi_n\| = K_n$ and $\frac{n h_{n,\tau}}{n h_n} = O(1)$. The mean value theorem implies

\[
(IV) = \frac{1}{4} K_n^{-2} (n h_{n,\tau})^{-1} \sum_{i=1}^n f_{Y|X}(\tilde y_i | x_i)\, K_{i,n,\tau}\, \phi_n' z_{i,n,\tau} z_{i,n,\tau}' \phi_n,
\]

for some $\tilde y_i$ lying between $Q(\tau|x_i) + e_i(\tau)$ and $Q(\tau|x_i) + e_i(\tau) + (n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n / 2$. Note that xi is vanishing in a neighborhood of xi. Furthermore, $\tilde y_i \to Q(\tau|x_i)$ as $n \to \infty$. Assumption 6 (ii)(b) suggests $f_{Y|X}(\tilde y_i | x_i) \ge f_L$ asymptotically. Therefore,

\[
(IV) \ge \frac{1}{4} K_n^{-2} f_L\, \phi_n' \left( (n h_{n,\tau})^{-1} \sum_{i=1}^n K_{i,n,\tau}\, z_{i,n,\tau} z_{i,n,\tau}' \right) \phi_n
\]

uniformly on T for n large enough. By Assumption 6 (iv), we have $(IV) \ge \frac{1}{8} f_L \lambda_{min}(\tau)$ with probability $1 - o(1)$ on T, where $\lambda_{min}(\tau) > 0$ is the minimum eigenvalue of $f_X(x_0) N$. Thus we have shown that (I) is strictly positive uniformly in τ with probability going to one.

Step 2. We now focus on the behavior of the subgradient

\[
(\text{subgradient}) = \sum_{i=1}^n \{\tau - 1(u_i^0(\tau) \le e_i(\tau) + (n h_{n,\tau})^{-1/2} z_{i,n,\tau}' \hat\phi_n(\tau))\}\, z_{i,n,\tau} K_{i,n,\tau}
\]


on the set Φn . Theorem 2.1 of Koenker (2005) and Assumption 6 (iv) imply (nhn )−1/2 ·(subgradient) = Op ((nhn )−1/2 ) uniformly in τ ∈ T. Following Qu and Yoon (2015a), we can rewrite the subgradient (scaled by (nhn )−1/2 ) as (nhn )−1/2

n X 0 cn (τ ))}zi,n,τ Ki,n,τ {τ − 1(u0i (τ ) ≤ ei (τ ) + (nhn,τ )−1/2 zi,n,τ φ i=1

cn (τ ), ei (τ )) − Sn (τ, 0, ei (τ ))} + {Sn (τ, 0, ei (τ )) − Sn (τ, 0, 0)} + Sn (τ, 0, 0) ={Sn (τ, φ +(nhn )−1/2

n X 0 cn (τ )|xi )}zi,n,τ Ki,n,τ {τ − P ((u0i (τ ) ≤ ei (τ ) + (nhn,τ )−1/2 zi,n,τ φ i=1

The differences inside the first two pairs of curly brackets are of order op (1) on the set Φn by Lemma 7. The conditional probability in the last term is a conditional CDF of Y |X. Applying the first order mean value expansion to the last term at y = Q(τ |xi ) yields (nhn )−1/2

n X 0 cn (τ )|xi )}zi,n,τ Ki,n,τ {τ − P ((u0i (τ ) ≤ ei (τ ) + (nhn,τ )−1/2 zi,n,τ φ i=1

= − (nhn )−1/2

n X i=1

fY |X (˜ yni |xi )ei (τ )zi,n,τ Ki,n,τ

− (nhn )−1/2 (nhn,τ )−1/2

n X i=1

 0 cn (τ ), fY |X (˜ yni |xi )Ki,n,τ zi,n,τ zi,n,τ φ

0 cn (τ ). where y˜ni lies between Q(τ |xi ) and Q(τ |xi ) + ei (τ ) + (nhn,τ )−1/2 zi,n,τ φ

Taking the above auxiliary results together, we can now rewrite the subgradient (scaled by $(nh_n)^{-1/2}$) as

$$S_n(\tau, 0, 0) - (nh_n)^{-1/2} \sum_{i=1}^n f_{Y|X}(\tilde y_{ni}|x_i)\, e_i(\tau)\, z_{i,n,\tau} K_{i,n,\tau} - (nh_n)^{-1/2} (nh_{n,\tau})^{-1/2} \sum_{i=1}^n f_{Y|X}(\tilde y_{ni}|x_i)\, K_{i,n,\tau}\, z_{i,n,\tau} z_{i,n,\tau}'\, \hat\phi_n(\tau).$$

Recall that this subgradient (scaled by $(nh_n)^{-1/2}$) is $o_p(1)$ uniformly in $\tau \in T$. Note that, since $(nh_{n,\tau})^{-1} \sum_{i=1}^n f_{Y|X}(\tilde y_{ni}|x_i) K_{i,n,\tau} z_{i,n,\tau} z_{i,n,\tau}' \xrightarrow{P} f_{Y|X}(Q(\tau|x_0)|x_0) f_X(x_0) N$ uniformly in $\tau \in T$, we have

$$S_n(\tau, 0, 0) - (nh_n)^{-1/2} \sum_{i=1}^n f_{Y|X}(\tilde y_{ni}|x_i)\, e_i(\tau)\, z_{i,n,\tau} K_{i,n,\tau} = \Big(\frac{h_{n,\tau}}{h_n}\Big)^{1/2} \big( f_{Y|X}(Q(\tau|x_0)|x_0) f_X(x_0) N + o_p(1) \big)\, \hat\phi_n(\tau) + o_p(1)$$

uniformly in $\tau \in T$. Since $N$ is positive definite and $f_{Y|X}(Q(\tau|x_0)|x_0) f_X(x_0) > 0$ by Assumption 6(i)(a),(ii)(b), we obtain

$$\hat\phi_n(\tau) = \big( f_X(x_0) f_{Y|X}(Q(\tau|x_0)|x_0) N + o_p(1) \big)^{-1} \times \Big\{ \Big(\frac{h_n}{h_{n,\tau}}\Big)^{1/2} S_n(\tau, 0, 0) - (nh_{n,\tau})^{-1/2} f_{Y|X}(Q(\tau|x_0)|x_0) \sum_{i=1}^n e_i(\tau)\, z_{i,n,\tau} K_{i,n,\tau} + o_p(1) \Big\} \tag{B.1}$$

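uniformly in $\tau \in T$.

Step 3. We finally obtain the uniform Bahadur representation. Under Assumption 6(iii)(b),(iv), for any $x_i$ such that $(x_i - x_0)/h_{n,\tau} \in \mathrm{supp}(K)$, we have

$$e_i(\tau) = -\frac{1}{(p+1)!} \Big(\frac{x_i - x_0}{h_{n,\tau}}\Big)^{p+1} \Big( \frac{\partial^{p+1} Q(\tau|x_0^+)}{\partial x^{p+1}}\, d_i^+ + \frac{\partial^{p+1} Q(\tau|x_0^-)}{\partial x^{p+1}}\, d_i^- \Big) h_{n,\tau}^{p+1} + o(h_{n,\tau}^{p+1})$$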

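uniformly in $\tau \in T$. Also note that we have

$$(nh_{n,\tau})^{-1/2} \sum_{i=1}^n e_i(\tau)\, z_{i,n,\tau} K_{i,n,\tau} \xrightarrow{P} -\sqrt{nh_{n,\tau}^{2p+3}}\; \frac{f_X(x_0)}{(p+1)!} \int_{\mathbb{R}} \Big( \frac{\partial^{p+1} Q(\tau|x_0^+)}{\partial x^{p+1}}\, d_u^+ + \frac{\partial^{p+1} Q(\tau|x_0^-)}{\partial x^{p+1}}\, d_u^- \Big)\, \bar u\, u^{p+1} K(u)\, du$$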

uniformly in $\tau \in T$. Applying this convergence result and substituting $S_n(\tau, 0, 0)$ into equation (B.1), we have

$$\hat\phi_n(\tau) = \underbrace{\frac{N^{-1} (nh_{n,\tau})^{-1/2} \sum_{i=1}^n (\tau - 1\{y_i \le Q(\tau|x_i)\})\, z_{i,n,\tau} K_{i,n,\tau}}{f_X(x_0) f_{Y|X}(Q(\tau|x_0)|x_0)}}_{\text{Stochastic Term}} + \underbrace{\sqrt{nh_{n,\tau}^{2p+3}}\; \frac{N^{-1}}{(p+1)!} \int_{\mathbb{R}} \Big( \frac{\partial^{p+1} Q(\tau|x_0^+)}{\partial x^{p+1}}\, d_u^+ + \frac{\partial^{p+1} Q(\tau|x_0^-)}{\partial x^{p+1}}\, d_u^- \Big)\, \bar u\, u^{p+1} K(u)\, du}_{\text{Bias Term}} + o_p(1)$$

uniformly in $\tau \in T$. The first term on the right-hand side is the pivotal part (stochastic term) and the second is the bias term. Premultiplying both sides by $[\iota_2, \iota_3]'$ then gives the desired result.

B.2 Proof of Theorem 2

Proof. Lemma 1 and Assumption 6 (iii)(b), (v) imply $Z_n = X_n + o_p(1)$ uniformly in $(\tau, j)$, where

$$X_n(\tau, j) = \sum_{i=1}^n W_{ni}(\tau, j), \qquad W_{ni}(\tau, j) = \frac{\iota_j' N^{-1} z_{i,n,\tau} K_{i,n,\tau} (\tau - 1\{y_i \le Q(\tau|x_i)\})}{\sqrt{nh_{n,\tau}}\, f_X(x_0) f_{Y|X}(Q(\tau|x_0)|x_0)}.$$

Therefore, in light of Theorem 18.10 (iv), Theorem 18.14, and Lemma 18.15 of van der Vaart (1998), it suffices to show that the leading term $X_n(\tau, j)$ in the uniform Bahadur representation is asymptotically tight and has finite dimensional convergence in distribution to the Gaussian distribution with the proposed covariance function. The finite dimensional convergence follows from the multivariate version of the Lindeberg-Feller CLT (van der Vaart, 1998, Proposition 2.27).

For each $m \in \mathbb{N}$, fix any $T' = (\tau_1, ..., \tau_m)$, $\tau_k \in T$, and any $J' = (j_1, ..., j_m)$, $j_k \in \{2, 3\}$, for all $k \in \{1, ..., m\}$, then denote $Y_{ni} = Y_{ni}(T', J') = [W_{ni}(\tau_1, j_1), ..., W_{ni}(\tau_m, j_m)]'$, an $m$-dimensional random vector. We will verify that

1. $\sum_{i=1}^n E[\|Y_{ni}\|^2\, 1\{\|Y_{ni}\| > \varepsilon\}] \to 0$ for all $\varepsilon > 0$; and

2. $\sum_{i=1}^n \mathrm{Cov}(Y_{ni}) \to \Sigma$.

Then the multivariate Lindeberg-Feller CLT implies that $\sum_{i=1}^n (Y_{ni} - EY_{ni})$ converges in distribution to $N(0, \Sigma)$. Also, by the law of iterated expectations, $E[W_{ni}(\tau, j)] = 0$ for all $(\tau, j)$, and thus $EY_{ni} = 0$. For calculation of the covariance, pick any $\tau_1, \tau_2 \in T'$ and $j_1, j_2 \in J'$. Under Assumption 6(i)(ii)(iv)(v), the law of iterated expectations implies that

$$\begin{aligned}
E X_n(\tau_1, j_1) X_n(\tau_2, j_2)
&= E \sum_{i=1}^n \frac{\iota_{j_1}' N^{-1} z_{i,n,\tau_1} z_{i,n,\tau_2}' N^{-1} \iota_{j_2}\, K_{i,n,\tau_1} K_{i,n,\tau_2}\, (\tau_1 - 1\{y_i \le Q(\tau_1|x_i)\})(\tau_2 - 1\{y_i \le Q(\tau_2|x_i)\})}{nh_n \sqrt{c(\tau_1) c(\tau_2)}\, f_X^2(x_0) f_{Y|X}(Q(\tau_1|x_0)|x_0) f_{Y|X}(Q(\tau_2|x_0)|x_0)} \\
&= E \sum_{i=1}^n \frac{\iota_{j_1}' N^{-1} z_{i,n,\tau_1} z_{i,n,\tau_2}' N^{-1} \iota_{j_2}\, K_{i,n,\tau_1} K_{i,n,\tau_2}\, E[(\tau_1 - 1\{y_i \le Q(\tau_1|x_i)\})(\tau_2 - 1\{y_i \le Q(\tau_2|x_i)\}) \,|\, x_i]}{nh_n \sqrt{c(\tau_1) c(\tau_2)}\, f_X^2(x_0) f_{Y|X}(Q(\tau_1|x_0)|x_0) f_{Y|X}(Q(\tau_2|x_0)|x_0)} \\
&= E\, \frac{\iota_{j_1}' N^{-1} z_{i,n,\tau_1} z_{i,n,\tau_2}' N^{-1} \iota_{j_2}\, K_{i,n,\tau_1} K_{i,n,\tau_2}\, (\tau_1 \wedge \tau_2 - \tau_1 \tau_2)}{h_n \sqrt{c(\tau_1) c(\tau_2)}\, f_X^2(x_0) f_{Y|X}(Q(\tau_1|x_0)|x_0) f_{Y|X}(Q(\tau_2|x_0)|x_0)} \\
&= \frac{\iota_{j_1}' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_{j_2}\, (\tau_1 \wedge \tau_2 - \tau_1 \tau_2)}{f_X(x_0) f_{Y|X}(Q(\tau_1|x_0)|x_0) f_{Y|X}(Q(\tau_2|x_0)|x_0)} + O(h_n) = O(1).
\end{aligned}$$
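As a short check of the conditional expectation used in the covariance calculation, write $Q_k = Q(\tau_k|x_i)$ and suppose, as an expository assumption, that the conditional distribution of $y_i$ given $x_i$ is continuous, so that $P(y_i \le Q_k \,|\, x_i) = \tau_k$. Since $Q(\cdot|x_i)$ is nondecreasing, $1\{y_i \le Q_1\} 1\{y_i \le Q_2\} = 1\{y_i \le Q_1 \wedge Q_2\}$, and expanding the product gives

$$\begin{aligned}
E[(\tau_1 - 1\{y_i \le Q_1\})(\tau_2 - 1\{y_i \le Q_2\}) \,|\, x_i]
&= \tau_1 \tau_2 - \tau_1 P(y_i \le Q_2 | x_i) - \tau_2 P(y_i \le Q_1 | x_i) + P(y_i \le Q_1 \wedge Q_2 | x_i) \\
&= \tau_1 \tau_2 - \tau_1 \tau_2 - \tau_1 \tau_2 + \tau_1 \wedge \tau_2 \\
&= \tau_1 \wedge \tau_2 - \tau_1 \tau_2 .
\end{aligned}$$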

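Therefore, $\sum_{i=1}^n E[Y_{ni} Y_{ni}'] \to \Sigma$. It now remains to verify the Lindeberg-Feller condition, i.e., for any $\varepsilon > 0$,

$$\lim_{n \to \infty} \sum_{i=1}^n E\Big[ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k)\; 1\Big\{ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) > \varepsilon^2 \Big\} \Big] = 0.$$

Note that, for any $\varepsilon > 0$, the sum inside the limit can be bounded by a Lyapunov type condition:

$$\begin{aligned}
\sum_{i=1}^n E\Big[ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k)\; 1\Big\{ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) > \varepsilon^2 \Big\} \Big]
&\le \sum_{i=1}^n \frac{1}{\varepsilon} E\Big[ \Big( \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) \Big)^{3/2} 1\Big\{ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) > \varepsilon^2 \Big\} \Big] \\
&\le \sum_{i=1}^n \frac{1}{\varepsilon} E\Big[ \Big( \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) \Big)^{3/2} \Big].
\end{aligned} \tag{B.2}$$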
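The first inequality in this Lyapunov-type bound can be spelled out in one line. Writing $S_{ni} = \sum_{k=1}^m W_{ni}^2(\tau_k, j_k)$ (a shorthand introduced here only for this remark), on the event $\{S_{ni} > \varepsilon^2\}$ we have $S_{ni}^{1/2}/\varepsilon > 1$, so that

$$S_{ni}\, 1\{S_{ni} > \varepsilon^2\} \le S_{ni} \cdot \frac{S_{ni}^{1/2}}{\varepsilon}\, 1\{S_{ni} > \varepsilon^2\} = \frac{1}{\varepsilon} S_{ni}^{3/2}\, 1\{S_{ni} > \varepsilon^2\} \le \frac{1}{\varepsilon} S_{ni}^{3/2}.$$

Taking expectations and summing over $i$ gives the two inequalities, so it suffices to control the third absolute moment term.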

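Since, under Assumption 6 (i)(a) and (ii)(b), all entries of $N$ are finite, $K$ has bounded support, $|\tau - 1(y_i \le Q(\tau|x_i))| \le 1$, and the densities are bounded away from zero, for all $\tau$ and $j$ there exist finite constants $c_{j,l}$ (independent of $n$ and $\tau$) and $C'$ (independent of $n$, $\tau$ and $j$) such that

$$\begin{aligned}
|W_{ni}(\tau, j)| &= \left| \frac{\Big( c_{j,1} + c_{j,2} \big(\frac{x_i - x_0}{h_{n,\tau}}\big) d_i^+ + ... + c_{j,2p+1} \big(\frac{x_i - x_0}{h_{n,\tau}}\big)^p d_i^- \Big) K\big(\frac{x_i - x_0}{h_{n,\tau}}\big) (\tau - 1\{y_i \le Q(\tau|x_i)\})}{\sqrt{nh_{n,\tau}}\, f_X(x_0) f_{Y|X}(Q(\tau|x_0)|x_0)} \right| \\
&\le \frac{C'}{\sqrt{nh_n}} K\Big( \frac{x_i - x_0}{c h_n} \Big).
\end{aligned}$$

Thus for $C'' = m^{3/2} (C')^2$, we have

$$\begin{aligned}
E\Big[ \Big( \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) \Big)^{3/2} \Big]
&= E\Big[ \Big( \sum_{k=1}^m \Big( \frac{\iota_{j_k}' N^{-1} z_{i,n,\tau_k} K_{i,n,\tau_k} (\tau_k - 1\{y_i \le Q(\tau_k|x_i)\})}{\sqrt{nh_{n,\tau_k}}\, f_X(x_0) f_{Y|X}(Q(\tau_k|x_0)|x_0)} \Big)^2 \Big)^{3/2} \Big] \\
&\le \frac{C''}{(nh_n)^{3/2}} \int K^3\Big( \frac{x_i - x_0}{c h_n} \Big) f_X(x_i)\, dx_i \\
&\le \frac{C'' h_n}{(nh_n)^{3/2}} \int K^3(u/c) (f_X(x_0) - u h_n M)\, du \\
&= O\Big( \frac{1}{n^{3/2} h_n^{1/2}} \Big),
\end{aligned}$$

where $M$ is a finite constant coming from the local Lipschitz continuity of $f_X$ at $x_0$, implied by the continuous differentiability at $x_0$, and the fact that $K$ has a bounded support from Assumption 6(i)(iv). Under Assumption 6(i)(b),(v), by this intermediate result and identical distribution of data for each sample size $n$, equation (B.2) yields

$$\sum_{i=1}^n E\Big[ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k)\; 1\Big\{ \sum_{k=1}^m W_{ni}^2(\tau_k, j_k) > \varepsilon^2 \Big\} \Big] = O\Big( \frac{1}{\varepsilon \sqrt{nh_n}} \Big) = o(1).$$

This establishes the finite dimensional asymptotic normality, namely that $\sum_{i=1}^n Y_{ni}$ converges in distribution to $N(0, \Sigma)$.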

The tightness follows because the denominator is bounded away from zero by Assumption 6 (i), (ii), and because the numerator is tight by Lemma 5.

B.3 Proof of Corollary 2

The result follows from an application of the delta method under Assumption 1. That the limiting distribution in Theorem 2 is zero-mean Gaussian implies that the limiting distribution

$$\frac{G(\tau, 2) - G(\tau, 3)}{b'(x_0^+) - b'(x_0^-)}$$

is also zero-mean Gaussian. The covariance is obtained by

$$\begin{aligned}
&E\Big[ \Big( \frac{G(\tau_1, 2) - G(\tau_1, 3)}{b'(x_0^+) - b'(x_0^-)} \Big) \Big( \frac{G(\tau_2, 2) - G(\tau_2, 3)}{b'(x_0^+) - b'(x_0^-)} \Big) \Big] \\
&= \frac{1}{(b'(x_0^+) - b'(x_0^-))^2} E[G(\tau_1, 2) G(\tau_2, 2) + G(\tau_1, 3) G(\tau_2, 3) - G(\tau_1, 2) G(\tau_2, 3) - G(\tau_1, 3) G(\tau_2, 2)] \\
&= \frac{(\tau_1 \wedge \tau_2 - \tau_1 \tau_2)}{(b'(x_0^+) - b'(x_0^-))^2} \times \frac{\iota_2' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_2 + \iota_3' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_3 - \iota_2' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_3 - \iota_3' N^{-1} T(\tau_1, \tau_2) N^{-1} \iota_2}{f_X(x_0) f_{Y|X}(Q_{Y|X}(\tau_1|x_0)|x_0) f_{Y|X}(Q_{Y|X}(\tau_2|x_0)|x_0)}
\end{aligned}$$

for each $\tau_1, \tau_2 \in T$, where the last equality follows from the covariance expression of $G$ derived in Theorem 2.

B.4 Proof of Corollary 4

Proof. We focus on the non-standardized tests since results for the standardized tests will follow from those for the non-standardized ones through Slutsky's Theorem under the stated assumptions that $\sigma^s$, $\sigma^h$ are bounded away from zero uniformly on $T$. Part (i) of the corollary follows from Corollaries 2 and 3. Part (ii) of the corollary follows by an application of the functional delta method (van der Vaart, 1998; Theorem 20.8) with Corollaries 2 and 3. Finally, note that the functional $\phi: g \mapsto g - |T|^{-1} \int_T g\, d\tau$ is linear with an operator norm bounded by 2, and is therefore Hadamard differentiable at QRKD tangentially to $\ell^\infty(T)$.
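The bound on the operator norm can be verified directly. For any $g \in \ell^\infty(T)$ for which the integral is defined, with $\|\cdot\|_\infty$ denoting the supremum norm over $T$ (notation introduced here for this remark only), the triangle inequality gives

$$\Big\| g - |T|^{-1} \int_T g\, d\tau \Big\|_\infty \le \|g\|_\infty + |T|^{-1} \int_T |g(\tau)|\, d\tau \le \|g\|_\infty + \|g\|_\infty = 2 \|g\|_\infty .$$

A bounded linear map coincides with its own Hadamard derivative at every point, which is why no further smoothness argument is needed for this centering functional.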


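B.5 Sufficient Primitive Conditions

In this section, we present primitive conditions for the high-level statements in parts (ii) and (iii) of Assumption 6. We introduce the short-hand notations

$$\begin{aligned}
f_1(y, x) &= \int_{\partial V(y,x)} \frac{h_x(x, \varepsilon)}{\|\nabla_\varepsilon h(x, \varepsilon)\|} f_{\varepsilon|X}(\varepsilon \,|\, x)\, dH^{M-1}(\varepsilon), \\
f_2(y, x) &= \int_{V(y,x)} \frac{\partial}{\partial x} f_{\varepsilon|X}(\varepsilon \,|\, x)\, dm^M(\varepsilon), \\
f_3(y, x) &= \int_{\partial V(y,x)} \frac{1}{\|\nabla_\varepsilon h(x, \varepsilon)\|} f_{\varepsilon|X}(\varepsilon \,|\, x)\, dH^{M-1}(\varepsilon),
\end{aligned}$$

where $h$, $V$ and $\partial V$ are defined in Section 2.2. We show that Assumption 8 stated below in terms of the structural primitives is sufficient for the aforementioned high-level conditions in parts (ii) and (iii) of Assumption 6. Define

$$y_* = \inf_{(\tau,x) \in T \times [\underline{x}, \overline{x}]} \inf\Big\{ y \in \mathcal{Y} \,\Big|\, \int_{V(y,x)} f_{\varepsilon|X}(\varepsilon|x)\, dm^M(\varepsilon) \ge \tau \Big\} \quad \text{and} \quad y^* = \sup_{(\tau,x) \in T \times [\underline{x}, \overline{x}]} \inf\Big\{ y \in \mathcal{Y} \,\Big|\, \int_{V(y,x)} f_{\varepsilon|X}(\varepsilon|x)\, dm^M(\varepsilon) \ge \tau \Big\}.$$

Assumption 8.

(i) $\frac{\partial^q}{\partial x^j \partial y^{q-j}} f_1$ and $\frac{\partial^q}{\partial x^j \partial y^{q-j}} f_2$ exist for each $0 \le j, q \le p - 1$, $j + q \le p - 1$, and are Lipschitz continuous on $[y_*, y^*] \times [\underline{x}, x_0)$ and $[y_*, y^*] \times (x_0, \overline{x}]$.

(ii) $f_3$ is Lipschitz continuous on $[y_*, y^*] \times [\underline{x}, \overline{x}]$. $\frac{\partial^q}{\partial x^j \partial y^{q-j}} f_3$ exists for each $0 \le j \le q \le p$ and is Lipschitz continuous on $[y_*, y^*] \times [\underline{x}, x_0)$ and $[y_*, y^*] \times (x_0, \overline{x}]$.

(iii) For each $\kappa \in (0, \infty)$, there exist finite positive constants $f_L'(\kappa)$ and $f_U'(\kappa)$ such that $0 < f_L'(\kappa) < f_3(y, x) < f_U'(\kappa) < \infty$ uniformly in $(y, x)$ on $[-\kappa, \kappa] \times [\underline{x}, \overline{x}]$.

(iv) $y_* > -\infty$ and $y^* < \infty$.

(v) For each $\tau \in T$, $x \mapsto \inf\{ y \in \mathcal{Y} \,|\, \int_{V(y,x)} f_{\varepsilon|X}(\varepsilon|x)\, dm^M(\varepsilon) \ge \tau \}$ is $(p-1)$-times differentiable on $[\underline{x}, \overline{x}]$. Furthermore, $\tau \mapsto \inf\{ y \in \mathcal{Y} \,|\, \int_{V(y,x_0)} f_{\varepsilon|X}(\varepsilon|x_0)\, dm^M(\varepsilon) \ge \tau \}$ is Lipschitz continuous on $T$.

(vi) $f_{\varepsilon|X}(\cdot|x_0)$ is Lipschitz continuous.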

Lemma 2. Assumptions 2, 3, 4, 5, and 8 imply parts (ii) and (iii) of Assumption 6.

Proof. Following the proof of Lemma 1 of Sasaki (2015) under Assumptions 2, 3, 4, and 5, we obtain constants $c_j$, $j = 1, 2, 3$, such that $c_3 \ne 0$,

$$\frac{\partial}{\partial x} Q(\tau|x) = -\frac{\frac{\partial}{\partial x} F_{Y|X}(Q(\tau|x)|x)}{f_{Y|X}(Q(\tau|x)|x)} = -\frac{c_1 f_1(Q(\tau|x), x) - c_2 f_2(Q(\tau|x), x)}{c_3 f_3(Q(\tau|x), x)}, \quad \text{and} \tag{B.3}$$

$$\frac{\partial}{\partial \tau} Q(\tau|x) = \frac{1}{f_{Y|X}(Q(\tau|x)|x)} = \frac{1}{c_3 f_3(Q(\tau|x), x)}. \tag{B.4}$$

Assumption 8 (iv) implies $\sup_{(\tau,x) \in T \times [\underline{x}, \overline{x}]} |Q(\tau|x)| < \infty$. Assumption 8 (iii) allows us to pick $\kappa$ large enough to ensure that the denominator $f_3(Q(\tau|x), x)$ is uniformly bounded away from zero. Using these calculations, we argue that Assumption 8 implies parts (ii) and (iii) of Assumption 6.

First, as in the calculation for (B.4) above under Assumptions 2, 3, 4, and 5, we can write

$$f_{Y|X}(Q_{Y|X}(\tau|x_0)|x_0) = c_3 f_3\Big( \inf\Big\{ y \in \mathcal{Y} \,\Big|\, \int_{V(y,x_0)} f_{\varepsilon|X}(\varepsilon|x_0)\, dm^M(\varepsilon) \ge \tau \Big\},\, x_0 \Big).$$

By Assumption 8 (v) and (vi), $f_{Y|X}(Q_{Y|X}(\cdot|x_0)|x_0)$ is Lipschitz continuous on $T$. This shows that Assumption 6 (ii)(a) holds.

Second, set $\kappa = \max\{|y_*|, |y^*|\} + \delta$ for a small $\delta > 0$ and invoke Assumption 8 (iii), (iv), so we have $0 < f_L'(\kappa) < f_{Y|X}(y|x)/c_3 = f_3(y, x) < f_U'(\kappa) < \infty$ uniformly in $(y, x)$ on $[-\kappa, \kappa] \times [\underline{x}, \overline{x}]$. By definitions of $\kappa$, $y_*$, and $y^*$, it holds that $-\kappa < y_* - \delta/2 < y_* \le Q(\tau|x) \le y^* < y^* + \delta/2 < \kappa$ on $T \times [\underline{x}, \overline{x}]$. Take $\xi = \delta/2$, and we have $f_L(\kappa) \le f_{Y|X}(Q_{Y|X}(\tau|x) + \eta \,|\, x) \le f_U(\kappa)$ for all $\tau \in T$, $|\eta| \le \xi$ and $x \in [\underline{x}, \overline{x}]$. This shows that Assumption 6 (ii)(b) holds.

Third, Assumption 8 (v) implies that $Q(\tau|x_0)$ is Lipschitz continuous. For $\frac{\partial}{\partial \tau} Q(\tau|x_0^+)$, since $\lim_{x \to x_0^+} f_3(\cdot|x)$ is uniformly bounded away from zero and is Lipschitz in $y$ by the argument in the second step under Assumption 8 (ii), (iii), and (iv), (B.4) is well defined when it is evaluated at $x = x_0^+$. We can then conclude that it is Lipschitz continuous using the Lipschitz continuity of $Q(\tau|x_0)$, which also follows from Assumption 8 (v). The same reasoning applies to $\frac{\partial}{\partial \tau} Q(\tau|x_0^-)$. This shows that Assumption 6 (iii)(a) holds.

Fourth, Assumption 8 (v) implies that $Q(\tau|\cdot)$ is continuous at $x_0$ for each $\tau \in T$. This shows that the first statement of Assumption 6 (iii)(b) holds. Finally, note that, by Assumption 8 (i)-(v) and by the chain rule,

$$\frac{\partial^2}{\partial x^2} Q(\tau|x) = \frac{[c_1 f_1(Q(\tau|x), x) - c_2 f_2(Q(\tau|x), x)]\, c_3 \frac{\partial}{\partial x} f_3(Q(\tau|x), x) - c_3 f_3(Q(\tau|x), x) \frac{\partial}{\partial x} [c_1 f_1(Q(\tau|x), x) - c_2 f_2(Q(\tau|x), x)]}{c_3^2 f_3^2(Q(\tau|x), x)}$$

exists and is Lipschitz continuous. A similar argument holds for higher order derivatives. This shows that the second statement of Assumption 6 (iii)(b) holds.

B.6 Auxiliary Lemmata

The following lemmata correspond to Lemmas B.1-B.6 of Qu and Yoon (2015a), and the proofs follow theirs very closely. We define our $z_{i,n,\tau}$, $\phi_n(\tau)$, $u_i^0(\tau)$, $u_i(\tau)$ and $e_i(\tau)$ slightly differently to accommodate our setting.

Lemma 3. Suppose that Assumptions 6 (i), (iv) and (v) hold.

(1) For any $\gamma \ge 1$, there exists $B > 0$ such that for any $\tau_1, \tau_2 \in T$ with $\tau_1 \le \tau_2$,

$$E\|z_{i,n,\tau_1} K_{i,n,\tau_1} - z_{i,n,\tau_2} K_{i,n,\tau_2}\|^{2\gamma} \le B h_n (\tau_2 - \tau_1)^{2\gamma}.$$

(2) Let $b_n = (nh_n)^{1/2+\kappa}$ for $\kappa \in (0, 1/2)$ and $h_n \to 0$, $nh_n \to \infty$ as $n \to \infty$. Let $\delta_n = \{(\tau_1, \tau_2) : \tau_1, \tau_2 \in T,\ \tau_1 \le \tau_2 \le \tau_1 + b_n^{-1}\}$. Then,

$$\sup_{(\tau_1,\tau_2) \in \delta_n} (nh_n)^{-1/2} \sum_{i=1}^n \|z_{i,n,\tau_1} K_{i,n,\tau_1} - z_{i,n,\tau_2} K_{i,n,\tau_2}\| = o_p(1).$$

Proof. Without loss of generality, we assume $\mathrm{supp}(K) = D = [-1, 1]$. Note that $(\tau_2 - \tau_1)^v \le (\tau_2 - \tau_1)$ for all $1 \le v \le p$. Thus Assumption 6(iv),(v) suggest that there is a constant $\bar C$ independent of $\tau$ such that

$$\|z_{i,n,\tau_1} K_{i,n,\tau_1} - z_{i,n,\tau_2} K_{i,n,\tau_2}\| \le \bar C\, 1(x_i - x_0 \in \bar c D h_n)(\tau_2 - \tau_1). \tag{B.5}$$

Hence, $E\|z_{i,n,\tau_1} K_{i,n,\tau_1} - z_{i,n,\tau_2} K_{i,n,\tau_2}\|^{2\gamma} \le \bar C^{2\gamma} (\tau_2 - \tau_1)^{2\gamma} P(x_i - x_0 \in \bar c D h_n)$. Assumption 6(i)(a) then implies the existence of a positive constant $A$ such that $P(x_i - x_0 \in \bar c D h_n) \le A h_n$. This shows (1). Equation (B.5) implies

$$(nh_n)^{-1/2} \sum_{i=1}^n \|z_{i,n,\tau_1} K_{i,n,\tau_1} - z_{i,n,\tau_2} K_{i,n,\tau_2}\| \le \bar C (nh_n)^{1/2} (\tau_2 - \tau_1) \Big\{ (nh_n)^{-1} \sum_{i=1}^n 1(x_i - x_0 \in \bar c D h_n) \Big\}.$$

An application of the weak law of large numbers implies that the part in the curly brackets is $O_p(1)$. Finally, $\sup_{(\tau_1,\tau_2) \in \delta_n} \bar C (nh_n)^{1/2} (\tau_2 - \tau_1) \le \bar C (nh_n)^{-\kappa} \to 0$ as $n \to \infty$.

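Lemma 4. Let $b_n = (nh_n)^{1/2+\kappa}$ with $\kappa \in (0, 1/2)$, $h_n \to 0$ and $nh_n \to \infty$ as $n \to \infty$. There exist $\gamma > 1$ and $\bar C < \infty$ such that for any $\tau_1, \tau_2 \in T$ with $|\tau_2 - \tau_1| \ge b_n^{-1}$,

$$E\|S_n(\tau_2, 0, 0) - S_n(\tau_1, 0, 0)\|^{2\gamma} \le \bar C |\tau_2 - \tau_1|^\gamma.$$

Proof. Without loss of generality, we assume $\tau_2 \ge \tau_1$. Let

$$\begin{aligned}
A_{1i} &= \big\{ [\tau_2 - 1(u_i^0(\tau_2) \le 0)] - [\tau_1 - 1(u_i^0(\tau_1) \le 0)] \big\}\, z_{i,n,\tau_2} K_{i,n,\tau_2}, \\
A_{2i} &= [\tau_1 - 1(u_i^0(\tau_1) \le 0)]\, (z_{i,n,\tau_2} K_{i,n,\tau_2} - z_{i,n,\tau_1} K_{i,n,\tau_1}),
\end{aligned}$$

and let $A_{kij}$ stand for the $j$-th element of $A_{ki}$. By Minkowski's inequality,

$$\big( E\|S_n(\tau_2, 0, 0) - S_n(\tau_1, 0, 0)\|^{2\gamma} \big)^{1/\gamma} \le \sum_{j=1}^{2p+1} \Big\{ (nh_n)^{-\gamma} E\Big| \sum_{i=1}^n (A_{1ij} + A_{2ij}) \Big|^{2\gamma} \Big\}^{1/\gamma}.$$

Applying Rosenthal's inequality and Minkowski's inequality, we get

$$(nh_n)^{-\gamma} E\Big| \sum_{i=1}^n (A_{1ij} + A_{2ij}) \Big|^{2\gamma} \le 2^\gamma C (nh_n)^{-\gamma} \Big( \sum_{i=1}^n \big( E\|A_{1i}\|^2 + E\|A_{2i}\|^2 \big) \Big)^\gamma + 2^\gamma C (nh_n)^{-\gamma} \sum_{i=1}^n \Big( (E\|A_{1i}\|^{2\gamma})^{1/\gamma} + (E\|A_{2i}\|^{2\gamma})^{1/\gamma} \Big)^\gamma. \tag{B.6}$$

Using an argument similar to those in Lemma 3, we obtain

$$\begin{aligned}
E\|A_{1i}\|^{2\gamma} &\le E\Big( E\{[\tau_2 - 1(u_i^0(\tau_2) \le 0) - \tau_1 + 1(u_i^0(\tau_1) \le 0)]^2 \,|\, x_i\}\, \|z_{i,n,\tau_2} K_{i,n,\tau_2}\|^{2\gamma} \Big) \\
&\le (\tau_2 - \tau_1) E\|z_{i,n,\tau_2} K_{i,n,\tau_2}\|^{2\gamma} \le C h_n (\tau_2 - \tau_1).
\end{aligned}$$

Also, $E\|A_{2i}\|^{2\gamma} \le E\|z_{i,n,\tau_2} K_{i,n,\tau_2} - z_{i,n,\tau_1} K_{i,n,\tau_1}\|^{2\gamma} \le B h_n (\tau_2 - \tau_1)^{2\gamma}$ by Lemma 3 (1). Similarly, $E\|A_{1i}\|^2 \le B h_n (\tau_2 - \tau_1)$ and $E\|A_{2i}\|^2 \le B h_n (\tau_2 - \tau_1)^2$. Combining all these, we have the right-hand side of (B.6) bounded by $M (\tau_2 - \tau_1)^\gamma + M (nh_n (\tau_2 - \tau_1))^{(1-\gamma)} (\tau_2 - \tau_1)^\gamma$ for some finite $M$. It is then further bounded by $2M (\tau_2 - \tau_1)^\gamma$ by definition of $b_n$ and $\gamma > 1$. Thus $\big( E\|S_n(\tau_2, 0, 0) - S_n(\tau_1, 0, 0)\|^{2\gamma} \big)^{1/\gamma} \le (2p + 1)(2M)^{1/\gamma} (\tau_2 - \tau_1)$, which completes the proof.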

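Lemma 5. For any $\varepsilon > 0$, $\eta > 0$, there exists $\delta > 0$ such that for large $n$,

$$P\Big( \sup_{\tau'', \tau' \in T,\ |\tau'' - \tau'| \le \delta} \|S_n(\tau'', 0, 0) - S_n(\tau', 0, 0)\| > \varepsilon \Big) < \eta.$$

Proof. We can without loss of generality assume that the components of $z_{i,n,\tau}$ are nonnegative. Given a $\delta > 0$, $T$ contains at most $1/\delta$ intervals of length $\delta$. So it suffices to show that for any $\varepsilon > 0$, $\eta > 0$, there exists $\delta > 0$ such that

$$P\Big( \sup_{s \le \tau \le \delta + s,\ \tau \in T} \|S_n(\tau, 0, 0) - S_n(s, 0, 0)\| > \varepsilon \Big) < \delta \eta \tag{B.7}$$

for all $s \in T$ for large $n$. Partition $[s, \delta + s]$ into $b_n = (nh_n)^{1/2+\kappa}$ intervals of equal sizes with $0 < \kappa < 1/2$. Let $\tau_j$ stand for the lower limit of the $j$-th interval and $\tau_1 = s$. Then,

$$\sup_{s \le \tau \le \delta + s,\ \tau \in T} \|S_n(\tau, 0, 0) - S_n(s, 0, 0)\| \le \sup_{1 \le j \le b_n} \sup_{\tau \in [\tau_j, \tau_{j+1}]} \|S_n(\tau, 0, 0) - S_n(\tau_j, 0, 0)\| + \sup_{1 \le j \le b_n} \|S_n(\tau_j, 0, 0) - S_n(s, 0, 0)\|.$$

For any $\tau \in [\tau_j, \tau_{j+1}]$, one has

$$\begin{aligned}
S_n(\tau, 0, 0) - S_n(\tau_j, 0, 0) \ge{}& S_n(\tau_{j+1}, 0, 0) - S_n(\tau_j, 0, 0) \\
&+ (nh_n)^{-1/2} \sum_{i=1}^n (\tau_{j+1} - 1(u_i^0(\tau_{j+1}) < 0)) (z_{i,n,\tau} K_{i,n,\tau} - z_{i,n,\tau_{j+1}} K_{i,n,\tau_{j+1}}) \\
&- (nh_n)^{-1/2} \sum_{i=1}^n (\tau_{j+1} - \tau_j)\, z_{i,n,\tau} K_{i,n,\tau}.
\end{aligned}$$

By Lemma 3,

$$\Big\| (nh_n)^{-1/2} \sum_{i=1}^n (\tau_{j+1} - 1(u_i^0(\tau_{j+1}) < 0)) (z_{i,n,\tau} K_{i,n,\tau} - z_{i,n,\tau_{j+1}} K_{i,n,\tau_{j+1}}) \Big\| \le (nh_n)^{-1/2} \sum_{i=1}^n \|z_{i,n,\tau} K_{i,n,\tau} - z_{i,n,\tau_{j+1}} K_{i,n,\tau_{j+1}}\| = o_p(1).$$

Also, since $|\tau_{j+1} - \tau_j| \le (nh_n)^{-1/2-\kappa}$,

$$\Big\| (nh_n)^{-1/2} \sum_{i=1}^n (\tau_{j+1} - \tau_j)\, z_{i,n,\tau} K_{i,n,\tau} \Big\| = o_p(1).$$

Therefore, $S_n(\tau, 0, 0) - S_n(\tau_j, 0, 0) \ge S_n(\tau_{j+1}, 0, 0) - S_n(\tau_j, 0, 0) - \varepsilon/5$ with probability at least $1 - \delta\eta$ uniformly on $T$ for large $n$. Similarly, for any $\tau \in [\tau_j, \tau_{j+1}]$, one has

$$\begin{aligned}
S_n(\tau, 0, 0) - S_n(\tau_j, 0, 0) \le{}& (nh_n)^{-1/2} \sum_{i=1}^n 1(u_i^0(\tau_j) < 0) (z_{i,n,\tau_j} K_{i,n,\tau_j} - z_{i,n,\tau} K_{i,n,\tau}) \\
&+ (nh_n)^{-1/2} \sum_{i=1}^n (\tau_{j+1} - \tau_j)\, z_{i,n,\tau} K_{i,n,\tau} \\
&- (nh_n)^{-1/2} \sum_{i=1}^n \tau_j (z_{i,n,\tau_j} K_{i,n,\tau_j} - z_{i,n,\tau} K_{i,n,\tau}) \\
={}& o_p(1) + o_p(1) + o_p(1).
\end{aligned}$$

Therefore $S_n(\tau, 0, 0) - S_n(\tau_j, 0, 0) \le \varepsilon/5$ with probability at least $1 - \delta\eta$ uniformly on $T$ for large $n$. So we have

$$\begin{aligned}
\sup_{s \le \tau \le s + \delta} \|S_n(\tau, 0, 0) - S_n(s, 0, 0)\| &\le \sup_{1 \le j \le b_n} \|S_n(\tau_{j+1}, 0, 0) - S_n(\tau_j, 0, 0)\| + \frac{2\varepsilon}{5} + \sup_{1 \le j \le b_n} \|S_n(\tau_j, 0, 0) - S_n(s, 0, 0)\| \\
&\le 3 \sup_{1 \le j \le b_n} \|S_n(\tau_j, 0, 0) - S_n(s, 0, 0)\| + \frac{2\varepsilon}{5}
\end{aligned}$$

with probability at least $1 - \delta\eta$ uniformly on $T$ for large $n$. Thus, to show (B.7), it suffices to show

$$P\Big( \sup_{1 \le j \le b_n} \|S_n(\tau_j, 0, 0) - S_n(\tau_1, 0, 0)\| > \varepsilon/5 \Big) < \delta\eta$$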

for large n. Theorem 12.2 of Billingsley (1968) states that, if there exists β ≥ 0, α > 1 and ul ≥ 0 for P l ∈ {1, ..., bn } such that E(kSn (τj , 0, 0) − Sn (τi , 0, 0)kβ ) ≤ ( i ) ≤ −α Cβ,α (u1 + ... + ubn )α . 1≤j≤bn

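Apply this statement with $\beta = 2\gamma$ and $\alpha = \gamma$. Lemma 4 implies that, for $0 \le i \le j \le b_n$, one has $E\|S_n(\tau_j, 0, 0) - S_n(\tau_i, 0, 0)\|^\beta \le \bar C (\tau_j - \tau_i)^\alpha$. Therefore

$$P\Big( \sup_{1 \le j \le b_n} \|S_n(\tau_j, 0, 0) - S_n(\tau_1, 0, 0)\| > \varepsilon/5 \Big) \le \delta \Big( \big(\tfrac{\varepsilon}{5}\big)^{-\alpha} C_{\beta,\alpha} \bar C \delta^{\alpha-1} \Big).$$

We may choose $\delta$ so that $\delta \big( (\tfrac{\varepsilon}{5})^{-\alpha} C_{\beta,\alpha} \bar C \delta^{\alpha-1} \big) \le \eta$, and this completes the proof.

Lemma 6. Suppose that Assumption 6 holds. Let $b_n = (nh_n)^{1/2+\kappa}$, $\kappa \in (0, 1/2)$, and consider a partition of $T$ into $b_n$ intervals of equal sizes. Let $\tau_j$ denote the lower limit of the $j$-th interval. Then,

$$\sup_{1 \le j \le b_n} \sup_{\tau_{j-1} \le \tau \le \tau_j} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau} K_{i,n,\tau} \{1(u_i^0(\tau_j) \le e_i(\tau)) - 1(u_i^0(\tau_j) \le e_i(\tau_j))\} \Big\| = o_p(1)$$

and

$$\sup_{1 \le j \le b_n} \sup_{\tau_{j-1} \le \tau \le \tau_j} \sup_{\|\phi_n\| \le \log^{1/2}(nh_{n,\tau})} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau} K_{i,n,\tau}\, 1\big(u_i^0(\tau_j) \le e_i(\tau) + (nh_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n\big) - (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j}\, 1\big(u_i^0(\tau_j) \le e_i(\tau_j) + (nh_{n,\tau_j})^{-1/2} z_{i,n,\tau_j}' \phi_n\big) \Big\| = o_p(1).$$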

Proof. We can assume without loss of generality that $z_{i,n,\tau}$ is a nonnegative scalar. First, consider the left-hand side of the first equation in the statement of the lemma. By part (2) of Lemma 3, the left-hand side has the same order as

$$\sup_{1 \le j \le b_n} \sup_{\tau_{j-1} \le \tau \le \tau_j} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \{1(u_i^0(\tau_j) \le e_i(\tau)) - 1(u_i^0(\tau_j) \le e_i(\tau_j))\} \Big\|. \tag{B.8}$$

Furthermore, for any $\tau \in [\tau_j, \tau_{j+1}]$ and any $\varepsilon > 0$, if $n$ is large enough, it holds that $e_i(\tau) - e_i(\tau_j) = o_p((nh_n)^{-1/2}) \le \varepsilon (nh_n)^{-1/2}$ uniformly on $T$ with probability approaching one. Thus, (B.8) is bounded by

$$\sup_{1 \le j \le b_n} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j}\, 1\big(e_i(\tau_j) - \varepsilon (nh_n)^{-1/2} \le u_i^0(\tau_j) \le e_i(\tau_j) + \varepsilon (nh_n)^{-1/2}\big) \Big\|.$$

Let

$$\xi_{i,\tau_j} = 1\big(e_i(\tau_j) - \varepsilon (nh_n)^{-1/2} \le u_i^0(\tau_j) \le e_i(\tau_j) + \varepsilon (nh_n)^{-1/2}\big) - E\big[1\big(e_i(\tau_j) - \varepsilon (nh_n)^{-1/2} \le u_i^0(\tau_j) \le e_i(\tau_j) + \varepsilon (nh_n)^{-1/2}\big) \,\big|\, x_i\big].$$

Then, (B.8) can be further bounded by

$$\sup_{1 \le j \le b_n} (nh_n)^{-1/2} \Big\| \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \xi_{i,\tau_j} \Big\| + \sup_{1 \le j \le b_n} (nh_n)^{-1/2} \sum_{i=1}^n E\big[1\big(e_i(\tau_j) - \varepsilon (nh_n)^{-1/2} \le u_i^0(\tau_j) \le e_i(\tau_j) + \varepsilon (nh_n)^{-1/2}\big) \,\big|\, x_i\big].$$

The second term is $o_p(1)$. Note that by a union bound, the first term satisfies

$$P\Big( \sup_{1 \le j \le b_n} (nh_n)^{-1/2} \Big\| \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \xi_{i,\tau_j} \Big\| > \eta \Big) \le \sum_{j=1}^{b_n} P\Big( (nh_n)^{-1/2} \Big\| \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \xi_{i,\tau_j} \Big\| > \eta \Big).$$

Applying the Rosenthal inequality with $\gamma > 1$ to the $j$-th term on the right-hand side above shows that it is bounded by

$$C \eta^{-2\gamma} (nh_n)^{-\gamma/2} \Big\{ E\Big( (nh_n)^{-1/2} \sum_{i=1}^n E(\|\xi_{i,\tau_j}\|^2 | x_i)\, \|z_{i,n,\tau_j} K_{i,n,\tau_j}\|^2 \Big)^\gamma + (nh_n)^{-\gamma/2} E\Big( \sum_{i=1}^n \|z_{i,n,\tau_j} K_{i,n,\tau_j}\|^{2\gamma} E(\|\xi_{i,\tau_j}\|^2 | x_i) \Big) \Big\}.$$

Note that $E(\|\xi_{i,\tau_j}\|^{2\gamma} | x_i) \le E(\|\xi_{i,\tau_j}\|^2 | x_i) \le B (nh_n)^{-1/2}$ for some $B > 0$. The term inside the curly brackets is finite. Therefore, the display above can be bounded by $CM \eta^{-2\gamma} (nh_n)^{-\gamma/2}$. After summing over $j$, it is bounded by $CM \eta^{-2\gamma} (nh_n)^{1/2+\kappa-\gamma/2} \to 0$ for $\gamma > 1 + 2\kappa$. This proves the first statement.

One can prove the second result using similar arguments since the difference between $(nh_{n,\tau})^{-1/2} z_{i,n,\tau}' \phi_n$ and $(nh_{n,\tau})^{-1/2} z_{i,n,\tau_j}' \phi_n$ is of a smaller order than $(nh_n)^{-1/2}$ by Assumption 6(v). The uniformity can be shown as in Lemma 7.

Lemma 7. Suppose that Assumption 6 holds. Let $K_n = \log^{1/2}(nh_n)$. We have

$$\sup_{\tau \in T} \sup_{\|\phi_n\| \le K_n} \|S_n(\tau, \phi_n, e_i(\tau)) - S_n(\tau, 0, e_i(\tau))\| = o_p(1).$$

Proof. We again assume without loss of generality that the components of $z_{i,n,\tau}$ are all nonnegative. Partition $T$ into $b_n = (nh_n)^{1/2+\kappa}$ intervals and let $\tau_j$ be the lower limit of the $j$-th interval. Applying Lemma B.8 and a similar argument as in Lemma B.7 gives that for any $\tau \in [\tau_j, \tau_{j+1}]$ and any $\varepsilon > 0$, it holds uniformly for large $n$ that

$$\begin{aligned}
S_n(\tau_{j+1}, \phi_n, e_i(\tau_{j+1})) - S_n(\tau_j, 0, e_i(\tau_j)) + 2\varepsilon
&\le S_n(\tau, \phi_n, e_i(\tau)) - S_n(\tau, 0, e_i(\tau)) \\
&\le S_n(\tau_j, \phi_n, e_i(\tau_j)) - S_n(\tau_{j+1}, 0, e_i(\tau_{j+1})) + 2\varepsilon.
\end{aligned}$$

By adding and subtracting terms, we obtain

$$\begin{aligned}
\sup_{\tau \in T} \sup_{\|\phi_n\| \le K_n} \|S_n(\tau, \phi_n, e_i(\tau)) - S_n(\tau, 0, e_i(\tau))\|
\le{}& 2 \sup_{1 \le j \le b_n} \|S_n(\tau_{j+1}, 0, 0) - S_n(\tau_j, 0, 0)\| \\
&+ 4 \sup_{1 \le j \le b_n + 1} \|S_n(\tau_{j+1}, 0, e_i(\tau_j)) - S_n(\tau_j, 0, 0)\| \\
&+ 2 \sup_{1 \le j \le b_n + 1} \sup_{\|\phi_n\| \le K_n} \|S_n(\tau_j, \phi_n, e_i(\tau_j)) - S_n(\tau_j, 0, e_i(\tau_j))\| + 4\varepsilon.
\end{aligned} \tag{B.9}$$

Consider the third term on the right-hand side. For any $\varepsilon > 0$, the set $\{\phi_n : \|\phi_n\| \le K_n\}$ can be partitioned into $N(\delta)$ spheres such that each of them has a diameter smaller than or equal to $\delta$. Note $N(\delta) = O((K_n/\delta)^{2p+1})$. Denote by $D_h$ the spheres with center $\phi_{h,n}$ for $h \in \{1, ..., N(\delta)\}$. For any $\phi_n \in D_h$, one has

$$z_{i,n,\tau_j}' \phi_{h,n} - \|z_{i,n,\tau_j}\| \delta \le z_{i,n,\tau_j}' \phi_n \le z_{i,n,\tau_j}' \phi_{h,n} + \|z_{i,n,\tau_j}\| \delta.$$

Therefore,

$$\begin{aligned}
&\sup_{1 \le j \le b_n + 1} \sup_{\|\phi_n\| \le K_n} \|S_n(\tau_j, \phi_n, e_i(\tau_j)) - S_n(\tau_j, 0, e_i(\tau_j))\| \\
&\le \sup_{1 \le j \le b_n + 1}\ \sup_{k \in \{1,2\},\, \phi_n \in D_h,\, 1 \le h \le N(\delta)} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \Big[ E\big\{ 1\big(u_i^0(\tau_j) \le e_i(\tau_j) + (nh_{n,\tau_j})^{-1/2} [z_{i,n,\tau_j}' \phi_{h,n} + (-1)^k \|z_{i,n,\tau_j}\| \delta]\big) \,\big|\, x_i \big\} \\
&\qquad\qquad - E\big\{ 1\big(u_i^0(\tau_j) \le e_i(\tau_j) + (nh_{n,\tau_j})^{-1/2} z_{i,n,\tau_j}' \phi_n\big) \big\} \Big] \Big\| \\
&\quad + \sup_{1 \le j \le b_n + 1} \max_{1 \le h \le N(\delta)} \max_{k = 1, 2} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \xi_{i,j,h,k} \Big\|,
\end{aligned} \tag{B.10}$$

where

$$\begin{aligned}
\xi_{i,j,h,k} ={}& E\big\{ 1\big(u_i^0(\tau_j) \le e_i(\tau_j) + (nh_{n,\tau_j})^{-1/2} [z_{i,n,\tau_j}' \phi_n + (-1)^k \|z_{i,n,\tau_j}\| \delta]\big) \,\big|\, x_i \big\} \\
&- 1\big(u_i^0(\tau_j) \le e_i(\tau_j) + (nh_{n,\tau_j})^{-1/2} [z_{i,n,\tau_j}' \phi_n + (-1)^k \|z_{i,n,\tau_j}\| \delta]\big) \\
&+ E\{1(u_i^0(\tau_j) \le e_i(\tau_j)) \,|\, x_i\} - 1(u_i^0(\tau_j) \le e_i(\tau_j)).
\end{aligned}$$

An application of the Taylor expansion shows that the first term of the right-hand side of (B.10) is of the same order as $\delta$, which can be made arbitrarily small. For the second term, an application of the Rosenthal inequality shows that for any $\varepsilon > 0$, $1 \le j \le b_n + 1$, $1 \le h \le N(\delta)$ and $k \in \{1, 2\}$, we have

$$b_n N(\delta)\, P\Big( \Big\| (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau_j} K_{i,n,\tau_j} \xi_{i,j,h,k} \Big\| > \varepsilon \Big) \le M \varepsilon^{-2\gamma} b_n N(\delta) (nh_n)^{-\gamma/2} K_n^\gamma$$

for large $n$, which can be shown to converge to zero for $\delta = K_n^{-\kappa}$ using the definition of $b_n$, $N(\delta)$ and $K_n$ and choosing $\gamma = 1 + 2\kappa + c$ for some $c > 0$. The second term in the right-hand side of (B.9) can be analyzed similarly and is $o_p(1)$. The first term in the right-hand side of (B.9) is $o_p(1)$ by Lemma 5.

Lemma 8. Under Assumption 6, we have

$$\sup_{\tau \in T} \Big\| (nh_n)^{-1/2} \sum_{i=1}^n \{\psi_\tau(u_i^0(\tau) - e_i(\tau)) - \psi_\tau(u_i^0(\tau))\}\, z_{i,n,\tau} K_{i,n,\tau} \Big\| = O_p(1).$$

Proof. The terms inside the norm can be written as

$$S_n(\tau, 0, 0) - S_n(\tau_j, 0, e_i(\tau)) + (nh_n)^{-1/2} \sum_{i=1}^n z_{i,n,\tau} K_{i,n,\tau} \{P(u_i^0(\tau) \le 0) - E[1(u_i^0(\tau) \le e_i(\tau)) \,|\, x_i]\} + o_p(1).$$

The third term is $o_p(1)$ as shown in Lemma 7 and the rest is $O_p(1)$ by the Taylor expansion and due to $e_i(\tau) = O(h_n^{p+1})$.

C Practical Guideline

C.1 Bandwidth Choice

This section presents a guide to practice for bandwidth choice. Imbens and Kalyanaraman (2012), Calonico, Cattaneo and Titiunik (2014), and Arai and Ichimura (2016) provide data-driven optimal bandwidth selection algorithms for the mean regression discontinuity design. In this section, we propose a bandwidth selection rule based on the MSE for the local linear estimation of the conditional CDF, which is compatible with orders $p > 1$ for bias-corrected estimation.

We define the following notations: $u_1 = [1, u d_u^+, u d_u^-]'$, $N_1 = \int_{\mathbb{R}} u_1 u_1' K(u)\, du$, $T_1 = T_1(\tau) = (c(\tau))^{-1} \int_{\mathbb{R}} u_1 u_1' K^2(u)\, du$, and $\iota_2 = [0, 1, 0]'$, $\iota_3 = [0, 0, 1]'$. With the order of polynomial set to one, Lemma 1 and Theorem 2 together imply that the approximate MSE is

$$MSE(\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)) = Bias^2(\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)) + Var(\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)),$$

where

$$\begin{aligned}
Bias(\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)) = h_{n,\tau} \Big[ & \frac{\iota_2' (N_1)^{-1}}{2!} \int_{\mathbb{R}} \Big( \frac{\partial^2 Q(\tau|x_0^+)}{\partial x^2} d_u^+ + \frac{\partial^2 Q(\tau|x_0^-)}{\partial x^2} d_u^- \Big) u^2 u_1 K(u)\, du \\
& - \frac{\iota_3' (N_1)^{-1}}{2!} \int_{\mathbb{R}} \Big( \frac{\partial^2 Q(\tau|x_0^+)}{\partial x^2} d_u^+ + \frac{\partial^2 Q(\tau|x_0^-)}{\partial x^2} d_u^- \Big) u^2 u_1 K(u)\, du \Big]
\end{aligned}$$

and

$$Var(\hat\beta_1^+(\tau) - \hat\beta_1^-(\tau)) = \frac{1}{nh_{n,\tau}^3} \cdot \frac{\tau (1 - \tau) (\iota_2' N_1^{-1} T_1(\tau) N_1^{-1} \iota_2 + \iota_3' N_1^{-1} T_1(\tau) N_1^{-1} \iota_3 - 2 \iota_2' N_1^{-1} T_1(\tau) N_1^{-1} \iota_3)}{f_X(x_0) f_{Y|X}(Q(\tau|x_0)|x_0)}.$$

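Taking the first order condition with respect to the bandwidth, under Assumption 6, we obtain the approximate optimal choice of $h_{n,\tau}$ for the QRKD estimand:

$$h_{n,\tau}^*(s) = \Big( \frac{3}{2} \frac{C_2(\tau)}{C_1^2(\tau)} \Big)^{1/5} n^{-1/5},$$

where

$$C_1(\tau) = \frac{\iota_2' (N_1)^{-1}}{2!} \int_{\mathbb{R}} \Big( \frac{\partial^2 Q(\tau|x_0^+)}{\partial x^2} d_u^+ + \frac{\partial^2 Q(\tau|x_0^-)}{\partial x^2} d_u^- \Big) u^2 u_1 K(u)\, du - \frac{\iota_3' (N_1)^{-1}}{2!} \int_{\mathbb{R}} \Big( \frac{\partial^2 Q(\tau|x_0^+)}{\partial x^2} d_u^+ + \frac{\partial^2 Q(\tau|x_0^-)}{\partial x^2} d_u^- \Big) u^2 u_1 K(u)\, du$$

and

$$C_2(\tau) = \frac{\tau (1 - \tau) (\iota_2' N_1^{-1} T_1(\tau) N_1^{-1} \iota_2 + \iota_3' N_1^{-1} T_1(\tau) N_1^{-1} \iota_3 - 2 \iota_2' N_1^{-1} T_1(\tau) N_1^{-1} \iota_3)}{f_X(x_0) f_{Y|X}(Q(\tau|x_0)|x_0)}.$$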
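The first order condition can be illustrated numerically. The following is a minimal sketch, assuming the approximate MSE has the form $(h\, C_1)^2 + C_2/(n h^3)$ implied by the bias and variance expressions of this section; the function names `approximate_mse` and `optimal_bandwidth` and the scalar inputs `c1`, `c2` are hypothetical stand-ins for $C_1(\tau)$ and $C_2(\tau)$ at a fixed quantile, not part of the paper.

```python
def approximate_mse(h, c1, c2, n):
    """Approximate MSE at bandwidth h: squared bias (h*c1)^2 plus variance c2/(n*h^3)."""
    return (h * c1) ** 2 + c2 / (n * h ** 3)


def optimal_bandwidth(c1, c2, n):
    """Minimizer of approximate_mse over h > 0.

    Setting the derivative 2*h*c1^2 - 3*c2/(n*h^4) to zero gives
    h^5 = 3*c2 / (2*c1^2*n), i.e. h = (1.5*c2/c1^2)^(1/5) * n^(-1/5).
    """
    if c1 == 0 or c2 <= 0 or n <= 0:
        raise ValueError("need c1 != 0, c2 > 0 and n > 0")
    return (1.5 * c2 / c1 ** 2) ** 0.2 * n ** -0.2


if __name__ == "__main__":
    # Hypothetical constants, used only to show the n^(-1/5) rate.
    for n in (500, 5000, 50000):
        print(n, optimal_bandwidth(c1=2.0, c2=3.0, n=n))
```

In practice `c1` and `c2` would be replaced by plug-in estimates, so the sketch only isolates the last arithmetic step of the rule.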

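For bias-corrected estimation with an order $p > 1$, the bandwidth rule above provides a rate that is required by Assumption 6 (v). In the above formulas, the unknown densities, $f_X$ and $f_{Y|X}$, and the unknown conditional quantile function $Q$ and its derivative $\frac{\partial^2}{\partial x^2} Q$ need to be replaced by the respective estimates $\hat f_X$, $\hat f_{Y|X}$, $\check\alpha$, and $\check\beta_2^\pm$. We thus propose to replace $C_1(\tau)$ and $C_2(\tau)$ by

$$\hat C_1(\tau) = \frac{\iota_2' (N_1)^{-1}}{2!} \int_{\mathbb{R}} \big( \check\beta_2^+(\tau) d_u^+ + \check\beta_2^-(\tau) d_u^- \big) u^2 u_1 K(u)\, du - \frac{\iota_3' (N_1)^{-1}}{2!} \int_{\mathbb{R}} \big( \check\beta_2^+(\tau) d_u^+ + \check\beta_2^-(\tau) d_u^- \big) u^2 u_1 K(u)\, du$$

and

$$\hat C_2(\tau) = \frac{\tau (1 - \tau) (\iota_2' N_1^{-1} T_1(\tau) N_1^{-1} \iota_2 + \iota_3' N_1^{-1} T_1(\tau) N_1^{-1} \iota_3 - 2 \iota_2' N_1^{-1} T_1(\tau) N_1^{-1} \iota_3)}{\hat f_X(x_0) \hat f_{Y|X}(\check\alpha(\tau)|x_0)},$$

respectively, where

$$\begin{aligned}
\check\alpha(\tau) &= \iota_1' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \beta_2^+, \beta_2^-) \in \mathbb{R}^5} \sum_{i=1}^n \rho_\tau\Big( y_i - \alpha - \sum_{v=1}^2 (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} \Big), \\
\check\beta_2^+(\tau) &= \iota_4' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \beta_2^+, \beta_2^-) \in \mathbb{R}^5} \sum_{i=1}^n \rho_\tau\Big( y_i - \alpha - \sum_{v=1}^2 (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} \Big), \quad \text{and} \\
\check\beta_2^-(\tau) &= \iota_5' \operatorname*{argmin}_{(\alpha, \beta_1^+, \beta_1^-, \beta_2^+, \beta_2^-) \in \mathbb{R}^5} \sum_{i=1}^n \rho_\tau\Big( y_i - \alpha - \sum_{v=1}^2 (\beta_v^+ d_i^+ + \beta_v^- d_i^-) \frac{(x_i - x_0)^v}{v!} \Big).
\end{aligned}$$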

Bandwidth choices for the preliminary estimates, $\hat f_X$ and $\hat f_{Y|X}$, can be conducted by standard rule-of-thumb or data-driven methods. Let $h_n^x$ and $(\bar h_n^y, \bar h_n^x)'$ denote the bandwidths used for estimating $\hat f_X$ and $\hat f_{Y|X}$, respectively. First, $h_n^x$ may be obtained by minimizing approximate MISE. In other words,

$$h_n^x = \Big( \int u^2 K(u)\, du \Big)^{-2/5} \Big( \int K(u)^2\, du \Big)^{1/5} \Big( \frac{3}{8 \sqrt{\pi}} \sigma_X^{-5} \Big)^{-1/5} n^{-1/5},$$

where $\sigma_X$ can be estimated by the sample standard deviation of $X$. See Sections 3.3 and 3.4 of Silverman (1986). Second, Bashtannyk and Hyndman (2001) suggest that $(\bar h_n^y, \bar h_n^x)'$ may be obtained by

$$(\bar h_n^y, \bar h_n^x)' = \left( \Big( \frac{d^2 v}{2.85 \sqrt{2\pi}\, \sigma_X^5} \Big)^{1/4} \bar h_n^x,\ \ \Big( \frac{32 R^2(K) \sigma_Y^5 (260 \pi^9 \sigma_X^{58})^{1/8}}{n \sigma_K^4 d^{5/2} v^{3/4} [v^{1/2} + d (16.25 \pi \sigma_X^{10})^{1/4}]} \Big)^{1/6} \right)',$$

where $R(K) = \int K^2(u)\, du$, $v = 0.95 \sqrt{2\pi}\, \sigma_X^3 (3 d^2 \sigma_X^2 + 8 \sigma_Y^2) - 32 \sigma_Y^2 \sigma_X^2 \exp(-2)$, and $d$ is the slope of an OLS of $y_i$ on $[1, x_i]'$. Here, $\sigma_K^2$ is the variance with respect to the kernel function $K$. The variances, $\sigma_X^2$ and $\sigma_Y^2$, can be replaced by sample variances of $x_i$ and $y_i$, respectively.
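To make the first rule of thumb concrete, the sketch below evaluates $h_n^x = (\int u^2 K)^{-2/5} (\int K^2)^{1/5} (\frac{3}{8\sqrt{\pi}} \sigma_X^{-5})^{-1/5} n^{-1/5}$ for a given kernel. The function name `rule_of_thumb_bandwidth` and the two kernel-constant dictionaries are hypothetical and introduced only for illustration; the kernel constants are the standard second moments and roughness values of the Gaussian and Epanechnikov kernels.

```python
import math


def rule_of_thumb_bandwidth(sigma_x, n, mu2, roughness):
    """Normal-reference bandwidth for a kernel density estimate of X.

    mu2       : second moment of the kernel, integral of u^2 K(u) du
    roughness : integral of K(u)^2 du
    sigma_x   : standard deviation of X (a sample estimate in practice)
    """
    curvature = 3.0 / (8.0 * math.sqrt(math.pi)) * sigma_x ** -5
    return mu2 ** -0.4 * roughness ** 0.2 * curvature ** -0.2 * n ** -0.2


# Kernel constants (second moment, roughness); stated here as assumptions of the sketch.
GAUSSIAN = {"mu2": 1.0, "roughness": 1.0 / (2.0 * math.sqrt(math.pi))}
EPANECHNIKOV = {"mu2": 0.2, "roughness": 0.6}

if __name__ == "__main__":
    # With a Gaussian kernel the constant reduces to (4/3)^(1/5), roughly 1.06.
    print(rule_of_thumb_bandwidth(sigma_x=1.0, n=1, **GAUSSIAN))
    print(rule_of_thumb_bandwidth(sigma_x=2.0, n=1000, **EPANECHNIKOV))
```

The bandwidth scales linearly in `sigma_x` and at rate `n ** -0.2`, which is why only the kernel constants change across kernels.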

C.2 Pivotal Simulation and Implementation of Uniform Inference

As pointed out in Section 6 of Qu and Yoon (2015a) and Remark 2 of Qu and Yoon (2015b), the distribution of the process $G(\tau, j)$ is conditionally pivotal, and the randomness of the uniform Bahadur representations comes only from $\{\tau - 1\{y_i \le Q(\tau|x_i)\}\}_{i=1}^n$ conditional on data. In this light, we can simulate the distribution of $G(\tau, j)$ in the following manner. In each iteration, we generate $\{u_i\}_{i=1}^n \overset{i.i.d.}{\sim} Uniform(0, 1)$ independently from data, and evaluate $\{\tau - 1\{u_i \le \tau\}\}_{i=1}^n$ in place of $\{\tau - 1\{y_i \le Q(\tau|x_i)\}\}_{i=1}^n$ in the uniform Bahadur representations. Repeat this process many times. With this procedure, we can perform the tests of significance and heterogeneity as in Section 3.2 via simulating the supremum of $G(\tau, j)$. The following algorithm presents a complete procedure to implement the non-standardized test of significance and test of heterogeneity in Corollary 4.

Algorithm 1.

1. Discretize $T$ into grid points $T_d = \{t_1, ..., t_T\}$. For each $\tau \in T_d$, estimate $(\hat\alpha(\tau), \hat\beta_1^+(\tau), \hat\beta_1^-(\tau))$.

2. Estimate $\hat f_X(x_0)$ and estimate $\hat f_{Y|X}(\hat\alpha(\tau)|x_0)$ for each $\tau \in T_d$.

3. Generate $\{u_i\}_{i=1}^n \overset{i.i.d.}{\sim} Uniform(0, 1)$ independently from data.

4. For each $\tau \in T_d$, compute

$$\hat Y_1(\tau) = \frac{(\iota_2' - \iota_3') N^{-1} \sum_{i=1}^n z_{i,n,\tau} K_{i,n,\tau} (\tau - 1\{u_i \le \tau\})}{(b'(x_0^+) - b'(x_0^-)) \sqrt{nh_{n,\tau}}\, \hat f_X(x_0) \hat f_{Y|X}(\hat\alpha(\tau)|x_0)}.$$

5. Iterate the third and fourth steps $M$ times to obtain $\{\hat Y_j(\cdot)\}_{j=1}^M$ on $T_d$.

6. Compute the test statistic(s) $WS_n(T_d)$ and/or $WH_n(T_d)$.

7. Compute the $p$-th quantile(s) of $\max_{\tau \in T_d} |\hat Y_j(\tau)|$ and/or $\max_{\tau \in T_d} |\phi_{QRKD}'(\hat Y_j)(\tau)|$, the simulated critical values for the test statistic(s), $WS_n(T_d)$ and/or $WH_n(T_d)$, respectively.
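Steps 3, 4, 5 and 7 of the algorithm can be sketched in a few lines. This is an illustrative sketch under several simplifying assumptions that are not taken from the paper: a local linear design ($p = 1$) with $z = [1, u\,1\{u > 0\}, u\,1\{u < 0\}]'$, a triangular kernel, a single bandwidth `h` shared by all quantiles, and plug-in density values and the kink size $b'(x_0^+) - b'(x_0^-)$ supplied by the caller. All function and argument names are hypothetical.

```python
import numpy as np


def triangular_kernel(u):
    """Triangular kernel on [-1, 1] (an illustrative choice)."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def design(u):
    """Rows z = [1, u*1{u>0}, u*1{u<0}] of a local linear kink design (p = 1)."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([np.ones_like(u), u * (u > 0), u * (u < 0)])


def kernel_moment_matrix(kernel, grid_size=20001):
    """N = integral of z z' K(u) du over [-1, 1], by the trapezoid rule."""
    u = np.linspace(-1.0, 1.0, grid_size)
    z = design(u)
    integrand = z[:, :, None] * z[:, None, :] * kernel(u)[:, None, None]
    du = u[1] - u[0]
    return 0.5 * du * (integrand[1:] + integrand[:-1]).sum(axis=0)


def simulate_sup_statistics(x, x0, h, taus, f_x, f_y_at_q, kink_size,
                            n_draws=1000, seed=0):
    """Simulated draws of max over the tau grid of |Y_hat(tau)| (steps 3 to 5)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    taus = np.asarray(taus, dtype=float)
    n = x.size
    u_scaled = (x - x0) / h
    zk = design(u_scaled) * triangular_kernel(u_scaled)[:, None]
    n_mat = kernel_moment_matrix(triangular_kernel)
    # (iota_2 - iota_3)' N^{-1}; N is symmetric, so one linear solve suffices.
    contrast = np.linalg.solve(n_mat, np.array([0.0, 1.0, -1.0]))
    weights = zk @ contrast
    scale = kink_size * np.sqrt(n * h) * f_x * np.asarray(f_y_at_q, dtype=float)
    sup_stats = np.empty(n_draws)
    for m in range(n_draws):
        u = rng.uniform(size=n)                                   # step 3
        scores = taus[:, None] - (u[None, :] <= taus[:, None])    # tau - 1{u_i <= tau}
        sup_stats[m] = np.max(np.abs(scores @ weights / scale))   # step 4, then sup
    return sup_stats


def critical_value(sup_stats, level):
    """Empirical quantile of the simulated suprema at the given level (step 7)."""
    return float(np.quantile(sup_stats, level))
```

A call such as `critical_value(simulate_sup_statistics(x, x0, h, taus, f_x, f_y, kink), 0.95)` would then return a simulated critical value for the non-standardized significance test; a quantile-specific bandwidth $h_{n,\tau}$ would require recomputing `weights` for each grid point.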

To compute the standardized version of the test statistics, we also need to obtain the estimates $\hat\sigma^s$ and $\hat\sigma^h$. We compute them based on the standard deviations of

$$A_n^s(\tau) = \sqrt{nh_{n,\tau}^3}\; \widehat{QRKD}(\tau) \quad \text{and} \quad A_n^h(\tau) = \sup_{\tau \in T} \sqrt{nh_{n,\tau}^3} \Big[ \widehat{QRKD}(\tau) - |T|^{-1} \int_T \widehat{QRKD}(\tau')\, d\tau' \Big].$$

The following algorithm outlines a procedure for the standardized version of the test.

Algorithm 2. Steps 1–5 remain the same as those in Algorithm 1.

6. Compute the test statistic(s) $WS_n^{std}(T_d)$ and/or $WH_n^{std}(T_d)$.

7. Compute the $p$-th quantile(s) of $\max_{\tau \in T_d} |\hat Y_j(\tau)/\hat\sigma^s(\tau)|$ and/or $\max_{\tau \in T_d} |\phi_{QRKD}'(\hat Y_j)(\tau)/\hat\sigma^h(\tau)|$, the simulated critical values for the test statistic(s), $WS_n^{std}(T_d)$ and/or $WH_n^{std}(T_d)$, respectively.

References Angrist, Joshua D., Kathryn Graddy, and Guido W. Imbens (2000) The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish, Review of Economic Studies, Vol. 67, No. 3, pp. 499–527. Angrist, Joshua D. and Guido W. Imbens (1995) Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity, Journal of the American Statistical Association, Vol. 90, No. 430, pp. 431–442. Arai, Yoichi, and Hidehiko Ichimura (2016) Optimal Bandwidth Selection for the Fuzzy Regression Discontinuity Estimator, Economics Letters, Vol 141, pp. 103–106. Bashtannyk, David M., and Rob J. Hyndman (2001) Bandwidth selection for kernel conditional density estimation, Computational Statistics and Data Analysis, Vol. 36, No. 3, pp. 279–298. Billingsley, Patrick (1968) Convergence of probability measures. John Wiley and Sons. Card, David, David Lee, Zhuan Pei, and Andrea Weber (2016) Inference on Causal Effects in a Generalized Regression Kink Design, Econometrica, Vol. 83, No. 6, pp. 2453–2483. Calonico, Sebastian, Matias D. Cattaneo, and Rocio Titiunik (2014) Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs, Econometrica, Vol. 82, No. 6, pp. 2295–2326. Calonico, Sebastian, Matias D. Cattaneo, Max Farrell and Rocio Titiunik (2016) Regression Discontinuity Designs Using Covariates, Working Paper. Cattaneo, Matias D., and Juan Carlos Escanciano (2016) Regression Discontinuity Designs: Theory and Applications, Advances in Econometrics, Vol. 38 (Forthcoming) Chernozhukov, Victor and Iv´ an Fern´ andez-Val (2005) Subsampling Inference on Quantile Regression Processes, Sankhya: The Indian Journal of Statistics, Vol. 67, No. 2, pp. 253–276.

49

Cook, Thomas D. (2008) Waiting for Life to Arrive: a History of the Regression-Discontinuity Design in Psychology, Statistics and Economics, Journal of Econometrics, Vol. 142, No. 2, pp. 636–654. Dong, Yingying (2016) Jump or Kink? Identifying Education Effects by Regression Discontinuity Design without the Discontinuity, Working Paper. Frandsen, Brigham R., Markus Fr¨ olich and Blaise Melly (2012) Quantile Treatment Effects in the Regression Discontinuity Design, Journal of Econometrics, Vol. 168, No.2 pp. 382-395. Guerre, Emmanuel and Camille Sabbah (2012) Uniform Bias Study and Bahadur Representation for Local Polynomial Estimators of the Conditional Quantile Function, Econometric Theory, Vol. 26, No. 5, pp. 1529-1564. Heckman, James J. and Edward Vytlacil (2005) Structural Equations, Treatment Effects, and Econometric Policy Evaluation, Econometrica, Vol. 73, No. 3, pp. 669–738. Imbens, Guido and Thomas Lemieux (2008) Special Issue Editors’ Introduction: The Regression Discontinuity Design – Theory and Applications, Journal of Econometrics, Vol. 142, No. 2, pp. 611–614. Imbens, Guido W. and Jeffrey M. Wooldridge (2009) Recent Developments in the Econometrics of Program Evaluation, Journal of Economic Literature, Vol. 47, No. 1, pp. 5–86. Imbens, Guido W. and Karthik Kalyanaraman (2012) Optimal Bandwidth Choice for the Regression Discontinuity Estimator, Review of Economic Studies, Vol. 79, No. 3, pp. 933–959. Kato, Ryutah and Yuya Sasaki (2017) On Using Linear Quantile Regressions for Causal Inference. Econometric Theory, Vol. 33, No. 3, pp. 664–690. Knight, Keith (1998) Limiting distributions for L1 regression estimators under general conditions. (1998) Annals of statistics Vol. 26, No.2, pp. 755-770.

Koenker, Roger (2005) Quantile Regression, Cambridge University Press: Cambridge.

Koenker, Roger and Zhijie Xiao (2002) Inference on the Quantile Regression Process, Econometrica, Vol. 70, No. 4, pp. 1583–1612.

Kong, Efang, Oliver B. Linton, and Yingcun Xia (2010) Uniform Bahadur Representation for Local Polynomial Estimates of M-Regression and its Application to the Additive Model, Econometric Theory, Vol. 26, No. 5, pp. 1529–1564.

Landais, Camille (2011) Heterogeneity and Behavioral Responses to Unemployment Benefits over the Business Cycle, Working Paper, LSE.

Landais, Camille (2015) Assessing the Welfare Effects of Unemployment Benefits Using the Regression Kink Design, American Economic Journal: Economic Policy, Vol. 7, No. 4, pp. 243–278.

Lee, David S., and Thomas Lemieux (2010) Regression Discontinuity Designs in Economics, Journal of Economic Literature, Vol. 48, No. 2, pp. 281–355.

Moffitt, Robert (1985) The Effect of the Duration of Unemployment Benefits on Work Incentives: An Analysis of Four Datasets, Unemployment Insurance Occasional Papers 85-4, U.S. Department of Labor, Employment and Training Administration.

Nielsen, Helena Skyt, Torben Sørensen, and Christopher Taber (2010) Estimating the Effect of Student Aid on College Enrollment: Evidence from a Government Grant Policy Reform, American Economic Journal: Economic Policy, Vol. 2, No. 2, pp. 185–215.

Oka, Tatsushi, and Zhongjun Qu (2011) Estimating Structural Changes in Regression Quantiles, Journal of Econometrics, Vol. 162, No. 2, pp. 248–267.

Padula, Mariarosaria (2011) Asymptotic Stability of Steady Compressible Fluids, Springer.

Qu, Zhongjun and Jungmo Yoon (2015a) Nonparametric Estimation and Inference on Conditional Quantile Processes, Journal of Econometrics, Vol. 185, No. 1, pp. 1–19.

Qu, Zhongjun and Jungmo Yoon (2015b) Uniform Inference on Quantile Effects under Sharp Regression Discontinuity Designs, Working Paper.

Sabbah, Camille (2014) Uniform Confidence Bands for Local Polynomial Quantile Estimators, ESAIM: Probability and Statistics, Vol. 18, pp. 265–276.

Sasaki, Yuya (2015) What Do Quantile Regressions Identify for General Structural Functions?, Econometric Theory, Vol. 31, No. 5, pp. 1102–1116.

Silverman, Bernard W. (1986) Density Estimation for Statistics and Data Analysis, Chapman & Hall/CRC: London.

Simonsen, Marianne, Lars Skipper, and Niels Skipper (2015) Price Sensitivity of Demand for Prescription Drugs: Exploiting a Regression Kink Design, Journal of Applied Econometrics, Forthcoming.

van der Vaart, Aad W. (1998) Asymptotic Statistics, Cambridge University Press: Cambridge.

Yitzhaki, Shlomo (1996) On Using Linear Regressions in Welfare Economics, Journal of Business and Economic Statistics, Vol. 14, No. 4, pp. 478–486.

Tables and Figures

Structure 1
                 |Bias|                  SD                   RMSE
N =        1000  2000  4000      1000  2000  4000      1000  2000  4000
τ = 0.10   0.00  0.00  0.02      0.28  0.24  0.20      0.28  0.24  0.20
τ = 0.20   0.00  0.00  0.01      0.22  0.19  0.16      0.22  0.19  0.16
τ = 0.30   0.00  0.00  0.01      0.19  0.17  0.14      0.19  0.17  0.14
τ = 0.40   0.01  0.00  0.01      0.18  0.15  0.13      0.18  0.16  0.13
τ = 0.50   0.00  0.00  0.00      0.18  0.15  0.13      0.18  0.15  0.13
τ = 0.60   0.00  0.00  0.00      0.18  0.16  0.14      0.18  0.16  0.14
τ = 0.70   0.00  0.00  0.00      0.19  0.17  0.14      0.19  0.17  0.14
τ = 0.80   0.00  0.00  0.00      0.21  0.18  0.16      0.21  0.18  0.16
τ = 0.90   0.01  0.00  0.00      0.28  0.24  0.21      0.28  0.24  0.21

Structure 2
                 |Bias|                  SD                   RMSE
N =        1000  2000  4000      1000  2000  4000      1000  2000  4000
τ = 0.10   0.04  0.03  0.02      0.38  0.34  0.29      0.38  0.34  0.29
τ = 0.20   0.00  0.00  0.00      0.33  0.28  0.24      0.33  0.28  0.24
τ = 0.30   0.00  0.00  0.00      0.28  0.25  0.21      0.28  0.25  0.21
τ = 0.40   0.00  0.00  0.00      0.25  0.22  0.18      0.25  0.22  0.18
τ = 0.50   0.01  0.01  0.00      0.22  0.19  0.16      0.23  0.19  0.16
τ = 0.60   0.02  0.02  0.02      0.20  0.17  0.14      0.21  0.17  0.14
τ = 0.70   0.04  0.03  0.03      0.19  0.15  0.11      0.19  0.15  0.12
τ = 0.80   0.04  0.04  0.04      0.21  0.15  0.11      0.21  0.16  0.12
τ = 0.90   0.02  0.02  0.04      0.28  0.20  0.15      0.28  0.20  0.15

Table 1: Simulated finite-sample statistics of the QRKD estimates.
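The three statistics reported in Table 1 are linked by the identity RMSE² = Bias² + SD², which is why the RMSE columns nearly coincide with the SD columns wherever the bias is close to zero. A minimal sketch of how such summaries might be computed from Monte Carlo replications follows. The true value, the noise level, the number of replications, and the helper name `summarize` are all hypothetical; the draws are plain Gaussian noise, not the paper's data-generating process or the QRKD estimator, and the use of the population SD is an assumption.

```python
import math
import random
import statistics


def summarize(estimates, truth):
    """Return (|bias|, SD, RMSE) of simulated estimates around a known true value."""
    mean_est = statistics.fmean(estimates)
    abs_bias = abs(mean_est - truth)
    # Population SD across replications (assumption; a sample SD is also common).
    sd = statistics.pstdev(estimates)
    squared_errors = [(e - truth) ** 2 for e in estimates]
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return abs_bias, sd, rmse


# Hypothetical replications: Gaussian noise around an arbitrary true value.
rng = random.Random(0)
truth = 1.0
estimates = [truth + rng.gauss(0.0, 0.2) for _ in range(1000)]

abs_bias, sd, rmse = summarize(estimates, truth)
print(f"|Bias| = {abs_bias:.2f}, SD = {sd:.2f}, RMSE = {rmse:.2f}")
```

With the population SD the identity holds exactly in the unrounded numbers; in a table rounded to two decimals it holds only up to rounding.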

September 1981 – September 1982

Dependent Variable                     UI Claimed        UI Paid
RKD (Landais, 2015)                    0.038 (0.009)     0.040 (0.009)
QRKD  τ = 0.10                         0.000 (0.010)     0.022 (0.008)
      τ = 0.20                         0.037 (0.011)     0.036 (0.011)
      τ = 0.30                         0.053 (0.012)     0.060 (0.011)
      τ = 0.40                         0.070 (0.013)     0.070 (0.012)
      τ = 0.50                         0.081 (0.014)     0.080 (0.013)
      τ = 0.60                         0.093 (0.015)     0.089 (0.016)
      τ = 0.70                         0.086 (0.015)     0.068 (0.012)
      τ = 0.80                         0.154 (0.024)     0.142 (0.022)
      τ = 0.90                         0.145 (0.017)     0.159 (0.016)
p-Values
Test of Significance                   0.000             0.000
Standardized Test of Significance      0.000             0.000
Test of Heterogeneity                  0.000             0.000
Standardized Test of Heterogeneity     0.000             0.000

Table 2: Empirical estimates and inference for the causal effects of UI benefits on unemployment durations based on the RKD and QRKD. The period of data is from September 1981 to September 1982. The numbers in parentheses indicate standard errors.

September 1982 – December 1983

Dependent Variable                     UI Claimed        UI Paid
RKD (Landais, 2015)                    0.046 (0.006)     0.042 (0.006)
QRKD  τ = 0.10                         0.030 (0.014)     0.029 (0.014)
      τ = 0.20                         0.067 (0.019)     0.066 (0.019)
      τ = 0.30                         0.083 (0.019)     0.082 (0.021)
      τ = 0.40                         0.091 (0.021)     0.085 (0.023)
      τ = 0.50                         0.112 (0.016)     0.118 (0.017)
      τ = 0.60                         0.072 (0.021)     0.075 (0.020)
      τ = 0.70                         0.094 (0.016)     0.100 (0.020)
      τ = 0.80                         0.026 (0.014)     0.032 (0.015)
      τ = 0.90                         0.065 (0.034)     0.068 (0.037)
p-Values
Test of Significance                   0.002             0.005
Standardized Test of Significance      0.000             0.000
Test of Heterogeneity                  0.140             0.142
Standardized Test of Heterogeneity     0.000             0.000

Table 3: Empirical estimates and inference for the causal effects of UI benefits on unemployment durations based on the RKD and QRKD. The period of data is from September 1982 to December 1983. The numbers in parentheses indicate standard errors.

[Figure 1 panels: Structure 1 and Structure 2, each for N = 1,000, N = 2,000, and N = 4,000.]

Figure 1: Simulated distributions of QRKD estimates.

[Figure 2 panels: (A) Acceptance Probabilities for the 95% Level Test of Significance, without and with standardization; (B) Acceptance Probabilities for the 95% Level Test of Heterogeneity, without and with standardization.]

Figure 2: Acceptance probabilities for the 95% level uniform test of significance (panel A) and the 95% level uniform test of heterogeneity (panel B) based on 2,500 replications.
