Journal of Econometrics 184 (2015) 81–96

Specification testing for transformation models with an application to generalized accelerated failure-time models✩

Arthur Lewbel a,∗, Xun Lu b, Liangjun Su c

a Department of Economics, Boston College, United States
b Department of Economics, Hong Kong University of Science and Technology, Hong Kong
c School of Economics, Singapore Management University, Singapore

Article history: Received 2 May 2013; Received in revised form 22 July 2014; Accepted 13 September 2014

Abstract

This paper provides a nonparametric test of the specification of a transformation model. Specifically, we test whether an observable outcome Y is monotonic in the sum of a function of observable covariates X plus an unobservable error U. Transformation models of this form are commonly assumed in economics, including, e.g., standard specifications of duration models and hedonic pricing models. Our test statistic is asymptotically normal under local alternatives and consistent against nonparametric alternatives violating the implied restriction. Monte Carlo experiments show that our test performs well in finite samples. We apply our results to test for specifications of generalized accelerated failure-time (GAFT) models of the duration of strikes. © 2014 Elsevier B.V. All rights reserved.

JEL classification: C12; C14. Keywords: Additivity; Control variable; Endogenous variable; Monotonicity; Nonparametric nonseparable model; Hazard model; Specification test; Transformation model; Unobserved heterogeneity.

1. Introduction

Consider a scalar observable outcome Y, a dx × 1 vector of observable covariates of interest X, and a scalar unobservable cause or error U. The goal of this paper is to test the following hypotheses:1

H10: There exist two measurable functions G: R → R and H1: R^dx → R such that Y = G[H1(X) + U] a.s., and G is strictly monotonic;

H1A: H10 is false.

✩ Halbert White inspired this project, brought us together to work on it, and provided substantial advice, discussion, and enthusiasm. We deeply mourn his passing. We gratefully thank the co-editor Jianqing Fan, the associate editor, and three anonymous referees for their many constructive comments. We are also indebted to Songnian Chen, Yanqin Fan, and Shakeeb Khan for helpful comments and suggestions. Lu acknowledges support from Hong Kong University of Science and Technology under grant number FSECS13BM02. Su acknowledges support from the Singapore Ministry of Education Academic Research Fund under grant number MOE2012-T2-2-021.
∗ Corresponding author. Tel.: +1 617 552 3678; fax: +1 617 552 2308. E-mail address: [email protected] (A. Lewbel).
1 The error term U is a scalar under the null H10; however, under the alternative H1A, Y could be a function of X and a vector of unobservable errors U. For example, the alternative might include models with random coefficients. More generally, we could write H1A as Y = R(X, U), and then under the null there would exist a scalar-valued function H2 such that U = H2(U). We later define a more specific set of alternatives that our test has power against.

Specifications that are monotonic functions of additive models have been called ''transformation models'' (e.g. Chiappori et al., 2013), ''transformed additively separable models'' (e.g. Jacho-Chávez et al., 2010), or ''generalized additive models with unknown link function'' (e.g. Horowitz, 2001; Horowitz and Mammen, 2004). Broadly speaking, there are two kinds of transformation models that are common in the economics literature. The first type assumes that Y and X are observable, U is unobservable, and the link function G(·) may be known or unknown. Our paper belongs to this category. Ridder (1990), Horowitz (1996), Ekeland et al. (2004), Ichimura and Lee (2011), and Chiappori et al. (2013) discuss identification and estimation for transformation models of this category.


In this class of models, the functions G and H1 and the distribution of U are identified and estimated. In the second type of transformation model, both X and U are observable, and Y is an object that can be estimated, such as a conditional mean or quantile function. Horowitz (2001), Horowitz and Mammen (2004, 2007, 2011), Horowitz and Lee (2005), and Jacho-Chávez et al. (2010) provide identification and estimation results for this second kind of transformation model, while Gozalo and Linton (2001) consider specification tests for such models. See also Horowitz (2014) for a recent survey of the latter class of models.

The transformation models under our null are commonly used (and hence assumed to hold) in a wide range of economic applications. For example, they are often used to study duration data (see, e.g., Heckman and Singer, 1984; Kiefer, 1988; Mata and Portugal, 1994; Engle, 2000; Abbring et al., 2008). In particular, generalized accelerated failure-time (GAFT) models, which include accelerated failure-time (AFT) models, proportional hazard (PH) models, and mixed proportional hazard (MPH) models as special cases, are all examples of models that satisfy our null hypothesis. The MPH specification in particular is a widely used class of duration data specifications (for a review, see Van den Berg, 2001). Despite its popularity, economic theory rarely justifies the MPH or other GAFT specifications. For example, Van den Berg (2001, p. 3400) points out that ''the MPH model specification is not derived from economic theory and it remains to be seen whether the MPH specification is actually able to capture important theoretical relations.'' He also provides some specific economic examples where the MPH specification is violated. In their microeconometrics textbook, Cameron and Trivedi (2005, p. 613) say that ''the multiplicative heterogeneity assumption [in MPH models] is also rather special, but it is mathematically convenient...'' Given the popularity (and the limitations) of GAFT models, especially MPH models, a formal specification test of these models would clearly be useful for empirical research. While some specification tests for certain parametric forms of duration models exist (see, e.g., Fernandes and Grammig, 2005), to the best of our knowledge, ours is the first that specifically tests for some testable implications of the general specification of GAFT models.2

Another major set of applications of transformation model specifications where U is unobservable are hedonic models (see, e.g., Ekeland et al., 2004; Heckman et al., 2005). Here again, we believe that our paper is the first to provide a general specification test for this class of transformation models.

A conditional exogeneity assumption is imposed to test H10, i.e., we assume that U and X are conditionally independent, conditioning on an observable covariate vector Z. This is analogous to the conditional unconfoundedness assumption in the treatment effect literature, and to the assumptions required for control function type methods of dealing with endogeneity (see, e.g., Heckman and Robb, 1986, and Blundell and Powell, 2003; in a control function setting Z would be the errors obtained after regressing endogenous elements of X on a vector of instruments). Chiappori et al. (2013) provide a nonparametric estimator for the transformation model under similar assumptions. Our test allows for a covariate vector Z, but unlike some other estimators and tests (see below), we do not require a vector Z, i.e., our test can also be applied when U and X are unconditionally independent and no other relevant covariates are observed.

We first show that if the data are generated by a transformation model, i.e., if H10 holds, then the ratio of the derivatives with respect to Y and to X of the conditional CDF of Y given (X, Z) can be written as a product of functions of X and Y.3 We then use local polynomial methods to estimate these derivatives, and construct test statistics based on the L2 distance between restricted and unrestricted estimators of this ratio of derivatives. We show that our test statistic is asymptotically normal under the null and under a sequence of Pitman local alternatives, and is consistent against the alternatives violating the implied restriction. To facilitate the application of our test, we use subsampling to obtain p-values or critical values. We also evaluate our test both in a Monte Carlo setting and in an empirical application concerning the duration of strikes by manufacturing workers.

Our null H10 is weaker than additive separability but stronger than monotonicity. Lu and White (2014) and Su et al. (forthcoming) propose tests for additive separability under the same conditional exogeneity assumption that U is independent of X given Z. Specifically, they test whether there exists an unknown measurable function G1 such that

Y = G1(X) + U   a.s.

2 Recently, Chiappori et al. (2013) provide a nonparametric test, not of the transformation model specification itself, but of a conditional exogeneity assumption within the context of a transformation model. Still, their test might be interpreted as a model specification test. See Remark 2.6 in Section 2.1 for details.
3 Horowitz (1996) considers the estimation of the semiparametric model under our null, where the function H1 takes a parametric form (unlike our nonparametric case) and without covariates Z. His estimator also relies on the implication that the ratio of the derivatives is a multiplicative function of X and Y.

Testing H10 is more general than testing for separability, since our null is equivalent to additive separability in the special case where G is known to be the identity function. Hence if we reject our H10, then we also reject their additive separability. Hoderlein et al. (2014, HSWY hereafter) test for monotonicity under a conditional exogeneity assumption. HSWY test whether there exists a function R such that Y = R(X, U), where R is strictly monotonic in its second argument. Our null is stronger than monotonicity, so if the HSWY test rejects monotonicity, then our null H10 is also rejected. Our null H10 combines monotonicity with the additional restriction that the observable X and unobservable U are additively separable under a transformation function G. Our test exploits this additivity restriction, and so should generally have more power than the HSWY test for testing our null H10. Also, the HSWY test requires that Z not be empty, while our test of H10 can be applied even if we have no conditioning covariates Z.

The rest of the paper is organized as follows. In Section 2, we propose and motivate our test. In Section 3, we show that our test statistics are asymptotically normal under the null, and we analyze their global and local power. In Section 4, we conduct Monte Carlo simulations to evaluate the finite sample performance of our test statistics. In Section 5, we provide an empirical application to test the specification of GAFT models in data on the durations of strikes. In Section 6, we discuss extensions to other closely related hypotheses. Section 7 concludes. The proofs of the main results in the paper are relegated to Appendices A–C and those for the technical lemmas are given in the supplementary material (see Appendix D).

2. A specification test for transformation models

In this section, we describe implications of H10 that are used to motivate our test construction, and then describe our proposed test statistic.


2.1. Motivation

To construct our test, we first impose a conditional exogeneity assumption. Let X ⊥ U | Z denote that X and U are independent given Z.

Assumption A.1. Let Z be an observable random vector of dimension dz ∈ N, such that X ⊥ U | Z and such that X and U are not measurable with respect to the sigma-field generated by Z.

Assumption A.1 is equivalent to the unconfoundedness assumption in the treatment effect literature and is widely used to identify causal effects. For detailed discussions, see Altonji and Matzkin (2005), Hoderlein and Mammen (2007), Imbens and Newey (2009), and White and Lu (2011), among others. It is also closely related to the assumptions used to allow for endogeneity in the control function literature, where Z would equal the residuals from a regression of X on exogenous instruments. See, e.g., Heckman and Robb (1986) and Blundell and Powell (2003, 2004).

Let F(· | x, z) ≡ F_{Y|X,Z}(· | x, z) and f(· | x, z) ≡ f_{Y|X,Z}(· | x, z) denote the conditional cumulative distribution function (CDF) and probability density function (PDF) of Y given (X, Z) = (x, z), respectively. Let V = (X′, Z′)′ and W ≡ (Y, X′, Z′)′. Let X, Y, V, and W denote the supports of X, Y, V, and W, respectively. Note that we allow the support of F(·|x, z) to change with the values of x and z. Let r^0(y; x, z) ≡ Dx F(y | x, z)/f(y | x, z), so r^0(y; x, z) is the ratio of two partial derivatives of F(y | x, z), since f(y | x, z) = ∂F(y | x, z)/∂y and Dx F(y | x, z) ≡ ∂F(y | x, z)/∂x. The following theorem characterizes some useful properties of the transformation model under H10.

Theorem 2.1. Suppose that f(y | x, z) ≠ 0 for all (y, x, z) ∈ W. (a) If H10 and A.1 hold and the first order (partial) derivatives of G and H1 exist, then there exist two measurable functions s1: R^dx → R^dx and s2: R → R+ (or s2: R → R−) such that

r^0(Y; X, Z) = s1(X) s2(Y)   a.s.,   (2.1)

where s1(x) = −∂S1(x)/∂x for some measurable function S1: R^dx → R, and 1/s2(y) = ∂S2(y)/∂y for some measurable function S2: R → R. (b) If there exist two measurable functions s1: R^dx → R^dx and s2: R → R+ (or s2: R → R−) such that (2.1) holds, s1(x) = −∂S1(x)/∂x for some function S1: R^dx → R, and 1/s2(y) = ∂S2(y)/∂y for some measurable function S2: R → R, then H10 holds in the sense that there exist two measurable functions G: R → R and H1: R^dx → R such that

Y = G[H1(X) + U]   a.s.,   (2.2)

where G is strictly monotonic and differentiable, all first order partial derivatives of H1 exist, and U is a scalar unobservable random variable satisfying X ⊥ U | Z.

Remark 2.1. Theorem 2.1(a) says that under H10 and the conditional exogeneity condition in A.1, the ratio r^0(y; x, z) is free of z and can be factored into the product of a function s1 of x and a function s2 of y, where the function s1 can be written as the derivative of a scalar function, and the function s2 does not alternate in sign on its support. Theorem 2.1(b) says the converse is also true: as long as the factorization in (2.1) holds with s1 and s2 satisfying the appropriate conditions, the observables (Y, X, Z) satisfy the version of the transformation model (2.2) under the null.

Remark 2.2. Theorem 2.1 gives a characterization of H10, but it does not by itself provide a test for H10. The proof of Theorem 2.1(a) shows that s1 and s2 in the theorem depend on the unknown functions H1 and G, respectively, so we cannot directly test Eq. (2.1). We instead propose a feasible and straightforward test statistic that is based on implications of the factorization in (2.1).
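To see where the factorization in (2.1) comes from, the following heuristic sketch (ours, suppressing the covariates Z and assuming G is strictly increasing with a differentiable inverse) may help. Under H10 with X ⊥ U, F(y | x) = F_U(G^{-1}(y) − H1(x)), so

Dx F(y | x) = −f_U(G^{-1}(y) − H1(x)) · ∂H1(x)/∂x   and   f(y | x) = f_U(G^{-1}(y) − H1(x)) · ∂G^{-1}(y)/∂y,

and hence

r^0(y; x) = Dx F(y | x)/f(y | x) = −[∂H1(x)/∂x] / [∂G^{-1}(y)/∂y] ≡ s1(x) s2(y),

which is free of the error distribution. The formal argument, with the conditioning covariates Z and the conditions of A.1, is given in the Appendix.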


Let Y0 ≡ [y, ȳ] ⊂ Y for finite real numbers y and ȳ. Let 1{·} denote the indicator function that equals one when · is true and zero otherwise, and let EY(·) and EXZ(·) denote expectations with respect to Y and (X, Z), respectively. Define

r(y; x, z) ≡ r^0(y; x, z) 1{y ∈ Y0},   (2.3)

r0 ≡ EY EXZ[r(Y; X, Z)],   r1(x) ≡ E[r(Y; x, Z)],   r2(y) ≡ E[r(y; X, Z)],

where r(y; x, z) denotes a trimmed version of r^0(y; x, z). Note that r, r0, r1, and r2 are all dx × 1 vectors. The following corollary summarizes a testable implication of (2.1) under H10 and A.1.

Corollary 2.2. Suppose that H10 and A.1 hold and r0 ≠ 0. Then

r(Y; X, Z) ◦ r0 = r1(X) ◦ r2(Y)   a.s.,   (2.4)

where ◦ denotes the Hadamard product.

Remark 2.3. This corollary remains valid if we drop the indicator 1{y ∈ Y0} in the definition of r in (2.3). Equivalently, one can take Y0 = Y in the definition of r and still obtain the above result, provided that r is well defined. We incorporate the indicator function in our theorem to permit the trimming of the data in the tails that facilitates the rigorous establishment of the asymptotic properties of our test. Specifically, our asymptotic theory below requires consistent estimation of r(y; x, z) uniformly in (y; x, z) ∈ W0 ≡ Y0 × V. If f(y | x, z) is too close to zero for some values of (y, x, z) ∈ W, then we cannot estimate r(y; x, z) uniformly in (y; x, z) ∈ W at a sufficiently fast rate. We therefore restrict our attention to a subset W0 on which f(y | x, z) is bounded away from zero.

Based on Corollary 2.2, consider the following null hypothesis:

H0: Pr[r(Y; X, Z) ◦ r0 − r1(X) ◦ r2(Y) = 0] = 1.   (2.5)

The alternative hypothesis HA is the negation of H0, i.e.,

HA: Pr[r(Y; X, Z) ◦ r0 − r1(X) ◦ r2(Y) = 0] < 1.   (2.6)

According to the characterization result in Theorem 2.1, rejection of (2.5) can only be due to the violation of either H10, the original null hypothesis of interest, or the conditional exogeneity condition in A.1. Maintaining the conditional exogeneity assumption, we may therefore use the null hypothesis H0 to test the original null of interest, H10. Alternatively, if we maintain the transformation model specification in H10, our test can be used to test the conditional exogeneity Assumption A.1 (see Remark 2.6 for more on this last point).

To test the null hypothesis H0 in (2.5), we use a construction analogous to that of Härdle and Mammen (1993) by considering the weighted L2 distance between r ◦ r0 and r1 ◦ r2:

Γ ≡ E[‖r(Y; X, Z) ◦ r0 − r1(X) ◦ r2(Y)‖² a(Y; X, Z)],   (2.7)

where ‖·‖ denotes the Euclidean norm, and a(y; x, z) is a nonnegative weight function that is continuous on its compact support Y0 × V0, where V0 ⊂ V. Then Γ = 0 under H0 and generally deviates from zero under HA. In the next subsection we consider the sample version of Γ based on local polynomial estimates of r, r0, r1, and r2.

Remark 2.4. In a previous version of this paper, we considered another set of testable implications:

Pr[r(Y; X, Z) r0^π − r1(X) r2^π(Y) = 0] = 1   under H10,   (2.8)

where r2^π(y) ≡ π′r2(y), r0^π ≡ π′r0, and π ≡ (π1, ..., πdx)′ is a dx × 1 weight vector (e.g., π = (1/dx, ..., 1/dx)).


This exploits the additional implication that s2(·) is a scalar function under H10; however, constructing a test based on Eq. (2.8) introduces the problem of choosing a weight vector π. Thus, following the suggestion of a referee, we now construct a test based on the null hypothesis (2.5), which is free of π. Note that when dx = 1, π can only equal 1, in which case (2.5) and (2.8) are equivalent.

The null H0 only exploits some of the implications of Theorem 2.1(a). This suggests that there might be some additional testable restrictions in H10 that H0 ignores. Corollary 2.3 shows that when dx = 1, the only additional restriction implied by H10 that H0 ignores is a sign restriction on s2(·), namely that s2: R → R+ or s2: R → R−. That is, the sign of s2(y) does not depend on y.

Corollary 2.3. Suppose that dx = 1, f(y | x, z) ≠ 0 for all (y, x, z) ∈ W, Y0 = Y, and r0 ≠ 0. Then H10 holds if and only if

Pr[r(Y; X, Z) · |r0| − r1(X) · |r2(Y)| = 0] = 1.   (2.9)

In principle, we could test Eq. (2.9) directly. However, a test based on (2.9) would involve the non-differentiable absolute value function, which would greatly complicate the asymptotic analysis. When dx > 1 there are, in addition to the sign restriction on s2(·), two more restrictions in H10 that H0 ignores, as follows. (i) We do not exploit the fact that s2(·) is a scalar function under H10. This information could be incorporated into our test as discussed in Remark 2.4. (ii) We do not exploit the fact that s1(·), a dx-variate function, equals a vector of derivatives of an unknown scalar function, i.e., s1(·) can be written as s1(x) = −∂S1(x)/∂x for some measurable function S1: R^dx → R. It might be possible to exploit this restriction in the null hypothesis by imposing the implied constraint that the matrix of derivatives ∂s1(x)/∂x′ is symmetric (though this would require an additional smoothness assumption). Overall, these differences between H10 and H0 appear to be relatively minor, so the main substance of the testable implications of H10 is given by the product form in H0 that we test.

Remark 2.5. H0 is a necessary condition for H10. Following a referee's suggestion, we provide a characterization of the intersection of H0 and H1A. Let Y0 = Y. When dx = 1, the intersection of H0 and H1A includes all data generating processes (DGPs) that satisfy Eq. (2.5) but fail to satisfy Eq. (2.9). Even though we feel that the difference between Eqs. (2.5) and (2.9) is small, the intersection of H0 and H1A is not empty. To illustrate, consider the example under H1A that Y = R(X, U) = X²U, where X has support on R+\{0} and U has support [a, b] with a < 0 and b > 0.4 In this example,

r(Y; X, Z) = −2Y/X,   r0 = −2E(1/X)E(Y),   r1(X) = −(2/X)E(Y),   r2(Y) = −2E(1/X)Y.

It is easy to show that Eq. (2.5) holds. However,

r(Y; X, Z) · |r0| − r1(X) · |r2(Y)| = −4 · E(1/X) · (1/X) · [|E(Y)| · Y − E(Y) · |Y|] ≠ 0.

When dx > 1, the intersection of H0 and H1A includes all DGPs that satisfy Eq. (2.5) but fail to satisfy one (or both) of the following conditions: (i) r(Y; X, Z) · |r0^π| − r1(X) · |r2^π(Y)| = 0, where r2^π(y) ≡ π′r2(y), r0^π ≡ π′r0, and π ≡ (π1, ..., πdx)′ is a dx × 1 weight vector (e.g., π = (1/dx, ..., 1/dx)) (for the role of the weight π, see Remark 2.4); and (ii) r1(·) can be written as r1(x) = −∂S1(x)/∂x for some measurable function S1: R^dx → R.

Remark 2.6. Chiappori et al. (2013, CKK hereafter) consider a model that is similar to our null model. Using our notation, their model can be written as Y = G(H1(X, Z) + U), where X and Z are dx × 1 and dz × 1 vectors of observable variables, respectively, U is a scalar unobservable error term, G: R → R is an unknown strictly monotone function, and H1: R^{dx+dz} → R is an unknown measurable function. They assume that X ⊥ U | Z and mainly focus on the identification and estimation of G(·) and H1(·).5 In an intermediate step in the proof of their identification result, CKK show that (again using our notation) r^0(Y; X, Z) = s̃1(X, Z) · s2(Y) a.s., where s̃1: R^{dx+dz} → R and s2: R → R are two measurable functions. This is similar to our characterization of r^0, though CKK use their result in a completely different way. In particular, CKK use their result to show that θ(y; x, z) is a constant function of (x, z), where θ(y; x, z) = S(y; x, z)/E[S(Y; x, z)] and S(y; x, z) = ∫_0^y r^0(u; x, z) du. Equivalently,

∫ θ(y; x, z) w(x, z) d(x, z) − θ(y; x, z) = 0   ∀(y, x, z),   (2.10)

where w(·) is a weight function such that ∫ w(x, z) d(x, z) = 1. CKK then propose a weighted L2-distance-based test of the conditional exogeneity condition (i.e., that X ⊥ U | Z) by testing (2.10). We test implications of Y = G[H1(X) + U] against the general alternative, given either X ⊥ U | Z or just X ⊥ U. In contrast, CKK assume G(H1(X, Z) + U) holds and test whether X ⊥ U | Z. Thus both our null and our alternative differ from CKK's, and so we view our test as a complement to theirs, not a substitute. However, one point of commonality is that, if one maintains the assumption that X ⊥ U | Z, then either CKK's test or our test could be applied to test our null H10. But even when maintaining this conditional independence assumption, there are still substantial differences between the tests for the null H10. For example, unlike CKK, we exploit the exclusion restriction that Z does not enter the structural equation Y = G[H1(X) + U] under H10, and so we have power in those directions.6

2.2. Estimation and test statistic

The derivation in the previous section allows the covariates Z to be continuous or discrete. To describe our estimators and associated test statistics, we first consider the (more difficult) case where Z is continuous. Remark 2.7 then discusses the case where some or all of the elements of Z are discrete.

We employ local polynomial regression to estimate various unknown population objects. Let v ≡ (x′, z′)′ = (v1, ..., vd)′ be a d × 1 vector, d ≡ dx + dz, where x is dx × 1 and z is dz × 1. Let j ≡ (j1, ..., jd) be a d-vector of non-negative integers. Following Masry (1996), adopt the notation

v^j ≡ ∏_{i=1}^{d} v_i^{j_i},   j! ≡ ∏_{i=1}^{d} j_i!,   |j| ≡ ∑_{i=1}^{d} j_i,   ∑_{0≤|j|≤p} ≡ ∑_{k=0}^{p} ∑_{j1=0}^{k} ··· ∑_{jd=0}^{k} (with j1 + ··· + jd = k).

4 If U has support on R+\{0}, then we can write Y = exp(ln X² + ln(U)), which is essentially a DGP under the null H10 by setting Ũ = ln(U).
5 To identify H1(·), and hence for estimation, CKK also assume that they observe an additional instrumental variable Q such that E(U | Q) = 0 and that the conditional distribution of Z given Q is complete.
6 It is not surprising that our test would have advantages over CKK in testing H10, because that was not CKK's intended null. Other differences are that our test is based directly on Corollary 2.2 while CKK's is based on Eq. (2.10), we use local polynomials instead of Nadaraya–Watson kernel estimators, and our test is numerically simpler by not requiring any numerical integration, while CKK requires multiple numerical integration steps.

From v^j ≡ ∏_{i=1}^{d} v_i^{j_i}, the j_i's represent powers applied to the elements of v when constructing polynomials. Consider the pth order local polynomial estimator Dx F̂b(y|x, z) of Dx F(y|x, z). The subscript b = bn is a bandwidth parameter. Let Vi ≡ (Xi′, Zi′)′, so that Vi − v = ((Xi − x)′, (Zi − z)′)′. Given observations {(Yi, Vi), i = 1, ..., n}, we estimate Dx F(y|v) by solving the weighted least squares problem

min_β n^{-1} ∑_{i=1}^{n} [1{Yi ≤ y} − ∑_{0≤|j|≤p} βj′ ((Vi − v)/b)^j]² Kb(Vi − v).   (2.11)

Here β stacks the βj's (0 ≤ |j| ≤ p) in lexicographic order (with β0, indexed by 0 ≡ (0, ..., 0), in the first position, the element with index (0, 0, ..., 1) next, etc.) and Kb(·) ≡ K(·/b)/b^d, where K(·) is a symmetric PDF on R^d. Let β̂(y|v) denote the solution to the above minimization problem. Let Nl ≡ (l + d − 1)!/(l!(d − 1)!) be the number of distinct d-tuples j having |j| = l. In the above estimation problem, this denotes the number of distinct lth order partial derivatives of F(y|v) with respect to v. Let N ≡ ∑_{l=0}^{p} Nl. Let µ(·) be a stacking function such that µ((Vi − v)/b) denotes an N × 1 vector that stacks ((Vi − v)/b)^j, 0 ≤ |j| ≤ p, in lexicographic order (e.g., µ(v) = (1, v′)′ when p = 1). Let µb(v) ≡ µ(v/b). Then

β̂(y|v) = [Sb(v)]^{-1} n^{-1} ∑_{i=1}^{n} Kb(Vi − v) µb(Vi − v) 1{Yi ≤ y},   (2.12)

where Sb(v) ≡ n^{-1} ∑_{i=1}^{n} Kb(Vi − v) µb(Vi − v) µb(Vi − v)′. The pth order local polynomial estimator Dx F̂b(y|x, z) of Dx F(y|x, z) is given by

Dx F̂b(y|x, z) = e1 β̂(y|x, z)/b,   (2.13)

where e1 ≡ [0_{dx×1}, I_{dx}, 0_{dx×(N−dx−1)}] selects the estimator of the coefficient of (Xi − x)/b in the above regression.

To estimate f(y|v), the conditional PDF of Yi given Vi = v, we again employ local polynomial regression. Like Fan et al. (1996), we estimate f(y|v) as f̂c(y|v), the minimizing constant in the weighted least squares problem

min_γ n^{-1} ∑_{i=1}^{n} [Lc(Yi − y) − ∑_{0≤|j|≤p} γj′ ((Vi − v)/c)^j]² Kc(Vi − v),

where γ stacks the γj's (0 ≤ |j| ≤ p) in lexicographic order and Lc(·) ≡ L(·/c)/c, with L(·) a symmetric kernel function defined on R and c ≡ cn a bandwidth parameter. Here, we use the same bandwidth sequence for Yi and Vi, although different choices of bandwidths are also possible. To reduce the bias of the estimator f̂c, we permit the use of a higher-order kernel for L. It is straightforward to verify that

f̂c(y|v) = e2′ [Sc(v)]^{-1} n^{-1} ∑_{i=1}^{n} Kc(Vi − v) µc(Vi − v) Lc(Yi − y),   (2.14)

where e2 ≡ (1, 0, ..., 0)′ is an N × 1 vector.7

Define

r̂(y; x, z) ≡ [Dx F̂b(y|x, z)/f̂c(y|x, z)] 1{y ∈ Y0},
r̂0 ≡ n^{-2} ∑_{i=1}^{n} ∑_{j=1}^{n} r̂(Yi; Xj, Zj),
r̂1(x) ≡ n^{-1} ∑_{i=1}^{n} r̂(Yi; x, Zi),   and   r̂2(y) ≡ n^{-1} ∑_{i=1}^{n} r̂(y; Xi, Zi).

Our proposed test statistic is

Γ̂ = n^{-1} ∑_{i=1}^{n} ‖r̂(Yi; Xi, Zi) ◦ r̂0 − r̂1(Xi) ◦ r̂2(Yi)‖² a(Yi; Xi, Zi),   (2.15)

which is a sample analogue of Γ in (2.7). We next study the asymptotic properties of Γ̂ under H0, HA, and a sequence of Pitman local alternatives.

Remark 2.7. The above estimators and associated tests are easily extended to allow some or all elements of Z to be discrete. To estimate r(y; x, z) in this case, just stratify the sample by each distinct discrete outcome. Specifically, suppose Z = (Zc, Zd), where Zc is continuous and Zd discrete. Then estimate r(y; x, z) = r(y; x, zc, zd) as above (replacing Z with Zc everywhere), using only the data having Zdi = zd, and repeat for each value zd in the support of Zd. The functions r0, r1, and r2 can be estimated exactly the same way, by averaging out (Xi, Yi, Zi), (Yi, Zi), and (Xi, Zi), respectively, and then our test statistic Γ̂ is still given by (2.15). More sophisticated estimators (e.g., smoothing across the discrete Zd cells as proposed in Li and Racine, 2003) could also be used to estimate these functions. We omit the details for brevity.

7 To ensure that the estimator of f(y | v) is positive, we can replace f̂c(y | v) here with a trimmed version defined as max{f̂c(y | v), ϵ/√n}, where ϵ is a small positive number, e.g., ϵ = 0.01. This would not change our resulting asymptotic theory.
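The construction of Γ̂ is mechanical once r̂(y; x, z) is available. The following is a minimal numerical sketch (ours, not the authors' code) that uses a local linear (p = 1) fit for Dx F̂, a simple local constant stand-in for f̂, a product Gaussian kernel, and the weight a ≡ 1; the theory above requires pth-order polynomials with p > d/2 and possibly a higher-order kernel L, so this only illustrates the structure of (2.11)–(2.15).

    import numpy as np

    def kernel(u):
        # product Gaussian kernel on R^d
        return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (u.shape[-1] / 2.0)

    def dxF_hat(y, v, Y, V, b, dx):
        # local linear regression of 1{Y_i <= y} on (1, (V_i - v)/b) with kernel weights;
        # the slopes on the X-block, divided by b, estimate D_x F(y | v) as in (2.13)
        U = (V - v) / b
        w = kernel(U)
        R = np.column_stack([np.ones(len(Y)), U])
        beta = np.linalg.solve(R.T @ (R * w[:, None]), (R * w[:, None]).T @ (Y <= y).astype(float))
        return beta[1:1 + dx] / b

    def f_hat(y, v, Y, V, c):
        # local constant (Nadaraya-Watson) stand-in for the density estimator (2.14)
        w = kernel((V - v) / c)
        ly = np.exp(-0.5 * ((Y - y) / c) ** 2) / (np.sqrt(2.0 * np.pi) * c)
        return np.sum(w * ly) / np.sum(w)

    def r_hat(y, x, z, Y, V, b, c, y0):
        # trimmed derivative ratio r_hat(y; x, z) = D_x F_hat / f_hat * 1{y in Y0}
        dx = np.atleast_1d(x).size
        if not (y0[0] <= y <= y0[1]):
            return np.zeros(dx)
        v = np.concatenate([np.atleast_1d(x), np.atleast_1d(z)])
        return dxF_hat(y, v, Y, V, b, dx) / f_hat(y, v, Y, V, c)

    def gamma_hat(Y, X, Z, b, c, y0):
        # sample analogue (2.15) with weight a = 1; X is (n, dx), Z is (n, dz);
        # naive O(n^3) loops, for illustration only
        n = len(Y)
        V = np.column_stack([X, Z])
        A = np.array([[r_hat(Y[i], X[j], Z[j], Y, V, b, c, y0) for j in range(n)] for i in range(n)])
        B = np.array([[r_hat(Y[j], X[i], Z[j], Y, V, b, c, y0) for j in range(n)] for i in range(n)])
        r0 = A.mean(axis=(0, 1))                      # r_hat_0
        r2 = A.mean(axis=1)                           # r2[i] = r_hat_2(Y_i)
        r1 = B.mean(axis=1)                           # r1[i] = r_hat_1(X_i)
        diag = np.array([A[i, i] for i in range(n)])  # r_hat(Y_i; X_i, Z_i)
        return np.mean(np.sum((diag * r0 - r1 * r2) ** 2, axis=1))

The studentized statistic Tn in (3.1) below adds the bias and variance estimates B̂n and σ̂n² on top of this quantity.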

3. Asymptotic properties of the test statistic

3.1. Basic assumptions

To study the asymptotic properties of Γ̂, we make the following assumptions.

Assumption C.1. Let Wi ≡ (Yi, Xi′, Zi′)′, i = 1, 2, ..., n, be IID random variables on (Ω, F, P), with (Yi, Xi, Zi) distributed identically to (Y, X, Z).



Assumption C.2. (i) The PDF f(v) of Vi is continuous in v ∈ V, and f(y|v) is continuous in (y, v) ∈ Y0 × V. (ii) There exist C1, C2 ∈ (0, ∞) such that C1 ≤ inf_{v∈V} f(v) ≤ sup_{v∈V} f(v) ≤ C2, and C1 ≤ inf_{(y,v)∈Y0×V} f(y|v) ≤ sup_{(y,v)∈Y0×V} f(y|v) ≤ C2.

Assumption C.3. (i) F(·|v) is equicontinuous on Y0: ∀ϵ > 0, ∃δ > 0: |y − ỹ| < δ ⇒ sup_{v∈V} |F(y|v) − F(ỹ|v)| < ϵ. For each y ∈ Y0, F(y | ·) is Lipschitz continuous on V and has all partial derivatives up to order p + 1, p ∈ N. (ii) Let D^j F(y|v) ≡ ∂^{|j|} F(y|v)/∂^{j1}v1 ··· ∂^{jd}vd. For each y ∈ Y0, D^j F(y | ·) with |j| = p + 1 is uniformly bounded and Lipschitz continuous on V: for all v, ṽ ∈ V, |D^j F(y | v) − D^j F(y | ṽ)| ≤ C3 ‖v − ṽ‖ for some C3 ∈ (0, ∞), where ‖·‖ is the Euclidean norm. (iii) For each v ∈ V and for all y, ỹ ∈ Y0, |D^j F(y | v) − D^j F(ỹ | v)| ≤ C4 |y − ỹ| for some C4 ∈ (0, ∞), where |j| = p + 1.


Assumption C.4. Let r ≥ 2. The rth derivative f^{(r)}(y|v) of f(y|v) with respect to y and all the (p+1)th partial derivatives of f(y|v) with respect to v are uniformly continuous on Y0 × V.

Assumption C.5. (i) The kernel K: R^d → R+ is a continuous, bounded, and symmetric PDF. (ii) v ↦ ‖v‖^{2p+1} K(v) is integrable on R^d with respect to the Lebesgue measure. (iii) Let Kj(v) ≡ v^j K(v) for all j with 0 ≤ |j| ≤ 2p + 1. For some finite constants σK, σ̄1, and σ̄2, either K(·) is compactly supported such that K(v) = 0 for ‖v‖ > σK and |Kj(v) − Kj(ṽ)| ≤ σ̄2 ‖v − ṽ‖ for any v, ṽ ∈ R^d and for all j with 0 ≤ |j| ≤ 2p + 1; or K(·) is differentiable, |∂Kj(v)/∂v| ≤ σ̄1, and for some ι0 > 1, |∂Kj(v)/∂v| ≤ σ̄1 ‖v‖^{−ι0} for all ‖v‖ > σK and for all j with 0 ≤ |j| ≤ 2p + 1.

Assumption C.6. The univariate kernel function L satisfies ∫ L(y)² dy < ∞ and is a symmetric rth order kernel, i.e., ∫ L(y) dy = 1, ∫ y^s L(y) dy = 0 for all s = 1, ..., r − 1, and ∫ y^r L(y) dy < ∞. The rth derivative of L exists and is continuous.

Assumption C.7. (i) p > d/2. (ii) As n → ∞, (c^{p+1} + c^r)/b^{d/2} → 0, (b^p + c^{p+1} + c^r) b^{d/2+2}/c^{d+1} → 0, b^{d+4}/c^{d+1} → 0, nb^{2(p+1)+d} → 0, and nb^{d+2}(c^{2(p+1)} + c^{2r}) → 0. (iii) As n → ∞, min{nb^{2d}, nb^{3d/2+1}/ln n, nb^{d+2}, nb^{dx+2}/ln n, nb^{d+1}c^{(d+1)/2}/ln n, nb^{d/2}c^{d+1}/ln n, nb^{−(d/2+2)}c^{2(d+1)}/ln n, nb^{−1}c^{3(d+1)/2}/ln n, nb^{−(d+4)}c^{3(d+1)}} → ∞.

We assume IID observations in Assumption C.1, which is standard in cross-section studies. Assumptions C.2–C.4 impose smoothness conditions on the conditional CDF F(y|v) and PDF f(y|v) that are used to ensure uniform consistency of our local polynomial estimators, based on results of Masry (1996) and Hansen (2008). Assumption C.2(ii) is relatively strong, but it is needed, given the definitions of r(y; x, z), r0, r1(x), and r2(y) and the uniform consistency of r̂(y; x, z) (uniformly in (y; (x, z)) ∈ Y0 × V), to establish the asymptotic properties of the estimators r̂0, r̂1(x), and r̂2(y); see the proofs of Lemma B.3(ii)–(iii) in the supplemental material (see Appendix D). However, if we modify the definition of r(y; x, z) to be r(y; x, z) = r^0(y; x, z) 1{(y; (x, z)) ∈ Y0 × V0} and define r0, r1(x), and r2(y) accordingly, then we can relax Assumption C.2(ii) to:

Assumption C.2. (ii*) There exist C1, C2 ∈ (0, ∞) such that C1 ≤ inf_{v∈V0} f(v) ≤ sup_{v∈V0} f(v) ≤ C2, and C1 ≤ inf_{(y,v)∈Y0×V0} f(y|v) ≤ sup_{(y,v)∈Y0×V0} f(y|v) ≤ C2.

In this case, we would only need to restrict f(v) and f(y|v) to be well behaved on V0 and Y0 × V0, respectively.8 The asymptotic theory developed in the next two subsections would remain essentially the same. Assumptions C.5 and C.6 impose conditions on the kernels K and L, which are standard in the literature on local polynomial regression and conditional density estimation. Assumption C.7 restricts the choice of the bandwidth sequences b and c, the order p of the local polynomial regressions, and the order r of the kernel L. This assumption allows c to differ from b, but in the case where b = c Assumption C.7 simplifies to the following assumption:

Assumption C.7*. (i) p > d/2 and r > d/2. (ii) As n → ∞, nb^{2(p+1)+d} → 0 and nb^{2r+d+2} → 0. (iii) As n → ∞, min{nb^{2d}, nb^{3(d+1)/2}/ln n, nb^{d+2}, nb^{dx+2}/ln n} → ∞.

Note that we allow dz = 0; otherwise the condition nb^{dx+2}/ln n → ∞ as n → ∞ becomes redundant.

3.2. Asymptotic null distribution

In this section, we study the asymptotic behavior of the test statistic in (2.15). To state the next result, let wi ≡ (yi, vi′)′ and introduce the following notation:

ζ1k(y; v) ≡ b^{−1} f(y|v)^{−1} e1 S̄b(v)^{−1} µb(Vk − v) Kb(Vk − v) 1̄y(Wk) 1{y ∈ Y0},
ζ2k(y; v) ≡ f(y|v)^{−2} Dx F(y|v) e2′ S̄c(v)^{−1} µc(Vk − v) Kc(Vk − v) L̄y(Wk) 1{y ∈ Y0},
ζk(y; v) ≡ ζ1k(y; v) − ζ2k(y; v),
ζ(Wi, Wj, Wk) ≡ [ζj(Yi; Vi) ◦ r0]′ [ζk(Yi; Vi) ◦ r0] ai,
ϕ(wi, wj) ≡ E[ζ(W1, wi, wj)],

where S̄b(v) ≡ E[Sb(v)], 1̄y(Wk) ≡ 1{Yk ≤ y} − F(y|Vk), L̄y(Wk) ≡ Lc(Yk − y) − α(y|Vk), α(y|v) ≡ E[Lc(Yk − y)|Vk = v], and ai ≡ a(Yi; Xi, Zi). Define the asymptotic bias term

Bn ≡ n^{−1} b^{d/2+2} ∑_{i=1}^{n} ϕ(Wi, Wi)
   + n^{−4} b^{d/2+2} ∑_{i=1}^{n} ‖ ∑_{j=1}^{n} ∑_{k=1}^{n} ζk(Yj; Xi, Zj) ◦ r2(Yi) ‖² ai
   − 2 n^{−3} b^{d/2+2} ∑_{i=1}^{n} [ ∑_{l=1}^{n} ζl(Yi; Vi) ◦ r0 ]′ [ ∑_{j=1}^{n} ∑_{k=1}^{n} ζk(Yj; Xi, Zj) ◦ r2(Yi) ] ai
   ≡ B1n + B2n − 2B3n, say.

We establish the asymptotic null distribution of the Γ̂ test statistic as follows:

Theorem 3.1. Suppose Assumptions A.1 and C.1–C.7 hold. Then

nb^{d/2+2} Γ̂ − Bn →d N(0, σ0²),

where σ0² ≡ lim_{n→∞} σn² and σn² = 2b^{d+4} E[ϕ(W1, W2)²].

Remark 3.1. The asymptotic bias Bn of nb^{d/2+2} Γ̂ contains three terms: B1n, B2n, and −2B3n. The first two terms reflect the contributions of r̂(Yi; Vi) ◦ r̂0 and r̂1(Xi) ◦ r̂2(Yi), respectively, and the last term reflects the interaction between these two. We show that B1n = OP(b^{d/2+2}(b^{−d−2} + c^{−d−1})) in Lemma B.4, B2n = OP(b^{d/2+2}(b^{−dx−2} + c^{−dx})) in Lemma B.5(b), and B3n = OP(b^{d/2+2}(b^{−dx−2} + c^{−dx})) in Lemma B.6(b). Here B1n never vanishes asymptotically, whereas B2n and B3n are asymptotically negligible under appropriate conditions, such as when b = c and dz > dx. The asymptotic variance σn² of nb^{d/2+2} Γ̂ only reflects the contribution of r̂(Yi; Vi) ◦ r̂0, due to the faster convergence rate of r̂1(Xi) ◦ r̂2(Yi) to r1(Xi) ◦ r2(Yi) relative to that of r̂(Yi; Vi) ◦ r̂0 to r(Yi; Vi) ◦ r0.

8 We would like to thank an associate editor for pointing out that Assumption C.2(ii) could be relaxed in this way.


To implement the test, we need consistent estimates of the asymptotic bias and variance. Let

ζ̂1k(y; v) ≡ b^{−1} f̂c(y|v)^{−1} e1 Sb(v)^{−1} µb(Vk − v) Kb(Vk − v) 1̂y(Wk) 1{y ∈ Y0},
ζ̂2k(y; v) ≡ f̂c(y|v)^{−2} Dx F̂b(y|v) e2′ Sc(v)^{−1} µc(Vk − v) Kc(Vk − v) L̂y(Wk) 1{y ∈ Y0},
ζ̂k(y; v) ≡ ζ̂1k(y; v) − ζ̂2k(y; v),
ϕ̂(Wj, Wk) ≡ n^{−1} ∑_{i=1}^{n} [ζ̂j(Yi; Vi) ◦ r̂0]′ [ζ̂k(Yi; Vi) ◦ r̂0] ai,

where 1̂y(Wk) ≡ 1{Yk ≤ y} − F̂b(y|Vk), L̂y(Wk) ≡ Lc(Yk − y) − f̂c(y|Vk), and F̂b(y|Vk) is the pth order local polynomial estimator of F(y|Vk) using the kernel K and bandwidth b. We propose to estimate the asymptotic bias Bn and variance σn² respectively by

B̂n ≡ n^{−1} b^{d/2+2} ∑_{i=1}^{n} ϕ̂(Wi, Wi)
   + n^{−4} b^{d/2+2} ∑_{i=1}^{n} ‖ ∑_{j=1}^{n} ∑_{k=1}^{n} ζ̂k(Yj; Xi, Zj) ◦ r̂2(Yi) ‖² ai
   − 2 n^{−3} b^{d/2+2} ∑_{i=1}^{n} [ ∑_{l=1}^{n} ζ̂l(Yi; Vi) ◦ r̂0 ]′ [ ∑_{j=1}^{n} ∑_{k=1}^{n} ζ̂k(Yj; Xi, Zj) ◦ r̂2(Yi) ] ai,

σ̂n² ≡ 2 n^{−2} b^{d+4} ∑_{i=1}^{n} ∑_{j=1}^{n} ϕ̂(Wi, Wj)².

It is straightforward to show that B̂n − Bn = oP(1) and σ̂n² − σn² = oP(1). We can now compare

Tn ≡ nb^{d/2+2} (Γ̂ − B̂n)/√(σ̂n²)   (3.1)

to the critical value zα, defined as the upper α percentile of the N(0, 1) distribution (since the test is one-sided), and reject the null when Tn > zα.

3.3. Consistency and asymptotic local power

The following theorem shows that the test Tn is consistent for the class of global alternatives

HA: µA ≡ E[‖r(Y; X, Z) ◦ r0 − r1(X) ◦ r2(Y)‖² a(Y; X, Z)] > 0.

Theorem 3.2. Suppose Assumptions C.1–C.7 hold. Then Tn diverges to infinity at the rate nb^{d/2+2} under HA, i.e., P(Tn > en) → 1 as n → ∞ under HA for any nonstochastic sequence en = o(nb^{d/2+2}).

To study the local power of our test, we consider the model

Yni ≡ Rn(Xni, Uni) = G[H1(Xni) + Uni + γn δ(Xni, Uni)],   (3.2)

where γn → 0 as n → ∞, δ(Xni, Uni) is not additively separable in Xni and Uni, and G and H1 are as defined under H10 in Section 1. Note that we allow both Yni and (Xni, Uni) to be double-array, and the structural function Rn is now n-dependent. As before, we assume that Xni and Uni are independent given the dz-vector Zni: Xni ⊥ Uni | Zni. Following the literature on nonseparable models, we also assume that Rn(x, ·) is strictly increasing for each x on the support of Xni. Formally, we put these requirements into the following assumption.

Assumption A.1*. Xni ⊥ Uni | Zni and Eq. (3.2) holds such that Rn(x, ·) is strictly increasing for each x on the support of Xni.

Let δn(x, u) ≡ u + γn δ(x, u). Given the strict monotonicity of G, without loss of generality, we assume G is also strictly increasing. This, in conjunction with Assumption A.1*, implies that δn(x, ·) is strictly increasing for each x on the support of Xni. Let Fn(· | x, z) and fn(· | x, z) denote the conditional CDF and PDF of Yni given (Xni, Zni) = (x, z), respectively. We now use Xn, Zn, and Wn to denote the supports of Xni, Zni, and (Yni, Xni, Zni), respectively. Let r_n^0(y; x, z) ≡ Dx Fn(y|x, z)/fn(y|x, z). The following theorem parallels Theorem 2.1(a) and lays the foundation for the asymptotic local analysis of our test statistic.

Theorem 3.3. Suppose that A.1* holds. Suppose that fn(y | x, z) is continuously differentiable with respect to both y and x for each z ∈ Zn and fn(y | x, z) ≠ 0 for all (y, x, z) ∈ Wn. Suppose that G: R → R is strictly increasing and continuously differentiable, H1: R^dx → R is continuously differentiable, and δ(x, ·) is continuously differentiable for each x ∈ Xn. Then

r_n^0(y; x, z) = s1n(x) s2(y) + γn ∆n(y; x, z) + o(γn)   for all (y, x, z) ∈ Wn,   (3.3)

where s1n: R^dx → R^dx and s2: R → R+ (or R−) are defined in (B.6) and ∆n(y; x, z) is defined in (B.5).

As in Section 2.1, let rn(y; x, z) ≡ r_n^0(y; x, z) 1{y ∈ Y0}, r0n ≡ EYn EXnZn[rn(Yni; Xni, Zni)], r1n(x) ≡ E[rn(Yni; x, Zni)], and r2n(y) ≡ E[rn(y; Xni, Zni)]. The following corollary indicates that r1n(x) ◦ r2n(y) deviates from rn(y; x, z) ◦ r0n only locally.

Corollary 3.4. Suppose that the conditions in Theorem 3.3 hold. Then

rn(y; x, z) ◦ r0n − r1n(x) ◦ r2n(y) = γn ∆̄n(y; x, z) + o(γn)   for all (y, x, z) ∈ W0n,

where ∆̄n(y; x, z) is defined in (B.7) in the Appendix and W0n is defined as Wn but with y restricted to Y0.

Remark 3.2. To see ∆̄n(y; x, z) more clearly, we consider a special case of Eq. (3.2):

Yni ≡ Rn(Xni, Uni) = G[H1(Xni) + Uni + γn · δ1(Xni) · Uni],

where δ1(·): R^d → R is a measurable function. Further, we assume that δ1(Xni) > 0 a.s. and Y0 = Y. Then we can show that in this case,

rn(y; x, z) = − H1′(x)/[∂G^{−1}(y)/∂y] − γn { G^{−1}(y) · δ1′(x)/[∂G^{−1}(y)/∂y] − H1(x) · δ1′(x)/[∂G^{−1}(y)/∂y] } + o(γn).

Hence, ∆n(y; x, z) in Theorem 3.3 is

∆n(y; x, z) = − δ1′(x) · G^{−1}(y)/[∂G^{−1}(y)/∂y] + H1(x) · δ1′(x)/[∂G^{−1}(y)/∂y],


and ∆̄n(y; x, z) in Corollary 3.4 becomes

∆̄n(y; x, z) = { E[H1′(Xni)] ◦ δ1′(x) − H1′(x) ◦ E[δ1′(Xni)] } × { [G^{−1}(y)/(∂G^{−1}(y)/∂y)] · E[1/(∂G^{−1}(Yni)/∂y)] − E[G^{−1}(Yni)/(∂G^{−1}(Yni)/∂y)] · [1/(∂G^{−1}(y)/∂y)] }.

To see that ∆̄n(y; x, z) ≠ 0 in this special case, let G(·) be the identity function, G(y) = y, and dx = 1. Then

∆̄n(y; x, z) = { E[H1′(Xni)] · δ1′(x) − H1′(x) · E[δ1′(Xni)] } · [y − E(Yni)].

y − E(Yni) is not zero when Yni is random. E[H1′(Xni)] · δ1′(x) − H1′(x) · E[δ1′(Xni)] is also in general not zero unless H1(x) = b + c · δ1(x) for all x, where b and c are constants.9

9 If H1(X) = b + c · δ1(X) holds and Uni + c/γn > 0 a.s., then
Yni ≡ Rn(Xni, Uni) = G[H1(Xni) + Uni + γn · δ1(Xni) · Uni] = G[b + c δ1(Xni) + Uni + γn · δ1(Xni) · Uni] = G[exp(ln(γn δ1(Xni) + 1) + ln(Uni + c/γn)) + b − c/γn].
This is essentially a DGP under the null H10 by defining G̃n(·) ≡ G[exp(·) + b − c/γn], H̃1n(x) ≡ ln(γn δ1(x) + 1), and Ũni ≡ ln(Uni + c/γn). Hence, it is not surprising that ∆̄n(y; x, z) = 0 in this case.

With the above corollary, we can study the local power of our test. We consider the following sequence of Pitman local alternatives:

HA(γn): rn(y; x, z) ◦ r0n − r1n(x) ◦ r2n(y) = γn ∆̄n(y; x, z) + o(γn),   (3.4)

where γn → 0 as n → ∞, and ∆̄n is a nonzero measurable function with µ0 ≡ lim_{n→∞} E[‖∆̄n(Yni; Xni, Zni)‖² a(Yni; Xni, Zni)] < ∞. For technical reasons, we assume that the term o(γn) in (3.4) holds uniformly in (y, x, z) on the support of the weight function a(y; x, z). We continue to use r̂, r̂0, r̂1, and r̂2 to denote the local polynomial estimates of rn, r0n, r1n, and r2n, respectively. The asymptotic bias and variance terms are estimated as before with slight notational changes to account for double-array processes. The final test statistic Tn is constructed as before. The following theorem reports the asymptotic property of Tn under HA(γn).

Theorem 3.5. Suppose Assumptions C.1–C.7 hold with the obvious notational changes that allow double-array IID processes. Then under HA(γn) with γn = n^{−1/2} b^{−d/4−1}, Tn →d N(µ0/σ0, 1).

Remark 3.3. Theorem 3.5 implies that the Tn test has non-trivial power against Pitman local alternatives that converge to zero at rate n^{−1/2} b^{−d/4−1}, provided 0 < µ0 < ∞. The asymptotic local power function of the test is given by 1 − Φ(zα − µ0/σ0), where Φ is the standard normal CDF. It is worth mentioning that the local alternative in (3.4) may be motivated by models other than that considered in (3.2). Generally speaking, the local alternative in (3.4) represents a class of local deviations from the implied null hypothesis H0 that our test has power to detect. Such a local deviation may be caused by the violation of any of the conditions specified under H10. This includes the model in (3.2), where δn(x, u) ≡ u + γn δ(x, u) may or may not be monotone in its second argument (even though we assume monotonicity to facilitate the derivation), and the case where the conditional exogeneity condition in Assumption A.1* is locally violated.10

10 This means the conditional PDF fn(·|x) of Yni given Xni = x deviates from the conditional PDF fn(·|x, z) of Yni given Xni = x and Zni = z locally: fn(y|x, z) − fn(y|x) = γn η(y; x, z) + o(γn), where γn → 0 as n → ∞ and η(y; x, z) is a measurable function of (y, x, z).

Remark 3.4. Alternatively, following Remark 2.4, one can exploit the testable implication in (2.8) and construct the test statistic

Γ̂(π) = n^{−1} ∑_{i=1}^{n} ‖r̂(Yi; Xi, Zi) r̂0^π − r̂1(Xi) r̂2^π(Yi)‖² a(Yi; Xi, Zi),

where r̂0^π ≡ π′r̂0 and r̂2^π(y) ≡ π′r̂2(y). In an earlier version of this paper, we showed that a suitable normalization of Γ̂(π), say Tn(π), has the standard normal limiting null distribution. This test would require selection of a weight vector π. For convenience, one could choose equal weights π = (1/dx, ..., 1/dx). A referee suggested that one might improve power by maximizing over possible weight vectors, e.g., basing a test on the statistic sup_{π∈S} Γ̂(π), where S is the unit sphere in R^dx. Such a test would likely be considerably more demanding computationally than our proposed test.

Remark 3.5. Like many nonparametric specification tests in the literature (e.g., Härdle and Mammen, 1993, CKK, and HSWY), our test suffers from a typical curse of dimensionality. Our test can detect local alternatives that converge to the null at the rate γn = n^{−1/2} b^{−d/4−1}, so as d increases, the local power of our test deteriorates, and does so at rates that are common for nonparametric tests. In our simulations and empirical applications, d is small, equaling one or two. For estimating the model under the null, to alleviate the curse of dimensionality we could consider semiparametric models of H1(x). Examples of such specifications for H1(x) include partially linear models, single-index models, or additive models. See, e.g., Fan et al. (2001) or Fan and Jiang (2005), among others. However, for an L2-distance type test like ours, these semiparametric restrictions do not help to alleviate the curse of dimensionality, as we need to estimate the model under the alternative. Alternatively, to alleviate the curse of dimensionality, one may consider an integrated conditional moment type test (see, e.g., Bierens and Ploberger, 1997).

3.4. Simulating the null distribution

It is well known that asymptotic normal null distributions with estimated variance matrices often do not provide good approximations for kernel-based tests, in part because tests based on normal critical values can be very sensitive to the choice of bandwidth and can suffer from substantial finite sample size distortions. We found that to be the case in some experiments with our test (not reported to save space). As an alternative, we consider resampling methods to obtain simulated p-values or critical values for our test. As discussed earlier in Remark 2.6, CKK propose a test related to ours. They employ a nonparametric bootstrap method to obtain p-values for their test, but they do not formally demonstrate its asymptotic validity for technical reasons.11,12

11 This bootstrap depends on rates of uniform convergence of kernel estimated objects, and such rates can fail due to issues associated with unbounded support or with boundary biases.
12 As an associate editor kindly points out, bootstrap tests can have size distortion when the null hypothesis is nonparametric or semiparametric, as in our case. For the recent literature on this, see, e.g., Barrientos-Marin and Sperlich (2010) and Rodriguez-Poo et al. (forthcoming).


In contrast, the asymptotic validity of subsampling can be justified by standard arguments, so we use subsampling instead of a standard bootstrap. Let m = mn be a sequence of positive integers such that m → ∞ and m/n → 0 as n → ∞. Let B be a large integer. The subsampling (or, equivalently, the m-out-of-n bootstrap) procedure is as follows:

1. Randomly draw B subsamples {(Xi*(k), Yi*(k), Zi*(k)), i = 1, ..., m}, k = 1, ..., B, of size m from the original sample {(Xi, Yi, Zi)}_{i=1}^{n}.
2. For k = 1, ..., B, compute Tn using the subsample {(Xi*(k), Yi*(k), Zi*(k))}_{i=1}^{m} and denote this as T̂_{n,m}^{*(k)}.
3. Calculate the subsampling p-value as p = B^{−1} ∑_{k=1}^{B} 1{Tn < T̂_{n,m}^{*(k)}}.

The asymptotic validity of the above subsampling method can be readily established as in Politis et al. (1999). Intuitively, under the null hypothesis both Tn and T̂_{n,m}^{*(k)} are asymptotically distributed as N(0, 1), and thus the test based on the subsampling p-value has the correct asymptotic size; under the fixed alternative, Tn diverges to infinity at a speed faster than T̂_{n,m}^{*(k)}, giving the test its power.
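A minimal sketch of steps 1–3 in code (ours, not the authors' implementation); `test_statistic` stands in for the full computation of Tn in (3.1), and drawing the subsamples without replacement is our reading of the procedure:

    import numpy as np

    def subsampling_pvalue(Y, X, Z, test_statistic, m, B, seed=0):
        # Steps 1-3: draw B subsamples of size m, recompute the statistic on each,
        # and return the fraction of subsample statistics exceeding the full-sample one.
        rng = np.random.default_rng(seed)
        n = len(Y)
        Tn = test_statistic(Y, X, Z)                      # full-sample statistic
        exceed = 0
        for _ in range(B):
            idx = rng.choice(n, size=m, replace=False)    # subsample of size m
            Zsub = None if Z is None else Z[idx]
            exceed += Tn < test_statistic(Y[idx], X[idx], Zsub)
        return exceed / B

    # e.g., m = int(n ** 0.85) and B = 200 as in Section 4;
    # reject H0 at level alpha if the returned p-value is below alpha.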

Table 1
Empirical rejection frequency: DGP 1.

             Subsample size
             ⌊n^0.80⌋         ⌊n^0.85⌋         ⌊n^0.90⌋
λ      n     0.05    0.10     0.05    0.10     0.05    0.10
0      200   0.006   0.032    0.006   0.038    0.004   0.026
       300   0.034   0.134    0.032   0.134    0.016   0.090
0.25   200   0.124   0.286    0.110   0.258    0.056   0.196
       300   0.428   0.642    0.358   0.653    0.230   0.504
0.5    200   0.460   0.686    0.388   0.622    0.258   0.498
       300   0.860   0.954    0.820   0.926    0.638   0.860
1      200   0.910   0.990    0.864   0.968    0.748   0.908
       300   0.998   1.000    0.996   1.000    0.982   0.996

Table 2
Empirical rejection frequency: DGP 2.

             Subsample size
             ⌊n^0.80⌋         ⌊n^0.85⌋         ⌊n^0.90⌋
λ      n     0.05    0.10     0.05    0.10     0.05    0.10
0      200   0.010   0.040    0.008   0.040    0.004   0.022
       300   0.036   0.098    0.028   0.092    0.020   0.066
0.25   200   0.186   0.380    0.134   0.344    0.046   0.216
       300   0.412   0.686    0.326   0.592    0.178   0.444
0.5    200   0.700   0.872    0.596   0.836    0.390   0.674
       300   0.938   0.986    0.892   0.970    0.694   0.924
1      200   0.958   0.996    0.878   0.988    0.594   0.908
       300   0.994   1.000    0.980   0.998    0.734   0.972

4. Monte Carlo simulations

In this section, we use simulations to examine the finite sample performance of our test. We consider four data generating processes (DGPs):

DGPs 1 and 3: Y = X + U + λX√(1 + U²);
DGPs 2 and 4: Y = Φ(X + U + λX√(1 + U²)).

For each DGP, λ takes four values: 0, 0.25, 0.5, and 1. λ = 0 corresponds to the null DGP, and λ = 0.25, 0.5, and 1 represent gradual departures from the null. In DGPs 1 and 2, X ∼ Uniform(−1, 1), U ∼ Uniform(−1, 1), and X and U are independent. In DGPs 3 and 4, X and U are no longer independent: X = 0.5Z + 0.5ε1 and U = 0.5Z + 0.5ε2, where ε1 ∼ Uniform(−1, 1), ε2 ∼ Uniform(−1, 1), Z follows a standard normal distribution truncated at −2 and 2 in the tails, and ε1, ε2, and Z are mutually independent. By construction, X ⊥ U | Z.

We use second order (quadratic) local polynomial estimators, i.e., p = 2, with a Gaussian PDF for the kernel function. For the bandwidth sequences b and c, we use the rules κ · std(V) · n^{−1/(2(p+1)+1)} and κ · std(Y) · n^{−1/(2(p+1)+1)} for V and Y, respectively, where κ is a constant and std(V) and std(Y) are the sample standard deviations of V and Y, respectively. In general, the optimal κ depends on the specific underlying DGPs. For simplicity we let κ = 1 for DGPs 1 and 2 and κ = 2 for DGPs 3 and 4. For DGPs 1 and 2, we specify the weight function a = 1, corresponding to no trimming, whereas for DGPs 3 and 4, a trims out 2.5% of the data in each tail of each dimension of (Y, X, Z), so

a(Y; X, Z) = 1[y0.025 ≤ Y ≤ y0.975] · 1[x0.025 ≤ X ≤ x0.975] · 1[z0.025 ≤ Z ≤ z0.975],

where y0.025 and y0.975 are the 0.025 and 0.975 quantiles of Y, respectively, and similarly for x0.025, x0.975, z0.025, and z0.975. We consider the subsampling test with sample sizes n = 200 and 300. We try three different subsample sizes m = ⌊n^0.80⌋, ⌊n^0.85⌋, and ⌊n^0.90⌋, where ⌊·⌋ denotes the integer part. The number of subsamples is B = 200 and the number of replications is 500.
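For concreteness, a short sketch (ours) of the four simulation designs and the rule-of-thumb bandwidth just described; the function names are illustrative and not from the paper:

    import numpy as np
    from scipy.special import erf

    def norm_cdf(x):
        # standard normal CDF, the link used in DGPs 2 and 4
        return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

    def simulate_dgp(dgp, n, lam, rng):
        # DGPs 1 and 3: Y = X + U + lam * X * sqrt(1 + U^2); DGPs 2 and 4 apply Phi(.) to that index
        if dgp in (1, 2):
            X = rng.uniform(-1.0, 1.0, n)
            U = rng.uniform(-1.0, 1.0, n)
            Z = None                                   # X and U unconditionally independent
        else:
            Z = rng.standard_normal(n)
            while np.any(np.abs(Z) > 2.0):             # truncate Z at -2 and 2 by resampling the tails
                bad = np.abs(Z) > 2.0
                Z[bad] = rng.standard_normal(bad.sum())
            X = 0.5 * Z + 0.5 * rng.uniform(-1.0, 1.0, n)
            U = 0.5 * Z + 0.5 * rng.uniform(-1.0, 1.0, n)
        index = X + U + lam * X * np.sqrt(1.0 + U ** 2)
        Y = index if dgp in (1, 3) else norm_cdf(index)
        return Y, X, Z

    def rule_of_thumb_bandwidth(data, n, kappa, p=2):
        # kappa * std(data) * n^(-1/(2(p+1)+1)), applied to V (for b) and to Y (for c)
        return kappa * np.std(data) * n ** (-1.0 / (2.0 * (p + 1) + 1.0))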

We consider two conventional nominal levels: 0.05 and 0.10. Tables 1–4 present the rejection frequencies for DGPs 1–4, respectively. In each table, λ = 0 corresponds to the null DGP. When the sample size is 200, the subsampling tests are undersized. However, when the sample size increases to 300, the performance improves and the rejection frequencies are closer to their nominal levels. This suggests that a moderate to large sample is required for the test to have good level behavior. This is not surprising, as the estimation of derivatives is much harder and has a slower convergence rate than the estimation of the conditional expectation itself. In general, the tests are less under-sized when the subsample size m is relatively small.

In each table, λ = 0.25, 0.5, and 1 correspond to alternative DGPs and thus are used to examine the power of the tests. The test has substantial power. For example, at the 5% level, the rejection frequency is 0.910 for DGP 1 when λ = 1, sample size n = 200, and subsample size m = ⌊n^0.80⌋. For all the values λ = 0.25, 0.5, and 1, the power increases rapidly as the sample size increases. For example, when n increases from 200 to 300, the rejection frequency increases from 0.460 to 0.860 for DGP 1 with λ = 0.5 and m = ⌊n^0.80⌋. As expected, the rejection frequencies increase with the value of λ for all DGPs. The power of the tests increases when the subsample size m decreases. This is likely because, under the alternative, the test statistics diverge, so the difference between the original test statistic and the subsampled test statistics is large when the difference between the original sample size and the subsample size is large. In general, for the same sample size, the rejection frequencies for DGPs 1 and 2 under the alternative are higher than those for DGPs 3 and 4. This suggests that when d is large, we need a relatively large sample to achieve reasonable power. This reflects the ''curse of dimensionality'' of our test.

5. Empirical applications

In this section, we consider testing whether duration data obey the class of nonlinear generalized accelerated failure-time (GAFT) models. We then apply our test empirically to a data set on the duration of strikes among manufacturing workers in the US.


Table 3
Empirical rejection frequency: DGP 3.

             Subsample size
             ⌊n^0.80⌋         ⌊n^0.85⌋         ⌊n^0.90⌋
λ      n     0.05    0.10     0.05    0.10     0.05    0.10
0      200   0.010   0.040    0.000   0.016    0.002   0.014
       300   0.034   0.106    0.018   0.074    0.004   0.034
0.25   200   0.056   0.162    0.034   0.092    0.016   0.074
       300   0.174   0.370    0.122   0.334    0.034   0.174
0.5    200   0.216   0.482    0.122   0.368    0.072   0.242
       300   0.546   0.754    0.446   0.720    0.220   0.514
1      200   0.806   0.928    0.642   0.864    0.484   0.782
       300   0.972   0.990    0.940   0.986    0.808   0.952

Table 4
Empirical rejection frequency: DGP 4.

             Subsample size
             ⌊n^0.80⌋         ⌊n^0.85⌋         ⌊n^0.90⌋
λ      n     0.05    0.10     0.05    0.10     0.05    0.10
0      200   0.018   0.054    0.012   0.030    0.002   0.022
       300   0.040   0.096    0.022   0.074    0.004   0.044
0.25   200   0.030   0.096    0.020   0.062    0.000   0.038
       300   0.084   0.158    0.048   0.134    0.012   0.074
0.5    200   0.074   0.182    0.038   0.102    0.010   0.068
       300   0.134   0.292    0.098   0.252    0.042   0.142
1      200   0.250   0.482    0.138   0.344    0.060   0.226
       300   0.448   0.678    0.346   0.612    0.166   0.404

5. Empirical applications

In this section, we consider testing whether duration data obey the class of nonlinear generalized accelerated failure-time (GAFT) models. We then apply our test empirically to a data set on the duration of strikes among manufacturing workers in the US.

5.1. Testing for GAFT models

Let Y be the duration of a certain state (a nonnegative random variable), such as the duration of a strike. Our test is directly applicable to nonlinear GAFT models, since such models can be written in the form Y = G[H1(X) + U], where X is a vector of covariates and U is an unobservable random variable (see, e.g., Eq. (2.5) in Ridder, 1990). MPH models are a particularly popular class of GAFT models. Below we provide a direct link between our null hypothesis and MPH models. Let h(Y, X, ξ) denote the hazard function for Y. An MPH model of survival time Y is one where

h(Y, X, ξ) = λ(Y) · θ(X) · ξ   (5.1)

holds for some baseline hazard function λ(Y) and some nonnegative function of covariates θ(X). The MPH model is widely applied in empirical research (for a detailed review, see Van den Berg, 2001). For example, when ξ = 1, this is the standard proportional hazard (PH) model developed by Cox (1972). A particularly popular parametric specification of the MPH model, due to Lancaster (1979), assumes that λ(Y) = αY^{α−1}, θ(X) = exp(X′β), and ξ is a gamma distributed random variable. The following proposition provides a general characterization of MPH models.

Proposition 5.1. Suppose that the hazard function of the survival time Y is h(Y, X, ξ), where Y ∈ R+, X ∈ R^{dx}, ξ ∈ R+ and ξ ≠ 0 with probability 1. Let λ: R → R+ and θ: R^{dx} → R+ be two measurable functions such that λ(Y) = 0 with probability 0 and θ(X) = 0 with probability 0. Then h(Y, X, ξ) is an MPH model, h(Y, X, ξ) = λ(Y) · θ(X) · ξ, if and only if

Y = G[H1(X) + U],

where G: R → R+ is a strictly increasing function that is differentiable a.e. on its support, H1: R^{dx} → R, and U = ln[−ln(1 − ε)/ξ], where ε is a uniform random variable on [0, 1] and ε ⊥ (X, ξ).

Proposition 5.1 shows that the MPH model has two important implications: (i) it equals a transformation model of the type given by our null, and (ii) U has a distribution determined by ln[−ln(1 − ε)/ξ]. In principle, both restrictions might be testable, though we focus on implication (i), corresponding to our null hypothesis.13 If our null is rejected, then the specification of MPH models is rejected, so our test can be used as a falsification test for MPH models.

13 More broadly, this proposition shows that nonparametrically the only difference between GAFT and MPH models is some regularity conditions, since if one is given a GAFT model, which by Ridder (1990) satisfies Y = G[H1(X) + U], then, given the regularity assumed in Proposition 5.1, one can construct an equivalent MPH model by letting ξ = [−ln(1 − ε)]e^{−U}, where ε is uniform.

5.2. Duration of strikes

In this subsection, we test the specification of GAFT models using data on the duration of strikes. Here Y is the duration of strikes in US manufacturing firms, defined as the number of days since the start of a strike. Our X is a scalar indicator of the business cycle position of the economy, measured by the deviation of output from its trend; positive values of X mean that the economy is above its growth trend. We assume that A.1 holds with X ⊥ U, i.e., Z is empty. Our data set was used in Kennan (1985) and is employed in several econometrics textbooks, including Cameron and Trivedi (2005) and Greene (2011). The sample size is 566. Table 5 presents summary statistics. We apply our subsampling-based test. The implementation details are the same as in the simulations. For the bandwidth sequences b and c, we try various values of the constant κ, letting κ = 0.5, 0.75, 1, 1.25, 1.5 and 2. Results based on 1000 subsamples are reported in Table 6. Our results are robust, yielding similar p-values across different bandwidths and subsample sizes. The p-values are high for all subsample sizes under investigation, which suggests that our test supports the specification of GAFT models.
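To make the construction of Table 6 concrete, the following is a minimal sketch (not the authors' code) of how a subsampling p-value can be computed. The function test_stat is a hypothetical placeholder for the studentized statistic of Section 3; the p-value is the fraction of subsample statistics that exceed the full-sample statistic:

    # A minimal sketch of the subsampling p-value reported in Table 6.
    # `test_stat` is a hypothetical placeholder for the studentized statistic T_n;
    # only the subsampling layer is shown.
    import numpy as np

    def subsampling_pvalue(data, test_stat, m, B=1000, seed=0):
        """Compare the full-sample statistic with B statistics computed on
        random subsamples of size m drawn without replacement."""
        rng = np.random.default_rng(seed)
        n = len(data)
        t_full = test_stat(data)
        t_sub = np.empty(B)
        for b in range(B):
            idx = rng.choice(n, size=m, replace=False)
            t_sub[b] = test_stat(data[idx])
        # p-value: fraction of subsample statistics exceeding the full-sample one
        return np.mean(t_sub >= t_full)

    # Usage for the strike data (n = 566): with m = 159 as in Table 6,
    #   p = subsampling_pvalue(strike_array, test_stat, m=159, B=1000)
    # where strike_array stacks (Y, X) row-wise and test_stat is user-supplied.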


6. Extensions

Our methodology can be extended to test other related hypotheses for specifications in nonseparable models. For example, consider a simple bivariate X such that X ≡ (X1, X2)′, where X1 and X2 are two scalar random variables. Then our results can be used to test the hypotheses:

H20: There exist two measurable functions R2 and H3 such that Y = R2[H3(X1, X2), U] a.s.; and
H2A: H20 is false;

and

H30: There exist three measurable functions R3, H4 and H5 such that Y = R3[H4(X1) + H5(X2), U] a.s.; and
H3A: H30 is false.

Given the key conditional exogeneity Assumption A.1, a testable implication of H20 is

[∂F_{Y|X1,X2,Z}(y | x1, x2, z)/∂x1] / [∂F_{Y|X1,X2,Z}(y | x1, x2, z)/∂x2] = r3(x1, x2),   (6.1)

where F_{Y|X1,X2,Z}(y | x1, x2, z) is the conditional CDF of Y given (X1, X2, Z) and r3 is some unknown measurable function. Similarly, H30 implies that

[∂F_{Y|X1,X2,Z}(y | x1, x2, z)/∂x1] / [∂F_{Y|X1,X2,Z}(y | x1, x2, z)/∂x2] = r4(x1) · r5(x2)   (6.2)

for some unknown measurable functions r4 and r5. Our test can also be extended to test semiparametric specifications. For example, with a multi-dimensional X one may be interested in testing

H40: There exist β ∈ R^{dx} and a measurable function R4 such that Y = R4(X′β + U) a.s., and R4 is strictly monotonic; and
H4A: H40 is false.

Then H40 implies that

D_x F(y | x, z)/f(y | x, z) = r6(y)   (6.3)

for some unknown measurable function r6. To test Eqs. (6.1)–(6.3), one can readily construct test statistics similar to ours, using marginal integration as proposed in testing H0.
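As an illustration (ours, not part of the paper), the implication in (6.2) can be checked numerically in a simple additive special case where the conditional CDF is available in closed form; H4 and H5 below are arbitrary hypothetical component functions:

    # Numerical check of (6.2) in the special case Y = H4(X1) + H5(X2) + U with
    # U ~ N(0,1), so that F(y|x1,x2) = Phi(y - H4(x1) - H5(x2)) in closed form.
    # The derivative ratio should not depend on y and should equal H4'(x1)/H5'(x2).
    import numpy as np
    from scipy.stats import norm

    H4 = lambda x1: x1 ** 2                 # hypothetical component functions
    H5 = lambda x2: np.log(1 + x2 ** 2)
    F = lambda y, x1, x2: norm.cdf(y - H4(x1) - H5(x2))

    def ratio(y, x1, x2, h=1e-5):
        dF_dx1 = (F(y, x1 + h, x2) - F(y, x1 - h, x2)) / (2 * h)
        dF_dx2 = (F(y, x1, x2 + h) - F(y, x1, x2 - h)) / (2 * h)
        return dF_dx1 / dF_dx2

    x1, x2 = 0.7, 1.3
    for y in (0.5, 1.0, 2.0):                            # same value for every y ...
        print(round(ratio(y, x1, x2), 6))
    print(round(2 * x1 * (1 + x2 ** 2) / (2 * x2), 6))   # ... and equal to H4'(x1)/H5'(x2)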

7. Concluding remarks

In this paper, we propose a specification test for a transformation model containing a vector of covariates and an unobservable error. This test is related to tests for separability and monotonicity in nonseparable structural equations. We exploit the testable implication of the transformation model that the ratio of the derivatives of a conditional CDF takes a product form. Our test statistics are based on the L2 distance between restricted and unrestricted estimators of this ratio of derivatives. We show that the test statistics are asymptotically normal and consistent against alternatives that violate this testable implication. We provide limit normal distribution theory as well as subsampling methods for obtaining p-values under the null. Our simulations suggest that the test statistics perform well in moderate-size samples. We apply our statistic to test the specification of GAFT models for data on the duration of strikes in US manufacturing and fail to reject the specification of GAFT models. We find this result to be stable and robust over a wide range of tuning parameter values.

Appendix A. Proof of the main results in Section 2

Proof of Theorem 2.1. We first prove (a).

F(y | x, z) = Pr[Y ≤ y | X = x, Z = z] = Pr[G[H1(X) + U] ≤ y | X = x, Z = z] = Pr[U ≤ G^{−1}(y) − H1(x) | Z = z] = F_{U|Z}(G^{−1}(y) − H1(x), z),

where F_{U|Z}(·, z) denotes the conditional CDF of U given Z = z. Let F_{1,U|Z} be the derivative of F_{U|Z} with respect to its first argument. Then,

[∂F(y | x, z)/∂x] / [∂F(y | x, z)/∂y] = [F_{1,U|Z}(G^{−1}(y) − H1(x), z) · (−∂H1(x)/∂x)] / [F_{1,U|Z}(G^{−1}(y) − H1(x), z) · ∂G^{−1}(y)/∂y] = [−∂H1(x)/∂x] / [∂G^{−1}(y)/∂y].

So the functions s1 and s2 exist and are given by s1(x) = −C ∂H1(x)/∂x and s2(y) = 1/(C ∂G^{−1}(y)/∂y), where C ≠ 0 is an arbitrary constant. Clearly, s2: R → R+ if C > 0 and s2: R → R− if C < 0. The measurable functions S1 and S2 are given by CH1 and CG^{−1}, respectively.

We now prove (b). Without loss of generality, assume that s2: R → R+. We can always find two scalar functions S1 and S2 such that ∂S1(x)/∂x = −s1(x) and ∂S2(y)/∂y = 1/s2(y), where S2(·) is strictly increasing. Combining this with the definition of r(y; x, z) gives

D_x F(y | x, z)/D_y F(y | x, z) = s1(x)s2(y) = [−∂S1(x)/∂x] / [∂S2(y)/∂y] for all (x, y, z) ∈ W.   (A.1)

Let Ũ ≡ S2(Y) − S1(X) and ũ ≡ S2(y) − S1(x). By the monotonicity of S2, we have Y = S2^{−1}[S1(X) + Ũ] and y = S2^{−1}[S1(x) + ũ]. It follows that

F_{Ũ|X,Z}(ũ, x, z) ≡ P[Ũ ≤ ũ | X = x, Z = z] = P[S2(Y) − S1(X) ≤ ũ | X = x, Z = z] = P[Y ≤ S2^{−1}(S1(x) + ũ) | X = x, Z = z] = P(Y ≤ y | X = x, Z = z) = F(y | x, z).

Then

D_x F(y | x, z)/D_y F(y | x, z) = D_x F_{Ũ|X,Z}(ũ, x, z)/D_y F_{Ũ|X,Z}(ũ, x, z)
= [∂F_{Ũ|X,Z}(ũ, x, z)/∂ũ · s1(x) + ∂F_{Ũ|X,Z}(ũ, x, z)/∂x] / [∂F_{Ũ|X,Z}(ũ, x, z)/∂ũ · (1/s2(y))]
= s1(x)s2(y) + [∂F_{Ũ|X,Z}(ũ, x, z)/∂x] / [∂F_{Ũ|X,Z}(ũ, x, z)/∂ũ · (1/s2(y))] for all (x, y, z) ∈ W.   (A.2)

Comparing (A.1) with (A.2) yields ∂F_{Ũ|X,Z}(ũ, x, z)/∂x = 0 for all (ũ, x, z) ∈ U × V, where U denotes the support of Ũ. Therefore, Ũ ⊥ X | Z. So far, we have shown that Y = S2^{−1}[S1(X) + Ũ], where S2^{−1} is strictly monotonic and X ⊥ Ũ | Z. The conclusion in part (b) follows by setting G = S2^{−1}, H1 = S1, and U = Ũ. □

Proof of Corollary 2.2. Under H10 and Assumption A.1, (2.1) in Theorem 2.1(a) holds, implying that

r0 ≡ E_Y E_{XZ}[r(Y; X, Z)] = E[s1(X)] E[s2(Y) 1{Y ∈ Y0}],
r1(x) ≡ E[r(Y; x, Z)] = s1(x) E[s2(Y) 1{Y ∈ Y0}],
r2(y) ≡ E[r(y; X, Z)] = E[s1(X)] s2(y) 1{y ∈ Y0}.

It follows that

r(Y; X, Z) ◦ r0 − r1(X) ◦ r2(Y) = [s1(X) s2(Y) 1{Y ∈ Y0}] ◦ {E[s1(X)] E[s2(Y) 1{Y ∈ Y0}]} − {s1(X) E[s2(Y) 1{Y ∈ Y0}]} ◦ {E[s1(X)] s2(Y) 1{Y ∈ Y0}} = 0. □

Proof of Corollary 2.3. The "if" part follows from Theorem 2.1(b) by letting s1(X) = r1(X) and s2(Y) = |r2(Y)|/|r0|. The "only if" part is implied by Theorem 2.1(a). Note that when s2(Y) > 0,

r(Y; X, Z) · |r0| − r1(X) · |r2(Y)| = s1(X) · s2(Y) · |E[s1(X)] · E[s2(Y)]| − s1(X) · E[s2(Y)] · |E[s1(X)] · s2(Y)| = s1(X) · s2(Y) · |E[s1(X)]| · E[s2(Y)] − s1(X) · E[s2(Y)] · |E[s1(X)]| · s2(Y) = 0.

Similarly, when s2(Y) < 0, r(Y; X, Z) · |r0| − r1(X) · |r2(Y)| = 0. □
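The construction used in the proof of Theorem 2.1(b) can be illustrated with a small simulation (ours, not the paper's): for a known transformation model with Z empty, integrating −s1 and 1/s2 recovers S1 and S2, and Ũ = S2(Y) − S1(X) is independent of X. The model below is a hypothetical example chosen so that the antiderivatives are explicit:

    # Illustration of Theorem 2.1(b): Y = exp(X^2 + U), X ~ N(1,1), U ~ N(0,1),
    # Z empty, so G^{-1}(y) = ln y, H1(x) = x^2, s1(x) = -2x, s2(y) = y.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    X = rng.normal(1.0, 1.0, n)
    U = rng.normal(0.0, 1.0, n)
    Y = np.exp(X ** 2 + U)

    S1 = X ** 2            # antiderivative of -s1(x) = 2x
    S2 = np.log(Y)         # antiderivative of 1/s2(y) = 1/y
    U_tilde = S2 - S1      # recovers U in this example

    print(np.allclose(U_tilde, U))                    # True: U_tilde equals U here
    print(round(np.corrcoef(U_tilde, X)[0, 1], 3))    # ~0: U_tilde is independent of X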

Appendix B. Proof of the main results in Section 3

To prove Theorem 3.1, we first establish some technical lemmas. All the proofs of the lemmas can be found in the online supplementary material (see Appendix D). Recall that V_i ≡ (X_i′, Z_i′)′, v ≡ (x′, z′)′, K_b(v) ≡ b^{−d} K(v/b), and µ_b(v) ≡ µ(v/b). Let W_i ≡ (Y_i, V_i′)′ and w ≡ (y, v′)′. Define

V_b(y; v) ≡ (1/n) Σ_{i=1}^n K_b(V_i − v) µ_b(V_i − v) 1̄_y(W_i),
B_b(y; v) ≡ (1/n) Σ_{i=1}^n K_b(V_i − v) µ_b(V_i − v) ∆_{i,y}(v),

where ∆_{i,y}(v) ≡ F(y|V_i) − F(y|v) − Σ_{1≤|j|≤p} (1/j!) D^j F(y|v)(V_i − v)^j and 1̄_y(W_i) ≡ 1{Y_i ≤ y} − F(y|V_i). Let S̄_b(v) ≡ E[S_b(v)] and B̄_b(y; v) ≡ E[B_b(y; v)], where S_b(v) is defined after (2.12). Define

V_c^{(L)}(y; v) ≡ (1/n) Σ_{i=1}^n K_c(V_i − v) µ_c(V_i − v) L̄_y(W_i)

and

B_c^{(L)}(y; v) ≡ (1/n) Σ_{i=1}^n K_c(V_i − v) µ_c(V_i − v) × [α(y|V_i) − f(y|v) − Σ_{1≤|j|≤p} (1/j!) α^{(j)}(y|v)(V_i − v)^j],

where L̄_y(W_i) ≡ L_c(Y_i − y) − α(y|V_i) and α(y|v) ≡ E[L_c(Y_i − y) | V_i = v]. Let B̄_c^{(L)}(y; v) ≡ E[B_c^{(L)}(y; v)].

Lemma B.1. Suppose that Assumptions C.1–C.3, C.5 and C.7 hold. Then uniformly in (y, v) ∈ Y_0 × V,
(a) D_x F̂_b(y|v) − D_x F(y|v) = b^{−1} e_1 S̄_b(v)^{−1} [V_b(y; v) + B̄_b(y; v)] + O_P(ν_{1b}^2 b^{−1} + ν_{1b} b^p),
(b) D_x F̂_b(y|v) − D_x F(y|v) = O_P(ν_{1b} b^{−1} + b^p),
where ν_{1b} ≡ n^{−1/2} b^{−d/2} √(ln n).

Lemma B.2. Suppose that Assumptions C.1–C.7 hold. Then uniformly in (y, v) ∈ Y_0 × V,
(a) f̂_c(y|v) − f(y|v) = e_2′ S̄_c(v)^{−1} [V_c^{(L)}(y; v) + B̄_c^{(L)}(y; v)] + O_P(ν_{2c}^2 + ν_{2c} c^{p+1}),
(b) f̂_c(y|v) − f(y|v) = O_P(ν_{2c} + c^{p+1} + c^r),
where ν_{2c} ≡ n^{−1/2} c^{−(d+1)/2} √(ln n).

Lemma B.3. Suppose that Assumptions C.1–C.7 hold. Then
(a) r̂(y; v) − r(y; v) = b^{−1} e_1 S̄_b(v)^{−1} V_b(y; v) f(y|v)^{−1} − D_x F(y|v) e_2′ S̄_c(v)^{−1} V_c^{(L)}(y; v) f(y|v)^{−2} + O_P(ν_{bc}) uniformly in (y, v) ∈ Y_0 × V,
(b) r̂_0 − r_0 = O_P(ν_{bc} + n^{−1/2} b^{−1}),
(c) sup_{y∈Y_0} ‖r̂_2(y) − r_2(y)‖ = O_P(ν_{bc} + n^{−1/2} b^{−1} √(ln n)),
where ν_{bc} ≡ ν_{1b}^2 b^{−1} + b^p + ν_{2c}^2 + c^{p+1} + c^r + ν_{1b} ν_{2c} b^{−1}.

Proof of Theorem 3.1. Let a_i ≡ a(Y_i; X_i, Z_i), r_i ≡ r(Y_i; X_i, Z_i), r_{1i} ≡ r_1(X_i), and r_{2i} ≡ r_2(Y_i); let r̂_i ≡ r̂(Y_i; X_i, Z_i), r̂_{1i} ≡ r̂_1(X_i), and r̂_{2i} ≡ r̂_2(Y_i). Let ξ_{1i} = r_i ◦ r_0 − r_{1i} ◦ r_{2i}, ξ_{2i} = (r̂_i − r_i) ◦ r_0 + r_i ◦ (r̂_0 − r_0) − (r̂_{1i} − r_{1i}) ◦ r_{2i} − r_{1i} ◦ (r̂_{2i} − r_{2i}), and ξ_{3i} = (r̂_i − r_i) ◦ (r̂_0 − r_0) − (r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i}). Then

nb^{d/2+2} Γ̂ = b^{d/2+2} Σ_{i=1}^n ‖(r̂_i − r_i + r_i) ◦ (r̂_0 − r_0 + r_0) − (r̂_{1i} − r_{1i} + r_{1i}) ◦ (r̂_{2i} − r_{2i} + r_{2i})‖^2 a_i
= b^{d/2+2} Σ_{i=1}^n ‖ξ_{1i} + ξ_{2i} + ξ_{3i}‖^2 a_i
= Γ_{1n} + Γ_{2n} + Γ_{3n} + 2Γ_{4n} + 2Γ_{5n} + 2Γ_{6n},   (B.1)

where Γ_{ln} ≡ b^{d/2+2} Σ_{i=1}^n ‖ξ_{li}‖^2 a_i for l = 1, 2, and 3, Γ_{4n} ≡ b^{d/2+2} Σ_{i=1}^n ξ_{1i}′ ξ_{2i} a_i, Γ_{5n} ≡ b^{d/2+2} Σ_{i=1}^n ξ_{1i}′ ξ_{3i} a_i, and Γ_{6n} ≡ b^{d/2+2} Σ_{i=1}^n ξ_{2i}′ ξ_{3i} a_i. Under H_0, Γ_{jn} = 0 for j = 1, 4, 5. It suffices to prove the theorem by showing that (i) Γ_{2n} − B_n →_d N(0, σ_0^2), (ii) Γ_{3n} = o_P(1), and (iii) Γ_{6n} = o_P(1).

To show (i), we write Γ_{2n} = Σ_{j=1}^{10} Γ_{2n,j}, where Γ_{2n,1} ≡ b^{d/2+2} Σ_{i=1}^n ‖(r̂_i − r_i) ◦ r_0‖^2 a_i, Γ_{2n,2} ≡ b^{d/2+2} Σ_{i=1}^n ‖r_i ◦ (r̂_0 − r_0)‖^2 a_i, Γ_{2n,3} ≡ b^{d/2+2} Σ_{i=1}^n ‖(r̂_{1i} − r_{1i}) ◦ r_{2i}‖^2 a_i, Γ_{2n,4} ≡ b^{d/2+2} Σ_{i=1}^n ‖r_{1i} ◦ (r̂_{2i} − r_{2i})‖^2 a_i, Γ_{2n,5} ≡ 2b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_0)′ (r_i ◦ (r̂_0 − r_0)) a_i, Γ_{2n,6} ≡ −2b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_0)′ ((r̂_{1i} − r_{1i}) ◦ r_{2i}) a_i, Γ_{2n,7} ≡ −2b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_0)′ (r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i, Γ_{2n,8} ≡ −2b^{d/2+2} Σ_{i=1}^n (r_i ◦ (r̂_0 − r_0))′ ((r̂_{1i} − r_{1i}) ◦ r_{2i}) a_i, Γ_{2n,9} ≡ −2b^{d/2+2} Σ_{i=1}^n (r_i ◦ (r̂_0 − r_0))′ (r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i, and Γ_{2n,10} ≡ 2b^{d/2+2} Σ_{i=1}^n ((r̂_{1i} − r_{1i}) ◦ r_{2i})′ (r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i. We show in Lemma B.4 that Γ_{2n,1} contributes to both the asymptotic bias and variance of our test statistic: Γ_{2n,1} − B_{1n} →_d N(0, σ_0^2), and B_{1n} = O_P(b^{d/2+2}(b^{−d−2} + c^{−d−1})) is the dominant bias term that never vanishes asymptotically, no matter whether Z is absent or not. The normalization constant nb^{d/2+2} in front of Γ̂ in (B.1) indicates that the rate at which Γ̂ converges to zero under the null is uniquely determined by the convergence rate of r̂_i to r_i, as the asymptotic variance of r̂(y; x, z) − r(y; x, z) is of the order O(n^{−1} b^{−(d+2)}). We show in Lemmas B.5(b) and B.6(b) that Γ_{2n,3} and Γ_{2n,6} also contribute to the asymptotic bias of our test statistic: Γ_{2n,3} = B_{2n} + o_P(1) and Γ_{2n,6} = −2B_{3n} + o_P(1), with B_{2n} = O_P(b^{(d_z−d_x)/2} + b^{d/2+2} c^{−d_x}) and B_{3n} = O_P(b^{(d_z−d_x)/2} + b^{d/2+2} c^{−d_x}). B_{2n} signifies the estimation effect of r̂_{1i}, and B_{3n} signifies the estimation effect of both r̂_i and r̂_{1i}. B_{2n} and B_{3n} become o_P(1) if d_z > d_x and b^{d/2+2} c^{−d_x} = o(1). We also show in the other parts of Lemmas B.5 and B.6 that Γ_{2n,s} = o_P(1) for s = 2, 4, 5, 7, ..., 10.

In short, by Lemmas B.4, B.5(b) and B.6(b), Γ_{2n,1} + Γ_{2n,3} + Γ_{2n,6} − B_n →_d N(0, σ_0^2), where B_n ≡ B_{1n} + B_{2n} − 2B_{3n}, and by Lemmas B.5(a) and (c) and B.6(a) and (c)–(f), Γ_{2n,s} = o_P(1) for s = 2, 4, 5, 7, ..., 10. It follows that Γ_{2n} − B_n →_d N(0, σ_0^2).

Next, we show (ii). By the Cauchy–Schwarz inequality and the fact that ‖A ◦ B‖ ≤ ‖A‖ ‖B‖ for any two conformable vectors A and B, Γ_{3n} ≤ 2Γ_{3n,1} + 2Γ_{3n,2}, where Γ_{3n,1} ≡ b^{d/2+2} ‖r̂_0 − r_0‖^2 Σ_{i=1}^n ‖r̂_i − r_i‖^2 a_i and Γ_{3n,2} ≡ b^{d/2+2} Σ_{i=1}^n ‖(r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i})‖^2 a_i. Following the arguments used in the proofs of Lemmas B.4 and B.5(b), respectively, we can readily show that

Γ̄_{3n,1} ≡ b^{d/2+2} Σ_{i=1}^n ‖r̂_i − r_i‖^2 a_i = O_P(b^{d/2+2}(b^{−d−2} + c^{−d−1}))   (B.2)

and

Γ̄_{3n,2} ≡ b^{d/2+2} Σ_{i=1}^n ‖r̂_{1i} − r_{1i}‖^2 a_i = O_P(b^{d/2+2}(b^{−d_x−2} + c^{−d_x})).   (B.3)

Then by Lemma B.3(b), Γ_{3n,1} = ‖r̂_0 − r_0‖^2 Γ̄_{3n,1} = O_P(ν_{bc}^2 + n^{−1} b^{−2}) O_P(b^{d/2+2}(b^{−d−2} + c^{−d−1})) = o_P(1), and Γ_{3n,2} ≤ sup_{y∈Y_0} ‖r̂_2(y) − r_2(y)‖^2 Γ̄_{3n,2} = O_P(ν_{bc}^2 + n^{−1} b^{−2} ln n) O_P(b^{d/2+2}(b^{−d_x−2} + c^{−d_x})) = o_P(1). Consequently, Γ_{3n} = o_P(1).

To show (iii), note that Γ_{6n} = Γ_{6n,1} − Γ_{6n,2} + Γ_{6n,3} − Γ_{6n,4} − Γ_{6n,5} + Γ_{6n,6} − Γ_{6n,7} + Γ_{6n,8}, where Γ_{6n,1} ≡ b^{d/2+2} Σ_{i=1}^n [(r̂_i − r_i) ◦ r_0]′ [(r̂_i − r_i) ◦ (r̂_0 − r_0)] a_i, Γ_{6n,2} ≡ b^{d/2+2} Σ_{i=1}^n [(r̂_i − r_i) ◦ r_0]′ [(r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i})] a_i, Γ_{6n,3} ≡ b^{d/2+2} Σ_{i=1}^n [r_i ◦ (r̂_0 − r_0)]′ [(r̂_i − r_i) ◦ (r̂_0 − r_0)] a_i, Γ_{6n,4} ≡ b^{d/2+2} Σ_{i=1}^n [r_i ◦ (r̂_0 − r_0)]′ [(r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i})] a_i, Γ_{6n,5} ≡ b^{d/2+2} Σ_{i=1}^n [(r̂_{1i} − r_{1i}) ◦ r_{2i}]′ [(r̂_i − r_i) ◦ (r̂_0 − r_0)] a_i, Γ_{6n,6} ≡ b^{d/2+2} Σ_{i=1}^n [(r̂_{1i} − r_{1i}) ◦ r_{2i}]′ [(r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i})] a_i, Γ_{6n,7} ≡ b^{d/2+2} Σ_{i=1}^n [r_{1i} ◦ (r̂_{2i} − r_{2i})]′ [(r̂_i − r_i) ◦ (r̂_0 − r_0)] a_i, and Γ_{6n,8} ≡ b^{d/2+2} Σ_{i=1}^n [r_{1i} ◦ (r̂_{2i} − r_{2i})]′ [(r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i})] a_i. By Lemma B.3(b) and (B.2),

|Γ_{6n,1}| ≤ ‖r_0‖ ‖r̂_0 − r_0‖ Γ̄_{3n,1} = O_P(ν_{bc} + n^{−1/2} b^{−1}) O_P(b^{d/2+2}(b^{−d−2} + c^{−d−1})) = o_P(1).

By the Cauchy–Schwarz inequality, Lemma B.3(c), and Eqs. (B.2) and (B.3),

|Γ_{6n,2}| ≤ ‖r_0‖ sup_{y∈Y_0} ‖r̂_2(y) − r_2(y)‖ b^{d/2+2} Σ_{i=1}^n ‖r̂_i − r_i‖ ‖r̂_{1i} − r_{1i}‖ a_i ≤ ‖r_0‖ sup_{y∈Y_0} ‖r̂_2(y) − r_2(y)‖ (Γ̄_{3n,1} Γ̄_{3n,2})^{1/2} = O_P(ν_{bc} + n^{−1/2} b^{−1} √(ln n)) × [O_P(b^{d/2+2}(b^{−d−2} + c^{−d−1})) O_P(b^{d/2+2}(b^{−d_x−2} + c^{−d_x}))]^{1/2} = o_P(1).

Note that Γ_{6n,3} = b^{d/2+2} [(r̂_0 − r_0) ◦ (r̂_0 − r_0)]′ Γ̄_{2n,5}, where Γ̄_{2n,5} ≡ Σ_{i=1}^n ((r̂_i − r_i) ◦ r_i) a_i. By Lemma B.3(b) and the proof of Lemma B.6(a), Γ_{6n,3} = b^{d/2+2} O_P((ν_{bc} + n^{−1/2} b^{−1})^2) × O_P(b^{−d−1} + n^{1/2} b^{−1} + c^{−d−1}) = o_P(1). Note that Γ_{6n,4} = (r̂_0 − r_0)′ Γ̄_{6n,4}, where Γ̄_{6n,4} ≡ b^{d/2+2} Σ_{i=1}^n (r_i ◦ (r̂_{1i} − r_{1i}) ◦ (r̂_{2i} − r_{2i})) a_i. Following the proof of Lemma B.6(f), we can show that Γ̄_{6n,4} = o_P(1). This, in conjunction with Lemma B.3(b), implies that Γ_{6n,4} = o_P(1). Note that Γ_{6n,5} = (r̂_0 − r_0)′ Γ̄_{2n,6}, where Γ̄_{2n,6} ≡ b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ (r̂_{1i} − r_{1i}) ◦ r_{2i}) a_i. By Lemma B.3 and the proof of Lemma B.6(b), Γ_{6n,5} = O_P(ν_{bc} + n^{−1/2} b^{−1}) [O_P(b^{(d_z−d_x)/2}) + o_P(1)] = o_P(1). Note that |Γ_{6n,6}| ≤ sup_{y∈Y_0} ‖r̂_2(y) − r_2(y)‖ Γ̄_{6n,6}, where Γ̄_{6n,6} ≡ b^{d/2+2} Σ_{i=1}^n ((r̂_{1i} − r_{1i}) ◦ r_{2i} ◦ (r̂_{1i} − r_{1i})) a_i. Analogously to the proof of Lemma B.4, we can show that Γ̄_{6n,6} = O_P(b^{d/2+2}(b^{−d_x−2} + c^{−d_x})). Combining this with Lemma B.3(c) yields

|Γ_{6n,6}| = O_P(ν_{bc} + n^{−1/2} b^{−1} √(ln n)) × O_P(b^{d/2+2}(b^{−d_x−2} + c^{−d_x})) = o_P(1).

Observe that Γ_{6n,7} ≡ (r̂_0 − r_0)′ Γ̄_{2n,7}, where Γ̄_{2n,7} ≡ b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i. By Lemma B.3(b) and the proof of Lemma B.6(c), Γ_{6n,7} = O_P(ν_{bc} + n^{−1/2} b^{−1} √(ln n)) o_P(1) = o_P(1). Lastly, by the Cauchy–Schwarz inequality and the study of Γ_{2n,4} and Γ_{3n,2} above, |Γ_{6n,8}| ≤ (Γ_{2n,4} Γ_{3n,2})^{1/2} = o_P(1). Consequently, we have proved Γ_{6n} = o_P(1). □

Remark. Admittedly, the formulae for the asymptotic bias and variance are quite complicated because we consider general local polynomial regressions to estimate both f(y|x, z) and D_x F(y|x, z), and each of the four estimates r̂_i, r̂_0, r̂_{1i}, and r̂_{2i} contributes to the asymptotic bias and variance of our test statistic in a different manner.

Lemma B.4. Suppose Assumptions C.1–C.7 hold. Then Γ_{2n,1} − B_{1n} →_d N(0, σ_0^2), where B_{1n} = n^{−1} b^{d/2+2} Σ_{i=1}^n φ(W_i, W_i) = O_P(b^{d/2+2}(b^{−d−2} + c^{−d−1})).

Lemma B.5. Suppose Assumptions C.1–C.7 hold. Then
(a) Γ_{2n,2} = b^{d/2+2} Σ_{i=1}^n ‖r_i ◦ (r̂_0 − r_0)‖^2 a_i = o_P(1),
(b) Γ_{2n,3} = b^{d/2+2} Σ_{i=1}^n ‖(r̂_{1i} − r_{1i}) ◦ r_{2i}‖^2 a_i = B_{2n} + o_P(1),
(c) Γ_{2n,4} = b^{d/2+2} Σ_{i=1}^n ‖r_{1i} ◦ (r̂_{2i} − r_{2i})‖^2 a_i = o_P(1),
where B_{2n} ≡ n^{−4} b^{d/2+2} Σ_{i=1}^n ‖Σ_{j=1}^n Σ_{k=1}^n ζ_k(Y_j; X_i, Z_j) ◦ r_{2i}‖^2 a_i = O_P(b^{d/2+2}(b^{−d_x−2} + c^{−d_x})). If d_z > 0, then (b) also holds when we replace B_{2n} by B̄_{2n} ≡ n^{−4} b^{d/2+2} Σ_{i=1}^n ‖Σ_{j=1}^n Σ_{k=1}^n ζ_j(Y_k; X_i, Z_k) ◦ r_{2i}‖^2 a_i.

Lemma B.6. Suppose that Assumptions C.1–C.7 hold. Then
(a) Γ_{2n,5} = 2b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_0)′ (r_i ◦ (r̂_0 − r_0)) a_i = o_P(1),
(b) Γ_{2n,6} = −2b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_0)′ ((r̂_{1i} − r_{1i}) ◦ r_{2i}) a_i = −2B_{3n} + o_P(1),
(c) Γ_{2n,7} = −2b^{d/2+2} Σ_{i=1}^n ((r̂_i − r_i) ◦ r_0)′ (r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i = o_P(1),
(d) Γ_{2n,8} = −2b^{d/2+2} Σ_{i=1}^n (r_i ◦ (r̂_0 − r_0))′ ((r̂_{1i} − r_{1i}) ◦ r_{2i}) a_i = o_P(1),

(e) Γ_{2n,9} = −2b^{d/2+2} Σ_{i=1}^n (r_i ◦ (r̂_0 − r_0))′ (r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i = o_P(1),
(f) Γ_{2n,10} = 2b^{d/2+2} Σ_{i=1}^n ((r̂_{1i} − r_{1i}) ◦ r_{2i})′ (r_{1i} ◦ (r̂_{2i} − r_{2i})) a_i = o_P(1),
where B_{3n} ≡ n^{−3} b^{d/2+2} Σ_{i=1}^n (Σ_{l=1}^n ζ_l(W_i) ◦ r_0)′ (Σ_{j=1}^n Σ_{k=1}^n ζ_k(Y_j; X_i, Z_j) ◦ r_2(Y_i)) a_i = O_P(b^{(d_z−d_x)/2} + b^{d/2+2} c^{−d_x}).

Proof of Theorem 3.2. The proof follows closely that of Theorem 3.1. By (B.1) and the proof of Theorem 3.1, we now have Γ̂ = n^{−1} b^{−(d/2+2)} Γ_{1n} + n^{−1} b^{−(d/2+2)} Γ_{4n} + n^{−1} b^{−(d/2+2)} Γ_{5n} + o_P(1). It is easy to show that n^{−1} b^{−(d/2+2)} Γ_{1n} = n^{−1} Σ_{i=1}^n ‖r_i ◦ r_0 − r_{1i} ◦ r_{2i}‖^2 a_i = µ_A + o_P(1) and n^{−1} b^{−(d/2+2)} Γ_{sn} = o_P(1) under H_A for s = 4, 5. In addition, under H_A, we have n^{−1} b^{−(d/2+2)} B̂_n = o_P(1) and σ̂_n^2 →_p σ_A^2. It follows that n^{−1} b^{−(d/2+2)} T_n = n^{−1} b^{−(d/2+2)} [nb^{d/2+2} Γ̂ − B̂_n]/σ̂_n = µ_A/σ_A + o_P(1), and the result follows. □

Proof of Theorem 3.3. By assumption, δ_n(x, ·) is strictly increasing and continuously differentiable for each x ∈ X_n. Let δ_n^{−1}(x, ·) denote the inverse function of δ_n(x, ·). Let δ_{n,u}(x, ·) denote the derivative of δ_n(x, u) with respect to u, i.e., δ_{n,u}(x, u) = 1 + γ_n δ_u(x, u), where δ_u(x, u) = ∂δ(x, u)/∂u. Then by the inverse function theorem,

dδ_n^{−1}(x, t)/dt = 1/δ_{n,u}(x, δ_n^{−1}(x, t)) = 1/[1 + γ_n δ_u(x, δ_n^{−1}(x, t))].

It follows that

δ_n^{−1}(x, t) = η_n(x) + ∫_0^t 1/[1 + γ_n δ_u(x, δ_n^{−1}(x, s))] ds

for some function η_n(x) that does not depend on t. Noting that 1/(1 + a) = 1 − a + o(a) when a = o(1), we have

δ_n^{−1}(x, t) = η_n(x) + ∫_0^t [1 − γ_n δ_u(x, δ_n^{−1}(x, s))] ds + o(γ_n t) = η_n(x) + t − γ_n η̄_n(x, t) + o(γ_n t),   (B.4)

where η̄_n(x, t) = ∫_0^t δ_u(x, δ_n^{−1}(x, s)) ds. Let F_{U_n|Z_n}(·, z) and f_{U_n|Z_n}(·, z) denote the conditional CDF and PDF of U_{ni} given Z_{ni} = z, respectively. Let t(x, y) ≡ G^{−1}(y) − H_1(x). Then by (3.2), (B.4) and the strict monotonicity of G and δ_n,

F_n(y | x, z) ≡ Pr[Y_{ni} ≤ y | X_{ni} = x, Z_{ni} = z]
= Pr[G[H_1(X_{ni}) + U_{ni} + γ_n δ(X_{ni}, U_{ni})] ≤ y | X_{ni} = x, Z_{ni} = z]
= Pr[δ_n(x, U_{ni}) ≤ t(x, y) | Z_{ni} = z]
= Pr[U_{ni} ≤ δ_n^{−1}(x, t(x, y)) | Z_{ni} = z]
= F_{U_n|Z_n}(δ_n^{−1}(x, t(x, y)), z).   (B.5)

It follows that

r_n(y; x, z) ≡ [∂F_n(y | x, z)/∂x] / [∂F_n(y | x, z)/∂y] = [∂δ_n^{−1}(x, t(x, y))/∂x] / [∂δ_n^{−1}(x, t(x, y))/∂y]
= {∂[η_n(x) + t(x, y) − γ_n η̄_n(x, t(x, y)) + o(γ_n t(x, y))]/∂x} / {∂[η_n(x) + t(x, y) − γ_n η̄_n(x, t(x, y)) + o(γ_n t(x, y))]/∂y}
≃ {∂(η_n(x) + t(x, y))/∂x − γ_n [η̄_{n,1}(x, t(x, y)) + η̄_{n,2}(x, t(x, y)) · ∂t(x, y)/∂x]} / {∂t(x, y)/∂y − γ_n η̄_{n,2}(x, t(x, y)) · ∂t(x, y)/∂y}
≃ {∂[η_n(x) − H_1(x)]/∂x} / {∂G^{−1}(y)/∂y} + γ_n ∆_n(y; x, z),   (B.6)

where ≃ indicates that we have suppressed higher-order remainder terms, η̄_{n,1} and η̄_{n,2} are the partial derivatives of η̄_n with respect to its first and second arguments, respectively, and

∆_n(y; x, z) = [1/(∂G^{−1}(y)/∂y)] × [(∂η_n(x)/∂x) η̄_{n,2}(x, t(x, y)) − η̄_{n,1}(x, t(x, y))].

So the functions s_{1n} and s_2 exist and are given by

s_{1n}(x) = C ∂[η_n(x) − H_1(x)]/∂x and s_2(y) = 1/(C ∂G^{−1}(y)/∂y),

where C ≠ 0 is an arbitrary constant. Clearly, s_2: R → R+ if C > 0 and s_2: R → R− if C < 0. The measurable functions S_{1n} and S_2 are given by C[η_n(x) − H_1(x)] and CG^{−1}, respectively. □

Proof of Corollary 3.4. Let 1_{ni} = 1{Y_{ni} ∈ Y_0} and 1_y = 1{y ∈ Y_0}. By Theorem 3.3,

r_{0n} = E[s_{1n}(X_{ni})] E[s_2(Y_{ni}) 1_{ni}] + γ_n E_{Y_n} E_{X_n Z_n}[∆_n(Y_{ni}; X_{ni}, Z_{ni}) 1_{ni}] + o(γ_n),
r_{1n}(x) = s_{1n}(x) E[s_2(Y_{ni}) 1_{ni}] + γ_n E[∆_n(Y_{ni}; x, Z_{ni}) 1_{ni}] + o(γ_n),
r_{2n}(y) = E[s_{1n}(X_{ni})] s_2(y) 1_y + γ_n E[∆_n(y; X_{ni}, Z_{ni})] 1_y + o(γ_n).   (B.7)

It follows that r_n(y; x, z) ◦ r_{0n} − r_{1n}(x) ◦ r_{2n}(y) = γ_n ∆̄_n(y; x, z) + o(γ_n) for all (y, x, z) ∈ W_{0n}, where

∆̄_n(y; x, z) = [∆_n(y; x, z) 1_y] ◦ {E[s_{1n}(X_{ni})] E[s_2(Y_{ni}) 1_{ni}]} + [s_{1n}(x) s_2(y) 1_y] ◦ {E_{Y_n} E_{X_n Z_n}[∆_n(Y_{ni}; X_{ni}, Z_{ni}) 1_{ni}]} − {s_{1n}(x) E[s_2(Y_{ni}) 1_{ni}]} ◦ {E[∆_n(y; X_{ni}, Z_{ni})] 1_y} − {E[∆_n(Y_{ni}; x, Z_{ni}) 1_{ni}]} ◦ {E[s_{1n}(X_{ni})] s_2(y) 1_y}. □

Proof of Theorem 3.5. The proof follows closely that of Theorem 3.1, now keeping the additional terms that do not vanish under H_A(γ_n) with γ_n = n^{−1/2} b^{−d/4−1}. Let r_{ni} ≡ r_n(Y_{ni}; X_{ni}, Z_{ni}), r_{1ni} ≡ r_{1n}(X_{ni}), and r_{2ni} ≡ r_{2n}(Y_{ni}); let r̂_i ≡ r̂(Y_{ni}; X_{ni}, Z_{ni}), r̂_{1i} ≡ r̂_1(X_{ni}), and r̂_{2i} ≡ r̂_2(Y_{ni}). Noting that B̂_n = B_n + o_P(1) and σ̂_n^2 = σ_0^2 + o_P(1) under H_A(γ_n), it suffices to show that under H_A(γ_n), (i) Γ_{1n} →_p µ_0, (ii) Γ_{4n} = o_P(1), and (iii) Γ_{5n} = o_P(1), where Γ_{1n}, Γ_{4n}, and Γ_{5n} are defined after (B.1) with r_0, r_i, r_{1i}, and r_{2i} now replaced by r_{0n}, r_{ni}, r_{1ni}, and r_{2ni}, respectively. The results in the previous lemmas continue to hold when the single-index IID observations (Y_i, X_i, Z_i) are replaced by the double-array IID observations (Y_{ni}, X_{ni}, Z_{ni}). Let V_{ni} ≡ (X_{ni}′, Z_{ni}′)′ and f_{ni} ≡ f_n(Y_{ni}|V_{ni}).

(i) holds under H_A(γ_n) because Γ_{1n} = b^{d/2+2} Σ_{i=1}^n ‖r_{ni} ◦ r_{0n} − r_{1ni} ◦ r_{2ni}‖^2 a_i = n^{−1} Σ_{i=1}^n ‖∆̄_n(Y_{ni}; X_{ni}, Z_{ni})‖^2 a_i = µ_0 + o_P(1) by the law of large numbers. For (ii), we decompose Γ_{4n} as Γ_{4n} = Γ_{4n,1} + Γ_{4n,2} − Γ_{4n,3} − Γ_{4n,4}, where

Γ_{4n,1} ≡ b^{d/2+2} Σ_{i=1}^n (r_{ni} ◦ r_{0n} − r_{1ni} ◦ r_{2ni})′ [(r̂_i − r_{ni}) ◦ r_{0n}] a_i,
Γ_{4n,2} ≡ b^{d/2+2} Σ_{i=1}^n (r_{ni} ◦ r_{0n} − r_{1ni} ◦ r_{2ni})′ [r_{ni} ◦ (r̂_0 − r_{0n})] a_i,
Γ_{4n,3} ≡ b^{d/2+2} Σ_{i=1}^n (r_{ni} ◦ r_{0n} − r_{1ni} ◦ r_{2ni})′ [(r̂_{1i} − r_{1ni}) ◦ r_{2ni}] a_i,
Γ_{4n,4} ≡ b^{d/2+2} Σ_{i=1}^n (r_{ni} ◦ r_{0n} − r_{1ni} ◦ r_{2ni})′ [r_{1ni} ◦ (r̂_{2i} − r_{2ni})] a_i.

It suffices to prove Γ_{4n,s} = o_P(1) for s = 1, 2, 3, 4. We only prove Γ_{4n,1} = o_P(1), as the other cases are similar. Let ∆̄_{ni} ≡ ∆̄_n(Y_{ni}; X_{ni}, Z_{ni}). Under H_A(γ_n) we apply Lemma B.3(a) to obtain

Γ_{4n,1} = n^{−1/2} b^{d/4+1} Σ_{i=1}^n ∆̄′_{ni} [(r̂_i − r_{ni}) ◦ r_{0n}] a_i = Γ̄_{4n,1} + n^{1/2} b^{d/4+1} O_P(ν_{bc}) = Γ̄_{4n,1} + o_P(1),

where Γ̄_{4n,1} ≡ n^{−1/2} b^{d/4+1} Σ_{i=1}^n ∆̄′_{ni} {[b^{−1} e_1 S̄_b(V_{ni})^{−1} V_b(Y_{ni}; V_{ni}) f_{ni}^{−1} − D_{xi} e_2′ S̄_c(V_{ni})^{−1} V_c^{(L)}(Y_{ni}; V_{ni}) f_{ni}^{−2}] ◦ r_{0n}} a_i. Write Γ̄_{4n,1} = n^{−3/2} b^{d/4+1} Σ_{i=1}^n Σ_{j=1}^n ∆̄′_{ni} [ζ_j(Y_{ni}; V_{ni}) ◦ r_{0n}] a_i. Then E|Γ̄_{4n,1}|^2 = O(b^{d/2} + n^{−1} b^{d/2+2}(b^{−d−2} + c^{−d−1}) + n^{−2} b^{d/2+2}(b^{−2d−2} + c^{−2d−1})) = o(1), implying that Γ̄_{4n,1} = o_P(1). It follows that Γ_{4n,1} = o_P(1).

We now show (iii). Decompose Γ_{5n} = Γ_{5n,1} − Γ_{5n,2}, where Γ_{5n,1} = (r̂_0 − r_0)′ Γ̄_{5n,1}, Γ_{5n,2} = γ_n b^{d/2+2} Σ_{i=1}^n ∆̄′_{ni} [(r̂_{1i} − r_{1ni}) ◦ (r̂_{2i} − r_{2ni})] a_i, and Γ̄_{5n,1} ≡ b^{d/2+2} Σ_{i=1}^n [r_{ni} ◦ r_{0n} − r_{1ni} ◦ r_{2ni}] ◦ (r̂_i − r_{ni}) a_i. Analogous to the study of Γ_{4n,1}, one can readily show that Γ̄_{5n,1} = o_P(1). Then by Lemma B.3(b), Γ_{5n,1} = O_P(ν_{bc} + n^{−1/2} b^{−1}) o_P(1) = o_P(1). Analogous to the proof of Lemma B.6(f), we can show that Γ_{5n,2} = o_P(γ_n). Thus Γ_{5n} = o_P(1).

Consequently, P(T_n ≥ z | H_A(n^{−1/2} b^{−d/4−1})) → 1 − Φ(z − µ_0/σ_0). This concludes the proof of the theorem. □



Appendix C. Proof of the main results in Section 5

Proof of Proposition 5.1. We first prove the "if" part. By the definition of the hazard function, for any values (y, x, ς) on the support of (Y, X, ξ), h(y, x, ς) = f(y|x, ς)/[1 − F(y|x, ς)], where f(y|x, ς) and F(y|x, ς) are the conditional PDF and CDF of Y given (X, ξ) = (x, ς), respectively. Then,

F(y|x, ς) = P(G[H_1(X) + ln(−ln(1 − ε)/ξ)] ≤ y | X = x, ξ = ς)
= P(−ln(1 − ε)/ξ ≤ exp[G^{−1}(y) − H_1(X)] | X = x, ξ = ς)
= P(ε ≤ 1 − exp{−ς exp[G^{−1}(y) − H_1(x)]})
= 1 − exp{−ς exp[G^{−1}(y) − H_1(x)]}.

Thus f(y|x, ς) = ς exp{−ς exp[G^{−1}(y) − H_1(x)]} exp[G^{−1}(y) − H_1(x)] dG^{−1}(y)/dy, and

h(y, x, ς) = f(y|x, ς)/[1 − F(y|x, ς)] = {exp[G^{−1}(y)]/g(G^{−1}(y))} exp[−H_1(x)] ς = λ(y) · θ(x) · ς,

where λ(y) = exp[G^{−1}(y)]/g(G^{−1}(y)), g(s) = dG(s)/ds, and θ(x) = exp[−H_1(x)]. This holds for all (y, x, ς) on the support of (Y, X, ξ); thus the "if" part is proved.

Next, we prove the "only if" part. Define the integrated hazard H(Y, X, ξ) = ∫_0^Y h(y, X, ξ) dy. Then

H(Y, X, ξ) = ∫_0^Y h(y, X, ξ) dy = ∫_0^Y λ(y) dy · θ(X) · ξ = Λ(Y) · θ(X) · ξ,

where Λ(Y) = ∫_0^Y λ(y) dy. Let F(Y | X, ξ) be the conditional CDF of Y given X and ξ. For any distribution function F, the integrated hazard function is related to its distribution function by H(Y, X, ξ) = −ln(1 − F(Y | X, ξ)). Therefore Λ(Y)θ(X)ξ = −ln(1 − F(Y | X, ξ)). Define the random variable ε = F(Y | X, ξ). By construction, ε is uniformly distributed on [0, 1], ε ⊥ (X, ξ), and Λ(Y)θ(X)ξ = −ln(1 − ε). Thus ln[Λ(Y)] = ln[−ln(1 − ε)/ξ] + ln[1/θ(X)]. That is, Y = G[H_1(X) + U], where G(·) is the inverse function of ln[Λ(·)], H_1(X) = −ln[θ(X)] and U = ln[−ln(1 − ε)/ξ]. □
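As a numerical illustration of Proposition 5.1 (ours, with hypothetical parameter values), the Lancaster (1979) MPH specification can be simulated either from its integrated hazard or from the equivalent GAFT form, and the two representations generate identical durations:

    # Equivalence check for Lancaster's MPH specification: Weibull baseline
    # lambda(y) = alpha*y**(alpha-1), theta(x) = exp(x*beta), gamma frailty xi.
    # The same duration arises from the MPH integrated hazard and from the GAFT
    # form Y = G[H1(X) + U] with G(t) = exp(t/alpha), H1(x) = -x*beta,
    # U = ln(-ln(1 - eps)/xi).  Parameter values are illustrative only.
    import numpy as np

    rng = np.random.default_rng(2)
    n, alpha, beta = 50_000, 1.5, 0.8
    X = rng.normal(size=n)
    xi = rng.gamma(shape=2.0, scale=0.5, size=n)    # unobserved heterogeneity
    eps = rng.uniform(size=n)                       # eps ~ U[0,1], independent of (X, xi)

    theta = np.exp(X * beta)
    Y_mph = (-np.log(1 - eps) / (theta * xi)) ** (1 / alpha)   # MPH inversion
    U = np.log(-np.log(1 - eps) / xi)
    Y_gaft = np.exp((-X * beta + U) / alpha)                   # GAFT form

    print(np.allclose(Y_mph, Y_gaft))   # True: the two representations coincide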

(1 − ε). Thus ln [Λ (Y )] = ln − ln(ξ1−ε) + ln θ(1X ) . That is, Y = G [H1 (X ) + U] where G (·) isthe inverse  function of ln[Λ(·)], − ln(1−ε) H1 (X ) = − ln [θ (X )] and U = ln .  ξ Appendix D. Supplementary material Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.jeconom.2014.09.008. References Abbring, J.H., Chiappori, P., Zavadil, T., 2008. Better safe than sorry? Ex ante and ex post moral hazard in dynamic insurance data. Working Paper, VU University Amsterdam and Columbia University. Altonji, J., Matzkin, R., 2005. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica 73, 1053–1102. Barrientos-Marin, J., Sperlich, S., 2010. The size problem of bootstrap tests when the null is non- or semiparametric. Rev. Colombiana Estadíst. 33, 307–319. Bierens, H., Ploberger, W., 1997. Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151. Blundell, R., Powell, J., 2003. Endogeneity in nonparametric and semiparametric regression models. In: Dewatripont, M., Hansen, L., Turnovsky, S. (Eds.), Advances in Economics and Econometrics. Cambridge University Press, Cambridge, pp. 294–311. Blundell, R., Powell, J., 2004. Endogeneity in semiparametric binary response models. Rev. Econom. Stud. 71, 581–913. Cameron, C., Trivedi, P., 2005. Microeconometrics: Methods and Applications. Cambridge University Press, New York. Chiappori, P.-A., Komunjer, I., Kristensen, D., 2013. Nonparametric identification and estimation of transformation models. Working paper, Department of Economics, Columbia University. Cox, D.R., 1972. Regression models and life tables (with discussion). J. R. Stat. Soc. Ser. B 34, 187–220. Ekeland, I., Heckman, J.J., Nesheim, L., 2004. Identification and estimation of hedonic models. J. Political Economy 112, 60–109. Engle, R.F., 2000. The econometrics of ultra-high-frequency data. Econometrica 68, 1–22. Fan, J., Jiang, J., 2005. Nonparametric inference for additive models. J. Amer. Statist. Assoc. 100, 890–907. Fan, J., Yao, Q., Tong, H., 1996. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika 83, 189–206. Fan, J., Zhang, C., Zhang, J., 2001. Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Statist. 29, 153–193. Fernandes, M., Grammig, J., 2005. Nonparametric specification tests for conditional duration models. J. Econometrics 127, 35–68. Gozalo, P., Linton, O., 2001. Testing additivity in generalized nonparametric regression models with estimated parameters. J. Econometrics 104, 1–48. Greene, W., 2011. Econometric Analysis, seventh ed. Prentice Hall, NJ. Hansen, B.E., 2008. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory 24, 726–748. Härdle, W., Mammen, E., 1993. Comparing nonparametric versus parametric regression fits. Ann. Statist. 21, 1926–1947. Heckman, J.J., Matzkin, R., Nesheim, L., 2005. Estimation and simulation of hedonic models. In: Kehoe, T., Srinivasan, T., Whalley, J. (Eds.), Frontiers in Applied General Equilibrium. Cambridge University Press, Cambridge., pp. 277–340. Heckman, J.J., Robb, R., 1986. Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In: Wainer, H. (Ed.), Drawing Inferences from Self-Selected Samples. SpringerVerlag, New York, pp. 63–107. Mahwah, NJ: Lawrence Erlbaum Associates. Heckman, J.J., Singer, B., 1984. 
A method for minimizing the impact of distributional assumptions in econometric Models for duration data. Econometrica 52, 271–320. Hoderlein, S., Mammen, E., 2007. Identification of marginal effects in nonseparable models without monotonicity. Econometrica 75, 1513–1518. Hoderlein, S., Su, L., White, H., Yang, T., 2014. Testing for monotonicity in unobservables under unconfoundedness. Working paper, Dept. of Economics, Boston College. Horowitz, J.L., 1996. Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica 64, 103–137. Horowitz, J.L., 2001. Nonparametric estimation of a generalized additive model with an unknown link function. Econometrica 69, 499–513.


Horowitz, J.L., 2014. Nonparametric additive models. In: Racine, J., Su, L., Ullah, A. (Eds.), The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics. Oxford University Press, Oxford, pp. 129–148. Horowitz, J., Lee, S., 2005. Nonparametric estimation of an additive quantile regression model. J. Amer. Statist. Assoc. 100, 1238–1249. Horowitz, J.L., Mammen, E., 2004. Nonparametric estimation of an additive model with a link function. Ann. Statist. 36, 2412–2443. Horowitz, J.L., Mammen, E., 2007. Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions. Ann. Statist. 35, 2589–2619. Horowitz, J.L., Mammen, E., 2011. Oracle-efficient estimation of an additive model with an unknown link function. Econometric Theory 27, 582–608. Ichimura, H., Lee, S., 2011, Identification and estimation of a nonparametric transformation model. Working paper, Department of Economics, Tokyo University. Imbens, G., Newey, W., 2009. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77, 1481–1512. Jacho-Chávez, D., Lewbel, A., Linton, O., 2010. Identification and nonparametric estimation of a transformed additively separable model. J. Econometrics 156, 392–407. Keifer, N.M., 1988. Economic duration data and hazard functions. J. Econ. Literature 26, 646–679. Kennan, J.E., 1985. The duration of contract strikes in US manufacturing. J. Econometrics 28, 5–28.

Lancaster, T., 1979. Econometric models for the duration of unemployment. Econometrica 47, 939–956. Li, Q., Racine, J., 2003. Nonparametric estimation of distributions with categorical and continuous data. J. Multivariate Anal. 86, 266–292. Lu, X., White, H., 2014. Testing for separability in structural equations. J. Econometrics 182, 14–26. Masry, E., 1996. Multivariate local polynomial regression for time series: uniform strong consistency rates. J. Time Ser. Anal. 17, 571–599. Mata, J., Portugal, P., 1994. Life duration of new firms. J. Ind. Econ. 42, 227–245. Politis, D., Romano, J., Wolf, M., 1999. Subsampling. Springer-Verlag, New York. Ridder, G., 1990. The nonparametric identification of generalized accelerated failure-time models. Rev. Econom. Stud. 57, 167–181. Rodriguez-Poo, J.M., Sperlich, S., Vieu, P., 2014. Nonparametric specification testing when the null hypothesis is non- or semiparametric. Econometric Theory forthcoming. Su, L., Tu, Y., Ullah, A., 2014. Testing additive separability of error term in nonparametric structural models. Econometric Rev. forthcoming. Van den Berg, G., 2001. Duration models: Specification, identification and multiple durations. In: Heckman, J.J., Leamer, E.E. (Eds.), Handbook of Econometrics, vol. 5. Elsevier, B.V, pp. 3381–3460. White, H., Lu, X., 2011. Causal diagrams for treatment effect estimation with application to selection of efficient covariates. Rev. Econ. Stat. 93, 1453–1459.