Nonparametric identification and estimation of sample selection models under symmetry


Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Songnian Chen a,*, Yahong Zhou b, Yuanyuan Ji b,c

a Hong Kong University of Science and Technology, Hong Kong
b Shanghai University of Finance and Economics, China
c Shanghai Academy of Social Sciences, China

Article history: Received 27 October 2016; Received in revised form 5 June 2017; Accepted 27 September 2017; Available online xxxx.
JEL classification: C13; C14

Abstract

Under a conditional mean restriction, Das et al. (2003) considered nonparametric estimation of sample selection models. However, their method can only identify the outcome regression function up to a constant. In this paper we strengthen the conditional mean restriction to a symmetry restriction, under which selection biases due to selection on unobservables can be eliminated through proper matching of propensity scores; consequently, we are able to identify and obtain consistent estimators for the average treatment effects and the structural regression functions. The results from a simulation study suggest that our estimators perform satisfactorily.

© 2017 Elsevier B.V. All rights reserved.

Keywords: Sample selection; Nonparametric estimation; Symmetry

1. Introduction

Since Heckman's (1974) seminal work, sample selection models have been widely used in applied research to correct for bias arising from non-random sampling, with applications in modeling the impact of unions, occupational choice, the choice of region of residence and the choice of industry, among others. Heckman's two-step estimator and its extensions to the semiparametric two-step estimators of Newey (1988), Powell (1989), Ahn and Powell (1993), Chen (1999) and Chen and Zhou (2010) require a parametric specification for either the regression function or the error distribution, or both. Therefore, these parametric and semiparametric two-step estimators are, in general, not robust to misspecification of the functional form of the regression function or of the parametric error distribution. Das et al. (2003) considered two-step nonparametric estimation of sample selection models under a conditional mean restriction that allows for the same degree of flexibility as standard nonparametric regression. However, in their analysis, the regression function in the outcome equation is only identified up to an unknown constant, and as a result, their method cannot be used to identify or estimate either conditional or unconditional average treatment effects. In this paper, we also consider nonparametric identification

* Correspondence to: Department of Economics, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong. E-mail address: [email protected] (S. Chen).

and estimation of the sample selection model, but strengthen the conditional mean restriction in Das et al. (2003) to a joint symmetry restriction. Under this shape restriction on the error distribution, selection biases due to selection on unobservables can be eliminated through proper matching of propensity scores, and as a result, we are able to identify and construct consistent estimators for the conditional and unconditional average treatment effects, as well as the structural regression functions in the outcome equations. Under the symmetry restriction, we show that selection bias can be eliminated through an appropriate weighting scheme, leading to nonparametric identification and consistent estimation of the outcome regression functions and the average treatment effects. Symmetry restrictions on underlying error distributions have been widely used in the literature. In the program evaluation literature, various alternative assumptions have been imposed for the identification of the average treatment effect (ATE). For the control function approach (e.g., Heckman and Navarro-Lozano, 2004), the idea of "identification at infinity" (Heckman, 1990), which requires a "large support" condition, has played an important role in identifying the ATE. As pointed out by Heckman and Navarro-Lozano (2004), in practice the "large support" condition is often unrealistic, and consequently they imposed the joint symmetry restriction instead of appealing to "identification at infinity" for the identification of the ATE. In the context of binary discrete outcomes, Aakvik et al. (1999), partly

https://doi.org/10.1016/j.jeconom.2017.09.004 0304-4076/© 2017 Elsevier B.V. All rights reserved.

Please cite this article in press as: Chen S., et al., Nonparametric identification and estimation of sample selection models under symmetry. Journal of Econometrics (2017), https://doi.org/10.1016/j.jeconom.2017.09.004.

motivated by Chen (1999), also exploited the joint symmetry restriction for the identification of the average treatment effect (ATE). From a different perspective, Angrist (2004) considered various assumptions under which IV estimates have broader predictive power beyond the group of compliers. In particular, he was interested in assumptions that link a local average treatment effect (LATE) to the population ATE, which is not instrument-dependent. Among the various assumptions considered, Angrist (2004) noted that the symmetry assumption is more appealing because, unlike the "no selection bias" or "conditional constant effects" assumptions, it is not fundamentally inconsistent with the benchmark Roy-type selection model. Angrist (2004) suggested that, intuitively, with symmetrically distributed latent errors in the index framework, together with a symmetric first stage, the LATE becomes equivalent to the ATE, because average treatment effects for individuals with the characteristics of the compliers are representative of average treatment effects for individuals over the entire distribution.1 Angrist (2004) illustrated these ideas using sibling-sex composition to estimate the effect of childbearing on economic and marital outcomes; in particular, Angrist found in that study that, for teen mothers, the LATE is indeed identical to the population ATE when the latter is imputed under the joint symmetry restriction. While Aakvik et al. (1999) and Angrist (2004) exploited the joint symmetry restriction to identify the ATE, Chen and Khan (2010) made use of the joint symmetry restriction, together with an "equality" condition,2 to identify the average treatment effect on wage inequality. Using the approach developed by Chen and Khan (2010), Antonczyk (2011) estimated the ATE of collective bargaining on the dispersion of wages in Germany.
Symmetry restrictions on underlying error distributions have also been used in other models. Chen (2000) and Chen et al. (2016) studied the identification and estimation of binary choice models under the symmetry restriction. Powell (1986), Honoré (1992) and Dong and Lewbel (2011) provide further examples of models that exploit the symmetry restriction.

The paper is organized as follows. Section 2 presents the model and discusses the identification and estimation issues. Section 3 contains the large sample results for the proposed estimators. We provide the results of a Monte Carlo simulation study in Section 4. Section 5 contains the concluding remarks. The proof of the main theorem is relegated to the Appendix.

2. The model and estimator

We consider the nonparametric switching regression model

yi = y1i di + y0i (1 − di), i = 1, . . . , n,  (2.1)

where

y1i = g1(xi) + u1i  (2.2)

and

y0i = g0(xi) + u0i  (2.3)

denote the outcome equations under regimes 1 and 0, respectively, and the selection equation is of the form

di = 1{m(wi) > vi}.  (2.4)
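The model in Eqs. (2.1)–(2.4) is straightforward to simulate. In the minimal sketch below, the functions g1, g0 and m and all distributional choices are illustrative placeholders, since the model treats them as unknown nonparametric objects:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Illustrative placeholders: the model leaves g1, g0 and m unspecified.
g1 = lambda x: 1.0 + 2.0 * x          # regime-1 outcome function (assumed)
g0 = lambda x: x                      # regime-0 outcome function (assumed)
m = lambda w: 0.5 * w                 # selection index (assumed)

x = rng.uniform(0.0, 1.0, n)
w = x + rng.normal(0.0, 1.0, n)       # w overlaps with x plus an excluded component
u1, u0, v = rng.multivariate_normal(np.zeros(3), np.eye(3), n).T

y1 = g1(x) + u1                       # Eq. (2.2)
y0 = g0(x) + u0                       # Eq. (2.3)
d = (m(w) > v).astype(float)          # Eq. (2.4)
y = y1 * d + y0 * (1.0 - d)           # Eq. (2.1): one regime's outcome observed
```

Only (y, d, x, w) would be available to the econometrician; y1 and y0 are never observed jointly.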

1 Alternatively, it implies that expected outcomes for compliers can be obtained as the average of expected outcomes for always-takers and never-takers.
2 In a sense, the "equality" condition resembles the "similarity" condition of Chernozhukov and Hansen (2005).


Standard sample selection models correspond to the special case where y0i is identically zero. Here g1(x) and g0(x) are unknown functions in the outcome equations, d is the binary selection indicator, m(w) is an unknown function, x ∈ R^{dx} and w ∈ R^{dw} are vectors of regressors with possibly overlapping components, and (u1, u0, v) are the unobservable error terms, independent of (x, w) with E(u0|x, w) = 0. As in Das et al. (2003), we also impose an exclusion restriction such that w contains some component not in x. The main equation can be written as a nonparametric regression model with a random coefficient,

yi = g0(xi) + di (α(xi) + εi) + u0i  (2.5)

where α(xi) = g1(xi) − g0(xi) and εi = u1i − u0i. In this paper we consider the identification and estimation of α(x) and α = E[α(xi)], or αS = ES[α(xi)] = E[α(xi)|xi ∈ S] for some fixed set3 S, under the condition that (εi, vi) is independent of (xi, wi) and symmetrically distributed around the origin, and E(u0|x, w) = 0. Also, assume P(w) = Pr(m(wi) > vi | wi = w) = Fv(m(w)) is a strictly increasing function of m(w).

Under the conditional mean restriction that E(u0|x, w) = E(u1|x, w) = 0, Das et al. (2003) considered identification and estimation of the nonparametric sample selection model. However, their approach only provides identification results and consistent estimators for g0(x) and g1(x) up to unknown constant terms; as a result, the conditional and unconditional average treatment effects α(x) and αS are not identified in their context. In contrast, under the joint symmetry restriction, we are able to identify and consistently estimate α(x) and αS (and, similarly, g0(x) and g1(x) as well). Under a conditional mean restriction with a special regressor, Lewbel (2007) considered the estimation of a more general model which includes the switching regression model as a special case; however, Lewbel (2007) requires a large support condition for the special regressor, which is not needed for our approach here.

To motivate our identification and estimation approach, consider the regression function

g(x, P) = E(yi | xi = x, P(wi) = P)
        = g0(x) + P α(x) + E(di εi | xi = x, P(wi) = P)
        = g0(x) + P α(x) + λ(P)  (2.6)

where the selection bias term λ(P), which depends only on the propensity score P in this index sufficiency framework (Heckman et al., 1998), in general does not vanish. However, under index sufficiency and joint symmetry, the selection bias term λ(P) can be shown to be symmetric around P = 1/2. This symmetry result is exploited below to achieve the cancellation of the selection bias terms and thus facilitate the identification and estimation of the treatment effect parameters.

Proposition 1. If (εi, vi) is independent of (xi, wi) and symmetrically distributed around the origin, and P(w) = Fv(m(w)) is a strictly increasing function of m(w), then the selection bias term λ(P) is symmetric around P = 1/2; namely, λ(P) = λ(1 − P).

Proof. Under the independence and symmetry condition, and the fact that Fv(·) is a strictly increasing function, it is easy to see that Fv^{-1}(P) + Fv^{-1}(1 − P) = 0.

3 Here S is the set of the common support of xi for the two regimes, or a subset of it; see Heckman et al. (1998) for some detailed discussions.


Thus, for any P > 1/2, we have Fv^{-1}(P) > 0 > Fv^{-1}(1 − P), and hence

\[
\begin{aligned}
\lambda(P) &= E(d_i \varepsilon_i \mid P(w_i) = P) \\
&= \iint 1\{F_v^{-1}(P) > v\}\,\varepsilon\, f_e(\varepsilon, v)\, d\varepsilon\, dv \\
&= \iint 1\{F_v^{-1}(P) > v > F_v^{-1}(1-P)\}\,\varepsilon\, f_e(\varepsilon, v)\, d\varepsilon\, dv
 + \iint 1\{F_v^{-1}(1-P) > v\}\,\varepsilon\, f_e(\varepsilon, v)\, d\varepsilon\, dv \\
&= \iint 1\{F_v^{-1}(1-P) > v\}\,\varepsilon\, f_e(\varepsilon, v)\, d\varepsilon\, dv \\
&= \lambda(1-P)
\end{aligned}
\]

where fe(ε, v) denotes the joint density of (εi, vi), and the first term on the right-hand side of the third equality vanishes because of the symmetry assumption.

In the traditional sample selection/switching regression model, the joint symmetry assumption holds trivially under joint normality. Other jointly symmetric distributions include the class of symmetric NMVM (normal mean–variance mixture) distributions (see, e.g., Fan and Wu, 2010), of which the multivariate t-distribution is a special case, among others. Also, it is worth noting that the joint symmetry can be relaxed somewhat; specifically, the above proposition still holds if we assume (u1 − u0, v) is jointly symmetrically distributed, which allows u0 and u1 to have any common component.

Now we make use of the above proposition for the purpose of identification and estimation. Specifically, note that Proposition 1 implies that

g(x, 1 − P) = g0(x) + (1 − P)α(x) + λ(1 − P)
            = g0(x) + (1 − P)α(x) + λ(P)  (2.7)

if g(x, 1 − P) is well defined.4 By subtracting Eq. (2.7) from Eq. (2.6), we have

g(x, P) − g(x, 1 − P) = (2P − 1)α(x)  (2.8)

due to the cancellation of the selection bias terms, similar to Angrist (2004), Chen and Khan (2010) and Heckman et al. (1998). Note that when Eq. (2.8) is well defined, it could directly serve as the basis for the identification and estimation of α(x). In particular, Eqs. (2.5)–(2.8) suggest that for a pair of observations (i, j) such that xi ≈ x, xj ≈ x and Pi + Pj ≈ 1, where Pi = P(wi) and Pj = P(wj),

yi − yj ≈ (Pi − Pj)α(x) + ξij(x)  (2.9)

where ξij(x) = ξi(x) − ξj(x), with

ξi(x) = (di − Pi)α(x) + di εi − λ(Pi) + u0i,

E(ξi(x)|xi, Pi) = 0 and E(ξij(x)|xi, xj, Pi, Pj) = 0. As a result, α(x) can be viewed as the local slope coefficient in the local linear regression model (2.9) with Pi − Pj as the regressor; consequently, a local least squares type approach suggests that a natural estimator for α(x) can be defined as

\[
\hat{\alpha}^{*}(x)=\frac{\sum_{i\neq j}(y_i-y_j)(P_i-P_j)\,k_2\!\left(\frac{x_i-x}{h_2}\right)k_2\!\left(\frac{x_j-x}{h_2}\right)k_1\!\left(\frac{P_i+P_j-1}{h_1}\right)I_iI_j}{\sum_{i\neq j}(P_i-P_j)^{2}\,k_2\!\left(\frac{x_i-x}{h_2}\right)k_2\!\left(\frac{x_j-x}{h_2}\right)k_1\!\left(\frac{P_i+P_j-1}{h_1}\right)I_iI_j}
\]

if Pi were known for i = 1, 2, . . . , n, where k1(·) and k2(·) are two kernel functions with h1 and h2 the corresponding bandwidth parameters, and Ii = I(wi) is a nonnegative smooth trimming function so that P(w) can be reasonably precisely estimated, for w such that fw(w), the density of wi at w, is uniformly bounded away from 0 when I(w) > 0. When the selection probabilities are unknown, we define our estimator5 for α(x) by α̂(x), where

\[
\hat{\alpha}(x)=\frac{\sum_{i\neq j}(y_i-y_j)(\hat P_i-\hat P_j)\,k_2\!\left(\frac{x_i-x}{h_2}\right)k_2\!\left(\frac{x_j-x}{h_2}\right)k_1\!\left(\frac{\hat P_i+\hat P_j-1}{h_1}\right)I_iI_j}{\sum_{i\neq j}(\hat P_i-\hat P_j)^{2}\,k_2\!\left(\frac{x_i-x}{h_2}\right)k_2\!\left(\frac{x_j-x}{h_2}\right)k_1\!\left(\frac{\hat P_i+\hat P_j-1}{h_1}\right)I_iI_j}
\]

and the selection probabilities Pi are replaced by the nonparametric estimates P̂i = P̂(wi), where

\[
\hat P(w)=\frac{\sum_{i=1}^{n} d_i\,k\!\left(\frac{w_i-w}{h}\right)}{\sum_{i=1}^{n} k\!\left(\frac{w_i-w}{h}\right)}
\]

and k(·) and h are the kernel function and its corresponding bandwidth parameter. Here, for notational simplicity, we assume all the elements of x and w are continuously distributed; otherwise, we can replace the kernel function with indicator functions for the discrete components, as in Ahn and Powell (1993). Finally, we propose to estimate the unconditional average treatment effect αS by

\[
\hat{\alpha}_S=\frac{\sum_{i=1}^{n}\hat{\alpha}(x_i)\,I_S(x_i)}{\sum_{i=1}^{n} I_S(x_i)}
\]

where IS(xi) = 1 if xi ∈ S and 0 otherwise.

So far we have considered the identification and estimation of the conditional and unconditional average treatment effects. In the traditional sample selection model, only one of the outcome variables y0 and y1 can be observed, and indeed estimation of the structural regression functions g1(x) and g0(x) is the main goal of Das et al. (2003). However, under a mean restriction, Das et al. (2003) can only identify and estimate g0(x) and g1(x) up to a constant term. The insights and techniques underlying our estimator above are also applicable to the identification and estimation of g1(x) and g0(x) separately. For the identification of g1(x), note that

g1*(xi, Pi) = E(di yi | xi, Pi) = Pi g1(xi) + λ1(Pi)

where λ1(P) = E(di u1i | xi = x, Pi = P). Then, under the condition that (vi, u1i) is independent of (xi, wi) and symmetrically distributed around the origin, we have λ1(P) = λ1(1 − P), similar to the symmetry property of λ(P); therefore

g1*(x, P) − g1*(x, 1 − P) = (2P − 1) g1(x).

Hence, a nonparametric estimator for g1(x) can be defined similarly to α̂(x). Likewise, g0(x) can be identified and estimated nonparametrically.

4 Both g(x, P) and g(x, 1 − P) are well defined if the support of m(wi) is the entire real line given xi = x, which requires the exclusion restriction mentioned above. We thank a referee for pointing out this issue.
5 Note that in a way P̂i can be viewed as a generated regressor, which is estimated nonparametrically by kernel regression in the first step. Recently Mammen et al. (2012) studied a general nonparametric regression model with generated regressors that are based on first step local polynomial regression estimates. There are two aspects of our estimator, namely, the pairwise matching of xi and xj around x and the symmetric matching of P̂i and 1 − P̂j, which seem to prevent the direct application of the results of Mammen et al. (2012). However, given the close connection, it is interesting to further explore the extension of Mammen et al. (2012) in our setup. We appreciate an anonymous referee for pointing out this link.

3. Large sample properties

In this section, we present the large sample properties of the estimators defined in the previous section. First, we make the following assumptions.

Assumption 1: (yi, di, xi, wi), i = 1, 2, . . . , n, is an i.i.d. sample generated from Eqs. (2.1)–(2.5), with finite second moments for each component of (yi, g0(xi), g1(xi)).
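Assumption 2 below is exactly the joint symmetry restriction behind Proposition 1, and the key property λ(P) = λ(1 − P) is easy to verify numerically. The sketch evaluates λ(P) = E[ε·1{v < Fv^{-1}(P)}] by simulation for jointly normal (ε, v) with an illustrative correlation of 0.6 (any jointly symmetric choice would do):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
rho = 0.6                              # illustrative error correlation
n = 1_000_000
v = rng.normal(size=n)
eps = rho * v + np.sqrt(1.0 - rho**2) * rng.normal(size=n)

def lam(P):
    # lambda(P) = E[d * eps] with d = 1{F_v^{-1}(P) > v}; here v ~ N(0, 1)
    c = NormalDist().inv_cdf(P)
    return float(np.mean(eps * (v < c)))

# Symmetry around P = 1/2: lam(P) is (numerically) equal to lam(1 - P).
pairs = [(lam(P), lam(1.0 - P)) for P in (0.6, 0.75, 0.9)]
```

Under this bivariate normal choice λ(P) = −ρφ(Φ^{-1}(P)) in closed form, which is manifestly unchanged when P is replaced by 1 − P.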


Assumption 2: The error term (εi, vi) is independent of (xi, wi) and symmetrically distributed around the origin, that is, fe(ε, v) = fe(−ε, −v), and E(u0i|xi, wi) = 0. Furthermore, the selection probability function P(w) = E(d|w) = Fv(m(w)) is strictly increasing in m(w).

Let f(x, P) denote the joint density function of (xi, Pi) at (x, P) and fw(w) the density of wi at w.

Assumption 3: (i) P(w) and fw(w) are continuously differentiable up to the lth order, and all these derivatives are uniformly bounded; (ii) f(x, P) and the regression function g(x, P) = E(yi|xi = x, P(wi) = P) are l2th order and l1th order continuously differentiable with respect to x and P, respectively, α(x) is l2th order continuously differentiable, and all these derivatives are uniformly bounded; (iii) I(w) is a nonnegative smooth trimming function such that fw(w) is uniformly bounded away from 0 if I(w) > 0.

Assumption 4: The kernel functions k(·) and k2(·) are continuously differentiable with bounded derivatives,6 and k1(·) is twice continuously differentiable with uniformly bounded first and second order derivatives; (i) k(·) is bounded and symmetric, with ∫k(t)dt = 1 and ∫t^j k(t)dt = 0 for 1 ≤ |j| < l; (ii) k1(·) is bounded and symmetric, satisfying ∫k1(t)dt = 1 and ∫t^j k1(t)dt = 0 for 1 ≤ |j| < l1; (iii) k2(·) is bounded and symmetric, and satisfies ∫k2(t)dt = 1 and ∫t^j k2(t)dt = 0 for 1 ≤ |j| < l2.

Let π_n = ln^{1/2}(n)(nh^{d_w})^{-1/2} + h^l.

Assumption 5: The bandwidth parameters h, h1 and h2 all tend to 0 as n → ∞ and satisfy:
(1) π_n^2 h_1^{-3} = o((n h_2^{d_x})^{-1/2});
(2) (h_2^{2d_x} h_1^2 h^{d_w})^{-1} = o(n), n h_2^{d_x} h^{2l} = o(1) and n h_2^{d_x} h_1^{2l_1} = o(1);
(3) n h_2^{d_x} h_2^{2l_2} → c for some c ≥ 0.

Let π_{2n} = ln^{1/2}(n)(n h_2^{d_x})^{-1/2} + h_2^{l_2}.

Assumption 5': The bandwidth parameters h, h1 and h2 all tend to 0 as n → ∞ and satisfy:
(1) π_n^2 h_1^{-3} = o(n^{-1/2});
(2) (h_2^{2d_x} h_1^2 h^{d_w})^{-1} = o(n), n h^{2l} = o(1) and n h_1^{2l_1} = o(1);
(3) π_{2n}^2 = o(n^{-1/2}).

Assumption 1 describes the data generating process; the independence restriction can in fact be weakened slightly so that the conditional distribution of (εi, vi) is allowed to depend on (xi, wi), but only through |m(wi)|, so that fe(ε, v | x, w) = fe(ε, v | |m(w)|). Assumption 2 imposes a shape restriction on the error distribution, which implies that the selection bias terms due to "selection on unobservables" can be eliminated through proper matching of the selection probabilities; as a result, we are able to identify the conditional and unconditional average treatment effects. Assumption 3 contains certain boundedness and smoothness conditions. As in Das et al. (2003), for the purpose of identification we need an exclusion restriction that w contains a component that does not belong to x. Assumptions 4 and 5 place restrictions on the kernel functions and bandwidth sequences. Note that three bandwidth sequences need to be chosen to take into account the impact of the first step nonparametric estimation on the limiting distribution of the second step estimator. Also note that Assumption 5' is required for the estimation of αS, whereas the asymptotic results for α̂(x) require Assumption 5; compared with Assumption 5, the former requires undersmoothing to achieve √n-consistency for α̂S. The following theorem contains the main results.

Theorem 2. (i) Under Assumptions 1–5, suppose that S(x), defined below, is nonzero; then α̂(x) is consistent and asymptotically normal:

\[
\sqrt{n h_2^{d_x}}\,\bigl(\hat\alpha(x)-\alpha(x)-h_2^{l_2}b(x)\bigr) \to N(0, V_1)
\]

where

\[
V_1=\int k_2^2(u)\,du\; f_x(x)\,E\bigl[\eta_i^2(x)\,\big|\,x_i=x\bigr],
\]

with ηi(x) = 2S(x)^{-1} ξi (2Pi − 1) f(x, 1 − Pi) Ii I*(x, 1 − Pi) for ξi = (di − Pi)α(xi) + (di εi − λ(Pi)) + u0i, I*(x, P) = E(Ii | xi = x, Pi = P), and

S(x) = E[(2Pi − 1)^2 f(x, 1 − Pi) I*(x, Pi) I*(x, 1 − Pi)],

and the asymptotic bias term b(x) = b1(x) + b2(x), where b1(x) and b2(x) are defined in Lemmas A.3 and A.4 in the Appendix.

(ii) Under Assumptions 1–4 and Assumption 5', suppose that S(x) is uniformly bounded away from zero for x ∈ S; then α̂S is consistent for αS and asymptotically normal:

\[
\sqrt{n}\,(\hat\alpha_S-\alpha_S) \to N(0, V_2)
\]

where V2 = E[δi^2], for δi = (E IS(xi))^{-1} [(δ1i − δ2i + (α(xi) − αS)) IS(xi)], with

\[
\delta_{1i}=\xi_i\,\frac{2P_i-1}{S(x_i)}\,f_x(x_i)\,f(x_i,1-P_i)\,I_i\,I^{*}(x_i,1-P_i)\,I_S(x_i)
\]

and

\[
\delta_{2i}=(d_i-P_i)\int(2P_i-1)\,\frac{\alpha(\tilde x,x_i^{*})}{S(\tilde x,x_i^{*})}\,\frac{f(\tilde x,w_i)\,f(\tilde x,x_i^{*},1-P_i)}{f_w(w_i)}\,f_x(\tilde x,x_i^{*})\,I(\tilde x,w_i)\,I^{*}(\tilde x,x_i^{*},1-P_i)\,I_S(\tilde x,x_i^{*})\,d\tilde x.
\]

The proof of this theorem is given in the Appendix.

6 For a d-dimensional vector v = (v1, . . . , vd)′ and a corresponding vector of integers j = (j1, . . . , jd)′, v^j denotes v1^{j1} · · · vd^{jd}. Also, we adopt the convention ∂^j ω(v)/∂v^j = ∂^{|j|}ω(v)/∂^{j1}v1 · · · ∂^{jd}vd for a differentiable function ω, with |j| = Σ_{i=1}^d ji.

4. A Monte Carlo study

In this section, we present the results of a Monte Carlo study to illustrate the usefulness of our proposed estimators. The results from 500 replications of each design are presented with sample sizes of 400 and 800, respectively. We report the bias, SD (standard deviation) and RMSE (root mean square error) of the estimators for each design. For Designs I and I', the data are generated by the following model:

y = x + d(α(x) + ε) + u1
d = 1{0.5x + w + u2 > 0}

where α(x) = 1 + x, u1 is drawn from the standard normal distribution, x is uniformly distributed on (0, 1), w is drawn from the normal distribution with mean zero and variance 4, and they are independent of each other. Table 1 reports the simulation results when u2 is drawn from the normal distribution N(0, 2). Table 2 reports the results for the design with a non-normal error, where u2 is generated by a mixture of normal random variables as follows: u2 = 0.5·(N(0, 1) + 1) + 0.5·(N(0, 1) − 1). We set ε = 0.5u2. For Designs II and II' and Designs III and III', the data are generated from the model

y = x² + d(α(x) + ε) + u1
d = 1{0.5x² + w + u2 > 0}.
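As a concrete sketch, the two-step procedure can be run on Design I: a first-step kernel (Nadaraya–Watson) estimate of the propensity score, followed by the pairwise estimator that matches xi ≈ xj ≈ x0 and P̂i + P̂j ≈ 1. The trimming factors Ii and the paper's exact tuning are omitted here, and the bandwidth constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Design I data (normal case): alpha(x) = 1 + x, so the true ATE is 1.5.
x = rng.uniform(0.0, 1.0, n)
w = rng.normal(0.0, 2.0, n)                 # variance 4
u1 = rng.normal(size=n)
u2 = rng.normal(0.0, np.sqrt(2.0), n)       # N(0, 2)
eps = 0.5 * u2
d = (0.5 * x + w + u2 > 0).astype(float)
y = x + d * (1.0 + x + eps) + u1

def gauss(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

# Step 1: kernel regression estimate of the propensity score P(w) = E(d | w).
h = 1.06 * w.std(ddof=1) * n ** (-0.2)
Kw = gauss((w[:, None] - w[None, :]) / h)
P_hat = Kw @ d / Kw.sum(axis=1)

# Step 2: pairwise estimator of alpha(x0); k1 matches P_hat_i + P_hat_j to 1.
h1 = 1.5 * P_hat.std(ddof=1) * n ** (-0.2)
h2 = 1.5 * x.std(ddof=1) * n ** (-0.2)
Kp = gauss((P_hat[:, None] + P_hat[None, :] - 1.0) / h1)
dP = P_hat[:, None] - P_hat[None, :]
dY = y[:, None] - y[None, :]

def alpha_hat(x0):
    kx = gauss((x - x0) / h2)
    Wt = kx[:, None] * kx[None, :] * Kp
    np.fill_diagonal(Wt, 0.0)
    return (dY * dP * Wt).sum() / ((dP**2) * Wt).sum()

ate_hat = np.mean([alpha_hat(xi) for xi in x])   # alpha_S with S the full support
```

A full replication of the tables would repeat this over 500 simulated samples and summarize the draws of α̂(x) and α̂S.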


Table 1
Design I (normal).

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.100   0.061   0.284  0.290     0.060   0.212  0.220
0.2   1.200   0.029   0.251  0.252     0.033   0.188  0.191
0.3   1.300   0.010   0.233  0.233     0.023   0.181  0.182
0.4   1.400   0.002   0.229  0.229     0.021   0.179  0.180
0.5   1.500   −0.001  0.231  0.231     0.020   0.178  0.179
0.6   1.600   −0.004  0.234  0.234     0.015   0.179  0.179
0.7   1.700   −0.017  0.242  0.242     0.004   0.182  0.182
0.8   1.800   −0.053  0.264  0.269     −0.023  0.195  0.196
0.9   1.900   −0.114  0.306  0.327     −0.074  0.227  0.238
ATE (α)       −0.003  0.197  0.197     0.018   0.142  0.143

Table 2
Design I' (nonnormal).

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.100   0.062   0.293  0.299     0.061   0.203  0.212
0.2   1.200   0.004   0.260  0.260     0.020   0.182  0.183
0.3   1.300   −0.009  0.240  0.240     0.013   0.173  0.174
0.4   1.400   −0.003  0.235  0.235     0.016   0.172  0.173
0.5   1.500   0.007   0.238  0.238     0.018   0.172  0.173
0.6   1.600   0.006   0.245  0.244     0.020   0.173  0.174
0.7   1.700   −0.009  0.254  0.254     0.016   0.178  0.179
0.8   1.800   −0.042  0.270  0.273     0.002   0.189  0.189
0.9   1.900   −0.091  0.306  0.319     −0.029  0.209  0.211
ATE (α)       0.009   0.193  0.193     0.022   0.136  0.137

Table 4
Design II' (nonnormal).

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.010   0.021   0.292  0.293     0.023   0.204  0.205
0.2   1.040   0.037   0.258  0.260     0.024   0.181  0.183
0.3   1.090   0.050   0.238  0.243     0.021   0.171  0.173
0.4   1.160   0.061   0.229  0.237     0.019   0.170  0.171
0.5   1.250   0.069   0.229  0.239     0.025   0.172  0.173
0.6   1.360   0.064   0.238  0.247     0.035   0.173  0.176
0.7   1.490   0.034   0.253  0.255     0.033   0.177  0.180
0.8   1.640   −0.033  0.277  0.278     −0.006  0.190  0.190
0.9   1.810   −0.144  0.312  0.343     −0.093  0.214  0.234
ATE (α)       −0.009  0.200  0.200     −0.001  0.144  0.144

Table 5
Design III (normal).

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.316   0.060   0.299  0.305     0.073   0.213  0.225
0.2   1.447   0.026   0.272  0.273     0.044   0.189  0.194
0.3   1.548   0.008   0.257  0.257     0.031   0.185  0.187
0.4   1.633   −0.001  0.251  0.251     0.025   0.188  0.189
0.5   1.707   −0.004  0.248  0.248     0.018   0.183  0.184
0.6   1.775   −0.012  0.251  0.251     0.009   0.181  0.181
0.7   1.837   −0.033  0.262  0.264     −0.007  0.188  0.188
0.8   1.894   −0.074  0.279  0.288     −0.038  0.198  0.201
0.9   1.949   −0.136  0.305  0.334     −0.090  0.216  0.233
ATE (α)       −0.013  0.201  0.201     0.003   0.147  0.147

Table 6
Design III' (nonnormal).

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.316   −0.008  0.289  0.289     0.026   0.204  0.206
0.2   1.447   0.013   0.257  0.257     0.032   0.181  0.184
0.3   1.548   0.029   0.236  0.237     0.031   0.171  0.174
0.4   1.633   0.041   0.228  0.232     0.031   0.171  0.173
0.5   1.707   0.048   0.233  0.238     0.039   0.169  0.173
0.6   1.775   0.044   0.240  0.244     0.051   0.169  0.177
0.7   1.837   0.017   0.249  0.249     0.050   0.177  0.184
0.8   1.894   −0.047  0.274  0.278     0.011   0.189  0.189
0.9   1.949   −0.154  0.316  0.352     −0.078  0.211  0.225
ATE (α)       −0.001  0.191  0.191     0.015   0.130  0.131
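The Bias, SD and RMSE columns reported in these tables are summary statistics over the 500 replications. A generic summarizer, using the population-SD convention so that RMSE² = Bias² + SD² holds exactly, might look like:

```python
import numpy as np

def summarize(estimates, truth):
    """Bias, SD and RMSE of Monte Carlo estimates of a scalar parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    sd = est.std()                                # population SD (ddof=0)
    rmse = np.sqrt(np.mean((est - truth) ** 2))   # equals sqrt(bias**2 + sd**2)
    return bias, sd, rmse
```

For example, summarize(draws_of_ate_hat, 1.5) would reproduce an "ATE (α)" row for Design I.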

The function α(x) is set to 1 + x² in Designs II and II' and α(x) = 1 + √x in Designs III and III', with x, u1, u2 and w generated as in Designs I and I', respectively. We choose normal kernel functions in both stages. For the bandwidth selection of h in the first step, we adopt Silverman's (1986) rule-of-thumb approach. For the bandwidths h1 and h2 in the second step, similar to Abrevaya and Shin (2011), we choose h2 = c·sd(x)·n^{−λ} and h1 = c·sd(P)·n^{−λ}, and we experimented with c = 1, 1.5, 2, and λ = 1/5 and 1/6. Here we only report the results for c = 1.5 and λ = 1/5, since the estimates are quite stable for other values of c and λ. From the simulation results reported in Tables 1–6, based on the above designs, we can see that our proposed estimators have reasonably good finite sample performance for the estimation of α and α(x), especially when x is not near the end points 0 or 1. Our method appears to be fairly robust, since we have considered functions of different shapes, including a square root function, a linear function and a quadratic function. While the implementation of our method involves choosing several smoothing parameters, the estimates do not appear to be very sensitive to their choices. We now consider cases with more covariates. For Design IV, the data are generated according to the model:

y = x² + d(α(x) + ε) + u1

Table 3
Design II (normal).

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.010   0.047   0.298  0.301     0.058   0.209  0.216
0.2   1.040   −0.004  0.272  0.271     0.016   0.189  0.190
0.3   1.090   −0.014  0.255  0.255     0.013   0.184  0.184
0.4   1.160   −0.007  0.246  0.245     0.021   0.183  0.184
0.5   1.250   0.004   0.243  0.243     0.023   0.180  0.182
0.6   1.360   0.006   0.239  0.239     0.018   0.176  0.177
0.7   1.490   −0.006  0.238  0.238     0.007   0.176  0.176
0.8   1.640   −0.035  0.251  0.253     −0.018  0.189  0.189
0.9   1.810   −0.077  0.286  0.296     −0.059  0.217  0.224
ATE (α)       0.009   0.203  0.203     0.012   0.132  0.133

d = 1{0.5x² + z1 + z2 + u2 > 0}

where α(x) = 1 + x², u1 is drawn from the standard normal distribution N(0, 1), x is drawn from the uniform distribution U(0, 1), z1 and z2 are drawn from independent standard normal distributions, and u2 is drawn from the normal distribution N(0, 2), with ε = 0.5u2. For Design V, the data are generated from the model:

y = x² + d(α(x, x1) + ε) + u1
d = 1{0.5x² + z2 + u2 > 0}

where the function α(x, x1) is set to 1 + x² + x1, with x, u1, u2 and z2 generated as in Design IV, and x1 is drawn from the uniform distribution U(0, 1). For these two designs, we choose the fourth order Gaussian kernel in the first step, with its bandwidth h chosen based on Silverman's (1986) rule of thumb. In the second step, we use the second order normal kernel with the bandwidth parameters h1 = c·sd(P)·n^{−1/6} and h2 = c·sd(x)·n^{−1/6} for Design IV, and h2x = c·sd(x)·n^{−1/7} for x, h2x1 = c·sd(x1)·n^{−1/7} for x1 and h1 = c·sd(P)·n^{−1/7} for Design V. We report the results in Tables 7–10 with c = 1 and 1.5. For Design IV, where the estimated function is one-dimensional, the reported biases are not very sensitive to the choice of bandwidth, although there is noticeable sensitivity of
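The bandwidth rules just described can be collected into small helpers. The silverman function below uses one common variant of Silverman's (1986) rule of thumb (the interquartile-range refinement is an assumption; the paper does not spell out which version it applies), and second_step_bandwidths implements h1 = c·sd(P)·n^{−λ}, h2 = c·sd(x)·n^{−λ}:

```python
import numpy as np

def silverman(z):
    """Rule-of-thumb bandwidth for a Gaussian kernel (one common variant)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    iqr = np.subtract(*np.percentile(z, [75, 25]))
    s = min(z.std(ddof=1), iqr / 1.349)
    return 0.9 * s * n ** (-0.2)

def second_step_bandwidths(x, p_hat, c=1.5, lam=0.2):
    """h2 = c*sd(x)*n^(-lam) and h1 = c*sd(p_hat)*n^(-lam), as in Section 4."""
    n = len(x)
    h2 = c * np.std(x, ddof=1) * n ** (-lam)
    h1 = c * np.std(p_hat, ddof=1) * n ** (-lam)
    return h1, h2
```

Setting lam to 1/6 or 1/7 reproduces the rates used for Designs IV and V.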


Table 7
Design IV, c = 1.

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.010   −0.176  0.423  0.458     −0.067  0.303  0.310
0.2   1.040   −0.163  0.384  0.417     −0.049  0.277  0.281
0.3   1.090   −0.144  0.391  0.417     −0.048  0.280  0.284
0.4   1.160   −0.131  0.402  0.422     −0.056  0.268  0.274
0.5   1.250   −0.129  0.401  0.421     −0.055  0.261  0.266
0.6   1.360   −0.132  0.422  0.442     −0.052  0.278  0.283
0.7   1.490   −0.148  0.432  0.456     −0.066  0.293  0.301
0.8   1.640   −0.205  0.430  0.476     −0.103  0.306  0.323
0.9   1.810   −0.344  0.534  0.635     −0.199  0.347  0.400
ATE (α)       −0.182  0.258  0.316     −0.093  0.164  0.188

Table 8
Design IV, c = 1.5.

              n = 400                  n = 800
x     α(x)    Bias    SD     RMSE      Bias    SD     RMSE
0.1   1.010   −0.135  0.360  0.384     −0.056  0.262  0.268
0.2   1.040   −0.120  0.333  0.353     −0.042  0.235  0.239
0.3   1.090   −0.107  0.325  0.342     −0.035  0.225  0.227
0.4   1.160   −0.096  0.324  0.338     −0.029  0.217  0.218
0.5   1.250   −0.090  0.327  0.339     −0.023  0.216  0.217
0.6   1.360   −0.100  0.338  0.353     −0.024  0.224  0.225
0.7   1.490   −0.139  0.351  0.377     −0.050  0.234  0.239
0.8   1.640   −0.221  0.374  0.434     −0.117  0.253  0.279
0.9   1.810   −0.355  0.437  0.563     −0.231  0.290  0.371
ATE (α)       −0.168  0.257  0.307     −0.085  0.161  0.182

0.1

x1

α (x, x1 )

n = 400

n = 800

Bias

SD

RMSE

Bias

SD

RMSE

0.022

0.1 0.3 0.5 0.7 0.9

1.110 1.310 1.510 1.710 1.910

−0.076 −0.175 −0.198 −0.205 −0.235

1.218 1.073 1.068 1.083 1.350

1.219 1.086 1.085 1.101 1.369

−0.001 −0.029 −0.051 −0.136

0.812 0.702 0.740 0.761 0.960

0.812 0.702 0.740 0.762 0.968

0.3

0.1 0.3 0.5 0.7 0.9

1.190 1.390 1.590 1.790 1.990

−0.014 −0.091 −0.149 −0.132 −0.169

0.988 0.846 0.848 0.852 1.121

0.988 0.850 0.861 0.861 1.133

0.039 0.004 −0.004 −0.025 −0.024

0.666 0.600 0.608 0.636 0.815

0.666 0.600 0.608 0.636 0.815

0.5

0.1 0.3 0.5 0.7 0.9

1.350 1.550 1.750 1.950 2.150

0.012 −0.041 −0.098 −0.083 −0.158

0.961 0.872 0.884 0.951 1.153

0.961 0.872 0.888 0.954 1.162

0.052 0.060 0.018 −0.023 0.004

0.658 0.595 0.640 0.666 0.836

0.660 0.597 0.640 0.666 0.836

0.7

0.1 0.3 0.5 0.7 0.9

1.590 1.790 1.990 2.190 2.390

0.074 −0.056 −0.052 −0.012 −0.102

1.034 0.932 0.875 0.940 1.223

1.036 0.933 0.876 0.940 1.226

0.133 0.088 0.034 0.040 0.075

0.742 0.643 0.705 0.725 0.850

0.753 0.648 0.705 0.725 0.852

0.9

0.1 0.3 0.5 0.7 0.9

1.910 2.110 2.310 2.510 2.710

−0.169 −0.343 −0.301 −0.249 −0.265

1.529 1.255 1.264 1.274 1.664

1.537 1.300 1.298 1.297 1.684

−0.128 −0.113 −0.137 −0.158 −0.189

0.990 0.890 0.986 1.041 1.105

0.997 0.897 0.995 1.052 1.120

−0.138

0.354

0.380

−0.030

0.220

0.222

ATE (α )

n = 800 SD

RMSE

1.110 1.310 1.510 1.710 1.910

0.049 −0.062 −0.119 −0.156 −0.231

0.781 0.676 0.645 0.672 0.857

0.781 0.678 0.655 0.689 0.887

0.095 0.033 0.003 −0.037 −0.116

0.556 0.458 0.462 0.513 0.660

0.564 0.459 0.462 0.514 0.670

0.3

0.1 0.3 0.5 0.7 0.9

1.190 1.390 1.590 1.790 1.990

0.103

−0.003 −0.065 −0.098 −0.171

0.652 0.549 0.529 0.560 0.728

0.659 0.549 0.532 0.568 0.747

0.127 0.063 0.035 0.010 −0.036

0.458 0.384 0.391 0.430 0.547

0.474 0.389 0.393 0.430 0.548

0.5

0.1 0.3 0.5 0.7 0.9

1.350 1.550 1.750 1.950 2.150

0.161 0.050 0.003 −0.021 −0.105

0.636 0.560 0.535 0.575 0.745

0.656 0.562 0.535 0.575 0.751

0.186 0.120 0.084 0.069 0.045

0.460 0.394 0.391 0.426 0.527

0.495 0.411 0.400 0.432 0.529

0.7

0.1 0.3 0.5 0.7 0.9

1.590 1.790 1.990 2.190 2.390

0.138 0.026 0.009 −0.006 −0.105

0.709 0.615 0.563 0.607 0.818

0.721 0.615 0.563 0.607 0.824

0.189 0.114 0.077 0.073 0.055

0.521 0.430 0.442 0.476 0.577

0.553 0.444 0.448 0.481 0.579

0.9

0.1 0.3 0.5 0.7 0.9

1.910 2.110 2.310 2.510 2.710

−0111 −0.235 −0.237 −0.239 −0.333

0.926 0.764 0.720 0.773 1.051

0.932 0.798 0.757 0.808 1.101

−0.052 −0.116 −0.150 −0.162 −0.186

0.649 0.548 0.586 0.634 0.727

0.650 0.559 0.604 0.654 0.750

−0.068

0.318

0.325

0.013

0.211

0.211

ATE (α )

Bias

SD

RMSE

Bias

5. Conclusion

Table 9 Design V, c = 1. x

n = 400

0.1 0.3 0.5 0.7 0.9

n = 800

Bias

α (x, x1 )

0.1

Table 8 Design IV, c = 1.5. x



Table 10 Design V, c = 1.5. n = 400

1.010 1.040 1.090 1.160 1.250 1.360 1.490 1.640 1.810

)

the standard errors. For Design V where we are estimating a twodimensional function, our estimates do exhibit more sensitivity to the bandwidth choice, especially in terms of the standard errors, reflecting the general difficulty in nonparametric estimation with increasing dimension. On the other hand, the overall performances of our estimators are still reasonably satisfactory, with overall biases relatively small.
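For concreteness, the data-generating process of Design IV can be simulated directly. The sketch below is our illustration (not the authors' code), reading N(0, 2) as variance 2; it checks the implied selection rate and contrasts the true average treatment effect E[α(x)] = 1 + E[x²] = 4/3 with the naive treated-minus-untreated mean difference, which is contaminated by selection on unobservables through u2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large n so the Monte Carlo quantities below are tight

# Design IV: y = x^2 + d*(alpha(x) + eps) + u1,  d = 1{0.5 x^2 + z1 + z2 + u2 > 0}
x = rng.uniform(0.0, 1.0, n)
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
u1 = rng.standard_normal(n)
u2 = rng.normal(0.0, np.sqrt(2.0), n)   # N(0, 2) read as variance 2 (assumption)
eps = 0.5 * u2                          # outcome error correlated with selection error
alpha = 1.0 + x ** 2                    # structural treatment-effect function alpha(x)
d = (0.5 * x ** 2 + z1 + z2 + u2 > 0).astype(float)
y = x ** 2 + d * (alpha + eps) + u1

ate_true = alpha.mean()                 # E[alpha(x)] = 1 + E[x^2] = 4/3
naive = y[d == 1].mean() - y[d == 0].mean()  # biased upward: selected units have E[eps | d = 1] > 0

print(ate_true, naive, d.mean())
```

Because ϵ = 0.5u2 is positively correlated with the selection index, the naive comparison overstates the average treatment effect, which is exactly the bias the propensity-score matching estimator is designed to remove.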

5. Conclusion

In this paper we have considered nonparametric identification and estimation of a sample selection model under a joint symmetry assumption. Unlike Das et al. (2003), which relies on a conditional mean restriction, we are able to identify the underlying regression function in the outcome equation, and thus further obtain consistent estimators for the conditional and unconditional average treatment effects. We also derive the consistency and asymptotic normality of our estimators. A Monte Carlo simulation study indicates the usefulness of our approach.

There are several issues worth further investigation. As our estimator is of a local linear least squares type, a weighted version might lead to improved performance; because of the presence of asymptotic bias in nonparametric estimation, a careful analysis would be needed, which is to be considered in a future project. As the joint symmetry delivers some very powerful results, another interesting issue is to examine the validity of this shape restriction in practice, since researchers may not be sure whether the joint symmetry holds for a particular application. It would be most ideal to formulate a residual-based test, as in Fan and Gencay (1995) and Fang et al. (2015). However, given our model setup, in which the latent residual terms in the binary selection equation cannot be observed, a direct test of the joint symmetry is not feasible. On the other hand, rather than making use of the joint symmetry directly, our identification result and estimation procedure are based only on the equality λ(P) = λ(1 − P), which is implied by the joint symmetry assumption. Indeed, as we have noted earlier, the joint symmetry assumption can be relaxed somewhat. Therefore, we could consider testing the relation λ(P) = λ(1 − P) directly for possible model misspecification, or equivalently, a test can be designed to check the validity of g(x, P) − g(x, 1 − P) = (2P − 1)α(x). For example, given our estimator α̂(x) and the nonparametric estimator ĝ(x, P) for g(x, P), a possible test statistic could be constructed based on the magnitude of

(1/n) Σ_{i=1}^{n} [ĝ(xi, P̂i) − ĝ(xi, 1 − P̂i) − (2P̂i − 1)α̂(xi)]² wi


where wi is the weight given to the ith observation. Alternatively, other approaches are also possible; this is an interesting topic for future research.
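To illustrate the proposed specification check, the stylized numpy sketch below evaluates the statistic with the true g(x, P) = g0(x) + Pα(x) + λ(P) in place of the nonparametric estimates ĝ and α̂; the functions g, lam_sym and lam_asym are hypothetical stand-ins of our own. Under a symmetric λ(P) = λ(1 − P) the statistic is identically zero, while it is bounded away from zero when the symmetry implication fails.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0.0, 1.0, n)
P = rng.uniform(0.05, 0.95, n)   # propensity scores kept away from the boundary

def g(x, P, lam):
    # g(x, P) = g0(x) + P*alpha(x) + lambda(P), with g0(x) = x^2, alpha(x) = 1 + x^2 (assumed)
    return x ** 2 + P * (1.0 + x ** 2) + lam(P)

alpha_hat = lambda x: 1.0 + x ** 2   # stands in for the estimator alpha^(x)
w = np.ones(n)                       # weights w_i

def T(lam):
    # (1/n) sum_i [g(x_i, P_i) - g(x_i, 1 - P_i) - (2P_i - 1) alpha(x_i)]^2 w_i
    resid = g(x, P, lam) - g(x, 1.0 - P, lam) - (2.0 * P - 1.0) * alpha_hat(x)
    return np.mean(w * resid ** 2)

lam_sym = lambda P: (P - 0.5) ** 2   # satisfies lambda(P) = lambda(1 - P)
lam_asym = lambda P: P               # violates the symmetry implication

print(T(lam_sym), T(lam_asym))
```

In practice ĝ and α̂ would be the kernel estimates from the two-step procedure, and the null distribution of the statistic would have to be derived or approximated, e.g. by resampling.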

Acknowledgments

We would like to thank two anonymous referees and an associate editor for their insightful comments and suggestions, which have greatly improved the presentation of the paper. Zhou's research was supported by NSFC through No. 71471108, the Key Laboratory of Mathematical Economics (SUFE), Ministry of Education, and the Program for Innovative Research Team of Shanghai University of Finance and Economics.

Appendix

First, we state a lemma that contains some results useful for proving the main theorem.

Lemma A.1. Under Assumptions 1, 3 and 4, we have

P̂(w) − P(w) = Op(πn)    (A.1)

f̂w(w) − fw(w) = Op(πn)    (A.2)

and

P̂(w) − P(w) = (1/(n h^{dw})) Σ_{i=1}^{n} [(di − P(w))/fw(w)] k((wi − w)/h) + Op(πn²)    (A.3)

uniformly in w ∈ W0 for any compact set W0 on which fw(·) is uniformly bounded away from 0.

These are standard results in the nonparametric estimation literature; see, e.g., Lemmas B1 and B2 in Newey and McFadden (1994) for the uniform rate of convergence results (A.1) and (A.2); (A.3) follows from a linearization and an application of the above rate results.

Proof of Theorem 2. Write α̂(x) as

α̂(x) = α(x) + Snε(x)/Sn(x)    (A.4)

where

Snε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} [yi − yj − (P̂i − P̂j)α(x)] (P̂i − P̂j) k2((xi − x)/h2) k2((xj − x)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij

and

Sn(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} (P̂i − P̂j)² k2((xi − x)/h2) k2((xj − x)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij.

We first consider Sn(x). Note that for any pair (i, j), we have

(P̂i − P̂j)² = (Pi − Pj)² + [(P̂i − P̂j) + (Pi − Pj)][(P̂i − P̂j) − (Pi − Pj)]
            = (Pi − Pj)² + 2(Pi − Pj)[(P̂i − P̂j) − (Pi − Pj)] + [(P̂i − P̂j) − (Pi − Pj)]²

and

(1/h1) k1((P̂i + P̂j − 1)/h1) = (1/h1) k1((Pi + Pj − 1)/h1) + (1/h1²) k1′((Pi + Pj − 1)/h1)[(P̂i + P̂j) − (Pi + Pj)]
                              + (1/(2h1³)) k1″((P̄i + P̄j − 1)/h1)[(P̂i + P̂j) − (Pi + Pj)]²

where P̄i is between Pi and P̂i for i = 1, 2, ..., n. Then a Taylor expansion yields

Sn(x) = Sn0(x) + Sn1(x) + Sn2(x) + Sn3(x) + Snr(x)

where

Sn0(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} (Pi − Pj)² k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij,

Sn1(x) = [1/(n(n−1) h2^{2dx} h1²)] Σ_{i≠j} (Pi − Pj)² k2((xi − x)/h2) k2((xj − x)/h2) k1′((Pi + Pj − 1)/h1)[(P̂i + P̂j) − (Pi + Pj)] Ii Ij,

Sn2(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} 2(Pi − Pj)[(P̂i − Pi) − (P̂j − Pj)] k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij,

Sn3(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} [(P̂i − Pi) − (P̂j − Pj)]² k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij,

and Snr(x) collects the remainder terms, which satisfy Snr(x) = Op(Snr*) with Snr* = h1^{−3} max_i |P̂i − Pi|² = Op(h1^{−3} πn²).

Using the U-statistics result in Lemma A.3 of Ahn and Powell (1993), if h2^{−2dx} h1^{−1} = o(n), we have

Sn0(x) = E[(Pi − Pj)² (1/(h2^{2dx} h1)) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij] + op(1) = S(x) + op(1)

where S(x) = E[(2P − 1)² f(x, 1 − P) I*(x, P) I*(x, 1 − P)]. Similarly,

Sn1(x) = E[(1/(h2^{2dx} h1²)) |(Pi − Pj)² k2((xi − x)/h2) k2((xj − x)/h2) k1′((Pi + Pj − 1)/h1)| Ii Ij] × Op(πn) = op(1).

Using similar arguments, we can also show Sn2(x) = Sn3(x) = op(1). In addition, we also have Snr(x) = op(1) by Lemma A.1. Therefore, we obtain

Sn(x) = S(x) + op(1).    (A.5)

Next, we consider Snε(x). Note that

yi = g0(xi) + di α(xi) + di εi + u0i = g0(xi) + λ(Pi) + Pi α(xi) + ξi

where ξi = (di − Pi)α(xi) + (di εi − λ(Pi)) + u0i. Therefore, we have

yi − P̂i α(x) = yi − Pi α(x) − (P̂i − Pi)α(x)
             = g0(xi) + λ(Pi) + Pi(α(xi) − α(x)) − (P̂i − Pi)α(x) + ξi
             = ϕi + Pi(α(xi) − α(x)) − (P̂i − Pi)α(x) + ξi

where ϕi = g0(xi) + λ(Pi). Thus we can write Snε(x) as

Snε(x) = Sn1ε(x) + Sn2ε(x) + Sn3ε(x) − α(x) Sn4ε(x)    (A.6)

where

Sn1ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} ξij (P̂i − P̂j) k2((xi − x)/h2) k2((xj − x)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij    (A.7)

Sn2ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} ϕij (P̂i − P̂j) k2((xi − x)/h2) k2((xj − x)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij    (A.8)

Sn3ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} [Pi(α(xi) − α(x)) − Pj(α(xj) − α(x))] (P̂i − P̂j) k2((xi − x)/h2) k2((xj − x)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij    (A.9)

Sn4ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} [(P̂i − Pi) − (P̂j − Pj)] (P̂i − P̂j) k2((xi − x)/h2) k2((xj − x)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij    (A.10)

with ξij = ξi − ξj and ϕij = ϕi − ϕj. From Lemmas A.2–A.5, we have

Sn1ε(x) = (1/n) Σ_{i=1}^{n} 2ξi(2Pi − 1)(1/h2^{dx}) k2((xi − x)/h2) f(x, 1 − Pi) Ii I*(x, 1 − Pi) + op((n h2^{dx})^{−1/2}),
Sn2ε(x) = b1(x) h2^{l2} + op((n h2^{dx})^{−1/2}),
Sn3ε(x) = b2(x) h2^{l2} + op((n h2^{dx})^{−1/2}),
Sn4ε(x) = op((n h2^{dx})^{−1/2}).

Hence,

Snε(x) = (1/n) Σ_{i=1}^{n} 2ξi(2Pi − 1)(1/h2^{dx}) k2((xi − x)/h2) f(x, 1 − Pi) Ii I*(x, 1 − Pi) + b(x) h2^{l2} + op((n h2^{dx})^{−1/2})

where b(x) = b1(x) + b2(x). By combining the above results, we obtain

√(n h2^{dx}) (α̂(x) − α(x) − h2^{l2} b(x)) = (1/√(n h2^{dx})) Σ_{i=1}^{n} ηi(x) k2((xi − x)/h2) + op(1)

where ηi(x) = 2S(x)^{−1} ξi(2Pi − 1) f(x, 1 − Pi) Ii I*(x, 1 − Pi). Consequently, as for standard kernel nonparametric regression, Theorem 2(i) follows by applying the Lindeberg Central Limit Theorem:

√(n h2^{dx}) (α̂(x) − α(x) − h2^{l2} b(x)) →d N(0, V1)

where V1 = σ²(x) f(x) ∫ k2²(u) du with σ²(x) = E[ηi²(x) | xi = x].

Now we consider the asymptotic distribution of α̂S. We first obtain some uniform rates of convergence for Snε(x) and Sn(x), which are useful for the analysis of α̂S. First consider Sn(x); we again use the decomposition Sn(x) = Sn0(x) + Sn1(x) + Sn2(x) + Sn3(x) + Snr(x). In particular, working with the U-process decomposition of Sn0(x) and making use of Lemma 10A and Theorem 3 in Sherman (1994), we have

Sn0(x) = E Sn0(x) + Op((n h2^{dx})^{−1/2} ln n) + Op(n^{−1+c0} h2^{−2dx} h1^{−1})
       = S(x) + O(h^l + h1^{l1} + h2^{l2}) + Op((n h2^{dx})^{−1/2} ln n)

uniformly in x ∈ S for any small number c0 > 0, where the last equality follows from a Taylor expansion. Also, with the results in Lemma A.1, it is straightforward to show that

Sn1(x) = Op(πn),  Sn2(x) = Op(πn²),  Sn3(x) = Op(πn h1^{−1})

and Snr(x) = Op(πn² h1^{−3}) uniformly in x ∈ S. Therefore, we have

Sn(x) = S(x) + O(h^l + h1^{l1} + h2^{l2}) + Op((n h2^{dx})^{−1/2} ln n) + Op(πn h1^{−1})

uniformly in x ∈ S. Similarly, we can show that

Snε(x) = O(h^l + h1^{l1} + h2^{l2}) + Op((n h2^{dx})^{−1/2} ln n) + Op(πn h1^{−1})

uniformly in x ∈ S. Consequently, we obtain

Snε(x)/Sn(x) = Snε(x)/S(x) + Op((h^l + h1^{l1} + h2^{l2} + (n h2^{dx})^{−1/2} ln n + πn h1^{−1})²) + op(n^{−1/2})

uniformly in x ∈ S. Given the above results, we obtain

(1/n) Σ_{l=1}^{n} (α̂(xl) − α(xl)) IS(xl)
  = (1/n) Σ_{l=1}^{n} S^{−1}(xl) Snε(xl) IS(xl) + op(n^{−1/2})
  = [1/(n(n−1)n h2^{2dx} h1)] Σ_{i≠j} Σ_{l=1}^{n} S^{−1}(xl)[yi − yj − (P̂i − P̂j)α(xl)](P̂i − P̂j) k2((xi − xl)/h2) k2((xj − xl)/h2) k1((P̂i + P̂j − 1)/h1) Ii Ij IS(xl) + op(n^{−1/2})
  = Un0 − Un1 + Un2 + Un3 + Un4

where

Un0 = [1/(n(n−1)n h2^{2dx} h1)] Σ_{i≠j} Σ_{l=1}^{n} S^{−1}(xl)[yi − yj − (Pi − Pj)α(xl)](Pi − Pj) k2((xi − xl)/h2) k2((xj − xl)/h2) k1((Pi + Pj − 1)/h1) Ii Ij IS(xl),

Un1 = [1/(n(n−1)n h2^{2dx} h1)] Σ_{i≠j} Σ_{l=1}^{n} S^{−1}(xl) α(xl)(Pi − Pj)[(P̂i − P̂j) − (Pi − Pj)] k2((xi − xl)/h2) k2((xj − xl)/h2) k1((Pi + Pj − 1)/h1) Ii Ij IS(xl),

Un2 = [1/(n(n−1)n h2^{2dx} h1)] Σ_{i≠j} Σ_{l=1}^{n} S^{−1}(xl)[yi − yj − (Pi − Pj)α(xl)][(P̂i − P̂j) − (Pi − Pj)] k2((xi − xl)/h2) k2((xj − xl)/h2) k1((Pi + Pj − 1)/h1) Ii Ij IS(xl),

Un3 = [1/(n(n−1)n h2^{2dx} h1²)] Σ_{i≠j} Σ_{l=1}^{n} S^{−1}(xl)[yi − yj − (Pi − Pj)α(xl)](Pi − Pj) k2((xi − xl)/h2) k2((xj − xl)/h2) k1′((Pi + Pj − 1)/h1)[(P̂i + P̂j) − (Pi + Pj)] Ii Ij IS(xl),

and Un4 collects the remaining higher-order terms. Next, we can write Un0 as

Un0 = [1/(n(n−1)(n−2))] Σ_{i≠j≠l} ψijl + op(n^{−1/2})

where

ψijl = [1/(h2^{2dx} h1)] S^{−1}(xl)[yi − yj − (Pi − Pj)α(xl)](Pi − Pj) k2((xi − xl)/h2) k2((xj − xl)/h2) k1((Pi + Pj − 1)/h1) Ii Ij IS(xl).

By Lemma A.3 in Ahn and Powell (1993), if h2^{−2dx} h1^{−1} = o(n), we have

Un0 = E ψijl + (1/n) Σ_{i=1}^{n} (E[ψijl | ωi] − E ψijl) + (1/n) Σ_{j=1}^{n} (E[ψijl | ωj] − E ψijl) + (1/n) Σ_{l=1}^{n} (E[ψijl | ωl] − E ψijl) + op(n^{−1/2})

with ωi = (xi, wi, di, yi). With some tedious but straightforward calculation, under Assumptions 1–4 and 5′, we can show that

(1/n) Σ_{i=1}^{n} (E[ψijl | ωi] − E ψijl) + (1/n) Σ_{j=1}^{n} (E[ψijl | ωj] − E ψijl) = (1/n) Σ_{i=1}^{n} δ1i + op(n^{−1/2})

where

δ1i = ξi [(2Pi − 1)/S(xi)] fx(xi) f(xi, 1 − Pi) Ii I*(xi, 1 − Pi) IS(xi)

and

(1/n) Σ_{l=1}^{n} (E[ψijl | ωl] − E ψijl) = op(n^{−1/2}).

Now we consider Un1. Similar to Sn11ε in the proof of Lemma A.2, we can write Un1 as

Un1 = [1/(n(n−1)(n−2)(n−3))] Σ_{i≠j≠l≠m} ςijlm + op(n^{−1/2})

where

ςijlm = [1/(h2^{2dx} h1 h^{dw})] [(dm − Pi)α(xl)/(fw(wi) S(xl))] (Pi − Pj) k((wm − wi)/h) k1((Pi + Pj − 1)/h1) k2((xi − xl)/h2) k2((xj − xl)/h2) Ii Ij IS(xl).

Again, by applying the U-statistic projection techniques (Ahn and Powell, 1993), we can show that

Un1 = (1/n) Σ_{i=1}^{n} δ2i + op(n^{−1/2})

where

δ2i = (di − Pi)(2Pi − 1)/fw(wi) ∫ [α(x̃, xi*) f(x̃, wi) f(x̃, xi*, 1 − Pi) f(x̃, xi*)/S(x̃, xi*)] I(x̃, wi) I(x̃, xi*, 1 − Pi) IS(x̃, xi*) dx̃.

Following the analysis of Sn11ε(x) in the proof of Lemma A.2, we can show that

Un2 = Un3 = Op(h^l + h1^{l1} + h2^{l2}) + op(n^{−1/2}) = op(n^{−1/2})

and

Un4 = Op(πn² h1^{−3}) = op(n^{−1/2})

uniformly in x ∈ S, where the last equality follows from Assumption 5′. Based on the above results, we have

[Σ_{i=1}^{n} (α̂(xi) − α(xi)) IS(xi)] / [Σ_{i=1}^{n} IS(xi)] = [(1/n) Σ_{i=1}^{n} (δ1i − δ2i) IS(xi)] / [(1/n) Σ_{i=1}^{n} IS(xi)] + op(n^{−1/2}).

Together with

α̂S − αS = [Σ_{i=1}^{n} (α̂(xi) − α(xi)) IS(xi)] / [Σ_{i=1}^{n} IS(xi)] + [Σ_{i=1}^{n} (α(xi) − αS) IS(xi)] / [Σ_{i=1}^{n} IS(xi)]

we obtain

α̂S − αS = (1/n) Σ_{i=1}^{n} δi + op(n^{−1/2})

where δi = (E IS(xi))^{−1} (δ1i − δ2i + (α(xi) − αS)) IS(xi). Therefore, an application of the Lindeberg–Levy Central Limit Theorem yields

√n (α̂S − αS) →d N(0, V2)

where V2 = E δi².

Lemma A.2. Under Assumptions 1–5, Sn1ε(x) defined in (A.7) satisfies

Sn1ε(x) = (1/n) Σ_{i=1}^{n} 2ξi(2Pi − 1)(1/h2^{dx}) k2((xi − x)/h2) f(x, 1 − Pi) Ii I*(x, 1 − Pi) + op((n h2^{dx})^{−1/2}).

Proof of Lemma A.2. With a Taylor expansion, we can decompose Sn1ε(x) as

Sn1ε(x) = Sn10ε(x) + Sn11ε(x) + Sn12ε(x) + Rnε1(x)

where

Sn10ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} ξij (Pi − Pj) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij,

Sn11ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} ξij [(P̂i − P̂j) − (Pi − Pj)] k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij,

Sn12ε(x) = [1/(n(n−1) h2^{2dx} h1²)] Σ_{i≠j} ξij (Pi − Pj) k2((xi − x)/h2) k2((xj − x)/h2) k1′((Pi + Pj − 1)/h1)[(P̂i + P̂j) − (Pi + Pj)] Ii Ij,

and Rnε1(x) is the remainder term, which, similar to Snr(x), can be shown to be of order Op(h1^{−3} πn²). We consider the individual terms Sn10ε(x), Sn11ε(x) and Sn12ε(x) separately. Following the U-statistic projection results in Ahn and Powell (1993, Lemma A.3), when h2^{−2dx} h1^{−1} = o(n), we obtain

Sn10ε(x) = (1/n) Σ_{i=1}^{n} 2ξi (1/h2^{dx}) k2((xi − x)/h2) Ii ζi + op((n h2^{dx})^{−1/2})

where

ζi = E[(Pi − Pj)(1/(h2^{dx} h1)) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ij | Pi].

Let

Vn(x) = (1/n) Σ_{i=1}^{n} 2ξi [(2Pi − 1)(1/h2^{dx}) k2((xi − x)/h2) f(x, 1 − Pi) Ii I*(x, 1 − Pi) − (1/h2^{dx}) k2((xi − x)/h2) Ii ζi].

By some standard arguments in kernel regression, we can show that EVn(x) = 0 and Var(Vn(x)) = op((n h2^{dx})^{−1}). Hence,

Sn10ε(x) = (1/n) Σ_{i=1}^{n} 2ξi(2Pi − 1)(1/h2^{dx}) k2((xi − x)/h2) f(x, 1 − Pi) Ii I*(x, 1 − Pi) + op((n h2^{dx})^{−1/2}).

To deal with Sn11ε(x), we apply the asymptotic linearization result (A.3) and U-statistic projection techniques; using arguments similar to part (ii) of the proof of Theorem 3.1 in Ahn and Powell (1993), if h2^{−2dx} h1^{−1} h^{−dw} = o(n), then

Sn11ε(x) = [1/(n(n−1)n)] Σ_{i≠j} Σ_{l=1}^{n} q(ωi, ωj, ωl) + O(πn²)

where

q(ωi, ωj, ωl) = [2/(h2^{2dx} h1 h^{dw})] ξij [(dl − Pi)/fw(wi)] k((wl − wi)/h) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij.

Thus a simple mean and variance calculation shows that

(1/n) Σ_{i=1}^{n} E(q(ωi, ωj, ωl) | ωi) = Op(h^l (n h2^{dx})^{−1/2}) = op((n h2^{dx})^{−1/2}),
(1/n) Σ_{j=1}^{n} E(q(ωi, ωj, ωl) | ωj) = op((n h2^{dx})^{−1/2}),

while Σ_{l=1}^{n} E(q(ωi, ωj, ωl) | ωl) = 0. Therefore,

Sn11ε(x) = O(πn²) + Op(h^l (n h2^{dx})^{−1/2}) + op(n^{−1/2}) = op((n h2^{dx})^{−1/2}).

Then, similar to the analysis of Sn11ε(x), with h2^{−2dx} h1^{−3} h^{−dw} = o(n), we can show that Sn12ε(x) = op((n h2^{dx})^{−1/2}), which completes the proof.

Lemma A.3. Under Assumptions 1–5, Sn2ε(x) defined in (A.8) satisfies

Sn2ε(x) = b1(x) h2^{l2} + op((n h2^{dx})^{−1/2})

where

b1(x) = ∫ { Σ_{|j|=l2} (1/(j1!…jdx!)) (∂^j/∂x^j)[g(x) f(x, P) I*(x, P)] } (2P − 1) f(x, 1 − P) dP
      − ∫ { Σ_{|j|=l2} (1/(j1!…jdx!)) (∂^j/∂x^j)[g(x) f(x, 1 − P) I*(x, 1 − P)] } (2P − 1) f(x, P) dP.

Proof of Lemma A.3. Similar to Sn1ε(x), we write Sn2ε(x) as

Sn2ε(x) = Sn20ε(x) + Sn21ε(x) + Sn22ε(x) + Rnε2(x)

where Sn20ε(x), Sn21ε(x) and Sn22ε(x) are obtained from Sn10ε(x), Sn11ε(x) and Sn12ε(x) by replacing ξij with ϕij, and Rnε2(x) = Op(h1^{−3} πn²). Regarding Sn20ε(x), with a U-statistic projection (Lemma A.3, Ahn and Powell, 1993), given h2^{−2dx} h1^{−1} = o(n), we obtain

Sn20ε(x) = E ϕi1 + (1/n) Σ_{i=1}^{n} (ϕi1 − E ϕi1) + op(n^{−1/2})

where

ϕi1 = 2(1/h2^{dx}) k2((xi − x)/h2) Ii E[ϕij (Pi − Pj)(1/(h2^{dx} h1)) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ij | xi, Pi].

By Assumptions 3–5, it is straightforward to show that

E[(1/n) Σ_{i=1}^{n} (ϕi1 − E ϕi1)]² = (1/n) E(ϕi1²) = O(n^{−1} h2^{−dx} (h1^{l1} + h2^{l2}))

which then implies that

Sn20ε(x) = E ϕi1 + op((n h2^{dx})^{−1/2})
         = 2E[(1/(h2^{2dx} h1)) ϕij (Pi − Pj) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij] + op((n h2^{dx})^{−1/2})
         = b1(x) h2^{l2} + op((n h2^{dx})^{−1/2}).

Following the analysis of Sn11ε(x) and Sn12ε(x) in the proof of Lemma A.2, with h2^{−2dx} h1^{−3} h^{−dw} = o(n), we can also show that Sn21ε(x) = Sn22ε(x) = op((n h2^{dx})^{−1/2}). By combining the above results, we obtain Sn2ε(x) = b1(x) h2^{l2} + op((n h2^{dx})^{−1/2}).

Lemma A.4. Under Assumptions 1–5, Sn3ε(x) defined in (A.9) satisfies

Sn3ε(x) = b2(x) h2^{l2} + op((n h2^{dx})^{−1/2})

where

b2(x) = ∫ { Σ_{|j|=l2} (1/(j1!…jdx!)) (∂^j/∂x^j)[α(x) f(x, P) I*(x, P)] } P(2P − 1) f(x, 1 − P) I*(x, 1 − P) dP
      − ∫ { Σ_{|j|=l2} (1/(j1!…jdx!)) (∂^j/∂x^j)[α(x) f(x, 1 − P) I*(x, 1 − P)] } P(2P − 1) f(x, P) I*(x, P) dP.

Proof of Lemma A.4. Similar to Sn2ε(x), we can write Sn3ε(x) as

Sn3ε(x) = Sn30ε(x) + Sn31ε(x) + Sn32ε(x) + Rnε3(x)

where

Sn30ε(x) = [1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} [Pi(α(xi) − α(x)) − Pj(α(xj) − α(x))](Pi − Pj) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij,

Sn31ε(x) and Sn32ε(x) are obtained from Sn21ε(x) and Sn22ε(x) by replacing ϕij with Pi(α(xi) − α(x)) − Pj(α(xj) − α(x)), and Rnε3(x) = Op(h1^{−3} πn²). Then, similar to Sn20ε(x), we can show that

Sn30ε(x) = E{[Pi(α(xi) − α(x)) − Pj(α(xj) − α(x))](Pi − Pj)(1/(h2^{2dx} h1)) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij} + op((n h2^{dx})^{−1/2})
         = b2(x) h2^{l2} + op((n h2^{dx})^{−1/2}).

Also, similar to the analysis of Sn21ε(x) and Sn22ε(x), we can show that Sn31ε(x) = Sn32ε(x) = op((n h2^{dx})^{−1/2}). Therefore,

Sn3ε(x) = b2(x) h2^{l2} + op((n h2^{dx})^{−1/2}).

Lemma A.5. Under Assumptions 1–5, Sn4ε(x) defined in (A.10) satisfies

Sn4ε(x) = op((n h2^{dx})^{−1/2}).

Proof of Lemma A.5. By applying the mean-value theorem and the rate result (A.1), we can show that

Sn4ε(x) = [2/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} (P̂i − Pi)(P̂i − P̂j) k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1) Ii Ij + Op(πn³ h1^{−1})
        = Op(πn² + πn³ h1^{−1}) = op((n h2^{dx})^{−1/2})

where we have used the facts that

[1/(n(n−1) h2^{2dx} h1²)] Σ_{i≠j} |k2((xi − x)/h2) k2((xj − x)/h2) k1′((Pi + Pj − 1)/h1)| = Op(1)

and

[1/(n(n−1) h2^{2dx} h1)] Σ_{i≠j} |k2((xi − x)/h2) k2((xj − x)/h2) k1((Pi + Pj − 1)/h1)| Ii Ij = Op(1),

both of which can be established by following Lemma A.3 of Ahn and Powell (1993).

References

Aakvik, A., Heckman, J.J., Vytlacil, E.J., 1999. Semiparametric Program Evaluation: Lessons from an Evaluation of a Norwegian Training Program. Manuscript.
Abrevaya, J., Shin, Y., 2011. Rank estimation of partially linear index models. Econom. J. 14, 409–437.
Ahn, H., Powell, J.L., 1993. Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J. Econometrics 58, 3–29.
Angrist, J.D., 2004. Treatment effect heterogeneity in theory and practice. Econom. J. 114 (494), C52–C83.
Antonczyk, D., 2011. Using Social Norms to Estimate the Effect of Collective Bargaining on the Wage Structure. Manuscript. Albert-Ludwigs-University Freiburg and IZA.
Chen, S., 1999. Distribution-free estimation of the random coefficient dummy endogenous variable model. J. Econometrics 91, 171–199.
Chen, S., 2000. Efficient estimation of binary choice models under symmetry. J. Econometrics 96, 183–199.
Chen, S.H., Khan, S., 2010. Estimation of Causal Effects of Education on Wage Inequality Using IV Methods and Sample Selection Models. Manuscript.
Chen, S., Khan, S., Tang, X., 2015. Informational content of special regressors in heteroskedastic binary regressions. J. Econometrics 193, 162–182.
Chen, S., Zhou, Y.H., 2010. Semiparametric and nonparametric estimation of sample selection models under symmetry. J. Econometrics 157, 143–150.
Chernozhukov, V., Hansen, C., 2005. An IV model of quantile treatment effects. Econometrica 73, 245–261.
Das, M., Newey, W.K., Vella, F., 2003. Nonparametric estimation of sample selection models. Rev. Econom. Stud. 70, 33–58.
Dong, Y., Lewbel, A., 2011. Nonparametric identification of a binary random factor in cross section data. J. Econometrics 163 (2), 163–171.
Fan, Y., Gencay, R., 1995. A consistent nonparametric test of symmetry in linear regression models. J. Amer. Statist. Assoc. 90, 551–557.
Fan, Y., Wu, J., 2010. Partial identification of the distribution of treatment effects in switching regime models and its confidence sets. Rev. Econom. Stud. 77, 1002–1041.
Fang, Y., Li, Q., Yu, X., Zhang, D., 2015. A data-driven smooth test of symmetry. J. Econometrics 188, 490–501.
Heckman, J., 1990. Varieties of selection bias. Am. Econ. Rev. 80, 313–318.
Heckman, J., Ichimura, H., Smith, J., Todd, P., 1998. Characterizing selection bias using experimental data. Econometrica 66, 1017–1098.
Heckman, J.J., Navarro-Lozano, S., 2004. Using matching, instrumental variables, and control functions to estimate economic choice models. Rev. Econ. Stat. 86, 30–57.
Honoré, B., 1992. Trimmed LAD and least squares estimation of truncated and censored regression models with fixed effects. Econometrica 60, 533–565.
Lewbel, A., 2007. Endogenous selection or treatment model estimation. J. Econometrics 141, 777–806.
Mammen, E., Rothe, C., Schienle, M., 2012. Nonparametric regression with nonparametrically generated covariates. Ann. Statist. 40, 1132–1170.
Newey, W.K., 1988. Two Step Series Estimation of Sample Selection Models. Working Paper. MIT.
Newey, W.K., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics, vol. IV. Elsevier Science, Amsterdam (Chapter 36).
Powell, J.L., 1986. Symmetrically trimmed least squares estimation for tobit models. Econometrica 54, 1435–1460.
Powell, J.L., 1989. Semiparametric Estimation of Bivariate Latent Variable Models. Manuscript.
Sherman, R.P., 1994. U-processes in the analysis of a generalized semiparametric regression estimator. Econometric Theory 10, 372–395.

Please cite this article in press as: Chen S., et al., Nonparametric identification and estimation of sample selection models under symmetry. Journal of Econometrics (2017), https://doi.org/10.1016/j.jeconom.2017.09.004.