Partial identification in binary response models with nonignorable nonresponses




Economics Letters 121 (2013) 74–78

Contents lists available at ScienceDirect

Economics Letters journal homepage: www.elsevier.com/locate/ecolet

Tadao Hoshino∗,¹
Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro, Tokyo, Japan

Highlights

• We propose an estimation method for semiparametric binary response models with nonignorable nonresponses.
• The parameter of interest is partially identifiable without relying on restrictive distributional assumptions.
• Our estimation method, which is based on the special regressor approach, is easy to implement.
• The proposed estimator is consistent in the Hausdorff metric.

Article info

Article history: Received 18 May 2013; Received in revised form 11 July 2013; Accepted 13 July 2013; Available online 18 July 2013

Abstract

This study investigates the identification of parameters in semiparametric binary response models of the form y = 1(x′β + v + ε > 0) when there are nonignorable nonresponses. We propose an estimation procedure for the identified set, the set of parameters that are observationally indistinguishable from the true value β, based on the special regressor approach of Lewbel (2000). We show that the estimator for the identified set is consistent in the Hausdorff metric. © 2013 Elsevier B.V. All rights reserved.

JEL classification: C13, C14, C25

Keywords: Semiparametric binary response models; Nonignorable nonresponses; Special regressor approach; Partial identification

1. Introduction

In the field of survey and interview data analysis, nonresponses and missing data are common and often unavoidable. In many empirical studies, models are estimated using only the subsample of complete observations. Such results are valid under the missing-at-random (MAR) assumption, which implies that the missing data are ignorable. However, as Manski (2003), for example, points out, this assumption is untestable, so nonresponses are in general nonignorable. Indeed, the MAR assumption fails in many empirical settings, and estimates that rely on it are then biased. In this paper, we consider the estimation of binary response models with nonignorable missing response data without relying on the MAR assumption. Consider the following binary response model:

$$y = \mathbf{1}(x'\beta + v + \varepsilon > 0), \tag{1}$$

where y is a binary outcome, x is a k × 1 vector of observed regressors, β is a k × 1 vector of parameters to be estimated, ε is an unobserved error term, and v is a scalar random variable whose coefficient is normalized to 1 for identification. This paper extends (1) to the case in which y is unobservable to the econometrician with a probability that is positive but less than 1. When an individual's response is not observed, it is in general impossible to infer whether the "potential" response is 1 or 0. In other words, all we can "observe" is the random set Y defined by

$$Y = \begin{cases} \{y\} & \text{if } d = 1,\\ \{1, 0\} & \text{if } d = 0. \end{cases} \tag{2}$$

In the equation above, d is an indicator of whether y is observed. A formal definition of a random set is given in the next section.

∗ Tel.: +81 0 357342651.
¹ Research Fellow (PD), Japan Society for the Promotion of Science.

0165-1765/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.econlet.2013.07.009


When the MAR assumption is not credible, an alternative often used in the literature is a sample selection model (Heckman, 1979). To use such a model, we need to introduce additional structural and distributional assumptions on the relationship between the observability d and its explanatory variables. The parameter estimates are generally inconsistent when these assumptions are not met, and a correct model specification is often very difficult to obtain. Thus, in this paper, we do not impose such structural and distributional assumptions on the observability of response data. In addition, we do not assume any parametric form for the distribution of the error term ε. Under this setup, this paper considers estimating the set of all observationally equivalent values of the parameter β. We call this set the identified set and denote it by Θβ. For a semiparametric binary response model, a straightforward estimator of the identified set Θβ would be based on maximum likelihood. However, the maximum likelihood approach is often problematic, both because of its computational burden and because it requires ε to be independent of (x, v). To overcome these problems, this paper suggests using the method proposed by Lewbel (2000). If there are no missing responses in the population, i.e., P(d = 0) = 0, and v can be used as a special regressor, Lewbel (2000) shows that β can be estimated by a linear regression of [y − 1(v > 0)]/f(v|x) onto x, i.e.,

$$\beta = E(xx')^{-1} E\left[ x\, \frac{y - \mathbf{1}(v > 0)}{f(v \mid x)} \right]. \tag{3}$$
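As a rough illustration of (3) when every response is observed, the sketch below simulates a design in which f(v|x) is known and regresses [y − 1(v > 0)]/f(v|x) on x by least squares. The design (uniform x and ε, v ~ N(0, 2²) independent of (x, ε)) and the helper names `normal_pdf` and `special_regressor_ols` are assumptions made for this example only; in applications f(v|x) has to be estimated.

```python
import numpy as np


def normal_pdf(u, sd):
    """Density of N(0, sd^2) evaluated at u."""
    return np.exp(-0.5 * (u / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def special_regressor_ols(y, v, X, f_v):
    """Sample analogue of (3): OLS of [y - 1(v > 0)] / f(v|x) on x.

    y   : (n,) binary outcomes, all observed
    v   : (n,) special regressor, coefficient normalized to +1
    X   : (n, k) regressors (here the first column is an intercept)
    f_v : (n,) conditional density f(v_i | x_i) at the data points
    """
    y_tilde = (y - (v > 0)) / f_v
    beta_hat, *_ = np.linalg.lstsq(X, y_tilde, rcond=None)
    return beta_hat


# Hypothetical design: bounded x and eps, v ~ N(0, 2^2) independent of (x, eps),
# so f(v|x) = normal_pdf(v, 2.0) is known and supp(-x'beta - eps) lies inside supp(v).
rng = np.random.default_rng(0)
n = 200_000
beta = np.array([0.5, 1.0])
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
eps = rng.uniform(-1.0, 1.0, n)          # mean zero and independent of x, so E(x eps) = 0
v = rng.normal(0.0, 2.0, n)
y = (X @ beta + v + eps > 0).astype(float)

beta_hat = special_regressor_ols(y, v, X, normal_pdf(v, 2.0))
print(beta_hat)  # should be close to [0.5, 1.0]
```

Because x′β + ε is bounded here, 1/f(v|x) stays bounded on the range where y − 1(v > 0) is nonzero, which keeps the variance of the regressand finite and sidesteps the irregularity discussed in Section 3.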

At the cost of assuming the existence of a special regressor, and unlike other estimators of semiparametric binary response models such as those proposed by Klein and Spady (1993) and Ichimura (1993), the estimator in (3) does not require restrictive conditions on the error term such as statistical independence or single-index sufficiency. In addition, the special regressor approach makes computing the identified set Θβ much simpler than the maximum likelihood approach.²

The remainder of this paper is organized as follows. In Sections 2 and 3, we describe the estimation and inference procedures for our model, respectively. In Section 4, we propose a method to determine the sign of the coefficient on the special regressor when it is not known a priori. In Section 5, we introduce an assumption called the stigma-affecting response, which is reasonable in some empirical situations and can tighten the bound. Finally, Section 6 concludes.

2. Consistent estimation of the identified set

First, let us introduce the following assumptions.

Assumption A.
1. The conditional distribution of v given x is absolutely continuous with respect to the Lebesgue measure, with nondegenerate Radon–Nikodym conditional density f(v|x).
2. The conditional distribution of ε given x is independent of v for all (v, x) ∈ supp(v, x).
3. (a) The conditional distribution of v given x has support [L, U] for some constants L and U such that −∞ ≤ L < 0 < U ≤ ∞; and (b) supp(−x′β − ε) ⊆ [L, U].
4. (a) E(xε) = 0; and (b) E(xx′) exists and is nonsingular.

Assumptions A1–A3 characterize the special regressor v. The conditional independence condition in A2 is much weaker than full statistical independence. The large support condition in A3 implies that the probability of observing y = 1 approaches 0 (1) as v becomes sufficiently small (large). This condition can be relaxed by replacing it with the tail symmetry condition (for details, see Magnac and Maurin, 2007). Assumption A4(a) excludes the case where x is endogenous; it can be relaxed if suitable instrumental variables z with E(zε) = 0 are available.

Now, we consider the partial identification of the parameters. Let us introduce some terms and definitions from random set theory (for comprehensive reviews, see, e.g., Li et al., 2010; Molchanov, 2005). Random set theory provides very useful tools for analyzing a certain class of partially identified models, as in Beresteanu and Molinari (2008) and Beresteanu et al. (2012). Let (Ω, A, µ) be a probability space. Throughout this paper, we assume that the probability space is nonatomic. Let K(Rᵏ) be the family of all nonempty closed subsets of Rᵏ.

Definition 1 (Random Set). A set-valued mapping F : Ω → K(Rᵏ) is called a random set if, for each open subset O of Rᵏ, F⁻¹(O) := {ω ∈ Ω : F(ω) ∩ O ≠ ∅} ∈ A.³

Definition 2 (Selection). A function f : Ω → Rᵏ is called a selection of a random set F : Ω → K(Rᵏ) if f(ω) ∈ F(ω) for all ω ∈ Ω. Let S(F) denote the set of selections of F that belong to L¹[Ω; Rᵏ], where L¹[Ω; Rᵏ] is the space of measurable functions f : Ω → Rᵏ such that ∫_Ω |f| dµ is finite, i.e., S(F) := {f ∈ L¹[Ω; Rᵏ] : f(ω) ∈ F(ω) for all ω ∈ Ω}.

Definition 3 (Aumann Integral of a Random Set). For each random set F, the Aumann integral of F, denoted by E(F), is defined by E(F) = {∫_Ω f dµ : f ∈ S(F)}.

Assumption B.
1. The random variables (Y, v, x) are defined on a nonatomic probability space (Ω, A, µ).
2. Any y∗ ∈ S(Y) is admissible for the true y.
3. Let

$$y_U = \begin{cases} y & \text{if } d = 1,\\ 1 & \text{if } d = 0, \end{cases} \qquad y_L = \begin{cases} y & \text{if } d = 1,\\ 0 & \text{if } d = 0. \end{cases}$$

The variables x_j (y_U − 1(v > 0))/f(v|x) and x_j (y_L − 1(v > 0))/f(v|x), j = 1, …, k, are random variables in L¹[Ω; R].

The nonatomicity Assumption B1 is introduced to simplify the estimation of the identified set, and it is not too restrictive because the appropriate probability space for a sequence of i.i.d. random elements is nonatomic (see Beresteanu et al., 2012). Assumption B2 excludes, for example, the case in which supp(−x′β − ε) is known to researchers. If it were known and the observed value of v were larger (smaller) than its upper (lower) boundary, we could set y to 1 (0) regardless of whether y is observed, yielding a smaller admissible set than S(Y). Define

$$G(\omega) := x(\omega)\, Y^*(\omega), \qquad \text{where } Y^*(\omega) := \left[ \frac{Y - \mathbf{1}(v > 0)}{f(v \mid x)} \right](\omega).$$
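Spelling out G for a single draw may help. The derivation below only unpacks the definitions of Y, y_U, y_L and Y∗ given above: G is a singleton for a respondent and a two-point set for a nonrespondent, so co G is a line segment in the direction of x whose endpoints use y_L and y_U (for d = 1 the two endpoints coincide, since y_L = y_U = y).

```latex
Y^*(\omega) =
\begin{cases}
\left\{ \dfrac{y - \mathbf{1}(v > 0)}{f(v \mid x)} \right\} & \text{if } d = 1,\\[2ex]
\left\{ \dfrac{-\mathbf{1}(v > 0)}{f(v \mid x)},\ \dfrac{1 - \mathbf{1}(v > 0)}{f(v \mid x)} \right\} & \text{if } d = 0,
\end{cases}
\qquad \Longrightarrow \qquad
\operatorname{co} G(\omega) = \left\{ x\, t \;:\; \frac{y_L - \mathbf{1}(v > 0)}{f(v \mid x)} \le t \le \frac{y_U - \mathbf{1}(v > 0)}{f(v \mid x)} \right\}(\omega).
```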

Now, we characterize the population identified set Θβ as follows.

Proposition 1. Suppose that Assumptions A and B hold. Then, the identified set for β is given by

$$\Theta_\beta = E(xx')^{-1} E(G). \tag{4}$$

Further, the set in (4) is equal to

$$E(xx')^{-1} E(\operatorname{co} G), \tag{5}$$

where, for a set A, co A denotes the convex hull of A.

² This paper is not the first to investigate identification and estimation in incomplete binary response models based on the special regressor approach. Magnac and Maurin (2008) consider a binary response model in which the special regressor v is either discrete or measured within intervals.
³ In general, we can consider a mapping F : Ω → K(X) with X a general metric space. For the purpose of this study, it suffices to consider the case X = Rᵏ.

Proof. First, summarizing all the available information and using the law of iterated expectations, when y is perfectly observed, β is characterized by the set

$$\left\{ b \in \mathbb{R}^k : E\left[ x \left( \frac{y - \mathbf{1}(v > 0)}{f(v \mid x)} - x'b \right) \right] = 0 \right\},$$

whose element is unique in Rᵏ under Assumption A4(b). Therefore, for a given y∗ ∈ S(Y) and some b∗ ∈ Rᵏ, if E[x((y∗ − 1(v > 0))/f(v|x) − x′b∗)] = 0 holds, b∗ is one of the observationally equivalent values of the parameter β. Hence, the population identified set Θβ is characterized by

$$\Theta_\beta = \left\{ b \in \mathbb{R}^k : b = E(xx')^{-1} E\left[ x\, \frac{y^* - \mathbf{1}(v > 0)}{f(v \mid x)} \right],\ y^* \in S(Y) \right\} = E(xx')^{-1} E(G),$$

where the last equality follows from Assumption B3. Using the nonatomicity of the probability space, the second result follows from the convexification theorem (see Theorem 2.1.5 in Li et al., 2010; Theorem 2.1.15 in Molchanov, 2005). □

Define, for i = 1, …, n,

$$G_i := x_i Y_i^*, \qquad \text{where } Y_i^* := \frac{Y_i - \mathbf{1}(v_i > 0)}{f(v_i \mid x_i)}.$$

A direct sample analogue estimator of Θβ in (4), $\left(n^{-1}\sum_{i=1}^n x_i x_i'\right)^{-1} n^{-1}\sum_{i=1}^n G_i$, involves a Minkowski sum that requires $2^{\sum_{i=1}^n (1 - d_i)}$ least-squares computations, which is intractable if the number of nonresponses is large. Therefore, the result (5) is important in practice because it implies that we can estimate Θβ by $\left(n^{-1}\sum_{i=1}^n x_i x_i'\right)^{-1} n^{-1}\sum_{i=1}^n \operatorname{co} G_i$. Because f(v|x) is unknown, let Ĝi denote the random set obtained by replacing f(v|x) in Gi with an estimate f̂n(vi|xi); we define our estimator of Θβ as

$$\hat\Theta_\beta := \left( \frac{1}{n} \sum_{i=1}^n x_i x_i' \right)^{-1} \frac{1}{n} \sum_{i=1}^n \operatorname{co} \hat G_i.$$

Under the assumptions made, the set in (5) is compact and convex in Rᵏ. Then, we can sharply characterize the set through its support function:

$$\Theta_\beta = \{ b \in \mathbb{R}^k : \langle q, b \rangle \le s(q, \Theta_\beta) \text{ for all } q \in \mathbb{S} \}, \tag{6}$$

where S is the unit sphere of Rᵏ, i.e., S = {q ∈ Rᵏ : ‖q‖ = 1}, and s(·, A) is the support function of A, defined by s(q, A) := sup_{a∈A} ⟨q, a⟩, q ∈ S. Thus, the problem of estimating Θβ reduces to estimating its support function, which can be calculated easily as follows. Let

$$x_q = q' E(xx')^{-1} x, \qquad w_q = \mathbf{1}(x_q > 0) \cdot \frac{y_U - \mathbf{1}(v > 0)}{f(v \mid x)} + \mathbf{1}(x_q \le 0) \cdot \frac{y_L - \mathbf{1}(v > 0)}{f(v \mid x)}.$$

Then, similarly to Proposition 2 in Bontemps et al. (2012), we have s(q, Θβ) = E(x_q w_q), and β(q) = E(xx′)⁻¹E(x w_q) is a frontier point of Θβ in the direction q. Analogously, we have $s(q, \hat\Theta_\beta) = n^{-1}\sum_{i=1}^n x_{qi} \hat w_{qi}$, and $\hat\beta(q) = \left(n^{-1}\sum_{i=1}^n x_i x_i'\right)^{-1} n^{-1}\sum_{i=1}^n x_i \hat w_{qi}$ is a frontier point of Θ̂β in the direction q, where

$$x_{qi} = q' \left( \frac{1}{n} \sum_{j=1}^n x_j x_j' \right)^{-1} x_i, \qquad \hat w_{qi} = \mathbf{1}(x_{qi} > 0) \cdot \frac{y_{U,i} - \mathbf{1}(v_i > 0)}{\hat f_n(v_i \mid x_i)} + \mathbf{1}(x_{qi} \le 0) \cdot \frac{y_{L,i} - \mathbf{1}(v_i > 0)}{\hat f_n(v_i \mid x_i)}.$$

In the following, we prove that H(Θ̂β, Θβ) goes to zero in probability as n goes to infinity. Let H(A, B) be the Hausdorff distance between sets A and B with the Euclidean norm, i.e.,

$$H(A, B) = \max\left\{ \sup_{a \in A} \inf_{b \in B} \| a - b \|,\ \sup_{b \in B} \inf_{a \in A} \| a - b \| \right\}.$$

Assumption C.
1. The observations {(Y_i, v_i, x_i)}, i = 1, …, n, are i.i.d. random variables from the joint distribution of (Y, v, x).⁴
2. For ℓ = U, L, as n → ∞,

$$\left\| \frac{1}{n}\sum_{i=1}^n x_i \frac{y_{\ell,i} - \mathbf{1}(v_i > 0)}{\hat f_n(v_i \mid x_i)} - \frac{1}{n}\sum_{i=1}^n x_i \frac{y_{\ell,i} - \mathbf{1}(v_i > 0)}{f(v_i \mid x_i)} \right\| \xrightarrow{p} 0.$$

3. There exists a constant C such that $\left\| \left( n^{-1}\sum_{i=1}^n x_i x_i' \right)^{-1} \right\| < C$ for sufficiently large n.

⁴ For the definition of the distribution of a set-valued random variable, see, for example, Section 3.1.1 of Li et al. (2010).

In practice, one choice of the conditional density estimator f̂n(v|x) for Assumption C2 is a kernel estimator of the joint density of v and x divided by a kernel estimator of the density of x. Assumption C3 is introduced only to simplify the proof and can be dropped (see Proposition 8 in Bontemps et al., 2012). Now, we have the following result.

Proposition 2. If Assumptions A–C hold, then H(Θ̂β, Θβ) →p 0 as n → ∞.

Proof. Θβ is compact and convex in Rᵏ by the expression in (5). Under C3, Θ̂β is also compact and convex in Rᵏ for sufficiently large n. Then, because the space of compact convex sets with the Hausdorff distance is a metric space (see Theorem 1.1.2 in Li et al., 2010), by the triangle inequality,

$$H(\hat\Theta_\beta, \Theta_\beta) \le H(\hat\Theta_\beta, \tilde\Theta_\beta) + H(\tilde\Theta_\beta, \Theta_\beta),$$

where $\tilde\Theta_\beta := \left(n^{-1}\sum_{i=1}^n x_i x_i'\right)^{-1} n^{-1}\sum_{i=1}^n \operatorname{co} G_i$. For the second term on the right-hand side, H(Θ̃β, Θβ) →p 0 follows from Lemma A.6 in Beresteanu and Molinari (2008). Let w_qi be a random variable defined in the same way as ŵ_qi, except that the true conditional density f(vi|xi) is used in place of f̂n(vi|xi). For the first term, noting that H(A, B) = sup_{q∈S} |s(q, A) − s(q, B)| for compact convex sets A and B, we have

$$\begin{aligned}
H(\hat\Theta_\beta, \tilde\Theta_\beta)
&= \sup_{q \in \mathbb{S}} \left| q' \Big( \frac{1}{n}\sum_{i=1}^n x_i x_i' \Big)^{-1} \frac{1}{n}\sum_{i=1}^n x_i \hat w_{qi} - q' \Big( \frac{1}{n}\sum_{i=1}^n x_i x_i' \Big)^{-1} \frac{1}{n}\sum_{i=1}^n x_i w_{qi} \right| \\
&\le C \sup_{q \in \mathbb{S}} \left\| \frac{1}{n}\sum_{i=1}^n x_i \hat w_{qi} - \frac{1}{n}\sum_{i=1}^n x_i w_{qi} \right\| \\
&\le C \left\| \frac{1}{n}\sum_{i=1}^n x_i \frac{y_{U,i} - \mathbf{1}(v_i > 0)}{\hat f_n(v_i \mid x_i)} - \frac{1}{n}\sum_{i=1}^n x_i \frac{y_{U,i} - \mathbf{1}(v_i > 0)}{f(v_i \mid x_i)} \right\| \\
&\quad + C \left\| \frac{1}{n}\sum_{i=1}^n x_i \frac{y_{L,i} - \mathbf{1}(v_i > 0)}{\hat f_n(v_i \mid x_i)} - \frac{1}{n}\sum_{i=1}^n x_i \frac{y_{L,i} - \mathbf{1}(v_i > 0)}{f(v_i \mid x_i)} \right\| \xrightarrow{p} 0
\end{aligned}$$

as n → ∞, by Assumption C2. □

3. Statistical inference

Statistical inference in special-regressor-based approaches raises an important issue. As shown in Khan and Tamer (2010), estimation of β based on the special regressor is often irregular when v given x has finite variance. In this case, the estimator is not root-n consistent, and the semiparametric efficiency bound for β is not finite. Based on Lewbel (2000), Magnac and Maurin (2007) and Khan and Tamer (2010), root-n-consistent estimation requires that the support of v strictly contain supp(−x′β − ε), that the distribution of v have sufficiently thick tails, or that the tail symmetry condition of Magnac and Maurin (2007) hold. In the following, we assume that at least one of these conditions is met. In the case where y is completely observed, Lewbel (2000) showed that

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, V), \tag{7}$$

where β̂ is the sample analogue estimator of (3) with a kernel density estimator f̂n(v|x) in place of f(v|x), and

$$V := E(xx')^{-1} E\left[ x \big( \tilde y - E(\tilde y \mid v, x) + E(\tilde y \mid x) - x'\beta \big)^2 x' \right] E(xx')^{-1},$$

with ỹ = [y − 1(v > 0)]/f(v|x).⁵ To extend the result (7) to the case where y is only incompletely observed, suppose that the identified set has no exposed faces. Then, we would have asymptotic normality of the estimator for the frontier point of the identified set in a given direction q ∈ S,

$$\sqrt{n}\big(\hat\beta(q) - \beta(q)\big) \xrightarrow{d} N(0, V(q)), \tag{8}$$

where V(q) := E(xx′)⁻¹E[x(w_q − E(w_q|v, x) + E(w_q|x) − x′β(q))² x′]E(xx′)⁻¹; this further implies asymptotic normality of the estimator for the support function in a given direction q ∈ S,

$$\sqrt{n}\big(s(q, \hat\Theta_\beta) - s(q, \Theta_\beta)\big) \xrightarrow{d} N(0, q' V(q) q). \tag{9}$$

Now, based on the result (9), the inference method proposed by Bontemps et al. (2012), which relies on the asymptotic behavior of the support function, is applicable to our model. Proposition 11 of Bontemps et al. (2012) demonstrates how to construct a set CR_n that satisfies lim_{n→∞} inf_{b∈Θβ} P(b ∈ CR_n) = 1 − α. Because the true β is necessarily a member of the identified set, lim_{n→∞} P(β ∈ CR_n) ≥ 1 − α holds. An important requirement for this inference procedure is that the support function be differentiable everywhere in q ∈ S, since differentiability ensures that the identified set has no exposed faces. When the identified set has exposed faces, the stochastic process on the left-hand side of (9) does not converge in distribution, uniformly in q ∈ S, to a Gaussian process. As shown in Lemma 3 of Bontemps et al. (2012), the identified set has exposed faces if x includes discrete regressors. To address this issue, Chandrasekhar et al. (2012) have recently proposed a data-jittering technique that adds an arbitrarily small amount of smoothly distributed noise to the discrete regressors to ensure differentiability.

⁵ Furthermore, Magnac and Maurin (2007) showed that special-regressor-based estimation of β with a nonparametric estimator f̂n(v|x) is more efficient than estimation with the true conditional density f(v|x), and that β̂ in (7) is indeed semiparametrically efficient, i.e., V is the efficiency bound.

4. Sign of the coefficient on the special regressor

Without loss of generality, our model can be written as follows:

$$Y = \begin{cases} \{y\} & \text{if } d = 1,\\ \{1, 0\} & \text{if } d = 0, \end{cases} \qquad \text{where } y = \mathbf{1}(\delta v + x'\beta + \varepsilon > 0),$$

with δ = −1 or +1. In the previous sections, it was assumed that the sign of δ was known a priori. As suggested by Lewbel (2000), when y is perfectly observed, one way to determine the sign of δ is to assign it the sign of the estimated density-weighted average derivative E[f(v, x)∂E(y|v, x)/∂v], where f(v, x) is the joint density of (v, x). It is easy to see that E[f(v, x)∂E(y|v, x)/∂v] is proportional to δ. As in Lemma 2.1 of Powell et al. (1989), assuming that f(v, x) = 0 for all (v, x) ∈ ∂supp(v, x), where ∂supp(v, x) denotes the boundary of supp(v, x), integration by parts gives E[f(v, x)∂E(y|v, x)/∂v] = −2E[y ∂f(v, x)/∂v]. This result suggests that the true δ must be included in a set proportional to

$$\Theta_\delta := -2E(J), \qquad \text{where } J(\omega) = \left[ \frac{\partial f(v, x)}{\partial v} \right](\omega)\, Y(\omega). \tag{10}$$

Under the nonatomicity of the probability space, Θδ = −2E(co J), which is the closed interval

$$\left[ -2E\left( h^+ \frac{\partial f(v, x)}{\partial v} \right),\ -2E\left( h^- \frac{\partial f(v, x)}{\partial v} \right) \right], \tag{11}$$

where

$$h^- = \mathbf{1}\left( \frac{\partial f(v, x)}{\partial v} > 0 \right) y_L + \mathbf{1}\left( \frac{\partial f(v, x)}{\partial v} \le 0 \right) y_U, \qquad h^+ = \mathbf{1}\left( \frac{\partial f(v, x)}{\partial v} \le 0 \right) y_L + \mathbf{1}\left( \frac{\partial f(v, x)}{\partial v} > 0 \right) y_U.$$

By construction, E[h⁻ ∂f(v, x)/∂v] ≤ E[y ∂f(v, x)/∂v] ≤ E[h⁺ ∂f(v, x)/∂v] for the true y, which implies that δ = −1 if the upper bound, −2E[h⁻ ∂f(v, x)/∂v], is negative, and δ = +1 if the lower bound, −2E[h⁺ ∂f(v, x)/∂v], is positive. When the number of nonresponses is large, it may often be the case that the maximum of Θδ is positive but the minimum is negative. In such a case, we cannot determine whether δ = −1 or +1 from the data alone. In some cases, economic theory may suggest the sign of δ. For example, when y is a binary indicator for the purchase of a good and v is the price of the good, δ must be negative unless the good is a Giffen good.

5. Responses affected by stigma

Consider survey questions about experience with socially unacceptable behavior. For example, y = 1 if the respondent has a history of drug abuse. If hesitation and reluctance to admit to such experiences are the main reasons for nonresponse, it can be credible to assume that the probability that nonrespondents engaged in such activities is at least as high as the probability that respondents did. Pepper (2001) and Molinari (2010) consider the same assumption. Namely, the stigma-affecting response (SAR) assumption states that P[y = 1|d = 0] ≥ P[y = 1|d = 1]. When the SAR holds, the resulting identified set is given by

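$$\Theta_\beta = \left\{ b \in \mathbb{R}^k : b = E(xx')^{-1} E\left[ x\, \frac{y^* - \mathbf{1}(v > 0)}{f(v \mid x)} \right],\ y^* \in S(Y) \text{ s.t. } P[y^* = 1 \mid d = 0] \ge P[y^* = 1 \mid d = 1] \right\}. \tag{12}$$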
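A small sketch, under assumptions of our own, of how the SAR constraint in (12) changes the sample support function in one direction q. In a sample, the constraint requires at least c_n nonrespondents with y∗_i = 1, where c_n is the smallest integer not below (number of nonrespondents) × (share of y = 1 among respondents). The nonrespondents' contribution n⁻¹Σ_{i: d_i = 0} x_qi y∗_i / f̂n(v_i|x_i) must then be maximized subject to Σ_{i: d_i = 0} y∗_i ≥ c_n, which a sort-and-fill rule solves exactly; the code compares that rule with brute-force enumeration. The inputs x_qi and f̂n(v_i|x_i) are random stand-ins, and all function names are hypothetical.

```python
import itertools
import numpy as np


def sar_threshold(y_obs, d):
    """c_n: smallest integer >= n0 * (sum_i d_i y_i) / (sum_i d_i), in exact integer arithmetic."""
    n0 = int(np.sum(d == 0))
    n1 = int(np.sum(d == 1))
    k1 = int(np.sum(y_obs[d == 1]))              # respondents reporting y = 1
    return -(-n0 * k1 // n1)                     # ceiling division


def sar_nonrespondent_term(x_q, f_v, d, c_n):
    """Sort-and-fill value of max n^{-1} sum_{d_i=0} x_qi y*_i / f_i  s.t.  sum_{d_i=0} y*_i >= c_n."""
    r = x_q[d == 0] / f_v[d == 0]                # ratios for nonrespondents
    total = r[r > 0].sum()                       # unconstrained optimum: y* = 1 iff ratio > 0
    short = c_n - int(np.sum(r > 0))             # c_n - n_+(q)
    if short > 0:
        nonpos = np.sort(r[r <= 0])[::-1]        # 0 >= r_(1) >= r_(2) >= ...
        total += nonpos[:short].sum()            # least costly extra ones
    return total / x_q.size


def sar_nonrespondent_term_brute(x_q, f_v, d, c_n):
    """Same quantity by enumerating all fillings of the missing outcomes (tiny samples only)."""
    r = x_q[d == 0] / f_v[d == 0]
    best = max(float(np.dot(bits, r))
               for bits in itertools.product([0, 1], repeat=r.size)
               if sum(bits) >= c_n)
    return best / x_q.size


rng = np.random.default_rng(2)
checks = []
for _ in range(20):
    n = 12
    d = (rng.uniform(size=n) < 0.5).astype(int)
    d[0], d[1] = 1, 0                            # at least one respondent and one nonrespondent
    y_obs = np.where(d == 1, (rng.uniform(size=n) < 0.8).astype(float), 0.0)
    x_q = rng.normal(-0.5, 1.0, n)               # stand-in for x_qi in some direction q
    f_v = rng.uniform(0.1, 0.5, n)               # stand-in for f_hat_n(v_i | x_i)
    c_n = sar_threshold(y_obs, d)
    checks.append(bool(np.isclose(sar_nonrespondent_term(x_q, f_v, d, c_n),
                                  sar_nonrespondent_term_brute(x_q, f_v, d, c_n))))
print(all(checks))  # True
```

Exact integer arithmetic is used for c_n because a floating-point ratio that should equal an integer can land just above it and round up to the next one; when n0·k1/n1 is already an integer, the constraint is met with equality, so c_n equals that integer.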

The set presented above is contained in the set in (4). In practice, the identified set under the SAR can be calculated as follows. Let ⌈a⌉ denote the smallest integer greater than or equal to a, and let c_n = ⌈a⌉, where a is defined by

$$\frac{a}{\sum_{i=1}^n (1 - d_i)} = \frac{\sum_{i=1}^n d_i y_i}{\sum_{i=1}^n d_i}.$$

That is, c_n is the smallest number of nonrespondents with potential outcome 1 that is compatible with the sample analogue of the SAR. Further, let n⁺(q) be the size of the subsample {i : d_i = 0, x_qi > 0}. For a given q ∈ S, if n⁺(q) is greater than or equal to c_n, the SAR does not tighten the bound in the direction q. If n⁺(q) is smaller than c_n, we need to select c_n − n⁺(q) observations from the subsample {i : d_i = 0, x_qi ≤ 0} and set their potential outcomes to 1. Note that, for a given direction q, we have

$$s(q, \hat\Theta_\beta) = -\frac{1}{n}\sum_{i=1}^n \frac{x_{qi} \mathbf{1}(v_i > 0)}{\hat f_n(v_i \mid x_i)} + \frac{1}{n}\sum_{i:\, d_i = 1} \frac{x_{qi} y_i}{\hat f_n(v_i \mid x_i)} + \frac{1}{n}\sum_{i:\, d_i = 0} \max_{y_i^* \in \{0, 1\}} \frac{x_{qi} y_i^*}{\hat f_n(v_i \mid x_i)}. \tag{13}$$

Without the SAR, the subsample {i : d_i = 0, x_qi ≤ 0} does not contribute to the third term on the right-hand side of (13), since $n^{-1}\sum_{i:\, d_i = 0} \max_{y_i^* \in \{0,1\}} x_{qi} y_i^*/\hat f_n(v_i \mid x_i) = n^{-1}\sum_{i:\, d_i = 0,\, x_{qi} > 0} x_{qi}/\hat f_n(v_i \mid x_i)$. Thus, we modify this term to satisfy the SAR condition. To do so, suppose that the observations in the subsample {i : d_i = 0, x_qi ≤ 0} are ordered by the magnitude of the ratios x_qi/f̂n(vi|xi), so that 0 ≥ x_q1/f̂n(v1|x1) ≥ x_q2/f̂n(v2|x2) ≥ ···. Then, the estimator of the support function for (12) is obtained by replacing the third term on the right-hand side of (13) with

$$\frac{1}{n}\sum_{i:\, d_i = 0,\, x_{qi} > 0} \frac{x_{qi}}{\hat f_n(v_i \mid x_i)} + \frac{1}{n}\sum_{i = 1}^{c_n - n^+(q)} \frac{x_{qi}}{\hat f_n(v_i \mid x_i)},$$

where the second sum runs over the first c_n − n⁺(q) observations of the ordered subsample {i : d_i = 0, x_qi ≤ 0} and is present only when n⁺(q) < c_n.

6. Conclusion

In this paper, we have investigated the partial identification of semiparametric binary response models with nonignorable nonresponses. We showed that, by using the special regressor approach of Lewbel (2000), the identified set of the model parameters can be easily estimated. We also showed that the estimated identified set is consistent in the Hausdorff metric under some regularity conditions. Several research questions remain open. First, the validity of the statistical inference method proposed in Section 3 has not yet been formally proved. Second, the approach to checking the sign of the coefficient on the special regressor proposed in Section 4 is an informal test. We conjecture that a confidence interval for δ can be constructed by applying the method of Imbens and Manski (2004) to the result (11). Third, the asymptotic properties, including consistency, of the estimator for the identified set under the SAR proposed in Section 5 remain unknown. These are topics for future research. In addition, we did not consider the case in which there are unobservable or only partially observable explanatory variables. How identified sets can be constructed for binary response models in which neither the response variable nor the explanatory variables can be completely observed would be worth exploring.

Acknowledgments

I wish to thank the anonymous referee for very helpful comments. This study is financially supported by a Grant-in-Aid for Scientific Research (PD247943). Any remaining errors are my own.

References

Beresteanu, A., Molchanov, I., Molinari, F., 2012. Partial identification using random set theory. Journal of Econometrics 166, 17–32.
Beresteanu, A., Molinari, F., 2008. Asymptotic properties for a class of partially identified models. Econometrica 76, 763–814.
Bontemps, C., Magnac, T., Maurin, E., 2012. Set identified linear models. Econometrica 80, 1129–1155.
Chandrasekhar, A., Chernozhukov, V., Molinari, F., Schrimpf, P., 2012. Inference for best linear approximations to set identified functions. Working Paper.
Heckman, J.J., 1979. Sample selection bias as a specification error. Econometrica 47, 153–161.
Ichimura, H., 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58, 71–120.
Imbens, G.W., Manski, C.F., 2004. Confidence intervals for partially identified parameters. Econometrica 72, 1845–1857.
Khan, S., Tamer, E., 2010. Irregular identification, support conditions, and inverse weight estimation. Econometrica 78, 2021–2042.
Klein, R., Spady, R.H., 1993. An efficient semiparametric estimator for binary response models. Econometrica 61, 387–421.
Lewbel, A., 2000. Semiparametric qualitative response model estimation with unknown heteroskedasticity or instrumental variables. Journal of Econometrics 97, 145–177.
Li, S., Ogura, Y., Kreinovich, V., 2010. Limit Theorems and Applications of Set-Valued and Fuzzy Set-Valued Random Variables. Kluwer Academic Publishers, Dordrecht.
Magnac, T., Maurin, E., 2007. Identification and information in monotone binary models. Journal of Econometrics 139, 76–104.
Magnac, T., Maurin, E., 2008. Partial identification in monotone binary models: discrete regressors and interval data. Review of Economic Studies 75, 835–864.
Manski, C.F., 2003. Partial Identification of Probability Distributions. Springer-Verlag, New York.
Molchanov, I.S., 2005. Theory of Random Sets. Springer-Verlag, London.
Molinari, F., 2010. Missing treatments. Journal of Business and Economic Statistics 28, 82–95.
Pepper, J.V., 2001. How do response problems affect survey measurement of trends in drug use? In: Manski, C.F., Pepper, J.V., Petrie, C. (Eds.), Informing America's Policy on Illegal Drugs: What We Don't Know Keeps Hurting Us. National Academy Press, Washington, DC, pp. 321–348.
Powell, J.L., Stock, J.H., Stoker, T.M., 1989. Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.