Journal of Econometrics 187 (2015) 74–81


Sharp bounds on treatment effects in a binary triangular system

Ismael Mourifié
Department of Economics, University of Toronto, 150 St. George Street, Toronto ON M5S 3G7, Canada

Article history: Received 24 June 2013; Received in revised form 24 October 2014; Accepted 31 January 2015; Available online 18 February 2015.

JEL classification: C14; C31; C35.

Abstract

This paper considers the evaluation of the average treatment effect (ATE) in a triangular system with binary dependent variables. I impose a threshold crossing model on both the endogenous regressor and the outcome. The bounds proposed by Shaikh and Vytlacil (2011, SV) on the ATE are sharp only under a restrictive condition on the support of the covariates and the instruments, which rules out a wide range of models and many relevant applications. In this setting, I provide a methodology that allows the construction of sharp bounds on the ATE by efficiently using the variation of covariates without imposing support restrictions.

© 2015 Elsevier B.V. All rights reserved.

Keywords: Partial identification; Threshold crossing model; Triangular system; Average treatment effect; Endogeneity

0. Introduction

This paper considers the evaluation of the average treatment effect (ATE) of a binary endogenous regressor on a binary outcome when a threshold crossing model is imposed on both the endogenous regressor and the outcome. The joint threshold crossing (JTC) model was recently investigated by Shaikh and Vytlacil (2011, henceforth SV), but their proposed bounds are sharp only under a critical restriction imposed on the support of the covariates and the instruments. The required support condition is very strong and often fails to hold for a wide range of models. SV take advantage of the threshold crossing condition imposed on the endogenous regressor to refine the known bounds on the ATE in the model with an unrestricted endogenous regressor. However, whenever the support condition fails, their bounds remain valid but are no longer sharp, because they do not take full advantage of the threshold crossing condition imposed on the endogenous regressor. I show in this paper how to fully exploit this second threshold crossing restriction without imposing any support restrictions. This paper therefore complements SV's work by providing a methodology that allows the construction of sharp bounds on the ATE by efficiently using variation in covariates.

The proposed methodology requires only mild regularity conditions on the distribution of the unobservable variables and a typical independence assumption between the covariates (except the binary endogenous regressor) and the unobservable variables. Inference on the bounds can easily be carried out using the inferential methods of Chernozhukov et al. (2013) or of Andrews and Shi (2014). The proof of the sharpness of the proposed bounds is based on copula theory and a characterization theorem proposed by Chiburis (2010). Indeed, a similar objective was pursued by Chiburis (2010); however, his characterization is not operational, in the sense that it does not allow a direct computation of the identified set from the observed probabilities in the data, because the copula is an infinite-dimensional nuisance parameter. This makes his approach computationally infeasible in most cases of interest. The JTC model is also a particular case of the models of Chesher (2005) and Jun et al. (2010); however, their analyses impose an additional restriction on the joint distribution of the unobservable variables, which I do not impose.

The rest of the paper is organized as follows. Section 1 considers joint threshold crossing models, explains why SV's bounds fail to be sharp without their support condition, and proposes a methodology to sharpen their bounds in this case. Sections 2 and 3 present a numerical illustration and discuss the inference procedure. The last section concludes, and proofs are collected in the Appendix.

E-mail address: [email protected]. doi: http://dx.doi.org/10.1016/j.jeconom.2015.01.006

1. Joint threshold crossing model

I adopt the framework of the potential outcome model Y = Y1 D + Y0 (1 − D), where Y is an observed outcome, D denotes the observed binary endogenous regressor, and Y1, Y0 are potential outcomes. The potential outcomes and D are determined as follows:

Yd = 1{ν(d, X) > u},  d = 0, 1,
D = 1{p(X, Z) > v},    (1.1)

where u and v are normalized to be uniformly distributed, u, v ∼ U[0, 1]; 1{·} denotes the indicator function; ν(0, X) and ν(1, X) are unknown functions of a vector of exogenous regressors X; and p(X, Z) is an unknown function of a vector of exogenous regressors [X, Z]. The formal assumption I use in this section may be expressed as follows:

Assumption 1. (X, Z) and (u, v) are statistically independent.

Table 1
Collection of sets.

P+(x′, p) = {p(x′, z′) = p′ ∈ Supp(P | X = x′) : p ≤ p′}
P−(x′, p) = {p(x′, z′) = p′ ∈ Supp(P | X = x′) : p ≥ p′}
Ω+_{d1 d2}(x) = {x′ : ν(d1, x) ≤ ν(d2, x′)}
Ω−_{d1 d2}(x) = {x′ : ν(d1, x) ≥ ν(d2, x′)}

X and Z denote the respective supports of the variables X and Z. Since u, v ∼ U[0, 1] and Y and D are binary, we have ν(d, x) = P(Yd = 1 | X = x) = E[Yd | X = x] and p(x, z) = P(D = 1 | X = x, Z = z) for all (x, z) ∈ Supp(X, Z), where Supp(X, Z) denotes the joint support of (X, Z). The normalization of u is convenient when the potential outcomes are binary, since it implies E[Yd | X = x] = ν(d, x), so that bounds on treatment effect parameters can be derived from bounds on the structural parameters ν(1, x) and ν(0, x). We may then define the average structural function (ASF) and the average treatment effect (ATE), respectively, as ν(d, x) and ∆ν(x) = ν(1, x) − ν(0, x). Let Supp(P | X) denote the support of p(X, Z) conditional on X. When no confusion is possible, I use the shorthand notation p = p(x, z) and p′ = p(x′, z′), where p(x, z) ∈ Supp(P | X = x) and p(x′, z′) ∈ Supp(P | X = x′); P(i, j|x, p) = P(Y = i, D = j | X = x, p(X, Z) = p); and sign(a) = 1{a > 0} − 1{a < 0}.

SV used the JTC equations determining Y and D, along with additional assumptions, to identify the sign of [ν(1, x′) − ν(0, x)] from the distribution of the observed data, and then took advantage of this information to construct bounds on the ASF that exploit variation in covariates. However, their strategy provides bounds on the ASF that are sharp only when Supp(X, P(X, Z)) = X × Supp(P(X, Z)), the ''critical support condition''. Moreover, whenever Supp(P | X = x) ∩ Supp(P | X = x′) is empty or reduced to a singleton, SV's bounds do not take advantage of the threshold restriction imposed on the equation determining D. The critical support condition implies that Supp(P | X = x) = Supp(P | X = x′) for all (x, x′) ∈ X × X; in other words, for all (x, x′) ∈ X × X and z ∈ Supp(Z | X = x), there exists z′ ∈ Supp(Z | X = x′) such that p(x, z) = p(x′, z′). This type of ''perfect matching restriction'' is difficult to achieve in many applications. As Chiburis (2010) pointed out, the SV critical support condition tends to hold only when p(x, z) does not depend on x, which is true only in the rare case of a complete dichotomy between variables in the outcome equation and variables in the treatment equation. I now show how it is possible to sharpen bounds on the ASF without imposing the critical support condition.

1.1. Sharpening the bounds

1.1.1. First main idea

Let us present a simple intuition for the main idea of this paper. We have
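To see what the perfect matching restriction demands in practice, here is a minimal sketch (with purely hypothetical propensity tables; the function name is illustrative) that checks whether Supp(P | X = x) is the same set for every x:

```python
# Sketch: check SV's "critical support condition" on a discrete design.
# The propensity tables p[(x, z)] = P(D=1 | X=x, Z=z) below are hypothetical.

def critical_support_condition(p, X, Z):
    """True iff Supp(P|X=x) is the same set for every x (perfect matching)."""
    supports = {x: {p[(x, z)] for z in Z if (x, z) in p} for x in X}
    base = next(iter(supports.values()))
    return all(s == base for s in supports.values())

X, Z = [0, 1], [0, 1]
# p(x, z) depends on x, so perfect matching fails ...
p_dep = {(0, 0): 0.4, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.8}
# ... whereas a treatment equation that does not depend on x satisfies it.
p_indep = {(0, 0): 0.4, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.7}

print(critical_support_condition(p_dep, X, Z))    # False
print(critical_support_condition(p_indep, X, Z))  # True
```

The second table illustrates Chiburis's (2010) remark: the condition holds here only because p(x, z) does not depend on x.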

ν(0, x) = P(u ≤ ν(0, x), v ≥ p(x, z)) + P(u ≤ ν(0, x), v ≤ p(x, z)),

where P(u ≤ ν(0, x), v ≥ p(x, z)) = P(1, 0|x, p), but the second term, P(u ≤ ν(0, x), v ≤ p(x, z)) = P(Y0 = 1, D = 1 | X = x, Z = z), is the unobserved counterfactual. SV proposed bounding this counterfactual by exploiting variation in covariates. Indeed, SV's idea suggests that we may bound the unobserved counterfactual for untreated individuals (D = 0) with characteristic x by using information on treated individuals (D = 1) with different characteristics x′ whenever they have exactly the same probability of being treated. In fact, if we have a treated individual with characteristic x′ belonging to the set ∆p(x) = {x′ : ν(0, x) ≤ ν(1, x′)} ∩ {x′ : ∃p′ ∈ Supp(P | X = x′), p(x, z) = p(x′, z′)}, the proposed bounds of SV for the unobserved counterfactual can be summarized as follows:

P(u ≤ ν(0, x), v ≤ p(x, z)) ≤ P(u ≤ ν(1, x′), v ≤ p(x′, z′))  if x′ ∈ ∆p(x),
P(u ≤ ν(0, x), v ≤ p(x, z)) ≤ p(x, z)  if ∆p(x) = ∅,

where P(u ≤ ν(1, x′), v ≤ p(x′, z′)) = P(1, 1|x′, p′). However, this idea is not sufficient to provide sharp bounds. My argument relies on the fact that, under the threshold crossing model assumption imposed on the treatment (D), we may bound the unobserved counterfactual P(Y0 = 1, D = 1|x, z) by using information on treated individuals with different characteristics x′ even if they have different probabilities of being treated. In fact, if we have a treated individual with characteristic x′ belonging to the subset ∆̃p(x) = {x′ : ν(0, x) ≤ ν(1, x′)} ∩ {x′ : ∃p′ ∈ Supp(P | X = x′), p(x, z) ≤ p(x′, z′)}, the unobserved counterfactual may be bounded as follows:

P(u ≤ ν(0, x), v ≤ p(x, z)) ≤ P(1, 1|x′, p′)  if x′ ∈ ∆̃p(x),
P(u ≤ ν(0, x), v ≤ p(x, z)) ≤ p(x, z)  if ∆̃p(x) = ∅.

When p(x, z) ∉ Supp(P | X = x) ∩ Supp(P | X = x′), we cannot identify P(u ≤ ν(1, x′), v ≤ p(x, z)) from the data. In this case, SV proposed bounding P(u ≤ ν(1, x′), v ≤ p(x, z)) from above by P(v ≤ p(x, z)) = p(x, z). However, whenever it is possible to find x′ ∈ ∆̃p(x), I propose bounding P(u ≤ ν(1, x′), v ≤ p(x, z)) from above by P(u ≤ ν(1, x′), v ≤ p(x′, z′)) = P(1, 1|x′, p′), which may be lower than P(v ≤ p(x, z)) = p(x, z) in many cases. Since ∆p(x) ⊆ ∆̃p(x), it is easy to see that we may obtain an improvement over SV's bounds by using ∆̃p(x) instead of ∆p(x), especially when ∆p(x) is empty or ∆p(x) = {x}. When Supp(P | X = x) = Supp(P | X = x′), we have ∆̃p(x) = ∆p(x); this fact explains why the SV bounds would be sharp when Supp(P | X = x) = Supp(P | X = x′). Hereafter, I adopt the convention that the supremum over the empty set is zero and the infimum over the empty set is one. Before formalizing this idea, I define some subsets, summarized in Table 1.

Indeed, for all x′ ∈ Ω+_{01}(x) and p(x′, z′) ∈ P+(x′, p), we have:

ν(0, x) = P(u ≤ ν(0, x), v ≥ p(x, z)) + P(u ≤ ν(0, x), v ≤ p(x, z))
        ≤ P(u ≤ ν(0, x), v ≥ p(x, z)) + P(u ≤ ν(1, x′), v ≤ p(x, z))
        ≤ P(u ≤ ν(0, x), v ≥ p(x, z)) + min[P(u ≤ ν(1, x′), v ≤ p(x′, z′)), p(x, z)].

Therefore,

ν(0, x) ≤ P(1, 0|x, p) + min[ inf_{x′ ∈ Ω+_{01}(x)} inf_{p′ ∈ P+(x′, p)} P(1, 1|x′, p′), p ].    (1.2)
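For a discrete design, the upper bound (1.2) is a direct computation over the observed cells. The sketch below uses purely hypothetical probability tables, and a user-supplied predicate stands in for membership in Ω+01(x); all names are illustrative:

```python
# Sketch of the upper bound (1.2) on nu(0, x):
#   nu(0,x) <= P(1,0|x,p) + min( inf over x' in Omega01+(x), p' in P+(x',p)
#                                of P(1,1|x',p'), p ).
# All probability tables below are hypothetical illustrations.

def upper_bound_nu0(x, p, P10, P11, support, nu0_le_nu1):
    """P10[x][p] = P(Y=1,D=0|x,p); P11[x][p] = P(Y=1,D=1|x,p);
    support[x] = Supp(P|X=x); nu0_le_nu1(x, x2) encodes x2 in Omega01+(x)."""
    candidates = [P11[x2][p2]
                  for x2 in support if nu0_le_nu1(x, x2)
                  for p2 in support[x2] if p2 >= p]      # p' in P+(x', p)
    refinement = min(candidates) if candidates else 1.0  # inf over empty set = 1
    return P10[x][p] + min(refinement, p)

support = {0: [0.4], 1: [0.6]}
P10 = {0: {0.4: 0.30}, 1: {0.6: 0.20}}
P11 = {0: {0.4: 0.25}, 1: {0.6: 0.35}}

# Suppose the sign analysis gives nu(0,0) <= nu(1,1): x' = 1 is usable for x = 0.
bound = upper_bound_nu0(0, 0.4, P10, P11, support, lambda x, x2: x2 == 1)
print(round(bound, 2))  # 0.30 + min(0.35, 0.4) = 0.65
```

Note that the empty-set convention (infimum equal to one) makes the bound collapse to the trivial P(1, 0|x, p) + p when no usable x′ exists.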


The lower bound for ν(0, x), as well as the lower and upper bounds for ν(1, x), can be derived similarly. The upper bound for ν(0, x) just built is lower than SV's but may not be sharp, since it is also possible to take advantage of the knowledge of the sign of [ν(0, x′) − ν(0, x)] and the range of P+(x, p).

1.1.2. Second main idea

The first idea is not sufficient to characterize fully all the empirical content of the model. In the first idea, I showed that to bound the ASF for an untreated individual (D = 0) with characteristic x, we may use information on a treated individual (D = 1) with different characteristics x′. My second idea relies on the fact that, under the threshold crossing model assumption imposed on the treatment (D), to bound the ASF for an untreated individual (D = 0) with characteristic x, we may also use information on other untreated individuals (D = 0) with different characteristics x′. In fact, if we have two untreated individuals with characteristics x and x′ such that ν(0, x) < ν(0, x′), we have the following:

P(ν(0, x) ≤ u ≤ ν(0, x′)) = P(ν(0, x) ≤ u ≤ ν(0, x′), v ≥ p(x, z)) + P(ν(0, x) ≤ u ≤ ν(0, x′), v ≤ p(x, z))
    ≥ P(ν(0, x) ≤ u ≤ ν(0, x′), v ≥ p(x, z))
    ≥ P(u ≤ ν(0, x′), v ≥ p(x, z)) − P(u ≤ ν(0, x), v ≥ p(x, z)).

Then, for all p(x, z) ≤ p(x′, z′), we have

ν(0, x′) − ν(0, x) ≥ P(u ≤ ν(0, x′), v ≥ p(x′, z′)) − P(u ≤ ν(0, x), v ≥ p(x, z)) ≥ P(1, 0|x′, p′) − P(1, 0|x, p).

Thus, for all x′ belonging to {x′ : ν(0, x) ≤ ν(0, x′)} ∩ {x′ : ∃p′ ∈ Supp(P | X = x′), p(x, z) ≤ p(x′, z′)}, we shall have

ν(0, x′) − ν(0, x) ≥ max[P(1, 0|x′, p′) − P(1, 0|x, p), 0].    (1.3)

SV's bounds do not recover this feature of the model. For instance, in a case where Supp(P | X = x) ∩ Supp(P | X = x′) = ∅, SV's bounds become the bounds derived in Proposition 2 of Chiburis (2010), and then, if ν(1, x′) > ν(0, x′) > ν(0, x) > ν(1, x), we have ν(0, x′) − ν(0, x) ≥ P(1, 0|x′, p′) − P(1, 0|x, p) − P(D = 1|x, p), which is wider than the bound proposed in (1.3). This restriction becomes particularly useful when ν(0, x′) or ν(0, x) is identified. I propose the following strategy to take this feature of the model into account. For all x′ ∈ Ω+_{00}(x) and p′ ∈ P+(x′, p), we have ν(0, x′) − ν(0, x) ≥ max[P(1, 0|x′, p′) − P(1, 0|x, p), 0]. Therefore,

ν(0, x′) − ν(0, x) ≥ sup_{p ∈ Supp(P|X=x)} sup_{p′ ∈ P+(x′, p)} { max[P(1, 0|x′, p′) − P(1, 0|x, p), 0] }.

Similarly, we can provide a lower bound for [ν(1, x′) − ν(1, x)]. Throughout the discussion above, the signs of the quantities [ν(1, x′) − ν(0, x)] and [ν(d, x′) − ν(d, x)], d = 1, 0, are important in my analysis. In the following, I provide a proposition that helps determine the sign of these quantities under mild assumptions.

1.2. Sign of the marginal average effect

Model (1.1) has an attractive feature that allows identification of the sign of the marginal average effects:

E[Y1 | X = x′] − E[Y0 | X = x] = [ν(1, x′) − ν(0, x)]    (1.4)

and

E[Yd | X = x′] − E[Yd | X = x] = [ν(d, x′) − ν(d, x)],  d = 1, 0,    (1.5)

under mild assumptions. SV showed that [ν(1, x′) − ν(0, x)] shares the same sign as the observable function h1(x, x′, p, p′) = (P(1, 1|x′, p) − P(1, 1|x′, p′)) − (P(1, 0|x, p′) − P(1, 0|x, p)) when p(X, Z) is not degenerate and both p and p′ belong to Supp(P | X = x) ∩ Supp(P | X = x′) with p′ < p. SV's idea cannot identify the sign of [ν(1, x′) − ν(0, x)] when Supp(P | X = x) ∩ Supp(P | X = x′) is empty or a singleton. However, I show that the sign of [ν(1, x′) − ν(0, x)] may still be identified in this case, and, moreover, that the sign of [ν(d, x′) − ν(d, x)], d ∈ {1, 0}, may also be identified under mild assumptions. The following proposition summarizes this result.

Proposition 1. Suppose that Supp(P | X = x) and Supp(P | X = x′) are not singletons for some (x, x′) ∈ X², and let p′1, p′2 ∈ Supp(P | X = x′) and p1, p2 ∈ Supp(P | X = x) such that p1 ≤ p′1 < p′2 ≤ p2. Then,

(1) h1(x, x′, p1, p2, p′1, p′2) ≥ 0 ⇒ [ν(1, x′) − ν(0, x)] ≥ 0, where h1(x, x′, p1, p2, p′1, p′2) = (P(1, 1|x′, p′2) − P(1, 1|x′, p′1)) − (P(1, 0|x, p1) − P(1, 0|x, p2)). When p′1 = p1 and p′2 = p2, we have sign([ν(1, x′) − ν(0, x)]) = sign(h1(x, x′, p1, p2, p1, p2)).

(2) h0(x, x′, p1, p2, p′1, p′2) ≥ 0 ⇒ [ν(0, x′) − ν(1, x)] ≥ 0, where h0(x, x′, p1, p2, p′1, p′2) = (P(1, 0|x′, p′1) − P(1, 0|x′, p′2)) − (P(1, 1|x, p2) − P(1, 1|x, p1)).

(3) h̃1(x, x′, p1, p2, p′1, p′2) ≥ 0 ⇒ [ν(1, x′) − ν(1, x)] ≥ 0, where h̃1(x, x′, p1, p2, p′1, p′2) = (P(1, 1|x′, p′2) − P(1, 1|x′, p′1)) − (P(1, 1|x, p2) − P(1, 1|x, p1)). When p′1 = p1 and p′2 = p2, we have sign([ν(1, x′) − ν(1, x)]) = sign(h̃1(x, x′, p1, p2, p1, p2)).

(4) h̃0(x, x′, p1, p2, p′1, p′2) ≥ 0 ⇒ [ν(0, x′) − ν(0, x)] ≥ 0, where h̃0(x, x′, p1, p2, p′1, p′2) = (P(1, 0|x′, p′1) − P(1, 0|x′, p′2)) − (P(1, 0|x, p1) − P(1, 0|x, p2)). When p′1 = p1 and p′2 = p2, we have sign([ν(0, x′) − ν(0, x)]) = sign(h̃0(x, x′, p1, p2, p1, p2)).

First, this result generalizes Lemma 2.1 of SV, which identifies the sign of [ν(1, x′) − ν(0, x)] only in the presence of exact matches p′1 = p1 and p′2 = p2. Second, this result shows that the sign of the marginal average effect [ν(d, x′) − ν(d, x)], d ∈ {1, 0}, would be identified even if the ASF are only partially identified. Finally, this feature of the model will reduce the computational burden of the proposed bounds, since it helps partially identify the sets Ω+_{d1 d2}(x) and Ω−_{d1 d2}(x) for d1, d2 ∈ {0, 1}.

I have shown that the JTC model predicts a set of observable bounds on ν(d, x) and [ν(d, x′) − ν(d, x)] for d ∈ {1, 0}. The following theorem proves that this set of observable bounds is sufficient to characterize fully the identified set for ν(0, ·) and ν(1, ·). The proof is quite involved, and it relies on copula theory and a characterization theorem found in Chiburis (2010).

Theorem 1. Consider the potential outcomes model Y = Y1 D + Y0 (1 − D), where Yd and D are determined by model (1.1). Under Assumption 1, (ν(0, ·), ν(1, ·)) : X → [0, 1]² is in the identified set if and only if

sup_{p ∈ Supp(P|X=x)} sup_{x′ ∈ X\x} sup_{p′ ∈ Supp(P|X=x′)} { P(1, 0|x, p) + P(1, 1|x′, p′) 1{p ≥ p′} 1{ν(0, x) ≥ ν(1, x′)} }
≤ ν(0, x)
≤ inf_{p ∈ Supp(P|X=x)} inf_{x′ ∈ X\x} inf_{p′ ∈ Supp(P|X=x′)} { P(1, 0|x, p) + min([P(1, 1|x′, p′) + 1{p > p′} + 1{ν(0, x) > ν(1, x′)}], p) },

and

sup_{p ∈ Supp(P|X=x)} sup_{x′ ∈ X\x} sup_{p′ ∈ Supp(P|X=x′)} { P(1, 1|x, p) + P(1, 0|x′, p′) 1{p′ ≥ p} 1{ν(1, x) ≥ ν(0, x′)} }
≤ ν(1, x)
≤ inf_{p ∈ Supp(P|X=x)} inf_{x′ ∈ X\x} inf_{p′ ∈ Supp(P|X=x′)} { P(1, 1|x, p) + min([P(1, 0|x′, p′) + 1{p′ > p} + 1{ν(1, x) > ν(0, x′)}], p) }

for all x ∈ X, and

ν(0, x) − ν(0, x′) ≥ sup_{p ∈ Supp(P|X=x)} sup_{p′ ∈ Supp(P|X=x′)} { max[P(1, 0|x, p) − P(1, 0|x′, p′), 0] 1{p′ ≤ p} 1{ν(0, x′) ≤ ν(0, x)} − 1{p′ > p} − 1{ν(0, x′) > ν(0, x)} },

ν(1, x) − ν(1, x′) ≥ sup_{p ∈ Supp(P|X=x)} sup_{p′ ∈ Supp(P|X=x′)} { max[P(1, 1|x, p) − P(1, 1|x′, p′), 0] 1{p′ ≥ p} 1{ν(1, x′) ≤ ν(1, x)} − 1{p′ < p} − 1{ν(1, x′) > ν(1, x)} }

for all (x, x′) ∈ X × X.

In this theorem, I used a formulation of the sharp bounds that eliminates the sets Ω+_{d1 d2}(x) and Ω−_{d1 d2}(x). This formulation has advantages for implementation and inference, as will be clarified soon. An equivalent formulation using the sets Ω+_{d1 d2}(x) and Ω−_{d1 d2}(x) is given in the proof of Theorem 1 in the Appendix.

1.3. Computation of the bounds

For the construction of the bounds, we need to know the ordering of the elements of the set S1 = {ν(d, x) : x ∈ X and d ∈ {0, 1}}. Two different methods could be used in this context.

1.3.1. Visit all orderings

Without restrictions on the true ordering on S1, one may examine all possible orderings of S1 and keep only those for which the bounds derived in Theorem 1 do not cross. However, this method may be very costly, even under parametric restrictions for ν, especially for the approach in Chiburis (2010). In fact, the approach in Chiburis (2010) requires an algorithm that determines the existence of a copula satisfying a set of restrictions; see Chiburis (2010) for more details. Such an algorithm is computationally intensive, especially in the model entertained here, where there is no parametrization of the copula. Indeed, the Chiburis approach is not an operational characterization for the ASF, as it does not allow a direct computation of the identified set from the observed probabilities in the data, because the nonparametric unknown copula is an infinite-dimensional nuisance parameter. For example, in his empirical example, Chiburis (2010) failed to determine the bounds on the ATE in model (1.1) due to computational intractability, even for a very limited support of X. Even when he assumed parametric restrictions for ν to reduce the number of orderings to be checked, he still failed to determine the bounds on the ATE. One main advantage of the characterization proposed in Theorem 1 is that it does not need such an algorithm, and its implementation is straightforward. Even though the proposed bounds are easier to implement, it is valuable to find a methodology to reduce the number of orderings to be checked.

1.3.2. Using Proposition 1 to identify a partial ordering on S1

The properties of the functions hd(x, x′, p1, p2, p′1, p′2) and h̃d(x, x′, p1, p2, p′1, p′2), d ∈ {0, 1}, defined in Proposition 1 allow us to discard some orderings that are incompatible with the data, and thus help us identify a partial ordering on S1. This can dramatically reduce the number of orderings to be checked, particularly when Supp(P | X = x) ∩ Supp(P | X = x′) is large. This is illustrated in the following example.
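The pruning step of Section 1.3.2 can be made concrete. The sketch below uses a purely hypothetical table of cell probabilities P(1, d|x, p) with exact matches, so the sign of [ν(1, x) − ν(0, x)] is point-identified for each x (Lemma 2.1 of SV); it then counts the orderings of S1 that survive the implied partial ordering:

```python
# Sketch of Section 1.3.2: use the sign function h1 of Proposition 1 to prune
# the candidate orderings of S1 = {nu(d, x)}. The table P[(x, p)] below is
# hypothetical; it maps each cell to (P(1,0|x,p), P(1,1|x,p)).
from itertools import permutations

def h1(P, x, x2, p1, p2, q1, q2):
    """h1 >= 0 implies nu(1, x') - nu(0, x) >= 0 (Proposition 1(1))."""
    return (P[(x2, q2)][1] - P[(x2, q1)][1]) - (P[(x, p1)][0] - P[(x, p2)][0])

P = {(0, 0.3): (0.35, 0.10), (0, 0.7): (0.20, 0.30),
     (1, 0.3): (0.30, 0.15), (1, 0.7): (0.10, 0.40)}

# Exact matches here (p1 = q1, p2 = q2), so each sign is point-identified.
constraints = [((0, x), (1, x)) for x in (0, 1)
               if h1(P, x, x, 0.3, 0.7, 0.3, 0.7) >= 0]  # nu(0,x) <= nu(1,x)

labels = [(d, x) for d in (0, 1) for x in (0, 1)]

def compatible(order):
    rank = {lab: i for i, lab in enumerate(order)}
    return all(rank[lo] < rank[hi] for lo, hi in constraints)

orderings = [o for o in permutations(labels) if compatible(o)]
print(len(orderings))  # 24 candidate orderings drop to 6
```

With the two restrictions ν(0, x) ≤ ν(1, x), x ∈ {0, 1}, the 24 candidate orderings collapse to 6, matching the count in the example that follows.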

Example 1. Denoting P(Y = 1, D = d | X = x, Z = z) = p(1, d|x, z), consider X = {0, 1}, Z = {0, 1}, and the following observables:

p(1, 1|0, 0) = β1,  p(1, 0|0, 0) = γ1,  p(1, 1|0, 1) = β2,  p(1, 0|0, 1) = γ2,
p(1, 1|1, 0) = β3,  p(1, 0|1, 0) = γ3,  p(1, 1|1, 1) = β4,  p(1, 0|1, 1) = γ4,

with 0 < βi, γi < 1 for i = 1, . . . , 4, and p(x, z) such that

(1) β1 − β2 > γ2 − γ1 > β3 − β4 > γ4 − γ3, and
(2) p(0, 1) = p(1, 1) < p(0, 0) < p(1, 0).

These conditions are sufficient to determine the sign of some marginal average effects. We can describe the set of all possible orderings on S1 as ν(d1, d′1) < ν(d2, d′2) < ν(d3, d′3) < ν(d4, d′4), where {di, d′i} ≠ {dj, d′j} for i ≠ j and di, d′i ∈ {0, 1}; hence, there are 24 possible orderings. We have Supp(P | X = 0) = {p(0, 1), p(0, 0)} and Supp(P | X = 1) = {p(1, 1), p(1, 0)}; thus, Supp(P | X = 0) ≠ Supp(P | X = 1). SV's approach allows the identification of only the signs of [ν(1, 1) − ν(0, 1)] and [ν(1, 0) − ν(0, 0)] from the observable function h1(x, x, p, p, p′, p′):

sign[ν(1, 1) − ν(0, 1)] = sign[h1(1, 1, p(1, 1), p(1, 0))] = 1,
sign[ν(1, 0) − ν(0, 0)] = sign[h1(0, 0, p(0, 0), p(0, 1))] = 1.

Then, we have the partial ordering ν(0, 1) ≤ ν(1, 1) and ν(0, 0) ≤ ν(1, 0) on S1. Among the 24 possible orderings, only six are compatible with the restrictions imposed by this partial ordering. We can see that even in the worst case, when Supp(P | X = x) ∩ Supp(P | X = x′) is empty or a singleton, the number of orderings to be checked drops from 24 to 6: when Supp(P | X = x) is not a singleton, we can always identify the sign of [ν(1, x) − ν(0, x)]. Moreover, the sign of [ν(0, 1) − ν(1, 0)] may be identified using the function h0(0, 1, p(1, 1), p(1, 0), p(0, 1), p(0, 0)); indeed, sign[ν(0, 1) − ν(1, 0)] = 1. Then, only one ordering is compatible with the data in this generic case: ν(0, 0) < ν(1, 0) < ν(0, 1) < ν(1, 1).

2. Numerical illustration

I now provide a numerical illustration of the bounds on the ATE using Theorem 1. In addition to the bounds proposed in Theorem 1, I compute SV's bounds. Consider the following special case of the model:

Y = 1{αD + Xβ > ϵ1},
D = 1{Xγ + Zη > ϵ2},    (2.1)
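Since ϵ1 is standard normal in this design, the true ATE has the closed form ATE(x) = Φ(α + xβ) − Φ(xβ), which gives a quick Monte Carlo sanity check on the design. A minimal sketch, using α = 2 and β = 1 as in the text (function names are illustrative; ϵ2 is omitted because it drives only selection into D, not the true ATE):

```python
# Sketch: Monte Carlo check of the true ATE in design (2.1).
# With eps1 ~ N(0, 1), ATE(x) = Phi(alpha + x*beta) - Phi(x*beta).
# alpha = 2 and beta = 1 follow the text; n and the seed are arbitrary.
import math
import random

def Phi(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def simulate_ate(x, alpha=2.0, beta=1.0, n=100_000, seed=0):
    """E[Y1 - Y0 | X = x] by simulating eps1 only."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        eps1 = rng.gauss(0.0, 1.0)
        y1 = 1 if alpha + x * beta > eps1 else 0
        y0 = 1 if x * beta > eps1 else 0
        total += y1 - y0
    return total / n

for x in (-2.0, 0.0, 2.0):
    # simulated value vs. closed form Phi(2 + x) - Phi(x)
    print(x, round(simulate_ate(x), 3), round(Phi(2.0 + x) - Phi(x), 3))
```

At x = 0, the closed form gives Φ(2) − Φ(0) ≈ 0.477, and the simulated value agrees up to Monte Carlo error.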


Fig. 1. Sharp bounds on the ATE: (a) (β = 1, γ = 1/5, η = 1/2); (b) (β = 1, γ = 1/5, η = 4); (c) (β = 1, γ = 1/100, η = 1); (d) (β = 1, γ = 1/5, η = 1/2). [Figure omitted; panel captions only.]

with X  = [−  2, 2], Z = {0, 1}, and (ϵ1 , ϵ2 ) v N (0,



=

1

ρ

ρ

1

 ), where

.

It can be seen that the SV support condition fails to hold for γ ̸= 0. Indeed, Supp(P | X = x) ∩ Supp(P | X = x′ ) is either empty or reduces to a singleton for all x ∈ [−2, 2]. I will construct the ATE by fixing α = 2 and ρ = 12 while varying other parameters. I will consider all the ordering induced by the parametric form α ′ D + xβ ′ . In fact, every pair of parameters (α ′ , β ′ ) induces one ordering. Before constructing the bounds, I will apply Proposition 1 to reduce the number of orderings to be checked. For example, the function h1 (x, x, p, p, p′ , p′ ) allows us to identify the sign of [ν(1, x) − ν(0, x)] for all x, which implies that α ′ > 0. Now, we may examine only the ordering induced by (α ′ > 0, β ′ ). In this numerical illustration, I consider all the orderings induced by α ′ ∈ (0, 5) and β ′ ∈ [β − 2.5, β + 2.5]. It is possible to consider a larger space, but within the simulation note that most of the orderings induced by (α ′ , β ′ ) are rejected whenever those orderings deviate slightly from the true ordering induced by the true parameters (α, β). All the figures show the ATE (x) while x varying from [−2, 2] for different values of the parameters. Within all the figures, the notations SVL and SVU denote, respectively, the lower and upper bounds of SV; SBL and SBU denote, respectively, the sharp lower and upper bounds proposed in Theorem 1. Here, I will describe four different facts: (1) Fig. 1(a) represents the case where (β = 1, γ = 15 , η = 12 ). We can see that my lower bound (SBL) improves significantly on the SV lower bound (SVL), while the upper bound is exactly the

same. Indeed, since ν(0, x′) ≤ ν(1, x) for many values of (x, x′), the proposed bounds refine the lower bound of ν(1, x), and similarly the upper bound of ν(0, x′). However, there are two main reasons why the bounds do not refine the SV upper bound: there are only a few values of (x, x′) such that ν(1, x) ≤ ν(0, x′), and, when this does hold, it is likely that p(x′, z′) ≥ p(x, z), whereas we need p(x′, z′) ≤ p(x, z) in order to refine ν(1, x) using ν(0, x′).

(2) In Fig. 1(b), I increase the strength of the instrument (β = 1, γ = 1/5, η = 4). I note two important facts. First, I am now able to refine the upper bound of ν(1, x) using ν(0, x′), since it is likely that p(x′, 0) ≤ p(x, 1); the discontinuity marks the point where the conditions ν(1, x) ≤ ν(0, x′) and p(x′, 0) ≤ p(x, 1) start to hold simultaneously. Second, when the strength of the instrument increases, we tend toward point identification. This phenomenon is an example of identification at infinity, as in Heckman (1990). SV's bounds do not display this feature when their support condition fails.

(3) In Fig. 1(c), I decrease the strength of the covariate in the selection equation to see how the bounds behave when the SV support condition almost holds (β = 1, γ = 1/100, η = 1). The proposed bounds improve on the SV bounds. Indeed, as we get closer to the ''perfect matching restriction'', additional simulations show that my bounds become tighter, whereas SV's bounds are not sensitive to this and jump directly to the tightest bounds when γ = 0.


(4) In Fig. 1(d), I reduce the strength of the covariate in the outcome equation (β = 1, γ = 1/5, η = 1/2). Both types of bounds are very wide, and I obtain only a small improvement over the SV bounds. Indeed, the strength of the present analysis comes from the variation of the covariates; when X is small, the improvement over SV may be small. This fact explains why Chiburis (2010) found only a small improvement over the SV bounds in his empirical example, where the support of X was {0, 1}.

3. Inference

Although this paper focuses on identification, I briefly discuss inference on my bounds in finite samples. To the best of my knowledge, no estimation or inference procedure has previously been proposed for the sharp bounds on the ASF in model (1.1). Chiburis (2010) recognized the difficulty of using his approach for inference and did not address this question. However, Theorem 1 characterizes the identified set for the ASF in terms of intersection bounds, so inference can easily be carried out using existing inferential methods, specifically Chernozhukov et al. (2013) or Andrews and Shi (2014). It is important to note that, if we use Proposition 1 to reduce the number of orderings to be checked, we need to take the pre-estimation errors into account. When the signs of the different marginal average effects can be estimated super-consistently, pre-estimation does not affect the asymptotic properties of the estimators in Chernozhukov et al. (2013) or in Andrews and Shi (2014). However, when the orderings on S1 involve (near) ties, i.e. sign[ν(1, x) − ν(0, x′)] ≃ 0 or sign[ν(d, x) − ν(d, x′)] ≃ 0, d ∈ {0, 1}, for some (x, x′) ∈ X × X, the super-consistency argument no longer holds, and the pre-estimation may therefore be problematic.¹ This issue is beyond the scope of this paper and remains a goal for future research.

¹ I thank an anonymous referee who pointed out this pre-estimation issue to me.

4. Conclusion

I have considered the special case of joint threshold crossing models in which no parametric form or distributional assumptions are imposed. I provided sharp bounds on the average treatment effect (ATE) under only mild regularity conditions on the distribution of the unobservable variables, presenting a methodology that constructs sharp bounds on the ATE by efficiently using the variation of covariates without imposing any support restrictions. A numerical illustration showed that my proposed bounds significantly improve on the Shaikh and Vytlacil (2011) bounds, which, until now, were the tightest feasible bounds proposed in the literature for this model. There are several natural extensions of this work. First, this methodology may easily be used to provide sharp bounds for other functionals of the treatment, not just the average. Second, this methodology efficiently exploits variation of covariates to sharpen bounds, and it may be extended to narrow cross-sectional bounds using time variation in panel data.

Acknowledgements

I am deeply grateful to Marc Henry for his constant guidance, inspiration, and encouragement. I am also grateful to Louis-Philippe Béland, Ivan Canay, Han Hong, Sung Jae Jun, Désiré Kedagni, Joris Pinkse, Christoph Rothe, Alexander Torgovitsky, Bernard Salanié, Yuanyuan Wan, an associate editor, and three anonymous referees for their helpful comments and discussions, as well as to participants at the Second CIREQ-CeMMAP Conference on Incomplete Models and seminar audiences at the Cambridge, University of Chicago, Columbia, Exeter Business School, Penn State, University of Toronto, and Warwick economics departments. Parts of this paper were written while I was visiting Penn State, and I thank my hosts for their hospitality and support.

Appendix. Proof of results in the main text

Proof of Proposition 1. I will prove cases (1) and (3); cases (2) and (4) can be proved similarly. Let p′1, p′2 ∈ Supp(P | X = x′) and p1, p2 ∈ Supp(P | X = x) such that p1 ≤ p′1 < p′2 ≤ p2. Then:

• Case (1):

h1(x, x′, p1, p2, p′1, p′2) = (P(1, 1|x′, p′2) − P(1, 1|x′, p′1)) − (P(1, 0|x, p1) − P(1, 0|x, p2))
    = P(u ≤ ν(1, x′), p′1 < v < p′2) − P(u ≤ ν(0, x), p1 < v < p2)
    ≤ P(u ≤ ν(1, x′), p1 < v < p2) − P(u ≤ ν(0, x), p1 < v < p2).

The last inequality holds since [p′1, p′2] ⊆ [p1, p2]. Therefore, if h1(x, x′, p1, p2, p′1, p′2) ≥ 0, then P(u ≤ ν(1, x′), p1 < v < p2) − P(u ≤ ν(0, x), p1 < v < p2) ≥ 0, which implies that [ν(1, x′) − ν(0, x)] ≥ 0, since we have the following:

P(u ≤ ν(1, x′), p1 < v < p2) − P(u ≤ ν(0, x), p1 < v < p2)
    = P(ν(0, x) ≤ u ≤ ν(1, x′), p1 < v < p2)  if ν(1, x′) > ν(0, x),
    = 0  if ν(1, x′) = ν(0, x),
    = −P(ν(1, x′) ≤ u ≤ ν(0, x), p1 < v < p2)  if ν(1, x′) < ν(0, x).


• Case (3): (i)

h̃1(x, x′, p1, p2, p′1, p′2) = (P(1, 1|x′, p′2) − P(1, 1|x′, p′1)) − (P(1, 1|x, p2) − P(1, 1|x, p1))
= P(u ≤ ν(1, x′), p′1 < v < p′2) − P(u ≤ ν(1, x), p1 < v < p2)
≤ P(u ≤ ν(1, x′), p1 < v < p2) − P(u ≤ ν(1, x), p1 < v < p2).

The last inequality holds since [p′1, p′2] ⊆ [p1, p2]. Therefore, if h̃1(x, x′, p1, p2, p′1, p′2) ≥ 0, then P(u ≤ ν(1, x′), p1 < v < p2) − P(u ≤ ν(1, x), p1 < v < p2) ≥ 0, which implies that ν(1, x′) − ν(1, x) ≥ 0, since we have the following:

P(u ≤ ν(1, x′), p1 < v < p2) − P(u ≤ ν(1, x), p1 < v < p2)
  = P(ν(1, x) ≤ u ≤ ν(1, x′), p1 < v < p2)    if ν(1, x′) > ν(1, x),
  = 0                                          if ν(1, x′) = ν(1, x),
  = −P(ν(1, x′) ≤ u ≤ ν(1, x), p1 < v < p2)   if ν(1, x′) < ν(1, x).

• Case (3): (ii)

If p′1 = p1 = p′ and p′2 = p2 = p, define h̃1(x, x′, p, p′) ≡ h̃1(x, x′, p1, p2, p′1, p′2). Then

h̃1(x, x′, p, p′) = (P(1, 1|x′, p) − P(1, 1|x′, p′)) − (P(1, 1|x, p) − P(1, 1|x, p′))
= P(u ≤ ν(1, x′), p′ < v < p) − P(u ≤ ν(1, x), p′ < v < p);

then,

h̃1(x, x′, p, p′)
  = P(ν(1, x) ≤ u ≤ ν(1, x′), p′ < v < p)    if ν(1, x′) > ν(1, x),
  = 0                                         if ν(1, x′) = ν(1, x),
  = −P(ν(1, x′) ≤ u ≤ ν(1, x), p′ < v < p)   if ν(1, x′) < ν(1, x).

This completes the proof.

I. Mourifié / Journal of Econometrics 187 (2015) 74–81

Claim 1. Under Assumption 1, ν(d, x) : {0, 1} × X → [0, 1] is in the identified set if and only if there exists a subcopula C whose support is (S1 ∪ {0, 1}) × (S2 ∪ {0, 1}) such that:

(1) C(u, 0) = C(0, v) = 0 for all u ∈ S1 ∪ {0, 1} and for all v ∈ S2 ∪ {0, 1};
(2) C(ν(1, x), p) = P(1, 1|x, p) and C(ν(0, x), p) = ν(0, x) − P(1, 0|x, p) for all (x, p) ∈ X × Supp(P | X);

where S2 = {p(x, z) : (x, z) ∈ X × Z}.
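Claim 1 turns identification into a feasibility question: on a finite support, one only needs to exhibit a function satisfying the subcopula properties together with the equalities in (2). As a minimal sketch of my own (not the paper's procedure), the routine below checks the defining subcopula properties listed in Definition 1 below for a candidate function on hypothetical grids; the independence copula C(u, v) = uv passes.

```python
from itertools import product

# Minimal feasibility sketch (illustration only): check the subcopula
# properties on finite grids. S1, S2 are hypothetical grids standing in for
# {nu(d, x)} and {p(x, z)}, each augmented with {0, 1}; the candidate is
# the independence copula C(u, v) = u * v.
S1 = [0.0, 0.2, 0.6, 1.0]
S2 = [0.0, 0.3, 0.7, 1.0]

def C(u, v):
    return u * v

def is_subcopula(C, D1, D2, tol=1e-12):
    # (1) grounded: C(u, 0) = C(0, v) = 0
    if any(abs(C(u, 0.0)) > tol for u in D1):
        return False
    if any(abs(C(0.0, v)) > tol for v in D2):
        return False
    # (3) margins on the grid: C(u, 1) = u and C(1, v) = v
    if any(abs(C(u, 1.0) - u) > tol for u in D1):
        return False
    if any(abs(C(1.0, v) - v) > tol for v in D2):
        return False
    # (2) 2-increasing: rectangle inequality for u1 >= u2, v1 >= v2
    for u1, u2, v1, v2 in product(D1, D1, D2, D2):
        if u1 >= u2 and v1 >= v2:
            if C(u1, v1) - C(u1, v2) - C(u2, v1) + C(u2, v2) < -tol:
                return False
    return True

print(is_subcopula(C, S1, S2))   # True for the independence copula
```

Any candidate construction, such as the function built from SL0(x) in the proof of Theorem 1, could be screened the same way on the observed support.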

Proof of Theorem 1. Here, I will present two equivalent reformulations of Theorem 1. The first one is the most obvious from the discussion I provided in the text. To ease the notation, when no confusion is possible, I shall use the shorthand notation sup_p ≡ sup_{p ∈ Supp(P|X=x)}, sup_{p′} ≡ sup_{p′ ∈ Supp(P|X=x′)}, sup_{Ω−(+)(x)} ≡ sup_{x′ ∈ Ω−(+)(x)}, and sup_{P−(+)(x′, p)} ≡ sup_{p′ ∈ P−(+)(x′, p)}, and similarly for the inf. I will focus only on ν(0, x) for the sake of simplicity.

New formulation: Using (1.2), we have

ν(0, x) ≤ inf_p { P(1, 0|x, p) + min( inf_{Ω+01(x)} inf_{P+(x′, p)} P(1, 1|x′, p′), p ) }.    (A.1)

This formulation is equivalent to the upper bound for ν(0, x) presented in Theorem 1. Indeed, we have the following:

ν(0, x) ≤ inf_p { P(1, 0|x, p) + min( inf_{Ω+01(x)} inf_{P+(x′, p)} P(1, 1|x′, p′), p ) }
        = inf_p inf_{x′ ∈ X} inf_{p′} { P(1, 0|x, p) + min( [P(1, 1|x′, p′) + 1{p′ > p} + 1{ν(0, x) > ν(1, x′)}], p ) }.

Moreover, let us assume that we know the sharp bounds [SL0(x′), SU0(x′)] for ν(0, x′); (1.5) may be rewritten equivalently as follows: for all x′ ∈ Ω+00(x) and p′ ∈ P+(x′, p),

ν(0, x) − P(u ≤ ν(0, x), v ≥ p) ≤ ν(0, x′) − P(u ≤ ν(0, x′), v ≥ p′) ≤ SU0(x′) − P(u ≤ ν(0, x′), v ≥ p′).

Therefore,

ν(0, x) ≤ P(1, 0|x, p) + inf_{Ω+00(x)} inf_{P+(x′, p)} ( SU0(x′) − P(1, 0|x′, p′) ).    (A.2)

By combining the upper bounds for ν(0, x) derived in (A.1) and (A.2), we may propose the following upper bound for ν(0, x):

ν(0, x) ≤ inf_p { P(1, 0|x, p) + min[ min( inf_{Ω+01(x)} inf_{P+(x′, p)} P(1, 1|x′, p′), p ), inf_{Ω+00(x)} inf_{P+(x′, p)} ( SU0(x′) − P(1, 0|x′, p′) ) ] }.

This bound is an equivalent formulation of the bounds proposed in Theorem 1 using the sets Ω+01(x) and Ω−00(x). Then, to prove Theorem 1, I will prove that the bounds derived using the latter formulation are sharp.

The proof of Claim 1 is given in Chiburis (2010). For the sake of simplicity, I shall use in this section the following notation:

L0(x, p) ≡ sup_{Ω−01(x)} sup_{P−(x′, p)} P(1, 1|x′, p′),
M0(x, p) ≡ sup_{Ω−00(x)} sup_{P−(x′, p)} ( SL0(x′) − P(1, 0|x′, p′) ).

We have shown that ν(0, x) lies inside the interval [SL0(x), SU0(x)]. To show that these bounds are sharp, it is sufficient to construct a subcopula that respects the conditions cited in Claim 1 when ν(0, x) equals SL0(x) or SU0(x). Then, assume that

ν(0, x) = SL0(x) = sup_p { P(1, 0|x, p) + max[ L0(x, p), M0(x, p) ] },

where the supremum is taken over Supp(P | X). I will now show that the following function is a subcopula on the domain (S1 ∪ {0, 1}) × (S2 ∪ {0, 1}):

C(ν(1, x), p) = P(u ≤ ν(1, x), v ≤ p),
C(ν(0, x), p) = −P(u ≤ ν(0, x), v ≥ p) + sup_p { P(1, 0|x, p) + max[ L0(x, p), M0(x, p) ] }.

Definition 1. A two-dimensional subcopula (briefly, a subcopula) is a function C with the following properties (Nelsen, 2006):

(1) Domain(C) = D1 × D2, where D1 and D2 are subsets of [0, 1] containing 0 and 1;
(2) C(u1, v1) − C(u1, v2) − C(u2, v1) + C(u2, v2) ≥ 0 for all u1, u2 ∈ D1 and v1, v2 ∈ D2 such that u1 ≥ u2 and v1 ≥ v2;
(3) C(u, 1) = u and C(1, v) = v for all u ∈ D1 and for all v ∈ D2.

By construction, our function verifies properties (1) and (3) of Definition 1, and it remains to verify property (2). When Supp(P × X) = Supp(P) × Supp(X), property (2) imposes restrictions on ν(0, x) for all p, p′ ∈ Supp(P) = Supp(P | X) = Supp(P | X′). This is no longer the case when Supp(P × X) ≠ Supp(P) × Supp(X), because of additional data observability constraints. Indeed, C(ν(1, x), p(x′, z′)) = P(u ≤ ν(1, x), v ≤ p(x′, z′)) cannot be identified from the data when p(x′, z′) ∉ Supp(P | X). Thus, property (2) does not always impose additional testable constraints. To clarify this point, consider the two following situations:

(1) Supp(P | X) ∩ Supp(P | X′) = ∅, u1 = ν(0, x), u2 = ν(1, x′), v1 = p(x′, z′), and v2 = p(x, z). Then, property (2) does not impose additional restrictions on ν(0, x), since we cannot identify C(ν(1, x), p(x′, z′)) = P(u ≤ ν(1, x), v ≤ p(x′, z′)).
(2) Supp(P | X) ∩ Supp(P | X′) = ∅, u1 = ν(0, x), u2 = ν(1, x′), v1 = p(x, z), and v2 = p(x′, z′). The only constraint from property (2) is C(ν(1, x), p(x, z)) ≥ C(ν(1, x′), p(x′, z′)).

I now prove in two steps that the proposed function verifies property (2). Before going over these steps, I need a technical result.

Claim 2. For all u1 ≥ u2 and v1 ≥ v2,

P(u2 ≤ u ≤ u1, v2 ≤ v ≤ v1) = [ P(u ≤ u1, v ≤ v1) + P(u ≤ u2, v ≥ v1) ] − [ P(u ≤ u1, v ≤ v2) + P(u ≤ u2, v ≥ v2) ] ≥ 0.

First step: Let p ∈ Supp(P | X) ∩ Supp(P | X′) ≠ ∅.

(1) Let (x, x′) satisfy ν(0, x) ≥ ν(1, x′). Then

C(ν(0, x), p) − C(ν(1, x′), p)
= −( P(u ≤ ν(0, x), v ≥ p) + P(u ≤ ν(1, x′), v ≤ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + max[ L0(x, p), M0(x, p) ] }
≥ −( P(u ≤ ν(0, x), v ≥ p) + P(u ≤ ν(1, x′), v ≤ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + L0(x, p) }
≥ −( P(u ≤ ν(0, x), v ≥ p) + P(u ≤ ν(1, x′), v ≤ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + sup_{Ω−01(x)} sup_{P−(x∗, p)} P(u ≤ ν(1, x∗), v ≤ p∗) } ≥ 0.

The last inequality holds because p ∈ Supp(P | X), x′ ∈ Ω−01(x), and p ∈ P−(x′, p). In addition, C(ν(0, x), p) − C(ν(1, x′), p) is increasing in p by the first equality. Indeed, according to Claim 2, P(u ≤ ν(1, x′), v ≤ p) + P(u ≤ ν(0, x), v ≥ p) is decreasing in p, since ν(0, x) ≥ ν(1, x′). Then, for all p′ < p ∈ Supp(P | X) ∩ Supp(P | X′), we have C(ν(0, x), p) − C(ν(1, x′), p) ≥ C(ν(0, x), p′) − C(ν(1, x′), p′); thus, C(ν(0, x), p) − C(ν(1, x′), p) − C(ν(0, x), p′) + C(ν(1, x′), p′) ≥ 0. Therefore, property (2) is verified.

(2) Let (x, x′) satisfy ν(0, x) ≤ ν(1, x′). Then

C(ν(0, x), p) − C(ν(1, x′), p)
= −( P(u ≤ ν(0, x), v ≥ p) + P(u ≤ ν(1, x′), v ≤ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + max[ L0(x, p), M0(x, p) ] }
≤ −( P(u ≤ ν(0, x), v ≥ p) + P(u ≤ ν(0, x), v ≤ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + max[ L0(x, p), M0(x, p) ] }
≤ −P(u ≤ ν(0, x)) + sup_p { P(u ≤ ν(0, x), v ≥ p) + max[ L0(x, p), M0(x, p) ] }
≤ −P(u ≤ ν(0, x)) + SL0(x) ≤ 0.

The first inequality holds because ν(0, x) ≤ ν(1, x′). In addition, C(ν(0, x), p) − C(ν(1, x′), p) decreases in p by the first equality. Indeed, according to Claim 2, P(u ≤ ν(1, x′), v ≤ p) + P(u ≤ ν(0, x), v ≥ p) increases in p, since ν(0, x) ≤ ν(1, x′).

(3) Let (x, x′) satisfy ν(0, x) ≤ ν(0, x′). By interchanging x and x′ in point (4) below, we find that C(ν(0, x), p) − C(ν(0, x′), p) decreases in p and is less than 0.

(4) Let (x, x′) satisfy ν(0, x) ≥ ν(0, x′). Then

C(ν(0, x), p) − C(ν(0, x′), p)
= −( P(u ≤ ν(0, x), v ≥ p) − P(u ≤ ν(0, x′), v ≥ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + max[ L0(x, p), M0(x, p) ] } − sup_{p′} { P(u ≤ ν(0, x′), v ≥ p′) + max[ L0(x′, p′), M0(x′, p′) ] }
≥ −( P(u ≤ ν(0, x), v ≥ p) − P(u ≤ ν(0, x′), v ≥ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + M0(x, p) } − SL0(x′)
≥ −( P(u ≤ ν(0, x), v ≥ p) − P(u ≤ ν(0, x′), v ≥ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) + sup_{Ω−00(x)} sup_{P−(x∗, p)} ( SL0(x∗) − P(u ≤ ν(0, x∗), v ≥ p∗) ) } − SL0(x′)
≥ −( P(u ≤ ν(0, x), v ≥ p) − P(u ≤ ν(0, x′), v ≥ p) ) + sup_p { P(u ≤ ν(0, x), v ≥ p) − P(u ≤ ν(0, x′), v ≥ p) } ≥ 0.

The fourth inequality holds because x′ ∈ Ω−00(x) and p ∈ P−(x′, p). In addition, C(ν(0, x), p) − C(ν(0, x′), p) increases in p by the first equality.

Second step: Supp(P | X) ∩ Supp(P | X′) = ∅.

(1) ν(0, x) ≥ ν(1, x′) and p(x, z) ≥ p(x′, z′):
C(ν(0, x), p) − C(ν(1, x′), p′) ≥ C(ν(0, x), p) − C(ν(1, x′), p) ≥ 0.
The last inequality holds in accordance with point (1) of the first step.

(2) ν(0, x) ≤ ν(1, x′) and p(x, z) ≤ p(x′, z′):
C(ν(0, x), p) − C(ν(1, x′), p′) ≤ C(ν(0, x), p) − C(ν(1, x′), p) ≤ 0.
The last inequality holds in accordance with point (2) of the first step.

(3) ν(0, x) ≥ ν(0, x′) and p(x, z) ≥ p(x′, z′):
C(ν(0, x), p) − C(ν(0, x′), p′) ≥ C(ν(0, x), p) − C(ν(0, x′), p) ≥ 0.
The last inequality holds in accordance with point (4) of the first step.

(4) ν(0, x) ≤ ν(0, x′) and p(x, z) ≤ p(x′, z′):
C(ν(0, x), p) − C(ν(0, x′), p′) ≤ C(ν(0, x), p) − C(ν(0, x′), p) ≤ 0.
The last inequality holds in accordance with point (3) of the first step.

Then, property (2) holds. I can proceed in the same way for ν(0, x) = SU0(x). This completes the proof.

References

Andrews, D., Shi, X., 2014. Nonparametric inference based on conditional moment inequalities. J. Econometrics 179 (1), 31–45.
Chernozhukov, V., Lee, S., Rosen, A., 2013. Intersection bounds: estimation and inference. Econometrica 81 (2), 667–737.
Chesher, A., 2005. Nonparametric identification under discrete variation. Econometrica 73, 1525–1550.
Chiburis, R.C., 2010. Semiparametric bounds on treatment effects. J. Econometrics 159, 267–275.
Heckman, J., 1990. Varieties of selection bias. Amer. Econ. Rev. 80, 313–318.
Jun, S., Pinkse, J., Xu, H., 2010. Tighter bounds in triangular systems. J. Econometrics 161 (2), 122–128.
Nelsen, R.B., 2006. An Introduction to Copulas. Springer, New York.
Shaikh, A., Vytlacil, E., 2011. Partial identification in triangular systems of equations with binary dependent variables. Econometrica 79, 949–955.