Economics Letters 109 (2010) 24–27
Bounds on ATE with discrete outcomes

Jinyong Hahn
Department of Economics, UCLA, 8283 Bunche Hall, Mail Stop 147703, Los Angeles, CA 90095, USA
Article history: Received 16 November 2009. Received in revised form 22 June 2010. Accepted 6 July 2010. Available online 19 August 2010.

JEL classification: C31
Keywords: Average Treatment Effects; Bounds

Abstract

The bounds on ATE by Chesher (2007) and Kitagawa (2009b) are compared. The difference between them is attributed to the scalar error assumption imposed by Chesher (2007). © 2010 Elsevier B.V. All rights reserved.
1. Introduction

The past two decades saw tremendous growth in research on the estimation of treatment effects in the context of program evaluation. It is now well understood that the average treatment effect (ATE) is identified when the binary treatment variable is exogenous (given observable covariates), but that it is in general unidentified when the treatment is endogenous. Given this impossibility, two approaches emerged in the literature. Imbens and Angrist (1994) proposed using an instrument to focus on a feasible parameter, the local average treatment effect (LATE). Manski (2003) proposed identifying an interval that contains the parameter of interest.

More recently, Chesher (2007) and Kitagawa (2009b) suggested two additional approaches in the spirit of Manski (2003). Both approaches envisage an endogenous treatment along with instruments. It is of interest to compare the bounds on ATE implied by the two approaches. This is done here by restricting attention to the simple case where the outcome variable, the endogenous treatment variable and the instrument are all binary. The two bounds turn out to differ, and the purpose of this note is to reconcile them.

The model considered in this note is a simple one in which the outcome variable is binary as well, which facilitates the comparison of the two bounds. This simple structure raises two other possibilities. One can construct an intuitive bound by extending LATE. One can also use the linear programming approach to construct a bound on ATE. It is shown that Chesher's bound is strictly smaller than Kitagawa's bound, which happens to be identical to Manski's bound under mean independence, and that the gain can be attributed to Chesher's assumption that the binary outcome is generated by a scalar error term, which Kitagawa does not assume. Numerical evidence suggests that the Manski/Kitagawa bound is identical to the one based on the linear programming approach as long as the monotonicity assumption is satisfied.

0165-1765/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.econlet.2010.07.003

2. Bounds implied by existing literature

In this section, the four bounds introduced in the previous section are discussed in more detail. The following notation is adopted throughout. The observed binary outcome is denoted $Y$, the binary treatment variable is denoted $D$, and the binary instrument is denoted $Z$. The observed outcome $Y$ is generated by

$$Y = D\,Y(1) + (1-D)\,Y(0),$$

where $Y(1)$ and $Y(0)$ are the hypothetical outcomes under treatment and control. Our object of interest is the average treatment effect (ATE)

$$\beta = E[Y(1) - Y(0)].$$

The observed treatment $D$ is related to the instrument $Z$ through

$$D = Z\,D(1) + (1-Z)\,D(0),$$

where $D(1)$ and $D(0)$ are the hypothetical treatments when $Z = 1$ and $Z = 0$.
2.1. Bound implied by Imbens-Angrist (1994)

Imbens and Angrist (1994) define the local average treatment effect (LATE) to be

$$\lambda = E[Y(1) - Y(0) \mid D(1) \neq D(0)]$$

and show that¹

$$\lambda = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{P(1) - P(0)} \tag{1}$$

where $P(z) = E[D \mid Z=z]$. Note that

$$\beta = \lambda \times (P(1) - P(0)) + E[Y(1) - Y(0) \mid D(1) = D(0)] \times [1 - (P(1) - P(0))].$$

Imbens and Angrist (1994) do not identify $E[Y(1) - Y(0) \mid D(1) = D(0)]$, but we can bound it between $-1$ and $1$ using the fact that $Y$ is binary. It follows that

$$-1 + (P(1) - P(0)) + \lambda \times (P(1) - P(0)) \le \beta \le \lambda \times (P(1) - P(0)) + 1 - (P(1) - P(0)).$$

Using Eq. (1), we can rewrite it as²

$$-1 + (E[D \mid Z=1] - E[D \mid Z=0]) + E[Y \mid Z=1] - E[Y \mid Z=0] \le \beta \le E[Y \mid Z=1] - E[Y \mid Z=0] + 1 - (E[D \mid Z=1] - E[D \mid Z=0]). \tag{2}$$

Because $Y(1)$ and $Y(0)$ are binary, $\beta$ necessarily lies in the interval $[-1, 1]$, so the bound in Eq. (2) can be tightened a bit by intersecting it with $[-1, 1]$. The result is what I will call the bound implied by Imbens and Angrist (1994).

Remark 1. This is a naïve bound using the LATE parameter. It does not utilize all the information contained in their model, so one can argue that it is not correct to call it the Imbens-Angrist bound.

¹ I will assume that the monotonicity assumption in Imbens and Angrist (1994) is satisfied. I will assume that $D(1) \ge D(0)$, although it is not necessary.
² Note that, if $Z = D$, we then have $P(1) - P(0) = 1$, and then the ATE $\beta$ is point identified by Eq. (2).

2.2. Bound implied by Manski (2003) and Kitagawa (2009b)

One of Kitagawa's (2009a) objectives was to bound the outcome distributions, i.e., those of $Y(1)$ and $Y(0)$. His argument can be used to establish that

$$\max_z \Pr[Y=1, D=1 \mid Z=z] \le E[Y(1)] \le 1 - \max_z \Pr[Y=0, D=1 \mid Z=z]$$

and

$$\max_z \Pr[Y=1, D=0 \mid Z=z] \le E[Y(0)] \le 1 - \max_z \Pr[Y=0, D=0 \mid Z=z].$$

Note that these bounds are identical to Manski's (2003, Proposition 7.5) bound derived under the mean independence assumption. Subtracting the latter from the former, we obtain Kitagawa's (2009b, Proposition 4.1) bound:

$$\max_z E[DY \mid Z=z] + \max_z E[(1-D)(1-Y) \mid Z=z] - 1 \le E[Y(1)] - E[Y(0)] \le 1 - \max_z E[D(1-Y) \mid Z=z] - \max_z E[(1-D)Y \mid Z=z].$$

2.3. Bound implied by Chesher (2007)

Chesher (2007) considered a model of the form $Y = h(X, U)$ with $U$ uniformly distributed and independent of $Z$, and obtained a bound on $h(X, \tau)$. In our case, $Y$, $X$ and $Z$ are all binary, and $D = X$. The binary nature implies that there is a threshold crossing representation

$$Y = 1(U > p(D)). \tag{3}$$

We can connect the Imbens-Angrist-Rubin notation to Chesher's by noting that $Y(1) = 1(U > p(1))$ and $Y(0) = 1(U > p(0))$. Therefore, the ATE is equal to $\beta = (1 - p(1)) - (1 - p(0)) = p(0) - p(1)$. Chesher's fundamental inequalities take the form

$$\Pr[Y \le 1(\tau > p(D)) \mid Z=z] \ge \tau, \qquad \Pr[Y < 1(\tau > p(D)) \mid Z=z] < \tau$$

for all $\tau \in (0, 1)$. These inequalities can be used³ to give us the following bounds on $\beta = p(0) - p(1)$:

$$\max_z E[(1-D)(1-Y) \mid Z=z] + \max_z E[DY \mid Z=z] - 1 \le \beta \le \min_z E[Y \mid Z=z] - \max_z E[Y \mid Z=z] \tag{4}$$

or

$$\max_z E[Y \mid Z=z] - \min_z E[Y \mid Z=z] \le \beta \le 1 - \max_z E[(1-D)Y \mid Z=z] - \max_z E[D(1-Y) \mid Z=z]. \tag{5}$$
2.4. Comparison between Chesher and Manski/Kitagawa

We can see that the left end point of Eq. (4) is identical to the left end point of the Manski/Kitagawa bound. We can also see that the right end point of Eq. (5) is identical to the right end point of the Manski/Kitagawa bound. This suggests that the Chesher bound is narrower than the Manski/Kitagawa bound, which is in some sense natural because Chesher's model satisfies all the assumptions of Imbens and Angrist (1994). A natural question is whether the Manski/Kitagawa bound is identical to the Chesher bound if the model satisfies Chesher's assumptions. The following example shows that the Chesher bound is strictly contained in the Manski/Kitagawa bound even in that case.

Suppose, as in Chesher (2007), that there indeed is a scalar error $U$ that affects both $Y(1)$ and $Y(0)$. We assume that $D = 1(V > q(Z))$ and $Y = 1(U > p(D))$, where $U$ and $V$ are uniform $(0, 1)$ random variables, and that $q(0) = 1/3$, $q(1) = 2/3$, $p(0) = 1/3$, and $p(1) = 2/3$. Note that $\beta = -1/3$. If $U$ and $V$ are independent of each other, it can be shown that the Manski/Kitagawa bound is $[-5/9, 1/9]$, whereas the Chesher bound is $[-5/9, -1/9] \cup [1/9, 1/9] = [-5/9, -1/9] \cup \{1/9\}$.⁴
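The numbers in this example can be checked with a short Python sketch. It is an illustration only: the function names `cell_probs`, `mk_bound` and `chesher_intervals` are hypothetical and do not come from the paper, and the sketch assumes independent uniform $U$ and $V$ as in the example. It builds the eight observable probabilities $\Pr[Y=y, D=d \mid Z=z]$ and evaluates the Manski/Kitagawa bound together with the two intervals in Eqs. (4) and (5), using exact rational arithmetic so that end points can be compared directly.

```python
from fractions import Fraction as F

def cell_probs(q, p):
    """Pr[Y=y, D=d | Z=z] when U and V are independent uniform(0,1),
    D = 1(V > q[z]) and Y = 1(U > p[D]). Keys are (y, d, z)."""
    probs = {}
    for z in (0, 1):
        for d in (0, 1):
            pr_d = 1 - q[z] if d == 1 else q[z]      # Pr[D=d | Z=z]
            for y in (0, 1):
                pr_y = 1 - p[d] if y == 1 else p[d]  # Pr[Y=y | D=d]
                probs[(y, d, z)] = pr_d * pr_y
    return probs

def mk_bound(pr):
    """Manski/Kitagawa bound on the ATE from the observable cells."""
    zs = (0, 1)
    lo = max(pr[(1, 1, z)] for z in zs) + max(pr[(0, 0, z)] for z in zs) - 1
    hi = 1 - max(pr[(0, 1, z)] for z in zs) - max(pr[(1, 0, z)] for z in zs)
    return lo, hi

def chesher_intervals(pr):
    """The two intervals given by Eq. (4) and Eq. (5)."""
    ey = [pr[(1, 0, z)] + pr[(1, 1, z)] for z in (0, 1)]  # E[Y | Z=z]
    lo, hi = mk_bound(pr)
    return (lo, min(ey) - max(ey)), (max(ey) - min(ey), hi)

# Parameter values of the example in the text.
q = {0: F(1, 3), 1: F(2, 3)}
p = {0: F(1, 3), 1: F(2, 3)}
pr = cell_probs(q, p)
beta = p[0] - p[1]  # ATE under the threshold crossing model

print("ATE:", beta)
print("Manski/Kitagawa bound:", mk_bound(pr))
print("Chesher intervals:", chesher_intervals(pr))
```

With these inputs the sketch returns $[-5/9, 1/9]$ for the Manski/Kitagawa bound and the pair $[-5/9, -1/9]$, $[1/9, 1/9]$ for the Chesher intervals, in line with the values stated in the example.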
³ Proof is available upon request.
⁴ Proof is available upon request.

Even if the model follows Chesher's assumption, the Manski/Kitagawa bound is wider than Chesher's in general. The difference between the two bounds seems to be attributable to the difference between the models that Chesher (2007) and Kitagawa (2009b) consider. Chesher assumes the existence of a scalar error $U$ that affects both $Y(1)$ and $Y(0)$. Considering the pair $(Y(1), Y(0))$ from the switching regression perspective, the scalar error assumption is stronger than the recent tradition in the program evaluation literature. See, e.g., Heckman (1990). The scalar error assumption places a restriction on the class of distributions of $(D(0), D(1), Y(0), Y(1))$, which seems to explain the difference in bounds. For example, it can be shown⁵ that the single error assumption imposes the restriction

$$\min(\Pr[Y(1)=1], \Pr[Y(0)=1]) \le \Pr[Y=1 \mid Z=z] \le \max(\Pr[Y(1)=1], \Pr[Y(0)=1]), \tag{6}$$

which may or may not be satisfied by the "true" underlying distribution of $(D(0), D(1), Y(0), Y(1), Z)$.⁶ The bounds of ATE are obtained by considering a class of possible true distributions of $(D(0), D(1), Y(0), Y(1), Z)$ that do not contradict the joint distribution of $(D, Y, Z)$. The scalar error restriction enables Chesher to consider a smaller class of distributions, which results in tighter bounds. On the other hand, the scalar error assumption does not seem to be a testable restriction.⁷ More precisely, as long as $(D(0), D(1), Y(0), Y(1))$ is independent of $Z$ and the monotonicity assumption is satisfied, there exists some $(D^*(0), D^*(1), Y^*(0), Y^*(1))$ satisfying the scalar error assumption such that the conditional distribution of $(D^*, Y^*)$ given $Z$, where $D^* = Z D^*(1) + (1-Z) D^*(0)$ and $Y^* = D^* Y^*(1) + (1-D^*) Y^*(0)$, is identical to that of $(D, Y)$.

⁵ See Appendix B.
⁶ It is not immediately clear whether this particular restriction is testable.
⁷ See Appendix A.

2.5. Balke and Pearl (1994)

Balke and Pearl (1994) proposed a linear programming based method of characterizing the bound. The characterization begins with the full specification of the distribution of $(Y(1), Y(0), D(1), D(0))$. We write

$$\xi(y_1, y_0, d_1, d_0) = \Pr[Y(1)=y_1, Y(0)=y_0, D(1)=d_1, D(0)=d_0],$$

and our object of interest is

$$\beta = \sum_{y_0, d_1, d_0} \xi(1, y_0, d_1, d_0) - \sum_{y_1, d_1, d_0} \xi(y_1, 1, d_1, d_0). \tag{7}$$

Because

$$\begin{aligned}
\Pr[Y=y, D=d \mid Z=z] &= \Pr[Y=y, D(z)=d \mid Z=z] \\
&= \Pr[Y(1)D(z) + Y(0)(1-D(z)) = y,\ D(z)=d \mid Z=z] \\
&= \Pr[Y(1)D(z) + Y(0)(1-D(z)) = y,\ D(z)=d] \\
&= \Pr[Y(1)D(z) + Y(0)(1-D(z)) = y \mid D(z)=d]\,\Pr[D(z)=d],
\end{aligned}$$

we observe the following from the conditional (on $Z$) probabilities:

$$\begin{aligned}
\Pr[Y=1, D=1 \mid Z=z] &= \Pr[Y(1)=1, D(z)=1] \\
\Pr[Y=1, D=0 \mid Z=z] &= \Pr[Y(0)=1, D(z)=0] \\
\Pr[Y=0, D=1 \mid Z=z] &= \Pr[Y(1)=0, D(z)=1] \\
\Pr[Y=0, D=0 \mid Z=z] &= \Pr[Y(0)=0, D(z)=0]
\end{aligned}$$

More precisely, we observe

$$\begin{aligned}
\Pr[Y=1, D=1 \mid Z=1] &= \Pr[Y(1)=1, D(1)=1] = \sum_{y_0, d_0} \xi(1, y_0, 1, d_0) \\
\Pr[Y=1, D=1 \mid Z=0] &= \Pr[Y(1)=1, D(0)=1] = \sum_{y_0, d_1} \xi(1, y_0, d_1, 1) \\
&\ \ \vdots
\end{aligned}$$

The bound for $\beta$ can then be represented as an LP problem maximizing (minimizing) the objective function in Eq. (7) subject to the eight equality restrictions above plus one more restriction (that the $\xi$s add up to one).

3. Some numerical calculations

In order to assess the properties of the four bounds, the probabilities

$$\Pr[Y(1)=y_1, Y(0)=y_0, D(1)=d_1, D(0)=d_0] \tag{8}$$

were generated randomly from a Dirichlet distribution with sixteen components.⁸ The probabilities were then transformed into the eight observable quantities $\Pr[Y=y, D=d \mid Z=z]$, based on which the four bounds of the ATE were computed. The results are based on 5000 simulations.

The bounds generated by linear programming were compared with the ones implied by Manski (2003) and Kitagawa (2009b). The average of the absolute difference of the left end points was $3.2 \times 10^{-4}$, and the average of the absolute difference of the right end points was $3.6 \times 10^{-4}$. In fact, in 4817 and 4836 out of 5000 cases, the differences were smaller than $10^{-8}$ and were recorded to be identically equal to 0.

The bound implied by Chesher (2007) did not contain the true ATE in many cases, 36.4% of the cases. This is expected because the simulation design did not impose the single error restriction. Out of those cases where the true ATEs were indeed covered by the Chesher bounds, it was found that the Chesher bound consists of two disjoint intervals in 92.9% of the cases. For the remaining 7.1% of the cases, the Chesher bound took the form of one connected interval. The average length of the Chesher bounds was about 84.3% of the average length of the LP bounds.⁹

The simulation design in Eq. (8) may be subject to the criticism that Imbens and Angrist's (1994) monotonicity restriction is not imposed. Because Chesher's model is observationally equivalent to Imbens and Angrist's model, such an additional restriction seems reasonable. This can be done by setting

$$\Pr[Y(1)=y_1, Y(0)=y_0, D(1)=0, D(0)=1] = 0 \tag{9}$$

for all four $(y_1, y_0)$ combinations, which was essentially done by generating the probabilities randomly from a Dirichlet distribution with twelve components. As before, the probabilities were then transformed into the eight observable quantities $\Pr[Y=y, D=d \mid Z=z]$, based on which the four bounds of the ATE were computed. As before, the results are based on 5000 simulations.

The monotonicity restriction brings up the question regarding the validity of the LP approach discussed in Section 2.5, which did not impose the zero restriction on the four probabilities in Eq. (9). Therefore, two different bounds were produced based on two different LPs. The results were numerically identical for all practical purposes. The absolute difference of the lower bounds was about $2.5 \times 10^{-10}$ on average, and the absolute difference of the upper bounds was about $2.8 \times 10^{-10}$ on average. This suggests that the monotonicity restriction is not necessary for bound calculation. The absolute difference between the lower bounds of the new LP and the Manski/Kitagawa counterpart was about $1.5 \times 10^{-10}$ on average, and the absolute difference between the upper bounds of the new LP and the Manski/Kitagawa counterpart was about $1.8 \times 10^{-10}$ on average.

The bound implied by Chesher (2007) continues to exclude the true ATE in many cases, 27.1% of the cases, and the average length of the Chesher bounds was about 80.1% of the average length of the LP bounds. Although Chesher's model is observationally equivalent to Imbens and Angrist's model, the bounds are essentially constructed by considering a class of all possible underlying distributions of $(D(0), D(1), Y(0), Y(1), Z)$ satisfying the scalar error assumption. Because of the scalar error restriction, such a class of distributions is a proper subset of the set of all possible underlying distributions. As a consequence, Chesher's bound is smaller than the Manski/Kitagawa bound, and sometimes misses the true ATE.¹⁰

⁸ To be more precise, sixteen independent random variables were generated, each with an exponential distribution with mean equal to 1. They were then divided by their sum, and the results were taken to be the joint probabilities in Eq. (8).
⁹ The LP or Kitagawa (2009) bounds were not very useful in determining the sign of the average treatment effects. In 99.4% of the cases, the bounds included zero, i.e., the identified intervals could not be used to sign the average treatment effects. When the Chesher bound did contain the ATE and took the form of one connected interval, it signed the ATE correctly. These findings about the sign of the ATE are probably very context specific.
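The simulation design of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the names `draw_xi`, `observables`, `mk_bound` and `true_ate` are hypothetical, the seed and the number of draws (1000) are arbitrary, and only the Manski/Kitagawa bound is evaluated, so the LP and Chesher bounds are left out. The joint probabilities are obtained by normalizing independent unit-mean exponential draws, and they are mapped to the observables through $\Pr[Y=y, D=d \mid Z=z] = \Pr[Y(d)=y, D(z)=d]$. Passing `monotone=True` drops the four cells with $D(1)=0$, $D(0)=1$, which corresponds to the twelve-component design of Eq. (9).

```python
import itertools
import random

def draw_xi(rng, monotone=False):
    """Random joint probabilities xi[(y1, y0, d1, d0)]: unit-mean exponential
    draws divided by their sum. With monotone=True the cells with
    D(1)=0, D(0)=1 are excluded, leaving twelve components."""
    keys = [k for k in itertools.product((0, 1), repeat=4)
            if not (monotone and k[2] == 0 and k[3] == 1)]
    draws = [rng.expovariate(1.0) for _ in keys]
    total = sum(draws)
    return {k: w / total for k, w in zip(keys, draws)}

def observables(xi):
    """Pr[Y=y, D=d | Z=z] keyed by (y, d, z)."""
    pr = {k: 0.0 for k in itertools.product((0, 1), repeat=3)}
    for (y1, y0, d1, d0), mass in xi.items():
        for z in (0, 1):
            d = d1 if z == 1 else d0   # D = Z D(1) + (1-Z) D(0)
            y = y1 if d == 1 else y0   # Y = D Y(1) + (1-D) Y(0)
            pr[(y, d, z)] += mass
    return pr

def mk_bound(pr):
    """Manski/Kitagawa bound on the ATE from the observable cells."""
    zs = (0, 1)
    lo = max(pr[(1, 1, z)] for z in zs) + max(pr[(0, 0, z)] for z in zs) - 1
    hi = 1 - max(pr[(0, 1, z)] for z in zs) - max(pr[(1, 0, z)] for z in zs)
    return lo, hi

def true_ate(xi):
    """E[Y(1) - Y(0)] under the drawn joint distribution."""
    return sum(mass * (y1 - y0) for (y1, y0, _, _), mass in xi.items())

rng = random.Random(0)   # arbitrary seed, for reproducibility only
n_sim = 1000             # the text uses 5000 simulations
covered = 0
for _ in range(n_sim):
    xi = draw_xi(rng)
    lo, hi = mk_bound(observables(xi))
    if lo - 1e-12 <= true_ate(xi) <= hi + 1e-12:
        covered += 1
print("share of draws with the true ATE inside the bound:", covered / n_sim)
```

Because the Manski/Kitagawa bound does not rely on the scalar error assumption, every draw in this sketch should place the true ATE inside the bound, which is the contrast with the Chesher bound discussed in the text.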
Acknowledgments

This note was inspired by a conversation that I had with Andrew Chesher, to whom I am indebted. I am grateful to Guido Imbens and Toru Kitagawa for their helpful comments. David Kang provided excellent research assistance, which is greatly appreciated. Financial support for this research was generously provided through NSF SES 0921187 and SES 0819638.
Appendix A. Falsifiability of Scalar Error Assumption

Our maintained assumption is that $(D(0), D(1), Y(0), Y(1))$ is independent of $Z$, and that the monotonicity assumption is satisfied. By Vytlacil (2002), we may assume without loss of generality that $D(0) = 1(V > q(0))$ and $D(1) = 1(V > q(1))$ with $V$ uniformly distributed over $(0, 1)$. We will assume that $\Pr[D=1 \mid Z=1] < \Pr[D=1 \mid Z=0]$, which is equivalent to $q(0) < q(1)$. This is harmless because we can always relabel the binary instrument $Z$. We will also assume that $E[Y \mid Z=1] > E[Y \mid Z=0]$, so that $m_2$ below is positive. (The case $E[Y \mid Z=0] > E[Y \mid Z=1]$ can be handled similarly, and will not be discussed.) Let

$$\begin{aligned}
m_1 &= E[DY \mid Z=0] - E[DY \mid Z=1] = E[D(0)Y(1)] - E[D(1)Y(1)] \\
&= E[1(q(0) < V < q(1))\,Y(1)] \ge 0, \\
m_2 &= E[Y \mid Z=1] - E[Y \mid Z=0] = E[(D(1) - D(0))(Y(1) - Y(0))] \\
&= E[(D(0) - D(1))(Y(0) - Y(1))] = E[1(q(0) < V < q(1))\,(Y(0) - Y(1))] > 0.
\end{aligned}$$

Note that

$$m_1 + m_2 + E[1(q(0) < V < q(1))\,(1 - Y(0))] = E[1(q(0) < V < q(1))] = q(1) - q(0),$$

and that

$$E[1(q(0) < V < q(1))\,(1 - Y(0))] \ge 0.$$

The latter inequality follows because $1 - Y(0)$ is a 0–1 variable. Now, let

$$p^*(1) = \frac{2}{3}, \qquad p^*(0) = \frac{1}{3},$$

and define the conditional PDF $f(u \mid v)$ of $U^*$ given $V$ as follows:

$$f(u \mid v) = \begin{cases}
3\,\dfrac{E[DY \mid Z=1]}{1 - q(1)} & \text{if } v > q(1),\ u > p^*(1) \\[2ex]
\dfrac{3}{2}\,\dfrac{E[D(1-Y) \mid Z=1]}{1 - q(1)} & \text{if } v > q(1),\ u < p^*(1) \\[2ex]
\dfrac{3}{2}\,\dfrac{E[(1-D)Y \mid Z=0]}{q(0)} & \text{if } v < q(0),\ u > p^*(0) \\[2ex]
3\,\dfrac{E[(1-D)(1-Y) \mid Z=0]}{q(0)} & \text{if } v < q(0),\ u < p^*(0) \\[2ex]
3\,\dfrac{m_1}{q(1) - q(0)} & \text{if } q(0) < v < q(1),\ u > p^*(1) \\[2ex]
3\,\dfrac{m_2}{q(1) - q(0)} & \text{if } q(0) < v < q(1),\ p^*(0) < u < p^*(1) \\[2ex]
3\left(1 - \dfrac{m_1}{q(1) - q(0)} - \dfrac{m_2}{q(1) - q(0)}\right) & \text{if } q(0) < v < q(1),\ u < p^*(0)
\end{cases}$$

Now, let $D^* = 1(V > q(Z))$, $Y^*(0) = 1(U^* > p^*(0))$, $Y^*(1) = 1(U^* > p^*(1))$, and $Y^* = D^* Y^*(1) + (1 - D^*) Y^*(0)$. We can see that the conditional distribution of $(D^*, Y^*)$ given $Z$ is identical to that of $(D, Y)$.

Appendix B. Proof of Eq. (6)

Note that we have $Y = 1(U > p(D))$ in Chesher's model. This structure implies that $Y(1) = 1(U > p(1))$ and $Y(0) = 1(U > p(0))$, which implies that

$$\begin{aligned}
\Pr[D(z)=1 \mid Y(1)=1] &= \Pr[D(z)=1 \mid U > p(1)], \\
\Pr[D(z)=0 \mid Y(0)=1] &= \Pr[D(z)=0 \mid U > p(0)].
\end{aligned}$$

Writing $\mu(u) = \Pr[D(z)=1 \mid U=u]$, we have

$$\begin{aligned}
\Pr[D(z)=1 \mid Y(1)=1] &= \int \mu(u)\,1(u > p(1))\,du, \\
\Pr[D(z)=0 \mid Y(0)=1] &= \int (1 - \mu(u))\,1(u > p(0))\,du.
\end{aligned}$$

It is easy to see that $\Pr[D(z)=1 \mid Y(1)=1] + \Pr[D(z)=0 \mid Y(0)=1]$ is bounded between 0 and 1. It follows that

$$\begin{aligned}
\Pr[Y=1 \mid Z=z] &= \Pr[Y(1)=1, D(z)=1] + \Pr[Y(0)=1, D(z)=0] \\
&= \Pr[D(z)=1 \mid Y(1)=1]\,\Pr[Y(1)=1] + \Pr[D(z)=0 \mid Y(0)=1]\,\Pr[Y(0)=1]
\end{aligned}$$

is bounded between $\min(\Pr[Y(1)=1], \Pr[Y(0)=1])$ and $\max(\Pr[Y(1)=1], \Pr[Y(0)=1])$.

¹⁰ Out of those cases where the true ATEs were indeed covered by the Chesher bounds, it was found that the Chesher bound consists of two disjoint intervals in 83.4% of the cases. For the remaining 16.6% of the cases, the Chesher bound took the form of one connected interval. The LP or Kitagawa (2009) bounds were not very useful in determining the sign of the average treatment effects. In 94.2% of the cases, the bounds could not be used to sign the average treatment effects. Again, these findings about the sign of the ATE are probably very context specific.

References
Balke, A., Pearl, J., 1994. Counterfactual Probabilities: Computational Methods, Bounds and Applications. Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, pp. 46–54.
Chesher, A., 2007. Endogeneity and Discrete Outcomes, CEMMAP working paper.
Heckman, J.J., 1990. Varieties of Selection Bias. Am. Econ. Rev. 313–318.
Imbens, G.W., Angrist, J.D., 1994. Identification and Estimation of Local Average Treatment Effects. Econometrica 62, 467–475.
Kitagawa, T., 2009a. "Testing for Instrument Independence in the Selection Model", unpublished manuscript. Brown University.
Kitagawa, T., 2009b. "Identification Region of the Potential Outcome Distributions under Instrument Independence", unpublished manuscript. CEMMAP.
Manski, C., 2003. Partial Identification of Probability Distributions. Springer-Verlag, New York.
Vytlacil, E., 2002. Independence, Monotonicity, and Latent Index Models: An Equivalence Result. Econometrica 70, 331–341.