Economics Letters 125 (2014) 360–363
A note on the identification in two equations probit model with dummy endogenous regressor✩
Ismael Mourifié a, Romuald Méango b,∗
a Department of Economics, University of Toronto, 150 St. George Street, Toronto ON M5S 3G7, Canada
b Ifo Institute, Munich, Poschingerstr. 5, 81679, Munich, Germany
Highlights
• We study identification in two-equation probit models with an endogenous dummy regressor.
• Parameters are set identified without an exclusion restriction.
• Numerical evidence contradicts Wilde (2000).
Article info
Article history: Received 14 October 2013; received in revised form 9 October 2014; accepted 9 October 2014; available online 22 October 2014
JEL classification: C35
Keywords: Probit model; Endogenous dummy regressor; Partial identification

Abstract
This paper deals with the question whether exclusion restrictions on the exogenous regressors are necessary to identify two equation probit models with endogenous dummy regressor. We show that Wilde (2000)’s criterion is insufficient for (point) identification. © 2014 Elsevier B.V. All rights reserved.
1. Introduction

This paper discusses identification in the following two-equation probit model with an endogenous dummy regressor:

$$Y_1 = I\left(x_1^T \beta_1 + u_1 > 0\right) \tag{1}$$

$$Y_2 = I\left(\delta Y_1 + x_2^T \beta_2 + u_2 > 0\right) \tag{2}$$

where $(u_1, u_2)$ follows $N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$, $I(A) = 1$ if $A$ is true and zero otherwise, and $\rho \in (-1, 1)$ (see Sartori, 2003, for a treatment of the case $\rho = 1$). Throughout the paper,
✩ Financial support by the Leibniz Association (SAW-2012-ifo-3) is gratefully acknowledged. The authors are grateful to Joris Pinkse, Rachidi Kotchoni, Marc Henry and an anonymous referee for helpful discussions and comments.
∗ Corresponding author. E-mail address: [email protected] (R. Méango).
http://dx.doi.org/10.1016/j.econlet.2014.10.006
0165-1765/© 2014 Elsevier B.V. All rights reserved.
we shall use the notation $\Phi_2(\cdot, \cdot; \rho)$ to denote the standard bivariate normal cumulative distribution function with correlation parameter $\rho$, and $\phi_2(\cdot, \cdot; \rho)$ the corresponding bivariate density function. We denote by $\Phi(\cdot)$ the univariate standard normal cumulative distribution function and by $\phi(\cdot)$ the corresponding density function. We say that there exists an exclusion restriction when there exists a variable in $x_1$ that does not appear in $x_2$.

Two main opinions dominate the literature about identification in this model. On the one hand, Maddala (1983, p. 122) claimed that an exclusion variable is necessary for identification. His argument was based on the following fact: in the case where $x_1 = x_2 = 1$, the model has four parameters to be identified, $\beta_1$, $\beta_2$, $\delta$ and $\rho$, while we observe only three independent probabilities. Adding an exclusion variable increases the number of observed independent probabilities, so that it becomes larger than or equal to the number of parameters to be identified. On the other hand, Wilde (2000) notes that even without an exclusion variable, the presence of a single common dichotomous covariate might result in the number of observed independent probabilities equaling the number of parameters to
[Fig. 1. Numerical results: $f(\rho)$ (solid blue line). The straight dotted line is the observed probability $P_{111}$. $\beta_{11} = 0.3$, $\beta_{12} = 0.4$, $\beta_{22} = 0.5$, $\delta = 0.3$. Panel (a): $(\rho, f(\rho))$ region, $\beta_{21} = -0.4$, $\rho_0 = -0.3$. Panel (b): $(\rho, f(\rho))$ region, $\beta_{21} = 0.4$, $\rho_0 = 0.5$.]
be identified. Therefore, following the assertion of Heckman (1978, p. 957) in a more general context, Wilde (2000) argued that only the full rank of the (regressor) data matrix is needed to identify all the model parameters.

We show that the simple criterion proposed by Wilde (2000) and the rank condition proposed by Heckman (1978) are not sufficient to ensure identification in Models (1)–(2), for the following reason: the fact that the number of independent probabilities is larger than or equal to the number of unknowns does not ensure uniqueness of the solution, since the system of equations is nonlinear in the parameters. We provide numerical evidence that contradicts the result of Wilde and suggests that the model without exclusion is usually only partially identified. Finally, we point out that, besides increasing the number of observed independent probabilities, an exclusion variable has the intrinsic feature of shifting the selection equation (1) while keeping the outcome equation (2) fixed, which allows us to point identify the model. All our results also hold for a sample selection model with binary outcome.

2. Failure of point identification

We consider the simple case where a dichotomous regressor enters both equations. In (1) and (2), let $x_1^T = x_2^T = [1, x]$, where $x \in \{0, 1\}$ is a binary regressor, and let $\beta_1 = [\beta_{11}, \beta_{12}]^T$ and $\beta_2 = [\beta_{21}, \beta_{22}]^T$ be the associated parameters. As noted by Wilde, we now observe 6 independent probabilities, and we have 6 parameters to identify, i.e., $(\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}, \delta, \rho)$. We will use the following notation:

$$P_{ijk} \equiv P(Y_1 = i, Y_2 = j \mid x = k) \quad \text{for all } (i, j, k) \in \{0; 1\}^3. \tag{3}$$
Wilde argued that with 6 independent equations and 6 parameters, there is now enough variation in the model to identify the parameters, unlike in the case without covariates, where there were 3 independent equations and 4 parameters. Although this argument is sensible when the equations are linear in the parameters, it is likely to fail when linearity or monotonicity does not hold. For instance, consider the following trivial nonlinear single equation with one parameter: $\rho^2 - \frac{1}{4} = 0$. It has one equation and one unknown, yet two solutions, $\rho = 1/2$ and $\rho = -1/2$.
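First, note that $\beta_1 = [\beta_{11}, \beta_{12}]^T$ is identified under the usual assumptions of a probit model with outcome variable $Y_1$. Second, since the error terms are jointly normally distributed with correlation $\rho$, we can write $u_1 = \rho u_2 + e$, where $e$ follows $N(0, 1 - \rho^2)$ and $e$ is independent of $u_2$. Therefore, we can derive the following equalities:

$$P_{010} = \int_{-\beta_{21}}^{+\infty} \Phi\left(-\frac{\beta_{11} + \rho y}{\sqrt{1 - \rho^2}}\right) \phi(y)\, dy \tag{4}$$

$$P_{110} = \int_{-\beta_{21} - \delta}^{+\infty} \Phi\left(\frac{\beta_{11} + \rho y}{\sqrt{1 - \rho^2}}\right) \phi(y)\, dy \tag{5}$$

$$P_{011} = \int_{-\beta_{21} - \beta_{22}}^{+\infty} \Phi\left(-\frac{\beta_{11} + \beta_{12} + \rho y}{\sqrt{1 - \rho^2}}\right) \phi(y)\, dy \tag{6}$$

$$P_{111} = \int_{-\beta_{21} - \beta_{22} - \delta}^{+\infty} \Phi\left(\frac{\beta_{11} + \beta_{12} + \rho y}{\sqrt{1 - \rho^2}}\right) \phi(y)\, dy. \tag{7}$$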
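As an illustration of Eqs. (4)–(7), the following minimal sketch evaluates the four integrals numerically. It is not the authors' Matlab routine: the function names, the truncation of the upper limit at 10, and the use of Simpson's rule are assumptions made here for illustration only.

```python
import math

def norm_pdf(y):
    # Standard normal density phi(y)
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    # Standard normal CDF Phi(t), written with the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def upper_integral(lower, a, rho, sign, upper=10.0, n=2000):
    """Simpson approximation of
    int_lower^inf Phi(sign * (a + rho*y) / sqrt(1 - rho^2)) phi(y) dy,
    with the infinite upper limit truncated at `upper` (an assumption)."""
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - lower) / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        y = lower + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * norm_cdf(sign * (a + rho * y) / s) * norm_pdf(y)
    return total * h / 3.0

def cell_probabilities(b11, b12, b21, b22, delta, rho):
    # Right-hand sides of Eqs. (4)-(7), in that order
    return {
        "P010": upper_integral(-b21, b11, rho, -1.0),
        "P110": upper_integral(-b21 - delta, b11, rho, +1.0),
        "P011": upper_integral(-b21 - b22, b11 + b12, rho, -1.0),
        "P111": upper_integral(-b21 - b22 - delta, b11 + b12, rho, +1.0),
    }

# Parameter values of Fig. 1(a)
probs = cell_probabilities(0.3, 0.4, -0.4, 0.5, 0.3, -0.3)
```

When $\rho = 0$ the integrand factorizes, so that, for example, Eq. (4) reduces to $\Phi(-\beta_{11})\,\Phi(\beta_{21})$; this gives a simple check of the numerical integration.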
Note now the following: since $\beta_{11}$ is identified and the integrand is always positive, once a value of $\rho$ is fixed, the right-hand side (RHS) of Eq. (4) is strictly monotone in $\beta_{21}$. It follows that we identify a unique value of $\beta_{21}$ given $\rho$. Applying the same recursive solving strategy to Eqs. (5)–(6), we find that all the parameters are identified given a value of $\rho$. The question is whether $\rho$ will be identified once we also consider (7). Once we solve the first three equations for $\beta_2 = [\beta_{21}, \beta_{22}]^T$ and $\delta$ given $\rho$, the lower limit of the integral on the RHS of Eq. (7) depends on $\rho$, and the RHS is not necessarily monotone with respect to $\rho$. The following numerical results suggest that this function is nonmonotone and that several values of $\rho$ might solve the system of equations.¹ Denote by $f(\rho)$ the RHS of Eq. (7):

$$f(\rho) = \int_{-\beta_{21}^*(\rho) - \beta_{22}^*(\rho) - \delta^*(\rho)}^{+\infty} \Phi\left(\frac{\beta_{11} + \beta_{12} + \rho y}{\sqrt{1 - \rho^2}}\right) \phi(y)\, dy \tag{8}$$

where $\beta_{21}^*(\rho)$, $\beta_{22}^*(\rho)$, $\delta^*(\rho)$ solve Eqs. (4)–(6) given $\rho$. Fig. 1 plots $f(\cdot)$ for $\rho \in (-1, 1)$ given different values of the other parameters.² Considering the first set of parameters (Fig. 1(a)), $f(\rho)$ exhibits a nonmonotonic behavior, increasing first, then decreasing after
¹ Details of the numerical exercise are given in the Appendix. The routines can be found at the following link: https://sites.google.com/site/ismaelymourifie/research-papers.
² Note that the endpoints, where $\rho$ approaches $-1$ and $1$, are trimmed for better readability. The numerical approximation behaves poorly there, mainly because of the term $\sqrt{1 - \rho^2}$ in the denominator.
[Fig. 2. Numerical results: $\beta_2^{z}(\rho)$ and $\beta_2^{z'}(\rho)$. β11 = 0.3, β20 = 0.4, β22 = 0.0, δ0 = 0.3, ρ0 = 0.5. In Fig. 2(a) (Fig. 2(b)), $\beta_2^{z}(\rho)$, with the lowest slope (highest slope), and $\beta_2^{z'}(\rho)$, with the highest slope (lowest slope), are increasing and intersect each other at a single point. Panel (a): $(\rho, \beta_{22})$ region, $\beta_{12} = 0.4$. Panel (b): $(\rho, \delta)$ region, $\beta_{12} = -0.6$.]
reaching a maximum. The identified set consists of two points. Considering the second set of parameters (Fig. 1(b)), we observe that $f(\rho)$ is (relatively) flat in the neighborhood of $P_{111}$, suggesting weak or set identification. Several values of $\rho$ in the interval $[0.483; 0.596]$ deliver probability values close to the observed value ($|f(\rho) - P_{111}| < 10^{-4}$).
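The recursive strategy behind $f(\rho)$ can be sketched as follows. This is a hedged illustration rather than the authors' code: it replaces their Monte Carlo averages and fminsearch by Simpson integration and bisection, and all function names, the bracket $[-8, 8]$ and the grid of $\rho$ values are assumptions chosen here. Bisection applies because each integral is strictly decreasing in its lower limit.

```python
import math

def norm_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def rhs(lower, a, rho, sign, upper=8.0, n=400):
    """Simpson approximation of the integrals in Eqs. (4)-(7),
    with the upper limit truncated at `upper` (an assumption)."""
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        y = lower + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * norm_cdf(sign * (a + rho * y) / s) * norm_pdf(y)
    return total * h / 3.0

def solve_lower(target, a, rho, sign, lo=-8.0, hi=8.0, iters=40):
    """Bisection on the lower integration limit; rhs is decreasing in it."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rhs(mid, a, rho, sign) > target:
            lo = mid  # integral too large: raise the lower limit
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_given_rho(rho, b11, b12, p010, p110, p011):
    """Solve Eqs. (4)-(6) for (b21, b22, delta) at a fixed rho."""
    c0 = solve_lower(p010, b11, rho, -1.0)        # c0 = -b21
    c1 = solve_lower(p110, b11, rho, +1.0)        # c1 = -b21 - delta
    c2 = solve_lower(p011, b11 + b12, rho, -1.0)  # c2 = -b21 - b22
    return -c0, c0 - c2, c0 - c1

def f(rho, b11, b12, p010, p110, p011):
    """RHS of Eq. (7) evaluated at the solution of Eqs. (4)-(6), i.e. Eq. (8)."""
    b21, b22, delta = solve_given_rho(rho, b11, b12, p010, p110, p011)
    return rhs(-b21 - b22 - delta, b11 + b12, rho, +1.0)

# "Observed" probabilities generated from the parameter values of Fig. 1(a)
b11, b12, b21, b22, delta, rho0 = 0.3, 0.4, -0.4, 0.5, 0.3, -0.3
p010 = rhs(-b21, b11, rho0, -1.0)
p110 = rhs(-b21 - delta, b11, rho0, +1.0)
p011 = rhs(-b21 - b22, b11 + b12, rho0, -1.0)
p111 = rhs(-b21 - b22 - delta, b11 + b12, rho0, +1.0)

# Coarse grid over rho; every rho with f(rho) close to p111 is a candidate solution
curve = [(r / 10.0, f(r / 10.0, b11, b12, p010, p110, p011)) for r in range(-8, 9, 2)]
```

Comparing the values in `curve` with `p111` on a finer grid is one way to look for values of $\rho$ other than $\rho_0$ that also satisfy Eq. (7).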
Remark 1. Using the same strategy, we can show that in the example of Maddala, where there is no covariate, the model fails to put any restriction on the correlation parameter, i.e., $\rho$ is completely nonidentified. However, $\beta_2$ and $\delta$ are partially identified. This remark complements the result of Maddala (1983) by showing that $(\beta_2, \delta)$ are not completely nonidentified, but are partially identified.

One might think that the identified set will shrink to a point as soon as the covariate is non-binary. In fact, if $x \in \{0, 1, 2\}$, we have 9 independent probabilities and 6 parameters to be identified. One might expect overidentification, but identification is not guaranteed, because of the nonlinearity of the system. Indeed, in addition to (4) and (6), we have the following equation:

$$P_{012} = \int_{-\beta_{21} - 2\beta_{22}}^{+\infty} \Phi\left(-\frac{\beta_{11} + 2\beta_{12} + \rho y}{\sqrt{1 - \rho^2}}\right) \phi(y)\, dy. \tag{9}$$

As previously, we can invert the three equations and get $\beta_{21}^*(\rho) = \Psi_0(\rho, P_{010})$, $(\beta_{21} + \beta_{22})^*(\rho) = \Psi_1(\rho, P_{011})$ and $(\beta_{21} + 2\beta_{22})^*(\rho) = \Psi_2(\rho, P_{012})$. By solving this simple system we get the following equation in $\rho$: $g(\rho) = \Psi_1(\rho, P_{011}) - \frac{1}{2}\Psi_2(\rho, P_{012}) - \frac{1}{2}\Psi_0(\rho, P_{010}) = 0$. Since $g(\rho)$ is not necessarily monotone, we might have multiple solutions and then set identification.

2.1. Introducing an exclusion variable

An important insight from the strategy derived above is that all parameters are identified given the correlation parameter. The requirement of point identification then translates into the existence of restrictions that pin down the value of $\rho$. An exclusion restriction on a binary variable provides just the right restriction. Applied to the case without covariates, an exclusion restriction provides two sets of values, $P(Y_1 = 0, Y_2 = 1 \mid z)$ and $P(Y_1 = 0, Y_2 = 1 \mid z')$, related respectively to two functions $\beta_2^{z}(\rho)$ and $\beta_2^{z'}(\rho)$. In a proof not reported here, we show that the two functions are single-crossing. $\rho$ is therefore uniquely identified at the crossing point of the two functions (see Fig. 2(a) and (b)). Indeed, recently and independently, Han and Vytlacil (2013) showed, in a more general bivariate probit model, that having an exclusion restriction is sufficient for point identification in models with common exogenous covariates that are present in both equations.³

3. Conclusion

We discussed identification in the two-equation probit model with an endogenous dummy regressor. We contradict the identification criterion proposed by Wilde (2000) and argue that adding a regressor with enough variation shrinks the identified set and may permit point identification in some cases, but that, in general, additional restrictions should complement the full rank condition. Therefore, we reinforce the view that an exclusion restriction is needed to ensure point identification in this model. Note that, even when economic theory offers little guidance on an excluded variable, inference in models where point identification fails because of a single parameter has been quite well studied (see for example Escanciano and Zhu, 2013).

Appendix. Details of the numerical exercise
The numerical exercise is performed using the random number generator and the programming platform of Matlab R2013b. The routines are available from the following link: https://sites.google.com/site/ismaelymourifie/research-papers. We give the details here. We initialize the parameters to $\theta_0 = (\beta_{11}^0, \beta_{12}^0, \beta_{21}^0, \beta_{22}^0, \delta^0, \rho_0)$.

Step 1: Approximate the true DGP

We draw an i.i.d. normally distributed sample of size $N \times 2$ ($N = 500{,}000$ in the reported results), denoted by $U$. Following Eqs. (4)–(7) and given $\theta_0$, we approximate the integral on the RHS by using averages for each value of the covariate. For example,

$$\hat{P}_{010} = \frac{1}{N} \sum_{n=1}^{N} I\left(U_1 < -\frac{\beta_{11}^0 + \rho^0 U_2}{\sqrt{1 - (\rho^0)^2}};\; U_2 > -\beta_{21}^0\right). \tag{A.1}$$

The resulting quantities approximate the true DGP.

Step 2: Compute the solution of Eqs. (4)–(6)

We discretize the domain of the correlation parameter $\rho$, i.e., the interval $(-1; 1)$, into $n_p$ points ($n_p = 1000$ in the reported results).
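Step 1 and the grid construction that opens Step 2 can be sketched as follows. This is an illustrative Python translation under stated assumptions, not the authors' Matlab routine: the function names, the seed, the smaller default number of draws and the equally spaced interior grid are choices made here.

```python
import math
import random

def simulate_p010(b11, b21, rho, n_draws=200_000, seed=0):
    """Monte Carlo average in the spirit of (A.1): share of draws with
    U1 < -(b11 + rho*U2)/sqrt(1 - rho^2) and U2 > -b21,
    where U1 and U2 are independent standard normal draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_draws):
        u1 = rng.gauss(0.0, 1.0)
        u2 = rng.gauss(0.0, 1.0)
        if u1 < -(b11 + rho * u2) / s and u2 > -b21:
            hits += 1
    return hits / n_draws

def rho_grid(n_points=1000):
    """Equally spaced points strictly inside (-1, 1); endpoints are excluded
    because sqrt(1 - rho^2) vanishes there."""
    step = 2.0 / (n_points + 1)
    return [-1.0 + (i + 1) * step for i in range(n_points)]

# Parameter values of Fig. 1(b): beta_11 = 0.3, beta_21 = 0.4, rho_0 = 0.5
p010_hat = simulate_p010(0.3, 0.4, 0.5)
grid = rho_grid()
```

The same draws would be reused for the other cells $P_{110}$, $P_{011}$ and $P_{111}$ by changing the inequality directions and thresholds according to Eqs. (5)–(7).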
³ We thank the anonymous referee for pointing this out to us.
For every value of $\rho$ on the grid, we first approximate the RHS of Eq. (4) as a function of $\beta_{21}$. This function is evaluated at $\beta_{21}$ in the sub-routine function_p010.m. The sub-routine performs the following computation for each $\beta_{21}$:

$$f_{p010}(\beta_{21}) = \frac{1}{N} \sum_{n=1}^{N} I\left(U_1 < -\frac{\beta_{11}^0 + \rho U_2}{\sqrt{1 - \rho^2}};\; U_2 > -\beta_{21}\right). \tag{A.2}$$

We then use the built-in routine fminsearch.m to find the $\beta_{21}^*$ that minimizes the squared difference between $\hat{P}_{010}$ and $f_{p010}(\beta_{21})$, the approximated value of the RHS of Eq. (4). The result is then used in Eq. (5) to compute $\delta^*$ in the same way as in (A.2), for the same value of $\rho$. Finally, $\beta_{22}^*$ is also recursively computed from Eq. (6).

Step 3: Evaluate $f(\rho)$

For every value of $\rho$ on the grid and the corresponding values $\beta_{21}^*$, $\delta^*$ and $\beta_{22}^*$, we compute the RHS of (7), using the same averaging technique as in (A.2). We report a plot of $f(\rho)$ and the values of $\rho$ such that the observed DGP is close to $f(\rho)$.

References
Escanciano, J.C., Zhu, L., 2013. Set inferences and sensitivity analysis in semiparametric conditionally identified models, cemmap working paper, Centre for Microdata Methods and Practice, No. CWP55/13.
Han, S., Vytlacil, E.J., 2013. Identification in a generalization of bivariate probit models with endogenous regressors, Department of Economics Working Paper 130908, University of Texas at Austin.
Heckman, J.J., 1978. Dummy endogenous variables in a simultaneous equation system. Econometrica 46, 931–959.
Maddala, G.S., 1983. Limited-dependent and Qualitative Variables in Econometrics, vol. 3. Cambridge University Press.
Sartori, A.E., 2003. An estimator for some binary-outcome selection models without exclusion restrictions. Polit. Anal. 11 (2), 111–138.
Wilde, J., 2000. Identification of multiple equation probit models with endogenous dummy regressors. Econom. Lett. 69 (3), 309–312.