Economics Letters 33 (1990) 329-332
North-Holland
A SIMPLE TEST FOR EXOGENEITY IN PROBIT, LOGIT, AND POISSON REGRESSION MODELS
Jeffrey GROGGER
University of California, Santa Barbara, CA 93106, USA
Received 28 November 1989
Accepted 23 January 1990
Several tests for exogeneity currently exist for single equation probit models, most based on variants of limited information maximum likelihood estimators. This paper develops an alternative test by recasting the estimation problem in a non-linear least squares framework. The test is applicable to logit models as well as probit models, and is also extended to provide an exogeneity test for the Poisson regression model.
1. Introduction
Much effort has been given to the development of estimators and specification tests suitable for probit models with endogenous regressors [see, e.g., Amemiya (1978), Heckman (1978), Nelson and Olsen (1978), Smith and Blundell (1986), Newey (1987), and Vuong and Rivers (1988)]. In general, these estimators are developed in a limited information maximum likelihood framework in which the disturbance of the single structural probit equation is assumed to be normally distributed, jointly with the disturbances of the reduced forms of endogenous variables appearing in the equation of interest. Two-step estimators have been proposed, obtained by maximizing a probit likelihood after estimating the reduced form parameters by first-stage linear regressions. GLS modifications of this procedure have been proposed to increase efficiency [see Amemiya (1978) and Newey (1987)]. Specification tests of the null hypothesis of exogeneity can be carried out with statistics generated by these multi-stage estimation techniques.
In this paper, we motivate an alternative Hausman-type specification test for exogeneity by considering the estimation problem in a non-linear least squares framework. Econometric practitioners may prefer this test to those of previous authors for a number of reasons. First, the test is computationally convenient. Second, we make no distributional assumptions regarding the endogenous regressors. The specification test can therefore be applied to binary logit as well as probit models, and is robust to departures from normality, which are likely to occur in actual application. Next, the approach taken to develop the test for binary outcome models can be applied directly to many other models with non-linear regression functions. In particular, we develop a specification test for endogeneity in Poisson regression models.
Finally, the non-linear instrumental variables estimator used in conducting the tests is consistent in the presence of endogenous regressors, and therefore may provide useful starting values for more efficient maximum likelihood estimation or be of interest in its own right.
Section 2 motivates our approach to specification testing of probit and logit models. Computation of the test statistic is considered in section 3. In section 4 we illustrate the testing procedure for Poisson regression models. Section 5 concludes.
0165-1765/90/$03.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)
J. Grogger / A simple test for exogeneity
2. Motivation
Throughout the paper, we assume the observed data vector to consist of i.i.d. random variables with finite second moments, and that the first and second moments of maximands from which our estimators are derived are bounded in expectation. We also assume that the parameter vector to be estimated lies in the interior of a compact convex set of Euclidean space. The only deviation we entertain from the usual regularity conditions used to ensure the asymptotic properties of extremum estimators is that of possible dependence between a subset of the regressors and the disturbance term.
Binary dependent variable models are usually motivated by considering a linear model of the form
$$y_i^* = Y_i\gamma + X_i\beta + u_i, \qquad i = 1, \ldots, n,$$
where $y_i^*$ is a latent, unobserved dependent variable, $Y_i$ is a $1 \times G$ vector of possibly endogenous regressors, $X_i$ is a $1 \times K$ vector of exogenous regressors, $\gamma$ and $\beta$ are commensurable unknown parameter vectors, and $u_i$ is a zero mean i.i.d. disturbance with known variance. The probit model is generated by assuming $u_i$ to be normally distributed with variance one, while the logit model arises from the assumption that $u_i$ follows the standardized sech-squared distribution. Assuming that the observable $y_i = 1$ for $y_i^* > 0$, and equals zero otherwise, allows us to write $E(y_i \mid Z_i) = F(Z_i\delta)$, where $Z_i \equiv (Y_i, X_i)$, $\delta = (\gamma', \beta')'$, and $F(z) = \Phi(z)$ for the probit specification, $F(z) = e^z/(1 + e^z)$ for the logit model. The sample log likelihood can be written as
$$\ln L = \sum_{i=1}^{n} \left\{ y_i \ln F(Z_i\delta) + (1 - y_i) \ln[1 - F(Z_i\delta)] \right\}. \tag{1}$$
For exogenous $Y_i$, maximization of (1) yields the unique maximum likelihood estimator of $\delta$ for either the probit or logit model, which is consistent and asymptotically efficient. In the case where $Y_i$ is endogenous, however, $\hat\delta_{ML}$ is generally inconsistent.
For our purposes, it is useful to recast the problem in a nonlinear least squares framework. To do this, note that
$$E(y_i \mid Z_i) = F(Z_i\delta).$$
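The two choices of $F$ and the log likelihood (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `logit_cdf`, `probit_cdf`, and `log_likelihood` are hypothetical, NumPy is assumed to be available, and the clipping constant is an arbitrary numerical safeguard rather than part of the model.

```python
import math
import numpy as np

def logit_cdf(z):
    """Logistic CDF, F(z) = e^z / (1 + e^z), in a numerically stable form."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

# math.erf works on scalars only, so wrap it for element-wise use on arrays.
_erf = np.vectorize(math.erf)

def probit_cdf(z):
    """Standard normal CDF, F(z) = Phi(z)."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def log_likelihood(delta, y, Z, cdf):
    """Sample log likelihood (1): sum of y ln F(Z delta) + (1 - y) ln[1 - F(Z delta)]."""
    # Clip fitted probabilities away from 0 and 1 so the logs stay finite
    # (a numerical convenience assumed here, not part of the model).
    p = np.clip(cdf(Z @ delta), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

At $\delta = 0$ both specifications give $F = 1/2$ for every observation, so the log likelihood reduces to $n \ln(1/2)$, which is a convenient sanity check.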
For exogenous $Y_i$, the assumptions given above ensure that the parameter vector $\delta$ can be consistently estimated by non-linear least squares for either choice of $F(\cdot)$. The NLLS estimator solves
$$\min_{\delta} \sum_{i=1}^{n} e_i^2, \qquad e_i = y_i - F(Z_i\delta).$$
Weighting the observations by $\hat\omega_i^{1/2} = [F(Z_i\hat\delta)(1 - F(Z_i\hat\delta))]^{-1/2}$, where $\hat\delta$ is an initial consistent unweighted NLLS estimate, will generally increase efficiency due to the heteroskedasticity of $e = (e_1, \ldots, e_n)$. While this estimator too is generally inconsistent in the presence of endogenous $Y_i$, considering the problem as a non-linear least squares estimation exercise suggests a consistent estimator which can be used to form a test for exogeneity. If one has available $G$ or more instrumental variables $W$ which are correlated with $Y_i$ but uncorrelated with the disturbances $u_i$, then, with $y$ and $F(Z\delta)$ denoting the stacked $n \times 1$ vectors of the $y_i$ and $F(Z_i\delta)$, the non-linear instrumental variables (NLIV) estimator, defined as the solution to
$$\min_{\delta} \; [y - F(Z\delta)]' \, W (W'W)^{-1} W' \, [y - F(Z\delta)]$$
will provide consistent estimates of $\delta$. Again, the weighted NLIV estimator will in general be more efficient than its unweighted counterpart. By casting the estimation problem in a non-linear least squares framework, we see that we have available a consistent, efficient estimate of $\delta$ under the null hypothesis of exogenous $Y_i$, and an estimator which is consistent under both the null and the alternative. The exogeneity of $Y_i$ can therefore be tested using Hausman's (1978) principle.
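The unweighted NLIV estimator for the logit case can be sketched as a Gauss-Newton iteration on the quadratic form $r' W (W'W)^{-1} W' r$ with $r = y - F(Z\delta)$. The paper does not prescribe an algorithm, so the Gauss-Newton scheme with step halving, the zero starting value, and the name `nliv_logit` are all assumptions made for illustration; the weighted variant is not shown.

```python
import numpy as np

def logistic(z):
    """Logit link F(z) = e^z / (1 + e^z), in a numerically stable form."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def nliv_logit(y, Z, W=None, tol=1e-10, max_iter=200):
    """Unweighted NLIV estimate of delta for E(y|Z) = F(Z delta), F logistic.

    Minimizes r' P_W r with r = y - F(Z delta) and P_W = W (W'W)^{-1} W'.
    With W=None the projection is dropped and the problem is plain NLLS.
    """
    def project(A):
        # Apply P_W by regressing A on W and taking the fitted values.
        if W is None:
            return A
        return W @ np.linalg.lstsq(W, A, rcond=None)[0]

    def objective(d):
        r = y - logistic(Z @ d)
        return float(r @ project(r))

    delta = np.zeros(Z.shape[1])          # assumed starting value
    q = objective(delta)
    for _ in range(max_iter):
        F = logistic(Z @ delta)
        r = y - F
        D = (F * (1.0 - F))[:, None] * Z  # Jacobian of F(Z delta) w.r.t. delta
        PD = project(D)
        step = np.linalg.solve(D.T @ PD, PD.T @ r)  # Gauss-Newton direction
        t = 1.0
        while True:                        # halve the step until Q does not rise
            cand = delta + t * step
            q_new = objective(cand)
            if q_new <= q or t < 1e-8:
                break
            t *= 0.5
        delta, q = cand, q_new
        if np.max(np.abs(t * step)) < tol:
            break
    return delta
```

In use, `Z` would hold the columns $(Y_i, X_i)$ and `W` the exogenous regressors together with the additional instruments; calling the function with `W=None` gives the NLLS estimate that is appropriate under exogeneity.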
3. Computation of the test statistic
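The test statistic $h$ is given by
$$h = (\hat\gamma_{NLIV} - \hat\gamma_{ML})' \left[ \hat V(\hat\gamma_{NLIV}) - \hat V(\hat\gamma_{ML}) \right]^{+} (\hat\gamma_{NLIV} - \hat\gamma_{ML}), \tag{2}$$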
where $\hat V(\hat\gamma_j)$ is the upper $G \times G$ block of the estimated covariance matrix of $\hat\delta_j$, $j = NLIV, ML$, and $A^{+}$ denotes the Moore-Penrose inverse of $A$. Use of the weighted NLIV estimator will improve the power of the test. Under the null, $h$ is asymptotically distributed as $\chi^2_G$.
Given current computing technology, this form of the test may be extremely simple to compute. For example, computation of the two sets of NLIV estimates (unweighted and weighted), the ML estimates, and the test statistic requires only four commands using the software package LIMDEP.
Another computationally convenient version of the test statistic can also be given. Following White (1987), $h$ can be computed as $nR^2$ from the regression of a vector of units on
$$[y_i - F(Z_i\hat\delta_{ML})] Z_i \quad \text{and} \quad [y_i - F(Z_i\hat\delta_{NLIV})] Y_i.$$
The first terms are the scores of the probit or logit likelihood, evaluated at $\hat\delta_{ML}$. The second set of terms gives the scores corresponding to $Y_i$, evaluated at $\hat\delta_{NLIV}$. Under the null, these latter terms should sum to zero, as is guaranteed for the first terms. Differences in the scores evaluated at the different parameter estimates will yield non-zero coefficients in the auxiliary regression, however, giving a relatively large value for the $R^2$.
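Both versions of the statistic are short to compute once the estimates are in hand. The sketch below assumes NumPy; the function names are hypothetical, and the auxiliary regression is assumed to use the uncentered $R^2$ (the regression of a vector of ones has no intercept), which the text does not state explicitly.

```python
import numpy as np

def hausman_h(gamma_nliv, gamma_ml, V_nliv, V_ml):
    """Quadratic form (2), using the Moore-Penrose inverse of the
    difference of the two G x G covariance blocks."""
    d = np.atleast_1d(np.asarray(gamma_nliv, dtype=float)
                      - np.asarray(gamma_ml, dtype=float))
    V_diff = np.atleast_2d(np.asarray(V_nliv, dtype=float)
                           - np.asarray(V_ml, dtype=float))
    return float(d @ np.linalg.pinv(V_diff) @ d)

def score_regressors(y, Z, Y, F_ml, F_nliv):
    """Columns [y - F(Z delta_ML)] Z and [y - F(Z delta_NLIV)] Y, where
    F_ml and F_nliv are the fitted values at the two estimates."""
    return np.column_stack([(y - F_ml)[:, None] * Z,
                            (y - F_nliv)[:, None] * Y])

def n_r2_from_units(S):
    """n times the uncentered R^2 from regressing a vector of ones on S.

    Uncentered R^2 = fitted'fitted / n, so n R^2 = fitted'fitted.
    """
    ones = np.ones(S.shape[0])
    coef = np.linalg.lstsq(S, ones, rcond=None)[0]
    fitted = S @ coef
    return float(fitted @ fitted)
```

Either value would then be compared with a $\chi^2_G$ critical value. A regressor column that sums exactly to zero is orthogonal to the vector of ones and contributes nothing to the fit, which mirrors the remark that the scores should sum to zero under the null.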
4. Exogeneity tests for the Poisson regression model
The Poisson regression model naturally arises in the analysis of dependent variables which take on only non-negative integer values, and has recently received much attention from econometricians [see Gourieroux et al. (1984), Cameron and Trivedi (1986), Portney and Mullahy (1986), and Grogger (1990)]. The model is generally motivated by assuming that
$$P(y_i \mid Z_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!},$$
where
$$\lambda_i = \exp(Z_i\delta), \qquad i = 1, \ldots, n.$$
Since the sample log-likelihood has favorable computational as well as statistical properties, maximum likelihood has been the estimation method of choice in the literature. The estimation of this model in the presence of endogenous regressors has been problematic, however [see, e.g., Cameron and Trivedi (1988), and Grogger (1990)]. Since $E(y_i \mid Z_i) = \exp(Z_i\delta)$ for this model, it is clear from the above discussion that a NLIV estimator will provide consistent estimates of $\delta$.
For this model, the Hausman test statistic can be computed as in (2), or as $nR^2$ from the regression of unity on
$$[y_i - \exp(Z_i\hat\delta_{ML})] Z_i \quad \text{and} \quad [y_i - \exp(Z_i\hat\delta_{NLIV})] Y_i.$$
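For the exponential conditional mean, the NLIV step and the auxiliary-regression columns can be sketched as follows. As before, the Gauss-Newton iteration with step halving, the zero starting value, and the names `nliv_poisson` and `poisson_score_regressors` are assumptions for illustration, not a procedure given in the paper.

```python
import numpy as np

def nliv_poisson(y, Z, W, tol=1e-10, max_iter=200):
    """Unweighted NLIV estimate of delta for E(y|Z) = exp(Z delta).

    Minimizes r' P_W r with r = y - exp(Z delta), P_W = W (W'W)^{-1} W'.
    """
    def project(A):
        # Apply P_W by regressing A on W and taking the fitted values.
        return W @ np.linalg.lstsq(W, A, rcond=None)[0]

    def objective(d):
        r = y - np.exp(Z @ d)
        return float(r @ project(r))

    delta = np.zeros(Z.shape[1])          # assumed starting value
    q = objective(delta)
    for _ in range(max_iter):
        lam = np.exp(Z @ delta)
        r = y - lam
        D = lam[:, None] * Z              # Jacobian of exp(Z delta) w.r.t. delta
        PD = project(D)
        step = np.linalg.solve(D.T @ PD, PD.T @ r)  # Gauss-Newton direction
        t = 1.0
        while True:                        # halve the step until Q does not rise
            cand = delta + t * step
            q_new = objective(cand)
            if q_new <= q or t < 1e-8:
                break
            t *= 0.5
        delta, q = cand, q_new
        if np.max(np.abs(t * step)) < tol:
            break
    return delta

def poisson_score_regressors(y, Z, Y, delta_ml, delta_nliv):
    """Columns [y - exp(Z delta_ML)] Z and [y - exp(Z delta_NLIV)] Y."""
    r_ml = y - np.exp(Z @ delta_ml)
    r_iv = y - np.exp(Z @ delta_nliv)
    return np.column_stack([r_ml[:, None] * Z, r_iv[:, None] * Y])
```

The $nR^2$ version of the statistic would then be the explained sum of squares from regressing a vector of ones on these columns, again assuming the uncentered $R^2$.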
5. Summary
This paper has provided simple specification tests for exogeneity for several non-linear models that are extensively used in applied econometric research. While the test has been illustrated specifically for probit, logit, and Poisson regression models, its general form is clearly applicable in many other non-linear settings, including models for censored and truncated variables.
References
Amemiya, T., 1978, The estimation of a simultaneous equation generalized probit model, Econometrica 46, 1193-1205.
Cameron, A. and P. Trivedi, 1986, Econometric models based on count data: Comparisons and application of some estimators and tests, Journal of Applied Econometrics 1, 29-53.
Gourieroux, C., A. Monfort and A. Trognon, 1984, Pseudo maximum likelihood methods: Applications to Poisson models, Econometrica 52, 701-720.
Grogger, J., 1990, The deterrent effect of capital punishment: An analysis of daily homicide counts, Journal of the American Statistical Association (forthcoming).
Heckman, J., 1978, Dummy endogenous variables in a simultaneous equation system, Econometrica 46, 931-959.
Nelson, F. and L. Olsen, 1978, Specification and estimation of a simultaneous equation model with limited dependent variables, International Economic Review 19, 695-705.
Newey, W., 1987, Efficient estimation of limited dependent variable models with endogenous explanatory variables, Journal of Econometrics 36, 231-250.
Portney, P. and J. Mullahy, 1986, Urban air quality and acute respiratory illness, Journal of Urban Economics 20, 21-38.
Smith, R. and R. Blundell, 1986, An exogeneity test for a simultaneous equation tobit model with an application to labor supply, Econometrica 54, 679-685.
Vuong, Q. and D. Rivers, 1988, Limited information estimators and exogeneity tests for simultaneous probit models, Journal of Econometrics 39, 347-366.
White, H., 1987, Specification testing in dynamic models, in: T. Bewley, ed., Advances in econometrics (Cambridge University Press, Cambridge) 1-58.