Economics Letters 127 (2015) 47–50
Contents lists available at ScienceDirect
Economics Letters journal homepage: www.elsevier.com/locate/ecolet
An LM test based on generalized residuals for random effects in a nonlinear model
William Greene a,∗, Colin McKenzie b
a New York University, USA
b Keio University, Japan

highlights
• We derive the LM test statistic for random effects in a panel probit model.
• We obtain a reparameterization of the statistic that produces a feasible calculation.
• We show that the statistic can be computed using generalized residuals.
• We demonstrate that the results generalize to other models.
• The test is employed in a substantive application.

article info
Article history: Received 29 November 2014 Received in revised form 18 December 2014 Accepted 22 December 2014 Available online 31 December 2014
abstract We obtain an LM test for the random effects probit model. In the natural parameterization of the model the necessary derivatives are identically zero under the null hypothesis. After a reparameterization, the feasible LM test is based on generalized residuals. © 2014 Elsevier B.V. All rights reserved.
JEL classification: C23 C25 I11 Keywords: LM test Panel data Probit Random effects
1. Introduction

Empirical analyses involving limited dependent variables and random individual effects in panel data sets have become quite common. By far the leading application of the random effects (RE) model after the linear regression is the binary probit model. The estimated coefficients from RE and pooled probit models are different because of the different normalizations implied by the models. As Arulampalam (1999) discusses, in the presence of random effects, an adjustment of the estimated partial effects is needed to remove an ambiguity in the interpretation of the estimates.
∗ Correspondence to: Stern School of Business, New York University, 44 West 4th St., New York, NY, 10012, USA. Tel.: +1 212 998 0876. E-mail address:
[email protected] (W. Greene). http://dx.doi.org/10.1016/j.econlet.2014.12.031 0165-1765/© 2014 Elsevier B.V. All rights reserved.
Our interest is in the Lagrange Multiplier (LM) test for random effects in the probit model. The LM test has provided a standard means of testing parametric restrictions for a variety of settings. Its primary advantage among the trinity of tests (LM, Likelihood Ratio (LR), Wald) is that it is based on the null, restricted model, which is usually simpler to estimate than the alternative, unrestricted one. Breusch and Pagan’s (1980) LM test for random effects in a linear model that is based on pooled OLS residuals is the leading example. Testing for random effects in the probit model is an example of a problem that emerges when the parametric restriction in the null hypothesis puts the value of a variance parameter on the boundary of the parameter space. The restriction is that the standard deviation of the random effect equals zero. When RE probit models are estimated, popular computer packages automatically produce LR and Wald-type tests of the null hypothesis of no random effects, but would appear to use the $\chi^2(1)$ (or standard normal)
W. Greene, C. McKenzie / Economics Letters 127 (2015) 47–50
distribution to compute the p-values for these tests. If, under the null hypothesis, the parameter being tested lies on the boundary of the parameter space, an additional advantage of the LM test is that it will still have standard distributional properties, whereas the LR and Wald tests will not (see Andrews, 2001). In fact, in testing for random effects in the probit model, the LR and Wald tests will be distributed as a $(1/2)\chi^2(1)$ distribution under the null hypothesis, that is, a 50:50 mixture of a point mass at zero and a $\chi^2(1)$ distribution (see Gourieroux et al., 1987). This means the correct critical values for these two tests at the 5% and 10% significance levels are 2.71 and 1.64, respectively (the 90th and 80th percentiles of the $\chi^2(1)$ distribution), rather than the commonly used values of 3.84 and 2.71 from the $\chi^2(1)$ distribution.

The RE probit model is, after the linear regression model, by far the leading application of the more general class of random effects models. Despite the obvious simplicity of the restricted model, the LM test for this model does not appear in the existing literature. One reason for this is that the usual parameterization of the model has the inconvenient feature that the score vector is identically zero at the restricted ML estimates. The received literature (e.g., Chesher, 1984, Lee and Chesher, 1986 and Kiefer, 1982) identifies a handful of specific cases in which the score vector needed to compute the LM statistic is identically zero at the restricted estimates, which would seem to preclude using the LM test. We find that the RE probit model represents an entire class of such models. Lee and Chesher (1986) discuss a general theory of how to deal with score vectors that are zero under the null hypothesis. Despite what would seem to be its broad application, we have not found any applications in the subsequent 30 years of literature. We will provide a useful expression for the LM test statistic and illustrate its use with an empirical application on hospitalization behavior.

2. The random effects probit model
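The random effects probit model is

\[
y_{it}^{*} = \beta' x_{it} + \sigma_u u_i + \varepsilon_{it}; \quad i = 1, \dots, n;\ t = 1, \dots, T_i,
\]
\[
y_{it} = 1[y_{it}^{*} > 0],
\]
\[
\varepsilon_{it} \sim N[0, 1^2],\quad u_i \sim N[0, 1^2],\quad E[\varepsilon_{it}\varepsilon_{js}] = 0,\ i \neq j,\ t \neq s;\quad E[u_i u_j] = 0,\ i \neq j;\quad E[\varepsilon_{it} u_s] = 0\ \ \forall i, t, s, \tag{1}
\]

where $\beta$ and $x_{it}$ are $K \times 1$ vectors. (The analysis to follow is not dependent on normality for $u_i$, though that is the natural case to consider.) The log likelihood for a sample of $n$ observations, conditioned on the unobserved heterogeneity, $u_1, u_2, \dots, u_n$, is

\[
\log L(\beta, \sigma_u \mid u_1, \dots, u_n) = \sum_{i=1}^{n} \log \prod_{t=1}^{T_i} \Phi[q_{it}(\beta' x_{it} + \sigma_u u_i)], \tag{2}
\]

where $\Phi(t)$ is the standard normal CDF and $q_{it} = 2y_{it} - 1$. ML estimation is based on the unconditional log likelihood given by

\[
\log L(\beta, \sigma_u) = \sum_{i=1}^{n} \log \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \Phi[q_{it}(\beta' x_{it} + \sigma_u u_i)]\, \phi(u_i)\, du_i = \sum_{i=1}^{n} \log L_i(\beta, \sigma_u), \tag{3}
\]

where $\phi(t)$ is the standard normal PDF. Butler and Moffitt’s (1982) estimation method based on Hermite quadrature is generally used in contemporary applications.

2.1. LM test for random effects

To form the LM statistic for the test of the null hypothesis of no random effects, $\sigma_u = 0$, we require $\partial \log L_i(\beta, \sigma_u)/\partial \sigma_u$;

\[
\frac{\partial \log L_i(\beta, \sigma_u)}{\partial \sigma_u}
= \frac{\displaystyle\int_{-\infty}^{\infty} \left(\prod_{t=1}^{T_i} \Phi[q_{it} a_{it}]\right) \left(\sum_{t=1}^{T_i} \frac{q_{it}\,\phi[q_{it} a_{it}]}{\Phi[q_{it} a_{it}]}\right) u_i\, \phi(u_i)\, du_i}
{\displaystyle\int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \Phi[q_{it} a_{it}]\, \phi(u_i)\, du_i}, \tag{4}
\]

where $a_{it} = \beta' x_{it} + \sigma_u u_i$. In order to compute the LM statistic, we need to evaluate this expression at $\sigma_u = 0$. Moving all terms not involving $u_i$ outside the integrals produces

\[
\frac{\partial \log L_i(\beta, 0)}{\partial \sigma_u}
= \frac{\displaystyle\left(\prod_{t=1}^{T_i} \Phi[q_{it}(\beta' x_{it})]\right) \left(\sum_{t=1}^{T_i} \frac{q_{it}\,\phi[q_{it}(\beta' x_{it})]}{\Phi[q_{it}(\beta' x_{it})]}\right) \int_{-\infty}^{\infty} u_i\, \phi(u_i)\, du_i}
{\displaystyle\left(\prod_{t=1}^{T_i} \Phi[q_{it}(\beta' x_{it})]\right) \int_{-\infty}^{\infty} \phi(u_i)\, du_i}. \tag{5}
\]

The integrals in the numerator and denominator are $E[u_i] = 0$ and 1, respectively. Regardless of the value of $\beta' x_{it}$, each term in $\partial \log L(\beta, \sigma_u)/\partial \sigma_u$ is identically zero when $\sigma_u$ equals zero. The terms in $\partial \log L_i(\beta, \sigma_u)/\partial \beta$ are also zero. The score vector under the null hypothesis is identically zero. The result (and the derivation to follow) will extend generally to other single index models with random effects. (Surprisingly, it also holds for the linear regression model for which Breusch and Pagan’s LM test has been used since 1980. See Chesher, 1984.)

2.2. LM test based on a reparameterization

Chesher (1984), Lee and Chesher (1986) and Cox and Hinkley (1974) suggested reparameterization of the model as a strategy for obtaining the LM test. We use $\gamma = \sigma_u^2$. The log likelihood becomes

\[
\log L(\beta, \gamma) = \sum_{i=1}^{n} \log L_i(\beta, \gamma) = \sum_{i=1}^{n} \log \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \Phi\!\left[q_{it}(\beta' x_{it} + u_i \sqrt{\gamma})\right] \phi(u_i)\, du_i. \tag{6}
\]

Then,

\[
\frac{\partial \log L_i(\beta, \gamma)}{\partial \gamma}
= \frac{1}{2\sqrt{\gamma}}\;
\frac{\displaystyle\int_{-\infty}^{\infty} \left(\prod_{t=1}^{T_i} \Phi_{it}\right) \left(\sum_{t=1}^{T_i} g_{it}\right) u_i\, \phi(u_i)\, du_i}
{\displaystyle\int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \Phi_{it}\, \phi(u_i)\, du_i}, \tag{7}
\]

where $b_{it} = \beta' x_{it} + u_i \sqrt{\gamma}$, $\phi_{it} = \phi(q_{it} b_{it})$, $\Phi_{it} = \Phi(q_{it} b_{it})$ and $g_{it} = q_{it}\phi_{it}/\Phi_{it}$. Note that $g_{it} u_i$ is $\partial \log \Phi_{it}/\partial(\sqrt{\gamma})$. Evaluated at $\gamma = 0$, the numerator now takes the form 0/0. We use L’Hôpital’s rule, taking the limits as $\gamma$ approaches zero from above. Then,

\[
\frac{\partial \log L_i(\beta, 0)}{\partial \gamma}
= \lim_{\gamma \downarrow 0}\; \frac{1}{2}\;
\frac{\displaystyle \frac{1}{2\sqrt{\gamma}} \int_{-\infty}^{\infty} \left(\prod_{t=1}^{T_i} \Phi_{it}\right) \left[\sum_{t=1}^{T_i} h_{it} + \left(\sum_{t=1}^{T_i} g_{it}\right)^{2}\right] u_i^2\, \phi(u_i)\, du_i}
{\displaystyle \frac{1}{2\sqrt{\gamma}} \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \Phi_{it}\, \phi(u_i)\, du_i}, \tag{8}
\]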
where $h_{it} = \partial^2 \log \Phi(q_{it} b_{it})/\partial (q_{it} b_{it})^2 = -b_{it} g_{it} - g_{it}^2$. The two occurrences of $1/(2\sqrt{\gamma})$ in (8) cancel. The integral in the numerator now involves $E[u_i^2] = 1$. Moving the invariant (with respect to $u_i$) terms out of the integrals, the product terms in the numerator and denominator cancel and we obtain

\[
\frac{\partial \log L(\beta, \gamma)}{\partial \binom{\beta}{\gamma}}\bigg|_{\gamma = 0}
= \sum_{i=1}^{n}
\begin{bmatrix}
\displaystyle\sum_{t=1}^{T_i} g_{it}^{0}\, x_{it} \\[2ex]
\displaystyle\frac{1}{2}\left[\sum_{t=1}^{T_i} h_{it}^{0} + \left(\sum_{t=1}^{T_i} g_{it}^{0}\right)^{2}\right]
\end{bmatrix}
= \sum_{i=1}^{n} g_i(\beta, 0) = \sum_{i=1}^{n} g_{i0}, \tag{9}
\]

where the superscripts on $h_{it}$ and $g_{it}$ indicate they are evaluated at $\gamma = 0$.

2.3. LM test based on generalized residuals

The generalized residual for the probit model under the null hypothesis (see Chesher and Irish, 1987 and Gourieroux et al., 1987) is

\[
w_{it} = g_{it}^{0} = \frac{q_{it}\,\phi[q_{it}\beta' x_{it}]}{\Phi[q_{it}\beta' x_{it}]}.
\]

The lower term in (9) can be written as

\[
\begin{aligned}
\frac{\partial \log L(\beta, 0)}{\partial \gamma}
&= \frac{1}{2} \sum_{i=1}^{n} \left\{ \sum_{t=1}^{T_i} \left(-\beta' x_{it} w_{it} - w_{it}^2\right) + \left(\sum_{t=1}^{T_i} w_{it}\right)^{2} \right\} \\
&= -\frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{T_i} \beta' x_{it} w_{it} + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{T_i} \sum_{s=1, s \neq t}^{T_i} w_{it} w_{is} \\
&= \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{T_i} \sum_{s=1, s \neq t}^{T_i} w_{it} w_{is}.
\end{aligned} \tag{10}
\]

The first term in the second line of (10) is $(-1/2)\beta'\, \partial \log L(\beta, 0)/\partial \beta = 0$ at the restricted (pooled) MLE. The second term in the second line of (10) includes the second term in curled braces in the first line. Denote by $G$ the $n \times (K + 1)$ matrix with $i$th row equal to $g_{i0}'$ and let $i$ denote an $n \times 1$ column vector of ones. Then, we compute the LM statistic using the pooled MLE of $\beta$, the data and $w_{it}$ using

\[
LM = (i'G)(G'G)^{-1}(G'i) = (g_\gamma)^2\, (G'G)^{(K+1),(K+1)}. \tag{11}
\]

The end result follows from $i'G$ evaluated at $(\hat{\beta}_{MLE}, 0)$ equals $(0', g_\gamma)$ and $(G'G)^{(K+1),(K+1)}$ is the $(K + 1), (K + 1)$ element of $(G'G)^{-1}$.

3. Application

Table 1
Estimated probit models for hospitalization.

                      Pooled                         Random effects
Variable              Coefficient    Std. error a    Coefficient    Std. error
Constant               0.2411        0.3600           0.2820        0.3847
Age                   −0.0295*       0.0163          −0.0454***     0.0172
Age squared           −0.0004*       0.0002           0.0006***     0.0002
Health Sat.           −0.1134***     0.0085          −0.1243***     0.0084
Handicapped           −0.0302        0.0449          −0.0493        0.0644
Handicap degree        0.0034**      0.0012           0.0037***     0.0013
Marital status        −0.0463        0.0530          −0.0534        0.1175
Education             −0.0242        0.0100          −0.0290**      0.0512
Household income       0.1734        0.1097           0.2270        0.0894
Kids present           0.0306        0.0457           0.0524        0.0997
Self-employed         −0.0520        0.0856          −0.1143        0.0576
Civil servant         −0.0445        0.0809          −0.0485        0.0693
Blue collar            0.0726        0.0501           0.0898        0.0576
Working               −0.0695        0.0659          −0.0652        0.0693
Public insurance      −0.1262*       0.0747          −0.0999        0.0871
Addon insurance        0.2615**      0.1188           0.2500*       0.1434
ρ                      0.0000        N/A              0.3361***     0.0264

log L                 −3674.9207                     −3542.6122
N                     14,243                         3691 Groups, Σi Ti = 14,243

LR test               264.62 [0.000]
Wald test             162.08 [0.000]
LM test               129.44 [0.000]

a Standard errors corrected for clustering by individual. *, **, *** indicate significance at 90%, 95%, 99% levels.
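The computation in (9)–(11) needs only pooled probit output. The following is a minimal Python sketch on simulated data, not the authors' code: it assumes NumPy and SciPy are available, and the function names (`pooled_probit`, `lm_random_effects`), the Newton-Raphson fitting routine and the simulated design are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, chi2


def generalized_residuals(y, X, beta):
    """w_it = q_it * phi(q_it * x'b) / Phi(q_it * x'b), computed on the log scale."""
    q = 2.0 * y - 1.0
    z = q * (X @ beta)
    return q * np.exp(norm.logpdf(z) - norm.logcdf(z))


def pooled_probit(y, X, iters=50, tol=1e-10):
    """Pooled (restricted) probit MLE by Newton-Raphson; hypothetical helper."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        xb = X @ beta
        w = generalized_residuals(y, X, beta)
        score = X.T @ w
        h = -xb * w - w**2                      # second derivative of log Phi
        hessian = (X * h[:, None]).T @ X
        step = np.linalg.solve(hessian, score)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def lm_random_effects(y, X, groups):
    """LM statistic of (11) built from the per-group score terms in (9)."""
    beta = pooled_probit(y, X)
    xb = X @ beta
    w = generalized_residuals(y, X, beta)
    ids, idx = np.unique(groups, return_inverse=True)
    n, K = len(ids), X.shape[1]
    score_beta = np.zeros((n, K))
    np.add.at(score_beta, idx, X * w[:, None])  # sum_t w_it x_it per group
    sum_h = np.bincount(idx, weights=-xb * w - w**2, minlength=n)
    sum_w = np.bincount(idx, weights=w, minlength=n)
    g_gamma = 0.5 * (sum_h + sum_w**2)          # lower block of (9)
    G = np.column_stack([score_beta, g_gamma])
    iG = np.ones(n) @ G
    lm = iG @ np.linalg.solve(G.T @ G, iG)
    return lm, chi2.sf(lm, df=1), iG, G


# Simulated panel with a genuine random effect (sigma_u = 1); values are arbitrary.
rng = np.random.default_rng(0)
n, T = 500, 5
groups = np.repeat(np.arange(n), T)
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])
u = np.repeat(rng.normal(size=n), T)
y = (X @ np.array([0.2, 0.5]) + u + rng.normal(size=n * T) > 0).astype(float)

lm, pval, iG, G = lm_random_effects(y, X, groups)
print(lm, pval)
```

At the pooled MLE the first K elements of `iG` are numerically zero, so the quadratic form collapses to the squared last element times the corresponding diagonal element of the inverse, as in the second equality of (11).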
Riphahn, Wambach and Million (RWM, 2003) use data from the German Socioeconomic Panel Survey over the period 1984–95 to model jointly the number of times the individual visits a doctor and the number of times a patient is hospitalized in a year, and determine whether insurance significantly affects the demand for health care. The authors conduct separate analyses for male and female patients. Here, we analyze whether or not male patients are hospitalized in the relevant year. The unbalanced panel contains up to seven years of data for 3691 households for a total of 14,243 observations. We follow the specification used by RWM. The variables in the model are age, age squared, health satisfaction, a dummy variable for whether or not the person is handicapped, the degree of the handicap, marital status, years of schooling, household income, a dummy variable for whether or not there are children under the age of 16 in the household, and dummy variables for self-employment, civil servants, blue collar employees and employed people, and for take up of public and addon health insurance.

Estimates of the pooled probit model and the random effects probit model are reported in Table 1. The computed value of the LM test is 129.44, which clearly rejects the null hypothesis of no random effects (one degree of freedom, p-value in brackets). The values of the Wald and LR tests are 162.08 and 264.62, respectively. (The p-values are computed using the non-standard distribution noted in Andrews, 2001.) All three tests clearly reject the null hypothesis of no random effects.
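The LR statistic follows directly from the two log-likelihood values reported in Table 1, and the boundary-adjusted p-value and critical values can be checked numerically. The sketch below is illustrative only: it assumes SciPy is available and assumes the null distribution of the LR statistic is the 50:50 mixture of a point mass at zero and a χ²(1); the function names are hypothetical.

```python
from scipy.stats import chi2

# Log-likelihood values reported in Table 1.
loglik_pooled = -3674.9207
loglik_re = -3542.6122

# LR statistic: twice the gain in log likelihood from adding the random effect.
lr_stat = 2.0 * (loglik_re - loglik_pooled)


def boundary_pvalue(stat):
    """p-value under the assumed 50:50 mixture of a point mass at zero and chi2(1)."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2.sf(stat, df=1)


def boundary_critical_value(alpha):
    """Critical value c solving 0.5 * P(chi2(1) > c) = alpha, for alpha < 0.5."""
    return chi2.ppf(1.0 - 2.0 * alpha, df=1)


print(round(lr_stat, 2))
print(boundary_pvalue(lr_stat))
print(boundary_critical_value(0.05), boundary_critical_value(0.10))
```

Under this mixture the p-value is half the usual χ²(1) tail probability, so the test rejects at level α whenever the statistic exceeds the χ²(1) quantile at 1 − 2α.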
4. Conclusion

The strategy used here appears in Lee and Chesher (1986) and Chesher (1984) for some special cases: a sample selection model, the stochastic frontier model and models with unobserved heterogeneity. The reformulation in terms of generalized residuals is new. The latter result implies that the test can easily be extended to other single index models, such as the Tobit and Poisson regression models and, in fact, is what is used for the linear random effects model. We find it surprising that, despite its simplicity, the LM test for random effects in a probit model has not been used routinely, even though the null hypothesis being tested is typically part of empirical analysis using the probit model with panel data.

Acknowledgments

The authors thank Debopam Bhattacharya for helpful comments on an earlier version of this paper. The second author acknowledges the financial support of the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) Supported Program for the Strategic Research Foundation at Private Universities entitled ‘Globalization and the Building of a High Quality Economic System.’

References

Andrews, D.W.K., 2001. Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica 69, 683–734.
Arulampalam, W., 1999. A note on estimated coefficients in random effects probit models. Oxford Bull. Econ. Stat. 61, 597–602.
Breusch, T.S., Pagan, A.R., 1980. The Lagrange multiplier test and its applications to model specification in econometrics. Rev. Econom. Stud. 47, 239–253.
Butler, J.S., Moffitt, R., 1982. A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica 50, 761–764.
Chesher, A.D., 1984. Testing for neglected heterogeneity. Econometrica 52, 865–872.
Chesher, A., Irish, M., 1987. Residual analysis in the grouped and censored normal linear model. J. Econometrics 34, 33–61.
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman and Hall, London.
Gourieroux, C., Monfort, A., Renault, E., Trognon, A., 1987. Generalized residuals. J. Econometrics 34, 5–32.
Kiefer, N.M., 1982. A remark on the parameterization of a model for heterogeneity. Working Paper 278, Department of Economics, Cornell University, Ithaca.
Lee, L.-F., Chesher, A., 1986. Specification testing when score test statistics are identically zero. J. Econometrics 31, 121–149.
Riphahn, R.T., Wambach, A., Million, A., 2003. Incentive effects in the demand for health care: a bivariate panel count data estimation. J. Appl. Econometrics 18, 387–405.