Economics Letters 80 (2003) 373–377
www.elsevier.com/locate/econbase

Fixed effects models with time invariant variables: a theoretical note

Ronald L. Oaxaca*, Iris Geisler
University of Arizona, Department of Economics, 401 McClelland Hall, PO Box 210108, Tucson, AZ 85721-0108, USA

Received 25 October 2002; accepted 17 March 2003

Abstract
This paper demonstrates the equivalence between a consistent two-stage GLS estimator and the pooled OLS estimator of the coefficients on time invariant covariates in an unbalanced FE panel. In general the estimated standard errors differ between these two procedures. © 2003 Elsevier B.V. All rights reserved.

Keywords: Panel data; Time invariant regressors
JEL classification: C2
It is not uncommon to find explanatory variables of interest in panel data sets that are time invariant, e.g. race, sex, regional location. In a fixed effects model these variables are "swept away" by the within estimator of the coefficients on the time varying covariates. Nevertheless, it is possible to identify and consistently estimate the effects of the time invariant regressors through two-stage procedures. Hausman and Taylor (1981) analyze models in which some of the variables (both time varying and time invariant) are endogenous. Baltagi (1995) provides a comprehensive treatment of panel data models in the contexts of both single equation and systems methods. Polachek and Kim (1994) examine a single equation model in which the slope parameters of time-invariant regressors vary across individuals. In this paper we consider the case of a single equation model with an unbalanced design and time invariant regressors. We develop a two-stage GLS estimation procedure for consistent estimation of the coefficients on the time invariant regressors, and demonstrate the equivalence between these GLS coefficient estimates and the OLS coefficient estimates on the time invariant covariates from a pooled cross-section, time-series model. However, in general the estimated standard errors differ between these two procedures.

* Corresponding author. Tel.: +1-520-621-4135; fax: +1-520-621-8450. E-mail address: [email protected] (R.L. Oaxaca).
0165-1765/03/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0165-1765(03)00121-6
The basic model may be expressed as
$$Y_{it} = \alpha_i + X_{it}\beta + \varepsilon_{it}, \qquad i = 1,\dots,n; \quad t = 1,\dots,T_i, \tag{1}$$
where $\alpha_i$ is an individual specific intercept term, $X_{it}$ is a $1 \times k_2$ vector of observations on the time varying covariates, and $\beta$ is a $k_2 \times 1$ parameter vector. Consider the restriction $\alpha_i = \alpha + Z_{1i}\gamma^1$, where $\alpha$ is a constant, $Z_{1i}$ is a $1 \times k_1$ vector of observations on time invariant covariates, and $\gamma^1$ is a $k_1 \times 1$ parameter vector. Under the restrictions on the fixed effects, pooled OLS would be the estimator of choice given the usual assumptions on $\varepsilon_{it}$. If the restrictions are not true, the fixed effects within estimator can be used to estimate $\beta$ from the model:
$$y_{it} = x_{it}\beta + \varepsilon_{it} - \varepsilon_{i.}, \tag{2}$$
where $y_{it} = Y_{it} - Y_{i.}$, $x_{it} = X_{it} - X_{i.}$, and the subscript $i.$ denotes the group mean for the $i$th cross-sectional unit. The FE estimator is accordingly
$$\tilde\beta^{fe} = (X'M_D X)^{-1} X'M_D Y, \tag{3}$$
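As a minimal numerical sketch of Eq. (3), assuming simulated data (the panel sizes, coefficient values, and variable names below are illustrative and not taken from the paper), the within estimator can be computed either through the projection $M_D$ or by subtracting group means as in Eq. (2). The two routes should agree on an unbalanced panel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced panel: n units with T_i observations each.
T = np.array([2, 3, 4, 3, 5])
n, k2 = T.size, 2
ids = np.repeat(np.arange(n), T)          # unit index of every row
N = ids.size                              # total number of rows, n * T-bar

X = rng.normal(size=(N, k2))              # time varying covariates
alpha = rng.normal(size=n)                # individual specific intercepts
beta = np.array([1.0, -0.5])              # illustrative true slopes
Y = alpha[ids] + X @ beta + 0.1 * rng.normal(size=N)

# Dummy matrix D (N x n) and the projection M_D = I - D (D'D)^{-1} D'.
D = (ids[:, None] == np.arange(n)).astype(float)
M_D = np.eye(N) - D @ np.linalg.solve(D.T @ D, D.T)

# Route 1: Eq. (3) evaluated directly.
beta_fe = np.linalg.solve(X.T @ M_D @ X, X.T @ M_D @ Y)

# Route 2: subtract group means (Eq. (2)) and run OLS on the deviations.
X_bar = np.linalg.solve(D.T @ D, D.T @ X)  # n x k2 matrix of group means
Y_bar = np.linalg.solve(D.T @ D, D.T @ Y)  # length-n vector of group means
x, y = X - X_bar[ids], Y - Y_bar[ids]
beta_within = np.linalg.lstsq(x, y, rcond=None)[0]

print(beta_fe, beta_within)
```

Forming the full $M_D$ matrix is convenient for checking the algebra but scales poorly; for larger panels the demeaning route is the more practical choice.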
where $X$ is an $n\bar T \times k_2$ observation matrix for the $X$'s, $M_D = I_{n\bar T} - D(D'D)^{-1}D'$, $D$ is an $n\bar T \times n$ observation matrix for the individual dummy variables, $\bar T = n^{-1}\sum_{i=1}^{n} T_i$, and $Y$ is an $n\bar T \times 1$ observation vector on the $Y$'s. Note that the group means representation of the restricted model is given by
$$Y_{i.} = \alpha + Z_{1i}\gamma^1 + X_{i.}\beta + \varepsilon_{i.}. \tag{4}$$
Upon subtracting $X_{i.}\tilde\beta^{fe}$ from both sides of Eq. (4) we obtain
$$\begin{aligned}
Y_{i.} - X_{i.}\tilde\beta^{fe} &= \alpha + Z_{1i}\gamma^1 + \eta_i, \qquad i = 1,\dots,n \\
&= Z_i\gamma + \eta_i,
\end{aligned} \tag{5}$$
where $\eta_i = X_{i.}(\beta - \tilde\beta^{fe}) + \varepsilon_{i.}$, $Z_i = (1, Z_{1i})$ and $\gamma = (\alpha, \gamma^{1\prime})'$. Note that $E(\eta_i) = 0$, and $\mathrm{Var}(\eta) = \sigma_\varepsilon^2\,\Omega$, where $\eta = (\eta_1,\dots,\eta_n)'$. A typical diagonal element of $\Omega$ is given by $(1/T_i) + X_{i.}(X'M_D X)^{-1}X_{i.}'$, and a typical off-diagonal element is given by $X_{i.}(X'M_D X)^{-1}X_{j.}'$. Therefore, $\Omega = (D'D)^{-1} + \bar X(X'M_D X)^{-1}\bar X'$, where $\bar X = (D'D)^{-1}D'X$ is an $n \times k_2$ observation matrix on the $X_{i.}$'s. Polachek and Kim (1994) estimate Eq. (5) by an OLS method and by a GLS method that corrects only for the heteroscedasticity inherent in the diagonal elements of $\Omega$. Our two-stage GLS estimator of $\gamma$ also takes account of the non-zero off-diagonal elements of $\Omega$:
$$\tilde\gamma^{fe} = (\bar Z'\Omega^{-1}\bar Z)^{-1}\bar Z'\Omega^{-1}(\bar Y - \bar X\tilde\beta^{fe}), \tag{6}$$
where $\bar Z$ is an $n \times (k_1 + 1)$ observation matrix on the constant term and the $Z_i$'s, and $\bar Y$ is an $n \times 1$ observation vector on the $Y_{i.}$'s. The variance/covariance matrix for $\tilde\gamma^{fe}$ is given by $\sigma_\varepsilon^2(\bar Z'\Omega^{-1}\bar Z)^{-1}$. Now consider the restricted pooled cross-section, time series model specified by
$$Y_{it} = Z_i\gamma + X_{it}\beta + \varepsilon_{it}, \qquad i = 1,\dots,n; \quad t = 1,\dots,T_i. \tag{7}$$
Let $\hat\gamma^{ols}$ denote the OLS estimator of $\gamma$ from Eq. (7):
$$\hat\gamma^{ols} = (Z'M_x Z)^{-1}Z'M_x Y,$$
where $Z$ is the $n\bar T \times (k_1 + 1)$ observation matrix on the time invariant covariates and constant term, and $M_x = I_{n\bar T} - X(X'X)^{-1}X'$. It is easily shown that $\mathrm{Var}(\hat\gamma^{ols}) = \sigma_\varepsilon^2(Z'M_x Z)^{-1}$. It turns out that $\hat\gamma^{ols} = \tilde\gamma^{fe}$, or more formally

Theorem. $(Z'M_x Z)^{-1}Z'M_x Y = (\bar Z'\Omega^{-1}\bar Z)^{-1}\bar Z'\Omega^{-1}(\bar Y - \bar X\tilde\beta^{fe})$.

The following results will be used in the proof of the theorem:
$$\bar Z = (D'D)^{-1}D'Z, \qquad \bar X = (D'D)^{-1}D'X, \qquad \bar Y = (D'D)^{-1}D'Y.$$

Proof. (Part 1) We first show that $(Z'M_x Z)^{-1} = (\bar Z'\Omega^{-1}\bar Z)^{-1}$. This is equivalent to showing $Z'M_x Z = \bar Z'\Omega^{-1}\bar Z$. Upon substitution for $\bar Z$, we have
$$Z'M_x Z = Z'\left[D(D'D)^{-1}\Omega^{-1}(D'D)^{-1}D'\right]Z.$$
Since $Z \neq 0_{(n\bar T \times (k_1+1))}$, it suffices to show $M_x = D(D'D)^{-1}\Omega^{-1}(D'D)^{-1}D'$. Premultiplying by $D'$ and postmultiplying by $D$ yields
$$D'M_x D = \Omega^{-1} \;\Rightarrow\; (D'M_x D)^{-1} = \Omega.$$
Substitution for $M_x$ yields
$$\left[D'D - D'X(X'X)^{-1}X'D\right]^{-1} = \Omega.$$
We make use of the matrix algebra result $[A + BCB']^{-1} = A^{-1} - A^{-1}B[C^{-1} + B'A^{-1}B]^{-1}B'A^{-1}$ (see Greene, 2000), where $A = D'D$, $B = D'X$, and $C = (-X'X)^{-1}$. Upon making these substitutions, we obtain
$$\begin{aligned}
& (D'D)^{-1} - (D'D)^{-1}D'X\left[-X'X + X'D(D'D)^{-1}D'X\right]^{-1}X'D(D'D)^{-1} = \Omega \\
\Rightarrow\; & (D'D)^{-1} + (D'D)^{-1}D'X\left[X'X - X'D(D'D)^{-1}D'X\right]^{-1}X'D(D'D)^{-1} = \Omega \\
\Rightarrow\; & (D'D)^{-1} + (D'D)^{-1}D'X\left[X'M_D X\right]^{-1}X'D(D'D)^{-1} = \Omega.
\end{aligned}$$
Substituting $\bar X$ for $(D'D)^{-1}D'X$ yields $(D'D)^{-1} + \bar X(X'M_D X)^{-1}\bar X' = \Omega$, which completes the first part of the proof. This also proves that
$$\mathrm{Var}(\hat\gamma^{ols}) = \sigma_\varepsilon^2(Z'M_x Z)^{-1} = \sigma_\varepsilon^2(\bar Z'\Omega^{-1}\bar Z)^{-1} = \mathrm{Var}(\tilde\gamma^{fe}).$$

(Part 2) The second half of the proof is to show $Z'M_x Y = \bar Z'\Omega^{-1}(\bar Y - \bar X\tilde\beta^{fe})$. Upon substituting for $\bar Z$, $\bar Y$ and $\bar X$ and collecting terms, we have
$$Z'M_x Y = Z'D(D'D)^{-1}\Omega^{-1}(D'D)^{-1}D'(Y - X\tilde\beta^{fe}).$$
As shown above in Part 1, $M_x = D(D'D)^{-1}\Omega^{-1}(D'D)^{-1}D'$; therefore, we can write
$$Z'M_x Y = Z'M_x(Y - X\tilde\beta^{fe}) \;\Rightarrow\; Z'M_x Y = Z'M_x Y - Z'M_x X\tilde\beta^{fe} = Z'M_x Y,$$
since $M_x X = 0_{(n\bar T \times k_2)}$.
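The theorem can be checked numerically. The sketch below is only an illustration under assumed simulated data (the sizes, seed, and coefficient values are hypothetical and not from the paper). Because $Z$ is time invariant, $Z = D\bar Z$, so the relation that enters the estimators is the sandwiched identity $D'M_x D = \Omega^{-1}$. The code verifies that identity and then compares the two-stage GLS estimate from Eq. (6) with pooled OLS on Eq. (7).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unbalanced design (all sizes and coefficients are illustrative).
n, k1, k2 = 6, 2, 3
T = rng.integers(2, 6, size=n)                 # T_i drawn from {2, ..., 5}
ids = np.repeat(np.arange(n), T)
N = ids.size
D = (ids[:, None] == np.arange(n)).astype(float)

Z_bar = np.column_stack([np.ones(n), rng.normal(size=(n, k1))])  # rows Z_i
Z = D @ Z_bar                                  # each Z_i repeated T_i times
X = rng.normal(size=(N, k2)) + Z[:, 1:] @ rng.normal(size=(k1, k2))
gamma = np.array([1.0, 0.5, -0.3])
beta = np.array([0.8, -1.2, 0.4])
Y = Z @ gamma + X @ beta + rng.normal(size=N)

DtD_inv = np.diag(1.0 / T)                     # (D'D)^{-1} = diag(1 / T_i)
X_bar, Y_bar = DtD_inv @ D.T @ X, DtD_inv @ D.T @ Y
M_D = np.eye(N) - D @ DtD_inv @ D.T
M_x = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

# First stage: within estimator, Eq. (3); then Omega as defined in the text.
XMX_inv = np.linalg.inv(X.T @ M_D @ X)
beta_fe = XMX_inv @ X.T @ M_D @ Y
Omega = DtD_inv + X_bar @ XMX_inv @ X_bar.T

# Part 1 of the proof: (D' M_x D)^{-1} = Omega.
part1_ok = np.allclose(np.linalg.inv(D.T @ M_x @ D), Omega)

# Second stage GLS, Eq. (6), versus pooled OLS on Eq. (7).
O_inv = np.linalg.inv(Omega)
gamma_gls = np.linalg.solve(Z_bar.T @ O_inv @ Z_bar,
                            Z_bar.T @ O_inv @ (Y_bar - X_bar @ beta_fe))
gamma_ols = np.linalg.solve(Z.T @ M_x @ Z, Z.T @ M_x @ Y)

print(part1_ok, np.allclose(gamma_gls, gamma_ols))
```

The covariates are deliberately correlated with the time invariant regressors here, so the agreement between the two estimates is not an artifact of orthogonal regressors.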
This proves the second half of the theorem: $Z'M_x Y = \bar Z'\Omega^{-1}(\bar Y - \bar X\tilde\beta^{fe})$. Thus, $(Z'M_x Z)^{-1}Z'M_x Y = (\bar Z'\Omega^{-1}\bar Z)^{-1}\bar Z'\Omega^{-1}(\bar Y - \bar X\tilde\beta^{fe})$, or $\hat\gamma^{ols} = \tilde\gamma^{fe}$.

Although the true standard errors for $\hat\gamma^{ols}$ and $\tilde\gamma^{fe}$ are the same, the estimated standard errors are in general different for the two estimation procedures. This is due to differences in estimating the error variance. For model (7)
$$\hat\sigma_\varepsilon^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\left(Y_{it} - Z_i\hat\gamma^{ols} - X_{it}\hat\beta^{ols}\right)^2}{n\bar T - (k_1 + k_2 + 1)},$$
and for GLS estimation of model (5) the error variance for $\varepsilon$ is estimated with the residuals from the within (fixed effects) estimator of model (2):
$$\hat\sigma_\varepsilon^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}\left[(Y_{it} - Y_{i.}) - (X_{it} - X_{i.})\tilde\beta^{fe}\right]^2}{n\bar T - (n + k_2)}.$$
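To illustrate how the estimated standard errors can differ even though the coefficient estimates coincide, the following sketch computes both error variance estimates on simulated data. The data generating process, sizes, and names are assumptions made for illustration only. It relies on the result from Part 1 that both estimators share the matrix $(Z'M_x Z)^{-1}$, so the two sets of standard errors differ only through $\hat\sigma_\varepsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical unbalanced panel generated under the restriction on alpha_i.
n, k1, k2 = 8, 2, 2
T = rng.integers(2, 6, size=n)
ids = np.repeat(np.arange(n), T)
N = ids.size
D = (ids[:, None] == np.arange(n)).astype(float)
Z_bar = np.column_stack([np.ones(n), rng.normal(size=(n, k1))])
Z = D @ Z_bar
X = rng.normal(size=(N, k2))
Y = Z @ np.array([1.0, 0.5, -0.3]) + X @ np.array([0.8, -1.2]) + rng.normal(size=N)

# Pooled OLS on model (7): residuals and error variance estimate.
W = np.column_stack([Z, X])
e_ols = Y - W @ np.linalg.lstsq(W, Y, rcond=None)[0]
s2_ols = e_ols @ e_ols / (N - (k1 + k2 + 1))

# Within estimator on model (2): residuals and error variance estimate.
M_D = np.eye(N) - D @ np.diag(1.0 / T) @ D.T
beta_fe = np.linalg.solve(X.T @ M_D @ X, X.T @ M_D @ Y)
e_fe = M_D @ (Y - X @ beta_fe)
s2_fe = e_fe @ e_fe / (N - (n + k2))

# Common matrix (Z' M_x Z)^{-1}, scaled by each variance estimate.
M_x = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
V = np.linalg.inv(Z.T @ M_x @ Z)
se_ols = np.sqrt(s2_ols * np.diag(V))   # estimated s.e. from pooled OLS
se_gls = np.sqrt(s2_fe * np.diag(V))    # estimated s.e. from two-stage GLS

print(s2_ols, s2_fe)
print(se_ols, se_gls)
```

The two standard error vectors are proportional, with ratio equal to the square root of the ratio of the two variance estimates.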
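Consistency of the two-stage GLS estimator is shown next. From Eqs. (5) and (6) we can express the estimator as
$$\begin{aligned}
\tilde\gamma^{fe} &= (\bar Z'\Omega^{-1}\bar Z)^{-1}\bar Z'\Omega^{-1}(\bar Z\gamma + \eta) \\
&= \gamma + (\bar Z'\Omega^{-1}\bar Z)^{-1}\bar Z'\Omega^{-1}\eta \\
&= \gamma + (\bar Z'\Omega^{-1}\bar Z)^{-1}\bar Z'\Omega^{-1}\left[\bar X(\beta - \tilde\beta^{fe}) + \bar\varepsilon\right].
\end{aligned}$$
Consistency depends on the time series observations approaching infinity so that $T_i \to \infty,\ \forall i \Rightarrow \bar T \to \infty$. Therefore,
$$\begin{aligned}
\operatorname{plim}\tilde\gamma^{fe} &= \gamma + \left[\lim_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar Z)\right]^{-1}\lim_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar X)\,\operatorname{plim}(\beta - \tilde\beta^{fe}) \\
&\quad + \left[\lim_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar Z)\right]^{-1}\operatorname*{plim}_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar\varepsilon) \\
&= \gamma,
\end{aligned}$$
assuming $\left[\lim_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar Z)\right]^{-1}$ is finite and positive definite, $\lim_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar X)$ is finite, $\tilde\beta^{fe}$ is consistent, and $\operatorname{plim}_{\bar T \to \infty}\bar T^{-1}(\bar Z'\Omega^{-1}\bar\varepsilon) = 0$. A straightforward $F$ test can be used to test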
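$$H_0: \alpha_i = \alpha + Z_{1i}\gamma^1 \quad \text{(pooled OLS)}$$
$$H_1: \alpha_i \neq \alpha + Z_{1i}\gamma^1 \quad \text{(FE)}:$$
$$\frac{\left(\hat\varepsilon_{ols}'\hat\varepsilon_{ols} - \hat\varepsilon_{fe}'\hat\varepsilon_{fe}\right)/(n - k_1 - 1)}{\hat\varepsilon_{fe}'\hat\varepsilon_{fe}/(n\bar T - n - k_2)} \sim F_{(n - k_1 - 1),\,(n\bar T - n - k_2)}.$$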
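The test statistic above can be computed from two residual sums of squares. The sketch below is one possible implementation, not the authors' code: the function name `pooled_vs_fe_F`, its argument layout, and the simulated data are all hypothetical. It assumes `Z1` holds one row per unit, ordered like the sorted unit identifiers, and it obtains the FE residuals from the equivalent dummy variable regression of $Y$ on $[D, X]$.

```python
import numpy as np

def pooled_vs_fe_F(Y, X, Z1, ids):
    """F statistic for H0: alpha_i = alpha + Z1_i gamma^1 against fixed effects.

    Z1 has one row per unit, ordered like the sorted unique values of ids.
    Returns (F, df1, df2).
    """
    units, idx = np.unique(ids, return_inverse=True)
    n, N = units.size, Y.size
    k1, k2 = Z1.shape[1], X.shape[1]
    D = (idx[:, None] == np.arange(n)).astype(float)   # individual dummies
    Z = np.column_stack([np.ones(N), Z1[idx]])         # constant and Z_1i

    def ssr(W):
        # Residual sum of squares from an OLS regression of Y on W.
        resid = Y - W @ np.linalg.lstsq(W, Y, rcond=None)[0]
        return resid @ resid

    ssr_ols = ssr(np.column_stack([Z, X]))   # restricted: pooled model (7)
    ssr_fe = ssr(np.column_stack([D, X]))    # unrestricted: fixed effects
    df1, df2 = n - k1 - 1, N - n - k2
    F = ((ssr_ols - ssr_fe) / df1) / (ssr_fe / df2)
    return F, df1, df2

# Illustrative data in which the restriction fails (extra unit-level noise).
rng = np.random.default_rng(3)
n, k1, k2 = 8, 2, 2
T = rng.integers(2, 6, size=n)
ids = np.repeat(np.arange(n), T)
Z1 = rng.normal(size=(n, k1))
X = rng.normal(size=(ids.size, k2))
alpha_i = 1.0 + Z1 @ np.array([0.5, -0.3]) + 2.0 * rng.normal(size=n)
Y = alpha_i[ids] + X @ np.array([0.8, -1.2]) + rng.normal(size=ids.size)

F, df1, df2 = pooled_vs_fe_F(Y, X, Z1, ids)
print(F, df1, df2)
```

A p-value can then be read from the upper tail of the $F(df_1, df_2)$ distribution, for example with `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.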
Acknowledgements
We gratefully acknowledge the helpful comments of Badi Baltagi, Greg Crawford, Alfonso Flores-Lagunes, Daniel Houser, and William Horrace.
References
Baltagi, B.H., 1995. Econometric Analysis of Panel Data. John Wiley and Sons, Chichester.
Greene, W.H., 2000. Econometric Analysis, 4th Edition. Prentice Hall, Upper Saddle River, NJ.
Hausman, J.A., Taylor, W.E., 1981. Panel data and unobservable individual effects. Econometrica 49, 1377–1398.
Polachek, S.W., Kim, M., 1994. Panel estimates of the gender earnings gap: individual-specific intercept and individual-specific slope models, in: Neuman, S., Silber, J. (Eds.), The Econometrics of Labor Market Segregation and Discrimination, Journal of Econometrics 61, 23–42.