Regional Science and Urban Economics 40 (2010) 272–282
Contents lists available at ScienceDirect
Regional Science and Urban Economics j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / r e g e c
Dynamic panels with endogenous interaction effects when T is small J. Paul Elhorst ⁎ University of Groningen, Department of Economics and Econometrics, P.O. Box 800, 9700 AV Groningen, The Netherlands
a r t i c l e
i n f o
Article history: Received 22 January 2009 Received in revised form 17 March 2010 Accepted 17 March 2010 Available online 25 March 2010 JEL classification: C15 C21 C22 C23 Keywords: Dynamic panels Fixed effects Interaction effects Monte Carlo experiments
a b s t r a c t This paper compares the performance, measured in terms of bias, root mean squared error and computation time, of different estimators of the fixed effects dynamic panel data model extended to include endogenous interaction effects when T is small: (i) the bias corrected LSDV (BCLSDV) estimator based on Yu, De Jong and Lee (2008); (ii) the ML estimator based on Hsiao, Pesaran and Thamiscioglu (2002) and Bhargava and Sargan (1983) extended in this paper to include endogenous interaction effects; (iii) the GMM estimator based on Arrelano and Bond (1991) extended in this paper to include endogenous interaction effects; (iv) the ML estimator mixed with the BCLSDV parameter estimate of the endogenous interaction effects; and (v) the GMM estimator mixed with the BCLSDV parameter estimate of the endogenous interaction effects. It is found that the mixed ML/BCLSDV estimator outperforms the other estimators in terms of bias and root mean squared error when T = 5, and that the mixed GMM/BCLSDV estimator is a reasonable alternative for values of N greater than 500 due to differences in computation time. © 2010 Elsevier B.V. All rights reserved.
1. Introduction The idea that cross-sectional units interact with others has received considerable attention, as evidenced in the development of theoretical frameworks to explain social phenomena such as social norms, peer influence, neighborhood effects, network effects, contagion, epidemics, social interactions, interdependent preferences, etc. (for examples and reviews, see, among others, Manski, 1993; Brock and Durlauf 2001; Brueckner, 2003; Soetevent, 2006). The term we use throughout this paper is interaction effects. More recently, there has also been a growing interest in the estimation and testing of dynamic panel data models with interaction effects. Baltagi, Song, Jung, and Koh (2006) consider the testing of serial and spatial error correlation in a random effects model. Elhorst (2005) and Su and Yang (2007) study dynamic models with fixed or random effects extended to include cross-sectionally correlated error terms. Yu, De Jong, and Lee (2008) construct a bias corrected estimator for a dynamic model with fixed effects extended to include both contemporaneous and lagged endogenous interaction effects, by analytically modifying the Least Squares Dummy Variables (LSDV) estimator. They provide a rigorous asymptotic theory for the LSDV estimator and their bias corrected LSDV (BCLSDV) estimator when both the number of spatial units (N) and the number of time points (T) in the sample go to infinity such that the limit of the ratio of N and ⁎ Tel.: +31 50 3633893; fax: +31 50 3637337. E-mail address:
[email protected]. 0166-0462/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.regsciurbeco.2010.03.003
T exists and is bounded between 0 and ∞ (0 b lim(N/T) b ∞). In addition, they conduct Monte Carlo experiments for T = 10 and 50, and N = 49 and 196. Korniotis (2010) constructs a bias corrected LSDV estimator for a dynamic panel data model with fixed effects extended to include lagged endogenous interaction effects, also assuming 0 b lim(N/T) b ∞, and conducts Monte Carlo experiments for T = 20, 30, 40 and 60, and N = 48. One question asks to which extent these bias corrected estimators are of practical relevance when T is small relative to N as is the case in many empirical studies dealing with interaction effects.1 To answer this question, we consider two alternative estimators taken from the dynamic panel data literature, and extend these estimators in this paper to include contemporaneous endogenous interaction effects. The objective of this paper is to compare the performance of these estimators and of the BCLSDV estimator, as measured in terms of bias, root mean squared error and computation time, using Monte Carlo experiments for values of T that are smaller than in Yu et al. (2008). In addition, and for reasons to be explained below, we will consider values of N that are greater. The first estimator that is extended to include endogenous interaction effects in this paper is the maximum likelihood (ML) estimator based on Hsiao, Pesaran, and Thamiscioglu (2002) and Bhargava and Sargan (1983). Regression equations that include variables lagged one period in time are often estimated conditional upon the first observation. The same applies to the BCLSDV estimator 1
When N/T → ∞, Yu et al. (2008) find their estimators to be T consistent, as is usual.
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
of Yu et al. (2008). Nerlove (1999, p. 139), however, has pointed out that conditioning on the initial values is an “undesirable feature”, especially when the time dimension of the panel is short. If the process generating the data in the sample period is stationary, the initial values convey a great deal of information about this process since they reflect how it has operated in the past. By taking account of the density function of the first observation of each time-series of observations, the unconditional likelihood function is obtained. This procedure has been applied successfully to random effects dynamic panel data models formulated in levels (Bhargava and Sargan, 1983), though without endogenous interaction effects. More recently, Hsiao et al. (2002) have suggested a similar procedure (again without endogenous interaction effects) for a fixed effects dynamic panel data model. This procedure first-differences the model to eliminate the cross-sectional fixed effects and then considers the unconditional likelihood function of the first-differenced model taking into account the density function of the first first-differenced observations for each cross-sectional unit. They find that this likelihood function is well defined, depends on a fixed number of parameters and satisfies the usual regularity conditions. Thereupon, they conclude that the ML estimator is consistent and asymptotically normally distributed when N tends to infinity, regardless of the size of T. They also find that the ML estimator is asymptotically more efficient than the GMM estimator. The second estimator is the generalized method-of-moments (GMM) estimator based on Arrelano and Bond (1991). Just as the unconditional ML estimator, this estimator first-differences the model to eliminate the cross-sectional fixed effects, but then it applies GMM using a set of appropriate instruments. This is perhaps the most popular estimation method for dynamic panel models at the moment and can easily be extended to include endogenous interaction effects, as we will show in this paper. One of the main results of our Monte Carlo experiments is that the bias in the response parameter of the endogenous interaction effects, when using the BCLSDV estimator, is rather small. For this reason, we will also consider the performance of the ML and GMM estimators mixed with the BCLSDV parameter estimate of the endogenous interaction effects. These improved estimators are called the mixed ML/BCLSDV and GMM/BCLSDV estimators. The reason for also considering computation time when measuring performance is because any estimator that cannot be computed within a reasonable amount of time impedes its application in empirical research. Estimation of models with endogenous interaction effects involves the manipulation of N × N matrices, such as matrix multiplication, matrix inversion, the computation of characteristic roots and/or Cholesky decomposition.
273
These manipulations may be computationally intensive and/or may require significant amounts of memory if N is large. Although the mixed ML/BCLSDV estimator will eventually turn out to be the best performing of the estimators, the mixed GMM/BCLSDV estimator is found to be a reasonable alternative for values of N greater than 500 due to differences in computation time. The remainder of this paper is organized as follows. Section 2 sets out the dynamic panel data model. Section 3 presents the BCLSDV estimator, Section 4 the extended ML estimator, and Section 5 the extended GMM estimator. Section 6 describes the setup and the results of our Monte Carlo experiments in order to evaluate the performance of the different estimators, while Section 7 offers our conclusions. The Appendix contains an overview of methods that have been used to speed up computation time. 2. The model We focus on a dynamic panel data model with cross-sectional fixed effects extended to include endogenous interaction effects N
yit = τyit−1 + δ ∑ wij yjt + xit β + μi + εit ;
ð1Þ
j=1
where yit is the dependent variable for cross-sectional unit i at time t (i = 1, ..., N; t = 1, ..., T). τ is the response parameter of the lagged dependent variable, which is assumed to be restricted to the interval (− 1,1). xit a 1 × K vector of exogenous variables, and β a matching K × 1 vector of fixed but unknown parameters. The variable ∑jwijyjt denotes the interaction effect of the dependent variable yit with the dependent variables yjt in neighboring units, where wij is an element of a pre-specified nonnegative N × N spatial weights matrix W describing the arrangement of the cross-sectional units in the sample.2 The response parameter of the endogenous interaction effects, δ, is assumed to be restricted to the interval (1/ r min , 1), where r min equals the most negative purely real characteristic root of W after this matrix is row-normalized (see LeSage and Pace, 2009, pp. 88–89 for mathematical details). εit is assumed to be normally distributed throughout this paper, ɛit ∼ N (0, σ2), and μi is a cross-sectional fixed effect. The standard reasoning behind cross-sectional fixed effects is that they control for all time-invariant variables whose omission could bias the estimates in a typical cross-section study (Baltagi, 2005). Since the full sample data matrix of exogenous variables should have full column rank, it may not contain time-invariant variables because these variables would be perfectly collinear with the cross-sectional fixed effects.
3. The bias corrected LSDV estimator The Least Squares Dummy Variables (LSDV) estimator of the fixed effects model taking into account the endogeneity of ∑jwijyjt can be obtained by concentrating out the fixed effects (Yu et al., 2008; Elhorst, 2010). For this purpose the dependent variable is demeaned by 1 T * yit = yit − ∑ yit ; T t =1
ð2Þ N
* + δ ∑ w y* + x* β + ε* is and the same for the variables yit − 1, Σjwijyjt and xit (Baltagi, 2005, p. 34). Next, the resulting equation yit* = τyit−1 ij jt it it j=1
estimated by ML. Since the error terms have been assumed to be normally distributed, the corresponding log-likelihood function is
LogL = −
T NT 1 N 2 log 2πσ + T log jIN −δWj− 2 ∑ ∑ 2 2σ i = 1 t = 1
!2 " # N * * −δ ∑ w y *β : yit* −τyit−1 −x ij jt it j=1
ð3Þ
2 The symbol W, and the designations “spatial weights matrix” and “neighboring units” are taken from the spatial econometrics literature. The social economics literature uses the notation E(y|x) to denote endogenous interaction within groups at a particular point in time (Manski, 1993). Although they define different models, both specifications are related to each other (Leenders, 2002).
274
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
A simple two-step procedure can be used to find the maximum for τ, δ, β and σ2 in Eq. (3). Anselin (1988) and Anselin and Hudak (1992) furnishes the cross-sectional version of this procedure, while Elhorst (2010) extends this procedure for spatial panels. If the observations are stacked as successive cross-sections for t = 1,…,T to obtain NT × 1 vectors for Y⁎, (IT ⊗ W)Y* and Y− 1⁎, where the symbol ⊗ denotes the Kronecker product, and an NT × K matrix for X⁎ of the demeaned variables,3 then the estimates for τ, β and σ2 can be expressed in the function of the autoregressive coefficient δ τˆ ˜ 0 ˜ −1X* ˜ 0 ½Y*−δðI ⊗WÞY*; T ˆ = X* X* β 2 σˆ =
ð4aÞ
h i h i 1 ˜ τˆ β ˜ τˆ β ˆ 0 0 0 Y*−δðIT ⊗WÞY*−X* ˆ0 0 ; Y*−δðIT ⊗WÞY*−X* NT
ð4bÞ
⁎ 1 X *]. Substitution of Eqs. (4a) and (4b) into Eq. (3) yields the concentrated likelihood function, which contains only one where X̃ * = [Y− unknown parameter, the autoregressive coefficient δ LogLC = C−
NT 0 log ðe0 −δe1 Þ ðe0 −δe1 Þ + T log jIN −δWj; 2
ð5Þ
where C is a constant not depending on δ and e0 and e1 are residuals of successively regressing Y * and (IT ⊗ W)Y * on X̃ *. This maximization problem can only be solved numerically, since a closed-form solution for δ does not exist. However, since the concentrated log-likelihood function is concave in δ, the numerical solution is unique (Anselin and Hudak, 1992). The estimator of δ, τ and β is called the LSDV estimator. The LSDV estimator is not consistent if the model contains a lagged dependent variable as in this paper. Nickell (1981) has derived an analytical expression for the bias in the coefficient τ of the lagged dependent variable using the LSDV estimator in a fixed effects dynamic panel data model without endogenous interaction effects, and Yu et al. (2008) for the bias in the coefficients τ, δ, β and σ2 using the LSDV estimator extended to a spatial setting and provided that both N and T go to infinity such that 0 b lim(N/T) b ∞. This bias takes the following form (see the formulae in their box 1)4 h 3 i−1 1 tr 1−τˆ IN −δˆ W 6 7 N 6 7 2 3 6 7 h i n o τˆ −1 61 7 1 6 δˆ 7 6 tr W IN −δˆ W 1−τˆ IN −δˆW + tr W IN −δˆ W 7 6 7 6 7: N bias4 ˆ 5 = 6 N 7 β 6 7 6 7 0 ˆσ 2 6 7 4 5 1 2
ˆ 2σ
ð6Þ
2
To correct the LSDV estimator, Yu et al. (2008, Eqs. (17) and (18) propose the bias corrected LSDV (BCLSDV) estimator5 that takes the form 2
3 2 3 2 3 τˆ τˆ τˆ
−1 6 δˆ 7 6 δˆ 7 6 δˆ 7 −Σ 1 6 7 7 7 bias6 =6 − ; 4β 4β 4β ˆ 5 ˆ 5 ˆ 5 NT T 2 2 2 ˆ ˆ ˆ σ σ σ BCLSDV LSDV LSDV
ð7Þ
where Σ is the asymptotic information matrix of the LSDV parameter estimates under the normality assumption of ε (because this matrix is symmetric, the upper diagonal elements are not shown) 2
1 *0 * Y−1 Y−1 6 ˆ2 σ 6 6 h i 0 i h 6 1 *0 ~W ~ +W ~ 0W ~ Y * τˆ T*tr W ~ 0W ~ + 1 τˆ β ~ X* ˜ * I ⊗W ˜ τˆ β ˆ0 X 6 Y−1 IT ⊗W ˆ0 0 −1 T 2 6ˆ 2 ˆ σ 6σ 6 0 0 6 1 1 1 0 * * ~ X*β 6 ˆ X Y X * IT ⊗W X * X* 6 2 −1 −1 2 ˆ ˆ ˆ2 6 σ σ σ 6 4 T ~ tr W 0 0 ˆ2 σ
3 7 7 7 7 7 7 7 7 7 7 7 7 7 NT 5
ð8Þ
ˆ4 2σ
̃ and W= W(IN−δ̂ W)− 1. This information matrix is taken from Yu et al. (2008, box III) but changed to the notation we have at hand.6
3 Note that we assume that the data are sorted first by time and then by cross-sectional units, whereas the classic panel data literature tends to sort the data first by crosssectional units and then by time. 4 h ∞ h They also consider lagged endogenous interaction effects with coefficient ρ. We have set ρ = 0, as a result of which their matrix ∑∞ h = 0An takes the form ∑h = 0An = [IN − τ̂(IN − δ̂W)− 1]− 1. 5 When N/T → ∞ and T/N1/3 → ∞, Yu et al. (2008, theorem 5) find that the BCLSDV estimator is also an improvement upon the T consistency of the LSDV estimator. This is the case when N = Tp with 1 b p b 3. 6
For this purpose we also consulted a Matlab routine that has been made available by Jihai Yu.
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
275
4. Unconditional maximum likelihood estimation Taking first-differences of the model in Eq. (1) and rearranging terms, we have N
Δyit −δ ∑ wij Δyjt = τΔyit−1 + Δxit β + Δεit :
ð9Þ
j=1
If it is assumed that the initial observations yi0 and xi0 are available and the observations are stacked as successive cross-sections for t = 1,…,T to obtain an NT × 1 vector for ΔY and an NT × K matrix for ΔX, where ΔY = (ΔY1’, ..., ΔYT’)’ and ΔX = (ΔX1’, ..., ΔXT’)’, then the model can be rewritten as ðIN −δWÞΔYt = τΔYt−1 + ΔXt β + Δεt ; t = 1; …; T:
ð10Þ
The joint probability distribution function of ΔY, given ΔX and provided that ε is not serially correlated, is given by f ðΔY jΔX Þ =
T
∏ ft ðΔYt jΔYt−1 ; ΔXt Þ;
ð11Þ
t =1
where f and ft denote normal density functions. The log-likelihood function conditional upon the vector of first observations, ΔY1, leaving Δɛ1 aside, takes the form 1 N 1 2 0 −1 Log f ðΔYT ; ΔYT1 ; …; ΔY2 j ΔY1 Þ = NðT 1Þ log 2πσ + ðT 1Þ log jIN δW j log ðGÞ Δe ðG⊗IN Þ Δe; 2 2 2σ 2 3 ðI −δWÞΔY2 −τΔY1 −ΔX2 β 6 N 7 : Δe = 4 5; ðIN −δWÞΔYT −τΔYT−1 −ΔXT β
ð12aÞ
2
ð12bÞ
where G is a (T − 1) × (T − 1) matrix which has twos in the main diagonal, minus ones in the first subdiagonals and zeroes otherwise (see Baltagi, 2005, p. 137 for a mathematical exposition). To be able to specify the maximum likelihood function of the complete sample ΔYt (t = 1,…,T), the probability function of ΔY1 must be derived too. However, the derivation of the first term f(ΔY1|ΔY0,ΔX1) is more difficult because ΔY0 depends on unobserved pre-sample values Y− 1. Therefore, we repeatedly lag Eq. (10) by one period. For ΔYt − m (m ≥ 1) we get ðIN −δWÞΔYt−m = τΔYt−ðm + 1Þ + ΔXt−m β + Δεt−m ;
ð13Þ
or −1
ΔYt−m = τðIN −δWÞ
−1
ΔYt−ðm + 1Þ + ðIN −δWÞ
−1
ΔXt−m β + ðIN −δWÞ
Δεt−m ;
ð14Þ
provided that (IN − δW)is nonsingular. Then by continuous substitution of ΔYt− 1 up to ΔYt−(m− 1) into (10), we get h i −1 m−1 ðIN −δWÞΔYt = τ τðIN −δWÞ ΔYt−m
ð15Þ
h i −1 −1 m−1 + ΔXt β + τðIN −δW Þ ΔXt1 β + … + τðIN −δW Þ ΔXtðm1Þ β −1
+ Δεt + τðIN −δWÞ
h i −1 m−1 Δεt−1 + … + τðIN −δWÞ Δεt−ðm−1Þ :
Using Δɛt − j = ɛt − j − ɛt − j− 1 (j = 0,…,m − 1), the m disturbance terms in this equation may be rewritten as Ξ = εt + ðΠ−IN Þεt−1 + ðΠ−IN ÞΠεt−2 + … + ðΠ−IN ÞΠ
m−2
εt−ðm−1Þ −Π
m−1
εt−m :
ð16Þ
where Π = τ(IN − δW)− 1. As E(ɛt) = 0 (t = 1,…,T) and the successive values of ɛt are uncorrelated, we have E½Ξ = 0;
ð17aÞ
h i 2 0 −1 0 m−1 0 −1 0m−1 0 0 m−1 2 ≡σ V: ðΠ−IN Þ −ðΠ−IN ÞΠ IN −ΠΠ Π ðΠ−IN Þ − ΠΠ Var½Ξ = σ IN + ð−IN Þ IN −ΠΠ
ð17bÞ
If W is symmetric, we also have Π = Π’, as a result of which V is reduced to the simpler form −1
V = 2*ðIN + Π Þ
2m−1 : IN + Π
ð17cÞ
If the process started long ago and m approaches infinity as a result, the characteristic roots of the matrix Π, denoted by πi (i = 1,…,N), must lie inside the unit circle, |πi| b 1, because otherwise the matrix Πm − 1 in Eq. (17b) (or Π2m − 1 in Eq. (17c)) will become infinite and the matrix V in Eq.
276
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
(17b) may lose its property of being positive definite.7 Since Π = τ(IN − δW)− 1, the smallest and largest purely real characteristic roots of this matrix take the form τ/(1−δrmin) and τ/(1−δ), or vice versa (see Griffith, 1988, p. 44, Table 3.1). Consequently, the characteristic roots of the matrix Π will lie inside the unit circle if |τ| b 1 − δ for δ ≥ 0 and |τ| b 1 − δrmin for δ b 0 (see also Elhorst, 2008). Even though E[Ξ] = 0, the expected value of ΔY1 using Eq. (15) cannot be determined, since EððIN −δWÞΔY1 Þ = τΠ
m−1
m−1
j
ΔY1−m + ∑ Π ΔX1−j β;
ð18aÞ
j=0
or −1
EðΔY1 Þ = ðIN −δWÞ
τΠ
m−1
m−1
! j
ΔY1−m + ∑ Π ΔX1−j β ;
ð18bÞ
j=0
depends on unobserved pre-sample values of Y and X (unless m = 1). For similar reasons, the variance of ΔY1 is undetermined. Consequently, it is not possible to derive the exact unconditional maximum likelihood function of the model. The panel data literature has suggested different approximations for ΔY1, but the leading one in the dynamic panel data literature is by Bhargava and Sargan (1983), which is also applied in Hsiao et al. (2002). Starting with a random effects dynamic panel data model formulated in levels and without endogenous interaction effects, Bhargava and Sargan (1983) suggest predicting Y1 by all the nonstochastic variables in the model subdivided by time over the observation period and an additional error term. In other words, when the model contains K explanatory variables over T time periods, Y1 is approached by K × T regressors. Applying the Bhargava and Sargan approximation to ΔY1 in Eq. (15), we obtain m−1
j
ðIN −δWÞΔY1 = α0 ιN + ΔX1 α1 + … + ΔXT αT + ξ + ∑ Π Δε1−j ; j=0
ð19Þ
where α0 is a scalar, αt (t = 1,…,T) are K × 1 vectors of parameters, and ξ is normally distributed, ξ ∼ N(0, σ2ξ IN), and independent of ɛt for every t. Note that N must be greater than 1 + K × T and that the matrix [ΔX1…ΔXT] must have full column rank, otherwise the number of variables used to predict ΔY1 must be reduced. 1 j Letting Δv1 = ξ + ∑m− j = 0 Π Δɛ1−j, we have EðΔv1 Þ = 0;
ð20aÞ
0 2 2 2 2 E Δv1 Δv1 = σξ IN + σ V≡σ ðθIN + V Þ≡σ Vθ ;
ð20bÞ
0 2 E Δv1 Δε2 = −σ IN ;
ð20cÞ
0 E Δv1 Δεt = 0; t = 3; …; T:
ð20dÞ
Instead of estimating σ2ξ and σ2, it is easier to estimate θ = σ2ξ /σ2 and σ2, which is allowed as there exists a one-to-one correspondence between σ2ξ and θ.8 Furthermore, it is assumed that θ is a positive and bounded constant and that the row and column sums of Vθ are uniformly bounded in N or in both N and T. This implies that the covariance matrix of Δv = (Δv’1, Δɛ’2, ..., Δɛ’T)’ is finite and can be written as Var(Δv) = σ2HVθ, by which the NT × NT matrix HVθ is defined as 2
Vθ 6 −IN 6 6 0 HVθ ≡6 6 : 6 4 0 0
−IN 2 × IN −IN : 0 0
0 −IN 2 × IN : 0 0
: : : : : :
0 0 0 : 2 × IN −IN
3 0 0 7 7 0 7 7; : 7 7 −IN 5 2 × IN
ð21Þ
where the upper left submatrix is the N × N matrix Vθ defined by Eq. (20b). Consequently, the unconditional log-likelihood function of the complete sample of size T is Log f ðΔYT ; ΔYT−1 ; …ΔY2 ; ΔY1 Þ = −
NT 1 1 2 0 −1 log 2πσ + T log jIN −δW j− log jHVθ j− 2 Δe HVθ Δe; 2 2 2σ
2
ð22aÞ
3
6 ðIN −δWÞΔY1 −α0 ιN −ΔX1 α1 −…−ΔXT αT 6 Δe = 6 ðIN −δWÞΔY2 −τΔY1 −ΔX2 β 6 4 :
7 7 7; 7 5
ð22bÞ
ðIN −δWÞΔYT −τΔYT−1 −ΔXT β
where the second log and the third log on the right-hand side correspond to the Jacobian term of the transformation from the error term Δv = (Δv’1,Δɛ’2,...,Δɛ’T)’ to the dependent variable ΔY = (ΔY1’,...,ΔYT’)’. This log-likelihood function may contain up to (K + 1)T + 6 unknown parameters 7 8
This condition is equivalent to assumption 6 in Yu et al. (2008). To avoid θ taking on a negative value in the estimation, it is often programmed as θ2.
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
277
to be estimated: the vectors α1,...,αT and β, each consisting of K unknown parameters, and the scalars α0, τ, δ, θ, m and σ2. Fortunately, σ2, α and β can be solved from their first-order maximizing conditions 2
3 ˆ0 α 6α 7 0 6ˆ 1 7 Δe0 H−1 Vθ Δe ˜ −1 ˜ −1X˜ 0 H−1Y; ˜ ˆ2 = σ and ⌢ γ≡6 : 7 Vθ 6 7 = X HVθ X NT 4α ˆ 5
ð23aÞ
T
ˆ β
where 2
ιN 6 6 60 X˜ = 6 6 : 4 0
ΔX1
:
ΔXT
0 :
: :
0 :
0
:
0
0
3
2 3 ðIN −δWÞΔY1 7 7 6 ΔX2 7 ðIN −δWÞΔY2 −τΔY1 7 7; 7 and Y˜ = 6 4 5 : 7 : 5 ðIN −δWÞΔYT −τΔYT−1 ΔXT
ð23bÞ
provided that HVθ is nonsingular. The somewhat unusual structure of X̃ is explained by the fact that ΔY1 is predicted by a constant and all ΔX subdivided over time [see Eq. (19)]. Substituting σ̂2 and γ̂ in the log-likelihood function, the concentrated log-likelihood function of τ, δ, θ and m is obtained, which depends on only 4 unknown parameters. To compute the unconditional ML estimator, an iterative two-step estimation procedure is used in which the two sets of parameters γ,σ2 and θ, τ, δ are alternately estimated until convergence occurs, provided that m tends to infinity; γ and σ2, given θ, τ and δ, are estimated using (23), while θ, τ and δ, given γ and σ2, are estimated by maximizing the log-likelihood function in (22). To maximize this log-likelihood function computationally, the routine “fmincon” from Matlab's optimization toolbox is used. This function finds a constrained minimum of a scalar function of several variables subject to lower and upper bounds, and subject to linear inequalities. As lower and upper bounds, the restrictions −1 b τ b 1 and 1/rmin b δ b 1 are imposed, and as linear inequalities the restrictions |τ| b 1 − δ if δ ≥ 0 and |τ| b 1 − δrmin if δ b 0. In the Monte Carlo simulation experiments below, we will also discuss an extension of this routine that includes the estimation of m.
5. The GMM estimator Just as the unconditional ML estimator, the GMM estimator takes the first-differenced model in Eq. (9) as its point of departure. Note, however, that the number of observations in the time domain when using the GMM estimator is one less than when using the ML or the BCLSDV estimator. The GMM estimator, which extends the Arrelano and Bond (1991) estimator (see also Baltagi, 2005, pp. 136–142) to include endogenous interaction effects, takes the form9 0 1 hP τˆ Pi−1 P0 0
−1 0 @ δˆ A = X 0 Z Z 0 ðG⊗IN ÞZ −1 Z 0 X X Z Z ðG⊗IN ÞZ Z ΔY; ð24Þ ˆ β where X̅ = [ΔY− 1 (IT ⊗ W)ΔY ΔX], Z = diag(Z2,…,ZT), and the block-diagonal submatrices of Z are Zt = ½ Y0
WY0
… Yt−2
WYt−2
X0
WX0
… Xt−1
WXt−1 for t = 2; …; T:
ð25Þ
A few remarks are needed to better understand this estimator. First, since we assume that the data are sorted first by time and then by crosssectional units (footnote 3), we have G ⊗ IN instead of IN ⊗ G as in Baltagi (2005, p. 137). Second, since we assume that Y0 is available, t = 2 is the first period we observe Eq. (9) instead of t = 3 as in Arrelano and Bond (1991). Third, to estimate τ, Arrelano and Bond (1991) suggest using Y0 as an instrument of ΔY1 = Y1 − Y0, since it is highly correlated with ΔY1 and not correlated with Δε2 = ε2 − ε1 as long as ε is not serially correlated. X0 and X1 are valid instruments too, since they are also not correlated with Δε2. Similarly, valid instruments for ΔY2 are Y1, Y0, X0, X1 and X2. One can continue in this fashion, adding two extra valid instruments, one for Y and one for X, for each forward period. Since the dynamic panel data model in Eq. (9) has been extended with the endogenous variable WΔYt, this variable also needs to be instrumented. To estimate the parameter δ of the endogenous variable WYt of a model formulated in levels (such as in Eq. (1)), Kelejian and Prucha (1998) and Kelejian, Prucha, and Yuzefovich (2004) suggest using the instruments [Xt WXt … WdXt], where d is a pre-selected constant. Typically, one would take d = 1 or d = 2, dependent on the number of regressors.10 The set of instruments that satisfy the conditions spelled out both in Arrelano and Bond (1991) and in Kelejian et al. (2004) when the model is transformed into first-differences consists of the variables Y0 to Yt − 2 and X0 to Xt − 1 (as in the dynamic panel data literature) and WY0 to WYt − 2 and WX0 to WXt − 1 (as in the spatial econometrics literature when d = 111). Note that this setup was first considered by Revelli (2001). Finally, the matrix of instruments should have full column rank. 9 To be able to compare the Monte Carlo results of the different estimators in the next section, we will generate data under the assumption that ε is normally distributed, even though the GMM estimator does not formally require the normality assumption. When ε is assumed to be normally distributed, the so-called one-step and two-step GMM estimators are identical. 10 Lee (2003) introduced the optimal instrument 2SLS estimator, but Kelejian et al. (2004) found that the 2SLS estimator based on this set of instruments has quite similar small sample properties. 11 The reason to choose d = 1 is to limit the number of instruments. For example, if T = 5 and K = 1, we already have 48 instrumental variables. In the Monte Carlo simulation experiment, we will also briefly discuss the results of the case d = 2.
0.003 − 0.006 − 0.005 − 0.002 − 0.001 0.001 0.001 0.001 − 0.004 − 0.002 − 0.005 0.003 − 0.044 − 0.039 − 0.023 − 0.029 − 0.056 − 0.025 − 0.010 − 0.018 − 0.021 − 0.009 − 0.015 0.026 0.003 − 0.009 − 0.010 − 0.010 − 0.013 − 0.021 − 0.018 − 0.006 − 0.002 − 0.001 − 0.010 0.009 0.002 − 0.008 − 0.009 − 0.006 − 0.002 − 0.002 − 0.002 − 0.003 − 0.006 − 0.004 − 0.006 0.005 − 0.101 − 0.053 − 0.050 − 0.050 − 0.013 − 0.020 − 0.022 − 0.022 − 0.013 − 0.015 − 0.015 0.034 β=1 β=1 β=1 β=1 β=1 β=1 β=1 β=1 β=1 β=1 β=1 0.009 0.110 0.013 − 0.073 0.115 0.112 0.017 − 0.071 0.091 0.008 − 0.055 0.061 0.004 0.050 − 0.002 − 0.048 0.045 0.084 − 0.018 − 0.067 0.072 0.006 − 0.029 0.037 0.005 − 0.011 0.001 0.009 − 0.005 − 0.005 0.002 0.007 − 0.005 − 0.002 0.006 0.005 0.004 − 0.013 0.000 0.010 − 0.011 − 0.010 0.002 0.010 − 0.016 0.002 0.014 0.008 δ=0 δ = 0.4 δ=0 δ = − 0.3 δ = 0.8 δ = 0.4 δ=0 δ = − 0.3 δ = 0.4 δ=0 δ = − 0.3 − 0.116 − 0.068 − 0.068 − 0.065 − 0.025 − 0.039 − 0.042 − 0.039 − 0.021 − 0.021 − 0.022 0.048 − 0.006 − 0.013 − 0.011 − 0.009 − 0.002 − 0.005 − 0.005 − 0.005 0.003 0.007 0.003 0.006 − 0.118 − 0.080 − 0.069 − 0.073 − 0.015 − 0.035 − 0.042 − 0.037 − 0.004 − 0.020 − 0.014 0.046 − 0.004 − 0.016 − 0.017 − 0.023 0.004 0.002 − 0.008 − 0.006 0.018 − 0.012 0.010 0.011 In absolute value. a
− 0.257 − 0.163 − 0.174 − 0.167 − 0.057 − 0.101 − 0.113 − 0.106 − 0.062 − 0.070 − 0.067 0.122 τ = 0.8 τ = 0.4 τ = 0.4 τ = 0.4 τ=0 τ=0 τ=0 τ=0 τ = − 0.3 τ = − 0.3 τ = − 0.3 Meana
− 0.007 − 0.029 − 0.023 − 0.020 − 0.007 − 0.014 − 0.016 − 0.014 − 0.006 − 0.004 − 0.006 0.012
ML/BC LSDV GMM ML BC LSDV LSDV βX GMM ML BCLSDV ML/BCLSDV, GMM/BCLSDV, LSDV δWYt GMM/BC LSDV ML/BC LSDV GMM ML BC LSDV LSDV
12 Similar results were obtained when the set of instrumental variables is expanded by setting d = 2 (see previous footnote).
Table 1 Bias of the LSDV, BCLSDV, ML, GMM, mixed ML/BCLSDV and mixed GMM/BCLSDV estimators (N = 60, T = 5) with respect to τ, δ and β.
This section describes the design and the results of simulation experiments conducted to compare the performance of the LSDV, BCLSDV, ML and GMM estimators, as well as the mixed ML/BCLSDV and GMM/BCLSDV estimators, where the BCLSDV estimator is used to estimate the response parameter of the endogenous interaction effects, and the ML or the GMM estimator is used to estimate the other parameters of the model. We will first describe the standard design with N = 60 and T = 5, and then consider alternative designs. The model used to generate yit is given by Eq. (1). We set yi0 = 0 for all i, generated yit for all i and t = 1,…,100, and discarded the first 94 observations (cf. Hsiao et al., 2002). This ensured that the results were not unduly influenced by the initial values. Consequently, (i) T = 5 when using the LSDV and BCLSDV estimators based on the model formulated in levels conditional upon the first observation on every cross-sectional unit, (ii) T = 5 when using the ML estimator based on the model reformulated in first-differences taking into account the density function of the first first-differenced observations on each cross-sectional unit, and (iii) T = 4 when using the GMM estimator based on the model reformulated in first-differences. The spatial arrangement of the units in the cross-sectional domain is based on the “Bucky Ball”. This is composed of N = 60 units distributed over the surface of a sphere in such a way that the distance from any unit to its nearest neighbors is the same for all units. Each point has exactly three neighbors. The Bucky Ball spatial weights matrix is a 60-by-60 symmetric matrix and has applications for physical objects, such as the seams in a soccer ball. It has three non-zero elements in each row and column, for a total of 180 nonzero values, as a result of which the sparsity index is 5%. If this matrix is normalized, its largest characteristic root is 1 and its smallest characteristic root is −0.8727. This implies that the parameter space on which δ is defined is (−1.146, 1). We considered one exogenous explanatory variable xit drawn from a uniform distribution on the interval [− 1,1] and used σ2 = 0.1 (σ = 0.31) to generate normally distributed error terms ε. This value of σ2 corresponds to average R2 values of approximately 0.7 for the experiments under consideration (including the impact of the fixed effects). Similarly, we used normal distributions with σμ = 0.05 to generate cross-sectional fixed effects. The relationship between the different σs is based on experiences with previous empirical studies. The coefficient of the exogenous explanatory variable was held constant, β = 1, but the coefficient τ of the lagged dependent variable (yit − 1) and δ of the endogenous interaction effects (∑jwijyjt) were varied over the range = 0.3 to 0.8 by increments of 0.3 or 0.4. Experimental parameter combinations that violate |τ| b 1 − δ for δ ≥ 0 or |τ| b 1 − δrmin for δ b 0 were left aside, otherwise the model would be nonstationary in time. This has led to 11 different experimental parameter combinations. In the standard design, m is finally assumed to go to infinity (long history). Table 1 reports the bias in the parameters τ, δ and β based on 1000 replications for each of the experimental parameter combinations and six different estimators. From this table it can be seen that the bias in the coefficient τ of the lagged dependent variable, yit − 1, when using the LSDV estimator, is large and negative (0.122 on average and in absolute value), just as is the Nickell bias in a dynamic panel data model without endogenous interaction effects. The bias in τ is reduced when using the GMM estimator, but it is still quite large (0.046).12 This corroborates the finding of Hsaio, Pesaran, and Thamiscioglu (2002) that GMM estimators can still be severely biased. The greatest reduction of the bias in τ is obtained when using the BCLSDV or the ML estimator (0.011 and 0.006, respectively).
GMM/BC LSDV
6. Monte Carlo results
− 0.043 − 0.023 − 0.021 − 0.020 − 0.007 − 0.007 − 0.007 − 0.009 − 0.008 − 0.007 − 0.009 0.015
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
τYt − 1
278
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
In contrast to the bias in τ, the bias in the coefficient δ of the endogenous interaction effects, ∑jwijyjt, when using the LSDV estimator is rather small, 0.008 on average (in absolute value). This bias is reduced further to 0.005 when using the BCLSDV estimator. By contrast, the bias in δ is large and unacceptable when using the GMM estimator (0.061) and, although smaller than that of the GMM estimator, it also is when using the ML estimator (0.037). In other words, whereas the GMM and ML estimators, just as in a dynamic panel data model without endogenous interaction effects, (partly) remove the bias in the coefficient τ of the variable yit − 1, which would be obtained when applying the LSDV estimator, they blow up the bias in the coefficient δ of the variable ∑jwijyjt. On the basis of these findings and because Yu et al. (2008) proved that the BCLSDV estimator asymptotically eliminates the bias in δ, we decided to consider two alternative estimators by which δ is based on the BCLSDV estimator and the other parameters, given δ, are based on the ML or on the GMM estimator. These estimators are called the mixed ML/BCLSDV and the mixed GMM/BCLSDV estimators. Due to this construction, the empirically observed bias in δ of both estimators is equal to that of the BCLSDV estimator. Of all the six estimators, the mixed ML/BCLSDV estimator then produces the smallest bias in each of the parameters τ, δ and β. For 91% of the results reported in Table 1 the bias in either τ, δ or β is smaller than 0.005 (in absolute value), and for 9% smaller than 0.013, which is the maximum bias in either τ, δ or β of the ML/BCLSDV estimator found. Importantly, the mixed ML/BCLSDV estimator also outperforms the BCLSDV estimator. The average bias in τ decreases by 50% and in β by 40%. These results show that taking into account the density function of the first first-differenced observations when T is small helps to further improve the accuracy of the parameter estimates. Table 2 reports the bias and the Root Mean Squared Error (RMSE) of the BCLSDV and of the mixed ML/BCLSDV estimator with respect to τ, δ and β for the standard design, as well as for eight alternative designs. As we did not find any notable difference among the results of the different experimental parameter combinations, these results have been pooled (the bias in absolute values).13 First, we considered N = 150 instead of N = 60, and T = 15 instead of T = 5. In the first case, we used a two-dimensional circular world in which each point has two neighbors. This is because an extension of the Bucky Ball for larger values of N does not exist. In the second case, we used the last 15 observations of the 100 observations generated for each crosssectional unit. As expected, both the bias and the RMSE of each parameter decrease as N or T increases. Due to the increase of N from 60 to 150, the bias in the parameters τ, δ and β falls by 28% on average, and their RMSEs by 41% on average. Similarly, due to the increase of T from 5 to 15, the bias in the parameters τ, δ and β falls by 13% on average, and their RMSEs by 50% on average. Note, however, that the mixed ML/BCLSDV estimator does no longer outperform the BCLSDV estimator when T = 15. For this value of T the first first-differenced observations apparently do not have any added value any more. Second, we considered five complications. The first complication is the extension of the model with time-period fixed effects.14 The second complication simulates a correlation between not only the dependent variable, but also the independent variable in both the cross-sectional and time domain: xit = 0.25xit − 1 + 0.25 ∑jwijxjt + 0.5υ, where υ is drawn from a uniform distribution on [−1,1]. The third complication simulates cross-sectional fixed effects whose variance is greater than that of the normally distributed error terms (σμ = 0.63 and σ = 0.31). The fourth complication simulates error terms that follow a Student's t distribution, which has heavier tails 13 Detailed results for the alternative designs, as reported in Table 1, are available on request. 14 We used a normal distribution with σλ = 0.025 to generate time-period fixed effects.
279
than the normal distribution. The fifth complication considers a data generating process that has started in the past, but not too far before the 0th period. To simulate this we generated only 10 observations for each cross-sectional unit and discarded the first 4 observations. Table 2 shows that in 60% of the cases the bias in τ, δ and β and their RMSEs increase as a result of these complications, but within reasonable limits. Exceptions are the bias and the RMSE of the parameter β, when the independent variable is correlated in both the cross-sectional and time domains, and the bias and the RMSE of the parameters τ and β, when the error terms are not normally distributed. Importantly, however, the BCLSDV estimator shows similar deteriorations under these circumstances, as a result of which the mixed ML/BCLSDV estimator outperforms the BCLSDV estimator for all the complications being considered. The bias in τ decreases by 50.8%, on average, and in β by 27.2% (note that the bias in δ is the same by construction). These results support our conclusion that taking into account the density function of the first firstdifferences observations when T is small helps to further improve the accuracy of the parameter estimates, also when the true data generating process is more complicated. We also investigated whether the performance of the mixed ML/ BCLSDV estimator improves if the estimation routine is extended to include m, the parameter that measures the time passed between the moment when the data generating process starts (which is unknown) and the first observation in the sample. For this purpose, the iterative two-stage estimation procedure was extended using a third stage in which the likelihood function is maximized with respect to m (for m ≥ 0), given the values of the other parameters, and taking into account that the matrix V in (Eqs. (17b), (17c)), and thus the likelihood function, is only defined for integer values of m. However, the results in the last two panels of Table 2 show that the performance of the mixed ML/BCLSDV estimator does not improve if m is also estimated. Finally, we examined how much computation time the different estimators required.15 The methods that have been used to speed up computation time are discussed in the Appendix. If N = 60 and T = 5, the average computation time in seconds amounts to 0.09 for the BCLSDV estimator, 0.009 for the GMM estimator, 15 for the ML estimator, 0.18 for the mixed ML/BCLSDV estimator and 0.01 for the mixed GMM/BCLSDV estimator. The time needed to compute the ML/ BCLSDV estimator and the GMM/BCLSDV estimator for larger values of N (up to 1000) is reported in Table 3. The results show that the time needed to compute the ML/BCLSDV estimator for larger values of N is substantial and may therefore impede its application in empirical research. Table 4 reports the bias and RMSE with respect to τ, δ and β of the BCLSDV estimator and of the mixed GMM/BCLSDV estimator for different values of N in order to find out for which value of N the time benefits of using the GMM/BCLSDV estimator, rather than the ML/ BCLSDV estimator, compensates for the associated loss of accuracy. For N = 60 and N = 150 the biases in the parameters τ and β of the mixed GMM/BCLSDV estimator, as reported in Table 4, are greater than their counterparts for the mixed ML/BCLSDV estimator, as reported in Table 2 (note that the bias in δ is again the same by construction), while the RMSEs are approximately the same. This result just corroborates our previous finding that the mixed ML/ BCLSDV estimator outperforms the mixed GMM/BCLSDV estimator in terms of bias and RMSE for the same values of N. However, we also found that the bias and the RMSE of each parameter tend to fall when N increases. The results in Table 4 show that this also holds for the GMM/BCLSDV estimator. Consequently, the biases in τ, δ and β, when using the mixed GMM/BCLSDV estimator if N = 500, are comparable to the corresponding biases in τ, δ and β, when using the mixed ML/
15
On a 1.86 GHz pentium 4 desktop computer with 0.97 GB of memory.
280
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
Table 2 Bias and RMSE of the BCLSDV and the mixed ML/BCLSDV estimators under different designs. Design
Standarda
Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE
N = 150 T = 15 Time dummies Correlated x σμ N σ t distribution error terms Short history Short history m included a
Both estimators δ
BCLSDV τ
β
ML/BCLSDV τ
β
0.0052 0.0615 0.0039 0.0316 0.0041 0.0308 0.0187 0.0502 0.0115 0.0687 0.0074 0.0610 0.0081 0.0506 0.0067 0.0612 0.0082 0.0622
0.0123 0.0500 0.0109 0.0309 0.0031 0.0230 0.0122 0.0524 0.0322 0.0587 0.0142 0.0489 0.0405 0.0831 0.0108 0.0475 0.0129 0.0497
0.0045 0.0737 0.0044 0.0465 0.0046 0.0384 0.0080 0.0709 0.0156 0.1407 0.0212 0.2141 0.0157 0.2001 0.0103 0.0730 0.0085 0.0731
0.0061 0.0485 0.0027 0.0307 0.0037 0.0228 0.0046 0.0517 0.0085 0.0579 0.0048 0.0484 0.0226 0.0804 0.0045 0.0453 0.0085 0.0491
0.0027 0.0732 0.0035 0.0465 0.0046 0.0384 0.0072 0.0703 0.0142 0.1406 0.0058 0.0724 0.0119 0.2022 0.0082 0.0726 0.0068 0.0726
Standard is T = 5, N = 60, no time dummies, x uncorrelated, normally distributed error terms, σ = 0.31, σμ = 0.05, m large (long history) and not included in estimation.
BCLSDV estimator if N = 60. In addition, the RMSEs of these parameters, when using the GMM/CLSDV estimator, are considerably lower. The results in Table 4 also show that the biases in τ and β, when using the mixed GMM/BCLSDV estimator if N = 500, fall below the corresponding biases in τ and β (49% and 30% reduction, respectively), when using the BCLSDV estimator if N = 500 (the bias in δ is again the same by construction), while the RMSEs are comparable. These findings suggest that for values of N greater than 500 the mixed GMM/ BCLSDV is a reasonable alternative to both the BCLSDV and the ML/ BCLSDV estimators. Computation time will fall considerably, possibly from several minutes to less than 1 min (see Table 3), while the accuracy of the estimates is better (bias) and comparable to (RMSE) that of the BCLSDV estimator, and comparable to (bias) or even better (RMSE) than that of the mixed ML/BCLSDV estimator when N would be 60. 7. Conclusions The objective of this paper was to compare the performance, measured in terms of bias, root mean squared error and computation time, of different estimators of the fixed effects dynamic panel data model extended to include endogenous interaction effects when T is small: (i) the BCLSDV estimator based on Yu, De Jong, and Lee (2008); (ii) the ML estimator based on Hsiao et al. (2002) and Bhargava and Sargan (1983) extended in this paper to include endogenous interaction effects; (iii) the GMM estimator based on Arrelano and Bond (1991) extended in this paper to include endogenous interaction effects; (iv) the ML estimator mixed with the BCLSDV parameter estimate of the endogenous interaction effects; and (v) the GMM estimator mixed with the BCLSDV parameter estimate of the endogenous interaction effects. This paper has produced two major results. First, in contrast to the parameter estimate τ of the lagged dependent variable yit − 1 and the parameter estimates β of the explanatory variables xit, the parameter estimate δ of the endogenous interaction effects ∑jwijyjt is consid-
erably biased when using the ML estimator or the GMM estimator. Since the BCLSDV estimator asymptotically eliminates the bias in δ and indeed appeared to be hardly biased empirically even if T is small, this was the main reason to also consider the ML estimator, as well as the GMM estimator, mixed with the BCLSDV parameter estimate of the endogenous interaction effects. Note that we considered different parameter values within the interval on which δ is defined, different values for N and T, correlated and uncorrelated exogenous variables in the cross-sectional and time domain, two different ratios between σμ and σ generating the cross-sectional fixed effects and the error terms, normally and t-distributed error terms, an extension of the model with time-period fixed effects, and different periods of time between the moment that the process generating the data started and the first observation in the sample. Second, our Monte Carlo results showed that the performance of the mixed ML/BCLSDV estimator, in terms of bias and RMSE with respect to each of the parameters, is good and strongly favors the mixed ML/BCLSDV estimator over the other estimators. It was also found that the bias in τ decreases by approximately 50% and in β by 25–40% when using mixed ML/BCLSDV rather than the BCLSDV estimator when T is small (T = 5). These results show that taking into account the density function of the first first-differenced observations when T is small helps to further improve the accuracy of the parameter estimates. Only when T increases (T = 15) do the first first-differenced observations not have any added value any more. One disadvantage of the mixed ML/BCLSDV estimator is that its computation time for larger values of N may impede its application in empirical research. For values of N greater than 500, we therefore
Table 4 Bias and RMSE of the BCLSDV and the mixed GMM/BCLSDV estimators for different values of Na. N
N = 60 Table 3 Computation time in seconds for different values of N (T = 5).
N = 150
N
100
200
300
400
500
600
700
800
900
1000
ML/BCLSDV GMM/BCLSDV
0.46 0.11
2.33 0.28
5.94 0.61
14.94 1.13
78.92 2.36
178 9
318 14
528 29
798 39
1004 51
N = 500
Bias RMSE Bias RMSE Bias RMSE
Both estimators δ
BCLSDV τ
β
τ
GMM/BCLSDV β
0.0052 0.0615 0.0039 0.0316 0.0030 0.0169
0.0123 0.0500 0.0109 0.0309 0.0125 0.0165
0.0045 0.0737 0.0044 0.0465 0.0033 0.0247
0.0479 0.0471 0.0199 0.0315 0.0064 0.0170
0.0145 0.0729 0.0072 0.0460 0.0023 0.0246
a T = 5, no time dummies, x uncorrelated, normally distributed error terms, σ = 0.31, σμ = 0.05, m large (long history) and not included in estimation.
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
propose the mixed GMM/BCLSDV estimator. The main reason is that computation time will fall considerably, possibly from several minutes to less than 1 min, while the performance of the mixed GMM/BCLSDV estimator in terms of bias and RMSE, when N is greater than 500, is comparable or even better than that of the BCLSDV and ML/BCLSDV estimators (the former for the same value of N and the latter for N is 60). In view of these results, a Matlab routine will be developed where the mixed ML/BCLSDV estimator is applied if N ≤ 500, and the mixed GMM/BCLSDV estimator if N N 500 (default). To open up the path to empirical applications, this routine will be made available for free downloading from the author's Web site. Acknowledgements The author thanks Jan Jacobs, the guest editors, two anonymous referees, and many colleagues in spatial econometrics for helpful suggestions, constructive comments and/or critical remarks on previous versions of this manuscript. Appendix A. Methods to speed up computation time To speed up computation time, the spatial weights matrix W is converted to sparse form by squeezing out any zero elements. This approach, which is taken from Pace and Barry (1997), is based on the idea that it is not necessary to store a full N by N matrix if most of its elements are zero. All of Matlab's built-in arithmetic, logical and indexing operations return sparse matrices when they are applied to sparse matrices. To speed up computation time and to overcome numerical difficulties one might face in evaluating T ln |IN − δW| in the loglikelihood functions of the LSDV estimator (see Eq. (3)) and the ML estimator (see Eq. (22)), Pace and Barry (1997) propose to compute this determinant once over a grid of values for the parameter δ ranging from 1/rmin to 1 prior to estimation, provided that W is normalized. This only requires the determination of the smallest characteristic root of W. They suggest a grid based on 0.001 increments for δ over the feasible range. Given these predetermined values for the log determinant of (IN − δW), one can quickly evaluate the concentrated log-likelihood function for a particular value of δ. To compute the log determinant over the feasible range for small values of N (b500), they compute ∑ilog|uii|, where uii (i = 1,…,N) denotes the diagonal elements of the upper triangular LU decomposition matrix of (IN − δW). When the sparse structure of the spatial weights matrix is exploited, the required computation time of this decomposition can be reduced from order N3 to order N2. For larger values of N (≥ 500), Barry and Pace (1999) suggest approaching the log determinant for a particular value of δ over the feasible range by " # 0 R P 0 1 ∑ −N ∑ zr Wp zr = zr zr ðδp = pÞ , where zr denotes an N × 1 R r=1
p=1
vector of independent standard normal variates. The precision of this estimate can be manipulated by means of the tuning parameters R (the number of simulations generated over which the estimate is averaged) and P (the number of elements in the sum of ratios of quadratic forms). The required computation time of this simulation approach can be reduced to order NlogN and allows for the estimation of models with very large numbers of observations in the crosssectional domain. Both computations can be carried out using the “lndet” Matlab routine, which can be downloaded for free from LeSage's website www.spatial-econometrics.com (LeSage, 1999). The LU decomposition approach may also be used to compute the log determinant of HVθ, the other Jacobian term in the log-likelihood function of the ML estimator (see Eq. (22)), the difference being that the calculation of this determinant must be repeated in every iteration. On the other hand, due to the property |HVθ| = |Γ|, where
281
Γ = (1 − T)IN + TVθ, the computational burden may be reduced since Г is of order N, whereas HVθ is of order NT (see Eq. (21)). To determine the inverse of HVθ computationally, we make use of the property that −1
−1
HVθ = ð1−T ÞGv j v
= 0 ⊗Γ
−1
h −1 + Gv j v
−1 = 1 −ð1−T ÞGv j v = 0
i
Γ
−1
Vθ ; ðA1Þ
0
v −1 0 : B −1 2 −1 : B where Gv is a T × T matrix, Gv = B B 0 −1 2 : @ : : : : 0 0 0 :
1 0 0C C 0C C. :A 2
This implies that instead of an NT × NT matrix, two T × T matrices and one N × N matrix must be inverted. The inversion of the N × N matrix Г might be problematic nonetheless, because it usually does not contain any zero elements. To demean the y and x variables in Eq. (1), one might use the
0 demeaning operator in matrix form Q = INT − T1 ιT ιT ⊗IN , where ιT is a unit vector of length T. The disadvantage of storing this NT × NT matrix is that it slows down computation time considerably. When the demeaning operator in Eq. (2) is used instead, only N(3 + K) mean values of the y and x variables need to be stored. In their Matlab routines, Yu et al. (2008) store the NT × NT matrices ˜ ) needed to compute the LSDV parameter (IT ⊗ W) and (IT⊗ W estimates of τ, δ β, and σ2 and the corresponding variance–covariance matrix [see Eqs. (4) and (8)]. However, a variable such as (IT ⊗ W)Y* can also be computed by selecting row (N − 1) × t + 1 up to row N × t of Y* for t = 1,…,T and by premultiplying each block of rows by W. Then these matrices do not have to be stored. References Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer, Dordrecht. Anselin, L., Hudak, S., 1992. Spatial econometrics in practice: a review of software options. Regional Science and Urban Economics 22, 509–536. Arrelano, M., Bond, S., 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277–297. Baltagi, B.H., 2005. Econometric Analysis of Panel Data, 3rd edition. Wiley, Chichester. Baltagi, B.H., Song, S.H., Jung, B.C., Koh, W., 2006. Testing for serial correlation, spatial autocorrelation and random effects using panel data. Journal of Econometrics 140, 5–51. Barry, R.P., Pace, R.K., 1999. Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra and its Applications 289, 41–54. Bhargava, A., Sargan, J.D., 1983. Estimating dynamic random effects models from panel data covering short time periods. Econometrica 51, 1635–1659. Brock, W.A., Durlauf, S.N., 2001. Interactions-based models. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics, Vol. 5. Elsevier, Amsterdam, pp. 3297–3380. Brueckner, J.K., 2003. Strategic interaction among local governments: an overview of empirical studies. International Regional Science Review 26, 175–188. Elhorst, J.P., 2005. Unconditional maximum likelihood estimation of linear and loglinear dynamic models for spatial panels. Geographical Analysis 37, 62–83. Elhorst, J.P., 2008. Serial and spatial autocorrelation. Economics Letters 100, 422–424. Elhorst, J.P., 2010. Spatial panel data models. In: Fischer, M.M., Getis, A. (Eds.), Handbook of Applied Spatial Analysis. Springer, Berlin, pp. 377–407. Griffith, D.A., 1988. Advanced Spatial Statistics. Kluwer, Dordrecht. Hsiao, C., Pesaran, M.H., Tahmiscioglu, A.K., 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109, 107–150. Kelejian, H.H., Prucha, I.R., 1998. A generalized spatial two stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17, 99–121. Kelejian, H.H., Prucha, I.R., Yuzefovich, Y., 2004. Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: large and small sample results. In: LeSage, J.P., Pace, K. (Eds.), Spatial and Spatiotemporal Econometrics. Elsevier, Amsterdam, pp. 163–198. Korniotis, G.M., 2010. Estimating panel models with internal and external habit formation. Journal of Business & Economic Statistics 28, 145–158. Lee, L.-F., 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews 22, 307–335. Leenders, R.T.A.J., 2002. Modeling social influence through network autocorrelation: constructing the weight matrix. Social Networks 24, 21–47. LeSage, J.P., 1999. Spatial Econometrics. http://www.spatial-econometrics.com/html/ sbook.pdf1999.
282
J.P. Elhorst / Regional Science and Urban Economics 40 (2010) 272–282
LeSage, J.P., Pace, R.K., 2009. Introduction to Spatial Econometrics. CRC Press Taylor & Francis Group, Boca Raton. Manski, C.F., 1993. Identification of endogenous social effects: the reflection problem. Review of Economic Studies 60, 531–542. Nerlove, M., 1999. Properties of alternative estimators of dynamic panel models: an empirical analysis of cross-country data for the study of economic growth. In: Hsiao, C., Lahiri, K., Lee, L.-F., Pesaran, M.H. (Eds.), Analysis of Panels and Limited Dependent Variable Models. Cambridge University Press, Cambridge, pp. 136–170. Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426. Pace, R.K., Barry, R., 1997. Quick computation of spatial autoregressive estimators. Geographical Analysis 29, 232–246.
Revelli, F., 2001. Spatial patterns in local taxation: tax mimicking or error mimicking? Applied Economics 33, 1101–1107. Soetevent, A.R., 2006. Empirics of the identification of social interactions: an evolution of the approaches and their results. Journal of Economic Surveys 20, 193–228. Su, L., Yang, Z., 2007. QML Estimation of Dynamic Panel Data Models with Spatial Errors. http://fp.paceprojects.f9.co.uk/Su.pdf2007. Yu, J., de Jong, R., Lee, L., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics 146, 118–134.