On the Behavior of the GMM Estimator in Persistent Dynamic Panel Data Models with Unrestricted Initial Conditions

Kazuhiko Hayakawa^a,*, Shuichi Nagata^b

^a Department of Economics, Hiroshima University, 1-2-1 Kagamiyama, Higashi-Hiroshima, Hiroshima, 739-8525, Japan
^b School of Business Administration, Kwansei Gakuin University, 1-155 Uegahara-1bancho, Nishinomiya, Hyogo, 662-8501, Japan

Abstract

The behaviour of the first-difference generalized method of moments (FD-GMM) estimator for dynamic panel data models is investigated for the case where the persistency of the data is (moderately) strong and the initial conditions are unrestricted. It is shown that both the initial conditions and the degree of persistency affect the rate of convergence of the GMM estimator when the autoregressive parameter is modelled as a local to unity system. One of the most important implications is that the FD-GMM estimator can be consistent even when persistency is strong, provided mean nonstationarity is present. This result is in sharp contrast to the well-known weak instruments problem of the FD-GMM estimator that arises under mean stationarity. Monte Carlo simulations confirm that the derived asymptotic results approximate the finite sample behaviour well. Since the theoretical results are derived for the AR(1) case, extensive simulations are also conducted for a model with an endogenous variable, and similar results to the AR(1) case are obtained. Finally, an empirical illustration that supports the theoretical results is provided: the FD-GMM estimator can estimate the coefficients precisely even when persistency is strong if mean nonstationarity is present. In addition, the results show that reducing the number of instruments as a strategy to mitigate the finite sample bias is not always useful.

Keywords: Dynamic panel data models, GMM, initial conditions, strong persistency, weak instruments

* Corresponding author. Tel/Fax: 81-82-424-7264. Email addresses: [email protected] (Kazuhiko Hayakawa), [email protected] (Shuichi Nagata).


1. Introduction

In empirical studies, the first-difference generalized method of moments (FD-GMM) estimator has been widely used to estimate dynamic panel data models since the works of Holtz-Eakin, Newey and Rosen (1988) and Arellano and Bond (1991). However, when the persistency of the data is strong, the FD-GMM estimator may not work well because of the weak instruments problem, as noted by Blundell and Bond (1998) and Blundell, Bond and Windmeijer (2000). Blundell and Bond (1998) demonstrate that the system GMM estimator of Arellano and Bover (1995), which is obtained by imposing a restriction on the initial conditions, can address the weak instruments problem. Since then, the system GMM estimator has been considered to be more efficient than the FD-GMM estimator and as such it has been widely used in empirical studies. However, Bun and Windmeijer (2010) show that the system GMM estimator does suffer from the weak instruments problem when the variance ratio of the individual effects to the disturbances is large (see also Hayakawa (2007)).

The purpose of this paper is to revisit the weak instruments problem of the FD-GMM estimator with an emphasis on the role of the initial conditions. Since the system GMM estimator is not consistent under the general form of the initial conditions, we do not discuss it. This purpose is motivated by the simulation evidence provided by Hayakawa (2009), who shows that the assumption on the initial conditions substantially affects the finite sample behaviour of the FD-GMM estimator. Since only simulation results and an intuitive explanation are provided in Hayakawa (2009), this paper provides an asymptotic theory that can explain the finite sample behaviour of the FD-GMM estimator. Several papers discussing the weak instruments problem in dynamic panel data models, such as Hahn, Hausman and Kuersteiner (2007) and Bun and Windmeijer (2010), assume initial conditions under which the data are covariance or mean stationary, but these assumptions might not hold in practice. In fact, Arellano (2003b, p.96) and Arellano (2003a) provide empirical evidence of mean nonstationarity, where the precise meaning of 'mean nonstationarity' is defined in Section 2. More recently, Roodman (2009) casts doubt on mean stationarity. Hence, it is worth reconsidering the weak instruments problem of the FD-GMM estimator with unrestricted initial conditions.

To investigate the weak instruments problem with unrestricted initial conditions, standard first-order asymptotic theory is unsuitable since the FD-GMM estimator is always consistent and asymptotically normal regardless of the assumption on the initial conditions. Hence, we employ the local to unity asymptotics advanced by Hahn, Hausman and Kuersteiner (2007). The use of local to unity asymptotics is motivated by the fact that the instruments become weak as persistency increases. While Hahn, Hausman and Kuersteiner (2007) employ a local to unity sequence given by $\alpha_N = 1 - c/N$, we consider a more general local to unity sequence given by $\alpha_N = 1 - c/N^p$, $(0 < p < \infty)$. The local to unity sequence considered in this study is related to the notions of 'nearly weak instruments', 'weak instruments', and 'near non-identification' introduced by Hahn and Kuersteiner (2002). To see what these terms mean and understand their theoretical implications, let us consider a cross-section two-stage least squares (2SLS) regression model:
\[
y_i = \beta x_i + v_i, \quad \text{and} \quad x_i = \mathbf{z}_i'\boldsymbol{\pi} + e_i = \mathbf{z}_i'(\boldsymbol{\mu}/N^p) + e_i, \qquad (i = 1, \ldots, N),
\]

where $y_i$ and $x_i$ are scalars, $\mathbf{z}_i$ and $\boldsymbol{\pi}$ are $K \times 1$ vectors, and $0 < p < \infty$. Hahn and Kuersteiner (2002) consider three situations according to $p$: they call the case of $0 < p < 1/2$ 'nearly weak', the case of $p = 1/2$ 'weak', and the case of $1/2 < p < \infty$ 'near non-identified'. Note that the seminal paper of Staiger and Stock (1997) on weak instruments assumes $p = 1/2$. Under each situation, the asymptotic distribution of the 2SLS estimator $\widehat{\beta}$ is given by
\[
N^{1/2-p}(\widehat{\beta} - \beta) \xrightarrow{d} N\left(0, \omega^2\right), \quad \text{if } 0 < p < 1/2,
\]
\[
\widehat{\beta} - \beta \xrightarrow{d} \xi_1, \quad \text{if } p = 1/2, \quad \text{and}
\]
\[
\widehat{\beta} - \beta \xrightarrow{d} \xi_2, \quad \text{if } 1/2 < p < \infty,
\]
where $\xi_1$ and $\xi_2$ are non-standard distributions, and
\[
\omega^2 = \sigma_v^2 \operatorname*{plim}_{N \to \infty} \left[ \left( \frac{1}{N^{1-p}} \sum_{i=1}^{N} x_i \mathbf{z}_i' \right) \left( \frac{1}{N} \sum_{i=1}^{N} \mathbf{z}_i \mathbf{z}_i' \right)^{-1} \left( \frac{1}{N^{1-p}} \sum_{i=1}^{N} \mathbf{z}_i x_i \right) \right]^{-1}.
\]

An important implication of this result is that when $0 < p < 1/2$, the 2SLS estimator has an asymptotic distribution implied by standard first-order asymptotic theory with suitable normalization, while the rate of convergence is slower than the usual rate $\sqrt{N}$. Hence, as long as $0 < p < 1/2$, nearly weak instruments asymptotics is qualitatively the same as standard first-order asymptotic theory. However, when $p \geq 1/2$, the 2SLS estimator is inconsistent. By using a local to unity sequence with $\alpha_N = 1 - c/N^p$, this paper investigates whether the same implication is also obtained in dynamic panel models with unrestricted initial conditions. Thus, this paper can be seen as an extension of Hahn, Hausman and Kuersteiner (2007), in that we relax their assumption on the initial conditions, and of Hahn and Kuersteiner (2002) to the dynamic panel model case.

We derive the asymptotic distribution of the FD-GMM estimator under the local to unity system and find that, depending on the value of $p$, the FD-GMM estimator may or may not be consistent when the data are mean stationary, while it is always consistent when the data are mean nonstationary regardless of the value of $p$. Since this asymptotic result cannot approximate the finite sample behaviour of the estimator when the data are close to being mean stationary, but not exactly mean stationary, we introduce a new notion called 'near mean stationarity' to incorporate the closeness of the data to mean stationarity into the asymptotic theory. We then derive the asymptotic distribution under the local to unity and near mean-stationary systems and show that the FD-GMM estimator remains consistent even when persistency is strong if mean nonstationarity is present, but is inconsistent when persistency is strong and the data are nearly or exactly mean stationary.

We confirm the derived theoretical implications by simulation. We find that the behaviour of the FD-GMM estimator crucially depends on the degree of mean nonstationarity and the variance ratio of the individual effects to the disturbances, as the theoretical results imply. We also find that the mean-stationary case tends to be the worst when the variance ratio is large and that the FD-GMM estimator using a subset of the instruments is much more affected by the variance ratio of the individual effects to the disturbances than the FD-GMM estimator using all the instruments. We also consider two alternative estimators that might mitigate the weak instruments problem, namely the long-difference instrumental variable (LDIV) estimator of Hahn, Hausman and Kuersteiner (2007) and the bias-corrected within-group (BCWG) estimator of Bun and Carree (2005). We find that the LDIV estimator is fairly robust to the strength of the instruments, while the BCWG estimator is completely unaffected by the strength of the instruments, although the latter finding is applicable only when all the regressors are strictly exogenous.

In practice, the difference Sargan test is often used to detect mean nonstationarity. However, its finite sample properties are not well examined in the literature. Therefore, we provide simulation results for the difference Sargan test and show that its size property crucially depends on the number of instruments, while its power property depends on the variance ratio of the individual effects to the disturbances. Finally, we provide an empirical example that supports the theoretical results by using the dataset of Arellano (2003a). Specifically, since one of the major results of this paper is that the FD-GMM estimator remains consistent even when persistency is strong if mean nonstationarity is present, we provide empirical evidence that such a case actually occurs in practice.

The rest of the paper is organized as follows.
In Section 2, we introduce our model, assumptions, and estimators. In Section 3, the main results of this paper are provided. In Section 4, we conduct a Monte Carlo simulation. Section 5 provides an empirical illustration and, finally, Section 6 concludes.

2. Setup

2.1. Model and assumptions

Let us consider the following model:
\[
y_{it} = \alpha y_{i,t-1} + \eta_i + v_{it}, \qquad (i = 1, \ldots, N; \; t = 1, \ldots, T) \tag{1}
\]
where $\alpha$ is the parameter of interest with $|\alpha| < 1$ and $\eta_i$ is the unobserved individual effect. We make the following assumptions:

Assumption 1. $v_{it} \sim iid(0, \sigma_v^2)$ over $i = 1, \ldots, N$ and $t = 1, \ldots, T$ with $0 < \sigma_v^2 < \infty$.

Assumption 2. $\eta_i \sim iid(0, \sigma_\eta^2)$ over $i = 1, \ldots, N$ with $0 < \sigma_\eta^2 < \infty$.

Assumption 3. For the initial conditions, we assume
\[
y_{i0} = \delta \mu_i + \varepsilon_{i0}, \tag{2}
\]
where $\mu_i = \eta_i/(1-\alpha)$ and $\delta \neq 0$. For $\varepsilon_{i0}$, we assume that $\varepsilon_{i0} \sim iid(0, \sigma_0^2)$ over $i = 1, \ldots, N$, where $\sigma_0^2 = \lambda\sigma_v^2/(1-\alpha^2)$ with $\lambda > 0$. Further, we assume that $E(\varepsilon_{i0}^4) = \kappa\sigma_0^4$, where $0 < \kappa < \infty$.

Assumption 4. $v_{it}$, $\eta_i$, and $\varepsilon_{i0}$ are mutually independent.

Assumptions 1, 2, and 4 are widely used in previous studies such as Alvarez and Arellano (2003). Assumption 3 plays a crucial role in deriving the asymptotic distribution of the FD-GMM estimator. Hence, we discuss Assumption 3 in some detail below. Note that the initial conditions (2) are also considered by Arellano (2003b, p.97). We do not consider the case of $\delta = 0$ since it is natural that the initial conditions contain individual effects. From the initial conditions (2), we have the following expression:
\[
y_{it} = \left[1 - (1-\delta)\alpha^t\right]\mu_i + \sum_{j=0}^{t-1}\alpha^j v_{i,t-j} + \alpha^t\varepsilon_{i0}
      = h_t\mu_i + \xi_{it} + \alpha^t\varepsilon_{i0}, \tag{3}
\]
where $h_t = 1 - (1-\delta)\alpha^t$ and $\xi_{it} = \sum_{j=0}^{t-1}\alpha^j v_{i,t-j}$. The expectation and covariance of $y_{it}$ given in (3) are
\[
E(y_{it} \mid \eta_i) = h_t\mu_i, \quad \text{and} \quad
Cov(y_{is}, y_{it} \mid \eta_i) = \frac{\left[1 - (1-\lambda)\alpha^{2s}\right]\sigma_v^2\alpha^{t-s}}{1-\alpha^2}, \qquad (s \le t).
\]

By imposing restrictions on $\delta$ and $\lambda$, we obtain several schemes for the data characteristics. For example, if we set $\delta = 1$ and $\lambda = 1$, then $E(y_{it}\mid\eta_i) = \mu_i$ and $Cov(y_{it}, y_{is}\mid\eta_i) = \sigma_v^2\alpha^{t-s}/(1-\alpha^2)$; hence, $y_{it}$ is a mean- and covariance-stationary process. If we set $\delta = 1$ and let $\lambda$ be unrestricted, $E(y_{it}\mid\eta_i) = \mu_i$ does not depend on $t$; hence, we call the case of $\delta = 1$ 'mean stationary'. Note that in this case, $Cov(y_{it}, y_{is}\mid\eta_i)$ can depend on $t$. If we assume that $\delta \neq 1$, $y_{it}$ is mean nonstationary since $E(y_{it}\mid\eta_i) = h_t\mu_i$ depends on time $t$. Note that we use the terminology 'mean nonstationary' to distinguish this case from 'nonstationary', which is conventionally used to indicate an integrated variable. $Cov(y_{it}, y_{is}\mid\eta_i)$ can depend on $t$ in this case, too. Note that most previous studies, including Hahn, Hausman and Kuersteiner (2007), assume $\delta = \lambda = 1$, while we do not assume $\delta = 1$ in this paper, i.e., $\delta$ may or may not equal one. Thus, the general initial condition (2) is useful for characterizing the data.

2.2. GMM estimator

We introduce the FD-GMM estimator. Consider the following FD model:
\[
\Delta y_{it} = \alpha\Delta y_{i,t-1} + \Delta v_{it}, \qquad (i = 1, \ldots, N; \; t = 2, \ldots, T).
\]
The moment conditions proposed by Arellano and Bond (1991) are given by
\[
E(y_{is}\Delta v_{it}) = 0, \qquad (s = 0, \ldots, t-2; \; t = 2, \ldots, T). \tag{4}
\]
Note that the validity of these moment conditions does not depend on the initial conditions. Although other moment conditions such as those proposed by Ahn and Schmidt (1995) can be used, we do not include them since the moment conditions (4) are the most widely used. The FD-GMM estimator is defined as
\[
\widehat{\alpha}_{FD2} = \frac{\left(\sum_{i=1}^{N}\Delta\mathbf{y}_{i,-1}'\mathbf{Z}_i^{L2}\right)\left(\sum_{i=1}^{N}\mathbf{Z}_i^{L2\prime}\mathbf{H}\mathbf{Z}_i^{L2}\right)^{+}\left(\sum_{i=1}^{N}\mathbf{Z}_i^{L2\prime}\Delta\mathbf{y}_i\right)}{\left(\sum_{i=1}^{N}\Delta\mathbf{y}_{i,-1}'\mathbf{Z}_i^{L2}\right)\left(\sum_{i=1}^{N}\mathbf{Z}_i^{L2\prime}\mathbf{H}\mathbf{Z}_i^{L2}\right)^{+}\left(\sum_{i=1}^{N}\mathbf{Z}_i^{L2\prime}\Delta\mathbf{y}_{i,-1}\right)}, \tag{5}
\]
where $\mathbf{A}^{+}$ denotes the Moore-Penrose inverse of $\mathbf{A}$, $\Delta\mathbf{y}_i = (\Delta y_{i2}, \ldots, \Delta y_{iT})'$, $\Delta\mathbf{y}_{i,-1} = (\Delta y_{i1}, \ldots, \Delta y_{i,T-1})'$,

\[
\mathbf{Z}_i^{L2} =
\begin{pmatrix}
y_{i0} & & & 0 \\
 & (y_{i0}, y_{i1}) & & \\
 & & \ddots & \\
0 & & & (y_{i0}, \ldots, y_{i,T-2})
\end{pmatrix}, \tag{6}
\]
and
\[
\mathbf{H} =
\begin{pmatrix}
2 & -1 & & 0 \\
-1 & 2 & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 2
\end{pmatrix}.
\]
For later use, let us define $\Delta\mathbf{y}_{-1} = (\Delta\mathbf{y}_{1,-1}', \ldots, \Delta\mathbf{y}_{N,-1}')'$ and $\mathbf{Z}^{L2} = (\mathbf{Z}_1^{L2\prime}, \ldots, \mathbf{Z}_N^{L2\prime})'$.
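As a computational illustration of (5) and (6), a direct NumPy construction of the instrument blocks and of $\widehat{\alpha}_{FD2}$ is sketched below. This is not code from the paper; the function name `fd_gmm_alpha` and the array layout (a balanced panel stored as an $N \times (T+1)$ array) are our own assumptions.

```python
import numpy as np

def fd_gmm_alpha(y):
    # One-step FD-GMM estimate of alpha as in (5), for a balanced panel y of
    # shape (N, T+1) holding y_{i0}, ..., y_{iT}; all lagged levels are used
    # as instruments (the Z^{L2} set in (6)) and the weighting matrix is
    # (sum_i Z_i' H Z_i)^+ with H the first-difference covariance matrix.
    N, T = y.shape[0], y.shape[1] - 1
    dy = np.diff(y, axis=1)                  # Δy_{i1}, ..., Δy_{iT}
    dy_dep, dy_lag = dy[:, 1:], dy[:, :-1]   # dependent and lagged differences
    m = T * (T - 1) // 2                     # number of moment conditions
    H = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)
    ZHZ, Zdep, Zlag = np.zeros((m, m)), np.zeros(m), np.zeros(m)
    for i in range(N):
        Zi = np.zeros((T - 1, m))            # block-diagonal Z_i^{L2} as in (6)
        col = 0
        for t in range(2, T + 1):            # row for period t uses y_{i0},...,y_{i,t-2}
            Zi[t - 2, col:col + t - 1] = y[i, :t - 1]
            col += t - 1
        ZHZ += Zi.T @ H @ Zi
        Zdep += Zi.T @ dy_dep[i]
        Zlag += Zi.T @ dy_lag[i]
    W = np.linalg.pinv(ZHZ)                  # Moore-Penrose inverse, as in (5)
    return (Zlag @ W @ Zdep) / (Zlag @ W @ Zlag)
```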

3. Asymptotic Properties

In this section, we derive the asymptotic properties of the FD-GMM estimator when the persistency of the data is strong. To this end, let us consider the following parameter sequence:
\[
\alpha_N = 1 - \frac{c}{N^p},
\]
where $c > 0$ is a constant and $0 < p < \infty$. Note that the unit root case of $c = 0$ is excluded since we assume $|\alpha| < 1$. Furthermore, we do not consider the case of $p = 0$, since consistency and asymptotic normality can then be shown easily by applying the general results for the GMM estimator. Note that as $p$ increases, the data become more persistent. We use this local to unity approach to approximate the distribution near unity, as in Hahn, Hausman and Kuersteiner (2007). Note that Hahn, Hausman and Kuersteiner (2007) assume that $p = 1$. Further, note that, under this local to unity system, the initial conditions formulated in Assumption 3 can be written as
\[
y_{i0} = \delta\frac{N^p}{c}\eta_i + \varepsilon_{i0}.
\]
We first provide an intuitive reason why the initial conditions affect the strength of the instruments, which was originally presented by Hayakawa (2009), and then derive the order of magnitude of the F statistic in the first-stage regression in a 2SLS regression form.

3.1. Strength of the instruments and initial conditions

The Jacobian of the moment conditions (4) is given by
\[
-\frac{dE(y_{is}\Delta v_{it})}{d\alpha} = E(y_{is}\Delta y_{i,t-1})
= h_s\alpha^{t-2}(1-\delta)\frac{\sigma_\eta^2}{1-\alpha} - \frac{\sigma_v^2\alpha^{t-s-2}\left[1-(1-\lambda)\alpha^{2s}\right]}{1+\alpha}. \tag{7}
\]
By noting that $E(y_{is}\Delta y_{it})$ measures the strength of the instruments, we find that an additional correlation between $y_{is}$ and $\Delta y_{it}$, the first term in (7), appears when $\delta \neq 1$. Hence, depending on the relative size of the first and second terms in (7), the instruments become weak in some cases and strong in others. Formally, let us consider the strength of the instruments in terms of the concentration parameter, which is proxied by the first-stage F statistic (see Stock, Wright and Yogo (2002) for a survey). To simplify the discussion, let us consider the case of $T = 2$. Since we consider the fixed $T$ case, we can expect a similar result to be obtained for a more general case with multiple instruments. When $T = 2$, the 2SLS regression form can be written as
\[
\Delta y_{i2} = \alpha\Delta y_{i1} + \Delta v_{i2}, \quad \text{and} \quad \Delta y_{i1} = \pi y_{i0} + u_i.
\]
By using (A.4) and (A.8) in the Appendix, it is easy to show that the OLS estimator of $\pi$ is given by
\[
\widehat{\pi} = \frac{N^{-1}\sum_{i=1}^{N} y_{i0}\Delta y_{i1}}{N^{-1}\sum_{i=1}^{N} y_{i0}^2}
= \frac{(1-\delta)O_p(N^p) + O_p(1)}{O_p(N^{2p})}
= (1-\delta)O_p\!\left(\frac{1}{N^p}\right) + O_p\!\left(\frac{1}{N^{2p}}\right).
\]
From this, we observe that whether or not $\delta = 1$ affects the speed at which $\widehat{\pi}$ tends toward zero as $N$ grows. The first-stage F statistic that tests $H_0: \pi = 0$ is given by
\[
F \approx \frac{\widehat{\pi}^2\left(\sum_{i=1}^{N} y_{i0}^2\right)}{N^{-1}\sum_{i=1}^{N}\left(\Delta y_{i1} - \widehat{\pi}y_{i0}\right)^2}
= (1-\delta)^2 O_p(N) + (1-\delta)O_p(N^{1-p}) + O_p(N^{1-2p}). \tag{8}
\]
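As a quick numerical illustration of (8), the following Monte Carlo sketch simulates the $T = 2$ first-stage regression under the design (1)-(2). It is a hypothetical helper (not part of the paper's code), and the parameter values in the example call are illustrative only.

```python
import numpy as np

def first_stage_F(N, alpha, delta, lam=1.0, sigma_eta=1.0, sigma_v=1.0, seed=0):
    # T = 2 first-stage regression of Δy_{i1} on the single instrument y_{i0},
    # with y_{i0} = δ µ_i + ε_{i0}, µ_i = η_i/(1-α), Var(ε_{i0}) = λ σ_v²/(1-α²).
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, N)
    mu = eta / (1.0 - alpha)
    eps0 = rng.normal(0.0, np.sqrt(lam / (1.0 - alpha**2)) * sigma_v, N)
    y0 = delta * mu + eps0
    y1 = alpha * y0 + eta + rng.normal(0.0, sigma_v, N)
    dy1 = y1 - y0
    pi_hat = (y0 @ dy1) / (y0 @ y0)                      # first-stage OLS slope
    resid = dy1 - pi_hat * y0
    return pi_hat**2 * (y0 @ y0) / (resid @ resid / N)   # F statistic as in (8)

for delta in (1.0, 0.9):   # mean stationary vs. mean nonstationary
    print(delta, first_stage_F(N=100_000, alpha=0.99, delta=delta))
```

For these illustrative values one typically obtains an F statistic in the single digits when $\delta = 1$ and a very large F when $\delta = 0.9$, consistent with the orders in (8).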

This suggests that the initial conditions and the degree of persistence affect the strength of the instruments. When $\delta = 1$, it follows that $F \to \infty$ when $0 < p < 1/2$, $F = O(1)$ when $p = 1/2$, and $F \to 0$ when $1/2 < p < \infty$ as $N \to \infty$. However, when $\delta \neq 1$, $F \to \infty$ regardless of the value of $p$ as $N \to \infty$. This result indicates that when $\delta \neq 1$, the instruments are strong even if the data are persistent. Depending on $p$ and $\delta$, we consider the following four cases:

Case A(a): $\delta = 1$ and $0 < p < \tfrac{1}{2}$,
Case A(b): $\delta = 1$ and $p = \tfrac{1}{2}$,
Case A(c): $\delta = 1$ and $\tfrac{1}{2} < p < \infty$, and
Case B: $\delta \neq 1$.

Note that the data are mean stationary in Case A, while they are mean nonstationary in Case B. Furthermore, with regard to Case A, note that the persistency of the data is the weakest in Case A(a) and the strongest in Case A(c). We do not need separate sub-cases for Case B. The following theorem provides the asymptotic distributions of the FD-GMM estimator in Cases A and B.

Theorem 1. Let Assumptions 1 to 4 hold and let $N \to \infty$ with $T$ fixed. Then, we have
\[
\text{Case A(a):}\quad N^{1/2-p}(\widehat{\alpha}_{FD2} - \alpha) \xrightarrow{d} N\left(0, \omega_A^2\right),
\]
\[
\text{Case A(b):}\quad \widehat{\alpha}_{FD2} - \alpha \xrightarrow{d}
\frac{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_1)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\zeta}_{FD2}}{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_1)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_1)},
\]
\[
\text{Case A(c):}\quad \widehat{\alpha}_{FD2} - \alpha \xrightarrow{d}
\frac{\boldsymbol{\xi}_{FD2}'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\zeta}_{FD2}}{\boldsymbol{\xi}_{FD2}'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\xi}_{FD2}}, \quad \text{and}
\]
\[
\text{Case B:}\quad \sqrt{N}(\widehat{\alpha}_{FD2} - \alpha) \xrightarrow{d} N\left(0, \omega_B^2\right),
\]
where
\[
\omega_A^2 = \sigma_v^2 \operatorname*{plim}_{N\to\infty}\left[\left(\frac{\Delta\mathbf{y}_{-1}'\mathbf{Z}^{L2}}{N}\right)\left(\frac{\mathbf{Z}^{L2\prime}\bar{\mathbf{H}}\mathbf{Z}^{L2}}{N^{2p+1}}\right)^{-1}\left(\frac{\mathbf{Z}^{L2\prime}\Delta\mathbf{y}_{-1}}{N}\right)\right]^{-1}
= \frac{4\sigma_\eta^2}{c^2\lambda^2\sigma_v^2\,\boldsymbol{\iota}_m'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\iota}_m},
\]
\[
\omega_B^2 = \sigma_v^2 \operatorname*{plim}_{N\to\infty}\left[\left(\frac{\Delta\mathbf{y}_{-1}'\mathbf{Z}^{L2}}{N^{p+1}}\right)\left(\frac{\mathbf{Z}^{L2\prime}\bar{\mathbf{H}}\mathbf{Z}^{L2}}{N^{2p+1}}\right)^{-1}\left(\frac{\mathbf{Z}^{L2\prime}\Delta\mathbf{y}_{-1}}{N^{p+1}}\right)\right]^{-1}
= \frac{\sigma_v^2}{\sigma_\eta^2\delta^2\,\boldsymbol{\iota}_m'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\iota}_m},
\]
where $\bar{\mathbf{H}} = \mathbf{I}_N \otimes \mathbf{H}$, $\boldsymbol{\iota}_m$ is an $m = (T-1)T/2$ dimensional vector of ones, $\boldsymbol{\xi}_{FD2}$ and $\boldsymbol{\zeta}_{FD2}$ are the zero-mean random vectors defined by (A.16) in the Appendix, $\boldsymbol{\mu}_1$ is defined in Lemma A3(c.1) in the Appendix, and $\mathbf{J} = \mathrm{diag}\left[\boldsymbol{\iota}_1, \boldsymbol{\iota}_2', \cdots, \boldsymbol{\iota}_{T-1}'\right]$.

Remark 1. In Case A, where the data are mean stationary, we find that the rate of convergence of the FD-GMM estimator depends on $p$. In Case A(a), the FD-GMM estimator is $N^{1/2-p}$-consistent, while in Cases A(b) and A(c), it is inconsistent. This finding indicates that the FD-GMM estimator behaves poorly when the data are mean stationary and the persistency of the data is strong, which is well known in the literature. Further, note that in Case A(a), the asymptotic variance $\omega_A^2$ has the same form as the variance derived from standard first-order asymptotic theory where $p$ is fixed. Hence, this suggests that results similar to Hahn and Kuersteiner (2002) and Caner (2010) are also obtained in dynamic panel data models.

Remark 2. In Case B, where the data are mean nonstationary, we find that the FD-GMM estimator is $\sqrt{N}$-consistent for $0 < p < \infty$. In other words, when the data are mean nonstationary, the FD-GMM estimator performs well even when the persistency of the data is strong, which is in sharp contrast to the well-known result that the FD-GMM estimator performs poorly under strong persistency.

Remark 3. By comparing the asymptotic variances of Case A(a) and Case B, we find that the variance ratio $\sigma_\eta^2/\sigma_v^2$ has the opposite effect. When $\sigma_\eta^2/\sigma_v^2$ is large, the variance in Case A(a) becomes large, while that in Case B becomes small. Blundell, Bond and Windmeijer (2000) and Bun and Windmeijer (2010) show that as $\sigma_\eta^2/\sigma_v^2$ increases, the FD-GMM estimator suffers from the weak instruments problem and does not perform well. However, this opposite effect indicates that the FD-GMM estimator does not always work poorly when $\sigma_\eta^2/\sigma_v^2$ is large: rather, there are cases where it performs well.

3.2. Asymptotic results with 'near mean stationary' initial conditions

Theorem 1 shows that the degree of persistence and the initial conditions affect the rate of convergence. However, when $\delta$ is not equal to one, but very close to one, say, $\delta = 1.0001$, the behaviour of the FD-GMM estimator might be close to the asymptotic distribution of Case A and not that of Case B. One possible explanation is that the closeness between $\delta$ and 1 is not taken into consideration when $\delta$ is fixed. In other words, when $\delta$ is fixed, no distinction is made between $\delta = 1.0001$ and $\delta = 10$, say. To take the closeness of $\delta$ to 1 into account, we introduce the following sequence:
\[
\delta_N = 1 - \frac{d}{N^q}, \tag{9}
\]
where $0 < q < \infty$ and $d \neq 0$ is a constant. We do not consider the case of $q = 0$ since this case is equivalent to a fixed $\delta$. In this sequence, $q$ controls the closeness of the data to mean stationarity. When $q$ is close to 0, $\delta_N$ deviates from 1, which implies that the data tend to be mean nonstationary. As $q$ increases, $\delta_N$ approaches 1, implying that the data are nearly mean stationary. Thus, with (9), the closeness between $\delta$ and 1 is taken into consideration through $q$. With the sequence (9), the first-stage F statistic that tests $H_0: \pi = 0$ given in (8) becomes
\[
F = O_p(N^{1-2q}) + O_p(N^{1-p-q}) + O_p(N^{1-2p}).
\]
Depending on the values of $p$ and $q$, the order of magnitude of the F statistic differs. In all, 11 cases are summarized in Table 1. From Table 1, we find that for Cases C1, C2, C3, C4, and C7, the F statistic diverges, while for Cases C5, C6, C8, and C9, the F statistic is bounded as $N \to \infty$. This finding implies that when the persistency of the data is not so strong or when $\delta$ sufficiently deviates from 1, the instruments become strong. However, as the persistency of the data increases or as the data approach mean stationarity, the instruments become weak. The following theorem gives the asymptotic distribution for the 11 cases in Table 1.

Table 1: Order of magnitude of the F statistic in the first-stage regression

q \ p          | 0 < p < 1/2                                | p = 1/2                 | 1/2 < p < ∞
0 < q < 1/2    | Case C1(a): O_p(N^{1-2q}) if p > q         | Case C2: O_p(N^{1-2q})  | Case C3: O_p(N^{1-2q})
               | Case C1(b): O_p(N^{1-2r}) if p = q = r     |                         |
               | Case C1(c): O_p(N^{1-2p}) if p < q         |                         |
q = 1/2        | Case C4: O_p(N^{1-2p})                     | Case C5: O_p(1)         | Case C6: O_p(1)
1/2 < q < ∞    | Case C7: O_p(N^{1-2p})                     | Case C8: O_p(1)         | Case C9: o_p(1)
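The entries in Table 1 all follow from the three exponents in the displayed expression for F. As a quick check, the following sketch (an illustrative helper, not from the paper) returns the dominant exponent; F diverges exactly when it is positive.

```python
def f_stat_order(p, q):
    # Dominant exponent of N in F = Op(N^{1-2q}) + Op(N^{1-p-q}) + Op(N^{1-2p}).
    return max(1 - 2 * q, 1 - p - q, 1 - 2 * p)

print(f_stat_order(p=0.25, q=0.75))  # 0.5 -> Case C7: F diverges like N^{1-2p}
print(f_stat_order(p=0.50, q=0.50))  # 0.0 -> Case C5: F is bounded
print(f_stat_order(p=1.00, q=0.25))  # 0.5 -> Case C3: F diverges like N^{1-2q}
```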
Theorem 2. Let Assumptions 1 to 4 hold, and let $N \to \infty$ with $T$ fixed. Then, we have
\[
\text{Cases C1(a), C2, C3:}\quad N^{1/2-q}(\widehat{\alpha}_{FD2} - \alpha) \xrightarrow{d} N\left(0, \omega_{C1(a)23}^2\right),
\]
\[
\text{Case C1(b):}\quad N^{1/2-r}(\widehat{\alpha}_{FD2} - \alpha) \xrightarrow{d} N\left(0, \omega_{C1(b)}^2\right),
\]
\[
\text{Cases C1(c), C4, C7:}\quad N^{1/2-p}(\widehat{\alpha}_{FD2} - \alpha) \xrightarrow{d} N\left(0, \omega_{C1(c)47}^2\right),
\]
\[
\text{Case C5:}\quad \widehat{\alpha}_{FD2} - \alpha \xrightarrow{d}
\frac{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_3)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\zeta}_{FD2}}{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_3)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_3)},
\]
\[
\text{Case C6:}\quad \widehat{\alpha}_{FD2} - \alpha \xrightarrow{d}
\frac{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_2)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\zeta}_{FD2}}{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_2)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_2)},
\]
\[
\text{Case C8:}\quad \widehat{\alpha}_{FD2} - \alpha \xrightarrow{d}
\frac{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_1)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\zeta}_{FD2}}{(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_1)'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}(\boldsymbol{\xi}_{FD2} + \boldsymbol{\mu}_1)}, \quad \text{and}
\]
\[
\text{Case C9:}\quad \widehat{\alpha}_{FD2} - \alpha \xrightarrow{d}
\frac{\boldsymbol{\xi}_{FD2}'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\zeta}_{FD2}}{\boldsymbol{\xi}_{FD2}'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\xi}_{FD2}},
\]
where
\[
\omega_{C1(a)23}^2 = \sigma_v^2 \operatorname*{plim}_{N\to\infty}\left[\left(\frac{\Delta\mathbf{y}_{-1}'\mathbf{Z}^{L2}}{N^{p-q+1}}\right)\left(\frac{\mathbf{Z}^{L2\prime}\bar{\mathbf{H}}\mathbf{Z}^{L2}}{N^{2p+1}}\right)^{-1}\left(\frac{\mathbf{Z}^{L2\prime}\Delta\mathbf{y}_{-1}}{N^{p-q+1}}\right)\right]^{-1}
= \frac{\sigma_v^2}{d^2\sigma_\eta^2\,\boldsymbol{\iota}_m'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\iota}_m},
\]
\[
\omega_{C1(b)}^2 = \sigma_v^2 \operatorname*{plim}_{N\to\infty}\left[\left(\frac{\Delta\mathbf{y}_{-1}'\mathbf{Z}^{L2}}{N}\right)\left(\frac{\mathbf{Z}^{L2\prime}\bar{\mathbf{H}}\mathbf{Z}^{L2}}{N^{2p+1}}\right)^{-1}\left(\frac{\mathbf{Z}^{L2\prime}\Delta\mathbf{y}_{-1}}{N}\right)\right]^{-1}
= \frac{4\sigma_v^2\sigma_\eta^2}{\left(2d\sigma_\eta^2 - c\lambda^2\sigma_v^2\right)^2\boldsymbol{\iota}_m'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\iota}_m}, \quad \text{and}
\]
\[
\omega_{C1(c)47}^2 = \sigma_v^2 \operatorname*{plim}_{N\to\infty}\left[\left(\frac{\Delta\mathbf{y}_{-1}'\mathbf{Z}^{L2}}{N}\right)\left(\frac{\mathbf{Z}^{L2\prime}\bar{\mathbf{H}}\mathbf{Z}^{L2}}{N^{2p+1}}\right)^{-1}\left(\frac{\mathbf{Z}^{L2\prime}\Delta\mathbf{y}_{-1}}{N}\right)\right]^{-1}
= \frac{4\sigma_\eta^2}{c^2\lambda^2\sigma_v^2\,\boldsymbol{\iota}_m'\left(\mathbf{J}'\mathbf{H}\mathbf{J}\right)^{+}\boldsymbol{\iota}_m},
\]
where $\boldsymbol{\xi}_{FD2}$ and $\boldsymbol{\zeta}_{FD2}$ are the zero-mean random vectors defined by (A.16) in the Appendix and $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$, and $\boldsymbol{\mu}_3$ are defined in Lemmas A3(c.1), A3(c.5), and A3(c.6) in the Appendix, respectively.

Remark 4. We find that when $0 < p < 1/2$ or $0 < q < 1/2$, the FD-GMM estimator is consistent. However, its rate of convergence depends on $p$ and $q$. When $p > q$, the rate of convergence is $N^{1/2-q}$, while when $p < q$, it is $N^{1/2-p}$. Further, note that these convergence rates are slower than the rate $\sqrt{N}$ obtained in Case B where $\delta$ is fixed. This means that the closeness of the data to mean stationarity affects the rate of convergence, which is the main purpose of introducing the parameter sequence (9).

Remark 5. When $1/2 \leq p < \infty$ and $1/2 \leq q < \infty$, the FD-GMM estimator is inconsistent. This finding implies that as the data become persistent and approach mean stationarity, the FD-GMM estimator deteriorates.

Remark 6. The asymptotic variances $\omega_{C1(a)23}^2$, $\omega_{C1(b)}^2$, and $\omega_{C1(c)47}^2$ have the same form as that obtained from standard first-order asymptotic theory where $p$ and $q$ are fixed. This finding confirms that results similar to Hahn and Kuersteiner (2002) and Caner (2010) also hold for the case with a notion of near mean stationarity. An implication of this result is that if the data are strongly mean nonstationary or the persistency of the data is not strong or both, standard first-order asymptotic theory predicts the behaviour of the GMM estimator well.

Remark 7. When the variance ratio $\sigma_\eta^2/\sigma_v^2$ is large, the asymptotic variances of Cases C1(a), C1(b), C2, and C3 become small, while those of Cases C1(c), C4, and C7 become large. This finding implies that when $p > q$, a large $\sigma_\eta^2/\sigma_v^2$ makes the asymptotic variance small, while when $p < q$, it makes the variance large. Hence, depending on the relative size of $p$ and $q$, the effect of $\sigma_\eta^2/\sigma_v^2$ changes dramatically.

Remark 8. The asymptotic distribution of Cases C1(a), C2, and C3 indicates that even when the persistency of the data is strong, the FD-GMM estimator performs well as $q \to 0$. This result explains the finite sample behaviour reported in Hayakawa (2009) (see also Section 4).

3.3. Relationship between Kruiniger (2009) and this paper

In this subsection, we investigate the relationship between Kruiniger (2009) and this paper. Since the models and asymptotic techniques of Kruiniger (2009) and this paper look similar, this comparison may be useful. Kruiniger (2009) considers the following model with $-1 < \alpha \leq 1$:
\[
y_{it} = \alpha y_{i,t-1} + (1-\alpha)\mu_i + v_{it}, \qquad (i = 1, \ldots, N; \; t = -S+1, \ldots, T)
\]
with the initial conditions $y_{i,-S} = \mu_i + (1-\alpha)\nu_{i,-S}$. By using these, we have the following expression:
\[
y_{it} = \mu_i + v_{it} + \alpha v_{i,t-1} + \cdots + \alpha^{t+S-1}v_{i,-S+1} + \alpha^{t+S}(1-\alpha)\nu_{i,-S}. \tag{10}
\]
Then, under $|\alpha| < 1$, we have
\[
E(y_{it} \mid \mu_i) = \mu_i, \quad \text{and} \quad
Cov(y_{it}, y_{is} \mid \mu_i) = \sigma_v^2\alpha^{t-s}\left(\frac{1-\alpha^{2(s+S)}}{1-\alpha^2}\right) + \alpha^{t+s+2S}(1-\alpha)^2 E(\nu_{i,-S}^2), \qquad (s \le t).
\]

There are several differences between Kruiniger (2009) and this paper in terms of model specification. First, while the process starts from $t = -S$ in Kruiniger (2009), in our case, it starts from $t = 0$. Second, while we consider only the case with $|\alpha| < 1$, Kruiniger (2009) considers both $|\alpha| < 1$ and $\alpha = 1$, although these two cases are investigated separately. Third, the form of the individual effects is different. While Kruiniger (2009) uses $(1-\alpha)\mu_i$, we use $\eta_i$. However, as long as $|\alpha| < 1$, this difference is not important. Comparing the two processes (3) and (10), there are no essential differences except for the coefficient of $\mu_i$ and the presence of $S$, which arise from the different assumptions on the initial conditions. When $\alpha = 1$, the difference in the form of the individual effects matters. In our case, the individual effects become a trend, and a discontinuity between the models with $|\alpha| < 1$ and $\alpha = 1$ arises, although this is not the case for Kruiniger (2009). However, since the case with $\alpha = 1$ is excluded in this paper, the form of the individual effects does not matter in the current context. Fourth, the forms of the initial conditions, which determine the nature of the process, are different. While Kruiniger (2009) assumes mean stationarity, we allow for more general cases including all combinations of mean-(non)stationary and covariance-(non)stationary cases. Theorem 4 of Kruiniger (2009) still holds even if the initial conditions have the form $y_{i,-S} = \delta\mu_i + (1-\alpha)\nu_{i,-S}$ with $\delta \neq 1$.

Aside from the differences in model specification between Kruiniger (2009) and this paper, other differences concern the asymptotic scheme and the assumption on $Var(\mu_i)$. The different results of Kruiniger (2009) and this paper are thus attributed to these two differences. For the first difference, both Kruiniger (2009) and this paper consider the sequence $\alpha_N = 1 - c/N^p$. However, while Kruiniger (2009) considers asymptotics with $S/N^k \to \kappa > 0$ as $N, S \to \infty$, we consider the sequence $\delta_N = 1 - d/N^q$. Thus, the asymptotic frameworks are rather different. For the second difference, in Kruiniger (2009), it is implicitly assumed that the individual effects tend to be non-random as $\alpha$ approaches one. More specifically, since Kruiniger (2009) assumes $Var(\mu_i) < \infty$, he implicitly assumes $Var(\eta_i) = O(1/N^{2p})$. However, we assume that $Var(\mu_i)$ diverges as $\alpha_N \to 1$ and keep $Var(\eta_i)$ fixed. This difference is the most important one since in our setup the individual effects term dominates the idiosyncratic part, while this is not so in Kruiniger (2009), which leads to different results. Which assumption is more relevant to empirical studies is generally inconclusive and depends on the data to be analysed. Some discussion on this point is provided in Section 5.

Given these differences, it is difficult to compare the results of the two studies theoretically. However, there are some potential overlaps in the implications. To see this, let us focus on the case of $k = 0$ since $k = 0$ implies fixed $S$, which is closer to our setup ($k$ corresponds to $d$ in Kruiniger (2009)). In Theorem 4(i) of Kruiniger (2009), he shows that $\widehat{\alpha}_{FD2}$ is $N^{1/2-p}$-consistent for $0 < p < 1/2$ but inconsistent for $1/2 \leq p < \infty$ ($p$ corresponds to $g$ in Kruiniger (2009)). This finding implies that when persistency is not strong in the sense that $0 < p < 1/2$ holds, the FD-GMM estimator is consistent. However, if persistency is strong in the sense that $1/2 \leq p < \infty$, the FD-GMM estimator becomes inconsistent. These implications are in line with Cases C1(c), C4, C7, C5, C6, C8, and C9 in our setup. Hence, although Kruiniger (2009) assumes mean stationarity, his results are expected to hold for the near mean-stationary case with $1/2 \leq q < \infty$. Further, the consistency results for the case of mean nonstationarity (Cases C1(a), C1(b), C2, and C3) are the major differences between Kruiniger (2009) and this paper.

3.4. Model with an endogenous variable

In this subsection, we extend the AR(1) model to a model with an endogenous variable and show that results similar to those presented in the previous section can be obtained. The key point in the previous section is that an additional correlation between an endogenous variable and the instruments appears as a result of mean nonstationarity (see (7)). This subsection shows that the same is true for a model with an endogenous variable and provides a brief intuitive discussion. Let us consider the following model:
\[
y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it}, \quad \text{and} \quad
x_{it} = \rho x_{i,t-1} + \tau\eta_i + \theta v_{it} + e_{it}, \qquad (i = 1, \ldots, N; \; t = 1, \ldots, T), \tag{11}
\]
where $e_{it} \sim iid(0, \sigma_e^2)$. $x_{it}$ is endogenous unless $\theta = 0$. This specification is used in Blundell, Bond and Windmeijer (2000). Note that this model can be written as
\[
\begin{pmatrix} y_{it} \\ x_{it} \end{pmatrix}
= \begin{pmatrix} \alpha & \beta\rho \\ 0 & \rho \end{pmatrix}
  \begin{pmatrix} y_{i,t-1} \\ x_{i,t-1} \end{pmatrix}
+ \begin{pmatrix} (1+\beta\tau)\eta_i \\ \tau\eta_i \end{pmatrix}
+ \begin{pmatrix} (1+\theta\beta)v_{it} + \beta e_{it} \\ \theta v_{it} + e_{it} \end{pmatrix},
\]
which we rewrite as
\[
\mathbf{y}_{it} = \boldsymbol{\Phi}\mathbf{y}_{i,t-1} + \boldsymbol{\eta}_i + \mathbf{v}_{it}. \qquad (i = 1, \ldots, N; \; t = 1, \ldots, T) \tag{12}
\]

Assume that all eigenvalues of $\boldsymbol{\Phi}$ fall inside the unit circle, which implies that the process $\mathbf{y}_{it}$ is stable. For the initial conditions, we assume
\[
\mathbf{y}_{i0} = \boldsymbol{\Xi}\boldsymbol{\mu}_i + \mathbf{w}_{i0}, \tag{13}
\]
where $\boldsymbol{\mu}_i = (\mathbf{I}_2 - \boldsymbol{\Phi})^{-1}\boldsymbol{\eta}_i$ and $\mathbf{w}_{i0} = \sum_{j=0}^{\infty}\boldsymbol{\Phi}^j\mathbf{v}_{i,-j}$. From (12), (13), and
\[
\boldsymbol{\Phi}^t = \begin{pmatrix} \alpha^t & \beta\left(\sum_{j=1}^{t}\alpha^{t-j}\rho^j\right) \\ 0 & \rho^t \end{pmatrix},
\]
we have
\[
\mathbf{y}_{it} = \mathbf{H}_t\boldsymbol{\mu}_i + \mathbf{w}_{it}, \tag{14}
\]
where
\[
\mathbf{H}_t = \mathbf{I}_2 - (\mathbf{I}_2 - \boldsymbol{\Xi})\boldsymbol{\Phi}^t
= \begin{pmatrix} \mathbf{h}_t^{y\prime} \\ \mathbf{h}_t^{x\prime} \end{pmatrix}
= \begin{pmatrix}
1 - (1-\xi_{11})\alpha^t & \rho^t\xi_{12} + \beta(\xi_{11}-1)\left(\sum_{j=1}^{t}\alpha^{t-j}\rho^j\right) \\
\alpha^t\xi_{21} & 1 - (1-\xi_{22})\rho^t + \beta\xi_{21}\left(\sum_{j=1}^{t}\alpha^{t-j}\rho^j\right)
\end{pmatrix}
\]
and $\mathbf{w}_{it} = \sum_{j=0}^{\infty}\boldsymbol{\Phi}^j\mathbf{v}_{i,t-j} = (w_{it}^y \;\; w_{it}^x)'$, with $\xi_{ij}$ being the $(i,j)$ element of $\boldsymbol{\Xi}$. Note that $\mathbf{y}_{it}$ is mean stationary only when $\boldsymbol{\Xi} = \mathbf{I}_2$. Each component in (14) can be written as
\[
y_{it} = (\mathbf{h}_t^{y})'\boldsymbol{\mu}_i + w_{it}^y, \quad \text{and} \quad
x_{it} = (\mathbf{h}_t^{x})'\boldsymbol{\mu}_i + w_{it}^x.
\]
We now consider the GMM estimation of model (11). Let us consider the following moment conditions:
\[
E(y_{is}\Delta v_{it}) = 0. \qquad (s = 0, \ldots, t-2; \; t = 2, \ldots, T)
\]
Then, the Jacobian of the moment conditions is
\[
-\frac{\partial E(y_{is}\Delta v_{it})}{\partial\alpha} = E(y_{is}\Delta y_{i,t-1}) = (\mathbf{h}_s^{y})'\boldsymbol{\Sigma}_\mu\Delta\mathbf{h}_{t-1}^{y} + E(w_{is}^y\Delta w_{i,t-1}^y), \quad \text{and} \tag{15}
\]
\[
-\frac{\partial E(y_{is}\Delta v_{it})}{\partial\beta} = E(y_{is}\Delta x_{it}) = (\mathbf{h}_s^{y})'\boldsymbol{\Sigma}_\mu\Delta\mathbf{h}_{t}^{x} + E(w_{is}^y\Delta w_{it}^x), \tag{16}
\]
where $\boldsymbol{\Sigma}_\mu = Var(\boldsymbol{\mu}_i)$. From (15) and (16), it is easy to see that both the initial conditions and the degree of persistency affect the strength of the instruments. Note that the first terms on the right-hand side of (15) and (16) disappear only when $\boldsymbol{\Xi} = \mathbf{I}_2$. Hence, depending on the underlying parameters, the instruments may be strong in some cases and weak in others, although we do not know this in advance. The same result applies to the moment conditions $E(x_{is}\Delta v_{it}) = 0$, $(s = 0, \ldots, t-2; \; t = 2, \ldots, T)$. While it may be possible to derive the asymptotic distribution under a local to unity system as in the previous section, we do not pursue this direction since we can expect similar results. Instead, we conduct a large-scale Monte Carlo simulation and show in the next section that results similar to those for the AR(1) case are also obtained for model (11).

4. Simulation studies

In this section, we conduct a simulation to assess whether the derived asymptotic properties approximate the behaviour of the FD-GMM estimator well. We also investigate other estimators that might mitigate the weak instruments problem. We first consider an AR(1) model and then proceed to a model with an endogenous variable.

4.1. AR(1) model

Design. We consider the panel AR(1) model discussed in Section 3. The simulation design is identical to that of Hayakawa (2009) except for the specification of the variance of the idiosyncratic term in the initial conditions. The data are generated as
\[
y_{it} = \alpha_0 y_{i,t-1} + \eta_i + v_{it}, \qquad (i = 1, \ldots, N; \; t = 1, \ldots, T), \quad \text{and} \quad
y_{i0} = \frac{\eta_i}{1-\bar{\alpha}} + \varepsilon_{i0},
\]
where $\bar{\alpha}$ and $\delta$ in (2) are related through $\delta = (1-\alpha_0)/(1-\bar{\alpha})$. We set $\alpha_0 = 0.95$, $T = 8$, and $N = 300$. We focus on the case where the persistency of the data is strong. The simulation results in Hayakawa (2009) show that when the persistency of the data is not strong, the GMM estimators perform well irrespective of the value of $\bar{\alpha}$. Note that this finding is consistent with the theoretical implication that when the persistency of the data is not strong, the GMM estimator works well. We generate $v_{it} \sim iidN(0,1)$, $\eta_i \sim iidN(0, \sigma_\eta^2)$ with $\sigma_\eta^2 = \{0.2, 1, 5\}$, and $\varepsilon_{i0} \sim N(0, \lambda/(1-\alpha_0^2))$ with $\lambda = \{1, 5, 10, 50\}$. We move $\bar{\alpha}$ from 0.9000 to 0.9975 in steps of 0.0025. Note that the data are mean stationary when $\bar{\alpha} = 0.95$ and mean nonstationary when $\bar{\alpha} \neq 0.95$. The number of replications for each design is 1,000. The number of total designs is $3(\lambda) \times 3(\sigma_\eta^2) \times 40(\bar{\alpha}) = 360$.

We consider six estimators. The first is the FD-GMM estimator given by (5), which we denote as $\widehat{\alpha}_{FD2}$. To investigate the effect of the number of instruments, we also compute the FD-GMM estimator using a smaller number of instruments $\mathbf{Z}_i^{L1} = \mathrm{diag}(y_{i0}, y_{i1}, \ldots, y_{i,T-2})$, which is denoted as $\widehat{\alpha}_{FD1}$. The moment conditions for $\widehat{\alpha}_{FD2}$ and $\widehat{\alpha}_{FD1}$ are denoted as 'FD2' and 'FD1', respectively. These two estimators are based on the FD model. Another model is the one in forward orthogonal deviations. Since the GMM estimators for the FD model and for the model in forward orthogonal deviations are numerically identical if the instruments $\mathbf{Z}_i^{L2}$ are used, we consider a GMM estimator for the model in forward orthogonal deviations where the instruments $\mathbf{Z}_i^{L1}$ are used. We denote this estimator as $\widehat{\alpha}_{FOD1}$. The fourth and fifth estimators are the LDIV estimators of Hahn, Hausman and Kuersteiner (2007). The infeasible LDIV estimator is defined as
\[
\widehat{\alpha}_{LDIV} = \frac{\left(\sum_{i=1}^{N}\dot{y}_{i,-1}\dot{\mathbf{z}}_i'\right)\left(\sum_{i=1}^{N}\dot{\mathbf{z}}_i\dot{\mathbf{z}}_i'\right)^{+}\left(\sum_{i=1}^{N}\dot{\mathbf{z}}_i\dot{y}_i\right)}{\left(\sum_{i=1}^{N}\dot{y}_{i,-1}\dot{\mathbf{z}}_i'\right)\left(\sum_{i=1}^{N}\dot{\mathbf{z}}_i\dot{\mathbf{z}}_i'\right)^{+}\left(\sum_{i=1}^{N}\dot{\mathbf{z}}_i\dot{y}_{i,-1}\right)},
\]
where $\dot{y}_i = y_{iT} - y_{i1}$, $\dot{y}_{i,-1} = y_{i,T-1} - y_{i0}$, $\dot{v}_i = v_{iT} - v_{i1}$, $\dot{\mathbf{z}}_i = (y_{i0}, u_{i2}, \ldots, u_{i,T-1})'$, and $u_{it} = \eta_i + v_{it}$. Since $u_{it}$ is unobservable, a preliminary initial estimate is needed. We denote the LDIV estimator obtained from the residuals of $\widehat{\alpha}_{FD2}$ as $\widehat{\alpha}_{LDIV0}$ and the three-times-iterated LDIV estimator as $\widehat{\alpha}_{LDIV3}$. Finally, we consider the BCWG estimator of Bun and Carree (2005), which we denote as $\widehat{\alpha}_{BCWG}$. The BCWG estimator is expected to perform well for all types of initial conditions: the numerator of its asymptotic bias does not depend on the assumption on the initial conditions, and although the denominator does, the denominator can be consistently estimated by its sample analogue, so the BCWG estimator is valid for all initial conditions.

Results. To save space, we only report the results with $\lambda = 1$. Complete results are provided in a supplementary appendix, which is available upon request. The simulation results for the means, standard deviations, and empirical sizes at the 5% significance level of these estimators are summarized in Figures 1 to 3. In these figures, the horizontal axis is $\bar{\alpha}$, ranging from 0.9000 to 0.9975, and the three lines correspond to $\sigma_\eta^2 = 0.2, 1, 5$. Figure 1 shows that the GMM estimators $\widehat{\alpha}_{FD2}$, $\widehat{\alpha}_{FD1}$, and $\widehat{\alpha}_{FOD1}$ perform poorly when $\bar{\alpha}$ is close to $\alpha_0 = 0.95$ and tend to perform better as $\bar{\alpha}$ moves away from $\alpha_0$. This behaviour is consistent with the theoretical results. Moreover, we find that reducing the number of instruments has a large effect on behaviour (e.g. compare Figure 1(a) with Figure 1(b)). We also find that $\widehat{\alpha}_{FD1}$ and $\widehat{\alpha}_{FOD1}$ behave similarly. In terms of inference, Figure 3 shows that substantial size distortions exist for the GMM estimators when $\bar{\alpha}$ is close to 0.95. An exception is $\widehat{\alpha}_{FOD1}$ (Figure 3(c)). However, Figure 2(c) shows that $\widehat{\alpha}_{FOD1}$ has a large dispersion and its confidence interval is quite wide, which might be uninformative for inference purposes. Moreover, the LDIV estimators also have substantial size distortions despite their relatively small bias, because of the two-step nature of the LDIV estimator, where estimated residuals are used to construct the moment conditions. Unreported simulation results also reveal that the size distortions are very small for the infeasible LDIV estimator where the true residuals are used. For the effects of mean nonstationarity, we find that the size distortions become smaller as $\bar{\alpha}$ moves away from $\alpha_0$. This finding indicates that the normal approximation to the asymptotic distribution is accurate when the degree of mean nonstationarity is large.

Overall, we can say that the derived asymptotic results approximate the finite sample behaviour reasonably well and that the behaviour of the GMM estimators crucially depends on the assumption on the initial conditions. As a remedy for this problem, we explore other estimators that might mitigate the dependence on the initial conditions. Figures 1-3 show that the LDIV estimator is much less affected by the initial conditions than the GMM estimators. Although $\widehat{\alpha}_{LDIV0}$ is slightly affected by the initial conditions because of the initially poor estimate $\widehat{\alpha}_{FD2}$, the three-times-iterated $\widehat{\alpha}_{LDIV3}$ performs fairly well. Further, $\widehat{\alpha}_{BCWG}$ is completely unaffected by the initial conditions. This finding may not be surprising since $\widehat{\alpha}_{BCWG}$ does not use instrumental variables. However, $\widehat{\alpha}_{BCWG}$ is only applicable to models with strictly exogenous variables, which is restrictive in practice.
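For concreteness, a minimal sketch of the Section 4.1 data-generating process is given below. Gaussian draws are used as in the design; the function name `simulate_ar1_panel` and the default parameter values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_ar1_panel(N=300, T=8, alpha0=0.95, alpha_bar=0.90,
                       sigma_eta2=1.0, lam=1.0, seed=0):
    # y_{i0} = η_i/(1-ᾱ) + ε_{i0} with ε_{i0} ~ N(0, λ/(1-α0²)), then
    # y_{it} = α0 y_{i,t-1} + η_i + v_{it} with v_{it} ~ N(0, 1).
    # The panel is mean stationary when ᾱ = α0 and mean nonstationary otherwise.
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, np.sqrt(sigma_eta2), N)
    y = np.empty((N, T + 1))
    y[:, 0] = eta / (1.0 - alpha_bar) + rng.normal(0.0, np.sqrt(lam / (1.0 - alpha0**2)), N)
    for t in range(1, T + 1):
        y[:, t] = alpha0 * y[:, t - 1] + eta + rng.normal(0.0, 1.0, N)
    return y  # shape (N, T+1): columns y_{i0}, ..., y_{iT}
```

Feeding such a panel to an FD-GMM routine (for example, the `fd_gmm_alpha` sketch in Section 2.2) should reproduce the qualitative pattern in Figure 1: a large bias when $\bar{\alpha}$ is near $\alpha_0 = 0.95$ and a much smaller one when $\bar{\alpha}$ is far from it.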

[Figure 1: Means of $\widehat{\alpha}_{FD2}$, $\widehat{\alpha}_{FD1}$, $\widehat{\alpha}_{FOD1}$, $\widehat{\alpha}_{LDIV0}$, $\widehat{\alpha}_{LDIV3}$, and $\widehat{\alpha}_{BCWG}$ (panels (a)-(f)) as functions of $\bar{\alpha}$; $\alpha_0 = 0.95$, $\lambda = 1$; the three lines correspond to $\sigma_\eta^2 = 0.2, 1, 5$.]

[Figure 2: Standard deviations of the same six estimators (panels (a)-(f)) as functions of $\bar{\alpha}$; $\alpha_0 = 0.95$, $\lambda = 1$.]

[Figure 3: Empirical sizes at the 5% level of $\widehat{\alpha}_{FD2}$, $\widehat{\alpha}_{FD1}$, $\widehat{\alpha}_{FOD1}$, $\widehat{\alpha}_{LDIV0}$, and $\widehat{\alpha}_{LDIV3}$ (panels (a)-(e)) as functions of $\bar{\alpha}$; $\alpha_0 = 0.95$, $\lambda = 1$.]
4.2. Model with an endogenous variable

In this subsection, we extend the AR(1) model to include an endogenous variable and investigate whether results similar to those in the AR(1) case are obtained.

Design. We consider the following model:
\[
y_{it} = \alpha_0 y_{i,t-1} + \beta_0 x_{it} + \eta_i + v_{it}, \quad \text{and} \quad
x_{it} = \rho_0 x_{i,t-1} + \tau\eta_i + \theta v_{it} + e_{it},
\]
where $\eta_i \sim iid(0, \sigma_\eta^2)$, $v_{it} \sim iid(0, \sigma_v^2)$, and $e_{it} \sim iid(0, \sigma_e^2)$. The data are generated as
\[
\begin{pmatrix} y_{it} \\ x_{it} \end{pmatrix}
= \begin{pmatrix} \alpha_0 & \beta_0\rho_0 \\ 0 & \rho_0 \end{pmatrix}
  \begin{pmatrix} y_{i,t-1} \\ x_{i,t-1} \end{pmatrix}
+ \begin{pmatrix} (1+\beta_0\tau)\eta_i \\ \tau\eta_i \end{pmatrix}
+ \begin{pmatrix} (1+\theta\beta_0)v_{it} + \beta_0 e_{it} \\ \theta v_{it} + e_{it} \end{pmatrix}
\]
with the initial conditions
\[
\begin{pmatrix} y_{i0} \\ x_{i0} \end{pmatrix}
= \begin{pmatrix} 1-\bar{\alpha} & -\beta_0\bar{\rho} \\ 0 & 1-\bar{\rho} \end{pmatrix}^{-1}
  \begin{pmatrix} (1+\beta_0\tau)\eta_i \\ \tau\eta_i \end{pmatrix}
+ \mathbf{w}_{i0}, \quad \text{and} \quad
\mathbf{w}_{i0} = \sum_{j=0}^{10}
  \begin{pmatrix} \alpha_0 & \beta_0\rho_0 \\ 0 & \rho_0 \end{pmatrix}^{j}
  \begin{pmatrix} (1+\theta\beta_0)v_{ij} + \beta_0 e_{ij} \\ \theta v_{ij} + e_{ij} \end{pmatrix}, \tag{17}
\]
where $v_{ij} \sim iid(0, \sigma_v^2)$ and $e_{ij} \sim iid(0, \sigma_e^2)$. Note that when $\bar{\alpha} = \alpha_0$ and $\bar{\rho} = \rho_0$, $y_{it}$ and $x_{it}$ are mean stationary. For the parameter values, we consider $(\alpha_0, \rho_0) = (0.4, 0.8), (0.8, 0.4)$, $\beta_0 = 1-\alpha_0$, $\theta = \{0, 0.5\}$, and $\tau = 0.25$. When $\theta = 0.5$, $x_{it}$ is an endogenous variable, while when $\theta = 0$, $x_{it}$ is a strictly exogenous variable. For the values of $\sigma_\eta^2$ and $\sigma_e^2$, we follow the approach of Bun and Sarafidis (2013):
\[
\sigma_\eta^2 = \frac{c_v^2\sigma_v^2\,VR}{\zeta^2}, \qquad
\sigma_e^2 = \frac{SNR + 1 - c_v^2}{c_e^2},
\]
\[
c_v^2 = \frac{1+\alpha\rho}{(1-\alpha^2)(1-\rho^2)(1-\alpha\rho)}\left[(1+\beta\theta)^2 + \rho^2 - \frac{2\rho(\alpha+\rho)(1+\beta\theta)}{1+\alpha\rho}\right], \qquad
c_e^2 = \frac{(1+\alpha\rho)\beta^2}{(1-\alpha^2)(1-\rho^2)(1-\alpha\rho)}, \quad \text{and} \quad
\zeta = \frac{\beta\tau + 1 - \rho}{(1-\alpha)(1-\rho)},
\]
where $VR$ stands for the 'variance ratio' and $SNR$ stands for the 'signal-to-noise ratio': $SNR = (Var(y_{it}\mid\eta_i) - \sigma_v^2)/\sigma_v^2$. We consider $VR = \{1, 100\}$ and $SNR = \{4, 9\}$ with $\sigma_v^2 = 1$. Such a large value of $VR$ is also used in Bun and Sarafidis (2013). Note that if $\theta = 0.5$, $\sigma_\eta^2 = 0.3313$ when $(\alpha_0, \rho_0, VR) = (0.4, 0.8, 1)$, $\sigma_\eta^2 = 33.13$ when $(\alpha_0, \rho_0, VR) = (0.4, 0.8, 100)$, $\sigma_\eta^2 = 0.1247$ when $(\alpha_0, \rho_0, VR) = (0.8, 0.4, 1)$, and $\sigma_\eta^2 = 12.47$ when $(\alpha_0, \rho_0, VR) = (0.8, 0.4, 100)$. $\bar{\alpha}$ and $\bar{\rho}$ move over the ranges $(\alpha_0 - 0.0525, \alpha_0 + 0.0475)$ and $(\rho_0 - 0.0525, \rho_0 + 0.0475)$ in steps of 0.005, respectively. For a given $(\alpha_0, \rho_0, \sigma_\eta^2)$, we thus have $20^2 = 400$ cases, so that one 3D plot requires $400 \times 1000 = 400{,}000$ replications for each $(\alpha_0, \rho_0, \sigma_\eta^2)$. For the sample size, we only consider the case of $T = 8$, $N = 200$ to save space. The number of replications is 1,000.

We consider six estimators. As in the AR(1) case, we consider three GMM estimators. The first two are the FD-GMM estimators using the moment conditions $E(y_{is}\Delta v_{it}) = 0$, $E(x_{is}\Delta v_{it}) = 0$ $(s = 0, \ldots, t-2;\; t = 2, \ldots, T)$, and $E(y_{i,t-2}\Delta v_{it}) = 0$, $E(x_{i,t-2}\Delta v_{it}) = 0$ $(t = 2, \ldots, T)$, respectively. In the computation, a one-step optimal weighting matrix is used. These two estimators are denoted as $(\widehat{\alpha}_{FD2}, \widehat{\beta}_{FD2})$ and $(\widehat{\alpha}_{FD1}, \widehat{\beta}_{FD1})$. The third is a GMM estimator for the equations in forward orthogonal deviations with moment conditions similar to those for $\widehat{\alpha}_{FD1}$, which is denoted as $(\widehat{\alpha}_{FOD1}, \widehat{\beta}_{FOD1})$. Another two estimators under consideration are the LDIV estimators where $(y_{i0}, x_{i0}, u_{i2}, \ldots, u_{i,T-1})'$ are used as the instruments. We denote the LDIV estimator using the residuals from $(\widehat{\alpha}_{FD2}, \widehat{\beta}_{FD2})$ as $(\widehat{\alpha}_{LDIV0}, \widehat{\beta}_{LDIV0})$, and the three-times-iterated estimator as $(\widehat{\alpha}_{LDIV3}, \widehat{\beta}_{LDIV3})$. The BCWG estimator is denoted as $(\widehat{\alpha}_{BCWG}, \widehat{\beta}_{BCWG})$. Note that the BCWG estimator is consistent only when $x_{it}$ is strictly exogenous, i.e., $\theta = 0$.
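As a numerical check of this calibration, the following sketch (our own helper `calibrate`, written under the reading of the formulas above; the expression for $\sigma_e^2$ in particular follows that reading) reproduces the reported value $\sigma_\eta^2 \approx 0.33$ for $(\alpha_0, \rho_0, VR) = (0.4, 0.8, 1)$ with $\theta = 0.5$.

```python
def calibrate(alpha, rho, beta, theta, tau, VR, SNR, sigma_v2=1.0):
    # c_v², c_e², ζ as defined above; returns (σ_η², σ_e²).
    denom = (1 - alpha**2) * (1 - rho**2) * (1 - alpha * rho)
    c_v2 = (1 + alpha * rho) / denom * (
        (1 + beta * theta)**2 + rho**2
        - 2 * rho * (alpha + rho) * (1 + beta * theta) / (1 + alpha * rho))
    c_e2 = (1 + alpha * rho) * beta**2 / denom
    zeta = (beta * tau + 1 - rho) / ((1 - alpha) * (1 - rho))
    sigma_eta2 = c_v2 * sigma_v2 * VR / zeta**2
    sigma_e2 = (SNR + 1 - c_v2) / c_e2
    return sigma_eta2, sigma_e2

# β0 = 1 - α0 = 0.6; prints σ_η² ≈ 0.33 for this configuration.
print(calibrate(alpha=0.4, rho=0.8, beta=0.6, theta=0.5, tau=0.25, VR=1, SNR=4))
```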

Results. To save space, we only report the means of the estimators for $(\alpha_0, \rho_0) = (0.8, 0.4)$, $VR = \{1, 100\}$, $\theta = 0.5$, and $SNR = 4$. Complete results are again provided in a supplement, which is available upon request. The simulation results are summarized in Figures 4 to 7. Similar comments to the AR(1) case apply to the model with an endogenous variable. From these figures, including those reported in the supplement, we find that, as in the AR(1) case, the GMM estimators are very sensitive to the initial conditions and the variance ratio $\sigma_\eta^2/\sigma_v^2$. When $\bar{\alpha}$ is close to $\alpha_0$, the GMM estimators generally have large biases. Interestingly, the mean-stationary case tends to be the worst when $VR = 100$. We also observe that the GMM estimator using a small number of instruments is more affected by the initial conditions than that using all the instruments. This finding suggests that discarding the long lagged variables is not always a good strategy to mitigate the bias. Furthermore, we find that $\bar{\rho}$ does not affect the bias of the GMM estimators to a large degree. The LDIV estimators have a very small bias and are much more robust to the initial conditions than the GMM estimators because their instruments are strong enough to yield almost no bias regardless of the initial conditions (Hahn, Hausman and Kuersteiner, 2007). Thus, while the simulation design is somewhat limited, the LDIV estimator can be an alternative to the FD-GMM estimator. The BCWG estimator has a very small bias and is unaffected by the initial conditions when $x_{it}$ is strictly exogenous ($\theta = 0$), while it is severely biased when $x_{it}$ is endogenous ($\theta = 0.5$), as expected. For the effects of the signal-to-noise ratio, we find that overall performance improves when $SNR$ increases from 4 to 9, as reported in the supplementary appendix.

[Figure 4: Means of $\widehat{\alpha}_{FD2}$, $\widehat{\alpha}_{FD1}$, $\widehat{\alpha}_{FOD1}$, $\widehat{\alpha}_{LDIV0}$, $\widehat{\alpha}_{LDIV3}$, and $\widehat{\alpha}_{BC}$ (panels (a)-(f)) plotted against $\bar{\alpha}$ and $\bar{\rho}$; $\alpha_0 = 0.8$, $\rho_0 = 0.4$, $VR = 1$, $SNR = 4$, $\theta = 0.5$.]

[Figure 5: Means of $\widehat{\beta}_{FD2}$, $\widehat{\beta}_{FD1}$, $\widehat{\beta}_{FOD1}$, $\widehat{\beta}_{LDIV0}$, $\widehat{\beta}_{LDIV3}$, and $\widehat{\beta}_{BC}$ (panels (a)-(f)) plotted against $\bar{\alpha}$ and $\bar{\rho}$; $\beta_0 = 0.2$, $\rho_0 = 0.4$, $VR = 1$, $SNR = 4$, $\theta = 0.5$.]

[Figure 6: Same as Figure 4 with $VR = 100$.]

[Figure 7: Same as Figure 5 with $VR = 100$.]
4.3. Testing for mean stationarity Empirical studies often test for mean stationarity by using the difference Sargan test. However, its finite sample properties are not well studied in the literature. The exceptions we are aware of are Blundell et al. (2000) and Bun and Sarafidis (2013). Hence, we investigate the size and power properties of the difference Sargan test. For this, we consider an AR(1) model with the same data-generating processes as in Section 4.1. We consider the cases with T = 8, N = 300, α0 = 0.2, 0.95, and λ = 1. We compute the sizes and powers of standard Sargan tests and difference Sargan tests as well as the means of the two-step system GMM estimators (denoted 2step 2step 2step as α bSY bSY bSY S2 and α S1 ). α S2 uses the moment conditions E(yis ∆vit ) = 0 for s = 0, ..., t − 2; t = 2, ..., T and 2step E[∆yi,t−1 (ηi + vit )] = 0 for t = 2, ..., T . These moment conditions are denoted as ‘SYS2’. α bSY S1 uses moment conditions E(yi,t−2 ∆vit ) = 0 and E[∆yi,t−1 (ηi + vit )] = 0 for t = 2, ..., T . These moment conditions are denoted as ‘SYS1’. Figure 8 depicts the mean of the two-step system GMM estimators, while Figures 9 and 10 depict the rejection frequencies of the Sargan and difference Sargan tests at a significance level of 0.05. Before investigating the size/power properties of the standard and difference Sargan tests, we examine the behaviour of the system GMM estimators that underlies the tests. Figures 8(a) and 8(b) show that the bias of the system GMM estimator is small regardless of the value of α ¯ and the variance ratio when α0 = 0.2. This result is surprising since the system GMM estimator is consistent only when α ¯ = α0 . This finding might come from the special structure of the system GMM estimator, which it is a linear combination of the FD- and level GMM estimators. Specifically, Blundell et al. (2000) show that the (one-step) system GMM estimator can be written as α bSY S = γ α bF D + (1 − γ)b αLEV , where α bSY S , α bF D , α bLEV are the system, FD-, and level GMM estimators, respectively, and γ is a weight determined by the data. Hayakawa (2007) demonstrates that α bF D has a negative bias, while α bLEV has a positive bias; consequently, these two biases with opposite directions are cancelled out in the system GMM estimator α bSY S . This finding might explain the unexpected behaviour of the system GMM estimator. However, for the case of α0 = 0.95, we find that the system GMM estimator has large biases even for α ¯ = 0.95 where the system GMM estimator is consistent. For the case of α ¯ < α0 , the bias of the system GMM estimator is large, which is a natural consequence of inconsistency. However, in the region α ¯ > α0 , surprisingly, the bias reduces as α ¯ approaches one despite the system GMM being inconsistent. This finding may result from the special structure of the system GMM estimator explained above. Since the weight γ approaches one as the instruments strengthen (Blundell et al., 2000), and noting that the instruments for α bF D are strong in the region α ¯ > α0 (see Figure 2 or the discussion of Hayakawa (2009)), we may conclude that γ is very close to one and the contribution of inconsistent α bLEV becomes quite minor when α ¯ > α0 . We now consider the size/power properties of the standard Sargan and difference Sargan tests. Since the consistency of the FD-GMM estimators does not depend on the initial conditions, the lines on Figures 9(a), 9(b) 10(a), and 10(b) should be close to the 0.05 significance level for all values of α ¯ . 
However, since the system GMM estimator is consistent only when ᾱ = α0, the lines in Figures 9(c), 9(d), 10(c), and 10(d) should be close to 0.05 at ᾱ = α0 and should move toward one as ᾱ moves away from α0. Figures 9(a), 9(b), 10(a), and 10(b) show that the sizes of the Sargan test based on the FD-GMM estimator are similar regardless of the value of ᾱ and the variance ratio. However, the Sargan test based on α̂_FD2, which uses more instruments than α̂_FD1, suffers from a large size distortion. For the Sargan test of the system GMM estimator (Figures 9(c), 9(d), 10(c), and 10(d)), we find a large size distortion if many instruments are used, for both α0 = 0.2 and 0.95. Moreover, we find that when the variance ratio is small (ση²/σv² = 0.2), the power is surprisingly low. For the case of α0 = 0.2, this might not be problematic since the bias of the system GMM estimator is small. However, when α0 = 0.95, this is not the case: Figures 8(c) and 8(d) show that the system GMM estimator is biased even for a small variance ratio ση²/σv² = 0.2. Hence, the low power of the Sargan test is a serious problem. By contrast, when the variance ratio is large (ση²/σv² = 5), power increases as ᾱ moves away from α0, which is an expected (and desirable) property.

Finally, we consider the difference Sargan test. Since the moment conditions used in the system GMM estimator but not in the FD-GMM estimator (that is, the moment conditions for the equation in levels) hold only when ᾱ = α0, the lines in Figures 9(e), 9(f), 10(e), and 10(f) should be close to 0.05 at ᾱ = α0 and should move toward one as ᾱ moves away from α0. The results are similar to those for the standard Sargan test of the system GMM estimator. As the number of instruments decreases, the size properties improve. When the variance ratio is small, the power of the test is low, but power rises as the variance ratio grows.
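As a point of reference for how these test statistics are computed in practice, the following minimal sketch (not the authors' code; the inputs J_sys, J_fd and the instrument counts are hypothetical) obtains the standard Sargan/Hansen p-value and the difference Sargan statistic from the two-step J statistics of the system and FD-GMM fits.

```python
# Minimal sketch of the standard and difference Sargan/Hansen tests from two GMM fits.
# J_sys, J_fd: two-step J statistics of the system and first-difference GMM estimators;
# m_sys, m_fd: their numbers of moment conditions; n_params: estimated parameters.
from scipy.stats import chi2

def sargan_pvalue(J, n_moments, n_params):
    """p-value of the standard Sargan/Hansen overidentification test."""
    df = n_moments - n_params
    return df, chi2.sf(J, df)

def difference_sargan(J_sys, m_sys, J_fd, m_fd):
    """Difference Sargan test of the extra (level) moment conditions."""
    stat = J_sys - J_fd          # drop in the J statistic when the level moments are removed
    df = m_sys - m_fd            # number of extra moment conditions being tested
    return stat, df, chi2.sf(stat, df)

# Example with made-up numbers: 40 system moments vs. 28 FD moments, one parameter.
print(sargan_pvalue(J=35.0, n_moments=28, n_params=1))
print(difference_sargan(J_sys=55.0, m_sys=40, J_fd=35.0, m_fd=28))
```

The difference Sargan test thus simply compares the increase in the J statistic with a chi-square distribution whose degrees of freedom equal the number of additional level moment conditions.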

[Figure 8: Mean of the two-step system GMM estimators plotted against ᾱ, for variance ratios ση²/σv² = 0.2, 1, 5. Panels: (a) α̂_SYS2, α0 = 0.2, λ = 1; (b) α̂_SYS1, α0 = 0.2, λ = 1; (c) α̂_SYS2, α0 = 0.95, λ = 1; (d) α̂_SYS1, α0 = 0.95, λ = 1.]

[Figure 9: Rejection frequencies at the 0.05 level plotted against ᾱ (α0 = 0.2, λ = 1), for variance ratios ση²/σv² = 0.2, 1, 5. Panels: (a) Sargan test (FD2); (b) Sargan test (FD1); (c) Sargan test (SYS2); (d) Sargan test (SYS1); (e) difference Sargan test (FD2 & SYS2); (f) difference Sargan test (FD1 & SYS1).]

[Figure 10: Rejection frequencies at the 0.05 level plotted against ᾱ (α0 = 0.95, λ = 1), for variance ratios ση²/σv² = 0.2, 1, 5. Panels: (a) Sargan test (FD2); (b) Sargan test (FD1); (c) Sargan test (SYS2); (d) Sargan test (SYS1); (e) difference Sargan test (FD2 & SYS2); (f) difference Sargan test (FD1 & SYS1).]

5. Empirical illustration

In this section, we provide an empirical example to demonstrate the theoretical results. One of the important theoretical implications is that the FD-GMM estimator remains consistent even when persistency is strong if mean nonstationarity is present. We now show that this actually happens by using a real dataset. Specifically, following Arellano (2003a), we estimate a VAR(1) model by using Spanish firm panel data covering 1983-1990 on 738 manufacturing companies (T = 8, N = 738):
\[
y_{it} = A y_{i,t-1} + c_i + \varepsilon_{it}, \tag{18}
\]
where
\[
y_{it} = \begin{bmatrix} n_{it} \\ w_{it} \end{bmatrix}, \quad
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
c_i = \begin{bmatrix} \eta_i \\ \kappa_i \end{bmatrix}, \quad
\varepsilon_{it} = \begin{bmatrix} v_{it} \\ e_{it} \end{bmatrix},
\]
and n_it and w_it denote the logs of employment and wages. All variables are in deviations from the period-specific cross-section means. We also define
\[
\begin{bmatrix} \mu_i \\ \lambda_i \end{bmatrix} = (I_2 - A)^{-1} c_i.
\]

Table 2 provides the estimation results for model (18), which is estimated by one- and two-step FD-GMM estimators with various numbers of instruments. From the Sargan test, we find that all the moment conditions are valid. Further, from the difference Sargan test, which checks for mean stationarity, we find that mean stationarity is strongly rejected for both equations regardless of the number of instruments. We also provide estimates of the variances of η_i, v_it, κ_i, e_it, µ_i, and λ_i, denoted σ̂η², σ̂v², σ̂κ², σ̂e², σ̂µ², and σ̂λ², together with the associated variance ratios. As shown in the previous sections, a large variance ratio is an important factor determining the strength of the instruments. The results show that the variance ratio σ̂η²/σ̂v² ranges from 1.11 to 5.50 (the average is 2.81), while σ̂κ²/σ̂e² ranges from 3.38 to 11.90 (the average is 5.82), which seems to be large. In terms of σ̂µ², we find that the estimated σ̂µ²/σ̂v² for the employment equation is substantially large (around 135). In the theory part, we implicitly assumed that σµ² → ∞ as αN → 1; given the very large values of σ̂µ²/σ̂v², this assumption seems to be justified. This finding also implies that the assumptions in Kruiniger (2009) do not seem to be compatible with the data: although Kruiniger (2009) assumes ση² → 0 as αN → 1, the values of σ̂η²/σ̂v² make it difficult to justify that ση² → 0 is satisfied. Of course, it should be stressed that the results of Kruiniger (2009) may be more useful than ours in other cases; the usefulness depends on the data to be analysed.

For coefficient a11, which exhibits potentially strong persistency, we find that the estimates are similar, with a few exceptions, regardless of the number of instruments. However, the standard errors shrink as the number of instruments increases, which is not surprising. From the simulation results presented in Section 4, we know that if we change the number of instruments, the estimates (sometimes) change substantially, becoming severely biased under (near) mean stationarity but remaining almost unbiased under mean nonstationarity when the variance ratio is large. For example, by comparing Figure 1(a) with Figure 1(b), we find that the two estimates differ for ᾱ = 0.95 but are similar and almost unbiased for, say, ᾱ = 0.94 when ση²/σv² = 5. Given this result, we may conclude that the coefficient a11 is precisely estimated thanks to the strong instruments coming from mean nonstationarity and a large variance ratio. A Monte Carlo simulation whose design is loosely calibrated to the estimated results, given below, also supports this conclusion.

Furthermore, we may obtain another insight from the empirical results. In empirical studies, researchers often use only a subset of the instruments in each period to mitigate the finite sample bias caused by using many instruments. However, as this example illustrates, reducing the number of instruments is not always effective at reducing the finite sample bias, since the estimates are similar regardless of the number of instruments; it does, however, cause an unnecessary efficiency loss, which implies a wide confidence interval.
Also remember that we have a simulation result showing that using a smaller number of instruments may cause a larger finite sample bias than using more instruments under mean stationarity (see Section 4). From these results, we obtain two important implications for practitioners. First, although the FD-GMM estimator faces the weak instruments problem and is severely biased under mean stationarity, this is not always the case when mean nonstationarity is present: the FD-GMM estimator can precisely estimate the coefficients even when persistency is strong if mean nonstationarity is present and the variance ratio is large. Second, although empirical researchers often use only a subset of the instruments in the estimation to mitigate the finite sample bias, such a strategy is

not always effective. The above example shows that even if we reduce the number of instruments, the estimated coefficients are similar and precisely estimated but the standard errors grow. This finding means that reducing the number of instruments may lead to the unnecessary inflation of standard errors.
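For concreteness, the following minimal sketch (in Python, not the authors' code) illustrates a one-step FD-GMM estimator of the employment equation of a VAR(1) such as (18), with the number of lagged instruments per period capped in the spirit of the DIFj estimators; the data-generating step and all parameter values are purely illustrative.

```python
# Minimal sketch of one-step first-difference GMM for
# y_it = a11*y_{i,t-1} + a12*x_{i,t-1} + eta_i + v_it, using levels dated t-2 and
# earlier (up to max_lags per period) of both y and x as instruments.
import numpy as np
from scipy.linalg import block_diag

def fd_gmm_onestep(y, x, max_lags=2):
    """y, x: (N, T) arrays of levels. Returns the 2-vector (a11_hat, a12_hat)."""
    N, T = y.shape
    n_eq = T - 2                                         # differenced equations per unit
    # One-step weight uses H: 2 on the diagonal, -1 on the first off-diagonals.
    H = 2 * np.eye(n_eq) - np.eye(n_eq, k=1) - np.eye(n_eq, k=-1)
    SZX = SZy = SZHZ = None
    for i in range(N):
        Xi, yi, Zblocks = [], [], []
        for t in range(2, T):
            yi.append(y[i, t] - y[i, t - 1])             # Δy_it
            Xi.append([y[i, t - 1] - y[i, t - 2],        # Δy_{i,t-1}
                       x[i, t - 1] - x[i, t - 2]])       # Δx_{i,t-1}
            lags = range(max(0, t - 1 - max_lags), t - 1)
            zrow = np.array([y[i, s] for s in lags] + [x[i, s] for s in lags])
            Zblocks.append(zrow.reshape(1, -1))          # period-t instrument block
        Zi = block_diag(*Zblocks)                        # GMM-style block-diagonal Z_i
        Xi, yi = np.array(Xi), np.array(yi)
        SZX = Zi.T @ Xi if SZX is None else SZX + Zi.T @ Xi
        SZy = Zi.T @ yi if SZy is None else SZy + Zi.T @ yi
        SZHZ = Zi.T @ H @ Zi if SZHZ is None else SZHZ + Zi.T @ H @ Zi
    W = np.linalg.pinv(SZHZ)                             # one-step weight matrix
    return np.linalg.solve(SZX.T @ W @ SZX, SZX.T @ W @ SZy)

# Purely illustrative data: zero initial values but nonzero effects (mean nonstationary).
rng = np.random.default_rng(0)
N, T, a11, a12 = 500, 8, 0.85, 0.15
eta, kappa = rng.normal(0, 1, N), rng.normal(0, 1, N)
y, x = np.zeros((N, T)), np.zeros((N, T))
for t in range(1, T):
    y[:, t] = a11 * y[:, t - 1] + a12 * x[:, t - 1] + eta + rng.normal(0, 1, N)
    x[:, t] = 0.3 * x[:, t - 1] + kappa + rng.normal(0, 1, N)
print(fd_gmm_onestep(y, x, max_lags=2))
```

The max_lags argument mimics the idea of restricting the instrument set; a two-step version would replace the weight built from H by one built from the first-step residuals.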

Table 2: Estimation results of the VAR(1) model (T = 8, N = 738)

[Table 2 reports, for the one-step and two-step FD-GMM estimators DIF1-DIF6: the coefficient estimates and standard errors of a11 and a12 (employment equation) and of a21 and a22 (wage equation); the variance estimates σ̂η², σ̂µ², σ̂v² (employment) and σ̂κ², σ̂λ², σ̂e² (wage) with the associated variance ratios σ̂η²/σ̂v², σ̂µ²/σ̂v², σ̂κ²/σ̂e², and σ̂λ²/σ̂e²; and the Sargan and difference Sargan test statistics with their degrees of freedom and p-values.]
Note 1: GMMj (1step, 2step) is the first-difference GMM estimator which uses at most j lagged instruments in each period (labelled DIFj in the table).
Note 2: The numbers of instruments in GMM1 to GMM6 are 12, 22, 30, 36, 40, and 42, respectively.

This section ends with a small Monte Carlo simulation that supports the empirical results. The design is loosely calibrated to the empirical results presented above:
\[
\begin{bmatrix} y_{it} \\ x_{it} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} y_{i,t-1} \\ x_{i,t-1} \end{bmatrix}
+ \begin{bmatrix} \eta_i \\ \kappa_i \end{bmatrix}
+ \begin{bmatrix} v_{it} \\ e_{it} \end{bmatrix},
\qquad (i = 1, \ldots, N;\ t = 2, \ldots, T),
\]
and
\[
\begin{bmatrix} y_{i1} \\ x_{i1} \end{bmatrix}
= \Upsilon \begin{bmatrix} 1 - a_{11} & -a_{12} \\ -a_{21} & 1 - a_{22} \end{bmatrix}^{-1}
\begin{bmatrix} \eta_i \\ \kappa_i \end{bmatrix} + w_{i1},
\]
where w_{i1} is generated as in (17),
\[
\begin{bmatrix} \eta_i \\ \kappa_i \end{bmatrix} \sim iid\,N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0.030 & -0.020 \\ -0.020 & 0.077 \end{bmatrix} \right),
\quad \text{and} \quad
\begin{bmatrix} v_{it} \\ e_{it} \end{bmatrix} \sim iid\,N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0.011 & -0.006 \\ -0.006 & 0.012 \end{bmatrix} \right).
\]
For Υ, we consider the following two cases:
\[
\Upsilon_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\quad \text{and} \quad
\Upsilon_2 = \begin{bmatrix} 0.85 & 0.15 \\ 0 & 0.85 \end{bmatrix}.
\]

Υ1 corresponds to the mean-stationary initial conditions, while Υ2 corresponds to the mean-nonstationary initial conditions. The values in Υ2 are based on the estimates of Υ obtained by Arellano (2003a). Note that Var(η_i)/Var(v_it) = 2.72 and Var(κ_i)/Var(e_it) = 6.42. We set a11 = 0.85, a12 = 0.15, a21 = 0, a22 = 0.3, T = 8, and N = 738. The number of replications is 1,000.

The simulation results in Table 3 show that all the GMM estimators of a11 are severely biased when the data are mean stationary (the case of Υ1), even in a large sample with N = 738. We also find that reducing the number of instruments enlarges the finite sample bias and the standard deviation. However, when the data are mean nonstationary (the case of Υ2), all the GMM estimators estimate the coefficients precisely even though persistency is strong. These findings support the above empirical results and the implications for practitioners.

6. Conclusion

In this paper, we investigated the behaviour of the FD-GMM estimator for persistent dynamic panel data models where the autoregressive parameter is modelled as αN = 1 − c/N^p, (0 < p < ∞). We showed that the assumptions on both the initial conditions and the degree of persistency affect the rate of convergence. One of the most important results is that the FD-GMM estimator can be consistent even when persistency is strong if mean nonstationarity is present, which is in sharp contrast to the well-known result that the FD-GMM estimator performs poorly when persistency is strong. The simulation results showed that the derived theoretical results capture the finite sample behaviour reasonably well. We then provided an empirical example that demonstrates that the FD-GMM estimator performs well even when persistency is strong.

The results of this paper imply that the initial conditions, degree of persistency, and variance ratio substantially affect the behaviour of the GMM estimator and, therefore, the inference and related tests. Hence, it is important to develop an estimator that is not affected by these factors. Further, constructing a statistic that measures the strength of the instruments, such as in Stock and Yogo (2005), in the dynamic panel context should be important for applied research. Finally, while the coefficients of the model are assumed to be constant over time throughout the paper, it is of interest to consider the case where the coefficients change during the estimation period, as in De Wachter and Tzavalis (2012).

Acknowledgements

The authors are grateful to the editor, three referees, Kohtaro Hitomi, Yoshihiko Nishiyama, Ryo Okui, Peter Phillips, Mototsugu Shintani, and the seminar participants at Kyoto University, Singapore Management University, and the 16th Panel Data Conference in Amsterdam for helpful comments. The first author thanks Hashem Pesaran for his support while visiting the University of Cambridge as a JSPS Postdoctoral Fellow for Research Abroad. The first author acknowledges financial support from the JSPS Fellowship and the Grant-in-Aid for Scientific Research (KAKENHI 20830056, 22730178) provided by the JSPS.

Table 3: Simulation results (T = 8, N = 738)

Mean-stationary initial conditions (Υ1)
                    a11              a12              a21              a22
Estimator       coef.  std.dev.  coef.  std.dev.  coef.  std.dev.  coef.  std.dev.
GMM1(1step)     0.614   0.298    0.051   0.131    0.031   0.277    0.310   0.121
GMM2(1step)     0.646   0.168    0.067   0.076    0.029   0.175    0.306   0.082
GMM3(1step)     0.690   0.125    0.089   0.057    0.028   0.134    0.303   0.064
GMM4(1step)     0.717   0.103    0.102   0.047    0.025   0.111    0.302   0.054
GMM5(1step)     0.733   0.090    0.109   0.041    0.023   0.097    0.301   0.048
GMM6(1step)     0.743   0.086    0.114   0.040    0.021   0.090    0.300   0.046
GMM1(2step)     0.558   0.367    0.029   0.157    0.032   0.282    0.309   0.124
GMM2(2step)     0.626   0.187    0.061   0.081    0.029   0.179    0.306   0.085
GMM3(2step)     0.678   0.138    0.086   0.060    0.029   0.139    0.303   0.066
GMM4(2step)     0.708   0.115    0.099   0.051    0.025   0.115    0.302   0.056
GMM5(2step)     0.725   0.101    0.106   0.045    0.023   0.102    0.301   0.049
GMM6(2step)     0.735   0.097    0.111   0.044    0.021   0.095    0.300   0.047

Mean-nonstationary initial conditions (Υ2)
                    a11              a12              a21              a22
Estimator       coef.  std.dev.  coef.  std.dev.  coef.  std.dev.  coef.  std.dev.
GMM1(1step)     0.837   0.052    0.151   0.036    0.008   0.056    0.297   0.038
GMM2(1step)     0.827   0.051    0.152   0.036    0.013   0.054    0.293   0.038
GMM3(1step)     0.822   0.049    0.151   0.034    0.015   0.052    0.292   0.035
GMM4(1step)     0.820   0.047    0.150   0.032    0.014   0.050    0.293   0.033
GMM5(1step)     0.819   0.046    0.149   0.030    0.014   0.048    0.294   0.032
GMM6(1step)     0.819   0.045    0.148   0.030    0.013   0.047    0.294   0.031
GMM1(2step)     0.837   0.054    0.151   0.037    0.009   0.057    0.297   0.039
GMM2(2step)     0.826   0.054    0.151   0.037    0.013   0.056    0.293   0.039
GMM3(2step)     0.821   0.052    0.150   0.035    0.015   0.054    0.292   0.036
GMM4(2step)     0.820   0.050    0.149   0.032    0.014   0.052    0.293   0.034
GMM5(2step)     0.818   0.049    0.148   0.031    0.014   0.050    0.294   0.033
GMM6(2step)     0.818   0.048    0.148   0.031    0.013   0.049    0.294   0.033

Note: a11 = 0.85, a12 = 0.15, a21 = 0, a22 = 0.3. See Table 2 for the definition of the estimators.
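The following minimal sketch (not the authors' code) reproduces the structure of the data-generating process used for Table 3; since the construction of w_i1 "as (17)" is not restated here, it is replaced by a simple draw from the error distribution, which is an assumption made purely for illustration.

```python
# Sketch of the Table 3 design: bivariate panel VAR(1) with mean-stationary (Upsilon1)
# or mean-nonstationary (Upsilon2) initial conditions.
import numpy as np

rng = np.random.default_rng(0)
N, T = 738, 8
A = np.array([[0.85, 0.15],
              [0.00, 0.30]])
Sigma_eff = np.array([[0.030, -0.020],      # Var(eta_i), Cov(eta_i, kappa_i)
                      [-0.020, 0.077]])
Sigma_err = np.array([[0.011, -0.006],      # Var(v_it), Cov(v_it, e_it)
                      [-0.006, 0.012]])
Upsilon1 = np.eye(2)                        # mean-stationary initial conditions
Upsilon2 = np.array([[0.85, 0.15],
                     [0.00, 0.85]])         # mean-nonstationary initial conditions

def simulate(Upsilon):
    c = rng.multivariate_normal([0, 0], Sigma_eff, size=N)    # (eta_i, kappa_i)
    mu = c @ np.linalg.inv(np.eye(2) - A).T                   # (I2 - A)^{-1} c_i
    # Assumption: w_i1 drawn from the error distribution (the paper uses (17) instead).
    w1 = rng.multivariate_normal([0, 0], Sigma_err, size=N)
    Y = np.zeros((N, T, 2))
    Y[:, 0, :] = mu @ Upsilon.T + w1                          # initial condition
    for t in range(1, T):
        eps = rng.multivariate_normal([0, 0], Sigma_err, size=N)
        Y[:, t, :] = Y[:, t - 1, :] @ A.T + c + eps
    return Y                                                  # Y[:,:,0] = y, Y[:,:,1] = x

Y_stat, Y_nonstat = simulate(Upsilon1), simulate(Upsilon2)
print(Y_stat.shape, Y_nonstat.shape)
```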

Appendix

Lemma A1. For 0 < p < ∞ and t ≥ 1,
\[
\alpha_N^t = 1 - \frac{ct}{N^p} + O\!\left(\frac{1}{N^{2p}}\right), \tag{A.1}
\]
\[
h_t = 1 - (1-\delta)\alpha_N^t =
\begin{cases}
1, & \text{for Case A,} \\[1ex]
\delta + \dfrac{ct(1-\delta)}{N^p} + O\!\left(\dfrac{1}{N^{2p}}\right), & \text{for Case B,}
\end{cases} \tag{A.2}
\]
and
\[
h_t = 1 - (1-\delta_N)\alpha_N^t = 1 - \frac{d}{N^q} + o\!\left(\frac{1}{N^q}\right), \quad \text{for Case C.} \tag{A.3}
\]

Proof of Lemma A1. Using log(1 + x) = x + O(x²) as x → 0 and e^x = 1 + x + O(x²), we have
\[
\alpha_N^t = \exp\!\left[t \log\!\left(1 - \frac{c}{N^p}\right)\right]
= \exp\!\left[\frac{-ct}{N^p} + O\!\left(\frac{t}{N^{2p}}\right)\right]
= 1 - \frac{ct}{N^p} + O\!\left(\frac{1}{N^{2p}}\right).
\]
The results for h_t are easily obtained from the first result. □

In what follows, we denote α_N and δ_N as α and δ, respectively, for brevity.

Lemma A2. Let us define $w_{i,t-1} = v_{i,t-1} + (\alpha-1)\sum_{j=0}^{t-3}\alpha^j v_{i,t-j-2}$ and $\psi_j = (1-\alpha^{2j})/(1-\alpha^2)$. Then, for t ≥ s, we have

(a)
\[
E(y_{is}\Delta y_{i,t-1}) = h_s\alpha^{t-2}(1-\delta)\frac{\sigma_\eta^2}{1-\alpha}
- \frac{\sigma_v^2\alpha^{t-s-2}\left[1-(1-\lambda)\alpha^{2s}\right]}{1+\alpha},
\quad (s = 0,\ldots,t-2,\ t \ge 2) \tag{A.4}
\]
\[
= \begin{cases}
\dfrac{-\lambda\sigma_v^2}{2-c/N^p} + o(1), & \text{for Cases A,} \\[1.5ex]
\dfrac{\delta(1-\delta)\sigma_\eta^2 N^p}{c} + O(1), & \text{for Case B,} \\[1.5ex]
\dfrac{d\sigma_\eta^2 N^{p-q}}{c} - \dfrac{\lambda\sigma_v^2}{2-c/N^p} + o(N^{p-q}) + o(1), & \text{for Case C.}
\end{cases} \tag{A.5}
\]

(b) For s = 0 and t = 2,
\[
Var(y_{is}\Delta y_{i,t-1}) = \frac{(1-\delta)^2 h_0^2\,Var(\eta_i^2)}{(1-\alpha)^2} + (1-\alpha)^2 Var(\varepsilon_{i0}^2)
+ \sigma_v^2(h_0^2\sigma_\mu^2 + \sigma_0^2) + \sigma_\eta^2\sigma_0^2(1-\delta-h_0)^2,
\]
and for s = 0, ..., t − 2 and t ≥ 3,
\begin{align*}
Var(y_{is}\Delta y_{i,t-1})
&= \frac{(1-\delta)^2 h_s^2\alpha^{2(t-2)} Var(\eta_i^2)}{(1-\alpha)^2} + Var(\xi_{is}w_{i,t-1}) + (1-\alpha)^2\alpha^{2(t+s-2)}Var(\varepsilon_{i0}^2) \\
&\quad + \left[\sigma_v^2 + (1-\alpha)^2\sigma_v^2\psi_{t-2}\right]\left[h_s^2\sigma_\mu^2 + \alpha^{2s}\sigma_0^2\right]
+ \alpha^{2(t-2)}\left[(1-\delta)\alpha^s - h_s\right]^2\sigma_\eta^2\sigma_0^2 \\
&\quad + \sigma_v^2\psi_s\alpha^{2(t-2)}\left[(1-\delta)^2\sigma_\eta^2 + (1-\alpha)^2\sigma_0^2\right]
- 2\psi_s\sigma_v^2\alpha^{2t-s-4}\left[(1-\delta)h_s\sigma_\eta^2 - \alpha^s(1-\alpha)^2\sigma_0^2\right], \tag{A.6}
\end{align*}
\[
= \begin{cases}
\dfrac{\sigma_v^2\sigma_\eta^2 N^{2p}}{c^2} + o(N^{2p}), & \text{for Cases A and C,} \\[1.5ex]
\dfrac{\left[\delta^2(1-\delta)^2 Var(\eta_i^2) + \sigma_v^2\sigma_\eta^2\delta^2\right]N^{2p}}{c^2} + o(N^{2p}), & \text{for Case B.}
\end{cases} \tag{A.7}
\]

(c)
\[
E(y_{it}y_{is}) = h_t h_s\sigma_\mu^2 + \frac{\alpha^{t-s}\left[1-(1-\lambda)\alpha^{2s}\right]\sigma_v^2}{1-\alpha^2} \tag{A.8}
\]
\[
= \begin{cases}
\dfrac{\sigma_\eta^2 N^{2p}}{c^2} + o(N^{2p}), & \text{for Cases A and C,} \\[1.5ex]
\dfrac{\sigma_\eta^2(1-\delta)^2 N^{2p}}{c^2} + O(N^{p}), & \text{for Case B.}
\end{cases} \tag{A.9}
\]

(d)
\begin{align*}
Var(y_{it}y_{is}) &= h_t^2 h_s^2 Var(\mu_i^2) + Var(\xi_{it}\xi_{is}) + \alpha^{2(t+s)} Var(\varepsilon_{i0}^2)
+ \left(h_t\alpha^s + h_s\alpha^t\right)^2\sigma_\mu^2\sigma_0^2 \\
&\quad + \left(h_t^2\psi_s + h_s^2\psi_t + 2h_s h_t\alpha^{t-s}\psi_s\right)\sigma_\mu^2\sigma_v^2
+ \left(\alpha^{2s}\psi_t + 3\alpha^{2t}\psi_s\right)\sigma_v^2\sigma_0^2, \tag{A.10} \\
&= O(N^{4p}), \tag{A.11}
\end{align*}
for Cases A, B, and C, and

(e)
\[
Var(y_{is}\Delta v_{it}) = 2 h_t h_s\sigma_v^2\sigma_\mu^2 + \frac{2\alpha^{t-s}\left[1-(1-\lambda)\alpha^{2s}\right]\sigma_v^4}{1-\alpha^2},
\quad (s = 0,\ldots,t-2) \tag{A.12}
\]
\[
= \begin{cases}
\dfrac{2\sigma_v^2\sigma_\eta^2 N^{2p}}{c^2} + o\!\left(N^{2p}\right), & \text{for Cases A and C,} \\[1.5ex]
\dfrac{2(1-\delta)^2\sigma_v^2\sigma_\eta^2 N^{2p}}{c^2} + O(N^p), & \text{for Case B.}
\end{cases} \tag{A.13}
\]
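As a quick numerical illustration of Lemma A2, the following sketch (not part of the paper) checks the moment formula (A.8) by simulation, using the representation y_{i0} = δµ_i + ε_{i0} with µ_i = η_i/(1−α) and Var(ε_{i0}) = λσ_v²/(1−α²) that is implicit in (A.14) and (A.8); the parameter values are arbitrary.

```python
# Numerical check of (A.8): E(y_it y_is) = h_t h_s sigma_mu^2
#                           + alpha^{t-s} [1 - (1-lambda) alpha^{2s}] sigma_v^2 / (1 - alpha^2).
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000                      # large cross-section so the sample moment is accurate
alpha, delta, lam = 0.9, 0.5, 2.0  # arbitrary values for illustration
sig_eta = sig_v = 1.0
sig_mu2 = sig_eta**2 / (1 - alpha)**2
sig_02 = lam * sig_v**2 / (1 - alpha**2)

eta = rng.normal(0, sig_eta, N)
mu = eta / (1 - alpha)
y = delta * mu + rng.normal(0, np.sqrt(sig_02), N)   # y_i0 = delta*mu_i + eps_i0
ys = {0: y.copy()}
for t in range(1, 5):
    y = alpha * y + eta + rng.normal(0, sig_v, N)     # y_it = alpha*y_{i,t-1} + eta_i + v_it
    ys[t] = y.copy()

t, s = 4, 2
h = lambda j: 1 - (1 - delta) * alpha**j
theory = h(t) * h(s) * sig_mu2 \
         + alpha**(t - s) * (1 - (1 - lam) * alpha**(2 * s)) * sig_v**2 / (1 - alpha**2)
print(round(np.mean(ys[t] * ys[s]), 3), round(theory, 3))   # the two numbers should be close
```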

Proof of Lemma A2. Note that Δy_{i,t-1} can be written as
\[
\Delta y_{i,t-1} =
\begin{cases}
(1-\delta)\eta_i + v_{i1} - (1-\alpha)\varepsilon_{i0}, & \text{for } t = 2, \\
(1-\delta)\alpha^{t-2}\eta_i + w_{i,t-1} - (1-\alpha)\alpha^{t-2}\varepsilon_{i0}, & \text{for } t \ge 3.
\end{cases} \tag{A.14}
\]
Further, note that
\[
\frac{1}{1-\alpha} = \frac{N^p}{c}, \qquad
\frac{1}{1+\alpha} = \frac{1}{2 - c/N^p}, \qquad
\frac{1}{1-\alpha^2} = \frac{N^p}{2c - c^2/N^p}. \tag{A.15}
\]

(a): Using $E(w_{i,t-1}^2) = \sigma_v^2 + (1-\alpha)^2\sigma_v^2\psi_{t-2}$, $E(\xi_{is}\xi_{it}) = \sigma_v^2\alpha^{t-s}\psi_s$ for s ≤ t, and $E(\xi_{is}w_{i,t-1}) = -(1-\alpha)\alpha^{t-s-2}\psi_s\sigma_v^2$ for s = 0, ..., t − 2, we get (A.4). The results for Cases A and C can be obtained by using (A.15) and Lemma A1. The result for Case B can be obtained by noting that the first term in (A.4) dominates the second one.

(b): We prove the case of t ≥ 3 since the result for t = 2 is straightforward. Using (A.14), (A.6) for the case of t ≥ 3 is obtained as follows:
\begin{align*}
Var(y_{is}\Delta y_{i,t-1})
&= Var\Big[\frac{1-\delta}{1-\alpha}\alpha^{t-2}h_s\eta_i^2 + \xi_{is}w_{i,t-1} - (1-\alpha)\alpha^{t+s-2}\varepsilon_{i0}^2
+ (1-\delta)\alpha^{t-2}\eta_i\xi_{is} \\
&\qquad\quad + \left((1-\delta)\alpha^{t+s-2} - h_s\alpha^{t-2}\right)\eta_i\varepsilon_{i0}
+ h_s\mu_i w_{i,t-1} + \alpha^s w_{i,t-1}\varepsilon_{i0} - (1-\alpha)\alpha^{t-2}\xi_{is}\varepsilon_{i0}\Big] \\
&= \frac{(1-\delta)^2}{(1-\alpha)^2}\alpha^{2(t-2)}h_s^2 Var(\eta_i^2) + Var(\xi_{is}w_{i,t-1}) + (1-\alpha)^2\alpha^{2(t+s-2)}Var(\varepsilon_{i0}^2) \\
&\quad + (1-\delta)^2\alpha^{2(t-2)}Var(\eta_i\xi_{is}) + \left((1-\delta)\alpha^{t+s-2} - h_s\alpha^{t-2}\right)^2 Var(\eta_i\varepsilon_{i0}) \\
&\quad + h_s^2 Var(\mu_i w_{i,t-1}) + \alpha^{2s}Var(w_{i,t-1}\varepsilon_{i0}) + (1-\alpha)^2\alpha^{2(t-2)}Var(\xi_{is}\varepsilon_{i0}) \\
&\quad + 2(1-\delta)h_s\alpha^{t-2}Cov(\eta_i\xi_{is}, \mu_i w_{i,t-1}) - 2(1-\alpha)\alpha^{t+s-2}Cov(w_{i,t-1}\varepsilon_{i0}, \xi_{is}\varepsilon_{i0}) \\
&= \frac{(1-\delta)^2}{(1-\alpha)^2}\alpha^{2(t-2)}h_s^2 Var(\eta_i^2) + Var(\xi_{is}w_{i,t-1}) + (1-\alpha)^2\alpha^{2(t+s-2)}Var(\varepsilon_{i0}^2) \\
&\quad + (1-\delta)^2\alpha^{2(t-2)}\sigma_v^2\sigma_\eta^2\psi_s + \left((1-\delta)\alpha^{t+s-2} - h_s\alpha^{t-2}\right)^2\sigma_\eta^2\sigma_0^2 \\
&\quad + h_s^2\sigma_\mu^2\left(\sigma_v^2 + (1-\alpha)^2\sigma_v^2\psi_{t-2}\right) + \alpha^{2s}\sigma_0^2\left(\sigma_v^2 + (1-\alpha)^2\sigma_v^2\psi_{t-2}\right) \\
&\quad + (1-\alpha)^2\alpha^{2(t-2)}\psi_s\sigma_v^2\sigma_0^2
- 2(1-\delta)h_s\psi_s\alpha^{2t-s-4}\sigma_v^2\sigma_\eta^2 + 2(1-\alpha)^2\alpha^{2t-4}\psi_s\sigma_v^2\sigma_0^2.
\end{align*}
To prove (A.7), we need to derive $Var(\xi_{is}w_{i,t-1})$. Using $\xi_{is} = \sum_{j=1}^{s}\alpha^{s-j}v_{ij}$ and $w_{i,t-1} = v_{i,t-1} - (1-\alpha)\sum_{\ell=1}^{t-2}\alpha^{t-\ell-2}v_{i\ell} = \sum_{\ell=1}^{t-1}b_\ell v_{i\ell}$, where $b_{t-1} = 1$ and $b_\ell = -(1-\alpha)\alpha^{t-\ell-2}$ $(\ell = 1,\ldots,t-2)$, we have
\begin{align*}
E(\xi_{is}^2 w_{i,t-1}^2)
&= \sum_{j_1,j_2=1}^{s}\sum_{\ell_1,\ell_2=1}^{t-1}\alpha^{2s-j_1-j_2}b_{\ell_1}b_{\ell_2}E(v_{ij_1}v_{ij_2}v_{i\ell_1}v_{i\ell_2}) \\
&= E(v_{ij}^4)\sum_{j=1}^{s}\alpha^{2s-2j}b_j^2 + \sigma^4\sum_{j=1}^{s}\sum_{\ell=1}^{t-1}\alpha^{2s-2j}b_\ell^2
+ 2\sigma^4\sum_{j=1}^{s}\sum_{\ell=1}^{s}\alpha^{2s-j-\ell}b_j b_\ell \\
&= E(v_{ij}^4)(1-\alpha)^2\alpha^{2(t-s-2)}\frac{1-\alpha^{4s}}{1-\alpha^4}
+ \sigma^4\left(\frac{1-\alpha^{2s}}{1-\alpha^2}\right)\left(1 + (1-\alpha)^2\frac{1-\alpha^{2(t-2)}}{1-\alpha^2}\right)
+ 2\sigma^4(1-\alpha)^2\alpha^{2(t-s-2)}\frac{(1-\alpha^{2s})^2}{(1-\alpha^2)^2} \\
&= \frac{\sigma^4 N^p}{2c - c^2/N^p} + o(N^p).
\end{align*}
Hence, by noting that $E(\xi_{is}w_{i,t-1}) = O(1)$, we have
\[
Var(\xi_{is}w_{i,t-1}) = \frac{\sigma^4 N^p}{2c - c^2/N^p} + o(N^p).
\]
For Cases A and C, the result follows from the fact that the fourth term in (A.6) is $O(N^{2p})$ and dominates the other terms. The result for Case B is obtained by noting that the first and fourth terms are $O(N^{2p})$ and dominate the other terms.

(c): (A.8) can easily be shown using (3). (A.9) is obtained from the fact that the first term in (A.8) is dominating.

(d): Using (3), (A.10) is obtained as follows:
\begin{align*}
Var(y_{it}y_{is})
&= Var\Big[h_t h_s\mu_i^2 + \xi_{it}\xi_{is} + \alpha^{t+s}\varepsilon_{i0}^2 + h_t\mu_i\xi_{is}
+ (h_t\alpha^s + h_s\alpha^t)\mu_i\varepsilon_{i0} + h_s\mu_i\xi_{it} + \alpha^s\xi_{it}\varepsilon_{i0} + \alpha^t\xi_{is}\varepsilon_{i0}\Big] \\
&= h_t^2h_s^2 Var(\mu_i^2) + Var(\xi_{is}\xi_{it}) + \alpha^{2(t+s)}Var(\varepsilon_{i0}^2) + h_t^2 Var(\mu_i\xi_{is})
+ (h_t\alpha^s + h_s\alpha^t)^2 Var(\mu_i\varepsilon_{i0}) \\
&\quad + h_s^2 Var(\mu_i\xi_{it}) + \alpha^{2s}Var(\xi_{it}\varepsilon_{i0}) + \alpha^{2t}Var(\xi_{is}\varepsilon_{i0})
+ 2h_t h_s Cov(\mu_i\xi_{is}, \mu_i\xi_{it}) + 2\alpha^{t+s}Cov(\xi_{is}\varepsilon_{i0}, \xi_{it}\varepsilon_{i0}) \\
&= h_t^2h_s^2 Var(\mu_i^2) + Var(\xi_{is}\xi_{it}) + \alpha^{2(t+s)}Var(\varepsilon_{i0}^2) + h_t^2\sigma_\mu^2\sigma_v^2\psi_s
+ (h_t\alpha^s + h_s\alpha^t)^2\sigma_\mu^2\sigma_0^2 \\
&\quad + h_s^2\sigma_\mu^2\sigma_v^2\psi_t + \alpha^{2s}\sigma_v^2\sigma_0^2\psi_t + \alpha^{2t}\sigma_v^2\sigma_0^2\psi_s
+ 2h_t h_s\sigma_v^2\sigma_\mu^2\alpha^{t-s}\psi_s + 2\alpha^{2t}\sigma_v^2\sigma_0^2\psi_s \\
&= a_1 + a_2 + \cdots + a_{10}.
\end{align*}
To show (A.11), we need to assess the order of magnitude of each term. From (A.2) and (A.3), it is easy to show that $a_1$ is $O(N^{4p})$ and that $a_4,\ldots,a_{10}$ are smaller than $O(N^{3p})$. To complete the proof, we show that $a_2 = O(N^{2p})$ and $a_3 = O(N^{2p})$. $a_3 = O(N^{2p})$ is obtained from the fact that $E(\varepsilon_{i0}^4) = \kappa\sigma_0^4 = O(N^{2p})$ from Assumption 3. To show $a_2 = O(N^{2p})$, we use the following relationship, which is obtained by applying the Cauchy-Schwarz inequality:
\[
Var(\xi_{is}\xi_{it}) < E(\xi_{is}^2\xi_{it}^2) \le \sqrt{E(\xi_{is}^4)E(\xi_{it}^4)}.
\]
Using $\xi_{is} = \sum_{j=1}^{s}\alpha^{s-j}v_{ij}$, we have
\begin{align*}
E(\xi_{is}^4)
&= \sum_{j_1,j_2,j_3,j_4=1}^{s}\alpha^{4s-j_1-j_2-j_3-j_4}E(v_{ij_1}v_{ij_2}v_{ij_3}v_{ij_4}) \\
&= \sum_{j=1}^{s}\alpha^{4(s-j)}E(v_{ij}^4) + 3\sigma^4\sum_{\substack{j_1,j_2=1 \\ j_1 \ne j_2}}^{s}\alpha^{4s-2j_1-2j_2} \\
&= \frac{E(v_{ij}^4)(1-\alpha^{4s})}{1-\alpha^4} + 3\sigma^4\left(\frac{1-\alpha^{2s}}{1-\alpha^2}\right)^2
= O(N^{2p}).
\end{align*}
Hence, $Var(y_{it}y_{is}) = O(N^{4p})$.

(e): (A.12) is obtained from the fact that $Var(y_{is}\Delta v_{it}) = 2\sigma_v^2 E(y_{is}^2)$. (A.13) is obtained from (A.9). □

Next, we show the convergence results used in deriving the asymptotic behaviour of the GMM estimator. For simplicity, we denote $Z_i^{L2}$ as $Z_i$.

Lemma A3. Let Assumptions 1 to 4 hold. Then, as N → ∞ with T fixed, we have
\begin{align*}
&\text{(a.1)}\quad \frac{1}{N^{2p+1}}\sum_{i=1}^{N} Z_i'HZ_i \xrightarrow{p} \frac{\sigma_\eta^2}{c^2}J'HJ, && \text{for Cases A and C,} \\
&\text{(a.2)}\quad \frac{1}{N^{2p+1}}\sum_{i=1}^{N} Z_i'HZ_i \xrightarrow{p} \frac{\sigma_\eta^2(1-\delta)^2}{c^2}J'HJ, && \text{for Case B,} \\
&\text{(b.1)}\quad \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta v_i \xrightarrow{d} N\!\left(0, \frac{\sigma_v^2\sigma_\eta^2}{c^2}J'HJ\right), && \text{for Cases A and C,} \\
&\text{(b.2)}\quad \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta v_i \xrightarrow{d} N\!\left(0, \frac{\sigma_v^2\sigma_\eta^2(1-\delta)^2}{c^2}J'HJ\right), && \text{for Case B,} \\
&\text{(c.1)}\quad \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \frac{-\lambda\sigma_v^2}{2}\iota_m \equiv \mu_1, && \text{for Case A(a),} \\
&\text{(c.2)}\quad \begin{bmatrix} \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta v_i \\ \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} + \mu_1 \end{bmatrix}, && \text{for Case A(b),} \\
&\text{(c.3)}\quad \begin{bmatrix} \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta v_i \\ \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} \end{bmatrix}, && \text{for Case A(c),} \\
&\text{(c.4)}\quad \frac{1}{N^{p+1}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \frac{\delta(1-\delta)\sigma_\eta^2}{c}\iota_m, && \text{for Case B,} \\
&\text{(c.5)}\quad \frac{1}{N^{p-q+1}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \frac{d\sigma_\eta^2}{c}\iota_m \equiv \mu_2, && \text{for Case C1(a),} \\
&\text{(c.6)}\quad \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \left(\frac{d\sigma_\eta^2}{c} - \frac{\lambda\sigma_v^2}{2}\right)\iota_m \equiv \mu_3, && \text{for Case C1(b),} \\
&\text{(c.7)}\quad \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \mu_1, && \text{for Case C1(c),} \\
&\text{(c.8)}\quad \frac{1}{N^{3/2-q}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \mu_2, && \text{for Case C2,} \\
&\text{(c.9)}\quad \frac{1}{N^{p-q+1}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \mu_2, && \text{for Case C3,} \\
&\text{(c.10)}\quad \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \mu_1, && \text{for Case C4,} \\
&\text{(c.11)}\quad \begin{bmatrix} \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta v_i \\ \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} + \mu_3 \end{bmatrix}, && \text{for Case C5,} \\
&\text{(c.12)}\quad \begin{bmatrix} \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta v_i \\ \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} + \mu_2 \end{bmatrix}, && \text{for Case C6,} \\
&\text{(c.13)}\quad \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \xrightarrow{p} \mu_1, && \text{for Case C7,} \\
&\text{(c.14)}\quad \begin{bmatrix} \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta v_i \\ \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} + \mu_1 \end{bmatrix}, && \text{for Case C8,} \\
&\text{(c.15)}\quad \begin{bmatrix} \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta v_i \\ \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1} \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} \end{bmatrix}, && \text{for Cases C9(a), (b), and (c),}
\end{align*}
where
\[
\begin{bmatrix} \zeta_{FD2} \\ \xi_{FD2} \end{bmatrix}
\sim N\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \dfrac{\sigma_\eta^2\sigma_v^2}{c^2}J'HJ & C \\[1ex] C' & \dfrac{\sigma_\eta^2\sigma_v^2}{c^2}J'J \end{bmatrix}\right), \tag{A.16}
\]
and
\[
C = \lim_{N\to\infty} Cov\!\left(\frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta v_i,\ \frac{1}{N^{(2p+1)/2}}\sum_{i=1}^{N} Z_i'\Delta y_{i,-1}\right).
\]
theorem and from the results are obtained. ( ( (a.1)-(a.2), ) ) ∑N ∑N ′ −1 (c.1): Since E N = O(1) and V ar N −1 i=1 Z′i ∆yi,−1 = O(1/N ) → 0, we have the i=1 Zi ∆yi,−1 result. (c.4), (c.5), (c.7), (c.8),)(c.9), (c.10), and (c.13) can be proved in ( (c.6), ( ) the same way. ∑N ∑N (c.2): Since E N −1 i=1 Z′i ∆yi,−1 = O(1) and V ar N −1 i=1 Z′i ∆yi,−1 = O(1), by applying the central limit theorem, ( the result is obtained. (c.11), in the same ) (c.12), and (c.14) ( can be proved ) way. ∑N ∑N ′ ′ −(2p+1)/2 −(2p+1)/2 (c.3): Since E N i=1 Zi ∆yi,−1 → 0 and V ar N i=1 Zi ∆yi,−1 = O(1), by applying the central limit theorem, the result is obtained. (c.15) can be proved in the same way. □ Proof of Theorems 1 and 2 Using Lemma A3, the results are obtained. □ References Ahn, S. C. and P. Schmidt (1995) “Efficient Estimation of Models for Dynamic Panel Data,” Journal of Econometrics, 68, 5-27. Alvarez, J. and M. Arellano (2003) “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators,” Econometrica, 71, 1121-1159. Arellano, M. (2003a) “Modelling Optimal Instrumental Variables For Dynamic Panel Data Models,” working paper. Arellano, M. (2003b) Panel Data Econometrics, Oxford: Oxford University Press. Arellano, M. and S. Bond (1991) “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” Review of Economic Studies, 58, 277-297. Arellano, M. and O. Bover (1995) “Another Look at the Instrumental Variable Estimation of Error-Components Models,” Journal of Econometrics, 68, 29-51. Blundell, R. and S. Bond (1998) “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models,” Journal of Econometrics, 87, 115-143. Blundell, R., S. Bond, and F. Windmeijer (2000) “Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator,” in Baltagi, B. H. ed. Nonstationary Panels, Panel Cointegration and Dynamic Panels, 15 of Advances in Econometrics, Amsterdam: JAI Press, 53-91. Bun, M. J. G. and M. A. Carree (2005) “Bias-Corrected Estimation in Dynamic Panel Data Models,” Journal of Business and Economic Statistics, 23, 200-210. Bun, M. J. G. and V. Sarafidis (2013) “Dynamic Panel Data Models,” in Baltagi, B. ed. Oxford Handbook on Panel Data: Oxford University Press. Bun, M. J. G. and F. Windmeijer (2010) “The Weak Instrument Problem of the System GMM Estimator in Dynamic Panel Data Models,” Econometrics Journal, 13, 95-126. Caner, M. (2010) “Testing, Estimation in GMM and CUE with Nearly-Weak Identification,” Econometric Reviews, 29, 330-363. Hahn, J., J. Hausman, and G. Kuersteiner (2007) “Long Difference Instrumental Variables Estimation for Dynamic Panel Models with Fixed Effects,” Journal of Econometrics, 127, 574-617. Hahn, J. and G. Kuersteiner (2002) “Discontinuities of Weak Instrument Limiting Distributions,” Economics Letters, 75, 325-331.

35

Hayakawa, K. (2007) “Small Sample Bias Properties of the System GMM Estimator in Dynamic Panel Data Models,” Economics Letters, 95, 32-38. Hayakawa, K. (2009) “On the Effect of Mean-Nonstationarity in Dynamic Panel Data Models,” Journal of Econometrics, 153, 133-135. Holtz-Eakin, D., W. K. Newey, and H. S. Rosen (1988) “Estimating Vector Autoregressions with Panel Data,” Econometrica, 56, 1371-1395. Kruiniger, H. (2009) “GMM Estimation and Inference in Dynamic Panel Data Models with Persistent Data,” Econometric Theory, 25, 1348-1391. Roodman, D. (2009) “A Note on the Theme of Too Many Instruments,” Oxford Bulletin of Economics and Statistics, 71, 135-158. Staiger, D. and J. H. Stock (1997) “Instrumental Variables Regression with Weak Instruments,” Econometrica, 65, 557-586. Stock, J. H., J. H. Wright, and M. Yogo (2002) “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business and Economic Statistics, 20, 518-529. Stock, J. H. and M. Yogo (2005) “Testing for Weak Instruments in Linear IV Regression,” in Andrews, Donald W. K. and James H. Stock eds. Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg: Cambridge University Press, 80-108. De Wachter, S. and E. Tzavalis (2012) “Detection of Structural Breaks in Linear Dynamic Panel Data Models,” Computational Statistics & Data Analysis, 56, 3020–3034.

36