Long difference instrumental variables estimation for dynamic panel models with fixed effects

Long difference instrumental variables estimation for dynamic panel models with fixed effects

ARTICLE IN PRESS Journal of Econometrics 140 (2007) 574–617 www.elsevier.com/locate/jeconom Long difference instrumental variables estimation for dy...

451KB Sizes 6 Downloads 211 Views

ARTICLE IN PRESS

Journal of Econometrics 140 (2007) 574–617 www.elsevier.com/locate/jeconom

Long difference instrumental variables estimation for dynamic panel models with fixed effects Jinyong Hahna, Jerry Hausmanb,, Guido Kuersteinerc a UCLA, USA Massachusetts Institute of Technology, Department of Economics, 50 Memorial Drive, Cambridge, MA 02142, USA c UC Davis, USA

b

Available online 6 September 2006

Abstract This paper proposes a new instrumental variables estimator for a dynamic panel model with fixed effects with good bias and mean squared error properties even when identification of the model becomes weak near the unit circle. We adopt a weak instrument asymptotic approximation to study the behavior of various estimators near the unit circle. We show that an estimator based on long differencing the model is much less biased than conventional implementations of the GMM estimator for the dynamic panel model. We also show that under the weak instrument approximation conventional GMM estimators are dominated in terms of mean squared error by an estimator with far less moment conditions. The long difference (LD) estimator mimics the infeasible optimal procedure through its reliance on a small set of moment conditions. r 2006 Elsevier B.V. All rights reserved. JEL classification: C13; C23; C51 Keywords: Dynamic panel; Bias correction; Second order; Unit root; Weak instrument

1. Introduction We are concerned with estimation of the dynamic panel model with fixed effects. Under large n, fixed T asymptotics it is well known from Nickell (1981) that the standard Corresponding author. Tel.: +1 617 253 3644; fax: +1 617 253 1330.

E-mail addresses: [email protected] (J. Hahn), [email protected] (J. Hausman), [email protected] (G. Kuersteiner). 0304-4076/$ - see front matter r 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2006.07.005

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

575

maximum likelihood estimator suffers from an incidental parameter problem leading to inconsistency. In order to avoid this problem the literature has focused on instrumental variables estimation (GMM) applied to first differences. Examples include Anderson and Hsiao (1982), Holtz-Eakin et al. (1988), and Arellano and Bond (1991). Ahn and Schmidt (1995), Hahn (1997), and Blundell and Bond (1998) considered further moment restrictions. Unfortunately, the standard GMM estimator1 has been found to suffer from substantial finite sample biases, especially when the autoregressive parameter is close to the unit circle. See Alonso-Borrego and Arellano (1999). Motivated by this problem, modifications of likelihood based estimators emerged in the literature. See Kiviet (1995), Lancaster (1997), Hahn and Kuersteiner (2002a). The likelihood based estimators do reduce finite sample bias compared to the standard maximum likelihood estimator, but the remaining bias is still substantial for T relatively small. We attempt to solve these problems by using the ‘‘long difference technique’’ of Griliches and Hausman (1986). Griliches and Hausman noted that bias is reduced when long differences are used in the errors in variable problem, and a similar result works here with the second order bias. Long differences also increase the explanatory power of the instruments which further reduces the finite sample bias and also decreases the MSE of the estimator. Monte Carlo results demonstrate that the long difference (LD) estimator performs quite well, even for high positive values of the lagged variable coefficient where previous estimators are badly biased. In order to analyze the properties of the LD estimator we consider a local to nonidentification asymptotic approximation. We use the weak instrument approximation of Staiger and Stock (1997) and Stock and Wright (2000). Here we let the autoregressive parameter tend to unity as the number of cross-sectional observations n tends to infinity. Our limiting distribution for the GMM estimator shows that only moment conditions involving initial conditions are asymptotically relevant. We define a class of estimators based on linear combinations of asymptotically relevant moment conditions and show that a bias minimal estimator within this class can approximately be based on taking long differences of the dynamic panel model. In general, it turns out that under near nonidentification asymptotics the optimal procedures of Alvarez and Arellano (1998), Arellano and Bond (1991), Ahn and Schmidt (1995, 1997) are suboptimal from a bias point of view, and optimal inference should be based on a smaller than the full set of moment conditions. We show that a bias minimal estimator can be obtained by using a particular linear combination of the original moment conditions. As far as bias minimization under the alternative asymptotics is concerned this optimality result shows that only a single moment condition should optimally be used. We also analyze the form of the optimal estimator in terms of minimal mean squared error (MSE) under weak identification asymptotics. First order asymptotically efficient procedures are again found to be suboptimal. The optimal number of moment conditions minimizing the MSE under alternative asymptotics can be up to a factor of 10 smaller than the set of first order optimal instruments. Optimal procedures under weak instrument asymptotics are difficult to implement because they depend on unobservable nuisance parameters that cannot be readily estimated. We show that the LD estimator we propose in this paper is a good heuristic 1

GMM estimator that estimate differenced equations using level variables as instruments.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

576

approximation to the optimal procedures. It captures the essence of our theoretical analysis which shows that the selection of a few highly significant instruments is called for to minimize bias and MSE. 2. LD: Intuitive motivation Consider the standard dynamic panel model with fixed effects: yit ¼ ai þ byi;t1 þ it ;

i ¼ 1; . . . ; n; t ¼ 1; . . . ; T.

(1)

Throughout the paper we assume that the moment conditions put forward by Ahn and Schmidt (1997) and summarized in Condition 4 hold. In addition we impose distributional assumptions for specific theoretical results. In this section we impose the following somewhat strong conditions, which we partly relax in Section 3. i:i:d:

Condition 1. it  Nð0; s2 Þ over i and t.   ai s2 Condition 2. yi0 jai N 1b ; 1b and ai Nð0; s2a Þ. 2 The usual GMM estimator2 is based on the first difference form of the model yit  yi;t1 ¼ bðyi;t1  yi;t2 Þ þ ðit  i;t1 Þ, where the instruments are based on the orthogonality condition E½yi;s ðit  i;t1 Þ ¼ 0, s ¼ 0; . . . ; t  2. Unfortunately, the standard GMM estimator obtained after first differencing has been found to suffer from substantial finite sample biases, especially when b is close to the unit circle. Such behavior can be understood by using the higher order bias formula of 2SLS (GMM),3 which depends on 4 factors: ‘‘Explained’’ variance of the first stage reduced form equation, covariance between the stochastic disturbance of the structural equation and the reduced form equation, the number of instruments, and sample size: E½b^ 2SLS  b 

1 ðnumber of instrumentsÞ  ð‘‘ covariance’’ Þ n ‘‘ Explained’’ variance of the first stage reduced form equation

We now use this formula to explain the bias of GMM in the first difference set up.4 Assume that T ¼ 4. The first difference set up considers the following equation among others: y4  y3 ¼ bðy3  y2 Þ þ 4  3 .

(2)

For the RHS variables it uses the instrument equation y3  y2 ¼ ðb  1Þy2 þ a þ 3 . Now calculate the R2 for the first stage equation using the Ahn–Schmidt (AS) moments under ‘‘ideal conditions’’ where the researcher knows b in the sense that the nonlinear 2 Condition 2 is a stationarity condition that is imposed for analytical convenience. Anderson and Hsiao (1982) show that estimators exploiting this initial condition can be very sensitive to misspecification. For this reason we do not exploit the initial condition to construct our estimator. 3 As explained in Donald and Newey (2001) the formula is valid strictly speaking only when the number of instruments tends to infinity. It provides however a good approximation to the finite sample behavior of GMM estimators as far as the impact of additional instruments on bias is concerned. Kuersteiner (2000) shows that the same expression holds in a time series context if appropriate adjustments are made for the covariance term. 4 It is possible to estimate the higher order bias, and subtract it off from the GMM estimator. Monte Carlo simulations reveal that this procedure does not work well when b is close to one. See Appendix A.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

577

restrictions become linear restrictions: We would then use ðy2 ; y1 ; y0 Þ as instruments. Assuming stationarity for symbols, but not using it as additional moment information, we can write y0 ¼ a=ð1  bÞ þ Z0 , where Z0 ð0; s2 =ð1  b2 ÞÞ. It can be shown that the covariance between the structure error and the first stage error5 is s2 , and the ‘‘explained variance’’ in the first stage is equal to s2 ð1  bÞ=ðb þ 1Þ. Therefore, the ratio that determines the bias of 2SLS is equal to ð1 þ bÞ=ð1  bÞ, which is equal to 19 for b ¼ :9. For n ¼ 100, this implies a percentage bias of 105:56. We now turn to the LD technique of Griliches and Hausman (1986). The LD technique is based on a single equation yiT  yi1 ¼ bðyiT1  yi0 Þ þ ðiT  i1 Þ

(3)

It is easy to see that the initial observation yi0 would serve as a valid instrument. To increase further the explanatory power of the instruments, we can use similar intuition as in Hausman and Taylor (1983) or Ahn and Schmidt (1995), and see that the residuals yiT1  byiT2 ; . . . ; yi2  byi1 are valid instruments as well. We call this estimator the LD estimator. In order to understand why the LD technique may improve on the first difference, we use the higher order bias formula again. Assume that T ¼ 4 as before. The LD set up is based on the equation y4  y1 ¼ bðy3  y0 Þ þ 4  1 . It can be shown that the covariance between the first stage and second stage errors is b2 s2 , and the ‘‘explained variance’’ in the first stage is given by s2

ð2b6  4b4  2b5 þ 4b2 þ 4b  2b3 þ 6Þs2 þ b6  b4 þ 2  2b3 , ð2b  3 þ b2 Þs2  1 þ b2

where s2 ¼ s2a =s2 . Therefore, the ratio that determines the bias is equal to :37408 þ

2:5703  104 s2 þ 4:8306  102

for b ¼ :9. Note that the maximum value that this ratio can take in absolute terms is :37, which is much smaller than 19 for the AS estimator. We therefore conclude that the LD increases R2 but decreases the covariance thus alleviating the ‘‘weak instruments’’ problem. Further, the number of instruments is smaller in the LD specification so we should expect even smaller bias. The LD estimator as discussed above uses additional instruments, which requires knowledge of b. A feasible version of the LD estimator therefore requires a preliminary consistent estimator of b. An obvious choice is Arellano and Bover’s (1995) estimator bbFABH .6 We call such an estimator bbLD2AB : It is 2SLS on (3) using ðyi0 ; yiT1  bbFABH  yiT2 ; . . . ; yi2  bbFABH  yi1 Þ as instruments. Such instruments were previously used in Hausman et al. (1987). Using the preliminary consistent estimator bbLD2AB to obtain residuals as instruments, we can also come up with another LD estimator. Call it bbLD1 . We can iterate it further, and come up with bbLD2 ; bbLD3 ; . . . : Another possibility is to interpret 5 6

This covariance is conditional on the set of instruments. See Appendix A for precise definition of bbFABH .

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

578

Table 1 Monte Carlo comparison of estimators bbLDGMM

bbFAS

bbLD2AB

bbLD1

bbLD2

bbLD3

0.101 0.092 0.101 0.113 0.040 0.160 0.318

0.098 0.070 0.098 0.057 0.050 0.143 0.268

0.103 0.101 0.102 0.127 0.033 0.169 0.344

0.100 0.106 0.101 0.144 0.031 0.169 0.352

0.102 0.104 0.102 0.145 0.034 0.169 0.352

0.100 0.103 0.099 0.136 0.032 0.165 0.349

0.508 0.092 0.507 0.296 0.448 0.569 0.725

0.526 0.150 0.503 0.273 0.429 0.588 1.033

0.504 0.113 0.491 0.295 0.430 0.559 0.922

0.507 0.102 0.504 0.282 0.437 0.573 0.759

0.512 0.123 0.503 0.255 0.429 0.588 0.826

0.519 0.146 0.503 0.250 0.421 0.598 0.953

0.520 0.170 0.497 0.245 0.418 0.592 1.073

b ¼ 0:9, T ¼ 5, n ¼ 100 Mean 0.664 RMSE 0.388 Median 0.693 1st percentile 0.221 25th percentile 0.499 75th percentile 0.882 99th percentile 1.273

0.826 0.221 0.798 0.380 0.689 0.954 1.349

0.709 0.317 0.697 0.077 0.558 0.854 1.295

0.816 0.249 0.762 0.467 0.691 0.861 1.752

0.853 0.240 0.814 0.384 0.734 0.911 1.764

0.880 0.234 0.838 0.427 0.747 0.961 1.722

0.902 0.264 0.844 0.495 0.750 0.981 1.879

Estimator

bbBB

b ¼ 0:1, T ¼ 5, n ¼ 100 Mean 0.105 RMSE 0.072 Median 0.104 1st percentile 0.054 25th percentile 0.057 75th percentile 0.152 99th percentile 0.276 b ¼ 0:5, T ¼ 5, n ¼ 100 Mean RMSE Median 1st percentile 25th percentile 75th percentile 99th percentile

the LD as a GMM estimator based on E½yi0  ðyiT  yi1  bðyiT1  yi0 ÞÞ ¼ 0 E½ðyiT1  byiT2 Þ  ðyiT  yi1  bðyiT1  yi0 ÞÞ ¼ 0 .. . E½ðyi2  byi1 Þ  ðyiT  yi1  bðyiT1  yi0 ÞÞ ¼ 0 We will call this estimator bbLDGMM . In Table 1, we compare Monte Carlo7 properties of various estimators8 with the LD. We examine bbFAS , the nonlinear estimator based on the moment conditions proposed by Ahn and Schmidt (1995), and bbBB , the estimator proposed by Blundell and Bond (1998),9 among others. Precise definitions of the estimators are given in Section 4.10 We find that 7

We set s2 ¼ s2a ¼ 1 for our Monte Carlo simulation. The results are based on 5000 runs. The GMM estimators in our Monte Carlo simulations are all ‘‘four step’’ estimators. See Appendix A.2 for discussion. 9 Arellano and Bover (1995) consider an estimator that augments the estimator of Anderson and Hsiao with the moment condition later used by Blundell and Bond (1998). 10 The version of Ahn and Schmidt’s estimator (1995) adopted in this Monte Carlo simulation does not utilize homoscedasticity. 8

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

579

LD does better than the other estimators for large b and not significantly worse for moderate b. 3. Near unit root approximation In order to analyze the properties of the LD estimator, we consider a local to nonidentification asymptotic approximation.11 We use the weak instrument approximation as in Staiger and Stock (1997) and Stock and Wright (2000). It is based on letting the correlation between instruments and regressors decrease at a prescribed rate as the sample size increases. In the case of the dynamic panel model it is the autoregressive parameter that determines the quality of lagged observations used as instruments. Here we consider model (1) for T fixed and n ! 1 when also bn tends to unity. We want to keep certain aspects of the model, in our case the quality of the instruments, constant as the sample size increases. The purpose of alternative asymptotic sequences is to obtain asymptotic approximations that better reflect finite sample properties, in our case poor identification, than standard asymptotics with fixed parameters would deliver. The fact that bn ! 1 with n ! 1 is not meant to be a reflection of some realistic data generating process but is an analytical device. The asymptotics that we propose are also quite different from the near unit root literature in a pure time series context. Here, T is fixed and the degree of temporal dependence modelled by bn is not directly relevant for our analysis. What matters are the implications for identification as discussed before. We analyze the bias of the associated weak instrument limit distribution. We analyze the class of GMM estimators that exploit Ahn and Schmidt’s (1997) complete set of moment conditions and show that a strict subset of the full set of moment restrictions should be used in estimation in order to minimize bias. We show that the LD estimator is a good approximation to the bias minimal procedure. Following Ahn and Schmidt (1995, 1997) we exploit the moment conditions E½ui u0i  ¼ s2 I þ s2a 110 , E½ui yi0  ¼ say0 1 with 1 ¼ ½1; . . . ; 10 a vector of dimension T and ui ¼ ½ui1 ; . . . ; uiT 0 where uit ¼ ai þ it . The moment conditions can be written more compactly as " #       vech E½ui u0i  vech I 0 vech 110 2 2 þ sa ¼ s d¼ þ say0 , (4) E½ui yi0  0 1 0 where the redundant moment conditions have been eliminated by use of the vech operator which extracts the upper diagonal elements from a symmetric matrix. Representation (4) makes it clear that the vector d 2 RTðTþ1Þ=2þT is contained in a three dimensional subspace which is another way of stating that there are G ¼ TðT þ 1Þ=2 þ T  3 restrictions imposed on d. This statement is equivalent to Ahn and Schmidt’s (1997) analysis of the number of moment conditions. 11

The LD technique was motivated by higher order theory, and the related higher order theory is presented in Appendix A. Unfortunately, the second order bias calculations do not predict well the performance of the estimator near the unit circle where the model suffers from a near non-identification problem. See Blundell and Bond (1998), who related the problem to the analysis by Staiger and Stock (1997).

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

580

GMM estimators are obtained from the moment conditions by eliminating the unknown parameters s2 ; s2a and say0 . The set of all GMM estimators leading to consistent estimates of b can therefore be described by a ðTðT þ 1Þ=2 þ TÞ  G matrix A which contains all the vectors spanning the orthogonal complement of d. This matrix A satisfies d 0 A ¼ 0. For our purposes it will be convenient to choose A such that d 0 A ¼ ½E½uit Duis ; E½uiT Duij ; E½ui Duik ; E½Du0i yi0 , s ¼ 2; . . . ; T; t ¼ 1; . . . ; s  2; j ¼ 2; . . . ; T  1; k ¼ 2; . . . ; T, P where Dui ¼ ½ui2  ui1 ; . . . ; uiT  uiT1 0 and ui ¼ T 1 Tt¼1 uit . It becomes transparent that any other representation of the moment conditions can be obtained by applying a corresponding nonsingular linear operator C~ to the matrix A.12 It can be checked that there exists a nonsingular matrix C~ such that d 0 AC~ ¼ 0 is identical to the moment conditions (4a)–(4c) in Ahn and Schmidt (1997). We investigate the properties of (infeasible) linear GMM estimators based on E½uit Duis ðbÞ ¼ 0;

E½uiT Duij ðbÞ ¼ 0;

E½ui Duik ðbÞ ¼ 0;

E½yi0 Duit ðbÞ ¼ 0

obtained by setting Duit ðbÞ  Dyit  bDyit1 . Here, we assume that the instruments uit are observable. We partition the moment conditions into two groups where the second group only contains instruments involving yi0 . In particular, let gi1 ðbÞ denote a column vector consisting of uit Duis ðbÞ; uiT Duij ðbÞ; ui Duik ðbÞ. Also let gi2 ðbÞ  ½yi0 Dui ðbÞ. It will turn out that for most estimators used in practice only the second set of instruments are asymptotically relevant under near unit root asymptotics. Finally, let gn ðbÞ  P n3=2 ni¼1 ½gi1 ðbÞ0 ; gi2 ðbÞ0 0 with the optimal weight matrix On  E½gi ðbn Þgi ðbn Þ0 . The 0 infeasible GMM estimator of a possibly transformed set of moment conditions C~ gn ðbÞ then solves ~ ¼ argmin gn ðbÞ0 Cð ~ C~ 0 On CÞ ~ þ C~ 0 gn ðbÞ, bGMM ðCÞ

(5)

b

where C~ is a G  r matrix for 1prpG such that C~ is of full column rank r and ~ C~ 0 OCÞ ~ þ C~ 0 ÞX1. We use ðC~ 0 OCÞ ~ þ to denote the Moore–Penrose inverse. We thus rank ðCð allow the use of a singular weight matrix. Choosing r less than G allows to exclude certain moment P conditions. Let f i;1  qgi1 ðbn Þ=qb, f i;2  qgi2 ðbn Þ=qb, and f n  n3=2 ni¼1 ½f 0i1 ; f 0i;2 0 . The infeasible 2SLS estimator can be written as ~  bn ¼ ðf 0 Cð ~ C~ 0 On CÞ ~ þ C~ 0 f Þ1 f 0 Cð ~ C~ 0 On CÞ ~ þ C~ 0 gn ðbn Þ. bGMM ðCÞ n n n

(6)

~  bn under local to unity asymptotics. We We are now analyzing the behavior of bGMM ðCÞ make the following assumptions.13 12 This specification rules out certain estimators such as the estimator considered by Blundell and Bond (1998) which depends on an additional stationarity condition. We do not impose the stationarity condition and therefore exclude it from the analysis presented in this section. Near unit root asymptotics for the Blundell and Bond estimator are available from the authors upon request. 13 Kruiniger (2000) considers similar local-to-unity asymptotics.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

581

Condition 3. Let14 yit ¼ ai þ bn yit1 þ it where bn ¼ expðc=nÞ for some 14c40 where yi0 ¼ Zi0 þ ai =ðbn  1Þ and Zi0 ¼ i0 =ð1  b2n Þ1=2 . Condition 4. Assume that it ; Zi0 and ai are independent and identically distributed across i for all t. Assume that it satisfies E½it  ¼ 0 and E½jit j4 o1 for all i and t ¼ 1; . . . ; T. Similarly,   E½Zi0  ¼ 0, E½ai  ¼ 0, E½jZi0 j4 o1 and E jai j4 o1 for all i. Let t ¼ ðt1 ; . . . ; tk Þ 2 Rk and 0  ¼ ðit1 ; . . . ; itk Þ, then ft1 ;...;tk ðtÞ  E½eit   is the joint characteristic function with corresponding cumulant generating function ln ft1 ;...;tk ðtÞ. The joint kth order cross-cumulant function is   qk  cumðit1 ; . . . ; itk Þ  ln ft1 ;...;tk ðtÞ .  qt1    qtk t¼0

In the same way define the cumulant function cumðit1 ; . . . ; itk ; Zi0 Þ, cumðit1 ; . . . ; itk ; ai0 Þ and cumðit1 ; . . . ; itk ; Zi0 ; ai Þ. Assume that E½2it  ¼ s2 , E½Z2i0  ¼ s2 =ð1  b2n Þ, E½a2i  ¼ s2a , E½it is  ¼ 0 for tas, E½it Zi0  ¼ 0, E½it ai  ¼ 0 for all t and i, E½ai Zi0  ¼ 0 for all i. Condition 5. In addition to Condition 4 assume that it and ai are all mutually independent for all t ¼ 1; . . . ; T. Remark 1. The moment conditions summarized in Condition 4 comprise the moment conditions (A.1) to (A.4) of Ahn and Schmidt (1997). In addition we impose stationarity of yi0 in our approximating sequence indexed by bn but do not exploit it in estimation. The critical assumption in Condition (3) is that bn ¼ expðc=nÞ ¼ 1  c=n þ oðn1 Þ. Note that linearity of the model implies that one could specify bn ¼ 1  c=n without affecting the results. The parameter bn controls the degree of covariation between regressors and instruments. For the case of GMM estimators based on first differences this covariance amounts to E½Dyt yt2  ¼ Oðbn  1Þ ¼ c=n. Our specification thus implies an asymptotic lack of correlation more stringent than the one used by Staiger and Stock (1997). In Hahn and Kuersteiner (2002b) it is argued that the faster n1 -rate corresponds to a situation of near-nonidentification. Numerical and analytical calculations15 in the 14

An alternative specification is the model yit ¼ ð1  bn Þai þ bn yit1 þ it . We analyze this model in an auxiliary appendix available on http://econ-www.mit.edu/faculty/jhausman/papers.htm. We find that for this model, if the estimator bGMM is based solely on the condition E½ui Duik ðb0 Þ ¼ 0 (7) then bGMM is consistent and thus weak instrument problems can be avoided in this case. If we consider estimators that do use all the moment conditions except (7) then we are in a situation where again the moment conditions in gi1 ðbÞ become asymptotically redundant. Now, however the limiting distribution of the non-redundant conditions has a noncentrality parameter. More explicitly, if bGMM is the GMM estimator based on gi2 ðbÞ then it has the following nonstandard limiting distribution under the weak identification asymptotics adopted here d

ðm þ xx Þ0 CðC 0 OCÞ1 C 0 xy

¼ X, ðm þ xx Þ0 CðC 0 OCÞ1 C 0 ðm þ xx Þ where m ¼ 1T1 s2 =2. It turns out that this limiting distribution is the same as the one obtained when b ¼ pffiffiffi expðc= nÞ as is shown in a separate Appendix also available at http://econ-www.mit.edu/faculty/jhausman/ papers.htm. 15 Available at http://econ-www.mit.edu/faculty/jhausman/papers.htm bGMM  b0 !

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

582

context of our dynamic panel model show that n1=2 -rate asymptotics do not capture the type of finite sample biases that we find important for the behavior of GMM estimators and that the specification bn ¼ expðc=nÞ delivers more accurate predictions. Under the generating mechanism described in the previous definition the following lemma can be established. Lemma 1. Assume that Conditions 3 and 4 hold. For T fixed and as n ! 1 n3=2

n X

p

n3=2

f i;1 ! 0;

n X

i¼1

p

gi;1 ðbn Þ ! 0

i¼1

and n3=2

n X

d

½f 0i;2 ; g0i;2 ðbn Þ0 !½x0x ; x0y 0 .

i¼1

The joint distribution of ½x0x ; x0y 0 Nð0; SÞ with S ¼ ½SS11 SS12  where S11 ¼ dI þ K1 , 21 22 S12 ¼ dM 01 þ K3 , S22 ¼ dM 2 þ K2 , S12 ¼ S021 , and where K1 ; K2 and K3 are defined in Appendix C. We used the definitions for d¼

s2a s2 , c2

and

2

1

6 6 6 M1 ¼ 6 6 4

1 .. 0

0

.

..

.

..

.

3

7 7 7 7; 1 7 5 1

2

2

6 6 1 6 M2 ¼ 6 6 4

1 .. . .. .

0

3

7 7 7 7. .. . 1 7 5 1 2 ..

.

It also follows that " # 0 0 1 On ¼ þ oð1Þ. 0 S22 n2 If in addition, Condition 5 holds then S11 ¼ dI, S12 ¼ dM 01 , S22 ¼ dM 2 . Proof. See Appendix B. It can be shown that the limiting normal distribution of Lemma 1 is singular. An alternative representation of the limiting distribution can be given by eliminating redundant dimensions from ½x0x ; x0y 0 . Let xy ¼ ½x0y0 ; xy1 0 where xy1 is the last element of xy and xu ¼ ½x0x ; xy1 0 . Then xy ¼ Hxu , where H is a ðT  1Þ  T matrix defined in Appendix C. This leads to an alternative representation of the limiting distribution. Lemma 2. Assume that Conditions 3 and 4 hold. Then n3=2

n X

d

½f 0i;2 ; g0i;2 ðbn Þ0 !½x0x ; x0u H 0 0 .

i¼1

Lemmas 1 and 2 imply that the moment conditions involving the initial condition dominate the limiting behavior of GMM estimators. As our analysis below shows,

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

583

moment conditions of the form E½yi;s ði;t  it1 Þ ¼ 0 are asymptotically equivalent to conditions involving yi;0 because the contribution of the innovations contained in yi;s asymptotically disappears. The reason for this result lies in the fact that asymptotically all instruments are weak but also in the stationarity condition which means that asymptotically yi;0 is the dominating source of variation in yi;t . ~  bn is stated in the next corollary. Using Lemma 1 the limiting distribution of bGMM ðCÞ For this purpose we define the augmented vectors x#x ¼ ½0; . . . ; 0; x0x 0 and x#y ¼ ½0; . . . ; 0; xy  0 0 0 0 and partition C~ ¼ ½C~ 0 ; C~ 1 0 such that C~ x#x ¼ C~ 1 xx . Let r1 denote the rank of C~ 1 . ~  bn be given by (6). If Conditions 3 and 4 are satisfied and if Corollary 1. Let bGMM ðCÞ r1 X1 then 0 ~ ~0 ~ þ ~0 d xx C 1 ðC 1 S22 C 1 Þ C 1 Hxu ~ 1! bGMM ðCÞ ¼ xðC~ 1 ; S22 Þ. 0 0 x0x C~ 1 ðC~ 1 S22 C~ 1 Þþ C~ 1 xx

(8)

~ is based on moment conditions that Remark 2. The restriction r1 X1 insures that bGMM ðCÞ use yi0 as an instrument. All estimators used in practice that we are aware of, satisfy this condition.16 Unlike the limiting distribution for the standard weak instrument problem, xðC~ 1 ; S22 Þ, as defined in (8), is based on normal vectors that have zero mean. This degeneracy is generated by the presence of the fixed effect in the initial condition, scaled up appropriately to satisfy the stationarity requirement for the process yit . Inspection of the proof shows that the usual concentration parameter appearing in the limit distribution is dominated by a stochastic component related to the fixed effect. Lemma 1 and Corollary 1 are indicative of the fact, that under the weak instrument asymptotic approximation it makes sense to consider estimators that use only moment restrictions based on initial conditions. Conventional estimators for the dynamic panel model such as the estimators proposed by Arellano and Bover (1995) or Ahn and Schmidt (1997) on the other hand are based on both types of moment conditions. To have a benchmark of procedures under weak instrument asymptotics we define the following class of GMM estimators for the dynamic panel model. ~  argmin g2;n ðbÞ0 C 1 ðC 0 OC ~ 1 Þ1 Definition 1. Let b1GMM be P defined as b1GMM ðC 1 ; OÞ b 1 n 0 3=2 ~ C 1 g2;n ðbÞ, where g2;n ðbÞ  n i¼1 gi2 ðbÞ, O is a symmetric positive definite ðT  1Þ  ðT  1Þ matrix of constants and C 1 is a ðT  1Þ  r1 matrix of full column rank r1 pT  1 such that C 01 C 1 ¼ I r1 . The limiting distribution of the class b1GMM of estimators now follows immediately from Lemma 1 and Corollary 1. This result is summarized in the next Corollary. d

~ Corollary 2. Let b1GMM be defined as in Definition 1. Then b1GMM  1 ! xðC 1 ; OÞ. In Definition 1 the matrix C 1 is restricted to be of full column rank. As the next result shows, this restriction is without loss of generality. Theorem 1. Let C~ be any matrix of dimension G  r and rank r such that 1prpG. For 0 0 q ¼ TðT þ 1Þ=2  2, partition C~ ¼ ½C~ 0 ; C~ 1 0 where C~ 0 is a q  r matrix of rank r0 and C~ 1 is ~ when r1 ¼ 0. In an auxiliary appendix available from the authors we investigate the properties of bGMM ðCÞ

16

ARTICLE IN PRESS 584

J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

a ðT  1Þ  r matrix of rank r1 X1. Then there exists a ðT  1Þ  r1 matrix C 1 of full column d rank such that xðC~ 1 ; S22 Þ ¼ xðC 1 ; S22 Þ and C 01 C 1 ¼ I r1 . This result is important for the analysis of the bias minimal estimator in the class of all estimators based on the AS moment conditions as well as for the analysis of the estimator that minimizes quadratic loss under near unit root asymptotic approximations. It shows that for these optimality considerations, the class b1GMM fully describes the set of all possible estimators that implicitly or explicitly use yi0 as an instrument. As we show later in Section 4, under the near non-identification asymptotics, commonly used estimators are not optimal in this class. Next we turn to the analysis of the asymptotic bias for the estimators b1GMM of the dynamic panel model. Since the limit only depends on zero mean normal random vectors we can apply the results of Smith (1993). ~ be the limiting distribution of b1 Theorem 2. Let xðC 1 ; OÞ GMM  1 in Definition 1 under 1 0 0 ~ ¯  ðD þ D0 Þ=2, where D ¼ W S21 S1 Conditions 3 and 4. Let D 11 and W ¼ C 1 ðC 1 OC 1 Þ C 1 . Define L such that S11 ¼ LL0 . Let G be the matrix of eigenvectors with corresponding eigenvalues li of L0 WL. Assume l1 X    XlT1 . Let L ¼ diagðl1 ; . . . ; lT1 Þ and L1 be the upper right block of L containing all the non-zero eigenvalues. Partition G ¼ ½G1 ; G2  where G1 are the eigenvectors corresponding to the nonzero eigenvalues of L0 WL such that G1 L1 G01 ¼ L0 WL and G1 G01 þ G2 G02 ¼ I. Then ~ ¼ Cðl¯ 1 ; G0 L0 DLG ¯ E½xðC 1 ; OÞ 1 ; L1 ; rÞ 1 1 1 X ð1Þk ð2 Þ1þk 1;k 0 ¯ 1 ; I r  l¯ 1 L1 Þ, C 1þk ðG1 DG  l¯ 1 ðr =2Þ k! 1 1þk k¼0 where r ¼ rankðW Þ, ðaÞb is the Pochhammer symbol Gða þ bÞ=GðbÞ, C 1;k pþk ð:; :Þ is a top order invariant polynomial defined by Davis (1980) and l¯ is the largest eigenvalue of L0 WL. The ~ exists for rX1. mean E½xðC 1 ; OÞ If in addition Condition 5 holds then ~ ¼ Cðl¯ 1 ; G0 DG ¯ 1 ; L1 ; rÞ, E½xðC 1 ; OÞ 1 where D ¼ WM 1 and L ¼ I. Proof. See Appendix B. The theorem shows that the bias of b1GMM both depends on the choice of C 1 and the ~ Note for example that E½xðC 1 ; I T1 Þ ¼ tr D=r ¯ 1. weight matrix O. 1 ~ The problem of minimizing the bias of bGMM by choosing optimal matrices C 1 and O does not seem to lead to an analytical solution but could in principle be carried out numerically for a given number of time periods T. For our purpose we are not interested in such an exact minimum. Instead we analyze optimal choices of C 1 minimizing bias for a ~ are most relevant in practice, namely O ~ ¼ I T1 and given weight matrix. Two choices of O ~ O ¼ S22 . Under certain additional distributional restrictions we are able to obtain an analytical solution for the bias minimal estimator in the class of estimators defined in Definition 1. We use this analytical optimum as a bench mark against which we judge existing estimators that have been proposed in the literature.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

585

~ be defined as in Corollary 2. Assume that Conditions 3–5 are Theorem 3. Let xðC 1 ; OÞ ~ 1 Þ1 C 0 M 1 . Let G1 be as defined in ¯ ¼ ðD þ D0 Þ=2 where D ¼ C 1 ðC 01 OC satisfied. Let D 1 Theorem 2. Then min

C 1 s:t: C 01 C 1 ¼I r1 r1 ¼1;...;T1

jE½xðC 1 ; I T1 Þj ¼

min

C 1 s:t: C 01 C 1 ¼I r1 r1 ¼1;...;T1

jE½xðC 1 ; S22 Þj.

Moreover, ¯ 1. E½xðC 1 ; I T1 Þ ¼ tr D=r Let C n1 ¼ argminC 1 jE½xðC 1 ; I T1 Þj subject to C 01 C 1 ¼ I r1 ; r1 ¼ 1; . . . ; T  1. Then C n1 ¼ ri where ri is the eigenvector corresponding to the smallest eigenvalue l i of M 2 . Thus, ¯ 1 ¼  min l i =2. As T ! 1 the smallest eigenvalue of M 2 , min l i ! 0. minC tr D=r 1

Theorem 3 shows that the estimator that minimizes the bias is based only on a single moment condition which is a linear combination of the moment conditions involving yi0 as instrument where the weights are the elements of the eigenvector ri corresponding to the smallest eigenvalue of M 2 . Optimal estimators based on C n1 are unintuitive to implement. Moreover, optimality only holds near the unit circle. For these reasons we do not recommend direct implementation of these optimal procedures. We instead use them as benchmarks against which we measure the performance of more easily implementable estimators. 4. Near unit root approximations for infeasible estimators We turn now to an analysis of bias and MSE of commonly used estimators under near unit root asymptotics. All the estimators we consider can be described in terms of linear transformations of the basic moment conditions discussed by Ahn and Schmidt (1995). In this section we carry out the analysis assuming that the innovations uit are observable instruments. The case of feasible estimators where uit is replaced with an estimate is analyzed in Section 5. To represent the Arellano and Bond (AB) estimator based on the moment conditions E½yi;s ðit  i;t1 Þ ¼ 0;

s ¼ 0; . . . ; t  2

we define the matrix C AB;n such that gi;AB ¼ C AB;n ½gi1 ðbÞ0 ; gi2 ðbÞ0 0 where gi;AB contains the elements yi;s ðit  i;t1 Þ. The estimator b^AB , explicitly defined in Appendix C, is the infeasible GMM estimator based on gi;AB . One of the first instrumental variables estimators proposed in the literature is the estimator of Anderson and Hsiao (1981) based on the moment condition P E½ Tt¼2 yi;t2 ði;t  i;t1 Þ ¼ 0. It can be expressed in terms of a matrix C AH;n , defined in the Appendix, as C b^AH  bn ¼

AH;n0

gn ðbn Þ . AH;n0 C fn

To represent the estimator of Arellano and Bover (1995) denote by uni the orthogonalized deviations obtained by the Helmert transform17 of ui . Define 17

The Helmert transform is defined in Appendix A.

ARTICLE IN PRESS 586

J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

ðgni;1 ðbÞ0 ; gni;2 ðbÞ0 Þ0 which is obtained by replacing Duit with unit in ðgi;1 ðbn Þ0 ; gi;2 ðbn Þ0 Þ0 . Arellano and Bover (1995) estimator b^ABH defined in Appendix C, Eq. (21), is the infeasible GMM estimator based on the moment gi;ABH ¼ C AB;n0 ðgni;1 ðbÞ0 ; gni;2 ðbÞ0 Þ0 . We analyze the infeasible version of the GMM estimator based on the moment conditions (4a) and (4b) of Ahn and Schmidt (1997). We do not use the moment conditions (4c) to insure comparability with the LD estimator where we do not exploit (4c). We call this estimator the ASM estimator because it is based on the linear moment conditions of Ahn and Schmidt.18 The selector matrix C ASM;n defined in Appendix C selects the relevant moment conditions gi;ASM ¼ C ASM;n ½gi1 ðbÞ0 ; gi2 ðbÞ0 0 for the ASM estimator. The infeasible is denoted by b^ASM and explicitly defined in Appendix C. GMM estimator based on g i;ASM

The LD estimator is based on the moments Eyi0 ðuiT  ui1 Þ ¼ 0 as well as Euit ðuiT  ui1 Þ ¼ 0 for t ¼ 2; . . . ; T  1. We define the selector matrix C LD such that the LD moment 0 0 0 LD conditions collected in gi;LD can be written as gi;LD ¼ C LD0 ½gLD i1 ðbÞ ; gi2 ðbÞ  where gi1 ðbÞ denotes the moment conditions basedd on uit ðuiT  ui1 Þ. The infeasible GMM estimator using these moment conditions is denoted by b^LD and explicitly defined in Appendix C. The limiting distributions of the estimators b^AB , b^AH , b^ABH , b^ASM and b^LD are summarized in the following lemma. AH AB ASM Lemma 3. Assume Conditions 3–5 are satisfied. Let the matrices C AB and 1 , C1 , C1 , C1 LD n C 1 as defined in Appendix C. Then, for B defined in (20), 0 AB þ AB0 ðC AB0 S22 C AB d x C 1 Þ C 1 Hxu b^AB  1 ! x 0 1 AB 1 AB0  xAB , þ AB0 xx C 1 ðC 1 S22 C AB 1 Þ C 1 xx AH0 Hx d C b^AH  1 ! 1 AH0 u  xAH , C 1 xx 0 n0 AB þ AB0 n ðC AB0 Bn S22 Bn0 C AB d x B C 1 Þ C 1 B Hxu b^ABH  1 ! x 0 n0 1 AB 1 AB0 n  xABH , þ AB0 n xx B C 1 ðC 1 B S22 Bn0 C AB 1 Þ C 1 B xx 0 ASM ðC ASM0 S22 C ASM Þþ C ASM0 Hxu d x C 1 1 b^ASM  1 ! x 0 1 ASM 1 ASM0  xASM , ASM þ ASM0 xx C 1 ðC 1 S22 C 1 Þ C 1 xx 0 LD þ LD0 ðC LD0 S22 C LD d x C 1 Þ C 1 Hxu b^LD  1 ! x 0 1 LD 1 LD0  xLD . þ LD0 xx C 1 ðC 1 S22 C LD 1 Þ C 1 xx d

d

d

d

It follows that xASM ¼ xAB ¼ xABH and xLD ¼ xAH where ¼ means equal in distribution. One implication of Lemma 3 is that estimators which are asymptotically equivalent under standard asymptotics where identification is strong continue to be equivalent under 18

In this section we use observed innovations as instruments. In other words, for the near unit root asymptotics we do not consider nonlinear estimation of the AS estimator in order to simplify the limit theory while for the Monte Carlo simulations we do implement a nonlinear version of this estimator. The moment conditions used for estimation are the same in both cases.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

587

the weak instrument asymptotics imployed here as long as they implicitly or explicitly use the same moment conditions involving yi0 . To be more concrete, consider the Arellano and Bond (AB) estimator based on the moment conditions E½yis Duit ðbn Þ ¼ 0 and an alternative estimator based on the moment conditions E½yi0 Duit ðbn Þ ¼ 0 as well as E½uis Duit ðbn Þ ¼ 0 for s ¼ 0; . . . ; t  2 and t ¼ 2; . . . ; T. Both estimators are equivalent under standard asymptotic approximations. Under the alternative near unit root asymptotics of this section the first version of the estimator converges in distribution to xAB while the second 0 1 estimator converges in distribution to the random variable x0x S1 22 Hxu =xx S22 xx . Note that ASM the column rank of C 1 is T  1. By Theorem 1 it then follows that 0 1 x0x S1 Hx =x S x ¼ x . In other words the limiting distribution is unchanged in this u AB 22 x 22 x example because both sets of moment conditions ‘‘span’’ the same moment conditions involving yi0 . In Section 5 we see that this equivalence may break down once uit is replaced with an estimated instrument. The limiting distributions of the estimators b^AB , b^ABH , b^ASM and b^LD can be expressed in d

d

terms of the general result of Corollary 1. We have xAB ¼ xðC AB 1 ; S22 Þ, xASM ¼ d

d

LD LD xðC ASM ; S22 Þ, xABH ¼ xðBn0 C AB ¼ 1 1 ; S22 Þ and xLD ¼ xðC 1 ; S22 Þ where now W LD LD0 LD þ LD0 C 1 ðC 1 S22 C 1 Þ C 1 is singular of rank 1. The next result gives explicit formulas for the bias of the asymptotic limiting distribution of the ASM, Arellano and Bond, Arellano and Bover and long difference estimators. ABH ; S22 Þ, xðC AB ; S22 Þ and xðC LD Corollary 3. Let xðC ASM 1 1 ; S22 Þ; xðC 1 1 ; S22 Þ be the limiting distribution of the ASM, Arellano– Bond, Arellano– Bover and Long Difference estimators ¯ a  ðDa þ Da0 Þ=2, where Da ¼ W a M 01 and W a ¼ under Condition 3. Let D a a0 a þ a0 C 1 ðC 1 S22 C 1 Þ C 1 for a ¼ f‘‘ ASM’’ ; ‘‘ AB’’ ; ‘‘ ABH’’ ; ‘‘ LD’’ g. Let l¯ a be the largest eigenvalue of W a . Then for a ¼ f‘‘ ASM’’ ; ‘‘ AB’’ ; ‘‘ ABH’’ g, a ¯a E½xðC a1 ; S22 Þ ¼ Cðl¯ 1 a ; D ; W ; T  1Þ.

Let G the matrix of eigenvectors of W LD with G ¼ ½G1 ; G2  where G1 is the eigenvector corresponding to the nonzero eigenvalue of W LD such that G1 L1 G01 ¼ W LD and G1 G01 þ G2 G02 ¼ I. Define z1 ¼ L1 G01 xx and z2 ¼ L1 G02 xx . Then, for a ¼ f‘‘ LD’’ g, 0 ¯a E½xðC a1 ; S22 Þ ¼ Cðl¯ 1 a ; G1 D G1 ; L1 ; 1Þ.

Numerical procedures to evaluate the top order invariant polynomials appearing in the expressions for the bias are discussed in Smith (1993). Implementation of these methods is quite complicated and in our numerical work we rely on Monte Carlo integration to evaluate the moments of various procedures instead. 5. Near unit root approximations for feasible estimators Feasible versions of the estimators discussed in Section 4 depend on first stage parameter estimates for the estimation of instruments and weight matrices. When identification of the model is weak, these objects can no longer be estimated consistently and their presence potentially affects the limiting distribution. In this section we derive results for a general class of feasible estimators and then specialize these results to a few procedures of interest.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

588

For the Arellano and Bover (1995) estimator we note that the optimal weight matrix 0 OAB n is the block diagonal matrix with blocks Ezit zit where zit has the representation 3 2 3 2 0  0 1 7 6 b0 7 6 0  07 6 n 6 bn 7 7 6 7yi0 . 6 zit ¼ 6 . .. 7ui þ 6 .. .. 7 7 6 .. . . . 5 4 5 4 t1 t2 0 b n bn    bn 0 Note that E½ui u0i  ¼ s2a 1T 10T þ s2 I T , E½y2i0  ¼ s2 =ð1  b2n Þ þ s2a =ðbn  1Þ2 and E½ui yi0  ¼ s2a =ðbn  1Þ. It follows that n2 E½zit z0it  ! s2a =c2 1t 10t . Then by a strong law of large P ^ AB !p OAB ¼ s2 =c2 diag numbers it follows that n3 ni¼1 zit zit !p s2a =c2 1t 10t and O a n ^ AB is a block diagonal matrix with blocks n3 Pn zit zit ð1; 12 102 ; . . . ; 1T1 10T1 Þ where O n i¼1 for t ¼ 1; . . . ; T  1 on the diagonal. The feasible version of the Arellano and Bover (1995) estimator can be written as ^ AB Þþ C AB;n0 f n Þ1 f n0 C AB;n ðO ^ AB Þþ C AB;n0 gn b^FABH  bn ¼ ðf nn0 C AB;n ðO n n n n n d

x0 Bn0 C AB ðOAB Þþ C AB0 Bn Hx ! x 0 n0 1 AB AB þ 1 AB0 n u  xFABH . xx B C 1 ðO Þ C 1 B xx d

n n0 AB Note that s2 OAB ¼ C AB0 such that xFABH ¼ xABH . Using the limiting 1 B S22 B C 1 representation for the Arellano and Bover estimator we consider feasible two stage estimators where the vector ui is replaced by u^ i ¼ ½yi1  b^FABH yi0 ; . . . ; yiT  b^FABH yiT1 0 . It follows that u^ it ¼ uit þ ðb^FABH  bn Þyit1 . We can then write 3 2 2 3 0  0 1 6 b0 6 7 0  07 7 6 n 6 bn 7 7 6 ^ ^ 6 7yi0 . (9) u^ i ¼ ui þ ðbFABH  bn Þ6 . .. 7ui þ ðbFABH  bn Þ6 .. .. 7 7 6 .. . . . 4 5 5 4 bnT1 bT2    b0n 0 n

Using more compact notation this can be written as u^ i ¼ ui þ ðb^FABH  bn ÞðB¯ n ui þ b¯ n yi0 Þ where B¯ n and b¯ n are defined corresponding to (9). The feasible moment conditions then are g^ i1 ¼ BðI TðT1Þ þ ðb^FABH  bn ÞðI T1 B¯ n ÞÞðDui ui Þ þ ðb^FABH  bn ÞBðI T1 b¯ n Þgi2 , where the matrix B is defined in Appendix C. In the same way the derivative f^i;1 ¼ qg^ i1 =qb is given by ¯ n ÞÞðDyi;1 ui Þ f^i;1 ¼ BðI TðT1Þ þ ðb^FABH  bn ÞðI T1 B þ ðb^FABH  bn ÞBðI T1 b¯ n Þf i2 . These representations are suggestive of the fact that under the near unit root asymptotics, two stage estimators are not feasible in the usual sense because the first stage estimator b^FABH is not consistent and thus affects the limiting distribution of the two stage estimator. This problem is in fact the same as the inconsistency found in 2SLS under weak instrument asymptotics as shown by Staiger and Stock (1997). The next lemma establishes the

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

589

distributional properties of sample moments that are used to define the two stage estimator. Lemma 4. Assume Conditions 3 and 4 are satisfied. Then 2 3 xABH BðI T1 1T Þ 0 " # n 6 7 xx X I T1 0 0 7 0 0 3=2 0 0 d 6 ^ n ½f i;1 ; f i;2 ; g^ i;1 gi;2  !6 . 7 0 xABH BðI T1 1T Þ 5 Hxu 4 i¼1 0 I T1 The feasible GMM estimator not only relies on estimated residuals but also is based on an estimate of the optimal weight matrix. For the estimate of the weight matrix we need to keep in mind that gi is a function of bn because it depends on Dui ðbn Þ. To construct estimators of the second moments of gi , bn needs to be replaced by an estimate. Thus the weight matrix is defined as ^ n ¼ n3 O

n X

½g^ 0i;1 ðb^FABH Þ; g0i;2 ðb^FABH Þ0 ½g^ 0i;1 ðb^FABH Þ; g0i;2 ðb^FABH Þ,

i¼1

where it becomes transparent that under near unit root asymptotics the estimation error does not vanish even asymptotically. Lemma 5. Assume Conditions 3 and 4 are satisfied. Let " # xABH BðI T1 1T Þ x2ABH BðI T1 1T Þ Rx ðxABH Þ ¼ . xABH I T1 I T1 It then follows that d ^n! Rx ðxABH ÞSRx ðxABH Þ0 . O

The general form of the feasible GMM is given by 0 ^ n CÞþ C 0 f^ Þ1 f^0 CðC 0 O ^ n CÞþ C 0 g^ n ðbn Þ, b^FGMM  bn ¼ ðf^n CðC 0 O n n P P 0 where f^n ¼ n3=2 ni¼1 ½f^i;1 ; f 0i;2 0 and g^ n ðbn Þ ¼ n3=2 ni¼1 ½g^ 0i;1 ðbn Þ; g0i;2 ðbn Þ0 . The next Corollary summarizes the limiting distribution of the feasible two stage estimator which is based on the Arellano and Bover estimator in the first stage.

Theorem 4. Assume Conditions 3 and 4 are satisfied. Let B be as defined in Eq. (19) in Appendix C. Let " # " # xABH Bðxx 1T Þ xABH BðHxu 1T Þ x^ x ðxABH Þ ¼ ; x^ y ðxABH Þ ¼ . xx Hxu Then it follows that ^ x ðxABH Þ0 CðC 0 Rx ðxABH ÞSRx ðxABH Þ0 CÞþ C 0 x^ y ðxABH Þ d x ¼ xFGMM . b^FGMM  bn ! x^ x ðxABH Þ0 CðC 0 Rx ðxABH ÞSRx ðxABH Þ0 CÞþ C 0 x^ x ðxABH Þ As before the LD estimator requires a slightly modified treatment because the moment conditions are more easily expressed directly as functions of the underlying innovations. The LD estimator is based on the moments E½yi0 ðuiT  ui1 Þ ¼ 0 as well as

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

590

E½uit ðuiT  ui1 Þ ¼ 0 for t ¼ 2; . . . ; T  1. As before, we replace unobserved innovations uit LD by estimates u^ it based on b^FABH such that g^ LD ðDui u^ i Þ and f^i;1 ¼ BLD ðDyi;1 u^ i Þ i;1 ¼ B Pn ^LD0 0 0 3=2 ^ n ðbn Þ ¼ n3=2 where BLD is defined in (22). Define f^LD n ¼n i¼1 ½f i;1 ; f i;2  and g Pn 0 LD0 0 ^ i;1 ðbn Þ; gi;2 ðbn Þ . The feasible LD estimator then is defined as i¼1 ½g ^ LD C LD;n Þþ C LD;n0 g^ LD f^ LD0 C LD;n ðC LD;n0 O n n b^FLD  bn ¼ n LD0 LD;n LD;n0 LD , ^ C LD;n Þþ C LD;n0 f^LD f^ C ðC O n

n

n

where the weight matrix is estimated as n X 0 LD0 ^ 0 ^ ^ ^ LD ¼ n3 ^ i;1 ðbFABH Þ; g0i;2 ðb^FABH Þ. ½g^ LD0 O n i;1 ðbFABH Þ; gi;2 ðbFABH Þ ½g i¼1

The next result establishes the limiting distribution of the feasible LD estimator. Theorem 5. Assume Conditions 3 and 4 are satisfied. Let BLD be as defined in (22) in Appendix C. As before we use the notation " # " # xABH BLD ðxx 1T Þ xABH BLD ðHxu 1T Þ LD LD x^ x ðxABH Þ ¼ ; x^ y ðxABH Þ ¼ . xx Hxu defined by replacing B with BLD in Rx Then, it follows that for RLD x b^FLD  b n

0 LD 0 LD þ LD0 ^ LD LD ^ LD ðC LD0 RLD Þ C xy ðxABH Þ d xx ðxABH Þ C x ðxABH ÞSRx ðxABH Þ C ! LD ^x ðxABH Þ0 C LD ðC LD0 RLD ðxABH ÞSRLD ðxABH Þ0 C LD Þþ C LD0 x^ LD ðxABH Þ x

x

x

x

 xFLD . Furthermore, it follows that xFLD ¼ xLD . Iterated versions of the LD estimator have limiting distributions that can be defined recursively for the jth step. If b^FLD;j is the feasible LD estimator based on the jth iteration then its limiting distribution is given by b^FLD;j  bn 0 LD 0 LD þ LD0 ^ LD LD ^ LD ðC LD0 RLD Þ C xy ðxFLD;j1 Þ d xx ðxFLD;j1 Þ C x ðxFLD;j1 ÞSRx ðxFLD;j1 Þ C ! LD ^x ðxFLD;j1 Þ0 C LD ðC LD0 RLD ðxFLD;j1 ÞSRLD ðxFLD;j1 Þ0 C LD Þþ C LD0 x^ LD ðxFLD;j1 Þ x x x x  xFLD;j . From Theorem 5 it follows immediately, that xFLD;j does not depend on xFLD;j1 and in fact xFLD;j ¼ xLD for all j. We use the results in Theorems 4 and 5 to evaluate bias and MSE of feasible estimators. Due to the complex nature of the limiting distribution for the general feasible GMM estimator these evaluations cannot be done analytically. Numerical evaluation on the other hand is possible. 6. Numerical evaluation of near unit root approximations We calibrate our numerical evaluations to b ¼ :95 and n ¼ 100 which leads to a value of c ¼ 100ð log :95Þ. We set s2a ¼ s2 ¼ 1 and assume that it and ai are independent as in

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

591

Table 2 Median bias of near unit root approximations T

ASM/AB/ABH

LD/AH

Optimal

5 6 7 8 9 10

0.6619 0.5694 0.5006 0.4510 0.4121 0.3722

0.2445 0.1822 0.1677 0.1523 0.1279 0.1083

0.1910 0.1340 0.0990 0.0761 0.0603 0.0489

Condition 5 in order to eliminate nuisance parameters related to higher order moments of the innovations. We then generate 10,000 random draws from the distribution of x. Using these values we compute the asymptotic distribution of various estimators as established in Corollary 1, Lemma 3 and Theorems 4 and 5. Table 2 reports numerical values for the biases of the linear estimator based on the moment conditions of Ahn and Schmidt (ASM), the Arellano and Bover (ABH) and the Arellano and Bond (AB) estimator under near unit root asymptotics and compares them with biases for the LD estimator as well as the bias minimal estimator. We report median biases for numerically evaluated biases in columns AB/ASM/ABH and LD/AH while the last column reports theoretical lowerbounds for the mean bias based on Theorem 3. For AB/ASM/ABH median and mean are almost identical while for LD/AH the mean does not seem to be reliably evaluated by simulation techniques although it exists theoretically. Despite the fact that LD does not achieve the same bias reduction as the fully optimal estimator, it has significantly less bias than the more commonly used implementations of the ABH estimator. Lemma 3 also shows that under the near unit root asymptotics adopted here, there is no difference between the ASM, ABH and AB estimators. We now compare predictions of the near unit root approximation and the higher order approximation reported in Section A.1 to the actual finite sample performance of different estimators. Since there is no difference between ASM, AB and ABH under local to unity asymptotics we focus on the ABH procedure for which we have reported Monte Carlo results in Table 3. Considering the results in Table 3 we note that for ABH the bias predicted by higher order asymptotic theory for b ¼ :95, n ¼ 100 and T ¼ 5 is 3:84 while the prediction of the near unit root approximation in Table 2 is :66. The actual small sample bias found in Monte Carlo experiments and reported in Table 3 is :598. For the LD estimator the higher order theory predicts a bias of 1:8, the near unit root theory predicts a bias of :25 and the actual small sample bias is :185. These results suggest that the near unit root theory does a much better job of approximating the bias near the unit circle than the higher order theory which tends to overestimate bias by a large margin. In Table 4 we report the median bias of the feasible estimators under near unit root approximations derived in Section 5. For the feasible version of the Arellano and Bover estimator (FABH) the results are the same as for the infeasible version, while the feasible ASM19 estimator shows an increase in 19

The poor properties of the limiting distribution for the feasible version of the ASM estimator may be due to the fact that for the near unit root asymptotics we use estimated residuals rather than a nonlinear estimator.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

592

Table 3 Properties of selected estimators: higher order theory and practice bbLD2AB

bbLD1

Bias predicted by higher order theory 0.10 100 5 0.017 0.50 100 5 0.054 0.90 100 5 1.019 0.95 100 5 3.849

0.000 0.007 0.449 1.800

0.001 0.003 0.119 0.300

0.001 0.009 0.081 0.793

0.001 0.013 0.214 1.591

0.001 0.014 0.317 2.179

Actual bias 0.10 100 0.50 100 0.90 100 0.95 100

0.015 0.050 0.452 0.598

0.001 0.005 0.142 0.185

0.002 0.005 0.077 0.112

0.002 0.012 0.037 0.069

0.002 0.019 0.001 0.030

0.002 0.024 0.010 0.010

RMSE predicted by higher order theory 0.10 100 5 0.081 0.50 100 5 0.133 0.90 100 5 0.965 0.95 100 5 3.200

0.099 0.097 0.609 2.557

0.102 0.120 1.181 6.296

0.102 0.134 1.449 8.863

0.102 0.142 1.561 10.544

0.102 0.147 1.633 11.600

Actual RMSE 0.10 100 0.50 100 0.90 100 0.95 100

0.099 0.097 0.188 0.215

0.102 0.120 0.170 0.173

0.102 0.137 0.195 0.183

0.102 0.158 0.255 0.231

0.102 0.172 0.285 0.278

b

n

T

5 5 5 5

5 5 5 5

bbFABH

0.081 0.131 0.552 0.690

bbLD2

bbLD3

bbLD4

Table 4 Median bias of near unit root approximations T

FABH

FASM

FLD

FLD(4)

5 6 7 8 9 10

0.6619 0.5694 0.5006 0.4510 0.4121 0.3722

0.8250 0.7903 0.7704 0.7173 0.6905 0.6586

0.2445 0.1822 0.1677 0.1523 0.1279 0.1083

0.2445 0.1822 0.1677 0.1523 0.1279 0.1083

the median bias relative to the infeasible version. This is an indication that the additional overidentifying restrictions, when combined with estimated instruments, further increase the bias problems of this particular specification. The feasible LD estimator on the other hand is invariant to first stage estimation of the instruments and has the same distribution as the infeasible version. Turning now to the same comparison for the MSE we note that the higher order prediction for the MSE of ABH in Table 3 is ð3:200Þ2 ¼ 10:24, for the near unit root approximation in Table 5, it is .7411 and the actual MSE from Table 3 is ð0:690Þ2 ¼ :4761. As before for the median bias feasible ABH has the same MSE as the infeasible version, a fact that is predicted by our theory. We see again that even though the near unit root approximation is not entirely accurate it does much better in terms of predicting the right

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

593

Table 5 MSE of near unit root approximations T

Optimal

Moments

ASM/ABH

FABH

FASM

Moments (ASM)

5 6 7 8 9 10

0.6506 0.4687 0.3159 0.2107 0.1513 0.1079

3 4 4 4 4 4

0.7411 0.5584 0.4640 0.3702 0.3149 0.2636

0.7411 0.5584 0.4640 0.3702 0.3149 0.2636

1.2152 1.1012 1.0026 0.9311 0.8823 0.8466

17 24 32 41 51 62

magnitude of the MSE than is the case for the higher order theory. It also should be noted that in the case of the near unit root approximation a number of moment conditions become irrelevant asymptotically. The associated drop in the degree of overidentification leads to an increase in the size of the variance. This is also the reason why for the LD estimator the MSE does not exist under the near unit root approximation (this follows by checking the conditions for the existence of moments in Smith 1993, p. 272). Under these asymptotics, the LD estimator asymptotically does only depend on one instrument, the initial observation yi0 , and thus does not possess enough overidentifying restrictions for the MSE to exist. This remains true for the feasible version of the estimator which, under near unit root asymptotics, has the same distributional properties as the infeasible version. In order to better understand the trade off between bias reduction and quadratic error loss under the near unit root asymptotics we now turn to a more detailed analysis of the d

MSE implied by the asymptotic distribution under Condition 3. Since b1GMM  1 ! x ðC 1 ; S22 Þ we take E½xðC 1 ; S22 Þ2 as a measure for the MSE of b1GMM . Unlike in the case of bias it is not feasible to find analytical optima for the problem of minimizing E½xðC 1 ; S22 Þ2 with respect to C 1 . Nevertheless, for a given value of T and a particular point in the parameter space, such optimization can be carried out numerically by Monte Carlo integration techniques and subsequent numerical optimization. We carry out such a numerical exercise to establish that conventional estimators which are optimal under standard first order asymptotic theory fail to be admissible under weak identification circumstances. In our numerical examples we calibrate the asymptotic distribution to the case where b ¼ :95 and n ¼ 100. This implies a value for c ¼ 100ð log :95Þ. In Table 5 we compare E½xðC n1 ; S22 Þ2 to E½xðC ASM ; S22 Þ2 for different values of T. Here 1 C n1 ¼

arg min C 1 s:t: C 01 C 1 ¼I r1 r1 ¼3;...;T1

E½xðC 1 ; S22 Þ2 .

Note that the minimal dimension of C 1 is 3 to guarantee the existence of the second moment. The results of Table 5 show that the MSE of the ASM, AB and ABH estimators is clearly dominated by the optimal procedure. It also shows that the optimal number of moment conditions increases slowly with T and is much lower than the number of moments used by Ahn and Schmidt. In fact, the optimal procedure uses only about 1=10 of the available moments. These results show that the optimality arguments under standard asymptotic approximations break down when the model is weakly identified. While it is optimal

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

594

Table 6 IQR of near unit root approximations T

ASM/AB/ABH

LD/AH

FABH

FASM

FLD

5 6 7 8 9 10

0.6798 0.6110 0.5403 0.4761 0.4424 0.4071

1.3415 1.1827 1.1381 1.0182 0.9694 0.9436

0.6798 0.6110 0.5403 0.4761 0.4424 0.4071

1.0007 0.9973 1.0285 1.0291 1.0280 1.0339

1.3415 1.1827 1.1381 1.0182 0.9694 0.9436

from an asymptotic variance and thus MSE point of view under standard asymptotics to use as many instruments as possible, this argument is invalid under the near nonidentification asymptotics we use here. The GMM estimator that minimizes the asymptotic MSE under these circumstances uses a subset of moment conditions that is much smaller than the set of all available moment conditions. Some intuition can be gained from our analytical results regarding bias minimization. In Theorem 3 it is shown that the bias minimal procedure in fact only uses a single moment condition. When focusing on the MSE this bias reduction needs to be balanced off against efficiency considerations. Our numerical evaluation of the MSE minimization problem shows that the optimum is an interior one with respect to the number of moment conditions used. This implies that neither the Bias minimal estimator that uses only one instrument, nor the first order efficient estimator that uses all available instruments is optimal in the MSE sense under the local to unity asymptotics adopted here. However, we cannot compare the MSE of the ASM and AB estimators to the LD estimator analytically because under the near unit root asymptotic limiting distribution the LD estimator has unbounded second moments. In order to gain some insight into the dispersion of the LD estimator we report inter quartile ranges of various procedures in Table 6. The results are further evidence of the fact that feasible ABH is the same as infeasible ABH while on the other hand the feasible ASM estimator is more dispersed than the infeasible version. The lack of moments problem for the LD estimator manifests itself in higher inter quartile ranges. The findings with respect to the relationship between bias and dispersion of estimators are somewhat similar to our findings in Hahn et al. (2004) where we found that 2SLS estimators do best in terms of IQR but have significantly larger biases while estimators that do better with bias such as LIML or Fuller’s estimator, tend to have larger IQR than 2SLS. At the same time the Monte Carlo results are consistent with the analytical result that the optimal estimator does not use all the moment conditions. Indeed, the optimal estimator uses only a small fraction of the moment conditions. The LD estimator does well in terms of MSE because of its small amount of bias and its use of a limited set of moment conditions. Implementation of a procedure that is optimal in terms of its MSE under nonstandard asymptotics is not recommended because such a procedure depends on the unknown parameter b through d. Instead, our Monte Carlo results show that the LD estimator is a reasonable, very easily implementable approximation to such an optimal estimator.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

595

7. Conclusion We have considered the dynamic panel data model where it has been recognized that the usual IV (GMM) estimators have substantial bias for a large positive b, which commonly occurs in applied research. Our approach is an IV estimator that uses a reduced set of instruments, in particular ‘‘long differences’’. This estimator leads to a significant reduction in bias as predicted by a ‘‘weak instrument’’ analysis. Further, the long difference (LD) estimator does well in MSE comparisons with existing estimators. We then conduct a ‘‘local to unity’’ analysis to analyze why the LD estimator does significantly better than traditional second order theory predicts. We find that near the unit circle the optimal estimator in terms of bias uses only one moment restriction, similar to the LD estimator. We also find that the optimal estimator in terms of MSE uses a much restricted set of moment restrictions, typically about 1/10 of the available restrictions. Thus, our analysis demonstrates that the usual first order asymptotic advice of using all of the moment restrictions does not provide proper guidance in the dynamic panel data model for large b. Instead a restricted set of moment conditions lead to a better estimator. The LD estimator, which uses a restricted set of moment conditions, provides such an improved estimator as our Monte Carlo results demonstrate. Acknowledgment We are grateful to Yu Hu for exceptional research assistance. Helpful comments by Guido Imbens, Whitney Newey, James Powell, and participants of workshops at Arizona State University, Berkeley, Harvard/MIT, Maryland, Triangle, UCL, UCLA, UCSD are appreciated. Hahn gratefully acknowledges financial support from NSF Grant SES0313651. Kuersteiner gratefully acknowledges financial support from NSF Grant SES0095132 and SES-0523186. Appendix A. Higher order theory We consider a version of the GMM estimator developed by Arellano and Bover (1995), which simplifies the characterization of the ‘‘weight matrix’’ in GMM estimation. We define the innovation uit  ai þ it . Arellano and Bover (1995) eliminate the fixed effect ai in (1) by applying Helmert’s transformation rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  T t 1 n uit  ðui;tþ1 þ    þ uiT Þ ; t ¼ 1; . . . ; T  1 uit  T tþ1 T t instead of first differencing.20 The transformation produces ynit ¼ bxnit þ nit ;

t ¼ 1; . . . ; T  1,

where xnt  yni;t1 . Let zit  ðyi0 ; . . . ; yit1 Þ0 . Our moment restriction is summarized by E½zit nit  ¼ 0, t ¼ 1; . . . ; T  1. It can be shown that, with the homoskedasticity assumption on it , the optimal ‘‘weight matrix’’ is proportional to a block-diagonal matrix, with typical 20

Arellano and Bover (1995) note that the efficiency of the resultant GMM estimator is not affected whether or not Helmert’s transformation is used instead of first differencing.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

596 Table 7

Finite sample properties of bbABH and bbBC T

n

b

%bias ðbbFABH Þ

%bias ðbbBC Þ

RMSE ðbbFABH Þ

RMSE ðbbBC Þ

Actual

Actual

Actual

Actual

Second order theory

5 10 5 10

100 100 500 500

0.1 0.1 0.1 0.1

14.96 14.06 3.68 3.15

17.71 15.78 3.54 3.16

0.25 0.77 0.38 0.16

0.08 0.05 0.04 0.02

0.08 0.05 0.04 0.02

5 10 5 10

100 100 500 500

0.5 0.5 0.5 0.5

10.05 6.76 2.25 1.53

12.09 8.00 2.42 1.60

1.14 0.93 0.15 0.11

0.13 0.06 0.06 0.03

0.13 0.06 0.06 0.03

5 10 5 10

100 100 500 500

0.9 0.9 0.9 0.9

50.22 24.27 20.50 8.74

118.64 52.66 23.73 10.53

42.10 15.82 6.23 2.02

0.55 0.25 0.28 0.10

0.78 0.23 0.30 0.08

diagonal block equal to E½zit z0it . Therefore, the optimal GMM estimator is equal to PT1 n0 PT1 n0 n xt Pt xnt  bb2SLS;t t¼1 xt Pt yt bbFABH  PT1 ¼ t¼1 , (10) P T1 n0 n0 n n t¼1 xt Pt xt t¼1 xt Pt xt where bb2SLS;t denotes the 2SLS of ynt on xnt : xn0 Pt ynt bb2SLS;t  nt 0 ; xt Pt xnt

t ¼ 1; . . . ; T  1

and xnt  ðxn1t ; . . . ; xnnt Þ0 , ynt  ðyn1t ; . . . ; ynnt Þ0 , Z t  ðz1t ; . . . ; znt Þ0 , and Pt  Zt ðZ 0t Zt Þ1 Z0t . Therefore, the GMM estimator bbFABH may be understood as a linear combination of the 2SLS estimators bb2SLS;1 ; . . . ; bb2SLS;T1 . It has long been known that the 2SLS may be subject to substantial finite sample bias. See Nagar (1959), Rothenberg (1983), Bekker (1994), Donald and Newey (2001), Hahn and Hausman (2002) and Kuersteiner (2000) for related discussion. It is therefore natural to conjecture that a linear combination of the 2SLS may be subject to quite substantial finite sample bias. This is indeed the case. In the first column of Table 7, we summarized finite sample bias of bbFABH as a fraction of b approximated by 5,000 Monte Carlo runs. A.1. Bias correction Many econometric estimators, say b y for y based on a sample of size n allow for an expansion of the form:

pffiffiffi b 1 ð2Þ 1 ð3Þ 1 ð1Þ nðy  yÞ ¼ y þ pffiffiffi y þ y þ op , (11) n n n where yð1Þ , yð2Þ and yð3Þ are of order Op ð1Þ. Typically yð1Þ has mean zero and converges in distribution to a normal distribution. Ignoring the op ðn1 Þ term, and taking an expectation,

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

597

pffiffiffi we obtain that the ‘‘approximate mean’’ of nðb y  yÞ equals p1ffiffin E½yð2Þ . We can therefore y. Based on the mapping between E½yð2Þ  understand 1n E½yð2Þ  as the ‘‘second order bias’’ of b pffiffiffi b ð2Þ , say. The bias and primitive parameters, we can n-consistently estimate E½yð2Þ  by E½y ð2Þ 1 b  has zero second order bias. It is not always a priori clear if corrected estimator b y  E½y n

the second order bias 1n E½yð2Þ  approximately equals the actual bias E½b y  y. It is also not a ð2Þ 1 b priori clear if the bias corrected estimator b y  E½y  indeed reduces much of the bias. n

For the dynamic panel model of interest, we addressed these questions by Monte Carlo approximation. For this purpose, we first derive the second order bias of bbFABH : Theorem 6. Under Conditions 1–2, the second order bias of bbFABH is equal to

B1 þ B2 þ B3 1 þo , n n

(12)

where B1  U1 1

T 1 X

1 xzz trððGzz t Þ Gt;t Þ,

t¼1

B2   2U2 1

T 1 X T 1 X

0

zz 1 xzz zz 1 zx Gzx t ðGt Þ Gt;s ðGs Þ Gs ,

t¼1 s¼1

B3  U2 1

T 1 X T 1 X

0

zz 1 zz 1 zx Gzx t ðGt Þ B3;1 ðt; sÞðGs Þ Gs .

t¼1 s¼t 0

zx xzz zx zz 1 0 n n n 0 n 0 where Gzz t  E½zit zit ; Gt  E½zit xit ; Gt;s  E½it xis zit zis , B3;1 ðt; sÞ  E½it zit Gs ðGs Þ zis zis  PT1 zx0 zz 1 zx and U1  t¼1 Gt ðGt Þ Gt .

Proof. Available at http://econ-www.mit.edu/faculty/jhausman/papers.htm. In Table 7, we compare the actual performance of bbFABH and the prediction of its bias based on Theorem 6. Table 7 tabulates the actual bias of the estimator approximated by 5,000 Monte Carlo runs, and compares it with the second order bias based on formula (12). It is clear that the second order theory does a reasonably good job except when b is close to the unit circle and n is small. b1 ; B b2 ; B b3 are 6 suggests a natural way of eliminating the bias. Suppose that B pffiffiTheorem ffi n-consistent estimators of B1 ; B2 ; B3 . Then it is easy to see that 1 b b b bbBC  bbFABH  ðB (13) 1 þ B2 þ B3 Þ n is first order equivalent to bbFABH , and has second order bias equal to zero. Define P P P ^ zz ¼ n1 n zit z0 , G ^ zx ¼ n1 n zit xn , G ^ xzz ¼ n1 n en xn zit z0 and B^ 3;1 ðt; sÞ ¼ n1 G it is it t t t;s i¼1 i¼1 i¼1 it is Pn n 0 zx zz 1 0 ^ ðG ^ Þ zis z ,where en  yn  xn bbFABH . Let B^ 1 ; B^ 2 and B^ 3 be defined by G e z is it it it t i¼1 it it t zx xzz ^ zz ^ zx ^ xzz ^ replacing Gzz t ; Gt , Gt;s and B3;1 ðt; sÞ by Gt ; Gt , Gt;s and B3;1 ðt; sÞ in B1 ; B2 and B3 . Second order asymptotic theory predicts approximately that bbBC would be relatively free of bias. We examined whether this prediction is reasonably accurate in finite sample by 5,000 Monte Carlo runs. Table 7 summarizes the properties of bbBC . We have seen in Table 7 that the second order theory is reasonably accurate unless b is close to one. It is therefore sensible

ARTICLE IN PRESS 598

J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

Table 8 Inconsistency of bbBB under misspecification b

bF 0.3

0.6

0.8

0.9

Weight matrix I 0.3 0.6 0.8 0.9

0.000 0.267 0.253 0.220

0.071 0.000 0.205 0.181

0.074 0.116 0.000 0.129

0.035 0.054 0.088 0.000

Weight matrix II 0.3 0.6 0.8 0.9

0.000 0.267 0.258 0.235

0.077 0.000 0.208 0.197

0.149 0.268 0.000 0.137

0.084 0.149 0.126 0.000

to conjecture that bbBC would have a reasonable finite sample bias property as long as b is not too close to one. We verify this conjecture in Table 7. We do not expect other methods of bias correction to be much superior to bbBC . Hahn et al. (2002) recently showed that higher order MSEs of bias corrected maximum likelihood estimators are invariant to methods of bias correction. In particular, they considered a third order expansion of various bias corrected estimators, and showed that bootstrap bias corrected MLE, jackknife bias corrected MLE, and analytically bias corrected MLE all have the same higher order MSE. Their result crucially depends on the fact that the MLE is first order efficient. Because bbFABH is first order efficient, we expect their result to carry over, and the MSE of the bootstrap bias corrected estimator, say, is not expected to be substantially different from that of a analytically bias corrected estimator bbBC . However, the second order asymptotics ‘‘fails’’ to be a good approximation around b  1. This phenomenon can be explained by the ‘‘weak instrument’’ problem. See Staiger and Stock (1997). Blundell and Bond (1998) argued that the weak instrument problem can be alleviated by assuming stationarity on the initial observation yi0 . The stationarity assumption turns out to be a predominant source of information around b  1 as noted by Hahn (1999). The stationarity condition may or may not be appropriate for particular applications, and substantial finite sample biases due to inconsistency will result under violation of stationarity.21 A.2. Higher order bias and MSE of LD estimator In this section, we examine the third order expansion of the LD estimator and compare it with two other estimators discussed in the literature. Note that unlike the approximate   s2 ai Under stationarity, we would have yi0 N 1b ; 1b 2 . We analyze departures from the stationary initial   s2 ai condition by assuming that instead yi0 N 1b ; 1b 2 . Because the asymptotic bias depends on the weight matrix, 21

F

we consider two versions of GMM estimators. The first one uses an identity matrix for initial weight matrix, whose asymptotic bias is summarized in the upper half of Table 8. The second one uses Blundell and Bond’s recommendation, whose asymptotic bias is summarized in the lower half of Table 8. The results show that the bias of Blundell and Bond’s estimator can be substantial under misspecification.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

599

Table 9 Higher order theoretical properties of various estimators n

b

Bias predicted 0.1 100 0.5 100 0.9 100 RMSE 0.1 0.5 0.9

T

bbFABH

bbLD2AB

by second order theory 5 0.017 0.000 5 0.054 0.007 5 1.019 0.449

predicted by higher order 100 5 0.081 100 5 0.133 100 5 0.965

theory 0.099 0.097 0.609

bbLD1

bbLD2

bbLD3

bbLD4

bbFAS

bbBB

bbLDGMM

0.001 0.003 0.119

0.001 0.009 0.081

0.001 0.013 0.214

0.001 0.014 0.317

0.001 0.002 0.340

0.005 0.012 0.078

0.003 0.012 1.055

0.102 0.120 1.181

0.102 0.134 1.449

0.102 0.142 1.561

0.102 0.147 1.633

0.069 0.094 2.099

0.070 0.093 0.422

0.091 0.124 3.558

bias calculations of the previous section the expansions considered here are valid for any fixed number of instruments. Exact formulas tend to be very complicated and are not reported here.22 Many econometric estimators, say b y for y based on a sample of size n 1 allow for an expansion of the form (11). Ignoring the oppffiffiðn pffiffiffi b ffi b Þ term and taking expectations 2 of ð nðy  yÞÞ , we obtain the approximate MSE of nðy  yÞ equal to

1 2 2 1 E½ðyð1Þ Þ2  þ E½ðyð2Þ Þ2  þ pffiffiffi E½yð1Þ yð2Þ  þ E½yð1Þ yð3Þ  þ o . n n n n Ignoring the oðn1 Þ term, we may understand 1 1 2 2 E½ðyð1Þ Þ2  þ 2 E½ðyð2Þ Þ2  þ pffiffiffi E½yð1Þ yð2Þ  þ 2 E½yð1Þ yð3Þ  n n n n n

(14)

as the approximate MSE of b y. We examined the higher order bias and MSE for bbFAS ,23 the nonlinear estimator proposed by Ahn and Schmidt (1995) and bbBB , the estimator proposed by Blundell and Bond (1998). Both are GMM estimators, and the third order expansion of the usual two step estimator is critically dependent on the initial weight matrices. It can be shown that the third order expansion of the ‘‘four step’’ GMM estimator is invariant to the initial weight matrix.24 Rilstone et al. (1996) obtained higher order bias and MSE formulas for method of moments estimators. Their expressions are difficult to adapt to our situation and we have developed our own expansions. We computed (14) analytically for T ¼ 5, and summarized it along with the second order bias in Table 9. Table 9 also contains the higher order properties of LD estimators. Given the two step IV nature of the LD estimators, this required a separate expansion for IV estimator along with a third order expansion25 of 22

Details are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm. The version of Ahn and Schmidt’s estimator (1995) adopted in this comparison does not utlize homoscedasticity. 24 The details are available at http://econ-www.mit.edu/faculty/jhausman/papers.htm. On a related note, Newey and Smith (2000) point out the second order expansion of ‘‘three’’ step GMM estimator is invariant to the initial weight matrix. 25 All derivations referred to in this paragraph are available at http://econ-www.mit.edu/faculty/jhausman/ papers.htm. 23

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

600

bbFABH .Unfortunately, analytic calculation of (14) for LD estimators was infeasible. We used a stochastic approximation to each term in (14) instead.26 Appendix B. Proofs

Proof of Lemma 1. Note that Zi0  yi0  ai =ðbn  1Þ such that Dyit ¼ bnt1 ðbn  1ÞZi0 þ it þ ðbn  1Þ

t1 X

bs1 n its

s¼1

and qffiffiffiffiffiffiffiffiffiffiffiqffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi E½juit Dyis1 jp E½u2it  E½ðDyis1 Þ2  vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u 2 !2 3 qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiu s2 X u 5 ¼ s2 þ s2a tE4 bs2 br1 n ðbn  1ÞZi0 þ is1 þ ðbn  1Þ n is1r r¼1

vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiu s2 u X s2 ðbn  1Þ2 2 þ ðb  1Þ2 s2 þ s bn2ðr1Þ ¼ Oð1Þ. ¼ s2 þ s2a tb2ðs2Þ n   n 1  b2n r¼1 P By independence of uit Dyis1 across i, it therefore follows that n3=2 ni¼1 uit Dyis1 ¼ op ð1Þ. P n 3=2 By Pthe same reasoning, we obtain nP and i¼1 uiT Dyij1 ¼ op ð1Þ, n n 3=2 3=2 n Dyik1 ¼ op ð1Þ. We therefore obtain n i¼1 ui P i¼1 f i;1 ¼ op ð1Þ. We can similarly obtain n3=2 ni¼1 gi;1 ¼ opP ð1Þ. P Next we consider n3=2 ni¼1 f i;2 and n3=2 ni¼1 gi;2 . Note that " !# t1 X E½Dyit yi0  ¼ E yi0 bt1 br1 n ðbn  1ÞZi0 þ it þ ðbn  1Þ n itr r¼1 2 ¼ bt1 n s

s2

bn  1 c=n þ oð1Þ ¼ þ oð1Þ, ¼ s2 2 2c=n 2 1  bn

and 2 E½ðDyit yi0 Þ2  ¼ E4y2i0 bt1 n ðbn  1ÞZi0 þ it þ ðbn  1Þ

t1 X

!2 3 5 br1 n it1r

r¼1

¼

ðb  1Þ2 s2 bn2ðt1Þ n 1  b2n þ

s2a

þ 2

ð1  bn Þ ! s2 X 2 2 2ðr1Þ 2 s þ ðbn  1Þ s bn r¼1

 1Þ2 ð1  b2n Þ2

s4 ðb 3bn2ðt1Þ  n

! s2a s2 þ ð1  bn Þ2 ð1  b2n Þ

ð2Þ ð3Þ We perform Monte Carlo integration by generating i.i.d. sequences of ðyð1Þ j ; yj ; yj Þ j ¼ 1; . . . ; J, and PJ ð1Þ 2 ð1Þ 2 1 approximating E½ðy Þ  by J j¼1 ðyj Þ for example. Our choice of J was 5,000. 26

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

601

cumðit ; it ; ai ; ai Þ 2 n þ OðnÞ c2 s2 s2 þ cumðit ; it ; ai ; ai Þ 2 ¼  a n þ OðnÞ c2 P such that Varðn3=2 ni¼1 Dyit yi0 Þ ¼ Oð1Þ. P Use the notation ct;t;a;a ¼ cumðit ; it ; ai ; ai Þ, ct;s;a;x ¼ cumðit ; is ; ai ; Zi0 Þ etc. For n3=2 ni¼1 gi;2 ðbn Þ we have from the moment conditions that E½gi;2 ðbn Þ ¼ 0 and þ

2s2 s2a ct;t;a;a þ 2ct;t1;a;a þ ct1;t1;a;a þ þ OðnÞ 2 ð1  bn Þ ð1  bn Þ2 2s2 s2a þ ct;t;a;a þ 2ct;t1;a;a þ ct1;t1;a;a 2 n þ OðnÞ. ¼ c2 P The joint limiting distribution of n3=2 ni¼1 ½f 0i;2  E½f 0i;2 ; gi;2 ðbn Þ0 0 can now be obtained from a triangular array CLT. By previous arguments VarðDuit ðbn Þyi0 Þ ¼

E½f 0i;2 ; gi;2 ðbn Þ0  ¼ ½ m0

0



0

with m ¼ s2 =2i þ Oðn1 Þ where i is the T  1 dimensional vector with elements 1. Then E½ðf 0i;2  E½f 0i;2 ; gi;2 ðbn Þ0 Þ0 ðf 0i;2  E½f 0i;2 ; gi;2 ðbn Þ0 Þ ¼ Sn , where " Sn ¼

S11;n S21;n

# S12;n . S22;n

By previous calculations we have found the diagonal elements of S11;n and S22;n to be ðs2 s2a þ ct;t;a;a Þc2 n2 and ð2s2 s2a þ ct;t;a;a þ 2ct;t1;a;a þ ct1;t1;a;a Þc2 n2 . The off-diagonal elements of S11;n are found to be " ! s1 X s1 r1 2 2 E½Dyit Dyis yi0  ¼ E yi0 bn ðbn  1ÞZi0 þ is þ ðbn  1Þ bn isr r¼1

 bt1 n ðbn  1ÞZi0 þ it þ ðbn  1Þ ðbn  1Þ2 ð1  b2n Þ

¼

s1 bt1 n bn

¼

ct;s;a;a 2 n þ OðnÞ c2

s2 s2a þ c0;0;a;a ð1  bn Þ2

t1 X

!

br1 n itr

r¼1

þ

s2 s2a c2

while n1 ðE½Dyit yi0 Þ2 ¼ Oð1Þ. Thus n2 S11;n ! diagð elements of S22;n are obtained from

E½Duit Duis y2i0  ¼

!#

ct;s;a;a þ OðnÞ ð1  bn Þ2

;...;

s2 s2a Þ c2

þ K1 . The off-diagonal

8 c c c þct1;s1;a;a s2 s2a > þ t;s;a;a t;s1;a;að1bt1;s;a;a þ Oðn2 Þ <  ð1b Þ2 Þ2 n

t ¼ s þ 1 or t ¼ s  1;

n

þct1;s1;a;a > ct;s;a;a ct;s;1;a;a ct1;s;a;a : ð1bn Þ2

2

2

n Oðn Þ

otherwise

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

602

such that n2 S22;n ¼ dM 2 þ K2 . For S12;n , we consider 8 2 2 s sa þ ct;s;a;a  ct;s1;a;a 2 > > n þ Oðn2 Þ if t ¼ s; > > 2 > c > < 2 2 E½Dyit Duis y2i0  ¼ s sa þ ct;s;a;a  ct;s1;a;a n2 þ Oðn2 Þ if t ¼ s  1; > c2 > > > ct;s;a;a  ct;s1;a;a 2 > > : n þ Oðn2 Þ otherwise; c2 where S12;n ! dM 1 þ K3 . It then follows that for ‘ 2 R2ðT1Þ such that ‘0 ‘ ¼ 1, P d n3=2 ni¼1 ‘0 S1=2 ½f 0i;2  Ef i;2 ; gi;2 ðbn Þ0 0 ! Nð0; 1Þ by the Lindeberg–Feller CLT for triann gular arrays. It then follows from a straightforward application of the Cramer–Wold P d theorem and the continuous mapping theorem that n3=2 ni¼1 ½f 0i;2 ; gi;2 ðbn Þ0 0 !½x0x ; x0y 0 P where ½x0x ; x0y 0 Nð0; SÞ. Note that n3=2 ni¼1 E½f i;2  ¼ Oðn1=2 Þ and thus does not affect the limit distribution. Finally note that E½gi;1 g0i;1  ¼ Oð1Þ. It therefore follows that " # 0 0 1 0 þ oð1Þ: & E½gi gi  ¼ 0 S22 n2 Proof of Lemma 2. From the previous results it follows that n3=2

n n X X ½f i;2  E½f i;2  ¼ n3=2 ½i1 ; . . . ; iT1 0 yi0 þ op ð1Þ i¼1

i¼1

as well as n3=2

n X

gi;2 ðbn Þ ¼ n3=2

n X

i¼1

½Di2 ; . . . ; DiT 0 yi0 þ op ð1Þ

i¼1

which implies that n3=2

n X

½f 0i;2  E½f 0i;2 ; gi;2 ðbn Þ0 0 ¼ n3=2

i¼1

n  X I T1 i¼1

0T1;1

H



½i1 ; . . . ; iT1 ; DiT 0 yi0 þ op ð1Þ,

where n3=2

n  X I T1 i¼1

0T1;1



H

and H is defined in (18).

d

½i1 ; . . . ; iT1 ; DiT 0 yi0 !½xx ; Hxu 

&

Proof of Theorem 1. From the properties of the Moore–Penrose inverse, see Magnus and Neudecker (1988, p. 33), it follows that 0 0 0 1=2 1=2 1=2 1=2 1=2 1=2 C~ 1 ðC~ 1 S22 C~ 1 Þþ C~ 1 ¼ S22 S22 C~ 1 ðS22 C~ 1 Þþ ðC~ 1 S22 Þþ0 C~ 1 S22 S22 , 1=2 where S22 C~ 1 is a ðT  1Þ  r matrix of rank r1 pT  1. By Theorem 1.16 of Magnus and Neudecker (1988) there exist matrices V and U of dimensions ðT  1Þ  r1 and r  r1 and a 1=2 diagonal matrix S of dimension r1  r1 such that V 0 V ¼ U 0 U ¼ I r1 and S22 C~ 1 ¼ VS 1=2 U 0 .

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

603

1=2 1=2 The unique representation of ðS22 C~ 1 Þþ then is ðS22 C~ 1 Þþ ¼ US 1=2 V 0 . This implies that 0 0 1=2 1=2 C~ 1 ðC~ 1 S22 C~ 1 Þþ C~ 1 ¼ S22 VV 0 S22 . 1=2

1=2 Next set C 1 ¼ S22 V ðV 0 S1 where C 1 is a matrix of dimension ðT  1Þ  r1 of full 22 V Þ column rank such that 1=2

1=2

1=2

1=2 1=2 0 1=2 þ ððV 0 S1 V V ðV 0 S1 Þ S22 V 0 S22 C 1 ðC 01 S22 C 1 Þþ C 01 ¼ S22 V ðV 0 S1 22 V Þ 22 V Þ 22 V Þ 1=2

1=2

1=2 1 1=2 0 0 1 ¼ S22 V ðV 0 S1 ðV 0 S1 V S22 22 V Þ 22 V Þ ðV S22 V Þ 1=2

1=2

¼ S22 VV 0 S22

1=2 0 1 1=2 and C 01 C 1 ¼ ðV 0 S1 V S22 V ðV 0 S1 ¼ I r1 from which it follows that 22 V Þ 22 V Þ

xðC 1 ; S22 Þ ¼ xðC~ 1 ; S22 Þ:

&

Proof of Theorem 2. Use the transformation z ¼ L1 xx with S11 ¼ LL0 . Define W ¼ ~ ¼ z0 L0 WHxu =z0 L0 WLz. Let F ¼ S21 S1 and use the fact ~ 1 Þ1 C 0 . Then xðC 1 ; OÞ C 1 ðC 01 OC 1 11 that E½Hxu jxx  ¼ F xx ¼ FLz. Using a conditioning argument it then follows that  0 0  ~ ¼ E z L DLz . E½xðC 1 ; OÞ z0 L0 WLz ¯ where D ¯ ¼ 1 L0 ðD þ D0 ÞL is symmetric. Define z1 ¼ Note that z0 L0 DLz ¼ z0 L0 D0 Lz ¼ z0 Dz 2 0 0 G1 z and z2 ¼ G2 z where G1 are the eigenvectors corresponding to the nonzero eigenvalues of L0 WL such that G1 L1 G01 ¼ L0 WL and G1 G01 þ G2 G02 ¼ I. It now follows that:  0 0 0   0 0 0   0 0  z L DLz z1 G1 L DLG1 z1 þ z01 L0 G01 DLG2 z2 z1 G1 L DLG1 z1 E 0 ¼E , ¼E z LWLz z01 L1 z1 z01 L1 z1 where the first equality follows from G02 W ¼ 0 and the second equality follows from independence between z1 and z2 . The result then follows frompSmith (1993, Eq. 2.4, p. 273). ffiffiffi When Condition 5 is satisfied note that we can take L ¼ dI r1 and that F ¼ S21 S1 11 ¼ 1 M 01 where the second equality is based on S1 11 ¼ d I r1 . The last statement then follows by the same arguments as before. & Proof of Theorem 3. We first analyze E½xðC 1 ; S22 Þ. Note that in this case W ¼ C 1 ðC 01 S22 C 1 Þ1 C 01 ¼ d1 C 1 ðC 01 M 2 C 1 Þ1 C 01 such that  0 0  ¯ 1 z1 z G DG E½xðC 1 ; S22 Þ ¼ E 1 01 z1 L 1 z1 ¯ ¼ d=2 tr W ðM 01 þ M 1 Þ ¼ r1 =2 since M 01 þ M 1 ¼ M 2 where r1 is the rank of and d tr D C 1 . Then it follows from Smith (1993, p. 273 and Appendix A)  0 0  1 ð1Þ 1 X ¯ 1 z1 z1 G1 DG k 2 1þk 1;k 1 ¯ ¯ 1 ; I r  l¯ 1 L1 Þ,

E C 1þk ðG01 DG (15) ¼l r1 1 z01 L1 z1 2 1þk k! k¼0 where for any two real symmetric matrices Y 1 ; Y 2 k ð12 Þki k! X

trðY 1 Y i2 ÞC ki ðY 2 Þ C 1;k ðY ; Y Þ ¼ 1 2 1þk 2 12 kþ1 i¼0 ðk  iÞ!

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

604

and C k ðY 2 Þ ¼

k! ð12 Þk

d k ðY 2 Þ ¼ k1

d k ðY 2 Þ,

k1 X 1 j¼0

2

trðY kj 2 Þd j ðY 2 Þ,

d 0 ðY 2 Þ ¼ 1. Since all elements l¯ 1 li in l¯ 1 L1 satisfy 0ol¯ 1 li p1 it follows that I r1  l¯ 1 L1 is positive semidefinite implying that C k ðI r1  l¯ 1 L1 ÞX0, which holds with equality only if all eigenvalues li are the same. Also note that, if Y 2 ; Y 3 are diagonal matrices and Y 1 any conformable matrix, then tr Y 3 Y 1 Y i2 ¼ tr Y 1 Y 3 Y i2 . Therefore, ¯ 1 ðI r  l¯ 1 L1 Þi ¼ tr G01 ðWM 01 þ M 1 W ÞG1 ðI r  l¯ 1 L1 Þi tr G01 DG 1 1 0 0 0 1 ¼ tr ðL1 G M G1 þ G M 1 G1 L1 ÞðI r  l¯ 1 L1 Þi 1

2

1

1

1

¼ 12 tr G01 ðM 01 þ M 1 ÞG1 L1 ðI r1  l¯ 1 L1 Þi ¼  1 tr G0 M 2 G1 L1 ðI r  l¯ 1 L1 Þi . 2

1

1

Next, note that for two positive semi-definite matrices Y 1 and Y 2 it follows that trY 1 Y 2 ¼ 1=2 1=2 1=2 1=2 trY 2 Y 1 Y 2 X0 because Y 2 Y 1 Y 2 is well defined and positive semi-definite by positive semi-definiteness of Y 1 . Next note that L1 ðI r1  l¯ 1 L1 Þi is positive semi-definite by ¯ 1 ðI r  previous arguments. Also, G01 M 2 G1 is positive definite. Therefore, every trðG01 DG 1 ¯l1 L1 Þi ÞC ki ðI r  l¯ 1 L1 Þ has the same sign, and so does every C 1;k ðG0 DG ¯ 1; I r  1 1þk 1 1 l¯ 1 L1 Þ.Therefore, we have   1   0 k 2 1þk 1;k   ¯ 1 X ð1Þ 1 0 ¯ ¯ r jE½xðC 1 ; S22 ÞjXl C ðG ; I  l L Þ DG 1 r1 1 . 1 1þk 1   k! 2 1þk k¼0 For k ¼ 0, we have 1 0 ¯ 0 ¯ ¯ 1 C 1;0 1 ðG1 DG1 ; I r1  l L1 Þ ¼ tr G1 DG1 X  d r1 =2.

(16a)

¯ ¯ For the last inequality let lD i be the eigenvalues of D. By Magnus and Neudecker (1988, P ¯ r 1 ¯ is of reduced rank r1 such that ¯ 1 X i¼1 Theorem 13, p. 211), tr G01 DG lD . Note that D i1 r Pr1 D¯ 1 1 1 ¯ ¼ i¼1 li ¼ d r1 =2. Also, ð1Þ0 tr D 2 1 = 2 1 ¼ r1 . This shows that jE½xðC 1 ; S22 ÞjX 1 0 l¯ =2 for all C 1 such that C 1 C 1 ¼ I r1 and all r1 2 f1; 2; . . . ; T  1g. Then

min 0

C 1 s:t: C 1 C 1 ¼I r1 r1 2f1;2;...;T1g

jE½xðC 1 ; S22 ÞjX

min 0

C 1 s:t: C 1 C 1 ¼I r1 r1 2f1;2;...;T1g

l¯ 1 min l j X , 2d 2

(17)

where min l j is the smallest eigenvalue of M 2 . Since l¯ is the largest eigenvalue of W and W has the same nonzero eigenvalues as d1 ðC 01 M 2 C 1 Þ1 by Zhang (1999, Theorem 2.8, p. 51), it follows that l¯ is the largest eigenvalue of d1 ðC 01 M 2 C 1 Þ1 . The last inequality in (17) follows from Magnus and Neudecker (1988, Theorem 10, p. 209) as well as the fact that l¯ is d1 times the largest eigenvalue of ðC 01 M 2 C 1 Þ1 . Now let r1 ¼ 1 and C 1 ¼ ri , where ri is the eigenvector corresponding to min l j . Note that for this case W ¼ d1 ri r0i = min l j such ¯ 1 ¼ 1=ð2dÞ that G1 ¼ ri and l¯ ¼ d1 ðmin l j Þ1 . Then I r1  l¯ 1 L1 ¼ 1  1 ¼ 0 and G01 DG

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

605

¯ 1 j ¼ ð1= min l j Þ1 =2 ¼ min l j =2. Inequality (17) such that jE½xðC 1 ; S22 Þj ¼ l¯ 1 jtr G01 DG therefore holds with equality. Next consider E½xðC 1 ; I T1 Þ. Note that now W ¼ C 1 ðC 01 C 1 Þ1 C 01 ¼ C 1 C 01 since C 01 C 1 ¼ ¯ 1 ¼ 0 such I r1 such that W ¼ G1 L1 G01 with L1 ¼ I r1 and G1 ¼ C 1 . Then, l¯ ¼ 1 and I r1  lL 0 ¯ 0 ¯ ¯ ¼ that (15) reduces to E½xðC 1 ; I T1 Þ ¼ r1 C 1;0 ðG ; 0Þ ¼ tr G =r . But then D DG DG 1 1 1 1 1 1 1 ¯ 1 =r1 ¼ tr C 01 DC ¯ 1 =r1 ¼ 21 tr C 01 M 2 C 1 =r1 ¼ 21 ðC 1 C 01 M 1 þ M 01 C 1 C 01 Þ such that tr G01 DG ¯ 1 . We analyze tr D=r min

C 1 s:t: C 01 C 1 ¼I r1 r1 2f1;2;...;T1g

j  21 tr C 01 M 2 C 1 r1 j.

It can be checked easily that M 2 is positive definite symmetric. We can therefore minimize trðC 01 M 2 C 1 Þ. It is now useful to choose an orthogonal matrix R with jth row rj such that R0 R ¼ P RR0 ¼ I and M 2 ¼ RLR0 where L is the diagonal of eigenvalues of PT1 matrix 0 0 0 0 M 2 ¼ T1 l r r . Then it follows that trðC M C Þ ¼ l r C C r 2 1 1 j¼1 j j j j¼1 j j 1 1 j . Next note that all the eigenvalues of C 1 C 01 are either zero or one such that 0pr0j C 1 C 01 rj p1. The minimum of trðC 01 M 2 C 1 Þ is then found by choosing r1 ¼ 1 and C 1 such that C 01 rj ¼ 0 except for the ¯ 1 Þ is also minimized for r1 ¼ 1 eigenvector ri corresponding to min l j . To show that trðD=r ¯ 1 Þ ¼ min l j =2, consider augmenting C 1 by a column vector x and C 1 ¼ ri , where trðD=r suchP that x0 x ¼ 1 and r0i x ¼ 0. Then C 01 C 1 ¼ I 2 , r2 ¼ 2 and tr C 01 M 2 C 1 ¼ PT1 T1 0 2 0 2 l i þ jai l j ðrj xÞ . By Parseval’s equality jai ðrj xÞ ¼ 1. Since l j Xl i we can bound 0 0 trðC 1 M 2 C 1 ÞX2l i but then trðC 1 M 2 C 1 =2ÞXl i . This argument can be repeated to more than ¯ 1 Þ is minimized for one orthogonal addition x. It now follows that E½xðC 1 ; I T1 Þ ¼ trðD=r r1 ¼ 1 and C ¼ ri , where ri is the eigenvector corresponding to the smallest eigenvalue. Next note that from x0 x ¼ 1 such that min l i px0 M 2 xp max l i it follows that: min l i p10 M 2 1=ð10 1Þ ¼ 2ðT  1Þ1 for 1 ¼ ½1; . . . ; 10 which shows that the smallest eigenvalue is bounded by a monotonically decreasing function of the number of moment conditions. & Proof of Lemma 3. The result for b^ASM follows directly from Lemma 1 and the fact that AB AB the finite dimensional matrix C AB;n converges to a limit C AB ¼ ½C AB is 0 ; C 1  where C 0 ^ ^ defined in the obvious way. The result for bAB follows directly from Corollary 1. For bGMM we note gni;2 ¼ Bn gi;2 and f ni;2 ¼ Bn f i;2 such that the asymptotic properties of Pn that n0 3=2 n 0 n i¼1 ½f i;2 ; gi;2  follow directly from previous results where n3=2

n X

d

n0 n0 0 ½f i;2 ; gi;2  !½x0x Bn0 ; x0y Bn0 0 .

i¼1

For gni;1 ¼ BðBn I T ÞðDui ui Þ note that Ekgni;1 kpkBðI T Bn ÞkEkui Dui k pkBðI T Bn Þkðtr Eðui u0i Dui Du0i ÞÞ1=2 o1 as well as Ekf ni;1 kpkBðI T Bn ÞkEkui Dyi;1 k pkBðI T Bn Þkðtr Eðui u0i Dyi;1 Dy0i;1 ÞÞ1=2 o1

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

606

such that n1 before that

Pn

n

n

i¼1 ½f i;1 ; gi;1 

¼ Op ð1Þ and n3=2

Pn

n

n

i¼1 ½f i;1 ; gi;1 

¼ op ð1Þ. These results imply as

0 n0 AB þ AB0 n ðC AB0 Bn S22 Bn0 C AB d x B C 1 Þ C 1 B Hxu b^ABH  1 ! x 0 n0 1 AB 1 AB0 n  xABH . þ AB0 n xx B C 1 ðC 1 B S22 Bn0 C AB 1 Þ C 1 B xx For b^LD we have ( 2 2 2sa s ; tas; Euit uis ðuiT  ui1 Þ2 ¼ 2ðs2a þ s2 Þs2 ; t ¼ s P such that n3=2 ni¼1 gLD i;1 !p 0. For Euit ðyiT1  yi0 Þ note that

yiT1  yi0 ¼ ðbT1  1ÞZi0 þ n

T 1 X

bs1 n iTs

s¼1

leading to s2 Euit ðyiT1  yi0 Þ ¼ bTt1 n and T 1 X

Euit uis ðyiT1  yi0 Þ2 ¼ ðbT1  1Þ2 EðZ2i0 uit uis Þ þ n

bns1 1 bns1 1 EðiTs1 iTs2 uit uis Þ

s1 ;s2 ¼1

¼ ðbT1  1Þ n ¼

T 1 X

T 1 X ¼ sg þ s2a Þ þ bns1 1 bns1 1 EðiTs1 iTs2 uit uis Þ 2 1  bn s1 ;s2 ¼1

2 2 2 s ðs 1ft

bns1 1 bns1 1 EðiTs1 iTs2 uit uis Þ þ Oðn1 Þ.

s1 ;s2 ¼1

P This implies that n3=2 ni¼1 uit ðyiT1  yi0 Þ ¼ op ð1Þ. Finally, note that " # C AB0 1 ASM0 C1 ¼ 0T2;T1 such that " S22 C ASM C ASM0 1 1

¼

AB C AB0 1 S22 C 1

0

0

0

#

with " S22 C ASM Þþ ðC ASM0 1 1 d

¼

AB þ ðC AB0 1 S22 C 1 Þ

0

0

0

# .

d

This implies xASM ¼ xAB . To show that xAB ¼ xABH note that C AB 1 has rank T  1. Then, by Theorem 1 there exists a ðT  1Þ  ðT  1Þ matrix C AB of full rank such that C 0AB C AB ¼ I d

and xðC AB 1 ; S22 Þ ¼ xðC AB ; S22 Þ. But then xðC AB ; S22 Þ ¼ xðC AB K; S22 Þ for any nonsingluar n matrix K. In the same way, C AB0 1 B is of rank T  1. By the same argument as before there

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617 d

607

d

exists a full rank matrix C ABH such that xABH ¼ xðBn0 C AB 1 ; S22 Þ ¼ xðC ABH ; S22 Þ. Then d

AB choose K ¼ C 1 AB C ABH to show that xðC 1 ; S22 Þ ¼ xðC ABH ; S22 Þ. For the second equality note that " # 0T2;T1 LD0 C1 ¼ 10T1

such that

2

6 LD þ ðC LD0 1 S22 C 1 Þ ¼ 4

0T2;T2 01;T2

3 0T2;1 7 1 5. 10T1 S22 1T1

This implies that LD0 LD þ x0x C LD 1 ðC 1 S22 C 1 Þ

and

" C LD0 1 Hxu ¼

0T2;T1

¼

LD þ ½01;T2 ; x0x 1T1 ðC LD0 1 S22 C 1 Þ



x0 1T1 ¼ 01;T2 ; 0 x 1T1 S22 1T1



#

10T1 Hxu

such that xLD ¼

LD0 LD þ LD0 x0x C LD x0 1T1 10T1 Hxu 10T1 S22 1T1 1 ðC 1 S22 C 1 Þ C 1 Hxu ¼ x0 ¼ xAH : 0 LD LD0 LD þ LD0 1T1 S22 1T1 xx C 1 ðC 1 S22 C 1 Þ C 1 xx ðx0x 1T1 Þ2

&

Proof of Corollary f‘‘ ASM’’ ; ‘‘ AB’’ g the matrix W a is of full rank T  1. pffiffiffi 3. For a ¼ 1 Thus, for L ¼ dI T1 , z ¼ L xx Nð0; I T1 Þ and  0 0 a   0 0 a 1  z L W HE½xu jxx  z L W d S21 Lz E½xðC a1 ; S22 Þ ¼ E ¼ E z0 L0 W a Lz z0 L0 W a Lz and the result follows from Smith (1993, Eq 2.4, p. 273). For a ¼ f‘‘ LD’’ g note that x0x W a Hxu z01 G01 W a Hxu ¼ x0x W a xx z01 L1 z1 by the same arguments as in the proof of Theorem 2. The remainder of the proof is the same as in that theorem. & 4. We note that 3  0 0  07 7 .. 7 .. .. 7  M5 . .5 . 1  1 0

Proof of Lemma 2 0 61 6 ¯n ! 6 . B 6. 4.

P j ~ i1 ¼ and b¯ n ! 1T where B¯ n and b¯ n are defined in (9). Also define y~ it ¼ t1 j¼0 bn uitj and g ¯ ~ i1 ¼ BðI T1 Bn ÞðDu u Þ such that from the definition of B in (19) it follows that g i i P ~ it þ yi0 ÞDuik ; y~ iT Duij  for s ¼ 2; . . . ; T; t ¼ 1; . . . ; s  2; j ¼ 2; . . . ; T  ½y~ it1 Duis ; T 1 ð T1 t¼1 y

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

608

1 1; k ¼ 2; . . . ; T. It can be checked easily thatEðy~ it1 Duis Þ ¼ OðnP Þ for s ¼ 2; . . . ; T; 1 1 ~ it þ yi0 ÞDuik  ¼ t ¼ 1; . . . ; s  2, Ey~ iT Duij ¼ Oðn Þ for j ¼ 2; . . . ; T  1 and E½T ð T1 t¼1 y T 1 s2 þ Oðn1 Þ. This shows that Eg~ i1 ¼ T 1 s2 ½0; . . . ; 0; 10T1 ; 0; . . . ; 00 . Note that from ^ arguments in the proof of Lemma 3 the estimator P P bFABH is a continuous function of n3=2 ni¼1 ½f 0i2 ; g0i2 0 such that convergence of n3=2 ni¼1 ½f 0i;1 ; f 0i;2 ; g0i;1 g0i;2 0 and b^FABH is joint and P the continuous mapping theorem can be invoked to establish the limit of 0 n3=2 ni¼1 ½f^i;1 ; g^ 0i;1 0 . It then follows from previous results that

n3=2

n X

g^ i;1 ¼ ðb^FABH  bn ÞBðI T1 b¯ n Þn3=2

i¼1

n X

gi2 þ op ð1Þ

i¼1 d

! xABH BðI T1 1T ÞHxu and by the same arguments as before it follows that n3=2

n X

f^i;1 ¼ ðb^FABH  bn ÞBðI T1 b¯ n Þn3=2

i¼1

n X

f i2 þ op ð1Þ

i¼1 d

! xABH BðI T1 1T Þxx . Based on these results one concludes that n3=2

n X 0 ½f^i;1 ; f 0i;2 ; g^ 0i;1 g0i;2 0 i¼1

¼n

2

3=2

6 n 6 X 6 6 6 i¼1 4

ðb^FABH  bn ÞBðI T1 b¯ n Þ I T1

3

0

7" # 7 f i;2 7 7 g ðb^FABH  bn ÞBðI T1 b¯ n Þ 7 5 i;2 I T1 0

0 0

þ op ð1Þ. and the statement of the Lemma follows from previous arguments.

&

Proof of Lemma 5. One needs to consider g^ 0i;1 ðb^FABH Þ and g0i;2 ðb^FABH Þ. First, for Dyi ¼ ½yiT  yiT1 ; . . . ; yi2  yi1 0 g0i;2 ðb^FABH Þ ¼ yi0 ðDyi  b^FABH Dyi;1 Þ ¼ yi0 Dui  ðb^FABH  bn Þyi0 Dyi;1 ¼ g  ðb^FABH  b Þf . i;2

n

i;2

Second, ¯ n ÞÞððDyi  b^FABH Dyi;1 Þ ui Þ g^ i1 ðb^FABH Þ ¼ BðI TðT1Þ þ ðb^FABH  bn ÞðI T1 B þ ðb^FABH  bn ÞBðI T1 b¯ n Þg0i;2 ðb^FABH Þ

¼ BðI TðT1Þ þ ðb^FABH  bn ÞðI T1 B¯ n ÞÞððDui ui Þ  ðb^FABH  bn ÞðDyi;1 ui ÞÞ þ ðb^FABH  bn ÞBðI T1 b¯ n Þðgi;2  ðb^FABH  bn Þf i;2 Þ

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

609

We have seen before that EðDui Du0i ui u0i Þ ¼ Oð1Þ, EðDyi;1 Dy0i;1 ui u0i Þ ¼ Oð1Þ and EðDui Dy0i;1 ui u0i Þ ¼ Oð1Þ. Furthermore, for some conformable, fixed matrices K 1 , K 2 and K3 EðDui Du0i ui yi0 Þ ¼ EðDui Du0i Þ Eðui yi0 Þ þ nK 1

s2a 2 ¼ s M 2 1T þ K 1 n, c and in the same way EðDui Dy0i;1 ui yi0 Þ ¼

s2 s2 M 1 a 1T þ K 2 n, c

and EðDyi;1 Dy0i;1 ui yi0 Þ ¼

s2 s2 I T1 a 1T þ K 3 n. c

These results imply that " # n X f i;2 0 3 ^ n ¼ Rn n O ½ f i;2 gi;2

g0i;2 R0n þ op ð1Þ,

i¼1

where " Rn ¼

ðb^FABH  bn ÞBðI T1 b¯ n Þ ðb^FABH  b ÞI T1

ðb^FABH  bn Þ2 BðI T1 b¯ n Þ

#

I T1

n

and d

"

Rn ! Rx ðxABH Þ ¼

xABH BðI T1 1T Þ x2ABH BðI T1 1T Þ xABH I T1

I T1

# :

&

Proof of Theorem 5. As before unobserved innovations are replaced with u^ i such that LD ¯ n ÞÞðDui ui Þ g^ LD ðI TðT1Þ þ ðb^FABH  bn ÞBLD ðI T1 B i;1 ¼ B þ ðb^FABH  bn ÞBLD ðI T1 b¯ n Þgi2 P d LD where BLD is defined in (22) and n3=2 ni¼1 g^ LD ðI T1 1T ÞHxu . Similarly, i;1 ! xABH B LD ¯ n ÞÞðDyi;1 ui Þ f^LD ðI TðT1Þ þ ðb^FABH  bn ÞðI T1 B i;1 ¼ B þ ðb^FABH  bn ÞBLD ðI T1 b¯ n Þf i2

such that 2 3=2

n

6 n X 6 LD0 0 LD0 0 0 d 6 ^ ½f i;1 ; f i;2 ; g^ i;1 gi;2  !6 4 i¼1

xABH BLD ðI T1 1T Þ I T1 0 0

0

3

# 7" 7 xx 0 7 . xABH BLD ðI T1 1T Þ 7 5 Hxu I T1

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

610

By the same arguments as before it also follows that for ^ LD ¼ n3 O n

n X

0 0 LD0 ^ ^ ^ ^ i;1 ðbFABH Þ; g0i;2 ðb^FABH Þ ½g^ LD0 i;1 ðbFABH Þ; gi;2 ðbFABH Þ ½g

i¼1

one obtains ^ LD O n

¼

3 RLD n n

" # n X f i;2 0 ½ f i;2 gi;2

g0i;2 RLD0 þ op ð1Þ, n

i¼1

where " RLD n ¼

ðb^FABH  bn ÞBLD ðI T1 b¯ n Þ ðb^FABH  bn ÞI T1

ðb^FABH  bn Þ2 BLD ðI T1 b¯ n Þ

#

I T1

the weight matrix converges to the random limit d 0 LD ^ LD ! O RLD n x ðxABH ÞSRx ðxABH Þ

where d LD RLD n ! Rx ðxABH Þ

" ¼

# xABH BLD ðI T1 1T Þ x2ABH BLD ðI T1 1T Þ . xABH I T1 I T1

Next note that BLD ðI T1 1T Þ ¼ 10T1 1T2 , 10T1 S11 1T1 ¼ ðT  1Þd, 10T1 S12 1T1 ¼ d and 10T1 S22 1T1 ¼ 2d. Use the short hand notation xABH ¼ xG . It then follows that: " # a1T2 10T2 b1T2 0 LD LD0 LD LD , C Rx ðxG ÞSRx ðxG Þ C ¼ b10T2 d b ¼ x2G ðT  1Þd  x3G d  xG d  2x2G d where a ¼ x2G ðT  1Þd þ 2x3G d þ 2x4G d, 2 d ¼ xG ðT  1Þd þ 2xd þ 2d. It then follows that: " # e1T2 10T2 f 1T2 0 LD þ LD0 LD LD ðC Rx ðxG ÞSRx ðxG Þ C Þ ¼ f 10T2 g for some coefficients e; f ; g as well as " 0 LD þ LD0 LD C LD ðC LD0 RLD Þ C ¼ x ðxG ÞSRx ðxG Þ C

e1T2 10T2

f 1T2 10T1

f 1T1 10T2

g1T1 10T1

# .

Finally, noting that BLD ðxx 1T Þ ¼ ðx0x 1T1 Þ1T2 one obtains 0 LD þ LD0 ^ LD x^ x ðxG Þ0 C LD ðC LD0 RLD Þ C xx ðxG Þ x ðxG ÞSRx ðxG Þ C

¼ ex2G ð10T2 1T2 Þ2 ðx0x 1T1 Þ2 þ 2f xG ð10T2 1T2 Þðx0x 1T1 Þ2 þ gðx0x 1T1 Þ2 as well as 0 LD þ LD0 ^ LD x^ x ðxG Þ0 C LD ðC LD0 RLD Þ C xy ðxG Þ x ðxG ÞSRx ðxG Þ C

¼ ðex2G ð10T2 1T2 Þ2 þ 2f xG ð10T2 1T2 Þ þ gÞðx0x 1T1 Þðx0u H 0 1T1 Þ.

and

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

611

This now implies that xFLD ¼

ðx0x 1T1 Þðx0u H 0 1T1 Þ ¼ xLD : ðx0x 1T1 Þ2

&

Appendix C. Definitions of infeasible estimators in Section 4 C.1. Definitions for limiting distributions Definitions for Lemma 1: Use the notation ct;s;a;a ¼ cumðit ; is ; ai ; ai Þ. Then, 2 3  c1;T1;a;a c1;1;a;a 6 7 .. .. .. 7, K1 ¼ 6 . . . 4 5 cT1;1;a;a

2 6 K2 ¼ 6 4

cT1;T1;a;a



3

c2;2;a;a  c2;1;a;a  c1;2;a;a þ c1;1;a;a .. .

 .. .

c2;T;a;a  c2;T1;a;a  c1;T;a;a þ c1;T1;a;a .. .

cT;2;a;a  cT;1;a;a  cT1;2;a;a þ cT1;T1;a;a



cT;T;a;a  cT;T1;a;a  cT1;T;a;a þ cT1;T1;a;a

2 6 K3 ¼ 6 4

c1;2;a;a  c1;1;a;a .. .

 .. .

c1;T;a;a  c1;T1;a;a .. .

cT1;2;a;a  cT1;1;a;a



cT1;T;a;a  cT1;T1;a;a

7 7, 5

3 7 7. 5

Define 2

1

6 6 H¼6 6 4

1 ..

0

0 .

0 

..

.

1

1 0

3 0 .. 7 .7 7. 7 05 1

(18)

C.2. Selector matrix B Let B be given as B ¼ ½S 0D;T2 ; . . . ; S 0D;1 ; S0D;0 ; S 0D;1 0 ,

(19)

where

SD;j

8 ;I ;0 ;0  ½0 > < T1j;ðT1jÞðT1Þ T1j T1j;j T1j;jðT1Þ 1 0 ¼ I T1 T 1T > : ½0 ;I ;0  T2;ðT1ÞðT1Þ

T2

T2;1

for j40 for j ¼ 0 for j ¼ 1

with I i the i  i identity matrix and 0j;i a j  i matrix of zeros. The convention is used that when the index i or j of 0i;j is zero then the corresponding submatrix is absent.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

612

C.3. Definition of the Arellano and Bond estimator Define the matrices S AB;n ¼ ½bjn ; . . . ; b0n , 0;j 2 S AB;n 1;k

3

01;k

7 6 ½S AB;n 7 6 0;0 ; 01;k1  7 6 ¼ 60kþ1;ðk1Þk=2 ; ; 0kþ1;qkðk1Þ=2k 7, . .. 7 6 5 4 AB;n ½S0;k1 

where q ¼ TðT þ 1Þ=2  2 and " AB;n # C0 AB;n C ¼ C AB;n 1 with h i AB;n0    S AB;n0 C AB;n ¼ 0q;1 S 1;1 1;T2 , 0 2 1 0  6 6 0 1 bn 0    6 6 6 1 bn b2n 0    AB;n ¼6 C1 6 .. 6 6 . 4 1

3



bT2 n

7 7 7 7 7 7. 7 7 7 5

AB;n Then limn!1 C AB;n ¼ C AB ¼ C AB 0 and limn!1 C 1 1 are given by 0 2 1 0  6 0 10 0  2 6 6 0 AB0 AB0 6 1 0 3 0 S1;1    S 1;T2 ; C AB C AB 0 ¼ ½ q;1 1 ¼6 6 .. 6 . 4

where 2

S AB 1;k

01;k

3

7 6 ½1; 01;k1  7 6 7 6 7 6 ½1; 1; 0  1;k2 ; 0 ¼ 60kþ1;ðk1Þk=2 ; kþ1;qkðk1Þ=2k 7. 7 6 .. 7 6 . 5 4 ½1; . . . ; 1

3 7 7 7  7 7, 7 7 5 0 1T1

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

613

Then, using these definitions the moment conditions for the AB estimator can be expressed as gi;AB ¼ C AB;n0 BðDui ui Þ þ C AB;n0 yi0 Dui 0 1 " # gi;1 ¼ C AB;n0 , gi;2 where B is defined in Eq. (19) and gi;AB ¼ ½yi;0 Dui2 ; yi;0 Dui3 ; yi1 Dui3 ; yi;0 Dui4 ; . . . ; yi2 Dui4 ; . . . ; yi0 DuiT ; . . . ; yiT2 DuiT 0 . The AB estimator can therefore be written as b^AB  bn ¼ ðf 0n C AB;n ðC AB;n0 On C AB;n Þþ C AB;n0 f n Þ1 f 0n C AB;n ðC AB;n0 On C AB;n Þþ C AB;n0 gn ðbn Þ,

where gn ðbn Þ, f n and On are defined in Section 3. C.4. Definitions for the Anderson and Hsiao estimator Define the matrices ¼ ½bjn ; . . . ; b0n , SAH;n 0;j

SAH;n ¼ ½01;ðk1Þk=2 ; SAH;n 1;k 0;k1 ; 01;qkðk1Þ=2k , where q ¼ TðT þ 1Þ=2  2 and " C AH;n ¼

C AH;n 0

#

C AH;n 1

with 2

C AH;n ¼ ½ 0q;1 0

C AH ¼ 1T1 . 1

SAH;n0 1;1



S AH;n0 1;T2 1T1 ;

C AH;n 1

1  60 b n 6 6 6 ¼6 6 6 4

0 b2n ..

. bT2 n

3 7 7 7 7 71T1 7 7 5

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

614

C.5. Definition of the Arellano and Bover estimator First define the Helmert transform through the given as rffiffiffiffiffiffiffiffiffiffiffiffi 2 rffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffi T 1 T  1T  2 T  1T 6 T T T  1 T T 6 rffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffi 6 6 T 2 T  2T 6 0 6 T  1 T  1T 6 6 .. . .. Bn ¼ 6 . 6 6 6 6  6 6 6 4 0 

T  1  T  1 non-singular matrix Bn rffiffiffiffiffiffiffiffiffiffiffiffi T 1  T T rffiffiffiffiffiffiffiffiffiffiffiffi T 2  T  1T .. rffiffiffi r.ffiffiffi 2 21 3 32 rffiffiffi 1 0 2

3 1 3 2

3 1 17 7 7 1 7 7 27 7 7 7. 7 7 7 7 7 7 7 5

(20)

Then the moment conditions for the ABH estimator can be constructed from gi;ABH ¼ C AB;n0 BðBn I T ÞðDui ui Þ  C AB;n0 Bn yi0 Dui 0 1 " n # gi;1 ¼ C AB;n0 n , gi;2 where gi;ABH ¼ ½yi;0 uni1 ; yi;0 uni2 ; yi1 uni2 ; yi;0 uni3 ; . . . ; yi2 uni3 ; . . . ; yi0 uniT1 ; . . . ; yiT2 uniT1 0 , and we have defined gni;1 ¼ Bðuni ui Þ and

gni;2 ¼ yi0 uni

with uni ¼ ½uni1 ; . . . ; uniT1 0 . Also define gnn ¼ n3=2 n0 n0 0 ½f i;1 ; f i;2  where f ni;1 ¼ BðBn I T ÞðDyi;1 ui Þ

and

Pn

i¼1

n0 n0 0 ½gi;1 ; gi;2  and f nn ¼ n3=2

Pn

i¼1

f ni;2 ¼ Bn yi0 Dyi;1 .

The Arellano and Bover estimator can therefore be written as b^ABH  bn ¼ ðf nn0 C AB;n ðC AB;n0 Onn C AB;n Þþ C AB;n0 f nn Þ1 f nn0 C AB;n ðC AB;n0 Onn C AB;n Þþ C AB;n0 gnn , (21) where Onn ¼ Egnn gnn0 . C.6. Definition of the ASM estimator Define C ASM;n as " ASM;n # C0 ASM;n C ¼ , C ASM;n 1

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

615

where " C ASM;n0 0

¼

"

#

C AB;n0 0 ½0T2;ðT2ÞðT1Þ=2þT1 ; I T2 

;

C ASM;n0 1

¼

C AB;n0 1 0T2;T1

# .

The moment conditions can be represented using the selector matrix C ASM;n such that BðDui ui Þ þ C ASM;n0 yi0 Dui gi;ASM ¼ C ASM;n0 0 1 " # gi;1 , ¼ C ASM;n0 gi;2 where a more explicit representation of gi;ASM is given as gi;ASM ¼ ½yi;0 Dui2 ; yi;0 Dui3 ; yi1 Dui3 ; . . . ; yi0 DuiT ; . . . ; yiT2 DuiT ; uiT Dui2 ; . . . ; uiT DuiT1 0 . The ASM estimator can therefore be written as b^ASM  bn ¼ ðf 0n C ASM;n ðC ASM;n0 On C ASM;n Þþ C ASM;n0 f n Þ1 f 0n C ASM;n ðC ASM;n0 On C ASM;n Þþ C ASM;n0 gn ðbn Þ, where gn ðbn Þ, f n and On are defined in Section 3. C.7. Definition of the LD estimator Define the matrix BLD ¼ 10T1 ½ 0T2;1

I T2

0T2;1 

(22)

such that LD gLD ðDui ui Þ; i;1 ¼ B

gLD i;2 ¼ gi;2 ¼ yi0 Dui .

Then define " C

LD0

¼

I T2

0T2;T1

01;T2

10T1

# LD0  ½C LD0 0 ; C1 

such that " gi;LD ¼ C

LD0

gLD i;1 gLD i;2

# .

LD0 LD0 LD LD 3=2 ðDy ui Þ, f LD ¼ ½f LD0 Let f LD i;1 ¼ B i;2 ¼ yi0 Dyi;1 , f i i;1 ; f i;2 , f n ¼ n Pn i;1 LD 3=2 LD0 LD0 gn ¼ n i¼1 ½gi;1 ; gi;2 . The LD estimator can then be written as LD;n LD;n þ LD;n0 LD 1 LD0 LD;n b^LD  bn ¼ ðf LD0 ðC LD;n0 OLD Þ C fn Þ fn C n C n C LD;n þ LD;n0 LD ðC LD;n0 OLD Þ C gn , n C LD LD0 where OLD gn . n ¼ E½gn

Pn

i¼1

f LD and i

ARTICLE IN PRESS 616

J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

References Ahn, S., Schmidt, P., 1995. Efficient estimation of models for dynamic panel data. J. Econometrics 68, 5–28. Ahn, S., Schmidt, P., 1997. Efficient estimation of dynamic panel data models: alternative assumptions and simplified estimation. J. Econometrics 76, 309–321. Alonso-Borrego, C., Arellano, M., 1999. Symmetrically normalized instrumental-variable estimation using panel data. J. Bus. Econ. Statist. 17, 36–49. Alvarez, J., Arellano, M., 1998. The time series and cross-section asymptotics of dynamic panel data estimators. Unpublished manuscript. Anderson, T.W., Hsiao, C., 1981. Estimation of dynamic models with error components. J. Amer. Statist. Assoc. 76 (375), 598–606. Anderson, T.W., Hsiao, C., 1982. Formulation and estimation of dynamic models using panel data. J. Econometrics 18, 47–82. Arellano, M., Bond, S.R., 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev. Econ. Stud. 58, 277–297. Arellano, M., Bover, O., 1995. Another look at the instrumental variable estimation of error-component models. J. Econometrics 68, 29–51. Bekker, P., 1994. Alternative approximations to the distributions of instrumental variable estimators. Econometrica 62, 657–681. Blundell, R., Bond, S., 1998. Initial conditions and moment restrictions in dynamic panel data models. J. Econometrics 87, 115–143. Davis, A.W., 1980. Invariant polynomials with two matrix arguments extending the zonal polynomials. In: Krishnaiah, P.R. (Ed.), Mulitivariate Analysis, vol. V. North-Holland, Amsterdam, pp. 287–299. Donald, S., Newey, W., 2001. Choosing the number of instruments. Econometrica 69, 1161–1191. Griliches, Z., Hausman, J., 1986. Errors in variables in panel data. J. Econometrics 31, 93–118. Hahn, J., 1997. Efficient estimation of panel data models with sequential moment restrictions. J. Econometrics 79, 1–21. Hahn, J., 1999. How informative is the initial condition in the dynamic panel model with fixed effects? J. Econometrics 93, 309–326. Hahn, J., Hausman, J., 2002. A new specification test for the validity of instrumental variables. Econometrica 70, 163–189. Hahn, J., Kuersteiner, G., 2002a. Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large. Econometrica 70, 1639–1657. Hahn, J., Kuersteiner, G., 2002b. Discontinuities of weak instrument limiting distributions. Econ. Lett. 75, 325–331. Hahn, J., Kuersteiner, G., Newey, W.K., 2002. Higher order properties of bootstrap and Jackknife bias corrected maximum likelihood estimators. Unpublished manuscript. Hahn, J., Hausman, J., Kuersteiner, G., 2004. Estimation with weak instruments: accuracy of higher order bias and MSE approximations. Econometrics J. 7, 272–306. Hausman, J.A., Taylor, W.E., 1983. Identification in linear simultaneous equations models with covariance restrictions: an instrumental variables interpretation. Econometrica 51, 1527–1550. Hausman, J.A., Newey, W.K., Taylor, W.E., 1987. Efficient estimation and identification of simultaneous equation models with covariance restrictions. Econometrica 55, 849–874. Holtz-Eakin, D., Newey, W.K., Rosen, H., 1988. Estimating vector autoregressions with panel data. Econometrica 56, 1371–1395. Kiviet, J.F., 1995. On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. J. Econometrics 68, 1–268. Kruiniger, H., 2000. GMM estimation of dynamic panel data models with persistent data. Unpublished manuscript. Kuersteiner, G.M., 2000. Mean squared error reduction for GMM estimators of linear time series models. Unpublished manuscript. Lancaster, T., 1997. Orthogonal parameters and panel data. Unpublished manuscript. Magnus, J., Neudecker, H., 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York. Nagar, A.L., 1959. The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica 27, 575–595.

ARTICLE IN PRESS J. Hahn et al. / Journal of Econometrics 140 (2007) 574–617

617

Newey, W.K., Smith, R.J., 2000. Asymptotic equivalence of empirical likelihood and the continuous updating estimator. Unpublished manuscript. Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426. Rilstone, P., Srivastava, V.K., Ullah, A., 1996. The second-order bias and mean squared error of nonlinear estimators. J. Econometrics 75, 369–395. Rothenberg, T.J., 1983. Asymptotic properties of some estimators in structural models. In: Studies in Econometrics, Time Series, and Multivariate Statistics. Smith, M.D., 1993. Expectations of ratios of quadratic forms in normal variables: evaluating some top-order invariant polynomials. Australian J. Statist. 35, 271–282. Staiger, D., Stock, J.H., 1997. Instrumental variables regressions with weak instruments. Econometrica 65, 557–586. Stock, J.H., Wright, J.H., 2000. GMM with weak identification. Econometrica 68, 1055–1096. Zhang, F., 1999. Matrix Theory. Springer, Berlin.