ARTICLE IN PRESS
Journal of Econometrics 126 (2005) 241–267 www.elsevier.com/locate/econbase
Estimation of a panel data model with parametric temporal variation in individual effects Chirok Hana,, Luis Oreab, Peter Schmidtc a
School of Economics and Finance, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand b University of Oviedo, Spain c Michigan State University, MI, USA Available online 2 July 2004
Abstract This paper is an extension of Ahn et al. (J. Econom. 101 (2001) 219) to allow a parametric function for time-varying coefficients of the individual effects. It provides a fixed-effect treatment of models like those proposed by Kumbhakar (J. Econom. 46 (1990) 201) and Battese and Coelli (J. Prod. Anal. 3 (1992) 153). We present a number of GMM estimators based on different sets of assumptions. Least squares has unusual properties: its consistency requires white noise errors, and given white noise errors it is less efficient than a GMM estimator. We apply this model to the measurement of the cost efficiency of Spanish savings banks. r 2004 Elsevier B.V. All rights reserved. JEL classification: C13; C31 Keywords: GMM; Generalized method of moments; Time-varying individual effects; Fixed-effects; Cost efficiency
Corresponding author.
E-mail address:
[email protected] (C. Han). 0304-4076/$ - see front matter r 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2004.05.002
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
242
1. Introduction In this paper we consider the model yit ¼ X 0it b þ Z0i g þ lt ðyÞai þ it ;
i ¼ 1; . . . ; N; t ¼ 1; . . . ; T:
ð1Þ
We treat T as fixed, so that ‘‘asymptotic’’ means as N ! 1. The distinctive feature of the model is the interaction between the time-varying parametric function lt ðyÞ and the individual effect ai . We consider the case that the ai are ‘‘fixed effects,’’ as will be discussed in more detail below. In this case estimation may be non-trivial due to the ‘‘incidental parameters problem,’’ i.e., the number of a’s grows with sample size; see, for example, Chamberlain (1980). Models of this form have been proposed and used in the literature on frontier productions functions (measurement of the efficiency of production). For example, Kumbhakar (1990) proposed the case that lt ðyÞ ¼ ½1 þ expðy1 t þ y2 t2 Þ 1 , and Battese and Coelli (1992) proposed the case that lt ðyÞ ¼ expð yðt TÞÞ. Both of these papers considered random effects models in which ai is independent of X and Z. In fact, both of these papers proposed specific (truncated normal) distributions for the ai , with estimation by maximum likelihood. The aim of the present paper is to provide a fixed-effects treatment of models of this type. There is also a literature on the case that the lt themselves are treated as parameters. That is, the model becomes yit ¼ X 0it b þ Z0i g þ lt ai þ it ;
i ¼ 1; . . . ; N; t ¼ 1; . . . ; T:
ð2Þ
This corresponds to using a set of dummy variables for time rather than a parametric function lt ðyÞ, and now lt ai is just the product of fixed time and individual effects. This model has been considered by Kiefer (1980), Holtz-Eakin et al. (1988), Lee (1991), Chamberlain (1992), Lee and Schmidt (1993), Ahn et al. (2001) and Bai (2003), among others. This model is sometimes called a one-factor model. Lee (1991) and Lee and Schmidt (1993) have applied this model to the frontier production function problem, in order to avoid having to assume a specific parametric function lt ðyÞ. Another motivation for the model is that a fixed-effects version allows one to control for unobservables (e.g., macro-events) that are the same for each individual, but to which different individuals may react differently. Ahn et al. (2001) establish some interesting results for the estimation of model (2). A generalized method of moments (GMM) estimator of the type considered by Holtz-Eakin et al. (1988) is consistent given exogeneity assumptions on the regressors X and Z. Least squares applied to (2), treating the ai as fixed parameters, is consistent provided that the regressors are strictly exogenous and that the errors it are white noise. The requirement of white noise errors for consistency of least squares is unusual, and is a reflection of the incidental parameters problem. Furthermore, if the errors are white noise, then a GMM estimator that incorporates the white noise assumption dominates least squares, in the sense of being asymptotically more efficient. This is also a somewhat unusual result, since in the usual linear model with normal errors, the moment conditions implied by the white noise assumption would not add to the efficiency of estimation.
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
243
The results of Ahn, Lee and Schmidt apply only to the case that the lt are unrestricted, and therefore do not apply to model (1). However, in this paper we show that essentially the same results do hold for model (1). This enables us to use a parametric function lt ðyÞ, and to test the validity of this assumption, while maintaining only weak assumptions on the ai . This may be very useful, especially in the frontier production function setting. Applications using unrestricted lt have yielded temporal patterns of efficiency that seem unreasonably variable and in need of smoothing, which a parametric function can accomplish. The plan of the paper is as follows. Section 2 restates the model and lists our assumptions. Section 3 considers GMM estimation under basic exogeneity assumptions, while Section 4 considers GMM when we add the conditions implied by white noise errors. Section 5 considers least-squares estimation and the sense in which it is dominated by GMM. In Section 6, this methodology is applied to the measurement of cost efficiency of Spanish banks. Finally, Section 7 contains some concluding remarks.
2. The model and assumptions The model is given in Eq. (1). We can rewrite it in matrix form, as follows. Let yi ¼ ðyi1 ; . . . ; yiT Þ0 ; X i ¼ ðX i1 ; . . . ; X iT Þ0 , and i ¼ ði1 ; . . . ; iT Þ0 . Define a function l : Y ! RT , where Y is a compact subset of Rp , such that lðyÞ ¼ ðl1 ðyÞ; . . . ; lT ðyÞÞ0 . lðyÞ must be normalized in some way such as lðyÞ0 lðyÞ 1 or l1 ðyÞ 1, to rule out trivial failure of identification arising from lðyÞ ¼ 0 or scalar multiplications of lðyÞ. Here we choose the normalization l1 ðyÞ 1. With this notation, our model in matrix form is yi ¼ X i b þ 1T Z 0i g þ lðyÞai þ i ;
i ¼ 1; . . . ; N:
ð3Þ
Here 1T is a T 1 vector of ones. We will let K be the number of variables in X it ; g will be the number of variables in Z i ; and p will be the dimension of y. Then yi is T 1; X i is T K; Z i is g 1 (and 1T Z0i is T g), lðyÞ is T 1; i is T 1; b is K 1; g is g 1; y is p 1 and ai is a scalar. (In this paper, all the vectors are column vectors, and the data matrices are ‘‘vertically tall.’’) We repeat that in this paper asymptotics are as N ! 1 with T fixed, so we view T as a fixed, relatively small number. The analysis of this paper would not apply well to the case where N and T are both large. We assume that ppT 1. When po T 1, our parametric specification for lðyÞ restricts the temporal pattern of l. However, this model also includes the model of Ahn et al. (2001) as the special case corresponding to p ¼ T 1 and lt ðyÞ ¼ yt ; t ¼ 2; . . . ; T (with l1 ðyÞ ¼ 1 as above). Let W i ¼ ðX 0i1 ; . . . ; X 0iT ; Z 0i Þ0 . We make the following ‘‘orthogonality’’ and ‘‘covariance’’ assumptions. Assumption 1 (Orthogonality). EðW 0i ; ai Þ0 0i ¼ 0. Assumption 2 (Covariance). Ei 0i ¼ s2 I T .
ARTICLE IN PRESS 244
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
Assumption 1 says that it is uncorrelated with ai ; Z i , and X i1 ; . . . ; X iT , and therefore contains an assumption of strict exogeneity of the regressors. Note that it does not restrict the correlation between ai and ½Z i ; X i1 ; . . . ; X iT , so that we are in the fixed-effects framework. Assumption 2 asserts that the errors are white noise with finite variance. We also assume the following regularity conditions. Assumption 3 (Regularity). (i) ðW 0i ; ai ; 0i Þ0 is independently and identically distributed over i; (ii) i has finite fourth moment, and Ei ¼ 0; (iii) ðW 0i ; ai Þ0 has finite non-singular second moment matrix; (iv) EW i ðZ 0i ; ai Þ is of full column rank; (v) lðyÞ is twice continuously differentiable in y. The first four of these conditions correspond to assumptions (BA.1)–(BA.4) of Ahn et al. (2001), who give some explanation. We note that condition (iii) requires that the effect ai be correlated with some variable in W i . (It does not require that ai be correlated with all variables in W i ; it just rules out complete uncorrelatedness of ai and W i .) This condition is needed for identification under the orthogonality assumption only. Condition (v) is new, and self-explanatory.
3. GMM under the orthogonality assumption 3.1. Moment conditions and GMM estimation Let uit ¼ uit ðb; gÞ ¼ yit X 0it b Z 0i g, and ui ¼ ui ðb; gÞ ¼ ðui1 ; . . . ; uiT Þ0 . Since uit ¼ lt ðyÞai þ it , it follows that uit lt ðyÞui1 ¼ it lt ðyÞi1 , which does not depend on ai . This is a sort of generalized within transformation to remove the individual effects. The orthogonality assumption (Assumption 1) then implies the following moment conditions: EW i ½uit ðb; gÞ lt ðyÞui1 ðb; gÞ ¼ 0;
t ¼ 2; . . . ; T:
ð4Þ
These moment conditions can be written in matrix form, as follows. Define GðyÞ ¼ ½ l ðyÞ; I T 1 0 , where l ¼ ðl2 ; . . . ; lT Þ0 . The generalized within transformation corresponds to multiplication by GðyÞ0 , and the moment conditions (4) can equivalently be written as follows: Eb1i ðb; g; yÞ ¼ 0;
where b1i ðb; g; yÞ ¼ GðyÞ0 ui ðb; gÞ W i :
ð5Þ
(This corresponds to Eq. (7) of Ahn et al. (2001), but looks slightly different because our W i is a column vector whereas theirs is a row vector.) This is a set of ðT
1ÞðTK þ gÞ moment conditions. Han et al. (2002) contains a proof that these are all of the distinct moment conditions implied by the orthogonality assumption.
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
245
P ^ ^ , and y^ Let b1 ðb; g; yÞ ¼ N 1 N i¼1 b1i ðb; g; yÞ. Then the optimal GMM estimator b; g based on the orthogonality assumption solves the problem min N b1 ðb; g; yÞ0 V 1 11 b1 ðb; g; yÞ b;g;y
ð6Þ
where V 11 ¼ Eb1i b01i evaluated at the true parameters. As usual, V 11 can be replaced by any consistent estimate. A standard estimate would be N 1X ~ g~ ; yÞb ~ 1i ðb; ~ g~ ; yÞ ~ 0 V^ 11 ¼ b1i ðb; N i¼1
ð7Þ
~ g~ ; yÞ ~ is an initial consistent estimate of ðb; g; yÞ such as GMM using identity where ðb; weighting matrix. Under certain regularity conditions (Hansen, 1982, Assumption 3) pffiffiffiffiffi the resulting GMM estimator is N -consistent and asymptotically normal. The variance of the asymptotic distribution of the GMM estimates of b; g and y
1 equals ðB01 V 1 where V 11 ¼ Eb1i b01i and B1 is the matrix of expected derivative 11 B1 Þ of the moment conditions with respect to the parameters. Han et al. (2002) gives an explicit formula for B1 . In practice, the variance expression would be evaluated using consistent estimates V^ 11 and B^ 1 in place of V 11 and B1 . V^ 11 is given in (7). A consistent estimate of B^ 1 can be obtained as N 1 X B^ 1 ¼ B^ 1i ; N i¼1
0 0 ^ Þ W i: B^ 1i ¼ ðG^ X i ; G^ 1T Z 0i ; u^ i1 L
ð8Þ
Here B^ 1i is the matrix of first derivatives of b1i with respect to the parameters ðb; g; yÞ, evaluated at the GMM estimates. In this expression, L is the ðT 1Þ p matrix defined by L ¼ ql =qy0 , where as above l ¼ ðl2 ; . . . ; lT Þ0 . Equivalently, if L is the T p matrix defined by L ¼ ql=qy0 , then L is made up of the last T 1 rows of L. 3.2. Possible simplification (NCH assumption) A practical problem with this GMM procedure is that it is based on a rather large set of moment conditions. For example, in our empirical analysis, b1i will reflect 846 moment conditions. Some ways of dealing with this problem will be discussed in Section 4.3. However, in this section we note that we can reduce the number of moment conditions considerably without sacrificing efficiency of estimation if we make the following assumption of no conditional heteroskedasticity (NCH) of i : NCH Eði 0i jW i Þ ¼ S . Under the NCH assumption, V 11 ¼ E½G0 i 0i G W i W 0i ¼ G0 S G SWW :
ð9Þ
Using this result, the set of moment conditions (5) can be converted into an exactly identified set of moment conditions that yield an asymptotically equivalent GMM estimate. Specifically, we can replace the moment conditions Eb1i ¼ 0 by the moment conditions EB01 V 1 11 b1i ¼ 0. Routine calculation using the forms of B1 ; V 11 and b1i
ARTICLE IN PRESS 246
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
yields the explicit expression: EX 0i GðG0 S GÞ 1 G 0 ui ¼ 0
ð10aÞ
EZ i 10T GðG 0 S GÞ 1 G0 ui ¼ 0
ð10bÞ
1 0 0 0 ES0W a S 1 WW W i L ðG S GÞ G ui ¼ 0:
ð10cÞ
These three sets of moment conditions, respectively, correspond to (21a), (21b), and (21c) of Ahn et al. (2001, p. 229). The point of this simplification is that we have drastically reduced the set of moment conditions: there are ðT 1ÞðTK þ gÞ moment conditions in b1i (Eq. (5)) but only K þ g þ p moment conditions in (10). We note that this is a stronger result than the corresponding result (Proposition 1, p. 229) of Ahn et al. (2001). In order to reach essentially the same conclusion on the reduction of the number of moment conditions, they impose the assumption that i is independent of ðW i ; ai Þ, a much stronger (and unnecessarily stronger) assumption than our NCH assumption. In order to make this procedure operational, we need to replace the nuisance parameters S ; SW a and SWW by consistent estimates, based on some initial consistent GMM estimates of b; g and y. SWW can be consistently estimated by P ^ WW ¼ N 1 N W i W 0 . Also, for any sequence ðbN ; gN Þ that converges in S i i¼1 probability to ðb0 ; g0 Þ, we have N p 1 X ui ðbN ; gN Þui ðbN ; gN Þ0 ! S þ s2a lðy0 Þlðy0 Þ0 : N i¼1
~ g~ ; yÞ, ~ Since GðyÞ0 lðyÞ ¼ 0, for any initial consistent estimate ðb; ! N X ~ g~ Þui ðb; ~ g~ Þ0 GðyÞ ~ 0 N 1 ~ ui ðb; GðyÞ
ð11Þ
ð12Þ
i¼1
will consistently estimate G 0 S G. Thus it is easy to construct a consistent estimate of V 11 as given in (9). In order to consistently estimate the asymptotic variance of the estimates under NCH, we need to estimate SWW ; SW a , and G 0 S G. Estimation of SWW and G 0 S G was discussed above. We can obtain an estimate of SW a from the GMM problem (5). A direct algebraic calculation gives us that 0 N N X 0 0 l^ u^ i 1 X dGðG 0d ^ ^ Wa ¼ 1 ð13Þ S Wi 0
W i ½l0 S S GÞ 1 G^ u^ i =ðl^ lÞ N i¼1 l^ l^ N i¼1 0 ^ g^ Þ; l^ ¼ lðyÞ; ^ G^ ¼ GðyÞ, ^ and ld where u^ i ¼ ui ðb; G is a consistent estimate of PN ^ 0 S 0
1 ^ ^ i u^ 0i G. l S G, one possibility of which is N i¼1 l u It is important to observe that the moment conditions (10) are linear combinations of the moment conditions based on b1i in (5), and therefore they are valid moment conditions under the orthogonality assumption only. That is, these moment conditions hold and can be used as a valid basis of GMM estimation so long as the orthogonality assumption holds, whether or not the NCH assumption holds. The
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
247
set of moment conditions in (5) may be very large, and so the simplification involved in using the exactly identified (minimal size) set of moment conditions (10) may be useful in practice. The only point of the NCH assumption is that, if it holds, the conditions (10) are the optimal exactly identified set of moment conditions, so that GMM using (10) is just as efficient as GMM using the full set of moment conditions given in (5). If the NCH condition does not hold, we can still base GMM on (10), but there is a loss of efficiency relative to using the full set of moment conditions (5). We have already discussed how to estimate the variance matrix of the GMM estimator under the NCH assumption. However, because the moment conditions (10) are still valid without the NCH assumption, it is useful to have an estimate of the variance of the GMM estimate that is consistent whether or not the NCH condition holds. Standard methods starting with the moment conditions (10) should yield such an estimate. Some details are given in Appendix A.
4. GMM under the orthogonality and covariance assumptions 4.1. Moment conditions and GMM estimation In this section we continue to maintain the orthogonality assumption (Assumption 1), but now we add the covariance assumption (Assumption 2), which asserts that Ei 0i ¼ s2 I T . Clearly the covariance assumption holds if and only if Eðui u0i Þ ¼ s2a ll0 þ s2 I T :
ð14Þ
Condition (14) contains TðT þ 1Þ=2 distinct moment conditions. It also contains the two nuisance parameters s2a and s2 , and so it should imply TðT þ 1Þ=2 2 moment conditions for the estimation of b; g and y. These are in addition to the moment conditions (5) implied by the orthogonality assumption. To write these moment conditions explicitly, we need to define some notation. Let H ¼ diagðH 2 ; H 3 ; . . . ; H T Þ, with H t equal to the T ðT tÞ matrix of the last T t columns (the ðt þ 1Þth through Tth columns) of I T for to T, and with H T equal to a T ðT 2Þ matrix of the second through ðT 1Þth columns of I T . Then we can write the distinct moment conditions implied by the orthogonality and covariance assumptions as follows: Eb1i ðb; g; yÞ ¼ 0;
where b1i ¼ G 0 ui W i
ð15aÞ
Eb2i ðb; g; yÞ ¼ 0;
where b2i ¼ H 0 ðG 0 ui ui Þ
ð15bÞ
Eb3i ðb; g; yÞ ¼ 0;
where b3i ¼ G 0 ui
l0 ui : l0 l
ð15cÞ
(In these expressions, G is short for GðyÞ; l is short for lðyÞ, and ui is short for ui ðb; gÞ.)
ARTICLE IN PRESS 248
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
The moment conditions b1i in (15a) are exactly the same as those in (5) of the previous section, and follow from the orthogonality assumption. The moment conditions b2i in (15b) correspond to those in Eq. (12) of Ahn et al. (2001). Note that it is not the case that EðG0 ui ui Þ ¼ 0. Rather, looking at a typical element of this product, we have Eðuit lt ui1 Þuis , which equals zero for sat and sa1. The selection matrix H 0 picks out the logically distinct products of expectation zero, the number of which equals TðT 1Þ=2 1. The selection matrix H plays the same role as the definition of the matrices U it plays in Ahn et al. (2001). We note that the moment conditions b2i follow from the non-autocorrelation of the it ; homoskedasticity would not be needed. The ðT 1Þ moment conditions in b3i in (15c) correspond to those Pin Eq. (13) of Ahn et al. (2001). They assert that, for t ¼ 2; . . . ; T; Eðuit lt ui1 Þð Ts¼1 ls uis Þ ¼ 0, and their validity depends on both the non-autocorrelation and the homoskedasticity of the it . Han et al. (2002) contains a proof that these are all of the distinct moment conditions implied by the orthogonality and covariance assumptions. 4.2. Possible simplification (CIM4 assumption) The set of moment conditions (15) may be large, since the number of moment conditions in (15a) may be large. This point will be discussed further in the next section. In this section we note that we can reduce the number of moment conditions without a loss of efficiency if the following ‘‘conditional independence of the moments up to fourth order’’ (CIM4) assumption holds: CIM4 Conditional on ðW i ; ai Þ; it is independent over t ¼ 1; 2; . . . ; T, with mean zero, and with second, third and fourth moments that do not depend on ðW i ; ai Þ or on t. This is a strong assumption; it implies the orthogonality assumption, the covariance assumption, the NCH assumption, and more. Given assumption CIM4, the moment conditions (10), which are asymptotically equivalent to (15a), can be simplified as follows: EX 0i PG ui ¼ 0
ð16aÞ
EZ i 10T PG ui ¼ 0
ð16bÞ
0 ES0W a S 1 WW W i L PG ui ¼ 0:
ð16cÞ
That is, in place of the large set of moment conditions (15a)–(15c), we can use the reduced set of moment conditions consisting of (16), (15b) and (15c). (As previously, L ¼ ql=qy0 ; and PG is the projection onto G, where G was defined in the discussion above Eq. (5).) We can note that, when S ¼ s2 I, the moment conditions (10) are the same as (16). This is not surprising since, if the CIM4 assumption is true, so is the NCH assumption.
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
249
A final simplification arises if, conditional on ðW i ; ai Þ; it is i.i.d. normal. In this case, (15b) can be shown to be redundant given (15a) and (15c). (See Proposition 4 of Ahn et al., 2001, p. 231.) Hence, in that case, the GMM estimator using the moment conditions (16) and (15c) is efficient. We note that the simplifications that arise here, given the CIM4 assumption or the i.i.d. normal assumption, are similar in spirit to those that arose in Section 3 under the NCH assumption. For example, the set of moment conditions consisting of (16), (15b) and (15c) is much smaller than the full set (15), and this simplified set of moment conditions may be useful in practice, whether or not the CIM4 assumption holds. The point of the CIM4 assumption is simply that it identifies the circumstances under which we can use the reduced set of moment conditions without a loss of efficiency. If the CIM4 assumption does not hold, the GMM estimator using the reduced set of moment conditions is still consistent (so long as the orthogonality and covariance assumptions hold), but it would be less efficient than the GMM estimator using the full set of moment conditions (15). Similar comments apply to the simplification that arises from dropping (15b): we can always do this, but it causes a loss of efficiency if the i.i.d. normal assumption does not hold. In Appendix A we show how to calculate an estimate of the variance of these GMM estimates that is consistent whether or not the CIM4 or i.i.d. normal assumptions hold. Han et al. (2002) also gives an explicit expression for their variance, but calculated under the assumption that CIM4 holds. 4.3. Further discussion of the problem of too many moment conditions The number of moment conditions implied by the orthogonality assumption, with or without the covariance assumption, may be very large. For example, in our empirical example, the number of moment conditions (dimension of b1i ) implied by the orthogonality assumption is 846, and the number of moment conditions (dimension of b1i ; b2i and b3i ) implied by the orthogonality and covariance assumptions is 872. One problem that can arise, and which does arise in our empirical example, is that the estimated variance of the moment conditions ðV^ 11 Þ is singular if the number of moment conditions exceeds the sample size, N. (Our empirical example has N ¼ 50, so this clearly occurs.) This is true because the rank of V^ 11 cannot exceed N, but its dimension is the number of moment conditions. A mechanical solution to the þ problem is to use the Moore–Penrose pseudo-inverse V^ 11 as the weighting matrix. For a justification of this procedure, see Doran and Schmidt (2002). Another solution to this problem would be to reduce the number of moment conditions by using a subset of them. One possibility is to replace the instruments W i in (5) or (15a) by Pi , where Pi is a subset of W i . Some details relevant to this procedure are given in Appendix B. Now consider the set of moment conditions (10) of Section 3.2. (Similar comments apply to the set of moment conditions (16) of Section 4.2.) This is an exactly identified set of moment conditions so questions of the weighting matrix do not apply. Note that (10c)—or (16c)—contains SWW which is estimated by
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
250
P ^ WW ¼ N 1 N W i W 0 . This matrix will be singular if the number of instruments S i i¼1 (dimension of W i ) exceeds N. (This occurs in our empirical example because N ¼ 50 but the dimension of W i is 141.) Once again, a generalized inverse can be used. However, in this case the calculation of the generalized inverse can be avoided, by the following observation. In (10c) or (16c) we need to calculate S0W a S 1 WW W i , which is just the linear projection of ai on W i . When the dimension of W i exceeds N; S^ WW is singular but the projection is still well-defined, and in fact in this case the projection of ai on W i just equals ai . (We have a ‘‘perfect fit’’ in the regression because we have more regressors than observations.) As a result, when the dimension of W i exceeds N, we simply replace S0W a S 1 WW W i by ai . To operationalize this, we need an estimate of ai . The simplest estimate is 0 0 ð17Þ a^ i ¼ l^ u^ i =l^ l^ ~ u^ i ¼ ui ðb; ~ g~ Þ, and b; ~ g~ and y~ are initial consistent estimates (e.g., where l^ ¼ lðyÞ; from GMM based on (10), using the identity weighting matrix), see Eq. (19).
5. Least squares In this section we consider the concentrated least squares (CLS) estimation of the model. We treat the ai as parameters to be estimated, so this is a true ‘‘fixed effects’’ treatment. We can consider the following least-squares problem: min
b;g;y;a1 ;...;aN
N 1
N X
½yi X i b 1T Z 0i g lðyÞai 0 ½yi X i b 1T Z 0i g lðyÞai :
i¼1
ð18Þ Solving for a1 ; . . . ; aN first, we get ai ðb; g; yÞ ¼ ½lðyÞ0 lðyÞ 1 lðyÞ0 ui ðb; gÞ;
i ¼ 1; . . . ; N
ð19Þ where ui ðb; gÞ ¼ yi X i b
as before. Then the estimates b^ LS ; g^ LS , and y^ LS minimizing (18) are equal to the minimizers of the sum of the squared concentrated residuals 1T Z 0i g
g; yÞ ¼ N 1 Cðb;
N X i¼1
C i ðb; g; yÞ ¼ N 1
N X
ui ðb; gÞ0 M lðyÞ ui ðb; gÞ
ð20Þ
i¼1
which is obtained by replacing ai in (18) with (19). From the name of (20), we call b^ LS ; g^ LS and y^ LS the concentrated least-squares estimator. Since G 0 l ¼ 0, we have M l G ¼ G and therefore M l ¼ PG ¼ GðG 0 GÞ 1 G0 . So the first order conditions of the CLS estimation become 2 3 2 3 X 0i PG ui qC=qb N 2 X6 6 7 7 Z i 10T PG ui ð21Þ 4 qC=qg 5 ¼
4 5 ¼ 0: N i¼1 0 1 0 0 L PG ui ui lðl lÞ qC=qy
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
251
Interpreting (21) as sample moment conditions, we can construct the corresponding (exactly identified) implicit population moment conditions: EX 0i PG ui ¼ 0
ð22aÞ
EZ i 10T PG ui ¼ 0
ð22bÞ
EL0 PG ui u0i lðl0 lÞ 1 ¼ 0:
ð22cÞ
That is, the CLS estimator is asymptotically equivalent to the GMM estimator based on (22). The moment conditions (22a) and (22b) are satisfied under the orthogonality assumption. However, this is not true of (22c). The moment conditions (22c) require the covariance assumption to be valid (unless we make very specific and unusual assumptions about the form of l and its relationship to the error variance matrix). Thus, the consistency of the CLS estimator requires both the orthogonality assumption and the covariance assumption. This is a rather striking result, since the consistency of least squares does not usually require restrictions on the second moments of the errors, and is a reflection of the incidental parameters problem. We would generally believe that least squares should be efficient when the errors are i.i.d. normal. However, similarly to the result in Ahn et al. (2001), this is not true in the present case. The efficient GMM estimator under the orthogonality and covariance assumptions uses the moment conditions (15), while the CLS estimator uses only a subset of these. This can be seen most explicitly in the case that, conditional on ðW i ; ai Þ, the it are i.i.d. normal. Then (15b) is redundant and (15a) can be replaced by (16), so that the efficient GMM estimator is based on (16a), (16b), (16c) and (15c). The CLS estimator is based on (22a), which is the same as (16a); (22b), which is the same as (16b); and (22c), which is a subset of (15c).1 So the inefficiency of CLS lies in its failure to use the moment conditions (16c) and from its failure to use all of the moment conditions in (15c). The latter failure did not arise in the Ahn et al. (2001) analysis (see footnote 1). Han et al. (2002) gives the asymptotic variance matrix of the CLS estimator, under the CIM4 assumption of Section 4. Alternatively, along the lines of Appendix A, we could calculate our estimate of the asymptotic variance matrix that does not depend on the validity of the CIM4 assumption. This estimate would appear to be the more useful in practice.
6. Empirical application This section includes an application of the estimators suggested in previous sections to the measurement of cost efficiency. The application uses panel data from 1 The moment conditions (22c) are equivalent to EL0 GðG0 GÞ 1 b3i ¼ 0. When the number of parameters in y is less than T 1, the transformation L0 GðG0 GÞ 1 loses information. This will be so in most parametric models for lðyÞ, though it is not true in the model of Ahn et al. (2001).
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
252
Spanish savings banks covering the period 1992–1998. In order to allow for changes in cost efficiency over time, the individual effects are modeled in a parametric form as the ‘‘inverse’’ of the exponential time-varying function proposed by Battese and Coelli (1992) in an MLE framework. We stress at the outset that the motivation is not to undertake a serious analysis of Spanish saving banks, but rather to see how our estimates perform with a real data set. Therefore, we are more interested in seeing how the various estimators compare than we are in choosing the ‘‘best’’ or ‘‘correct’’ set of results. 6.1. The cost frontier model The technology of banks is modeled using the following translog cost function: ln C it ¼ ln C ðqit ; wit ; g; bÞ þ ait þ it ¼
gþ
m X
bqj ln qjit þ
j¼1
þ
n X
bwj ln wjit þ
j¼1
þ
m X n X
m X 1X bqj qj ðln qjit Þ2 þ bqj qk ln qjit ln qkit 2 j¼1 jo k
n X 1X bwj wj ðln wjit Þ2 þ bwj wk ln wjit ln wkit 2 j¼1 jo k !
bqj wk ln qjit ln wkit
þ expðyðt 1ÞÞai þ it
(23)
j¼1 k¼1
where C it is observed total cost, qjit is the jth output, wkit is the kth input price, b is a vector of parameters to be estimated, g is a scalar to be estimated, and it is the error term. The individual effects are modeled as the product of an exponential timevarying function lt ðyÞ ¼ expðyðt 1ÞÞ and a time-invariant firm effect. Our GMM estimators effectively treat the ai as fixed. We can define the timevarying individual effects (intercepts) as ait ¼ ai lt ðyÞ. We wish to decompose the time-varying intercepts into a frontier intercept which varies over time ðat Þ and a non-negative inefficiency term ðvit Þ. That is ait ¼ lt ðyÞai ¼ at þ vit ;
vit X0:
ð24Þ
Following Cornwell et al. (1990) the frontier intercept can be estimated as ^ minð^ai Þ a^ t ¼ minð^ait Þ ¼ lt ðyÞ
ð25Þ
and the inefficiency term as
^ a^ i minð^ai Þ ¼ lt ðyÞ^ ^ vi : v^it ¼ lt ðyÞ
ð26Þ
i
i
i
Since the dependent variable is expressed in natural logs, cost efficiency indexes can be calculated from (26) as
^ a^ i minð^ai Þ : dit ¼ exp lt ðyÞ CE ð27Þ i
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
253
It is easy to see from expressions (26) and (27) that cost efficiency compares the performance (individual effect) of a particular firm with a firm located on the frontier (i.e., with the minimum effect). Since cost efficiency is a relative concept, the average efficiency index is related to the variance s2a : the higher the variance of the individual effects, the smaller the average efficiency. 6.2. Data The application uses yearly data from Spanish savings banks. The number of banks has decreased over the last 10 or 15 years due to mergers and acquisitions. These mergers took place mainly in the early 1990s. In order to work with a balanced panel, we use data from 50 savings banks over the period 1992–1998. Thus N ¼ 50 and T ¼ 7. In Spain there are private banks as well as savings banks. We did not include private banks in our sample because private banks and savings banks are rather different. Savings banks concentrate on retail banking, providing checking, savings accounts and loan service to individuals (especially mortgage loans), whereas private banks are more involved in commercial and industrial loans. Another difference is that savings banks are more specialized than private banks in long-term loans, which do not require continuous monitoring. Since the two groups are likely to have different cost structures and have been regulated in different ways, we did not want to pool them. We tried to analyze private banks separately, but we were not very successful, and we do not report those results here. The variables used in the analysis are defined as follows. We follow the majority of the literature and use the intermediation approach (proposed by Sealey and Lindley, 1977) which treats deposits as inputs and loans as outputs. We include three types of outputs and three types of inputs. The outputs are: loans to banks, and other profitable assets ðq1 Þ; loans to firms and households ðq2 Þ; and non-interest income ðq3 Þ. Using non-interest income goes beyond the intermediation approach as commonly modeled. We include it in an attempt to capture off-balance-sheet activities such as securitization, brokerage services, and management of financial assets for individual customers and mutual funds. This method of measuring nontraditional banking activities is not fully satisfactory. For example, we cannot distinguish between variation due to changes in volume and variation due to changes in price, and non-interest income is partly generated from traditional activities such as service charges on deposits or credits rather than from non-traditional activities. The inputs are: borrowed money, including demand, time and saving deposits, deposits from non-banks, securities sold under agreements to repurchase, and other borrowed money ðx1 Þ; labor, measured by total number of employees ðx2 Þ; and physical capital, measured by the value of fixed assets in the balance sheet ðx3 Þ. All of the input prices, wi ði ¼ 1; 2; 3Þ were calculated in a straightforward way by dividing nominal expenses by input quantities. Accordingly, total cost includes both interest and operating expenses. We normalize our variables by dividing by the sample geometric mean, so that the first-order coefficients can be interpreted as the elasticities evaluated at that point.
ARTICLE IN PRESS 254
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
Furthermore, the standard homogeneity of degree one in input prices is imposed by normalizing cost and input prices using the price of physical capital as a numeraire. Thus, in the end we have a translog function of five variables—three outputs and two normalized input prices. We also have an overall intercept, and our function lt ðyÞ has y as a scalar. So, in terms of our previous notation, we have N ¼ 50; T ¼ 7; K ¼ 20; g ¼ 1 and p ¼ 1. We have a total of 22 parameters (20 in b; 1 in g; 1 in y), in addition to the individual effects ai ði ¼ 1; . . . ; 50Þ. We note that the instrument vector W i defined in Section 2 is of dimension TK þ g ¼ 141, which is rather large. For some of our methods of estimation we will instead make use of a reduced instrument set Pi , as discussed in Section 4.3 and Appendix B. Our instrument set Pi is of dimension 6, and contains (for bank i) an intercept, plus the mean (over time) values of the three outputs and the two normalized input prices. 6.3. Methods of estimation The cost equation is estimated using the GMM estimators of the previous sections. We will use the following notation to refer to our estimators. GMM1(W) is the GMM estimator based on the moment conditions (5). This estimator uses the orthogonality assumption only. The W in the parentheses indicates that it uses the full instrument set W i . There are 846 moment conditions. Since the number of moment conditions exceeds N (which equals 50), V^ 11 is singular þ and we used the Moore–Penrose pseudo-inverse V^ 11 , as discussed in Section 4.3. GMM1(P) is the GMM estimator that exploits the same orthogonality assumption, but uses only the reduced instrument set Pi , that is, the moment conditions (B.1). There are 36 moment conditions so the estimated weighting matrix is non-singular (which is the point of using Pi ). GMM2 refers to the estimator based on the moment conditions (10), which were motivated by the NCH assumption. However, we evaluate the asymptotic variances without assuming NCH—see Appendix A. This is an exactly identified problem ðnumber of moment conditions ¼ number of parameters ¼ 22Þ so the weighting matrix is irrelevant. We need estimates of the nuisance parameters in (10c), however, GMM2(W) will refer to the case that the nuisance parameter estimates are based on the GMM1(W) results, whereas GMM2(P) means that the GMM1(P) results were ^ WW is singular (W i is of dimension 141 but N ¼ 50) we used. In either case, since S 0
1 need to replace SW a SWW W i by a^ i in (10c), as was discussed in Section 4.3. GMM3(W) is the GMM estimator that is based on the moment conditions (15). This estimator relies on the orthogonality and covariance assumptions. The number of moment conditions is 846 þ 20 þ 6 ¼ 872. Once again we need a generalized inverse. GMM3(P) uses the reduced instrument set Pi in (15a), and also omits the conditions (15b), in order to avoid having more moment conditions than N. The number of moment conditions is 36 þ 6 ¼ 42. GMM4 is the estimator based on the moment conditions (16), (15b) and (15c). This set of moment conditions was motivated by the CIM4 assumption. The number
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
255
of moment conditions is 22 þ 20 þ 6 ¼ 48. (We evaluate nuisance parameters in (16) using the GMM3(W) results so we could call our estimator GMM4(W). Also we ^ i in (16c), for the same reason as was discussed need to replace S0W a S 1 WW W i by a above for GMM2.) We evaluate the asymptotic variance without imposing the CIM4 assumption, as in Appendix A. GMM5 is the estimator based on the moment conditions (16) and (15c). That is, compared to GMM4 we just drop the b2i condition (15b). The number of moment conditions is 28. We will call this estimator GMM5(W) when the nuisance parameters are evaluated using the GMM3(W) results, and GMM5(P) when the nuisance parameters are evaluated using the GMM3(P) results. In either case we ^ i in (16c). need to replace S0W a S 1 WW W i by a CLS is the concentrated least-squares estimator of Section 5. In addition, we also consider the following estimators from the existing literature. WITHIN is the standard within (fixed-effects) estimator that assumes timeinvariant efficiency. See Schmidt and Sickles (1984). MLE1 is a Battese–Coelli (1992)-type estimator. The model is still (23) but now we assume that it is i.i.d. Nð0; s2 Þ whereas ai is i.i.d. as the truncation of Nð0; s2a Þ. See Battese and Coelli (1992) for details of estimation of the inefficiencies. MLE2 is an extension of MLE1, as follows. The model for MLE1 is not really comparable to our model for GMM, because the model for MLE1 has a frontier that varies over time only because the explanatory variables vary over time—the intercept is time invariant. (Compare this to the time-varying intercept for the GMM models, as in Eq. (24).) For MLE2 we assert that ai ¼ a þ ui , where ui is truncated normal, but aa0. Now the error is it þ ui expðyðt 1ÞÞ, so that we have a model of Battese–Coelli type, but the regression function (frontier) contains a expðyðt 1ÞÞ which is time varying. An empirically relevant observation is that a and g are only separately identified when ya0. We can make the following comparisons among the estimators. WITHIN is the special case of GMM1 corresponding to y ¼ 0. If y is not equal to zero, WITHIN is not really comparable to the other estimators because it has timeinvariant inefficiency. Similarly, unless y ¼ 0, MLE1 is not really comparable to the other estimators because it has a time-invariant frontier intercept. GMM1 is based on fewer assumptions than GMM3. All of the other GMM estimators (GMM2, GMM4, GMM5, CLS) represent attempts at simplification of GMM1 or GMM3. For all of the GMM estimators, as for WITHIN, efficiency is a relative concept in the sense that it is explicitly measured relative to the best firm in the sample, as opposed to an absolute standard. This is a consequence of not making a distributional assumption for vit , and might be expected to lead to low measured efficiencies. This phenomenon was given a Bayesian interpretation by Koop et al. (1997), who showed that the WITHIN estimator corresponds to an improper prior which strongly favors low efficiency. MLE2 differs from the GMM estimators in a number of ways. It relies on stronger assumptions. The individual effects ai are assumed to be the sum of a constant plus an i.i.d. half-normal error, and the it are assumed to be i.i.d. normal. Furthermore,
ARTICLE IN PRESS 256
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
the MLE2 model makes the ‘‘random effects’’ assumption that the effects are independent of the regressors. This assumption violates Assumption 3(iii), and if it is true the parameter y is not identified by the orthogonality assumption alone, so the GMM1 estimator is not consistent. More fundamentally, because it makes a distributional assumption for the effects, it measures efficiency relative to an absolute standard, and might be expected to yield higher efficiency measures than the GMM procedures (as, indeed, turns out to be the case below). 6.4. Empirical results (economic discussion) The parameter estimates for the cost frontiers are presented in Table 1. All of the output coefficients are positive, except for one coefficient for GMM1(P), indicating that the estimated cost frontiers are, at the sample geometric mean, increasing in outputs. The input price elasticities are also positive at the geometric mean. These results confirm the positive monotonicity of the cost frontiers with respect to input prices and assure positive input cost-shares. Borrowed money represents about 65 percent of banks’ total cost, while labor represents about 30 percent, and physical capital less than 5 percent. Returns to scale can be estimated as one minus the scale elasticity; that is, as one minus the sum of the output cost elasticities. At the sample mean, the scale elasticity and returns to scale are only a function of the first-order output parameters. The results indicate moderately increasing returns to scale. However, this differs somewhat over methods of estimation. We can distinguish three groups of estimators in terms of their estimated returns to scale. For GMM2(W), GMM4(W), GMM5(W), CLS, MLE1 and MLE2, this value is positive but less than 0.10, indicating the existence of moderate scale economies, as found in many past analyses of Spanish banks. These scale economies, however, are larger for WITHIN, GMM1(W), GMM3(W), GMM2(P) and GMM5(P), and they are considerably larger (greater than 0.30) for GMM1(P) and GMM3(P). Next we consider the levels of cost efficiency implied by these estimates. The cost efficiency indexes are obtained using Eq. (27), except for the MLE estimators, where we follow Battese and Coelli (1992), with minor modifications to accommodate the case of a cost frontier instead of a production frontier. Table 2 reports the yearly average of the estimated efficiency levels. For GMM4(W), GMM5(W), CLS, MLE1 and MLE2, we obtain a mean efficiency index of approximately 85 or 90 percent. In these cases, the average efficiency levels are similar to those found in other studies using the same data set (see, for instance, Cuesta and Orea, 2002). However, the scores are slightly lower for GMM1(W), GMM2(W), GMM3(W) and GMM5(P). They are even lower for WITHIN and GMM2(P), and very low (under 40 percent) for GMM1(P) and GMM3(P). Except for the WITHIN estimator (where time-invariance of efficiency is imposed), the cost efficiency indexes increase or decrease over time on the basis of the sign of y. If this parameter is positive (negative), efficiency decreases (increases) and the differences among firms increase (decrease) due to the exponential functional form of lt ðyÞ. The estimated y value is positive in all cases except GMM2(P), and it is
Table 1 Estimated coefficients WITHIN Coef.
0.314 0.423 0.120 0.634 0.328 0.191 0.029
0.009 0.140 0.218
0.121
0.061 0.017 0.028 0.043
0.035
0.120 0.016 0.065
0.125 9.968 0.045
t-stat. 12.28 9.97 4.39 49.71 14.05 2.76 0.22
0.09 3.82 2.40
1.71
1.66 0.39 0.44 0.48
0.80
2.01 0.36 1.14
2.64 132.45 2.81
Coef. 0.320 0.529 0.059 0.667 0.284 0.199 0.145
0.078 0.126 0.112
0.204 0.011 0.054
0.024 0.042
0.046
0.019 0.004 0.028
0.123 9.848 0.061
t-stat. 30.55 29.79 4.81 89.01 25.75 10.24 2.94
2.02 6.38 2.45
9.58 0.56 3.79
1.21 0.98
1.92
0.58 0.21 0.93
4.96 209.90 4.14
Coef. 0.302 0.430 0.109 0.634 0.332 0.184 0.016
0.055 0.134 0.211
0.132
0.044 0.023 0.021 0.070
0.041
0.122 0.019 0.070
0.120 9.962 0.042
GMM4(W)
t-stat. 11.88 9.77 4.33 49.98 14.57 2.76 0.12
0.58 3.73 2.24
1.91
1.32 0.59 0.40 0.77
1.05
2.20 0.46 1.31
2.62 155.64 3.08
Coef.
t-stat.
0.325 74.67 0.529 54.68 0.061 8.87 0.692 152.17 0.293 39.26 0.179 10.92 0.121 4.83 0.048 1.90 0.179 13.09 0.145 6.46
0.109
8.29
0.077
8.26 0.040 3.36 0.029 2.20
0.004
0.13
0.049
2.98
0.037
2.01 0.035 2.46 0.005 0.26
0.168 14.08 9.721 4174.2 0.087 76.85
GMM5(W) Coef. 0.325 0.553 0.054 0.691 0.301 0.179 0.120 0.035 0.156 0.090
0.119
0.064 0.032 0.026
0.001
0.036
0.033 0.025 0.019
0.147 9.738 0.103
CLS
t-stat. 38.28 28.29 3.36 72.02 21.15 7.93 2.09 0.72 6.74 1.80
5.05
3.64 1.88 1.19
0.03
1.22
0.71 0.98 0.48
6.76 801.53 15.39
Coef. 0.309 0.543 0.057 0.678 0.305 0.177 0.102 0.000 0.141 0.065
0.126
0.054 0.050
0.004 0.017
0.086 0.036 0.046
0.013
0.116 9.814 0.063
0.189
0.143
0.092
0.159
0.085
0.068
0.091
0.026
0.298 0.040 28.526
0.186 0.021
0.298 0.040 27.694
0.099 0.019 42.904
0.096 0.019 9.629
0.015 0.159 0.003
t-stat. 44.21 47.01 7.00 91.77 35.58 12.38 2.98 0.01 9.63 2.28
7.61
4.22 4.65
0.24 0.63
5.23 1.52 3.24
0.63
6.96 391.43 9.34
ARTICLE IN PRESS
22.60 24.45 2.74 61.27 21.10 6.79 2.29 0.47 4.48 1.16
4.22
1.72 4.36
2.44
0.06
5.08 1.77 2.63 0.15
2.45
Coef.
GMM3(W)
257
sa s Obj. function
0.284 0.491 0.036 0.694 0.292 0.165 0.136 0.021 0.108 0.055
0.119
0.039 0.075
0.062
0.003
0.130 0.068 0.060 0.005
0.064
t-stat.
GMM2(W)
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
ln q1 ln q2 ln q3 ln w1 ln w2 0.5(ln q1 Þ2 0.5(ln q2 Þ2 0.5(ln q3 Þ2 0.5(ln w1 Þ2 0.5(ln w2 Þ2 ln q1 ln q2 ln q1 ln q3 ln q1 ln w1 ln q1 ln w2 ln q2 ln q3 ln q2 ln w1 ln q2 ln w2 ln q3 ln w1 ln q3 ln w2 ln w1 ln w2 Intercept y a RTS
GMM1(W)
258
Table 1 (continued ) MLE1 Coef.
MLE2 t-stat.
Coef.
0.060
0.304 0.590 0.047 0.683 0.305 0.176 0.068 0.014 0.143 0.028
0.104
0.076 0.051
0.013 0.037
0.100 0.066 0.063
0.027
0.117 9.807 0.075
0.230 0.059
sa s Obj. function
0.027 0.140 668.05
0.024 0.112 706.55
28.65 27.56 2.37 73.10 18.22 7.42 1.70 0.86 5.78 1.12
3.41
3.41 3.54
1.77
0.06
5.01 2.19 3.13
0.44
3.86 648.74 0.05
31.50 34.75 3.95 60.26 24.23 8.25 1.41 0.37 6.45 0.68
4.08
3.87 3.13
0.60 1.04
3.95 1.99 2.72
0.87
4.81 291.45 6.38
6.77
Coef. 0.156 0.480
0.016 0.642 0.283
0.135 0.445 0.161 0.066 0.159
0.018 0.191 0.238
0.135
0.400
0.238
0.095 0.011 0.221
0.032 7.751 0.002
t-stat. 2.85 7.62
0.24 19.45 3.53
1.09 2.33 1.17 1.22 1.27
0.10 2.72 3.19
2.00
2.76
2.59
1.01 0.22 2.29
0.49 0.44 0.12
Coef. 0.292 0.492 0.032 0.670 0.280 0.150 0.272
0.013 0.081 0.080
0.197 0.058 0.088
0.058
0.059
0.105 0.032 0.018 0.033
0.064 9.314
0.013
GMM3(P)
t-stat. 21.54 17.26 2.49 74.41 22.45 6.36 4.15
0.29 3.37 1.39
8.74 2.31 5.09
2.99
1.04
3.40 0.79 0.74 0.93
2.11 27.85
1.28
Coef. 0.164 0.454 0.071 0.642 0.249
0.173 0.336 0.152
0.003 0.335 0.029 0.153 0.274
0.092
0.334
0.228
0.221
0.016 0.255
0.002 10.112 0.005
GMM5(P)
t-stat. 4.75 8.52 1.61 24.04 3.51
1.78 1.96 1.08
0.05 2.13 0.21 2.30 4.46
1.52
2.44
2.84
2.06
0.37 2.85
0.03 128.04 1.03
Coef. 0.313 0.520 0.026 0.698 0.287 0.163 0.138 0.019 0.147 0.077
0.123
0.032 0.076
0.027
0.020
0.123 0.023 0.062 0.009
0.096 9.659 0.035
0.380
0.185
0.311
0.141
1.894 0.044 4.472
0.373 0.020
0.607 0.051 10.221
0.132 0.028 8.231
t-stat. 21.11 23.23 1.80 68.70 17.80 7.32 1.85 0.33 6.47 1.48
4.19
1.20 4.44
1.15
0.33
4.58 0.60 2.87 0.25
4.81 299.29 11.15
ARTICLE IN PRESS
0.316 0.592 0.032 0.754 0.245 0.181 0.122 0.052 0.140 0.053
0.101
0.079 0.061
0.046
0.004
0.142 0.082 0.082
0.016
0.103 9.504 0.001
t-stat.
GMM2(P)
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
ln q1 ln q2 ln q3 ln w1 ln w2 0:5ðln q1 Þ2 0:5ðln q2 Þ2 0:5ðln q3 Þ2 0:5ðln w1 Þ2 0:5ðln w2 Þ2 ln q1 ln q2 ln q1 ln q3 ln q1 ln w1 ln q1 ln w2 ln q2 ln q3 ln q2 ln w1 ln q2 ln w2 ln q3 ln w1 ln q3 ln w2 ln w1 ln w2 Intercept y a RTS
GMM1(P)
WITHIN
GMM1 (W)
GMM2 (W)
GMM3 (W)
GMM4 (W)
GMM5 (W)
CLS
MLE1
MLE2
GMM1 (P)
GMM2 (P)
GMM3 (P)
GMM5 (P)
1 2 3 4 5 6 7
57.25 57.25 57.25 57.25 57.25 57.25 57.25
82.73 82.05 81.35 80.63 79.89 79.12 78.33
85.40 84.58 83.72 82.81 81.87 80.87 79.84
82.70 82.07 81.42 80.75 80.06 79.35 78.63
89.19 88.29 87.33 86.29 85.18 83.99 82.72
89.74 88.72 87.60 86.38 85.05 83.61 82.05
88.37 87.68 86.96 86.19 85.39 84.55 83.67
89.00 89.00 88.99 88.98 88.98 88.97 88.97
91.03 90.38 89.69 88.95 88.17 87.33 86.45
27.58 27.50 27.43 27.35 27.27 27.19 27.12
52.28 52.71 53.14 53.56 53.98 54.41 54.82
36.64 36.47 36.29 36.12 35.95 35.78 35.61
76.95 76.26 75.55 74.83 74.09 73.34 72.57
1–7
57.25
80.59
82.73
80.71
86.14
86.16
86.12
88.98
88.86
27.35
53.56
36.12
74.80
ARTICLE IN PRESS
Year
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
Table 2 Average efficiency (in percentage)
259
ARTICLE IN PRESS 260
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
statistically different from zero except for GMM1(P), GMM2(P), GMM3(P) and MLE1. Except for MLE1, the estimators that have the smallest values of y are also the ones that have the lowest efficiency scores. In Table 3 we show the Spearman rank correlation coefficients between the efficiency levels estimated using the alternative models. Here the ‘‘observations’’ are the efficiency rankings of the various banks at a given time period ðtÞ. The efficiency levels vary over t but the ranks do not, so these rank correlations are the same for t ¼ 1; 2; . . . ; T. The rank correlation coefficients are often high (in general, over 0.8), indicating that choosing a specific estimator is not necessarily a crucial issue when ranking firms in terms of their efficiency levels. However, we can see that (perhaps unsurprisingly) estimates that yield similar average efficiency scores also tend to yield highly correlated efficiency scores. We can divide the methods into two groups. Group A consists of the eight estimators with the highest average efficiency scores: MLE1, MLE2, CLS, GMM1(W), GMM2(W), GMM3(W), GMM4(W) and GMM5(W). Group B consists of the four estimators with the lowest average efficiency scores: WITHIN, GMM1(P), GMM2(P) and GMM3(P). GMM5(P) is not in either of these groups but lies somewhere between. Then the efficiency scores of any estimator tend to correlate highly with the efficiency scores for other estimators in their group, and poorly with the efficiency scores for the estimators in the other group. The estimated efficiency scores for the first period are graphed according to bank size in Fig. 1. This figure seems to show a negative correlation between efficiency and size. We can see again that WITHIN, GMM1(P), GMM2(P) and GMM3(P) give lower efficiency levels than the other estimators. They also decrease more markedly with bank size. Even for the other methods, however, the efficiency levels are low for the very largest banks. This is especially apparent for the five biggest savings banks, which are the result of large mergers that took place in the early 1990s. Hence, these results support the argument that merged banks are less efficient than non-merged banks, as found also by Cuesta and Orea (2002). 6.5. Empirical results (further econometric discussion) Having discussed the economic results implied by our estimates, we now turn to a more econometrically oriented discussion. The basic issue is how to choose among these different estimators. We want to ask whether any of the models appear to be adequate and which seem to be best, in some sense. These are important and difficult questions in many efficiency estimation exercises, of course. In order to at least partially separate the judgemental aspects of these questions from technical matters, much of the discussion will be put off until the next section The remainder of this section will simply give the results of some specification tests. A standard GMM result is that the minimized value of the GMM criterion function is asymptotically (for large N) distributed as chi-squared, with degrees of freedom equal to the degree of overidentification. Rejection of the hypothesis of valid moment conditions is indicated by a significantly large chi-squared value. In our context, the hypothesis of valid moment conditions is a joint hypothesis that
WITHIN GMM1(W) GMM2(W) GMM3(W) GMM4(W) GMM5(W) CLS MLE1 MLE2 GMM1(P) GMM2(P) GMM3(P) GMM5(P)
100 86.3 84.0 91.2 81.8 71.1 80.5 75.2 69.1 95.6 99.3 95.9 98.3
GMM1 (W)
GMM2 (W)
GMM3 (W)
GMM4 (W)
GMM5 (W)
100 95.3 98.9 95.0 89.7 95.3 84.1 86.5 78.5 85.6 82.3 89.6
100 95.3 97.3 94.3 98.2 92.3 93.8 74.7 84.8 78.8 88.9
100 93.7 86.8 93.8 83.8 84.4 84.5 90.8 87.4 93.4
100 97.5 99.0 95.5 96.1 69.7 80.9 74.5 88.3
100 97.8 94.9 98.4 58.0 70.1 62.7 78.9
CLS
MLE1
MLE2
100 94.4 96.4 69.3 80.1 73.8 86.8
100 96.0 61.2 74.4 66.5 83.7
100 56.1 68.9 61.5 77.4
GMM1 (P)
GMM2 (P)
GMM3 (P)
GMM5 (P)
100 95.6 98.6 91.6
100 96.0 97.4
100 93.2
100
ARTICLE IN PRESS
WITHIN
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
Table 3 Spearman rank correlation coefficients (in percentage)
261
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
262 100
Efficiency (%)
80 WITHIN GMM1(W) GMM2(W) GMM3(W) GMM4(W) GMM5(W) CLS MLE1 MLE2 GMM1(P) GMM2(P) GMM3(P) GMM5(P)
60
40
20
0 -3.325
-0.763 -0.303 0.029 0.314 1.002 Size (in logs and deviations from sample mean)
2.599
Fig. 1. Efficiency levels at first period.
includes correct specification of the regression function plus the correctness of the additional assumptions (orthogonality and/or covariance) that generate the moment conditions. The degree of overidentification is usually taken to be the number of moment conditions minus the number of parameters. However, when the number of moment conditions exceeds the sample size, there is some ambiguity here. A little notation may be useful. Let Nð¼ 50Þ be the number of observations, sð¼ 22Þ be the number of parameters, and r be the number of moment conditions (e.g., for GMM1(W), r ¼ 846). The degree of overidentification is usually quoted as r s. In the usual case that ro N, we can equivalently state this as rankðV^ 11 Þ s since rankðV^ 11 Þ ¼ r when ro N. However, when No r, as is true for some of our estimators, rankðV^ 11 Þ ¼ N and so rankðV^ 11 Þ s ¼ N s. There is really no guidance in the usual asymptotic theory to choose between these two possibilities, since asymptotically (as N ! 1) they become the same. At this point we will avoid a choice by simply describing what happens with either choice. We will begin with the estimators that rely on the orthogonality assumption only, namely, GMM1 and GMM2. We test the validity of the orthogonality assumption (and also the correctness of the regression specification) using the usual GMM overidentification test, as just described. For GMM2, the estimation problem is exactly identified so there is no overidentification test. For GMM1(W), we obtain a criterion function of 28.5, which is unremarkable in a chi-squared distribution with 50 22 ¼ 28 degrees of freedom. This would be a remarkably small value in a chisquared distribution with 846 22 ¼ 824 degrees of freedom. Either way, there is no strong evidence against the orthogonality assumption. For GMM1(P), the statistic is 4.47, which is certainly not significantly large in a chi-squared distribution with
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
263
36 22 ¼ 14 degrees of freedom. (In this case N4 r so there is no ambiguity about the degrees of freedom.) Once again this is a very small chi-squared value: such a small value should arise with a probability of less than 0.01. In any case, this does not constitute against the validity of the orthogonality assumption. We conclude that there is no strong evidence against the validity of the orthogonality assumption. Now consider estimators that use the covariance assumption in addition to the orthogonality assumption. We can test the hypothesis of the validity of the covariance assumption (as well as the model specification and the orthogonality assumption) using the GMM overidentification test for any of the GMM estimators that use these assumptions. GMM3(W) has a chi-squared statistic of 27.7, which is unremarkable in a chi-squared distribution with 50 22 ¼ 28 degrees of freedom, and which would be remarkably small with 872 22 ¼ 850 degrees of freedom. GMM3(P) has a statistic of 10.2, which is very small, not large, with 20 degrees of freedom. GMM4 has a statistic of 42.9, which is significant at the 0.05 level, with 26 degrees of freedom. GMM5(W) has a statistic of 9.63, which is not significant at usual levels, with six degrees of freedom. Finally, GMM5(P) has a statistic of 8.23, which is also not significant at usual levels, with six degrees of freedom. Thus there is some evidence, but certainly not overwhelmingly strong evidence, against the joint hypothesis that the model is correct and that the orthogonality and covariance assumptions hold. 6.6. Interpretation This section gives our interpretation of the preceding results. The comments are subjective but are intended to deal with the sorts of considerations that would arise in a serious empirical analysis of the data. (Of course, they are not the sort of comments that would arise in a Bayesian treatment, where the subjectivity is encapsulated in the prior.) We would dismiss the WITHIN results because there is strong evidence (based on the statistical significance of the parameter y from many of the other estimators) that efficiency varies over time. Also the efficiencies are too low to be believable. This is not just a matter of their being lower than for some other methods; it is that they are too low to be believable economically. We would dismiss the MLE1 results because the MLE2 model contains it as a special case, and the restriction ða ¼ 0Þ that defines the special case is decisively rejected by the data. We would dismiss all of the results that are based on the instrument subset Pi — that is, the GMM1(P), GMM2(P), GMM3(P) and GMM5(P) results. They all give efficiencies that are too low to be believable economically. Furthermore, the GMM1(P) and GMM3(P) returns to scale are too high to be believable economically. We simply may not have picked a good subset of the instruments. This could be true in either of two senses (neither of which we seriously investigated). First, they may be weak in the usual sense of not correlating strongly with the transformed regressors. Second, they might not be correlated with the effects (inefficiency), in
ARTICLE IN PRESS 264
C. Han et al. / Journal of Econometrics 126 (2005) 241–267
which case regularity condition 3(iv) would fail and the bP1i (see Eq. (B.1)) conditions would fail to identify y. Finally, we would dismiss the results from GMM1(W) and GMM3(W) because not enough is known about the finite sample properties of GMM when the number of moment conditions exceeds sample size. We have little feel for whether we could put any trust in the asymptotic theory for this case. The remaining estimators are GMM2(W), GMM4(W), GMM5(W), CLS and MLE2. Actually it is odd to have both GMM2(W) and MLE2 in this set, because MLE2 relies on the random effects assumption, and if it holds GMM2(W) is not consistent; conversely, if it does not hold GMM2(W) is consistent but MLE2 is not. Also, it should be remembered that the efficiency being measured by MLE2 is an absolute standard efficiency, whereas for the others it is a relative standard, so some difference might be expected. However, as a practical matter all of these estimators yield more or less the same results. (Of course, we have discarded other estimators because they gave different results, so maybe that is not surprising.) We can note that we were not in the end comfortable with the GMM estimates that used huge numbers of moment conditions. Our attempt to reduce the number by judgementally shrinking the instrument set was not successful. However, taking the linear combinations of moment conditions that would be optimal under further assumptions (NCH or CIM4) yielded estimators that seemed sensible. These estimators (GMM2, GMM4, GMM5) and the CLS estimator may be reasonable alternatives to MLE in that they rely on weaker assumptions than MLE and yet gave reasonably strong results. It will be interesting to see if this also occurs in empirical work using other data sets.
7. Conclusions In this paper we have considered a panel data model with parametrically timevarying coefficients on the individual effects. We give a ‘‘fixed effects’’ treatment that does not depend on the ‘‘random effects’’ assumption that the effects are uncorrelated with the explanatory variables. This may be contrasted with the existing methods in the literature, which depend both on the random effects assumption and on specific distributional assumptions for the effects and the errors. Following Ahn et al. (2001), we have enumerated the moment conditions implied by alternative sets of assumptions on the model, and we have shown how to construct GMM estimators that exploit these moment conditions. We also show how the number of moment conditions can be reduced under some additional assumptions, including the assumption of no conditional heteroskedasticity and a stronger conditional moment independence assumption. These assumptions are strong but they are still weaker than the assumptions that the existing MLE methods are based on. We have also considered CLS estimation. Here the incidental parameters problem is relevant because we are treating the fixed effects as parameters to be estimated. An interesting result is that the consistency of the least-squares estimator requires both
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
265
the orthogonality assumption and the assumption that the errors are white noise. Furthermore, given the white noise assumption, the least-squares estimator is inefficient, because it fails to exploit all of the moment conditions that are available. Still this estimator may be useful in practice because it is easy to compute and it still relies on weaker assumptions than MLE methods. Finally, we apply the methods developed in the paper to the measurement of cost efficiency of Spanish savings banks over the period 1992–1998. Our application demonstrates that these methods can feasibly be applied in a real empirical setting, and that they are a potentially useful alternative to MLE.
Appendix A. Variance expressions that do not depend on NCH or CIM4 First, we will show how to calculate an estimate of the asymptotic variance matrix of the GMM estimate based on (10). The consistency of this estimate does not depend on the NCH assumption. 0 1 ^ 1 . Here V^ is an estimate of the variance Our estimate will be of the form ðB^ V^ BÞ matrix of the moment conditions, and it can be constructed in the usual way as the average over observations of the sums of squares and cross-products of the moment conditions. (That is, the calculation is analogous to (7), but with a different set of moment conditions.) Similarly, the estimate B^ is an average over observations of matrices of derivatives of the moment conditions with respect to the parameters. (That is, it is analogous to (8).) To calculate these derivatives, we can simply replace the term G0 ui in (10a)–(10c) by the matrix 0 0 ~
½G~ X i ; G~ 1T Z0i ; u~ i1 L
ðA:1Þ
and then take the average over observations (i.e., average over i ¼ 1; . . . ; N). In doing so, we are ignoring the fact that the terms preceding G 0 ui depend on y. But the derivatives of these terms would be multiplied by G 0 ui , and the expectation of this product is zero, so they can be ignored. Similar considerations apply to the GMM estimation based on (16), (15b) and (15c). We can construct an estimate of its variance matrix that does not depend on the correctness of the CIM4 assumption. The only issue is the estimation of the expected derivative matrices. This follows from the previous discussion, as follows. 0 0 ~ Þ (i) For the expected derivative matrix of (16), replace G 0 ui by ðG~ X i ; G~ 1T Z 0i ; u~ i1 L and average over observations. (ii) For the expected derivative matrix of (15b) and (15c), see (B.4), (B.5) and (B.6)–(B.8) in Appendix B.
Appendix B. GMM using a subset of the instruments First, we consider estimation under the orthogonality assumption only. The available moment conditions are b1i as given in Eq. (5). They assert that W i in uncorrelated with ðuit lt ui1 Þ for t ¼ 2; . . . ; T, where W i is the ‘‘instrument’’ vector that contains X i1 ; . . . ; X iT and Zi .
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
266
Now we suppose that we use only a subset of the instruments. So, let Pi be any subset of W i , and we consider the moment conditions EbP1i ðb; g; yÞ ¼ E½GðyÞ0 ui ðb; gÞ Pi ¼ 0 which is the same as (5) except Pi replaces W i . Now let V P11 ¼ consistent estimate would be
ðB:1Þ 0 EbP1i bP1i ,
N 1 X P ~ g~ ; yÞb ~ P ðb; ~ g~ ; yÞ ~ 0 V^ 11 ¼ bP ðb; 1i N i¼1 1i
of which a
ðB:2Þ
~ g~ ; y~ is any consistent estimate. Let BP be the matrix of expected derivatives where b; 1 P of b1i with respect to b; g and y. Then a consistent estimate of BP1 is N 1X P P B^ 1i ; B^ 1 ¼ N i¼1
p 0 0 ~ Þ Pi B^ 1i ¼ ðG~ X i ; G~ 1T Z0i ; u~ i1 L
ðB:3Þ
~ u~ i1 and L ~ are evaluated at any consistent estimate. Finally, ðB^ P V^ P 1 B^ P Þ 1 where G; 1pffiffiffiffi 11 1 ffi is a consistent estimate of the asymptotic variance of N times ½ðb^ bÞ0 ; ð^g gÞ0 ; ðy^ yÞ0 0 , where these are the GMM estimates from (B.1). Next we consider estimation under the orthogonality and covariance assumptions. In this case the set of available moment conditions consists of b1i ; b2i and b3i as given in Eq. (15). Now suppose that we use Pi instead of W i , so that the moment conditions we use are expressed by bP1i ; b2i and b3i . Let bPi be the vector that contains bP1i ; b2i and b3i ; let V P be its variance matrix, and BP be its expected derivative matrix. We can estimate V P as in (B.2), but with bP1i replaced by bPi . The expected derivative matrix BP has blocks BP1 ; B2 and B3 (arranged vertically). BP1 can be estimated as in (B.3). B2 and B3 do not depend on whether we use Pi or W i . We can estimate B2 by N 1 X B^ 2i B^ 2 ¼ N i¼1
ðB:4Þ
where 0 0 ~ Þ u~ i þ G~ 0 u~ i ðX i ; 1T Z 0 ; 0Þ : B^ 2i ¼ H 0 ½ðG~ X i ; G~ 1T Z0i ; u~ i1 L i
ðB:5Þ
Similarly, we can estimate B3 by N 1 X B^ 3i B^ 3 ¼ N i¼1
ðB:6Þ
where 0 l~ u~ i 0 0 ~ Þ 10 G~ 0 u~ i ðl~ 0 X i ; l~ 0 1T Z 0 ; C i Þ B^ 3i ¼ 0 ðG~ X i ; G~ 1T Z 0i ; u~ i1 L i ~l l~ l~ l~ 0 ~ ~ þ 2 l 0u~ i l~ 0 L: ~ C i ¼ u~ 0i L l~ l~
ðB:7Þ ðB:8Þ
ARTICLE IN PRESS C. Han et al. / Journal of Econometrics 126 (2005) 241–267
267
References Ahn, S.C., Lee, Y.H., Schmidt, P., 2001. GMM estimation of linear panel data models with time-varying individual effects. Journal of Econometrics 101, 219–255. Bai, J., 2003. Inferential theory for factor models of large dimensions. Econometrica 71, 135–172. Battese, G.E., Coelli, T.J., 1992. Frontier production functions, technical efficiency and panel data: with application to paddy farms in India. Journal of Productivity Analysis 3, 153–170. Chamberlain, G.C., 1980. Analysis of covariance with qualitative data. Review of Economic Studies 47, 225–238. Chamberlain, G.C., 1992. Efficiency bounds for semiparametric regression. Econometrica 60, 567–596. Cornwell, C., Schmidt, P., Sickles, R., 1990. Production frontiers with cross-sectional and time-series variation in efficiency levels. Journal of Econometrics 46 (1/2), 185–200. Cuesta, R.A., Orea, L., 2002. Mergers and technical efficiency in Spanish savings banks: a stochastic distance function approach. Journal of Banking and Finance 26, 2231–2247. Doran, H.E., Schmidt, P., 2002. GMM Estimators with Improved Finite Sample Properties Using Principle Components of the Weighting Matrix, with an Application to the Dynamic Panel Data Model. Unpublished Manuscript. Han, C., Orea, L., Schmidt, P., 2002. Estimation of a panel data model with parametric temporal variation in individual effects, Working Paper 2002/09, School of Economics and Finance, Victoria University of Wellington (http://www.sef.vuw.ac.nz/vuw/fca/sef/files/prod.pdf); also Efficiency Series Paper 05/2002, Department of Economics, University of Oviedo (http://www.uniovi.es/eficiencia/ESP/esp0502.pdf). Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054. Holtz-Eakin, D., Newey, W., Rosen, H.S., 1988. Estimating vector autoregressions with panel data. Econometrica 56, 1371–1395. Kiefer, N.M., 1980. A time series-cross sectional model with fixed effects with an intertemporal factor structure. Unpublished Manuscript, Cornell University. Koop, G., Osiewalski, J., Steel, M.F.J., 1997. Bayesian efficiency analysis through individual effects. Journal of Econometrics 76, 77–105. Kumbhakar, S.C., 1990. Production frontiers, panel data and time-varying technical inefficiency. Journal of Econometrics 46, 201–212. Lee, Y.H., 1991. Panel data models with multiplicative individual and time effects: applications to compensation and frontier production functions. Unpublished Ph.D. Dissertation, Department of Economics, Michigan State University. Lee, Y.H., Schmidt, P., 1993. A production frontier model with flexible temporal variation in technical inefficiency. In: Fried, H., Lovell, C.A.K., Schmidt, S. (Eds.), The Measurement of Productive Efficiency. Oxford University Press, Oxford. Schmidt, P., Sickles, R., 1984. Production frontiers and panel data. Journal of Business and Economic Statistics 2, 367–374. Sealey, C.W., Lindley, J.T., 1977. Inputs, outputs, and a theory of production and cost at depository financial institutions. Journal of Finance 32, 1251–1266.