Semi-parametric single-index panel data models with interactive fixed effects: Theory and practice


Journal of Econometrics 212 (2019) 607–622

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom


Guohua Feng a, Bin Peng b, Liangjun Su c,∗, Thomas Tao Yang d

a Department of Economics, University of North Texas, 1155 Union Circle #311457, Denton, TX 76203-5017, USA
b Department of Economics, Deakin University, 221 Burwood Hwy, Burwood VIC 3125, Australia
c School of Economics, Singapore Management University, 90 Stamford Road, 178903, Singapore
d Research School of Economics, College of Business and Economics, Australian National University, Canberra, ACT 0200, Australia

Article info

Article history: Received 5 November 2016; received in revised form 25 November 2018; accepted 11 May 2019; available online 9 July 2019.

JEL classification: C14; C23; C51

Keywords: Nonlinear panel data model; Interactive fixed effects; Orthogonal series method

Abstract

In this paper, we propose a single-index panel data model with unobserved multiple interactive fixed effects. This model has the advantages of being flexible and of being able to allow for common shocks and their heterogeneous impacts on cross sections, thus making it suitable for the investigation of many economic issues. The asymptotic theories are established accordingly. Our Monte Carlo simulations show that our methodology works well for large N and T cases. In our empirical application, we illustrate our model by analysing the returns to scale of large commercial banks in the U.S. © 2019 Elsevier B.V. All rights reserved.

✩ The authors would like to thank the Co-Editor, Professor Han Hong, an Associate Editor and two referees for their constructive suggestions and comments on an earlier version of this paper. Su gratefully acknowledges the Lee Kong Chian Fund for Excellence. Peng gratefully acknowledges the MOE (Ministry of Education in China) Project of Humanities and Social Sciences under Project No. 19YJC790206.
∗ Corresponding author.
https://doi.org/10.1016/j.jeconom.2019.05.018
0304-4076/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Panel data models with interactive fixed effects have drawn considerable attention in the last decade. In macroeconomics, interactive fixed effects can be incorporated to account for the heterogeneous impacts of unobservable common shocks that affect all countries or firms (Floro and van Roye, 2017; Boneva and Linton, 2017). In microeconomics, panel data models with interactive fixed effects also find wide application. For example, in earnings studies, interactive fixed effects can be included to account for unmeasured skills or unobservable characteristics such as innate ability, perseverance, motivation and industriousness (Bai, 2009). In bank efficiency analysis, they can be used to represent time-varying unobserved heterogeneities such as asset quality and the geographic reach and local branch density of a bank's branch network (Feng et al., 2017). In finance, a combination of unobserved factors and observed covariates can explain the excess returns of assets (Su and Ullah, 2016).

Among all methodologies dealing with interactive fixed effects, the two best-known studies are Pesaran (2006) and Bai (2009). Building on them, different types of generalizations have been proposed. For example, both Bai et al. (2009) and Kapetanios et al. (2011) allow unobservable factors to be nonstationary; Su and Jin (2012) extend Pesaran (2006) to a non-parametric setting and provide non-parametric versions of the common correlated effects mean group (CCEMG) and common correlated effects pooled (CCEP) estimators using a sieve estimation technique; Li et al. (2016), by using a least absolute shrinkage and selection operator, extend Bai (2009) to allow for structural breaks; etc.

The purpose of this study is to contribute to this literature by extending Bai (2009) to a semi-parametric single-index setting. Precisely, our model is

\[ y_{it} = g_o(x_{it}'\theta_o) + \gamma_{o,i}' f_{o,t} + \varepsilon_{it}, \quad i = 1, \dots, N,\ t = 1, \dots, T, \tag{1.1} \]

where the regressor $x_{it}$ is a $d \times 1$ vector, both the factor $f_{o,t}$ and the factor loading $\gamma_{o,i}$ are $m \times 1$ unknown vectors, and $g_o$ is the so-called link function and is unknown. In addition, we assume both $d$ and $m$ are known and finite. The subscript "o" indicates true parameters or true functions throughout the paper. For notational simplicity, denote

\[ \phi_i[\theta, g] := \big(g(x_{i1}'\theta), \dots, g(x_{iT}'\theta)\big)' \tag{1.2} \]

for any given $\theta$ (belonging to $\mathbb{R}^d$) and $g$ (defined on $\mathbb{R}$). Then (1.1) can be expressed in matrix notation as

\[ Y_i = \phi_i[\theta_o, g_o] + F_o \gamma_{o,i} + \varepsilon_i, \tag{1.3} \]

where $Y_i$, $F_o$ and $\varepsilon_i$ are defined conformably to $\phi_i[\theta_o, g_o]$.

The single-index model with interactive fixed effects can be widely used in many sub-fields of economics, such as production economics, economic growth and international trade. Consider the issue examined in the application of this study: the economies of scale of commercial banks in the U.S. This issue has received considerable attention in the past three decades, because during this period the U.S. banking industry has undergone an unprecedented transformation, one marked by a substantial decline in the number of commercial banks and savings institutions and a growing concentration of industry assets among large financial institutions (Jones and Critchfield, 2005). Most studies investigating this issue use a fully parametric translog cost (distance or profit) function without interactive fixed effects to represent the production technology of banks. Compared with this commonly used fully parametric translog functional form, the single-index panel data model has two advantages.

First, the link function (i.e., $g_o(\cdot)$) is more flexible than the translog functional form. Specifically, the translog function is merely a quadratic specification in log-space, and thus limits the variety of shapes the cost function is permitted to take. In contrast, the single-index model does not assume that $g_o(\cdot)$ is known and hence is more flexible and less restrictive.

Second, the single-index panel data model allows for common shocks, which have increased in the past three decades for the following reasons: (1) the industry has become increasingly dominated by large banks that share similar business models and offer similar ranges of products, which has in turn increased the exposure of those banks to common shocks; and (2) as the process of banking and financial integration has progressed, bank linkages have also risen, which has also increased the exposure of large banks to common shocks (Houston and Stiroh, 2006).

In summary, the flexibility of the link function and the presence of the interactive fixed effects make the model (1.1) very useful for investigating the economies of scale of commercial banks in the U.S.

The structure of this paper is as follows. Section 2 presents the basic settings for model (1.1). Section 3 derives the estimator with the corresponding asymptotic theories. Section 4 provides several extensions with relevant discussions. Section 5 conducts a Monte Carlo study. An empirical application is provided in Section 6. Section 7 concludes. The technical lemmas, proofs, and extra results and discussions can be found in the supplementary file of this paper.

Throughout this paper, we use the following notation: $\lambda_{\min}(W)$ and $\lambda_{\max}(W)$ denote the minimum and maximum eigenvalues of a symmetric square matrix $W$, respectively; $I_q$ denotes the identity matrix with dimension $q$; $M_W = I_T - P_W$ denotes the orthogonal projection matrix generated by matrix $W$, where $P_W = W(W'W)^{-1}W'$ and $W$ is a $T \times q$ matrix with rank $q$; $\to_P$ and $\to_D$ stand for convergence in probability and convergence in distribution, respectively; $\|\cdot\|$ denotes the Euclidean norm of a vector or the Frobenius norm of a matrix; $\lfloor a \rfloor$ means the largest integer not exceeding $a$; $\rho_1$, $O(1)$ and $A$ always denote constants and may be different at each appearance.

2. Basic settings

As is well known, single-index models and factor models usually require some restrictions for the purpose of identification (Ichimura, 1993; Bai, 2009). Thus, we begin by introducing the identification restrictions needed for (1.1). For the parameter vector $\theta_o = (\theta_{o,1}, \dots, \theta_{o,d})'$, we follow Dong et al. (2016) and define it as follows:

\[ \theta_o \in \Theta \text{ with } \Theta \text{ being a compact subset of } \mathbb{R}^d,\quad \|\theta_o\| = 1 \ \text{ and } \ \theta_{o,1} > 0. \tag{2.1} \]
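The normalization in (2.1) can be imposed on any candidate index vector by rescaling and, if needed, flipping its sign. The snippet below is a minimal sketch of that step, assuming NumPy; the function name `normalize_index` is hypothetical and not taken from the paper. Rescaling or flipping θ changes nothing observable because the unknown link function absorbs the change of scale and sign.

```python
import numpy as np

def normalize_index(theta):
    """Rescale theta to unit Euclidean norm with a positive first entry, as in (2.1)."""
    theta = np.asarray(theta, dtype=float)
    norm = np.linalg.norm(theta)
    if norm == 0.0 or theta[0] == 0.0:
        raise ValueError("theta must be non-zero with a non-zero first entry")
    theta = theta / norm
    # Flip the sign so that the first coordinate is strictly positive.
    return theta if theta[0] > 0 else -theta

theta = normalize_index([-3.0, 4.0])
```

In an optimizer, a step like this would typically be applied to each trial value of θ before the objective is evaluated.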

Let $\hat F$ and $\hat\Gamma$ denote the estimates of the factors and factor loadings that we will provide in the next section. For the purpose of identification, we impose the following restrictions:

\[
\begin{aligned}
&\hat F \in D_F, \text{ where } D_F = \Big\{ F = (f_1, \dots, f_T)'_{T \times m} \ \Big|\ \tfrac{1}{T} F'F = I_m \Big\}; \\
&\hat\Gamma \in D_\Gamma, \text{ where } D_\Gamma = \Big\{ \Gamma = (\gamma_1, \dots, \gamma_N)'_{N \times m} \ \Big|\ \Gamma'\Gamma \text{ being diagonal} \Big\}.
\end{aligned}
\tag{2.2}
\]

For the true factors $F_o = (f_{o,1}, \dots, f_{o,T})'$ and factor loadings $\Gamma_o = (\gamma_{o,1}, \dots, \gamma_{o,N})'$, we require them to satisfy:

\[
\begin{aligned}
&f_{o,t} \text{ is identically distributed across } t,\quad \frac{F_o'F_o}{T} \to_P \Sigma_F > 0, \ \text{ and } \ E\|f_{o,1}\|^4 < \infty; \\
&\gamma_{o,i} \text{ is identically distributed across } i,\quad \frac{\Gamma_o'\Gamma_o}{N} \to_P \Sigma_\Gamma > 0, \ \text{ and } \ E\|\gamma_{o,1}\|^4 < \infty.
\end{aligned}
\tag{2.3}
\]

The requirement of identical distributions in (2.3) is only for notational simplicity. The link function is assumed to be a square integrable function as follows:

\[ g_o \in \Im_1 \text{ with } \Im_1 \text{ being a compact subset of } L^2(\mathbb{R}). \tag{2.4} \]
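To see what the restrictions in (2.2) do in practice, the following sketch rotates an arbitrary pair of factors and loadings into one that satisfies them without changing the common component. It assumes NumPy and a common component of exact rank m; the helper name `normalize_factors` is hypothetical, and the singular value decomposition is just one convenient way to obtain such a rotation.

```python
import numpy as np

def normalize_factors(F, Gamma):
    """Return (F*, Gamma*) with F*'F*/T = I_m, Gamma*'Gamma* diagonal, and F* Gamma*' = F Gamma'."""
    T, m = F.shape
    # SVD of the T x N common component; only the first m singular values are non-zero.
    U, s, Vt = np.linalg.svd(F @ Gamma.T, full_matrices=False)
    F_star = np.sqrt(T) * U[:, :m]
    Gamma_star = Vt[:m].T * s[:m] / np.sqrt(T)
    return F_star, Gamma_star

rng = np.random.default_rng(0)
F = rng.standard_normal((50, 2))       # T = 50, m = 2 (illustrative sizes)
Gamma = rng.standard_normal((30, 2))   # N = 30
F_star, Gamma_star = normalize_factors(F, Gamma)
```

The rotated pair reproduces the same products F Γ', which is why only the space spanned by the factors, and not the factors themselves, can be pinned down.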

It is worth emphasizing that, in order to develop asymptotic theories, extra conditions (e.g., Assumption B of Dong et al., 2016, or Assumption C of Dong and Linton, 2018) need to be imposed on the unknown function $g_o$ apart from square integrability on a certain function space. For conciseness and readability, we do not specify these restrictions at this point, but will do so wherever necessary.

3. Estimation and asymptotic theories

In order to recover the link function, we follow Dong et al. (2016) and use the physicists' Hermite polynomial system, which has the advantage of allowing all regressors to take values on the real line. A detailed discussion of the properties of this system can be found in the supplementary file of Dong et al. (2016). Let $\{H_n(w) \mid n = 0, 1, 2, \dots\}$ be the physicists' Hermite polynomial system orthogonal with respect to $\exp(-w^2)$. The orthogonality of the system reads

\[ \int H_n(w) H_m(w) \exp(-w^2)\,dw = \sqrt{\pi}\, 2^n n!\, \delta_{nm}, \tag{3.1} \]

where $\delta_{nm}$ is the Kronecker delta. By (3.1), the corresponding Hermite function system is defined as $\big\{\mathcal{H}_n(w) = \frac{1}{\sqrt[4]{\pi}\sqrt{2^n n!}} H_n(w) \exp(-w^2/2) \mid n = 0, 1, 2, \dots\big\}$. Then any $g \in L^2(\mathbb{R})$ can be expanded into an infinite series as follows:

\[ g(w) = g_k(w) + \delta_k(w), \tag{3.2} \]

where $g_k(w) = C'\mathcal{H}(w)$, $\delta_k(w) = \sum_{n=k}^{\infty} c_n \mathcal{H}_n(w)$, $\mathcal{H}(w) = (\mathcal{H}_0(w), \dots, \mathcal{H}_{k-1}(w))'$, $C = (c_0, \dots, c_{k-1})'$, and $c_n = \int g(w)\mathcal{H}_n(w)\,dw$. Throughout this paper, $k$ denotes the truncation parameter, so that $g_k(w)$ is the partial sum of the infinite series, which converges to $g(w)$ under certain conditions. Correspondingly, the true link function in this case can be written as

\[ g_o(w) = g_{o,k}(w) + \delta_{o,k}(w), \tag{3.3} \]

where $g_{o,k}(w) = C_o'\mathcal{H}(w)$, $\delta_{o,k}(w) = \sum_{n=k}^{\infty} c_{o,n}\mathcal{H}_n(w)$, and $C_o = (c_{o,0}, \dots, c_{o,k-1})'$.

Given that our interest lies in both $\theta_o$ and $g_o$, we define a norm $\|\cdot\|_w$ for the 2-fold Cartesian product space formed by $\mathbb{R}^d$ and $L^2(\mathbb{R})$:

\[ \|(\theta, g)\|_w = \big\{ \|\theta\|^2 + \|g\|_{L^2}^2 \big\}^{1/2}, \tag{3.4} \]

where $(\theta, g) \in \mathbb{R}^d \times L^2(\mathbb{R})$, and $\|g\|_{L^2} = \big\{\int g^2(w)\,dw\big\}^{1/2} = \big\{\sum_{n=0}^{\infty} c_n^2\big\}^{1/2}$.

With the above setting, we are now ready to premultiply the model (1.3) by $M_{F_o} = I_T - F_o(F_o'F_o)^{-1}F_o'$ and use (3.3) to obtain

\[ M_{F_o} Y_i = M_{F_o}\phi_i[\theta_o, g_{o,k}] + M_{F_o}\phi_i[\theta_o, \delta_{o,k}] + M_{F_o}\varepsilon_i, \tag{3.5} \]

where $\phi_i[\theta, g]$ has been defined in (1.2). Accordingly, the objective function is defined as

\[ S_{NT}(\theta, C, F) = \frac{1}{NT}\sum_{i=1}^{N} (Y_i - \phi_i[\theta, g_k])' M_F (Y_i - \phi_i[\theta, g_k]), \tag{3.6} \]

where $(\theta, C, F) \in \Theta \times \mathbb{R}^k \times D_F$. Then the estimator is obtained by

\[ (\hat\theta, \hat C, \hat F) = \operatorname*{argmin}_{(\theta, C, F) \in \Theta \times \mathbb{R}^k \times D_F} S_{NT}(\theta, C, F). \tag{3.7} \]

Following the same arguments as in Bai (2009, p. 1236), (3.7) can be further decomposed into the following two expressions:

\[
\left\{
\begin{aligned}
&(\hat\theta, \hat C) = \operatorname*{argmin}_{(\theta, C) \in \Theta \times \mathbb{R}^k} \frac{1}{NT}\sum_{i=1}^{N} (Y_i - \phi_i[\theta, g_k])' M_{\hat F} (Y_i - \phi_i[\theta, g_k]), \\
&\left[ \frac{1}{NT}\sum_{i=1}^{N} \big(Y_i - \phi_i[\hat\theta, \hat g_k]\big)\big(Y_i - \phi_i[\hat\theta, \hat g_k]\big)' \right] \hat F = \hat F V_{NT},
\end{aligned}
\right.
\tag{3.8}
\]

where $\hat g_k(w) = \hat C'\mathcal{H}(w)$, and $V_{NT}$ is a diagonal matrix with the diagonal being the $m$ largest eigenvalues of

\[ \frac{1}{NT}\sum_{i=1}^{N} \big(Y_i - \phi_i[\hat\theta, \hat g_k]\big)\big(Y_i - \phi_i[\hat\theta, \hat g_k]\big)' \]

arranged in descending order. Finally, $\hat\Gamma$ is expressed as $\hat\Gamma' = (\hat\gamma_1, \dots, \hat\gamma_N)$, where $\hat\gamma_i = \frac{1}{T}\hat F'\big(Y_i - \phi_i[\hat\theta, \hat g_k]\big)$.
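The two-part characterization of the estimator suggests an alternating scheme. The sketch below is not the authors' implementation: it is a minimal illustration, assuming NumPy, that holds θ fixed, builds the Hermite function basis by its three-term recurrence, and alternates between a least-squares update of C given F and a principal-components update of F given C. The names `hermite_functions` and `profile_fit`, the simulated design, and all tuning values are hypothetical; an outer search over θ on the unit sphere would be left to a generic optimizer.

```python
import numpy as np

def hermite_functions(w, k):
    """Columns are the first k orthonormal Hermite functions evaluated at w."""
    w = np.asarray(w, dtype=float).ravel()
    H = np.empty((w.size, k))
    H[:, 0] = np.pi ** -0.25 * np.exp(-0.5 * w ** 2)
    if k > 1:
        H[:, 1] = np.sqrt(2.0) * w * H[:, 0]
    for n in range(1, k - 1):
        # Three-term recurrence for the normalized Hermite functions.
        H[:, n + 1] = (np.sqrt(2.0 / (n + 1)) * w * H[:, n]
                       - np.sqrt(n / (n + 1.0)) * H[:, n - 1])
    return H

def profile_fit(Y, X, theta, k, m, n_iter=25):
    """For fixed theta, alternate between C and F. Y is T x N, X is T x N x d.
    Returns (C, F, history), where history holds the objective after each sweep."""
    T, N = Y.shape
    H = hermite_functions(X @ theta, k).reshape(T, N, k)  # H[t, i, :] is the basis at x_it'theta
    F = np.zeros((T, m))                                  # start without factors
    history = []
    for _ in range(n_iter):
        # (i) Given F, C solves a least-squares problem in the projected basis M_F H_i.
        MH = H - np.einsum("tm,mik->tik", F, np.einsum("tm,tik->mik", F, H)) / T
        A = np.einsum("tik,til->kl", MH, MH)
        b = np.einsum("tik,ti->k", MH, Y)
        C = np.linalg.lstsq(A, b, rcond=None)[0]
        # (ii) Given C, F is sqrt(T) times the top-m eigenvectors of W W'.
        W = Y - H @ C
        _, vecs = np.linalg.eigh(W @ W.T)
        F = np.sqrt(T) * vecs[:, ::-1][:, :m]
        S = (np.sum(W ** 2) - np.sum((F.T @ W) ** 2) / T) / (N * T)
        history.append(S)
    return C, F, history

# Simulated example (all design choices here are illustrative assumptions).
rng = np.random.default_rng(0)
T, N, d, m, k = 30, 30, 2, 1, 4
theta_o = np.array([0.8, 0.6])
X = rng.standard_normal((T, N, d))
f = rng.standard_normal((T, m))
gam = rng.standard_normal((N, m))
w = X @ theta_o
Y = 2.0 * w * np.exp(-0.5 * w ** 2) + f @ gam.T + 0.1 * rng.standard_normal((T, N))

_, F_hat, hist_true = profile_fit(Y, X, theta_o, k, m)
_, _, hist_wrong = profile_fit(Y, X, np.array([0.6, -0.8]), k, m)
print(hist_true[-1], hist_wrong[-1])
```

Each sweep minimizes the objective over one block with the other held fixed, so the recorded objective values cannot increase from one sweep to the next. Comparing the final value at the true index with the value at an orthogonal index illustrates how the profiled objective discriminates between candidate directions.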

3.1. Consistency

To show the consistency of the estimator in (3.7), we make the following assumptions.

Assumption 1.
1. $\{(y_{it}, x_{it}) \mid 1 \le i \le N,\ 1 \le t \le T\}$ are observable, and both $m$ and $d$ are known and finite. Moreover, let (2.1)–(2.4) hold.
2. Let $x_t = (x_{1t}, \dots, x_{Nt})'$ and $\xi_t = (\varepsilon_{1t}, \dots, \varepsilon_{Nt})'$, and let $\mathcal{F}_{-\infty}^{0}$ and $\mathcal{F}_{\tau}^{\infty}$ denote the σ-algebras generated by $\{(x_t, \xi_t) \mid t \le 0\}$ and $\{(x_t, \xi_t) \mid t \ge \tau\}$ respectively. Define the mixing coefficient $\alpha(\tau) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{\tau}^{\infty}} |\Pr(A)\Pr(B) - \Pr(AB)|$.
(a) Assume $\{x_{i1}, \dots, x_{iT}; \varepsilon_{i1}, \dots, \varepsilon_{iT}\}$ is identically distributed across $i$. Let $\{(x_t, \xi_t) \mid t \ge 1\}$ be strictly stationary and α-mixing such that for some $\nu_1 > 0$, $E|\varepsilon_{i1}\varepsilon_{j1}|^{2+\nu_1/2} < \infty$, and the mixing coefficient satisfies $\sum_{t=1}^{\infty} [\alpha(t)]^{\nu_1/(2+\nu_1)} < \infty$. Moreover, $E\|x_{11}\|^4 < \infty$.
(b) $E[\varepsilon_{11}] = 0$, $E[\varepsilon_{11}^2] = \sigma_\varepsilon^2$, and $\{\varepsilon_{it} \mid i \ge 1, t \ge 1\}$ is independent of the other variables. Let $E[\varepsilon_{i1}\varepsilon_{j1}] = \sigma_{ij}$ for $i \ne j$, and $\sum_{i \ne j} |\sigma_{ij}| = O(N)$. Moreover, $\sum_{i,j=1}^{N}\sum_{t,s=1}^{T} |E[\varepsilon_{it}\varepsilon_{js}]| = O(NT)$.

Assumption 1.1 is standard in the literature, and has been discussed in Section 2. The mixing condition of Assumption 1.2 is the same as condition 2.1 of Chang et al. (2015). For notational simplicity, Assumption 1.2.a assumes that the distribution of $\{x_{i1}, \dots, x_{iT}; \varepsilon_{i1}, \dots, \varepsilon_{iT}\}$ is identical across $i$. The rest of Assumption 1.2 is similar to Assumption 3.4 of Fan et al. (2016).

Assumption 2.

1. Let $k^2/T \to 0$. Moreover, there exists a positive integer $r$ such that $w^{r-s} g_o^{(s)}(w) \in L^2(\mathbb{R})$ for $s = 0, 1, \dots, r$, where $g_o^{(s)}$ denotes the $s$th derivative of $g_o$. In addition, for any $g \in \Im_1$, $g^{(1)}(w) \in L^2(\mathbb{R})$.
2. Let $f_\theta(w)$ denote the probability density function (PDF) of $w = x_{11}'\theta$, and assume that $\sup_{(\theta, w) \in \Theta \times \mathbb{R}} f_\theta(w) \le A < \infty$.
3. Assume that $L(\theta, g)$ has a unique minimum on $\Theta \times \Im_1$ at $(\theta_o, g_o)$, where $L(\theta, g) = E[\Delta g(x_{11}'\theta)]^2 - E[\Delta g(x_{11}'\theta) f_{o,1}']\,\Sigma_F^{-1}\,E[f_{o,1}\Delta g(x_{11}'\theta)]$ and $\Delta g(x_{11}'\theta) = g(x_{11}'\theta) - g_o(x_{11}'\theta_o)$.

Assumptions 1.1–2.2 are fairly standard in the literature, and ensure that the approximation of the unknown function $g_o(w)$ by an orthogonal expansion has a fast rate of convergence (cf. Assumption B of Dong et al., 2016). Assumption 2.3 is for the purpose of identification, and is in the same spirit as Assumption 1 of Newey and Powell (2003). We provide a more detailed discussion of Assumption 2.3 in the supplementary file. With the above assumptions, the consistency of the estimators in (3.7) can be established as follows.

Theorem 3.1. Under Assumptions 1 and 2, as $(N, T) \to (\infty, \infty)$,
1. $\|P_{\hat F} - P_{F_o}\| \to_P 0$, where $P_{\hat F}$ and $P_{F_o}$ are the projection matrices generated by $\hat F$ and $F_o$ respectively;
2. $\|(\hat\theta, \hat g_k) - (\theta_o, g_o)\|_w \to_P 0$, where $\hat g_k(\cdot) = \hat C'\mathcal{H}(\cdot)$.

It is well understood in the literature that $F_o$ is identifiable only up to a non-singular rotation matrix, so we establish the consistency for the projection matrix $P_{\hat F}$ rather than $\hat F$ itself. Also, in the second result the consistency is established with respect to the norm defined in (3.4). Using Theorem 3.1, we obtain the following rates of convergence.

Lemma 3.1. Let $\eta_{NT} = \max\{\frac{1}{\sqrt{N}}, \frac{1}{\sqrt{T}}\}$ and $Q^{-1} = V_{NT}(F_o'\hat F/T)^{-1}(\Gamma_o'\Gamma_o/N)^{-1}$. Under Assumptions 1–2, as $(N, T) \to (\infty, \infty)$,
1. $V_{NT} \to_P V$, where $V_{NT}$ is defined in (3.8), $V$ is an $m \times m$ diagonal matrix consisting of the eigenvalues of $\Sigma_F\Sigma_\Gamma$, and $\Sigma_F$ and $\Sigma_\Gamma$ are defined in (2.3);
2. $\big\|\frac{1}{\sqrt{T}}(\hat F Q^{-1} - F_o)\big\| = O_P\big(\|(\hat\theta, \hat g_k) - (\theta_o, g_o)\|_w\big) + O_P(\eta_{NT})$;
3. $\big\|\frac{1}{T}F_o'(\hat F - F_o Q)\big\| = O_P\big(\|(\hat\theta, \hat g_k) - (\theta_o, g_o)\|_w\big) + O_P(\eta_{NT}^2)$;
4. $\|P_{\hat F} - P_{F_o}\|^2 = O_P\big(\|(\hat\theta, \hat g_k) - (\theta_o, g_o)\|_w\big) + O_P(\eta_{NT}^2)$.

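The reason consistency is stated for the projection matrix rather than for the factor estimate itself can be checked numerically: a non-singular rotation changes the factor matrix but leaves its projection matrix untouched. The snippet below is a small illustration assuming NumPy; the helper name `projection` and the rotation matrix `R` are hypothetical choices.

```python
import numpy as np

def projection(W):
    """P_W = W (W'W)^{-1} W' for a full-column-rank T x q matrix W."""
    return W @ np.linalg.solve(W.T @ W, W.T)

rng = np.random.default_rng(0)
F = rng.standard_normal((40, 2))
R = np.array([[2.0, 1.0], [0.5, -1.0]])   # an arbitrary non-singular 2 x 2 matrix
F_rot = F @ R

gap_F = np.linalg.norm(F_rot - F)                           # the factor matrices differ
gap_P = np.linalg.norm(projection(F_rot) - projection(F))   # the projections coincide
```

Any statement about the distance between estimated and true factors therefore has to be made either on the projection matrices or after matching rotations, as with the matrix Q in Lemma 3.1.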

The results of Lemma 3.1 are analogous to Proposition A.1 and Lemma A.7 of Bai (2009). However, due to the semiparametric nature of the model, the estimates of both the parameter of interest and link function have impacts on the rate of convergence for the estimates of unknown factors. 3.2. Asymptotic normality of θˆ In this subsection, we establish the asymptotic normality of θˆ . Before stating the next assumption, we introduce some notations for better presentation. Let ϵ be a sufficiently small positive number, and Ω (ϵ ) = {(θ, g) | ∥(θ, g) − (2) (θo , go )∥L2 ≤ ϵ}. For ∀(θ1 , g1 ), (θ2 , g2 ) ∈ Ω (ϵ ), let Git (θ1 , g1 , θ2 , g2 ) = g1 (x′it θ1 )g2 (x′it θ2 )xit x′it . Further define Υ (θ ) = −1 ′ ′ ′ ′ ′ Υ1 (θ ) + Υ2 (θ ) with Υ1 (θ ) = E [H(x11 θ )H(x11 θ ) ] and Υ2 (θ ) = E [H(x11 θ )fo,1 ]ΣF E [fo,1 H(x′11 θ )′ ]. For ∀(θ, g), let ψ1i [θ, g ] = (g(x′i1 θ )xi1 , . . . , g(x′iT θ )xiT )′ with i = 1, . . . , N. Assumption 3. 1. Assume that

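(a) $\displaystyle \sup_{(\theta_1, g_1), (\theta_2, g_2) \in \Omega(\epsilon)} \Big\| \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} G_{it}(\theta_1, g_1, \theta_2, g_2) - E[G_{11}(\theta_1, g_1, \theta_2, g_2)] \Big\| = o_P(1)$;
(b) $\displaystyle \max_{i \ge 1} \sup_{(\theta, g) \in \Omega(\epsilon)} \Big\| \frac{1}{T}\sum_{t=1}^{T} x_{it}\, g^{(1)}(x_{it}'\theta) f_{o,t}' - E\big[x_{11}\, g^{(1)}(x_{11}'\theta) f_{o,1}'\big] \Big\| = o_P(1)$.

2. Let $0 < \sup_{\{\theta \mid \|\theta - \theta_o\| \le \epsilon\}} \lambda_{\max}(\Upsilon_1(\theta)) \le A$ and $\inf_{\{\theta \mid \|\theta - \theta_o\| \le \epsilon\}} \lambda_{\min}(\Upsilon(\theta)) \ge \rho_1 > 0$ uniformly in $k$.

3. Let $\max_{i, j, t_1, t_2, t_3, t_4} E\big[\|x_{it_1}\|^2 \|x_{jt_2}\|^2 \|f_{o,t_1}\|^2 \|f_{o,t_2}\|^2 \|f_{o,t_3}\|^2 \|f_{o,t_4}\|^2\big] \le A < \infty$, where $1 \le i, j \le N$ and $1 \le t_1, t_2, t_3, t_4 \le T$.

4. Let $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N} \psi_{1i}[\theta_o, g_o^{(1)}]' M_{F_o}\varepsilon_i \to_D N(0, \tilde\Sigma)$.

5. Assume $\eta_{NT}^2 k^{3/2} \to 0$, and suppose that $r$ defined in Assumption 2.1 satisfies $NTk^{-(r+1/12)} \to A$, where $A$ is a nonnegative constant.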

Assumption 3.1 requires uniform convergence in a small neighbourhood of $(\theta_o, g_o)$. In fact, the uniform convergence in Assumption 3.1 can be proved by following a procedure similar to that given for (1) of Lemma B.3 in the supplementary file of this paper. However, this would require a quite lengthy derivation, so we adopt the current form for the sake of conciseness. For Assumption 3.2, we provide a detailed example in the supplementary file of this paper. Assumption 3.3 can be removed if one is willing to bound higher moments of $x_{it}$ and $f_{o,t}$. Assumption 3.4 is in the same spirit as Assumption E of Bai (2009). Assumption 3.5 specifies the usual condition on the truncation parameter $k$: its first part bounds the rate at which $k$ can diverge to infinity, and its second part ensures that the truncation error shrinks to zero sufficiently fast.

Note that the condition $NTk^{-(r+1/12)} \to A$ is different from the usual rate condition imposed in Newey (1997), Chen and Christensen (2015) or Belloni et al. (2015). The major reason is that the typical uniform bound for the sieve approximation error ($O(k^{-r})$ in our notation) imposed in these papers has been verified for some commonly used sieve bases for regressions with compact supports, but not for the Hermite polynomial system. Although this system ensures that one can approximate a link function defined on the whole real line, the rate associated with the sieve approximation error (cf. Lemma C.1 of Dong et al., 2016) is in fact slower than that assumed in the earlier literature. If one can improve the rate of approximation error for the Hermite polynomial system obtained in Dong et al. (2016), then one can also relax the condition on $(k, r)$ in Assumption 3.5. But we are not sure whether this is possible.

Using the above assumptions, the asymptotic normality can be obtained as follows.

Theorem 3.2. Under Assumptions 1–3, as $(N, T) \to (\infty, \infty)$,

\[ \sqrt{NT}\big(\hat\theta - \theta_o - V_*^{-1}\Pi_{NT1} - V_*^{-1}\Pi_{NT2}\big) \to_D N(0, \Sigma_*), \]

where $\Sigma_* = V_*^{-1}\tilde\Sigma V_*^{-1}$, $\tilde\Sigma$ has been defined in Assumption 3.4, and

\[
\begin{aligned}
V_* &= V_1 - V_2\Sigma_F^{-1}V_2', \qquad V_1 = E\big[\big(g_o^{(1)}(x_{11}'\theta_o)\big)^2 x_{11}x_{11}'\big], \qquad V_2 = E\big[g_o^{(1)}(x_{11}'\theta_o)\, x_{11} f_{o,1}'\big], \\
\Pi_{NT1} &= \frac{1}{NT}\sum_{i=1}^{N} \psi_{1i}[\theta_o, \hat g_k^{(1)}]' M_{\hat F}\big(\phi_i[\theta_o, g_o] - \phi_i[\theta_o, \hat g_k]\big), \\
\Pi_{NT2} &= \frac{1}{NT}\sum_{i=1}^{N} \psi_{1i}[\theta_o, \hat g_k^{(1)}]' M_{\hat F} F_o\gamma_{o,i}.
\end{aligned}
\]


We now make a few comments on the above theorem. First, it is not hard to see that $\|\Pi_{NT1}\| = O_P(\|g_o - \hat g_k\|_{L^2})$ and $\|\Pi_{NT2}\| = O_P(\|M_{\hat F}(F_o - \hat F Q^{-1})\|)$. $\Pi_{NT1}$ is similar to condition (2.6) of Theorem 2 in Chen et al. (2003), where they point out that the verification of this type of condition is in some cases difficult, and is itself the subject of a long paper by Newey (1994). Although $\Pi_{NT1}$ is exactly 0 uniformly in $N$ and $T$ when $\hat g_k = g_o$, we cannot further decompose $\Pi_{NT1}$ without imposing further restrictions (e.g., the "Linearization" condition of Assumption 5.1 of Newey (1994)).

$\Pi_{NT2}$ is equivalent to the bias terms in Bai (2009). Due to the semi-parametric nature of the single-index model, it is not helpful to further decompose $\Pi_{NT2}$ as

\[ \Pi_{NT2} = \frac{1}{NT}\sum_{i=1}^{N} \psi_{1i}[\theta_o, \hat g_k^{(1)}]' M_{\hat F}(F_o - \hat F Q^{-1})\gamma_{o,i}, \]

where $Q$ is given in Lemma 3.1. This is because further decomposing $F_o - \hat F Q^{-1}$ will introduce the residual $O_P(\|(\hat\theta, \hat g_k) - (\theta_o, g_o)\|_w)$ into the system. For parametric models, decomposing $F_o - \hat F Q^{-1}$ will only introduce $\hat\theta - \theta_o$ to the system. After further rearranging and imposing conditions like $N/T \to A$, asymptotic normality can be established as in Bai (2009). However, for our semi-parametric model, this is no longer the case. If $g_o$ is linear (i.e., $g_o(w) = w$), $\Pi_{NT1}$ will disappear from the system, and we will be able to further decompose $\Pi_{NT2}$ as in Bai (2009). In this case, we will get exactly the same components (i.e., B, C and D0 of Bai (2009, p. 1247)) for our asymptotic distribution.

In a recent study, Jiang et al. (2017) point out that if $E[\gamma_{o,i}'f_{o,t}] = 0$ under the parametric setting, the bias term (i.e., $\Pi_{NT2}$ of this paper) can be further simplified or even removed (see Jiang et al., 2017, Assumption 1.2 and Eq. 7.7–7.9 for more details), which also holds true for our semi-parametric setting in view of the development of this paper. To be precise, if Assumption 1.2 and Eq. 7.7–7.9 of Jiang et al. (2017) are satisfied, we can easily show that $\sum_{j=1}^{5} I_{jNT}(\hat\theta, \hat g_k, \hat F) = o_P(\|(\hat\theta, \hat g_k) - (\theta_o, g_o)\|_w)$, where $I_{jNT}(\hat\theta, \hat g_k, \hat F)$ is defined in the proof of Lemma 3.1. This will further enable us to remove the bias term $\Pi_{NT2}$ when establishing the asymptotic normality. While theoretically it remains unknown whether the biases can be removed under more general conditions, in practice one may use bootstrapping techniques to make statistical inferences. We will provide a detailed procedure in the empirical study section below.

As for the semi-parametric efficiency issue, unfortunately we do not know whether our estimator can achieve the semiparametric efficiency bound (SPEB) under some conditions. The SPEB was first derived for the single-index model in the i.i.d. setting (see, e.g., Newey, 1990). Such a bound can be reached by various kernel estimators for the single-index model in the i.i.d. setting (see, e.g., Ichimura, 1993). But to the best of our knowledge, no attempt has been made for single-index panel data models with the usual additive fixed effects, not to mention single-index models with interactive fixed effects.

If $\varepsilon_{it}$ is independent and identically distributed (i.i.d.) over $i$ and $t$, $\Sigma_*$ can be estimated by

\[ \hat\Sigma_* = \hat\sigma_\varepsilon^2 \left( \frac{1}{NT}\sum_{i=1}^{N} \psi_{1i}[\hat\theta, \hat g_k^{(1)}]' M_{\hat F}\, \psi_{1i}[\hat\theta, \hat g_k^{(1)}] \right)^{-1}, \quad \text{where } \hat\sigma_\varepsilon^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \big(y_{it} - \hat g_k(x_{it}'\hat\theta) - \hat\gamma_i'\hat f_t\big)^2. \]

If the cross-sectional dimension also exhibits a certain ordering structure arising from physical or economic distance, then we can follow Kim and Sun (2013) and consider a bivariate-kernel based HAC-type estimator. In the absence of ordering along the cross-sectional dimension, it is standard to assume some sparsity structure on the variance–covariance matrix and consider a shrinkage estimator (Bai and Liao, 2017). These are beyond the scope of the current paper and are left for future research.

3.3. Rates of convergence and asymptotic normality of $\hat g_k$

Finally, in this section, we discuss the estimation of the link function and that of its first derivative.

Theorem 3.3.

Let $\eta_{NT}$ be as defined in Lemma 3.1. Under Assumptions 1–3, as $(N, T) \to (\infty, \infty)$,
1. $\|\hat g_k - g_o\|_{L^2} = O_P\big(k^{1/2}\eta_{NT}^2\big) + o_P\big(k^{-r/2}\big)$;
2. $\sup_{w \in \mathbb{R}} |\hat g_k(w) - g_o(w)| = O_P\big(k\eta_{NT}^2\big) + o_P\big(k^{-r/2 + 1/2}\big)$;
3. Suppose that $\frac{1}{\sqrt{kNT}}\sum_{i=1}^{N} \mathcal{H}(w)'\Upsilon^{-1}(\theta_o)\mathcal{H}_i(\theta_o)' M_{F_o}\varepsilon_i \to_D N(0, \check\Sigma)$. Then $\sqrt{\frac{NT}{k}}\big(\hat g_k(w) - g_o(w) - \Pi_{NT3}\big) \to_D N(0, \check\Sigma)$, where $\Pi_{NT3} = \frac{1}{NT}\mathcal{H}(w)'\Upsilon^{-1}(\theta_o)\sum_{i=1}^{N} \mathcal{H}_i(\theta_o)' M_{\hat F} F_o\gamma_{o,i}$.

Note that if $N/T \to A$ with $A$ being a positive constant, then the leading term $O_P(k^{1/2}\eta_{NT}^2)$ reduces to $O_P(k^{1/2}N^{-1/2}T^{-1/2})$, which is the same as that obtained by Su and Jin (2012) for non-parametric panel data models. The term $o_P(k^{-r/2})$ of the first result represents the rate of truncation error, and the order $o_P(\cdot)$ is due to Lemma A.3 of the supplementary file. The same argument applies to $o_P(k^{-r/2+1/2})$ of the second result. The condition $\frac{1}{\sqrt{kNT}}\sum_{i=1}^{N} \mathcal{H}(w)'\Upsilon^{-1}(\theta_o)\mathcal{H}_i(\theta_o)' M_{F_o}\varepsilon_i \to_D N(0, \check\Sigma)$ of the third result can be easily verified by applying a procedure similar to that used for Lemma A1 of Chen et al. (2012a). For simplicity, we write this as a condition in the same spirit as Assumption E of Bai (2009). For $\Pi_{NT3}$, it is easy to see that $\big\|\frac{1}{NT}\sum_{i=1}^{N} \mathcal{H}_i(\theta_o)' M_{\hat F} F_o\gamma_{o,i}\big\| = O_P\big(\sqrt{k}\,\|M_{\hat F}(F_o - \hat F Q^{-1})\|\big)$, and all the discussions regarding $\Pi_{NT2}$ under Theorem 3.2 apply to $\Pi_{NT3}$.

The marginal effects are often of practical interest. To close this section, we briefly discuss the estimation of the first derivative of $g_o(w)$. It is straightforward to show that the estimator is given by $\hat g_k^{(1)}(w) = \hat C'\mathcal{H}^{(1)}(w)$, where $\mathcal{H}^{(1)}(w) = \big(\mathcal{H}_0^{(1)}(w), \dots, \mathcal{H}_{k-1}^{(1)}(w)\big)'$.

Corollary 3.1. Let $\eta_{NT}$ be as defined in Lemma 3.1. Under Assumptions 1–3, as $(N, T) \to (\infty, \infty)$,
1. $\|\hat g_k^{(1)} - g_o^{(1)}\|_{L^2} = O_P\big(k\eta_{NT}^2\big) + o_P\big(k^{-r/2 + 11/12}\big)$;
2. $\sup_{w \in \mathbb{R}} |\hat g_k^{(1)}(w) - g_o^{(1)}(w)| = O_P\big(k^{3/2}\eta_{NT}^2\big) + o_P\big(k^{-r/2 + 11/12}\big)$.
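The derivative estimator only needs the derivatives of the Hermite functions, which have a closed form: the derivative of the n-th Hermite function equals sqrt(n/2) times the (n-1)-th function minus sqrt((n+1)/2) times the (n+1)-th function. The sketch below, assuming NumPy, builds the derivative basis from this standard identity and compares it with a central finite difference; the function names and the coefficient vector `C_hat` are hypothetical illustrations, not output from the paper.

```python
import numpy as np

def hermite_functions(w, k):
    """Columns are the first k orthonormal Hermite functions evaluated at w."""
    w = np.asarray(w, dtype=float).ravel()
    H = np.empty((w.size, k))
    H[:, 0] = np.pi ** -0.25 * np.exp(-0.5 * w ** 2)
    if k > 1:
        H[:, 1] = np.sqrt(2.0) * w * H[:, 0]
    for n in range(1, k - 1):
        H[:, n + 1] = (np.sqrt(2.0 / (n + 1)) * w * H[:, n]
                       - np.sqrt(n / (n + 1.0)) * H[:, n - 1])
    return H

def hermite_function_derivatives(w, k):
    """Columns are the first derivatives of the first k Hermite functions at w."""
    H = hermite_functions(w, k + 1)          # one extra function is needed for the identity
    n = np.arange(k)
    D = -np.sqrt((n + 1) / 2.0) * H[:, 1:k + 1]
    D[:, 1:] += np.sqrt(n[1:] / 2.0) * H[:, :k - 1]
    return D

k = 5
grid = np.linspace(-3.0, 3.0, 13)
D = hermite_function_derivatives(grid, k)

# Central finite difference as an independent check of the closed form.
eps = 1e-5
D_fd = (hermite_functions(grid + eps, k) - hermite_functions(grid - eps, k)) / (2 * eps)

C_hat = np.array([0.5, -1.0, 0.25, 0.0, 0.1])   # hypothetical coefficient vector
g_prime = D @ C_hat                              # derivative of the fitted link on the grid
```

Given coefficient estimates, the estimated marginal effect of the j-th regressor at an observation is this derivative evaluated at the fitted index, multiplied by the j-th element of the estimated index vector.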

As in the case of the third result of Theorem 3.3, the asymptotic normality can also be established under certain extra conditions. Here we do not pursue the study of $g_o^{(1)}(w)$ further, but we will examine the estimates of the first derivative of the link function through a Monte Carlo study in Section 5.

4. Some extensions and discussions

In this section, we discuss some related issues, such as (1) non-integrable link functions, (2) determination of the truncation parameter and the number of factors, (3) hypothesis testing, and (4) generalizations of the single-index model.

4.1. Beyond integrability

Following the spirit of Dong and Gao (2018), in this subsection we briefly discuss how to extend our results in the previous section to the non-integrable link function case, which is of interest in many situations (such as the empirical application presented in Section 6). We start by making the following assumption regarding the link function:¹

\[ g_o \in \Im_2 \text{ with } \Im_2 \text{ being a compact subset of } L^2\big(\mathbb{R}, \exp(-w^2)\big). \tag{4.1} \]

Accordingly, a new norm for any $(\theta, g) \in \mathbb{R}^d \times L^2(\mathbb{R}, \exp(-w^2))$ is defined as follows:

\[ \|(\theta, g)\|_{\tilde w} = \big\{ \|\theta\|^2 + \|g\|_{\tilde L^2}^2 \big\}^{1/2}, \tag{4.2} \]

where $\|g\|_{\tilde L^2} = \big\{\int g^2(w)\exp(-w^2)\,dw\big\}^{1/2}$. As in (3.2), any $g \in L^2(\mathbb{R}, \exp(-w^2))$ can be expanded into an infinite series as follows:

\[ g(w) = \sum_{n=0}^{k-1} c_n h_n(w) + \sum_{n=k}^{\infty} c_n h_n(w) := g_k(w) + \delta_k(w), \tag{4.3} \]

where $h_n(w) = \frac{1}{\sqrt[4]{\pi}\sqrt{2^n n!}} H_n(w)$, $c_n = \int g(w) h_n(w)\exp(-w^2)\,dw$, $g_k(w) = C'\mathcal{H}(w)$, $\delta_k(w) = \sum_{n=k}^{\infty} c_n h_n(w)$, $\mathcal{H}(w) = (h_0(w), \dots, h_{k-1}(w))'$, and $C = (c_0, \dots, c_{k-1})'$. Correspondingly, the true link function can be written as

\[ g_o(w) = g_{o,k}(w) + \delta_{o,k}(w), \tag{4.4} \]

where $g_{o,k}(w) = C_o'\mathcal{H}(w)$, $\delta_{o,k}(w) = \sum_{n=k}^{\infty} c_{o,n} h_n(w)$, and $C_o = (c_{o,0}, \dots, c_{o,k-1})'$.

The objective function $S_{NT}(\theta, C, F)$ in this case is the same as (3.6), and the estimates $(\hat\theta, \hat C, \hat F)$ are obtained in exactly the same way as in (3.7) and (3.8). Because of this similarity, we summarize the necessary assumptions and the asymptotic properties of these estimators in the supplementary file.
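The weighted basis of this subsection can be checked with Gauss–Hermite quadrature, which integrates polynomials against exp(-w^2) exactly up to a degree determined by the number of nodes. The following sketch, assuming NumPy, verifies orthonormality of the scaled polynomials and expands the link g(w) = w^2, which is not square integrable on the real line but belongs to the weighted space. The function name `weighted_hermite_basis`, the choice of link, and the values of k and the number of nodes are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def weighted_hermite_basis(w, k):
    """Columns are h_0, ..., h_{k-1}: Hermite polynomials scaled to be orthonormal under exp(-w^2)."""
    w = np.asarray(w, dtype=float).ravel()
    h = np.empty((w.size, k))
    h[:, 0] = np.pi ** -0.25
    if k > 1:
        h[:, 1] = np.sqrt(2.0) * w * h[:, 0]
    for n in range(1, k - 1):
        h[:, n + 1] = (np.sqrt(2.0 / (n + 1)) * w * h[:, n]
                       - np.sqrt(n / (n + 1.0)) * h[:, n - 1])
    return h

k = 6
nodes, weights = hermgauss(8)              # quadrature rule for integrals against exp(-w^2)
B = weighted_hermite_basis(nodes, k)
gram = B.T @ (weights[:, None] * B)        # k x k matrix of weighted inner products

g = lambda w: w ** 2                       # non-integrable on R, but in the weighted space
coef = B.T @ (weights * g(nodes))          # c_n = int g(w) h_n(w) exp(-w^2) dw
grid = np.linspace(-2.0, 2.0, 5)
approx = weighted_hermite_basis(grid, k) @ coef
```

Because w^2 is a polynomial of degree two, its expansion terminates after the third basis function, so the partial sum reproduces it on the grid up to rounding error.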



4.2. Determination of k and m

In this subsection, we discuss how to choose the truncation parameter $k$ and the number of factors $m$. Assuming the link function is smooth enough, $k = \lfloor (NT)^{1/5} \rfloor$ always satisfies the assumptions required for asymptotic convergence. Although $\lfloor (NT)^{1/5} \rfloor$ may not be the optimal choice of $k$, all the derivations remain valid for consistency. Therefore, in the empirical application in Section 6 and the numerical studies in the supplementary file of this paper, we always use $k = \lfloor (NT)^{1/5} \rfloor$ (see Su and Jin (2012) and Dong et al. (2016) for similar arguments and settings). With the truncation parameter chosen, we select the number of factors by minimizing the following information criterion function:

\[ CI(m) = \ln\left( \frac{1}{NT}\sum_{i=1}^{N} \big\|Y_i - \phi_i[\hat\theta, \hat g_k] - \hat F\hat\gamma_i\big\|^2 \right) + m \cdot \frac{N+T}{NT} \cdot \ln\left( \frac{NT}{N+T} \right), \tag{4.5} \]

which is in the same spirit as that in Bai and Ng (2002). Our simulation results provided in Section 5 suggest that the criterion function (4.5) works well for choosing the number of factors.

In general, how to simultaneously choose the optimal truncation parameter and the number of factors remains an open issue. For the case of cross-sectional or time series data, previous studies have examined the choices of optimal bandwidth and truncation parameter (e.g., Gao et al., 2002; Hall et al., 2007) for non-parametric and varying-coefficient models. For the case of panel data, even the choice of optimal bandwidth (or truncation parameter) alone remains unresolved (e.g., Su and Jin, 2012; Chen et al., 2012b), let alone the simultaneous choice of the optimal truncation parameter and the number of factors. For single-index panel data models with interactive fixed effects, the simultaneous choice of $k$ and $m$ is even more daunting and is thus left for future research.

¹ Note that $L^2(\mathbb{R}, \exp(-w^2)) := \big\{ g \mid \int_{\mathbb{R}} g^2(w)\exp(-w^2)\,dw < \infty \big\}$, where $\exp(-w^2)$ serves as a weight function for the functional space.

4.3. Specification tests

In a recent work, Su et al. (2015) propose a procedure for testing the non-parametric model $y_{it} = m(x_{it}) + \gamma_{o,i}'f_{o,t} + \varepsilon_{it}$ against the fully parametric model of Bai (2009). To be precise, their null and alternative hypotheses are respectively given by

H0 :

Pr[m(xit ) = x′it θo ] = 1

H1 :

Pr[m(xit ) = x′it θ] < 1

for some θo ∈ Rd , versus for all θ ∈ Rd .

(4.6)

Following the spirit of Su et al. (2015), the null and alternative hypotheses in our case can be formulated as follows:

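$$
\begin{aligned}
H_0 &: \Pr[g_o(x_{it}'\theta_o) = x_{it}'\theta_o] = 1 \quad \text{for some } \theta_o \in \mathbb{R}^d, \text{ versus} \\
H_1 &: \Pr[g_o(x_{it}'\theta) = x_{it}'\theta] < 1 \quad \text{for all } \theta \in \mathbb{R}^d.
\end{aligned} \tag{4.7}
$$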

Noting that the test in Su et al. (2015) only requires the estimation under the null hypothesis, we can directly apply their test to test our null hypothesis. In this case, once the null hypothesis is rejected, it is not clear how one should specify the alternative. For example, the alternative can be the fully nonparametric one considered by Su et al. (2015) or the semiparametric one studied in this paper. Alternatively, we can modify their test statistic by replacing $x_{it}$ in their kernel function by $x_{it}'\tilde{\theta}$, where $\tilde{\theta}$ is the parametric estimator of $\theta$ under the null hypothesis. Their asymptotic theory remains virtually unchanged due to the parametric convergence rate of $\tilde{\theta}$. But the modification is extremely useful when the dimension of $x_{it}$ is high or when $x_{it}$ contains a subset of discrete regressors. It is designed specifically to test whether $g_o(x_{it}'\theta_o)$ is linear in the single-index $x_{it}'\theta_o$.

As a referee kindly points out, given the fact that we consider two classes of functions for $g_o(\cdot)$ and different classes of basis functions have to be chosen to estimate them, it is natural to ask which class of basis functions should be used in practice. The test of linearity partially serves the purpose in that we can use the linear model directly once we fail to reject the null. Admittedly, the problem still remains when we reject the null of linearity, and we are not sure how to test the general null hypothesis of integrability against the alternative hypothesis of non-integrability. Empirically, the question may become less relevant if the regressors have compact support, because in this case we can regard $g_o(\cdot)$ as having compact support.

4.4. More generalized models

Recent research in the literature has been concerned with generalizing the models proposed by Pesaran (2006) and Bai (2009). In our particular case, (1.1) can also be generalized by assuming that the factor structure enters the model nonlinearly as in the traditional nonparametric literature such that

$$
y_{it} = g_o(x_{it}'\theta_o + \gamma_{o,i}' f_{o,t}) + \varepsilon_{it} \tag{4.8}
$$

or by assuming that the factor structure enters the model nonseparably as

$$
y_{it} = g_o(x_{it}'\theta_o, \gamma_{o,i}' f_{o,t}) + \varepsilon_{it}. \tag{4.9}
$$

To the best of our knowledge, the above two models have not been studied yet. To estimate the model in (4.8), we can continue to consider the sieve approximation of go (w) by say, C ′ H (w), and estimate the parameters θ , C , Γ and F simultaneously via the least squares method. Similarly, to estimate the model in (4.9), we need to consider the sieve approximation of a bivariate function go (v, c ) by say, C ∗′ H∗ (v, c ), where H∗ (v, c ) is formed from the tensor product of H (v) and H (c ). Then we can also estimate the parameters θ , C ∗ , Γ and F jointly via the least squares method. Now, because the factors and factor loading parameters appear inside the unknown function, there is no way to concentrate Γ out as in Section 3. As a result, the formal asymptotic analyses of the resulting estimators in either case will be much more challenging and involved than what we have done in this paper and is thus left for future research. Despite the great difficulty in studying the asymptotic properties of the above estimators, it is possible to consider a specification test for the model in (4.8). In particular, we can follow Su et al. (2015) and test the linearity of go (·):

$$
H_{0a}: \Pr[g_o(w_{it}) = w_{it}] = 1 \quad \text{versus} \quad H_{1a}: \Pr[g_o(w_{it}) = w_{it}] < 1, \tag{4.10}
$$

where $w_{it} = x_{it}'\theta_o + \gamma_{o,i}' f_{o,t}$. Since the test of Su et al. (2015) only requires the estimation under the null, we can apply their test directly to test $H_{0a}$ against $H_{1a}$. Similarly, we can also apply their test to the linear additive structure of $g_o(\cdot, \cdot)$ in (4.9):

$$
H_{0b}: \Pr[g_o(v_{it}, c_{it}) = v_{it} + c_{it}] = 1 \quad \text{versus} \quad H_{1b}: \Pr[g_o(v_{it}, c_{it}) = v_{it} + c_{it}] < 1, \tag{4.11}
$$

where $v_{it} = x_{it}'\theta_o$ and $c_{it} = \gamma_{o,i}' f_{o,t}$. When either null is rejected, one should not apply the commonly used linear panel data models with interactive fixed effects.


Table 1
Biases and RMSEs for Case 1 – $g_o(w) = (1 + w^2)\exp(-w^2)$.

| Measure | Model | N | $\hat\theta_1$, T=40 | $\hat\theta_1$, T=80 | $\hat\theta_1$, T=160 | $\hat\theta_1$, T=200 | $\hat\theta_2$, T=40 | $\hat\theta_2$, T=80 | $\hat\theta_2$, T=160 | $\hat\theta_2$, T=200 |
|---|---|---|---|---|---|---|---|---|---|---|
| Bias | SI_Factor | 40 | −0.0012 | −0.0001 | 0.0002 | 0.0004 | 0.0044 | 0.0003 | 0.0004 | 0.0007 |
| | | 80 | −0.0003 | 0.0001 | 0.0000 | 0.0001 | 0.0002 | 0.0004 | 0.0001 | 0.0002 |
| | | 160 | 0.0004 | −0.0002 | 0.0000 | 0.0000 | 0.0007 | −0.0001 | 0.0000 | 0.0000 |
| | | 200 | 0.0002 | 0.0003 | 0.0002 | −0.0001 | 0.0005 | 0.0005 | 0.0003 | −0.0001 |
| | L_Factor | 40 | −0.8867 | −0.8850 | −0.8840 | −0.8852 | 0.8354 | 0.8382 | 0.8407 | 0.8396 |
| | | 80 | −0.8852 | −0.8839 | −0.8850 | −0.8844 | 0.8380 | 0.8404 | 0.8404 | 0.8414 |
| | | 160 | −0.8857 | −0.8854 | −0.8855 | −0.8854 | 0.8388 | 0.8408 | 0.8406 | 0.8411 |
| | | 200 | −0.8843 | −0.8855 | −0.8849 | −0.8851 | 0.8392 | 0.8399 | 0.8412 | 0.8414 |
| | F_Panel | 40 | −0.9025 | −0.8995 | −0.9014 | −0.9012 | 0.8254 | 0.8279 | 0.8270 | 0.8287 |
| | | 80 | −0.8996 | −0.9001 | −0.9010 | −0.9021 | 0.8257 | 0.8266 | 0.8258 | 0.8260 |
| | | 160 | −0.9006 | −0.9016 | −0.9026 | −0.9010 | 0.8279 | 0.8260 | 0.8246 | 0.8261 |
| | | 200 | −0.9024 | −0.9015 | −0.9026 | −0.9015 | 0.8243 | 0.8261 | 0.8244 | 0.8267 |
| RMSE | SI_Factor | 40 | 0.0403 | 0.0138 | 0.0096 | 0.0085 | 0.0745 | 0.0185 | 0.0128 | 0.0113 |
| | | 80 | 0.0152 | 0.0097 | 0.0067 | 0.0060 | 0.0202 | 0.0130 | 0.0089 | 0.0080 |
| | | 160 | 0.0105 | 0.0070 | 0.0046 | 0.0040 | 0.0140 | 0.0093 | 0.0061 | 0.0054 |
| | | 200 | 0.0092 | 0.0059 | 0.0040 | 0.0036 | 0.0123 | 0.0078 | 0.0054 | 0.0048 |
| | L_Factor | 40 | 0.8877 | 0.8855 | 0.8843 | 0.8855 | 0.8364 | 0.8386 | 0.8410 | 0.8398 |
| | | 80 | 0.8858 | 0.8841 | 0.8851 | 0.8845 | 0.8385 | 0.8406 | 0.8405 | 0.8415 |
| | | 160 | 0.8861 | 0.8856 | 0.8856 | 0.8855 | 0.8391 | 0.8409 | 0.8407 | 0.8411 |
| | | 200 | 0.8846 | 0.8857 | 0.8849 | 0.8852 | 0.8394 | 0.8400 | 0.8413 | 0.8414 |
| | F_Panel | 40 | 0.9049 | 0.9008 | 0.9020 | 0.9017 | 0.8282 | 0.8293 | 0.8278 | 0.8293 |
| | | 80 | 0.9016 | 0.9011 | 0.9015 | 0.9025 | 0.8277 | 0.8275 | 0.8263 | 0.8264 |
| | | 160 | 0.9020 | 0.9023 | 0.9030 | 0.9013 | 0.8295 | 0.8269 | 0.8250 | 0.8265 |
| | | 200 | 0.9039 | 0.9022 | 0.9030 | 0.9018 | 0.8260 | 0.8269 | 0.8248 | 0.8270 |

5. Monte Carlo study

In this section, we perform Monte Carlo simulations to investigate the finite sample properties of our models and estimators. The data generating process (DGP) of the model (1.1) is as follows. The error terms (denoted by $\xi_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Nt})'$ in Assumption 1.2) are generated by an AR(1) process: $\xi_t = 0.5\,\xi_{t-1} + v_t$, where $v_t \sim$ i.i.d. $N(0, \Sigma_v)$ and the (i, j)th element of $\Sigma_v$ is $0.5^{|i-j|}$ for $1 \le i, j \le N$. The factor loadings $\gamma_{o,i}$'s are generated as $\gamma_{o,i} \sim$ i.i.d. $N(i_m, I_m)$, where $i_m$ denotes an m × 1 vector of ones and $I_m$ denotes an m × m identity matrix. The factors $f_{o,t}$'s are generated as $f_{o,t} \sim$ i.i.d. $N(0, I_m)$. For regressors, let $x_{it} = 1 + \sum_{j=1}^{m} |\gamma_{o,i,j} \cdot f_{o,t,j}| + \delta_{it}$, where $\delta_{it} \sim$ i.i.d. $N(0, I_d)$, and $\gamma_{o,i,j}$ and $f_{o,t,j}$ denote the jth elements of $\gamma_{o,i}$ and $f_{o,t}$ respectively. Let d = 2, m = 3 and $\theta_o = (0.8, -0.6)'$. For the link function, we consider two cases:

• Case 1: $g_o(w) = (1 + w^2)\exp(-w^2)$ such that $g_o \in L^2(\mathbb{R})$;
• Case 2: $g_o(w) = \exp((0.6 + w)^2/8)$ such that $g_o \in L^2(\mathbb{R}, \exp(-w^2))$.

Case 1 is estimated using the method outlined in Section 3, while Case 2 is estimated using the method described in Section 4. For simplicity, we refer to the single-index panel data model with interactive fixed effects as the "SI_Factor model". When estimating the "SI_Factor model", the truncation parameter k and the number of factors m are always chosen as described in Section 4.2. For comparison purposes, we also use two other models to estimate $\theta_o$. The first model is the linear panel data model with interactive fixed effects proposed by Bai (2009). For simplicity, we refer to this model as the "L_Factor model". The second model is the traditional panel data model with fixed effects. We refer to this latter model as the "F_Panel model" below.

For each generated data set and each of the three models mentioned above, we calculate the following bias and squared error ("se" for short): $\text{bias}_\ell = \hat\theta_{j\ell} - \theta_{o,j}$ and $\text{se}_\ell = (\hat\theta_{j\ell} - \theta_{o,j})^2$ for j = 1, . . . , d, where $\hat\theta_{j\ell}$ and $\theta_{o,j}$ denote the jth elements of $\hat\theta_\ell$ and $\theta_o$ respectively, and ℓ indicates the ℓth replication. After 1000 replications, we report the following two values in Tables 1 and 2:

$$
\text{Bias} = \frac{1}{1000} \sum_{\ell=1}^{1000} \text{bias}_\ell \quad \text{and} \quad \text{RMSE} = \sqrt{\frac{1}{1000} \sum_{\ell=1}^{1000} \text{se}_\ell}. \tag{5.1}
$$
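The design above can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: the function names `simulate_panel` and `bias_and_rmse`, the burn-in length for the AR(1) errors, and the seed handling are assumptions introduced here, and the scalar factor term is added to each of the d components of $x_{it}$, which is one reading of the regressor definition.

```python
import numpy as np

def simulate_panel(N, T, case=1, d=2, m=3, theta=(0.8, -0.6), burn=100, seed=0):
    """Draw one (y, x) panel from the Monte Carlo design (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)

    # Innovation covariance: (i, j)th element of Sigma_v is 0.5^{|i-j|}.
    idx = np.arange(N)
    sigma_v = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(sigma_v)

    # AR(1) errors xi_t = 0.5 * xi_{t-1} + v_t; the burn-in is an assumption.
    xi = np.zeros(N)
    eps = np.empty((N, T))
    for t in range(-burn, T):
        xi = 0.5 * xi + chol @ rng.standard_normal(N)
        if t >= 0:
            eps[:, t] = xi

    gamma = 1.0 + rng.standard_normal((N, m))  # loadings ~ N(i_m, I_m)
    f = rng.standard_normal((T, m))            # factors  ~ N(0, I_m)

    # x_it = 1 + sum_j |gamma_ij * f_tj| + delta_it, with delta_it ~ N(0, I_d).
    common = np.abs(gamma[:, None, :] * f[None, :, :]).sum(axis=2)  # shape (N, T)
    x = 1.0 + common[:, :, None] + rng.standard_normal((N, T, d))   # shape (N, T, d)

    w = x @ theta                                                   # single index
    if case == 1:
        g = (1 + w**2) * np.exp(-w**2)
    else:
        g = np.exp((0.6 + w) ** 2 / 8)
    y = g + gamma @ f.T + eps
    return y, x

def bias_and_rmse(theta_hats, theta_o):
    """Bias and RMSE as in (5.1), averaged over replications (rows)."""
    err = np.asarray(theta_hats, dtype=float) - np.asarray(theta_o, dtype=float)
    return err.mean(axis=0), np.sqrt((err**2).mean(axis=0))
```

In a full experiment one would call `simulate_panel` once per replication with a different seed, estimate $\theta_o$ on each draw, stack the estimates row by row, and pass the stack to `bias_and_rmse`.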

Two findings emerge from Tables 1 and 2. First, the biases and RMSEs associated with the SI_Factor model are much smaller than those associated with the other two models. For example, in Table 1, when N = T = 80, the biases for $\hat\theta_1$ and $\hat\theta_2$ associated with the SI_Factor model are 0.0001 and 0.0004 respectively, whereas those associated with the L_Factor model are −0.8839 and 0.8404, and those associated with the F_Panel model are −0.9001 and 0.8266. This suggests that


Table 2 Biases and RMSEs for Case 2 – go (w ) = exp((0.6 + w )2 /8). N \T

Bias

SI_Factor

L_Factor

F_Panel

RMSE

SI_Factor

L_Factor

F_Panel

θˆ2

θˆ1 40

80

160

200

40

80

160

200

40 80 160 200 40 80 160 200 40 80 160 200

0.0004 0.0000 0.0001 0.0000 −0.0970 −0.0838 −0.0714 −0.0702 −0.0508 −0.0501 −0.0409 −0.0451

−0.0001

0.0000 0.0001 0.0000 0.0000 −0.0693 −0.0578 −0.0534 −0.0536 −0.0500 −0.0474 −0.0468 −0.0503

0.0006 0.0001 0.0001 0.0000 0.2733 0.2735 0.2650 0.2632 0.2361 0.2372 0.2318 0.2315

−0.0001

0.0000 0.0000 0.0000 −0.0794 −0.0704 −0.0622 −0.0607 −0.0460 −0.0530 −0.0481 −0.0495

0.0000 0.0000 0.0000 0.0001 −0.0697 −0.0622 −0.0558 −0.0540 −0.0487 −0.0496 −0.0505 −0.0506

0.0001 0.0000 0.0000 0.2671 0.2640 0.2569 0.2558 0.2375 0.2404 0.2348 0.2356

0.0000 0.0000 0.0000 0.0001 0.2623 0.2561 0.2520 0.2510 0.2375 0.2373 0.2347 0.2358

0.0000 0.0001 0.0000 0.0000 0.2598 0.2533 0.2496 0.2503 0.2392 0.2356 0.2355 0.2382

40 80 160 200 40 80 160 200 40 80 160 200

0.0050 0.0032 0.0019 0.0017 0.1253 0.1038 0.0891 0.0854 0.1172 0.1192 0.1496 0.0905

0.0029 0.0019 0.0011 0.0010 0.0980 0.0822 0.0705 0.0689 0.0979 0.0813 0.0742 0.0734

0.0019 0.0012 0.0008 0.0032 0.0813 0.0706 0.0617 0.0594 0.0817 0.0702 0.0663 0.0628

0.0016 0.0021 0.0006 0.0006 0.0799 0.0652 0.0587 0.0584 0.0760 0.0664 0.0648 0.0617

0.0067 0.0042 0.0026 0.0022 0.2831 0.2797 0.2695 0.2679 0.2571 0.2588 0.2563 0.2445

0.0039 0.0025 0.0014 0.0014 0.2730 0.2679 0.2596 0.2584 0.2502 0.2478 0.2412 0.2417

0.0025 0.0016 0.0010 0.0052 0.2666 0.2588 0.2537 0.2525 0.2460 0.2420 0.2382 0.2387

0.0021 0.0030 0.0009 0.0008 0.2640 0.2555 0.2511 0.2517 0.2457 0.2399 0.2382 0.2406

compared with those obtained from the other two models, parameter estimates obtained from the SI_Factor model are more accurate. Second, the biases associated with the SI_Factor model become virtually zero when both N and T are greater than or equal to 160. In contrast, those associated with the other two models are well above zero even when N = T = 200.

It is of interest to examine how the accuracy of the estimated link function changes as N and T increase. For this purpose, we plot in Fig. 1 (Fig. 2) the 95% upper and lower bounds of $\hat g_k(w)$ for Case 1 (Case 2) on a selected interval based on 1,000 replications. As can be seen, the bounds become tighter as N and T increase, implying that the estimates of the link function become more accurate as N and T increase. In addition, we examine the estimates of the first derivative of the link function in the same way as in Figs. 1 and 2. Due to similarity, we plot these results in the supplementary file of this paper to save space.

Finally, we examine the accuracy of the estimate of the number of factors, chosen using (4.5). In doing so, we calculate the percentage of the number of factors being underestimated (i.e., $\hat m < 3$), overestimated (i.e., $\hat m > 3$), or accurately estimated (i.e., $\hat m = 3$) based on 1000 replications. The results, reported in Table 3, show that as N and T increase, the percentage of $\hat m = 3$ quickly approaches 1. This suggests that the procedure for selecting the number of factors given in Section 4.2 works very well when sample sizes are relatively large.

6. An application to U.S. banking data

In this section, we provide an application of the single-index panel data model with interactive fixed effects to the analysis of returns to scale of large commercial banks in the U.S. Over the past three decades, the increasing dominance of large banks in the U.S.
banking industry, caused by fundamental regulatory changes and technological and financial innovations, has stimulated considerable research into returns to scale at large banks in the U.S. Major regulatory changes include the removal of restrictions on interstate banking and branching and the elimination of restrictions against combinations of banks, securities firms, and insurance companies, while technological and financial innovations include, but are not limited to, information processing and telecommunication technologies, the securitization and sale of bank loans, and the development of derivatives markets. One of the most important consequences of these changes is the increasing concentration of industry assets among large banks. According to Jones and Critchfield (2005), the asset share of large banks (those with assets in excess of $1 billion) increased from 76% in 1984 to 86% in 2003. In the meantime, the average size of those banks increased from $4.97 billion to $15.50 billion (in 2002 dollars). This has raised concern that some banks might be too large to operate efficiently, stimulating a substantial body of research into returns to scale at large banks in the U.S. For excellent reviews, see Berger et al. (1999). The data used in this application are obtained from the Reports of Income and Condition published by the Federal Reserve Bank of Chicago. The sample covers the period 1986–2005. We examine only continuously operating large banks with assets of at least $1 billion (in 1986 dollars) to avoid the impact of entry and exit and to focus on the performance of a core of healthy, surviving institutions. This gives a total of 466 banks over 20 years (i.e., 80 quarters, so N = 466 and T = 80). To select the relevant variables, we follow the commonly-accepted intermediation approach (Sealey and Lindley, 1977). In this paper, three output quantities and three input prices are identified. The three outputs are consumer

loans, $y_1$; non-consumer loans, $y_2$, composed of industrial and commercial loans and real estate loans; and securities, $y_3$, which include all non-loan financial assets, i.e., all financial and physical assets minus the sum of consumer loans, non-consumer loans, securities, and equity. All outputs are deflated by the GDP deflator to the base year 1986. The three input prices include: the wage rate for labour, $w_1$; the interest rate for borrowed funds, $w_2$; and the price of physical capital, $w_3$. The wage rate equals total salaries and benefits divided by the number of full-time employees. The price of capital equals expenses on premises and equipment divided by premises and fixed assets. The price of deposits and purchased funds equals total interest expense divided by total deposits and purchased funds. Total cost is thus the sum of these three input costs. This specification of output and input prices is the same as or similar to most of the previous studies in this literature (e.g., Stiroh, 2000).

Fig. 1. Estimated link function for Case 1 – $g_o(w) = (1 + w^2)\exp(-w^2)$.

Fig. 2. Estimated link function for Case 2 – $g_o(w) = \exp((0.6 + w)^2/8)$.

6.1. Model specification

The variables defined above suggest the following mapping:

$$
\left( \frac{w_1}{w_3}, \frac{w_2}{w_3}, y_1, y_2, y_3 \right) \mapsto \frac{CT}{w_3},
$$

where CT represents total costs, and all the other variables have been defined as above. Note that we divide CT, $w_1$, and $w_2$ by $w_3$ to maintain linear homogeneity with respect to input prices. As discussed in the beginning of the introduction, our model (1.1) is more suitable for modelling the production technology in U.S. large banks than the conventional linear or translog model. To be specific, the single-index cost function with interactive fixed effects is then written as²

$$
\ln \frac{CT_{it}}{w_{3,it}} = C\left( \frac{w_{1,it}}{w_{3,it}}, \frac{w_{2,it}}{w_{3,it}}, y_{1,it}, y_{2,it}, y_{3,it} \right) + \gamma_{o,i}' f_{o,t} + \varepsilon_{it} = g_o(x_{it}'\theta_o) + \gamma_{o,i}' f_{o,t} + \varepsilon_{it}, \tag{6.1}
$$

where $x_{it} = \left( \ln \frac{w_{1,it}}{w_{3,it}}, \ln \frac{w_{2,it}}{w_{3,it}}, \ln y_{1,it}, \ln y_{2,it}, \ln y_{3,it}, 1 \right)'$; (4.1) is imposed on $g_o$ to facilitate comparison of empirical results between this model and the two linear models given in (6.2) and (6.3); and $C(\cdot)$ represents the normalized cost function. Given the estimation of (6.1), it is possible to compute returns to scale by

$$
RTS = \left[ \sum_{j=1}^{3} \frac{\partial}{\partial \ln y_j} C\left( \frac{w_1}{w_3}, \frac{w_2}{w_3}, y_1, y_2, y_3 \right) \right]^{-1},
$$

where $\frac{\partial}{\partial \ln y_j} C\left( \frac{w_1}{w_3}, \frac{w_2}{w_3}, y_1, y_2, y_3 \right)$ is the cost elasticity of output j with j = 1, 2, 3.

2 As required by microeconomic theory, the cost function in (6.1) should satisfy monotonicity and curvature conditions. Specifically, monotonicity requires that the first-order derivatives of the cost function be nonnegative, and curvature requires that the cost function be a concave function of prices or, equivalently, that the Hessian matrix of the cost function be negative semidefinite (Serletis and Feng, 2015). These two conditions can be easily imposed on parametric cost functions using Cholesky decomposition, constrained maximum likelihood or Bayesian approaches. However, it is much more involved to impose them on the semiparametric panel data model with interactive fixed effects. To the best of our knowledge, no attempt has been made in this area, and this is thus left for future research.
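Under the single-index form in (6.1), the returns-to-scale formula of Section 6.1 simplifies by the chain rule: the cost elasticity of output j is $g_o'(x_{it}'\theta_o)$ times the coefficient on $\ln y_j$, so RTS is the reciprocal of $g_o'(x_{it}'\theta_o)$ multiplied by the sum of the three output coefficients. The sketch below illustrates this under the assumption that the outputs occupy positions 3 to 5 of $x_{it}$ (as in the definition of $x_{it}$ in Section 6.1); the names `returns_to_scale` and `numerical_derivative` are hypothetical, and the finite-difference derivative is only a stand-in for whatever derivative estimate of the link function is available.

```python
import numpy as np

def numerical_derivative(g, w, h=1e-5):
    """Central finite difference of a link function g at the points w."""
    w = np.asarray(w, dtype=float)
    return (g(w + h) - g(w - h)) / (2.0 * h)

def returns_to_scale(x, theta, g_prime, output_idx=(2, 3, 4)):
    """RTS = 1 / sum_j d g(x'theta) / d ln y_j for a single-index cost function.

    x          : array (..., d) of regressors, with ln y_1..ln y_3 at output_idx
    theta      : array (d,) of index coefficients
    g_prime    : callable returning the derivative of the link at the index
    output_idx : zero-based positions of the log outputs in x (an assumption)
    """
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    index = x @ theta                                   # x_it' theta
    elasticity_sum = g_prime(index) * theta[list(output_idx)].sum()
    return 1.0 / elasticity_sum
```

With a linear link (derivative identically one), the function reduces to the familiar reciprocal of the summed output coefficients, which is a convenient sanity check before plugging in an estimated link.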

For comparison purposes, we also consider two other parametric models: (1) a fully-parametric translog cost function with fixed effects (referred to as "F_Panel"), which is commonly used in investigating returns to scale of the banking industry; and (2) a fully-parametric translog cost function with interactive fixed effects (referred to as "L_Factor"). Specifically, these two models are written as

$$
\frac{CT_{it}}{w_{3,it}} = x_{it}'\beta_o + \alpha_i + \varepsilon_{it}, \tag{6.2}
$$

$$
\frac{CT_{it}}{w_{3,it}} = x_{it}'\beta_o + \gamma_{o,i}' f_{o,t} + \varepsilon_{it}, \tag{6.3}
$$

where, in order to implement the within transformation for (6.2), we exclude the constant 1 from $x_{it}$ when estimating (6.2).

6.2. Empirical results

Following the common practice in the literature of production economics (Parmeter et al., 2014), standard deviations and confidence intervals in this subsection are all constructed using the wild bootstrap method based on 299 replications. In our particular case, the wild bootstrap method is implemented as follows:³

3 In order to ensure the validity of the bootstrap procedure, stronger assumptions on the error terms are needed. For example, one can employ martingale difference type of assumptions as in Assumption A.2 of Su and Chen (2013) and Assumption A.4 of Su et al. (2015), or simply assume that the error terms are i.i.d. over both i and t. Generally speaking, when the error term exhibits both cross-sectional and serial correlation, the results obtained from the bootstrap procedure may be unreliable or incorrect. This issue has in fact been overlooked by the entire literature of production economics.


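Table 3
Percentage of the number of factors estimated.

| Case | Outcome | N | T=20 | T=40 | T=80 | T=160 | T=200 |
|---|---|---|---|---|---|---|---|
| Case 1 | $\hat m < 3$ | 20 | 0.2770 | 0.1250 | 0.0600 | 0.0300 | 0.0450 |
| | | 40 | 0.1410 | 0.0120 | 0.0000 | 0.0000 | 0.0000 |
| | | 80 | 0.0750 | 0.0020 | 0.0000 | 0.0000 | 0.0000 |
| | | 160 | 0.0560 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | | 200 | 0.0420 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | $\hat m = 3$ | 20 | 0.6530 | 0.8570 | 0.9400 | 0.9700 | 0.9550 |
| | | 40 | 0.8390 | 0.9870 | 1.0000 | 1.0000 | 1.0000 |
| | | 80 | 0.9230 | 0.9980 | 1.0000 | 1.0000 | 1.0000 |
| | | 160 | 0.9440 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | | 200 | 0.9580 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | $\hat m > 3$ | 20 | 0.0700 | 0.0180 | 0.0000 | 0.0000 | 0.0000 |
| | | 40 | 0.0200 | 0.0010 | 0.0000 | 0.0000 | 0.0000 |
| | | 80 | 0.0020 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | | 160 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | | 200 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Case 2 | $\hat m < 3$ | 20 | 0.2850 | 0.1300 | 0.0630 | 0.0350 | 0.0460 |
| | | 40 | 0.1450 | 0.0150 | 0.0000 | 0.0000 | 0.0000 |
| | | 80 | 0.0780 | 0.0020 | 0.0000 | 0.0000 | 0.0000 |
| | | 160 | 0.0530 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | | 200 | 0.0420 | 0.0000 | 0.0000 | 0.0010 | 0.0000 |
| | $\hat m = 3$ | 20 | 0.6620 | 0.8590 | 0.9370 | 0.9650 | 0.9540 |
| | | 40 | 0.8390 | 0.9840 | 1.0000 | 0.9990 | 0.9970 |
| | | 80 | 0.9210 | 0.9980 | 1.0000 | 1.0000 | 0.9990 |
| | | 160 | 0.9470 | 0.9990 | 0.9990 | 0.9990 | 0.9990 |
| | | 200 | 0.9560 | 0.9980 | 0.9980 | 0.9960 | 1.0000 |
| | $\hat m > 3$ | 20 | 0.0530 | 0.0110 | 0.0000 | 0.0000 | 0.0000 |
| | | 40 | 0.0160 | 0.0010 | 0.0000 | 0.0010 | 0.0030 |
| | | 80 | 0.0010 | 0.0000 | 0.0000 | 0.0000 | 0.0010 |
| | | 160 | 0.0000 | 0.0010 | 0.0010 | 0.0010 | 0.0010 |
| | | 200 | 0.0020 | 0.0020 | 0.0020 | 0.0030 | 0.0000 |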
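The selection frequencies in Table 3 come from minimizing the criterion (4.5) over candidate values of m. The sketch below shows the mechanics of that criterion in a simplified setting: it takes a fixed N × T residual matrix (the data net of the fitted index part), removes the best rank-m approximation by a singular value decomposition, and evaluates (4.5) for each m. Holding the index part fixed across m is an assumption made here for brevity, whereas the paper re-estimates all parameters for each candidate m; the function names are hypothetical.

```python
import numpy as np

def truncation_parameter(N, T):
    """k = floor((NT)^(1/5)), the rule used throughout the paper."""
    return int(np.floor((N * T) ** (1 / 5)))

def criterion(ssr, m, N, T):
    """CI(m) from (4.5): log average squared residual plus a penalty in m."""
    NT = N * T
    return np.log(ssr / NT) + m * (N + T) / NT * np.log(NT / (N + T))

def select_num_factors(resid, m_max):
    """Minimize CI(m) over m = 0, ..., m_max for a fixed residual matrix.

    resid : array (N, T), data minus the fitted single-index part (assumed given)
    m_max : largest candidate, which should be well below min(N, T)
    """
    N, T = resid.shape
    # Squared singular values: removing the best rank-m approximation leaves
    # a sum of squared residuals equal to the sum of the remaining ones.
    s2 = np.linalg.svd(resid, compute_uv=False) ** 2
    values = [criterion(s2[m:].sum(), m, N, T) for m in range(m_max + 1)]
    return int(np.argmin(values)), values
```

For the banking application, `truncation_parameter(466, 80)` reproduces the value k = 8 reported in Section 6.2.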

Table 4
MSEs and estimates of coefficients.

| | SI_Factor | | L_Factor | | F_Panel | |
|---|---|---|---|---|---|---|
| MSE | 0.0024 | | 0.0047 | | 0.0105 | |
| Variables | Est | std | Est | std | Est | std |
| $\ln(w_1/w_3)$ | 0.0636 | 0.0503 | 0.4216 | 0.0154 | 0.3728 | 0.0044 |
| $\ln(w_2/w_3)$ | 0.0042 | 0.0039 | 0.0190 | 0.0007 | 0.0177 | 0.0017 |
| $\ln y_1$ | 0.0636 | 0.0534 | 0.3883 | 0.0024 | 0.3560 | 0.0045 |
| $\ln y_2$ | 0.0156 | 0.0133 | 0.0938 | 0.0015 | 0.0997 | 0.0027 |
| $\ln y_3$ | 0.0897 | 0.0721 | 0.4786 | 0.0033 | 0.4213 | 0.0049 |
| Constant | 0.9805 | 0.0157 | 0.5120 | 0.0344 | | |

1. We first estimate (6.1) with $k = \lfloor (NT)^{1/5} \rfloor$ and $\hat m$ being selected using (4.5).
2. Calculate $\hat\varepsilon_{it} = \hat y_{it} - y_{it}$ based on the estimates, where $\hat y_{it} = \hat g_k(x_{it}'\hat\theta) + \hat\gamma_i' \hat f_t$.
3. For each replication, we generate new dependent variables using $y_{it}^* = \hat y_{it} + \hat\varepsilon_{it} e_{it}$, where $e_{it}$ is drawn randomly from N(0, 1) over i and t. Then we implement estimation based on $\{(y_{it}^*, x_{it}) \mid i \ge 1, t \ge 1\}$ using the same $\hat m$ and k for each replication.
4. Then the bootstrap bias-corrected estimate is given by $\tilde\theta = 2\hat\theta - \theta^*$, where $\theta^*$ stands for the mean of the estimates of $\theta_o$ obtained from bootstrap replications.

With regard to the number of factors for the first model (i.e., (6.1)), we choose it by minimizing (4.5). Specifically, with N = 466, T = 80, and $k = \lfloor (NT)^{1/5} \rfloor = 8$, the number of factors is chosen as $\hat m = 4$. Similarly, the number of factors is chosen as 4 for the model (6.3). The parameter estimates with the associated standard errors of the three models are reported in Table 4.

Moreover, to test the linearity of $g_o$, we implement the hypothesis test outlined in Su et al. (2015):

$$
\begin{aligned}
H_0 &: \Pr[g_o(x_{it}) = x_{it}'\theta_o] = 1 \quad \text{for some } \theta_o \in \mathbb{R}^d, \text{ versus} \\
H_1 &: \Pr[g_o(x_{it}) = x_{it}'\theta] < 1 \quad \text{for all } \theta \in \mathbb{R}^d.
\end{aligned}
$$

For our case, we reject the null hypothesis that $g_o$ is linear at the 1% significance level. To see the nonlinear nature of $g_o$ more clearly, we also plot the estimate of $g_o$ in Fig. 3 on some selected points over the interval [1.6, 2.3], where the solid line represents the estimated cost function, while the two dashed lines represent the associated 95% pointwise confidence intervals of these estimates based on the bootstrap procedure given above.

Fig. 3. Estimated cost function.

To compare the performance of these three models, we follow Dong et al. (2016) to compute mean squared errors (MSE) for each of the three models using $\text{MSE} = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (\hat y_{it} - y_{it})^2$, where $\hat y_{it}$ is the estimated normalized total costs for bank i in period t, and $y_{it}$ is the corresponding actual normalized total costs. The MSE values for the three models are reported in Table 4. As can be seen, the single-index model has the lowest MSE, confirming that the single-index cost function with interactive fixed effects outperforms the other two models. This is not surprising given the fact that the single-index cost function with interactive fixed effects nests the other two models as special cases.

Table 5
Annual average returns to scale.

| Year | RTS | 95% CI |
|---|---|---|
| Average of all years | 1.1446 | (1.1295, 1.1601) |
| 1986 | 1.1607 | (1.1415, 1.1781) |
| 1987 | 1.1579 | (1.1406, 1.1731) |
| 1988 | 1.1598 | (1.1423, 1.1748) |
| 1989 | 1.1596 | (1.1422, 1.1747) |
| 1990 | 1.1589 | (1.1421, 1.1735) |
| 1991 | 1.1581 | (1.1420, 1.1719) |
| 1992 | 1.1531 | (1.1385, 1.1669) |
| 1993 | 1.1492 | (1.1345, 1.1644) |
| 1994 | 1.1466 | (1.1320, 1.1628) |
| 1995 | 1.1469 | (1.1326, 1.1624) |
| 1996 | 1.1458 | (1.1315, 1.1614) |
| 1997 | 1.1435 | (1.1294, 1.1593) |
| 1998 | 1.1406 | (1.1267, 1.1567) |
| 1999 | 1.1381 | (1.1244, 1.1544) |
| 2000 | 1.1366 | (1.1231, 1.1525) |
| 2001 | 1.1354 | (1.1221, 1.1510) |
| 2002 | 1.1296 | (1.1152, 1.1455) |
| 2003 | 1.1253 | (1.1105, 1.1411) |
| 2004 | 1.1229 | (1.1085, 1.1384) |
| 2005 | 1.1238 | (1.1093, 1.1390) |

In what follows, we focus on empirical results from the single-index cost function with interactive fixed effects. Table 5 presents the annual average returns to scale (RTS) estimates for each year, obtained by averaging over all sampled banks in each year. Two findings emerge from this table. First, the annual average RTS is greater than unity for all sample years, ranging from 1.1229 to 1.1607, indicating that on average the large banks show increasing returns to scale during the sample period. This finding is consistent with Wheelock and Wilson (2012), who, using a non-parametric local-linear estimator to estimate the cost relationship for commercial banks in the U.S. over the period 1984–2006, find that U.S. banks operated under increasing returns to scale. This finding partially explains why bank mergers and acquisitions in the 1990s and early 2000s occurred at an unprecedented rate, because mergers and acquisitions allow banks to exploit economies of scale.
Second, the annual average RTS shows a general downward trend, falling gradually from 1.1607 in 1986 to 1.1238 in 2005. A possible reason for the decline in RTS is that U.S. large banks grew rapidly during the sample period, which allowed those banks to exploit economies of scale and at the same time lowered their returns to scale. For instance, the average size of the large banks in our sample increased from $1.22 billion in 1986 to $7.63 billion in 2005 (in 1986 dollars), which, for a given production technology, would certainly lead to lower returns to scale. However, it should be noted here that there is another factor that might have increased RTS as the banks grew bigger — new technologies. Specifically, as banks grow bigger, they are more likely to afford new technologies. The adoption of new technologies increases the banks’ optimal scales over time, which results in higher RTS for given bundles of inputs. This explains why the annual average RTS does not decline rapidly during the sample period, despite the rapid growth of the large banks. Having examined annual average RTS estimates, we now turn to RTS estimates at individual bank level. We calculate the percentage of banks facing increasing, constant, or decreasing returns to scale for each year. Specifically, we count the number of cases where the 95% confidence intervals are strictly greater than 1.0 (indicating increasing returns to scale or IRS for short), strictly less than 1.0 (indicating decreasing returns to scale or DRS for short), or contain 1.0 (indicating constant returns to scale or CRS for short). The results are reported in Table 6. Two findings in this table are noteworthy.
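The wild bootstrap steps of this subsection and the interval-based classification just described can be sketched as follows. The sketch is illustrative only: `fit` is a placeholder for any estimator that returns a coefficient vector and fitted values, and the pooled OLS routine `ols_fit` is a deliberately simple stand-in, not the single-index estimator with interactive fixed effects used in the paper. All function names are assumptions introduced here.

```python
import numpy as np

def ols_fit(y, x):
    """Stand-in estimator (pooled OLS); NOT the paper's estimator."""
    X = x.reshape(-1, x.shape[-1])
    theta, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return theta, (X @ theta).reshape(y.shape)

def wild_bootstrap(y, x, fit, n_boot=299, seed=0):
    """Steps 1-4: refit on y* = y_hat + resid * e with e ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    theta_hat, y_fit = fit(y, x)                 # step 1
    resid = y_fit - y                            # step 2, sign as in the text
    draws = np.empty((n_boot, theta_hat.size))
    for b in range(n_boot):                      # step 3
        e = rng.standard_normal(y.shape)
        theta_star, _ = fit(y_fit + resid * e, x)
        draws[b] = theta_star
    theta_bc = 2 * theta_hat - draws.mean(axis=0)  # step 4: bias correction
    return theta_hat, theta_bc, draws

def classify_rts(ci_lower, ci_upper):
    """IRS if the interval lies above 1, DRS if below 1, CRS if it contains 1."""
    lo = np.asarray(ci_lower, dtype=float)
    hi = np.asarray(ci_upper, dtype=float)
    return np.where(lo > 1.0, "IRS", np.where(hi < 1.0, "DRS", "CRS"))
```

Because the multipliers are symmetric around zero, the sign convention of the residual in step 2 does not affect the distribution of the bootstrap samples. Interval bounds for `classify_rts` could be taken, for example, as empirical quantiles of bank-level RTS values recomputed on each bootstrap draw, although the exact interval construction is not spelled out in the text.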


Table 6
Returns to scale at individual bank level.

| Year | IRS | CRS | DRS |
|---|---|---|---|
| Average | 97.85% | 1.33% | 0.82% |
| 1986 | 99.35% | 0.43% | 0.22% |
| 1987 | 99.14% | 0.43% | 0.43% |
| 1988 | 99.14% | 0.43% | 0.43% |
| 1989 | 99.14% | 0.64% | 0.22% |
| 1990 | 98.93% | 0.86% | 0.21% |
| 1991 | 99.36% | 0.42% | 0.22% |
| 1992 | 99.14% | 0.43% | 0.43% |
| 1993 | 98.71% | 0.64% | 0.65% |
| 1994 | 98.71% | 0.64% | 0.65% |
| 1995 | 98.93% | 0.43% | 0.64% |
| 1996 | 98.50% | 0.64% | 0.86% |
| 1997 | 98.07% | 1.29% | 0.64% |
| 1998 | 97.64% | 1.29% | 1.07% |
| 1999 | 97.42% | 1.72% | 0.86% |
| 2000 | 96.78% | 2.58% | 0.64% |
| 2001 | 96.57% | 2.58% | 0.85% |
| 2002 | 96.35% | 2.58% | 1.07% |
| 2003 | 95.06% | 3.22% | 1.72% |
| 2004 | 94.64% | 3.22% | 2.14% |
| 2005 | 95.49% | 2.15% | 2.36% |

First, on average the vast majority (97.85%) of the banks face increasing returns to scale, a tiny percentage (1.33%) face constant returns to scale, and an even tinier percentage (0.82%) face decreasing returns to scale. Second, the percentage of banks facing increasing returns to scale shows a general downward trend, while the percentages of banks facing constant and decreasing returns to scale show a general upward trend. Specifically, the percentage of banks facing increasing returns to scale decreases markedly from 99.35% in 1986 to 95.49% in 2005; the percentage of banks facing constant returns to scale increases noticeably from 0.43% in 1986 to 2.15% in 2005; and the percentage of banks facing decreasing returns to scale increases steadily from 0.22% in 1986 to 2.36% in 2005. The results presented here are consistent with our previous discussion, that is, due to their rapid growth, more and more banks have reached or passed their optimal scales, leaving more and more banks operating under constant or decreasing returns to scale.

7. Conclusion

This paper extends Bai (2009) to a semi-parametric single-index setting. The new model is flexible due to the use of a link function; at the same time, it allows for time-varying heterogeneity because of the presence of the interactive fixed effects. The corresponding asymptotic theories are established. Finally, in the empirical study we show how our model and methodology can be used by analysing the returns to scale of large commercial banks in the U.S. over the period 1986–2005. Specifically, we estimate a single-index cost function with interactive fixed effects. We then compare this cost function with two other parametric cost functions: (1) a fully-parametric translog cost function with fixed effects, and (2) a fully-parametric translog cost function with interactive fixed effects. Our results show that the single-index cost function outperforms the two parametric ones.
Our empirical results from the single-index cost function with interactive fixed effects show that the majority of U.S. large banks face increasing returns to scale, and that the percentage of U.S. large banks that face increasing returns to scale has declined over time. Appendix A. Supplementary data Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jeconom.2019.05.018. References Bai, J., 2009. Panel data models with interactive fixed effects. Econometrica 77 (4), 1229–1279. Bai, J., Kao, C., Ng, S., 2009. Panel cointegration with global stochastic trends. J. Econometrics 149 (1), 82–99. Bai, J., Liao, Y., 2017. Inferences in panel data with interactive effects using large covariance matrices. J. Econometrics 200 (1), 59–78. Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70 (1), 191–221. Belloni, A., Chernozhukov, V., Chetverikov, D., Kato, K., 2015. Some new asymptotic theory for least squares series: Pointwise and uniform results. J. Econometrics 186 (2), 345–366. Berger, A.N., Demsetz, R.S., Strahan, P.E., 1999. The consolidation of the financial services industry: Causes, consequences, and implications for the future. J. Bank. Financ. 23 (2–4), 135–194. Boneva, L., Linton, O., 2017. A discrete choice model for large heterogeneous panels with interactive fixed effects with an application to the determinants of corporate bond issuance. Bank of England Staff Working Paper No. 640. Chang, J., Guo, B., Yao, Q., 2015. High dimensional stochastic regression with latent factors, endogeneity and nonlinearity. J. Econometrics 189 (2), 297–312. Chen, X., Christensen, T.M., 2015. Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions. J. Econometrics 188 (2), 447–465. Chen, J., Gao, J., Li, D., 2012a. A new diagnostic test for cross–section uncorrelatedness in nonparametric panel data models. 
Econometric Theory 28 (5), 1144–1163. Chen, J., Gao, J., Li, D., 2012b. Semiparametric trending panel data models with cross-sectional dependence. J. Econometrics 171 (1), 71–85. Chen, X., Linton, O., Keilegom, I.V., 2003. Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71 (5), 1591–1608. Dong, C., Gao, J., 2018. Specification testing driven by orthogonal series for nonlinear cointegration with endogeneity. Econometric Theory 34 (4), 754–789.

Dong, C., Gao, J., Tjøstheim, D., 2016. Estimation for single-index and partially linear single-index integrated models. Ann. Statist. 44 (1), 425–453.
Dong, C., Linton, O., 2018. Additive nonparametric models with time variable and both stationary and nonstationary regressors. J. Econometrics 207 (1), 212–236.
Fan, J., Liao, Y., Wang, W., 2016. Projected principal component analysis in factor models. Ann. Statist. 44 (1), 219–254.
Feng, G., Peng, B., Zhang, X., 2017. Productivity and efficiency at bank holding companies in the U.S.: A time-varying heterogeneity approach. J. Product. Anal. 48 (2–3), 179–192.
Floro, D., van Roye, B., 2017. Threshold effects of financial stress on monetary policy rules: A panel data analysis. European Central Bank Working Paper Series No. 2042, April 2017.
Gao, J., Tong, H., Wolff, R., 2002. Model specification tests in nonparametric stochastic regression models. J. Multivariate Anal. 83 (2), 324–359.
Hall, P., Li, Q., Racine, J.S., 2007. Nonparametric estimation of regression functions in the presence of irrelevant regressors. Rev. Econ. Stat. 89 (4), 784–789.
Houston, J.F., Stiroh, K.J., 2006. Three Decades of Financial Sector Risk. Federal Reserve Bank of New York Staff Report No. 248.
Ichimura, H., 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics 58 (1–2), 71–120.
Jiang, B., Yang, Y., Gao, J., Hsiao, C., 2017. Recursive estimation in large panel data models: Theory and practice. https://ssrn.com/abstract=2915749.
Jones, K.D., Critchfield, T., 2005. Consolidation in the U.S. banking industry: Is the long, strange trip about to end? FDIC Bank. Rev. 17 (4), 31–61.
Kapetanios, G., Pesaran, M.H., Yamagata, T., 2011. Panels with non-stationary multifactor error structures. J. Econometrics 160 (2), 326–348.
Kim, M.S., Sun, Y., 2013. Heteroskedasticity and spatiotemporal dependence robust inference for linear panel models with fixed effects. J. Econometrics 177 (1), 85–108.
Li, D., Qian, J., Su, L., 2016. Panel data models with interactive fixed effects and multiple structural breaks. J. Amer. Statist. Assoc. 111 (516), 1804–1819.
Newey, W.K., 1990. Efficient estimation of semiparametric models via moment restrictions. Working Paper, Department of Economics, MIT.
Newey, W.K., 1994. The asymptotic variance of semiparametric estimators. Econometrica 62 (6), 1349–1382.
Newey, W.K., 1997. Convergence rates and asymptotic normality for series estimators. J. Econometrics 79 (1), 147–168.
Newey, W.K., Powell, J.L., 2003. Instrumental variable estimation of nonparametric models. Econometrica 71 (5), 1565–1578.
Parmeter, C.F., Sun, K., Henderson, D.J., Kumbhakar, S.C., 2014. Estimation and inference under economic restrictions. J. Product. Anal. 41 (1), 111–129.
Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74 (4), 967–1012.
Sealey, C.W., Lindley, J.T., 1977. Inputs, outputs, and a theory of production and cost at depository financial institutions. J. Finance 32 (4), 1251–1266.
Serletis, A., Feng, G., 2015. Imposing theoretical regularity on flexible functional forms. Econometric Rev. 34 (1–2), 198–227.
Stiroh, K., 2000. How did bank holding companies prosper in the 1990s? J. Bank. Financ. 24 (11), 1703–1745.
Su, L., Chen, Q., 2013. Testing homogeneity in panel data models with interactive fixed effects. Econometric Theory 29 (6), 1079–1135.
Su, L., Jin, S., 2012. Sieve estimation of panel data models with cross section dependence. J. Econometrics 169 (1), 34–47.
Su, L., Jin, S., Zhang, Y., 2015. Specification test for panel data models with interactive fixed effects. J. Econometrics 186 (1), 222–244.
Su, L., Ullah, A., 2016. Nonparametric and semiparametric panel econometric models: Estimation and testing. In: Ullah, A., Giles, D. (Eds.), Handbook of Empirical Economics and Finance. CRC Press.
Wheelock, D.C., Wilson, P.W., 2012. Do large banks have lower costs? New estimates of returns to scale for U.S. banks. J. Money Credit Bank. 44 (1), 171–199.