Journal of Econometrics 189 (2015) 148–162
Smooth coefficient estimation of a seemingly unrelated regression

Daniel J. Henderson a,∗, Subal C. Kumbhakar b,c, Qi Li d,e, Christopher F. Parmeter f

a Department of Economics, Finance and Legal Studies, University of Alabama, Tuscaloosa, AL 35487-0224, United States
b Department of Economics, State University of New York at Binghamton, United States
c University of Stavanger Business School, Stavanger, Norway
d Department of Economics, Texas A&M University, United States
e ISEM, Capital University of Economics & Business, China
f Department of Economics, University of Miami, United States
Article info
Article history: Received 5 June 2012; received in revised form 12 March 2015; accepted 12 July 2015; available online 29 July 2015.
JEL classification: C14; C39
Keywords: Semiparametric smooth coefficient model; System estimation; Bandwidth selection; Banking

Abstract
This paper proposes estimation and inference for the semiparametric smooth coefficient seemingly unrelated regression model. We discuss the imposition of cross-equation restrictions which are required by economic theory as well as methods for data driven bandwidth selection. A test of correct functional form for the entire system of equations is also constructed. Asymptotic and finite sample results are given. We illustrate our estimator by applying it to a cost system for US commercial banks. Our results show that most of the banks are operating under increasing returns to scale, but that returns to scale decrease with bank size. © 2015 Elsevier B.V. All rights reserved.
1. Introduction

Nonparametric methods are now quite popular among statisticians, econometricians and applied economists. However, a well known criticism of nonparametric models is the ‘curse of dimensionality’. In applied settings this is likely to be troubling, as researchers typically have access to a potentially large number of explanatory variables. While one could employ dimension reduction methods such as projection pursuit (Huber, 1985) or engage in significance testing/automatic variable removal (Lavergne and Vuong, 2000; Hall et al., 2007), a common alternative is to use semiparametric methods. While not as flexible as their nonparametric counterparts, semiparametric methods can lessen the curse of dimensionality without sacrificing too much flexibility for the problem at hand. Additionally, in some settings semiparametric methods allow easier implementation of an estimator that satisfies certain features of the given problem, for example the imposition of constraints.
∗ Corresponding author. Tel.: +1 205 348 8991; fax: +1 205 348 0186. E-mail address: [email protected] (D.J. Henderson).
http://dx.doi.org/10.1016/j.jeconom.2015.07.002 0304-4076/© 2015 Elsevier B.V. All rights reserved.
Here we use the semiparametric smooth coefficient model (SPSCM) to illustrate this point. The SPSCM has its origins in econometrics, dating back to the seminal work of Robinson (1989).¹ It has recently seen renewed interest, most likely because it is easily adapted to a variety of econometric settings. Das (2005) and Cai et al. (2006) proposed using this estimator in an instrumental variable setting, while Cai and Li (2008) proposed using the SPSCM to estimate a dynamic panel regression model. In an applied setting, Mamuneas et al. (2006) used the SPSCM to study the relationship between development and human capital.

¹ These methods were made popular in statistics when they were explored by Cleveland et al. (1991) and Hastie and Tibshirani (1993), where they are commonly referred to as varying coefficient models.

In this paper we develop a SPSCM for estimating a seemingly unrelated regression (SUR) model. There are several reasons why we choose to use the SPSCM for a SUR model. First, as mentioned previously, semiparametric methods lessen the curse of dimensionality, which is important in applied settings. Second, in typical applications of SUR models, cross-equation restrictions² are required and can be imposed via matrix arrangement when we use the SPSCM. Lastly, the SPSCM has a ‘conditionally’ parametric structure, which makes interpretation of results straightforward.

Our discussion of semiparametric estimation in a system adds to the burgeoning literature in this field, which has seen recent interest (Welsh and Yee, 2006; Matzkin, 2008; Jun and Pinkse, 2009). Our semiparametric estimator is straightforward to implement and should prove valuable for modeling systems of equations when the functional form of the unknown responses is not immediately derived from economic theory and cross-equation coefficient restrictions must be imposed.

In microeconomic theory, cross-equation coefficient restrictions are quite common. For example, in estimating consumer demand functions based on utility maximization with budget constraints (or minimizing cost for a given level of utility), the demand functions share the parameters in the utility function. Similarly, in production theory, conditional input demand functions for cost minimizing firms share the parameters of the production (cost) function. The (unconditional) input demand and output supply functions for profit maximizing firms depend on the production (profit) function parameters. We show how to incorporate these restrictions in a SUR system and that accounting for these cross-equation coefficient restrictions improves the asymptotic efficiency of the semiparametric smooth coefficient estimator.

In a similar vein, Orbe et al. (2003) develop a semiparametric time-varying smooth coefficient estimator for a system of equations (see also Orbe et al., 2005). In much the same spirit that Li et al. (2002) generalize the work of Robinson (1989), our work here generalizes that of Orbe et al. (2003), who developed an estimator to impose various restrictions on (potentially) time-varying coefficients.
No theoretical properties of the proposed estimator were provided although the method was demonstrated to work well in both practical and simulated settings. Moreover, the focus of Orbe et al. (2003) was on allowing for seasonality and trending for all coefficients in a system of equations, coupled with restrictions on the coefficients. Their estimator requires solving a recursion formula. In our setting, a closed form solution exists that does not become more difficult as the sample size grows. Further, we provide the asymptotic properties of our estimator as well as for our test of correct parametric specification of the SUR model. In addition, we propose a method for data driven bandwidth selection. Alternatively, models with varying coefficients stemming from a system of equations which depend on unobserved heterogeneity have recently been proposed. Jun (2009) presents a triangular model with varying coefficients that depend upon unobserved heterogeneity as opposed to explanatory variables. Jun’s (2009) model stems from a non-separable triangular system that allows for a wide variety of heterogeneity as well as endogeneity. A further benefit of the semiparametric system estimator within the production paradigm is the inclusion of non-traditional inputs or environmental variables (which will be illustrated in our empirical example). It is common to encounter key variables in an applied production setting which do not fit into a classic input/output analysis, but more than likely impact the production environment of the firm. Our semiparametric model can incorporate these variables directly into the smooth coefficients and, conditional on these variables, we have a consistent notion of the production environment. Another way to view the influence of
2 The cross-equation coefficient restrictions we consider here are required by economic theory and are not debatable. However, the theory is more general and could be used to impose other restrictions of economic interest (e.g., constant and/or unitary returns to scale, separability, monotonicity, etc.).
these variables is that they change the production landscape in a manner that makes the model a standard parametric production model, but holding the levels of these variables fixed.

We apply our cross-sectional cost system method to US commercial banks in 2010. Since bank size is an important factor in the production environment, we use it as an argument for the smooth coefficients. We consider a single equation cost function with a SPSCM as well as a cost system with a SPSCM. Our results suggest that the impact of the production environment, as measured by bank size, depends on whether we use the cost share equations or focus exclusively on the single equation cost function. We find increasing returns for most banks, but our results show that returns-to-scale diminish with bank size. When we use the single equation cost function, the increasing returns hold for even very large banks, whereas for the system estimator, we cannot reject constant returns for the largest banks. This finding is potentially important as increasing returns are often used to justify bank mergers and in policy debates on regulations limiting the size of banks (especially after the recent financial crisis).

The remainder of the paper is organized as follows. Section 2 presents the SPSCM estimator for a SUR system and establishes the large sample theory. Section 3 provides a test of correct functional form for the entire SUR system. Section 4 provides finite sample results from a small Monte Carlo setup. Results from our empirical example are given in Section 5. Finally, Section 6 presents some concluding remarks and directions for future research.

2. Semiparametric smooth coefficient systems of equations

The general setup of a varying coefficient regression takes the form

$$y_i = x_i^T \beta(z_i) + u_i, \qquad i = 1, \ldots, n, \tag{1}$$
where $y_i$ is the response variable of unit $i$, $x_i$ is an $l \times 1$ vector of regressors, the superscript $T$ denotes transpose, $z_i$ is a vector of environmental variables of dimension $q$ and $u_i$ is an additive idiosyncratic error. One can envision the setup in (1) stemming from the translog cost function presented in (11) via a set of environmental variables that characterize the operating environment of the firms. For example, Feng and Serletis (2009) allow the parameters of their translog cost function to vary depending on the size category that each bank falls within. Asaftei and Parmeter (2010) note that the smooth coefficient model can be thought of as linear in parameters for a fixed value of $z$. Li et al. (2002) discuss standard local-constant estimation of this model in the multivariate setting, prove its consistency and provide a test of functional form, while Lee and Ullah (2001) study the local-linear version of this estimator. Other theoretical contributions include Cai et al. (2000a,b) who propose a one-step local maximum likelihood estimator for generalized linear models with varying coefficients, Cai et al. (2000a,b) who study the time-series properties of the varying coefficient model and show how many practical time series models can have smoothly varying coefficients, and Cai (2007) and Cai et al. (2009) who discuss the asymptotic properties of the local linear smooth coefficient model in the presence of non-stationarity. Further, Fan and Huang (2005) detail inference via profile likelihood estimation of varying coefficient models and show that a profile likelihood ratio test provides power gains over existing tests involving varying coefficients, while Li and Racine (2010) study the theoretical and practical properties of the varying coefficient estimator in the mixed discrete–continuous data environment.
As should be evident, the single equation varying coefficient regression estimator is well studied and has been shown to have suitable asymptotic properties across a range of models and assumptions.
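To make the single equation setup in (1) concrete, the following is a minimal sketch of a local-constant smooth coefficient fit at one evaluation point, in the spirit of the estimator discussed by Li et al. (2002). It is an illustration under stated assumptions, not the authors' code: the function names, the Gaussian kernel, the scalar $z$, the fixed bandwidth and the simulated data are all hypothetical choices.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as a second-order kernel (assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def smooth_coef_fit(y, X, z, z0, h):
    """Local-constant estimate of beta(z0) in y_i = x_i' beta(z_i) + u_i.

    y : (n,) responses; X : (n, l) regressors; z : (n,) scalar environmental
    variable; z0 : evaluation point; h : bandwidth (taken as given here).
    """
    k = gaussian_kernel((z - z0) / h)          # kernel weights K_i(z0)
    XtK = X.T * k                              # (l, n): column i scaled by its weight
    return np.linalg.solve(XtK @ X, XtK @ y)   # [sum x x' K]^{-1} sum x y K

# Hypothetical simulated data; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 2000
z = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.column_stack([1.0 + z, np.sin(2.0 * np.pi * z)])  # beta(z_i), row by row
y = np.sum(X * beta_true, axis=1) + 0.1 * rng.normal(size=n)

beta_hat = smooth_coef_fit(y, X, z, z0=0.25, h=0.05)
```

For a fixed `z0` the fit is just weighted least squares, which is the sense in which the model is linear in parameters for a fixed value of $z$. As the bandwidth grows, the weights flatten and the estimate approaches ordinary least squares.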
2.1. Seemingly unrelated regression of varying coefficients
We consider a semiparametric smooth coefficient version of the SUR model proposed by Zellner (1962) (which we denote as a SPSCM SUR). First, let

$$y_{si} = x_{si}^T \beta_s(z_{si}) + u_{si}, \qquad s = 1, \ldots, m;\ i = 1, \ldots, n, \tag{2}$$
where the subscript $s$ denotes observations from the $s$th equation for $s \in \{1, \ldots, m\}$, $x_{si}$ and $\beta_s(\cdot)$ are both of dimension $l_s \times 1$, $z_{si}$ is of dimension $q$, and $y_{si}$ and $u_{si}$ are scalars. The main interest of the paper is to obtain consistent estimates of the varying coefficient functions at an arbitrary point $z \in \mathbb{R}^q$. For simplicity, we will consider the case that $l_1 = l_2 = \cdots = l_m = l$ and $z_{1i} = z_{2i} = \cdots = z_{mi} = z_i$ (the same $z_i$ variable appears in all $m$ equations). Since there are two subscripts $s$ and $i$, we define $\tilde{y}_i = (y_{1i}, y_{2i}, \ldots, y_{mi})^T$ as an $m \times 1$ vector of the dependent variable for the $i$th individual. Similarly, let $\tilde{x}_i$ be the $m \times (ml)$ matrix given by

$$\tilde{x}_i = \begin{pmatrix} x_{1i}^T & 0 & \cdots & 0 \\ 0 & x_{2i}^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{mi}^T \end{pmatrix}, \tag{3}$$
which is the matrix of explanatory variables for the $i$th individual; $\beta(z_i) = (\beta_1(z_i)^T, \beta_2(z_i)^T, \ldots, \beta_m(z_i)^T)^T$ is the $(ml) \times 1$ vector of the varying coefficient function evaluated at $z_i$; and $\tilde{u}_i = (u_{1i}, u_{2i}, \ldots, u_{mi})^T$ is the $m \times 1$ vector of the error term for individual $i$. Below we first use the above notation to discuss some parametric model estimation; these will help us derive the semiparametric estimators. If $\beta$ is an $(ml) \times 1$ vector of constant parameters, then the ordinary least squares (OLS) estimator of $\beta$ can be obtained as the solution of $b$ in the following minimization problem:

$$\min_b \sum_{i=1}^{n} \tilde{u}_i^T \tilde{u}_i = \min_b \sum_{i=1}^{n} [\tilde{y}_i - \tilde{x}_i b]^T [\tilde{y}_i - \tilde{x}_i b].$$
However, it is well known that the OLS estimator of $\beta$ is less efficient than the generalized least squares (GLS) estimator. Let $\Sigma = \mathrm{Var}(\tilde{u}_i) = \{\sigma_{ts}\}_{t,s=1,\ldots,m}$ be the $m \times m$ variance–covariance matrix of $\tilde{u}_i = (u_{1i}, \ldots, u_{mi})^T$. Then the GLS estimator of $\beta$ can be obtained by solving

$$\min_b \sum_{i=1}^{n} \tilde{u}_i^T \Sigma^{-1} \tilde{u}_i = \min_b \sum_{i=1}^{n} [\tilde{y}_i - \tilde{x}_i b]^T \Sigma^{-1} [\tilde{y}_i - \tilde{x}_i b].$$
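For our semiparametric model, $\beta(z)$ is a function of $z$. We want to estimate $\beta(z)$ for a given point $z$. To achieve this goal, we need to use a kernel weight function $K_i(z) = K_{z_i,z}$ which gives more weight to observations (of the $z_i$'s) that are closer to $z$. Hence, our semiparametric (OLS type) estimator of $\beta(z)$ is the solution of $b$ that minimizes

$$\sum_{i=1}^{n} [\tilde{y}_i - \tilde{x}_i b]^T [\tilde{y}_i - \tilde{x}_i b] K_i(z). \tag{4}$$

Similarly, the semiparametric GLS type estimator of $\beta(z)$ is the vector $b$ that minimizes

$$\sum_{i=1}^{n} [\tilde{y}_i - \tilde{x}_i b]^T \Sigma^{-1} [\tilde{y}_i - \tilde{x}_i b] K_i(z). \tag{5}$$

If we use $\tilde{\beta}(z)$ and $\hat{\beta}(z)$ to denote the solutions of $b$ to (4) and (5), respectively, then it is easy to show that

$$\tilde{\beta}(z) = \left[ \sum_{i=1}^{n} \tilde{x}_i^T \tilde{x}_i K_i(z) \right]^{-1} \sum_{i=1}^{n} \tilde{x}_i^T \tilde{y}_i K_i(z), \tag{6}$$

and

$$\hat{\beta}(z) = \left[ \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{x}_i K_i(z) \right]^{-1} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{y}_i K_i(z). \tag{7}$$

Let $\tilde{Y} = (\tilde{y}_1^T, \tilde{y}_2^T, \ldots, \tilde{y}_n^T)^T$ be the $(mn) \times 1$ vector of the dependent variable, and $\tilde{X} = (\tilde{x}_1^T, \tilde{x}_2^T, \ldots, \tilde{x}_n^T)^T$ be the $(mn) \times (ml)$ matrix of the explanatory variables, $\tilde{\Gamma} = I_n \otimes \Sigma$, and $\tilde{K}(z) = K(z) \otimes I_m$, where $K(z) = \mathrm{Diag}(K_1(z), K_2(z), \ldots, K_n(z))$ is an $n \times n$ diagonal kernel weight matrix. With this notation, it is straightforward to show that $\hat{\beta}(z)$ defined in (7) can be written as

$$\hat{\beta}(z) = \left[ \tilde{X}^T \tilde{K}(z)^{1/2} \tilde{\Gamma}^{-1} \tilde{K}(z)^{1/2} \tilde{X} \right]^{-1} \tilde{X}^T \tilde{K}(z)^{1/2} \tilde{\Gamma}^{-1} \tilde{K}(z)^{1/2} \tilde{Y}. \tag{8}$$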
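The equivalence between the summation form (7) and the stacked form (8) can be checked numerically. The sketch below is illustrative only: it treats $\Sigma$ as known, uses arbitrary positive numbers in place of kernel weights, and the function names and array layouts are assumptions introduced here rather than anything specified in the paper.

```python
import numpy as np

def sur_smooth_coef(Y, X, k, Sigma):
    """GLS-type estimate of beta(z) following the summation form in eq. (7).

    Y : (n, m) array, row i holds y~_i'.
    X : (n, m, l) array, X[i, s] holds x_si'.
    k : (n,) kernel weights K_i(z) at the evaluation point.
    Sigma : (m, m) error covariance (treated as known in this sketch).
    """
    n, m, l = X.shape
    Sinv = np.linalg.inv(Sigma)
    A = np.zeros((m * l, m * l))
    b = np.zeros(m * l)
    for i in range(n):
        xt = np.zeros((m, m * l))                  # block-diagonal x~_i of eq. (3)
        for s in range(m):
            xt[s, s * l:(s + 1) * l] = X[i, s]
        A += k[i] * (xt.T @ Sinv @ xt)
        b += k[i] * (xt.T @ Sinv @ Y[i])
    return np.linalg.solve(A, b)

def sur_smooth_coef_stacked(Y, X, k, Sigma):
    """Same estimator via the stacked matrices of eq. (8), individuals stacked first."""
    n, m, l = X.shape
    Xt = np.zeros((n * m, m * l))
    for i in range(n):
        for s in range(m):
            Xt[i * m + s, s * l:(s + 1) * l] = X[i, s]
    Khalf = np.kron(np.diag(np.sqrt(k)), np.eye(m))     # K~(z)^{1/2} = K(z)^{1/2} (x) I_m
    Ginv = np.kron(np.eye(n), np.linalg.inv(Sigma))     # Gamma~^{-1} = I_n (x) Sigma^{-1}
    W = Khalf @ Ginv @ Khalf
    return np.linalg.solve(Xt.T @ W @ Xt, Xt.T @ W @ Y.reshape(-1))

# Hypothetical inputs used only to compare the two forms.
rng = np.random.default_rng(1)
n, m, l = 40, 2, 2
X = rng.normal(size=(n, m, l))
Y = rng.normal(size=(n, m))
k = rng.uniform(0.1, 1.0, size=n)                 # stand-in for kernel weights
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])

b7 = sur_smooth_coef(Y, X, k, Sigma)
b8 = sur_smooth_coef_stacked(Y, X, k, Sigma)
```

The summation form avoids building the $(mn) \times (mn)$ weight matrix, so it is likely the more practical of the two for large $n$; the stacked form is included only to mirror (8).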
The above method (notation) of stacking the data, by grouping individual $i$'s data first ($m$ of them) and then stacking individuals one by one to get the full data, is convenient for deriving the semiparametric estimator of $\beta(z)$ defined by (7) or equivalently by (8), but it is not the commonly used way of stacking data. The more conventional way is to put all the data for the first equation first ($n$ observations for $s = 1$), followed by all the data for the second equation, and so on. Define an $(mn) \times 1$ vector of dependent variables $Y = (y_1^T, y_2^T, \ldots, y_m^T)^T$, where $y_s = (y_{s1}, \ldots, y_{sn})^T$ is the $n \times 1$ vector of the dependent variable for the $s$th equation, $s = 1, 2, \ldots, m$. Similarly, let $X_s$ be the $n \times l$ matrix of explanatory variables from the $s$th equation ($s = 1, \ldots, m$), and define the $(mn) \times (ml)$ explanatory variable matrix

$$\mathbf{X} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{pmatrix},$$
$\mathcal{K}(z) = I_m \otimes K(z)$ and $\Gamma = \Sigma \otimes I_n$. Then it can be shown that $\hat{\beta}(z)$ defined in (7) (or (8)) can also be written as

$$\hat{\beta}(z) = \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} Y, \tag{9}$$

where $\Gamma^{-1} = \Sigma^{-1} \otimes I_n$. What separates this model from Zellner (1962) is the vector of regression coefficients. The $l \times 1$ parameter vector $\beta_s(z)$ is a function of $z$. In this sense the $s$th equation is an example of the SPSCM of Li et al. (2002). We note that we can potentially vary the elements of $z$ across $s$ for the semiparametric Zellner (1962) case, but we keep it fixed here for simplicity.³

We estimate the varying coefficient functions by the nonparametric kernel method. We allow for $z$ to contain both discrete and continuous components. Let $z = (z^c, z^d)$, where $z^c = (z_1^c, \ldots, z_{q_1}^c)$ and $z^d = (z_1^d, \ldots, z_{q_2}^d)$ are the continuous and discrete components of $z$, respectively ($q_1 \geq 1$, $q_2 \geq 0$ with $q_1 + q_2 = q$). Define the product kernel functions

$$W_i(z^c) = \prod_{j=1}^{q_1} w\!\left( \frac{z_{ij}^c - z_j^c}{h} \right) \quad \text{and} \quad L_i(z^d) = \prod_{j=1}^{q_2} l(z_{ij}^d, z_j^d, \lambda),$$

where $w(\cdot)$ and $l(\cdot)$ are univariate kernel functions, and $h$ and $\lambda$ are smoothing parameters associated with $z^c$ and $z^d$, respectively. For example, one can use the Gaussian kernel for $w(\cdot)$,

$$l(z_{ij}^d, z_j^d, \lambda) = 1(z_{ij}^d = z_j^d) + \lambda\, 1(z_{ij}^d \neq z_j^d)$$

if $z_j^d$ is an unordered discrete variable (see Racine and Li (2004)), and

$$l(z_{ij}^d, z_j^d, \lambda) = 1(z_{ij}^d = z_j^d) + \lambda^{|z_{ij}^d - z_j^d|}\, 1(z_{ij}^d \neq z_j^d)$$

if $z_j^d$ is an ordered discrete variable, where $1(A) = 1$ if $A$ holds true, and zero
3 There is no loss of generality in this approach as we could always redefine z = (z1 , z2 , . . . , zm ) where zs is the set of z variables in equation s. However, knowing which zs enter which equation will mitigate the impact of the curse of dimensionality. We thank a referee for drawing this generalization to our attention.
otherwise (see eq. (10) in Ouyang et al. (2009)). The n × n diagonal kernel matrix defined earlier: K (z ) = diag (K1 (z ), . . . , Kn (z )) has its diagonal element given by Ki (z ) = Wi (z c )Li (z d ). Our estimation method allows for both discrete and continuous covariates z (Li and Racine, 2010). Thus, we generalize Li et al. (2002) to both the SUR framework and to the mixed discrete and continuous nonparametric covariates case. Here we note that alternative versions of (9) exist (Lin and Carroll, 2000; Henderson and Ullah, 2005), but Welsh and Yee (2006) show that this version gives consistent estimates in a fully nonparametric SUR model. Further extensions (likely with the same benefits and difficulties) can be made by considering a local-linear version of (9) following the method discussed in Lee and Ullah (2001). A local linear estimator requires more complicated notation and asymptotic analysis. Therefore, we do not pursue a local linear estimator in this paper. The estimator defined in (9), however, is generally not operational because Γ is typically unknown. A consistent estimate of Γ can be obtained using a consistent estimate of u which can be obtained by using the consistent system estimator that ignores the information in the variance–covariance matrix. In other words, this estimator is formed by setting Γ = Imn . In this case our SPSCM SUR looks nearly identical to the original, single equation SPSCM estimator,
$$\tilde{\beta}(z) = \left[ \mathbf{X}^T \mathcal{K}(z) \mathbf{X} \right]^{-1} \mathbf{X}^T \mathcal{K}(z) Y. \tag{10}$$
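A minimal sketch of the kernel weights $K_i(z) = W_i(z^c) L_i(z^d)$ and of the first-step estimator in (10) follows, together with a residual-based covariance estimate of the kind used to make the GLS-type estimator operational. Several choices are assumptions made for illustration: a Gaussian $w(\cdot)$, the unordered discrete kernel, a single scalar $h$ and $\lambda$, the function names, and the simulated data. Because $\mathbf{X}$ is block diagonal and $\mathcal{K}(z) = I_m \otimes K(z)$, setting $\Gamma = I_{mn}$ decouples (10) into $m$ separate weighted least squares fits, which is how it is computed here.

```python
import numpy as np

def mixed_kernel_weights(zc, zd, zc0, zd0, h, lam):
    """K_i(z) = W_i(z^c) L_i(z^d), Gaussian w and unordered discrete l (assumed).

    zc : (n, q1) continuous part; zd : (n, q2) unordered discrete part;
    zc0, zd0 : evaluation point; h > 0 and 0 <= lam <= 1 are scalars.
    """
    u = (zc - zc0) / h
    W = np.prod(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi), axis=1)
    L = np.prod(np.where(zd == zd0, 1.0, lam), axis=1)
    return W * L

def first_step(Y, X, k):
    """Eq. (10): with Gamma = I the system splits into m weighted LS fits."""
    n, m, l = X.shape
    out = np.empty((m, l))
    for s in range(m):
        XtK = X[:, s, :].T * k
        out[s] = np.linalg.solve(XtK @ X[:, s, :], XtK @ Y[:, s])
    return out

def sigma_hat(Y, X, zc, zd, h, lam):
    """Residual covariance n^{-1} sum u^_i u^_i', residuals evaluated at each z_i."""
    n, m, l = X.shape
    U = np.empty((n, m))
    for i in range(n):
        k = mixed_kernel_weights(zc, zd, zc[i], zd[i], h, lam)
        B = first_step(Y, X, k)                    # first-step estimate at z_i, (m, l)
        U[i] = Y[i] - np.sum(X[i] * B, axis=1)     # y_si - x_si' beta_s(z_i)
    return U.T @ U / n

# Hypothetical simulated system; all numbers are illustrative assumptions.
rng = np.random.default_rng(2)
n, m, l = 150, 2, 2
zc = rng.uniform(size=(n, 1))
zd = rng.integers(0, 2, size=(n, 1))
X = rng.normal(size=(n, m, l))
beta = np.stack([np.column_stack([1.0 + zc[:, 0], zc[:, 0] ** 2]),
                 np.column_stack([zd[:, 0] + 0.0, np.cos(zc[:, 0])])], axis=1)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
U_true = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
Y = np.sum(X * beta, axis=2) + U_true

S_hat = sigma_hat(Y, X, zc, zd, h=0.3, lam=0.2)
```

With `lam=0` the discrete kernel keeps only observations in the same cell as the evaluation point (a frequency-type estimator); with `lam=1` the discrete component is smoothed out entirely.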
With this estimator in hand, we can obtain the $m \times 1$ vector of residuals as $\hat{u}_i = \tilde{y}_i - \tilde{x}_i \tilde{\beta}(z_i)$; the estimate of the $m \times m$ matrix $\Sigma$ is calculated as $\hat{\Sigma} = n^{-1} \sum_{i=1}^{n} \hat{u}_i \hat{u}_i^T$. Hence, the feasible estimator of $\beta(z)$ can be obtained from $\hat{\beta}(z)$ by replacing $\Sigma^{-1}$ by $\hat{\Sigma}^{-1}$. Given that $\Sigma$ is a finite dimensional matrix and that $\hat{\Sigma} - \Sigma = o_p(1)$ (hence, $\hat{\Sigma}^{-1} - \Sigma^{-1} = o_p(1)$), the feasible estimator of $\beta(z)$ has the same asymptotic behavior as $\hat{\beta}(z)$ (that uses $\Sigma^{-1}$ in its definition). Therefore, for notational simplicity we will only consider $\hat{\beta}(z)$ in this paper.

The proposed semiparametric smooth coefficient generalization of Zellner's (1962) SUR model offers several advantages over a fully parametric SUR model. From an economic perspective, if there are operating or environmental variables that characterize the underlying technology, then their omission will lead to biased and inconsistent estimates. Further, these variables' impact on firm technology is in general poorly understood given that they do not act as traditional inputs; thus, the exact manner in which they enter the model is debatable and a semiparametric approach has obvious appeal. How these additional variables are selected is case specific and will need to be tailored for a given application. However, our theoretical results require that the $z$ variables are exogenous, thus assisting in the types of operating environments and managerial effects that can be measured and included. Econometrically, the inclusion of these additional variables poses little additional cost (e.g., computing time) as the model still retains the parametric structure of a SUR.⁴

2.2. Cross-equation coefficient restrictions

The SPSCM SUR estimator we have discussed so far may not be directly portable to the applied microeconomics setting where cross-equation restrictions need to hold. For example, in the theory of the firm, the focus is on estimation of the technology, which
4 In the case where a subset of the coefficients do not vary with z it would be possible to construct a partially linear extension of our model. We leave this for future research.
is often specified in terms of the dual cost/profit/revenue function. A common feature of these functions is that they satisfy derivative properties (Shephard’s/Hotelling’s lemma). These derivative properties require functional restrictions that are to be satisfied by the underlying technology. In a parametric model, these functional restrictions require cross-equational restrictions on the parameters. As a concrete example, if firms minimize cost, the underlying production technology is specified in terms of a dual cost function, viz., C = C (v, o) where o is a vector of L outputs, v a vector of J input prices and X the corresponding vector of J inputs. This cost function satisfies the following derivative property (Shephard’s lemma)
$$\frac{\partial C}{\partial v_j} = X_j, \qquad j = 1, 2, \ldots, J.$$
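If we add the firm subscript $i$ and use a translog cost function,

$$\ln C_i = \beta_0 + \sum_{j=1}^{J} \beta_{v_j} \ln v_{ji} + \sum_{t=1}^{L} \beta_{o_t} \ln o_{ti} + \frac{1}{2} \sum_{t=1}^{L} \sum_{j=1}^{L} \delta_{o_t} \ln o_{ti} \ln o_{ji} + \frac{1}{2} \sum_{j=1}^{J} \sum_{k=1}^{J} \gamma_{jk} \ln v_{ji} \ln v_{ki} + \sum_{j=1}^{J} \sum_{t=1}^{L} \kappa_{jt} \ln v_{ji} \ln o_{ti}, \tag{11}$$

where $\beta \equiv (\beta_0, \beta_{v_1}, \ldots, \beta_{v_J}, \beta_{o_1}, \ldots, \beta_{o_L})$, $\gamma \equiv (\gamma_{11}, \ldots, \gamma_{JJ})$, $\delta \equiv (\delta_{o_1}, \ldots, \delta_{o_L})$ and $\kappa \equiv (\kappa_{11}, \ldots, \kappa_{JL})$ are parameters to be estimated, then Shephard's lemma delivers the following cost share equations

$$\frac{\partial \ln C_i}{\partial \ln v_{ji}} = \frac{v_{ji} X_{ji}}{C_i} = \beta_{v_j} + \sum_{k=1}^{J} \gamma_{jk} \ln v_{ki} + \sum_{t=1}^{L} \kappa_{jt} \ln o_{ti}, \qquad j = 1, \ldots, J. \tag{12}$$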
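The link between (11) and (12) can be verified numerically: differentiating the translog function with respect to each log input price should reproduce the share equations. The sketch below is a hedged illustration. The parameter values are hypothetical, the output second-order terms use a full symmetric matrix `delta` (an assumption made for the example, and one that does not affect the shares), $\gamma$ is taken to be symmetric, and the usual linear homogeneity restrictions in input prices are imposed so that the shares sum to one.

```python
import numpy as np

def translog_lncost(lnv, lno, b0, bv, bo, gamma, delta, kappa):
    """ln C for a translog cost function with symmetric gamma and delta (assumed)."""
    return (b0 + bv @ lnv + bo @ lno
            + 0.5 * lnv @ gamma @ lnv
            + 0.5 * lno @ delta @ lno
            + lnv @ kappa @ lno)

def cost_shares(lnv, lno, bv, gamma, kappa):
    """Analytic shares: beta_vj + sum_k gamma_jk ln v_k + sum_t kappa_jt ln o_t."""
    return bv + gamma @ lnv + kappa @ lno

def numeric_shares(lnv, lno, params, eps=1e-6):
    """Central-difference derivative of ln C with respect to each ln v_j."""
    out = np.empty_like(lnv)
    for j in range(lnv.size):
        e = np.zeros_like(lnv)
        e[j] = eps
        up = translog_lncost(lnv + e, lno, *params)
        down = translog_lncost(lnv - e, lno, *params)
        out[j] = (up - down) / (2.0 * eps)
    return out

# Hypothetical parameters with J = 3 input prices and L = 2 outputs.
rng = np.random.default_rng(4)
J, L = 3, 2
P = np.eye(J) - np.ones((J, J)) / J               # projects onto vectors summing to zero
G0 = rng.normal(size=(J, J))
gamma = 0.1 * (P @ (G0 + G0.T) @ P)               # symmetric, rows sum to zero
kappa = 0.1 * (P @ rng.normal(size=(J, L)))       # columns sum to zero
D0 = rng.normal(size=(L, L))
delta = 0.1 * (D0 + D0.T)
bv = np.array([0.5, 0.3, 0.2])                    # sums to one
bo = np.array([0.6, 0.4])
b0 = 1.0
lnv = rng.normal(size=J)
lno = rng.normal(size=L)

params = (b0, bv, bo, gamma, delta, kappa)
shares = cost_shares(lnv, lno, bv, gamma, kappa)
shares_fd = numeric_shares(lnv, lno, params)
```

Because the shares add up to one under these restrictions, one share equation is redundant, which is why only $J - 1$ of them enter the estimated system.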
In this setting, the SUR system consists of the cost function in (11) and $J - 1$ of the cost share equations⁵ in (12). Note that none of the parameters in the cost share equations in (12) are new in the sense that they all appear in (11). That is, in using both (11) and (12) the parameters $\beta_{v_j}$, $\gamma_{jk}$ and $\kappa_{jt}$ appear across the system of equations. Christensen and Greene (1976) used these restrictions in estimating a dual cost function in a fully parametric setting, given in (11) and (12). The cost system in (11) and (12) can be written in the form $y_{si} = x_{si}^T \beta_s(z_i) + u_{si}$ subject to a set of restrictions $R\beta(z_i) = r$. Specifically, our goal is to estimate the varying coefficient (vector) function $\beta(z)$ subject to the set of restrictions on the functional coefficients⁶

$$R\beta(z) = r, \tag{13}$$

where $R$ is the standard $(J \times ml)$ design matrix, where $J$ is the total number of coefficient restrictions and $r$ is a $J \times 1$ vector. It
5 We cannot use all J share equations because the shares in (12) sum to unity, the random disturbances corresponding to the share equations sum to zero, thus yielding a singular covariance matrix of errors. Barten (1969) has shown that full information maximum likelihood estimates of the parameters can be obtained by arbitrarily deleting any one cost share equation. Alternatively, this problem can also be avoided by normalizing the cost and input prices by one of the input prices such that only J − 1 share equations are left.
6 As mentioned in the introduction, these restrictions are part and parcel of the models based on duality. Thus the model without these restrictions might be meaningless. However, one can consider other restrictions in the model that follow duality results. It is also possible to think of the model in more general terms (applications beyond duality) in which the unrestricted model might make sense and a key objective is to test the restrictions. In such a case the idea might be to examine efficiency gains from imposing the constraints. Our discussion in Section 4 follows this route.
should be noted that this method is not only useful for ensuring that the cross-equation coefficient restrictions hold, but it also leads to asymptotic and finite sample gains as the number of parameters being estimated can potentially be decreased by $J$ and more observations may potentially be used to estimate a given parameter.

Estimation of (2) subject to the cross-equation restrictions in (13) amounts to solving for $\beta(z)$ in minimizing (5) subject to (13). The Lagrangian for this problem is

$$\mathcal{L} = [Y - \mathbf{X}\beta(z)]^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} [Y - \mathbf{X}\beta(z)] - 2\mu^T [R\beta(z) - r], \tag{14}$$

where $\mu$ is a $J \times 1$ vector of Lagrange multipliers. Taking the first-order conditions of (14) with respect to $\beta(z)$ and $\mu$ and solving for $\beta(z)$ leads to the estimator

$$\hat{\beta}^*(z) = \hat{\beta}(z) - \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \left\{ R \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \right\}^{-1} \left[ R\hat{\beta}(z) - r \right],$$

which is itself a function of $\hat{\beta}(z)$, the unconstrained system estimator. In the standard case where $r$ is a vector of zeros, this estimator simplifies to

$$\hat{\beta}^*(z) = \left\{ I_{ml} - \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \left\{ R \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \right\}^{-1} R \right\} \hat{\beta}(z).$$

To construct the estimate of $\Gamma$, we define the estimator which ignores the information in the variance–covariance matrix (for the case where $r = 0_J$) as

$$\tilde{\beta}^*(z) = \left\{ I_{ml} - \left[ \mathbf{X}^T \mathcal{K}(z) \mathbf{X} \right]^{-1} R^T \left\{ R \left[ \mathbf{X}^T \mathcal{K}(z) \mathbf{X} \right]^{-1} R^T \right\}^{-1} R \right\} \tilde{\beta}(z).$$

The full construction of the variance–covariance matrix follows from the discussion in the previous sub-section.

2.3. Large sample properties

First, we write our estimator in (7) as

$$\hat{\beta}(z) = \beta(z) + [D_n(z)]^{-1} \{ A_n(z) + C_n(z) \},$$

where

$$D_n(z) = \frac{1}{n h^{q_1}} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{x}_i K_{iz},$$

$$A_n(z) = \frac{1}{n h^{q_1}} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{x}_i (\beta(z_i) - \beta(z)) K_{iz},$$

$$C_n(z) = \frac{1}{n h^{q_1}} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{u}_i K_{iz}.$$

In Appendix A we show that $D_n(z) = M(z) + o_p(1)$, where $M(z) = E[\tilde{x}_i^T \Sigma^{-1} \tilde{x}_i \mid z_i = z] f(z)$. The term $A_n(z)$ corresponds to the bias, with $A_n(z) = h^2 A_1(z) + \lambda A_2(z) + o_p(h^2 + \lambda + (n h^{q_1})^{-1/2})$, where $A_1(z)$ and $A_2(z)$ are finite constants (depending on $z$) and are defined in the Appendix. We also show that $C_n(z)$ has zero mean and variance $(n h^{q_1})^{-1} [\nu_0 M(z) + o(1)]$. Moreover, by the Liapunov central limit theorem we have

$$\sqrt{n h^{q_1}}\, C_n(z) \xrightarrow{d} N(0, \nu_0 M(z)). \tag{15}$$

Let

$$\hat{G}(z) = \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \left\{ R \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \right\}^{-1} R,$$

$$J_n(z) = \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \left\{ R \left[ \mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X} \right]^{-1} R^T \right\}^{-1} r;$$

then we can, using the same ideas that we used to decompose $\hat{\beta}(z)$, show that our estimator which imposes the cross-equation coefficient restrictions is

$$\hat{\beta}^*(z) = [I - \hat{G}(z)] \{ \beta(z) + D_n(z)^{-1} [A_n(z) + C_n(z)] \} + J_n(z). \tag{16}$$

We now provide the assumptions under which we will develop the theoretical properties for our semiparametric estimator. These assumptions will also be used for our testing procedure, which is based on an integrated squared difference statistic. We allow for $z$ to contain both continuous and discrete components. We write $z = (z^c, z^d)$, where $z^c$ and $z^d$ are the continuous and discrete components of $z$, respectively.

Assumption 2.1. (i) The data $\{\tilde{x}_i, \tilde{y}_i, z_i\}_{i=1}^{n}$ are independent and identically distributed (i.i.d.) as $(\tilde{x}_1, \tilde{y}_1, z_1)$. $E[\tilde{y}_i \mid \tilde{x}_i, z_i] = \tilde{x}_i \beta(z_i)$ almost everywhere, and $\tilde{u}_i = \tilde{y}_i - \tilde{x}_i \beta(z_i)$ possesses finite fourth moments. (ii) Let $f_z(z)$ denote the marginal density function of $z_i$ and let $f_J(\tilde{x}_i, z_i)$ represent the joint density function of $(\tilde{x}_i, z_i)$. $\beta(z)$ is three times continuously differentiable with respect to $z^c$ at all interior points $z^c \in \mathcal{Z}^c$, where $\mathcal{Z}^c$ is the support of $z_i^c$. $f_z(z)$ and $f_J(x, z)$ are both twice continuously differentiable with respect to $z^c$ for $z^c$ in the interior of $\mathcal{Z}^c$. (iii) $f_J(x, z)$ and $f_z(z)$ are bounded, and $\beta(z_i)$ and $(\tilde{x}_i, z_i)$ possess finite fourth moments.

Assumption 2.2. (i) $W(\cdot)$ is a product kernel $W(v) = \prod_{j=1}^{q_1} w(v_j)$. $w(\cdot)$ is a bounded symmetric (around zero) density function satisfying $\int w(v) v^4 \, dv < \infty$. (ii) As $n \to \infty$, $h \to 0$ and $n h^{q_1} \to \infty$.

Assumption 2.1 places very generic conditions on the data generating process underlying our smooth coefficient system. Further, 2.1(i) implies that $E(u_i \mid x_i, z_i) = 0$ while 2.1(ii) allows for very general forms of unknown conditional heteroscedasticity. Assumption 2.2 places standard conditions on the product kernel used for construction of the semiparametric smooth coefficient estimator. 2.2(i) suggests that we are using standard second-order kernel functions when smoothing. 2.2(ii) places the usual limit behavior on the bandwidths used for smoothing. As the sample size grows the bandwidth(s) need to decrease to eliminate the bias, yet they must decrease slowly enough that the variance component also decreases, the classic bias–variance trade-off. Here we are requiring the optimal decay of the bandwidth to balance squared bias and variance. Summarizing what we have found above, we obtain the following result:

Theorem 2.1. Under Assumptions 2.1 and 2.2, for a fixed point $z = (z^c, z^d) \in \mathcal{Z}^c \times \mathcal{D}$ with $z^c$ an interior point of $\mathcal{Z}^c$, where $\mathcal{D}$ is the support of $z^d$, we have

$$\sqrt{n h^{q_1}} \left\{ \hat{\beta}^*(z) - [I - \hat{G}(z)] \beta(z) - J_n(z) - [I - \hat{G}(z)] D_n(z)^{-1} A_{h,\lambda}(z) \right\} \xrightarrow{d} N(0, \Lambda(z)),$$

where $A_{h,\lambda}(z) = h^2 A_1(z) + \lambda A_2(z)$, $A_1(z)$ and $A_2(z)$ are defined in Appendix A, $\Lambda(z) = \nu_0 [I - G(z)] M(z)^{-1} [I - G^T(z)]$, and $G(z) = M(z)^{-1} R^T [R M(z)^{-1} R^T]^{-1} R$ is the probability limit of $\hat{G}(z)$.
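The restricted estimator of Section 2.2 and its decomposition into $[I - \hat{G}(z)]\hat{\beta}(z) + J_n(z)$ can be sketched in a few lines. In the illustration below, `P` stands for the inverse of the weighted cross-product matrix $\mathbf{X}^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} \mathbf{X}$ at a single evaluation point; it is generated randomly here, and the function names, the restriction matrix and all numbers are hypothetical.

```python
import numpy as np

def restricted_estimator(beta_hat, P, R, r):
    """beta* = beta^ - P R' (R P R')^{-1} (R beta^ - r).

    P is the inverse of the weighted cross-product matrix at the evaluation
    point; beta_hat is the unrestricted estimate at the same point.
    """
    PRt = P @ R.T
    return beta_hat - PRt @ np.linalg.solve(R @ PRt, R @ beta_hat - r)

def projection_pieces(P, R, r):
    """Return G^ = P R' (R P R')^{-1} R and J_n = P R' (R P R')^{-1} r."""
    PRt = P @ R.T
    middle = np.linalg.inv(R @ PRt)
    return PRt @ middle @ R, PRt @ middle @ r

# Hypothetical inputs: four coefficients, one cross-equation equality restriction.
rng = np.random.default_rng(3)
p = 4
A = rng.normal(size=(p, p))
P = np.linalg.inv(A @ A.T + p * np.eye(p))        # symmetric positive definite
beta_hat = rng.normal(size=p)
R = np.array([[1.0, 0.0, -1.0, 0.0]])             # first and third coefficients equal
r = np.array([0.0])

beta_star = restricted_estimator(beta_hat, P, R, r)
G_hat, J_n = projection_pieces(P, R, r)
beta_star_alt = (np.eye(p) - G_hat) @ beta_hat + J_n
```

By construction the restricted estimate satisfies $R\hat{\beta}^*(z) = r$ exactly at every evaluation point, and $\hat{G}(z)$ is idempotent, so applying the restriction a second time changes nothing.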
Note that $\nu_0 M(z)^{-1}$ is the asymptotic variance of (the unrestricted estimator) $\sqrt{nh^{q_1}}\,\hat{\beta}(z)$. The difference between the asymptotic variances of the unrestricted and the restricted estimators is
$$\nu_0 M(z)^{-1} - \Lambda(z) = \nu_0 M(z)^{-1} - \nu_0 [I - G(z)] M(z)^{-1} [I - G^T(z)],$$
where $G(z)$ is an asymmetric and idempotent matrix. By Proposition 2 in Taylor (1976) we have that $\nu_0 M(z)^{-1} - \Lambda(z)$ is positive semi-definite. Thus, as expected, the restricted estimator is more efficient than the unrestricted estimator. This is an important result which extends the insights of Taylor (1976) to restricted estimation within a semiparametric context. Our main result here suggests that if cross-equation restrictions exist then they can be used in estimation to improve efficiency. We again want to reiterate that the restrictions we have in mind in this paper are required by theory, but these restrictions lead to gains nonetheless. As an example, consider a simple case where $l = 1$, $r = 0$ and $R = (1, -1)$, which corresponds to the restriction $\beta_1(z) = \beta_2(z)$. We then have ($x_{si}^T = x_{si}$ since $x_{si}$ is a scalar in this example)
$$y_{si} = x_{si}\beta_s(z_{si}) + u_{si}, \qquad i = 1, \ldots, n, \quad s = 1, 2. \tag{17}$$
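If we assume that $\sigma_{12} = 0$, then $M(z)$ becomes a block diagonal matrix with
$$M(z) = \begin{pmatrix} M_{11}(z) & 0 \\ 0 & M_{22}(z) \end{pmatrix}.$$
To ease the analysis, we further assume that $M_{11}(z) = M_{22}(z) \equiv M_0(z)$; in this case it is easy to see that
$$I - G(z) = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$
Hence, we have
$$\hat{\beta}^*(z) \sim [I - G(z)]\hat{\beta}(z) = \frac{1}{2}\begin{pmatrix} \hat{\beta}_1(z) + \hat{\beta}_2(z) \\ \hat{\beta}_1(z) + \hat{\beta}_2(z) \end{pmatrix},$$
so that
$$\hat{\beta}_1^*(z) = \hat{\beta}_2^*(z) \sim (1/2)[\hat{\beta}_1(z) + \hat{\beta}_2(z)].$$
Hence,
$$\nu_0 [I - G(z)] M(z)^{-1} [I - G(z)] = \frac{\nu_0}{2M_0(z)}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$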
Let $\tilde{\beta}(z)$ denote the estimator based on (17). Under the assumption that $M_{11}(z) = M_{22}(z) \equiv M_0(z)$, the asymptotic variance of the unrestricted estimator is (for $s = 1, 2$)
$$\mathrm{Avar}[\tilde{\beta}_s(z)] \sim \frac{\nu_0}{nh} M_0(z)^{-1},$$
while that for the cross-equation coefficient equality estimator is
$$\mathrm{Avar}[\hat{\beta}_s^*(z)] \sim \frac{\nu_0}{2nh} M_0(z)^{-1}.$$
We observe that the asymptotic variance of the restricted estimator is half that of the unrestricted estimator. This is intuitive given that the restriction is such that we effectively have a sample of size 2n to estimate a single smooth coefficient whereas the unrestricted estimator has available n observations to estimate each smooth coefficient. Note also that for the estimator which respects the cross-equation coefficient restrictions we have introduced asymptotic covariance between $\hat{\beta}_1^*(z)$ and $\hat{\beta}_2^*(z)$ whereas the unrestricted estimator did not have an asymptotic covariance. Theorem 2.1 only considers a single point z. If one is interested in estimating β(z) at finitely many different points, then, because nonparametric kernel estimators evaluated at different points (of z) are known to be asymptotically independent, the point-wise asymptotic results can be applied directly to each of these points.
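The halving of the asymptotic variance in the two-equation example can be checked numerically. The following is a minimal sketch, assuming illustrative values for $\nu_0$ and $M_0(z)$ (the numbers `nu0` and `M0` are arbitrary and not taken from the paper); the helper names `matmul` and `transpose` are ours.

```python
# Numerical check of the two-equation example with R = (1, -1) and M(z) = M0 * I.
# nu0 and M0 are arbitrary illustrative values (assumptions, not from the paper).

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

nu0, M0 = 0.2821, 1.7
M_inv = [[1.0 / M0, 0.0], [0.0, 1.0 / M0]]
R = [[1.0, -1.0]]                        # restriction beta_1(z) = beta_2(z)

# G = M^{-1} R' [R M^{-1} R']^{-1} R; the bracketed term is a scalar here.
MiRt = matmul(M_inv, transpose(R))       # 2 x 1
scalar = matmul(R, MiRt)[0][0]           # R M^{-1} R'
G = [[MiRt[i][0] * R[0][j] / scalar for j in range(2)] for i in range(2)]
I_minus_G = [[(1.0 if i == j else 0.0) - G[i][j] for j in range(2)]
             for i in range(2)]

# Lambda = nu0 [I - G] M^{-1} [I - G'] versus the unrestricted nu0 M^{-1}.
Lam = matmul(matmul(I_minus_G, M_inv), transpose(I_minus_G))
Lam = [[nu0 * v for v in row] for row in Lam]
unrestricted = nu0 / M0

print(I_minus_G)                         # every entry equals 1/2
print(Lam[0][0] / unrestricted)          # ratio of restricted to unrestricted variance
```

Under these assumptions the ratio equals one half up to rounding, and the off-diagonal element of `Lam` equals the diagonal element, which is the asymptotic covariance introduced by the restriction.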
2.4. Bandwidth selection

As with nonparametric estimation, the choice of smoothing parameters is imperative to the performance of semiparametric models. A common approach to obtaining bandwidths is to use a leave-one-out cross-validation routine. Cross-validation routines are an alternative to plug-in methods, which often require pilot bandwidths and rely on complicated asymptotic expressions. Here we use least-squares cross-validation (LSCV), which in the SUR setting selects bandwidths that minimize
$$CV(h, \lambda) = \frac{1}{mn}\sum_{j=1}^{n}\left[\tilde{y}_j - \tilde{x}_j\hat{\beta}_{-j}(z_j)\right]^T\left[\tilde{y}_j - \tilde{x}_j\hat{\beta}_{-j}(z_j)\right], \tag{18}$$
where our vector of leave-one-out estimates of the smooth coefficients is expressed as
$$\hat{\beta}_{-j}(z_j) = \left[X_{-j}^T K_{-j}^{1/2}(z_j)\,\Gamma_{-j}^{-1} K_{-j}^{1/2}(z_j) X_{-j}\right]^{-1} X_{-j}^T K_{-j}^{1/2}(z_j)\,\Gamma_{-j}^{-1} K_{-j}^{1/2}(z_j) Y_{-j},$$
and the subscript $-j$ indicates that the jth row is removed from $X$ and $Y$, and the jth row and the jth column are removed from $K(z_j)$ and $\Gamma^{-1}$. This variant of cross-validation differs from that used in the single regression setting. If we were to use $\Gamma = I$, then we could estimate the bandwidths for each equation separately since the smoothing of each equation would be independent from the remaining equations. However, when $\Gamma \neq I$ or when we have cross-equation restrictions, we must use the above formulation since this allows for both of these events.7 Bandwidth selection in a systems setting offers several interesting alternatives to single equation bandwidth selection. First, if different z variables appear in different equations, they will need separate bandwidths. Secondly, when we impose cross-equation coefficient restrictions, regardless of whether the bandwidths are identical or not, the coefficients will still satisfy the equalities across equations since the coefficients are determined via the system and not a particular equation. This also suggests that in certain settings, we could use information from one equation to assist with estimation of a smooth coefficient in another equation. By following similar arguments as in Li and Racine (2010), one can show that the above cross-validation method works, i.e., it selects an h that is asymptotically equivalent to an optimal h that minimizes a weighted estimation mean squared error. As an aside, an anonymous referee correctly points out that, if we have information that certain coefficients do not depend on z, then this information will assist in the estimation of the constant coefficients because the rate of convergence will be $\sqrt{n}$, i.e., there is no curse of dimensionality for the parametric components in a partially linear varying coefficient SUR model.
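To make the leave-one-out criterion in (18) concrete, here is a minimal sketch under strong simplifying assumptions that are ours, not the paper's: $\Gamma = I$, no cross-equation restrictions, a single continuous z (so no λ), one scalar regressor per equation, a Gaussian kernel, and a coarse grid search in place of a numerical optimizer. The function names, the simulated data and the grid are all hypothetical.

```python
import math
import random

def gaussian_kernel(v):
    """Standard normal density used as the kernel (an assumption of this sketch)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def cv_score(h, x, y, z):
    """Leave-one-out LSCV criterion; x[s][i] and y[s][i] hold equation s, unit i."""
    m, n = len(x), len(z)
    total = 0.0
    for j in range(n):
        # Kernel weights of all other units relative to z_j (unit j is left out).
        w = [gaussian_kernel((z[i] - z[j]) / h) if i != j else 0.0
             for i in range(n)]
        for s in range(m):
            # With Gamma = I and a scalar regressor, the local estimator is a ratio.
            num = sum(w[i] * x[s][i] * y[s][i] for i in range(n))
            den = sum(w[i] * x[s][i] ** 2 for i in range(n))
            if den <= 0.0:
                return float("inf")      # bandwidth too small to form an estimate
            beta_loo = num / den
            total += (y[s][j] - x[s][j] * beta_loo) ** 2
    return total / (m * n)

# Hypothetical two-equation data with smooth coefficients sin(z) and z.
random.seed(0)
n = 150
z = [random.uniform(-3.0, 2.0) for _ in range(n)]
x = [[random.uniform(0.0, 2.0) for _ in range(n)],
     [random.uniform(0.0, 1.0) for _ in range(n)]]
beta = [math.sin, lambda v: v]
y = [[beta[s](z[i]) * x[s][i] + random.gauss(0.0, 0.5) for i in range(n)]
     for s in range(2)]

grid = [0.1, 0.2, 0.4, 0.8, 1.6, 50.0]   # 50.0 mimics a constant-coefficient fit
scores = {h: cv_score(h, x, y, z) for h in grid}
best_h = min(scores, key=scores.get)
print(best_h, scores[best_h])
```

With $\Gamma \neq I$ or with cross-equation restrictions, the per-equation ratio above would have to be replaced by the stacked system estimator, so the bandwidths could no longer be chosen equation by equation.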
Due to space limitations we do not consider a partially linear varying coefficient SUR model in this paper.

2.5. Large sample properties with only discrete environmental variables

Theorem 2.1 deals with a mixed continuous and discrete z variable case with $z = (z^c, z^d)$, where $z^c$ is of dimension $q_1$, and
7 Additionally, in our setup here we have assumed that all coefficients in a given equation are smoothed equally. Alternatively, we could use the two-step smoothing approach of Fan and Zhang (1999) to allow each coefficient to be smoothed differently. In this setup we would obtain a preliminary set of bandwidths which under-smooth. We leave this as a topic for future research. Further, as noted by a referee, if we elected to allow each variable to be smoothed differently in each smooth coefficient, this would result in a high-dimensional vector of bandwidths and it would be numerically more demanding to obtain this vector of bandwidths in practice.
$z^d$ is of dimension $q_2$ ($1 \le q_1 \le q$ and $q_1 + q_2 = q$). It requires that z contains at least one continuous component $z^c$. If $z = z^d$ only contains discrete components, then the result of Theorem 2.1 should be modified as follows:
$$\sqrt{n}\left\{\hat{\beta}^*(z) - [I - \hat{G}(z)]\beta(z) - J_n(z)\right\} \xrightarrow{d} N(0, \Lambda_d(z)), \tag{19}$$
where $\hat{G}$ and $J_n$ are defined below Eq. (15) except that in the kernel function we remove the product kernel function associated with $z^c$, $\Lambda_d(z) = [I - G(z)] M_d(z)^{-1} [I - G^T(z)]$, $G(z) = M_d(z)^{-1} R^T [R M_d(z)^{-1} R^T]^{-1} R$ is the probability limit of $\hat{G}$, and $M_d(z) = f(z) E[\tilde{x}_i^T \Sigma^{-1} \tilde{x}_i \mid z_i = z]$. The proof of (19) follows similar steps as in the proof of Theorem 2.1 by realizing that $\lambda = O_p(n^{-1})$ (see Theorem 1 of Li et al. (2013)) and $\sqrt{n}\lambda = O_p(n^{-1/2})$, so that the leading bias is asymptotically negligible even after multiplying by a factor of $\sqrt{n}$. Therefore, we omit the proof of (19).

3. A test for correct functional form

Consider a parametric specification of our smooth coefficient SUR model:
$$y_{si} = x_{si}^T \beta_{s,0}(z_i) + u_{si}, \qquad i = 1, \ldots, n, \quad s = 1, \ldots, m, \tag{20}$$
where $\beta_{s,0}(z)$ is a parametric function of z. For example, if we had a scalar z, we could have $\beta_s(z) = (\alpha_s + z\gamma_s, \beta_{0s}^T z)^T$. In this case we would have a standard SUR. Testing for correct functional form is prudent if model misspecification is of concern. In empirical applications, correctly specified parametric models are more efficient than their semiparametric counterparts. However, if the parametric model is misspecified then estimation based on it will lead to inconsistent results. In what follows, we propose a statistic that can test if the fully parametric SUR is correctly specified against the SPSCM SUR. If interest hinges on a single equation within the SUR, the test in Li et al. (2002) can be deployed. The null hypothesis that model (20) is correctly specified is $H_0: \beta(z) - \beta_0(z) = 0$ almost everywhere. The alternative hypothesis is $H_1: \beta(z) - \beta_0(z) \neq 0$ on a set with positive measure. Following Li et al. (2002), we use an integrated squared difference statistic as the basis for our test.8 The integrated squared difference is defined as
$$I \overset{\mathrm{def}}{=} \int [\beta(z) - \beta_0(z)]^T [\beta(z) - \beta_0(z)]\, dz. \tag{21}$$
$I = 0$ under $H_0$ and $I > 0$ under $H_1$. To obtain a feasible test statistic we replace $\beta(z)$ and $\beta_0(z)$ with estimates. We will replace $\beta(z)$ by $\tilde{\beta}(z)$ (defined in (10)) in (21) to obtain a feasible test statistic. However, given that the random denominator $\tilde{D}_n(z) = X^T K(z) X$ in $\tilde{\beta}(z)$ is not strictly bounded away from 0, obtaining the asymptotic distribution of $I$ is difficult. To avoid the random denominator issue we propose an alternative, weighted test statistic,
$$I_n = \int \left\{\tilde{D}_n(z)\left[\tilde{\beta}(z) - \hat{\beta}_0(z)\right]\right\}^T \left\{\tilde{D}_n(z)\left[\tilde{\beta}(z) - \hat{\beta}_0(z)\right]\right\} dz,$$
where $\hat{\beta}_0(z)$ is the estimator of $\beta_0(z)$ based on the parametric null model. After some further simplifications (as in Li et al., 2002), including the removal of a center term and the replacement of a convolution kernel function by a standard second order kernel function, we obtain a final test statistic given by
$$\hat{I}_n = \frac{1}{n^2 h^{q_1}}\sum_{i=1}^{n}\sum_{j \neq i} \hat{u}_{i,0}^T \tilde{x}_i \tilde{x}_j^T \hat{u}_{j,0} K_{z_i, z_j}, \tag{22}$$
8 Other means to construct consistent model specification tests exist. See Bierens and Ploberger (1997) and Li and Wang (1998) for two alternative setups.
where $\hat{u}_{i,0} = \tilde{y}_i - \tilde{x}_i \hat{\beta}_0(z_i)$ is the m × 1 vector of residuals from the parametric null model, and $K_{z_i, z_j} = W_{z_i^c, z_j^c} L_{z_i^d, z_j^d}$ is the generalized product kernel introduced in Racine and Li (2004). We now present asymptotic results for our proposed testing procedure.

Theorem 3.1. Provided Assumptions 2.1 and 2.2 hold, (1) under $H_0$, $\hat{J}_n = nh^{q_1/2}\hat{I}_n/\hat{\sigma}_0 \to N(0, 1)$ in distribution, where
$$\hat{\sigma}_0^2 = \frac{2}{n^2 h^{q_1}}\sum_{i=1}^{n}\sum_{j \neq i}\left(\hat{u}_{i,0}^T \tilde{x}_i \tilde{x}_j^T \hat{u}_{j,0}\right)^2 K_{z_i, z_j}^2,$$
is a consistent estimator of $\sigma_0^2 = 2\nu_0 E[f(z_i) P(z_i, z_i)]$, where $\nu_0 = \int W^2(v)\,dv$ and $P(z_i, z_j) = \mathrm{tr}\, E[\tilde{x}_i^T \Sigma \tilde{x}_j \tilde{x}_i^T \Sigma \tilde{x}_j \mid z_i, z_j]$; (2) under $H_1$, $\mathrm{Prob}[\hat{J}_n > B_n] \to 1$ as $n \to \infty$, where $B_n$ is any nonstochastic sequence with $B_n = o(nh^{q_1/2})$.

We provide a sketch of the proof of Theorem 3.1 in Appendix A. Part (1) of Theorem 3.1 suggests a rescaled statistic that is asymptotically pivotal, making bootstrap inference valid, while part (2) shows that the test is consistent under departures from $H_0$. Here we consider a four step bootstrap procedure to employ the test in practice. Given that we have a system of equations, the basic idea of the bootstrap here is similar to the panel data case (e.g. Henderson et al., 2008) where we randomly sample all the residuals for a particular cross-sectional unit with replacement, $\{\hat{u}_{it}\}_{t=1}^{T}$. For our wild bootstrap in the system of equations case, we assign the same (wild) bootstrap weight to each cross-sectional unit (i) across the m equations. Our four step procedure is as follows:

(1) Compute the test statistic $\hat{J}_n$ for the original sample of $\{x_{si}\}$, $\{z_i\}$, $\{y_{si}\}$ for $s = 1, \ldots, m$ and $i = 1, \ldots, n$ and save the re-centered residuals from the null model, $\hat{u}_{i,0} - \bar{\hat{u}}_0$, $i = 1, 2, \ldots, n$, where $\hat{u}_{i,0} = \tilde{y}_i - \tilde{x}_i\hat{\beta}_0(z_i)$ and $\bar{\hat{u}}_0 = n^{-1}\sum_{j=1}^{n}\hat{u}_{j,0}$.
(2) For each cross-sectional unit i, construct the (vector of) bootstrapped residuals $u_i^*$, where $u_i^* = \frac{1 - \sqrt{5}}{2}\left(\hat{u}_{i,0} - \bar{\hat{u}}_0\right)$ with probability $\frac{1 + \sqrt{5}}{2\sqrt{5}}$ and $u_i^* = \frac{1 + \sqrt{5}}{2}\left(\hat{u}_{i,0} - \bar{\hat{u}}_0\right)$ with probability $1 - \frac{1 + \sqrt{5}}{2\sqrt{5}}$. Construct the bootstrapped left-hand-side variable by adding the bootstrapped residuals to the fitted values under the null as $y_i^* = \tilde{x}_i\hat{\beta}_0(z_i) + u_i^*$. Call $\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n\}$, $\{z_1, z_2, \ldots, z_n\}$ and $\{y_1^*, y_2^*, \ldots, y_n^*\}$ the bootstrap sample.
(3) Calculate $\hat{J}_n^*$, where $\hat{J}_n^*$ is calculated the same way as $\hat{J}_n$ except that $y_i$ and $\hat{u}_{i,0}$ are replaced by $y_i^*$ and $\hat{u}_{i,0}^* = y_i^* - \tilde{x}_i\hat{\beta}_0^*(z_i)$, where $\hat{\beta}_0^*(z_i)$ is the estimator of $\beta_0(z_i)$ based on the null model and using the bootstrap sample.
(4) Repeat steps (2)–(3) a large number (B) of times and then construct the sampling distribution of the bootstrapped test statistics. We reject the null that the parametric model is correctly specified if the estimated test statistic $\hat{J}_n$ is greater than the upper α-percentile of the bootstrapped test statistics.

4. Simulations

While the conclusions from Theorem 2.1 suggest the restricted estimator should be efficient relative to the unrestricted estimator, we examine the finite sample performance of each of our estimators to determine the magnitude of such gains. Our setup is a two-equation model with two regressors and no intercept (for simplicity). Specifically,
$$y_{1i} = b_{11}(z_i)x_{1i,1} + b_{12}(z_i)x_{1i,2} + u_{1i}$$
$$y_{2i} = b_{21}(z_i)x_{2i,1} + b_{22}(z_i)x_{2i,2} + u_{2i}$$
Table 1
Finite sample performance. Entries are the median ratio between the unrestricted and the restricted estimator. Entries greater than 1 indicate superior performance of the restricted estimator. The lower and upper deciles of the ratio of ASE across the 1000 Monte Carlo simulations are reported in parentheses after each estimate.

c     n    Function            b11(z)              b12(z)                b21(z)              b22(z)
0     100  1.02 (0.996, 1.06)  1.25 (0.753, 2.04)  4.102 (1.075, 14.93)  1.52 (0.871, 2.64)  1.04 (0.921, 1.14)
0     200  1.02 (1.002, 1.04)  1.23 (0.819, 1.82)  3.692 (1.134, 12.82)  1.43 (0.888, 2.30)  1.02 (0.928, 1.11)
0     500  1.01 (1.004, 1.03)  1.14 (0.822, 1.58)  3.728 (1.304, 10.93)  1.39 (0.927, 2.09)  1.01 (0.949, 1.07)
0.25  100  1.02 (0.997, 1.07)  1.27 (0.789, 1.96)  4.003 (1.045, 15.89)  1.58 (0.968, 2.71)  1.04 (0.940, 1.15)
0.25  200  1.02 (1.002, 1.04)  1.21 (0.810, 1.76)  3.745 (1.117, 13.36)  1.50 (0.955, 2.48)  1.02 (0.940, 1.11)
0.25  500  1.01 (1.004, 1.03)  1.16 (0.826, 1.58)  3.615 (1.195, 11.10)  1.46 (0.969, 2.17)  1.01 (0.948, 1.08)
0.50  100  1.02 (0.999, 1.06)  1.30 (0.758, 2.06)  4.371 (1.173, 17.45)  1.65 (1.033, 2.74)  1.05 (0.933, 1.15)
0.50  200  1.02 (1.003, 1.05)  1.21 (0.794, 1.81)  4.084 (1.178, 12.23)  1.60 (1.034, 2.58)  1.03 (0.941, 1.12)
0.50  500  1.01 (1.005, 1.03)  1.15 (0.815, 1.58)  3.547 (1.143, 11.27)  1.53 (1.040, 2.36)  1.02 (0.950, 1.08)
1     100  1.03 (1.000, 1.08)  1.28 (0.763, 2.14)  3.800 (1.017, 13.19)  1.74 (1.103, 2.77)  1.05 (0.942, 1.18)
1     200  1.02 (1.004, 1.05)  1.21 (0.746, 1.89)  3.686 (0.978, 11.72)  1.74 (1.106, 2.73)  1.04 (0.947, 1.14)
1     500  1.02 (1.005, 1.03)  1.17 (0.801, 1.63)  3.173 (1.056, 9.63)   1.66 (1.124, 2.42)  1.02 (0.949, 1.09)
for $i = 1, \ldots, n$. For our simulations we consider the case where $b_{12}(z) = b_{21}(z)$. If one views these as share equations then the $b_{12}(z) = b_{21}(z)$ restrictions are symmetry restrictions (equality of cross-partials or Young's theorem) which must be imposed. $u_{1i}$ is generated as i.i.d. $N(0, 1)$ while $u_{2i} = c\,u_{1i} + v_{2i}$, where $v_{2i}$ is also generated as i.i.d. $N(0, 1)$ and $c \in \{0, 0.25, 0.5, 1\}$. When $c = 0$ the two equations are unrelated. For $c \neq 0$ there exists cross-equation correlation and it is increasing in c. The regressors $x_{1i}$ and $x_{2i}$ are generated as i.i.d. $U[0, 2]$ and $U[0, 1]$, respectively. The nonparametric covariate $z_i$ is generated as i.i.d. $U[-3, 2]$. Finally, we assume the following functional forms for the varying coefficient functions:
$$b_{11}(z) = 3z, \qquad b_{12}(z) = b_{21}(z) = \sin(z), \qquad b_{22}(z) = z^3.$$
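The design described above can be sketched as a short data-generating routine. This is an illustration only: we assume that both regressors in the first equation are drawn from U[0, 2] and both in the second from U[0, 1], and the names `simulate` and `implied_error_correlation` are ours. The second helper reports the error correlation implied by $u_{2i} = c\,u_{1i} + v_{2i}$, namely $c/\sqrt{1 + c^2}$.

```python
import math
import random

def simulate(n, c, seed=0):
    """Draw one sample of size n from the two-equation design for a given c."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.uniform(-3.0, 2.0)                           # smoothing variable
        x1 = (rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0))  # assumption: both U[0, 2]
        x2 = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))  # assumption: both U[0, 1]
        u1 = rng.gauss(0.0, 1.0)
        u2 = c * u1 + rng.gauss(0.0, 1.0)                    # correlated when c != 0
        b11, b12, b22 = 3.0 * z, math.sin(z), z ** 3         # b21 = b12 by the restriction
        y1 = b11 * x1[0] + b12 * x1[1] + u1
        y2 = b12 * x2[0] + b22 * x2[1] + u2
        sample.append({"z": z, "x1": x1, "x2": x2, "y1": y1, "y2": y2})
    return sample

def implied_error_correlation(c):
    """Corr(u1, u2) implied by u2 = c*u1 + v2 with independent N(0, 1) draws."""
    return c / math.sqrt(1.0 + c * c)

sample = simulate(n=200, c=0.5)
for c in (0.0, 0.25, 0.5, 1.0):
    print(c, round(implied_error_correlation(c), 3))
```

The implied correlation rises from 0 at c = 0 to $1/\sqrt{2}$ at c = 1, which is the sense in which the cross-equation correlation is increasing in c.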
We consider two different estimators for estimating $b_{11}(z)$, $b_{12}(z)$ and $b_{22}(z)$: one which ignores the cross-equation restrictions, and one which imposes the restrictions prior to estimation. For our first set of simulations, where we compare the restricted and unrestricted estimators, we set $\Sigma = I$. We compare the average square error (ASE) of the conditional mean for each of the estimators. Additionally, we calculate the ASE for each smooth coefficient estimate. The ASE for each smooth coefficient is defined as
$$ASE(b_{st}) = n^{-1}\sum_{i=1}^{n}\left[\hat{\beta}_{st}(z_i) - \beta_{st}(z_i)\right]^2,$$
for $s = 1, 2$ and $t = 1, 2$. We let n = 100, 200 and 500 and set the number of Monte Carlo simulations equal to 1000. Table 1 gives the results from this exercise. Table 2 presents the ratio of ASEs for the conditional mean and the three unknown smooth coefficients, comparing the restricted estimator which does not make use of the covariance structure across the two equations against the estimator which does ($\Sigma = I_m$ vs. $\Sigma = \hat{\Sigma}$). We refer to the estimator which ignores the covariance structure as the naïve estimator. We do not focus on the performance of the unrestricted estimator since, as with the traditional parametric SUR, if the covariates are the same across equations, then feasible generalized least-squares accounting for the covariance structure is equivalent to equation by equation
OLS estimation. In our case this means that the naïve unrestricted estimator will be identical to the unrestricted estimator which uses $\Sigma = \hat{\Sigma}$.

The results from the two tables suggest at least two things. First, Table 1 shows that finite sample gains only appear to accrue on coefficients that are common across the two equations. Second, Table 2 shows that there do not appear to be gains from estimating the two-step estimator when there is no cross-equation correlation between the errors. Further, the fact that the elements of Γ have to be estimated leads to some finite sample outcomes where the two-step restricted estimator performs worse than the naïve estimator. For example, for c = 0.5 and n = 200 we see that at the median the two-step restricted estimator provides a roughly 10% improvement in the global estimation of $b_{12}(z)$, and at the upper/lower deciles we witness a 35% decrease in improvement against a 71% improvement. These features taken together suggest that deploying the naïve restricted estimator is likely to provide solid finite sample results (see Lin and Carroll, 2000 for a similar result in the panel data literature). We also point out that the relative gain/loss in improvement at the upper/lower deciles for the other coefficients, and the unknown function itself, suggest roughly comparable trade-offs.

5. Application: Returns to scale in US banking

In Section 2.2 we discussed the cost system stemming from derivative properties of the underlying cost function. Here we provide an application of a cost system that is still quite popular in the literature and dates back to (at least) Christensen and Greene (1976). We estimate a cost system for US commercial banks, in which the first equation is the translog cost function and the remainder are the cost share equations derived from Shephard's lemma. As mentioned before, Shephard's lemma reinforces the optimality conditions used in deriving the cost function.
In this sense its use does not impose any additional restrictions in the model. However, it implies some mathematical relations to be satisfied, viz., the integrability condition (integration of the share equations gives the cost function). Thus, all the parameters of the cost share equations come from the cost function and these restrictions are required when estimating the system. The so called ‘unrestricted’
Table 2
Finite sample performance. Entries are the median ratio between the naïve and the restricted estimator. Entries greater than 1 indicate superior performance of the restricted estimator. The lower and upper deciles of the ratio of ASE across the 1000 Monte Carlo simulations are reported in parentheses after each estimate.

c     n    Function             b11(z)               b12(z)              b22(z)
0     100  0.998 (0.972, 1.02)  0.993 (0.936, 1.04)  1.04 (0.674, 1.56)  1.01 (0.917, 1.10)
0     200  1.000 (0.983, 1.01)  0.994 (0.952, 1.03)  1.05 (0.732, 1.40)  1.01 (0.941, 1.07)
0     500  1.000 (0.990, 1.01)  0.998 (0.972, 1.02)  1.03 (0.807, 1.28)  1.00 (0.964, 1.04)
0.25  100  1.000 (0.964, 1.03)  0.991 (0.935, 1.04)  1.05 (0.610, 1.64)  1.01 (0.892, 1.13)
0.25  200  1.001 (0.974, 1.03)  0.994 (0.954, 1.04)  1.08 (0.685, 1.57)  1.01 (0.922, 1.10)
0.25  500  1.002 (0.985, 1.02)  0.998 (0.976, 1.02)  1.06 (0.792, 1.42)  1.01 (0.952, 1.06)
0.50  100  1.003 (0.959, 1.04)  0.995 (0.928, 1.05)  1.09 (0.596, 1.83)  1.02 (0.869, 1.17)
0.50  200  1.004 (0.969, 1.03)  0.995 (0.951, 1.03)  1.10 (0.651, 1.71)  1.02 (0.910, 1.12)
0.50  500  1.002 (0.978, 1.03)  0.997 (0.971, 1.02)  1.09 (0.726, 1.65)  1.01 (0.929, 1.08)
1     100  1.009 (0.953, 1.07)  0.989 (0.902, 1.07)  1.23 (0.545, 2.42)  1.04 (0.850, 1.25)
1     200  1.007 (0.959, 1.06)  0.994 (0.936, 1.06)  1.24 (0.604, 2.32)  1.03 (0.866, 1.20)
1     500  1.005 (0.967, 1.04)  0.993 (0.946, 1.04)  1.23 (0.672, 2.17)  1.01 (0.893, 1.14)
cost system might not be meaningful here because it does not impose integrability conditions. Therefore, we do not (i) estimate the ‘unrestricted’ cost system in which the share equation parameters are treated as ‘free’ parameters (not related to those in the cost function) even if such a system can be defined and (ii) compare results with the ‘restricted’ cost system.9 The data used in our application come from the Reports of Income and Condition (Call Reports) published by the Federal Reserve Bank of Chicago. Our sample consists of a random sample of 3112 commercial banks in the most recent year available (2010). Since banking outputs are services which cannot be stored, the standard practice is to specify the production technology in terms of a dual cost function thereby meaning that banks minimize cost taking outputs as given. We use the standard input and output variables in the literature (see, for example Restrepo-Tobón and Kumbhakar, 2012; Wheelock and Wilson, 2012 and references cited therein).10 Output and input variables for each year are computed as the quarterly average of balance-sheet nominal values. The output variables are: Household and individual loans (y1 ), Real estate loans (y2 ), Loans to business and other institutions (y3 ), Federal funds sold and securities purchased under agreements to resell (y4 ) and Other assets (y5 ). The input variables are: Labor quantity (x1 ), Premises and fixed assets (x2 ), Purchased funds (x3 ), Interest-bearing transaction accounts (x4 ), and Non-transaction accounts (x5 ). For each input xj , its price wj is obtained by dividing its total expenses by the corresponding input quantity. We allow a bank’s technology to vary smoothly depending upon the size of the bank. The United States banking industry has seen numerous changes brought about by changing regulation, resulting in consolidation via merger and acquisition activities, a decline in commercial banks and increased concentration of assets
among the largest banks. As such, a growing body of literature suggests that large banks employ "hard" information-based technology (e.g. Berger et al., 2005) while smaller, commercial banks use "soft" information-based production technologies (Berger, 2003). Further, evidence also suggests that banks serve/specialize in different market segments depending upon their size. We follow the main empirical practice (see Berger and Mester, 2003; Berger et al., 2005; Feng and Serletis, 2009, 2010 among others) in the applied banking literature and use log(assets) to measure the size of a bank. Unlike previous papers partitioning banks by size in an arguably ad hoc manner,11 we allow assets to affect technology in a completely flexible manner. The advantage of making all the parameters a nonparametric function of bank size is that it is not necessary to classify banks into some arbitrary number of categories (as in Feng and Serletis, 2009) and allow the coefficients to vary by size categories only. Further, we do not have to specify how bank size enters into the smooth coefficient. Hence, this allows for heterogeneity of any form with respect to bank size, which is widely believed to exist, but the form of which is unknown.

5.1. The model

Our normalized translog cost function is
$$\ln(C_i/w_{5i}) = \alpha_0(z) + \sum_{j=1}^{4}\alpha_j(z)\ln(w_{ji}/w_{5i}) + \sum_{t=1}^{5}\gamma_t(z)\ln y_{ti} + \frac{1}{2}\sum_{t=1}^{5}\sum_{t'=1}^{5}\gamma_{tt'}(z)\ln y_{ti}\ln y_{t'i} + \frac{1}{2}\sum_{j=1}^{4}\sum_{j'=1}^{4}\eta_{jj'}(z)\ln(w_{ji}/w_{5i})\ln(w_{j'i}/w_{5i})$$
9 However, there is nothing wrong in treating the model that satisfies the economic theoretical restrictions as the 'unrestricted model', considering some special cases such as the one that imposes separability constraints, constant returns to scale constraints, etc., and calling them restricted models. In such cases we can compare restricted and unrestricted models.
10 Table 3 presents summary statistics for our data.
11 While the Federal Financial Institutions Examination Council (FFIEC) provides standard asset size categories (<100 million, 100–300 million and >300 million), there is no reason to believe these categories are set based on banks' underlying technology.
Table 3
Summary statistics for US banking data in 2010.

Variable  Min       Max         Mean        SD
C         241.188   34126.050   6629.340    4936.184
S1        0.175     0.817       0.490       0.090
S2        0.019     0.330       0.120       0.040
S3        0.000     0.424       0.050       0.049
S4        0.000     0.202       0.018       0.022
w1        2.964     108.694     65.760      13.609
w2        0.011     2.282       0.292       0.270
w3        0.000     1.149       0.031       0.034
w4        0.000     0.028       0.005       0.004
w5        0.001     0.038       0.016       0.004
y1        272.037   70964.410   6738.084    7671.218
y2        3984.411  512193.400  103649.916  89185.630
y3        1786.997  196452.400  36104.131   31350.075
y4        1964.557  270410.300  46110.637   41996.257
y5        284.819   30836.040   4300.835    4204.533
z         9.439     13.596      11.891      0.741

Notes: All variables are measured in thousands of dollars. z is log of assets where assets is measured in thousands of dollars.
$$+ \sum_{j=1}^{4}\sum_{t=1}^{5}\delta_{jt}(z)\ln(w_{ji}/w_{5i})\ln y_{ti} + u_i,$$
where $C_i$ is the total cost defined as the sum of costs of all five inputs for bank i. The cost function is normalized (by $w_{5i}$) so that the linear homogeneity (in input prices) property is automatically satisfied. Symmetry (Young's theorem) of the cost function requires imposition of the following restrictions: $\eta_{jj'}(z) = \eta_{j'j}(z)$ and $\gamma_{tt'}(z) = \gamma_{t't}(z)$. These restrictions are automatically satisfied by the above normalized cost function. The derivative conditions (Shephard's lemma) give us the following four cost share equations (which will be used alongside our cost function in our cost system)
$$S_{ji} = \alpha_j(z) + \sum_{j'=1}^{4}\eta_{jj'}(z)\ln(w_{j'i}/w_{5i}) + \sum_{t=1}^{5}\delta_{jt}(z)\ln y_{ti} + u_{ji},$$
for $j \in \{1, 2, 3, 4\}$. Note that $S_{ji} = w_{ji}x_{ji}/C_i$ is the cost share of input j for bank i. The fifth cost share equation ($S_{5i}$) is automatically dropped (sum of cost shares equals unity) because we normalize the cost function by $w_{5i}$. The full five equation cost system requires restrictions both across the share equations as well as between the share equations and the cost function, making it an excellent illustration of our smooth coefficient SUR estimator. (See Table 3.) In almost all banking studies, the focus is on estimating returns-to-scale (RTS), which is defined as the reciprocal of the sum of cost elasticities. That is, if we define the sum of output elasticities as $E_{cyi} = \sum_{t=1}^{5}\partial \ln C_i/\partial \ln y_{ti}$, then $RTS_i = 1/E_{cyi}$ and scale economies are often defined as $(1 - E_{cyi})$. A positive value of scale economies (RTS > 1) means that for a one percent increase in all outputs cost is increased by less than one percent. Here, the presence of increasing RTS for a bank means that it is operating below its efficient scale size (RTS = 1). Because of this, policy analysts, regulators, and bankers want to know whether banks have scale economies or not, thereby implying whether banks can benefit from expansion. This information is often used to justify bank mergers and regulation. Thus, knowledge of the extent of scale economies is important to argue for or against control of bank size either as a policy in general or for a particular merger case, especially if it involves big banks. Since size is related to RTS, it is important to use it in a flexible manner in the cost function so that the RTS measure is fully flexible in terms of bank size. We feel that using size as the z variable makes the model much more flexible than arbitrarily grouping banks in terms of assets. Given that bank size (our z variable) enters the RTS function in a flexible manner, our RTS estimates are bank-specific (note that this would even be true with a Cobb–Douglas cost function).
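As an illustration of how RTS follows from the translog cost function, the sketch below evaluates the output elasticities for one bank, taking the smooth coefficients at that bank's z as given. Under the symmetry restriction $\gamma_{tt'}(z) = \gamma_{t't}(z)$, the elasticity with respect to output t is $\gamma_t(z) + \sum_{t'}\gamma_{tt'}(z)\ln y_{t'i} + \sum_{j}\delta_{jt}(z)\ln(w_{ji}/w_{5i})$. All numerical values and function names below are hypothetical and are not estimates from the paper.

```python
def output_elasticities(gamma, gamma2, delta, ln_y, ln_w):
    """Cost elasticities d ln C / d ln y_t of a normalized translog cost function.

    gamma[t], gamma2[t][k] (symmetric) and delta[j][t] are the coefficient values
    at one bank's z; ln_y[t] are log outputs and ln_w[j] = ln(w_j / w_5).
    """
    n_out, n_price = len(gamma), len(ln_w)
    return [gamma[t]
            + sum(gamma2[t][k] * ln_y[k] for k in range(n_out))
            + sum(delta[j][t] * ln_w[j] for j in range(n_price))
            for t in range(n_out)]

def returns_to_scale(gamma, gamma2, delta, ln_y, ln_w):
    """RTS is the reciprocal of the sum of the output cost elasticities."""
    ecy = sum(output_elasticities(gamma, gamma2, delta, ln_y, ln_w))
    return 1.0 / ecy

# Hypothetical coefficient values at one bank's z (not estimates from the paper).
gamma = [0.18] * 5
gamma2 = [[0.01 if t == k else 0.0 for k in range(5)] for t in range(5)]
delta = [[0.0] * 5 for _ in range(4)]
ln_y = [1.0, 2.0, 3.0, 4.0, 5.0]
ln_w = [0.1, -0.2, 0.3, 0.0]

rts = returns_to_scale(gamma, gamma2, delta, ln_y, ln_w)
scale_economies = 1.0 - 1.0 / rts        # equals 1 - E_cy
print(rts, scale_economies)

# Cobb-Douglas special case: second-order terms set to zero.
zeros2 = [[0.0] * 5 for _ in range(5)]
rts_cd = returns_to_scale(gamma, zeros2, delta, ln_y, ln_w)
print(rts_cd)
```

In the Cobb–Douglas special case the elasticities no longer depend on outputs or prices, so RTS varies across banks only through z, which is the point made in the parenthetical remark above.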
Table 4
Decile, quartile and mean estimates of RTS for both the parametric and semiparametric models. RTS_SUR is estimated returns to scale obtained from the corresponding SUR model and RTS_SIN is estimated returns to scale obtained from the corresponding cost function. D10 and D90 are the lower and upper deciles, respectively, while Q25, Q50 and Q75 are the lower quartile, median and upper quartile, respectively.

                 D10    Q25    Q50    Q75    D90    Mean
Parametric
  RTS_SUR        1.421  1.637  1.978  2.451  2.972  2.104
  RTS_SIN        1.295  1.486  1.764  2.196  2.645  1.896
Semiparametric
  RTS_SUR        0.985  1.001  1.028  1.064  1.097  1.035
  RTS_SIN        1.004  1.021  1.049  1.084  1.120  1.056
5.2. Parametric estimates

Before looking into the results for our SPSCM estimator, we feel it prudent to examine the results from a flexible parametric model. Here we consider both a single equation and system where our z variable (ln (assets)) enters in full translog form (i.e., it enters linearly, in quadratic form and interacts with each regressor). The first two rows of Table 4 present extreme deciles, quartiles and means for our estimated RTS across the two parametric approaches (single equation and system). We find increasing returns at each decile and substantial heterogeneity across the sample. However, these results do not appear to be economically reasonable and are much larger than those typically found in the literature (e.g. Hughes and Mester, 2013). We tried several other ways to introduce z and these led to even larger values of RTS (these results are not reported but are available from the authors upon request).12 The problem with the parametric approach is that the way in which ln (assets) enters the equation(s) must be specified a priori and each of the approaches we tried led to results which did not appear reasonable from an economic point of view. That is a good enough justification for rejecting a model on economic grounds. However, this may sound judgmental and hence we applied our functional form test outlined in Section 3 to the aforementioned parametric system estimator. Using a wild bootstrap approach specifically designed for a system of equations, we found our p-value to be equal to zero to four decimal places, which favors the semiparametric models. Hence, we spend the remainder of the paper focusing on our semiparametric approach.

5.3. Semiparametric estimates

The third and fourth rows of Table 4 present extreme deciles, quartiles and means for our estimated RTS across the two semiparametric approaches (single equation and system). We see from the table that a potential first order dominance exists.
Both models provide reasonable estimates of RTS for the cross-section of US banks in 2010. We find evidence of increasing RTS from both the SPSCM and SPSCM SUR models. The single equation SPSCM cost model shows increasing RTS for almost all banks, while the SPSCM SUR shows roughly 25% of the banks operated under increasing RTS in 2010. This is consistent with some recent studies (e.g. Wheelock and Wilson, 2012), although we are not aware of any study with data up to 2010. In addition to finding a large number of banks with estimated RTS greater than 1 in the data, the absolute values are similar to those of recent studies (e.g. Hughes and Mester, 2013).
12 We have also estimated RTS from the SUR treating assets as discrete, using a framework similar to Feng and Serletis (2009, Table III). As mentioned before, the cutoffs deployed are arbitrary in nature and any misspecification in the appropriate cutoffs could impact estimation results. That being said, our estimated RTS are qualitatively similar and track closely those treating assets as continuous. These results are available upon request.
it does not take into account the cost share equations, thereby ignoring valuable information which was essentially costless (in the sense that the cost shares do not add any extra parameters). Therefore, the estimates from the single equation model will be less efficient compared to the full system. Fig. 3 presents 45° plots (Henderson et al., 2012) of the estimated RTS. This plot allows us to visualize the significance of the estimated RTS. For example, we have plotted the vertical and horizontal axes at 1, and any point estimate whose confidence bounds contain 1 is statistically indistinguishable from 1. We note that the RTS estimates obtained from the SUR have smaller bootstrap standard errors (using a wild bootstrap) for approximately 95% of all observations relative to the RTS estimates constructed using the smooth coefficient estimates from just the cost function. Fig. 4 plots only those estimates of RTS which are statistically different from 1. While it is not immediately obvious from either figure that the bootstrap standard errors are narrower for the system, consider the difference across Figs. 3 and 4 for estimated RTS less than one. Here it appears that the single equation estimates are much less precise (in the sense of testing against unitary RTS).

6. Conclusion

Fig. 1. Empirical cumulative distribution functions for estimated RTS for the system (SPSCM SUR) and single equation (SPSCM) estimators. The dashed line is for the SPSCM estimates while the solid line is for the SPSCM SUR estimates.
Fig. 1 plots the estimated CDFs of our estimates for RTS. Following Henderson and Maasoumi (2013), a test of firstorder stochastic dominance fails to reject the null of first-order dominance (p-value = 1.000). Note that the percentage of banks operating under increasing RTS is found to be higher in the single equation SPSCM. Since the additional equations (cost shares) are implied by the cost model and add extra information without any extra parameters, we believe that the results from the SUR SPSCM are more reliable. Even with a single z variable, in this case the logarithm of bank assets, it is hard to plot an exact relationship between estimated RTS and the smoothing variable. Fig. 2 plots the estimated RTS for each bank along with an estimated conditional mean using localconstant kernel regression (with 95% confidence bounds). We can see that both models suggest, on average, a decreasing relationship between RTS and bank assets. It also shows that scale economies of very large banks are non-existent. This is consistent with economic theory which suggests that scale economies tend to decline with increase in size. Furthermore, we find that some of the largest bank in our sample have exhausted their scale economies (and are operating at their efficient scale size). This is in contrast to the Wheelock and Wilson (2012) study who found increasing RTS even for the largest banks. Since they have not used a system they have probably over estimated RTS like our single equation SPSCM. That being said, we should note that our data is more recent and we only used a cross-sectional (2010) data set whereas they used a panel (1984–2006). The estimated SPSCM SUR cost system accommodated the derivative conditions (Shephard’s lemma) which imposed parametric restrictions. 
Given that the cost system without these constraints does not make economic sense, we do not report the so called ‘unrestricted’ cost system.13 Instead, we estimated a single equation SPSCM with the cost function alone. Since the cost function contains all the parameters, we could estimate them consistently using a single equation (i.e., a SPSCM translog cost function). We view this single equation SPSCM as ‘limited’ in the sense that
13 These results are available upon request.
This paper has presented a semiparametric estimator for a seemingly unrelated regression that is straightforward to implement and impose cross-equation restrictions. The use of a semiparametric model lessens the curse of dimensionality relative to general implementation of a nonparametric model. This model is a generalization of Zellner (1962) and should allow for greater insight into economic analysis of systems of equations where economic theory does not provide ample support for a specific form for β(z ). We have further shown the asymptotic properties of this estimator. Our theoretical results suggest an asymptotic improvement when cross-equation restrictions are present and a small scale Monte Carlo analysis demonstrated impressive finite sample gains. The fact that our estimator provides asymptotically more efficient estimates lends credence to the empirical importance of imposing cross-equation restrictions in a production setting. Moreover, the ease with which cross-equation restrictions can be incorporated into this estimator relative to a fully nonparametric approach makes it a desirable empirical tool. Finally, as our estimator is motivated by economic theory, we showed how this estimator could be used to estimate a cost system. Using US commercial banking data, we estimated both a single equation cost function as well as a system approach consisting of the cost function and the cost share equations. We rejected a parametric version of the model with our theoretically justified functional form test with asymptotically valid bootstrap. We found more efficient estimation with the cost system, which requires cross-equation restrictions. Using a smooth coefficient model for each where bank size entered the coefficients nonparametrically, we found that returns-to-scale diminished with bank size. 
We also found evidence in the system that the largest banks exhibited constant returns-to-scale, but found increasing returns for the largest banks in the single equation model.
Acknowledgments
We would like to thank three referees and an associate editor for providing insightful comments that greatly improved the paper. We would also like to thank participants at the Applied and Theoretical Econometrics Workshop at the University of Colorado-Boulder, New York Camp Econometrics VII, Midwest Econometric Group (University of Chicago), University of Miami Finance Series and University of Padua for valuable comments. Li’s research is partly supported by the National Natural Science Foundation of China (Key Project, Grant # 71133001).
Fig. 2. RTS point estimates along with local constant conditional mean. Panel (a): System. Panel (b): Single equation. The solid lines are the local constant estimates from the regression of estimated RTS on ln assets. 95% confidence bands are presented as dotted lines. Bandwidths for each plot were selected via least-squares cross-validation.
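The conditional-mean lines in Fig. 2 come from a local-constant regression with a cross-validated bandwidth. A minimal Python sketch of that kind of fit follows, assuming a Gaussian kernel and a grid search over bandwidths (neither choice is stated in the text). The simulated `ln_assets` and `rts_hat` arrays and all function names are hypothetical, and the confidence bands are not reproduced.

```python
import numpy as np

def gaussian_kernel(u):
    # Assumed kernel; the paper does not say which kernel was used for Fig. 2.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_constant_fit(z_eval, z, y, h):
    """Local-constant (Nadaraya-Watson) estimate of E[y | z] at each z_eval."""
    w = gaussian_kernel((z_eval[:, None] - z[None, :]) / h)
    return (w @ y) / w.sum(axis=1)

def lscv_score(h, z, y):
    """Leave-one-out least-squares cross-validation criterion for bandwidth h."""
    w = gaussian_kernel((z[:, None] - z[None, :]) / h)
    np.fill_diagonal(w, 0.0)  # exclude each observation from its own fit
    fitted = (w @ y) / w.sum(axis=1)
    return np.mean((y - fitted) ** 2)

def select_bandwidth(z, y, grid):
    """Pick the grid value with the smallest cross-validation score."""
    scores = [lscv_score(h, z, y) for h in grid]
    return grid[int(np.argmin(scores))]

# Simulated stand-ins for ln(assets) and estimated RTS (not the paper's data).
rng = np.random.default_rng(0)
ln_assets = rng.uniform(10.0, 16.0, size=200)
rts_hat = 1.2 - 0.03 * (ln_assets - 10.0) + rng.normal(0.0, 0.02, size=200)

grid = np.linspace(0.2, 2.0, 10)
h_cv = select_bandwidth(ln_assets, rts_hat, grid)
curve = local_constant_fit(np.array([11.0, 15.0]), ln_assets, rts_hat, h_cv)
```

In this simulated example the fitted curve is lower at the larger asset value, mirroring the qualitative pattern described for Fig. 2; nothing here speaks to the actual estimates.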
Fig. 3. RTS point estimates along with 95% bootstrap confidence intervals. Panel (a), SUR, presents point estimates of RTS (as circles) from the SPSCM SUR along with 95% bootstrap confidence intervals (as triangles). Panel (b), single equation, presents point estimates of RTS (as circles) from the SPSCM along with 95% bootstrap confidence intervals (as triangles). The vertical and horizontal dashed lines at 1 represent constant RTS; estimates in the upper right quadrant display increasing RTS while estimates in the lower left quadrant display decreasing RTS.
Appendix A. Proofs
Proof of Theorem 2.1. In the text it was shown that
\[ \hat\beta(z) = \beta(z) + D_n(z)^{-1}\{A_n(z) + C_n(z)\}, \tag{A.1} \]
where
\[ \begin{aligned}
D_n(z) &= \frac{1}{nh^{q_1}} \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde x_i K_i(z), \\
A_n(z) &= \frac{1}{nh^{q_1}} \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde x_i (\beta(z_i) - \beta(z)) K_i(z), \\
C_n(z) &= \frac{1}{nh^{q_1}} \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde u_i K_i(z).
\end{aligned} \tag{A.2} \]
Note that $K_i(z) = W_i(z^c) L_i(z^d)$ is the $i$th diagonal element of $K(z)$. It is easy to show that $E(D_n) = M(z) + O_p(h^2 + \lambda + (nh^{q_1})^{-1/2})$, where $M(z) = f(z) E[\tilde x_i^T \Sigma^{-1} \tilde x_i \mid z_i = z]$, and that $\mathrm{Var}(D_{n,j}) = O_p((nh^{q_1})^{-1}) = o_p(1)$, where $D_{n,j}$ is the $j$th column of $D_n$, $j = 1, \ldots, (ml)$. Hence, we have
\[ D_n(z) = M(z) + o_p(1). \]
Next, it is easy to show that $E(C_n(z)) = 0$ and $\mathrm{Var}(C_n(z)) = (nh^{q_1})^{-1}(\nu_0 M(z) + o(1))$. Then by Liapunov's central limit theorem we know that
\[ \sqrt{nh^{q_1}}\, C_n(z) \xrightarrow{d} N(0, \nu_0 M(z)). \tag{A.3} \]
It can be easily shown that $A_n$ gives the leading bias terms:
\[ \begin{aligned}
E(A_n(z)) &= \frac{1}{h^{q_1}} E\left[ E(\tilde x_i^T \Sigma^{-1} \tilde x_i \mid z_i)(\beta(z_i) - \beta(z)) K_i(z) \right] \\
&= \sum_{\tilde z^d \in D} \int M(\tilde z)(\beta(\tilde z) - \beta(z)) W_{\tilde z^c, z^c} L_{\tilde z^d, z^d}\, d\tilde z^c \\
&= h^2 A_1(z) + \lambda A_2(z) + o(h^2 + \lambda),
\end{aligned} \tag{A.4} \]
where $D$ is the support of the discrete random variable $z^d$, $M(\tilde z) = f(\tilde z) E[\tilde x_i^T \Sigma^{-1} \tilde x_i \mid z_i = \tilde z]$, and
\[ \begin{aligned}
A_1(z) &= \mu_2 \sum_{j=1}^{q_1} \left[ M_j(z)\beta_j(z) + M(z)\beta_{jj}(z)/2 \right], \\
A_2(z) &= \sum_{\tilde z^d \in D} I(\tilde z^d, z^d) M(z^c, \tilde z^d)(\beta(\tilde z^d, z^c) - \beta(z)),
\end{aligned} \tag{A.5} \]
where $\mu_2 = \int w(v) v^2\, dv$, $I(\tilde z^d, z^d) = \sum_{j=1}^{q_2} I(\tilde z_j^d, z_j^d)$ with $I(\tilde z_j^d, z_j^d) = 1(\tilde z_j^d \neq z_j^d)$ if $z_j^d$ is an unordered discrete variable, and $I(\tilde z_j^d, z_j^d) = 1(|\tilde z_j^d - z_j^d| = 1)$ if $z_j^d$ is an ordered discrete variable. Also, for the $q_1$-dimensional $z^c = (z^c_{(1)}, \ldots, z^c_{(q_1)})$, we use the notation $g_j(z) = \partial g(z)/\partial z^c_{(j)}$ and $g_{jj}(z) = \partial^2 g(z)/(\partial z^c_{(j)})^2$ for the first-order and second-order derivative functions of $g(\cdot)$ with respect to $z^c_{(j)}$, $j = 1, \ldots, q_1$, where $g(z)$ is either $M(z)$ or $\beta(z)$. It is straightforward to show that $\mathrm{Var}(A_n(z)) = O(h^2 (nh^{q_1})^{-1}) = o((nh^{q_1})^{-1})$. Combining the above results we have shown that $A_n(z) = h^2 A_1(z) + \lambda A_2(z) + o_p(h^2 + \lambda + (nh^{q_1})^{-1/2})$. By Liapunov's central limit theorem we know that
\[ D_n(z)^{-1} \sqrt{nh^{q_1}}\, C_n(z) \xrightarrow{d} M(z)^{-1} N(0, \nu_0 M(z)) = N(0, \nu_0 M(z)^{-1}). \tag{A.6} \]
Let $\hat G(z) = (\mathbf{X}^T \mathbf{K}(z) \mathbf{X})^{-1} R^T \{R[\mathbf{X}^T \mathbf{K}(z) \mathbf{X}]^{-1} R^T\}^{-1} R$. By Eq. (16) we have
\[ \hat\beta^*(z) = [I - \hat G(z)]\{\beta(z) + D_n(z)^{-1}[A_{h,\lambda}(z) + C_n(z)]\} + J_n(z) + o_p(h^2 + \lambda + (nh^{q_1})^{-1/2}), \]
where $A_{h,\lambda}(z) = h^2 A_1(z) + \lambda A_2(z)$. Hence,
\[ \sqrt{nh^{q_1}} \left\{ \hat\beta^*(z) - [I - \hat G(z)]\beta(z) - J_n(z) - [I - \hat G(z)] D_n(z)^{-1} A_{h,\lambda}(z) \right\} \xrightarrow{d} N(0, \Lambda(z)), \]
where $G(z) = M(z)^{-1} R^T [R M(z)^{-1} R^T]^{-1} R$ is the probability limit of $\hat G(z)$ and $\Lambda(z) = \nu_0 [I - G(z)] M(z)^{-1} [I - G(z)]^T$.
Consistent estimates of bias and variance terms
The leading bias and variance terms are $J_n(z) + [I - \hat G(z)] D_n(z)^{-1} A_{h,\lambda}(z)$ and $\Lambda(z) = \nu_0 [I - G(z)] M(z)^{-1} [I - G(z)]^T$, where $\hat G(z) = (\mathbf{X}^T \mathbf{K}(z) \mathbf{X})^{-1} R^T \{R[\mathbf{X}^T \mathbf{K}(z) \mathbf{X}]^{-1} R^T\}^{-1} R$, $J_n(z) = (\mathbf{X}^T \mathbf{K}(z) \mathbf{X})^{-1} R^T \{R[\mathbf{X}^T \mathbf{K}(z) \mathbf{X}]^{-1} R^T\}^{-1} r$, $D_n(z) = (nh^{q_1})^{-1} \mathbf{X}^T \mathbf{K}(z) \mathbf{X}$, and $A_{h,\lambda}(z) = h^2 A_1(z) + \lambda A_2(z)$.
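The matrices $\hat G(z)$ and $J_n(z)$ defined above are simple to compute once the kernel-weighted cross-product matrix is available. The Python sketch below is an illustration only: it assumes the usual restricted least-squares form, restricted estimate $= [I - \hat G]\hat\beta + J_n$, and it uses a single weighted regression with made-up weights as a stand-in for $\mathbf{X}^T \mathbf{K}(z) \mathbf{X}$ (the error covariance $\Sigma$ is ignored). The function name, the data, and the restriction $R\beta = r$ are all hypothetical.

```python
import numpy as np

def restricted_estimate(XtKX, beta_hat, R, r):
    """Impose R beta = r on an unrestricted local estimate beta_hat.

    Assumes the restricted estimate is (I - G_hat) beta_hat + J_n, the
    standard restricted least-squares form.
    """
    A_inv = np.linalg.inv(XtKX)
    middle = np.linalg.inv(R @ A_inv @ R.T)
    G_hat = A_inv @ R.T @ middle @ R
    J_n = A_inv @ R.T @ middle @ r
    beta_star = (np.eye(beta_hat.shape[0]) - G_hat) @ beta_hat + J_n
    return beta_star, G_hat, J_n

# Hypothetical data: one weighted regression standing in for the local fit at z.
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = X @ np.array([0.4, 0.6, 1.0, -0.5]) + rng.normal(scale=0.1, size=n)
k = rng.uniform(0.1, 1.0, size=n)           # placeholder kernel weights K_i(z)

XtKX = X.T @ (k[:, None] * X)
beta_hat = np.linalg.solve(XtKX, X.T @ (k * y))

R = np.array([[1.0, 1.0, 0.0, 0.0]])        # hypothetical restriction: b1 + b2 = 1
r = np.array([1.0])
beta_star, G_hat, J_n = restricted_estimate(XtKX, beta_hat, R, r)
```

By construction $R\hat G = R$ and $\hat G J_n = J_n$, so the restricted estimate satisfies the restriction exactly and $[I - \hat G] J_n = 0$.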
Fig. 4. RTS point estimates along with 95% bootstrap confidence intervals for estimates statistically different than 1. Panel (a), SUR, presents point estimates of RTS (as circles) from the SPSCM SUR along with 95% bootstrap confidence intervals (as triangles). Panel (b), single equation, presents point estimates of RTS (as circles) from the SPSCM along with 95% bootstrap confidence intervals (as triangles). The vertical and horizontal dashed lines at 1 represent constant RTS; estimates in the upper right quadrant display increasing RTS while estimates in the lower left quadrant display decreasing RTS.
Let $B$ be any quantity that appears in the leading bias or leading variance of $\hat\beta(z)$. It is easy to show that the leading bias can be estimated by replacing $B$ by $\hat B$, where $\hat B$ is obtained by replacing unknown functions in $B$ by kernel estimators. For example, $\beta_j(z)$ and $\beta_{jj}(z)$ can be estimated by the local quadratic method, $M'(z)$ by the local linear method, and other functions by the local constant method. The leading variance can be estimated by $\hat\Lambda(z) = \nu_0 [I - \hat G](\hat M(z))^{-1}[I - \hat G^T]$, where $\hat G$ is defined a few lines above (16), and $\hat M(z)$ can be obtained by replacing $M(z)$ by its kernel estimator and replacing $\Sigma^{-1}$ by $(\hat\Sigma)^{-1}$, where $\hat\Sigma = n^{-1} \sum_{i=1}^{n} \hat u_i \hat u_i^T$ and $\hat u_i = \tilde y_i - \tilde x_i \hat\beta(z_i)$.
Proof of Theorem 3.1. The proof of Theorem 3.1 closely follows that of Theorem 3.1 in Li et al. (2002) and so we only provide a sketch of the proof here.
Proof of Theorem 3.1(a). First, note that under $H_0$ the identity $\hat u_{i,0} = \tilde y_i - \tilde x_i \hat\beta_0(z_i) = \tilde u_i + \tilde x_i(\beta_0(z_i) - \hat\beta_0(z_i))$ holds. We have $\hat I_n = I_{1n} + 2I_{2n} + I_{3n}$, where
\[ \begin{aligned}
I_{1n} &= (n^2 h^{q_1})^{-1} \sum_{i=1}^{n} \sum_{j \neq i}^{n} \tilde u_i^T \tilde x_i \tilde x_j^T \tilde u_j K_{ij}, \\
I_{2n} &= (n^2 h^{q_1})^{-1} \sum_{i=1}^{n} \sum_{j \neq i}^{n} \tilde u_i^T \tilde x_i \tilde x_j^T \tilde x_j (\beta_0(z_j) - \hat\beta_0(z_j)) K_{ij}, \\
I_{3n} &= (n^2 h^{q_1})^{-1} \sum_{i=1}^{n} \sum_{j \neq i}^{n} (\hat\beta_0(z_i) - \beta_0(z_i))^T \tilde x_i^T \tilde x_i \tilde x_j^T \tilde x_j (\hat\beta_0(z_j) - \beta_0(z_j)) K_{ij}.
\end{aligned} \]
The term $nh^{q_1/2} I_{1n} \xrightarrow{d} N(0, \sigma_0^2)$ via a similar argument as found in the proof of Lemma 1 in Li and Wang (1998). Here we sketch the argument. $I_{1n}$ can be written as the sum of second order, degenerate U-statistics. It is straightforward to show that this second order U-statistic has $E(nh^{q_1/2} I_{1n}) = 0$ and variance $\sigma_0^2 + o(1)$. With these two facts, Hall's (1984) central limit theorem for degenerate U-statistics gives $nh^{q_1/2} I_{1n} \xrightarrow{d} N(0, \sigma_0^2)$. It is easy to show that $\hat\sigma_0^2 = \sigma_0^2 + o_p(1)$. Next, by the fact that $\hat\beta_0(z) - \beta_0(z) = O_p(n^{-1/2})$, similar arguments as in Li and Wang (1998) lead to $I_{2n} = O_p(n^{-1})$, $I_{3n} = O_p(n^{-1})$ and $\hat\sigma_0^2 = \sigma_0^2 + o_p(1)$. Therefore, we have $nh^{q_1/2} \hat I_n / \hat\sigma_0 = nh^{q_1/2} I_{1n} / \sigma_0 + o_p(1) \xrightarrow{d} N(0, 1)$ under $H_0$.
Proof of Theorem 3.1(b). This proof follows directly from the results in Li and Wang (1998). First, one can show that $\hat I_n \xrightarrow{p} I > 0$ under $H_1$. Second, $\hat\sigma_0 = C + o_p(1)$ is easily established under $H_1$, where $C$ is a positive constant. Lastly, $\hat J_n = nh^{q_1/2} \hat I_n / \hat\sigma_0 = nh^{q_1/2}[I/C + o_p(1)]$. As $n \to \infty$, we see that the test statistic $\hat J_n$ diverges to $+\infty$ at the rate of $nh^{q_1/2}$. This proves the desired result.
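The double sums in the decomposition can be evaluated without explicit loops. The sketch below assumes, as the decomposition suggests but this appendix does not state outright, that $\hat I_n$ has the same form as $I_{1n}$ with $\tilde u_i$ replaced by the null residuals $\hat u_{i,0}$. It further assumes a scalar continuous $z$ (so $q_1 = 1$), an unscaled Gaussian kernel for $K_{ij}$, and simulated inputs; the function name and array layout are hypothetical. The variance estimate $\hat\sigma_0$ and the bootstrap are not covered.

```python
import numpy as np

def gaussian_kernel(u):
    # Assumed kernel for K_ij; the appendix does not specify one.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def i_n_statistic(X_tilde, U_hat, z, h):
    """(n^2 h)^{-1} sum_i sum_{j != i} u_i' x_i x_j' u_j K_ij for scalar z.

    X_tilde has shape (n, m, p): one m x p regressor block per observation.
    U_hat has shape (n, m): one m-vector of residuals per observation.
    """
    n = z.shape[0]
    # v_i = x_i' u_i, a p-vector for each observation
    V = np.einsum("imp,im->ip", X_tilde, U_hat)
    K = gaussian_kernel((z[:, None] - z[None, :]) / h)
    np.fill_diagonal(K, 0.0)  # the sums exclude j = i
    return float(np.sum((V @ V.T) * K) / (n**2 * h))

# Simulated placeholders, not the paper's data.
rng = np.random.default_rng(2)
n, m, p = 25, 2, 3
X_tilde = rng.normal(size=(n, m, p))
U_hat = rng.normal(size=(n, m))
z = rng.uniform(size=n)
h = 0.3
stat = i_n_statistic(X_tilde, U_hat, z, h)
```

Writing each summand as $v_i^T v_j K_{ij}$ with $v_i = \tilde x_i^T \hat u_i$ reduces the computation to one matrix product and an elementwise multiplication.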
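Having proven parts (a) and (b), this completes the proof of Theorem 3.1.
Appendix B. Equation by equation SPSCM estimation
We provide a short sketch showing that, when the covariates are identical across the equations (with no restrictions), the estimator that accounts for the covariance structure of the residuals does not differ from the estimator that ignores this structure. Recall that our matrix of covariates in this setting is $\mathbf{X} = I_m \otimes X$, $\Gamma = \Sigma \otimes I_n$ and $\mathbf{K}^{1/2} = I_m \otimes K^{1/2}$. Note that $(A \otimes B)(C \otimes D) = AC \otimes BD$ and $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$. Our feasible SPSCM-SUR estimator is
\[ \begin{aligned}
\hat\beta(z) &= \left( \mathbf{X}^T \mathbf{K}^{1/2} \Gamma^{-1} \mathbf{K}^{1/2} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{K}^{1/2} \Gamma^{-1} \mathbf{K}^{1/2} \mathbf{Y} \\
&= \left( \Sigma^{-1} \otimes X^T K X \right)^{-1} \left( \Sigma^{-1} \otimes X^T K \right) \mathbf{Y} \\
&= \left( I_m \otimes (X^T K X)^{-1} X^T K \right) \mathbf{Y} \\
&= \left( \mathbf{X}^T \mathbf{K} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{K} \mathbf{Y} = \tilde\beta(z).
\end{aligned} \]
Appendix C. Properties of $G(z)$ and $M(z)^{-1}$
We show here that the properties necessary for Proposition 2 of Taylor (1976) hold. First, $G(z)$ is an asymmetric and idempotent matrix:
\[ G(z) = M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R \neq R^T \left[ R M(z)^{-1} R^T \right]^{-1} R M(z)^{-1} = G(z)^T \]
and
\[ \begin{aligned}
G(z) G(z) &= M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} \times R M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R \\
&= M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R = G(z).
\end{aligned} \]
Next, the key assumption in Taylor (1976) is that $G(z) M(z)^{-1} = M(z)^{-1} G(z)^T$. This identity holds by noting that
\[ \begin{aligned}
G(z) M(z)^{-1} &= M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R M(z)^{-1}, \\
M(z)^{-1} G(z)^T &= M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R M(z)^{-1}.
\end{aligned} \]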
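The algebra in Appendices B and C can be checked numerically. The Python sketch below uses randomly generated matrices as placeholders for $X$, $K$, $\Sigma$, $M(z)$ and $R$ (all sizes and values are hypothetical, and the data are stacked equation by equation to match $\mathbf{X} = I_m \otimes X$). It is a sanity check of the identities under those assumptions, not part of the formal argument.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 30, 3, 2                       # hypothetical sample size, equations, regressors

# Appendix B: identical regressors across equations, no restrictions.
X = rng.normal(size=(n, p))
K = np.diag(rng.uniform(0.1, 1.0, size=n))      # placeholder kernel weights
S = rng.normal(size=(m, m))
Sigma = S @ S.T + m * np.eye(m)                 # positive definite error covariance
Y = rng.normal(size=n * m)                      # responses stacked by equation

X_big = np.kron(np.eye(m), X)                   # I_m (x) X
K_half = np.kron(np.eye(m), np.sqrt(K))         # I_m (x) K^{1/2}
Gamma_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))

W = K_half @ Gamma_inv @ K_half
beta_sur = np.linalg.solve(X_big.T @ W @ X_big, X_big.T @ W @ Y)

K_big = np.kron(np.eye(m), K)
beta_eq = np.linalg.solve(X_big.T @ K_big @ X_big, X_big.T @ K_big @ Y)

# Appendix C: properties of G = M^{-1} R' (R M^{-1} R')^{-1} R.
q, s = 5, 2                                     # hypothetical dimensions
B = rng.normal(size=(q, q))
M = B @ B.T + q * np.eye(q)                     # symmetric positive definite M(z)
R = rng.normal(size=(s, q))                     # full-row-rank restriction matrix
M_inv = np.linalg.inv(M)
G = M_inv @ R.T @ np.linalg.inv(R @ M_inv @ R.T) @ R

same_estimator = np.allclose(beta_sur, beta_eq)
idempotent = np.allclose(G @ G, G)
symmetric = np.allclose(G, G.T)
taylor_condition = np.allclose(G @ M_inv, M_inv @ G.T)
```

With these draws, `same_estimator`, `idempotent` and `taylor_condition` evaluate to true while `symmetric` is false, in line with the two appendices.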
We see that $G(z) M(z)^{-1} = M(z)^{-1} G(z)^T$. Thus, the symmetry condition needed for Proposition 2 in Taylor (1976) holds.
References
Asaftei, G., Parmeter, C.F., 2010. Market power, EU integration and privatization: The case of Romania. J. Comp. Econ. 38 (3), 340–356.
Barten, A.P., 1969. Maximum likelihood estimation of a complete system of demand equations. Eur. Econ. Rev. 1, 7–73.
Berger, A.N., 2003. The economic effect of technological progress: Evidence from the banking industry. J. Money Credit Bank. 35, 141–176.
Berger, A.N., Mester, L.J., 2003. Explaining the dramatic changes in performance of US banks: technological change, deregulation, and dynamic changes in competition. J. Financ. Intermed. 12, 57–95.
Berger, A.N., Miller, H.M., Mitchell, A.P., Rajan, R.G., Stein, J.C., 2005. Does function follow organizational form? Evidence from the lending practices of large and small banks. J. Financ. Econ. 76, 237–269.
Bierens, H.J., Ploberger, W., 1997. Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151.
Cai, Z., 2007. Trending time-varying coefficient time series models with serially correlated errors. J. Econometrics 136, 163–188.
Cai, Z., Das, M., Xiong, H., Wu, X., 2006. Functional coefficient instrumental variables models. J. Econometrics 133, 207–241.
Cai, Z., Fan, J., Li, R., 2000a. Efficient estimation and inferences for varying-coefficient models. J. Amer. Statist. Assoc. 95, 888–902.
Cai, Z., Fan, J., Yao, Q., 2000b. Functional-coefficient regression models for nonlinear time series. J. Amer. Statist. Assoc. 95, 941–956.
Cai, Z., Li, Q., 2008. Nonparametric estimation of varying coefficient dynamic panel data models. Econometric Theory 24, 1321–1342.
Cai, Z., Li, Q., Park, J.Y., 2009. Functional-coefficient models for nonstationary time series data. J. Econometrics 148, 101–113.
Christensen, L.R., Greene, W.H., 1976. Economies of scale in U.S. electric power generation. J. Polit. Econ. 84, 655–676.
Cleveland, W.S., Grosse, E., Shyu, W.M., 1991. Local regression models. In: Chambers, J.M., Hastie, T. (Eds.), Statistical Models in S. Wadsworth and Brooks/Cole, Pacific Grove, pp. 309–376.
Das, M., 2005. Instrumental variables estimators of nonparametric models with discrete endogenous regressors. J. Econometrics 124, 335–361.
Fan, J., Huang, T., 2005. Profile likelihood inference on semiparametric varying-coefficient partially linear models. Bernoulli 11, 1031–1057.
Fan, J., Zhang, W., 1999. Statistical estimation in varying-coefficient models. Ann. Statist. 27, 1491–1518.
Feng, G., Serletis, A., 2009. Efficiency and productivity of the US banking industry, 1998–2005: evidence from the Fourier cost function satisfying global regularity conditions. J. Appl. Econometrics 24, 105–138.
Feng, G., Serletis, A., 2010. Efficiency, technical change, and returns to scale in large US banks: Panel data evidence from an output distance function satisfying theoretical regularity. J. Bank. Finance 34 (1), 127–138.
Hall, P., 1984. Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal. 14 (1), 1–16.
Hall, P., Li, Q., Racine, J.S., 2007. Nonparametric estimation of regression functions in the presence of irrelevant variables. Rev. Econ. Stat. 89, 784–789.
Hastie, T., Tibshirani, R., 1993. Varying-coefficient models. J. R. Stat. Soc. Ser. B Stat. Methodol. 55, 757–796.
Henderson, D.J., Carroll, R.J., Li, Q., 2008. Nonparametric estimation and testing of fixed effects panel data models. J. Econometrics 144, 257–275.
Henderson, D.J., Kumbhakar, S.C., Parmeter, C.F., 2012. A simple method to visualize results in nonlinear regression models. Econom. Lett. 117, 578–581.
Henderson, D.J., Maasoumi, E., 2013. Searching for rehabilitation in nonparametric regression models with exogenous treatment assignment. In: Ullah, A., Racine, J.S., Su, L. (Eds.), Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics. Oxford University Press, New York, pp. 501–520.
Henderson, D.J., Ullah, A., 2005. A nonparametric random effects estimator. Econom. Lett. 88, 403–407.
Huber, P., 1985. Projection Pursuit. Ann. Statist. 13, 435–475.
Hughes, J.P., Mester, L.J., 2013. Who said large banks don’t experience scale economies? Evidence from a risk-return-driven cost function. J. Financ. Intermed. 22, 559–585.
Jun, S.J., 2009. Local structural quantile effects in a model with a nonseparable control variable. J. Econometrics 151, 82–97.
Jun, S.J., Pinkse, J., 2009. Efficient semiparametric seemingly unrelated quantile regression estimation. Econometric Theory 25, 1392–1414.
Lavergne, P., Vuong, Q., 2000. Nonparametric significance testing. Econometric Rev. 16, 576–601.
Lee, T.-H., Ullah, A., 2001. Nonparametric bootstrap tests for neglected nonlinearity in time series regression models. J. Nonparametr. Stat. 13, 425–451.
Li, Q., Huang, C.J., Li, D., Fu, T.-T., 2002. Semiparametric smooth coefficient models. J. Bus. Econom. Statist. 20, 412–422.
Li, Q., Racine, J.S., 2010. Smooth varying-coefficient estimation and inference for qualitative and quantitative data. Econometric Theory 26, 1607–1637.
Li, Q., Ouyang, D.S., Racine, J.S., 2013. Categorical semiparametric varying-coefficient models. J. Appl. Econometrics 28, 551–579.
Li, Q., Wang, S., 1998. A simple consistent bootstrap test for a parametric regression function. J. Econometrics 87, 145–165.
Lin, X., Carroll, R.J., 2000. Nonparametric function estimation for clustered data when the predictor is measured without/with error. J. Amer. Statist. Assoc. 95, 520–534.
Mamuneas, T.P., Savvides, A., Stengos, T., 2006. Economic development and the return to human capital: a smooth coefficient semiparametric approach. J. Appl. Econometrics 21, 111–132.
Matzkin, R.L., 2008. Identification in nonparametric simultaneous equations. Econometrica 76, 945–978.
Orbe, S., Ferreira, E., Rodriguez-Poo, J.M., 2003. An algorithm to estimate time varying parameters SURE models under different types of restriction. Comput. Statist. Data Anal. 42, 363–383.
Orbe, S., Ferreira, E., Rodriguez-Poo, J.M., 2005. Nonparametric estimation of time varying parameters under shape restrictions. J. Econometrics 126, 53–77.
Ouyang, D., Li, Q., Racine, J., 2009. Nonparametric estimation of regression functions with discrete regressor. Econometric Theory 25, 1–42.
Racine, J.S., Li, Q., 2004. Nonparametric estimation of regression functions with both categorical and continuous data. J. Econometrics 119, 99–130.
Restrepo-Tobón, D., Kumbhakar, S.C., 2012. Measuring Profit Efficiency without Estimating a Profit Function: The Case of U.S. Commercial Banks. Working paper, State University of New York at Binghamton.
Robinson, P., 1989. Nonparametric estimation of time-varying parameters. In: Hackl, P. (Ed.), Analysis and Forecasting of Economic Structural Change. North Holland, Amsterdam.
Taylor, W.E., 1976. Prior information on the coefficients when the disturbance covariance matrix is unknown. Econometrica 44, 725–739.
Welsh, A.H., Yee, T.W., 2006. Local regression for vector responses. J. Statist. Plann. Inference 136, 3007–3031.
Wheelock, D.C., Wilson, P.W., 2012. Do large banks have lower costs? New estimates of returns to scale for U.S. banks. J. Money Credit Bank. 44, 171–199.
Zellner, A., 1962. An efficient method for estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57, 585–612.