Gradient estimation of the local-constant semiparametric smooth coefficient model


Economics Letters 185 (2019) 108684

Contents lists available at ScienceDirect

Economics Letters journal homepage: www.elsevier.com/locate/ecolet

Xin Geng a, Kai Sun b,*

a School of Finance, Nankai University, Tianjin, China
b School of Economics, Shanghai University, Shanghai, China

Article info

Article history: Received 25 July 2019; received in revised form 8 September 2019; accepted 9 September 2019; available online 14 September 2019.
JEL classification: C14
Keywords: Semiparametric smooth coefficient model; Partial derivative

Abstract

This paper studies the analytic gradient of the local-constant estimator for the semiparametric smooth coefficient (SPSC) model. This gradient estimator is shown to be consistent and asymptotically normal. A gradient-based cross-validation method for bandwidth selection is proposed for the SPSC model. Simulations suggest that the analytic gradient of the local-constant estimator outperforms its local-linear counterpart when the sample size is relatively large. The gradient estimators are then applied to estimate the marginal effects of research and development on capital and labor productivity in China's high-technology industry.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Consider a semiparametric smooth coefficient (SPSC) model (Fan and Zhang, 1999; Li et al., 2002):

$$y_i = x_i'\beta(z_i) + u_i, \tag{1}$$

where $y_i$ is a scalar, $x_i$ is a $p \times 1$ vector of regressors, $z_i$ is a $q \times 1$ vector of covariates, $\beta(\cdot): \mathbb{R}^q \to \mathbb{R}^p$ is an unknown function of $z_i$, and $u_i$ is the noise term. To estimate the gradient vector of the $j$th element of $\beta(z)$, that is, $D\beta_j(z)$ of dimension $q \times 1$, for $j = 1, \ldots, p$, a natural estimator is the local-linear SPSC estimator, which is easily motivated. By a Taylor approximation, given $z$ and for $z_i$ in a neighborhood of $z$, $\beta_j(z_i) \approx \beta_j(z) + D\beta_j(z)'(z_i - z)$. Let $\alpha(z) = [\beta(z)', D\beta(z)']'$ and $S_i = [x_i', (x_i \otimes (z_i - z))']'$, where $D\beta(z) \equiv [D\beta_1(z)', \ldots, D\beta_p(z)']'$ and $\otimes$ denotes the Kronecker product. The local-linear SPSC estimator is the solution to the sample moment condition $\sum_{i=1}^n S_i\,(y_i - S_i'\alpha(z))\,K_i(z) = 0$, that is,

$$\hat\alpha(z) = \left[\hat\beta(z)', \widehat{D\beta}(z)'\right]' = \left(\sum_{i=1}^n S_i S_i' K_i(z)\right)^{-1} \sum_{i=1}^n S_i y_i K_i(z),$$

where $n$ denotes the sample size, $K_i(z) = K\left(H^{-1}(z_i - z)\right)$ is a kernel function, and $H = \mathrm{diag}\{h_s\}_{s=1}^q$ is a diagonal matrix of bandwidths.

An alternative estimator for the gradient of $\beta(z)$ is the analytic gradient of the local-constant SPSC estimator, which has not yet been carefully studied in the literature. We show in this paper that (1) the gradient of the local-constant SPSC estimator can be easily obtained via another local-constant SPSC regression, of the transformed residuals from the original estimating equation on the original regressors; (2) this gradient estimator is consistent and asymptotically normal, with the typical convergence rate of a gradient estimator in a fully nonparametric model (Rilstone and Ullah, 1989); moreover, under the Gaussian kernel, the kernel most frequently used in empirical work, its asymptotic distribution is exactly the same as that of the gradient part of the local-linear SPSC estimator; and, most importantly, (3) the local-constant gradient estimator has finite-sample advantages over its local-linear counterpart in most cases of our Monte Carlo study. In particular, the mean squared error (MSE) of the local-constant estimator decreases with the sample size at a faster rate than that of the local-linear counterpart, and the local-constant estimator eventually performs better with an adequately large sample. These findings are based on two bandwidth choices: the rule-of-thumb (ROT) bandwidths for gradient estimation, and gradient-based cross-validation (GBCV) for the SPSC model. That is, we adapt the fully nonparametric GBCV method proposed in Henderson et al. (2015) so that it works for the SPSC model. This bandwidth selection method yields the optimal bandwidth by minimizing the MSE of the gradient of the regression function, which makes it a natural choice for our study. For an empirical illustration, we use the local-constant and local-linear gradient estimators to estimate and compare the marginal effects of research and development (R&D) on capital and labor productivity in China's high-technology industry, and find that the local-constant gradient estimates vary less than their local-linear counterparts.

* Correspondence to: Room 514, Economics and Management Building, School of Economics, Shanghai University, Shanghai, 200444, China. E-mail address: [email protected] (K. Sun).
https://doi.org/10.1016/j.econlet.2019.108684
0165-1765/© 2019 Elsevier B.V. All rights reserved.
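As an illustration only, the following NumPy sketch computes the local-linear SPSC estimator $\hat\alpha(z)$ from the sample moment condition defined above. It assumes a product Gaussian kernel for $K_i(z)$ (the general discussion leaves $K$ unspecified), and the helper names `gaussian_product_kernel` and `local_linear_spsc` are hypothetical.

```python
import numpy as np


def gaussian_product_kernel(Z, z, h):
    """K_i(z) = prod_s k((z_si - z_s) / h_s), with k the standard normal density."""
    U = (Z - z) / h                                   # (n, q) scaled distances
    return np.prod(np.exp(-0.5 * U**2) / np.sqrt(2 * np.pi), axis=1)


def local_linear_spsc(y, X, Z, z, h):
    """Local-linear SPSC estimate at the point z.

    y: (n,), X: (n, p), Z: (n, q), z: (q,), h: (q,) bandwidths.
    Returns beta_hat with shape (p,) and Dbeta_hat with shape (p, q),
    whose row j estimates the gradient of beta_j at z.
    """
    n, p = X.shape
    q = Z.shape[1]
    K = gaussian_product_kernel(Z, z, h)              # (n,) kernel weights
    # S_i = [x_i ; x_i kron (z_i - z)], stacked row-wise into an (n, p + p*q) array
    kron = np.einsum("ij,ik->ijk", X, Z - z).reshape(n, p * q)
    S = np.hstack([X, kron])
    A = S.T @ (S * K[:, None])                        # sum_i S_i S_i' K_i(z)
    b = S.T @ (y * K)                                 # sum_i S_i y_i K_i(z)
    alpha_hat = np.linalg.solve(A, b)
    return alpha_hat[:p], alpha_hat[p:].reshape(p, q)
```

Because the `einsum` product is reshaped in row-major order, the stacked regressors follow the Kronecker ordering of $x_i \otimes (z_i - z)$, so row $j$ of the returned gradient matrix lines up with $D\beta_j(z)'$. When the true coefficients are linear in $z$ and there is no noise, the estimator reproduces them exactly for any bandwidth, which is a convenient sanity check.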


The rest of this paper is organized as follows. Section 2 gives the gradient of the local-constant SPSC estimator and establishes its asymptotic distribution. Sections 3 and 4 provide a Monte Carlo study and an empirical example, respectively. Section 5 concludes.

2. Gradient estimation of the local-constant semiparametric smooth coefficient model

It is straightforward to derive the local-constant SPSC estimator. We first pre-multiply both sides of (1) by $x_i$, take expectations of both sides conditional on $z_i = z$, and use the moment condition $E(x_i u_i \mid z_i = z) = 0$, which gives $E(x_i y_i \mid z_i = z) = E(x_i x_i' \mid z_i = z)\,\beta(z)$. Then, using Nadaraya–Watson kernel estimators for $E(x_i y_i \mid z_i = z)$ and $E(x_i x_i' \mid z_i = z)$, we can write the local-constant SPSC estimator as (see Li et al., 2002; Li and Racine, 2007)

$$\hat\beta(z) = \left(\sum_{i=1}^n x_i x_i' K_i(z)\right)^{-1} \sum_{i=1}^n x_i y_i K_i(z). \tag{2}$$

Alternatively, $\hat\beta(z)$ in (2) can be defined from the locally weighted sample moment condition

$$\sum_{i=1}^n x_i \hat u_i(z) K_i(z) = 0, \tag{3}$$

where $\hat u_i(z) = y_i - x_i'\hat\beta(z)$. Differentiating both sides of (3) with respect to the $l$th continuous $z$ variable, $z_l$, we obtain the partial derivative of $\hat\beta(z)$ with respect to $z_l$ as

$$D_l\hat\beta(z) \equiv \frac{\partial\hat\beta(z)}{\partial z_l} = \left(\sum_{i=1}^n x_i x_i' K_i(z)\right)^{-1} \sum_{i=1}^n x_i \hat u_i(z) G_{li}(z) = \left(\sum_{i=1}^n x_i x_i' K_i(z)\right)^{-1} \sum_{i=1}^n x_i \hat u^*_{li}(z) K_i(z), \tag{4}$$

where $G_{li}(z) = \partial K_i(z)/\partial z_l$, for $l = 1, \ldots, q$ and $i = 1, \ldots, n$, is the partial derivative of $K_i(z)$ with respect to the $l$th continuous $z$ variable, and, whenever $K_i(z) \neq 0$, $\hat u^*_{li}(z) = G_{li}(z)\hat u_i(z)/K_i(z)$. Therefore, the derivative of the local-constant SPSC estimator is simply another local-constant SPSC regression of the transformed residuals, $\hat u^*_{li}(z)$, on the original regressors, $x_i$ and $z_i$.¹ That is, (4) reveals that the analytic derivative of $\hat\beta(z)$ with respect to $z_l$ is the SPSC estimator of $\delta(z)$ in

$$\hat u^*_{li}(z) = x_i'\delta(z_i) + e_i, \tag{5}$$

where $e_i$ is the noise term such that $E(x_i e_i \mid z_i = z) = 0$. The following theorem establishes the consistency and asymptotic normality of $D_l\hat\beta(z)$; the proof is given in the Appendix.

Theorem 2.1. Under conditions (A1) and (A2) given in the Appendix, for a fixed value of $z$ with $f_z(z) > 0$, where $f_z(z)$ is the marginal density function of $z_i$, and for $l = 1, \ldots, q$, we have (a) $D_l\hat\beta(z) - D_l\beta(z) \xrightarrow{p} 0$, and (b) $\sqrt{n h^q}\,h\,\left(D_l\hat\beta(z) - D_l\beta(z)\right) \xrightarrow{d} N(0, \Sigma_z)$, provided that $M_z \equiv f_z(z)\,E(x_i x_i' \mid z_i = z)$ is positive definite, where $\Sigma_z \equiv M_z^{-1} V_z M_z^{-1}$, $V_z \equiv f_z(z)\,E\left(x_i x_i' \sigma_u^2(x_i, z_i) \mid z_i = z\right) \int [D_l K(\gamma)]^2\, d\gamma$, and $\sigma_u^2(x_i, z_i) \equiv E(u_i^2 \mid x_i, z_i)$.

¹ This formula is convenient for researchers who want to calculate the marginal effects of $z_l$ on $\hat y_i$, as $\partial\hat y_i/\partial z_l = x_i'\,\partial\hat\beta(z)/\partial z_l$.

Remark 1. For the product Gaussian kernel $K_i(z) = K\left(\frac{z_i - z}{h}\right) = \prod_{l=1}^q k\left(\frac{z_{li} - z_l}{h_l}\right)$ with

$$k\left(\frac{z_{li} - z_l}{h_l}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{z_{li} - z_l}{h_l}\right)^2\right), \quad \forall\, l = 1, \ldots, q, \tag{6}$$

it can be shown that

$$G_{li}(z) = \frac{\partial}{\partial z_l} k\left(\frac{z_{li} - z_l}{h_l}\right) \prod_{s \neq l} k\left(\frac{z_{si} - z_s}{h_s}\right) = \frac{z_{li} - z_l}{h_l^2}\, K\left(\frac{z_i - z}{h}\right). \tag{7}$$

Therefore, the transformed residual in this case is $\hat u^*_{li}(z) = \frac{z_{li} - z_l}{h_l^2}\,\hat u_i(z)$.

Remark 2. It can be shown that the asymptotic distribution of the analytic derivative of the local-constant estimator, $D_l\hat\beta(z)$, and that of the gradient part of the local-linear estimator, say $\widehat{D_l\beta}(z)$, are identical when a product Gaussian kernel is used. In particular, the asymptotic distribution of $\widehat{D_l\beta}(z)$ can be inferred from Cai et al. (2006) as the special case without endogeneity, that is, $\sqrt{n h^q}\,h\,\left(\widehat{D_l\beta}(z) - D_l\beta(z)\right) \xrightarrow{d} N(0, \Sigma_z^*)$, where $\Sigma_z^* \equiv M_z^{-1} V_z^* M_z^{-1}$,

$$V_z^* \equiv C^* f_z(z)\,E\left(x_i x_i' \sigma_u^2(x_i, z_i) \mid z_i = z\right), \quad \text{with} \quad C^* \equiv \left(\int k^2(u)\,du\right)^{q-1} \frac{\int u^2 k^2(u)\,du}{\left(\int u^2 k(u)\,du\right)^2}.$$

Note that $V_z$ and $V_z^*$ differ only by a multiplicative constant that depends on the kernel. Using a product kernel $K(\gamma) = \prod_{s=1}^q k(\gamma_s)$, the constant in $V_z$ is

$$C \equiv \int [D_l K(\gamma)]^2\, d\gamma = \left(\int k^2(u)\,du\right)^{q-1} \int [k'(u)]^2\, du.$$
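To make equations (2) and (4) and Remark 1 concrete, here is a minimal NumPy sketch, assuming the product Gaussian kernel of (6). It computes $\hat\beta(z)$, forms the transformed residuals $\hat u^*_{li}(z) = (z_{li} - z_l)\hat u_i(z)/h_l^2$, and runs the second local-constant regression. A second helper evaluates the kernel constants $C$ and $C^*$ of Remark 2 numerically. The function names and the grid settings are illustrative choices, not taken from the paper.

```python
import numpy as np


def local_constant_spsc(y, X, Z, z, h):
    """Local-constant SPSC estimate, eq. (2), and its analytic gradient, eq. (4),
    at the point z, using the product Gaussian kernel of Remark 1.

    y: (n,), X: (n, p), Z: (n, q), z: (q,), h: (q,) bandwidths.
    Returns beta_hat with shape (p,) and D_hat with shape (p, q), whose
    column l holds the derivative of beta_hat(z) with respect to z_l.
    """
    U = (Z - z) / h                                           # scaled z_i - z
    K = np.prod(np.exp(-0.5 * U**2) / np.sqrt(2 * np.pi), axis=1)  # K_i(z)
    A = X.T @ (X * K[:, None])                                # sum_i x_i x_i' K_i(z)
    beta_hat = np.linalg.solve(A, X.T @ (y * K))              # eq. (2)
    u_hat = y - X @ beta_hat                                  # residuals u_hat_i(z)
    D_hat = np.empty((X.shape[1], Z.shape[1]))
    for l in range(Z.shape[1]):
        # Remark 1: u*_li(z) = (z_li - z_l) / h_l^2 * u_hat_i(z)
        u_star = (Z[:, l] - z[l]) / h[l] ** 2 * u_hat
        # eq. (4): a second local-constant SPSC regression, of u*_li on x_i
        D_hat[:, l] = np.linalg.solve(A, X.T @ (u_star * K))
    return beta_hat, D_hat


def gaussian_constants(q, half_width=10.0, num=200_001):
    """Constants C (in V_z) and C* (in V_z^*) of Remark 2 for the Gaussian
    kernel, approximated by a Riemann sum on a fine, wide grid."""
    u = np.linspace(-half_width, half_width, num)
    du = u[1] - u[0]
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    dk = -u * k                                               # k'(u) for the Gaussian

    def integral(f):
        return float(np.sum(f) * du)

    base = integral(k**2) ** (q - 1)
    C = base * integral(dk**2)
    C_star = base * integral(u**2 * k**2) / integral(u**2 * k) ** 2
    return C, C_star
```

Comparing `D_hat` with a central finite difference of `beta_hat` in $z$ is a quick way to confirm that (4) is the exact derivative of (2). For the Gaussian kernel, $\int [k'(u)]^2 du = \int u^2 k^2(u)\,du = 1/(4\sqrt{\pi})$ and $\int u^2 k(u)\,du = 1$, so the two constants coincide.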

It turns out that, for the Gaussian kernel, the constants $C^*$ and $C$ are identical, leading to exactly the same asymptotic distribution for these two gradient estimators. We therefore use a Monte Carlo study to examine their finite-sample performance.

3. Monte Carlo study

In the previous section, we showed that the local-constant and local-linear SPSC gradient estimators are asymptotically equivalent under the Gaussian kernel, the kernel most frequently used in empirical work. In this section, we compare their finite-sample performance in Monte Carlo experiments based on the following data generating process (DGP):

$$y_i = \beta_0(z_i) + \beta_1(z_i) X_i + u_i,$$

where $z_i \sim U[0, 1]$, $X_i \sim N(0, 1)$, and $u_i \sim N(0, \sigma^2)$ with $\sigma = 0.5$ or $1$, for $i = 1, \ldots, n$, and where $\beta_0(z_i) = \sin(2\pi z_i)$ and $\beta_1(z_i) = -\cos(2\pi z_i)$. Therefore, $D\beta_0(z_i) \equiv \partial\beta_0(z_i)/\partial z_i = 2\pi\cos(2\pi z_i)$ and $D\beta_1(z_i) \equiv \partial\beta_1(z_i)/\partial z_i = 2\pi\sin(2\pi z_i)$.²

² We choose this DGP following Racine (2016), which studies fully nonparametric models and compares the local-linear derivative estimator (Taylor-based) with the analytic derivative of the local-linear estimator (whose asymptotic properties are not established there). A simple Monte Carlo study in Racine (2016) suggests that the analytic derivative has lower bias, while the Taylor-based derivative estimator might be preferred from a squared-error-loss perspective.

We use the ROT bandwidths for gradient estimation, $h_{ROT} = 1.06 \cdot \mathrm{sd}(z_i) \cdot n^{-1/7}$. In addition, we use an alternative bandwidth selection method, the GBCV, obtained by adapting the fully nonparametric GBCV proposed in Henderson et al. (2015) to the SPSC model. Specifically, we minimize the following cross-validation (CV) function:

$$CV(h) = \frac{1}{n}\sum_{i=1}^n \left(x_i'\frac{\partial\hat\beta(z_i)}{\partial z_i} - x_i'\frac{\partial\beta(z_i)}{\partial z_i}\right)^2,$$

where $x_i' = [1, X_i]$; $\partial\hat\beta(z_i)/\partial z_i = [\partial\hat\beta_0(z_i)/\partial z_i, \partial\hat\beta_1(z_i)/\partial z_i]'$ is the SPSC gradient estimator (either the local-constant or the local-linear one, with a GBCV bandwidth obtained separately for each); and $\partial\beta(z_i)/\partial z_i = [2\pi\cos(2\pi z_i), 2\pi\sin(2\pi z_i)]'$ is the true gradient vector given earlier in this section. Minimizing this CV function, which is the MSE of the gradient of the regression function, yields the optimal bandwidth, so it is a natural choice for the current setting.³ For a given Monte Carlo replication, we consider the following measures of performance for the estimator of the derivative of the slope:

$$\text{bias} = \frac{1}{n}\sum_{i=1}^n \left(\frac{\partial\hat\beta_1(z_i)}{\partial z_i} - \frac{\partial\beta_1(z_i)}{\partial z_i}\right), \qquad \text{MSE} = \frac{1}{n}\sum_{i=1}^n \left(\frac{\partial\hat\beta_1(z_i)}{\partial z_i} - \frac{\partial\beta_1(z_i)}{\partial z_i}\right)^2,$$

and variance is computed as MSE minus squared bias. We draw $M = 1000$ Monte Carlo replications from this DGP and consider sample sizes of $n = 100$, 200, 400, 800, and 1600. Table 1 reports the median bias, variance, and MSE over the $M$ replications for each sample size. Although the local-constant and local-linear gradient estimators have the same asymptotic variance, when the ROT bandwidths are used, the median variance and MSE of the local-constant estimator decrease faster with the sample size than those of the local-linear counterpart, and the local-constant estimator eventually performs better in all three measures with an adequately large sample. A similar pattern holds when $\sigma = 0.5$ and the GBCV bandwidths are used: the local-constant estimator performs better in all three measures once the sample size is large enough. For $\sigma = 1$ with the GBCV bandwidths, the local-constant estimator performs better in terms of variance and MSE for all sample sizes considered. The relative performance of the local-constant gradient estimator thus seems to improve as the error variance ($\sigma^2$) increases. In summary, in this simple simulation study, where the DGP and the bandwidth selection methods are commonly used, the analytic gradient of the local-constant estimator outperforms its local-linear counterpart when the sample size is large enough, with either the ROT or the GBCV bandwidths.⁴

4. Empirical example

This section presents a simple empirical example of calculating the derivatives. Following Li et al. (2002) and Zhang et al. (2012), we estimate a production function specified as

$$\ln Y_i = \beta_0(z_i) + \beta_1(z_i)\ln K_i + \beta_2(z_i)\ln L_i + u_i, \tag{8}$$

where $Y$ is output, $K$ and $L$ are the capital and labor inputs, respectively, and $z = \ln R\&D$. We use the data set on China's high-technology industry from Zhang et al. (2012), in which details of data construction and summary statistics of the variables can be found.

³ Note that this GBCV method for the SPSC model is infeasible in practice because the true gradient vector is unknown in a real application. We leave a feasible GBCV method for the SPSC model to future work.
⁴ We leave more extensive Monte Carlo studies under more complex scenarios to future work.

Here, we are particularly interested in calculating the marginal effects of $\ln R\&D$ on $\beta_1$ and $\beta_2$, i.e., on capital and labor productivity, respectively, in addition to estimating $\beta_1$ and $\beta_2$ themselves. Fig. 1 shows the kernel density plots of the smooth coefficient estimates. Using the ROT bandwidth,⁵ the local-constant coefficient estimates vary less than the local-linear ones for this particular sample. Economic theory indicates that $\beta_1$ and $\beta_2$ are non-negative, and violations of economic theory are less likely to occur when the coefficient estimates vary less.⁶ Furthermore, the local-linear estimates have heavier right tails that extend beyond unity, which suggests that some local-linear estimates are implausibly large and thus less economically meaningful. For the local-constant estimates, the median $\hat\beta_1$ is 0.3804 and the median $\hat\beta_2$ is 0.7068; these estimates are in line with conventional wisdom regarding the capital share and the labor share, respectively. Fig. 2 shows the kernel density plots of the gradient estimates. These gradient estimates measure input bias (Stevenson, 1980), i.e., the marginal effects of $\ln R\&D$ on the elasticities of capital and labor, respectively. Using the ROT bandwidth for gradient estimation, the local-constant gradient estimates vary less than their local-linear counterparts. Policy-making would be easier with the local-constant gradient estimates because the marginal effects are not overly heterogeneous. Similar to the findings in Zhang and Sun (2019), we find that the capital (labor) share decreases (increases) with $\ln R\&D$ for most of the local-constant gradient estimates. In particular, the capital (labor) share increases with $\ln R\&D$ for about 38% (63%) of the observations.
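The sign shares quoted above, and the productivity of R&D discussed next, are simple summaries of per-observation gradient estimates combined through the formula $\partial\hat y_i/\partial z_l = x_i'\,\partial\hat\beta(z)/\partial z_l$. A minimal sketch, assuming hypothetical arrays `dbeta`, `lnK`, and `lnL` (the paper's data are not reproduced here) and that each row of `dbeta` was obtained at $z = z_i$ with one of the gradient estimators of Section 2; the function names are ours.

```python
import numpy as np


def rd_marginal_effects(dbeta, lnK, lnL):
    """Marginal effect of z on fitted log output in model (8), per observation.

    dbeta: (n, 3) hypothetical array; its columns are the estimated derivatives
    of beta_0, beta_1, beta_2 with respect to z, each evaluated at z = z_i.
    lnK, lnL: (n,) arrays of log capital and log labor.
    Returns x_i' dbeta_i with x_i' = [1, ln K_i, ln L_i].
    """
    x = np.column_stack([np.ones_like(lnK), lnK, lnL])
    return np.sum(x * dbeta, axis=1)


def share_positive(g):
    """Fraction of observations with a strictly positive gradient estimate."""
    return float(np.mean(np.asarray(g) > 0))


def summarize(dbeta, lnK, lnL):
    """Sign shares of the slope gradients and the median effect of z on ln Y."""
    effects = rd_marginal_effects(dbeta, lnK, lnL)
    return {
        "share_dbeta1_positive": share_positive(dbeta[:, 1]),
        "share_dbeta2_positive": share_positive(dbeta[:, 2]),
        "median_dlnY_dz": float(np.median(effects)),
    }
```

Because $z = \ln R\&D$ and the left-hand side of (8) is $\ln Y$, the median of `rd_marginal_effects` is read as an elasticity: a 1% increase in R&D changes output by roughly that many percent, ceteris paribus.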
In terms of $\partial\hat\beta_2(z)/\partial z$, most positive estimates come from the central region⁷ of China, known for its well-established manufacturing infrastructure, where R&D investment seems most likely to boost labor productivity. Finally, based on the gradient estimates of all the smooth coefficients in (8), we obtain the productivity of R&D, i.e., $\partial\ln Y/\partial z$, and find that its median value is 0.0464: if R&D investment is increased by 1%, ceteris paribus, output would increase by 0.0464%. Thus, in general, R&D has a positive effect on output. More specifically, the productivity of R&D varies across regions: the central region, generally speaking, enjoys relatively larger output-enhancing effects of R&D than the other two regions.

⁵ This bandwidth is $1.06 \cdot \mathrm{sd}(z_i) \cdot n^{-1/5}$, which differs from the ROT bandwidth for gradient estimation.
⁶ In fact, all the coefficient estimates are strictly positive with the local-constant SPSC estimator, whereas $\hat\beta_1$ is negative for one observation with the local-linear SPSC estimator.
⁷ See Zhang et al. (2012) for details about the eastern, central, and western regions of China.

5. Conclusion

This paper derives the analytic derivative of the local-constant smooth coefficient estimator, $\hat\beta(z)$, with respect to continuous $z$ variables. The formula for the gradient estimator shows that the derivative estimates of the local-constant SPSC estimator are based on the residuals from the original local-constant SPSC regression. We study the asymptotic properties of this gradient estimator; in particular, it has the same asymptotic distribution as the gradient part of the local-linear SPSC estimator under the Gaussian kernel. A Monte Carlo study shows that the local-constant gradient estimator has finite-sample advantages over its local-linear counterpart under certain circumstances. We illustrate the use of this estimator with an empirical example estimating the marginal effects of R&D on capital and labor productivity in China's high-technology industry.

Table 1
Performance of the semiparametric gradient estimators. Entries are medians over the M = 1000 Monte Carlo replications.

σ = 0.5
          Local-constant                   Local-linear
n         Bias      Variance   MSE         Bias      Variance   MSE
ROT bandwidths
100       0.0176    4.5401     4.6903      0.0099    3.8322     3.9261
200      −0.0036    3.3394     3.3924     −0.0086    3.2162     3.2614
400       0.0104    2.3578     2.3772      0.0094    2.6452     2.6752
800       0.0044    1.6875     1.6949      0.0007    2.1949     2.2061
1600     −0.0016    1.1759     1.1776     −0.0081    1.7921     1.7985
GBCV bandwidths
100       0.0124    2.5855     2.6920      0.0193    2.0850     2.2005
200      −0.0056    1.6960     1.7321     −0.0042    1.3795     1.4485
400       0.0061    1.1134     1.1230      0.0062    0.9908     1.0222
800       0.0025    0.7250     0.7379      0.0018    0.7099     0.7320
1600      0.0002    0.4925     0.4976      0.0010    0.5013     0.5132

σ = 1
          Local-constant                   Local-linear
n         Bias      Variance   MSE         Bias      Variance   MSE
ROT bandwidths
100       0.0213    4.7200     4.8473     −0.0108    4.2872     4.5205
200      −0.0019    3.4220     3.4953     −0.0110    3.5015     3.6457
400       0.0112    2.4467     2.4840      0.0150    2.8288     2.8861
800       0.0062    1.7393     1.7558      0.0012    2.2807     2.3243
1600     −0.0020    1.2150     1.2219     −0.0086    1.8664     1.8795
GBCV bandwidths
100       0.0160    3.3714     3.4946      0.0358    3.5983     3.9002
200      −0.0080    2.1789     2.2799      0.0093    2.5441     2.7266
400      −0.0069    1.4951     1.5318      0.0055    1.7898     1.8868
800       0.0100    1.0321     1.0569      0.0110    1.3005     1.3587
1600     −0.0014    0.7161     0.7291      0.0068    0.9500     0.9747

Fig. 1. Kernel density plots of the smooth coefficient estimates.

Fig. 2. Kernel density plots of the gradient estimates.

Acknowledgments

This research is funded by the National Natural Science Foundation of China (Grant ID number: 71801146). The authors would like to thank Carlos Martins-Filho and an anonymous referee for useful comments and suggestions, and are responsible for all remaining errors.

Appendix. Proof of Theorem 2.1

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.econlet.2019.108684.

References

Cai, Z., Das, M., Xiong, H., Wu, X., 2006. Functional coefficient instrumental variables models. J. Econometrics 133, 207–241.
Fan, J., Zhang, W., 1999. Statistical estimation in varying coefficient models. Ann. Statist. 27, 1491–1518.
Henderson, D., Li, Q., Parmeter, C., Yao, S., 2015. Gradient-based smoothing parameter selection for nonparametric regression estimation. J. Econometrics 184, 233–241.
Li, Q., Huang, C., Li, D., Fu, T., 2002. Semiparametric smooth coefficient models. J. Bus. Econom. Statist. 20 (3), 412–422.
Li, Q., Racine, J.S., 2007. Nonparametric Econometrics: Theory and Practice. Princeton University Press.
Racine, J.S., 2016. Local Polynomial Derivative Estimation: Analytic or Taylor? Essays in Honor of Aman Ullah. Emerald Group Publishing Limited.
Rilstone, P., Ullah, A., 1989. Nonparametric estimation of response coefficients. Comm. Statist. Theory Methods 18 (7), 2615–2627.
Stevenson, R., 1980. Measuring technological bias. Amer. Econ. Rev. 70, 162–173.
Zhang, Y.-F., Sun, K., 2019. How does infrastructure affect economic growth? Insights from a semiparametric smooth coefficient approach and the case of telecommunications in China. Econ. Inq. 57 (3), 1239–1255.
Zhang, R., Sun, K., Delgado, M., Kumbhakar, S., 2012. Productivity in China's high technology industry: Regional heterogeneity and R&D. Technol. Forecast. Soc. Change 79, 127–141.