Journal Pre-proof Research on regional differences and influencing factors of green technology innovation efficiency of China’s high-tech industry Chunyang Liu, Xingyu Gao, Wanli Ma, Xiangtuo Chen
PII: S0377-0427(19)30602-8
DOI: https://doi.org/10.1016/j.cam.2019.112597
Reference: CAM 112597
To appear in: Journal of Computational and Applied Mathematics
Received date : 25 March 2019 Revised date : 1 September 2019 Please cite this article as: C. Liu, X. Gao, W. Ma et al., Research on regional differences and influencing factors of green technology innovation efficiency of China’s high-tech industry, Journal of Computational and Applied Mathematics (2019), doi: https://doi.org/10.1016/j.cam.2019.112597. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Elsevier B.V. All rights reserved.
Research on Regional Differences and Influencing Factors of Green Technology Innovation Efficiency of China's High-tech Industry
Chunyang Liu 1,a, Xingyu Gao 1,b, Wanli Ma 1,c,*, Xiangtuo Chen 2,d
1 School of Business, Shandong University, Weihai 264209, CHINA
2 Laboratory MICS, CentraleSupélec, Paris-Saclay University, Gif sur Yvette 91190, FRANCE
* Corresponding author:
[email protected]
Note: This paper is supported by the Graduate Science Research Fund of Shandong University Business School.
Abstract
Using K-means clustering analysis, this paper divides the regions of China into four clusters according to differences in the development level of the high-tech industry between 2008 and 2016. Taking "environmental pollution" and "innovation failure" into account, an improved SBM-DEA efficiency measurement model is constructed to measure the green technology innovation efficiency of China's high-tech industry clusters. Lasso regression is used to screen the factors affecting the green technology innovation efficiency of the high-tech industry in each cluster area. On this basis, the quantile regression method is used to study the degree of influence and the regional differences of the influencing factors on the green innovation efficiency of the high-tech industry at different quantiles. Meanwhile, the DEA-Tobit model is used for a robustness test. The research shows that the factors that significantly affect the green innovation efficiency of the high-tech industry differ across cluster areas, and that the degree of influence of each factor on innovation efficiency also differs across quantiles. Combining the empirical results with the reality of high-tech industries in the various regions, corresponding policy recommendations are put forward.
Keywords: High-tech Industry; Innovation Efficiency; SBM-DEA; K-means Clustering Analysis; Lasso Regression; Quantile Regression; DEA-Tobit Model
1. Introduction
The high-tech industry plays an important role among China's strategic emerging industries. As China's economic development enters a "new normal", how to optimize the allocation of innovation resources and improve the innovation efficiency of enterprises has become one of the important issues facing China's implementation of its innovation-driven strategy and its construction of an innovation-oriented country. In recent years, China's rapidly developing high-tech industry has made a great contribution to the country's economic development. However, many problems remain. For example, the development of high-tech industries in China is unbalanced among regions, and the factors influencing the innovation level of the regional high-tech industry differ. Because high-tech industries are technology- and knowledge-intensive and consume few resources, technology innovation efficiency plays a more important guiding role in industrial development and determines the overall development level of high-tech industries in various regions (Meng et al., 2019). Therefore, it is still of great theoretical and practical significance to further enrich the means of measuring and evaluating innovation efficiency in high-tech industries and to analyze the key factors affecting innovation efficiency in various regions.
The research objectives of this paper are reflected in three aspects. Firstly, this study takes "environmental pollution" and "innovation failure" as the unexpected outputs of green innovation efficiency, and an SBM-DEA model is constructed to measure and analyze the green technology innovation efficiency of high-tech industries in each cluster area. Secondly, the factors influencing the green innovation efficiency of high-tech industries in each cluster region are selected by Lasso regression. Finally, the quantile regression method and the DEA-Tobit method are used to study how the degree of influence of each factor on the green innovation efficiency of the high-tech industry differs across quantiles. On the basis of the empirical results and in combination with China's national conditions, this paper provides policy suggestions for deepening the reform of China's regional high-tech industry and realizing innovation-driven, efficient development.
2. Literature Review
2.1. Research on the measurement of innovation efficiency
In recent years, scholars have deepened their research on the innovation efficiency of high-tech industries, and their perspectives and methods have gradually expanded. Data envelopment analysis (DEA) is a common method for measuring innovation efficiency. Carayannis et al. (2016) compared the innovation efficiency of 185 regions in 23 European countries with a multi-objective DEA model, and pointed out that innovation efficiency differs between regions and between innovation stages. Kaya Samut and Cafrı (2016) used DEA to measure the innovation efficiency values of hospitals in 29 OECD countries between 2000 and 2010, then applied a panel Tobit model to determine the environmental factors affecting hospital efficiency scores, and analyzed the change in the efficiency components by decomposing the Malmquist productivity index. Lafarga and Balderrama (2015) used the DEA method to calculate the overall efficiency, patent production efficiency and scientific paper production efficiency of 32 Mexican states. Yeung and Azevedo (2011) employed data envelopment analysis to measure the efficiency of Brazilian state courts.
2.2. Research on the influencing factors of innovation efficiency
Scholars at home and abroad have conducted extensive research on the influencing factors of the innovation efficiency of the high-tech industry from different perspectives and with different methods, and have achieved fruitful results. Some studies show that factor market distortion can inhibit innovation activities in high-tech industries (Ji and Dou 2016; Li et al. 2017, 2018; Gao et al. 2019; Shen et al. 2018). Fang and Chiu (2017) show that industry-university-research cooperation is an effective way to improve innovation performance. Kalapouti et al. (2017) found that patent applications, development level, employment level and technological diversity have an impact on innovation efficiency. Hong et al. (2016) found that government subsidies had a negative impact on the innovation efficiency of high-tech industries, while private R&D funds significantly promoted it. Castro and Gregorio (2015) argue that acquiring knowledge through the Internet and from the outside world is very important for sustained and stable innovation.
2.3 Study on the method of influencing factor selection
Many scholars use Lasso regression, a machine learning method, to select the factors that influence a variable (Liu and Du 2012; Fang et al. 2014; Mansiaux and Carrat 2014; Pereira et al. 2016). The advantage of the Lasso method is that, by adding a penalty term, it shrinks the regression coefficients of insignificant variables to zero and thereby eliminates weak variables. Lasso regression modeling can be used regardless of the nature of the target dependent variable. The basic idea of Lasso regression is to minimize the residual sum of squares subject to the constraint that the sum of the absolute values of the regression coefficients is less than a constant, so that some coefficients are shrunk exactly to zero and the corresponding variables are dropped, which achieves variable selection (Tibshirani, 1996). Matsui and Hidetoshi (2018) estimated the parameters of a logistic regression model through the sparse group Lasso-type penalty, and then selected the tuning parameters of the model using model selection criteria. At present, some scholars use quantile regression to select variables (Alhamzawi and Yu 2012; Jiang et al. 2014; Fan 2015). Some studies combine Lasso with quantile regression to estimate and select parameters.
Hashem et al. (2016) used a group Lasso penalty to estimate the parameters of the quantile regression model in the binary response classification problem. Xie and Xu (2014) introduced sparse group Lasso technology to construct an algorithm for feature selection on uncertain data. Benoit et al. (2016) proposed a Bayesian hierarchical model to select and estimate variables in binary quantile regression, and gave the corresponding Lasso procedure. The existing literature lacks in-depth research on the scientific measurement of the green innovation efficiency of the high-tech industry, and its selection of influencing factors lacks comprehensiveness and innovation. Building on the existing literature, this paper adopts an SBM-DEA model that considers "environmental pollution" and "innovation failure" to measure green innovation efficiency, and uses Lasso regression to screen the variables that affect green innovation efficiency. On this basis, the DEA-Tobit model and panel quantile regression are used to measure the direction and degree of the impact of these variables on green innovation efficiency. Comparing the parameter estimates of the two regression models makes the results more convincing and further verifies the accuracy of the variables selected by Lasso.
3. Model and Algorithm
3.1. SBM-DEA efficiency measurement model
Data Envelopment Analysis (DEA) is a nonparametric technical efficiency analysis method used to measure the relative efficiency of a research object, called the decision-making unit (DMU). In the DEA theoretical model, the number of DMUs can be infinite.
Most traditional DEA models are radial and angular measures. Tone (2001) constructed a non-radial and non-angular SBM-DEA model, which fully considers the slack of inputs and outputs. Suppose there are $n$ decision-making units in a production system. Each decision-making unit has three vectors: input $X$, expected output $Y^g$ and non-expected output $Y^b$. Their elements can be expressed as $x \in R^m$, $y^g \in R^{s_1}$ and $y^b \in R^{s_2}$. Define the matrices $X$, $Y^g$, $Y^b$ as follows: $X = [x_1, \dots, x_n] \in R^{m \times n}$, $Y^g = [y_1^g, \dots, y_n^g] \in R^{s_1 \times n}$, $Y^b = [y_1^b, \dots, y_n^b] \in R^{s_2 \times n}$, where $x_i > 0$, $y_i^g > 0$ and $y_i^b > 0$ $(i = 1, 2, \dots, n)$. Then the SBM-DEA model with unexpected output can be expressed as:

$$\rho^* = \min \frac{1 - \dfrac{1}{m}\sum_{i=1}^{m}\dfrac{s_i^-}{x_{i0}}}{1 + \dfrac{1}{s_1 + s_2}\left(\sum_{r=1}^{s_1}\dfrac{s_r^g}{y_{r0}^g} + \sum_{r=1}^{s_2}\dfrac{s_r^b}{y_{r0}^b}\right)}$$

$$\text{s.t.}\quad x_0 = X\lambda + s^-,\quad y_0^g = Y^g\lambda - s^g,\quad y_0^b = Y^b\lambda + s^b,\quad s^- \ge 0,\ s^g \ge 0,\ s^b \ge 0,\ \lambda \ge 0 \tag{1}$$

In formula (1), $s^-$, $s^g$ and $s^b$ denote the slack of input, expected output and non-expected output, respectively, and $\lambda$ is a weight vector. The objective function $\rho^*$ is strictly monotonically decreasing in $s^-$, $s^g$ and $s^b$, and $0 \le \rho^* \le 1$. When $\rho^* = 1$, the slacks $s^-$, $s^g$ and $s^b$ are all zero and the decision-making unit is efficient; if $\rho^* < 1$, there is redundancy in the decision-making unit, and its efficiency can be improved by optimizing the allocation of inputs and outputs.
3.2. K-means Algorithm
K-means clustering is a partition-based clustering analysis method, and it is also the most classical clustering method at present. The basic principle is that, once the number of classes $k$ is given, the algorithm divides the data set into $k$ categories $C = \{C_1, C_2, \dots, C_k\}$ and then iterates until the objective function reaches its minimum value, at which point the final clustering result is obtained. The objective function is
$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - u_i \rVert^2 \tag{2}$$

In formula (2), $E$ is the sum of squared errors over all clustering objects, $x$ is a clustering object, and $u_i$ is the mean of all data in class $C_i$, namely the cluster center (Wang et al., 2019).
3.3. Lasso Regression Algorithm
Generally, we consider a linear regression problem with $p$ variables and $n$ observations of the following form:

$$Y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon = X\beta + \varepsilon \tag{3}$$

In formula (3), $Y \in R^n$ is the explained variable; $X = (x_1, x_2, \dots, x_p)$ is the matrix of explanatory variables of dimension $n \times p$; $\varepsilon \sim N(0, \sigma^2 I_n)$ is the vector of stochastic errors; and $\beta$ is the coefficient vector of the regression model. Under the least squares criterion, the coefficient vector is estimated by minimizing the sum of squared errors:

$$\hat{\beta} = \arg\min_{\beta \in R^p} \lVert Y - X\beta \rVert_2^2 \tag{4}$$

However, when the explanatory variables are multicollinear, ordinary least squares (OLS) regression performs poorly in terms of prediction quality and model complexity. Tibshirani (1996) first proposed the Lasso regression method, which is a shrinkage estimation method. By introducing a penalty function into the regression model, it shrinks the regression coefficients of some insignificant variables to 0, and it can be used to address the multicollinearity problem. The idea of Lasso regression is to minimize the residual sum of squares while imposing a constraint on the L1 norm of the regression coefficients. It is written in the following penalized form:

$$\hat{\beta}_{lasso} = \arg\min_{\beta \in R^p} \frac{1}{n}\lVert Y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,\quad \lambda \ge 0. \tag{5}$$

In formula (5), $\lambda$ is called the penalty parameter. It not only reduces the variance of the estimates by shrinking the coefficients $\beta$, but also performs automatic variable selection by estimating certain components of the coefficient vector as exactly 0.
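To illustrate how the penalized form in formula (5) sets some coefficients exactly to zero, the following is a minimal sketch that minimizes (5) by cyclic coordinate descent with soft-thresholding. It is an illustration under stated assumptions, not the estimation procedure used in this paper: the function names `soft_threshold` and `lasso_cd`, the toy data, and the fixed number of sweeps are all hypothetical choices, and no intercept or standardization is included.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/n) * ||y - X b||_2^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p entries each, y a list of n responses.
    Hypothetical helper for illustration only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # (2/n) * sum_i x_ij * (partial residual without variable j)
            z = 0.0    # (2/n) * sum_i x_ij^2
            for i in range(n):
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            rho *= 2.0 / n
            z *= 2.0 / n
            # Closed-form minimizer of the one-dimensional subproblem in beta_j.
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data (hypothetical): the second variable has only a weak effect.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.1, 2.0, 0.1]
beta_ols = lasso_cd(X_toy, y_toy, lam=0.0)    # no penalty: least squares fit
beta_lasso = lasso_cd(X_toy, y_toy, lam=0.5)  # penalty removes the weak variable
```

On these toy data the unpenalized fit keeps both coefficients, while a penalty of 0.5 shrinks the first coefficient and sets the second exactly to zero, which is the variable-selection behaviour described above.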
3.4. The Quantile Regression Method
In this paper, the quantile regression method is used to analyze how the degree of influence of various factors changes when the green innovation efficiency of the high-tech industry is at different levels. The estimated coefficient of a quantile regression represents the marginal effect of an independent variable on the dependent variable at a specific quantile. It can therefore fully reflect the characteristics of the conditional distribution of the dependent variable, in particular by describing local information of the distribution function, and it avoids one-sided judgments on research issues based only on the "average" influence. Let the distribution function of the random variable $Y$ be as follows:

$$F(y) = P(Y \le y) \tag{6}$$

Given equation (6), the $\tau$-th $(0 < \tau < 1)$ quantile function of $y$ may be defined as:

$$Q(\tau) = \inf\{y : F(y) \ge \tau\} \tag{7}$$

In equation (7), $0 < \tau < 1$ represents the proportion of data below the regression line or regression plane. The characteristic of the quantile function is that a proportion $\tau$ of the distribution of the dependent variable $y$ lies below $Q(\tau)$ and a proportion $(1 - \tau)$ lies above $Q(\tau)$, so that the distribution of $y$ is divided by $\tau$. To solve the quantile regression, the check function $\rho_\tau(\cdot)$ is defined as:

$$\rho_\tau(\varepsilon) = \begin{cases} \tau\,\varepsilon, & Y_i \ge X_i'\beta \\ (\tau - 1)\,\varepsilon, & Y_i < X_i'\beta \end{cases} \tag{8}$$

In equation (8), $\varepsilon = Y_i - X_i'\beta$ is the residual, and $\rho_\tau(\cdot)$ assigns different weights to the sample points of $y$ that lie above and below the $\tau$-th quantile. Suppose the quantile regression model is as follows:

$$\hat{y}_Q = \alpha_Q + \beta_Q x \tag{9}$$
In equation (9), the quantile regression of $y$ seeks the parameters for which the sum of absolute deviations of $y$ at the $Q$ quantile is at its minimum, and the expression is as follows:

$$\min \sum_{i} \rho_{iQ}\,\lvert y_{i} - \alpha_Q - \beta_Q X_i \rvert \tag{10}$$

In equation (10), for simplicity, $\rho_{iQ} = 1$ can be assumed in the specific estimation process, so for any $\tau$ quantile regression, parameter estimation minimizes the weighted sum of absolute errors:

$$\hat{\beta}(\tau) = \arg\min_{\beta} \left[ \sum_{y_i \ge X_i'\beta} \tau\,\lvert Y_i - X_i'\beta \rvert + \sum_{y_i < X_i'\beta} (1 - \tau)\,\lvert Y_i - X_i'\beta \rvert \right] \tag{11}$$

In equation (11), $Y_i$ and $X_i'$ represent the vectors of the dependent variable and the independent variables, respectively, and $\tau$ is the quantile being estimated. When $\tau$ takes different values on (0,1), different parameter estimates are obtained.
3.5. System Framework
The topic of this paper is "Research on Regional Differences and Influencing Factors of Green Technology Innovation Efficiency of China's High-tech Industry Based on Machine Learning". It covers the measurement and regional differences of the innovation efficiency of the high-tech industry based on machine learning, the analysis of the influencing factors of that innovation efficiency based on machine learning, and the differentiated realization paths and institutional arrangements for improving the efficiency of green innovation in high-tech industries. The specific system framework of this paper is shown in Diagram 1.
Diagram 1. System framework of this paper
4. Data and Variables
The study sample includes 30 provincial administrative regions in mainland China (excluding Tibet due to limited data), for a total of 270 observations. The data come from the China Statistical Yearbook, the China High-tech Industry Statistical Yearbook, the China Science and Technology Statistical Yearbook and the China Financial Yearbook from 2009 to 2016.
4.1. Green Innovation Efficiency Index System
The efficiency is measured with the SBM-DEA method in the MaxDEA software. The setting and data processing of each indicator are described in detail as follows:
The input of innovation activities is considered from two perspectives. From the perspective of human capital investment, the full-time equivalent of R&D personnel in high-tech industries is adopted, since it better measures both the human input and the actual working time of R&D personnel in innovation activities. From the perspective of capital investment, the internal R&D expenditure of high-tech industries is used, and the actual R&D expenditure of each region is obtained by constructing an R&D price index (Lyu and Li, 2016).
The expected output of innovation activities is also considered from two perspectives. From the perspective of knowledge and technology, the number of domestic patent applications is selected to approximate the output of knowledge and technology. From the perspective of products, new product sales revenue reflects the result of innovation from the output dimension of technology transformation. With 2008 as the base period, the sales revenue of new products is deflated by the producer price index.
The unexpected output of innovation activities is considered from the perspective of environmental pollution. Industrial waste water and waste gas emissions are selected as unexpected outputs, which makes the calculated technological innovation efficiency more realistic.
4.2. Other Impact Factors
The influencing factors selected in this paper are as follows:
(1) Green Financial Resources Allocation Efficiency (FE)
The allocation efficiency of green financial resources in high-tech industries is constructed as follows. Based on the relevant literature and theories, R&D investment can dramatically improve the level of innovation (Li and Zhang, 2014), so government funds and enterprises' self-raised R&D funds are selected as input indicators, which fully reflects the capital supply for developing innovation. The selected expected output indexes are the sales revenue of new products and the number of new product projects. These indexes reflect the efficiency of financial resource allocation from the output dimension of financial resource transformation. The non-performing loan ratio of commercial banks is taken as the indicator of unexpected output. The non-performing loan ratio of the base period, 2008, is set to 1, and the ratio of the non-performing loan amount of the current year to that of the previous year is taken as the non-performing loan ratio of the current year.
(2) Degree of Opening-up (DO). By participating in global competition, enterprises can identify their own shortcomings and make use of their comparative advantages, and thereby improve their technological innovation. In this paper, the ratio of the export delivery value of high-tech industries to their main business income is selected as the indicator of opening-up.
(3) New Product Demand (NPD). The market orientation formed by users' demand for new products can guide enterprises' behavior to a certain extent, so that enterprises can improve the efficiency of technological innovation through targeted organizational management and R&D learning. The ratio of new product sales revenue to the main business revenue of high-tech industries is used to measure the demand for new products in different regions.
(4) The Factors of Financial Market Distortions (FD). To measure the distortion of the factor market, the degree of distortion of the financial factor market in each region is measured by the relative difference between the degree of financial marketization in that region and the benchmark degree of financial marketization, that is:

$$DIST_{it} = \frac{\max(MARK_{it}) - MARK_{it}}{\max(MARK_{it})} \tag{12}$$

In formula (12), $i$ and $t$ represent region and year respectively. $DIST_{it}$ is the degree of market distortion of financial factors; $MARK_{it}$ is the financial marketization index; $\max(MARK_{it})$ is the maximum value of the financial marketization index in the sample; and $DIST_{it}$ lies between 0 and 1. This method reflects not only the relative differences in financial factor market distortions between regions, but also the inter-temporal changes of such differences.
(5) Level of Government Support (GS). Government support plays a very important role in improving the innovation level of the high-tech industry, and it is mainly reflected in fiscal subsidy policies and preferential tax policies. This paper takes the share of government funds in the sum of government funds and enterprise funds within the R&D expenditure of the high-tech industry as the index of the degree of government support.
(6) Transfer of Knowledge (DKTD, IKTD). The knowledge transfer degree reflects the flow of innovation resources such as technology between regions. By source, knowledge transfer can be classified into the domestic knowledge transfer degree and the international knowledge transfer degree, which reflect the flow of innovative resources among domestic regions and from abroad, respectively. In this paper, the domestic knowledge transfer degree (DKTD) is measured by the proportion of regional technology market technology transfer (contract amount) to regional GDP, while the international knowledge transfer degree (IKTD) is measured by the proportion of the foreign technology transfer contract amount to regional GDP.
(7) Financial Support (FS). Investment from science and technology finance and the support of venture capital can promote technological innovation. This paper uses the proportion of bank loans in the total amount of science and technology funds to represent financial support.
(8) Industry-university-research cooperation (IURC). Enterprises, universities and research institutions are the innovation subjects and the basic forces of innovation in the regional innovation system (Li and Fu, 2014), and the interaction among the three has an important impact on innovation performance (Lundvall, 1988; Edquist et al., 2002). This paper measures it by the proportion of enterprise funds in the total amount of science and technology funds raised by universities and R&D institutions.
(9) Foreign direct investment (FDI). Foreign direct investment brings capital, advanced technology, equipment and knowledge, advanced management experience and technical talent to the region. However, foreign investors' advantages in capital, technology and government policies can squeeze the market share of local enterprises. In this paper, foreign direct investment is measured by the proportion of the total investment of foreign-funded enterprises in the gross regional product.
(10) Information Infrastructure Development (IID). Information technology is diffused through the construction of information infrastructure, which promotes the penetration and integration between high-tech and non-high-tech industries and thus improves the efficiency of innovation. The proportion of total postal and telecommunications business in regional GDP is adopted to reflect the development level of infrastructure.
(11) Regional Economic Development (lnGDP). The real GDP of a region reflects not only its economic performance, but also its economic strength and wealth. Taking the logarithm of real GDP makes the data more stable.
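As a small illustration of formula (12) in item (4) above, the following sketch computes the distortion index from a table of marketization indices. The function name `distortion_index` and the index values are hypothetical and are not taken from the paper's data; the only assumption carried over from the text is that the maximum is taken over the whole sample.

```python
def distortion_index(mark):
    """Compute DIST_it = (max(MARK) - MARK_it) / max(MARK), as in formula (12).

    `mark` maps (region, year) to a financial marketization index; the
    maximum is taken over the whole sample, as stated in the text.
    """
    max_mark = max(mark.values())
    if max_mark <= 0:
        raise ValueError("the maximum marketization index must be positive")
    return {key: (max_mark - value) / max_mark for key, value in mark.items()}


# Hypothetical index values for two regions and two years (not real data).
mark_toy = {
    ("A", 2008): 10.0,
    ("B", 2008): 5.0,
    ("A", 2009): 8.0,
    ("B", 2009): 6.0,
}
dist_toy = distortion_index(mark_toy)
```

Because every observation is compared with the same sample-wide maximum, the resulting values are comparable both across regions and across years, which matches the inter-temporal interpretation given for formula (12).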
Table 1. Descriptive Statistics of Variables

| Variables | Sample Size | Maximum | Minimum | Mean | Standard deviation |
|---|---|---|---|---|---|
| IV | 270 | 1 | 0.011 | 0.540 | 0.258 |
| FE | 270 | 1 | 0.543 | 0.864 | 0.147 |
| DO | 270 | 3.344 | 0 | 0.350 | 0.463 |
| NPD | 270 | 0.622 | 0.001 | 0.224 | 0.148 |
| FD | 270 | 0.746 | 0 | 0.380 | 0.178 |
| GS | 270 | 0.613 | 0.002 | 0.128 | 0.113 |
| DKTD | 270 | 0.068 | 0.001 | 0.010 | 0.009 |
| IKTD | 270 | 0.045 | 0 | 0.003 | 0.006 |
| FS | 270 | 0.135 | 0.005 | 0.036 | 0.020 |
| IURC | 270 | 0.321 | 0.013 | 0.135 | 0.068 |
| FDI | 270 | 4.466 | 0.047 | 0.347 | 0.447 |
| IID | 270 | 0.972 | 0.003 | 0.145 | 0.274 |
| lnGDP | 270 | 10.611 | 6.890 | 9.075 | 0.849 |
5. Empirical Analysis
5.1. Clustering Region Division
Regional imbalance is one of the most important characteristics of development in developing countries (Wei, 1999; Hastie and Efron, 2013). To further analyze how China's regional green technological innovation efficiency changes over time and space, K-means clustering is first carried out to divide the 30 provinces into 4 groups according to the development level of their high-tech industry (Zou, 2017). The groups correspond to high, medium, medium-low and low levels and are numbered clusters 1, 2, 3 and 4; the clustering results are shown in Table 2.
Table 2. Clustering Division of Chinese Provinces

| Cluster | Provinces |
|---|---|
| Cluster 1 | Tianjin, Shanghai, Jiangsu, Guangdong, Chongqing |
| Cluster 2 | Beijing, Zhejiang, Anhui, Shandong, Henan, Sichuan |
| Cluster 3 | Hebei, Jilin, Fujian, Jiangxi, Hunan, Hainan, Shaanxi, Qinghai, Ningxia, Xinjiang |
| Cluster 4 | Shanxi, Inner Mongolia, Liaoning, Heilongjiang, Guangxi, Guizhou, Yunnan, Gansu |
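The partitioning step that produces a division such as Table 2 can be sketched as follows. This is a minimal sketch of the standard K-means iteration that minimizes the objective in formula (2), not the authors' implementation: the function name `kmeans`, the deterministic initialization from the first k points, and the one-dimensional toy scores are all assumptions made for illustration.

```python
def kmeans(points, k, max_iter=100):
    """Standard K-means iteration; returns (labels, centers, sse).

    `points` is a list of equal-length numeric lists. The centers are
    initialized from the first k points (an assumption made here so that
    the example is deterministic).
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Objective E of formula (2): sum of squared distances to the cluster centers.
    sse = sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )
    return labels, centers, sse


# Hypothetical one-dimensional development scores for six regions.
scores = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
labels, centers, sse = kmeans(scores, k=2)
```

In practice the result depends on the initial centers, so the iteration is usually repeated from several starting points and the run with the smallest value of E is kept.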
Based on the data in Table 2, the following conclusions can be drawn. From 2008 to 2016, Jiangsu and Guangdong ranked in the first cluster, and their efficiency of technological innovation was higher than the national average. Beijing, Shanghai, Zhejiang, Shandong, Henan, Sichuan and Tianjin ranked in the second cluster, and their technological innovation efficiency is at the national average level. Anhui, Hainan, Chongqing, Qinghai, Ningxia and Xinjiang lie in the third cluster and have a higher technological innovation efficiency than the national average. In the fourth cluster, the efficiency of technological innovation is lower than the national average. Cluster 1 gathers a large number of excellent manufacturing enterprises from home and abroad, which are the pioneers of the green innovation industry. The provinces in cluster 2 are also large economies, but they are not the best soil for the high-tech industry because their per capita resource endowment is smaller than that of cluster 1. The provinces in cluster 4 are mainly concentrated in the inland and northeast regions, which have long been China's heavy industry bases and where high-tech industries are relatively underdeveloped. The clustering results in this paper are not consistent with the geographical division, which indicates that the traditional method of dividing China into regions according to geographical location is not completely applicable to the study of technological innovation efficiency in China. To verify the rationality of the clustering results, an analysis of variance (ANOVA) is conducted on them to obtain the intra-class and inter-class differences, as shown in Table 3.
Table 3. Analysis of Variance

| Total | Inter-Class | Intra-Class | Inter-Class Percentage |
|---|---|---|---|
| 1.363 | 1.258 | 0.105 | 92.30% |
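The decomposition behind Table 3 can be illustrated with a short sketch: the total sum of squares of the efficiency scores splits into an inter-class part and an intra-class part, and the inter-class share is the ratio of the former to the total. The function name `ss_decomposition` and the toy scores are hypothetical; only the last line uses the values reported in Table 3.

```python
def ss_decomposition(groups):
    """Split the total sum of squares into inter-class and intra-class parts."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    # Inter-class: squared distance of each class mean from the grand mean,
    # weighted by class size.
    inter = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Intra-class: squared distance of each value from its own class mean.
    intra = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return total, inter, intra


# Hypothetical efficiency scores for two clusters (not the paper's data).
total, inter, intra = ss_decomposition([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])
inter_share = inter / total

# Inter-class share implied by the values reported in Table 3.
table3_share = 1.258 / 1.363
```

With the reported values, the inter-class difference divided by the total difference rounds to the 92.30% shown in the table.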
As can be seen from Table 3, the inter-class difference is 1.258 and the intra-class difference is 0.105. That is to say, the inter-class difference in the green technology innovation efficiency of the high-tech industry explains 92.30% of the total regional difference between 2008 and 2016. In other words, the differences in the green technology innovation efficiency of high-tech industries across regions of China are mainly caused by the differences between categories. Therefore, when analyzing the influencing factors of green innovation efficiency in China, it is necessary to classify the regions in order to understand how the impact of each factor on the green innovation efficiency of high-tech industries differs across clustering regions.
5.2. Measurement of Green Technology Innovation Efficiency
As can be seen from Table 4, the overall green innovation efficiency of most provinces has maintained a fluctuating upward trend, and the national average green innovation efficiency increased from 0.422 in 2008 to 0.636 in 2016. The technological innovation efficiency of Liaoning, Heilongjiang, Anhui, Jiangxi, Shandong, Hubei and Hunan shows a steady upward trend; that of Hebei, Inner Mongolia, Jilin, Shanghai, Zhejiang, Fujian, Guangxi and Guizhou fluctuates, with little change between the initial and final values; while that of Yunnan shows a fluctuating downward trend. The average innovation efficiency of Beijing, Tianjin, Guangdong, Anhui and other provinces is above 0.70, so they can be summarized as regions with high innovation efficiency. Meanwhile, the average innovation efficiency of Shanxi, Liaoning, Jiangxi, Guangxi, Hainan, Guizhou, Gansu, Qinghai, Ningxia and other provinces is below 0.45, so they can be summarized as regions with insufficient innovation efficiency. The average level of green innovation efficiency of high-tech industries in clusters 1, 2, 3 and 4 decreases successively: the average efficiency of clusters 1 and 2 is higher than the national level, and the average efficiency of clusters 3 and 4 is clearly lower than the national level. The research shows that the green innovation efficiency of the high-tech industry is closely related to the development level of the regional high-tech industry.
Table 4. Green Innovation Efficiency of China's High-tech Industries, 2008-2016

| Provinces/Years | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | Year/Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Tianjin | 0.625 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.958 |
| Hebei | 0.228 | 0.180 | 0.177 | 0.131 | 0.208 | 0.236 | 0.256 | 0.204 | 0.267 | 0.210 |
| Shanxi | 1.000 | 0.204 | 0.399 | 0.231 | 0.286 | 0.312 | 0.337 | 0.335 | 0.100 | 0.356 |
| Inner Mongolia | 0.000 | 0.334 | 0.260 | 0.159 | 0.247 | 0.216 | 0.206 | 0.201 | 0.265 | 0.210 |
| Liaoning | 0.192 | 0.275 | 0.254 | 0.223 | 0.247 | 0.344 | 0.357 | 0.428 | 0.631 | 0.328 |
| Jilin | 0.364 | 0.356 | 0.285 | 0.235 | 0.329 | 0.523 | 0.406 | 0.278 | 0.365 | 0.349 |
| Heilongjiang | 0.107 | 0.084 | 0.082 | 0.134 | 0.177 | 0.152 | 0.215 | 0.223 | 0.308 | 0.165 |
| Shanghai | 1.000 | 0.468 | 0.430 | 0.455 | 0.400 | 0.480 | 0.529 | 0.504 | 0.583 | 0.539 |
| Jiangsu | 1.000 | 0.300 | 0.366 | 0.626 | 0.566 | 0.549 | 0.726 | 0.725 | 1.000 | 0.651 |
| Zhejiang | 0.303 | 0.373 | 0.355 | 0.353 | 0.433 | 1.000 | 0.496 | 0.591 | 0.510 | 0.491 |
| Anhui | 0.145 | 0.443 | 0.275 | 0.427 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.699 |
| Fujian | 1.000 | 0.356 | 0.361 | 0.446 | 0.416 | 0.390 | 0.380 | 0.447 | 0.539 | 0.482 |
| Jiangxi | 0.134 | 0.147 | 0.196 | 0.166 | 0.250 | 0.357 | 0.489 | 0.584 | 0.795 | 0.347 |
| Shandong | 0.314 | 0.325 | 0.406 | 0.434 | 0.350 | 0.371 | 0.404 | 0.491 | 1.000 | 0.455 |
| Henan | 0.300 | 0.297 | 1.000 | 0.349 | 0.398 | 1.000 | 1.000 | 1.000 | 1.000 | 0.705 |
| Hubei | 0.177 | 0.236 | 0.264 | 0.195 | 0.221 | 0.275 | 0.294 | 0.374 | 0.544 | 0.287 |
| Hunan | 0.226 | 0.317 | 0.417 | 0.635 | 0.532 | 0.565 | 0.531 | 0.489 | 1.000 | 0.524 |
| Guangdong | 1.000 | 1.000 | 1.000 | 1.000 | 0.535 | 0.664 | 0.689 | 0.702 | 1.000 | 0.843 |
| Guangxi | 0.255 | 0.248 | 0.236 | 0.116 | 0.316 | 0.440 | 0.367 | 0.408 | 0.394 | 0.309 |
| Hainan | 0.061 | 0.234 | 0.106 | 0.346 | 1.000 | 0.262 | 0.268 | 0.205 | 0.100 | 0.287 |
| Chongqing | 0.267 | 0.388 | 0.372 | 1.000 | 0.578 | 0.569 | 0.719 | 1.000 | 0.886 | 0.642 |
| Sichuan | 0.184 | 0.255 | 0.229 | 0.247 | 1.000 | 0.502 | 1.000 | 1.000 | 1.000 | 0.602 |
| Guizhou | 1.000 | 0.256 | 0.220 | 0.409 | 0.378 | 0.232 | 0.318 | 0.288 | 0.442 | 0.394 |
| Yunnan | 0.326 | 1.000 | 0.334 | 0.280 | 0.316 | 0.335 | 0.360 | 0.309 | 0.203 | 0.385 |
| Shaanxi | 0.139 | 0.137 | 0.155 | 0.143 | 0.143 | 0.172 | 0.192 | 0.197 | 0.240 | 0.169 |
| Gansu | 0.137 | 0.210 | 0.233 | 0.344 | 0.408 | 0.369 | 0.433 | 0.413 | 0.388 | 0.326 |
| Qinghai | 0.019 | 0.016 | 1.000 | 1.000 | 0.096 | 0.095 | 0.203 | 0.380 | 1.000 | 0.423 |
| Ningxia | 0.153 | 0.270 | 0.252 | 0.318 | 0.563 | 1.000 | 0.274 | 0.255 | 0.534 | 0.402 |
| Xinjiang | 1.000 | 1.000 | 0.194 | 0.088 | 0.123 | 1.000 | 0.426 | 1.000 | 1.000 | 0.648 |
| Nationwide | 0.422 | 0.390 | 0.395 | 0.416 | 0.451 | 0.514 | 0.496 | 0.534 | 0.636 | 0.473 |
| Cluster 1 | 0.778 | 0.631 | 0.634 | 0.816 | 0.616 | 0.652 | 0.733 | 0.786 | 0.894 | 0.727 |
| Cluster 2 | 0.374 | 0.449 | 0.544 | 0.468 | 0.697 | 0.812 | 0.817 | 0.847 | 0.918 | 0.659 |
| Cluster 3 | 0.332 | 0.301 | 0.314 | 0.351 | 0.366 | 0.460 | 0.343 | 0.404 | 0.584 | 0.384 |
| Cluster 4 | 0.377 | 0.326 | 0.252 | 0.237 | 0.297 | 0.300 | 0.324 | 0.326 | 0.341 | 0.309 |

5.3. Lasso Regression
According to the size of the penalty coefficient, the algorithm sets the regression coefficients of "unimportant" variables to 0, thus achieving variable selection. The whole country is taken as an example here; variable selection for each cluster area proceeds in the same way. The components of the nationwide coefficient vector are shown in Diagram 2. Moving from left to right along the horizontal axis of Diagram 2, more and more components become non-zero, which means that more and more variables are selected. The relative variation of the Cp value is shown in Diagram 3. It is minimized when 4 factors are kept, and the penalty parameter at this point equals 0.23478.
Diagram 2. Regression Coefficient Variation
Diagram 3. Cp criteria for Selection
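To illustrate how the penalty drives the coefficients of "unimportant" variables to exactly zero, the following is a minimal coordinate-descent sketch of the Lasso on synthetic data. It is not the authors' implementation (the Cp-based selection described above suggests a path-type solver). The data, the function names `soft_threshold` and `lasso_cd`, the objective scaling (1/(2n))·RSS + lam·|beta|_1, and the penalty value 0.3 are all assumptions made for illustration.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * RSS + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Synthetic data: only the first two columns influence y; the third is noise.
random.seed(0)
n = 200
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0, 0.1) for row in X]

beta_ols = lasso_cd(X, y, lam=0.0)  # no penalty: ordinary least squares
beta_pen = lasso_cd(X, y, lam=0.3)  # with penalty: irrelevant coefficient is 0

print("no penalty:", [round(b, 3) for b in beta_ols])
print("lam = 0.3 :", [round(b, 3) for b in beta_pen])
```

With the penalty switched on, the coefficient of the pure-noise column is set to exactly 0.0 while the two relevant coefficients are shrunk but kept, which is the selection behaviour described in the text.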
The Lasso regression method is used to analyze the factors affecting the green innovation efficiency of the high-tech industry. The coefficients of the optimal variables selected by the Lasso regression model in each region are shown in Table 5.

Table 5. Lasso Regression Coefficient of Each Region

| Variables | Nationwide | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|---|
| FE | 3.75E-01 | 0 | 0 | 0 | 0 |
| DO | 0 | 1.53E-07 | 0 | -2.48E-08 | 0 |
| NPD | 1.06E+00 | 7.43E-05 | 9.60E-05 | 1.47E-05 | 2.37E-08 |
| FD | -2.34E-08 | 0 | 0 | -4.22E-04 | -3.20E+02 |
| GS | -6.33E-03 | -5.78E-04 | -5.33E-03 | -8.92E-04 | -7.88E-03 |
| DKTD | 0 | 8.77E-05 | 0 | 0 | 0 |
| IKDT | 0 | -1.89E-05 | 0 | 0 | 0 |
| FS | 0 | 0 | 4.12E-05 | 0 | 0 |
| IURC | 0 | 0 | 0 | 0 | 0 |
| FDI | 0 | 0 | 0 | 0 | 0 |
| IID | 0 | 0 | 0 | 0 | 0 |
| lnGDP | 0 | 0 | 3.26E-03 | 5.42E-07 | 6.03E-04 |
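Reading Table 5 as a selection result amounts to keeping, for each region, the variables whose Lasso coefficient is non-zero. The short sketch below does this mechanically with the coefficients as read from Table 5; the names `TABLE5`, `REGIONS` and `selected_variables` are hypothetical helpers introduced here for illustration.

```python
# Lasso coefficients as read from Table 5
# (column order: Nationwide, Cluster 1, Cluster 2, Cluster 3, Cluster 4).
REGIONS = ["Nationwide", "Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4"]
TABLE5 = {
    "FE":    [3.75e-01, 0, 0, 0, 0],
    "DO":    [0, 1.53e-07, 0, -2.48e-08, 0],
    "NPD":   [1.06e+00, 7.43e-05, 9.60e-05, 1.47e-05, 2.37e-08],
    "FD":    [-2.34e-08, 0, 0, -4.22e-04, -3.20e+02],
    "GS":    [-6.33e-03, -5.78e-04, -5.33e-03, -8.92e-04, -7.88e-03],
    "DKTD":  [0, 8.77e-05, 0, 0, 0],
    "IKDT":  [0, -1.89e-05, 0, 0, 0],
    "FS":    [0, 0, 4.12e-05, 0, 0],
    "IURC":  [0, 0, 0, 0, 0],
    "FDI":   [0, 0, 0, 0, 0],
    "IID":   [0, 0, 0, 0, 0],
    "lnGDP": [0, 0, 3.26e-03, 5.42e-07, 6.03e-04],
}

def selected_variables(table, regions):
    """Return, per region, the variables with a non-zero Lasso coefficient."""
    return {
        region: [name for name, row in table.items() if row[col] != 0]
        for col, region in enumerate(regions)
    }

selected = selected_variables(TABLE5, REGIONS)
for region, names in selected.items():
    print(region, names)
```

For the nationwide column this yields four variables (FE, NPD, FD, GS), in line with the statement that the Cp value is minimized when 4 factors are kept.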
The results show that the allocation efficiency of green financial resources significantly promotes the green innovation efficiency of high-tech industries only at the national level. Opening up has a significant promoting effect on green innovation efficiency in cluster 1, while the opposite effect appears in cluster 3. The demand for new products has a positive impact on green innovation efficiency in all regions. The distortion of the financial factor market has an obvious negative impact on green innovation efficiency nationwide and in clusters 3 and 4. Government support is not conducive to the improvement of green innovation efficiency in any region. Knowledge transfer has a significant impact on the green innovation efficiency of high-tech industries only in cluster 1: domestic knowledge transfer has a positive impact, while foreign knowledge transfer has the opposite effect. Financial support has a non-zero (positive) coefficient only in cluster 2, where it promotes the improvement of green innovation efficiency. Regional economic development promotes the improvement of green innovation efficiency in clusters 2, 3 and 4.
5.4.Quantile Regression and Tobit Regression
Using quantile regression and panel Tobit regression, the variables selected by Lasso regression are analyzed empirically. This section examines how strongly each influencing factor affects green innovation efficiency at different quantiles of the green innovation efficiency of the high-tech industry, how these effects differ across quantiles, and how the factors shape the conditional distribution of technological innovation efficiency in China. The quantile regression and panel Tobit regression results of each region are shown in Table 6.
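Quantile regression fits a conditional quantile by minimizing an asymmetric "check" (pinball) loss rather than the squared error. The sketch below illustrates the idea with an intercept-only model on toy numbers: the constant that minimizes the check loss at level tau is a sample tau-quantile. The data and the function names `check_loss` and `fit_constant_quantile` are assumptions for illustration and are unrelated to the paper's dataset.

```python
def check_loss(residual, tau):
    """Pinball loss: weight tau on positive residuals, (1 - tau) on negative."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def fit_constant_quantile(y, tau):
    """Intercept-only quantile regression.

    The total check loss is piecewise linear in the constant, so a minimizer
    can always be found among the observed values.
    """
    return min(y, key=lambda c: sum(check_loss(v - c, tau) for v in y))

# Toy efficiency-like scores (illustrative only).
y = [0.10, 0.20, 0.25, 0.30, 0.45, 0.60, 1.00]

q25 = fit_constant_quantile(y, 0.25)
q50 = fit_constant_quantile(y, 0.50)
q75 = fit_constant_quantile(y, 0.75)

print(q25, q50, q75)  # 0.2 0.3 0.6
```

In the regression setting the constant is replaced by a linear function of the selected factors, and each of the quantile levels 0.25, 0.5 and 0.75 gets its own coefficient vector, which is why the estimated effect of a factor can differ between low-efficiency and high-efficiency regions.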
Table 6. The Quantile Regression and Panel Tobit Regression Results of Each Region (standard errors in parentheses)

| Region | Factors | Quantile 0.25 | Quantile 0.5 | Quantile 0.75 | Tobit Model |
|---|---|---|---|---|---|
| Nationwide | FE | 0.031 (0.090) | 0.158 (0.119) | 0.259* (0.156) | 0.196** (0.099) |
| Nationwide | GS | -0.305*** (0.116) | -0.518*** (0.154) | -0.683*** (0.201) | -0.511*** (0.128) |
| Nationwide | NPD | 0.414*** (0.094) | 0.639*** (0.124) | 0.723*** (0.163) | 0.557*** (0.104) |
| Nationwide | FD | -0.697*** (0.076) | -0.511*** (0.101) | -0.256* (0.132) | -0.444*** (0.084) |
| Cluster 1 | DO | 0.401* (0.216) | 0.445*** (0.163) | 0.449 (0.169) | 0.492*** (0.141) |
| Cluster 1 | GS | -1.007*** (0.466) | -1.044*** (0.352) | -1.127*** (0.365) | -1.162*** (0.301) |
| Cluster 1 | NPD | 0.170*** (0.287) | 0.829*** (0.217) | 0.883*** (0.225) | 1.009*** (0.187) |
| Cluster 1 | DKTD | 18.038** (7.882) | 12.710** (5.944) | 9.043 (6.172) | 19.984*** (5.670) |
| Cluster 1 | IKDT | -16.042** (6.236) | -4.804 (4.703) | -5.300 (4.883) | -8.738* (4.612) |
| Cluster 2 | GS | -0.275 (1.053) | -1.942*** (0.641) | -1.881*** (0.598) | -1.416** (0.625) |
| Cluster 2 | NPD | 0.487*** (0.322) | 0.548*** (0.196) | 0.532*** (0.183) | 0.751*** (0.189) |
| Cluster 2 | FS | 2.560 (2.231) | 0.872 (1.357) | 1.216 (1.267) | 2.383* (1.391) |
| Cluster 2 | lnGDP | -0.099 (0.172) | 0.399*** (0.105) | 0.363*** (0.098) | 0.307*** (0.123) |
| Cluster 3 | DO | -0.060* (0.031) | -0.080* (0.045) | -0.058 (0.061) | -0.099** (0.038) |
| Cluster 3 | GS | -0.459** (0.177) | -0.742*** (0.255) | -1.098*** (0.350) | -0.688*** (0.214) |
| Cluster 3 | NPD | 0.233 (0.141) | 0.358* (0.204) | 0.293 (0.279) | 0.328* (0.172) |
| Cluster 3 | FD | -0.195 (0.230) | 0.356 (0.332) | 0.796* (0.455) | 0.567* (0.280) |
| Cluster 3 | lnGDP | 0.031 (0.038) | 0.113** (0.055) | 0.121 (0.076) | 0.068 (0.047) |
| Cluster 4 | GS | -0.457*** (0.139) | -0.459*** (0.101) | -0.159*** (0.181) | -0.578*** (0.110) |
| Cluster 4 | NPD | 1.082*** (0.279) | 1.092*** (0.203) | 1.367*** (0.363) | 1.158*** (0.221) |
| Cluster 4 | FD | -1.047*** (0.136) | -0.915*** (0.231) | -0.867** (0.411) | -0.673*** (0.251) |
| Cluster 4 | lnGDP | 0.106* (0.062) | 0.126*** (0.045) | 0.198** (0.080) | 0.137*** (0.049) |
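The Tobit column in Table 6 comes from a model designed for bounded dependent variables; DEA efficiency scores cannot exceed 1, and many entries in Table 4 sit exactly at 1.000. The sketch below shows a pooled Tobit log-likelihood with right-censoring at 1. It is only an illustration of the idea: this section does not state the censoring limits, and the paper uses a panel Tobit model, so the censoring point, the single regressor, the toy data and the function names are all assumptions.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_sf(z):
    """Standard normal survival function P(Z > z), stable for large z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def tobit_loglik(y, x, intercept, slope, sigma, upper=1.0):
    """Log-likelihood of a Tobit model right-censored at `upper` (assumed 1)."""
    total = 0.0
    for yi, xi in zip(y, x):
        mu = intercept + slope * xi
        if yi >= upper:
            # Censored: only know the latent score is at least `upper`.
            total += math.log(norm_sf((upper - mu) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed score.
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total

# Toy data: efficiency scores capped at 1 and one explanatory factor.
y = [0.25, 0.40, 0.70, 1.00, 1.00]
x = [0.1, 0.3, 0.6, 0.9, 1.2]

ll_good = tobit_loglik(y, x, intercept=0.15, slope=0.9, sigma=0.2)
ll_flat = tobit_loglik(y, x, intercept=0.15, slope=0.0, sigma=0.2)

print(ll_good > ll_flat)  # True: the upward-sloping fit explains the data better
```

Maximizing this likelihood over the intercept, slope and sigma gives the Tobit estimates; a panel version would additionally model region-specific effects.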
The allocation efficiency of green financial resources (FE) plays a notable role in promoting the green innovation efficiency of high-tech industries. Improving the efficiency of financial resource allocation directly affects the allocation efficiency of production factors, raises the total factor productivity (TFP) and innovation efficiency of high-tech industries, and pushes China's economic development onto the path of the "new normal".
Opening up (DO) promotes the green innovation efficiency of the high-tech industry in cluster 1, while the opposite holds in cluster 3. In both clusters, the effect is more pronounced in regions with higher innovation efficiency. Opening up helps to break the monopoly of the state-owned sector, enhances the innovation vitality of small and medium-sized enterprises in market competition, and stimulates enterprises to increase their knowledge reserves through market introduction, learning and communication.
New product demand (NPD) in the regional high-tech industry has a significant positive influence on green innovation efficiency, and the higher the innovation efficiency of the region, the greater the influence. The higher consumers' demand for new products, the higher the price of new products will be. To some extent, this strengthens producers' incentive to invest in innovation and improves the innovation efficiency of enterprises.
The distortion of the financial factor market (FD) has notable negative effects on the green innovation efficiency of high-tech industries nationwide and in clusters 3 and 4. Nationwide and in cluster 4, this negative effect is more significant in regions with lower efficiency, while in cluster 3 the opposite holds. The distortion of the financial factor market adversely affects the allocation of innovative capital and the initiative of enterprises to innovate, which leads to a loss of innovation efficiency.
Government support (GS) has an obvious negative impact on the improvement of green innovation efficiency, and the higher the innovation efficiency, the greater the impact. In principle, government financial support can ease the shortage of financing for enterprise innovation and R&D activities and thus help enterprises innovate efficiently. However, government investment may also crowd out enterprises' own investment in R&D (the "crowding-out effect"): government funds leave enterprises' own R&D investment relatively insufficient, which inhibits enterprise innovation to some extent.
Knowledge transfer (DKTD, IKTD) has an impact on innovation efficiency only in cluster 1, and the impact is more significant in regions with low innovation efficiency. Domestic knowledge transfer (DKTD) promotes the improvement of innovation efficiency, while foreign knowledge transfer (IKTD) does the opposite. Enterprises in different regions of China share knowledge through knowledge transfer, which makes innovation activities more effective and promotes the improvement of innovation efficiency. However, the crowding-out effect of foreign investors on the local market inhibits the improvement of innovation efficiency.
Financial support (FS) promotes the innovation efficiency of cluster 2. Finance is the core of the modern economy, and scientific and technological innovation and its industrialization need financial support.
Regional economic development (lnGDP) is positively correlated with the innovation efficiency of high-tech industries, which is consistent with China's national conditions.
6.Discussion and Suggestions
The contribution and originality of this paper are mainly reflected in the following aspects. A scientific system is constructed to measure the green innovation efficiency of high-tech industries. Then, through Lasso regression, quantile regression and the DEA-Tobit method, this paper comprehensively and systematically studies the regional differences in the factors influencing green innovation efficiency in the high-tech industry. It provides theoretical support for deepening reform and for realizing innovation-driven, efficient development of regional high-tech industries in China.
However, traditional panel regression assumes that individuals are independent of each other and does not consider the correlation between regions or spatial spillover effects, so the conclusions on green innovation efficiency may be less rigorous and incomplete. In addition, among the factors influencing the green innovation efficiency of high-tech industries, some variables, such as the social system and culture, are omitted because they are difficult to measure; the errors caused by their omission need to be discussed in future studies. Finally, owing to the length and research direction of the paper, the decomposition of the innovation efficiency of the high-tech industry is not analyzed and the sub-industries of the high-tech industry are not distinguished; both points need to be enriched and improved in later research.
Based on the research results, the following policy suggestions are proposed. Regions differ greatly in resource endowment, economic development level, economic development speed and policy environment, which leads to different innovation efficiency. Therefore, regions should make full use of their respective advantages, strengthen communication and mutual learning, and promote the effective use and allocation of innovation resources. When local governments formulate policies to promote innovation in high-tech industries, they need to adapt measures to local conditions and focus on promoting the development of factor markets.