Journal Pre-proof

Effective energy consumption forecasting using enhanced bagged echo state network

Huanling Hu, Lin Wang, Lu Peng, Yu-Rong Zeng

PII: S0360-5442(19)32473-9
DOI: https://doi.org/10.1016/j.energy.2019.116778
Reference: EGY 116778
To appear in: Energy
Received Date: 13 March 2019
Accepted Date: 14 December 2019
Please cite this article as: Huanling Hu, Lin Wang, Lu Peng, Yu-Rong Zeng, Effective energy consumption forecasting using enhanced bagged echo state network, Energy (2019), https://doi.org/10.1016/j.energy.2019.116778
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.
Effective energy consumption forecasting using enhanced bagged echo state network

Huanling Hu1, Lin Wang1*, Lu Peng1, Yu-Rong Zeng2
1. School of Management, Huazhong University of Science and Technology, Wuhan 430074, China
2. Hubei University of Economics, Wuhan 430205, China

Abstract: Precise analysis and forecasting of energy consumption not only affect the energy security and environment of a nation but also provide a useful decision basis for policy makers. This study proposes a new enhanced optimization model based on the bagged echo state network improved by the differential evolution algorithm to estimate energy consumption. Bagging is applied to reduce the forecasting error and improve the generalization of the network. Further, three parameters of the echo state network are optimized using the differential evolution algorithm. Thus, the proposed model combines the merits of three techniques: the echo state network, bagging, and the differential evolution algorithm. The proposed model is applied to two comparative examples and an extended application to verify its accuracy and reliability. Results of the comparative examples show that the proposed model achieves better forecasting performance than the basic echo state network and other existing popular models. The mean absolute percentage error of the proposed model is 0.215% for total energy consumption forecasting of China. Therefore, the proposed model can be a satisfactory tool for forecasting energy consumption because of its high accuracy and stability.

Keywords: energy consumption forecasting, echo state network, bagging, differential evolution

*Corresponding author. E-mail addresses:
[email protected] (Huan-Ling Hu);
[email protected] (Lin Wang);
[email protected] (Lu Peng);
[email protected] (Yu-Rong Zeng).
Terms used in this study are as follows:

ACO     ant colony optimization
ADE     adaptive differential evolution
AI      artificial intelligence
ANN     artificial neural network
ARIMA   autoregressive integrated moving average
BPNN    back-propagation neural network
COA     proportion of coal consumption in TEC
DE      differential evolution
EA      evolutionary algorithm
EEC     electrical energy consumption
ESN     echo state network
EXP     energy export
FFNN    feedforward neural network
GA      genetic algorithm
GDP     gross domestic product
GM      grey model
HOU     household energy consumption per capita
IMP     energy import
ISSA    improved singular spectrum analysis
LEAP    long-range energy alternatives planning system
LSTM    long short-term memory
MAE     mean absolute error
MAPE    mean absolute percentage error
MARKAL  market allocation
MLR     multiple linear regression
MSE     mean square error
OLS     ordinary least squares
POP     population
PSO     particle swarm optimization
RAT     the ratio of TEC to GDP
RMSE    root mean square error
RNN     recurrent neural network
SEC     share of secondary industry
SVM     support vector machine
TCE     ton of standard coal equivalent
TEC     total energy consumption
TIMES   the integrated MARKAL-EFOM system
URB     urbanization rate
1 Introduction

Energy has a great effect on the economic development and quality of life of a country. Energy consumption has grown significantly with the continued economic growth of developing countries in the past decades. For example, China, the largest developing country in the world, has achieved continuous and fast development, including rapid economic growth and an ongoing process of urbanization, since 2002, and its energy consumption has expanded accordingly. Its gross domestic product (GDP) grew at an average annual rate of 9.8% from 2002 to 2015, and its energy consumption increased at 7.64% per year, from 1,695.77 million tons of standard coal equivalent (TCE) in 2002 to 4,300 million TCE in 2015 [1]. Under these circumstances, energy consumption forecasting has become more critical than before because it not only affects the energy security and environment of a nation but also provides a decision basis for policy makers.
This study aims to propose a new effective model, BDEESN, based on the bagged echo state network (ESN) improved by differential evolution (DE) for energy consumption forecasting. In the BDEESN, bagging is used to reduce the forecasting error and improve the generalization of the network, and DE is used to find suitable parameters of the ESN. This study is the first to combine ESN, bagging, and DE to forecast energy consumption. Moreover, three examples are used to validate the accuracy and reliability of BDEESN.

1.1 Literature review on energy consumption forecasting

Many studies address energy consumption with various forecasting methods. The methods are divided into five types: statistical models, grey models (GMs), artificial intelligent (AI) models, hybrid models, and bottom-up models. A summary of typical studies under each type of method is shown in Table 1.

For statistical models, Erdogdu [2] used cointegration analysis and the autoregressive integrated moving average (ARIMA) for Turkey's electricity demand. Limanond et al. [3] forecasted transport energy demand in Thailand with a log-linear regression model. Shao et al. [4] proposed a semiparametric model for mid-long term electricity consumption forecasting in China. de Oliveira and Oliveira [5] forecasted electric energy consumption in many countries with bagging ARIMA and exponential smoothing models. For GMs, Hsu and Chen [6] developed an improved GM for power demand forecasting. Zhou et al. [7] applied a trigonometric GM to forecast electricity demand. Wu et al. [8] proposed a nonlinear grey Bernoulli model to forecast the renewable energy consumption of China.

In recent years, owing to the increasing complexity and irregularity of the energy forecasting problem, AI models have received wider attention because of their many advantages, including powerful nonlinear fitting capability, the capability to deal with noisy data, and satisfactory performance. The widely used AI models include particle swarm optimization (PSO), the genetic algorithm (GA), the artificial neural network (ANN), and the support vector machine (SVM). Askarzadeh [9] studied the performance of different PSO variants to estimate Iran's electricity demand. Canyurt and Ozturk [10] investigated Turkey's fossil fuel demand, projection, and supplies based on GA. Dumitru and Gligor [11] proposed an architecture based on a feedforward artificial neural network (FFNN) for wind power forecasting. Zhang et al. [12] applied an SVM optimized by the cuckoo search algorithm to forecast the short-term electric load. Zong [13] developed ANN models to forecast South Korea's transport energy demand with various independent variables and obtained robust results. Yu et al. [14] forecasted global oil consumption with AI models and online big data.

Besides, there are several hybrid models for energy consumption forecasting. Kıran et al. [15] proposed a hybrid approach based on PSO and ant colony optimization (ACO) to forecast Turkey's energy demand. Zeng et al. [16] developed a hybrid intelligent model based on adaptive differential evolution (ADE) and the back-propagation neural network (BPNN). Ahmad and Chen [17] forecasted district-level energy demand based on machine learning models.
Wei et al. [18] developed a hybrid model, ISSA-LSTM, combining improved singular spectrum analysis (ISSA) with long short-term memory (LSTM) for natural gas consumption forecasting. For bottom-up models, Comodi et al. [19] analyzed an Italian seaside town with the integrated MARKAL-EFOM system (TIMES). Huang et al. [20] applied the long-range energy alternatives planning system (LEAP) to a long-term forecast of Taiwan's energy supply and demand. Tsai and Chang [21] studied Taiwan's 2050 low-carbon development with the market allocation (MARKAL) model.

Table 1 Summary of typical studies under each type of method used for energy consumption forecasting.
Classification | Methods | Influencing factors | Forecasted energy type | Forecasted areas
Statistical models | Cointegration analysis and ARIMA [2] | GDP per capita, electricity prices, net electricity consumption per capita | Electricity demand | Turkey
Statistical models | Log-linear regression model [3] | GDP, POP, the numbers of registered vehicles | Transport energy demand | Thailand
Statistical models | Semiparametric model [4] | IMP, EXP, deposits in financial institutions | Electricity consumption | China
Statistical models | Bagging ARIMA and exponential smoothing [5] | - | Electric energy consumption | Canada, France, Italy, Japan, Brazil, Mexico, and Turkey
Grey models | Improved GM [6] | - | Power demand | Taiwan
Grey models | Trigonometric GM [7] | - | Electricity demand | China
Grey models | Nonlinear grey Bernoulli model [8] | - | Renewable energy consumption | China
Artificial intelligent models | PSO [9] | GDP, POP, IMP, EXP | Electricity demand | Iran
Artificial intelligent models | GA [10] | GDP, POP, IMP, EXP | Fossil fuels demand | Turkey
Artificial intelligent models | FFNN [11] | - | Wind power energy production | South-East part of Europe
Artificial intelligent models | SVM [12] | - | Electric load | Australia
Artificial intelligent models | ANN [13] | GDP, POP, oil price, number of vehicle registrations, passenger transport amount | Transport energy demand | South Korea
Artificial intelligent models | ANN and SVM [14] | Online big data | Oil consumption | Global countries
Hybrid models | PSO-ACO [15] | GDP, POP, IMP, EXP | Energy demand | Turkey
Hybrid models | ADE-BPNN [16] | GDP, POP, IMP, EXP | Total energy consumption | China
Hybrid models | ANN with nonlinear autoregressive [17] | Environmental and aggregated energy consumption data | Energy demand | Several districts
Hybrid models | ISSA-LSTM [18] | - | Natural gas consumption | Four representative cities
Bottom-up models | TIMES [19] | GDP, POP, energy commodity prices | Households, transport, and the public sectors | Italy
Bottom-up models | LEAP [20] | GDP, POP, energy conversion portion | Energy supply and demand | Taiwan
Bottom-up models | MARKAL [21] | GDP, POP, industrial structure, household number | Electricity, industry, household and service, and transportation sectors | Taiwan
This study proposes a forecasting model based on the causal relationship between energy consumption and the chosen influencing factors. Many studies have used GDP, POP, IMP, and EXP to forecast the energy consumption of a nation [16,22]. GDP is an important indicator of the overall economic status of a nation, POP has a direct impact on energy use, and IMP and EXP reflect the total size of a nation's foreign trade. These four factors are the most closely related to the energy consumption of a nation [23]. Therefore, GDP, POP, IMP, and EXP are selected as the influencing factors in the two comparative examples forecasting the energy consumption of Turkey and Iran, respectively. In China, however, GDP and POP are the most and second most important factors influencing energy consumption, respectively; IMP has only a slight impact, and EXP has almost no effect [16]. In addition, other factors affect energy consumption, such as the urbanization rate (URB) and the share of secondary industry (SEC). The influence of URB [24,25] and SEC [26] on energy use in China has attracted wide attention and research. The Pearson and Spearman coefficient analysis is used to select the best influencing factors in the extended application. Electricity energy consumption (EEC) forecasting is widely accepted as an important type of energy forecasting, and similar methods can be used to forecast total energy consumption (TEC) [16].

1.2 Contributions

ESN, a recurrent neural network (RNN), is employed in this study to forecast energy consumption because of its capability to capture non-linear system behavior. ESN has a dynamic reservoir as its information processing unit, which contains many randomly and sparsely connected neurons [27]. Furthermore, only the readout weights whose destinations are in the output layer need to be trained in the learning process [28]. Thus, ESN has two significant merits: high global optimality and low learning complexity. ESN has been successfully applied to various areas in recent years, including decoding human emotions [29] and dissolved oxygen control [30], but it is rarely employed in energy consumption forecasting.

ESN is unstable because some weight matrices are randomly selected and then remain unaltered. Bagging, a powerful ensemble learning algorithm, is used to reduce the forecasting error by combining the advantages of various networks [31]. Bagging can also improve the generalization of ESN [32]. Besides, studies show that the randomly generated reservoir greatly affects the performance of ESN [28,33]; thus, DE is used to set suitable parameters for the reservoir to obtain satisfactory performance. Based on the above analysis, the bagging process is applied to the ESN improved by DE (DEESN) as a base model.

To validate the accuracy and reliability of BDEESN, three examples are studied: the annual EEC of Turkey from 1979 to 2006, the annual EEC of Iran from 1982 to 2009, and the annual TEC of China from 1990 to 2015. The first two are comparative examples that have been studied in previous papers [16,22] and [9], respectively; the BDEESN is compared with the existing popular models provided in these studies on the same data. The third example is a real application aimed at forecasting the TEC of China.
China, as the largest developing country, has achieved continuous and fast development. In 2009, China surpassed the United States and became the world's largest energy consumer. The future energy consumption of China not only affects its own energy security but is also of great significance to the global energy market. Therefore, forecasting the TEC of China has been chosen as the real application.

This study goes deeper on the basis of Wang et al.'s previous research [33]. In terms of the proposed model, bagging and DE are both used to enhance the forecasting performance of ESN in this study, whereas only the latter is used in [33]. Bagging, a powerful ensemble learning algorithm, can reduce the forecasting error and improve the generalization of ESN. Besides, for comparison with DE, another intelligent optimization algorithm, GA, is also used to find suitable parameters of the reservoir; ESN improved by GA (GAESN) and bagged ESN improved by GA (BGAESN) are both applied in the three examples. In terms of the data used, data on the influencing factors are applied to forecast energy consumption based on their causal relationship in this study; in particular, the Pearson and Spearman coefficient analysis is used to select the best influencing factors in the extended application, whereas only historical energy consumption data are used in [33]. Based on the above analysis, this study extends the previous work by introducing bagging into the model and considering the impact of the influencing factors in the data used. In addition, the forecasting performance of BDEESN can be fully validated by comparing it with GAESN and BGAESN.

The novelties of this study are as follows:
(a) To the best of our knowledge, no studies have used a bagged ESN improved by DE for energy consumption forecasting. In the proposed BDEESN, bagging is used to reduce the forecasting error and improve the generalization of the network, and DE is used to find suitable parameters of ESN.
(b) Three examples are used to fully validate the accuracy and reliability of BDEESN, including two comparative examples and an extended application. BDEESN is compared with existing popular models, BPNN, RNN, LSTM, ESN, bagged ESNs (BESN), GAESN, DEESN, and BGAESN, to ensure a comprehensive evaluation of BDEESN.
(c) The Pearson and Spearman coefficient analysis is used to select the best influencing factors in the extended application. The two correlation coefficients reflect the direction and degree of the changing trend between two variables.
(d) The effects of bagging and DE, alone and together, on forecasting accuracy are examined. Besides, the effects of DE and GA are compared. These can be achieved by comparing the results of ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN.
The flowchart of EEC/TEC forecasting is presented in Fig. 1.
[Fig. 1 sketch: BDEESN — data and preprocessing → train S ESN models improved by DE → generate S forecasting results → average to get the final forecasting result; applied with influencing factors and EEC data to the EEC forecasting of Turkey and Iran (comparative cases) and with influencing factors and TEC data to the TEC forecasting of China (real application), to prove the superiority of the BDEESN.]
Fig. 1. Flowchart of EEC/TEC forecasting.
1.3 Organization of paper

The remainder of this study consists of four sections. Section 2 gives a background overview of ESN, bagging, and DE. Section 3 introduces the proposed enhanced optimization model, BDEESN. Section 4 presents the three examples and analyzes the economic benefits of BDEESN. The conclusions and future research directions are given in Section 5.
2 Related backgrounds

The proposed BDEESN combines the merits of three techniques, namely, ESN, bagging, and DE. The three approaches are briefly reviewed in this section.

2.1 Echo state network

ESN is introduced in this subsection, including its basic theory and three parameters.

2.1.1 Basic theory of echo state network

As a dynamic RNN proposed by Jaeger [34], the ESN has a hidden layer modeled by a reservoir. As shown in Fig. 2, ESN is composed of three layers, namely, input, reservoir, and output layers, where the input layer has M input units, the reservoir has N internal units, and the output layer has L output units. Here, L = 1 because this study deals with single-step forecasting problems. At moment i (the total number of training time steps is I and the washout length is I_0), the input units, internal units, and output units are given by Eqs. (1), (2), and (3), respectively. The typical update equations of the internal and output units are defined as Eqs. (4) and (5).
[Fig. 2 sketch: the input layer u(i) feeds the reservoir x(i) through w^in; the reservoir has recurrent weights w and receives output feedback y(i) through w^back; the readout weights w^out connect the reservoir to the output layer.]
Fig. 2. ESN architecture.
u(i) = [u_1(i), u_2(i), …, u_M(i)]^T    (1)
x(i) = [x_1(i), x_2(i), …, x_N(i)]^T    (2)
y(i) = [y_1(i), y_2(i), …, y_L(i)]^T    (3)
x(i+1) = f(w^in · u(i+1) + w · x(i) + w^back · y(i))    (4)
y(i+1) = g(w^out · x(i+1))    (5)

Here f and g are the activation functions of the reservoir and output units, respectively. There are four weight matrices, namely, w^in (N × M), w (N × N), w^back (N × L), and w^out (L × N), which represent the input, internal reservoir, output feedback, and readout weight matrices, respectively. The readout matrix w^out, from the reservoir to the output layer, is updated during the learning process, while the other three weight matrices are randomly selected and then remain unaltered [28,35]. For the calculation of the readout weights, collect the reservoir state vectors M ((I − I_0 + 1) × N) and the target outputs T ((I − I_0 + 1) × L) for the time steps equal to or greater than I_0. Then, the readout weights are calculated with Eq. (6), where M^{−1} denotes the (pseudo)inverse of M.

w^out = (M^{−1} · T)^T    (6)
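To make the update and readout equations concrete, the following is a minimal numpy sketch of an ESN of this form. It assumes a tanh reservoir activation f, an identity output activation g, and, for simplicity, no output feedback (w^back = 0); the function and variable names are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def train_esn(U, Y, N=25, alpha=0.05, rho=0.8, washout=9, seed=0):
    """Minimal ESN: tanh reservoir, identity readout, no output feedback."""
    rng = np.random.default_rng(seed)
    I, M = U.shape                                   # I time steps, M input units
    w_in = rng.uniform(-1, 1, (N, M))                # input weights, fixed after init
    w = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < alpha)  # sparse reservoir
    radius = np.max(np.abs(np.linalg.eigvals(w)))
    if radius > 0:
        w *= rho / radius                            # rescale to the desired spectral radius
    X = np.zeros((I, N))
    x = np.zeros(N)
    for i in range(I):                               # collect reservoir states, Eq. (4) without w_back
        x = np.tanh(w_in @ U[i] + w @ x)
        X[i] = x
    M_states = X[washout:]                           # drop the washout steps
    T = Y[washout:]
    w_out = (np.linalg.pinv(M_states) @ T).T         # readout via pseudoinverse, Eq. (6)
    return w_in, w, w_out

def esn_predict(w_in, w, w_out, U, x0=None):
    """Run the trained ESN over inputs U and return the readout outputs, Eq. (5)."""
    x = np.zeros(w.shape[0]) if x0 is None else x0
    preds = []
    for u in U:
        x = np.tanh(w_in @ u + w @ x)
        preds.append(w_out @ x)
    return np.array(preds)
```

Here U is an I × M array of inputs and Y an I × L array of targets; both are assumed to be already normalized as described in Section 4.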
2.1.2 Three parameters of echo state network

Three crucial parameters of the reservoir have a great influence on the performance of the ESN, so suitable values must be set for them. The number of neurons N in the reservoir has a great impact on the performance of ESN because of its exponential relation with the evolution of the hidden states [28]. The range of N depends on the length of the training data and the complexity of the targeted application, and it is determined based on many experiments in this study. Besides, the connectivity rate α also affects the performance of ESN. According to Jaeger, too few connections may result in a loss of reservoir states and a lack of memory, whereas too many connections make decoding difficult [28]. Based on previous studies, α is usually set within 1%–5% [36].
In addition, the spectral radius ρ, the largest absolute eigenvalue of the weight matrix w, is also of great importance. ρ is suggested to lie within the interval (0, 1) to ensure the echo state property, which means that the current network state depends only on the input history and the training sample's outputs and becomes independent of its initial state after some iterations when the inputs are sufficiently long [37,38]. Based on the above analysis, N, α, and ρ are studied in detail.

2.2 Bagging

Bagging, a powerful ensemble learning algorithm, is used to improve the performance of machine learning algorithms by combining the advantages of several base models [31]. Bagging can improve unstable forecasting because it generates various base models from the unstable learning algorithm; it can be regarded as a good way of using the instability to reduce the forecasting error [32,39]. Each base model is trained with a training sample generated by the bootstrap method [40]. Bagging comprises the following steps (a brief code sketch of the procedure is given at the end of this section). First, generate S bags of training samples, each of a certain size NT, obtained by randomly drawing successively with replacement from the original training set. Second, train S base models with the different training samples obtained. Subsequently, test the S trained models on the testing set, so that each model yields a forecasting result. Finally, take the average value of the S forecasting results as the final forecasting result [41].

Two parameters affect the performance of bagging: the size of the training sample NT and the number of models S [32]. Previous studies noted that training can show satisfactory performance on training samples that include approximately 60%–80% of the original training set; about 63% of the original training set is effectively selected when NT is almost equal to the size of the original training set [32,42]. S, which depends on NT and on the acceptable calculation cost, is usually set to approximately 50 [43]. Based on recommendations from previous studies and the specific examples in this study, NT and S are obtained by the grid search algorithm to improve the accuracy of the forecasting model and avoid overfitting; details are given in Section 4.1.3. Finally, the calculation time may be increased by S times under bagging, but the increase can be much smaller when the S base models are trained and tested in parallel, which can be realized in bagged neural networks.

2.3 Differential evolution algorithm

In recent years, with the rapid development of computer technology, many intelligent optimization algorithms have been proposed to solve non-linearity, global optimization, combinatorial optimization, and other complex problems. Intelligent optimization algorithms are heuristic optimization algorithms, including GA [44], DE [45,46], PSO [9], etc. In the proposed BDEESN, DE is used to search for suitable parameters because of its simplicity, effectiveness, and reliability [47,48]. The steps of DE can be found in [48] and are not repeated here to keep the article to a reasonable length.
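As a rough illustration of the bagging procedure described in Section 2.2, the following Python sketch bootstraps S training samples of size NT, fits one base model on each, and averages the test forecasts. The fit_fn/predict_fn interface is an assumption standing in for any base learner (for example, an ESN trainer such as the sketch in Section 2.1.1), not the paper's code.

```python
import numpy as np

def bagged_forecast(X_train, y_train, X_test, fit_fn, predict_fn, S=50, NT=None, seed=0):
    """Bootstrap S training samples, fit S base models, and average their forecasts."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    NT = n if NT is None else NT
    forecasts = []
    for _ in range(S):
        idx = rng.integers(0, n, size=NT)           # draw with replacement
        model = fit_fn(X_train[idx], y_train[idx])  # train one base model on the bag
        forecasts.append(predict_fn(model, X_test)) # forecast the testing set
    return np.mean(forecasts, axis=0)               # final forecast = average of S results
```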
3 Proposed enhanced optimization model

The proposed BDEESN is described in this section. Sections 3.1 and 3.2 report the rationale for using bagging and DE, respectively. An introduction to the base model DEESN and the overall procedure of BDEESN is given in Section 3.3.

3.1 Rationality of using bagging

Bagging can lead to smaller forecasting errors than a single neural network by combining the advantages of various networks. Moreover, bagging improves generalization because each training sample is a random combination of the original training set: some original data points may appear repeatedly, whereas others do not appear at all [32]. Bagging works especially well when small changes in the dataset may result in large changes in the forecasting results [49]. Bagging has been applied in traffic flow forecasting [49], air transportation demand forecasting [50], financial time series forecasting [51], and bankruptcy forecasting [52] in the past years.

The base model DEESN is unstable because three weight matrices in the ESN are randomly initialized and kept unchanged. Hence, bagging, which utilizes this instability to reduce forecasting errors, is necessary. Bagging can also improve the generalization of the network. The bagging process is applied to DEESN for these two reasons.

3.2 Rationality of using differential evolution

DE, which is welcomed for its simplicity, effectiveness, and reliability, is a population-based intelligent optimization algorithm with four operations: initialization, mutation, crossover, and selection. DE has been shown to reach good solutions more reliably than other evolutionary algorithms (EAs), such as GA, on various problems [44]. It has shown success in recent years in various problems, including continuous multi-objective optimization [53], complex function optimization problems [54], and feature selection in facial expression recognition systems [55].

The three parameters N, α, and ρ have a great influence on the performance of ESN. Although some recommendations were given by previous studies, setting suitable parameters for a specific example is difficult. Thus, the DE algorithm is used to choose the three parameters of ESN in this study.

3.3 Proposed model

The proposed model BDEESN is presented in this subsection, including its base model DEESN and the overall process of BDEESN.

3.3.1 Base model: echo state network improved by differential evolution

(1) Decoding scheme of DE
The global optimal parameters for each base model DEESN, including N, α, and ρ, are obtained by conducting a pre-training based on DE. The dimension D of each individual is equal to 3, and the details are shown in Fig. 3. The base model is determined when the three parameters obtained from pre-training are re-injected into the network.

(2) Fitness function

The three parameters of ESN are obtained from each DE optimization; then the forecasting value ŷ_t (t = 1, 2, …, k; k is the number of output samples) is computed based on the obtained parameters. In this study, the mean square error (MSE), Eq. (7), is chosen as the fitness function, where y_t is the observed value.

MSE = (1/k) Σ_{t=1}^{k} (y_t − ŷ_t)^2    (7)
(3) Flowchart of DEESN

The flowchart of DEESN is shown in Fig. 4. In DE, each individual in a population of size NP is represented by a D-dimensional vector. Each gene lies within [Umin, Umax], and the maximum iteration number is maxgen. F and CR are the mutation factor and crossover factor, respectively. The population is initialized based on the above parameters.

[Fig. 3 sketch: each individual is a D = 3 vector encoding the scale of the reservoir N, the connectivity rate α, and the spectral radius ρ.]
Fig. 3. Structure of each individual from DE.
[Fig. 4 flowchart: Start → initialize the population and set G = 0 → train the ESN with the training sample → calculate the fitness values and obtain the present global optimal value and individual → if the present global optimal value ≤ μ or G ≥ maxgen, assign the best individual as the three parameters and forecast with the testing set (End); otherwise, conduct mutation, crossover, and selection to generate the offspring population, set G = G + 1, and repeat.]
Fig. 4. Flowchart of base model DEESN.
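One possible way to realize this pre-training loop is sketched below with scipy.optimize.differential_evolution, reusing the illustrative train_esn/esn_predict helpers from Section 2.1.1. Note that scipy's DE is parameterized differently from the NP/F/CR scheme of Table 2, so the call only loosely mirrors the paper's settings; N is evolved as a real number and rounded before each fitness evaluation, as described later in Section 4.1.3. This is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import differential_evolution

def make_fitness(U_train, Y_train, washout=9):
    """MSE of the ESN on the training data for a candidate (N, alpha, rho) vector, Eq. (7)."""
    def fitness(params):
        N, alpha, rho = params
        N = int(round(N))                                  # N is evolved as a real, then rounded
        w_in, w, w_out = train_esn(U_train, Y_train, N=N, alpha=alpha,
                                   rho=rho, washout=washout)
        pred = esn_predict(w_in, w, w_out, U_train)[washout:]
        return float(np.mean((Y_train[washout:] - pred) ** 2))
    return fitness

def search_esn_parameters(U_train, Y_train, washout=9, maxgen=30, seed=1):
    """Pre-training of Fig. 4: search N, alpha, rho within their recommended ranges."""
    bounds = [(20, 30), (0.01, 0.05), (0.1, 0.99)]         # example search spaces (cf. Table 2)
    result = differential_evolution(make_fitness(U_train, Y_train, washout), bounds,
                                    maxiter=maxgen, tol=1e-5, seed=seed)
    return int(round(result.x[0])), result.x[1], result.x[2]
```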
3.3.2 Process of the proposed model

The overall process of BDEESN is divided into the following five steps (a compact code sketch follows the list); the flowchart is shown in Fig. 5.
Step 1: Gather the data, preprocess them if necessary, and divide the data into a training set and a testing set.
Step 2: Generate S bags of training samples by randomly drawing successively with replacement from the original training set obtained in Step 1.
Step 3: Train S base models DEESN_i (i = 1, …, S), one on each training sample.
Step 4: Test the S trained models on the testing set and obtain S forecasting results.
Step 5: Take the average value of the S forecasting results as the final forecasting result.
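Putting Steps 2–5 together, a compact sketch of the BDEESN ensemble could look as follows; it reuses the hypothetical helpers defined in the earlier sketches (train_esn, esn_predict, search_esn_parameters), and only the structure, not the exact code, is what the paper specifies.

```python
import numpy as np

def bdeesn_forecast(U_train, Y_train, U_test, S=50, NT=None, washout=9, seed=0):
    """Steps 2-5: bootstrap S training samples, fit a DE-tuned ESN on each, average forecasts."""
    rng = np.random.default_rng(seed)
    n = len(U_train)
    NT = n if NT is None else NT
    forecasts = []
    for s in range(S):
        idx = rng.integers(0, n, size=NT)                        # Step 2: bootstrap sample
        N, alpha, rho = search_esn_parameters(U_train[idx], Y_train[idx],
                                              washout=washout, seed=seed + s)
        w_in, w, w_out = train_esn(U_train[idx], Y_train[idx], N=N, alpha=alpha,
                                   rho=rho, washout=washout)     # Step 3: train base DEESN
        forecasts.append(esn_predict(w_in, w, w_out, U_test))    # Step 4: forecast testing set
    return np.mean(forecasts, axis=0)                            # Step 5: average the S results
```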
Fig. 5. Flowchart of BDEESN.
4 Experimental study and economic benefits analysis

The BDEESN model is applied to three examples to validate its accuracy and reliability. Besides, the impact of BDEESN on economic benefits is analyzed.

4.1 Example 1: electricity energy consumption forecasting in Turkey

This example is comparative example 1 and is intended to forecast the annual EEC of Turkey. The data set, evaluation metrics, parameter setting, and forecasting results are presented as follows.

4.1.1 Data set

Four influencing factors, namely GDP, POP, IMP, and EXP, are used to forecast the EEC of Turkey, consistent with studies [16,22]. Fig. 6 and Table A1 present the EEC and the four influencing factors of Turkey from 1979 to 2006. The data are obtained from studies [16,22].
Fig. 6. Data of the electricity demand and four factors in Turkey from 1979 to 2006.
For the same problem and data, the BDEESN is compared with existing good methods proposed in previous studies, including ACOQ [22], BABCEEQ [22], and ADE-BPNN [16]. In addition, the forecasting results are compared with those of several other models, including basic ESN, BESN, GAESN, DEESN, and BGAESN. A total of 28 annual data points are selected as the sample, of which the first 18 are used as the training set and the last 10 as the testing set.
Data preprocessing is first conducted with Eq. (8) to eliminate the dimensional effect, which is consistent with a previous study [16]. The data of each series lie within [0.1, 0.9] after preprocessing.

D_n = (D − D_min) / (D_max − D_min) × (0.9 − 0.1) + 0.1    (8)
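For illustration, Eq. (8), and the Eq. (12) variant used later for Iran and China, correspond to a simple min-max mapping such as the following sketch (function name and interface are illustrative).

```python
import numpy as np

def minmax_scale(D, low=0.1, high=0.9):
    """Eq. (8): map a series linearly into [low, high]; Eq. (12) is the low=0, high=1 case."""
    D = np.asarray(D, dtype=float)
    return (D - D.min()) / (D.max() - D.min()) * (high - low) + low
```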
4.1.2 Evaluation metrics

To evaluate and compare the forecasting performance of the BDEESN and the eight other models, three popular evaluation metrics are selected: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), calculated by Eqs. (9), (10), and (11), respectively. RMSE reflects the deviation between the forecast and actual values and is extremely sensitive to outliers. MAE, the average of the absolute errors, reflects the magnitude of the forecasting errors well. MAPE measures the relative accuracy of a model's forecasts [33].

RMSE = sqrt( (1/k) Σ_{t=1}^{k} (y_t − ŷ_t)^2 )    (9)
MAE = (1/k) Σ_{t=1}^{k} |y_t − ŷ_t|    (10)
MAPE = (1/k) Σ_{t=1}^{k} |y_t − ŷ_t| / y_t    (11)
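The three metrics of Eqs. (9)–(11) can be computed with straightforward helper functions such as the sketch below; the ×100 factor in mape is only there because the paper reports MAPE as a percentage.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat) / y) * 100)   # reported as a percentage
```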
4.1.3 Parameter setting

In addition to BDEESN, basic ESN, BESN, GAESN, DEESN, and BGAESN are also used to forecast the EEC of Turkey. The number of input units is 4 because there are four influencing factors. Based on the recommendations reported in Sections 2.1.2 and 2.2, the parameters N, α, and ρ of GAESN, DEESN, BGAESN, and BDEESN are determined by the corresponding algorithm; the NT and S of BESN, BGAESN, and BDEESN are obtained by grid search; and the remaining parameters of the six ESN-based models are obtained by the trial-and-error method. For the parameters NT and S, the search spaces, which are [16, 18] and [49, 51] respectively, are determined according to previous studies and several experiments, and grid search is then used to find the optimal parameter values. Grid search is an exhaustive search over each possible combination of parameters and is suitable for optimizing three or fewer parameters. The trial-and-error method yields satisfactory results by trying different combinations of parameters until the error is small enough, and it is an effective and common parameter-setting method [18,33]. The search spaces of DE/GA and the other parameter values used in this example are shown in Table 2. The same parameter-setting methods are used in examples 2 and 3.
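As a rough sketch of the grid search over NT and S described above, the following exhaustively evaluates every combination in the given search spaces. The build_fn argument is a placeholder for a bagged model builder (for example, the bdeesn_forecast sketch from Section 3.3.2), and scoring on a held-out validation split is an assumption, since the paper does not state how the grid is scored.

```python
import itertools
import numpy as np

def grid_search_bagging(U_train, Y_train, U_val, Y_val, NT_space, S_space, build_fn):
    """Try every (NT, S) pair exhaustively and keep the one with the lowest validation MSE."""
    best, best_err = None, np.inf
    for NT, S in itertools.product(NT_space, S_space):
        pred = build_fn(U_train, Y_train, U_val, NT=NT, S=S)
        err = float(np.mean((np.asarray(Y_val, float) - np.asarray(pred, float)) ** 2))
        if err < best_err:
            best, best_err = (NT, S), err
    return best, best_err

# Example search spaces from this section (Turkey): NT in [16, 18], S in [49, 51]:
# best_params, _ = grid_search_bagging(U_tr, Y_tr, U_val, Y_val,
#                                      range(16, 19), range(49, 52), bdeesn_forecast)
```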
Table 2 Parameters for the six ESN based models in Turkey.
Part          Parameter  ESN       BESN      GAESN         DEESN         BGAESN        BDEESN
ESN part      N          25        25        [20, 30]      [20, 30]      [20, 30]      [20, 30]
              α          0.05      0.05      [0.01, 0.05]  [0.01, 0.05]  [0.01, 0.05]  [0.01, 0.05]
              ρ          0.8       0.8       [0.1, 0.99]   [0.1, 0.99]   [0.1, 0.99]   [0.1, 0.99]
              I0         9         9         9             9             9             9
              f          tangent   tangent   tangent       tangent       tangent       tangent
              g          identity  identity  identity      identity      identity      identity
Bagging part  NT         -         17        -             -             18            18
              S          -         49        -             -             51            50
DE/GA part    NP         -         -         20            20            20            20
              maxgen     -         -         30            30            30            30
              μ          -         -         10^(−5)       10^(−5)       10^(−5)       10^(−5)
              F          -         -         0.03          0.9           0.03          0.9
              CR         -         -         0.8           0.1           0.8           0.1
DE and GA are used for continuous optimization, where the optimized parameters are real numbers. Therefore, DE and GA must be modified here because the parameter N is an integer. Similar to studies [27,56], N is treated as a real variable during the evolution process; then, before evaluating the fitness value, N is changed to the integer nearest to the obtained real number within the given interval. The same approach is used in the latter two examples.

4.1.4 Forecasting results

For BDEESN and DEESN, DE is used for a pre-training of the three parameters of ESN to find the most suitable values for the specific example. Fig. 7 presents one optimization process of DE in example 1 for illustration. The iteration terminates when the number of iterations reaches maxgen = 30; the three obtained parameters, based on the decoding scheme stated in Section 3.3.1, are then sent back to the network, and the ESN performs normal training.
Fig. 7. Iterative MSE trend of ESN searching (For Turkey).
The EEC forecasting results of Turkey from 1997 to 2006 obtained by the various models are listed in Table 3. The forecasting results of ACOQ, BABCEEQ, and ADE-BPNN are from previous studies [16,22]. Furthermore, the RMSE, MAE, and MAPE of the nine models are shown in Table 3. The three metrics of BDEESN are the smallest among all models, showing the best performance and the improvement gained by simultaneously using bagging and DE. In addition, the results of BESN and DEESN are the second and third best, respectively, in terms of the calculated metrics, which indicates that bagging and DE can individually enhance the performance of ESN. Besides, DEESN is better than GAESN and BDEESN is better than BGAESN on the three metrics, which indicates that DE is more suitable for finding the parameters of ESN in the EEC forecasting of Turkey.

Table 3 Forecasting results of annual electricity demand in Turkey by using the BDEESN and other models (unit: GWh).
Years     Actual   ACOQ     BABCEEQ  ADE-BPNN  ESN       BESN      GAESN     DEESN     BGAESN    BDEESN
1997      105.517  101.936  103.668  104.7468  105.9064  104.9574  105.0508  106.0147  104.5874  104.6539
1998      114.023  110.643  106.827  108.8124  114.2341  114.8100  114.5778  114.1636  114.7906  114.5894
1999      118.485  109.321  106.064  112.5695  118.5879  118.2620  117.6456  117.8292  118.1631  118.0899
2000      128.276  129.396  131.188  123.5975  124.1821  126.7028  124.1294  126.9573  126.4521  126.2198
2001      126.871  123.629  119.434  127.5462  127.4712  128.5260  127.4386  129.5569  128.1864  128.0335
2002      132.553  133.644  131.993  133.1803  131.4178  132.3845  133.4321  133.5362  132.4794  132.3052
2003      141.151  141.689  145.786  141.1699  146.1361  145.7651  146.1197  144.8990  146.6422  145.6852
2004      150.018  147.806  157.713  151.2757  154.5139  152.0983  150.1128  154.9609  152.0528  151.6339
2005      160.794  163.158  160.555  160.2127  156.6309  156.2699  156.4876  156.3014  156.3311  156.8674
2006      174.637  172.819  171.046  174.2285  168.1348  171.5545  168.2841  174.7858  170.6078  171.2129
RMSE               3.6777   6.0600   2.9591    3.5113    2.4862    3.2112    2.6325    2.7761    2.3911
MAE                2.8510   4.8535   2.0141    2.6679    1.9267    2.3177    1.9614    2.1250    1.8792
MAPE (%)           2.271    3.772    1.639     1.800     1.330     1.588     1.386     1.467     1.305
Fig. 8 shows the actual and predicted annual EEC of the nine models in Turkey between 1997 and 2006, with the actual values drawn as a line and the predictions as bars. ACOQ, BABCEEQ, and ADE-BPNN show poor performance because their forecasts have large gaps from the actual values. The performance of ESN is not stable. The forecasts of the remaining five models, BESN, GAESN, DEESN, BGAESN, and BDEESN, are much closer to the actual values, of which BDEESN is the closest.
Fig. 8. Forecasting results of Turkey.
Fig. 9. Error analysis of Turkey.
Fig. 9 displays the relative errors of the nine models. The relative errors of BDEESN are small and quite stable, with only a small gap from the horizontal zero axis. The error range [−3%, +3%] is often taken as a standard for evaluating forecasting values [57,58]. Here, only one of BDEESN's ten forecast points slightly exceeds this range, namely 3.212% in 2003. BDEESN thus has the best performance, which coincides with the results shown in Table 3 and Fig. 8. In conclusion, the BDEESN outperforms ACOQ, BABCEEQ, ADE-BPNN, ESN, BESN, GAESN, DEESN, and BGAESN in the EEC forecasting of Turkey, indicating that the forecasting performance of ESN can be improved efficiently by using bagging and DE.

4.2 Example 2: electricity energy consumption forecasting in Iran
This example is comparative example 2, designed to forecast the annual EEC of Iran. The data set, parameter setting, and forecasting results are presented as follows.

4.2.1 Data set

GDP, POP, IMP, and EXP are the influencing factors used in forecasting the EEC of Iran, consistent with study [9]. The annual data of the electricity demand (EEC) and the four influencing factors in Iran from 1982 to 2009 are presented in Fig. 10 and Table A2; the data are from study [9].
Fig. 10. Data of the electricity demand and four factors in Iran from 1982 to 2009.
The BDEESN is compared with CLPSO-E (an exponential model optimized by comprehensive learning PSO) and PSO-Q (a quadratic model optimized by PSO) proposed in study [9]. Besides, ESN, BESN, GAESN, DEESN, and BGAESN are applied to forecast the EEC of Iran, and their results are included in the comparison. A total of 28 annual data points are selected as the sample, of which the first 22 are used as the training set and the last 6 as the testing set. The data are preprocessed by Eq. (12) into the range [0, 1], as in study [9].

D_n = (D − D_min) / (D_max − D_min)    (12)
4.2.2 Parameter setting

Basic ESN, BESN, GAESN, DEESN, and BGAESN are also used to forecast the EEC of Iran. Four input units are used in this example. The search spaces of NT and S are [20, 22] and [34, 36], respectively, for the grid search. The search spaces of DE/GA and the other parameter values are shown in Table 4.
Table 4 Parameters for the six ESN based models in Iran.
Part          Parameter  ESN       BESN      GAESN         DEESN         BGAESN        BDEESN
ESN part      N          12        12        [11, 15]      [11, 15]      [11, 15]      [11, 15]
              α          0.05      0.05      [0.01, 0.05]  [0.01, 0.05]  [0.01, 0.05]  [0.01, 0.05]
              ρ          0.7       0.7       [0.1, 0.99]   [0.1, 0.99]   [0.1, 0.99]   [0.1, 0.99]
              I0         11        11        11            11            11            11
              f          tangent   tangent   tangent       tangent       tangent       tangent
              g          identity  identity  identity      identity      identity      identity
Bagging part  NT         -         22        -             -             20            20
              S          -         35        -             -             35            36
DE/GA part    NP         -         -         20            20            20            20
              maxgen     -         -         30            30            30            30
              μ          -         -         10^(−5)       10^(−5)       10^(−5)       10^(−5)
              F          -         -         0.03          0.9           0.03          0.9
              CR         -         -         0.7           0.1           0.7           0.1
4.2.3 Forecasting results

Fig. 11 shows one optimization process of DE in example 2. When the optimization process ends, the three obtained optimized parameters are sent back to the network, and the ESN performs normal training.
Fig. 11. Iterative MSE trend of ESN searching (For Iran).
Eight models are used to forecast the EEC of Iran from 2004 to 2009. Table 5 lists the forecasting results of all models, among which the results of CLPSO-E and PSO-Q are obtained by rebuilding the corresponding models based on the parameters given in study [9]. Besides, the RMSE, MAE, and MAPE of the eight models are shown in Table 5. Based on the three metrics, the ESN-centered models, namely ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN, have higher accuracy than CLPSO-E and PSO-Q, which indicates the effectiveness of ESN. In addition, BDEESN and BGAESN have comparably good results: the MAE and MAPE of BDEESN are smaller than those of BGAESN, whereas the RMSE of BDEESN is slightly larger than that of BGAESN. These results indicate that DE and GA have similar ability to find suitable parameters in this example. It can also be seen that bagging and the intelligent optimization algorithm result in greater improvement when used simultaneously than individually, which applies to both DE and GA.

Table 5 Forecasting results of annual electricity demand of Iran by using the BDEESN and other models (unit: 10^6 MWh).
Years     Actual   CLPSO-E   PSO-Q     ESN       BESN      GAESN     DEESN     BGAESN    BDEESN
2004      114.624  114.4235  118.7928  113.8597  115.1378  115.3491  115.0048  115.6091  115.1798
2005      125.528  123.1485  132.7117  125.3335  125.3284  125.0383  124.9653  125.4815  125.3165
2006      134.238  133.4989  133.9981  134.9538  134.6559  135.1986  135.0876  134.7336  134.2358
2007      147.001  144.3358  152.7172  140.2374  142.8431  141.0002  141.3980  144.7563  144.1467
2008      155.598  154.8149  171.1062  154.1935  157.1350  155.3693  156.4478  156.9285  156.9630
2009      169.047  165.6219  186.9299  169.1513  166.7233  168.6787  169.7348  166.3204  166.3500
RMSE               2.0695    10.5041   2.8538    2.0627    2.5128    2.3725    1.6053    1.7145
MAE                1.6987    8.4499    1.6578    1.5250    1.4622    1.4890    1.3048    1.2810
MAPE (%)           1.161     5.662     1.153     1.018     1.031     1.030     0.877     0.845
Fig. 12 presents the actual and predicted annual EEC of the eight models in Iran between 2004 and 2009. It can be seen that PSO-Q performs poorly because its forecasting values are much larger than the actual values, except for the value in 2006. The forecasting values of the remaining seven models are all close to the actual values; BDEESN and BGAESN are the closest and fit the EEC of Iran well.
Fig. 12. Forecasting results of Iran.
Fig. 13 displays the relative errors of the eight models in every forecasting year. The relative errors of BDEESN and BGAESN are always within the range [−3%, +3%], which indicates stable and satisfactory performance. The analysis of the relative errors is consistent with the results in Table 5 and Fig. 12.
Fig. 13. Error analysis of Iran.
The proposed BDEESN is superior to CLPSO-E, PSO-Q, ESN, BESN, GAESN, and DEESN in the EEC forecasting of Iran, showing better forecasting performance. It is worth mentioning that BDEESN and BGAESN have comparably good forecasting performance in this example. Bagging and the intelligent optimization algorithm, when used simultaneously, result in greater improvement than when used individually, which applies to both DE and GA.

4.3 Example 3: total energy consumption forecasting in China

This example is a real application designed to forecast the annual TEC of China. The influencing factors selection and data set, the multiple linear regression (MLR) model, the parameter setting, and the forecasting results are presented in this subsection.

4.3.1 Influencing factors selection and data set

In this real application, BDEESN is applied to forecast the TEC of China, the largest energy consumer in the world. Various factors affect the TEC of China, including GDP [16,59], POP [16,59], URB [24,25], SEC [26], the ratio of TEC to GDP (RAT) [59], the proportion of coal consumption in TEC (COA) [59], and the household energy consumption per capita (HOU) [59]. The data of the TEC and the seven factors in China from 1990 to 2015 are collected from the China Statistical Yearbook 2016 [1] and are shown in Fig. 14.
Fig. 14. Annual data of the TEC and seven factors in China from 1990 to 2015.
The best influencing factors are selected based on the calculated Pearson and Spearman correlations between the factors and the TEC of China [60]. The two correlation coefficients reflect the direction and degree of the changing trend between two variables, and their values range from −1 to 1: 0 means the two variables are uncorrelated, a positive value means positive correlation, a negative value means negative correlation, and a larger absolute value means a stronger correlation. The Pearson correlation coefficient is the most common correlation coefficient, but it is greatly affected by outliers. The Spearman correlation coefficient is less affected by outliers because it is calculated according to the rank positions of the original data. Therefore, the two correlation coefficients are used together to select the best influencing factors.

Table 6 shows the two correlation coefficients between each factor and the TEC of China. The table also gives the ranking of each factor on the two correlation coefficients; the smaller the ranking, the stronger the correlation. The last row is the average of each factor's correlation rankings. GDP, POP, and URB rank in the top three by average ranking, and their two correlation coefficients are above 0.9. Based on the above analysis, GDP, POP, and URB are selected as the best influencing factors.

Table 6 Pearson and Spearman correlation coefficients between each factor and TEC in China.
Factor                             GDP     POP     URB     SEC      RAT      COA      HOU
Pearson correlation coefficient    0.969   0.919   0.976   -0.015   -0.717   -0.669   0.984
Ranking                            3       4       2       7        5        6        1
Spearman correlation coefficient   1.000   1.000   1.000   0.007    -0.991   -0.748   0.854
Ranking                            1       1       1       7        4        6        5
Average ranking                    2       2.5     1.5     7        4.5      6        3
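A minimal sketch of the factor screening behind Table 6, using scipy.stats, is given below; the dictionary-based interface and the combined-score ordering are illustrative choices, not the paper's exact ranking procedure.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def rank_factors(tec, factors):
    """Compute Pearson and Spearman correlations of each candidate factor with TEC."""
    rows = []
    for name, series in factors.items():
        p, _ = pearsonr(series, tec)      # Pearson correlation with TEC
        s, _ = spearmanr(series, tec)     # Spearman (rank) correlation with TEC
        rows.append((name, p, s))
    # strongest factors first, by mean absolute correlation (a rough stand-in for the average ranking)
    return sorted(rows, key=lambda r: -(abs(r[1]) + abs(r[2])) / 2)

# factors is assumed to be a dict like {"GDP": [...], "POP": [...], "URB": [...], ...}
```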
The annual data of the TEC and the three best influencing factors in China from 1990 to 2015 are reported in Table A3. A total of 26 annual data points are selected as the sample, of which the first 20 are used as the training set and the last 6 as the testing set.
The data are preprocessed by Eq. (12) to lie within the range [0, 1].

4.3.2 Multiple linear regression model

The MLR model is also considered in this example, but it is not suitable for forecasting the TEC of China. The model can be expressed as Eq. (13):

Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3    (13)
where X_1, X_2, and X_3 are GDP, POP, and URB, respectively. The four parameters a, b_1, b_2, and b_3 are calculated based on ordinary least squares (OLS). Two key points are related to OLS. First, it needs many observations, not fewer than 30, because OLS is based on statistics [61], whereas only 26 annual data points are available in this example. Second, the influencing factors should be uncorrelated [61], whereas the three factors used here are interrelated. Table 7 lists their correlation coefficients; the multicollinearity is extremely severe.

Table 7 Correlation coefficients of three influencing factors.
        GDP     POP     URB
GDP     1       0.8680  0.9387
POP     0.8680  1       0.9761
URB     0.9387  0.9761  1
The multicollinearity may result in incorrect analysis, which means that the regression coefficient of an influencing factor fails to pass the significance test although the factor is significant in simple linear regression. In addition, the sign of a regression coefficient may be affected. The MLR model is built with the normalized training set by using Excel, and the obtained regression model can be expressed as Eq. (14):

Y = 0.02813 + 1.1386 X_1 − 0.3248 X_2 + 0.6022 X_3    (14)
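For reference, the OLS fit behind Eqs. (13)–(14), done in Excel in the paper, amounts to a linear least-squares problem. A minimal numpy sketch is given below, assuming X holds the normalized GDP, POP, and URB columns and y the normalized TEC of the training years.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares for Y = a + b1*X1 + b2*X2 + b3*X3, Eq. (13)."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs                                     # [a, b1, b2, b3]
```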
POP has a positive effect on the TEC of China, whereas its regression coefficient is negative because of the multicollinearity. Based on the above analysis, the MLR model expressed as Eq. (14) cannot be used for forecasting in this example.

4.3.3 Parameter setting

In this example, nine models, namely BDEESN, BPNN, RNN, LSTM, ESN, BESN, GAESN, DEESN, and BGAESN, are used to forecast the annual TEC of China. Among these, BPNN is a widely used FFNN and is often regarded as a benchmark [13,16]. RNN is a network that contains loops, allowing persistence of information, and is particularly suitable for applications related to time series [62]. LSTM is a deep-learning RNN that can solve the long-dependence problem of RNN [18,47]. These three networks have been applied in various time series forecasting fields.
The number of input units is 3 because there are three influencing factors. The search spaces of NT and S are [18, 20] and [49, 51], respectively, for the grid search. The search spaces of DE/GA and the other parameter values are shown in Table 8. For BPNN, based on a series of experiments, the best-fitting network is N3−4−1, the maximum number of epochs is 2000, the learning rate is 0.0005, the accuracy is 0.0005, and the activation functions of the hidden and output layers are the logsig and purelin functions, respectively. For RNN, the batch size is 2, and the numbers of hidden neurons and epochs are 5 and 300, respectively. For LSTM, the batch size is 1, and the numbers of hidden neurons and epochs are 4 and 200, respectively.

Table 8 Parameters for the six ESN based models in China.
Part          Parameter  ESN       BESN      GAESN         DEESN         BGAESN        BDEESN
ESN part      N          15        15        [11, 20]      [11, 20]      [11, 20]      [11, 20]
              α          0.05      0.05      [0.01, 0.05]  [0.01, 0.05]  [0.01, 0.05]  [0.01, 0.05]
              ρ          0.7       0.7       [0.1, 0.99]   [0.1, 0.99]   [0.1, 0.99]   [0.1, 0.99]
              I0         10        10        10            10            10            10
              f          tangent   tangent   tangent       tangent       tangent       tangent
              g          identity  identity  identity      identity      identity      identity
Bagging part  NT         -         20        -             -             20            20
              S          -         50        -             -             50            50
DE/GA part    NP         -         -         20            20            20            20
              maxgen     -         -         30            30            30            30
              μ          -         -         10^(−4)       10^(−4)       10^(−4)       10^(−4)
              F          -         -         0.02          0.9           0.02          0.9
              CR         -         -         0.8           0.1           0.8           0.1
4.3.4 Forecasting results

Similar to examples 1 and 2, Fig. 15 shows one optimization process of DE used to select the parameters of ESN in example 3.
Fig. 15. Iterative MSE trend of ESN searching (For China).
Table 9 lists the forecasting results and the errors of the nine models based on RMSE, MAE, and MAPE. The six ESN based models have lower errors than BPNN, RNN, and LSTM on these three metrics, which indicates the superiority of the ESN. The poor performance of LSTM may be due to the small training set. Furthermore, BESN and DEESN have comparable values, and both are better than the basic ESN in terms of the three metrics, which indicates that bagging and DE can individually enhance the performance of ESN. In addition, BDEESN has the best performance with the lowest errors, which indicates that bagging and DE, when used simultaneously, result in greater improvement. Besides, DE is more suitable for searching parameters in this example because the errors of DEESN are smaller than those of GAESN and the errors of BDEESN are smaller than those of BGAESN.

Table 9 Forecasting results of annual TEC in China using the BDEESN and other models (unit: 10^6 TCE).
Years     Actual    BPNN       RNN      LSTM     ESN        BESN       GAESN      DEESN      BGAESN     BDEESN
2010      3606.48   3552.2280  3637.13  3704.04  3602.0121  3603.3100  3609.1155  3634.1500  3605.8472  3607.2442
2011      3870.43   3851.4671  3874.35  3900.62  3859.3596  3854.7707  3856.9010  3867.8938  3855.3504  3859.4259
2012      4021.38   4019.1771  4025.85  4028.20  4035.1228  4031.9783  3987.5880  4020.3030  4045.2132  4035.3952
2013      4169.13   4153.2727  4158.35  4145.46  4191.9567  4167.8824  4175.5595  4192.0103  4180.3840  4180.5141
2014      4258.06   4226.9265  4261.38  4242.70  4270.6318  4261.3172  4264.2415  4256.8462  4257.8404  4258.6108
2015      4300      4258.7751  4353.22  4330.04  4324.9540  4327.1480  4306.3487  4296.7058  4314.9289  4314.9918
RMSE                32.2179    25.6062  45.0448  16.4984    13.6429    15.5549    14.7708    13.8166    10.5890
MAE                 27.2722    17.7260  33.9401  14.9389    10.1801    11.4860    9.7786     10.9914    8.7850
MAPE (%)            0.687      0.439    0.880    0.362      0.249      0.285      0.252      0.270      0.215
Fig. 16 intuitively presents the actual and predicted annual TEC of the nine models in China between 2010 and 2015, and the relative errors are shown in Fig. 17.
Fig. 16. The forecasting results of China.
Fig. 17. The error analysis of China.
Fig. 16 clearly shows that all nine models have satisfactory forecasting values and that BDEESN's forecasting results are the closest to the actual values. Fig. 17 shows that the relative errors of the nine models are always within the range [−3%, +3%]. BDEESN is the most stable, with only very small fluctuations around zero. The proposed BDEESN model achieves the most satisfactory forecasting performance among the nine models in the TEC forecasting of China, which indicates high accuracy and reliability.

4.4 Economic benefits analysis

With the continuous development of the economy, energy consumption is also increasing, and energy has become a strategic issue in the economic development of many countries. Accurate energy consumption forecasting based on the proposed BDEESN has a great impact on economic benefits. First, it helps ensure the balance of energy supply and demand, thus supporting sustainable and stable economic growth. In addition, it helps recognize the contradiction between energy constraints and the long-term growth of energy consumption, thus promoting the implementation of energy-saving policies in various countries. Energy-saving policies are conducive to promoting technological progress, reducing environmental pollution, and improving economic efficiency. In general, the impact of BDEESN on economic benefits is mainly reflected in these two aspects.
5 Conclusions and future research

An enhanced optimization model (BDEESN) for multifactor-influenced EEC/TEC forecasting is developed in this study. Three examples are used to validate its forecasting accuracy and reliability; moreover, the impact of BDEESN on economic benefits is analyzed. The main conclusions are as follows:
(a) This study proposes a new effective model, BDEESN, based on the bagged ESN improved by DE to estimate energy consumption. The bagging process is used to reduce the forecasting error and improve the generalization of the network, and DE is used to find suitable parameters of ESN. Thus, the BDEESN combines the merits of three techniques: ESN, bagging, and DE.
(b) In the two comparative examples, the BDEESN has higher precision than the existing best models. In addition, its forecasting accuracy is higher than those of the basic ESN, BESN, GAESN, DEESN, and BGAESN, except for its similar forecasting performance to BGAESN in the second example.
(c) In the extended application, the TEC of China is forecasted to further prove the accuracy and reliability of BDEESN. Based on the Pearson and Spearman coefficient analysis, GDP, POP, and URB are selected as the best influencing factors. BDEESN obtains a relatively high accuracy, with a MAPE of 0.215%, which indicates the superiority of BDEESN and the effectiveness of the selected influencing factors.
(d) In examples 1 and 3, BESN and DEESN have comparable performance next to BDEESN. These results indicate that bagging and DE can individually enhance the performance of ESN, whereas greater improvement can be obtained when they are applied simultaneously. In example 2, GAESN performs similarly to DEESN, and BGAESN performs similarly to BDEESN. DE is better than GA in examples 1 and 3, while they have comparable performance in example 2.
(e) Accurate energy consumption forecasting based on BDEESN has a great impact on economic benefits. It helps ensure a sustainable and stable energy supply and economic growth. Besides, it is conducive to the implementation of energy-saving policies, which can promote technological progress, reduce environmental pollution, and improve economic efficiency.
In future work, the final forecasting result can be obtained from a weighted average of the various base models instead of a simple average. Moreover, the parameters obtained by the trial-and-error method can be further fine-tuned by popular search algorithms [63]. The proposed BDEESN can also be used to solve more prediction problems under complex environments [64].
Acknowledgments

The authors are grateful for the support from the National Natural Science Foundation of China (Nos. 71771095; 71531009).
Appendix A. Data sets of three examples
Table A1 Annual data of the electricity demand and four factors in Turkey from 1979 to 2006.
Year   Electricity demand (GWh)   GDP ($10^9)   POP (10^6)   IMP ($10^9)   EXP ($10^9)
1979   23.566    82    43.53    5.07     2.26
1980   24.617    68    44.438   7.91     2.91
1981   26.289    72    45.54    8.93     4.7
1982   28.325    64    46.688   8.84     5.75
1983   29.568    60    47.864   9.24     5.73
1984   33.267    59    49.07    10.76    7.13
1985   36.361    67    50.306   11.34    7.95
1986   40.471    75    51.433   11.1     7.46
1987   44.925    86    52.561   14.16    10.19
1988   48.43     90    53.715   14.34    11.66
1989   52.602    108   54.893   15.79    11.62
1990   56.812    151   56.203   22.3     12.96
1991   60.499    150   57.305   21.05    13.59
1992   67.217    158   58.401   22.87    14.72
1993   73.432    179   59.491   29.43    15.35
1994   77.783    132   60.576   23.27    18.11
1995   85.552    170   61.644   35.71    21.64
1996   94.789    184   62.697   43.63    23.22
1997   105.517   192   62.48    48.56    26.26
1998   114.023   207   63.459   45.92    26.97
1999   118.485   187   64.345   40.67    26.59
2000   128.276   200   67.461   54.5     27.78
2001   126.871   146   68.618   41.4     31.33
2002   132.553   181   69.626   51.55    36.06
2003   141.151   239   70.712   69.34    47.25
2004   150.018   299   71.789   97.54    63.17
2005   160.794   361   72.065   116.77   73.48
2006   174.637   400   72.974   139.58   85.53
Table A2 Annual data of the electricity demand and four factors in Iran from 1982 to 2009.
Year   Electricity demand (10^6 MWh)   GDP (10^9 Rial^a)   POP (10^3)   IMP ($10^6)   EXP ($10^6)
1982   18.234    170281   40826   13515      339.53
1983   21.753    191667   42420   11845      283.74
1984   25.153    212877   44077   18103      356.63
1985   28.177    208516   45798   14494      361.13
1986   30.812    212686   47587   11408      464.95
1987   32.619    193235   49445   9355       915.5
1988   34.74     191312   50662   9369       1160.79
1989   36.147    180823   51909   8177       1036
1990   39.956    191503   53187   12807      1043.89
1991   45.107    218539   54496   18722      1312.17
1992   49.175    245036   55837   29677      2648.7
1993   52.306    254822   56656   29870      2987.68
1994   58.114    258601   57488   20037      3746.8
1995   63.625    259876   58331   11570      4824.55
1996   65.854    267534   59187   12082      3250.67
1997   69.671    283807   60055   14467      3105.71
1998   73.358    291769   61070   13633      2875.59
1999   77.646    300140   62103   13708      3013.31
2000   84.656    304941   63152   11972      3362
2001   90.366    320069   64219   13186.75   3762.7
2002   97.171    330565   65301   17198.25   4224.05
2003   105.525   357671   66300   21761      4608
2004   114.624   385630   67315   26597.8    5972.2
2005   125.528   410429   68345   35388.55   6847.3
2006   134.238   438900   69390   39247      10474.4
2007   147.001   467930   70496   41722.6    12996.91
2008   155.598   491099   71532   48438.82   15312.28
2009   169.047   495266   72584   56042.01   18333.58
^a Rial: Currency unit of Iran.
Table A3 Annual data of the TEC and three factors in China from 1990 to 2015.
Year   TEC (10^6 TCE)   GDP (10^10 Yuan^a)   POP (10^8)   URB (%)
1990   987.03    188.729    11.4333   26.41
1991   1037.83   220.056    11.5823   26.94
1992   1091.7    271.945    11.7171   27.46
1993   1159.93   356.732    11.8517   27.99
1994   1227.37   486.375    11.985    28.51
1995   1311.76   613.399    12.1121   29.04
1996   1351.92   718.136    12.2389   30.48
1997   1359.09   797.15     12.3626   31.91
1998   1361.84   851.955    12.4761   33.35
1999   1405.69   905.644    12.5786   34.78
2000   1469.64   1002.801   12.6743   36.22
2001   1555.47   1108.631   12.7627   37.66
2002   1695.77   1217.174   12.8453   39.09
2003   1970.83   1374.22    12.9227   40.53
2004   2302.81   1618.402   12.9988   41.76
2005   2613.69   1873.189   13.0756   42.99
2006   2864.67   2194.385   13.1448   44.34
2007   3114.42   2702.323   13.2129   45.89
2008   3206.11   3195.155   13.2802   46.99
2009   3361.26   3490.814   13.345    48.34
2010   3606.48   4130.303   13.4091   49.95
2011   3870.43   4893.006   13.4735   51.27
2012   4021.38   5403.674   13.5404   52.57
2013   4169.13   5952.444   13.6072   53.73
2014   4258.06   6439.74    13.6782   54.77
2015   4300      6855.058   13.7462   56.1
^a Yuan: Currency unit of China.
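For convenience, the short Python sketch below shows one way the Pearson and Spearman screening mentioned in the conclusions could be reproduced from Table A3. It is illustrative only: it uses a hand-copied subset of the rows above (every fifth year) rather than the full 1990-2015 series, and the variable names are ours, not part of the original study.

```python
# Illustrative only: correlation screening on a subset of Table A3 (values copied above).
from scipy.stats import pearsonr, spearmanr

tec = [987.03, 1311.76, 1469.64, 2613.69, 3606.48, 4300.0]         # TEC, 10^6 TCE
gdp = [188.729, 613.399, 1002.801, 1873.189, 4130.303, 6855.058]   # GDP, 10^10 Yuan
pop = [11.4333, 12.1121, 12.6743, 13.0756, 13.4091, 13.7462]       # POP, 10^8
urb = [26.41, 29.04, 36.22, 42.99, 49.95, 56.1]                    # URB, %

for name, factor in [("GDP", gdp), ("POP", pop), ("URB", urb)]:
    r, _ = pearsonr(factor, tec)        # linear association
    rho, _ = spearmanr(factor, tec)     # rank (monotonic) association
    print(f"{name}: Pearson = {r:.3f}, Spearman = {rho:.3f}")
```

In practice the full 26-year series would be used; the subset simply keeps the sketch short.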
References
[1] NBSC. China statistics yearbook. Beijing: China Statistics Press; 2016 [in Chinese].
[2] Erdogdu E. Electricity demand analysis using cointegration and ARIMA modelling: A case study of Turkey [J]. Energy Policy, 2007, 35(2): 1129-1146.
[3] Limanond T, Jomnonkwao S, Srikaew A. Projection of future transport energy demand of Thailand [J]. Energy Policy, 2011, 39(5): 2754-2763.
[4] Shao Z, Gao F, Zhang Q, et al. Multivariate statistical and similarity measure based semiparametric modeling of the probability distribution: A novel approach to the case study of mid-long term electricity consumption forecasting in China [J]. Applied Energy, 2015, 156: 502-518.
[5] de Oliveira E M, Oliveira F L C. Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods [J]. Energy, 2018, 144: 776-788.
[6] Hsu C C, Chen C Y. Applications of improved grey prediction model for power demand forecasting [J]. Energy Conversion & Management, 2003, 44(14): 2241-2249.
[7] Zhou P, Ang B W, Poh K L. A trigonometric grey prediction approach to forecasting electricity demand [J]. Energy, 2006, 31(14): 2839-2847.
[8] Wu W, Ma X, Zeng B, et al. Forecasting short-term renewable energy consumption of China using a novel fractional nonlinear grey Bernoulli model [J]. Renewable Energy, 2019, 140: 70-87.
[9] Askarzadeh A. Comparison of particle swarm optimization and other metaheuristics on electricity demand estimation: A case study of Iran [J]. Energy, 2014, 72(7): 484-491.
[10] Canyurt O E, Ozturk H K. Application of genetic algorithm (GA) technique on demand estimation of fossil fuels in Turkey [J]. Energy Policy, 2008, 36(7): 2562-2569.
[11] Dumitru C D, Gligor A. Daily average wind energy forecasting using artificial neural networks [J]. Procedia Engineering, 2017, 181: 829-836.
[12] Zhang X, Wang J, Zhang K. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm [J]. Electric Power Systems Research, 2017, 146: 270-285.
[13] Zong W G. Transport energy demand modeling of South Korea using artificial neural network [J]. Energy Policy, 2011, 39(8): 4644-4650.
[14] Yu L, Zhao Y, Tang L, et al. Online big data-driven oil consumption forecasting with Google trends [J]. International Journal of Forecasting, 2019, 35(1): 213-223.
[15] Kıran M S, Özceylan E, Gündüz M, et al. A novel hybrid approach based on particle swarm optimization and ant colony algorithm to forecast energy demand of Turkey [J]. Energy Conversion & Management, 2012, 53(1): 75-83.
[16] Zeng Y R, Zeng Y, Choi B, et al. Multifactor-influenced energy consumption forecasting using enhanced back-propagation neural network [J]. Energy, 2017, 127: 381-396.
[17] Ahmad T, Chen H. Potential of three variant machine-learning models for forecasting district level medium-term and long-term energy demand in smart grid environment [J]. Energy, 2018, 160: 1008-1020.
[18] Wei N, Li C, Peng X, et al. Daily natural gas consumption forecasting via the application of a novel hybrid model [J]. Applied Energy, 2019, 250: 358-368.
[19] Comodi G, Cioccolanti L, Gargiulo M. Municipal scale scenario: Analysis of an Italian seaside town with MarkAL-TIMES [J]. Energy Policy, 2012, 41(1): 303-315.
[20] Huang Y, Bor Y J, Peng C Y. The long-term forecast of Taiwan's energy supply and demand: LEAP model application [J]. Energy Policy, 2011, 39(11): 6790-6803.
[21] Tsai M S, Chang S L. Taiwan's 2050 low carbon development roadmap: An evaluation with the MARKAL model [J]. Renewable & Sustainable Energy Reviews, 2015, 49: 178-191.
[22] Kıran M S, Özceylan E, Gündüz M, et al. Swarm intelligence approaches to estimate electricity energy demand in Turkey [J]. Knowledge-Based Systems, 2012, 36(6): 93-103.
[23] Toksarı M D. Ant colony optimization approach to estimate energy demand of Turkey [J]. Energy Policy, 2007, 35(8): 3984-3990.
[24] Shen L, Cheng S, Gunson A J, et al. Urbanization, sustainability and the utilization of energy and mineral resources in China [J]. Cities, 2005, 22(4): 287-302.
[25] Zhang C, Lin Y. Panel estimation for urbanization, energy consumption and CO2 emissions: A regional analysis in China [J]. Energy Policy, 2012, 49: 488-498.
[26] Yuan X C, Sun X, Zhao W G, et al. Forecasting China's regional energy demand by 2030: A Bayesian approach [J]. Resources, Conservation & Recycling, 2017, 127: 85-95.
[27] Zhong S, Xie X, Lin L, et al. Genetic algorithm optimized double-reservoir echo state network for multi-regime time series prediction [J]. Neurocomputing, 2017, 238: 191-204.
[28] Chouikhi N, Ammar B, Rokbani N, Alimi A M. PSO-based analysis of echo state network parameters for time series forecasting [J]. Applied Soft Computing, 2017, 55: 211-225.
[29] Bozhkov L, Koprinkova-Hristova P, Georgieva P. Learning to decode human emotions with echo state networks [J]. Neural Networks, 2016, 78: 112.
[30] Bo Y C. Online adaptive dynamic programming based on echo state networks for dissolved oxygen control [J]. Applied Soft Computing, 2018, 62: 830-839.
[31] Breiman L. Bagging predictors [J]. Machine Learning, 1996, 24(2): 123-140.
[32] Khwaja A S, Naeem M, Anpalagan A, et al. Improved short-term load forecasting using bagged neural networks [J]. Electric Power Systems Research, 2015, 125: 109-115.
[33] Wang L, Hu H, Ai X Y, et al. Effective electricity energy consumption forecasting using echo state network improved by differential evolution algorithm [J]. Energy, 2018, 153: 801-815.
[34] Jaeger H. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note [M]. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 2001, 148(34): 13.
[35] Ma Q, Shen L, Chen W, et al. Functional echo state network for time series classification [J]. Information Sciences, 2016, 373: 1-20.
[36] Xu X, Niu D, Fu M, et al. A multi time scale wind power forecasting model of a chaotic echo state network based on a hybrid algorithm of particle swarm optimization and tabu search [J]. Energies, 2015, 8(11): 12388-12408.
[37] Wang L, Lv S X, Zeng Y R. Effective sparse adaboost method with ESN and FOA for industrial electricity consumption forecasting in China [J]. Energy, 2018, 155: 1013-1031.
[38] Maiorino E, Bianchi F M, Livi L, et al. Data-driven detrending of nonstationary fractal time series with echo state networks [J]. Information Sciences, 2017, 382: 359-373.
[39] Zhao Y, Li J, Yu L. A deep learning ensemble approach for crude oil price forecasting [J]. Energy Economics, 2017, 66: 9-16.
[40] Efron B, Tibshirani R J. An introduction to the bootstrap [M]. Chapman & Hall, 1993.
[41] Kotsiantis S B, Kanellopoulos D, Zaharakis I D. Bagged averaging of regression models [C]//IFIP International Conference on Artificial Intelligence Applications and Innovations. Springer, Boston, MA, 2006: 53-60.
[42] Martínez-Muñoz G, Suárez A. Out-of-bag estimation of the optimal sample size in bagging [J]. Pattern Recognition, 2010, 43(1): 143-152.
[43] Bühlmann P, Yu B. Analyzing bagging [J]. Annals of Statistics, 2002, 30(4): 927-961.
[44] Das S, Mullick S S, Suganthan P N. Recent advances in differential evolution - An updated survey [J]. Swarm & Evolutionary Computation, 2016, 27: 1-30.
[45] Zeng Y R, Peng L, Zhang J L, et al. An effective hybrid differential evolution algorithm incorporating simulated annealing for joint replenishment and delivery problem with trade credit [J]. International Journal of Computational Intelligence Systems, 2016, 9(6): 1001-1015.
[46] Neri F, Tirronen V. Recent advances in differential evolution: A survey and experimental analysis [J]. Artificial Intelligence Review, 2010, 33(1): 61-106.
[47] Peng L, Liu S, Liu R, et al. Effective long short-term memory with differential evolution algorithm for electricity price prediction [J]. Energy, 2018, 162: 1301-1314.
[48] Storn R, Price K. Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces [J]. Journal of Global Optimization, 1997, 11(4): 341-359.
[49] Moretti F, Pizzuti S, Annunziato M, et al. Urban traffic flow forecasting through statistical and neural network bagging ensemble hybrid modeling [J]. Neurocomputing, 2015, 167(C): 3-7.
[50] Dantas T M, Oliveira F L C, Repolho H M V. Air transportation demand forecast through Bagging Holt Winters methods [J]. Journal of Air Transport Management, 2017, 59: 116-123.
[51] Jin S, Su L, Ullah A. Robustify financial time series forecasting with bagging [J]. Econometric Reviews, 2014, 33(5-6): 575-605.
[52] Kim M J, Kang D K. Ensemble with neural networks for bankruptcy prediction [J]. Expert Systems with Applications, 2010, 37(4): 3373-3379.
[53] Wang X, Tang L. An adaptive multi-population differential evolution algorithm for continuous multi-objective optimization [J]. Information Sciences, 2016, 348(2): 124-141.
[54] Wang L, Hu H L, Liu R, et al. An improved differential harmony search algorithm for function optimization problems [J]. Soft Computing, 2019, 23(13): 4827-4852.
[55] Mlakar U, Fister I, Brest J, et al. Multi-objective differential evolution for feature selection in facial expression recognition systems [J]. Expert Systems with Applications, 2017, 89: 129-137.
[56] Coelho L D S. An efficient particle swarm approach for mixed-integer programming in reliability-redundancy optimization applications [J]. Reliability Engineering & System Safety, 2009, 94(4): 830-837.
[57] Niu D, Wang Y, Wu D D. Power load forecasting using support vector machine and ant colony optimization [J]. Expert Systems with Applications, 2010, 37(3): 2531-2539.
[58] Wang J, Li L, Niu D, et al. An annual load forecasting model based on support vector regression with differential evolution algorithm [J]. Applied Energy, 2012, 94(6): 65-70.
[59] Wu Q, Peng C. A hybrid BAG-SA optimal approach to estimate energy demand of China [J]. Energy, 2017, 120: 985-995.
[60] Kapetanakis D S, Mangina E, Finn D P. Input variable selection for thermal load predictive models of commercial buildings [J]. Energy and Buildings, 2017, 137: 13-26.
[61] Ming M, Niu D. Annual electricity consumption analysis and forecasting of China based on few observations methods [J]. Energy Conversion and Management, 2011, 52(2): 953-957.
[62] Laubscher R. Time-series forecasting of coal-fired power plant reheater metal temperatures using encoder-decoder recurrent neural networks [J]. Energy, 2019: 116187.
[63] Wang L, Xiong Y N, Li S W, et al. New fruit fly optimization algorithm with joint search strategies for function optimization problems [J]. Knowledge-Based Systems, 2019, 176: 77-96.
[64] Wang L, Wang Z G, Qu H, et al. Optimal forecast combination based on neural networks for time series forecasting [J]. Applied Soft Computing, 2018, 66: 1-17.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Highlights
Propose an enhanced optimization model using a bagged echo state network improved by differential evolution.
The proposed model is applied to energy consumption forecasting for the first time.
The proposed model achieves better forecasting performance than the existing best models in two comparative examples.
Mean absolute percentage error of the proposed model is 0.215% for total energy consumption forecasting in China.