Effective energy consumption forecasting using enhanced bagged echo state network


Journal Pre-proof

Huanling Hu, Lin Wang, Lu Peng, Yu-Rong Zeng

PII: S0360-5442(19)32473-9
DOI: https://doi.org/10.1016/j.energy.2019.116778
Reference: EGY 116778
To appear in: Energy
Received Date: 13 March 2019
Accepted Date: 14 December 2019

Please cite this article as: Huanling Hu, Lin Wang, Lu Peng, Yu-Rong Zeng, Effective energy consumption forecasting using enhanced bagged echo state network, Energy (2019), https://doi.org/10.1016/j.energy.2019.116778

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.


Huanling Hu¹, Lin Wang¹*, Lu Peng¹, Yu-Rong Zeng²

1. School of Management, Huazhong University of Science and Technology, Wuhan 430074, China;
2. Hubei University of Economics, Wuhan 430205, China.

Abstract: Precise analysis and forecasting of energy consumption not only affects energy security and environment of a nation but also provides a useful decision basis for policy makers. This study proposes a new enhanced optimization model based on the bagged echo state network improved by differential evolution algorithm to estimate energy consumption. Bagging is applied to reduce forecasting error and improve generalization of network. Further, three parameters of echo state network are optimized using differential evolution algorithm. Thus, the proposed model combines the merits of three techniques which are echo state network, bagging, and differential evolution algorithm. The proposed model is applied to two comparative examples and an extended application to verify its accuracy and reliability. Results of the comparative examples show the proposed model achieves better forecasting performance compared with basic echo state network and other existing popular models. Mean absolute percentage error of the proposed model is 0.215% for total energy consumption forecasting of China. Therefore, the proposed model can be a satisfactory tool for forecasting energy consumption because of its high accuracy and stability.

Keywords: energy consumption forecasting, echo state network, bagging, differential evolution

*Corresponding author. E-mail addresses: [email protected] (Huan-Ling Hu); [email protected] (Lin Wang); [email protected] (Lu Peng); [email protected] (Yu-Rong Zeng).

Terms used in this study are as follows:

ACO: ant colony optimization
ADE: adaptive differential evolution
AI: artificial intelligence
ANN: artificial neural network
ARIMA: autoregressive integrated moving average
BPNN: back-propagation neural network
COA: proportion of coal consumption in TEC
DE: differential evolution
EA: evolutionary algorithm
EEC: electrical energy consumption
ESN: echo state network
EXP: energy export
FFNN: feedforward neural network
GA: genetic algorithm
GDP: gross domestic product
GM: grey model
HOU: household energy consumption per capita
IMP: energy import
ISSA: improved singular spectrum analysis
LEAP: long-range energy alternatives planning system
LSTM: long short-term memory
MAE: mean absolute error
MAPE: mean absolute percentage error
MARKAL: market allocation
MLR: multiple linear regression
MSE: mean square error
OLS: ordinary least squares
POP: population
PSO: particle swarm optimization
RAT: the ratio of TEC to GDP
RMSE: root mean square error
RNN: recurrent neural network
SEC: share of secondary industry
SVM: support vector machine
TCE: ton of standard coal equivalent
TEC: total energy consumption
TIMES: the integrated MARKAL-EFOM system
URB: urbanization rate

1 Introduction

Energy has a great effect on the economic development and quality of life of a country. Energy consumption has grown significantly in the past decades, driven by continued economic growth in developing countries. For example, China, the largest developing country in the world, has achieved continuous and fast development since 2002, including rapid economic growth and ongoing urbanization, and its energy consumption has grown extensively. Its gross domestic product (GDP) grew at an average annual rate of 9.8% from 2002 to 2015, and its energy consumption increased at 7.64%, from 1,695.77 million tons of standard coal equivalent (TCE) in 2002 to 4,300 million TCE in 2015 [1]. Under these circumstances, energy consumption forecasting becomes more critical than before because it not only affects the energy security and environment of a nation but also provides a decision basis for policy makers.

This study aims to propose a new effective model, BDEESN, based on the bagged echo state network (ESN) improved by differential evolution (DE), for energy consumption forecasting. In the BDEESN, bagging is used to reduce forecasting error and improve the generalization of the network, and DE is used to find suitable parameters of the ESN. This study is the first to combine ESN, bagging, and DE to forecast energy consumption. Moreover, three examples are used to validate the accuracy and reliability of BDEESN.

1.1 Literature review on energy consumption forecasting

Several studies have addressed energy consumption with various forecasting methods. The methods are divided into five types: statistical models, grey models (GMs), artificial intelligence (AI) models, hybrid models, and bottom-up models. A summary of typical studies under each type of method is shown in Table 1.

For statistical models, Erdogdu [2] used cointegration analysis and autoregressive integrated moving average (ARIMA) for Turkey’s electricity demand. Limanond et al. [3] forecasted transport energy demand in Thailand with a log-linear regression model. Shao et al. [4] proposed a semiparametric model for mid-long term electricity consumption forecasting in China. de Oliveira and Oliveira [5] forecasted electric energy consumption in many countries with bagging ARIMA and exponential smoothing models.

For GMs, Hsu and Chen [6] developed an improved GM for power demand forecasting. Zhou et al. [7] applied a trigonometric GM to forecast electricity demand. Wu et al. [8] proposed a nonlinear grey Bernoulli model to forecast renewable energy consumption of China.

In recent years, owing to the increasing complexity and irregularity of the energy forecasting problem, AI models have received wider attention because of their advantages, which include powerful nonlinear fitting capability, the capability to deal with noisy data, and satisfactory performance. The widely used AI models include particle swarm optimization (PSO), genetic algorithm (GA), artificial neural network (ANN), and support vector machine (SVM). Askarzadeh [9] studied the performance of different PSO variants to estimate Iran’s electricity demand. Canyurt and Ozturk [10] investigated Turkey’s fossil fuel demand, projection, and supplies based on GA. Dumitru and Gligor [11] proposed an architecture based on a feedforward artificial neural network (FFNN) for wind power forecasting. Zhang et al. [12] applied SVM optimized by the cuckoo search algorithm to forecast the short-term electric load. Zong [13] developed ANN models to forecast South Korea’s transport energy demand with various independent variables and obtained more robust results. Yu et al. [14] forecasted global oil consumption with AI models and online big data.

Besides, there are several hybrid models for energy consumption forecasting. Kıran et al. [15] proposed a hybrid approach based on PSO and ant colony optimization (ACO) to forecast Turkey’s energy demand. Zeng et al. [16] developed a hybrid intelligent model based on adaptive differential evolution (ADE) and back-propagation neural network (BPNN). Ahmad and Chen [17] forecasted district-level energy demand based on machine learning models. Wei et al. [18] developed a hybrid model, ISSA-LSTM, combining improved singular spectrum analysis (ISSA) with long short-term memory (LSTM) for natural gas consumption forecasting.

For bottom-up models, Comodi et al. [19] analyzed an Italian seaside town with the integrated MARKAL-EFOM system (TIMES). Huang et al. [20] applied the long-range energy alternatives planning system (LEAP) to the long-term forecast of Taiwan’s energy supply and demand. Tsai and Chang [21] studied Taiwan’s 2050 low carbon development with the market allocation (MARKAL) model.

Table 1 Summary of typical studies under each type of method used for energy consumption forecasting.

| Classification | Methods | Influencing factors | Forecasted energy type | Forecasted areas |
|---|---|---|---|---|
| Statistical models | Cointegration analysis and ARIMA [2] | GDP per capita, electricity prices, net electricity consumption per capita | Electricity demand | Turkey |
| | Log-linear regression model [3] | GDP, POP, the numbers of registered vehicles | Transport energy demand | Thailand |
| | Semiparametric model [4] | IMP, EXP, deposits in financial institutions | Electricity consumption | China |
| | Bagging ARIMA and exponential smoothing [5] | - | Electric energy consumption | Canada, France, Italy, Japan, Brazil, Mexico, and Turkey |
| Grey models | Improved GM [6] | - | Power demand | Taiwan |
| | Trigonometric GM [7] | - | Electricity demand | China |
| | Nonlinear grey Bernoulli model [8] | - | Renewable energy consumption | China |
| Artificial intelligence models | PSO [9] | GDP, POP, IMP, EXP | Electricity demand | Iran |
| | GA [10] | GDP, POP, IMP, EXP | Fossil fuels demand | Turkey |
| | FFNN [11] | - | Wind power energy production | South-East part of Europe |
| | SVM [12] | | Electric load | Australia |
| | ANN [13] | GDP, POP, oil price, number of vehicle registrations, passenger transport amount | Transport energy demand | South Korea |
| | ANN and SVM [14] | Online big data | Oil consumption | Global countries |
| Hybrid models | PSO-ACO [15] | GDP, POP, IMP, EXP | Energy demand | Turkey |
| | ADE-BPNN [16] | GDP, POP, IMP, EXP | Total energy consumption | China |
| | ANN with nonlinear autoregressive [17] | Environmental and aggregated energy consumption data | Energy demand | Several districts |
| | ISSA-LSTM [18] | - | Natural gas consumption | Four representative cities |
| Bottom-up models | TIMES [19] | GDP, POP, energy commodity prices | Households, transport, and the public sectors | Italy |
| | LEAP [20] | GDP, POP, energy conversion portion | Energy supply and demand | Taiwan |
| | MARKAL [21] | GDP, POP, industrial structure, household number | Electricity, industry, household and service, and transportation sectors | Taiwan |

This study builds the forecasting model on the causal relationship between energy consumption and the chosen influencing factors. Many studies have used GDP, POP, IMP, and EXP to forecast the energy consumption of a nation [16,22]. GDP is an important indicator of the overall economic status of a nation. POP has a direct impact on energy use. IMP and EXP reflect the total size of a nation’s foreign trade. These four factors are the most related to the energy consumption of a nation [23]. Therefore, GDP, POP, IMP, and EXP are selected as the influencing factors in the two comparative examples, which forecast the energy consumption of Turkey and Iran, respectively. In China, however, GDP and POP are the most and second most important factors influencing energy consumption, respectively; IMP has only a slight impact, and EXP has almost no effect [16]. In addition, other factors affect energy consumption, such as the urbanization rate (URB) and the share of secondary industry (SEC). The influence of URB [24,25] and SEC [26] on energy use in China has attracted wide research attention. Pearson and Spearman coefficient analysis is used to select the best influencing factors in the extended application. Electricity energy consumption (EEC) forecasting is widely accepted as a significant type of energy forecasting, and similar methods can be used to forecast the total energy consumption (TEC) [16].

1.2 Contributions

ESN, a recurrent neural network (RNN), is employed in this study to forecast energy consumption because of its capability to represent non-linear system behavior. ESN has a dynamic reservoir as an information processing unit, which contains many randomly and sparsely connected neurons [27]. Furthermore, only the readout weights, whose destinations are in the output layer, need to be trained in the learning process [28]. ESN therefore has two significant merits: high global optimality and low learning complexity.

ESN has been successfully applied to various areas in recent years, including decoding human emotions [29] and dissolved oxygen control [30], but it is rarely employed in energy consumption forecasting. ESN is unstable because some weight matrices are randomly selected and then remain unaltered. Bagging, a powerful ensemble learning algorithm, is used to reduce forecasting error by combining the advantages of various networks [31]. Bagging can also improve the generalization of ESN [32]. Besides, studies show that the randomly generated reservoir greatly affects the performance of ESN [28,33]; thus, DE is used to set suitable parameters for the reservoir to obtain satisfactory performance. Based on the above analysis, the bagging process is applied to the ESN improved by DE (DEESN) as a base model.

To validate the accuracy and reliability of BDEESN, three examples are studied: the annual EEC of Turkey from 1979 to 2006, the annual EEC of Iran from 1982 to 2009, and the annual TEC of China from 1990 to 2015. The first two are comparative examples, which have been studied in previous papers [16,22] and [9], respectively. The BDEESN is compared with existing popular models provided in these studies under the same data. The third example is a real application that aims to forecast the TEC of China. China, the largest developing country, has achieved continuous and fast development. In 2009, China surpassed the United States and became the world's largest energy consumer. The future energy consumption of China not only has an impact on its own energy security but also has great significance for the global energy market. Therefore, forecasting the TEC of China has been chosen as the real application.

This study extends the previous research of Wang et al. [33] in greater depth. In terms of the proposed model, bagging and DE are both used to enhance the forecasting performance of ESN in this study, whereas only the latter is used in [33]. Bagging, a powerful ensemble learning algorithm, can reduce forecasting error and improve the generalization of ESN. Besides, for comparison with DE, another intelligent optimization algorithm, GA, is also used to find suitable parameters of the reservoir. ESN improved by GA (GAESN) and bagged ESN improved by GA (BGAESN) are both applied in the three examples in this study. In terms of the data used, data on the influencing factors are applied to forecast energy consumption based on their causal relationship in this study. In particular, Pearson and Spearman coefficient analysis is used to select the best influencing factors in the extended application, whereas only historical energy consumption data are used in [33]. Based on the above analysis, this study extends previous work by introducing bagging to the model and by considering the impact of influencing factors in the data used. In addition, the forecasting performance of BDEESN can be fully validated by comparing it with GAESN and BGAESN.

The novelties of this study are as follows:

(a) To the best of our knowledge, no studies have used bagged ESN improved by DE for energy consumption forecasting. In the proposed BDEESN, bagging is used to reduce forecasting error and improve the generalization of the network, and DE is used to find suitable parameters of ESN.

(b) Three examples are used to fully validate the accuracy and reliability of BDEESN, including two comparative examples and an extended application. BDEESN is compared with existing popular models, BPNN, RNN, LSTM, ESN, bagged ESNs (BESN), GAESN, DEESN, and BGAESN to ensure a comprehensive evaluation of BDEESN.

(c) Pearson and Spearman coefficient analysis is used to select the best influencing factors in the extended application. The two correlation coefficients reflect the direction and degree of the changing trend between two variables.

(d) The effects of bagging and DE, alone and together, on the accuracy of forecasting are examined. Besides, the effects of DE and GA are compared. These can be achieved by comparing the results of ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN.

The flowchart of EEC/TEC forecasting is presented in Fig. 1.

[Flowchart: data and preprocessing (influencing factors and EEC data; influencing factors and TEC data) → BDEESN (train S ESN models improved by DE, generate S forecasting results, average to get the final forecasting result) → EEC forecasting of Turkey (comparative case), EEC forecasting of Iran (comparative case), TEC forecasting of China (real application) → prove the superiority of the BDEESN]

Fig. 1. Flowchart of EEC/TEC forecasting.

1.3 Organization of paper

The remaining parts of this study consist of four sections. Section 2 gives a background overview of ESN, bagging, and DE. Section 3 introduces the proposed enhanced optimization model BDEESN. Section 4 presents three examples and analyzes the economic benefits of BDEESN. The conclusions and future research directions are given in Section 5.

2 Related backgrounds

The proposed BDEESN combines the merits of three techniques, namely, ESN, bagging, and DE. The three approaches are briefly analyzed in this section.

2.1 Echo state network

ESN is introduced in this subsection, including its basic theory and three parameters.

2.1.1 Basic theory of echo state network

ESN is a dynamic RNN proposed by Jaeger [34] whose hidden layer is modeled by a reservoir. As shown in Fig. 2, ESN is composed of three layers, namely, the input, reservoir, and output layers, where the input layer has M input units, the reservoir has N internal units, and the output layer has L output units. Here L = 1 because this study deals with single-step forecasting problems. At moment i (the total number of training time steps is I and the washout time step is $I_0$), the input units, internal units, and output units are given by Eqs. (1), (2), and (3), respectively. The typical update equations of the internal and output units are defined as Eqs. (4) and (5).

[Figure: input layer u(i), reservoir x(i), and output layer y(i), connected by the weight matrices w_in, w, w_out, and w_back]

Fig. 2. ESN architecture.

$$u(i) = [u_1(i), u_2(i), \ldots, u_M(i)]^T \tag{1}$$

$$x(i) = [x_1(i), x_2(i), \ldots, x_N(i)]^T \tag{2}$$

$$y(i) = [y_1(i), y_2(i), \ldots, y_L(i)]^T \tag{3}$$

$$x(i+1) = f\left(w_{in}\, u(i+1) + w\, x(i) + w_{back}\, y(i)\right) \tag{4}$$

$$y(i+1) = g\left(w_{out}\, x(i+1)\right) \tag{5}$$

Here $f$ and $g$ are the activation functions of the reservoir and output units, respectively. There are four weight matrices, namely, $w_{in}$ ($N \times M$), $w$ ($N \times N$), $w_{back}$ ($N \times L$), and $w_{out}$ ($L \times N$), which represent the input, internal reservoir, output backward, and readout weight matrices, respectively. The readout matrix $w_{out}$, which maps the reservoir to the output layer, is updated during the learning process, while the other three weight matrices are randomly selected and then remain unaltered [28,35].

To calculate the readout weights, collect the reservoir state vectors into a matrix $M$ of size $(I - I_0 + 1) \times N$ and the target outputs into a matrix $T$ of size $(I - I_0 + 1) \times L$, using the time steps equal to or greater than $I_0$. Note that this state matrix $M$ is distinct from the number of input units M. The readout weights are then calculated with Eq. (6). Because $M$ is generally not square, $M^{-1}$ is understood as its pseudoinverse.

$$w_{out} = \left(M^{-1}\, T\right)^T \tag{6}$$
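
As a rough illustration of Eqs. (4) to (6), the following NumPy sketch collects reservoir states and solves for the readout weights. It assumes f = tanh, g = identity, teacher forcing of the previous target through w_back during training, and a pseudoinverse for the inverse in Eq. (6). The function names, the uniform weight range [-0.5, 0.5], the sparse-reservoir construction, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def make_reservoir(n_units, connectivity, spectral_radius, rng):
    """Random sparse reservoir matrix w (N x N), rescaled to a target spectral radius."""
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    mask = rng.random((n_units, n_units)) < connectivity  # keep roughly this share of links
    w = w * mask
    radius = np.max(np.abs(np.linalg.eigvals(w)))
    if radius == 0.0:
        raise ValueError("reservoir has no recurrent cycle; increase connectivity")
    return w * (spectral_radius / radius)


def collect_states(u, y_teacher, w_in, w, w_back):
    """Eq. (4) with f = tanh; the previous target is fed back (teacher forcing)."""
    x = np.zeros(w.shape[0])
    y_prev = np.zeros(w_back.shape[1])
    states = np.empty((u.shape[0], w.shape[0]))
    for i in range(u.shape[0]):
        x = np.tanh(w_in @ u[i] + w @ x + w_back @ y_prev)
        states[i] = x
        y_prev = y_teacher[i]
    return states


def train_readout(states, targets, i0):
    """Eq. (6): w_out = (pinv(M) T)^T, using time steps i0 (1-based) and later."""
    m = states[i0 - 1:]
    t = targets[i0 - 1:]
    return (np.linalg.pinv(m) @ t).T  # shape (L, N)


rng = np.random.default_rng(0)
n_in, n_res, n_out, n_steps, i0 = 4, 50, 1, 80, 10

u = rng.uniform(0.1, 0.9, size=(n_steps, n_in))  # toy inputs, not the paper's data
y = u.mean(axis=1, keepdims=True)                # toy target series

w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
w = make_reservoir(n_res, connectivity=0.1, spectral_radius=0.8, rng=rng)
w_back = rng.uniform(-0.5, 0.5, size=(n_res, n_out))

states = collect_states(u, y, w_in, w, w_back)
w_out = train_readout(states, y, i0)
y_fit = states[i0 - 1:] @ w_out.T                # Eq. (5) with g = identity
train_mse = float(np.mean((y_fit - y[i0 - 1:]) ** 2))
```

Only `w_out` is fitted here; `w_in`, `w`, and `w_back` stay fixed after random initialization, which is why the training step reduces to a single linear least-squares solve.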

2.1.2 Three parameters of echo state network

Three crucial parameters of the reservoir have great influence on the performance of the network, so suitable values must be set for them. The number of neurons N in the reservoir has a great impact on the performance of ESN because of its exponential relation with the evolution of the hidden states [28]. The range of N depends on the length of the training data and the complexity of the targeted application, and it is determined based on many experiments in this study. Besides, the connectivity rate α also affects the performance of ESN. According to Jaeger, too few connections may result in loss of reservoir states and lack of memory, whereas too many connections make decoding difficult [28]. Based on a previous study, α is usually set within 1%–5% [36].

In addition, the spectral radius ρ, the largest absolute eigenvalue of the weight matrix $w$, is also of great importance. ρ is suggested to be within the interval (0, 1) to ensure the echo state property, which means that the current network state depends only on the input history and the training sample’s output; when the input sequence is long enough, the state becomes independent of the initial state after some iterations [37,38]. Based on the above analysis, N, α, and ρ are studied in detail.

2.2 Bagging

Bagging, a powerful ensemble learning algorithm, is used to improve the performance of machine learning algorithms by combining the advantages of several base models [31]. Bagging can stabilize unstable forecasts because it generates various base models from the unstable learning algorithm; it can be regarded as a good way of using the instability to reduce forecasting error [32,39]. Each base model is trained with a training sample generated by the bootstrap method [40]. Bagging comprises the following steps. First, generate S bags of training samples, each of which has a certain size NT and is obtained by randomly drawing with replacement from the original training set. Second, train S base models with the different training samples obtained. Subsequently, test the S trained models with the testing set, so that each model yields a forecasting result. Finally, take the average value of the S forecasting results as the final forecasting result [41].

Two parameters affect the performance of bagging: the size of the training sample NT and the number of models S [32]. Previous studies noted that training shows satisfactory performance on training samples that include approximately 60%–80% of the original training set; about 63% of the distinct data points in the original training set are selected when NT is almost equal to the size of the original training set [32,42].
The number of models S, which relies on NT and the acceptable calculation cost, is usually set to approximately 50 [43]. Based on recommendations by previous studies and the specific examples in this study, NT and S are obtained by the grid search algorithm to improve the accuracy of the forecasting model and avoid overfitting; details are given in Section 4.1.3. Finally, the calculation time may increase by a factor of S under bagging, but the increase can be much smaller when the S base models are trained and tested in parallel, which can be realized in bagged neural networks.

2.3 Differential evolution algorithm

In recent years, with the rapid development of computer technology, many intelligent optimization algorithms have been proposed to solve non-linear, global optimization, combinatorial optimization, and other complex problems. Intelligent optimization algorithms are heuristic optimization algorithms and include GA [44], DE [45,46], PSO [9], etc. In the proposed BDEESN, DE is used to search for suitable parameters because of its simplicity, effectiveness, and reliability [47,48]. The steps of DE can be found in [48] and are not discussed here to keep the article at a reasonable length.

3 Proposed enhanced optimization model

The proposed BDEESN is described in this section. Sections 3.1 and 3.2 report the rationale for using bagging and DE, respectively. An introduction to the base model DEESN and the overall procedure of BDEESN is given in Section 3.3.

3.1 Rationality of using bagging

By combining the advantages of various networks, bagging can lead to smaller forecasting errors than those obtained with a single neural network. Moreover, bagging improves generalization because each training sample is a random combination drawn from the original training set, in which some original data points may be repeated, whereas others do not appear [32]. Bagging operates well especially when small changes in the dataset may result in large changes in forecasting results [49]. Bagging has been applied in traffic flow forecasting [49], air transportation demand forecasting [50], financial time series forecasting [51], and bankruptcy forecasting [52] in the past years.

The base model DEESN is unstable because three weight matrices are randomly initialized and remain unchanged in ESN. Hence, bagging, which utilizes the instability to reduce forecasting errors, is necessary. Bagging can also improve the generalization of the network. The bagging process is applied to DEESN for these two reasons.

3.2 Rationality of using differential evolution

DE, which is valued for its simplicity, effectiveness, and reliability, is a population-based intelligent optimization algorithm with four operations, namely, initialization, mutation, crossover, and selection. DE has been shown to be more capable of reaching good solutions than other evolutionary algorithms (EAs), such as GA, for various problems [44]. It has shown success in recent years in various problems, including continuous multi-objective optimization [53], complex function optimization problems [54], and feature selection in facial expression recognition systems [55].
The three parameters N, α, and ρ have great influence on the performance of ESN. Although some recommendations were given by previous studies, setting suitable parameters for a specific example is difficult. Thus, the DE algorithm is used to choose the three parameters of ESN in this study.

3.3 Proposed model

The proposed model BDEESN is presented in this subsection, including its base model DEESN and the overall process of BDEESN.

3.3.1 Base model: echo state network improved by differential evolution

(1) Decoding scheme of DE

The global optimal parameters for each base model DEESN, including N, α, and ρ, are obtained by conducting a pre-training based on DE. The dimension D of each individual is equal to 3, and the details are shown in Fig. 3. The base model is determined when the three parameters obtained from pre-training are re-injected into the network.

(2) Fitness function

The three parameters of ESN are obtained from each DE optimization, and the forecasting value $\hat{y}_t$ ($t = 1, 2, \ldots, k$; $k$ is the number of output samples) is then computed from the obtained parameters. In this study, the mean square error (MSE) in Eq. (7) is chosen as the fitness function, where $y_t$ is the observed value.

$$\mathrm{MSE} = \frac{\sum_{t=1}^{k} \left(\hat{y}_t - y_t\right)^2}{k} \tag{7}$$

(3) Flowchart of DEESN

The flowchart of DEESN is shown in Fig. 4. In DE, each individual in the population of size NP is represented by a D-dimensional vector. Each gene lies within [Umin, Umax], and the maximum iteration number is maxgen. F and CR are the mutation factor and crossover factor, respectively. The population is initialized based on the above parameters.

[Figure: each individual is a vector of D = 3 genes: scale of reservoir N, connectivity rate α, and spectral radius ρ]

Fig. 3. Structure of each individual from DE.

[Flowchart: Start → initialize the population and set G = 0 → train the ESN with the training sample → calculate the fitness values, then get the present global optimal value and individual → is the present global optimal value <= μ or G >= maxgen? If no: conduct the mutation, crossover, and selection to get the offspring individuals, generate the offspring population, set G = G + 1, and repeat. If yes: the best individual is assigned as the three parameters → forecast with the testing set → End]

Fig. 4. Flowchart of base model DEESN.
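
The loop in Fig. 4 can be sketched as follows. Since the paper defers the exact DE steps to [48], this sketch assumes the classic DE/rand/1/bin scheme; that choice, the function names, and the bound handling by clipping are assumptions. `toy_fitness` is a hypothetical stand-in for "train an ESN with (N, α, ρ) and return its MSE", and the default NP, F, CR, maxgen, and μ mirror the DEESN values listed in Table 2. Rounding N to the nearest integer follows the handling of the integer parameter described in Section 4.1.3.

```python
import numpy as np


def decode(individual, n_bounds=(20, 30)):
    """Map a real-valued individual to (N, alpha, rho); N is rounded to an integer."""
    n = int(np.clip(np.rint(individual[0]), n_bounds[0], n_bounds[1]))
    return n, float(individual[1]), float(individual[2])


def toy_fitness(individual):
    """Hypothetical stand-in for the MSE of an ESN trained with the decoded parameters."""
    n, alpha, rho = decode(individual)
    return 1e-4 * (n - 25) ** 2 + (alpha - 0.03) ** 2 + (rho - 0.8) ** 2


def differential_evolution(fitness, u_min, u_max, np_size=20, f=0.9, cr=0.1,
                           maxgen=30, mu=1e-5, seed=0):
    """DE/rand/1/bin minimisation; stops when best fitness <= mu or after maxgen generations."""
    rng = np.random.default_rng(seed)
    u_min, u_max = np.asarray(u_min, float), np.asarray(u_max, float)
    dim = u_min.size
    pop = u_min + rng.random((np_size, dim)) * (u_max - u_min)  # initialisation
    fit = np.array([fitness(ind) for ind in pop])
    for _ in range(maxgen):
        if fit.min() <= mu:
            break
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(np_size):
            others = [j for j in range(np_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = np.clip(pop[r1] + f * (pop[r2] - pop[r3]), u_min, u_max)  # mutation
            cross = rng.random(dim) < cr
            cross[rng.integers(dim)] = True  # take at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])                             # crossover
            trial_fit = fitness(trial)
            if trial_fit <= fit[i]:                                             # selection
                new_pop[i], new_fit[i] = trial, trial_fit
        pop, fit = new_pop, new_fit
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


u_min, u_max = [20, 0.01, 0.1], [30, 0.05, 0.99]  # search space [Umin, Umax] per gene
best, best_fit = differential_evolution(toy_fitness, u_min, u_max)
n_best, alpha_best, rho_best = decode(best)
```

In an actual DEESN, the fitness call would be the expensive step, because each evaluation trains one ESN; the rest of the loop is cheap bookkeeping.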

3.3.2 Process of the proposed model

The overall process of BDEESN is divided into five steps as follows. The flowchart is shown in Fig. 5.

Step 1: Gather data and preprocess if necessary, then divide the data into a training set and a testing set.
Step 2: Generate S bags of training samples by randomly drawing with replacement from the original training set obtained in Step 1.
Step 3: Train S base models DEESN$_i$ (i = 1, …, S), each with its own bag of training samples.
Step 4: Test the S trained models with the testing set and obtain S forecasting results.
Step 5: Take the average value of the S forecasting results as the final forecasting result.
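
Steps 2 to 5 can be sketched in a few lines of NumPy. To keep the example short and runnable, a least-squares linear model is used as a hypothetical stand-in for the DEESN base model, and the data are synthetic; both are assumptions for illustration only. The helper `expected_unique_fraction` reproduces the roughly 63% share of distinct samples per bag mentioned in Section 2.2.

```python
import numpy as np


def bootstrap_bags(n_train, n_bags, bag_size, rng):
    """Step 2: index sets drawn with replacement from the original training set."""
    return [rng.integers(0, n_train, size=bag_size) for _ in range(n_bags)]


def fit_linear(x, y):
    """Stand-in base model (least squares with intercept); the paper uses DEESN here."""
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return np.column_stack([x, np.ones(len(x))]) @ coef


def bagged_forecast(x_train, y_train, x_test, n_bags=50, bag_size=18, seed=0):
    """Steps 2-5: train one base model per bag and average the S forecasts."""
    rng = np.random.default_rng(seed)
    forecasts = []
    for idx in bootstrap_bags(len(x_train), n_bags, bag_size, rng):
        coef = fit_linear(x_train[idx], y_train[idx])   # Step 3
        forecasts.append(predict_linear(coef, x_test))  # Step 4
    return np.mean(forecasts, axis=0)                   # Step 5


def expected_unique_fraction(n_train, bag_size):
    """Expected share of distinct original samples that appear in one bag."""
    return 1.0 - (1.0 - 1.0 / n_train) ** bag_size


# Synthetic data with the same split sizes as Example 1 (18 training, 10 testing points).
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 0.9, size=(28, 4))
y = x @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0.0, 0.01, size=28)
x_train, y_train, x_test, y_test = x[:18], y[:18], x[18:], y[18:]

forecast = bagged_forecast(x_train, y_train, x_test, n_bags=50, bag_size=18)
```

Because the S base models are independent of one another, the loop in `bagged_forecast` is the part that could be run in parallel.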


Fig. 5. Flowchart of BDEESN.

4 Experimental study and economic benefits analysis

The BDEESN model is applied to three examples to validate its accuracy and reliability. Besides, the impact of BDEESN on the economic benefits is analyzed.

4.1 Example 1: electricity energy consumption forecasting in Turkey

This example is comparative example 1 and is intended to forecast the annual EEC of Turkey. The data set, evaluation metrics, parameter setting, and forecasting results are presented as follows.

4.1.1 Data set

Four influencing factors, namely GDP, POP, IMP, and EXP, are used to forecast the EEC of Turkey, as in studies [16,22]. Fig. 6 and Table A1 present the EEC and the four influencing factors for Turkey from 1979 to 2006. The data are obtained from studies [16,22].

Fig. 6. Data of the electricity demand and four factors in Turkey from 1979 to 2006.

For the same problem and data, the BDEESN is compared with existing good methods proposed in previous studies, including ACOQ [22], BABCEEQ [22], and ADE-BPNN [16]. In addition, the forecasting results are compared with some other models, including basic ESN, BESN, GAESN, DEESN, and BGAESN. A total of 28 annual observations are selected as the sample for analysis, where the first 18 are used as the training set and the last 10 as the testing set.

Data preprocessing is first conducted with Eq. (8) to eliminate the dimensional effect, which is consistent with a previous study [16]. The data for each series are within [0.1, 0.9] after preprocessing.

$$D_n = \frac{D - D_{min}}{D_{max} - D_{min}} \times (0.9 - 0.1) + 0.1 \tag{8}$$
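
A small sketch of Eq. (8) and its inverse is given below. The function names and the sample values are illustrative assumptions; in particular, the numbers are not taken from Table A1. The inverse mapping is not stated in the text but follows directly from Eq. (8) and is needed to report forecasts in the original units.

```python
import numpy as np


def scale_series(d, low=0.1, high=0.9):
    """Eq. (8): map a series linearly onto [low, high]; also return (D_min, D_max)."""
    d = np.asarray(d, dtype=float)
    d_min, d_max = d.min(), d.max()
    return (d - d_min) / (d_max - d_min) * (high - low) + low, (d_min, d_max)


def unscale_series(d_n, bounds, low=0.1, high=0.9):
    """Inverse of Eq. (8): recover values in the original units."""
    d_min, d_max = bounds
    return (np.asarray(d_n, dtype=float) - low) / (high - low) * (d_max - d_min) + d_min


raw = [10.0, 20.0, 30.0, 50.0]          # made-up values for illustration
scaled, bounds = scale_series(raw)      # 20.0 maps to (10 / 40) * 0.8 + 0.1 = 0.3
restored = unscale_series(scaled, bounds)
```

If the scaling bounds are taken from the training set only, testing values outside that range will fall outside [0.1, 0.9]; the paper does not say which convention is used, so this sketch leaves that choice to the caller.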

4.1.2 Evaluation metrics

To evaluate and compare the forecasting performance of the BDEESN and eight other models, three popular evaluation metrics are selected: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), which are calculated by Eqs. (9), (10), and (11), respectively. RMSE reflects the deviation between forecasting values and actual values and is extremely sensitive to outliers. MAE, the average value of absolute errors, reflects the actual size of the forecasting errors. MAPE expresses the forecasting error relative to the actual values [33].

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{k} \left(\hat{y}_t - y_t\right)^2}{k}} \tag{9}$$

$$\mathrm{MAE} = \frac{\sum_{t=1}^{k} \left|\hat{y}_t - y_t\right|}{k} \tag{10}$$

$$\mathrm{MAPE} = \frac{1}{k} \sum_{t=1}^{k} \frac{\left|\hat{y}_t - y_t\right|}{y_t} \tag{11}$$
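
The three metrics in Eqs. (9) to (11) translate directly into NumPy. The function names and the sample numbers below are illustrative assumptions. `mape` returns a fraction, so it would be multiplied by 100 to express it as a percentage.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Eq. (9): root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mae(y_true, y_pred):
    """Eq. (10): mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def mape(y_true, y_pred):
    """Eq. (11): mean absolute percentage error, returned as a fraction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true))


observed = [100.0, 200.0, 400.0]    # made-up values for illustration
forecasted = [110.0, 190.0, 400.0]

# Errors are 10, -10, 0, so MAE = 20/3 and MAPE = (0.10 + 0.05 + 0) / 3 = 0.05.
scores = {
    "RMSE": rmse(observed, forecasted),
    "MAE": mae(observed, forecasted),
    "MAPE": mape(observed, forecasted),
}
```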

4.1.3 Parameter setting

In addition to BDEESN, basic ESN, BESN, GAESN, DEESN, and BGAESN are also utilized to forecast EEC in Turkey. The number of input units is equal to 4 because there are four influencing factors. Based on the recommendations reported in Sections 2.1.2 and 2.2, the parameters N, α, and ρ of GAESN, DEESN, BGAESN, and BDEESN are determined by the corresponding algorithm, the NT and S of BESN, BGAESN, and BDEESN are obtained by grid search, and the remaining parameters for the six ESN-based models are obtained by the trial and error method. For the parameters NT and S, the search spaces, [16, 18] and [49, 51] respectively, are determined according to previous studies and several experiments, and the grid search is then used to find the optimal parameter values. The grid search is an exhaustive search over every possible combination of parameters, and it is suitable for the optimization of three or fewer parameters. The trial and error method yields satisfactory results by trying different combinations of parameters until the error is small enough; it is an effective and common parameter setting method [18,33]. The search spaces of DE/GA and the other parameter values in this example are shown in Table 2. The same parameter setting methods are used in examples 2 and 3.
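
The grid search over NT and S can be sketched as an exhaustive loop over the two integer ranges given above. The scoring function is the part the paper leaves to the experiments, so `toy_score` below is a purely hypothetical placeholder; in practice it would train the bagged model with the given (NT, S) and return a forecasting error.

```python
from itertools import product


def grid_search(score, nt_values, s_values):
    """Evaluate every (NT, S) pair and return the pair with the lowest score."""
    best_pair, best_score = None, float("inf")
    for nt, s in product(nt_values, s_values):
        value = score(nt, s)
        if value < best_score:
            best_pair, best_score = (nt, s), value
    return best_pair, best_score


def toy_score(nt, s):
    """Hypothetical error surface, used only to make the example runnable."""
    return (nt - 18) ** 2 + (s - 50) ** 2


# Search spaces [16, 18] for NT and [49, 51] for S: 3 x 3 = 9 combinations.
best_pair, best_score = grid_search(toy_score, range(16, 19), range(49, 52))
```

With only nine combinations the exhaustive search is cheap relative to training; the cost grows multiplicatively with each added parameter, which is consistent with the remark that grid search suits three or fewer parameters.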

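Table 2 Parameters for the six ESN based models in Turkey.

| Part | Parameter | ESN | BESN | GAESN | DEESN | BGAESN | BDEESN |
|---|---|---|---|---|---|---|---|
| ESN part | N | 25 | 25 | [20, 30] | [20, 30] | [20, 30] | [20, 30] |
| | α | 0.05 | 0.05 | [0.01, 0.05] | [0.01, 0.05] | [0.01, 0.05] | [0.01, 0.05] |
| | ρ | 0.8 | 0.8 | [0.1, 0.99] | [0.1, 0.99] | [0.1, 0.99] | [0.1, 0.99] |
| | I0 | 9 | 9 | 9 | 9 | 9 | 9 |
| | f | tangent | tangent | tangent | tangent | tangent | tangent |
| | g | identity | identity | identity | identity | identity | identity |
| Bagging part | NT | - | 17 | - | - | 18 | 18 |
| | S | - | 49 | - | - | 51 | 50 |
| DE/GA part | NP | - | - | 20 | 20 | 20 | 20 |
| | maxgen | - | - | 30 | 30 | 30 | 30 |
| | μ | - | - | 10^(−5) | 10^(−5) | 10^(−5) | 10^(−5) |
| | F | - | - | 0.03 | 0.9 | 0.03 | 0.9 |
| | CR | - | - | 0.8 | 0.1 | 0.8 | 0.1 |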

DE and GA are designed for continuous optimization, in which the optimized parameters are real numbers. Because the parameter N is an integer, DE and GA must be adapted here. Similar to studies [27,56], N is treated as a real variable during the evolution process; before each fitness evaluation, it is rounded to the nearest integer within the given interval. The same approach is used in the latter two examples.

4.1.4 Forecasting results

For BDEESN and DEESN, DE is used to pre-train the three parameters of ESN, that is, to find the most suitable values for the specific example. Fig. 7 presents one optimization run of DE in example 1 for illustration. The iteration terminates when the number of generations reaches maxgen = 30; the three parameters obtained, decoded according to the scheme stated in Section 3.3.1, are then sent back to the network, and ESN performs normal training.
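The integer handling of N and the DE pre-training loop might be sketched as below. Several things are assumptions rather than the authors' implementation: the DE/rand/1/bin variant (the variant is not stated in this section), the clipping of values to the search bounds, and `toy_fitness`, a hypothetical stand-in for training an ESN and returning its MSE. The bounds and the values NP = 20, maxgen = 30, F = 0.9, and CR = 0.1 are read from the DEESN column of Table 2.

```python
import random

# Search bounds for N, alpha, rho (Table 2).
BOUNDS = [(20, 30), (0.01, 0.05), (0.1, 0.99)]

def decode(vec):
    """Clip a real-valued individual to the bounds and round N to an integer."""
    n, alpha, rho = (min(max(v, lo), hi) for v, (lo, hi) in zip(vec, BOUNDS))
    return int(round(n)), alpha, rho

def toy_fitness(n, alpha, rho):
    # Hypothetical surrogate; the real fitness would be the ESN training MSE.
    return (n - 26) ** 2 * 1e-6 + (alpha - 0.03) ** 2 + (rho - 0.8) ** 2 * 1e-3

def differential_evolution(fitness, np_=20, maxgen=30, f=0.9, cr=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(BOUNDS)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(np_)]
    cost = [fitness(*decode(x)) for x in pop]
    for _ in range(maxgen):
        for i in range(np_):
            # Mutation: three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(np_) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == jrand:
                    lo, hi = BOUNDS[d]
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))
                else:
                    trial.append(pop[i][d])
            # N stays real in the population; it is rounded only for evaluation.
            trial_cost = fitness(*decode(trial))
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(np_), key=cost.__getitem__)
    return decode(pop[best]), cost[best]

(n, alpha, rho), best_cost = differential_evolution(toy_fitness)
print(n, alpha, rho, best_cost)
```

Keeping N real-valued inside the population and rounding only at evaluation time lets the standard DE operators be reused unchanged, which matches the approach described in the text.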

Fig. 7. Iterative MSE trend of ESN searching (For Turkey). [Plot titled "The optimization process of DE": fitness function (MSE, ×10^(−4)) versus generation, 0 to 30.]

The EEC forecasting results of Turkey from 1997 to 2006 obtained with the various models are listed in Table 3. The forecasting results of ACOQ, BABCEEQ, and ADE-BPNN are taken from previous studies [16,22]. The RMSE, MAE, and MAPE of the nine models are also shown in Table 3. All three metrics of BDEESN are the smallest among all models, showing the best performance and the improvement achieved by using bagging and DE simultaneously. In addition, BESN and DEESN rank second and third, respectively, on all three metrics, which indicates that bagging and DE can each enhance the performance of ESN individually. Moreover, DEESN is better than GAESN and BDEESN is better than BGAESN on all three metrics, which indicates that DE is more suitable than GA for finding the parameters of ESN in EEC forecasting of Turkey.

Table 3 Forecasting results of annual electricity demand in Turkey by using the BDEESN and other models (unit: GWh).

| Years | Actual | ACOQ | BABCEEQ | ADE-BPNN | ESN | BESN | GAESN | DEESN | BGAESN | BDEESN |
|---|---|---|---|---|---|---|---|---|---|---|
| 1997 | 105.517 | 101.936 | 103.668 | 104.7468 | 105.9064 | 104.9574 | 105.0508 | 106.0147 | 104.5874 | 104.6539 |
| 1998 | 114.023 | 110.643 | 106.827 | 108.8124 | 114.2341 | 114.8100 | 114.5778 | 114.1636 | 114.7906 | 114.5894 |
| 1999 | 118.485 | 109.321 | 106.064 | 112.5695 | 118.5879 | 118.2620 | 117.6456 | 117.8292 | 118.1631 | 118.0899 |
| 2000 | 128.276 | 129.396 | 131.188 | 123.5975 | 124.1821 | 126.7028 | 124.1294 | 126.9573 | 126.4521 | 126.2198 |
| 2001 | 126.871 | 123.629 | 119.434 | 127.5462 | 127.4712 | 128.5260 | 127.4386 | 129.5569 | 128.1864 | 128.0335 |
| 2002 | 132.553 | 133.644 | 131.993 | 133.1803 | 131.4178 | 132.3845 | 133.4321 | 133.5362 | 132.4794 | 132.3052 |
| 2003 | 141.151 | 141.689 | 145.786 | 141.1699 | 146.1361 | 145.7651 | 146.1197 | 144.8990 | 146.6422 | 145.6852 |
| 2004 | 150.018 | 147.806 | 157.713 | 151.2757 | 154.5139 | 152.0983 | 150.1128 | 154.9609 | 152.0528 | 151.6339 |
| 2005 | 160.794 | 163.158 | 160.555 | 160.2127 | 156.6309 | 156.2699 | 156.4876 | 156.3014 | 156.3311 | 156.8674 |
| 2006 | 174.637 | 172.819 | 171.046 | 174.2285 | 168.1348 | 171.5545 | 168.2841 | 174.7858 | 170.6078 | 171.2129 |
| RMSE | - | 3.6777 | 6.0600 | 2.9591 | 3.5113 | 2.4862 | 3.2112 | 2.6325 | 2.7761 | 2.3911 |
| MAE | - | 2.8510 | 4.8535 | 2.0141 | 2.6679 | 1.9267 | 2.3177 | 1.9614 | 2.1250 | 1.8792 |
| MAPE (%) | - | 2.271 | 3.772 | 1.639 | 1.800 | 1.330 | 1.588 | 1.386 | 1.467 | 1.305 |

Fig. 8 shows the real and predicted annual EEC of nine models in Turkey between 1997 and 2006, which are expressed with polyline and histogram, respectively. The ACOQ, BABCEEQ, and ADE-BPNN show poor performance because they have large gaps versus the actual values. The performance of ESN is not stable. The remaining five models of BESN, GAESN, DEESN, BGAESN, and BDEESN are much closer to actual values, of which the BDEESN is the closest.

Fig. 8. Forecasting results of Turkey. [Chart: EEC (GWh) by year, 1997 to 2006, for ACOQ, BABCEEQ, ADE-BPNN, ESN, BESN, GAESN, DEESN, BGAESN, BDEESN, and the actual values.]

Fig. 9. Error analysis of Turkey. [Chart: relative error (%) by year, 1997 to 2006, for ACOQ, BABCEEQ, ADE-BPNN, ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN.]

Fig. 9 displays the relative errors of the nine models. The relative errors of BDEESN are small and quite stable, staying close to the horizontal zero axis. The error range [−3%, +3%] is commonly taken as a standard for evaluating forecasting values [57,58]. Here, only one of the ten BDEESN forecasts falls slightly outside this range, namely 3.212% in 2003. BDEESN has the best performance, which coincides with the results shown in Table 3 and Fig. 8. In conclusion, BDEESN outperforms ACOQ, BABCEEQ, ADE-BPNN, ESN, BESN, GAESN, DEESN, and BGAESN in the EEC forecasting of Turkey, indicating that the forecasting performance of ESN can be improved effectively by using bagging and DE.

4.2 Example 2: electricity energy consumption forecasting in Iran

This example is comparative example 2, designed to forecast the annual EEC of Iran. The data set, parameter setting, and forecasting results are presented as follows.

4.2.1 Data set

The GDP, POP, IMP, and EXP are used as influencing factors in forecasting the EEC of Iran, which is consistent with study [9]. The annual data of electricity demand (EEC) and the four influencing factors in Iran from 1982 to 2009 are presented in Fig. 10 and Table A2; the data are taken from study [9].

Fig. 10. Data of the electricity demand and four factors in Iran from 1982 to 2009.

The BDEESN is compared with CLPSO-E (exponential model optimized by comprehensive learning PSO) and PSO-Q (quadratic model optimized by PSO) proposed in study [9]. In addition, ESN, BESN, GAESN, DEESN, and BGAESN are applied to forecast the EEC of Iran, and their results are included in the comparison. A total of 28 annual data points are selected as the sample for analysis, where the first 22 are used as the training set and the last 6 as the testing set. As in study [9], the data are normalized to the range [0, 1] by Eq. (12).

$$D_n = \frac{D - D_{\min}}{D_{\max} - D_{\min}} \tag{12}$$
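Eq. (12) is ordinary min-max scaling; a minimal sketch is given below. The helper names are illustrative, and whether the minimum and maximum are taken over the training set or over the whole sample is not stated here, so the sketch lets the caller supply them. The inverse mapping is included because forecasts made on the normalized scale have to be mapped back to original units before they are compared with actual values.

```python
def min_max_normalize(values, d_min=None, d_max=None):
    """Scale values to [0, 1] following Eq. (12)."""
    d_min = min(values) if d_min is None else d_min
    d_max = max(values) if d_max is None else d_max
    span = d_max - d_min
    if span == 0:
        raise ValueError("D_max and D_min must differ")
    return [(d - d_min) / span for d in values]

def denormalize(values, d_min, d_max):
    """Map normalized values back to the original scale."""
    return [v * (d_max - d_min) + d_min for v in values]

# Electricity demand of Iran in 1982, 1990, and 2003 (Table A2, 10^6 MWh).
demand = [18.234, 39.956, 105.525]
scaled = min_max_normalize(demand)
restored = denormalize(scaled, min(demand), max(demand))
print(scaled, restored)
```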

4.2.2 Parameter setting

Basic ESN, BESN, GAESN, DEESN, and BGAESN are also used to forecast the EEC in Iran. Four input units are used in this example. The search spaces of NT and S are [20, 22] and [34, 36] respectively for the grid search. The search spaces of DE/GA and other parameter values are shown in Table 4.

Table 4 Parameters for the six ESN based models in Iran.

| Part | Parameter | ESN | BESN | GAESN | DEESN | BGAESN | BDEESN |
|---|---|---|---|---|---|---|---|
| ESN part | N | 12 | 12 | [11, 15] | [11, 15] | [11, 15] | [11, 15] |
| | α | 0.05 | 0.05 | [0.01, 0.05] | [0.01, 0.05] | [0.01, 0.05] | [0.01, 0.05] |
| | ρ | 0.7 | 0.7 | [0.1, 0.99] | [0.1, 0.99] | [0.1, 0.99] | [0.1, 0.99] |
| | I0 | 11 | 11 | 11 | 11 | 11 | 11 |
| | f | tangent | tangent | tangent | tangent | tangent | tangent |
| | g | identity | identity | identity | identity | identity | identity |
| Bagging part | NT | - | 22 | - | - | 20 | 20 |
| | S | - | 35 | - | - | 35 | 36 |
| DE/GA part | NP | - | - | 20 | 20 | 20 | 20 |
| | maxgen | - | - | 30 | 30 | 30 | 30 |
| | μ | - | - | 10^(−5) | 10^(−5) | 10^(−5) | 10^(−5) |
| | F | - | - | 0.03 | 0.9 | 0.03 | 0.9 |
| | CR | - | - | 0.7 | 0.1 | 0.7 | 0.1 |

4.2.3 Forecasting results

Fig. 11 shows one optimization run of DE in example 2. When the optimization process ends, the three optimized parameters are sent back to the network, and ESN performs normal training.

Fig. 11. Iterative MSE trend of ESN searching (For Iran). [Plot titled "Optimization process of DE": fitness function (MSE, ×10^(−4)) versus generation, 0 to 30.]

Eight models are used to forecast the EEC in Iran from 2004 to 2009. Table 5 lists the forecasting results of all models, among which the results of CLPSO-E and PSO-Q are obtained by rebuilding the corresponding models from the parameters given in study [9]. The RMSE, MAE, and MAPE of the eight models are also shown in Table 5. Based on MAE and MAPE, the six ESN based models (ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN) are more accurate than CLPSO-E and PSO-Q, which indicates the effectiveness of ESN; in terms of RMSE, all six outperform PSO-Q, but only BESN, BGAESN, and BDEESN outperform CLPSO-E. In addition, BDEESN and BGAESN achieve comparable best results: the MAE and MAPE of BDEESN are smaller than those of BGAESN, whereas the RMSE of BDEESN is slightly larger than that of BGAESN. This indicates that DE and GA have a similar ability to find suitable parameters in this example. It can also be seen that bagging and an intelligent optimization algorithm yield greater improvement when used simultaneously than when used individually, which applies to both DE and GA.

Table 5 Forecasting results of annual electricity demand of Iran by using the BDEESN and other models (unit: 10^6 MWh).

| Years | Actual | CLPSO-E | PSO-Q | ESN | BESN | GAESN | DEESN | BGAESN | BDEESN |
|---|---|---|---|---|---|---|---|---|---|
| 2004 | 114.624 | 114.4235 | 118.7928 | 113.8597 | 115.1378 | 115.3491 | 115.0048 | 115.6091 | 115.1798 |
| 2005 | 125.528 | 123.1485 | 132.7117 | 125.3335 | 125.3284 | 125.0383 | 124.9653 | 125.4815 | 125.3165 |
| 2006 | 134.238 | 133.4989 | 133.9981 | 134.9538 | 134.6559 | 135.1986 | 135.0876 | 134.7336 | 134.2358 |
| 2007 | 147.001 | 144.3358 | 152.7172 | 140.2374 | 142.8431 | 141.0002 | 141.3980 | 144.7563 | 144.1467 |
| 2008 | 155.598 | 154.8149 | 171.1062 | 154.1935 | 157.1350 | 155.3693 | 156.4478 | 156.9285 | 156.9630 |
| 2009 | 169.047 | 165.6219 | 186.9299 | 169.1513 | 166.7233 | 168.6787 | 169.7348 | 166.3204 | 166.3500 |
| RMSE | - | 2.0695 | 10.5041 | 2.8538 | 2.0627 | 2.5128 | 2.3725 | 1.6053 | 1.7145 |
| MAE | - | 1.6987 | 8.4499 | 1.6578 | 1.5250 | 1.4622 | 1.4890 | 1.3048 | 1.2810 |
| MAPE (%) | - | 1.161 | 5.662 | 1.153 | 1.018 | 1.031 | 1.030 | 0.877 | 0.845 |

Fig. 12 presents the real and predicted annual EEC of the eight models in Iran between 2004 and 2009. It can be seen the PSO-Q has poor performance because the forecasting values are much larger than actual values, except the value in 2006. The forecasting values of the remaining seven models are all close to actual values. It is shown that the BDEESN and BGAESN are the closest and can fit the EEC of Iran well.

Fig. 12. Forecasting results of Iran. [Chart: EEC (10^6 MWh) by year, 2004 to 2009, for CLPSO-E, PSO-Q, ESN, BESN, GAESN, DEESN, BGAESN, BDEESN, and the actual values.]

Fig. 13 displays the relative errors of eight models in every forecasting year. The relative errors of BDEESN and BGAESN are always in the range [−3%, +3%], which indicates stable and satisfactory performance. The analysis related to relative errors is consistent with that of the results in Table 5 and Fig. 12.

Fig. 13. Error analysis of Iran. [Chart: relative error (%) by year, 2004 to 2009, for CLPSO-E, PSO-Q, ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN.]

The proposed BDEESN is superior to CLPSO-E, PSO-Q, ESN, BESN, GAESN, and DEESN in the EEC forecasting of Iran, showing better forecasting performance. It is worth mentioning that the BDEESN and BGAESN have comparable best forecasting performance in this example. The bagging and intelligent optimization algorithm when used simultaneously can result in greater improvement than individually, which applies to both DE and GA.

4.3 Example 3: total energy consumption forecasting in China

This example is a real application designed to forecast the annual TEC of China. The influencing factors selection and data set, multiple linear regression model (MLR), parameter setting, and forecasting results are presented in this subsection.

4.3.1 Influencing factors selection and data set

In this real application, BDEESN is applied to forecast the TEC of China, which is the largest energy consumer in the world. There are various factors that affect the TEC of China, including GDP [16, 59], POP [16, 59], URB [24, 25], SEC [26], the ratio of TEC to GDP (RAT) [59], proportion of coal consumption in TEC (COA) [59], and household energy consumption per capita (HOU) [59]. The data of the TEC and seven factors in China from 1990 to 2015 are collected from China Statistical Yearbook 2016 [1], which are shown in Fig. 14.


Fig. 14. Annual data of the TEC and seven factors in China from 1990 to 2015.

The best influencing factors are selected based on the Pearson and Spearman correlations calculated between each factor and the TEC in China [60]. The two correlation coefficients reflect the direction and degree of the common trend between two variables, and their values range from −1 to 1. A value of 0 means that the two variables are not correlated, a positive value means positive correlation, a negative value means negative correlation, and a larger absolute value means stronger correlation. The Pearson correlation coefficient is the most common correlation coefficient, but it is strongly affected by outliers. The Spearman correlation coefficient is less affected by outliers because it is calculated from the rank positions of the original data. Therefore, the two correlation coefficients are used together to select the best influencing factors. Table 6 shows the two correlation coefficients between each factor and the TEC in China. The table also gives the ranking of each factor under each correlation coefficient; the smaller the ranking, the stronger the correlation. The last row is the average of each factor's two rankings. GDP, POP, and URB are the top three by average ranking, and both of their correlation coefficients are above 0.9. Based on the above analysis, GDP, POP, and URB are selected as the best influencing factors.

Table 6 Pearson and Spearman correlation coefficients between each factor and TEC in China.

| Factor | GDP | POP | URB | SEC | RAT | COA | HOU |
|---|---|---|---|---|---|---|---|
| Pearson correlation coefficient | 0.969 | 0.919 | 0.976 | -0.015 | -0.717 | -0.669 | 0.984 |
| Ranking | 3 | 4 | 2 | 7 | 5 | 6 | 1 |
| Spearman correlation coefficient | 1.000 | 1.000 | 1.000 | 0.007 | -0.991 | -0.748 | 0.854 |
| Ranking | 1 | 1 | 1 | 7 | 4 | 6 | 5 |
| Average ranking | 2 | 2.5 | 1.5 | 7 | 4.5 | 6 | 3 |
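The two coefficients can be reproduced for one factor with a few lines of code. The sketch below computes the Pearson and Spearman correlations between GDP and TEC from the 1990 to 2015 values in Table A3; the helper functions are illustrative, written in plain Python, and use average ranks for ties (an assumption, since the tie-handling rule is not stated here).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# TEC (10^6 TCE) and GDP (10^10 Yuan) of China, 1990-2015 (Table A3).
tec = [987.03, 1037.83, 1091.7, 1159.93, 1227.37, 1311.76, 1351.92, 1359.09,
       1361.84, 1405.69, 1469.64, 1555.47, 1695.77, 1970.83, 2302.81, 2613.69,
       2864.67, 3114.42, 3206.11, 3361.26, 3606.48, 3870.43, 4021.38, 4169.13,
       4258.06, 4300.0]
gdp = [188.729, 220.056, 271.945, 356.732, 486.375, 613.399, 718.136, 797.15,
       851.955, 905.644, 1002.801, 1108.631, 1217.174, 1374.22, 1618.402,
       1873.189, 2194.385, 2702.323, 3195.155, 3490.814, 4130.303, 4893.006,
       5403.674, 5952.444, 6439.74, 6855.058]

print(pearson(gdp, tec), spearman(gdp, tec))
```

Because both series increase every year, their ranks coincide and the Spearman coefficient is exactly 1, while the Pearson coefficient stays below 1 because the relationship is not linear; this is consistent with the GDP column of Table 6.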

The annual data of the TEC and the three best influencing factors in China from 1990 to 2015 are reported in Table A3. A total of 26 annual data points are selected as the sample for analysis, where the first 20 are used as the training set and the last 6 as the testing set. The data are preprocessed by Eq. (12) to lie within the range [0, 1].

4.3.2 Multiple linear regression model

The MLR is also considered in this example, but it is not suitable for forecasting the TEC in China. The model can be expressed as Eq. (13).

$$Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 \tag{13}$$

where X1, X2, and X3 are GDP, POP, and URB, respectively. The four parameters a, b1, b2, and b3 are estimated by ordinary least squares (OLS). Two key points are related to OLS. First, because OLS is a statistical method, it needs a sufficient number of observations, not fewer than 30 [61], whereas only 26 annual data points are available in this example. Second, the influencing factors should be uncorrelated [61], whereas the three factors used here are strongly correlated with one another. Table 7 lists their correlation coefficients. The multicollinearity is extremely severe.

Table 7 Correlation coefficients of three influencing factors.

| | GDP | POP | URB |
|---|---|---|---|
| GDP | 1 | 0.8680 | 0.9387 |
| POP | 0.8680 | 1 | 0.9761 |
| URB | 0.9387 | 0.9761 | 1 |
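One common way to quantify the multicollinearity visible in Table 7 is the variance inflation factor (VIF), which equals the corresponding diagonal element of the inverse correlation matrix. The VIF is not used in the original analysis; the sketch below is an added illustration, and the threshold of 10 is only a widely used rule of thumb.

```python
def vif_from_corr_3x3(r):
    """Variance inflation factors from a 3x3 correlation matrix.

    VIF_j is the j-th diagonal entry of the inverse matrix, computed here
    as the j-th diagonal cofactor divided by the determinant.
    """
    (a, b, c), (d, e, f), (g, h, i) = r
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    diag_cofactors = [e * i - f * h, a * i - c * g, a * e - b * d]
    return [cof / det for cof in diag_cofactors]

# Correlation coefficients of GDP, POP, and URB (Table 7).
corr = [
    [1.0, 0.8680, 0.9387],
    [0.8680, 1.0, 0.9761],
    [0.9387, 0.9761, 1.0],
]

vifs = vif_from_corr_3x3(corr)
print(dict(zip(["GDP", "POP", "URB"], vifs)))
```

All three values come out well above 10, which supports the statement that the multicollinearity among GDP, POP, and URB is severe.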

The multicollinearity may result in incorrect analysis, which means the regression coefficient of one influencing factor fails to pass the significance test, although the factor is significant in simple linear regression. In addition, the sign of the regression coefficient may be affected. The MLR model is built with the normalized training set by using Excel, and the obtained regression model can be expressed as Eq. (14).

$$Y = 0.02813 + 1.1386 X_1 - 0.3248 X_2 + 0.6022 X_3 \tag{14}$$

POP has a positive effect on the TEC in China, whereas its regression coefficient in Eq. (14) is negative because of the multicollinearity. Based on the above analysis, the MLR model expressed as Eq. (14) cannot be used for forecasting in this example.

4.3.3 Parameter setting

In this example, nine models, namely BDEESN, BPNN, RNN, LSTM, ESN, BESN, GAESN, DEESN, and BGAESN, are used to forecast the annual TEC in China. Among these, BPNN is a widely used FFNN and is often regarded as a benchmark [13,16]. RNN is a network that contains loops, allowing persistence of information, and is particularly suitable for applications related to time series [62]. LSTM is a deep learning RNN that can solve the long-term dependence problem of RNN [18,47]. These three networks have been applied in various time series forecasting fields. The number of input units is 3 because there are three influencing factors. The search spaces of NT and S for the grid search are [18, 20] and [49, 51], respectively. The search spaces of DE/GA and the other parameter values are shown in Table 8. For BPNN, based on a series of experiments, the best-fitting network is N3−4−1, the maximum number of epochs is set to 2000, the learning rate is 0.0005, the accuracy is 0.0005, and the activation functions of the hidden and output layers are the logsig and purelin functions, respectively. For RNN, the batch size is 2, and the numbers of hidden neurons and epochs are 5 and 300, respectively. For LSTM, the batch size is 1, and the numbers of hidden neurons and epochs are 4 and 200, respectively.

Table 8 Parameters for the six ESN based models in China.

| Part | Parameter | ESN | BESN | GAESN | DEESN | BGAESN | BDEESN |
|---|---|---|---|---|---|---|---|
| ESN part | N | 15 | 15 | [11, 20] | [11, 20] | [11, 20] | [11, 20] |
| | α | 0.05 | 0.05 | [0.01, 0.05] | [0.01, 0.05] | [0.01, 0.05] | [0.01, 0.05] |
| | ρ | 0.7 | 0.7 | [0.1, 0.99] | [0.1, 0.99] | [0.1, 0.99] | [0.1, 0.99] |
| | I0 | 10 | 10 | 10 | 10 | 10 | 10 |
| | f | tangent | tangent | tangent | tangent | tangent | tangent |
| | g | identity | identity | identity | identity | identity | identity |
| Bagging part | NT | - | 20 | - | - | 20 | 20 |
| | S | - | 50 | - | - | 50 | 50 |
| DE/GA part | NP | - | - | 20 | 20 | 20 | 20 |
| | maxgen | - | - | 30 | 30 | 30 | 30 |
| | μ | - | - | 10^(−4) | 10^(−4) | 10^(−4) | 10^(−4) |
| | F | - | - | 0.02 | 0.9 | 0.02 | 0.9 |
| | CR | - | - | 0.8 | 0.1 | 0.8 | 0.1 |

4.3.4 Forecasting results

Similar to examples 1 and 2, Fig. 15 shows one optimization run of DE used to select the parameters of ESN in example 3.

Fig. 15. Iterative MSE trend of ESN searching (For China). [Plot titled "Optimization process of DE": fitness function (MSE, ×10^(−4)) versus generation, 0 to 30.]

Table 9 lists the forecasting results and the errors of the nine models in terms of RMSE, MAE, and MAPE. The six ESN based models have lower errors than BPNN, RNN, and LSTM on all three metrics, which indicates the superiority of the ESN. The poor performance of LSTM may be due to the small training set. Furthermore, BESN and DEESN have comparable values, and both are better than the basic ESN on all three metrics, which indicates that bagging and DE can each enhance the performance of ESN individually. In addition, BDEESN has the best performance with the lowest errors, which indicates that bagging and DE yield greater improvement when used simultaneously. Moreover, DE is more suitable than GA for searching the parameters in this example because the errors of DEESN are smaller than those of GAESN and the errors of BDEESN are smaller than those of BGAESN.

Table 9 Forecasting results of annual TEC in China using the BDEESN and other models (unit: 10^6 TCE).

| Years | Actual | BPNN | RNN | LSTM | ESN | BESN | GAESN | DEESN | BGAESN | BDEESN |
|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | 3606.48 | 3552.2280 | 3637.13 | 3704.04 | 3602.0121 | 3603.3100 | 3609.1155 | 3634.1500 | 3605.8472 | 3607.2442 |
| 2011 | 3870.43 | 3851.4671 | 3874.35 | 3900.62 | 3859.3596 | 3854.7707 | 3856.9010 | 3867.8938 | 3855.3504 | 3859.4259 |
| 2012 | 4021.38 | 4019.1771 | 4025.85 | 4028.20 | 4035.1228 | 4031.9783 | 3987.5880 | 4020.3030 | 4045.2132 | 4035.3952 |
| 2013 | 4169.13 | 4153.2727 | 4158.35 | 4145.46 | 4191.9567 | 4167.8824 | 4175.5595 | 4192.0103 | 4180.3840 | 4180.5141 |
| 2014 | 4258.06 | 4226.9265 | 4261.38 | 4242.70 | 4270.6318 | 4261.3172 | 4264.2415 | 4256.8462 | 4257.8404 | 4258.6108 |
| 2015 | 4300 | 4258.7751 | 4353.22 | 4330.04 | 4324.9540 | 4327.1480 | 4306.3487 | 4296.7058 | 4314.9289 | 4314.9918 |
| RMSE | - | 32.2179 | 25.6062 | 45.0448 | 16.4984 | 13.6429 | 15.5549 | 14.7708 | 13.8166 | 10.5890 |
| MAE | - | 27.2722 | 17.7260 | 33.9401 | 14.9389 | 10.1801 | 11.4860 | 9.7786 | 10.9914 | 8.7850 |
| MAPE (%) | - | 0.687 | 0.439 | 0.880 | 0.362 | 0.249 | 0.285 | 0.252 | 0.270 | 0.215 |

Fig. 16 intuitively presents the real and predicted annual TEC of the nine models in China between 2010 and 2015, and the relative errors are shown in Fig. 17.

Fig. 16. The forecasting results of China. [Chart: TEC (10^6 TCE) by year, 2010 to 2015, for BPNN, RNN, LSTM, ESN, BESN, GAESN, DEESN, BGAESN, BDEESN, and the actual values.]

Fig. 17. The error analysis of China. [Chart: relative error (%) by year, 2010 to 2015, for BPNN, RNN, LSTM, ESN, BESN, GAESN, DEESN, BGAESN, and BDEESN.]

Fig. 16 clearly shows that all nine models produce satisfactory forecasts and that the BDEESN forecasts are the closest to the actual values. Fig. 17 shows that the relative errors of the nine models always lie in the range [−3%, +3%]. BDEESN is the most stable, with only very small fluctuations around zero. The proposed BDEESN model achieves the most satisfactory forecasting performance among the nine models in the TEC forecasting of China, which indicates high accuracy and reliability.

4.4 Economic benefits analysis

With the continuous development of the economy, energy consumption keeps increasing, and energy has become a strategic issue in the economic development of many countries. Accurate energy consumption forecasting based on the proposed BDEESN has a great impact on economic benefits. First, it helps to ensure the balance of energy supply and demand, thus supporting sustainable and stable economic growth. Second, it helps to recognize the contradiction between energy constraints and the long-term growth of energy consumption, thus promoting the implementation of energy-saving policies in various countries. Energy-saving policies are conducive to promoting technological progress, reducing environmental pollution, and improving economic efficiency. In general, the impact of BDEESN on economic benefits lies mainly in the above two aspects.

5 Conclusions and future research

An enhanced optimization model (BDEESN) for the multifactor-influenced EEC/TEC forecasting is developed in this study. Three examples are used to validate its forecasting accuracy and reliability; moreover, the impact of BDEESN on the economic benefits is analyzed. The main conclusions are as follows:

(a) This study proposes a new effective model BDEESN based on the bagged ESN improved by DE to estimate energy consumption. Bagging process is used to reduce forecasting error and improve generalization of network, and DE is used to find suitable parameters of ESN. Thus, the BDEESN combines the merits of three techniques which are ESN, bagging, and DE.

(b) In the two comparative examples, the BDEESN has higher precision compared with the existing best models. In addition, its forecasting accuracy is higher than basic ESN, BESN, GAESN, DEESN, and BGAESN, except for the similar forecasting performance to BGAESN in the second example.

(c) In the extended application, the TEC of China is forecasted to further prove the accuracy and reliability of BDEESN. Based on the Pearson and Spearman coefficient analysis, GDP, POP, and URB are selected as the best influencing factors. The relatively high accuracy is obtained from BDEESN with MAPE of 0.215%, which indicates the superiority of BDEESN and the effectiveness of selected influencing factors.

(d) In examples 1 and 3, BESN and DEESN have comparable performance next to BDEESN. These results indicate bagging and DE can individually enhance the performance of ESN, whereas greater improvement can be obtained when applied simultaneously. In example 2, GAESN has similar performance to DEESN, and BGAESN has similar performance to BDEESN. It is found that DE is better than GA in examples 1 and 3, while they have comparable performance in example 2.

(e) Accurate energy consumption forecasting based on BDEESN has a great impact on economic benefits. It is beneficial to ensure sustainable and stable energy supply and economic growth. Besides, it is conducive to the implementation of energy-saving policies, which can promote technological progress, reduce environmental pollution, and improve economic efficiency.
In future work, the final forecasting result could be obtained from a weighted average of the base models instead of a simple average. Moreover, the parameters currently obtained by the trial and error method could be further fine-tuned by popular search algorithms [63]. The proposed BDEESN can also be used to solve more prediction problems under complex environments [64].

Acknowledgments

The authors are grateful for the support from National Natural Science Foundation of China (Nos: 71771095; 71531009).

Appendix A. Data sets of three examples

Table A1 Annual data of the electricity demand and four factors in Turkey from 1979 to 2006.

| Years | Electricity demand (GWh) | GDP ($10^9) | POP (10^6) | IMP ($10^9) | EXP ($10^9) |
|---|---|---|---|---|---|
| 1979 | 23.566 | 82 | 43.53 | 5.07 | 2.26 |
| 1980 | 24.617 | 68 | 44.438 | 7.91 | 2.91 |
| 1981 | 26.289 | 72 | 45.54 | 8.93 | 4.7 |
| 1982 | 28.325 | 64 | 46.688 | 8.84 | 5.75 |
| 1983 | 29.568 | 60 | 47.864 | 9.24 | 5.73 |
| 1984 | 33.267 | 59 | 49.07 | 10.76 | 7.13 |
| 1985 | 36.361 | 67 | 50.306 | 11.34 | 7.95 |
| 1986 | 40.471 | 75 | 51.433 | 11.1 | 7.46 |
| 1987 | 44.925 | 86 | 52.561 | 14.16 | 10.19 |
| 1988 | 48.43 | 90 | 53.715 | 14.34 | 11.66 |
| 1989 | 52.602 | 108 | 54.893 | 15.79 | 11.62 |
| 1990 | 56.812 | 151 | 56.203 | 22.3 | 12.96 |
| 1991 | 60.499 | 150 | 57.305 | 21.05 | 13.59 |
| 1992 | 67.217 | 158 | 58.401 | 22.87 | 14.72 |
| 1993 | 73.432 | 179 | 59.491 | 29.43 | 15.35 |
| 1994 | 77.783 | 132 | 60.576 | 23.27 | 18.11 |
| 1995 | 85.552 | 170 | 61.644 | 35.71 | 21.64 |
| 1996 | 94.789 | 184 | 62.697 | 43.63 | 23.22 |
| 1997 | 105.517 | 192 | 62.48 | 48.56 | 26.26 |
| 1998 | 114.023 | 207 | 63.459 | 45.92 | 26.97 |
| 1999 | 118.485 | 187 | 64.345 | 40.67 | 26.59 |
| 2000 | 128.276 | 200 | 67.461 | 54.5 | 27.78 |
| 2001 | 126.871 | 146 | 68.618 | 41.4 | 31.33 |
| 2002 | 132.553 | 181 | 69.626 | 51.55 | 36.06 |
| 2003 | 141.151 | 239 | 70.712 | 69.34 | 47.25 |
| 2004 | 150.018 | 299 | 71.789 | 97.54 | 63.17 |
| 2005 | 160.794 | 361 | 72.065 | 116.77 | 73.48 |
| 2006 | 174.637 | 400 | 72.974 | 139.58 | 85.53 |

Table A2 Annual data of the electricity demand and four factors in Iran from 1982 to 2009.

| Year | Electricity demand (10^6 MWh) | GDP (10^9 Rial^a) | POP (10^3) | IMP ($10^6) | EXP ($10^6) |
|---|---|---|---|---|---|
| 1982 | 18.234 | 170281 | 40826 | 13515 | 339.53 |
| 1983 | 21.753 | 191667 | 42420 | 11845 | 283.74 |
| 1984 | 25.153 | 212877 | 44077 | 18103 | 356.63 |
| 1985 | 28.177 | 208516 | 45798 | 14494 | 361.13 |
| 1986 | 30.812 | 212686 | 47587 | 11408 | 464.95 |
| 1987 | 32.619 | 193235 | 49445 | 9355 | 915.5 |
| 1988 | 34.74 | 191312 | 50662 | 9369 | 1160.79 |
| 1989 | 36.147 | 180823 | 51909 | 8177 | 1036 |
| 1990 | 39.956 | 191503 | 53187 | 12807 | 1043.89 |
| 1991 | 45.107 | 218539 | 54496 | 18722 | 1312.17 |
| 1992 | 49.175 | 245036 | 55837 | 29677 | 2648.7 |
| 1993 | 52.306 | 254822 | 56656 | 29870 | 2987.68 |
| 1994 | 58.114 | 258601 | 57488 | 20037 | 3746.8 |
| 1995 | 63.625 | 259876 | 58331 | 11570 | 4824.55 |
| 1996 | 65.854 | 267534 | 59187 | 12082 | 3250.67 |
| 1997 | 69.671 | 283807 | 60055 | 14467 | 3105.71 |
| 1998 | 73.358 | 291769 | 61070 | 13633 | 2875.59 |
| 1999 | 77.646 | 300140 | 62103 | 13708 | 3013.31 |
| 2000 | 84.656 | 304941 | 63152 | 11972 | 3362 |
| 2001 | 90.366 | 320069 | 64219 | 13186.75 | 3762.7 |
| 2002 | 97.171 | 330565 | 65301 | 17198.25 | 4224.05 |
| 2003 | 105.525 | 357671 | 66300 | 21761 | 4608 |
| 2004 | 114.624 | 385630 | 67315 | 26597.8 | 5972.2 |
| 2005 | 125.528 | 410429 | 68345 | 35388.55 | 6847.3 |
| 2006 | 134.238 | 438900 | 69390 | 39247 | 10474.4 |
| 2007 | 147.001 | 467930 | 70496 | 41722.6 | 12996.91 |
| 2008 | 155.598 | 491099 | 71532 | 48438.82 | 15312.28 |
| 2009 | 169.047 | 495266 | 72584 | 56042.01 | 18333.58 |

a Rial: Currency unit of Iran

Table A3 Annual data of the TEC and three factors in China from 1990 to 2015.

| Year | TEC (10^6 TCE) | GDP (10^10 Yuan^a) | POP (10^8) | URB (%) |
|---|---|---|---|---|
| 1990 | 987.03 | 188.729 | 11.4333 | 26.41 |
| 1991 | 1037.83 | 220.056 | 11.5823 | 26.94 |
| 1992 | 1091.7 | 271.945 | 11.7171 | 27.46 |
| 1993 | 1159.93 | 356.732 | 11.8517 | 27.99 |
| 1994 | 1227.37 | 486.375 | 11.985 | 28.51 |
| 1995 | 1311.76 | 613.399 | 12.1121 | 29.04 |
| 1996 | 1351.92 | 718.136 | 12.2389 | 30.48 |
| 1997 | 1359.09 | 797.15 | 12.3626 | 31.91 |
| 1998 | 1361.84 | 851.955 | 12.4761 | 33.35 |
| 1999 | 1405.69 | 905.644 | 12.5786 | 34.78 |
| 2000 | 1469.64 | 1002.801 | 12.6743 | 36.22 |
| 2001 | 1555.47 | 1108.631 | 12.7627 | 37.66 |
| 2002 | 1695.77 | 1217.174 | 12.8453 | 39.09 |
| 2003 | 1970.83 | 1374.22 | 12.9227 | 40.53 |
| 2004 | 2302.81 | 1618.402 | 12.9988 | 41.76 |
| 2005 | 2613.69 | 1873.189 | 13.0756 | 42.99 |
| 2006 | 2864.67 | 2194.385 | 13.1448 | 44.34 |
| 2007 | 3114.42 | 2702.323 | 13.2129 | 45.89 |
| 2008 | 3206.11 | 3195.155 | 13.2802 | 46.99 |
| 2009 | 3361.26 | 3490.814 | 13.345 | 48.34 |
| 2010 | 3606.48 | 4130.303 | 13.4091 | 49.95 |
| 2011 | 3870.43 | 4893.006 | 13.4735 | 51.27 |
| 2012 | 4021.38 | 5403.674 | 13.5404 | 52.57 |
| 2013 | 4169.13 | 5952.444 | 13.6072 | 53.73 |
| 2014 | 4258.06 | 6439.74 | 13.6782 | 54.77 |
| 2015 | 4300 | 6855.058 | 13.7462 | 56.1 |

a Yuan: Currency unit of China

References [1] NBSC. China statistics yearbook. Beijing: China Statistics Press; 2016 [in Chinese]. [2] Erdogdu E. Electricity demand analysis using cointegration and ARIMA modelling: A case study of Turkey [J]. Energy Policy, 2007, 35(2):1129-1146. [3] Limanond T, Jomnonkwao S, Srikaew A. Projection of future transport energy demand of Thailand [J]. Energy Policy, 2011, 39(5):2754-2763. [4] Shao Z, Gao F, Zhang Q, et al. Multivariate statistical and similarity measure based semiparametric modeling of the probability distribution: A novel approach to the case study of mid-long term electricity consumption forecasting in China 30

Journal Pre-proof [J]. Applied Energy, 2015, 156:502-518. [5] de Oliveira E M, Oliveira F L C. Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods[J]. Energy, 2018, 144: 776-788. [6] Hsu C C, Chen C Y. Applications of improved grey prediction model for power demand forecasting [J]. Energy Conversion & Management, 2003, 44(14): 2241 -2249. [7] Zhou P, Ang B W, Poh K L. A trigonometric grey prediction approach to forecasting electricity demand [J]. Energy, 2006, 31(14):2839-2847. [8] Wu W, Ma X, Zeng B, et al. Forecasting short-term renewable energy consumption of China using a novel fractional nonlinear grey Bernoulli model [J]. Renewable Energy, 2019, 140:70-87. [9] Askarzadeh A. Comparison of particle swarm optimization and other metaheuristics on electricity demand estimation: A case study of Iran [J]. Energy, 2014, 72(7):484-491. [10] Canyurt O E, Ozturk H K. Application of genetic algorithm (GA) technique on demand estimation of fossil fuels in Turkey[J]. Energy Policy, 2008, 36(7): 2562 -2569. [11] Dumitru C D, Gligor A. Daily average wind energy forecasting using artificial neural networks [J]. Procedia Engineering, 2017, 181:829-836. [12] Zhang X, Wang J, Zhang K. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm [J]. Electric Power Systems Research, 2017, 146:270-285. [13] Zong W G. Transport energy demand modeling of South Korea using artificial neural network [J]. Energy Policy, 2011, 39(8):4644-4650. [14] Yu L, Zhao Y, Tang L, et al. Online big data-driven oil consumption forecasting with Google trends [J]. International Journal of Forecasting, 2019, 35(1): 213 -223. [15] Kıran M S, Özceylan E, Gündüz M, et al. A novel hybrid approach based on particle swarm optimization and ant colony algorithm to forecast energy demand of Turkey [J]. Energy Conversion & Management, 2012, 53(1):75-83. 
[16] Zeng Y R, Zeng Y, Choi B, et al. Multifactor-influenced energy consumption forecasting using enhanced back-propagation neural network [J]. Energy, 2017, 127: 381-396.
[17] Ahmad T, Chen H. Potential of three variant machine-learning models for forecasting district level medium-term and long-term energy demand in smart grid environment [J]. Energy, 2018, 160: 1008-1020.
[18] Wei N, Li C, Peng X, et al. Daily natural gas consumption forecasting via the application of a novel hybrid model [J]. Applied Energy, 2019, 250: 358-368.
[19] Comodi G, Cioccolanti L, Gargiulo M. Municipal scale scenario: Analysis of an Italian seaside town with MarkAL-TIMES [J]. Energy Policy, 2012, 41(1): 303-315.
[20] Huang Y, Bor Y J, Peng C Y. The long-term forecast of Taiwan’s energy supply and demand: LEAP model application [J]. Energy Policy, 2011, 39(11): 6790-6803.
[21] Tsai M S, Chang S L. Taiwan’s 2050 low carbon development roadmap: An evaluation with the MARKAL model [J]. Renewable & Sustainable Energy Reviews, 2015, 49: 178-191.
[22] Kıran M S, Özceylan E, Gündüz M, et al. Swarm intelligence approaches to estimate electricity energy demand in Turkey [J]. Knowledge-Based Systems, 2012, 36(6): 93-103.
[23] Toksarı M D. Ant colony optimization approach to estimate energy demand of Turkey [J]. Energy Policy, 2007, 35(8): 3984-3990.
[24] Shen L, Cheng S, Gunson A J, et al. Urbanization, sustainability and the utilization of energy and mineral resources in China [J]. Cities, 2005, 22(4): 287-302.
[25] Zhang C, Lin Y. Panel estimation for urbanization, energy consumption and CO2 emissions: a regional analysis in China [J]. Energy Policy, 2012, 49: 488-498.
[26] Yuan X C, Sun X, Zhao W G, et al. Forecasting China’s regional energy demand by 2030: A Bayesian approach [J]. Resources, Conservation & Recycling, 2017, 127: 85-95.
[27] Zhong S, Xie X, Lin L, et al. Genetic algorithm optimized double-reservoir echo state network for multi-regime time series prediction [J]. Neurocomputing, 2017, 238: 191-204.
[28] Chouikhi N, Ammar B, Rokbani N, Alimi A M. PSO-based analysis of Echo State Network parameters for time series forecasting [J]. Applied Soft Computing, 2017, 55: 211-225.
[29] Bozhkov L, Koprinkova-Hristova P, Georgieva P. Learning to decode human emotions with Echo State Networks [J]. Neural Networks, 2016, 78: 112.
[30] Bo Y C. Online adaptive dynamic programming based on echo state networks for dissolved oxygen control [J]. Applied Soft Computing, 2018, 62: 830-839.
[31] Breiman L. Bagging predictors [J]. Machine Learning, 1996, 24(2): 123-140.
[32] Khwaja A S, Naeem M, Anpalagan A, et al. Improved short-term load forecasting using bagged neural networks [J]. Electric Power Systems Research, 2015, 125: 109-115.
[33] Wang L, Hu H, Ai X Y, et al. Effective electricity energy consumption forecasting using echo state network improved by differential evolution algorithm [J]. Energy, 2018, 153: 801-815.
[34] Jaeger H. The “echo state” approach to analysing and training recurrent neural networks-with an erratum note [M]. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 2001, 148(34): 13.
[35] Ma Q, Shen L, Chen W, et al. Functional echo state network for time series classification [J]. Information Sciences, 2016, 373: 1-20.
[36] Xu X, Niu D, Fu M, et al. A multi time scale wind power forecasting model of a chaotic echo state network based on a hybrid algorithm of particle swarm optimization and tabu search [J]. Energies, 2015, 8(11): 12388-12408.
[37] Wang L, Lv S X, Zeng Y R. Effective sparse adaboost method with ESN and FOA for industrial electricity consumption forecasting in China [J]. Energy, 2018, 155: 1013-1031.
[38] Maiorino E, Bianchi F M, Livi L, et al. Data-driven detrending of nonstationary fractal time series with echo state networks [J]. Information Sciences, 2017, 382: 359-373.
[39] Zhao Y, Li J, Yu L. A deep learning ensemble approach for crude oil price forecasting [J]. Energy Economics, 2017, 66: 9-16.
[40] Efron B, Tibshirani R J. An introduction to the bootstrap [M]. Chapman & Hall, 1993.
[41] Kotsiantis S B, Kanellopoulos D, Zaharakis I D. Bagged averaging of regression models [C]//IFIP International Conference on Artificial Intelligence Applications and Innovations. Springer, Boston, MA, 2006: 53-60.
[42] Martínez-Muñoz G, Suárez A. Out-of-bag estimation of the optimal sample size in bagging [J]. Pattern Recognition, 2010, 43(1): 143-152.
[43] Bühlmann P, Yu B. Analyzing bagging [J]. Annals of Statistics, 2002, 30(4): 927-961.
[44] Das S, Mullick S S, Suganthan P N. Recent advances in differential evolution – An updated survey [J]. Swarm & Evolutionary Computation, 2016, 27: 1-30.
[45] Zeng Y R, Peng L, Zhang J L, et al. An effective hybrid differential evolution algorithm incorporating simulated annealing for joint replenishment and delivery problem with trade credit [J]. International Journal of Computational Intelligence Systems, 2016, 9(6): 1001-1015.
[46] Neri F, Tirronen V. Recent advances in differential evolution: a survey and experimental analysis [J]. Artificial Intelligence Review, 2010, 33(1): 61-106.
[47] Peng L, Liu S, Liu R, et al. Effective long short-term memory with differential evolution algorithm for electricity price prediction [J]. Energy, 2018, 162: 1301-1314.
[48] Storn R, Price K. Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces [J]. Journal of Global Optimization, 1997, 11(4): 341-359.
[49] Moretti F, Pizzuti S, Annunziato M, et al. Urban traffic flow forecasting through statistical and neural network bagging ensemble hybrid modeling [J]. Neurocomputing, 2015, 167(C): 3-7.
[50] Dantas T M, Oliveira F L C, Repolho H M V. Air transportation demand forecast through Bagging Holt Winters methods [J]. Journal of Air Transport Management, 2017, 59: 116-123.
[51] Jin S, Su L, Ullah A. Robustify financial time series forecasting with Bagging [J]. Econometric Reviews, 2014, 33(5-6): 575-605.
[52] Kim M J, Kang D K. Ensemble with neural networks for bankruptcy prediction [J]. Expert Systems with Applications, 2010, 37(4): 3373-3379.
[53] Wang X, Tang L. An adaptive multi-population differential evolution algorithm for continuous multi-objective optimization [J]. Information Sciences, 2016, 348(2): 124-141.
[54] Wang L, Hu H L, Liu R, et al. An improved differential harmony search algorithm for function optimization problems [J]. Soft Computing, 2019, 23(13): 4827-4852.
[55] Mlakar U, Fister I, Brest J, et al. Multi-objective differential evolution for feature selection in facial expression recognition systems [J]. Expert Systems with Applications, 2017, 89: 129-137.
[56] Coelho L D S. An efficient particle swarm approach for mixed-integer programming in reliability–redundancy optimization applications [J]. Reliability Engineering & System Safety, 2009, 94(4): 830-837.
[57] Niu D, Wang Y, Wu D D. Power load forecasting using support vector machine and ant colony optimization [J]. Expert Systems with Applications, 2010, 37(3): 2531-2539.
[58] Wang J, Li L, Niu D, et al. An annual load forecasting model based on support vector regression with differential evolution algorithm [J]. Applied Energy, 2012, 94(6): 65-70.
[59] Wu Q, Peng C. A hybrid BAG-SA optimal approach to estimate energy demand of China [J]. Energy, 2017, 120: 985-995.
[60] Kapetanakis D S, Mangina E, Finn D P. Input variable selection for thermal load predictive models of commercial buildings [J]. Energy and Buildings, 2017, 137: 13-26.
[61] Ming M, Niu D. Annual electricity consumption analysis and forecasting of China based on few observations methods [J]. Energy Conversion and Management, 2011, 52(2): 953-957.
[62] Laubscher R. Time-series forecasting of coal-fired power plant reheater metal temperatures using encoder-decoder recurrent neural networks [J]. Energy, 2019: 116187.
[63] Wang L, Xiong Y N, Li S W, et al. New fruit fly optimization algorithm with joint search strategies for function optimization problems [J]. Knowledge-Based Systems, 2019, 176: 77-96.
[64] Wang L, Wang Z G, Qu H, et al. Optimal forecast combination based on neural networks for time series forecasting [J]. Applied Soft Computing, 2018, 66: 1-17.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Highlights

• An enhanced optimization model is proposed that uses a bagged echo state network improved by differential evolution.
• The proposed model is applied to energy consumption forecasting for the first time.
• The proposed model achieves better forecasting performance than the best existing models in two comparative examples.
• The mean absolute percentage error of the proposed model is 0.215% for total energy consumption forecasting in China.