Electrical Power and Energy Systems 110 (2019) 653–666
A hybrid electricity price forecasting model with Bayesian optimization for German energy exchange

Hangyang Cheng (a), Xiangwu Ding (b,*), Wuneng Zhou (a,*), Renqiang Ding (a)

(a) College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
(b) College of Computer Science, Donghua University, Shanghai 201620, China

* Corresponding authors. E-mail addresses: [email protected] (X. Ding), [email protected] (W. Zhou).

https://doi.org/10.1016/j.ijepes.2019.03.056
Received 13 November 2018; Received in revised form 4 February 2019; Accepted 22 March 2019
0142-0615/ © 2019 Published by Elsevier Ltd.

Keywords: Electricity price forecasting; Time series analysis; Empirical wavelet transform; Bi-directional long short-term memory; Bayesian optimization

Abstract
Electricity price forecasting affects the operation of the entire electricity market and is extremely important to every market participant. In this paper, a novel hybrid method combining the empirical wavelet transform (EWT), support vector regression (SVR), bi-directional long short-term memory (BiLSTM) and Bayesian optimization (BO) is proposed to increase the accuracy of electricity price forecasting. First, EWT is used as a processing tool to decompose the original signal into specific modal components according to the characteristics of the signal itself. Then, considering the complexity of forecasting the nonlinear subseries, SVR and BiLSTM are used as the basic framework to forecast them, while BO is introduced to tune the parameters and optimize model performance. Finally, the prediction results of the different models are combined into the final forecast. The proposed hybrid model is applied to data gathered from the European Power Exchange Spot (EPEXSPOT). Five different case studies are adopted to verify the effectiveness of BO, EWT and the hybrid model, respectively. Statistical tests of the experimental results demonstrate that the proposed hybrid model achieves better forecasting performance than the compared alternatives.
1. Introduction

The future development of the world power industry and applied power research are focused on the reform of power marketization. The main content of this reform is to introduce competition at the different levels of power generation, transmission, distribution and sales, and electricity price reform is its focus [1]. In 2014, China officially adopted the "new electricity reform" program, whose core is the marketization of electricity prices; the ultimate goal is to realize market-based electricity trading in which the selling price is determined by supply and demand. Electricity prices affect the operation of the entire electricity market and are extremely important to every market participant, since each participant trades in the electricity market based on the price of electricity. In a competitive market, a participant that accurately predicts the electricity price in advance is in a favorable position to obtain more benefits. Electricity price fluctuations affect the flow and distribution of resources in the electricity market and reflect, in the short term, the supply and demand relationship of electricity as a commodity. Accurate price forecasting is therefore one of the urgent problems in short-term electricity market trading [2]. However, due to the unique characteristics of electricity and the
uncertainty of market and bidding strategies, electricity price forecasting is more complex than power load forecasting. In stark contrast to past markets, current electricity prices exhibit several characteristic features, such as high volatility (very large price differences), strong mean reversion (prices tend to revolve around a long-term equilibrium level), and sudden, unexpected price jumps or spikes that decay quickly (associated with the price elasticity of demand and supply) [3]. All these factors make the prediction of electricity prices very difficult.

Over the past few decades, many electricity price forecasting models have been suggested. In general, these models can be divided into four categories: fundamental models, statistical models, artificial intelligence models and hybrid models [4]. Fundamental models mainly use the relationship between the most basic physical characteristics and the economics of the production and trading of electricity to forecast prices. Vehviläinen and Pyykkönen [5] established a model that predicts monthly price movements in the electricity market by capturing the relationship between 13 weather factors, 4 load factors and 10 supply parameters. However, such models also have great limitations in practical applications: for example, researchers need to understand the economic characteristics of the market, and a large amount of computation time
is required to establish the equations. Statistical models, also called stochastic models, use mathematical methods combined with historical price information or price-related information to predict current prices. This type of method mainly includes the auto-regressive integrated moving average (ARIMA), generalized autoregressive conditional heteroskedasticity (GARCH), wavelet transform (WT), etc. Chen et al. [6] developed a conventional model based on an outlier smooth transition autoregressive GARCH model, and the results show that the proposed extended model significantly improves the representation of the stochastic price process considered. Contreras et al. [7] employed an ARIMA model for the prediction of electricity prices in mainland Spain and the Californian markets. Zhang et al. [8] adopted WT to predict electricity prices in two Australian districts. However, these statistical methods are usually based on linear sequence analysis, so it is often pointed out that their ability to capture nonlinear behavior and rapid changes in the electricity price signal is limited, resulting in poor electricity price forecasting performance.

In recent years, artificial intelligence algorithms have developed considerably and have achieved fruitful results on practical problems. In general, artificial intelligence algorithms can be divided into two groups: conventional machine learning models and deep learning models. Various evolutionary computing techniques (such as genetic algorithms, particle swarm optimization and ant colony algorithms) are the most common conventional algorithms and are usually used in conjunction with other algorithms. Sarada et al. [9] applied a genetic algorithm based neural network (GANN) approach and pointed out that this method offers higher prediction accuracy. Razak et al. [10] introduced a novel hybrid method of LSSVM-GA (least squares support vector machine - genetic algorithm) with multiple-stage optimization for electricity price forecasting. In addition, SVM is also widely used in time series prediction. Li et al. [11] presented an ANN (artificial neural network)-SVM model which is proven to have better performance than the autoregressive moving average (ARMA) and GARCH models. Yan et al. [12] presented a multiple support vector machine model to perform mid-term electricity price forecasting.

As the fastest growing class of algorithms in the last decade, deep learning models include the recurrent neural network (RNN), convolutional neural network (CNN), radial basis function (RBF) network, Elman neural network and extreme learning machine (ELM). Gao et al. [13] applied ANN to electricity price time series for the purpose of improving forecasting accuracy. Peter et al. [14] presented a sequential wavelet-ANN with embedded ANN-PSO (particle swarm optimization) hybrid electricity price forecasting model for the Indian energy exchange. Paras et al. [15] performed accurate and efficient electricity price forecasting in Pennsylvania–New Jersey–Maryland by applying ANN models. As a class of machine learning algorithms based on feed-forward neural networks, the extreme learning machine (ELM) offers high learning efficiency and good generalization ability and is widely used in classification, regression, clustering, feature learning, etc. Rafiei et al. [16] presented a probabilistic forecasting model of hourly electricity prices using ELM. Chai et al. [17] proposed a novel combined model based on ensemble ELM and logistic ensemble model output statistics (EMOS), which was proven to provide superior full-distributional forecasting skill over existing approaches. Sun et al. [18] developed a retail sales forecasting framework based on ELM, whose results show that the method has higher prediction accuracy than other neural network methods. Xu et al. [19] proposed a solar radiation prediction method based on a genetic algorithm for the extreme learning machine, and pointed out that the method has higher prediction accuracy and can adapt to irradiation prediction under sudden weather changes. However, the random weight allocation in ELM reduces the performance of network learning to a certain extent [20].

After extensive experimental research, deep networks have achieved far higher precision than classical machine learning (ML) methods in many areas. An RNN is an artificial neural network in which the nodes are connected in a loop; the internal state of such a network can exhibit dynamic temporal behavior. Unlike a feed-forward neural network, an RNN can use its internal memory to process input sequences of arbitrary length. Ugurlu et al. [21] used RNN models to perform electricity price forecasting. Dev et al. [22] suggested using neural networks and extreme value distributions to model electricity pool prices. Mandal et al. [23] developed an RNN to forecast electricity prices for the PJM (Pennsylvania–New Jersey–Maryland) day-ahead market and observed that the proposed RNN model is robust and efficient. However, the traditional RNN architecture has several shortcomings, such as vanishing gradients, exploding gradients and the inability of earlier time steps to perceive future time steps, which make it difficult for RNNs to learn long-term dynamics. The long short-term memory (LSTM) network proposed by Hochreiter and Schmidhuber addresses these issues: LSTM is able to accept and utilize past time series information through the input, forget and output gates in its structure. Therefore, LSTM solves the problems of vanishing and exploding gradients to some extent and provides a way to further improve prediction accuracy [24]. Peng et al. [25] introduced an effective LSTM with a differential evolution model for electricity price forecasting, and the results indicate that the proposed DE (differential evolution)-LSTM model outperforms existing forecasting models in terms of forecasting accuracy. Kuo et al. [26] combined a CNN and LSTM to perform electricity price forecasting.

In addition, owing to the inherent shortcomings of any single model, hybrid models can combine the advantages of different models to achieve higher accuracy [27]. Chen et al. [24] developed a novel wind speed forecasting method using a nonlinear-learning ensemble of deep learning time series prediction and extremal optimization. In ref. [28], a hybrid prediction method based on the wavelet transform, a neural network and an optimization algorithm is proposed for electricity price forecasting and is shown to perform better than single models. In [29], the authors developed a new hybrid forecasting system for electricity price forecasting by combining the wavelet transform, ARMA and kernel-based extreme learning machine methods, and the results show that the proposed method has higher prediction accuracy, versatility and practicability than other models. To overcome the shortcomings of single-layer decomposition algorithms, Wang et al. [30] suggested a hybrid forecasting approach using a two-layer decomposition technique and a BP neural network optimized by the firefly algorithm, in which variational mode decomposition (VMD) was applied to further decompose the data series into multiple subsequences in the interest of forecast accuracy.

The principal contributions of this paper are as follows.
(1) A novel hybrid series prediction model is proposed for electricity price forecasting. (2) Bayesian optimization is used to optimize parameters for enhancing model performance. (3) EWT is introduced to explore and exploit the electricity price time series. (4) The effectiveness of the proposed model is validated on five case studies. (5) The influence of multi-lagged variables is interpreted.

2. Methodology

2.1. EWT

Conventional data preprocessing methods such as the wavelet transform (WT) and Fourier transform have some shortcomings (for example, the difficulty of selecting a mother wavelet). Empirical mode decomposition (EMD) originated from the Hilbert–Huang transform, which was proposed by
Huang et al. in 1998 [31]. Because complex mode-mixing problems arise in practice, a new data denoising technique, ensemble empirical mode decomposition (EEMD), was developed by Wu and Huang in 2009. On the basis of the EMD method, EEMD further processes the time series by adding a white-noise sequence to the initial data sequence, which alleviates the mode-mixing problem. Nevertheless, adding the white-noise sequence introduces other issues, for instance residual noise and a higher computational cost [32].

Considering that the WT, EMD and EEMD methods all have disadvantages, EWT was proposed by Gilles [33]. As a new adaptive time-frequency analysis method, EWT combines the advantages of empirical mode decomposition and traditional wavelet analysis in signal processing. In the frequency-domain analysis of the signal, the Fourier spectrum is adaptively segmented by detecting the frequency-domain maxima, and the different modes contained in the frequency components of the signal are finally separated.

The main idea of EWT is the segmentation of the Fourier spectrum. Firstly, assume that the Fourier support [0, π] is divided into N contiguous segments, and let ωn denote the boundary between segments (ω0 = 0, ωN = π). As shown in Fig. 1, each band-pass filter is defined on Λn = [ωn−1, ωn], with ⋃n Λn = [0, π], and a transition phase of width 2τn is centered on each ωn. The empirical scaling function \hat{\phi}_n(\omega) and the empirical wavelet function \hat{\psi}_n(\omega) are defined by Eqs. (1) and (2), respectively:

\hat{\phi}_n(\omega) = \begin{cases} 1, & |\omega| \le \omega_n - \tau_n \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\tau_n}(|\omega| - \omega_n + \tau_n)\right)\right], & \omega_n - \tau_n \le |\omega| \le \omega_n + \tau_n \\ 0, & \text{otherwise} \end{cases}    (1)

\hat{\psi}_n(\omega) = \begin{cases} 1, & \omega_n + \tau_n \le |\omega| \le \omega_{n+1} - \tau_{n+1} \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\tau_{n+1}}(|\omega| - \omega_{n+1} + \tau_{n+1})\right)\right], & \omega_{n+1} - \tau_{n+1} \le |\omega| \le \omega_{n+1} + \tau_{n+1} \\ \sin\left[\frac{\pi}{2}\beta\left(\frac{1}{2\tau_n}(|\omega| - \omega_n + \tau_n)\right)\right], & \omega_n - \tau_n \le |\omega| \le \omega_n + \tau_n \\ 0, & \text{otherwise} \end{cases}    (2)

The proportional relationship between τn and ωn is

\tau_n = \gamma \omega_n, \quad 0 < \gamma < 1, \quad \forall n > 0.    (3)

The set \{\phi_1(t), \{\psi_n(t)\}_{n=1}^{N}\} is then a tight frame of L^2(\mathbb{R}), and the EWT can be implemented in the same way as the classical wavelet transform. The detail coefficients are given by the inner products with the empirical wavelets,

W_f^{\varepsilon}(n, t) = \langle f(t), \psi_n(t) \rangle = \int f(\tau)\, \overline{\psi_n(\tau - t)}\, d\tau = F^{-1}\big(\hat{f}(\omega)\, \hat{\psi}_n(\omega)\big),    (4)

and the approximation coefficients by the inner product with the scaling function,

W_f^{\varepsilon}(0, t) = \langle f(t), \phi_1(t) \rangle = \int f(\tau)\, \overline{\phi_1(\tau - t)}\, d\tau = F^{-1}\big(\hat{f}(\omega)\, \hat{\phi}_1(\omega)\big),    (5)

where \hat{f}(\omega) and \hat{\psi}_n(\omega) are the Fourier transforms of f(t) and \psi_n(t). The signal can be reconstructed as

f(t) = W_f^{\varepsilon}(0, t) * \phi_1(t) + \sum_{n=1}^{N} W_f^{\varepsilon}(n, t) * \psi_n(t) = F^{-1}\Big(\hat{W}_f^{\varepsilon}(0, \omega)\, \hat{\phi}_1(\omega) + \sum_{n=1}^{N} \hat{W}_f^{\varepsilon}(n, \omega)\, \hat{\psi}_n(\omega)\Big),    (6)

where \hat{W}_f^{\varepsilon}(0, \omega) and \hat{W}_f^{\varepsilon}(n, \omega) are the Fourier transforms of W_f^{\varepsilon}(0, t) and W_f^{\varepsilon}(n, t), respectively, and * is the convolution operator. After EWT decomposition, a given signal f(t) can be expressed through its empirical modes:

f_0(t) = W_f^{\varepsilon}(0, t) * \phi_1(t), \qquad f_k(t) = W_f^{\varepsilon}(k, t) * \psi_k(t),    (7)

where W_f^{\varepsilon}(k, t) is the component of f at the k-th frequency band and F^{-1}[\cdot] is the inverse Fourier transform. Finally, the original signal can be reconstructed as

f(t) = \sum_{k=0}^{N} f_k(t).    (8)

After the decomposition is completed, the lowest-frequency component (IMF0) is used as the low-frequency component, and the remaining components are summed to form the high-frequency component.

Fig. 1. Partitioning of the Fourier axis.
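The segmentation idea above can be prototyped in a few lines of NumPy. The sketch below is a simplified illustration, not the authors' implementation: it uses hard spectral masks between detected boundaries instead of the Meyer-type transitions of Eqs. (1)-(2), and a naive local-maxima rule to place the boundaries.

```python
import numpy as np

def simple_ewt(signal, n_bands=3):
    """Simplified EWT-style decomposition: segment the Fourier spectrum at
    midpoints between its largest local maxima and band-pass each segment.
    (Hard masks replace the Meyer-type transition of Eqs. (1)-(2).)"""
    spectrum = np.fft.rfft(signal)
    mag = np.abs(spectrum)
    # local maxima of the magnitude spectrum (DC bin excluded)
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]]
    peaks = sorted(sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_bands])
    # boundaries: midpoints between retained maxima, plus the spectrum edges
    bounds = [0] + [(a + b) // 2 for a, b in zip(peaks[:-1], peaks[1:])] + [len(mag)]
    modes = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = np.zeros_like(spectrum)
        mask[lo:hi] = spectrum[lo:hi]          # keep one spectral segment
        modes.append(np.fft.irfft(mask, n=len(signal)))
    return modes                               # modes[0] plays the role of IMF0

# usage, as described at the end of Section 2.1:
# modes = simple_ewt(price_series); low = modes[0]; high = sum(modes[1:])
```

Because the segments partition the whole spectrum, the components sum back to the original signal, mirroring Eq. (8).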
2.2. LSTM

Due to its special loop structure, an RNN is able to map input sequences to output sequences and thereby capture complex temporal dynamics. However, the vanishing-gradient and exploding-gradient problems frequently occur in deep neural network (DNN) models. As a special RNN architecture, LSTM alleviates these two problems by changing the cell structure and adding a memory cell that decides whether information needs to be remembered. Each memory unit contains an input gate, a forget gate and an output gate, which serve as interfaces for information propagation within the network [34]. Therefore, LSTM is better suited than a plain RNN to time series prediction involving events with relatively long intervals and delays. The structure of the LSTM unit is shown in Fig. 2, and the predicted series can be computed as:

i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)    (9)

f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)    (10)

c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)    (11)

o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)    (12)

h_t = o_t \tanh(c_t)    (13)

where i_t, f_t and o_t are respectively the input-gate, forget-gate and output-gate vectors; c_t is the cell activation vector; h_t is the output vector; W_{xc}, W_{xi}, W_{xf}, W_{xo}, W_{hc}, W_{hi}, W_{hf}, W_{ho} are the corresponding weight matrices; b_i, b_c, b_f, b_o are the corresponding bias vectors; and σ is the activation function, for which the sigmoid function is used in this paper.

Fig. 2. LSTM unit (input gate, forget gate, output gate and memory cell).
2.3. BiLSTM

In sequence processing, standard RNN and LSTM ignore future information, whereas BiLSTM can also exploit it. The basic structural idea of BiLSTM is that each training sequence is passed through two LSTM networks, one forward and one backward, and both are connected to the same input layer and output layer. Through this structure, the output layer can access both the past and the future information of every point in the input sequence. Fig. 3 shows a bidirectional recurrent neural network unrolled along time [34]. The update formulas of the added connections are:

\overrightarrow{h}_t = H(W_1 x_t + W_2 \overrightarrow{h}_{t-1} + \overrightarrow{b})    (14)

\overleftarrow{h}_t = H(W_3 x_t + W_5 \overleftarrow{h}_{t+1} + \overleftarrow{b})    (15)

y_t = W_4 \overrightarrow{h}_t + W_6 \overleftarrow{h}_t + b_y    (16)

where \overrightarrow{h}_t, \overleftarrow{h}_t and y_t are respectively the forward-propagation, backward-propagation and output-layer vectors; W_1, ..., W_6 are the corresponding weight matrices; and \overrightarrow{b}, \overleftarrow{b}, b_y are the corresponding bias vectors.

Fig. 3. Expansion structure of a bidirectional recurrent neural network (input layer, forward layer, backward layer, output layer).
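As an illustration only, a BiLSTM forecaster of the kind described above can be assembled with a deep-learning library. The paper does not state which framework was used, so the Keras snippet below is a hypothetical sketch; the layer sizes follow the settings reported later in Section 4.3 (one hidden layer, 80 units, dropout 0.5, learning rate 0.0008).

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

def build_bilstm(n_lags, n_features=1, units=80, lr=0.0008):
    """One-hidden-layer BiLSTM regressor (hypothetical reimplementation)."""
    model = Sequential([
        Bidirectional(LSTM(units), input_shape=(n_lags, n_features)),
        Dropout(0.5),
        Dense(1),                      # one-step-ahead price forecast
    ])
    model.compile(optimizer=Adam(learning_rate=lr), loss="mse")
    return model

# x_train: (samples, n_lags, 1) windows of past prices; y_train: next price
# model = build_bilstm(n_lags=24)
# model.fit(x_train, y_train, epochs=100, batch_size=32, verbose=0)
```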
2.4. Bayesian optimization

Hyperparameters are parameters of the training algorithm itself that are not learned during training. Each model has different hyperparameters, and a good choice of hyperparameters can make an algorithm reach its optimal performance. For example, the learning rate, the number of layers and the number of neurons per layer need to be specified for BiLSTM. Another example is the gradient boosting decision tree (GBDT), for which one has to choose the maximum number of iterations, the loss function, the subsampling rate, etc. EWT also has several hyperparameters that must be specified before training and that are essential to the decomposition performance [35,36]. However, setting parameters manually is inefficient and prone to human bias.

Bayesian optimization (BO) has emerged as a practical tool for parameter selection in prediction systems: it can efficiently search the space of possible hyperparameters and manage a large set of experiments for hyperparameter tuning. Many optimization settings assume that the objective function f(x) has a known mathematical form, is convex, or is cheap to evaluate. These assumptions do not hold when tuning forecasting models, where the function is unknown and expensive to evaluate. This is where Bayesian optimization comes into play [37].

The basic idea of BO is to estimate the posterior distribution of the objective function from the observed data using Bayes' theorem, and then to select the hyperparameter combination of the next sample according to this distribution. It makes full use of the information from the previous sampling points and works by learning the shape of the objective function and finding the parameters that drive the result towards the global maximum. In order to avoid local optima, exploration and exploitation need to be traded off. Exploitation means sampling in the areas where, according to the posterior distribution, the global optimum is most likely to occur; exploration means sampling in areas that have not yet been sampled. To encode this trade-off, an acquisition function is defined, which provides a single measure of how useful it would be to try any given point. In this paper, expected improvement (EI) is used as the acquisition function. EI measures the expected increase over the maximum objective value seen across all experiments, given the next point we pick. Mathematically, the next point x is obtained by Eq. (17):

x = \arg\max_{x} \, \mathbb{E}\big(\max\{0,\, f_{t+1}(x) - f(x^{+})\} \,\big|\, D_t\big)    (17)

where the expectation is taken over Y_{t+1}, which is distributed as the posterior objective value at x_{t+1}. When a Gaussian process is used, EI has the following analytical expression:

EI(x) = \begin{cases} (\mu(x) - f(x^{+}))\,\Phi(Z) + \sigma(x)\,\varphi(Z), & \text{if } \sigma(x) > 0 \\ 0, & \text{if } \sigma(x) = 0 \end{cases}    (18)

where

Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)}    (19)

and \Phi(\cdot) and \varphi(\cdot) denote the standard normal cumulative distribution and density functions, respectively.

Fig. 4. Framework of the proposed model (train dataset → EWT → low- and high-frequency components → SVR and BiLSTM forecasters, with a Gaussian process regressor and acquisition-function loop performing Bayesian optimization → combined electricity price forecast → evaluation of the forecasting results).
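To make Eqs. (17)-(19) concrete, the sketch below implements one BO iteration with a Gaussian-process surrogate and the EI acquisition function using scikit-learn. It is a minimal illustration of the loop in Fig. 4, not the authors' code; the Matern kernel and the candidate-grid approach are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, f_best):
    """EI(x) of Eq. (18) with Z of Eq. (19), for a maximization problem."""
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = np.zeros_like(mu)
    ok = sigma > 0
    z = (mu[ok] - f_best) / sigma[ok]
    ei[ok] = (mu[ok] - f_best) * norm.cdf(z) + sigma[ok] * norm.pdf(z)
    return ei

def propose_next(X_obs, y_obs, candidates):
    """Fit the GP posterior on past trials and pick the argmax of EI (Eq. (17))."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)
    ei = expected_improvement(candidates, gp, y_obs.max())
    return candidates[np.argmax(ei)]

# X_obs: hyperparameter vectors tried so far, y_obs: their validation scores
# next_x = propose_next(X_obs, y_obs, candidate_grid)
```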
2.5. Data cleaning

Data cleaning is a very important part of model construction; it is the process of manually inspecting and comparing the data. In this process, missing values, outliers, duplicate values, etc. are corrected to ensure data consistency. In order to avoid degrading the prediction accuracy through errors in the electricity price records, the outliers and missing values in the data must be handled properly. Based on inspection of the experimental data, this paper fills missing values by estimation (taking the average of the five data points before and after); a negative electricity price is treated as an outlier, replaced as if it were a missing value, and then estimated in the same way.

2.6. Normalization

Data normalization is a common preprocessing step in machine learning. After normalization, the raw data are converted into dimensionless values, which facilitates prediction and comprehensive comparative evaluation. Since the electricity price series covers a wide time range, extreme price values sometimes occur. With the conventional min-max normalization, such extreme prices would strongly distort the normalization result. This paper therefore uses a modified z-score standardization, in which extreme maxima or minima do not significantly affect the result, so that the normalized values better reflect the distribution of the data. The improved z-score standardization is defined as

x^{*} = \frac{x - \mu}{\frac{1}{m}\sum_{i=1}^{m} |x_i - \mu|}    (20)

where μ is the median and m is the number of samples.
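A compact pandas sketch of the cleaning and normalization rules of Sections 2.5-2.6 is given below, under the assumption that the prices are held in a pandas Series. The window of five values before and after follows the text, and the median-based scale follows Eq. (20) as reconstructed above.

```python
import pandas as pd

def clean_prices(prices: pd.Series) -> pd.Series:
    """Treat negative prices as outliers, then fill all gaps with the
    average of the five values before and after (Section 2.5)."""
    p = prices.where(prices >= 0)                 # negatives become NaN
    filler = p.rolling(window=11, center=True, min_periods=1).mean()
    return p.fillna(filler)

def modified_zscore(prices: pd.Series) -> pd.Series:
    """Modified z-score of Eq. (20): centre on the median and scale by the
    mean absolute deviation about the median."""
    mu = prices.median()
    scale = (prices - mu).abs().mean()
    return (prices - mu) / scale
```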
2.7. Feature selection of lagging variables

In a predictive model, the choice of the lag variables greatly affects the predictive performance. Therefore, an indicator is needed to evaluate the relative importance of the input features in time series prediction. Various decision-tree ensembles, such as bagged trees and random forests, can be used to compute importance scores for the input features. In this paper, we use a random forest model with normalized importance scores to assess the relative importance of the input features, and we summarize the relative importance scores of the 168 lagged variables covering one week. A large number of trees is used to ensure the stability of the scores.
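A sketch of this importance scoring with scikit-learn follows. The window length of 168 hourly lags is taken from the text, while the forest size and the data layout are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def lag_importances(series, n_lags=168, n_trees=500):
    """Score the relevance of the last n_lags values for predicting the next
    one, using the impurity-based feature importances of a random forest."""
    x = np.asarray(series, dtype=float)
    X = np.array([x[i:i + n_lags] for i in range(len(x) - n_lags)])
    y = x[n_lags:]
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    rf.fit(X, y)
    # column 0 corresponds to the oldest lag (168 hours back),
    # column n_lags - 1 to the most recent observation
    return rf.feature_importances_
```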
2.8. Performance indices

In order to evaluate the prediction performance, several indicators are adopted. Three widely used statistical criteria are employed to evaluate the electricity price prediction models: the mean absolute error (MAE), the root mean square error (RMSE) and the mean absolute percentage error (MAPE). They are computed as follows:

MAE = \frac{1}{T}\sum_{t=1}^{T} \left| p_t^{\mathrm{true}} - p_t^{\mathrm{forecast}} \right|    (21)

RMSE = \sqrt{\frac{1}{T}\sum_{t=1}^{T} \left( p_t^{\mathrm{true}} - p_t^{\mathrm{forecast}} \right)^2}    (22)

MAPE = \frac{1}{T}\sum_{t=1}^{T} \left| \frac{p_t^{\mathrm{true}} - p_t^{\mathrm{forecast}}}{p_t^{\mathrm{true}}} \right|    (23)

where p_t^{true} is the actual electricity price at time t and p_t^{forecast} is the forecast electricity price at time t. MAE represents the degree of similarity between predicted and actual values; RMSE measures the overall deviation between predicted and actual values; and MAPE considers not only the errors between predicted and real values but also the ratio between the errors and the real values.
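The three criteria of Eqs. (21)-(23) translate directly into NumPy; a minimal sketch:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))              # Eq. (21)

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))      # Eq. (22)

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true))   # Eq. (23)
```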
3. The proposed forecasting model
In this paper, a novel forecasting model based on EWT, BiLSTM, SVR and Bayesian optimization is proposed to achieve better prediction performance. Owing to its strong ability to decompose the original signal into specific modal components according to the characteristics of the signal itself, EWT is used as the signal-processing tool. BiLSTM shows excellent performance in forecasting tasks and is used as the basic framework. Considering the long training time of BiLSTM, SVR is also used as a second forecasting model because of its short training time and its potential forecasting ability. Since manually setting the training parameters has a great impact on prediction performance, a powerful parameter optimization method, Bayesian optimization, is applied to efficiently search the space of possible hyperparameters and to manage a large set of experiments for hyperparameter tuning. The architecture of the proposed model is depicted in Fig. 4. As shown in Fig. 4, the main steps of the proposed model are as follows:

(1) Data analysis and processing. Before the model is built, data analysis and processing are carried out, including data cleaning, normalization and feature selection. The details are presented in Sections 2.5–2.8.
(2) Owing to its strong ability to decompose the original signal into specific modal components according to the characteristics of the signal itself, EWT is used as the signal-processing tool. The original signal is decomposed by EWT into a high-frequency component and a low-frequency component. The details of EWT are provided in Section 2.1.
(3) SVR and BiLSTM are used as the basic forecasting framework and are adopted to forecast the sub-layers of different frequencies in order to enhance the forecasting performance. The details of the BiLSTM network are described in Section 2.3.
(4) A practical and stable algorithm, Bayesian optimization, is applied to efficiently search the space of possible hyperparameters and to manage a large set of experiments for hyperparameter tuning, so as to optimize the performance of the BiLSTM network and of SVR. The details are presented in Section 2.4.

A minimal end-to-end sketch of these steps is given after this list.
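The following outline ties the steps together in code. It is an illustrative sketch of the pipeline in Fig. 4, not the authors' implementation: the helper names simple_ewt, propose_next and the window builder refer to the hypothetical sketches given in Section 2, and the assignment of the low-frequency part to SVR and the high-frequency part to BiLSTM is an assumption, since the text does not state which learner handles which component.

```python
import numpy as np

def forecast_pipeline(train, test_len, make_windows, svr_model, bilstm_model):
    """Illustrative outline of the hybrid model in Fig. 4 (not the authors' code).
    Assumption: SVR handles the low-frequency part, BiLSTM the high-frequency part.
    Hyperparameters of svr_model / bilstm_model would be selected beforehand
    with the Bayesian-optimization helper propose_next (Section 2.4)."""
    modes = simple_ewt(np.asarray(train, dtype=float))     # step (2): EWT decomposition
    low, high = modes[0], np.sum(modes[1:], axis=0)

    X_lo, y_lo = make_windows(low)                         # step (3): per-component learners
    X_hi, y_hi = make_windows(high)
    svr_model.fit(X_lo, y_lo)
    bilstm_model.fit(X_hi[..., None], y_hi, epochs=100, verbose=0)

    pred_lo = svr_model.predict(X_lo[-test_len:])          # simplified in-sample style demo
    pred_hi = bilstm_model.predict(X_hi[-test_len:][..., None]).ravel()
    return pred_lo + pred_hi                               # combine the component forecasts
```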
4. Case study

This section presents the forecasting performance of the proposed model. We first describe the electricity price data in Section 4.1. Then the sensitivity of the proposed forecasting model is validated in case studies 1 and 2, the effects of the decomposition method and of the model choice on the prediction results are demonstrated through two further experiments (case studies 3 and 4), and the influence of the relative validity of the input features on prediction performance is presented in case study 5. It should be mentioned that all experiments were carried out in a Python 3.5 environment on a 3.4 GHz PC with an Intel i5-7500 processor, 8 GB RAM and a GTX 1050 Ti GPU.

4.1. Data description

In this paper, the proposed forecasting model was applied to the electricity price data collected from the major European energy exchange, EPEXSPOT, which publishes hourly Day-Ahead prices for the German bidding zone starting from July 20, 2007. All prices are given in €/MWh. The electricity price data sampled once per hour from September 01, 2014 to September 30, 2014 (720 data points in total) were used as the dataset for electricity price forecasting; they are shown in Fig. 5. Table 1 shows the major statistical characteristics of the price data. It can be observed that half of the data lie between the 25% and 75% quantiles of about 28 €/MWh and 41.5 €/MWh, and the median price is 34.35 €/MWh.

Fig. 5. German bidding zone electricity price data for September 2014.

Table 1. The major statistical characteristics of the Day-Ahead electricity price data in €/MWh.
Count   Mean    St.d   Min    25%     50%     75%     Max
720     34.79   9.66   9.85   27.95   34.35   41.48   69.3

The common temporal patterns in Day-Ahead prices mentioned before are depicted in Fig. 6. Each diagram shows the average value of all data points satisfying a certain filter criterion. Two major temporal patterns are present in the data. The first is a weekly trend, with higher prices on working days than on weekends. Furthermore, the course of electricity prices shows a clear daily pattern, with low prices at night, peaks in the morning and evening hours and a drop during working hours.

Fig. 6. Weekly and hourly average day-ahead prices in September 2014.

In case studies 2–5, 70% of the data (504 points) are used as the training set and 30% (216 points) as the test set; the parameter settings are described in Section 4.3. The electricity price data sampled every ten minutes were used as the dataset to perform thirty-minute-ahead ultra-short-term electricity price prediction, i.e., the number of predicted steps is 3. Furthermore, in case studies 1–4 the number of input variables is only 1, namely the actual electricity price.

4.2. Case study 1: influence of dataset divisions on model performance

In this section, we discuss the effect of dataset segmentation through three different experiments. The forecasting results and the statistical characteristics of the datasets are presented in Tables 2–4 and Fig. 7. In addition, the proposed model, EWT-BiLSTM-SVR, is used in all experiments of this section. The results indicate that the different divisions of the dataset have almost no effect on the performance of the prediction model.

Table 2. Forecasting results with 60% as the train set and 40% as the test set.
Dataset          Mean    St.d    Min     Median   Max
Entire dataset   34.79   9.66    9.85    34.35    69.3
Train dataset    34.13   9.07    13.52   34.9     54.12
Test dataset     35.76   10.39   9.85    33.94    69.3
Forecasting errors: MAE = 0.1209, RMSE = 0.2177, MAPE = 0.3353

Table 3. Forecasting results with 70% as the train set and 30% as the test set.
Dataset          Mean    St.d    Min     Median   Max
Entire dataset   34.79   9.66    9.85    34.35    69.3
Train dataset    34.37   9.05    13.52   34.9     54.12
Test dataset     35.77   10.89   9.85    33.17    69.3
Forecasting errors: MAE = 0.1116, RMSE = 0.1862, MAPE = 0.2976

Table 4. Forecasting results with 80% as the train set and 20% as the test set.
Dataset          Mean    St.d    Min     Median   Max
Entire dataset   34.79   9.66    9.85    34.35    69.3
Train dataset    34.50   9.38    9.85    34.7     69.3
Test dataset     35.93   10.64   17.41   33.17    63.24
Forecasting errors: MAE = 0.1148, RMSE = 0.1872, MAPE = 0.3075

Fig. 7. Forecasting results with different divisions of datasets.
4.3. Case study 2: influence of parameter settings on model performance

This section discusses the effect of different parameter selections on the experimental results. As in the previous section, the EWT-BiLSTM-SVR model is used in all experiments of this case study. As adjustment parameters, the number of neurons (NN) in BiLSTM, the learning rate (LR) in BiLSTM, and C and σ in SVR are selected to test the forecasting accuracy; all other parameters are kept identical. The forecasting results are presented in Tables 5–8 and Fig. 8.

Table 5. Forecasting results with different NN values (other parameters: LR = 0.0008, C = 50000, σ = 0.00009).
NN    MAE      RMSE     MAPE
20    0.1655   0.2388   0.4653
40    0.1378   0.2234   0.3667
60    0.1416   0.2184   0.3689
80    0.1116   0.1862   0.2976
100   0.1191   0.1883   0.3147

Table 6. Forecasting results with different LR values (other parameters: NN = 80, C = 50000, σ = 0.00009).
LR       MAE      RMSE     MAPE
0.0002   0.1302   0.1890   0.3768
0.0004   0.1215   0.2087   0.3229
0.0006   0.1252   0.2035   0.3348
0.0008   0.1116   0.1862   0.2976
0.0010   0.1145   0.1897   0.3020

Table 7. Forecasting results with different C values (other parameters: NN = 80, LR = 0.0008, σ = 0.00009).
C        MAE      RMSE     MAPE
100      0.2452   0.3493   0.6987
1000     0.2596   0.3342   0.7115
5000     0.1610   0.1926   0.5035
10,000   0.1653   0.2175   0.4477
50,000   0.1116   0.1862   0.2976

Table 8. Forecasting results with different σ values (other parameters: NN = 80, LR = 0.0008, C = 50000).
σ         MAE      RMSE     MAPE
0.00005   0.1454   0.1922   0.4226
0.00009   0.1116   0.1862   0.2976
0.00015   0.1259   0.2204   0.3221
0.00030   0.1590   0.2930   0.4069
0.00050   0.1772   0.3726   0.4474

Fig. 8. Forecasting results of different cases.

From the tables and figure above, it can be seen that different parameter selections have a large effect on forecasting accuracy. In the BiLSTM neural network, the number of neurons and the learning rate play a decisive role in the performance of the model. If the number of neurons in the hidden layer is too small, the network will not be able to learn well, whereas an extremely large number of neurons may reduce computational efficiency; the same holds for the learning rate. The nonlinear fitting ability of SVR is determined by the parameter C, and the value of σ affects the shape of the kernel function as well as the sensitivity of the model to noise.

In addition to the parameters mentioned above, the other parameters are determined by Bayesian optimization and experimental testing. In BiLSTM, the number of hidden layers is set to 1 and the dropout rate to 0.5. For the EWT method, Gaussian regularization is used with a filter length of 10 and a filter standard deviation of 1.5; the detection mode is 'scalespace', the detection type is 'otsu', the maximum bandwidth is 3, and the initialization limits are [2, 25]. For the GBDT model, n_estimators, learning_rate, max_depth and min_samples_split are set to 600, 0.01, 3 and 100, respectively. For the KNN model, the number of neighbors and the leaf size are set to 12 and 30, respectively. The ELM has 1 hidden layer with 100 neurons.
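For reference, the four parameters varied in Tables 5-8 could be encoded as candidate vectors for the BO routine sketched in Section 2.4. The grid below simply spans the values tested in this case study and is an illustration, not the search space actually used by the authors.

```python
import itertools
import numpy as np

# candidate grid spanning the values examined in Tables 5-8
nn_values    = [20, 40, 60, 80, 100]                            # BiLSTM neurons
lr_values    = [0.0002, 0.0004, 0.0006, 0.0008, 0.0010]         # BiLSTM learning rate
c_values     = [100, 1000, 5000, 10000, 50000]                  # SVR penalty C
sigma_values = [0.00005, 0.00009, 0.00015, 0.00030, 0.00050]    # SVR kernel width

candidate_grid = np.array(list(itertools.product(nn_values, lr_values,
                                                 c_values, sigma_values)))
# next_x = propose_next(X_obs, y_obs, candidate_grid)   # see the Section 2.4 sketch
```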
4.4. Case study 3: influence of the time series decomposition method on model performance

In order to assess the impact of the time series decomposition method on model performance, the EWT-BiLSTM-SVR, EEMD-BiLSTM-SVR, EMD-BiLSTM-SVR and BiLSTM models are used as comparative experiments. The forecasting results are listed in Table 9, and the performance comparison is illustrated in Fig. 9. Furthermore, Figs. 10 and 11 display the forecasting results and the prediction residual errors of each model, respectively.

Table 9. Performance indices of forecasting results obtained by different models on case study 3.
Model             MAE      RMSE     MAPE (%)
BiLSTM            0.4742   0.6065   1.3737
EMD-BiLSTM-SVR    0.3368   0.5975   0.9742
EEMD-BiLSTM-SVR   0.2528   0.4002   0.7067
EWT-BiLSTM-SVR    0.1116   0.1862   0.2976

Fig. 9. Forecasting results obtained by different models on case study 3.
Fig. 10. The electricity price forecasting results obtained by different models on case study 3: (a) BiLSTM; (b) EMD-BiLSTM-SVR; (c) EEMD-BiLSTM-SVR; (d) EWT-BiLSTM-SVR.
Fig. 11. The electricity price forecasting errors obtained by different models on case study 3: (a) BiLSTM; (b) EMD-BiLSTM-SVR; (c) EEMD-BiLSTM-SVR; (d) EWT-BiLSTM-SVR.

Table 9 and Figs. 9–11 provide several important points of comparison between the four models:

(1) The proposed EWT-BiLSTM-SVR model performs better than the three other widely used forecasting methods, with the minimum values of MAE (0.1116), RMSE (0.1862) and MAPE (0.2976%). EEMD-BiLSTM-SVR also performs well, with MAE, RMSE and MAPE of 0.2528, 0.4002 and 0.7067%, respectively. EMD-BiLSTM-SVR in turn outperforms BiLSTM, with MAE, RMSE and MAPE of 0.3368, 0.5975 and 0.9742%, while the worst model, BiLSTM, yields an MAE of 0.4742, RMSE of 0.6065 and MAPE of 1.3737%.
(2) Compared with the other three methods, the plain BiLSTM model shows a large performance gap. More specifically, with the addition of EMD and SVR, MAE improves on average by 28.98%, RMSE by 1.48% and MAPE by 29.08%; with EEMD, MAE improves by 46.69%, RMSE by 34.01% and MAPE by 48.55%; and the addition of EWT and SVR improves MAE by 76.47%, RMSE by 69.30% and MAPE by 78.34%. This proves that a reasonable decomposition and reconstruction method can stabilize the non-stationary signal and fully mine the characteristics and attributes of the original data, thus effectively improving the prediction performance of the model.
(3) Compared with the EMD-based model, the EEMD-based model is much more accurate. This is mainly because EEMD can remove noise and obtain a more durable and stable decomposition over many experiments.
(4) Further comparison of the EWT-based and EMD-based models shows that, on this dataset, the EWT algorithm yields better predictions than the EMD algorithm. This is mainly because EWT overcomes the boundary effect and mode mixing of EMD to a certain extent and can adaptively represent the processed signal.

4.5. Case study 4: comparison of single models and the hybrid model

This case study examines the merits and drawbacks of the EWT-BiLSTM-SVR model compared with the KNN, SVR, GBDT, ELM and BiLSTM models. The forecasting results of the different models are listed in Table 10, and Fig. 12 visualizes the comparison of forecasting performance. The forecasting results and prediction residual errors of each model are shown in Figs. 13 and 14, respectively. Table 11 presents the training time of the different models on case study 4.

Table 10. Performance indices of forecasting results obtained by different models on case study 4.
Model             MAE      RMSE     MAPE (%)
KNN               0.9326   1.6593   2.7724
SVR               0.8290   1.3720   2.4925
GBDT              0.8102   1.3352   2.4347
ELM               0.7704   1.2810   2.3184
BiLSTM            0.4742   0.6065   1.3737
EWT-BiLSTM-SVR    0.1116   0.1862   0.2976

Table 11. Training time of different models on case study 4.
Model             Training time (s)
KNN               0.0416
SVR               0.0587
GBDT              0.2617
ELM               0.0270
BiLSTM            8.8893
EWT-BiLSTM-SVR    9.5026

Fig. 12. Forecasting results obtained by different models on case study 4.
Fig. 13. The electricity price forecasting results obtained by different models on case study 4: (a) KNN; (b) SVR; (c) GBDT; (d) ELM; (e) BiLSTM; (f) EWT-BiLSTM-SVR.
Fig. 14. The electricity price forecasting errors obtained by different models on case study 4: (a) KNN; (b) SVR; (c) GBDT; (d) ELM; (e) BiLSTM; (f) EWT-BiLSTM-SVR.

From Tables 10 and 11 and Figs. 12–14, the following observations can be made:

(1) The proposed EWT-BiLSTM-SVR performs better than the compared, widely used forecasting approaches, with the minimum values of MAE (0.1116), RMSE (0.1862) and MAPE (0.2976%). The best of the compared prediction models is BiLSTM, with an MAE of 0.4742, RMSE of 0.6065 and MAPE of 1.3737%, while the worst is KNN, with an MAE of 0.9326, RMSE of 1.6593 and MAPE of 2.7724%.
(2) According to Figs. 12 and 13, the EWT-BiLSTM-SVR model fits the real electricity price curve best and has the smallest prediction errors compared with KNN, SVR, GBDT, ELM and BiLSTM. Therefore, a suitable hybrid model can effectively improve the prediction accuracy.
(3) From Table 11, it is clear that, although BiLSTM improves the prediction accuracy, it also greatly increases the training time. ELM has the shortest training time, only 0.0270 s, and KNN, SVR and GBDT also train very fast.
(4) To summarize, in practical applications both the accuracy of a model and its training time need to be considered when deciding whether to use a hybrid model or a single model.
4.6. Case study 5: influence of the relative validity of input features on prediction performance

As discussed in Section 2.7, the choice of the lag variables greatly affects the predictive performance, and a random forest model is used to assess the relative importance of the input features. Firstly, we compute the relative importance scores of the lagged variables within one week (Fig. 15). The scores are largest for the 168th, 167th and 21st lags and smallest for the 82nd, 83rd and 89th lags. Finally, we designed four situations with different input features to verify the influence of the relative validity of the input features on prediction performance. The input features of the four situations are as follows:

Situation 1: without lag variables, only the current data.
Situation 2: current data and the lag variable at 168.
Situation 3: current data and the lag variable at 82.
Situation 4: current data and the lag variables at 168 and 82.

The forecasting results are listed in Table 12, and the performance comparison is graphically displayed in Fig. 16. Furthermore, Figs. 17 and 18 display the forecasting results and prediction residual errors of each situation, respectively. It should be pointed out that the first 128 data points are deleted because of the absence of lag values, and all models used in these experiments are the proposed EWT-BiLSTM-SVR with a 70% training set.

Fig. 15. The results of feature selection (relative importance scores of the 168 lagged variables).

Table 12. Performance indices of forecasting results obtained by different input features on case study 5.
Situation     MAE      RMSE     MAPE (%)
Situation 1   0.1263   0.2113   0.3566
Situation 2   0.1093   0.2005   0.2953
Situation 3   0.2894   0.4110   0.8051
Situation 4   0.1983   0.3316   0.5678

Fig. 16. Forecasting results obtained by different input features on case study 5.
Fig. 17. The electricity price forecasting results obtained by different models on case study 5: (a) Situation 1; (b) Situation 2; (c) Situation 3; (d) Situation 4.
Fig. 18. The electricity price forecasting errors obtained by different models on case study 5: (a) Situation 1; (b) Situation 2; (c) Situation 3; (d) Situation 4.

According to this experiment, the main observations can be summarized as follows:

(1) From Table 12 and Fig. 16, the model using the current data and the lag variable at 168 as input outperforms the other three competitors, with the smallest MAE (0.1093), RMSE (0.2005) and MAPE (0.2953%), while the worst is Situation 3, with an MAE of 0.2894, RMSE of 0.4110 and MAPE of 0.8051%.
(2) Comparing the different situations shows that adding different correlation factors to the input sequence affects the experimental results differently. For example, the results of Situations 1 and 2 illustrate that adding highly correlated input variables improves the prediction accuracy, because the model can find and exploit more information and relationships between the prediction sequences. Conversely, when variables with extremely low correlation are added, the model cannot make full use of the information in these sequences; the extra dimensions only introduce noise, which is why the prediction accuracy of Situation 3 is the lowest.
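The four input configurations above can be expressed with a small feature-construction helper. The pandas sketch below is hypothetical, with `prices` assumed to be the hourly price series.

```python
import pandas as pd

def build_inputs(prices: pd.Series, lags=()):
    """Assemble the input matrix for one of the four situations:
    the current price plus the requested lagged prices (e.g. lags=(168,))."""
    data = {"current": prices}
    for k in lags:
        data["lag_{}".format(k)] = prices.shift(k)
    X = pd.DataFrame(data).dropna()          # rows without a full lag history are dropped
    y = prices.shift(-1).reindex(X.index)    # one-step-ahead target
    return X[:-1], y[:-1]

# Situation 2: X, y = build_inputs(prices, lags=(168,))
# Situation 4: X, y = build_inputs(prices, lags=(168, 82))
```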
5. Conclusion and future work

Over the past few decades, many electricity price forecasting models have been developed. This paper addresses the short-term electricity price forecasting problem and proposes a novel method based on EWT, BiLSTM, SVR and BO. In the proposed model, EWT is used as a processing tool to decompose the original signal into specific modal components according to the characteristics of the signal itself. Then, considering the complexity of forecasting the nonlinear subseries, SVR and BiLSTM are used as the basic framework to forecast them, while BO is introduced to tune the parameters and optimize model performance. Finally, the prediction results of the different models are combined into the final forecast. Based on the experimental results, it can be concluded that: (1) the division of the dataset has little effect on prediction performance, whereas the parameter settings have a great influence; (2) a reasonable decomposition and reconstruction method can stabilize the non-stationary signal and fully exploit the characteristics and attributes of the original data, which effectively improves the prediction performance; (3) compared with single KNN, SVR, GBDT, ELM and BiLSTM models, the proposed model achieves better forecasting performance in terms of the different performance indices (MAE, RMSE, MAPE); and (4) adding different correlation factors to the input sequence affects the results differently, with highly correlated input variables improving the prediction accuracy and weakly correlated input variables reducing it.

As future work, multi-parameter prediction can be developed to further enhance forecasting accuracy. In addition, only one-step-ahead electricity price forecasting has been investigated in this paper; multi-step-ahead electricity price forecasting is another important research direction.
Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (No. 61573095).

References

[1] Fan S, Mao C, Chen L. Next-day electricity-price forecasting using a hybrid network. IET Gener Transm Distrib 2007;1(1):176–82.
[2] Niu D, Liu D, Wu D. A soft computing system for day-ahead electricity price forecasting. Appl Soft Comput 2010;10(3):868–75.
[3] Vahidinasab V, Jadid S, Kazemi A. Day-ahead price forecasting in restructured power systems using artificial neural networks. Electr Power Syst Res 2008;78(8):1332–42.
[4] Weron R. Electricity price forecasting: a review of the state-of-the-art with a look into the future. Int J Forecast 2014;30(4):1030–81.
[5] Vehviläinen I, Pyykkönen T. Stochastic factor model for electricity spot price: the case of the Nordic market. Energy Econ 2005;27(2):351–67.
[6] Chen H, Li F, Wang Y. Wind power forecasting based on outlier smooth transition autoregressive GARCH model. J Mod Power Syst Clean Energy 2018;6(3):532–9.
[7] Contreras J, Espinola R, Nogales F, Conejo A. ARIMA models to predict next-day electricity prices. IEEE Trans Power Syst 2003;18(3):1014–20.
[8] Zhang Y, Li C, Li L. Wavelet transform and kernel-based extreme learning machine for electricity price forecasting. Energy Syst 2018;9(1):113–34.
[9] Sarada K, Bapiraju V. Comparison of day-ahead price forecasting in energy market using neural network and genetic algorithm. Guntur: Smart Electric Grid (ISEG); 2014.
[10] Razak I, Abidin I, Yap K, Abidin A, Rahman T, Nasir M. A novel hybrid method of LSSVM-GA with multiple stage optimization for electricity price forecasting. 2016 IEEE International Conference on Power and Energy (PECon), Malaysia; 2016. p. 390–5.
[11] Li G, Liu C, Chris M, Jacques L. Day-ahead electricity price forecasting in a grid environment. IEEE Trans Power Syst 2007;22(1):266–74.
[12] Yan X, Nurul A. Mid-term electricity market clearing price forecasting: a multiple SVM approach. Int J Electr Power Energy Syst 2014;58:206–14.
[13] Gao G, Lo K, Fan F. Comparison of ARIMA and ANN models used in electricity price forecasting for power market. The 9th Asia-Pacific Power and Energy Engineering Conference, China; 2017. p. 121–7.
[14] Peter S, Raglend I. Sequential wavelet-ANN with embedded ANN-PSO hybrid electricity price forecasting model for Indian energy exchange. Neural Comput Appl 2017;28(8):2277–92.
[15] Paras M, Anurag K, Park J. An effort to optimize similar days parameters for ANN-based electricity price forecasting. IEEE Trans Ind Appl 2009;45(5):1888–96.
[16] Rafiei M, Niknam T, Khooban M. Probabilistic forecasting of hourly electricity price by generalization of ELM for usage in improved wavelet neural network. IEEE Trans Ind Inf 2017;13(1):71–9.
[17] Chai S, Xu Z, Jia Y. Conditional density forecast of electricity price based on ensemble ELM and logistic EMOS. IEEE Trans Smart Grid 2018:1–17.
[18] Sun Z, Choi T, Au K, Yu Y. Sales forecasting using extreme learning machine with applications in fashion retailing. Decis Support Syst 2008;46(1):411–9.
[19] Xu J, Huang N, Wang W, Qi J, Xu S, Yu Z. Hourly solar radiation forecasting based on GA-ELM neural network. Power Syst Clean Energy 2016;32(8):105–9 (in Chinese).
[20] Chen X, Dong Z, Meng K, Ku Y, Wong K, Ngan H. Electricity price forecasting with extreme learning machine and bootstrapping. IEEE Trans Power Syst 2012;27(4):2055–62.
[21] Ugurlu U, Oksuz I, Tas O. Electricity price forecasting using recurrent neural networks. Energies 2018;11(5):1–23.
[22] Dev P, Martin M. Using neural networks and extreme value distributions to model electricity pool prices: evidence from the Australian National Electricity Market 1998–2013. Energy Convers Manage 2014;84:122–32.
[23] Mandal P, Srivastava A, Senjyu T, Negnevitsky M. A new recursive neural network algorithm to forecast electricity price for PJM day-ahead market. Int J Energy Res 2010;34(6):507–22.
[24] Chen J, Zeng G, Zhou W, Du W, Lu K. Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization. Energy Convers Manage 2018;165:681–95.
[25] Peng L, Liu S, Liu R, Wang L. Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 2018;162:1301–14.
[26] Kuo P, Huang C. An electricity price forecasting model by hybrid structured deep neural networks. Sustainability 2018;10(4):1280–96.
[27] Hu Y, Chen L. A nonlinear hybrid wind speed forecasting model using LSTM network, hysteretic ELM and differential evolution algorithm. Energy Convers Manage 2018;173(4):123–42.
[28] Gollou A, Ghadimi N. A new feature selection and hybrid forecast engine for day-ahead price forecasting of electricity markets. J Intell Fuzzy Syst 2017;32(6):4031–45.
[29] Zhang Y, Li C, Li L. Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods. Appl Energy 2017;190:291–305.
[30] Wang D, Luo H, Grunder O, Lin Y, Guo H. Multi-step ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm. Appl Energy 2017;190:390–407.
[31] Yu L, Wang S, Lai K. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Econ 2008;30(5):2623–35.
[32] Wu Z, Huang N. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adapt Data Anal 2009;1(1):1–41.
[33] Gilles J. Empirical wavelet transform. IEEE Trans Signal Process 2013;61:3999–4010.
[34] Chen T, Xu R, He Y, Wang X. Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Syst Appl 2017;72:221–30.
[35] Lu K, Zhou W, Zeng G, Du W. Design of PID controller based on a self-adaptive state-space predictive functional control using extremal optimization method. J Franklin Inst 2018;355(5):2197–220.
[36] Lu K, Zhou W, Zeng G, Zheng Y. Constrained population extremal optimization based robust load frequency control of multi-area interconnected power system. Int J Electr Power Energy Syst 2019;105:249–71.
[37] Wang C, Liu S, Zhu M. Bayesian network learning algorithm based on unconstrained optimization and ant colony optimization. Syst Eng Electron 2012;5:784–90.