Energy 187 (2019) 115804
Energy journal homepage: www.elsevier.com/locate/energy
Electricity price prediction based on hybrid model of Adam optimized LSTM neural network and wavelet transform
Zihan Chang a, Yang Zhang b, Wenbo Chen a,*
a School of Information Science & Engineering, Lanzhou University, Lanzhou, China
b IT Service, Lanzhou University, Lanzhou, China
Article info
Article history: Received 21 February 2019; Received in revised form 19 June 2019; Accepted 21 July 2019; Available online 22 July 2019
Abstract
To a large extent, electricity price prediction is a daunting task because it depends on factors such as weather, fuel, load and bidding strategies. These factors cause large fluctuations in electricity prices. As a type of RNN, LSTM performs well on time series data as well as on some nonlinear and complex problems. To explore a more accurate electricity price forecasting approach, this paper proposes a new hybrid model based on the wavelet transform and an Adam optimized LSTM neural network, denoted as WT-Adam-LSTM. After the wavelet transform, the nonlinear electricity price sequence is decomposed and the processed data have a more stable variance, while the combination of Adam, an efficient stochastic gradient-based optimizer, and LSTM can capture the behavior of electricity prices precisely. This study presents four cases to verify the performance of the hybrid model, using datasets from New South Wales, Australia, and France. The results show that the proposed model can significantly improve the prediction accuracy. © 2019 Published by Elsevier Ltd.
Keywords: Component; Electricity price prediction; Wavelet transform; Long short-term memory; Adam
* Corresponding author.
E-mail addresses: [email protected] (Z. Chang), [email protected] (Y. Zhang), [email protected] (W. Chen).
https://doi.org/10.1016/j.energy.2019.07.134
0360-5442/© 2019 Published by Elsevier Ltd.
1. Introduction
Electricity price forecasting is crucial for decision makers in arranging bidding strategies and developing appropriate products, and it has been one of the main efforts of energy market researchers and practitioners. However, electricity prices are volatile and can be affected by multiple factors, such as holidays, temperature, festivals and season. Moreover, due to fluctuations, unbalanced power resources may lead to increased power loads and higher electricity prices [1,2]. In recent years, many researchers have shown that power-related prediction is a daunting task because power demand depends to a large extent on many factors [3–5]. Weron [6] pointed out that electricity price forecasting methods can be roughly classified into five categories: statistical approaches, computational intelligence techniques, multi-agent models, fundamental methods and reduced-form models, among which statistical methods and computational intelligence techniques are most commonly used.
Previous statistical methods used in electricity price forecasting
include exponential smoothing methods and AR-type time series models such as the well-known AutoRegressive Moving Average (ARMA), AutoRegressive Integrated Moving Average (ARIMA), and Generalized AutoRegressive Conditional Heteroskedastic (GARCH) models. Tan et al. [7] proposed a method of electricity price prediction using WT combined with ARIMA and GARCH, and verified it on the PJM and Spanish power markets. Dong et al. [8] proposed a hybrid model based on Empirical Mode Decomposition (EMD), Seasonal Adjustment (SA) and ARIMA, and tested it on an electricity price dataset from New South Wales, Australia. However, many statistical methods focus on combining historical data with current data through a mathematical combination, and are limited in handling complex or nonlinear time series problems.
Computational intelligence methods have attracted wide attention for their outstanding performance in handling complex and nonlinear problems. In electricity price forecasting, traditional machine learning has achieved many impressive results, with extreme learning machine (ELM), support vector machine (SVM) and artificial neural network (ANN) being widely used. The authors of [9] used an ANN and a hybrid model of ANN combined with a clustering algorithm for day-ahead price forecasting. ELM coupled with a wavelet technique was used by
Shrivastava et al. [10] to improve forecasting accuracy and reliability on datasets from the Ontario, PJM, New York and Italian electricity markets. A mid-term electricity market clearing price forecasting model based on multiple support vector machines (SVM) was proposed by Yan et al. [11] and tested on PJM interconnection data.
With the rapid development of artificial intelligence, deep learning has attracted considerable attention for its outstanding performance in language modeling [12], speech recognition [13], and natural language inference [14]. Compared to traditional machine learning methods, deep neural networks can analyze deep and complex nonlinear relationships through hierarchical and distributed feature representations. The recurrent neural network (RNN) is a powerful deep learning method for processing sequential data such as sound [15] and stock market data [16]. RNNs differ from traditional feedforward networks in that their neural connections do not run in a single direction only; in other words, neurons can pass data to a previous layer or to the same layer. However, because of the difficulty of learning “long-term dependencies”, traditional RNNs encounter obstacles in time series analysis. Long Short Term Memory networks, usually just called “LSTM”, are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in many areas. They work tremendously well on a large variety of problems, and are now widely used in sequence analysis, speech recognition and natural language processing. The authors of [17] forecasted the volatility of a stock price index with a hybrid model of LSTM and multiple GARCH models. In Ref. [18], Liu et al. predicted wind speed with a hybrid model of the empirical wavelet transform (EWT), an Elman neural network and LSTM.
Long Short Term Memory networks are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn. Actual electricity price data are nonlinear and non-stationary. Peng et al. [19] adopted the differential evolution (DE) algorithm to identify parameters for LSTM when predicting electricity prices, and experimentally verified that LSTM networks have a clear accuracy advantage in electricity price forecasting over some statistical approaches and traditional computational intelligence methods, mainly because they can better capture features and deal with the irregularities in electricity prices.
Stochastic gradient-based methods are of core practical importance in deep learning optimization, which plays a key role in prediction accuracy. Adam is an efficient stochastic optimization method that only requires first-order gradients and has little memory requirement [20]. It combines the advantages of two popular methods: AdaGrad [21], which works well with sparse gradients, and RMSProp [22], which performs well in on-line and non-stationary settings. Its easy implementation, small memory requirements and suitability for non-stationary objectives make Adam an effective and efficient algorithm that has been used successfully in previous studies [23–25]. It optimizes a deep learning model by finding a set of parameters that minimizes the objective function. Therefore, the Adam optimized LSTM neural network is considered a powerful tool for predicting electricity prices.
The wavelet transform, denoted as WT, is a data processing method that provides very useful decomposition information in the time domain and frequency domain, making it suitable for analyzing non-stationary signals such as time series.
It is often combined with other models to form hybrid models with improved performance, as reflected in many previous studies [26–28]. Yang et al. [29] presented a hybrid model combining the wavelet transform, ARMA and kernel-based extreme learning machine methods for electricity price forecasting on datasets from the PJM, Australian and Spanish markets. The authors of [30] suggested a hybrid method that combined the WT, the Gravitational Search Algorithm and LSSVM, and evaluated it using electricity price data from the Iranian, Ontario and Spanish markets. However, WT coupled with an Adam optimized LSTM has not yet been reported for electricity price forecasting. The combination of the wavelet transform and an Adam optimized LSTM neural network for time series analysis is a novel and promising idea. Therefore, in this paper, a hybrid model based on the wavelet transform and an Adam optimized LSTM neural network, denoted as WT-Adam-LSTM, is proposed for electricity price forecasting, and data from New South Wales, Australia, and France are adopted to verify the performance of WT-Adam-LSTM. To the best of the authors' knowledge, this is the first application of the proposed hybrid model to electricity price prediction. To illustrate the performance of the proposed model, the accuracy of its electricity price forecasts is calculated and compared with that of existing traditional models and hybrid models, including some combined models reported in the literature. In addition, different optimizers are compared in order to find the one that best fits the LSTM.
The contributions of this paper are as follows:
(a) This study is the first to combine WT and Adam optimized LSTM for electricity price forecasting.
(b) The Pauta criterion and min-max normalization are used as preprocessing methods for price data.
(c) The wavelet transform is used to decompose the electricity price series into a set of better-behaved constitutive series.
(d) Four cases are presented to illustrate that WT-Adam-LSTM outperforms other popular methods. The results show that the proposed model can significantly improve the prediction accuracy.
The remainder of this paper is organized as follows: Section 2 introduces the basic theory of WT-Adam-LSTM, and Section 3 describes the proposed hybrid model in detail. In Section 4, four cases are presented for electricity price forecasting. The conclusions are drawn in Section 5.
2. Methodologies
2.1. Adam-LSTM
RNNs are powerful models for processing sequential data. They differ from traditional feed-forward networks in that their neural connections do not run in a single direction only; in other words, neurons can pass data to a previous layer or to the same layer. In the traditional neural network model, the layers are fully connected from the input layer to the hidden layer and from the hidden layer to the output layer, while the nodes within each layer are not connected to each other. Such a network is powerless to deal with many complicated problems. For example, to predict the next word of a sentence, one usually needs the previous words, because the words in a sentence are not independent. An RNN is called recurrent because the current output of a sequence is also related to the previous output. Specifically, the network memorizes the previous information and applies it to the calculation of the current output, so the nodes of the hidden layer are connected across time steps. As a result, the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. As shown in
Fig. 1, at time step $t$, a chunk of neural network, $A$, receives the input value $x_t$ and the previous output $h_{t-1}$ and calculates the result $h_t$. An RNN can be thought of as multiple copies of the same neural network. In theory, an RNN can process sequence data of any length.
Fig. 1. An unrolled recurrent neural network.
Instead of neurons, LSTM networks have memory blocks that are connected through layers. Each block contains gates that manage the block's output and state, and it has a memory for the most recent sequence and a component that makes it smarter than classical neurons. The block operates on the input sequence, and each gate within the block uses a sigmoid activation unit to control whether it is triggered, making the change of state and the addition of information flowing through the block conditional. There are three types of gates within a unit (Fig. 2):
(1) Forget gate $f_t$: conditionally decides what information to throw away from the block:
$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$
where $w_f$ and $b_f$ represent the weight matrix and bias vector, respectively, $h_{t-1}$ is the previous output, $x_t$ is the input value, and $\sigma(x)$ denotes the sigmoid function.
(2) Input gate $i_t$: conditionally determines which values are entered to update the memory state:
$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$
(3) Output gate $o_t$: conditionally decides what to output based on the input and the block memory:
$$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o) \tag{3}$$
At each time $t$, the candidate input features are calculated from the input $x_t$ and the previous hidden state $h_{t-1}$, and the tanh function pushes the values to lie between $-1$ and $1$:
$$\tilde{c}_t = \tanh(w_c \cdot [h_{t-1}, x_t] + b_c) \tag{4}$$
The memory cell is updated by the moderated input features $\tilde{c}_t$ and the partial forgetting of the previous memory cell $c_{t-1}$, which yields
$$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \tag{5}$$
Ultimately, the hidden output state $h_t$ is calculated from the output gate $o_t$ and the memory $c_t$:
$$h_t = o_t * \tanh(c_t) \tag{6}$$
In Eqs. (1)–(6), $w_f$, $w_i$, $w_o$ and $w_c$ are weight matrices; $b_f$, $b_i$, $b_o$ and $b_c$ are bias vectors; $h_{t-1}$ is the output of the previous hidden state and $h_t$ is the output of the hidden state at time step $t$; $x_t$ is the input information.
Adam [20] is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is simple to implement, computationally efficient, has small memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited to problems that are large in terms of data or parameters. The algorithm of Adam is given in Table 1. After choosing the parameters $\alpha$, $\beta_1$, $\beta_2$ and the stochastic objective function $f(\theta)$, we initialize the parameter vector, the first moment vector, the second moment vector, and the time step. The loop then iteratively updates $t$, $g_t$, $m_t$, $v_t$, $\hat{m}_t$, $\hat{v}_t$ and $\theta_t$ until the parameter $\theta$ converges. Adam is applied to the LSTM neural network by optimizing the objective function $f(\theta)$ (the mean squared error in the proposed model); the goal of Adam is to find a set of parameters that minimizes the mean squared error. Adam does not require a stationary objective, works with sparse gradients, and naturally performs a form of step size annealing. Experiments on the effects of different optimizers on the model are presented in Section 4, and the empirical results show that Adam works well in practice and compares favorably with other stochastic optimization methods.
2.2. Wavelet transform
Wavelet analysis is an emerging branch of mathematics and signal theory that draws on functional analysis, Fourier analysis, harmonic analysis, and numerical analysis. It is regarded as another effective time-frequency analysis method after Fourier analysis and is well known for its strong performance in many applications, especially signal processing, image processing [32], speech processing [33], and hydrological prediction [34]. The WT can be divided into two types: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). For a signal $y(t)$, the CWT is defined as:
$$CWT_y(a, \tau) = \frac{1}{\sqrt{|a|}} \int y(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt \tag{7}$$
where $a$ is the scale parameter, $\tau$ is the translation parameter, $\psi^{*}(x)$ is the complex conjugate of $\psi(x)$, and $\psi(x)$ is the mother wavelet.
Fig. 2. Long short term memory neural network [31].
DWT is defined as:
$$DWT_y(m, n) = a_0^{-m/2} \int y(t)\, \psi^{*}\!\left(a_0^{-m} t - n\tau_0\right) dt \tag{8}$$
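The multilevel decomposition behind Eq. (8) can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions not taken from the paper: it uses the dyadic choice $a_0 = 2$, $\tau_0 = 1$ and the Haar wavelet (db1) in place of the Daubechies 5 wavelet adopted in this work, because Haar filters have only two taps and need no boundary handling. The function names `haar_step`, `haar_decompose` and `haar_reconstruct` are hypothetical. In practice a wavelet library would normally be used instead.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One DWT level: split an even-length series into approximation and detail."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (approximation, [detail_1, ..., detail_levels]), finest detail first.

    Assumption: the length of the series is divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Inverse transform: rebuild the series from the coarsest level upwards."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        current = rebuilt
    return current
```

Calling `haar_decompose(prices, 5)` on a series of, say, 32 points yields one approximation series and five detail series. Each of these could be forecast separately and recombined with `haar_reconstruct`.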
In Eq. (8), $m$ is the scaling constant (decomposition level) and $n$ is the translating constant, which is an integer. Due to the limited amount of electricity price data, we use the discrete wavelet transform (DWT) to predict electricity price in this work. After the DWT has been chosen, the different subcategories of wavelets are distinguished by their order (the number of vanishing moments) and the level of decomposition (scaling constant $m$). For example, db3 has three vanishing moments and db5 has five vanishing moments. The number of vanishing moments is related to the approximation order and smoothness of the wavelet. If a wavelet has $p$ vanishing moments, it can approximate polynomials of degree $p - 1$; as the number of vanishing moments increases, the degree of the polynomials that the wavelet can approximate increases and the signal becomes smoother [35]. As the level of decomposition increases, the number of samples represented by the wavelet increases. Comparing Daubechies wavelets of orders 1 to 5 at decomposition levels 1 to 5, the combination of db5 and five levels of decomposition can fully display the details of the signal and make the signal smoother.
Real-time electricity prices are nonlinear and unstable; many factors, such as power load, human activities, weather, temperature and fuel price, may affect the accuracy of electricity price forecasting [36,37]. These factors give the electricity price sequence multiple features. The wavelet and decomposition level should therefore be selected according to the characteristics of the electricity price, comprehensively considering the wavelet function's regularity, compact support, order of vanishing moments and the length of the input. Taking into account the unique volatility of electricity prices and their various features, Daubechies 5 with five levels of decomposition is used in this paper, which provides an appropriate compromise between wavelength and smoothness. The decomposition results show behaviors that are suitable for electricity prices.
Table 1
Algorithm of Adam [20].
Algorithm 1: Adam, as proposed in [20] for stochastic optimization. See Section 2 of [20] for details, and for a slightly more efficient (but less clear) order of computation. $g_t^2$ indicates the elementwise square $g_t \odot g_t$. Good default settings for the tested machine learning problems are $\alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$. All operations on vectors are element-wise. With $\beta_1^t$ and $\beta_2^t$ we denote $\beta_1$ and $\beta_2$ to the power $t$.
Require: $\alpha$: Stepsize
Require: $\beta_1, \beta_2 \in [0, 1)$: Exponential decay rates for the moment estimates
Require: $f(\theta)$: Stochastic objective function with parameters $\theta$
Require: $\theta_0$: Initial parameter vector
  $m_0 \leftarrow 0$ (Initialize 1st moment vector)
  $v_0 \leftarrow 0$ (Initialize 2nd moment vector)
  $t \leftarrow 0$ (Initialize timestep)
  while $\theta_t$ not converged do
    $t \leftarrow t + 1$
    $g_t \leftarrow \nabla_\theta f_t(\theta_{t-1})$ (Get gradients w.r.t. stochastic objective at timestep $t$)
    $m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$ (Update biased first moment estimate)
    $v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$ (Update biased second raw moment estimate)
    $\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (Compute bias-corrected first moment estimate)
    $\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$ (Compute bias-corrected second raw moment estimate)
    $\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (Update parameters)
  end while
  return $\theta_t$ (Resulting parameters)
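As an illustration of Table 1, the update loop can be written directly in plain Python. This is a minimal sketch rather than the implementation used in this work. The fixed number of steps stands in for the convergence test of Table 1, and both the function name `adam_minimize` and the toy quadratic objective are assumptions introduced only for the example.

```python
import math


def adam_minimize(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=1000):
    """Run n_steps Adam updates (Table 1) and return the final parameters.

    grad(theta) must return the gradient of the objective at theta as a list.
    Assumption: a fixed step count replaces the 'while not converged' test.
    """
    theta = list(theta0)
    m = [0.0] * len(theta)  # biased first moment estimate, m_0 = 0
    v = [0.0] * len(theta)  # biased second raw moment estimate, v_0 = 0
    for t in range(1, n_steps + 1):
        g = grad(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1.0 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1.0 - beta2) * g[k] ** 2
            m_hat = m[k] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[k] / (1.0 - beta2 ** t)  # bias-corrected second moment
            theta[k] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def toy_gradient(theta):
    """Gradient of the hypothetical objective (theta_0 - 3)^2 + (theta_1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]
```

For example, `adam_minimize(toy_gradient, [0.0, 0.0], alpha=0.1, n_steps=2000)` moves the parameters towards the minimizer (3, -1) of the toy objective. In the proposed model the objective would instead be the mean squared error of the LSTM.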
Wavelets similar to the Daubechies 5 wavelet chosen here have been adopted by previous researchers in the field of electricity price prediction [38,39].
2.3. Rationales for using WT-Adam-LSTM to predict electricity prices
To achieve accurate price forecasts, the predictive model must be able to capture different patterns in the price series. However, the high volatility, nonlinearity and irregularity of electricity prices make traditional models such as BP [40], SVM [11] and ELM [41] perform unsatisfactorily in grasping the features and dealing with the irregularity of the price series, so improving the accuracy of electricity price forecasts has encountered obstacles. As a powerful model for processing sequential data, LSTM is favored by many researchers because of its adaptability to complex problems and its outstanding ability to capture those features, and it has received extensive attention in electricity price forecasting. Whether the input is an approximation series or a detail series, the prediction of price series using the LSTM neural network is satisfactory. Moreover, the Adam algorithm helps the LSTM neural network search for suitable parameters and improves its stability and generalization ability. In addition to the deep learning model, the wavelet transform can convert an irregular price series into a set of better-behaved approximation series and detail series, and the decomposed series can be forecast more accurately than is possible through direct forecasting by LSTM. Related experiments have appeared in the literature [42], in which the author identified and tested several ways to show that the wavelet transform can support time series forecasting. Hence, the hybrid model of the wavelet transform and an Adam optimized LSTM neural network is proposed for electricity price forecasting in this study.
3. Proposed hybrid model
The process of predicting electricity price using a hybrid model based on the wavelet transform and Adam-LSTM is shown in Fig. 3.
3.1. The process for electricity price forecasting
The process of predicting electricity price using a hybrid model based on the wavelet transform and Adam optimized LSTM is roughly divided into five steps:
(1) Step 1: Data collection
The proposed model was tested on electricity price data from France and New South Wales, Australia, and was compared with recent publications. Data for the French electricity market and the New South Wales market of Australia were obtained from their websites [43,44] and used to demonstrate the performance of the model.
Fig. 3. The overall framework of the proposed model.
(2) Step 2: Data preprocessing
To further improve the accuracy of the model for price series prediction, the abnormal values in the training set of the original data were processed according to the Pauta criterion. The Pauta criterion assumes that a set of test data contains only random errors, calculates the standard deviation of the data, and determines a range according to a certain probability. An error outside this range is considered not a random error but a gross error. For the Pauta criterion, the probability that a value falls in $(\mu - \sigma, \mu + \sigma)$ is 0.6827, the probability in $(\mu - 2\sigma, \mu + 2\sigma)$ is 0.9545, and the probability in $(\mu - 3\sigma, \mu + 3\sigma)$ is 0.9973. We used the Pauta criterion to determine a 95% confidence interval of the observed data under good conditions and then applied the bounds of the confidence interval to identify abnormal data [45]. The process is as follows:
(a) Assume that the training set is $c = (c_1, c_2, \ldots, c_m)$. Then the standard deviation $\sigma$ can be calculated using the following formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} (c_i - \bar{c})^2}{m - 1}} \tag{9}$$
where $\bar{c}$ is the mean of the training set, and $m$ is the number of samples in the training dataset.
(b) When a new data point $y$ is observed, we use the Pauta criterion to determine whether it belongs to the interval $(\mu - 3\sigma, \mu + 3\sigma)$.
1) If $y \in (\mu - 3\sigma, \mu + 3\sigma)$, we treat $y$ as normal training data, and the training dataset remains the same.
2) If $y \notin (\mu - 3\sigma, \mu + 3\sigma)$, $y$ is abnormal; we then replace $y$ with $\bar{c}$ and update the training set.
To make the dataset fit the model better, the datasets were normalized to the range 0 to 1 with the min-max normalization method after the abnormal values of the training set were updated. Min-max normalization is a linear transformation of the original data. Let $\min_A$ and $\max_A$ be the minimum and maximum values of dataset $A$. An original value $x$ of $A$ is normalized by min-max normalization to a value $x'$ in the interval $[0, 1]$. The formula is:
$$x' = \frac{x - \min_A}{\max_A - \min_A} \tag{10}$$
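The two preprocessing steps of Eqs. (9) and (10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The mean and standard deviation are computed once from the whole training set, $\mu$ is taken to be the sample mean $\bar{c}$, and the function names `pauta_clean` and `min_max_normalize` are hypothetical.

```python
import statistics


def pauta_clean(train):
    """Replace values outside (mean - 3*sigma, mean + 3*sigma) with the mean.

    sigma is the sample standard deviation with the m - 1 denominator of
    Eq. (9). Assumption: statistics are computed once on the full training set.
    """
    mean = statistics.fmean(train)
    sigma = statistics.stdev(train)
    lower, upper = mean - 3.0 * sigma, mean + 3.0 * sigma
    return [x if lower < x < upper else mean for x in train]


def min_max_normalize(values):
    """Map values linearly onto [0, 1] as in Eq. (10)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant series")
    return [(x - lo) / (hi - lo) for x in values]
```

A typical use would be `min_max_normalize(pauta_clean(train_prices))`. The minimum and maximum found on the training data would then also be needed to map forecasts back to the original price scale.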
In Eq. (10), $x$ is the original value, $x'$ is the normalized value, and $\max_A$ and $\min_A$ are the maximum and minimum values in the dataset $A$, respectively.
(3) Step 3: Electricity price decomposition
The wavelet transform is used to decompose the electricity price series into a set of better-behaved constitutive series, which show better behavior in terms of stability and smoothness than the original price series. The constitutive series are predicted separately, and the inverse wavelet transform is performed to generate the actual predicted price. In this work, Daubechies 5 is used as the mother wavelet and five levels of decomposition are considered, which provides an appropriate trade-off between wavelength and smoothness, and the decomposition results show behaviors that are suitable for electricity prices.
(4) Step 4: Adam-optimized LSTM
Before using LSTM to predict electricity prices, we convert the single-column price series into a two-column dataset: the first column contains the original electricity price at time $t$, and the second column contains the electricity price at the next time step ($t + 1$), which is the value to be predicted. This makes the predicted series one point shorter than the true series, which can be resolved by appending subsequent data. After this processing, the LSTM neural network is trained on the processed time series and used for prediction, so that a more accurate prediction result can be obtained. The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks or neurons, and an output layer that makes a single value prediction. The default sigmoid activation function is used for the LSTM blocks. The network is trained for 100 epochs with a batch size of 1, and look_back, the number of previous time steps used as input variables to predict the next time period, is set to 2 in this study. Specifically, Adam, an efficient stochastic gradient-based optimizer, is used for the LSTM in electricity price forecasting; the flow chart indicating how Adam works is shown in Fig. 3.
(5) Step 5: Model evaluation
After WT-Adam-LSTM is built, the datasets from New South Wales, Australia, and France are used to demonstrate the performance of the hybrid model, and six evaluation metrics (four loss functions and two statistical measures) are available to evaluate the performance of electricity price forecasting.
3.2. Evaluation of forecasting performance
To evaluate the performance of the proposed model, four loss functions and two statistical measures are used as criteria. The loss functions are the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The smaller the value of the loss function, the higher the accuracy of the model for electricity price prediction. In addition, two performance metrics, Theil U statistic 1 and Theil U statistic 2, were adopted following Refs. [9,29]. Theil U statistic 1 takes values between 0 (denoting good accuracy) and 1 (denoting poor accuracy), while Theil U statistic 2 can take values greater than 1, which denote poor forecasting accuracy. The performance metrics are as follows:
$$MSE = \frac{1}{T} \sum_{t=1}^{T} (\hat{x}_t - x_t)^2 \tag{11}$$
$$RMSE = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat{x}_t - x_t)^2} \tag{12}$$
$$MAE = \frac{1}{T} \sum_{t=1}^{T} |\hat{x}_t - x_t| \tag{13}$$
$$MAPE = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{\hat{x}_t - x_t}{x_t} \right| \tag{14}$$
$$U_1 = \frac{\sqrt{\frac{1}{T} \sum_{t=1}^{T} (x_t - \hat{x}_t)^2}}{\sqrt{\frac{1}{T} \sum_{t=1}^{T} \hat{x}_t^2} + \sqrt{\frac{1}{T} \sum_{t=1}^{T} x_t^2}} \tag{15}$$
$$U_2 = \frac{\sqrt{\frac{1}{T} \sum_{t=1}^{T-1} \left( \frac{\hat{x}_{t+1} - x_{t+1}}{x_t} \right)^2}}{\sqrt{\frac{1}{T} \sum_{t=1}^{T-1} \left( \frac{x_{t+1} - x_t}{x_t} \right)^2}} \tag{16}$$
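Eqs. (11)–(16) can be computed with a short helper such as the one below. It is a sketch under stated assumptions rather than the evaluation code used in this work. MAPE is returned as a fraction (multiply by 100 for the percentages reported in the tables), absolute values are used in MAE and MAPE, and both sums in $U_2$ run to $T - 1$. The function name `forecast_metrics` is hypothetical.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE, U1 and U2 for two equal-length series.

    Assumptions: no actual value is zero (MAPE and U2 divide by x_t) and the
    actual series is not constant (otherwise the U2 denominator is zero).
    """
    T = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / T
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / T
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / T
    # Theil U1: RMSE scaled by the root mean squares of both series.
    u1 = rmse / (math.sqrt(sum(p * p for p in predicted) / T)
                 + math.sqrt(sum(a * a for a in actual) / T))
    # Theil U2: relative forecast error against a naive no-change forecast.
    num = sum(((predicted[t + 1] - actual[t + 1]) / actual[t]) ** 2
              for t in range(T - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
              for t in range(T - 1))
    u2 = math.sqrt(num / den)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae,
            "MAPE": mape, "U1": u1, "U2": u2}
```

With this definition, a naive forecast that repeats the previous actual value gives $U_2 = 1$, so values below 1 indicate a forecast that beats the no-change benchmark.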
where T is the number of test set samples, xt refers to the real value of the tth forecasting point, and b x t is the corresponding predicted value. 4. Case study and results In this section, four cases are enumerated to verify the performance of the proposed hybrid model. In case 4, we use the latest price data to verify the superiority of the proposed model, and the other two cases are compared with the recently published publication using WT-Adam-LSTM. As shown in Table 2. 4.1. Case 1: comparison of different optimizers for LSTM After determining the parameters of the LSTM model, it is crucial to choose a suitable optimizer for it. There are many optimizers available to optimize the LSTM neural network for better performance. Among them, SGD, RMSprop and Adam are more commonly used for time series prediction. To find the optimizer that is most suitable for the LSTM neural network, three optimizers SGD, RMSprop and Adam are used for comparison, and electricity price data for New South Wales from January 2014 to December 2014 is used to demonstrate the fitness of different optimizers for the LSTM model. During the 12 months, the last day of each month data is used as a test set to verify the quality of the model, all data containing 30 days or 29 days or 27 days before the test set is used as the training set. Three loss functions, RMSE, MAE and MAPE, are used to evaluate the accuracy of the LSTM model under different optimizer optimizations for electricity price prediction. The results are shown in Table 3. As a kind of stochastic gradient-based
Table 2 Case study.
Case 1 Case 2 Case 3 Case 4
Data Sources
Date of price
Purpose of selection
New South Wales New South Wales France, New South Wales France
January 2014eDecember 2014 January 2014eDecember 2014 first 100 points of September 2017 in France, May 2013 in NSW July 22, 2018eSeptember 1, 2018 January 06, 2019eFebruary 16, 2019
Finding an optimizer comparative case comparative case Latest data for model evaluation
Z. Chang et al. / Energy 187 (2019) 115804
7
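Table 3 Comparison of different optimizers for LSTM neural network models.

| Month | Model | RMSE | MAE | MAPE (%) |
|---|---|---|---|---|
| Jan | SGD-LSTM | 2.88 | 2.10 | 3.96 |
| Jan | RMSProp-LSTM | 2.59 | 1.94 | 3.72 |
| Jan | Adam-LSTM | 2.89 | 1.86 | 3.49 |
| Feb | SGD-LSTM | 1.17 | 0.70 | 1.49 |
| Feb | RMSProp-LSTM | 1.08 | 0.68 | 1.45 |
| Feb | Adam-LSTM | 1.06 | 0.61 | 1.31 |
| Mar | SGD-LSTM | 1.60 | 1.19 | 2.39 |
| Mar | RMSProp-LSTM | 1.71 | 1.29 | 2.51 |
| Mar | Adam-LSTM | 1.60 | 1.16 | 2.26 |
| Apr | SGD-LSTM | 2.52 | 1.65 | 3.09 |
| Apr | RMSProp-LSTM | 2.32 | 1.54 | 2.90 |
| Apr | Adam-LSTM | 2.33 | 1.54 | 2.89 |
| May | SGD-LSTM | 2.68 | 1.85 | 3.80 |
| May | RMSProp-LSTM | 1.72 | 1.39 | 2.89 |
| May | Adam-LSTM | 1.46 | 1.02 | 2.06 |
| June | SGD-LSTM | 3.67 | 2.06 | 3.80 |
| June | RMSProp-LSTM | 3.90 | 1.99 | 3.50 |
| June | Adam-LSTM | 3.81 | 1.86 | 3.27 |
| July | SGD-LSTM | 2.48 | 1.83 | 6.67 |
| July | RMSProp-LSTM | 2.30 | 1.73 | 6.42 |
| July | Adam-LSTM | 2.19 | 1.60 | 5.86 |
| Aug | SGD-LSTM | 4.62 | 2.81 | 7.33 |
| Aug | RMSProp-LSTM | 4.40 | 2.72 | 7.07 |
| Aug | Adam-LSTM | 4.32 | 2.56 | 6.61 |
| Sep | SGD-LSTM | 2.65 | 1.91 | 7.40 |
| Sep | RMSProp-LSTM | 2.48 | 1.79 | 6.90 |
| Sep | Adam-LSTM | 2.41 | 1.67 | 6.42 |
| Oct | SGD-LSTM | 2.72 | 1.84 | 5.65 |
| Oct | RMSProp-LSTM | 2.73 | 1.86 | 5.59 |
| Oct | Adam-LSTM | 2.69 | 1.81 | 5.46 |
| Nov | SGD-LSTM | 1.66 | 1.20 | 4.59 |
| Nov | RMSProp-LSTM | 1.52 | 1.23 | 4.98 |
| Nov | Adam-LSTM | 1.38 | 0.99 | 3.84 |
| Dec | SGD-LSTM | 3.45 | 2.42 | 6.29 |
| Dec | RMSProp-LSTM | 3.65 | 2.54 | 6.54 |
| Dec | Adam-LSTM | 3.46 | 2.40 | 6.22 |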
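To make the difference between the three optimizers compared in Case 1 concrete, the following is a minimal NumPy sketch of their update rules in their commonly used forms. It is not the LSTM experiment itself: the toy quadratic loss, the starting point, the learning rates and the step count are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

SCALE = np.array([10.0, 1.0])  # assumed curvature of the toy loss along each axis


def loss(theta):
    """Toy quadratic loss f(theta) = 0.5 * sum(SCALE * theta**2)."""
    return 0.5 * float(np.sum(SCALE * np.asarray(theta, dtype=float) ** 2))


def grad(theta):
    """Gradient of the toy loss."""
    return SCALE * theta


def run_sgd(theta0, lr=0.01, steps=200):
    """Plain gradient descent: one global learning rate."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta


def run_rmsprop(theta0, lr=0.01, rho=0.9, eps=1e-8, steps=200):
    """RMSprop: scale each step by a running average of squared gradients."""
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        v = rho * v + (1.0 - rho) * g ** 2
        theta -= lr * g / (np.sqrt(v) + eps)
    return theta


def run_adam(theta0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Adam: bias-corrected first and second moment estimates of the gradient."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias correction for the second moment
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


start = [1.0, 1.0]
results = {
    "SGD": loss(run_sgd(start)),
    "RMSprop": loss(run_rmsprop(start)),
    "Adam": loss(run_adam(start)),
}
for name, value in results.items():
    print(f"{name}: final loss {value:.6f}")
```

The ranking on such a toy problem depends strongly on the chosen hyperparameters and says nothing about the ranking on price data; the sketch only shows how the per-parameter scaling in RMSprop and Adam differs from the single step size of SGD.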
optimizer, Adam fits the LSTM better than the other two optimizers, attaining the lowest (or tied-lowest) MAE and MAPE in every month.

4.2. Case 2: comparative experiments using electricity price in New South Wales

In this case, electricity price data from New South Wales for the 12 months of 2014 are adopted, exactly the same data as used in Ref. [29], in which a hybrid model combining wavelet transform, the statistical method ARMA and a kernel-based extreme learning machine is used to predict the electricity price on the last day of each month in New South Wales in 2014. For each of the 12 months, the data of the last day are used as the test set to verify the quality of the model, and all data from the 30, 29 or 27 days prior to the test set are used as the training set. As in Ref. [29], MAPE, U1 and U2 are used to enable a fair comparison. As in Case 1, the training set is smoothed and standardized before use, and the Adam optimizer is used to optimize the performance of the proposed model in terms of electricity price forecasting. As shown in Table 4, the proposed WT-Adam-LSTM model is compared to Yang's model [29]. From the results in Table 4, the 12-month average MAPE values of Yang's model [29] and the proposed WT-Adam-LSTM model are 3.74% and 2.34%, respectively; compared with Yang's method, the average MAPE is reduced by 1.40 percentage points. In addition, Table 4 reports two statistical measures, U1 and U2, for the proposed method and the comparison method. The U1 values of all twelve months generated by the proposed method are close to zero, denoting good prediction accuracy of the proposed model; moreover, among
all the compared methods, the proposed model has the smallest average value of U1. In addition, the value of U2 of the proposed model is much smaller than 1. Taken together, these results show that the proposed method has the best average predictive performance. The real electricity price and forecast price of New South Wales in January 2014 are shown as an example in Fig. 4.

4.3. Case 3: comparative experiments using electricity price in France and New South Wales

To fully illustrate the performance of the proposed model in electricity price forecasting, we compare it with a recently published model [46], which uses differential evolution (DE) to optimize LSTM models, denoted as DE-LSTM, to predict electricity prices and is validated on the Australian and French electricity markets. As in [46], the first 100 points of the electricity price in France in September 2017 and the electricity price of New South Wales in May 2013 are used to verify the performance of the proposed model.

4.3.1. Experiments using electricity price in France

To enable a fair comparison, the first 100 points of the electricity price in France in September 2017 are divided, as in Ref. [46], into two subsets: a training set (the first 80 points) and a test set (the remaining 20 points). The dataset is then normalized and Adam is selected as the optimizer for the LSTM; the training set is not smoothed because it contains no outliers. Three loss functions, RMSE, MAE and MAPE, are used to evaluate the accuracy of the proposed model and DE-LSTM [46]. The comparison between the
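Table 4 Methods comparison of MAPE (%), U1 and U2 in New South Wales, 2014.

| Month | Metric | Yang's model [29] | WT-Adam-LSTM |
|---|---|---|---|
| Jan | MAPE | 3.21 | 1.97 |
| Jan | U1 | 0.02 | 0.0132 |
| Jan | U2 | 0.27 | 0.1647 |
| Feb | MAPE | 1.53 | 0.66 |
| Feb | U1 | 0.01 | 0.0043 |
| Feb | U2 | 0.13 | 0.0592 |
| Mar | MAPE | 3.82 | 1.07 |
| Mar | U1 | 0.02 | 0.0065 |
| Mar | U2 | 0.31 | 0.0859 |
| Apr | MAPE | 2.08 | 1.3 |
| Apr | U1 | 0.01 | 0.0111 |
| Apr | U2 | 0.18 | 0.1393 |
| May | MAPE | 1.51 | 1.91 |
| May | U1 | 0.01 | 0.0112 |
| May | U2 | 0.13 | 0.145 |
| June | MAPE | 2.43 | 1.9 |
| June | U1 | 0.02 | 0.0183 |
| June | U2 | 0.22 | 0.2045 |
| July | MAPE | 3.84 | 2.89 |
| July | U1 | 0.02 | 0.0196 |
| July | U2 | 0.30 | 0.2323 |
| Aug | MAPE | 6.15 | 3.64 |
| Aug | U1 | 0.04 | 0.0244 |
| Aug | U2 | 0.42 | 0.2498 |
| Sep | MAPE | 5.33 | 2.76 |
| Sep | U1 | 0.03 | 0.015 |
| Sep | U2 | 0.40 | 0.2016 |
| Oct | MAPE | 1.95 | 3.15 |
| Oct | U1 | 0.01 | 0.0201 |
| Oct | U2 | 0.17 | 0.2241 |
| Nov | MAPE | 5.67 | 3.9 |
| Nov | U1 | 0.03 | 0.0206 |
| Nov | U2 | 0.42 | 0.277 |
| Dec | MAPE | 7.35 | 2.97 |
| Dec | U1 | 0.05 | 0.0219 |
| Dec | U2 | 0.54 | 0.2525 |
| Average | MAPE | 3.74 | 2.34 |
| Average | U1 | 0.0225 | 0.0155 |
| Average | U2 | 0.2908 | 0.1863 |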
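For readers who want to reproduce the U1 and U2 columns of Table 4, the sketch below computes Theil's inequality statistics in one commonly used form. The exact formulas used in this paper are defined in an earlier section, so the expressions here are an assumption and may differ in detail; the function names and the five sample prices are hypothetical.

```python
import numpy as np


def theil_u1(actual, predicted):
    """Theil's U1 (assumed form): RMSE divided by the sum of the root mean
    squares of the actual and predicted series. Lies in [0, 1]; 0 is perfect."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    num = np.sqrt(np.mean((a - p) ** 2))
    den = np.sqrt(np.mean(a ** 2)) + np.sqrt(np.mean(p ** 2))
    return float(num / den)


def theil_u2(actual, predicted):
    """Theil's U2 (assumed form): relative forecast error compared with the
    relative error of a naive no-change forecast. Values below 1 mean the
    model beats the naive forecast."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    num = np.sqrt(np.sum(((p[1:] - a[1:]) / a[:-1]) ** 2))
    den = np.sqrt(np.sum(((a[1:] - a[:-1]) / a[:-1]) ** 2))
    return float(num / den)


# Hypothetical prices, used only to exercise the two functions.
actual = np.array([50.0, 52.0, 49.0, 55.0, 53.0])
no_change = np.concatenate(([actual[0]], actual[:-1]))  # naive forecast

print("perfect forecast:", theil_u1(actual, actual), theil_u2(actual, actual))
print("naive forecast U2:", theil_u2(actual, no_change))
```

Under this form, a naive no-change forecast gives U2 equal to 1 by construction, which is why a U2 well below 1 is read as evidence that the model adds information beyond simple persistence.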
Fig. 4. Actual electricity price and predicted price of New South Wales in January 2014. (Plot of actual price and predicted price against hour; y-axis: price (MWh).)
proposed WT-Adam-LSTM, DE-LSTM [46] and Adam-optimized LSTM is shown in Table 5, and the actual and predicted prices are shown in Fig. 5. The authors of [46] list different cases based on this dataset for electricity price forecasting; the minimum errors among these cases, given in Table 5, are 3.65% (MAPE), 1.76 (RMSE) and 1.43 (MAE), which are higher than those of the proposed WT-Adam-LSTM model by 1.50 percentage points (MAPE), 0.62 (RMSE) and 0.62 (MAE), respectively.
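The data handling described for the French experiment (a chronological 80/20 split, normalization, and evaluation with RMSE, MAE and MAPE) can be sketched as follows. This is a minimal illustration under stated assumptions: the price series is synthetic, min-max scaling fitted on the training part only is assumed, the metrics use their standard definitions, and a simple persistence forecast stands in for the trained model. All function names are hypothetical.

```python
import numpy as np


def chronological_split(series, n_train):
    """Split a 1-D series into a leading training part and a trailing test part."""
    series = np.asarray(series, dtype=float)
    return series[:n_train], series[n_train:]


def minmax_apply(x, lo, hi):
    """Scale values with the given minimum and maximum."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)


def minmax_invert(z, lo, hi):
    """Map scaled values back to the original price scale."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo


def rmse(actual, predicted):
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def mae(actual, predicted):
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(a - p)))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((a - p) / a)) * 100.0)


# Synthetic stand-in for the 100 hourly price points (not the real data).
rng = np.random.default_rng(0)
hours = np.arange(100)
prices = 35.0 + 5.0 * np.sin(2.0 * np.pi * hours / 24.0) + rng.normal(0.0, 1.0, 100)

train, test = chronological_split(prices, n_train=80)

# Fit the scaling on the training part only (an assumption of this sketch).
lo, hi = float(train.min()), float(train.max())
train_scaled = minmax_apply(train, lo, hi)

# Placeholder model working in the scaled space: each test point is
# predicted by the previous actual value (persistence).
scaled_all = minmax_apply(prices, lo, hi)
predicted_scaled = scaled_all[79:99]
predicted = minmax_invert(predicted_scaled, lo, hi)

print(f"RMSE={rmse(test, predicted):.3f} "
      f"MAE={mae(test, predicted):.3f} "
      f"MAPE={mape(test, predicted):.2f}%")
```

Errors are computed after mapping the predictions back to the price scale, so that RMSE and MAE are expressed in the same units as the prices.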
Table 5 Methods comparison of MAPE (%), RMSE and MAE in France, September 2017.

| Metric | Adam-optimized LSTM | DE-LSTM [46] | Proposed WT-Adam-LSTM |
|---|---|---|---|
| MAPE (%) | 3.64 | 3.65 | 2.15 |
| RMSE | 1.67 | 1.76 | 1.14 |
| MAE | 1.38 | 1.43 | 0.81 |

4.3.2. Experiments using electricity price in New South Wales

The electricity price data of New South Wales in May 2013 are also adopted in Ref. [46], whose authors use this dataset for a comparison with [47]. The available half-hourly data are converted into hourly data, so in Ref. [46] the 48 daily data points become 24, and the data for May are reduced from 1488 to 744 data points. As in Refs. [46,47], one-step-ahead forecasting and 24-h-ahead (1-day-ahead) forecasting are performed: the data of 1 day (24 points) are used as the test set for one-step-ahead forecasting, and the data of 7 days (168 points) are used as the test set for 24-h-ahead (1-day-ahead) forecasting. As in the previous experiments, the dataset is normalized, and the MSE and MAE values obtained by the different forecasting models are reported in Table 6. The actual and predicted prices generated by the proposed WT-Adam-LSTM for one-step-ahead forecasting are presented in Fig. 6, and those for 24-h-ahead (1-day-ahead) forecasting are shown in Fig. 7. Compared with the ARIMA-ANN model of Ref. [47] and the DE-LSTM model of Ref. [46], WT-Adam-LSTM reduces MSE by 15.48 and 12.64 and MAE by 1.78 and 1.75 for one-step-ahead forecasting, and reduces MSE by 49.13 and 48.71 and MAE by 3.79 and 3.49 for 24-h-ahead (1-day-ahead) forecasting, respectively. WT-Adam-LSTM therefore outperforms ARIMA-ANN [47] and DE-LSTM [46] in electricity price forecasting for the New South Wales power market.

4.4. Case 4: prediction of electricity prices in French market
In this case, we obtain the electricity price data of France from July 22, 2018 to September 1, 2018 from the website [43]. The dataset contains six weeks of data (24 data points per day, 1008 data points in six weeks), of which the first five weeks are used as the training set and the last week is used for testing. The data are grouped by day of the week, and each day of the test week is predicted separately by the proposed model. For example, the data from the first five Sundays are used to predict the price of the last Sunday, and the same procedure is applied to every other day of the week. To highlight the superiority of the WT-Adam-LSTM model, the training set is smoothed according to the Pauta criterion and normalized by the min-max normalization method, and the resulting data are then used for electricity price prediction by two traditional machine learning models, the back propagation (BP) neural network and the general regression neural network (GRNN), by the deep learning model LSTM, and by the proposed hybrid model WT-Adam-LSTM. The loss functions RMSE, MAE and MAPE are used to verify the quality of the predictions, and the results are shown in Table 7. As indicated in Table 7, the prediction of the French electricity price using the deep neural network LSTM is significantly better than that of the traditional machine learning methods, mainly because LSTM can deal with more complex nonlinear problems through hierarchical and distributed feature representations. In particular, because the dataset processed by WT has a more stable variance and Adam enables the LSTM to find a set of suitable weights more easily, the proposed hybrid model WT-Adam-LSTM exhibits better performance in electricity price forecasting than the traditional machine learning models BP and GRNN and the single deep learning model LSTM. The raw data and the forecasting values generated by BP, GRNN, LSTM and the proposed WT-Adam-LSTM model for the latest electricity prices in France are shown in Fig. 8.
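The day-of-week grouping and the preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the price series is synthetic, the Pauta criterion is taken in its usual 3-sigma form, and replacing flagged points by the mean of the remaining points is an assumption of this sketch (the replacement rule is not stated in this section). Function names are hypothetical.

```python
import numpy as np


def group_by_weekday(hourly, hours_per_day=24, days_per_week=7):
    """Reshape a flat hourly series into an array (weeks, weekdays, hours)."""
    hourly = np.asarray(hourly, dtype=float)
    per_week = hours_per_day * days_per_week
    if hourly.size % per_week != 0:
        raise ValueError("series must cover a whole number of weeks")
    return hourly.reshape(-1, days_per_week, hours_per_day)


def pauta_smooth(x):
    """Flag points farther than 3 standard deviations from the mean and
    replace them by the mean of the unflagged points (assumed rule)."""
    x = np.asarray(x, dtype=float).copy()
    outliers = np.abs(x - x.mean()) > 3.0 * x.std()
    if outliers.any() and not outliers.all():
        x[outliers] = x[~outliers].mean()
    return x, outliers


# Synthetic six weeks of hourly prices starting on a Sunday (not the real data).
rng = np.random.default_rng(1)
prices = 40.0 + rng.normal(0.0, 3.0, size=6 * 7 * 24)
prices[5] = 150.0  # injected spike on the first Sunday to exercise the rule

weeks = group_by_weekday(prices)   # shape (6, 7, 24)
sundays = weeks[:, 0, :]           # first weekday of every week, shape (6, 24)
train_raw = sundays[:5].ravel()    # five earlier Sundays, 120 points
test = sundays[5]                  # last Sunday, 24 points

train_smooth, mask = pauta_smooth(train_raw)

# Min-max normalization with statistics of the smoothed training set.
lo, hi = train_smooth.min(), train_smooth.max()
train_scaled = (train_smooth - lo) / (hi - lo)

print("flagged points:", int(mask.sum()))
print("training points:", train_scaled.size, "test points:", test.size)
```

Repeating the last four steps with weekday index 1 to 6 in place of 0 yields the training and test sets for Monday through Saturday.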
To further verify the performance of the model's electricity price
Fig. 5. Actual electricity price and predicted price of France in September 2017. (Plot of actual price and predicted price against hour; y-axis: price (MWh).)
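Table 6 Methods comparison of MSE and MAE in NSW, May 2013.

| Forecasting horizon | Metric | ARIMA-ANN [47] | DE-LSTM [46] | Proposed WT-Adam-LSTM |
|---|---|---|---|---|
| One-step-ahead | MSE | 18.2793 | 15.4446 | 2.80 |
| One-step-ahead | MAE | 3.2342 | 3.1999 | 1.45 |
| 24-step-ahead (1 day ahead) | MSE | 53.0071 | 52.5919 | 3.88 |
| 24-step-ahead (1 day ahead) | MAE | 5.3219 | 5.0194 | 1.53 |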
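The two forecasting modes compared in Table 6, and the half-hourly to hourly conversion described in Section 4.3.2, can be sketched as follows. Several points are assumptions of this sketch rather than details given in the paper: each hourly value is taken as the mean of its two half-hourly values, the 24-step-ahead forecast is produced recursively by feeding predictions back as inputs, and a seasonal-naive rule stands in for the trained model. All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def half_hourly_to_hourly(x):
    """Average consecutive pairs of half-hourly values (assumed conversion)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2 != 0:
        raise ValueError("expected an even number of half-hourly points")
    return x.reshape(-1, 2).mean(axis=1)


def one_step_forecasts(history, test, predict_next):
    """Predict each test point from all actual values observed before it."""
    known = list(history)
    out = []
    for actual in test:
        out.append(predict_next(np.asarray(known)))
        known.append(actual)  # the true value becomes available afterwards
    return np.array(out)


def recursive_forecasts(history, horizon, predict_next):
    """Predict `horizon` steps ahead, feeding each prediction back as input."""
    known = list(history)
    out = []
    for _ in range(horizon):
        nxt = predict_next(np.asarray(known))
        out.append(nxt)
        known.append(nxt)
    return np.array(out)


def day_ahead_forecasts(history, test, predict_next, horizon=24):
    """Forecast the test period one day at a time; actual values of a day
    are added to the history only after that whole day has been predicted."""
    known = list(history)
    out = []
    for start in range(0, len(test), horizon):
        out.extend(recursive_forecasts(known, horizon, predict_next))
        known.extend(test[start:start + horizon])
    return np.array(out)


def seasonal_naive(known, period=24):
    """Placeholder model: repeat the value observed one day earlier."""
    return float(known[-period])


half_hourly = np.arange(1488, dtype=float)   # synthetic stand-in for May 2013
hourly = half_hourly_to_hourly(half_hourly)  # 1488 points become 744

one_step = one_step_forecasts(hourly[:-24], hourly[-24:], seasonal_naive)
day_ahead = day_ahead_forecasts(hourly[:-168], hourly[-168:], seasonal_naive)
print(hourly.size, one_step.size, day_ahead.size)
```

With a real model, the 24-step-ahead errors are usually larger than the one-step-ahead errors because prediction errors accumulate within each day, which matches the pattern of the ARIMA-ANN and DE-LSTM columns in Table 6.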
Fig. 6. Actual electricity price and predicted price of NSW in May 2013 (One-step-ahead). (Plot of actual price and predicted price against hour; y-axis: price (MWh).)
Fig. 7. Actual electricity price and predicted price of NSW in May 2013 (24-step-ahead (1 day ahead)). (Plot of actual price and predicted price against hour; y-axis: price (MWh).)
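Table 7 MAPE (%), MAE and RMSE of different forecasting models.

| Day | Metric | BP | GRNN | LSTM | WT-Adam-LSTM |
|---|---|---|---|---|---|
| Sunday | MAPE | 9.38 | 9.40 | 4.79 | 2.5 |
| Sunday | MAE | 5.19 | 5.19 | 2.63 | 1.41 |
| Sunday | RMSE | 6.05 | 6.21 | 3.47 | 1.73 |
| Monday | MAPE | 11.53 | 11.72 | 8.22 | 5.2 |
| Monday | MAE | 5.47 | 5.48 | 3.67 | 2.65 |
| Monday | RMSE | 6.74 | 6.78 | 5.64 | 2.94 |
| Tuesday | MAPE | 5.46 | 5.52 | 3.68 | 1.68 |
| Tuesday | MAE | 3.65 | 3.65 | 2.47 | 1.15 |
| Tuesday | RMSE | 4.34 | 4.35 | 3.27 | 1.46 |
| Wednesday | MAPE | 4.32 | 4.32 | 2.82 | 2.39 |
| Wednesday | MAE | 3.05 | 3.05 | 1.82 | 1.59 |
| Wednesday | RMSE | 3.64 | 3.64 | 3 | 1.83 |
| Thursday | MAPE | 5.06 | 5.12 | 2.89 | 1.92 |
| Thursday | MAE | 3.31 | 3.32 | 1.81 | 1.23 |
| Thursday | RMSE | 4.2 | 4.23 | 2.47 | 1.55 |
| Friday | MAPE | 7 | 7.03 | 2.86 | 2.2 |
| Friday | MAE | 4.77 | 4.78 | 1.84 | 1.52 |
| Friday | RMSE | 5.33 | 5.34 | 2.59 | 1.93 |
| Saturday | MAPE | 4.83 | 4.96 | 3.7 | 1.75 |
| Saturday | MAE | 3.01 | 3.05 | 2.3 | 1.09 |
| Saturday | RMSE | 3.57 | 3.61 | 2.75 | 1.48 |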
Fig. 8. Performance of five models on different days of week.
forecast, the latest electricity price data of France from January 06, 2019 to February 16, 2019 are adopted to illustrate the performance of WT-Adam-LSTM. The training set and the test set are divided in the same way as above. As can be seen in Table 8, the proposed WT-Adam-LSTM gives the most accurate prediction of the electricity price among the models listed.

5. Conclusion

Over the last 15 years, electricity price forecasts have played a vital role in energy companies' decision-making mechanisms. In this study, a new hybrid model combining wavelet transform and an Adam-optimized LSTM neural network for electricity price forecasting is proposed and tested in the electricity markets of New South Wales and France. After the nonlinear sequence is decomposed by wavelet transform, the processed price series have a more stable variance, so that the Adam-optimized LSTM neural network can accurately capture the behavior of electricity prices. To validate the superiority of the proposed model, WT-Adam-LSTM is applied to recent price forecasting of the French market and compared
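Table 8 MAPE (%), MAE and RMSE of different forecasting models.

| Day | Metric | BP | GRNN | LSTM | WT-Adam-LSTM |
|---|---|---|---|---|---|
| Sunday | MAPE | 59.3 | 58.9 | 19.24 | 6.25 |
| Sunday | MAE | 15.69 | 15.56 | 5.12 | 1.9 |
| Sunday | RMSE | 16.3 | 16.12 | 6.19 | 2.4 |
| Monday | MAPE | 31.2 | 31.2 | 18.42 | 7.56 |
| Monday | MAE | 10.1 | 10.1 | 5.53 | 2.77 |
| Monday | RMSE | 11.74 | 11.74 | 8.86 | 3.46 |
| Tuesday | MAPE | 14.32 | 13.85 | 6.29 | 3.33 |
| Tuesday | MAE | 6.97 | 6.89 | 3.1 | 1.75 |
| Tuesday | RMSE | 7.54 | 7.5 | 3.27 | 2.11 |
| Wednesday | MAPE | 7.45 | 7.45 | 6.29 | 3.02 |
| Wednesday | MAE | 3.71 | 3.71 | 3.05 | 1.58 |
| Wednesday | RMSE | 4.23 | 4.23 | 3.87 | 2.17 |
| Thursday | MAPE | 6.56 | 6.53 | 5.49 | 3.34 |
| Thursday | MAE | 3.42 | 3.4 | 2.65 | 1.69 |
| Thursday | RMSE | 4.16 | 4.12 | 3.59 | 2.06 |
| Friday | MAPE | 7.19 | 7.19 | 5.92 | 2.87 |
| Friday | MAE | 3.43 | 3.43 | 2.83 | 1.47 |
| Friday | RMSE | 3.85 | 3.85 | 3.47 | 2.03 |
| Saturday | MAPE | 25.57 | 25.23 | 7.81 | 4.09 |
| Saturday | MAE | 10.92 | 10.67 | 3.4 | 1.83 |
| Saturday | RMSE | 12.83 | 12.14 | 4.28 | 2.28 |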
with the latest publications reported in the literature. The conclusions of this work are as follows.

(1) In case 1, to find the optimizer that best fits the LSTM neural network, the three most popular optimizers, SGD, RMSprop and Adam, are compared. The results show that Adam outperforms the other two in electricity price forecasting for the NSW energy market.
(2) In case 2, the proposed WT-Adam-LSTM is compared with a hybrid model combining wavelet transform, the statistical method ARMA and kernel-based ELM. The results for MAPE, U1 and U2 show that the forecasting accuracy of WT-Adam-LSTM is significantly improved.
(3) In case 3, two comparisons between WT-Adam-LSTM and DE-LSTM are presented, based on the price data from France and New South Wales, respectively. The accuracy of WT-Adam-LSTM is again better than that of DE-LSTM and ARIMA-ANN.
(4) In case 4, the price data of France from July 22, 2018 to September 1, 2018, and from January 6, 2019 to February 16, 2019, are adopted to illustrate the performance of the proposed model, and the accuracy of WT-Adam-LSTM turns out to be satisfactory.

The experimental results of the four cases show that the proposed hybrid model exhibits better performance than the existing models considered. It can therefore be concluded that WT-Adam-LSTM is a satisfactory approach to electricity price forecasting. In future work, more influential factors, such as electricity demand and weather, will be taken into account, and the application area of this research will be extended to natural gas forecasting, wind power analysis, etc.

References

[1] He K, Yu L, Tang L. Electricity price forecasting with a BED (Bivariate EMD Denoising) methodology. Energy 2015;91:601–9.
[2] Elamin N, Fukushige M. Modeling and forecasting hourly electricity demand by SARIMAX with interactions. Energy 2018;165:257–68.
[3] Singh N, Mohanty SR, Shukla RD. Short term electricity price forecast based on environmentally adapted generalized neuron. Energy 2017;125:127–39.
[4] Lago J, et al. Forecasting day-ahead electricity prices in Europe: the importance of considering market integration. Appl Energy 2018;211:890–903.
[5] Nowotarski J, Weron R. On the importance of the long-term seasonal component in day-ahead electricity price forecasting. Energy Econ 2016;57:228–35.
[6] Weron R. Electricity price forecasting: a review of the state-of-the-art with a look into the future. Int J Forecast 2014;30(4):1030–81.
[7] Tan Z, et al. Day-ahead electricity price forecasting using wavelet transform combined with ARIMA and GARCH models. Appl Energy 2010;87(11):3606–10.
[8] Dong Y, et al. Short-term electricity price forecast based on the improved hybrid model. Energy Convers Manag 2011;52(8–9):2987–95.
[9] Panapakidis IP, Dagoumas AS. Day-ahead electricity price forecasting via the application of artificial neural network based models. Appl Energy 2016;172:132–51.
[10] Shrivastava NA, Panigrahi BK. A hybrid wavelet-ELM based short term price forecasting for electricity markets. Int J Electr Power Energy Syst 2014;55:41–50.
[11] Yan X, Chowdhury NA. Mid-term electricity market clearing price forecasting: a multiple SVM approach. Int J Electr Power Energy Syst 2014;58:206–14.
[12] Sundermeyer M, Schlüter R, Ney H. LSTM neural networks for language modeling. In: Thirteenth annual conference of the international speech communication association; 2012.
[13] Graves A, Jaitly N, Mohamed A-r. Hybrid speech recognition with deep bidirectional LSTM. In: Automatic speech recognition and understanding (ASRU), 2013 IEEE workshop on. IEEE; 2013.
[14] Chen Q, et al. Enhanced LSTM for natural language inference. 2016. arXiv preprint arXiv:1609.06038.
[15] Parascandolo G, Huttunen H, Virtanen T. Recurrent neural networks for polyphonic sound event detection in real life recordings. 2016. arXiv preprint arXiv:1604.00861.
[16] Chen K, Zhou Y, Dai F. A LSTM-based method for stock returns prediction: a case study of China stock market. In: Big data (big data), 2015 IEEE international conference on. IEEE; 2015.
[17] Kim HY, Won CH. Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models. Expert Syst Appl 2018;103:25–37.
[18] Liu H, Mi X-w, Li Y-f. Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network. Energy Convers Manag 2018;156:498–514.
[19] Peng L, et al. Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 2018;162:1301–14.
[20] Kingma DP, Ba J. Adam: a method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980.
[21] Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12(Jul):2121–59.
[22] Tieleman T, Hinton G. Lecture 6.5-RMSProp, COURSERA: neural networks for machine learning. University of Toronto, Technical Report; 2012.
[23] Peng Y, et al. Chemical-protein relation extraction with ensembles of SVM, CNN, and RNN models. 2018. arXiv preprint arXiv:1802.01255.
[24] Bansal T, Belanger D, McCallum A. Ask the GRU: multi-task learning for deep text recommendations. In: Proceedings of the 10th ACM conference on recommender systems. ACM; 2016.
[25] Arik SO, et al. Convolutional recurrent neural networks for small-footprint keyword spotting. 2017. arXiv preprint arXiv:1703.05390.
[26] Reddy SS, Jung C-M. Short-term load forecasting using artificial neural networks and wavelet transform. Int J Appl Eng Res 2016;11(19):9831–6.
[27] Mandal P, et al. A novel hybrid approach using wavelet, firefly algorithm, and fuzzy ARTMAP for day-ahead electricity price forecasting. IEEE Trans Power Syst 2013;28(2):1041–51.
[28] Zhang J, Tan Z. Day-ahead electricity price forecasting using WT, CLSSVM and EGARCH model. Int J Electr Power Energy Syst 2013;45(1):362–8.
[29] Yang Z, Ce L, Lian L. Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods. Appl Energy 2017;190:291–305.
[30] Shayeghi H, Ghasemi A. Day-ahead electricity prices forecasting by a modified CGSA technique and hybrid WT in LSSVM based scheme. Energy Convers Manag 2013;74:482–91.
[31] Nelson DM, Pereira AC, de Oliveira RA. Stock market's price movement prediction with LSTM neural networks. In: 2017 international joint conference on neural networks (IJCNN). IEEE; 2017.
[32] Boix M, Canto B. Wavelet Transform application to the compression of images. Math Comput Model 2010;52(7–8):1265–70.
[33] Luo Z, Takiguchi T, Ariki Y. Emotional voice conversion using neural networks with different temporal scales of F0 based on wavelet transform. In: SSW; 2016.
[34] Maheswaran R, Khosa R. Comparative study of different wavelets for hydrologic forecasting. Comput Geosci 2012;46:284–95.
[35] http://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/.
[36] Shao Z, Yang S-L, Gao F. Density prediction and dimensionality reduction of mid-term electricity demand in China: a new semiparametric-based additive model. Energy Convers Manag 2014;87:439–54.
[37] Reddy SS. Bat algorithm-based back propagation approach for short-term load forecasting considering weather factors. Electr Eng 2017:1–7.
[38] Voronin S, Partanen J. Forecasting electricity price and demand using a hybrid approach based on wavelet transform, ARIMA and neural networks. Int J Energy Res 2014;38(5):626–37.
[39] Conejo AJ, et al. Day-ahead electricity price forecasting using the wavelet transform and ARIMA models. IEEE Trans Power Syst 2005;20(2):1035–42.
[40] Wang D, et al. Multi-step ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm. Appl Energy 2017;190:390–407.
[41] Chen X, et al. Electricity price forecasting with extreme learning machine and bootstrapping. IEEE Trans Power Syst 2012;27(4):2055–62.
[42] Schlüter S, Deuschle C. Using wavelets for time series forecasting: does it pay off? [IWQW Discussion Papers]. 2010.
[43] http://www.epexspot.com/.
[44] http://www.aemo.com.au/Electricity/.
[45] Shen C, et al. Two noise-robust axial scanning multi-image phase retrieval algorithms based on Pauta criterion and smoothness constraint. Opt Express 2017;25(14):16235–49.
[46] Peng L, Liu S, Liu R, et al. Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 2018;162:1301–14.
[47] Babu CN, Reddy BE. A moving-average filter based hybrid ARIMA–ANN model for forecasting time series data. Appl Soft Comput 2014;23:27–38.