Applied Soft Computing 10 (2010) 784–792
Application of a new hybrid neuro-evolutionary system for day-ahead price forecasting of electricity markets

Nima Amjady *, Farshid Keynia
Department of Electrical Engineering, Semnan University, Semnan, Iran
Article history: Received 24 April 2008; Received in revised form 11 April 2009; Accepted 5 September 2009; Available online 15 September 2009

Keywords: Hybrid neuro-evolutionary system; Neural network; Evolutionary algorithm; Price forecast

ABSTRACT

In this paper, a new forecast strategy is proposed for day-ahead prediction of electricity prices, which are valuable information for both producers and consumers in the new competitive electric power markets. Electricity price is, however, a nonlinear, volatile and time-dependent signal with many outliers. Our forecast strategy is composed of a preprocessor and a Hybrid Neuro-Evolutionary System (HNES). The preprocessor selects the input features of the HNES according to the MRMR (Maximum Relevance Minimum Redundancy) principle. The HNES is composed of three Neural Networks (NN) and Evolutionary Algorithms (EA) in a cascaded structure with a new data flow among its building blocks. The effectiveness of the whole proposed method is demonstrated on real data of the PJM and Spanish electricity markets, and the proposed price forecast strategy is compared with some of the most recent techniques in the area.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction

In recent years, the traditional vertically integrated electric utility structure has been deregulated and replaced by a competitive market scheme in many countries worldwide [1]. The main objective of power system restructuring and deregulation is to introduce competition into the power industry and to give market participants more choice in how they trade electricity and ancillary services. In restructured power systems with a hybrid market structure, generation companies (or customers) can sell (or buy) electricity either through a centralized power pool or directly through bilateral contracts [2]. The deregulation of the electricity industry is radically changing the manner in which electric power utilities do business. Optimal decisions are now highly dependent on market electricity prices: decisions regarding the operation of a power plant, the acquisition of a new machine, or the disposal of an existing plant are all influenced by forecast market prices of electricity. The basic risk management functions of market participants also depend on knowledge of the future behavior of electricity prices in the market. The volatility of the wholesale market price of electricity during the last several years clearly indicates that there is a strong need in the electric power industry for practical models
that suitably represent the stochastic behavior of the market price of electricity [3].

One of the key components of liberalized power sectors is the short-term electricity market, where hourly energy prices are set [4]. The market can be settled by two main settlement mechanisms: the pay-as-offer (also referred to as pay-as-bid) mechanism, where each selected supplier is paid at its offer price, and the pay-at-MCP mechanism, where all selected suppliers are paid at a uniform market clearing price (MCP), usually the price of the most expensive selected offer. In practice, the pay-at-MCP settlement mechanism is the widely accepted and used one [5,6].

Companies that trade in electricity markets make extensive use of price forecast techniques, either to bid or to hedge against volatility. A producer that is able to predict pool prices can adjust its own price/production schedule according to the hourly pool prices and its own production costs. Similarly, once a good next-day price forecast is available, a large consumer can derive a plan to maximize its own utility using the electricity purchased from the pool. Besides, a good knowledge of future pool prices is very useful for valuating bilateral contracts more accurately [7]. In [5] it has been shown that MCPs are the most important factor in determining settlement costs and have significant impacts on forward transactions outside of the ISO markets.

This paper considers day-ahead price prediction. It is noted that after the day-ahead market, where the major part of the total energy is traded, subsequent short-term market mechanisms (such as intraday markets, ancillary reserves, and real-time
markets) can be executed to provide the final balance between power generation and demand [6]. However, the major part of the volume traded in electricity markets usually corresponds to the day-ahead market [8].

In this paper, a new Hybrid Neuro-Evolutionary System (HNES) is proposed for electricity price forecasting, composed of Neural Networks (NN) and Evolutionary Algorithms (EA) in a cascaded structure. Besides, a new data flow is designed among the building blocks of the HNES to enhance its forecast accuracy. A two-step feature selection method based on the MRMR (Maximum Relevance Minimum Redundancy) principle [9] is used to select the input features of the HNES. Both measured and predicted inputs are considered as candidate inputs, among which the best input features are adaptively selected for each forecast day by the proposed feature selection method.

The remaining parts of the paper are organized as follows. Section 2 gives an overview of the problem, i.e., the complexities of price prediction and its previous solution methods. Section 3 describes the rationale behind the proposed technique and the details of its implementation. The numerical results obtained are presented and discussed in Section 4. Section 5 gives a brief review of the paper and outlines future research.

2. Problem description

Electricity price is a nonlinear, time-variant, nonstationary and volatile signal with multiple periodicities, high-frequency components and significant outliers, i.e., unusual prices (especially in periods of high demand) due to unexpected or uncontrolled events in the electricity markets [6,10]. In general, the electricity market has its own complexities arising from its differences from most other commodities. Electrical energy cannot be stored to any significant extent, and power system stability requires a constant balance between generation and load. On short time scales, most users of electricity are unaware of, or indifferent to, its price. Transmission bottlenecks usually limit the transportation of electricity from one region to another. These facts drive the extreme price volatility, or even price spikes, of the electricity market, e.g., the price spikes of the PJM (Pennsylvania–New Jersey–Maryland) and California markets in 1999 and 2000, respectively [6,7]. Besides, some researchers have claimed that a uniform auction worsens spot price volatility compared to a discriminatory auction [7]. Generation companies (GENCOs) can create price volatility and outliers through their strategic bidding or, in other words, by exercising market power [1]. Since in a competitive environment all participants have the freedom to operate independently, the overall level of uncertainty in the operation of the power system increases, and the number of potentially relevant variables proliferates [11]. Real-time market activities can affect MCPs, which makes day-ahead MCP prediction difficult under this market structure [12].

In recent years, several methods have been applied to predict prices in electricity markets. Stationary time series models, such as Auto-Regressive (AR) [13], dynamic regression and transfer function [14], and Auto-Regressive Integrated Moving Average (ARIMA) [15] models, as well as nonstationary time series models like the Generalized Auto-Regressive Conditional Heteroskedastic (GARCH) model [7], have been proposed for this purpose. However, most time series models are linear predictors, while the electricity price is a nonlinear function of its input features.
So, the behavior of the price signal cannot be completely captured by time series techniques [10]. To solve this problem, other research works considered NNs and Fuzzy Neural Networks (FNN) for price forecasting [6,12,16–18]. NNs and FNNs can model nonlinear input/output mapping functions. However, the electricity price is a time-variant signal whose functional relationships vary with time [6,17], so the information derived, or the features extracted, by an NN or FNN may lose
its value. While the NN or FNN may seem to learn the training data well, it may encounter large prediction errors in the test phase. In [19], a combination of a fuzzy inference system (FIS) and least-squares estimation (LSE) has been proposed for the prediction of the LMP (locational marginal price). When the available least-cost energy cannot be delivered to the load in a transmission-constrained area, higher-cost generation units have to be dispatched to meet that load; in this situation, the price of energy in the constrained area is higher than the unconstrained market clearing price. The LMP is defined as the price of the lowest-cost resources available to meet the load, subject to the delivery constraints of the physical network, and is an efficient way of pricing energy supply when transmission constraints exist [19]. In [20], a combination of similar-day and NN techniques is proposed for LMP prediction. Other researchers have tried to decompose the price time series with the aim of finding less volatile and less time-variant components, applying different filtering and decomposition techniques such as the wavelet transform [21], the decoupled extended Kalman filter (DEKF) [18] and the input-output hidden Markov model (IOHMM) [17]. However, by decomposing the price signal, including its high-frequency components, into a finite number of constitutive series or blocks, some information is lost. Besides, the uncertainty of the obtained components may be of the order of that of the original price signal, or even greater. In this paper, a new forecast strategy is proposed that remedies some imperfections of the previous methods for day-ahead electricity price prediction. Details of this forecast strategy are described in the following section.

3. The proposed hybrid intelligent system

As previously described, the electricity price is a nonlinear, time-variant and multi-variable function; for instance, it depends on its previous values, load values, available generation, etc. [10,12,16]. It is very hard for a single NN to capture the correct input/output mapping function of such a signal in all time periods. However, our experience shows that different learning algorithms can usually cover each other's deficiencies in price forecasting, provided that a correct data flow is constructed among them. Some researchers have proposed parallel and cascaded structures for this purpose [12,16,17]. However, these structures only share input data among their building blocks, so the knowledge obtained by one block is not really transferred to the other blocks. To solve this problem, a new HNES with the architecture shown in Fig. 1 is proposed in this paper. The HNES is composed of three NNs, each with a learning algorithm suitable for price forecasting. All NNs of the HNES have the same multi-layer perceptron (MLP) structure. An MLP can approximate any continuous multivariate function to a desired degree of accuracy, provided that it has a sufficient number of hidden neurons [12,16]. Besides, according to Kolmogorov's theorem, an MLP can solve a problem using one hidden layer, provided it has the proper number of neurons [22]. So, one hidden layer is used in the MLP structure of all NNs of the HNES; a minimal sketch of this shared structure follows.
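The following is an illustrative sketch (our own, not the authors' implementation) of such a one-hidden-layer MLP, with helpers for flattening and reloading the parameter vector x; it is this shared structure that allows each NN of the HNES to hand its weights and biases to the next:

```python
import numpy as np

class MLP:
    """One-hidden-layer perceptron. All three NNs of the HNES share this
    structure, so the parameter vector x of one NN can be loaded into the next."""

    def __init__(self, n_in, n_hidden, n_out=1, seed=None):
        rng = np.random.default_rng(seed)
        # Adjustable parameters (the vector x): weights and bias values.
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        # tanh hidden layer, linear output (a common choice for regression).
        H = np.tanh(X @ self.W1.T + self.b1)
        return H @ self.W2.T + self.b2

    def get_params(self):
        """Flatten all adjustable parameters into one vector x."""
        return np.concatenate([p.ravel() for p in (self.W1, self.b1, self.W2, self.b2)])

    def set_params(self, x):
        """Load a parameter vector x, e.g. one produced by the previous NN."""
        i = 0
        for p in (self.W1, self.b1, self.W2, self.b2):
            p[...] = np.asarray(x[i:i + p.size]).reshape(p.shape)
            i += p.size
```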
Fig. 1 shows that each NN transfers two kinds of results to the next NN. The first set of results comprises the obtained values of the adjustable parameters, i.e., the weights and bias values. In other words, each NN transfers its obtained knowledge to the next one, which can therefore begin its learning process from the point where the previous NN terminated (instead of starting from a random point). Only the first NN begins with an initial set of random values for the adjustable parameters. Since all NNs have the same MLP structure, these weights and bias values can be used directly by the next NN, which can then build on the knowledge obtained by the previous one. By suitable selection of the training algorithms of the NNs, the HNES can learn much more than a single NN, i.e., a better tuning of the adjustable parameters can be obtained.
Fig. 1. Structure of the proposed forecast method including the preprocessor, HNES and its initial predictors.
For the price prediction, we considered three NNs in the HNES, with the LM (Levenberg–Marquardt), BFGS (Broyden–Fletcher–Goldfarb–Shanno) and BR (Bayesian Regularization) learning algorithms, respectively. These are among the most efficient NN training mechanisms for prediction tasks [23–25]. Since these learning algorithms are variations of Newton's method, that method is briefly introduced first, and then each learning algorithm is described. Newton's update for minimizing a function V(x) with respect to the vector x is given by [25]:

\[ \Delta x = -\left[ \nabla^2 V(x) \right]^{-1} \nabla V(x) \tag{1} \]

where ∇²V(x) is the Hessian matrix and ∇V(x) is the gradient vector. The vector x comprises the adjustable parameters of the NN, i.e., its weights and bias values. Assuming that V(x) is the sum of squared errors:

\[ V(x) = \sum_{m=1}^{N} e_m^2(x) \tag{2} \]

then:

\[ \nabla V(x) = 2 J^T(x)\, e(x) \tag{3} \]

\[ \nabla^2 V(x) = 2 J^T(x) J(x) + 2 S(x) \tag{4} \]

where e(x) is the error vector:

\[ e(x) = \left[ e_1(x), e_2(x), \ldots, e_N(x) \right]^T \tag{5} \]

and J(x) is the Jacobian matrix:

\[ J(x) = \begin{bmatrix} \dfrac{\partial e_1(x)}{\partial x_1} & \dfrac{\partial e_1(x)}{\partial x_2} & \cdots & \dfrac{\partial e_1(x)}{\partial x_n} \\ \dfrac{\partial e_2(x)}{\partial x_1} & \dfrac{\partial e_2(x)}{\partial x_2} & \cdots & \dfrac{\partial e_2(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial e_N(x)}{\partial x_1} & \dfrac{\partial e_N(x)}{\partial x_2} & \cdots & \dfrac{\partial e_N(x)}{\partial x_n} \end{bmatrix} \tag{6} \]

where n is the dimension of the vector x. Also, S(x) in (4) is given by:

\[ S(x) = \sum_{m=1}^{N} e_m(x)\, \nabla^2 e_m(x) \tag{7} \]

However, the computation of the second-order derivatives of the error vector is expensive. The LM learning algorithm uses a Gauss–Newton approximation to the Hessian matrix: neglecting the second-order derivatives of the error vector, i.e., assuming that S(x) ≈ 0, the Hessian matrix is given by:

\[ \nabla^2 V(x) = 2 J^T(x) J(x) \tag{8} \]
By substituting (8) and (3) into (1), we obtain:

\[ \Delta x = -\left[ J^T(x) J(x) \right]^{-1} J^T(x)\, e(x) \tag{9} \]

The advantage of the Gauss–Newton method over the standard Newton's method is that it does not require the calculation of second-order derivatives. Nevertheless, the matrix J^T(x)J(x) in (9) may not be invertible. This is overcome in the LM algorithm, which consists in finding the update given by:

\[ \Delta x = -\left[ J^T(x) J(x) + \mu I \right]^{-1} J^T(x)\, e(x) \tag{10} \]

where I is the identity matrix and μ is a scalar parameter that is conveniently modified during the algorithm iterations. More details about this matter can be found in [24,25].

The BFGS learning algorithm is also derived from Newton's method in optimization. BFGS is the most successful quasi-Newton method for NN training [23]. In quasi-Newton methods, the Hessian matrix of second derivatives of the function to be minimized, i.e., ∇²V(x), never needs to be computed explicitly; it is instead updated by analyzing successive gradient vectors. In the BFGS algorithm, starting from an initial guess x₀ and an approximate Hessian matrix B₀, the following steps are repeated until x converges to the solution:

(1) Obtain a step s_k by solving:

\[ B_k s_k = -\nabla V(x_k) \tag{11} \]

(2) Perform a line search to find the optimal step size α_k in the direction found in the first step, then update:

\[ x_{k+1} = x_k + \alpha_k s_k \tag{12} \]

Mathematical details about the line search method can be found in [26].

(3) Compute the gradient difference:

\[ y_k = \nabla V(x_{k+1}) - \nabla V(x_k) \tag{13} \]

(4) Update the approximate Hessian:

\[ B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k (B_k s_k)^T}{s_k^T B_k s_k} \tag{14} \]

Practically, B₀ can be initialized with B₀ = I, so that the first step is equivalent to gradient descent, while later steps are progressively refined by B_k, the approximation to the Hessian. More details about the BFGS learning algorithm can be found in [23,24].
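As an illustration of the update (10), the following sketch (ours; the network details are abstracted behind `residuals` and `jacobian` callables) implements one damped Gauss–Newton step and a common schedule for the scalar μ:

```python
import numpy as np

def lm_step(x, residuals, jacobian, mu):
    """One LM update, eq. (10): the Gauss-Newton matrix J^T J is damped by
    mu*I so that the linear system is always solvable.
    residuals(x) returns e(x) (length N); jacobian(x) returns J(x) (N x n)."""
    e = residuals(x)
    J = jacobian(x)
    A = J.T @ J + mu * np.eye(x.size)
    return x - np.linalg.solve(A, J.T @ e)   # eq. (10), via a linear solve

def lm_train(x, residuals, jacobian, mu=1e-2, n_iter=100):
    """Common mu adaptation: shrink mu after an improving step (closer to
    Gauss-Newton), grow it after a failed step (closer to gradient descent)."""
    v = float(np.sum(residuals(x) ** 2))     # V(x), eq. (2)
    for _ in range(n_iter):
        x_new = lm_step(x, residuals, jacobian, mu)
        v_new = float(np.sum(residuals(x_new) ** 2))
        if v_new < v:
            x, v, mu = x_new, v_new, mu * 0.1
        else:
            mu *= 10.0
    return x
```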
The BR learning algorithm minimizes a linear combination of squared errors and weights, and modifies this combination so that, at the end of training, the resulting network has good generalization qualities. Bayesian regularization takes place within the LM algorithm. The optimal regularization technique requires computation of the Hessian matrix; to minimize the computational overhead, the Gauss–Newton approximation to the Hessian, available when using the LM algorithm for network training (shown in (8)), is used. Typically, training aims to reduce the sum of squared errors represented in (2). Regularization, however, adds a further term, so that the objective function becomes:

\[ F(x) = \beta V(x) + \alpha U(x) \tag{15} \]

where V(x) is the sum of squared errors defined in (2) and U(x) is the sum of squares of the network weights:

\[ U(x) = \sum_{i=1}^{n} x_i^2 \tag{16} \]

In (15), α and β are the parameters of the objective function F(x), which should be minimized in the BR learning algorithm. The BR algorithm can be summarized as the following step-by-step procedure:

(1) Initialize α and β of the objective function and the weights x of the NN. The initial values are chosen as α = 0 and β = 1; after the first training step, the objective function parameters recover from this initial setting. Also, in the HNES, the BR learning algorithm is used for NN3, as shown in Fig. 1, so the initial values of the vector x in its BR learning algorithm are the values obtained by the previous neural network, NN2, trained by BFGS.

(2) Take one step of the LM algorithm to minimize the objective function; here, F(x) instead of V(x) is the objective function.

(3) Compute the effective number of parameters defined for the BR algorithm:

\[ \gamma = n - 2\alpha \operatorname{tr}\!\left( \left[ \nabla^2 F(x) \right]^{-1} \right) \tag{17} \]

where tr(·) denotes the trace function, i.e., the sum of the diagonal elements of its matrix argument. The parameter γ is a measure of how many parameters of the neural network are effectively used in reducing the error function; it can range from zero to n. For (17), the Gauss–Newton approximation to the Hessian available in the LM learning algorithm is used. From (15), we have:

\[ \nabla^2 F(x) = \beta \nabla^2 V(x) + \alpha \nabla^2 U(x) \tag{18} \]

Using the Gauss–Newton approximation, the first term on the right-hand side of (18) is obtained as shown in (8). The second term, based on (16), can be written as:

\[ \nabla^2 U(x) = 2I \tag{19} \]

Based on (8) and (19), (18) becomes:

\[ \nabla^2 F(x) = 2\beta J^T(x) J(x) + 2\alpha I \tag{20} \]

where J(x) is the Jacobian matrix given by (6).

(4) Compute new estimates for the objective function parameters [27]:

\[ \alpha = \frac{\gamma}{2U(x)} \tag{21} \]

\[ \beta = \frac{N - \gamma}{2V(x)} \tag{22} \]

(5) Iterate steps 2 through 4 until convergence.

More details about the BR learning algorithm can be found in [23,27].

LM is a fast learning algorithm: at the beginning of the training phase, the MLP can usually learn rapidly about the problem, and its training error quickly decreases. So the LM algorithm is selected for the first NN. The BFGS learning algorithm can perform a better search of the solution space, provided that it starts from a suitable initial point. The solution space of a NN can be considered as a vector space whose dimensions are the adjustable parameters of the NN. After the LM algorithm saturates, the second NN is trained by the BFGS mechanism to find a fitter solution for the adjustable parameters. The BR learning algorithm is used for the last NN, for final tuning of the adjustable parameters and to obtain the utmost training efficiency.

To avoid the overfitting problem of NNs, a cross-validation technique is used in the HNES, in which a part of the training set is removed and retained as the validation set. Training of each NN of the HNES is performed with the remaining part of the training set, so the validation set stays unseen by each NN.
The prediction error of each NN on the validation set can serve as a measure of its error for the forecast day, evaluating the generalization capability of the NN. The more similar the price behavior in the validation and test periods, the more effective the cross-validation technique. We therefore examined several alternatives for the validation set, such as the day before the forecast day, the same day of the previous week, and randomly chosen validation sets; among them, the day before the forecast day as the validation period gave the best performance of the cross-validation technique. The progress of the training phase of each NN is controlled by the validation error (the error on the validation set). When the NN begins to overfit the training samples, the validation error will typically begin to rise [22]; at this point the training phase is terminated (early stopping), as sketched below.
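A minimal sketch of this validation-driven early stopping, reusing the get_params/set_params helpers of the MLP sketch above (the patience value is our assumption; the paper does not state one):

```python
def train_with_early_stopping(nn, train_step, val_error, max_epochs=1000, patience=5):
    """Stop training when the validation error (here, the day before the
    forecast day) starts to rise. train_step(nn) performs one training
    iteration in place; val_error(nn) returns the MSE on the held-out day."""
    best_x, best_err, bad = nn.get_params(), val_error(nn), 0
    for _ in range(max_epochs):
        train_step(nn)
        err = val_error(nn)
        if err < best_err:
            best_x, best_err, bad = nn.get_params(), err, 0
        else:
            bad += 1
            if bad >= patience:      # validation error keeps rising: overfitting
                break
    nn.set_params(best_x)            # restore the best parameters seen
    return best_err
```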
Another technique used to enhance the performance of the HNES is local search. Empirically, we have observed that in NN applications for electricity price prediction, near-optimum solutions are close to each other in the solution space. At the end of the training phase, the learning algorithm may find one of these solutions while better ones lie in its vicinity, unseen by the NN, since learning algorithms usually search the solution space in one particular direction (such as the steepest descent). This motivates searching around the final solution of the learning algorithm in as many directions as possible, to find a better solution. An evolutionary algorithm (EA) is a suitable candidate for this task. The evolution of the proposed EA is based on the following relations:

\[ \Delta x_i(j+1) = m \cdot \Delta x_i(j) + (1 - m) \cdot g \cdot x_i(j) \tag{23} \]

\[ x_i(j+1) = x_i(j) + \Delta x_i(j+1) \tag{24} \]
where x_i is the i-th element of the vector x, i.e., the i-th adjustable parameter (weight or bias value) of the NN, with 1 ≤ i ≤ n, and Δx_i denotes the change of x_i. The subscripts j and j + 1 represent two successive generations (parent and child, respectively) of the EA. g is a small random number generated separately for each adjustable parameter, and m is a momentum constant; the momentum smooths the search path by damping sudden changes. In our examinations, m = 0.5 and g is drawn from the range (0, 0.1) in all generations of the EA. In other words, a uniform search is adopted for the proposed EA, without localizing techniques such as a hill-climbing operator [28]. Non-uniform searches are suitable when the solution space has a single optimum to which the stochastic search technique, such as an EA, should converge; in our problem, however, there are several optima, any of which may be better than the others.
At the beginning of the EA, x_i(0) is the value obtained from the learning algorithm for the adjustable parameter x_i, and Δx_i(0) = 0. In each cycle, the EA applies (23) and (24) to all elements x_i (i = 1, 2, ..., n) until the next generation of all adjustable parameters is obtained. Then the error function (validation error) of the NN is evaluated for the new generation. If the child has a lower validation error than its parent, the parent is replaced by the child; otherwise, the parent is restored and the next cycle of the EA is executed. So, at the end of the EA, the best solution examined among all generations is selected. The EA is executed after each learning algorithm of the HNES to maximize the training efficiency. This is shown in Fig. 1 for the three neural networks of the HNES as "NN1 (LM + EA1)", "NN2 (BFGS + EA2)" and "NN3 (BR + EA3)", respectively. So, the whole training phase of the HNES can be represented as:

\[ x_{\mathrm{Initial}} \xrightarrow{\text{LM}} x(\mathrm{LM}) \xrightarrow{\text{EA1}} x(\mathrm{NN1}) \xrightarrow{\text{BFGS}} x(\mathrm{BFGS}) \xrightarrow{\text{EA2}} x(\mathrm{NN2}) \xrightarrow{\text{BR}} x(\mathrm{BR}) \xrightarrow{\text{EA3}} x(\mathrm{NN3}) \tag{25} \]

In (25), x_Initial denotes the initial values of the vector x of adjustable parameters (weights and bias values) of the NN, obtained by random initialization. The LM learning algorithm of NN1 of the HNES, shown in Fig. 1, begins from x_Initial. After NN1 is trained by the LM algorithm, the obtained values of the adjustable parameters, denoted x(LM), are given to the corresponding EA, denoted EA1 in (25) and Fig. 1. EA1 then further optimizes x(LM) based on (23) and (24). The values obtained by EA1, denoted x(NN1), are the adjustable parameters of the first neural network, NN1. x(NN1) is also passed to the second neural network of the HNES, represented by NN2 in Fig. 1. The BFGS learning algorithm of NN2 begins from x(NN1), and its result, denoted x(BFGS), is given to EA2 for further optimization. Similarly, the values obtained by EA2, i.e., x(NN2), are the adjustable parameters of NN2 and are also given as initial values to the BR learning algorithm of NN3. In the same way, the values obtained by BR, i.e., x(BR), are given to EA3, whose result, x(NN3), is the vector of adjustable parameters used by NN3. Note that all three EA parts, EA1, EA2 and EA3 (shown in Fig. 1 and (25)), use the same EA-with-momentum mechanism represented in (23) and (24), sketched below.
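A minimal sketch of one EA part (EA1, EA2 or EA3) implementing (23) and (24) with the parent/child replacement rule; val_error(x) is assumed to return the validation-set error of the network with parameter vector x:

```python
import numpy as np

def ea_local_search(x, val_error, m=0.5, g_max=0.1, n_generations=50, seed=None):
    """Momentum-smoothed local search around a trained solution x.
    Each generation produces a child via eqs. (23)-(24); the child replaces
    its parent only if it lowers the validation error."""
    rng = np.random.default_rng(seed)
    dx = np.zeros_like(x)                          # Delta x_i(0) = 0
    best_err = val_error(x)
    for _ in range(n_generations):
        g = rng.uniform(0.0, g_max, size=x.size)   # small random gain per parameter
        dx_child = m * dx + (1.0 - m) * g * x      # eq. (23)
        x_child = x + dx_child                     # eq. (24)
        err = val_error(x_child)
        if err < best_err:                         # child wins: replace the parent
            x, dx, best_err = x_child, dx_child, err
        # otherwise the parent is restored and the next cycle is executed
    return x, best_err
```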
Fig. 2. Typical curve for the validation error of the HNES.
To give better insight into the performance of the HNES in the training phase, a typical curve of its validation error is shown in Fig. 2 (the vertical axis has a logarithmic scale). The validation error is the MSE (mean square error) over the 24 h of the day before the forecast day. This figure has been obtained for the PJM electricity market with a training set from April 10, 2006 to May 28, 2006. As seen from Fig. 2, the LM, BFGS and BR learning algorithms run for 42, 43 (97 − 54) and 49 (162 − 113) training iterations, respectively, under the early stopping technique. The figure also shows that the three EA parts, EA1, EA2 and EA3, run for 12 (54 − 42), 16 (113 − 97) and 18 (180 − 162) generations, respectively. Over the whole training phase, the validation error decreases by a factor of about 3.92 × 10⁵ (from 8.27 × 10⁻¹ to 2.11 × 10⁻⁶). Bearing in mind that the validation error is the prediction error for unseen data (the validation set, i.e., the day before the forecast day, which is removed from the training set), this shows the great generalization capability of the HNES for the price prediction problem. The contribution of the different learning algorithms and EA executions to this generalization capability can be seen in Fig. 2: while the NNs of the HNES perform a global search in the vector space, the EAs add a local search capability to the HNES.

Correlation analysis has recently been used by several researchers for feature selection in electricity price forecasting [19,29,30]. The correlation coefficient between two random variables a and b, denoted ρ_{a,b}, can be computed from the following relation:

\[ \rho_{a,b} = \frac{\operatorname{cov}(a,b)}{\sigma_a \, \sigma_b} \tag{26} \]

where cov(a,b) is the covariance of the random variables a and b:

\[ \operatorname{cov}(a,b) = E\big((a - \mu_a)(b - \mu_b)\big) \tag{27} \]

Here E(·) is the expected value operator, and μ_a and μ_b represent the expected values of a and b, respectively:

\[ \mu_a = E(a) \tag{28} \]

\[ \mu_b = E(b) \tag{29} \]

Also, σ_a and σ_b in (26) are the standard deviations of a and b, respectively:

\[ \sigma_a = \sqrt{E\big((a - \mu_a)^2\big)} \tag{30} \]

\[ \sigma_b = \sqrt{E\big((b - \mu_b)^2\big)} \tag{31} \]
Previous works consider only the relevance of the candidate inputs to the target variable, measured by the correlation coefficient between each candidate input and the target. However, another important aspect of feature selection is the filtering of redundant information and collinear inputs, which can degrade the training phase of the NNs, since the NNs would have to search a spuriously enlarged solution space. Here, a two-step correlation analysis based on the MRMR principle [9] is used for feature selection. The proposed technique evaluates the linear independence of the candidate inputs in addition to their relevance to the target variable, and so it filters out redundant information and collinear candidate inputs as well as irrelevant candidates. The higher the correlation between the output feature and a candidate variable, the greater the chance of that variable being selected as an input feature. So, if the correlation coefficient, defined in (26), between a candidate variable and the output feature is greater than a prespecified value cor1, the candidate is retained for further processing; otherwise, it is not considered any further. Next, a cross-correlation analysis is performed on the retained candidates. If the correlation coefficient between any two candidate variables is smaller than a prespecified value cor2, both variables are retained; otherwise, only the variable with the larger correlation coefficient with the output feature is retained, while the other is discarded. Less correlation between candidate variables means more linear independence and hence more information content. The variables retained after the two steps of correlation analysis are selected as the input features of the HNES, as sketched below.
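A minimal sketch of the two-step correlation analysis (the threshold values shown are placeholders; cor1 and cor2 are tuned on the validation set, as described later in this section):

```python
import numpy as np

def two_step_selection(X, y, cor1=0.3, cor2=0.9):
    """Two-step correlation analysis in the spirit of the MRMR principle.
    X: (samples x candidates) matrix of normalized candidate inputs;
    y: target variable (the next-hour price)."""
    n_cand = X.shape[1]
    # Step 1 (relevance): keep candidates well correlated with the target.
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_cand)])
    kept = [j for j in range(n_cand) if rel[j] > cor1]
    # Step 2 (redundancy): among highly cross-correlated candidates, keep
    # only the one with the larger correlation with the target.
    kept.sort(key=lambda j: -rel[j])              # most relevant first
    selected = []
    for j in kept:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < cor2 for k in selected):
            selected.append(j)
    return selected
```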
For the price prediction, the candidate inputs contain different lags of the price (the auto-regression part) and exogenous variables such as hourly system demand, hourly generation, surplus generation, fuel costs, generation companies' shares, aggregated supply functions, day indicator, transmission congestion, etc. (the cross-regression part) [6,19]. Of the exogenous variables, the load demand (the most important price driver [20]), plus the auto-regression part, is considered here; this is also the approach adopted by several previous researchers [7,14,15,19,20,31]. 200 lagged values of the hourly price (P(h−199), ..., P(h)) and hourly load (L(h−199), ..., L(h)) are considered in the candidate set of input features, to capture the effects of the short-run trend, daily periodicity and weekly periodicity [29,30]. This set of 2 × 200 = 400 candidate inputs is first normalized, to eliminate the effect of the different ranges of the variables (price and load), and then refined by the feature selection technique. These tasks are performed in the preprocessor, as shown in Fig. 1 (a construction sketch is given after this passage).

Besides, we discovered that the predicted values obtained from other forecast techniques can also provide useful information for the price forecast. As seen from Fig. 1, the price forecast of the next hour, P̂(h+1), is provided to the HNES by an ARIMA time series model [10,14,15,21] (the hat denotes a predicted value). Besides, the load forecast of the next hour, L̂(h+1), is provided to the HNES by a batch NN with an MLP structure and the LM learning algorithm. So, the input features of the HNES consist of the inputs selected by the two-step correlation analysis (from among the 400 candidates) plus P̂(h+1) (ARIMA) and L̂(h+1) (batch NN). These input features are given to the first NN (NN1) of the HNES (Fig. 1). NN2 of the HNES uses the same input features, except that P̂(h+1) of the ARIMA model is replaced by P̂(h+1) of NN1; similarly, NN3 uses P̂(h+1) of NN2. P̂(h+1) is the second kind of result transferred between the NNs of the HNES (the first set of transferred results comprises the adjustable parameters, as shown in Fig. 1). In other words, each NN of the HNES uses both the obtained knowledge and the price forecast of its previous NN; it then enhances this knowledge and produces a more accurate price prediction (Fig. 2). The ARIMA model only uses the inputs selected by the two-step correlation analysis to produce the initial price forecast for the HNES (Fig. 1).

As seen from Fig. 1, P̂(h+1) (ARIMA) and L̂(h+1) (batch NN) are given directly to the HNES, without participating in the feature selection process. This is because, in our experiments, when these two variables are inserted into the candidate set of inputs (giving 400 + 2 = 402 candidates), they are always selected by the two-step correlation analysis: not only do P̂(h+1) and L̂(h+1) have high correlations with the output feature P(h+1), so that they pass the first part of the correlation analysis, but they also win the competition with the other variables and so pass the cross-correlation part. Similarly, the forecast obtained by each NN of the HNES is used by the next one, since the price forecast of each block usually has better accuracy than that of the previous one and is therefore a better choice as an input feature for the next block.
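For illustration, the candidate set for forecasting hour h + 1 can be assembled as follows (a sketch under the simplifying assumption that each window is normalized on its own; in the paper, normalization is done in the preprocessor):

```python
import numpy as np

def build_candidates(price, load, h, n_lags=200):
    """Candidate inputs for hour h+1: the 200 most recent hourly prices
    P(h-199)..P(h) and loads L(h-199)..L(h), i.e. 2 x 200 = 400 values,
    normalized to remove the different ranges of price and load."""
    p = np.asarray(price[h - n_lags + 1 : h + 1], dtype=float)
    l = np.asarray(load[h - n_lags + 1 : h + 1], dtype=float)
    return np.concatenate([(p - p.mean()) / p.std(),
                           (l - l.mean()) / l.std()])
```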
It is noted that in [12] the load forecast was considered as an input feature in a cascaded structure of NNs. We extend this idea by considering both load and price forecasts in the feature selection. Besides, in [12] the measured value of L(h+1) was used in the training phase, whereas we use L̂(h+1) for the training of the HNES (all three NNs). In this way, the HNES sees the load forecast error during the learning phase, so its NNs are better adapted to the prediction phase, when L(h+1) is not available. Finally, our key idea of transferring the obtained knowledge of one NN (its adjustable parameters) to the next is not found in [12].
Now, the whole setup process of the forecast method, including the training phase of the HNES and its preprocessing, can be summarized as the following step-by-step algorithm:

(1) Select the training period (e.g., the past 50 days) and remove the day before the forecast day from this period for cross-validation. This day should be unseen by both the preprocessor and the forecasters (batch NN, ARIMA and HNES), to truly simulate the situation of the forecast day.

(2) Construct the set of candidate inputs, including lagged prices and loads up to 200 h back. These candidate inputs are normalized and then refined by the feature selection technique. The selected features plus P̂(h+1) (ARIMA) and L̂(h+1) (batch NN) constitute the input features of the HNES. The training samples of the HNES consist of these input features and one output feature, dedicated to the price of the next hour.

(3) The three NNs of the HNES are trained by the LM + EA1, BFGS + EA2 and BR + EA3 training mechanisms, respectively. At the end of the training phase of each NN, its adjustable parameters and price forecast are transferred to the next one. NN1 of the HNES is trained directly on the training samples constructed in step 2; however, in the training samples of NN2 and NN3, the input feature P̂(h+1) of ARIMA is replaced by P̂(h+1) of NN1 and NN2, respectively.

The only remaining part of the setup of the proposed forecast method is the fine-tuning of its degrees of freedom: cor1 and cor2 of the feature selection technique and the number of hidden nodes of the NNs of the HNES (NH). All NNs of the HNES have the same number of hidden nodes NH, so that their adjustable parameters can be transferred. To tune cor1, cor2 and NH, the setup process of the HNES (as described in the above step-by-step algorithm) is executed with different values of these parameters, and the values of cor1, cor2 and NH yielding the least validation error are selected. To reduce the number of combinations to be examined, the two-step procedure of [22] is also used.

After the setup process, the HNES can predict the hourly prices of the forecast day from the same price and load features chosen by the feature selection technique. One point about the prediction phase should be mentioned, however. The variables of the short-run price trend, e.g., P(h), P(h−1) and P(h−2), are usually among the features selected by the two-step correlation analysis. In day-ahead price forecasting, the measured values of these features are not known for later hours (e.g., for the price forecast 3 h ahead, 4 h ahead, etc.), so their predicted values must be used. Thus, the price forecast of one hour is used to predict the next hour, and this cycle repeats, propagating error along the forecast horizon. Consequently, later hours usually contain larger forecast errors, which is referred to as the lead time effect [32]. If more accurate forecasts are used for the initial hours, error propagation decreases and better predictions are obtained for all later hours. Thus, each NN of the HNES (and also the ARIMA model), instead of recursively forecasting the whole 24 h ahead, predicts only the next hour and transfers it to the following block, until the final prediction of the HNES for that hour is obtained by NN3 and inserted into the database (shown in Fig. 1). This final forecast, having the least prediction error in our system, is then used by the ARIMA model and the NNs of the HNES to predict the next hour. This cycle is repeated until the prices of the whole 24 h ahead are forecast; a sketch of this cycle follows.
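The hour-by-hour prediction cycle can be outlined as follows; every name here (hnes, arima, batch_nn, history with its append_price method, select) is a hypothetical placeholder for the corresponding trained block of Fig. 1, not the authors' code:

```python
def forecast_day(hnes, arima, batch_nn, history, select):
    """Day-ahead forecasting, one hour at a time. The final HNES forecast of
    each hour is fed back into the history, so the next hour is predicted
    from the most accurate values available (limiting error propagation)."""
    forecasts = []
    for _ in range(24):
        features = select(history)        # lagged prices/loads chosen by the preprocessor
        p_hat = arima(features)           # initial next-hour price forecast
        l_hat = batch_nn(history)         # next-hour load forecast
        p_final = hnes(features, p_hat, l_hat)   # NN1 -> NN2 -> NN3 cascade
        forecasts.append(p_final)
        history.append_price(p_final)     # insert the final forecast into the database
    return forecasts
```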
4. Numerical results

The proposed price forecast method is examined on the day-ahead electricity markets of PJM and mainland Spain, two well-established electricity markets in the US and Europe, respectively. Moreover, the proposed method is compared with
some of the most recent price forecast techniques. Data for the PJM and Spanish electricity markets have been obtained from [33,34], respectively.

The first examination of this paper has been performed with real data of the PJM electricity market for the year 2006. The results obtained for several test days and weeks are shown in Table 1 and compared with the results of [20], which proposed a combination of similar days (SD) and NN techniques for day-ahead price forecasting. For the sake of a fair comparison, the same test days and weeks as in [20] have been considered in Table 1, and the SD + NN results are quoted directly from [20].

Table 1
MAPE of the proposed method and SD + NN for the PJM electricity market in year 2006.

Test period       SD + NN (%) [20]    Proposed method (%)
January 20        6.93                4.98
February 10       7.96                4.10
March 5           7.88                4.45
April 7           9.02                4.67
May 13            6.91                4.05
February 1–7      7.66                4.62
February 22–28    8.88                4.66

Besides, the same error criterion as in that reference, MAPE (Mean Absolute Percentage Error), has been used to measure the forecast error:

\[ \mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| P_{(i)} - \hat{P}_{(i)} \right|}{P_{\mathrm{AVE}}} \tag{32} \]

\[ P_{\mathrm{AVE}} = \frac{1}{N} \sum_{i=1}^{N} P_{(i)} \tag{33} \]

where P(i) and P̂(i) are the actual and forecast price of hour i, respectively, P_AVE is the average of the actual prices as defined in (33), and N is the forecast horizon: N = 24 for daily MAPE and N = 168 for weekly MAPE. For the 5 test days and 2 test weeks of Table 1, daily MAPE and weekly MAPE are used, respectively. It is noted that the classical definition of MAPE has P(i) in its denominator [6]; P_AVE is used here instead, to avoid the adverse effect of prices close to zero [20].
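A direct transcription of (32) and (33) (our sketch):

```python
import numpy as np

def mape(p_actual, p_forecast):
    """MAPE per eqs. (32)-(33): errors are normalized by the average actual
    price P_AVE rather than by P(i), to avoid prices close to zero."""
    p_actual = np.asarray(p_actual, dtype=float)
    p_forecast = np.asarray(p_forecast, dtype=float)
    p_ave = p_actual.mean()                                  # eq. (33)
    return np.mean(np.abs(p_actual - p_forecast)) / p_ave    # eq. (32)
```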
The forecasting error is the main concern for power engineers; a lower error indicates a better result. For all test days and weeks of Table 1, the proposed method outperforms SD + NN, and the differences between the forecast errors of the two methods are considerable. In addition to the mean error, the stability of the results is another important factor when comparing forecast methods. Table 2 presents the variance of the prediction errors, as a measure of uncertainty, for the forecast days and weeks; again, the variance values of SD + NN are quoted from [20]. The proposed method also has much lower error variances than SD + NN in all test days and weeks, indicating less uncertainty in its predictions.

Table 2
Error variance of the proposed method and SD + NN for the PJM electricity market in year 2006.

Test period       SD + NN [20]    Proposed method
January 20        0.0034          0.0020
February 10       0.0050          0.0012
March 5           0.0061          0.0015
April 7           0.0038          0.0018
May 13            0.0049          0.0013
February 1–7      0.0066          0.0016
February 22–28    0.0047          0.0017

To also give a graphical view of the price forecast accuracy of the proposed method, its results for the second test week of Table 1 (February 22–28) are shown in Fig. 3.
Fig. 3. Hourly prices (dark), price forecasts (blue), and absolute value of forecast errors (red) for the second test week of Table 1 (February 22–28, 2006). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.).
As seen from this figure, the forecast curve follows the price curve acceptably. The error curve in Fig. 3 is uniformly close to zero, without sharp error peaks, in accordance with the low error variance of the proposed method (0.0017). As another comparison, it is noted that ARMA time series, GARCH time series, FFNN (feed-forward neural network), SVM (support vector machine), FIS (fuzzy inference system), LSE (least-squares estimation), and a combination of FIS and LSE have been applied to day-ahead price forecasting of the PJM electricity market in [19], where different candidate variables were also examined. However, their best MAPE value is about 9.6%, while our MAPE values for the different test days and weeks of the PJM electricity market are in the range 4.05–4.98%, as shown in Table 1.

To better illustrate the performance of the HNES, the results obtained by its modules are presented separately. For the sake of conciseness, this detailed analysis is performed only for the first test day of Tables 1 and 2 (January 20, 2006 in the PJM electricity market); similar results are obtained for the other test periods. The features selected by the feature selection technique for January 20, 2006 are {P(h−1), P(h−2), P(h−12), P(h−22), P(h−23), P(h−24), P(h−25), P(h−26), P(h−47), P(h−48), P(h−49), P(h−71), P(h−72), P(h−73), P(h−96), P(h−120), P(h−144), P(h−168), L(h−1), L(h−2), L(h−24)}. These inputs are selected from the 400 candidates (200 lagged prices and 200 lagged loads) and show the effect of the short-run trend, daily periodicity and weekly periodicity. These selected inputs plus P̂(h+1) (obtained from ARIMA) and L̂(h+1) (obtained from the batch NN) constitute the input features of the HNES (Fig. 1). The MAPE and error variance of the outputs of ARIMA, NN1, NN2 and NN3 for January 20, 2006 are shown in Table 3. The output of NN3 is the final output of the HNES and has the same MAPE (4.98%) and error variance (0.0020) as reported in Tables 1 and 2 for January 20, 2006.
Table 3
MAPE and error variance of the output of each part of the proposed method (January 20, 2006).

Output    MAPE (%)    Error variance
ARIMA     10.34       0.0038
NN1       6.16        0.0030
NN2       5.32        0.0026
NN3       4.98        0.0020
Fig. 2 shows the contribution of each training mechanism (LM, BFGS and BR) and each EA part to the whole training phase of the HNES, in terms of the validation error, whereas Table 3 shows the contribution of each trained NN (by its respective training mechanism and EA part) to the prediction phase of the HNES, i.e., how much each trained NN improves the price forecast accuracy. It is noted that the initial price forecast obtained from ARIMA for January 20, 2006 is poorer than the SD + NN forecast for this test day in terms of both error criteria (10.34% MAPE and 0.0038 error variance for ARIMA versus 6.93% MAPE and 0.0034 error variance for SD + NN). However, as the ARIMA forecast passes through the modules of the HNES, its accuracy and stability improve step by step, until a price forecast better than SD + NN is obtained at the output of the HNES (4.98% MAPE and 0.0020 error variance).

The second examination of this paper has been performed with real data of the Spanish electricity market for the year 2002. The results obtained for four weeks corresponding to the four seasons of 2002 are presented in Tables 4 and 5. In [6,21,25], the fourth weeks of February, May, August and November are selected for the winter, spring, summer and fall seasons, respectively; for a fair comparison between the technique of this paper and the methods of [6,21,25], the same test weeks have been considered in this examination.

Table 4
MAPE for 4 weeks of the Spanish electricity market in year 2002.

Test week    ARIMA (%)    Wavelet-ARIMA (%)    FNN (%)    NN (%)    HNES (%)
Winter       6.32         4.78                 4.62       5.23      4.28
Spring       6.36         5.69                 5.30       5.36      4.39
Summer       13.39        10.70                9.84       11.40     6.53
Fall         13.78        11.27                10.32      13.65     5.37
Average      9.96         8.11                 7.52       8.91      5.14

Table 5
Error variance for 4 weeks of the Spanish electricity market in year 2002.

Test week    ARIMA     Wavelet-ARIMA    FNN       HNES
Winter       0.0034    0.0019           0.0018    0.0013
Spring       0.0020    0.0025           0.0019    0.0015
Summer       0.0158    0.0108           0.0092    0.0033
Fall         0.0157    0.0103           0.0088    0.0022
Average      0.0092    0.0064           0.0054    0.0021

In Tables 4 and 5, the results of ARIMA and of ARIMA plus the wavelet transform are quoted from [21]; the wavelet transform is used there to decompose the price signal into less volatile components. The fourth column of Table 4 gives the results of the FNN (quoted from [6]), in which, instead of decomposing the price values, the solution space is softly divided into subspaces and each functional relationship of the price is implemented in one subspace. The fifth column of Table 4 shows the results of an MLP neural network with the LM learning algorithm, quoted from [25]. As seen from Table 4, the HNES has better forecast accuracy than all the other considered methods in all time periods, owing to its greater learning capability: the HNES performs a more thorough search of the solution space and obtains a better understanding of the future behavior of the price signal. Besides, the prediction of the HNES is more stable. In the summer and fall seasons, the volatility of the Spanish electricity market increases [6,21], so all the other methods incur considerably larger prediction errors. The accuracy of the HNES, however, varies much less, since it searches many directions of the solution space and can therefore find
solutions with similar qualities for different time periods. In other words, the volatility of the market has less effect on the HNES, an attractive characteristic for market participants. Besides, Table 5 shows that the HNES has lower error variances than the other methods in all time periods, indicating less uncertainty in the HNES predictions. For the NN approach proposed in [25], variance values have not been reported.

The total setup time of the proposed method, including the execution of the preprocessor, the training of the HNES (with the fine-tuning of its degrees of freedom), and the training of ARIMA and the batch NN, was about 30 min on a Pentium 4 2.8-GHz personal computer with 512 MB of RAM. Although this training time is longer than that of a single ARIMA, NN or FNN model, it is acceptable within a day-ahead decision-making framework, for which the proposed method is designed. After training, the response time of the HNES is negligible (about 1 s), since it involves only the forward propagation of the neural networks.

5. Conclusion

An accurate forecast of electricity prices is important for the companies that trade in electricity markets: with more accurate price forecasts, these companies can devise better bidding strategies and minimize their financial risks. However, the electricity price is a complex time series with a nonlinear, volatile and time-dependent structure, and utility companies usually have limited and uncertain information for price prediction. An efficient and robust forecasting method is therefore very valuable for the companies trading in electricity markets, which has motivated many research works in the area, especially in the last decade. In this work, a new hybrid strategy has been proposed for price prediction, consisting of a preprocessor, the HNES as the main forecast block, and auxiliary predictors (ARIMA and a batch NN). The HNES is a combination of NNs and EAs. The strategy has been examined on the PJM and Spanish electricity markets and compared with some of the most recently published price forecast methods; these comparisons reveal the forecast capability of the proposed strategy. Optimization of the HNES structure, e.g., by including more efficient NNs or replacing the EAs with more effective stochastic search techniques, is a subject that demands further research. Moreover, enhancement of the feature selection analysis of the preprocessor can also be considered in future work.

References

[1] J.D. Liu, T.T. Lie, K.L. Lo, An empirical method of dynamic oligopoly behavior analysis in electricity markets, IEEE Trans. Power Syst. 21 (2) (2006) 499–506.
[2] Y. Ding, P. Wang, Reliability and price risk assessment of a restructured power system with hybrid market structure, IEEE Trans. Power Syst. 21 (1) (2006) 108–116.
[3] J. Valenzuela, M. Mazumdar, A probability model for the electricity price duration curve under an oligopoly market, IEEE Trans. Power Syst. 20 (3) (2005) 1250–1256.
[4] L.A. Barroso, R.D. Carneiro, S. Granville, M.P. Pereira, M.H.C. Fampa, Nash equilibrium in strategic bidding: a binary expansion approach, IEEE Trans. Power Syst. 21 (2) (2006) 629–638.
[5] P.B. Luh, W.E. Blankson, Y. Chen, J.H. Yan, G.A. Stern, S.C. Chang, F. Zhao, Payment cost minimization auction for deregulated electricity markets using surrogate optimization, IEEE Trans. Power Syst. 21 (2) (2006) 568–578.
[6] N. Amjady, Day-ahead price forecasting of electricity markets by a new fuzzy neural network, IEEE Trans. Power Syst. 21 (2) (2006) 887–896.
[7] R.C. Garcia, J. Contreras, M.V. Akkeren, J.B.C. Garcia, A GARCH forecasting model to predict day-ahead electricity prices, IEEE Trans. Power Syst. 20 (2) (2005) 867–874.
[8] M.A. Plazas, A.J. Conejo, F.J. Prieto, Multimarket optimal bidding for a power producer, IEEE Trans. Power Syst. 20 (4) (2005) 2041–2050.
[9] H. Peng, F. Long, C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Machine Intell. 27 (8) (2005) 1226–1238.
[10] N. Amjady, M. Hemmati, Energy price forecasting—problems and proposals for such predictions, IEEE Power Energy Mag. 4 (2) (2006) 20–29.
[11] M.P. Garcia, D.S. Kirschen, Forecasting system imbalance volumes in competitive electricity markets, IEEE Trans. Power Syst. 21 (1) (2006) 240–248.
[12] L. Zhang, P.B. Luh, K. Kasiviswanathan, Energy clearing price prediction and confidence interval estimation with cascaded neural network, IEEE Trans. Power Syst. 18 (1) (2003) 99–105.
[13] O.B. Fosso, A. Gjelsvik, A. Haugstad, M. Birger, I. Wangensteen, Generation scheduling in a deregulated system. The Norwegian case, IEEE Trans. Power Syst. 14 (1) (1999) 75–81.
[14] F.J. Nogales, J. Contreras, A.J. Conejo, R. Espinola, Forecasting next-day electricity prices by time series models, IEEE Trans. Power Syst. 17 (2) (2002) 342–348.
[15] J. Contreras, R. Espinola, F.J. Nogales, A.J. Conejo, ARIMA models to predict next-day electricity prices, IEEE Trans. Power Syst. 18 (3) (2003) 1014–1020.
[16] J.J. Guo, P.B. Luh, Improving market clearing price prediction by using a committee machine of neural networks, IEEE Trans. Power Syst. 19 (4) (2004) 1867–1876.
[17] A.M. Gonzalez, A.M. San Roque, J.G. Gonzalez, Modeling and forecasting electricity prices with input/output hidden Markov models, IEEE Trans. Power Syst. 20 (2) (2005) 13–24.
[18] L. Zhang, P.B. Luh, Neural network-based market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method, IEEE Trans. Power Syst. 20 (1) (2005) 59–66.
[19] G. Li, C.C. Liu, C. Mattson, J. Lawarree, Day-ahead electricity price forecasting in a grid environment, IEEE Trans. Power Syst. 22 (1) (2007) 266–274.
[20] P. Mandal, T. Senjyu, N. Urasaki, T. Funabashi, A.K. Srivastava, A novel approach to forecast electricity price for PJM using neural network and similar days method, IEEE Trans. Power Syst. 22 (4) (2007) 2058–2065.
[21] A.J. Conejo, M.A. Plazas, R. Espinola, A.B. Molina, Day-ahead electricity price forecasting using the wavelet transform and ARIMA models, IEEE Trans. Power Syst. 20 (2) (2005) 1035–1042.
[22] G.J. Tsekouras, N.D. Hatziargyriou, E.N. Dialynas, An optimized adaptive neural network for annual midterm energy forecasting, IEEE Trans. Power Syst. 21 (1) (2006) 385–391.
[23] N. Amjady, Introduction to Intelligent Systems, Semnan University Press, Semnan, Iran, 2002.
[24] L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1994.
[25] J.P.S. Catalão, S.J.P.S. Mariano, V.M.F. Mendes, L.A.F.M. Ferreira, Short-term electricity prices forecasting in a competitive market: a neural network approach, Electr. Power Syst. Res. 77 (10) (2007) 1297–1304.
[26] J.F. Blowey, A.W. Craig, T. Shardlow, Frontiers in Numerical Analysis, Springer-Verlag, Berlin, 2003.
[27] F. Dan Foresee, M.T. Hagan, Gauss–Newton approximation to Bayesian learning, Int. Conf. Neural Netw. 3 (1997) 1930–1935.
[28] I.G. Damousis, A.G. Bakirtzis, P.S. Dokopoulos, A solution to the unit-commitment problem using integer-coded genetic algorithm, IEEE Trans. Power Syst. 19 (2) (2004) 1165–1172.
[29] H. Zareipour, C.A. Canizares, K. Bhattacharya, J. Thomson, Application of public-domain market information to forecast Ontario's wholesale electricity prices, IEEE Trans. Power Syst. 21 (4) (2006) 1707–1717.
[30] H.T. Pao, Forecasting electricity market pricing using artificial neural networks, Energy Conversion Manag. 48 (2007) 907–912.
[31] F.J. Nogales, A.J. Conejo, Electricity price forecasting through transfer function models, J. Operat. Res. Soc. 57 (4) (2006) 350–356.
[32] N. Amjady, Short-term bus load forecasting of power systems by a new hybrid method, IEEE Trans. Power Syst. 22 (1) (2007) 333–341.
[33] PJM Web Site. http://www.pjm.com.
[34] OMEL, Market operator of the electricity market of mainland Spain. [Online] Available: http://www.omel.es/frames/es/index.jsp.