Short-term electric load and temperature forecasting using wavelet echo state networks with neural reconstruction


Energy 57 (2013) 382–401


Ali Deihimi*, Omid Orang, Hemen Showkati
Bu-Ali Sina University, Department of Electrical Engineering, Shahid Fahmideh Street, 6517838683 Hamedan, Islamic Republic of Iran


Abstract

Article history: Received 4 October 2012; received in revised form 29 May 2013; accepted 2 June 2013; available online 10 July 2013.

In this paper, a WESN (wavelet echo state network) with a novel ESN-based reconstruction stage is applied to both STLF (short-term load forecasting) and STTF (short-term temperature forecasting). The wavelet transform is used as the front stage for multi-resolution decomposition of the load or temperature time series. ESNs function as forecasters for the decomposed components. A modified shuffled frog leaping algorithm is used to optimize the ESNs. Both one-hour and 24-h ahead predictions are studied, with the number of inputs kept to a minimum. The performance of the proposed WESN-based load forecasters is investigated for three cases, in which the predicted-temperature input is fed by actual temperatures, by the output of the WESN-based temperature forecasters, or by noisy temperatures. The effects of temperature errors on load forecasts are studied locally by sensitivity analysis. Hourly loads and temperatures of a North-American electric utility are used for this study. First, the results of the proposed forecasters are compared with those of ESN-based forecasters that have previously shown high capability as stand-alone forecasters. Next, the WESN-based forecasters are compared with other models either previously tested on the data used here or rebuilt for testing on these data. The comparisons reveal significant improvements in the accuracy of both STLF and STTF using the proposed forecasters. © 2013 Elsevier Ltd. All rights reserved.

Keywords: Echo state network; Short-term load forecasting; Short-term temperature forecasting; Shuffled frog leaping algorithm; Wavelet transform

1. Introduction

After the liberalization of the electric power industry and the launch of competitive power markets, precise prediction of electric power consumption has become much more important for both power system operators and market participants. One-hour and one-day ahead predictions of loads, often referred to as STLF (short-term load forecasting), are crucial requirements for power market efficiency and power system economy and security. In response to such a prominent need, a number of hybrid methods have been proposed, particularly in the last few years, to improve the accuracy of load forecasts. They mostly combine methods from two main classes, namely soft computing (e.g. NN (neural network), EC (evolutionary computation), fuzzy logic, machine learning, etc.) and hard computing (e.g. regressions, Box–Jenkins, Kalman filter, signal spectral analysis, etc.) methods [1–30]. The hybrid methods that use a combination of WT (wavelet transform) and NNs have shown promising load forecasting accuracy [18–30]. Different NN types have been used in different combinations with WT, such

* Corresponding author. Tel.: +98 8118292505; fax: +98 8118380802. E-mail addresses: [email protected], [email protected] (A. Deihimi). 1 Both authors are co-authors. 0360-5442/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.energy.2013.06.007

as MLPNNs (multilayer perceptron NNs) in Refs. [18–22], RBFNNs (radial basis function NNs) in Refs. [22–24], FNNs (fuzzy NNs) in Refs. [22,25], Kohonen NNs in Ref. [26], WBNNs (wavelet-based NNs, using wavelet activation functions) in Refs. [27–29] and Elman NNs in Ref. [30]. Recently, ESNs (echo state networks) have been employed in Ref. [31] as stand-alone forecasters with a minimum number of inputs for STLF, and the results show high load forecasting accuracy, comparable to that of Ref. [19], which combines WT, MLPNNs and EC and uses correlation analysis to recognize the best multiple lagged inputs to the MLPNNs. An ESN not only benefits from feedback connections, like other RNNs (recurrent neural networks), which enable it to model complex dynamic behavior, but also gains a sparsely interconnected reservoir of neurons that leads to a very fast and simple training procedure, unlike the complicated and time-consuming training of other RNNs without a reservoir. Therefore, a combination of WT and ESNs may provide more accurate load forecasts. Such a combination has been used in Ref. [32] for time series prediction, where the decomposed components (approximation and details) of the time series are forecasted individually by ESNs and the forecasted time series is reconstructed as the sum of the outputs of the ESNs. However, that scheme has three shortcomings, two of which have been mentioned in Ref. [33]. First, no padding scheme was considered to tackle the side effects of the signal ends in WT


decomposition [21,33]. Second, there may be an accumulation of the errors of the ESNs that degrades the overall forecasting accuracy [33]. Third, discarding the first detail with the highest frequency (called denoising) is not advisable for STLF, because useful information appears in this detail due to some non-stationary features of the load time series. In Ref. [33], the first two shortcomings were dealt with using a linear combination of the outputs of the component forecasters (ESNs), with the weight coefficients adjusted to minimize the error accumulation of the component forecasters, where the truncated time series is decomposed without any padding. However, appropriate padding schemes, like those given in Refs. [19,21], can be used to achieve more accuracy, and a nonlinear scheme can be applied to minimize the error accumulation. Actually, a linear combination of the outputs of the ESNs by weight factors would not be the best solution to the second shortcoming, because of the complex dynamic behavior of the errors of the ESNs. Therefore, in view of the above points and the recently increased attention to applications of ESNs in different fields [31,34–37], this combination needs to be studied more rigorously for STLF. It has been demonstrated empirically that, among all the exogenous variables affecting electric power consumption, temperature has the most influence, especially in cold regions. Hence, short-term load forecasters most often need appropriate temperature data predicted for future hours or days, which are not provided by weather services everywhere. Thus, utilities often need STTF (short-term temperature forecasting) beside their own short-term load forecasters. Because the accuracy of forecasted temperatures directly impacts the accuracy of forecasted loads, more accurate STTF methods are needed. Recently, combinations of WT and MLPNNs have been used for this purpose [38,39].
Since there is no detailed work on applying the combination of WT and ESNs to STTF, it is still an open field of study. In this paper, the proposed WESN-based forecaster contains an efficient reconstruction stage that uses an ESN, instead of a sum or a linear combination of the outputs of the ESNs, to obtain the forecasted load or temperature time series from the forecasted decomposed components. Because of the useful information in the first detail, denoising is not applied. The least number of inputs is used for one-hour and 24-h ahead load and temperature forecasting. The performance of the proposed forecasters is examined over a long prediction period and compared with similar ESN-based forecasters as well as some other models.

2. Motivation

Our investigation of different short-term load and temperature forecasting models previously proposed in the literature revealed four factors that motivated us to develop the proposed WESN-based forecasters. These factors are first described for STLF and then considered for STTF. The first factor is forecasting accuracy, which is now needed more than ever to avoid or sharply reduce the financial penalties imposed on electric utilities and other active market players in deregulated power systems. Nowadays, the high cost of over- and under-contracts on balancing markets due to load forecasting errors has driven researchers to persistently seek forecasting models that reduce this financial distress by improving load forecasting accuracy even by a fraction of a percent [40]. Since the dynamics of the load series of different regions of the world most often differ, the accuracies of their forecasted results naturally differ even when the same forecasting model is used. Considering the results of the models tested on the data series of the North-American electric utility used in this study, so far the best load forecasting accuracy has been obtained by the WT-MLPNN-EC model [19] (MAPE1 = 0.99 and


MAPE24 = 2.04) among the MLPNN [21] (MAPE1 = 2.10 and MAPE24 = 3.58 for M1; MAPE1 = 1.10 and MAPE24 = 3.41 for M2), WT-MLPNN [21] (MAPE1 = 1.12 and MAPE24 = 3.16 for M3; MAPE1 = 1.99 and MAPE24 = 2.64 for M4), ESN [31] (MAPE1 = 1.048 and MAPE24 = 2.1174) and abductive network [52] (MAPE1 = 1.14 and MAPE24 = 2.66) models for one-hour and 24-h ahead load forecasting. The definition of MAPE (mean absolute percentage error) will be given in Eq. (23); the subscripts 1 and 24 indicate the one-hour and 24-h ahead forecasting MAPEs, respectively. A brief description of these models will be given in Section 5.3. Among them, the ESN [31] gives the closest results to the WT-MLPNN-EC model [19]. As it has been well observed that using WT as a front stage for multi-resolution decomposition of the load time series improves forecasting accuracy [18–26,30], this motivated us to seek higher forecasting accuracy using appropriate WESN models. Previous works on other data series using different models show that the obtained short-term forecasting accuracies leave room for improvement. For instance, a proposed WFNN (wavelet fuzzy NN) [25] has been tested on Indian utility data and gives a better one-hour load forecasting MAPE, in the range 1.31–1.78 for the different months of one year, compared with a fuzzy NN based on the Choquet integral (FNCI) with MAPE in the range 1.69–2.08, an ANFIS (adaptive neuro-fuzzy inference system) with MAPE in the range 2.78–3.32 and an RBFNN with MAPE in the range 2.10–3.84. As another example, an NCSTAR (neuro-coefficient smooth transition autoregressive) model combined with a SOM (self-organizing map) network has been developed in Ref. [6] and tested on the data series of the Alberta power market. This NCSTAR-SOM model achieves better load forecasting accuracy, with MAPE1 = 1.07, than other models such as ARIMA (autoregressive integrated moving average), MLPNN and NCSTAR.
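Eq. (23) lies outside this excerpt; the MAPE values quoted above follow the standard definition, which can be sketched as below (the function name is ours, and Eq. (23) of the paper is expected to match this form):

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent: the standard definition
    100/N * sum(|actual - forecast| / |actual|)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 / len(actual) * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast))
```

For example, forecasting 99 against an actual load of 100 and 202 against 200 gives a MAPE of 1.0, i.e. a 1% average error.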
Although these values of MAPE are acceptable and not so large, better accuracy is still demanded by power markets. The second factor is the problem of selecting lagged inputs, common to most forecasting models. Autoregressive models such as AR (autoregressive), ARMA (autoregressive moving average) and ARIMA (and, in the presence of exogenous variables, ARX (autoregressive exogenous), ARMAX (autoregressive moving average with exogenous variable) and ARIMAX (autoregressive integrated moving average with exogenous variable)) often need information about the sample ACF (autocorrelation function), the sample PACF (partial autocorrelation function) and the CCF (cross-correlation function) as a reference for selecting appropriate model orders (lagged inputs); this is probably the most difficult stage and requires a skill that only comes with experience [40]. In Ref. [11], many tentative models with different model orders are chosen and a PSO (particle swarm optimization) algorithm finds the best order and parameters. However, selecting the model-order candidates and their extent may need some initial investigation of the ACF, PACF and CCF, so the method may not fully solve the problem. Different non-recurrent NN models have a similar problem [18–21,25] when used as stand-alone forecasters or in combination with other methods. Often, a correlation analysis is used to find the best lagged inputs; this process can be relatively time-consuming [19]. When WT is used as a front stage for multi-resolution decomposition, the problem can be further complicated by the increased number of inputs due to the multiple details. However, RNN models can achieve good forecasting accuracy with a few lagged inputs [30,31]. In Ref. [30], a wavelet Elman model has been proposed that feeds a few lagged inputs to the component forecasters for forecasting the next-day hourly load of a specified area in India.
The obtained MAPE is promising, in the range 0.77–4.53 (with a mean value of about 1.917) for different (rainy, cold and hot) days of the year 2009. ESNs [31] using only current-hour data (without any further lagged inputs) give a promising 24-h ahead forecasting MAPE (= 2.1174), comparable to the resulting


MAPE of the wavelet Elman model [30] as well as very close to that of WT-MLPNN-EC [19] (MAPE24 = 2.06). This is a big motivation for using the WESN model as a forecaster, because adding WT to ESNs for higher forecasting accuracy requires no process for selecting appropriate lagged inputs. The third factor is the time needed to train the model. Models based on hard computing often need a kind of converging training process that takes time to gradually converge to the solution, where the tuning parameters of the process are specified beforehand or obtained from an optimization algorithm [11,40]. This is also true for many NN models. The Elman network, like other RNNs without a reservoir, also needs such a converging training algorithm, e.g. back-propagation [30]. Converging training algorithms with traditional descent methods may end in a local optimum of the error surface. To overcome this problem, evolutionary algorithms have been applied [13,19]. However, this may not attenuate the time-consuming training burden. The ESN [31] does not need any converging training algorithm; it only needs a pseudoinverse of the extended state collection matrix in Eq. (13). This property of the ESN reduces the training time and is preserved when the ESN is combined with WT for multi-resolution decomposition of the load data series. The fourth factor is the error accumulation in reconstructing the forecasted time series from the forecasted decomposed components when WT is used as the front stage for multi-resolution decomposition. None of the works in Refs. [18–30] has considered this factor. It is studied in Ref. [33] to improve the forecasting accuracy of the WESN model developed in Ref. [32]. Three shortcomings of the model proposed in Ref. [32] were mentioned in Section 1 (Introduction). The first two shortcomings, concerning the side effects of decomposing a truncated signal (without any padding) and the error accumulation, were intended to be solved in Ref.
[33] using a linear combination of the outputs of the component forecasters. The weight coefficients are adjusted to minimize the error accumulation of the component forecasters, where the truncated time series is decomposed without any padding as inputs to the component forecasters. The padding schemes studied in detail in Ref. [21] show that an appropriate padding scheme has an effective influence on forecasting accuracy. This motivated us to use a padding scheme instead of decomposing the truncated time series. Furthermore, the nonlinear nature of the errors at the outputs of the component forecasters (here, ESNs) motivated us to use an ESN as a nonlinear scheme for reconstructing the forecasted time series from the outputs of the component forecasters, so as to minimize the error accumulation. Next, the aforementioned four factors are considered for STTF. As mentioned before, temperature is the important exogenous variable influencing load forecasting accuracy. Considering the results of the models tested on the data series of the North-American electric utility used in this study, so far the best temperature forecasting accuracy has been obtained using abductive networks [55], with MAPE1 = 2.14 and MAPE24 = 3.49, compared with the MLPNN [55] (MAPE1 = 2.37 and MAPE24 = 3.86) and ESN [31] (MAPE1 = 2.1899 and MAPE24 > 3). In Ref. [56], MLPNNs are used as 24-h ahead temperature forecasters for 8 utilities in North America, leading to mean absolute errors in the range 1.04 °F (0.58 °C) to 2.03 °F (1.128 °C), with an average of 1.48 °F (0.82 °C). In Ref. [57], a WNN (wavelet NN) model is used to forecast the daily mean temperature of Taipei, with a resulting MAPE in the range 0.7–0.9 for the four months of June, July, August and September, giving better results than different time-variant fuzzy time series models [58].
Had the results for other months also been reported, better comparisons could have been made, because temperature fluctuates more in winter and spring. The promising accuracy of the WNN model, which uses an MLPNN after the WT front stage, motivated us to improve the accuracy of the ESNs of Ref. [31] using WT, to seek the higher temperature forecasting accuracy

needed by the load forecasters. Regarding the second and third factors, the case of temperature forecasting is similar to that of load forecasting. However, the fourth factor, namely the error accumulation, has not been considered for temperature forecasting by WNN models. So this motivated us to examine the performance of the WESN model in temperature forecasting when the error accumulation is minimized by a nonlinear scheme (here, an ESN). The main contributions of this study with respect to the models developed in Refs. [32,33] are: (i) a padding scheme is used in the proposed WESN models to further reduce the forecasting error of each component forecaster, and a nonlinear scheme (i.e. an ESN) is used to reconstruct the forecasted time series from the outputs of the component forecasters so as to minimize the error accumulation; (ii) a component forecaster (an ESN) is considered for D1 (the high-frequency detail); (iii) a modified SFL algorithm is used to obtain the best design of the ESNs for more accurate forecasts; (iv) the sensitivity of the proposed load forecasters to temperature errors is analyzed.

3. The proposed forecasters for STLF and STTF

As load and temperature time series are inherently non-stationary signals, their behavior can be effectively analyzed by localization in both time and frequency, which is frequently achieved using WT [18–30]. On the other hand, ESN, a state-of-the-art RNN with a simple and fast training process, has shown high capability to predict complex dynamic behavior with promising accuracy [31]. So the combination of the two is examined to achieve more accuracy for STLF and STTF. This section describes the proposed WESN-based forecasters for both one-hour and 24-h ahead load and temperature forecasting. Brief descriptions of WT and ESN are given before unfolding the details of the proposed forecasters.

3.1. Wavelet transform

WT is a time-frequency analysis tool with the ability to extract local spectral and temporal information of a signal simultaneously [41,42]. This allows a local, scale-dependent spectral analysis of signal features. The DWT (discrete wavelet transform) of a discrete-time signal x(l) is defined as:

cD_{m,n} = \sum_{l} x(l)\, \psi_{m,n}(l)    (1)

\psi_{m,n}(l) = 2^{-m/2}\, \psi(2^{-m} l - n) is the dilated and translated form of the mother wavelet \psi(l) = \psi_{0,0}(l). cD_{m,n} is known as the wavelet (or detail) coefficient at scale m and location n. The scaling functions given in Eq. (2), which exhibit the smoothing features of signals, are associated with the orthonormal wavelet basis functions.

\phi_{m,n}(l) = 2^{-m/2}\, \phi(2^{-m} l - n)    (2)

\phi(l) = \phi_{0,0}(l) is referred to as the father wavelet. Convolving the scaling functions with a signal gives the approximation coefficients:

cA_{m,n} = \sum_{l} x(l)\, \phi_{m,n}(l)    (3)

An approximation of the signal at scale m can then be obtained as:

A_m = \sum_{n=-\infty}^{+\infty} cA_{m,n}\, \phi_{m,n}    (4)
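The paper does not fix a particular wavelet at this point; as a concrete illustration (the Haar wavelet is our choice, not necessarily the authors'), the inner products of Eqs. (1) and (3) at scale m = 1 reduce to paired sums and differences of samples:

```python
import math

def haar_dwt_level1(x):
    """One level of Eqs. (1) and (3) with the Haar wavelet (x must have
    even length). At scale m = 1, phi_{1,n} equals 1/sqrt(2) on samples
    {2n, 2n+1}, and psi_{1,n} equals +1/sqrt(2) at 2n and -1/sqrt(2) at
    2n+1 (sign conventions for psi vary between texts)."""
    s = 1.0 / math.sqrt(2.0)
    cA = [s * (x[2 * n] + x[2 * n + 1]) for n in range(len(x) // 2)]  # Eq. (3)
    cD = [s * (x[2 * n] - x[2 * n + 1]) for n in range(len(x) // 2)]  # Eq. (1)
    return cA, cD
```

The pair x(2n) = (cA_n + cD_n)/\sqrt{2}, x(2n+1) = (cA_n - cD_n)/\sqrt{2} recovers the signal exactly, a one-level instance of the reconstruction in Eq. (5) below.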


In fact, A_m is a smooth, scaling-function-dependent version of the actual signal x at scale m. Using approximation and detail coefficients up to scale M [41,42]:

x = A_M + \sum_{j=1}^{M} D_j    (5)

D_j = \sum_{n=-\infty}^{+\infty} cD_{j,n}\, \psi_{j,n}    (6)

D_j is called the detail of the signal at scale j. When the signal x is expressed using Eq. (5) up to the two consecutive scales M = m - 1 and M = m separately, that is, x = A_{m-1} + \sum_{j=1}^{m-1} D_j for M = m - 1 and x = A_m + \sum_{j=1}^{m} D_j = A_m + D_m + \sum_{j=1}^{m-1} D_j for M = m, comparing the two expressions gives:

A_{m-1} = A_m + D_m    (7)

This multi-resolution representation states that the sum of the approximation and the detail of the signal at an arbitrary scale m yields the signal approximation at the next increased resolution (i.e. at scale m - 1) [41]. The signal approximation at scale m = 0 is the actual signal (i.e. A_0 = x), where the highest resolution appears. From Eq. (7), the approximation and detail coefficients at an arbitrary scale can be determined from the coefficients at the previous scale:

cA_{m+1,n} = \sum_{k} h_k\, cA_{m,2n+k} = \sum_{k} h_{k-2n}\, cA_{m,k}    (8)

cD_{m+1,n} = \sum_{k} g_k\, cA_{m,2n+k} = \sum_{k} g_{k-2n}\, cA_{m,k}    (9)

These equations represent the multi-resolution decomposition algorithm, which is interpreted as a low-pass filter with coefficients h_k and a high-pass filter with coefficients g_k separating the low- and high-frequency scale-dependent components of the signal, respectively. Repeated application of the decomposition Eqs. (8) and (9) yields approximation and detail coefficients at successively increasing scales, as shown in Fig. 1 (e.g. up to scale m = 3).

[Fig. 1. Multi-resolution decomposition using DWT: the signal x = A_0 passes through a low-pass/high-pass filter pair with downsampling by 2, giving cA_1 and cD_1; repeating the cascade on cA_1 and cA_2 gives cA_2, cD_2, cA_3 and cD_3, from which A_1–A_3 are obtained via Eq. (4) and D_1–D_3 via Eq. (6).]

3.2. Echo state networks

[Fig. 2. The structure of ESN.]

An ESN benefits from a DR (dynamical reservoir) containing a number of interconnected recurrent neurons. The DR, as an internal or hidden layer, sits between the input and output layers, as shown in Fig. 2 [43]. The most useful feature of the ESN is that only the weights associated with the output layer are trained, while the other weights are selected initially and remain untrained; its training process is therefore simple and very fast. The DR often contains sigmoid neurons, and the other layers have linear neurons. At time step (n + 1), the DR state vector X (the outputs of the DR neurons) and the ESN outputs are updated as:

X(n+1) = F\left(W^{in} V(n+1) + W X(n) + W^{ofb} Y(n)\right)    (10)

Y(n+1) = G\left(W^{out} \left[V^{T}(n+1);\, X^{T}(n+1)\right]^{T}\right) = G\left(W^{out} Z(n+1)\right)    (11)

W^{in} \in R^{N_u \times N_x}, W \in R^{N_x \times N_x}, W^{ofb} \in R^{N_y \times N_x} and W^{out} \in R^{(N_u+N_x) \times N_y} are the input–hidden, hidden–hidden, output–hidden and output weight matrices, respectively. V and Y are the input and output vectors, respectively. F and G are vectors of activation functions for the DR and output neurons, respectively. Z is the extended state vector joining the input and DR state vectors together. The training process adjusts only W^{out}, while W is chosen randomly before training, provided that its entries meet the echo state condition [43]. Under this condition, the effects of the DR states and inputs on future DR states should vanish gradually as time proceeds. Practically, this is met when W has a spectral radius (its largest absolute eigenvalue) less than one (\varepsilon < 1). W^{in} and W^{ofb} are dense and generated randomly over [-1, 1]. The training target is to minimize the MSE (mean square error) given in Eq. (12) between the desired outputs (teacher signal) d(n) and the actual outputs y(n) of the ESN for successively applied input time series:

MSE = \frac{1}{n_{series}} \sum_{k=1}^{n_{series}} \| d(k) - y(k) \|^2 = \frac{1}{n_{series}} \sum_{k=1}^{n_{series}} \| e(k) \|^2    (12)

n_{series} is the number of input–output pairs used for training. A direct solution can be obtained as:


W^{out} = T S^{+}    (13)

T \in R^{n_{series} \times N_y} and S \in R^{n_{series} \times (N_u+N_x)} are the output teacher collection matrix and the extended state collection matrix, respectively. The superscript + in Eq. (13) denotes the Moore–Penrose pseudoinverse. When the training process begins, some initial portion of the states should be discarded for a washout of the initial DR state. The ST (settling time) is then measured as the number of steps during which inputs are applied to the ESN before outputs are collected for training.

3.3. The proposed WESN-based STLF method

There are some exogenous variables, such as temperature, humidity, wind speed and cloud density, which influence the trend and amount of electric consumption. If an accurate prediction of such variables at a desired target time is available, they can be incorporated into STLF methods as additional inputs. However, prediction errors in these variables can adversely affect the forecasted results, depending on the robustness of the adopted STLF method. Actually, newly proposed accurate forecasters with the least number of inputs are more attractive in the STLF field [31]. Two common STLF cases, one-hour and 24-h ahead load forecasting, are considered here. The proposed one-hour ahead WESN-based load forecaster is illustrated in Fig. 3(a). The hourly load subseries of length L_w up to the current hour (K) is decomposed by DWT into an approximation and details up to scale m. The current-hour (i.e. last) entity of the subseries of each component (A_m(K), D_1(K), D_2(K), …, D_m(K)) is used as one of the two inputs to the ESN associated with that component, while the other input is the predicted next-hour temperature (T_pred(K+1)). Since the echoes of the past history of the ESN inputs reverberate in the DR [31], no lagged value of components or temperature is considered as an additional input to the ESNs. This meets the desired feature of a minimum number of inputs for the new proposed forecasters. The output of each ESN is the predicted next-hour value of the corresponding component (A_m(K+1), D_1(K+1), D_2(K+1), …, D_m(K+1)). The aggregation of the outputs of all ESNs could be used to reconstruct the next-hour load forecast (L(K+1)). However, the accumulation of the errors of the ESNs may result in more forecasting error, as described in Ref. [33]. So an ESN designed to minimize the error accumulation is used for the reconstruction stage. For 24-h ahead load forecasting, twenty-four individual forecasters are considered, as proposed in Ref. [31], one for each hour of the day. The proposed 24-h ahead WESN-based load forecaster for hour h (1:00, 2:00, …, 24:00) is illustrated in Fig. 3(b). The daily load subseries for hour h with length L_w up to the current day (d) is decomposed by DWT into an approximation and details up to scale m. The current-day (i.e. last) entity of the subseries of each component (A_m(d), D_1(d), D_2(d), …, D_m(d)) is used as one of the three inputs to the ESN associated with that component. The predicted next-day temperature (T_pred,h(d+1)) at hour h is the second input to the ESNs. The third input is the two-value day-type index (F_type(d+1)) used in Ref. [31] to discriminate between weekdays and weekends/holidays. Again, no lagged value of components or temperature is considered as an additional input to the ESNs, to meet the desired feature of a minimum number of inputs. The output of each ESN is the predicted next-day value of the corresponding component (A_m(d+1), D_1(d+1), D_2(d+1), …, D_m(d+1)) at hour h. An ESN is used to reconstruct the next-day

[Fig. 3. The proposed (a) one-hour ahead and (b) 24-h ahead WESN-based load forecasters: the load subseries up to L(K) or L_h(d) is decomposed by DWT into A_m, D_m, …, D_1; LE (last entity) blocks feed the last entity of each component, together with T_pred(K+1) (and, in (b), T_pred,h(d+1) and F_type(d+1)), to the component ESNs, whose outputs are combined by the reconstruction ESN into L(K+1) or L_h(d+1).]


load forecast (L_h(d+1)) at hour h, while designed for minimum error accumulation. As mentioned in Refs. [19,21], filtering/convolution operations on a finite-length signal distort and corrupt the information at the two signal ends. Hence, decomposing the load subseries of finite length L_w may adversely affect the forecasted results. Padding (extension) schemes are often used to avoid such distortions. The padding scheme used in Refs. [19,21] is applied in this study. Thus, 72 padding values are added to both sides of the load subseries. The 72 successive actual load values measured just before the first entity of the load subseries are added at the beginning, and the 72 successive forecasted load values just after the last entity of the load subseries are appended at the end. These forecasted values are obtained, without the DWT decomposition stage in Fig. 3, directly from the ESNs and the reconstruction stage, where each component value forecasted by an ESN is consecutively fed back to the same ESN to forecast the next values of that component. Meanwhile, if the temperature input to the ESNs comes from the output of a WESN-based temperature forecaster, as described in the next subsection, the 72 successive predicted temperature values are obtained by a similar successive feedback procedure without the DWT decomposition stage. After adding the padding values to both sides of the load subseries and decomposing the extended subseries by DWT, the last 72 entities of each component are eliminated. Then the last entity of the remaining subseries of each component is separated to feed the ESN associated with that component. This separating function is performed by the LE (last entity) block in Fig. 3.
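As a minimal sketch of a component forecaster in the spirit of Eqs. (10)–(13): the reservoir below uses tanh neurons, a sparse W rescaled to a chosen spectral radius (0.8, inside the range the paper reports as best), and a readout trained by the Moore–Penrose pseudoinverse. All parameter choices are ours, and several parts of the full model are omitted for brevity: the output-feedback term W^ofb Y(n) of Eq. (10), the output activation G, and the temperature/day-type inputs.

```python
import numpy as np

class MiniESN:
    """Sketch of an echo state network: only W_out is trained (Eq. (13));
    W_in and the reservoir W are random and stay fixed."""

    def __init__(self, n_in, n_res, rho=0.8, density=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1, 1, (n_res, n_in))
        W = rng.uniform(-1, 1, (n_res, n_res))
        W = W * (rng.random((n_res, n_res)) < density)   # sparse reservoir
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius -> rho
        self.W = W
        self.n_res = n_res
        self.W_out = None

    def _states(self, U):
        """Run Eq. (10) (tanh neurons, no output feedback) and collect the
        extended state Z = [u; x] at every time step."""
        Z, x = [], np.zeros(self.n_res)
        for u in U:
            x = np.tanh(self.W_in @ u + self.W @ x)
            Z.append(np.concatenate([u, x]))
        return np.array(Z)

    def fit(self, U, d, washout=20):
        """Collect extended states, drop the settling time, and solve
        Eq. (13) with the Moore-Penrose pseudoinverse."""
        S = self._states(U)[washout:]
        self.W_out = np.linalg.pinv(S) @ d[washout:]
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out
```

With such a class, each decomposed component would get its own component ESN, and a further ESN trained the same way could play the reconstruction role described above.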

3.4. The proposed WESN-based STTF method

The predicted temperature at the desired target hour is needed for the WESN-based load forecasters proposed in the previous subsection. Since weather services may not provide the required temperature data, utilities should use their own appropriate temperature forecasters for this purpose. Although some meteorological factors, such as wind speed, irradiance of sunlight, cloud density, pressure and humidity, influence the temperature, we consider none of them as inputs to the temperature forecasters, for two reasons. One reason is the desired feature of a minimum number of inputs for the new proposed forecasters. The other is that the lack of appropriate data for those meteorological factors imposes a high cost of measuring and collecting such data on utilities. The same two cases as in the previous subsection are considered here for STTF; that is, one-hour and 24-h ahead temperature forecasting. The proposed one-hour and 24-h ahead WESN-based temperature forecasters are illustrated in Fig. 4 and include similar stages to the proposed WESN-based load forecasters. The same padding scheme as described before is used for decomposing the temperature subseries of finite length L_w. The one-hour ahead WESN-based temperature forecaster provides the required temperature input to the one-hour ahead WESN-based load forecaster. Likewise, the 24-h ahead WESN-based temperature forecaster for hour h provides the required temperature input to the 24-h ahead WESN-based load forecaster for the same hour.

3.5. Modified SFL algorithm for ESN design optimization

For the design of each ESN in the proposed WESN-based forecasters, three DR parameters, the size (N_DR), the connectivity (C_W) and the spectral radius (\varepsilon), should be determined appropriately. An efficient optimization algorithm may be used to obtain the best values of these parameters. The connectivity is determined by C_W = 10%/N_DR [31,43]. Here, the best values of N_DR and \varepsilon are sought in the intervals [10, 1200] and [0.5, 1], respectively. The maximum value of N_DR is limited by the available memory capacity of the computer; for the computer used in this study, N_DR greater than 1200 fails with an out-of-memory error. Also, based on our experience and observations, N_DR less than 10 is too small to properly learn the complex dynamics of actual time series (like load

Fig. 4. The proposed (a) one-hour ahead and (b) 24-h ahead WESN-based temperature forecasters. (In each panel, the temperature subseries is decomposed by DWT into the components Am and Dm, ..., D1; an LE block extracts the last entity of each component, which feeds the associated ESN; the forecasted components Am(K + 1), Dm(K + 1), ..., D1(K + 1), or Am(d + 1), ..., D1(d + 1) in the 24-h case, are recombined by the ESN-based reconstruction stage into T(K + 1) or Th(d + 1).)

A. Deihimi et al. / Energy 57 (2013) 382e401

and temperature time series), so that the forecasted results have poor accuracy [31,34]. Generally, ε is less than or equal to unity [43]. However, as studied in Ref. [44], the appropriate values of ε are around 0.8 and not less than 0.5. In Ref. [31], the best values of ε obtained for load and temperature forecasting using ESNs are greater than 0.8. Moreover, in Ref. [34], they are greater than 0.75 for bus voltage estimation. Hence, the interval [0.5,1] is assumed here as an appropriate range for seeking the best value of ε. As the data time series is divided into three successive portions for training, validating and testing the ESN, the objective (or fitness) function, as in Ref. [31], is the MSE evaluated over the validation portion of the data series. After training the ESN using Eq. (13) for a candidate set of parameters, the MSE is evaluated over the validation portion in order to be compared with those of other candidates. Here, a modified SFL algorithm is used to obtain the best values of the parameters. Originally, the SFL algorithm was proposed in Ref. [45]. An initial population of Nf frogs (solutions) is generated randomly and ranked in decreasing order of fitness. The frogs are divided into m memeplexes to share their memes with each other. The frog with rank k becomes a member of the ith memeplex, where i - 1 equals the remainder of k divided by m. During a memetic evolution for each memeplex, a change of position of the worst frog (Xw) is first examined by leaping towards the position of the best frog of the memeplex (Xb). If it improves the frog's fitness, the new position is adopted. Otherwise, a leap towards the position of the globally best frog of the whole population (Xg) is examined. If this also yields no improvement, a random leap is finally considered. After a specified number of memetic evolutions (Nme), the memeplexes pass their information to each other in a shuffling stage to make a new population.
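For concreteness, the conventional rank-based division into memeplexes amounts to dealing the sorted frogs out round-robin (a minimal Python sketch; the function name is ours, not from Ref. [45]):

```python
# Conventional SFL division: with frogs ranked best-first, the frog of
# rank k (0-based) joins memeplex k mod m, so each memeplex receives a
# mix of good and poor solutions.

def divide_into_memeplexes(ranked_frogs, m):
    memeplexes = [[] for _ in range(m)]
    for k, frog in enumerate(ranked_frogs):
        memeplexes[k % m].append(frog)
    return memeplexes
```

The modified algorithm described next replaces this round-robin rule with a distance-based clustering.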
New populations repeatedly undergo the division into memeplexes, the memetic evolutions and the shuffling stage until a convergence criterion is met, at which point the best frog is reported as the optimum solution. The modified SFL algorithm used here improves both the local and the global searching performance of the conventional algorithm, preventing premature convergence to local optima. The modifications are: - A clustering process based on the Euclidean distance is used for dividing the frogs into m memeplexes [46]. After ranking the frogs in the population, the m frogs with the highest ranks go to the m memeplexes, each frog as the first member and initial centroid of one memeplex. Then the memeplexes, one by one in turn, admit as their next member the remaining frog closest to their centroid. The advantage of this clustering process is that each memeplex contains points from one part of the search space that are near to each other, thereby improving searching performance. - At each memetic evolution, the change of position of the worst frog is first examined using the positions of all other frogs in that memeplex, including the best frog, while the position (XC) called the "center of mass" of the memeplex influences the leaping direction as below.

Di = r1 (Xi - Xw) + r2 (XC - Xw)   (14)

XC = [ Σ_{j=1}^{p} (1/Fj) Xj ] / [ Σ_{j=1}^{p} (1/Fj) ]   (15)

Xw,i^new = Xw^old + Di   (16)

Fj is the fitness value of the jth frog in the memeplex, p is the number of frogs in the memeplex, and r1 and r2 are random numbers from [0,1]. The most improving leap in Eq. (16) is selected as the new position of the worst frog. When none of the leaps in Eq. (16) improves the worst frog, its change of position is examined using the position of the globally best frog and the "center of mass" of the whole population (XCg) as below.

D = r3 (Xg - Xw) + r4 (XCg - Xw)   (17)

XCg = [ Σ_{j=1}^{Nf} (1/Fj) Xj ] / [ Σ_{j=1}^{Nf} (1/Fj) ]   (18)

Xw^new = Xw^old + D   (19)

r3 and r4 are random numbers from [0,1]. If no improvement is achieved, a random frog is generated around XC as below and replaces the worst frog.

Xw^new = XC + r5 Dmax / (l + 1)   (20)

r5 is a random number from [-1,1]. Dmax denotes the allowed maximum space around XC in which the random frog is generated. l is the index counting the number of performed shuffling stages. The use of the "center of mass", inspired by the main features of the BB-BC (big bang-big crunch) algorithm [51], enhances the searching performance. The flow diagram of the modified SFL algorithm is depicted in Fig. 5. As Win, W and Wofb are initialized randomly before training, the algorithm is repeated 20 times for different weight initializations and the best result is selected as the final ESN design. Here, the initial population size (Nf) is 200, the number of memeplexes (m) is 10, and the number of memetic evolutions (Nme) is 10. The total number of shuffling stages producing new populations (Nsh) is 100 and is used as the stop criterion. The given values for Nf, m, Nme and Nsh were obtained through trial and error: different cases with different values of these parameters were examined, with the quality of the solution and the processing time as selection criteria. As the experiments showed, the initial population size (Nf) should be large enough for good global search of the search space and a high probability of finding the global optimum solution. Low values of Nf degrade the performance of the algorithm and increase the probability of trapping in local optima. However, a very large value of Nf adversely increases the processing time. In the works on SFL, the selected value of Nf is commonly in the range of 200-300 [45-50]. In our study, Nf = 200 was suitable to yield solutions of high quality with a reasonable processing time. As local searches are conducted by memetic evolutions in the memeplexes, the number of memeplexes m influences the number of areas locally searched in the search space.
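Putting Eqs. (14)-(20) together, one move of the worst frog in the modified algorithm can be sketched as follows (a minimal Python sketch under our own naming; the ESN training and validation that produce the fitness are abstracted into an `evaluate` callable returning the validation MSE, lower being better):

```python
import random

def inv_fitness_center(frogs, fitnesses):
    """'Center of mass' of Eqs. (15)/(18): mean weighted by 1/F, so that
    frogs with smaller MSE pull the center harder. Positions are lists."""
    w = [1.0 / f for f in fitnesses]
    s = sum(w)
    dim = len(frogs[0])
    return [sum(wj * frog[d] for wj, frog in zip(w, frogs)) / s
            for d in range(dim)]

def leap_worst_frog(memeplex, fit, population, pop_fit, evaluate,
                    d_max=1.0, shuffle_count=0):
    """One memetic move of the worst frog per Eqs. (14)-(20)."""
    iw = max(range(len(memeplex)), key=lambda i: fit[i])  # worst = largest MSE
    xw, fw = memeplex[iw], fit[iw]
    xc = inv_fitness_center(memeplex, fit)
    dim = len(xw)
    # Stage 1, Eqs. (14)-(16): try a leap towards every other member.
    best_pos, best_f = xw, fw
    for i, xi in enumerate(memeplex):
        if i == iw:
            continue
        r1, r2 = random.random(), random.random()
        cand = [xw[d] + r1 * (xi[d] - xw[d]) + r2 * (xc[d] - xw[d])
                for d in range(dim)]
        f = evaluate(cand)
        if f < best_f:
            best_pos, best_f = cand, f
    if best_f < fw:
        return best_pos, best_f
    # Stage 2, Eqs. (17)-(19): leap towards the globally best frog.
    xg = population[min(range(len(population)), key=lambda i: pop_fit[i])]
    xcg = inv_fitness_center(population, pop_fit)
    r3, r4 = random.random(), random.random()
    cand = [xw[d] + r3 * (xg[d] - xw[d]) + r4 * (xcg[d] - xw[d])
            for d in range(dim)]
    f = evaluate(cand)
    if f < fw:
        return cand, f
    # Stage 3, Eq. (20): random frog around XC, shrinking with shuffles.
    r5 = random.uniform(-1.0, 1.0)
    cand = [xc[d] + r5 * d_max / (shuffle_count + 1) for d in range(dim)]
    return cand, evaluate(cand)
```

In the full algorithm this move is repeated Nme times per memeplex before shuffling; for the ESN design problem the frog positions would hold the candidate (NDR, ε) pair.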
When m is low, only a few areas may be locally searched, and the number of memetic evolutions (Nme) then needs to be increased to a large value in order to provide sufficient moves towards the position of the globally best frog and sufficient random moves into other areas of the search space. When m is large, the processing time grows, and if the number of memetic evolutions (Nme) is decreased to compensate, the local search capability of the algorithm may degrade. Also, m should be selected in relation to the selected value of Nf, so that the number of frogs in each memeplex is neither so low that it degrades the local searches nor so high that it makes the processing time excessive, because the proposed modified SFL algorithm examines all moves of the worst frog in each memeplex towards all other members of that memeplex. The selected m = 10 was a suitable moderate value that resulted in a solution of high quality, and the selected Nme = 10 was an appropriate value that provides both good local searches and sufficient moves

towards the position of the globally best frog as well as sufficient random moves, so that a high-quality solution is found with a reasonable processing time. In the works on SFL, the selected value of m is commonly from 5 to 20 and the selected value of Nme from 5 to 10 [45-50]. Although the total number of shuffling stages Nsh (the total number of iterations the algorithm runs before stopping) was selected here equal to 100, the best solutions were obtained in fewer than 100 iterations, as will be shown exemplarily in Section 5.1. So, the selected value of Nsh is reasonably large to always allow the algorithm to find the best solution of high quality. The number of algorithm repeats, selected here to be 20, takes into account different random initializations of the algorithm, thereby increasing the chance of finding an initialization that leads to a better-quality final solution. Generally, for evolutionary algorithms, random initialization means the random generation of the initial population; here, the initialization of the weight matrices Win, W and Wofb of the ESN before training is also considered. Actually, there are infinitely many such random initializations. However, the number of algorithm repeats is frequently selected from 10 to 50 in the literature, and larger values may prohibitively increase the processing time.

Fig. 5. The modified SFL algorithm to find the best ESN design. (Flow: give the ranges of the DR parameters (NDR and ε) and the SFL parameters (Nf, m, Nsh, Nme); generate the initial population and evaluate its fitness; rank the frogs and cluster them into memeplexes; in each memeplex, leap the worst frog using Eqs. (14)-(16), then Eqs. (17)-(19) if no improvement, then Eq. (20); after Nme memetic evolutions per memeplex, shuffle the frogs into a new population; continue until the stop criterion is met, repeat the algorithm 20 times, and return the obtained best ESN design.)

4. Load and temperature data series

Normalized hourly load and temperature series of a North-American electric utility, measured for 7 years and illustrated in Fig. 6, are used in this study. For load, the maximum value 1.0 and the minimum value 0.2127 correspond to 4635 MW and 986 MW respectively. For temperature, the maximum value 1.0 and the minimum value 0.0714 correspond to 36.67 °C and -13.9 °C respectively. The proposed one-hour ahead WESN-based forecasters are trained over one year, validated over one year and tested over the two last years of the data series. The proposed 24-h ahead WESN-based forecasters are trained over two years, validated over

Fig. 6. The data series: (a) normalized hourly loads and (b) normalized hourly temperatures.


one year and tested over the two last years of the data series. The longer training set considered here for 24-h ahead forecasting provides more information about the daily behavior of load and temperature at each specific hour of the day, so that more appropriate designs of forecasters can be obtained.

5. Results and discussions

The length Lw of the load and temperature subseries (see Figs. 3 and 4) should be long enough to reflect the actual behavior and embedded temporal information of the signals after decomposition by DWT. Here, Lw is selected to be one year. In addition, the same values of the day-type index as given in Ref. [31] are used for the proposed 24-h ahead WESN-based load forecasters (i.e. 0.25 and 0.75 for weekends/holidays and weekdays respectively). AE (absolute error), MAE (mean absolute error) and MAPE (mean absolute percentage error) are used to measure the accuracy of forecasts as below.

AEi = |Gact,i - Gfor,i|,  i = 1, ..., N   (21)

MAE = (1/N) Σ_{i=1}^{N} AEi   (22)

MAPE = (1/N) Σ_{i=1}^{N} AEi / Gact,i   (23)

Fig. 7. An actual normalized load subseries and its approximation and details up to scale 5.
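Eqs. (21)-(23) translate directly into code (a minimal Python sketch; MAPE is returned in percent, as reported in the tables):

```python
# Forecast accuracy measures of Eqs. (21)-(23) over N hourly entities.

def forecast_errors(actual, forecast):
    assert len(actual) == len(forecast)
    n = len(actual)
    ae = [abs(a - f) for a, f in zip(actual, forecast)]           # Eq. (21)
    mae = sum(ae) / n                                             # Eq. (22)
    mape = 100.0 * sum(e / a for e, a in zip(ae, actual)) / n     # Eq. (23), in percent
    return ae, mae, mape
```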

Gact,i and Gfor,i are respectively the actual and forecasted loads or temperatures at the ith entity of the hourly series, where N denotes the length of the series considered for forecasting.

5.1. One-hour ahead forecasting

Results of the proposed one-hour ahead WESN-based load forecasting are first presented, while actual measured temperatures are used as the predicted temperature input to the ESNs forecasting the components. Three cases of decomposing the load subseries up to scales 3, 4 and 5 are examined.

Table 1. The best parameters of ESNs and their resulting MAPE and MAE over the validation data series for the one-hour ahead WESN-based load forecaster using measured temperatures instead of forecasted ones.

Component (scales)   DR size   DR connectivity   DR spectral radius   MAPE (%)   MAE
D1 (m = 3, 4, 5)     821       0.0122            0.84                 0.4427     0.00220
D2 (m = 3, 4, 5)     912       0.0110            0.79                 0.2411     0.00120
D3 (m = 3, 4, 5)     1014      0.0099            0.76                 0.0786     0.00038
A3 (m = 3)           997       0.0100            0.81                 0.1455     0.00084
D4 (m = 4, 5)        502       0.0199            0.80                 0.0863     0.00042
A4 (m = 4)           118       0.0847            0.84                 0.0579     0.00027
D5 (m = 5)           51        0.1960            0.87                 0.0584     0.00029
A5 (m = 5)           106       0.0943            0.83                 0.0351     0.00017

Table 1 gives the best values of the parameters of the trained and validated ESNs obtained using the described modified SFL algorithm for each component, as well as the resulting MAPE and MAE for each trained ESN over the validation portion of the data series. As seen, less error is obtained for D4 and A4 in comparison to A3, as well as for A5 in comparison to A4. However, the error is slightly higher for D5 in comparison to A4. Keeping in mind that the variations of D5 are normally much less than those of A5 (see Fig. 7) and A4, one can then expect to gain more prediction accuracy when increasing the decomposition scale (m)

from 3 to 4 and from 4 to 5, but with less improvement in the latter than in the former. This is confirmed by the resulting MAPE and MAE given in Table 2 for the load forecaster tested over the two last years of the data series at different decomposition scales. Actually, no further benefit was observed for decomposition to higher scales (m > 5). Since the resulting forecasts are most accurate for the decomposition up to scale 5, the DWT decomposition of load subseries up to this scale is adopted. Typical forecasted results are depicted in Fig. 8(a) against actual loads for 300 h of the test data. Also, the corresponding AE of those load forecasts is shown in Fig. 8(b). The performance of the modified SFL algorithm described in Section 3.5 is compared with that of the conventional SFL [45] in Fig. 9 for obtaining the best design of the ESN forecasting the approximation A5. The parameters of the two algorithms are the same as those given in Section 3.5. The better performance of the modified SFL can be observed from this exemplary comparison. The best solution is nearly found after iteration 40. If the initial population size (Nf) is decreased from 200 to 100, the convergence speed of the modified SFL algorithm decreases, so that the best solution is found after iteration 74 with MSE = 1.783e-8, namely 11.6% worse than the solution obtained before. This indicates that the

Table 2. Resulting MAPE and MAE for the one-hour ahead WESN-based load forecaster tested over the 2 last years for different decomposition scales.

Decomposition    Using measured temperatures    Using forecasted temperatures
up to scale m    MAPE (%)    MAE                MAPE (%)    MAE
m = 3            0.7338      0.00362            0.7562      0.00379
m = 4            0.7016      0.00338            0.7173      0.00351
m = 5            0.6974      0.00331            0.7087      0.00346


Table 3. The best parameters of ESNs and their resulting MAPE and MAE over the validation data series for the one-hour ahead WESN-based temperature forecaster.

Component (scales)   DR size   DR connectivity   DR spectral radius   MAPE (%)   MAE
D1 (m = 3, 4, 5)     109       0.0917            0.88                 0.5241     0.00260
D2 (m = 3, 4, 5)     893       0.0112            0.81                 0.2239     0.00110
D3 (m = 3, 4, 5)     513       0.0195            0.80                 0.0525     0.00026
A3 (m = 3)           57        0.1754            0.83                 0.1436     0.00054
D4 (m = 4, 5)        911       0.0110            0.75                 0.0358     0.00017
A4 (m = 4)           54        0.1852            0.86                 0.0331     0.00013
D5 (m = 5)           208       0.0481            0.81                 0.0491     0.00024
A5 (m = 5)           55        0.1818            0.89                 0.0315     0.00013

Fig. 8. Forecasted results from the proposed one-hour ahead WESN-based load forecaster using actual measured temperatures (a) comparison of forecasted and actual loads and (b) forecasting AE.

algorithm has been trapped in a local optimum. If the number of memeplexes (m) is decreased from 10 to 5 and the number of memetic evolutions (Nme) is increased from 10 to 20, the best solution is nearly found after iteration 66 with MSE = 1.612e-8, namely 0.8% worse than the solution obtained before. The model given in Ref. [33] is used for the same one-hour load forecasting mentioned above in order to be compared with the proposed WESN-based load forecaster. In that model, no padding is applied to the load subseries, and the outputs of the component forecasters are combined linearly using weight coefficients adjusted to minimize the validation MSE. The resulting MAPE over the two last years of the data series is 0.8715%, which is 0.1741% greater than that of the proposed WESN model.

Fig. 10. An actual normalized temperature subseries and its approximation and details up to scale 5.

Next, results of the proposed one-hour ahead WESN-based temperature forecasting are investigated. The same three cases as above are considered. Table 3 gives the best values of the parameters of the trained and validated ESNs as well as the resulting MAPE and MAE for the trained ESNs over the validation portion. Again, less error is observed for D4 and A4 in comparison to A3, as well as for A5 in comparison to A4. However, the error is higher for D5 in comparison to A4. It is noted that the variations of D5 are much less than those of A5 (see Fig. 10) and A4. The resulting MAPE and MAE are given in

Fig. 9. Comparison of the modified SFL and the conventional SFL.

Table 4. Resulting MAPE and MAE for the one-hour ahead WESN-based temperature forecaster tested over the 2 last years for different decomposition scales.

Decomposition up to scale m   MAPE (%)   MAE
m = 3                         0.8962     0.00441
m = 4                         0.8193     0.00419
m = 5                         0.8117     0.00416


Table 4 for the one-hour ahead WESN-based temperature forecaster tested over the two last years of the data series at different decomposition scales. The decomposition of temperature subseries up to scales greater than 5 actually shows no significant benefit. Again, the decomposition of temperature subseries up to scale 5 is adopted. Typical forecasted results are depicted in Fig. 11(a) against actual temperatures for 300 h of the test data. Also, the corresponding AE of those temperature forecasts is shown in Fig. 11(b). If the best values of the parameters of the trained and validated ESNs in Table 1 are compared to those in Table 3 for each component, it will be found that they have no certain relation or similarity. This is mainly because the dynamics and shape of variations of each component of the electric load time series generally differ from those of the same component of the temperature time series. This can be seen by comparing the variations of the components illustrated in Figs. 7 and 10. For instance, the variations and dynamics of D3 depicted in Fig. 7 are completely different from those of D3 shown in Fig. 10. Such a difference can also be observed for the other components. Furthermore, the best values of the parameters of the trained and validated ESNs for different details (D1 to D5) in Table 1 (or Table 3) also have no certain relation or similarity. For instance, the value of NDR for D2 is less than that for D3 in Table 1, but the value of NDR for D2 is greater than that for D3 in Table 3. Based on Eqs. (7)-(9) and Fig. 1, Di (i = 1, 2, 3, ...) is obtained by applying a high-pass filter to Ai-1, so that it contains the upper half of the frequencies in Ai-1 while the remaining lower half of the frequencies appears in Ai, with Ai-1 = Ai + Di. In turn, Di+1 contains just the upper half of the frequencies in Ai, which has no overlap with the frequencies in Di and is entirely lower than them. Therefore, different details (Di) extract different dynamics from separate, non-overlapping frequency bands of the original signal (x). Then, since different values of parameters for the best design of the ESN are generally expected for learning and

predicting different dynamics, the best values of the parameters of the trained and validated ESNs for different details (D1 to D5) of one time series are expected not to be similar or related. However, since Ai contains the lower half of the frequencies in Ai-1, the two may have very close dynamics, provided that Di extracts only simple and insignificant dynamics from Ai-1. For instance, the value of NDR for D5 is 51 in Table 1, indicating relatively simple and insignificant dynamics for this component (in comparison to the other details). Thus, close values of NDR and spectral radius are observed for A4 and A5 in Table 1. Next, the forecasted temperatures obtained from the one-hour ahead WESN-based temperature forecaster designed above are used to provide the predicted next-hour temperature input to the ESNs of the one-hour ahead WESN-based load forecaster. Results for the three aforementioned cases of decomposition of the load subseries are given in Table 2. The MAPE shows respectively 0.0224%, 0.0157% and 0.0113% increase for decomposition up to scales 3, 4 and 5 with respect to the MAPE obtained when actual temperatures are used. Typical forecasted results are depicted in Fig. 12(a) against actual loads for the same 300 h as used in Fig. 8. Also, the corresponding AE of those load forecasts is shown in Fig. 12(b). A zero-mean Gaussian noise was added to the actual measured temperature series in Refs. [21,31] to emulate predicted temperature errors and to investigate the effects of such errors on the performance of load forecasters. Hence, a zero-mean Gaussian noise with a standard deviation of 0.00416 (equal to the MAE of temperature forecasting with decomposition up to scale 5, as given in Table 4) is added to the actual measured temperature series, and the resulting series is used to provide the predicted temperature input of the ESNs of the one-hour ahead WESN-based load forecaster.
The resulting MAE and MAPE are 0.00337 and 0.7011%, which are respectively very close to the 0.00346 and 0.7087% (see Table 2) obtained when the temperature forecaster provides the temperature input to the ESNs of the load forecaster.
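The noise-injection test described above can be reproduced as follows (a minimal Python sketch; the seed handling is our own addition):

```python
import random

# Emulate temperature-forecast error as in Refs. [21,31]: add zero-mean
# Gaussian noise to the measured temperature series. The default standard
# deviation 0.00416 equals the MAE of the one-hour ahead temperature
# forecaster with decomposition up to scale 5 (Table 4).

def add_forecast_noise(temperatures, sigma=0.00416, seed=None):
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, sigma) for t in temperatures]
```

The noisy series then replaces the predicted-temperature input of the load forecaster's ESNs.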

Fig. 11. Forecasted results from the proposed one-hour ahead WESN-based temperature forecaster (a) comparison of forecasted and actual temperatures and (b) forecasting AE.

Fig. 12. Forecasted results from the proposed one-hour ahead WESN-based load forecaster using forecasted temperatures (a) comparison of forecasted and actual loads and (b) forecasting AE.

A. Deihimi et al. / Energy 57 (2013) 382e401

The effect of the temperature prediction error on load forecasting is also studied locally at each specific hour by sensitivity analysis, as in Ref. [31]. For this, some deviations are applied to the actual temperature at a specific hour, and the forecasting error obtained from the load forecaster is studied for that hour. Sensitivity analysis was performed for a large set of different hours on different days. Typical results are depicted in Fig. 13 for some different hours. The actual measured temperature is marked by a square in Fig. 13. The plots show that the sensitivity of the load forecasting error is much higher for a positive temperature deviation than for a similar negative one. So, underestimating the next-hour temperature results in less load forecasting error than overestimating it.
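The sensitivity analysis described above amounts to sweeping a deviation over the temperature input at one target hour and recording the change in forecast error (a minimal Python sketch; `forecaster`, the input dictionary and the deviation grid are hypothetical stand-ins for the trained WESN-based load forecaster and its inputs):

```python
# Local sensitivity analysis: perturb the temperature input at one target
# hour by a range of deviations and record the resulting absolute
# forecasting error against the actual load.

def temperature_sensitivity(forecaster, inputs, actual_load,
                            deviations=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    errors = {}
    for dev in deviations:
        perturbed = dict(inputs, temperature=inputs["temperature"] + dev)
        errors[dev] = abs(forecaster(perturbed) - actual_load)
    return errors
```

Plotting `errors` against the deviations reproduces the shape of the curves in Fig. 13 for a given hour.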

Table 5. The best DR sizes of ESNs and their resulting MAPE and MAE over the validation data series for the 24-h ahead WESN-based load forecasters using measured temperatures instead of forecasted ones. For each component, the 24 triplets (DR size, MAPE (%), MAE) correspond to day hours 1-24.

D1: 123 1.0712 0.00537; 119 1.1050 0.00554; 112 1.1773 0.00594; 124 1.2285 0.00615; 91 1.2943 0.00644; 108 1.4763 0.00732; 137 1.8962 0.00943; 124 1.7708 0.00886; 113 1.4307 0.00714; 89 1.1895 0.00594; 141 1.1594 0.00536; 82 1.2117 0.00614; 153 1.2705 0.00634; 154 1.3225 0.00662; 121 1.3761 0.00689; 111 1.4767 0.00746; 114 1.5399 0.00773; 107 1.4845 0.00747; 92 1.3716 0.00683; 83 1.3329 0.00662; 71 1.2044 0.00611; 122 1.1115 0.00564; 113 0.8964 0.00452; 127 1.0699 0.00533

D2: 149 0.8146 0.00411; 121 0.8569 0.00433; 153 0.8762 0.00442; 121 0.9229 0.00461; 78 0.9793 0.00493; 101 1.1538 0.00571; 129 1.3788 0.00685; 106 1.3422 0.00672; 101 1.0553 0.00533; 127 0.8994 0.00451; 154 0.8646 0.00431; 142 0.8967 0.00452; 91 0.9289 0.00462; 132 0.9521 0.00486; 143 1.0072 0.00497; 113 1.0325 0.00513; 118 1.0221 0.00514; 149 1.0353 0.00524; 101 1.0463 0.00526; 115 1.0293 0.00514; 124 0.9502 0.00473; 121 0.8695 0.00436; 101 0.7183 0.00366; 123 0.8343 0.00421

D3: 131 0.4051 0.00202; 119 0.4299 0.00214; 127 0.4305 0.00215; 113 0.4585 0.00236; 139 0.4927 0.00254; 131 0.5502 0.00275; 111 0.7046 0.00349; 121 0.6834 0.00344; 108 0.5278 0.00261; 132 0.4582 0.00227; 92 0.4592 0.00236; 114 0.4654 0.00232; 113 0.4788 0.00242; 80 0.5143 0.00261; 84 0.5373 0.00277; 102 0.5563 0.00279; 106 0.5547 0.00287; 126 0.5543 0.00286; 115 0.5205 0.00261; 130 0.4947 0.00251; 102 0.4586 0.00237; 131 0.4219 0.00215; 120 0.3292 0.00165; 138 0.3937 0.00211

D4: 130 0.1344 0.00067; 117 0.1429 0.00072; 148 0.1395 0.00069; 151 0.1499 0.00074; 128 0.1533 0.00076; 135 0.1753 0.00087; 124 0.1964 0.00091; 134 0.1954 0.00091; 152 0.1679 0.00084; 155 0.1397 0.00069; 137 0.1279 0.00063; 143 0.1374 0.00068; 119 0.1438 0.00072; 92 0.1449 0.00072; 142 0.1539 0.00076; 104 0.1571 0.00078; 131 0.1637 0.00081; 121 0.1568 0.00078; 132 0.1537 0.00076; 141 0.1514 0.00075; 111 0.1413 0.00071; 154 0.1374 0.00068; 135 0.1117 0.00051; 152 0.1322 0.00065

D5: 92 0.09693 0.00048; 84 0.1003 0.00049; 64 0.1032 0.00050; 72 0.1085 0.00054; 81 0.1173 0.00058; 105 0.1396 0.00069; 70 0.1781 0.00087; 91 0.1711 0.00086; 73 0.1312 0.00065; 71 0.1083 0.00050; 111 0.1031 0.00051; 94 0.1075 0.00053; 101 0.1166 0.00058; 79 0.1229 0.00061; 91 0.1305 0.00065; 72 0.1328 0.00060; 84 0.1313 0.00065; 85 0.1243 0.00062; 86 0.1189 0.00059; 104 0.1166 0.00058; 62 0.1058 0.00050; 82 0.1009 0.00050; 102 0.0808 0.00040; 77 0.0966 0.00048

A5: 128 0.09382 0.00048; 111 0.1009 0.00049; 131 0.1088 0.00052; 55 0.1163 0.00053; 104 0.1209 0.00056; 102 0.1479 0.00071; 102 0.1727 0.00089; 93 0.1568 0.00088; 114 0.1217 0.00066; 102 0.0974 0.00054; 109 0.0924 0.00053; 107 0.0973 0.00057; 103 0.1069 0.00060; 118 0.1121 0.00067; 95 0.1175 0.00069; 145 0.1196 0.00070; 127 0.1189 0.00069; 113 0.1108 0.00064; 117 0.1047 0.00061; 132 0.0982 0.00057; 101 0.0907 0.00052; 124 0.0882 0.00049; 101 0.0876 0.00040; 94 0.0901 0.00048

Fig. 13. Sensitivity analysis of load forecasting error to temperature deviation at (a) 3:00, (b) 19:00 and (c) 23:00.

5.2. 24-hour ahead forecasting

Results of the twenty-four proposed 24-h ahead WESN-based load forecasters for the day hours are first presented, while actual measured temperatures are used as the predicted temperature input to the ESNs forecasting the components. Again, the decomposition of load subseries up to scale 5 is adopted. Table 5 gives the best DR sizes of the trained and validated ESNs for each component as well as the resulting MAPE and MAE for each trained ESN over the validation portion of the data series. The resulting MAPE and MAE for load


Table 6. Resulting MAPE and MAE for the 24-h ahead WESN-based load forecasters tested over the 2 last years. Values are listed for day hours 1-24, followed by the average over the 24 h.

Using measured temperatures, MAPE (%): 1.5137 1.5704 1.8410 1.8857 2.1003 2.3214 2.6792 2.3824 1.9094 1.5410 1.4972 1.5166 1.5604 1.6250 1.7228 1.8258 1.9125 1.8439 1.7554 1.6490 1.6059 1.4333 1.4232 1.4345; average 1.7729
Using measured temperatures, MAE: 0.0080 0.0081 0.0091 0.0097 0.0104 0.0118 0.0145 0.0135 0.0108 0.0094 0.0090 0.0091 0.0095 0.0098 0.0103 0.0108 0.0113 0.0108 0.0104 0.0099 0.0097 0.0085 0.0072 0.0086; average 0.0101
Using forecasted temperatures, MAPE (%): 1.5230 1.6525 1.8559 1.9093 2.1612 2.4962 2.9351 3.3912 2.0145 1.6143 1.5253 1.6501 1.5780 1.6426 1.8652 1.8802 1.9531 1.8953 1.7695 1.6596 1.6173 1.4384 1.4257 1.4508; average 1.8709
Using forecasted temperatures, MAE: 0.0081 0.0085 0.0093 0.0098 0.0108 0.0128 0.0161 0.0196 0.0116 0.0094 0.0091 0.0093 0.0097 0.0099 0.0105 0.0108 0.0114 0.0112 0.0108 0.0099 0.0098 0.0085 0.0073 0.0089; average 0.0105
Using noisy temperatures, MAPE (%): 1.5419 1.6822 1.8492 1.8823 2.1507 2.5922 2.7785 2.4881 1.9755 1.5564 1.5027 1.5178 1.6257 1.6900 1.8089 1.8837 1.9996 1.8870 1.8392 1.6788 1.6607 1.4831 1.5678 1.4976; average 1.8391
Using noisy temperatures, MAE: 0.0083 0.0088 0.0093 0.0097 0.0107 0.0131 0.0150 0.0140 0.0110 0.0091 0.0090 0.0091 0.0099 0.0103 0.0107 0.0110 0.0116 0.0110 0.0109 0.0100 0.0100 0.0089 0.0076 0.0087; average 0.0103

forecasters tested over two last years of the data series are given in Table 6. The average MAPE and MAE over 24 h of the day are 1.7729% and 0.0101, respectively. The forecasted results obtained from the proposed 24-h ahead WESN-based load forecasters are illustrated

in Fig. 14 against actual loads at some day hours for 101 days of the test data. Also, the corresponding AE of those load forecasts is shown in Fig. 14. As mentioned before, the two values of the day-type index are selected to be the same as those in Ref. [31], namely 0.25 and 0.75 for weekends/holidays and weekdays respectively. This index is used as an input for 24-h ahead load forecasting (see Fig. 3(b)). Actually, any pair of values from [0,1] could be selected to discriminate between weekends/holidays and weekdays, provided that their difference is sufficiently large. The study of the forecasting errors at different day hours while examining diverse pairs of values of the day-type index shows that 0.2 is a sufficient difference: forecasting errors at a specific hour of the day are very close to each other for differences greater than 0.2 and gradually increase for differences less than 0.2. The worst forecasting error is obtained when the difference is zero. For instance, the resulting MAPE of the 24-h ahead WESN-based load forecaster at 9:00 grows (from the 1.9094 given in Table 6) to 2.6491 in the case of zero difference. It should be noted that the best design of the 24-h ahead load forecaster is obtained anew for each selected pair of values of the day-type index. Next, results of the twenty-four proposed 24-h ahead WESN-based temperature forecasters for the day hours are investigated. Again, the decomposition of temperature subseries up to scale 5 is used. Table 7 gives the best DR sizes of the trained and validated ESNs for each component as well as the resulting MAPE and MAE for each trained ESN over the validation portion of the data series. The resulting MAPE and MAE for the temperature forecasters tested over the two last years of the data series are given in Table 8. The average MAPE and MAE over the 24 h of the day are 1.8741% and 0.0105, respectively.
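The day-type index input discussed above can be generated as in this sketch (the holiday set is a hypothetical placeholder; only the two index values 0.25 and 0.75 come from the text):

```python
import datetime

# Day-type index for the 24-h ahead forecasters: 0.25 for weekends and
# holidays, 0.75 for weekdays, a pair whose difference (0.5) comfortably
# exceeds the 0.2 threshold found above. HOLIDAYS is illustrative only.

HOLIDAYS = {datetime.date(2013, 1, 1)}

def day_type_index(day):
    is_weekend = day.weekday() >= 5  # Saturday = 5, Sunday = 6
    return 0.25 if (is_weekend or day in HOLIDAYS) else 0.75
```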
The forecasted results using the proposed 24-h ahead WESN-based temperature forecasters are illustrated in Fig. 15 against actual temperatures at some of day hours for 101 days of

Fig. 14. Results from the proposed 24-h ahead WESN-based load forecasters using actual measured temperatures: comparison of forecasted and actual loads, and forecasting AE at (a,b) 1:00, (c,d) 6:00, (e,f) 11:00, (g,h) 16:00, (i,j) 19:00, (k,l) 21:00.

A. Deihimi et al. / Energy 57 (2013) 382–401

Table 7
The best DR sizes of ESNs and their resulting MAPE and MAE over the validation data series for 24-h ahead WESN-based temperature forecasters. Each cell gives DR size / MAPE (%) / MAE for the component at that day hour.

Hour  D1                D2                 D3                 D4                  D5                 A5
1     84/2.3390/0.0116  79/1.2029/0.0060   72/0.5226/0.0026   83/0.1774/0.00085   51/0.2172/0.00102  39/0.1439/0.00091
2     21/2.4694/0.0123  73/1.2247/0.0061   57/0.5643/0.0028   58/0.1759/0.00087   41/0.2102/0.00101  41/0.1402/0.00088
3     24/2.6228/0.0131  55/1.3475/0.0066   51/0.5869/0.0029   82/0.1791/0.00088   33/0.2124/0.00101  39/0.1374/0.00089
4     34/2.6727/0.0133  51/1.3074/0.0065   65/0.5667/0.0028   60/0.1798/0.00089   31/0.2167/0.00102  44/0.1419/0.00090
5     32/2.5602/0.0127  46/1.3359/0.0066   44/0.586/0.0029    62/0.1733/0.00085   33/0.2144/0.00101  32/0.1415/0.00090
6     36/2.8124/0.0142  42/1.5136/0.0075   81/0.6404/0.0032   64/0.1851/0.00092   22/0.2186/0.00103  43/0.1429/0.00093
7     35/2.6667/0.0132  33/1.4521/0.0072   74/0.5636/0.0028   72/0.1799/0.00089   21/0.2192/0.00103  41/0.1409/0.00093
8     22/2.6218/0.0131  46/1.3539/0.0067   82/0.5253/0.0026   71/0.174/0.00086    21/0.2081/0.00099  34/0.1415/0.00089
9     54/2.6579/0.0132  61/1.3955/0.0069   75/0.5142/0.0026   69/0.1706/0.00085   25/0.2185/0.00103  31/0.1461/0.00092
10    72/2.4778/0.0123  73/1.3557/0.0067   61/0.5197/0.0026   81/0.1815/0.0009    34/0.2217/0.00106  33/0.1513/0.00094
11    63/2.5023/0.0124  62/1.3021/0.0064   59/0.54/0.0027     75/0.1778/0.0008    21/0.2191/0.00103  24/0.1549/0.00095
12    42/2.5569/0.0127  68/1.2653/0.0062   81/0.5181/0.0026   72/0.1841/0.00091   35/0.2182/0.00104  33/0.1585/0.00096
13    55/2.5742/0.0128  57/1.2448/0.0062   83/0.5003/0.0025   65/0.1831/0.00091   33/0.2177/0.00104  31/0.1639/0.00098
14    42/2.5044/0.0124  93/1.2271/0.0061   94/0.4884/0.0024   63/0.1829/0.00091   32/0.2553/0.00114  29/0.1656/0.00098
15    33/2.5261/0.0126  81/1.1673/0.0058   100/0.4915/0.0025  71/0.1794/0.00089   32/0.2238/0.00106  31/0.1681/0.00098
16    49/2.4181/0.0122  79/1.1494/0.0058   102/0.48/0.0024    79/0.1784/0.00088   33/0.2142/0.00102  32/0.1676/0.00095
17    41/2.5075/0.0125  102/1.1877/0.0059  125/0.4614/0.0023  88/0.1837/0.0009    35/0.2097/0.00099  31/0.1636/0.00092
18    42/2.4663/0.0125  91/1.2068/0.0059   82/0.4689/0.0023   61/0.1813/0.0009    41/0.2115/0.00103  43/0.1609/0.00092
19    38/2.3718/0.0118  84/1.1802/0.0059   82/0.4591/0.0023   74/0.1735/0.00085   30/0.1933/0.00096  32/0.1491/0.00082
20    72/2.6168/0.0131  71/1.2266/0.0061   90/0.4813/0.0024   61/0.1679/0.00083   43/0.2050/0.00100  34/0.1451/0.00085
21    98/2.1095/0.0105  106/1.2272/0.0061  105/0.438/0.0022   68/0.1604/0.0007    31/0.1913/0.00095  32/0.1357/0.0008
22    59/2.2162/0.0115  92/1.2089/0.0060   78/0.4179/0.0021   71/0.1709/0.00085   29/0.1874/0.00093  41/0.1371/0.00082
23    41/2.3914/0.0119  63/1.2767/0.0064   74/0.4675/0.0023   80/0.1736/0.00086   32/0.1986/0.00098  41/0.1434/0.00087
24    52/2.4378/0.0121  65/1.3092/0.0065   71/0.4805/0.0024   73/0.1726/0.00085   31/0.2004/0.00100  44/0.1396/0.00086

Table 8
Resulting MAPE and MAE for 24-h ahead WESN-based temperature forecasters tested over the last two years.

Day hour   MAPE (%)   MAE
1          1.4965     0.0080
2          1.6564     0.0086
3          1.8602     0.0094
4          1.9138     0.0099
5          2.1663     0.0109
6          2.5021     0.0130
7          2.9420     0.0163
8          3.3992     0.0197
9          2.0192     0.0116
10         1.6181     0.0095
11         1.5289     0.0093
12         1.6538     0.0094
13         1.5817     0.0098
14         1.6465     0.0100
15         1.8697     0.0103
16         1.8847     0.0109
17         1.9577     0.0116
18         1.8998     0.0113
19         1.7737     0.0109
20         1.6636     0.0099
21         1.6212     0.0100
22         1.4418     0.0086
23         1.4260     0.0072
24         1.4547     0.0081
Average    1.8741     0.0106

the test data. The corresponding AE of those temperature forecasts are also shown in Fig. 15. Next, the forecasted temperatures obtained from the 24-h ahead WESN-based temperature forecasters designed above are used to provide the predicted next-day temperature input to the ESNs of the corresponding 24-h ahead WESN-based load forecasters for each hour of the day. The resulting MAPE and MAE for the load forecasters tested over the last two years of the data series are given in Table 6. The average MAPE and MAE over the 24 h of the day are 1.8709% and 0.0105, respectively. The MAPE shows an increase of 0.098 percentage points with respect to the MAPE obtained when actual temperatures are used. To emulate predicted-temperature errors and evaluate their effects on the performance of the proposed 24-h ahead WESN-based load forecasters, a zero-mean Gaussian noise with a standard deviation equal to the MAE of temperature forecasting for each hour of the day is added to the actual temperature series of the same hour, and the resulting series is used to provide the predicted temperature input of the ESNs of the corresponding WESN-based load forecaster. The resulting MAPE and MAE for the twenty-four proposed 24-h ahead load forecasters tested over the last two years of the data series are given in Table 6. The average MAPE and MAE over the 24 h of the day are 1.8391% and 0.0103, respectively, which are close to the values of 1.8709% and 0.0105 (see Table 6) obtained when the temperature forecasters provide the temperature input to the ESNs of the load forecasters. The effects of the temperature prediction error on the performance of each 24-h ahead WESN-based load forecaster designed above are also studied locally at each specific day by sensitivity analysis. Again, some deviations are applied to the actual temperature of a specific day, and the forecasting error obtained from each 24-h ahead WESN-based load forecaster is examined for that day.
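The noise-emulation step described above can be sketched as follows (an illustration, not the authors' code): zero-mean Gaussian noise, with standard deviation equal to the per-hour MAE of the temperature forecaster, is added to the actual temperature series of that hour.

```python
import random

def noisy_temperatures(actual_temps, hourly_mae, seed=0):
    """Corrupt the actual series with N(0, hourly_mae) noise to emulate
    temperature-forecast errors."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, hourly_mae) for t in actual_temps]
```

Over a long series the injected errors average to zero with variance close to the square of the chosen MAE, matching the intended error model.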
Sensitivity analysis was performed on each 24-h ahead WESN-based load forecaster for a large number of different days. Typical results for several days are depicted in Fig. 16. The actual measured temperature is marked by a square in Fig. 16. The plots show that the sensitivity of the load forecasting error is very small for positive and negative deviations of the predicted temperature within ±5 °C.
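The sensitivity analysis itself reduces to a simple sweep. In this sketch, `forecaster` is a hypothetical stand-in for a trained 24-h ahead WESN-based load forecaster; the loop perturbs the day's temperature input and records the resulting absolute forecast error, as plotted in Fig. 16.

```python
def sensitivity_curve(forecaster, other_inputs, actual_temp, actual_load,
                      deviations=range(-5, 6)):
    """Map each temperature deviation (in degrees C) to the absolute
    forecast error it induces for one specific day."""
    return {dt: abs(forecaster(other_inputs, actual_temp + dt) - actual_load)
            for dt in deviations}
```

With a toy forecaster that is linear in temperature, the error grows symmetrically with the deviation, while for the trained WESN forecasters the paper reports the curve stays nearly flat within ±5 °C.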


Fig. 15. Results from the proposed 24-h ahead WESN-based temperature forecasters: comparison of forecasted and actual temperatures, and forecasting AE at (a,b) 1:00, (c,d) 6:00, (e,f) 11:00, (g,h) 16:00, (i,j) 19:00, (k,l) 21:00.

Fig. 16. Sensitivity analysis for 24-h ahead WESN-based load forecasting error to temperature deviations at (a) 2:00 and (b) 10:00.

5.3. Comparisons

For a proper and meaningful comparison of the proposed WESN-based forecasters with other models previously developed and reported in related articles, the following points should be considered. First, the forecasting target should be the same when the results of the models are compared. This means, for instance, that all models should generate one-hour ahead load forecasts in order for their results to be comparable. Hence, comparing the results of a next-day forecasting model with those of a next-hour forecasting model is not proper. The main reason is the different natures of the dynamics of those two time series. Second, the models should be compared using the same data series. Since the dynamics of the electric load (or temperature)

Table 9
Comparison between WESN-based and ESN-based forecasters in terms of the resulting average MAPE (%) over the last two years.

                     STLF, actual temperatures   STLF, forecasted temperatures   STTF
Type of forecaster   One-hour     24-h           One-hour     24-h               One-hour    24-h
WESN (proposed)      0.6974       1.7729         0.7087       1.8709             0.8117      1.8741
ESN [31]             1.14         2.37           1.2312       >6                 2.1899      >3
Table 10
Comparison of WESN-based load forecasters and other models in terms of the resulting MAPE (%) over the last two years of the data series excluding weekends and holidays.

Load forecasting model   One-hour ahead   24-h ahead
M1 (MLPNN) [21]          2.10             3.58
M2 (MLPNN) [21]          1.10             3.41
M3 (WT-MLPNN) [21]       1.12             3.16
M4 (WT-MLPNN) [21]       1.99             2.64
WT-MLPNN-EC [19]         0.99             2.04
ESN [31]                 1.0480           2.1174
WESN (proposed)          0.6826           1.7089
Table 11
Comparison of load forecasters (WESN and abductive network) tested over the year 1990, in terms of MAPE (%).

Load forecasting model   One-hour ahead   24-h ahead
WESN (proposed)          0.6901           1.9327
Abductive network        1.14             2.66
variations can be quite different for different locations, the forecasting results most often differ even when the same forecasting model is used. Several forecasting models proposed in the literature (e.g. in Refs. [1,2,4,25,30,38,39]) use specific time series unavailable to other researchers. Thus, for comparison purposes, the models developed in such articles should be carefully rebuilt for testing on the same available time series, provided that sufficient details of the developed model have been given in the article. Third, the same weather factors should be used for the models. For instance, if only temperature is used for one load forecasting model while temperature and humidity are used for another, the comparison of the load forecasts of the two models is not fair due to probable adverse or improving effects of the additional humidity data on load forecasting [30]. In view of the aforementioned points, this section compares the proposed WESN-based forecasters with other forecasting models. First, the comparison is made using models developed in Refs. [5,19,21,31,52–56] that were previously tested on the data series of the North-American electric utility which are also used in this study. Next, the comparison is made by rebuilding other previously developed models given in Refs. [11,25,38,57] that have not been tested on these data series yet. In Ref. [31], the ESN was applied as a stand-alone forecaster for STLF and STTF using the same data series as used in this study. Both one-hour and 24-h ahead load and temperature forecasting, in the same fashion as in this study, were considered in Ref. [31], where promisingly high prediction accuracy was reported except for 24-h ahead temperature forecasting. Comparisons between the ESN-based forecasters [31] and the proposed WESN-based forecasters in terms of average prediction accuracy over the last two years of the data series are given in Table 9.
This comparison reveals significant improvements of the load and temperature forecasts using the proposed WESN-based forecasters in all items, especially for 24-h ahead temperature forecasting and 24-h ahead load forecasting employing forecasted temperatures. Two MLPNN models called M1 and M2 and two WT-MLPNN models called M3 and M4 were proposed in Ref. [21] for one-hour ahead load forecasting with temperature as the only considered weather factor, where they were examined using the data series of the North-American electric utility excluding the data of weekends and holidays. Those models used recursion (i.e. feeding the input with the output of the model) for multiple steps (up to 24 h) ahead load forecasting. M1 has 11 inputs: loads and temperatures of the current hour and of 1, 23 and 167 h ago; the predicted next-hour temperature; and two additional inputs codifying the next hour of the day. M2 has four additional inputs (i.e. first-order differenced loads of the current hour and of 1, 23 and 167 h ago) with respect to M1. M3 has eight additional inputs (i.e. first-order differenced A3 and D3 at the current hour and 1, 23 and 167 h ago) with respect to M2. In M4, three MLPNN-based forecasters for the components A3, D3 and D2 and a mean-calculation unit for D1 are used, and the sum of their outputs gives the next-hour load forecast. The forecasters of D3 and D2 do not use a temperature input. Using the same time series without weekends and holidays, a WT-MLPNN-EC model was examined in Ref. [19]. That model uses four MLPNN-based forecasters for the components A3, D3, D2 and D1. A sufficiently long set of lagged values (up to 500 h ago) of the components and temperature, in addition to the predicted next-hour temperature, are considered as inputs, while a two-step correlation analysis is used to select the best lagged inputs for each component forecaster. Furthermore, the weight matrix of each component forecaster is improved by an evolutionary algorithm after training by the Levenberg–Marquardt algorithm.
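As a concrete, hypothetical illustration of M1's 11-input scheme described above (the sin/cos hour encoding is an assumption; Ref. [21] only states that two inputs codify the next hour of the day):

```python
import math

def m1_inputs(load, temp, predicted_next_temp, t):
    """Build M1's 11-input vector at hour index t (requires t >= 167)."""
    lags = [0, 1, 23, 167]                     # current hour and 1, 23, 167 h ago
    x = [load[t - k] for k in lags]            # 4 lagged loads
    x += [temp[t - k] for k in lags]           # 4 lagged temperatures
    x.append(predicted_next_temp)              # predicted next-hour temperature
    h = (t + 1) % 24                           # next hour of the day
    x += [math.sin(2 * math.pi * h / 24),      # two inputs codifying the hour
          math.cos(2 * math.pi * h / 24)]      # (assumed cyclic encoding)
    return x
```

M2, M3 and M4 extend this vector with differenced loads and differenced wavelet components as described in the text.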
The comparison of the resulting MAPE values for M1 [21], M2 [21], M3 [21], M4 [21], WT-MLPNN-EC [19], ESN [31] and the proposed WESN-based load forecasters tested over the last two years of the data series is given in Table 10, where the data of weekends and holidays have been omitted from the whole data series for training, validation and testing of the models. Moreover, actual measured temperatures are used to feed the predicted temperature input for all models. Significant improvements for both one-hour and 24-h ahead load forecasting are observed using the proposed WESN-based load forecasters. In Ref. [52], abductive networks were used for both one-hour and 24-h ahead load forecasting, where five years (1985–1989) of the data series of the North-American electric utility were applied for model synthesis and the models were tested over the year 1990. One model was used for each hour of the day. The inputs for 24-h ahead load forecasting are the minimum and maximum temperatures and 24 hourly loads of the previous day, the predicted minimum and maximum temperatures of the current day, and a 4-value day-type code for the current day. The inputs for one-hour ahead load forecasting are the daily mean temperature and 24 hourly loads of the previous day, the predicted daily mean temperature of the current day, a 2-value day-type code for the current day, and the hourly loads from 1:00 to the current hour of the current day. The best inputs of each model are obtained in an automatic iterative process. Table 11 gives a comparison of the

Table 12
Comparison of WESN-based load forecasters and other models for 16–40-h ahead load forecasting for Tuesdays–Fridays and 16–88-h ahead load forecasting for weekends plus Mondays. Load forecasting error: from Nov. 1, 1990 to March 31, 1991.

Load forecasting model   MAPE (%)   MAE (MW)
EGRV [53]                4.73       not given
NN with L2-SVM [54]      4.88       not given
NN with EB [54]          4.89       not given
LWR [5]                  4.71       139.11
Local SVR [5]            4.08       121.84
LWSVR [5]                3.62       101.02
WESN (proposed)          3.2708     89.5597

Table 14
Comparison of temperature forecasters (WESN, abductive network and MLPNN) tested on the year 1990.

Fig. 17. The resulting MAPE of every day of the week for EGRV, LWSVR and WESN models over the testing period.

results of the abductive networks [52] with those of the proposed WESN-based load forecasters when the models are tested over the year 1990. Significant improvements for both one-hour and 24-h ahead load forecasting are observed using the WESN model. In Refs. [5,53,54], the data series of the North-American electric utility were used to forecast the entire hourly loads of the next day at 8:00 on the current day, hence from 16 up to 40 h ahead, when the next day is a weekday from Tuesday to Friday, as well as the entire hourly loads of three consecutive days at 8:00 on Fridays, hence from 16 up to 88 h ahead, for the entire weekend plus Monday. The entire hourly load forecasts of the next day run from midnight up to the next midnight in the former case, and the entire hourly load forecasts of three consecutive days run from the midnight of Saturday up to the midnight of Tuesday in the latter case. The results of the load forecasts were obtained with actual measured temperatures feeding the predicted temperature input. In Ref. [53], 24 regression models (one for each hour of the day) for Tuesdays–Fridays and 24 regression models for weekends plus Mondays were developed using 31 inputs: loads at 8:00 for the last day, the last Monday and the day after the last holiday; temperature and its square for the target hour; maximum temperature and its square for both the target day and the previous day; the moving average of midnight temperatures of the 7 last days; introduced interactions between temperature and monthly binary variables for October, November, December, February and March; dummy variables for year, inversed year, October, November, December, February, March, Monday, Friday and a day after a holiday; a constant bias; and five hourly lagged prediction errors before 8:00 on the last day. This multiple regression model is called EGRV, referring to the first letters of its developers' names (Engle, Granger, Ramanathan and Vahid-Araghi). In Ref. [54], two nonparametric learning procedures,

Temperature forecasting error over the year 1990:

           One-hour ahead forecasters          24-h ahead forecasters
           WESN     Abductive net.   MLPNN     WESN     Abductive net.   MLPNN
MAPE (%)   0.8037   2.14             2.37      1.9817   3.49             3.86
MAE (°C)   0.2231   0.583            0.616     0.5172   0.933            0.989

namely an EB (extended Bayesian) learning and an SVM (support vector machine) learning with a second-order error function called L2-SVM, were developed to tackle the problems of NN structure and input selection for STLF. To apply the models to the aforementioned STLF problem, 84 inputs were considered, including 24 dummy variables codifying the hours of the day; 1–6, 24–29 and 168–173-h lags for load as well as temperature and its square; the predicted target-hour temperature and its square; the predicted maximum target-day temperature and its square; and the maximum temperature and its square for the previous day. In Ref. [5], three models including LWR (locally weighted regression), local SVR (support vector regression) and LWSVR (locally weighted support vector regression) were applied to the aforementioned STLF problem. The same inputs were considered as used in Ref. [54]. In order to apply the proposed WESN-based load forecasters to the same STLF problem, appropriate 48, 72 and 96-h ahead WESN-based load forecasters are designed for the hours of the day like the 24-h ahead WESN-based load forecasters (Fig. 3(b)), where d + 1 is then replaced by d + 2, d + 3 and d + 4, respectively. The 96-h ahead WESN-based load forecasters are used only for the hours from 8:00 up to 24:00 to forecast loads at these hours on Monday; they are not needed for the hours from 1:00 to 7:00 because the load forecasts at these hours on Monday are produced by the corresponding 72-h ahead WESN-based load forecasters. The resulting MAPE of the proposed WESN-based forecasters is compared with that of EGRV [53], NN with EB [54], NN with L2-SVM [54], LWR [5], local SVR [5] and LWSVR [5] over the testing period from November 1, 1990 to March 31, 1991, as given in Table 12. The comparison shows the superiority of the proposed WESN-based load forecasters. In addition, the resulting MAPE computed over the testing period for each day of the week is given in Fig. 17 for EGRV [53], LWSVR [5] and the proposed WESN-based load forecasters. It can be seen from these results that the WESN model gives better load forecast accuracy for

Table 13
Comparison of one-hour ahead WESN-based load forecasters and other one-hour ahead load forecasters tested over the last two years of the data series. Each cell gives MAPE (%) / MAE (MW).

Day         WESN             ESN [31]         ANFIS [25]       WFNN [25]        ARMAX [11]
Monday      0.7416/16.3520   1.1840/26.2178   3.0775/69.5937   1.4650/33.4052   1.6179/36.8847
Tuesday     0.6866/15.0112   1.1005/24.6863   2.8025/63.8611   1.3618/31.5563   1.5661/35.4679
Wednesday   0.6553/14.5443   1.0734/24.1308   2.6978/61.6795   1.4135/32.3594   1.5704/35.3078
Thursday    0.6801/15.0118   1.1651/25.8993   3.0322/68.5754   1.4378/33.7788   1.6488/37.2286
Friday      0.7246/15.7970   1.1554/25.6348   3.1575/71.1042   1.4566/33.4369   1.5966/36.0601
Saturday    0.6677/14.8141   1.1007/24.5032   2.8992/64.8681   1.4572/33.1683   1.6418/36.9016
Sunday      0.7261/15.8627   1.2011/26.4402   2.8723/64.3484   1.4293/32.1691   1.5773/35.5328
Average     0.6974/15.3418   1.14/25.3589     2.9341/66.2901   1.4316/32.6963   1.6027/36.1976


Table 15
Comparison of temperature forecasters (WESN and WNN) for daily mean temperature forecasting in terms of the resulting MAPE (%) over the last year of the data series.

Month      WESN     WNN
Sept. 91   0.6427   1.1532
Oct. 91    0.6449   1.1453
Nov. 91    0.6421   1.1392
Dec. 91    0.6461   1.1496
Jan. 92    0.6236   1.1318
Feb. 92    0.6225   1.1128
Mar. 92    0.6364   1.1467
Apr. 92    0.6611   1.2521
May 92     0.6379   1.1328
June 92    0.5567   0.9712
July 92    0.5272   1.0102
Aug. 92    0.4872   0.9193
Ave.       0.6107   1.1053

different days of the week except Monday. In particular, it significantly enhances the load forecasts for weekends (Saturdays and Sundays). Although the MAPE of the WESN model is somewhat greater than that of LWSVR for Mondays, the difference (about 0.1%) is fairly small, and overall the performance of the WESN model is better than that of the other models given in Table 12. In fact, the 96-h ahead WESN-based load forecasters at some hours yield the worst accuracy in comparison with the similar 24, 48 and 72-h ahead WESN-based load forecasters. Table 13 gives the results of one-hour ahead load forecasting obtained from models rebuilt from Refs. [11,25] along with those of the proposed WESN models and the ESNs [31] when the models are tested over the last two years of the data series of the North-American electric utility; the resulting MAPE and MAE of each day of the week are reported separately. From Ref. [11], an ARMAX model is rebuilt where a PSO algorithm is used for order determination and parameter estimation. The temperature is the only exogenous variable used here. The training, validation and testing data are the same as those used for the WESN model. From Ref. [25], two models, namely ANFIS and WFNN, are rebuilt here. The inputs used in Ref. [25] are the hour of the day, the day of the month, the previous-hour load, the previous-hour temperature, the previous-hour humidity, the previous-hour wind speed, the previous-day load during the same hour, and the mean of the previous-week load. Since humidity and wind-speed data are not available for the data series used here, their related inputs are omitted in this study. The models (ANFIS and WFNN) are trained using four years of the data series and tested on the last two years of the data series. The results in Table 13 show that the performance of the proposed WESN model is superior to that of the other compared models. In Ref. [55], abductive networks were used for both one-hour and 24-h ahead temperature forecasting, where five years (1985–1989) of the temperature data series of the North-American electric utility were applied for model synthesis and the models were tested over the year 1990. One model was used for each hour of the day. The inputs for 24-h ahead temperature forecasting are the minimum and maximum temperatures and 24 hourly temperatures of the previous day and the predicted minimum and maximum temperatures of the current day. The inputs for one-hour ahead temperature forecasting are the minimum and maximum temperatures and 24 hourly temperatures of the previous day, the predicted minimum and maximum temperatures of the current day, and the hourly temperatures from 1:00 to the current hour of the current day. The best inputs of each model are obtained in an automatic iterative process. In Ref. [55], MLPNNs were also used, just like the abductive networks, for each hour of the day in both one-hour and 24-h ahead temperature forecasting with the same inputs mentioned above, for comparison purposes. In that case, 20% of the training data is used for cross-validation. Table 14 gives a comparison of the results for the abductive networks [55], the MLPNNs [55] and the proposed WESN-based temperature forecasters when the models are tested over the year 1990. Significant improvements for both one-hour and 24-h ahead temperature forecasting are observed using the WESN model. Table 15 gives the results of daily mean temperature forecasting obtained from the WNN model rebuilt from Refs. [38,57] along with those of the WESN model when the models are tested over the last year of the temperature data series of the North-American electric utility; the resulting MAPE of each month is reported separately.

Cloud density data is used in Ref. [38] as the only exogenous variable for daily mean temperature forecasting, while no exogenous variable is used in Ref. [57] though the same model is employed. Hence, the model of Ref. [57] is the one actually rebuilt here for comparison purposes. The one-day ahead WESN-based mean temperature forecaster follows a structure similar to the one depicted in Fig. 4(a), with the argument K (referring to the hour) in parentheses replaced by d (referring to the day). The results indicate a significant improvement of the forecasting accuracy using the WESN model. In summary, the main benefits of the proposed WESN models can be expressed as:

- Significant improvements in short-term load and temperature forecasting accuracy while a minimum number of inputs are used. As observed, all the other models compared with the proposed model have a greater number of inputs, especially lagged inputs. This shows the high capability of ESNs to learn features of time series extracted by DWT.
- The time-consuming process of selecting the best lagged inputs, needed by many other forecasting models, is avoided.
- The time-consuming converging learning approaches needed by many other forecasting models are avoided through a very fast training process using the pseudoinverse matrix.
- Effective capability to give highly accurate load and temperature forecasts for a long period after training once, thereby avoiding the need for re-training the model every day or every week.

6. Conclusions

This paper proposes WESN-based forecasters with a novel ESN-based reconstruction stage for one-hour and 24-h ahead load and temperature forecasting, where the ESNs forecasting the decomposed components (approximation and details) as well as the reconstructing ESNs are designed using a modified SFL algorithm. The main target is to improve forecasting accuracy for STLF and STTF.
Normalized hourly load and temperature data series of a North-American electric utility are used to evaluate the performance of the proposed WESN-based forecasters. The resulting MAPE values for one-hour ahead load forecasting are respectively 0.6974%, 0.7087% and 0.7011% when actual temperatures, forecasted temperatures and temperatures corrupted by a zero-mean Gaussian noise (with a standard deviation equal to the MAE of the forecasted temperatures) are used to provide the predicted temperature inputs to the ESNs forecasting the components. The resulting average MAPE values for 24-h ahead load forecasting are respectively 1.7729%, 1.8709% and 1.8391% for the same three cases. The resulting MAPE for one-hour ahead temperature forecasting and the resulting average MAPE for 24-h ahead temperature forecasting are respectively 0.8117% and 1.8741%. These figures show a significant improvement of the forecasting accuracy in comparison with the similar ESN-based forecasters. The study of temperature-error effects on the performance of the proposed one-hour ahead WESN-based load forecaster shows less sensitivity to negative deviations than to positive ones. However, the sensitivity is very small for both positive and negative deviations when temperature-error effects on the performance of the proposed 24-h ahead WESN-based load forecasters are investigated. The proposed WESN models also outperform the other compared models.

Acknowledgement

We thank Professor Alexandre P. Alves da Silva and Professor Vitor H. Ferreira for the data series of a North-American electric utility.

References

[1] Niu DX, Shi HF, Wu DD. Short-term load forecasting using Bayesian neural networks learned by hybrid Monte Carlo algorithm. Applied Soft Computing 2012;12:1822–7.
[2] Moazzami M, Khodabakhshian A, Hooshmand R. A new hybrid day-ahead peak load forecasting method for Iran's national grid. Applied Energy 2013;101:489–501.
[3] Che J, Wang J, Wang G. An adaptive fuzzy combination model based on self-organizing map and support vector regression for load forecasting. Energy 2012;37:657–64.
[4] Amina M, Kodogiannis VS, Petrounias I, Tomtsis D. A hybrid intelligent approach for the prediction of electricity consumption. International Journal of Electrical Power & Energy Systems 2012;43:99–108.
[5] Elattar EE, Goulermas JY, Wu QH. Electric load forecasting based on locally weighted support vector regression. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 2010;40:438–47.
[6] Yadav V, Srinivasan D. A SOM-based hybrid linear-neural model for short-term load forecasting. Neurocomputing 2011;74:2874–85.
[7] Hong WC. Load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm. Energy 2011;36:5568–78.
[8] Ko CN, Lee CM. Short-term load forecasting using SVR (support vector regression)-based radial basis function neural network with dual extended Kalman filter. Energy 2013;49:413–22.
[9] Cai Y, Wang JZ, Tang Y, Yang YC. An efficient approach for load forecasting using distributed ART (adaptive resonance theory) & HS-ARTMAP (Hyperspherical ARTMAP network) neural network. Energy 2011;36:1340–50.
[10] Zhang WY, Hong WC, Dong Y, Tsai G, Sung JT, Fan GF.
Application of SVR with chaotic GASA algorithm in cyclic electric load forecasting. Energy 2012;45:850–8.
[11] Huang CM, Huang CJ, Wang ML. A particle swarm optimization to identifying the ARMAX model for short-term load forecasting. IEEE Transactions on Power Systems 2005;20:1126–33.
[12] Niu DX, Wang Y, Wu DD. Power load forecasting using support vector machine and ant colony optimization. Expert Systems with Applications 2010;37:2531–9.
[13] Wang J, Zhu S, Zhang W, Lu H. Combined modeling for load forecasting with adaptive particle swarm optimization. Energy 2010;35:1671–8.
[14] An N, Zhao W, Wang J, Shang D, Zhao E. Using multi-output feedforward neural network with empirical mode decomposition based signal filtering for electricity demand forecasting. Energy 2013;49:279–88.
[15] Liao GC, Tsao TP. Application of a fuzzy neural network combined with a chaos genetic algorithm and simulated annealing to short-term load forecasting. IEEE Transactions on Evolutionary Computation 2006;10:330–40.
[16] Wang CH, Grozev G, Seo S. Decomposition and statistical analysis for regional electricity demand forecasting. Energy 2012;41:313–25.
[17] Zahedi G, Azizi S, Bahadori A, Elkamel A, Alwi SRW. Electricity demand estimation using an adaptive neuro-fuzzy network: a case study from the Ontario province, Canada. Energy 2013;49:323–8.
[18] Bashir ZA, El-Hawary ME. Applying wavelets to short-term load forecasting using PSO-based neural networks. IEEE Transactions on Power Systems 2010;24:20–7.
[19] Amjady N, Keynia F. Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm. Energy 2009;34(1):46–57.
[20] Chen Y, Luh PB, Guan C, Zhao Y, Michel LD, Coolbeth MA, et al. Short-term load forecasting: similar day-based wavelet neural networks. IEEE Transactions on Power Systems 2010;25:322–30.
[21] Reis AJR, Alves da Silva AP. Feature extraction via multiresolution analysis for short-term load forecasting.
IEEE Transactions on Power Systems 2005;20(1):189–98.
[22] Pandey AS, Singh D, Sinha SK. Intelligent hybrid wavelet models for short-term load forecasting. IEEE Transactions on Power Systems 2010;25:1266–73.
[23] Chang YJ, Wang SH, Sun HY. Research of short-term load forecasting algorithm based on wavelet analysis and radial basis function neural network. International Conference on Power Electronics and Intelligent Transportation System 2009;1:81–4.
[24] Sinha N, Lai LL, Ghosh PK, Ma Y. Wavelet-GA-ANN based hybrid model for accurate prediction of short-term load forecast. International Conference on Intelligent Systems Applications to Power Systems 2007;1:1–8.

[25] Hanmandlu M, Chauhan BK. Load forecasting using hybrid models. IEEE Transactions on Power Systems 2011;26:20e9. [26] Kim CI, Yu IK, Song YH. Kohonen neural network and wavelet transform based approach to short-term load forecasting. Electric Power Systems Research 2002;63:169e76. [27] Huang CM, Yang HT. Evolving wavelet-based networks for short-term load forecasting. IEE Proceedings-Generation, Transmission and Distribution 2001;148:222e8. [28] Meng M, Sun W. Short-term load forecasting based on rough set and wavelet neural network. International Conference on Computational Intelligence and Security 2008;2:446e50. [29] Zhang Q. Research on short-term load forecasting based on fuzzy rules and wavelet neural network. International Conference on Computer Engineering and Technology 2010;3:343e7. [30] Kelo S, Dudul S. A wavelet Elman neural network for short-term electrical load prediction under the influence of temperature. International Journal of Electrical Power & Energy Systems 2012;43:1063e71. [31] Deihimi A, Showkati H. Application of echo state networks in short-term load forecasting. Energy 2012;39:327e40. [32] Wyffels F, Schrauwen B, Stroobandt D. Using reservoir computing in a decomposition approach for time series prediction. European Symposium on Time Series Prediction 2008;1:149e58. [33] Wang J, Peng Y, Peng X. Anti boundary effect wavelet decomposition echo state networks. Lecture Notes in Computer Science 2011;6675:445e54. [34] Deihimi A, Momeni A. Neural estimation of voltage-sag waveforms of nonmonitored sensitive loads at monitored locations in distribution networks considering DGs. Electric Power Systems Research 2012;92:123e37. [35] Sheng C, Zhao J, Liu Y, Wang W. Prediction for noisy nonlinear time series by echo state networks based on dual estimation. Neurocomputing 2012;82: 186e95. [36] Xing K, Wang Y, Zhu Q, Zhou H. Modeling and control of McKibben artificial muscle enhanced with echo state networks. 
Control Engineering Practice 2012;20:477e88. [37] Li G, Niu P, Zhang W, Zhang Y. Control of discrete chaotic systems based on echo state network modeling with an adaptive noise canceler. KnowledgeBased Systems 2012;35:35e40. [38] Sharma A, Agarwal S. Temperature prediction using wavelet neural network. Research Journal of Information Technology 2012;4:22e30. [39] Eynard J, Grieu S, Polit M. Wavelet-based multi-resolution analysis and artificial neural networks for forecasting temperature and thermal power consumption. Engineering Applications of Artificial Intelligence 2011;3:501e16. [40] Weron R. Modeling and forecasting electricity loads and prices. Chichester, England: John Wiley & Sons Ltd.; 2006. [41] Mallat S. A theory for multiresolution signal decomposition-the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989;11:674e93. [42] Akansu AN, Haddad PR. Multiresolution signal decomposition: transforms, subbands, and wavelets. Academic Press; 2000. [43] Jaeger H. The “echo state” approach to analysing and training recurrent neural networks. GMD tech. report 148, German National Research Center for Information Technology. Fraunhofer Institute for Autonomous Intelligent Systems; 2001. [44] Venayagamoorthy GK, Shishir B. Effects of spectral radius and settling time in the performance of echo state networks. Neural Networks 2009;2:861e3. [45] Eusuf MM, Lansey KE. Optimization of water distribution network design using the shuffled frog leaping algorithm. Journal of Water Resources Planning and Management 2003;129:210e25. [46] Jadidoleslam M, Bijami E, Ebrahimi A. Generation expansion planning by a modified SFL algorithm. Intelligent Systems in Electrical Engineering 2011;2: 27e44. [47] Elbeltagi E, Hegazy T, Grierson D. Comparison among five evolutionarybased optimization algorithms. Advances in Engineering Informatics 2005;19:43e53. [48] Khorsandi A, Alimardani A, Vahidi B, Hosseinian SH. 
Hybrid shuffled frog leaping algorithm and NeldereMead simplex search for optimal reactive power dispatch. IET Generation, Transmission & Distribution 2011;5:249e56. [49] Naghizadeh RA, Vahidi B, Hosseinian SH. Modelling of inrush current in transformers using inverse JileseAtherton hysteresis model with a neuroshuffled frog-leaping algorithm approach. IET Electric Power Applications 2012;6:727e34. [50] Elbeltagi E, Hegazy T, Grierson D. A modified shuffled frog-leaping optimization algorithm: applications to project management. Structure and Infrastructure Engineering 2007;3:53e60. [51] Erol OK, Eksin I. A new optimization method: big bangebig crunch. Advances in Engineering Software 2006;37:106e11. [52] Abdel-Aal RE. Short-term hourly load forecasting using abductive networks. IEEE Transactions on Power Systems 2004;19:164e73. [53] Ramanathan R, Engle R, Granger CWJ, Araghi FV, Brace C. Short-run forecasts of electricity loads and peaks. International Journal of Forecasting 1997;13: 161e74. [54] Ferreira VH, da Silva APA. Toward estimation autonomous neural networkbased electric load forecasters. IEEE Transactions on Power Systems 2007;22:1554e62.

[55] Abdel-Aal RE. Hourly temperature forecasting using abductive networks. Engineering Applications of Artificial Intelligence 2004;17:543–56.
[56] Khotanzad A, Davis MH, Abaye A, Maratukulam DJ. An artificial neural network hourly temperature forecaster with applications in load forecasting. IEEE Transactions on Power Systems 1996;11:870–6.
[57] Rastogi A, Srivastava A, Srivastava VK, Pandey AK. Pattern analysis approach for prediction using wavelet neural networks. International Conference on Natural Computation 2011;1:695–9.
[58] Chen SM, Hwang JR. Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man and Cybernetics 2000;30:263–75.