Electricity price forecasting with a BED (Bivariate EMD Denoising) methodology



Energy 91 (2015) 601-609

Contents lists available at ScienceDirect

Energy journal homepage: www.elsevier.com/locate/energy

Electricity price forecasting with a BED (Bivariate EMD Denoising) methodology

Kaijian He, Lean Yu*, Ling Tang

School of Economics and Management, Beijing University of Chemical Technology, Beijing, 100029, China

Article info

Article history: Received 16 December 2014; received in revised form 7 August 2015; accepted 11 August 2015; available online xxx

Keywords: Cross-correlation; BED (Bivariate EMD Denoising Algorithm); Vector autoregressive model; Electricity market

Abstract

The forecasting of electricity prices attracts increasingly intense research attention as the market structure becomes more complicated with waves of deregulation and increasing price fluctuations. The heterogeneous data structure revealed in recent empirical studies is an important stylized fact to be explored and analyzed within a heterogeneous market structure framework. Facing an increasingly diversified and integrated market environment, forecasting models for electricity markets need to take into account individual and interdependent heterogeneous features such as noise. In this paper, under the proposed HMH (Heterogeneous Market Hypothesis), we propose a BED (Bivariate EMD Denoising) based forecasting methodology to track and predict electricity price movements. The BED algorithm is introduced as a feature extraction tool to identify and remove noise, and Error Entropy is used as the criterion to determine the optimal EMD (Empirical Mode Decomposition) level to be shrunk. Empirical studies conducted in the Australian electricity markets demonstrate a significant performance improvement of the proposed BED algorithm, which incorporates heterogeneous market characteristics, over benchmark models. © 2015 Elsevier Ltd. All rights reserved.

1. Introduction

The electricity industry has long been a government-protected sector closed to outside investment, which has hindered its development and resulted in lower efficiency, higher costs, a lack of innovation and insufficient investment. Demand for a more competitive electricity market from both the wholesale and retail sectors has driven a wave of deregulation across electricity markets, together with the establishment of integrated electricity markets at the national level [1]. These waves have been accompanied by significantly higher short-term volatility and greater regional integration [1]. The smooth and efficient operation of the market, as well as appropriate risk management, demand more accurate forecasting of electricity prices [2]. Over the years, several lines of research have tackled the forecasting problem; the main approaches are econometric models, artificial intelligence models and time series models [3].

Traditionally, the econometric model takes the fundamentalist approach, studying market price evolution and the strategic behavior of market participants through equilibrium and simulation analysis based on economic theories such as classical Nash equilibrium models [4]. For example, Ref. [5] presents a procedure that accounts for risk factors such as hydraulic inflows, demand growth and fuel costs. Ref. [6] obtains a closed-form expression for the price of an SFE (supply function equilibrium) given a demand realization, under the assumption of an n-firm symmetric oligopoly with inelastic demand and uniform pricing. Refs. [7,8] both analyze industry-level software systems for their effectiveness in simulating energy market behavior and forecasting prices. This approach usually performs adequately over medium- to long-term horizons, but may fail to track rapid short-term market movements because of the exponentially increasing computational complexity and the curse of dimensionality in analyzing the main driving factors [7,8].

AI (Artificial Intelligence) based approaches such as ANN (Artificial Neural Network), SVR (Support Vector Regression) and EMD (Empirical Mode Decomposition) rely largely on data mining to extract nonlinear data patterns [9,10]. For example, Ref. [11] proposes a cascaded architecture of multiple ANNs to forecast the MCP (market clearing price), and Ref. [12] combines an extended Kalman filter with ANN to improve MCP forecasting performance. However, the performance improvement of AI-based approaches is not consistent across test cases, and these models risk overfitting the data [13,9,10,14].

The time series approach takes a reduced-form approach, which lowers costs and dimensionality. Typical time series models include ARIMA (Autoregressive Integrated Moving Average) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models [3]. For example, both EGARCH and GIGARCH models have been proposed for short-term electricity price forecasting [15,16]. Ref. [17] explores the usefulness of time series models in electricity markets by allowing for nonparametric innovations in autoregressive models. The time series approach is developing rapidly as new stylized facts are continuously revealed. Recent empirical studies on fractal and multiscale data characteristics point to multiscale modeling as an important development in the field [18]. Wavelet analysis, a popular multiscale modeling technique, has been introduced to model simultaneously both horizontal dependency in the time domain, such as volatility clustering (conditional heteroskedasticity) and long memory (slowly decaying autocorrelation), and vertical dependency across time scales [19-23]. Ref. [24] uses univariate wavelet analysis to construct an effective forecasting algorithm for the crude oil market. Ref. [25] proposes a technique for forecasting day-ahead electricity prices based on the wavelet transform and ARIMA models. Ref. [26] proposes a hybrid technique combining WT (wavelet transform), CLSSVM (chaotic least squares support vector machine) and EGARCH (exponential generalized autoregressive conditional heteroskedastic) models to predict electricity prices more precisely.

* Corresponding author. Tel./fax: +86 010 64438793. E-mail addresses: [email protected] (K. He), [email protected] (L. Yu), [email protected] (L. Tang). http://dx.doi.org/10.1016/j.energy.2015.08.021 0360-5442/© 2015 Elsevier Ltd. All rights reserved.
Compared with the wavelet approach, which is theoretically superior but restrictive in practice, the recently emerging EMD (Empirical Mode Decomposition) offers an empirical, intuitive, direct and self-adaptive data-driven alternative [27,9]. Univariate EMD has been used to decompose and model data at finer scales in different financial markets. For example, Ref. [27] first applied EMD to financial data analysis and found that it offers much better temporal and frequency resolution, which spawned a series of studies in the field. Refs. [28,9] propose a series of EMD-based multiscale neural network ensemble learning paradigms for crude oil price prediction. Ref. [29] uses the Hilbert-Huang transform to identify the consistency between the frequency components of indicator data and real data in both the industrial production and consumption sectors. Ref. [30] proposes a differential EMD to improve exchange rate prediction with SVR (Support Vector Regression). Ref. [31] combines a multi-output FFNN (feed-forward neural network) with EMD-based signal filtering and seasonal adjustment, and reports a noticeable improvement in forecasting accuracy over existing models. Ref. [32] proposes an EMD-based electricity price forecasting model that separates high volatility and daily seasonality, and also reports a noticeable improvement in prediction accuracy. Interested readers are referred to [33] for a comprehensive survey. So far, however, most studies use EMD as a black-box technique for market event studies and trend analysis, paying less attention to identifying the underlying data components.
Meanwhile, we have found very few attempts to model cross-market correlations and co-movements when constructing forecasting algorithms, and even less attention has been paid to the use of EMD in the multivariate setting [34]. In this paper we propose an innovative BED (Bivariate EMD Denoising) based forecasting methodology. Empirical studies are conducted in the closely related NSW (New South Wales) and QLD (Queensland) electricity markets in Australia to evaluate the additional value offered by incorporating nonlinear multiscale cross-market correlations into the proposed algorithm.

Our contributions are twofold. First, based on the HMH (Heterogeneous Market Hypothesis), we provide empirical evidence of heterogeneous data characteristics distinguishable by size, and we incorporate this stylized fact into the construction of the BED-based forecasting algorithm. Second, we propose minimum Error Entropy as the in-sample performance measure for selecting the optimal level to be treated as noise and the decomposition level. In practice, the performance improvement of the proposed model is nontrivial compared with the improvements reported for other algorithms in the literature. Performance is calculated on log-differenced electricity price data, and an accuracy improvement in returns of around 1% is nontrivial for investors. More importantly, the improvement is statistically robust. Overfitting forecasts to the data window is one of the most difficult problems in the forecasting field: a significant performance improvement may be sensitive to the data window and deteriorate when an alternative window is used. We show that the performance of the proposed model is stable across different data windows, which indicates a lower level of overfitting and model risk.

The rest of the paper is organized as follows. Section 2 describes the methodology. Experimental results of the empirical studies are reported and analyzed in Section 3, and conclusions and remarks are drawn in Section 4.

2. Methodology

In this section, a brief introduction to BED theory is first provided. Then BED is used to analyze the time-varying correlations.
Finally, we construct the BED-based electricity price forecasting algorithm.

2.1. Bivariate EMD

EMD (Empirical Mode Decomposition) was initially proposed for the study of ocean waves and was then successfully applied in biomedical engineering, structural health monitoring and other areas of natural science and engineering [35,10]. Compared with Fourier and wavelet analysis, EMD offers much better temporal and frequency resolution [27]. EMD adaptively decomposes a time series into several independent IMF components and one residual component, with the fluctuations selected automatically and adaptively from the series itself. As a data-driven method with very few assumptions, EMD can be applied to nonlinear and nonstationary data [9].

Univariate EMD decomposes the data into several IMFs (Intrinsic Mode Functions) based on the intuitive notion of oscillation, which naturally relates to local extrema. Each IMF is a nearly periodic zero-mean function whose amplitude and frequency vary over time. In practice, the IMFs are extracted through a sifting process. In the two-dimensional case, the notion of oscillation is extended to that of rotation: the BEMD (Bivariate EMD) algorithm views the signal as fast rotations superimposed on slower rotations [36]. For convenience, the bivariate time series is expressed as the complex-valued signal $z(t) = x(t) + i\,y(t)$. For a complex-valued signal $z(t)$, BEMD involves the following steps:

(1) Choose the number of directions $k$ used to calculate the envelope curves.
(2) Project the complex-valued signal $z(t)$ on the direction $\varphi_j$, $j = 1, \dots, k$: $p_{\varphi_j}(t) = \mathrm{Re}\big(e^{i\varphi_j} z(t)\big)$.
(3) Find the maxima of $p_{\varphi_j}(t)$ and extract their locations $t_i^j$.
(4) Interpolate the set $\big(t_i^j, z(t_i^j)\big)$ with a chosen curve-fitting algorithm, such as cubic splines, to generate the envelope curve in direction $\varphi_j$: $e_{\varphi_j}(t)$.
(5) Repeat steps (2) to (4) to obtain the envelope curves for all directions $\varphi_j$, $j = 1, \dots, k$.
(6) Calculate the mean curve of all envelope curves over the different directions: $m(t) = \frac{1}{k}\sum_{j=1}^{k} e_{\varphi_j}(t)$.
(7) Subtract the mean from the signal $z(t)$. The elementary sifting operator is defined as $S_B[z(t)] = z(t) - m(t)$.
(8) Repeat the previous steps until the residual satisfies a stopping criterion.

The original bivariate time series can then be expressed as the sum of the IMFs $c_i(t)$ and a residue $r_k(t)$:

$$z(t) = \sum_{i=1}^{k} c_i(t) + r_k(t) \tag{1}$$
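The sifting steps (1)-(8) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the $k$ directions are assumed evenly spaced, $\varphi_j = 2\pi j/k$; the two end points of the series are simply added to the detected maxima as a crude boundary treatment; the real and imaginary parts of $z$ are interpolated with separate cubic splines (scipy's `CubicSpline`); and the stopping criterion of step (8) is replaced by a fixed number of sifting iterations per IMF. The function names `bemd_sift_step` and `bemd` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def bemd_sift_step(z, k=4):
    """One elementary sifting operation S_B[z](t) = z(t) - m(t), steps (1)-(7).

    z : complex array, z(t) = x(t) + i*y(t), sampled at t = 0, 1, ..., T-1.
    k : number of projection directions, assumed evenly spaced (phi_j = 2*pi*j/k).
    """
    z = np.asarray(z, dtype=complex)
    t = np.arange(z.size)
    envelopes = []
    for j in range(k):
        phi = 2.0 * np.pi * j / k
        p = np.real(np.exp(1j * phi) * z)                # step (2): projection
        # step (3): locations of interior local maxima of the projection
        idx = np.flatnonzero((p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:])) + 1
        # crude boundary treatment (assumption): always include both end points
        idx = np.unique(np.concatenate(([0], idx, [z.size - 1])))
        if idx.size < 4:
            raise ValueError("too few maxima to build an envelope")
        # step (4): cubic-spline envelope through z(t_i^j), real and imaginary parts
        env = (CubicSpline(idx, z[idx].real)(t)
               + 1j * CubicSpline(idx, z[idx].imag)(t))
        envelopes.append(env)                            # step (5): all directions
    m = np.mean(envelopes, axis=0)                       # step (6): mean envelope
    return z - m                                         # step (7)


def bemd(z, n_imfs=3, k=4, n_sifts=10):
    """Decompose z into IMFs and a residue, as in (1).

    Simplification: step (8)'s stopping criterion is replaced by a fixed
    number of sifting iterations per IMF.
    """
    residue = np.asarray(z, dtype=complex).copy()
    imfs = []
    for _ in range(n_imfs):
        h = residue.copy()
        try:
            for _ in range(n_sifts):
                h = bemd_sift_step(h, k)
        except ValueError:
            break                                        # residue too smooth to sift
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue
```

Because each extracted IMF is subtracted from the running residue, the IMFs and the final residue always add back to $z(t)$, mirroring (1); a practical implementation would replace the fixed iteration count with a proper stopping rule.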


2.2. A BED (Bivariate EMD Denoising) based forecasting model for electricity price

Homogeneity and rationality are the two basic assumptions of the traditional EMH (Efficient Market Hypothesis) underlying the major multivariate electricity price forecasting algorithms. To recognize the diverse heterogeneous data features present in high-frequency data, we propose the HMH instead. The HMH explicitly acknowledges a heterogeneous market microstructure by assuming that investors differ in strategy, scale and time horizon, to name just a few characteristics [37-43]. Under the HMH framework, the electricity market is jointly influenced by market agents with different defining characteristics, including investment strategies, time horizons and investment scales. This relaxation of the traditional EMH raises many research issues. To fully utilize the proposed techniques and explore the potential of the BED algorithm, we make the following simplifying assumptions: (1) the market is dominated by a few main investment strategies, which are stable and stationary over the period of analysis; (2) different investment strategies, distinguishable by size, are mutually independent. We then propose the BED based electricity price forecasting algorithm, illustrated in the block diagram in Fig. 1. It involves the following steps.

(1) The Bivariate EMD algorithm is used to separate the data from noise as in (2):

$$r_t = r_{t,DN} + r_{t,N} \tag{2}$$

where $r_t$, $r_{t,DN}$ and $r_{t,N}$ are the original, denoised and noise bivariate return series at time $t$.

(2) The denoised data are assumed to follow a particular stochastic process. The conditional means of the denoised data and the noise data are modeled with conditional time series models; in this paper, we adopt VAR (Vector Autoregressive) processes as in (3):

$$\hat r_t = \delta_t + \sum_{i=1}^{m} \Phi_i\, r_{t-i} + \varepsilon_t \tag{3}$$

where $\hat r_t$ is the conditional bivariate return matrix at time $t$, $\delta_t$ is the deterministic (intercept) term, $r_{t-i}$ is the lag-$i$ return matrix with coefficient matrix $\Phi_i$, and $\varepsilon_t$ is the residual in the period. VAR is chosen over the theoretically more sound VARMA (Vector Autoregressive Moving Average) model for two reasons. First, there is no authoritative methodology to uniquely identify and estimate VARMA models, although some initial attempts have been made [44], and VAR remains by far the most well-established and widely applied multivariate time series model in the literature. Second, any invertible VARMA model can be approximated by a VAR of infinite order [45]. We use information criteria to determine the optimal VAR specification, minimizing AIC and BIC as in (4):

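$$\min_{l}\ \mathrm{AIC}(l) = -\frac{2}{T}\ln(\ell) + \frac{2l}{T}, \qquad \mathrm{BIC}(l) = -\frac{2}{T}\ln(\ell) + \frac{l}{T}\ln T \tag{4}$$

where $\ell$ is the maximized likelihood, $l$ the number of estimated parameters and $T$ the number of observations.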

(3) Using the in-sample data, different criteria such as MSE and Error Entropy are used to identify the noise component to be removed. Minimizing MSE corresponds to minimizing the error variance, whereas minimizing Error Entropy corresponds to maximizing the information content captured in the forecasts. Given two random variables $X, Y \in \mathbb{R}^n$, where $X$ is generated with unknown parameters and $Y$ is the observation, the error entropy is defined as in (5) [46]:

Fig. 1. Block diagram for algorithm procedure.


$$H\big(X - g(Y)\big) = E\big[-\log p_g\big(X - g(Y)\big)\big] = -\int_{\mathbb{R}^n} p_g(x)\,\log p_g(x)\,dx \tag{5}$$

where $g(Y)$ is the estimate of $X$ based on $Y$ through the measurable function $g$, $H(\cdot)$ is the Shannon entropy, and $p_g(x)$ is the probability density function (PDF) of the estimation error $X - g(Y)$.

(4) With the noise level determined, repeat steps (1)-(3) to forecast the return.

3. Empirical studies

3.1. Empirical analysis of dynamic behaviors

The ANEM (Australian National Electricity Market) is one of the most deregulated electricity markets and provides an ideal testing ground for empirical studies. It consists of five regional markets: NSW (New South Wales), QLD (Queensland), SA (South Australia), VIC (Victoria) and Tasmania. With deregulation, regional electricity price and demand data have been made publicly available at various frequencies and over sufficiently long periods, serving the interests of both investors and academics. The data set is compiled from the original data published by the AEMO (Australian Energy Market Operator). It spans 1 January 2004 to 25 December 2013: the start date is chosen as the point at which the market structure stabilized after the deregulation and restructuring process, and the end date reflects data availability when the research was conducted. This amounts to 3532 daily observations. The data set is pre-processed for recording errors and asynchronous holiday interruptions.

Since there is no consensus on how to divide a data set in either the machine learning or the econometric literature [47], we follow the common criterion of reserving at least 70% of the data for training while retaining a test set large enough for the results to be statistically valid [48]. To further test the generalizability of the proposed algorithm, we conduct the experiments on different portions (time windows) of the data set: 100%, 80% and 60% of the entire data set. Each tested data set is divided into two subsets: the training set (70%), used in the model tuning process to determine the optimal noise level, decomposition level and other relevant parameters of the proposed BED VAR model, and the test set (30%), used in the out-of-sample test to evaluate the performance of the different models. We perform one-day-ahead forecasts using a rolling-window method. Negative prices and abnormally high prices greater than 300 are treated as outliers and were removed from the data set during pre-processing. Since autocorrelation and partial autocorrelation function analysis indicates that the original data contain trend factors, the data are log-differenced at the first order, $r_t = \ln(P_t / P_{t-1})$, to remove these factors when the data set is constructed. The resulting returns are scale free, correspond to percentage changes in financial positions, and have more attractive statistical properties such as stationarity. The holding period is assumed to be 1 day, and for each experiment a portfolio of one asset position worth 1 USD is assumed.

To determine whether simple linear models suffice for the data being analyzed or whether nonlinear models are necessary, tests for nonlinear data characteristics are employed, including the BDS test [49,50], the bispectrum test [51] and the bicorrelation test [52]. Among them, the BDS test, introduced by Brock, Dechert and Scheinkman, has become the standard test for independence or nonlinear dependence in the data. Its null hypothesis is that the elements of the time series are IID (independently and identically distributed). The BDS test statistic $W_m(\varepsilon)$, which is asymptotically $N(0, 1)$, is computed as in (6):

$$W_m(\varepsilon) = \sqrt{n}\,\frac{C_m(\varepsilon) - C_1(\varepsilon)^m}{\sigma_m(\varepsilon)} \tag{6}$$

where $C_m(\varepsilon)$ is the correlation integral, i.e., the fraction of pairs of m-tuples (m-histories) in the series that lie within distance $\varepsilon$ of each other, and $\sigma_m(\varepsilon)$ is the estimated standard deviation of the numerator. The null hypothesis is rejected when the test statistic exceeds a given critical value in absolute value (1.96 at the 95% confidence level). Rejection of the null hypothesis implies some kind of nonlinear dependence, or even a hidden chaotic structure, in the data.

The null hypothesis of tests for departures from normality is that the data distribution is symmetric and mesokurtic (i.e., normal). Such a test determines whether the data distribution statistically conforms to the normality assumption. A variety of tests are available, including the Jarque-Bera test for normality and Pearson chi-square tests (Brooks, 2002). Among them, the Jarque-Bera test is the most commonly applied in practice. It tests whether the skewness and excess kurtosis of the data series, based on the third and fourth moments, are jointly zero. The test statistic $W$ is calculated as in (7):

$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \tag{7}$$

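where $T$ is the sample size, and $b_1$ and $b_2$ are the coefficients of skewness and kurtosis, respectively, as defined in (8) and (9):

$$b_1 = \frac{E\left[u^3\right]}{\left(\sigma^2\right)^{3/2}} \tag{8}$$

$$b_2 = \frac{E\left[u^4\right]}{\left(\sigma^2\right)^{2}} \tag{9}$$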
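Equations (7)-(9) translate directly into code. The sketch below is illustrative only: it assumes $u$ denotes deviations from the sample mean and uses the population (maximum likelihood) variance for $\sigma^2$, and it computes the asymptotic $\chi^2(2)$ p-value in closed form as $\exp(-W/2)$, since that is the survival function of a chi-square variable with two degrees of freedom. `jarque_bera` is a hypothetical helper name.

```python
import math

import numpy as np


def jarque_bera(u):
    """Jarque-Bera statistic W of eqs. (7)-(9) and its asymptotic chi^2(2) p-value."""
    u = np.asarray(u, dtype=float)
    T = u.size
    d = u - u.mean()                          # assumption: u measured from its mean
    s2 = np.mean(d ** 2)                      # population variance sigma^2
    b1 = np.mean(d ** 3) / s2 ** 1.5          # skewness, eq. (8)
    b2 = np.mean(d ** 4) / s2 ** 2            # kurtosis, eq. (9)
    W = T * (b1 ** 2 / 6.0 + (b2 - 3.0) ** 2 / 24.0)   # eq. (7)
    p_value = math.exp(-W / 2.0)              # P(chi^2(2) > W)
    return W, p_value
```

For the symmetric, platykurtic series [-1, 1, -1, 1], skewness is 0 and kurtosis is 1, so $W = 4 \cdot (1-3)^2/24 = 2/3$; the statistic is unchanged if the series is rescaled, as expected from the normalization by $\sigma^2$.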

The test statistic asymptotically follows a $\chi^2(2)$ distribution. The calculated statistic is compared to critical values to determine whether the null hypothesis can be rejected. When the data size is not very large, rejection of the null hypothesis implies that the sample distribution deviates significantly from the normal distribution. Descriptive statistics of the price and return data in the NSW and QLD markets are listed in Table 1, where pNSW and pQLD are the daily prices in the NSW and QLD regions, rNSW and rQLD are the log-differenced returns of pNSW and pQLD, and pJB and pBDS are the p-values of the Jarque-Bera and BDS tests, respectively.

Table 1
Descriptive statistics and statistical tests of the price and return data.

Statistics   pNSW       pQLD       rNSW       rQLD
Mean         35.0060    32.0323    0.0000     0.0000
Max          281.1800   292.7700   2.2698     4.2669
Min          13.8700    0.5700     -2.3521    -3.6818
Std          27.3794    25.2359    0.3349     0.3756
Skewness     4.8913     4.6300     0.2753     0.2792
Kurtosis     33.3504    32.0547    18.2008    23.3018
pJB          0.0010     0.0010     0.0010     0.0010
pBDS         0.0000     0.0000     0.0000     0.0000
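The pre-processing behind the price and return series (dropping outlier days, then log differencing as $r_t = \ln(P_t/P_{t-1})$) can be sketched as follows. This is an illustration under assumptions not stated in the text: prices are held in a (T, 2) array with one column per market; a day is dropped from both series if either price is outside the accepted range, so that the two series stay synchronized; and zero prices are dropped along with negative ones because the logarithm is undefined at zero. The 300 threshold is taken from the text; `clean_and_difference` is a hypothetical name.

```python
import numpy as np


def clean_and_difference(prices, upper=300.0):
    """Drop outlier days and return (clean prices, log returns).

    prices : array of shape (T, 2), one column per market (e.g. NSW, QLD).
    A day is kept only if every price on that day lies in (0, upper].
    """
    p = np.asarray(prices, dtype=float)
    keep = np.all((p > 0.0) & (p <= upper), axis=1)   # same days removed from both series
    clean = p[keep]
    returns = np.diff(np.log(clean), axis=0)          # r_t = ln(P_t / P_{t-1})
    return clean, returns
```

Note that after a day is removed, the next return spans two calendar days; how the original study treats such gaps is not stated.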


Descriptive statistics in Table 1 reveal some interesting stylized facts. The market exhibits considerable fluctuations, as suggested by the high volatility. The distribution of market prices is fat-tailed and leptokurtic, as suggested by the significant skewness and kurtosis levels. There is also a high level of market risk exposure due to extreme events, as reflected in the significant kurtosis. The market return also deviates from the normal distribution and exhibits nonlinear dynamics, as confirmed by the rejection of both the Jarque-Bera test of normality and the BDS test of independence at the 95% confidence level [53,54]. The descriptive statistics of the return data show that its mean and skewness are close to those of a normal distribution, whereas its kurtosis deviates markedly from the normal level, indicating that the market experiences significant abnormal return events. Moreover, since the null hypotheses of both the JB and BDS tests are rejected, the market return contains unknown nonlinear dynamics that are not easily captured by traditional linear models.

The electricity markets are subject to frequent and abrupt external shocks, as illustrated by the plots of the NSW and QLD markets in Figs. 2 and 3. Figs. 2 and 3 show that prices in the two markets share some common features while exhibiting their own unique characteristics. For example, prices in both markets fluctuate significantly, and before 2006 abrupt changes occur in both markets at approximately the same times, although their volatility differs owing to the zonal pricing system [55,56]. Since 2006, the unique characteristics have become more prominent: fluctuations in the NSW market are significantly larger than those in QLD because of the different regional pricing mechanisms. Since 2008, when the subprime mortgage crisis and the European debt crisis broke out in turn, prices in the two markets have fluctuated in a more synchronized manner, subject to different market situations and counteracting measures. The correlation between NSW and QLD prices over the sample period is 0.5691, with a p-value of 0 for its statistical significance. We further calculate the correlations using a rolling window of 252 and plot the results in Fig. 4.
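A rolling-window correlation of the kind plotted in Fig. 4 can be sketched as below. It assumes that the window counts daily observations and that each value is the Pearson correlation of the two price series over the window; `rolling_correlation` is a hypothetical helper, not code from the study.

```python
import numpy as np


def rolling_correlation(x, y, window=252):
    """Pearson correlation of x and y over each window of `window` observations.

    Element k of the result covers observations k, ..., k + window - 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size - window + 1
    out = np.empty(n)
    for k in range(n):
        out[k] = np.corrcoef(x[k:k + window], y[k:k + window])[0, 1]
    return out
```

With 3532 observations and a window of 252, this yields 3281 correlation values, one per window position; a perfectly linearly related pair of series gives 1 in every window.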


Based on the above analysis of the price and correlation plots, we further calculated descriptive statistics for four periods; the results are listed in Table 2, where $p_i$, $i = 1, 2, 3, 4$, refers to the correlations over period 1 (1 January 2004 to 31 December 2005), period 2 (1 January 2006 to 29 June 2007), period 3 (30 June 2007 to 19 November 2009) and period 4 (20 November 2009 to 9 September 2013). pJB and pBDS are the p-values of the Jarque-Bera and BDS tests, respectively. The results in Fig. 4 and Table 2 further confirm that the correlations between the electricity markets change dynamically, shifting significantly among different regimes over different periods. From 29 June 2007 the correlation decreases steeply because of the ongoing electric power reform, together with the impact of the subprime crisis. From 19 November 2009, the correlation experiences another wave of fluctuations due to the European debt crisis, during which risk-averse counteracting measures reduced both demand and capital spending in the electric power sector [57]. In Fig. 5, we illustrate the bivariate electricity price components decomposed across different scales by the BEMD algorithm. The results in Fig. 5 show that the decomposed components exhibit drastically different behavior and volatility levels: components at lower scales are significantly more volatile than those at higher scales. The correlation between the NSW and QLD electricity markets also differs drastically in strength and behavior across scales.

3.2. Experiment settings

To evaluate the performance of the proposed algorithm against the benchmark models, we mainly use MSE (Mean Square Error) and MAE (Mean Absolute Error) to measure predictive accuracy, together with the Clark-West test of equal predictive accuracy [58-60]. MAPE (Mean Absolute Percentage Error) results are also reported.
However, Ref. [61] suggests that MAPE is preferred when the data are positive, and MAE otherwise. The forecasts in this paper are made on scale-independent log-differenced data. Since these data contain negative and near-zero values, MAPE may significantly understate the accuracy of the forecasts.
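The three accuracy measures can be computed as follows; the example values show why MAPE is fragile for returns close to zero. This sketch assumes errors are defined as actual minus forecast, and `accuracy_measures` is a hypothetical name.

```python
import numpy as np


def accuracy_measures(actual, forecast):
    """Return (MSE, MAE, MAPE in %) for forecast errors e_t = actual_t - forecast_t."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = actual - forecast
    mse = np.mean(e ** 2)
    mae = np.mean(np.abs(e))
    # MAPE divides by the actual values: undefined at zero and explosive
    # for actual values close to zero, such as small log returns.
    with np.errstate(divide="ignore", invalid="ignore"):
        mape = 100.0 * np.mean(np.abs(e / actual))
    return mse, mae, mape
```

For instance, `accuracy_measures([0.001, 1.0], [0.011, 1.0])` gives an MSE of only 5e-05 but a MAPE of about 500%, driven entirely by the near-zero first observation.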

Fig. 2. Electricity price in NSW market (daily price against date, 01/01/2004 to 25/12/2013).

Fig. 3. Electricity price in QLD market (daily price against date, 01/01/2004 to 25/12/2013).

Fig. 4. Dynamic correlations between NSW and QLD markets (correlation against date window, 01/01/2004-09/09/2004 to 17/04/2013-25/12/2013).

Table 2
Descriptive statistics and statistical tests of four time periods.

Statistics   p1       p2       p3       p4
Min          0.5796   0.6446   0.0187   0.1021
Max          0.9706   0.9789   0.9829   0.9912
Mean         0.8197   0.7546   0.4161   0.5086
Std          0.0999   0.0982   0.298    0.4126
Kurtosis     2.9039   3.3450   1.5163   1.3615
Skewness     0.5218   1.1734   0.2014   0.2672
pJB          0.0010   0.0010   0.0010   0.0010
pBDS         0        0        0        0

We mainly rely on MSE and MAE to evaluate model performance. The CW test was proposed to adjust the DM (Diebold-Mariano) statistic, because the original statistic is heavily biased when the models being compared are nested. The test statistic is defined in (10):

$$CW = \frac{\bar d}{\sqrt{\hat V / P}}, \qquad \bar d = \frac{1}{P}\sum_{t=R+h}^{T} d_t, \qquad d_t = e_{1t}^2 - e_{2t}^2 + \left(\hat y_{1t} - \hat y_{2t}\right)^2,$$
$$\hat V = \frac{1}{P}\sum_{t=R+h}^{T}\left(d_t - \bar d\right)^2 + \frac{2}{P}\sum_{j=1}^{m}\ \sum_{t=j+R+h}^{T}\left(d_t - \bar d\right)\left(d_{t-j} - \bar d\right)u(j, m) \tag{10}$$

where $e_{1t}, e_{2t}$ and $\hat y_{1t}, \hat y_{2t}$ are the forecast errors and forecasts of the two competing models, $u(j, m)$ is the weight applied to the $j$-th autocovariance, $\bar d$ is the sample mean of $d_t$, and $P$ is the sample size.
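A minimal sketch of the CW statistic in (10) follows. It assumes Bartlett weights $u(j, m) = 1 - j/(m+1)$ for the long-run variance (the weighting scheme is not specified above), treats model 1 as the restricted benchmark (e.g., the Naive model) and model 2 as the larger model, and reports the one-sided p-value against the alternative that model 2 is more accurate. `clark_west` is a hypothetical function name.

```python
import math

import numpy as np


def clark_west(actual, f1, f2, m=0):
    """CW statistic of eq. (10) and its one-sided normal p-value.

    f1: forecasts of the restricted model, f2: forecasts of the larger model.
    m:  number of autocovariance lags in V-hat (Bartlett weights assumed).
    """
    y = np.asarray(actual, dtype=float)
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    e1, e2 = y - f1, y - f2
    d = e1 ** 2 - e2 ** 2 + (f1 - f2) ** 2       # adjusted loss differential d_t
    P = d.size
    d_bar = d.mean()
    dc = d - d_bar
    V = np.sum(dc ** 2) / P
    for j in range(1, m + 1):
        weight = 1.0 - j / (m + 1.0)              # u(j, m), Bartlett (assumption)
        V += 2.0 / P * weight * np.sum(dc[j:] * dc[:-j])
    cw = d_bar / math.sqrt(V / P)
    p_value = 0.5 * math.erfc(cw / math.sqrt(2.0))   # 1 - Phi(cw)
    return cw, p_value
```

Under the null of equal MSPE the statistic is compared with the standard normal distribution; a large positive value favors the larger model.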


Fig. 5. Decomposed return data at different scales.

The null hypothesis of the test is $H_0: E(d_t) = 0$, i.e., equal MSPE (mean squared prediction error). The test statistic is asymptotically normally distributed. The directional statistic ($D_{stat}$) measures the accuracy of forecasting the direction of price movements [62]. It is defined as the percentage of correctly predicted directional price movements, as in (11):

$$D_{stat} = \frac{1}{N}\sum_{t=1}^{N} a_t \times 100\% \tag{11}$$

where $a_t = 1$ if $(\hat x_{t+1} - x_t)(x_{t+1} - x_t) \ge 0$, and $a_t = 0$ otherwise. The general standardized test statistic of the PT (Pesaran-Timmermann) test is defined in (12):

$$s_n = \sqrt{N}\,\hat V_n^{-1/2} S_n, \qquad S_n = \sum_{i=1}^{m}\left(\hat P_{ii} - \hat P_{i0}\hat P_{0i}\right), \qquad \hat V_n = \left.\frac{\partial f(P)}{\partial \operatorname{vec}(P)'}\right|_{P=\hat P}\hat U\left.\frac{\partial f(P)}{\partial \operatorname{vec}(P)}\right|_{P=\hat P},$$
$$\hat U = \hat J_P - \operatorname{vec}(\hat P)\operatorname{vec}(\hat P)', \qquad \hat P_{ij} = \frac{n_{ij}}{N}, \qquad \hat P_{i0} = \frac{n_{i0}}{N}, \qquad \hat P_{0i} = \frac{n_{0i}}{N} \tag{12}$$

where $n_{ij}$ is the number of observations in the $ij$ category of the contingency table, $n_{i0}$ and $n_{0i}$ are the corresponding row and column totals, $f(P)$ is $S_n$ written as a function of the cell probabilities, and $\hat J_P$ is an $m^2 \times m^2$ diagonal matrix with the elements of $\hat P$ on the diagonal. The null hypothesis of the test is that the actual and predicted values are independent of each other. The (squared) test statistic follows the chi-square distribution with one degree of freedom.
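The directional measures can be sketched as follows. `dstat` implements (11) directly. For the PT test, the sketch substitutes the classic two-category (up/down) Pesaran-Timmermann statistic for the general m-category form in (12); it is asymptotically standard normal under the null of independence. Both function names and the one-sided p-value convention are assumptions of this illustration.

```python
import math

import numpy as np


def dstat(actual_next, forecast_next, current):
    """Eq. (11): share (%) of t with (xhat_{t+1} - x_t)(x_{t+1} - x_t) >= 0."""
    x1 = np.asarray(actual_next, dtype=float)
    xh = np.asarray(forecast_next, dtype=float)
    x0 = np.asarray(current, dtype=float)
    a = (xh - x0) * (x1 - x0) >= 0
    return 100.0 * float(np.mean(a))


def pesaran_timmermann(actual_up, predicted_up):
    """Classic two-category PT statistic and one-sided p-value (H1: skill)."""
    y = np.asarray(actual_up, dtype=bool)
    x = np.asarray(predicted_up, dtype=bool)
    n = y.size
    p_hat = np.mean(y == x)                        # proportion of correct signs
    py, px = y.mean(), x.mean()
    p_star = py * px + (1 - py) * (1 - px)         # expected hit rate if independent
    v_hat = p_star * (1 - p_star) / n
    v_star = ((2 * py - 1) ** 2 * px * (1 - px) / n
              + (2 * px - 1) ** 2 * py * (1 - py) / n
              + 4 * py * px * (1 - py) * (1 - px) / n ** 2)
    s = (p_hat - p_star) / math.sqrt(v_hat - v_star)
    p_value = 0.5 * math.erfc(s / math.sqrt(2.0))  # 1 - Phi(s)
    return s, p_value
```

Note that a $D_{stat}$ slightly above 50% need not be significant: the PT statistic compares the hit rate with the rate expected if forecasts and outcomes were independent.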

The MSEs of the benchmark Naive and VAR models are (0.110476, 0.091283, 0.097204) and (0.110439, 0.091240, 0.097180), respectively. The BED VAR model is then applied to the testing data with different levels removed. The orders of the VAR(r, m) processes are determined by minimizing information criteria (IC) such as the AIC and BIC. In Table 3, $EE_i$ and $MSE_{i,10^1}$, $i = 1, 2, 3$, refer to the Error Entropy and the MSE of the model forecasts over sub-dataset $i$, where $i = 1, 2, 3$ corresponds to the dataset constructed using 100%, 80% and 60% of the original dataset, respectively; the subscript $10^1$ indicates that the reported MSE values are multiplied by 10 (e.g. 0.110476 is reported as 1.10476). The experimental results in Table 3 further confirm that the forecasting accuracy of the VAR model based on denoised data is sensitive to the BED parameters used to denoise the original data. Some BED parameter choices improve the forecasting performance enough to beat the traditional benchmark models. For example, the BED denoising VAR model outperforms both the Naive and VAR models when level 3 is treated as noise and removed. Conversely, the forecasting accuracy decreases when an improper denoising model is used: for example, the BED denoising VAR model fails to beat the benchmark models when level 1 is treated as noise and removed.

Table 3
Model performance with different mode parameters using the model tuning data. Each cell reports the Error Entropy, with the corresponding MSE (×10^1) in parentheses.

Level   EE_1 (MSE_{1,10^1})   EE_2 (MSE_{2,10^1})   EE_3 (MSE_{3,10^1})
1       1.2367 (1.10407)      1.2040 (0.91187)      6.0344 (0.97186)
2       1.2225 (1.10435)      1.1166 (0.91192)      6.0451 (0.97184)
3       1.2712 (1.10398)      1.1714 (0.91283)      6.0319 (0.97146)
4       0.9759 (1.10448)      0.9081 (0.91211)      6.0659 (0.97143)
5       1.0607 (1.10476)      1.0515 (0.91191)      6.0395 (0.97178)
6       0.9217 (1.10382)      0.6764 (0.91226)      6.0391 (0.97223)
7       1.0514 (1.10364)      0.8308 (0.91193)      6.0541 (0.97139)
8       0.9735 (1.10426)      0.8903 (0.91208)      6.0515 (0.97092)
9       1.0163 (1.10344)      1.0689 (0.91112)      6.0485 (0.97202)
10      1.1779 (1.10437)      1.2878 (0.91193)      6.0299 (0.97144)

Recent empirical studies show that the entropy measure serves as a better optimization criterion for choosing the model specification and parameters [63]. In this paper we therefore adopt the error entropy measure as the criterion for identifying the appropriate model specification. Following the error entropy minimization principle, the optimal model specifications chosen for the three sub-datasets are levels (6, 6, 10), i.e. the levels with the lowest $EE_1$, $EE_2$ and $EE_3$ in Table 3.

In Table 4, $MSE_{i,10^1}$, $i = 1, 2, 3$, refers to the Mean Square Error over sub-dataset $i$, with the sub-datasets defined as above. $CW_{i,N}$ and $CW_{i,V}$ refer to the p values of the Clark West test of equal predictive accuracy against the Naive and VAR models, respectively. $Dstat_{i,\%}$ refers to the proportion (in %) of correctly predicted market timing, and $PT_{i,p}$ refers to the p value of the Pesaran Timmermann test of directional accuracy. The experimental results in Table 4 show that the proposed algorithm outperforms the benchmark Naive and VAR models in terms of MSE, MAE and directional accuracy. The Clark West test of equal predictive accuracy suggests that the performance gap is significant at the 15%, 13% and 23% significance levels against the Naive model and at the 25%, 38% and 27% levels against the VAR model, respectively. The D statistics show that the proposed algorithm achieves a higher level of directional forecasting accuracy and thus better captures market timing. The ratios of correctly predicted directions are 50.52%, 51.23% and 50.76%, respectively, which are higher than the 50% of the Naive model. They are also higher than those of the VAR model for the 100% and 80% data windows (49.21% and 47.79%), and at a comparable level for the 60% data window (51.85%).
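The error entropy minimization rule used above to pick the levels (6, 6, 10) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the text does not state which entropy estimator is used, so the hypothetical `error_entropy` assumes a simple histogram (plug-in) estimate of the differential Shannon entropy of the forecast errors, with an assumed default of 10 bins. The hypothetical `select_level` then returns the removed level with the smallest error entropy; fed the $EE_i$ values of Table 3, it reproduces the chosen specification.

```python
import math


def error_entropy(errors, bins=10):
    """Histogram estimate of the differential Shannon entropy of errors.

    Assumed estimator (hypothetical): H = -sum_k p_k * log(p_k / width),
    where p_k is the share of errors falling in equal-width bin k.
    """
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / bins or 1.0  # guard against identical errors
    counts = [0] * bins
    for e in errors:
        counts[min(int((e - lo) / width), bins - 1)] += 1  # clamp max into last bin
    n = len(errors)
    return -sum((c / n) * math.log(c / n / width) for c in counts if c)


def select_level(ee_by_level):
    """Return the 1-based level whose removal gives the minimum error entropy."""
    best = min(range(len(ee_by_level)), key=lambda k: ee_by_level[k])
    return best + 1


# Error Entropy values EE_1, EE_2, EE_3 for levels 1..10, copied from Table 3.
TABLE3_EE = {
    1: [1.2367, 1.2225, 1.2712, 0.9759, 1.0607, 0.9217, 1.0514, 0.9735, 1.0163, 1.1779],
    2: [1.2040, 1.1166, 1.1714, 0.9081, 1.0515, 0.6764, 0.8308, 0.8903, 1.0689, 1.2878],
    3: [6.0344, 6.0451, 6.0319, 6.0659, 6.0395, 6.0391, 6.0541, 6.0515, 6.0485, 6.0299],
}

if __name__ == "__main__":
    print(tuple(select_level(TABLE3_EE[i]) for i in (1, 2, 3)))  # (6, 6, 10)
```

In a full pipeline, `error_entropy` would be applied to the tuning-period errors of the VAR model fitted on the data with each candidate level removed, and the resulting values passed to `select_level`; here the published $EE_i$ values are used directly.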
The performance superiority is further confirmed by the p values of the Pesaran Timmermann test of directional predictive accuracy, which are 0.3773, 0.1181 and 0.1629, respectively. For the 100% and 80% data windows, these p values are lower for the proposed model than for the VAR model (0.4398 and 0.3095). For the 60% data window, the p value suggests that the market timing accuracy of the proposed model is statistically significant at the 84% confidence level. Meanwhile, the out-of-sample results suggest that using the proposed entropy measure leads to improved out-of-sample model performance: the proposed model achieves the lowest MSE on all three sub-datasets. Although the MAPE statistics indicate worse performance for the proposed model, the MAPE results are not reliable because the data contain negative and near-zero values [61].

Table 4
Model performance with optimized model parameters using the out-of-sample data.

Metric           Naive     VAR       BED
MSE_{1,10^1}     1.10476   1.10439   1.10382
MAPE_1           1.0013    1.0330    1.0588
MAE_{1,10^1}     1.9908    1.9910    1.9900
CW_{1,N}         N/A       0.1594    0.1513
CW_{1,V}         0.1594    N/A       0.2455
Dstat_{1,%}      N/A       49.21     50.52
PT_{1,p}         N/A       0.4398    0.3773

MSE_{2,10^1}     0.91283   0.91240   0.91226
MAPE_2           0.9993    1.0166    1.0096
MAE_{2,10^1}     1.7836    1.7838    1.7828
CW_{2,N}         N/A       0.1746    0.1257
CW_{2,V}         0.1746    N/A       0.3836
Dstat_{2,%}      N/A       47.79     51.23
PT_{2,p}         N/A       0.3095    0.1181

MSE_{3,10^1}     0.97204   0.97180   0.97144
MAPE_3           1.0003    1.0207    1.0246
MAE_{3,10^1}     1.7386    1.5545    1.7380
CW_{3,N}         N/A       0.2910    0.2273
CW_{3,V}         0.2910    N/A       0.2785
Dstat_{3,%}      N/A       51.85     50.76
PT_{3,p}         N/A       0.0690    0.1629

The performance improvement is attributed to the analysis of the latent structure, through the identification and appropriate removal of noise using BED analysis, as well as to the selection of the optimal noise level and parameters based on the Error Entropy minimization principle. These results further imply that electricity prices are complicated processes with a mixture of underlying DGPs of different natures. Because no explicit analytical solution exists, the underlying latent structure may have redundant representations. Determining the optimal noise level and parameters, which are heterogeneous in nature, holds the key to further performance improvement and a more thorough understanding of the DGPs during the modeling process.

Compared with previous approaches in the literature, the proposed model is unique in that it reveals and models the multiscale characteristics of the data. The proposed model has two major advantages. Firstly, despite increasing evidence of multiscale data characteristics, exploiting multiscale features in forecasting applications remains a challenging research issue, and multiscale characteristics in higher dimensions have not been accounted for in previous multivariate forecasting approaches. In this paper we provide additional evidence that modeling multiscale data features can lead to effective and robust performance improvement. Secondly, determining the BED parameters is an optimization problem over a set of redundant parameters. In previous approaches these parameters were largely chosen arbitrarily, whereas we propose an explicit in-sample search method for determining them.
This has led to significantly improved robustness of the forecasting results. The performance of previous forecasting approaches is sensitive to the data time window, whereas the superior performance of the proposed model is robust and consistent across different data time windows.

4. Conclusions

In this paper, we propose an HMH-based theoretical framework for electricity price forecasting. Under this framework we further propose the BED-based electricity price forecasting methodology as one particular implementation. We find that electricity price behavior is affected by noise and main trends, which coexist in a heterogeneous manner. BED offers an important alternative to traditional denoising approaches: it separates noise from data based on their heterogeneous characteristics and recovers the useful data for further modeling by the VAR model. Results from the empirical studies confirm the performance improvement of the proposed model against the benchmark models. This work suggests that more accurate separation of data from noise leads to better-behaved data and a higher level of model generalizability.

Acknowledgment

This work is supported by the National Science Fund for Distinguished Young Scholars (NSFC No. 71025005), the National Natural Science Foundation of China (NSFC No. 71201054, No. 91224001), the National Program for Support of Top-Notch Young Professionals, the Strategic Research Grant of City University of Hong Kong, and the Fundamental Research Funds for the Central Universities in BUCT.


References

[1] Weron A, Weron R. Fractal market hypothesis and two power-laws. Chaos Solit Fractals 2000;11(1–3):289–96.
[2] Deng S, Oren S. Electricity derivatives and risk management. Energy 2006;31(6):940–53.
[3] Aggarwal SK, Saini LM, Kumar A. Day-ahead price forecasting in Ontario electricity market using variable-segmented support vector machine-based model. Electric Power Components and Systems 2009;37(5):495–516.
[4] Ventosa M, Baíllo Á, Ramos A, Rivier M. Electricity market modeling trends. Energy Policy 2005;33(7):897–913.
[5] Otero-Novas I, Meseguer C, Batlle C, Alba J. A simulation model for a competitive generation market. Power Syst IEEE Trans 2000;15(1):250–6.
[6] Rudkevich A, Duckworth M, Rosen R. Modeling electricity pricing in a deregulated generation industry: the potential for oligopoly pricing in a poolco. Energy J 1998;19(3):19–48.
[7] Borenstein S, Bushnell J, Kahn E, Stoft S. Market power in California electricity markets. Util Policy 1995;5(3/4):219–36.
[8] Deb R, Albert R, Hsue L-L, Brown N. How to incorporate volatility and risk in electricity price forecasting. Electr J 2000;13(4):65–75.
[9] Yu L, Wang S, Lai KK. Credit risk assessment with a multistage neural network ensemble learning approach. Expert Syst Appl 2008;34(2):1434–44.
[10] Zhang X, Lai KK, Wang S-Y. A new approach for crude oil price analysis based on empirical mode decomposition. Energy Econ 2008;30(3):905–18.
[11] Guo J-J, Luh P. Improving market clearing price prediction by using a committee machine of neural networks. Power Syst IEEE Trans 2004;19(4):1867–76.
[12] Zhang L, Luh P. Neural network-based market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method. Power Syst IEEE Trans 2005;20(1):59–66.
[13] Clements MP, Franses PH, Swanson NR. Forecasting economic and financial time-series with non-linear models. Int J Forecast 2004;20(2):169–83.
[14] He K, Lai KK, Guu S-M, Zhang J. A wavelet based multi scale VAR model for agricultural market. In: Thi H, Bouvry P, Dinh T, editors. Modelling, computation and optimization in information systems and management sciences, proceedings; 2008. p. 429–38. Vol. 14 of Communications in Computer and Information Science. Berlin: Springer-Verlag.
[15] Bowden N, Payne JE. Short term forecasting of electricity prices for MISO hubs: evidence from ARIMA-EGARCH models. Energy Econ 2008;30(6):3186–97.
[16] Diongue AK, Guégan D, Vignal B. Forecasting electricity spot market prices with a k-factor GIGARCH process. Appl Energy 2009;86(4):505–10.
[17] Weron R, Misiorek A. Forecasting spot electricity prices: a comparison of parametric and semiparametric time series models. Int J Forecast 2008;24(4):744–63.
[18] Wen F, Li Z, Xie C, Shaw D. Study on the fractal and chaotic features of the Shanghai composite index. Fractals-Complex Geometry Patterns Scaling Nat Soc 2012;20(2):133–40.
[19] Peters E. Fractal market analysis: applying chaos theory to investment and economics. New York: Wiley; 1994.
[20] Gencay R, Selcuk F, Ulugulyagci A. High volatility, thick tails and extreme value theory in value-at-risk estimation. Insur Math Econ 2003;33(2):337–56.
[21] Tiwari AK, Dar AB, Bhanja N. Oil price and exchange rates: a wavelet based analysis for India. Econ Model 2013;31:414–22.
[22] Reboredo JC, Rivera-Castro MA. A wavelet decomposition approach to crude oil price and exchange rate dependence. Econ Model 2013;32:42–57.
[23] Orlov AG. A cospectral analysis of exchange rate comovements during Asian financial crisis. J Int Financial Mark Institutions Money 2009;19(5):742–58.
[24] He K, Yu L, Lai KK. Crude oil price analysis and forecasting using wavelet decomposed ensemble model. Energy 2012;46(1):564–74.
[25] Conejo AJ, Contreras J, Espinola R, Plazas MA. Forecasting electricity prices for a day-ahead pool-based electric energy market. Int J Forecast 2005;21(3):435–62.
[26] Zhang J, Tan Z. Day-ahead electricity price forecasting using WT, CLSSVM and EGARCH model. Int J Electr Power & Energy Syst 2013;45(1):362–8.
[27] Huang NE, Wu M-L, Qu W, Long SR, Shen SS. Applications of Hilbert–Huang transform to non-stationary financial time series analysis. Appl Stoch Models Bus Industry 2003;19(3):245–68.
[28] Yu L, Lai KK, Wang S, He K. Oil price forecasting with an EMD-based multiscale neural network learning paradigm. In: Computational Science – ICCS 2007. Springer; 2007. p. 925–32.
[29] Crowley PM, Schildt T. An analysis of the embedded frequency content of macroeconomic indicators and their counterparts using the Hilbert–Huang transform. Research Discussion Papers 33/2009. Bank of Finland; 2009.


[30] Premanode B, Toumazou C. Improving prediction of exchange rates using differential EMD. Expert Syst Appl 2012;40(1):377–84.
[31] An N, Zhao W, Wang J, Shang D, Zhao E. Using multi-output feedforward neural network with empirical mode decomposition based signal filtering for electricity demand forecasting. Energy 2013;49(1):279–88.
[32] Dong Y, Wang J, Jiang H, Wu J. Short-term electricity price forecast based on the improved hybrid model. Energy Convers Manag 2011;52(8):2987–95.
[33] Crowley PM. How do you make a time series sing like a choir? Extracting embedded frequencies from economic and financial time series using empirical mode decomposition. Stud Nonlinear Dyn Econ 2012;16(5). Article Number: 4.
[34] Lindström E, Regland F. Modeling extreme dependence between European electricity markets. Energy Econ 2012;34(4):899–904.
[35] Guanlei X, Xiaotong W, Xiaogang X, Lijia Z. Improved EMD for the analysis of FM signals. Mech Syst Signal Process 2012;33:181–96.
[36] Rilling G, Flandrin P, Gonçalves P, Lilly JM. Bivariate empirical mode decomposition. Signal Processing Letters, IEEE 2007;14(12):936–9.
[37] Brock WA, Hommes CH. A rational route to randomness. Econometrica 1997;65(5):1059–95.
[38] Brock WA, Kleidon AW. Periodic market closure and trading volume: a model of intraday bids and asks. J Econ Dyn Control 1992;16(3–4):451–89.
[39] Brock WA, LeBaron BD. A dynamic structural model for stock return volatility and trading volume. Rev Econ Statistics 1996;78(1):94–110.
[40] Hommes CH. Financial markets as nonlinear adaptive evolutionary systems. Quant Finance 2001;1(1):149–67.
[41] Dacorogna MM, Gençay R, Müller UA, Olsen RB, Pictet OV. An introduction to high-frequency finance. San Diego: Academic Press; 2001.
[42] Hommes CH, Goeree JK. Heterogeneous beliefs and the non-linear cobweb model. J Econ Dyn Control 2000;24(38):761–98.
[43] Farmer J. Market force, ecology and evolution. Industrial Corp Change 2002;11(59):895–953.
[44] Athanasopoulos G, Vahid F. VARMA versus VAR for macroeconomic forecasting. J Bus Econ Statistics 2008;26(2):237–52.
[45] Rachev ST, editor. Financial econometrics: from basics to advanced modeling techniques. Hoboken, NJ: John Wiley; 2007.
[46] Chen B, Principe JC. Some further results on the minimum error entropy estimation. Entropy 2012;14(5):966–77.
[47] Hansen PR, Timmermann A. Choice of sample split in out-of-sample forecast evaluation. Economics Working Papers ECO2012/10. European University Institute; 2012.
[48] Zou H, Xia G, Yang F, Wang H. An investigation and comparison of artificial neural network and time series models for Chinese food grain price forecasting. Neurocomputing 2007;70(16/18):2913–23.
[49] Brock WA, Hsieh DA, LeBaron BD. Nonlinear dynamics, chaos, and instability: statistical theory and economic evidence. Cambridge, Mass.: MIT Press; 1991.
[50] Panagiotidis T. Testing the assumption of linearity. Econ Bull 2002;3(29):1–9.
[51] Hinich M. Testing for Gaussianity and linearity of a stationary time series. J Time Ser Analysis 1982;3(3):169–76.
[52] Hsieh D. Implications of nonlinear dynamics for financial risk management. J Financial Quantitative Analysis 1993;28(1):41–64.
[53] Jarque CM, Bera AK. A test for normality of observations and regression residuals. Int Stat Rev 1987;55(2):163–72.
[54] Broock W, Scheinkman JA, Dechert WD, LeBaron B. A test for independence based on the correlation dimension. Econ Rev 1996;15(3):197–235.
[55] Price J. Market-based price differentials in zonal and LMP market designs. Power Syst IEEE Trans 2007;22(4):1486–94.
[56] Stoft S. Transmission pricing zones: simple or complex? Electr J 1997;10(1):24–31.
[57] Cannell J. The financial crisis and its impact on the electric utility industry. Edison Electric Institute.
[58] Alves Portela Santos A, Carneiro Affonso da Costa Jr N, dos Santos Coelho L. Computational intelligence approaches and linear models in case studies of forecasting exchange rates. Expert Syst Appl 2007;33(4):816–23.
[59] Diebold FX, Mariano RS. Comparing predictive accuracy. J Bus Econ Statistics 1995;13(3):253–63.
[60] Pesaran MH, Timmermann A. A simple nonparametric test of predictive performance. J Bus Econ Statistics 1992;10(4):461–5.
[61] Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecast 2006;22(4):679–88.
[62] Yu L, Wang SY, Lai KK. A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates. Comput Operations Res 2005;32(10):2523–41.
[63] He K, Wang L, Zou Y, Lai KK. Value at risk estimation with entropy-based wavelet analysis in exchange markets. Phys A Stat Mech Appl 2014;408:62–71.