Journal Pre-proof
Indonesian electricity load forecasting using singular spectrum analysis, fuzzy systems and neural networks
Winita Sulandari, Subanar, Muhammad Hisyam Lee, Paulo Canas Rodrigues
PII: S0360-5442(19)32103-6
DOI: https://doi.org/10.1016/j.energy.2019.116408
Reference: EGY 116408
To appear in: Energy
Received Date: 26 January 2019
Revised Date: 16 October 2019
Accepted Date: 19 October 2019
Please cite this article as: Sulandari W, Subanar , Lee MH, Rodrigues PC, Indonesian electricity load forecasting using singular spectrum analysis, fuzzy systems and neural networks, Energy (2019), doi: https://doi.org/10.1016/j.energy.2019.116408. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.
Indonesian Electricity Load Forecasting Using Singular Spectrum Analysis, Fuzzy Systems and Neural Networks
Winita Sulandari (1,2), Subanar (1), Muhammad Hisyam Lee (3), Paulo Canas Rodrigues (4,5)
1 Department of Mathematics, Universitas Gadjah Mada, Indonesia
2 Study Program of Statistics, Universitas Sebelas Maret, Indonesia
3 Department of Mathematical Sciences, Universiti Teknologi Malaysia, Malaysia
4 CAST, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
5 Department of Statistics, Federal University of Bahia, Brazil
e-mail: [email protected], [email protected], [email protected], [email protected]
Abstract
Electricity plays a key role in human life. This study presents several methods to forecast Indonesian electricity load demand and compares their performance. The Indonesian hourly and half-hourly load series tend to have multiple seasonal patterns. Singular Spectrum Analysis (SSA) is chosen because of its capability to decompose the series into two separable components: a combination of cyclical and seasonal series, and a noise (irregular) component. In this paper we propose to model time series data by obtaining the forecast values with SSA using the Linear Recurrent Formula (LRF) and, afterwards, to model the irregular component by fuzzy systems and neural networks (NN). The forecast values obtained from SSA-LRF are then compared with those obtained from the combined methods, i.e. SSA-LRF-Fuzzy and SSA-LRF-NN. Based on RMSE and MAPE, SSA-LRF-NN is the most appropriate method to predict future values of the electricity load series. Four Indonesian electricity load data sets were considered in this study to validate the effectiveness of the proposed hybrid methods. The results show that the proposed methods, in particular the SSA-LRF-NN algorithm, can reduce the RMSE for the testing data by up to 83% relative to that obtained by SSA-LRF.
Keywords: electricity, SSA, Fuzzy, neural network, forecast
1. Introduction
In Indonesia, electricity needs tend to grow significantly from year to year [1]. Electrical energy cannot easily be stored, both because of storage limitations and because storage requires complex equipment. Therefore, it is important to match supply to demand. When demand exceeds supply, PT. PLN (the government corporation that supplies electricity in Indonesia) imposes blackouts. Rotating blackouts continue to occur throughout Sumatera, Kalimantan, Sulawesi, and areas of eastern Indonesia [2], which result in great inconvenience and major financial loss for customers. To minimize power outages, estimation of electricity demand and development of the electricity infrastructure are needed. It is stated in the Electricity Supply Business Plan (RUPTL = Rencana Usaha Penyediaan Tenaga Listrik) [3] that the additional need for the power plant is 3.86 GWh in 2019 and that it will increase to 10.06 GWh in the following year. RUPTL is the Indonesian electric power system plan that is prepared for the next ten years. Its implementation and realization are reviewed annually in order to adjust to the dynamics of society and growth [3]. Furthermore, forecasts of electricity demand are needed for planning power plants, transmissions, and substations to meet the short- and long-term electricity needs of all communities in the territory of Indonesia.
In the last decade, several researchers [4–8] have applied forecasting methods to Indonesian electricity load demand series. Suhartono [4] used double seasonal recurrent neural networks, while Utomo et al. [8] discussed double seasonal ARFIMA for short-term electricity load demand forecasting. Further, Suhartono et al. [5] combined ARIMA and ANFIS to improve forecasting performance. Meanwhile, Sulandari et al. [6] combined an exponential smoothing state space model with a neural network (NN) and, recently, Sulandari et al. [7] discussed an SSA-based model and its combination with NN to forecast the electricity load demand series. One of their conclusions is that hybrid models tend to give better results than single models.
Generally, the fluctuation of an electricity load series indicates the existence of trend, seasonal and irregular patterns that merge into one combination.
The trend is affected by the changing public demand for electricity, which is related to population growth, while the seasonal phenomenon may stem from factors such as the difference in activity between trading days and holidays, which affects economic activities. These factors can affect the load values systematically and show a deterministic effect. Wei [9] suggested that the deterministic factors can be eliminated from the series to obtain the stochastic component, so that a satisfactory probabilistic model for the stochastic component can be determined.
Many studies have been carried out to obtain more accurate predictions. Applications of linear models to load series, such as linear regression, exponential smoothing and ARIMA, were discussed in [10–12]. However, these linear models cannot accommodate the nonlinear relationships in the load series. Further, Taylor [13] developed a double seasonal exponential smoothing model. Taylor et al. [14] compared double seasonal ARIMA, double seasonal exponential smoothing, NN, and a regression method with principal component analysis against a simple naïve benchmark and a naïve benchmark with error model, which take into account the multiple seasonality in the data. Soares and Medeiros [15] proposed a two-level forecasting model that combines a deterministic and a stochastic model to improve Brazilian load forecasting accuracy. The deterministic component was removed using trend and harmonic functions to obtain the stochastic component, and an autoregressive model was then used to approximate the stochastic irregular component. Later, Wang et al. [16] combined seasonal ARIMA, seasonal exponential smoothing, and weighted support vector machines to accommodate both seasonality and nonlinearity in the load series. Recently, a hybrid approach based on improved empirical mode decomposition was proposed by Zhang et al. [17]. Other researchers [18–20] were interested in implementing singular spectrum analysis to improve the performance of load forecasting models.
This work considers an SSA hybrid model to enhance the forecasting results. SSA is considered as a tool to extract component patterns from a series with a complex pattern. Le et al. [20] combined SSA with an autoregressive (AR) model and then compared its results with the AR model, SSA with linear recurrence formulae (SSA-LRF) and a back-propagation neural network (BPNN) model for short-term power load forecasting. SSA-LRF can also be combined with NN to handle the nonlinearity in the data (see [7,21]).
This study presents two new forecasting strategies that are applicable to time series data: the combination of SSA-LRF with weighted fuzzy time series, considering four different methods, and the combination of SSA-LRF with NN. These methodologies are then applied to our main data set, containing the hourly electricity load in the Java-Bali area. Other applications were also considered to reinforce the generality of our proposals. With this hybrid combination of existing methods, the deterministic relationships in the series are expected to be well accommodated by the SSA-LRF model, while the nonlinear stochastic relationships can be handled by the fuzzy or NN frameworks. The proposed hybrid approach is therefore expected to improve forecasting accuracy. The forecasting performance for several steps ahead is compared with that of a benchmark model, the SSA-LRF model. These comparisons are made by considering the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). We also applied the proposed hybrid approaches to other data sets, i.e. a half-hourly electricity load of Bawen, Indonesia, and to a case study in a different field, i.e. the weekly US ending stocks of total gasoline, in order to show the effectiveness and generality of the proposed approaches.
This paper is organized as follows. Section 2 briefly explains basic SSA, the SSA-LRF forecasting algorithm, fuzzy time series and NN. Section 3 describes the methodology of the research and Section 4 presents the data and the results. Finally, the conclusions are presented in Section 5.
2. A Brief Overview of SSA-LRF, Fuzzy and NN
Below we provide a brief review of SSA-LRF, fuzzy time series, and NN.
2.1. SSA-LRF
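Let us consider a time series $y_t$, $t = 1, 2, \dots, N$. For a given window length $L$ ($1 < L < N/2$), the series $y_t$, $t = 1, 2, \dots, N$, is used to build the Hankel trajectory matrix $\mathbf{X}$ with size $L \times K$,
$$\mathbf{X} = \begin{pmatrix} y_1 & y_2 & \cdots & y_K \\ y_2 & y_3 & \cdots & y_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ y_L & y_{L+1} & \cdots & y_N \end{pmatrix} \qquad (1)$$
where $K = N - L + 1$. Some recommendations on the choice of the window length $L$ are discussed in [22] and in [23].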
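As an illustration of this embedding step, the following minimal Python sketch builds the trajectory matrix of Eq. (1) with NumPy. The function name `trajectory_matrix` and the toy series are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np


def trajectory_matrix(series, window_length):
    """Return the L x K Hankel trajectory matrix, with K = N - L + 1.

    Hypothetical helper written for illustration; not the authors' code.
    """
    y = np.asarray(series, dtype=float)
    n = y.size
    L = window_length
    # The text restricts the window length to 1 < L < N/2.
    if not 1 < L < n / 2:
        raise ValueError("window length must satisfy 1 < L < N/2")
    K = n - L + 1
    # Column j is the lagged vector (y_{j+1}, ..., y_{j+L}) in 1-based notation.
    return np.column_stack([y[j:j + L] for j in range(K)])


# Toy data (illustrative only): N = 7, L = 3, so K = 5.
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
X = trajectory_matrix(toy_series, window_length=3)
print(X)
```

Each anti-diagonal of the resulting matrix holds a single observation repeated, which is the Hankel property that the later diagonal averaging step relies on.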
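In the next step, the trajectory matrix in Eq. (1) is decomposed into rank-one matrices using singular value decomposition (SVD),
$$\mathbf{X} = \sum_{i=1}^{d} \sqrt{\lambda_i}\, U_i V_i^{T}, \qquad (2)$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d > 0$ are the positive eigenvalues of the matrix $\mathbf{X}\mathbf{X}^{T}$, and $U_i$ and $V_i = \mathbf{X}^{T} U_i / \sqrt{\lambda_i}$ (for $i = 1, 2, \dots, d$) are the left and right singular vectors of $\mathbf{X}$, respectively.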
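The decomposition in Eq. (2) can be sketched with NumPy's `numpy.linalg.svd`, whose singular values correspond to $\sqrt{\lambda_i}$. The helper name `svd_components`, the tolerance `tol`, and the small demonstration matrix are assumptions introduced for this example.

```python
import numpy as np


def svd_components(X, tol=1e-10):
    """Split X into rank-one matrices sqrt(lambda_i) * U_i * V_i^T.

    Hypothetical helper for illustration. Returns the positive eigenvalues
    of X X^T, the left and right singular vectors (as columns), and the
    list of rank-one component matrices.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only numerically positive singular values (d of them).
    d = int(np.sum(s > tol * s[0]))
    eigenvalues = s[:d] ** 2  # lambda_i = (singular value)^2
    components = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(d)]
    return eigenvalues, U[:, :d], Vt[:d, :].T, components


# Demonstration trajectory matrix (L = 3, K = 5) of the illustrative
# series 1, 3, 2, 5, 4, 6, 8.
X_demo = np.array([
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [3.0, 2.0, 5.0, 4.0, 6.0],
    [2.0, 5.0, 4.0, 6.0, 8.0],
])
eigenvalues, U, V, components = svd_components(X_demo)
# The rank-one components add back up to the trajectory matrix.
print(np.allclose(sum(components), X_demo))
```

Note that the code uses 0-based indices, so `components[0]` corresponds to $i = 1$ in Eq. (2).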
In the third step, the eigentriples $(\sqrt{\lambda_i}, U_i, V_i)$ are grouped into $m$ disjoint subsets $I_k$ ($k = 1, 2, \dots, m$; $m \le d$) such that Eq. (2) can be represented as
$$\mathbf{X} = \mathbf{X}_{I_1} + \mathbf{X}_{I_2} + \cdots + \mathbf{X}_{I_m},$$
where $\mathbf{X}_{I_k} = \sum_{j \in I_k} \sqrt{\lambda_j}\, U_j V_j^{T}$ for $k = 1, 2, \dots, m$ and $m \le d$. Essentially, we are mostly interested in creating two main disjoint subsets: the signal, which will be used for model fit and model forecasting, and the noise, which should be discarded.
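The grouping step can be sketched as follows. The helper `group_components` and the choice of which eigentriples count as signal are assumptions for this example; in practice the split would come from inspecting the eigentriples, which this sketch does not attempt.

```python
import numpy as np


def group_components(X, groups):
    """Return one matrix X_{I_k} per index set I_k in `groups`.

    Hypothetical helper for illustration. Indices are 0-based positions
    of the eigentriples, ordered by decreasing singular value.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    flat = [j for idx in groups for j in idx]
    if len(flat) != len(set(flat)):
        raise ValueError("index sets must be disjoint")
    grouped = []
    for idx in groups:
        idx = list(idx)
        # X_{I_k} = sum over j in I_k of sqrt(lambda_j) * U_j * V_j^T
        grouped.append((U[:, idx] * s[idx]) @ Vt[idx, :])
    return grouped


# Same illustrative 3 x 5 trajectory matrix as a stand-in for real data.
X_demo = np.array([
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [3.0, 2.0, 5.0, 4.0, 6.0],
    [2.0, 5.0, 4.0, 6.0, 8.0],
])
# Assumed split: first two eigentriples as signal, the last one as noise.
signal, noise = group_components(X_demo, [[0, 1], [2]])
print(np.allclose(signal + noise, X_demo))
```

Because the index sets are disjoint and here cover all eigentriples, the grouped matrices sum to the original trajectory matrix.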
Finally, in the fourth step, each component $\mathbf{X}_{I_k}$ is transformed into a time series using the diagonal averaging algorithm. Further details about this algorithm can be found in [24–28]. In case of data contaminated with outlying observations, a correlation analysis can be considered as in [29], and a general robust SSA algorithm for model fit, as proposed by Rodrigues et al. [30], can be used. Long time series can be handled with lower computational needs by using a randomized version of the SSA algorithm [31].
As mentioned above, we assume that the time series $Y_N = \{y_t\}$, $t = 1, 2, \dots, N$, can be represented as $Y_N = Y_N^{(1)} + Y_N^{(2)}$, where $Y_N^{(1)}$ and $Y_N^{(2)}$ are approximately separable for the window length $L$. $Y_N^{(2)}$ is regarded as noise and the forecasts of the signal component $Y_N^{(1)}$ can be obtained by using the LRF ([18,32]), that can be written as
Manuscript Number: EGY-D-19-00744
Highlights:
• Hybrid methods between SSA-LRF, WFTS, and NN are proposed for load forecasting
• The proposed methodologies are of great generality and applicable to other TS data
• The proposed method can reduce the RMSE obtained by SSA-LRF up to 83%
• SSA-LRF-NN yields better performance accuracy than SSA-LRF-Fuzzy