Available online at www.sciencedirect.com
ScienceDirect Solar Energy 122 (2015) 1371–1383 www.elsevier.com/locate/solener
A novel clustering approach for short-term solar radiation forecasting

M. Ghayekhloo a, M. Ghofrani b,⇑, M.B. Menhaj c, R. Azimi d

a Young Researchers and Elite Club, Qazvin Branch, Islamic Azad University, Qazvin, Iran
b School of Science, Technology, Engineering and Mathematics (STEM), University of Washington, Bothell, USA
c Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
d Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
Received 19 March 2015; received in revised form 6 October 2015; accepted 30 October 2015
Communicated by: Associate Editor David Renne
Abstract
This paper proposes a hybrid solar radiation forecasting method based on a novel game-theoretic self-organizing map (GTSOM). New strategies are proposed to resolve the limitations of the original SOM for non-winning neurons and to increase their competition with winning neurons for input patterns. Neural gas (NG) and competitive Hebbian learning (CHL) are used to enhance the learning and the quality of the map. Solar radiation data are decomposed by the discrete wavelet transform (DWT). A time-series analysis is then used to structure the training and testing data for Bayesian neural networks (BNNs). The proposed GTSOM groups the time-series-analyzed datasets into clusters of similar data, and the elbow method is used to determine the number of clusters. A cluster selection method is developed to determine the appropriate cluster whose solar radiation data provide the input to the BNN. Temperature, wind speed and wind direction data are also included in the inputs to the BNN, whose outputs provide the solar radiation forecasts. Historical solar radiation data are used to evaluate the accuracy of the hybrid forecasting with the proposed clustering and to compare it with the accuracy obtained with the K-means, original SOM and NG clustering algorithms. The comparison demonstrates the superior performance of the proposed clustering method. Published by Elsevier Ltd.
Keywords: Clustering; Data preprocessing; Forecasting; Solar radiation
⇑ Corresponding author at: UWBB Room 227, 18807 Beardslee Blvd, Bothell, WA 98011, USA. Tel.: +1 425 352 3224; fax: +1 425 352 3775. E-mail address: [email protected] (M. Ghofrani).
http://dx.doi.org/10.1016/j.solener.2015.10.053 0038-092X/Published by Elsevier Ltd.

1. Introduction
Accurate solar forecasting facilitates the integration of solar generation into the grid by reducing the integration and operational costs associated with solar intermittency. The need for more accurate forecasting methods has been highlighted in the past few years by the higher penetration and energy-market growth of solar generation (Inman et al., 2013). Conventional statistical methods such as auto-regressive (AR) and autoregressive moving average (ARMA) models are based on time series and have traditionally been used for solar prediction (Aguiar and Pereira, 1992; Mora-Lopez and Sidrach-de-Cardona, 1998). These statistical methods provide linear, stationary models, which are not appropriate for non-linear, non-stationary solar radiation. Several methods, such as fuzzy logic (Tanaka et al., 2011; Jafarzadeh et al., 2013), artificial neural networks (ANNs) (Oudjana et al., 2011) and support vector regression (SVR) (Li and Li, 2008; Shi et al., 2012), have been used to provide non-linear models and better accuracy for solar forecasting. However, the inability of fuzzy logic-based methods to directly
M. Ghayekhloo et al. / Solar Energy 122 (2015) 1371–1383
learn from historical data, and the complexity of ANN-based methods in specifying numerous model parameters, call for hybrids of different methods that resolve the shortcomings of the individual forecasting techniques. A hybrid of ARMA and a time delay neural network (TDNN) was proposed by Wu and Chan (2011) and gave more accurate results than either method separately. However, the feasibility of that method for short-term solar forecasting is questionable because of its high computational complexity. Voyant et al. (2012) combined ARMA and an ANN with numerical weather prediction (NWP) to forecast the hourly mean global horizontal irradiance. The self-organizing map (SOM) has also been used in hybrid methods for pattern recognition and classification of the input data (Chen et al., 2011; Yang et al., 2014). Chen et al. (2011) used meteorological predictions of relative humidity, solar irradiance and temperature to forecast solar power generation; however, inaccuracies in the meteorological forecasts degrade the quality of the solar power prediction. Yang et al. (2014) developed a hybrid approach combining a SOM, a learning vector quantization (LVQ) network, SVR and fuzzy inference for day-ahead PV power prediction. Their numerical results demonstrated that the hybrid approach outperforms the traditional ANN and the simple SVR methods used individually. However, SOM has limitations for non-convex or discontinuous input distributions. In addition, non-winning neurons do not participate in the learning phase, which reduces the quality of the map and the accuracy of the forecast. Game theory has been applied to SOM training to resolve these problems and globally optimize the map (Herbert and Yao, 2005; Neme et al., 2005). Herbert and Yao (2005) proposed a training algorithm that treats the neurons as players with strategy sets and utility functions.
The algorithm was used to improve the quality of the map, but no justification was provided through either numerical results or experiments. In addition, the algorithm defines each player's utility function based on all players' actions and requires solving the game at every iteration, which increases its computational complexity. Neme et al. (2005) proposed different strategies that enable the winning neuron to select among its neighbors for weight adaptation. This reduces the complexity of the original SOM algorithm, in which the weight vectors of all neighboring neurons are updated. However, the proposed strategies were non-cooperative in topographic map formation and did not minimize the error measure. Engelbrecht (2007) proposed a hybrid model of self-organizing maps (SOM), support vector regression (SVR) and particle swarm optimization (PSO) to forecast hourly global solar radiation. The SOM algorithm was first applied to divide the entire input space into several disjoint regions or clusters. SVR models were then used to model each cluster and capture the characteristic correlation between the predicted and actual values. To deal with the volatility of SVR
with different parameters, a PSO algorithm was used to improve the forecasting performance of the SVR models. However, SOM algorithms may converge to non-optimal clustering results depending on the initialization and the learning rate. In addition, neighborhood violations occur if the topology of the output space does not match the shape of the data. The amount of solar radiation that reaches the ground is extremely variable because of the apparent motion of the sun and changing atmospheric conditions, which makes it difficult to forecast. The chaotic nature of solar data disrupts the neural network learning process and leads to forecasts with high errors (Di Piazza et al., 2011). This paper proposes an improved version of the SOM algorithm to group the solar time-series data into clusters that better characterize its irregular features. Handling groups of data separately makes it possible to identify anomalies and irregular patterns and to neglect outliers, which provides a better understanding of the collected information. This leads to more appropriate learning for the neural networks and improves the accuracy of the forecasts. This paper develops a hybrid solar radiation forecasting method based on a novel clustering algorithm. Game-theoretic concepts with new strategies are used in conjunction with SOM to provide a modified GTSOM clustering. The proposed clustering method increases the participation of non-winning neurons in the learning and enables their competition with the winning neurons. This provides better clustering performance than the original SOM, in which dead neurons have negligible chances of obtaining input patterns. Unlike the original SOM, which defines the neighborhood based on the neurons' distances in the two-dimensional lattice, NG is combined with CHL to define the neighborhood based on the neurons' distances in the input space.
This speeds up the learning and enhances the mapping quality. Wavelet decomposition and time-series analysis are used to pre-process the solar radiation data; the time-series analysis structures the training and forecasting data for the BNNs. The proposed GTSOM groups the pre-processed solar radiation datasets into a number of clusters determined by the elbow method. A cluster selection technique is proposed to select the cluster used for the solar forecasting of each individual hour. Temperature, wind speed and wind direction data are also included in the inputs to the BNN, whose outputs provide the solar radiation forecasts. The rest of the paper is organized as follows. The original SOM algorithm and game-theoretic concepts are explained in Section 2, which also describes the proposed GTSOM clustering and the hybrid solar radiation forecasting method. Section 3 provides a case study in which the accuracy of the hybrid forecasting is calculated with different clustering algorithms, including the proposed GTSOM method and the existing K-means, original SOM and NG clustering. Conclusions are given in Section 4.
2. Methodology

2.1. SOM algorithm
SOM is an unsupervised, competitive-learning clustering network that classifies input patterns by transforming them into a two-dimensional map of features (Engelbrecht, 2007). Let $X = [x_1, x_2, \ldots, x_R]^T$ represent an arbitrary input vector and $W_i = [w_{i1}, w_{i2}, \ldots, w_{iR}]^T$ denote the weight vector of the $i$th neuron. SOM training is an iterative process: a pattern is randomly selected from the set of input patterns, the Euclidean distance between this pattern and each neuron is calculated, and the neuron with the minimum distance is selected as the winning neuron. The winning neuron is the best matching unit (BMU), whose index is given by Kohonen (1990):

$$q(X) = \lVert X - W_c \rVert^2 = \min_{\forall i} \lVert X - W_i \rVert^2 \tag{1}$$

where $W_c$ is the weight vector of the winning neuron. The weight adaptation rule of SOM is as follows (Kohonen, 1990):

$$W_i(t+1) = \begin{cases} W_i(t) + l_i(t)\,[X(t) - W_i(t)], & \text{if } i \in N_q(j) \\ W_i(t), & \text{if } i \notin N_q(j) \end{cases} \tag{2}$$

where $N_q$ is the neighborhood set of the BMU and $l_i(t)$ is the neighborhood function, given by:

$$l_i(t) = \alpha(t) \cdot \exp\left(-\frac{\lVert r_c - r_i \rVert^2}{2\sigma^2(t)}\right) \tag{3}$$

where $\alpha(t) \in [0, 1]$ is the learning-rate parameter, and $r_c, r_i \in \mathbb{R}^2$ are the positions of the winning and $i$th neurons in the two-dimensional lattice. The neighborhood size is controlled by the parameter $\sigma(t)$.

2.2. Game theory
Game theory studies situations involving players with conflicting interests (Pavel, 2012). Each player has a payoff function and a strategy set. The strategy set defines the player's actions at each stage of the game. The payoff of each player is determined by both its own actions and the other players' actions, and the objective is to maximize the players' payoffs. Games are classified as non-cooperative or cooperative: in a non-cooperative game each player chooses its action independently, whereas in a cooperative game the players can form coalitions and collaborate.

2.3. The proposed clustering method
A game-theoretic SOM (GTSOM) is used in this paper to cluster the two-dimensional solar irradiance–temperature dataset. Five strategies are proposed to make the game more competitive and to enable the players to obtain more input patterns. This resolves a shortcoming of the original SOM, in which the positions of dead neurons relative to the input patterns reduce their contribution to the learning and their chance of competing with the winning neurons. The proposed strategies are as follows:

Strategy A (Approach): As in the original SOM algorithm, the winning neuron and its neighbors move toward the input pattern to minimize the Euclidean distance. The strategies of the non-winning neurons are determined by the situation of each neuron and the stage of the iteration.

Strategy O (Opposite): At early stages of the iteration, the non-winning neurons move in the direction opposite to the winning neuron's direction. Because the patterns are distributed within the input space with equal probability, this increases their chance of reaching a pattern.

Strategy S (Stay): The non-winning neurons stay in their current positions. This is most appropriate for recently winning neurons, or for neurons that have won many times, as they have most likely approached regions with sufficient input patterns.

Strategy R (Random): The non-winning neurons move to random positions in the input space. This strategy is most applicable to neurons wandering in regions with insufficient or no input patterns. The random moves increase the chance of these neurons reaching regions with sufficient input patterns and are appropriate at early stages of the iteration.

Strategy B (Best player to approach): The non-winning neurons approach the neuron designated as the best player. An error variable ($E_c$) is used to identify the best player; it is calculated by adding the squared Euclidean distance between the input pattern and the BMU to the neuron's cumulative error:

$$E_c(t) = E_c(t-1) + \lVert X - W_c \rVert^2 \tag{4}$$

At the end of the iterations, the number of wins of each neuron is retrieved as a counter variable. The error variable of Eq. (4) is then divided by this counter to obtain the average cumulative error, and the neuron with the minimum average cumulative error is selected as the best player.

The topographic mapping of SOM preserves the topology of the input data and requires the neighborhood function to be defined based on the neurons' distances in the 2D lattice (Engelbrecht, 2007). This is not appropriate for the proposed strategies, in which the neighborhood is defined based on the neurons' distances in the input space. A SOM variant, NG, is used in this paper to address this issue and provide a proper neighborhood function for the proposed strategies (Martinetz and Schulten, 1991). NG ranks the neurons according to their proximity to the input pattern, assigning an integer $k_i$ to each neuron.
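The SOM steps of Eqs. (1)–(3) and the best-player bookkeeping of strategy B (Eq. (4)) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' GTSOM: every neuron is treated as a member of the BMU's neighborhood, $\alpha(t)$ and $\sigma(t)$ decay linearly (the source gives no schedules), and the O/S/R/B moves of the non-winning neurons and the NG ranking are not reproduced. The function names (`find_bmu`, `train_som`, `best_player`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2 between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def find_bmu(x, weights):
    """Eq. (1): index c of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))


def train_som(data, weights, positions, steps, alpha0=0.5, sigma0=1.0, seed=0):
    """SOM training (Eqs. (1)-(3)) with strategy-B bookkeeping (Eq. (4)).

    weights:   weight vectors W_i, updated in place
    positions: lattice coordinates r_i of the neurons
    Returns (errors, wins): cumulative error E_c and win count of each neuron.
    """
    rng = random.Random(seed)
    errors = [0.0] * len(weights)
    wins = [0] * len(weights)
    for t in range(steps):
        # Assumed schedules: alpha(t) and sigma(t) decay linearly.
        frac = t / steps
        alpha = alpha0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 1e-3)
        x = rng.choice(data)                  # randomly selected input pattern
        c = find_bmu(x, weights)              # winning neuron (BMU)
        errors[c] += sq_dist(x, weights[c])   # Eq. (4): E_c(t) = E_c(t-1) + ||X - W_c||^2
        wins[c] += 1
        for i, w in enumerate(weights):
            # Eq. (3): Gaussian neighborhood measured on the lattice
            l_i = alpha * math.exp(-sq_dist(positions[c], positions[i]) / (2.0 * sigma ** 2))
            # Eq. (2): move W_i toward X (all neurons treated as neighbors here)
            weights[i] = [wk + l_i * (xk - wk) for wk, xk in zip(w, x)]
    return errors, wins


def best_player(errors, wins):
    """Strategy B: neuron with the minimum average cumulative error E_c / wins."""
    winners = [i for i, n in enumerate(wins) if n > 0]
    return min(winners, key=lambda i: errors[i] / wins[i])


if __name__ == "__main__":
    # Two toy clusters of 2-D points (e.g. scaled irradiance and temperature)
    data = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5],
            [10.0, 10.0], [10.5, 10.0], [10.0, 10.5], [10.5, 10.5]]
    weights = [[4.0, 4.0], [6.0, 6.0]]
    positions = [[0.0, 0.0], [1.0, 0.0]]
    errors, wins = train_som(data, weights, positions, steps=1000)
    print("BMUs:", find_bmu([0.2, 0.2], weights), find_bmu([10.2, 10.2], weights))
    print("best player:", best_player(errors, wins))
```

After training, the two toy clusters map to different BMUs; `best_player` returns the neuron that strategy B would have the non-winning neurons approach.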
In NG, the closest neuron $i_0$ is assigned the integer 0, the second-closest neuron $i_1$ the integer 1, and so on. The weight vectors of the neurons are updated by NG as:

$$W_i(t+1) = \begin{cases} W_i(t) + \alpha(t)\exp\left(-\dfrac{k_i}{\sigma(t)}\right)[X(t) - W_i(t)], & \text{if } i \in N_q(j) \\ W_i(t), & \text{if } i \notin N_q(j) \end{cases} \tag{5}$$

CHL, in combination with NG, determines the neighborhood relationships between the neurons (Martinetz and Schulten, 1991). Fig. 1 shows the flowchart of the developed GTSOM.

Fig. 1. Flowchart for the proposed clustering method (GTSOM). (Steps: initialize the weight vectors of the neurons, placing them close to the center of the input space; randomly select and apply an input pattern; use Eq. (1) to determine the winning and non-winning neurons; apply strategy A to the BMU and strategies O/S/R/B to the non-winning neurons, based on their situations and the stage of the iteration; use Eq. (5) to update the neurons' weight vectors; return to the input-selection step until the termination criterion is satisfied.)

2.4. The proposed hybrid forecasting method

2.4.1. Overview
A hybrid forecasting method is proposed that combines a time-series analysis, the GTSOM clustering algorithm, a cluster selection algorithm and a BNN. This section describes the proposed hybrid method in detail. Fig. 2 shows the schematic diagram of the proposed forecasting method.

Fig. 2. Schematic diagram of the proposed forecasting method. (Pre-processing stage: the solar radiation data are decomposed by the DWT, and the training and testing data are structured by time-series analysis. Clustering stage: GTSOM clusters the solar training data, and the cluster with the minimum distance and maximum number of correlated training inputs to the testing input data is selected. Training stage: Bayesian learning on the selected cluster n, with wind speed, wind direction and temperature as additional inputs. Forecasting stage: the NN produces the forecast output for the testing input, and the error is calculated against the testing output.)

2.4.2. Pre-processing stage
The pre-processing stage includes wavelet decomposition and a time-series analysis to better characterize the solar irradiance behavior and provide the most appropriate inputs for the BNNs. The discrete wavelet transform (DWT) decomposes the data into the proper levels of resolution, determined by the entropy-based criterion (Coifman and Wickerhauser, 1992). Fig. 3 shows the multi-resolution decomposition for an M-level DWT. For a DWT with M-level decomposition, the original signal (f) is the sum of the approximation (A) and the details (D) of the signal up to scale M (Ghayekhloo et al., 2015):

$$f = A_M + \sum_{j=1}^{M} D_j \tag{6}$$

where $A_m$ and $D_j$ are the approximation and detail of the signal at scales m and j, calculated by:

$$A_m = \sum_{n=-\infty}^{+\infty} Ac_{m,n}\,\varphi_{m,n} \tag{7.a}$$

$$D_j = \sum_{n=-\infty}^{+\infty} Dc_{j,n}\,\psi_{j,n} \tag{7.b}$$

where $\varphi_{m,n}$ and $\psi_{j,n}$ are the scaling and wavelet (detail) functions at location n and scales m and j, respectively, and $Ac_{m,n}$ and $Dc_{j,n}$ are the approximation and detail coefficients, calculated by the following recursive equations:
$$Ac_{m+1,n} = \sum_i l_i\,Ac_{m,2n+i} = \sum_i l_{i-2n}\,Ac_{m,i} \tag{8.a}$$

$$Dc_{j+1,n} = \sum_i h_i\,Ac_{j,2n+i} = \sum_i h_{i-2n}\,Ac_{j,i} \tag{8.b}$$

Fig. 3. M-level wavelet decomposition. (Starting from $f = A_0$, each level passes the current approximation coefficients through a low-pass filter (Eq. (8.a)) and a high-pass filter (Eq. (8.b)) to obtain $Ac_m$ and $Dc_m$; Eqs. (7.a) and (7.b) convert these into the approximations $A_1, A_2, \ldots, A_M$ and the details $D_1, D_2, \ldots, D_M$.)
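Eqs. (6)–(8) can be illustrated with the Haar wavelet, whose low- and high-pass filters are $l = [1/\sqrt{2}, 1/\sqrt{2}]$ and $h = [1/\sqrt{2}, -1/\sqrt{2}]$. The paper does not state which wavelet it uses and selects the decomposition level with an entropy criterion, so the Haar filters and the fixed `levels` argument below are assumptions, and `haar_decompose` is a hypothetical helper. The sketch applies the recursion of Eq. (8), expands the coefficients back into full-length signals as in Eq. (7), and lets the additive identity of Eq. (6) be checked directly.

```python
import math

# Assumed wavelet: Haar analysis filters (l_i low-pass, h_i high-pass)
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def analysis_step(approx):
    """Eq. (8): next-level approximation and detail coefficients (downsampled by 2)."""
    half = len(approx) // 2
    a_next = [sum(LOW[i] * approx[2 * n + i] for i in range(2)) for n in range(half)]
    d_next = [sum(HIGH[i] * approx[2 * n + i] for i in range(2)) for n in range(half)]
    return a_next, d_next


def haar_decompose(f, levels):
    """Return (A_M, [D_1, ..., D_M]) as full-length signals so Eq. (6) can be checked.

    Requires len(f) to be divisible by 2**levels.
    """
    if len(f) % (2 ** levels):
        raise ValueError("len(f) must be divisible by 2**levels")
    approx = list(f)                      # Ac_0 = f (the signal A_0 in Fig. 3)
    details = []
    for j in range(1, levels + 1):
        approx, dc = analysis_step(approx)
        block = 2 ** j
        scale = 2 ** (-j / 2)
        # Eq. (7.b) with Haar psi_{j,n}: +scale on the first half of block n, -scale on the second
        details.append([dc[t // block] * scale * (1 if t % block < block // 2 else -1)
                        for t in range(len(f))])
    # Eq. (7.a) with Haar phi_{M,n}: constant 2**(-M/2) on block n
    block = 2 ** levels
    a_m = [approx[t // block] * 2 ** (-levels / 2) for t in range(len(f))]
    return a_m, details


if __name__ == "__main__":
    f = [0.0, 0.0, 120.0, 340.0, 510.0, 430.0, 210.0, 40.0]  # toy hourly values
    a_m, ds = haar_decompose(f, levels=3)
    recon = [a + sum(d[t] for d in ds) for t, a in enumerate(a_m)]
    print("max reconstruction error:", max(abs(r - x) for r, x in zip(recon, f)))
```

The printed reconstruction error should be at floating-point rounding level, which is Eq. (6) in numerical form; with the Haar filters, $A_M$ is simply the block average of f over $2^M$ samples.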
In Eq. (8), $l_i$ and $h_i$ are the coefficients of the low- and high-pass filters, respectively. The low-frequency component approximates the general trend of the signal, while the high-frequency components represent its details. The different frequency components of the data are then time-series analyzed to provide the inputs for the NNs. The structures of the time series for training and testing are given by Eqs. (9.a) and (9.b), respectively:

(9.a) [training data matrix]
(9.b) [testing data matrix]

Within the matrices of Eqs. (9.a) and (9.b), the data to the left of the vertical line are the training and testing input series, and the last column on the right gives the training and testing outputs, respectively. For each set of data, S(t) is the most recent data point, and N is the length of the sliding window used for the lagged data. The data include K rows for training and K' rows for testing. An iterative test is run to determine an appropriate value of N that provides a reasonable compromise between forecast accuracy and computational burden: the forecast accuracy is calculated for increasing values of N, and the knee point is selected as the length of the sliding window, since increasing N beyond the knee point does not significantly improve the forecast accuracy.
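The lagged-input matrices of Eqs. (9.a) and (9.b) and the knee-point test can be sketched as follows. The row layout (N lagged values followed by the value `horizon` steps ahead) follows the description above and the inputs listed in Table 1; the chronological 80/20 split and the 2% relative-gain rule used to locate the knee are assumptions, since the paper only states the split ratio and that a knee point is chosen. All function names are hypothetical.

```python
def sliding_window(series, n_lags, horizon=1):
    """Rows of N lagged values [S(t-N+1), ..., S(t)] paired with the target S(t+horizon)."""
    inputs, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        inputs.append(list(series[end - n_lags:end]))  # window ending at S(t) = series[end-1]
        targets.append(series[end + horizon - 1])       # S(t + horizon)
    return inputs, targets


def split_train_test(inputs, targets, train_frac=0.8):
    """Assumed chronological split into K training rows and K' testing rows (80/20)."""
    k = int(len(inputs) * train_frac)
    return (inputs[:k], targets[:k]), (inputs[k:], targets[k:])


def knee_window(candidates, errors, min_gain=0.02):
    """First candidate N after which the next larger N improves the error by < min_gain."""
    for k in range(len(candidates) - 1):
        gain = (errors[k] - errors[k + 1]) / errors[k]   # relative improvement
        if gain < min_gain:
            return candidates[k]
    return candidates[-1]


if __name__ == "__main__":
    # RMSE values (W/m^2) reported in Table 1 for N = 3, 5, 8, 10, 15
    print(knee_window([3, 5, 8, 10, 15], [77.5059, 71.2911, 68.2694, 65.860, 64.798]))
```

With the RMSE values of Table 1, this rule returns N = 10, the window length adopted in Section 3; a different `min_gain` could move the knee, which is why the threshold is flagged as an assumption.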
2.4.3. Clustering stage
In the clustering stage, the developed GTSOM groups the pre-processed dataset into a number of clusters determined by the elbow method (Ketchen and Shook, 1996). A cluster selection method is proposed to determine the appropriate cluster for forecasting a specific hour. Fig. 4 shows the flowchart of the proposed cluster selection method, where C is the number of clusters and J is the total number of training input rows within each cluster. The counter m tracks the number of training input rows whose correlation with the testing input meets a desired threshold (0.8). The average of the correlated training inputs within cluster c is denoted by Avg. Correlated Train Input(c), and X denotes the average of the testing input dataset. For each cluster, the distance between these two averages is calculated. This process is repeated for all clusters, and the cluster with the minimum distance and the maximum number of correlated training inputs is passed to the training stage.

Fig. 4. Flowchart of the cluster selection method. (Steps: load the data; use GTSOM and the elbow method to group the data into an appropriate number of clusters; set X = avg(Test-Input); for each cluster c = 1, …, C and each of its training input rows j = 1, …, J, compute the correlation between Train-Input(j) and Test-Input and, if its absolute value is at least 0.8, store Y(m, c) = avg(Train-Input(j)) and increment m; then set Count-Correlated-Train-Input(c) = m, Avg.-Correlated-Train-Input(c) = avg(Y(m, c)) and Distance(c) = |X − Avg.-Correlated-Train-Input(c)|; finally, select the cluster with the minimum distance and maximum Count-Correlated-Train-Input.)

2.4.4. Training stage
Once the data are pre-processed and clustered, a Bayesian approach is used to train the NN on the data of the selected cluster. The Bayesian approach performs best for limited training datasets and is therefore selected among the learning methods for NN training (Neal, 1996). This is particularly relevant for clusters that contain fewer data. The training parameters are as follows:
(1) Number of inputs: 10 (solar radiation) + 1 (temperature) + 1 (wind speed) + 1 (wind direction).
(2) Number of hidden layers and neurons: 1 hidden layer with 5 neurons; 1 output layer.
(3) Transfer functions: tansig and purelin.
(4) Learning algorithm: Bayesian.
(5) Comparison functions: root mean square error (RMSE), relative RMSE (rRMSE).
(6) Data distribution (train–test): 80% train, 20% test.

Table 1 Forecast accuracy for different lengths of the sliding window (N).
Window    Inputs                                    RMSE (W/m²)   rRMSE
N = 3     [S(t−2), S(t−1), S(t)]                    77.5059       0.487
N = 5     [S(t−4), S(t−3), S(t−2), S(t−1), S(t)]    71.2911       0.448
N = 8     [S(t−7), S(t−6), …, S(t−1), S(t)]         68.2694       0.429
N = 10    [S(t−9), S(t−8), …, S(t−1), S(t)]         65.860        0.414
N = 15    [S(t−14), S(t−13), …, S(t−1), S(t)]       64.798        0.408

2.4.5. Forecasting stage
This paper uses the BNN for solar irradiance forecasting over short horizons from 1 to 3 h ahead. After receiving the effective inputs and simulating the network, the BNN outputs the forecast values. As mentioned in Section 2.4.4, the effective inputs for solar irradiance forecasting are the air temperature, wind speed and wind direction data and the solar irradiance data of the selected cluster. Based on Eq. (9.b), the network outputs, of the general form $\hat{S}(t+1), \hat{S}(t+2), \ldots, \hat{S}(t+K')$, are the forecasts for the different time horizons (1 h ahead, 2 h ahead, …, K' h ahead).
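The cluster selection procedure of Section 2.4.3 (Fig. 4) can be sketched as below. The sketch assumes that each cluster is a list of training input rows with the same length as the testing input row, uses the Pearson correlation, and counts correlated rows from zero. Because the paper does not say how the "minimum distance" and "maximum count" criteria are combined, the sketch ranks clusters by distance and uses the count only to break ties; this ordering and the function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_cluster(clusters, test_input, threshold=0.8):
    """Sketch of the Fig. 4 procedure.

    clusters:   list of clusters, each a list of training input rows
    test_input: testing input row for the hour being forecast
    Returns (index of the selected cluster, [(count, distance) per cluster]).
    """
    x_avg = sum(test_input) / len(test_input)              # X = avg(Test-Input)
    summary = []
    for rows in clusters:
        # Y(m, c): averages of the training rows with |correlation| >= threshold
        y = [sum(r) / len(r) for r in rows
             if abs(pearson(r, test_input)) >= threshold]
        # Distance(c) = |X - Avg.-Correlated-Train-Input(c)|; inf if nothing correlates
        distance = abs(x_avg - sum(y) / len(y)) if y else math.inf
        summary.append((len(y), distance))
    # Assumed combination rule: smallest distance first, larger count breaks ties
    best = min(range(len(clusters)), key=lambda c: (summary[c][1], -summary[c][0]))
    return best, summary


if __name__ == "__main__":
    clusters = [
        [[1, 2, 3], [2, 3, 4]],          # correlated with the test row, far in level
        [[11, 19, 31], [30, 20, 10]],    # correlated (one negatively), close in level
        [[5, 5, 5]],                     # constant row: no correlation
    ]
    print(select_cluster(clusters, [10, 20, 30]))
```

Ranking by distance first reproduces the way Table 2 is read in Section 3, where each hour's cluster is the one with the smallest distance.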
3. Case studies
This section evaluates the accuracy of the hybrid forecasting with the proposed clustering method (GTSOM) and compares it with that obtained with K-means, the original SOM and NG. The hourly solar radiation data of the Ames station between 2011 and 2013 are used for NN training and forecasting (http://mesonet.agron.iastate.edu). 80% of the data are used to train the NNs, and the remaining 20% are used for testing. The iterative test determines the length of the sliding window used for the NN structure, i.e., the number of lagged solar radiation values used as NN inputs. In addition, temperature, wind speed and wind direction are included in the inputs to better train the NN. The forecast produces hourly irradiance values for different forecast horizons.

Table 2 Selection of the clusters for solar irradiance forecasting of individual hours (10/05/2013). Entries are the distances between the average of the correlated training inputs and the average of the testing inputs for each cluster.
Hour   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6   Cluster 7
8      6.59        83.82       172.05      57.12       73.89       116.17      72.08
9      129.12      51.89       52.23       167.17      61.82       108.12      63.63
10     60.06       137.29      118.58      3.64        127.36      62.7        125.55
11     131.25      54.02       50.1        65.76       63.96       105.98      65.76
12     104.63      178.14      74.01       40.93       171.93      18.13       170.12
13     26.63       103.86      152.02      37.08       93.92       96.14       92.12
14     7.62        84.86       171.02      56.08       74.92       115.14      73.11
15     49.03       126.26      129.62      14.68       116.33      73.73       114.52
16     146.77      135.03      30.91       84.03       144.97      24.97       146.77
17     50.45       26.78       130.91      114.15      16.85       173.21      15.04
18     179.87      102.89      1.23        116.17      112.83      57.11       114.64
19     23.85       53.38       157.5       87.56       43.45       146.61      41.64

Fig. 5. Performance of the proposed solar radiation forecasting method with different clustering techniques for different numbers of clusters. (Plot of RMSE versus the number of clusters, from 2 to 30.)

Table 3 Accuracy results for the proposed hybrid forecasting with the new and existing clustering methods (1 h ahead). RMSE in W/m²; rRMSE is dimensionless.
        Proposed hybrid forecasting with
Month   GTSOM            SOM              K-means          NG
        RMSE     rRMSE   RMSE     rRMSE   RMSE     rRMSE   RMSE     rRMSE
Jan.    53.929   0.845   58.283   0.733   73.027   0.921   80.193   1.003
Feb.    59.735   0.727   62.986   0.609   67.065   0.582   77.684   0.714
Mar.    55.538   0.462   89.621   0.586   111.831  0.637   83.347   0.512
Apr.    71.499   0.541   85.143   0.453   114.216  0.662   96.602   0.538
May     78.977   0.487   93.362   0.462   127.033  0.611   145.553  0.694
Jun.    81.358   0.488   109.460  0.459   141.153  0.662   146.888  0.639
Jul.    83.516   0.405   78.751   0.303   117.250  0.429   120.439  0.431
Aug.    77.313   0.392   83.350   0.393   131.059  0.578   103.506  0.418
Sep.    73.038   0.458   126.009  0.706   108.683  0.615   95.042   0.498
Oct.    58.796   0.657   56.993   0.504   80.407   0.687   86.238   0.701
Nov.    57.363   0.988   63.127   0.686   59.101   0.757   78.796   0.861
Dec.    56.476   1.026   71.633   1.022   68.146   1.085   69.630   1.063

Table 4 Accuracy results for the proposed hybrid forecasting with different clustering methods and forecast horizons. RMSE in W/m²; rRMSE is dimensionless.
Model     1 h ahead          2 h ahead          3 h ahead
          RMSE     rRMSE     RMSE     rRMSE     RMSE     rRMSE
NG        63.681   0.408     86.059   0.552     106.903  0.686
K-means   61.029   0.384     85.723   0.533     98.264   0.611
SOM       57.855   0.374     90.580   0.586     114.786  0.742
GTSOM     52.623   0.327     82.765   0.521     96.076   0.604

The forecast accuracy is evaluated by the mean absolute error (MAE), relative MAE (rMAE), root mean square error (RMSE), relative RMSE (rRMSE) and normalized RMSE (nRMSE) as follows:
$$\mathrm{MAE} = \frac{1}{K'}\sum_{n=1}^{K'} \left|\hat{S}(n) - S_{\text{Actual}}(n)\right| \tag{10}$$

$$\mathrm{rMAE} = \mathrm{MAE}/\bar{S}_{\text{Actual}} \tag{11}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{K'}\sum_{n=1}^{K'} \left(\hat{S}(n) - S_{\text{Actual}}(n)\right)^2} \tag{12}$$

$$\mathrm{rRMSE} = \mathrm{RMSE}/\bar{S}_{\text{Actual}} \tag{13}$$

$$\mathrm{nRMSE}(\%) = 100 \cdot \mathrm{RMSE}/\left(\max(S_{\text{Actual}}) - \min(S_{\text{Actual}})\right) \tag{14}$$
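Eqs. (10)–(14) translate directly into code. The sketch assumes that $\bar{S}_{\text{Actual}}$, the denominator of the relative measures, is the mean of the actual values over the K' test outputs; `forecast_errors` is a hypothetical helper name.

```python
import math


def forecast_errors(forecast, actual):
    """Eqs. (10)-(14) over K' = len(actual) test outputs.

    Assumes S_bar_Actual (used by rMAE and rRMSE) is the mean of the actual values.
    """
    k = len(actual)
    mean_actual = sum(actual) / k
    mae = sum(abs(f - a) for f, a in zip(forecast, actual)) / k                 # Eq. (10)
    rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / k)   # Eq. (12)
    return {
        "MAE": mae,
        "rMAE": mae / mean_actual,                                   # Eq. (11)
        "RMSE": rmse,
        "rRMSE": rmse / mean_actual,                                 # Eq. (13)
        "nRMSE_pct": 100.0 * rmse / (max(actual) - min(actual)),     # Eq. (14)
    }


if __name__ == "__main__":
    print(forecast_errors([110.0, 190.0, 310.0], [100.0, 200.0, 300.0]))
```

For this toy input every error is 10 W/m², so MAE and RMSE are both 10, the relative measures are 0.05 (mean actual value 200), and nRMSE is 5%.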
In Eqs. (10)–(14), $K'$ is the total number of testing outputs, $\hat{S}(n)$ and $S_{\text{Actual}}(n)$ are the forecast and actual solar radiation for hour n, and $\bar{S}_{\text{Actual}}$ is the mean of the actual solar radiation.

Table 5 Accuracy results of the proposed hybrid forecasting with GTSOM clustering for different solar datasets.
               Proposed hybrid forecasting with GTSOM
Dataset        RMSE (W/m²)   rRMSE
Ames           76.603        0.4818
Calmar         88.2438       0.5659
Castana        61.3799       0.3921
Cedar Rapids   71.174        0.455
Chariton       70.996        0.441
Gilbert        88.948        0.576
Kanawha        69.975        0.604
Lewis          86.552        0.562
Muscatine      84.984        0.593
Nashua         73.889        0.472
Sutherland     92.147        0.580

Table 1 provides the results of the iterative test used to determine the length of the sliding window (N). The forecast accuracy improves significantly as the window length increases to 10; beyond this, further increasing N does not significantly decrease the forecast errors. Therefore, 10 is selected as the number of lagged solar radiation inputs for the NN.

The proposed GTSOM groups the solar irradiance data into 7 clusters, as determined by the elbow method. Table 2 provides the distances between the average of the correlated training inputs and the average of the testing inputs for each cluster on October 5, 2013. The distances are given for the daylight hours 8–19. The cluster with the minimum distance and maximum number of correlated training inputs is used to forecast the solar irradiance for each hour. For hour 8, the distance between the testing input average and the correlated training input average of cluster 1 is the lowest (6.59); therefore, this cluster is used for the solar irradiance forecasting of hour 8. Similarly, 51.89 is the minimum distance in the row for hour 9, which selects cluster 2 as the appropriate cluster to forecast the solar irradiance for hour 9. The clusters selected for the solar irradiance forecasting of hours 10–19 are 4, 3, 6, 1, 1, 4, 6, 7, 3 and 1, respectively.

Fig. 6. Solar radiation forecasts for 02/05/2013. (Solar radiation (W/m²) versus time (hour, 1–24): measurement and forecasts with GTSOM, SOM, NG and K-means.)

Fig. 7. Solar radiation forecasts for 05/10/2013. (Solar radiation (W/m²) versus time (hour, 1–24): measurement and forecasts with GTSOM, SOM, NG and K-means.)

Fig. 8. Solar radiation forecasts for 08/25/2013. (Solar radiation (W/m²) versus time (hour, 1–24): measurement and forecasts with GTSOM, SOM, NG and K-means.)

Fig. 9. Solar radiation forecasts for 11/17/2013. (Solar radiation (W/m²) versus time (hour, 1–24): measurement and forecasts with GTSOM, SOM, NG and K-means.)

Fig. 10. nRMSE (%) comparison of forecasting methods for solar radiation data in Colorado. (nRMSE (%) by month, January to October, for the proposed method, Hybrid, ARIMA, LES, SES and RW.)

Fig. 5 shows the RMSE values of the forecast with different clustering methods for the Ames solar radiation dataset. Forecast results of the proposed hybrid algorithm are calculated individually for each clustering method as the number of clusters increases to 30. The figure shows that the forecasts with the proposed GTSOM have the lowest RMSE values for every number of clusters. The figure also shows the minimum number of clusters that each clustering method needs to provide its best forecasting results. Table 3 shows the accuracy results of the hybrid forecasting with the proposed GTSOM method and compares them with those of the K-means, original SOM and NG clustering algorithms. The forecast accuracy is calculated for each hour and then averaged to provide the performance measures for each month of 2013. The calculated performance indices demonstrate that the
Table 6 A summary of the forecast results for several forecasting methodologies. Author(s)
Mathiesen and Kleissl (2011)
Gala et al. (2013)
Models
Forecast horizon
Performance measures RMSE (W/m2)
rMAE (%)
MAE (W/m2)
NK: interpolated NAM (North American Model) forecast G: the 3-h constant kt* (the clear sky index) GFS (global forecast system) forecast ECMWF: the European center for medium–range weather forecast
1 day 1 day
117.0 82.8
– –
– –
1 day
100.9
–
–
ECMWF
1h 3h 1 day 1h 3h 1 day 1h 3h 1 day
– – – – – – – – –
– – – – – – – – –
30.22 65.92 243.09 28.00 59.21 203.89 28.00 58.76 218.05
Galicia on Spain’s north western coast
D-SVR: post-processing ECMWF with SVR at daily resolution added to 1-h resolution using CSM (climate system model)
Bondville, Illinois
Perez et al. (2010)
An empirical correlation between the sky cover data and measured GHI (global horizontal irradiance)
1h 2h 3h 4h 5h 6h 1 day 2 day 3 day 4 day 5 day 6 day 7 day
80 88 96 104 116 142 125 139 142 147 147 147 169
– – – – – – – – – – – – –
– – – – – – – – – – – – –
Desert Rock SURFRAD station
Lorenz et al. (2009)
ECMWF–OL: statistical post-processing in combination with a clear sky model
1 day 2 day 3 day 1 day 2 day 3 day 1 day 2 day 3 day 1 day 2 day 3 day 1 day 2 day 3 day
92 95 102 94 100 104 99 103 107 118 125 138 144 161 167
– – – – – – – – – – – – – – –
59 63 67 62 66 69 67 70 73 74 79 86 92 104 109
University of Oldenburg, Germany
BLUE: the statistical model of Blue Sky
MM–MOS: MOS (model output statistics) by the weather company Meteomedia GmbH. GmbH (Gesellschaft mit beschra¨nkter Haftung) is a type of business or company in Germany.) WRF–MT
Persistence: the strong correlation between past and future
M. Ghayekhloo et al. / Solar Energy 122 (2015) 1371–1383
3H-SVR: post-processing ECMWF with SVR at 3-h resolution added to 1-h resolution using CSM
Location of data used
(continued on next page) 1381
49.792 57.411 72.096 82.734 15.071 17.377 21.822 25.042 67.921 82.506 113.449 119.752 1h 2h 3h 1 day A hybrid solar radiation forecasting method based on a novel game theoretic self-organizing map (GTSOM) Proposed method
123 129 142 1 day 1 day 1 day
34.7 39 41.2
77 87 91
Golden CO, USA dataset Austrian dataset 98.09 61 65 70 27.39 27.6 29.02 31.5 149.29 99 101 112
NREL (National Renewable Energy Laboratory) golden CO BLUE ECMWF–OL (Oldenburg) The traditional synoptic model of the meteorologists of Blue Sky (SYNOP) WRF–MT (Weather Research and Forecasting–Meteotest) CENER: post processing based on machine learning methods Austrian persistence Manjili and Niknamfar (2015)
1 day 1 day 1 day 1 day
– – – 103 93 57 1 day (clear sky)
– – –
– – – – – – 127 233 138
ECMWF–NWP: a global forecast model used for NWPs STNN: developed Statistical model based on ANN ECMWF–MOSNN: developed model based on ANN and ECMWF NWP data ECMWF–NWP STNN ECMWF–MOSNN Cornaro et al. (2015)
1 day (overcast)
RMSE (W/m2)
rMAE (%)
MAE (W/m2)
Location of data used Performance measures
Forecast horizon Models Author(s)
Table 6 (continued)
Ames, Iowa, United States
M. Ghayekhloo et al. / Solar Energy 122 (2015) 1371–1383 The ESTER Laboratory of the University of Rome
1382
proposed forecasting with the new clustering method outperforms the forecasting with other clustering techniques. Table 4 provides the accuracy of the proposed hybrid forecasting method with different clustering algorithms for forecast horizons ranging from 1 to 3 h ahead. The results for rRMSE demonstrate that the proposed GTSOM model outperforms the SOM, NG, and Kmeans by 12.6%, 19.9%, and 14.8% for 1 h ahead prediction; 11.1%, 5.6%, and 2.3% for 2 h ahead prediction; and 18.6%, 12%, and 1.1% for 3 h ahead prediction. The corresponding values of RMSE are 9%, 17.4%, and 13.8% for 1 h ahead prediction; 8.6%, 3.8%, and 3.5% for 2 h ahead prediction; and 16.3%, 10.1%, and 2.2% for 3 h ahead prediction. The proposed forecasting with GTSOM clustering is evaluated using several different solar radiation datasets to provide a comprehensive performance analysis. The datasets are from different weather stations with different solar radiation characteristics. Table 5 provides the accuracy results for an hour-ahead prediction. Figs. 6–9 show the solar irradiance forecasts for four different days in spring, summer, fall and winter 2013 in Ames region. The figures provide the forecast results of the proposed GTSOM and other clustering methods as well as the measurements. The figures demonstrate the better performance of the proposed forecasting with GTSOM clustering for various weather conditions. Fig. 10 compares the hourly forecasting results of the proposed method with five other well-established forecasting models including ARIMA (autoregressive integrated moving average), LES (linear exponential smoothing), SES (simple exponential smoothing), RW (random-walk) and Hybrid method (Dong et al., 2015). Solar radiation data in Colorado USA (http://www.nrel.gov) is used for all these forecasting methods to provide a fair comparison. The results for nRMSE (%) demonstrate that the proposed method provides more accurate forecasting results than other forecasting models. 
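This section quotes accuracy as RMSE and MAE in W/m2, as relative indices (rRMSE, rMAE, nRMSE) in %, and as percentage improvements of GTSOM over SOM, NG and K-means. The Python sketch below shows how such indices can be computed. It is not the authors' code, and two definitions are assumptions:

- The relative indices are normalized by the mean of the measured irradiance.
- The improvement over a baseline clustering method is the relative reduction of the baseline's error.

nRMSE is also commonly normalized by the range of the measurements, so the helper names `relative_percent` and `improvement_percent` are hypothetical and for illustration only.

```python
import math
from typing import Sequence


def _check(measured: Sequence[float], forecast: Sequence[float]) -> None:
    # Both series must cover the same, non-empty set of hours.
    if not measured or len(measured) != len(forecast):
        raise ValueError("measured and forecast must be non-empty and equally long")


def rmse(measured: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g. W/m2)."""
    _check(measured, forecast)
    squared = sum((f - m) ** 2 for m, f in zip(measured, forecast))
    return math.sqrt(squared / len(measured))


def mae(measured: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data (e.g. W/m2)."""
    _check(measured, forecast)
    return sum(abs(f - m) for m, f in zip(measured, forecast)) / len(measured)


def relative_percent(error: float, measured: Sequence[float]) -> float:
    """Express an error as a percentage of the mean measured value.

    Assumed normalisation for rRMSE / rMAE; nRMSE is sometimes
    normalised by the range of the measurements instead.
    """
    mean = sum(measured) / len(measured)
    if mean == 0:
        raise ValueError("mean of measured values is zero")
    return 100.0 * error / mean


def improvement_percent(baseline_error: float, proposed_error: float) -> float:
    """Relative error reduction of a proposed model over a baseline (assumed definition)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical hourly irradiance values (W/m2), not data from the paper.
    measured = [100.0, 200.0, 300.0, 400.0]
    forecast = [110.0, 190.0, 330.0, 380.0]
    e_rmse = rmse(measured, forecast)
    e_mae = mae(measured, forecast)
    print(f"RMSE  = {e_rmse:.2f} W/m2")                              # 19.36
    print(f"MAE   = {e_mae:.2f} W/m2")                               # 17.50
    print(f"rRMSE = {relative_percent(e_rmse, measured):.2f} %")     # 7.75
    print(f"rMAE  = {relative_percent(e_mae, measured):.2f} %")      # 7.00
    # A baseline with RMSE 20 W/m2 against a proposed RMSE of 17.5 W/m2:
    print(f"improvement = {improvement_percent(20.0, 17.5):.1f} %")  # 12.5
```

Under these assumed definitions, the monthly figures described for Table 3 would be plain arithmetic means of the per-hour index values.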
Table 6 compares the proposed forecasting method with other recent forecasting models from the literature using several accuracy indices. The comparison covers a wide range of forecast horizons, from hourly predictions (1 h up to 6 h ahead) to daily predictions (1 day up to 7 days ahead). The table also summarizes the forecast models and the data used for each evaluation. The results show that the proposed hybrid forecasting with GTSOM clustering outperforms the other methods at the same forecast horizons.
4. Conclusions
A four-stage (pre-processing, clustering, training and forecasting) solar radiation forecasting method is proposed in this paper. The pre-processing stage decomposes the solar radiation data to an appropriate resolution level determined by the entropy-based criterion. A time series analysis is then used to build the structure of the input and output series for the NNs. A modified GTSOM with novel strategies is developed for the clustering stage of the forecasting method to provide a more competitive game and better clustering performance. A cluster selection method is also proposed to determine the appropriate cluster for forecasting the solar radiation of each individual hour. The pre-processed and clustered solar radiation data, together with the temperature, wind speed and wind direction data, are used to train the NN with a Bayesian approach. The trained NN then estimates the solar radiation to provide the forecast. Historical hourly solar radiation data are used to evaluate the performance of the developed forecasting method with different clustering algorithms. Comparison of the forecast results demonstrates the improved accuracy of the proposed GTSOM compared with the existing SOM, NG and K-means clustering techniques. The proposed hybrid method is also compared with existing state-of-the-art techniques, and the results show a significant accuracy improvement for the proposed forecasting model.
References
Aguiar, R., Pereira, M.C., 1992. TAG: a time-dependent, autoregressive, Gaussian model for generating synthetic hourly radiation. Sol. Energy 49 (3), 167–174.
Chen, C., Duan, S., Cai, T., Liu, B., 2011. Online 24-h solar power forecasting based on weather type classification using artificial neural network. Sol. Energy 85 (11), 2856–2870.
Coifman, R.R., Wickerhauser, M.V., 1992. Entropy-based algorithms for best basis selection. IEEE Trans. Inf. Theory 38 (2), 713–718.
Cornaro, C., Pierro, M., Bucci, F., 2015. Master optimization process based on neural networks ensemble for 24-h solar irradiance forecast. Sol. Energy 111, 297–312.
Di Piazza, A., Di Piazza, M.C., Ragusa, A., Vitale, G., 2011. Environmental data processing by clustering methods for energy forecast and planning. Renew. Energy 36 (3), 1063–1074.
Dong, Z., Yang, D., Reindl, T., Walsh, W.M., 2015. A novel hybrid approach based on self-organizing maps, support vector regression and particle swarm optimization to forecast solar irradiance. Energy 82, 570–577.
Engelbrecht, A.P., 2007. Computational Intelligence: An Introduction, second ed. John Wiley & Sons.
Gala, Y., Fernández, Á., Díaz, J., Dorronsoro, J., 2013. Support vector forecasting of solar radiation values. In: Pan, J., Polycarpou, M., Woźniak, M., de Carvalho, A., Quintián, H., Corchado, E. (Eds.), Hybrid Artificial Intelligent Systems, vol. 8073. Springer, Berlin Heidelberg, pp. 51–60.
Ghayekhloo, M., Menhaj, M.B., Ghofrani, M., 2015. A hybrid short-term load forecasting with a new data preprocessing framework. Electr. Power Syst. Res. 119, 138–148.
Herbert, J., Yao, J., 2005. A game-theoretic approach to competitive learning in self-organizing maps. In: Wang, L., Chen, K., Ong, Y.S. (Eds.), Advances in Natural Computation. Springer, pp. 129–138.
Inman, R.H., Pedro, H.T.C., Coimbra, C.F.M., 2013. Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci. 39, 535–576.
Jafarzadeh, S., Fadali, M.S., Evrenosoğlu, C.Y., 2013. Solar power prediction using interval type-2 TSK modeling. IEEE Trans. Sustain. Energy 4 (2), 333–339.
Ketchen, D.J., Shook, C.L., 1996. The application of cluster analysis in strategic management research: an analysis and critique. Strateg. Manag. J. 17, 441–458.
Kohonen, T., 1990. The self-organizing map. Proc. IEEE 78 (9), 1464–1480.
Li, R., Li, G.M., 2008. Photovoltaic power generation output forecasting based on support vector machine regression technique. Elect. Power 41 (2), 74–78.
Lorenz, E., Remund, J., Müller, S.C., Traunmüller, W., Steinmaurer, D.G., Ruiz-Arias, J.A., Fanego, V.L., Ramirez, L., Romeo, M.G., Kurz, C., Pomares, L.M., Guerrero, C.G., 2009. Benchmarking of different approaches to forecast solar irradiance. In: 24th European Photovoltaic Solar Energy Conference.
Manjili, Y., Niknamfar, M., 2015. Big data analytic: cases for communications systems modeling and renewable energy forecast. In: El-Osery, A., Prevost, J. (Eds.), Control and Systems Engineering, vol. 27. Springer International Publishing, pp. 109–134.
Martinetz, T., Schulten, K., 1991. A "neural-gas" network learns topologies. In: Kohonen, T., Makisara, K., Simula, O., Kangas, J. (Eds.), Proceedings of International Conference on Artificial Neural Networks (ICANN-91), vol. 1, pp. 397–402.
Mathiesen, P., Kleissl, J., 2011. Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States. Sol. Energy 85 (5), 967–977.
Mora-López, L.L., Sidrach-de-Cardona, M., 1998. Multiplicative ARMA models to generate hourly series of global irradiation. Sol. Energy 63 (5), 283–291.
Neal, R.M., 1996. Bayesian Learning for Neural Networks. Springer-Verlag, New York.
Neme, A., Hernandez, S., Neme, O., Hernandez, L., 2005. Self-organizing maps with non-cooperative strategies (SOM-NC). In: Príncipe, J.C., Miikkulainen, R. (Eds.), Advances in Self-Organizing Maps. Springer, pp. 200–208.
Oudjana, S.H., Hellal, A., Mahamed, I.H., 2011. Short term photovoltaic power generation forecasting using neural network. In: Proc. Int. Conf. Environ. Electr. Eng., pp. 706–711.
Pavel, L., 2012. Game Theory for Control of Optical Networks. Springer.
Perez, R., Kivalov, S., Schlemmer, J., Hemker Jr., K., Renné, D., Hoff, T.E., 2010. Validation of short and medium term operational solar radiation forecasts in the US. Sol. Energy 84 (12), 2161–2172.
Shi, J., Lee, W.J., Liu, Y., Yang, Y., Wang, P., 2012. Forecasting power output of photovoltaic systems based on weather classification and support vector machines. IEEE Trans. Ind. Appl. 48 (3), 1064–1069.
Tanaka, K., Uchida, K., Ogimi, K., Goya, T., Yona, A., Senjyu, T., et al., 2011. Optimal operation by controllable loads based on smart grid topology considering insolation forecasted error. IEEE Trans. Smart Grid 2 (3), 438–444.
Voyant, C., Muselli, M., Paoli, C., Nivet, M., 2012. Numerical weather prediction (NWP) and hybrid ARMA/ANN model to predict global radiation. Energy 39 (1), 341–355.
Wu, J., Chan, C.K., 2011. Prediction of hourly solar radiation using a novel hybrid model of ARMA and TDNN. Sol. Energy 85 (5), 808–817.
Yang, H., Huang, C., Huang, Y., Pai, Y., 2014. A weather-based hybrid method for 1-day ahead hourly forecasting of PV power output. IEEE Trans. Sustain. Energy 5 (3), 917–926.