A novel clustering approach for short-term solar radiation forecasting




Available online at www.sciencedirect.com

ScienceDirect Solar Energy 122 (2015) 1371–1383 www.elsevier.com/locate/solener

M. Ghayekhloo a, M. Ghofrani b,⇑, M.B. Menhaj c, R. Azimi d

a Young Researchers and Elite Club, Qazvin Branch, Islamic Azad University, Qazvin, Iran
b School of Science, Technology, Engineering and Mathematics (STEM), University of Washington, Bothell, USA
c Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
d Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

Received 19 March 2015; received in revised form 6 October 2015; accepted 30 October 2015

Communicated by: Associate Editor David Renne

Abstract

This paper proposes a hybrid solar radiation forecasting method based on a novel game theoretic self-organizing map (GTSOM). New strategies are proposed to overcome the limitations of the original SOM for non-winning neurons and to increase their competition with the winning neurons for input patterns. Neural gas (NG) and competitive Hebbian learning (CHL) are used to enhance the learning and the quality of the map. Solar radiation data are decomposed by the discrete wavelet transform (DWT), and a time-series analysis is then used to structure the training and testing data for Bayesian neural networks (BNNs). The proposed GTSOM groups the time-series-analyzed datasets into clusters of similar data, and the elbow method is used to determine the number of clusters. A cluster selection method is developed to determine the appropriate cluster whose solar radiation data provide the input to the BNN. Temperature, wind speed and wind direction data are also included in the inputs to the BNN, whose outputs provide the solar radiation forecasts. Historical solar radiation data are used to evaluate the accuracy of the hybrid forecasting with the proposed clustering and to compare it with that obtained with the K-means, original SOM and NG clustering algorithms. The comparison demonstrates the superior performance of the proposed clustering method. Published by Elsevier Ltd.

Keywords: Clustering; Data preprocessing; Forecasting; Solar radiation

⇑ Corresponding author at: UWBB Room 227, 18807 Beardslee Blvd, Bothell, WA 98011, USA. Tel.: +1 425 352 3224; fax: +1 425 352 3775. E-mail address: [email protected] (M. Ghofrani).

http://dx.doi.org/10.1016/j.solener.2015.10.053 0038-092X/Published by Elsevier Ltd.

1. Introduction

Accurate solar forecasting facilitates the integration of solar generation into the grid by reducing the integration and operational costs associated with solar intermittency. The need for more accurate forecasting methods has been highlighted in recent years by the growing penetration and market share of solar generation (Inman et al., 2013). Conventional statistical methods such as auto-regressive (AR) and autoregressive moving average (ARMA) models are based on time series and have traditionally been used for solar prediction (Aguiar and Pereira, 1992; Mora-Lopeza and Sidrach-de-Cardona, 1998). These statistical methods provide linear stationary models, which are not appropriate for non-linear, non-stationary solar radiation. Methods such as fuzzy logic (Tanaka et al., 2011; Jafarzadeh et al., 2013), artificial neural networks (ANNs) (Oudjana et al., 2011) and support vector regression (SVR) (Li and Li, 2008; Shi et al., 2012) have been used to provide non-linear models and better accuracy for solar forecasting. However, the limitations of the fuzzy logic-based method to directly


M. Ghayekhloo et al. / Solar Energy 122 (2015) 1371–1383

learn from historical data and the complexity of specifying the numerous model parameters of ANN-based methods call for hybrids of different methods that resolve the shortcomings of the individual forecasting techniques. A hybrid of ARMA and a time delay neural network (TDNN) was proposed in Wu and Chan (2011), which gave more accurate results than either method alone. However, the feasibility of that method for short-term solar forecasting is not justified because of its high computational complexity. Voyant et al. (2012) used a combination of ARMA and ANN with numerical weather prediction (NWP) to forecast the hourly mean global horizontal irradiance. Self-organizing maps (SOMs) have also been used in hybrid methods for pattern recognition and classification of the input data (Chen et al., 2011; Yang et al., 2014). Chen et al. (2011) used meteorological predictions of relative humidity, solar irradiance and temperature to forecast solar power generation; however, inaccuracies in the meteorological forecasts degrade the quality of the solar power prediction. Yang et al. (2014) developed a hybrid approach combining a SOM, a learning vector quantization (LVQ) network, SVR and fuzzy inference for day-ahead PV power prediction. Their numerical results showed that the hybrid approach performs better than the traditional ANN and the simple SVR methods individually. However, SOM has some limitations for non-convex or discontinuous input distributions. In addition, non-winning neurons do not participate in the learning phase, which reduces the quality of the map and the accuracy of the forecast. Game theory has been used in SOM training to resolve these problems and globally optimize the map (Herbert and Yao, 2005; Neme et al., 2005). Herbert and Yao (2005) proposed a training algorithm that treats the neurons as players with strategy sets and utility functions.
The algorithm was proposed to improve the quality of the map, but no numerical results or experiments were provided to justify it. In addition, the algorithm defines each player's utility function based on all players' actions and requires solving the game at every iteration, which increases its computational complexity. Neme et al. (2005) proposed different strategies that let the winning neuron select among its neighbors for weight adaptation. This reduces the complexity of the original SOM algorithm, in which the weight vectors of all neighboring neurons are updated. However, the proposed strategies were non-cooperative in topographic map formation and did not minimize an error measure. Engelbrecht (2007) proposed a hybrid model of self-organizing maps (SOM), support vector regression (SVR) and particle swarm optimization (PSO) to forecast hourly global solar radiation. The SOM algorithm was first applied to divide the entire input space into several disjoint regions or clusters. SVR models were then used to model each cluster and detect the characteristic correlation between the predicted and actual values. To deal with the volatility of SVR with different parameters, the PSO algorithm was used to improve the forecasting performance of the SVR models. However, SOM algorithms may converge to non-optimal clustering results depending on their initialization and learning rate. In addition, neighborhood violations occur if the output space topology does not match the shape of the data.

The amount of solar radiation that reaches the ground is extremely variable because of the apparent motion of the sun and changing atmospheric conditions, which makes it difficult to forecast. The chaotic nature of solar data disrupts the neural network learning process and leads to forecasts with high errors (Di Piazza et al., 2011). This paper proposes an improved version of the SOM algorithm that groups the solar time series data into clusters to better characterize their irregular features. Handling groups of data separately, identifying anomalies and irregular patterns, and neglecting outliers provide a better understanding of the collected information. This leads to more appropriate learning for the neural networks and improves the accuracy of the forecasts. This paper develops a hybrid solar radiation forecasting method based on a novel clustering algorithm. Game-theoretic concepts with new strategies are used in conjunction with SOM to provide a modified GTSOM clustering. The proposed clustering method increases the non-winning neurons' participation in learning and enables them to compete with the winning neurons. This provides better clustering performance than the original SOM, in which dead neurons have negligible chances to obtain input patterns. Unlike the original SOM, which defines the neighborhood based on the neurons' distances in the two-dimensional lattice, NG is combined with CHL to define the neighborhood based on the neurons' distances in the input space.
This speeds up learning and enhances the mapping quality. Wavelet decomposition and time-series analysis are used to pre-process the solar radiation data; the time-series analysis structures the training and forecasting data for the BNNs. The proposed GTSOM groups the pre-processed solar radiation datasets into a number of clusters determined by the elbow method. A cluster selection technique is proposed to select the cluster used for the solar forecasting of each individual hour. Temperature, wind speed and wind direction data are also included in the inputs to the BNN, whose outputs provide the solar radiation forecasts. The rest of the paper is organized as follows. Section 2 explains the original SOM algorithm and game-theoretic concepts, and describes the proposed GTSOM clustering and the hybrid solar radiation forecasting method. Section 3 presents a case study in which accuracy results are calculated for the hybrid forecasting with different clustering algorithms, including the proposed GTSOM method and the existing K-means, original SOM and NG clustering. Conclusions are given in Section 4.


2. Methodology

2.1. SOM algorithm

SOM is an unsupervised, competitive-learning clustering network that classifies input patterns by mapping them onto a two-dimensional map of features (Engelbrecht, 2007). Let $X = [x_1, x_2, \ldots, x_R]^T$ represent an arbitrary input vector and $W_i = [w_{i1}, w_{i2}, \ldots, w_{iR}]^T$ denote the weight vector of the $i$th neuron. SOM training is an iterative process: the Euclidean distance between each neuron and a pattern randomly selected from the set of input patterns is calculated, and the neuron with the minimum distance is selected as the winning neuron. The winning neuron is the best matching unit (BMU), whose index is given by (Kohonen, 1990):

$$q(X) = \|X - W_c\|^2 = \min_{\forall i} \|X - W_i\|^2 \qquad (1)$$

where $W_c$ is the weight vector of the winning neuron. The weight adaptation rule of SOM is (Kohonen, 1990):

$$W_i(t+1) = \begin{cases} W_i(t) + \lambda_i(t)\,[X(t) - W_i(t)], & \text{if } i \in N_q(j) \\ W_i(t), & \text{if } i \notin N_q(j) \end{cases} \qquad (2)$$

where $N_q$ is the neighborhood set of the BMU and $\lambda_i(t)$ is the neighborhood function given by:

$$\lambda_i(t) = \alpha(t)\,\exp\!\left(-\frac{\|r_c - r_i\|^2}{2\sigma^2(t)}\right) \qquad (3)$$

where $\alpha(t) \in [0, 1]$ is the learning rate parameter, and $r_c, r_i \in \mathbb{R}^2$ are the positions of the winning and $i$th neurons in the two-dimensional lattice. The neighborhood size is controlled by the parameter $\sigma$.

2.2. Game theory

Game theory studies situations involving players with conflicting interests (Pavel, 2012). Each player has a payoff function and a strategy set; the strategy set defines the player's actions at each stage of the game. Each player's payoff is determined by both its own actions and those of the other players, and the objective is to maximize the players' payoffs. Games are classified as non-cooperative or cooperative: in a non-cooperative game each player chooses its action independently, whereas in a cooperative game the players can form coalitions and collaborate.

2.3. The proposed clustering method

Game theoretic SOM (GTSOM) is used in this paper to cluster the two-dimensional solar irradiance–temperature dataset. Five strategies are proposed to provide a more competitive game and enable the players to obtain more input patterns. This resolves a shortcoming of the original SOM, in which the positions of dead neurons relative to the input patterns reduce their contribution to learning and their chance of competing with the winning neurons. The proposed strategies are as follows:

Strategy A (Approach): As in the original SOM algorithm, the winning neuron and its neighbors move toward the input pattern to minimize the Euclidean distance. The non-winning neurons' strategies are determined by the situations of the neurons and the stage of the iteration.

Strategy O (Opposite): At early stages of the iteration, the non-winning neurons move in the direction opposite to the winning neuron's direction. Because the patterns are distributed within the input space with equal probability, this increases their chance of reaching a pattern.

Strategy S (Stay): The non-winning neurons stay in their current positions. This is most appropriate for recently winning neurons or neurons that have won many times, as they have most likely approached regions with sufficient input patterns.

Strategy R (Random): The non-winning neurons move to random positions in the input space. This strategy is most applicable to neurons wandering in regions with insufficient or no input patterns; the random moves increase their chance of approaching regions with sufficient input patterns and are appropriate at early stages of the iteration.

Strategy B (Best player to approach): The non-winning neurons approach the neuron defined as the best player. An error variable ($E_c$) is used to identify the best player; it is calculated by adding the Euclidean distance between the input pattern and the BMU to the neuron's cumulative error:

$$E_c(t) = E_c(t-1) + \|X - W_c\|^2 \qquad (4)$$

At the end of the iterations, the number of wins of each neuron is retrieved as a counter variable. The error variable of Eq. (4) is divided by this counter to obtain the average cumulative error, and the neuron with the minimum average cumulative error is selected as the best player.

The topographic mapping of SOM preserves the topology of the input data and requires the neighborhood function to be defined based on the neurons' distances in the 2D lattice (Engelbrecht, 2007). This is not appropriate for the proposed strategies, where the neighborhood is defined based on the neurons' distances in the input space. A SOM variant, NG, is used in this paper to address this issue and provide a proper neighborhood function for the proposed strategies (Martinetz and Schulten, 1991). NG ranks the neurons by their proximity to the input pattern, assigning an integer $k_i$ to each neuron.
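To make Eqs. (1) to (4) concrete, the following pure-Python sketch implements the BMU search, the lattice-based SOM update, a rank-based neural-gas update in the spirit of Eq. (5), and the Strategy B bookkeeping. It is a minimal illustration under stated assumptions, not the authors' GTSOM: the helper names (`find_bmu`, `som_step`, `ng_step`, `record_win`, `best_player`) are hypothetical, every neuron is treated as a member of the neighborhood set N_q, and strategies O, S and R are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_bmu(x: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Eq. (1): index c of the neuron whose weight vector is closest to X."""
    return min(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))


def som_step(x, weights, lattice, alpha, sigma) -> Tuple[List[Vector], int]:
    """Eqs. (2)-(3): Gaussian neighborhood over 2-D lattice positions r_i.

    Assumption: every neuron is treated as a member of N_q, so only the
    Gaussian factor lambda_i(t) limits how far each neuron moves.
    """
    c = find_bmu(x, weights)
    updated = []
    for w, r in zip(weights, lattice):
        lam = alpha * math.exp(-sq_dist(lattice[c], r) / (2.0 * sigma ** 2))
        updated.append([wk + lam * (xk - wk) for wk, xk in zip(w, x)])
    return updated, c


def ng_step(x, weights, alpha, sigma) -> List[Vector]:
    """Rank-based NG step: k_i = 0 for the closest neuron, 1 for the next, ..."""
    order = sorted(range(len(weights)), key=lambda i: sq_dist(x, weights[i]))
    rank = {i: k for k, i in enumerate(order)}
    return [
        [wk + alpha * math.exp(-rank[i] / sigma) * (xk - wk) for wk, xk in zip(w, x)]
        for i, w in enumerate(weights)
    ]


def record_win(x, weights, cum_error, wins) -> int:
    """Eq. (4) bookkeeping: E_c(t) = E_c(t-1) + ||X - W_c||^2, plus a win counter."""
    c = find_bmu(x, weights)
    cum_error[c] += sq_dist(x, weights[c])
    wins[c] += 1
    return c


def best_player(cum_error, wins) -> int:
    """Strategy B target: the neuron with minimum average cumulative error E_c / wins."""
    winners = [i for i, n in enumerate(wins) if n > 0]
    return min(winners, key=lambda i: cum_error[i] / wins[i])


if __name__ == "__main__":
    # Toy example: three neurons in a 2-D input space (hypothetical data).
    weights = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]]
    lattice = [(0, 0), (0, 1), (1, 0)]
    new_weights, winner = som_step([0.9, 0.8], weights, lattice, alpha=0.5, sigma=1.0)
    print(winner, new_weights)
    print(ng_step([0.9, 0.8], weights, alpha=0.5, sigma=1.0))
```

The design difference lies in the two update functions: `som_step` measures neighborhood distance between the lattice positions r_c and r_i, while `ng_step` uses only the rank of each neuron's distance to X in the input space, which is why NG suits strategies that are defined in input space.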

Under this ranking, neuron $i_0$ is the closest neuron, with the integer 0; neuron $i_1$ is the second closest, with the integer 1; and so on. The weight vectors of the neurons are updated by NG as:

$$W_i(t+1) = \begin{cases} W_i(t) + \alpha(t)\,\exp\!\left(-\dfrac{k_i}{\sigma(t)}\right)[X(t) - W_i(t)], & \text{if } i \in N_q(j) \\ W_i(t), & \text{if } i \notin N_q(j) \end{cases} \qquad (5)$$

CHL, in combination with NG, determines the neighboring relationships between the neurons (Martinetz and Schulten, 1991). Fig. 1 shows the flowchart of the developed GTSOM.

Fig. 1. Flowchart for the proposed clustering method (GTSOM): initialize the weight vectors of the neurons, placing close values in the center of the input space; randomly select and apply an input pattern; use Eq. (1) to determine the winning and non-winning neurons; apply strategy A to the BMU and strategies O/S/R/B to the other neurons based on their situations and the stage of the iteration; use Eq. (5) to update the neurons' weight vectors; repeat until the termination criterion is satisfied.

2.4. The proposed hybrid forecasting method

2.4.1. Overview

A hybrid forecasting method is proposed that combines a time-series analysis, the GTSOM clustering algorithm, a cluster selection algorithm and a BNN. This section discusses the details of the proposed hybrid method. Fig. 2 shows the schematic diagram of the proposed forecasting method.

Fig. 2. Schematic diagram of the proposed forecasting method. Pre-processing stage: DWT of the solar radiation data, then structuring of the training and testing data using time-series analysis. Clustering stage: GTSOM clustering, then selection of the cluster with the minimum distance and maximum number of correlated training inputs to the testing input data. Training stage: Bayesian learning on the selected cluster, with temperature, wind speed and wind direction as additional inputs. Forecasting stage: NN forecast output and error calculation.

2.4.2. Pre-processing stage

The pre-processing stage includes wavelet decomposition and a time-series analysis to better characterize the solar irradiance behavior and provide the most appropriate inputs for the BNNs. The discrete wavelet transform (DWT) decomposes the data into a proper number of resolution levels determined by an entropy-based criterion (Coifman and Wickerhauser, 1992). Fig. 3 shows the multi-resolution decomposition for an M-level DWT. For an M-level decomposition, the sum of the approximation (A) and the details (D) of the signal up to scale M yields the original signal (f) (Ghayekhloo et al., 2015):

$$f = A_M + \sum_{j=1}^{M} D_j \qquad (6)$$

where $A_m$ and $D_j$ are the approximation and detail of the signal at scales $m$ and $j$, calculated by:

$$A_m = \sum_{n=-\infty}^{+\infty} A^c_{m,n}\,\phi_{m,n} \qquad (7.a)$$

$$D_j = \sum_{n=-\infty}^{+\infty} D^c_{j,n}\,\psi_{j,n} \qquad (7.b)$$

where $\phi_{m,n}$ and $\psi_{j,n}$ are the scaling and detail functions at location $n$ and scales $m$ and $j$, respectively, and $A^c_{m,n}$ and $D^c_{j,n}$ are the approximation and detail coefficients, calculated by the following recursive equations:

$$A^c_{m+1,n} = \sum_i l_i\,A^c_{m,2n+i} = \sum_i l_{i-2n}\,A^c_{m,i} \qquad (8.a)$$

$$D^c_{j+1,n} = \sum_i h_i\,A^c_{j,2n+i} = \sum_i h_{i-2n}\,A^c_{j,i} \qquad (8.b)$$

where $l_i$ and $h_i$ are the coefficients of the low- and high-pass filters. The low-frequency component approximates the signal's general trend, while the high-frequency components represent the details. The different frequency components of the data are then time-series analyzed to provide the inputs for the NNs. The structures of the time series for training and testing are given by (9.a) and (9.b), respectively.

(9.a)

(9.b)

The data to the left of the line within the matrices of (9.a) and (9.b) represent the training and testing input series, and the last column on the right within each matrix provides the outputs of the training and testing data, respectively. For each set of data, $S(t)$ is the most current data point, and N is the length of the sliding window used for the lagged data. The data include K rows of training and K′ rows of testing. An iterative test is run to determine a value of N that provides a reasonable compromise between forecast accuracy and computational burden: the forecast accuracy is calculated for increasing values of N, and the knee point is selected as the length of the sliding window, since increasing N beyond the knee point does not significantly improve the forecast accuracy.

Fig. 3. M-level wavelet decomposition. Starting from $f = A_0$, each level applies a low-pass filter (Eq. (8.a)) and a high-pass filter (Eq. (8.b)) to the approximation coefficients, giving $A^c_1, A^c_2, \ldots$ and $D^c_1, D^c_2, \ldots$; Eqs. (7.a) and (7.b) convert the coefficients into the approximation $A_M$ and the details $D_M, \ldots, D_1$.
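The decomposition of Eqs. (6) to (8) and the lagged structure of (9.a)/(9.b) can be illustrated with a short pure-Python sketch. It assumes a Haar filter pair (l = [1, 1]/√2, h = [1, −1]/√2) and a signal length divisible by 2^M; the paper selects the decomposition depth with an entropy criterion and does not fix the wavelet in this section. The helpers `dwt_components` and `lagged_rows` are hypothetical, and because the (9.a)/(9.b) matrices did not survive extraction, the one-step-ahead row layout in `lagged_rows` is also an assumption.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_analysis(a: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of Eqs. (8.a)/(8.b) with assumed Haar filters
    l = [1, 1]/sqrt(2) (low-pass) and h = [1, -1]/sqrt(2) (high-pass)."""
    half = len(a) // 2
    approx = [(a[2 * n] + a[2 * n + 1]) * INV_SQRT2 for n in range(half)]
    detail = [(a[2 * n] - a[2 * n + 1]) * INV_SQRT2 for n in range(half)]
    return approx, detail


def haar_synthesis(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_analysis (perfect reconstruction)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * INV_SQRT2, (a - d) * INV_SQRT2])
    return out


def _rebuild(approx: Sequence[float], details: Sequence[Sequence[float]]) -> List[float]:
    """Run the synthesis from the coarsest level (details[-1]) down to the signal."""
    out = list(approx)
    for d in reversed(details):
        out = haar_synthesis(out, d)
    return out


def dwt_components(f: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Signal-domain A_M and [D_1, ..., D_M] such that f = A_M + sum_j D_j (Eq. 6)."""
    if levels < 1 or not f or len(f) % (2 ** levels):
        raise ValueError("len(f) must be a positive multiple of 2**levels")
    approx, details = list(f), []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)  # details[j - 1] holds the detail coefficients D^c_j
    zeros = [[0.0] * len(d) for d in details]
    a_m = _rebuild(approx, zeros)  # keep only A^c_M -> A_M
    d_parts = [
        # keep only D^c_j (all other coefficients zeroed) -> D_j
        _rebuild([0.0] * len(approx), [details[k] if k == j else zeros[k] for k in range(levels)])
        for j in range(levels)
    ]
    return a_m, d_parts


def lagged_rows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Assumed one-step-ahead layout: inputs S(t-N+1), ..., S(t) and target S(t+1)."""
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window")
    n_rows = len(series) - window
    inputs = [list(series[i:i + window]) for i in range(n_rows)]
    targets = [series[i + window] for i in range(n_rows)]
    return inputs, targets
```

With a Haar filter and `levels = 2`, `a_m` is the 4-sample block average of f (the general trend), and each `D_j` carries faster fluctuations; because synthesis is linear, the components add back to f exactly as Eq. (6) states. Each component can then be passed through `lagged_rows` to obtain lagged input rows of length N.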

2.4.3. Clustering stage

In the clustering stage, the developed GTSOM is used to group the pre-processed dataset into a number of clusters determined by the elbow method (Ketchen and Shook, 1996). A cluster selection method is proposed to determine the appropriate cluster for forecasting a specific hour. Fig. 4 shows the flowchart of the proposed cluster selection method, where C is the number of clusters and J is the total number of training input rows within each cluster. The counter m records the number of training input rows whose absolute correlation with the testing input is at least a threshold value (0.8). The average of the correlated training input data within cluster c is denoted Avg.-Correlated-Train-Input(c), and X denotes the average of the testing input data. For each cluster, the distance between these two averages is calculated. This process is repeated for all clusters, and the cluster with the minimum distance and maximum number of correlated training inputs is passed to the training stage.

Fig. 4. Flowchart of the cluster selection method: load the data; use GTSOM and the elbow method to group the data into an appropriate number of clusters; compute X = avg.(Test-Input); for each cluster c = 1, …, C, loop over the training rows j = 1, …, J, compute the correlation between Train-Input(j) and Test-Input, and, whenever its absolute value is at least 0.8, store Y(m, c) = avg.(Train-Input(j)) and increment m; then set Count-Correlated-Train-Input(c) = m, Avg.-Correlated-Train-Input(c) = avg.(Y(m, c)) and Distance(c) = |X − Avg.-Correlated-Train-Input(c)|; finally, select the cluster with the minimum distance and maximum Count-Correlated-Train-Input.

2.4.4. Training stage

Once the data are pre-processed and clustered, a Bayesian approach is used to train the NN on the data of the selected cluster. The Bayesian approach is well suited to limited training datasets and is therefore used for NN training (Neal, 1996); this is particularly important for clusters containing few data. The training parameters are as follows:

(1) Number of inputs: 10 (solar radiation) + 1 (temperature) + 1 (wind speed) + 1 (wind direction).
(2) Number of hidden layers and neurons: 1 hidden layer with 5 neurons, and 1 output layer.
(3) Transfer functions: tansig and purelin.
(4) Learning algorithm: Bayesian.
(5) Comparison functions: root mean square error (RMSE) and relative RMSE (rRMSE).
(6) Data distribution (train–test): 80% train, 20% test.

2.4.5. Forecasting stage

This paper uses the BNN for solar irradiance forecasting over short horizons from 1 to 3 h ahead. After receiving the effective inputs and simulating the network, the BNN provides output values as the forecast. As mentioned in Section 2.4.4, the air temperature, wind speed, wind direction and solar irradiance data of the selected cluster are used as the effective inputs, and, based on Eq. (9.b), the network outputs of the general form $\hat{S}(t+1), \hat{S}(t+2), \ldots, \hat{S}(t+K')$ provide the forecasts for the different time horizons (1 h ahead, 2 h ahead, …, K′ h ahead).

Table 1
Forecast accuracy for different lengths of the sliding window (N).

Window    Inputs                        RMSE (W/m²)    rRMSE
N = 3     [S(t−2), S(t−1), S(t)]        77.5059        0.487
N = 5     [S(t−4), …, S(t)]             71.2911        0.448
N = 8     [S(t−7), …, S(t)]             68.2694        0.429
N = 10    [S(t−9), …, S(t)]             65.860         0.414
N = 15    [S(t−14), …, S(t)]            64.798         0.408
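The knee/elbow choices (the number of clusters and the window length N of Table 1) and the cluster selection of Fig. 4 can be sketched as below. The knee heuristic (the point of a normalized curve farthest from the chord joining its first and last points) is an assumption, since the paper does not state how the elbow is located. `knee_point`, `pearson` and `select_cluster` are hypothetical helpers, and resolving the "minimum distance and maximum count" criterion lexicographically (distance first, then the larger number of correlated rows) is also an assumption.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def knee_point(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Assumed elbow heuristic: after scaling both axes to [0, 1], return the x
    whose point lies farthest from the chord joining the first and last points."""
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    if x_hi == x_lo or y_hi == y_lo:
        raise ValueError("curve must vary along both axes")
    xn = [(x - x_lo) / (x_hi - x_lo) for x in xs]
    yn = [(y - y_lo) / (y_hi - y_lo) for y in ys]
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    norm = math.hypot(dx, dy)

    def chord_distance(i: int) -> float:
        # Perpendicular distance from point i to the first-to-last chord.
        return abs(dy * (xn[i] - xn[0]) - dx * (yn[i] - yn[0])) / norm

    return xs[max(range(len(xs)), key=chord_distance)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant row."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def select_cluster(clusters: Sequence[Sequence[Sequence[float]]],
                   test_input: Sequence[float],
                   threshold: float = 0.8) -> Optional[int]:
    """Fig. 4 sketch: keep training rows with |corr| >= threshold, compare the
    average of those rows with the average of the testing input, and pick the
    cluster with the minimum distance, breaking ties by the larger count."""
    x_avg = mean(test_input)
    best_key, best_c = None, None
    for c, rows in enumerate(clusters):
        correlated = [r for r in rows if abs(pearson(r, test_input)) >= threshold]
        if not correlated:
            continue  # no correlated training rows in this cluster
        distance = abs(x_avg - mean(mean(r) for r in correlated))
        key = (distance, -len(correlated))
        if best_key is None or key < best_key:
            best_key, best_c = key, c
    return best_c
```

Applied to the Table 1 values (N against RMSE), `knee_point` returns N = 10, the window length chosen in the case study; the same helper could be applied to a within-cluster error curve to pick the number of clusters.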

3. Case studies

This section evaluates the accuracy of the hybrid forecasting with the proposed clustering method (GTSOM) and compares it with that obtained with K-means, the original SOM and NG. The hourly solar radiation data of the Ames station between 2011 and 2013 are used for NN training and forecasting (http://mesonet.agron.iastate.edu). Of the data, 80% are used to train the NNs and the remaining 20% are used for testing. The iterative test determines the length of the sliding window used for the NN structure, which sets the number of lagged solar radiation values used as NN inputs. In addition, temperature, wind speed and wind direction are included in the inputs to better train the NN. The forecast produces hourly irradiance values for different forecast horizons. The forecast accuracy is evaluated by the mean absolute error (MAE), relative MAE (rMAE), root mean square error (RMSE), relative RMSE (rRMSE) and normalized RMSE (nRMSE) as follows:

$$\mathrm{MAE} = \frac{1}{K'}\sum_{n=1}^{K'}\left|\hat{S}(n) - S_{\mathrm{Actual}}(n)\right| \qquad (10)$$

$$\mathrm{rMAE} = \mathrm{MAE}/\bar{S}_{\mathrm{Actual}} \qquad (11)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{K'}\sum_{n=1}^{K'}\left(\hat{S}(n) - S_{\mathrm{Actual}}(n)\right)^2} \qquad (12)$$

$$\mathrm{rRMSE} = \mathrm{RMSE}/\bar{S}_{\mathrm{Actual}} \qquad (13)$$

$$\mathrm{nRMSE}(\%) = 100 \times \mathrm{RMSE}/\left(\max(S_{\mathrm{Actual}}) - \min(S_{\mathrm{Actual}})\right) \qquad (14)$$

where K′ is the total number of testing outputs, $\hat{S}(n)$ and $S_{\mathrm{Actual}}(n)$ are the solar radiation forecast and the actual solar radiation for hour n, and $\bar{S}_{\mathrm{Actual}}$ is the average actual solar radiation.

Table 1 provides the results of the iterative test used to determine the length of the sliding window (N). The forecast accuracy improves significantly as the window length increases to 10; beyond that, further increasing N does not significantly decrease the forecast errors. Therefore, 10 is selected as the number of lagged solar radiation inputs for the NN. The proposed GTSOM groups the solar irradiance data into 7 clusters, as determined by the elbow method. Table 2 provides, for each cluster, the distance between the average of the correlated training inputs and the average of the testing inputs on October 5, 2013. The distance is provided for hours 8–19, the daylight hours. The cluster with the minimum distance and maximum number of correlated training inputs is used to forecast the solar irradiance for each hour. For hour 8, the distance between the testing input average and the correlated training input average of cluster 1 is the lowest (6.59); therefore, this cluster is used for the solar irradiance forecasting of hour 8. The minimum distance in row 2 (hour 9) is 51.89, which selects cluster 2 as the appropriate cluster for forecasting the solar irradiance of hour 9. The clusters selected for hours 10–19 are 4, 3, 6, 1, 1, 4, 6, 7, 3 and 1, respectively.

Fig. 5 shows the RMSE of the forecasts with different clustering methods for the Ames solar radiation dataset. The forecast results of the proposed hybrid algorithm are calculated separately for each clustering method as the number of clusters increases to 30. The forecasts with the proposed GTSOM have the lowest RMSE for all numbers of clusters. The figure also shows the minimum number of clusters each clustering method needs to provide its best forecasting results. Table 3 compares the accuracy of the hybrid forecasting with the proposed GTSOM method with that obtained with the K-means, original SOM and NG clustering algorithms. The forecast accuracy is calculated for each hour and then averaged to provide the performance measures for each month of 2013. The calculated performance indices demonstrate that the

Table 2
Selection of the clusters for solar irradiance forecasting of individual hours (10/05/2013): distance between the average of the correlated training inputs and the average of the testing inputs for each cluster.

Hour   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6   Cluster 7
8      6.59        83.82       172.05      57.12       73.89       116.17      72.08
9      129.12      51.89       52.23       167.17      61.82       108.12      63.63
10     60.06       137.29      118.58      3.64        127.36      62.7        125.55
11     131.25      54.02       50.1        65.76       63.96       105.98      65.76
12     104.63      178.14      74.01       40.93       171.93      18.13       170.12
13     26.63       103.86      152.02      37.08       93.92       96.14       92.12
14     7.62        84.86       171.02      56.08       74.92       115.14      73.11
15     49.03       126.26      129.62      14.68       116.33      73.73       114.52
16     146.77      135.03      30.91       84.03       144.97      24.97       146.77
17     50.45       26.78       130.91      114.15      16.85       173.21      15.04
18     179.87      102.89      1.23        116.17      112.83      57.11       114.64
19     23.85       53.38       157.5       87.56       43.45       146.61      41.64

Fig. 5. Performance of the proposed solar radiation forecasting method with different clustering techniques for different numbers of clusters (RMSE versus number of clusters, 2 to 30).

Table 3
Accuracy results for the proposed hybrid forecasting with the new and existing clustering methods (1 h ahead). Columns: proposed hybrid forecasting with each clustering method; RMSE in W/m², rRMSE dimensionless.

Month   Measure   GTSOM     SOM       K-means   NG
Jan.    RMSE      53.929    58.283    73.027    80.193
        rRMSE     0.845     0.733     0.921     1.003
Feb.    RMSE      59.735    62.986    67.065    77.684
        rRMSE     0.727     0.609     0.582     0.714
Mar.    RMSE      55.538    89.621    111.831   83.347
        rRMSE     0.462     0.586     0.637     0.512
Apr.    RMSE      71.499    85.143    114.216   96.602
        rRMSE     0.541     0.453     0.662     0.538
May     RMSE      78.977    93.362    127.033   145.553
        rRMSE     0.487     0.462     0.611     0.694
Jun.    RMSE      81.358    109.460   141.153   146.888
        rRMSE     0.488     0.459     0.662     0.639
Jul.    RMSE      83.516    78.751    117.250   120.439
        rRMSE     0.405     0.303     0.429     0.431
Aug.    RMSE      77.313    83.350    131.059   103.506
        rRMSE     0.392     0.393     0.578     0.418
Sep.    RMSE      73.038    126.009   108.683   95.042
        rRMSE     0.458     0.706     0.615     0.498
Oct.    RMSE      58.796    56.993    80.407    86.238
        rRMSE     0.657     0.504     0.687     0.701
Nov.    RMSE      57.363    63.127    59.101    78.796
        rRMSE     0.988     0.686     0.757     0.861
Dec.    RMSE      56.476    71.633    68.146    69.630
        rRMSE     1.026     1.022     1.085     1.063

Table 4
Accuracy results for the proposed hybrid forecasting with different clustering methods and forecast horizons (RMSE in W/m², rRMSE dimensionless).

Model     Measure   1 h ahead   2 h ahead   3 h ahead
NG        RMSE      63.681      86.059      106.903
          rRMSE     0.408       0.552       0.686
K-means   RMSE      61.029      85.723      98.264
          rRMSE     0.384       0.533       0.611
SOM       RMSE      57.855      90.580      114.786
          rRMSE     0.374       0.586       0.742
GTSOM     RMSE      52.623      82.765      96.076
          rRMSE     0.327       0.521       0.604

Table 5
Accuracy results of the proposed hybrid forecasting with GTSOM clustering for different solar datasets.

Dataset        RMSE (W/m²)   rRMSE
Ames           76.603        0.4818
Calmar         88.2438       0.5659
Castana        61.3799       0.3921
Cedar Rapids   71.174        0.455
Chariton       70.996        0.441
Gilbert        88.948        0.576
Kanawha        69.975        0.604
Lewis          86.552        0.562
Muscatine      84.984        0.593
Nashua         73.889        0.472
Sutherland     92.147        0.580

Fig. 6. Solar radiation forecasts for 02/05/2013.
Fig. 7. Solar radiation forecasts for 05/10/2013.
Fig. 8. Solar radiation forecasts for 08/25/2013.
Fig. 9. Solar radiation forecasts for 11/17/2013.
(Figs. 6 to 9 plot solar radiation (W/m²) against time (hour, 1 to 24) for the measurements and for the forecasts with GTSOM, SOM, NG and K-means.)

Fig. 10. nRMSE (%) comparison of forecasting methods for solar radiation data in Colorado (Proposed Method, Hybrid, ARIMA, LES, SES and RW; January to October).
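For reference, Eqs. (10) to (14) translate directly into code. The sketch below is illustrative only: `accuracy_measures` is a hypothetical helper, and it assumes, as is standard for relative errors, that rMAE and rRMSE divide by the mean of the actual values.

```python
import math
from typing import Dict, Sequence


def accuracy_measures(forecast: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Eqs. (10)-(14) for K' paired forecast/actual values.

    Assumptions: rMAE and rRMSE divide by the mean actual value, and
    nRMSE (%) divides RMSE by the range of the actual values.
    """
    if not actual or len(forecast) != len(actual):
        raise ValueError("forecast and actual must be non-empty and of equal length")
    k = len(actual)  # K', the number of testing outputs
    errors = [f - a for f, a in zip(forecast, actual)]
    mae = sum(abs(e) for e in errors) / k             # Eq. (10)
    rmse = math.sqrt(sum(e * e for e in errors) / k)  # Eq. (12)
    mean_actual = sum(actual) / k
    value_range = max(actual) - min(actual)
    return {
        "MAE": mae,
        "rMAE": mae / mean_actual,                    # Eq. (11)
        "RMSE": rmse,
        "rRMSE": rmse / mean_actual,                  # Eq. (13)
        "nRMSE_pct": 100.0 * rmse / value_range,      # Eq. (14)
    }


if __name__ == "__main__":
    # Hypothetical hourly values in W/m^2.
    print(accuracy_measures([110, 190, 330, 380], [100, 200, 300, 400]))
```

Because the relative measures divide by the mean actual value, they grow when the test period has a low average irradiance, even if the absolute RMSE is unchanged.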

Table 6
A summary of the forecast results for several forecasting methodologies.

Author(s) | Models | Forecast horizon | RMSE (W/m2) | rMAE (%) | MAE (W/m2) | Location of data used
Mathiesen and Kleissl (2011) | NK: interpolated NAM (North American Model) forecast | 1 day | 117.0 | – | – | Bondville, Illinois
 | G: the 3-h constant kt* (the clear sky index) GFS (Global Forecast System) forecast | 1 day | 82.8 | – | – |
 | ECMWF: the European Centre for Medium-Range Weather Forecasts | 1 day | 100.9 | – | – |
Gala et al. (2013) | ECMWF | 1 h; 3 h; 1 day | – | – | 30.22; 65.92; 243.09 | Galicia, on Spain's north-western coast
 | D-SVR: post-processing ECMWF with SVR at daily resolution, added to 1-h resolution using CSM (climate system model) | 1 h; 3 h; 1 day | – | – | 28.00; 59.21; 203.89 |
 | 3H-SVR: post-processing ECMWF with SVR at 3-h resolution, added to 1-h resolution using CSM | 1 h; 3 h; 1 day | – | – | 28.00; 58.76; 218.05 |
Perez et al. (2010) | An empirical correlation between the sky cover data and measured GHI (global horizontal irradiance) | 1 h; 2 h; 3 h; 4 h; 5 h; 6 h; 1 day; 2 days; 3 days; 4 days; 5 days; 6 days; 7 days | 80; 88; 96; 104; 116; 142; 125; 139; 142; 147; 147; 147; 169 | – | – | Desert Rock SURFRAD station
Lorenz et al. (2009) | ECMWF–OL: statistical post-processing in combination with a clear sky model | 1 day; 2 days; 3 days | 92; 95; 102 | – | 59; 63; 67 | University of Oldenburg, Germany
 | BLUE: the statistical model of Blue Sky | 1 day; 2 days; 3 days | 94; 100; 104 | – | 62; 66; 69 |
 | MM–MOS: MOS (model output statistics) by the weather company Meteomedia GmbH (Gesellschaft mit beschränkter Haftung, a type of limited-liability company in Germany) | 1 day; 2 days; 3 days | 99; 103; 107 | – | 67; 70; 73 |
 | WRF–MT (Weather Research and Forecasting–Meteotest) | 1 day; 2 days; 3 days | 118; 125; 138 | – | 74; 79; 86 |
 | Persistence: the strong correlation between past and future | 1 day; 2 days; 3 days | 144; 161; 167 | – | 92; 104; 109 |
 | BLUE | 1 day | 99 | 27.6 | 61 | Austrian dataset
 | ECMWF–OL (Oldenburg) | 1 day | 101 | 29.02 | 65 |
 | SYNOP: the traditional synoptic model of the meteorologists of Blue Sky | 1 day | 112 | 31.5 | 70 |
 | WRF–MT | 1 day | 123 | 34.7 | 77 |
 | CENER: post-processing based on machine learning methods | 1 day | 129 | 39 | 87 |
 | Austrian persistence | 1 day | 142 | 41.2 | 91 |
Manjili and Niknamfar (2015) | NREL (National Renewable Energy Laboratory), Golden, CO | 1 day | 149.29 | 27.39 | 98.09 | Golden, CO, USA dataset
Cornaro et al. (2015) | ECMWF–NWP: a global forecast model used for NWPs | 1 day (clear sky); 1 day (overcast) | 103; 127 | – | – | The ESTER Laboratory of the University of Rome
 | STNN: developed statistical model based on ANN | 1 day (clear sky); 1 day (overcast) | 93; 233 | – | – |
 | ECMWF–MOSNN: developed model based on ANN and ECMWF NWP data | 1 day (clear sky); 1 day (overcast) | 57; 138 | – | – |
Proposed method | A hybrid solar radiation forecasting method based on a novel game theoretic self-organizing map (GTSOM) | 1 h; 2 h; 3 h; 1 day | 67.921; 82.506; 113.449; 119.752 | 15.071; 17.377; 21.822; 25.042 | 49.792; 57.411; 72.096; 82.734 | Ames, Iowa, United States

proposed forecasting with the new clustering method outperforms the forecasting with the other clustering techniques.

Table 4 provides the accuracy of the proposed hybrid forecasting method with different clustering algorithms for forecast horizons from 1 to 3 h ahead. In terms of rRMSE, the proposed GTSOM model outperforms SOM, NG and K-means by 12.6%, 19.9% and 14.8% for 1-h-ahead prediction; by 11.1%, 5.6% and 2.3% for 2-h-ahead prediction; and by 18.6%, 12% and 1.1% for 3-h-ahead prediction. The corresponding RMSE improvements are 9%, 17.4% and 13.8% for 1-h-ahead prediction; 8.6%, 3.8% and 3.5% for 2-h-ahead prediction; and 16.3%, 10.1% and 2.2% for 3-h-ahead prediction.

To provide a comprehensive performance analysis, the proposed forecasting with GTSOM clustering is also evaluated on several solar radiation datasets from different weather stations with different solar radiation characteristics. Table 5 provides the accuracy results for 1-h-ahead prediction. Figs. 6–9 show the solar irradiance forecasts of the proposed GTSOM and the other clustering methods, together with the measurements, for four days in spring, summer, fall and winter 2013 in the Ames region. The figures demonstrate the better performance of forecasting with GTSOM clustering under various weather conditions.

Fig. 10 compares the hourly forecasting results of the proposed method with five well-established forecasting models: ARIMA (autoregressive integrated moving average), LES (linear exponential smoothing), SES (simple exponential smoothing), RW (random walk) and the hybrid method of Dong et al. (2015). Solar radiation data from Colorado, USA (http://www.nrel.gov) are used for all these forecasting methods to provide a fair comparison. The nRMSE (%) results demonstrate that the proposed method provides more accurate forecasts than the other forecasting models.
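The cluster selection step described earlier in this section (for each forecast hour, the cluster whose training-input average is closest to the testing-input average is chosen, e.g. cluster 13 for hour 8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance is assumed to be Euclidean between average input vectors, each cluster is assumed to be given as an array of training input vectors, and the function names `distance_table` and `select_clusters` are hypothetical.

```python
import numpy as np


def distance_table(test_inputs_by_hour, clusters):
    """Build a (hours x clusters) table of distances.

    Entry [h, k] is the Euclidean distance between the average testing input
    vector of forecast hour h and the average training input vector of
    cluster k. Euclidean distance is an assumption made for this sketch.
    """
    # Average training input vector of every cluster, shape (K, n_features).
    cluster_means = np.stack([np.asarray(c, dtype=float).mean(axis=0) for c in clusters])
    table = np.empty((len(test_inputs_by_hour), len(clusters)))
    for h, test_inputs in enumerate(test_inputs_by_hour):
        # Average testing input vector for this hour, shape (n_features,).
        test_mean = np.atleast_2d(np.asarray(test_inputs, dtype=float)).mean(axis=0)
        # Broadcasting subtracts test_mean from every cluster mean at once.
        table[h] = np.linalg.norm(cluster_means - test_mean, axis=1)
    return table


def select_clusters(test_inputs_by_hour, clusters):
    """Return the 1-based id of the closest cluster for each forecast hour."""
    table = distance_table(test_inputs_by_hour, clusters)
    return [int(k) + 1 for k in np.argmin(table, axis=1)]


if __name__ == "__main__":
    # Toy data (hypothetical): two clusters of 2-feature training inputs.
    clusters = [np.array([[0.0, 0.0], [2.0, 0.0]]),
                np.array([[10.0, 10.0], [12.0, 10.0]])]
    hours = [np.array([[1.0, 1.0]]), np.array([[10.0, 9.0], [12.0, 11.0]])]
    print(select_clusters(hours, clusters))  # [1, 2]
```

Each row of the returned table plays the role of one row of the distance table discussed above; taking the row-wise minimum gives the selected cluster for that hour.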
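The accuracy comparisons above rely on RMSE, MAE and their normalised variants, and on percentage improvements of GTSOM over the other clustering methods. Their definitions are not restated in this section, so the sketch below assumes common conventions: relative errors (rRMSE, rMAE) are expressed as a percentage of the mean measured irradiance, and an improvement is the relative error reduction with respect to the reference method. Other conventions (for example normalising by the data range for nRMSE) give different numbers, so this sketch is not expected to reproduce Table 4 exactly; the function names are hypothetical.

```python
import numpy as np


def rmse(measured, forecast):
    """Root mean square error, in the units of the data (e.g. W/m2)."""
    m, f = np.asarray(measured, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((f - m) ** 2)))


def mae(measured, forecast):
    """Mean absolute error, in the units of the data (e.g. W/m2)."""
    m, f = np.asarray(measured, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(f - m)))


def relative(error, measured):
    """Error as a percentage of the mean measurement (assumed rRMSE/rMAE convention)."""
    return 100.0 * error / float(np.mean(measured))


def improvement(reference_error, proposed_error):
    """Relative error reduction (%) of the proposed method over a reference method."""
    return 100.0 * (reference_error - proposed_error) / reference_error
```

For example, `relative(rmse(m, f), m)` gives an rRMSE under this convention, and `improvement(20.0, 15.0)` returns 25.0, i.e. a 25% lower error than the reference.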
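The simplest reference models in Fig. 10 can be written in a few lines. The sketch below is illustrative only: RW is implemented as persistence of the last observation, SES as the standard one-parameter recursion, and LES as Holt's two-parameter linear method. The comparison in this paper (following Dong et al., 2015) may use a different LES variant (e.g. Brown's) and fitted smoothing constants; `alpha` and `beta` here are assumed inputs, not values from the paper, and ARIMA is omitted.

```python
from typing import List, Sequence


def random_walk(history: Sequence[float], horizon: int = 1) -> List[float]:
    """RW (persistence): every future value equals the last observation."""
    return [float(history[-1])] * horizon


def simple_exp_smoothing(history: Sequence[float], alpha: float, horizon: int = 1) -> List[float]:
    """SES: level <- alpha*y + (1 - alpha)*level; the forecast is flat at the final level."""
    level = float(history[0])
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def holt_linear(history: Sequence[float], alpha: float, beta: float, horizon: int = 1) -> List[float]:
    """LES as Holt's linear method: a smoothed level plus a smoothed trend."""
    if len(history) < 2:
        raise ValueError("Holt's method needs at least two observations")
    level = float(history[0])
    trend = float(history[1]) - float(history[0])  # initial trend from the first difference
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extrapolates the last trend linearly.
    return [level + h * trend for h in range(1, horizon + 1)]
```

On a perfectly linear series such as `[0, 1, 2, 3]`, `holt_linear(..., alpha=0.5, beta=0.5, horizon=2)` extrapolates the trend to `[4.0, 5.0]`, whereas SES and RW produce flat forecasts; this is why trend-aware smoothing is usually kept as a separate baseline.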
Table 6 compares the proposed forecasting method with other recent forecasting models from the literature using several accuracy indices. The accuracy results cover a wide range of forecast horizons, including hourly predictions from 1 h to 6 h ahead and daily predictions from 1 day to 7 days ahead. The table also summarizes the forecast models and the data used for the evaluation. The results demonstrate that the proposed hybrid forecasting with GTSOM clustering outperforms the other methods at the same forecast horizons.

4. Conclusions

A four-stage (pre-processing, clustering, training and forecasting) solar radiation forecasting method is proposed in this paper. The pre-processing stage decomposes the solar radiation data to an appropriate resolution level determined by the entropy-based criterion. A time series analysis is then used to develop the structure of the input and output series for the NNs. A modified GTSOM with novel strategies is developed for the clustering stage to provide a more competitive game and better clustering performance. A cluster selection method is also proposed to determine the appropriate cluster for the solar radiation forecasting of each individual hour. The pre-processed and clustered solar radiation data, together with the temperature, wind speed and wind direction data, are used to train the NN with a Bayesian approach, and the trained NN then estimates the solar radiation to provide the forecast. Historical hourly solar radiation data are used to evaluate the performance of the developed forecasting method with different clustering algorithms. The comparison of the forecast results demonstrates the improved accuracy of the proposed GTSOM compared with the existing SOM, NG and K-means clustering techniques. The proposed hybrid method is also compared with existing state-of-the-art techniques in terms of forecast accuracy, and the results show a significant accuracy improvement for the proposed forecasting model.

References

Aguiar, R., Pereira, M.C., 1992. TAG: a time-dependent, autoregressive, Gaussian model for generating synthetic hourly radiation. Sol. Energy 49 (3), 167–174.
Chen, C., Duan, S., Cai, T., Liu, B., 2011. Online 24-h solar power forecasting based on weather type classification using artificial neural network. Sol. Energy 85 (11), 2856–2870.
Coifman, R.R., Wickerhauser, M.V., 1992. Entropy-based algorithms for best basis selection. IEEE Trans. Inf. Theory 38 (2), 713–718.
Cornaro, C., Pierro, M., Bucci, F., 2015. Master optimization process based on neural networks ensemble for 24-h solar irradiance forecast. Sol. Energy 111, 297–312.
Di Piazza, A., Di Piazza, M.C., Ragusa, A., Vitale, G., 2011. Environmental data processing by clustering methods for energy forecast and planning. Renew. Energy 36 (3), 1063–1074.
Dong, Z., Yang, D., Reindl, T., Walsh, W.M., 2015. A novel hybrid approach based on self-organizing maps, support vector regression and particle swarm optimization to forecast solar irradiance. Energy 82, 570–577.
Engelbrecht, A.P., 2007. Computational Intelligence: An Introduction, second ed. John Wiley & Sons.
Gala, Y., Fernández, Á., Díaz, J., Dorronsoro, J., 2013. Support vector forecasting of solar radiation values. In: Pan, J., Polycarpou, M., Woźniak, M., de Carvalho, A., Quintián, H., Corchado, E. (Eds.), Hybrid Artificial Intelligent Systems, vol. 8073. Springer, Berlin Heidelberg, pp. 51–60.
Ghayekhloo, M., Menhaj, M.B., Ghofrani, M., 2015. A hybrid short-term load forecasting with a new data preprocessing framework. Electr. Power Syst. Res. 119, 138–148.
Herbert, J., Yao, J., 2005. A game-theoretic approach to competitive learning in self-organizing maps. In: Wang, L., Chen, K., Ong, Y.S. (Eds.), Advances in Natural Computation. Springer, pp. 129–138.
Inman, R.H., Pedro, H.T.C., Coimbra, C.F.M., 2013. Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci. 39, 535–576.
Jafarzadeh, S., Fadali, M.S., Evrenosoğlu, C.Y., 2013. Solar power prediction using interval type-2 TSK modeling. IEEE Trans. Sustain. Energy 4 (2), 333–339.
Ketchen, D.J., Shook, C.L., 1996. The application of cluster analysis in strategic management research: an analysis and critique. Strateg. Manag. J. 17, 441–458.
Kohonen, T., 1990. The self-organizing map. Proc. IEEE 78 (9), 1464–1480.
Li, R., Li, G.M., 2008. Photovoltaic power generation output forecasting based on support vector machine regression technique. Elect. Power 41 (2), 74–78.
Lorenz, E., Remund, J., Müller, S.C., Traunmüller, W., Steinmaurer, D.G., Ruiz-Arias, J.A., Fanego, V.L., Ramirez, L., Romeo, M.G., Kurz, C., Pomares, L.M., Guerrero, C.G., 2009. Benchmarking of different approaches to forecast solar irradiance. In: 24th European Photovoltaic Solar Energy Conference.
Manjili, Y., Niknamfar, M., 2015. Big data analytic: cases for communications systems modeling and renewable energy forecast. In: El-Osery, A., Prevost, J. (Eds.), Control and Systems Engineering, vol. 27. Springer International Publishing, pp. 109–134.
Martinetz, T., Schulten, K., 1991. A "neural-gas" network learns topologies. In: Kohonen, T., Makisara, K., Simula, O., Kangas, J. (Eds.), Proceedings of the International Conference on Artificial Neural Networks (ICANN-91), vol. 1, pp. 397–402.
Mathiesen, P., Kleissl, J., 2011. Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States. Sol. Energy 85 (5), 967–977.
Mora-López, L.L., Sidrach-de-Cardona, M., 1998. Multiplicative ARMA models to generate hourly series of global irradiation. Sol. Energy 63 (5), 283–291.
Neal, R.M., 1996. Bayesian Learning for Neural Networks. Springer-Verlag, New York.
Neme, A., Hernandez, S., Neme, O., Hernandez, L., 2005. Self-organizing maps with non-cooperative strategies (SOM-NC). In: Príncipe, J.C., Miikkulainen, R. (Eds.), Advances in Self-Organizing Maps. Springer, pp. 200–208.
Oudjana, S.H., Hellal, A., Mahamed, I.H., 2011. Short term photovoltaic power generation forecasting using neural network. In: Proc. Int. Conf. Environ. Electr. Eng., pp. 706–711.
Pavel, L., 2012. Game Theory for Control of Optical Networks. Springer.
Perez, R., Kivalov, S., Schlemmer, J., Hemker Jr., K., Renné, D., Hoff, T.E., 2010. Validation of short and medium term operational solar radiation forecasts in the US. Sol. Energy 84 (12), 2161–2172.
Shi, J., Lee, W.J., Liu, Y., Yang, Y., Wang, P., 2012. Forecasting power output of photovoltaic systems based on weather classification and support vector machines. IEEE Trans. Ind. Appl. 48 (3), 1064–1069.
Tanaka, K., Uchida, K., Ogimi, K., Goya, T., Yona, A., Senjyu, T., et al., 2011. Optimal operation by controllable loads based on smart grid topology considering insolation forecasted error. IEEE Trans. Smart Grid 2 (3), 438–444.
Voyant, C., Muselli, M., Paoli, C., Nivet, M., 2012. Numerical weather prediction (NWP) and hybrid ARMA/ANN model to predict global radiation. Energy 39 (1), 341–355.
Wu, J., Chan, C.K., 2011. Prediction of hourly solar radiation using a novel hybrid model of ARMA and TDNN. Sol. Energy 85 (5), 808–817.
Yang, H., Huang, C., Huang, Y., Pai, Y., 2014. A weather-based hybrid method for 1-day ahead hourly forecasting of PV power output. IEEE Trans. Sustain. Energy 5 (3), 917–926.