Hybrid machine learning forecasting of solar radiation values

Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎ Contents lists available at ScienceDirect Neurocomputing journal homepage: www.elsevier.com/locate/neucom Hybrid ma...

Download PDF

1MB Sizes 1 Downloads 96 Views

Report

PDF Reader
Full Text

Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Hybrid machine learning forecasting of solar radiation values Yvonne Gala a,n, Ángela Fernández a, Julia Díaz b, José R. Dorronsoro a a Departamento de Ingeniería Informática, Universidad Autónoma de Madrid, C/Francisco Tomás y Valiente, 11, E.P.S., Ediﬁcio B, 4 planta, UAM-Cantoblanco, 28049 Madrid, Spain b Instituto de Ingeniería del Conocimiento, 28049 Madrid, Spain

art ic l e i nf o

a b s t r a c t

Article history: Received 11 April 2014 Received in revised form 4 February 2015 Accepted 6 February 2015

The constant expansion of solar energy has made the accurate forecasting of radiation an important issue. In this work we apply Support Vector Regression (SVR), Gradient Boosted Regression (GBR), Random Forest Regression (RFR) as well as a hybrid method to combine them to downscale and improve 3-h accumulated radiation forecasts provided by Numerical Weather Prediction (NWP) systems for seven locations in Spain. We use either direct 3-h aggregated radiation forecasts or we build ﬁrst global accumulated daily predictions and disaggregate them into 3-h values, with both approaches outperforming the base NWP forecasts. We also show how to disaggregate the 3-h forecasts into hourly values using interpolation based on clear sky (CS) theoretical and experimental radiation models, with the disaggregated forecasts again being better than the base NWP ones and where empirical CS interpolation yields the best results. Besides providing ample background on a problem that offers many opportunities to the Machine Learning (ML) community, our study shows that ML methods or, more generally, hybrid artiﬁcial intelligence systems are quite effective and, hence, relevant for solar radiation prediction. & 2015 Elsevier B.V. All rights reserved.

Keywords: Solar radiation Support Vector Regression Gradient Boosting Random Forests Numerical Weather Prediction

1. Introduction The presence of renewable energy in the electrical systems of advanced countries is clearly growing. Two key reasons for this are concern about climate change and the ever increasing cost of energy. But a third reason is the constant advances in the underlying generation technologies that drive down their price and make it progressively more competitive with respect to other, more traditional energy sources. All these imply a growing impact of renewable energy in the operation and stability of electrical systems and give a very high priority to their smooth integration in the electrical grid. This integration is not easy since, to a large extent, renewable energy derived from wind or solar radiation is essentially non-operable (with perhaps the exception of solar thermal plants) and, as of today, there are not efﬁcient ways to store that energy. Thus, it has to be directly transmitted to the electrical grid. This gives a great importance to the accurate forecasting of renewable energy production; in turn, this depends on the availability of a precise forecasting of the meteorological variables that affect most than production. In the case of photovoltaic and n

Corresponding author. Tel.: þ 34 914972349; fax: þ 34 4972235. E-mail addresses: [email protected] (Y. Gala), [email protected] (Á. Fernández), [email protected] (J. Díaz), [email protected] (J.R. Dorronsoro).

solar thermal energies, this means global solar radiation. Weather forecasting systems provide nowadays fairly accurate general predictions for everyday use. But the high variability of atmospheric phenomena, the complex atmospheric modeling on which Numerical Weather Prediction (NWP) relies, the extremely large underlying systems for data capture and assimilation and, in general, the large uncertainty associated with weather behavior make accurate localized forecasting a very difﬁcult task. On the other hand, the increased use of NWP variables in energy modeling is also a driving force for the improvement of their forecast. A case at hand is that of wind energy, where new NWP products tailored to the needs of wind energy prediction have appeared relatively recently. But this process is bound to be slower in the case of solar energy. A ﬁrst reason is that its widespread use is still well below that of wind energy. A second reason is that current NWP modeling of radiation and other phenomena that affect solar energy is not as precise as that of, say, wind. In fact, NWP forecasting of cloud behavior at a solar park scale is impossible given the current spatial resolution of NWP systems. Moreover, phenomena such as aerosols or the presence of sand and dust in the atmosphere are probably not too well modeled in current NWP systems or, even, considered in some of them. This offers a window of opportunity to the use of other tools that can help us to improve NWP radiation forecast by downscaling it, i.e., to adjust it to the precise geographical location

http://dx.doi.org/10.1016/j.neucom.2015.02.078 0925-2312/& 2015 Elsevier B.V. All rights reserved.

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

2

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

where a solar park may be. Particularly promising candidates are the methods derived from Machine Learning (ML) or, more generally, Artiﬁcial Intelligence, that can be used either individually by themselves or as combination of different systems under a hybrid approach; we will consider both possibilities here. Of course, the use of statistical or ML methods to post-process NWP forecasts is not new and there is a large literature on this subject. Regression models have been certainly proposed for this and a ﬁrst example can be found in [1] where solar radiation measurements are matched with available forecasts of Global Horizontal solar Radiation (GHR) over horizons from 6 to 30 h. From an artiﬁcial intelligence perspective, Artiﬁcial Neural Networks (ANNs) have been applied, for instance, in [2] to reduce the relative Root Mean Square Error (RMSE) of daily average GHR forecasts; [3] gives a comprehensive review of artiﬁcial intelligence techniques in photovoltaic (PV) energy. In fact, the application of ML methods to these issues is lately increasing. A ﬁrst example is the benchmark performed under the Weather Intelligence for Renewable Energies—WIRE COST Action1—for the forecasting of energy production in two Italian PV farms. The second example is the 2013 competition jointly organized by the American Meteorological Society (AMS) and the company Kaggle2 where the goal was to predict daily aggregated radiation at a number of weather stations in Oklahoma from an ensemble set of NWP predictions of a number of radiation-related weather variables. Quantile regression was the winning method in the ﬁrst case and gradient boosting regression (which we will use here) yielded the winning model in the AMS–Kaggle competition. The starting point in any ML-based downscaled radiation forecast is obviously the output of state-of-the-art NWP systems such as that of the European Center for Medium Weather Forecast (ECMWF3). The ECMWF model is very widely used and often considered the best globally. Its forecasts are available twice a day on a 0.251 0.251 grid resolution (recently reﬁned to 0.1251 0.1251) and it gives predictions for a number of days with a temporal resolution of three hours. We observe that this resolution is due to the huge sizes of the global forecasts of a large number of atmospheric variables over several pressure layers and up to ten days ahead that world-wide NWP systems such as ECMWF have to provide; these sizes make ﬁner temporal scales prohibitive. In this study we will work with two ECMWF variables, surface solar radiation downward (SSRD), that represents the aggregated global horizontal radiation (GHR) predicted over the last three hours, and total cloud cover (TCC) prediction, given as a three hour average. However, it has been shown that these models have prediction biases and, therefore, limited forecast accuracy. This is due to factors such as the complexity of cloud physics or the wide spatial resolution, and has been reported in many studies, such as [4–6] for the ECMWF and other NWP systems. To obtain reﬁned forecasts, Model Output Statistics (MOS) is a widely used technique to correct NWP bias; it has been applied to radiation estimates under different approaches [6–9]. In some sense, the application of ML methods that we will present here can also be seen as an alternative approach to MOS. We will consider predictions of global horizontal radiation (GHR) for seven geographic locations in Spain derived from three ML approaches, Support Vector Regression (SVR; [10]), Gradient Boosted Regression (GBR; [11]) and also Random Forest Regression (RFR; [12]; see also [13] for a concise presentation of these methods). The ECMWF's predicted variables SSRD and TCC will

1 http://wire1002.ch/ﬁleadmin/user_upload/Documents/ES1002_Benchmark_ announcement_v6.pdf 2 https://www.kaggle.com/c/ams-2014-solar-energy-prediction-contest 3 European Center for Medium-Range Weather Forecasts http://www.ecmwf. int/

be used as inputs for them. Moreover, SSRD values can also be used as a base radiation forecast with which to contrast the results of the ML models. Recall that aggregated NWP radiation forecasts are given every three hours, so our ﬁrst goal is to derive predictions of aggregated 3-h radiation, for which we will follow two different approaches. In the ﬁrst one we will work with models that directly transform 3-h NWP predictions into 3-h aggregated downscaled radiation. We will identify as 3H the models obtained under this approach. In the second one, we will take the NWP predictions of an entire day as a single input pattern from which we predict the aggregated radiation of the entire day and then disaggregate that prediction into 3-h values by scaling it proportionally to the ECMWF's 3-h radiation predictions. We will identify the corresponding models as D. In any case, the main goal is, of course, to arrive to hourly radiation forecasts, for which we have to disaggregate the 3-h predictions of the 3H and D models. We point out that disaggregating accumulated radiation into hourly values is not an easy problem at all and that there is not yet an accepted solution. The approach often used is to interpolate three-hour values using the hourly values of a Clear Sky (CS) radiation model and we will do so here. But, again, it is quite difﬁcult to adjust theoretical CS models to the particular settings of a concrete geographical location. Because of this, we will also use an alternative approach, local empirical CS curves, that are computed for a given day D and hour h according to the radiation measured in the past at the same hour h of a number of days centered at D. Putting all this together, the paper's main contributions can be summarized as follows:

We thoroughly analyze the radiation downscaling problem

both from a NWP perspective and as a ﬁeld for the application of ML techniques. We study the application of three state of the art ML methods, SVR, GBR and RFR to the downscaling of 3-h aggregated radiation either as a prediction directly derived from 3-h NWP forecasts or as the disaggregation of aggregated daily radiation predictions. In particular, we show that all of them clearly improve on the predictions provided by the ECMWF, which are often considered as the best NWP forecasts, with SVR giving the best results overall. We propose a new procedure to decompose the predictions of three-hour aggregated radiation into hourly values using the concept of a local empirical CS radiation curve.

Moreover, and from a more general point of view, we will provide valuable background information on the problem of downscaling NWP radiation forecasts. We believe this to be of interest for the ML community, as renewable energy is a ﬁeld that offers many opportunities for ML practitioners to provide useful solutions to important problems. This is particularly the case with solar energy, where the improving efﬁciency of PV cells and the falling costs of installation and operation of PV plants will likely lead to a large impact of PV energy on the operation of electrical systems and, thus, to demands of accurate local radiation forecasts. The rest of the paper is organized as follows. We will describe in Section 2 the three ML approaches, SVR, RFR and GBR, that we will use. Section 3 describes ﬁrst the measured radiation data we will work with as well as the base ECMWF forecasts, and then we will present the results of the daily and 3-h ML models as well as their comparison with the base ECMWF forecasts. In Section 4 we will ﬁrst review a simple clear sky radiation model and describe then our approach to build empirical clear sky curves for the concrete locations considered here. Both approaches shall then be applied in Section 5 to derive hourly radiation forecasts by

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

interpolating the 3-h aggregated predictions of the D and 3H models. Finally, Section 6 ends the paper with a discussion, conclusions and pointers to further work.

2. Machine learning models ML has a very rich offer of modeling tools, many of them with a proven track record in applications. We shall consider here Support Vector Regression, one of the workhorses in nonlinear modeling, and, alternatively, we shall also work with two ensemble based approaches: Random Forest Regression, where standard tree bagging is improved ensuring that the different trees are uncorrelated, and Gradient Boosted Regression, where we boost the predictions of a pool of trees. We explain these approaches next. 2.1. Support vector regression Given a sample S ¼ fðX p ; t p Þ : 1 r p rNg, the simplest form of Support Vector Regression (SVR) [10] seeks to ﬁt a linear model f ðX; W; bÞ ¼ W X þ b to the sample in such a way that it minimizes the regularized ϵ-insensitive loss function Lϵ ðW; bÞ deﬁned as Lϵ ðW; bÞ ¼

X λ ½t p f ðX p ; W; bÞϵ þ ‖W‖2 ; 2 p

ð1Þ

where we have ½zϵ ¼ maxð0; j zj ϵÞ. In other words, an error t p f ðX p ; W; bÞ is penalized only if it is greater than ϵ in absolute value. Thus, SVR can be seen as a variant of the standard L2 regularized regression where instead of the familiar z2i ¼ ðyi W X i bÞ2 square error, we use the ½zi ϵ errors; i.e., we allow an ϵ-wide, penalty-free “error tube” around the model function W X þb. In order to solve it, the SVR loss function is usually written ﬁrst in a more SVM-like form as the problem of minimizing over W, b and ξ the loss function LðW; b; ξÞ deﬁned as X 1 n LðW; b; ξÞ ¼ ‖W‖2 þC ðξi þ ξi Þ; 2 i

ð2Þ

subject to the restrictions ξi ϵ r W X i þ b yi r ξi þ ϵ; n

ξi ; ξni Z 0:

ð3Þ

This is known as the SVR primal problem which, in turn, is transformed using standard Lagrangian analysis in its dual problem, that is, minimizing the dual function X 1X ðα β i Þðαj βj ÞX i X j þ ϵ ðαi þ βi Þ Jðα; β Þ ¼ 2 i;j i i X yi ðαi β i Þ ð4Þ i

over the Lagrange multipliers α and β that are subject to the P constrains 0 r αi ; β i r C and i ðαi βi Þ ¼ 0. We recall that α and β arise, respectively, as the multipliers of the ﬁrst set of restrictions n on ξi and ξj in (3) above when building the Lagrangian of LðW; b; ξÞ and the constraints in (3). The preceding discussion can be carried into a non-linear setting by observing that both the SVR model and problem (4) only involve dot products X i X j . Using an appropriate kernel k, we can work with high (possibly inﬁnite) dimensional projections ΦðX i Þ of the original Xi patterns and replace the previous dot products with the new ones ΦðX i Þ ΦðX j Þ ¼ kðX i ; X j Þ. Here we will use a Gaussian kernel kðX; X 0 Þ ¼ expð‖X X 0 ‖2 =2σ 2 Þ. The relatively small dimension of the problems we are concerned with (either 8 or 48, as discussed below) precludes the use of a linear kernel, as the resulting model would be too simple. Moreover, in our experience, the models derived from polynomial kernels are not competitive against the Gaussian kernel which, besides, can be simply interpreted as a radial basis function

3

expansion on the meteorological conditions of the days that correspond to the support vectors that make up the SVR model. The dual problem (4) is usually solved using the well known SMO algorithm and, once we compute the optimal multipliers αni P n n and β i , the optimal weight is given as W n ¼ ðαni β i ÞΦðX i Þ, and we obtain the ﬁnal model as X X n n γ ni ΦðX i Þ ΦðXÞ ¼ bn þ γ ni kðX i ; XÞ f ðXÞ ¼ b þ W n ΦðXÞ ¼ b þ n

¼b þ

X

i

γi e n

‖X X i ‖2 =2σ 2

:

i

where we write γ ni ¼ αni βi . In the experiments of the following section we will solve (4) using the LIBSVM implementation [14] which can be considered the state-of-the-art in SVM software. Finally, we point out that building a Gaussian SVR model requires to ﬁx three meta-parameters, the C penalty constant, the tolerance ϵ and the width σ of the Gaussian kernel. We will discuss how to ﬁnd them in Section 3 and turn now our attention to Random Forest Regression. n

2.2. Random Forest for regression In this subsection we will brieﬂy review the construction of individual regression trees (RTs), and then how to use them in Random Forests Regression (RFR). Assume again a sample S ¼ fðX p ; t p Þ : 1 r p r Ng with the Xp being D-dimensional vectors. An RT partitions input space in a number M of rectangular regions Rj and, once these rectangles are found, the response of a tree that P minimizes square error is TðXÞ ¼ j γ j I Rj ðXÞ, where I Rj ðXÞ is the indicator function of the rectangle Rj and γj is given by 1 X γj ¼ t p ¼ E½t p j X p A Rj : j Rj j R j

Finding the Rj is the difﬁcult part. The standard approach in CART [15], probably the most popular method for building RTs, is a greedy procedure that performs binary splits along successive variables j according to split values s. This leads to a large, possibly overﬁtting tree which is then pruned balancing the model's error and complexity. In more detail [13, Section 9.2], splitting the variable j using a value s yields two regions RLj;s ¼ fX : xj rsg and RRj;s ¼ fX : xj 4 sg. An optimal choice of j and s should solve 8 9 8 93 2 > > > > < X = < X = 6 7 ðt p cL Þ2 þ min ðt p cR Þ2 5: min4min cL > c > > > j;s R :X p A RL ; :X p A RR ; j;s

j;s

It is easy to see that the optimal cnL and cnR are given by cnL ¼ E½t p j xp A RLj;s and cnR ¼ E½t p j xp A RRj;s and, since increasing s just “moves” points from RR to RL, the ﬁrst split j1 and s1 can be quickly obtained exploring this way the xj variables. This is successively applied to the regions RLj1 ;s1 ; RRj1 ;s1 and so on, and splitting stops when a rectangular region containing a preﬁxed small number of points is reached. This way we arrive at a possibly very large tree T0 which we prune in terms of cost (i.e., error) and complexity (i.e., tree size). Assuming that T is a subtree of T0 with j T j terminal nodes associated to the regions Rj, we set 1 X 1 X t p ; Q j ðTÞ ¼ ðt p cj Þ2 ; Nj ¼ j Rj j ; cj ¼ Nj X A R Nj X A R p

j

p

j

and deﬁne for α 4 0 the cost-complexity criterion function: C α ðTÞ ¼ α j T j þ

j Tj X

Nj Q j :

1

It can be shown [15] that for each α there is a unique smallest subtree T α of T0 that minimizes C α ðTÞ. This T α can be found by

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

4

weakest link pruning, where we successively collapse the leaves of P the node that produces the smallest increase in j N j Q j ðTÞ, until we arrive at a single node tree. It can also be shown that the minimizing tree T α belongs to the sequence of subtrees built this way and, thus, we can identify it during the pruning procedure; this T α will be our ﬁnal RT. We can see α as a regularization parameter that tunes the trade-off between tree size and goodness of ﬁt. For instance, if T0 has single pattern nodes, T0 minimizes C 0 ðTÞ. RTs will clearly have small bias but possibly large variance. An obvious way to reduce variance is to apply bagging to build an RT family T m ; 1 r m r M, over bootstrapped subsamples of S and then average them. Bagging, i.e., bootstrap averaging, ensures that the RTs are i.d. and have small bias. If, moreover, the Tm are i.i.d. with the same variance σ2, the average's variance will be σ 2 =M, decreasing with M. However, if they have a pairwise correlation ρ 4 0, the average's variance will become ρσ 2 þ ðð1 ρÞ=MÞσ 2 , with the ﬁrst term being independent of M. The goal is thus to reduce pairwise correlation and in Random Forests (RFs) this is done pairing bootstrap subsampling with a random selection of the variable subsets to be split. More precisely, before each split, p o d input variables are randomly pﬃﬃﬃ selected as splitting candidates. Typical values for p are d, ⌊d=3c or even 1. Once M such trees T m ðXÞ are grown, the random forest (regression) predictor rfM is rf M ðXÞ ¼

M 1X T m ðXÞ: M 1

Depending on the RFR implementation used, several different parameters may be adjusted. We have used the routine sklearn.ensemble.RandomForestRegressor in the ScikitLearn Python implementation, where we have selected the number of trees and the number J of terminal nodes in the ranges f10; 50; 100; 300g and ½5; 20 respectively, using the Scikit default values for other parameters. The procedure we follow is described in Section 3.

2.3. Gradient boosted regression Assume once more a sample S ¼ fðX p ; t p Þ : 1 r p r Ng; if we want to ﬁt some model F(X) over S using a loss function Lðy; FÞ a possibility is to consider the vector F ¼ ðFðX 1 Þ; …; FðX N ÞÞt of values of the unknown F at the sample points Xp and to optimize P LðF Þ ¼ p Lðt p ; FðX p ÞÞ over F . For instance, starting at h0 and following a steepest descent approach, we can take hm ¼ ρm g m , where g m ðXÞ is the gradient of LðF Þ at the current vector F m 1 , i.e., ∂LðF Þ ∂LðF m 1 Þ ∂L ¼ ðt p ; F m 1 ðX p ÞÞ; g m ðX p Þ ¼ ¼ ∂F p F ¼ F m 1 ∂F m 1 ðX p Þ ∂F we derive then a step length

ρm as the solution of

ρm ¼ arg min LðF m 1 ρg m Þ; ρ

and, ﬁnally, update F m 1 to F m ¼ F m 1 ρm g m ¼ F m 1 þ hm . A weakness of this approach is that it essentially only considers the sample points. To enable the consideration of other points, we may ﬁt at the m-th iteration a weak learner hðX; Θm Þ parameterized by Θm that is close to the negative gradient g m . A natural option for this is to use least squares, solving

Θm ¼ arg min Θ

N X 1

ð g m ðX p Þ hðX p ; ΘÞÞ2 :

We next compute the optimal step

ρm as

ρm ¼ arg min LðF m 1 þ ρhð; Θm ÞÞ ρ

ð5Þ

and the new update is F m ðXÞ ¼ F m 1 ðXÞ þ ρm hðX; Θm Þ. This is essentially the approach followed in Gradient Boosted Regression (GBR; [13]), where regression trees are used as weak learners. Then at each step m, a tree T m ðXÞ ¼ TðX; Θm Þ is ﬁrst built solving

Θm ¼ arg min Θ

N X

ð g m ðX p Þ TðX p ; ΘÞÞ2 ;

1

where Θ captures the parameterization of a tree T. However, instead of looking for a single optimal updating step ρ as in (5), we m retain the leaves' rectangles Rm 1 ; …; RJ m of Tm and ﬁt individual m values γm at these R as j j X γm ¼ arg min Lðt p ; F m 1 ðX p Þ þ γ Þ; j γ

X p A Rm j

arriving ﬁnally at the GBR updates F m ðXÞ ¼ F m 1 ðXÞ þ

Jm X

γm ðX p Þ: j I Rm j

1

There are in principle two basic meta-parameters in GBR, the number M of iterations to be performed (i.e., the number of weak learners to be used) and the number Jm of leaves of the tree at step m. If Regression Trees are used, the number of leaves Jm could be a by-product of cost-complexity regularization. However, and as discussed at the end of Section 10.11 in [13], values in the range 4 r J r 8 work well in practice while it is unlikely that larger values J 4 10 are required. Also, instead of applying again CV to decide on M, we can follow a shrinkage-based regularization approach, in which we replace the previous update of Fm by F m ðXÞ ¼ F m 1 ðXÞ þ ν

Jm X

γm ðX p Þ j I Rm j

1

where the parameter ν, 0 o ν o 1, “shrinks” the contribution of the new RT to be added. It has been observed in [11] that smaller values of ν lead to larger M values and usually a better performance. On the other hand, a smaller ν implies obviously longer training times and there is thus a trade-off between M and ν. The usual strategy is to choose a very small ν o0:1 and then choose M by early stopping using validation subsets obtained by K-fold CV. In our experiments we have used the Boosted Binary Regression Tree implementation available in Matlab Central and, as metaparameters, we will consider the values f10; 50; 100; 300g for the number N of trees, the shrinkage parameter ν will vary in the interval ½0:05; 0:09 and the number of terminal nodes J in the range ½2; 8. Again, our concrete procedure is described next in Section 3.

3. Forecasting aggregated daily and 3-hourly radiation We will now describe our numerical results, starting with those of the ECMWF's forecasts (Section 3.1), giving then those of the SVR, GBR and RFR models (Section 3.2) and considering ﬁnally (Section 3.3) a hybrid model to combine these three ML-based forecasts. As mentioned above, we will use standard implementations of SVR, GBR and RFR. In fact key reasons for our choices of ML methods are not only that they represent the state of the art but also that there are high quality and publicly available implementations of them and that clear procedures can be given to select their meta-parameters. We believe these reasons to be essential in order to have strong applications of ML methods in real world problems.

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

3.1. Radiation data and ECMWF forecasts We shall compare ﬁrst ECMWF radiation forecasts with actual measured radiation values at seven concrete weather stations situated near the Spanish cities of A Coruña (COR), Alicante (ALI), Almeria (ALM), Oviedo-Asturias (AST), Barcelona (BAR), Ciudad Real (CR) and Salamanca (SAL); see Fig. 1 for their approximate locations. These hourly values correspond to power per surface quantities and, thus, can be rather high when aggregated hourly. For instance, a typical near noon radiance of about 1000 watts per square meter (W/m2) results in 3600 1000 ¼ 3:6 106 joules per square meter (J/m2). Because of this we will work with global horizontal solar radiation (GHR) measurements and forecasts given in 10 KJ/m2 (i.e., about 3600 KJ/m2 for the previous estimate). The radiation data are available for UTC hours 5–20 and all days from October 2009 to July 2011, that is, 22 months and, in principle, there are ECMWF radiation forecasts for all of them. However, since the ML models considered in the next subsection require the selection of a subset for model training and validation, we will compare the ECMWF predictions with those of the ML models in a 10-month test period that goes from October 2010 to July 2011. Recall that while radiation measurements are given in hourly values, ECMWF radiation forecasts are given as 3-h accumulated values for UTC hours 6, 9, 12, 15, 18 and 21; we have thus to match both datasets, which we do as follows. In the summer accumulated radiation values at solar hour 6 certainly include radiation at hour 5 and even a slight amount at hour 4. We do not have actual radiation measurements at hour 4, but it corresponds to night time in peninsular Spain; so, in what follows we will ignore it and consider as 3-h accumulated radiation at hour 6 the sum of radiations at hours 5 and 6. We will do the same for aggregated radiation at UTC hour 21, that we will take as the sum of those at hours 19 and 20, ignoring the reading at hour 21 that also corresponds to night time. Therefore, we will ﬁrst compare 3-h actual accumulated radiation for UTC hours 6, 9, 12, 15, 18 and 21 with the corresponding ECMWF forecasts, and measure the mean absolute error, MAE, in the test period for both the individual 3-h forecasts and the total daily radiation. (While other error measures such as mean square error can be used, our choice of MAE is motivated by standard practice in renewable energy, as MAE

5

measures there the absolute deviation between predicted and real energy production.) In a more formal way, in what follows we will denote by I d;h , I 3H d;h and I D d;h the global horizontal radiation at hour h of day D, its accumulated 3-h radiation up to hour h of day D and the daily aggregated radiation at day D respectively. Notice that while I d;h is considered for each day at all UTC hours from 4 to 21, with always I d;4 ¼ I d;21 ¼ 0, we will consider I 3H d;h only for hours h ¼ 3k, with k ¼ 2; 3; 4; 5; 6 and 7 (i.e., the three hour periods ending at UTC hours 6, 9, 12, 15, 18 and 21). We thus have I 3H d;h ¼ I d;h 2 þ I d;h 1 þ I d;h ;

Id ¼

21 X

I d;h :

h¼4

We will use a similar notation for the hourly, 3-h and daily forecast M 3H;M D;M and I^d;h derived from a given model M, namely I^ d;h , I^d;h respectively. We will consider 3-hour (3H) and daily (D) models in this section and defer to Section 5 for the hourly models' discussion. 3H;E In the case of the ECMWF forecasts, bI d;h denotes its 3-h prediction and its corresponding 3-h MAE eE3H is given by ( ) ND 7 3H;E 1 X 1X 3H E b jI I j ; e3H ¼ ND d ¼ 1 6 k ¼ 2 d;3k d;3k with ND being the number of days in the period of time used as the test subset. For total daily radiation forecasts, we transform the 3H;E accumulated 3-hour radiation ECMWF forecasts bI d;h into daily D;E D;E 3H;E P forecasts bI d as bI d ¼ 7k ¼ 2 bI d;3k and the daily ECMWF forecasts error eED is thus given by eED ¼

ND D;E 1 X j I bI j : ND d ¼ 1 d d

The values of the eE3H and eED are given in Tables 1 and 2, respectively. We will comment on them after we introduce the SVR, GBR and RFR models with whose errors they will be compared. 3.2. ML radiation forecasts models In this subsection we describe the application of SVR, GBR and RFR to derive ML models for 3-hourly and daily accumulated GHR values. In the ﬁrst case we will use as input patterns X d;h the

Fig. 1. Weather station location and (a) daily and (b) 3H-SVR model MAEs for each station. (For interpretation of the references to color in the text, the reader is referred to the web version of this paper.)

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

6

Table 1 3H model mean absolute errors, with those of the best models under a Wilcoxon Rank Sum test in boldface. Prov

COR ALI ALM AST BAR CR SAL

ECMWF

56.47 41.30 38.69 61.98 50.65 45.27 50.31

3H models

D models

SVR

GBR

RFR

Hyb

SVR

GBR

RFR

Hyb

50.07 38.04 35.59 56.15 46.73 42.64 48.67

49.48 38.60 36.16 56.53 48.33 43.97 48.06

49.56 37.95 36.06 55.93 47.93 43.53 47.40

49.00 37.73 35.44 55.92 47.14 42.76 47.32

45.12 38.88 33.85 50.67 45.35 38.56 41.07

46.76 39.22 35.73 52.44 45.31 40.01 42.87

47.65 39.09 35.16 51.75 44.87 40.56 44.94

45.89 38.65 34.45 50.85 44.44 39.14 42.61

Table 2 Daily model mean absolute errors. Best model errors under a Wilcoxon Rank Sum test in boldface. Prov

COR ALI ALM AST BAR CR SAL

ECMWF

210.96 183.29 159.42 235.47 211.33 166.26 164.43

D models

3H models

SVR

GBR

RFR

Hyb

SVR

GBR

RFR

Hyb

184.37 159.37 143.56 206.33 204.64 157.57 157.86

199.83 158.96 156.20 225.35 201.82 172.48 172.15

208.32 158.60 149.69 217.89 200.11 172.23 186.00

196.65 155.12 145.62 212.87 197.26 163.74 167.77

202.75 157.31 145.23 215.89 195.29 175.13 191.70

197.46 160.97 149.09 217.89 205.52 185.39 188.90

201.25 154.64 147.08 214.89 201.55 179.02 183.83

197.40 155.45 145.61 215.48 200.61 178.52 184.94

ECMWF's SSRD and TCC forecasts at each one of the four grid corners that surround each weather station; pattern dimension is 3H;M

thus 8. These 3-h forecasts bI d;h , with M ¼ S; G or R for SVR, GBR and RFR models respectively can be added to derive a daily D’3H;M P 3H;M ¼ 72 bI d;3k . radiation forecast bI d While an approach to derive these models would be to use just a single model for the six 3-h intervals at any given day, given the substantial scale variation of GHR along the day, we will build three separate models, one for the 3-h radiation values ending at UTC hours 6 and 21 (closest to sunrise and sunset), a second one for UTC hours 9 and 18, and a third one for hours 12 and 15 (closest to noon). These hour-dependent models perform better than a single, all-hour model. With a slight abuse of language we will refer to the hour-dependent models as the SVR, GBR or RFR models, even if each is made up of three individual submodels. On the other hand, we can also build models that aim to give D;M direct forecasts I^ d of the total daily GHR Id from daily patterns made up of the six 4-point SSRD and TCC forecasts given daily by the ECMWF at the same four grid corners used before; pattern dimension is now 6 8 ¼ 48. In turn, we can disaggregate these D;M 3H;DM daily forecasts bI d into 3-h accumulated GHR forecasts bI d;h by simply using the proportions determined by the ECMWF radiation forecasts, i.e., we have

b3H;E bI 3H’D;M ¼ I d;3k bI D;M ; d;3k d bI E d E 3H;E P where we recall that bI d ¼ 7k ¼ 2 bI d;3k . As mentioned in Section 2, we need to select some metaparameters for the SVR, GBR and RFR methods in order to get accurate models. There are several options that can be considered and, in fact, given the clear seasonal behavior of radiation, the adequate selection of the validation and training subsets may have a great inﬂuence on a model's performance. In any case, to select each model's meta-parameters, we work over the entire initial 12 month period that we reserve for training and validation purposes, and randomly divide it into ﬁve disjoint subsets; this is done in a

stratiﬁed way so that each subset has the same amount of days for each month. Hence we will have ﬁve training and validation subsets with 80% and 20% of the data respectively. This way we will arrive to ﬁve optimal meta-parameter subsets and we use them to retrain the 5 models now over the entire 12 month training period. Before describing their performance, we brieﬂy discuss meta-parameter selection. In all cases meta-parameters are found using a simple search over grids that encompass possible parameter values. In the case of SVR we consider C values in the interval ½2 10 ; 210 , σ values in the interval ½2 17 ; 23 and ϵ values in ½21 ; 25 . For GBR we will consider as possible values for the number N of trees those in the set f10; 50; 100; 300g, the shrinkage parameter ν will vary in the interval ½0:05; 0:09 and the number of terminal nodes J in the range ½2; 8. In the RFR case, we consider again as the number of trees the values f10; 50; 100; 300g, and the number of terminal nodes J in the range ½5; 20. As suggested in [12], we ﬁx the number of input variables to be randomly chosen as F ¼ ⌊log 2 ðM þ 1Þc, i.e., F¼3 for the 3-h models, and F¼5 for the daily model. In any case, model parameters are crucial for achieving low prediction errors and, in fact, SVR parameter tuning is an active research ﬁeld where new studies appear on a regular basis. Some examples of recent work can be found in [16–18]. While restricted in principle to SVR, some of these approaches could also be applied to GBR and even RFR. After the meta-parameter search ﬁnishes, we have 5 different models for each one of the SVR, GBR and RFR procedures. Of these we will dismiss the two models with higher and lower MAE over the training set (i.e., the best and worst models) and the ﬁnal SVR, GBR and RFR forecasts will be given by the averages of the individual forecasts of the remaining three models. 3.3. Hybrid radiation forecasts Given the quite different construction of the SVR, GBR and RFR models, a natural idea is to combine them in a hybrid model. Several possibilities could be tried but we will restrict ourselves to build a hybrid SVR-GBR-RFR model as a weighted linear combination of the SVR, GBR and RFR outputs, with the weights being

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

derived from each model's MAE over the training set as explained next. Recall that we retain three SVR, GBR and RFR models. Let MAEj;M , 1 r j r3, be the MAE of the selected model j over its validation subset built under the M¼SVR, M¼RFR or M¼GBR approach. Then the weight wM used to incorporate the M model is simply wM ¼ P

ωM ; 0 M 0 ωM

with ωM ¼ P3

j¼1

1 MAEj;M

;

in other words, the weights of a given model in the hybrid combination are inversely proportional to their MAEs; thus, presumably better models with a smaller MAE are given a larger H weight. Then, for a new input pattern X, the output I^ ðXÞ of the H ^ hybrid model I is given by X H M wM I^ ðXÞ: I^ ðXÞ ¼ M

We discuss next the performance of all previously described models. 3.4. The performance of the 3-h and daily aggregated radiation forecasting models The MAEs of the ECMWF, SVR, GBR, RFR and hybrid models are reported in Tables 1 and 2 for both the 3-hour and daily forecasts. To evaluate model performance we have applied the well known Wilcoxon Signed Rank (WSR) test, whose null hypothesis is that two distributions have the same median. More precisely, we have applied it separately for the 3-h and daily aggregated predictions as follows: we select for each location the model with the smallest MAE, which we write in boldface, and compare its error distribution with that of the other competing models, applying to both the WSR test. If the null hypothesis of equal medians cannot be rejected, the MAE of the competing model is also shown in boldface. However, when the null hypothesis can be rejected, the competing model's MAE is left in standard typeface. Informally, we can thus say that models with boldface MAEs “tie” as best models, while those with standard typeface MAE “lose” against the smallest MAE model; as a consequence, we consider as “better” those models with more MAE values in boldface. With respect to 3-h aggregated predictions, it can be seen in Table 1 that the daily D-SVR models consistently appear to be the best (in COR, ALM, AST, CR and SAL) or tie with the best model (in BAR). The D-hybrid model comes second, as it is the best or ties with it in four locations, while D-RFR does so in two locations and D-GBR in one. The ECMWF 3–h forecasts do not win or tie in any locations. Turning our attention to the aggregated daily predictions, it can be seen in Table 2 that the daily D-SVR models are the best or tie in six locations, the same performance as that of the Dhybrid model. The other models perform here quite better than in the 3-hour case: 3-SVR wins or ties in four locations and D-GBR, DRFR and the ECMWF do so in three locations. We can thus conclude that D-SVR seems to have the best overall performance, closely followed by the D-hybrid model, where, nevertheless, the contributions of the GBR and RFR models are not enough to improve the hybrid model's performance above that of D-SVR. As a matter of fact, the errors of the different models are highly correlated with values often above 80% across all models. Thus it seems that when combined in a hybrid model, GBR and RFR may slightly hamper SVR's bias while not reducing by much its variance. In any case, the close performance of the various models can also be seen in Figs. 2 and 3 that describe the monthly MAE evolution of the daily and 3-h model D models for the OviedoAsturias (AST) and Almería (ALM) stations, which present the worst and best prediction behavior respectively. As it can be seen,

7

the SVR and hybrid models present a very close evolution, and the same do the GBR and RFR models that, in turn, are not too far away from those of the other two models. On the other hand, ECMWF's errors lie often above those of the other four models. Finally, Fig. 1 depicts station positions and the color coded smallest MAE values for the daily and 3-hour best model. As it may be expected, the errors for the southern stations (which have many more sunny days) are lower than those for the northern ones, even if radiation received in the north of Spain is lower.

4. Theoretical and empirical clear sky modeling While the previous discussion deals with daily or 3-h radiation predictions derived directly from NWP forecasts or from ML models, it is clear that the important prediction goal is that of the hourly radiation values. Thus, we have to disaggregate the preceding aggregated radiation predictions, for which some kind of interpolation is needed. Of course many approaches can be followed but perhaps the most obvious way of getting interpolation values and, in fact, the most widely used, is to interpolate in terms of a theoretical Clear Sky (CS) radiation model. Modeling of solar radiation is a very active research area, and many theoretical CS models have been proposed that often require the adjustment of several local parameter values. A review of some of these models from the point of view of renewable energy is in [19]; we will brieﬂy describe below a much simpliﬁed model of clear sky radiation. Of course, the clear sky assumption does not hold for cloudy days but, even then, clear sky interpolation is still, mostly because of the lack of theoretical modeling alternatives for such days and because its interpolated values give a reasonable approximation to the hourly evolution of radiation. Moreover, recall that in our case the interpolation is limited to three hour intervals. In any case, a more substantial difﬁculty when building theoretical CS curves is the need to locally adjust several atmospheric parameter values. Because of this, we will also propose below what we will call empirical CS curves, a simpliﬁed approach to local CS modeling that, nevertheless and as we will see, yield rather good results. In order to give some background on the important issue of CS interpolation, let us describe it from a general point of view.4 Let us denote by I d the direct (beam) radiation at a given point, by I the corresponding horizontal radiation and by Θ the solar angle of incidence or zenith angle (i.e., the angle between the incident solar rays and the vertical at the point). We obviously have I ¼ I d cos Θ, where Θ is a trigonometric function Θ ¼ ΘðL; d; hÞ of the latitude L of the point and the day D and hour h. As a ﬁrst approximation to I d we could use direct solar radiation IS given by [20] n 3 : I S ¼ 1370 1 þ 0:034 cos 2π 365 Here 1370 is an estimate of the average value of the direct solar radiation at the top of the atmosphere, that varies with the distance of the Sun to the Earth, with a maximum of about 1417 W/m2 at the perihelion and a minimum of about 1325 W/m2 at the aphelion; n denotes the day number, and we recall that the aphelion falls approximately on January 3, i.e., for n¼ 3. However, the previous IS formula does not take into account the length of the path the Sun rays follow in the atmosphere, which we need for a better estimate of I d . This relative length is described in terms of the so-called air mass A ¼ AðΘÞ which depends on the zenith angle Θ. When Θ is 0 and the Sun is 4 A simple introduction to several issues in solar radiation can be found at http://www.brighton-webs.co.uk/index_energy.aspx

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

8

Fig. 2. Monthly evolution of daily model errors for the (a) Oviedo and (b) Almería stations.

Fig. 3. Monthly evolution of 3-h model errors for the (a) Oviedo and (b) Almería stations.

directly overhead, air mass is taken as A ¼1. When Θ increases so does path length and, hence, air mass. An approximation to AðΘÞ is the Kasten–Young formula [21]: AðΘÞ ¼

1 cos Θ þ 0:50572ð96:07995 ΘÞ 1:6364

with Θ given here in degrees. Models for direct radiation I d usually combine air mass A and direct solar radiation IS; a simple one is that of Meinel [22] 0:678 0:678 N 3 0:7A ; I d ¼ I S 0:7A ¼ 1370 1 þ 0:034 cos 2π 365 and we ﬁnally arrive to the following basic CS model for horizontal radiation I ¼ IðΘÞ ¼ IðΘðL; d; hÞÞ: N3 IðΘðL; d; hÞÞ ¼ 1370 1 þ 0:034 cos 2π 365 0:7AðΘðL;d;hÞÞ

0:678

cos ðΘðL; d; hÞÞ:

In Section 5 we will use the more sophisticated Bird model [23], for which code is publicly available, to derive for each location a table τd;h that contains the theoretical clear sky radiation estimate that the Bird model yields for day D and hour h. To do so, observe that any clear sky model gives the radiation values at any time t of day D as a function cðt; dÞ. Thus, the accumulated radiation between hours h 1 and h of day D is Z h M 1 X i τd;h ¼ cðt; dÞ dt C c h1þ ; d ; Mi¼1 M h1 where M determines the approximation time step. We will use these theoretical CS τd;h values in the next section but, in any case, we point out that the output of any such model depends explicitly only on Θ, that is, latitude, day and hour, but, nevertheless, it also involves several other physical and atmospheric parameters that are heavily dependent on local conditions and, in general, fairly difﬁcult to establish. Thus, general purpose models may not work all too well at a given location.

On the other hand, notice that since we will use clear sky curves only as an interpolation tool, actual precise radiation values are not strictly needed but only the ability to capture the overall radiation evolution and particularly, the adequate identiﬁcation of sunrise and sunset. Therefore, provided that a long enough radiation record is available, it can be taken as the basis to build a clear sky curve proxy. As a ﬁrst approach, we will simply compute here for each day-hour pair ðd; hÞ an empirical maximum radiation curve μd;h deﬁned as the maximum of past radiation values measured at hour h over an interval ½d δ; d þ δ centered at D. More precisely, we have

μd;h ¼ max fI d þ q;h;y : δ r q r δ;g y;q where δ is some small integer and I d þ q;h;y denotes the radiation measured at hour h in all d þq days, δ r q r δ, of past years y. In our case, radiation records extend from October 2009 to July 2011 and since we use the period October 2009–September 2010 as the training set, this is also the period used to compute μd;h . This approach yields rather reasonable CS curves for summer days but sometimes not so for winter days. An example of this can be seen in Fig. 4, that depicts a fairly good empirical clear sky curve for June 30 at Oviedo but a not so good one for January 6 at Almería. As a simple expedient to smooth this out, we further reﬁne the initial empirical curves by averaging for a pair (d,h) the initial μd0 ;h values around D, i.e.,

μ d;h ¼

δ 1 X μðd þ δ; hÞ 1 þ 2δ δ

and rescaling them so that peak daily value is retained after averaging. We will apply this with δ ¼6 and the resulting curves for June 30 at Oviedo and January 6 at Almería are shown in Fig. 5. As it can be seen, the June 30 curve is essentially unchanged while the January 6 one is smoother. In any case, while we stress that other, perhaps better criteria could be devised to arrive to interpolation curves, we think this

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

9

Fig. 4. Empirical clear sky curve for (a) May 30 (Oviedo) and (b) January 6 (Almería).

Fig. 5. Averaged empirical clear sky curve for (a) May 30 (Oviedo) and (b) January 6 (Almería).

initial approach an interesting venue that, as we shall see next, yields a better interpolation than that provided by the Bird model. 5. Forecasting hourly radiation In this section we will ﬁnally disaggregate into hourly values the 3-h radiation predictions derived from the 3-h and daily models as well as from the ECMWF 3-h forecasts. This has be done simply by interpolation using for each day D a certain interpolation table pd;h , P23 0 r h r 23, such that pd;h ¼ 1 and to derive then the hourly DM 0 b radiation predictions I d;h derived from either the direct 3-h forecasts of model M or those derived from its daily version and those derived from the 3-h version of model M as b3H;M bI H’3H;M ¼ p3 d;h j d;h j I d;h ;

bI H’3H’D;M ¼ p3 b3H’D;M d;h j d;h j I d;h

where we write a given hour as h j with h ¼ 3k as before, and P j ¼ f0; 1; 2g, and p3d;h j ¼ pd;h j = 2m ¼ 0 pd;h m : For the interpolating

values pd;h we will use both the theoretical CS τd;h and the empirical CS μ d;h discussed in the preceding section. Now the hourly MAE error ^ H’D;M or I^H’3H;M of any model M is eM H of the hourly predictions I d;h d;h given by eM H ¼

ND 20 M 1 X 1 X j I bI j ; ND d ¼ 1 16 h ¼ 5 d;h d;h

H’3H;M

H’3H;M with eM and bI M denoting here either eH and bI d;h H H’D;M

and bI d;h

d;h

or eH’D;M H

.

In Table 3 we give for the theoretical and empirical clear sky curves the hourly errors for each location of the disaggregation of the ECMWF predictions as well as those of the 3-h and daily versions of the SVR, GBR, RFR and hybrid models. As done before, we show the result of a WRS test applied now over the error distributions of the hourly models, comparing again for each location the distribution of the model with smallest MAE (in principle the best one) against all others. We retain our boldface notation for the MAEs of the models for which we cannot reject the null hypothesis of same medians. We interpret this again as considering that models in boldface tie with the model with smallest MAE. Thus we may slightly informally think of models with a larger number of boldface MAE values as being better than those with fewer or no MAE value in boldface. Under this interpretation, empirical D-SVR models appear as the better ones for six out of seven locations, followed by the empirical D-hybrid models. In the remaining location (Alicante) the best models are 3H-SVR and the 3H-hybrid one. The tables also show that, in general, D models perform better than 3H ones, that GBR and RFR models have a worse performance than the SVR or Hybrid ones, and that empirical CS interpolation (Table 3 bottom) is clearly superior to theoretical CS interpolation (Table 3 top).

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Hyb

21.05 16.74 15.43 23.55 20.07 18.41 20.38

Hyb

20.54 16.58 15.28 23.34 19.66 18.28 19.99

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

10

6. Discussion and conclusions

20.79 16.71 15.53 23.46 20.00 18.52 20.03 20.67 16.94 15.53 23.54 20.06 18.73 20.32 20.83 16.65 15.28 23.32 19.50 18.17 20.39 19.66 17.07 15.10 21.79 19.17 17.18 18.33 20.06 17.22 15.36 22.04 19.35 17.63 19.17

Hyb RFR ECMWF

23.32 17.62 16.40 25.63 21.09 19.01 20.98

Prov

COR ALI ALM AST BAR CR SAL

GBR

19.78 17.24 15.52 22.18 19.40 17.36 18.42

SVR SVR

GBR 3H models D models

23.37 17.71 16.40 25.71 21.41 18.96 21.09

19.24 17.13 14.84 21.67 19.28 16.89 17.80

RFR

21.31 16.86 15.65 23.67 20.39 18.69 20.41 21.19 17.12 15.70 23.78 20.48 18.88 20.65 21.27 16.82 15.44 23.55 19.89 18.29 20.77 20.55 17.47 15.67 22.37 19.78 17.74 19.23 21.06 17.60 15.93 22.66 19.99 18.24 19.96 20.82 17.64 16.14 22.81 20.03 17.96 19.34

RFR GBR

COR ALI ALM AST BAR CR SAL

20.37 17.54 15.43 22.41 19.94 17.58 18.83

SVR SVR

Hyb

3H models D models ECMWF Prov

Table 3 Hourly mean absolute errors using the theoretical (top) and empirical (bottom) clear sky curve. Best model errors in boldface.

GBR

RFR

6.1. Discussion The growing presence of solar energy is causing a renewed interest in the prediction of weather variables that are related to it, most notably, radiation, as it has a direct impact on energy production. The standard approach in radiation forecasting is to rely on atmospheric physics models to reﬁne base predictions given by Numerical Weather Prediction (NWP) systems and adapt them to precise geographical locations. There is an ample body of literature on this; examples can be found in [4–6,8,9] or [19]. The recent book [24] gives a thorough overview of some of these approaches and, in general, of radiation modeling and prediction. While usually quite effective, these methods have, in general, the drawback of the need to recalibrate physical parameters or even partially modify the underlying physical assumptions in order to ﬁt them to the local situation. This may require extensive work to ﬁnd the proper physical model for a new location and to have rather long measurement series of radiation-related variables, which may not be available at new locations and that, also, often have a proprietary nature that makes it hard to replicate previous results. In particular, its quick application even to a moderate number of sites may be difﬁcult. Thus, statistical or, more generally, Machine Learning (ML) models that more or less directly adjust NWP forecasts to local radiation measurements arise as a natural alternative. Their advantages are obvious: a fairly direct modeling approach whose results are much easier to replicate than in the case with physical models, and that can easily be applied to any location, given that global NWP forecasts are nowadays readily available. Examples of this approach are in [1], based on Model Output Statistics, in [2,25] and [26], where neural networks are used, or in many of the proposals submitted to the American Meteorological Society– Kaggle competition on the forecast of daily aggregated radiation. More recently, researchers have been starting to propose ML models for the direct forecasting of photovoltaic energy, often being seen as a time series and with a focus on short term predictions. An early review can be found in [3] and more recent examples are [27,28] or [29]. Our contribution broadly follows this approach but departs from previous research in that we have used three state of the art ML regression methods, SVR, GBR and RFR, taking as model inputs NWP radiation and cloud cover forecasts from a standard and commercially available model, that of the ECMWF. This requires to understand the structure of the ECMWF's NWP variables and forecasts, which we have thoroughly discussed. Another novelty is our proposal of the use of local empirical CS radiation curves to overcome the three-hour minimum time resolution in ECMWF's forecasts and be able to disaggregate our ML-based three-hour, aggregated radiation predictions into hourly radiation values. These values are, of course, the ultimate forecasting goal. To the best of our knowledge this has not been done before and we have shown it to give better results than those obtained using a theoretical CS curve. Finally, another novelty of our work is to introduce hybrid models that combine the SVR, GBR and RFR forecasts. Our best hybrid model beats stand alone GBR and RFR but is still slightly worse than the best SVR model. As mentioned above, the likely reason for this is the high correlation between the various SVR, GBR and RFR forecasts errors and, thus, the fact that the hybrid models do not reduce variance enough as to compensate the higher bias of the GBR and RFR models. This is supported by the fact that straight averaging of the SVR, GBR and RFR forecasts yields errors worse than those of our hybrid model but not greatly far away. (On the other hand, a straight linear regression

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

combination of the SVR, GBR and RFR outputs has a clearly much poorer performance.) Finally, throughout the paper we have stressed the relevance of the accurate downscaling of radiation forecasts, ﬁrst by themselves but also as potential inputs to photovoltaic energy models. This is certainly opening a new area of opportunities for ML methods and practitioners, as witnessed in the last NIPS workshop on Machine Learning for Sustainability, with nearly half of its papers devoted to energy issues, or the 2013 and 2014 Data Analytics for Renewable Energy (DARE) workshops at the ECML conference. In this vein, another valuable contribution of our paper may be the thorough introduction we make to the solar radiation forecasting problem, which we believe will also be of interest to ML readers.

6.2. Conclusions and further work The increasing integration of renewable energies into the electrical system is demanding better forecasts of the meteorological variables involved. Of course this requires advances in the NWP systems now in exploitation but the complexity of these systems makes these advances relatively slow. In the mean time, ML and, in general, artiﬁcial intelligence systems can be of great interest to improve basic NWP predictions and tailor them to precise locations. In this work we have shown for seven locations in Spain how ML techniques such as Support Vector Regression, Gradient Boosted Regression or Random Forest Regression can improve the initial radiation forecasts provided by the state-ofthe-art ECMWF model. More precisely, our results show clearly that more accurate local forecasts can be derived using ML models (namely SVR here) having as inputs the ECMWF predictions, and that, moreover, a hybrid approach can improve in some cases the results of a single ML model. Moreover, we show how a concrete ML approach can be applied in different ways to build competing models, which in our case means that better results can be achieved by building ﬁrst global models to predict aggregated daily radiation and deriving then hourly radiation values by disaggregating daily radiation, ﬁrst, into 3-h values using ECMWF's aggregated radiation forecasts and, then, into hourly values using an empirical CS curve. While we have concentrated here on radiation forecasting, an obvious conclusion is that the previous or other ML methods may be exploited with proﬁt to obtain better forecasts in the ﬁeld of solar energy, something that may result in improved economic yields to farm operators (as they will have to bear smaller deviation penalties) and an easier operation and integration of these energies in the electrical system. Moreover, our concept of empirical CS curve as a maximum of radiation values immediately translates to an equivalent one of a clear sky energy curve as a maximum of solar energy values. In any case, there is clearly room for further improvements even at the radiation level. A particularly intriguing question is why the potentially more precise 3-h models perform slightly worse that the disaggregated daily models. A cause may lie in the “jumpy” nature of the 3-h aggregated radiation values, that may present relatively large differences at consecutive 3-h intervals. This suggests us to build NWP-radiation models over hourly patterns obtained by disaggregating ﬁrst the 3-h ECMWF's GHR forecasts into hourly values using a clear sky curve and simply interpolating, perhaps even linearly, total cloud cover values. Another clear area for further work is to study new approaches to hybrid modeling that improve on the very simple one used here. An obvious ﬁrst step in this direction is to ensure that the errors of the basic ML models to be incorporated into the hybrid one are largely uncorrelated. We are presently looking into these and other related issues.

11

Acknowledgments With partial support from Spain's Grants TIN2010-21575-C0201 and TIN2013-42351-P, and the UAM–ADIC Chair for Data Science and Machine Learning. The ﬁrst author was also supported by an UAM–ADIC Chair grant and the second author by an FPIUAM grant. We thank the Departamento de Aplicaciones para la Operación of Red Eléctrica de España for providing the radiation data and many helpful discussions. Finally, the authors gratefully acknowledge the use of the facilities of Centro de Computación Cientíﬁca (CCC) at UAM. References [1] J. Jensenius, G. Cotton, The development and testing of automated solar energy forecasts based on the model output statistics (MOS) technique, in: 1st Workshop on Terrestrial Solar Resource Forecasting and on Use of Satellites for Terrestrial Solar Resource Assessment, 1981, pp. 22–29. [2] R. Guarnieri, F. Martins, E. Pereira, S. Chuo, Solar Radiation Forecasting using Artiﬁcial Neural Networks, National Institute for Space Research, vol. 1, 2008, pp. 1–34. [3] A. Mellit, S.A. Kalogirou, Artiﬁcial intelligence techniques for photovoltaic applications: a review, Progr. Energy Combust. Sci. 34 (5) (2008) 574–632. [4] J. Remund, R. Perez, E. Lorenz, Comparison of solar radiation forecasts for the USA, in: European PV Conference, Valencia, Spain, 2008. [5] B. Espinar, L. Ramires, A. Drews, H.G. Beyer, L.F. Zarzalejo, J. Polo, L. Martin, Analysis of different comparison parameters applied to solar radiation data from satellite and German radiometric stations, Solar Energy 83 (1) (2009) 118–125. [6] E. Lorenz, J. Remund, S. Müller, W. Traunmüller, G. Steinmaurer, J. Ruiz-Arias, V. Fanego, L. Ramirez, M. Romeo, C. Kurz, L. Pomares, C. Guerrero, Benchmarking of different approaches to forecast solar irradiance, in: 24th European Photovoltaic Solar Energy Conference, 2009, pp. 4199–4208. [7] S. Boﬁnger, G. Heilscher, Solar electricity forecast—approaches and ﬁrst results, in: 21st European Photovoltaic Solar Energy Conference, 2006. [8] R. Perez, S. Kivalov, J. Schlemmer, K. Hemker Jr., D. Renné, T.E. Hoff, Validation of short and medium term operational solar radiation forecasts in the US, Solar Energy 84 (12) (2010) 2161–2172. [9] P. Mathiesen, J. Kleissl, Evaluation of numerical weather prediction for intraday solar forecasting in the continental united states, Solar Energy 85 (2011) 967–977. [10] B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2001. [11] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001) 1189–1232. [12] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32. [13] T. Hastie, R. Tibshirani, J. Friedman, T. Hastie, J. Friedman, R. Tibshirani, The Elements of Statistical Learning, vol. 2, Springer, 2009. [14] C. Chang, C. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27:1–27:27, Software available at 〈http://www. csie.ntu.edu.tw/ cjlin/libsvm〉. [15] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classiﬁcation and Regression Trees, Wadsworth and Brooks, Monterey, CA, 1984. [16] S. Li, M. Tan, Tuning svm parameters by using a hybrid clpso-bfgs algorithm, Neurocomputing 73 (10–12) (2010) 2089–2096. [17] Y. Bao, Z. Hu, T. Xiong, A PSO and pattern search based memetic algorithm for svms parameters optimization, Neurocomputing 117 (2014) 98–106. [18] T. Xiong, Y. Bao, Z. Hu, Multiple-output support vector regression with a ﬁreﬂy algorithm for interval-valued stock price index forecasting, Knowl. Based Syst. 55 (2014) 87–100. [19] D.R. Myers, Solar radiation modeling and measurements for renewable energy applications: data and model quality, Energy 30 (9) (2005) 1517–1531. [20] C. Kandilli, K. Ulgen, Solar illumination and estimating daylight availability of global solar irradiance, Energy Sources Part A: Recovery Util. Environ. Effects 30 (2008) 1127–1140. [21] F. Kasten, A. Young, Revised optical air mass tables and approximation formula, Appl. Opt. 38 (1989) 4735–4738. [22] A.B. Meinel, M.P. Meinel, Applied Solar Energy, Addison Wesley Publishing Co., 1976. [23] R.E. Bird, R. Hulstrom, Simpliﬁed Clear sky Model for Direct and Diffuse Insolation on Horizontal Surfaces, Technical Report No. SERI/TR-642-761, Solar Energy Research Institute, Golden, CO, 1981. [24] J. Kleissl, Solar Energy Forecasting and Resource Assessment, Academic Press, 2013. [25] D. Elizondo, G. Hoogenboom, R. McClendon, Development of a neural network model to predict daily solar radiation, Agric. For. Meteorol. 71 (1994) 115–132. [26] M. Benghanem, A. Mellit, S. Alamri, Ann-based modelling and estimation of daily global solar radiation data: a case study, Energy Convers. Manag. 50 (7) (2009) 1644–1655. [27] H. Pedro, C. Coimbra, Assessment of forecasting techniques for solar power output with no exogenous inputs, Solar Energy 86 (2012) 2017–2028.

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

12

Y. Gala et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

[28] H. Pedro, C. Coimbra, Stochastic learning methods, in: J. Kleissl (Ed.), Solar Energy Forecasting and Resource Assessment, Academic Press, 2013, pp. 383–407. [29] B. Wolff, E. Lorenz, O. Kramer, Statistical learning for short-term photovoltaic power predictions, in: Proceedings of the DARE 2013, Data Analytics for Renewable Energy Integration Workshop, 2013, pp. 2–13.

Yvonne Gala is a Ph.D. student in the Computer Science Department of the Universidad Autónoma de Madrid, Spain. She received from this university her degree in Mathematics in 2010, and her M.Sc. degree in Computer Science in 2013. Her main research is focused on Support Vector Machines for Big Data applied to renewable energy prediction, and also covers other machine learning techniques related to Big Data.

Ángela Fernández is a Data Scientist in Health and Energy Predictive Analytics department at the Instituto de Ingeniería del Conocimiento (IIC) and Part Time Professor of Computer Engineering at the Universidad Autónoma de Madrid. She received from this university her degrees in Computer Engineering and in Mathematics in 2009, as well as her M.Sc. degree in Computer Science in 2010 and her Ph.D. in 2014. Her main research is focused on manifold learning and clustering, but also covers additional machine learning and pattern recognition paradigms.

Julia Díaz (Ph.D., Universidad Autónoma de Madrid; Spain) is the Health and Energy Predictive Analytics Innovation Manager at the Instituto de Ingeniería del Conocimiento (IIC) and Part Time Professor of Computer Engineering at the Universidad Autónoma de Madrid. She has authored more than 20 scientiﬁc papers in machine learning and applications, has managed a large number of research and innovation projects and has 10 years’ experience on renewable energy prediction.

José R. Dorronsoro (Ph.D., Washington University in St Louis; USA) is a Professor of Computer Engineering at the Universidad Autónoma de Madrid. He has authored more than 100 scientiﬁc papers in mathematical analysis, machine learning and applications, has managed a large number of research and innovation projects and has 10 years experience on wind and solar energy prediction. Dr Dorronsoro is also a Senior Scientist at the Instituto de Ingenierá del Conocimiento (IIC), where he leads IICs research and innovation on wind and solar energy prediction, that services more than 100 wind and solar farms in Spain and about 4 GW.

Please cite this article as: Y. Gala, et al., Hybrid machine learning forecasting of solar radiation values, Neurocomputing (2015), http: //dx.doi.org/10.1016/j.neucom.2015.02.078i

Hybrid machine learning forecasting of solar radiation values

Hybrid machine learning forecasting of solar radiation values

Recommend Documents