Solar Energy 150 (2017) 423–436
Multi-site solar power forecasting using gradient boosted regression trees

Caroline Persson a,*, Peder Bacher a, Takahiro Shiga b, Henrik Madsen a

a Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
b Social Systems Research Domain, Toyota Central R&D Labs., Inc., Nagakute, Aichi, Japan
Article info

Article history: Received 31 January 2017; Received in revised form 25 April 2017; Accepted 26 April 2017

Keywords: Solar power forecasting; Multi-site forecasting; Spatio-temporal forecasting; Regression trees; Gradient boosting; Machine learning
Abstract

The challenges of optimally utilizing weather-dependent renewable energy sources call for powerful forecasting tools. This paper presents a non-parametric machine learning approach used for multi-site prediction of solar power generation on a forecast horizon of one to six hours. Historical power generation and relevant meteorological variables related to 42 individual PV rooftop installations are used to train a gradient boosted regression tree (GBRT) model. When compared to single-site linear autoregressive and variations of GBRT models, the multi-site model shows competitive results in terms of root mean squared error on all forecast horizons. The predictive performance and the simplicity of the model setup make the boosted tree model a simple and attractive complement to conventional forecasting techniques.

© 2017 Elsevier Ltd. All rights reserved.
1. Introduction

Today's society is facing huge challenges in the form of climate change caused by increased emissions of greenhouse gases into the atmosphere. Different initiatives in progress aim to lower human-caused CO2 emissions, e.g. the European goal of a CO2-neutral Europe by 2050 (European Commission, 2011). The urge for a green transition has initiated research and development focusing on sustainable and green technologies. Investments are constantly made in the search for less expensive and more efficient technologies and strategies to promote the transition from the current dependence on fossil fuels to a future society relying on energy from renewable resources. For non-dispatchable renewables such as wind and solar power, new challenges arise, since power generation from these sources depends heavily on climatological and meteorological conditions. This complicates the task of forecasting their power generation, which is important in order to schedule power generation and manage the allocation of reserve capacities. Thus, good forecasts of renewable power generation are necessary when striving for an efficient and cost-beneficial incorporation of renewable power into the electricity market. As the scale of energy systems influenced by variable energy sources grows, the need for efficient forecasting tools is
* Corresponding author. E-mail address: [email protected] (C. Persson).
http://dx.doi.org/10.1016/j.solener.2017.04.066
0038-092X/© 2017 Elsevier Ltd. All rights reserved.
increasing. It is no longer sufficient to only have small-scale forecasting models for a single wind farm or photovoltaic (PV) installation. To optimize the integration of renewable energy, forecasting models need to be able to deal with production spanning larger geographical areas, increasing amounts of historical data and computational time limits. Non-parametric machine learning algorithms have been shown to be efficient in statistical modelling involving large amounts of data and highly non-linear patterns. Thus, it is relevant to explore the application of these methodologies in the field of solar power forecasting.
1.1. Bibliographic review Several examples of non-parametric modelling techniques applied in the field of solar power forecasting exist in the literature. Sharma et al. (2011) use weather forecasts to predict hourly solar intensity as a proxy for solar power generation. The study compares multiple regression techniques for generating prediction models, including linear least squares and Support Vector Machines (SVM) using multiple kernel functions. Furthermore, dimensionality reduction is explored using Principal Component Analysis (PCA). The best performance is obtained by an SVM model with a Radial Basis Function (RBF) kernel being 51% more accurate than a naive persistence model (Antonanzas et al., 2016) in terms of root mean squared error (RMSE). The choice of an RBF kernel and the dimensionality reduction using PCA improved the SVM
model significantly. Similarly, Ragnacci et al. (2012) propose a model based on SVM for forecasting of PV power, where multiple techniques for dimensionality reduction of the input variables are exploited. The best results are obtained with the supervised dimensionality reduction method Covariance Operator Inverse Regression (COIR) (Kim and Pavlovic, 2008). These results clearly support the importance of proper feature and model selection. This paper takes advantage of the automated feature selection of the Gradient Boosted Regression Trees (GBRT). By introducing a regularization term to the fitting algorithm of the GBRT, the feature selection is performed automatically (Hastie et al., 2011) and no preliminary dimensionality reduction step is needed. Moreover, in contrast to PCA and its variants, GBRT do not suffer from any loss of interpretation due to transformations of the input variables. Marquez and Coimbra (2011) propose the use of Artificial Neural Networks (ANN) to forecast global horizontal irradiance (GHI) and direct normal irradiance (DNI) using weather forecasts as predictors. A preliminary feature selection is performed using a Genetic Algorithm (GA) and a Gamma Test. All ANN models proposed show major improvements over the reference model. The improvements are particularly accentuated for models forecasting DNI. Pedro and Coimbra (2012) fit several forecasting models, which predict the hourly PV power generation for one and two hours ahead only using endogenous variables. The methods studied in the paper are among others Autoregressive Integrated Moving Average (ARIMA), k-Nearest-Neighbors (kNN), ANN and ANN optimized by Genetic Algorithms (ANN/GA). The ANN based methods outperform the other models on several error metrics. Only in terms of mean bias error (MBE) the ARIMA model is slightly better for certain time intervals. 
The ANN/GA model performs better than the ANN in all tests, emphasizing the positive effect of optimizing the parameters and input variables of the ANN model. Finally, Pedro and Coimbra (2012) report an improvement of 32.2% over a persistence model for the ANN/GA in terms of RMSE for forecasts one hour ahead. As exemplified in Marquez and Coimbra (2011) and Pedro and Coimbra (2012), the use of ANN has resulted in good predictive performance. Some of the advantages of ANN models are their ability to model highly non-linear relations; moreover, the models do not require any assumptions about the underlying process relating input and output variables. On the other hand, it is not always enough to simply produce good predictions. Often it is also desirable to have information providing qualitative understanding of the relationship between joint values of the input variables and the resulting predicted response value. Thus, black box methods such as ANN models (Hastie et al., 2011), which can be quite useful in purely predictive settings, do not represent the ideal methodology if a transparent interpretation of the models is desired. As shown later, it is possible to compute the importance of the input variables of the GBRT models. Visualization of the importance of input variables is a time-efficient way to identify and compare dominant input variables across models, in this case models for different forecast horizons. Zamo et al. (2014a) evaluate the performance of several different statistical model approaches including SVM, Binary Regression Trees, Random Forest (RF), GBRT and Generalized Additive Models (GAM) for forecasting of hourly PV power generation for lead times between 28 and 45 h, i.e. one day ahead with an hourly resolution. Predictors consist of 31 different outputs of an NWP model. The study evaluates the predictive performance for 28 individual PV plants, as well as the performance of predicting the aggregated power generation over all plants.
The benchmarking designates RF as the best forecast model. For individual power plants, the median RMSE over 30-fold cross-validation for the RF model is between 10% and 12% of the maximum measured production. In
general, this paper supports the application of decision tree models within the field of solar power forecasting. Bessa et al. (2015) investigate the influence of including spatial and temporal information when forecasting solar power generation one to six hours ahead. Data consists of power series from 44 micro-generation units. The paper uses a vector autoregressive (VAR) framework, where lagged terms from the target installation and the neighboring installations are included as predictors, resulting in a multi-output linear regression model. The VAR model is compared to a recursive autoregressive (AR) model with no spatial information. The results show that for point forecasts the spatio-temporal VAR model on average outperforms the AR model by 12.5% for lead time one and 0.1% for lead time six in terms of normalized RMSE. Moreover, Bessa et al. (2015) apply gradient boosting to estimate the coefficients of a probabilistic VAR model and a VAR model with exogenous input (VARX). The boosting uses linear base learners and a quantile loss function to fit quantiles ranging between 5% and 95% in 5% increments. Tree based models have also been applied for probabilistic forecasting of solar power generation, e.g. Zamo et al. (2014b) and Almeida et al. (2015) successfully apply Quantile Regression Forests to estimate quantiles of the PV power generation in order to produce probabilistic forecasts. In this paper the application of regression trees, or more precisely Gradient Boosted Regression Trees (GBRT), for prediction of future power generation from PV rooftop installations is thoroughly analyzed. Data consists of hourly power measurements from 42 individual PV installations and associated weather forecasts for their respective locations. The objective is to fit a model which is capable of predicting future power generation for a PV installation given its location, historical power generation and available weather forecasts.
When using the location of the PV installations as inputs to the GBRT it is possible to make predictions in a multi-site framework. Instead of fitting an individual model for each PV installation the data from all 42 PV installations are used to fit one multi-site model. An individual multi-site GBRT model is trained for each forecast horizon resulting in six individual models. Together the combined predictions from these six models constitute the multi-site forecasts one to six hours ahead. The remainder of this paper is structured as follows. In Section 2 the data and techniques for normalization are presented. In Section 3 the different models including benchmark models and the GBRT models are described. In Section 4 the intermediate and final results are presented. Furthermore, model selection, parameter tuning and feature exploration are thoroughly described. In Section 5 shortcomings and potential improvements are discussed. Finally, in Section 6 the paper is concluded and the major findings are summarized.
2. Data analysis and preprocessing The data consists of historical power observations, information about the location of the PV installations and Numerical Weather Predictions (NWP). The data spans the period from April 19, 2014 to February 28, 2015. Power observations and weather forecasts are available in an hourly resolution. The data is divided into a training and a test set. The training set consists of the first 75% of the data (April 19, 2014 to December 12, 2014) and is further divided into K equally sized subsets without shuffling, referred to as the validation sets. The validation sets are used for K-fold cross-validation in order to optimize parameters for model selection as described in Section 4. The test set consists of the last 25% of the data (December 13, 2014–February 28, 2015) and is exclusively used for evaluation and comparison of the final models.
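The chronological split and the contiguous validation folds described above can be sketched as follows. This is an illustrative sketch only, not the authors' code: the series length is made up, and K = 5 is assumed as used later in Section 4.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical hourly series spanning roughly April 19, 2014 to
# February 28, 2015; the values are placeholders.
n_hours = 7632
y = np.random.rand(n_hours)

# Chronological 75%/25% split: no shuffling, the test set is the last part.
split = int(0.75 * n_hours)
y_train, y_test = y[:split], y[split:]

# K equally sized, contiguous validation subsets (shuffle=False keeps
# the time order, so each fold is one block of the training period).
kf = KFold(n_splits=5, shuffle=False)
for fold, (train_idx, val_idx) in enumerate(kf.split(y_train)):
    print(fold, val_idx[0], val_idx[-1])
```

Because the folds are contiguous in time, each validation block mimics forecasting a period not seen during fitting, which is closer to operational use than shuffled folds.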
The time series of the power generation should in principle be denoted Y_{i,t}, where i refers to a unique PV installation and t is time. However, in the remaining part of the paper the subscript i is omitted for simplicity, as all variables, formulas and models (except the multi-site GBRT models) are similar for each PV installation.
2.1. Power data

The power series Y_t consists of hourly averages of power generation from each of the 42 PV installations. Fig. 1 shows the geographical locations of the 42 PV installations around Nagoya Bay in Japan. The installations have an installed capacity in the range of 2.44–9.66 kWp. The power generation observed during night time is discarded and simply assumed to be zero. Night time is defined to be when the zenith angle of the sun is larger than 87°. The graphs in Fig. 2 show the power generation as a function of the zenith angle for three different PV installations for all observations in the training set. The vertical line is placed at a zenith angle of 87°. To better visualize the trend between power generation and zenith angle, the power has been smoothed: a rolling median of the power observations is applied with a window of one degree of zenith angle. The smoothing is only applied for this visualization and not for any further analysis or modelling. The amount of discarded energy ranges from approximately 0.5% to 2.2%, with an average around 1.3% over the training set for all 42 PV installations.

2.1.1. Normalization

For the linear autoregressive (AR) model and the GBRT 1n and GBRT 2n models described in Section 3, normalized power data are used for modelling. Inspired by the work of Bacher et al. (2009) and Huang et al. (2013), the solar power is normalized by an estimate \hat{Y}^{cs}_t of the solar power generation under clear-sky conditions, which is modelled as a function of the time of day and the time of year. \hat{Y}^{cs}_t is estimated as the 99th quantile of the power series using quantile regression (Koenker, 2005), where the regressors are the seasonal terms

\hat{Y}^{cs}_t = a_0 + \sum_{k=1}^{2} \left[ a_{1,k} \sin\left(\frac{2\pi k t}{24 \cdot 365.25}\right) + b_{1,k} \cos\left(\frac{2\pi k t}{24 \cdot 365.25}\right) + a_{2,k} \sin\left(\frac{2\pi k t}{24}\right) + b_{2,k} \cos\left(\frac{2\pi k t}{24}\right) \right]    (1)

a_0 is the estimated mean of the data, a_{1,k} and b_{1,k} are coefficients of the yearly and twice-yearly cycles, and a_{2,k} and b_{2,k} are coefficients of the daily and twice-daily cycles. The coefficients are computed using quantile regression with the objective to fit \hat{Y}^{cs}_t such that

P[ Y_t \le \hat{Y}^{cs}_t ] = 0.99    (2)

The ratio between Y_t and \hat{Y}^{cs}_t defines the normalized power series s_t, also referred to as the clear-sky index:

s_t = Y_t / \hat{Y}^{cs}_t    (3)

The motivation for the normalization is to obtain a more stationary process for the AR modelling, which is elaborated in Section 3. The GBRT models analyzed are trained first on raw power measurements and secondly on normalized power measurements in order to compare the influence of the normalization on the model approach. Fig. 3 shows Y_t, \hat{Y}^{cs}_t and s_t over some days in the beginning of August 2014 for a specific PV installation (PV1). Some non-stationarity is still present in the morning and night hours of the normalized time series. This can be seen on e.g. the first day in Fig. 3, which is characterized by clear-sky conditions.
Fig. 1. Location of the 42 PV installations. The size of the markers is scaled by installed capacity.
Fig. 2. Smoothed power generation as a function of solar zenith angle for three different PV installations.
Fig. 3. Up: Example of Y_t and \hat{Y}^{cs}_t. Down: Normalized power s_t defined as the ratio of the two curves in the upper graph.
2.2. Weather forecasts

Numerical Weather Predictions (NWP) are provided by the Japan Meteorological Agency (JMA). The weather forecasts originate from an NWP mesoscale model (JMA MSM) managed by JMA. For technical details about the model, see JMA (2013). The JMA MSM provides forecasts of several weather variables every three hours on a spatial grid of 5 km × 5 km. Contiguous forecasts one to six hours ahead are extracted for each hour. The weather variables available for this project are listed in Table 1. For a given PV installation, the weather forecasts from the closest grid point measured in Euclidean distance are extracted and associated with the installation. The weather forecasts are preprocessed such that the most recent ones are included in the modelling. Due to calculation delay, the most recent available forecasts are four to seven hours old.
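The nearest-grid-point association described above can be sketched with a small distance computation; the coordinates and grid spacing below are made up for illustration and are not the actual JMA MSM grid.

```python
import numpy as np

# Hypothetical regular lat/lon grid of NWP points around Nagoya Bay.
grid_lat, grid_lon = np.meshgrid(np.arange(34.5, 35.5, 0.05),
                                 np.arange(136.5, 137.5, 0.05))
grid = np.column_stack([grid_lat.ravel(), grid_lon.ravel()])

# Hypothetical PV installation coordinates (lat, lon).
pv_sites = np.array([[35.08, 136.88],
                     [34.95, 136.62]])

# For each site, the index of the closest grid point in Euclidean distance.
d2 = ((pv_sites[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)
```

Each installation then inherits the forecast time series of `grid[nearest[i]]` as its exogenous NWP inputs.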
2.3. Feature engineering

The objective is to predict future power generation Y_{t+k}. The predictions are denoted \hat{Y}_{t+k|t}, where t is the current time, referred to as the time of issue, and t+k is the time of prediction, i.e. k hours ahead. Besides the NWP data, historical power observations are used to create additional predictors. Among these are the three latest observations Y_t, Y_{t-1}, Y_{t-2}, the difference between the two latest observations Y_t - Y_{t-1}, and the diurnal lagged observation Y_{t-24+k}. The 99th quantile of all the observations over the past three weeks (504 observations) is also included as a regressor and is denoted Y^{toy}_t. The value of Y^{toy}_t indicates whether the last three weeks were a period of high or low power generation. Y^{toy}_t can be interpreted as a function of the time of year, hence 'toy'. Finally, the 99th quantile of the 21 observations matching the hour of prediction over the past three weeks is included as well and is denoted Y^{tod}_{t+k|t}. Y^{tod}_{t+k|t} contains information about the power generation for a given hour during the past three weeks and can be interpreted as a function of the time of day. Y^{toy}_t and Y^{tod}_{t+k|t} are calculated such that

P[ Y_{t-j} \le Y^{toy}_t ] = 0.99,  j = 1, ..., 504    (4)

P[ Y_{(t+k)-24j} \le Y^{tod}_{t+k|t} ] = 0.99,  j = 1, ..., 21    (5)

Table 1
NWP data provided by JMA to be used for modelling.

Variable              (Abbreviation)   [Unit]
Temperature           (NWP_Ta)         [°C]
Cloud cover – low     (NWP_CCLow)      [%]
Cloud cover – medium  (NWP_CCMid)      [%]
Cloud cover – high    (NWP_CCHigh)     [%]
Cloud cover – total   (NWP_CCTotal)    [%]
Relative humidity     (NWP_Rh)         [%]
Precipitation         (NWP_Precip)     [mm]
Wind speed – west     (NWP_Wsw)        [m/s]
Wind speed – south    (NWP_Wss)        [m/s]
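The lagged-power predictors and the rolling quantiles of Eqs. (4) and (5) can be sketched with pandas for a single installation and a single lead time k. The data and column names below are illustrative, not the authors' code.

```python
import numpy as np
import pandas as pd

k = 1                                                  # example lead time
idx = pd.date_range("2014-04-19", periods=24 * 40, freq="h")
df = pd.DataFrame({"Y": np.random.rand(idx.size)}, index=idx)

df["Y_lag1"] = df["Y"].shift(1)                        # Y_{t-1}
df["Y_lag2"] = df["Y"].shift(2)                        # Y_{t-2}
df["Y_diff"] = df["Y"] - df["Y_lag1"]                  # Y_t - Y_{t-1}
df["Y_diurnal"] = df["Y"].shift(24 - k)                # Y_{t-24+k}

# Y^{toy}_t: 99th quantile of the past 504 hourly observations, Eq. (4).
df["Y_toy"] = df["Y"].rolling(504).quantile(0.99)

# Y^{tod}_{t+k|t}: 99th quantile of the 21 past observations at the hour of
# prediction, Eq. (5). Per hour-of-day: rolling quantile of the previous 21
# same-hour observations (shift(1) excludes the current one), then shift(-k)
# aligns the value for hour t+k with issue time t.
df["Y_tod"] = (df["Y"].groupby(df.index.hour)
                      .transform(lambda s: s.rolling(21).quantile(0.99).shift(1))
                      .shift(-k))
```

The first three weeks of rows are NaN for `Y_toy`, reflecting that these features are only defined once enough history is available.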
The solar zenith angle is deterministic and is calculated (Reda and Andreas, 2004) for a given time and location. Thus, the zenith angle, denoted z_{t+k}, is included as well for the time of prediction for each PV installation. Table 2 lists the available input variables and the response. Since the purpose of multi-site modelling is to differentiate, within one model, between different locations in both input and output space, it is important to keep track of which PV installation a given set of observations belongs to. Thus, the subscript i is included in the notation in Table 2. Notice that only the multi-site GBRT models described in Section 3 take all the predictors as input, whereas the persistence, climatology, AR models and single-site
Table 2
Definition of the indices and variables used for modelling.

Index                Definition
t                    Time, hourly from April 19, 2014 to February 28, 2015
i                    PV installation id: {PV1, PV2, ..., PV42}
k                    Lead time in hours: {1, 2, ..., 6}

Variable
Y_{i,t}              Latest observation of PV solar power generation*
Y_{i,t-1}            Observation at time t-1*
Y_{i,t-2}            Observation at time t-2*
Y_{i,t-24+k}         Observation from the previous day at the same time as the time of prediction*
Y_{i,t} - Y_{i,t-1}  The difference between the two latest observations*
Y^{tod}_{i,t+k|t}    The quantile with a nominal level of 0.99 as a function of time of day*
Y^{toy}_{i,t}        The quantile with a nominal level of 0.99 as a function of time of year*
Lat_i                The north–south position of the PV installation on the Earth's surface
Lon_i                The east–west position of the PV installation on the Earth's surface
z_{i,t+k}            The zenith angle, i.e. the elevation of the sun with respect to the horizon
NWP_{w,i,t+k}        w = {Ta, CCLow, CCMid, CCHigh, CCTotal, Rh, Precip, Wsw, Wss}, see Table 1

Prediction
\hat{Y}_{i,t+k|t}    Prediction of PV solar power generation k hours ahead

* Either measured in Watts or as normalized power generation dependent on the model approach.
GBRT models as defined later on in Section 3 use a subset of the variables listed in Table 2.

3. Forecasting models

Different models of increasing complexity are fitted in order to benchmark the more complex models against the simpler models. The models are presented in order of increasing complexity: Persistence, Climatology, a recursive autoregressive (AR) model, and single-site as well as multi-site gradient boosted regression trees (GBRT). Notice that each model has its parameters optimized independently for each forecast horizon k = 1, ..., 6. Furthermore, each of the models, except the multi-site GBRT, is fitted separately for each PV installation i = PV1, PV2, ..., PV42 and is thus referred to as a single-site model. Only the multi-site GBRT model is trained on data from all PV installations simultaneously and is thus referred to as a multi-site approach.

3.1. Persistence and climatology

Persistence models are among the simplest benchmark models in the context of forecasting. Here the Persistence predicts the solar power to be equal to the latest observation corrected by the clear-sky index

\hat{Y}_{t+k|t} = \hat{Y}^{cs}_{t+k} s_t    (6)

Persistence models are often competitive on short forecast horizons when a strong correlation between consecutive observations exists. See Antonanzas et al. (2016) and Marquez et al. (2013) for further details. Another common benchmark in the field of forecasting is climatology. Climatology often represents a more general pattern in the time series. In this case the Climatology is defined as the mean of all available daylight observations up to time t without any dependence on the time of day. Given m past observations, this is

\hat{Y}_{t+k|t} = \frac{1}{m} \sum_{j=0}^{m-1} Y_{t-j}    (7)

Fig. 4 shows an example of what the predictions from the Persistence and the Climatology look like for a forecast horizon of one hour.

3.2. Adaptive linear model

Adaptive recursive linear AR time series models are used to predict the normalized solar power \hat{s}_{t+k|t} from Eq. (3). AR models are linear and time-invariant, hence only optimal for stationary time series processes (Madsen, 2008). As the normalized power s_t represents a more stationary process than the raw power series, an AR model is fitted to the normalized power. The predictors consist of the latest normalized observation s_t, the latest normalized diurnal lagged observation s_{t-24+k} and the forecast of the medium cloud cover NWP_{CCMid,t+k} for the time of prediction. As the cloud cover is included, the model is not purely autoregressive, but autoregressive with exogenous input, also referred to as an ARX model (Bacher et al., 2009). The prediction of the Recursive AR model is

\hat{s}_{t+k|t} = m + a_1 s_t + a_2 s_{t-24+k} + b_1 NWP_{CCMid,t+k}    (8)

and the parameters are fitted with a recursive least squares (RLS) scheme, i.e. the coefficients are updated for every new observation arriving; for further details see Bacher et al. (2009). The algorithm includes a forgetting factor \lambda, which controls the weight of the historical observations when estimating the coefficients. For each PV installation and forecast horizon the forgetting factor is chosen from the set {0.900, 0.905, ..., 0.995} such that the root mean squared error is minimized for the test set.

3.3. Gradient boosted regression trees

From a machine learning perspective the task of forecasting power generation is categorized as a supervised regression problem. Data consists of explanatory input variables, and for each input sample a corresponding target value (or response value) exists. Boosting is a powerful learning strategy originally designed for classification problems, but it has successfully been extended to regression as well. The motivation for boosting is to combine the output of many "weak learners" into a powerful "committee" (Hastie et al., 2011). Gradient boosting considers additive models trained in a forward stage-wise manner of the form

F_m(x) = F_{m-1}(x) + h_m(x)    (9)

where h_m(x) are the basis functions referred to as weak learners. In the case of GBRT the basis functions h_m are small regression trees of fixed size and thus the GBRT model F_m(x) is a sum of m small regression trees. For each boosting iteration m a new regression tree is
Fig. 4. Example of predictions from the Persistence and the Climatology on a forecast horizon of one hour.
added to the GBRT model. The input x is the set of variables from Table 2. The procedure aims at estimating the response Y_{i,t+k} from the training set, which implies that the perfect h_m satisfies

F_m(x_{i,t}) = F_{m-1}(x_{i,t}) + h_m(x_{i,t}) = Y_{i,t+k}    (10)

which is equivalent to

h_m(x_{i,t}) = Y_{i,t+k} - F_{m-1}(x_{i,t})    (11)

h_m in Eq. (11) is the model fitting the current residuals r_{m,i,t} = Y_{i,t+k} - F_{m-1}(x_{i,t}) at iteration m. Notice that the current residuals are the negative gradients of the squared error loss function

-\frac{\partial \, \frac{1}{2} (Y_{i,t+k} - F_{m-1}(x_{i,t}))^2}{\partial F_{m-1}(x_{i,t})} = Y_{i,t+k} - F_{m-1}(x_{i,t})    (12)
This shows that h_m equals the negative gradient of the squared loss function. Moreover, Eq. (12) proves that the gradient boosting algorithm minimizing the squared error loss function is a gradient descent (or steepest descent) algorithm. This generalizes to other loss functions by substituting the squared error with a different loss function and its gradient. For an exhaustive description of gradient boosting and gradient boosted regression trees, see Hastie et al. (2011), Ridgeway (2007) and Scikit-learn (2015a). For each boosting iteration a regression tree is fitted to the current residuals. Adding enough regression trees to the final model results in an arbitrarily small training error. To avoid overfitting, a simple regularization strategy is to scale the contribution of each regression tree by a factor \nu
F_m(x) = F_{m-1}(x) + \nu h_m(x),  \nu \in [0, 1]    (13)
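The stage-wise updates of Eqs. (9)–(13) can be condensed into a short fitting loop. The sketch below is illustrative, not the paper's implementation: it uses scikit-learn decision trees as the weak learners h_m and synthetic data, with invented names throughout.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

nu, M = 0.1, 100                      # learning rate and boosting iterations
F = np.full_like(y, y.mean())         # F_0: constant initial model
trees = []
for m in range(M):
    r = y - F                         # residuals = negative gradient, Eq. (12)
    h = DecisionTreeRegressor(max_leaf_nodes=6).fit(X, r)  # h_m, 6 leaves
    F = F + nu * h.predict(X)         # F_m = F_{m-1} + nu h_m, Eq. (13)
    trees.append(h)

def predict(X_new):
    """Evaluate the boosted ensemble F_M at new inputs."""
    return y.mean() + nu * sum(h.predict(X_new) for h in trees)
```

Each tree only has to correct what the current ensemble still gets wrong, which is why even very small trees accumulate into an accurate model.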
The parameter \nu is also called the learning rate as it scales the step length of the gradient descent procedure. \nu interacts strongly with the number of boosting iterations M. Smaller values of \nu require more iterations, i.e. a higher number of basis functions, in order to make the training error converge. Empirical evidence suggests that small values of \nu favor better test error. Hastie et al. (2011) recommend setting the learning rate to a small constant and choosing M by early stopping. For a more detailed discussion of the interaction between \nu and M, see Ridgeway (2007).

3.3.1. Predicting solar power

GBRT models are purely data-driven and non-parametric. If any dependencies between input and output variables exist, it is up to the model itself to discover these dependencies or patterns based on historical events. The input variables are the 19 variables listed in Table 2. In total four different variations of the GBRT model are proposed:
GBRT 1: Multi-site model using raw power observations
GBRT 1n: Multi-site model using normalized power observations
GBRT 2: Single-site model using raw power observations
GBRT 2n: Single-site model using normalized power observations

The multi-site approaches include data from all PV installations in one GBRT model, while, on the contrary, the single-site approaches fit a separate GBRT model for each PV installation. Tuning of relevant parameters (e.g. number of boosting iterations and learning rate) is carried out individually for each model. Notice that the single-site GBRT models omit the variables Lon and Lat, since they do not contribute any information in a single-site framework where installations are modelled individually. Fig. 5 shows an example of what the predictions from the Recursive AR and the GBRT 1 model look like for a forecast horizon of one hour.

4. Results

The results, including the model selection exemplified with the multi-site GBRT 1 model, are presented in this section. Furthermore, a feature analysis is carried out and accompanied by visualizations of feature importance. Finally, the models described in Section 3 are compared and analyzed. All computations are performed in the Python environment version 2.7.10 using the scikit-learn package, see Scikit-learn (2015a) and Pedregosa et al. (2011).

4.1. Error metrics

Evaluation metrics for solar power forecasting have been discussed in several papers including Marquez and Coimbra (2013), Coimbra et al. (2013), Hoff et al. (2013) and Antonanzas et al. (2016). In order to compare the predictive performance of the different PV installations, the forecast performance is measured by the normalized root mean squared error, nRMSE, defined in e.g. Antonanzas et al. (2016) and Hoff et al. (2013) in a solar power context. The nRMSE for the validation and test sets are denoted nRMSE^{val}_k and nRMSE^{test}_k for lead time k, respectively.
The GBRT models are tuned using 5-fold cross-validation; therefore nRMSE^{val}_k is the average nRMSE_k over the 5 validation sets. In order to provide a fair comparison of the performance of the final models on the test set, the normalized mean absolute error nMAE^{test}_k, normalized mean bias error nMBE^{test}_k and skill score SS^{test,Pers}_k, with
respect to the Persistence, are also presented.

Fig. 5. Example of predictions from the Recursive AR and the GBRT 1 model on a forecast horizon of one hour.

The calculation of nRMSE for a fixed lead time k is
nRMSE_k = \sqrt{ \frac{1}{P} \sum_{i=1}^{P} \frac{1}{N} \sum_{t=1}^{N} \left( \frac{Y_{i,t+k} - \hat{Y}_{i,t+k|t}}{Y^{max}_i} \right)^2 }    (14)

where N is the number of predictions to evaluate, P is the number of PV installations and Y^{max}_i is the maximum observed power measurement in the training set for PV installation i. The rest of the notation is summarized in Table 2. The nMAE is defined as

nMAE_k = \frac{1}{P} \sum_{i=1}^{P} \frac{1}{N} \sum_{t=1}^{N} \frac{|Y_{i,t+k} - \hat{Y}_{i,t+k|t}|}{Y^{max}_i}    (15)

The nMBE is defined as

nMBE_k = \frac{1}{P} \sum_{i=1}^{P} \frac{1}{N} \sum_{t=1}^{N} \frac{Y_{i,t+k} - \hat{Y}_{i,t+k|t}}{Y^{max}_i}    (16)

The SS^{Pers}_k is defined as

SS^{Pers}_k = 1 - \frac{nRMSE_k}{nRMSE_{k,Pers}}    (17)

where nRMSE_{k,Pers} is the nRMSE_k of the Persistence. To compare the performance of the models predicting normalized power with models predicting raw power measurements, the normalized predictions \hat{s}_{i,t+k|t} are transformed back to Watts using the precomputed \hat{Y}^{cs}_{t+k} prior to evaluation

\hat{Y}_{i,t+k|t} = \hat{Y}^{cs}_{t+k} \hat{s}_{i,t+k|t}    (18)
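Assuming the predictions are arranged as an array of shape (P installations, N predictions), Eqs. (14)–(17) translate directly into code; a sketch with invented names:

```python
import numpy as np

def nrmse(y_true, y_pred, y_max):
    """Eq. (14): RMSE of errors normalized by each site's training maximum."""
    e = (y_true - y_pred) / y_max[:, None]
    return float(np.sqrt(np.mean(e ** 2)))

def nmae(y_true, y_pred, y_max):
    """Eq. (15): mean absolute normalized error."""
    e = (y_true - y_pred) / y_max[:, None]
    return float(np.mean(np.abs(e)))

def nmbe(y_true, y_pred, y_max):
    """Eq. (16): mean normalized bias (sign convention Y - Y_hat)."""
    e = (y_true - y_pred) / y_max[:, None]
    return float(np.mean(e))

def skill_score(nrmse_model, nrmse_persistence):
    """Eq. (17): relative improvement over the Persistence reference."""
    return 1.0 - nrmse_model / nrmse_persistence
```

With an equal number N of predictions per installation, averaging over all entries at once is identical to the nested per-installation averages in the equations.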
4.2. Model selection

A detailed description of the parameter tuning for the GBRT models is exemplified with results from the multi-site GBRT 1 model trained on raw power data. The GBRT model has four parameters to tune:

Size of regression trees J
Number of iterations M
Shrinkage \nu
Subsample fraction \eta

All regression trees h_m are restricted to be of the same size, J_m = J for all m. At each boosting iteration a J-terminal-node regression tree is induced. This makes J a parameter of the boosting procedure
that should be adjusted, such that the model performance is maximized. According to Hastie et al. (2011) experience indicates that 4 6 J 6 8 works well in the context of boosting, with results being fairly insensitive to particular choices in this range. Thus, J is set to 6 and is not further optimized. To tune M; m and g 5-fold crossvalidation is carried out. The parameters resulting in the lowest average validation error are chosen. Initial model runs indicated that up to 200 iterations should suffice for the error measure to converge. The shrinkage value enforces regularization and makes the model more robust to overfitting. The smaller the shrinkage value is, the more regularization is enforced, and more iterations are needed for nRMSEval to converge. Cross-validation is carried out k for m ¼ ½0:9; . . . ; 0:1; 0:09; . . . ; 0:01 with M ¼ 1; . . . ; 200. Fig. 6 for different choices of m as a function of M shows the nRMSEval k for the multi-site GBRT 1 model with a forecast horizon of one (left) and six (right) hours. The results for other forecast horizons look similar. With little shrinkage (high values of m), the validation error converges very fast and after a few iterations the validation error starts to increase due to overfitting. Higher regularization results in a lower validation error and a more robust model. However, the slower convergence also means longer computational time for model fitting. The differences between the models having m < 0:1 are minor after approximately 100 iterations. Subsampling is another regularization strategy, that modifies the algorithm of the GBRT such that the mth regression tree hm is fitted to a fraction g of the current residuals at the mth iteration. Parameter tuning is a time consuming process and ideally M; m and g should be optimized simultaneously. To overcome the computational challenge m is optimized without subsampling (g ¼ 1) and then fixed when optimizing g afterwards. 
The left plot in Fig. 7 shows nRMSE_k^val for different choices of η as a function of M for a forecast horizon of six hours. Notice that ν has been fixed to the value found in the previous tuning, exemplified in Fig. 6. The right plot in Fig. 7 shows a zoom-in of the rectangle in the left plot. Tuning η only results in minor improvements compared to the tuning of ν. Tables 3 and 4 show the parameters resulting in the lowest validation error for each of the four GBRT models; the corresponding nRMSE_k^val is listed next to the parameter values. Recall that all values in Table 4 for the single-site models are averages over 42 individual models, one for each PV installation. Based on the validation errors given in Tables 3 and 4, the multi-site models slightly outperform the single-site models. Furthermore, the models trained on normalized data perform slightly better than the models trained on raw power data.
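The two-stage strategy above — tune the shrinkage with subsampling disabled, then fix it and scan the subsample ratio — can be sketched as follows. The grids and data are illustrative assumptions; scikit-learn's `subsample` parameter implements the stochastic fitting of each tree to a fraction η of the data.

```python
# Sketch of the two-stage tuning: stage 1 fixes subsample eta = 1 and picks
# the shrinkage nu; stage 2 fixes that nu and scans eta.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=1)

def cv_rmse(nu, eta):
    """5-fold cross-validated RMSE for a given (shrinkage, subsample) pair."""
    scores = cross_val_score(
        GradientBoostingRegressor(learning_rate=nu, subsample=eta,
                                  n_estimators=100, max_leaf_nodes=6,
                                  random_state=0),
        X, y, cv=5, scoring="neg_root_mean_squared_error")
    return -scores.mean()

# Stage 1: tune the shrinkage without subsampling (eta = 1).
nus = (0.1, 0.05, 0.03)
best_nu = min(nus, key=lambda nu: cv_rmse(nu, 1.0))

# Stage 2: fix the shrinkage and tune the subsample ratio.
etas = (1.0, 0.7, 0.5, 0.3)
best_eta = min(etas, key=lambda eta: cv_rmse(best_nu, eta))
```

Jointly optimizing (M, ν, η) would multiply the number of fits; the staged search trades a small risk of missing the joint optimum for a large saving in computation, matching the paper's observation that tuning η yields only minor further gains.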
C. Persson et al. / Solar Energy 150 (2017) 423–436
Fig. 6. Model performance of GBRT 1 in terms of validation error nRMSE_k^val for different choices of shrinkage ν as a function of iterations M for forecast horizons k = 1 and k = 6.
Fig. 7. Model performance of GBRT 1 in terms of validation error nRMSE_k^val for different choices of the subsample ratio η as a function of iterations M for forecast horizon k = 6. The plot to the right shows a zoom-in of the rectangle in the left plot.
Table 3. Parameter tuning and validation error nRMSE_k^val after 5-fold cross-validation for the multi-site GBRT models.

Horizon k [h]   GBRT 1 (M, ν, η)    nRMSE_k^val [–]   GBRT 1n (M, ν, η)   nRMSE_k^val [–]
1               (169, 0.06, 1)      0.0948            (130, 0.04, 0.4)    0.0907
2               (103, 0.07, 1)      0.1158            (180, 0.03, 0.2)    0.1106
3               (199, 0.03, 0.2)    0.1220            (82, 0.07, 0.7)     0.1190
4               (126, 0.05, 0.5)    0.1257            (163, 0.04, 0.8)    0.1228
5               (113, 0.05, 0.3)    0.1272            (89, 0.05, 0.2)     0.1252
6               (106, 0.06, 0.5)    0.1278            (58, 0.07, 0.6)     0.1265
Table 4. Parameter tuning and validation error nRMSE_k^val after 5-fold cross-validation for the single-site GBRT models.

Horizon k [h]   GBRT 2 (M, ν, η)     nRMSE_k^val [–]   GBRT 2n (M, ν, η)    nRMSE_k^val [–]
1               (119, 0.049, 0.59)   0.0973            (101, 0.051, 0.62)   0.0937
2               (92, 0.058, 0.52)    0.1184            (80, 0.058, 0.47)    0.1144
3               (103, 0.048, 0.44)   0.1254            (91, 0.052, 0.52)    0.1220
4               (97, 0.055, 0.48)    0.1291            (78, 0.058, 0.50)    0.1257
5               (77, 0.069, 0.52)    0.1316            (81, 0.051, 0.49)    0.1278
6               (98, 0.049, 0.43)    0.1317            (65, 0.069, 0.48)    0.1293
4.3. Feature importance

Feature interpretation is one of the advantages of GBRT models. It is possible to rank features according to their contribution to the model performance. Features frequently used for splitting branches of the regression trees, causing a reduction in training error, are ranked higher than features not used for splitting. By tracking the splits and the associated reduction in training error during the model fit, the features can be ranked according to their importance in the model. Figs. 8 and 9 show the relative importance of the features for lead times k = 1, 3, 6 for the multi-site models, GBRT 1 and GBRT 1n. The features are sorted such that the most important feature appears on top of the bar plots. Y_tod and the zenith angle z both rank high for all three horizons. This indicates that the power generation as a function of the time of day and the position of the sun are valuable predictors for short as well as long forecast horizons. For a short lead time (k = 1), recent observations (e.g. Y_t and Y_t − Y_{t−1}) have a high relative importance, which supports the assumption that observations are highly auto-correlated. For a long forecast horizon (k = 6), the importance of these variables decreases, while the importance of weather forecasts such as cloud cover indices (NWP_CCMid, NWP_CCTotal and NWP_CCLow) and western wind speed (NWP_Wsw) increases. Hence, information about recent power generation is important when predicting on shorter horizons, whereas on longer forecast horizons weather forecasts contain more important information.
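The split-based ranking described above is what scikit-learn exposes as `feature_importances_` for its gradient boosting model. A minimal sketch on synthetic data (the feature names below are illustrative stand-ins for the paper's variables, not its actual feature set):

```python
# Sketch: ranking features by impurity-based importance from a fitted GBRT.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
# Illustrative names only, echoing the variables discussed in the text.
names = ["Y_t", "Y_tod", "zenith_z", "NWP_CCTotal", "NWP_Wsw"]

gbrt = GradientBoostingRegressor(n_estimators=100, max_leaf_nodes=6,
                                 random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort so the most important
# feature comes first, as in the bar plots of Figs. 8 and 9.
ranking = sorted(zip(names, gbrt.feature_importances_),
                 key=lambda t: t[1], reverse=True)
```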
4.4. Model comparison

The forecasting models compared are the Persistence, Climatology, Recursive AR and the single-site and multi-site GBRT models with and without normalized power generation, as described in Section 3. The models are compared on forecast horizons of one to six hours. The left plot in Fig. 10 shows nRMSE_k^test for the different models. The test error of the Persistence increases rapidly over time, but its competitiveness is very clear for the shortest forecast
horizon (k = 1), with a test error of 0.128. The Climatology has an almost constant test error of 0.27 across the forecast horizons, which is not surprising as it simply predicts the average of the historical observations. The differences between the Recursive AR model and the GBRT models become clearer in the right plot in Fig. 10, where the Climatology is omitted. The Recursive AR model has an overall higher nRMSE_k^test than the GBRT models, with a test error ranging from 0.121 to 0.152. The GBRT models perform quite similarly: GBRT 1n performs best on horizons of one to five hours ahead, with a test error ranging from 0.100 to 0.136, and GBRT 1 performs best on the six-hour horizon, with a test error of 0.137. The multi-site models (GBRT 1 and GBRT 1n) perform slightly better than the single-site GBRT models (GBRT 2 and GBRT 2n), especially on forecast horizons greater than three hours. Contrary to the validation errors reported in Tables 3 and 4, the models trained on normalized power data do not perform convincingly better than the models trained on raw power data. Thus, the test results do not show any major benefits from normalizing the power data for the GBRT models. Moreover, the validation errors were in general smaller than the test errors; for e.g. k = 1 they were below 0.10 for all GBRT models. This clear difference between validation and test error is likely a consequence of the seasonal difference between the training and test data sets. Recall that the training set represents April–December and the test set December–February. In order to get some insight into the seasonal differences, Fig. 11 shows the normalized variability, defined in Eq. (19), of each PV installation for the training and test set, respectively. The average variability of the test set is 11% higher than the average variability of the training set.
For a further discussion of the partitioning of the training and test set see Section 5.
V = sqrt( (1/N) Σ_{t=1}^{N} (s_t − s_{t−1})² )    (19)
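Eq. (19) is the root-mean-square of the step-to-step differences of a (normalized) power series s. A plain NumPy sketch:

```python
# Normalized variability of Eq. (19): RMS of first differences of a series.
import numpy as np

def variability(s):
    """V = sqrt(mean((s_t - s_{t-1})^2)) over the series s."""
    d = np.diff(np.asarray(s, dtype=float))
    return float(np.sqrt(np.mean(d ** 2)))

# A constant series has zero variability; an alternating 0/1 series has V = 1.
v_flat = variability([0.5, 0.5, 0.5, 0.5])   # -> 0.0
v_alt = variability([0.0, 1.0, 0.0, 1.0])    # -> 1.0
```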
The left plot in Fig. 12 shows the normalized mean absolute error as defined in Eq. (15). For a better comparison of the Recursive AR model and the GBRT models, the Climatology is omitted. GBRT 1n performs best over all forecast horizons, with a
Fig. 8. Feature importance for GBRT 1 (multi-site and power data).
Fig. 9. Feature importance for GBRT 1n (multi-site and normalized data).
Fig. 10. Test error nRMSE_k^test for the different models for each forecast horizon k. The graph on the right shows a zoom-in of the test error nRMSE_k^test for the Recursive AR model and the GBRT models.
nMAE_k^test ranging from 0.065 to 0.089 for k = 1 to k = 6. Compared to Fig. 10, the differences between the multi-site GBRT models and the normalized GBRT models are slightly more visible. The multi-site models obtain a lower nMAE_k^test than the single-site models. Moreover, for both the single-site and multi-site GBRT models, the variants trained on normalized power data perform slightly better than the models trained on raw power data. The right plot in Fig. 12 shows the normalized mean bias error as defined in Eq. (16). The GBRT models trained on raw power data (GBRT 1 and GBRT 2) are almost unbiased over all forecast horizons. The Persistence, Climatology and Recursive AR models are all positively biased, meaning that they on average predict lower values than observed. The normalized GBRT models (GBRT 1n and GBRT 2n) are negatively biased, meaning that they on average predict higher values than observed. Finally, Fig. 13 shows the skill score
defined in Eq. (17) in terms of nRMSE_k^test with respect to the Persistence. The results are consistent with Fig. 10: GBRT 1n performs best on forecast horizons of one to five hours ahead, with skill scores of 0.22 to 0.65, and GBRT 1 performs best for k = 6, with a skill score of 0.67 over the Persistence.
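The error metrics and the skill score used above can be sketched as small NumPy functions. Since Eqs. (15)–(17) are not reproduced in this excerpt, the exact forms below are assumptions: nMAE and nMBE are taken as mean (absolute) error scaled by a normalization constant, with the bias sign convention matching the text (positive when the model on average under-predicts), and the skill score is taken in the common form 1 − RMSE_model/RMSE_reference.

```python
# Assumed forms of nMAE (Eq. 15), nMBE (Eq. 16) and the skill score (Eq. 17).
import numpy as np

def nmae(obs, pred, norm=1.0):
    """Normalized mean absolute error (form assumed)."""
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(pred))) / norm)

def nmbe(obs, pred, norm=1.0):
    """Normalized mean bias error; positive when the model on average
    predicts lower than observed (sign convention assumed)."""
    return float(np.mean(np.asarray(obs) - np.asarray(pred)) / norm)

def skill_score(obs, pred_model, pred_ref):
    """Skill score, assumed as 1 - RMSE_model / RMSE_reference."""
    rmse = lambda p: np.sqrt(np.mean((np.asarray(obs) - np.asarray(p)) ** 2))
    return float(1.0 - rmse(pred_model) / rmse(pred_ref))

obs = np.array([1.0, 2.0, 3.0, 4.0])
persistence = np.array([0.0, 1.0, 2.0, 3.0])   # lag-one reference forecast
model = np.array([0.9, 2.1, 2.9, 4.1])

ss = skill_score(obs, model, persistence)      # -> 1 - 0.1/1.0 = 0.9
```

A skill score near 1 indicates a large improvement over the persistence; 0 means no improvement, and negative values mean the model is worse than the reference.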
4.5. Local performance

In order to gain insight into the variations across the different PV installations, a brief analysis of the individual PV installations follows. Fig. 14 shows nRMSE_{k=1}^test for each PV installation for the Recursive AR and the GBRT models on a forecast horizon of one hour ahead. The average over the PV installations for each of the models in Fig. 14 corresponds to the values in
Fig. 11. The normalized variability, defined in Eq. (19), of each PV installation for the training and test set, respectively. The dotted lines indicate the global variability, i.e. the average variability over the 42 PV installations.
Fig. 12. Left: test error nMAE_k^test for the different models for each forecast horizon k. In order to identify the differences between the Recursive AR and GBRT models, the Climatology is omitted from the figure. Right: nMBE_k^test for the different models for each forecast horizon k.
Fig. 13. Skill score as defined in Eq. (17) in terms of nRMSE_k^test with respect to the Persistence. The annotations in the boxes show the best skill score values for each horizon. The boxes are colored according to the model with the best skill score for the given horizon. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 14. Test error nRMSE_{k=1}^test for the Recursive AR and the GBRT models on a horizon of one hour ahead for each PV installation.
Fig. 10 for k = 1. Fig. 14 shows that power generation from some PV installations is easier to predict than from others. For example, PV43, PV50 and PV76 clearly have a lower nRMSE_{k=1}^test than e.g. PV47 and PV70. The GBRT models most often outperform the Recursive AR model on the test set for all PV installations, though for e.g. PV21 the Recursive AR model performs better than the GBRT models on a horizon of one hour ahead. In order to explain why the Recursive AR model is slightly superior for PV21 on horizon k = 1, Fig. 15 shows a scatter plot of the variability defined in Eq. (19) over the test set versus the test error nRMSE_{k=1}^test for each PV installation and horizon k = 1. The annotation inside each marker is the PV installation number, and the color of each marker indicates the best performing model according to the color legend in the upper left corner of the figure. In Fig. 15 it is clear that PV21 has the lowest variability over the test set among all the PV installations. A closer look at the rest of the PV installations in Fig. 15 shows that the best performing model varies. Except for PV21, the multi-site GBRT models (GBRT 1 and GBRT 1n) perform best on the majority of the installations. The single-site model GBRT 2 also performs best on a couple of installations. Finally, Fig. 16 shows the location of the PV installations, similar to Fig. 1. The markers are scaled by the best nRMSE_{k=1}^test obtained among the different models for k = 1. The figure looks similar for the other forecast horizons. It is hard to definitively conclude whether the location of the PV installations influences the performance of the forecasts, though the easternmost installations tend to have smaller prediction errors. Whether it is the climate or other physical or mechanical factors that influence this observed variation in performance among the PV installations is hard to conclude.

5. Discussion

This paper investigates the potential use of gradient boosted regression trees (GBRT) for forecasting of solar power generation in a multi-site framework. Regression trees are not originally designed for forecasting tasks dealing with temporal correlations. Instead of investigating the temporal correlations in detail before initiating the modelling, lagged variables are used as input features, and it is then up to the procedure to carry out the internal feature selection. State-of-the-art techniques for solar power forecasting use time-adaptive methods, which are able to update the model settings as each new observation arrives. GBRT models do not have a recursive version which updates parameters online. However, if the algorithm is implemented efficiently, such that the computation time for model fitting remains low, the model can be updated by re-fitting whenever a sufficient amount of new observations is available. Spatial correlation is currently a highly relevant research topic within solar power forecasting. Computing correlation metrics between multiple PV installations can be computationally intense even for low numbers of installations. GBRT models do not need
Fig. 15. Scatter plot of the variability defined in Eq. (19) versus the test error nRMSE_{k=1}^test for each PV installation on horizon k = 1. The annotations indicate the number of the PV installation and the color indicates the best performing model.
Fig. 16. Location of the 42 PV installations. The size of the markers is scaled by nRMSE_{k=1}^test based on the best performing model.
pre-computed similarity measures between installations. Including all installations in one multi-site model automatically makes spatial information available: similar historical events are clustered together in the terminal nodes of the regression trees. Thus, predictions for a given PV installation will be based on historical events from other PV installations with patterns similar to those of the given installation. In Section 4 the GBRT models were superior to the benchmark models on all forecast horizons. However, it should be mentioned that the benchmark models are all very simple. In practice, more complex time-series models would be applied for forecasting of solar power. An obvious modification to the Recursive AR model (also the best performing of the benchmarks) would be to include additional regressors and apply an appropriate regularization technique, or even to implement a vectorized version as in Bessa et al. (2015). Section 4 also investigated the differences between the training and the test set. Clearly, the test set contained more variability than the training set, which resulted in an overall higher test error than expected. Optimally, the two data sets should be fully comparable. Instead of keeping the chronology of the data in the training and test set, one could consider generating the two sets by sampling over the entire data set such that both sets would contain samples from all 10 months. This approach would only be possible for the GBRT models and not for the Climatology, Persistence and Recursive AR. Thus, in order to be able to compare the different models on a common test set, the limitations regarding the differences between the training and test set were accepted for this study. Another issue to be addressed is the model output space. The predictions from the models trained on normalized power data were transformed back to raw power data and then evaluated and compared to the other models. However, the optimal model settings,
with regard to normalized power data, are not necessarily optimal for raw power data, which makes the comparison slightly biased in favor of the models trained on raw power data. An interesting continuation of the present study would be to investigate the use of GBRT models for probabilistic forecasting. By substituting the loss function of the GBRT model it is possible to perform quantile regression (Scikit-learn, 2015b), which consequently provides probabilistic forecasts.
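The loss substitution mentioned above is directly supported by scikit-learn via `loss="quantile"`. A hedged sketch on synthetic data — fitting one model per quantile level so the levels together form a predictive interval (levels and data are illustrative):

```python
# Sketch: probabilistic forecasting by swapping the GBRT loss for the
# quantile (pinball) loss, one model per quantile level alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)

quantile_models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q,
                                 n_estimators=100, max_leaf_nodes=6,
                                 random_state=0).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}

# Each model predicts one conditional quantile; the 0.1 and 0.9 predictions
# together give a (nominal) 80% predictive interval.
preds = {q: m.predict(X) for q, m in quantile_models.items()}
```

Note that independently fitted quantile models can occasionally produce crossing quantiles at individual points; in practice the predicted quantiles may need to be sorted per forecast.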
6. Conclusion

This paper approaches the challenge of forecasting solar power generation in a multi-site framework, applying experience and knowledge from the field of machine learning. Preprocessing of data results in solar power time series spanning 316 days for 42 individual PV installations located in Japan. Lagged observations and different NWPs are used as explanatory variables to predict the PV power generation on time horizons from one to six hours ahead. Gradient boosted regression trees (GBRT) are fitted to the data, resulting in a multi-site model for each time horizon. The GBRT models are thoroughly analyzed and optimized. Model selection and parameter tuning are carried out using 5-fold cross-validation. The influence of the input features is explored by ranking the explanatory variables according to their contribution to the GBRT model. Feature analysis shows that variables related to lagged observations are more important for shorter forecast horizons, while on longer horizons the importance of weather forecasts increases. Fitting the GBRT models on both normalized and raw power data shows that GBRT models are invariant to this scaling of variables, and no major benefits are achieved by normalizing power data. The performance of the GBRT models measured in
normalized root mean squared error is compared to that of simple benchmark models: a persistence, a climatology and a recursive autoregressive model. The GBRT models outperform all presented benchmark models on each forecast horizon of one to six hours ahead. The best results in terms of normalized root mean squared error were between 0.100 and 0.137 for lead times k = 1 to k = 6, respectively. These results correspond to skill scores over a persistence model ranging between 0.22 and 0.67 for forecast horizons between one and six hours ahead. The GBRT models are able to capture highly non-linear patterns between input variables and target response in a fast and robust manner. Conventional time-series methods are characterized by recursively updating the model as new observations arrive, making them time-adaptive. The GBRT models have no simple updating procedure; however, if implemented efficiently, the model can be re-fitted when a sufficient amount of new data is available.

References

Almeida, M.P., Perpian, O., Narvarte, L., 2015. PV power forecast using a nonparametric PV model. Sol. Energy 115, 354–368.
Antonanzas, J., Osorio, N., Escobar, R., Urraca, R., Martinez-de Pison, F.J., Antonanzas-Torres, F., 2016. Review of photovoltaic power forecasting. Sol. Energy 136, 78–111.
Bacher, P., Madsen, H., Nielsen, H.A., 2009. Online short-term solar power forecasting. Sol. Energy 83 (10), 1772–1783.
Bessa, R.J., Trindade, A., Silva, C.S.P., Miranda, V., 2015. Probabilistic solar power forecasting in smart grids using distributed information. Int. J. Electr. Power Energy Syst. 72, 16–23.
Coimbra, C.F.M., Kleissl, J., Marquez, R., 2013. Overview of solar-forecasting methods and a metric for accuracy evaluation. Sol. Energy Forecast. Resource Assess., 171–194.
European Commission, 2011. A Roadmap for Moving to a Competitive Low Carbon Economy in 2050. (accessed 15/02/2016).
Hastie, T., Tibshirani, R., Friedman, J., 2011. The Elements of Statistical Learning. Springer.
Hoff, T.E., Perez, R., Kleissl, J., Renne, D., Stein, J., 2013. Reporting of irradiance modeling relative prediction errors. Prog. Photovoltaics 21 (7), 1514–1519.
Huang, J., Korolkiewicz, M., Agrawal, M., Boland, J., 2013. Forecasting solar radiation on an hourly time scale using a coupled autoregressive and dynamical system (CARDS) model. Sol. Energy 87, 136–149.
JMA, 2013. Outline of the Operational Numerical Weather Prediction at the Japan Meteorological Agency. URL jma/jma-eng/jma-center/nwp/outline2013-nwp/pdf/outline2013_03.pdf (accessed 02/12/2015).
Kim, M., Pavlovic, V., 2008. Dimensionality reduction using covariance operator inverse regression. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1–8.
Koenker, R., 2005. Quantile Regression. Cambridge University Press.
Madsen, H., 2008. Time Series Analysis. Chapman and Hall/CRC.
Marquez, R., Coimbra, C.F., 2011. Forecasting of global and direct solar irradiance using stochastic learning methods, ground experiments and the NWS database. Sol. Energy 85 (5), 746–756.
Marquez, R., Coimbra, C.F.M., 2013. Proposed metric for evaluation of solar forecasting models. J. Sol. Energy Eng. – Trans. ASME 135 (1), 011016.
Marquez, R., Pedro, H.T., Coimbra, C.F., 2013. Hybrid solar forecasting method uses satellite imaging and ground telemetry as inputs to ANNs. Sol. Energy 92, 176–188.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Pedro, H.T., Coimbra, C.F., 2012. Assessment of forecasting techniques for solar power production with no exogenous inputs. Sol. Energy 86 (7), 2017–2028.
Ragnacci, A., Pastorelli, M., Valigi, P., Ricci, E., 2012. Exploiting dimensionality reduction techniques for photovoltaic power forecasting. In: IEEE International Energy Conference and Exhibition (ENERGYCON), 2012, pp. 867–872.
Reda, I., Andreas, A., 2004. Solar position algorithm for solar radiation applications. Sol. Energy 76 (5), 577–589.
Ridgeway, G., 2007. Generalized boosted models: a guide to the gbm package.
Scikit-learn, 2015a. Scikit-learn 0.17 Documentation: Ensemble Methods. (accessed 30/12/2015).
Scikit-learn, 2015b. Scikit-learn 0.17 Documentation: Ensemble Methods. (accessed 21/02/2016).
Sharma, N., Sharma, P., Irwin, D., Shenoy, P., 2011. Predicting solar generation from weather forecasts using machine learning. In: IEEE International Conference on Smart Grid Communications (SmartGridComm), 2011, pp. 528–533.
Zamo, M., Mestre, O., Arbogast, P., Pannekoucke, O., 2014a. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part I: Deterministic forecast of hourly production. Sol. Energy 105, 792–803.
Zamo, M., Mestre, O., Arbogast, P., Pannekoucke, O., 2014b. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part II: Probabilistic forecast of daily production. Sol. Energy 105, 804–816.