Renewable Energy 80 (2015) 770–782
Nearest-neighbor methodology for prediction of intra-hour global horizontal and direct normal irradiances

Hugo T.C. Pedro, Carlos F.M. Coimbra*

Department of Mechanical and Aerospace Engineering, Jacobs School of Engineering, Center of Excellence in Renewable Energy Integration, Center for Energy Research, University of California San Diego, La Jolla, CA 92093, USA
Article history: Received 13 January 2015; Accepted 25 February 2015; Available online 23 March 2015.

Abstract
This work proposes a novel forecast methodology for intra-hour solar irradiance based on optimized pattern recognition from local telemetry and sky imaging. The model, based on the k-nearest-neighbors (kNN) algorithm, predicts the global (GHI) and direct (DNI) components of irradiance for horizons ranging from 5 min up to 30 min, together with the corresponding uncertainty prediction intervals. An optimization algorithm determines the best set of patterns and the other free parameters in the model, such as the number of nearest neighbors. Results show that the model achieves significant forecast improvements (between 10% and 25%) over a reference persistence forecast. They also show that large ramps in the irradiance time series are not captured very well by the point forecasts, mostly because those events are underrepresented in the historical dataset. The inclusion of sky images in the pattern recognition yields only a small improvement (below 5%) relative to the kNN without images, but it helps in the definition of the uncertainty intervals (especially in the case of DNI). The prediction intervals determined with this method show good performance, with high probability coverage (≈90% for GHI and ≈85% for DNI) and narrow average normalized width (≈8% for GHI and ≈17% for DNI). © 2015 Elsevier Ltd. All rights reserved.
Keywords: Smart solar forecasting; Global and direct irradiance; Machine learning; k-nearest-neighbors
1. Introduction

Forecasting of intra-hour solar irradiance and solar power output is important for power grid regulators, which seek to reduce power system operation costs by committing appropriate amounts of energy resources and reserves. Intra-hour forecasts are also relevant for optimal central plant operations, and they are an enabling technology for the optimal dispatch of ancillary resources and storage systems [15]. Increased accuracy in irradiance predictions can enable substantial improvements in the automatic generation control tools used to balance generation and load. Moreover, mitigation measures for large drops in solar irradiance, such as demand response, storage and intra-hour scheduling, can only be maximized with accurate and reliable intra-hour forecasts [15].

* Corresponding author. E-mail addresses: [email protected] (H.T.C. Pedro), [email protected] (C.F.M. Coimbra). http://dx.doi.org/10.1016/j.renene.2015.02.061

Short-term fluctuations in solar irradiance are dictated, almost exclusively, by cloud cover. A single passing cloud can bring the power output of a solar farm from full production to minimum and back to full in a matter of minutes or even seconds [27]. Thus, it is not surprising that many recent short-term forecast models incorporate cloud cover information. In general, the methodology for processing ground-based images and including them in the forecast models is based on cloud motion detection and the propagation of the cloud field into the future [8–11,13,21–23,29]. Another approach to modeling and forecasting cloud cover consists of studying the sunshine number, a binary number equal to unity if the sun is shining and 0 if it is not. Parametric and semiparametric approaches based on Markovian dynamics of the sunshine number have shown good results for nowcasting [4,5,26] and are good candidates to improve the intra-hour forecast of solar irradiance, especially when large irradiance fluctuations occur.

Contrary to the previous works, no cloud motion detection or distribution estimation is performed in this work. Here we propose a distribution-free methodology based on the k-nearest-neighbors (kNN) algorithm to predict intra-hour global (GHI) and direct (DNI) irradiances using ground telemetry and sky images. The data used in this work consist of pyranometer measurements of GHI and DNI and sky images captured with an off-the-shelf security camera pointing to the zenith. The main novelty of this study is the development of (i) specific techniques to extract information from the irradiance data, (ii) image processing techniques to quantify sky
Nomenclature

B_i     backward cumulative average of irradiance for window [t − id, t]
DNI     direct normal irradiance
d, D    distance vectors for the kNN forecast
GHI     global horizontal irradiance
I       measured irradiance, GHI or DNI, in W m−2
Î^m     forecasted irradiance, GHI or DNI, using model m, in W m−2
L_i     lagged 5-min average of irradiance for window [t − id, t − (i−1)d]
k       number of nearest neighbors
k_t     clear-sky index, GHI or DNI
kNN     k nearest neighbors
PI      prediction interval
PICP    prediction interval coverage probability
PINAW   prediction interval normalized average width
r, g, b red, green and blue channels for the sky images
RMSE    root mean squared error
s       forecast skill
S       set of features used in a given kNN forecast
V_i     irradiance variability for window [t − id, t]
t       time, in min
α       weights for the kNN forecast
d       window size or window increments, in min
Δt      forecast horizon, in min
Δk_t    clear-sky step change
ω       weights for the kNN distances

Superscripts
clr     clear-sky model
g, d    GHI, DNI
conditions from the sky images, and (iii) techniques for integrating this information into the forecast models. The images are not used to monitor cloud dynamics; instead, they are used to provide features, such as the entropy of the Red, Green and Blue channels, that determine the nearest neighbors from the historical dataset.

The kNN method is one of the simplest machine learning algorithms. In contrast to statistical methods that attempt to fit models to the available data, the kNN uses the training dataset as the model. Although the kNN model was originally developed for pattern classification, it can easily be applied to regression problems for time series [34], such as the forecasting of solar radiation. While there are very few articles in the literature that apply kNN to the forecasting of solar irradiation (see for instance [25,27]), kNN has been extensively applied as a forecasting technique to problems such as electricity load and electricity price [18,19], daily river water temperature [31], water inflow [1] and weather forecasting [2]. In the field of meteorology and weather forecasting the kNN method is also known as the analog method [30,33,35].

For the purpose of forecasting time series, the kNN model consists of looking into the time series history and identifying the time stamps in the past that most closely resemble the current conditions - the nearest neighbors. Once these are determined, the forecast is computed from the time series values subsequent to the matches. In essence, the kNN model resembles a lookup table in which previous features are used as indicators of subsequent behavior.

Another goal of this work is to propose a method to estimate the PIs for the forecast. A PI is an estimate of an interval in which a future measured value will fall, given a lower and an upper bound [7]. PIs provide information about the forecast uncertainty and are very important for operational planning.
Usually a probability is associated with the PI to quantify the confidence level of the interval. In the field of wind power forecasting there are many significant works dedicated to calculating PIs [3,17,28,32]. The same is not true in the field of solar power forecasting, where most of the literature focuses on point forecasts and PIs are rarely provided; a few exceptions are the works in Refs. [6,12,16,20,24]. In this work we propose a simple algorithm to calculate the PI bounds using the nearest neighbors. The quality of the PIs is measured in terms of their probability coverage and average normalized width.
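The lookup-table behavior described above can be illustrated with a minimal sketch. The data and function names below are ours, not the authors' implementation: the most recent window of the series is matched against all past windows, and the values that followed the best matches are averaged.

```python
import numpy as np

def knn_forecast(history, pattern_len, horizon, k):
    """Toy kNN time-series forecast: find the k past windows most similar
    to the latest pattern_len values, then average the value that followed
    each matched window, `horizon` steps after it ended."""
    pattern = history[-pattern_len:]
    # candidate windows must leave room for the subsequent `horizon` values
    n_candidates = len(history) - pattern_len - horizon
    dists = np.array([
        np.linalg.norm(history[i:i + pattern_len] - pattern)
        for i in range(n_candidates)
    ])
    nearest = np.argsort(dists)[:k]          # indices of the k best matches
    futures = [history[i + pattern_len + horizon - 1] for i in nearest]
    return float(np.mean(futures))

# a strictly periodic synthetic "clear-sky index" series: matches are exact
series = np.tile(np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0]), 300)
print(knn_forecast(series, pattern_len=6, horizon=1, k=3))  # next value: 0.0
```

Because the synthetic series repeats exactly, the nearest neighbors are perfect matches and the forecast reproduces the true continuation; with real irradiance data the neighbors are only approximate, which is where the weighting and feature design of the paper matter.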
2. Data

The solar irradiance measurements (GHI and DNI) were obtained at Folsom, CA (38.64° N, 121.14° W), at a sampling rate of one measurement per minute. The dataset contains one year of data (from December 2012 to December 2013). The irradiance data were averaged into 5-min windows and divided into three disjoint datasets. The first dataset, denoted the training or historical dataset, is used to create the kNN database of features from which the nearest neighbors are selected. The second dataset, denoted the optimization dataset, is used in the optimization algorithm to determine the several free parameters (explained below) in the forecasting model. The third dataset, the independent testing set, is used to assess the performance of the forecasting model. The three datasets were constructed by grouping disjoint subsets of each month in the whole dataset, so that the training, optimization and testing sets all contain similar data. Fig. 1 shows all the data for the three datasets. In this figure the scatter plots show kt for GHI versus the normalized time u (u = −1 at sunrise and u = 1 at sunset). The whiskers allow the data variability to be visualized as a function of the normalized time. The histograms show the relative frequency of kt for the entire datasets, on a logarithmic scale. These plots show that the three datasets have similar data distributions. The size of each dataset is also shown in the figure.

Sky images were captured at a rate of one image per minute over the same period with a security camera (Vivotek, model FE8171V) pointing to the zenith. The advantages of this camera include high resolution, easy installation, low cost, and the absence of moving parts. On the other hand, given that direct sunlight is not blocked, the circumsolar region is affected by glare caused by forward Mie scattering and by light scattered from the dome. The camera provides 24-bit compressed JPEG images, with 8 bits per color channel (Red, Green and Blue), at 1563 by 1538 pixels. After removing the pixels that do not correspond to the sky dome and the saturated pixels (yellow and orange shaded areas in Fig. 1d, respectively), only ≈50% of the pixels are usable. The images were synchronized with the irradiance data and also divided into the three datasets. Finally, in this work we consider only time stamps for which the sun elevation angle is above 5°.

3. Methodology

3.1. Persistence

The persistence model is used in this work as a baseline. In this model we assume that the clear-sky index for the forecasted variable (GHI or DNI) persists into the future. The clear-sky index is defined as
Fig. 1. (a) GHI kt for the training/historical dataset, (b) GHI kt for the optimization dataset, (c) GHI kt for the testing dataset. The scatter plots show all data points (the light gray dots) versus a normalized time, u (u = −1 at sunrise and u = 1 at sunset). The whisker end points represent the 0th, 25th, 75th and 100th percentiles for each of 50 equally spaced bins in u. The black dot between the whiskers identifies the 50th percentile. The histogram shows the GHI kt frequency for 50 bins in the range [0, 2]. The annotation in the scatter plot indicates the number of data points in the corresponding dataset. (d) An example of a sky image used in this work. The yellow shaded area marks pixels that do not correspond to sky and are not considered in the analysis. The orange area is also removed from the analysis because of saturation due to glare. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
\[ k_t(t) = \frac{I(t)}{I^{\mathrm{clr}}(t)} \tag{1} \]

where I is the solar irradiance, GHI or DNI, and I^clr is the clear-sky irradiance computed following the algorithm in Ref. [14]. For a given forecast horizon Δt, the forecast of the average irradiance in the interval [t, t + Δt], denoted Î(t + Δt), is calculated as

\[ \hat{I}^{p}(t + \Delta t) = \langle k_t \rangle_{[t - \Delta t,\, t]} \, \langle I^{\mathrm{clr}} \rangle_{[t,\, t + \Delta t]} \tag{2} \]

where ⟨·⟩_{[t, t ± Δt]} indicates the average over the window [t, t ± Δt]. In this expression the window [t − Δt, t] that defines the persistence and the window [t, t + Δt] for the forecast have the same length. However, there is no reason why they must be coupled. Thus, in this work we also explore the optimization of the persistence model by considering the following implementation:

\[ \hat{I}^{p_d}(t + \Delta t) = \langle k_t \rangle_{[t - d,\, t]} \, \langle I^{\mathrm{clr}} \rangle_{[t,\, t + \Delta t]} \tag{3} \]

where d ∈ {5, 10, …, 120} min is determined via an exhaustive search such that it minimizes the RMSE over the optimization dataset:

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{n_O} \sum_{i=1}^{n_O} \left( \hat{I}_i - I_i \right)^2 } \tag{4} \]
where n_O is the number of data points in the optimization dataset. In this work we used the RMSE as the main error metric since it emphasizes the larger errors. A second metric, used to evaluate the improvement relative to the baseline model, is the forecast skill, given by:

\[ s = \left( 1 - \frac{\mathrm{RMSE}_m}{\mathrm{RMSE}_0} \right) \times 100\ [\%] \tag{5} \]

where RMSE_0 is the RMSE for the model described by Eq. (2) and RMSE_m is the RMSE for model m.

3.2. k-nearest-neighbors

The kNN uses past data of the endogenous variable (GHI or DNI) and exogenous variables (the sky images and DNI or GHI) to predict the endogenous variable for forecast horizons from 5 min to 30 min in 5-min increments. The first step in developing a kNN model is to construct the database of features that will be used in the comparison with current conditions.

3.2.1. Irradiance

In this work all the irradiance-derived values were computed using the clear-sky index, thus removing the daily solar cycle from them. The first feature included in the kNN database is the backward average of the clear-sky index time series. For a given time stamp t this feature is given by the vector B(t) with components:
\[ B_i(t) = \frac{1}{N} \sum_{\tau \in [t - id,\, t]} k_t(\tau), \quad i = \{1, 2, \ldots, 24\} \tag{6} \]

where d is a minimum window size (equal to 5 min in this work), and N is the number of data points in the interval [t − id, t]. The
index i varies from 1 to 24, which means that the smallest window is 5 min and the longest window is 120 min.

The second feature is the lagged 5-min average of the clear-sky index time series. For a given time stamp t this feature is given by the vector L(t) with components:

\[ L_i(t) = \frac{1}{N} \sum_{\tau \in [t - id,\; t - (i-1)d]} k_t(\tau), \quad i = \{1, 2, \ldots, 24\}. \tag{7} \]

The last feature extracted from the GHI time series is the variability V(t), whose components for the time stamp t are:

\[ V_i(t) = \sqrt{ \frac{1}{N} \sum_{\tau \in [t - id,\; t]} \Delta k_t(\tau)^2 }, \quad i = \{1, 2, \ldots, 24\} \tag{8} \]

where Δk_t(t) = k_t(t) − k_t(t − Δt). Fig. 2 shows these three vectors computed for the time stamp "2013-02-22 15:30:00 PST". The left panel highlights (shaded area) the data in the window between 13:30 and 15:30, which is used to compute the three features. In the first 2/3 of the window the data are highly variable; in the last 1/3 the sky clears up and GHI approaches the clear-sky value. This change is clearly reflected in B, L, and V, shown on the right of Fig. 2.

3.2.2. Sky image

Besides irradiance, cloud cover information can also be incorporated into the forecast model using the 8-bit RGB channels from the sky images. The algorithm used to extract the features consists of the following steps:

1. Apply a binary mask to select the pixels that correspond to sky. Pixels corresponding to trees, buildings, etc., and the circumsolar region are excluded (see Fig. 1d).
2. Convert the 8-bit RGB channels into floating-point vectors (r, g and b, respectively).
3. Compute the red-to-blue ratio vector with components r_i/b_i.
4. For each of these 4 vectors (r, g, b and the red-to-blue ratio) compute the following 3 quantities: the average

\[ m = \frac{1}{N} \sum_{i=1}^{N} v_i, \tag{9} \]

the standard deviation

\[ s = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (v_i - m)^2 }, \tag{10} \]

and the entropy

\[ e = - \sum_{\substack{i = 1 \\ p_i \neq 0}}^{N_B} p_i \log_2(p_i), \tag{11} \]

where v_i represents one of the vectors, N is the number of elements in the vector (equal to the number of selected pixels in step 1), and p_i is the relative frequency for the i-th bin (out of N_B = 256 evenly spaced bins). This algorithm is applied to every image, returning 12 values (the 3 metrics m, s and e for each of the 4 vectors). The cloud cover information included in the kNN database is then assembled from these values. For example, R_m(t), which gathers the averages of the red channel for the sky images leading up to time t, has components m_R(t − id), i = {0, 1, …, 9}, where m_R is computed from Eq. (9) for the Red channel and d is the window increment, equal to 1 min. The bottom graphs in Fig. 3 show the average and the entropy of the R, G and B channels, computed from Eqs. (9) and (11), for the 9 images. This figure clearly shows that the variation in the cloud cover conditions is well represented by the variation of the two features shown.
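The feature computations of Eqs. (6)–(11) can be sketched as follows. This is our own illustrative code, not the authors' implementation: it assumes a clear-sky-index series in which one sample stands for one 5-min window, and an image channel flattened to a float array scaled to [0, 1].

```python
import numpy as np

def irradiance_features(kt, n=24):
    """Backward averages B_i, lagged-window averages L_i and variability V_i
    (Eqs. (6)-(8)); with one sample per 5-min window, the index i = 1..24
    spans windows of 5 min up to 120 min."""
    dkt = np.diff(kt, prepend=kt[0])          # clear-sky-index step changes
    B, L, V = [], [], []
    for i in range(1, n + 1):
        B.append(kt[-i:].mean())                          # Eq. (6)
        end = len(kt) - (i - 1)
        L.append(kt[end - 1:end].mean())                  # Eq. (7), lagged window
        V.append(np.sqrt(np.mean(dkt[-i:] ** 2)))         # Eq. (8)
    return np.array(B), np.array(L), np.array(V)

def channel_stats(v, n_bins=256):
    """Mean, standard deviation and entropy of one image channel
    (Eqs. (9)-(11)); v is a float array scaled to [0, 1]."""
    m, s = v.mean(), v.std()
    counts, _ = np.histogram(v, bins=n_bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    p = p[p > 0]                              # keep only bins with p_i != 0
    e = -np.sum(p * np.log2(p))               # Shannon entropy in bits
    return m, s, e
```

For a constant series the backward and lagged averages equal that constant and the variability is zero; for a uniform image channel the entropy is zero, since all probability mass falls in one bin.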
3.2.3. kNN forecast

All the variables described in the previous subsections were computed for every data point in the three datasets. The metadata for these variables are described in Table 1. The features are assembled row-wise into matrices, in which each row corresponds to a different time stamp. There are 18 matrices in total, 6 from the irradiance data and 12 from the sky images. The kNN forecast is computed using the features assembled in the matrices in a two-step process. In the first step we compute the distance between the variables for the new dataset (the optimization or the testing set) and the features in the historical dataset. For a given set of features S = {p_1, …, p_n} (any one of the 18
Fig. 2. GHI features for the time stamp "2013-02-22 15:30:00 PST". Left: GHI, clear-sky GHI and the clear-sky index for GHI over a 6-h window on 2013-02-22 at Folsom, CA. The arrow indicates how the window [t − id, t] grows with the index i. The shaded window shows the maximum 2-h window used to compute the features for this time stamp. Right: the 3 features computed using Eqs. (6)–(8).
available) in the new dataset with lengths N_1, …, N_n, the distances to the historical data are computed as:

\[ d_{ij} = \sum_{k=1}^{N_i} \sqrt{\left(p_{ik} - P_{ikj}\right)^2} \tag{12} \]

where P_i is the matrix that contains the corresponding feature i for the historical dataset. Then, the n distance vectors are combined into a single vector

\[ D = \omega_1 \tilde{d}_1 + \cdots + \omega_n \tilde{d}_n \tag{13} \]

where \tilde{d}_i is the normalized distance for the i-th vector

\[ \tilde{d}_i = \frac{d_i}{\max d_i - \min d_i} \tag{14} \]

and the ω_i are integer numbers that specify the weight of each distance vector in the global distance D. In this implementation we apply the Euclidean distance to compute the distances in Eq. (12); distance notions that capture long-term statistical relationships between features, such as the Mahalanobis distance, could also be considered.

In the second step the distance D is sorted in ascending order, and the first k elements D_s (D_{s,1} ≤ D_{s,2} ≤ ⋯ ≤ D_{s,k}) and their associated k time stamps {t_1, …, t_k} are extracted. Using the historical data subsequent to these time stamps we compute k forecasts:

\[ \hat{f}_i(t + \Delta t) = \langle k_t \rangle_{[t_i,\, t_i + \Delta t]} \, \langle I^{\mathrm{clr}} \rangle_{[t,\, t + \Delta t]}, \quad i = 1, \ldots, k \tag{15} \]

and the final forecast is computed as:

\[ \hat{I}^{kNN}(t + \Delta t) = \frac{\sum_{i=1}^{k} \alpha_i \hat{f}_i(t + \Delta t)}{\sum_{i=1}^{k} \alpha_i}. \tag{16} \]

In other words, the prediction Î^{kNN} is a linear function of the time series values that follow the nearest-neighbor time stamps t_i. The weights α are a function of the distances D_s for the nearest neighbors:

\[ \alpha_i = \left( 1 - \frac{D_{s,i}}{\max D_s - \min D_s} \right)^{n}, \quad i = 1, \ldots, k \tag{17} \]

Table 1
Metadata for the kNN features.

Source                        Feature              Eq.   Window size (min)   Increment (min)   Feature length
Irradiance, GHI or DNI        Backward average     6     5 to 120            5                 24
                              Lagged average       7     5 to 120            –                 24
                              Variability          8     5 to 120            5                 24
Sky images, R, G, B or R/B    Mean                 9     1 to 10             1                 10
                              Standard deviation   10    1 to 10             1                 10
                              Entropy              11    1 to 10             1                 10
where n is an adjustable positive integer parameter. If n = 0 the weights are all equal to 1 and the k nearest neighbors are equally weighted; if n = 1 the k nearest neighbors are linearly weighted, with the closest neighbor having unit weight and the farthest neighbor zero weight.

3.2.4. kNN optimization

To obtain the kNN forecast from the algorithm described above, several parameters need to be specified:

1. The number of nearest neighbors, k ∈ {1, 2, …, 150}.
2. The set of features S, i.e., which features are used in the search for the nearest neighbors. The set of selected features can contain the same feature twice but with different lengths. In this context the GHI backward average from 5 min to 30 min, for example, is considered a different variable from the GHI backward average from 5 min to 60 min. The different lengths are accounted for through the N_i in Eq. (12). It is important to leave this parameter free, as it is likely that some lengths will be more appropriate than others for different forecast horizons.
3. The weights ω ∈ {1, 2, …, 10} in the calculation of D in Eq. (13).
4. The exponent n ∈ {1, 2, …, 5} for the weights α_i in Eq. (17).

Given that the forecast accuracy depends on these four parameters, we used an iterative exhaustive search to select them.
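The exhaustive search over the integer parameters can be sketched as below. The `evaluate` objective is a hypothetical stand-in for running the kNN forecast on the optimization set and returning its RMSE (Eq. (18)); the feature-set dimension S is omitted for brevity, and the exponent grid includes 0 since equally weighted neighbors (n = 0) appear in the optimal-parameter tables.

```python
import itertools

def evaluate(k, omega, n):
    # Hypothetical placeholder objective; in the paper this would be the
    # RMSE of the kNN forecast over the optimization dataset (Eq. (18)).
    return (k - 30) ** 2 + (omega - 1) ** 2 + n

# grid over the integer free parameters: k in 1..150, omega in 1..10, n in 0..5
grid = itertools.product(range(1, 151), range(1, 11), range(0, 6))
best = min(grid, key=lambda params: evaluate(*params))
print(best)  # → (30, 1, 0), the minimizer of the placeholder objective
```

The search space has only 150 × 10 × 6 = 9000 points per feature set, which is why an exhaustive scan is feasible here.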
Fig. 3. Top: GHI for the same day as in Fig. 2. Middle: 9 sky images for the period 13:30 to 15:30. Bottom: the average of the R, G and B channels (left) and the entropy of the R, G and B channels (right) for the 9 sky images.
The best set of parameters is determined such that the forecast error (for the optimization dataset) is minimized:

\[ \underset{k,\, S,\, \omega,\, n}{\operatorname{argmin}}\; \sqrt{ \frac{1}{n_O} \sum_{i=1}^{n_O} \left( \hat{I}_i(t + \Delta t;\, k, S, \omega, n) - I_i(t + \Delta t) \right)^2 } \tag{18} \]

where n_O is the number of forecast instances in the optimization set. The exhaustive method is appropriate in this case because of the small search space: three of the free parameters are integers (ω, n, k) and the possible combinations of feature sets S number just a few thousand.

3.2.5. Prediction intervals

The algorithm described above returns a single value (or point forecast), but it provides enough information to also calculate PIs. Popular approaches to quantifying PIs often rely on the assumption of normality, but in this case that assumption does not always hold. For example, whenever the irradiance approaches its natural limits, the distribution of possible outcomes tends toward asymmetric distributions such as the gamma or the exponential distribution. For this reason, we define the PIs empirically using the extreme values of the individual forecasts produced by the k nearest neighbors, given by Eq. (15). Thus, the lower and upper boundaries of the PIs are given by:

\[ \{ L(t+\Delta t),\, U(t+\Delta t) \} = \left\{ \min_i \hat{f}_i(t+\Delta t),\; \max_i \hat{f}_i(t+\Delta t) \right\} \tag{19} \]

In order to assign a confidence level to these intervals we compute the PICP, that is, the probability that an actual irradiance value I(t+Δt) falls within the range [L(t+Δt), U(t+Δt)]:

\[ \mathrm{PICP} = \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i \tag{20} \]

where N is the number of forecast instances and ε_i is a boolean variable given by

\[ \varepsilon_i = \begin{cases} 1 & \text{if } I(t+\Delta t) \in [L(t+\Delta t),\, U(t+\Delta t)] \\ 0 & \text{if } I(t+\Delta t) \notin [L(t+\Delta t),\, U(t+\Delta t)] \end{cases} \tag{21} \]

A second metric used to measure the quality of the prediction intervals is the PINAW:

\[ \mathrm{PINAW} = \frac{1}{N\, I_{\max}} \sum_{i=1}^{N} \big( U(t+\Delta t) - L(t+\Delta t) \big) \tag{22} \]

where I_max is set to 1 kW m−2 for both GHI and DNI. This parameter measures the PI width relative to the maximum range observed in the irradiance. If the PINAW is large and approaches the maximum possible range, the PI has little value, as it is trivial to state that the future irradiance will lie between its extreme possible values.

4. Results and discussion

4.1. Persistence model optimization

The optimal window length d for the persistence model defined in Eq. (3) is listed in Table 2 as a function of the forecast variable and the forecast horizon. The table also lists the improvement with respect to the reference persistence model given by Eq. (2) for the optimization set and the independent testing set.

Table 2
Optimal window length d for the persistence model given by Eq. (3). The length is in minutes and the improvement over the baseline persistence model (Eq. (2)) is given in percent.

            GHI                       DNI
Δt (min)    d     sOpt    sTes       d    sOpt    sTes
5           5     0.0     0.0        5    0.0     0.0
10          5     1.7     2.4        5    5.4     6.2
15          5     1.5     2.7        5    7.9     9.2
20          10    1.2     2.3        5    8.8     10.2
25          15    0.5     1.1        5    8.8     10.0
30          20    0.2     0.2        5    8.8     9.4

4.2. kNN optimization

The second optimization problem in this work was the optimization of the kNN model. Table 3 lists the feature lengths N, their weights ω, the number of nearest neighbors k, and the exponent n for the weights α in Eq. (17) that minimize the error of the GHI forecast for the several forecast horizons. The columns under the header "without images" list the best parameters when only irradiance data are considered; the columns under "with images" list the best parameters when features from the images are included. Table 4 shows the same but for the DNI forecast. The information in these tables specifies the forecast models completely. For example, in order to apply the kNN model to predict GHI 30 min ahead of time without sky images we apply the following algorithm based on Table 3:

1. Calculate the distances

\[ d^{g}_{B,j} = \sum_{k=1}^{12} \sqrt{\left(p^{g}_{B,k} - P^{g}_{B,kj}\right)^2}, \qquad d^{d}_{B,j} = \sum_{k=1}^{1} \sqrt{\left(p^{d}_{B,k} - P^{d}_{B,kj}\right)^2}, \qquad d^{d}_{V,j} = \sum_{k=1}^{5} \sqrt{\left(p^{d}_{V,k} - P^{d}_{V,kj}\right)^2} \]

and the respective normalized distances \tilde{d}^{g}_{B}, \tilde{d}^{d}_{B} and \tilde{d}^{d}_{V}, and the weighted distance

\[ D = 1\, \tilde{d}^{g}_{B} + 1\, \tilde{d}^{d}_{B} + 1\, \tilde{d}^{d}_{V}. \]

2. Sort D and extract the 30 indices that return the smallest distances and their associated time stamps {t_1, …, t_30}. Compute the forecast for each neighbor

\[ \hat{f}_i(t + 30) = \langle k_t \rangle_{[t_i,\, t_i + 30]} \, \langle I^{\mathrm{clr}} \rangle_{[t,\, t + 30]}, \quad i = 1, \ldots, 30 \]

3. Compute the weights α_i, which are all equal to 1 given that n = 0 for this case, and compute the final forecast and the prediction intervals

\[ \hat{I}^{kNN}(t + 30) = \frac{1}{30} \sum_{i=1}^{30} \hat{f}_i(t + 30). \]
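The two-step forecast and the empirical prediction interval can be sketched end-to-end. This is illustrative code with synthetic inputs; the array shapes and helper names are ours, and the neighbor weights follow the stated behavior of Eq. (17) (closest neighbor → 1, farthest → 0) rather than reproducing the authors' implementation.

```python
import numpy as np

def knn_point_and_interval(feats_now, feats_hist, kt_next_hist, I_clr_next,
                           k=30, n=0, omega=None):
    """Two-step kNN forecast (Sec. 3.2.3) with the empirical PI of Eq. (19).
    feats_now/feats_hist: one vector/matrix per feature set (rows = past
    time stamps); kt_next_hist: clear-sky index averaged over the window
    following each historical time stamp; I_clr_next: clear-sky irradiance
    averaged over the forecast window."""
    if omega is None:
        omega = np.ones(len(feats_hist), dtype=int)
    D = np.zeros(feats_hist[0].shape[0])
    for w, p, P in zip(omega, feats_now, feats_hist):
        d = np.sqrt((P - p) ** 2).sum(axis=1)        # Eq. (12), per time stamp
        D += w * d / (d.max() - d.min())             # Eqs. (13)-(14)
    order = np.argsort(D)[:k]                        # the k nearest neighbors
    Ds = D[order]
    f = kt_next_hist[order] * I_clr_next             # neighbor forecasts, Eq. (15)
    alpha = ((Ds.max() - Ds) / (Ds.max() - Ds.min() + 1e-12)) ** n
    point = np.sum(alpha * f) / np.sum(alpha)        # weighted average, Eq. (16)
    return point, (f.min(), f.max())                 # point forecast and PI

def picp_pinaw(obs, lower, upper, i_max=1000.0):
    """PI coverage probability (Eq. (20)) and normalized width (Eq. (22))."""
    obs, lower, upper = map(np.asarray, (obs, lower, upper))
    picp = float(np.mean((obs >= lower) & (obs <= upper)))
    pinaw = float(np.mean(upper - lower) / i_max)
    return picp, pinaw

# tiny synthetic example: one feature matrix with four historical time stamps
F = [np.array([[0.0], [1.0], [2.0], [10.0]])]
p_now = [np.array([0.0])]
kt_next = np.array([1.0, 0.9, 0.8, 0.2])
point, (lo, hi) = knn_point_and_interval(p_now, F, kt_next, I_clr_next=1000.0, k=2)
print(point, (lo, hi))  # → 950.0 (900.0, 1000.0)
```

With n = 0 the two nearest neighbors are equally weighted, so the point forecast is their plain average and the PI spans their extreme values, mirroring the 30-min GHI example above.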
Table 3
Optimal kNN parameters for the GHI forecast for the several forecast horizons, without and with the sky images. The pair of numbers (·, ·) under each feature lists the feature weight (ω) and feature length (N), respectively. The columns under k and n list the optimal number of nearest neighbors and the exponent for the averaging weights, respectively.

Table 4
Same as Table 3 but for the DNI forecast.
4.3. Overall forecast performance

As explained above, the optimization parameters were obtained using the optimization dataset. The different optimized models were then applied to the testing set in order to test their performance on an independent dataset. The RMSE and the forecast skill for the different models and sets are listed in Tables 5 and 6 for GHI and DNI, respectively. For better visualization, selected values from the two tables are plotted in Figs. 4 and 5 for GHI and DNI, respectively. These results show that GHI is much easier to forecast than DNI: the RMSE for all forecast horizons ranges between 30 and 45 W m−2 for GHI, whereas for DNI it is more than double, ranging from 50 to 100 W m−2. The figures show that the optimized kNN models reduce the RMSE with respect to the baseline persistence model for all the cases studied. The reduction in RMSE translates into significant forecast skills that range between 10% and 22% for the GHI testing set, and between 10% and 26% for the DNI testing set. Ideally there would be no degradation in the forecast performance for the testing set relative to the optimization set. In practice that is never the case, and the tables show a small increase in RMSE and a small decrease in forecast skill when the models are applied to the testing set. This deterioration is clearly visible in Figs. 4 and 5. The inclusion of the variables derived from the images only slightly improves the forecast, with a larger improvement in the case of DNI. The tables also show the improvement achieved by the optimized persistence model (also listed in Table 2). In the case of GHI this model shows very small improvement with respect to the baseline model; however, in the case of DNI the reduction in RMSE is much more pronounced for forecast horizons above 5 min.
A fundamental difference between the persistence model and the kNN model is that the persistence model uses the most recent data to produce the forecast whereas the kNN uses this data to identify the nearest neighbors, returning a forecast based only on the historical dataset. A question that rises then is: does the forecast accuracy increase if the kNN database is augmented with the most recent data? We addressed this question by adding the variables from the testing set to the kNN database. The size of the pool of candidates for nearest-neighbors increases by 70% from nH to nH þ nT, where nH ¼ 22,203 and nT ¼ 15,807 are sizes of the training and testing data sets, respectively. Data from the testing set within the forecast horizon for a given testing time stamp was excluded from the nearest-neighbors to avoid contaminating the results with data that in a real case scenario is unknown. Also, data from the optimization data set was not included in this run since the models were optimized using those values. The results from these experiments are denoted as “testing with augmented kNN database”. The RMSE and forecast skill for this case are listed in Tables 5 and 6 under the header “Tes. aug. kNN” and in Figs. 4 and 5 as “Testing Set (aug. kNN)”. These results show that by adding instances to the pool of possible neighbors in the kNN the forecast
always improves the predictions. Tables 5 and 6 show that the greatest improvement occurs for the 30-min forecast, with an increase of 3.3% and 4.3% in the forecast skill for GHI and DNI, respectively.

4.4. Effect of the training period on the forecast performance

The previous subsection showed that increasing the number of candidates for the kNN algorithm resulted in higher forecast skills. This shows that the length and data distribution of the pool of candidates affect the outcome of the kNN forecast model. In order to study this effect we computed the kNN forecast using sequentially larger fractions of all the available data (historical data plus testing data). To test the robustness of the forecast algorithm, the training data was selected randomly. This process was repeated 10 times for each fraction of the total training data.

Fig. 6 shows the RMSE, PICP, and PINAW for each training length (given as a percentage of the total data available) for the optimized DNI kNN model with images. The dots show the values for all the runs and the color bands show the respective ranges. Results for the other kNN forecasts are not shown because they follow the same trends as in Fig. 6. These graphs show that the kNN algorithm is not very sensitive to the size of the training set. More precisely, Fig. 6a and c shows that after an initial steep descent the RMSE and PINAW values level off rapidly. In all cases there is very little change for training lengths above 60%. This explains why, in the previous section, the inclusion of testing data in the pool of nearest-neighbor candidates resulted in only a small improvement. The PICP values change very little with training length, varying at most 2% from the smallest training length to the largest. Additionally, the narrow width of the color bands shows that the kNN results are not greatly affected by the randomness in the training data, which demonstrates the robustness of the algorithm.

4.5. Forecast performance vs. variability

In the previous sections we studied the overall forecast performance by calculating the error metrics over all the data points in the testing set. Given that a large percentage of the data corresponds to periods of clear sky, which are easily forecasted by the models, it is important to analyze the forecast performance as a function of the variability in the solar resource. To perform this analysis, periods of high variability were selected in terms of a normalized ramp rate, defined as
ri = |kt,i − kt,i−1| / kt,max    (23)
where the reference value kt,max is set to 1 for both GHI and DNI. Table 7 lists the frequency of GHI and DNI ramps in the historical and testing sets for 6 bins and the different forecast horizons/
Table 5
RMSE and forecast skill s for the GHI forecast for the optimization (Opt.) and testing (Tes.) sets. RMSE values are in W m−2 and the skill s is in percent. "Tes. aug. kNN" denotes the testing set evaluated with the augmented kNN database.

Model               Set            Metric   TH = 5   10     15     20     25     30
Persist.            Opt.           RMSE     34.7     37.4   38.8   39.7   40.1   40.3
Persist.            Tes.           RMSE     37.7     40.8   43.0   44.4   44.9   45.1
Persist. opt.       Opt.           RMSE     34.7     36.7   38.2   39.3   39.9   40.2
                                   s         0.0      1.7    1.5    1.2    0.5    0.2
Persist. opt.       Tes.           RMSE     37.7     39.8   41.9   43.3   44.5   45.0
                                   s         0.0      2.4    2.7    2.3    1.1    0.2
kNN without images  Opt.           RMSE     30.4     30.1   30.0   30.2   30.3   30.3
                                   s        12.3     19.5   22.8   24.0   24.7   24.7
kNN without images  Tes.           RMSE     34.7     34.5   35.2   36.2   36.5   36.8
                                   s         8.0     15.5   18.2   18.3   18.9   18.3
kNN without images  Tes. aug. kNN  RMSE     34.3     34.0   34.5   35.4   35.6   35.9
                                   s         9.1     16.6   19.7   20.2   20.9   20.4
kNN with images     Opt.           RMSE     30.2     29.9   30.1   29.3   30.3   29.0
                                   s        13.0     20.1   22.5   26.4   24.7   28.1
kNN with images     Tes.           RMSE     34.4     34.4   35.5   34.8   36.5   35.3
                                   s         8.9     15.6   17.6   21.5   18.9   21.8
kNN with images     Tes. aug. kNN  RMSE     34.2     34.2   34.8   33.7   35.6   33.8
                                   s         9.5     16.3   19.2   24.0   20.9   25.1
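The augmented-database experiment described in the text — enlarging the neighbor pool while excluding samples that fall within the forecast horizon of the query time stamp — can be sketched as follows. This is a simplified illustration: the Euclidean distance, the uniform averaging over neighbors, and all variable names are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_forecast(query, features, targets, timestamps, t_query,
                 horizon, k=10):
    """Point forecast from the k nearest neighbors in feature space.

    Candidates whose time stamps lie within `horizon` of the query
    time are excluded, so the forecast never uses information that
    would be unknown in a real-time setting.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)

    # keep only candidates outside the look-ahead window of the query
    valid = np.abs(timestamps - t_query) > horizon
    dist = np.linalg.norm(features[valid] - np.asarray(query, dtype=float),
                          axis=1)
    nearest = np.argsort(dist)[:k]
    # forecast = average outcome of the selected neighbors
    return float(targets[valid][nearest].mean())
```

Augmenting the database then amounts to concatenating the testing-set feature/target rows onto `features` and `targets` before calling the function.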
H.T.C. Pedro, C.F.M. Coimbra / Renewable Energy 80 (2015) 770–782
Table 6
RMSE and forecast skill s for the DNI forecast for the optimization (Opt.) and testing (Tes.) sets. RMSE values are in W m−2 and the skill s is in percent. "Tes. aug. kNN" denotes the testing set evaluated with the augmented kNN database.

Model               Set            Metric   TH = 5   10     15     20     25     30
Persist.            Opt.           RMSE     65.1     74.4   81.5   86.8   90.7    94.0
Persist.            Tes.           RMSE     68.0     79.0   87.5   93.8   98.1   101.2
Persist. opt.       Opt.           RMSE     65.1     70.3   75.1   79.2   82.7    85.7
                                   s         0.0      5.4    7.9    8.8    8.8     8.8
Persist. opt.       Tes.           RMSE     68.0     74.1   79.5   84.3   88.3    91.7
                                   s         0.0      6.2    9.2   10.2   10.0     9.4
kNN without images  Opt.           RMSE     57.8     59.5   62.4   64.4   67.3    69.4
                                   s        11.2     20.0   23.5   25.9   25.9    26.2
kNN without images  Tes.           RMSE     61.0     65.1   68.6   71.1   75.3    77.5
                                   s        10.3     17.5   21.7   24.2   23.3    23.4
kNN without images  Tes. aug. kNN  RMSE     60.3     63.7   66.4   68.8   72.0    74.0
                                   s        11.3     19.3   24.2   26.6   26.6    26.8
kNN with images     Opt.           RMSE     56.9     59.1   60.8   62.8   64.9    66.5
                                   s        12.6     20.6   25.4   27.7   28.5    29.3
kNN with images     Tes.           RMSE     60.2     64.9   67.3   69.7   72.8    74.5
                                   s        11.4     17.8   23.1   25.6   25.7    26.4
kNN with images     Tes. aug. kNN  RMSE     59.7     63.2   64.5   66.5   68.7    70.2
                                   s        12.2     20.0   26.4   29.1   29.9    30.6
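The robustness experiment of Section 4.4 (re-running the forecast on random fractions of the candidate pool, 10 repeats per fraction, as summarized in Fig. 6) can be sketched as below. All names are ours, and `forecast_fn` stands in for the optimized kNN model; any callable mapping (train features, train targets, test features) to predictions can be plugged in.

```python
import numpy as np

def subsample_experiment(forecast_fn, pool_X, pool_y, test_X, test_y,
                         fractions=(0.2, 0.4, 0.6, 0.8, 1.0),
                         repeats=10, seed=0):
    """For each fraction of the candidate pool, draw `repeats` random
    subsets without replacement, re-run the forecast, and record the
    RMSE range across runs (the 'color band' of Fig. 6)."""
    rng = np.random.default_rng(seed)
    n = len(pool_X)
    bands = {}
    for f in fractions:
        errs = []
        for _ in range(repeats):
            idx = rng.choice(n, size=max(1, int(f * n)), replace=False)
            pred = forecast_fn(pool_X[idx], pool_y[idx], test_X)
            errs.append(float(np.sqrt(np.mean((test_y - pred) ** 2))))
        bands[f] = (min(errs), max(errs))
    return bands
```

A narrow (min, max) band at every fraction is what the paper reports: the forecast is insensitive both to pool size beyond ~60% and to the randomness of the draw.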
[Fig. 4 panels: RMSE (W m−2, top) and skill (%, bottom) vs. forecast horizon (5–30 min); curves for Opt. Persist., kNN without images, and kNN with images, each on the optimization set, testing set, and testing set (aug. kNN).]
Fig. 4. RMSE (top) and forecast skill (bottom) versus the forecast horizon for the GHI forecast.
averaging windows. The numbers in this table give a very good indication of the overall forecast skill observed in the previous section: (i) the DNI forecast shows larger errors than the GHI forecast because there are many more ramps of large magnitude; and (ii) as the averaging window increases, the number of large ramps decreases (due to smoothing) and the forecast skill increases.

After identifying the data points in the testing set that belong to each variability bin, we computed the error metrics as a function of the forecast horizon and the variability. In this analysis we used only the best model identified in the previous section: the kNN model with images and augmented database. Figs. 7 and 8 show the RMSE and forecast skill as a function of variability for GHI and DNI,
respectively. Figs. 7a and 8a show that the RMSE increases with the variability for both forecast models. Following the same trend as the overall forecast, the RMSE for DNI is more than double that for GHI. For large ramp indices we observe reductions in the RMSE of more than 100 W m−2 relative to the persistence model for both variables. Figs. 7b and 8b allow the improvement of the kNN forecast relative to the persistence model to be compared more clearly. In these figures the forecast skill for the whole testing set is shown for reference (the dashed horizontal lines). They show that the GHI forecast skill for time horizons below 15 min increases monotonically with variability. For these horizons the
[Fig. 5 panels: RMSE (W m−2, top) and skill (%, bottom) vs. forecast horizon (5–30 min); curves for Opt. Persist., kNN without images, and kNN with images, each on the optimization set, testing set, and testing set (aug. kNN).]
Fig. 5. Same as Fig. 4 but for the DNI forecast.
Fig. 6. DNI RMSE (a), PICP (b), and PINAW (c) versus the length of the kNN database for the different forecast horizons. Forecast computed for the optimized kNN model using image features. The forecast was computed 10 times with fractions of training data selected randomly. The dots show the metrics for the 10 runs at each training length. The color bands show the range of the three metrics. The forecast horizons are identified by the different colors. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
skill for low-variability periods is lower than the overall skill, which means that most of the improvement occurs when the GHI is most variable. For horizons above 15 min this behavior changes: the curves are no longer monotonic and the forecast skill is less dependent on the variability. In the case of DNI, the 5-min forecast behaves similarly to its GHI counterpart. A completely different dependence is seen for forecasts of 15 min and above, for which the forecast skill decreases with increasing variability. This reveals that for these horizons the kNN model predicts large DNI ramps poorly, and most of the improvement is achieved at lower levels of variability. Finally, for the 5-min forecast the skill is actually negative if ri < 0.10, for both GHI and DNI, indicating that the kNN forecast is worse than the reference persistence model at this level of variability and forecast horizon. This observation suggests that, in these cases, a piecewise forecast model that applies the persistence model in periods of low variability and the kNN model in periods of high variability may return better results. The non-trivial problem, not addressed here, is determining the logical condition for the piecewise function.

4.6. Forecasting intervals

As demonstrated above, the proposed model achieves substantial improvement relative to the reference model, but it still does a poor job of predicting large ramps. A way of addressing
this issue is to recognize that large ramps will always be very hard to predict, and thus to predict not only the actual value of the solar irradiance but also a range for it, which can be calculated using Eq. (19). The performance of the PIs is quantified using the metrics PICP and PINAW given by Eqs. (20) and (22), respectively. Again, this analysis was performed only for the best model, as in the previous subsection.

Fig. 9 shows the PICP and the PINAW for the GHI and DNI PIs versus the forecast horizon. These results show that, in the case of GHI, the actual value is within the prediction interval at least 90% of the time, regardless of the forecast horizon. In the case of DNI the PICP decreases, and the PIs for the best kNN forecast only guarantee that the measured value is within the interval between 80% and 90% of the time. We also see that in this case there is a large drop in the PICP between the optimization and the testing sets. The same does not happen with GHI, for which it is almost the same for the two sets. In terms of the relative width of the PIs, measured by the PINAW values in Fig. 9c,d, we observe much lower values for GHI (between 6% and 10%) than for DNI (between 10% and 20%). It is interesting to note that, although the DNI PIs are wider, their PICP is lower than the GHI PICP. This illustrates again that DNI is much more difficult to predict than GHI.

Finally, the top graphs of Figs. 10 and 11 show the 30-min forecast time series and the corresponding PIs (the blue shaded areas (in the web version)) for GHI and DNI, respectively. The selected days cover different weather conditions, from completely
Table 7
Number of ramps per variability bin for GHI and DNI. The frequency is shown for the historical (Hist) and testing (Tes) datasets.

GHI
Forecast horizon/        0.00–0.05        0.05–0.10     0.10–0.20     0.20–0.30    0.30–0.50    0.50–1.00
averaging window (min)   Tes     Hist     Tes   Hist    Tes   Hist    Tes   Hist   Tes   Hist   Tes   Hist
 5                       12,272  16,624   916   1554    705   1240    287   530    191   387    45    128
10                       12,654  17,296   867   1538    658   1097    160   364     77   162     0      6
15                       12,970  17,824   840   1491    485    916    109   206     12    26     0      0
20                       13,200  18,266   778   1353    390    775     48    63      0     6     0      0
25                       13,404  18,608   693   1301    302    537     17    17      0     0     0      0
30                       13,537  18,924   658   1197    217    338      4     4      0     0     0      0

DNI
 5                       12,288  16,999   829   1205    675   1071    276   501    261   485    86    202
10                       12,544  17,450   847   1237    636   1034    249   439    133   288     7     15
15                       12,752  17,753   868   1273    572   1004    173   342     51    90     0      1
20                       12,887  18,014   884   1305    514    913    129   225      2     6     0      0
25                       13,014  18,174   857   1334    495    897     50    56      0     2     0      0
30                       13,120  18,360   834   1355    451    740     11     8      0     0     0      0
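The binning in Table 7 follows directly from Eq. (23). A small sketch, with the bin edges taken from the table header (the absolute value reflects that the bins count ramp magnitudes; function names are ours):

```python
import numpy as np

def ramp_rate(kt, kt_max=1.0):
    """Normalized ramp rate, Eq. (23): r_i = |kt_i - kt_{i-1}| / kt_max."""
    kt = np.asarray(kt, dtype=float)
    return np.abs(np.diff(kt)) / kt_max

def count_ramps(r, edges=(0.0, 0.05, 0.10, 0.20, 0.30, 0.50, 1.00)):
    """Histogram of ramp magnitudes over the variability bins of Table 7."""
    counts, _ = np.histogram(np.asarray(r), bins=edges)
    return counts.tolist()
```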
Fig. 7. GHI forecast RMSE and forecast skill as a function of the ramp rate and the forecast horizon. The forecast horizons are identified by the line colors. RMSE: The figure shows the RMSE for the persistence model (triangle) and the kNN model (circle). Skill: The dashed lines indicate the forecast skill when all data is used. Values in the shaded area identify bins for which the kNN forecast is worse than the reference persistence model.
Fig. 8. Same as Fig. 7 but for the DNI forecast.
overcast to completely clear days in the testing dataset. The top graph also displays black dots that identify instances for which the measured value falls outside the PI; these are the cases that reduce the PICP values. The PICP and the PINAW for the selected days are also shown in the figures. The PINAW values for these days are higher than the values for the whole testing dataset because periods of high irradiance variability are better represented in these seven days than in the whole testing set, in which clear days predominate.

The bottom graphs in Figs. 10 and 11 show the absolute error for each point-forecast instance. The error for the persistence model is also shown as a reference. These graphs show that the kNN model is almost always better than the reference persistence model, especially when large variations in the measured irradiance are observed.
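Eq. (19), which defines the PIs, lies outside this excerpt. One common way to build such intervals from a kNN model — empirical quantiles of the neighbors' historical outcomes — is sketched below purely to make the idea concrete; this specific construction is our assumption, not necessarily the paper's.

```python
import numpy as np

def knn_prediction_interval(neighbor_outcomes, coverage=0.90):
    """Prediction interval from the spread of the k nearest neighbors'
    historical outcomes, via symmetric empirical quantiles."""
    outcomes = np.asarray(neighbor_outcomes, dtype=float)
    alpha = (1.0 - coverage) / 2.0
    return (float(np.quantile(outcomes, alpha)),
            float(np.quantile(outcomes, 1.0 - alpha)))
```

With such a construction, a wide spread among the retrieved neighbors (e.g. during broken-cloud conditions) automatically widens the interval, which is consistent with the wider DNI PIs reported above.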
Fig. 9. (a) and (b) PICP for GHI and DNI versus the forecast horizon. (c) and (d) PINAW for GHI and DNI versus the forecast horizon. The values are computed for the optimization dataset and for the testing dataset with the augmented kNN database.
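The coverage and width metrics plotted in Fig. 9 follow the standard PICP/PINAW definitions; since the paper's Eqs. (20) and (22) are outside this excerpt, the sketch below uses the textbook forms (coverage fraction, and mean width normalized by the measurement range):

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction-interval coverage probability, in percent: the
    fraction of measurements falling inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return 100.0 * float(np.mean((y >= lower) & (y <= upper)))

def pinaw(lower, upper, y_range):
    """Prediction-interval normalized average width, in percent:
    mean interval width divided by the range of the measurements."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return 100.0 * float(np.mean(upper - lower)) / y_range
```

Good PIs maximize PICP while keeping PINAW small; the trade-off between the two is exactly what Fig. 9 displays for GHI and DNI.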
Fig. 10. Measured and forecasted time series and the corresponding PIs for GHI. The graph shows 7 days that cover different weather conditions, from completely overcast to completely clear days in the testing dataset. The black dots identify instances for which the measured value is outside the prediction interval.
Fig. 11. Same as in Fig. 10 but for DNI.
5. Conclusions

In this work we proposed a novel k-nearest-neighbors forecasting methodology for GHI and DNI using local irradiance data and sky images. We carried out a detailed analysis of the forecast performance of the new methodology, which is capable of achieving relatively large forecasting skills with very limited investment in instrumentation. The present work focuses on the methodology used for developing the forecasting method and therefore does not address more general questions such as the applicability of the method to a wide range of solar variability microclimates or the effect of very long (multiple-year) training data. Nevertheless, the following conclusions can be drawn from this study:
- A proper selection of features derived from the irradiance time series is necessary to maximize the kNN forecast performance. Contrary to our expectations, including features from sky images in the kNN model resulted in only minor improvements. In fact, features derived from the images are much less present in the optimized kNN than the irradiance variables. This is especially clear in the case of GHI, as demonstrated in Table 3. A possible explanation is that the image-derived parameters are only relevant when clouds are present. Given that the optimization is performed on the optimization dataset, in which clear days predominate, the image features have little impact on the overall performance.
- The analysis of the forecast performance versus the irradiance variability shows that the proposed model predicts large ramps poorly, especially in the case of DNI.
- Tests with the augmented kNN database show the importance of having a very complete historical database. This is especially important for periods of large ramps since, as we have seen, they are rare and thus underrepresented in the historical database. This impacts the kNN forecast, as it relies on the similarity of past data to produce the forecast.
- The study of the kNN forecast performance as a function of the size and composition of the historical database shows that the proposed model is very robust.
- The prediction intervals can easily be obtained with the proposed kNN methodology. The results showed that the PIs perform well, with high probability coverage and small normalized average width.

Acknowledgments

Both authors gratefully acknowledge the partial support of the National Science Foundation (NSF) EECS (EPAS) award No. 1201986, managed by Dr. Paul Werbos, and the partial support of the California Energy Commission PIER PON-13-303 program, managed by Dr. Zhiqin Zhang.