Electric Power Systems Research 121 (2015) 28–37
Contents lists available at ScienceDirect
Electric Power Systems Research journal homepage: www.elsevier.com/locate/epsr
Estimating wind speed probability distribution by diffusion-based kernel density method Xiaoyuan Xu, Zheng Yan ∗ , Shaolun Xu Key Laboratory of Control of Power Transmission and Conversion, Ministry of Education, Department of Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
a r t i c l e
i n f o
Article history: Received 16 July 2014 Received in revised form 15 November 2014 Accepted 28 November 2014 Keywords: Diffusion partial differential equation Discrete cosine transformation Kernel density estimation Parametric distribution model Wind speed probability distribution Goodness of fit test
a b s t r a c t Accurate estimation of wind speed probability distributions is a challenging task in wind power planning and operation. Different from the commonly used parametric methods which consist of selecting a suitable parametric model and estimating the parameters, this paper presents an improved non-parametric method to estimate wind speed probability distributions. Based on the diffusion partial differential equation in finite domain, this method accounts for both bandwidth selection and boundary correction of kernel density estimation. Preprocessing techniques are designed to handle data with different recording manners to produce smooth probability density functions. Probability densities of specific grid points are obtained by inverse discrete cosine transformation and are further used to calculate assessment indices of wind resources. The method has been tested to estimate probability densities of parametric distributions and actual wind speed data measured in different places. Simulation results show that the proposed approach is of practical value in fitting wind speed distribution models. © 2014 Elsevier B.V. All rights reserved.
1. Introduction As more and more wind power has been utilized around the world, it is of great value to estimate probability distributions of wind speed to maximize the efficiency of wind power generations. In power system planning and reliability analysis, most of the researches on wind speed probability distributions are based on parametric distribution models [1–6]. And the methods estimating wind speed probability distributions are usually parametric methods which consist of selecting a suitable parametric distribution and estimating the parameters [7–13]. The commonly used parametric distribution models can be divided into two categories:
(1) Unimodal parametric distributions including the Weibull distribution, the Gamma distribution, the Rayleigh distribution and so on. (2) Multimodal parametric distributions, especially bimodal models, for example, the mixture Weibull distribution of two components, the mixture distribution of truncated Normal and Weibull.
∗ Corresponding author. Tel.: +86 13621931964. E-mail addresses:
[email protected] (X. Xu),
[email protected] (Z. Yan),
[email protected] (S. Xu). http://dx.doi.org/10.1016/j.epsr.2014.11.029 0378-7796/© 2014 Elsevier B.V. All rights reserved.
The parametric methods are widely used for the efficiency in estimation and can be applied to estimate wind characteristics at the sites for which no wind data is available [14,15]. However, there are also challenges for the parametric methods. First, there is no rule in selecting the theoretical distribution and a distribution model which can represent wind regimes at some wind farms may not work well for others. Second, an estimated parametric model may not always be satisfactory results because of the extreme randomness of wind speed in both time and space. Thus, the non-parametric methods are introduced to estimate wind speed probability distributions [16,17]. Kernel density estimation (KDE) is one of the most popular non-parametric distribution estimating methods [18–20]. Up to now, most of the papers concentrate on the mathematical theories of KDE and only a few of them actually model the probability distributions of wind speed. A practical kernel density method is proposed in [16] to estimate long-time wind speed probability distributions. A smooth multivariate wind distribution model is developed in [17] to capture the coupled variation of wind speed, wind direction, and air density. In this paper an improved diffusion-based kernel density method (DKDM) is presented to estimate wind speed probability distributions. DKDM accounts for both bandwidth selection and boundary correction of KDE and discrete cosine transformation is adopted in DKDM to reduce the computational complexity. In order to produce smooth probability densities, a uniformly distributed
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
random number is added to the original wind speed data and its value range depends on the corresponding recording frequency and resolution. The probability densities of specific grid points are obtained by inverse discrete cosine transformation and are further used to calculate assessment indices of wind resources. The proposed method is compared with other approaches in estimating probability densities of parametric distribution functions and actual wind speed data. The simulation results demonstrate the practicality of this method.
x − X
R(K) =
K 2 (u)du,
f (u)du 2
(8)
Define the Asymptotic Mean Integrated Squared Error (AMISE) as: h4 R(K) + (u2 (K))2 R(f ) 4 Nh
(9)
hopt =
1/5
R(K) u22 (K)R(f )
n−1/5
(10)
In (10) only R(f ) is unknown and has to be estimated. Thus, plug-in methods mainly concern the techniques to estimate R(f ).
(2)
It turns out that the choice of bandwidth is much more important than the choice of kernel functions for the behavior of estimating results. Small values of bandwidth make the estimation look “wiggly” and show spurious features, whereas big values of bandwidth will make the result over-smooth in a sense that it may not reveal the structural features, such as multimodality [19]. In addition, the original KDE assumes the domain of the density to be infinite. Thus, KDE suffers from boundary problems if the domain has finite endpoints [21,22]. There are two classes of methods to estimate the bandwidth of KDE: the cross-validation methods which try to look at ISE(h), and the plug-in methods which try to minimize MISE(h) [23].
Among the two classes of methods, the most popular approach is Silverman’s rule of thumb (ROT) [18] in which the density is regarded as the normal distribution, but it usually makes the results over-smooth in multimodal models. Further, it has been reported in [24] that one-sided cross-validation method (OSCV) [25] and Sheather–Jones plug-in method (SJPI) [26] are outstanding methods among various bandwidth selection methods. In this paper, the densities estimated by ROT, OSCV and SJPI are evaluated and compared with the results obtained by the proposed method. ROT and SJPI are introduced in [18] and [26], respectively. The calculation procedure for OSCV using Gaussian kernel is deduced in Appendix A. 2.2. Kernel density estimation via diffusion
2
(fˆh (x) − f (x)) dx
(3) 2
{fˆh (x) − f (x)} dx]
MISE(h) = E[
(4)
where fˆh (x) represents the probability density estimated by KDE, f(x) represents the true density distribution function and E(·) is the expectation of variables. (1) Cross-validation methods
2
{fˆh (x) − f (x)} dx =
ISE(h) =
fˆh2 (x)dx − 2E{fˆh (x)}
+
f 2 (x)dx
(5)
The Gaussian kernel density estimator of (1) can be written in an alternative form: 1 fˆ (x; t) = (x, Xi ; t) N N
(11)
i=1
where
1 x − Xi 2 exp − (x, Xi ; t) = √ (12) 2t 2t √ where t has the same definition as h in (1), referred to as the bandwidth. It is shown in [27], the Gaussian kernel density estimator is the unique solution to the following diffusion partial differential equation (PDE): 2
The third term can be ignored since it does not depend on the bandwidth and (5) can be simplified as:
F(h) =
(7)
So the asymptotically optimal bandwidth is:
1 1 K(u) = √ exp − u2 2 2
h4 R(K) + (u2 (K))2 R(f ) 4 Nh
Nh
u2 K(u)du,
1
R(f ) =
where K(·) is the kernel function, h is the bandwidth and N is the size of data. There are many kernel functions and the Gaussian kernel is selected as the kernel function in this paper:
ISE(h) =
where
(1)
h
i=1
+ o(h4 ) + o
AMISE(h) =
i
2
E{fˆh (x) − f (x)} dx =
Suppose we have observed data X1 , X2 , . . ., XN from a common distribution with the probability density function f(x), and use the kernel density method to estimate the density as follows [18]: N
MISE(h) =
2.1. Introduction of kernel density estimation
1 fˆ (x) = K Nh
Various modifications of cross-validation methods have been proposed to accurately estimate the second term in (6). (2) Plug-in methods
u2 (K) =
2. Diffusion-based kernel density method
29
1 ∂ ˆ ∂ˆ f (x; t), x ∈ X, t > 0, X ≡ R f (x; t) = 2 ∂x2 ∂t
(13)
with the initial condition of 1 fˆ (x; 0) = ı(x − Xi ) N N
fˆh2 (x)dx − 2E{fˆh (x)}
(6)
i=1
(14)
30
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
where ı(x − Xi ) is the Dirac measure at Xi . When domain has finite endpoints, (1) needs boundary correction. Yet, within the PDE framework, all we have to do is to solve (13) over the finite domain with the initial condition (14) and the Neumann boundary condition:
∂ ˆ ∂ ˆ f (x; t)|x=Xl = f (x; t)|x=Xu = 0 ∂x ∂x
(15)
where Xl and Xu are the lower and upper bounds of the domain, respectively. Assuming the domain of the probability density is [0,1] and considering above conditions, the analytical solution of this PDE is as follow [27]: fˆ (x; t) =
N 1
N
(x, Xi ; t),
x ∈ [0, 1]
(x, 2k + Xi ; t) + (x, 2k − Xi ; t)
cos(kx) cos(kXi )
n−1
ak e−k
2 2 t/2
cos
(2j + 1)k
(24)
2n
k=0
6 2−3 7
2/5 [l] (t)
(25)
where [l] (t) = 1 (2 (· · ·l (t)· · ·)) l ≥ 1
⎛
l (t) = ⎝
2 1 + (1/2) 3N
/2
(l+1/2)
(26)
(1 × 3 × · · · × (2l − 1))
n−1 (2l+2) 2 −k2 2 t (k) ak e k=1
⎞2/(3+2l)
⎠
(27)
where l is selected as 7 in this paper.
The theta function in (17) can be written as: e
fˆ (x; t) ≈
√
2.3. Implementation of diffusion-based kernel density method
(x, Xi ; t) = 1 + 2
Eq. (22) indicates that ak can be calculated by discrete cosine transform (DCT). The DCT for a vector of n elements can be implemented via an n-point Fast Fourier Transform (FFT) for which the computational complexity is O(nlog2 n). For x = (2j + 1)/2n, j = 0, 1, . . ., n − 1
t=
Eq. (16) is an alternative form of KDE and considers both bandwidth selection and boundary correction. For small bandwidths, the estimator of (16) and the estimator of (17) behave similarly in the interior of [0,1]. As to the domain near boundaries, Eq. (16) has a better performance for it is consistent with the true density, while the Gaussian kernel density estimator by (1) is inconsistent [27].
−k2 2 t/2
(23)
(17)
k=−∞
∞
˛(k) = 2 for 1 ≤ k ≤ n − 1
(16)
where the kernel is the theta function and is given by: ∞
˛(0) = 1,
According to (24), the density estimator can be calculated by inverse discrete cosine transform (IDCT) and IDCT can be obtained via IFFT in O(nlog2 n) operations. The computation process of DCT and IDCT can be referred to [28]. The improved plug-in method in [27] is used to calculate the optimal bandwidth which is the solution to a fix-point problem.
i=1
(x, Xi ; t) =
where
3. Process to estimate wind speed probability distribution (18)
k=1
3.1. General steps of wind speed probability distribution estimation
Thus, Eq. (16) can be approximately expressed by: fˆ (x; t) ≈
n−1
ak e
−k2 2 t/2
cos(kx)
(19)
k=0
where n is a large positive integer and the coefficients ak can be given by:
ak =
⎧ 1 ⎪ ⎨
k=0
N 2
⎪ ⎩N
cos(kXi )
k = 1, 2, . . ., n − 1
(20)
i=1
For k = 1, 2, . . ., n − 1, ak can be estimated by the following equation:
ak
1
cos(kx)fˆ (x; 0)dx
=2
0 n−1
2 = N N
j=0 i=1 n−1
≈
Wn = (RF/FF) × RR × rnd
(j+1)/n
cos(kx)ı(x − Xi )dx j/n
2 cos k N j=0
j + 0.5 n
(21)
H(j)
where H(j) is the number of Xi in [j/n, (j + 1)/n]. Denote H(j)/N as y(j) and ak can be calculated by: n−1
ak = ˛(k)
j=0
y(j) cos
(2j + 1)k 2n
The probability distribution functions of wind speed in a duration of years are commonly recognized as continuous functions. However, in actual meteorological stations, data is sometimes recorded in low frequencies and resolutions which produce repeated wind speeds. In non-parametric methods, the results are completely determined by the data and these ties may lead to discrete distribution functions. Therefore, in order to obtain smooth densities, two preprocessing steps are designed. Step 1: If RF is higher than FF, the wind speed is the mean of the raw data in every FF. Where RF is the recording frequency of raw wind speed data and FF is the frequency of data which is used to estimate the probability distribution function. Step 2: A uniformly distributed random number Wn is added to the wind speed obtained in Step 1. Wn is calculated as follows.
(22)
(28)
where rnd is a uniformly distributed random number in the range of [−0.5, 0.5] and RR is the recording resolution of raw wind speed data. After the pretreatment of wind speed data, the probability distributions are estimated as follows. Step 3: Get the wind speed data V1 , V2 , . . ., VN and determine the lower bound (LB) and the upper bound (UB) of the domain. LB = 0
(29)
UB = max(Vi ) + × (max(Vi ) − min(Vi ))
(30)
where is a positive number and is 0.2 in this paper.
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
Step 4: Get n grid points Uj in the range of [LB, UB]. Count the number of wind speed data H(j) in the interval of [Uj , Uj+1 ). Uj = LB + j/n(UB − LB)
j = 0, 1, . . ., n − 1
(31)
where n is a large integer and is selected as 215 in this paper. Step 5: Transform n grid points to the domain of [0,1]: Xj =
Uj − LB
j n
=
UB − LB
Table 1 The parameters of three typical parametric distribution functions. PDF
Values of parameters
W1 W2 MWW MTNW
c = 7, k = 1.5 c = 8, k = 3.5 u1 = 3, c1 = 2.5, k1 = 3, u2 = 6, c2 = 4, k2 = 2, w = 0.4 u = 7, c = 6, k = 2, ˛ = 3, ˇ = 4, w = 0.5
(32)
Step 6: Calculate ak using (22) by DCT. Step 7: Calculate the optimal bandwidth using (25). Step 8: Calculate probability density fˆ (Xj + 1/(2n)) using (24) by IDCT. Step 9: Transform the density fˆ to the density fˆv in the domain of [LB, UB]:
fˆv Wj =
31
fˆ Xj + 1/ (2n)
3.2.2. Goodness of fit The goodness of fit of a statistical model describes how well it fits a set of observations. The following two goodness of fit tests are applied to test the results. 1) Chi-squared (Chi2) test 2) Kolmogorov–Smirnov (KS) test
(33)
(UB − LB)
where Wj = Uj + (UB − LB)/(2n). Different from conventional KDE methods, the densities of n points are directly calculated by IDCT. Based on this property, the characteristics of wind speed can be further analyzed. Step 10: The assessment indices of wind resources can be obtained as follows.
Chi2 test is based on binning of the data, thus only using part of the information. KS test is based on the empirical cumulative distribution function and fully utilizes the information of data. In both statistical tests, a smaller statistic represents a better goodness of fit between the theoretical distribution and given samples. If the statistic is smaller than the corresponding critical value, the hypothesis that the samples are drawn from the reference distribution is accepted, otherwise it is rejected.
(a) Cumulative distribution function
Ww
Fˆv (Ww ) =
fv (v)dv = 0
w
4. Case study fˆv (Wj ) W
(34) 4.1. Estimating probability densities of parametric distribution models
j=0
where W = (UB − LB)/n. (b) The mean wind energy density
∞
v3 fv (v)dv = 0.5
P¯ all = 0.5
0
n−1
Wj3 fˆv (Wj ) W
(35)
j=0
where is the air density. (c) The mean effective wind energy density
Vout
v3 fv (v)dv = 0.5
P¯ use = 0.5
Vin
k2
Wj3 fˆv (Wj ) W
(36)
vm = v max(0.5 v3 fv (v)) = Wj max(0.5 Wj3 fˆv (Wj )) , j = 0, 1, . . . n − 1
(37)
3.2. Measures to evaluate the performances of methods 3.2.1. Integrated error The integrated error (IE) measures the distance between the estimated density and the true density distribution function and is given by:
IE =
|fˆ (x) − f (x)|dx
1) Weibull distribution (W) 2) Mixture Weibull distribution of two components (MWW) 3) Mixture distribution of truncated Normal and Weibull (MTNW)
j=k1
where Wk1 = Vin and is the cut-in wind speed of wind generators. Wk2 = Vout and is the cut-out wind speed of wind generators. (d) The mode of wind power density distribution The mode of wind power density distribution is the wind speed at which the power density distribution is a maximum.
In this section, we test the ability of the proposed method in estimating probabilities densities of commonly used parametric distribution models. Corresponding to the categories in Section 1, three typical parametric distribution functions are selected as follows.
(38)
where fˆ (x) is the estimated density and f(x) is the theoretical probability density function.
The above parametric distributions contain unimodal and multimodal models, and considers null wind speed as well. Other unimodal models such as the Gamma distribution and the Rayleigh distribution are not tested due to their similarities with the Weibull distribution. In Table 1, the values of parameters of the probability density functions are given. u, c and k are the location, scale and shape parameter of Weibull distribution. ˛ and ˇ are the expectation and standard deviation of the corresponding truncated normal distribution. w is the weight coefficient in MWW and MTNW. Readers can refer to [12,9] for detailed models of MWW and MTNW. For the four PDFs given in Table 1, 100, 200, 500, 1000 samples sizes are considered and 250 repetitions for each PDF and each sample size are performed. The final integrated errors are the mean of 250 integrated errors. We compared the proposed method with the widely recognized bandwidth selection methods, ROT, SJPI and OSCV. The results are given in Fig. 1. ROT assumes the estimating model to be the normal distribution, and usually does a good job in estimating unimodal distributions such as W1 and W2. The improved bandwidth selection methods, SJPI and OSCV, have the similar performances with ROT in estimating unimodal models. Moreover, they have a better performance in estimating multimodal distributions, for example, MWW and MTNW.
32
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
Fig. 1. The integrated errors of estimating probabilities densities of parametric models.
Wind speed is strictly non-negative and in this case the parameters of W2 and MWW make the distributions have a small probability near zero. Thus, the boundary problems can be ignored which leads to almost the same results obtained by SJPI, OSCV and DKDM. On the contrary, W1 and MTNW have a large probability near zero. Therefore, when estimating these models, DKDM is the
obvious winner respect to other methods which do not consider boundary correction. Fig. 2 gives intuitive descriptions of the performances of different methods in estimating densities of parametric models. The number of samples is set as 5000. Because of the similar performances by SJPI and OSCV, only the results by SJPI are shown. In the
Fig. 2. The probability densities of parametric models estimated by ROT, SJPI and DKDM.
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
33
Table 2 Details of wind speed observation stations. Location
Latitude
Longitude
Ganhekou Changma Deelen Huibertgat Abilene Bakersfield
N40.51 N39.85 N52.06 N53.57 N32.41 N35.43
E95.44 E96.78 E5.89 E6.40 W99.68 W119.05
Time 2008.5–2009.4 2008.5–2009.4 2001.1–2014.5 2001.1–2014.5 2013.1–2013.12 2013.1–2013.12
regions away from zero, the densities estimated by DKDM and SJPI nearly coincide which indicates that the computing bandwidths are practically equal for the two methods. As to the regions near zero, the outcomes are quite different. Specifically, the domain of the probability density estimated by DKDM is non-negative, while the domain of the density estimated by SJPI extends to the negative axle. In other words, there exist negative wind speeds for the results obtained by SJPI. In addition, ROT does a satisfactory job in describing W1 and W2 but it overfits the densities of bimodal distribution models. In summary, DKDM accounts for both bandwidth selection and boundary correction of KDE and does a satisfactory job in fitting commonly used parametric models.
4.2. Estimating probability densities of wind speed data In this section, wind speed data measured in six wind observation stations are used to test the proposed method and the details of the stations are given in Table 2. The wind speed data measured in Ganhekou and Changma is provided by State Grid Corporation of China (SGCC). The data measured in Huibertgat and Deelen is obtained from Royal Netherlands Meteorological Institute (KNMI) [29]. And the wind speed data in Abilene and Bakersfield is obtained from National Climatic Data Center (NCDC) [30].
RF
FF
RR
10 min 10 min 1h 1h 2 min 2 min
1h 1h 1 day 1 day 1h 1h
0.000001 m/s 0.000001 m/s 0.1 m/s 0.1 m/s 1 knot 1 knot
4.2.1. Probability distributions of wind speed data First of all, the effects of preprocessing on estimation are studied. In Fig. 3 Wn is not considered. The probability densities of the wind speed measured in Ganhekou and Changma are relatively smooth and are similar to parametric models such as the Weibull distribution. However, results in Deelen, Huibertgat, Abilene and Bakersfield present many discrete peaks. In these stations the recording frequencies and resolutions are relatively low which leads to many repeated wind speed data. Fig. 4 gives the occurrence numbers of different wind speed data measured in Deelen. The data is unevenly distributed. There are totally 119,136 samples but there are only 147 different values. The occurrence number of 1.1 m/s is 10,790 but there is no wind speed between 1.2 m/s and 2.0 m/s. According to the bandwidth selection method in (25), DKDM assumes the probability density function to be continuous and ties or repeated values are recognized as samples of discrete variables. Hence for these stations the bandwidths estimated by DKDM are quite small, leading to many spurious peaks in the results. In Fig. 5, Wn is introduced and the outcomes are quite different from those in Fig. 3. Due to different recoding frequencies and resolutions, the value ranges of Wn are different for different stations. For Ganhekou and Changma Wn = 1.67 · 10−7 rnd. For Deelen and Huibertgat Wn = 4.17 · 10−3 rnd. As to Abilene and Bakersfield, Wn = 1.71 · 10−2 rnd. In fact, the resolution of data is (RF/FF) × RR after Step 1, thus adding Wn to the wind speed data does not
Fig. 3. The probability densities of wind speed obtained by DKDM when Wn is not considered.
34
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37 Table 3 The comparison of bandwidths between not introducing Wn and introducing Wn.
Fig. 4. Occurrence numbers of different wind speed data recorded in Deelen.
bring in errors under the resolution of (RF/FF) × RR. The introduction of Wn eliminates the repeated values of the data and produces continuous probability densities. A comparison of bandwidths between introducing Wn and not introducing Wn is given in Table 3. In accordance with Figs. 3 and 5, for Deelen, Huibertgat, Abilene and Bakersfield the estimated bandwidths are relatively larger after introducing Wn, which results in smoother probability densities. In Fig. 5, parametric methods and other non-parametric methods including the histogram (Hist) and k-nearest-neighbor (KNN) density estimation [31] are also performed for comparison. The histogram is simple but it is not an accurate description of the underlying density. It usually provides a quick visualization of the distribution. The Weibull distribution is used in the parametric method and does an acceptable job in describing the probability
Location
Bandwidth without Wn
Bandwidth with Wn
Ganhekou Changma Deelen Huibertgat Abilene Bakersfield
0.0161 0.0187 1.6994 × 10−05 2.3114 × 10−05 1.2128 × 10−05 1.0052 × 10−05
0.0161 0.0187 0.0184 0.0233 0.0179 0.0093
densities of the data measured in Changma and Huibertgat. However, in other stations, the parametric models are not consistent with the histograms. KNN has a good performance in evaluating the probability densities in the interior of the domain. However, KNN produces heavy tails and the result is not a true probability density since its integral over all the sample space diverges. Other KDE methods, including the bandwidth optimization (BO) method in [16] and SJPI in [17] are also performed. The comparisons between different KDE methods are given in Fig. 6. Just as the results given in Section 4.1, in the regions away from zero, densities obtained by SJPI and DKDM nearly coincide with each other. While in the regions near zero, the domain of the density estimated by DKDM is non-negative but the result obtained by SJPI extends to the negative axle. Compared with SJPI and DKDM, BO produces unsmooth results which are obvious in Changma, Deelen and Abilene. In other words, the bandwidths obtained by BO are smaller than the bandwidths estimated by DKDM and SJPI. In summary, DKDM does the best job in fitting wind speed probability distributions in these six stations. Different from conventional KDE methods, the probability densities of specific grid points are calculated by IDCT in DKDM. Therefore, the cumulative distribution function (CDF) and various assessment indices of wind resources can be easily obtained by numerical integrations. We assume that the cut-in wind speed of the wind generator is 3 m/s, the cut-out wind speed is 20 m/s and the air density is 1.225 kg/m3 . The CDF of wind speed between
Fig. 5. The probability densities of wind speed when Wn is considered.
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
35
Fig. 6. The comparisons between different KDE methods in fitting wind speed probability distributions.
Table 4 The assess indices of wind resources. Location
CDF vvout
P¯ all (W/m2 )
P¯ use (W/m2 )
vm (m/s)
Ganhekou Changma Deelen Huibertgat Abilene Bakersfield
0.80 0.91 0.72 0.97 0.72 0.30
598.21 579.32 75.40 382.61 127.13 23.00
551.54 550.49 70.09 382.22 125.24 19.99
13.07 12.73 5.97 9.82 9.11 4.93
in
Table 5 The statistics of Chi2 tests for the estimated probability densities.
DKDM SJPI Weibull Critical value
Ganhekou
Changma
Deelen
Huibertgat
Abilene
Bakersfield
17.54 22.43 307.94
10.54 11.26 54.39
3.46 3.22 314.74
12.57 11.29 151.17
4.15 4.22 120.36
2.80 3.01 2512.62
36.42
36.42
21.03
28.87
22.36
22.36
Table 6 The statistics of KS tests for the estimated probability densities. Ganhekou
Changma
Deelen
Huibertgat
Abilene
Bakersfield
DKDM SJPI Weibull
0.0114 0.0132 0.0386
0.0059 0.0062 0.0089
0.0076 0.0075 0.0398
0.0087 0.0082 0.0276
0.0060 0.0062 0.0295
0.0063 0.0067 0.0535
Critical value
0.0148
0.0148
0.0192
0.0195
0.0149
0.0157
the cut-in and cut-out wind speed and the assessment indices are given in Table 4. Although it is inappropriate to compare wind resources between different stations due to different measuring conditions, we can still draw some conclusions. For example, Ganhekou and Changma are rich in wind energies because of large P¯ all and P¯ use . Deelen and Huibertgat are nearby in geographical locations, but their wind conditions are remarkably different. P¯ all is small in Bakersfield and it may be unsuitable to establish wind farms in this place.
4.2.2. Goodness of fit tests The statistics of goodness of fit tests for the results obtained by the proposed method, SJPI and the parametric method using the Weibull distribution are given in Tables 5 and 6. In the case of Chi2 tests, the significant level is 0.05 and the data is grouped in accordance with the histogram in Fig. 5. The statistics obtained by DKDM and SJPI are smaller than the corresponding critical values. However, most of the statistics obtained by the Weibull distributions are larger than the critical values. Although the Weibull distribution
36
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37
can represent a wide range of wind regimes, it is unable to describe the probability models of all the wind speed data. The results of KS tests are given in Table 6 and are similar to those in Table 5. It should be noted that in both Chi2 tests and KS tests, the empirical distribution function (EDF) is recognized as the theoretical cumulative distribution function which may be defective in testing the results obtained by KDE. EDF can be treated as the result estimated by KDE with the bandwidth equal to zero which is described by (14). When the bandwidth tends asymptotically toward zero, the results of KDE gradually approach EDF. Therefore, smaller statistics of goodness of fit tests indicate that the bandwidths are smaller and the estimated distribution functions are closer to EDF, rather than the true probability distribution functions. In Deelen and Huibertgat, the statistics of SJPI are a little smaller than those of DKDM which means that the bandwidths obtained by SJPI are a bit smaller. In the other four stations, the results obtained by DKDM are smaller. This is because in these cases the boundary problem is prominent and has been properly handled by DKDM.
Step 2 Minimize the following equation to get bandwidth b¯ by ¯ using kernel function K(·).
2 OSCV (b) = √ exp 2 N b i=1
−
2 K¯ N2h
where 2 I(Xi , Xj ) = √ 2h2
This work was supported by the Natural Science Foundation of China (51377103). The authors gratefully acknowledge the support from State Grid Corporation of China, Royal Netherlands Meteorological Institute and National Climatic Data Center for providing the wind speed data.
Appendix A. The calculation steps of Gaussian kernel density estimator by OSCV is as follows: Step 1: The intermediate kernel function is given as follows:
¯ = K(x)
1+
2/x
1 − 2/
2 exp √ 2
x2 − 2
, x<0
(A.1)
(Xi − Xj )2 4h2
(Xi − Xj )
I(Xi , Xj )
(A.2)
h
min(0,Xj −Xi )
exp
−
(x + 0.5(Xi − Xj ))2
−∞
h2
(A.3)
Step 3: Rescale b¯ to obtain the bandwidth of Gaussian kernel density estimation. h¯ = C · b¯
C=
Acknowledgements
−
i=1 j=1
This paper studies the problem of estimating wind speed probability distributions by non-parametric methods. Based on the diffusion partial differential equation in finite domain, the method accounts for both bandwidth selection and boundary correction of kernel density estimation. Discrete cosine transformation and inverse discrete cosine transformation are applied to reduce computation complexity of the proposed method. A process consisting of data preprocessing techniques and DKDM is designed to estimate wind speed probability distributions. The proposed method is compared with other approaches in estimating probability densities of parametric distributions functions and actual wind speed data. Although compared with parametric methods DKDM may be trapped in situations where no wind speed is available, it can obtain more accurate results. Moreover, wind speed in different places may own different characteristics. It is difficult for a single parametric distribution to properly model different wind speed data. The proposed non-parametric method can overcome the deficiencies of parametric methods and is of practical value in describing wind speed probability distributions. In future work, we shall try to extend DKDM to model multivariate wind distributions.
j
N
N
(A.4)
where 5. Conclusion
N
N
2
¯ R(K)(u2 (K)) 2 ¯ R(K)(u (K)) 2
1/5 ≈ 0.6168
(A.5)
where K is the Gaussian kernel. Step 4: The estimating probability density is:
1 1 exp − √ ¯ 2 N h 2 N
fˆ (x) =
i=1
x − X 2 i
h¯
(A.6)
References [1] D. Villanueva, J.L. Oazos, A. Feijoo, Probabilistic load flow including wind power generation, IEEE Trans. Power Syst. 26 (3) (2011) 1659–1667. [2] N. Soleimanpour, M. Mohammadi, Probabilistic load flow by using nonparametric density estimators, IEEE Trans. Power Syst. 28 (4) (2013) 3747–3755. [3] L. Shi, C. Wang, L. Yao, Y. Ni, M. Bazargan, Optimal power flow solution incorporating wind power, IEEE Syst. J. 6 (2) (2012) 233–241. [4] Z. Qin, W. Li, X. Xiong, Generation system reliability evaluation incorporating correlations of wind speeds with different distributions, IEEE Trans. Power Syst. 28 (1) (2013) 551–558. [5] D. Villanueva, A. Feijoo, J.L. Pazos, Simulation of correlated wind speed data for economic dispatch evaluation, IEEE Trans. Sustain. Energy 3 (1) (2012) 142–149. [6] T. Yeh, L. Wang, A study on generator capacity for wind turbines under various tower heights and rated wind speeds using Weibull distribution, IEEE Trans. Energy Convers. 23 (2) (2008) 592–602. [7] J.A. Carta, P. Ramírez, S. Velázquez, A review of wind speed probability distributions used in wind energy analysis: case studies in the Canary Islands, Renew. Sustain. Energy Rev. 13 (5) (2009) 933–955. [8] V. Lo Brano, A. Orioli, G. Ciulla, S. Culotta, Quality of wind speed fitting distributions for the urban area of Palermo, Italy, Renew. Energy 36 (3) (2011) 1026–1039. [9] J.A. Carta, P. Ramírez, Use of finite mixture distribution models in the analysis of wind energy in the Canarian Archipelago, Energy Convers. Manag. 48 (1) (2007) 281–291. [10] S. Wang, J. Yu, Approximation of two-peak wind speed probability density function with mixed Weibull distribution, Autom. Electr. Power Syst. 34 (6) (2010) 89–93. [11] E.C. Morgan, M. Lackner, R.M. Vogel, L.G. Baise, Probability distributions for offshore wind speeds, Energy Convers. Manag. 52 (1) (2011) 15–26. [12] O.A. Jaramillo, M.A. Borja, Wind speed analysis in La Ventosa, Mexico: a bimodal probability distribution case, Renew. Energy 29 (10) (2004) 1613–1630. [13] B. Safari, Modeling wind speed and wind power distributions in Rwanda, Renew. Sustain. Energy Rev. 15 (2) (2011) 925–935. [14] A.J. Bowen, N.G. Mortensen, WAsP Prediction Errors Due to Site Orography, 2004, http://orbit.dtu.dk/fedora/objects/orbit:91202/datastreams/ file 7711496/content [15] G. Gualtieri, S. Secci, Methods to extrapolate wind resource to the turbine hub height based on power law: a 1-h wind speed vs. Weibull distribution extrapolation comparison, Renew. Energy 43 (2012) 183–200. [16] Z. Qin, W. Li, X. Xiong, Estimating wind speed probability distribution using kernel density method, Electr. Power Syst. Res. 81 (12) (2011) 2139–2146. [17] J. Zhang, S. Chowdhury, A. Messac, L. Castillo, A multivariate and multimodal wind distribution model, Renew. Energy 51 (2013) 436–447. [18] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, UK, 1986.
X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37 [19] B.A. Turlach, Bandwidth Selection in Kernel Density Estimation: A Review, Université catholique de Louvain, Louvain-la-Neuve, 1993. [20] S. Chiu, A comparative review of bandwidth selection for kernel density estimation, Stat. Sin. 6 (1) (1996) 129–145. [21] M.C. Jones, Simple boundary correction for kernel density estimation, Stat. Comput. 3 (3) (1993) 135–146. [22] S.X. Chen, Probability density function estimation using gamma kernels, Ann. Inst. Stat. Math. 52 (3) (2000) 471–480. [23] R. Cao, Bootstrapping the mean integrated squared error, J. Multivar. Anal. 45 (1) (1993) 137–160. [24] N. Heidenreich, A. Schindler, S. Sperlich, Bandwidth Selection Methods for Kernel Density Estimation – A Review of Performance, 2010, Available at SSRN 1726428. [25] G.N. Gregoriou, Operational Risk Towards Basel III: Best Practices and Issues in Modeling, Management and Regulation, vol. 481, John Wiley and Sons, New Jersey, 2009, pp. 177–196.
37
[26] J.S. Sheather, M.C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, J. R. Stat. Soc. Ser. B: Methodol. 53 (3) (1991) 683–690. [27] Z.I. Botev, J.F. Grotowski, D.P. Kroese, Kernel density estimation via diffusion, Ann. Stat. 38 (5) (2010) 2916–2957. [28] A.K. Jain, Fundamentals of Digital Image Processing, vol. III, Prentice-Hall, Englewood Cliffs, 1989. [29] KNMI, Royal Netherlands Meteorological Institute, 2014. http://www.knmi.nl/ klimatologie/onderzoeksgegevens/potentiele wind/ [30] NCDC, National Climatic Data Center, 2014. http://www.ncdc.noaa.gov/ [31] K. Fukunaga, L. Hostetler, Optimization of k-nearest-neighbor density estimates, IEEE Trans. Inform. Theory 19 (3) (1973) 320–326.