Estimating wind speed probability distribution by diffusion-based kernel density method

Electric Power Systems Research 121 (2015) 28–37 Contents lists available at ScienceDirect Electric Power Systems Research journal homepage: www.els...

Download PDF

2MB Sizes 0 Downloads 108 Views

Report

PDF Reader
Full Text

Electric Power Systems Research 121 (2015) 28–37

Contents lists available at ScienceDirect

Electric Power Systems Research journal homepage: www.elsevier.com/locate/epsr

Estimating wind speed probability distribution by diffusion-based kernel density method Xiaoyuan Xu, Zheng Yan ∗ , Shaolun Xu Key Laboratory of Control of Power Transmission and Conversion, Ministry of Education, Department of Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

a r t i c l e

i n f o

Article history: Received 16 July 2014 Received in revised form 15 November 2014 Accepted 28 November 2014 Keywords: Diffusion partial differential equation Discrete cosine transformation Kernel density estimation Parametric distribution model Wind speed probability distribution Goodness of ﬁt test

a b s t r a c t Accurate estimation of wind speed probability distributions is a challenging task in wind power planning and operation. Different from the commonly used parametric methods which consist of selecting a suitable parametric model and estimating the parameters, this paper presents an improved non-parametric method to estimate wind speed probability distributions. Based on the diffusion partial differential equation in ﬁnite domain, this method accounts for both bandwidth selection and boundary correction of kernel density estimation. Preprocessing techniques are designed to handle data with different recording manners to produce smooth probability density functions. Probability densities of speciﬁc grid points are obtained by inverse discrete cosine transformation and are further used to calculate assessment indices of wind resources. The method has been tested to estimate probability densities of parametric distributions and actual wind speed data measured in different places. Simulation results show that the proposed approach is of practical value in ﬁtting wind speed distribution models. © 2014 Elsevier B.V. All rights reserved.

1. Introduction As more and more wind power has been utilized around the world, it is of great value to estimate probability distributions of wind speed to maximize the efﬁciency of wind power generations. In power system planning and reliability analysis, most of the researches on wind speed probability distributions are based on parametric distribution models [1–6]. And the methods estimating wind speed probability distributions are usually parametric methods which consist of selecting a suitable parametric distribution and estimating the parameters [7–13]. The commonly used parametric distribution models can be divided into two categories:

(1) Unimodal parametric distributions including the Weibull distribution, the Gamma distribution, the Rayleigh distribution and so on. (2) Multimodal parametric distributions, especially bimodal models, for example, the mixture Weibull distribution of two components, the mixture distribution of truncated Normal and Weibull.

∗ Corresponding author. Tel.: +86 13621931964. E-mail addresses: [email protected] (X. Xu), [email protected] (Z. Yan), [email protected] (S. Xu). http://dx.doi.org/10.1016/j.epsr.2014.11.029 0378-7796/© 2014 Elsevier B.V. All rights reserved.

The parametric methods are widely used for the efﬁciency in estimation and can be applied to estimate wind characteristics at the sites for which no wind data is available [14,15]. However, there are also challenges for the parametric methods. First, there is no rule in selecting the theoretical distribution and a distribution model which can represent wind regimes at some wind farms may not work well for others. Second, an estimated parametric model may not always be satisfactory results because of the extreme randomness of wind speed in both time and space. Thus, the non-parametric methods are introduced to estimate wind speed probability distributions [16,17]. Kernel density estimation (KDE) is one of the most popular non-parametric distribution estimating methods [18–20]. Up to now, most of the papers concentrate on the mathematical theories of KDE and only a few of them actually model the probability distributions of wind speed. A practical kernel density method is proposed in [16] to estimate long-time wind speed probability distributions. A smooth multivariate wind distribution model is developed in [17] to capture the coupled variation of wind speed, wind direction, and air density. In this paper an improved diffusion-based kernel density method (DKDM) is presented to estimate wind speed probability distributions. DKDM accounts for both bandwidth selection and boundary correction of KDE and discrete cosine transformation is adopted in DKDM to reduce the computational complexity. In order to produce smooth probability densities, a uniformly distributed

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

random number is added to the original wind speed data and its value range depends on the corresponding recording frequency and resolution. The probability densities of speciﬁc grid points are obtained by inverse discrete cosine transformation and are further used to calculate assessment indices of wind resources. The proposed method is compared with other approaches in estimating probability densities of parametric distribution functions and actual wind speed data. The simulation results demonstrate the practicality of this method.

x − X

R(K) =

K 2 (u)du,

f (u)du 2

(8)

Deﬁne the Asymptotic Mean Integrated Squared Error (AMISE) as: h4 R(K) + (u2 (K))2 R(f ) 4 Nh

(9)

hopt =

1/5

R(K) u22 (K)R(f )

n−1/5

(10)

In (10) only R(f ) is unknown and has to be estimated. Thus, plug-in methods mainly concern the techniques to estimate R(f ).

(2)

It turns out that the choice of bandwidth is much more important than the choice of kernel functions for the behavior of estimating results. Small values of bandwidth make the estimation look “wiggly” and show spurious features, whereas big values of bandwidth will make the result over-smooth in a sense that it may not reveal the structural features, such as multimodality [19]. In addition, the original KDE assumes the domain of the density to be inﬁnite. Thus, KDE suffers from boundary problems if the domain has ﬁnite endpoints [21,22]. There are two classes of methods to estimate the bandwidth of KDE: the cross-validation methods which try to look at ISE(h), and the plug-in methods which try to minimize MISE(h) [23].

Among the two classes of methods, the most popular approach is Silverman’s rule of thumb (ROT) [18] in which the density is regarded as the normal distribution, but it usually makes the results over-smooth in multimodal models. Further, it has been reported in [24] that one-sided cross-validation method (OSCV) [25] and Sheather–Jones plug-in method (SJPI) [26] are outstanding methods among various bandwidth selection methods. In this paper, the densities estimated by ROT, OSCV and SJPI are evaluated and compared with the results obtained by the proposed method. ROT and SJPI are introduced in [18] and [26], respectively. The calculation procedure for OSCV using Gaussian kernel is deduced in Appendix A. 2.2. Kernel density estimation via diffusion

2

(fˆh (x) − f (x)) dx

(3) 2

{fˆh (x) − f (x)} dx]

MISE(h) = E[

(4)

where fˆh (x) represents the probability density estimated by KDE, f(x) represents the true density distribution function and E(·) is the expectation of variables. (1) Cross-validation methods

2

{fˆh (x) − f (x)} dx =

ISE(h) =

fˆh2 (x)dx − 2E{fˆh (x)}

+

f 2 (x)dx

(5)

The Gaussian kernel density estimator of (1) can be written in an alternative form: 1 fˆ (x; t) = (x, Xi ; t) N N

(11)

i=1

where

1 x − Xi 2 exp − (x, Xi ; t) = √ (12) 2t 2t √ where t has the same deﬁnition as h in (1), referred to as the bandwidth. It is shown in [27], the Gaussian kernel density estimator is the unique solution to the following diffusion partial differential equation (PDE): 2

The third term can be ignored since it does not depend on the bandwidth and (5) can be simpliﬁed as:

F(h) =

(7)

So the asymptotically optimal bandwidth is:

1 1 K(u) = √ exp − u2 2 2

h4 R(K) + (u2 (K))2 R(f ) 4 Nh

Nh

u2 K(u)du,

1

R(f ) =

where K(·) is the kernel function, h is the bandwidth and N is the size of data. There are many kernel functions and the Gaussian kernel is selected as the kernel function in this paper:

ISE(h) =

where

(1)

h

i=1

+ o(h4 ) + o

AMISE(h) =

i

2

E{fˆh (x) − f (x)} dx =

Suppose we have observed data X1 , X2 , . . ., XN from a common distribution with the probability density function f(x), and use the kernel density method to estimate the density as follows [18]: N

MISE(h) =

2.1. Introduction of kernel density estimation

1 fˆ (x) = K Nh

Various modiﬁcations of cross-validation methods have been proposed to accurately estimate the second term in (6). (2) Plug-in methods

u2 (K) =

2. Diffusion-based kernel density method

29

1 ∂ ˆ ∂ˆ f (x; t), x ∈ X, t > 0, X ≡ R f (x; t) = 2 ∂x2 ∂t

(13)

with the initial condition of 1 fˆ (x; 0) = ı(x − Xi ) N N

fˆh2 (x)dx − 2E{fˆh (x)}

(6)

i=1

(14)

30

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

where ı(x − Xi ) is the Dirac measure at Xi . When domain has ﬁnite endpoints, (1) needs boundary correction. Yet, within the PDE framework, all we have to do is to solve (13) over the ﬁnite domain with the initial condition (14) and the Neumann boundary condition:

∂ ˆ ∂ ˆ f (x; t)|x=Xl = f (x; t)|x=Xu = 0 ∂x ∂x

(15)

where Xl and Xu are the lower and upper bounds of the domain, respectively. Assuming the domain of the probability density is [0,1] and considering above conditions, the analytical solution of this PDE is as follow [27]: fˆ (x; t) =

N 1

N

(x, Xi ; t),

x ∈ [0, 1]

(x, 2k + Xi ; t) + (x, 2k − Xi ; t)

cos(kx) cos(kXi )

n−1

ak e−k

2 2 t/2

cos

(2j + 1)k

(24)

2n

k=0

6 2−3 7

2/5 [l] (t)

(25)

where [l] (t) = 1 (2 (· · ·l (t)· · ·)) l ≥ 1

⎛

l (t) = ⎝

2 1 + (1/2) 3N

/2

(l+1/2)

(26)

(1 × 3 × · · · × (2l − 1))

n−1 (2l+2) 2 −k2 2 t (k) ak e k=1

⎞2/(3+2l)

⎠

(27)

where l is selected as 7 in this paper.

The theta function in (17) can be written as: e

fˆ (x; t) ≈

√

2.3. Implementation of diffusion-based kernel density method

(x, Xi ; t) = 1 + 2

Eq. (22) indicates that ak can be calculated by discrete cosine transform (DCT). The DCT for a vector of n elements can be implemented via an n-point Fast Fourier Transform (FFT) for which the computational complexity is O(nlog2 n). For x = (2j + 1)/2n, j = 0, 1, . . ., n − 1

t=

Eq. (16) is an alternative form of KDE and considers both bandwidth selection and boundary correction. For small bandwidths, the estimator of (16) and the estimator of (17) behave similarly in the interior of [0,1]. As to the domain near boundaries, Eq. (16) has a better performance for it is consistent with the true density, while the Gaussian kernel density estimator by (1) is inconsistent [27].

−k2 2 t/2

(23)

(17)

k=−∞

∞

˛(k) = 2 for 1 ≤ k ≤ n − 1

(16)

where the kernel is the theta function and is given by: ∞

˛(0) = 1,

According to (24), the density estimator can be calculated by inverse discrete cosine transform (IDCT) and IDCT can be obtained via IFFT in O(nlog2 n) operations. The computation process of DCT and IDCT can be referred to [28]. The improved plug-in method in [27] is used to calculate the optimal bandwidth which is the solution to a ﬁx-point problem.

i=1

(x, Xi ; t) =

where

3. Process to estimate wind speed probability distribution (18)

k=1

3.1. General steps of wind speed probability distribution estimation

Thus, Eq. (16) can be approximately expressed by: fˆ (x; t) ≈

n−1

ak e

−k2 2 t/2

cos(kx)

(19)

k=0

where n is a large positive integer and the coefﬁcients ak can be given by:

ak =

⎧ 1 ⎪ ⎨

k=0

N 2

⎪ ⎩N

cos(kXi )

k = 1, 2, . . ., n − 1

(20)

i=1

For k = 1, 2, . . ., n − 1, ak can be estimated by the following equation:

ak

1

cos(kx)fˆ (x; 0)dx

=2

0 n−1

2 = N N

j=0 i=1 n−1

≈

Wn = (RF/FF) × RR × rnd

(j+1)/n

cos(kx)ı(x − Xi )dx j/n

2 cos k N j=0

j + 0.5 n

(21)

H(j)

where H(j) is the number of Xi in [j/n, (j + 1)/n]. Denote H(j)/N as y(j) and ak can be calculated by: n−1

ak = ˛(k)

j=0

y(j) cos

(2j + 1)k 2n

The probability distribution functions of wind speed in a duration of years are commonly recognized as continuous functions. However, in actual meteorological stations, data is sometimes recorded in low frequencies and resolutions which produce repeated wind speeds. In non-parametric methods, the results are completely determined by the data and these ties may lead to discrete distribution functions. Therefore, in order to obtain smooth densities, two preprocessing steps are designed. Step 1: If RF is higher than FF, the wind speed is the mean of the raw data in every FF. Where RF is the recording frequency of raw wind speed data and FF is the frequency of data which is used to estimate the probability distribution function. Step 2: A uniformly distributed random number Wn is added to the wind speed obtained in Step 1. Wn is calculated as follows.

(22)

(28)

where rnd is a uniformly distributed random number in the range of [−0.5, 0.5] and RR is the recording resolution of raw wind speed data. After the pretreatment of wind speed data, the probability distributions are estimated as follows. Step 3: Get the wind speed data V1 , V2 , . . ., VN and determine the lower bound (LB) and the upper bound (UB) of the domain. LB = 0

(29)

UB = max(Vi ) + × (max(Vi ) − min(Vi ))

(30)

where is a positive number and is 0.2 in this paper.

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

Step 4: Get n grid points Uj in the range of [LB, UB]. Count the number of wind speed data H(j) in the interval of [Uj , Uj+1 ). Uj = LB + j/n(UB − LB)

j = 0, 1, . . ., n − 1

(31)

where n is a large integer and is selected as 215 in this paper. Step 5: Transform n grid points to the domain of [0,1]: Xj =

Uj − LB

j n

=

UB − LB

Table 1 The parameters of three typical parametric distribution functions. PDF

Values of parameters

W1 W2 MWW MTNW

c = 7, k = 1.5 c = 8, k = 3.5 u1 = 3, c1 = 2.5, k1 = 3, u2 = 6, c2 = 4, k2 = 2, w = 0.4 u = 7, c = 6, k = 2, ˛ = 3, ˇ = 4, w = 0.5

(32)

Step 6: Calculate ak using (22) by DCT. Step 7: Calculate the optimal bandwidth using (25). Step 8: Calculate probability density fˆ (Xj + 1/(2n)) using (24) by IDCT. Step 9: Transform the density fˆ to the density fˆv in the domain of [LB, UB]:

fˆv Wj =

31

fˆ Xj + 1/ (2n)

3.2.2. Goodness of ﬁt The goodness of ﬁt of a statistical model describes how well it ﬁts a set of observations. The following two goodness of ﬁt tests are applied to test the results. 1) Chi-squared (Chi2) test 2) Kolmogorov–Smirnov (KS) test

(33)

(UB − LB)

where Wj = Uj + (UB − LB)/(2n). Different from conventional KDE methods, the densities of n points are directly calculated by IDCT. Based on this property, the characteristics of wind speed can be further analyzed. Step 10: The assessment indices of wind resources can be obtained as follows.

Chi2 test is based on binning of the data, thus only using part of the information. KS test is based on the empirical cumulative distribution function and fully utilizes the information of data. In both statistical tests, a smaller statistic represents a better goodness of ﬁt between the theoretical distribution and given samples. If the statistic is smaller than the corresponding critical value, the hypothesis that the samples are drawn from the reference distribution is accepted, otherwise it is rejected.

(a) Cumulative distribution function

Ww

Fˆv (Ww ) =

fv (v)dv = 0

w

4. Case study fˆv (Wj ) W

(34) 4.1. Estimating probability densities of parametric distribution models

j=0

where W = (UB − LB)/n. (b) The mean wind energy density

∞

v3 fv (v)dv = 0.5

P¯ all = 0.5

0

n−1

Wj3 fˆv (Wj ) W

(35)

j=0

where is the air density. (c) The mean effective wind energy density

Vout

v3 fv (v)dv = 0.5

P¯ use = 0.5

Vin

k2

Wj3 fˆv (Wj ) W

(36)

vm = v max(0.5 v3 fv (v)) = Wj max(0.5 Wj3 fˆv (Wj )) , j = 0, 1, . . . n − 1

(37)

3.2. Measures to evaluate the performances of methods 3.2.1. Integrated error The integrated error (IE) measures the distance between the estimated density and the true density distribution function and is given by:

IE =

|fˆ (x) − f (x)|dx

1) Weibull distribution (W) 2) Mixture Weibull distribution of two components (MWW) 3) Mixture distribution of truncated Normal and Weibull (MTNW)

j=k1

where Wk1 = Vin and is the cut-in wind speed of wind generators. Wk2 = Vout and is the cut-out wind speed of wind generators. (d) The mode of wind power density distribution The mode of wind power density distribution is the wind speed at which the power density distribution is a maximum.

In this section, we test the ability of the proposed method in estimating probabilities densities of commonly used parametric distribution models. Corresponding to the categories in Section 1, three typical parametric distribution functions are selected as follows.

(38)

where fˆ (x) is the estimated density and f(x) is the theoretical probability density function.

The above parametric distributions contain unimodal and multimodal models, and considers null wind speed as well. Other unimodal models such as the Gamma distribution and the Rayleigh distribution are not tested due to their similarities with the Weibull distribution. In Table 1, the values of parameters of the probability density functions are given. u, c and k are the location, scale and shape parameter of Weibull distribution. ˛ and ˇ are the expectation and standard deviation of the corresponding truncated normal distribution. w is the weight coefﬁcient in MWW and MTNW. Readers can refer to [12,9] for detailed models of MWW and MTNW. For the four PDFs given in Table 1, 100, 200, 500, 1000 samples sizes are considered and 250 repetitions for each PDF and each sample size are performed. The ﬁnal integrated errors are the mean of 250 integrated errors. We compared the proposed method with the widely recognized bandwidth selection methods, ROT, SJPI and OSCV. The results are given in Fig. 1. ROT assumes the estimating model to be the normal distribution, and usually does a good job in estimating unimodal distributions such as W1 and W2. The improved bandwidth selection methods, SJPI and OSCV, have the similar performances with ROT in estimating unimodal models. Moreover, they have a better performance in estimating multimodal distributions, for example, MWW and MTNW.

32

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

Fig. 1. The integrated errors of estimating probabilities densities of parametric models.

Wind speed is strictly non-negative and in this case the parameters of W2 and MWW make the distributions have a small probability near zero. Thus, the boundary problems can be ignored which leads to almost the same results obtained by SJPI, OSCV and DKDM. On the contrary, W1 and MTNW have a large probability near zero. Therefore, when estimating these models, DKDM is the

obvious winner respect to other methods which do not consider boundary correction. Fig. 2 gives intuitive descriptions of the performances of different methods in estimating densities of parametric models. The number of samples is set as 5000. Because of the similar performances by SJPI and OSCV, only the results by SJPI are shown. In the

Fig. 2. The probability densities of parametric models estimated by ROT, SJPI and DKDM.

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

33

Table 2 Details of wind speed observation stations. Location

Latitude

Longitude

Ganhekou Changma Deelen Huibertgat Abilene Bakersﬁeld

N40.51 N39.85 N52.06 N53.57 N32.41 N35.43

E95.44 E96.78 E5.89 E6.40 W99.68 W119.05

Time 2008.5–2009.4 2008.5–2009.4 2001.1–2014.5 2001.1–2014.5 2013.1–2013.12 2013.1–2013.12

regions away from zero, the densities estimated by DKDM and SJPI nearly coincide which indicates that the computing bandwidths are practically equal for the two methods. As to the regions near zero, the outcomes are quite different. Speciﬁcally, the domain of the probability density estimated by DKDM is non-negative, while the domain of the density estimated by SJPI extends to the negative axle. In other words, there exist negative wind speeds for the results obtained by SJPI. In addition, ROT does a satisfactory job in describing W1 and W2 but it overﬁts the densities of bimodal distribution models. In summary, DKDM accounts for both bandwidth selection and boundary correction of KDE and does a satisfactory job in ﬁtting commonly used parametric models.

4.2. Estimating probability densities of wind speed data In this section, wind speed data measured in six wind observation stations are used to test the proposed method and the details of the stations are given in Table 2. The wind speed data measured in Ganhekou and Changma is provided by State Grid Corporation of China (SGCC). The data measured in Huibertgat and Deelen is obtained from Royal Netherlands Meteorological Institute (KNMI) [29]. And the wind speed data in Abilene and Bakersﬁeld is obtained from National Climatic Data Center (NCDC) [30].

RF

FF

RR

10 min 10 min 1h 1h 2 min 2 min

1h 1h 1 day 1 day 1h 1h

0.000001 m/s 0.000001 m/s 0.1 m/s 0.1 m/s 1 knot 1 knot

4.2.1. Probability distributions of wind speed data First of all, the effects of preprocessing on estimation are studied. In Fig. 3 Wn is not considered. The probability densities of the wind speed measured in Ganhekou and Changma are relatively smooth and are similar to parametric models such as the Weibull distribution. However, results in Deelen, Huibertgat, Abilene and Bakersﬁeld present many discrete peaks. In these stations the recording frequencies and resolutions are relatively low which leads to many repeated wind speed data. Fig. 4 gives the occurrence numbers of different wind speed data measured in Deelen. The data is unevenly distributed. There are totally 119,136 samples but there are only 147 different values. The occurrence number of 1.1 m/s is 10,790 but there is no wind speed between 1.2 m/s and 2.0 m/s. According to the bandwidth selection method in (25), DKDM assumes the probability density function to be continuous and ties or repeated values are recognized as samples of discrete variables. Hence for these stations the bandwidths estimated by DKDM are quite small, leading to many spurious peaks in the results. In Fig. 5, Wn is introduced and the outcomes are quite different from those in Fig. 3. Due to different recoding frequencies and resolutions, the value ranges of Wn are different for different stations. For Ganhekou and Changma Wn = 1.67 · 10−7 rnd. For Deelen and Huibertgat Wn = 4.17 · 10−3 rnd. As to Abilene and Bakersﬁeld, Wn = 1.71 · 10−2 rnd. In fact, the resolution of data is (RF/FF) × RR after Step 1, thus adding Wn to the wind speed data does not

Fig. 3. The probability densities of wind speed obtained by DKDM when Wn is not considered.

34

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37 Table 3 The comparison of bandwidths between not introducing Wn and introducing Wn.

Fig. 4. Occurrence numbers of different wind speed data recorded in Deelen.

bring in errors under the resolution of (RF/FF) × RR. The introduction of Wn eliminates the repeated values of the data and produces continuous probability densities. A comparison of bandwidths between introducing Wn and not introducing Wn is given in Table 3. In accordance with Figs. 3 and 5, for Deelen, Huibertgat, Abilene and Bakersﬁeld the estimated bandwidths are relatively larger after introducing Wn, which results in smoother probability densities. In Fig. 5, parametric methods and other non-parametric methods including the histogram (Hist) and k-nearest-neighbor (KNN) density estimation [31] are also performed for comparison. The histogram is simple but it is not an accurate description of the underlying density. It usually provides a quick visualization of the distribution. The Weibull distribution is used in the parametric method and does an acceptable job in describing the probability

Location

Bandwidth without Wn

Bandwidth with Wn

Ganhekou Changma Deelen Huibertgat Abilene Bakersﬁeld

0.0161 0.0187 1.6994 × 10−05 2.3114 × 10−05 1.2128 × 10−05 1.0052 × 10−05

0.0161 0.0187 0.0184 0.0233 0.0179 0.0093

densities of the data measured in Changma and Huibertgat. However, in other stations, the parametric models are not consistent with the histograms. KNN has a good performance in evaluating the probability densities in the interior of the domain. However, KNN produces heavy tails and the result is not a true probability density since its integral over all the sample space diverges. Other KDE methods, including the bandwidth optimization (BO) method in [16] and SJPI in [17] are also performed. The comparisons between different KDE methods are given in Fig. 6. Just as the results given in Section 4.1, in the regions away from zero, densities obtained by SJPI and DKDM nearly coincide with each other. While in the regions near zero, the domain of the density estimated by DKDM is non-negative but the result obtained by SJPI extends to the negative axle. Compared with SJPI and DKDM, BO produces unsmooth results which are obvious in Changma, Deelen and Abilene. In other words, the bandwidths obtained by BO are smaller than the bandwidths estimated by DKDM and SJPI. In summary, DKDM does the best job in ﬁtting wind speed probability distributions in these six stations. Different from conventional KDE methods, the probability densities of speciﬁc grid points are calculated by IDCT in DKDM. Therefore, the cumulative distribution function (CDF) and various assessment indices of wind resources can be easily obtained by numerical integrations. We assume that the cut-in wind speed of the wind generator is 3 m/s, the cut-out wind speed is 20 m/s and the air density is 1.225 kg/m3 . The CDF of wind speed between

Fig. 5. The probability densities of wind speed when Wn is considered.

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

35

Fig. 6. The comparisons between different KDE methods in ﬁtting wind speed probability distributions.

Table 4 The assess indices of wind resources. Location

CDF vvout

P¯ all (W/m2 )

P¯ use (W/m2 )

vm (m/s)

Ganhekou Changma Deelen Huibertgat Abilene Bakersﬁeld

0.80 0.91 0.72 0.97 0.72 0.30

598.21 579.32 75.40 382.61 127.13 23.00

551.54 550.49 70.09 382.22 125.24 19.99

13.07 12.73 5.97 9.82 9.11 4.93

in

Table 5 The statistics of Chi2 tests for the estimated probability densities.

DKDM SJPI Weibull Critical value

Ganhekou

Changma

Deelen

Huibertgat

Abilene

Bakersﬁeld

17.54 22.43 307.94

10.54 11.26 54.39

3.46 3.22 314.74

12.57 11.29 151.17

4.15 4.22 120.36

2.80 3.01 2512.62

36.42

36.42

21.03

28.87

22.36

22.36

Table 6 The statistics of KS tests for the estimated probability densities. Ganhekou

Changma

Deelen

Huibertgat

Abilene

Bakersﬁeld

DKDM SJPI Weibull

0.0114 0.0132 0.0386

0.0059 0.0062 0.0089

0.0076 0.0075 0.0398

0.0087 0.0082 0.0276

0.0060 0.0062 0.0295

0.0063 0.0067 0.0535

Critical value

0.0148

0.0148

0.0192

0.0195

0.0149

0.0157

the cut-in and cut-out wind speed and the assessment indices are given in Table 4. Although it is inappropriate to compare wind resources between different stations due to different measuring conditions, we can still draw some conclusions. For example, Ganhekou and Changma are rich in wind energies because of large P¯ all and P¯ use . Deelen and Huibertgat are nearby in geographical locations, but their wind conditions are remarkably different. P¯ all is small in Bakersﬁeld and it may be unsuitable to establish wind farms in this place.

4.2.2. Goodness of ﬁt tests The statistics of goodness of ﬁt tests for the results obtained by the proposed method, SJPI and the parametric method using the Weibull distribution are given in Tables 5 and 6. In the case of Chi2 tests, the signiﬁcant level is 0.05 and the data is grouped in accordance with the histogram in Fig. 5. The statistics obtained by DKDM and SJPI are smaller than the corresponding critical values. However, most of the statistics obtained by the Weibull distributions are larger than the critical values. Although the Weibull distribution

36

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37

can represent a wide range of wind regimes, it is unable to describe the probability models of all the wind speed data. The results of KS tests are given in Table 6 and are similar to those in Table 5. It should be noted that in both Chi2 tests and KS tests, the empirical distribution function (EDF) is recognized as the theoretical cumulative distribution function which may be defective in testing the results obtained by KDE. EDF can be treated as the result estimated by KDE with the bandwidth equal to zero which is described by (14). When the bandwidth tends asymptotically toward zero, the results of KDE gradually approach EDF. Therefore, smaller statistics of goodness of ﬁt tests indicate that the bandwidths are smaller and the estimated distribution functions are closer to EDF, rather than the true probability distribution functions. In Deelen and Huibertgat, the statistics of SJPI are a little smaller than those of DKDM which means that the bandwidths obtained by SJPI are a bit smaller. In the other four stations, the results obtained by DKDM are smaller. This is because in these cases the boundary problem is prominent and has been properly handled by DKDM.

Step 2 Minimize the following equation to get bandwidth b¯ by ¯ using kernel function K(·).

2 OSCV (b) = √ exp 2 N b i=1

−

2 K¯ N2h

where 2 I(Xi , Xj ) = √ 2h2

This work was supported by the Natural Science Foundation of China (51377103). The authors gratefully acknowledge the support from State Grid Corporation of China, Royal Netherlands Meteorological Institute and National Climatic Data Center for providing the wind speed data.

Appendix A. The calculation steps of Gaussian kernel density estimator by OSCV is as follows: Step 1: The intermediate kernel function is given as follows:

¯ = K(x)

1+

2/x

1 − 2/

2 exp √ 2

x2 − 2

, x<0

(A.1)

(Xi − Xj )2 4h2

(Xi − Xj )

I(Xi , Xj )

(A.2)

h

min(0,Xj −Xi )

exp

−

(x + 0.5(Xi − Xj ))2

−∞

h2

(A.3)

Step 3: Rescale b¯ to obtain the bandwidth of Gaussian kernel density estimation. h¯ = C · b¯

C=

Acknowledgements

−

i=1 j=1

This paper studies the problem of estimating wind speed probability distributions by non-parametric methods. Based on the diffusion partial differential equation in ﬁnite domain, the method accounts for both bandwidth selection and boundary correction of kernel density estimation. Discrete cosine transformation and inverse discrete cosine transformation are applied to reduce computation complexity of the proposed method. A process consisting of data preprocessing techniques and DKDM is designed to estimate wind speed probability distributions. The proposed method is compared with other approaches in estimating probability densities of parametric distributions functions and actual wind speed data. Although compared with parametric methods DKDM may be trapped in situations where no wind speed is available, it can obtain more accurate results. Moreover, wind speed in different places may own different characteristics. It is difﬁcult for a single parametric distribution to properly model different wind speed data. The proposed non-parametric method can overcome the deﬁciencies of parametric methods and is of practical value in describing wind speed probability distributions. In future work, we shall try to extend DKDM to model multivariate wind distributions.

j

N

N

(A.4)

where 5. Conclusion

N

N

2

¯ R(K)(u2 (K)) 2 ¯ R(K)(u (K)) 2

1/5 ≈ 0.6168

(A.5)

where K is the Gaussian kernel. Step 4: The estimating probability density is:

1 1 exp − √ ¯ 2 N h 2 N

fˆ (x) =

i=1

x − X 2 i

h¯

(A.6)

References [1] D. Villanueva, J.L. Oazos, A. Feijoo, Probabilistic load ﬂow including wind power generation, IEEE Trans. Power Syst. 26 (3) (2011) 1659–1667. [2] N. Soleimanpour, M. Mohammadi, Probabilistic load ﬂow by using nonparametric density estimators, IEEE Trans. Power Syst. 28 (4) (2013) 3747–3755. [3] L. Shi, C. Wang, L. Yao, Y. Ni, M. Bazargan, Optimal power ﬂow solution incorporating wind power, IEEE Syst. J. 6 (2) (2012) 233–241. [4] Z. Qin, W. Li, X. Xiong, Generation system reliability evaluation incorporating correlations of wind speeds with different distributions, IEEE Trans. Power Syst. 28 (1) (2013) 551–558. [5] D. Villanueva, A. Feijoo, J.L. Pazos, Simulation of correlated wind speed data for economic dispatch evaluation, IEEE Trans. Sustain. Energy 3 (1) (2012) 142–149. [6] T. Yeh, L. Wang, A study on generator capacity for wind turbines under various tower heights and rated wind speeds using Weibull distribution, IEEE Trans. Energy Convers. 23 (2) (2008) 592–602. [7] J.A. Carta, P. Ramírez, S. Velázquez, A review of wind speed probability distributions used in wind energy analysis: case studies in the Canary Islands, Renew. Sustain. Energy Rev. 13 (5) (2009) 933–955. [8] V. Lo Brano, A. Orioli, G. Ciulla, S. Culotta, Quality of wind speed ﬁtting distributions for the urban area of Palermo, Italy, Renew. Energy 36 (3) (2011) 1026–1039. [9] J.A. Carta, P. Ramírez, Use of ﬁnite mixture distribution models in the analysis of wind energy in the Canarian Archipelago, Energy Convers. Manag. 48 (1) (2007) 281–291. [10] S. Wang, J. Yu, Approximation of two-peak wind speed probability density function with mixed Weibull distribution, Autom. Electr. Power Syst. 34 (6) (2010) 89–93. [11] E.C. Morgan, M. Lackner, R.M. Vogel, L.G. Baise, Probability distributions for offshore wind speeds, Energy Convers. Manag. 52 (1) (2011) 15–26. [12] O.A. Jaramillo, M.A. Borja, Wind speed analysis in La Ventosa, Mexico: a bimodal probability distribution case, Renew. Energy 29 (10) (2004) 1613–1630. [13] B. Safari, Modeling wind speed and wind power distributions in Rwanda, Renew. Sustain. Energy Rev. 15 (2) (2011) 925–935. [14] A.J. Bowen, N.G. Mortensen, WAsP Prediction Errors Due to Site Orography, 2004, http://orbit.dtu.dk/fedora/objects/orbit:91202/datastreams/ ﬁle 7711496/content [15] G. Gualtieri, S. Secci, Methods to extrapolate wind resource to the turbine hub height based on power law: a 1-h wind speed vs. Weibull distribution extrapolation comparison, Renew. Energy 43 (2012) 183–200. [16] Z. Qin, W. Li, X. Xiong, Estimating wind speed probability distribution using kernel density method, Electr. Power Syst. Res. 81 (12) (2011) 2139–2146. [17] J. Zhang, S. Chowdhury, A. Messac, L. Castillo, A multivariate and multimodal wind distribution model, Renew. Energy 51 (2013) 436–447. [18] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, UK, 1986.

X. Xu et al. / Electric Power Systems Research 121 (2015) 28–37 [19] B.A. Turlach, Bandwidth Selection in Kernel Density Estimation: A Review, Université catholique de Louvain, Louvain-la-Neuve, 1993. [20] S. Chiu, A comparative review of bandwidth selection for kernel density estimation, Stat. Sin. 6 (1) (1996) 129–145. [21] M.C. Jones, Simple boundary correction for kernel density estimation, Stat. Comput. 3 (3) (1993) 135–146. [22] S.X. Chen, Probability density function estimation using gamma kernels, Ann. Inst. Stat. Math. 52 (3) (2000) 471–480. [23] R. Cao, Bootstrapping the mean integrated squared error, J. Multivar. Anal. 45 (1) (1993) 137–160. [24] N. Heidenreich, A. Schindler, S. Sperlich, Bandwidth Selection Methods for Kernel Density Estimation – A Review of Performance, 2010, Available at SSRN 1726428. [25] G.N. Gregoriou, Operational Risk Towards Basel III: Best Practices and Issues in Modeling, Management and Regulation, vol. 481, John Wiley and Sons, New Jersey, 2009, pp. 177–196.

37

[26] J.S. Sheather, M.C. Jones, A reliable data-based bandwidth selection method for kernel density estimation, J. R. Stat. Soc. Ser. B: Methodol. 53 (3) (1991) 683–690. [27] Z.I. Botev, J.F. Grotowski, D.P. Kroese, Kernel density estimation via diffusion, Ann. Stat. 38 (5) (2010) 2916–2957. [28] A.K. Jain, Fundamentals of Digital Image Processing, vol. III, Prentice-Hall, Englewood Cliffs, 1989. [29] KNMI, Royal Netherlands Meteorological Institute, 2014. http://www.knmi.nl/ klimatologie/onderzoeksgegevens/potentiele wind/ [30] NCDC, National Climatic Data Center, 2014. http://www.ncdc.noaa.gov/ [31] K. Fukunaga, L. Hostetler, Optimization of k-nearest-neighbor density estimates, IEEE Trans. Inform. Theory 19 (3) (1973) 320–326.

Estimating wind speed probability distribution by diffusion-based kernel density method

Estimating wind speed probability distribution by diffusion-based kernel density method

Recommend Documents