Mycielski approach for wind speed prediction


Energy Conversion and Management 50 (2009) 1436–1443


Energy Conversion and Management journal homepage: www.elsevier.com/locate/enconman

Fatih O. Hocaoğlu a,*, Mehmet Fidan b, Ömer N. Gerek b

a Afyon Kocatepe University, Department of Electronics and Communication Engineering, 03200 Afyonkarahisar, Turkey
b Anadolu University, Department of Electrical and Electronics Engineering, 26555 Eskişehir, Turkey

Article info

Article history: Received 1 August 2008; Accepted 8 March 2009; Available online 8 April 2009

Keywords: Modeling; Mycielski algorithm; Wind speed; Prediction

Abstract

Wind speed modeling and prediction play a critical role in wind related engineering studies. However, since the data behave randomly, it is difficult to apply statistical approaches with a priori and deterministic parameters. On the other hand, wind speed data have an important feature: extreme transitions from one wind state to a far different one are rare. Therefore, behavioral modeling is possible. Although several studies focus on global parametrization of wind data behavior, the literature on time-wise modeling and prediction is relatively small. In this study, a novel approach for wind speed modeling using the Mycielski algorithm is demonstrated. The algorithm accurately predicts the time variations of wind speed data, in the sense of forecasting future values, by analyzing repetitions in the history of the data. The prediction precision of the procedure is tested using wind speed data obtained from three different locations in Turkey (Kayseri, İzmir and Antalya). Prediction results with high accuracy are obtained and presented.

© 2009 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +90 5054856012; fax: +90 2722281422. E-mail address: [email protected] (F.O. Hocaoğlu).
0196-8904/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.enconman.2009.03.003

1. Introduction

Wind speed data modeling and prediction remains an important subject for energy planning [1–6]. Specifically, the prediction of general statistics and the distribution of the wind data is of vital importance for determining the wind regime of a region [7]. Generally, a Weibull distribution and its parameters are used for wind regime studies because they give fair (but coarse) information regarding the overall wind potential of a site. More specific modeling and time variation analysis of wind speed data has been examined in relatively few works. Because of the transitional behavior of wind speed, Markov models became popular in that area. For example, Sahin and Sen modeled wind speed data measured in the Marmara region of Turkey using first order Markov chains [8]. Tore et al. used first order Markov chain models for synthetic generation of hourly wind speed time series in the Corsica region [9]. Youcef Ettoumi et al. modeled three-hourly wind speed and wind direction data by means of Markov chains [10]. Shamshad et al. generated hourly wind speed data using first and second order Markov chains and compared the two orders using wind speed data measured in two different regions of Malaysia [11]. They concluded that the modeled wind speed behavior improves slightly as the Markov model order increases. Recently, Hocaoğlu et al. also modeled wind speed data using Markov chains and showed that increasing the state size of the Markov chain has an important effect on the quality of the data generated from the Markov process [12].

This study presents a different and novel approach to the analysis of time variations in wind speed data by means of short term prediction. The prediction approach utilizes the Mycielski algorithm, which is occasionally used in communication applications. Markov modeling essentially consists of constructing a table of value transition probabilities from a training set of recorded wind speed samples. The Mycielski algorithm also learns from past samples but, unlike the Markov approach of building transition probabilities, it considers the past data samples as a whole during the prediction. The Mycielski algorithm uses all of the available past samples of the data. It searches the "history" for the longest string (or data array) that matches the longest suffix string of the original data (that is, the samples at the end of the array). Once the longest such repeating string is found in the past history samples, the sample that followed the matched string is taken as the prediction value. This algorithm was classically used for coding and compression in communications [13], or, with simple inversion modifications, it can be used as a pseudo-random number generator [14]. It is proposed here that it can also be used as a nonlinear predictor [15] for the purpose of very short term wind speed forecasting and analysis. The suggested algorithm is explained and exemplified in this paper with the following organization. First, brief information about the Mycielski algorithm is presented in Section 2. In Section 3, the data used to demonstrate the efficiency of the prediction are described. The results obtained from the algorithm are presented and discussed in Section 4. Finally, the model efficiency is demonstrated in Section 5 by comparing hourly data samples and probability distributions of hourly wind speed data.


2. The Mycielski algorithm

The Mycielski algorithm performs a prediction using the complete, exact history of the data samples. The basic idea of the algorithm is to search for the longest suffix string at the end of the data sequence that has occurred at least once earlier in the sequence. The search starts with a short template (length = 1) and keeps increasing the template length as long as matches are found in the history. When the longest repeating sequence is determined, the value of the sample right after the earlier occurrence of that template is assigned as the prediction value. The rule assumes that if the pattern appeared this way in the past, the data will behave the same way now. This predictor can be generalized with the expression in Eq. (1):

$$\hat{x}[n+1] = f_{n+1}\left(x[1], \ldots, x[n]\right) \qquad (1)$$

The function $f(\cdot)$ performs an iterative algorithm that starts from the shortest data segment at the end (i.e. a single sample, $x[n]$) and then extends the segment to the left one sample at a time: $(x[n-1], x[n])$, $(x[n-2], x[n-1], x[n])$, etc. Meanwhile, each segment is searched for from the end point to the start point by sliding over the samples. Several matches may be found during the run. Once a "no-match" occurs, no longer segment can be encountered anywhere in the past sequence. At that point, the prediction is made as the sample that follows the last encountered (one sample shorter) matching string. Because the algorithm searches through the whole data sequence repeatedly for each prediction step, it has high computational requirements. The overall scheme can be analytically expressed as follows:

$$m = \arg\max_{L}\left( x[k] = x[n],\; x[k-1] = x[n-1],\; \ldots,\; x[k-L+1] = x[n-L+1] \right)$$

$$f_{n+1} = \hat{x}[n+1] = x[m] \qquad (2)$$
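A minimal Python sketch of the search described in this section may help make the procedure concrete. It is an illustration under stated assumptions, not the authors' implementation: the function name `mycielski_predict` is hypothetical, indices are 0-based (so `history[-1]` plays the role of x[n]), ties between equally long matches are resolved in favor of the most recent occurrence, and when the last state has never occurred before, the sketch simply repeats it (a fallback the paper does not specify).

```python
from typing import List, Sequence


def mycielski_predict(history: Sequence[int]) -> int:
    """Predict the next state from the longest repeated suffix of `history`.

    Assumptions (not stated in the paper): most recent match wins on ties,
    and the last value is repeated when it has no earlier occurrence.
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must contain at least one sample")

    # Fallback (assumption): persistence, i.e. repeat the last observed state.
    prediction = history[-1]

    # End positions k < n-1 of earlier occurrences of the current suffix.
    # Start with the length-1 suffix, i.e. the last sample alone.
    length = 1
    candidates: List[int] = [k for k in range(n - 1) if history[k] == history[-1]]

    while candidates:
        # The sample right after the most recent match is the current guess.
        prediction = history[candidates[-1] + 1]
        # Try to extend every surviving match by one sample to the left.
        length += 1
        candidates = [
            k
            for k in candidates
            if k - length + 1 >= 0
            and history[k - length + 1] == history[n - length]
        ]
    return prediction
```

For example, with the state history `[1, 2, 9, 3, 2, 5, 1, 2]` the longest repeated suffix is `(1, 2)`, which was earlier followed by 9, so the sketch returns 9 rather than the 5 that follows the more recent single-sample match. Filtering the surviving candidates at each length avoids rescanning the whole history for every template, although the worst-case cost still grows with both the history length and the match length.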

The above predictor can be described in words as an attempt to estimate the next sample of the ongoing random process as the most probable sample that occurred in the history of the data sequence. Here, the most probable sample is identified through the longest repeating chain of data samples.

3. Data determination and model construction

In this study, in order to show the efficiency of the Mycielski approach in wind speed data modeling, data from different geographical regions of Turkey (Kayseri, İzmir and Antalya) were deliberately selected (as depicted in Figs. 1–3). The data were recorded in the year 2005 and were in the ranges of 0–11.5, 0.1–13.6 and 0–15.3 m/s for the Kayseri, İzmir and Antalya regions, respectively. Since the model requires integer values (so that exact comparisons can be made), the data are first converted to wind speed states by rounding to the nearest integer. After the wind states are determined, the Mycielski algorithm is applied as explained in Section 4. For the rest of the paper, these integer values are referred to as "rounded measurements".

4. Mycielski modeling performance results

After the wind speed values are converted to wind states by rounding, the Mycielski prediction is applied throughout the available data. As suggested by the Mycielski prediction method, the wind state sequence is examined by looking for the longest template ending at the end of the sequence that has appeared in the history of the wind state sequence. The prediction procedure is applied for each wind state. The main motive for applying the Mycielski algorithm to wind speed data was the relatively stationary behavior of wind data. As a matter of fact, the prediction methods that depend on Markov models also rely on this assumption. The stationary behavior can be illustrated by the following example. Let m indicate a wind state within a time interval, t. It is expected that, in the next time interval, the wind speed will most probably be within the same state (m) or near this state (m + 1 or m − 1). Large deviations from the state value of m are rare. Another assumption of the Mycielski predictor is the "repeated" behavior of the data. This behavior is found to be reasonable considering the short term cyclic pattern corresponding to day-night transitions, and longer term patterns corresponding to seasonal weather variations. Therefore, it is assumed that there should be some sequences (long or short) that repeat themselves in the history of the data. This idea was translated into forms of transition probabilities in Markov models. Here, the idea is tested in its absolute repeating structure.

To show the accuracy and robustness of the developed wind speed prediction process, the algorithm is applied to rounded versions of the data obtained from three different sites (Kayseri, İzmir and Antalya). In order to illustrate how close the prediction values get to the actual recordings, the rounded measurements and predicted values are shown on the same plots for Kayseri, İzmir and Antalya in Figs. 4–6, respectively. In these figures, red¹ lines with circles represent prediction values and solid blue lines represent rounded actual recordings.

To show the usefulness of the algorithm, the results are also compared with actual measured wind speed values. The measured values (without any rounding) and predicted values are drawn on the same plots using the same legends as above. The results for Kayseri, İzmir and Antalya are given in Figs. 7–9, respectively.
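The rounding step and the "near-state" assumption described in this section (the next state is usually m, m + 1 or m − 1) can be checked on any recorded series with a few lines of Python. The sketch below is illustrative only: the helper names `to_states` and `near_transition_fraction` are hypothetical, and rounding exact halves upward is an assumption, since the paper only says "rounding to the nearest integer".

```python
import math
from typing import List, Sequence


def to_states(speeds: Sequence[float]) -> List[int]:
    """Convert non-negative wind speeds (m/s) to integer wind states.

    Assumption: exact halves are rounded up (e.g. 2.5 -> 3).
    """
    return [int(math.floor(v + 0.5)) for v in speeds]


def near_transition_fraction(states: Sequence[int], max_jump: int = 1) -> float:
    """Fraction of consecutive transitions whose state change is <= max_jump."""
    pairs = list(zip(states, states[1:]))
    if not pairs:
        raise ValueError("need at least two states to form a transition")
    near = sum(1 for current, following in pairs if abs(following - current) <= max_jump)
    return near / len(pairs)
```

A fraction close to 1 for hourly data would support the assumption that large jumps between consecutive states are rare; the paper itself does not report this number.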
An important requirement of the prediction values is that they carry the same statistical properties as the actual recordings. The basic statistics of the actual measured, rounded measured and predicted data are compared in Table 1. The prediction values have statistical parameters that are very close to those of the actual recordings. Furthermore, Figs. 4–9 show that the process developed in this study not only reproduces the general statistics but also follows the variation of the data in time successfully. The statistics of the differences between actual and prediction values (namely, the prediction error) are presented in Table 2, which also lists the root mean square error (RMSE) between actual and predicted data. The mean, extrema, and deviations of the prediction error are small, indicating a successful prediction output. This is also illustrated by plotting the prediction error between rounded and predicted wind speed data for the Antalya region in Fig. 10. Prediction errors between actual recordings and prediction results are shown in Fig. 11. Figs. 10 and 11 show that the errors between actual and predicted data are small, with a mean close to zero (indicating an unbiased prediction). It can also be seen that the prediction errors become small (and stay small) within a very short time. As a side observation, the smallest prediction error variance was obtained from the data of the Kayseri region, indicating that the short term forecast was most successful for this inland region. On the other hand, the smallest error mean was obtained for İzmir. Therefore, long term prediction was found to be equally stable and successful for all considered regions.
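The error statistics of the kind listed in Table 2 (mean, variance and RMSE of the prediction error) can be computed as in the following sketch. It is a hedged illustration rather than the authors' code: the function name `error_statistics` is hypothetical, the error is taken as actual minus predicted (the sign convention is an assumption), and the population variance (division by n) is used, for which the identity RMSE² = variance + mean² holds exactly.

```python
import math
from typing import Dict, Sequence


def error_statistics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean, population variance and RMSE of the error (actual - predicted)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mean = sum(errors) / n
    variance = sum((e - mean) ** 2 for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"mean": mean, "variance": variance, "rmse": rmse}
```

Because RMSE² = variance + mean², a near-zero error mean makes the RMSE essentially equal to the standard deviation of the error, which gives a quick consistency check between the columns of such a table.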

1 For interpretation of color in Figs. 4–6, the reader is referred to the web version of this article.


Fig. 1. Hourly wind speed data measured from Kayseri in year 2005. (Axes: hour versus wind speed in m/s.)

Fig. 2. Hourly wind speed data measured from İzmir in year 2005. (Axes: hour versus wind speed in m/s.)

Fig. 3. Hourly wind speed data measured from Antalya in year 2005. (Axes: hour versus wind speed in m/s.)


Fig. 4. Rounded measured and predicted wind speed values for Kayseri.

Fig. 5. Rounded measured and predicted wind speed values for İzmir.

Fig. 6. Rounded measured and predicted wind speed values for Antalya.

Fig. 7. Measured and predicted wind speed values for Kayseri.

Fig. 8. Measured and predicted wind speed values for İzmir.

Fig. 9. Measured and predicted wind speed values for Antalya.

5. Testing the model efficiency by probability distributions of hourly wind speed data

Most wind energy production systems and their plans rely on parameters obtained from a histogram (or frequencies) of the recorded wind speed values. Typically, a Weibull distribution is assumed for the wind speed values [1–6]. Therefore, the recorded data are normally used for constructing the histogram and then estimating the mean and variance of the histogram under a Weibull distribution assumption. Considering the prediction process presented here, one way of checking the consistency of the prediction output is to verify the Weibull behavior of the prediction output. A prediction output that violates the Weibull characteristics would spoil the accuracy of the energy plans that rely on that assumption. Consequently, in this section, the accuracy of the prediction model is studied by constructing the prediction output histogram and comparing it to an analytical expression of the Weibull distribution. Obviously, the Weibull characteristics of the actual data should also be studied for a fair comparison. First, the histogram sequences of the measured and generated data belonging to each described region are calculated and compared to each other in Tables 3–5. In these tables, "actual" wind speed values correspond to the frequencies of rounded wind speed values, whereas "predicted" wind speed values correspond to the frequencies of the prediction values obtained from the Mycielski algorithm.
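The histogram percentages compared in this section can be obtained from a state sequence with a short routine such as the sketch below. The function name `state_percentages` is hypothetical, and keying the result by integer state (rather than by the interval labels used in Tables 3–5) is an assumption, since the paper does not spell out how states map to interval labels.

```python
from collections import Counter
from typing import Dict, Sequence


def state_percentages(states: Sequence[int]) -> Dict[int, float]:
    """Percentage of samples falling in each integer wind state.

    Every state between the smallest and largest observed value is listed,
    so states that never occur appear with 0.0.
    """
    if not states:
        raise ValueError("states must not be empty")
    counts = Counter(states)
    total = len(states)
    return {
        state: 100.0 * counts[state] / total
        for state in range(min(states), max(states) + 1)
    }
```

Applying the same routine to the rounded measurements and to the predicted states gives two columns that can be compared row by row, in the manner of the "Actual" and "Predicted" columns.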


Table 1
Basic statistics of the measured and predicted data.

Region   Type  Minimum  Maximum  Mean    Median  Variance  Range
Kayseri  M     0        11.5     1.579   1.2     1.7530    11.5
Kayseri  RM    0        12       1.6313  1.0     1.7977    12
Kayseri  P     0        11       1.6475  1.0     1.7806    11
İzmir    M     0.1      13.6     3.596   3.3     4.3347    13.5
İzmir    RM    0        14       3.6360  3.0     4.4243    14
İzmir    P     0        14       3.6314  3.0     4.3886    14
Antalya  M     0        15.3     2.871   2.3     4.2189    15.3
Antalya  RM    0        15       2.9056  2.0     4.3197    15
Antalya  P     0        15       2.9197  2.0     4.2514    15

M: Measured, RM: Rounded measured, P: Predicted.

Table 2
Statistics of errors.

Region   Type  Mean    Variance  RMSE
Kayseri  RM–P  0.0163  1.1883    1.0901
Kayseri  M–P   0.0682  1.1208    1.0609
İzmir    RM–P  0.0046  2.3299    1.3786
İzmir    M–P   0.0357  2.2503    1.3479
Antalya  RM–P  0.0142  1.9005    1.5263
Antalya  M–P   0.0492  1.8147    1.5005

M: Measured, RM: Rounded measured, P: Predicted.

Next, the actual and predicted hourly wind speed data for the Kayseri, İzmir and Antalya regions are fitted to two-parameter Weibull distributions. The Weibull parameters are presented in Table 6. The parameters match each other to within about 1.5%, which indicates a very fine performance. To better illustrate how well the proposed model matches the Weibull distribution, the histogram graphs for the rounded measured and predicted data for the Antalya region are given in Fig. 12. In this figure, the two histograms not only match each other (Fig. 12a and b, boxed plots), but also closely match the classical Weibull distribution (continuous plots). Similar observations were made using the data from the Kayseri and İzmir regions as well.

Fig. 10. Prediction error between rounded measured and predicted data for Antalya.

Fig. 11. Prediction error between measured and predicted data for Antalya.

Table 3
Wind speed histogram values (percentages) for Kayseri.

State interval  Actual   Predicted
0–1             7.0128   7.2746
1–2             55.0888  53.4267
2–3             22.9964  23.7819
3–4             7.4567   8.0032
4–5             3.3925   3.3242
5–6             1.5369   1.6849
6–7             0.8424   0.9677
7–8             0.7172   0.6261
8–9             0.4668   0.4895
9–10            0.3188   0.2960
10–11           0.1480   0.1138
11–12           0.0114   0.0114

Table 4
Wind speed histogram values (percentages) for İzmir.

State interval  Actual   Predicted
0–1             1.2363   1.5682
1–2             14.7436  14.1140
2–3             19.3796  19.1735
3–4             17.5366  17.9831
4–5             15.5678  15.8768
5–6             11.8933  12.2138
6–7             9.5238   9.2376
7–8             5.9066   5.6319
8–9             2.2550   2.0375
9–10            1.0646   1.2935
10–11           0.4579   0.4808
11–12           0.2747   0.2175
12–13           0.0916   0.1145
13–14           0.0229   0.0114
14–15           0.0458   0.0458

Table 5
Wind speed histogram values (percentages) for Antalya.

State interval  Actual   Predicted
0–1             3.1393   3.0822
1–2             23.7443  23.1164
2–3             26.9178  26.6096
3–4             16.4155  17.6027
4–5             12.1005  11.8493
5–6             6.3699   6.4041
6–7             4.5548   4.5320
7–8             2.3288   2.3402
8–9             1.9406   1.9749
9–10            1.4954   1.5982
10–11           0.5479   0.5251
11–12           0.2055   0.1712
12–13           0.1027   0.0913
13–14           0.0457   0.0342
14–15           0.0685   0.0457
15–16           0.0228   0.0228
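A two-parameter Weibull fit of the kind discussed in this section can be sketched with the Python standard library alone. The paper does not state which fitting method was used, so everything below is an assumption made for illustration: the function name `fit_weibull_mle` is hypothetical, the fit is a maximum likelihood estimate found by bisection on the shape parameter, the shape is assumed to lie between 0.05 and 50, and zero-valued samples are dropped because the Weibull log-likelihood is undefined at zero. No claim is made about which of `shape` and `scale` corresponds to the k and c columns of Table 6.

```python
import math
import random
from typing import Sequence, Tuple


def fit_weibull_mle(samples: Sequence[float], iterations: int = 100) -> Tuple[float, float]:
    """Return (shape, scale) of a two-parameter Weibull fitted by maximum likelihood.

    Assumptions: samples <= 0 are ignored and the shape lies in [0.05, 50].
    Very large sample values could overflow when raised to a large power.
    """
    xs = [x for x in samples if x > 0]
    if len(xs) < 2 or min(xs) == max(xs):
        raise ValueError("need at least two distinct positive samples")
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(shape: float) -> float:
        # Likelihood equation for the shape; it is increasing in `shape`,
        # so its root can be bracketed and found by bisection.
        powered = [x ** shape for x in xs]
        weighted_log = sum(p * l for p, l in zip(powered, logs)) / sum(powered)
        return weighted_log - 1.0 / shape - mean_log

    low, high = 0.05, 50.0
    for _ in range(iterations):
        middle = 0.5 * (low + high)
        if score(middle) < 0:
            low = middle
        else:
            high = middle
    shape = 0.5 * (low + high)
    scale = (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return shape, scale


if __name__ == "__main__":
    # Synthetic check with arbitrary parameters (scale 3.4, shape 1.6).
    random.seed(0)
    data = [random.weibullvariate(3.4, 1.6) for _ in range(5000)]
    print(fit_weibull_mle(data))
```

Fitting the rounded measurements and the predicted states with the same routine and comparing the two parameter pairs mirrors the comparison reported in Table 6, although the numerical values will depend on the fitting method and on how zero states are handled.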

Table 6
Weibull parameters obtained from measured and predicted data.

Station  Actual k  Actual c  Predicted k  Predicted c
Kayseri  1.9769    1.5563    2.0039       1.5758
İzmir    4.1608    1.8721    4.1708       1.8922
Antalya  3.3719    1.5952    3.3891       1.6128

Fig. 12. Weibull distribution fitting results for (a) rounded measured and (b) predicted data. (Axes in both panels: wind speed intervals in m/s versus probability density.)

6. Conclusions

In this paper, a novel short term wind speed modeling process is developed and presented. Unlike statistical models (i.e. Markov or distribution based models, which may fit only the global wind characteristics), the proposed method has time-wise variations that are very close to the actual recordings, and it is capable of very short term (next hour) prediction of the wind speed value. Needless to say, the method also provides a very good global characterization in terms of statistical parameters. In the process, the Mycielski prediction algorithm is adopted. The Mycielski predictor exploits the total sample history of the recorded data. In order to be compatible with the predictor, integer samples are obtained by converting real valued wind speeds to wind state values by rounding. Because of the slow variations and cyclic patterns, the Mycielski predictor provides good prediction results in the sense that the prediction outputs and actual recordings match well with small prediction error. The model is tested on hourly data measured from the Kayseri, İzmir and Antalya regions of Turkey. The cities were deliberately selected as belonging to different geographical situations and wind regimes. In spite of these differences in the wind regimes, accurate prediction results are obtained for all considered sites. Therefore, it is concluded that the model is robust to different behaviors of wind speed patterns. Experimental results also show that the model not only provides very consistent time variations in accordance with the actual measured data but also provides accurate distribution model parameters for estimating the wind power potential of a region. Because of the observed robustness, the model is expected to be easily adapted to any region to predict wind speed from recorded history.


Acknowledgement

The authors thank the Turkish State Meteorological Service (DMI) for supplying hourly wind speed data.

References

[1] Celik AN. A statistical analysis of wind power density based on the Weibull and Rayleigh models at the southern region of Turkey. Renew Energy 2004;29:593–604.
[2] Hrayshat Eyad S. Wind resource assessment of the Jordanian southern region. Renew Energy 2007;32:1948–60.
[3] Kavak Akpinar E, Akpinar S. An assessment on seasonal analysis of wind energy characteristics and wind turbine characteristics. Energy Convers Manage 2005;46:1848–67.
[4] Migoya E, Crespo A, Jiménez Á, García J, Manuel F. Wind energy resource assessment in Madrid region. Renew Energy 2007;32:1467–83.
[5] Ngala GM, Alkali B, Aji MA. Viability of wind energy as a power generation source in Maiduguri, Borno state, Nigeria. Renew Energy 2007;32:2242–6.
[6] Ozerdem B, Ozer S, Tosun M. Feasibility study of wind farms: a case study for İzmir, Turkey. J Wind Eng Ind Aerod 2006;94:725–43.
[7] Hocaoglu FO, Kurban M. The effect of missing wind speed data on wind power estimation. Lect Notes Comput Sci 2007;4881:107–14.
[8] Sahin AD, Sen Z. Order Markov chain approach to wind speed modeling. J Wind Eng Ind Aerodyn 2001;89:263–9.
[9] Tore MC, Poggi P, Louche A. Markovian model for studying wind speed time series in Corsica. Int J Renew Energy Eng 2001;3:311–9.
[10] Youcef Ettoumi F, Sauvageot H, Adane AHE. Statistical bivariate modeling of wind using first order Markov chain and Weibull distribution. Renew Energy 2003;28:1787–802.
[11] Shamshad A, Bawadi MA, Wan Hussin WMA, Majid TA, Sanusi SAM. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 2005;30:693–708.
[12] Hocaoğlu FO, Gerek ÖN, Kurban M. The effect of Markov chain state size for synthetic wind speed generation. In: The tenth international conference on probabilistic methods applied to power systems (PMAPS2008), Rincón, Puerto Rico, May 25–29; 2008.
[13] Ehrenfeucht A, Mycielski J. A pseudorandom sequence: how random is it? American Mathematical Monthly 1992;99:373–5.
[14] Fidan M, Gerek ÖN. A time improvement over the Mycielski algorithm for predictive signal coding: Mycielski-78. In: Proceedings of 14th European signal processing conference EUSIPCO 2006, Florence; September 2006.
[15] Jacquet P, Szpankowski W, Apostol I. A universal predictor based on pattern matching. IEEE Trans Inform Theory 2002;48:1462–72.