Energy 36 (2011) 4505e4517
Contents lists available at ScienceDirect
Energy journal homepage: www.elsevier.com/locate/energy
The effect of missing data on wind resource estimation Aidan Coville a, Afzal Siddiqui b, c, *, Klaus-Ole Vogstad d, e a
Development Impact Evaluation Initiative, The World Bank, Washington, DC, USA Department of Statistical Science, University College London, London, United Kingdom c Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden d Department of Wind & Site, Agder Energi, Kristiansand, Norway e Department of Industrial Economics and Technology Management, Norwegian University of Science and Technology, Trondheim, Norway b
a r t i c l e i n f o
a b s t r a c t
Article history: Received 4 September 2010 Received in revised form 18 February 2011 Accepted 26 March 2011 Available online 5 May 2011
Investment in renewable energy sources requires reliable data. However, meteorological datasets are often plagued by missing data, which can bias energy resource estimates if the missingness is systematic. We address this issue by considering the influence of missing data due to icing of equipment during the winter on the wind resource estimation for a potential wind turbine site in Norway. Using a meanreverting jump-diffusion (MRJD) process to model electricity prices, we also account for the impact on the expected revenue from a wind turbine constructed at the site. While missing data due to icing significantly bias the wind resource estimate downwards, their impact on revenue estimates is dampened because of volatile electricity spot prices. By contrast, with low-volatility electricity prices, the effect of missing data on revenue estimates is highly significant. The seasonality-based method we develop removes most of the bias in wind resource and revenue estimation caused by missing data. Ó 2011 Elsevier Ltd. All rights reserved.
Keywords: Wind power Missing data Investment analysis
1. Introduction Changes in energy policy to mitigate the effects of climate change have encouraged investment in renewable energy technologies ([1]). Consequently, the installed capacity of renewable energy technologies, such as wind turbines, has increased almost tenfold in the past decade from under 18 GW in 2000 to over 175 GW by mid-2010 ([2]). Although this is still less than 5% of the world’s installed power capacity, new wind capacity is being added at a higher rate than that of any other technology ([3]). While uncertainty in energy projects is common and has been analysed, e.g., via the real options approach ([4] and [5]), it has usually addressed stochastic electricity and fuel prices. By contrast, wind speeds and directions are subject to both long-term trends as well as highly localised phenomena, such as topographical attributes. Statistical methods based on generalised linear models (GLMs) account for such long-term trends while providing means to downscale the forecasts to regional levels ([6] and [7]). Related work in the wind sector includes [8], which uses firstand second-order Markov chains to model wind speeds in Malaysia. Both specifications are accurate in capturing expected wind speeds and distributions. However, they are unable to reflect the time* Corresponding author. Department of Statistical Science, University College London, London, United Kingdom. Tel.: þ44 207 6791871; fax: þ44 207 3834703. E-mail addresses:
[email protected] (A. Coville),
[email protected] (A. Siddiqui),
[email protected] (K.-O. Vogstad). 0360-5442/$ e see front matter Ó 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.energy.2011.03.067
dependent nature of wind speeds due to the lack of a seasonal component. Moreover, there is neither out-of-sample forecasting nor consideration of missing data. The issue of missing data is tackled by [9] using a fuzzy inference algorithm that takes wind speeds and directions from sensors at different heights to interpolate missing data. The approach outperforms the standard wind shear coefficient one, but there is no consideration of revenues or on out-of-sample forecasts. Finally, measure-correlate-predict (MCP) methods use wind speeds and directions at a reference site in order to predict resources at a target site. A survey ([10]) of various MCP methods (ranging from simple linear regression to those that use ratios of the standard deviations of wind speeds) assesses prediction performance for a variety of metrics, e.g., mean wind speed, annual energy production, and wind speed/direction distributions. In spite of the wealth of literature on wind power, neither GLMs nor ad hoc statistical techniques have been applied extensively to problems faced by investors in wind farms. In this paper, we use the latter to address the problem of resource estimation. Unlike the work in the literature, we account for the impact of missing wind speed data on estimated revenues, which are the ultimate drivers of investment, and develop methods to correct for the missingness. For wind turbines with capacities of up to 3 MW, there are typically three planning stages. Initially, a permit is issued to give the right to build on a site. Next, the available wind resources are estimated via the installation of wind masts with anemometers and weather vanes in order to measure wind speed and direction, respectively. Finally, collected data are analysed together with
4506
A. Coville et al. / Energy 36 (2011) 4505e4517
Nomenclature
g1j, g2j dZt
I i j s T fWi ðwÞ Wi w
k g P(w) ri Et SEt Xt
lj
total number of wind sectors wind sector index season index season length number of periods per annum Weibull wind speed probability density function (PDF) for sector i wind speed random variable for sector i (m/s) wind speed observation (m/s) shape parameter of Weibull PDF scale parameter of Weibull PDF wind power output given wind speed w (kW) relative frequency of wind direction in sector i electricity price during period t (SEK/MWh) multiplicative seasonality component for the electricity price during period t non-seasonal component for the logarithm of the electricity price during period t usually modelled as a stochastic process angular frequency for season j
operating costs, electricity prices, subsidy levels, and investment costs for the purpose of making an investment decision. In this paper, we focus on the second stage of the planning process, viz., the collection of wind speed and direction data via exploratory wind masts, since such data are crucial to project valuation. However, many sites in northern Europe that are ideal for wind turbines due to their high wind speeds also exhibit sub-zero winter temperatures. As a result, anemometers are susceptible to freezing, which leaves gaps in the data. From the perspective of a wind turbine investor, such systematic missingness is likely to bias the wind resource availability, thereby affecting the value of the project. Although heated anemometers are available, they are currently too expensive to deploy widely and may still not provide a complete set of data. Hence, we develop ad hoc statistical procedures in this paper that exploit the underlying seasonality of the data in order to remove the bias in the wind resource availability due to missing data at a site in Norway. We make the following assumptions in our analysis. First, no long-term patterns in wind speed or electricity prices are considered. As such, we deal with a one-year time horizon when considering wind resource and revenue estimation. Second, a wind farm usually consists of a number of wind turbines across a site; however, we consider the case of only a single wind turbine with a rated capacity of 850 kW at the location where the wind speeds are observed. Revenue is assumed to be derived solely from the wind turbine, and no external sources of revenue such as government subsidies are considered. Finally, it is assumed that given the choice of wind turbine, investment and operating costs are constant over the year and are independent of revenue calculations. Costs are not explicitly accounted for in this paper, but, under the given assumption, they could be easily incorporated to estimate expected profits based on expected revenues. The structure of this paper is as follows: in Section 2, we survey tools for modelling wind speed, wind energy, and electricity spot prices. In Section 3, data used for the empirical analysis are considered in more detail. Section 4 outlines the models chosen to estimate wind speed, wind energy, and electricity spot prices, and Section 5 discusses the simulation procedure and its subsequent results. The paper concludes in Section 6 with a summary of the results and directions for future research.
X
h s et se Jt dqt
Dt Sw t vt x
a1, a2 dt b
amplitude of daily seasonality effect for season j increment to a standard Brownian motion process during period t mean-reversion level for the MR process rate of mean-reversion for the mean-reverting (MR) process volatility for the MR process disturbance term with normal (0, se) distribution during period t volatility of et size of jump in electricity price during period t increment to a Poisson process during period t for capturing the frequency of jumps in a MR jumpdiffusion (MRJD) process wind direction during period t multiplicative seasonality component for the wind speed during period t deseasonalised wind speed during period t (m/s) observed increment to deseasonalised wind speed (m/s) amplitudes for annual seasonality effect dummy variable for weekend during period t coefficient for weekend dummy variable
2. Modelling tools 2.1. Wind resource estimation Wind speed, W, typically follows a Rayleigh distribution, but it is further generalised to the Weibull distribution (of which Rayleigh is a special case) with the following probability density function (PDF):
fW ðwÞ ¼
k1 wk e g ;
k w g g
w 0:
(1)
This PDF captures the main components of wind speed: nonnegative observations with a positive skewness, thereby indicating the possibility of large wind speeds in the tail with a mean value larger than the median. The parameters k and g are referred to as the shape and scale parameters, respectively. Since wind speed and wind direction are dependent, wind direction is an important attribute of the wind regime that needs to be understood by developers as wind turbines are sited at optimal angles, perpendicular to the prevailing winds in the region, in order to avoid wake loss1 when multiple turbines are being constructed in close proximity. To deal with wind direction, which is a circular, continuous variable, wind direction data are aggregated into sectors and treated as categorical variables. The loss of accuracy from aggregation is far outweighed by the simplification of the resulting analysis. Thus, we aggregate wind direction into twelve sectors of 30 each and then assume that the conditional PDF of the wind speeds in each sector is Weibull with distinct shape and scale parameters. The power available from wind is proportional to the wind speed cubed, meaning that small variations in an estimate of the wind speed result in very large deviations in the corresponding power output. This makes it important to measure wind speeds accurately. Furthermore, it is assumed that the relationship between the wind speed and the resulting power output from a given wind turbine is described by a deterministic power curve.
1 Wake loss, in this case, describes the loss of exploitable power from wind due to the turbulence created from wind passing over another turbine.
A. Coville et al. / Energy 36 (2011) 4505e4517
While each turbine model will have different characteristics, all power curves follow a similar pattern. There is no power output at wind speeds below a certain cut-in point (usually about 4 m/s), and the curve then increases in proportion to the cubic value of the wind speed up to a point, increasing at a decreasing rate thereafter. The rated power level is the point at which the curve reaches its maximum, usually between 12 and 15 m/s. The turbine is then controlled such that the power output does not exceed this level even as wind speeds increase. For safety reasons, there is a cut-off point around 25 m/s at which point the turbine is shut down. Fig. 1(a) shows the typical power curve of a Vestas V52 wind turbine with a rated capacity of 850 kW. Ref. [11] addresses the fact that these power curves are only estimates of the true relationship between power output and wind speeds and can be affected by site conditions. It accounts for this by modelling power output as a stochastic process; however, this is beyond the scope of our paper. Instead, the reference power curve in this paper is defined in Equation (2) and graphed in Fig. 1(b):
8 < 0:5w3 21:4375 PðwÞ ¼ 850 : 0
3:5 w < 12 12 w < 25 : otherwise
(2)
When the power curve, P(w), the Weibull PDF for each sector, fWi ðwÞ, and the relative frequency of wind directions, ri, for each of the I sectors are known, the annual energy production (AEP) is calculated as follows:
AEP ¼ T
I X
Z ri
fWi ðwÞPðwÞdw
(3)
i¼1
The AEP is generally measured in MWh per year, where T is the number of hours in a standard year. In this form, the AEP does not explicitly consider seasonality since there is no time component to the calculation. We address this in Section 4.3 by modelling the seasonal component of the wind speed directly and retaining the time dimension of the data.
b 800 600
Rated Power Cut−in
0
5
Simplified Estimate
15
20
Icing of instruments is a common problem for wind turbine developers trying to obtain an initial dataset from which to estimate the wind resource in a region. Ref. [12] argues that in areas of North America, icing can be expected up to 10% of the time at high altitudes. As wind speeds tend to increase in winter, these missing data imply that the wind resource may be underestimated by fitting incorrect parameters to the Weibull PDFs. Ref. [13] presents a method for modelling the likelihood of instrument icing using altitude, wind speed, cloud cover, and temperature as influential factors. Since it assumes that icing will occur in the coldest periods of the year, we similarly replicate the effects of icing by removing the observations corresponding to the lowest 2%, 5%, and 10% of temperatures. 2.3. Electricity spot prices The three defining features of electricity price processes are seasonality, mean-reversion, and occasional spikes. Electricity markets often exhibit strong seasonality components dictated by changes in demand throughout the year. The electricity price, Et, can be decomposed into a deterministic seasonal component and a random process as follows:
Et ¼ SEt eXt
(4)
where {Xt, t 0} is a stochastic process and the seasonality component, SEt , is multiplicative. By modelling the logarithm of electricity prices, it can be separated from the random component:
log ðEt =SEt Þ ¼ Xt
(5)
In practice, there are a number of ways in which the seasonal component can be estimated using nonlinear regression. Ref. [14] posits that the annual seasonality in the Nord Pool Elspot market can be reflected via a sinusoidal function, while the weekly seasonality can be removed by using a dummy variable for weekends. Ref. [15] uses wavelet decomposition to break the time series into small and large frequencies before estimating the resulting cycles using a sine function, while [16] uses Fourier transforms to estimate the annual cycle of the deregulated UK market. These papers do not model intra-daily patterns and opt to use a single day as the data resolution level, aggregating the hourly observations accordingly. Ref. [17] removes seasonality by using sine and cosine functions such that: ½s=2 X
g1j cos lj t þ g2j sin lj t ;
t ¼ 1; 2; .; T
(6)
200
400
j¼1
Cut−out
10
2.2. Icing
St ¼
0
200
400
Power Output (kW)
800 600
Vestas V52 Wind Turbine
0
Power Output (kW)
a
4507
25
Wind Speed (m/s) Fig. 1. Power curves.
0
5
10
15
20
25
Wind Speed (m/s)
where s is the number of time steps over which the seasonal cycle occurs and lj ¼ 2pj/s. When s is an odd number, s/2 is rounded down. In this paper, we follow a similar method to that proposed in [14] to account for yearly and weekly seasonality. In order to address daily seasonality, we follow the method proposed by [17]. In representing the stochastic component of the electricity price, we have chosen both mean-reverting (MR) and meanreverting jump-diffusion (MRJD) models for the deseasonalised logarithm of the electricity price. The former assumes that {Xt, t 0} reverts to a long-term mean at a given rate with volatility around it. While it is posited to be an appropriate model for commodities ([18]), it is unable to capture the spikes that occur in electricity markets for a variety of reasons, e.g., lack of storage, transmission constraints, and fixed short-run generation capacity. For this reason, the MR model is augmented with jumps that arrive
4508
A. Coville et al. / Energy 36 (2011) 4505e4517
according to another stochastic process reflecting a relative frequency of occurrence and jump sizes that follow a lognormal distribution. This MRJD model replicates electricity price behaviour more accurately ([19]). Full details of both models are in the Appendix. 3. Data Dalane Vind, a Norwegian wind power developer and a subsidiary of Agder Energi, has installed three wind masts within a few kilometres of each other in southern Norway with the intention of estimating the wind resource in the region. For each of the three masts, Geitvassfjellet, Holmavatnet, and Leksarvatnet, data are recorded continuously but reported as 10-min interval averages. Average temperature, wind speed, wind direction, and their corresponding variances have been recorded at heights of 48 m and 30 m for a period of two years. These datasets suffer from a small amount of missing data, but Geitvassfjellet at 48 m was chosen as the reference mast for this paper with just over 1% of its data missing. The analysis uses data from the time period 15 May 2006e14 May 2007, where the second year of observations (15 May 2007e14 May 2008) is used as an out-of-sample dataset with which to compare the results obtained using the first year’s data. The Nord Pool electricity spot price market is the oldest and most reliable market of its kind. Hourly data are available from 1996; however, the data used in this paper span the interval of 1 January 1999e14 May 2008 and were obtained from the Nord Pool FTP server in Swedish kronor (SEK) per MWh. 4. Model fitting The model fitting and comparison methodology in this paper follows two distinct paths. Alternative electricity spot price models are considered first, and three imputation methods for calculating the AEP for different levels of missing data in wind speeds are proposed and compared with the case in which missing data are ignored. Revenue estimates are then obtained for each method based on energy production and electricity price simulations. This methodology is summarised as a flowchart in Fig. 2. 4.1. Electricity spot price Initially, seasonality is removed from the time series as Step A.2 in the methodology flowchart. Through descriptive statistics and observational analysis, the nature of the daily, weekly, and
Obtain electricity spot price data
Step A.1
Deseasonalise spot price data
Model spot price using MR and MRJD processes
Step A.2
Step A.3
Obtain relevant wind data
Artificially remove data to mimic icing
Calculate AEP for each level of missing data and obtain bias estimates
Step B.1
Step B.2
Step B.3
Obtain revenue estimates and compare Step C
Apply seasonality model with 10% missing data Step B.3.1
Fig. 2. Methodology flowchart.
annual seasonality is interpreted. Based on this, the methods described in Section 2.3 are used to remove the deterministic seasonal component from the observed data, allowing the deseasonalised spot price data to remain. The seasonal component is then added back to the data after the diffusion process has been estimated. This method of deseasonalisation is applied to both MR and MRJD models. Step A.3 of the methodology involves fitting the MR and MRJD models to the deseasonalised logarithm of the Nord Pool electricity spot price using Equations (A-1) and (A-3), respectively. While Equation (A-2) is used to estimate parameters for the MR model, we apply [16], [19], and [20] to obtain MRJD parameters. Specifically, spikes are first identified in the original time series by differencing the deseasonalised logarithm of the spot price and flagging all differences exceeding three standard deviations. These values are filtered from the time series, and the process is repeated until the number of differences of size three standard deviations or larger is that to be expected under the assumption that the differences were normally distributed. An MR process is then fitted to the remaining data, and the observations flagged as spikes are modelled separately under the assumption that the spikes are independent of the diffusion process. A shifted lognormal PDF is fitted to the jump sizes using maximum likelihood estimation (MLE) to determine the parameters of the PDF. Although a rather ad hoc method, it explicitly addresses the fact that the increments in the deseasonalised logarithm of the spot price are not normally distributed, as assumed in the MR model, and accounts for the presence of extreme jumps away from the mean. In this case, the timing of the jumps is assumed to follow a Poisson process, where the cumulative number of jumps expected at each time point is Poisson distributed and the waiting time until the next jump is exponentially distributed. This assumption ignores the fact that jumps are more likely to occur in close proximity to each other, which would contradict the memoryless property of the exponential distribution. Either a regime-switching mechanism or a more appropriate renewal process could be incorporated into the model to account for this; however, that is beyond the scope of our paper. Using 1 h as the time interval, 1000 yearly sample paths are then simulated based on the fitted models and discounted to present value using a continuous interest rate of 10%. The seasonality component is reapplied, and these sample paths are then used in conjunction with the simulated energy output to estimate the PDF of present value revenue. 4.2. Wind speed and direction Using the Geitvassfjellet mast data for wind speed and direction, 2%, 5%, and 10% of the data are artificially removed from the available set as Step B.2 of the methodology. Next, Step B.3 calculates the AEP for each level of missing data. Initially, Weibull PDFs are fitted to each of the twelve directional sectors using the available data and MLE to estimate the shape and scale parameters. The AEP is then calculated using Equation (3) for each level of missing data. In order to assess the significance of these differences, Monte Carlo simulation is used to obtain confidence intervals for the point estimates. For each of the T ¼ 8760 time steps, the directional sector is first chosen at random, based on the relative frequencies, ri, observed in the dataset. A random value is then generated from the fitted Weibull PDF corresponding to the chosen sector, thus providing the wind speed for each point in time. These wind speeds are then converted to power using the chosen power curve to provide the energy yield for each hour of the year. This process is repeated 1000 times to obtain a PDF for the AEP from which inferences can be made. As Step C of the methodology, the 1000 simulated wind speed sample paths are then multiplied by the
A. Coville et al. / Energy 36 (2011) 4505e4517
corresponding 1000 simulated electricity prices and summed over the year to obtain an estimated yearly revenue PDF based on the current wind regime and electricity spot price market. This allows one to assess the impact of missing data on AEP estimates in the context of electricity price uncertainty. In Step B.3.1, we remedy the observed bias using three methods. The first, rudimentary method, persistence, assumes the most recently observed data point will be the best estimate for the missing data point. The second method, regression, uses the extra site masts (Holmavatnet and Leksarvatnet) and other meteorological data as covariates for wind speed and direction at Geitvassfjellet. As our numerical examples in Section 5.3 will indicate, these two approaches are unable to remove the bias in the AEP estimate. Consequently, we turn to a third option, a seasonality method, which removes the seasonality present in both the wind speed and direction before fitting the Weibull PDFs to each directional sector. The ten-minutely data are then simulated based on these newly fitted Weibull PDFs to obtain a year’s worth of observations, and the seasonality is reapplied to the resulting time series. The full details of this simulation procedure follow in Section 4.3.
4.3. Seasonality model Bias occurs from icing-induced missingness because wind speeds are generally higher in the winter than in the summer. By removing this seasonality before estimating the wind regime, it is hypothesised that the effect of the systematic missingness will be dampened. In theory, if all of the observed bias is due to a seasonality effect that can be captured by statistical methods, then it should be possible to remove this bias completely. In practice, seasonality estimates will differ based on the data available. There is no guarantee that the deterministic seasonality component estimated with large portions of missing data in the colder months will accurately capture the true seasonality present at the site of interest. The problem is further compounded by the fact that seasonality is present in both wind speed and wind direction, which are dependent variables. A flowchart of the seasonality approach is presented in Fig. 3. 4.3.1. Seasonality of wind direction Because direction is a circular variable, standard descriptive tools such as the autocorrelation function (ACF) used to assess the nature of the seasonality component cannot be applied here. Instead, wind direction is assigned to one of twelve sectors and treated as a categorical variable. By constructing a wind rose (a graphical tool used to describe the joint distribution of wind speed and direction at a site) for each hour of the day as well as for each month of the year, we find that wind direction has definite annual and daily seasonal components. The wind roses do not Fit wind direction using a Markov model with six transition matrices to account for seasonality
Deseasonalise wind speed using robust regression
Fit Weibull distributions to each of the twelve directional sectors Simulate wind direction and wind speed on one-year time horizon
Convert to power and estimate AEP
Re-apply seasonality
Re-order wind speeds using sorting algorithm
Fig. 3. Flowchart of seasonality approach.
4509
present a clear cut-off point; however, it is noticeable that evening wind directions differ from those during the day. Monthly wind direction patterns seem to change between winter (NovembereJanuary), summer (AprileAugust) and transition (February, March, September, October) months, where wind roses are similar within these categories but differ quite markedly across categories. We assume that wind direction follows a Markov process, whereby the future wind directional sector, Dtþ1, is independent of past values, given the present direction, Dt ¼ it:
PrðDtþ1 ¼ ijDt ¼ it ; Dt1 ¼ it1 ; .; D1 ¼ i1 Þ ¼ PrðDtþ1 ¼ ijDt ¼ it Þ c t; i (7) By making this assumption, it is possible to estimate the transition probabilities for a discrete-time, discrete-space Markov chain based on the observed transitions across each of the twelve directional sectors. In this case, I2 ¼ 122 ¼ 144 transition probabilities will be estimated. To account for the daily and seasonal differences in transition probabilities, these probabilities are estimated for six Markov chains corresponding to each of the possible combinations of day/ night and summer/winter/transition months. Wind directions are then simulated over an entire year by choosing a starting sector proportional to the observed relative frequency of being in that sector. The simulations are then generated using the transition matrices estimated from the observed data and switching between each of the six transition matrices based on the time of day and month in the year. This method captures both the first-order autocorrelation and seasonality experienced in the wind direction observations. 4.3.2. Seasonality of wind speed Wind speed seasonality is easier to capture and can rely on some of the methodologies used to deseasonalise the electricity spot price data. It is assumed that the seasonality component, Sw t , is multiplicative to ensure that wind speeds, Wt, are always positive and can be described by:
Wt ¼ SW t vt
or
Wt ¼ vt SW t
(8)
where vt represents the deseasonalised wind speed. By taking logarithms to make the equation additive and using robust regression, the seasonality component can be estimated. This regression model uses a sine and cosine function to account for yearly seasonality and also to explain the daily pattern. To account for the fact that the daily variability can change over the year, second-order interactions of these functions are also included. After estimating the coefficients using robust regression, the logarithm of the seasonality component is obtained. The estimated function is exponentiated to obtain Sw t , and the observed wind speeds are divided by Sw t to obtain vt. Using these deseasonalised wind speeds, the Weibull PDFs are estimated as usual. Fig. 4 shows the wind speed ACF before and after deseasonalisation, thereby indicating the adequacy of this method in removing both the daily and yearly seasonality components. 4.3.3. Simulations Based on the fitted Weibull PDFs and directional transition matrices, the wind direction sectors for each time point in the year (52,560 annually with a 10-min resolution) are simulated. Wind speeds are then generated for each time point based on the Weibull PDF corresponding to the simulated directional sector. One thousand such sample paths are generated, and the seasonality component is reapplied in order to estimate the AEP. Before the AEP estimation is conducted, however, an algorithm
4510
A. Coville et al. / Energy 36 (2011) 4505e4517
Deseasonalised Data − 10 Days
0.6 0.2
0.4
ACF
0.6 0.4 0.0
0.0
0.2
ACF
0.8
0.8
1.0
1.0
Original Data − 10 Days
0
200
600
1000
1400
0
200
Lag
1000
1400
Lag
Deseasonalised Data − 1 Year
0.4 0.0
0.2
ACF
0.6
0.8
1.0
0.0 0.2 0.4 0.6 0.8 1.0
Original Data − 1 Year
ACF
600
0
10000
30000
50000
0
10000
Lag
30000
50000
Lag Fig. 4. Deseasonalising wind speed - ACF.
is used to reorder the simulated data in such a way that the underlying autocorrelation structure of the observed wind speed is replicated. 4.3.4. Sorting algorithm By generating a random variable from a Weibull PDF for each point in time in order to simulate a sample path for wind speeds over a year, one can ensure that the resulting sample path will have Weibull-distributed wind speeds. In the process, however, all autocorrelation present in real wind speed data is lost since wind speeds at each time point are independent of the last. When looking to reproduce annual characteristics of wind speed, this
method is fine. If, however, the time dependency of the sample path is important, as is the case when considering the seasonality effect, then this feature becomes a problem. In order to reproduce the autocorrelation structure of wind speed over time but still maintain the overall characteristics of the wind regime, viz., the wind speed and direction distributions, a sorting algorithm is proposed. The sample path is first generated using the fitted Weibull PDFs and wind directions in order to maintain distributional properties. It is then assumed that the autocorrelation in wind direction is fully accounted for through the Markov process used, and, as such, the simulated wind directions are not altered. A PDF is then fitted to the
Fig. 5. Example of sorting algorithm.
A. Coville et al. / Energy 36 (2011) 4505e4517
Simulated Data − 10 Days
0.8 0.6 0.0
0.0
0.2
0.4
ACF
0.6 0.4 0.2
ACF
0.8
1.0
1.0
Observed Data − 10 Days
0
200
600
1000
1400
0
200
Lag
10000
30000
1000
1400
50000
0.0 0.2 0.4 0.6 0.8 1.0
Simulated Data − 1 Year
ACF
0.0 0.2 0.4 0.6 0.8 1.0 0
600
Lag
Observed Data − 1 Year
ACF
4511
0
10000
Lag
30000
50000
Lag
Fig. 6. Simulated vs. observed wind speed ACF.
increments observed in the deseasonalised wind speeds. As it turns out, the empirical PDF of increments is highly symmetric around zero with a median of zero and a mean of 0.0001. The skewness is estimated to be 0.217, and the absolute value of these increments can be estimated by a Weibull PDF with shape and scale parameters of 1.05 and 0.59, respectively. This PDF is then used to dictate the change in wind speed from one time point to the next. A numerical example of this algorithm is presented in Fig. 5. The result of this algorithm is that the increments in wind speeds from the simulation now roughly follow the same PDF as that of the observed data. The fitted Weibull PDF for a random simulation has a shape parameter of 0.89 and a scale parameter of 0.61. The smaller shape parameter indicates that the simulated PDF has a slightly larger spread than the empirical one. Although the algorithm succeeds in retaining the wind speed autocorrelation to an extent, it does exhibit a limitation. Since only the closest value available is chosen for the next time step, the incremental PDF is not followed exactly. In particular, problems occur at very high and very low wind speed levels since values never go below zero, and there are very few choices at extremely high wind speeds. In reality, it would be expected that increment sizes are dependent on the actual wind speed. This is seen trivially by the fact that decrements cannot be larger in absolute value than the current wind speed without producing negative values. Another problem that occurs is that as the algorithm proceeds and there are fewer simulated values from which to choose, the resulting choices slowly become far from ideal. While these factors hinder the algorithm’s ability to ensure that the simulated increment PDF stays true to the empirical one, they do not influence the overall wind speed and direction PDFs as the simulated values are not actually changed but only reordered. By accounting for the seasonality in wind speed and using the sorting algorithm discussed, the daily and annual autocorrelation is captured quite well as seen in Fig. 6.
5. Simulations and results 5.1. Electricity spot price Fig. 7 shows the hourly, daily, and monthly averages of the available data, which gives a clear indication of the seasonality present. The annual seasonality can be roughly described by a sine or cosine function, and there is a clear dip in prices during weekends, which are considered off-peak times. The daily seasonality also exhibit’s a clear pattern, but this pattern cannot be described by a single sine or cosine function. In this case, there is a bimodal shape with a commercial peak at 10:00 and a domestic peak at about 19:00. The average price drops sharply during off-peak hours when consumers go to sleep and demand drops. In order to account for these three components of seasonality, a robust regression is performed including twelve sine and twelve cosine functions to account for the daily seasonality as proposed by [17] as well as a dummy variable to indicate a weekend and a sine and cosine function to account for the yearly seasonality similar to [14]. In total, 27 variables are used to address the data’s seasonality, and the logarithm of electricity prices, Et, is described by the following equation:
2p 2p t þ a2 cos t þ bdt 8760 8760 12 X 2pj 2pj g1j cos þ t þ g2j sin t þ Xt ; t 24 24 j¼1
logðEt Þ ¼ a1 sin
¼ 1; 2; .; T
(9)
where a1 and a2 are coefficient estimates for the yearly seasonal process, dt is the weekend dummy variable, and b is the
4512
A. Coville et al. / Energy 36 (2011) 4505e4517
Fig. 7. Spot price seasonality.
corresponding estimated coefficient. The bracketed term in the summation reflects daily seasonality, and Xt is the error term of the regression, which corresponds to the deseasonalised logarithm of the electricity price process in Equation (5). The seasonality component is stored, and the error term of the regression is used for subsequent analyses. Using the deseasonalised process and Equation (A-2), the MR model yields h ¼ 0.006, X ¼ 5:27, and s ¼ 0.039 for estimated parameters in Equation (A-1). We then reapply the seasonality component and exponentiate to obtain the electricity spot price process. A sample path of this model is superimposed on the actual data in Fig. 8 to illustrate how well the model is able to describe the data. The mean and standard deviation of the simulated data are SEK 220.55/MWh and SEK 121/ MWh, respectively, which compares favourably with the corresponding mean of SEK 221.5/MWh and standard deviation of SEK 118.6/MWh calculated from the observed data. However, this model fails to capture some of the large spikes present in the observed data.
The MRJD model involves filtering the spikes and fitting a diffusion process to the spikeless data and an appropriate PDF to the size of the observed spikes. After filtering the data, 4098 observations are removed from the original time series. The intensity parameter of the Poisson PDF used to describe the arrival of jumps over time is estimated as the total number of jumps recorded divided by the total number of observations, which is 0.067. The PDF of spike sizes is slightly skewed with an estimated skewness of 0.4, a mean of 0.008, and a roughly equal number of upward and downward spikes. It is assumed that the true spike PDF is symmetric, thereby allowing for a simpler analysis by considering only the absolute value of the spikes. After taking the absolute value and shifting the data so that the smallest spike is of size zero, the PDF of absolute spike sizes is explained well by a truncated lognormal PDF with a truncation point of 1.2, where the parameters are estimated using MLE. Because the lognormal PDF has a longer tail than the empirical spike PDF, we use this
A. Coville et al. / Energy 36 (2011) 4505e4517
4513
Full Dataset N
15 % 10 % 5% W
E
S
Fig. 8. Sample path for the MR model.
truncation to ensure that unrealistically large spikes are not possible. An MR model is then fitted to the remaining data, and, by assuming the jump process is an independent Poisson process with truncated lognormal jumps, the fitted model yields h ¼ 0.00048, X ¼ 5:3, and s ¼ 0.015 for estimated parameters in Equation (A-3). Here, Jt w lognormal(3.37, 2.04) with truncation point 1.2, and dqt denotes the increments of a Poisson process with intensity parameter 0.067. The mean electricity price of the simulated data based on the MRJD model is SEK 233.8/MWh, while the standard deviation is SEK 140.2/MWh. Both of these values are overestimates of the moments calculated from the observed data. A sample path for the MRJD model is shown in Fig. 9.
Fig. 10. Wind rose.
5.2. Wind speed and direction Wind speeds are aggregated into manageable groups, and relative frequencies are displayed in a form of circular cumulative bar chart corresponding to the directional sectors of the observations (see Fig. 10). Formally, the Pearson correlation test yields a value of 0.1975 with a corresponding 95% confidence interval of [0.189,0.206], thereby indicating a highly significant correlation between wind speed and direction. 5.3. AEP and revenue estimation After artificially removing data, the resulting AEP calculations, based on the fitted Weibull PDF, are compared in Table 1. These AEP calculations confirm the hypothesis that systematic removal of data during cold periods biases the estimated AEP downwards with increasing bias as the percentage of missing data increases. In order to assess the significance of this bias, the Monte Carlo simulation described in Section 4.2 is used to estimate the AEP PDF under the assumption that the fitted Weibull PDFs accurately describe the wind resource at the site and are constant throughout time. This is a strong assumption to make based on only one year of data since, in reality, these are only estimates of the true underlying PDFs, and, when considering a long-term analysis, it should be acknowledged that the wind resource may change over time. As such, the true AEP variance will be larger to account for these sources of uncertainty. After simulating 1000 sample paths and obtaining the corresponding AEP for each path, the results are well approximated by the normal PDF as to be expected by the Central Limit Theorem.
Table 1 AEP Results.
Fig. 9. Sample path for the MRJD model.
Data Removed
AEP (MWh/year)
Bias (%)
None 2% 5% 10%
2641 2610 2559 2541
e 1.2 3.1 3.8
0.006 0.005 0.004
500
600
700
800
SEK in 1000s Fig. 12. Revenue PDF - MR Model.
5.4. Regression and persistence imputation results In order to address the bias induced by systematic missingness, the missing values in the time series are then imputed using both the persistence and regression methods described in Section 4.2, with wind speed and direction being imputed independently. Weibull PDFs are then fitted to each sector, and the AEP calculated, as illustrated in Fig. 13. The persistence method clearly underestimates the true AEP, while the regression method overestimates it. A plausible explanation for these opposing outcomes is that the persistence method uses the last available wind speed, which is always less than the missing wind speed since only the highest ones are removed. By contrast, the regression method overestimates the true AEP overall because any underestimation is insignificant relative to
0.012
Benchmark Persistence Regression
0.008 0.006 0.002
0.002
0.004
0.004
0.006
Density
0.008
0.010
0.010
0.012
0.003 0.002 0.001
400
Full 2% 5% 10%
0.000
0.000
Density
Full 2% 5% 10%
0.000
Fig. 11 provides a comparison of the approximated normal PDFs of the AEP to be expected for each of the missing data levels. This comparison highlights the significance of the bias induced by missing data in relation to the expected variability in the AEP. The estimated AEP decreases from 2641 MWh/year with the full dataset to 2541 MWh/year with 10% missing data. A value smaller than the point estimate of 2541 MWh/year found using the dataset with 10% missing observations would be expected with a 0.0004 probability if the true wind resource were described by the PDF estimated when using the full dataset at Geitvassfjellet, which is a highly significant bias. These simulated power output series are then multiplied by the simulated electricity spot prices to obtain estimates of the present value of revenue over a one-year time horizon. With the MR model, the resulting revenue PDFs have a small positive skewness due to the fact that electricity prices are bounded below by zero but are not bounded above. This results in a handful of very large simulated revenues. The MRJD model results in a higher revenue mean and variance along with a more pronounced skewness relative to the MR model. Using the full wind speed dataset and the MR model, the simulated revenue mean and standard deviation are SEK 553360 and SEK 57407, respectively. By contrast, these are SEK 585440 and SEK 277925, respectively, with the MRJD model. Most noticeable is the fact that the standard deviation of revenues when using the MRJD model is almost five times larger than that with the MR model. This indicates the importance of the right electricity price model and the degree of variability between models. As to be expected, comparison between revenue estimates based on the full dataset and the set with 10% artificially removed data shows smaller differences using the MRJD model than the MR model with respect to the variability expected. For illustration, results from only the MR model are presented in Fig. 12. In effect, the bias from missing data is overwhelmed by the variability experienced in the electricity spot price estimates. Although there is still, as expected, a downward shift in the revenue PDF as more data are removed, this shift is relatively small in comparison to the variability expected in revenue estimation. The mean of the revenue estimate with 10% missing data is SEK 532460 - a discrepancy of just over SEK 20000.
0.007
A. Coville et al. / Energy 36 (2011) 4505e4517
Density
4514
2400
2500
2600
MWh/year Fig. 11. AEP PDF.
2700
2800
2500
2600
2700
MWh/year Fig. 13. AEP PDF - treated data.
2800
0.005 0.004
400
500
600
700
800
SEK in 1000s Fig. 15. Revenue PDF treated with seasonality method - MR model.
Larger power output is then produced at times when electricity prices are higher, thereby shifting the revenue estimate upward. The standard deviation of revenues in the seasonal model is also higher as a result of accounting for the autocorrelation in wind speed. Via Monte Carlo simulation, we have shown that it is possible to obtain revenue estimates and confidence intervals for a wind turbine if we have an appropriate electricity spot price model and data to calculate the estimated AEP. Using the MR model and basing our AEP estimates on data up to 14 May 2007, we then use the outof-sample data for wind speeds and electricity prices over the period 15 May 2007e14 May 2008 as a basis for comparison. The revenue for the year, assuming an 850 kW wind turbine had been in place, is calculated (using actual wind speeds and electricity spot prices during the year) to be SEK 661650. If we consider the
Full 10%
0.03 0.00
0.000
0.01
0.002
0.02
0.004
Density
0.006
0.04
0.008
0.05
0.06
Full 10%
0.010
0.003 0.001
Once the ordered wind speed sample paths are obtained via the method of Section 4.3, the seasonality component is reapplied, the series is converted to energy yield, and the AEP PDF is estimated based on 1000 sample paths. Fig. 14 shows the AEP PDFs approximated by normal PDFs for both the full dataset and the dataset with 10% missing observations treated by the seasonality method. Strikingly, there is very little difference between the two PDFs indicating that the proposed seasonality model is successful in restoring the original AEP PDF even when this estimate is based on a large amount of missing data. The mean of 2640 MWh/year for the full dataset is very close to the AEP obtained analytically using Equation (3), thereby confirming that the process itself does not introduce any bias. The mean AEP of 2642 MWh/year found using the dataset with the 10% missing observations imputed via the seasonality approach indicates a very slight, but insignificant, overestimate of the true AEP. The standard deviation is also affected and is slightly underestimated in this case. Corresponding revenue estimates are then calculated using the MR model and summarised in Fig. 15. The estimate of the revenue mean for the full dataset is SEK 15000 larger than the estimate when not accounting for the seasonality component in the wind speed in Section 5.2. This follows from the fact that electricity prices and wind speeds follow similar seasonal patterns with increases during the day and in winter.
0.000
5.5. AEP and revenue calculations with seasonality adjustment
Density
Full 10%
0.002
Density
overestimation due to the cubing of the wind speed in calculating power output. While the Pearson correlation coefficient between the wind speed and electricity spot price processes using the raw data is 0.04 with the 95% confidence interval [0.02, 0.06], this significant (but small) correlation may pose a problem when modelling these two processes independently. This dependence, however, seems to be induced purely because of the common seasonal components underlying both processes. Since the Pearson correlation coefficient estimate of the correlation between the deseasonalised logarithm of the electricity price and deseasonalised wind speed is not significantly different from zero, i.e., 0.002 with the 95% confidence interval [0.005, 0.001], this indicates that the two deseasonalised processes can be modelled independently, as is done in Section 4.3.
4515
0.006
A. Coville et al. / Energy 36 (2011) 4505e4517
2500
2550
2600
2650
2700
MWh/year Fig. 14. AEP PDF treated with seasonality method.
2750
2800
500
520
540
560
SEK in 1000s Fig. 16. Revenue PDF with low-volatility electricity price model.
580
4516
A. Coville et al. / Energy 36 (2011) 4505e4517
0.02 0.00
0.01
Density
0.03
0.04
Full 10%
520
540
560
580
600
SEK in 1000s Fig. 17. Revenue PDF using low-volatility electricity prices and treated with the seasonality method.
confidence interval derived from the dataset with 10% missing data but with seasonality-based imputations, then the out-of-sample mean lies within a 90% confidence interval of the estimated revenue PDF and has a two-tailed p-value of 0.14. In comparison, using the revenue PDF of the data with 10% of artificially missing values and not considering any imputation before estimating the confidence interval (but still using the MR model), we find the outof-sample estimate lies outside of the 95% confidence interval with a two-tailed p-value of 0.025. Thus, by not accounting for systematic missing data, confidence intervals for expected revenues may fail to represent the true revenue range for a given year. When incorporating seasonality to account for this missingness, the mean and standard deviation of the revenue PDF are adjusted, and the associated confidence interval more accurately represents the true point estimate and associated variability that a wind turbine investor may face. The revenue confidence interval will clearly be influenced by the spot price model that is chosen. In a high-volatility market, the confidence intervals for revenue estimates will be very wide, making it less likely that systematic missingness will bias results enough to affect substantially whether or not a point estimate lies within a given confidence interval. However, when investors face a low-volatility environment for electricity spot prices, the corresponding confidence intervals will decrease, and the bias induced by systematic missingness in wind speeds could have a major affect on estimates. If, for instance, the electricity spot price model were chosen to be dXt ¼ 0.006(5.27Xt)dt þ 0.015dZt, then the conclusions regarding the influence of AEP bias on revenue estimation would change. Here, the model uses the volatility estimate of 0.015 - the estimate obtained when considering the MRJD process. In this case, however, the spikes are not included in the model, which affects the revenue PDF significantly (see Fig. 16). Using the lowvolatility electricity model but applying this seasonality approach, the effect of the missing data bias is nearly removed from the revenue estimates (see Fig. 17). 6. Conclusions Investment in wind power by the private sector is essential if countries hope to reach their renewable energy production goals.
This requires accurate estimates of a wind turbine’s revenue PDFs, knowledge that is also needed to inform government policy in setting subsidies. First and foremost in the procedure, however, is the confidence needed by stakeholders that revenue estimates are accurate and reliable. In this paper, missing data due to icing are shown to bias wind resource estimates downwards by up to 3.8%. This, in turn, results in a biased revenue estimate, which is relatively minor in comparison to the high-volatility nature of the Nord Pool electricity spot price market. However, if an investor either is assured of a fixed-price, feed-in tariff or experiences a lowvolatility price regime for electricity, then the effect of missing data needs to be remedied. By modelling wind and electricity seasonality, the method proposed in this paper not only drastically reduces the bias induced by missing data but also increases the expected revenue estimates and variance thereof. This is a result of the common seasonality patterns experienced in both wind and electricity prices and points to the importance of incorporating a time dimension into revenue calculations when the processes underlying these calculations are affected by seasonal fluctuations. The model is not without its limitations and still requires further development. For example, a better understanding of long-term variability of wind speeds is needed since it is not guaranteed that the true underlying wind resource will remain constant over the lifetime of a wind farm. Furthermore, a more refined analysis of electricity spot prices incorporating regime-switching behaviour or stochastic volatility would also give stronger weight to the revenue analysis in this paper. Finally, a real options approach to account for the discretion to defer investment or to choose its scale would be more realistic. Acknowledgments We acknowledge Dalane Vind for providing us with wind data, Nord Pool for providing us with electricity price data, and Agder Energi for valuable discussions with their wind energy group. Siddiqui acknowledges the support of the ELDEV project in Trondheim, Norway. The authors are grateful for feedback from the participants of the 2009 ELDEV Winter Workshop (11e12 February 2009 in Trondheim, Norway), the 2009 IAEE International Conference (21e24 June 2009, San Francisco, CA, USA), and the 2009 IAEE European Conference (7e10 September 2009, Vienna, Austria). We are also appreciative of the comments from the editor and two anonymous referees in improving the draft of this paper. All remaining errors are the authors’ own. Appendix After removing the seasonality component, we use the OrnsteineUhlenbeck model to capture the MR properties of the process {Xt, t 0} that drives electricity prices:
dXt ¼ hðX Xt Þdt þ sdZt
(A-1)
where dZt is a Brownian motion increment, X is the meanreversion level in the long term, h is the speed at which the process reverts to the long-term mean, and s is the volatility. [18] shows that this process in its discrete form is equivalent to an autoregressive model of lag 1 - AR(1):
Xt Xt1 ¼ a þ bXt1 þ et
(A-2)
Here, et is normally distributed with mean zero and standard deviation se and the parameters a and b can be estimated by simple linear regression. Although the simple MR model captures the basic attributes of electricity spot prices, it fails to describe the inherent
A. Coville et al. / Energy 36 (2011) 4505e4517
spiky nature of the market. By visual inspection of historical data, it is clear that large spikes occur more frequently than one would expect if the changes in electricity prices were truly explained by a normal distribution, as implied by Equation (A-1). By incorporating a jump process, we account for random spikes that would not be predicted by the MR model. These jumps are captured by a Poisson process, independent of the MR process. More formally, the MRJD model is:
dXt ¼ hðX Xt Þdt þ sdZt þ Jt dqt
(A-3)
where Jt is the size of the jump, determined by a chosen distribution, and dqt follows a Poisson process with an intensity parameter indicating the frequency of these jumps. ([15]) shows that, by choosing an appropriate method to remove spikes from the observed data, the remaining observations are much better suited to the assumption of normally distributed increments. References [1] IPCC. “Climate Change 2007: the physical science Basis”, In: Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, UK. 2007. [2] World Wind Energy Association. “16 Gigawatt of Wind Power Added in First Half of 2010”. Press Release; Bonn, Germany; 2011. http://www.wwindea.org. [3] EIA. International Energy Outlook 2009, US Department of Energy Report DOE/EIA-0484(2009). Washington, DC, USA, 2009. [4] Deng S-J, Johnson B, Sogomonian A. “Exotic electricity options and the valuation of electricity generation and transmission assets”. Decision Support Systems 2001;30(3):383e92. [5] Fleten S-E, Maribu KM, Wangensteen I. “Optimal investment strategies in decentralized renewable power generation under uncertainty”. Energy 2007; 32(5):803e15.
4517
[6] Yan Z, Bate S, Chandler RE, Isham V, Wheather H. “An analysis of daily maximum wind speed in Northwestern Europe using generalized linear models”. Journal of Climate 2002;15(15):2073e88. [7] Yan Z, Bate S, Chandler RE, Isham V, Wheather H. “Changes in extreme wind speeds in NW Europe simulated by generalized linear models”. Theoretical and Applied Climatology 2006;83(1e4):121e37. [8] Shamshad A, Bawadi MA, Wan Hussin WMA, Majid TA, Sanusi SAM. “First and second order Markov chain models for synthetic generation of wind speed time series”. Energy 2005;30(5):693e708. [9] Yang Z, Liu Y, Li C. “Interpolation of missing wind data based on ANFIS”. Renewable Energy 2011;36(3):993e8. [10] Rogers AL, Rogers JW, Manwell JF. “Comparison of the performance of Four measure-correlate-predict algorithms”. Journal of Wind Engineering and Industrial Thermodynamics 2005;93:243e64. [11] Anahua E, Barth S, Peinke J. “Markovian power curves for wind turbines”. Wind Energy 2007;11(3):219e32. [12] Lacroix, A and JF Manwell. “Wind Energy: Cold Weather Issues,” technical report, Renewable Energy Research Laboratory, University of Massachusetts at Amherst, Amherst, MA, USA, 2000. [13] Harstveit, K, L Tallhaug, and A Fidje. “Modelling and Measuring Atmospheric Icing at a Coastal Mountain in Norway”, A study supported by the Norwegian State Power Board (Statkraft), 2004. [14] Lucia JJ, Schwartz E. “Electricity prices and power derivatives: evidence from the nordic power exchange”. Review of Derivatives Research 2002;5:5e50. [15] Weron R, Bierbrauer M, Trück S. “Modeling electricity prices: jump diffusion and regime switching”. Physica A: Statistical and Theoretical Physics 2004; 336(1e2):39e48. [16] Cartea A, Figueroa MC. “Pricing in electricity markets: a mean reverting jump diffusion model with stationarity”. Applied Mathematical Finance 2005;12(4): 313e35. [17] Harvey AC. Forecasting structural time series models and the Kalman filter. Cambridge, UK: Cambridge University Press; 1989. [18] Dixit AK, Pindyck RS. Investment under uncertainty. Princeton, NJ, USA: Princeton University Press; 1994. [19] Blanco, C and D Soronow. “Jump Diffusion Processes - Energy Price Processes Used for Derivatives Pricing & Risk Management,” Commodities Now, 2001. September: 83e87. [20] Bierbrauer M, Trück S, Weron R. “Modeling electricity prices with regime switching models,” in computational science - ICCS 2004. Berlin, Germany: Springer; 2004.