Journal of Wind Engineering & Industrial Aerodynamics 164 (2017) 174–178
Discussion
Discussion of “The annual rate of independent events for the analysis of extreme wind speed” by Alessio Torrielli, Maria Pia Repetto & Giovanni Solari
R. Ian Harris, RWDI, Tilers Road, Milton Keynes MK11 3LH, UK
This discusser is delighted to see that these authors have taken up the theme introduced in Harris (2014), because the variation of the rate parameter with wind speed for a correlated variate has serious implications for the extreme value analysis of mean wind speeds. As the discusser is UK-based, this contribution is couched in terms of hourly means, but it is equally applicable to means taken over any averaging period in the WMO range, including the 10-min means used by the authors.

Since Harris (2014) is the starting point of the authors’ work, the discusser would, before proceeding, like to correct two statements made in the authors’ paper concerning it. Firstly, all the data were synthesised to represent a generic data set with a Rayleigh parent. No attempt was made to synthesise the wind regime at Boscombe Down. The philosophy behind this was that any EV method which could not cope with such a simple data set would stand no chance of dealing with real wind data. The data set also required a choice of the functional form for the auto-correlation and its time scale. These were based on the values obtained from Boscombe Down data in Harris (2008) and the subsequent important additional information contributed in discussion by Baker (2010). The choices were the Von Karman (1948) functional form and a time scale of 22.15 h. Secondly, the synthesis of the Rayleigh data set in Harris (2014) was carried out by a completely different method from that of the authors, one which produces perfect matching of both the parent distribution and the power spectrum or auto-correlation. It proceeds by generating two identically correlated N(0,1) time series, x(t) and y(t), independent of one another, which, when regarded as orthogonal vector components, yield as the resultant modulus a correlated time series with a Rayleigh parent given by:
P(V) = 1 − exp(−V²/2)

[equivalent to a Forward Weibull distribution of index 2 and scale parameter √2]. It was convenient to leave this small scale parameter value unmodified, hence the small values which appear in the paper. To convert the values to those appropriate for a site with an annual mean wind speed of K knots, these values can be re-scaled by multiplying by
0.79788 K. Certainly these small values have nothing to do with whether the diurnal and annual variations were modelled. As an aside, the discusser would point out that these components rarely have any influence on design, since they are associated with meso-scale, thermally driven phenomena such as land and sea breezes. They appear to be deterministic, since they are phase-locked to the solar cycles, see Harris (2008), and would appear to need to be modelled separately from the broadband component with a Forward Weibull parent, the extremes of which require modelling for design values.

Specifying the probability of a mean wind speed has to be based on meteorological records and inevitably involves some form of Extreme Value (EV) analysis. The methods used until now are based on the pioneering paper by Fisher and Tippett (1928) and the text by Gumbel (1958). These seminal works were wholly concerned with the largest of a countable number, n, of independent, identically distributed (i.i.d.) variables. As the authors state, Fisher and Tippett (1928) showed that as n→∞ the distribution of the largest value tends to one of three asymptotic forms.

This discusser has a more acerbic view of the present situation. What now appears in most design codes is based on an analysis of wind data which is not of the type for which the analysis method is intended or valid. It relies on convergence to an asymptotic form which does not happen. The countable number n does not exist: it has been replaced by a notional rate parameter, r, which cannot be rigorously defined, but which needs to be constant for a given wind record when in reality it is not. It is thus not surprising that the existing methods are in disarray and alternatives are needed. Thus the authors’ paper is timely.

Previously, all the methods have started with measured wind speeds at modest levels and attempted to project these, with an EV analysis, to the larger wind speeds with return periods of 50 y or more which are needed for design. The novel feature of this paper is that the authors start by trying to achieve a believable asymptotic result valid for large wind speeds and then work backwards from it. This seems a useful strategy to investigate. As noted at the end of Harris (2014), given the theory of Galambos (1978) and the fact that r(V) both has an upper bound and appears to be monotonically increasing, it should have an asymptote for large
V. The key question is what is its value and how does it depend on the other parameters?

Since Harris (2014) was published, the discusser has continued to work on this problem, the intention being to publish the results as a full paper. However, some of the interim results already obtained have a direct bearing on the material in the authors’ paper, both by supplying vital material needed to enhance what they have done and by removing some of the obstacles to the use of their new “top-down” approach. Thus the rest of this discussion uses these results.

The inconvenient truth is that, with a record length of either 20,000 y (as in Harris, 2014) or 12,740 y (as in the present paper), computation of r(V) from a single realisation blows up due to sampling error at values of V corresponding to y between 5 and 6. Some preliminary tests in collaboration with N.J. Cook indicated that, because of the double logarithmic relation between y and the EV probability, going to a level of V sufficient to resolve the value of the asymptote might require a simulation well in excess of 10⁷ years. Apart from the computer time needed, going to this level would be likely to cause problems with random number generation. The alternative strategy is to use shorter individual simulations, but to compute larger numbers of them to form an ensemble and then take the ensemble average.

The discusser therefore made changes to speed up the computations. The PC operating system was upgraded and a compiled version of the program was used. The length of each realisation was reduced to 10,000 years. This allowed the two streams of random data merged to form the Rayleigh-distributed wind speed simulation to be held permanently in RAM and thus removed the time penalty of transfers to and from the hard disc. The form of the auto-correlation was also changed. The Von Karman form is believed to be near to the truth because it provides a spectrum which rolls off as f^(−5/3) at high frequencies, which seems to be a general feature of any system obeying the Navier-Stokes equations. The disadvantage of the Von Karman form is that producing it from the independent N(0,1) series supplied by the random number generator requires a complicated and thus time-consuming auto-regressive process. An alternative, widely used for turbulence modelling in the US aerospace industry in the pre-digital age, was the Dryden spectrum. The auto-correlation corresponding to this has a simple exponential form:
ρ(τ) = exp(−τ/Θ)

where Θ is the time-scale. A Gaussian process with an exponential auto-correlation is also a simple Markov process and is thus easily derived from an independent N(0,1) series by a simple two-term autoregression. Thus synthesising a time series with this form of auto-correlation runs about 100 times faster than a Von Karman synthesis. If an exponential auto-correlation is used for the components x(t) and y(t), then the modulus r(t) has an auto-correlation ς(τ) given by [see Harris (2014)]:

ς(τ) = [₂F₁(−1/2, −1/2; 1; e^(−2τ/Θ)) − 1] / (4/π − 1)

For convenience, the discusser will refer to this as the CD auto-correlation [CD = Combined Dryden]. By integration, this also provides the relation between the time-scale Z of r(t) and Θ, that of a component, which is needed for the computation. To the full precision of 64-bit arithmetic this is:

Θ = 2.10329777632669 × Z
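As a check on this numerical factor, the brief sketch below (Python with NumPy/SciPy, an illustrative choice of tools) integrates ς(τ) numerically, taking each time-scale as the integral of its auto-correlation, which appears to be the definition in use here; it recovers the quoted ratio.

```python
# Numerical check of the quoted factor: take each time-scale as the integral of
# its auto-correlation (assumed definition), so that integrating rho(tau) gives
# Theta and integrating varsigma(tau) gives Z.
import numpy as np
from scipy.special import hyp2f1
from scipy.integrate import quad

Theta = 1.0  # component time-scale; the ratio Theta/Z does not depend on it

def varsigma(tau):
    """CD auto-correlation of the modulus r(t), as given above [Harris (2014)]."""
    return (hyp2f1(-0.5, -0.5, 1.0, np.exp(-2.0 * tau / Theta)) - 1.0) / (4.0 / np.pi - 1.0)

Z, _ = quad(varsigma, 0.0, np.inf)
print(f"Theta / Z = {Theta / Z:.6f}")   # ~2.10330, matching the value quoted above
```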
Fig. 1 shows a comparison between the Von Karman and CD auto-correlation functions. Both have a time-scale of 22.15 h. They are quite similar monotonically decreasing functions.

Fig. 1. Comparison of Von Karman & Combined Dryden correlations for Z=22.15 h.

The discusser used the original independent and uncorrelated N(0,1) time series from Harris (2014) and applied the simple auto-regressive process needed to convert them to a 20,000 y simulation of a Rayleigh-distributed time series with the CD correlation and a time-scale of 22.15 h. Fig. 2 shows the resulting Gumbel plot. The original plot from Harris (2014) is shown for comparison as the solid line. There is very little difference, demonstrating that if the auto-correlation has the same time-scale and is a simple monotonically decreasing function, there is little effect on the Gumbel plot.

Fig. 2. Comparison of Gumbel plots for CD and Von K correlated data, Z=22.15 h.

Given these comparisons, the discusser decided that using the CD correlation for further computations was a reasonable course. The result is that one 10,000 y simulation, which previously would have required a 42 h computer run, now takes 12 min. This allows a single simulation for each case investigated to be replaced by an average over a 100-member ensemble. The discusser ran five such simulations with values of the time-scale given by Z = 2.5 h, 5 h, 10 h, 20 h and 40 h.
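To make the synthesis procedure concrete, the sketch below (Python) generates two independent AR(1) N(0,1) component series with the exponential auto-correlation and combines them as orthogonal vector components, so that the modulus has a Rayleigh parent with the CD correlation; the final lines apply the 0.79788 K re-scaling mentioned earlier. The time step, record length, seed and the value of K are illustrative choices only, not those of the discusser's production runs.

```python
# Sketch of the vector-component synthesis: two independent AR(1) N(0,1) series
# x(t) and y(t) with rho(tau) = exp(-tau/Theta), combined as orthogonal vector
# components so that the modulus V(t) has a Rayleigh parent (scale sqrt(2)) and
# the CD auto-correlation. Time step, length, seed and K are illustrative only.
import numpy as np

dt = 1.0                       # time step in hours (hourly means)
Z = 22.15                      # target time-scale of the modulus, hours
Theta = 2.10329777632669 * Z   # component time-scale from the relation quoted above
n = 10 * 8766                  # ten years of hourly values (short, for illustration)

a = np.exp(-dt / Theta)        # AR(1) coefficient giving the exponential correlation
b = np.sqrt(1.0 - a * a)       # preserves unit marginal variance

rng = np.random.default_rng(1)

def ar1(eps):
    """Two-term autoregression producing a correlated, unit-variance N(0,1) series."""
    x = np.empty_like(eps)
    x[0] = eps[0]
    for i in range(1, len(eps)):
        x[i] = a * x[i - 1] + b * eps[i]
    return x

x = ar1(rng.standard_normal(n))
y = ar1(rng.standard_normal(n))
V = np.hypot(x, y)             # correlated series with a Rayleigh parent

K = 15.0                       # hypothetical site annual mean wind speed, knots
V_site = 0.79788 * K * V       # re-scaled as described earlier; mean(V_site) ~ K
print(f"mean wind speed after re-scaling: {V_site.mean():.2f} knots")
```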
To assist the discussion of the impact of these results on those of the authors, the discusser introduces some other theoretical formulae not given in their paper. If the CDF of the parent distribution is denoted by P(V), then Q(V) = 1 − P(V) is known as the “risk of exceedance”. Q(0) = 1 and Q(V) decreases monotonically to zero as V increases. This means that Q(V) can always be written as:

Q(V) = exp[−h(V)]

which serves to define the function h(V). With r(V) dependent on V as indicated, the usual definition of the Characteristic Largest Value (CLV) in classical EV theory has to be redefined as the solution of:

Q(U) = 1/r(U)
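As a concrete illustration of this redefined CLV, the minimal sketch below (Python) solves Q(U) = 1/r(U) by root-finding for the unscaled Rayleigh parent; the monotone, bounded rate function r(V) used here is a purely hypothetical placeholder, not the authors’ empirical model, and the bracketing relies on the monotonic behaviour discussed in the next paragraph.

```python
# Minimal sketch: solve Q(U) = 1/r(U) for the redefined CLV, U.
# Assumptions: unscaled Rayleigh parent, Q(V) = exp(-V^2/2); the rate function
# r(V) below is a hypothetical monotone, bounded placeholder used only to
# illustrate the root-finding step -- it is not the authors' empirical model.
import numpy as np
from scipy.optimize import brentq

def Q(V):
    """Risk of exceedance for the unscaled Rayleigh parent."""
    return np.exp(-0.5 * V**2)

def r(V, r_lo=150.0, r_hi=8766.0, V_ref=3.0):
    """Hypothetical rate: rises monotonically from r_lo towards the bound r_hi."""
    return r_lo + (r_hi - r_lo) * (1.0 - np.exp(-V / V_ref))

# Q(V)*r(V) starts well above 1 at V = 0 and decays to 0 at large V, so the
# equation Q(U)*r(U) = 1 has a single root inside a generous bracket.
U = brentq(lambda V: Q(V) * r(V) - 1.0, 1e-6, 20.0)
print(f"CLV U = {U:.3f},  Q(U) = {Q(U):.3e},  1/r(U) = {1.0 / r(U):.3e}")
```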
The monotonic decrease of Q(V) and the monotonic increase of r(V), together with the existence of an upper bound, guarantee that a unique solution for U always exists. No useful theoretical lower bound for r(V) exists, but a practical one can be deduced. In the exposition of his “Method of Independent Storms”, Cook (1982) devised a way of counting the number of independent storms in a year by counting the number of maxima over a threshold. The exact count varied with the choice of threshold, but for a sensible choice in relation to the wind speed levels sought for design, the number of storms was always in the region of 150/y for temperate storms and 50/y for thunderstorms. These values set a practical lower bound for r(V), as each independent storm must produce a minimum of one independent value of wind speed and usually many more. For instance, the authors’ simulation finds, for 10-min means, a value of r(V) of about 2×10⁴ at y = 0. A lower bound for r(V) provides an upper bound for Q(U) and hence ensures that for all values V > U the quantity Q(V)/2 ≪ 1. This result enables the authors’ equation (6) to be rearranged to read:
y = h(V) − ln[r(V)]

Note that this preserves the classical EV result that when y = 0, V = U. On any Gumbel plot of y versus V (with y as ordinate) with a Forward Weibull parent wind speed distribution of index w > 1, the plot is curved concave upwards. The curvature decreases as V increases but will not be satisfactorily approximated by the straight line of an FT1 asymptote until values of y well beyond those useful for design are exceeded, as shown in the authors’ Fig. 4. The discusser believes that the above form for y is particularly useful as it indicates the origin of the curvature on the Gumbel plot. The term h(V) contains all the curvature arising from lack of convergence to the FT1 asymptote, whilst the term in ln[r(V)] contains all that due to the correlation present in the parent. When simulated data are being studied, as in the present paper, the form of the parent P(V), and hence of h(V), is always available. Hence the substitution z = h(V) can be introduced and thus:
y = z − ln[r(z)]
Hence, if r is constant, the Gumbel plot becomes a straight line of slope 1, i.e. the lack-of-convergence curvature has been removed and any curvature on the plot is due to correlation. The discusser prefers this form of presentation, compared, say, to that in the authors’ Fig. 4, where the curvature is a mixture of lack-of-convergence and correlation effects. The above form of y/z plot is used in what follows.

The results from the simulations are shown in Figs. 3–7. Features common to all include a horizontal line at y = 5, which represents the point at which, for a single simulation, the plot goes berserk due to sampling error. For the same reason, the points plotted above this level have an open instead of a closed symbol. The performance of the ensemble averages is much more stable, since averaging reduces the sampling error by a factor of 10. The line labelled r = 8766 is what the authors call the IO line, i.e. it is what would be obtained were the data all independent. If, as the authors suggest, all the values, regardless of time-scale, asymptote to this case, then in every one of the figures the data plot must eventually coincide with this line. On the other hand, if each plot has an individual constant asymptote, they will form a set of lines parallel to this one.

Fig. 3. Rayleigh Parent, CD Correlation, Z=2.5 h.
Fig. 4. Rayleigh Parent, CD Correlation, Z=5 h.
Fig. 5. Rayleigh Parent, CD Correlation, Z=10 h.
Fig. 6. Rayleigh Parent, CD Correlation, Z=20 h.
Fig. 7. Rayleigh Parent, CD Correlation, Z=40 h.
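To illustrate the y/z presentation described above, the sketch below (Python) draws synthetic annual maxima for the independent (IO) case, forms the Gumbel reduced variate y from the ranked values, and plots y against z = h(V) = V²/2 for the unscaled Rayleigh parent; on such axes the points should lie close to the straight line y = z − ln(8766) of slope 1. The record length, seed and Gringorten plotting position are illustrative choices, not those used for Figs. 3–7.

```python
# Sketch of the y/z presentation: synthetic annual maxima for the independent
# (IO) case with r = 8766 values per year, Gumbel reduced variate y from the
# ranked maxima, plotted against z = h(V) = V^2/2 for the unscaled Rayleigh
# parent. Record length, seed and plotting position are illustrative choices.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
N = 10_000                    # simulated years
r_const = 8766.0              # independent parent values per year (IO case)

# Draw annual maxima by inverting P_max(V) = [1 - exp(-V^2/2)]^r exactly.
u = rng.random(N)
q = -np.expm1(np.log(u) / r_const)          # q = 1 - u**(1/r), computed stably
annual_max = np.sqrt(-2.0 * np.log(q))

V_sorted = np.sort(annual_max)
m = np.arange(1, N + 1)
p = (m - 0.44) / (N + 0.12)                 # Gringorten plotting position
y = -np.log(-np.log(p))                     # Gumbel reduced variate
z = 0.5 * V_sorted**2                       # z = h(V) for the Rayleigh parent

plt.plot(z, y, ".", ms=2, label="simulated annual maxima (independent case)")
plt.plot(z, z - np.log(r_const), "-", label="y = z - ln(8766)")
plt.xlabel("z = h(V)")
plt.ylabel("Gumbel reduced variate y")
plt.legend()
plt.show()
```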
Collectively, these figures confirm that the time-scale Z is the important parameter governing the appearance of the Gumbel plot. This serves to reinforce the discusser’s view that periodic components should be handled separately; otherwise, both the auto-correlation and the time-scale of the combination may well not be representative of either component. Strictly, r(z) should be written as r[z, ς(τ)], as in general one would expect the value to depend on both the form and the time-scale of the auto-correlation, but the evidence of Fig. 2 suggests that r(z,Z) may suffice.

As Z→0, one would expect to recover the result for independent data. In Fig. 3, where Z is small (2.5 h), the plot is very close to the IO line. As Z increases, then on a line of constant y, for instance y = 0 along which z = ln[r(z,Z)], the plot shifts systematically to the left. There is clearly no limiting value of r(z,Z) at around y = 5 or 6, as the authors have built into their empirical model. If, along the line y = 0, −Δz is the shift, then it follows that exp(−Δz) is the ratio r(z,Z)/8766. Table 1 summarises the values of z, −Δz and r(z,Z)/8766 deducible from Figs. 3–7, and the results are depicted in Fig. 8.

Fig. 8. Shift from IO line along y=0 with variation of correlation time-scale Z.

The discusser notes that, given sufficient information about the parent distribution for a y/z plot to be constructed, this shift may provide a method of deducing the value of the time-scale Z without direct processing of the data, which, given the mixed nature of real data, may well not be possible. The question of whether all the lines asymptote to the common IO line cannot be decided from these results. However, if this does happen, it must be at a value of y well above the range of interest for design. Since the plot is slightly concave upwards, fitting a straight line to a section and extrapolating along it to the intersection with the IO line will produce a lower bound for the true value of any intersection. Applying this technique to the data lying between y = 0 and y = 3 in Fig. 6, where Z = 20 h, leads to an estimate of y = 13.55 for the intersection, which corresponds to a return period of about 760,000 y. Since this is a systematic underestimate, it confirms that, if there is a real intersection, it is well beyond any practical range and cannot be evaluated by any form of computer simulation of this kind.
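Two of the numbers quoted above can be checked with a few lines of arithmetic, assuming the usual relation between the reduced variate y and the annual non-exceedance probability: the left shift −Δz at y = 0 implies the rate ratio r(z,Z)/8766 = exp(−Δz), and the intersection estimate y = 13.55 implies a return period of roughly 7.7×10⁵ years, consistent with the figure of 760,000 y quoted. A minimal sketch:

```python
# Quick checks of the arithmetic quoted above, assuming the usual relation
# between the reduced variate y and the annual non-exceedance probability.
import numpy as np

# Left shift to rate ratio: r(z, Z)/8766 = exp(-dz); e.g. the Z = 20 h case.
dz = 1.632047                                 # value tabulated in Table 1
print(f"r/8766 = exp(-dz) = {np.exp(-dz):.6f}")   # ~0.195529, as tabulated

# Return period implied by the extrapolated intersection at y = 13.55 (Fig. 6).
y = 13.55
R = 1.0 / (1.0 - np.exp(-np.exp(-y)))         # ~exp(y) for large y
print(f"return period = {R:.3e} years")       # ~7.7e+05 y, i.e. ~760,000 y
```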
Since, for design, the Gumbel plot is needed at most over the range y = 0 to 9, there seems to be considerable merit in fitting a power-law approximation to the y/V Gumbel plot of the form:
y = A V^w + B

which is the basis of the Cook–Harris (2004, 2008) penultimate method, successfully tested both by the authors (Torrielli et al., 2013) and in Harris (2014). Alternatively, since the curvature in Figs. 3–7 is so small, if the information about the parent is sufficient to construct a y/z plot, then fitting a power law on that plot, or even linear extrapolation on it, also seems a simple alternative. Otherwise, the discusser would like to see the authors’ “top-down” approach pursued, but this will require the fitting of a suitable expression to the data in the figures to match the approach to the IO limit, and that seems to require many further simulations to incorporate the role of the time-scale.

Finally, the discusser wishes to comment on the relation of the authors’ work to the underlying problem which now influences all EV analyses. As soon as the idea of convergence to an asymptotic form is abandoned, as it necessarily must be, every EV method other than the Cook–Harris penultimate approach requires additional information about the parent distribution. It has long been accepted that the wind climate of most sites is actually mixed, in the sense that several different physical mechanisms produce the wind regime. Gomes and Vickery (1978) produced the pioneering paper on dealing with this situation where the components were grossly different, e.g. thunderstorms and tropical cyclones. The problem now is dealing with components where the differences are more subtle. Until recently, the techniques used to separate components were left-censoring, when one mechanism dominated the wind regime, or synoptic separation using annotated records (Lombardo et al., 2009). The situation has changed since the development of the OEN method (Harris and Cook, 2014). This demonstrated that sites like Tiree and Marham in the UK, long regarded as having “simple climates”, actually had at least two significant components. More importantly, it seemed to offer a means of splitting a mixed climate into its component parent distributions, including an estimate of the fraction of the time that each component was active. Since the OEN paper was published, Cook has continued to refine the data processing and has quite deliberately applied the method to sites with known very “mixed” regimes. An example is Adelaide, South Australia (Cook, 2015), where a “blind” data analysis resulted in the extraction of five OEN components which were then found to be consistent with the known meteorology of the site. One advantage of the OEN method is that all the OEN distributions found from the analysis yield marginal distributions of wind speed which can be very accurately approximated by Forward Weibull distributions, all with a value of the index close to 2.
Table 1
Details of left shift produced by parent correlation.

Z (h)      z           −Δz         r(z,Z)/8766
0.00       9.078636    0.000000    1.00000
2.50       8.813309    0.265327    0.766956
5.00       8.488791    0.589846    0.554413
10.0       8.024742    1.053894    0.348578
20.0       7.446589    1.632047    0.195529
40.0       6.776074    2.302562    0.100002
This alleviates the problem, highlighted by the authors, of the variability of the index when a single Forward Weibull is fitted to what is almost certainly mixed data, which is not a reliable route to extreme values. Since the study of Adelaide, this work has been ongoing, and the new results seem to confirm that OEN is a viable method. Note that it cannot separate the data, but it does appear to be able to identify and separate the parent distributions of the components, including those subject to modulation by the daily and seasonal solar cycles. This suggests that EV analysis may in future use the data not to calculate extremes directly, but to identify the parent components, which can then be combined with a method similar to the one outlined in the authors’ paper.

References

Baker, C.J., 2010. Discussion of “The macro-meteorological spectrum – a preliminary study” by R.I. Harris. JWEIA 98 (12), 945–947.
Cook, N.J., 1982. Towards better estimation of extreme winds. JWEIA 9, 259–323.
Cook, N.J., 2015. A statistical model of the seasonal-diurnal wind climate at Adelaide. Aust. Meteorol. Oceanogr. J. 65, 206–232.
Cook, N.J., Harris, R.I., 2004. Exact and general FT1 penultimate distributions of extreme wind speeds drawn from tail-equivalent Weibull parents. Struct. Saf. 26, 391–420.
Cook, N.J., Harris, R.I., 2008. Postscript to “Exact and general FT1 penultimate distributions of extreme wind speeds drawn from tail-equivalent Weibull parents”. Struct. Saf. 30, 1–10.
Fisher, R.A., Tippett, L.H.C., 1928. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Philos. Soc. 24, 180–190.
Galambos, J., 1978. The Asymptotic Theory of Extreme Order Statistics. Wiley, New York.
Gomes, L., Vickery, B.J., 1978. Extreme wind speeds in mixed climates. JWEIA 2, 331–334.
Gumbel, E.J., 1958. Statistics of Extremes. Columbia University Press, New York.
Harris, R.I., 2008. The macro-meteorological spectrum – a preliminary study. JWEIA 96, 2294–2307.
Harris, R.I., 2014. A simulation method for the macro-meteorological wind speed and the implications for extreme value analysis. JWEIA 125, 146–155.
Harris, R.I., Cook, N.J., 2014. The parent wind speed distribution: why Weibull? JWEIA 131, 72–87.
Lombardo, F.T., Main, J.A., Simiu, E., 2009. Automated extraction and classification of thunderstorm and non-thunderstorm wind data for extreme-value analysis. JWEIA 97, 120–131.
Torrielli, A., Repetto, M.P., Solari, G., 2013. Extreme wind speeds from long-term synthetic records. JWEIA 115, 22–38.
Von Karman, Th., 1948. Progress in the statistical theory of turbulence. Proc. Natl. Acad. Sci. USA 34, 530.