Application of the generalized Pareto distribution to extreme value analysis in wind engineering

Application of the generalized Pareto distribution to extreme value analysis in wind engineering

Journal of Wind Engineering and Industrial Aerodynamics 83 (1999) 1}10 Application of the generalized Pareto distribution to extreme value analysis i...

126KB Sizes 1 Downloads 54 Views

Journal of Wind Engineering and Industrial Aerodynamics 83 (1999) 1}10

Application of the generalized Pareto distribution to extreme value analysis in wind engineering J.D. Holmes!,*, W.W. Moriarty" !Department of Mechanical Engineering, Monash University, Clayton, Vic. 3168, Australia "Moriarty Meteorological Services, 143 Reservoir Road, Sunbury, Vic. 3429, Australia

Abstract This paper discusses the generalized Pareto distribution (GPD) and its application to the statistical analysis of extreme wind speeds. Its main advantage is that it makes use of all relevant data on the high wind gusts produced by the storms of interest, not just the annual maxima, and it is not necessary to have a value for every year to carry out the analysis. The GPD is closely related to the generalized extreme value distribution (GEVD), and can be used to determine the appropriate value of shape factor, k, for use in the GEVD. However, a negative shape factor, corresponding to a Type II GEVD, is physically unrealistic, and should be avoided for long-term wind speed predictions. ( 1999 Elsevier Science Ltd. All rights reserved. Keywords: Extreme values; Probability; Wind speeds

1. Introduction The generalized Pareto distribution (GPD) is of fairly recent origin [1], but is of considerable signi"cance for geophysical phenomena such as #oods or extreme windstorms due to its close relationship with the extreme value distributions [2,3]. In this paper, the application of the GPD to the analysis of wind gusts generated by the passage of downbursts is described, and predicted extreme wind speeds by this method are compared with those derived from conventional extreme value analysis of annual maxima. The GPD approach has also been used for wind speeds in the United States by Lechner et al. [4], and Simiu and Heckert [5], but for annual maxima not separated by storm type.

* Corresponding author. 0167-6105/99/$ - see front matter ( 1999 Elsevier Science Ltd. All rights reserved. PII: S 0 1 6 7 - 6 1 0 5 ( 9 9 ) 0 0 0 5 6 - 2

2

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

2. Generalized extreme value distribution Although the tradition in wind engineering has been to "t the Type I extreme value distribution to extreme wind speeds following the work of Gumbel [6], in fact this distribution is a special case of the generalized extreme value distribution (GEV) [7] F (<)"expM![1!k(
Fig. 1. Generalized extreme value distribution with k"!0.2, 0.0 and #0.2.

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

3

alternative method which has advantages for small sample sizes and is more easily programmed on a computer, is the `method of probability-weighted momentsa [8]. Sometimes the "tting of a GEVD to sample maxima results in a Type II distribution. Since there are physical grounds for rejecting such a result for extreme winds, it may be attributed to some fault with the data. One reason could be that the sample contains mixed populations } for example, wind speeds from two di!erent storm types, say thunderstorms and gales. Another reason might be sampling errors from small samples. It would seem to be best in such cases to compromise by "tting a Type I (or Gumbel) distribution, with k set equal to zero (Eq. (2)). However, this will produce conservative results at high return periods. The theoretical justi"cation of "tting the GEVD to sample maxima depends on the assumption that the samples of the parent population are large (e.g. annual samples), and that they contain a "xed number of observations. In engineering applications, the Type I distribution has, at times, been applied in situations when the number of observations varied from sample to sample. In such cases (e.g. Refs. [9,10]), the use of the distribution (or a Type III) would be justi"ed by the "t of the resulting curve to the observations rather than by theory, as this approach does not have a sound theoretical basis. Since, we normally have small data samples when "tting the GEVD to annual maxima, the sampling errors will be high [11]. However, this approach ignores data from the second, third etc. largest wind speeds in any year. An approach which allows this additional data to be utilised, and is theoretically justi"ed, is described in this paper. The approach has the advantage that its theoretical justi"cation does not depend on having a "xed annual number of occurrences. It is still valid if there are no occurrences in some years.

3. Excesses over high thresholds } the generalized Pareto distribution The generalized Pareto distribution for a variate, y, takes the form F (y)"1![1!(ky/p)]1@k, (3) y where p is a scale factor, and k is a shape factor. The case k(0 is the usual Pareto distribution. The signi"cance of the GPD is that it is the appropriate distribution for the excesses, or di!erences, of independent observations above a de"ned threshold, given that the limiting parent distribution of the observations is one of the extreme value distributions [2,3]. It also turns out that the value of the shape factor, k, in the GPD is the same shape factor appropriate to the underlying GEVD (Eq. (1)). The GPD also has a &threshold stability' property, i.e. if > has a generalized Pareto distribution and u'0, then the conditional distribution of >!u, given >'u, is also generalized Pareto. This leads to a useful "tting method from which k and p can be determined [3]. Under certain conditions it can be shown [3] that E(>!uD>'u)"[p!k(u!u )]/(1#k), (4) 0 where u is the lowest threshold chosen ('0). E( ) is the expectation operation. 0

4

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

Thus, if the mean observed excess over u is plotted against (u!u ), then the plot 0 should follow a straight line with slope !k/(1#k) and intercept p/(1#k), if the generalized Pareto distribution is applicable. From the slope and intercept, the values of k and p can be determined. The linearity of this `mean exceedance plota will give an indication of how well the GPD "ts the data. A negative slope indicates that the data is &in the domain of attraction' of the Type III extreme value distribution. In Ref. [4], this approach is described as conditional mean exceedance (CME). The generalized Pareto distribution may also be "tted to data by various other methods, including the method of probability-weighted moments [2]. An estimate of the R-yr return period value, < , of the underlying variable, <, R can be obtained as follows. If the exceedance rate of the level u , is j (/yr), then 0 the mean crossing rate of the level < is given by j[1!F (< !u )]" R y R 0 j[1!k(< !u )/p]1@k, from Eq. (3). R 0 Setting this equal to 1/R, we obtain [3] < "u #p[1!(jR)~k]/k. (5) R 0 Note that, for k'0, as RPR, < Pu #(p/k), i.e. at high return periods, the R 0 predicted extreme wind speeds tend to a limiting value. The application of the method is explained in a later section. It is of interest to consider the limiting form of this method when k is forced to be 0. This case is discussed in detail in Appendix B. Eq. (4) shows that the expected excess is, in this case, a constant and equal to p } i.e it is independent of k. The limiting form of Eq. (5) is as follows (Eq. (B.3) of Appendix B): < "u #p ln (jR). (6) R 0 Appendix B also shows that setting k equal to zero is equivalent to assuming a Type I (or Gumbel) extreme value distribution for the annual maxima, as expected. Eq. (6) can be written as follows < "A#B ln R, (7) R where A"u #p ln j, and B"p. 0 Eq. (7) is a well-known limiting form of the Type I Distribution, for high values of R. The equivalent to Eq. (7) resulting from Eq. (5) is < "C!DR~k, (8) R where C"u #(p/k), and D"(p/k)j~k. 0 In Eq. (8), C is the upper limit of extreme wind speed, as previously discussed. 4. Some di7culties with 5tting the GPD The use of the GPD for extreme value estimation is susceptible to sampling errors. For example if one recorded wind gust from a storm is much greater than all the others, the mean exceedance plot will not show a clear linearity. A calculated line of best "t would be of doubtful meaning, and might suggest a negative value for k, which, as discussed previously, is physically unrealistic.

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

5

The GPD applies to observations above some large threshold. If lower thresholds are used, the mean exceedance plot may not be linear. On the other hand, if there are only a few observations available above the chosen thresholds, the true linearity of the upper part of the line may be di$cult to detect, so that again a best "t line would be of doubtful signi"cance. When the largest observation is larger than the next two by three (or more) threshold intervals, the mean exceedance plot will have two or three (or more) points entirely dependent upon this one observation. Using a median value of such points will reduce this excessive dependence. An alternative method of determination of k, and the sensitivity of the shape factor to the chosen threshold speed, and hence to the number of available samples of exceedances, is discussed in detail for US data by Simiu and Heckert [5].

5. Comparative advantages of the GEVD and the GPD It has been noted that, unlike the GEVD, the GPD is not theoretically dependent on each extreme value being derived from a "xed number of observations in the parent sample, and that it can use information from all the large observations. While both the GEVD (using either the method of sextiles or the method of probability-weighted moments) and the GPD (using mean exceedance plots) are susceptible to sampling errors they are a!ected in di!erent ways. In the sextiles and probability-weighted moment approaches, the e!ects of a large outlier are mitigated by the averaging processes. In the mean exceedance plot, such an outlier has an e!ect on every point plotted. If the observations are a mixture of two or more distributions (for example wind speeds produced by more than one type of storm), one of which tends to have higher values, some of the annual maxima used in the GEVD estimation may belong to one of the lower valued distributions and hence introduce distortion into the calculated value of the shape factor, k. However, with the mean exceedance plot, the e!ect of the population with the lower distribution is to show a change of slope, or lack of linearity. If there is a linear upper part in the plot and only this is used, the inhomogeneity in the population should have a minimal e!ect on the estimated value of k. This suggests a way of using the two approaches to check each other when both can be applied. If the necessary data are available, all the extreme observations greater than the least of the annual maxima could be used for the mean exceedance plot. A reasonably linear result would suggest that the annual maxima are a homogeneous set, and can con"dently be used to "t a GEVD. The values for k estimated by the two approaches should then be in reasonable agreement. If they are not, some kind of average might be adopted. Since the value of k estimated by either the method of sextiles or probability-weighted moments, would be less a!ected by a large outlier than that determined by a mean exceedance plot, the former estimate should be given a greater weight in the average, say by 2 : 1. If the result of the mean exceedance plot is clearly non-linear, it is likely that the lowest annual maxima do not belong to the same population as the higher ones. The

6

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

value of k obtained from "tting the GEVD (by sextiles or probability-weighted moments) will then be a faulty estimate. The maxima of two- or three-yr periods might be tried if there are su$cient data, or k may be estimated just from the mean exceedance plot. Another way of combining the two approaches, which would avoid the "tting of curved lines, would be to use a "tting of the GPD to "nd a value of k, and then to use this value of k in a linear "t of the GEVD to the annual maxima. An example of this procedure is given in the next section.

6. Example } downburst winds at Moree, N.S.W. A survey of the Dines anemometer records obtained by the Bureau of Meteorology for Moree, N.S.W., between 1965 and 1992, revealed 65 gusts of 40 knots (20.6 m/s) or greater, whose &signature' indicated that they were produced by thunderstorm downbursts. An analysis of the average excess over the thresholds u"20.3(u ), 22, 0 24, 2, 34 (m/s), gave the values shown plotted against (u!u ) in Fig. 2. A straight 0 line was "tted, from which values for p of 5.067 (m/s), and for k of 0.161, were obtained. The value of j is 2.32 (65 events in 28 yr). Then applying Eq. (5) for return periods between 10 and 10 000 yr, we obtain values for < given in Table 1 (column 2). An R upper limit of about 52 m/s is approached at very high return periods. Estimates of extreme winds from the excesses over threshold method, when the shape factor, k, is forced to be zero have also been made using Eq. (6), and are given in column 3 of Table 1. As expected, at high return periods, these estimates exceed those in column 2.

Fig. 2. Mean exceedance plot for Moree downburst gusts.

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

7

Table 1 Predictions of extreme winds (m/s) due to downbursts at Moree, N.S.W. Return Generalized period (yr) Pareto method (k"0.16)

Generalized Pareto method (k"0)

Extreme value Type I (annual maxima)

Extreme GEVD! by value method of Type III sextiles with k"0.16 (linear "t)

GEVD" by probabilityweighted moments

20 50 100 200 500 1000 2000 5000 10 000

33.5 36.6 39.0 41.4 44.5 46.9 49.2 52.4 54.8

34.3 37.4 39.8 42.2 45.3 47.7 50.1 53.2 55.6

33.8 36.0 37.4 38.7 40.1 41.1 42.0 43.0 43.7

34.7 37.7 39.8 41.8 44.3 46.1 47.8 50.0 51.5

34.8 37.1 38.7 40.1 41.7 42.7 43.7 44.8 45.5

34.1 36.6 38.3 39.9 41.9 43.2 44.4 45.9 46.9

!k"0.11. "k"0.07.

There were only 2 yr between 1965 and 1992 in which no downburst gust of 40 knots or greater, was recorded. Assuming that in these years, the maximum downburst gust was 39 knots, a full set of annual maxima becomes available. Then these values were ordered and plotted against the unbiassed reduced variate appropriate to the Type I extreme value distribution suggested by Gringorten [12] x"!lnM!ln[1!(i!0.44)/(N#1!0.88)]N,

(9)

where i is the order and N is the total number of values (28 in this case). A straight line "tted by the least squares method was used to produce the predicted values of < shown in Table 1 (column 3). (This method of "tting the Type I distribuR tion gives very similar results to the well-known Lieblein BLUE approach [14], and is easier to use for samples greater than 16). As indicated earlier, a value of k greater than 0 obtained from the generalized Pareto plot (Fig. 2) indicates a Type III underlying extreme value distribution. Since the Type III is a three parameter distribution, the value of the shape factor, k, must be speci"ed when deriving an equivalent straight line "t. The equivalent to Eq. (9) for the Type III, is the following: x"aM1!exp(!kx@)N/k with x@"!lnM!ln[1!(i!0.30)/(N#1!0.60)]N.

(10)

In this case the Benard plotting position parameter has been used, as recommended by Cunnane [13] for the Type III extreme value distribution. The value of k of 0.161 obtained from the generalized Pareto plot was used in Eq. (10) for the Moree data. The values of < for various values of return period, R, from the Type III "t to the R annual maxima are also shown in Table 1 (column 4). Perhaps not surprisingly, as the

8

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

same value of shape factor, k, applies, the values obtained were quite similar to those obtained from the generalized Pareto analysis via Eq. (5). However, both give estimates signixcantly lower at high return periods (and approaching an upper limit), than those obtained by "tting the Type I distribution to the annual maxima. The Type I extreme value, or Gumbel, approach to prediction of extreme wind speeds for design appears to be a conservative one. Also shown in Table 1, are predictions obtained by "tting the annual maxima to the GEVD by the method of sextiles (column 5) and by the method of probability weighted moments (column 6). These procedures gave values of the shape factor which were slightly less than that obtained by the GPD approach. Hence, the predicted extreme winds fall between those obtained by the GPD approach (column 2) and the "t to extreme value Type I (column 3). However, in this example the di!erences are relatively small for return periods below 1000 yr. 7. Conclusions This paper has discussed the generalized Pareto distribution and its application to the statistical analysis of extreme wind speeds. Its main advantage is that it makes use of all relevant data on the high wind gusts produced by the storms of interest, not just the annual maxima, and it is not necessary to have a value for every, or nearly every, year to carry out the analysis. The Type I (Gumbel) and Type II (Frechet) extreme value distributions are conservative ones for geophysical phenomena, because they predict unlimited values as the return period increases, a property which is intuitively and physically incorrect. The Type III extreme value distribution (k'0) is a more appropriate model for extreme wind speeds. When used together with the GPD, the two approaches may be combined by "tting the GPD to the excesses over thresholds, as a method of determining the appropriate value of the shape factor, k, and then "tting the Type III extreme value distribution. Acknowledgements The work described in this paper was carried out as part of a project sponsored by the Australian Electricity Supply Industry Research Board, Powerlink (Queensland) and TransGrid (N.S.W.). The support of these bodies is gratefully acknowledged by the authors. Appendix A. Proof that the Type I extreme value distribution is a special case of the generalized extreme value distribution with zero shape factor This Appendix gives a proof that (1!kx)(1@k) tends to exp(!x) as k tends to 0. When x is written as (
J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

9

First take a binomial expansion of (1!kx)(1@k)

AB A BA B

(1!kx)(1@k)"1!

1 (kx) k

1 k

#

A BA BA B

1 1 !1 (kx)2 k k ! 2!

1 !1 k

AB

AB

1 !2 (kx)3 k

3!

#2

1 2 1 3 (1!k)(kx)2 (1!k)(1!2k)(kx)3 k k "1!x# ! #2 2! 3! (1!k)x2 (1!k)(1!2k)x3 ! #2 "1!x# 3! 2! x2 x3 As kP0, Lim (1!kx)(1@k)"1!x# ! #2 3! 2! k?0 "exp(!x).

Appendix B. Excesses over threshold analysis, assuming shape factor of 0 The excesses over threshold analysis becomes simpler if we "x the shape factor at zero. This is equivalent to a Type I, or Gumbel distribution, for the annual maxima. The distribution of the positive excesses, y, over a threshold, u, is then an exponential distribution, a special case of the generalized Pareto distribution, for k"0: F (y)"1!exp(!y/p). (B.1) y It can be shown by considering the "rst moment of f (y), that the mean value of y is p. y (This also results from setting k"0 in Eq. (4) in the main text, which then gives an expected excess independent of u). The mean crossing rate of a high level < , can be estimated by multiplying the mean R crossing rate, j, of a lower level, u , by the probability of exceedence of the excess 0 (< !u ), i.e. j[1!F (< !u )]. Then setting this rate equal to 1/R: R 0 y R 0 (1/R)"jexp[!(< !u )/p] (B.2) R 0 from which we obtain, < "u #p ln (jR) (B.3) R 0 (this is the limiting case of Eq. (5) in the main text, as kP0). Thus the procedure in this case is to simply average the means of the positive excesses of the gusts above the various levels, u , u , u etc. This will then give the 0 1 2 parameter, p. The average crossing rate, j, of the lowest level, u , is also calculated, 0 and Eq. (B.3) is then applied to predict wind speeds at high return periods.

10

J.D. Holmes, W.W. Moriarty / J. Wind Eng. Ind. Aerodyn. 83 (1999) 1}10

Alternatively, the distribution of annual maxima can be predicted as follows. The mean rate of upcrossing of a high level, x, is given by: l "j exp[!(x!u )/p]. x 0 Assuming that the Poisson Distribution represents occurrences of high upcrossings, the cumulative distribution of the annual maxima is the same as the probability of no upcrossings of the level x in any 1 yr: F (x)"exp(!l ) x x "expM!jexp[!(x!u )/p]N 0 "expM!exp[lnj!(x!u )/p]N 0 "expM!exp[!(1/p)(x!(u #plnj))]N 0 This is a Type I distribution, with a mode of (u #p ln j), and a slope of (1/p). 0 References [1] J. Pickands, Statistical inference using extreme order statistics, Ann. Statist. 3 (1975) 119}131. [2] J.R.M. Hosking, J.R. Wallis, Parameter and quantile estimation for the generalized Pareto distribution, Technometrics 29 (1987) 339}349. [3] A.C. Davison, R.L. Smith, Models for exceedances over high thresholds, J. Roy. Statist. Soc. B 52 (1990) 393}442. [4] J.A. Lechner, S.D. Leigh, E. Simiu, Recent approaches to extreme value estimation with application to wind speeds. Part 1: the Pickands method, J. Wind. Eng. Ind. Aerodyn. 41 (1992) 509}519. [5] E. Simiu, N.A. Heckert, Extreme wind distribution tails: a peaks over threshold approach, ASCE J. Struct. Eng. 122 (1996) 539}547. [6] E.J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958. [7] A.F. Jenkinson, The frequency distribution of the annual maximum (or minimum) of meteorological elements, Quart. J. Roy. Meteorol. Soc. 81 (1955) 158}171. [8] J.R.M. Hosking, J.R. Wallis, E.F. Wood, Estimation of the generalized extreme-value distribution by the method of probability-weighted moments, Technometrics 27 (1985) 251}261. [9] N.J. Cook, Towards better estimation of extreme winds, J. Wind Eng. Ind. Aerodyn. 9 (1982) 295}323. [10] W.W. Moriarty, The implications for extreme gust estimations of the variable frequencies of directional winds I. Theoretical considerations, J. Wind Eng. Ind. Aerodyn. 25 (1987) 307}318. [11] E. Simiu, J. Bietry, J.J. Filliben, Sampling errors in estimation of extreme winds, J. Struct. Div. ASCE 104 (1978) 491}501. [12] I.I. Gringorten, A plotting rule for extreme probability paper, J. Geophys. Res. 68 (1963) 813}814. [13] C. Cunnane, Unbiased plotting positions } a review, J. Hydrology 37 (1978) 205}222. [14] J. Lieblein, E$cient methods of extreme-value methodology, Report NBISR 74-602, National Bureau of Standards, Washington, 1974.