Extreme hurricane wind speeds: Estimation, extrapolation and spatial smoothing

Extreme hurricane wind speeds: Estimation, extrapolation and spatial smoothing

Journal of Wind Engineering and Industrial Aerodynamics 74—76 (1998) 131—140 Extreme hurricane wind speeds: Estimation, extrapolation and spatial smo...

174KB Sizes 3 Downloads 96 Views

Journal of Wind Engineering and Industrial Aerodynamics 74—76 (1998) 131—140

Extreme hurricane wind speeds: Estimation, extrapolation and spatial smoothing E. Casson, S. Coles* Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK

Abstract Estimation of the extremal behaviour of hurricane wind speeds is complicated by the lack of accurate extreme data at any one site. As a result, wind engineers have developed climatological/physical models from which hurricane events can be simulated and used as a basis for inference. We present an application of non-linear spatial regression techniques to hurricane wind speed data simulated at locations on the Gulf and Atlantic coasts of the United States. Modelling the spatial variation in extremal behaviour provides a means of pooling data, thus increasing the extent of information available for inference at each site. Estimates of distributional models for extremal behaviour and, consequently, return levels are more precise than those obtained by standard methods applied to individual site data. We also adapt these spatial techniques to estimate the distribution of directions of the r—largest hurricane wind speeds at each site. The techniques we describe are preliminary steps to spatially modelling the joint behaviour of wind direction and speeds. We anticipate that this method of estimating return levels for winds in a given direction will yield estimates with greater accuracy than current techniques enable. ( 1998 Elsevier Science Ltd. All rights reserved. Keywords: Directional data; Extreme values; Hurricanes; Spatial models

1. Introduction In this paper we focus on statistical modelling of the spatial aspects of the US hurricane database simulated by Batts et al. [1]. The data correspond to series of simulated hurricane wind speeds, stratified by direction, corresponding to locations along the eastern and south—eastern coastlines of the United States. The necessity for simulating data arises from the difficulty in recording wind speeds during violent storms. Thus, the simulated data are obtained from a physical model relating wind speed to pressure, calibrated at each location to reflect the historical series of pressures

* Corresponding author. E-mail: [email protected]. 0167-6105/98/$19.00 ( 1998 Elsevier Science Ltd. All rights reserved. PII: S 0 1 6 7 - 6 1 0 5 ( 9 8 ) 0 0 0 1 1 - 7

132

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

recorded in the vicinity. In total, 999 independent simulated hurricane events were generated at each of 55 coastal locations, where each event consists of the maximum wind speed in each of 16 directional sectors. Estimated hurricane rates at each location are also obtained from historical records. As well as summarising the spatial extremal characteristics of the hurricane model, our aim, in treating the simulated data as a surrogate for genuine hurricane wind speed data, is to develop a technique which applies equally well for modelling other sets of spatially recorded environmental series, whether genuine or simulated. The main interest with the hurricane data concerns the behaviour of extremes of the series, though we shall also consider directional aspects. Longer term, we are seeking a model which describes jointly the distribution of extreme hurricane wind speeds and their associated directions. In the present paper we restrict ourselves to separate analyses of each of these components. There are a number of reasons for favouring an analysis that attempts to describe spatial patterns of variation in stochastic behaviour, especially in the context of modelling extremes. These include: 1. Spatial regression models facilitate a ‘pooling’ of information from site to site, improving inferences at each individual location, and enabling accurate estimates to be obtained at sites with very few data. 2. Having established a spatial model, estimates can be interpolated to obtain estimates at locations without data. 3. Obtaining inferences on a global scale can lead to an improved understanding of the underlying meteorological processes. Despite the advantages, there are difficulties in such an analysis using classical regression-type techniques. Therefore, in this paper, we summarise the results of a novel approach to spatial smoothing, using recent developments in Markov chain Monte Carlo methodology to overcome the computational difficulties.

2. Models for extremes 2.1. Models for a single site The classical model for the stochastic behaviour of extremes assumes a sequence of independent variables, X ,X ,2 with common distribution function F, and considers 1 2 the distribution of M "maxMX ,2, X N (1) n 1 n for large n. Based on the asymptotic limit as nPR, applied for a large but finite value of n"N, an approximation to the distribution function of M is obtained as N x!k ~1@m G(x)"exp ! 1#m , (2) p

G C A BD H

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

133

defined on Mx: 1#m(x-k)/p'0N. This is the generalised extreme value (GEV) distribution, which includes all three of the Fisher—Tippett types of distribution as special cases. The parameter m is a shape parameter which determines the weight of the tail of F; k and p('0) are location and scale parameters, respectively. This argument justifying the GEV distribution for M can also be shown to be valid when there is N short-range dependence at extreme levels of the X series [2]. i In a typical application, the GEV distribution is fitted to the series of annual maximum observations of a data series, assumed to be independent realisations of M . However, this procedure is inefficient if other data are also available, which is N the case, for example, with the simulated hurricane data. Hence, we work with an alternative characterisation for the tail of F, based on the distribution of PrM X 'u#xDX 'uN for sufficiently large thresholds u; see Ref. [3] for details. This i i enables us to obtain estimates of the GEV parameters (k,p, m) based on all data which are large in the sense of exceeding a high threshold u. Since the hurricanes are simulated, calculation of the probabilities of extreme events over a given period — a year say — additionally require an estimate of the yearly rate of occurrence of hurricanes, j. Combining the hurricane rate with the estimated GEV parameters leads to an estimate of the n-year return level, that is the level exceeded in any year with probability 1/n, as

GA B H

p l "k# n m

jn m !1 , N

(3)

where N is the number of realisations assumed per year in the fitting of model (2). This procedure can be applied separately at each of the 55 data locations, to obtain GEV estimates (k ,p ,m ) and associated return level estimates l . Instead, to obtain j j j n,j the benefits of a spatial model identified in the introduction, we develop a modelling approach which incorporates spatial smoothing of these parameters. 2.2. Spatial models One option is to propose simple regression models for each of, or a subset of, the GEV parameters, which can then be fitted by, say, maximum likelihood. However, the observed pattern of variation in the parameter estimates is complex, making it difficult to formulate suitable regression models. Nonetheless, spatial coherence in the driving meteorological processes suggests that nearby locations might be expected to have similar GEV parameter values. Hence, for each GEV parameter, we propose models of the form h(k(s))"f (s;b)#g(s;o),

(4)

where s denotes the spatial location, h is a link function (e.g. logarithmic), f is a regression function (linear, for example) with unknown parameters b, and g is a zero mean Gaussian stochastic process with correlation function o. The component f in this model corresponds to possible systematic effects such as spatial trends — this is standard in statistical modelling. The novel feature is the inclusion of the stochastic

134

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

Gaussian component, g, in a non-Gaussian model. The introduction of this structure for modelling non-linear, non-Gaussian spatial effects was suggested in other contexts by Diggle et al. [4]. Our implementation enables smooth, but possibly irregular, variation in the GEV parameters from location to location. It remains in model (4) to specify the correlation structure of the Gaussian component, which determines the degree of similarity of the parameters from site to site. We have adopted correlation functions of the form o(s ,s )"expM!(aDs !s D)dN, (5) 1 2 1 2 which is a standard correlation model within the spatial statistics literature. Note, in particular, that the larger the value of a, the faster the decay in parameter correlation between sites, so that small values of a correspond to strong spatial coherence, whilst large values correspond to substantial spatial irregularity. Models of a similar structure to Eq. (4) were also used for the spatial variation in the other GEV parameters p and m. Because of the complexity of the spatial structure, inference for the model is not feasible using conventional methods. However, recent developments in Markov chain Monte Carlo methodology provide a scheme which makes estimation possible. The idea, at least in a formal sense, is to construct the model within a Bayesian framework, and then to use a simulation scheme which generates points from a Markov chain whose equilibrium distribution is the target posterior distribution. Denoting by H, the parameter set of the model, the full Bayesian specification of the model determines a posterior distribution, n(HDX), where X is used to denote the entire simulated data set across all sites. Analytic computations on the distribution would be avoided if we could simulate from p directly, but even this is unwieldy. Instead, we proceed in the following way, to obtain a Markov chain h , h ,2 whose equilibrium distribution is 1 2 the target distribution HDX. The procedure, given a current value of the chain h , is to i simulate a new proposed value h* according to an arbitrary transition model i`1 q(h ,h ). Then let h "h* , with probability c, where i`1 i i`1 i`1 n(h DX)q(h ,h ) i i`1 , c" i`1 (6) n(h DX)q(h ,h ) i i`1 i otherwise, let h "h . i`1 i From this procedure, after an initialisation period, the simulants, h , can be used to i infer different aspects of n(HDX), such as the marginal distributions of interest. The point of this algorithm is that q can be chosen almost arbitrarily, so that choices can be made which facilitate straightforward simulation. The rejection criterion for h* ensures that the required target distribution is obtained. A drawback is i`1 that some choices for q can lead to very poor performance (in particular, strong dependence in the simulated series), so considerable experimentation is needed to obtain a scheme which is easy to implement and which also provides accurate inferences with relatively short simulation series. Having obtained a series of values of the vector of parameters H, focusing just on one individual component, say k , yields an estimate of the marginal posterior j

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

135

distribution of the GEV parameter k at site j. Full details of the general methodology are summarised by Smith and Roberts [5]. For this particular application, in which a graphical structure of the model is exploited to obtain suitable transition models q, details are given in Refs. [3,6]. Simulation studies suggest that the method produces accurate inferences from data with similar structure to the hurricane series and that estimates are considerably more accurate than those obtained using conventional techniques in situations where very few data are available. This gives us some confidence in applying the spatial model to the actual data simulated from the hurricane model. Limitations on space prevent us from giving a thorough presentation of our results on the simulated hurricane series. Fig. 1 shows the spatial variation of the GEV parameters, and of the rate of hurricane occurrence, from sites in Texas, through to

Fig. 1. Estimates of GEV model parameters obtained using spatial methods, and the hurricane rate, based on data simulated at locations from Texas, to Maine.

136

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

Maine. For each location we show the optimal estimate of each parameter, together with a 95% probability interval obtained from the fitted distributions. Substantial spatial variation in the parameters k and p is apparent. Moreover, there is strong evidence for the inclusion of a linear trend component for each of the parameters k and p. This implies that the spatial variation in extremes of these data, as previously reported by Batts et al. [1], is not just a consequence of different rates of hurricane occurrence, as summarised by the plot for j, but also due to differences in the stochastic behaviour of hurricane characteristics when such events occur. In contrast, the shape parameter, m, is comparatively stable along the entire coastline. In fact, there is little evidence for any spatial variation in m at all. Such features are often evident in environmental processes — the parameters k and p are influenced by local characteristics, whilst m is determined by global meteorology and so may be constant over very large regions. Moreover, in this case, the evidence here suggests that m is substantially less than zero at all locations, corresponding to bounded extrapolations of return levels. This is consistent with the present meteorological consensus. The results are more easily interpreted when expressed in terms of return levels. In Fig. 2 we show, for two different return periods, estimates of return level for each site together with associated 95% probability intervals. The return levels also vary smoothly. There is a noticeable downward trend in the 50 year return levels with sharp increases around Florida and South Carolina. For the 2000 year return levels the effect of an overall trend is diminished. The robustness of results to model specification was examined by considering a number of alternative model structures. In particular, long-term extrapolations can be especially sensitive to assumptions made about the shape parameter m. Thus, we also fitted models in which m was held constant over the entire coastline, or constant

Fig. 2. Estimates of 50- and 2000- year return levels (ms~1) obtained using spatial methods, and the hurricane rate, based on data simulated at locations from Texas, to Maine.

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

137

over geographical sub—regions. In all cases, there was no substantial deviation from the analysis presented in Fig. 1, therefore adding weight to the robustness of those results.

3. Models for directional behaviour We now examine the directional behaviour of the simulated hurricane series. Each simulated event comprises the maximum wind speed generated in each of 16 directional sectors. We define / to be the direction of the maximum wind speed for a particular event and consider patterns of variation in the distribution of / along the coastline. Of most interest would be a model which jointly describes the stochastic behaviour of the extreme wind speeds and their associated directions. This is the subject of current research. We study the issue in a more limited way here by looking at the distribution of the variable /, for values of / associated with only the r most extreme events at each location, for different values of r. Care is needed in specifying a model for the distribution of / because of its circular domain. We adopt the von Mises distribution, whose probability density function has the form expMi cos(/!t)N f (/;t,i)" , 0)/(2p, i*0, I(i)

(7)

where 2p I(i)" exp(i cos /) d/.

P

(8)

0 This is a standard model for directional data. The parameter t, with 0)t(2p, represents the modal direction, while i is a concentration parameter: i"0 corresponds to uniform variation across directions; iPR yields a degenerate distribution focused on the direction t. Preliminary checks confirmed that this model provides a quite reasonable description of the / distribution at each of the simulated data locations. The advantage of such a simple model is the ease of parameter interpretation: one parameter for the principal direction, one for the degree of uniformity. Our interest again is in the spatial coherence of the directional process. We therefore make similar assumptions about the spatial structure of the parameters t and i to those made earlier about the GEV parameters. Thus, for example, we model spatial variation in t as h(t(s))"g(s;o)#c,

(9)

where g is a spatial Gaussian process with correlation function o, and c is a constant. In this case, the link function h is chosen to ‘wrap’ the process g so that the domain of the function remains [0,2p). Potentially, regression effects could also have been

138

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

Fig. 3. Modal direction of the 200 most extreme wind speeds at each location: (a) Arrow length J 10i; (b) Arc radius J 10i.

included as in model (4), but were found not to improve the quality of the fitted models. We demonstrate our results based on the largest 200 and the largest 10 events at each site, respectively. We discuss first the results based on the largest 200 events. First, Fig. 3(a) shows the estimated value of t, corresponding to the modal wind direction, at each location on the coastline, and the estimated value of i, the measure of uniformity of data about t. The spatial coherency of the process is clearly evident, although there are some sites around Maine for which the pattern is different. Fig. 3(b) gives a more detailed description of the fitted model. In this figure, at each location, the arcs are centred on the estimated value of t, and extend to comprise a 95% probability interval for t. In this way, we display not just the estimates themselves, but also a measure of their uncertainty. Similarly, the lengths of the arcs are proportional to the estimated values of i, with subdivisions representing a 95% probability interval. Thus, Fig. 3(b) contains estimates and 95% probability intervals for both parameters t and i. To avoid congestion on the plot, this information is shown just for a subset of sites. Interpreting these figures, there is some suggestion that at most locations, the predominant direction for the most extreme hurricane winds is in a southerly direction, with small variance, either at one location or between sites. However, there are sites north of Florida where the modal direction is northerly, or northwesterly. For comparison, Fig. 4(a) and 4(b) gives the results of the equivalent analysis based on just the largest 10 wind speeds at each location. In this case, the sparsity of the data leads to the spatial analysis giving substantially better inferences compared with a separate site-by-site analysis. There are also two noticeable differences compared with the analysis based on the largest 200 events. First, the predominant direction at each location is south—southeasterly. Second, the estimated values of i are approximately twice as large on average, corresponding to a much smaller variation in

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

139

Fig. 4. Modal direction of the 10 most extreme wind speeds at each location: (a) Arrow length J 2i; (b) Arc radius J 2i.

directions. Consequently, at the most extreme levels, there is a tendency for wind directions to have greater homogeneity, both within the events at a single location and spatially across locations. 4. Conclusions We have demonstrated that recent advances in statistical computation enable the estimation of quite complex spatial models, both for the magnitude and direction of hurricane wind speeds. In both cases the spatial model facilitates a pooling of information which has greatest impact in the estimation of the distribution of extreme hurricane wind speeds, because of the sparsity of relevant data in that case. Though the statistical modelling is intricate, the conclusions of the analysis are easily interpreted, offering, we believe, as detailed an analysis of the simulated hurricane data as is possible. In particular, the spatial variation in patterns of hurricane behaviour are clearly identified. Our remaining task is the development of a model to jointly describe the spatial patterns of hurricane wind speed-direction behaviour. Acknowledgements We thank Emil Simiu for many helpful discussions. The work was supported by funding under the NATO Collaborative Research Grants Programme. References [1] M.E. Batts, M.R. Cordes, L.R. Russell, J.R. Shaver, E. Simiu, Hurricane Wind Speeds in the United States, NBS Building Science Series 124, Nat. Bur. Stand., Washington, DC, 1980.

140

E. Casson, S. Coles / J. Wind Eng. Ind. Aerodyn. 74—76 (1998) 131–140

[2] M.R. Leadbetter, G. Lindgren, H. Rootze´n, Extremes and Related Properties of Random Sequences and Series, Springer, New York, 1983. [3] S.G. Coles, E.A. Casson, Extreme value modelling of hurricane wind speeds, J. Struct. Eng., (1998), submitted. [4] P.J. Diggle, R.A. Moyeed, J.A. Tawn, Model-based geostatistics, Appl. Statist. 47 (Part 3) (1998) 299—350. [5] A.F.M. Smith, G.O. Roberts, Bayesian computation via the Gibbs Sampler and related Markov chain Monte Carlo methods, J. R. Statist. Soc. B (1993) 3—24. [6] E.A. Casson, S.G. Coles, Spatial regression models for extremes, Extremes (1998) submitted.