Journal of Hydrology 476 (2013) 188–199
Contents lists available at SciVerse ScienceDirect
Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol
Regional estimates of intense rainfall based on the Peak-Over-Threshold (POT) approach Alain Mailhot a,⇑, Simon Lachance-Cloutier b, Guillaume Talbot a, Anne-Catherine Favre c a
INRS-Eau, Terre et Environnement, 490 de la Couronne, Québec, Canada G1K 9A9 Centre d’Expertise Hydrique du Québec (CEHQ), Aile Taschereau, 675, boulevard René Lévesque Est, Québec, Canada G1R 5V7 c Institut national polytechnique de Grenoble (GINP)/École nationale supérieure Énergie, Eau et Environnement (ENSE3) Laboratoire d’étude des Transferts en Hydrologie et Environnement (LTHE), BP 53, 38041 Grenoble, Cedex 9, France b
a r t i c l e
i n f o
Article history: Received 13 January 2012 Received in revised form 21 August 2012 Accepted 24 October 2012 Available online 2 November 2012 This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Uwe Haberlandt, Associate Editor Keywords: Extreme rainfall Regional frequency analysis Partial duration series Intra-annual variability Spatial covariates
s u m m a r y The Peak-Over-Threshold (POT) approach is an interesting alternative to the one based on Annual Maxima (AM) series since it gives the opportunity to take into consideration extreme events that would not be considered otherwise. It has also been recognized that the regional approach improves statistical inference when compared to the local approach, assuming that the region is statistically homogeneous. A regional POT approach was developed and applied to the network stations located in southern Québec. POT series for 5-, 10-, 15-, 30-min and 1-, 2-, 6- and 12-h durations were constructed assuming a fixed exceedance rate. An analysis of local POT series showed that the intra-annual variability of the Generalized Pareto Distribution (GPD) parameters needs to be taken into consideration. Models of various complexities were defined combining local and regional representations as well as the intra-annual variability of GPD parameters. Regional likelihood was estimated and models were compared based on the Akaike Information Criterion (AIC). Models with regional shape and scale parameters and accounting for intra-annual variability were selected for all durations. Spatial covariates were also introduced through a simple model linking GPD parameters to latitude, longitude and altitude. The sensitivity of results to threshold values and selected models was also investigated. Interpolated maps of intense rainfall over the studied area are finally proposed. Ó 2012 Elsevier B.V. All rights reserved.
1. Introduction Two approaches are usually considered for the statistical estimation of the intensities of rare rainfall events. The most common is based on Annual Maxima (AM) series and consists in extracting from available records the maximum rainfall recorded each year during a specific duration (e.g. 5 min or 1 h; Coles 2001). Based on extreme value theory, it can be shown that these series can be statistically represented by the Generalized Extreme Value (GEV) distribution (Coles, 2001). The Gumbel distribution (a special case of the GEV distribution) is also used to represent AM series. The second approach is called the Peak-Over-Threshold (POT) or Partial-Duration (PD) approach (Coles 2001; Madsen et al. 1997a,b; Kingumbi and Mailhot 2010). The idea is to keep, from recorded series, rainfall values above a predefined threshold. The resulting series are commonly represented by the Generalized Pareto Distribution (GPD) (Madsen et al. 1997a). Other distributions ⇑ Corresponding author. Tel.: +1 418 654 3821; fax: +1 418 654 2600. E-mail addresses:
[email protected] (A. Mailhot),
[email protected] (S. Lachance-Cloutier),
[email protected] (G. Talbot),
[email protected] (A.-C. Favre). 0022-1694/$ - see front matter Ó 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jhydrol.2012.10.036
have also been proposed to represent exceedance magnitudes, among others the Weibull (Miquel, 1984; Ekanayake and Cruise, 1993) and lognormal distributions (Rosbjerg et al., 1991). The advantage of the POT approach is that more than one extreme value can be considered each year while only the most extreme value is kept using the AM approach. The POT approach therefore improves the sampling of extreme events. Studies by Madsen et al. (1997a,b,) showed that the POT–GPD approach is generally more efficient than the AMS–GEV model. Two reasons explain the fact that the POT approach is less often used than the AM approach. Firstly, the definition of the threshold is, to some extent, arbitrary; secondly, POT series can be autocorrelated (Madsen et al., 1997a). In Canada, observed rainfall records used to assess the intensity of extreme rainfall are generally short and spatial coverage is sparse. Evaluating the intensity of extreme rainfall events is therefore challenging and efficient approaches are needed to extract the information from available series, especially for events with a return period much longer than the sample size. In this context, the POT approach may be seen as an interesting alternative to AM series. The application of the POT approach implies that a threshold has to be defined. It is also important to account for a
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
possible intra-annual variability of the GPD parameters since meteorological systems generating extreme rainfall events vary in terms of spatial and temporal extents during the year. Taking into account a possible intra-annual variability may improve the homogeneity of the series. Regional analysis may also be an interesting option. It is based on the hypothesis that local series can be represented by a regional distribution with both regional and local parameters (Hosking and Wallis, 1997). The fact that some parameters have a regional character implies that series from multiple sites can be used to estimate these parameters and therefore reduces uncertainties in these regional parameters and extreme rainfall estimates. Buishand (1991) considered a regional joint likelihood function with an application to AM series of daily precipitation in the Netherlands. He proposed an approach with a common shape parameter of the GEV distribution for all stations within a given region. Regional analysis of the POT series can be achieved through different means. Madsen et al. (2002) used a generalized least squares regression model that explicitly accounts for intersite correlation to describe the variability of the POT parameters using physiographic and climatic characteristics. These authors showed that, for concurrent events, the correlation was rather small for short duration (10 min.) and more important for longer duration (24 h.). Beguería and Vincente-Serrano (2006) obtained at-site parameters estimates using the method of probability-weighted moments. These sets of parameters were then regressed upon a set of explanatory variables to obtain a spatially explicit probability model. Both approaches consist in the regression of the GPD parameters on explanatory variable after performing at-sites estimations. The objective of this paper is to develop a general framework for the implementation of a regional version of the POT approach based on the definition of a joint likelihood function as initially proposed by Buishand (1991). This approach is developed through an application to Quebec data. As will be showed, it is essential in this context to take into consideration the intra-annual variability of parameters. Models of increasing complexity in terms of the representation of intra-annual variability, and based on various combinations of regional and local parameters, were compared using the Akaike criterion (Akaike, 1973). Spatial covariates were introduced to describe the spatial dependency of parameters. In our study, explanatory variables are explicitly taken into consideration in the estimations of the GPD parameters through the regional joint likelihood function. The article is organized as follows. Section 2 provides a description of the available data. A general description of the POT approach is presented in Section 3 along with a specific discussion about threshold estimation and how POT series were constructed. Section 4 presents the methodology used to analyze available rainfall series while selected regional models are presented and discussed in Section 5. Section 6 addresses the issue of spatial covariates. Section 7 investigates sensitivity to threshold and model selection and provides interpolated regional maps summarizing main results. Finally, Section 8 presents a summary and conclusions.
189
Fig. 1. Location of stations used in this study. Three zones (Z1, Z2, Z3) are defined for demonstration purpose (see Fig. 3).
Fig. 2. ‘‘Bar code’’ graph displaying the period covered by the records at each station. Discontinuities in record ‘‘life span’’ lines are due to missing data over the corresponding periods.
2. Available data Maximum daily rainfall depths at stations (107 stations for 5-, 10-, 15-, 30-min durations and 109 stations for 1-, 2-, 6- and 12h durations) located in the southern part of the Province of Québec were considered (Fig. 1; please note that zones Z1, Z2 and Z3 identified in Fig. 1 are defined for demonstration purposes; see Fig. 3). These stations are operated either by the Ministère du Développement Durable, de l’Environnement et des Parcs (MDDEP) or by Environment Canada. Available records cover the May to October
Fig. 3. Average threshold values for 30-min series over stations of a specific region and for a mean number of six threshold excesses per MO period, as a function of calendar day. Dashed line refers to zone Z1, dotted line to zone Z2 and bold line to zone Z3 (see Fig. 1 for the location of the three zones).
190
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
(MO) period (stations are closed during the winter period). The number of available years of records ranges from 10 to 51 years. Stations located north of the 49°N parallel were discarded. Fig. 2 gives an overview of the available recorded period at each station (for technical reasons, year 2000 is missing for almost all stations). It is important to note that maximum daily values correspond to maximum values over a moving window of a fixed duration (e.g. 15-min) for a given day. Each value is therefore the maximum rainfall value over a fixed duration recorded during a calendar day starting at midnight and finishing 24 h later. The probability that the real maximum rainfall overlaps two days increases as duration increases (e.g. a 12-h event will more likely overlap two days than a 10-min event). The probability of underestimating the maximum values therefore increases for events of longer duration.
trends. This test was applied to the series of values exceeding the threshold as well as to the series of durations separating consecutive threshold excesses (to check if the average time between threshold excesses was constant over time). Results show that, for all durations and all months, percentages of stationary series (at a level of confidence of 95%) range from 92% to 99% of the total number of series indicating that POT series can be assumed stationary.
4. Statistical analysis of POT series Values above the threshold are represented by the GPD for which the probability density function (pdf) is given by (following the notation of Hosking and Wallis, 1997):
3. Threshold estimation and construction of POT series A threshold needs to be defined for the construction of POT series. Various methods can be found in the literature to guide our choice of threshold – the Hill plot (Hill, 1975), the mean excess plot (Davison and Smith, 1990), the bootstrap method based on mean squared error (Danielsson and de Vries, 1997; Danielsson et al., 2001). An alternative is to fix the average number of times the threshold is exceeded over a given period (Dierckx and Teugels, 2010; Kysely et al. 2010; Coelho et al. 2008; Mendez et al. 2008; Trefry et al. 2005; Smith, 1989). For some other approaches to determine the threshold, the reader is referred to Coles (2001; pp. 78–80). The mean excess plot method aims at estimating the threshold value based on a linear relationship between threshold value and the mean excesses of the threshold (see also Lang et al. 1999). This procedure, based on the visual inspection of a graph, remains somewhat subjective and is difficult to automatize. The approach consisting of fixing the average number of times the threshold is exceeded guarantees that retained rainfall intensities at the various stations are similar in terms of their ‘‘extreme’’ character. It is particularly well suited for regions with heterogeneous rainfall regimes. As mentioned by Mekis and Hogg (1999), for Canada, applying a common threshold value over large regions is not recommended. The mean number of threshold excesses was fixed to 6 per year (MO period), which corresponds to an average of one excess per month (sensitivity to threshold choice is discussed in Section 7.2). This choice is justified by the fact that the region under study presents relatively homogeneous meteorological conditions. Threshold values were estimated for each month of the MO period in order to account for possible intra-annual variability of extremes. A declustering method was applied to minimize temporal correlation in the obtained POT series. This method identifies clusters within the series (consecutive values exceeding the threshold) and only keeps the maximum value within each cluster (Sugahara et al., 2009; Khaliq et al., 2006). Monthly thresholds were estimated using the monthly mean number of threshold excesses after declustering the series. These monthly thresholds were finally interpolated to obtain daily threshold values. Fig. 3 presents the resulting average threshold over regions displayed on Fig. 1. Important intra-annual variability of the threshold was observed with the largest values occurring in July and August. The eastern region (Z3) also displayed smaller threshold values for all months. Smaller threshold values mean that less intense rainfalls were generated during the corresponding period. After defining threshold values, POT series were constructed for each month. Since the approach is based on the hypothesis of stationary series, the modified Mann-Kendall test was applied to available POT series (Hamed and Rao, 1998) to check for possible
fGPD ðxÞ ¼
8 11 > 1 ðx nÞ k > > k–0 < 1k
a
a
a
a
ð1Þ
> 1 xn > > : exp k¼0
where n (threshold) corresponds to the location parameter ðn 2 RÞ, a to the scale parameter n 2 Rþ , and k to the shape parameter ðk 2 RÞ. The function is defined for n 6 x 6 n þ ak if k > 0 and n 6 x < 1 if k 6 0 (heavy-tailed).
4.1. Intra-annual variability Fig. 3 clearly shows the importance of threshold intra-annual variability. The analysis of monthly POT series reveals that the GPD parameters also display intra-annual variability. This result is not surprising when considering the very different nature of meteorological systems generating extreme rainfall events throughout the year (for a discussion on that issue see chapter 2 of CSA, 2010). Intra-annual variability was described by representing GPD parameters as Fourier series (Katz et al. 2002):
kðnÞ ¼ k0 þ ¼ a0 þ
mk h X p p i k1j cos 2jn þ k2j sin 2jn D D j¼1 ma h X
a1j cos 2jn
j¼1
p D
p i þ a2j sin 2jn D
ð2Þ
where n denotes the calendar day, D = 365.25 days, mk and ma the number of modes for parameter k and a respectively. Fourier coefficients {k1j, k2j}, j = 1, . . . , mk and {a1j, a2j}, j = 1, . . . , ma must be estimated. The total number of parameters to be estimated is thus 2(mk + ma + 1). Table 1 Number of parameters of the various models with Fourier representation of intraannual variability. k Regional
Local
Number of modes
0
1
2
0
1
2
Regional
0 1 2
2 4 6
4 6 8
6 8 10
2 4 6
4 6 8
6 8 10
Local
0 1 2
108 322 536
110 324 538
112 326 540
108 322 536
110 324 538
112 326 540
a
191
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
Fig. 4. Nomenclature for models using a Fourier representation of the intra-annual variability of parameters. Models using monthly parameters are identified by replacing the number of modes by a ‘‘m’’.
Table 2 Names and number of parameters (in parenthesis) of models with monthly parameter values. k Regional
Local
mRmR (12) mLmR (648)
mRmL (648) mLmL (1284)
a Regional Local
] 3 [0, 1, 2 modes for k]). Table 1 presents the models based on Fourier representation of the intra-annual variability of parameters with the corresponding number of parameters to be estimated. Fig. 4 describes the proposed model nomenclature. Models using monthly parameters values were also considered (Table 2), while models with mixed Fourier representation and monthly parameter values were not retained. 4.3. Model comparison and selection Model parameters were estimated by maximizing the logarithm of the likelihood (log-likelihood) function (for a general discussion on the estimation of GPD parameters, refer to de Zea Bermudez and Kotz, 2010). The parameters estimates are thus defined as: Np S X X ^h ¼ Arg max fArg max log½f ðxi;p jhpðLÞ hðRÞ Þg hðRÞ
4.2. Proposed models Various models were defined where scale and shape parameters of the GPD: (1) were either local or regional and; (2) described by Fourier series with 0, 1 or 2 modes. A total of 36 models were considered (2 [a or k] x 2 [regional or local] 3 [0, 1, 2 modes for a
p¼1
hðLÞ
ð3Þ
i¼1
with f() the GPD pdf, h = [h(L), h(R)] the set of model parameters, ^ h the optimal parameter set, hðLÞ p the set of local parameters at station p, h(R) the set of regional parameters, S the total number of stations, Np the number of POT values at station p, xi,p POT values (i = 1, . . . , Np) at station p. Arg max f ðxÞ defines the value of x which maxix mizes f(x). Optimization was iteratively achieved by initially proposing a set of regional parameters. Using this set of parameters,
Fig. 5. AICc ranking of the 40 models for 30-min duration events.
192
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
Fig, 6. Five top models for the various durations. Note that the different scale for 6- and 12-h durations.
local parameters maximizing the log-likelihood function at each site were then estimated. The ‘‘regional’’ log-likelihood was evaluated by making the product of likelihood values at each site. Regional parameters were then updated to maximize the regional likelihood and this process was iterated until the regional log-likelihood was maximized. This optimization procedure was selected for its simplicity and because it leads to reasonable computing time. However it doesn’t guarantee that global optimal parameter sets are identified especially for more complex regional models. In many cases, it shows that, even if estimated parameters are not globally optimal, regional models are preferred, in terms of AIC score, to local ones. Eq. (3) implicitly assumes spatial independence of rainfall series at the various stations. This hypothesis seems reasonable for short duration extreme rainfalls (e.g. 10 min generally associated to localized convective systems) but is questionable for long duration rainfalls as these are generally associated to large scale synoptic systems (Madsen et al. 2002). However, spatial correlations, although increasing sampling uncertainties, are not expected to introduce additional bias (Buishand, 1991).
The Akaike Information Criterion (AIC) was used to compare and select the most appropriate model (Akaike, 1973; Burnham and Anderson, 2002). Following the recommendation of Burnham and Anderson (2004), the corrected AIC, AICc, score was used since it is more appropriate for small sample size. AIC and AICc scores converge to the same result for large sample size. The AICc score is given by:
2KðK þ 1Þ nK 1 2Kn ^ ¼ 2 log½‘ðhÞ þ nK 1
AIC c ¼ 2 log½‘ð^hÞ þ 2K þ
ð4Þ
with K the number of parameters and n the sample size. The model having the smallest AICc score was selected. Models were compared using the following metrics (Burnham and Anderson, 2004):
Di ¼ ðAIC c Þi mini¼1;...M ½ðAIC c Þi
ð5Þ
with (AICc)i the AICc score for model i and M the number of models considered. Di is a measure of the difference between model i and the model with the smallest AICc score. The Di values allow a quick
193
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
ranking of the proposed models. An analysis based on the maximum likelihood ratio was also tested when models of various complexities are embedded (Coles 2001). The results of these tests are not presented since they were identical to the results obtained using AICc score.
Table 3 Names and number of parameters (in parenthesis) of models with spatial covariates and their corresponding models without spatial covariates (two first lines). Covariates
Number of modes
0
1
2
1 2 1 2 1 2 1 2
1R0R (4) 2R0R (6) 1-0-L (8) 2-0-L (12) 1-0-G (8) 2-0-G (12) 1-0-E (8) 2-0-E (12)
1R1R (6) 2R1R (8) 1-1-L (12) 2-1-L (16) 1-1-G (12) 2-1-G (16) 1-1-E (12) 2-0-E (16)
1R2R (8) 2R2R (9) 1-2-L (16) 2-2-L (20) 1-2-G (16) 2-2-G (20) 1-2-E (16) 2-2-E (20)
Latitude + Longitude (LG)
1 2
1-0-LG (12) 2-0-LG (18)
1-1-LG (18) 2-1-LG (24)
1-2-LG (24) 2-2-LG (30)
Latitude + Longitude + Altitude (LGA)
1 2
1-0-LGE (16) 2-0-LGE (24)
1-1-LGE (24) 2-1-LGE (32)
1-2-LGE (32) 2-2-LGE (40)
5. Regional versus local models with Fourier and monthly parameters
a
Fig. 5 presents the ranking of the 40 models for 30-min duration events. We can observe that:
Latitude (L)
The ‘‘local-local’’ models (local for scale and shape parameters) do not perform very well since the ‘‘best’’ local model ranks 16th (2L0L model). The ‘‘no mode’’ models (with no intra-annual variability) are among the ‘‘worst’’ models; they rank successively 35th (0L0R), 36th (0R0R), 38th (0L0L) and 39th (0R0L). The monthly parameter models are always outperformed by one of the corresponding Fourier representation models. Models with no mode on the scale parameter performed poorly in all cases. The regional–regional models are identified as the ‘‘best’’ set of models with the four top ranks and the corresponding monthly parameter model ranked 8th. Models 2R2R and 2R1R are selected as the ‘‘best’’ models among the 40 models considered.
Altitude (A)
None
These conclusions remain valid for all other durations. Fig. 6 presents the top five models for each duration. As can be seen, regional–regional models dominate and more specifically models 2R2R or 2R1R are among the top two models in all cases. Differences between the top five models are rather small for 6and 12-h durations. 6. Regional models with spatial covariates The selection of regional–regional models means that the only ‘‘local information’’ used in estimating rainfall intensities at a given site is the threshold value. Selected models only have 8 or 10 parameters (plus the local position parameters). Scale and shape parameters include regional Fourier representation of intra-annual variability. This apparent simplicity may hide some spatial structure of the parameters. Spatial covariates were therefore introduced. For the shape parameter, the proposed models are of the following form: ð0Þ ðLÞ ðGÞ ðAÞ k0 ¼ k0 þ k0 L þ k0 G þ k0 ð0Þ ðLÞ ðGÞ ðAÞ Aklj ¼ klj þ klj L þ klj G þ klj A ð0Þ
ðLÞ
ðGÞ
ðAÞ
ð0Þ
ð6Þ ðLÞ
ðGÞ
ðAÞ
where ½k0 ; k0 ; k0 ; k0 and ½klj ; klj ; klj ; klj are the coefficients of the Fourier representation defined as a linear regression of spatial covariates (L stands for latitude, G for longitude and A for altitude; j refers to the mode and l = 1, 2 see Eq. (2). Spatial covariates for the scale parameter a are defined in a similar way. Only the regional– regional models with an intra-annual variability of the scale parameter were retained. Five models with the following spatial covariates were considered: (1) latitude (L); (2) longitude (G); (3) altitude (A); (4) latitude and longitude (LG); and (5) latitude, longitude and altitude (LGA). Spatial covariates were considered for all modes for both scale and shape parameters. Table 3 presents the list of models considered and Fig. 7 describes model nomenclature. It is interesting to note that surrogates of spatial covariates (e.g. mean annual precipitation) could also be considered to define spatial dependence (as in Alila, 1999; Brath et al. 2003; Di Baldassarre et al. 2006).
k
Longitude (G)
Fig. 7. Nomenclature for regional models with spatial covariates.
Fig. 8 presents the results for the 30-min duration. We can observe that: The model with spatial covariates outperformed the corresponding model without covariates (except when altitude was considered). Latitude had a far more important impact on model performance than longitude; the introduction of both longitude and latitude slightly increased model performance compared with models where only latitude is considered. The introduction of altitude degraded the AICc ranking except for 6- and 12-h duration (even when comparing LGA models to the corresponding LG models). Models 2-1-LG (with longitude and latitude as spatial covariates and two modes for a and one mode for k) outperformed all other models; it is followed by model 2-2-LG. Fig. 9 presents the AICc ranking for the top five models with spatial covariates. Models with covariates dominate models without spatial covariates in all cases (the ‘‘best’’ model without covariates ranks 12). Models with two modes for the scale parameters are ranked first for all durations except 12-h. No clear pattern appears for the dominant spatial covariate to consider (top five models for the various durations include model with L, LG and LGA covariates). 7. Estimation of intense rainfall depths Rainfall depths xbi for a given duration and return period were estimated at each station by solving the following equation:
i 2 1 X k h b j; b ¼ 1 F GPD b kj; a n ij x i jb T j¼j NMO j
ð7Þ
1
bj; b with F GPD ðb xi j b kj ; a n j Þ, the cumulative distribution function of the b GPD, ð a j ; b k j Þ the regional parameters values for calendar day j, c nij the threshold value at station i for calendar day j, T the return period (in MO years) and k the mean number of threshold excesses per MO
194
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
Fig. 8. AICc ranking of the 30 models with spatial covariates with the six corresponding models without spatial covariates for 30-min duration.
year (six in this analysis). The sum is over the calendar days of the MO year (j1 = 121, j2 = 304 and NMO = 184). 7.1. Sensitivity to model selection In order to compute xbi values, we needed to select a model. On principle, selected models should correspond to those with the smallest AICc scores. Fig. 9 however shows that the differences of AICc scores between best ranked models may be very small. Consequently one may ask how sensible the results are to the model selection. Akaike weights, xi, can be attributed to the various models based on their respective Di values (Burnham and Anderson, 2004) as follows:
expðDi =2Þ
xi ¼ PM
m¼1
expðDm =2Þ
ð8Þ
Akaike weights xi are representative of the probability that model i is the ‘‘best’’ model for the available data (‘‘best’’ among the proposed models and in the sense of the Kullback–Leibler information theory; for theoretical details see Burnham and Anderson, 2002 and Burnham and Anderson, 2004).
Table 4 presents the resulting xi values of the top three models for each duration and cumulative Akaike weights for models ranked higher than third (we assume models are sorted by increasing D values):
xP4 ¼
M X
xi
! ð9Þ
i¼4
Table 4 shows that, except for the 12-h duration, the best ranked model widely dominates in terms of Akaike weights. Fig. 10 presents the differences in estimated rainfall depth between the first ranked model and 2nd and 3rd models for 30-min duration as a function of longitude. The two best ranked models (2-1-LG and 22-LG) integrate latitude and longitude as spatial covariates and Fig. 10 confirms the very small differences in estimated rainfall depth between these two models. The third ranked model (2-2-L) only considers latitude and Fig. 10 effectively shows that differences between 2-1-LG and 2-2-L models vary with longitude (maximal absolute differences are of ±1.5 mm or ±5% to 7%). Although according to Akaike weights a ‘‘mixture’’ of the top three ranked models should be considered for the inference of intense rainfall depth, only the top ranked model was considered because of small differences between estimated rainfalls. Similar conclusions hold
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
195
Fig. 9. Top five models with spatial covariates for the various durations.
Table 4 Top three models with their Akaike weights xi (in parenthesis) and cumulative Akaike weights for models ranked 4th and higher. Duration
1st Model
2nd Model
3rd Model
xP4
5-min 10-min 15-min 30-min 1-h 2-h 6-h 12-h
2-1-LGA (0.76) 2-1-LGA (0.63) 2-2-LGA (0.60) 2-1-LG (0.91) 2-1-LG (0.56) 2-1-L (0.51) 2-0-LGA (0.97) 1-1-LGA (0.43)
2-1-LG (0.21) 2-1-LG (0.32) 2-1-LGA (0.39) 2-2-LG (0.07) 2-1-L (0.30) 2-2-L (0.44) 1-0-LGA (0.02) 2-0-LGA (0.25)
2-2-LG (0.02) 2-2-LG (0.03) 2-2-LG (0.01) 2-2-L (0.01) 2-2-L (0.10) 2-1-LG (0.04) 2-0-L (0.00) 1-0-LGA (0.21)
0.01 0.02 0.00 0.01 0.04 0.01 0.01 0.11
for all durations and therefore only the first ranked models are considered. 7.2. Sensitivity to mean threshold excess rate Mean threshold excess rate (MTER) was fixed equal to 6 per MO period (in average one threshold excess per month). Considering the relative ‘‘arbitrariness’’ of this value, it is important to investi-
Fig. 10. Differences in estimated rainfall depth (mm) at the 107 stations between 21-LG and 2-2-LG models (black dots) and between 2-1-LG and 2-2-L models (squares) for 30-min duration and 10-year return period, as a function of the longitude.
196
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
Fig. 11. Top five ranked models for 30-min. duration for the various MTER.
gate the sensitivity of previous results to this choice. Three other MTER values were therefore considered: 1, 1.5 and 3 threshold excesses per MO period. The previous analysis was repeated using these rates. Only results for 30-min duration are presented for conciseness. Similar results and conclusion were obtained for the other durations. Fig. 11 presents the top five models for the 30-min. duration and for the various MTER. The rate clearly has a huge impact on selected models. For example, the top model for a MTER of 6 per MO period, 2-1-LG, is ranked 3rd for a MTER of 3 and is not even among the five top models for MTER = 1.5 or 1 (in fact 2-1-LG ranks 9 for MTER = 1.5 and 12 for MTER = 1). A close examination of Fig. 11 globally suggests that a decreasing rate leads to the selection of more simple models both in terms of Fourier representation and spatial covariates. This conclusion is not counterintuitive considering that a lower MTER implies a less extensive sampling of intense rainfall (less data retained in POT series). A more restrictive sampling reduces our ability to infer complex patterns of intra-annual variability as well as a possible dependence on spatial covariates. It is important to note that, even for the smallest MTER, all high ranking models integrate intra-annual variability and spatial covariates. Relative differences between rainfall depths estimated for the various MTER values and the corresponding selected models were computed. As can be seen from Fig. 12, relative differences range between ±7% for MTER = 3 to approximately ±13% for MTER = 1. Differences also apparently increase as MTER decreases (similar results were obtained for other durations). These results suggest that the choice of MTER has an impact on the estimated rainfall depths. Differences in estimated rainfall values for the range of MTER between 3 and 6 (a plausible range according to the literature) remains relatively small. This result, combined with the fact that a MTER value of 6 threshold excesses per MO year allows a more complete investigation of intra-annual variability and possible dependence on spatial covariates, justifies our choice of MTER. 7.3. Interpolated rainfall depths The estimated rainfall depths xbi corresponding to a 10-year return period were then interpolated using the ordinary kriging
Fig. 12. Relative differences (%) at the various stations between rainfall depths (30min. duration and 10-year return period) of first ranked model for MTER 3, 1.5 and 1 and corresponding values for MTER = 6 threshold excesses per MO period. x-axis identifies MTER values with the corresponding selected model.
procedure (Zhang and Srinivasan, 2009). The selected model is the one which minimizes AICc score for MTER = 6 threshold excesses per MO year. Fig. 13 presents examples of the resulting map for 5-min, 30-min, and 6-h durations. The emerging global picture reveals an increase along a south-west to north-east axis for small durations (Fig. 13a) slowly shifting to a south to north axis as duration increases (Fig. 13b and c).
8. Summary and conclusions The POT approach may be seen as an interesting alternative to annual maxima series for the statistical analysis of intense rainfalls. By retaining rainfall values over a given threshold, the POT approach improves the sampling of extremes and therefore possibly provides better estimates of more extreme rainfall events.
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
197
Fig. 13. Ten-year return period rainfall depths (mm) for: (a) 5-min (2-1-LGA); (b) 30-min (2-1-LG model); and (c) 6-h (2-0-LGA model).
Regional analysis, where series from various stations are combined under the hypothesis of a common statistical distribution with regional and local parameters, is another interesting way to improve statistical inference of extremes. The main objective of this study was to develop a regional version of the POT approach. Development of this framework was made through an application to Quebec data but it is important to stress that the approach remains general. Intra-annual variability was taken into consideration through a Fourier representation of the GPD parameters. Models of variable complexity were constructed combining regional and local parameters with Fourier representation with 0, 1 or 2 modes of intra-annual variability of model parameters. Models were selected based on the corrected Akaike criterion. The developed approach was applied to maxi-
mum daily rainfall depths (107 stations for 5-, 10-, 15-, 30-min durations and 109 stations for 1-, 2-, 6- and 12-h durations). Threshold values were determined by specifying a mean number of six threshold excesses per May to October period. A declustering technique was applied to eliminate possible temporal correlation in POT series. It is important to note that, due to the type of rainfall data used in this study (daily maximum value), rainfall estimates for 2-, 6and 12-h are underestimated and should be used with caution (underestimation increases with duration as the probability that rainfall events overlap two calendar days increases). Other sources of data (e.g. hourly rainfall records) could be used and combined with daily maximum rainfall series data to estimate 2-, 6- and 12-h rainfalls.
198
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199
Selected models include intra-annual variability for all durations which is a consequence of the very different nature of meteorological systems generating extremes during a calendar year. Continuous Fourier representation of intra-annual variability was also preferred to the one based on discrete monthly values. Regional models (both for scale and shape parameters) were preferred to local models for all durations. The 2R2R model (regional scale and shape parameters with two modes each) is the best ranked model for 5-min to 2-h durations while the 1R2R (regional scale and shape parameters with one and two modes respectively) is preferred for the 6-h duration and the 2R1R model (regional scale and shape parameters with two and one modes respectively) is the best ranked model for the 12-h duration. It must be noted that differences in Akaike score values between the top two ranked models were often very small. The only ‘‘local information’’ of models with regional scale and shape GPD parameters is the threshold value. Estimated scale and shape parameters are somehow regionally averaged values. The integration of spatial covariates was therefore considered in order to identify the possible spatial structure of these parameters and take into consideration regional non-homogeneities. Simple models (linear relationships of Fourier coefficients with latitude, longitude and altitude) were considered. Models with spatial covariates widely outperformed those without spatial covariates. Latitude and longitude were the dominant covariates for all durations. More complex representations of the spatial covariate structure could be defined as well. As discussed by Burnham and Anderson (2002), one of the weaknesses of the Akaike criterion is that we may select the ‘‘less wrong model among wrong models’’. Proposed models were hierarchically constructed starting from the simplest model (no spatial covariates or no intra-annual variability) to more and more complex models (one or more spatial covariates or a Fourier representation of intra-annual variability with more and more modes). It is therefore expected that the selected models are the most parsimonious models which best fit to available series and that the dominant information content of data is reasonably represented by the selected models. Parameter estimation using the maximization of the regional log-likelihood function explicitly supposed spatially uncorrelated POT series at the various stations. Buishand (1991) however mentioned that spatial correlation between stations does not introduce additional bias but increases standard errors. Madsen and Rosbjerg (1997) mentioned that spatial correlation may result in an underestimation of regional parameter uncertainties and may as well induce erroneous conclusions about regional homogeneity. Although it is reasonable to assume that spatial correlation between station POT series for small durations may be small, it could increase for longer duration events (e.g. 2–12-h) and further investigation is needed to assess the potential impact of these correlations on estimated parameters. Model’s ‘‘weight of evidence’’ (according to the expression of Burnham and Anderson, 2004) can be accessed through the Akaike weights. The weights of top three ranked models are larger than 0.95 for all durations, which shows that these models clearly ‘‘outperformed’’ other models. However, the sensitivity of estimated intense rainfall to the selected model remains small between top ranked models. Threshold selection is a major issue when applying the POT approach. The selected threshold ‘‘defines’’ the intense rainfall sample. Since threshold selection is, to some extent, arbitrary (no unique and formally value can be defined), it was important to examine results sensitivity to threshold selection or similarly to the fixed mean threshold excess rate, MTER (as in this study). Three other MTER values were considered (1, 1.5 and 3 threshold excesses per MO period). A MTER value of 6 threshold excesses
per MO year seems a reasonable value which enables the investigation of possible intra-annual variability and spatial structure. Further investigation is however needed since defining a unique MTER value for a region showing very heterogeneous intra-annual rainfall patterns may be questionable (e.g. this may result for some regions to an oversampling of extreme events in specific months). Acknowledgements This work was funded in part by the Climate Change and Adaptation Program of Natural Resources Canada, by the Consortium Ouranos on Regional Climatology and Adaptation to Climate Change and by the Collaborative Research and Development (CRD) Grants of the Canadian Natural Sciences and Engineering Research Council of Canada (RDCPJ_387169 – 09, owned by A.C. Favre). The authors also thank Ms Catherine Savard of the Ministère du Développement Durable, de l’Environnement et des Parcs (MDDEP) of the Québec government who provided the observed series. Finally the authors would like to thank reviewers for their constructive comments and suggestions. References Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: B.N. Petrov and F. Csaki (Eds.), Proceedings of the Second International Symposium on Information Theory, Akademiai Kiado, Budapest, p. 267–281. Alila, Y., 1999. A hierarchical approach for the regionalization of precipitation annual maxima in Canada. J. Geophys. Res. 104 (D24), 31,645–31,655. Beguería, S., Vicente-Serrano, S.M., 2006. Mapping the hazard of extreme rainfall by peaks over threshold extreme value analysis and spatial regression techniques. J. Appl. Meteor. Climatol. 45 (1), 108–124. http://dx.doi.org/10.1175/jam2324.1. Brath, A., Castellarin, A., Montanari, A., 2003. Assessing the reliability of regional depth-duration-frequency equations for gaged and ungaged sites. Water Resour. Res. 39 (12), 1367–1379. http://dx.doi.org/10.1029/2003WR002399. Buishand, T.A., 1991. Extreme rainfall estimation by combining data from several sites. Hydrol. Sci. J. 36 (4), 345–365. Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference. A Practical Information–Theoretic Approach. Springer, New York, USA, 488p. Burnham, K.P., Anderson, D.R., 2004. Multimodel inference – understanding AIC and BIC in model selection. Sociological Methods & Research 33 (2), 261–304. http:// dx.doi.org/10.1177/0049124104268644. Coelho, C.A.S., Ferro, C.A.T., Stephenson, D.B., Steinskog, D.J., 2008. Methods for exploring spatial and temporal variability of extreme events in climate data. J. Clim. 21 (10), 2072–2092. http://dx.doi.org/10.1175/2007jcli1781.1. Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics, Spring-Verlag, London, UK, 208p. CSA, 2010. Development, Interpretation and Use of Rainfall Intensity-DurationFrequency (IDF) Information: Guideline for Canadian Water Resources Practitioners. Canadian Standard Association, Mississauga Ontario, 149p. Danielsson, J., De Vries, C.G., 1997. Tail index and quantile estimation with very high frequency data. J. Empirical Finance 4, 241–257. Danielsson, J., De Haan, L.L.P., De Vries, C.G., 2001. Using a bootstrap method to choose the sample fraction in tail index estimation. J. Multivar. Anal. 76, 226– 248. Davison, A.C., Smith, R.L., 1990. Models for exceedances over high thresholds (with discussion). J. Roy. Stat. Soc. 52, 393–442. de Zea Bermudez, P., Kotz, S., 2010. Parameter estimation of the generalized Pareto distribution – Part I. J. Stat. Plan. Infer. 140 (6), 1353–1373. http://dx.doi.org/ 10.1016/j.jspi.2008.11.019. Di Baldassarre, G., Castellarin, A., Brath, A., 2006. Relationships between statistics of rainfall extremes and mean annual precipitation: an application for designstorm estimation in northern central Italy. Hydrol. Earth Syst. Sci. 10, 589–601. Dierckx, G., Teugels, J.L., 2010. Change point analysis of extreme values. Environmetrics 21 (7–8), 661–686. http://dx.doi.org/10.1002/env.1041. Ekanayake, S.T., Cruise, J.F., 1993. Comparisons of Weibull- and exponential-based partial duration stochastic flood models. Stoch. Hydrol. Hydraul. 7 (4), 283–297. Hamed, K.H., Rao, A.R., 1998. A modified Mann–Kendall trend test for autocorrelated data. J. Hydrol. 204 (1–4), 182–196. http://dx.doi.org/10.1016/ s0022-1694(97)00125-x. Hill, B.M., 1975. A simple general approach to inference about the tail of a distribution. Ann. Stat. 3 (5), 1163–1174. Hosking, J.R.M., Wallis, J.R., 1997. Regional Frequency Analysis: An Approach Based on L-moments. Cambridge University Press, Cambridge, UK, 224p. Katz, R.W., Parlange, M.B., Naveau, P., 2002. Statistics of extremes in hydrology. Adv. Water Resources 25 (8–12), 1287–1304. Khaliq, M.N., Ouarda, T., Ondo, J.C., Gachon, P., Bobée, B., 2006. Frequency analysis of a sequence of dependent and/or non-stationary hydro-meteorological
A. Mailhot et al. / Journal of Hydrology 476 (2013) 188–199 observations: a review. J. Hydrol. 329 (3–4), 534–552. http://dx.doi.org/ 10.1016/j.jhydrol.2006.03.004. Kingumbi, A., Mailhot, A., 2010. Courbes Intensité-Durée-Fréquence (IDF): Comparaison des estimateurs des durées partielles et des maximums annuels. Hydrol. Sci. J. 55 (2), 162–176. http://dx.doi.org/10.1080/02626660903545995. Kysely, J., Picek, J., Beranova, R., 2010. Estimating extremes in climate change simulations using the peaks-over-threshold method with a non-stationary threshold. Global Planet. Change 72 (1–2), 55–68. http://dx.doi.org/10.1016/ j.gloplacha.2010.03.006. Lang, M., Ouarda, T., Bobée, B., 1999. Towards operational guidelines for overthreshold modeling. J. Hydrol. 225 (3–4), 103–117. http://dx.doi.org/10.1016/ S0022-1694(99)00167-5. Madsen, H., Rosbjerg, D., 1997. Generalized least squares and empirical Bayes estimation in regional partial duration series index-flood modeling. Water Resources Res. 33 (4), 771–781. Madsen, H., Pearson, C.P., Rosbjerg, D., 1997a. Comparison of annual maximum series and partial duration series method for modelling extreme hydrologic events: 2. Regional modelling. Water Resources Res. 33 (4), 759–769. Madsen, H., Rasmussen, P.F., Rosbjerg, D., 1997b. Comparison of annual maximum series and partial duration series method for modelling extreme hydrologic events: 1. At site modelling. Water Resources Res. 33 (4), 747–757. Madsen, H., Mikkelsen, P.S., Rosbjerg, D., Harremoës, P., 2002. Regional estimation of rainfall-intensity-duration-frequency curves using generalised least squares
199
regression of partial duration series statistics. Water Resources Res. 38 (11), 1239. http://dx.doi.org/10.1029/2001WR001125. Mekis, E., Hogg, W.D., 1999. Rehabilitation and analysis of Canadian daily precipitation time series. Atmosphere–Ocean 37 (1), 53–85. http://dx.doi.org/ 10.1080/07055900. Mendez, F.J., Menendez, M., Luceno, A., Medina, R., Graham, N.E., 2008. Seasonality and duration in extreme value distributions of significant wave height. Ocean Eng. 35 (1), 131–138. http://dx.doi.org/10.1016/j.oceaneng.2007.07.012. Miquel, J., 1984. Guide Pratique d’estimation des Probabilités de Crues. Editions Eyrolles, Paris, 160p. Rosbjerg, D., Rasmussen, P.F., Madsen, H., 1991. Modelling of exceedances in partial duration series. Paper presented at the International Hydrology and Water Resources Symposium, Inst. of Eng., Perth, UK. Smith, J.A., 1989. Regional flood frequency analysis using extreme order statistics of the annual peak record. Water Resources Res. 25 (2), 311–317. http:// dx.doi.org/10.1029/WR025i002p00311. Sugahara, S., da Rocha, R.P., Silveira, R., 2009. Non-stationary frequency analysis of extreme daily rainfall in Sao Paulo, Brazil. Int. J. Climatol. 29 (9), 1339–1349. http://dx.doi.org/10.1002/joc.1760. Trefry, C.M., Watkins, D.W., Johnson, D., 2005. Regional rainfall frequency analysis for the State of Michigan. J. Hydrol. Eng. 10 (6), 437–449. Zhang, X., Srinivasan, R., 2009. GIS-based spatial precipitation estimation: a comparison of geostatistical approaches. J. Am. Water Resources Assoc. (JAWRA) 45 (4), 894–906. http://dx.doi.org/10.1111/j.1752-1688.2009.003.