Assessing the applicability of fractional order statistics for computing confidence intervals for extreme quantiles

Assessing the applicability of fractional order statistics for computing confidence intervals for extreme quantiles

Journal of Hydrology 376 (2009) 528–541 Contents lists available at ScienceDirect Journal of Hydrology journal homepage: www.elsevier.com/locate/jhy...

1MB Sizes 1 Downloads 92 Views

Journal of Hydrology 376 (2009) 528–541

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

Assessing the applicability of fractional order statistics for computing confidence intervals for extreme quantiles Francesco Serinaldi * Dipartimento di Idraulica Trasporti e Strade, ‘‘Sapienza” Università di Roma, Via Eudossiana 18, 00184 Rome, Italy Honors Center of Italian Universities – H2CU, ‘‘Sapienza” Università di Roma, Via Eudossiana 18, 00184 Rome, Italy

a r t i c l e

i n f o

Article history: Received 3 May 2009 Received in revised form 19 June 2009 Accepted 24 July 2009 This manuscript was handled by P. Baveye, Editor-in-Chief, with the assistance of Efrat Morin, Associate Editor Keywords: Confidence intervals Quantiles Uncertainty Return period Frequency analysis Flood

s u m m a r y The main aim of frequency analysis of hydrological data is the definition of the quantiles with a given return period, also known as return levels, namely, the values xT of a hydrological variable X exceeded on average once in T time intervals. To make the analysis effective, these values have to be complemented with information about their uncertainty, which is usually quantified by confidence intervals (CIs). The paper discusses and compares some parametric, non-parametric, and simulation techniques to build CIs for quantiles with emphasis on the extreme ones. In particular, the work focuses on the usefulness of fractional order statistics to compute CIs for quantiles. This approach appears attractive as it involves simple analytical formulas and allows computing fully non-parametric CIs without requiring information about the parent distribution of the data. Since CIs for order statistics can be used to approximate CIs for quantiles based on order statistics, the relation between them is explored as well. The applicability of methods based on order statistics is assessed by studying the actual coverage probability through simulations, and applying these approaches to annual maximum peak discharge records from 22 streamgage stations in the continental United States and a streamgage station downtown Rome, Italy. For extreme quantiles, results show that these techniques are generally outperformed in term of coverage probability by other methods, such as Monte Carlo approach, when the parent distribution is well-specified or misspecification involves distributions with tail behavior similar to the one used to generate the sample. On the other hand, the application to real world data indicates that the values of confidence limits based on order statistics are close to those obtained by Monte Carlo approach. In particular, the upper limits from non-parametric setups often result in cautionary values. Ó 2009 Elsevier B.V. All rights reserved.

Introduction and motivation Frequency analysis of hydrological, environmental, and geophysical quantities is commonly applied by practitioners and researchers. One of its objectives is the definition of the return levels xT , namely, the values of a variable X corresponding to a given return period T, or analogously, the quantiles xP of X exceeded with fixed probability P ¼ 1  1=T (e.g., Chow et al., 1988). Usually, sample sizes are rather limited, and extreme quantiles cannot be defined directly from the data (by means of the empirical distribution function), but have to be computed from parametric distributions, which best fits data according to certain criteria (e.g., maximum likelihood, method of the moments, least squares method). To provide effective and meaningful results for most applications, quantile estimates have to be complemented by an assessment of their uncertainty. This task is commonly accomplished by adding CIs to point estimates. However, to correctly de* Tel.: +39 06 4458 5042; fax: +39 06 4458 5065. E-mail address: [email protected] 0022-1694/$ - see front matter Ó 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jhydrol.2009.07.065

fine these intervals, it is necessary to keep in mind the type of uncertainty we are accounting for. In statistical analysis, the sources of uncertainties are usually grouped in three main categories with slight differences from one author to the other (e.g., van Gelder, 1999): 1. Inherent or natural uncertainty, which represents randomness and complexity of the natural process. It cannot be reduced in any way. 2. Statistical uncertainty, which is related to estimation of the parameters. It can be reduced by increasing the sample size. 3. Model uncertainty, which depends on the choice of the physical or statistical model. It cannot be reduced by increasing information (e.g., the sample size), but only by increasing the knowledge of the process, and by adopting more complex models. The motivation of this work arises from a paper by Bâ et al. (2001), who compare two methods to compute CIs for quantiles. Namely, the authors consider the so-called empirical method, which ‘‘uses the method of moments to compute the variance of

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

the distribution function of the estimator of interest, assuming this distribution function is normal”, and a so-called analytical method based on the theoretical binomial distribution of the order statistics. In addition, the authors state that their work focuses on the model uncertainty. Finally, they conclude that ‘‘the use of the empirical (Gaussian-based) method for the calculation of CIs is not justified”, as the probability distribution of sample estimates of quantiles is a beta function of the distribution function of data, rather than Gaussian. In this work, we discuss the rationales of the aforementioned empirical and analytical approaches, and show that the analytical method proposed by Bâ et al. (2001) does not provide CIs for quantiles, but those of the order statistics. Furthermore, we study the applicability of an analytical method based on fractional order statistics (Hutson, 1999) for computing CIs for extreme quantiles. Simulations and real world data are used to highlight differences among the approaches. The study is developed in the context of the classical frequency analysis involving the hypothesis of independent and identically distributed data. The work is structured as follows: in ‘‘Materials and methods”, we provide an overview of the methods considered in the literature to estimate the CIs for quantiles. ‘‘Comparison of the methods” discusses rationales and differences of these methods and their applicability depending on the aim of the analysis, and in light of the different types of uncertainty that they account for. Furthermore, four methods are compared in terms of coverage probabilities (defined in ‘‘Materials and methods”) by simulating from fixed parent distributions. ‘‘Applications” illustrates an application of the methods described in ‘‘Comparison of the methods”, and discusses their performance on two real world datasets of annual maximum peak discharge. Finally, discussion and conclusions close the work. Materials and methods It is well-known that defining a CI for an unknown parameter h means to compute two numbers hl and hu , which are expected to include within their range the unknown parameter in a specified percentage of cases after repeated experiments under identical conditions (e.g., Kottegoda and Rosso, 2008, p. 236). Then, a CI can be defined as the interval such that:

Pr½hl < h < hu  ¼ ð1  aÞ for 0 < a < 1;

ð1Þ

where hl and hu are called lower and upper confidence limits, respectively, and the probability ð1  aÞ is known as coverage probability or confidence level. In hydrology and other fields, the parameter of interest is often a given quantile, e.g. the 0.99th quantile of the annual flood distribution corresponding to a value with 100-year return period (e.g., Stedinger, 1983). Hence, CIs are a viable way to quantify the accuracy of quantile estimates. In the following, we provide an overview with discussion of some methods proposed in the literature. Confidence intervals for quantile estimates based on normal approximation (NA method) For a fixed value of probability P, denote ^ xP ¼ F^1 ðPÞ the estima1 ^ tor of the Pth quantile, where F is the inverse of a fitted strictly increasing distribution function F^ of a variable X. A common method to define the CIs for quantiles is to assume that the estimators are asymptotically normal distributed with mean equal to ^ xP and variance (Bahadur, 1966; Su, 2009):

Var½^xP  ¼

1 Pð1  PÞ ; n ^f 2 ð^xP Þ

ð2Þ

529

where n is the sample size and ^f is the fitted density function cor^ Through Eq. (2), an approximate 100ð1  aÞ% CI responding to F. is given by (Bahadur, 1966; Stedinger et al., 1993; Su, 2009):

^xP  z1a=2

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Var½^xP ;

ð3Þ

where z1a=2 denotes the value of the standard normal variate with probability ð1  a=2Þ. Eq. (3) allows computing approximate CIs when the expression of the estimator variance Var½^xP  is known. Exact and asymptotic formulas can be found in the literature for normal, lognormal, Pearson and log-Pearson type III, Gumbel (GUM), log-GUM, Generalized Extreme Value (GEV), three-parameter Weibull distributions, and point and regional estimators (Stedinger, 1983; Chowdhury and Stedinger, 1991; Stedinger et al., 1993; Fortin and Bobée, 1994; Heo and Salas, 1996; De Michele and Rosso, 2001; Heo et al., 2001). This method is related to the assessment of the quantile estimator uncertainty, namely, to ‘‘parameter uncertainty caused by the sampling errors in the estimated parameters of the selected probability distribution” (Chowdhury and Stedinger, 1991). In fact, all the formulas available for Var½^ xP  are specific for each particular estimation method of the distribution parameters (e.g., method of the moments, probability weighted moments, L-moments, maximum likelihood), as they depend on ^f and F^ (Eqs. (2) and (3)). The empirical method applied by Bâ et al. (2001) actually falls in this category, since the authors use the formula of the variance of the quantile estimator corresponding to the method of the moments. Hence, in spite of the statement mentioned in the introduction of the paper by Bâ et al. (2001), the authors focus on statistical uncertainty and not on model uncertainty. This can lead to misleading conclusions since, for instance, a reduction of the width of CIs due to an increased sample size could be erroneously interpreted as a reduction of model uncertainty, while only statistical uncertainty is reduced. Simulation-based confidence intervals for quantile estimates (MC method) Normal approximation cannot be appropriate when the distribution of the quantile estimator is not symmetric. A way to overcome this drawback is to resort to simulation approaches such as Monte Carlo simulation (also known as parametric bootstrap) and non-parametric bootstrap (e.g., Efron and Tibshirani, 1993; Fortin and Bobée, 1994; Kottegoda and Rosso, 2008, pp. 517– 519). These techniques are computer intensive and the effort cannot be justified when the asymptotic results work well (e.g., Fortin and Bobée, 1994). However, these methods become highly valuable when the expression of Var½^ xP  is not available. The algorithm to construct Monte Carlo CIs for quantile estimates given by Kottegoda and Rosso (2008, pp. 517–519) is described below, since it will be useful later in the paper. Let us suppose we want to assess the performance of the Pth quantile estimator of a generic distribution function Fð; nÞ with parameters n (e.g., GUM) obtained by a chosen estimation method (e.g., the L-moments method): 1. Simulate N samples of size n as xij ¼ F 1 ðuij ; nÞ for i ¼ 1; . . . ; N and j ¼ 1; . . . ; n, where uij are pseudo-random realizations of a standard uniform random variable; 2. for each ith sample, compute the values ^ n of the distribution parameters n with the chosen estimation method; 3. compute the Pth quantile for each estimated distribution nÞ; xPi ¼ F 1 ðP; ^ 4. compute the confidence limits as the ða=2Þ and ð1  a=2Þ quantiles of the sample distribution of xPi . This approach is called percentile method (e.g., Efron and Tibshirani, 1993, pp. 168– 177).

530

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

Instead of the percentile method, other types of bootstrap CIs, such as approximated bootstrap CIs (ABC) and bootstrap-corrected and accelerated (BCa), can be computed with slight changes of the forth step of the algorithm (e.g., Efron and Tibshirani, 1993, pp. 178–199). When samples are simulated by resampling with replacement instead of using Monte Carlo simulation algorithms, the methods are called non-parametric bootstrap-type. Such approaches have been compared by Kysely´ (2008) for extreme value distributions, concluding that differences among the different types of bootstrap CIs (percentile, ABC and BCa) are usually smaller than those between the parametric and non-parametric versions of bootstrap. Furthermore, Kysely´ (2008) states that parametric bootstrap should be preferred for small to moderate sample sizes, and when a suitable distribution for the data is known or can be assumed. Confidence intervals for quantiles based on order statistics (HAD and OS methods) An alternative approach, which does not resort to simulations, is based on order statistics. Let X 1:n ; . . . ; X n:n be the order statics from a sample of size n of a variable X with distribution function F. It can be shown that (e.g, Mood et al., 1974; Hutson, 1999; Su, 2009):

p ¼ Pr½X r:n 6 xP 6 X s:n  ¼

s1   X n ½FðxP Þi ½1  FðxP Þni : i i¼r

ð4Þ

Any choice of r and s, such that p ¼ ð1  aÞ, gives a CI for xP . This interval can be obtained trying different values of r and s such that p becomes close to ð1  aÞ. However, as explained by Hutson (1999), when the sample size is small (a common situation working, for instance, with annual maxima), the value of p can be very different from ð1  aÞ. By resorting to fractional order statistics and their properties (Stigler, 1977; Hutson, 1999; Nadarajah and Gupta, 2004), an exact 100ð1  aÞ% CI for xP for any fixed confidence level may be defined by rewriting Eq. (4) as follows:

Pr½xl 6 xP 6 xu  ¼ IðP; n0 Pl ; n0 ð1  Pl ÞÞ  IðP; n0 Pu ; n0 ð1  Pu ÞÞ;

ð5Þ

Rz

where 0 < P l < P u < 1; n0 ¼ ðn þ 1Þ and Iðz; a; bÞ ¼ 0 t a1 ð1 tÞb1 dt=Bða; bÞ is the incomplete beta function ratio (that is, the beta cumulative distribution function). Then, an exact 100ð1  aÞ% CI for the Pth quantile is given by:

  F 1 ðPl Þ; F 1 ðPu Þ ;

ð6Þ

where Pl and Pu are obtained by solving the expressions:

IðP; n0 Pl ; n0 ð1  P l ÞÞ ¼ 1  a=2;

ð7Þ

IðP; n0 Pu ; n0 ð1  Pu ÞÞ ¼ a=2:

ð8Þ

For the median quantile with every value of a and sample size greater than 10, values of Pl and Pu can be closely approximated with negligible error by the following expressions:

Pl  I1 ða=2; n0 P; n0 ð1  PÞÞ; Pu  I1 ðð1  a=2Þ; n0 P; n0 ð1  PÞÞ;

ð9Þ ð10Þ

1

where I ða; n0 P; n0 ð1  PÞÞ is the ath quantile of a beta random variable with parameters n0 P and n0 ð1  PÞ (Hutson, 1999). Then, confidence limits of a 100ð1  aÞ% CI for the median are approximately:

xl  F 1 ðI1 ða=2; n0 P; n0 ð1  PÞÞÞ;

ð11Þ

xu  F 1 ðI1 ðð1  a=2Þ; n0 P; n0 ð1  PÞÞÞ;

ð12Þ

where P ¼ 0:5. Expressions in Eqs. (9) and (10) are the formulas suggested by Bâ et al. (2001) to build CIs for any quantile xP . However, these equations are conceptually different from Eqs. (7) and

(8). These formulas are designed to find fractional order statistics able to provide a CI for xP , while Eqs. (9) and (10) provide the CIs for the order statistics X nP:n . Although differences are negligible for median and central quantiles, they can be non-negligible for extreme quantiles. In the next section we will discuss these differences more in depth. So far, it was assumed that F is a known or fitted distribution, so that F 1 can be easily computed. When F is unknown, it is possible to resort to a non-parametric quantile function. A simple and powerful non-parametric quantile function allowing to overcome the shortcoming of the classical step function (limited within the probabilities 1=ðn þ 1Þ and n=ðn þ 1Þ) was described by Hutson (2002), and its expression is:

8 0 1 0 < P 6 nþ1 ; > < X 1:n þ ðX 2:n  X 1:n Þ logðn PÞ; 1 n ^xP ðPÞ ¼ ð1  ÞX bn0 Pc:n þ X bn0 Pcþ1:n ; < P < ; nþ1 nþ1 > : n 6 P < 1; X n:n  ðX n:n  X n1:n Þ logðn0 ð1  PÞÞ; nþ1 ð13Þ where the term in the second row is the standard linear interpolator,  ¼ n0 P  bn0 Pc, and bc denotes the floor function. For variables with support ½0; þ1Þ, such as discharge data, the estimator in Eq. (13) specializes in (Hutson, 2002):

8 1 0 < P 6 nþ1 ; > < X 1:n ; 1 n ^xP ðPÞ ¼ ð1  ÞX bn0 Pc:n þ X bn0 Pcþ1:n ; < P < ; nþ1 nþ1 > : n 6 P < 1; X n:n  ðX n:n  X n1:n Þ logðn0 ð1  PÞÞ; nþ1 ð14Þ A slight modification of Eq. (14), which allows accounting for discrete–continuous distributions (such as those of zero-inflated rainfall sequences), can be found in Hutson (2008). These quantile functions allow extrapolation to extreme probabilities, and perform quite well for data from mid- to light-tailed distributions. Finite and large sample properties of these quantile estimators are discussed by Hutson (2002). Eqs. (13) and (14) can be used to compute xl and xu from the Pl and Pu values obtained by Eqs. (7)–(10) when F is unknown. Hereinafter, we denote CIs for order statistics (Eqs. (9) and (10) plus Eq. (6)) as OS method, and CIs for quantiles based on fractional order statistics (Eqs. (7) and (8) plus Eq. (6)) with ^ parametric distribution as HAD method, from known (F) or fitted ðFÞ the name of Hutson, who first introduced this method. The Hutson’s method with non-parametric quantile functions (13) and (14) is labeled as HAD-F n method.

Simulation-based OS confidence intervals (OS-MC method) To support the discussion in the next section, we describe the simple algorithm to obtain the sample distribution of the order statistics and the corresponding CIs, namely, the simulation-based version of the CIs given by Eqs. (6), (9) and (10): 1. Simulate N samples of size n as xij ¼ F 1 ðuij ; nÞ for i ¼ 1; . . . ; N and j ¼ 1; . . . ; n; 2. obtain the order statistics X ðj:nÞi by arranging each sample in ascending order; 3. compute the confidence limits as the ða=2Þ and ð1  a=2Þ quantile of the sample distribution of X ðj:nÞi for each j ¼ 1; . . . ; n (or apply other bootstrap methods, such as ABC and BCa). The method can be extended to the generic fractional order statistics X nP:n applying Eqs. (13) or (14) to each simulated sample and step 3 of the above algorithm to the N-size sample of X nP:n .

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

Comparison of the methods General remarks From the previous section, we can distinguish three practical approaches to construct CIs for quantiles plus a method for order statistics, which can be used under some assumptions to approximate CIs for quantiles. However, these methods actually yield different information as they describe different types of uncertainties. NA and MC methods define the CIs for the estimates ^ xP deduced by a selected parent distribution function. They measure the statistical uncertainty of the parameter estimators (Chowdhury and Stedinger, 1991; Kottegoda and Rosso, 2008), as proved by the fact that different expressions of Var½^ xP  are given in the literature according to the different methods of estimation. On the other hand, HAD method provides an estimate of the CI for the quantile xP exploiting the theoretical distribution of the order statistics. It does not account for parameter uncertainty as the parameter estimator does not appear in the formulation and the distribution is assumed known or well-specified. The application of this method has been limited by the low quality of the coverage probability ð1  aÞ caused by the small number of order statistics available in observed samples (Mood et al., 1974). As shown by Hutson (1999), this shortcoming can be overcame resorting to fractional order statistics, which allow constructing CIs with exact coverage probabilities. Finally, the OS approach provides the CIs for the order statistics X nP:n , whose theoretical distribution can be shown to be beta. However, this method does not yield information about quantiles. To give a qualitative picture of the differences among the different methods, we have simulated a sample of size n ¼ 50 from a standard GUM distribution FðxÞ ¼ expðexpðxÞÞ, and constructed CIs by the aforementioned approaches. Results are shown as confidence bands in the qq-plot in Fig. 1. The value of a was selected equal to 0.025 to obtain 95% CIs. Bands are obtained interpolating CIs computed for quantiles/order statistics corresponding to a set of non-exceedance probabilities ranging from 0.02

531

ðP ¼ 1=ðn þ 1ÞÞ to 0.98 ðP ¼ n=ðn þ 1ÞÞ. For sake of comparison, NA intervals were computed using the asymptotic results for Var½^ xP  related to L-moments estimator (NA-LM) (e.g., Chowdhury and Stedinger, 1991), and to conventional moments estimator (NA-MM) (e.g., Chow et al., 1988). MC method (estimating parameters by L-moments method; MC-LM), OS and OS-MC methods were applied as well. As shown in Fig. 1, values are very close for central quantiles, while some differences emerge on the upper (lower) limits of the upper (lower) tails. We focus on the upper tail, which is of interest in several applications, but analogous remarks hold for the lower tail. Looking at the panel on the right, we notice that NA-MM intervals are wider than NA-LM, as L-moments estimator is more robust than conventional moments against outliers and exceptionally high values. MC-LM performs closely to NA-LM, and accounts for the slight asymmetry of the quantile distribution. OS and OS-MC are practically equal but different from MC-LM. Since MC-LM and OS-MC CIs are computed on the same set of simulated series, the difference between the two methods is only related to the different structures of the algorithms described in ‘‘Simulation-based confidence intervals for quantile estimates (MC method)” and ‘‘Simulation-based OS confidence intervals (OS-MC method)”. Recalling that MC-LM and OS-MC CIs correspond to analytical NA and OS CIs, respectively, OS and NA methods yield different results because they describe CIs for different quantities (quantiles and order statistics) and not because the probability distribution of the sample estimates of the quantiles is a beta function of the distribution function of data, as stated by Bâ et al. (2001). Comparison of HAD and OS methods As CIs for quantile estimates (NA and MC) are widely studied in the literature, we focus on the less studied HAD and OS methods. According to Fig. 1, differences between the two approaches seem negligible for all practical purposes. However, these differences dramatically increase when we try to quantify the uncertainty for extreme quantiles with non-exceedance probabilities greater

Fig. 1. Left: qq-plot of a simulated 50-size sample from a standard GUM distribution (white points). Grey points denote 1000 50-size standard Gumbel samples simulated to show the empirical spreading of the order statistics. Lines denote confidence bands obtained interpolating CIs for several quantiles computed by different methods: Hutson (HAD), order statistics (OS), simulation-based order statistics (OS-MC), normal approximation with conventional moments estimator (NA-MM), simulation-based with Lmoments estimator (MC-LM), normal approximation with L-moments estimator (NA-LM). Right: magnification of the grey area shown in the panel on the left.

532

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

than n=ðn þ 1Þ, e.g. the uncertainty for the 500-year flood ðP ¼ 0:998Þ from a 20-year sample of annual maxima ðn=ðn þ 1Þ ¼ 0:95Þ. Fig. 2 shows the upper tails of qq-plots (on Gumbel chart) for two standard Gumbel samples with size 50 and 200 and the CIs for standard Gumbel quantiles with probability P ranging from 0.92 and 0.998. Black points denote the largest simulated values from the 50-size sample, and white points from the 200-size sample. Extrapolating beyond P ¼ 0:98, the OS upper limits rapidly increase, while those from HAD method seem more constant. Thus, it seems that HAD method underestimates the uncertainty. However, we should question how credible it is that the 500-year standard Gumbel variate assumes a value around 30, which corresponds to a standard Gumbel quantile with P > 0:999999 (T > 107 years). On the other hand, HAD method gives an upper limit approximately equal to 8 (P  0:9996646; T  2981 years). Looking at the sample with size 200 (white points), it is clear that increasing the size from 50 to 200 does not involve a fast increase of the absolute values of the simulated extreme variates (e.g., the largest observations from the two samples have values quite close to each other). In fact, by drawing a large number of samples from GUM distribution, the largest simulated values are usually lower than 10 in almost all the cases (as shown, for instance, in the left panel of Fig. 1) independently of sample size and frequency assigned to the largest order statistics by a plotting position. HAD method recognizes this fact and assigns proper limited upper limits to extreme Gumbel quantiles. Conversely, in the OS method the interval assigned to the largest order statistic represents a probable range for the maximum values that can be drawn from the considered distribution. This means that the extreme quantiles probably lie within this range. Then, the extrapolation by OS method advocated by Bâ et al. (2001) is not consistent, and, at most, the OS CIs for the maxima can be used as a first approximation of the interval that contains a generic extreme quantile. In other words, the largest observations of the 50and 200-size samples represent two different quantiles (the 0.98th and 0.995th), but the same order statistic (the largest one). Hence the extrapolation toward increasing non-exceedance probabilities is meaningful for CIs associated to quantiles, but is meaningless

for OS CIs, since it should involve the existence of order statistics larger than the largest one. It follows that the extrapolation of the OS CI for the largest order statistic of the 50-size sample yields unreliable (too wide) CI for the largest observation of the 200-size sample. On the contrary, HAD CI for the largest observation of the 200-size sample is very close to HAD CI extrapolated from the 50size sample. It is worth noting that the mutual positions of HAD and OS curves shown in Fig. 2 are general and do not depend on the considered distribution function. Hutson (1999) suggests OS confidence limits (Eqs. (9) and (10)) as a good approximation of HAD limits (Eqs. (7) and (8)) for the median. We explore what happens for the other quantiles. By applying the two methods to compute values of P l and P u for the median, values very close to each other are obtained. However, for extreme quantiles small differences on Pl and Pu can cause large differences on xl and xu , which depend on F via x ¼ F 1 ðPÞ. To circumvent the impact of F, we compare the two approaches in terms of the probabilities a. Values of P l and Pu were computed by Eqs. (9) and (10) for fixed values of a and replaced in Eqs. (7) and (8) to compute the a values corresponding to HAD method. Computation was performed for several quantiles and three sample sizes usually available in frequency analyses of annual maxima. Results are shown in the upper panels of Fig. 3 for a range of ð1  a=2Þ values commonly used to compute P u . The patterns corresponding to a=2 (usually used to compute Pl ) can be obtained by symmetry, and are shown in the bottom panels for sake of completeness. The figure shows that the approximation for the median works rather well for every sample size, while differences become larger for extreme quantiles. For example, focusing on mid-upper plot, namely, on the upper limit for the 0.99th quantile of 50-size samples, values of ð1  a=2Þ from 0.90 to 0.999 for the OS method correspond to HAD probabilities ð1  a=2Þ greater than 0.99. Hence HAD upper limits are lower than OS ones. This explains why the CI upper limits associated with extreme quantiles in Fig. 2 seem upper-bounded, while the limits related to order statistics exhibit a fast increase. Furthermore, looking at the upper panels, it emerges that a possible inversion (HAD upper limits are greater than the OS ones) can occur only for quantiles with probability lower than n=ðn þ 1Þ (0.9, 0.98, 0.99 in the first, second and third upper panels, respectively). In other words, when we extrapolate beyond the empirical probabilities (return time greater than the number of observed years), the upper limits of HAD CIs are always lower than those of OS CIs, for every value of a and any fixed distribution. To confirm this result (namely, that only the curves corresponding to quantiles with probability lower than n=ðn þ 1Þ intersect the 45 line), we have considered a set of sample sizes n ¼ 10; 11; . . . ; 1000, and searched Pu values yielding equal HAD and OS probabilities ð1  a=2Þ by finding the zeros of the function:

D ¼ IðP; n0 Pu ; n0 ð1  Pu ÞÞ þ IðPu ; n0 P; n0 ð1  PÞÞ  1;

ð15Þ

obtained from Eqs. (8) and (10). A root-finder algorithm applied for several values of P was able to find a root for values of P up to ðn  0:1Þ=ðn þ 1Þ, while the function was found always positive for P P ðnÞ=ðn þ 1Þ, confirming the aforementioned remarks. Moreover, OS confidence limits approach the HAD limits when sample size increases (as already shown in Fig. 2), and differences between HAD and OS are less evident for upper limits of extreme lower quantiles. Mutata mutandis, the previous discussion can be applied to the bottom panels of Fig. 3 for the lower limits of the extreme lower quantiles. Non-parametric setup Fig. 2. Comparison of OS (grey lines) and HAD (black lines) CIs for two standard Gumbel samples with size n equal to 50 and 200. Dashed lines denote extrapolation to quantiles with probabilities greater than n=ðn þ 1Þ.

In the previous sections, CIs for the theoretical quantiles of a known or fitted distribution F have been discussed. These values

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

533

Fig. 3. Upper panels show the relationships between the OS and the corresponding HAD probabilities ð1  a=2Þ computed with the same P u values. Upper (bottom) panels refers to probabilities 1  a=2 ða=2Þ usually used to compute upper (lower) limits of confidence intervals. Each curve refers to a quantile: e.g., 0.95 is the 0.95th quantile.

can be used in statistical plots, such as qq-plot, to visually assess the agreement of the sample distribution with the theoretical model accounting for estimation uncertainty (NA and MC methods) and sampling variability (OS and HAD methods). In this type of application we focus on probabilities ranging within the interval of the empirical frequencies ð1=ðn þ 1Þ; n=ðn þ 1ÞÞ, and, as previously shown, differences among the methods could be negligible. However, when the aim is to estimate CIs for quantiles with probability of non-exceedance greater than the observed frequencies, and the parent distribution is unknown (as it normally is), the model mis-specification can strongly impact the results. To avoid this problem, a non-parametric framework could be helpful. On the other hand, one can be interested in estimating a credible range of values for a fixed extreme quantile without fitting a model of the entire distribution. By construction, NA method does not have a non-parametric counter-part as Var½^ xP  relies on the method of estimation of a parametric model. Analogously, the MC method relies on a parametric distribution and a parameter estimation method. Since non-parametric bootstrap does not allow simulation of values different from those observed, it cannot help as well. Sharif and Burn (2006) proposed an approach to perturb resampled data by adding a noise term to simulate values within the observed ones. However, ad hoc methods should be required to simulate values greater than those observed (Furrer and Katz, 2008). On the contrary, HAD method allows a non-parametric setup (HAD-F n ) exploiting the non-parametric quantile functions in Eqs. (13) and (14). Then, HAD-F n method provides a data-driven way of quantifying the uncertainty of extreme quantiles as well as a tool for comparing results obtained by parametric ap-

proaches. An example is given in Fig. 4. The upper panels show the upper tails of the 50- and 200-size standard Gumbel samples reported in Fig. 2, with 95% HAD and HAD-F n CIs. In this case, the model is well-specified. CIs drawn from the 50-size sample are quite similar, while the non-parametric setup yields intervals narrower than the parametric ones for the 200-size sample. HAD-F n method follows the pattern of the data becoming smoother for values greater than those observed, where the logarithmic tail of the non-parametric quantile function in Eq. (13) drives the CI patterns. The bottom panels show an example of mis-specification. Two samples with size 50 and 200 were simulated from a two-parameter lognormal (LN2) distribution with position and scale parameters equal to 0 and 1, respectively. Assuming (incorrectly) that samples are drawn from a GUM distribution, parametric CIs show that the Gumbel choice can be questionable. On the other hand, the non-parametric intervals show that the spreading of the upper tail values is underestimated when the GUM distribution is selected, since the data actually come from a LN2 distribution with upper tail heavier than the GUM one (e.g., El Adlouni et al., 2008). Clearly, this is a too obvious example of mis-specification, whereas more subtle cases can occur in real world data. In some of these cases, the comparison of HAD and HAD-F n CIs can provide further information about the tail behavior of the sample. Finally, an alternative approach to account for mis-specification in a parametric setup could be to use distribution functions able to mimic a wide range of families, such as the generalized lambda function (e.g., Su, 2007) or the four-parameter kappa distribution (Hosking, 1994). The reader is pointed to the study of Su (2009) for an illustration exploiting the generalized lambda distribution.

534

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

Fig. 4. Comparison of parametric HAD (black lines) and non-parametric HAD-F n (grey lines) CIs. Upper panels refer to standard Gumbel samples with size 50 (left) and 200 (right) and well-specified GUM distribution. Bottom panels refer to lognormal samples with size 50 (left) and 200 (right) and mis-specified GUM distribution. Dashed lines denote extrapolation to quantiles with probabilities greater than n=ðn þ 1Þ.

Analysis of the coverage probabilities Monte Carlo experiments were carried out to study the actual coverage probabilities of the aforementioned methods. These experiments were designed as follows: a sample with given size is drawn from a distribution with fixed parameters and CIs with nominal coverage probability equal to 95% for several quantiles are then computed using different methods. The process is repeated 1000 times and the proportion for which the true underlying quantiles fall within the evaluated confidence intervals is reported (e.g., Su, 2009). A method is accurate if this proportion (the actual coverage probability) is equal to the nominal coverage probability (95%). When the actual coverage probability is lower (greater) than the nominal one, the CIs are too narrow (too wide) relative to the real uncertainty of the estimates (e.g., Kysely´, 2008). Hutson (1999) evaluates the coverage probabilities for the median and third quartile for the standard uniform, normal, Cauchy, exponential and GUM distributions by simulating pseudo-random samples of size n ¼ 5, 11, 12, 13, 14 and 15. However, since the performance related to extreme quantiles and sample sizes common in hydrological analyses are of great interest, we performed Monte Carlo simulations using three sample sizes common in real world applications involving annual maxima, namely, 20, 50 and 100. The 95% CIs were computed for 0.95th, 0.98th, 0.99th, 0.995th,

and 0.998th quantiles corresponding to 20-, 50-, 100-, 200- and 500-year return periods in analyses of annual maxima. This allows assessing the ability of the methods to yield reliable CIs for quantiles with non-exceedance probabilities greater than the observed ones. Furthermore, we considered four configurations of the Monte Carlo experiments to account for the problem of mis-specification. In the first configuration, it is assumed that the parent distribution is correctly specified by the fitted distribution used in the parametric setup. The second configuration involves a mis-specification where parent and fitted distributions are different but have similar upper tail behavior. In the third and fourth configurations, we have considered mis-specifications where the parent distribution has upper tail lighter than and heavier than the fitted one, respectively. Gamma (GAM), GUM, LN2, and two-parameter Weibull (WE2) distributions were used, as they are commonly applied in several applications and are used in the case studies. For a discussion about the tail behavior of these distributions, the reader is pointed to El Adlouni et al. (2008). The values of the parameters were chosen so that they are consistent with values estimated from the discharge data described in the next section (see the caption of Table 1). Finally, the coverage probabilities of MC-LM CIs were computed as well for sake of comparison. Results are reported in Table 1. The GAM–GAM case refers to the first configuration. Only values related to the GAM distribution

535

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

Table 1 Results of simulations assessing the actual coverage probability (in percentage) of four CI computation methods (simulation-based with L-moments estimator (MC-LM), order statistics (OS), parametric (HAD) and non-parametric (HAD-F n ) Hutson’s method), three sample sizes (20, 50 and 100), five return periods (20, 50, 100, 200 and 500 years), and four possible configurations: (1) the parent distribution (GAMð10:0; 2:6Þ) is correctly specified by the fitted distribution (GAM); (2) mis-specification where parent (GAMð10:0; 2:6Þ) and fitted (GUM) distributions have similar upper tail behavior; (3) mis-specification where the parent distribution (GAMð10:0; 2:6Þ) has upper tail lighter than the fitted one (LN2); (4) mis-specification where the parent distribution (LN2ð1:8; 0:4Þ) has upper tail heavier than the fitted one (WE2). Results refer to 95% CIs. Configuration GAM–GAM

Sample size 20

50

100

GAM–GUM

20

50

100

GAM–LN2

20

50

100

LN2–WE2

20

50

100

T (years)

MC-LM

OS

20 50 100 200 500 20 50 100 200 500 20 50 100 200 500

93.1 92.9 92.9 92.8 92.7 93.8 93.7 93.8 93.6 93.7 93.8 92.8 93.4 93.5 93.7

99.2 99.8 100.0 99.6 98.7 99.2 99.9 99.9 99.9 99.9 99.5 99.9 100.0 100.0 100.0

96.9 93.7 90.9 85.5 77.6 99.0 99.0 98.8 97.8 93.2 99.5 99.9 99.9 99.7 99.2

85.0 75.1 66.6 61.1 53.4 91.4 82.0 73.5 66.0 57.5 96.3 90.9 83.5 76.0 66.7

20 50 100 200 500 20 50 100 200 500 20 50 100 200 500

91.7 92.1 92.1 92.6 93.3 91.8 91.9 92.1 92.0 92.0 92.5 92.7 93.4 93.9 94.1

99.7 99.8 99.9 99.9 98.6 99.4 99.9 100.0 100.0 100.0 99.9 100.0 100.0 100.0 100.0

97.3 96.4 93.6 89.6 81.5 98.6 99.5 99.5 98.6 95.9 99.8 99.9 100.0 100.0 99.8

85.4 76.9 69.0 62.7 55.2 93.4 83.0 76.2 69.5 61.5 96.4 90.1 82.5 74.9 65.8

20 50 100 200 500 20 50 100 200 500 20 50 100 200 500

97.4 98.2 98.9 99.1 98.9 98.2 98.3 96.3 94.7 90.4 99.2 95.2 88.6 79.0 62.1

99.6 99.8 99.8 99.2 94.8 99.8 99.8 99.5 99.3 97.4 100.0 99.6 99.0 98.5 96.7

99.3 98.9 98.4 97.8 95.9 99.7 99.8 99.8 99.7 99.9 100.0 99.7 99.5 99.4 99.6

85.4 76.9 69.0 62.7 55.2 93.4 83.0 76.2 69.5 61.5 96.4 90.1 82.5 74.9 65.8

20 50 100 200 500 20 50 100 200 500 20 50 100 200 500

74.8 65.6 59.6 51.2 44.0 69.7 51.2 38.7 27.9 17.7 64.9 37.4 20.9 11.3 4.0

89.2 91.7 94.5 96.5 99.4 85.6 84.2 88.7 94.7 98.3 82.0 70.2 73.0 83.5 97.0

80.4 61.7 44.9 29.5 17.0 83.1 65.6 45.5 25.8 10.6 81.0 60.9 42.4 21.6 5.2

83.0 71.5 63.8 56.8 47.9 92.3 81.1 72.9 65.0 54.5 95.9 90.9 82.8 74.0 63.3

are shown, as similar results were obtained for the other distributions. Results show the good performance of MC-LM method, which slightly underestimates the coverage probability. As expected, OS method overestimates the coverage probability for all sample sizes and quantiles, as the approach yields the widest intervals. HAD method worsens as the quantiles are more and more extreme and sample size small. When the sample size increases, the

HAD

HAD-F n

actual coverage probability of HAD CIs is greater than the nominal one. The HAD-F n method shows that the actual coverage probabilities are generally lower than nominal ones. The method behaves better as the sample size increases and return periods are small. As usual for non-parametric approaches, HAD-F n method performs less well than the parametric counterpart under correct specifications. However, since HAD-F n CIs rely exclusively on data and do

536

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

not require any distributional assumption, they are not affected by possible model mis-specifications, as shown by the last column of values for the cases GAM–GUM, GAM–LN2, and LN2–WE2. HAD-F n results are exactly equal for the cases GAM–GUM and GAM–LN2, as the same GAM pseudo-random sequences were used in both the cases. GAM–GUM results are rather close to the case GAM– GAM, showing that mis-specification between distributions with similar upper tail behavior does not significantly affect the performance. In the third configuration, MC-LM method overestimates the coverage probability for all return periods and size 20. As sample size increases, the nature of the parent distribution becomes more evident, resulting in an underestimation of the coverage probability for large return periods. On the other hand, OS and HAD methods tend to overestimate the nominal coverage probability because the CI upper limits for extreme quantiles are driven by the largest LN2 order statistics, which have values higher than the corresponding GAM order statistics. This fact results in wide CIs, which almost always enclose the extreme GAM quantiles. As previously mentioned, mis-specification does not affect the non-parametric HAD-F n method. When samples simulated from LN2 distribution are fitted with the lighter tailed WE2 distribution, MC-LM method performs quite poorly, while OS results seem to be quite good. However, this is due to the behavior described in Fig. 2, namely, the high values assumed by CI upper limits when one extrapolates beyond observed frequencies. Thus, OS CIs for extreme quantiles based on WE2 distribution have a width such that extreme LN2 quantiles are often within the CI limits in spite of mis-specification. HAD performs better than MC-LM for small return periods (lower than or about equal to the sample size) and sample sizes equal to 20 and 50, and for all return periods and size 100. However, MC-LM and HAD coverage probabilities are similar. Finally, it is worth noting that the actual coverage probability of HAD-F n method is close to the nominal one for return periods lower than the sample size. In other words, HAD-F n method performs well when CI limits are computed by the central part of the quantile functions in Eqs. (13) or (14). When the sample size is equal to the return period, HAD-F n method yields coverage probabilities within the range ð0:80; 0:85Þ, which are consistent with the results reported by Hutson (2002). Since the computation of CI upper (lower) limits of the upper (lower) quantiles resorts to logarithmic tail extrapolations in

Eqs. (13) and (14), it follows that an improvement of the results could be reached by using empirical quantile functions with different semi-parametric tails, such as the composite quantile function suggested by Hutson (2000). Applications In this section, two datasets of annual maximum peak discharges are used to show the applicability of the aforementioned methods. US Geological Survey (USGS) discharge data As a first case study, we have computed 95% CIs for extreme quantiles (T ¼ 200; 500 and 1000 years) of 22 USGS series of annual instantaneous peak discharges from streamflow gaging stations in the United States with a record of at least 100 years (Table 2). The dataset is thoroughly described by Villarini et al. (2009a), who checked the series for trends, change points, autocorrelation, and long range memory. Furthermore, Villarini et al. (2009a) selected optimal parametric distribution for each series by means of the Akaike Information Criterion (AIC; Akaike, 1974) (see second column of Table 3). To further support the selection, we have applied four goodness-of-fit tests, namely, Kolmogorov– Smirnov, Anderson–Darling (e.g., Kottegoda and Rosso, 2008, pp. 273–281), Cramer–von Mises (e.g., Laio, 2004), and the test on Pearson’s correlation coefficient on pp-plot, which is similar to Filliben’s test for qq-plots (Filliben, 1975). According to the obtained p-values (not shown), the null hypothesis (sample from the tested distribution) cannot be rejected at the 5% significance level. CIs were computed by means of HAD, HAD-F n , MC-LM, and OS methods. The last one is used only for sake of comparison, although we have shown that extrapolation results in unrealistic CIs. Results are summarized in Table 3. The HAD-F n method yields CI upper limits greater than the parametric HAD in about 62% of cases. In five cases (stations 02392000, 02477000, 05464500, 11152000 and 14048000), the HAD-F n upper limits are smaller than or equal to the point estimates of the quantiles computed from the fitted distributions, denoting that the parametric distributions have an upper tail heavier than the non-parametric quantile function in Eq. (14). The OS values for 500- and 1000-year return period

Table 2 Summary statistics of USGS stations used in the analyses. The symbols X 0:25 ; X 0:50 , and X 0:75 denote the three quartiles. USGS ID

State

River

Drainage area (km2)

Sample size

X 0:25 (m3 s1 km2)

X 0:50 (m3 s1 km2)

X 0:75 (m3 s1 km2)

Max (m3 s1 km2)

01463500 01536500 01638500 02019500 02055000 02083500 02213000 02349605 02392000 02473000 02477000 02489000 03011020 03051000 03183500 03294500 03373500 05464500 10128500 11152000 11335000 14048000

NJ PA MD VA VA NC GA GA GA MS MS MS NY WV WV KY IN IA UT CA CA OR

Delaware Susquehanna Potomac James Roanoke Tar Ocmulgee at Macon Flint at GA 26 Etowah Leaf Chickasawhay Pearl near Columbia Allegheny Tygart Valley Greenbrier Ohio East Fork White Cedar Weber Arroyo Cosumnes John Day

17,560 25,795 24,995 5369 995 5654 5801 7563 1588 4527 2378 14,814 4165 1051 3533 236,121 12,760 16,860 420 632 1388 19,631

104 117 113 115 110 102 116 104 116 103 104 104 104 101 112 140 104 106 103 102 101 102

0.12 0.10 0.08 0.14 0.15 0.05 0.10 0.06 0.15 0.11 0.12 0.06 0.13 0.23 0.21 0.05 0.06 0.03 0.10 0.20 0.08 0.01

0.15 0.13 0.12 0.20 0.23 0.07 0.14 0.10 0.19 0.16 0.17 0.07 0.15 0.29 0.27 0.06 0.08 0.04 0.13 0.34 0.20 0.02

0.20 0.17 0.16 0.26 0.33 0.10 0.20 0.14 0.28 0.21 0.29 0.10 0.20 0.35 0.36 0.07 0.11 0.06 0.15 0.60 0.43 0.03

0.53 0.38 0.54 0.94 0.92 0.35 0.52 0.51 0.65 0.76 0.73 0.32 0.50 0.79 0.75 0.13 0.36 0.12 0.28 1.27 1.90 0.06

537

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

Table 3 Point estimates (computed by parametric fitted distributions; in m3 s1 km2) and CIs (in m3 s1 km2) for quantiles with return period T ¼ 200; 500 and 1000 years for the series recorded at the stations listed in Table 2. Four methods are used: parametric (HAD) and non-parametric (HAD-F n ) Hutson’s method, order statistics (OS) and simulationbased with L-moments estimator (MC-LM). USGS ID

Distribution

T (years)

Point estimate

HAD

HAD-F n

OS

Low

Up

Low

Up

Low

MC-LM Up

Low

Up

01463500

LN2

200 500 1000

0.48 0.55 0.60

0.36 0.39 0.46

0.67 0.73 0.76

0.41 0.48 0.50

0.70 0.74 0.76

0.37 0.40 0.43

1.04 2.43 5.40

0.40 0.45 0.49

0.57 0.67 0.74

01536500

LN2

200 500 1000

0.34 0.37 0.40

0.27 0.29 0.30

0.44 0.48 0.50

0.25 0.20 0.31

0.75 0.85 0.90

0.27 0.29 0.31

0.60 1.16 2.46

0.29 0.32 0.34

0.39 0.44 0.48

01638500

LN2

200 500 1000

0.47 0.55 0.62

0.34 0.37 0.39

0.71 0.79 0.84

0.40 0.48 0.51

0.75 0.81 0.84

0.34 0.38 0.42

1.15 3.12 9.28

0.38 0.43 0.48

0.59 0.71 0.81

02019500

LN2

200 500 1000

0.69 0.79 0.88

0.51 0.55 0.58

0.99 1.10 1.15

0.56 0.60 0.75

2.01 2.30 2.44

0.52 0.57 0.62

1.50 3.64 9.64

0.57 0.64 0.71

0.82 0.97 1.09

02055000

GAM

200 500 1000

0.85 0.96 1.04

0.66 0.70 0.73

1.14 1.23 1.27

0.69 0.74 0.82

1.52 1.68 1.76

0.66 0.73 0.78

1.57 2.66 5.94

0.74 0.82 0.89

0.99 1.12 1.22

02083500

LN2

200 500 1000

0.25 0.29 0.33

0.19 0.20 0.21

0.37 0.41 0.43

0.22 0.28 0.31

0.63 0.70 0.73

0.19 0.21 0.23

0.61 1.62 3.86

0.20 0.23 0.25

0.32 0.38 0.42

02213000

GAM

200 500 1000

0.48 0.54 0.58

0.38 0.40 0.42

0.63 0.68 0.71

0.36 0.41 0.46

0.86 0.96 1.00

0.38 0.41 0.44

0.84 1.40 2.19

0.42 0.47 0.50

0.55 0.62 0.67

02349605

LN2

200 500 1000

0.49 0.59 0.68

0.33 0.36 0.38

0.79 0.90 0.96

0.35 0.38 0.44

0.96 1.07 1.13

0.33 0.38 0.42

1.50 5.18 16.50

0.37 0.44 0.49

0.65 0.82 0.95

02392000

LN2

200 500 1000

0.78 0.91 1.01

0.56 0.61 0.65

1.15 1.29 1.36

0.63 0.64 0.65

0.69 0.70 0.71

0.57 0.64 0.69

1.82 4.70 13.66

0.63 0.72 0.79

0.96 1.15 1.30

02473000

LN2

200 500 1000

0.69 0.82 0.93

0.48 0.53 0.56

1.06 1.19 1.26

0.58 0.70 0.72

0.97 1.02 1.05

0.49 0.55 0.61

1.90 5.74 16.59

0.55 0.64 0.71

0.88 1.07 1.23

02477000

LN2

200 500 1000

0.99 1.21 1.39

0.66 0.73 0.77

1.63 1.86 1.99

0.65 0.69 0.71

0.91 0.95 0.97

0.67 0.76 0.85

3.16 11.31 37.23

0.75 0.89 1.00

1.32 1.66 1.95

02489000

LN2

200 500 1000

0.23 0.26 0.29

0.17 0.19 0.20

0.32 0.35 0.37

0.19 0.24 0.27

0.58 0.65 0.68

0.18 0.19 0.21

0.51 1.21 2.79

0.19 0.22 0.24

0.28 0.32 0.36

03011020

GUM

200 500 1000

0.39 0.44 0.47

0.31 0.33 0.34

0.51 0.55 0.57

0.31 0.35 0.42

1.00 1.12 1.19

0.32 0.34 0.36

0.71 1.23 1.90

0.35 0.39 0.42

0.44 0.50 0.54

03051000

GUM

200 500 1000

0.67 0.74 0.79

0.53 0.56 0.58

0.86 0.92 0.95

0.57 0.60 0.69

1.49 1.66 1.74

0.54 0.58 0.61

1.20 2.07 3.08

0.59 0.65 0.70

0.75 0.83 0.89

03183500

GUM

200 500 1000

0.71 0.79 0.85

0.56 0.60 0.62

0.92 0.99 1.03

0.64 0.73 0.74

0.84 0.86 0.87

0.57 0.61 0.65

1.25 2.14 3.43

0.63 0.70 0.76

0.79 0.88 0.95

03294500

LN2

200 500 1000

0.11 0.12 0.13

0.09 0.10 0.11

0.13 0.14 0.15

0.10 0.11 0.12

0.22 0.25 0.26

0.09 0.10 0.11

0.15 0.22 0.35

0.10 0.11 0.12

0.12 0.13 0.14

03373500

GUM

200 500 1000

0.25 0.29 0.31

0.20 0.21 0.22

0.34 0.36 0.38

0.20 0.24 0.29

0.77 0.87 0.92

0.20 0.22 0.23

0.48 0.85 1.31

0.22 0.25 0.27

0.29 0.33 0.36

05464500

GAM

200 500 1000

0.16 0.18 0.19

0.12 0.13 0.14

0.21 0.22 0.23

0.11 0.12 0.13

0.13 0.14 0.15

0.12 0.13 0.14

0.29 0.49 1.03

0.14 0.15 0.16

0.18 0.21 0.23

10128500

GAM

200 500 1000

0.27 0.29 0.31

0.23 0.24 0.25

0.33 0.34 0.35

0.24 0.25 0.26

0.41 0.45 0.46

0.23 0.24 0.25

0.42 0.63 0.87

0.24 0.26 0.27

0.30 0.33 0.35

11152000

WE2

200 500 1000

1.62 1.83 1.98

1.21 1.31 1.37

2.16 2.32 2.39

1.24 1.25 1.26

1.35 1.37 1.38

1.23 1.36 1.46

3.02 4.93 6.95

1.30 1.45 1.56

1.97 2.25 2.46

(continued on next page)

538

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

Table 3 (continued) USGS ID

Distribution

T (years)

Point estimate

HAD

HAD-F n

OS

MC-LM

Low

Up

Low

Up

Low

Up

Low

Up

11335000

GAM

200 500 1000

1.52 1.78 1.98

1.04 1.15 1.22

2.22 2.44 2.54

1.07 1.51 1.69

3.30 3.63 3.80

1.06 1.21 1.33

3.46 6.61 622.58

1.22 1.41 1.56

1.93 2.28 2.54

14048000

LN2

200 500 1000

0.07 0.08 0.09

0.04 0.05 0.06

0.10 0.11 0.12

0.05 0.06 0.07

0.07 0.08 0.09

0.05 0.06 0.07

0.18 0.49 1.25

0.05 0.06 0.07

0.09 0.10 0.12

always exceed those from the other methods reaching unrealistic values in four cases (stations 02349605, 02392000, 02473000, 02477000 and 11335000). These results further confirm that OS CIs are not designed to provide information about extreme quan-

tiles by extrapolation. MC-LM and HAD methods yield CI limits close to each other, especially for the 1000-year quantile. This is an interesting result in light of the simplicity of HAD method compared to the computer intensive MC approach.

Fig. 5. Box-plots summarizing lower and upper limits (panels on the left and on the right, respectively) of 95% CIs computed by HAD, HAD-F n , MC-LM, and OS methods for quantiles with return period T ¼ 20; 50; 100; 200; 500 and 1000 years. The horizontal lines inside the boxes represent the median, the limits of the boxes the 0.25th and 0.75th quantiles, the limits of the whiskers the 0.05th and 0.95th quantiles, the points the minimum and maximum values.

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

To better understand the behavior of the different methods, 95% CI limits for six return periods (20, 50, 100, 200, 500 and 1000 years) are summarized in Fig. 5 by box-plots. We have chosen three return periods lower than the sample size and three larger than size to highlight the effects of extrapolation on CI limits. The panels on the left clearly show that all the methods yield similar lower limits. The difference among the methods arises in the upper limits as the return period increases. Box-plots of MC-LM and HAD are very similar. On the other hand, the HAD-F n method shows a positive bias and a variability lower than MC-LM and HAD methods, resulting in higher and more conservative upper limits. Finally, OS upper limits rapidly increase up to unrealistic values for extreme upper quantiles with probabilities greater than n=ðn þ 1Þ. Tiber River discharge data As a second case study, we have re-analyzed the two samples of Tiber River peak discharge illustrated in Figs. 6 and 7 of the paper by Natale and Savi (2007). The first dataset refers to annual peak discharges recorded at Ripetta gaging station (downtown Rome) since 1965, when Corbara reservoir (the most important upstream reservoir in the Tiber River Basin) started to operate. These measurements will be denoted as post-Corbara (PC) data. The second dataset refers to a sample obtained by merging data available from 1870 to 1965 and annual peak discharges naturalized by removing the effects of Corbara reservoir (Calenda et al., 1997) from 1967 to 1989 (1966, 1970 and 1976 are missing). This dataset will be denoted as naturalized (NZ) data. For PC data, Natale and Savi (2007) found that GUM distribution (with parameters estimated by probability weighted moments method) provides a good fit based on Pearson and Kolmogorov– Smirnov tests. Given the limited sample size (34), we have applied the two-parameter distributions used in ‘‘US Geological Survey (USGS) discharge data”, and the GEV distribution with parameters estimated by L-moments method. The four goodness-of-fit tests listed in ‘‘US Geological Survey (USGS) discharge data” did not reject any family at the 5% of significance (only LN2 was rejected by

539

Anderson–Darling and Cramer–von Mises tests). However, a closer look at the p-values of the test statistics (not shown) allows ordering the distributions from the ‘‘best” to the ‘‘worst” as follows: GEV and WE2, GAM, GUM, LN2. In particular, GUM exhibits p-values close to the commonly used critical value 0.05. In spite of the good performance of GEV, the point estimate of its shape parameter and the corresponding 95% MC CI are 0.30 and ½0:02; 0:66, denoting that the distribution is upper bounded. As shown by Natale and Savi (2007) through Monte Carlo simulations, peaks larger than the computed GEV upper bound (about 2500 m3 s1) can occur even though Corbara reservoir exerts flood routing control and peak discharge reduction. Since the GEV model leads to strong underestimation of the extreme quantiles and is not conservative for planning and design purposes, it was discarded. CIs for quantiles can help to better understand differences among the other distributions. The left panel of Fig. 6 shows that the empirical distribution as well as GAM, LN2, and WE2 curves lie within the limits of MC-LM and HAD CIs corresponding to the GUM distribution. The performance of WE2 and GAM in the goodness-of-fit testing is due to the good fitting on the upper tail. Even though GUM distribution seems to perform worse than WE2 and GAM, CIs highlight that the apparent lack of fit can be ascribed to sampling variability. In addition, since confidence bands associated with GAM, LN2, and WE2 (figure not shown) partly overlap to each other, there is not strong evidence to prefer a distribution over another (e.g., Su, 2009). The panel on the right shows that similar conclusions can be drawn for NZ data as well. Also in this case, GEV distribution (curve not shown) exhibits positive shape parameter (0.18 ½0:06; 0:33) and an upper bound equal to about 4066 m3 s1. Natale and Savi (2007) estimated the parameters of the GEV distribution on leftcensored NZ data (discarding the low floods corresponding to dry years) obtaining a shape parameter value very close to zero (0.0125), meaning that the GEV distribution tends to converge to GUM distribution. Furthermore, the selection of GUM distribution agrees with the findings by Calenda et al. (2005) on an extended version of PC and NZ data. For these data, HAD-F n CIs are not shown, as the small variability of the three largest observations

Fig. 6. Left: qq-plot of post-Corbara (PC) Tiber River discharge records on Gumbel chart. Several fitted distributions and 95% HAD and MC-LM CIs for GUM distribution are shown. Right: qq-plot of naturalized (NZ) Tiber River discharge data on Gumbel chart. PC data and the corresponding fitted GUM distribution (light grey points and lines) are shown for sake of comparison.

540

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541

results in CI upper limits for extreme quantiles close to the observed maximum peak discharge. Thus, in this case, HAD-F n CIs do not yield practical information. Finally, the plot in the right panel of Fig. 6 shows that the lines denoting the GUM distributions estimated on PC and NZ samples are rather parallel. Namely, the estimates of GUM scale parameter are close to each other and practically overlapping if the sampling uncertainty is accounted for: 366 [258, 494] and 418 ½351; 484 for PC and NZ data, respectively. On the other hand, the difference of the GUM location parameters is about 422 m3 s1. Hence the smoothing effect of Corbara reservoir on peak discharges, as modeled by Calenda et al. (1997), is rather constant and independent on the absolute magnitude of the flood. This further confirms that PC peak discharges cannot be assumed upper bounded based on the smoothing effect of the reservoir, provided the accuracy of the naturalization procedure. Discussion and conclusions In this work, several methods to compute confidence intervals for quantiles were analyzed and discussed in light of their rationales. In particular, we focused on methods based on fractional order statistics as they provide simple analytical expressions, which are potentially attractive for practical applications in the traditional hydrological frequency analysis. The paper shows that the method proposed by Bâ et al. (2001) actually yields CIs for order statistics and not for quantiles, which, instead, are correctly given by Hutson (1999). Furthermore, OS CIs yield a good approximation of HAD CIs for central quantiles or, more in general, for quantiles with probability within the range of the observed frequencies (1=ðn þ 1Þ; n=ðn þ 1Þ). Monte Carlo simulations show that the widely applied parametric bootstrap approach generally performs better than the other methods in terms of coverage probability. Nevertheless, HAD method appears to be conservative (the actual coverage probability is greater than the nominal one) in several cases, resulting in wider CIs. The HAD-F n method does not perform very well when the parent distributions are well-specified, but it is insensitive to strong mis-specification of the parent distribution, which instead affects the performance of the other methods. An advantage of the approach based on order statistics with respect to the well-known normal approximation is the possibility to build non-parametric CIs, which do not require any hypothesis on the parent distribution of the observed sample. Since HAD-F n CIs for extreme quantiles depend on observed extreme order statistics, and on the tails of a non-parametric quantile function, their performance in terms of coverage probability becomes poor when quantiles with probability outside of the range ð1=ðn þ 1Þ; n=ðn þ 1ÞÞ are considered. However, the variability of the sampled extreme order statistics cannot be controlled, while alternative non-parametric or semi-parametric quantile functions with different tail behavior can be introduced to improve the results (e.g., Hutson, 2000). Furthermore, the approaches based on order statistics are designed for independent and identically distributed samples, but at the present stage cannot be applied to non-stationary time series showing memory (e.g., Koutsoyiannis, 2003) and underlying distributions with time-varying parameters (e.g., El Adlouni et al., 2007; El Adlouni and Ouarda, 2009; Villarini et al., 2009a,b). The application to USGS discharge data shows that MC-LM and HAD methods yield results rather close to each other. In many cases, the HAD-F n method gives conservative upper limits, while OS upper limits reach unreliable large values. In light of these results, given the simplicity of computing HAD and HAD-F n CIs, it could be worth applying these methods as well as NA and/or MC methods to have further evidence about the actual uncertainty associated to extreme quantiles.

The analysis of Tiber River peak discharges allows some general remarks about the communication of the results of a flood frequency analysis. The study shows that the GUM distribution can be a suitable model for these data, but other mid-, light-tailed distributions cannot be excluded. Even though the light-tail behavior is uncommon to other Italian rivers and Tiber tributaries, it appears to be a feature of Tiber mainstream, which is characterized by a large number of small annual maxima (Calenda et al., 2005). CIs highlight that model uncertainty is smaller than sampling uncertainty. Then, providing a reliable estimation of the sampling uncertainty can be more important than refining the choice among distributions yielding similar point estimates, or reporting differences among several estimation methods for the same distribution when these differences can be negligible compared to the sampling uncertainty. Recalling the good fitting of the upper bounded GEV distribution on PC Tiber data, it is clear that the choice of a suitable parametric distribution and the estimation of reliable CIs cannot rely only on statistical analyses without accounting for the physical nature of the data. Thus, an effort should be made to complement hydrological frequency analyses with physical meta-information (e.g., Villarini et al., 2009a). Acknowledgments The author would like to thank Dr. G. Villarini for his thorough review, Prof. F. Savi for insightful discussions and for having kindly shared Tiber River data, and the anonymous reviewers for their useful suggestions. The work was partly carried out under the project ‘‘Preliminary study of hydraulic risk within the basins of Rio Torbido, Torrente Rigo, Torrente Vezza, and related sub-basins located in Viterbo administrative province” supported by Provincia di Viterbo and Tiber River Basin Authority in agreement with University of Tuscia. References Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19 (6), 716–723. Bâ, K.M., Díaz-Delgado, C., Cârsteanu, A., 2001. Confidence intervals of quantiles in hydrology computed by an analytical method. Natural Hazards 24, 1–12. Bahadur, R.R., 1966. A note on quantiles in large samples. The Annals of Mathematical Statistics 37 (3), 577–580. Calenda, G., Di Malta, L., Mancini, C.P., Ubertini, L., 1997. Distribuzione di probabilità dei colmi di piena del Tevere a Roma. L’Acqua 5, 13–22. Calenda, G., Mancini, C.P., Volpi, E., 2005. Distribution of the extreme peak floods of the Tiber River from the XV century. Advances in Water Resources 28, 615–625. Chowdhury, J.U., Stedinger, J.R., 1991. Confidence interval for design floods with estimated skew coefficient. Journal of Hydraulic Engineering 117 (7), 811–831. Chow, V.T., Maidment, D.R., Mays, L.W., 1988. Applied Hydrology. McGraw-Hill, New York. De Michele, C., Rosso, R., 2001. Uncertainty assessment of regionalized flood frequency estimates. Journal of Hydrologic Engineering 6 (6), 453–459. Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall/ CRC. El Adlouni, S., Ouarda, T.B.M.J., 2009. Joint Bayesian model selection and parameter estimation of the generalized extreme value model with covariates using birth– death Markov chain Monte Carlo. Water Resources Research 45, W06403. El Adlouni, S., Ouarda, T.B.M.J., Zhang, X., Roy, R., Bobée, B., 2007. Generalized maximum likelihood estimators of the non-stationary GEV model parameters. Water Resources Research 43, W03410. El Adlouni, S., Bobée, B., Ouarda, T.B.M.J., 2008. On the tails of extreme event distributions in hydrology. Journal of Hydrology 355 (1–4), 16–33. Filliben, J.J., 1975. The probability plot correlation coefficient test for normality. Technometrics 17, 111–117. Fortin, V., Bobée, B., 1994. Nonparametric bootstrap confidence intervals for the Log-Pearson type III. Transactions on Ecology and the Environment 6, 351–358. Furrer, E.M., Katz, R.W., 2008. Improving the simulation of extreme precipitation events by stochastic weather generators. Water Resources Research 44, W12439. Heo, J.-H., Salas, J.D., 1996. Estimation of quantiles and confidence intervals for the Log-Gumbel distribution. Stochastic Environmental Research and Risk Assessment 10, 187–207. Heo, J.-H., Salas, J.D., Kim, K.-D., 2001. Estimation of confidence intervals of quantiles for the Weibull distribution. Stochastic Environmental Research and Risk Assessment 15, 284–309.

F. Serinaldi / Journal of Hydrology 376 (2009) 528–541 Hosking, J.R.M., 1994. The four-parameter kappa distribution. IBM Journal of Research and Development 38, 251–258. Hutson, A.D., 1999. Calculating nonparametric confidence intervals for quantiles using fractional order statistics. Journal of Applied Statistics 26, 343–353. Hutson, A.D., 2000. A composite quantile function estimator with applications in bootstrapping. Journal of Applied Statistics 27, 567–577. Hutson, A.D., 2002. A semi-parametric quantile function estimator for use in bootstrap estimation procedures. Statistics and Computing 12, 331–338. Hutson, A.D., 2008. A discrete–continuous mixture quantile function estimator with a practical application to phase II cancer clinical trials. Statistics in Medicine 27, 2094–2109. Kottegoda, N.T., Rosso, R., 2008. Applied Statistics for Civil and Environmental Engineers, second ed. Wiley-Blackwell. Koutsoyiannis, D., 2003. Climate change, the Hurst phenomenon, and hydrological statistics. Hydrological Sciences Journal 48, 3–24. Kysely´, J., 2008. A cautionary note on the use of nonparametric bootstrap for estimating uncertainties in extreme-value models. Journal of Applied Meteorology and Climatology 47, 3236–3251. Laio, F., 2004. Cramer–von Mises and Anderson–Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resources Research 40, W09308. Mood, A.M., Graybill, F.A., Boes, D.C., 1974. Introduction to the Theory of Statistics, third ed. McGraw-Hill, New York. Nadarajah, S., Gupta, A.K., 2004. Characterizations of the Beta distribution. Communications in Statistics Theory and Methods 33, 2941–2957.

541

Natale, L., Savi, F., 2007. Monte Carlo analysis of probability of inundation of Rome. Environmental Modelling and Software 22, 1409–1416. Sharif, M., Burn, D.H., 2006. Simulating climate change scenarios using an improved K-nearest neighbor model. Journal of Hydrology 325 (1–4), 179–196. Stedinger, J.R., 1983. Confidence intervals for design events. Journal of Hydraulic Engineering 109 (1), 13–27. Stedinger, J.R., Vogel, R.M., Foufoula-Georgiou, E., 1993. In: Maidment, D. (Ed.), Handbook of Applied Hydrology. McGraw-Hill, New York, pp. 18.1–18.6 (Chapter: Frequency analysis of extreme events). Stigler, S.M., 1977. Fractional order statistics, with applications. Journal of the American Statistical Association 72 (359), 544–550. Su, S., 2007. Numerical maximum log likelihood estimation for generalized lambda distributions. Computational Statistics and Data Analysis 51 (8), 3983–3998. Su, S., 2009. Confidence intervals for quantiles using generalized lambda distributions. Computational Statistics and Data Analysis 53, 3324–3333. van Gelder, P.H.A.J.M., 1999. Statistical Methods for the Risk-based Design of Civil Structures. PrintPartners Ipskamp, Enschede, The Netherlands. Villarini, G., Serinaldi, F., Smith, J.A., Krajewski, W.F., 2009a. On the stationarity of annual flood peaks in the continental United States during the 20th century. Water Resources Research 45, W08417, doi: 10.1029/2008WR007645. Villarini, G., Smith, J.A., Serinaldi, F., Bales, J., Bates, P.D., Krajewski, W.F., 2009b. Flood frequency analysis for nonstationary annual peak records in an urban drainage basin. Advances in Water Resources 32 (8), 1255–1266, doi: 10.1016/ j.advwatres.2009.05.003.