Advances in Water Resources 83 (2015) 1–9
Flood frequency analysis: Confidence interval estimation by test inversion bootstrapping

Thomas Schendel a, Rossukon Thongwichian b,∗

a Oranienburger Straße 33, 16775 Gransee, Germany
b Department of Biotechnology, Faculty of Science and Technology, Thammasat University, Pathumthani 12121, Thailand
Article history: Received 20 December 2014; Revised 3 May 2015; Accepted 10 May 2015; Available online 21 May 2015

Keywords: Flood frequency analysis; Confidence interval; Test inversion bootstrapping
Abstract

A common approach to estimating extreme flood events is the annual block maxima approach, where for each year the peak streamflow is determined and a distribution (usually the generalized extreme value distribution (GEV)) is fitted to this series of maxima. This distribution is then used to estimate the return level for a defined return period. However, due to the finite sample size, the estimated return levels are associated with a range of uncertainty, usually expressed via confidence intervals. Previous publications have shown that existing bootstrapping methods for estimating the confidence intervals of the GEV yield too narrow estimates of these uncertainty ranges. Therefore, we present in this article a novel approach based on the less known test inversion bootstrapping, which we adapted especially for complex quantities like the return level. The reliability of this approach is studied and its performance is compared to other bootstrapping methods as well as the Profile Likelihood technique. It is shown that the new approach significantly improves the coverage of confidence intervals compared to other bootstrapping methods and for small sample sizes should even be favoured over the Profile Likelihood. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction

The analysis of extreme events in hydrology is without doubt of crucial importance for engineering practice regarding water resources design and management. For example, hydraulic constructions such as dams are usually required to hold back extreme floods with a defined return period. One common approach to estimating such flood events with a certain return period (the return level) is the annual block maxima approach, where for every observed year the maximal streamflow is determined and a distribution is then fitted to this series of annual maxima. From the obtained distribution, the streamflow of extreme events can be estimated. The generalized extreme value distribution (GEV) is generally used for fitting the annual series of maxima and has been employed in many applications in meteorology and hydrology. Examples include areas such as precipitation [17,20,25,31,43], floods [5,22,32,33,36,41,46], air temperature [26,29], wind speed extremes [49], and extreme sea levels [25,48,50]. The particular usefulness of the GEV is partly due to the Fisher–Tippett theorem [16,19], stating that the GEV is the limiting distribution of maxima of a series of independent and identically distributed observations.
∗ Corresponding author. Tel.: +66 917809579. E-mail address: [email protected] (R. Thongwichian).
http://dx.doi.org/10.1016/j.advwatres.2015.05.004
0309-1708/© 2015 Elsevier Ltd. All rights reserved.
However, it is important to bear in mind that return levels (for a defined return period) estimated from the observed time series will not be identical to the true return level of the (unknown) underlying true distribution; after all, this is the fate of every finite sample. Therefore the return level estimated from the observed time series may over- or underestimate the true return level. One common way to represent this uncertainty is the estimation of confidence intervals (CIs). In order to estimate these CIs, the bootstrap approach is widely employed (for example in [2,10,27,28,38,39,41,43,47]; for the theoretical background of the bootstrap approach, see [8,9,11–14,35,42]; a good overview of these methods is given for example in [4,7]). The characteristic of the bootstrapping method is that from the given sample, new samples are generated using either the non-parametric approach (via resampling with replacement) or the parametric approach (via fitting a parametric distribution to the observed sample and then randomly drawing new samples from this distribution). In the case of estimating the CI for large quantiles of extreme events, an extensive study by Kyselý [30] has convincingly shown that for the heavy-tailed GEV, the parametric bootstrapping methods are in general superior to the non-parametric ones, yielding more realistic ranges for double sided CIs. However, this study also revealed that even the best parametric bootstrap method resulted in CIs of return levels that were systematically too narrow. For example, in the case of the 95% CI of the 100-year return level estimated from a sample
size of N = 100, all parametric bootstrap approaches studied in [30] led to CIs where 7.7% or more of the samples drawn from a parent distribution (the GEV in this case) did not include the true 100-year return level. For even smaller sample sizes, the coverage error became even worse. In terms of a realistic risk assessment, these findings are not satisfying. Therefore, an alternative to the bootstrap for estimating the CIs was proposed: the Profile Likelihood [6,15,51], which is based on the likelihood ratio test. While applications to hydrology are not widespread, a recent publication [37] suggests that the Profile Likelihood is superior to the bootstrap. However, one parametric bootstrapping approach not considered by Kyselý [30] is the test inversion bootstrap (TIB), which exploits the duality between hypothesis testing and confidence intervals (for an introduction to the TIB, see [3,4]). This method is less known, and applications (to our best knowledge absent in the field of hydrology) have focused only on CIs of single parameters of the distribution itself [3,4], but not on quantities like the return level, which is an explicit function of all three parameters of the GEV. Therefore, in this article we aim to present a novel method particularly designed for estimating the confidence intervals of return levels of the GEV. This approach is based on test inversion bootstrapping combined with the use of the likelihood function. To our best knowledge, the application of test inversion bootstrapping to a quantity (in this case the return level) that is an explicit function of the three parameters of the distribution is new. The accuracy of this method is studied on both single sided and double sided CIs, and compared to previously published results of other bootstrapping methods as well as the performance of the Profile Likelihood.

2. Methods

In order to demonstrate the novel approach of estimating confidence intervals (CIs), the annual peak flow of the Potomac River at Point of Rocks, MD, USA (source: US Geological Survey http://water.usgs.gov/nwis/peak) from 1895–2013 (water year from October to September) was studied. The respective data are presented in Fig. 1. In the following, we use the generalized extreme value distribution (GEV) to fit this annual peak flow time series. We use the following notation for the cumulative GEV distribution (with the parameters a, b, c and the peak flow x):
$$F_{a,b,c}(x) = \exp\left[-\left(1 + a\,\frac{x-c}{b}\right)^{-1/a}\right];\quad a \neq 0,\ \ 1 + a\,\frac{x-c}{b} > 0,$$
$$F_{0,b,c}(x) = \exp\left[-e^{-(x-c)/b}\right];\quad a = 0. \tag{1}$$

[Fig. 1. Annual peak flow at Point of Rocks (Potomac River) from 1895–2013.]

[Fig. 2. Q–Q plot for the fit of the generalized extreme value distribution to the annual peak flow at Point of Rocks (Potomac River). The line of equality indicates a perfect fit.]
To avoid confusion, we would like to mention that an alternative parameterisation of the GEV with −a instead of a is often used in hydrology. For a ≥ 0 the GEV has no upper bound, and for a > 0 the GEV is heavy tailed (since the moments of order greater than 1/a are infinite), in contrast to a < 0, where the distribution has a finite upper bound. The special case a = 0 is called the Gumbel distribution, which is unbounded but has all of its moments defined. For the given example, we fitted the GEV distribution to the annual peak flow time series using maximum likelihood estimation (MLE) [6,45]. While other methods such as probability weighted moments (PWM) [21,23] and combinations of MLE and PWM [36] perform better for small samples, this effect will not play much of a role for the example presented here (which contains 119 values). Moreover, the MLE approach does not rely on the assumption that a finite mean exists. For the given example of the annual peak flow at Point of Rocks (Potomac River), we obtained a shape parameter of a = 0.176, which provides some evidence that the distribution is heavy tailed (P-value P ≈ 0.002 for the likelihood ratio test of a = 0). We would like to mention that this gauge was fitted to the GEV distribution by Smith [44] and by Katz and coworkers [25], where the record ended in 1986 and 2000, respectively; the shape parameter was estimated to be 0.42 and 0.191, respectively. For the fit performed here, a Q–Q plot (presented in Fig. 2) shows a fairly good fit for lower and middle annual peak flows; however, for very large annual peak flows the fit underestimates the observed values. If the 1000-year return level for the Potomac River at Point of Rocks is estimated using the GEV distribution, we arrive at a value of approximately 18,200 m³/s. However, this estimation of the 1000-year return level is based on a finite sample. Therefore we need to address this uncertainty by estimating the CI.
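As a point of reference for the later comparison, the standard parametric percentile bootstrap can be sketched in a few lines. This is a minimal illustration, not the authors' code: synthetic data stand in for the Potomac series, B is kept small for speed (the article uses B = 9999), and scipy's shape parameter equals −a in the parameterisation of Eq. (1).

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic "observed" annual maxima; parameters loosely echo the fitted
# Potomac values (a = 0.176 -> scipy shape -0.176), purely for illustration.
rng = np.random.default_rng(42)
data = genextreme.rvs(-0.176, loc=2480, scale=1166, size=119, random_state=rng)

# MLE fit and the 1000-year return level (quantile of non-exceedance 1 - 1/T).
shape, loc, scale = genextreme.fit(data)
T, B = 1000, 199
point = genextreme.isf(1.0 / T, shape, loc, scale)

# Parametric bootstrap: resample from the fitted GEV, refit, re-estimate.
levels = np.empty(B)
for i in range(B):
    boot = genextreme.rvs(shape, loc=loc, scale=scale, size=data.size,
                          random_state=rng)
    s, l, sc = genextreme.fit(boot)
    levels[i] = genextreme.isf(1.0 / T, s, l, sc)

# Percentile method: CI boundaries are empirical quantiles of the
# bootstrap return levels.
lo, hi = np.quantile(levels, [0.025, 0.975])
```

The percentile interval simply reads off quantiles of the bootstrap distribution; as discussed below, such intervals tend to be too narrow for heavy-tailed GEVs.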
The approach we present here to estimate the CI is based on test inversion bootstrapping (TIB). An introduction to TIB can be found in [3,4]; here we would like to give a less technical and more intuitive motivation of the test inversion approach. As an example, we consider the two-sided 95% CI. In order to estimate this interval, we need to answer the following question: if the 1000-year return level estimated from the observed sample is within a probability range of non-exceedance of 2.5–97.5% of the underlying (but unknown) “true distribution”, in which interval would the “true” 1000-year return level be located? This view highlights the fact that the observed time series is just a sample drawn from the “true” distribution, and that therefore the 1000-year return level estimated from this sample has a certain (yet unknown) probability of non-exceedance regarding the 1000-year return level estimated from the
entirety of samples (with the same sample size as the observation). However, for calculating the upper boundary of the 95% CI (which is simply the finite boundary of the one sided 97.5% CI), we assume that the 1000-year return level estimated from the observed sample is the 2.5% quantile of the underlying “true” distribution (of return levels). The question now is: if this assumption were true, which parameters would the underlying true distribution have? Likewise, for the lower boundary of the 95% CI (the finite boundary of the one sided 2.5% CI), we would need to determine the parameters of the underlying “true” distribution for which the 1000-year return level estimated from the observed time series has a probability of non-exceedance of 97.5%. The term “test inversion” therefore refers to the duality of hypothesis testing and confidence intervals. If we want to calculate the upper boundary of the double sided 95% confidence interval, we need to find the parameters of the GEV distribution for which 2.5% of all 1000-year return levels estimated from samples drawn from this distribution (with the same number of N = 119 values as in the observed sample) are smaller than the 1000-year return level estimated from the observed time series. But the GEV has in total three parameters, while we have only one constraint. Therefore, among the set of possible parameters we used the one that maximizes the likelihood function. This means that we are looking for the most probable set of parameters (regarding the observed sample) for which the 1000-year return level estimated from the observed sample is exactly the 2.5% quantile of the distribution of 1000-year return levels. In general, the outlined procedure would be a rather cumbersome task.
Running several sets of bootstrap simulations (each requiring several thousand resamplings) for each pair of two parameters in order to determine the third one (such that the constraint is satisfied) would require tremendous effort. However, the specific structure of the GEV allows the problem to be simplified significantly, as will be demonstrated in the following. In order to draw a random value from the GEV (Eq. (1)), initially a random variable r uniform on (0, 1) needs to be drawn. Then Eq. (1) needs to be solved for the peak flow x:
$$x = c + \frac{b}{a}\left[(-\ln r)^{-a} - 1\right];\quad a \neq 0,$$
$$x = c - b\,\ln(-\ln r);\quad a = 0. \tag{2}$$

It is apparent that every random variable $x_i$ drawn from the GEV with the parameters $(a = a_j,\ b = 1,\ c = 0)$ can easily be adjusted to every GEV with the parameters $(a = a_j,\ b = b_k,\ c = c_l)$ (where $(a_j, b_k, c_l)$ can be any set of valid parameters) by

$$x_i^{\mathrm{new}} = x_i \cdot b_k + c_l. \tag{3}$$
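The inverse-transform sampling of Eq. (2) and the rescaling property of Eq. (3) can be verified numerically. A minimal sketch (the parameter values are illustrative):

```python
import numpy as np

def gev_inverse_cdf(r, a, b, c):
    """Solve Eq. (1) for x given non-exceedance probability r (Eq. (2))."""
    if a == 0.0:                       # Gumbel limit
        return c - b * np.log(-np.log(r))
    return c + (b / a) * ((-np.log(r)) ** (-a) - 1.0)

rng = np.random.default_rng(0)
r = rng.uniform(size=1000)             # uniform draws on (0, 1)
a, b, c = 0.2, 10.0, 30.0              # illustrative GEV parameters

# One standardized run (a, 1, 0) serves every (b, c) via Eq. (3):
standardized = gev_inverse_cdf(r, a, 1.0, 0.0)
direct = gev_inverse_cdf(r, a, b, c)
assert np.allclose(standardized * b + c, direct)
```

This is exactly the property that lets the TIB approach run a single set of bootstrap simulations per shape value a_j instead of one per parameter triple.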
This feature of the GEV means that for each $a_j$, only the parameter set $(a = a_j, b = 1, c = 0)$ needs to be considered, which requires only one set of bootstrap simulations. Hence, in total B samples (containing N values, as in the observed time series) are generated for each set of parameters $(a_j; 1; 0)$, and for each sample the parameters $(a, b, c)$ are estimated using the same estimation method as for the observed sample (which for this example is MLE). Eventually an estimate of the 1000-year return level is calculated for each sample. After ordering the obtained B values of the 1000-year return levels, the 2.5% and 97.5% quantiles (in the case of the 95% double sided CI) can be determined by choosing the $0.025 \times (B+1)$th and $0.975 \times (B+1)$th smallest values of the return levels [34], respectively. We denote them by $q_a^{0.025}$ and $q_a^{0.975}$. Throughout this article, B = 9999 was chosen, and thus we picked the 250th and 9750th smallest return levels. For each $a_j$ the parameters b and c can be derived as follows ($Q_T^{\mathrm{obs}}$ denotes the T-year return level estimated from the observed sample, $q_a^{p}$ denotes the pth quantile of the return level of the chosen T-year return period for given a, and L denotes the log-likelihood function of the GEV):

$$z_i := 1 + \frac{a}{b}\,\left(x_i^{\mathrm{obs}} - c\right),$$
$$L = -N \ln(b) - \frac{a+1}{a} \sum_{i=1}^{N} \ln(z_i) - \sum_{i=1}^{N} z_i^{-1/a}, \tag{4}$$
$$\frac{\partial L}{\partial c} = 0 = \frac{1}{b}\left[(a+1) \sum_{i=1}^{N} \frac{1}{z_i} - \sum_{i=1}^{N} z_i^{-\frac{a+1}{a}}\right], \tag{5}$$
$$b = \frac{Q_T^{\mathrm{obs}} - c}{q_a^{p}}. \tag{6}$$
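As a sanity check, the log-likelihood of Eq. (4) can be compared against a library implementation of the GEV density. A minimal sketch with synthetic data; scipy's shape parameter equals −a in the parameterisation of Eq. (1):

```python
import numpy as np
from scipy.stats import genextreme

def gev_loglik(a, b, c, x):
    """Log-likelihood of Eq. (4) for a != 0, with z_i = 1 + a (x_i - c)/b."""
    z = 1.0 + a * (x - c) / b
    if b <= 0 or np.any(z <= 0):
        return -np.inf                 # outside the support constraint
    return (-x.size * np.log(b)
            - (a + 1.0) / a * np.log(z).sum()
            - (z ** (-1.0 / a)).sum())

rng = np.random.default_rng(3)
a, b, c = 0.2, 10.0, 30.0              # illustrative parameters
x = genextreme.rvs(-a, loc=c, scale=b, size=200, random_state=rng)

ours = gev_loglik(a, b, c, x)
scipys = genextreme.logpdf(x, -a, loc=c, scale=b).sum()
assert np.isclose(ours, scipys)        # Eq. (4) agrees with scipy's density
```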
Eq. (6) ensures that the pth quantile of the T-year return level matches the T-year return level estimated from the observed sample. After inserting Eq. (6) into Eq. (5), c and subsequently b can be determined numerically. For each given a, these parameters allow the log-likelihood function (Eq. (4)) to be evaluated. While varying a, we can eventually determine the value of a that maximizes Eq. (4); we varied a in steps of 0.005 and interpolated between these steps. For various confidence levels, we will present the results for the peak flow at Point of Rocks, Potomac River for the 1000-year return level in the Results section. However, the reliability of the used approach still needs to be studied. As outlined in the Introduction, an extensive comparison of the performance of various non-parametric and parametric bootstrapping methods for the GEV was given in [30]. From a parent distribution (using the values a = 0.2, b = 10, and c = 30), 5000 samples were drawn (with N = 20, 40, 60, 100 values, respectively), and for each sample, the return levels with return periods of T = 10, 20, 50, 100 years were determined. Then for each return level, the double sided 90% and 95% CIs were constructed, using for the non-parametric and parametric bootstrap methods the following approaches: percentile, studentized, and bias-corrected and accelerated bootstrapping. Eventually the percentage of samples where the CI did not cover the true value was determined. In order to compare these results to the presented approach, we used the same parameters for the parent distribution (a = 0.2, b = 10, c = 30) and also drew 5000 samples containing 20, 40, 60, or 100 values each. For the return periods given above (plus the 1000-year return level), we constructed the 80%, 90% and 95% confidence intervals (CIs) and determined the percentage of samples that fail to include the true value (of the parent distribution).
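The percentages determined this way carry their own Monte Carlo error. The standard deviations quoted in the text (0.27%, 0.38%, 0.52%) can be reproduced under the plausible assumption that the two binomial error sources, the 5000 parent samples and the 9999 bootstrap simulations per shape value, contribute independent variances:

```python
import math

def mc_sd_percent(p, n_samples=5000, n_boot=9999):
    """Standard deviation (in %) of an estimated coverage probability p,
    assuming two independent binomial error sources whose variances add."""
    var = p * (1.0 - p) * (1.0 / n_samples + 1.0 / n_boot)
    return 100.0 * math.sqrt(var)

sds = [round(mc_sd_percent(p), 2) for p in (0.025, 0.05, 0.10)]
# sds == [0.27, 0.38, 0.52], matching the values quoted in the text
```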
Since a double sided CI is based on two one sided CIs, we also studied the coverage error of those one sided CIs. Apart from the TIB method itself, deviations occur due to the finite number of Monte Carlo simulations: first, for each a the empirical quantiles are determined based on 9999 simulations, and second, 5000 samples are drawn from the parent distribution. These Monte Carlo errors (which could be reduced by simply increasing the number of respective simulations) yield in total a standard deviation of 0.27%, 0.38%, and 0.52% for the single sided 2.5% (97.5%), 5% (95%), and 10% (90%) CIs, respectively (this follows from the binomial distribution). In contrast to the investigation performed for the annual peak flow at Point of Rocks (Potomac River), we used, in accordance with [30], probability weighted moments (PWM) to estimate the parameters of the GEV for each sample throughout this particular investigation. Please note that this refers only to the estimation of the return levels of the samples ($Q_T^{\mathrm{obs}}$ and $q_a^{p}$ in Eq. (6)) but does not affect the fact that for the TIB approach we maximize the log-likelihood (Eq. (4)) in order to choose the final set of parameters among all suitable sets of parameters (those that satisfy Eq. (6)). Finally, the performance of the TIB for the Gumbel distribution was also studied, which is a special case (a = 0) of the GEV. The parameters of the parent distribution were set to b = 1.5, c = 32, but according to Eq. (3) the results can be generalised to every parametrization of the Gumbel distribution. Similar to the case of the GEV, 5000 samples were drawn from the parent distribution, the return
levels were determined via PWM, and for each sample the CIs were estimated.

Profile Likelihood. Since we want to compare the TIB approach not only to other bootstrap techniques but also to the Profile Likelihood method (which is not commonly used in hydrology), we would like to give a short introduction. In a first step, the GEV is reparameterized in terms of the desired T-year return level $x_T$ [6]:

$$x_T = c + \frac{b}{a}\left[\left(-\ln\left(1 - \frac{1}{T}\right)\right)^{-a} - 1\right] \;\Rightarrow\; c = x_T - \frac{b}{a}\left[\left(-\ln\left(1 - \frac{1}{T}\right)\right)^{-a} - 1\right]. \tag{7}$$

Therefore the log-likelihood function L can also be expressed as $L(a, b, x_T)$. For each chosen $x_T$ the parameters a and b are determined by maximizing L:

$$L(x_T) = \max_{a,b} L(x_T, a, b), \tag{8}$$

and eventually the value $L(x_T)$ can be determined. If the original MLE estimates are denoted by $(\hat{a}, \hat{b}, \hat{c})$, then approximately those $x_T$ are inside the $100(1-\alpha)\%$ double sided CI for which the following condition holds true:

$$2\left[L(\hat{a}, \hat{b}, \hat{c}) - L(x_T)\right] \leq q_{1,\alpha}, \tag{9}$$

where $q_{1,\alpha}$ denotes the $(1-\alpha)$ quantile of the chi-square distribution with one degree of freedom.

3. Results

With the method described in the previous section, we determined the values of the 1000-year return level for the Potomac River at Point of Rocks for the 80%, 90%, and 95% confidence intervals (CIs) (see Table 1). For comparison, the parametric percentile bootstrap approach was also used to determine these CIs: we simply used the parameters estimated from the observed sample (by MLE), generated 9999 bootstrap samples, determined the 1000-year return levels, and chose the appropriate quantiles.

Table 1. The 1000-year return levels for various single sided confidence intervals estimated by test inversion bootstrapping (TIB), the parametric percentile bootstrapping (PPB) approach, and the Profile Likelihood (PL) for Point of Rocks (Potomac River).

Confidence level    TIB in m³/s    PPB in m³/s    PL in m³/s
2.5%                11,900         11,100         12,400
5%                  12,600         11,900         13,000
10%                 13,500         13,000         13,900
90%                 27,500         26,200         26,700
95%                 31,600         29,300         29,000
97.5%               35,600         32,600         30,600

It is striking that both the upper and lower boundaries of the CIs are larger when using the test inversion bootstrapping (TIB) than the parametric percentile bootstrapping. The reason for this behaviour can be seen in Table 2, where the parameters of the generalized extreme value (GEV) distribution used to generate the samples are shown. While this parameter set is always the same in the case of the parametric percentile bootstrap, it differs remarkably in the case of the TIB approach. The difference is especially pronounced for the shape parameter a, which increases significantly with increasing boundary of the CI.

Table 2. Parameters of the underlying GEV distribution for the observed sample and various single sided confidence levels (determined by test inversion bootstrapping) for Point of Rocks (Potomac River).

                          a        b in m³/s    c in m³/s
Observed sample           0.176    1166         2480
2.5% confidence level     0.048    1148         2527
5% confidence level       0.068    1146         2516
10% confidence level      0.091    1148         2507
90% confidence level      0.276    1208         2462
95% confidence level      0.306    1225         2460
97.5% confidence level    0.331    1242         2458

This explains why the values for the boundaries of the CI are larger in the case of the TIB approach: since the shape parameter a
governs mainly the tail behavior of the GEV (and therefore also the variability of the high quantile estimates), smaller values of a lead to smaller variability of the 1000-year return levels. Hence, the lower boundaries of the respective confidence interval must lie closer to the 1000-year return level estimated from the observed sample, and the lower boundary is therefore larger than the one estimated by the parametric percentile bootstrapping. Similarly, if the shape parameter a is larger, the variability of the 1000-year return level increases, so the upper boundaries of the CI must lie farther away from the 1000-year return level estimated from the observed sample. Hence, the upper boundaries of the respective CI must be larger than the ones estimated by the parametric percentile bootstrapping.

In order to demonstrate the reliability of the introduced TIB approach, we used the GEV distribution with the parameters a = 0.2, b = 10, c = 30 as the parent distribution (the same as in [30]). After drawing 5000 samples of various sizes N from this parent distribution (as described in the Methods section), the 2.5%, 5%, 10%, 90%, 95%, and 97.5% one sided CIs were constructed for each of the 10-, 20-, 50-, 100-, and 1000-year return levels. Finally, the percentage of samples for which the true value was not inside the respective one sided CI was determined. The results are shown in Table 3. For return periods of 50 years and above, the results are reasonably close to the nominal values. While deviations from the correct values exist, they are rather small: most values deviate by no more than 10% from the correct values, and all deviate by less than 20%. For the 10- and 20-year return levels the deviations are slightly larger; in particular, the 97.5% one sided CI for the 10-year return level is too large irrespective of the sample size N.
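For reference, the Profile Likelihood condition of Eqs. (7)–(9) used in the comparison below can be sketched compactly. This is a minimal, hedged illustration with synthetic data, not the authors' implementation; scipy's shape parameter equals −a in the parameterisation of Eq. (1), and the optimizer choice (Nelder–Mead) is an assumption:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2, genextreme

rng = np.random.default_rng(7)
x = genextreme.rvs(-0.2, loc=30, scale=10, size=100, random_state=rng)
T = 100                                # return period of interest

def negloglik(a, b, c):
    """Negative log-likelihood of Eq. (4)."""
    z = 1.0 + a * (x - c) / b
    if b <= 0 or np.any(z <= 0):
        return np.inf
    return (x.size * np.log(b) + (a + 1.0) / a * np.log(z).sum()
            + (z ** (-1.0 / a)).sum())

def c_from_xT(xT, a, b):
    """Eq. (7): eliminate c in favour of the T-year return level xT."""
    return xT - b / a * ((-np.log(1.0 - 1.0 / T)) ** (-a) - 1.0)

def profile_negloglik(xT, start):
    """Eq. (8): maximize the likelihood over (a, b) for fixed xT."""
    res = minimize(lambda p: negloglik(p[0], p[1], c_from_xT(xT, p[0], p[1])),
                   start, method="Nelder-Mead")
    return res.fun

shape, loc, scale = genextreme.fit(x)  # MLE (scipy shape = -a)
a_hat, b_hat, c_hat = -shape, scale, loc
xT_hat = genextreme.isf(1.0 / T, shape, loc, scale)

# Eq. (9): xT is inside the 95% double sided CI iff the deviance stays
# below the chi-square(1) quantile; at the MLE it is (numerically) zero.
deviance = 2.0 * (profile_negloglik(xT_hat, [a_hat, b_hat])
                  - negloglik(a_hat, b_hat, c_hat))
inside = deviance <= chi2.ppf(0.95, df=1)
```

Scanning xT away from xT_hat until the deviance crosses the threshold would yield the CI boundaries.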
Overall, for sample sizes N ≥ 40, no clear dependence of the coverage error (defined as |q − α| for the (1 − α) confidence interval (CI), with q denoting the percentage of simulated results that do not include the true value) on the sample size can be observed. The same procedure as outlined above was applied using the Profile Likelihood; the results can be seen in Table 4. In general, for the lower one sided CIs (namely the 2.5%, 5%, and 10% CIs), too few samples fall outside the respective CI, while for the upper CIs, too many samples fall outside. This means that the Profile Likelihood underestimates both the lower and the upper boundary of double sided CIs. Notably, the coverage error reduces significantly with increasing sample size. However, a comparison with the results for the TIB approach (Table 3) shows that even for the sample size N = 100 and return periods of 50 years or more, the coverage error, especially for the 90%, 95%, and 97.5% CIs, is still slightly larger for the Profile Likelihood than for the TIB. For smaller sample sizes the advantage of the TIB approach over the Profile Likelihood increases further. As mentioned before, the performance of many bootstrap approaches was studied by Kyselý [30]. However, only the percentages of simulated results of double sided CIs that do not cover the true value were reported. For comparison, we combined for each case the
Table 3. Percentage of simulated results for which the single sided confidence intervals (CI) (determined by test inversion bootstrapping) did not cover the true values of the return level (r.l.) of the parent distribution (GEV distribution with shape parameter a = 0.2) for different return periods and different sample sizes N.

N     CI      10-year r.l.  20-year r.l.  50-year r.l.  100-year r.l.  1000-year r.l.
20    2.5%    100−3.6       100−3.3       100−2.7       100−2.7        100−2.2
      5%      100−6.4       100−6.0       100−5.2       100−4.6        100−4.4
      10%     100−10.8      100−9.7       100−9.4       100−9.3        100−8.9
      90%     7.5           8.4           8.8           9.0            9.4
      95%     2.8           3.8           4.7           5.1            5.1
      97.5%   1.3           2.0           2.3           2.4            2.5
40    2.5%    100−3.0       100−2.9       100−2.8       100−2.8        100−2.5
      5%      100−5.8       100−5.3       100−5.3       100−5.2        100−5.1
      10%     100−10.5      100−10.0      100−9.7       100−9.7        100−9.5
      90%     8.8           9.4           9.7           9.8            9.9
      95%     4.3           4.7           5.1           4.9            5.0
      97.5%   2.0           2.2           2.3           2.3            2.3
60    2.5%    100−2.9       100−2.5       100−2.3       100−2.1        100−2.2
      5%      100−5.0       100−4.9       100−4.5       100−4.4        100−4.5
      10%     100−9.8       100−9.8       100−9.6       100−9.7        100−9.9
      90%     9.2           8.8           8.6           9.1            9.4
      95%     4.2           4.1           4.5           4.3            4.7
      97.5%   1.9           2.2           2.3           2.3            2.2
100   2.5%    100−2.9       100−2.8       100−2.7       100−2.7        100−2.6
      5%      100−5.6       100−5.5       100−5.4       100−5.3        100−5.1
      10%     100−9.9       100−10.4      100−10.4      100−10.5       100−10.1
      90%     9.6           9.5           9.6           9.8            9.7
      95%     4.3           4.3           4.6           4.6            4.8
      97.5%   1.9           2.2           2.4           2.3            2.5
Table 4. Percentage of simulated results for which the single sided confidence intervals (CI) (determined by Profile Likelihood) did not cover the true values of the return level (r.l.) of the parent distribution (GEV distribution with shape parameter a = 0.2) for different return periods and different sample sizes N.

N     CI      10-year r.l.  20-year r.l.  50-year r.l.  100-year r.l.  1000-year r.l.
20    2.5%    100−1.9       100−1.7       100−2.0       100−2.2        100−2.5
      5%      100−3.5       100−4.0       100−4.5       100−4.5        100−4.8
      10%     100−7.7       100−8.2       100−8.5       100−9.1        100−9.6
      90%     14.2          13.8          13.3          13.5           12.8
      95%     7.8           7.7           7.0           6.8            6.8
      97.5%   4.2           4.1           4.0           4.1            3.8
40    2.5%    100−1.9       100−2.0       100−2.2       100−2.2        100−2.5
      5%      100−3.6       100−3.9       100−4.0       100−4.3        100−4.7
      10%     100−7.7       100−8.0       100−8.1       100−8.1        100−9.0
      90%     12.8          12.7          12.5          12.5           12.3
      95%     6.7           6.5           6.4           6.4            6.4
      97.5%   3.3           3.4           3.2           3.2            3.2
60    2.5%    100−2.1       100−1.9       100−2.0       100−2.1        100−2.4
      5%      100−4.1       100−4.4       100−4.4       100−4.4        100−4.6
      10%     100−8.5       100−8.5       100−9.1       100−9.1        100−9.5
      90%     11.4          11.5          11.4          11.0           11.0
      95%     6.1           5.8           5.8           5.7            5.6
      97.5%   2.9           3.1           3.0           3.1            2.9
100   2.5%    100−2.3       100−2.3       100−2.3       100−2.4        100−2.5
      5%      100−4.8       100−4.6       100−4.8       100−4.8        100−4.8
      10%     100−8.9       100−9.6       100−9.4       100−9.3        100−9.4
      90%     11.2          11.0          10.9          10.9           11.1
      95%     5.5           5.3           5.3           5.4            5.4
      97.5%   2.8           2.8           2.8           2.7            2.8
two finite boundaries of the single sided CIs in order to obtain the respective double sided CI. The double sided 90% CI consists of the finite boundaries of the 5% and 95% one sided CIs, while the double sided 95% CI consists of the finite boundaries of the 2.5% and 97.5% single sided CIs (as in [30]). The results were compared to the bootstrapping method that performed best in [30]; this comparison is shown in Table 5. First, in most cases the TIB as well as the Profile Likelihood clearly outperforms any of the approaches used in [30]. Second, it is striking that there is not a single case where any of the bootstrapping approaches used by Kyselý [30] has a lower coverage error than the
TIB method. Third, the difference between the TIB and other bootstrapping approaches is especially large for smaller sample sizes. Finally, the introduced TIB was applied to the Gumbel distribution, which is a special case of the GEV (shape parameter a = 0). In order to determine the coverage error for the single sided CIs, the same procedure as for the previously used heavy tailed GEV was carried out: 5000 samples were drawn from the parent distribution (with the parameters b = 1.5 and c = 32), the appropriate CIs were estimated, and finally the percentage of samples for which the true value of the return level is not within the CI was determined. The results are
Table 5. Comparison of the percentage of simulated results for which the double sided confidence intervals (CI) did not include the true value of the return level of the parent distribution (GEV distribution), for different sample sizes N. The following methods for obtaining the CI were considered: TIB = test inversion bootstrap, PL = Profile Likelihood, BOB = best other bootstrap used in [30] (which includes the non-parametric and parametric versions of the basic, studentized, and bias-corrected and accelerated bootstrap approaches).

N     T-year r.l.  90% CI TIB  90% CI PL  90% CI BOB  95% CI TIB  95% CI PL  95% CI BOB
20    10-year      9.2         11.3       15.5        4.9         6.1        10.3
20    20-year      9.8         11.6       19.6        5.3         5.8        13.9
20    50-year      9.9         11.5       19.4        5.0         6.0        14.6
20    100-year     9.7         11.3       18.9        5.1         6.3        14.1
40    10-year      10.1        10.3       13.4        5.0         5.3        8.1
40    20-year      10.0        10.4       15.5        5.1         5.4        10.2
40    50-year      10.4        10.5       15.4        5.1         5.4        10.7
40    100-year     10.1        10.7       15.0        5.1         5.4        10.6
60    10-year      9.2         10.2       12.7        4.8         5.0        7.2
60    20-year      9.0         10.2       14.1        4.7         5.0        8.8
60    50-year      9.0         10.2       14.4        4.6         5.0        9.4
60    100-year     8.7         10.1       13.9        4.4         5.2        9.3
100   10-year      9.9         10.4       10.1        4.8         5.1        6.2
100   20-year      9.8         9.9        12.3        5.0         5.2        6.7
100   50-year      10.0        10.1       13.0        5.1         5.1        7.6
100   100-year     9.9         10.1       13.1        5.0         5.0        7.7
Table 6. Percentage of simulated results for which the single sided confidence intervals (CI) (determined by test inversion bootstrapping) did not cover the true values of the return level (r.l.) of the parent distribution (Gumbel distribution) for different return periods and different sample sizes N.

N     CI      10-year r.l.  20-year r.l.  50-year r.l.  100-year r.l.  1000-year r.l.
20    2.5%    100−3.6       100−3.3       100−3.4       100−3.1        100−3.2
      5%      100−6.8       100−6.6       100−6.2       100−6.0        100−5.9
      10%     100−11.7      100−11.2      100−10.9      100−10.8       100−10.5
      90%     9.5           9.7           10.0          10.3           10.1
      95%     3.5           4.2           4.5           4.6            4.8
      97.5%   1.4           1.7           2.0           2.1            2.3
40    2.5%    100−3.4       100−3.0       100−2.9       100−2.9        100−2.8
      5%      100−5.8       100−5.8       100−5.4       100−5.5        100−5.4
      10%     100−10.7      100−10.3      100−10.5      100−10.3       100−9.9
      90%     9.0           9.1           9.2           9.2            9.3
      95%     4.1           4.2           4.4           4.5            4.6
      97.5%   1.8           1.9           2.0           2.1            2.3
60    2.5%    100−3.0       100−2.8       100−2.8       100−2.7        100−2.7
      5%      100−5.4       100−5.3       100−5.2       100−5.2        100−5.1
      10%     100−10.6      100−10.3      100−10.3      100−10.4       100−10.3
      90%     9.5           9.3           9.2           9.2            9.3
      95%     4.2           4.5           4.5           4.7            4.7
      97.5%   1.9           1.9           2.0           2.0            2.1
100   2.5%    100−3.0       100−2.8       100−2.5       100−2.5        100−2.5
      5%      100−5.6       100−5.5       100−5.4       100−5.5        100−5.5
      10%     100−11.1      100−11.0      100−11.0      100−10.9       100−11.0
      90%     10.0          10.1          10.3          10.4           10.1
      95%     4.2           4.2           4.4           4.4            4.5
      97.5%   2.2           2.2           2.2           2.1            2.1
shown in Table 6. For comparison, the same procedure was carried out for the Profile Likelihood (Table 7). In general, the coverage error for the Gumbel distribution using the TIB and the Profile Likelihood shows similar characteristics to the previously studied heavy tailed GEV: it becomes smaller for return levels with larger return periods. However, up to a sample size of N = 100, the coverage rates of the CIs are closer to their nominal values for the TIB approach than for the Profile Likelihood.

4. Discussion

In this article, we have developed an approach particularly designed for estimating the confidence intervals (CIs) of return levels of the generalized extreme value distribution (GEV). The presented approach combines test inversion bootstrapping (TIB) with a likelihood function. It is numerically advantageous: by exploiting the specific structure of the GEV, a three-dimensional bootstrapping problem (the return level of the GEV distribution depends explicitly on all three parameters) could be turned into an effectively one-dimensional one. Our investigation has shown that for situations where the GEV is heavy tailed, the coverage errors of single as well as double sided CIs are reasonably small. By comparing the TIB approach to other methods, we have shown evidence that TIB is superior to other bootstrapping approaches. Moreover, for sample sizes up to N = 100 and return periods of 50 years or more, the TIB also performs better than the Profile Likelihood. Finally, an application of the TIB to the Gumbel distribution yielded similar results. The use of the TIB has been mainly advocated by Carpenter [3], while the idea of test inversion had been around before [9,18,24]. In particular, DiCiccio and Romano [9] also proposed the use of
Table 7
Percentage of simulated results for which the single-sided confidence intervals (CI) (determined by Profile Likelihood) did not cover the true values of the return level (r.l.) of the parent distribution (Gumbel distribution), for different return periods and different sample sizes N.

N    CI (%)   10-year r.l.   20-year r.l.   50-year r.l.   100-year r.l.   1000-year r.l.
20   2.5      100–2.0        100–1.8        100–1.9        100–1.8         100–1.6
     5        100–3.6        100–3.3        100–3.2        100–3.2         100–3.2
     10       100–6.8        100–6.5        100–6.3        100–6.1         100–6.0
     90       15.3           15.5           15.5           15.6            16.1
     95       7.6            7.8            7.8            8.0             8.1
     97.5    3.9            4.2            4.2            4.3             4.4
40   2.5      100–2.1        100–2.1        100–2.1        100–2.0         100–2.0
     5        100–4.1        100–4.0        100–3.8        100–3.9         100–3.9
     10       100–7.7        100–7.4        100–7.2        100–7.0         100–7.1
     90       12.5           12.7           13.0           13.3            13.4
     95       6.6            6.9            7.1            7.1             7.1
     97.5    3.5            3.5            3.6            3.5             3.4
60   2.5      100–1.7        100–1.8        100–1.9        100–1.9         100–1.9
     5        100–4.1        100–4.0        100–3.8        100–3.7         100–3.7
     10       100–8.6        100–8.2        100–8.1        100–8.1         100–7.9
     90       12.4           12.8           12.9           12.9            12.9
     95       6.3            6.4            6.4            6.4             6.6
     97.5    3.2            3.2            3.3            3.3             3.5
100  2.5      100–2.1        100–2.1        100–2.1        100–2.1         100–2.1
     5        100–4.3        100–4.4        100–4.3        100–4.3         100–4.1
     10       100–9.2        100–9.1        100–8.9        100–9.0         100–9.0
     90       11.7           12.1           12.2           12.5            12.4
     95       6.0            6.0            6.2            6.2             6.3
     97.5    3.1            3.1            3.1            3.0             3.1
MLE in situations with nuisance parameters. However, to the best of our knowledge, the TIB has never been applied to a quantity that is an explicit function of several distribution parameters (like the return level). Moreover, even for the parameters of the GEV themselves, we are not aware that the performance of the TIB for heavy-tailed distributions has been studied before. While the idea of test inversion itself was used by Burn [2], actually only a non-parametric bootstrap was performed (the residuals obtained for the upper quantiles were used to obtain the lower boundary of the confidence interval and vice versa). However, this approach is likely to encounter the problems associated with all non-parametric bootstrap approaches for the GEV (as outlined in [30]). Moreover, the range of the double sided CI was determined directly by the characteristics of the sample, which is misleading, since the characteristics of the sample are not representative of the underlying true distribution. In contrast, the TIB approach fully embraces the fact that the distribution fitted to the observed sample cannot directly govern the boundaries of the CI. Hence, to determine the (100 − α)% single sided CI for a given return level, those distribution parameters need to be estimated for which α% of samples drawn from the corresponding distribution yield a return level smaller than the one estimated from the observed sample. Essentially, these parameters determine the boundaries of the CI. In the case of the Point of Rocks example (Potomac River), we have shown that the shape parameter a corresponding to the upper boundary of the CI is much larger than the one estimated from the observed time series.
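This construction can be sketched in code. The following is an illustrative reconstruction, not the authors' implementation: the function names, the Nelder–Mead optimizer and the starting values are our own choices, and scipy's genextreme uses a shape parameter c equal to the negative of the shape a used in the text.

```python
import numpy as np
from scipy import optimize, stats

def return_level(mu, sigma, xi, T):
    """T-year GEV return level (hydrological convention, xi != 0):
    z_T = mu + sigma/xi * ((-log(1 - 1/T))**(-xi) - 1)."""
    y = -np.log(1.0 - 1.0 / T)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def fit_gev(sample):
    """Unconstrained ML fit; scipy's shape c corresponds to -xi here."""
    c, loc, scale = stats.genextreme.fit(sample)
    return loc, scale, -c

def constrained_nll(params, sample, z, T):
    """Negative log-likelihood with mu eliminated through the constraint
    that the T-year return level equals z: this is the reduction to an
    effectively one-dimensional bootstrapping problem."""
    sigma, xi = params
    if sigma <= 0 or abs(xi) < 1e-6:
        return np.inf
    y = -np.log(1.0 - 1.0 / T)
    mu = z - sigma / xi * (y ** (-xi) - 1.0)
    ll = stats.genextreme.logpdf(sample, c=-xi, loc=mu, scale=sigma)
    return -ll.sum() if np.all(np.isfinite(ll)) else np.inf

def noncoverage(z, sample, z_hat, T, B=500, seed=None):
    """Fraction of parametric-bootstrap estimates falling below z_hat when
    the constrained-ML distribution with return level z is taken as truth.
    The upper CI boundary is the z at which this fraction equals alpha."""
    rng = np.random.default_rng(seed)
    res = optimize.minimize(constrained_nll, x0=[np.std(sample), 0.1],
                            args=(sample, z, T), method="Nelder-Mead")
    sigma, xi = res.x
    y = -np.log(1.0 - 1.0 / T)
    mu = z - sigma / xi * (y ** (-xi) - 1.0)
    boots = stats.genextreme.rvs(c=-xi, loc=mu, scale=sigma,
                                 size=(B, len(sample)), random_state=rng)
    estimates = np.array([return_level(*fit_gev(b), T) for b in boots])
    return np.mean(estimates < z_hat)
```

The upper boundary of the single sided CI would then follow from a one-dimensional root search, e.g. solving noncoverage(z, ...) = α for z with a bracketing method such as scipy.optimize.brentq.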
This makes sense: if the return level estimated from the observation has underestimated the true return level, the underlying “true” distribution must most likely have a larger a (and therefore be more heavily tailed), since of all three parameters of the GEV (location, scale and shape), the shape parameter is the one that is least constrained by the values of the observed sample. Vice versa, if the return level estimated from observation overestimates the true return level, the underlying “true” distribution will be much less heavily tailed (smaller a). Our study shows that the presented TIB dominates other bootstrapping approaches. Building on the work of Kyselý [30], we can see that the best common bootstrap methods have a coverage error (of the double sided 95% CI) that exceeds the expected value by a factor of 1.5–3 for the 100-year return level. In contrast, for the same return
level the TIB has deviations from the correct values of at most around 10%. However, we would like to emphasize that for a thorough test of accuracy, an investigation of the coverage error of the double sided CI is not sufficient; the single sided CIs need to be studied too. As we have observed from the use of the Profile Likelihood, both the lower and the upper boundary are underestimated. If only double sided CIs are considered, this error is partly compensated, which may lead to an overoptimistic view of this method. Finally, while one could claim that for a double sided CI a reasonably small coverage error is sufficient, real practical implications arise from the fact that the boundaries of the CI are clearly defined (for example, for the symmetric 95% CI, that the value estimated from the observed sample lies within the quantile range between 2.5% and 97.5% of the underlying true distribution). This is what a CI can tell us: if the observed sample has not been too extreme an event (defined by the confidence level) with regard to the unknown true distribution, we can expect the true value to lie within the range of the CI. Particularly in flood frequency analysis, the upper boundary of the CI is of much more importance for risk assessment than the lower one. Therefore, we also studied the accuracy of the single sided CIs. For sample sizes of N = 40 or more and return periods of 50 years or more, we obtained reasonably good values. There is a tendency for the upper boundary to be slightly overestimated; however, in the context of risk assessment an overestimation might be better than an underestimation of the same magnitude. For the 10-year return level, however, we observed greater deviations. This might be due to the fact that return levels with smaller return periods are directly influenced by all three parameters of the GEV.
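The differing parameter sensitivities can be checked numerically. In the sketch below (our notation: mu, sigma and xi stand for the paper's location c, scale and shape a; all parameter values are purely illustrative), the derivative of the return level with respect to the location is always 1, while the sensitivity to the shape grows rapidly with the return period, so the location matters relatively far less for long return periods.

```python
import numpy as np

def return_level(mu, sigma, xi, T):
    """GEV T-year return level, z_T = mu + sigma/xi * ((-log(1-1/T))**(-xi) - 1)."""
    y = -np.log(1.0 - 1.0 / T)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def sensitivities(mu, sigma, xi, T, h=1e-6):
    """Central finite-difference derivatives of z_T w.r.t. (mu, sigma, xi)."""
    base = np.array([mu, sigma, xi])
    grad = []
    for i in range(3):
        p = base.copy(); p[i] += h
        m = base.copy(); m[i] -= h
        grad.append((return_level(*p, T) - return_level(*m, T)) / (2 * h))
    return np.array(grad)

g10 = sensitivities(100.0, 20.0, 0.2, 10.0)
g1000 = sensitivities(100.0, 20.0, 0.2, 1000.0)
# dz/dmu is 1 for every T, while the shape sensitivity grows with T, so the
# relative influence of the location parameter shrinks for long return periods.
assert g1000[2] / g1000[0] > g10[2] / g10[0]
```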
While in the case of the 1000-year return level the location parameter c plays a minor role, its direct influence on the 10-year return level is much more pronounced. In such cases, many combinations of significantly different parameter sets might have the same likelihood (which might not be much lower than the maximum likelihood). However, we only chose the most likely set of parameters. Therefore, it is not surprising that significant deviations occur. Our investigation shows that the Profile Likelihood performs much better (regarding the coverage error of double sided CIs) than most of the common bootstrap approaches (except the TIB), which is consistent with the findings of Obeysekera and Salas [37]. However, for sample sizes up to 100 and return periods equal to or larger than
50 years, the TIB produces better results. One reason is that the Profile Likelihood relies on a large-sample chi-squared approximation, which is not valid for small sample sizes. This explains why the performance of the Profile Likelihood improves strongly with increasing sample size. We want to emphasize that the performance of the TIB approach relies heavily on the correct specification of the underlying distribution. Fortunately, in the block maxima approach, the Fisher–Tippett theorem guarantees that the underlying distribution will eventually converge to the GEV. Nevertheless, if the underlying distribution is actually a Gumbel distribution (a special case of the GEV), using the GEV in cases of small sample sizes could lead to significant deviations. However, as long as we cannot (at a given significance level) rule out the possibility that the distribution is heavy tailed, we probably still need to use the full GEV distribution, at least for estimating the upper boundary of the CI (in the case of peak flood estimation). This might run contrary to usual scientific practice (where a more complex model needs to show significant advantages before being adopted in favor of a simpler one, and not vice versa), but in terms of risk we may need to assume the least favorable situation, and that means retaining the heavy tail. Finally, the use of the proposed TIB approach for peak over threshold analysis might be as advantageous as shown for the block maxima approach. First, due to the Pickands–Balkema–de Haan theorem [1,40], the limiting distribution of the values above the threshold is given by the generalized Pareto distribution. Second, the generalized Pareto distribution has a very similar structure to the GEV, which means that the same numerical advantages as for the GEV can be exploited, effectively turning a three-dimensional bootstrapping problem into a one-dimensional one.
And third, for the parameter range where the generalized Pareto distribution is heavy tailed, the proposed TIB approach could have a similarly large performance edge over other bootstrap methods as shown for the GEV.
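The structural analogy with the generalized Pareto distribution can be made explicit with a short sketch. Here the threshold u, the annual exceedance rate lam and the function names are our illustrative assumptions; the return-level formula is the standard peaks-over-threshold expression. Fixing the return level determines the scale parameter once the shape is given, collapsing the two-parameter problem to a one-dimensional search over the shape alone.

```python
import numpy as np
from scipy import stats

def gpd_return_level(u, sigma, xi, lam, T):
    """T-year return level for exceedances over threshold u, occurring at an
    average rate of lam events per year (GPD scale sigma, shape xi != 0):
    z_T = u + sigma/xi * ((lam*T)**xi - 1)."""
    return u + sigma / xi * ((lam * T) ** xi - 1.0)

def sigma_from_return_level(u, z, xi, lam, T):
    """Invert the constraint z_T = z for sigma: analogous to eliminating the
    GEV location parameter, this reduces the search to the shape xi alone."""
    return xi * (z - u) / ((lam * T) ** xi - 1.0)

# Round trip: prescribing z and xi reproduces the prescribed return level.
u, xi, lam, T, z = 50.0, 0.2, 3.0, 100.0, 120.0
sigma = sigma_from_return_level(u, z, xi, lam, T)
assert abs(gpd_return_level(u, sigma, xi, lam, T) - z) < 1e-9
# Cross-check against scipy's GPD quantile (scipy's shape c equals xi here).
assert abs(stats.genpareto.isf(1.0 / (lam * T), c=xi, loc=u, scale=sigma) - z) < 1e-6
```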
References

[1] Balkema A, de Haan L. Residual life time at great age. Ann Probab 1974;2:792–804. http://dx.doi.org/10.1214/aop/1176996548.
[2] Burn DH. The use of resampling for estimating confidence intervals for single site and pooled frequency analysis. Hydrol Sci J 2003;48:25–38. http://dx.doi.org/10.1623/hysj.48.1.25.43485.
[3] Carpenter JR. Test-inversion bootstrap confidence intervals. J Roy Stat Soc Ser B 1999;61:159–72. http://dx.doi.org/10.1111/1467-9868.00169.
[4] Carpenter JR, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 2000;19:1141–64. http://dx.doi.org/10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F.
[5] Chowdhury JU, Stedinger JR, Lu LH. Goodness of fit tests for regional generalized extreme value flood distributions. Water Resour Res 1991;27:1765–76. http://dx.doi.org/10.1029/91WR00077.
[6] Coles S. An introduction to statistical modeling of extreme values. London: Springer; 2001. http://dx.doi.org/10.1007/978-1-4471-3675-0.
[7] Davison A, Hinkley D. Bootstrap methods and their application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press; 1997. http://dx.doi.org/10.1017/CBO9780511802843.
[8] DiCiccio TJ, Efron B. Bootstrap confidence intervals. Stat Sci 1996;11:189–228. http://dx.doi.org/10.1214/ss/1032280214.
[9] DiCiccio TJ, Romano JP. Nonparametric confidence limits by resampling methods and least favourable families. Int Statist Rev 1990;58:59–76. http://dx.doi.org/10.2307/1403474.
[10] Dunn PK. Bootstrap confidence intervals for predicted rainfall quantiles. Int J Climatol 2001;21:89–94. http://dx.doi.org/10.1002/joc.596.
[11] Efron B. Bootstrap methods: another look at the jackknife. Ann Stat 1979;7:1–26. http://dx.doi.org/10.1007/978-1-4612-4380-9_41.
[12] Efron B. Censored data and the bootstrap. J Am Stat Assoc 1981;76:312–19. http://dx.doi.org/10.1080/01621459.1981.10477650.
[13] Efron B. Better bootstrap confidence intervals. J Am Stat Assoc 1987;82:171–200. http://dx.doi.org/10.1080/01621459.1987.10478410.
[14] Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993. http://dx.doi.org/10.1007/978-1-4899-4541-9.
[15] Evans MA, Kim H, O'Brien TE. An application of profile-likelihood based confidence interval to capture-recapture estimators. J Agric Biol Environ Stat 1996;1:131–40. http://dx.doi.org/10.2307/1400565.
[16] Fisher RA, Tippett LHC. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc Cambridge Phil Soc 1928;24:180–90. http://dx.doi.org/10.1017/S0305004100015681.
[17] Fowler HJ, Kilsby CG. A regional frequency analysis of United Kingdom extreme rainfall from 1961 to 2000. Int J Climatol 2003;23:1313–34. http://dx.doi.org/10.1002/joc.943.
[18] Garthwaite PH, Buckland ST. Generating Monte Carlo confidence intervals by the Robbins–Monro process. Appl Stat 1992;41:159–71. http://dx.doi.org/10.2307/2347625.
[19] Gnedenko BV. Sur la distribution limite du terme maximum d'une série aléatoire. Ann Math 1943;44:423–53. http://dx.doi.org/10.2307/1968974.
[20] Hanel M, Buishand TA. Multi-model analysis of RCM simulated 1-day to 30-day seasonal precipitation extremes in the Czech Republic. J Hydrol 2012;412–413:141–50. http://dx.doi.org/10.1016/j.jhydrol.2011.02.007.
[21] Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. J R Statist Soc B 1990;52:105–24.
[22] Hosking JRM, Wallis JR, Wood EF. An appraisal of the regional flood frequency procedure in the UK Flood Studies Report. Hydrol Sci J 1985;30:85–109. http://dx.doi.org/10.1080/02626668509490973.
[23] Hosking JRM, Wallis JR, Wood EF. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 1985;27:251–61. http://dx.doi.org/10.1080/00401706.1985.10488049.
[24] Kabaila P. Some properties of profile bootstrap confidence intervals. Aust J Statist 1993;35:205–14. http://dx.doi.org/10.1111/j.1467-842X.1993.tb01326.x.
[25] Katz RW, Parlange MB, Naveau P. Statistics of extremes in hydrology. Adv Water Resour 2002;25:1287–304. http://dx.doi.org/10.1016/S0309-1708(02)00056-8.
[26] Khaliq MN, St-Hilaire A, Ouarda TBMJ, Bobée B. Frequency analysis and temporal pattern of occurrences of southern Quebec heatwaves. Int J Climatol 2005;25:485–504. http://dx.doi.org/10.1002/joc.1141.
[27] Kharin VV, Zwiers FW. Changes in the extremes in an ensemble of transient climate simulations with a coupled atmosphere–ocean GCM. J Climate 2000;13:3760–88. http://dx.doi.org/10.1175/1520-0442(2000)0132.0.CO;2.
[28] Kharin VV, Zwiers FW. Estimating extremes in transient climate change simulations. J Climate 2005;18:1156–73. http://dx.doi.org/10.1175/JCLI3320.1.
[29] Kyselý J. Comparison of extremes in GCM-simulated, downscaled and observed central-European temperature series. Climate Res 2002;20:211–22. http://dx.doi.org/10.3354/cr020211.
[30] Kyselý J. A cautionary note on the use of nonparametric bootstrap for estimating uncertainties in extreme-value models. J Appl Meteorol Climatol 2008;47:3236–51. http://dx.doi.org/10.1175/2008JAMC1763.1.
[31] Kyselý J, Beranová R, Placová E. Climate change scenarios of precipitation extremes in Central Europe from ENSEMBLES regional climate models. Theor Appl Climatol 2011;104:529–42. http://dx.doi.org/10.1007/s00704-010-0362-z.
[32] Lettenmaier DP, Wallis JR, Wood EF. Effect of regional heterogeneity on flood frequency estimation. Water Resour Res 1987;23:313–23. http://dx.doi.org/10.1029/WR023i002p00313.
[33] Madsen H, Pearson CP, Rosbjerg D. Comparison of annual maximum series and partial duration series methods for modeling extreme hydrologic events. 2. Regional modeling. Water Resour Res 1997;33:759–69. http://dx.doi.org/10.1029/96WR03849.
[34] Makkonen L. Bringing closure to the plotting position controversy. Commun Stat A-Theor 2008;37:460–7. http://dx.doi.org/10.1080/03610920701653094.
[35] Manteiga WG, Sánchez JMP, Romo J. The bootstrap - a review. Comp Stat 1994;9:165–205.
[36] Morrison JE, Smith JA. Stochastic modeling of flood peaks using the generalized extreme value distribution. Water Resour Res 2002;38:1–12. http://dx.doi.org/10.1029/2001WR000502.
[37] Obeysekera J, Salas JD. Quantifying the uncertainty of design floods under nonstationary conditions. J Hydrol Eng 2014;19(7):1438–46. http://dx.doi.org/10.1061/(ASCE)HE.1943-5584.0000931.
[38] Paeth H, Hense A. Mean versus extreme climate in the Mediterranean region and its sensitivity to future global warming conditions. Meteorol Z 2005;14:329–47. http://dx.doi.org/10.1127/0941-2948/2005/0036.
[39] Park JS, Jung HS, Kim RS, Oh JH. Modelling summer extreme rainfall over the Korean Peninsula using Wakeby distribution. Int J Climatol 2001;21:1371–84. http://dx.doi.org/10.1002/joc.701.
[40] Pickands J. Statistical inference using extreme order statistics. Ann Stat 1975;3:119–31. http://dx.doi.org/10.1214/aos/1176343003.
[41] Rust HW, Kallache M, Schellnhuber HJ, Kropp JP. Confidence intervals for flood return level estimates assuming long-range dependence. In: Kropp JP, Schellnhuber HJ, editors. In extremis. Disruptive events and trends in climate and hydrology. Heidelberg: Springer; 2011. http://dx.doi.org/10.1007/978-3-642-14863-7_3.
[42] Shao J, Tu D. The jackknife and bootstrap. New York: Springer; 1995. http://dx.doi.org/10.1007/978-1-4612-0795-5.
[43] Semmler T, Jacob D. Modelling extreme precipitation events - a climate change simulation for Europe. Global Planet Change 2004;44:119–27. http://dx.doi.org/10.1016/j.gloplacha.2004.06.008.
[44] Smith JA. Estimating the upper tail of flood frequency distributions. Water Resour Res 1987;23:1657–66. http://dx.doi.org/10.1029/WR023i008p01657.
[45] Smith RL. Maximum likelihood estimation in a class of nonregular cases. Biometrika 1985;72:67–90. http://dx.doi.org/10.2307/2336336.
[46] Stedinger JR, Lu LH. Appraisal of regional and index flood quantile estimators. Stoch Hydrol Hydraul 1995;9:49–75. http://dx.doi.org/10.1007/BF01581758.
[47] Trichakis I, Nikolos I, Karatzas GP. Comparison of bootstrap intervals for an ANN model of a karstic aquifer response. Hydrol Process 2011;25:2827–36. http://dx.doi.org/10.1002/hyp.8044.
[48] van den Brink HW, Können GP, Opsteegh JD. The reliability of extreme surge levels, estimated from observational records of order hundred years. J Coastal Res 2003;19:376–88.
[49] van den Brink HW, Können GP, Opsteegh JD. Statistics of synoptic-scale wind speeds in ensemble simulations of current and future climate. J Climate 2004;17:4564–74. http://dx.doi.org/10.1175/JCLI-3227.1.
[50] van den Brink HW, Können GP, Opsteegh JD. Uncertainties in extreme surge level estimates from observational records. Philos Trans Roy Soc London 2005;A363:1377–86. http://dx.doi.org/10.1098/rsta.2005.1573.
[51] Venzon DJ, Moolgavkar SH. A method for computing profile-likelihood based confidence intervals. Appl Stat 1988;37:87–94. http://dx.doi.org/10.2307/2347496.