Considering historical flood events in flood frequency analysis: Is it worth the effort?

Considering historical flood events in flood frequency analysis: Is it worth the effort?

Accepted Manuscript Considering historical flood events in flood frequency analysis: Is it worth the effort? Thomas Schendel, Rossukon Thongwichian P...

571KB Sizes 0 Downloads 90 Views

Accepted Manuscript

Considering historical flood events in flood frequency analysis: Is it worth the effort? Thomas Schendel, Rossukon Thongwichian PII: DOI: Reference:

S0309-1708(17)30479-7 10.1016/j.advwatres.2017.05.002 ADWR 2842

To appear in:

Advances in Water Resources

Received date: Revised date: Accepted date:

12 November 2015 16 January 2017 5 May 2017

Please cite this article as: Thomas Schendel, Rossukon Thongwichian, Considering historical flood events in flood frequency analysis: Is it worth the effort?, Advances in Water Resources (2017), doi: 10.1016/j.advwatres.2017.05.002

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights • We studied the utility of including historic floods for flood frequency

CR IP T

analysis.

• Bias decreases for large return periods, but increases for smaller ones.

• Including historical data substantially reduces root mean square error.

AN US

• We generalized an approach to estimate the historic period to several known floods.

• Exact knowledge of the historic period does only marginally increase

AC

CE

PT

ED

M

performance.

1

ACCEPTED MANUSCRIPT

Considering historical flood events in flood frequency analysis: Is it worth the effort?

CR IP T

Thomas Schendela,∗, Rossukon Thongwichianb a

b

Oranienburger Straße 33, 16775 Gransee, Germany Thammasat University, Department of Biotechnology, Faculty of Science and Technology, Pathumthani, 12121 Thailand

AN US

Abstract

Information about historical floods can be useful in reducing uncertainties in flood frequency estimation. Since the start of the historical record is often defined by the first known flood, the length of the true historical period

M

M remains unknown. We have expanded a previously published method of estimating M to the case of several known floods within the historical

ED

period. We performed a systematic evaluation of the usefulness of including historical flood events into flood frequency analysis for a wide range of return periods and studied bias as well as relative root mean square error

PT

(RRMSE). Since we used the generalized extreme value distribution (GEV) as parent distribution, we were able to investigate the impact of varying the

CE

skewness on RRMSE. We confirmed the usefulness of historical flood data regarding the reduction of RRMSE, however we found that this reduction

AC

is less pronounced the more positively skewed the parent distribution was. Including historical flood information had an ambiguous effect on bias: depending on length and number of known floods of the historical period, bias ∗

Corresponding author. Email:[email protected]

Preprint submitted to Advances in Water Resources

May 6, 2017

ACCEPTED MANUSCRIPT

was reduced for large return periods, but increased for smaller ones. Finally, we customized the test inversion bootstrap for estimating confidence inter-

CR IP T

vals to the case that historical flood events are taken into account into flood frequency analysis.

Keywords: flood frequency analysis, historical floods, confidence intervals

1

1. Introduction

Water resources design and management requires the analysis of extreme

3

flood events, especially the determination of floods with defined return pe-

4

riods. A common approach therefore is the annual block maxima approach,

5

where for each year the peak streamflow is determined and a distribution

6

is fitted to this series of maxima. Eventually this distribution is used to

7

estimate the return level for a defined return period. However, due to the

8

finite sample size, the estimated return levels are typically associated with

9

a wide range of uncertainty. One way to reduce this uncertainty is the in-

10

clusion of historical recorded flood events before the onset of systematical

11

record. These historical records are characterized by the knowledge of one

12

or several (usually very extreme) flood events, while for the majority of the

13

years within the historical period the maximal annual streamflow is unknown.

14

But since they were not recorded, it is assumed that they were smaller than

15

the smallest mentioned flood event. In many case studies, the implementa-

AC

CE

PT

ED

M

AN US

2

16

tion of historical data into flood frequency analysis was demonstrated, e.g.

17

by using traditional methods of statistical analysis [8] [28] [22] or Bayesian

18

statistics [26] [33] [9] [25]. The advantages of including historical floods were

19

systematically studied e.g. by Frances et al. (1994) [8]. By assuming the log 3

ACCEPTED MANUSCRIPT

Gumbel distribution as parent distribution, for different types of historical

21

data (flood events are explicitly known vs. only known that a threshold was

22

surpassed) the statistical gain of including historical information was studied

23

in dependence on the desired return period.

CR IP T

20

However, in many cases the true length of the historical period M is not

25

known, especially when the historical record starts already with a known

26

flood. Assuming that M is simply the period from the first known flood to

27

the start of the systematic record (denoted as L) will substantially underes-

28

timate the true length of the historical period, which was already mentioned

29

in Hirsch and Stedinger (1987) [11]. Ga´al et al. (2010) [9] combined a visual

30

inspection method with Bayesian analysis in order to estimate M , but the au-

31

thors called their approach themselves as ”subjective”. Another systematic

32

study was performed by Strupczewski et al. (2014) [31] (using frequentist

33

statistics), where the benefit of including the largest historical flood for flood

34

frequency analysis was studied, using the Weibull and Gumbel distribution

35

as underlying parent distribution. The authors presented a sound approach

36

for estimating M if one flood is known during M . However, this investi-

37

gation led to the rather pessimistic claim, that it is not sure ”whether the

38

whole operation is worth a candle”. Although the root mean square error

39

was substantially reduced by including historical information (for the flood

40

with return period of 100 years), the authors judged the occurring positive

CE

PT

ED

M

AN US

24

bias (for the same return level) in conjunction with the uncertainty of flood

42

magnitude and length of the historic period very critical. However, we have

43

some doubts regarding the used approach. First, only samples were consid-

44

ered were the maximal flood is within the historical period and does not occur

AC 41

4

ACCEPTED MANUSCRIPT

in the systematic record (with length N ). This selection itself may already

46

introduce a bias, since samples with very large floods in the systematic record

47

(or reverse, only modestly large floods in the historic period) are neglected.

48

Second, for estimating the historical period M , the formula M = 2 · L is

49

used, although the authors mention elsewhere that M = 2 · L + N should

50

be utilized if the maximal flood occurs in the historical period. Both factors

51

may result in a positive bias compared to a flood frequency analysis using

52

the systematic record only.

AN US

CR IP T

45

While the assessment of uncertainty of the results of flood frequency anal-

54

ysis is an integral part of the results obtained by Bayesian statistics (credible

55

interval), the calculation of confidence intervals is often neglected in the

56

framework of frequentist statistics. Nevertheless, attempts to estimate confi-

57

dence intervals in the presence of historical information have been made for

58

example by McDonald et al. (2014) [22], where the variance is estimated by

59

the delta method and the appropriate return level is assumed to be normal

60

distributed. However, for the cases of extreme events, it was shown that con-

61

fidence intervals are typically not symmetrical distributed (see for example

62

[24]), which leads to the use of bootstrap approaches ([3],[17],[1]) or Profile

63

likelihood ([24], [32]). In a previous article, we have customized a rarely used

64

bootstrap method, the test inversion bootstrap, to the case of systematic an-

65

nual flood record for the generalized extreme value distribution (GEV) [29]

CE

PT

ED

M

53

and shown its superiority against other bootstrap approaches as well as the

67

Profile Likelihood. The main idea of the test inversion is that it emphasizes

68

that the sample does not represent the parent distribution (nor its variance),

69

but given a certain confidence level, the sample is seen as a clearly defined

AC 66

5

ACCEPTED MANUSCRIPT

70

over- or underestimation of the underlying distribution. In this article we aim to evaluate the usefulness of using historical flood

72

information for flood frequency analysis using the GEV as parent distribu-

73

tion. We will investigate this matter studying the bias and relative root mean

74

square error (RRMSE) for a wide range of different skewness of the under-

75

lying distribution, which extends the investigations performed by Frances et

76

al. (1994) [8] and Strupczewski et al. (2014) [31]. We intend to generalize

77

the estimation of the length of the historical period (presented in [31] for the

78

largest historical flood) to the case that several floods are known for the his-

79

torical period. Furthermore, we want to adapt the test inversion bootstrap

80

approach for estimating confidence intervals to the case of historical flood

81

information.

82

2. Methods

83

2.1. Estimating the length of the historical period

ED

M

AN US

CR IP T

71

First of all, we would like to discuss a rather simple case for flood fre-

85

quency analysis using historical data, shown in Fig. 1a. Given are a sys-

86

tematic record with the length of N years and k known floods during the

87

historical period M . For this case, we assume that the historical period M

88

is known. As example, if a city was founded at a river side and the historic

89

accounts of flood reports (since its founding) are well preserved, it can be

AC

CE

PT

84

90

assumed that the historical period is simply the period between the found-

91

ing of the city and the onset of systematic records. It is assumed that for

92

the unrecorded years, the annual maximal streamflow’s are lower than the

93

smallest recorded flood xl within the historical period - otherwise they would 6

ACCEPTED MANUSCRIPT

T M

N

CR IP T

streamflow x

b streamflow x

a

T L

(years)

N

M

(years)

Figure 1: Systematic record of N annual maximal streamflow combined with a) known historical period M ; b) unknown historical period M with length L between first known flood and the begin of the systematic record.

have been mentioned too. The likelihood function L˜ in this case would read

95

[8] : L˜ = 

M k



 F (xl )M −k

k Y j=1

f (xj )

N Y

f (xi ) ,

(1)

i=1

M



AN US

94

with F (x) the cumulative distributive and f (x) the probability density func-

97

tion. The term F (xl )M −k accounts for all the unrecorded years. By maximiz-

98

˜ the corresponding parameters of F can be determined. ing the likelihood L,

99

However, Fig. 1b shows a much more realistic version of how historical flood

100

data is usually available: the historical period starts already with a distinc-

101

tive flood event, but nothing certain is known for the years before this event.

102

However, it is very likely that if a flood with a similar magnitude had oc-

103

curred some years before, it would have been recorded too. Therefore, taking

104

the historical period M = L (with L the number of years from the first his-

105

torical flood to the start of the systematic record) will lead to substantial

106

bias [11] [31]. Strupczewski et al. (2014) [31] inferred that for the largest

107

historical flood, the best estimate would be M = 2L (disregarding the sys-

108

tematic record), since the position of the largest flood (and therefore also L)

AC

CE

PT

ED

96

7

ACCEPTED MANUSCRIPT

can be seen as a uniform random variable on the interval [0, 1, ..., M ] with the

110

expected value E(L) = M/2 (if the systematic record is neglected, otherwise

111

it would be E(L) = (M + N )/2).

112

CR IP T

109

We would like to generalize this finding to the situation that several floods are known in the historical period. We propose following steps:

114

1. Determine the value of the smallest flood xl in the historical period (the

115

assumption here, of course is that in all non-recorded years the maximal

116

streamflow was lower than this particular flood).

117

2. Consider both, the historical period and the systematic record: How many

118

floods are larger or equal to the value xl ? Denote the result with k.

119

3. The estimate for M is then given by:

AN US

113

L+N −1 . k

(2)

M

M =L+

The reasoning behind this formula is that for the unrecorded years, the max-

121

imal streamflow is lower than the kth largest flood. Since it is assumed that

122

the maximal annual floods are independent, picking the k largest floods is

123

not different from picking any k numbers from the interval [1, 2, ..., M + N ]

124

(note that in contrast to [31] the interval starts with 1). Therefore, the prob-

125

ability that one of the k mentioned flood events occurs for a given year, is the

126

same for each year. Drawing k uniform distributed random numbers from the

127

continuous interval [0, 1] will yield an expected value for the position of the

AC

CE

PT

ED

120

128

largest value (which corresponds to L) of k/(k+1) [4]. Correspondingly, using

129

the interval [1, 2, ..., M + N ], L can be estimated to 1 + (M + N − 1)k/(k + 1).

130

This leads to Eq. 2. The estimated M can be used to calculate the appro-

131

priate likelihood given in Eq. 1.

8

ACCEPTED MANUSCRIPT

132

2.2. The GEV and estimation of its parameters The generalized extreme value distribution (GEV) is a theoretical sound

134

choice to fit the annual series of maximal streamflow, since the Fisher-

135

Tippett-Theorem [7] [10] states that the GEV is the limiting distribution

136

of maxima of a series of independent and identically distributed observa-

137

tions. It is also widely used in hydrology [16] [27] [30] [19] [5] [18] [12] [23].

138

For the cumulative GEV distribution, we used following notation (with the

139

shape parameter a, scale parameter b, location parameter c, and the peak

140

flow x): Fa,b,c (x) = e−(1+a

1 x−c − a ) b

− x−c b

F0,b,c (x) = e−e

AN US

CR IP T

133

; a 6= 0; 1 + a

; a = 0.

x−c >0 b (3)

Noteworthy, in hydrology often an alternative parametrization of the GEV

142

with −a instead of a is used. Anyway, for a ≥ 0 the GEV has no upper

143

boundary, and for a > 0 the GEV is heavily tailed (since the moments

144

greater than 1/a are not defined). In contrast, for a < 0, the distribution

145

has a finite upper bound where all moments the distribution has a finite

146

upper bound. The case a = 0 is known as the Gumbel distribution, which is

147

unbound but all of its moments are defined.

ED

PT

Utilizing Eq. 1 implies the use of maximum likelihood in order to estimate

the parameters of the GEV. However, from previous studies [13] [21][6] it is

AC

149

CE

148

M

141

150

known that MLE performs bad in terms of bias as well as root mean square

151

error compared to the methods of probability weighted moments (PWM) for

152

small sample sizes. Coles and Dixon (1999) [6] have attributed the com-

153

parable good performance of PWM to its assumption that the mean of the 9

ACCEPTED MANUSCRIPT

distribution exists (shape parameter a < 1) and if such a constraint is in-

155

troduced to the likelihood, its performance is even better than for PWM.

156

In detail, they introduced a penalty function P that is multiplied to the

157

likelihood function: L˜Pa,b,c (x) = P (a) · L˜a,b,c (x) with

    1    P (a) = g(a)      0  g(a) = exp −

a≤0

AN US

158

CR IP T

154

0
(5)

(6)

M

a≥1  a 1−a

(4)

159

The penalty increases with increasing shape parameter a. However, the

160

choice of g(a) in Eq. 6 is suboptimal. If LP denotes the penalized log-

likelihood (LP := log(L˜P )), then the derivative of the penalized log-likelihood

162

with respect to a is:

∂LP ∂L 1 = − ∂a ∂a (1 − a)2 0 < a < 1.

CE

PT

ED

161

There are two main problems associated with this derivative, more exactly

164

with the penalizing term:

165

1. It is not unambiguously defined for a = 0. The left hand limes is 0,

166

the right one -1. This can be problematic for algorithms searching for local

167

maxima of the penalized likelihood.

168

2. Even small (but positive) values of the shape parameter a are already

AC

163

10

ACCEPTED MANUSCRIPT

noticeable penalized. However, Coles and Dixon (1999) [6] themselves have

170

shown that robust measures of estimator performance (like the median) show

171

negative bias despite the significant positive bias that is observed for the mean

172

estimators (see also the results in the Appendix). Thus, small values of a

173

should, if possible, not noticeable be penalized.

Therefore, we would like to introduce a different function g(a):   a3 g(a) = exp − . 1−a

(7)

AN US

174

CR IP T

169

For this choice of g(a) the first and second derivative (with respect to a) are

176

unambiguously defined. Moreover, small values of a are penalized very little,

177

while for a → 1 the penalizing effect is similar to Eq. 6. The performance

178

of the modified penalized likelihood is studied in the Appendix for relevant

179

parameters used in this article.

180

2.3. Details for simulations

ED

M

175

For the performed simulations, we always chose the systematical record

182

length N = 50. We considered the number of known floods k in the historical

183

period k = 1, 2, and 4. The true historical period M was chosen to be

184

M = 200. However, for studying cases where M is not known, we determined

185

for each run the length L by subtracting all those years from M before one

186

of the k largest floods occurs first and then used an estimate for M (Eq. 2).

187

This enabled us to compare both cases. For the choice of parameters of

188

the parent distribution (the GEV, from where the samples are drawn), the

189

location parameter c and the scale parameter b can be arbitrarily chosen,

190

since c only shifts the whole distribution and b scales it around c. Therefore,

191

we chose c = 0 and b = 1. Note that c = 0 leads to negative flood events.

AC

CE

PT

181

11

ACCEPTED MANUSCRIPT

While this does not interfere with the calculations, it should be kept in mind

193

that the relative bias and relative root mean square error (RRMSE) reported

194

in this article is necessarily higher than in practical applications, since c is

195

supposed to be positive, which leads to a larger mean. For the remaining

196

parameter a (the shape parameter), we chose a = 0, 0.1, 0.2, 0.3. All choices

197

of a represent cases without upper boundary, and the skewness ranges from

198

1.14 to 13.48. Relative bias and RMSE are calculated for the return levels

199

associated with following return periods T : T = 50, 100, 200, 500, 1000

200

years.

201

2.4. Test inversion bootstrapping

AN US

CR IP T

192

The use of the test inversion bootstrapping (TIB) for estimating confi-

203

dence intervals (CI) was mainly advocated by Carpenter [2], [3]. The under-

204

lying idea of TIB can be explained by considering the definition of confidence

205

intervals (CI). Let us assume that from any sample with the sample size N a

207

c of W is estimated. If we knew the true underlying distribution, statistic W

ED

206

M

202

ci (estimates we could construct the range of values of W where 95% of all W

of samples) would be located. If this interval would be symmetrically cho-

209

sen, 2.5% of all samples would yield an estimate smaller than the lower and

210

another 2.5% of all samples would yield an estimate larger than the upper

211

boundary. This would be the symmetric and double-sided 95% CI. However,

212

the problem is we do not know the true distribution and all we have is a

AC

CE

PT

208

213

214

215

216

ci . sample and the corresponding estimate W

Under these circumstances, estimating the 95% CI requires the following

way of thinking: Determining the upper boundary of the CI (the single-sided ci underestimates 97.5% CI), we need to assume that the sample estimate W 12

ACCEPTED MANUSCRIPT

W such that the probability of non-exceedance is only 2.5% (regarding all

218

samples of the true, but unknown distribution). If that is the case, how would

219

the parameters of the true distribution (and hence W ) look like? Or more

220

specifically: What are the parameters were 2.5% of all values are smaller

221

CR IP T

217

ci ? This would define the upper boundary of the CI. Likewise, the than W

lower boundary of the 95% CI can be determined by assuming that 2.5% of

223

all samples of the true (yet unknown) distribution would exceed the sample

224

225

ci . estimate W

AN US

222

A simple example might illustrate this point. Let x be uniform distributed

226

on the interval [0, R] with unknown R. The cumulative distribution function

227

F (x) is then given by

229

230

b would be R b = 2x1 . For of the 95% double-sided CI? The mean estimate R

the upper boundary, the probability of non-exceedance need to be 2.5%. Therefore,

232

CE

PT

231

If only one value x1 is drawn from this distribution, what are the boundaries

ED

228

x ; 0 ≤ x ≤ R. R

M

F (x) =

x1 R → R = 40 · x1 .

F (x1 ) = 0.025 =

The upper boundary of the 95% CI is forty times larger than the sampled value. For the lower boundary, the probability of non-exceedance need to be

234

97.5%, therefore

AC 233

x1 R x1 →R = . 0.975

F (x1 ) = 0.975 =

13

ACCEPTED MANUSCRIPT

235

The lower boundary is roughly only 2.6% larger than the estimate x1 . The

236

complete 95% CI is given by (1.026 x1 ; 40 x1 ). Note that the boundaries of

238

b = 2x1 . the CI are not symmetrically placed around R

CR IP T

237

What most methods designed to determine CI imply is that the parame-

ters estimated from the sample can represent the true distribution and then

240

the interval were the appropriate percentage of samples are located serve as

241

estimate for the CI. But this is not true: In order to estimate the boundaries

242

of the CI we need to assume that the sample is more or less a significant over-

243

or underestimation of the true value. Therefore, loosely spoken, the distribu-

244

tion estimated from the sample cannot represent adequately the variability of

245

the distribution (which influences of course the width of the CI) and we need

246

to estimate the parameters of the true distribution under the assumption

247

that the sample was a defined over- or underestimation.

M

AN US

239

For distributions with a single parameter (like the simple example of

249

the uniform distribution presented above) the CI can be determined exactly

250

without the necessity of using bootstrap simulations. The TIB approach

251

itself was developed initially for one parameter Θ under the presence of a

252

nuisance parameters η ([3]). It uses the sample estimate for the nuisance

254

PT

single-sided CI for Θ (Θα ) are estimated (using bootstrap simulations) such that

AC

255

parameter η = ηb for the subsequent bootstrap simulations. The α · 100 %

CE

253

ED

248

b obs ); Θα , ηb) = 1 − α . F (Θ(x

256

For each value of Θ, bootstrap simulations can be carried out (with each

257

bootstrap simulation having the same sample size as the observed sample),

258

ci can be estimated. Subsequently, the and for each sample i the value Θ 14

ACCEPTED MANUSCRIPT

259

b obs ) can be determined. If this probability of non-exceedance regarding Θ(x

probability of non-exceedance reaches 1 − α, the boundary of the α · 100 %

261

single-sided CI (Θα ) has been found.

CR IP T

260

However, return levels of flood events are an explicit function of all three

263

parameters of the GEV. The TIB approach was customized in a previous

264

publication (Schendel and Thongwichian (2015) [29]). Before we discuss how

265

historical information should be included in the estimation of CI by the TIB

266

approach, we would like to shortly summarize the findings for solely system-

267

atic records. The problem in this situation is that only one constraint is avail-

268

able (the confidence level) but three variables are to determine. The main

269

idea was to maximize the likelihood function, such that the most likely set of

270

parameters (that suffice the respective confidence level) is chosen. Normally,

271

this would be a computationally demanding task, since it would require run-

272

ning several thousand bootstrap simulations for each set of two parameters

273

in order to determine the third one. However, since random values xi drawn

274

from the parameter set (a = aj , b = 1, c = 0) can be easily adjusted to every

275

GEV with the parameters (a = aj , b = bk , c = cl ) by:

PT

ED

M

AN US

262

277

(8)

we first draw for each aj B samples (with sample size N ), estimate the

CE

276

xnew = x i · bk + c l . i

parameters (b a, bb, b c) (using for example PL) and hence, calculate the return

levels of interest Q (for each sample). If we are interested in the (1−α)·100%

279

CI, we chose α/2 · (B + 1)th and 1 − α/2 · (B + 1)th smallest values of return

AC

278

280

levels [20]. In general, we will denote this values as qap , which is the pth

281

quantile of the return level of the chosen T -year return period for given a.

282

For each aj the parameters b and c can be derived as follows (Qest T denotes 15

ACCEPTED MANUSCRIPT

the T -year return level estimated from the observed sample, and L denotes

284

the log-likelihood function of the GEV): a − c) zi : = 1 + (xobs b i (Qest T − c) b = qap

CR IP T

283

(9)

N

N

X −1 (Qest − c) a+1X L = −N ln( T p )− ln(zi ) − zi a qa a i=1 i=1

AN US

N ∂L + = 0= ∂c (Qest T − c) N   1 X qap (xiobs − Qest −a T ) − (a + 1) . + z i 2 z (Qest − c) i T i=1

(10)

(11)

Equation 9 ensures that the pth quantile of the T -year return level matches

286

the T -year return level estimated from the observed sample. Eq. 11 allows

287

to numerically determine c and subsequently b. For each a, these parameters

288

enable us to determine the log-likelihood function (Eq. 10). By varying a, the

289

value a that maximizes Eq. 10 can be determined. The better performance of

290

this approach in comparison to other methods to estimate the CI was shown

291

for the GEV in [29].

PT

ED

M

285

In order to implement historical information, we would like to consider

293

first the situation depicted in Fig.1a with known historical period M . The

294

bootstrap runs in order to determine qap can be performed as described be-

295

fore, the historical information is included as follows: M values are randomly

296

drawn from the GEV, the k largest values are chosen (given that for the ob-

297

served time series k floods are known from the historical period) and for the

298

remaining values it is assumed that they are smaller than the kth largest

299

value. Using Eq. 1 with Eq. 3, Eq. 5, and Eq. 7, the desired return level can

AC

CE

292

16

ACCEPTED MANUSCRIPT

be estimated for each run and subsequently the qap can be determined. To

301

determine the parameters of the GEV that will eventually lead to the bound-

302

aries of the CI, Eq. 10 and Eq. 11 need to be altered as follows (S denotes the

303

smallest known historical flood, apart from that all known historical flood

304

events will be added to xiobs ):

AN US

a − c) zi : = 1 + (xobs b i a λ : = 1 + (S − c) b (Qest T − c) b = qap

CR IP T

300

(12)

M

N +k N +k X (Qest a+1 X −1 T − c) L = −(N + k) ln( )− ln(zi ) − zi a p qa a i=1 i=1   M 1 −(M − k)λ− a + ln   k

ED

a+1 S − Qest N +k ∂L = 0 = (M − k)λ− a qap est T 2 + est + ∂c (QT − c) (QT − c) N +k  1  X qap (xiobs − Qest −a T ) + z − (a + 1) . i 2 z (Qest − c) i T i=1

(13)

All estimations of CI in this article used B = 9999 simulation runs for deter-

306

mining qap .

For the other situation of historical information (Fig.1b), only L (the

CE

307

PT

305

time period between the first known historical flood and the beginning of

309

the systematic record) is known, while the historical period M needs to be

310

estimated. The correct approach in estimating CI using the TIB would be

311

to run several sets of bootstrap simulations with different M (in addition to

312

the varying parameters of the GEV), and use the value of M that maximizes

313

the likelihood function. Unfortunately, this would inevitably lead to M = L.

AC

308

17

ACCEPTED MANUSCRIPT

314

The reason can be illustrated using a simpler example: If a uniform distri-

315

bution is defined on the interval [0, N ] and a sample x1 , ..., xn is given, then

317

b = max(x1 , ..., xn ) the maximum likelihood estimation of N will lead to N

CR IP T

316

[4], which is unfortunately just a lower bound for N . Therefore, in order

to account somehow for the variability of the unknown variable M , we in-

319

terpreted M as a random variable for the bootstrap simulations. To derive

320

the distribution for M , let us first consider the case that k random numbers

321

are drawn from the uniform distribution on the interval [0, 1]. The proba-

322

bility density function f (g) for the maximum number g and the cumulative

323

distribution function F (g) are then given by:

AN US

318

f (g) = k g k−1

M

F (g) = g k .

To explain the expression derived for f (g), k − 1 numbers need to be lower

325

than g and due to permutation, each of the k numbers can be the maximum.

326

With these results, we can infer the result for the cumulative distribution

327

function F (L), where k floods are distributed on the interval [1, 2, ..., M + N ]

328

(assuming that k<
CE

PT

ED

324

F (L) =



L+N −1 M +N −1

k

.

We can solve this equation for M (r denotes a uniform random number which

330

represents F (L)):

AC

329

M =1+

L+N −1 1

rk

−N

(14)

331

For each single bootstrap run, a new random variable r is chosen, which leads

332

to a new estimation of M . Afterwards, in order to determine the parameters 18

15



10



● ●

5

relative bias in %

systematic M=L estimated M known M

● ●

0 0

200

400

600

return period in years

AN US

● ● ● ● ● ●

CR IP T

ACCEPTED MANUSCRIPT

800

1000

Figure 2: Relative bias in dependence on return period for a = 0.2 4) using only the floods were known during M .

M

systematic record;  using M = L; ◦) using estimate of M ; •) using true M = 200. Four

of the GEV that will eventually lead to the boundaries of the CI, for M the

334

estimation of Eq. 2 is used and the procedure is carried out the same way as

335

for known M .

336

3. Results

337

3.1. Effect of including historical information for bias and root mean square

PT

CE error

AC

338

ED

333

339

Bias and RMSE where studied for a various numbers of known floods

340

within the historical period, shape parameters of the GEV, and return pe-

341

riods. Although we cannot present all the results due to space limitations,

342

we will still discuss all the typical cases. In Fig. 2 the relative bias is shown 19

systematic k=1 flood k=2 floods k=4 floods



8



6





● ●

4

relative bias in %

10



● ●

2



0



0

200

400

600

return period in years

AN US



CR IP T

12

ACCEPTED MANUSCRIPT

800

1000

Figure 3: Relative bias in dependence on return period for a = 0.2 •) systematic record period M (estimated).

M

only;  one known flood; ◦) two known floods; 4) four known floods in the historical

for the case of four known flood events and a shape parameter a = 0.2 in

344

dependence on the return period for different methods of estimating M . Ob-

345

viously, M = L leads to a large bias while the estimation of M via Eq. 2

346

yields much better results. Using the correct value M = 200 reduces the bias

347

slightly more. Compared to the use of the systematic record only (N = 50),

348

the inclusion of historical information (using Eq. 2) seems to reduce the bias

349

for large return periods, while it may increase it for smaller return periods.

AC

CE

PT

ED

343

350

Fig. 3 depicts for the same shape parameter a = 0.2 the relative bias for

351

one, two, and four known floods in the historical period. The bias reduces

352

with the increase of the number of known floods. Again, inclusion of his-

353

torical flood events seems to reduce the bias for large return periods, while 20



k=1 flood k=2 floods k=4 floods

● ●









0



200



400



600

return period in years

AN US



CR IP T



0.50 0.55 0.60 0.65 0.70 0.75 0.80

ratio of RRMSE

ACCEPTED MANUSCRIPT

800

1000

Figure 4: Ratio of RMSE between using historical information vs. using the systematic

M

record only for a = 0.2 and •) one known flood event; ◦) two known flood events; ) four known flood events in the estimated historical period M .

it increases them for smaller ones, especially if only one flood is known. In

355

Fig. 4 the ratio of RMSE for using historical information versus using the

356

systematic record only is depicted for various return periods, using a = 0.2.

357

The more flood events are known in the historical period, the larger the re-

358

duction in RMSE. Moreover, this reduction is slightly more pronounced for

359

large return periods. In order to compare the effect of skewness of the parent

360

distribution on RMSE, Fig. 5 shows the ratio of RMSE for using historical

AC

CE

PT

ED

354

361

information (with four known floods) versus using the systematic record only

362

for four different values of the shape parameter a. It seems that the larger

363

a, the smaller the reduction of RMSE gets if historical information is used

364

(compared to the use of the systematic record only). Moreover, the increase 21

a=0 a=0.1 a=0.2 a=0.3



0.60 0.55

● ●

0.50















0.45



0

200

400

600

return period in years

AN US

ratio of RRMSE

0.65



CR IP T

0.70

ACCEPTED MANUSCRIPT

800

1000

Figure 5: Ratio of RMSE between using historical information vs. using the systematic

M

record only with four known floods and shape parameter •) a = 0; ◦) a = 0.1; ) a = 0.2; 4) a = 0.3. Historical period M estimated.

of RMSE reduction for large return periods is more pronounced for smaller

366

a. Finally, Fig. 6 compares the relative decrease of RMSE for the cases that

367

M needs to be estimated (using Eq. 2) or if M is known for two different

368

values of the shape parameter a (a = 0 and a = 0.3 (and one known historic

369

flood event)). While knowledge of M decreases RMSE further, this decrease

370

is rather small compared to the overall reduction of RMSE due to the im-

371

plementation of historical information: For example, for the 100-year return

AC

CE

PT

ED

365

372

level, a = 0, and estimated M , RMSE is 63.8% of RMSE if the systematic

373

record is used only, but 58.5% if M is known. For a = 0.3 the difference is

374

similar: 76.7% for estimated M , but 71.8% for known M .

22



0.8

M estimated; a=0 M known; a=0.0 M estimated; a=0.3 M known; a=0.3

0.7

ratio of RMSE

0.9



● ●

0.6









● ● ●

0

200

400

600

return period in years

AN US

0.5



CR IP T

1.0

ACCEPTED MANUSCRIPT

800

1000

Figure 6: Ratio of RMSE between using historical information vs. using the systematic

M

record only, for one known flood and •) a = 0 and estimated M ; ◦) a = 0 and known M ;  a = 0.3 and estimated M ; 4) a = 0.3 and known M .

3.2. Coverage error of the test inversion bootstrap

ED

375

In order to investigate the performance of the test inversion bootstrap

377

for confidence interval (CI) estimation, we chose a parent distribution with

378

the parameters a = 0.2, b = 1, c = 0, a systematic record length N = 50,

379

a known historical period M = 200 with one known flood event. We study

380

one-sided CI of flood events with return periods of 50, 100, 200, 500, and 1000

381

years for the following single-sided confidence levels: 2.5%, 5%, 10%, 90%,

AC

CE

PT

376

382

95%, and 97.5%. From the parent distribution, we draw 5000 samples and

383

determine the one-sided CI stated above. Then the percentage of samples,

384

were the true value of the parent distribution of the respective return level is

385

within the one-sided CI (which means lower than the boundary of the single23

ACCEPTED MANUSCRIPT

50-yr. r.l

100-yr r.l.

200-yr. r.l. 500-yr r.l.

1000-yr r.l.

2.5%

2.7

2.6

2.7

2.7

2.6

5%

5.4

5.0

5.0

4.8

10%

10.5

10.1

10.2

10.0

10.2

90%

91.1

91.1

91.2

90.9

90.7

95%

95.3

95.3

95.4

95.6

95.6

97.5% 97.4

97.6

97.6

97.7

97.9

CR IP T

CI

AN US

4.9

Table 1: Percentage of simulated results for which the single-sided confidence intervals (CI) (determined by test inversion bootstrapping) covered the true values of the return level (r.l.) of the parent distribution (GEV distribution with shape parameter a=0.2) for

M

different return periods for explicit known historical period M .

sided CI) will be determined. Consequently, deviations from the confidence

387

level can be regarded as coverage error. In Table 1 the respective percentage

388

of samples, where the true value is inside of the CI, is reported. The coverage

389

error is reasonable small. There seems to be a slight tendency to overestimate

390

the boundaries of large confidence levels (the upper boundary of double-sided

391

CI).

PT

CE

A similar investigation for the case that the historical period M is not

AC

392

ED

386

393

explicitly known (and therefore needs to be estimated), is numerically much

394

more cumbersome. Because L (number of years between first known flood

395

and begin of the systematic record) in Eq. 14 is different for each sample,

396

theoretically the whole bootstrap simulations to determine qap (used e.g. in 24

ACCEPTED MANUSCRIPT

50-yr. r.l

100-yr r.l.

200-yr. r.l. 500-yr r.l.

1000-yr r.l.

2.5%

3.2

3.0

3.2

2.9

2.9

5%

6.0

6.0

5.9

6.0

10%

11.4

11.6

11.5

11.8

11.8

90%

88.3

88.0

87.7

87.4

87.3

95%

93.4

93.1

93.0

92.8

92.9

97.5% 96.0

96.1

96.2

96.4

96.4

CR IP T

CI

AN US

6.0

Table 2: Percentage of simulated results for which the single-sided confidence intervals (CI) (determined by test inversion bootstrapping) covered the true values of the return level (r.l.) of the parent distribution (GEV distribution with shape parameter a=0.2) for

M

different return periods for unknown historical period M .

Eq. 13) need to be carried out for each of the 5000 samples. Moreover, the

398

overall rank (including the floods of the systematic record) of the largest

399

known flood within the historical period may vary from sample to sample,

400

which needs to require separate bootstrap simulations for each rank of the

401

known historical flood. Using the same parent distribution as before (in-

402

cluding the underlying historical period M of 200 years). The bootstrap

403

simulations were carried out not only for a variety of shape parameter a,

AC

CE

PT

ED

397

404

but also for the following L: 1, 50, 100, 150, 200 and rank of the historical

405

flood 1,2,3. This simulations made it possible to interpolate qap for all possi-

406

ble L. The results for the coverage probabilities are given in Table 2. The

407

boundaries of small and large confidence levels are both underestimated. For 25

ACCEPTED MANUSCRIPT

double-sided CI, this results into too narrow boundaries. The reason for the

409

larger coverage error in case of unknown M might be due to the fact that

410

the variability of M could not fully be implemented in the TIB approach (as

411

described in section 2.4).

412

3.3. Case Study: Gauge Cochem (Moselle)

CR IP T

408

The gauge Cochem at the river Moselle is located in the west of Germany.

414

Historical floods for this gauge have been investigated by Sartor et al. (2010)

415

[28]. In detail, for the four largest historical floods between 1572-1819, the

416

maximal streamflow was estimated to following values:

417

1572/1573: 4170-4400 m3 /s

418

1651: 4500 m3 /s

419

1740: 4170 m3 /s

420

1784: 5750 m3 /s

M

AN US

413

From the data source of the Federal Waterways and Shipping Adminis-

422

tration, Germany (WSV) (provided by the Federal Institute of Hydrology,

423

Germany (BfG)) maximal annual streamflow was available for the hydrolog-

424

ical years 1819-2014 (starting from 01.11. and ending at 31.10.). Hence, the

425

systematic record contains N = 196 years. The length of the true historical

426

period M is unknown, the first known flood starts in the hydrological year

427

1573. Therefore, L is given by 1819 − 1573 = 246. The minimal known

CE

PT

ED

421

historic flood had a streamflow of about 4170 m3 /s. Such a streamflow was

429

reached in the systematic record once (22.12.1993). Thus, we can assign an

430

order k = 5 to the minimal known flood event within the historical period.

431

Using Eq. 2, we can estimate M to M ≈ 334 years. However, before we use

432

Eq. 1, one point still needs to be considered: The maximal streamflow of the

AC

428

26

ACCEPTED MANUSCRIPT

return

estimate 95% CI m3

m3

historic information included

95% CI

estimate 95% CI

[

]

l.b. [

50-yr.

3790

3540

4250

3850

100-yr.

4080

3770

4700

4190

200-yr.

4350

3970

5150

4510

500-yr.

4690

4200

5750

4910

1000-yr. 4920

4350

6210

s

s

]

l.b. [

5200

m3 s

95% CI 3

] u.b. [ ms ]

3600

4110

3880

4530

4130

4960

4430

5540

AN US

s

] [

m3

level

s

] u.b. [

m3

CR IP T

systematic record only

4630

5970

Table 3: Estimated return levels for Cochem (Moselle) together with the lower (l.b.) and upper bound (u.b.) of the symmetric and double-sided 95 % confidence intervals (CI) for

M

various return periods with and without using historic flood events.

flood event in 1572/1573 is not known exactly, but somewhere in the inter-

434

val [4170 m3 /s;4400 m3 /s]. In order to deal with this uncertainty, we need

435

to insert following term into Eq. 1: F (4400 m3 /s) − F (4170 m3 /s). This

PT

ED

433

term describes the probability that the flood was smaller than 4400 m3 /s

437

but larger than 4170 m3 /s. Using the test inversion bootstrap approach as

438

described in section: 2.4, the double-sided 95% confidence interval is calcu-

439

lated with and also without utilizing historical flood events.The results are

AC

CE

436

440

presented in Table 3 and depicted in Fig. 7. Implementing historical flood

441

information reduces the length of the estimated confidence intervals for this

442

gauge.

27

● ● ● ●

3500

● ● ●

0

200

400

600

return period in years

AN US



CR IP T

5500 5000 4500

● ●

4000

streamflow in m^3/s

6000

ACCEPTED MANUSCRIPT

800

1000

Figure 7: Estimated upper (4) and lower (◦) bound of the 95% confidence interval as

M

well as the mean estimate () regarding extreme flood events at Cochem (Moselle) using

443

4. Discussion

ED

historical flood information (dashed line) vs. using the systematic record only (solid line).

In this article, we have shown that information about historical flood

445

events is very useful in flood frequency analysis. Especially, uncertainty due

446

to sampling error can be reduced substantially. However, for bias the situ-

447

ation is ambiguously: taking historical floods into account will reduce bias

448

for return levels of large return periods (where ”‘large”’ is determined by the

AC

CE

PT

444

449

length of the historical period and the number of known floods), but it will

450

introduce a slight bias for floods with smaller return periods. We have gen-

451

eralized and validated a method to estimate the true length of the historical

452

period (developed originally by Strupczewski et al. (2014) [31] for the largest 28

ACCEPTED MANUSCRIPT

historical flood) to the case of several known floods within the historical pe-

454

riod. Furthermore, we found that explicit knowledge of the true length of

455

the historical period M does only lead to minor improvements of bias and

456

RMSE compared to the use of estimated M . By utilizing different values of

457

the shape parameter a of the GEV, we found that reduction of RMSE due to

458

historical information is more pronounced for less skewed parent distributions

459

(small shape parameter a) and that for these cases RMSE also significantly

460

reduces more if the return period is increased. Finally, we have successfully

461

adapted the test inversion bootstrap for estimating confidence intervals for

462

the use of historical flood information.

AN US

CR IP T

453

We would like to emphasize that including historical floods will not in

464

general introduce an additional bias, only for return levels associated with

465

certain return periods. However, this bias is originally not an effect of the

466

unknown length of the historical period M (although this contributes to bias),

467

but seems to be a feature of the likelihood estimation when using censored

468

data. That bias can even be reduced for large return periods is indicative

469

that the shape parameter a is more precisely (with less bias) estimated due

470

to the use of historical information, but with the effect that in turn the scale

471

parameter b seems to increase. This can explain why bias increases for small

472

return periods, but decreases for larger ones. However, a positive bias is

473

in terms of risk assessment more desirable than a negative one (assuming

CE

PT

ED

M

463

the same absolute value). Given that for the maximum (as well as for the

475

penalized) likelihood the median bias is negative (using systematic record

476

only), a slight positive bias might be even welcome. More importantly, in

477

practice the parent distribution is not known, and therefore it is impossible

AC 474

29

ACCEPTED MANUSCRIPT

to distinguish between sampling error and bias. But we know that RMSE

479

is reduced substantially if historical information is included, which seems to

480

be an important justification for including historical information into flood

481

frequency analysis.

CR IP T

478

Our investigation has shown, that RMSE reduction is less pronounced

483

if the parent distribution is more positively skewed (implies larger shape

484

parameter a). Increasing a means that the distribution is more heavy-tailed

485

and hence, the probability distribution decreases more slowly for large and

486

increasing streamflow. Therefore, for large a the variability of extreme flood

487

events increases such that even small changes in the probability of exceedance

488

will lead to large changes of streamflow. Hence, extreme flood events carry

489

less information (resulting in less RMSE reduction) if the parent distribution

490

is more heavy-tailed.

M

AN US

482

We have confirmed the results of Hirsch and Stedinger (1987) [11] and

492

Strupczewski et al. (2014) [31] that assuming that the true historical period

493

is equal to the difference between the first known flood and begin of the

494

systematic record (M = L) is wrong (although often used in literature, like

495

in Frances et al. (1994) [8], Sartor et al. (2010) [28]). We have generalized

496

an idea (developed in [31]) for estimating M from one known flood in the

497

historic period to several ones. We could show that this estimate of M leads

498

to only marginal larger bias and RMSE compared to explicit knowledge of M .

CE

PT

ED

491

Therefore, we do not support the statement ”every effort should be made to

500

establish M as exact as possible” (Hirsch and Stedinger (1987) [11], similar

501

in Strupczewski et al. (2014) [31]), but believe that the estimation method

502

presented in this work is already a good starting point. Noteworthy, in

AC 499

30

ACCEPTED MANUSCRIPT

Frances et al. (1994) [8] it was already shown that at least for return periods

504

of interest, knowledge that a flood exceeds a certain threshold is nearly as

505

good as knowing the exact runoff. Thus, neither the true historic period nor

506

the exact magnitude of a historic flood event need to be known exactly for

507

benefitting from historical data. In contrast, Strupczewski et al. (2014) [31]

508

has provoked the question, whether including historical data in presence of

509

the uncertainties of the true historic period as well as flood magnitudes, ”is

510

worth a candle”. Given the facts, we can answer this question confidently:

511

It even deserves the spotlight!

AN US

CR IP T

503

One problem not tackled in this paper is the misspecification of the parent

513

distribution. While the Fisher-Tippet-Theorem states that the GEV is the

514

limiting distribution of a series of independent and identically distributed ob-

515

servations [7][10], smaller flood events (e.g. with return periods smaller than

516

2 years) might not be in the respective limit. Therefore, the GEV is not

517

the only valid choice as parent distribution. However, respecting the Fisher-

518

Tippett-theorem means that every function used to fit a series of annual

519

maximal streamflow must converge to the GEV at least for events associated

520

with large return periods. Since the GEV can be heavy-tailed (some moments

521

from a certain order on are infinite, and therefore not defined) for particular

522

parameter values, any other distribution used to fit annual maximal stream-

523

flow needs to have these feature too. This is for many distributions typically

CE

PT

ED

M

512

used in hydrology for fitting annual maximal streamflow, like the Weibull or

525

the Log-Pearson 3 distribution, simply not the case. In contrast, a version of

526

the generalized logistic distribution (used e.g. in the UK for flood frequency

527

analysis [15][22]) or the Wakeby distribution (see e.g. [14]) would be possible

AC 524

31

ACCEPTED MANUSCRIPT

choices. Anyway, for large return periods it is much more likely to assume the

529

validity of the Fisher-Tippet-Theorem than for smaller ones. Since known

530

historic floods are typically associated with large return periods, inclusion

531

of this historic data can even reduce errors caused by the wrong use of the

532

model distribution. This is simply because all the distributions will have the

533

same functional behaviour for extreme events.

CR IP T

528

Finally, we would like to present a possible application of this work to a

535

case that has nothing to do with historical flood events. Let us assume that

536

recently a very large flood has occurred. This flood leads to a reevaluation

537

of the flood frequency analysis. As long as the reevaluation of the flood

538

frequency analysis can be unambiguously attributed to this last flood event,

539

we need to expand the observation period. Otherwise, as we have seen for

540

historical floods, the results would be upward biased. By determining the

541

rank k of the last flood, additional (N − 1)/k (given a time series of length

542

N ) years in the future need to be assumed to be lower than the last flood

543

event. Ironically, studying the past leads to future implications.

544

References

546

548

M

ED

PT

single site and pooled frequency analysis. Hydrolog Sci J 2003; 48(1):2538 http://dx.doi.org/doi:10.1623/hysj.48.1.25.43485.

AC

547

[1] Burn DH. The use of resampling for estimating confidence intervals for

CE

545

AN US

534

[2] Carpenter

JR. J

Roy

Test Stat

inversion Soc

bootstrap

549

tervals.

550

http://dx.doi.org/doi:10.1111/1467-9868.00169.

32

Ser

B

confidence

in-

1999;61(1):159-172

ACCEPTED MANUSCRIPT

551

[3] Carpenter JR, Bithell J. Bootstrap confidence intervals:

When,

which, what?

A practical guide for medical statisticians. Stat

553

Med 2000;19(9):1141-1164 http://dx.doi.org/doi:10.1002/(SICI)1097-

554

0258(20000515)19:9¡1141::AID-SIM479¿3.0.CO;2-F.

CR IP T

552

[4] Castillo E, Hadib AS. A method for estimating parameters and quantiles

556

of distributions of continuous random variables. Comput Stat Data An

557

(1995);20(4):421-439 http://dx.doi.org/10.1016/0167-9473(94)00049-O.

558

[5] Chowdhury JU, Stedinger JR, Lu LH. Goodness of fit tests for re-

559

gional generalized extreme value flood distributions. Water Resour Res

560

1991;27(7):1765-1776 http://dx.doi.org/doi:10.1029/91WR00077.

AN US

555

[6] Coles SG, Dixon MJ. Likelihood-based inference for extreme value mod-

562

els. Extremes, Kluwer Academic Publishers, Boston, 1999;2(1):5-23

563

http://dx.doi.org/doi:10.1023/A:1009905222644.

ED

M

561

[7] Fisher RA, Tippett LHC. Limiting forms of the frequency distribution

565

of the largest or smallest member of a sample. Proc Cambridge Phil Soc

566

1928;24(2):180-190 http://dx.doi.org/10.1017/S0305004100015681.

568

tematic and historical or paleoflood data based on the two-parameter general extreme value models. Water Resour Res 1994;30(6):1653-1664

AC

569

[8] Frances F, Salas JD, Boes DC. Flood frequency analysis with sys-

CE

567

PT

564

570

http://dx.doi.org/doi:10.1029/94WR00154.

571

[9] Ga´al L, Szolgay J, Kohnov´a S, Hlavˇcov´a K. Inclusion of his-

572

torical information in flood frequency analysis using a Bayesian

573

MCMC technique:

a case study for the power dam Orl´ık, 33

ACCEPTED MANUSCRIPT

574

Czech Republic. Contributions Geophy Geodesy 2014;40(2):121-147

575

http://dx.doi.org/doi:10.2478/v10126-010-0005-5. [10] Gnedenko

577

mum

578

http://dx.doi.org/doi:10.2307/1968974.

579

BV.

d’une

[11] Hirsch

Sur

serie

RM,

la

distribution

aleatoire.

Stedinger

Ann

JR.

limite

du

terme

maxi-

CR IP T

576

of

Math

Plotting

1943;44(3):423-453

positions

for

historical

floods and their precision. Water Resour Res 1987;23(4):715-727

581

http://dx.doi.org/doi:10.1029/WR023i004p00715.

AN US

580

[12] Hosking JRM, Wallis JR, Wood EF. An appraisal of the regional flood

583

frequency procedure in the UK Flood Studies Report. Hydrol Sci J

584

1985;30(1):85-109 http://dx.doi.org/doi:10.1080/02626668509490973.

585

[13] Hosking

JRM,

M

582

Wallis

JR,

generalized

587

probability-weighted

588

http://dx.doi.org/doi:10.2307/1269706.

589

[14] Houghton JC. Birth of a parent:

by

Technometrics

PT

moments.

Estimation the

of

method

the of

1985;27(3):251-261

The Wakeby distribution

for modeling flood flows. Water Resour Res 1978;14(6):1105-1110

CE

591

EF.

distribution

ED

586

590

extreme-value

Wood

http://dx.doi.org/doi:10.1029/WR014i006p01105.

[15] Institute of Hydrology (IH): The Flood Estimation Handbook, Cen-

593

tre for Ecology and Hydrology, Wallingford, UK, 1999 ISBN:

594

9781906698003.

AC

592

595

[16] Katz

RW,

Parlange

MB,

Naveau 34

P.

Statistics

of

extremes

ACCEPTED MANUSCRIPT

596

in

597

http://dx.doi.org/doi:10.1016/S0309-1708(02)00056-8.

Water

Resour

2002;25(8-12):1287-1304

[17] Kysel´ y J. A Cautionary Note on the Use of Nonparamet-

599

ric

600

Models.

601

Adv

Bootstrap J

for

Estimating

Appl

Meteorol

CR IP T

598

hydrology.

Uncertainties Climatol

in

Extreme-Value

2007;47(12):3236-3251

http://dx.doi.org/doi:10.1175/2008JAMC1763.1.

[18] Lettenmaier DP, Wallis JR, Wood EF. Effect of regional heterogeneity

603

on flood frequency estimation. Water Resour Res 1987:23(2):313-323

604

http://dx.doi.org/doi:10.1029/WR023i002p00313.

AN US

602

[19] Madsen H, Pearson CP, Rosbjerg D. Comparison of annual maximum

606

series and partial duration series methods for modeling extreme hydro-

607

logic events, 2. Regional modeling. Water Resour Res 1997;33(4):759-769

608

http://dx.doi.org/doi:10.1029/96WR03849. [20] Makkonen

ED

609

M

605

L.

Bringing

the

611

http://dx.doi.org/doi:10.1080/03610920701653094.

Position

2008;37(3):460-467

CE

[21] Martins ES, Stedinger JR. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour Res 2000;36(3):737-744 http://dx.doi.org/doi:10.1029/1999WR900330.

AC

614

A-Theor

Plotting

PT

Controversy.

613

Stat

to

610

612

Commun

Closure

615

[22] McDonald N, Kjeldsen TR, Prosdocimi I, Sangster H. Reassessing flood

616

frequency for the Sussex Ouse, Lewes: the inclusion of historical flood

617

information since AD1650. Nat Hazards Earth Sys 2014;14(10):2817-

618

2828 http://dx.doi.org/doi:10.5194/nhess-14-2817-2014. 35

ACCEPTED MANUSCRIPT

[23] Morrison JE, Smith JA. Stochastic modeling of flood peaks using the

620

generalized extreme value distribution. Water Resour Res 2002;38(12):1-

621

12 http://dx.doi.org/doi:10.1029/2001WR000502.

CR IP T

619

622

[24] Obeysekera J, Salas JD. Quantifying the uncertainty of design floods

623

under nonstationary conditions. J Hydraul Eng 2014;19(7):1438-1446

624

http://dx.doi.org/doi:10.1061/(ASCE)HE.1943-5584.0000931.

[25] Park B, Demeritt D. Defining the hundred year flood: Bayesian

627

tainty in flood frequency estimates. J Hydrol 2016;540(9):1189-1208

628

http://dx.doi.org/10.1016/j.jhydrol.2016.07.025. [26] Payrastre

O,

for

Gaume

Andrieu

data

H.

cal

631

based

632

http://dx.doi.org/doi:10.1029/2010WR009812.

case

frequency

study.

Water

to

reduce

Usefulness

analyses:

Resour

Res

of

uncer-

histori-

Developments 2011;47(8):1-15

ED

a

flood

historic

630

on

for

E,

M

information

using

A

626

629

approach

AN US

625

[27] Rust HW, Kallache M, Schellnhuber HJ, Kropp JP. Confidence Intervals

634

for Flood Return Level Estimates Assuming Long-Range Dependence.

635

In: Kropp JP and Schellnhuber HJ, editors. In extremis. Disruptive

636

Events and Trends in Climate and Hydrology. Heidelberg: Springer;

CE

2011 http://dx.doi.org/doi:10.1007/978-3-642-14863-7 3.

AC

637

PT

633

638

639

640

[28] Sartor J, Zimmer KH, Busch N. Historische Hochwasserereignisse der deutschen Mosel. Wasser und Abfall 2010;10:46-51.

[29] Schendel T, Thongwichian R. Flood frequency analysis: Confidence in-

36

ACCEPTED MANUSCRIPT

terval estimation by test inversion bootstrapping. Adv Water Resour

642

2015;83:1-9 http://dx.doi.org/doi:10.1016/j.advwatres.2015.05.004.

643

[30] Stedinger JR, Lu LH. Appraisal of regional and index flood

644

quantile

645

http://dx.doi.org/doi:10.1007/BF01581758.

646

estimators.

[31] Strupczewski

WG,

Hydrol

Kochanek

K,

the

648

cal

649

http://dx.doi.org/doi:10.5194/nhess-14-1543-2014.

E.

largest

Flood

histori-

AN US

by

1995;9(1):49-75

Bogdanowicz

frequency

Nat

supported

Hydraul

647

flood.

analysis

Stoch

CR IP T

641

Hazards

Earth

Sys

2014;14(6):1543-1551

[32] Venzon DJ, Moolgavkar SH. A Method for Computing Profile-

651

Likelihood Based Confidence Intervals. Appl Stat 1988;37(1):87-94

652

http://dx.doi.org/doi:10.2307/2347496.

M

650

[33] Viglione A, Merz R, Salinas JL, Bl¨oschl G. Flood frequency hydrol-

654

ogy: 3. A Bayesian analysis. Water Resour Res 2013;49(2):675-692

655

http://dx.doi.org/doi:10.1029/2011WR010782.

657

PT

mators

In order to compare the performance of the maximum likelihood (MLE),

AC

658

Appendix A. Performance comparison of various likelihood esti-

CE

656

ED

653

659

penalized likelihood according to Eq. 6 by Coles and Dixon (1999) [6] (de-

660

noted as PLE1), and its modification presented in Eq. 7 (denoted as PLE2),

661

the mean bias, the median bias, and the relative root mean square error

662

(RRMSE) were determined. Therefore, for parent distributions (of the GEV)

37

ACCEPTED MANUSCRIPT

with various shape parameter a, 10,000 samples with sample size N = 50 were

664

drawn for estimating bias and RRMSE for return levels of different return

665

periods. The results are seen in Table A.4, A.5, A.6. PLE1 lowers the mean

666

bias significantly compared to MLE, even to the point that it introduces a

667

strong negative bias for smaller return periods. Similar, the already negative

668

median bias of MLE is further pronounced for PLE1. Therefore, it is also

669

not surprising that PLE1 has significant lower RMSE, since introducing neg-

670

ative bias for positively skewed distributions will to a certain extent lower

671

RMSE. In contrast, the median bias is only slightly lowered for PLE2 com-

672

pared to MLE. Regarding the mean bias, for return periods of 50 and 100

673

years, PLE2 also performs better than PLE1 and often improves the results

674

of MLE. For larger return periods, the mean bias is always smaller than for

675

MLE, especially for large shape parameter a. However, PLE1 yields typically

676

better results. Finally, PLE2 yields smaller RMSE values than MLE, espe-

677

cially for larger values of a. Nevertheless, the RMSE of PLE1 is smaller, but

678

this might be due the wide range of estimates being penalized - even those

679

with only slightly positive shape parameter a. To summarize, PLE2 seems to

680

substantially improve bias and RMSE compared to MLE, while avoiding the

681

heavy side effects of PLE1, namely the considerable larger negative median

682

bias and negative mean bias for smaller return periods.

AC

CE

PT

ED

M

AN US

CR IP T

663

38

ACCEPTED MANUSCRIPT

CR IP T

Table A.4: Mean bias in % for return levels (r.l.) of different return periods of the parent

GEV distribution with various shape parameter a. MLE denotes maximum likelihood estimation, PLE1 penalized likelihood estimation according to Eq. 6, and PLE2 penalized likelihood estimation according to Eq. 7.

method

50-yr. 100-yr 200-yr. 500-yr 1000-yr r.l.

r.l.

r.l.

r.l.

0.0 MLE

-0.9

-0.1

1.0

2.8

4.5

PLE1

-2.5

-2.2

-1.7

-0.7

0.3

PLE2

-1.1

-0.3

0.7

2.5

4.1

0.1 MLE

0.6

2.1

3.9

7.0

9.9

PLE1

-2.8

-2.3

-1.6

-0.3

1.1

PLE2

0.0

1.3

2.9

5.6

8.1

PT

ED

M

r.l.

0.2 MLE

2.6

4.8

7.6

12.5

17.1

PLE1

-3.4

-3.0

-2.2

-0.6

1.1

PLE2

0.9

2.5

4.6

8.1

11.5

0.3 MLE

4.8

7.9

12.0

19.0

25.8

PLE1

-5.1

-4.9

-4.3

-3.0

-1.5

PLE2

0.4

2.0

4.2

7.9

11.4

CE AC

AN US

a

39

ACCEPTED MANUSCRIPT

CR IP T

Table A.5: Median bias in % for return levels (r.l.) of different return periods of the parent GEV distribution with various shape parameter a. MLE denotes maximum likelihood estimation, PLE1 penalized likelihood estimation according to Eq. 6, and PLE2 penalized likelihood estimation according to Eq. 7.

method

50-yr. 100-yr 200-yr. 500-yr 1000-yr r.l.

r.l.

r.l.

r.l.

0.0 MLE

-2.9

-3.1

-3.5

-3.8

-3.9

PLE1

-3.9

-4.3

-4.8

-5.3

-5.6

PLE2

-2.9

-3.1

-3.6

-3.8

-4.0

0.1 MLE

-2.5

-2.7

-3.1

-3.3

-3.3

PLE1

-5.2

-5.9

-6.8

-7.9

-8.6

PLE2

-2.7

-2.9

-3.3

-3.5

-3.5

PT

ED

M

r.l.

0.2 MLE

-2.3

-2.2

-2.5

-2.4

-2.4

PLE1

-6.5

-7.5

-8.6

-9.8

-11.0

PLE2

-2.9

-2.8

-3.2

-3.4

-3.4

0.3 MLE

-2.0

-1.9

-2.0

-1.6

-1.7

PLE1

-8.8

-10.0

-11.3

-13.4

-14.7

PLE2

-3.9

-4.0

-4.5

-4.8

-4.9

CE AC

AN US

a

40

ACCEPTED MANUSCRIPT

CR IP T

Table A.6: Relative Root mean square error (RRMSE) in % for return levels (r.l.) of

different return periods of the parent GEV distribution with various shape parameter a. MLE denotes maximum likelihood estimation, PLE1 penalized likelihood estimation according to Eq. 6, and PLE2 penalized likelihood estimation according to Eq. 7.

50-yr. 100-yr 200-yr. 500-yr 1000-yr r.l.

r.l.

r.l.

r.l.

0.0 MLE

21.5

25.4

30.0

37.2

43.5

PLE1

19.9

22.9

26.4

31.7

36.3

PLE2

21.2

25.0

29.4

36.1

42.0

0.1 MLE

26.2

31.7

38.3

49.0

59.0

PLE1

23.2

27.1

31.6

38.6

44.8

PLE2

25.2

30.1

35.9

45.0

53.2

PT

ED

r.l.

0.2 MLE

31.7

39.3

48.6

64.2

79.5

PLE1

26.4

31.0

36.3

44.7

52.1

PLE2

28.8

34.7

41.7

52.9

63.1

0.3 MLE

38.0

48.1

60.8

83.0

105.6

PLE1

28.7

33.5

39.1

47.8

55.4

PLE2

31.2

37.4

44.8

56.5

67.2

CE AC

AN US

method

M

a

41