Probability plotting position formulas for flood records with historical information

Probability plotting position formulas for flood records with historical information

Journal of Hydrology, 96 (1987) 18~199 Elsevier Science Publishers B.V., Amsterdam - - Printed in The Netherlands 185 PROBABILITY PLOTTING POSITION ...

737KB Sizes 0 Downloads 70 Views

Journal of Hydrology, 96 (1987) 18~199 Elsevier Science Publishers B.V., Amsterdam - - Printed in The Netherlands

185

PROBABILITY PLOTTING POSITION FORMULAS FOR FLOOD RECORDS WITH HISTORICAL INFORMATION

ROBERT M. HIRSCH

U.S. Geological Survey, Mail Stop 410, National Center, Reston, VA 22092 (U.S.A.) (Accepted for publication November 12, 1986)

ABSTRACT Hirsch, R.M., 1987. Probability plotting position formulas for flood records with historical information. In: W.H. Kirby, S.-Q. Hua and L.R. Beard (Editors), Analysis of Extraordinary Flood Events, J. Hydrol., 96: 18~199. For purposes of evaluating fitted flood frequency distributions or for purposes of estimating distributions directly from plots of flood peaks versus exceedance probabilities (either by subjective or objective techniques), one needs a probability plotting position formula which can be applied to all of the flood data available: both systematic and historic floods. Some of the formulas in use are simply extensions of existing formulas (such as Hazen and Weibull) used on systematic flood records. New plotting position formulas proposed by Hirsch and Stedinger (1986) and in this paper are based on a recognition that the flood data arises from partially censored sampling of the flood record. The theoretical appropriateness, bias in probability and bias in discharge of the various plotting position formulas are considered. The methods are compared in terms of their effects on flood frequency estimation when an objective curve-fitting method of estimation is employed. Consideration is also given to the correct interpretation of the historical record length and the effect of incorrectly assuming that record length equals the time since the first known historical flood. This assumption is employed in many flood frequency studies and may result in a substantial bias in estimated design flood magnitudes. INTRODUCTION T h e v a l u e of h i s t o r i c a l or p a l e o f l o o d i n f o r m a t i o n , for i m p r o v i n g e s t i m a t e s of q u a n t i l e s of flood f r e q u e n c y d i s t r i b u t i o n s , h a s b e e n d i s c u s s e d b y B e n s o n (1950), L e e s e (1973), C h e n et al. (1974), G e r a r d a n d K a r p u k (1979), C o n g et al. (1979), I n t e r a g e n c y A d v i s o r y C o m m i t t e e o n W a t e r D a t a ( I A C W D ) (1982), S t e d i n g e r a n d C o h n (1986), a n d o t h e r s . T h e q u a n t i l e s of g r e a t e s t i n t e r e s t for p u r p o s e s of d e s i g n of s p i l l w a y s a n d b r i d g e s or for p u r p o s e s of flood p l a i n z o n i n g , p l a n n i n g o r i n s u r a n c e a r e v e r y e x t r e m e ; for e x a m p l e , q u a n t i l e s w i t h e x c e e d a n c e proba b i l i t i e s of 0.02, 0.01, 0.002 o r e v e n 0.001 a r e o f t e n s o u g h t . T h e l e n g t h of t h e s y s t e m a t i c ( g a g e d ) r e c o r d is t y p i c a l l y v e r y s h o r t . I n p a r t i c u l a r , t h e e x c e e d a n c e p r o b a b i l i t y of i n t e r e s t is t y p i c a l l y s m a l l e r t h a n 1In w h e r e n is t h e r e c o r d l e n g t h . T h u s , it is n o t u n c o m m o n for t h e e x p e c t e d n u m b e r of e x c e e d a n c e s of t h e q u a n t i l e of i n t e r e s t i n t h e s y s t e m a t i c r e c o r d to b e less t h a n 1. G i v e n t h a t n o a s s u r a n c e is a v a i l a b l e t h a t t h e floods a t a g a g e f o l l o w a d i s t r i b u t i o n of a

0022-1694/87/$03.50

~ 1987 Elsevier Science Publishers B.V.

186

particular shape as specified by distribution type and coefficient of skewness, the quantile estimation can be expected to be highly uncertain. If information is available about some large floods prior to the period of record, then it is particularly tempting to try to incorporate this information into formal frequency analysis or into assessment of the quality (typically by graphical means) of a particular fitted distribution. This historical or paleohydrologic data may arise from official government records, from the press, from physical evidence (mud lines, slack water deposits), from botanical evidence, from labeled highwater marks (typically on buildings or bridges) or from oral history. This paper will refer to all such floods as "historical floods." In general, the raw information on the historical floods is stage information. In this paper it will be assumed that this stage information can be accurately transformed to discharge information. Historical flood records must possess certain features if they are to be used in frequency analysis. The basic requirement is that, for every such flood, one must be able to state its rank as an annual flood within some particular period of time. In general, our knowledge of historical floods is stated in terms of a particular flood discharge occurring in a particular year, with no information provided about any other years. If we were, in fact, utterly ignorant about the magnitude or rank of the annual flood in these other years, then the known flood discharges would be of virtually no value in estimation. They could not be ranked and plotted in some probability context, and they could not be used in moment or maximum likelihood estimation. On close examination of a record, it is usually apparent that statements can be made about the years without a known historical flood. Our knowledge of a flood occurrence in a particular year comes about because a flood is viewed as extra ordinary or is extraordinary in some physical sense (related to the creation of lasting evidence). The lack of knowledge about a flood in some year in the past means one of two things: either the cultural sophistication did not exist to note and record evidence of an extraordinary flood (in a form which would persist to the present), or the cultural sophistication existed but no e xtr ao r d in ar y flood occurred. If the former is the case, then the particular year cannot be considered part of the sample. If the latter is the case, then the year does belong in the sample, and that sample must be viewed as a censored sample: one where floods over some threshold in magnitude (denoted as extraordinary) are considered to be known in magnitude, and those below the threshold are simply known to be below the threshold but of unknown magnitude. The difficulties with evaluating a record are therefore of two types: the first is identifying the threshold and the second is determining which years are in the sample and which are not. Typically the threshold (denoted here as Q0) is the discharge at which serious economic damage and disruption occurs. In some cases Q0 is not easily determined, but an upper bound estimate of Q0 may be established equal to the smallest known historical flood. The accurate determination of Q0 is, in fact, not essential to the effective use of methods described here (see Hirsch and

187 Stedinger, 1987). W h a t is i m p o r t a n t a b o u t Q0 is t h a t the concept be recognized and a plausible v a l u e be selected. The q u e s t i o n of r e c o r d l e n g t h is obviously closely r e l a t e d to the definition of Q0. If one can r e a s o n a b l y estimate Q0 for a p a r t i c u l a r year, t h e n t h a t y e a r can be viewed as a p a r t of the censored record. F o r y e a r s which come between two k n o w n historical floods or between a historical flood and the b e g i n n i n g of the systematic r e c o r d the d e t e r m i n a t i o n is easy. The difficulty arises with years preceding the first k n o w n historical flood. If we know t h a t Q0 was exceeded in, say, 1895, it is r e a s o n a b l e to t h i n k t h a t Q0 was also exceeded in 1894, but we are u n a w a r e of it? P r o b a b l y not. The o c c u r r e n c e of " b a c k - t o - b a c k " e x t r a o r d i n a r y floods would likely be notable unless the floodplain suddenly c h a n g e d from being t o t a l l y u n i n h a b i t e d to i n h a b i t e d b e t w e e n the flood seasons of 1894 and 1895. If we r e p e a t the question for 1893, t h e n 1892 and so on, we may become progressively less sure of the answer. The only way to a r r i v e at a r e a s o n a b l e a n s w e r a b o u t the r e c o r d length is to examine i n f o r m a t i o n about the h i s t o r y and r e c o r d keeping of the area. It is u n l i k e l y t h a t u n e q u i v o c a l conclusions can be r e a c h e d a b o u t the t r u e r e c o r d length, but a r e a s o n a b l e estimate can be made, some bounds on the t r u e length can be stated, and the sensitivity of s u b s e q u e n t analysis to the assumed r e c o r d length can be evaluated. M a n y examples in the l i t e r a t u r e a p p e a r to assume the r e c o r d l e n g t h to equal the n u m b e r of years from the first historical flood to the present. This is a lower bound estimate and, as will be seen below, can lead to some serious biases in r e s u l t i n g estimates. PLOTTING POSITIONS -- DEFINITIONS The d a t a to be considered are assumed to be of the following form. T h e r e are a total o f g k n o w n floods. Of these floods, k of them are k n o w n to be the k largest in a period of n years. The n y e a r period c o n t a i n s w i t h i n it some systematic (gaged) r e c o r d period of s years (s < n). Of the k largest floods, e of them o c c u r r e d d u r i n g the systematic r e c o r d (e ~< k, e ~< s). Note also t h a t g = k + s - e. We will assume t h a t t h e r e is a t h r e s h o l d of p e r c e p t i o n Q0 such t h a t the k largest floods are larger t h a n or equal to it and the r e m a i n d e r are smaller t h a n it, and t h a t we k n o w a b o u t the k - e k n o w n floods in the historical period (pre-systematic record) specifically because t h e y were g r e a t e r t h a n or equal to Q0The q u a n t i t i e s which we w a n t to estimate are Pi, the probability of exc e e d a n c e of the ith largest of these g floods. The estimates will be denoted as ~5i. The use of these/5~ values is two-fold. One is for plotting the d a t a on probability plots. T h a t is a plot of flood m a g n i t u d e Qi (where Qi is the magnitude of the ith largest flood) versus ~5i. Such plots t y p i c a l l y display Qi on a linear or log scale versus Z / o n a l i n e a r scale where Zi = FN105i) w h e r e FN 1 is the s t a n d a r d n o r m a l c u m u l a t i v e d i s t r i b u t i o n f u n c t i o n inverse. The o t h e r use of the ~5i is in objective c u r v e fitting t e c h n i q u e s (see Cong et al., 1979).

188 This p a p e r p r e s e n t s five different p l o t t i n g position f o r m u l a s w h i c h a r e based on different o b j e c t i v e s or different u n d e r s t a n d i n g s of the n a t u r e of the data. T h e e s t i m a t e s from the f o r m u l a s will be c o m p a r e d , for two a c t u a l d a t a sets, to give a g e n e r a l i m p r e s s i o n of the m a g n i t u d e of the differences b e t w e e n results. T h e rules will be c o m p a r e d in t e r m s of bias - - b o t h the bias in p r o b a b i l i t i e s and the bias in discharge. T h e bias in p r o b a b i l i t y is i n d e p e n d e n t of d i s t r i b u t i o n type. T h e bias in d i s c h a r g e depends on d i s t r i b u t i o n a n d is e v a l u a t e d for a G u m b e l distribution, the l o g n o r m a l d i s t r i b u t i o n and the g a m m a distribution. Finally, t h r e e of the p l o t t i n g position rules will be used as a p a r t of a n o b j e c t i v e curve-fitting t e c h n i q u e , a n d a limited M o n t e C a r l o s t u d y d e m o n s t r a t e s the biases and r o o t m e a n s q u a r e e r r o r s of e s t i m a t e d 100-year floods. TRADITIONAL RULES P r o b a b i l i t y p l o t t i n g positions in s i t u a t i o n s w h e r e no h i s t o r i c a l floods are c o n s i d e r e d are t y p i c a l l y s t a t e d as: Pi

--

i -a n+l-2a

(1)

w h e r e a is a c o n s t a n t . F o r the Weibull (1939) rule a = 0, for the G r i n g o r t e n (1963) rule a = 0.44, a n d for the H a z e n (1914) rule a = 0.5. In this p a p e r a t t e n t i o n will be limited to t h e s e t h r e e a values. In the u n c e n s o r e d case the Weibull is k n o w n to p r o d u c e u n b i a s e d e s t i m a t e s of p r o b a b i l i t i e s and the Gring o r t e n is k n o w n to be n e a r l y u n b i a s e d in d i s c h a r g e for a v a r i e t y of flood-like d i s t r i b u t i o n s (see N a t u r a l E n v i r o n m e n t a l R e s e a r c h Council, 1975; C u n n a n e , 1978). In o r d e r to m a k e the f o r m u l a s a p p l i c a b l e to s i t u a t i o n s w i t h h i s t o r i c a l flood data, t h e y h a v e b e e n modified to a c c o u n t for the fact t h a t the k l a r g e s t are r a n k e d in the n y e a r period but the r e m a i n i n g (s - e) floods are o n l y r a n k e d in the s h o r t e r s y e a r period. A l t h o u g h m i n o r v a r i a t i o n s exist in the t r e a t m e n t of b e l o w - t h r e s h o l d floods, the modifications m a d e by B e n s o n (1950), Cong et al. (1979), I A C W D (1982), c a n be c h a r a c t e r i z e d by the formula:

i-a n+l-2a' t5 =

i=1

....

k

k - a (n - a + 1 - k) (i - k - a) n + 1 2a + (n + 1 - 2a) (s - e + 1 - 2 a ) '

(2) i=k+l

.....

g

T h e s e f o r m u l a s a r e r e f e r r e d to h e r e i n as the Weibull (W) f o r m u l a w h e n a = 0, G r i n g o r t e n (G) f o r m u l a w h e n a = 0.44, a n d H a z e n (H) f o r m u l a w h e n a = 0.5. All of t h e m a s s u r e that/~i < /5i+ 1 for all i < g - 1.

189 EXCEEDANCE-BASED RULES

As an a l t e r n a t i v e to t h e s e t h r e e rules, one c a n first c o n s i d e r the p r o b a b i l i t y (Pe) t h a t an a n n u a l flood will e q u a l or exceed the t h r e s h o l d of p e r c e p t i o n Q0. Ifpe w e r e k n o w n r a t h e r t h a n e s t i m a t e d , t h e n an u n b i a s e d e s t i m a t e o f p i would be: ~Pe,

i=1

I/

Pi

.....

k

(3)

i - k

po + (1 - po)

e +

(s

i = k + 1. . . . .

1)'

g

This r e s u l t is b a s e d on the fact t h a t Pi is a n o r d e r s t a t i s t i c of a u n i f o r m l y d i s t r i b u t e d r a n d o m variable. T h e s e o r d e r s t a t i s t i c s follow a b e t a d i s t r i b u t i o n r e g a r d l e s s of the d i s t r i b u t i o n of d i s c h a r g e ( J o h n s o n and Kotz, 1970, pp. 37-41). T h e b o u n d s of these b e t a d i s t r i b u t i o n s are 0 to Pe for floods a b o v e the t h r e s h o l d and Pe to 1.0 for floods below the threshold. T h e e x c e e d a n c e p r o b a b i l i t y for Q0 is, of course, u n k n o w n , but a n a t u r a l e s t i m a t o r f o r p i would use the m a x i m u m likelihood e s t i m a t e ofp~ in place Of Pe in eqn. (1). The m a x i m u m likelihood (and m e t h o d of m o m e n t s ) e s t i m a t o r Of Pe is kin. Thus, the p l o t t i n g position formula, w h i c h relies on the Weibull (1939) c o n c e p t of u n b i a s e d p r o b a b i l i t i e s as well as a r e c o g n i t i o n of this e x c e e d a n c e p r o b a b i l i t y is

f k +i 1 kn ' /)i =

k -+ n

(n - k) n

i=1 (i(s-e+

k) 1)'

i

..... k+l

k .....

(4) g

This r u l e is d e s i g n a t e d the E - W rule. It was i n t r o d u c e d by H i r s c h a n d S t e d i n g e r (1986). This rule c a n be g e n e r a l i z e d j u s t as eqn. (2) is generalized, as: k i a k k+l-2an'

:~

i=1

....

k

=

(5) n - k (i - k - a) + - n (s - e + 1 - 2 a ) '

i

=

k + 1,

...,g

One c a n form an E x c e e d a n c e - W e i b u l l ( E - W ) f o r m u l a by s e t t i n g a = 0, an E x c e e d a n c e - G r i n g o r t e n ( E - G ) f o r m u l a with a = 0.44, a n d an E x c e e d a n c e ~ H a z e n f o r m u l a (E H) w i t h a = 0.5. It should be n o t e d t h a t the G a n d E G f o r m u l a s tend to give e s t i m a t e s w h i c h are v e r y close, a n d the differences d e c r e a s e w i t h i n c r e a s i n g v a l u e s of k. F o r the k floods ~> Q0, the H a n d E - H f o r m u l a s a r e identical, and for the r e m a i n i n g floods t h e y a r e v e r y close. Thus,

1 2 3 4

36,000 22,000 17,700 16,000

0.00050 0.0192 0.0373 0.0555

0.00100 0.0192 0.0373 0.0555

0.00050 0.0113 0.0298 0.0483

0.00056 0.0109 0.0294 0.0478

G 0.00050 0.0098 0.0283 0.0468

H 2000 52.2 26.8 18.0

EW

EG

EW

W

1/15 i

15~

1001 52.2 26.8 18.0

W 2000 88.2 33.6 20.7

EG

1786 91.7 34.0 20.9

G

2000 102.5 35.4 21.4

H

i

1 2 3 10 26 27

Q(ft~s 1)

1,020,000 740,000 654,000 482,000 343,000 338,000

(H)

H

0.0049 0.0099 0.0148 0.0494 0.1284 0.1454

0.0051 0.0102 0.0153 0.0510 0.1327 0.1447

0.0029 0.0080 0.0131 0.0488 0.1305 0.1402

0.0029 0.0080 0.0131 0.0490 0.1310 0.1378

0.0026 0.0077 0.0128 0.0487 0.1308 0.1369

202.5 101.2 67.5 20.2 7.8 6.9

EW

G

EW

E-G

1/~i

pi W

196.0 98.0 65.3 19.6 7.5 6.9

W

349.8 125.6 76.5 20.5 7.7 7.1

EG

348.4 125.1 76.2 20.4 7.6 7.2

G

390.0 130.0 78.0 20.5 7.7 7.3

H

Plotting positions (/5~) and r e c u r r e n c e i n t e r v a l s (1/,6i) for several of the l a r g e s t floods, S u s q u e h a n n a River at H a r r i s b u r g , Pa., U.S.A. n = 195, s - 91, k - 26, e = 20. P l o t t i n g position f o r m u l a s are: E x c e e d a n c e ~ Weibull (E W), Weibull (W), E x c e e d a n c e ~ G r i n g o r t e n (E G), G r i n g o r t e n (G) a n d H a z e n

TABLE 2

i

Q(m3s 1)

Plotting positions (/~i) and r e c u r r e n c e i n t e r v a l s (1//5,) for the four largest floods, Yellow River at San-Men-Xia, China. n - 1000, s = 54, k = 1, e = 0. Plotting position f o r m u l a s are: Exceedance~Weibull (E W), Weibull (W), E x c e e d a n c e ~ G r i n g o r t e n (E G), G r i n g o r t e n (G) and H a z e n (H)

TABLE 1

191

for this paper the E-H formula is not considered as separate from the H formula. EXAMPLE COMPARISONS

Tables 1 and 2 list the plotting positions and their reciprocals (recurrence intervals) for several selected floods at two sites with historical information: the Yellow River at San Men Xia, China (Teng, 1984), and the Susquehanna River at Harrisburg, Pennsylvania, U.S.A. (data from U.S. Geological Survey files, 1985). These two stations provide two disparate examples. The Yellow River is a case with an exceptionally high n value and only one extraordinary flood. The Susquehanna record is much shorter but contains many floods. BIAS IN PROBABILITY

One way to evaluate the various plotting position formulas is in terms of their bias with respect to probability. It may be argued that a desirable feature of a plotting position is that E[pi] - E[15i] should be small, where E[/5~] is the expected value of the probability plotting position of the ith largest flood and E[pi] is the expected value of the true exceedance probability of the ith largest flood. It is obvious that the Pi values are random variables, because the size of the ith largest flood is a random variable. It is less obvious that/~i is a random variable. For all the rules stated above, for i > k the plotting position/5i depends on k and e which are both random variables. For i < k,/5i is also random for the E-W and E-G formulas because it depends on the random variable k. In general, both E[p~] and E[/~i] are functions ofpe. In particular, for i ~< k (which are the cases of greatest interest in terms of design flood estimation), these expectations are computed as follows:

E[pi] = ~

i p ~ Prob[klk /> i]

k=lk+

(6)

1

and: E[pi]

/Si(k) Prob[klk >~ i]

=

(7)

k 1

where

and~i (k) denotes the particular formula for ~i, eqn. (2) or (5),which may, or m a y not, be a function of k. Figure I shows E[pl] and E[~I] for all five plotting position rules for n = 150 and k j- I. Cases where k = 0 were not considered in the bias calculations

192

E[p,]and E[p'~] n=150, k>1 0.01

>_1 ~n

< ,",n 0 rY

Legend TRUE WEIBULL GRINGORTEN E-WEIBULL E-GRINGORTEN

HAZEN 0.001

. . . . . .

0.3

,

1

.

.

EN

.

.

.

.

.

1~3

Fig. 1. Bias in probability. The expected value of the exceedance probability of the largest flood,

E[p 1] (curve marked "true"), and the expected value of the plotting position for the largest flood, E[fi~], for the five plotting position formulas: as a function of E[k], for n = 150, k > 1. because they are cases where the largest flood could not be ranked in the 150 yr period but only within some shorter s yr period. Fig. 1 presents these expected probabilities as a function of Elk] where E[k] = n Pe. Note that E[k] is unknown in practice and thus it would be impossible to set #, to equal E[pl] in general. The figure shows that for high values of E[k] (say, 5 or more), the W rule is essentially unbiased in probability. In fact, asymptotically as E[k] --, n, the W rule is unbiased in probability. For very low values of Elk], all five formulas have E[/51] >> E[pl]. In some sense the E-W rule may be viewed as a good compromise choice with nearly the lowest E[#I] for low values of Elk] and E[#I] for high values of E[k] which closely approximate the unbiased W rule. BIAS IN DISCHARGE It may be argued that a plotting position should be relatively unbiased in terms of discharges rather than in terms of probabilities (see Cunnane, 1978). Bias in discharge is defined as the difference between E[Qi] (the expected value of the ith largest flood) and E[F 1(1 #i)] (the expected value of the population cumulative distribution function evaluated at 1 minus the probability plotting position). Unlike the bias in probabilities, the bias in discharge is dependent on the distribution of floods. The evaluation of bias in discharge is only considered here for i ~< k. For given values ofpe, n and i, the expectation of Qi, (i <~ k) is:

193 TABLE 3 Ratio of expected value of discharge with exceedance probability/~1, to the expected value of largest flood for n = 150, as a function of the expected number of exceedances Elk], and distribution, given k > 1. Plotting position formulas are: Exceedanc~Weibull (E W), Weibull (W), Exceedance Gringorten (E G), Gringorten (G) and Hazen (H)

E[k]

Distribution type

C~

Plotting position EW

W

EG

G

H

0.75 0.75 0.75 0.75 0.75

Lognormal Gamma Gumbel Lognormal Gamma

1 1 1.14 4 4

0.90 0.91 0.89 0.79 0.83

0.83 0.84 0.83 0.68 0.72

0.91 0.92 0.90 0.81 0.84

0.90 0.91 0.89 0.79 0.82

0.91 0.92 0.91 0.81 0.84

1.5 1.5 1.5 1.5 1.5

Lognormal Gamma Gumbel Lognormat Gamma

1 1 1.14 4 4

0.94 0.95 0.94 0.86 0.89

0.88 0.89 0.88 0.76 0.80

0.96 0.96 0.96 0.90 0.92

0.94 0.95 0.94 0.86 0.89

0.96 0.97 0.96 0.91 0.93

4.5 4.5 4.5 4.5 4.5

Lognormal Gamma Gumbel Lognormal Gamma

1 1 1.14 4 4

0.96 0.96 0.95 0.89 0.91

0.93 0.93 0.92 0.84 0.87

1.00 1.01 1.00 0.97 0.99

1.00 1.00 1.00 0.96 0.99

1.01 1.01 1.01 0.99 1.01

30 30 30 30 30

Lognormal Gamma Gumbel Lognormal Gamma

1 1 1.14 4 4

0.93 0.94 0.93 0.84 0.87

0.93 0.93 0.93 0.84 0.87

1.00 1.00 1.00 0.97 0.99

1.00 1.00 1.00 0.97 0.99

1.01 1.02 1.02 0.99 1.01

E[Qi] =

"_L ~ E[QilPe, k] Prob[klk >/ i] k

Prob[klk and was over the value of

E[F

i

~ i] is d i s t r i b u t i o n f r e e , b u t E[Qi~e, k] d e p e n d s o n t h e d i s t r i b u t i o n e s t i m a t e d b y M o n t e C a r l o s i m u l a t i o n ( w i t h a t o t a l o f 20,000 r e p e t i t i o n s set of k values which have non-negligible probabilities). The expected t h e d i s c h a r g e f o r t h e g i v e n / ~ i v a l u e s is e v a l u a t e d by:

1(1 - / ~ i ) ]

=

~. F k

111 - / ~ ( k ) ]

P r o b [ k l k >/ i]

i

w h e r e F 1 [1 - / ~ i (k)] is t h e d i s c h a r g e w i t h t r u e e x c e e d a n c e p r o b a b i l i t y of/~i (k). T h e s e e x p e c t e d v a l u e s w e r e d e t e r m i n e d f o r n = 150 a n d f o r i = 1 a n d 2 (k ~ i), f o r f i v e d i f f e r e n t d i s t r i b u t i o n s : L o g n o r m a l w i t h Cs = 1 a n d 4, G a m m a w i t h C~ - 1 a n d 4, a n d G u m b e l (Cs = 1.14). I n a l l c a s e s t h e C~ - 0.5. T a b l e 3 provides a summary of the results, and Fig. 2 shows an example of the full r e s u l t s f o r o n e d i s t r i b u t i o n ( G a m m a , Cs = 1), f o r i = 1. T h e r e s u l t s s h o w t h a t for low values of

E[k],

all five plotting position formulas have severe negative

194

q0,] ond

n:150,k_~l GAMMA,CV--0.5,CS=I.0 7

L' I

6

5~

~

~

"1

w

o £E < :2: o

Legend TRUE WEIBULL_

\

'

GRINGO_RTEN

EE :--:-:;LR~[ -G~E N

~ .

6~~7_ ===:----:-:~<:~.~~-==: :=:=:c: ...............=--~-=~--_-.--HAZE" 5.5

200 YRFLOOO

~ -

- 150 YRFLOOD

5

,

0.3

....

,i

,

1

,

,

, , , ,p

10

E[k] Fig. 2. Bias in discharge. The expected value of the largest flood discharge, E[QJ (curve marked "true"), and the expected value of the population cumulative distribution function inverse evaluated at the probability plotting position, F 1(1 - /5]), for five plotting position formulas: as a function of E[k], for n = 150, k _> 1.

bias in relation to E[QJ. As E[k] increases, the bias decreases. For Cs = 1 or 1.14, the bias is negligible (3% or less) for E[k] ~> 3 for the G, E-G or H rules. For Cs = 4, these rules have negligible bias for E[k] > 10. Without exception, the W rule is the most biased. For very low E[k], the E-W is substantially less biased than the W and about equivalent to the G, E-G and H. For E[k] large, the E W is virtually indistinguishable from the W. The exact magnitude of the bias depends, of course, on the family of distribution, the skewness and E[k], and consequently one could develop special, optimal plotting positions for each situation (see Ji et al., 1984, for an example with only systematic record). However, skewness, distribution type and E[k] are unknown in practice. The results here show the G, E-G, and H to be essentially equal in bias for all of the cases considered. These biases are small in comparison to the variance of Q~. USE OF THE FORMULAS IN OBJECTIVE CURVE FITTING

One approach to estimating flood frequency distributions is to use the Qi and ~i values in some objective curve fitting procedure (see, for example, Cong et al., 1979). Given this possible use of the plotting positions, another way to evaluate various plotting position formulas is to check the accuracy of estimated quantiles of distributions fitted this way using each of several plotting position formulas.

195

BIAS OF 100 YR FLOOD ESTIMATES n - 200, s = 20, cv = 0.5, cs -- 1.625 (LOG NORMAL) 0.15-

O.tO -

0.05-

Legend MLE 0.00

GRINGORTEN

-

WEIBULL E WEIBULL ADJ M O M E N T S -0.05

. . . . . . .

10

30

E[k]

Fig. 3. Bias of estimated 100yr flood using objective curve fitting (minimize sum of absolute deviations) with three different plotting position formulas, and the MLE and adjusted moment procedure.

ROOT MEAN SQUARE ERROR OF 100 YR FLOOD n = 2 0 0 s = 20, cv = 0.5, cs = 1.625 (LOG NORMAL) 0.25X N X o.2o-

---2:,

\

x

~E 0Js-

Legend GRINGORTEN WEIBULL E WEIBULL ADJ M O M E N T S

0.05

. . . . . . .

1()

30

qk] Fig. 4. Root mean square error of estimated 100 yr flood using objective curve fitting (minimize sum of absolute deviations) with three different plotting position formulas, and the MLE and adjusted moment procedure.

196 The particular objective function selected was one Cong et al. (1979) found to be best: minimize the sum of the absolute deviations in discharge. Simulations were conducted in which flood records were generated from a twoparameter lognormal distribution (C~ = 0.5, C~ = 1.65). Records were generated with n -- 200 and s = 20 and one of eight different censoring thresholds (Q0). For any given record t h e g known floods (g = k + s - e) are ranked from Q~ (smallest) to Q1 (largest). For a given plotting position formula each of these floods is assigned a plotting position/5 i. The optimization searches over a range of means and variances to find a parameter set which minimizes:

i~

Qi - ~' 1(1 -

where F 1 is the estimated lognormal cumulative distribution function inverse. An estimate of the 100 yr flood, Q(0.99), is computed for the fitted distribution, and errors are calculated as [Q(0.99) - Q(O.99)]/Q(0.99) for each of 1000 Monte Carlo trials. These errors are summarized by their bias and root mean squared error (RMSE) and are shown in Figs. 3 and 4. The E -G and H results are not shown but are virtually identical to those for the G formula. Also shown on the figures are results for two other estimation methods. The first is the censored maximum likelihood approach (MLE) of Stedinger and Cohn (1986), and an adjusted moment procedure along the lines used by IACWD (1982) and by Cong et al. (1979). The implementation of adjusted moments used here computes the first two moments of the log flood discharges by computing the moments above the threshold and below the threshold and then applying weights of i to those above the threshold and (n - k)/(s - e) to those below the threshold (see eqns. 6-1 through 6-3c of IACWD, 1982). It should be noted that the adjusted moment techniques and the plotting position curve fitting techniques do not require exact knowledge of Q0, but r a t h e r the knowledge of which floods exceed it and which do not. The MLE requires knowledge of Q0 itself. When k = 0, the MLE utilizes the information that no exceedances occurred in n years, but the other methods do not. When k = 0, the curve fitting techniques take the record length to be s and compute plotting positions by the simple formulas (i - a)/(n + 1 - 2a). When k = 0, the adjusted moment procedure uses ordinary moments. RMSE values shown in Fig. 4 show that the MLE is most efficient and that the plotting position RMSE's follow the same order as their biases. All plotting positions show a similar decline in RMSE with increasing E[k] suggesting an effective utilization of available historical flood data. In fact, if bias were removed, the RMSE's for all plotting position methods would fall very close together and close to the MLE. The adjusted moment procedure suggests a very inefficient use of historical flood data, with only small declines in RMSE accompanying large increases in E[k]. This suggests a severe flaw in the adjusted moment procedure.

197 BIAS OF 100 YR FLOOD ESTIMATES n = 200, s = 20, cv = 0.5, cs = 1.625 (LOG NORMAL) n ESTIMATED FROM FIRST FLOOD 0.20 -

o.15-"....

\

\

\ 0.10-

-~

\

o.oo--- ~

~ ~ . ~

"'-.

Legend MLE GRINGORTEN WEIBULL

_E_W_E_LU_LL e __ ADJ MOMENTS

-0.05 10

30

E[k] Fig. 5. Bias of estimated 100 yr flood, with n estimated from the time of occurrence of the first flood, using objective curve fitting (minimize sum of absolute deviations) with three different plotting position formulas, and the MLE and adjusted moment procedure. T h i s M o n t e C a r l o e x p e r i m e n t is quite limited in t h a t it only considers n = 200, s = 20 a n d o n l y one distribution, t h e l o g n o r m a l . F u r t h e r e x p l o r a t i o n of the c u r v e fitting m e t h o d s should be c a r r i e d out, w i t h o t h e r d i s t r i b u t i o n s a n d w i t h the fitted d i s t r i b u t i o n b e i n g a different family of d i s t r i b u t i o n s t h a n the p o p u l a t i o n . T h e s e results, t o g e t h e r w i t h the bias in d i s c h a r g e r e s u l t s described above, do s u g g e s t t h a t a G or H p l o t t i n g p o s i t i o n m a y give the best r e s u l t s of a n y c u r v e fitting p r o c e d u r e in t e r m s of e s t i m a t i n g the 100 yr flood discharge. T h e M L E p r o c e d u r e a p p e a r s slightly m o r e a c c u r a t e , b u t it does rely on knowledge of Q0, w h i c h m a y n o t be a c c u r a t e l y known. MISSPECIFICATION OF n In m a n y of the l i t e r a t u r e e x a m p l e s of a n a l y s i s of h i s t o r i c a l floods (specifically Benson, 1950; D a l r y m p l e , 1960; I A C W D , 1982; a n d Zhang, 1982), one finds t h a t n is t a k e n to be the n u m b e r of y e a r s from the first e x t r a - o r d i n a r y flood to the present. This r e p r e s e n t s a l o w e r b o u n d e s t i m a t e of n, a n d it h a s a substantial influence on the r e s u l t i n g plots a n d fitted distributions. To explore the effect of u s i n g t h e l o w e r b o u n d e s t i m a t e of n r a t h e r t h a n t h e t r u e r e c o r d length, the M o n t e C a r l o e x p e r i m e n t described a b o v e was re-run. T h e s a m e 200 y r flood s e q u e n c e s w e r e g e n e r a t e d but, for all e s t i m a t i o n m e t h o d s , n was t a k e n to be the n u m b e r of y e a r s f r o m the first e x t r a o r d i n a r y flood to t h e end of record, or 20 (the s y s t e m a t i c r e c o r d length, s) w h i c h e v e r is g r e a t e r . Fig. 5 s h o w s the biases of t h e s e e s t i m a t e s of the 100 y r flood.

198 F o r E[k] small, t h e r e is a s u b s t a n t i a l u p w a r d bias added to all of the estimates. F o r E[k] large, say 10 or more, the effect is r e l a t i v e l y m i n o r because, with 10 or more e x t r a o r d i n a r y floods o c c u r r i n g in 200 years, the n u m b e r of years from the first e x t r a o r d i n a r y flood to the end of the r e c o r d is only slightly less t h a n 200. T h e r e l a t i v e m a g n i t u d e s of RMSE (not shown) are little c h a n g e d from those seen in Fig. 4. The small bias of adjusted moments is a c o n s e q u e n c e of c o m p e n s a t i n g errors. If unbiased estimates of distribution quantiles are desired, and if one is using the M L E or c u r v e fitting with G (or similar) plotting positions and the n u m b e r of e x t r a o r d i n a r y floods is small, t h e n it is vital t h a t a serious effort be made to c o r r e c t l y establish n, r a t h e r t h a n using the easily d e t e r m i n e d lower bound estimate of n. Beard (1962) noted the i m p o r t a n c e of identifying this c o r r e c t record length (n) and not simply using the period from the date of the earliest k n o w n flood. CONCLUSIONS A r e c o g n i t i o n of the f u n d a m e n t a l differences between a flood r e c o r d with historical and systematic data and records t h a t are e n t i r e l y s y s t e m a t i c leads to the c o n s i d e r a t i o n of a new class of plotting position formulas, the "Exc e e d a n c e " formulas. These formulas are based on a r e c o g n i t i o n t h a t the records are p a r t i a l l y censored samples, and the f r e q u e n c y of flooding above the censored t h r e s h o l d is a key d e s c r i p t o r of the d a t a set and subdivides the r a n g e of probabilities b e t w e e n the above-threshold and below-threshold groups. The use of the e x c e e d a n c e formulas r a t h e r t h a n t h e i r t r a d i t i o n a l counterparts (W, G or H) only makes a difference of p r a c t i c a l significance in the case of the W. In the systematic r e c o r d only case, the Weibull rule is k n o w n to be unbiased in terms of probability, and a G, H or similar rule is k n o w n to be' n e a r l y unbiased in terms of discharge for c o m m o n l y accepted flood-like distributions. With combined historical and systematic records, no rule appears unbiased in terms of probability, a l t h o u g h the E - W appears to be r o b u s t in this respect. The G, E G or H all a p p e a r r e l a t i v e l y unbiased in terms of discharge. If unbiased q u a n t i l e estimates are desired from an objective c u r v e fitting technique, t h e n the G, E - G or H formulas a p p e a r to be preferable to the W or E W formulas.

REFERENCES

Beard, L.R., 1962. Statistical Methods in Hydrology. U.S. Army, Corps Eng., Civ. Works Invest. Proj., Rep. CW-151. Benson, M.A., 1950.Use of historical data in flood-frequency analysis. Trans. Am. Geophys. Union, 31(3): 419~424. Chen, C., Yeh, Y. and Tan, W., 1974. The important role of historical flood data in the estimation of spillway design floods. Rep. Minist. Water Conserv. Elect. Power, Peking. Cong, S., Tan, W., Huang, S., Xu, Y., Zhang, W. and Chen, Y., 1979. Statistical testing research on the methods of parameter estimation in hydrological computation. Nanking, 27 pp.

199 Cunnane, C., 1978. Unbiased plotting position - - a review, J. Hydrol., 37: 205-222. Dalrymple, T., 1960. Flood-frequency analysis. U.S. Geol. Surv., Water Supply Pap., 1543-A. Gerard, R., and Karpuk, E.W., 1979. Probability analysis of historical flood data. J. Hydraul. Eng., 105(HY9): 115~1165. Gringorten, I.I., 1963. A plotting rule for extreme probability paper. J. Geophys. Res., 68(3): 813 814. Hazen, A., 1914. Storage to be provided in impounding reservoirs for municipal water supply. Trans. Am. Soc. Civ. Eng., Pap. 1308(77):1547 1550. Hirsch, R.M. and Stedinger, J.R., 1987. Plotting positions for historical floods and their precision. Water Resour. Res., 23(4): 715 727. Interagency Advisory Committee on Water Data, 1982. Guidelines for determining flood flow frequency. Hydrol. Subcomm. Bull. 17. U.S. Geol. Surv., Off. Water Data Coord. Ji, X., Ding, J., Shen, H.W. and Salas, J.D., 1984. Plotting positions for Pearson type-III distribution. J. Hydrol., 74:1 29. Johnson, N.L. and Kotz, S., 1970. Continuous Univariate Distributions 2. Houghton Mifflin, Boston, Mass., 306 pp. Leese, M.N., 1973. Use of censored data in the estimation of Gumbel distribution parameters for annual maximum flood series. Water Resour. Res., 9(6): 1534 1542. Natural Environmental Research Council, 1975. British Flood Studies Report, Vol. 1. Natl. Environ. Res. Counc., 550 pp. Stedinger, J.R. and Cohn, T., 1986. The value of historical and paleoflood information in flood frequency analysis. Water Resour. Res., 22(5): 785 793. Teng, W.A., 1984. Brief introduction of flood estimates for three projects in China. Beijing, 10pp. (unpubl.). Weibull, W., 1939. A statistical Theory of Strength of Materials. Ingenioersvetenskapsakad. 151 pp. Zhang, Y., 1982. Plotting positions of annual flood extremes considering extraordinary values. Water Resour. Res., 18(4): 859 864.