Log-logistic flood frequency analysis

Log-logistic flood frequency analysis

Journal of Hydrology, 98 (1988) 205-224 Elsevier Science Publishers B.V., Amsterdam - - Printed in The Netherlands 205 [4] LOG-LOGISTIC FLOOD FREQU...

862KB Sizes 39 Downloads 208 Views

Journal of Hydrology, 98 (1988) 205-224 Elsevier Science Publishers B.V., Amsterdam - - Printed in The Netherlands

205

[4]

LOG-LOGISTIC FLOOD FREQUENCY ANALYSIS

M.I. AHMAD', C.D. SINCLAIR' and A. WERRITTY2

IDepartment of Statistics, St. Andrews University, Fife KY16 9SS (U.K.) 2Department of Geography, St. Andrews University, Fife KY16 9SS (U.K.) (Received February 20; revised and accepted September 11, 1987)

ABSTRACT

Ahmad, M.I., Sinclair, C.D. and Werritty, A., 1988. Log-logistic flood frequency analysis. J. Hydrol., 98: 205-224. The log-logistic (LLG) distribution is evaluated for flood frequency analysis. Some of its properties and methods of parameter estimation are given, including a new method based on generalised least squares (GLS). The performance of the log-logistic distribution is compared with those of the generalised extreme value (GEV), three parameter log-normal (LN3) and three parameter Pearson (P3) distributions. The results are reported in terms of empirical distribution function (EDF) tests of goodness of fit, on both individual and regional flood series through the application of these distributions to a set of reasonably long annual maximum series for part of Scotland. Some reproductive properties of the LLG and GEV are also compared. In terms of four key properties the LLG performs better than the GEV, LN3 and P3 distributions and is thus commended for further analysis.

INTRODUCTION

A major unresolved problem in the field of flood frequency analysis is the identification of a statistical distribution which best represents the record of actual floods reported both at individual sites and across a whole region. Only when such a distribution has been correctly identified is it possible to obtain optimal estimates of the magnitude of flood quantiles for design purposes. It is with temerity that as a solution to this problem we propose the use of a new distribution for flood frequency analysis, but for the reasons advanced in this paper we believe that the log-logistic distribution merits serious appraisal both on theoretical and practical grounds. It is generally considered that the annual maximum flood series follows a skewed probability distribution (Natural Environment Research Council, 1975). However, as numerous studies have demonstrated, there is no general agreement amongst hydrologists as to which distribution best describes this annual maximum series (U.S. Water Resources Council, 1967; Natural Environment Research Council, 1975; Hosking et al., 1985a; Houghton, 1978; Matalas and Wallis, 1973; Rossi et al., 1984; Waylen and Woo, 1982). The reason for this lack of agreement is that all the distributions proposed thus far have been

0022-1694/88/$03.50

© 1988 Elsevier Science Publishers B.V,

2~

NOTATION a,b,c ~,~,k

AD B(.,.) sk CV CVM EDF F( ) GEV GLS K

KAP LLG LN3 ME ML OLS P3 PWM TCEV WAK Yn

log-logistic distribution parameters generalised extreme value distribution parameters Anderson Darling test Beta function coefficient of skewness Coefficient of variation Cramer-Von-Mises test Empirical distribution function Gamma function Generalised extreme value distribution Generalised least squares natural logarithm of sk Kappenman's method Log-logistic distribution Three parameter lognormal distribution moment estimates maximum likelihood estimates ordinary least squares Pearson type 3 distribution Probability weighted moments Two-component extreme value distribution Wakeby distribution largest standardised value in an annual maximum series

deficient in terms of one or more desirable properties. T h e ideal distribution for flood f r e q u e n c y analysis should possess each of the following properties: (a) it must r e p r o d u c e at least as m u c h v a r i a b i l i t y in flood c h a r a c t e r i s t i c s as is observed in empirical d a t a sets; (b) it must be insensitive to e x t r e m e outliers especially in the u p p e r tail; (c) it must h a v e a d i s t r i b u t i o n f u n c t i o n and an inverse d i s t r i b u t i o n f u n c t i o n t h a t can be explicitly expressed in a closed form; and (d) it must n o t be c o m p u t a t i o n a l l y complex n o r involve the e s t i m a t i o n of a large n u m b e r of parameters. T h e c u r r e n t l y available statistical distributions will now be e x a m i n e d in terms of these properties. The first p r o p e r t y forms the basis of the r e p r o d u c t i v e c r i t e r i o n (a) and includes the s e p a r a t i o n of skewness in observed and simulated floods ( M a t a l a s et al., 1975). In his extensive review of a large n u m b e r of c u r r e n t l y used distributions C u n n a n e (1986) concluded t h a t o n l y the two c o m p o n e n t e x t r e m e v a l u e (TCEV) and the W a k e b y (WAK) distributions satisfied this i m p o r t a n t r e p r o d u c t i v e criterion. In c o n t r a s t the generalised e x t r e m e v a l u e d i s t r i b u t i o n (GEV) was found to offer only a slight i m p r o v e m e n t o v e r a l t e r n a t i v e s such as the P e a r s o n type 3 (P3) or the log-normal (LN3). At p r e s e n t the GEV is f a v o u r e d for use in the U.K. ( N a t u r a l E n v i r o n m e n t R e s e a r c h Council, 1975) w h e r e a s the P3 is f a v o u r e d by the U.S. W a t e r Resources Council (1967).

207 Having thus excluded the GEV, P3 and LN3 we now examine the WAK and the TCEV distributions which at least have both proved successful in terms of the reproductive criterion. The WAK (Houghton, 1978) has five parameters and thus can accommodate its shape to provide a good fit across many data sets. However, the associated parameter estimates often have large standard errors which result in wide confidence intervals for the quantile estimates. Furthermore, its distribution function cannot be expressed in a closed form giving rise to problems in parameter estimation by maximum likelihood methods. Thus it fails to satisfy adequately the criteria (c) and (d) itemised above. The TCEV (Rossi et al., 1984) incorporates four parameters in order to describe a flood series generated by two distinct independent processes (e.g. snowmelt and frontal storms). However, evidence for separating two discrete processes is not necessarily simple nor unambiguous. Furthermore, parameter estimation using maximum likelihood methods on selected data sets can fail to achieve the required convergence. An explicit formula for the inverse of the TCEV does not exist and thus point and interval estimates of the quantiles are difficult to obtain. Thus the TCEV also fails to perform adequately in terms of criteria (c) and (d). Both the WAK and the TCEV jointly suffer from the disadvantage of requiring large data sets if an adequate number of degrees of freedom are to be generated for goodness-of-fit tests. More generally, none of the currently available statistical distributions performs adequately in terms of the four criteria set out above. This has prompted us to search for an alternative distribution which performs better. In this paper we investigate the properties of the three parameter log-logistic distribution (LLG) for flood frequency analysis and evaluate its performance. The data set initially comprises the annual maximum series from two catchments within a hydrologically homogeneous region in Scotland. Subsequently the data set comprises all the annual maximum series more than 26 years in length within the same hydrologically homogeneous region. The LLG is considered to be worth investigating in this context because of its relationship to other extremal distributions. It has recently been demonstrated by Lawless (1986) that the Weibull distribution plays a central role in the related field of reliability analysis because it is a unique distribution which belongs to two families of extreme distributions. Each of these two families in turn possesses desirable properties in analysing proportional hazards and accelerated failure time models. On generalising these families of extremal distributions the log-logistic distribution is found to be a member of both. THE LOG-LOGISTICDISTRIBUTION The log-logistic stands in the same relationship to the logistic distribution as the log-normal does to the normal, or the Weibull to the extreme value. The variable X is defined as being log-logistic if Y = log (X - a) has the logistic distribution. The probability density function (pdf), cumulative distribution function

208

(cdf), and inverse c u m u l a t i v e d i s t r i b u t i o n f u n c t i o n of the log-logistic distribution, with t h r e s h o l d p a r a m e t e r a, scale p a r a m e t e r b and shape p a r a m e t e r c, are respectively: f(x)

[(x - a)/b]-l/c c(x - a){1 + [(x - a)/b]-l/c} 2

=

1

Fix)

=

x

a + b[F/(1 - F)] ~

=

1 + [(x - a)/b] -1/~

(1) (2) (3)

w h e r e c > 0, b > 0 a n d x > a. The logistic reduced v a r i a t e Z = [Y - log (b)]/c has pdf, cdf and inverse cdf respectively: e-Z h(z)

-

(1 + e-Z) 2 1

H(z)

-

z =

log [H/(1 - H)]

l+e

-z

(4) (5) (6)

w h e r e log [ ] denotes n a t u r a l logarithm. T h e mean, v a r i a n c e and coefficient of skewness of Z are: E(Z)

= 0

(7) ~2

Var (Z)

-

3

Skew (Z) =

(8) 0

(9)

In order to c a l c u l a t e the m o m e n t s of the log-logistic variable X it is c o n v e n i e n t to define a new f u n c t i o n A ( j , c), with j as an i n t e g e r and c > 0 such t h a t A(j,c)

B(1 + j c , 1 - j c )

=

(10)

where B ( m , n) is the beta function, i.e.: B ( m , n)

=

F ( m ) F ( n ) / F ( m + n)

(11)

F(m) being the g a m m a function. The mean, v a r i a n c e and skewness of the L L G are: E(X)

=

V a r (X) Skew (X)

a + bA(1, c)

(12)

=

(13)

b2[A(2, c) - A2(1, c)] =

A(3, c) - 3A(2, c)A(1, c) + 2A3(1, c) [A(2, c) - A2(1, c)] ~/2

(14)

As in the case of W a k e b y d i s t r i b u t i o n ( H o u g h t o n , 1978), moments of some orders do not exist for a c e r t a i n r a n g e of values of the shape p a r a m e t e r . In

209 p a r t i c u l a r t h e s k e w n e s s is infinite unless c < 1/3, t h e v a r i a n c e is infinite unless c < 1/2, a n d the m e a n is infinite unless c < 1. Also, as in the case of the W a k e b y d i s t r i b u t i o n , this is no l i m i t a t i o n to its u s e for fitting to i n d i v i d u a l d a t a sets w h e r e v a l u e s of c g r e a t e r t h a n 1/3 m a y be required. T h e p r o b a b i l i t y w e i g h t e d m o m e n t s of o r d e r r, s a n d t of the s t a n d a r d i z e d log-logistic, w i t h a = 0, b = 1, are: E[XrF'(1

-

F ) t]

=

(15)

B(1 + s + rc, 1 + t - rc)

F o r desired v a l u e s of t the p r o b a b i l i t y w e i g h t e d m o m e n t s of t h e log-logistic d i s t r i b u t i o n c a n be derived f r o m t h e a b o v e equation. ESTIMATION OF THE PARAMETERS FOR THE LOG-LOGISTIC DISTRIBUTION Since the pdf, cdf a n d i n v e r s e cdf of t h e L L G c a n be explicitly specified, possible m e t h o d s for e s t i m a t i n g the p a r a m e t e r s of t h e d i s t r i b u t i o n include m o m e n t e s t i m a t e s (ME), m a x i m u m l i k e l i h o o d (ML), o r d i n a r y l e a s t s q u a r e s (OLS), g e n e r a l i s e d l e a s t s q u a r e s (GLS) a n d p r o b a b i l i t y w e i g h t e d m o m e n t s ( P W M ) ( G r e e n w o o d et al., 1979). E a c h of t h e s e m e t h o d s is described below a n d for i l l u s t r a t i v e p u r p o s e s will be used on the t w o d a t a sets itemised in T a b l e 1. T h e s e d a t a c o m p r i s e a n n u a l m a x i m u m series for S c o t t i s h c a t c h m e n t s t a b u l a t e d by A c r e m a n (1986). Moment estimate (ME)

C l e a r l y M E c a n be a c c o m p l i s h e d by solving eqns. (14), (13) a n d (12) in t h a t order, u s i n g s a m p l e v a l u e s for skewness, v a r i a n c e a n d m e a n . T h e following a p p r o x i m a t e s o l u t i o n to eqn. (14) is sufficiently a c c u r a t e for p r a c t i c a l p u r p o s e s o v e r t h e r a n g e 0 < c < 1. TABLE 1 Example data sets and their moments; annual maximum series in

m 3

S-1 for specified periods

Spey (at Kinrara): 1952-1982

89.8, 109.1, 202.2, 146.3, 212.3, 116.7, 109.1, 80.7, 127.4, 138.8, 283.5, 85.6, 105.5, 118.0, 387.8, 80.7, 165.7, 111.6, 134.4, 131.5, 102.0, 104.3, 242.5, 214.8, 144.6, 114.2, 98.3, 102.8, 104.3, 196.2, 143.7 Mean

Variance

Skewness

m0

m,

m2

145.3

4523.6

1.913

145.3

55.81

33.76

Kelvin (at Killermont): 1948-1982

98.3, 94.1, 90.1, 105.6, 76.4, 98.3, 128.2, 77.5, 79.9, 69.6, 60.4, 68.5, 78.7, 107.1, 114.9, 80.6, 68.3, 87.2, 91.8, 80.6, 68.3, 91.0, 64.5, 65.8, 53.9, 57.3, 91.0, 76.4, 86.5, 72.3, 73.6,80.6, 65.1, 77.0, 77. Mean

Variance

Skewness

mo

m I

m2

81.6

270.5

0.764

81.61

35.99

22.63

210 TABLE 2 Estimated values of p a r a m e t e r s Method

Parameters Spey data set

ME ML PWM OLS GLS

Kelvin data set

a

b

c

- 55.1 75.3 58.2 68.7 71.8

190.6 47.9 67.2 56.2 52.2

0.171 0.531 0.387 0.505 0.528

For skewness

sk and

a 27.1 33.1 18.5 28.1 25.9

sk 2 4.007 + 3.411

c

107.5 46.1 60.8 51.2 53.4

0.082 0.191 0.152 0.187 0.179

with ~ = log (sk):

c = e x p ( - 2.246 + 0.848~ - 0.1272~ 2 + 0.04008~3), c =

b

sk + 2.985sk ~'

if

sk

ifsk

< 2.5

> 2.5

However, it is not possible to obtain an estimate of c t hat exceeds 1/3 by this method, and there is a tendency to underestimate c, particularly in the case of samples from distributions with large values of c, and especially in the case of small samples. Since the estimated values of the other two parameters depend on the value of c, they can also be poorly estimated. Consequently we do not recommend t h a t this method be used to estimate the parameters of the log-logistic distribution. As an illustration of the method results for the Spey and Kelvin data are reported in Table 2.

Maximum likelihood (ML) From the pdf (1) the log of the likelihood function (L) may be written as: log(L)

=

nl°g(b)- nl°g(c)- (1 +~)

The first partial derivative of log (L) with respect to each of the three parameters to be estimated is equated to zero. This yields three nonlinear equations which when solved produce the ML estimates. These equations can be solved iteratively. Alternatively the log of the likelihood can conveniently be maximised numerically using the OPTIMISE facility in GENSTAT (1980). This approach has the advantage of providing approximate standard errors for the estimated parameters. P a r a m e t e r estimates for the Spey and Kelvin based on ML are reported in Table 2.

211 Probability weighted moments (PWM)

One of the possible P W M methods of e s t i m a t i n g the p a r a m e t e r s of the log-logistic d i s t r i b u t i o n is to use eqn. (15) with r and s t a k i n g v a l u e s 1 and 0 respectively, and t t a k i n g the v a l u e s 0, 1, 2. W h e n this is applied to the t h r e e p a r a m e t e r log-logistic we have: = aB(1, 1 + t) + bB(1 + c, 1 + t - c),

Mt

t

= 0,1,2

(16)

Solving these t h r e e e q u a t i o n s for c, b, a, in t h a t order, we obtain: c =

3 - 2(M0 - 3M2) M0 - 2M1

(17)

b

-

M0 - 2M1 c A(1, c)

(18)

a

=

M 0 - bA(1, c)

(19)

Sample values of the probability weighted m o m e n t s are c a l c u l a t e d from the d a t a using a suitable p l o t t i n g position, for example: Pi -

i - 0.35 , n

to replace F(xi)

this is: mt

= ~ (1 - pi) t xi/n,

t

= 0,1,2

(20)

The P W M estimates of the p a r a m e t e r s are found by s u b s t i t u t i n g m t for Mt (t = 0, 1, 2) in eqns. (17), (18) and (19). W i t h the e x c e p t i o n of eqn. (18) these are v e r y s t r a i g h t f o r w a r d equations. The g a m m a f u n c t i o n s in A(1, c) can readily be e v a l u a t e d using M I N I T A B (1985), or by using the following approximation: A(1, c~ =

1/(0.9993 + 0.0164c - 1.74337c ~ + 0.2321c ~ + 0.6c 4 - 0.10444c ~)

This is a c c u r a t e to t h r e e places of decimals for 0 < c < 1. P a r a m e t e r estimates based on this m e t h o d are r e p o r t e d in Table 2. H o s k i n g (1986) has r e c e n t l y r e p o r t e d on the P W M estimates for b o t h the logistic and the generalised logistic distributions. O r d i n a r y least s q u a r e s ( O L S ) a n d g e n e r a l i s e d least s q u a r e s ( G L S )

These are objective m e t h o d s for fitting a c u r v e to the g r a p h of order statistics against a logistic r e d u c e d variate. GLS uses the v a r i a n c e s and covariances of the o r d e r statistics, while OLS does not. In b o t h m e t h o d s it is n e c e s s a r y to use a plotting position, u, to estimate the d i s t r i b u t i o n f u n c t i o n c o r r e s p o n d i n g to the ith o r d e r statistic. In the n u m e r i c a l examples t h a t follow, this p l o t t i n g position was t a k e n as i/(n + 1). Using the c o n c e p t of inverse density and inverse d i s t r i b u t i o n f u n c t i o n

212 /OLS ii

4-0

GLS~IiML/PWM .ME

3'5-

3'0 i ii/I

.....

/

2.5

/

i i i

2.0

i

/

."

,/

I /I

/ r

,."

:-

...

.. .."

i/ . / .'

I //.

S'

1-5

1"0

..~ ° 0'5 - ~ " ' "

0.0

" -2

...."

.""

i

-1

() EV1 Reduced variate

Fig. 1. The LLG distribution fitted to standardised flows, Q/(~, for the River Spey at Kinrara (1952-1982) using parameter estimates based on GLS, OLS, ML, PWM and ME. (Parzen, 1979) from eqn. (3), the expected order statistics c a n be a p p r o x i m a t e d by the first order T a y l o r series: E(xi)

=

a + b exp(c0di) + b ( c -

C o ) d i exp(c0di)

(21)

where:

di R e g r e s s i n g the ordered v a l u e s o n their e x p e c t a t i o n s , w e h a v e n linear e q u a t i o n s in the u n k n o w n s a, b and b ( c - Co), w i t h coefficient matrix W. S t a r t i n g w i t h Co, an initial guess at the v a l u e of c, w e c a n i m p r o v e it via t h e

213 2"8 OLS/'~,. GLS //~' ML //// /

2.4

// z /~ / / z/// //// / Z/ /

2-0

"

IC 0

1.6

1.2

..~." / .~..oo° / /~. "°

0.8

..r."

/.

0.4

0'0

-2

-'1

6

~

~

~

~

EV1 Reduced variate

Fig. 2. The LLG distribution fitted to standardised flows, Q/Q, for the River Kelvin at Killermont (1948-1982) using parameter estimates based on GLS, OLS, ML, PWM and ME. s o l u t i o n to eqn. (21) for b(c - Co). R e p l a c i n g c o by t h e n e w v a l u e of c, w e c a n iterate until the c h a n g e in c is negligible, at w h i c h point the OLS e s t i m a t e for c is reached. For GLS w e require the c o v a r i a n c e b e t w e e n x i and x / t h e ith and j t h order statistics. This is a p p r o x i m a t e d by:

fxco ~

fXco ~

min

n + l'n

+ 1

(n + 1 ) 2

w h e r e fXco[i/(n + 1)] is t h e i n v e r s e d e n s i t y f u n c t i o n . D e f i n i n g D as t h e n × n d i a g o n a l m a t r i x w i t h ith entry fx[i/(n + 1)] e v a l u a t e d at Co, and V as the n × n matrix w i t h elements:

vo =

(n + 2) rain n + l ' ~ n

The e l e m e n t s of t h e i n v e r s e o f V are:

(n + 1 ) ~

214

<

Gaugi



Regic From o.

,

Fig. 3. Location of the 24 stations from region 2 (Acreman and Sinclair, 1986) used to evaluate the performance of the LLG.

V ii

=

2(n + 1),

v il

=

v ji

=

v il

=

0,

]i-jl

-(n

i =

1. . . . .

+ 1),

n

i -- 1 . . . . .

n - 1; j

=

i + 1

> 1

Writing M = DV-1D,

the G L S i t e r a t e for a, b, b ( c - Co) is:

( W T M W ) -1 W T M x w h e r e T d e n o t e s t r a n s p o s e d , a n d x d e n o t e s t h e v e c t o r of o r d e r e d o b s e r v a t i o n s . OLS a n d GLS p a r a m e t e r e s t i m a t e s for the Spey and K e l v i n d a t a are r e p o r t e d in T a b l e 2. G r a p h i c a l p r e s e n t a t i o n s of t h e d a t a a n d fitted d i s t r i b u t i o n c a n follow the u s u a l p r a c t i c e of h a v i n g ( s t a n d a r d i s e d ) flow on the v e r t i c a l axis, a n d on the h o r i z o n t a l axis e i t h e r the f a m i l i a r EV1 r e d u c e d v a r i a t e , log [ - l o g ( p ) ] , or

215 a g a i n s t t h e c o r r e s p o n d i n g l o g i s t i c r e d u c e d v a r i a t e , l o g [p](1 - p)]. F i g u r e s 1 and 2 show the standardised observations and the LLG fitted by several methods for the Spey and Kelvin data respectively. The growth curves obtained by GLS, OLS and ML are in fairly close agreement. However, PWM and especially ME result in shallower curves. These graphs confirm that the Spey data has a steeper growth curve than the Kelvin data, as suggested by the numerical values for the shape parameter c in each case. APPLICATION OF THE LLG MODEL TO INDIVIDUAL SITES AND TO A HYDROLOGICAL REGION A l t h o u g h i t is e s s e n t i a l t h a t a c a n d i d a t e d i s t r i b u t i o n p r o v i d e s a g o o d fit f o r a n n u a l m a x i m u m s e r i e s a t i n d i v i d u a l s i t e s , i t is w e l l - k n o w n t h a t s u c h t e s t s o n their own cannot determine the selection of the best flood frequency distribution. This arises because flood series at individual sites often include o u t l i e r s , i.e. r a r e e v e n t s o f a h i g h m a g n i t u d e w h o s e r e t u r n p e r i o d is d i f f i c u l t t o estimate because of the comparative brevity of the flood record. Thus a rigorous evaluation of a candidate distribution requires a regional analysis in which all the annual maximum series across a hydrologically homogeneous TABLE 3 Basic characteristics of the data from region 2 (Acreman and Sinclair, 1986) No. River (station)

n

mean

CV

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

44 31 31 31 30 30 31 31 53 28 31 30 31 28 34 34 26 66 35 27 27 27 26 36

457.6 145.3 232.3 176.9 472.9 106.2 80.3 249.5 432.6 49.4 803.1 997.7 341.7 102.7 203.9 510.5 446.4 76.6 81.6 284.0 210.5 402.4 15.5 54.3

0.444 0.463 0.440 0.402 0.353 0.494 0.220 0.322 0.408 0.319 0.256 0.197 0.329 0.339 0.883 0.393 0.240 0.343 0.202 0.302 0.349 0.263 0.238 0.178

Spey (Aberlour) Spey (Kinrara) Avon (Delnashaugh) Spey (Boat of Garten) Spey (Boat o'Brig) Spey (Invertruim) Dulnain (Balnaan Bridge) Spey (Grantown) Dee (Woodend) Isla (Forter) Tay (Caputh) Tay (Ballathie) Tay (Pitnacree) Almond (Craigie Hall) Tweed (Peebles) Tweed (Dryburgh) Nith (Friar's Carse) Irvine (Kilmarnock) Kelvin (Killermont) Clyde (Hazelbank) Clyde (Sills of Clyde) Clyde (Blairston) Kelvin (Bridgend) Kelvin (Dryfield)

sk

1.729 1.913 1.050 1.506 0.953 1.517 0.658 1.068 1.677 1.202 1.111 - 0.155 0.694 -0.010 3.84 1.558 0.822 3.172 0.764 1.312 1.361 0.383 0.263 0.260

Yn

2.71 2.67 2.25 2.32 1.97 2.42 1.55 1.95 2.62 2.01 1.82 1.37 1.70 1.60 5.29 2.30 1.56 2.96 1.57 1.81 1.95 1.64 1.51 1.41

216

region are standardised and combined to produce a single flood series. Acreman and Sinclair (1986) have identified five such regions for Scotland by classifying basins in terms of their physical characteristics. From these five regions the second has been selected for evaluating the performance of the LLG (Fig. 3). This region comprises 118 individual sites with annual flood series varying in length from 5 to 66 years. Since the moments of order higher than two of such data sets inherently possess a large sampling variation, only those series with at least 26 years of record have been selected for this test of the LLG. In the light of this screening there are 24 such data sets within the total of 118. The basic properties of these annual maximum series are itemised in Table 3 and their locations recorded in Fig. 3. The data used in the regional analysis are standardised by dividing each observation in a flood series by the appropriate series arithmetic mean. On pooling these standardised data from the 24 long data sets a single sequence comprising 798 station years is created. TABLE 4 Comparison of GEV/PWM and LLG/GLS based on EDF test statistics calculated from individual annual maximum series, (see Table 3) Station no.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

AD

CVM

KS

LLG

GEV

LLG

GEV

LLG

GEV

0.52 0.26 0.22 0.23 0.33 0.36 0.33 0.32 0.71"* 0.24 0.47 0.62* 0.70"* 0.37 0.51 0.33 0.21 0.39 0.18 0.45 0.41 0.27 0.62* 0.31

0.52* 0.29 0.22 0.28 0.36 0.39 0.33 0.40 0.73** 0.25 0.47 0.46 1.09"** 0.33 0.55* 0.36 0.23 0.43 0.19 0.51" 0.44 0.24 0.51" 0.28

0.091" 0.034 0.025 0.026 0.049 0.054 0.043 0.043 0.109"* 0.037 0.068 0.097* 0.115"* 0.050 0.074 0.046 0.024 0.052 0.026 0.068 0.068 0.035 0.102"* 0.046

0.093** 0.034 0.029 0.033 0.056 0.065 0.045 0.058 0.116"* 0.046 0.067 0.071 0.186"** 0.039 0.089** 0.053 0.031 0.059 0.032 0.087* 0.082* 0.033 0.082* 0.040

0.083 0.074 0.063 0.061 0.096 0.103 0.073 0.097 0.081 0.085 0.087 0.117 0.135" 0.098 0.101 0.101 0.064 0.080 0.079 0.080 0.107 0.074 0.125 0.071

0.082 0.082 0.063 0.068 0.104 0.114" 0.068 0.104 0.082 0.079 0.107 0.100 0.184"** 0.079 0.094 0.116" 0.076 0.098** 0.079 0.097 0.128" 0.062 0.116 0.075

*Significant at 10%. ** Significant at 5%. ***Significant at 1%.

217 Individual data sets within a region Comparison of the fit of the LLG and GEV distributions to individual data sets has been carried out using empirical distribution function (EDF) tests, namely the Kolmogorov-Smirnov (KS), Cramer-Von-Mises (CVM) and Anderson-Darling (AD) tests. The values of the test statistics AD, CVM and KS for the LLG fitted by GLS and for the GEV fitted by PWM for the 24 individual data sets are presented in Table 4. The significant points for AD, CVM and KS for the GEV are the results of simulations by the authors, while those for the LLG are adapted from Stephens (1979). Note that these significant points apply specifically to the situation where the parameter values are estimated. From Table 4 it is clear that LLG generally fits better than GEV to these individual data sets. The LLG provides a better fit than the GEV at all except four of the sites. From Table 3 we note that these four are the sites with skewness less than 0.4. For both the AD and CVM tests the number of sites at which LLG is rejected at the 10% level is not significantly more than the expected 2.4. For both the AD and CVM the number of sites at which the GEV is rejected at the 10% level is significantly more than the expected 2.4. Thus not only does the LLG fit better than the GEV, but the LLG fit is adequate while the GEV fit is not. Regional data Four distributions are compared in terms of their performance on the regional data described above. The four distributions to be evaluated are the LLG, GEV (fitted by ML), LN3 and P3 (fitted by the Kappenman (1985) method). The comparisons are reported in Table 5 using the EDF statistics AD, CVM and KS and also the RSME (root mean squared error for scaled data). Significantly bad fits are indicated by the usual notation using percentage points for AD, CVM and KS for the LN3 and P3 as adopted in Stephens (1974) and Pettitt and Stephens (1983). It is clear from the AD and CVM tests that not only does the LLG fit the data markedly better than the GEV, LN3 or P3, but it is also the only distribution that fits the data adequately. The lack of significantly bad fits TABLE 5 RMSE and EDF statistics for various distributions fitting 798 station years of grouped regional data Distributions

AD

CVM

KS

RMSE

LLG/ML GEV/ML LN3/KAP P3/KAP

0.46 1.07"** 4.22*** 5.97***

0.07 0.16"** 0.69*** 1.00"**

0.020 0.028 0.058 0.066

0.05 0.09 0.13 0.13

***Significant at 1%.

218 6-



~//////1 /

/1 // / GEV

I0

~3

,,,°/// •

//

...7"

0

i

i

E V l R e d u c e d variate

Fig. 4. GEV and LLG fits to the regional data using 798 station years of record on an EV1 reduced variate scale. Only the upper 50 data points are shown.

on the basis of the KS test is evidence of the well-known lack of power of this test relative to the others (Stephens, 1976). No significance tests have been applied to the RSME estimates, but the values indicate that the LLG is the distribution that best fits these data. It should be noted that Acreman and Sinclair (1986) found that the data for the whole region (including the shorter records omitted here) were not adequately fitted by either the GEV, TCEV or WAK distributions• The GEV and LLG fits to the regional data are shown in Fig. 4 using an EV1 reduced variate scale and only recording individually the largest 50 data points. The plotting position G = (i - 0.35)/n as recommended by Hosking (1985) is used in producing this and other comparable figures. In examining the results it is clear that the LLG curve lies much closer to the data than does the GEV curve. It is also noteworthy that the LLG growth curve is steeper than

219 t h a t for the GEV and t h u s is c o m p a r a b l e to the g r o w t h curves r e p o r t e d by the N a t u r a l E n v i r o n m e n t R e s e a r c h Council (1975).

Outliers It is i m p o r t a n t to n o t e t h a t the largest flood in the d a t a set summarised in fig. 4 is m a r k e d l y above the LLG c u r v e and even f u r t h e r r e m o v e d from the GEV c u r v e and the o t h e r d a t a points. Clearly this point should be r e g a r d e d as an outlier. It specifically relates to a flow of 1079 m3s ' on the River Tweed at Peebles on 7th J a n u a r y 1949. A l t h o u g h a clear o u t l i e r for the d a t a set r e p o r t e d in this study, this flow was s u b s t a n t i a l l y lower t h a n the flood r e p o r t e d at the same site on 12th A u g u s t 1948 but not included in the d a t a v o l u m e by N a t u r a l E n v i r o n m e n t R e s e a r c h Council (1975). In order to examine the effect of outliers on the L L G and GEV distributions the above analyses were r e p e a t e d o m i t t i n g the 1949 outlier. The r e s u l t i n g changes in the p a r a m e t e r estimates are r e p o r t e d in Table 6 and the associated changes in the g r o w t h curves d e m o n s t r a t e d in Fig. 5. The GEV shape p a r a m e t e r k is s u b s t a n t i a l l y r e d u c e d from - 0 . 0 5 4 to - 0.030 thus g e n e r a t i n g a flatter curve. In c o n t r a s t the LLG shape p a r a m e t e r exhibits only a small decrease from 0.204 to 0.199 effecting a v e r y slight c h a n g e in the shape of the g r o w t h curve. We t h e r e f o r e conclude from this analysis t h a t the LLG is more r o b u s t t h a n the GEV in dealing with outliers at the u p p e r end of flood f r e q u e n c y series. REPRODUCTIVE PROPERTIES As n o t e d in the i n t r o d u c t i o n , it is the r e p r o d u c t i v e ability of a c h o s e n model w h i c h c h a r a c t e r i s e s its p e r f o r m a n c e in r e p r o d u c i n g the basic shape of empirical flood f r e q u e n c y curves (Cunnane, 1986). In this final section we c o m p a r e the L L G with the GEV in terms of e a c h distribution's ability to r e p r o d u c e the v a r i a t i o n in skewness i n h e r e n t in the Scottish d a t a set. This is u n d e r t a k e n via a series of Monte-Carlo simulations. In order to o b t a i n records of equal length, the 24 a n n u a l m a x i m u m series TABLE 6 Parameter estimates using ML for regional data: (1) with outlier included in the analysis; and (2) without the outlier Distribution

Parameters a

b

c

LLG1 LLG2

0.1241 0.111

0.8102 0.8274

0.2042 0.1994

GEV1 GEV2

0.25 0.25

# 0.84 0.84

k - 0.054 - 0.030

220

?

6

OutLier -- ®

~/LLG2 /~/

GEV1/'

!

/I /

./ /

ii

// r] /

~"/S'~'/'I'// 77

• .

~3

] It J

,/,'GEV2 / /

I IG

/

/ / 1/] / /]

,:f/

///

• •



.o

-4

-'2

6

:2

'$

//~z

///

(~

8

10

1':2

Logistic reduced variate

Fig. 5. GEV and LLG fits to the regional data: (1) including; and (2) excluding one extreme outlier, using a logistic reduced variate scale. Only the upper 49 data points are shown.

w e r e trimmed to yield n o n o v e r l a p p i n g s e q u e n c e s c o n t a i n i n g 26 years of record. In t w o cases (the River D e e at W o o d e n d and the River Irvine at K i l m a r n o c k ) the o r i g i n a l series w e r e sufficiently l o n g to yield t w o s u c h sequences• This procedure t h u s g e n e r a t e d a total data set of 26 a n n u a l flood s e q u e n c e s e a c h of w h i c h w a s 26 years long. The s k e w n e s s and largest standardised ordered v a l u e s of Yn w e r e c a l c u l a t e d for e a c h sequence• The 26 s e q u e n c e s w e r e t h e n standardised by d i v i s i o n by the appropriate m e a n for e a c h s e q u e n c e and pooled to form a single s e q u e n c e 676 years in length• B o t h the LLG and t h e G E V w e r e fitted to these pooled data u s i n g ML, the r e s u l t i n g shape p a r a m e t e r s being c = 0.203 for the LLG and k = - 0.052 for the GEV. U s i n g t h e s e e s t i m a t e d p a r a m e t e r s a t h o u s a n d series e a c h 26 years in l e n g t h were i n d e p e n d e n t l y g e n e r a t e d for e a c h distribution• F o r e a c h series the s k e w n e s s and Yn v a l u e s w e r e c a l c u l a t e d t o g e t h e r w i t h their a s s o c i a t e d C V as reported in Table 7. It is clear from t h e s e

221 TABLE 7 Coefficient of variation for skewness and for Yn for observed and 1000 simulated floods series.

Observed Simulated GEV Simulated LLG

Yn

sk

0.352 0.183 0.300

0.785 0.640 0.780

results that the LLG is far superior to the GEV in reproducing both the skewness values and the size of the largest flood for individual flood series. A similar experiment was conducted without trimming; i.e. using the original records with lengths which varied between 26 and 66 years and with 5 ¸

/LLG

4

3

",,",-" " ~" ..j~/~. / ,,.~.,.,'J,,.'""'~ GEV

2

/ ,," i I

o2 2

J

-t



I

I

I



0

EV1 Reduced variate

Fig. 6. Observed distribution (dots) of skewness compared with the sampling experiment derived distributions for skewness for the LLG (solid line) and GEV (dashed line).

222 6-

5

LLG 4

/,~

3

2

~

,...,..,.~'GEV

.SSI" ~,o •

1

EV1 Reduced variate

Fig. 7. Observed distribution (dots) of Yn comparedwith the samplingexperimentderived distributions for Yn for the LLG (solid line) and GEV (dashed line). a modal value of 31 years. Using the regional parameter estimates from Table 6, a thousand simulated sequences 31 years in length were generated from each distribution. The resulting distributions of skewness and the largest maximum flood Yn are shown in Figs. 6 and 7 together with those for the observed flood series (Versace et al., 1985). In such diagrams the separation effect is demonstrated by the tendency for the observed points to be above the curve for the simulated data sets. It is apparent from these figures that the LLG is relatively close to the observed flood curve whilst the GEV is less satisfactory particularly at the upper extreme which is the part of the curve which is especially important in the calculation of design floods. CONCLUSIONS In this initial appraisal of the LLG we have demonstrated that it has many properties well-suited for the modelling of flood frequency data. In terms of the

223

four properties itemised in the introduction, the LLG has consistently performed better than other candidate distributions, i.e. the GEV, TCEV, WAK, LN3 and P3. Specifically the LLG is capable of reproducing the same degree of skewness typically present in observed data [property (a)]. In a large data set it has also demonstrated that it is not unduly sensitive to the presence of an extreme outlier in the upper tail [property (b)]. Lastly, the pdf, cdf and inverse cdf can also be expressed in relatively simple closed expressions. Hence the standard methods of parameter estimation on the three parameters can readily be implemented. In applying the LLG to a variety of empirical data sets including both individual sites and regionally grouped sites, the distribution has performed extremely well. On the basis of EDF tests we have concluded: (1) that the LLG provides a better fit than the GEV for reasonably long flood series (> 24 years in length) at individual sites; and (2) that it provides a better fit than the GEV, LN3 and P3 on a regional basis. In the light of these results we commend the use of the log-logistic distribution in flood frequency analysis and hope that this paper will trigger a more detailed appraisal of its potential within engineering hydrology. ACKNOWLEDGEMENT

We are grateful to a referee for constructive criticism of an earlier version. REFERENCES Acreman, M.C., 1986. Estimating flood statistics from basin characteristics in Scotland. Unpubl. Ph.D. Thesis, Univ. of St. Andrews, 318 pp. Acreman, M.C. and Sinclair, C.D., 1986. Classification of drainage basins according to their physical characteristics; an application for flood frequency analysis in Scotland. J. Hydrol., 84: 365-380. Cunnane, C., 1986. Review of statistical models for flood frequency estimation. A key note paper presented at the Int. Symp. Flood Frequency and Risk Analysis. Baton Rouge, La., 51 pp. GENSTAT, 1980. Optimisation. Lawes Agricultural Trust, Rothamsted Experimental Station. Part I, chapter 12; Part II, chapter 13. Greenwood, J.A., Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979. Probability weighted moments. Definition and relation to parameters of distributions expressible in inverse form. Water Resour. Res., 15 (5): 1049-1054. Hosking, J.R.M., 1986. The theory of probability weighted moments. I.B.M. Res. Rep., RC 12210 (No. 54860), Thomas J. Watson Res. Center, New York, N.Y. Hosking, J.R.M., Wallis, J.R. and Wood, E.F., 1985a. An appraisal of the regional flood frequency procedure in the U.K. Flood Studies Report. Hydrol. Sci. J., 30 1 (3): 85-109. Hosking, J.R.M., Wallis, J.R. and Wood, E.F., 1985b. Estimation of the Generalised Extreme Value distribution by the method of probability weighted moments. Technometrics, 27 (3) 251-261. Houghton, J.C., 1978. Birth of a parent: The Wakeby distribution for modelling flood flows. Water Resour. Res., 14 (6): 1105-1109. Kappenman, R.F., 1985. Estimation for the three parameter Weibull, Lognormal and Gamma distributions. Comput. Stat. Data Anal., 3: 11-23. Lawless, J.F., 1986. A note on lifetime regression models. Biometrika, 73 (2): 509-512. Matalas, J.C. and Wallis, J.R., 1973. Eureka! It fits a Pearson type 3 distribution. Water Resour. Res., 9 (2): 281-289. Matalas, J.C., Wallis, J.R. and Slack, J.R., 1975. Regional skew in search of a parent. Water Resour. Res., 11 (6): 815-826.

224 MINITAB, 1985. Release 5.1, Minitab Inc., State College, PA 16801, USA, 232pp. Natural Environment Research Council, 1975. Flood studies report. Nat. Environ. Res. Counc., HMSO, London. Parzen, E., 1979. Nonparametric statistical data modelling. J. Am. Stat. Assoc., 74: 105-120. Pettitt, A.N. and Stephens, M.A., 1983. EDF statistics for testing for the Gamma distribution. Tech. Rep., Dep. Stat., Stanford Univ. Rossi, F., Fiorentino, M. and Versace, F.P., 1984. Two component extreme value distribution for flood frequency analysis. Water Resour. Res., 20 (7): 847~856. Stephens, M.A., 1974. EDF statistic for goodness of fit and some comparisons. J. Am. Stat. Assoc., 69: 730-737. Stephens, M.A., 1976. Asymptotic power of EDF statistic for exponentiality against Gamma and Weibull alternatives. Tech. Rep. No. 297, Dep. Stat., Stanford Univ. Stephens, M.A., 1979. Tests of fit for the logistic distribution based on empirical distribution function. Biometrika, 66:591 595. U.S. Water Resources Council, 1967. Guidelines for determining flood flow frequency. Hydrol. Comm., U.S. Water Resour. Counc., Washington, D.C. Versace, P., Fiorentino, M. and Rossi, F., 1985. Regional flood frequency estimation using the two component extreme value distribution. Hydrol. Sci. J., 30 (1): 51~4. Waylen, P. and Woo, Ming-ko, 1982. Prediction of annual floods generated by mixed processes. Water Resour. Res., 18 (4): 1283-1286.