Parameter estimation for the Pearson type 3 distribution using order statistics

Journal of Hydrology, 133 (1992) 215-232
Elsevier Science Publishers B.V., Amsterdam
0022-1694/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

S. Rocky Durrans
Civil Engineering Department, The University of Alabama, Tuscaloosa, AL 35487, USA
(Received 11 February 1991; revision accepted 2 August 1991)

ABSTRACT

Durrans, S.R., 1992. Parameter estimation for the Pearson type 3 distribution using order statistics. J. Hydrol., 133: 215-232.

The Pearson type 3 distribution and its relatives, the log Pearson type 3 and gamma family of distributions, are among the most widely applied in the field of hydrology. Parameter estimation for these distributions has been accomplished using the method of moments, the methods of mixed moments and generalized moments, and the methods of maximum likelihood and maximum entropy. This study evaluates yet another estimation approach, which is based on the use of the properties of an extreme-order statistic. Based on the hypothesis that the population is distributed as Pearson type 3, this estimation approach yields both parameter and 100-year quantile estimators that have lower biases and variances than those of the method of moments approach as recommended by the US Water Resources Council.

INTRODUCTION

The Pearson type 3 (P3) distribution and its relatives, the log Pearson type 3 (LP3) and gamma family of distributions, are among the most widely applied in the field of hydrology. These distributions have been utilized for modeling of precipitation amounts, annual runoff volumes, and most ubiquitously, annual flood peaks. Following recommendation of the LP3 distribution for use as the population of annual flood peaks in the USA (Benson, 1968), these distributions have been the subject of a considerable amount of research. Of particular interest has been the performance of competing parameter estimation schemes, both in terms of their ability to yield reliable parameter values and, of greater interest to the practitioner, to yield reliable estimates of quantiles.

Methods of parameter estimation which have been used for the LP3 distribution in particular include two method of moments techniques, the methods of mixed moments and generalized moments, the method of maximum likelihood, and the method of maximum entropy. Other methods, including probability weighted moments (Greenwood et al., 1979) and L-moments (Hosking, 1989), have not enjoyed much popularity in the case of this distribution.

The first of the two method of moments techniques, and probably the method most often used, is that recommended by the US Water Resources Council (WRC, 1981), now the Interagency Advisory Committee on Water Data (IACWD). Given an observed flood sequence $\{q_1, q_2, \ldots, q_n\}$ one computes a log-transformed series $x_i = \ln q_i$ (or $x_i = \log_{10} q_i$). The sample mean, sample variance, and a weighted combination of the sample coefficient of skewness of this transformed series and a regional coefficient of skewness value are then equated to the corresponding population values for a Pearson type 3 distribution. The second of the method of moments approaches, presented by Bobee (1975), is based on equating sample moments of the $q_i$ series to the population values for the LP3 distribution.

A common criticism of these two approaches is that they rely on sample estimates of the coefficient of skewness. This statistic is known to be significantly negatively biased (Wallis et al., 1974), algebraically bounded (Kirby, 1974), and to have large sampling variability. In the method presented by Bobee, population moments do not exist for some combinations of parameter values and thus the applicability of that approach must be constrained to a feasible parameter space.

Recognizing the undesirable properties of sample estimates of the coefficient of skewness, Rao (1980) presented a method of mixed moments. The strategy was to eliminate the use of third-order sample moments by combining moments of the observed $q_i$ and log-transformed $x_i$ data series. Evaluation of the mixed moments approach appeared to indicate that the best combination of moments involved the sample mean and variance of the observed (real-space) data and the sample mean of the transformed (log-space) data, i.e. the geometric mean of the real-space data.
Phien and Hira (1983), in a similar study, concluded that the best combination involved the sample mean and variance of the real-space data and the sample variance of the log-space data. This idea of mixing moments also led Bobee and Ashkar (1988) to develop a generalized method of moments. This latter method permits the use of moments of any order, positive or negative, and also gives rise to the so-called sundry averages method (SAM), which is based on use of the arithmetic, geometric, and harmonic means of the real-space data.

Within the class of parameter estimation methods the method of maximum likelihood is generally regarded as being the most desirable, at least as the sample size becomes large. It is not clear, however, that its asymptotic properties hold for the relatively small sample sizes normally found in hydrology. In the case of the P3 distribution, application of the method of maximum likelihood is also attended by pitfalls. In such applications of the method, the resulting estimators are not jointly sufficient (Bobee, 1979), have multiple roots for the location parameter (Rao, 1986; Arora and Singh, 1988), and have been known to yield estimates of the location parameter lying within the range of the observed data (Kite, 1977). Furthermore, maximum likelihood solutions exist only in cases where the absolute value of the coefficient of skewness is less than two. The method of maximum entropy (Singh and Singh, 1985) permits estimates to be obtained in this latter range of parameter values but this method, like the method of maximum likelihood, is computationally intensive to apply. This property of these estimation schemes, and perhaps a lack of understanding of them by practitioners, has contributed to the popularity of the more easily applied moment-based approaches.

Although the moment-based approaches are attractive for computational reasons, they must be applied with a certain degree of caution. If sample estimates of the coefficient of skewness are used, their inefficiency should be taken into account. Yevjevich (1972) has suggested that moment-based estimates of the coefficient of skewness in excess of an absolute value of about 0.5 be held suspect. In the case of mixed or generalized moments, care must be taken when the variance is small or the coefficient of skewness is large because population moments in real-space may not exist.

The primary limitation of moment-based methods occurs then for moderately to highly skewed data sets. The method of maximum likelihood might be applied in such situations but experience has shown that it will often yield either no solution or an unreasonable solution. The computational intensiveness of the method of maximum entropy also discourages its application. In light of these problems of estimation in the presence of high skewness, the present work evaluates the performance of a scheme which yields reasonable estimators.
The method is similar in spirit to the mixed and generalized moment approaches in that it avoids use of moments of order higher than two. It differs fundamentally from those methods, however, in that rather than using another moment to supplement the sample estimates of the mean and variance, it employs instead the properties of an extreme-order statistic. A result of this approach is that the properties of an estimator for the coefficient of skewness actually improve with increasing population skewness. The method is also extremely simple to apply.

The estimation scheme described here is not entirely new. It has been suggested by Bain (1978) and by Lall and Beard (1982). Variants of the approach have also been given by Bowman and Shenton (1988). Discussions in these references, however, have been largely qualitative; results of a quantitative evaluation of the performance of the method, both in terms of parameter and quantile estimation, are considered here.

The remainder of this paper is devoted to an exposition of the subject estimation method. The next section addresses some properties of order statistics and is followed by a section outlining the procedure for deriving parameter estimates for the P3 distribution. Finally, results of Monte Carlo studies performed to assess the properties of the scheme are presented.

DISTRIBUTIONS OF ORDER STATISTICS

Given a sample of $n$ independent and identically distributed random variables from a population having cumulative distribution function $F(x)$, the cumulative distribution function of the first-order statistic $x_{(1)}$, i.e. the smallest element in the sample, may be shown to be $G_1(x)$, where (David, 1981)

$$G_1(x) = 1 - [1 - F(x)]^n \tag{1}$$

Differentiating, the density function $g_1(x)$ is

$$g_1(x) = n[1 - F(x)]^{n-1} f(x) \tag{2}$$

where $f(x) = dF(x)/dx$ is the density function of the population. To illustrate the potential attractiveness of the first-order statistic for parameter estimation it is convenient to consider an exponential population. This is a special case of the P3 distribution when the latter has a location parameter equal to zero and a positive coefficient of skewness equal to two. The density and cumulative distribution functions for an exponential population are given by

$$f(x) = \lambda e^{-\lambda x} \tag{3}$$

and

$$F(x) = 1 - e^{-\lambda x} \tag{4}$$

These expressions, when substituted into (2), yield

$$g_1(x) = n\left(e^{-\lambda x}\right)^{n-1} \lambda e^{-\lambda x} = n\lambda e^{-n\lambda x} \tag{5}$$
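The result in (5) can be checked numerically. The following is a minimal simulation sketch, not part of the original study: the function name `simulate_exponential_minimum`, the number of trials, and the seed are illustrative assumptions. It draws repeated exponential samples and compares the sample minimum with the sample mean.

```python
import random
import statistics


def simulate_exponential_minimum(n, lam, trials=20000, seed=0):
    """Simulate the first-order statistic and the sample mean of
    exponential samples of size n with rate lam (hypothetical helper)."""
    rng = random.Random(seed)
    minima, means = [], []
    for _ in range(trials):
        sample = [rng.expovariate(lam) for _ in range(n)]
        minima.append(min(sample))      # first-order statistic x_(1)
        means.append(sum(sample) / n)   # sample mean
    return {
        "mean_min": statistics.mean(minima),
        "var_min": statistics.variance(minima),
        "var_mean": statistics.variance(means),
    }


if __name__ == "__main__":
    n, lam = 10, 2.0
    result = simulate_exponential_minimum(n, lam)
    # Values implied by (5): mean 1/(n*lam), variance 1/(n*lam)**2
    print(result, 1 / (n * lam), 1 / (n * lam) ** 2, 1 / (n * lam ** 2))
```

With a large number of trials the simulated mean and variance of the minimum should be close to $1/(n\lambda)$ and $1/(n\lambda)^2$, and the variance of the minimum should be roughly $1/n$ times that of the sample mean.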

It may be observed that the distribution of the first-order statistic is also exponential, with a mean of $1/(n\lambda)$ and a variance of $1/(n\lambda)^2$. In contrast, the variance of the sample mean is given by $\sigma^2/n = 1/(n\lambda^2)$. In conclusion, the sampling variability of the first-order statistic is but a (small) fraction, $1/n$, of that of the sample mean. This suggests that use of the first-order statistic in a parameter estimation context should be seriously considered.

One should recognize that the reason for the small sampling variability in the first-order statistic as developed for the case of an exponential population derives from the fact that the population is bounded in the direction of that statistic. The largest-order statistic, for example, would not have such desirable properties. In any case, the lesson to be learned here is that the old adage pertaining to the excessive sampling variability of sample extremes is not necessarily true. When one is dealing with bounded distributions such as the exponential discussed above, it may be true that the sample extreme in the bounded direction is one of the most stable statistics available.

As the P3 distribution, like the exponential, is bounded in one direction, it also should have desirable properties for one of its extreme-order statistics, at least over a subset of the feasible parameter space. To evaluate these properties it is valid to represent the P3 distribution by the two-parameter gamma (G2) distribution that has density function

$$f(x) = \frac{\alpha^{\beta}}{\Gamma(\beta)} x^{\beta-1} e^{-\alpha x} \tag{6}$$

where $x > 0$; $\alpha > 0$ and $\beta > 0$ are, respectively, scale and shape parameters; and $\Gamma(\cdot)$ denotes the (complete) gamma function. The expected value of the first-order statistic for a G2 population is representative of the difference between the corresponding statistic and the location parameter in a positively skewed P3 population. In the case of a negatively skewed P3 population, it is representative of the difference between the location parameter and the $n$th, or largest-order statistic. The $r$th non-central moment of the first-order statistic from a G2 population may be evaluated as

$$E\left[x_{(1)}^r\right] = \int_0^{\infty} x^r g_1(x)\,dx \tag{7}$$

where

$$g_1(x) = n\left[1 - \frac{\alpha^{\beta}}{\Gamma(\beta)}\int_0^x u^{\beta-1}e^{-\alpha u}\,du\right]^{n-1}\frac{\alpha^{\beta}}{\Gamma(\beta)}\,x^{\beta-1}e^{-\alpha x} \tag{8}$$
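The integral in (7) is straightforward to approximate numerically. The sketch below is illustrative only: the paper does not say which quadrature was used, so the composite Simpson's rule, the truncation point of the integral, and the restriction to integer $\beta$ (which gives the inner integral in (8) a closed form) are all assumptions, as are the function names.

```python
import math


def gamma_pdf(x, alpha, beta):
    """G2 density of eq. (6); beta is assumed to be a positive integer."""
    return alpha ** beta * x ** (beta - 1) * math.exp(-alpha * x) / math.factorial(beta - 1)


def gamma_sf(x, alpha, beta):
    """1 - F(x) for integer beta (closed form of the bracket in eq. (8))."""
    ax = alpha * x
    return math.exp(-ax) * sum(ax ** k / math.factorial(k) for k in range(beta))


def first_order_moment(r, n, alpha, beta, steps=20000):
    """Composite Simpson approximation of eq. (7), E[x_(1)^r]."""
    # Assumed truncation: far enough into the tail that the integrand is negligible.
    upper = (beta + 10.0 * math.sqrt(beta) + 10.0) / alpha
    h = upper / steps  # steps must be even for Simpson's rule

    def integrand(x):
        return x ** r * n * gamma_sf(x, alpha, beta) ** (n - 1) * gamma_pdf(x, alpha, beta)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Exponential check (beta = 1): mean 1/(n*alpha), variance 1/(n*alpha)**2
    m1 = first_order_moment(1, n=20, alpha=1.0, beta=1)
    m2 = first_order_moment(2, n=20, alpha=1.0, beta=1)
    print(m1, m2 - m1 ** 2)
```

For $\beta = 1$ the population is exponential, so the output can be compared directly with the closed-form mean and variance that follow from (5).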

Existence of these moments, i.e. convergence of the integrals, is guaranteed by the existence of the corresponding moments of the G2 population (David, 1981). Results of numerical integration of (7) for various values of $n$ and population properties are summarized in Figs. 1 and 2. It may be seen by inspection of these results that both the mean and variance of $x_{(1)}$ decrease with increasing $n$, increasing population coefficients of skewness, and decreasing population variance. The coefficient of variation of the first-order statistic tends to increase with increasing population coefficients of skewness and thus indicates that the mean of $x_{(1)}$ decreases faster than does the standard deviation of $x_{(1)}$. For population coefficients of skewness less than or equal to two the coefficient of variation tends to decrease, though slowly, with increasing sample size, and for population coefficients of skewness in excess of two it appears to increase with increasing sample size. Finally, for any given $n$ and population coefficient of skewness, the coefficient of variation appears to be independent of the population variance.

Fig. 1. Expected value of the first-order statistic from a two-parameter gamma population. The four curves shown for each population coefficient of skewness $\gamma_1$ correspond to population variances of 0.25, 0.50, 0.75, and 1.00. [Plot of $E[x_{(1)}]$ (logarithmic axis) against sample size $n$ for $\gamma_1$ = 0.50, 1.00, 1.50, 2.00, and 2.83.]

Fig. 2. Variance of the first-order statistic from a two-parameter gamma population with a variance of (a) 0.25, (b) 0.50, (c) 0.75, (d) 1.00. [Plots of $\mathrm{Var}[x_{(1)}]$ (logarithmic axis) against sample size $n$.]

For ready application of these results, an approximation may be derived to relate $E[x_{(1)}]$ to the sample size and the population variance and coefficient of skewness. Using techniques of linear regression, the developed relationship, applicable to sample sizes larger than 10 and smaller than about 50 or 60, and to population coefficients of skewness in the range from about 0.5 to 2.83, is of the form

$$E[x_{(1)}] = \frac{A(\gamma_1)\,\sigma_x}{n^{B(\gamma_1)}} \tag{9}$$

where $\sigma_x$ is the standard deviation of the population. Values of the regression coefficients $A(\gamma_1)$ and $B(\gamma_1)$ as functions of the population coefficient of skewness are given in Table 1.

TABLE 1

Regression coefficients

Coefficient of skewness, $\gamma_1$    $A(\gamma_1)$    $B(\gamma_1)$    $r^2$ of regression
0.50                                   3.454            0.123            0.92
1.00                                   1.589            0.315            0.97
1.50                                   1.106            0.606            1.00
2.00 (a)                               1.000            1.000            -
2.83                                   1.446            1.917            1.00

(a) Values shown for $\gamma_1 = 2$ were analytically derived from an exponential population.

Approximating functions for the regression coefficients may be taken as

$$A(\gamma_1) = -0.535(\gamma_1 - 2)^3 + 0.564(\gamma_1 - 2)^2 + 0.429(\gamma_1 - 2) + 1 \tag{10}$$

and

$$B(\gamma_1) = 0.224(\gamma_1 - 2)^2 + 0.917(\gamma_1 - 2) + 1 \tag{11}$$
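Relationships (9) to (11) translate directly into a few lines of code. The sketch below is illustrative; the function names `coeff_a`, `coeff_b`, and `expected_first_order` are hypothetical and not from the paper.

```python
def coeff_a(gamma1):
    """Regression coefficient A(gamma_1), eq. (10)."""
    d = gamma1 - 2.0
    return -0.535 * d ** 3 + 0.564 * d ** 2 + 0.429 * d + 1.0


def coeff_b(gamma1):
    """Regression coefficient B(gamma_1), eq. (11)."""
    d = gamma1 - 2.0
    return 0.224 * d ** 2 + 0.917 * d + 1.0


def expected_first_order(n, sigma, gamma1):
    """Approximate E[x_(1)] for a G2 population, eq. (9).

    Per the text, intended for sample sizes from about 10 to 50 or 60
    and gamma1 between about 0.5 and 2.83.
    """
    return coeff_a(gamma1) * sigma / n ** coeff_b(gamma1)


if __name__ == "__main__":
    # Exponential population (gamma1 = 2): eq. (9) reduces to sigma / n.
    print(expected_first_order(n=20, sigma=0.5, gamma1=2.0))
    # Compare the fitted polynomials with the tabulated coefficients.
    for g in (0.5, 1.0, 1.5, 2.0, 2.83):
        print(g, round(coeff_a(g), 3), round(coeff_b(g), 3))
```

Printing the polynomial values next to the entries of Table 1 gives a quick sense of how closely (10) and (11) reproduce the tabulated coefficients.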

Coefficients of determination for (10) and (11) are, respectively, $r^2 = 0.99$ and $r^2 = 1.00$. It should be noted also that (9) is exact when $\gamma_1 = 2$, i.e. when the population is exponential.

In summary, the utility of the first-order statistic for parameter estimation is greatest when the sample size is large, the population variance is small, the population coefficient of skewness is large, or any combination of these. Even though the coefficient of variation of the first-order statistic tends to increase with population skewness, its variance becomes sufficiently small that it may be applied with a high degree of confidence.

PARAMETER ESTIMATION

The density function of the P3 distribution may be written in the form

$$f(x) = \frac{|\alpha|}{\Gamma(\beta)}\,[\alpha(x - \theta)]^{\beta-1} e^{-\alpha(x - \theta)} \tag{12}$$


where $\theta$ is a location parameter, $\alpha \neq 0$ is a scale parameter, and $\beta > 0$ is a shape parameter. When $\alpha < 0$ the distribution is negatively skewed and the range of $X$ is $-\infty < x < \theta$. Conversely, when $\alpha > 0$ the distribution is positively skewed and the range is $\theta < x < \infty$. Some algebraic manipulation of the well-known expressions for the moments of a P3 population yields

$$\beta = \alpha^2 \sigma^2 \tag{13}$$

and

$$\alpha = \frac{\mu - \theta}{\sigma^2} \tag{14}$$

where $\mu$ and $\sigma^2$ are the mean and variance of the population. The latter expression is reminiscent of Pearson's measure of skewness but involves the location parameter and variance rather than the mode and standard deviation.

Given a sample $x_i$, $i = 1, 2, \ldots, n$, estimators $\hat{\mu}$ and $\hat{\sigma}^2$ for the mean and variance may be written in the usual manner as

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{15}$$

and

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu})^2 \tag{16}$$
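For readers who wish to verify (13) and (14), the short derivation below may help. It assumes the standard moment expressions for the density (12), which the text calls well known but does not list; they are stated on the first line.

```latex
\begin{align*}
\mu &= \theta + \frac{\beta}{\alpha}, \qquad
\sigma^2 = \frac{\beta}{\alpha^2}, \qquad
\gamma_1 = \frac{2\alpha}{|\alpha|\sqrt{\beta}} \\
\frac{\mu - \theta}{\sigma^2} &= \frac{\beta/\alpha}{\beta/\alpha^2} = \alpha \\
\beta &= \alpha^2\sigma^2 = \alpha(\mu - \theta) = \frac{(\mu - \theta)^2}{\sigma^2}
\end{align*}
```

The three forms for $\beta$ on the last line are algebraically equivalent once (14) holds, so any of them yields the same shape parameter estimate.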

In cases where the population is positively skewed, an initial estimate $\hat{\theta}_0$ of the location parameter may be taken as equal to the first-order statistic, i.e. as the smallest data value:

$$\hat{\theta}_0 = x_{(1)} \tag{17}$$

An initial estimate $\hat{\alpha}_0$ of the scale parameter may then be formed from (14) and that result, when substituted into (13), yields an initial estimate $\hat{\beta}_0$ of the shape parameter. Finally, an initial estimate $(\hat{\gamma}_1)_0$ of the coefficient of skewness may be obtained as

$$(\hat{\gamma}_1)_0 = \frac{2}{|\hat{\alpha}_0|\,\hat{\sigma}} = \frac{2}{\sqrt{\hat{\beta}_0}} \tag{18}$$

As the initial estimate of the location parameter is taken as the smallest data value, results obtained for the initial estimate of the coefficient of skewness will be biased. An updated estimate of $\theta$ may be found as

$$\hat{\theta}_i = x_{(1)} - E\left[x_{(1)} - \theta \mid n, \hat{\sigma}^2, (\hat{\gamma}_1)_{i-1}\right] \tag{19}$$


where the second term on the right-hand side of this expression denotes the expected difference between the first-order statistic and the location parameter given the sample size and estimates of the variance and coefficient of skewness of the population. An approximation to this difference is given by (9) with substitution of estimators for their population counterparts. This updated estimate of the location parameter may then be used to refine the estimate of the coefficient of skewness. Repetition of the procedure may be performed until changes in estimates of the coefficient of skewness are judged to be sufficiently small.

When one is dealing with negatively skewed populations, two options are available for parameter estimation. The first is to invert the sample with respect to its mean and to compute the parameters as if the population were actually positively skewed. In this situation the final estimate of the population coefficient of skewness would be taken as $-\hat{\gamma}_1$ and the final estimate of the scale parameter would be taken as $-\hat{\alpha}$. A second option is to develop an initial estimate of the location parameter as $\hat{\theta}_0 = x_{(n)}$ and to repeat the procedure outlined above. Refined estimates of the location parameter and thus of the coefficient of skewness would then be developed as

$$\hat{\theta}_i = x_{(n)} + E\left[\theta - x_{(n)} \mid n, \hat{\sigma}^2, (\hat{\gamma}_1)_{i-1}\right] \tag{20}$$
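The iterative procedure for a positively skewed sample can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the convergence tolerance, and the clamping of the skewness estimate to the range over which (10) and (11) were fitted are all choices made here for the example.

```python
import math
from statistics import mean, variance


def coeff_a(gamma1):
    """Regression coefficient A(gamma_1), eq. (10)."""
    d = gamma1 - 2.0
    return -0.535 * d ** 3 + 0.564 * d ** 2 + 0.429 * d + 1.0


def coeff_b(gamma1):
    """Regression coefficient B(gamma_1), eq. (11)."""
    d = gamma1 - 2.0
    return 0.224 * d ** 2 + 0.917 * d + 1.0


def fit_p3_order_statistic(sample, tol=1e-6, max_iter=100):
    """Order statistics estimates for a positively skewed sample."""
    n = len(sample)
    mu, var = mean(sample), variance(sample)    # eqs. (15) and (16)
    sigma = math.sqrt(var)
    x1 = min(sample)
    theta = x1                                   # eq. (17)
    gamma1 = None
    for _ in range(max_iter):
        # Skewness implied by the current theta; equal to 2/sqrt(beta), eq. (18)
        new_gamma1 = 2.0 * sigma / (mu - theta)
        converged = gamma1 is not None and abs(new_gamma1 - gamma1) < tol
        gamma1 = new_gamma1
        if converged:
            break
        # Assumption: keep the argument inside the fitted range of (10), (11)
        g = min(max(gamma1, 0.5), 2.83)
        theta = x1 - coeff_a(g) * sigma / n ** coeff_b(g)   # eqs. (9), (19)
    alpha = (mu - theta) / var                   # eq. (14)
    beta = alpha ** 2 * var                      # same as (mu - theta)**2 / var
    return {"theta": theta, "alpha": alpha, "beta": beta,
            "gamma1": 2.0 / math.sqrt(beta)}


if __name__ == "__main__":
    data = [1.2, 0.7, 3.4, 0.9, 2.1, 1.5, 0.6, 4.8, 1.1, 0.8, 2.7, 1.9]
    print(fit_p3_order_statistic(data))
```

Because the correction term in (19) is positive, the updated location estimate always lies below the smallest observation, so the scale and shape estimates remain well defined throughout the iteration.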

Numerically, the last term in (20) is identical to the corresponding term in (19).

MONTE CARLO RESULTS

The performance of a parameter estimation scheme relative to competing or alternative schemes may be evaluated by generation of samples from known populations and comparison of parameter estimates derived from those samples. This so-called Monte Carlo experimental approach has been applied in the present study to compare the subject estimation scheme with the traditional method of moments approach as recommended by the WRC (1981). As these methods differ only in the way in which estimates of the coefficient of skewness are derived, results of the experiments are summarized for that statistic only. Some comments are also made, however, with respect to estimates of the parameters $\alpha$, $\beta$ and $\theta$.

The experimental design used consisted of synthetic generation of 500 samples of size $n$ from known P3 (actually three-parameter gamma) populations. The resulting 500 estimates of each of the parameters $\alpha$, $\beta$, and $\theta$ and each of the statistics $\mu$, $\sigma^2$ and $\gamma_1$ based on each estimation method were then used to approximate the means and variances of the respective sampling distributions. Sample sizes were $n$ = 20(20)80, population coefficients of variation were $C_v$ = 0.1(0.1)1.0, and population coefficients of skewness were 0.50, 1.00, 1.41, and 2.00. These values were selected because they correspond to integral values of the shape parameter and therefore simplify generation of the synthetic samples. Population coefficients of skewness equal to zero were also used by generation of normally distributed samples.

Estimates of the mean and variance for each sample were obtained using (15) and (16) and, in the case of the WRC method, estimates of the coefficient of skewness were obtained as

$$\hat{\gamma}_1 = \frac{n}{(n-1)(n-2)\,\hat{\sigma}^3}\sum_{i=1}^{n}(x_i - \hat{\mu})^3 \tag{21}$$
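As a small illustration of (21), the sample coefficient of skewness can be computed as below. The function name `wrc_skewness` is a label chosen here, and no regional weighting is applied.

```python
from statistics import mean, stdev


def wrc_skewness(sample):
    """Sample coefficient of skewness, eq. (21), without regional weighting."""
    n = len(sample)
    mu = mean(sample)        # eq. (15)
    sigma = stdev(sample)    # square root of eq. (16)
    third = sum((x - mu) ** 3 for x in sample)
    return n * third / ((n - 1) * (n - 2) * sigma ** 3)


if __name__ == "__main__":
    print(wrc_skewness([1, 2, 3, 4, 10]))   # right-skewed sample
    print(wrc_skewness([1, 2, 3, 4, 5]))    # symmetric sample
```

A symmetric sample gives a value of zero, while a single large observation in a short record produces a strongly positive estimate.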

Given these sample estimates of the moments, estimates of the parameters of the distribution were obtained by equating the sample moments to the population moments. For the case of the order statistics method, estimates of the coefficient of skewness and parameter values were obtained by application of the procedure outlined in the previous section except that updating of estimates as indicated in (19) was not performed.

Strictly speaking, and as noted in the Introduction, estimation of the coefficient of skewness as recommended by the WRC involves weighting of the sample estimate and a regional value. As the results of the study presented here are based on synthetically generated samples, however, weighting of sample estimates is not a practical alternative. This tends to bias the results unfairly in favor of the subject estimation method when exogenous (e.g. regional) information can be used to improve estimates which are based on (21). In applications where such information is not available, however, the comparisons may be considered to be fair.

Results of the experiments for the coefficient of skewness are shown in Figs. 3 and 4. The population coefficient of variation is not included as a parameter in either of those figures as the results were found to be insensitive to it for all practical purposes.

Experimental results obtained for the case of the WRC method are consistent with previously published data. The curves illustrated for that method in Fig. 3 are merely a representation of the figures provided in Table 1 of Bulletin 17B (WRC, 1981). Curves shown for the WRC method in Fig. 4 have been developed using the relationship

$$E(\hat{\gamma}_1) = \frac{n^2}{(n-1)(n-2)}\,E(G) \tag{22}$$

where values of $E(G)$ have been tabulated by Wallis et al. (1974, Table 4) based on extensive experiments. The remaining terms in this expression are included to make the results of Wallis et al. consistent with (21).
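The experimental design described above can be imitated on a small scale. The sketch below is an assumption-laden illustration rather than a reproduction of the study: it uses Python's `random.gammavariate` as the generator, fixes the location at zero and the scale at one (both skewness estimators are unaffected by location and scale), and, as in the experiments, applies the order statistics estimate without the updating step of (19).

```python
import random
from statistics import mean, stdev, variance


def wrc_skew(x):
    """Sample coefficient of skewness, eq. (21)."""
    n, m, s = len(x), mean(x), stdev(x)
    return n * sum((xi - m) ** 3 for xi in x) / ((n - 1) * (n - 2) * s ** 3)


def order_stat_skew(x):
    """Initial estimate from eqs. (13), (14), (17), (18): 2*sigma/(mu - x_(1))."""
    return 2.0 * stdev(x) / (mean(x) - min(x))


def compare_skew_estimators(shape, n, trials=500, seed=1):
    """Mean and variance of both estimators over synthetic gamma samples."""
    rng = random.Random(seed)
    wrc, order = [], []
    for _ in range(trials):
        # gammavariate(shape, scale); population skewness is 2/sqrt(shape)
        x = [rng.gammavariate(shape, 1.0) for _ in range(n)]
        wrc.append(wrc_skew(x))
        order.append(order_stat_skew(x))
    return {"wrc": (mean(wrc), variance(wrc)),
            "order": (mean(order), variance(order))}


if __name__ == "__main__":
    for shape in (16, 4, 2, 1):   # skewness 0.50, 1.00, 1.41, 2.00
        print(shape, compare_skew_estimators(shape, n=40))
```

Integer shape values of 16, 4, 2, and 1 correspond to the population coefficients of skewness 0.50, 1.00, 1.41, and 2.00 used in the study.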

Fig. 3. Variance of estimated coefficients of skewness. [Plot of $\mathrm{Var}(\hat{\gamma}_1)$ for the WRC method and the order statistics method, with curves labeled by sample size $n$.]

As indicated by Fig. 3, the sampling variance of estimates of the coefficient of skewness decreases with increasing sample size (as one would expect) and increases with increases in the population coefficient of skewness. The rate of increase is much slower, however, in the case of estimates based on the use of the first-order statistic. Also evident from Fig. 3 is that, at least over the parameter space evaluated, the order statistics method is uniformly more efficient than is the WRC method. The sampling variability of estimates obtained using the subject method is, for identical sample sizes and population properties, roughly 20% of that for the WRC method. The WRC method would require a sample size about five times as large as that required by the suggested approach to gain the same amount of information. Given the indication by Yevjevich (1972, p. 180) that the method of maximum likelihood performs as well as the method of moments does with a sample size about 2.5 times larger, the order statistics scheme may dominate that procedure also, at least with respect to estimation of the coefficient of skewness. One should expect, however, that this domination may be true only over a subset of the parameter space and/or for the case of small sample sizes. Exploration of these limits has not been performed.

Fig. 4. Expected value of estimated coefficients of skewness. [Curves for the order statistics method and the WRC method, labeled by sample size $n$, plotted against the population coefficient of skewness $\gamma_1$.]

Figure 4 shows that absolute biases in estimated coefficients of skewness decrease with increasing sample sizes and therefore that both methods appear to be consistent. The WRC method tends to underestimate the population coefficient of skewness and the order statistics method results in overestimates of that statistic. Absolute biases increase with increases in the population coefficient of skewness for the WRC method and decrease with increases in the population skewness for the order statistics method. When the population coefficient of skewness is near or larger than two, skewness estimates based on the first-order statistic are, practically speaking, unbiased. For reasons of symmetry, and noting that the results presented are for positive population coefficients of skewness, biases for negative population coefficients of skewness are equal to the negative of those illustrated in Fig. 4.


Recalling that the Monte Carlo experiments were performed without any updating of the estimates as expressed in (19), one should expect that biases in the estimates may be significantly reduced if that updating is applied. An alternative means of bias correction to that already given may be accomplished directly for estimates of the coefficient of skewness either by use of Fig. 4 or by use of a relationship which might be derived therefrom. Bias correction in this manner should be expected to be reliable because of the small sampling variability of the estimates. It is to be noted, however, that this method of bias correction is based on the assumption that the relationship $E(\hat{\gamma}_1 \mid \gamma_1)$ may be inverted to deduce $E(\gamma_1 \mid \hat{\gamma}_1)$. Lall and Beard (1982), using Bayesian techniques, demonstrated that such an approach may not be appropriate. Further research should address this issue.

In the interest of conciseness, results of the Monte Carlo experiments with respect to estimates of the parameters $\alpha$, $\beta$, and $\theta$ are described only briefly. The sampling variabilities of these parameter estimates based on the order statistics approach tend to be several orders of magnitude smaller than those developed using the method recommended by the WRC. This is especially true for the case of the shape parameter. With respect to biases in the parameter estimates, the order statistics approach again outperforms the WRC method, even when the population coefficient of skewness is small. The degree of improvement, however, is not as large as it is for the case of sampling variabilities.

QUANTILE ESTIMATION

An ultimate objective of frequency analysis is one of prediction of quantiles, often beyond the range of available data. It does not necessarily follow, however, that relatively poor parameter or moment estimates result in correspondingly poor quantile estimates. Correlations between parameter estimates may be such that reasonably good quantile estimates may be obtained in spite of the parameter estimates themselves. It is desirable, of course, that both parameter and quantile estimates have useful properties but this is not always obtainable.

To evaluate the performance of the suggested estimation procedure relative to that obtainable by application of the WRC procedure, an additional set of Monte Carlo experiments was performed. The only difference between these experiments and those summarized earlier is that population coefficients of skewness equal to zero were not considered. Quantile estimates $\hat{x}_{100}$ at the 0.01 exceedance probability level were made by each of the two estimation approaches, using the relationship

$$\hat{x}_{100} = \hat{\mu} + K_{100}\,\hat{\sigma} \tag{23}$$


where $\hat{\mu}$ and $\hat{\sigma}$ are the sample mean and sample standard deviation and $K_{100}$, a function of the estimated coefficient of skewness, is a standard deviate. Values of $K$ for various exceedance probability levels and for coefficients of skewness from -9 to +9 are tabulated in Appendix 3 of Bulletin 17B (WRC, 1981). Estimates of the coefficient of skewness for the WRC approach were obtained using (21). In the case of the order statistics approach, estimates of that statistic were obtained by correcting for bias using Fig. 4 rather than by updating of location parameter estimates as embodied in (19).

Choice of the 0.01 exceedance probability level quantile as the criterion for comparing the two estimation methods is somewhat arbitrary. It has been chosen, however, because of the ubiquity of adoption of the 100-year flood event as a design criterion for many types of hydraulic structures. As comparisons are made for that quantile only, the results should be interpreted as such. Either supporting or conflicting results may be obtained for other quantiles.

Results of the experiments with respect to the variability of estimates of $x_{100}$ are illustrated in Fig. 5 and indicate that the order statistics approach is more efficient than is the WRC approach when sample sizes are greater than about 20. Absolute biases in estimates by the order statistics approach are also smaller (typically about 50%) than those obtained by the WRC procedure.

Fig. 5. Variance of quantile estimates at the 0.01 exceedance probability level. [Plot of the ratio of $\mathrm{Var}(\hat{x}_{100})$ for the order statistics method to $\mathrm{Var}(\hat{x}_{100})$ for the WRC method against sample size $n$.]

CONCLUSION

This paper has summarized a parameter estimation scheme for the Pearson type 3 distribution which is based on properties of extreme-order statistics. The performance of this approach has been found to be superior to the more traditional method of moments approach recommended by the WRC, both in terms of estimates of parameters and the coefficient of skewness, and in terms of estimation of the 0.01 exceedance probability quantile.

Although these results are encouraging, additional study should be performed. The evaluations made here have been based on synthetically generated data sets. The more subtle problems of parameter estimation when outliers, zeroes, historical data, or even population mixtures must be considered are deserving of further attention. Additional work should be directed to assessment of the robustness (or possibly lack thereof) of the suggested estimation scheme. The results presented here are based on prior knowledge that the population is truly Pearson type 3. In actual applications one never knows with confidence that this is true. Finally, and with respect to an issue that has received little attention in hydrology, correlations between parameter estimators should be evaluated. Other things being equal, estimators that have the smallest correlations with one another should be adopted for application.

ACKNOWLEDGMENTS

Comments and suggestions made on draft versions of this paper by Dr. William L. Lane of the US Bureau of Reclamation and by a reviewer are gratefully acknowledged.

REFERENCES

Arora, K. and Singh, V.P., 1988. On the method of maximum likelihood estimation for the log Pearson type 3 distribution. J. Stochastic Hydrol. Hydraul., 2: 155-160.
Bain, L.J., 1978. Statistical Analysis of Reliability and Life-Testing Models: Theory and Methods. Marcel Dekker, New York.
Benson, M.A., 1968. Uniform flood frequency estimating methods for federal agencies. Water Resour. Res., 4: 891-908.
Bobee, B., 1975. The log Pearson type 3 distribution and its application in hydrology. Water Resour. Res., 11: 681-689.
Bobee, B., 1979. Comment on "The log Pearson type 3 distribution: the T-year event and its asymptotic standard error by maximum likelihood theory" by R. Condie. Water Resour. Res., 15: 189.
Bobee, B. and Ashkar, F., 1988. Generalized method of moments applied to LP3 distribution. ASCE J. Hydraul. Eng., 114: 899-909.
Bowman, K.O. and Shenton, L.R., 1988. Properties of Estimators for the Gamma Distribution. Marcel Dekker, New York.
David, H.A., 1981. Order Statistics, 2nd edn. John Wiley, New York.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C. and Wallis, J.R., 1979. Probability weighted moments: definition and relation to parameters of several distributions expressable in inverse form. Water Resour. Res., 15: 1049-1054.
Hosking, J.R.M., 1989. L-Moments: analysis and estimation of distributions using linear combinations of order statistics. J. R. Statist. Soc., Ser. B, 51.
Kirby, W., 1974. Algebraic boundedness of sample statistics. Water Resour. Res., 10: 220-222.
Kite, G.W., 1977. Frequency and Risk Analyses in Hydrology. Water Resources Publications, Littleton, CO.
Lall, U. and Beard, L.R., 1982. Estimation of Pearson type 3 moments. Water Resour. Res., 18: 1563-1569.
Phien, H.N. and Hira, M.A., 1983. Log Pearson type 3 distribution: parameter estimation. J. Hydrol., 64: 25-37.
Rao, D.V., 1980. Log Pearson type 3 distribution: method of mixed moments. ASCE J. Hydraul. Div., 106: 999-1019.
Rao, D.V., 1986. Fitting log Pearson type 3 distribution by maximum likelihood. International Symposium on Flood Frequency and Risk Analyses, Louisiana State University, Baton Rouge.
Singh, V.P. and Singh, K., 1985. Derivation of the Pearson type III distribution by using the principle of maximum entropy. J. Hydrol., 80: 197-214.
Wallis, J.R., Matalas, N.C. and Slack, J.R., 1974. Just a moment! Water Resour. Res., 10: 211-219.
WRC, 1981. Guidelines for Determining Flood Flow Frequency. Bull. 17B, Hydrology Committee, US Water Resour. Council, Washington, DC.
Yevjevich, V., 1972. Probability and Statistics in Hydrology. Water Resources Publications, Littleton, CO.