Journal of
Hydrology Journal of Hydrology 185 0996) 317-333
ELSEVIER
Uncertainty analysis of flood quantile estimates with reference to Tanzania S.H. M k h a n d i , R.K. K a c h r o o * , S.L. G u o 1 Water Resources Engineering Programme, University of Dar es Salaam, P.O. Box 35131, Dar es Salaam, Tanzania
Received 1 October 1994; accepted 26 September 1995
Abstract
A procedure based on a combination of descriptive and predictive ability tests, adopted for the selection of a suitable flood frequency model for use in Tanzania, is described. Based on this study the pooled annual maximum data of 53 stations in Tanzania can be equally well described by Pearson Type 3, log logistic or the general extreme value distributions, and the preferred method of parameter estimation for all three distributions is the method of probability weighted moments. A technique based on non-parametric fitting of distribution functions is suggested for systematic calculation of safety factors.
1. Introduction
Estimation of design floods involves selecting a frequency distribution and estimating parameters of this distribution from observed records of flood flows at the site of interest. To date, many flood frequency models have been suggested but none has been accepted universally. To achieve some degree of uniformity in the determination of flood quantiles some countries have agreed to adopt a certain distribution function. For example, the Log Pearson Type 3 was recommended by the US Water Resources Council (USWRC) in 1967 for use in the U S A (USWRC, 1967), and the general extreme value distribution was suggested by the Institute of Hydrology, UK, for use in the UK and Ireland (NERC, 1975). Floods estimated for any given return period are usually subject to errors. Those arising from incorrect choice of a model (i.e. a combination of parent distribution and a method of * Corresponding author. ~On leave from the Department of Hydrology and Water Environment, Wuhan University of Hydraulic and Electric Engineering, Wuhan, 430072, People's Republic of China. 0022-1694/96/$09.50 © 1996 - Elsevier Science B.V. All fights reserved SSDI 0022-1694(95)02955-9
318
S.H. Mkhandi et al./Journal of Hydrology 185 (1996) 317-333
parameter estimation) are referred to as model errors and those arising from inaccurately measured data are referred to as data errors. If the data errors are random and symmetric about zero mean then their effect cannot be distinguished from the effect of the model errors. To adjust for the effect of errors it is a common practice to multiply the estimated quantile by an empirical constant, known as the safety factor, to arrive at the finally acceptable flood estimate. This paper suggests a technique, based on non-parametric fitting of probability density functions, for systematic calculation of such factors.
2. Choice of flood frequency models for Tanzania In recent years emphasis in flood studies has changed from identifying the underlying population distribution to choosing a robust flood estimation procedure. To quote Cunnane (1989): "In the past (e.g. Benson, 1968, NERC, 1975), it was generally assumed that once a distributional choice was made according to a goodness of fit criterion that a satisfactory basis for flood estimation was established. Such an approach assumes that a single, as yet unidentified, underlying distributional form is adequate for modelling AM [annual maximum] flood peaks. Even if this assumption were true it has to be accepted that the true underlying distributional form cannot be identified with certainty at the present time, either on a single-site or regional basis. This fact should be taken into account when selecting a distribution. In doing so a distribution and an associated method of parameter estimation (denoted D/E for 'distribution and estimation procedure') must be sought which is robust with respect to extreme upper quantile estimation, over a reasonable range of distributions, random samples from which have statistical characteristics similar to observed AM flood data. Such a reasonable range of distributions will be termed 'flood-like', following Landwehr et al. (1980)."
2.1. Flood frequency models In a simulation experiment carried out at the University of Dares Salaam to choose suitable flood frequency models for countries of Southern Africa, ten statistical distributions, i.e. normal (NM), exponential (EXP), extreme value Type 1 (EV1), lognormal (LN), Pearson Type 3 (P3), log Pearson type 3 (LP3), two-parameter Gamma (G2), general extreme value (GEV), log logistic (LLG) and Wakeby (WK), and three parameter estimation methods, i.e. method of moments (MOM), the method of maximum likelihood (ML) and the method of probability weighted moments (PWM) were considered for evaluation. Combinations of distribution form and parameter estimation method were limited to 15 models. These are EV1-MOM, EV1-ML, EV1-PWM, LN-MOM, P3-MOM, P 3 PWM, LP3-MOM, G2-MOM, G2-ML, G2-PWM, GEV-PWM, GEV-MOM, L L G PWM, WK4-PWM and WK5-PWM. In the descriptive and the predictive ability tests, described in the following sections, these 15 models were considered for evaluation of the most robust flood frequency model for use in Tanzania.
319
S.H. Mkhandi et al./Journal of Hydrology 185 (1996) 317-333 2.2. Descriptive ability test
Following Rossi et al. (1984), the observed sampling distribution coefficient of variance (Cv), coefficient of skewness (Cs) and coefficient of kurtosis (Ck) from 53 Tanzanian annual maximum series of 15 years average length were used as a criterion to be reproduced by similar sized data sets drawn randomly from various assumed parent distributions. Ten parent distributions, mentioned above, were chosen for deriving the simulated sampling distributions of Cv, Cs and Ck values for Tanzania. The normal distribution was also added for the sake of comparison. The parent series were generated using the statistics of the pooled data of 53 chosen stations from Tanzania. The location of the chosen stations is shown in Fig. 1. From Fig. 2 it can be seen that the sampling distribution of Cv from the observed records is close to the mean sampling distribution of Cv from data sets drawn randomly from LLG and GEV distributions. Fig. 3 shows that many distributions including LLG and GEV can 0.5 °
S
5.5"
S
OJ "1:3 -3 4-J
.....,,.'.,
4-J ¢0 i
10.5
°
S
15.5
°
S
, 28 °
E
I 33 °
,
E
i 38 °
,
E
43 °
E
Longitude Fig. 1. Locations of 53 stations from Tanzania used to evaluate the performance of different estimation procedures.
320
S.H. Mkhandi et aL/Journal of Hydrology 185 (1996) 317-333 .6" -
Legend:
"
I Oata
7 EXP
.
2 LLG
8 EV!
1.2-
3 GEv
9 N~
1.0-
5 P3
1.4-
aJ
/
I
/
/
~
1 1 LP3
~
2
/ /
/
4
.& i
.6
c..)
.4
.2
, .0 -2.0
I , -1.0
I 0.0
,
I 1.0
,
I 2.0
,
I 3.0
EV1 R e d u ce d
,
I 4.0
,
variate,
I 5.0
yt
Fig. 2. Simulated distribution of Cv values for Tanzania's AM flood data by ten parent populations.
reproduce the sampling distribution of Cs comparable with the observed distribution of Cs. From these two figures it seems that LLG and GEV are the most reasonable contestants for choices as parent distribution for flood flows in Tanzania. 2.3. Predictive ability test The simulation experiment done at the University of Dar es Salaam, referred to above, was designed to investigate how well a distribution and its associated method of parameter estimation, jointly known as a D/E procedure, can estimate the Q - T relationship when the population distribution is not identical to that of the proposed distribution. This experiment 4.0"
Legend: 1 LLG
7 LN
3.0'
2 6EV 3 EXP 4 Data
8 EV1 9 G2 ,0 LP3
~ WAK
11NR
.
2.0 .
i ~
~
~
~
/ ~ 1
2
~ / 0
3 4 5 6 7 + 1,
I
1.0I
03 (J
.(>
- 1 .0 -2.0 -2.0
-1.0
0.0
1.0
2.0
3.0
EV~ R e d u c e d
4.0
vapiate,
5.0
yt
Fig. 3. Simulated distribution of Cs values for Tanzania's AM flood data by ten parent populations.
0.1 -0.4 1.5 -1.7
0.0 -1.8 1.9 -2.3
0.2 -0.2 0.3 2.2
-0.6 1.1 - I. I 1.7
-0.4 -1.0 1.3 -2.1
0.25
LP3/MOM G2/PWM LN/MOM P3/MOM
LP3/MOM LN/MOM EV1/MOM P3/MOM
EV1/MOM LP3/MOM EV1/M P3/PWM
LP3/MOM EV1/PWM EV1/MOM P3/PWM
EV1/PWM LP3/MOM P3/PWM EV1/MOM
Cs/Cv
.5
1.0
1.5
2.0
2.5
LP3/MOM P3/PWM LN/MOM GEV/PWM
LP3/MOM P3/PWM GEV/PWM LN/MOM
P3/PWM LP3/MOM EV1/PWM EV1/MOM
EV1/PWM P3/PWM LP3/MOM EV1/MOM
G2/MOM G2/ML G2/PWM LP3/MOM
0.50
0.4 -1.3 2.0 3.8
0.7 -0.8 3.8 4.9
-0.1 0.6 -2.6 -3.3
0.3 0.8 -0.8 -1.0
-0.1 0.1 1.4 -1.9
LP3/MOM P3/PWM GEV/PWM G2/MOM
LP3/MOM P3/PWM GEV/PWM G2/MOM
P3/PWM LP3/MOM G2/PWM G2/MOM
P3/PWM G2/PWM G2/M G2/MOM
G2/MOM LP3/MOM G2/ML EV1/PWM
0.75
0.5 -2.7 5.1 -6.8
0.7 -2.1 5.2 -5.4
-1.3 -2.6 -3.1 -3.4
0.1 -0.4 -1.2 -1.5
0.8 -1.3 1.4 -1.7
P3/PWM LP3/MOM G2/MOM G2/ML
LP3/MOM P3/PWM G2/MOM G2/PWM
LP3/MOM P3/PWM G2/PWM G2/MOM
G2/PWM P3/PWM G2/M LP3/MOM
G2/MOM G2/ML P3/PWM LP3/MOM
1.00
Table 1 Bias indicators of different models for return period, T - 100 years and sample size, N - 30.
-2.9 5.0 -5.6 -6.7
-0.8 -2.4 -4.1 -4.1
0.2 -1.0 -2.1 -2.6
0.0 0.3 -0.8 0.8
1.0 1.5 1.6 1.6
LP3/MOM P3/PWM G2/MOM G2/PWM
P3/PWM G2/MOM LP3/MOM G2/PWM
P3/PWM G2/PWM G2/MOM G2/ML
G2/M G2/MOM G2/PWM P3/PWM
G2/MOM G2/ML P3/PWM G2/PWM
1.25
.3 -2.5 -4.2 -4.7
-1.1 -2.8 3.0 -3.1
-0.1 -1.4 -1.7 -1.9
-0.3 -0.4 0.5 0.8
1.4 1.8 1.9 2.8
P3/PWM G2/MOM G2/PWM G2/ML
P3/PWM G2/MOM G2/PWM G2/M
P3/PWM G2/PWM G2/MOM G2/M
G2/MOM G2/ML G2/PWM P3/PWM
G2/MOM G2/ML P3/PWM G2/PWM
1.50
-.9 -2.7 -3.4 -3.5
-0.2 -1.9 -2.2 -2.6
0.4 -0.9 -1.1 -1.5
-0.1 -0.1 0.7 1.1
1.5 1.8 2.0 2.9
7.7 7.8 7.9 8.1
8.0 8.1 8.3 8.4
8.1 8.2 8.7 8.8
8.1 8.4 8.9 9.1
8.2 8.6 9.1 9.2
0.25
G2/PWM G2/ML G2/MOM EV1/PWM
G2/PWM EV1/ML G2/ML EV1/PWM
EV1/M G2/PWM EV1/PWM G2/M
EV1/M G2/PWM EV1/PWM LN/MOM
EV1/ML G2/PWM EV1/PWM LN/MOM
Cs/Cv
.5
1.0
1.5
2.0
2.5
EV1/ML EV1/PWM G2/PWM EV1/MOM
EV1/ML EV1/PWM G2/PWM EV1/MOM
EV1/M EV1/PWM G2/PWM EV1/MOM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
0.50
11.0 13.0 14.2 15.3
11.0 12.8 13.9 14.6
10.9 12.6 13.7 13.9
11.0 12.2 13.2 13.4
11.0 11.9 12.5 13.0
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/M EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/M EV1/PWM EV1/MOM G2/PWM
0.75
12.5 15.2 18.5 18.6
12.6 15.2 17.9 18.5
12.5 14.8 17.1 18.1
12.6 14.6 16.4 17.6
12.4 14.1 15.6 16.9
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/M EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
1.00
Table 2 Se indicators of different models for return period, T - 100 years and sample size, N - 30.
13.5 16.7 21.0 22.3
13.6 16.6 20.4 22.1
13.6 16.4 19.6 21.4
13.4 15.8 18.6 20.4
13.4 15.5 17.9 19.8
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/M EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
1.25
14.2 17.7 23.0 25.1
14.0 17.4 22.2 24.2
14.1 17.2 21.4 23.6
14.0 16.8 20.6 22.8
13.9 16.3 19.6 21.9
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
EV1/ML EV1/PWM EV1/MOM G2/PWM
1.50
14.9 18.8 25.1 27.5
14.8 18.5 24.2 26.7
14.7 18.1 23.3 25.9
14.6 17.7 2Z3 25.0
14.4 17.1 21.3 23.9
toa t,J
11.2 11.2 11.8 12.4
12.2 12.3 13.0 13.4
G2/PWM EVI/MOM LN/MOM EV1/ML
EV1/MOM EV1/PWM G2/PWM EVI/ML
EV1/PWM EV1/MOM G2/PWM G2/ML
EV1/MOM EV1/PWM G2/PWM P3/PWM
1.0
1.5
2.0
2.5
10.1 10.2 10.3 10.7
8.9 9.3 9.5 9.6
G2/PWM G2/M G2/MOM EV1/MOM
0.5
8.3 8.5 8.5 9.7
0.25
Cs/Cv
EV1/PWM EV1/MOM EV1/ML G2/PWM
EV1/I'WM EV1/MOM EV1/ML G2/PWM
EV1/PWM EV1/ML EV1/MOM G2/PWM
EM1/ML EV1/PWM EV1/MOM G2/PWM
EV1/MOM EV1/PWM G2/MOM EV1/ML
0.50
16.2 17.2 18.0 18.4
14.7 15.9 15.9 16.8
13.3 13.5 14.5 15.1
11.9 12.7 13.5 13.9
13.9 14.2 14.5 14.6
EV1/PWM EV1/MOM G2/PWM G2/ML
EVI/PWM EV1/MOM G2/PWM G2/M
EV1/PWM EV1/MOM G2/PWM EV1/ML
EV1/PWM EV1/ML EV1/MOM G2/FWM
EV1/PWM EVI/MOM EV1/ML G2/MOM
0.75
20.8 21.6 21.9 22.8
19.2 20.3 20.4 21.4
17.5 18.7 19.2 19.4
16.7 17.6 17.6 19.1
17.4 17.5 18.3 19.4
G2/PWM EV1/PWM EV1/MOM G2/ML
EV1/PWM G2/PWM EVI,rMOM G2/M
EV1/PWM EV1/MOM G2/PWM G2/ML
EV1/PWM EV1/MOM G2/PWM G2/ML
EV1/MOM EV1/PWM GEV/MOM EV1/ML
1.00
Table 3 Rinse indicators of different models for return period, T - 100 years and sample size, N - 30.
24.7 25.1 25.4 26.1
23.4 23.6 24.0 24.7
21.6 22.4 23.0 23.7
22.8
20.4 21.0 22.6
20.8 21.0 22.4 22.9
G2/PWM EV1/MOM EV1/PWM G2/ML
EV1/PWM G2/PWM EV1/MOM G2/M
EV1/PWM EV1/MOM G2/PWM G2/ML
EV1/PWM EV1/MOM G2/PWM G2/ML
EV1/MOM EV1/PWM GEV/MOM P3/MOM
1.25
27.0 28.0 28.2 28.7
25.8 26.0 26.1 27.4
24.4 24.9 25.6 26.5
23.6 23.8 25.6 25.9
23.4 23.9 24.8 26.0
G2/PWM EV1/PWM EV1/MOM G2/ML
EV1/PWM EV1/MOM G2/PWM G2/M
EV1/PWM EV1/MOM G2/PWM G2/ML
EV1/PWM EV1/MOM G2/PWM GEV/IVlOM
EV1/MOM EV1/PWM GEV/MOM P3/MOM
1.50
29.4 29.5 29.6 31.3
28.1 28.4 28.7 30.3
26.9 27.2 28.3 29.3
26.0 26.1 28.1 28.3
25.7 26.2 27.0 28.3
t-
-3.1 8.0 17.5 -23.2
-.7 7.3 -15.5 -16.7
1.0 -7.1 -23.8 -28.6 -39.4
-3.1 -22.0 -26.8 -32.2
-6.6 -29.1 -31.6 -32.7
0.25
P3/PWM EV1/PWM LLG/PWM LN/MOM
EV1/ML LLG/PWM EV1/'PWM P3/PWM
LLG/PWM EV1/PWM P3/PWM EV1/PWM P3/PWM
LLG/PWM EV1/PWM P3/PWM GEV/PWM
LLG/PWM P3/PWM EV1/PWM GEV/PWM
Cs/Cv
0.5
1.0
1.5
2.0
2.5
LLG/PWM GEV/PWM P3/PWM Llq/MOM
LLG/PWM LNfMOM GEV/PWM P3/PWM
LN/MOM LLG/PWM EV1/PWM GEV/PWM
LLG/PWM EV1/PWM EVl/PWM EVl/ML
LLG/PWM EV1/PWM EV1/ML EV1/PWM
0.50
-19.5 -37,1 -42.3 -43.7
-17.0 -31.9 -36.8 -40.9
-8.8 -14.5 -36.3 -37.0
-9.2 -24.4 -32.9 -33.1
-1.4 -7.4 -14.8 -19.7
LN/MOM LLG/PWM GEV/PWM P3/PWM
LLG/PWM LN/MOM GEV/PWM P3/PWM
LLG/PWM GEV/PWM G2/PWM P3/PWM
LLG/PWM GEV/PWM G2/PWM G2/ML
LLG/PWM G2/PWM GEV/PWM G2/M
0.75
-10.2 -22.7 -35.6 -45.9
-20.4 29.5 -35.2 -44.9
-17.0 -34.4 -41.5 -42.5
-11.2 -32.0 -34.7 -38.1
-2.4 -25.5 -28.2 -30.0
LLG/PWM GEV/PWM LN/MOM P3/PWM
LLG/PWM GEV/PWM G2/PWM P3/PWM
LLG/PWM GEV/PWM G2/'PWM P3/PWM
LLG/PWM GEV/PWM G2/PWM G2/ML
LLG/PWM GEV/PM G2/PWM G2/M
1,00
-23.0 -32.2 43.7 -46.2
-21.0 -31.9 -44.2 -45.4
-15,8 -29.0 -40.0 -42.2
-9.6 -25.5 -34.5 -37.5
-2.0 -21.8 -27.4 -31.0
LLG/PWM GEV/PWM G2/PWM P3/PWM
LLG/PWM GEVfPWM G2/PWM P3/PWM
LLGfPWM GEV/PWM G2/PWM P3/PWM
LLG/PWM GEV/PWM G2/PWM G2/ML
LLG/PWM GEV/PWM G2/PWM G2/ML
1.25
Table 4 Expected probability of exceedanee indicators of different models for return period, T = 100 years and Sample size, N = 30.
-22.3 -28.9 -44.7 -45.0
-17.3 -25.7 -41.4 -41.9
-13.6 -23.6 -38.2 -39.7
-8.7 -21.0 -33.7 -36.6
-2.1 -17.6 -27.5 -30.9
LLG/PWM GEV/PWM P3/PWM G2/PWM
LLG/PWM GEV/PWM G2/PWM P3/PWM
LLG/PWM GEV/PWM G2/PWM
LLG/PWM GEV/PWM G2/PWM P3/PWM
LI.,G/PWM GEV/PWM G2/PWM G2/ML
1.50
-20.7 -26.0 -42.7 -43.1
-17.8 -24.2 -40.7 -40.8
-14.7 -22.4 -38.1
-11.0 -20.3 -34.4 -37.0
-4.8 -16.8 -28.6 -32.0 lal
S.H. Mkhandi et al./Journal of Hydrology 185 (1996) 317-333
325
involved generation of random samples from five different parent distributions (i.e. EV1, P3, G2, GEV and LN) and then estimation of quantiles using the 15 estimation procedures mentioned above, repeatedly for 1000 samples. The parameters for the parent distributions were derived from a range of populations with mean of 1.0, Cv s 0.25, 0.50 ..... 1.50 and Cs m 0.50, 1.00 ..... 2.50. Bias, root mean square error (rmse), standard error (SE) and expected probability of exceedance (PE) were used as indicators to test the predictive ability of the models. The expressions of these indicators are given by Eqs. (1)-(4): Bias(%) mE[(Qr - Qr)/Qr] × 100
^
rmse(%).-
2
1/2
( {E[(2r-Qr]2}l/2/Qr)× lOO
PE .. 1 - Fx(Qr)
(1)
(2)
(3)
(4)
where Qr, Qr are estimated and population quantiles, respectively, E(.) is the expected value, F~(.) is the distribution function and T is the return period. Eqs. (1)-(3) have been commonly used in the past in such studies (e.g. Landwehr et al., 1980). The average values of these indicators were obtained for the five assumed parent distributions. These were then used to compare the performance of different procedures for the given return periods (T .- 20, 50, 100 and 200) and sample sizes (n .. 20, 30 and 50). Full results of this work are not presented here but those parts of the results that are of immediate interest are presented in Tables 1-4. Table 1 presents the values of bias for various models for 100 years return period and sample size of 30. The values of bias are presented, together with the best four out of the 15 models used in the simulation. Tables 2 - 4 present similar results for the SE, the rinse and the expected PE. The results presented in Table 4, i.e. for the PE, are presented as differences between the estimated and correct return period in years. For example, a value of -3.1 indicates that the value of expected return period is 3.1 years under the specified return period. Based on these results, the following observations can be made for the case of pooled annual maximum data of 53 Tanzanian catchments which have an average value of Cv = 0. 6 and Cs l 0.8. The preferred procedure from consideration of bias is the P3-PWM or LP3-MOM. The preferred procedure from consideration of SE and the rinse is E V 1 PWM or EV1-ML, whereas the preferred procedure as far as expected PE is concerned is LLG-PWM, followed closely by GEV-PWM. Obviously, a three-parameter distribution is better able to tit the observed data than is a two-parameter distribution but its associated SE of quantiles is higher than that for a twoparameter distribution. The authors of this paper are of the opinion that in the selection of a flood frequency analysis model it would seem appropriate that the ability to fit the data,
326
S.H. Mkhandi et aL/Journal of Hydrology 185 (1996) 317-333
indicated by lower bias and smaller difference in the expected PE, should be given preference over lower SE of estimate if the latter is accompanied by poor fitting ability. Therefore, the recommendation of this study would be to couple the method of probability weighted moments (PWM) with LLG, P3 or GEV to model the Tanzanian annual maximum data.
3. Uncertainty analysis of flood quantile estimates Uncertainty in flood estimation can be caused by inaccurate measurements of the data known as data errors. Inaccuracy in measurement also includes error introduced by fitting and extrapolation of rating curves. Uncertainty is also caused by model errors which comprise incorrect estimation of the population parameters owing to sampling errors, possible incorrect choice of the population density function and possible incorrect choice of the parameter estimation procedure. A Monte Carlo type experimental study, described below, was carried out to investigate the extent to which the estimated flood quantiles might be affected by data and model errors. 3.1. Design o f experiment
Based on the results of the simulation experiment, it seems that LLG, GEV or P3 is the most likely parent distribution of Tanzanian AM flood data. These three distributions and EV1 were used as parent distributions and samples of size 10, 20, ..., 100 were generated. EV1 was included because it is the most preferred two-parameter distribution world-wide. The parent data were generated for unit mean and Cv and Cs equal to 0.6 and 0.8, respectively. These are the mean values of Cv and Cs of the 53 AM series of Tanzania. For a given parent distribution, sample size and return period, the simulation procedure comprised the following steps: •
Step 1. Generate a sample of size n from an assumed parent distribution, i.e. x(/), i - 1,...,n. • Step 2. Using four estimation procedures, namely, EV1-PWM, P3-PWM, G E V PWM and LLG-PWM, estimate quantiles Qr from the x(/) series. * Step 3. Repeat the procedure (Steps 1 and 2) 1000 times to arrive at 1000 quantile estimates of Qr(J), J ~ 1 ..... 1000. • Step 4. Estimate the probability density function (sampling distribution) of Qr(j) values obtained in Step 3. It was observed that skewness of the 1000 values of Qr(J) varied with the type of the parent distribution, estimation procedures and the sample size (Table 5). It is expected that the skewness of quantile estimates will increase as the population skewness increases. Exploratory data analysis of the Qr(J) values showed that it was not possible to assume a single suitable parametric form to describe the sampling distribution of Qr(j) values. As a result, the nonparametric method of fitting distribution functions was chosen because it is much
S.H. Mkhandi et aL/Journal of Hydrology 185 (1996) 317-333
327
Table 5 List of skewness of 100 years quantile estimates as a function of parent distributions, estimation procedures and sample sizes Parent
Model
Sample size 10
20
30
40
50
100
LLG
EV1-PWM P3--PWM GEV-PWM LLG-PWM
1.019 1.415 1.497 1.297
0.774 1.215 1.291 1.004
0.355 1.196 1.228 1.045
0.426 0.860 0.664 0.958
0.376 0.673 1.010 0.755
0.428 0.458 0.613 0.590
P3
EV1-PWM P3--PWM GEV-PWM LLG-PWM
0.460 0.694 0.920 0.849
0.191 0.570 0.730 0.620
0.335 0.477 0.588 0.298
0.196 0.445 0.579 0.523
0.268 0.571 0.458 0.368
0.373 0.352 0.344 0.418
GEV
EV1-PWM P3--PWM GEV-PWM LLG-PWM
0.593 0.936 0.925 0.945
0.304 0.639 0.715 0.577
0.247 0.551 0.571 0.401
0.236 0.436 0.474 0.388
0.057 0.489 0.350 0.341
0.087 0.141 0.312 0.283
EV1
EV1-PWM P3-PWM GEV-PWM LLG-PWM
0.707 1.130 1.166 1.089
0.519 0.955 0.886 0.646
0.427 0.643 0.877 0.644
0.363 0.560 0.607 0.674
0.249 0.572 0.497 0.503
0.113 0.259 0.541 0.314
more flexible in its fitting ability. The method described in Appendix A was found satisfactory. Step 5. Carry out the uncertainty and risk analysis as described in the next section. 3.2. Uncertainty or risk analysis
To assess the effect of model errors and data errors in quantile estimation, quantile values Or were obtained for specified probability of exceedance P(O.T) " 0.95, 0.90, 0.1 and 0.05 by using Eq. (A6) and Eq. (A7). The relative bias (RB) of Qr and the average flood quantile estimate E(Qr) are calculated by
RBp(t~r ) = { [ Q r - E ( Q T ) ] / E ( Q . r ) }
x 100
(5)
The magnitude of relative bias indicates the effect of model errors and data errors on the estimated quantiles for each sample considered. Fig. 4 plots relative bias of 100 years quantiles as a function of sample size only for the GEV population. These values were estimated by the four chosen estimation procedures, namely, EV1-PWM, P3-PWM, GEV-PWM and LLG-PWM models. The magnitude of uncertainty shows a tendency to decrease with increase in the sample size, and the GEV-PWM has the largest RB values whereas the EV1-PWM has the smallest RB values.
S.H. Mkhandi et aL/Journal of Hydrology 185
328 (a)
5040-
p=5% ~p=lO% p=90% /p=95%
20 1(}
-~.0-20-30,
I
,
20
0
(b)
60--
30
-40
(1996) 317-333
I
,
40
I
,
60
Sample Size.
I 80
,
I
5O403020--
~ p = 5 %
p=lO%
~0-J.0-20-30-40
-
0
100
20
40
60
Sample Size,
N
80
-
p=90%
100
N
(d)
80"-6O 4O
ae
20--
~ p = 5 %
-20
-6
, o
I
,
20 Sample
I , 40
I , so
Size.
I , I aO 100
N
• -
-40 0
I
,
,
20
40
Sample
i
60
Size,
80
,
p=90% p=95% Ii
100
N
Fig. 4. Relative bias of 100 years quantile estimates as a function of sample size for GEV population. (a)
EV1-PWM; Co) P3---PWM; (c) GEV-PWM; (d) LLG-PWM.
If safety factor is defined as the value of 95% confidence limit of the sampling distribution of Qr(J), which corresponds to exceedance probability P(Qr) " 0.05 or risk of 5%, then the safety factor (SF) ran be calculated by SF,- 1.0 + RB(%)etOr ).0.05
(6)
Table 6 summarizes the safety factor values of 100 years design flood for different parent distributions, frequency models and sample sizes. It can be seen from the table that the EV1-PWM has the smallest SF values compared with the other three models, and the GEV-PWM has the largest SF values among the four models used. The safety factors reduce as the sample size increases. As the true parent distribution of Tanzania's flood data is not known, it may be better to average the safety factors over four parent distributions for each estimation procedure and sample size. The average values are also presented in Table 6. These values, which depend on the model type and sample size, are suggested for use for estimating design floods in Tanzania. For example, let us assume the length of recorded data is 30 years and the
EV1-PWM P3-PWM GEV-PWM LLG-PWM EV1-PWM P3-PWM GEV-PWM LLG-PWM EV1-PWM P3-PWM GEV-PWM LLG-PWM EV1-PWM P3-PWM GEV-PWM LLG-PWM EV1-PWM P3-PWM GEV-PWM LLG-PWM
LLG
Mean
EV1
GEV
P3
Model
Parent
1.46 1.57 1.73 1.67 1.42 1.53 1.63 1.60 1.42 1.54 1.65 1.61 1.47 1.60 1.76 1.71 1.44 1.56 1.69 1.65
10
Sample size
Table 6 Safety factors of 100 years design floods
1.33 1.42 1.56 1.50 1.30 1.38 1.48 1.42 1.30 1.41 1.46 1.42 1.31 1.47 1.53 1.50 1.31 1.42 1.51 1.46
20 1.26 1.37 1.46 1.46 1.22 •.32 1.37 1.36 1.24 1.31 1.39 1.36 1.28 1.37 1.47 1.42 1.25 1.34 1.42 1.40
30 1.21 1.32 1.40 1.39 1.19 •.27 1.33 1.32 1.21 1.27 1.32 1.30 1.24 1.32 1.39 1.38 1.21 1.30 1.36 1.35
40 1.20 1.28 1.36 1.34 1.17 1.25 1.29 1.26 1.18 1.26 1.30 1.27 1.20 1.29 1.34 1.31 1.19 1.27 1.32 1.30
50 1.18 1.27 1.30 1.30 1.17 1.22 1.27 1.25 1.16 1.23 1.26 1.25 1.19 1.26 1.30 1.27 1.18 1.25 1.28 1.27
60 1.17 1.25 1.29 1.28 1.15 1.21 1.24 1.23 1.16 1.21 1.26 1.24 1.18 1.24 1.28 1.26 1.16 1.23 1.27 1.25
70 1.15 1.23 1.28 1.26 1.15 1.20 1.23 1.21 1.15 1.20 1.23 1.22 1.17 1.23 1.27 1.24 1.15 1.22 1.25 1.23
80
1.15 1.21 1.27 1.25 1.14 1.19 1.22 1.20 1.15 1.19 1.22 1.20 1.16 1.22 1.26 1.23 1.15 1.20 1.24 1.22
90
1.15 1.21 1.26 1.23 1.13 1.18 1.19 1.19 1.14 1.17 1.22 1.20 1.15 1.20 1.25 1.22 1.14 1.19 1.23 1.21
100
-,9, '~ ~" ~ ,.-,
330
S.H. Mkhandi et al. ~Journal of Hydrology 185 (1996) 317-333
100 years flood quantile estimated by LLG-PWM is 200 m 3 s- 1; then from Table 6 the safety factor is equal to 1.40. Therefore the final design flood should be equal to 200 × 1.40, i.e. 280 m 3 s- 1.
4. Conclusions and suggestions Based on the results of descriptive and predictive ability tests, the most suitable flood frequency distributions for Tanzania seem to be log-logistic, general extreme value or Pearson Type 3 distribution. The method of probability weighted moments is considered as the best parameter estimation procedure compared with the method of moments and the method of maximum likelihood. Understanding of the effects of data error and model errors is of crucial importance in the estimation of design floods. It has been possible to study the effect of this combination of errors on quantile estimates by using simulation technique and by applying a nonparametric method of estimating the probability density function and consequently obtaining the exceedance probability of estimated quantiles. This technique has been useful in facilitating the uncertainty analysis of flood quantiles. The results obtained from this study indicate the magnitude of uncertainty that can be expected in the quantile estimates. It is noted here that magnitudes of uncertainties vary with model type and sample size for a particular site. The different safety factor values may be used in engineering design practice. The safety factors presented in Table 6 are suggested for use in Tanzania for 100 years design floods.
Acknowledgements The help of Dr. T.A.G. Gunasekara in the initial stages of this work is gratefully appreciated. The financial assistance provided by the Government of Ireland and general technical support by University College Galway, Ireland, to the Water Resources Engineering Programme at University of Dares Salaam is gratefully acknowledged.
Appendix A. Nonparametric density estimation of quantile estimates Nonparametric methods of fitting probability density functions have been suggested by many investigators for use in flood frequency analysis (Adamowski, 1985, 1989; Guo, 1991, 1993). These methods fit a suitable probability density function through a scatter of observed data and also allow realistic extrapolation without making any assumptions regarding the analytical form of the parent distribution. Obviously, these methods can also be used for fitting sampling distributions through a scatter of quantile estimates where the analytical form cannot be ascertained. For a given kernel function K0 and a given sample Qr(J), J = 1, 2 ..... 1000, the nonparametric kernel estimator of a probability density function at each fixed point Qr
S.H. Mkhandiet al./Journalof Hydrology185 (1996)317-333
331
is given by
](Or)= ~---h~-~ L where m is the sample size (m -- 1000 in this study), h is a constant smoothing factor and K 0 is the smoothing kernel function whose form has to be obtained. Guo (1993) suggested the use of Gumbel (EV1) kernel, which is given by
K(t)=exp(-t-e -t) for -2.5 <- t--- 8.0
(A2)
and zero elsewhere, where t = [ Q r - O~[j)]/h. The practical range of val=ues of t is between -2.5 and +8.0. The integral / K(t)dt equals 0.000005 at t -2.5 and 0.99967 at t = 8.0. J- ~ A maximum likelihood estimation procedure was used to estimate the smoothing factor h from the given sample (Guo, 1993). The maximum likelihood function is given by m
1.1~1 " K [ 0 r ( i ) h 0r(j.!]
m^^ maximize L(h)= I-If(Qr)= fl if1
i - 1 m-h "
(A3)
j¢i
and in terms of log likelihood, m
maximize log[L(h)]=i~.llOg
{mE 1} 2
~-~.K t
Qr(i)
h OT(j)
(A4)
-nlog(nh) Taking the partial derivative of Eq. (A4) with respect to h, equating it to zero and rearranging the terms gives m
m_ 2
i'1L
tK'(t)/ J
2 K(t) J-~
j¢ 1
(A5)
+nzO
j¢ 1
where
Kt(t) -- (1 - e- t)exp( - t - e -t) for -2.5 --< t --< 8.0 and zero elsewhere. Eq. (A5) has only one unknown parameter, h, which can be solved by using single-parameter optimization methods. The probability of exceedance of QT is given by integrating the estimated probability density function o~
P ( Q r ) - IOr](Qr)dQr" - 1~ j~-I Cj(Qr ) m
^
(A6)
S.H. Mkhandiet al./Journalof Hydrology 185 (1996)317-333
332
.60.48.36.24.12.00 0
1
3
4
6
8
Fig. 5. Nonparametric density estimation of sampling distribution of 100 years design floods (n - 10). where
For the EV1 kernel, Cj(Qr) can be calculated by
q(Or)-h
0r < 0r(/)- 2.5h
Cj(Qr)-h[1 - e x p ( - exp{ - [ Q r -Qr(J)]/h })]
Q r ( j ) - 2.5h -< Q r ~ Qr(J)+ 8.0h
q(0
)=0
0F > 0F(J) + 8.0h (A7)
For any value of Qr an exceedance probability value can be determined from Eq. (A6) and Eq. (A7). Similarly, for any assumed probability or risk P(Qr), the corresponding quantile value Qr can be calculated. Fig. 5 shows the nonparametric density estimation of sampling distribution of Qr(J) which were estimated by L L G - P W M from the G E V parent distribution with sample size ten and return period 100 years.
References Adamowski, K., 1985. Nonparametric kernel estimation of flood frequency. Water Resour. Res., 21(11): 15851590. Adamowski, K., 1989. A Monte Carlo comparison of parametric and nonparametric estimation of flood frequencies. J. Hydrol., 108: 295-308. Benson, M.A., 1968. Uniform flood frequency estimating methods for federal agencies. Water Resour. Res., 4(5): 891-908.
S.H. Mkhandi et al./Journal of Hydrology 185 (1996) 317-333
333
Cunnane, C., 1989. Statistical distribution for flood frequency analysis. WMO Operational Hydrology Rep. 33, World Meteorological Organization, Geneva, 72 pp. Guo, S.L., 1991. Nonparametric variable kernel estimation with historical flood and palaeoflood information. Water Resour. Res., 27(1): 91-98. Guo, S.L., 1993. Parametric and nonparametric mixture density estimation with historical and palaeoflood information. IAHS Publ., 212: 277-286. Landwehr, J.M., Matalas, N.C. and Wallis, LR., 1980. Ouantfle estimation with more or less flood-like distributions. Water Resour. Res., 16(3): 547-555. NERC, 1975. Wood Studies Report. Natural Environment Research Council, London. Rossi, F., Fiorentino, M. and Versace, P., 1984. Two component extreme value distribution for flood frequency analysis. Water Resour. Res., 20(7): 847-856. USWRC, 1967. A uniform technique for determining flood flow frequencies. Bull. 15, Hydrology Committee, US Water Resources Council, Washington, DC.