Stochastic point process modelling of rainfall. I. Single-site fitting and validation

Stochastic point process modelling of rainfall. I. Single-site fitting and validation

Journal Journal of Hydrology 175 (1996) 17-46 Stochastic point process modelling of rainfall. I. Single-site fitting and validation P.S.P. Cowpertwa...

2MB Sizes 0 Downloads 30 Views

Journal

Journal of Hydrology 175 (1996) 17-46

Stochastic point process modelling of rainfall. I. Single-site fitting and validation P.S.P. Cowpertwaitapby*,

P.E. O’Connell”,

A.V. Metcalfeb,

J.A. Mawdsley”*’

aDepartment of Civil Engineering, University of Newcastle upon Tyne, Newcastle upon Tyne NE1 7RU, UK bDepartment of Engineering Mathematics, University of Newcastle upon Tyne, Newcastle upon Tyne NE1 7RU, UK

Received 2 March 1995;accepted 30 March 1995

Abstract A Neyman-Scott clustered point process model for rainfall is developed for use in storm sewer rehabilitation studies in the UK, where predictions are needed of the frequency of system overloading for existing and upgraded conditions. In the first part of this two-part paper, a flexible model fitting procedure is presented which involves matching approximately a chosen set of historical rainfall statistics, which exceeds in number the set of parameters. In fitting the model to hourly data, it is found that wet and dry spell transition probabilities should be included in the chosen set of statistics rather than lag 1 autocorrelations, as they improve the model’s fit to the historical dry spell sequences. In fitting the model to daily data, estimates of the variances of sub-daily rainfall totals derived from regional regression relationships are used to ensure that sub-daily totals generated by the fitted model exhibit the desired statistical behaviour. A number of validation checks are carried out on simulated time series, which include visual comparisons with historical series, and comparisons of crossing properties and of the distributions of daily annual maximum rainfalls. Overall, the results support the use of the model for its intended application.

1. Intmduction

The stochastic modelling of rainfall time series has been an active research topic in hydrology for many years. One of the potential applications of a stochastic model is in estimating the frequency of occurrence of some critical event generated by rainfall * Corresponding author. ’ Present address: National Rivers Authority, Northumbria Region, Newcastle upon Tyne NE1 7RU, UK. 0022-1694/96/$15.000 1996 - Elsevier Science B.V. All rights reserved SSDZ 0022-1694(95)02865-X

18

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

(e.g. flooding). However, as a rainfall event of a particular frequency does not necessarily produce a discharge peak of the same frequency (owing, for example, to the influence of soil moisture in catchment response), a mathematical model of the response of the system to rainfall is required to derive the magnitude frequency relationship for the quantity of interest. As this problem cannot, in general, be solved analytically, a Monte Carlo simulation approach must be adopted in which a stochastic model of rainfall is used to generate a long synthetic input series to the mathematical model; the required magnitude frequency relationship can then be estimated from the derived synthetic output series. The rainfall time series modelling described in this two-part paper was undertaken as part of the UK Urban Pollution Management Research Programme which is executed by the Water Research Centre, Swindon. In the UK, a major storm sewer rehabilitation programme has been under way for several years (WRc Engineering, 1986), as many storm sewer systems in the major cities and towns have become overloaded, leading to urban flooding and river pollution problems from increased storm sewer overflow discharges. To identify those measures needed to reduce the frequencies of flooding and/or storm sewer overflows (SSO) to acceptable standards, a mathematical model of the sewer network is required, and an appropriate description of the rainfall input. Traditionally, drainage engineers have used design rainfall storms with prescribed return periods to design new sewer systems (Arnell et al., 1984). However, when upgrading an existing system, it is necessary to estimate the magnitude and frequency of SSO discharges, and their impacts on river water quality, for both the existing and upgraded system. SSOs may operate many times each year, and the impact of a particular SSO discharge on a receiving watercourse will be worse if the storm follows a dry period, when low river flows offer reduced dilution and pollutant concentrations are high within the sewers. Thus continuous simulation modelling of the sewer system is needed to generate the required information for rehabilitation studies. This in turn generates a requirement for a sufficiently long input time series of rainfall to allow the required magnitude frequency relationship to be quantified. Moreover, numerical models of storm sewer systems frequently require input time series at a time resolution as short as 5 min. As storm sewer rehabilitation studies may be required at any urban location in the UK, it is unlikely that a rainfall record with the required length and sampling interval will be available at the site of interest. One possible solution to this problem, which is developed in Part II of this paper, is to assemble a historical rainfall data base for the region in question, to estimate the parameters of a time series model at the various sites and then to regionalize the model parameters using appropriate geographicphysiographic information. In this, the first part of the paper, the time series model is described and model fitting procedures for hourly and daily data are presented. The model’s ability to reproduce rainfall properties not used in fitting is also assessed. In Part II, model parameters are estimated from rainfall data for 112 sites distributed across the UK, and the parameter estimates are then regionalized. A technique for disaggregating hourly data to 5 min data is also presented, thus completing the method for generating synthetic rainfall data with the required sampling interval at any location in the UK.

PAP.

Cowpertwait et al. 1 Journal of Hydrology 175 (1996) 17-46

2. The Neyman-Scott

19

rectangular pulses model

2.1. Background In recent years, the application of stochastic point process theory to rainfall has been the focus of considerable research activity (e.g. Rodriguez-Iturbe et al., 1987a,b; Cowpertwait, 1991a,b). The theory of point processes has been covered by Cox and Isham (1980), and a review of applications in hydrology has been provided by Burlando and Rosso (1993). The focus of the work presented here is a clustered point process model known as the Neyman-Scott Rectangular Pulses (NSRP) model. The NSRP model has a flexible structure in which the model parameters broadly relate to underlying physical features observed in precipitation fields, e.g. convective rain cells. Previous research has shown the model to be capable of preserving statistical properties of a rainfall time series over a range of time scales (Rodriguez-Iturbe et al., 1987b; Cowpertwait 199 la,b). This paper provides further research results to support the use of the NSRP model in sewer system rehabilitation studies. A definition of the NSRP model now follows, together with properties that are needed to fit the model to rainfall data. 2.2. Dejinition and properties of model

Let us suppose that storm origins occur in a Poisson process of rate X (Fig. 1). A random number (C) of cell origins is associated with each storm origin, the cell origins being independently displaced from the storm origins by distances that are exponentially distributed with parameter ,6. A rectangular pulse (or rain cell) is associated with each cell origin. The duration of the pulse is an independent exponential random variable with parameter q; the intensity of the pulse is also an independent random variable and is denoted as X. The total intensity at any point in time is then the sum of the intensities of all active cells at that point (Fig. 1). More formally, if Y(t) is the total intensity at time t given by the NSRP model and X,+(u) is the intensity at time t owing to a cell with origin at t - u, then 00 Y(t) =

J

X,_,(u)dN(t

- u)

(1)

u=o

where 1 if there is a cell origin at t - u

dN(t - u) =

0

otherwise

and X,-U(U) =

X

with probability eWqu

0

with probability

1 - eeW

Second-order properties of the intensity process Y(t) of the NSRP model have been

20

PAP.

Cowpertwait et al. / Journal of Hydrology 175 (19%)

17-46

Storm origins arrive according to a Poisson Process

Each origin generates a random number of rain cells with cell origins at *

A rectangular pulse is associated with each rain cell

The total intensity at any point in time is the sum of the intensities of all active rain cells at that point

Fig. 1. A scheme for the Neyman-Scott

rectangular pulses model.

21

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

derived by Rodriguez-Iturbe et al. (1987a). Rainfall data are usually only available in aggregated form, e.g. as historical records of hourly or daily totals, so that the aggregated properties of the model are usually needed to estimate the model parameters. Let Yi(h)be the aggregated rainfall depth in the ith time interval of length h, so that ih

y.(h) I =

J

Y(t) dt

(2)

(i-l)h

Thus, if h is measured in hours, the series { Yi(h): i = 1,2, . . .} is a rainfall time series at the h-hour level of aggregation, i.e. an h-hourly rainfall time series. Second-order properties of Y,‘h’were derived by Rodriguez-Iturbe et al., (1987a), as follows: E{ YJh)} = hXE(C)E(X)/n I

(31

Var{ Yy’} = X~-~(nh - 1 + eOh){2pcE(X2)

+ E(C2 - C)p,$p*/(p* - n*)}

- A(j3h - 1 + ewPh)E(C2 - C)~,$‘/(p*

- n2)

(4)

and,fork>l, Cov{ YZ(h) Y$,> =

xv-3

(1 _

e-Vh)‘e-d’-‘)h

x

bw2)

+ gw*

-

%&p’l(p’

-

v2H

-

X(1 - e-ph)2e-4(k-1)hE(C2 - C)&/{Zp(p*

- n*)}

(51

When fitting the NSRP model to data, distributions need to be chosen for C and X. An approach that ensures at least one rain cell follows each storm origin, is to take C - 1 as a Poisson random variable with mean v - 1. If parameter parsimony is important, as it is in Part II of this paper, then X can be taken to be an exponential random variable with parameter 5. Thus in Eqs. (3)-(5) above /Lc f E(C) = v;

E(C2-C)=>-1;

px-E(X)=<-‘;

E(X2)=2<_’

A further possibility, which is considered in a later section, is to take X from a heavytailed distribution, such as the Weibull, which can improve the fit of the model to extreme values. By assuming C - 1 follows a Poisson distribution with mean v - 1, the following expression for the probability that an arbitrary interval of length h is dry was derived by Cowpertwait (1991 a,b): Pr{ YJh) I = 0) = exp

- Ah + Xp-‘(v (

- 1)-l{ 1 - exp[l - v + (v - l)e-4h]}

22

P.S.P. Cowpertwait et al. / Journal

of Hydrology 175 (1996) 17-46

where ph(t) = {e-4(‘+h) x

+ 1 - (qe@ - pe-“‘)/(rl-

exp{-(v

/?)}

- l)/?(e+’ - e-“‘)/(q - p) - (v - l)e+

+ (v - l)e-8(‘+h)}

For brevity, E{ Y,‘“‘}, Var{ Y,‘“‘}, Cov{ Yjh), Y$} and Pr{ Yi(‘) = 0) given by Eqs. (3)-(6) above will be denoted as p(h), r(h), y(h, k) and 4(h), respectively. The lag k autocorrelation function, which is given by y h, k)/y(h), will be denoted as p(h, k). ‘( > 0) and pr{ Y$ = O(Yi(h)= 0}, The transition probabilities, pr{ Y$ > 01Yih’ denoted as &,,(h) and $DD(h), respectively, can be expressed in terms of 4(h) and follow as (see, e.g. Cowpertwait 1991b) 4,,(h)

= WW#W

(7)

4(h) = &D(hM(h) + (1 - &W(h)Hl

- 4(h))

(8)

so that &W(h) = (1 - W(h) + eW))l{l

- 4(h))

(9)

3. Fitting the NSRP model to hourly rainfall data 3.1. Fitting procedure The fitting procedure described in this section assumes that an hourly rainfall time series is available. There are five parameters to be estimated, so a natural procedure would be to equate five statistical properties, or ‘moments’, taken from the observed time series with their derived expressions for the model, and to solve the resulting set of simultaneous equations for the parameter estimates. The model would then fit five sample moments exactly, with the fit to other values not guaranteed. A more flexible fitting procedure is adopted here which assumes that it is more desirable to fit a larger set of sample moments approximately rather than a smaller set exactly. Let L =_W@, 17,v, I) b e a function (e.g. a second-order moment) of the NSRP model, and 1etJi be its sampled value taken from an observed time series; expressions for f;: are given by Eqs. (3)-(7) and (9), with h 2 1 when an hourly time series is available. Suppose m functions and sample values are selected. The parameter estimates can then be found by minimizing the following sum of squares: S=

2

Wi(1 -f;:/fzi)2

(10)

i=l

where X, ,B,Q,< > 0, v > 1 andfi > 0. The Wiallow greater weight to be given to fitting some sample moments relative to others, and the use of a ratio of model function to sample value in (10) ensures that large numerical values do not dominate the fitting procedure. In the ensuing application, an arbitrary value of Wi= 100 is applied to the

P.S.P.

Cowpertwait

et al. / Journal of Hydrology

23

175 (1996) 17-46

term relating to the sample mean rainfall to ensure that this is matched almost exactly by the model; values of Wi= 1 are applied to the remaining terms. To take seasonality into account, the model parameters are estimated separately for each month, giving a total of 60 parameters per site. However, for the regionalized model described in Part II of this paper (Cowpertwait et al., 1996), a more parsimonious description of seasonality is employed. The results presented here are based on a 20 year historical record of hourly rainfall data for Manston, which is located on the south-east coast of England. There are many possible choices for$ in (10) which could be used to estimate the parameters of the NSRP model. Two approaches are considered here: the first uses lag 1 autocorrelations, whereas the second uses transition probabilities (Eqs. (7) and (9)). 3.2. Use of autocorrelations in the$tting procedure Following Rodriguez-Iturbe et al. (1987a,b), Entekhabi et al. (1989) and Cowpertwait (1991a), the following sample moments were used to fit the NSRP model to the Manston data: p(l), 9(l), /j( 1, l), 9(6), /j(6, l), ?(24), $(24, l), $(24). The following estimators of p(h), y(h), $h, 1) were employed to avoid bias (e.g. see Trenberth, 1984): (4

j&(h) = T F; Y;;~/{n$z} i=l j=l

Table 1 Sample moments

for the Manston

(11)

data

Month

P(l) (mm)

=Y(l) (mm’)

WV 1)

T(6) (mm21

1?6,1)

9~24) (mm2)

~(24,l)

&24)

Jan. Feb. Mar. Apr.

0.063 0.052 0.054 0.055 0.056 0.062 0.061 0.067 0.093 0.080 0.088 0.066

0.082 0.068 0.071 0.082 0.120 0.174 0.204 0.386 0.530 0.208 0.153 0.097

0.54 0.49 0.54 0.55 0.43 0.39 0.46 0.33 0.47 0.46 0.55 0.54

1.2 1.0 1.2 1.4 1.7 2.3 3.1 4.5 9.1 3.1 2.4 1.6

0.23 0.33 0.27 0.27 0.19 0.30 0.34 0.32 0.51 0.37 0.34 0.27

7.5 6.5 7.4 7.9 8.1 14. 21. 36. 61. 21. 17. 9.5

0.08 0.25 0.19 0.13 0.11 0.27 0.09 0.07 0.25 0.32 0.15 0.07

0.52 0.58 0.57 0.60 0.60 0.68 0.70 0.71 0.64 0.62 0.54 0.56

May July Aug. Sep. Oct. Nov. Dec.

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

24

Table 2 Parameter estimates when using the lag 1 autocorrelations in the fitting procedure Month

X (h-l)

P (h-‘)

7)(h-l)



e (h mm-‘)

Jan. July

0.0218 o.Oa470

0.191 0.225

1.30 3.51

4.55 19.0

1.22 0.413

n

l) =

?kh

Ah)-1

cix1 k ty& - bk@)){ j=l

y&k - bkk(h))/{(nf' -

lb)

(13)

where k is a calendar month index (k = 1 for January, 2 for February, etc.), Y/l’, is the jth h-hourly total in year i for month k, $’ is the number of h-hourly totals in’month k and n is the number of years of record. Thus, the sample moments are obtained by pooling all the available data for each calendar month. Values of the sample moments for each of the 12 calendar months for the 20 year Manston record are presented in Table 1. The parameters of the NSRP model were then estimated for each month 6 5 4 3

5% level 2 % =;. v) &

1 0 -1 -2 -3 -4 -5 -6 JFMAMJJASON Month

Fig. 2. r-Tests for the proportion of dry days, when using the lag 1 autocorrelations in the fitting procedure. of dry days; S is mean simulated proportion of dry days; SE is pooled estimate of standard error. H is mean historical proportion

PAP.

Cowpertwait et al. 1 Journal of Hydrology 175 (1996) 17-46

25

using these moments in the fitting procedure (10); Table 2 gives the parameter estimates for the months of January and July as examples. Twenty years of hourly rainfall data were simulated using the NSRP model, and, using standard l-tests, comparisons were made between historical and simulated sample moments. For example, the variance of the daily rainfall was found for January of each year for both the historical and simulated time series, and the mean values (over the 20 year period) also found. The null hypothesis that the mean simulated daily variance could have come from a population with mean equal to the mean historical daily variance was then tested. As might be expected, there was no evidence to suggest that the historical values: b(l), T(l), fi( 1, l), T(6), j5(6, l), T(24) and j(24,l) had different population means from their simulated equivalents. In addition, it was found that the model was able to match, within sampling variability, the mean of the monthly maximum rainfall captured in 1,6 and 24 h periods. However, the model had a tendency to overestimate the proportion of dry days, particularly for the summer months (see Fig. 2, where a day is defined to be ‘dry’ if less than 0.2 mm of rain fell), even though 4(h) had been used in fitting the model. Some improvement was found when the lower bound for a 70

60

50

40

30

20

10

0 1

5

10

15

20

25

30

Length of Sequence / days m

Historical

m

Simulated

Fig. 3. A comparison of historical and simulated dry spell sequences for Season 3 (July, August, September), when using the lag 1 autocorrelations in the fitting procedure.

26

P.S.P. Cowperrwaitet al. 1 Journal of Hydrology 175 (1996) 17-46

dry day was increased to 1 mm, 2 mm and 3 mm (so that a day was defined to be dry when less than 1 mm, 2 mm or 3 mm of rain fell), although many of the t-ratios were still statistically significant. Furthermore, from a practical point of view, if 3 mm of rain fell in a short time interval, which may well happen for summer convective storms, there may be enough runoff to wash the surface pollutants from the streets to the drainage system. Hence the differences between these statistics of the simulated and historical series were judged to be both statistically and practically significant. Dry spell sequences were also compared on a seasonal basis, where a year was divided into four seasons: Season 1 (January, February, March), Season 2 (April, May, June), Season 3 (July, August, September) and Season 4 (October, November, December). For each of these seasons, the number of occurrences of a sequence of n dry days were found for both the historical and simulated time series over the 20 year period (see Fig. 3, where n = 1,2,. . . , 30). It was found that the model tended to underestimate the historical dry spell sequences of short durations, particularly for the summer months. The overestimation in the proportion of dry days was therefore due to the model producing too many long dry spell sequences. The fit to the historical dry spell sequences and proportion of dry days was regarded as inadequate for the intended application, which is to provide rainfall inputs to models used in the design, or upgrading, of sewer systems in the UK. In sewer system modelling, it is important that antecedent conditions (e.g. initial detention storage levels) within the system are modelled as accurately as possible (e.g. see Padmanabhan and Delleur (1982) or Henderson (1986)). These antecedent conditions depend to a large extent on the length of the dry spell preceding a storm, because, for example, the pollution impact on a receiving river may be worse if the storm follows a dry period, when low river flows offer reduced dilution and pollutant concentrations are high owing to in-sewer deposition. Hence, an improvement in the fit to the historical dry spell sequences was sought by using the transition probabilities (Eqs. (7) and (9)) in the fitting procedure (10). 3.3. Use of transition probabilites in the fitting procedure In the previous section, it was found that the NSRP model matched poorly the historical proportion of dry days, despite the fact that it had been used in the fitting procedure. One possible explanation for this is that the model is unable to match both the autocorrelations and the proportions of dry days. Moreover, autocorrelations tend to have large sampling errors, because of the large number of zero depths. Thus, the autocorrelations ~(1, l), p(6,l) and p(24,l) were excluded from the fitting procedure and the transition probabilities (7) and (9) used. Following this approach, Table 3 Parameter estimates when using the transition probabilities in the fitting procedure Month

X (h-l)

P (h-l)

v (h-l)

Y

[ (h mm-‘)

Jan. JUlY

0.0242 0.0117

0.455 0.151

1.20 0.687

4.24 2.00

1.39 0.558

PAP.

Cowpertwait et al. / Journal of Hydrology 175 (1996)

17-46

21

the parameters of the model were estimated for each month (Table 3 gives the estimates for January and July), where the sample moments used in the fitting procedure (10) were: G(l), +(1)/i(3), W), ?(12), ?(24), &vw(~)~ &VW(~), &VW(~), &&12), &&24), $&24), +(24). The inclusion of the 3 and 12 hourly values should be noted. Although not shown here owing to space considerations, it was found that if these were omitted from the fitting procedure, then they were poorly matched by the model. Hence, they were included in the final set of sample moments used in the fitting procedure. Moreover, as the transition probabilities and proportion of dry days are related in Eq. (8), some extra weight is now given to matching the dry spell sequences. By comparing Tables 2 and 3, it can be seen that the parameter estimates for the two approaches differ, particularly in July (in general, it was found that they differed most in the summer months). Hence, the parameter estimates depend on the choice of moments used in the fitting procedure, so that such a choice needs to be made with some discretion. A rainfall time series was again simulated for a 20 year period using the NSRP model, and comparisons again made between historical and simulated statistics. As 6 5 4 3

5% level

2

z

1

2 ul

0

z

-1 -2 -3 -4 -5 -6

J

F

M

A

M

J

J

A

S

0

N

Month Fig. 4. t-Tests for the proportion of dry days, when using the transition probabilities in the fitting procedure. H is mean historical proportion of dry days; S is mean simulated proportion of dry days; SE is pooled estimate of standard error.

28

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (19%)

17-46

expected, the model was able to match the 1, 3, 6, 12 and 24 hourly historical variances (within sampling variability). However, the main interest here was to see how well the model fitted the historical dry spell sequences. Fig. 5 compares the historical and synthetic dry spell sequences for Season 3, from which a considerable improvement can be seen over Fig. 3. In addition, the model now also fits the historical proportion of dry days within the limits of sampling variability (compare Fig. 4 with Fig. 2). Although the autocorrelations were not used in the fitting procedure, the hourly and daily simulated lag 1 autocorrelations were still compared with their historical counterparts (see Figs. 6 and 7). In Fig. 6 there is some statistical evidence that the model is overestimating the hourly lag 1 autocorrelations. However, upon closer inspection it was found that the largest difference between the simulated and historical hourly autocorrelations was about 0.15 (for October), so that the differences between the historical and simulated hourly autocorrelations were judged to be of little practical significance. Furthermore, it was found that the model matched the historical lag 1 autocorrelations for most of the months at higher levels of aggregation, particularly the 12 h and 24 h levels (Fig. 7 is given as an example), 70

60

50

40

30

20

10

0

5

IO

15

20

25

30

Length of Sequence / days _

Historical

E’Ej Simulated

Fig. 5. A comparison of historical and simulated dry spell sequences for Season 3 (July, August, September), when using the transition probabilities in the fitting procedure.

PAP.

Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

29

and so it was decided to retain the approach of omitting autocorrelations from the fitting procedure. Moreover, the practical importance of matching dry spell sequences discussed above outweighs the theoretical importance of matching autocorrelations precisely. To assess the performance of the fitting procedure more widely, the NSRP model was fitted (using transition probabilities) to hourly rainfall time series from eight other sites scattered throughout the UK. The results cannot be presented owing to space limitations, but they show that the NSRP model was also able to match the historical statistics from these time series within the limits of sampling variability (Cowpertwait, 1991b).

4. Fitting the NSRP model to daily rainfall data Daily rainfall data are much more widely available than hourly data, so that a suitable procedure for fitting the NSRP model to daily data is needed. One approach is to fit the NSRP model using properties at the 24 h aggregation level. An assessment can be made of the goodness of fit at lower aggregation levels by fitting the model 6 5 4 3 2 w

1

e 8

0

i

-1 -2 -3 -4 -5 -6

JFMAMJJASON MOti

Fig. 6. t-Tests for hourly lag 1 autocorrelations. H is mean historical hourly autocorrelations; simulated hourly autocorrelations; SE is pooled estimate of standard error.

S is mean

P.S.P. Cowpertwait

30

et al. / Journal of Hydrology

175 (1996) 17-46

using only the 24 h aggregated properties taken from hourly data and then comparing predicted 1 h properties with the equivalent sample values taken from the data. The NSRP model was fitted to the Manston data using only the 24 h aggregated properties. The hourly variance for each month was predicted using Eq. (4) and compared with the sample values taken from the historical record. The errors ranged between -38% and +36%, with underestimation generally occurring over the summer months. A poor performance of the model at the 1 h aggregation level, when fitting using only daily properties, is not surprising because the parameters of the model relate to rain cells that generally have lifetimes less than 1 h. Consequently, higher aggregation levels are unlikely to contain enough information from which the properties of the cells can be found. An alternative approach, when only daily data are available, is to estimate the sample hourly variance from the sample daily variance using a derived empirical relationship. This approach was initially motivated by observing that the sample hourly variances were highly correlated with the sample daily variances. Furthermore, expressions relating the variance at a lower aggregation level to the daily variance follow from the general expression for the sum of two random 6 5 4 3

5% level

2 1

ii 2 0

0

k

-1 -2 -3 -4 -5 -6 JFMAMJJASON Month

Fig. 7. t-Tests for daily lag 1 autocorrelations. H is mean historical daily autocorrelation; simulated daily autocorrelation; SE is pooled estimate of standard error.

S is mean

PAP.

Cowpertwait et al. / Journal of Hydrology 17.5 (1996) 17-46

31

variables, e.g. $12) = iy(24){ 1 + p( 12,1)}-’ Hourly rainfall data from 27 rainfall stations distributed across the UK (locations shown in Fig. 8) were used to derive regression equations. To begin with, appropriate

a

9

4

14

q a

u

I

2

8 10

1

22 12

Fig. 8. The homogeneous regions A and B used for h-hourly variance regression equations (also showing the locations of the stations, numbered l-27).

32

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

a7

ae

0.5 I s

a4

I >

a3

%

a2

ai

0 0

40

20

60

60

DdlyVarienCeISMJ IIWI

a7

a6 a5

B

5

I> f

a4

(w

a3

a2

ai 0

I

0

,

20

I

1

40

8

I

60

I

60

DdlyVsrlrncrISqlNll Fig. 9. The regression models for the hourly variance: (a) for stations in Region A, (b) for stations in Region B.

PAP.

Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

33

24 22 20 18 16 14

(a1

12 10 8 6 4 2

84 82 30

28 26 24 22 20

(w

18 16 14 12 10 8 6 4 2 0

I

0

I

I

20

I

1

40

I

I

I

60

Llallyv-Isqmm Fig. 10. The regression models for the 12 hourly variance: (a) for stations in Region A; (b) for stations in Region B.

PAP.

34

Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

regressions of 9( 1) on T(24) were identified, and the same approach was then applied for h = 3, 6, 12. A thorough analysis resulted in evidence that different equations are applicable for the South and North of the UK (Regions A and B in Fig. 8, respectively) whereas differences in derived equations within regions could reasonably be attributed to chance. The details of this work, which made use of a statistic derived by Cowpertwait and Cox (1992), are described in their paper and by Cowpertwait (1991b). The derived regression equations are as follows: Region A: T(l) = 0.03159 + 0.008597+(24)

(R2 = 76%)

Region B: 9( 1) = 0.03734 + 0.006205+(24)

(R2= 83%)

Region A: +(3) = 0.1415 + 0.0049319(24)

(R2= 88%)

Region B: q(3) = 0.1055 + 0.04524?(24)

(R2= 90%)

Region A: T(6) = 0.2899 + 0.1426?(24)

(R2= 93%)

Region B: T(6) = 0.0120 + 0.1493?(24)

(R2= 94%)

Region A: T( 12) = 0.4655 + 0.3874?(24)

(R2= 96%)

Region B: +(12) = -0.1582 + 0.4174?(24)

(R2= 97%)

Plots of the data and fitted prediction equations are presented in Fig. 9 and Fig. 10 for the hourly and 12 hourly cases, respectively. The fitting procedure for stations for which only daily data are available can now be summarized as follows: (1) calculate b(l), +(24), JWW(24), J&24) and $(24) for each month from the daily data. (2) Estimate T(l), T(3), q(6), T(12) from the regression equations for the appropriate region. (3) Use the nine values from (1) and (2) in minimizing the sum of squares function (10) to estimate the NSRP model parameters. The fitted model can then be used with reasonable confidence to generate hourly data when only daily data are available at the site in question.

35

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46 (a)

24 22

i

201

5

6

7

8

9

10

11

Y-r

dmlynihlllmm 26

@I

24 22 20 18 16 14 12 10 8 6 4 2 0 14

Fig. 11. Time series plots for concatenated January: series.

15

16

17

18

19

20

(a) the historical time series; (b) the simulated time

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (19%)

36

17-46

T

0 12

3

4

5

6

7

5

8

10

11

12

13

14

12

15

14

15

15

17

15

18

20

Y-f

danymhfdlmm

(b)

Eo

70

60

50

1

40-

20-

m-

10 -

0

,

2

3

4

5

6

7

8

0

(0

11

15

10

17

18

19

20

Fig. 12. Time series plots for concatenated Julys: (a) the historical time series; (b) the simulated time series.

PAP. Cowpertwait et al. / Journal of Hydrology

175 (1996)

37

17-46

0.014 0.013 y

0.012

ii ia %

0.011 0.01

a

:

0.009

ii

s

0.006

s

0.007

8 ti

0.006

5

2

0.005 0.004

0.003 1

I

I

I

I

I

I

I

1

I

I

2

3

4

5

6

7

6

9

l0

11

Month

(a)

n

f

HistorIcal

12

Simulated

0.013 0.012 w .e E ta

0.011 0.01

% 8

0.009

b k

0.006

8

0.007

B ;

0.006

's E:

0.005 0.004 0.003 6

1 (b)

n

Historical

7

6

9

Simulated

Fig. 13. A comparison of the proportion of hourly rainfall exceeding 2 mm: (a) the mean of the proportions (over the 20 year period); (b) the standard deviation of proportions.

38

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46 0.13 0.12

6 .e

0.11

it b

0.1

% a

0.09

6b

0.05

t

0.07 0.00

5 %

0.05 0.04 0.03 + 1

2

(a)

3

4

5

7

MOllth

Historical

l

6

+

8

9

10

11

12

siid

0.07

0.01 1 (b)

I

I

2

3 n

I

I

1

I

I

I

I

I

4

5

6

7

5

9

10

11

Historical

MOtlih

+

12

SiiUlated

Fig. 14. A comparison of the proportion of hourly rainfall exceeding 0 mm: (a) the mean of the proportions (over the 20 year period); (b) the standard deviation of the proportions.

39

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

5. Model validation In assessing the performance of a fitted rainfall time series model, the model’s ability to reproduce rainfall properties not used in the fitting procedure, but considered to be of practical importance, needs to be assessed. To this end, a number of validation checks on the fitted NSRP model were carried out based on: (1) a visual comparison of historical and simulated daily time series; (2) the crossing properties of hourly data generated by the NSRP model; (3) the extremes of daily and hourly data generated by the NSRP model. Figs. 11 and 12 were prepared to illustrate the historical and simulated daily time series for the Manston site (which were obtained by aggregating the hourly time series). January and July were selected to be representative of winter and summer, respectively. For the historical and simulated time series the months for each year were concatenated to form a record of Januarys and Julys. From these figures it is evident that the simulated daily time series compare favourably with their historical counterparts. The proportions of time that historical and simulated hourly rainfall exceeded various levels, taken as 0, 2, 3, 4, 5, 10, 15 and 10 mm, were found for each month of each year for the Manston site. The means and standard deviations of these quantities (20 for each month) were then calculated and plotted against calendar month (Figs. 13 and 14). Good agreement between the historic and simulated values was evident when the crossing levels were 2, 3, 4, 5, 10, 15 and 20 mm (Fig. 13 is a typical example). However, the model showed a tendency to underestimate the mean proportion of summer rainfall exceeding 0 mm and 1 mm, and a substantial underestimation was found in the corresponding standard deviation, particularly for the month of October (see Fig. 14). From this, it was concluded that the model is unlikely to be suitable for simulating persistent light rainfall (i.e. drizzle), and, in particular, the model provides less variability in the generation of light summer rainfall. However, for the analysis of sewer systems, light rainfall is unlikely to be of much practical importance as it can easily be lost through evaporation, particularly in the summer months. Hence, the current model, together with the fitting procedure described in this paper, is probably adequate for its intended purpose. This, however, needs to be tested more fully by running simulated and historical time series through a sewer system model to see whether the differences in the time series are of practical importance in modelling the effects of sewer system rehabilitation. Table 4 Long records of daily data used in extreme value analysis Station no. 1525

115306 588702 275574 354864

Station name

Years of data

Altitude (m)

East grid

North grid

Howick Hall Blackbrook Poaka Beck Windsor Exmouth

92 90 90 90 74

34 107 156 21 66

4264 4456 3240 4979 3027

6177 3178 4781 1754 819

PAP.

Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

Gumbel Plot for Maximum Daily Rainfalls Howick Hall, Januaty (90 year record)

,_______--

60

T-

>

50

: 40

>

(

+

+ !!

0

A

I(

A

I

30

F!!f!!

8

+

20

10

?’

_----

r-

r___--~---r---T-___T__

----

-r-------

6

4

2

0

-2

Standardised Gumbel Variate I

Historical

+ exponential

G alpha=0.6

> alpha=0.4

n alpha=0.6

Gumbel Plot for Maximum Dail Rainfalls Howick Hall, July (90 year record T 6C 70 ‘_

GO

?

50 40 3b 20 10 0 2

0

4

6

Standardised Gumbel Variate a Historical

+ exponential

0 alpha=0.6

A alpha=0.6

r alpha=0.4

Fig. 15. Extreme value plots of maximum daily rainfall for Howick Hall: (a) for January; (b) for July. The rain cell intensity follows an exponential distribution and a Weibull distribution for several values of the shape parameter (Y.

P.S.P. Cowpertwait

et al. / Journal of Hydrology

41

I75 (1996) 17-46

Gumbel Plot for Maximum Daily Rainfalls

(a)

70 -~____-

Exmouth,January (74 year record) > Y

60 -

+

X

50 -

Q

40 30 20 10 0 ---2

2

0

4

Standardised Gumbel Variate I Historical

+ exponential

o alpha=O.E

A

alpha=08

z alpha=0.4

Gumbel Plot for Maximum Daily Rainfalls 90

Exmouth,July (74 year record) __.. _ -_ ..- __~. _. ._- --_---_ - --_- __._~__

____.___

il

80 I

I

I

I

0 2 Standardised Gumbel Variate I Historical

+ exponential

o alpha=03

-----

-1

4 A

alpha=03

x alpha=0.4

Fig. 16. Extreme value plots of maximum daily rainfall for Exmouth: (a) for January; (b) for July. The rain cell intensity follows an exponential distribution and a Weibull distribution for several values of the shape parameter (Y.

42

P.S.P. Cowpertwait et al. 1 Journal of Hydrology 175 (1996) 17-46

Results from such a validation test are reported in Part II of this paper (Cowpertwait et al., 1996). For the third validation check, the NSRP model was fitted to a number of long UK daily records (Table 4). For each of these records, the parameters of the NSRP model were estimated for January and July using the fitting procedure for daily data described above. In fitting the model, two different distributions (exponential and Weibull) were used to model the cell intensity, and the effect on the reproduction of the distribution of daily extremes was assessed. It should be noted that the choice of distribution for the cell intensity in the NSRP model is arbitrary in Eqs. (3)-(6). Using an exponential distribution for the cell intensity, data were simulated for each of the sites, so that the number of years of simulated data equalled the number of years of historical data. The maximum daily rainfall for each year was found for both the historical and simulated time series. These maxima were ordered and then plotted against the standardized Gumbel variate (Figs. 15 and 16 are given as typical examples). It is evident that the model tends to underestimate the extreme values, mainly for return periods in excess of about 5 years. As already noted, the choice of distribution for the cell intensity is arbitrary. The exponential distribution was initially selected so that the model would have only a small number of parameters, which is an important consideration when the model has to be regionalized (Part II of this paper). However, a heavier-tailed distribution could be used to model the cell intensity to improve the fit to the historical extreme values. Whether such a distribution is needed would depend on the intended application for the model. When upgrading an existing sewer system, the extreme events are of less concern as the system may be overflowing many times a year (Henderson, 1986), in which case the exponential distribution provides a satisfactory fit in the lower range of return period (see Figs. 15 and 16). An obvious alternative to the exponential distribution, which could be used to improve the fit to the extremes, is the Weibull distribution. The Weibull distribution has parameters (a, <) and survivor function given by FX(x) = Pr{X > x} = exp(+“)(x

> 0)

so that, when a: < 1, the survivor function decays more slowly than that for the exponential distribution (which is obtained as a special case by putting cx = 1 in the above). The NSRP model was fitted to the long records of daily data with the cell intensity following a Weibull distribution (see Table 5 for some examples of the parameter estimates). The shape parameter (o) was fixed at values of 0.8, 0.6 and 0.4 in the parameter estimation procedure, so that only the scale parameter 5 had to be estimated (Table 5). This allowed some comparisons to be made between the extreme values generated for several different Weibull distributions (Figs. 15 and 16). It can be seen in Figs. 15 and 16 that the model tends to generate more extreme rainfall events as Q becomes smaller. However, sometimes a Weibull distribution may underestimate the extreme values to a greater extent than the exponential distribution (e.g. in Fig. 15(a) the exponential distribution generates more extremes events than the Weibull (a = 0.8) distribution). This is likely to be due to the parameter estimates

43

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

Table 5 Parameter estimates for Howick Hall and Exmouth using the exponential and Weibull distribution to model the rain cell intensity

Station Howick Hall Howick Hall Howick Hall Howick Hall Howick Hall Howick Hall Howick Hall Howick Hall Exmouth Exmouth Exmouth Exmouth Exmouth Exmouth Exmouth Exmouth

Month Jan. Jan. Jan. Jan. July July July July Jan. Jan. Jan. Jan. July July July July

CY

x (h-‘1

B (h-‘1

;h-I,



:h mm-‘)

0.0184 0.0184 0.0181 0.0223 0.0161 0.0164 0.0163 0.0160 0.0168 0.0172 0.0173 0.0236 0.00862 0.00960 0.00984 0.0101

0.125 0.152 0.188 0.313 0.0944 0.143 0.200 0.288 0.0824 0.106 0.137 0.248 0.0564 0.0938 0.139 0.206

0.798 0.855 0.920 0.608 0.482 0.516 0.539 0.537 0.939 1.03 1.15 0.713 0.721 0.778 0.848 0.880

4.45 6.34 11.3 20.0 2.66 3.79 6.74 20.0a 5.22 7.24 12.8 20.0a 3.26 4.12 7.07 20.0=

1.29 1.70 2.37 3.94 0.975 1.38 2.06 3.43 0.848 1.19 1.79 3.32 0.550 0.849 1.40 2.59

1.0 0.8 0.6 0.4 1.0 0.8 0.6 0.4 1.0 0.8 0.6 0.4 1.0 0.8 0.6 0.4

a The mean number of rain cells per storm reached an upper bound of 20 in the parameter estimation procedures. Bounds were used to ensure the parameter estimates did not enter physically unrealistic parameter spaces during the minimization procedure.

being correlated. For example, when fixing Q at progressively smaller values in the parameter estimation procedure, the mean number of cells per storm (v) progressively increases (Table 5) and thus the total volume of rain per storm will be approximately normally distributed. This is likely to lead to the generation of less extreme events, which is evident in some of the plots (Fig. 16(b), for example). The longest record of hourly data (30 years for the Farnborough site) was selected to assess how well the model matched the historical extremes values (see Fig. 17). The conclusions were broadly similar to those obtained when fitting the model to the long daily records. The difficulty in obtaining a consistently good fit to extreme values may be due to an over-simplification in the parameterization of the model. The parameters in the NSRP model broadly relate to properties of rain cells. Short-duration heavy convective cells and large mesoscale areas of long-duration lighter intensity rainfall often occur in the same precipitation field (e.g. Austin and Houze, 1972) and it seems likely that extreme values in rainfall records are due to the heavy convective cells. However, the NSRP model only allows for the existence of one ‘type’ of rectangular pulse, i.e. one type of ‘cell’. Therefore, the parameter estimates for the intensity and duration of the cells in the NSRP model are likely to be average values over the various types of precipitation that can occur in the same precipitation field. Consequently, a good fit to extreme values is unlikely to be achieved consistently using the present form of the model. Research in progress is focusing on the

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

Gumbel Plot for Hourly Rainfalls 11

-

Fernborough,January (30 year record)

--.-

T 10 -

A

98-

A

1

7.

64

::

5-

+

4321

;

-2

I

I

1

0

I

I

-------+

2

4

Standardised Gumbel Variate I

Historical

+ exponential

o alpha=0.8

A alpha=0.6

x alpha=0.4

Gumbel Plot for Hourly Rainfalls Famborough, July (30 year record) 40 35

25

-2 StandardisedGumbel Variate e Historical

+ exponential

o alpha=0.6

A alpha=0.6

): alpha=0.4

Fig. 17. Extreme value plots of maximum hourly rainfall for Farnborough: (a) for January; (b) for July. The rain cell intensity follows an exponential distribution and a Weibull distribution for several values of the shape parameter CY.

P.S.P. Cowpertwait

et al. / Journal of Hydrology

175 (1996) 17-46

45

development of a version of the NSRP model in which more than one cell type can occur (Cowpertwait, 1994).

6. Summary and conclusions When fitting time-series models to observed data, an approach which is frequently adopted is to solve a set of simultaneous equations which relate model parameters to sampled moments. The work on which this paper is based suggests that an alternative and more flexible approach is to achieve an approximate fit to a larger set of statistics by minimizing a least-squares function. Two approaches to fitting the NSRP model to hourly rainfall data from the UK have been evaluated. The first of these, which involved using lag 1 autocorrelations in the fitting procedure, produced simulated rainfall data that failed to match the historical dry spell sequences and proportion of dry days. The second approach, which involved using the transition probabilities for wet and dry spells in the fitting procedure, improved the model’s reproduction of the historical dry spell sequences and proportion of dry days. In addition, the model was found to match (within the limits of sampling variability) various other statistics not used in the fitting procedure. A method of fitting the NSRP model to daily data, which are much more widely available than hourly data, has been developed which ensures that generated hourly data will have approximately the desired statistical properties. This has been achieved by identifying homogeneous groups of UK rainfall for regressing h-hourly (h < 24) variances on daily variances for those sites for which hourly data were available, and using these regression estimates of h-hourly variances in the NSRP fitting procedure together with daily sample moments. A number of validation checks on the performance of the NSRP model were carried out to assess its ability to reproduce rainfall properties not used in the fitting procedure. Visual comparisons of historic and simulated daily rainfall series suggested good general similarity. A comparison of the proportions of time which rainfall exceeded various levels in historic and simulated hourly series suggested good agreement for higher levels but less so for lower levels; however, for the intended model application, this discrepancy is not deemed significant. Comparisons of the distributions of daily rainfall extremes based on Gumbel probability plots indicated that, when an exponential distribution was used to model mean cell intensity, the simulated extremes underestimated the historical extremes for return periods in excess of 5 years. If a two-parameter Weibull distribution is employed for the extremes, the additional model parameter can be adjusted to achieve better agreement between the simulated and observed extremes. However, in storm sewer rehabilitation studies, events with return periods less than 5 years are of primary concern, and so an exponential distribution is considered adequate for this purpose. To facilitate the application of the NSRP at sites where few or no data exist, the model parameters have been estimated at more than a hundred sites throughout the UK and then regionalized; this work is described in Part II of this paper.

46

P.S.P. Cowpertwait et al. / Journal of Hydrology 175 (1996) 17-46

Acknowledgements The research described in this paper was co-ordinated by the Water Research Centre, Swindon, under a contract funded by the UK Foundation for Water Research.

References Amell, V., Harremoes, P., Jenson, M., Johansen, N.B. and Niemczynowicz, J., 1984. Review of rainfall data application for design analysis. Water Sci. Technol., 16: l-45. Austin, P.M. and Houze, R.A., 1972. Analysis of the structure of precipitation patterns in New England. J. Appl. Meteorol., 11: 926-935. Burlando, R. and Rosso, R., 1993. Stochastic models of temporal rainfall: reproducibility, estimation and prediction of extreme events. In: J.D. Salas, R. Harboe and J. Marco-Segura (Editors), Stochastic Hydrology and its Use in Water Resources Systems Simulation and Optimization. Kluwer, Dordrecht, pp. 137-173. Cowpertwait, P.S.P., 1991a. Further developments of the Neyman-Scott clustered point process for modelling rainfall. Water Resour. Res., 27(7): 1431-1438. Cowpertwait, P.S.P., 1991b. The stochastic generation of rainfall time series. Ph.D. Thesis, University of Newcastle upon Tyne. Cowpertwait, P.S.P., 1994. A generalized point process model for rainfall. Proc. R. Sot. London, Ser. A, 447: 23-37. Cowpertwait, P.S.P. and Cox, T.F., 1992. Clustering population means under heterogeneity of variance with an application to a rainfall time series problem. Statistician, 41: 113-121. Cowpertwait, P.S.P., O’Connell, P.E., Metcalfe, A.V. and Mawdsley, J.A., 1996. Stochastic point process modelling of rainfall. 2. Regionalisation and disaggregation. J. Hydrol., 175: 17-46. Cox, D.R. and Isham, V., 1980. Point Processes. Chapman and Hall. Entekhabi, D., Rodriguez-Burl-q I. and Eagleson, P.S., 1989. Probabilistic representation of the temporal rainfall process by a modified Neyman-Scott rectangular pulse model: parameter estimation and validation. Water Resour. Res., 25(2): 295-302. Henderson, R.J., 1986. Rainfall time series for sewer system modelling. Rep. ER 195E. Water Research Centre, Swindon, UK. Padmanabhan, G. and Delleur, J.W., 1982. An alternative to the design storm. Approach for urban storm water management. Proc. Int. Symp. Urban Hydrology, Hydraulics, and Sediment Control, Lexington, KY, pp. 267-274. Rodriguez-Iturbe, I., Cox, D.R. and Isham, V., 1987a. Some models for rainfall based on stochastic point processes. Proc. R. Sot. London, Ser. A, 410: 269-288. Rodriguez-Iturbe, I., Febres De Power, B. and Valdes, J.B., 1987b. Rectangular pulses point process models for rainfall: analysis of empirical data. J. Geophys. Res., 92(D8): 9645-9656. Trenberth, K.E., 1984. Some effects of finite sample size and persistence on meteorological statistics, Part 1: autocorrelations. Mon. Weather Rev., 112: 2359-2368. WRc Engineering, 1986. Sewerage Rehabilitation Manual. WRc Engineering.