Stochastic simulation of monthly streamflow by a multiple regression model utilizing precipitation data


Journal of Hydrology 12 (1971) 285-310; © North-Holland Publishing Co., Amsterdam Not to be reproduced by photoprint or microfilm without written permission from the publisher

STOCHASTIC SIMULATION OF MONTHLY STREAMFLOW BY A MULTIPLE REGRESSION MODEL UTILIZING PRECIPITATION DATA

JOCHANAN BONNÉ
2060 Arcane Ave., Reno, Nevada 89503, U.S.A.

Abstract: A model for simulation of monthly streamflow series is developed by a multiple regression approach, which includes both precipitation and flow, instead of the simple regression Markovian model, which is based on the antecedent flow alone. The variables in the regression function represent the previous month's flow, current precipitation, antecedent month's precipitation, and the accumulated precipitation as independent variables, while the dependent variable is the current streamflow. A procedure is suggested for generating precipitation data which can be used in the multiple regression. The frequency distribution of the flow record in each month is determined by statistical analysis. These monthly distributions are later imposed on the model in order to reproduce data with the same statistical parameters and distributions. For each month the optimal combination of variables for the regression model is selected according to least variance, largest coefficient of determination, and the results of significance tests for the regression coefficients. Three watersheds with different physiographic characteristics were selected. Streamflows were simulated for each basin and results have been compared with the historic record in terms of statistical parameters and frequency distribution. The results were also compared with those obtained by the simple regression model using only streamflow data. Results in general show good agreement with the historical data, mainly in terms of mean, standard deviation, and frequency distribution. Poorer agreement was found in the case of the less reliable parameters like skew coefficient and serial correlation. However, in most cases a definite improvement is achieved over the simple regression model.

Stochastic models for streamflow simulation

Streamflow simulation based on historic flows, or the generation of synthetic hydrologic data by means of stochastic models, has recently obtained the name of Operational Hydrology 1). The purpose of this generation is to use the derived records as inputs to subsequent hydrologic analysis, and to apply simulation in water resources systems design and management. In the few years since this concept of stochastic streamflow simulation was first discussed in detail by Thomas and Fiering 2), pertinent studies with the


same premise have been carried out by the same authors and many others. The majority of the approaches presented in these studies are based on linear regression models, assuming that the period-to-period dependency of raw or transformed data is adequately expressed by a homoscedastic (constant variance) Markovian model of the first order (lag of one period), which has the general form:

$$q_t = \alpha + \beta q_{t-1} + \varepsilon_t \qquad (1)$$

where $q$ is the actual flow rate or a quantity derived from it (according to the underlying distribution), $\alpha$ and $\beta$ are regression coefficients, and $\varepsilon_t$ is an independent error term with zero mean. These coefficients can be substituted by their derived values, and Eq. (1) may be rewritten as:

$$q_t = \mu + \rho (q_{t-1} - \mu) + v_t \sigma (1 - \rho^2)^{1/2} \qquad (2)$$

where
$\mu$ = mean of the historical data;
$\rho$ = the serial correlation coefficient;
$\sigma$ = standard deviation;
$v_t$ = random normal deviate; and
$\sigma^2 (1 - \rho^2) = \mathrm{Var}(q_t \mid q_{t-1})$, which is independent of $q_{t-1}$.

The dependency of $q_t$ on $q_{t-1}$ alone, and not on how the process reached $q_{t-1}$, makes these types of models particularly tractable 3).
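As an illustration only, the recursion of Eq. (2) can be sketched in Python. The function name, the NumPy random generator, the seed, and the choice of starting the series at the mean are assumptions made for this sketch; they are not taken from the studies cited.

```python
import numpy as np

def generate_markov_flows(mu, sigma, rho, n, seed=0):
    """Generate n values from the lag-one Markov model of Eq. (2)."""
    rng = np.random.default_rng(seed)          # assumed source of the deviates v_t
    scale = sigma * np.sqrt(1.0 - rho ** 2)    # conditional standard deviation
    flows = np.empty(n)
    previous = mu                              # assumption: start the series at the mean
    for t in range(n):
        v_t = rng.standard_normal()
        flows[t] = mu + rho * (previous - mu) + v_t * scale
        previous = flows[t]
    return flows

flows = generate_markov_flows(mu=10.0, sigma=2.0, rho=0.6, n=20000)
lag_one = np.corrcoef(flows[:-1], flows[1:])[0, 1]
print(flows.mean(), flows.std(), lag_one)
```

For a long series the printed sample mean, standard deviation, and lag-one correlation should lie close to the values supplied. Nothing in Eq. (2) itself prevents negative values when normal deviates are used.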

Generation of streamflows by Markovian models has been developed for annual, monthly, and daily time series, of which the monthly ones offer the most opportunity for development and use. The reason for this is that monthly series were found to be most appropriate for analysis 4). Monthly values show the basic structures of precipitation and runoff series, with both the deterministic and the stochastic components that are necessary for establishing the models to generate operational hydrology. In most of the main basins, except in underdeveloped regions, 20-50 years of precipitation and runoff records are available. For annual time series such periods may not be sufficient to determine the structure of the time series and the probability distribution parameters with accuracy. In comparison with annual data, monthly series are twelve times longer and thus have about 250-600 items, which can be considered sufficient for the determination of both deterministic and stochastic components of the series. At the same time, the use of monthly values does not yield as much detail in the analysis of the two types of components as would the use of daily values; therefore, it does not involve so much computation.

Fig. 1. Typical distribution of monthly runoff.

If the time period of concern is a month, there is a physical reason why the assumptions of constant variance and dependency on the last period's streamflow alone are unsatisfactory. This is demonstrated in Fig. 1. In this figure, a schematic annual hydrograph is shown, divided into monthly periods. Along the rising limb of the hydrograph the flow $q_i$ in month $i$ is composed of two components:

$$q_i = K q_{i-1} + dq_i \qquad (3)$$

The first term is the flow that would be released from the existing ground and/or surface storage if no precipitation occurred during the month. It can be expressed as the product of a recession coefficient $K$ and the last month's flow $q_{i-1}$. The second term, $dq_i$, is the consequence of the current precipitation, part of which may become runoff in that month, or be retained in storage below or above ground, or as snow pack. The falling limb of the hydrograph describes the dry weather flow, which occurs when there is no precipitation (so that the runoff is a function of the storage in the basin only) or when precipitation is not enough to yield a flow exceeding that of the previous month. As there exist some limits to the natural basin storage, the division of the net precipitation* between retained water and runoff is dependent on the rate of precipitation as well as the amount. Thus, the adequacy of the assumption of homoscedasticity is suspect. The dependency might still be Markovian, namely, the amount of flow will partly depend on the former state of the watershed, but it cannot be estimated by a simple streamflow regression analysis alone.

* Total precipitation less evapotranspiration.

The concern of hydrologists with these aspects has led to the proposal of some types of models different from those discussed above. Fiering 3) has also explored the multilag serial correlation of streamflows in a model which considers annual precipitation. The model is formulated as follows:

$$q_i = (1 - a - b)\, x_i + c\, S_{i-1} \qquad (4)$$

where
$q_i$ = annual runoff;
$x_i$ = annual rainfall;
$a$ = annual percentage of rainfall stored in a year;
$b$ = annual percentage of rainfall evaporated in a year;
$c$ = annual percentage of ground-water storage that runs off in a year; and
$S_{i-1}$ = ground-water storage at the beginning of the year.

His analysis was directed to demonstrate the relationship between the parameters $a$, $b$, and $c$ and a realistic correlogram, which may explain the instability in streamflow correlograms. Another suggestion to improve the present techniques of synthesizing hydrologic series was to use a multiple regression equation based on all the long streamflow records within a region, including also generalized climatic variables and the hydrologic characteristics of the drainage basin, to obtain the mean and the standard deviation of annual or monthly flows. These parameters were then used for sites within the basin 5).

The main problem involved in operational hydrology models is the reliability of the derived streamflow records, that is, how well they represent the alternative streamflow sequences that may occur during the lifetime of the project under consideration, in terms of range and frequency distribution. This concept of reliability is an arbitrary statistical criterion which measures the goodness of the forecast and is defined differently by various investigators. Usually it is expressed by statistical parameters that show the closeness of the fit between the two series, that is, whether they are statistically indistinguishable.

In the search for a comprehensive streamflow simulation model an investigation of the serial correlation was conducted. It was found that the degree of association between monthly flows has a varied seasonal pattern (Fig. 2). During late summer and early fall, when the flow in the western part of the United States is mainly the result of the storage in the basin, a high correlation coefficient is encountered.
In the winter and spring, by contrast, the flow magnitude depends on current and preceding precipitation in the form of rainfall or on snow melt (which is itself a function of the antecedent precipitation stored in the form of snow pack, and of temperature). Therefore, the degree of association is much lower and fluctuates from region to region.

Fig. 2. Serial correlation of streamflow - monthly variations (East Fork Carson River, South Fork Eel River, and Tujunga Creek; October through September).

The fundamental shape of the hydrograph, as presented in Fig. 1, and the rationale behind it, as stated in Eq. (3), together with the randomness in the distribution of monthly rainfall, lead to the suggestion of combining the parametric and stochastic approaches to develop a more realistic model which will include both physical and statistical parameters. The most important parameters used in hydrologic system investigation are precipitation, which constitutes the principal system input, and runoff, which is the principal system output. These physical parameters are usually available for varying lengths of record for the site and basin concerned. The reliability of their statistical parameters depends on this length.

The parameters determined from the historic sequences are sample estimates of their respective population values; the sample estimates act as population values for the synthetic sequences. Because the sample estimates are unlikely to equal their respective population values, the estimates are operationally biased 6). Therefore, using stochastic methods requires that the historical records be sufficiently long to minimize operational biases. In many instances where stochastic studies of runoff are needed, the records are too short to fulfill this condition. On the other hand, available precipitation records are frequently much longer. It seems desirable, therefore, that sequences of runoff reconstructed from historical rainfall, by synthesis or other methods of analysis, be used to extend the basis for stochastic simulation. Stochastic models using runoff alone are more affected by systematic and random errors of measurement in the historical data than models that incorporate rainfall data, which reduces the sampling errors inherent in this method.

Development of a simulation model

The simulation model is based on a multiple regression analysis. Such analysis permits the evaluation of the combined effects of many parameters on the dependent variable. The relative effect of each parameter can be determined and the overall efficiency of the simulation model can be measured by the coefficient of determination. The significance of the relationship between each independent variable and the dependent variable can be obtained by statistical tests and a "best" estimating equation can be derived. In the model a stochastic or a random component is added to the regression


equation to account for the variation in the dependent variable due to errors and other variables not included in the analysis. In order to yield a relationship which would most closely approximate the current streamflow in terms of the preceding flow and precipitation, a monthly regression equation was formulated:

$$Q_t = A + B Q_{t-1} + C P_t + D P_{t-1} + E \sum_{i=t-j}^{t-1} P_i \qquad (5)$$

where
$Q_t$ = current month flow;
$Q_{t-1}$ = previous month flow;
$P_t$ = current month precipitation;
$P_{t-1}$ = previous month precipitation;
$j = 1 \ldots 12$, a water year month counter;
$\sum_{i=t-j}^{t-1} P_i$ = accumulated precipitation (since the beginning of the snow pack); and
$A, B, C, D, E$ = multiple regression coefficients.

Equation (5) includes only the primary input and output parameters of the hydrological system, namely precipitation and runoff, which are usually available for most basins. The previous flows and precipitation account for the storage in the basin, whereas significant current precipitation is the reason for an immediate increase in flow rate. The accumulated precipitation is an important parameter in high lying basins, where snow melt provides the major portion of the flow during spring and early summer. Thus, it is directly related to flow magnitude.

The arbitrary addition of other parameters, such as land and channel slope, soil cover and crop type, may increase the coefficient of determination. However, these parameters are more difficult to obtain and would probably affect the generalized character of the regression equations. Moreover, former studies on water yields from river basins have shown that only those factors pertaining to climate and ground water or base flow are in general consistently and significantly related to water yield and runoff, whereas cultural variables, such as type of crops and cultivation, are not 7).

Using the historical records of runoff and precipitation, the optimal combination of the independent variables and the corresponding values for the regression coefficients in Eq. (5) were determined for each month. The possible combinations of the different variables in the multiple regression function were searched by a computer program, and the combination that met one or more of the following criteria was selected as the optimal one:


(a) Test of significance for the multiple regression coefficients:

$$t = \frac{b_i}{S_{b_i}} \qquad (6)$$

where $b_i$ = regression coefficient; and $S_{b_i}$ = standard error of the regression coefficient.

(b) Least variance, or standard error of estimate, of the populations:

$$S_y^2 = \frac{\sum (Y - \hat{Y})^2}{n - k - 1} \qquad (7)$$

where $Y$ = the dependent variable (historic flow event); $\hat{Y}$ = the estimate (due to the regression) of $Y$; $k$ = number of independent variables; and $n$ = number of events.

(c) The largest coefficient of determination, which is defined as:

$$R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} \qquad (8)$$

where $Y$ and $\hat{Y}$ are the same as above, and $\bar{Y}$ is the sample mean.

The optimal monthly combinations of variables for the East Fork of the Carson River are given in Table 1. These results indicate again the seasonal characteristics of the parameters involved and therefore justify the application of this kind of model to represent the monthly flow distribution. An example of the observed vs the computed December flows of the South Fork of the Eel River is given in Fig. 3. The computed values were obtained by the use of the multiple regression Eq. (5), including only the optimal variables in that month, which were found to be previous flow and current precipitation.

TABLE 1
Optimal monthly combination of variables, East Fork, Carson River (rows: months X through IX of the water year; columns: previous streamflow $Q_{t-1}$, current precipitation $P_t$, previous precipitation $P_{t-1}$, and cumulative precipitation $\sum_{i=t-j}^{t-1} P_i$).

Fig. 3. Computed vs observed flows by multiple regression (South Fork, Eel River, December; computed flows in inches).

THE GENERATING MODEL

The suggested generating equation is basically a Markovian model as it includes the former states of the watershed in terms of previous runoff and precipitation in the basin. However, a new variable, the current precipitation, is also included. The equation itself consists of two parts: (a) the deterministic one, which is equivalent to the multiple regression Eq. (5), with the optimal combination


of the independent variables derived for the month concerned; and (b) a random component, stochastic by nature, which represents the deviation from the estimates derived by the multiple regression:

$$Y - \hat{Y} = S_y T \qquad (9)$$

where $Y$ = the historic (or generated) flows; $\hat{Y}$ = estimated flow by the multiple regression; $S_y$ = standard error of estimate of the flows; and $T$ = random standard deviates. The complete expression for the generating equation is as follows:

$$Q_t = A + B Q_{t-1} + C P_t + D P_{t-1} + E \sum_{i=t-j}^{t-1} P_i + S_y T \qquad (10)$$

where $Q$, $P$, $S_y$, $T$, and $j$ are the same as defined before.

GENERATION OF SYNTHETIC PRECIPITATION

The introduction of various types of precipitation variables as input data in the simulation model raises the problem of what kind of records should be used. The approach suggested in this study is to synthesize monthly precipitation values by a model based on the works of Stidd 8-10). In these papers the author discusses the nature of the skewness in the distribution of precipitation data. He claims that, theoretically, precipitation amounts have a cube root normal distribution because they are a product function of three variables: vertical motion in the atmosphere, moisture, and duration time. Rainfall rates, which are a product function of vertical motion and moisture alone, will more likely have a square root normal distribution, while other values may also be encountered due to local conditions affecting rainfall characteristics. Stidd suggests a method for plotting frequency distributions on ordinary full logarithmic graph paper. The plotted points delineate a straight line (Fig. 4) using arbitrary constants. The equation of this line may be expressed as:

$$c \log P = \log (t - t_0) + \log \sigma \qquad (11)$$

or

$$P^c = \sigma (t - t_0) \qquad (12)$$

where
$P$ = precipitation data;
$c$ = exponent corresponding to the slope of the line;
$\sigma$ = standard deviation of $P^c$;
$t$ = normal standard deviate corresponding to the probability of the event; and
$t_0$ = $t$ value corresponding to the percentage of zeros in the historical data. If there are no zeros, $t_0$ is estimated by a special procedure.

Fig. 4. Normal distribution of monthly precipitation (Woodfords, Calif., January; $P^c = \sigma (t - t_0)$ plotted against $t - t_0$ on logarithmic axes).

Equation (12) shows that $P^c$ is normally distributed provided that zero values of $P$ are considered to be representative of hypothetical negative amounts. Any $n$th root normal distribution may be plotted as a straight line by this method ($n = 1/c$). In contrast to other models for synthesizing rainfall data, which are based on Markovian processes and deal with hourly rainfall alone, Eq. (12) is considered more adequate for generating the monthly precipitation data required by the streamflow simulation model. The three parameters $c$, $\sigma$, and $t_0$ are derived by a computer program for each month from the historic data; $t$ is produced by a random number generator. The resulting values of $P$ are then applied in the model.
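A minimal sketch of how Eq. (12) might be used to generate monthly precipitation is given below. It assumes that draws falling below $t_0$ (the hypothetical negative amounts) are recorded as zero precipitation; the function names, the seed, and the parameter values in the example are hypothetical and are not fitted to any record.

```python
from statistics import NormalDist

import numpy as np

def t0_from_zero_fraction(zero_fraction):
    """t value whose cumulative normal probability equals the fraction of zero months.

    Valid only for 0 < zero_fraction < 1; the special procedure the text
    mentions for records without zeros is not reproduced here.
    """
    return NormalDist().inv_cdf(zero_fraction)

def generate_precipitation(c, sigma, t0, n, seed=0):
    """Draw n monthly precipitation values from P^c = sigma * (t - t0), Eq. (12)."""
    rng = np.random.default_rng(seed)
    t = rng.standard_normal(n)            # random normal standard deviates
    p_c = sigma * (t - t0)                # Eq. (12); negative where t < t0
    p_c = np.clip(p_c, 0.0, None)         # assumption: negative amounts become zeros
    return p_c ** (1.0 / c)               # back from the n-th root scale, n = 1/c

# Illustrative parameter values only.
precip = generate_precipitation(c=0.46, sigma=0.42, t0=-0.343, n=10000)
print(precip.mean(), (precip == 0).mean())
```

With this construction the share of zero months in a long generated series approaches the cumulative normal probability at $t_0$, which is how $t_0$ is tied to the percentage of zeros in the record.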


The assumption that monthly precipitation data are randomly distributed has been checked. For this purpose, the Fisher transformation was applied:

$$Z = 0.5 \ln \frac{1 + r}{1 - r} \qquad (13)$$

where $Z$ is approximately normally distributed with approximate mean $0.5 \ln [(1 + \rho)/(1 - \rho)]$ and standard deviation $1/(n - 3)^{1/2}$, regardless of the value of $\rho$, and $r$ is the serial correlation. The null hypothesis that $r$ is not significantly different from zero was tested at significance levels of 95% and 99%.

CHOICE OF DISTRIBUTION

Streamflows and precipitation series, by nature, are not normally distributed. Therefore, Gamma and Log Normal distributions are widely used to approximate the distribution of hydrologic events. Exploring the underlying distribution of certain flow records, it was recognized that streamflows from watersheds with different physiographic features and climatologic conditions will exhibit different laws of frequency. Moreover, runoff, even from the same basin, may belong to different types of distributions within a one-year period. This may be so because of the seasonal flow variations due to the origin of the various flow components, i.e., ground or surface storage outflow, precipitation, or snow melt.

In order to elaborate the flow generating model so as to yield better estimates of the streamflow population, the distribution derived from historical data should be presumed. Out of eleven different frequency distributions* the following three major types have been chosen for this purpose: (a) Normal, (b) Chow's Log Normal, and (c) Pearson Type III (which is a derivation of the Gamma distribution). A computer program 11) has been utilized which determines the goodness of fit of the historical series to each one of the theoretical distributions mentioned above by means of a best fit coefficient, which is defined as:

$$R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} \qquad (14)$$

where $Y$ = historical event of the assigned frequency; $\hat{Y}$ = theoretical event due to the considered distribution for the same frequency as $Y$; and $\bar{Y}$ = mean of the historical events. $\sum (Y - \hat{Y})^2$ represents the variability about the theoretical distribution, which is different for each distribution, and $\sum (Y - \bar{Y})^2$ is the variability about the sample mean, which is the same for all the distributions.

* Gumbel, Lieblein, Chow's Log Normal, Gringorten's seven type curves, and Gamma distribution.

To maintain homogeneity in the frequency distribution of the different variables within the recursion equation, various transformations of the historical records have been investigated to find out which one would best fit the normal distribution. This distribution has been chosen because the $n$th roots of the monthly precipitation records are already considered as normally distributed, and also to simplify the generation of random deviates. Based on the three major distributions mentioned in the former section, the following transformations have been applied to yield normal flow variates:

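$$K = \log_e X \qquad (15)$$

$$K = \frac{6}{G} \left[ \left( \frac{G X}{2} + 1 \right)^{1/3} - 1 \right] + \frac{G}{6} \qquad (16)$$

$$K = \frac{6}{G'} \left[ \left( \frac{G' \log_e X}{2} + 1 \right)^{1/3} - 1 \right] + \frac{G'}{6} \qquad (17)$$

where $K$ = normally distributed flow variate; $X$ = recorded flow; and $G$, $G'$ = skew coefficients of $X$ and $\log_e X$ respectively, with $G = \sum (X - \bar{X})^3 / [(N - 1) S_x^3]$.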
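The transformation of Eq. (16) and its algebraic inverse can be sketched as follows, assuming that the garbled exponent in Eqs. (16) and (17) is one third, which gives the form of the Wilson-Hilferty approximation. The function names are hypothetical, and the inverse is derived here by solving Eq. (16) for $X$; it does not appear in the source.

```python
import numpy as np

def skew_coefficient(x):
    """G = sum((X - mean)^3) / ((N - 1) * S_x^3), as defined in the text."""
    x = np.asarray(x, dtype=float)
    s_x = x.std(ddof=1)
    return np.sum((x - x.mean()) ** 3) / ((x.size - 1) * s_x ** 3)

def to_normal_variate(x, g):
    """Eq. (16): K = (6/G) * ((G*X/2 + 1)^(1/3) - 1) + G/6. Requires G != 0."""
    x = np.asarray(x, dtype=float)
    # np.cbrt keeps the real cube root when the argument is negative.
    return (6.0 / g) * (np.cbrt(g * x / 2.0 + 1.0) - 1.0) + g / 6.0

def from_normal_variate(k, g):
    """Inverse of Eq. (16): X = (2/G) * (((G/6) * (K - G/6) + 1)^3 - 1)."""
    k = np.asarray(k, dtype=float)
    return (2.0 / g) * ((g / 6.0 * (k - g / 6.0) + 1.0) ** 3 - 1.0)

x = np.array([-1.2, -0.3, 0.0, 0.8, 2.5])   # hypothetical variates
k = to_normal_variate(x, g=0.9)
print(k, from_normal_variate(k, g=0.9))
```

Under the same assumption, Eq. (17) corresponds to calling `to_normal_variate(np.log(x), g_prime)`, with `g_prime` the skew coefficient of the logarithms.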

These equations are utilized whenever it is assumed that the primary records are distributed as a Gamma function or one of its derivations.

ADJUSTMENT OF THE MODEL ACCORDING TO THE PRESUMED FREQUENCY DISTRIBUTION

The basic generation Eq. (10) can be adjusted to fit the normal distribution after the underlying distribution of the recorded flows is determined. For this purpose the multiple regression Eq. (5) is reformulated in the following four versions. In all of these versions the $n$th root of the monthly precipitation data has been applied, assuming that in this manner they are normally distributed.

(a) Untransformed streamflows:

$$Q_t = A + B Q_{t-1} + C P_t^c + D P_{t-1}^c + E \sum_{i=t-j}^{t-1} P_i^c \qquad (18)$$

(b) Streamflows transformed by Eq. (16) (Pearson Type III distribution):

$$Q'_t = A + B Q'_{t-1} + C P_t^c + D P_{t-1}^c + E \sum_{i=t-j}^{t-1} P_i^c \qquad (19)$$

(c) Logarithm of the monthly streamflow:

$$\ln Q_t = A + B \ln Q_{t-1} + C P_t^c + D P_{t-1}^c + E \sum_{i=t-j}^{t-1} P_i^c \qquad (20)$$

(d) Streamflows transformed by Eq. (17) (Log Pearson Type III distribution):

$$\ln Q'_t = A + B \ln Q'_{t-1} + C P_t^c + D P_{t-1}^c + E \sum_{i=t-j}^{t-1} P_i^c \qquad (21)$$

where in all four equations:
$Q_t$ and $Q_{t-1}$ = the current and previous month flows;
$A, B, C, D, E$ = the regression coefficients;
$P_t$, $P_{t-1}$, $\sum P_i$ = the current, the previous month, and the accumulated precipitation;
$c$ = exponent equivalent to the $n$th root of the precipitation data; and
the prime ($'$) denotes transformed data.

After the normal flow traces are generated by one or more of the above versions, they are retransformed to recover their parent distribution.
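As a hedged illustration of how one month's regression might be fitted and the candidate variable combinations compared, the sketch below computes the coefficients together with the statistics of Eqs. (6) to (8) by ordinary least squares. The function names, the exhaustive subset search, and the synthetic data are assumptions made for this example; the source does not describe the internals of its program.

```python
from itertools import combinations

import numpy as np

def fit_regression(x, y):
    """Least-squares fit of y on the columns of x plus a constant term.

    Returns the coefficients (constant first), their t ratios (Eq. 6),
    the variance of estimate S_y^2 (Eq. 7), and R^2 (Eq. 8).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    s2 = np.sum((y - y_hat) ** 2) / (n - k - 1)
    std_err = np.sqrt(s2 * np.diag(np.linalg.inv(design.T @ design)))
    t_ratio = coef / std_err
    r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, t_ratio, s2, r2

def rank_subsets(x, y):
    """Fit every non-empty subset of columns; sort by variance of estimate."""
    x = np.asarray(x, dtype=float)
    results = []
    for size in range(1, x.shape[1] + 1):
        for cols in combinations(range(x.shape[1]), size):
            _, _, s2, r2 = fit_regression(x[:, list(cols)], y)
            results.append((s2, r2, cols))
    return sorted(results)

# Synthetic stand-ins for Q_{t-1}, P_t^c, P_{t-1}^c, and accumulated P_i^c.
rng = np.random.default_rng(0)
x = rng.random((40, 4))
y = 0.5 + 0.8 * x[:, 0] + 1.5 * x[:, 1] + 0.05 * rng.standard_normal(40)
print(rank_subsets(x, y)[0])
```

Ranking by $S_y^2$ alone is only one of the selection criteria listed in the text; the t ratios returned by `fit_regression` could be used to discard combinations with insignificant coefficients before ranking.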

Operation of the model

The complete model consists of five separate features, each of which has its own function. Because of storage limitations of the available computer, the model is arranged in such a manner that each function is designed to be a self-contained program rather than a sub-routine in a large program. The task of each sub-program can be described as follows:
(a) Derivation of the statistical parameters for the recorded flows and selection of the best frequency distribution (FFD).
(b) Derivation of the parameters for the normal distribution of precipitation (NDP).
(c) Data preparation and transformation (DPT).
(d) Multiple regression program (MR).
(e) Generation of monthly precipitation and streamflows (SGSMR).

A schematic layout of these components is shown in Fig. 5. The programs are written in Fortran IV and have been run on the XDS SIGMA 7 digital computer. The input information to the model consists of monthly recorded flows and precipitation from the index station for a pre-selected period.

Fig. 5. Sequence of operation chart.

The following operations are then executed in successive stages:
(a) The streamflow records are run through a fitting frequency distribution program where a choice of one out of four* frequency distributions is made for each month. In addition, the statistical parameters of mean, standard deviation and skewness coefficient are obtained.
(b) Simultaneously, the normal distribution precipitation parameters, namely the $n$th root ($c$), $t_0$ and $\sigma$, are derived by a separate program.
(c) The streamflow and precipitation data are prepared for use as input to the multiple regression program. This includes calculation of antecedent precipitation for each month, transformation from acre-feet of streamflow to inches, and arrangement of the data, month by month, for the entire record period.
(d) Derivation of the optimal regression and correlation coefficients for the simulation model by a multiple regression program. Streamflow records are transformed according to the selected frequency distribution and the precipitation data are raised to the power $c$ prior to entering the multiple regression computations. In the first run all the variables are included in the regression equation, while in further runs only the variables that are found to be significant are included, until an optimal combination is reached in terms of a maximum determination coefficient ($R^2$) and a minimum standard deviation of estimates ($S_y$).
(e) Generation of precipitation and the corresponding streamflows. A detailed flow chart of the generating program is given in Fig. 6. The derived constants and coefficients from stages (a) to (d), together with the assigned frequency transformation, are read into this program. Precipitation is generated first by Eq. (12). Estimates of streamflows are obtained by the optimal regression function. The random component is calculated separately and combined later with the estimates to yield the generated flow. If a better matching with the historic mean flows and standard deviation is required, a further computation takes place to derive the residual of the generated flows. The final simulated flow is then obtained by multiplying the residual by the historic standard deviation and adding to this product the historic mean.

* Normal, Log Normal, Pearson Type III, and Log Pearson Type III.

Fig. 6. SGSMR flow chart.
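The generation stage (e) might be sketched as follows for one calendar month across many simulated years. The sketch assumes that the "residual" of the generated flows means their standardized deviate (generated flow minus the generated mean, divided by the generated standard deviation); the source does not spell this out. Function names, argument layout, and the numbers in the example are hypothetical.

```python
import numpy as np

def generate_month_flows(coef, q_prev, p_cur, p_prev, p_acc, s_y, seed=0):
    """Eq. (10): regression estimate plus the random component S_y * T."""
    rng = np.random.default_rng(seed)
    a, b, c, d, e = coef
    estimate = a + b * q_prev + c * p_cur + d * p_prev + e * p_acc
    random_part = s_y * rng.standard_normal(np.shape(estimate))
    return estimate + random_part

def rescale_to_historic(generated, hist_mean, hist_std):
    """Optional matching step: standardize, then impose the historic moments."""
    generated = np.asarray(generated, dtype=float)
    # Assumption: the "residual" is the standardized generated flow.
    residual = (generated - generated.mean()) / generated.std(ddof=1)
    return residual * hist_std + hist_mean

years = 200
rng = np.random.default_rng(1)
q_prev = rng.gamma(4.0, 1.0, years)        # hypothetical previous-month flows
p_cur = rng.gamma(2.0, 0.5, years)         # hypothetical generated precipitation
zeros = np.zeros(years)                    # variables not selected for this month
flows = generate_month_flows((0.2, 0.6, 1.1, 0.0, 0.0), q_prev, p_cur,
                             zeros, zeros, s_y=0.4)
matched = rescale_to_historic(flows, hist_mean=3.5, hist_std=1.8)
print(matched.mean(), matched.std(ddof=1))
```

After rescaling, the sample mean and standard deviation of the simulated month equal the historic values by construction, which is the "better matching" the text describes.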


The simulated flows can be fed back into the FFD program to yield the statistical parameters of the simulated series for comparison with the historical records.

TESTING OF THE MODEL

In order to test the validity of the model, three watersheds of different climatic and physical characteristics were selected in the western United States:
1) East Fork of the Carson River near Gardnerville, a high lying basin on the eastern slope of the Sierra;
2) South Fork of the Eel River near Miranda, a coastal basin of northern California; and
3) Tujunga Creek, an ephemeral stream of a semi-arid basin in southern California.

Figure 7 shows the location of the three watersheds.

Fig. 7. Location map of selected basins.

As precipitation is the main input to the model, it is necessary to verify first, before further testing takes place, that the generated precipitation is of the same population and has the same frequency distribution as the recorded data.

Precipitation

28

( r- . . . . .

Years )

J

4-

~.

]

3-

--~L ~-~

-

-

~

..........

g.....

ted data

E

o

OCT. I NOV. I

Fig. 8.

DEC. I

d...

I FEB. I MAR. I

APR. I M.Y

I

dUN. I

00L. I

• I SEPT. [

Means and standard deviations of monthly precipitation. NIRANDA P~ECIPITATI~N PATA

-- ;:*~ Y E A ~ 5

9~" I ~ E C ~ D

~CTOSE~

PREC|P. t~.C~ES S

lo.c+ H s



s

~



H s



s

S

sH~ H H

1,c*

E

• S

l

.

'

I I

,I*

i

I i i i I

..., ........ 3"~

+ .....

, .....

, ..... 13,8

+.., ..... Z~'t

,..,..,..,..,..,.,.,.,..,.-,..,.-,--,--,-.,--, 3~.5 ~.8 55.~ 6S,5

CL,~ULATtVE

-

t,~MAL D I S T ~ I B U T I ~ ,

..... 75,~

,

.....

S6,2

(~1

"ISTeRIC

~I~UL^TEa •

-

~.S

Fig. 9.

Frequency distribution of monthly precipitation.

, .....

, ........

,.. S6,S

303

STOCHASTIC SIMULATION OF MONTHLY STREAMFLOW o

35o-

South Fork, Eel River I as YEA,S or RECOROI

300-



/

/i"

Multiple Regression Simulation

250

c5

o

2oo

0

_o

LLJ P-

150

Ioo

5o

o •

I

I

]

L

k

I

I

0

50

~00

~50

20o

aso

3o0

35o

OBSERVED (1000 Ac. Ft.) Fig. 10.

Mean monthly flows, recorded vs simulated.

Fig. 11. Standard deviation of monthly flows, recorded vs simulated. (Panel: South Fork, Eel River, 28 years of record; multiple regression simulation; observed and simulated values in 1000 ac-ft.)

The results of these tests are shown in Figs. 8 and 9.

The simulation results were tested by the following procedures:

(a) Comparison of the statistical parameters of the recorded flows, such as mean, standard deviation, skew coefficient, and serial correlation, with the corresponding quantities for the simulated records. The statistical parameters of the simulated flows were also compared with those of flow traces generated by a simple regression model in which only the serial correlation of streamflow was applied 12). The results are shown in Figs. 10 and 11.

Fig. 12. Frequency distribution of monthly flows. (Panel: South Fork of the Eel River, 28 years of record, February; flow in 1000 ac-ft on a logarithmic scale plotted against the cumulative normal distribution; H = historic, S = simulated.)

TABLE 2
Maximum deviation in the frequency distribution of the historic and simulated flows, South Fork, Eel River.

Month       X      XI     XII    I      II     III    IV     V      VI     VII    VIII   IX
Max. dev.   0.310  0.079  0.085  0.172  0.069  0.103  0.069  0.138  0.178  0.340  0.069  0.069

Level of significance (%)   5      1
Critical value              0.355  0.340
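The maximum deviations of Table 2 are the kind of statistic produced by a two-sample comparison of cumulative frequency distributions. The sketch below assumes that the test cited is the Kolmogorov-Smirnov two-sample test; the function names and the sample flows are hypothetical, and the critical value in the demonstration is the usual large-sample approximation rather than the tabulated value used in the paper.

```python
import numpy as np


def max_cdf_deviation(historic, simulated):
    """Largest absolute gap between the two empirical cumulative distributions."""
    h = np.sort(np.asarray(historic, dtype=float))
    s = np.sort(np.asarray(simulated, dtype=float))
    grid = np.concatenate([h, s])  # the gap can only peak at an observed value
    cdf_h = np.searchsorted(h, grid, side="right") / h.size
    cdf_s = np.searchsorted(s, grid, side="right") / s.size
    return float(np.max(np.abs(cdf_h - cdf_s)))


def same_population(historic, simulated, critical_value):
    """True when the maximum deviation does not exceed the critical value."""
    return max_cdf_deviation(historic, simulated) <= critical_value


# Hypothetical flows (1000 ac-ft) for a single month; not data from the paper.
rng = np.random.default_rng(seed=1)
historic = rng.lognormal(mean=4.0, sigma=0.8, size=28)
simulated = rng.lognormal(mean=4.0, sigma=0.8, size=28)

# Assumption: large-sample 5% approximation, 1.36 * sqrt((n1 + n2) / (n1 * n2)).
n1, n2 = historic.size, simulated.size
critical_5pct = 1.36 * np.sqrt((n1 + n2) / (n1 * n2))

d = max_cdf_deviation(historic, simulated)
print(f"max deviation = {d:.3f}, critical value = {critical_5pct:.3f}")
print("same population at 5%:", same_population(historic, simulated, critical_5pct))
```

Passing the critical value as an argument keeps the comparison independent of where that value comes from, so a small-sample table can be substituted for the approximation.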

(b) Comparison of the recorded and simulated flow frequency distributions (Fig. 12).

(c) Significance tests showing whether the generated flows are of the same population as the historic flows 13) (Table 2).

Conclusions

The reliability of simulation models is often questioned because of uncertainty about the source of the variation in the flows: whether it really reflects natural phenomena or is affected by the length of the observation period and by sampling errors. The results of this study indicate that a hybrid model which combines stochastic and deterministic approaches is to be preferred over sophisticated statistical methods which bypass the hydrological processes involved in streamflow determination. In this model, statistical analysis is used to examine more realistically the significance of a number of relationships in terms of their meaning in the hydrologic system.

The deterministic part of this approach applies "parametric hydrology" to evaluate the relationships among the physical parameters involved in the hydrologic events and selects the significant relationships to generate flow sequences. The stochastic approach, in essence, is the use of statistical characteristics of the hydrologic variables to define their random component and the probability of its occurrence in the hydrologic event.

By defining the relationship between streamflows and the associated precipitation in the form of a multiple regression function, the variance of the flow estimate is significantly reduced in comparison with the variance of the Markovian models that use only previous streamflows (Table 3). As a consequence, the random component, or error term, in the generating equation of the composite model becomes smaller; thus, the deterministic portion of the model is increased and a higher degree of reliability is thereby attributed to the results.

The reliability of simulated flow records depends to a large extent on how successful the following operations are for each month:
a. Generation of precipitation data which are statistically indistinguishable from the historic record;
b. Identification of the underlying frequency distribution of the streamflow; and
c. Derivation of the optimal combination of hydrologic variables which participate in the multiple regression equation.

Monthly precipitation data were found to be independent in time. Therefore, the historic data could be reused in the model by applying a Monte Carlo approach or reshuffling the monthly data sequences. However, the use of a precipitation generator is preferred, as the range and variability of the simulated precipitation series may then differ from the historic ones.

It is strongly believed, even though it could not be proven within the scope of this work, that streamflow distributions have physical significance which depends on the origin of the runoff.

TABLE 3
Serial, simple and multiple correlation of streamflows and precipitation and the corresponding variance reduction, South Fork, Eel River.

Month   rs      rp      R       (1 - rs²)SQ²   Sy²     Var. reduc. (%)
X       0.177   0.872   0.872   1.700          0.393   76.5
XI      0.326   0.717   0.864   1.269          0.388   69.5
XII     0.365   0.904   0.964   4.320          0.942   78.9
I       0.371   0.877   0.933   11.673         1.889   84.0
II      0.004   0.928   0.936   2.950          0.393   92.0
III     0.368   0.857   0.893   2.779          0.700   75.0
IV      0.412   0.800   0.845   0.466          0.140   70.0
V       0.452   0.675   0.867   0.329          0.100   70.0
VI      0.852   0.186   0.889   0.110          0.090   18.0
VII     0.922   ...     0.946   0.030          0.023   23.0
VIII    0.903   ...     0.959   0.028          0.012   57.0
IX      0.577   ...     0.802   0.307          0.177   42.5

rs = Serial correlation coefficient of streamflow; rp = Correlation coefficient between streamflow and precipitation; R = Multiple correlation coefficient; SQ = Streamflow standard deviation; Sy² = Estimated flow variance.
Note: only two rp values, 0.331 and 0.540, survive for months VII to IX, and their month assignment cannot be recovered from the source layout. The (1 - rs²)SQ² entry for month VII is printed as 0.30 in the source; 0.030 is consistent with the tabulated variance reduction of 23.0.
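The variance reduction column of Table 3 can be checked against the two variance columns. The formula used below, 100 (1 - Sy² / ((1 - rs²) SQ²)), is an inference from the tabulated numbers rather than a definition stated in this section, and the function name is hypothetical. The month VII entry of the Markov variance is taken as 0.030, also an assumption.

```python
MONTHS = ["X", "XI", "XII", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

# Columns of Table 3, South Fork, Eel River.
# (1 - rs^2) * SQ^2: flow variance left unexplained by the serial-correlation model.
# The month VII entry is printed as 0.30 in the extracted table; 0.030 is an
# assumption, chosen because it matches the tabulated reduction of 23.0.
markov_var = [1.700, 1.269, 4.320, 11.673, 2.950, 2.779,
              0.466, 0.329, 0.110, 0.030, 0.028, 0.307]
# Sy^2: estimated flow variance of the multiple regression.
regression_var = [0.393, 0.388, 0.942, 1.889, 0.393, 0.700,
                  0.140, 0.100, 0.090, 0.023, 0.012, 0.177]
# Variance reduction as tabulated.
tabulated = [76.5, 69.5, 78.9, 84.0, 92.0, 75.0,
             70.0, 70.0, 18.0, 23.0, 57.0, 42.5]


def variance_reduction_percent(markov, regression):
    """Percent of the Markov-model variance removed by the multiple regression."""
    return 100.0 * (1.0 - regression / markov)


computed = [variance_reduction_percent(m, r)
            for m, r in zip(markov_var, regression_var)]

for month, c, t in zip(MONTHS, computed, tabulated):
    print(f"{month:>4}: computed {c:5.1f}   tabulated {t:5.1f}")
```

Under this assumed formula the computed values fall within one percentage point of the tabulated column for every month except February (II), where the computed value is about 86.7 against a tabulated 92.0; one of the printed entries for that month may be damaged.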
This dependence of the streamflow distribution on the origin of the runoff was clearly observed in the South Fork of the Eel River, where two different transformations, one for the wet season and the other for the drier season, were found to yield the best fit to the normal distribution (Table 4). Pearson Type III was used for the months November to March and Log Pearson Type III for April to October. Although statistical tests are frequently inadequate because the sample sizes or record lengths are insufficient to distinguish between alternative flow populations (for example, normal, log normal, or gamma), the estimation of the underlying frequency distribution together with the corresponding transformation has improved the simulation in terms of resemblance to the historic records.

TABLE 4
Selection of best transformation, South Fork, Eel River. Entries are the frequency best fit coefficients (R²) of each variate fitted to the normal distribution.

Month   Historic   Loge of          Chow's       Pearson Type III   Log Pearson Type III
        flows      historic flows   Log Normal   transformation     transformation
X       0.677      0.394            0.859        0.684              0.915
XI      0.919      0.893            0.930        0.979              0.688
XII     0.920      0.847            0.939        0.977              0.673
I       0.913      0.951            0.894        0.973              0.945
II      0.940      0.885            0.968        0.972              0.809
III     0.950      0.962            0.922        0.966              0.866
IV      0.935      0.815            0.977        0.943              0.980
V       0.939      0.840            0.951        0.923              0.955
VI      0.921      0.786            0.926        0.880              0.971
VII     0.963      0.897            0.968        0.942              0.980
VIII    0.972      0.926            0.979        0.955              0.980
IX      0.954      0.867            0.965        0.922              0.972

Note: the source layout of this table is damaged. The order of the last three columns, and the Pearson and Log Pearson entries for months I and II, are inferred from the statement in the text that Pearson Type III fits best from November to March and Log Pearson Type III from April to October.

In general, the criterion which has been utilized for the selection of the significant variables in the multiple regression was found to be satisfactory. However, in certain cases it was still necessary to use subjective judgement to select the optimal combination instead of strictly following the described "rules". This may be the outcome of using non-representative rainfall or an insufficient length of record. It was also noted that the choice of transformation has some influence on the optimal combination for a given month.

The causal influence of the various forms of precipitation is quite strong, as is revealed by the significance tests. However, where the random component is comparatively large, it can lead to errors, which show up as unrealistic outliers. These outliers appear particularly when the logarithmic version of the multiple regression is applied and the deterministic component is relatively small, as may be the case during late summer or early fall. The effect is even more pronounced in ephemeral streams.

It can be concluded that in basins where both representative precipitation data and streamflow records are available, incorporation of the precipitation in the regression function improves the simulation process. The model which uses this information is able to generate monthly flows of perennial streams successfully. Difficulties are encountered when the basin is too large in area or in elevation range for one rain gage to represent the variation in rainfall distribution over the basin. Ephemeral streams cannot be simulated adequately because the simplified characteristics of the hydrologic system assumed for the model are unsuitable for these streams.

Further studies in this realm might be directed toward the application of this same approach in multivariate models, namely when more than one stream gage and rain gage are involved.

Acknowledgement

The study upon which this publication is based was accomplished while the author was a graduate student at the University of Nevada, Reno. It was supported in part by funds provided by the United States Department of Interior, Office of Water Resources Research, and in part by the Center for Water Resources Research, Desert Research Institute, University of Nevada System. The author wishes to thank Dr. George B. Maxey, Director of the Center for Water Resources Research, Dr. William S. Butcher from the University of Texas at Austin, and Dr. Vulli L. Gupta for their advice and encouragement throughout this study.

List of Selected Symbols

Notation: The letter symbols adopted for use in this study are defined where they first appear and are listed alphabetically hereafter.

A = Multiple regression intercept
B = Multiple regression coefficient
bi = Regression coefficient
C = Multiple regression coefficient
c = Exponent by which precipitation is raised to derive a normal distribution variate
D = Multiple regression coefficient
DPT = Data preparation and transformation program
E = Multiple regression coefficient
FFD = Fitting frequency distribution program
G1, G 1 = Skew coefficients of flows and of the logarithms of flows, respectively
j = Month counter
K = Recession coefficient
K = Normally distributed flow variate
k = Number of independent variables
MR = Multiple regression program
N, n = Number of observations
NDP = Normal distribution parameters of precipitation program
P = Monthly precipitation
Q = Monthly flows
q = Flow rate or runoff
R = Multiple correlation coefficient
R² = Determination coefficient
Rf² = Frequency best fit coefficient
r = Correlation coefficient
S = Standard deviation
Sbi = Standard error of the regression coefficient
SQ = Standard deviation of historic flows
Sy = Standard error of estimate of the population
t = Normal standard deviate
to = Normal standard deviate corresponding to the percentage of zeros in the historical precipitation data
X = Recorded flow
Y = Independent variable in the multiple regression equation
Ŷ = Estimate of the population mean
Ȳ = Sample mean
α, β = Regression coefficient
ε = Independent error term
μ = Mean
ρ = Serial correlation coefficient
ν = Random normal deviate
σ = Standard deviation

References

1) M. B. Fiering, Synthetic Hydrology: An Assessment. Water Research, Western Resources Conference 7th, Colorado State University (July 1965) p. 332
2) H. A. Thomas, Jr. and M. B. Fiering, In: Arthur Maass et al., The Design of Water Resource Systems (Harvard Univ. Press, Cambridge, 1962) Chap. XII
3) M. B. Fiering, Streamflow synthesis (Harvard Univ. Press, Cambridge, Mass., 1967)
4) L. A. Roesner and V. M. Yevdjevich, Mathematical models for time series of monthly precipitation and monthly runoff. Hydrology papers, Colorado State Univ., Fort Collins, Colorado (October 1966) p. 1
5) M. A. Benson and N. C. Matalas, Synthetic hydrology based on regional statistical parameters. Water Resources Res. 3 (4th Quarter 1967) 931-935
6) N. C. Matalas, Mathematical assessment of synthetic hydrology. Water Resources Res. 3 (4th Quarter 1967)
7) A. L. Sharp, A. E. Gibbs, W. J. Owens and B. Harris, Application of the multiple regression approach in evaluating parameters effecting water yields in river basins. J. Geophys. Res. 65 (1960) 1273-1286
8) C. K. Stidd, Cube-root normal precipitation distributions. Trans. Am. Geophys. Union 34 (February 1953) 31-35
9) C. K. Stidd, A three parameter distribution for precipitation data with a straight-line plotting method. Proc. First Statistical Meteorological Conference, Hartford, Connecticut (May 1968) pp. 158-162
10) C. K. Stidd, The Nth root normal distribution of precipitation. Desert Research Institute, University of Nevada System (November 1969) 15 p.
11) V. L. Gupta, Model selection of frequency analysis of small watersheds runoff. Ph.D. Dissertation, West Virginia University (June 1968)
12) L. R. Beard, Use of interrelated records to simulate streamflow. J. Hydraulics Div. A.S.C.E. 91, No. HY5 (September 1965) 13-22
13) S. Siegel, Nonparametric statistics for the behavioral sciences (McGraw Hill, 1956) pp. 127-130, 278