GENERATION OF WEEKLY STREAMFLOW DATA FOR THE RIVER DANUBE-RIVER MAIN-SYSTEM
EXPERIENCES WITH AN AUTOREGRESSIVE MULTIVARIATE MULTILAG MODEL

L.A. SIEGERSTETTER AND W. WAHLISS
Technical University of Munich
ABSTRACT

In search of an optimal operating strategy for a complex hydrological system, simulation runs required hundreds of years of synthetic weekly streamflow data. A multivariate multilag model was used to generate hydrological series correlated in both time and space for eight gauging stations. As the model assumes the input to be normally distributed, standardized variables, extensive data transformation had to be performed. Stepwise regression is applied to keep the number of parameters defining the multiple autoregressive process as low as possible.

1. INTRODUCTION
Statistical simulation of river flows is considered a powerful tool in the design and operation of water resources projects. Quite often historical records are too short to serve as a secure basis for analysing the behaviour of the system in question. Thus, long synthetic flow series have to be generated which truly reproduce the characteristics of the original data. Furthermore, for complex systems cross-correlated series for various locations may have to be considered simultaneously. As a consequence, a multivariate model is required that describes not only the internal structure of the respective series but also all spatial interactions.

A well-known multivariate approach was introduced by G.K. Young and W.C. Pisano when they extended a first-order autoregressive process (AR(1) process) suggested by N.C. Matalas. However, flow series usually prove to possess significant autocorrelation coefficients for time lags higher than lag 1 and should not be treated as AR(1) processes if all available information is to be used. In the following, a higher-order autoregressive model and its application as a planning tool are discussed.

Reprinted from Time Series Methods in Hydrosciences, by A.H. El-Shaarawi and S.R. Esterby (Editors), © 1982 Elsevier Scientific Publishing Company, Amsterdam. Printed in The Netherlands.

2. THE MODEL
A multivariate autoregressive model that theoretically allows for any p time lags is given by the matrix equation

$$x(i) = A_1\,x(i-1) + A_2\,x(i-2) + \dots + A_p\,x(i-p) + B\,\varepsilon(i) \tag{1}$$

where
$x(i), x(i-1), \dots, x(i-p)$: (n)-vectors containing flow measurements for time intervals i, ..., i-p for all n gauges
$A_1, A_2, \dots, A_p, B$: (n × n)-matrices with model parameters
$\varepsilon(i)$: (n)-vector representing white noise
p: order of the autoregressive process
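To illustrate how eq. (1) produces a synthetic series once its matrices are known, the following minimal sketch steps the recursion forward in time. The function name `simulate_var`, the burn-in length and all numerical values are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def simulate_var(A, B, n_steps, rng, burn_in=200):
    """Generate x(i) = A_1 x(i-1) + ... + A_p x(i-p) + B eps(i), eps ~ N(0, I)."""
    p = len(A)                       # order of the autoregressive process
    n = B.shape[0]                   # number of gauges
    total = n_steps + burn_in
    x = np.zeros((total + p, n))     # the first p rows are zero start values
    for i in range(p, total + p):
        eps = rng.standard_normal(n)           # white-noise vector eps(i)
        x[i] = B @ eps                         # stochastic component
        for lag, A_l in enumerate(A, start=1):
            x[i] += A_l @ x[i - lag]           # autoregressive part
    return x[p + burn_in:]           # drop start values and burn-in period

# Hypothetical two-gauge, second-order example (made-up coefficients).
A = [np.array([[0.5, 0.1], [0.2, 0.4]]),
     np.array([[0.2, 0.0], [0.0, 0.1]])]
B = np.array([[0.7, 0.0], [0.3, 0.6]])
series = simulate_var(A, B, n_steps=520, rng=np.random.default_rng(0))
```

The burn-in period is a common device to remove the influence of the arbitrary zero start values; whether the original routine used one is not stated.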
Persistence of the runoff is modelled by assuming that any flow realization is influenced by p predecessors observed at the same and all other gauges. An independent stochastic component $B\,\varepsilon(i)$ added to the autoregressive part of eq. (1) accounts for all variance that cannot be derived from the history of the runoff process.

2.1 Estimation of Model Parameters

In comparison with a univariate model of the same order, the number of model parameters in eq. (1) is increased by a factor of $n^2$. Practical application, however, demands that the runoff structure be reproduced in the most economical way, i.e. with the lowest possible number of coefficients. Stepwise linear regression was therefore used in the estimation of the matrices $A_1, \dots, A_p$. Matrix equation (1) is split into separate row equations, each being interpreted as a linear regression of the form
$$x_j(i) = \sum_{k=1}^{n} a^{1}_{j,k}\,x_k(i-1) + \sum_{k=1}^{n} a^{2}_{j,k}\,x_k(i-2) + \dots + \sum_{k=1}^{n} a^{p}_{j,k}\,x_k(i-p) \tag{2}$$

where
$a^{p}_{j,k}$: regression coefficient for the pth predecessor at gauge k (the superscript is a lag index, not a power)
j = 1, ..., n: row number in eq. (1)
i = p+1, ..., m: index of the time interval
m: total number of observations
k = 1, ..., n: gauge number
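A row equation of the form (2) can be fitted with a simple variable-selection loop. The sketch below builds the lagged regressors and performs a greedy forward selection. It is only one possible variant: the paper does not state which stepwise procedure or significance level was used, so the partial-F threshold `f_enter` and both function names are assumptions. No intercept is fitted because the inputs are taken to be standardized residuals.

```python
import numpy as np

def lagged_design(x, p):
    """x: array (m, n). Returns regressors x_k(i-l), l = 1..p, and targets x(i)."""
    m, n = x.shape
    cols = [x[p - l:m - l, :] for l in range(1, p + 1)]   # lag-1 block first
    return np.hstack(cols), x[p:, :]

def forward_select(X, y, f_enter=4.0):
    """Greedy forward selection; unselected coefficients stay zero."""
    n_obs, n_var = X.shape
    chosen = []
    rss = float(y @ y)                       # residual sum of squares so far
    while len(chosen) < n_var:
        trials = []
        for c in range(n_var):
            if c not in chosen:
                cols = chosen + [c]
                coef = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
                resid = y - X[:, cols] @ coef
                trials.append((float(resid @ resid), c))
        rss_new, c_new = min(trials)         # candidate with the best fit
        dof = n_obs - len(chosen) - 1
        f_stat = (rss - rss_new) * dof / max(rss_new, 1e-300)
        if f_stat < f_enter:                 # no significant improvement left
            break
        chosen.append(c_new)
        rss = rss_new
    coef = np.zeros(n_var)
    if chosen:
        coef[chosen] = np.linalg.lstsq(X[:, chosen], y, rcond=None)[0]
    return coef
```

Calling `forward_select(X, Y[:, j])` for each gauge j and reshaping the result with `coef.reshape(p, n)` yields row j of the matrices for lag 1 to lag p, with zeros in all positions that were not selected.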
Stepwise regression searches eq. (2) for the optimal combination of independent variables in the description of the response variable. Finally, a dependence is found that

(i) contains significant regression coefficients only
(ii) minimizes the variance of the deviations between observed and computed values of the response variable.

The remaining (significant) regression coefficients are inserted in the matrices $A_1, \dots, A_p$ of eq. (1), while all other matrix elements are assumed to be zero. In estimating the stochastic matrix B, the difference between the observed values of the response variable and the respective computed values is calculated for each time interval as

$$D(i) = x(i) - A_1\,x(i-1) - \dots - A_p\,x(i-p), \qquad i = p+1, \dots, m \tag{3}$$
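Provided that the vectors $\varepsilon(i)$ consist of (0,1)-normally distributed random numbers, the expected value of $B\,B^T$ can be derived from D. Since the differences are the stochastic component of eq. (1), $D = B\,E$, and the expected value of $E\,E^T$ is $(m-p)\,I$, so that

$$B\,B^T = \frac{1}{m-p}\,D\,D^T \tag{4}$$

with

$$E = [\varepsilon(1), \varepsilon(2), \dots, \varepsilon(m-p)]$$

and

$$D = [D(1), D(2), \dots, D(m-p)]$$

where
E, D: matrices whose columns are the m-p noise vectors and the m-p difference vectors, numbered consecutively
I: identity matrix
$B^T$, $D^T$: transposes of the matrices B and D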
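Any matrix B whose product $B\,B^T$ equals the sample covariance of the difference vectors reproduces that covariance in the generated series. As a minimal sketch, and assuming the relation $B\,B^T = D\,D^T/(m-p)$, a Cholesky factor is one such matrix. This is a stand-in chosen for brevity and is not the iterative procedure of Young and Pisano referred to in the text; the function name is hypothetical.

```python
import numpy as np

def noise_matrix_from_residuals(D):
    """D: array (n, m - p) holding one difference vector D(i) per column."""
    count = D.shape[1]
    S = (D @ D.T) / count          # sample estimate of B B^T
    # Lower-triangular factor with B @ B.T == S; requires S to be
    # positive definite, i.e. no gauge is an exact linear combination
    # of the others.
    return np.linalg.cholesky(S)
```

The factor is not unique: multiplying B from the right by any orthogonal matrix leaves $B\,B^T$ unchanged, which is why different procedures can return different but equivalent matrices.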
B can then easily be found using an iterative procedure described by G.K. Young and W.C. Pisano.

2.2 Data Transformation

In general, the observed flow series cannot be used in the calibration of parameters without statistical manipulation, as the suggested model requires stationary and normally distributed input data. A double data transformation approximately fulfils these requirements:
(i) To achieve normality the following simple mathematical functions were used

$$\mathit{nx} = x^{1/2} \tag{5}$$

$$\mathit{nx} = x^{1/3} \tag{6}$$

$$\mathit{nx} = \log(x) \tag{7}$$
A more complex function transforms the Pearson-Type III distribution into a normal distribution

$$\mathit{nx} = \frac{6}{cs}\left\{\left[\frac{cs}{2}\,K + 1\right]^{1/3} - 1\right\} + \frac{cs}{6} \tag{8}$$

where
cs: coefficient of skewness
K: standardized parameter of the Pearson-Type III distribution
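The Pearson-Type III transformation and its inverse can be sketched as follows, assuming that it has the Wilson-Hilferty form and that K is the standardized flow (x minus its mean, divided by its standard deviation). Both function names are hypothetical, and cs must be non-zero.

```python
import numpy as np

def pearson3_to_normal(K, cs):
    """Approximately normal variate from a standardized Pearson III variate K."""
    # np.cbrt returns the real cube root, also for negative arguments.
    return (6.0 / cs) * (np.cbrt(0.5 * cs * K + 1.0) - 1.0) + cs / 6.0

def normal_to_pearson3(nx, cs):
    """Inverse transformation, applied to synthetic values; note the cube."""
    return (2.0 / cs) * (((cs / 6.0) * (nx - cs / 6.0) + 1.0) ** 3 - 1.0)
```

The cube in the inverse function means that large synthetic normal values are stretched strongly when they are transformed back to runoff.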
(ii) The second transformation eliminates cyclical components by a seasonal standardization

$$\mathit{rx} = \frac{\mathit{nx} - \overline{\mathit{nx}}_z}{s_z} \tag{9}$$

where
nx: normalized runoff values
rx: cyclically standardized runoff values (residuals)
$\overline{\mathit{nx}}_z$, $s_z$: mean and standard deviation of nx in time interval z
z: cyclical index (z = 1, 2, ..., 52 for weeks)
The residuals rx can now be used as input data for the model. Consequently, the algorithm will produce synthetic residuals which obviously have to undergo the inverse transformations to (i) and (ii) to represent the desired artificial runoff series.
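The seasonal standardization (ii) and its inverse can be sketched as below. The sketch assumes that every year consists of exactly 52 weekly values and that the record covers whole years; the function names are hypothetical.

```python
import numpy as np

def seasonal_standardize(nx, period=52):
    """Remove the weekly mean and standard deviation from a normalized series."""
    table = np.asarray(nx, dtype=float).reshape(-1, period)  # rows: years
    mean_z = table.mean(axis=0)              # mean of nx in week z
    std_z = table.std(axis=0, ddof=1)        # standard deviation in week z
    rx = (table - mean_z) / std_z
    return rx.ravel(), mean_z, std_z

def seasonal_restore(rx, mean_z, std_z):
    """Inverse of the seasonal standardization, applied to synthetic residuals."""
    period = mean_z.size
    table = np.asarray(rx, dtype=float).reshape(-1, period)
    return (table * std_z + mean_z).ravel()
```

Synthetic residuals are first passed through `seasonal_restore` and then through the inverse of the normalizing function chosen in (i).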
3. APPLICATION OF THE MODEL

Fig. 1. Schema of the River Danube-River Main System

The model was tested for a complex hydrological and water resources system (Fig. 1). Water from the Danube watershed will be transferred to the River Main watershed
to improve its water balance. The system includes the Main-Danube-Canal, three reservoirs and various natural or artificial connecting ducts. To obtain optimal strategies for the long-term operation of the system, a computer routine has been developed on the basis of a weekly simulation. As input data, numerous 100-year synthetic series for 8 water gauges in the respective areas were required.

Following an extensive data analysis and pilot runs, a third-order autoregressive process was considered suitable for the generation of the synthetic series. Stepwise regression reduced the number of significant elements in the matrices $A_1$, $A_2$ and $A_3$ to approximately 40 percent of the maximum possible value, with a minimum of 14 percent for $A_3$. It should be noted that $A_3$ was basically a diagonal matrix, which indicates that spatial dependencies were negligible at lag 3 and higher in this specific case.

4. RESULTS

4.1 Transformations

The generating algorithm chooses the most suitable transformation function out of (5) - (8), with optimality being defined as

(i) the lowest possible skewness is observed in the transformed data
(ii) the null hypothesis of a normal distribution is not rejected at the 5 percent level.
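Criterion (i) can be sketched as a small selection routine that applies each candidate function to one weekly sample and keeps the one with the smallest absolute skewness. The sketch covers functions (5) to (7) only; the Pearson-Type III function (8) and the normality test of criterion (ii) are left out for brevity. All names are hypothetical, and the flows are assumed to be strictly positive.

```python
import numpy as np

def skewness(v):
    """Moment coefficient of skewness of a sample."""
    d = v - v.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

# Candidate normalizing functions, labelled by their equation numbers.
CANDIDATES = {
    "sqrt (5)": np.sqrt,
    "cube root (6)": np.cbrt,
    "log (7)": np.log,
}

def pick_transformation(x):
    """Return the label with the lowest |skewness| and all scores."""
    x = np.asarray(x, dtype=float)
    scores = {name: abs(skewness(f(x))) for name, f in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

In an application following the text, this routine would be called once per week and gauge, so that each weekly distribution receives its own function.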
When using just one transformation function for a complete runoff record, it was found that while some of the weekly series fulfilled the above criteria, others were significantly rejected. Hence, a special transformation was selected for each week at each station, with extremely satisfying results, as only 4 out of a total of 410 weekly distributions did not meet requirements (i) and (ii) simultaneously. In such cases condition (i) was considered to be sufficient.

Whereas normality of the residuals could be achieved in good approximation for most weeks, unrealistically high synthetic runoff values were observed in a few cases with the log and Pearson-Type III transformations. The result was an overestimation of the skewness of the artificial data. The explanation may be purely mathematical, as the inverse transformations contain $e^x$ and $x^3$. In general, a more sophisticated estimation (maximum likelihood) of the population parameters should be used if extreme values are of major importance. In addition, the skewness criterion may be responsible for the production of excessively high runoff values, as it is very sensitive to extreme values in the observed data. After an effort to cope with a single large observation, the inverse transformation can lead to a relatively high proportion of large synthetic values.

In the final version of the algorithm the log-transformation (7) was used in 64 percent of all cases. Second came Pearson-Type III (8) with approximately 30 percent, whereas functions (5) and (6) were selected in about 2 percent and 4 percent of the weeks, respectively.

4.2 Statistical Parameters
The statistical parameters mean, standard deviation and skewness were computed for both the generated residuals and the synthetic runoff. Constant comparison of these parameters clearly showed the effects of the chosen transformation and the quality of the model. Figures 2 and 3 present the results of a typical 100-year production run for the gauge 'Hüttendorf' with respect to means and standard deviations. Figure 4 compares the results for five 100-year series with one 500-year series. It should be noted that 100-year series still show fairly large sampling deviations which have to be considered in any application.
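The comparison of statistical parameters can be sketched as a function that returns mean, standard deviation and skewness for every week of the year; applying it to the historical record and to a synthetic series gives the curves that are compared. The function name and the assumption of 52 values per year are illustrative.

```python
import numpy as np

def weekly_statistics(flow, period=52):
    """Mean, standard deviation and skewness for each week of the year."""
    table = np.asarray(flow, dtype=float).reshape(-1, period)  # rows: years
    mean = table.mean(axis=0)
    std = table.std(axis=0, ddof=1)
    d = table - mean
    skew = (d ** 3).mean(axis=0) / (d ** 2).mean(axis=0) ** 1.5
    return mean, std, skew
```

The week-by-week differences between the two sets of statistics indicate where a transformation distorts the series.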
Fig. 2. Weekly means at gauge Hüttendorf / Regnitz (100-year simulation). Curves: historical record 1930-1977 and 100 synthetic years.

Fig. 3. Weekly standard deviations at gauge Hüttendorf / Regnitz (100-year simulation). Curves: historical record and 100 synthetic years.
Fig. 4. Weekly means at gauge Schweinfurt / Main (500-year simulation and extreme values out of five 100-year simulations). Curves: historical record 1930-1977, 500 synthetic years, and minimum and maximum values from five 100-year series.
Fig. 5. Auto- and crosscorrelation coefficients at gauge Hüttendorf / Regnitz for time lags 0 to 3 (100-year simulation). Panels: lag 0 to lag 3, each over the gauges KE, BE, EI, TR, NE, HU, PE and SF. Curves: historical record 1930-1977 and 100 synthetic years.
Correlation Structure

In Figure 5 the reproduction of the original correlation structure of the hydrological system is presented for time lags 0 to 3. There is some indication that correlation is overestimated for higher time lags. However, this can only be observed for the synthetic runoff, i.e. after inverse transformation, whereas the residuals reflect the correlation structure precisely and without systematic deviations. How the transformations affect correlation is still to be investigated.
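Correlograms of the kind described here rest on lagged correlation coefficients between one gauge and all gauges. A minimal sketch, with a hypothetical function name and a series stored as an array of shape (time, gauge):

```python
import numpy as np

def lag_correlation(x, target, lag):
    """Correlation of gauge `target` at time i with every gauge at time i - lag."""
    m, n = x.shape
    now = x[lag:, target]                  # values at time i
    out = np.empty(n)
    for k in range(n):
        past = x[:m - lag, k]              # values at time i - lag
        out[k] = np.corrcoef(now, past)[0, 1]
    return out
```

For `k == target` the result is the autocorrelation at the given lag; all other entries are crosscorrelations. Evaluating the function for the historical and the synthetic series, before and after the inverse transformation, allows the comparison described in the text.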
4.3 Negative Runoff

When sampling from a symmetrical distribution function with range $-\infty < x < +\infty$, negative values are likely to occur for long series with a probability depending on mean and standard deviation. The proportion of negative runoff produced for the Danube-Main-System was around 0.05 percent. Without significantly affecting the statistical properties of the series, these values were set to zero.

In multivariate simulation negative values of intermediate runoff can be observed, especially if runoff sequences possess high coefficients of variation together with almost identical means, as in the case of the gauges along the River Altmühl. The number of negative values generated totalled approximately 4 percent. It should be mentioned that negative intermediate runoff was found in the Altmühl observations, too. At all other gauges no such irregularities occurred.

5. CONCLUSION
A multivariate regression model for the generation of synthetic runoff sequences correlated in both time and space is described. Time dependence can be reproduced for any number of time lags required. In an application weekly runoff series were generated. For the necessary transformation of the observed data into a normal distribution various functions were tested, with log(x) and Pearson-Type III proving to be most suitable. The results with respect to the reproduction of statistical parameters such as means, standard deviations, auto- and crosscorrelation coefficients are shown for some typical sample sequences. It should be mentioned that the model has also been used for the generation of monthly series as well as sequences of decades, with satisfying results.

REFERENCES

Fiering, M.B.: Multivariate Technique for Synthetic Hydrology. Journal of the Hydraulics Division ASCE 90 (1964) No. HY 5, pp. 43-60.
Kindler, J., W. Zuberek: On some Multi-Site Multi-Season Streamflow Generation Models. International Institute for Applied System Analysis, RM-76-76, Laxenburg 1967.
Matalas, N.C.: Mathematical Assessment of Synthetic Hydrology. Water Resources Research 3 (1967) No. 4, pp. 937-945.
O'Connell, P.E.: Multivariate Synthetic Hydrology: A Correction. Journal of the Hydraulics Division ASCE 99 (1973) No. HY 12, pp. 2393-2396.
Pegram, G.G.S., W. James: Multilag Multivariate Autoregressive Model for the Generation of Operational Hydrology. Water Resources Research 8 (1972) No. 4, pp. 1074-1076.
Schneider, K., R. Harboe: Anwendung des Young-Pisano-Modells zur Erzeugung gleichzeitiger künstlicher Abflußreihen für verschiedene Stellen in einem Einzugsgebiet. Wasserwirtschaft 69 (1979) No. 7/8, pp. 219-225.
Schramm, M.: Zur mathematischen Darstellung und Simulation des natürlichen Durchflußprozesses. Acta Hydrophysica 19 (1975) No. 2/3, pp. 77-191.
Young, G.K.: Discussion of "Mathematical Assessment of Synthetic Hydrology" by N.C. Matalas. Water Resources Research 4 (1968) No. 3, pp. 681-682.
Young, G.K., W.C. Pisano: Operational Hydrology Using Residuals. Journal of the Hydraulics Division ASCE 94 (1968) No. HY 4, pp. 909-923.