Copyright © IFAC Identification and System Parameter Estimation 1982, Washington D.C., USA 1982
CONTEMPORARY STOCHASTIC APPROACH TO WATER RESOURCES SYSTEMS: THE ARMP AND FEATURE PREDICTION MODELS

O. Ibidapo-Obe

Faculty of Engineering, University of Lagos, Akoka, Lagos, Nigeria
Abstract. An autoregressive model with Markovian parameters (ARMP, or a.r.m.p.) and a feature prediction model (FPM) are developed in this paper. The ARMP is physically based and adaptive in its implementation, and thus takes into consideration the inherent periodicities in hydrological time series. The FPM is motivated by the current lack of mathematical hydrological models that are suitable and sufficiently comprehensive yet simple. It is based on pattern analysis: a system dynamic feature is identified from a priori data and can subsequently be used to simulate missing data (synthetic data generation) and to forecast future hydrological parameters. The ARMP and FPM provide efficient alternatives to some existing models which are not, in general, applicable to all classes of hydrological problems (perennial droughts and storm surges). A further advantage is the ability of the schemes to forecast in real time. A comparative analysis of the two techniques is undertaken using the discharge record from the River Nile at Aswan Dam for 1870-1945. It is further proposed that, in order to enhance the overall performance of the prediction scheme, the FPM may be used to supply the input (training data) to the ARMP.

Keywords. Adaptive systems, data generation, filtering, Kalman filters, Markov processes, modelling, on-line operation, prediction, stochastic systems, water resources.

INTRODUCTION

It is now generally accepted that the optimum operation of dams and several other hydraulic structures requires an efficient method for forecasting, in the short, medium and long-term ranges, the various inputs to those structures. These time series inputs, which are usually discharges, water levels, precipitation, etc., invariably have large components of uncertainty, thus necessitating the use of stochastic methods for the analysis of outputs (runoff, etc.).
A physically based autoregressive scheme (a.r.m.p.) is presented in this paper; the parameters of the scheme are assumed to be Markovian. The algorithm can be implemented on-line, thus allowing real-time forecasting; the simplicity of the scheme and its adaptive nature are clear advantages over some of the other existing models (Box and Jenkins (1970), Ivakhnenko's group method of data handling (1968), etc.). The a.r.m.p. requires training data, and the difficulty of obtaining a record of the required length motivates the use, as an alternative, of a feature prediction model based on the theory of pattern recognition. The model, which has been used in several studies related to control and communications (Tou and Gonzalez, 1974), has recently been applied to hydrological models. It provides a useful tool for data generation. The synopsis of the model given herein suggests some improvement that may incorporate the concept of entropy in hydrologic data generation as a means of improving forecasts. This paper compares the two schemes using the discharge data from the River Nile at Aswan.

THE AUTOREGRESSIVE MODEL WITH MARKOVIAN PARAMETERS

The autoregressive model with Markovian parameters (a.r.m.p.) is of the form
$$ y_t = \sum_{i=1}^{p} a_t^{(i)} y_{t-i} + u_t \tag{1} $$
where $\{ y_i,\ i = 0, 1, 2, \ldots, p \}$ are the inputs (precipitations, stream flows, etc.); the parameter $p$ represents the length of the training data, which is selected a priori, usually through the use of certain information criteria (AIC, spectral window, etc.); $a_t^{(i)}$ are the parameters of the autoregressive (AR) model; and the noise $u_t$ is assumed to be Gaussian with zero mean and variance $q_t^2$. Equation (1) is the phase-variable model (Bohlin, 1970) without the control and can be said to represent a Markov chain of order $p$. It is further assumed that the AR parameters $a_t^{(i)}$ satisfy a linear Markov model

$$ a_{t+1}^{(i)} = b_t^{(i)} a_t^{(i)} + v_t^{(i)}, \qquad i = 1, 2, \ldots, p \tag{2} $$
such that the $v_t^{(i)}$ are also stochastic zero-mean uncorrelated Gaussian processes and $b_t^{(i)}$ is a deterministic function of $t$. It should be remarked that a second-order hierarchy of parameter processes is obtained if $b_t^{(i)}$ is also Markovian. Equations (1) and (2) are now put in the state-space form

$$ A_{t+1} = b_t A_t + V_t \qquad \text{and} \qquad y_t = Y'_{t-1} A_t + u_t \tag{3} $$

such that

$$ A_t = (a_t^{(1)}, a_t^{(2)}, \ldots, a_t^{(p)})', \quad V_t = (v_t^{(1)}, v_t^{(2)}, \ldots, v_t^{(p)})', \quad Y_{t-1} = (y_{t-1}, y_{t-2}, \ldots, y_{t-p})', \quad b_t = \mathrm{diag}(b_t^{(1)}, b_t^{(2)}, \ldots, b_t^{(p)}) \tag{4} $$

where $(\ )'$ indicates the transpose of $(\ )$. The vector of zero-mean white Gaussian noises $V_t$ has covariance matrix $R_t$, and no component of $V_t$ is correlated with $u_t$.

Equation (3) now facilitates the use of the Kalman-Bucy (Gelb, 1974) estimation equations. The key problem is to obtain the steady-state solutions for the set of parameters $a_t^{(i)}$, $i = 1, 2, \ldots, p$, since hydrological time series exhibit some well-defined periodicity. The steady-state outputs for the parameter set yield the input for the prediction of the output $y_{t+1}$. The mathematical problem can now be stated as follows: minimize the function

$$ J = E\left[ (A_{t+1} - \hat{A}_{t+1})' (A_{t+1} - \hat{A}_{t+1}) \right] \tag{5} $$

where $\hat{A}_{t+1}$ is the estimate of $A_{t+1}$, subject to the constraint given in equation (3). The smoothing and prediction algorithms are given by the following recursive formulas:

$$ \hat{A}_{t+1|t} = b_t \hat{A}_{t|t} \tag{6} $$

with the error covariance

$$ P_{t+1|t} = b_t P_{t|t} b_t' + R_t \tag{7} $$

for the smoothing (state estimate extrapolation) and

$$ P_{t|t} = P_{t|t-1} - K_t Y'_{t-1} P_{t|t-1} \tag{8} $$

$$ K_t = P_{t|t-1} Y_{t-1} \left[ Y'_{t-1} P_{t|t-1} Y_{t-1} + q_t^2 \right]^{-1} \tag{9} $$

$$ \hat{A}_{t|t} = \hat{A}_{t|t-1} + K_t \left( y_t - Y'_{t-1} \hat{A}_{t|t-1} \right) \tag{10} $$

for the prediction (state estimate update algorithm). The one-step-ahead forecast for $y_t$ is therefore

$$ \hat{y}_t = Y'_{t-1} \hat{A}_{t|t-1} \tag{11} $$

The initial conditions associated with equations (6) to (11) above include an estimate of the vector of coefficients $A_t$ at time $t_0$; the covariance matrix $P_0$ of the initial state; the state noise covariance matrix $R_t$ at each time $t$; and the measurement noise variance $q_t^2$.

The a.r.m.p. algorithm is adaptive in nature and will be useful not only in synthetic data regeneration but also in the forecasting of flows and discharges.
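The recursion described in this section (forecast, gain, update, then extrapolation of the coefficient estimate and its error covariance) can be sketched for the scalar-observation model $y_t = Y'_{t-1} A_t + u_t$ as follows. This is a minimal illustration in plain Python, not the author's program: the function names `armp_step` and `armp_forecast` are hypothetical, the covariance matrix is assumed symmetric, `b` defaults to ones (random-walk parameters), and the "unit vector" starting value is assumed here to mean a vector of ones.

```python
def armp_step(A, P, Y, y, q2, R, b=None):
    """One filter cycle for y = Y'A + u with A_next = b*A + V.

    A  : list of p coefficient estimates (prior for this step)
    P  : p x p prior error covariance (list of lists, assumed symmetric)
    Y  : regressor [y_{t-1}, ..., y_{t-p}]
    y  : new observation y_t
    q2 : measurement-noise variance
    R  : p x p state-noise covariance
    b  : diagonal of b_t (assumed ones if omitted)
    Returns (A_next, P_next, y_hat), where y_hat is the forecast of y.
    """
    p = len(A)
    if b is None:
        b = [1.0] * p
    y_hat = sum(Y[i] * A[i] for i in range(p))            # one-step forecast
    PY = [sum(P[i][j] * Y[j] for j in range(p)) for i in range(p)]
    s = sum(Y[i] * PY[i] for i in range(p)) + q2           # innovation variance
    K = [PY[i] / s for i in range(p)]                      # gain vector
    # Update of the estimate and its covariance with the new observation.
    A_upd = [A[i] + K[i] * (y - y_hat) for i in range(p)]
    P_upd = [[P[i][j] - K[i] * PY[j] for j in range(p)] for i in range(p)]
    # Extrapolation to the next time step.
    A_next = [b[i] * A_upd[i] for i in range(p)]
    P_next = [[b[i] * P_upd[i][j] * b[j] + R[i][j] for j in range(p)]
              for i in range(p)]
    return A_next, P_next, y_hat


def armp_forecast(series, p, q2=1.0):
    """Run the filter over a series; forecasts start at index p."""
    A = [1.0] * p                       # assumed reading of "unit vector"
    eye = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    P, R = eye, eye                     # unit matrices for P_0 and R_t
    forecasts = []
    for t in range(p, len(series)):
        Y = [series[t - k] for k in range(1, p + 1)]
        A, P, y_hat = armp_step(A, P, Y, series[t], q2, R)
        forecasts.append(y_hat)
    return forecasts, A
```

With `b` equal to ones, the update and extrapolation of the covariance combine into the single recursion $P_{t+1} = P_t + R_t - P_t Y_{t-1} Y'_{t-1} P_t [Y'_{t-1} P_t Y_{t-1} + q_t^2]^{-1}$. Calling `armp_forecast(series, 5)` corresponds to a training length of five records.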
THE FEATURE PREDICTION MODEL

It is well known that hydrological time-series data exhibit random behaviour within a global periodic wave-form. This structure motivates the application of a model that is capable of predicting significant features of the data and thus identifying interventions. This is what is referred to as the feature prediction model; it is based on the principles of pattern recognition, which involve the analysis of information (data) in groups. The data set, which is assumed periodic with observations at regular discrete time intervals, is divided into suitable periodic time durations (usually one year or six months) called segments. The segments are further divided into objects, which may correspond to the different seasons in the segment; the observed measurements on the objects are characterised by an n-dimensional vector,

$$ \mathbf{y} = (y_1, y_2, \ldots, y_j, \ldots, y_n)' \tag{12} $$

known as the pattern vector. The pattern vector is further reduced to contain only those features that are akin to the object, thus obtaining a feature vector

$$ \mathbf{f} = (f_1, f_2, \ldots, f_j, \ldots, f_m); \qquad m \le n \tag{13} $$
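The partition of a record into segments, objects and pattern vectors can be sketched as below. This is only an illustration: the function name `pattern_vectors` is hypothetical, the objects are assumed to be consecutive blocks of equal length, and trailing observations that do not fill a whole object are simply dropped.

```python
def pattern_vectors(series, n_segments, objects_per_segment):
    """Split a series into segments, and each segment into objects.

    Returns a list of segments; each segment is a list of pattern vectors
    (one per object), each holding consecutive observations.
    Assumption of this sketch: equal-length objects, leftovers dropped.
    """
    n = len(series) // (n_segments * objects_per_segment)  # records per object
    if n == 0:
        raise ValueError("series too short for the requested partition")
    segments = []
    pos = 0
    for _ in range(n_segments):
        objects = []
        for _ in range(objects_per_segment):
            objects.append(list(series[pos:pos + n]))      # one pattern vector
            pos += n
        segments.append(objects)
    return segments
```

For example, a record of 75 values split into 5 segments of 3 objects gives pattern vectors of dimension n = 5, which is the partition quoted for the Nile data in the Applications section.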
Similar pattern vectors are grouped into classes so as to obtain $K \ll m$ ... classification space are known as reference vectors. Confidence bounds are usually obtained to delimit reference vectors. The feature prediction model for hydrological data is in three steps, viz. classification of patterns (pattern recognition system), analysis of intra- and inter-pattern structures, and development of a generation scheme for time series realizations. The pattern vector is transformed into a feature vector by the equation

$$ \mathbf{f} = A \mathbf{y} \tag{14} $$

where the columns of matrix $A$ are the normalized eigenvectors of the covariance matrix corresponding to the components of the pattern vectors. A feature vector in the kth pattern class is of the form

$$ \mathbf{f}_i^{(k)} = (f_{i1}^{(k)}, f_{i2}^{(k)}, \ldots, f_{ij}^{(k)}, \ldots, f_{im}^{(k)})', \qquad k = 1, 2, \ldots, K; \quad i = 1, 2, \ldots, n_k \tag{15} $$

For each class $k$, the jth components of all the feature vectors are divided into positive and negative values, for which the means $(+S_j^{(k)}, -S_j^{(k)})$ as well as the standard deviations $(\sigma_j^{(k)})$ are obtained. Either of the means may constitute the jth component $\gamma_{\ell j}^{(k)}$ of the $\ell$th reference vector

$$ \gamma_\ell^{(k)} = (\gamma_{\ell 1}^{(k)}, \gamma_{\ell 2}^{(k)}, \ldots, \gamma_{\ell j}^{(k)}, \ldots, \gamma_{\ell m}^{(k)})', \qquad \ell = 1, 2, \ldots, L_k \tag{16} $$

since there are $m$ components per reference vector and each can take either the positive or the negative mean value.

The feature vectors, and hence the stochastic time series, are assumed to be multivariate normally distributed around each of the reference vectors (intra) as follows:

$$ p(\mathbf{f}) = (2\pi)^{-m/2} |C_f|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{f} - \mu_f)' C_f^{-1} (\mathbf{f} - \mu_f) \right\} \tag{17} $$

where $\mu_f$ is the mean vector for each of the components of the reference vector and $C_f$ the covariance matrix of the feature vectors of each class. The occurrence of reference vectors in successive periodicities is assumed to be Markovian, so that the probabilities of transitions may be obtained. The association of feature vectors with reference vectors is realized by the use of the minimum distance concept (Sebestyen, 1962), such that a distance function between $\gamma_\ell^{(k)}$ and an arbitrary feature vector $\mathbf{f}$ is

$$ D_\ell^{(k)} = \left\| \mathbf{f} - \gamma_\ell^{(k)} \right\|^2 \tag{18} $$

and the feature vector is associated with the reference vector for which this distance is a minimum over $\ell$. A characteristic distance may be obtained, for example, as

$$ D^{(k)} = \frac{1}{L_k} \sum_{\ell=1}^{L_k} D_\ell^{(k)} \tag{19} $$

A data set may now be generated using the classified reference vectors; the reference vectors may be generated sequentially using the transition probability matrix, so that, associated with each period, related feature vectors are synthesized using equation (17) subject to the following constraints:

(i) ...

(ii) $f_{ij} \le \{ S_j^{\pm} \pm z_{\alpha/2}\, \sigma_j^{\pm} \}$

(iii) $\sum_{j=1}^{m} f_{ij} \le \{ S_f - z_{\alpha/2}\, \sigma_f \}$

where $z_{\alpha/2}$ corresponds to the significance level and is usually 1.96 (95% confidence level); $S_f$ and $\sigma_f$ are the mean and standard deviation associated with the distribution of the sum of all components of the feature vectors within the class of the associated reference vector. The feature vectors are transformed into pattern vectors using equation (14), thus producing a synthetic realization of the time series. This scheme thus has its value in data regeneration, and it may be used as a base for forecasting using the a.r.m.p.

APPLICATIONS

The a.r.m.p. and the feature prediction model were applied to the simulation of forecasts for the average annual discharge of the River Nile at Aswan. A shorter analysis with the a.r.m.p., using the discharge data from October 1, 1870 to September 30, 1945, has been presented earlier (Ibidapo-Obe, 1978). The entire data period has been partitioned into two components, viz. data records prior to 1902, when the Suez Canal was closed, and recorded discharges from 1902 to 1945. These sub-periods are referred to as the pre- and post-intervention epochs. The following assumptions were made in the use of the a.r.m.p. scheme: $A_{t_0}$ is a unit vector; $q_t^2$ is unity; $R_t$ and $P_{t_0}$ are unit matrices; and the length of the training data $p$ is 5 (five). The 1-step-ahead forecasts are from period 6 (six) onwards. Table 1 provides the actual annual and predicted discharges as well as a regression equation of the predicted on the actual. The forecast flows follow the actual flows very closely, with a cross-correlation coefficient of 0.79. The feature prediction model is applied to the same discharge data from the River Nile at Aswan. The data set was divided into 5 (five) segments with each segment containing
3 (three) objects. The individual pattern vectors that characterise the objects contain 5 (five) data records. The feature vectors are of dimension 3; this provides three pattern classes. Synthesized data are obtained from the reference vectors using a random number generator with a multivariate normal distribution. The results are included in Table 1 and Figure 1. The a.r.m.p. adapts more efficiently to the intervention than the FPM, hence the fairly large errors in the FPM forecast; the FPM would require a larger data set for more accurate forecasts. The upper limits of the confidence intervals have been taken in generating the feature vectors; these intervals have been used to delimit the range of random numbers that are acceptable. The prior- and post-intervention means are 3129.80 m³/s and 3042.50 m³/s, compared with the actual values of 3370.10 m³/s and 2620.40 m³/s respectively. The results are compared with the Box and Jenkins AR(4) model (Sinha and Prasad, 1979).
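The two inter-pattern steps of the feature prediction model, namely associating each feature vector with its nearest reference vector and estimating the Markov transition probabilities between reference vectors, can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the squared Euclidean distance is assumed as the distance function, and the transition probabilities are assumed to be estimated by simple counting.

```python
import random


def nearest_reference(f, references):
    """Index of the reference vector closest to feature vector f
    (squared Euclidean distance, an assumption of this sketch)."""
    def dist2(r):
        return sum((fi - ri) ** 2 for fi, ri in zip(f, r))
    return min(range(len(references)), key=lambda l: dist2(references[l]))


def transition_matrix(labels, n_states):
    """Row-normalised counts of transitions between successive labels.
    Rows for states that are never left stay all zero."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def generate_labels(matrix, start, length, rng):
    """Generate a sequence of reference-vector indices from the chain."""
    labels = [start]
    states = range(len(matrix))
    while len(labels) < length:
        row = matrix[labels[-1]]
        labels.append(rng.choices(states, weights=row)[0])
    return labels


# Usage with made-up feature vectors (illustrative values, not Nile data).
refs = [[0.0, 0.0], [10.0, 10.0]]
observed = [[1.0, 2.0], [9.0, 8.0], [0.5, -1.0], [11.0, 9.0], [8.0, 12.0]]
labels = [nearest_reference(f, refs) for f in observed]
T = transition_matrix(labels, len(refs))
synthetic = generate_labels(T, labels[0], 10, random.Random(0))
```

Feature vectors would then be drawn around each generated reference vector from the multivariate normal distribution, with draws outside the confidence bounds rejected; that sampling step is left out of this sketch.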
CONCLUSION

Two techniques have been proposed, for forecasting and for synthesizing hydrological data respectively. The a.r.m.p. is adaptive and hence may be implemented on-line; the feature prediction technique is novel and provides a method by which hydrological data may be generated satisfactorily. The two techniques may be implemented sequentially, with the feature prediction model using a priori data for synthesis and the a.r.m.p. taking the system feature as the training data needed to start off the prediction. It is expected that the application of these techniques to the analysis and synthesis of stream flows would yield very useful results in the design and operation of several hydraulic structures. The a.r.m.p. and the feature prediction techniques may be applied to other random processes in control, communication, physiological and other systems.

REFERENCES

1. Bohlin, T. (1970). Information pattern for linear discrete-time models with stochastic coefficients. IEEE Trans. Autom. Control (USA).
2. Box, G.E.P., and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day Inc., San Francisco.
3. Gelb, A. (1974). Applied Optimal Estimation. The M.I.T. Press, Cambridge, Mass.
4. Ibidapo-Obe, O. (1978). A new approach to stochastic data analysis: The Nile River at Aswan. Electronics Letters, 14, 765-767.
5. Ivakhnenko, A.G. (1968). The group method of data handling, a rival to the method of stochastic approximation. Soviet Automatic Control, 13, 43-55.
6. Panu, U.S., Prasad, T. and Unny, T.E. (1977). A feature prediction model for stochastic time series data. Proc. National Systems Conference, PSG College of Technology, India, E5-1-E5-6.
7. Sebestyen, G.S. (1962). Decision-Making Processes in Pattern Recognition. Macmillan Co., New York.
8. Sinha, N.K. and Prasad, T. (1979). Some stochastic modelling techniques and their applications. Applied Mathematical Modelling, 2-6.
9. Tou, J.T. and Gonzalez, R.C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, Mass.
[Figure 1: discharge plotted against year]
Table 1. Discharge Data for River Nile at Aswan (1870 - 1945)
(? = digit illegible in the source)

YEAR    ACTUAL      ARMP          FPM
(t)     (x1)        (x2)          (x3)        (x4)
1870    39??.041    (39??.041)    3229.566
1875    3817.606    (3817.606)    3039.475
1880    3076.770    22?6.962      3920.143
1885    ?9?3.424    3?1?.963      2948.697
1890    ?556.61?    3617.457      3731.498
1895    3657.50?    4329.433      3141.024
1900    2843.649    2589.490      3291.746
1905    26??.271    1921.269      3325.741
1910    2889.850    2470.057      2132.476    2680.220
1915    3035.199    2544.917      3446.854    2823.468
1920    2499.997    2952.204      3858.720    2269.967
1925    2494.980    2283.447      4102.326    2477.708
1930    2205.146    2268.225      2246.458    2666.855
1935    2902.773    2631.558      3420.197    2675.871
1940    1848.086    2453.082      4018.326    2407.764
1945    2211.126    2232.681      2411.496    2790.056
Sample Statistics ($\bar{x}$ is the mean and $\sigma$ is the unbiased standard deviation):

$\bar{x}_1 = 2913.377$; $\sigma_1 = 604.756$
$\bar{x}_2 = 2859.337$; $\sigma_2 = 731.267$
$\bar{x}_3 = 3266.546$; $\sigma_3 = 608.031$
$\bar{x}_4 = 2598.989$; $\sigma_4 = 194.034$

Trend Lines:

$x_1 = -107.579t + 3827.798$
$x_2 = -96.648t + 3680.842$
$x_3 = -8.690t + 3340.411$
$x_4 = 1.157t + 2593.782$
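The tabulated statistics can be cross-checked against the columns of Table 1 that are fully legible. The sketch below recomputes the mean and unbiased standard deviation of the FPM column (x3) and of the fourth column (x4), and evaluates the corresponding trend lines at the mean of t. It assumes that t indexes the rows as 1, 2, ..., n within each column, and it uses the fact that a least-squares line passes through the point of means; for x3 this agreement requires the slope 8.690 to be taken as negative.

```python
from statistics import mean, stdev

# FPM column (x3) of Table 1, 1870-1945 in five-year steps, as printed.
fpm = [3229.566, 3039.475, 3920.143, 2948.697, 3731.498, 3141.024,
       3291.746, 3325.741, 2132.476, 3446.854, 3858.720, 4102.326,
       2246.458, 3420.197, 4018.326, 2411.496]
# Fourth column (x4), available only for 1910-1945.
x4 = [2680.220, 2823.468, 2269.967, 2477.708,
      2666.855, 2675.871, 2407.764, 2790.056]


def summary(values):
    """Return (mean, unbiased standard deviation) of a sample."""
    return mean(values), stdev(values)   # stdev divides by n - 1


def trend_at_mean_t(slope, intercept, n):
    """Evaluate a trend line at the mean of t = 1..n (assumed indexing)."""
    return slope * (n + 1) / 2 + intercept


fpm_mean, fpm_sd = summary(fpm)
x4_mean, x4_sd = summary(x4)
print(round(fpm_mean, 3), round(fpm_sd, 3))
print(round(x4_mean, 3), round(x4_sd, 3))
print(round(trend_at_mean_t(-8.690, 3340.411, len(fpm)), 3))
print(round(trend_at_mean_t(1.157, 2593.782, len(x4)), 3))
```

The same check applied to the trend lines for x1 and x2, using the printed means and t = 1, ..., 16, is satisfied only when both slopes are negative.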