Copyright © IFAC Dynamic Modelling, Warsaw, Poland, 1980
ASSESSING THE QUALITY OF PREDICTION - BAYKAL: BAYESIAN PREDICTION BY KALMAN FILTERING

G. Paass

Institute for Planning and Decision Systems, Gesellschaft für Mathematik und Datenverarbeitung, St. Augustin, Federal Republic of Germany

Abstract. In Bayesian statistics for prediction a complete predictive density is calculated. This is an ideal means to assess the quality of prediction, as arbitrary measures of dispersion can be computed. The predictive density not only reflects the randomness of the residuals, but also takes into account the uncertainty of the parameter estimates. In this paper a method (BAYKAL) is presented which computes the predictive density by a Monte Carlo integration technique using the Kalman filter. BAYKAL is applicable to linear models with various features including simultaneous equations, observation errors, and correlated residuals. A priori information can be incorporated in a flexible way. Complete marginal densities, various parameters of these densities and measures of numerical error can be obtained. For three small models the BAYKAL predictions are compared with usual econometric predictions. The results indicate that the dispersion of the BAYKAL predictions is always larger than the dispersion of the corresponding econometric predictions, which neglect the uncertainty of the parameters.

Keywords. Bayes methods; Kalman filters; numerical methods; prediction; simultaneous econometric models; parameter estimation.
INTRODUCTION

The prediction of future values of economic variables is one main application of economic models. But as the decision maker does not know the exact structure of the real world and has only a small amount of data, the predicted values can never be exact.
In classical econometrics prediction is usually done by a two-step procedure: first the unknown parameters are estimated, and then the future values are simulated with the estimated parameters. If one takes the exogenous variables and the functional form of the model as given, there is one source of uncertainty that is neglected in this procedure, namely the uncertainty of the parameters, which can be estimated only with limited accuracy. The derivation of the exact sampling distribution of the common econometric estimators in the case of autoregressive or simultaneous models seems to be possible only for very small models (Basman, 1974). Therefore for larger models it seems to be extremely difficult to assess the uncertainty of the predicted values.

As an alternative the Bayesian theory offers a unified and theoretically well-founded approach to prediction, which incorporates the uncertainty in the parameters. Unlike classical statistics, which interprets probability in terms of repeated experiments, Bayesian theory considers probability as a subjective measure of belief. Then statistical inference can be understood as correction of the prior subjective probability over the parameters by objective data. It has to be made clear that the parameters needn't be regarded as a random result of a real experiment. Bayesian decision theory merely argues that people who wish to decide consistently in uncertain situations should act as if the parameters were a random variable with a certain distribution function (Rothenberg, 1973, p. 138). This view is by no means arbitrary, but can be derived from a few simple axioms of rational decision (Lindley, 1971).

According to the concept of the Bayesian theory, for prediction the probability density p(z|y=y(T)) over the future variables z conditional on the observed data y(T) is derived. It reflects all information contained in the subjective prior density and the objective data. The dispersion of this density can be interpreted in terms of the uncertainty of the prediction, while the expectation, the mode or the median give 'average' future values. To calculate this density, a multiple integral has to be evaluated. This can be done analytically only for very simple cases. For simultaneous econometric models therefore numerical prediction procedures seem to be useful. Van Dijk and Kloek (1977) for the first time suggested a multiperiod prediction method.
[Fig. 1: Estimated density of the variance a4]
They extended a suggestion of Chow (1973) and expressed the predictive density by higher order moments of the posterior density of the parameters. These moments can be described by multiple integrals, which are partly solved analytically and partly by a Monte Carlo integration technique. In order to determine the expected value and the variance of the predictive density k steps ahead, moments up to order 2k have to be calculated. As there are no estimates of numerical accuracy and as there is a great computational effort, van Dijk and Kloek limit their approach to short term predictions.

In this paper the predictive density is computed directly, without reference to the moments of the posterior density of the parameters. Certain conditional densities are obtained using the Kalman filter, while the rest of the multiple integral is evaluated by a Monte Carlo technique. This method is called 'Bayesian Prediction by Kalman Filtering' (BAYKAL) and its advantages are:

- It is applicable to a large class of models.
- The whole predictive density as well as marginal densities can be obtained.
- Moments and percentiles of the whole density as well as of the marginal densities can be calculated.
- The accuracy of the above values can be estimated.
- The prior density can be chosen freely.
- The distribution, percentiles and marginal posterior densities of the parameters are calculated as a by-product.

After this introduction, in the next section the class of models is described for which BAYKAL is applicable. It is assumed that T observations y(T) := (y_1, ..., y_T) are given. The following two sections give formulas for posterior and predictive densities and show that the likelihood function and certain conditional densities can be obtained from the Kalman filter. Then the Monte Carlo integration procedure is described. The last but one section illustrates the flexibility of BAYKAL by means of the prediction of three small models. Finally, in the last section some conclusions are made concerning BAYKAL and several extensions are proposed.
[Fig. 2: True prediction for model 1]
DESCRIPTION OF THE MODEL

Let z_t be a k-vector of endogenous variables and u_t an m-vector of deterministic exogenous variables. z is assumed to appear with l lags in the model. These lagged variables form a k·l vector x_t := (z_t', ..., z_{t-l+1}')', which for t=0 is assumed to be normally distributed N(E(x_0), Var(x_0)). Let the model consist of the equation system

(1)  A z_t = B x_{t-1} + C u_t + w_t

with a k×k matrix A, a k×(k·l) matrix B and a k×m matrix C. A is assumed to be nonsingular. The 'system noise' w_t follows a normal distribution N(0,Q) with Q ≥ 0 and E(w_t w_τ') = 0 for t ≠ τ. It is supposed that the model variables z_t can't be observed directly, but only by an intermediate 'observation model'

(2)  y_t = H x_t + v_t

with a vector y_t of observable variables. The observations are assumed to be disturbed by an observation noise v_t that is normally distributed N(0,R) with E(v_t v_τ') = 0 for t ≠ τ. Neither v_t nor w_t nor x_0 may be correlated with each other. Without modification all the derivations of the paper can be extended to time-varying matrices A, B, C, Q and R. The cases of no observation noise (R = 0) and missing observations (R^{-1} = 0) can be handled as limiting cases.

Let a be the s-vector of unknown parameters. It consists of elements of the matrices A, B, C, Q, R as well as of E(x_0) and Var(x_0).

(1) can be transformed to the reduced form

(3)  z_t = A^{-1}B x_{t-1} + A^{-1}C u_t + A^{-1}w_t .

From this one gets the state space form

(4)  x_t = F x_{t-1} + D u_t + G w_t   with   F = [A^{-1}B ; I, 0],  D = [A^{-1}C ; 0],  G = [A^{-1} ; 0],

where the semicolons separate block rows; the lower block rows of F merely shift the lagged z-vectors.

According to the philosophy of Bayesian statistics it is necessary to supply a prior density p(a) for the unknown parameters.
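As an illustration of (3) and (4) (not part of the original paper), the companion-form matrices F, D and G can be assembled mechanically from the structural matrices; the following sketch assumes the state x_t stacks z_t, ..., z_{t-l+1} and uses Python/numpy, with an illustrative function name:

```python
import numpy as np

def state_space_form(A, B, C):
    """Assemble F, D, G of the state space form (4) from (1).

    A: (k, k) nonsingular, B: (k, k*l), C: (k, m); the state x_t
    stacks the current z-vector and its l-1 predecessors.
    """
    k = A.shape[0]
    kl = B.shape[1]                      # state dimension k*l
    m = C.shape[1]
    Ainv = np.linalg.inv(A)
    # top block row is the reduced form (3); the rows below only
    # shift the lagged z-vectors one position down
    shift = np.hstack([np.eye(kl - k), np.zeros((kl - k, k))])
    F = np.vstack([Ainv @ B, shift])
    D = np.vstack([Ainv @ C, np.zeros((kl - k, m))])
    G = np.vstack([Ainv, np.zeros((kl - k, k))])
    return F, D, G
```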
[Fig. 3: MAP prediction for model 1]
This has been one main point of controversy about Bayesian statistics, as the choice of p(a) obviously affects the results of the prediction procedure. Fortunately, for a large or medium number of observations even rather drastic changes in the prior distribution will not affect the result significantly, as the information in the data predominates. In addition, 'noninformative' priors have been derived that have a minimal 'influence' upon the results (cf. Zellner, 1977). In BAYKAL priors can be specified in two ways. First, one can define any informative or noninformative prior in terms of the parameters a. In addition, the value of any function of the parameters, for instance eigenvalues, reduced form parameters and multipliers, can be restricted to freely chosen intervals. So if for example the decision maker knows from theoretical considerations that the model is stable, he can restrict the maximal absolute eigenvalue to be smaller than unity, as in the sketch below.
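Such a stability restriction amounts to discarding every parameter draw whose transition matrix has a spectral radius of one or more; a minimal sketch (illustrative helper name, not from the paper):

```python
import numpy as np

def is_stable(F):
    """Stability restriction of the prior: accept a drawn parameter
    vector only if the maximal absolute eigenvalue of its transition
    matrix F is smaller than unity."""
    return np.max(np.abs(np.linalg.eigvals(F))) < 1.0

# draws violating the restriction are simply rejected and redrawn
```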
[Fig. 4: BAYKAL prediction for model 1]
FORMULAS FOR PREDICTIVE DENSITIES

Let a be the s-vector of unknown parameters, y(T) := (y_1, ..., y_T) the vector of observed variables, z(T+1,T+N) := (z_{T+1}, ..., z_{T+N}) the vector of future model variables and u(T+N) := (u_1, ..., u_{T+N}) the vector of exogenous variables up to time T+N. If there is no confusion about the time ranges, y(T), z(T+1,T+N) and u(T+N) are abbreviated to y, z+ and u respectively. Now the predictive density is defined as p(z+|y); the exogenous variables don't appear in this density, as they are assumed to be known in advance. By the definition of conditional probability, p(z+|y) is almost everywhere equivalent to

  p(z+,y)/p(y) = ∫ p(z+,y,a) da / p(y)
               = ∫ [p(z+|y,a) p(y,a)/p(y)] da

(5)            = ∫ p(z+|y,a) p(a|y) da .

p(a|y) can be decomposed according to the Bayes theorem:

(6)  p(a|y) = p(y|a) p(a) / ∫ p(y|a) p(a) da .

In the later sections it is shown that p(z+|y,a) can be calculated with the Kalman filter if the values of a and y are given. In this case p(z+|y,a) is normally distributed with expectation E(z+|y,a) and variance Var(z+|y,a). Let z_n be a scalar component of z+. Then the marginal density p(z_n|y,a) is normal again, and the expectation E(z_n|y,a) and the variance Var(z_n|y,a) are the n-th element of E(z+|y,a) and of the diagonal of Var(z+|y,a) respectively. So analogously to (5) the marginal predictive density is calculated as

(7)  p(z_n|y) = ∫ p(z_n|y,a) p(a|y) da .

Obviously the joint densities of several elements of z+ can be obtained in the same way. From these formulas various parameters of the predictive density can be derived:

(8)  E(z_n|y) = ∫ z_n p(z_n|y) dz_n
             = ∫∫ z_n p(z_n|y,a) p(a|y) da dz_n
             = ∫ [∫ z_n p(z_n|y,a) dz_n] p(a|y) da
             = ∫ E(z_n|y,a) p(a|y) da ,

if the expectation exists. If the variance exists, it can be decomposed in the following way:

  Var(z_n|y) = ∫ z_n^2 p(z_n|y) dz_n - E(z_n|y)^2 .

The first term can be transformed to

  ∫∫ z_n^2 p(z_n|y,a) p(a|y) da dz_n = ∫ [Var(z_n|y,a) + E(z_n|y,a)^2] p(a|y) da .

Using (8) one gets

(9)  Var(z_n|y) = ∫ [Var(z_n|y,a) + E(z_n|y,a)^2] p(a|y) da - [∫ E(z_n|y,a) p(a|y) da]^2 .

In a similar way formulas can be derived for several components of z+. The marginal density p(a_n|y) of the n-th component of a is described by the integral

(10)  p(a_n|y) = ∫ p(a|y) da_1 ... da_{n-1} da_{n+1} ... da_s ,

so the marginal distribution function can be written as

(11)  F_n(b) = ∫_{-∞}^{b} ∫ ... ∫ p(a|y) da_1 ... da_n ... da_s .

Expectation and variance are expressed as

(12)  E(a_n|y) = ∫ a_n p(a|y) da   and

(13)  Var(a_n|y) = ∫ a_n^2 p(a|y) da - E(a_n|y)^2 .
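Equations (8) and (9) are the mixture formulas (law of total expectation and variance). Given the conditional moments for a set of parameter values and their normalized posterior weights, they can be evaluated as in this sketch (illustrative names, Python/numpy assumed):

```python
import numpy as np

def mixture_moments(cond_mean, cond_var, weights):
    """Predictive mean (8) and variance (9) of z_n as moments of the
    normal mixture over parameter draws a_i.

    cond_mean[i] = E(z_n | y, a_i), cond_var[i] = Var(z_n | y, a_i),
    weights[i]   = posterior weight of draw a_i (need not be normalized).
    """
    cm = np.asarray(cond_mean)
    cv = np.asarray(cond_var)
    w = np.asarray(weights) / np.sum(weights)
    mean = np.sum(w * cm)                            # (8)
    var = np.sum(w * (cv + cm**2)) - mean**2         # (9)
    return mean, var
```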
[Fig. 5: BAYKAL prediction for model 1 (stable)]
[Fig. 6: True prediction for model 2]
CALCULATION OF CONDITIONAL DENSITIES BY THE KALMAN FILTER

Let us assume now that the value of the parameter variable a is known. This needn't be the true value of a in the real system but can be any value that is admissible with respect to the prior density. By means of probability theory it can be shown that conditioning then again results in a density, which can be handled in the same way. From the chain rule one gets

(14)  p(y(T)|a) = p(y_T|a, y(T-1)) p(y(T-1)|a) = p(y_T|a, y(T-1)) ··· p(y_2|a, y(1)) p(y_1|a) .

By the Kalman filter theory it follows that all these conditional densities are normally distributed. Therefore they are completely characterized by the expectation E(y_t|y(t-1),a) and the variance Var(y_t|y(t-1),a). The latter quantities can be calculated recursively with the Kalman filter using the state space representation (4) (Jazwinski, 1970, p. 201f). First the variables are predicted one step ahead:

(15a)  E(x_{t+1}|y(t),a) = F E(x_t|y(t),a) + D u_{t+1}
(15b)  Var(x_{t+1}|y(t),a) = F Var(x_t|y(t),a) F' + GQG'
(16a)  E(y_{t+1}|y(t),a) = H E(x_{t+1}|y(t),a)
(16b)  Var(y_{t+1}|y(t),a) = H Var(x_{t+1}|y(t),a) H' + R

Now the observation y_{t+1} is included:

(17a)  E(x_{t+1}|y(t+1),a) = E(x_{t+1}|y(t),a) + K_{t+1} [y_{t+1} - E(y_{t+1}|y(t),a)]
(17b)  Var(x_{t+1}|y(t+1),a) = Var(x_{t+1}|y(t),a) - K_{t+1} H Var(x_{t+1}|y(t),a)

The gain matrix is defined as

(18)  K_{t+1} = Var(x_{t+1}|y(t),a) H' Var(y_{t+1}|y(t),a)^{-1} .

The case of perfect observations can be handled by assuming part or all of R to be 0 (Schweppe, 1973, p. 165f). If there is no observation at all for time t+1, one gets

(19a)  E(x_{t+1}|y(t+1),a) = E(x_{t+1}|y(t),a)
(19b)  Var(x_{t+1}|y(t+1),a) = Var(x_{t+1}|y(t),a)

and only (15) is needed. Obviously these formulas include the case of prediction.

Now for given values of a and y one is able to calculate all the conditional densities (in terms of the characterizing first and second moments) of the preceding section: p(y|a) = p(y(T)|a) can be decomposed by the chain rule (14) into conditional densities, which can be computed iteratively by the Kalman filter and (16). p(z_n|y,a) is a marginal density of p(x_{t+j}|y,a), which can be obtained by the Kalman filter and the prediction formulas (15) and (19).
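A direct transcription of (15)-(18) into code could look as follows (an illustrative sketch, not the original implementation; Python/numpy assumed):

```python
import numpy as np

def kf_predict(x, P, F, D, G, Q, u_next):
    """One-step prediction (15): moments of x_{t+1} given y(t)."""
    x_pred = F @ x + D @ u_next
    P_pred = F @ P @ F.T + G @ Q @ G.T
    return x_pred, P_pred

def kf_update(x_pred, P_pred, H, R, y_obs):
    """Measurement update (16)-(18); also returns the predictive
    moments of y_{t+1}, whose normal density is one factor of the
    likelihood decomposition (14)."""
    y_mean = H @ x_pred                       # (16a)
    S = H @ P_pred @ H.T + R                  # (16b)
    K = P_pred @ H.T @ np.linalg.inv(S)       # (18)
    x_filt = x_pred + K @ (y_obs - y_mean)    # (17a)
    P_filt = P_pred - K @ H @ P_pred          # (17b)
    return x_filt, P_filt, y_mean, S
```

Iterating kf_predict and kf_update over t = 1, ..., T and multiplying the normal densities N(y_t; y_mean, S) yields p(y|a) via (14); skipping kf_update for periods without observations implements (19).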
MONTE CARLO INTEGRATION

Let q(a) be a density such that q(a) = 0 implies p(a|y) = 0 (absolute continuity) and define T(a) := p(y|a) p(a) / q(a). Then (6) and (8) result in

(20)  E(z_n|y) = ∫ E(z_n|y,a) p(y|a) p(a) da / ∫ p(y|a) p(a) da
              = ∫ E(z_n|y,a) T(a) q(a) da / ∫ T(a) q(a) da .

Applying the law of large numbers, the first integral is estimated unbiasedly by

(21)  Ê_0 := (1/m) Σ_{i=1}^{m} E(z_n|y,a_i) T(a_i) ,

where the a_i are m independent random drawings from the distribution with density q(a). An unbiased estimate for the second integral is given by

(22)  ĉ := (1/m) Σ_{i=1}^{m} T(a_i) .

The ratio

(23)  Ê(z_n|y) := Ê_0 / ĉ

is an unbiased estimate of E(z_n|y) if two independent sets of random drawings are used to calculate (21) and (22). Because of the theorem of Slutsky, Ê(z_n|y) is a consistent estimate. As the use of two independent sets of random drawings is much less efficient than the use of the same a_i in both formulas, the latter method is employed. The resulting procedure for the prediction of E(z_n|y) is intuitively appealing: a number of parameters a_1, ..., a_m is chosen randomly from the parameter space and the conditional expectation E(z_n|y,a_i) is calculated for each a_i. The actual prediction Ê(z_n|y) is obtained as a weighted sum of the conditional expectations E(z_n|y,a_i). The weighting factors are the 'probabilities' (density values) that the observed data has been generated by a model with parameter a_i; ĉ is merely a normalizing factor.
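A compact sketch of the estimator (21)-(23) (illustrative names; the log-scale rescaling is a standard numerical safeguard, not from the paper):

```python
import numpy as np

def baykal_expectation(cond_mean, log_post_kernel, log_q):
    """Ratio estimator (21)-(23) for E(z_n | y).

    cond_mean[i]:       E(z_n | y, a_i) from the Kalman filter
    log_post_kernel[i]: log p(y | a_i) + log p(a_i)
    log_q[i]:           log q(a_i)
    The same draws a_i are used in numerator and denominator,
    as recommended in the text.
    """
    log_T = np.asarray(log_post_kernel) - np.asarray(log_q)
    T = np.exp(log_T - np.max(log_T))   # rescaled T(a_i); scale cancels in the ratio
    E0_hat = np.mean(T * np.asarray(cond_mean))   # (21), up to the common scale
    c_hat = np.mean(T)                            # (22)
    return E0_hat / c_hat                         # (23)
```

With weights w_i = T(a_i) / Σ_j T(a_j), the same quantities also serve as the mixture weights needed for (8), (9) and (28).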
[Fig. 7: MAP prediction for model 2]

[Fig. 8: BAYKAL prediction for model 2]
The variances of (21) and (22) are

(24a)  Var(Ê_0) = (1/m) Var_q[E(z_n|y,a) T(a)]
(24b)  Var(ĉ) = (1/m) Var_q[T(a)] .

The variance of the ratio estimator (23) can be approximated by

(25)  Var(Ê(z_n|y)) ≈ (1/ĉ^2) [Var(Ê_0) + Var(ĉ) Ê_0^2/ĉ^2 - 2 Cov(Ê_0,ĉ) Ê_0/ĉ]

(Kendall and Stuart, 1969, p. 232). Because of (24) the standard deviation as a measure of accuracy goes to zero with 1/√m. For a sufficiently large dimension of a (say ≥ 5), Monte Carlo integration algorithms converge more rapidly than any nonprobabilistic formula (Haber, 1970, p. 515).

To reduce the variances Var(Ê_0) and Var(ĉ) one can choose q(a) in a way that T(a) is nearly constant (importance sampling). Analytical results for the functional form in the case of noninformative priors (Zellner, 1971, p. 270-275) suggest the choice of multivariate normal, Student or Wishart densities for q(a). In BAYKAL, normal and Student densities can be used, whose central tendencies and dispersions are numerically approximated to the corresponding parameters of p(y|a) p(a):

- The modal value a_0 is used as parameter of the central tendency. It is obtained by maximizing p(y|a) p(a) with hill-climbing algorithms and can be interpreted as the 'maximum a posteriori' (MAP) estimate.
- The dispersion is described by the inverted matrix of second order derivatives, evaluated at the mode. The derivatives are approximated by finite differences.

The bias of the ratio estimator (23) can be reduced by several techniques. In BAYKAL a 'jackknife' method has been employed (Kleijnen, 1974, p. 240) that removes the first order bias.
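One way to obtain the normal importance density q(a) described above is sketched below (illustrative; scipy's BFGS optimizer stands in for the hill-climbing algorithm, and its built-in inverse-Hessian approximation for the finite-difference second derivatives):

```python
import numpy as np
from scipy.optimize import minimize

def build_normal_proposal(neg_log_posterior_kernel, a_start):
    """Centre q(a) at the MAP estimate a_0 and take its dispersion
    from the inverted matrix of second derivatives at the mode.

    neg_log_posterior_kernel(a) = -log[ p(y|a) p(a) ].
    """
    res = minimize(neg_log_posterior_kernel, a_start, method="BFGS")
    a_map = res.x            # 'maximum a posteriori' (MAP) estimate
    cov = res.hess_inv       # approximate inverse Hessian at the mode
    return a_map, cov

# drawing from q(a):
# a_i = np.random.default_rng().multivariate_normal(a_map, cov)
```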
In the same way as described above, the integrals related with the computation of the predictive density p(z+|y), (5)-(9), as well as with the posterior density of the parameters, (10)-(13), can be evaluated.

At last a procedure for the calculation of the percentiles of p(z_n|y) shall be given. The d-percentile of p(z_n|y) is the number C_d with

(26)  ∫_{-∞}^{C_d} p(z_n|y) dz_n = d/100 .

Applying (5) and (6) one gets

(27)  d/100 = ∫ [∫_{-∞}^{C_d} p(z_n|y,a) dz_n] p(a|y) da .

For given a and y, the inner integral is the value of the normal distribution function of N(E(z_n|y,a), Var(z_n|y,a)). Using Monte Carlo integration, C_d can be approximated by Ĉ_d with

(28)  d/100 = (1/m) Σ_{i=1}^{m} [∫_{-∞}^{Ĉ_d} p(z_n|y,a_i) dz_n] T(a_i) / ĉ .

Ĉ_d can be computed by solving (28) employing an iterative Newton technique.
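In code, (28) is a one-dimensional root-finding problem for Ĉ_d; the sketch below uses a bracketing solver instead of the Newton iteration of the paper (illustrative names, scipy assumed):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def baykal_percentile(d, cond_mean, cond_std, weights):
    """Solve (28) for the d-percentile C_d of p(z_n | y): the weighted
    mixture CDF, evaluated through the normal distribution function,
    must equal d/100."""
    cm, cs = np.asarray(cond_mean), np.asarray(cond_std)
    w = np.asarray(weights) / np.sum(weights)

    def cdf_minus_target(c):
        return np.sum(w * norm.cdf(c, loc=cm, scale=cs)) - d / 100.0

    lo = np.min(cm - 10 * cs)    # bracket wide enough to contain the root
    hi = np.max(cm + 10 * cs)
    return brentq(cdf_minus_target, lo, hi)
```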
NUMERICAL ILLUSTRATION

To demonstrate the BAYKAL method, predictions for three small models were calculated. The true parameters and the starting values were assumed to be known, and were used to generate the data of each model for 20 time periods. Then three types of prediction were calculated 7 periods ahead:

- the 'true' prediction using the true parameters. The dispersion of the prediction, resulting from the noise terms, was computed by the Kalman filter (15)-(19);
- the MAP prediction using the MAP estimates of the parameters. It corresponds to the common econometric prediction procedure. Once again the dispersion, resulting from the noise terms, was computed by the Kalman filter;
- the BAYKAL prediction using 800 drawings (model 2: 400) of the parameters according to q(a). q(a) was chosen as a multivariate normal density.

The central tendency of the computed densities was expressed by the expectation E, while the dispersion (Disp) was computed as half the difference between the 15.1% and the 84.1% percentile. For the normal distribution this is the standard deviation. The numerical accuracy was measured as half the length of the 98% confidence interval of the estimator for E (later indicated by C98).
Model 1: exact observations

The first model was a univariate autoregressive equation

(29)  z_t = a1 z_{t-1} + a2 z_{t-2} + a3 + w_t

with w_t distributed as N(0, a4). As it was assumed that little was known a priori, a noninformative prior was chosen:

(30)  p(a1, ..., a4) ∝ 1/√a4 ,

where ∝ means 'proportional to'. The true parameters as well as the MAP and the BAYKAL parameter estimates (expectations of the posterior distribution) are shown in Table 1.
Table 1: Parameters of Model 1

                 a1       a2       a3       a4
true             1.250    -.270    1.00     1.50
MAP mode         1.248    -.268    0.900    1.95
BAYKAL  E        1.259    -.277    0.933    2.58
        Disp     0.208    0.210    0.180    0.721
        C98      0.035    0.038    0.139    0.138
The true, the MAP and the BAYKAL predictions are plotted in Fig. 2 - Fig. 4. For the first prediction period the dispersions are nearly equal, while for the last period the dispersion of the BAYKAL prediction is 66% greater than the dispersion of the MAP prediction. This partly seems to be caused by the posterior distribution of a4, which is heavily skewed to the right (Fig. 1). While the MAP prediction neglects this skewness, the BAYKAL prediction takes into account the whole posterior density. The numerical accuracy for the last period was C98 = 1.36.

The maximal eigenvalue of the model is 0.97, and therefore the model is stable. If the stability is incorporated into the prior density by rejecting all drawn parameter vectors which correspond to an absolute eigenvalue ≥ 1, the BAYKAL prediction changes drastically (Fig. 5). Especially the dispersion is reduced.
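For orientation, the data-generating step of this experiment can be reproduced along the following lines (a sketch; the starting values and the seed are illustrative assumptions, as the paper does not report them):

```python
import numpy as np

rng = np.random.default_rng(0)                 # illustrative seed
a1, a2, a3, a4 = 1.250, -0.270, 1.00, 1.50     # true parameters of (29)

z = [20.0, 20.0]                               # assumed starting values
for t in range(20):                            # 20 observation periods
    z.append(a1 * z[-1] + a2 * z[-2] + a3 + rng.normal(0.0, np.sqrt(a4)))
# z[2:] are the generated observations; the three types of prediction
# are then computed 7 periods ahead
```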
Model 2: observation errors

To take observation errors into account, model 1 was augmented by a noise model:

(31)  y_t = z_t + v_t

with v_t distributed as N(0, a5). The true parameters a1-a4 were taken from model 1 and a5 was set to 0.5. It was assumed that only little was known a priori:

(32)  p(a1, ..., a5) ∝ 1/√(a4 a5)

(Zellner, 1971, p. 132). The starting values were assumed to be additional observations and their variances were set to a5. The true, the MAP and the BAYKAL predictions are shown in Fig. 6 - Fig. 8. The deviations between the MAP and the BAYKAL prediction again can be partly explained by the skewness of p(a4) and p(a5). The numerical accuracy for the last period was C98 = 0.87.
Model 3: econometric model

In the last run an econometric model with three equations (Kloek and van Dijk, 1978, p. 7) was used:

(33a)  C_t = a1 + a2 Y_t + w_{1,t}
(33b)  I_t = a4 + a5 Y_t + a6 I_{t-1} + w_{2,t}
(33c)  Y_t = C_t + I_t + Z_t

where C_t denotes consumers' expenditures, Y_t total expenditure, I_t investment, and Z_t exogenous expenditure. w_{1,t} and w_{2,t} are normally distributed N(0, a3) and N(0, a7) and were assumed to be uncorrelated. As before a noninformative prior was chosen:

(34)  p(a1, ..., a7) ∝ |Q^{-1}|^{(k+1)/2} = (a3 a7)^{-3/2}

(Zellner, 1971, p. 272). The true parameters were set to a1 = 10, a2 = 0.432, a3 = 0.4, a4 = 20., a5 = 0.079, a6 = 0.401, a7 = 1. The predictions for the last period are shown in Table 2.

Table 2: Predictions for Model 3 (last period)

                 C_t                       I_t
           E        Disp     C98     E        Disp     C98
true       59.79    1.56     -       ...      ...      -
MAP        60.03    1.97     -       48.59    1.53     -
BAYKAL     ...      ...      0.29    48.76    ...      ...
The fact that there are only moderate differences seems to be caused by the small variances a3 and a7. Hence the uncertainty in the parameter estimates is only small.

CONCLUSIONS AND EXTENSIONS

In this paper a numerical method (BAYKAL) for the calculation of the Bayesian predictive density has been derived. It uses the Kalman filter and Monte Carlo integration techniques and contains measures of numerical accuracy. By increasing the number of drawings, any level of accuracy can be reached. In addition to the whole predictive density, various parameters of dispersion can be obtained. Therefore the results of BAYKAL not only establish an optimal prediction (in the Bayesian sense), but also can be used to assess the quality of prediction. The predictive density not only reflects the uncertainty of the noise terms (residuals), but also the uncertainty in the parameter estimates, and incorporates all correlations and higher order interactions of the parameter estimates.
The numerical examples show that the dispersion of the predictive density is always larger than the corresponding dispersion which can be derived for usual predictions.
To reduce the computational effort, especially for larger models, it seems to be profitable to introduce new approximate densities q(a) that have a better fit to the posterior density. Nevertheless BAYKAL seems to be applicable to medium size models. Finally it should be mentioned that, starting from the Kalman filter theory, BAYKAL can be easily extended to new types of models: System Dynamics models (no observation at specified periods), modelling of exogenous variables (observation errors, feedback, prediction of exogenous variables), and approximation of nonlinear models (extended Kalman filter).

REFERENCES

Basman, R.L. (1974). Exact finite sample distributions for some econometric estimators and test statistics: a survey and appraisal. In M.D. Intriligator and D.A. Kendrick (Eds.), Frontiers of Quantitative Econometrics, Vol. II. North-Holland, Amsterdam. pp. 209-271.
Chow, G.C. (1973). Multiperiod predictions from stochastic difference equations by Bayesian methods. Econometrica, 41, pp. 109-118.
van Dijk, H.K., and Kloek, T. (1977). Predictive moments of simultaneous econometric models - a Bayesian approach. In M. Aykac and C. Brumat (Eds.), New Developments in the Application of Bayesian Methods. North-Holland, Amsterdam. pp. 243-260.
Haber, S. (1970). Numerical evaluation of multiple integrals. SIAM Review, 12, pp. 481-526.
Jazwinski, A.H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York.
Kendall, M.G., and Stuart, A. (1969). The Advanced Theory of Statistics, Vol. 1. Griffin, London.
Kleijnen, J.P.C. (1974). Statistical Techniques in Simulation, Part I. Marcel Dekker, New York.
Lindley, D.V. (1971). Bayesian Statistics - a Review. SIAM, Philadelphia.
Rothenberg, T.J. (1973). Efficient Estimation with A Priori Information. Yale University Press, New Haven.
Schweppe, F.C. (1973). Uncertain Dynamic Systems. Prentice Hall, Englewood Cliffs.
Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. Wiley, New York.
Zellner, A. (1977). Maximal data information prior distributions. In M. Aykac and C. Brumat (Eds.), New Developments in the Application of Bayesian Methods. North-Holland, Amsterdam. pp. 211-232.