A Priori Analysis of Allowable Interval between Measurements as a Test of Model Validity

Donald A. Jameson
Department of Range Science
Colorado State University
Fort Collins, Colorado 80524
Transmitted by John Casti
ABSTRACT

The variance of predicted states can be computed by a Kalman filter for linear models, or by a linear approximation in a neighborhood for nonlinear models. Using these procedures, variances can be projected forward in time until allowable probabilities for statistical errors are exceeded. At that time the model is no longer useful as a predictive device, and a measurement of the system is required. The allowable interval between measurements is suggested as an indicator of model validity. Several statistical procedures for testing the suitability of models as predictive devices are discussed. A useful test must indicate that the probability of model rejection increases as the projected variance increases. In usual applications of model testing the process variance is not known, and an adaptive calculation of process variance is required in order to calculate the allowable interval between measurements. A new algorithm for calculating process variance is reported and compared with previously reported algorithms.
APPLIED MATHEMATICS AND COMPUTATION 17:93-105 (1985)
© Elsevier Science Publishing Co., Inc., 1985, 52 Vanderbilt Ave., New York, NY 10017. 0096-3003/85/$03.30

INTRODUCTION

Simulation models are constructed for a variety of reasons and tested in a variety of fashions, but among all the reasons and the methods of testing there appears to be a tacit agreement that such models should be useful in making statements about the uncertainty of events. The kind of simulation models addressed here are those which use knowledge about the processes of some system to "predict," or at least to express probabilities of, conditions of the states of the system. Since most model outputs are compared with measured
states of the system a posteriori, little attention has been given in the past to statements of future uncertainty. As a general rule, if the estimated states calculated by the model are within the confidence limits of the measured states of the system, the model is accepted. The usual methods of testing model adequacy have been well described by Shannon [13]; a more recent review of the literature is provided by Balci [1]. Confidence limits based on a t-test constitute a common method for univariate models, while Hotelling's T²-test has been applied for multivariate models. The "predictive" ability of the model is often assumed if it passes the a posteriori tests.
PREDICTION ABILITY OF LINEAR MODELS

As a first example, consider a simple linear-quadratic-Gaussian (LQG) model. In such models, the system dynamics are linear, the cost function (i.e. the penalty for being wrong) is quadratic, and the statistics are Gaussian (i.e. the errors are normally distributed). Models of this type are commonly described in the control and estimation literature (e.g. [10, 5, 8]). The system under study, in the univariate case, is defined by
x_{k+1} = z_k x_k + u_k + w_k,    w_k is N(0, q_k),    (1)

where x_k is the "true" state of the system, z_k is the state transition multiplier, u_k is a control or driving variable, and w_k is a random process error which is assumed to be normally distributed with mean zero and known variance q_k. The subscript k denotes time. Since the exact value of w_k is unknown but its mean value is zero, the system can be modeled as

x̂_{k+1/k} = z_k x̂_{k/k} + u_k,    (2)
where x̂_{k/k} is an unbiased estimate of the system state. The subscript k+1/k indicates that x̂_{k+1/k} is a prediction of the state of the system at time k+1 based on knowledge at time k. The subscript k/k indicates that the estimate x̂ for time k has been updated at time k. The control or driving variable u_k is assumed to be known without error. Measurements of the system states are indicated by

y_k = x_k + v_k,    v_k is N(0, r),    (3)

where y_k is a measurement of the state which includes the measurement error
v_k. Thus, the variance r can be estimated from sampling, but the variance q_k can be determined only indirectly. The variance of the model estimates x̂_{k/k} is designated as p_{k/k}, and its projection is shown by

p_{k+1/k} = z² p_{k/k} + q_k.    (4)
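In code, the projection of Equation (4) is a one-line recurrence. The following Python sketch (not from the paper; the numerical values and the variance ceiling are illustrative) iterates the projection and counts the prediction steps allowed before the variance exceeds that ceiling:

```python
# Sketch of the scalar variance projection of Equation (4), iterated over
# successive steps.  z, q, and the ceiling p_max are illustrative values.

def project_variance(p, z, q, m):
    """Iterate p <- z^2 * p + q for m steps (the projection of Equation (4))."""
    for _ in range(m):
        p = z * z * p + q
    return p

def allowable_steps(p, z, q, p_max):
    """Number of prediction steps before the projected variance exceeds p_max."""
    m = 0
    while True:
        p = z * z * p + q
        if p > p_max:
            return m
        m += 1

print(project_variance(50.0, 1.0, 100.0, 2))      # two-step projection
print(allowable_steps(50.0, 1.0, 100.0, 500.0))   # steps before p exceeds 500
```

With z = 1 the variance simply accumulates q at each step, so the allowable number of steps is easy to verify by hand.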
The variance q_k is as defined in Equation (1). The variance projection from Equation (4) is used in the Kalman update equations (Appendix). For the LQG model, the variance p of the estimates x̂ can be projected forward by iteration, so that

p_{k+2/k} = z²(z² p_{k/k} + q_k) + q_{k+1},    (5)
and similarly can be projected for m time steps. The use of LQG models not only allows this straightforward projection of the variance p, but also allows some additional calculations. For evaluation purposes, the calculation of p_{k+m/k} can be done separately for each variable to calculate the number of time steps (m) allowed between measurements even in multivariable models [12]. As long as z and q remain constant, the resultant m will also remain constant [14, 7]. For models with time-varying z or q, the allowable number of prediction steps could be calculated over time to show changes, if any, in the predictive usefulness of the model.

PREDICTION ABILITY OF NONLINEAR MODELS
Basically, any model which is used as a predictive model can be evaluated by comparison with data to determine the adequacy of the model to predict one time step ahead, if assumptions of quadratic costs and Gaussian or normally distributed errors are met. However, the projection of variances using a procedure similar to that of Equation (4) applies only to linear models. Thus, for nonlinear models an alternative procedure is required for calculating variance estimates for future time steps when no intermediate measurements are made. Nonlinear filter theory is considerably more complex than linear filter theory. The use of linearized nonlinear models and extended Kalman filters has been unsatisfactory in many applications. Nevertheless, a linear approximation adequate for assessing model predictive ability can be developed [4]. The assumptions of this method are that a nonlinear model can be separated into a linear and a nonlinear component,

x_{k+1} = z'_k x_k + f(x_k, k),    (6)
where z'_k is a calculated linear component and f(x_k, k) is a residual nonlinear component. It is also assumed that in a neighborhood of a point in time the linear component dominates the nonlinear component. One might suppose that the value z'_k could be calculated by a linear regression from a sequence of points in a neighborhood,

x̂_{k+1} = b_0 + b_1 x̂_k.    (7)
The coefficient b_1 in the regression equation would be taken as the value z' for use in the variance projection. However, this procedure can yield negative values of b_1, which would lead to reversals in the direction of change of x̂ at each time step. The ratio estimate

z' = x̂_{k+1} / x̂_k    (8)

gives more acceptable results, since z' in this case is always positive (if the values of x̂ are nonnegative) and tends to be stable. At each time step, a new value of z'_k is computed. For model testing, the computation of Equation (8) should include some smoothing, either through the use of a fading memory smoother or by using data over the last several time steps, to compute an appropriate z'.
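The ratio estimate of Equation (8) combined with a fading memory smoother might be sketched as below. The 5/6 weighting mirrors the smoother the paper later applies to q; its use for z' here is an illustrative assumption, as are the sample values:

```python
# Sketch of the ratio estimate of Equation (8) with fading-memory smoothing.

def ratio_z(x_prev, x_curr):
    """z' = x_{k+1} / x_k: positive whenever the estimates are positive."""
    return x_curr / x_prev

def smooth(value, prev, weight=5.0):
    """Fading-memory smoother: new = (value + weight * prev) / (weight + 1)."""
    return (value + weight * prev) / (weight + 1.0)

# A short sequence of slowly growing state estimates (illustrative data).
z_smoothed = 1.0
for x_prev, x_curr in [(10.0, 10.5), (10.5, 10.9), (10.9, 11.1)]:
    z_smoothed = smooth(ratio_z(x_prev, x_curr), z_smoothed)

print(round(z_smoothed, 4))
```

Because each new ratio receives only 1/6 of the weight, the smoothed z' responds gradually rather than jumping with every step.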
SUITABILITY OF STATISTICAL TESTS
If a model is tested as a predictive device, the increasing uncertainty of future events should mean that the model would be less likely to pass a validity test in the distant future than in the near future. Thus, an appropriate model test should provide for increasing chances of failure as the model results are projected into the future. Statistical errors are of two types: Type I is the probability of rejecting the null hypothesis when it is true, and Type II is the probability of accepting the null hypothesis when it is false [6]. These error types must be considered in selecting a statistical procedure for model testing. Most null hypotheses are statements of “sameness.” If the statement of “sameness” implies a good model, then the Type I error is a statement of the probability of rejection of a good model; the corresponding Type II error is a statement of the probability of acceptance of a poor model. On the other hand, if the null hypothesis implies a poor model, then it is the Type I error that addresses the rejection of a poor model. Since the common statistical tests address Type I error, and
Type II tests are generally more difficult, it seems appropriate to form a null hypothesis which states that the model is poor. Most tests of model fit involve an analysis of the differences y_k − x̂_k, where y_k is a measurement of some state of the system at time k and x̂_k is an estimate of the state of the system at the same time. These differences are often called "residuals," or in some literature "innovations." If y_k is determined by sampling and x̂_k lies within the confidence interval of the sample, x̂_k is accepted as being a good model result. Thus, if the variance of the sample is large, it is easier to have a good model result by this standard than if the variance of the sample is small. It may be tempting to use the usual t-test for model predictive ability, substituting the square root of the projected variance for the sampling error in the equation for the t-value,

t = d / √(p_{k+m/k}),

where d is a difference between two means. In a test of model predictive ability it might be assumed appropriate to compare values of t as the model results are projected forward. However, only the variance is projected forward. The difference y − x̂ is not projected, because these differences depend on making measurements y. In addition, an optimal estimation procedure minimizes (y − x̂)², thus assuring that this value will be small. As the projected variance increases, the t-value will decrease, indicating a "better" model in the future than at present. The t-test applied in this fashion would not be a suitable indicator of the allowable interval between measurements. The variance s of the residual sequence y − x̂ is equal to p_{k+1/k} + r [10]. Thus, p_{k+1/k} = s − r, and the ratio

F = s / r    (9)
is similar to the "lack of fit" test for regression models [9, 16]. The ratio F can be compared with a standard F-test of basic statistics, and the statistical significance can be compared with the values in an F-table. With successive application of (5) the variance p_{k+m/k} can be calculated for m time steps into the future. The test of model predictive ability can be determined by examining

F_m = (p_{k+m/k} + r) / r    (11)
at each time step. When the value of F_m indicates a significant difference, the model is no longer appropriate for making predictions; i.e., the model should be used for predictions without measurements only for m − 1 steps. Clearly a model that fails the "lack of fit" test should not be considered a "good" model. However, the "lack of fit" test has a serious drawback in model testing, as it indicates the probability of rejecting a good model rather than the probability of accepting a poor model. If the acceptable probability is changed from 0.05 to 0.01, then the allowable F-value is increased, and the lack of fit test would indicate that the model is more suitable for prediction over several time steps. The variance ratio of Equation (9) is a useful test, but if used should be interpreted for the probability of Type II error rather than the probability of Type I error. Both the t-test and the F-test just described are tests for determining the probability of Type I error when the null hypothesis is that of a "good" model. In either of these tests, the complementary test for Type II error indicates the probability of accepting a poor model. As the probability of Type I error (α) decreases, the probability of Type II error (β) simultaneously increases; several Type II tests are described by Cohen [3]. As p_{k+m/k} is projected forward by Equations (4) and (5), the resultant p_{k+m/k} increases and β increases. When the probability β of a Type II error reaches an unsatisfactory level, the model should be rejected. In contrast to the F- and t-tests described above, the statistical tests for regression and correlation have as null hypothesis a statement describing a "poor" model. In model testing, the null hypothesis would be that there is no relationship between the modeled and measured results. Thus, the test for Type I error is a statement of the probability of rejection of a poor model.
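The F-ratio test of the projected variance can be sketched as a projection loop: project p forward, form (p + r)/r at each step, and stop when the ratio crosses a critical value. The critical F-value and the numerical inputs below are illustrative stand-ins, not tabled entries:

```python
# Sketch of the lack-of-fit style test: count prediction steps before the
# ratio F_m = (p_{k+m/k} + r) / r exceeds a chosen critical F-value.

def steps_until_significant(p, z, q, r, f_critical):
    """Project the variance forward (as in Equations (4)-(5)) and return the
    number of steps for which F_m stays at or below f_critical."""
    m = 0
    while True:
        p = z * z * p + q            # one projection step
        if (p + r) / r > f_critical:
            return m
        m += 1

print(steps_until_significant(p=50.0, z=1.0, q=100.0, r=200.0, f_critical=3.0))
```

Raising the critical value (e.g. moving from the 0.05 to the 0.01 level) lengthens the interval the test allows, which is exactly the drawback discussed above.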
However, the correlation and regression tests require knowledge of the covariance of y and x̂, and the variance projection techniques do not address the question of this covariance. Another way to state a null hypothesis of a poor model is to state a maximum allowable or least significant difference (l.s.d.). The relationship of Equation (11) is restated so that

l.s.d. = t √(p_{k+m/k} + r).    (12)

Knowledge of the system is indicated by p_{k+m/k}, and as the variance is projected forward, p_{k+m/k} increases and the l.s.d. increases. The null hypothesis is that the l.s.d. is less than the allowable value. As p_{k+m/k} grows, the l.s.d. increases to the allowable value and the null hypothesis is rejected. This test is straightforward and well known, and seems to be entirely suitable for tests of model goodness. Based on the results of Pimental [12] discussed
earlier, the l.s.d. test can be applied to single variables of a multivariable model.
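The l.s.d. test can be sketched the same way, assuming the form l.s.d. = t √(p_{k+m/k} + r); the t multiplier and the allowable difference below are illustrative values, not quantities from the paper:

```python
# Sketch of the least-significant-difference test: project the variance
# forward and count the steps before the l.s.d. reaches the allowable value.

import math

def lsd(p_projected, r, t=2.0):
    """Assumed form: l.s.d. = t * sqrt(p_{k+m/k} + r)."""
    return t * math.sqrt(p_projected + r)

def steps_until_lsd_exceeded(p, z, q, r, allowable, t=2.0):
    """Number of prediction steps for which the l.s.d. stays within bounds."""
    m = 0
    while True:
        p = z * z * p + q            # projection, as in Equations (4)-(5)
        if lsd(p, r, t) > allowable:
            return m
        m += 1

print(steps_until_lsd_exceeded(p=50.0, z=1.0, q=100.0, r=200.0, allowable=50.0))
```

Unlike the F-ratio interpretation, here a growing l.s.d. works directly against the model: the null hypothesis of a poor model is rejected only while the l.s.d. stays below the allowable difference.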
MODELS WITH UNKNOWN PARAMETERS

In the usual applications, values of y can be measured, and r and ȳ can be calculated from repeated measurements of y at a given time. The variance s can be calculated from two or more differences y − x̂ and can be updated at each time step. A fixed memory calculation over the last n time steps (n = 2, 3, ..., 10) can be used to compute the variance s of a series of (y − x̂)-values. In most applications the value of q cannot be determined directly. In keeping with this "real life" constraint, a value q_k needs to be computed at each time step. A common equation reported in the literature calculates q as

q = s − z²p − r.    (13)
However, this method is questionable [2] and often gives negative values of q. In initial numerical tests, the results of Equation (13) were so unsatisfactory that use of this equation was discontinued. An alternative method (developed in the Appendix) calculates q by

q = z²(s − r)²/s + (s − r)(1 − z²).    (14)
This method seldom gives negative values; if negative values do result, they can be replaced by

q = (s − r)²/s,    (15)
which is never negative. Equations (14) and (15) are derived from the Kalman filter equations at convergence, at which time p_{k/k} = p_{k+1/k+1}; this method will therefore be designated the p∞ method. The value determined for q at each time step should be smoothed. In numerical tests a fading memory smoother was implemented as follows: (1) retrieve the last value q̄_{k−1} from storage, and (2) compute a smoothed value for q̄_k by q̄_k = (q + 5 q̄_{k−1})/6. An alternative to explicit computation of q by equations such as (13), (14), or (15) has been described by Soeda and Yoshimura [15]: in this method an adaptive weight is applied to p_{k+1/k} so that values of q are not critical.
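A scalar sketch of the adaptive estimate of Equations (14) and (15), together with the fading memory smoother, is given below; the numerical values are illustrative:

```python
# Sketch of the adaptive process-variance estimate, Equations (14)-(15),
# with the fading-memory smoother described in the text.

def adaptive_q(s, r, z):
    """q by Equation (14); fall back to Equation (15) if the result is negative."""
    q = z * z * (s - r) ** 2 / s + (s - r) * (1.0 - z * z)   # (14)
    if q < 0.0:
        q = (s - r) ** 2 / s                                 # (15), never negative
    return q

def smooth_q(q, q_prev):
    """Fading memory smoother: q_k = (q + 5 q_{k-1}) / 6."""
    return (q + 5.0 * q_prev) / 6.0

# With z = 1 the second term of (14) vanishes and (14) coincides with (15).
print(round(adaptive_q(300.0, 200.0, 1.0), 3))
```

Note that whenever s < r and z is small, Equation (14) goes negative and the fallback of Equation (15) takes over, since (s − r)²/s cannot be negative for positive s.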
A NUMERICAL EXAMPLE
To demonstrate the application of these concepts to a simple LQG model, a sine wave model was used to generate sample values over an interval of 24π radians with 720 time steps. For evaluation purposes, a new value of z' was computed by Equation (10) at each time step. In a strictly linear model, z' could be assumed known. The dynamics of the "true" state of the system was computed as shown in Equation (1), the "model estimates" as shown in Equation (2), and the "measurements" as shown in Equation (3). The estimates x̂_k and the variance p_{k/k} were updated at each time step using the usual Kalman update procedure [see Appendix, Equations (A4) to (A6)] or the Soeda-Yoshimura [15] procedure. Values of w and v were both generated from N(0,100), using values from a pseudorandom number generator converted to a normal distribution by a method described by Naylor et al. [11]. The variance r was computed from the values of v_k, and the variance q_k was calculated at each time step by Equations (14) and (15). The Soeda-Yoshimura filter was compared with the procedure of Equations (14) and (15) in this numerical example. The adaptive procedures were tested by examining the variance of the innovation sequence y − x̂, by comparing the adaptively computed variance q with the known variance of w, and by calculating the autocorrelation of successive values of y − x̂. At each time step the projected value of the least significant difference was compared with an allowable difference. The value of √(p_{k+m/k} + r) was entered in Equation (12) to determine if the l.s.d. had exceeded an acceptable value. The number of steps in which the model passed the l.s.d. test was taken as the number of time steps for which the model could be assumed to be an adequate predictor without additional data being taken. Results for these tests are shown in Table 1.
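A condensed sketch of such an experiment is given below. The seed, step count, and noise variances are illustrative, and the sketch fixes z' at 1 rather than computing it adaptively, so it does not reproduce the paper's exact configuration:

```python
# Sketch of a sine-wave LQG experiment: noisy "true" dynamics, noisy
# measurements, and a scalar Kalman update at each step.

import math
import random

random.seed(1)

r = 100.0            # measurement variance (estimable from sampling)
q_true = 100.0       # process variance used to generate the data
n_steps = 720
amplitude = 100.0

x_hat = 0.0          # model estimate of the state
p = r                # variance of the estimate (arbitrary starting value)

for k in range(1, n_steps):
    theta = 4.0 * math.pi * k / n_steps
    # "true" dynamics: deterministic drive plus process noise (Equation 1)
    x = amplitude * math.sin(theta) + random.gauss(0.0, math.sqrt(q_true))
    # measurement with error (Equation 3)
    y = x + random.gauss(0.0, math.sqrt(r))

    # variance projection (Equation 4) with z' fixed at 1 for simplicity
    p = p + q_true
    # scalar Kalman update (Appendix A4, A6, A7 with M = I)
    g = p / (p + r)
    x_hat = x_hat + g * (y - x_hat)
    p = (1.0 - g) * p

# With these constants the update variance converges to 50(sqrt(5) - 1).
print(round(p, 2))
```

Because the variance recursion does not depend on the data, p converges to the same fixed point regardless of the random seed; only x̂ is affected by the noise.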
The Soeda-Yoshimura filter performed very well, resulting in values for the average waiting period between measurements, the variance of the innovation sequence, and the autocorrelation of the innovation sequence which were very similar to the values computed when the known error q was used in the update equations. The Soeda-Yoshimura filter also had the advantage that the assumed values of q were not critical. Thus the Soeda-Yoshimura filter produced entirely satisfactory results when evaluated strictly as a filter. On the other hand, the Soeda-Yoshimura filter does not yield values of the process error, which are necessary for the variance projection of Equations (4) and (5). The p∞ adaptive filter of Equations (14) and (15), suitably smoothed, does give estimated values of q which are comparable to the known value of process error used to generate the data. However, the p∞ filter has a higher
TABLE 1
RELATIONSHIP OF ADAPTIVE RESULTS TO KNOWN ERROR RESULTS

Neighborhood  Filter method                      Average interval      Innovation  Estimated  t-value for autocorrelation
length                                           between measurements  variance    error q    of innovation sequence

4             p∞ filter                          2.6                   254         495        1.20
4             Modified Soeda-Yoshimura filter    2.6                   248         114        0.01
4             Known error q                      2.6                   248         101        0.71

5             p∞ filter                          3.9                   254         147        0.03
5             Modified Soeda-Yoshimura filter    3.1                   248         105        0.24
5             Known error q                      3.1                   248         101        0.73

6             p∞ filter                          4.9                   259         127        0.73
6             Modified Soeda-Yoshimura filter    3.4                   248         103        0.32
6             Known error q                      3.3                   248         101        0.70

10            p∞ filter                          7.4                   256         114        1.74
10            Modified Soeda-Yoshimura filter    3.9                   249         100        0.62
10            Known error q                      3.9                   248         101        0.72
autocorrelation of the innovation sequence and is sensitive to the neighborhood length of the fixed memory used to calculate the variance of the innovation sequence. This procedure also results in more erratic variance estimates than does the Soeda-Yoshimura filter. The best overall results were obtained when the Soeda-Yoshimura filter was modified with an appropriate q for the variance projection computed from Equations (14) and (15).
CONCLUSION

The results indicate the utility of adaptive filter techniques to determine the predictive ability of either linear or nonlinear models. A test for model validity was developed which analyzes the ability of the model results to suitably replace measurements over several time steps. The number of time steps that can elapse before the model results fail prescribed statistical tests is a useful index of model validity. Variance projection techniques using a
modified Soeda-Yoshimura filter appropriately show the increase in uncertainty of model results as projections are made further into the future.
APPENDIX
Linear systems are usually described in the literature as

x_{k+1} = Z x_k + u_k + w_k,    w is N(0, Q),    (A1)
where x represents an n × 1 state vector of the system at times k+1 and k, Z is an n × n state transition matrix, u_k is an n × 1 control vector, and w_k is an n × 1 vector of random process errors. The vector w_k is assumed to be normally distributed with mean 0 and covariance Q. Measurements of the system are described as

y_k = x_k + v_k,    v is N(0, R),    (A2)
where y_k represents measurements of the system which contain measurement errors v_k, and v is normally distributed with mean 0 and covariance R. Estimates of the predicted state of the system x are calculated as

x_{k+1/k} = Z x_{k/k} + u_k    (A3)

and the estimates are updated by

x_{k+1/k+1} = G_k y_k + (I − G_k) x_{k+1/k},    (A4)
where G_k is the gain matrix as calculated in (A6) and I is the identity matrix. The covariance matrices used in linear or "Kalman" filters are commonly presented as

P_{k+1/k} = Z P_{k/k} Zᵀ + Q,    (A5)

G_k = P_{k+1/k} Mᵀ (M P_{k+1/k} Mᵀ + R)⁻¹,    (A6)

P_{k+1/k+1} = (I − G_k M) P_{k+1/k},    (A7)
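One projection and update cycle of Equations (A5) to (A7), with M = I as assumed below, can be sketched in matrix form; the numerical values are illustrative:

```python
# Sketch of one cycle of the covariance equations (A5)-(A7) with M = I.

import numpy as np

Z = np.array([[1.0, 0.1],
              [0.0, 1.0]])          # state transition matrix
Q = np.diag([10.0, 10.0])           # process covariance
R = np.diag([100.0, 100.0])         # measurement covariance
P = np.diag([50.0, 50.0])           # updated covariance P_{k/k}

P_pred = Z @ P @ Z.T + Q                    # (A5) projection
G = P_pred @ np.linalg.inv(P_pred + R)      # (A6) gain, with M = I
P_new = (np.eye(2) - G) @ P_pred            # (A7) update

# The measurement update can only shrink the projected covariance.
print(bool(np.all(np.diag(P_new) <= np.diag(P_pred))))
```

The quantity P_pred + R appearing in the gain is the innovation covariance S discussed in the text.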
where the matrix P_{k/k} is the n × n updated covariance matrix of system error, P_{k+1/k} is the n × n projected covariance matrix of system error, M is an r × n measurement matrix, G_k is the n × r gain matrix, and I is an n × n identity matrix. The matrices Z, Q, and R are as previously described. Less frequently mentioned is the fact that M P_{k+1/k} Mᵀ + R is the covariance matrix of the innovation or residual sequence y − x̂ and is often designated as S. For simplicity only those cases where M = I and n = r will be addressed here. An initial estimate of P_{0/0} may be arbitrarily chosen. The user is left to find Q and R by some independent calculation. However, S and R are the only covariance matrices which can be directly calculated from data. The fact that Q is not calculated directly leads to the need for an adaptive filter which may be designated as "Q-adaptive." The "covariance matching" technique [2] makes use of the relationship

S = Z P_{k/k} Zᵀ + Q + R,    (A8)

which gives

Q = S − Z P_{k/k} Zᵀ − R    (A9)
as an estimate of Q. However, the result of (A9) may not lead to convergence [2] and may not be positive definite. In these cases negative diagonal elements of Q may be set to 0, which unfortunately expresses prediction with no error at times when R approaches S. Thus, the performance of adaptive filters using computation of Q by covariance matching is unsatisfactory when R is large relative to S, but may be satisfactory when R is small compared to S. As an alternative, consider the system error update equation. The process variance Q may be found from Equations (A5), (A6), and (A7) given some P. The smallest possible P is at convergence, when P_{k+1/k+1} = P_{k/k}; therefore, this value of P will yield the largest (most conservative) value of Q. At convergence, with M = I, Equation (A7) can be rewritten

P = [I − (Z P Zᵀ + Q) S⁻¹][Z P Zᵀ + Q].    (A10)
From (A8),

Z P Zᵀ + Q = S − R    (A11)
and

Z P Zᵀ = S − Q − R,    (A12)

P = Z⁻¹(S − Q − R)(Zᵀ)⁻¹.    (A13)

Combining (A11) and (A13), (A10) can be further rewritten as

Z⁻¹(S − Q − R)(Zᵀ)⁻¹ = [I − (S − R)S⁻¹][S − R],    (A14)
which simplifies to

Q = Z(S − R)S⁻¹(S − R)Zᵀ + (S − R) − Z(S − R)Zᵀ.    (A15)
If Z = I, Equation (A15) reduces to

Q = (S − R)S⁻¹(S − R).    (A16)
Thus, a conservative estimate of Q can be adaptively calculated directly from S and R, which can in turn be calculated from data. For adaptive estimates of Q, the following sequence appears to give satisfactory performance: (1) compute

Q = Z(S − R)S⁻¹(S − R)Zᵀ + (S − R) − Z(S − R)Zᵀ.    (A17)

Use this result if Q is positive definite. If Q is not positive definite, then (2) compute

Q = (S − R)S⁻¹(S − R),    (A18)
which will be positive definite unless some diagonal element of S equals the corresponding diagonal element of R, in which case Q will be positive semidefinite. Values of Q computed by Equations (A17) and (A18) should be smoothed. In numerical tests, suitable results were obtained by weighting previous values Q_{k−1} to obtain a "running average" for Q_k by

Q_k = (Q + 5 Q_{k−1})/6.    (A19)
REFERENCES
1. O. Balci, Statistical validation of multivariate response simulation models, Ph.D. Dissertation, Syracuse Univ., 1981.
2. L. Chin, Advances in adaptive filtering, in Control and Dynamic Systems (C. T. Leondes, Ed.), 15:277-356 (1979).
3. J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Academic, 1977, 474 pp.
4. H. E. Emara-Shabaik and C. T. Leondes, Non-linear filtering: the link between Kalman and extended Kalman filters, Internat. J. Control 34:1207-1214.
5. A. Gelb, Applied Optimal Estimation, MIT Press, 1974, 324 pp.
6. M. Hamburg, Statistical Analysis for Decision Making, Harcourt Brace Jovanovich, 1977, 861 pp.
7. D. A. Jameson, Sampling efficiency for monitoring of model based forest plans, in Renewable Resource Inventories for Monitoring Changes and Trends, Proceedings of an International Conference, Corvallis, Oregon, 15-19 August 1983, College of Forestry, Oregon State University, 1983, pp. 568-572.
8. A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic, 1970, 376 pp.
9. David G. Kleinbaum and Lawrence L. Kupper, Applied Regression Analysis and Other Multivariable Methods, Duxbury Press, 1978, 556 pp.
10. P. S. Maybeck, Stochastic Models, Estimation and Control, Vol. 1, Academic, 1979.
11. T. H. Naylor, J. L. Balintfy, D. S. Burdick, and K. Chu, Computer Simulation Techniques, Wiley, 1966, 352 pp.
12. K. D. Pimental, Toward a mathematical theory of environmental monitoring: The infrequent sampling problem, Lawrence Livermore Lab. Report UCRL-51837 (NTIS), 1975.
13. R. E. Shannon, Systems Simulation: The Art and Science, Prentice-Hall, 1975, 387 pp.
14. J. F. Sisler and D. A. Jameson, Optimal multi-observation monitoring systems, a practical approach, in Analysis of Ecological Systems: State of the Art in Ecological Modeling (W. K. Lauenroth, G. V. Skogerboe, and M. Flug, Eds.), Elsevier, 1983, pp. 235-246.
15. T. Soeda and T. Yoshimura, A practical filter for systems with unknown parameters, J. Dynamic Systems, Measurement and Control 95:396-401 (1973).
16. Sanford Weisberg, Applied Linear Regression, Wiley, 1980, 283 pp.