Agricultural Systems 34 (1990) 183-190
Regression of a Model on Real-System Output: An Invalid Test of Model Validity
S. R. Harrison
Department of Economics, The University of Queensland, Queensland, 4072, Australia
(Received 17 July 1989; accepted 21 January 1990)
ABSTRACT
Statistical tests of validity of farming systems models may be inappropriate for a number of reasons. A specific example is the F test for zero intercept and unit slope; this test has intuitive appeal, but bias in parameter estimates can lead to rejection of a valid model. It is suggested that descriptive statistics and subjective tests be used to build up confidence in a model as it proceeds through a number of prototypes.
INTRODUCTION
The validation of farming systems models presents many conceptual and practical difficulties. Validation in essence involves ensuring that a model is adequate for its intended use. While this intended use is often the correct ranking of alternative policies or technology packages, most validation activity has been directed towards the more demanding criterion of ability of models to predict real-system performance. That is, testing has been concerned with whether a model is capable of predicting mean values of real-system outputs for given inputs, in terms of lack of bias and low standard error. A wide variety of statistical tests (on means, variances, autocorrelations and overall distributions) and subjective methods have been applied for this purpose (e.g. see Shannon, 1975). If a model satisfies a number of tests aimed at different characteristics of the output series, then confidence in that model should be enhanced.

Agricultural Systems 0308-521X/90/$03.50 © 1990 Elsevier Science Publishers Ltd, England. Printed in Great Britain
Statistical tests on conformity of outputs of farming systems models and real systems may be inappropriate on a number of grounds. First, hypothesis testing has been designed to discredit false null hypotheses, whereas validation tests attempt to accredit true null hypotheses (that models adequately mimic real systems). In practice, a test can only address the question of whether a model is invalid beyond reasonable doubt (Harrison, 1987). Second, the assumptions underlying the better-known statistical inference techniques are rarely met when simulating dynamic stochastic systems (Wright, 1972). Third, an adequate number of data observations may not be available (especially for spectral tests), and the information yielded by tests may be no greater than that which could be obtained from subjective appraisal of output series (Wright, 1972).

When the assumptions underlying tests are not upheld, estimates of statistical parameters may be biased. This is the case, for example, with estimated variance for the t test on differences between paired observations, where the differences are correlated over time (Law, 1983). A more subtle case of bias in parameter estimates arises when model output is regressed on real-system output, and an F test is applied to determine whether the population regression relationship could have, simultaneously, an intercept of zero and a slope of unity (Ritchie, 1976; Dent & Blackie, 1979). Although the existence of bias in estimates of parameters of a regression line when there are errors in both variables is well known to statisticians, this is an example of the validation procedures that continue to be used for farming systems models (e.g. Jones & Kiniry, 1986). As well, statistical tests have recently been proposed for validation of expert systems (O'Keefe et al., 1987).
The purpose of this paper is to illustrate the pitfalls in use of statistical validation procedures, with particular reference to regression of model on real-system outputs.

THE SIMULTANEOUS TEST ON INTERCEPT AND SLOPE
The relationship between model and real-system outputs may be hypothesized to follow a relationship of the form

$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t \qquad (1)$$

where $y_t$ is the $t$th observation of model output, $x_t$ is the $t$th observation of real-system output, and $\varepsilon_t$ is a random error term. The data usually take the form of two time series samples, with corresponding observations 'paired' or positively correlated over time. Output from the model is then regressed on output from the real system, to fit a sample relationship of the form

$$y_t = b_1 + b_2 x_t + e_t \qquad (2)$$
where $b_1$ and $b_2$ are the sample regression intercept and slope coefficients, and the $e_t$ are residuals. The following hypotheses are then tested:

$$H_0\colon \beta_1 = 0 \text{ and } \beta_2 = 1 \qquad H_1\colon \beta_1 \neq 0 \text{ or } \beta_2 \neq 1, \text{ or both} \qquad (3)$$

The F statistic for this test is calculated as

$$F = \frac{(n - 2)\{n b_1^2 + 2 n \bar{x}\, b_1 (b_2 - 1) + \sum x_t^2\, (b_2 - 1)^2\}}{2 n s^2} \qquad (4)$$
where $n$ is the sample size and $s^2$ is the residual variance. The calculated F value is compared with the critical value in tables for 2 and $n - 2$ degrees of freedom. If the test does not lead to rejection of the null hypothesis, then it cannot be concluded that the model is invalid; this is taken as evidence that the model is valid.

The rationale for the regression test is that if the outputs of the model corresponded exactly with those of the real system, for the same input data, then all paired observations would fall exactly on the 45-degree line of a scatter diagram. If the model mimics the real system closely but not perfectly, then a cluster of points around this line will be expected. On the other hand, if the model is a poor representation of the real system, the scatter of points will depart systematically from the 45-degree line. For example, if model outputs are consistently lower than real-system outputs, then the points will cluster below the 45-degree line. If a high degree of 'noise' or error in predictions is present, then the points will be widely dispersed on a scatter diagram.

While this reasoning has intuitive appeal, in practice the regression test can be unacceptable on both theoretical and practical grounds. This will be illustrated with respect to a bivariate normal distribution, and then to a more general case of paired time series.
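The calculation in equation (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in any of the cited studies: the function name `joint_f_test` and the sample data are hypothetical, the residual variance is assumed to be the residual sum of squares divided by n (which makes the expression coincide with the usual F ratio), and the p-value uses the closed-form upper tail of an F variate with 2 numerator degrees of freedom instead of a table lookup.

```python
def joint_f_test(x, y):
    """Regress y (model) on x (real system) and test H0: intercept 0, slope 1.

    Returns (b1, b2, F, p_value). Hypothetical helper, for illustration only.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need two equal-length samples with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx            # sample slope
    b1 = y_bar - b2 * x_bar   # sample intercept
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / n              # ASSUMPTION: residual variance with divisor n
    sum_x2 = sum(xi * xi for xi in x)
    # Term in braces in equation (4).
    braces = n * b1 ** 2 + 2 * n * x_bar * b1 * (b2 - 1) + sum_x2 * (b2 - 1) ** 2
    f_stat = (n - 2) * braces / (2 * n * s2)
    # Upper-tail probability of F with 2 and m degrees of freedom:
    # P(F > f) = (1 + 2 f / m) ** (-m / 2).
    m = n - 2
    p_value = (1 + 2 * f_stat / m) ** (-m / 2)
    return b1, b2, f_stat, p_value


if __name__ == "__main__":
    # Hypothetical paired observations, not data from the paper.
    real = [10.0, 12.0, 15.0, 18.0, 20.0, 23.0]
    model = [11.0, 11.5, 15.5, 17.0, 21.0, 22.0]
    b1, b2, f_stat, p_value = joint_f_test(real, model)
    print(f"b1={b1:.3f}  b2={b2:.3f}  F={f_stat:.3f}  p={p_value:.3f}")
```

The sketch does not guard against a perfect fit (zero residual sum of squares) or a constant real-system series, both of which would cause a division by zero.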
REGRESSION OF BIVARIATE NORMAL VARIATES
Suppose a stochastic simulation model is an exact or perfectly valid mimic of a real farming system. Further, representative samples of model and real-system output are obtained for the same management policies. Corresponding values in the two samples will be highly correlated, with differences due to random sampling variation only. This variation could arise, for example, from random realizations of environmental variables such as weather and product prices. Since the two samples are drawn from identical populations, any statistical test would be expected to lead to the conclusion of no difference between the measures compared.
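As a simple and mathematically tractable case, suppose model and real-system outputs arise from a bivariate normal distribution. This distribution has a vector of means and a variance-covariance matrix of the form

$$\begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \qquad \begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix} \qquad (5)$$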
where $\mu_X$ and $\mu_Y$ are mean real-system and model performance levels, $\sigma_X^2$ and $\sigma_Y^2$ are the variances, and $\sigma_{XY}$ is the covariance. The level of correlation between outputs is implied by the variance-covariance matrix, i.e.

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \qquad (6)$$
Suppose that the two variables (real-system outputs $X$ and model outputs $Y$) have equal means and variances, and are positively correlated, i.e.

$$\mu_X = \mu_Y = \mu, \qquad \sigma_X^2 = \sigma_Y^2 = \sigma^2 \qquad \text{and} \qquad 0 < \rho_{XY} < 1 \qquad (7)$$
Intuitively, we would expect the scatter diagram of the sample points to cluster around the 45-degree line, and the regression equation of either variable on the other to have an intercept of approximately zero and a slope of approximately unity. In contrast to this reasoning, there is a well known relationship between the sample regression coefficient ($b_2$) and the sample correlation coefficient ($r$), which states

$$b_2 = \frac{s_Y}{s_X}\, r \qquad (8)$$
where $s_X$ and $s_Y$ are the standard deviations of variables $X$ and $Y$ (Harrison & Tamaschke, 1984). For equal population variances, the term $s_Y / s_X$ is likely to be near unity, and the regression coefficient will be approximately equal to the correlation coefficient, i.e. $b_2$ will be less than unity.

In connection with validation of econometric models, Aigner (1972) examined the theoretical sampling distributions of the regression intercept and slope coefficients for correlated bivariate normal variables with equal means and variances. He showed that, regardless of which variable is regressed on the other, the expected regression intercept is greater than zero and the expected slope is less than unity, i.e. if $b_1$ and $b_2$ are the least squares estimators of $\beta_1$ and $\beta_2$ then the means of their sampling distributions are such that

$$E(b_1) > 0 \qquad \text{and} \qquad E(b_2) < 1 \qquad (9)$$
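A brief sketch of why (9) holds may be helpful. It relies only on the textbook conditional mean of a bivariate normal distribution together with the equal means and variances assumed in (7), and is offered as an illustration rather than as a restatement of Aigner's own derivation:

$$E(Y \mid X = x) = \mu_Y + \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) = \mu\,(1 - \rho_{XY}) + \rho_{XY}\,x$$

Least squares is therefore estimating a population line with slope $\rho_{XY} < 1$ and with an intercept that is positive whenever the common mean $\mu$ is positive. This line differs from the 45-degree line even though the two populations are identical.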
In fact, the expected slope is equal to the correlation coefficient $\rho_{XY}$, and the expected intercept is $\mu(1 - \beta_2)$, where $\mu$ is the common mean and $\beta_2$ here denotes the population slope $\rho_{XY}$. For example, if
$\rho = 0.7$ and $\mu = 20$, then the expected slope is $E(b_2) = 0.7$ and the expected intercept is $E(b_1) = 6$.

The above theoretical results have important practical implications for model validation. If the correlation coefficient is low (i.e. if model outputs differ substantially from real-system outputs) then the unexplained variation will be high relative to the explained variation, and the null hypothesis $H_0\colon \beta_1 = 0$ and $\beta_2 = 1$ probably will be accepted. But if the model is a close representation of the real system, then the residual variation will be relatively small, and the test could lead to rejection of the null hypothesis. In other words, if the model is invalid, we could conclude that it is valid, but further refinement could well lead to the test conclusion that the model is invalid. This clearly is a perverse result!
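The numerical example above can be checked by simulation. The following is a minimal sketch under stated assumptions, not a reproduction of Aigner's analysis: the function names, the standard deviation of 2, the sample size of 24 and the number of replications are all hypothetical choices, and the F statistic is computed from the equivalent "restricted minus unrestricted residual sum of squares" form of the joint test. Paired samples are drawn from a bivariate normal population with equal means and variances, the model sample is regressed on the real-system sample, and the average intercept, average slope and share of rejections at the 5% level are reported.

```python
import math
import random


def fit_and_f(x, y):
    """Return (b1, b2, F) for the regression of y on x and the joint test
    of zero intercept and unit slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    rss_h0 = sum((yi - xi) ** 2 for xi, yi in zip(x, y))  # about the 45-degree line
    f_stat = (n - 2) * (rss_h0 - rss) / (2 * rss)
    return b1, b2, f_stat


def simulate(rho, mu, sigma, n, reps, alpha=0.05, seed=0):
    """Monte Carlo means of b1 and b2, and rejection share of the F test,
    when model and real-system outputs share one bivariate normal population."""
    rng = random.Random(seed)
    m = n - 2
    sum_b1 = sum_b2 = 0.0
    rejections = 0
    for _ in range(reps):
        x, y = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x.append(mu + sigma * z1)  # real-system output
            y.append(mu + sigma * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))  # model output
        b1, b2, f_stat = fit_and_f(x, y)
        sum_b1 += b1
        sum_b2 += b2
        # Exact upper tail of F with 2 and m degrees of freedom.
        p_value = (1 + 2 * f_stat / m) ** (-m / 2)
        rejections += p_value < alpha
    return {"mean_b1": sum_b1 / reps,
            "mean_b2": sum_b2 / reps,
            "reject_share": rejections / reps}


if __name__ == "__main__":
    # rho and mu follow the worked example; sigma, n and reps are assumptions.
    print(simulate(rho=0.7, mu=20.0, sigma=2.0, n=24, reps=2000, seed=1))
```

Running `simulate` with other values of `rho` shows how the average coefficients and the rejection share respond to the strength of correlation between the two series.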
MORE COMPLEX OUTPUT PROCESSES
Most farming systems models generate time series of output variables which possess components such as trend, seasonal variation, cyclical or persistent behaviour, and residual variation. Exact theoretical sampling distributions for the regression coefficients of model on real-system outputs may not then be available, but the distributions can be approximated by numerical methods. For example, suppose model and real-system outputs can be represented by the following population time-series models:

$$\begin{aligned} \text{Model:} \quad & y_t = \tau_1 + \tau_2 t + \tau_3 \sin(t\pi/6) + \varepsilon_t \\ \text{Real system:} \quad & x_t = \tau_1 + \tau_2 t + \tau_3 \sin(t\pi/6) + \eta_t \end{aligned} \qquad (10)$$
Suppose further that $\varepsilon_t$ and $\eta_t$ are normal random variates, each with variance $\sigma^2$ and one-period autocorrelation coefficient $\rho$. These equations define fairly general forms of population distributions: $\tau_1$ expresses the general magnitude of the series, $\tau_2$ is the trend increase per unit of time, and $\tau_3$ governs the magnitude of seasonal variation.

For any particular values of the parameters ($\tau_1$ to $\tau_3$, $\sigma$ and $\rho$) in these population schemes, a large number of independent random samples of fixed size may be drawn by Monte Carlo sampling. Suppose alternate samples are allocated as representing output of a systems model and output of a real farming system, respectively. Synthetic 'model' outputs may be regressed on synthetic 'real-system' outputs for each pair of samples. The means of the regression intercept and slope coefficients may then be calculated, yielding estimates of the expected values of these sample statistics over their unknown population sampling distributions.

The above procedure has been applied for various combinations of parameter levels.

TABLE 1
Mean Intercept and Slope Coefficient for Regression of Synthetic Model and Real-System Outputs

Set   τ1   τ2   τ3    σ²    ρ     b1        s(b1)    b2       s(b2)
  1   20    0   0.0   2.5   0.0   19.1670   0.4465   0.0376   0.0225
  2   20    0   0.0   2.5   0.5   19.0904   0.5363   0.0400   0.0268
  3   20    0   0.0   5.0   0.0   19.0864   0.4478   0.0376   0.0225
  4   20    0   0.0   5.0   0.5   18.9806   0.5502   0.0400   0.0268
  5   20    0   5.0   2.5   0.0    5.9596   0.2772   0.6985   0.0136
  6   20    0   5.0   2.5   0.5    5.1039   0.4164   0.7398   0.0201
  7   20    0   5.0   5.0   0.0   12.2957   0.3784   0.3773   0.0189
  8   20    0   5.0   5.0   0.5   11.6868   0.5599   0.4049   0.0274
  9   20    2   0.0   2.5   0.0    1.4605   0.2587   0.9662   0.0055
 10   20    2   0.0   2.5   0.5    1.3898   0.4017   0.9672   0.0085
 11   20    2   0.0   5.0   0.0    4.9539   0.4848   0.8872   0.0103
 12   20    2   0.0   5.0   0.5    4.3891   0.7291   0.8986   0.0153
 13   20    2   5.0   2.5   0.0    1.6781   0.2707   0.9614   0.0058
 14   20    2   5.0   2.5   0.5    1.5553   0.4203   0.9635   0.0089
 15   20    2   5.0   5.0   0.0    5.5903   0.5025   0.8730   0.0108
 16   20    2   5.0   5.0   0.5    4.9022   0.7576   0.8871   0.0159
 17   40    0   0.0   2.5   0.0   38.4148   0.8960   0.0376   0.0225
 18   40    0   0.0   2.5   0.5   38.2905   1.0689   0.0400   0.0268
 19   40    0   0.0   5.0   0.0   38.3340   0.8930   0.0376   0.0225
 20   40    0   0.0   5.0   0.5   38.1808   1.0727   0.0400   0.0268
 21   40    0   5.0   2.5   0.0   11.9896   0.5461   0.6985   0.0136
 22   40    0   5.0   2.5   0.5   10.3070   0.8123   0.7398   0.0201
 23   40    0   5.0   5.0   0.0   24.7501   0.7490   0.3773   0.0189
 24   40    0   5.0   5.0   0.5   23.5880   1.0911   0.4049   0.0274
 25   40    2   0.0   2.5   0.0    2.1362   0.3662   0.9662   0.0055
 26   40    2   0.0   2.5   0.5    2.0458   0.5654   0.9672   0.0085
 27   40    2   0.0   5.0   0.0    7.2100   0.6856   0.8872   0.0103
 28   40    2   0.0   5.0   0.5    6.4175   1.0234   0.8986   0.0153
 29   40    2   5.0   2.5   0.0    2.4504   0.3841   0.9614   0.0058
 30   40    2   5.0   2.5   0.5    2.2851   0.5931   0.9635   0.0089
 31   40    2   5.0   5.0   0.0    8.1294   0.7124   0.8730   0.0108
 32   40    2   5.0   5.0   0.5    7.1594   1.0658   0.8871   0.0159

For synthetic 'model' output, a value in each period $t$ is obtained using the expression

$$x_t = \tau_1 + \tau_2 t + \tau_3 \sin(t\pi/6) + \varepsilon_t \qquad (11)$$

where each $\varepsilon_t$ is a correlated residual obtained from the expression

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + \sigma\sqrt{1 - \rho^2}\; z \qquad (12)$$
values of the standard normal variate $z$ being obtained by the 'rejection technique' (Mihram, 1972). The same procedure is used to synthesize real-system outputs. One hundred sets of paired observations, each over 24 time periods, have been generated for each of 32 parameter sets. Means and standard errors for the sample regression intercept and slope over all 100 samples are presented in Table 1.

For each parameter set, the mean regression intercept is greater than zero and the mean slope coefficient is less than unity. Further, the standard errors are relatively small, implying that means of the population sampling distributions have been estimated with a high degree of precision. If the F test were applied to any individual pair of samples, it is likely that the null hypothesis of zero intercept and unit slope would be rejected, even though the 'model' and 'real-system' populations are identical. The expected slope could even be of the order of 0.04 or less, for a valid model.

Table 1 also identifies parameter values for which the condition of zero intercept and unit slope will be most closely approximated. This occurs when values in the paired samples vary over a wide range of magnitudes, due to a sharp upward trend in values (e.g. for parameter sets 9 to 16) or strong seasonality (sets 5 to 8 relative to sets 1 to 4). Low unexplained variation in the paired series also reduces the intercept and increases the slope, e.g. in sets 5 and 6 compared with sets 7 and 8. Serial correlation tends to reduce random variation in individual series, and hence to reduce the intercept and increase the slope marginally, e.g. set 6 as against set 5.

The outlined data generating process provides relatively general sets of synthetic model and real-system outputs, upon which to perform the regression test of validity. In practice, often only part of the variation in real-system output is captured in a model. The sampling procedure outlined above has been repeated, with a lower random component in synthetic model output relative to synthetic real-system output.
A pattern of sampling distributions similar to that of Table 1 was observed. Frequently, a deterministic model is used to represent a stochastic real system. In this case, use of the regression test is inappropriate, because the dependent variable (model output) does not satisfy the assumptions regarding the error term, upon which regression analysis depends. It would be more acceptable to regress real-system outputs on model outputs; this has not been carried out in the current analysis.
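The Monte Carlo procedure described in this section can be sketched as follows. This is an illustrative reimplementation under several assumptions, not the author's program: all function names are hypothetical, standard normal variates come from Python's `random.gauss` rather than the 'rejection technique', the pre-sample error is drawn from its stationary distribution, the time index is taken to run from 1 to 24, and the parameter labelled $\sigma^2$ is treated as a variance. The output should therefore not be expected to match Table 1 digit for digit.

```python
import math
import random
import statistics


def ar1_errors(n_periods, sigma, rho, rng):
    """Correlated errors: e_t = rho * e_{t-1} + sigma * sqrt(1 - rho^2) * z."""
    errors = []
    prev = rng.gauss(0.0, sigma)  # ASSUMPTION: start-up value drawn from N(0, sigma^2)
    for _ in range(n_periods):
        prev = rho * prev + sigma * math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors


def synthetic_series(tau1, tau2, tau3, sigma, rho, n_periods, rng):
    """Level + trend + seasonal component + correlated error, for t = 1..n_periods."""
    errors = ar1_errors(n_periods, sigma, rho, rng)
    return [tau1 + tau2 * t + tau3 * math.sin(t * math.pi / 6) + errors[t - 1]
            for t in range(1, n_periods + 1)]


def ols(x, y):
    """Least-squares intercept and slope for the regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def monte_carlo(tau1, tau2, tau3, sigma2, rho, n_periods=24, n_pairs=100, seed=0):
    """Means and standard errors of b1 and b2 over n_pairs synthetic sample pairs."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    b1s, b2s = [], []
    for _ in range(n_pairs):
        # Alternate samples from the same population: 'real system', then 'model'.
        real = synthetic_series(tau1, tau2, tau3, sigma, rho, n_periods, rng)
        model = synthetic_series(tau1, tau2, tau3, sigma, rho, n_periods, rng)
        b1, b2 = ols(real, model)
        b1s.append(b1)
        b2s.append(b2)
    root = math.sqrt(n_pairs)
    return {"mean_b1": statistics.mean(b1s), "se_b1": statistics.stdev(b1s) / root,
            "mean_b2": statistics.mean(b2s), "se_b2": statistics.stdev(b2s) / root}


if __name__ == "__main__":
    # Parameter levels chosen to resemble one seasonal, trend-free set.
    print(monte_carlo(tau1=20.0, tau2=0.0, tau3=5.0, sigma2=2.5, rho=0.0, seed=7))
```

Giving the synthetic model series a smaller error variance than the synthetic real-system series would require only a second `sigma2` argument; that variant is left out to keep the sketch short.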
DISCUSSION
Statistical tests are but one approach to model validation. Their use is limited by the appropriateness of assumptions and by the difficulty of accrediting a true null hypothesis (as distinct from discrediting a false one). As a particular example, while the simultaneous test of zero intercept and unit slope of model output regressed on real-system output has intuitive appeal, theoretical evidence reveals that the expected values of the sampling distributions for intercept and slope are positive, and less than unity, respectively. If model and real-system data series vary over narrow ranges, the bias in parameter estimates for this test may be large. Further, this test may have perverse behaviour in that progressive refinement of a model can reverse the statistical decision from 'valid' to 'invalid'. The conclusion is that this is not an appropriate procedure for validation of farming systems models.

Limitations of statistical tests have led to suggestions that statistical methods be confined to use as descriptive (and not inferential) devices for comparing model and real-system outputs, and that these be combined with procedures such as graphic comparison and appraisal of face validity by subject matter specialists (Harrison, 1987). Most farming systems models evolve through a series of prototypes, perhaps over a number of years. The essentially subjective nature of validation, and the practical reality that confidence is gradually built up in a model as it proceeds through a number of versions, should be recognized.

REFERENCES
Aigner, D. J. (1972). A note on the verification of computer simulation models. Management Science, 18(11), July, 615-19.
Dent, J. B. & Blackie, M. J. (1979). Systems Simulation in Agriculture. Applied Science, London.
Harrison, S. R. (1987). Validation of models: Methods, application and limitations. In Computer Assisted Management of Agricultural Production Systems, RMIT, Melbourne.
Harrison, S. R. & Tamaschke, H. U. (1984). Applied Statistical Analysis. Prentice-Hall, Melbourne.
Jones, C. A. & Kiniry, J. R. (1986). CERES-Maize: a Simulation Model of Maize Growth and Development. Texas A & M University Press, College Station, TX.
Law, A. M. (1983). Statistical analysis of simulation output data. Operations Research, 31(6), 983-1029.
Mihram, G. A. (1972). Simulation: Statistical Foundations and Methodology. Academic Press, New York.
O'Keefe, R. M., Balci, O. & Smith, E. P. (1987). Validating expert system performance. IEEE Expert, 2(4), 81-90.
Ritchie, I. R. (1976). A crop irrigation simulation model for individual farmer use. M.S. thesis, Lincoln College, Canterbury, New Zealand.
Shannon, R. E. (1975). System Simulation: the Art and the Science. Prentice-Hall, Englewood Cliffs.
Wright, R. D. (1972). Validating dynamic models: an evaluation of tests of predictive powers. In Proceedings of the Summer Computer Simulation Conference, San Francisco. Simulation Councils, Inc., La Jolla, CA, pp. 1286-96.