Variability within system models: A case study

Variability within system models: A case study

Agricultural Systems 37 (1991) 309-318 Variability within System Models: A Case Study D. A. Elston* & C. A. G l a s b e y Scottish Agricultural Stati...

442KB Sizes 0 Downloads 207 Views

Agricultural Systems 37 (1991) 309-318

Variability within System Models: A Case Study D. A. Elston* & C. A. G l a s b e y Scottish Agricultural Statistics Service, JCMB, King's Buildings, Edinburgh EH9 3JZ, UK (Received 4 March 1991; accepted 30 May 1991)

A BS TRA CT Forms of variation which arise within system models are considered, using as an example a model for energy requirements in growing cattle. Problems encountered include the need to summarize groups of lines and to satisfy logical constraints on values of variables. Effects of Gaussian variation on prediction of weight gain of cattle are found to be substantial with biases of up to 20% of the deterministic model outcomes, and standard deviations ranging from 20% to 150%. In some cases the distribution of predicted values is far from Gaussian.

1 INTRODUCTION Mathematical modelling is a way of testing quantitative understanding of interacting systems. If successful, the model will provide an aid in system management. The modelling process may, alternatively, highlight areas of relative ignorance which need further investigation (Dent & Blackie, 1979). Modelling starts with a logical analysis of the system, aided by flow diagrams (Forrester, 1968). The system is split into a number of distinct submodels, each of which is then modelled separately. Finally, the submodels are combined to form the system model. Estimation problems inherent in building system models require much * Present address: Scottish Agricultural Statistics Service, Macaulay Land Use Research Institute, Craigiebuckler, Aberdeen AB9 IQJ, UK. 309 Agricultural Systems 0308-521X/91/$03.50© 1991 ElsevierSciencePublishers Ltd, England. Printed in Great Britain

310

D. A. Elston, C. A. Glasbey

more attention (Curnow, 1984). It should be borne in mind that it is the model predictions, rather than the parameter estimates, which are important (Wallach & Goffinet, 1987). Although standard statistical methods are adequate for estimation within submodels (for example Ross, 1990), transformations when submodels are used for prediction lead to complications. As a simple example, Fig. 1 shows data relating weight gain (D kg/day) of cattle, over 2-weekly periods, to feed intake (I M J/day). If D is regressed on L the fitted model is D = 0.018I - 0.9 However, if this model is used to predict the intake to achieve a target weight gain, the result is I = (D + 0-9)/0.018 = 54D + 49 which is a very poor predictor of L It greatly overestimates the necessary intake for each kilogramme of weight gain. The best predictor is obtained by regressing I on D, giving the quite different I-- 10.6D+71 This paper examines sources of variation which arise in a simple model of the energy requirements for growing cattle, as described in an Agricultural Research Council working party report (ARC, 1980). The 3.0-

2.0-

• ,,/1= 1•0.6D+71 •

• •

1.o-

. • .-

t "

181-

"/ " t ~

t

0.9

• •





oo

-1.0-

o'.o Fig. I.

5d.o

'

100.0

'

150.0

Observed weight gains (D kg/day), over 2-weekly periods, plotted against feed intake (I MJ/day). Lines denote regression of D on L and of I on D.

Variability' within system models: a case study

311

consequences of variability upon the model predictions of weight gain are considered and they are compared with the model performance reported by an interdepartmental ADAS/SAC/UKASTA working party (ADAS, 1986). This working party compared energy model predictions of weight gain with those observed in a range of experiments and found that, on average, the model overpredicted gain by 15%. Individual animals showed substantial variation about the average: the standard deviation was 20% of the mean.

2 E N E R G Y MODEL The ARC (1980) model is deterministic. It uses empirical physiological functions to relate animal growth to energy intake, based on the equation M

DV

I = k m + kg where: I is the daily metabolizable energy intake (M J/day); M is the daily maintenance requirement (M J/day); k m is the efficiency of utilization of energy intake for maintenance; D is the rate of liveweight gain (kg/day); V is energy value of weight gain (MJ/kg); kg is the efficiency of utilization of energy intake for weight gain. The model has one state variable, liveweight W (kg), and one dimensionless auxiliary variable, metabolizability of the energy content of the feed, q. There are three submodels, describing (i) maintenance requirement of fasting animals, through an equation for M as a function of W; (ii) energy requirement for growth, leading to an equation for V in terms of W and D; (iii) efficiency of use of feed energy to meet the maintenance requirement, kin, and for growth, kg, which are related to q. We shall consider sources of variation between animals for each submodel in turn. 2.1 Maintenance submodel

The maintenance submodel consists of one equation, namely M -- a m O67 + 0.0043 W The first term is the energy cost of basic metabolism, incorporating a well established exponent of 0.67, whilst the second is an adjustment for walking when animals are outside. From a single observation on each of

312

D. A. Elston, C. A. Glasbey

88 animals in 8 calorimeter experiments, the constant a was estimated as 0.53 (ARC, 1980). Variation of observations about fitted values for the equation comes from three sources: between experiments, between animals within experiments, and measurement error. The last two forms of variation cannot be separated because there is just one observation per animal. Therefore we can only reliably estimate between-experiment variability. If we regard the exponent of 0.67 as fixed, then each experiment provides a separate estimate of a. The standard deviation among them is 0.03, which we will use in later simulations. The assumption of a fixed exponent of 0-67 is not critical, because the variation between experiments in predicted values of M is fully represented in different values for a.

2.2 Energy requirements for growth The energy content of weight gain was estimated from serial-slaughter experiments (ARC, 1980). A partition of body tissue into protein (P g/kg) and fat (F g/kg), leads to the equation dF Vo = 39.3 ~ +

dP 23.6 d W

where Vo is the energy value of gain at a growth rate of 0.65 kg/day. This is adjusted for other growth rates by V--

0.9035 Vo 1 - 0-1475D

The energy values of fat and protein, namely 39.3 and 23-6 MJ/g are subject to uncertainty. However, for simplicity, and lack of more detailed information, we will consider them fixed and concentrate on estimating dF/d W and dP/d W. Results from 41 experimental groups were available as log-log regressions. For example, in group i, l o g F = b i + e ilog W for estimated constants b~ and c~, and W in the range Li to U~. These data contain no information about the growth of individual animals, which will have to be assumed to follow the group average. The problem is to find a way to estimate V, by combining the results from different groups, which takes account of the range of W in each equation, and allows estimation of between-group variability. The derivative dF/d W, and not F itself, is of interest. This can be obtained either by relating F to W and differentiating, or by differentiating each experimental curve and averaging them. In general, these methods

Variability within system models: a case study

313

will not give the same results. Because use of the derivative of the average does not allow good estimation of the variability about the derivative, we opt to average all the separate differentiated curves. F r o m experimental group i we obtain dF = ci exp (bi) W ci dW

for Li < W < Ui

I

We seek a function d F / d W = G ( W ) which has minimum mean-squared distance from the experimental results, that is it minimizes

i Ui

1 '~ ~, U i - - L i

L~

(ci exp (bi)

W ci' --

G(W)) 2 d W

In practice, this can be implemented simply by replacing each experimental curve by a set of (say) 10 points equally spaced in W between L and U, and fitting G(I4/) using a standard least squares algorithm. A straight line appeared adequate for G(W), and similarly for d P / d W as a function of W. These results combine to produce Vo : 5.56 + 0-030W with standard deviation 0.013 W. The multiplying factor of W is included in the expression for the standard deviation to account for variation increasing with body weight.

2.3 Efficiency factors Information about efficiency of energy use is contained in the linear equations (ARC, 1980) k m = 0.35q + 0.503 kf = 0.78q + 0.006

(SD 0.064) (SD 0.097)

where kf is the efficiency of utilization of energy for weight gain when intake is exactly twice that needed for maintenance. From this kg -

kf

L--

1 -- (kf/km)L

1

1 -- (kf/km)

I

where L '

lkm

M

Efficiency factors must lie between 0 and 1, and further, for the expression for kg to make sense, kf must be less than km. The inclusion of variability in the equations for k m and kf must not cause these constraints to be violated. For the above levels of variability, and q between 0.4 and 0.8, only kf < km is ever violated. In simulations this occurred up to 5% of the time, and they were discarded.

314

D. A. Elston, C. A. Glasbev

A further difficulty in introducing variability is that deviations from the two lines will not be independent; some of the variability will be due to the properties of the feed quality being inadequately described by a single variable, q. Lacking further information, we simply consider the extreme cases, namely correlations of zero and one, in simulations which follow.

3 E F F E C T OF V A R I A T I O N S ON P R E D I C T I O N S The model proposed by the A R C working party is deterministic. As we have seen, the relationships are subject to substantial variation between groups of animals. This variability can be incorporated directly into the equations to build an equivalent stochastic model. We shall investigate the relationship between errors in the submodels and the corresponding distributions of predictions of weight gain, based on D=

I-

M/k m

0.9Vo/kg + 0.1475(I - M / k m )

Table 1 reports biases and standard deviations of predicted weight gains when Gaussian variability is introduced into one, or all three, submodels. Eight sets of values of (W, q, /) are considered which are at the extremes of realistic values. Summary statistics are based on 1000 independent simulations in each case, generated using the N A G Fortran subroutine library (NAG, 1987). Variation in maintenance requirement appears to be relatively unimportant; mean predictions are almost unbiased, and the distribution of predictions has only a narrow spread. Variation in energy value of weight gain, however, has a much bigger effect. Mean predictions are consistently upward biased. For animals weighing 600 kg, biases are substantial and standard deviations are over 35% of the predicted weight gains. The effect of variation in the two efficiency factors depends on the presence of correlation. With complete correlation, mean predictions are approximately unbiased and standard deviations are large. However, with zero correlation the mean predictions are downward biased and standard deviations are slightly smaller. Simultaneous variation in all three submodels can lead to either positive or negative bias, similar to the sum of the biases caused by variation in each submodel alone. Biases are found to be as extreme as --10% and +20% of the deterministic model outcome. Standard deviations are also close to those obtained by simply combining the components, by summing the squares of the separate standard deviations, then square-rooting. They range between 20% and 150% of the predicted weight gains.

Variability within system models." a case study

315

TABLE 1 Summary Statistics for Predicted Weight Gain, Derived from 1000 Simulations of Model with R a n d o m Variation Included in One or All Submodels

Liveweight, W (kg) Feed quality, q

100 0.4

600 0.7

0.4

22 250

38 750

18 37 250 1250

(i) Maintenance -2 (ii) Energy of gain 7 (iii) Efficiency factors (cor = 0) --10 Efficiency factors (cor = 1) 6

2 11 -32 6

Intake, I (MJ/day) Predicted weight gain, D (g/day)

74 250

0.7 114 750

61 250

111 1250

-- 2 - 1 4 11 --8 28 6 -- 17

2 --1 54 93 15 ---32 2 9

1 40 -7 1

-1 90 36 -- 5

--9 .... 1

28 21

19 50

64 91

50 30

88 120

Bias in weight gain (g/day)from variation in

(i)-(iii) (i~(iii)

(cot = 0) (cor = 1)

11 --17 5 4

SD in weight gain (g/day)jrom variation in (i) Maintenance (ii) Energy of gain (iii) Efficiency factors (cor = 0) Efficiency factors (cor = 1)

50 40 110 130

10 100 210 220

70 40 110 130

20 130 190 220

60 220 120 150

20 400 210 230

80 170 130 160

30 440 200 230

(i)-(iii) (i)-(iii)

130 150

230 250

130 150

240 270

270 280

420 440

380 300

550 590

(cor = 0) (cor = 1)

The interdepartmental ADAS/SAC/UKASTA working party (ADAS, 1986) reported a bias o f - 1 5 % , and a standard deviation of 20°/,, of predicted weight gain. Overall, the results presented here show positive biases and larger standard deviations. Setbacks in growth, such as sickness, may explain the negative biases observed in practice. The differences in standard deviations may result from associations between parameter values in different submodels, about which the authors have no information. The magnitudes of these variations, in either case, are sufficient to have serious consequences for animal husbandrymen who feed cattle solely on the basis of the deterministic model. A probability density function (pdf) provides full information about the distribution of predictions. In the case of variation in a single variable in one submodel, the pdf of the predictions can be calculated using standard transformation theory. Suppose X is the variable, with a Gaussian pdf denoted f(X). Let Z -- h(X) be the model outcome, where h is a

D. A. Elston, C. A. Glasbey

316

7.06.0-

5.0s 4.03.0-" 2.0-

1.0-" 0.0 0.0

0.2

014

o16

o18

11o

0'6

018

110

(a) 7.06.0 ~ 5.0 ~ 4.03.0 ~ 2.0 ~ 1.0 ~ 0.0 0.0

0'.2

0'4 (b)

80.0-

60.0-

40.0-

20.0-

0.0

0.!71

o.~2

o.~3 (c)

o.~4

o.%

I 0.76

Fig. 2. Probability density functions for predicted weight gain, when variation is included in one submodel: (a) maintenance submodel varied, with W - - 100 kg, q -- 0.7, I -- 18 MJ/day; (b) energy of gain submodel varied, with W - - 600 kg, q -- 0-4, I -- 74 M J/day; (c) maintenance submodel varied, with W -- 100 kg, q = 0-4, I = 38M J/day.

Var&bility with& system models: a case study

317

monotonic function and h~ is differentiable. Then the associated pdf g(Z) is I dh-J(Z) dZ

f (h-'(Z))

where the derivative can be calculated numerically if necessary. Note that the monotonicity condition can be relaxed to one of piecewise monotonicity, in which case g(Z) is the sum of contributions from each root of the equation X = h-l(Z). For the energy model, the estimated pdf of weight gain takes many different forms, as shown in Fig 2. The one-tailed distribution (c) is of particular interest. It occurs because the function relating D to M (denoted h above) reaches a maximum at D = 0.757. The pdfs in Fig. 2 can be classified by the shape of the curve defined by h. Symmetric pdfs arise from local linearity. Skewness results from h being convex or concave. When h is not monotonic, the pdf will contain at least one singularity. This may also lead to truncation of one or both tails.

4 DISCUSSION The three submodels of the energy model each illustrate a different aspect of variability. Estimation in the maintenance submodel is straightforward, and the variation between groups leads to relatively small variability in predictions. The energy value of gain submodel, however, needs a non-standard estimation procedure for fitting a line to data which are themselves lines. The large errors associated with this lead to large variation in model predictions, some of which may be attributable to known differences between groups of animals. For the efficiency factors submodel, the estimates given by ARC (1980) are used, and note the effect of correlation between variations in the two equations. In none of the submodels have the authors been able to consider variability between individual animals, instead of variability between groups. Relationships between values of parameters in different submodels have been ignored, as is usually the case in systems modelling. O'Neill et al. (1980) give an example where simulations from the sampling distribution of parameter estimates give rise to model predictions which are incompatible with the real system. The problem could, in theory, be overcome by estimating all parameters simultaneously, but there will rarely be sufficient data to allow this. All models of biological systems are subject to substantial variability

318

D. A. Elston, C. A. Glasbey

of the kinds described above, whether it is purely variation between individuals or imperfect knowledge of parameters. Variability, though difficult to incorporate into models, is an essential part of real systems which may be crucial when models are being used, for example in system management. REFERENCES ADAS (Interdepartmental ADAS/SAC/UKASTA Working Party Report) (1986). Nutritive Requirements of Ruminant Animals: Energy. Agricultural Development and Advisory Service, Bristol, UK. Agricultural Research Council (1980). The Nutrient Requirements of Ruminant Livestock. Commonwealth Agricultural Bureaux, Slough. Curnow, R. N. (1984). Statistics in biometry and agriculture. Present position and potential developments: some personal views. Journal of the Royal Statistical Society, A, 147, 349-58. Dent, J. B. & Blackie, M. J. (1979). Systems Simulation in Agriculture. Applied Science Publishers, London. Forrester, J. W. (1968). Industrial Dynamics. MIT Press, Cambridge, MA. NAG (1987). Numerical Algorithms Group Fortran Library Manual--Mark 12. NAG, Banbury Road, Oxford. O'Neill, R. V., Gardner, R. H. & Mankin, J. B. (1980). Analysis of parameter error in a nonlinear model. Eeological Modelling, 8, 278--311. Ross, G. J. S. (1990). Nonlinear Estimation. Springer-Verlag, Berlin. Wallach, D. & Goffinet, B. (1987). Mean squared error of prediction in models for studying economic and agricultural systems. Biometrics, 43, 561-76.