Data-Based Mechanistic Modelling for the Natural and Physical Sciences

Data-Based Mechanistic Modelling for the Natural and Physical Sciences

Copyright © IFAC Modelling and Control in Biomedical Systems, Warwick, UK, 1997 DATA-BASED MECHANISTIC MODELLING FOR THE NATURAL AND PHYSICAL SCIENCE...

2MB Sizes 0 Downloads 120 Views

Copyright © IFAC Modelling and Control in Biomedical Systems, Warwick, UK, 1997

DATA-BASED MECHANISTIC MODELLING FOR THE NATURAL AND PHYSICAL SCIENCES

Peter Young and Arun Chotai

Centre for Research on Environmental Systems and Statistics (CRES), Lancaster University, u.K.; E-mail: [email protected]

Abstract: The paper describes the Dala-Based Mechanistic (DBM)approach to the modelling of linear or nonlinear, stochastic, dynamic systems. The DBM modelling strategy involves several stages: starting with the dala-based identification of the model structure using nonparametric estimation methods which exploit recursive fixed interval smoothing procedures; and ending with the final estimation and validation of the identified linear or nonlinear model. The efficacy of the approach is illustrated by two practical examples: a non linear model for the population dynamics of the Australian sheep blowfly Lucilia cuprina; and a data-based approach to the identification and estimation of dominant modal behaviour in a complex, nonlinear global carbon cycle model. Examples in other application areas, such as hydrology, plant physiology and engineering, are described in the cited publications Keywords: linear systems; nonlinear systems; identification; parameter estimation; nonparametric estimation; fixed interval smoothing; biological modelling; climate modelling; ecological modelling; model reduction and simplification; dominant modes; tracer studies.

or state-dependent parameter polynomials in the backward shift operator Z-i (i.e. z-iy(k)=y(k-i», which are defined as follows,

1. INTRODUCTION

Previous publications on Data-Based Mechanistic (DBM) modelling (Young, 1988, 1993, 1996, 1997; Young and Runkle, 1989; Ng and Young, 1990; Young and Minchin, 1991; Young and Lees, 1993: Young and Beven, 1994) have argued that a fairly general class of nonlinear systems can either be represented exactly, or approximated, by a multi variable transfer function model with time variable or state dependent parameters. In the single input-single output case, this model can be written in the form I

A(k , z-l)= l+a l (k)Z-1 + ... + an(k)z-n B(k,z-I) = bo(k)+ bl (k)Z-1 + ... + bm(k)z-m

Equation (1) can be written conveniently as, y(k) = x(k) + ~(k)

(2)

where x( k), which can be considered as the noise free output of the model, is defined as,

(1)

x(k)= B(k,z-I) u(k) A(k, Z-I)

where y(k) and u(k) are, respectively, the input and output variables sampled at the kill sampling inslant; while A(k,z-l) and B(k,z-l) are, in general, time

(3)

and ~(k) is a stochastic noise term which is assumed to be statistically independent of the input variable u(k) . Finally, any pure time delay t5 affecting the relationship between u(k) and y(k) can be accommodated by selling the t5 leading coefficients

I the more general case IS diSCUSSed fully in the previous publications. to which the reader is refemed for further details.

315

of the B(k, Z-I) polynomial equal to zero. In the present context, the time or state-dependent parameters in (1) will reflect the nature of any nonlinear or nonstationary aspects of the system behaviour; and their statistical estimates, based on the data y(k) and u(k), will provide a potential source of information on the nature of the non linearity and/or nonstationarity in the system.

of we uncertainty on the FIS estimates. Nevertheless, it still provides information on the relative accuracy of the parameter estimates which, as we shall see, can prove useful in evaluating the detailed nature of the estimated parameter variations. Three important comments on this non-parametric estimation procedure are necessary . First, the performance of the FIS algorithm is controlled mainly by we so-called hyper-parameters; i.e. the Noise Variance Ratios (NVR's) associated with the assumed stochastic models for the parameter variations (normally stochastic random walk models: see We previous publications). These NVR values are optimised in a maximum likelihood sense using the Kalman filter and prediction error decomposition (Schweppe, 1965). Second, there is often some advantage to be gained by an iterative process of 'back-fitting' in which, at each iteration, each TVP is estimated in tum by FIS, with the data vectors sorted in some manner (e.g. ascending order). This can reduce the rate at which the estimated state dependent parameters change and so facilitate their accurate nonparametric estimation. The details of this procedure are discussed in Young et aI., 1997b. Finally, the model (4) should not be over-parameterised, otherwise any temporal variation in the estimated parameters may be due to such over-parameterisation rather than to the presence of real nonlinear/nonstationary behaviour (see e.g. Young 1969a, 1984).

2. THEORETICAL BACKGROUND The approximation of a nonlinear system by a TimeVariable Parameter (TVP) or State-Dependent Parameter (SDP) model such as (1) is the key assumption in the first stage of DBM modelling. It can be argued tllat the estimation of its (possibly) time variable parameters from the observed timeseries ( y(k); u(k) 1 provides a useful mechanism for tlle identification of a time-invariant parameter nonlinear model of tlle system. As in the previous publications cited above, such TVP estimation is based on a powerful Fixed Interval Snwolhing (FIS) method of recursive estimation. 2.1 Non-Parametric Estimation by Fixed Interval Smoothing

The details of FIS estimation are given in tlle prior publications cited above and it will suffice here to note that the FIS algorithm is applied to the following altemative vector form of the model (1),

2.2 A Simple State Dependent Parameter Model y(k) = z(kl a(k) + TI(k)

(4) If, at the initial TVP estimation phase of the analysis, Ule estimated parametcrs appear sensibly stationary over time, then it can be inferred that the system is predominantly linear and further analysis is not required. Otherwise, the analysis proceeds, with the aim of investigating the nature of the estimated parameter variations and atlempting to identify whewer these variations can be related to any of We variables in an extended or Non-Minimum Stale Space (NMSS; see also Young et aI., 1987) defined by the following NMSS state vector,

where, z(kl = [-y(k -I) - y(k - 2) ... - y(k - n) ...

u(k) ... u(k - m)] a(k) = [at (k) a2 (k) . .. an(k) bo(k) ... b",(k)]T

and TI(k) is a stochastic noise term arising from the presence of g(k) in (I). As in we case of g(k), it is assumed tllat TI(k) is statistically independent of the input variable u(k).

(5)

Based on this model form, the FIS algorithm provides an estimate li(kl N) of we model parameter vector a(k) at every sampling instant k, conditional on We time-series data ( y(k): u(k) lover the whole observation interval k = 1,2, .. . , N . In addition, if tlle noise TI(k) is assumed to be a zero mean sequence of serially uncorrclated random variables (discrete white noise), it also yiclds an estimate, at each sampling instant, of tlle covariance matrix p* (kl N), where P*(kIN) = E{a(kIN)a(kl Nll and the estimation error a(kIN) = a(kIN)- a(k) . If TI(k) is not white noise, Wen the matrix P*(kIN), as derived from we FIS a1goriLllIn, will not provide an accurate estimate

This is composed of the measured vector z(k) in (5), as well as (if necessary) We present and q past values of a vector U(k), consisting of s other variables Uj(k-i), j=I,2, ... ,s;i=O,1,2 ... . ,q which may affect we system non linearly but whose relevance in this regard is not be clear prior to time-series analysis. In other words, the analysis is intended to initially identify any state dependency in the parameter variations which is associated with nonlinear behaviour in We system and which can help to define the nature of the nonlinear system.

316

The other variables Vj(k), if they are required, will depend upon the example being considered. For instance, in the blowfly population example considered later in section 3, y(k) is the blowfly population; u(k) are the eggs laid per day by the blowflies; and V/k) , j = 1,2 are, potentially, the food supplied per day to the blowflies and the larval population.

of each one can be obtained by plotting each element of n(kIN) or M{x(k),a} against the equivalent element of z(k) . Note that there is nonnally no significant detrimental effect on the results if only diag( P*(kIN» is used in tl1e definition of W(k) (or W*(k». Also, tlle exact nature of the M(x(k),a} elements is specified in relation to the NMSS state variables by reference to the FIS estimates (a(kIN), P*(kIN)} and the physical nature of the system under consideration . This emphasis is essential to DBM modelling: while the choice of nonlinear transfonnation should not be dictated by tl1e prior perception of tile modeller about the nature of the nonlinear system, it should be capable of a rational explanation in relation to the physical (here biological, ecological, climatic) nature of tlle system.

At this stage in the analysis, it should be possible to develop some general nonlinear representation of the estimated parameter variations a(kl N) in tenns of the other measured state variables in X(k). A generalisation of a procedure suggested and utilised by the first author some twenty five years ago (Young, 1969a,b), is to assume tllat a(k) is nonlinearly related to functions ofaX(k), i.e.,

2.3 Final Estimation of the Nonlinear Model a(k) = M{x(k),a(k»)

(6) The primary aim of tl1e analysis in 2.2 is to identify whether the system is linear or nonlinear and, in the latter case, to identify a sensible functional form for each .of the elements in the state dependent transformation matrix M(k) = M (X(k»); in other words. the analysis is aimed at nonlinear model structure identification . This is necessary because the estimates of the parameters which characterise this functional form via tlle optimisation in (8) or (8a) are merely a vehicle in this identification process and it cannot be assumed that tl1ey always will provide statistically consistent or efficient estimates. Once a plausible structural form for the noniinear model has been identified, however. it is tllen possible to reestimate the model against the time-series data by some form of numerical optimisation (e .g. deterministic minimisation of the model residual variance; maximum likelihood estimation; prediction error minimisation etc.), tlle exact nature of which will tend to depend on the identified form of tlle model. For example, in the case of the TVP model (4) , the identification will, in general , suggest optimisation of tlle following estimation model

where, in tilis paper, M{x(k),a(k») is restricted to be a diagonal transformation maLrix whose elements are functional I y dependent upon X(0, while a( k) is a parameter vector which characterises the relationships and should, ideally, have time-invariant elements if tlle introduction of tlle state dependency in this manner proves successful. If a(k) = a is time invarial1l, tilen an estimate of M (X(k), a) can be obtained by assuming tllat , a(kIN) = M (X(k),a) + ECk)

(7)

where ECk) is a zero mean , white noise vector witil covariance matrix p* (kl N) that is introduced to allow for tlle uncertainty in the estimate a(kl N) . lllis suggests tllat a Weighted Least Squares (WLS) estimate a of tile parameter vector a in M{x(k),a) can be obtained by minimising t~e following non linear cost function, N

(8)

J = Illa(kIN)- M{x(k),a)11 k=l

W (k )

y(k)

where tlle weighting matrix W(k) is defined as tlle inverse of tlle estimated co variance matrix i.e., W(k) = P*(kIN)-I . An alternative but related approach is to minimise tlle modified cost function,

k=l

1) - ., -

an (Xlk)}y(k - n)

(9)

Alt1lOugh, in general, all of tlle parameters can be state dependent, practical experience suggests tllat, often, only a subset are so dependent, Witil tlle otl1ers remaining constant.

N

(8a)

J= Illn(kIN)-M{x(k).a}11

= -G 1 (X(k)} y(k -

+ ho (X(O)u(k) + .. + bm (X(k)}u(k - m) + 1)(k)

W'(k)

2.4 A Summary of the Complete DBM Procedure for a S/SO System

where n(kl N) = a(kl N). z( k) and tlle dot notation here means multiplication of the two vectors element-by-elemenl. Note that W*(k) is a modification of W(k) which allows for tlle change in J. The cost function (8a) has tlle advantage tllat the resulting estimate M (X(k), a} directly maps the nonlinearities, so tilat a useful graphical visualisation

1) Examine the available time-series data (y(O; u(k») and use tllese to estimate the parameters in tl1e best identified constant parameter, linear TF model using a reliable method of TF model identification and estimation (wc recommend tlle

317

identified nonlinear model are estimated, based on the identified nonlinear model form and all the relevant data in Ule NMSS vector. Any appropriate methods of estimation, such as ML or IV estimation can be used, depending upon the example.

Simplified Refined Instrumental Variable (SRIV) algorithm, which can be used to obtain both discrete and continuous time models; see Young, 1984; Young and Beven, 1994; Young et aI., 1996). Apply standard statistical tests to the model residuals, including tests for nonlinearity (e.g. Billings and Voon, 1986): if the results indicate linearity, then the linear model can be accepted and the analysis is complete. AlLematively, proceed to step (2)

8) Analyse the residuals from the estimated model to ensure tllat there is no evidence of any residual nonlinearity not identified in steps 1) - 7). This should include standard statistical diagnostic and nonlinearity tests: see step I).

2) Based on tlle analysis in 1) and any knowledge about the physical nature of the system, select: (a) the variables that could characterise the NMSS vector X(k); and (b), tlle simplest TF model tllat appears capable of characterising tlle dynamic behaviour of the output variable y(k) in relation to the observed input u(k) (tllis may well be only a first or second order model since even low order nonlinear models Crul have enonnous explrulatory potential).

3. PRACTICAL EXAMPLES The efficacy of any approach to tlle dynamic modelling of (potentially) non linear systems can only be evaluated properly by its successful application to practical examples based on real experimental or monitored data. The first exrunple below is typical of the results obtained so far from the practical application of the DBM modelling strategy. Other practical examples have been reported by the autllOrs and their colleagues in many other areas science, including: rainfall-flow modelling (Young, 1993; Young and Beven, 1994; Young et aI., 1997a), macro-economic modelling (Young and Pedregal, 1997a,b); and the modelling of soil temperature dynamics (Young and Lees, 1993). The second example is rather different: it shows how the SRIV identification and estimation methods used in DBM modelling have been exploited to simultaneously linearise and simplify complex nonlinear models (see Young er aI., 1996; Young and Lees, 1996).

3) Sort the data as discussed previously, use FIS estimation to obtain initial TVP estimates of the parameters in the TF model and define those parameters in a(k) which show significant variation over the observation interval. If any time variable parameters are identified, then obtain the FIS estimate of the TF model parameter vector a(kl N) with tlle time variable parruneters defined by step 2) but with all otller parruneters constrained to be constant. Note the relative accuracy of these estimates over time by reference to the appropriate diagonal elements of the co variance matrix p' (kl N) . This step may involve rul iterative procedure (see Young er al., 1997b).

Since it demonstrates how the primary dynamic modes of a system Crul be statistically inferred from experimental data, this second example also illustrates how the methodology can be useful in other application areas, such as biological, biomedical and pharmokinetic tracer studies. Other publications (e.g. Wallis et aI. , 1989; Young and Wall is, 1994; Minchin et ai. , 1996), for instance, have exploited this same methodology very successfully in the identification and estimation of hydrological and plant physiology models from experimental tracer data.

4) Examine tlle nalure of tlle FIS estimated time variation in the panuneters in relation to all tlle variables in the defined NMSS vector X(k) , using devices such as scatter plots, correlation analysis etc.; in all cases taking into account the relative accuracy of the FIS estimates identified in 3). 5) On tlle basis of tlle results in 4), define the nature of the elements in the transformation matrix M (X(k), a) and then obtain tile WLS estimate of the parruneter vector a. This will tend to be an iterative procedure, with the effectiveness of different M{x(k),a) being evaluated at each iteration until a satisfactory explanation of a(ki N) or n(kl N) is achieved by tlle WLS optimisation.

3.1 A DBM Model of IILe Australian BlOlvfly

Population Dynamics

Fig.l is the best known example of the data collected so laboriously by Nicholson (1954) in his investigation of the Australian sheep blowtly Lucilia ClIprina . It is clear that, in tilis particular experiment where the food (liver) supplied to the blowflies was limited to 0.5 g per day, tile adult blowfly population y(k) (circles: upper graph) and Ule eggs laid per day by tile blowtlies u(k) (full line: lower graph) varied in an apparently systematic fashion tllat is redolent of non linear dynamic behaviour. NOL surprisingly, these

6) The DBM philosophy requires that, if at all possible, any strong state dependent relationships exposed in 5) should be capable of physical (in the present context, biological, ecological or climatic) interpretation . This can often help to avoid tlle identification of potentially spurious nonlinearities. 7) Use the estimates of tlle parameters in 5) as starting values in a final model optimisation stage, where the (hopefully constant) parameters in the

318

data have received much attention in the scientific literature (e.g. May, 1973, 1976; Banks, 1994; Gurney, et aI., 1980; and the references therein).

are low and the food per blowfly is plentiful. There are two ways in which lie data can be analysed to infer lie nature of Ule nonlinear feedback dynamics. Most straightforwardly, lie model (10) can be modified directly to acknowledge that lie egg production rate u(k - S) is a non linear function of the blowfly population y(k). This approach is discussed fully in Young (l997b). Alternatively, since the time delay in the system is large, it is possible to investigate lie feedback-pali dynamics separately by considering directly Ule relationship between y(k), now considered as the input, and u(k), as Ule output. Then, Ule strategy outlined in section 2.4 reveals that lie feedback relationship takes lie form of a simple static nonlinearity wili a pure time delay td of about 2 samples (i.e.

'-lJvvJwvJ Nicholson Blowfly Data and linear Model Output

~

.!I!

.

~.

l~~ o

50

lOO

150

200

250

300

n=O,m=O, S=ld =2),

Fig. 1 Nicholson blowfly data

(11)

In contrast to the analysis described in the previous references, however, the DBM modelling approach makes no a priori assumptions about the nature of the blowfly system but starts with a relatively nonprejudicial analysis of the data, recognising only that the eggs and blowfly series are causally related in some manner. The data in Fig. 1 are sampled every two days and the most obvious relationship between u(k) and y(k), namely the blowfly response to the egg production rate, seems quite linear and can be described well by tile following first order, discretetime TF with a time delay of S = 7 samples (14 days) which accounts for the development of the eggs through a larval stage, prior to the emergence of the adult blowflies:

where the subscripts jb have been added to avoid confusion wili the parameters in Ule forward-paUl TF. FIS non-parametric estimation lien yields the estimate of the nonlinearity shown in Ule rop graph (A) of fig.2, where the dashed lines are the estimated standard error bounds. Estimated Nontinearities

1000

2000 3000 Blowfly Population

1~~:~

j

1000

(10)

UJ

4000

500

o~------------~~------------J o 1000 2000 3000 4000 Blowfly PopUlation

where the estimates of the parameters and their standard errors are obtained by SRIV estimation as bo =1.l61±0.OS4and l =0.736±0.014.

(le

a

The estimate x(k) of the noise-free model output x(k) (see equation (3» produced by this model is shown as tile solid line in the upper graph of fig. 1: it is clear that tilis simple linear model explains the experimental data very well with the associated Coefficient of Determination (COD), Ri-=0.86 (i.e. 86% of the output variance explained by the model: see e.g. Young and Beven, 1994) and the model residuals satisfying the usual diagnostic tests.

o

0.005

Food/BlowOy

0.01

1

0.015

Fig. 2 Initial FIS non-parametric idemification results compared with lie final optimised estimates of lie nonlinearities (see text). The shape of the estimated nonlinearity in fig.2 (A) suggests that the functional dependence of bo.jb(k) on y(k) has Ule form (cL the general equation (6»,

But the modelling story does not end wiUl Ulis linear model of the 'forward-path' dynamics: of much more interest is tile nature of tile 'feedback-path' dynamics, namely the mechanism by which the blowflies produce their eggs. In contrast to Ule forward-path, this mechanism is obviously nonlinear, with lie blowflies producing eggs only when lieir numbers

bo.jb(k)=g.exp( -

y(k-t No

d

»)

(12)

This makes ecological sense since, in the present context, g can be considered to represent lie maximum possible per capita egg production rate; while

319

No is the population size at which maximum reproductive success is achieved. Moreover, such a nonlinearity can be further justified by considering it in the alternative fonn shown in fig.3.

SIMULINK model of (16); and it yielded the following estimates of the nonlinear model parameters:

g = 9.07 ± 0.38; N= 957 ± 34; td

y(k-ld)

= 1.63 ± 0.03

Fig.4 compares the simulated variables i(k) and liCk), as obtained from the optimised model (16), with the experimental measured variables y(k), and u(k), respectively. The COD's associated with these Fig.3 The blowfly-egg production rate nonlinearity This recognises the fact that Nicholson restricted the food supply to J=0.5 g/day, so that the food per blowfly, Jb(k)=0.5/ y(k-t d ), is passed through a limiting nonlinearity of the fonn, (13)

to yield the egg production rate per blowfly, eb(k). When eb(k) in (13) is multiplied by the delayed total number of blowflies y(k -Id)' it produces the total egg production rate u(k), with the effective bo.jb(k) then defined as, b ( k) = g. exp _ y(k-t d ( O,jb J.N

results are R;=0.804 (for i(k» and Rj,=0.62 (for liCk»~. The fonner is not as high as that obtained in the case of the forward-path, linear TF model but this to be expected: here, both the model variables x(k) and u(k) are generated homogeneously by the model as a self generating deterministic limit cycle; whereas, in the case of the forward path TF model, the input is the measured u(k) series. It is clear from fig.4, that the finally estimated model not only provides a very good explanation of the data but, in providing an ecologically meaningful representation of the system, it also confonns with the DBM mOdelling philosophy.

."~~

!:!

c;5000 ~

»)

00

o

A

A

(X(k-td »)

100

ISO

200

250

300

SO

100

ISO 200 Time (doys)

250

300

Fig. 4 Comparison of model generated blowfly and egg production rate variables with Nicholson's experimental data. It is interesting to note that the results reported above agree well with the analysis of Gurney et al. (1980), which was discovered after the present analysis had been completed. However, their analysis is only semi-quantitative, in the sense that they do not use data-based identification and estimation, but simply speculate on the following continuous-time model,

where, e~k = y(k) - X(k)2, eik = u(k) - li(k)2 and x(k), liCk) are, respectively, the estimates of the blowfly population and egg production rate generated by the complete nonlinear model,

A

so

~~C::A]

(15)

A



(14)

which is the same as (12) if No = J. N. The lower two graphs in fig.2 show the final estimated fonns of the total nonlinearity bo.jb(k).y(k-t d ), with bo,jb(k) defined by (14) (as in fig.3); and the 'internal' per capita nonlinearity defined by (13). These were obtained by least squares optimisation based on the cost function,

u(k)=g.x(k-ld).exp

.



,;

(16)

di(t) ~ (X(l- - = -a1x(t) + g.x(l- u )exp A

m

A

J .N

from initial conditions i(O), with the forward-path linear model parameters constrained to their SRlV estimated values (see after equation (10» of bo =1.16l±0.054 , QI=-0. 736±0.014and 5=7. 11lis optimisation was accomplished using the leaslsq optimisation procedure in MA TLAB based on a

A

5»)

(17)

No

whose general dynamic behaviour confonns with that exhibited by the Nicholson data. Clearly, the model (17) can be considered as a continuous-time version of (16), althougb Gurney et al. do not identify the presence of the additional time delay td in the feedback path. Also, they do not consider the useful

320

decomposition of the nonlinearity in fig.3, nor explicitly include the blowfly food supply f.

'experimental' simulated data obtained by perturbing Ibe simulation model about its equilibrium solution using special test signals.

The explicit inclusion of f in (16) is useful because it allows us to check the ecological realism of the model still further by examining what it predicts should happen if the food supply is modified. In particular, Nicholson (1954) showed experimentally that, when the food supply was reduced from 0.5 g per day to 0.1 g per day, the average adult population dropped from 2520 to 527. In the case of the model (16) the average populations in these two same cases are 2667 and 544 respectively; a remarkable level of agreement in the circumstances (since the average values here are based on the deterministic limit cycle data produced by (16) rather than the actual data). Certainly this result provides a suitable final validation of the model.

In contrast to the conventional analytical model reduction procedures, Ibis statistical approach not only yields a reduced-order linear model which explains well the observed dynamic behaviour of the complex simulation model, but it also provides a good indication of the 'small perturbational' limits of the model; Le. Ibe level of input perturbation that can be tolerated before the non-linear behaviour of the parent model negates Ibe utility of its linearised approximation. In addition, it provides a method for defining Ibe dominant modes of dynamic behaviour, wilb Ibe identification step suggesting the number of dominant modes and the estimation step quantifying the nature of the eigenvalues (here time constants). In the case of the EL model, the reduced order linear model can be identified and estimated from the 3000 yr. unit impulse response of the high order non linear model (effectively a simulated tracer experiment: see fig.6(a) which shows the first 500 yrs.). The continuous-time SRIV algorithm yields the model,

3.2 Simplification of a Global Carbon Cycle Model

Fig.5 is the block diagram of a deterministic, nonlinear Global Carbon Cycle (GCC) model developed by Enting and Lassey (1993: hereinafter the EL model). The model is divided into 23 boxes as shown in Fig .5 (the deep ocean is composed of 18 boxes), each characterised by levels of total carbon, 13C and

A()

xt =

14C .

3

2

0. 255s +0.518s + 0.0675s + 0.000825 () 3? ut s +0.285s- +0.0127s+0.()()()()258

0.0765 () +---u t s where the second (integrator) term on the r.h .s of the equation arises from the assumption of complete mass conservation in the EL model (see Young et aI., 1996 for more details). As can be seen from Fig.6(b), which shows the error between the reduced and nonlinear model outputs, this differential equation model explains the data slightly better than an equivalently identified discrete-time model, although the difference is not large . The gains and time constants of Ibe Ibree dominant modes in the reduced order model are as follows:

Old

Biosphere

DeepOoean (sub-divided into

decay from all boxes light arrows incticate

14 C

18 layers)

GI = 27.8; TI = 467 years ; G2 = 2.8; T2 = 19.1 years; G3 = 1.03; T3 = 4.3 years

anthropogenic effects

Fig. 5 Block diagram of the EL model. Conventionally, quite large, non-linear, simulation models such as this are linearised analytically by a process involving Taylor series expansion about specified equilibrium solutions. With the advent of powerful computers, such linearisation procedures are now automated in software packages, such as MATLAB. There are various methods available to reduce the order of such linearised models, while retaining their dominant modes of behaviour. However, the SRIV identification and estimation procedures that underpin the DI3M modelling strategy provide an alternative method of deriving Ibe reducedorder, linear model approximations. This is accomplished in a single operation based on

The estimated shorter time constants are similar to the air-sea exchange time of 11 ±4 yrs. , and the troposphere-stratosphere exchange time of 6 ± 2 yrs assumed in the simulation model; whilst the larger time constant is associated with the value of 900±250 yrs for the deep ocean turnover time. It could be argued Ibat since this reduced order model is obtained from the impulse response of the high order nonlinear model at the selected equilibrium condition, it is not necessarily representative of the larger penurbational behaviour of the more complex model. However, the efficacy of the Iinearisation and reduction analysis in this wider sense is demonstrated

321

in Fig.7, which compares the response of the reduced order linear model with the full nonlinear model response to the measured fossil fuel input over the whole historical period 1840-1990. As can be seen, the differences are very small given the level of perturbation and the simplicity of the reduced order model.

This paper has shown how a new and systematic Data-Based Mechanistic (DBM) modelling strategy can be used to identify and estimate physically meaningful linear or nonlinear stochastic models from experimental or monitored data. In contrast to more conventional mechanistic modelling methods, the model structure is not assumed a priori but is inferred directly from the time series data using a combination of parametric and non-parametric estimation procedures applied to a transfer function model with time or state-dependent parameters. However, the final model is only deemed fully credible in scientific terms if, in addition to explaining the time series data well, it is also capable of rational physical (here biological, ecological, and climatic) interpretation.

Model reduction: comparison of impulse respo~

4th order tedu:cd tinear modd

_

o

0.5

26th otd::r oonlinear siroolalOG model

IE:

i'01 O.4 o

CS ~ 0.3 la

u

" {02

The paper has also pointed out the advantages of the SRIV approach to linear time series model identification and estimation used in the initial stages of DBM modelling. The SRIV methodology, which can be applied to both discrete (backward shift or delta operator) and continuous-time (differential operator or Laplace transform) transfer function models, provides a powerful tool for combined nonlinear model linearisation and simplification. In addition, it is particularly useful for the identification and estimation of multiple exponential modes in tracer experiment data. The SRIV, FIS and related algorithms will be available shortly in a time series analysis and forecasting toolbox for MA TLAB currently being developed at Lancaster.

2

5

..;

0.1

o

50

100

150

200

250

300

350

400

450

.soo

Year

Fig. 6(a) Impulse responses of the reduced order and nonlinear simulation models. Mood reduction : error between impulse responses

2.5

_

.eE -ll

.~

com.jIl.lw~

2

dililC~

lm-e (full)

tun: (dulled)

1.5 1

",------------

1:: ~f" ,

..;

·1

,'

-1.5

"~I

REFERENCES Banks, R.B. (1994). Growth and Diffusion Phenomena. Springer-Verlag, Berlin. Billings, S.A. and W.S.F. Voon (1986). Correlation based model validity tests for nonlinear models, Int. 1nl. of Control, 44, 235-244. Enting, LG. and K.R. Lassey (1993). Projections of future CO2 , Division of Atmospheric Research, Technical Paper 27. CSIRO: Australia Gurney, W.S.e. , P.e. Blythe and R.M. Nisbet (1980). Nicholson's blowflies revisited. Nature, 287, 17-21. May, R.M. (1973). Stability and Complexity in Model Ecosystems. Princeton University Press, Princeton, N.1. May, R.M. (Ed.) (1976) Theoretical Ecology . Blackwell, Oxford. Minchin, P.E.H., MJ. Lees, M.R. Thorpe and P.e. Young (1996). What can tracer profiles tell us about the mechanisms giving rise to them? 10urnal of Experimental BOIany, 47, 275-284. Nicholson, AJ. (1954). An outline of the dynamics of animal populations. Australian Zoological 1nl, 2,9-65.

.2L---~~~0----I~OOO~--~I'SOO~---2~OOO~--~2~'SOO~--~3000 Vear

Fig. 6(b) Error between the responses in fig.(6a) 3~ .-

ses to Fossil Foellnpul __Comparison ____of-Model -__Respon - -__ - -__ ~

--~--~

>320

!-ll .~

310

o

<5

270 '---:-::I86';:0,-""'I880~-:-::19':-::OO,--:-:19~20::--""'I94""O,--:-:I960~--:-:19~80:-l Vear

Fig. 7 Responses of the reduced order and nonlinear simulation models to fossil fuel input over the whole historical period (error shown above +325 ppmv) CONCLUSIONS

322

Young, P.e. and A. Chotai (1996). Identification, estimation and control methods for a class of nonlinear dynamic systems. Proceedings of 13th 1FAC World Congress, paper No. 2b-26 5. Elsevier, Amsterdam Young, P.C., and MJ. Lees (1993). The Active Mixing Volume (AMV): a new concept in modelling environmental systems, Chapter 1 in: Statistics for the Environment, (V. Barnetl and K.F. Turkman, Eds.), pp. 3-44. 1. Wiley, Chichester. Young, P.C. and MJ. Lees. (1996) Simplicity out of complexity in glasshouse climate modelling, Acta Horticulturae, 406, 15-28. Young, P.e. and P. Minchin (1991) Environmetric time-series analysis: modelling natural systems from experimental time-series data. Int. Jnl. BioI. Macromol.,13, 190-201. Young, P.C and D. Pedregal (1997a) Data-based mechanistic modelling. In: System Dynamics in Economic and Financial Models, (c. Heij and H. Schumacher, Eds.). J. Wiley, Chichester. Young, P.e. and D. Pedregal (1997b) Macroeconomic relativity: government spending, private investment and unemployment in the USA, submitted to the Journal of Structural Change and Economic Dynamics. Young, P.e. and D. Runkle (1989). Recursive estimation and modelling of non stationary and nonlinear time series. In: Adaptive Systems in Control and Signal Processing, Vol. 1, pp. 4964. Institute of Measurement and Control for IFAC, London. Young, P.C. and S.G. Wallis (1994). Solute transport and dispersion in channels. Appears as Chapter 6 in: Clwnnel Networks, (K.J. Beven and MJ . Kirby, Eds.), pp. 129-173 . Wiley, Chichester. Young, P.c., A.1. Jakeman and D.A. Post (1997a). Recent advances in the data-based modelling and analysis of hydrological systems. Watermatex Symposium, Quebec. Young, P.c., S. Parkinson and MJ. Lees (1996). Simplicity out of complexity: Occam's razor revisited, Journal of Applied Statistics, 23, 165210. Young, P C, M.A. Behzadi , C.L Wang and A. Chotai (1987). Direct digital and adaptive control by input-output, state variable feedback. Int. Jnl. of Control, 86, 1861-1881. Young, P.c., J. Bruun, D. Pegregal and 1. Whittaker (1997b). Nonlinear model identification using non-parametric fixed interval smoothing estimation. CRES, Lancaster University Tech. Note 116. To be submitted for publication.

Ng, C. N. and P.C. Young (1990). Recursive estimation and forecasting of nonstationary time series. Journal of Forecasting, 9, 173-204. Schweppe, F. (1965) Evaluation of likelihood functions for Gaussian signals. IEEE Trans on Information Theory, 11, 61-70. Wallis, S.G., P.C. Young and KJ. Beven (1989). Experimental investigation of the aggregated dead zone model for longitudinal solute transport in stream channels. Proc. Inst. of Civil Engrs, Part 2,87, 1-22. Young, P.C. (1969a). Differential equation Error Method of Real Time Process Identification . Ph.D. Thesis, Dept. of Engineering, University of Cambridge. Young, P.C. (l969b). The use of a priori parameter variation information to enhance the performance of a recursive least squares estimator", Tech. Note 404-90, Naval Weapons Center, China Lake, California. Young, P.C. (1978) A general theory of modelling for badly defined dynamic systems. In: Modeling, Identification and Control in Environmental Systems (G.e. Vansteenkiste, Ed.), pp. 103-135. North Holland, Amsterdam. Young, P.e. (1984). Recursive Estimation and Time-Series Analysis. Springer-Verlag, Berlin. Young, P.e. (1988). Recursive extrapolation, interpolation and smoothing of non-stationary time-series. In: Identification and System Parameter Estimation 1988, (H.F.Chen, Ed.), pp. 33-44. Pergamon Press: Oxford, . Young, P.C. (1993). Time variable and state dependent parameter modelling of non stationary and nonlinear time series". Chapter 26 in: Developments in Time Series (T Subba Rao, Ed.), pp. 374-413. Chapman and Hall: London. Young, P.e. (1994). Time variable parameter and trend estimation in nonstationary economic time series. Journal of Forecasting, 13, 179-210. Young, P.C. (1996) . A general approach to identification, estimation and control for a class of nonlinear dynamic systems. In: Identification in Engineering Systems. (M.1. Friswell and J.E. Mottershead, Eds.), pp. 436-445. Univ. of Wales, Swansea. Young, P.C . (1997a). Data-based mechanistic modelling of engineering systems. Journal of Vibration alld Control, in press. Young, P.C. (1997b). Data-based mechanistic modelling and the Nicholson blowfly data. CRES, Lancaster University Tech. Note 115. To be submitted for publication. Young, P.e. and K.1. Beven (1994) . Data-based mechanistic (DBM) modelling and the rainfallflow nonlinearity. Environmetrics, 5, 335-363.

323