Evaluation of Likelihood Functions for Nonlinear Grey-Box Identification




Copyright © IFAC 10th Triennial World Congress, Munich, FRG, 1987


EVALUATION OF LIKELIHOOD FUNCTIONS FOR NONLINEAR GREY-BOX IDENTIFICATION

T. Bohlin, Department of Automatic Control, Royal Institute of Technology, Stockholm, Sweden

Abstract. This contribution investigates the feasibility of evaluating likelihood functions associated with dynamic objects subject to substantial random disturbance, taking into account structural specifications that may be known to the user of an evaluating program but not to the designer of it. The purpose is to develop specifications for writing programs for grey-box identification of dynamic objects, i.e. cases where part of the model structure is known a priori, so that the model will be based on data as well as on user knowledge. It turns out to be possible to do this by 1) defining "case elements", specifying the user's preconception of tentative model structures, and 2) introducing a set of "restriction indicators", limiting the class of all conceivable model structures to those that are possible to handle by computers. In few words: the general problem is Computer-Aided Design (CAD, one might say) of dynamic models, and the special problem is to cope with the consequences of disturbance and incomplete information.

Keywords. System identification; parameter estimation; model structures; nonlinear identification.

INTRODUCTION

Let d be the outcome of an experiment carried out on a system S, with or without controlled stimulation c, and assume a "model structure" A, i.e. an algorithm A[a,c,w] → d such that the distribution of the data d it produces is sufficiently close to that of the system S for some a. The finite-dimensional vector a represents all that is to be inferred about S from the experiment data. The stimulus c includes physical time, if knowledge of the latter is relevant to the modelling of the system response. The unknown "primitive" stochastic signal w has a known distribution (typically gaussian white noise); it takes into account the possibility that the same system with the same stimulus may produce different data on different occasions. On the other hand, w is not considered to carry any information of value for the purpose of the modelling. The difference between the unknowns a and w is that if one were to repeat the experiment, possibly with another control c, the value of a would be the same (as would A), but that of w would have changed.

The given distribution of w yields a distribution p[d|a,c,A] of d. Regard also a as a stochastic variable with a priori distribution p[a|A]. Then all one can ever infer from d about a is the conditional, "a posteriori" distribution p[a|d,c,A] = p[a|A] p[d|a,c,A] / p[d|c,A]. If nothing is assumed about the value of a, the distribution p[a|A] may be regarded as constant over a very large domain, and the unbiased likelihood function is L(a|d,c,A) = p[d|a,c,A]. Hence it depends fundamentally on the model structure A. In essence, the inference problem is to evaluate a number of characteristics of L(·|d,c,A) given d,c,A, and one of the problems of designing means for computer-aided identification of nonlinear systems is to render this computationally feasible for as general classes of structures A as possible.

Usually, the model structure A, whose parameters are to be estimated, is a member of an enumerable class, A = A(n), where n is a vector of integers, e.g. order numbers or structure indices. The likelihood function becomes L(a|d,c,n,A). This means that the likelihood functions for all members in the class A can be evaluated using the same program, with n as integer parameters.

Consider the procedure of model making. Three actors are involved:
• The computer. It works on the basis of a set of model-designer programs {T}, a case A, control c, and data d.
• The user or operator. He/she provides the case A and selects the program T to run.
• The model-designer designer. He/she writes the programs {T}.

They play their parts in the following progression of design tasks:
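As a toy illustration of these relations (an editorial sketch, not from the paper): a scalar parameter a, one observation d, and a flat prior, so the posterior is proportional to the likelihood. All numbers and the gaussian structure are invented for the example.

```python
import numpy as np

# Toy structure A: one observation d of the scalar parameter a in gaussian
# noise, so p[d|a,c,A] is gaussian in (d - a); c plays no role here.
def log_p_d_given_a(a, d, sigma=1.0):
    return -0.5 * ((d - a) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

a_grid = np.linspace(-5.0, 5.0, 1001)   # "very large domain" for the flat prior
d_obs = 1.3                             # experiment outcome

log_L = log_p_d_given_a(a_grid, d_obs)  # L(a|d,c,A) = p[d|a,c,A]
post = np.exp(log_L - log_L.max())      # flat p[a|A]  =>  posterior prop. to likelihood
post /= post.sum() * (a_grid[1] - a_grid[0])   # divide by p[d|c,A] (normalize)

print("posterior mode =", a_grid[np.argmax(post)])   # equals the ML estimate here
```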

• Evaluating the likelihood: a,d,c,n,A → L(a|d,c,n,A) (computer at run time)
• ML estimation (search): d,c,n,A → â (computer at run time, possibly guided by the operator)
• Order determination: d,c,A → n (computer and/or operator at run time)
• Structure design: d,c → A (operator at set-up time)
• Deriving and programming the likelihood: L(·|·,·,·,·) (model-designer designer at any time)
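The chain from likelihood evaluation to ML search can be made concrete with a small sketch (Python with NumPy/SciPy; the AR(1) evaluator below is a hypothetical stand-in for a program generated from a case A, and all names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(a, d, c, n):
    # Stand-in evaluator: the real one is generated from the case elements.
    # Here: first-order AR structure, n = (order,), a = (coefficient, log noise std).
    phi, log_s = a
    e = d[1:] - phi * d[:-1]                      # one-step prediction errors
    return 0.5 * np.sum((e / np.exp(log_s)) ** 2) + e.size * log_s

d = np.cumsum(np.random.default_rng(0).standard_normal(200))  # some data record
res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(d, None, (1,)))
print("ML estimate a:", res.x)        # run-time search; the operator may guide x0
```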




Here "set-up time" means the time the operator sets up cases for the computer to run, e . g . by entering structure and data specifications A, and selecting T by giving "Run T" commands. This may or may not involve compilation. Generally, interactive identification will require a number of such command-driven programs T. The problems of designing general such programs were studied by Bohlin (1984, 1985), and it turns out that evaluating the likelihood for nonlinear cases is the common core and the most demanding part of several of those, viz. sensitivity analysis , fitting, validation, and predictor simulation . The scope of this paper will therefore be limited to the evaluation of likelihood, given d ,a,n , and nonlinear A. Obviously, solving the problem of deriving likelihood functi ons for all model structures A the user may conceive is not feasible. What is feasible is to derive the likelihood for classes of structur es A, and each class {AI~} will have its particular likelihood function Lp and evaluation program associated with it . .The index ~ is a vector of structure-restriction indicators, and it makes it possible to bring some order into the multitude of considerations that affect nonl inear system identification. CASE DEFINING ELEMENTS Call the operator~s structural specifications A "case elements". They are input to the computer , in addition to parameters n ,p, controls a, and data d, and they repre sent the apriori information the operator has obtained from other sources than data d . Each particular A determines the path the computer will take in the program executing T. If the structure is specified by means of an algorithm, the path will go through a program module executing that algorithm. Call programs defining A "user programs ", even if they need not be written by the operator. Object, input source, and data acquisition Let t be time, for simplicity discrete, and collect the physical quantities of interest for the identification into four vectors: U(t )

exogenous input to the object

z(t)

dependent output (Objects for control or prediction)

y(t) p

= measuring transducer output

= constant

parameters (who lly or partly unknown)

Let u = {u(t),u(t-1 ) , ... ,u( l )} , etc. Then a t general stochastic dynamic model may be written o Object model: M[ut,wt,t,n,pltET ] ~ z (t) ,y (t) where w is a sequence of uncorrelated gaussian

t

vectors with zero means and unit covariance matrices, without loss of generality. The sequence models unpredictable random "disturbances". Physically, the latter may have their sources inside the object or in an "environment". In the latter case , a model of the environment is included in M. In any case, modelling disturbances that are not white and gaussian requires that shaping filters be also included in M. Let d(k) be the data obtained at time tk and let

dk

= {d(k),d(k-1) , . . . ,d(l )}.

Define the process

sampling transducer output for data: • Data acquisition: O[y(t k ),kI1::..k::..N] .. d(k)

In order to take into account that not all transducer output y(t ) may be in the data sample, even

k

at sampling instants t , let O[·,k] piCk the com-

k

ponents that are recorded for the purpose of identification at t . This covers the case of different k sfumpling rates as well as occasionally missing data. Also the source of exogenous input must be spec fied: • Input model: C[ ak ' Wt , k , t , n , pltk _1
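Before the remaining case elements, here is a sketch of how the three elements so far might look as user programs (Python; the signatures follow the definitions above, but the toy dynamics, the even-k recording rule, and the zero-order-hold input are invented for illustration):

```python
import numpy as np

def M(u_hist, w_hist, t, n, p):
    """Object model M[u_t, w_t, t, n, p] -> z(t), y(t).
    Toy first-order lag driven by the input and one white disturbance."""
    a, b = p
    z_t = b * u_hist[-1] + a * w_hist[-1]     # dependent output
    y_t = np.array([z_t, u_hist[-1]])         # transducer measures z and u
    return z_t, y_t

def D(y_tk, k):
    """Data acquisition D[y(t_k), k] -> d(k): pick the components actually
    recorded at t_k; here the second channel is logged only at even k."""
    return y_tk if k % 2 == 0 else y_tk[:1]

def C(c_km1, t, t_k, t_km1, n, p):
    """Input model C[c_k, w_t, k, t, n, p | t_{k-1} < t <= t_k] -> u(t).
    Simplest deterministic case (w unused): zero-order hold between samples."""
    return c_km1
```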
A fourth case element specifies the free parameter space:

• Free parameter space: P[p0,θ] → p

p0 is the origin of the free space, i.e. P[p0,0] → p0. The free space is the set of points {p|θ} reachable by transferring any real vector θ into {p}. Define also a distribution of p around p0 by mapping gaussian vectors θ with covariance Γ_θ Γ_θ^T into {p}.

The free space may be used in several ways:
1) To specify a priori restrictions on p for estimation; only p ∈ {p|θ} are free to fit to data, and θ are the parameters to be estimated.
2) To describe the a posteriori distribution of p after estimation. In that case the estimate is given by p0 = p̂ and its error distribution (asymptotically) by P and Γ_θ Γ_θ^T = [-grad_θ grad_θ L(θ̂)]^{-1}. This requires a structure such that the ML estimate θ̂ will be asymptotically normal.
3) To generate random variation in p for simulation, in order to appraise model uncertainty, robustness, etc. In that case random p are generated via θ = Γ_θ w, where w ∈ Gaussian(0,I).
4) To specify an alternative hypothesis p ∈ {p|θ} to the null hypothesis p = p0 in validation of the model (Bohlin 1978, 1982, 1984). The coordinate θ is then implicit and does not appear in the hypothesis.
5) To determine identifiability prior to fitting: the covariance factor Γ_θ computed from C, M, D, P, p0, and d_N, c_N reveals which components of θ are not identifiable.
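A minimal sketch of a free parameter space, assuming the common affine special case P[p0,θ] = p0 + Γ_θ θ (the paper allows general mappings; all numbers are invented):

```python
import numpy as np

p0 = np.array([1.0, 0.5, 2.0])          # origin of the free space: P[p0, 0] = p0
Gamma = np.array([[1.0, 0.0],           # Gamma @ Gamma.T is the covariance of p
                  [0.0, 0.2],
                  [0.0, 0.0]])          # third component of p is held fixed

def P(p0, theta):
    # affine free-space mapping; theta are the parameters actually estimated
    return p0 + Gamma @ theta

rng = np.random.default_rng(1)
w = rng.standard_normal(2)              # w in Gaussian(0, I)
p_random = P(p0, w)                     # use 3): random p for uncertainty appraisal

# use 5): a rank-deficient Gamma flags non-identifiable directions of theta
print("identifiable directions:", np.linalg.matrix_rank(Gamma))
```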

External and internal models

The case elements C, M, D, P, the indices n, and the control and data sequences c_N and d_N uniquely define the likelihood function L(·|d_N,c_N,n,C,M,D,P). To see this, notice that u_t,n,p,M determine the a priori output distribution p[y_t|u_t,n,p,M], and for t_{k-1} < t ≤ t_k also the a posteriori distribution p[y_t|u_t,d_{k-1},n,p,M,D] conditional on data, since d_k is a part of y_t. This, in turn, determines the marginal distribution p[d(k)|u_t,d_{k-1},n,p,M,D]. Secondly, c_k,n,p,C determine the input distribution p[u_t|c_k,n,p,C]. The product yields the joint distribution p[d(k),u_t|d_{k-1},c_k,n,p,C,M,D], with the marginal distribution p[d(k)|d_{k-1},c_k,n,p,C,M,D]. Since this distribution is conditional on previous data only, it can be expressed, formally, by a predictor model

• External model MP:
(1) MP[d_{k-1},c_k,w(k),n,p] → d(k)

where {w(k)|k=1,...,N} is an independent random sequence with assumed distribution, usually Gaussian(0,I). The form of MP can be chosen in such a way that there is an inverse

• External model wP:
(2) wP[d_k,c_k,n,p] → w(k), S(k)

where the w(k) are independent gaussian vectors with zero means and unit covariance matrices when d_k is generated by c_k,w_t,n,p,C,M,D. The mapping {w_k} → {d_k} is one to one, and the gradient S(k) = [grad_{d(k)} w]^{-1} is left triangular (Bohlin 1984). A third form of external model is the "probabilistic" model p[d(k)|d_{k-1},c_k,n,p] (Peterka 1981); it is defined uniquely by wP.

External model MP may be interpreted as a one-step predictor of the measured output. It yields the conditional distribution, and various estimates may be computed by taking the conditional mode, mean, median, percentiles, confidence interval, or any other reasonable characteristic of the distribution. In essence, the problem of evaluating the likelihood is one of finding predictors.

When the "normalized residuals" w(k) and the Jacobian det[S(k)] have been computed as functions of c_k,n,p0,θ,M,D, the likelihood function follows easily, using Bayes' chain rule:

(3) log L(θ|d_N,c_N,n,p0,M,D)
    = log p[d_N|c_N,n,P(p0,θ),wP]
    = Σ_{k=1}^N log p[d(k)|d_{k-1},c_k,n,P(p0,θ),wP]
    = Σ_{k=1}^N { -||w(k)||²/2 - log det[S(k)] }

plus a term that does not depend on θ.
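Equation (3) is immediate to code once a predictor supplies w(k) and S(k). A sketch in Python (`predictor_step` is a hypothetical stand-in for a wP user program):

```python
import numpy as np

def log_likelihood(d, c, predictor_step):
    """Accumulate (3): sum over k of -||w(k)||^2 / 2 - log det S(k),
    up to a theta-independent constant."""
    logL = 0.0
    for k in range(1, len(d)):          # the first sample acts as initial condition
        # predictor_step implements wP: past data + current control -> w(k), S(k)
        w_k, S_k = predictor_step(d[:k], c[k], k)
        _, logdet = np.linalg.slogdet(S_k)
        logL += -0.5 * w_k @ w_k - logdet
    return logL
```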

The functions MP and wP both describe the distribution of data uniquely and are therefore the only structures that can be validated by data. Hence, they are by themselves model structures, but describing only the system's input-output behaviour, and are called "external" models. The "internal" model structures C,M,D describe more, but also require more detail, which means more a priori knowledge about the object. External models are sometimes adequate, in particular as a basis for control design relying mainly on feedback. However, if an internal model is required, the designer, in essence, must provide so much structural information P,C,M,D that the parameters n and θ are possible to infer from data. In addition to the difficulties of specifying the structure of an internal model, there is also the problem of deriving wP from C,M,D. In the general case that problem has not been solved. Two ways to circumvent the problem are described below, and approximate solutions may also be feasible, using predictor algorithms of the "Extended Kalman filter" type. Because of this difficulty, and because the designer may prefer an external model, for whatever reason, a fifth case element is introduced: the external model wP (or MP).

RESTRICTIONS ON STRUCTURE

There are several kinds of restrictions on the class of case elements from which the user may choose in order to prepare for a successful identification. Some are due to the fundamental limitations of statistical inference, which affect the choice of C, M, and D. Another restriction is "identifiability", affecting mainly the free parameter space P. Others are due to the limited power of computers (and of the systems scientists deriving the algorithms); they affect wP. Some are due to the user's preference for certain structures, or to limitations in available identification programs. Since the aim of this paper is to investigate the limits to what can be done, the only restrictions discussed will be those originating in basic limitations of statistical inference, mathematics, and computers: What is possible to infer from a given experiment? What restrictions make the evaluation of the likelihood feasible and efficient? The questions have no definite answers. However, since it is still the responsibility of the operator to define the case elements, and since there are no definite, theory-based rules for how to do that, it is important for the user at least to be aware of some known pitfalls in structure determination and some feasible ways to cope with the computing problems.

Fundamental consequences of the experiment condition

The way the experiment is carried out fundamentally affects the possibilities to draw conclusions from its result. Among the pitfalls are the possibility of "hidden variables", the use of "natural input", the problem of "input noise", "feedback in data", and "object drift". The fundamentals of these problems have puzzled statisticians for a long time and are now quite well understood (Kendall and Stuart 1967). They have, however, some very practical consequences regarding what can and cannot be inferred by identification, and that needs to be pointed out. The following analysis helps to illuminate the difficulties: It is evident that observation of the responses of a stimulated object can never yield more information than the data distribution p̃[d_N|c_N], and this only if N is large and one may believe in ergodicity. This means that one can never do better than determining M and C from the formal equation

p [d",lc""n ,p ,e,M,V) or alternat ively ,

= ~[d""lc",). For simplicity, con-

sider the case that all input and output are measured at all times:

T. Bohlin

230 tk = k and d(k) = {y(k),u(k)} . The distributions one wants to describe by

n ,p ,M, C, those of the object output given input and of input given the feedback signal , if any , ar e by Bayes; chain rule (4) p [Y""iu"",n, p,Ml

and

n;=l p [ u~t)iYt, Ut_ 1 , n,p, Cl

The problems of "input noise " and "feedback " originate in having too many unknowns to determine in equ. (5). However , that case is less of a pitfall , since it can be detect ed by analyz i ng the data for identifiability . Conditions f or identifier bility wi l l not be given here. See fo r instance Gustavsson , Ljung , and S6derstrom (1977) , Vadja (1984) , Sieferd (1985) .

It is tempting , and a pitfall, to use the equation

The problem with slow drift originates in the fact that it violates the requirement of ergodicity. This means that even large N may not be large

(6)

enough to ascertain that ~ [ dNicN l

(5) p [u""iy"" , n ,p,C l :=

n;=l P[u(t)iy t ,ut _ 1,n ,p ,Cl

p [y(t)iy t _1,u t _ 1,n,p,Ml = P[d""le"" l

to infer about n ,p ,M,C. In doing so one ignores the possibility that for the data distribution (7)

~ [ y(t)iYt_1 , u"",c"" l '*' ~ [ y(t)iYt _ 1,Ut _ 1,c", l

i.e. the possibility that in spite of the obvious ly sound assumption of causality between the input and output sequences in the model, the sampled data y(t) may still be statistically dependent on current and future U(t+T), T> O. One possible cause of such dependence is "hidden variables": If the models M[ut,wt , t,n,pl .. y(t) and C[Yt , wt,t,n,p l .. u(t) have common components of Was arguments, then (8) p [Yoo'uooin,p,C,M l

appr oximates

~ [ dco Ie",l. The analysis yields some guidelines for designing C, M, and V for a number of important cases: • Identification using test signals: When experi ment input is provided by the same computer as is logging the responses , the Input model C will be able to extrapolate exactly from c k to pro duce u(t) between sampling points , usually by zero-order hold. • Identification under computer control: When the object is computer controlled , the same interpretation is valid . The sequence c is generated k by a control mechanism allocated outside C, M, and V. • Identification using "natural input" with or without "input error": When input is provided by an external source and measured with errors , then u(t ) is input to the transducer , its

k

Hence, in order to be able to infer anything about M, when separated from the input-generating process C (which is what one wants), one must simply be able to state that there are no common hidden variables. In other words, one must know that what caused u to vary cannot also cause y to vary through other channels than the object to be identified. That assumption cannot be verified from data only , and may not be easy to ascertain in practice , in particular when one does not know the cause of input variation, e . g . in cases when identification has to be carried out without a controlled input . That is the basic problem with using "natural input". Notice that statistical dependence between current y and future u does not prevent proper identification, when it is due to feedback via y. Then all dependence still passes through Channels the identifying computer has been prepared to expect, i.e. C and M. However , if feedback is administered through some other channel and the information on this is not in d , for instance by unrecorded mani pulations by an operator , then the model M arrived at will be adversely affected by such feedback . However again , even irregular , unplanned , and unrecorded manipulations need not be det r emental to identification, if they are done for causes not involv i ng feedback , since then they introduce no inadmissible statistical dependence. On the con t r a r y , they may be advantageous , if they help activating otherwise quiescent dynamic modes.

noise-contaminated value will be in the trans ducer output y(t ), and some or all values will k be in the data d(k) . In order to model the case correctly one should specify what is known about the external source using the Input model C. The arguments C and ware used to describe what is known and what is unknown respectively . It may suffice to specify the frequency distribution , and if that is unknown too, the parameters nand p may be used. However , it would not be sui table to model disturbance input as u, if the purpose of modelling is for designing control using the same input . In many cases it is reasonable to take a shortcut to the value of u(t) by interpolating between measured values of u(t ) . In k order to specify this one may equate c(k) with those data in d(k) that measure input , and then let C interpolate: C[c(k),c(k- 1 ),titk_12t2tk l .. u(t) . However , this is an approximation, neglecting both measurement and interpolation errors, and the statistical consequences of the approximation are difficult to analyse . • Identification of closed systems: When input is generated Qy feedback through other channels (than the data-logging computer) , then M must include also the controller , and the object of modelling is the closed-loop system . Unmeasureable output When the object output z, the source of which one wants to model , is not measured directly for iden tification , something is needed to tie z to what is observed , namely d. Obviously , there is no information in the data alone to provide a re l ation. However , the idea of "grey- box " identific ation is that the user may be able to provide enough inf ormation to determine such a relation in the follo wing way: The r elation M[ut, wt, n,p l .. z(t),y(t)
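As an illustration of the interpolation shortcut in the "natural input" case above (a Python sketch; it deliberately neglects the measurement and interpolation errors just noted):

```python
def C_interp(c_k, c_km1, t, t_k, t_km1):
    """C[c(k), c(k-1), t | t_{k-1} <= t <= t_k] -> u(t): linear interpolation
    between measured input samples taken from the data record."""
    lam = (t - t_km1) / (t_k - t_km1)   # position of t inside the sampling interval
    return (1.0 - lam) * c_km1 + lam * c_k
```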

Unmeasurable output

When the object output z, the source of which one wants to model, is not measured directly for identification, something is needed to tie z to what is observed, namely d. Obviously, there is no information in the data alone to provide a relation. However, the idea of "grey-box" identification is that the user may be able to provide enough information to determine such a relation in the following way: The relation M[u_t,w_t,n,p] → z(t),y(t), fitted to data d, yields estimates n̂, p̂, and ŵ_t. Then M[u_t,ŵ_t,n̂,p̂] → ẑ(t) estimates the unobservable z, if M contains enough a priori information. Sometimes the necessary relation can be provided in the form of an "observer", not necessarily constructed from knowledge of the object structure.

If enough information is not available a priori, then there is obviously nothing else to do than either to get such information, or to measure also at least parts of z. Such measurements (for instance laboratory tests) may be feasible during experiments for identification, since such rare occasions can bear higher measurement costs than the lasting periods when the model is used for control. This is cause for retaining the distinction between z and y in the model structure, even if z has to be measured. As a rule, M should describe the situation prevailing when the model is put to use, while the experiments should be described by C and D.

Computability

The following restriction originates in the necessity to limit the number of variables used in the calculations: the model M must have a finite-dimensional state, which may be interpreted as a quantity embodying all information in past input u_t,w_t that is relevant for predicting the future. How this necessary state is introduced in the structure specification is a matter of the user's preference, and of what is known about the physical interpretation of the state variables. That calls for introducing the first of the structure-classification indices, ρ(state). Three popular classes are

• {M|ρ(state) = 'implicit state'}:
M[u_t,w_t,n,p] → z(t),y(t)
where the state is not among the output. Typically, the model structure is in the form of a program too large to be integrated into the identification program. The model is available only as a source of responses to various input u,w,n,p. Even this representation must necessarily limit its use of memory, but how that is done is unknown to both the designer and the user of the identification program.

• {M|ρ(state) = 'explicit state'}:
M[u(t),w(t),x(t),t,n,p] → x(t+1),z(t),y(t)
The state variables are recognized a priori as having physical meanings. Typically, M is defined by algorithms integrating differential equations over one time quantum, with x(t) as start value. This means that the operator will have to specify also the integration algorithm, preferably selected from a library (a sketch follows this list).

• {M|ρ(state) = 'phase variables'}:
M[U_t,W_t,z_{t-1},y_{t-1},t,n,p] → z(t),y(t)
where U_t = {u(t),u(t-1),...,u(t-n_u)}, etc. The implicit state is x(t) = {U_{t-1},W_{t-1},z_{t-1},y_{t-1}}. This is a typical setup for black-box models, where little is known a priori of the internal structure. A general M also allows various function-series developments of unknown structures (Billings and Leontaritis 1983; Okita, Tanaka, and Yoshida 1985); an example is bilinear models. Such a development is particularly beneficial for parameter estimation if the data d will depend linearly on the unknown coefficients introduced by the development (Parker and Perry 1981).
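For the explicit-state class, M is typically an algorithm integrating differential equations over one time quantum. A sketch (Python; a single Euler step stands in for a library integration routine, and the pendulum dynamics and parameter names are only illustrative assumptions):

```python
import numpy as np

def M_explicit(u_t, w_t, x_t, t, n, p, dt=0.05):
    """M[u(t), w(t), x(t), t, n, p] -> x(t+1), z(t), y(t).
    One Euler step of a damped pendulum; x = (angle, rate), w_t holds two
    primitives (process and measurement). t and n are unused in this toy."""
    g_over_l, damping, noise_gain, meas_std = p
    angle, rate = x_t
    x_next = np.array([
        angle + dt * rate,
        rate + dt * (-g_over_l * np.sin(angle) - damping * rate + u_t)
             + noise_gain * np.sqrt(dt) * w_t[0],
    ])
    z_t = angle                          # dependent output
    y_t = angle + meas_std * w_t[1]      # measuring transducer output
    return x_next, z_t, y_t
```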

The feasibility of deriving the case element wP from (C,M,D), i.e. an external model structure from an internal one, is the next obstacle. It will restrict the feasible structures. Introduce ρ(feasibility) to specify those restrictions.

The convenient case is when the user is able to specify the model structure directly in the form of an external structure:

• {M|ρ(feasibility) = 'external model'}:
MP[d_{k-1},c_k,w(k),n,p] → d(k)
or
wP[d_k,c_k,n,p] → w(k)
or
p[d(k)|d_{k-1},c_k,n,p]
where w(k) ∈ Gaussian(0,I).

Generally, finding predictors is a problem only if there are essential stochastic disturbances about which one knows that they do not enter additively, and if one wants that knowledge to affect the model. Also, such disturbances should be several in order to create serious problems (see below). In other cases, the identification program should be able to derive the external wP from the internal C,M,D automatically.

Now, the two crucial differences between an internal model M and an external model MP (or wP) are that

1) M produces output y(t) at all times t, while the output d(k) of MP may be a subset of y(t_k);

2) the number of primitive random variables dim[w(t)] in M may be greater than the number of observed variables dim[d(k)] = dim[w(k)].

An external model has a one-to-one correspondence between the data d and the primitive random variables w driving it. The latter can therefore be computed easily, and hence also the likelihood. When an internal model has more primitives w than data, the likelihood has to be computed as an average over w, conditional on data d, and with a complicated relation between w and d. Only if the latter relation is linear is the task manageable, and that yields a feasible method using quasilinearization (see below). However, if one is parsimonious with introducing primitives also in an internal model, inverting the model to compute w will be feasible, and will yield the same convenient shortcut to the likelihood function as external models offer. Even if the primitives cannot be kept few enough, all sorts of generalized inverses are still feasible. A reason for having more primitives than needed for statistical reasons is that they may correspond to physical noise sources. But in that case one would also have a basis for choosing the norm of a pseudo-inverse, namely the powers of the sources. This implies the following restriction:

• {M|ρ(feasibility) = 'invertible disturbance model'}:
M[u_t,w_t,t,n,p] → y(t)
D[y(t_k),k] → d(k)
where (M,D) has an inverse
{d(k),u_t,w_{t-1},t,k,n,p} → {w(t)}.
Using this recursively and setting w(k) = w(t_k) defines wP. A similar restriction applies to the input model C. Notice that the essential requirement is that for all t there be no more noise sources w(t) of disturbances than there are measured outputs d(k) at t_k. Usually this means the measurement of all y(t) at all times, but if one can assume that w(t) = 0 for t_{k-1} < t < t_k, and also for the components of w(t_k) corresponding to those of y(t_k) that are not measured, the model so defined would allow irregular measurements.
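When (M,D) is invertible in this sense, the residuals can be recovered recursively. A sketch (Python), assuming the scalar toy model d(k) = f(d(k-1), u(k-1), p) + g(p) w(k) with g ≠ 0, so that the inverse is explicit:

```python
import numpy as np

def wP_from_inverse(d, u, f, g, p):
    """Recover primitives w(k) by inverting M recursively:
    {d(k), u_t, w_{t-1}, ...} -> w(t), with w(k) = w(t_k)."""
    w = np.zeros(len(d))                         # w[0] stays 0: initial condition
    for k in range(1, len(d)):
        predicted = f(d[k - 1], u[k - 1], p)     # one-step output of M with w = 0
        w[k] = (d[k] - predicted) / g(p)         # solve the (here scalar) inverse
    return w

# usage with a toy structure: f linear in its arguments, constant noise gain
w = wP_from_inverse(np.array([0.0, 0.4, 0.7]), np.array([1.0, 1.0, 1.0]),
                    f=lambda y, u, p: p[0] * y + p[1] * u,
                    g=lambda p: p[2], p=np.array([0.8, 0.1, 0.3]))
```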

When invertibility cannot be assumed, the average of the predicted output over the primitives w has to be computed. The only known exact solution to that problem assumes additive, gaussian w_t. However, the solution can be applied approximately under the following circumstances: When the model M is differentiable in all variables, except possibly u(t), and the disturbances do not vary faster and more than is consistent with the assumption that ||w(t)||² is negligible but not w(t), then the model can be linearized with respect to the primitives around some reference trajectory (z^r_t, y^r_t) computed by setting w^r_t = 0. This yields the restriction

• {M|ρ(feasibility) = 'quasilinear disturbance model'}:
In the case of an explicit-state model the structure will be
M[u(t),0,x^r(t),t,n,p] → x^r(t+1), z^r(t), y^r(t)

A[u(t),t,n,p] x°(t) + F[u(t),t,n,p] w(t) → x°(t+1)
C[u(t),t,n,p] x°(t) + E[u(t),t,n,p] w(t) → z°(t), y°(t)

where superscript ° denotes deviation from the reference, and the matrices A, C, F, E are obtained by taking partial derivatives at the reference trajectory. When based on an explicit state-space model, the predictor will be the well-known "Kalman filter". Predictors for "phase-variable models" are also feasible (Bohlin 1970).
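A sketch of the resulting quasilinear likelihood evaluation (Python): a time-varying Kalman filter run along the reference trajectory, with the innovations fed into a likelihood of the form (3). The matrices A(t), C(t), F(t), E(t) are assumed already evaluated at the reference, and the cross-covariance that arises when the same w(t) drives both state and output is ignored for brevity; everything here is illustrative.

```python
import numpy as np

def quasilinear_loglik(d0, A, C, F, E):
    """d0[k]: measured deviations from the reference output y^r;
    A, C, F, E: lists of matrices linearized along the reference trajectory.
    Returns log L via the Kalman-filter innovations."""
    nx = A[0].shape[0]
    x = np.zeros(nx)                      # deviation-state estimate
    P = np.eye(nx)                        # its covariance
    logL = 0.0
    for k in range(len(d0)):
        e = d0[k] - C[k] @ x              # innovation (normalized residual source)
        S = C[k] @ P @ C[k].T + E[k] @ E[k].T
        _, logdet = np.linalg.slogdet(S)
        logL += -0.5 * (e @ np.linalg.solve(S, e) + logdet)
        K = P @ C[k].T @ np.linalg.inv(S)            # Kalman gain
        x = A[k] @ (x + K @ e)                       # measurement + time update
        P = A[k] @ (P - K @ C[k] @ P) @ A[k].T + F[k] @ F[k].T
    return logL
```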

Further structural restrictions

The restrictions introduced so far, viz. ρ(state) and ρ(feasibility), are necessary in order to write a program for evaluating the likelihood at all. In order to make both the evaluation and the search time-efficient, more restrictions are necessary, except in the case of small systems. The following restriction indicators have been used by Bohlin (1984):

• ρ(sampling) ∈ {'irregular', 'regular'}
• ρ(data record) ∈ {'incomplete', 'complete'}
Many identification methods, in particular of the recursive filtering type used for ARMA models, lean heavily on the requirement that there be data for all sampling instants and that all variables be sampled at the same frequency.
• ρ(state differentiation) ∈ {'numeric', 'analytic', 'linear'}
• ρ(parameter differentiation) ∈ {'numeric', 'analytic', 'linear'}
It is common experience that linearity in the parameters is a crucial restriction for the identification of large nonlinear systems. However, many otherwise nonlinear models can be rewritten in that form.
• ρ(residual metric) ∈ {'general', 'restricted'}
Certain restrictions make it possible to estimate residual covariances analytically, and hence to reduce the need for search (Bohlin 1984).
• ρ(time variation) ∈ {'time-dependent', 'autonomous'}
It is advantageous also to distinguish between time variation of the disturbances and of the object proper.

Similar operator-specified indicators affect the recursive evaluation of the likelihood and the search for the maximum.

CONCLUSIONS

Evaluating the likelihood function associated with a given model structure is the key to system identification based on partial a priori knowledge as well as experiment data. Writing programs for this task and for the class of all possible structures is not feasible in practice. However, investigating the effects of some limitations of statistical inference, mathematics, and computers on the task of system identification makes it possible to define structural restrictions such that evaluation is still feasible for a few but wide classes of structures. They differ in the way random disturbances are modelled. Under these restrictions it is feasible to write programs for interactive identification of nonlinear stochastic dynamic systems. The user's a priori information about the system structure may range from the assumption of a "black box" to full knowledge of the equations describing the internal physical phenomena governing the external behaviour.

REFERENCES

Billings , S . A., and I.J.Leontaritis (1983) . Proc . 6th IFAC Symposium on Identification and Sys tem parameter estimation , Washington DC , USA , 1982. Bohlin , T. (1970). Information pattern for linear discrete-time models with stochastic coefficients . IEEE Trans Automatic Control, AC-15 , 104-106. Bohlin;-or:- (197 8). :.1aximum-power validation of mo dels without higher-order fitting . Automatica, 14, 137-146. --Bohli~ . (1982). Model validation . Royal Institute of Technology , Sweden , TRITA-REG- 8203. Also in Sing M. (Ed.): Encyclopedia of systems and Control, Pergamon Press . Bohlin , T. (1984) . Computer-aided grey- box identification. Royal Institute of Technology , Sweden , TRITA-REG- 8403. Bohlin, T. (1985). Computer-aided Grey-box identification . Preprints 7th International Symposium on the Mathematical Theory of Networks and Systems, Stockholm , Sweden. Gustavsson , 1. , L. Ljung , and T. Soderstrom (1977) . Identification of processes in closed loop Identifiability and accuracy aspects . Automatic a , 13, 59- 75 . Kendall, M.~ , and A. Stuart (1967) . The advanced theory of statistics , vol. 2, Griffin , London . Okita, T. , S . Tanaka, and H. Yoshida (1985). Identification of nonlinear systems by Baysian theorem . Electr . Engn ., Jpn , ~, 126- 133. Parker, S.R., and F.A. Perry (1981) . A disc ret e ARMA model for nonlinear system identification. IEEE Trans Circuits and Systems , CAS- 28, 224-233. Peterka, V. ( 1981 ). Baysian system identification. Automatica , 11, 41 - 53 . Siferd, R.E. (1985) . Stochastic identifiability of nonlinear dynamical system parameters. Modeling and Simulation , ~ . Proceedings 14:th Annual Pittsburgh Conference. pitts burgh , PA , USA , April 1983 . Vadja , S. (1984). Structural identifiability of dynamic systems . Int J Syst Sci ., ~, 1229-1 247.