ON THE CONSISTENCY OF PREDICTION ERROR IDENTIFICATION METHODS

Lennart Ljung
Department of Automatic Control
Lund Institute of Technology
S-220 07 Lund, Sweden

1. INTRODUCTION
2. SYSTEMS, MODELS AND PREDICTION ERROR IDENTIFICATION METHODS
   2.1 SYSTEM DESCRIPTION
   2.2 MODELS
   2.3 IDENTIFICATION CRITERIA
   2.4 CONNECTION WITH MAXIMUM LIKELIHOOD ESTIMATION
3. CONSISTENCY AND IDENTIFIABILITY
4. CONSISTENCY FOR GENERAL MODEL STRUCTURES
   4.1 MAIN RESULT
   4.2 LINEAR SYSTEMS
5. IDENTIFIABILITY RESULTS
   5.1 A DETERMINISTIC SET D_I
   5.2 LINEAR TIME-INVARIANT SYSTEMS
6. CONCLUSIONS
APPENDIX
REFERENCES

1. INTRODUCTION

The problem of identification is to determine a model that

describes input-output data obtained from a certain system. The choice of model is as a rule made using some criterion of closeness to the data, see e.g. Åström-Eykhoff (1971), Soudack et al. (1971). In output error methods the discrepancy between the model output and the measured output is minimized.

The common model-reference methods are of this type, see e.g. Lüders-Narendra (1974). Equation error methods minimize the discrepancy in the input-output equation describing the model. Mendel (1973) has given a quite detailed treatment of these methods. Output and equation error methods were originally designed for noiseless data and deterministic systems.

If they are applied to noisy data or stochastic systems they will give biased parameter estimates, unless the noise characteristics either are known or are of a very special kind. A natural extension of these methods is to take the noise characteristics into account and compare the predicted output of the model with the output signal of the system. Minimization of criteria based on this discrepancy leads to the class of prediction error identification methods. This class contains, under suitable conditions, the maximum likelihood method.

The maximum likelihood method (ML method) was first introduced by Fisher (1912) as a general method for statistical parameter estimation. The problem of consistency for this method has been investigated by e.g. Wald (1949) and Cramér (1946) under the assumption that the obtained observations are independent. The first application of the ML method to system identification is due to Åström-Bohlin (1965), who considered single input-single output systems of difference equation form. In this case the mentioned consistency results are not applicable. Åström-Bohlin (1965) showed one possibility to relax the assumption of independent observations.

ML identification using state space models has been considered by e.g. Caines (1970), Woo (1970), Aoki-Yue (1970), and Spain (1971). Caines-Rissanen (1974) have discussed vector difference equations. All these authors consider consistency with probability one (strong consistency). Tse-Anton (1972) have proved consistency in probability for more general models.


Balakrishnan has treated ML identification in a number of papers, see e.g. Balakrishnan (1968).

In the papers dealing with strong consistency, one main tool is usually an ergodic theorem. To be able to apply such a result, significant idealization of the identification experiment conditions must be introduced. The possibilities of treating input signals that are partly determined by feedback are limited, and an indispensable condition is that the likelihood function must converge w.p.1. To achieve this, strict stationarity of the output is usually assumed. These conditions exclude many practical identification situations. For example, to identify unstable systems some kind of stabilizing feedback must be used. Other examples are processes that inherently are under time-varying feedback, like many economic systems.

In this paper strong consistency for general prediction error methods, including the ML method, is considered. The results are valid for general process models, linear as well as nonlinear. Also quite general feedback is allowed.

A general model for stochastic dynamic systems is discussed in Section 2. There also the identification method is described. Different identifiability concepts are introduced in Section 3, where a procedure to prove consistency is outlined. In Section 4 consistency is shown for a general system structure as well as for linear systems. The application of the results to linear time-invariant systems is discussed in Section 5.

2. SYSTEMS, MODELS AND PREDICTION ERROR IDENTIFICATION METHODS

2.1 SYSTEM DESCRIPTION

A causal discrete time, deterministic system, denoted by S, can be described by a rule to compute future outputs of the system from inputs and previous outputs:

y(t+1) = g_S(t; y(t),...,y(1); u(t),...,u(1); Y_0)    (2.1)

where Y_0, "the initial conditions", represents the necessary information to compute y(1). Often y(t+1) is not expressed as an explicit function of old variables, but some recursive way to calculate y(t+1) is preferred. Linear difference equations and state space models are well-known examples. The advantage of such a description is that only a finite and fixed number of old values are involved in each step.

For a stochastic system future outputs cannot be exactly determined by previous data as in (2.1). Instead the conditional probability distribution of y(t+1) given all previous data should be considered. It turns out to be convenient to subtract out the conditional mean and consider an innovations representation of the form

y(t+1) = E[y(t+1) | Y_t, S] + ε(t+1, Y_t, S)    (2.2)

where E[y(t+1) | Y_t, S] is the conditional mean given all previous outputs and inputs. Here Y_t denotes the σ-algebra generated by {y(t),...,y(1); u(t),...,u(1); Y_0}, and Y_0, "the initial condition", represents the information available at time t = 0 about the previous behavior of the system. The sequence {ε(t+1, Y_t, S)} is a sequence of random variables for which

E[ε(t+1, Y_t, S) | Y_t] = 0

holds. It consists of the innovations, see Kailath (1970).
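As a concrete illustration of (2.2), the sketch below simulates a scalar AR(1) system and checks empirically that the innovations are zero-mean and uncorrelated with the past output. The system, its parameter value and all numerical choices are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical scalar AR(1) system: y(t+1) = a*y(t) + e(t+1).
# Its conditional mean E[y(t+1) | Y_t, S] is a*y(t), so the
# innovation eps(t+1, Y_t, S) is simply e(t+1).
rng = np.random.default_rng(0)
a, N = 0.8, 100_000

e = rng.normal(size=N)
y = np.zeros(N)
for t in range(N - 1):
    y[t + 1] = a * y[t] + e[t + 1]

pred = a * y[:-1]        # E[y(t+1) | Y_t, S] for t = 0..N-2
innov = y[1:] - pred     # realized innovations

# E[eps(t+1) | Y_t] = 0: empirically the innovations are zero-mean
# and uncorrelated with the past output y(t).
print(abs(np.mean(innov)) < 0.05)                     # True
print(abs(np.corrcoef(innov, y[:-1])[0, 1]) < 0.05)   # True
```

Note that the innovation at time t+1 is uncorrelated with y(t), not with y(t+1); the pairing in the correlation check reflects this.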

The conditional mean E[y(t+1) | Y_t, S] will also be called the prediction of y(t+1) based on Y_t. Since it will frequently occur in this report a simpler notation

ŷ(t+1|S) = E[y(t+1) | Y_t, S] = g_S(t; y(t),...,y(1); u(t),...,u(1); Y_0)    (2.3)

will be used.

REMARK. It should be remarked that the description (2.2) to some extent depends on Y_0. Two cases of choice of Y_0 will be discussed here. The most natural choice is of course Y_0 = the actual a priori information about previous behavior known to the "model-builder". A disadvantage with this choice is that in general E[y(t+1) | Y_t, S] will be time varying even if the system allows a time-invariant description. This point is further clarified below. A second choice is Y_0 = Ȳ_0 = the information equivalent (from the viewpoint of prediction) to knowing all previous y(t), u(t), t < 0. This choice gives simpler representations of E[y(t+1) | Y_t, S], but has the disadvantage that Ȳ_0 is often not known to the user. Both choices will be discussed in more detail for linear systems below.

It should also be clear that the function g_S in (2.3) can be taken as independent of any feedback relationships between u and y, such that u(t) is independent of future ε(k, Y_{k-1}, S), k > t.

General stochastic systems can be described by (2.2), just as (2.1) is a general description of deterministic systems. The main results of this paper will be formulated for this general system description (2.2).

For practical reasons, in the usual system descriptions the output is not given explicitly as in (2.2). Various recursive ways to calculate y(t+1) are used instead. Examples are given below.

EXAMPLE 2.1. Linear Systems in State Space Representation

State space representations are a common and convenient way of describing linear, time-varying systems. The input-output relation for the system S is then defined by

x(t+1) = A_S x(t) + B_S u(t) + v(t)
y(t)   = C_S x(t) + e(t)    (2.4)

where {e(t)} and {v(t)} are sequences of independent Gaussian random vectors with zero mean values and E e(t)e(t)^T = R_S(t), E e(t)v(t)^T = R^c_S(t) and E v(t)v(t)^T = Q_S(t). The system matrices may very well be time-varying, but the time argument is suppressed.

The function E[y(t+1) | Y_t, S] = ŷ(t+1|S), where Y_t is the σ-algebra generated by {y(t),...,y(1), u(t),...,u(1), Y_0}, is obtained as follows:

ŷ(t+1|S) = C_S x̂(t+1|S)    (2.5)

where the state estimate x̂ is obtained from standard Kalman filtering:

x̂(t+1|S) = A_S x̂(t|S) + B_S u(t) + K_S(t){y(t) − C_S x̂(t|S)}    (2.6)

K_S(t) is the Kalman gain matrix, determined from A_S, B_S, C_S, R_S, R^c_S and Q_S. The information Y_0 is translated into an estimate of the initial value x̂(0|S) with corresponding covariance P_S(0). Then (2.6) can be solved recursively from t = 0. The representation (2.6) then holds for any Y_0, and in this case it is convenient to let Y_0 be the actual a priori knowledge about the previous behavior of the system.
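A minimal numerical sketch of the predictor (2.5)-(2.6), for a scalar, time-invariant instance of (2.4). The values of A_S, B_S, C_S and the noise covariances are illustrative assumptions, and the cross-covariance between e(t) and v(t) is taken as zero.

```python
import numpy as np

# Scalar, time-invariant instance of (2.4) with illustrative values.
A, B, C = 0.9, 1.0, 1.0
Q, R = 0.1, 1.0      # E v(t)^2 and E e(t)^2; cross-covariance assumed zero

def kalman_predictor(y, u, x0=0.0, P0=1.0):
    """Return yhat[t] = yhat(t+1|S) computed via (2.5)-(2.6)."""
    xhat, P = x0, P0     # initial estimate and covariance supplied by Y0
    yhat = np.zeros(len(y))
    for t in range(len(y)):
        K = A * P * C / (C * P * C + R)                     # Kalman gain K_S(t)
        xhat = A * xhat + B * u[t] + K * (y[t] - C * xhat)  # (2.6)
        P = A * P * A + Q - K * (C * P * C + R) * K         # Riccati recursion
        yhat[t] = C * xhat                                  # (2.5)
    return yhat
```

On data simulated from this system the one-step prediction error variance settles near C·P·C + R, with P the stationary Riccati solution, which is smaller than the raw output variance; with time-invariant matrices the gain K also converges to a constant, as remarked in the text.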

Notice that if the system matrices and covariance matrices all are time invariant and Y_0 = Ȳ_0, then also K_S will be time invariant.

A continuous time state representation can be chosen instead of (2.4). In e.g. Åström-Källström (1973) and Mehra-Tyler (1973) it is shown how E[y(t+1) | Y_t, S], where Y_t is as before, can be calculated. The procedure is analogous to the one described above. □

EXAMPLE 2.2. General Linear, Time-Invariant Systems

A linear time-invariant system can be described as

y(t) = G_S(q^{-1}) u(t) + H_S(q^{-1}) e(t)    (2.7)

where q^{-1} is the backward shift operator: q^{-1} u(t) = u(t−1), and G_S(z) and H_S(z) are matrix functions of z. The variables {e(t)} form a sequence of independent random variables with zero mean values and covariance matrices E e(t)e(t)^T = Λ_S (which actually may be time-varying). It will be assumed that G_S(z) and H_S(z) are matrices with rational functions of z as entries and that H_S(0) = I. The latter assumption implies that e(t) has the same dimension as y(t), but this is no loss of generality. Furthermore, assume that det H_S(z) has no zeroes on or inside the unit circle. This is no serious restriction, cf. the spectral factorization theorem. Then H_S^{-1}(q^{-1}) is a well defined exponentially stable linear filter, which is straightforwardly obtained by inverting H_S(z).

To rewrite (2.7) on the form (2.2) requires some caution regarding the initial values. If Y_0 does not contain enough information the representation (2.2) will be time varying, even though (2.7) is time-invariant. In such a case a state space representation can be used. A simpler approach is to assume that Y_0 = Ȳ_0 = the information equivalent to knowing all previous y(t), u(t), t < 0. It will follow from the analysis in the following sections that this assumption is quite relevant for identification problems. From (2.7) we have

y(t+1) = [I − H_S^{-1}(q^{-1})] y(t+1) + H_S^{-1}(q^{-1}) G_S(q^{-1}) u(t+1) + e(t+1)    (2.8)

Since H_S(0) = I, the right hand side of (2.8) contains y(s) and u(s) only up to time t. The term e(t+1) is independent of these variables, also in the case u is determined from output feedback. Hence, if Y_0 = Ȳ_0,

E[y(t+1) | Y_t, S] = [I − H_S^{-1}(q^{-1})] y(t+1) + H_S^{-1}(q^{-1}) G_S(q^{-1}) u(t+1)    (2.9)
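A sketch of how (2.9) can be computed in practice, for a hypothetical first-order special case of (2.7): y(t) = b·u(t−1) + e(t) + c·e(t−1), i.e. G_S(z) = bz and H_S(z) = 1 + cz (so H_S(0) = 1, and |c| < 1 keeps the zero of H_S outside the unit circle). The coefficients are illustrative assumptions.

```python
import numpy as np

# First-order special case of (2.7): y(t) = b*u(t-1) + e(t) + c*e(t-1).
# Inverting H_S(q^-1) = 1 + c*q^-1 yields the innovations recursively,
#   e(t) = y(t) - b*u(t-1) - c*e(t-1),
# and the prediction (2.9) specializes to  yhat(t+1|S) = b*u(t) + c*e(t).
def one_step_predictor(y, u, b=0.5, c=0.7):
    e = 0.0                    # zero initial condition (zero past, cf. Ybar_0)
    yhat = np.zeros(len(y))    # yhat[t] predicts y(t+1)
    for t in range(len(y)):
        uprev = u[t - 1] if t > 0 else 0.0
        e = y[t] - b * uprev - c * e    # innovation estimate e(t)
        yhat[t] = b * u[t] + c * e      # prediction of y(t+1)
    return yhat
```

On data simulated from the same model with zero initial conditions, the prediction errors y(t+1) − ŷ(t+1|S) reproduce the driving noise e(t+1) exactly, which is what (2.8) promises.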

Now, linear systems are often not modelled directly in terms of the impulse response functions G_S and H_S. A frequently used representation is the vector difference equation (VDE):

A_S(q^{-1}) y(t) = B_S(q^{-1}) u(t) + C_S(q^{-1}) e(t)    (2.10)

Another common representation is the state space form (in the time-invariant innovations representation form):

x(t+1) = A_S x(t) + B_S u(t) + K_S e(t)
y(t)   = C_S x(t) + e(t)    (2.11)

It is easily seen that these two representations correspond to

G_S(z) = A_S(z)^{-1} B_S(z),   H_S(z) = A_S(z)^{-1} C_S(z)    (2.12)

and

G_S(z) = C_S(zI − A_S)^{-1} B_S,   H_S(z) = C_S(zI − A_S)^{-1} K_S + I    (2.13)

respectively. In these two cases G_S(z) and H_S(z) will be matrices with rational functions as entries. Inserting (2.12) into (2.9) it is seen that E[y(t+1) | Y_t, S] = ŷ(t+1|S) is found as the solution of

C_S(q^{-1}) ŷ(t+1|S) = [C_S(q^{-1}) − A_S(q^{-1})] y(t+1) + B_S(q^{-1}) u(t)    (2.14)

for the case of a VDE model. Solving (2.14) requires knowledge of y(0),...,y(−n), u(0),...,u(−n), ŷ(0|S),...,ŷ(−n|S). This information is contained in Ȳ_0. For the state space model (2.11), ŷ(t+1|S) is found from

x̂(t+1|S) = A_S x̂(t|S) + B_S u(t) + K_S {y(t) − C_S x̂(t|S)}
ŷ(t+1|S) = C_S x̂(t+1|S)    (2.14b)

where the initial condition x̂(0) is obtained from Ȳ_0.

Notice that there is a parameter redundancy in the representations (2.10) and (2.11). All matrix polynomials A_S, B_S and C_S, and all matrices A_S, B_S, C_S, K_S that satisfy (2.12) and (2.13) respectively, correspond to the same system (2.7). □

These examples cover linear, possibly time varying systems. Clearly, also non-linear systems can be represented by (2.3). A simple example is

y(t+1) = f(y(t), u(t)) + a(y(t)) e(t+1)

It should, however, be remarked that it is in general no easy problem to transform a nonlinear system to the form (2.2). This is, in fact, equivalent to solving the nonlinear filter problem. It is therefore advantageous to directly model the nonlinear system on the form (2.2), if possible.

2.2 MODELS

In many cases the system characteristics, i.e. the function g_S in (2.2) and the properties of {ε(t+1, Y_t, S)}, are not known a priori. One possibility to obtain a model of the system is to use input-output data to determine the characteristics. In this paper we shall concentrate on the problem of how the function g_S can be found.

Naturally, it is impossible to find a general function g_S(t; y(t),...,y(1); u(t),...,u(1); Y_0). Therefore the class of functions among which g_S is sought must be restricted. We will call this set of functions the model set or the model structure. Let it be denoted by M, and let the elements of the model set be indexed by a parameter vector θ. The set over which θ varies will be denoted by D_M. A certain element of M will be called a model and be denoted by M(θ), or written as

ŷ(t+1|θ) = g_{M(θ)}(t; y(t),...,y(1); u(t),...,u(1); Y_0)

A complete model of the system also models the sequence {ε(t+1, Y_t, S)}, so that it is described by

y(t+1) = g_{M(θ)}(t; y(t),...,y(1); u(t),...,u(1); Y_0) + ε(t+1, Y_t, M(θ))    (2.18)

where {ε(t+1, Y_t, M(θ))} is a sequence of random variables with conditional distribution that depends on M(θ). For brevity, the notation

ŷ(t+1|θ) = E[y(t+1) | Y_t, M(θ)]

is also used for the prediction.

REMARK. The notions E[y(t+1) | Y_t, M(θ)] and ŷ(t+1|θ) are used to indicate that certain probabilistic arguments underlie the construction of the rules for how to calculate a good estimate of the value y(t+1) from previous data. It must however be stressed that the notions are purely formal and should be understood just as replacements for the function g_{M(θ)}. There is no underlying "model-probability space" in the discussion, and all random variables, including ε(t+1|θ), belong to the probability space of the true system. (The author is indebted to Prof. L. E. Zachrisson, who suggested this clarification.)

The model structures can be chosen in a completely arbitrary way. For example, g can be expanded into orthogonal function systems. Such choices are discussed by e.g. Lampard (1955). If there is no natural parametrization of the model, such an expansion may be advantageous. Tsypkin (1973) has discussed models of this type in connection with identification of nonlinear systems. However, the usual choice is to take one of the models in Example 2.1 or 2.2 and introduce unknown elements θ_i into the system matrices. A vector difference equation model, e.g., is then described by

A_{M(θ)}(q^{-1}) y(t) = B_{M(θ)}(q^{-1}) u(t) + C_{M(θ)}(q^{-1}) ε(t; M(θ))

where {ε(t; M(θ))} is a sequence of independent random variables with zero mean values and E ε(t, M(θ)) ε(t, M(θ))^T = Λ_{M(θ)}. The unknown elements may enter quite arbitrarily in the matrices A_{i,M(θ)} etc. Some elements may be known from basic physical laws, or a priori fixed. Other elements may be related to each other, etc. Generally speaking, M can be described by the way the parameter vector θ enters in the matrices: the model parameterization.

Thus, for time-invariant linear systems the choice of model type (vector difference equation, state space representation, etc.) and parameters can be understood as a way of parametrizing G and H in (2.7): G_{M(θ)} and H_{M(θ)} via (2.12) or (2.13).
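As a concrete instance of a parametrized model structure M, the sketch below implements the predictor of a hypothetical first-order VDE, (1 + a·q^{-1}) y(t) = b·u(t−1) + (1 + c·q^{-1}) ε(t; M(θ)), with parameter vector θ = (a, b, c). By (2.14) the predictor satisfies (1 + c·q^{-1}) ŷ(t+1|θ) = (c − a)·y(t) + b·u(t). The orders and numbers are illustrative assumptions.

```python
import numpy as np

# Predictor of a first-order VDE model structure with theta = (a, b, c):
#   (1 + a*q^-1) y(t) = b*u(t-1) + (1 + c*q^-1) eps(t; M(theta)).
# From (2.14):  (1 + c*q^-1) yhat(t+1|theta) = (c - a)*y(t) + b*u(t),
# solved recursively with zero initial conditions, Y0(theta) = 0.
def model_predictor(theta, y, u):
    a, b, c = theta
    yhat = np.zeros(len(y))   # yhat[t] = yhat(t+1 | theta)
    yh = 0.0
    for t in range(len(y)):
        yh = (c - a) * y[t] + b * u[t] - c * yh
        yhat[t] = yh
    return yhat
```

Evaluated at the true parameter vector of a system in this structure, the prediction errors recover the noise sequence exactly; at other θ in D_M they are larger, which is what the criteria of Section 2.3 exploit.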

REMARK. As for the system description, the model description also depends on the initial conditions Y_0. It would be most sensible to choose Y_0 as the actual a priori knowledge, but as remarked previously, this gives more complex algorithms for computing the prediction. For time-invariant systems it will therefore be assumed in the sequel that Y_0 = Ȳ_0 = knowledge of all previous history. Since Ȳ_0 is in general not known, it has to be included in the model: Ȳ_0 = Ȳ_0(θ). Often it is sufficient to take Ȳ_0(θ) = 0 for all θ, i.e. u(t) = y(t) = 0, t < 0, corresponding to zero initial conditions in (2.14) and (2.14b).

2.3 IDENTIFICATION CRITERIA

The purpose of the identification is to find a model M(θ) that in some sense suitably describes the measured input and output data.

The prediction of y(t+1) plays an important role for control. In, e.g., linear quadratic control theory, the optimal input shall be chosen so that E[y(t+1) | Y_t, S] has desired behavior. This is the separation theorem, see e.g. Åström (1970). Therefore, it is very natural to choose a model that gives the best possible prediction. That is, some function of the prediction error

y(t+1) − E[y(t+1) | Y_t, M(θ)]

should be minimized with respect to θ.

We will consider the following class of criteria. Introduce the matrix

Q_N(M(θ)) = Σ_{t=0}^{N−1} R^{1/2}(t) [y(t+1) − ŷ(t+1|θ)] [y(t+1) − ŷ(t+1|θ)]^T R^{1/2}(t)    (2.15)

where n_y is the number of outputs. Its dimension is n_y × n_y. {R(t)} is a sequence of positive definite matrices. It is assumed that {|R(t)|} is bounded. The selection of the matrices naturally affects the relative importance given to the components of the prediction. A special choice of weighting matrices is discussed in Section 2.4.

A scalar function, h[Q_N(M(θ))], of the matrix of prediction errors will be minimized with respect to θ. For the minimization to make sense, some simple properties of the function h must be introduced.

PROPERTIES OF h. Let h be a continuous function with n_y × n_y symmetric matrices as domain. Assume that

h(λA) = g(λ) h(A),   λ, g(λ) scalars and g(λ) > 0 for λ > 0    (2.16a)

Let A be a symmetric positive definite matrix with δI < A < (1/δ)I, let B be symmetric, positive semidefinite and nonzero, and let C_ε be symmetric with tr C_ε^T C_ε < ε. Assume that then

h(A + B + C_ε) > h(A) + p(δ) tr B   for ε < ε_0, where p(δ) > 0 depends only on δ and tr B    (2.16b)

If h satisfies (2.16), it defines a well posed identification criterion by

V_N(θ) = h[Q_N(M(θ))]    (2.17)

In particular, h(A) will be taken as tr A, which clearly satisfies (2.16). This criterion is probably the easiest one to handle, theoretically as well as computationally. Then

tr Q_N(M(θ)) = Σ_{t=0}^{N−1} |y(t+1) − ŷ(t+1|θ)|²_{R(t)}

where |x|²_{R(t)} = x^T R(t) x.
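The sketch below evaluates the criterion (2.17) with h(A) = tr A and R(t) = I for a hypothetical scalar AR(1) model structure ŷ(t+1|θ) = θ·y(t), and minimizes it by a simple grid search. Data are simulated from θ_0 = 0.6; all numbers are illustrative assumptions.

```python
import numpy as np

# Simulate data from a hypothetical AR(1) system with theta0 = 0.6.
rng = np.random.default_rng(2)
theta0, N = 0.6, 5000
e = rng.normal(size=N)
y = np.zeros(N)
for t in range(N - 1):
    y[t + 1] = theta0 * y[t] + e[t + 1]

def V_N(theta):
    """(1/N) tr Q_N(M(theta)) with R(t) = I: mean squared prediction error."""
    eps = y[1:] - theta * y[:-1]      # prediction errors of yhat(t+1|theta)
    return np.mean(eps ** 2)

# Minimize over a grid of candidate theta in D_M = [-0.99, 0.99].
grid = np.linspace(-0.99, 0.99, 397)
theta_hat = grid[np.argmin([V_N(th) for th in grid])]
print(theta_hat)    # approximately theta0
```

The minimizing value lands near θ_0, anticipating the consistency results of Sections 3-4.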

Another possible choice is h(A) = det(A), which is of interest because of its relation to the likelihood function, cf. Section 2.4.

LEMMA 2.1: h(A) = det(A) satisfies (2.16).

Proof. Condition (2.16a) is trivially satisfied. Further,

det(A + B + C_ε) = det A · det(I + A^{-1/2}(B + C_ε)A^{-1/2}) = det A · Π_{i=1}^{n_y} (1 + d_i)

where d_i are the eigenvalues of A^{-1/2}(B + C_ε)A^{-1/2}. Let λ be the largest eigenvalue of B. Then λ ≥ tr B / n_y. Also, A^{-1/2}BA^{-1/2} has one eigenvalue that is larger than or equal to λδ (consider A^{-1/2}BA^{-1/2}x, where x is an eigenvector of B with eigenvalue λ). Now, adding C_ε to B can distort the eigenvalues by at most ||C_ε||/δ, where ||C_ε|| is the operator norm of C_ε. Hence det(A + B + C_ε) exceeds det A + p(δ) tr B for ε sufficiently small, which concludes the proof. □

In this chapter we will consider the limiting properties of the estimate θ̂(N)

that minimizes (2.17) as N tends to infinity. Of particular interest is of course whether the limiting values of θ̂ give models M(θ) that properly describe S. This is the problem of consistency of prediction error identification methods.

So far we have only discussed how the function E[y(t+1) | Y_t, S] can be estimated. The properties of {ε(t+1, Y_t, S)} can

then be estimated from the residuals

ε(t+1, θ*) = y(t+1) − ŷ(t+1|θ*)

where θ* is the minimizing value. In particular, if {ε(t+1, Y_t, S)} = {e(t+1)} is a stationary sequence of independent random variables with zero mean values, and we are only interested in the second order moment properties, then Λ = E e(t)e(t)^T can be estimated as (1/N) Q_N(M(θ*)), where Q_N is defined by (2.15) with R(t) = I.

2.4 CONNECTION WITH MAXIMUM LIKELIHOOD ESTIMATION

It is well known that prediction error criteria are intimately connected with maximum likelihood estimates. This section contains a brief discussion of how the formal relations can be established.

Consider the model (2.18). Let the innovations {ε(t, M(θ))} be assumed to be independent and normally distributed, with zero mean values and covariance matrices Λ(t). The probability density of y(t+1) given Y_t, and given that (2.18) is true, then is

f(x | Y_t) = (2π)^{−n_y/2} (det Λ(t))^{−1/2} exp{−½ [x − ŷ(t+1|θ)]^T Λ^{−1}(t) [x − ŷ(t+1|θ)]}

Here f(x | Y_t) = F′(x | Y_t), where F(x | Y_t) = P(y(t+1) ≤ x | Y_t). Using Bayes' rule the joint probability density of y(t+1) and y(t) given Y_{t−1} can be expressed as

f(x_{t+1}, x_t | Y_{t−1}) = f(x_{t+1} | Y_t) f(x_t | Y_{t−1})

where y(t) in ŷ(t+1|θ) should be replaced by x_t. In case ŷ(t+1|θ) does not depend linearly on y(t), the distribution of (y(t+1), y(t)) is not jointly normal. Iteration directly gives the joint probability density of y(t+1), y(t),..., y(1). The logarithm of the likelihood function, given Y_0, then is obtained as

log L_N(θ) = −½ Σ_{t=0}^{N−1} [y(t+1) − ŷ(t+1|θ)]^T Λ^{−1}(t) [y(t+1) − ŷ(t+1|θ)] − ½ Σ_{t=0}^{N−1} log det Λ(t) + const.

The maximum likelihood estimate (MLE) of θ therefore is obtained as the element that minimizes −log L_N(θ). If the matrices Λ(t) are known, the MLE is consequently obtained as the minimizing point of the loss function (2.17) with h(A) = tr(A) and R(t) = Λ^{−1}(t). When the Λ(t) are unknown, the minimization should be performed also with respect to {Λ(s)}. In case Λ(t) does not depend on t, the minimization with respect to Λ can be performed analytically, Eaton (1967), yielding the problem to minimize det[Q_N(M(θ))] [where R(t) = I in Q_N(M(θ))], giving θ̂(N), and then take

Λ̂ = (1/N) Q_N(M(θ̂(N)))

Summing up, the loss function (identification criterion) (2.17), which per se has a good physical interpretation, also corresponds to the log likelihood function in the case of independent and normally distributed innovations. In the analysis, however, this will not be exploited. The results are therefore valid for general distributions of the innovations.

3. CONSISTENCY AND IDENTIFIABILITY

The question of identifiability concerns the possibility to

determine the characteristics of a system using input output data. This question is obviously closely related to the problem of consistency of the parameter estimate θ̂. A way to connect the two concepts is introduced in this section. The definitions given here are consistent with those of Ljung-Gustavsson-Söderström (1974).

The consistency of the parameter estimate θ̂ depends on a variety of conditions, such as noise structure, choice of input signal, model parametrization etc. One specific problem is that there usually is parameter redundancy in the models. It was demonstrated in Examples 2.1 and 2.2 that several sets of matrices give the same input output relationships, and hence cannot be distinguished from each other from measurements of inputs and outputs. Introduce the set

D_T(S, M) = {θ ∈ D_M | (1/N) Σ_{t=0}^{N−1} E |ŷ(t+1|θ) − ŷ(t+1|S)|² → 0 as N → ∞}    (3.1)

The set D_T(S, M) consists of all parameters in D_M which give models that describe the system without error in the mean square sense. There might be differences between S and M(θ), θ ∈ D_T(S, M), due to initial conditions and discrepancies at certain time instances, but on the whole they are indistinguishable from input-output data only. For the case of linear, time-invariant systems it is easy to see that D_T(S, M) can be described as

D_T(S, M) = {θ ∈ D_M | G_{M(θ)}(z) = G_S(z) and H_{M(θ)}(z) = H_S(z)}

Clearly, it is not meaningful to consider consistency if D_T(S, M) is empty. Therefore, unless otherwise stated, it will be assumed that M is such that D_T(S, M) is nonempty. Naturally, this is a very strong assumption in practice, since it implies that the actual process can be modelled exactly. However, the theory of consistency does not concern approximation of systems, but convergence to "true" values. In Ljung (1975) general convergence results are given, which are valid also for the case when D_T(S, M) is empty.
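A small numerical illustration of convergence toward D_T: for a scalar AR(1) structure identified on data from an AR(1) system (so that D_T(S, M) is the single point θ_0), the minimizer of the tr-criterion has a closed form, and its distance to θ_0 shrinks as N grows. The system and all numbers are illustrative assumptions.

```python
import numpy as np

# theta_hat(N) = argmin of sum (y(t+1) - theta*y(t))^2, in closed form,
# for data from a hypothetical AR(1) system with theta0 = 0.7.
rng = np.random.default_rng(4)
theta0 = 0.7

def estimate(N):
    e = rng.normal(size=N)
    y = np.zeros(N)
    for t in range(N - 1):
        y[t + 1] = theta0 * y[t] + e[t + 1]
    return np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])

errors = [abs(estimate(N) - theta0) for N in (200, 2000, 20000)]
print(errors[-1] < 0.05)    # True: the estimate approaches D_T = {theta0}
```

This is of course only a sanity check on one realization, not a proof; the theorems of Section 4 give the actual convergence statement.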

The estimate based on N data, θ̂(N), naturally depends on S and M and on the identification method used, I. It also depends on the experimental conditions, like the choice of input signals, possible feedback structures etc. The experimental conditions will be denoted by X. When needed, these dependencies will be given as arguments. Suppose now that

θ̂(N) → D_T(S, M)  w.p.1 as N → ∞    (3.2)

REMARK. By this is meant that inf_{θ′ ∈ D_T} |θ̂(N) − θ′| → 0 with probability one as N → ∞. It does not imply that the estimate converges. □

Then the models that are obtained from the identification all give the same input-output characteristics as the true system. If we understand a system basically as an input-output relation, it is natural to say that we have identified the system if (3.2) holds:

DEFINITION 3.1: A system S is said to be System Identifiable under given M, I and X, (SI(M, I, X)), if θ̂(N) → D_T(S, M) w.p.1 as N → ∞.

If the objective of the identification is to obtain a model that can be used to design control laws, the concept of SI is quite adequate. Since all elements in D_T(S, M) give the same input-output relation, they also give equivalent feedback laws. When minimizing the criterion function, it may however lead to numerical difficulties if there is a non-unique minimum. If the objective is to determine some parameters that have physical significance, another concept is more natural.

DEFINITION 3.2: A system S is said to be Parameter Identifiable under given M, I and X, (PI(M, I, X)), if it is SI(M, I, X) and D_T(S, M) consists of only one point.

REMARK. Parameter identifiability is the normal identifiability concept, and it has been used by several authors, see e.g. Åström-Bohlin (1965), Balakrishnan (1968), Bellman-Åström (1970), Tse-Anton (1972) and Glover-Willems (1973). Usually the system matrices are assumed to correspond to a certain parameter value θ⁰ for the given model parametrization. In such a case the parameter θ⁰ is said to be identifiable w.p.1 (or in probability) if there exists a sequence of estimates that tends to θ⁰ w.p.1 (or in probability). Now, the sequence of estimates converges to θ⁰ w.p.1 if and only if it is PI(M, I, X) according to Def. 3.2 and D_T(S, M) = {θ⁰}. Therefore the definition just cited is a special case of the Definition 3.2 above.
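The distinction matters because of the parameter redundancy noted above. In the hypothetical parametrization G_{M(θ)}(z) = b·z·(1 + a·z)/(1 + a·z), θ = (a, b), the factor (1 + a·z) cancels, so every θ = (a, b_0) with the same b_0 gives the same input-output relation: D_T is a whole line in parameter space, and the system can be SI without being PI. The sketch checks this by comparing impulse responses.

```python
import numpy as np

# Impulse response of the (redundantly parametrized) model
#   y(t) + a*y(t-1) = b*u(t-1) + a*b*u(t-2),
# whose transfer function  b*z*(1 + a*z)/(1 + a*z)  reduces to b*z
# for every value of a.
def impulse_response(theta, n=20):
    a, b = theta
    u = np.zeros(n)
    u[0] = 1.0                  # impulse input
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = -a * y[t - 1] + b * u[t - 1]
        if t >= 2:
            y[t] += a * b * u[t - 2]
    return y

g1 = impulse_response((0.3, 2.0))
g2 = impulse_response((-0.5, 2.0))
print(np.allclose(g1, g2))      # True: different theta, same input-output map
```

Different values of b, on the other hand, do change the input-output map, so b is identifiable while a is not.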

Clearly, a system S can be PI(M, I, X) only if D_T(S, M) = {θ⁰}. This means that there exists a one to one correspondence between the transfer function and the parameter vector θ⁰. This one to one correspondence can hold globally or locally around a given value. The terms global and local identifiability have been used for the two cases, see e.g. Bellman and Åström (1970). Definition 3.2 clearly corresponds to global parameter identifiability.

The problem of obtaining such a one to one correspondence for linear systems is related to canonical representation of transfer functions. This is a field that has received much attention. The special questions related to canonical forms for identification have been treated by e.g. Åström-Eykhoff (1971), Caines (1971), Mayne (1972) and Rissanen (1974).

From the above discussion we conclude that the problem of consistency and identifiability can be treated as three different problems:

1. First determine a set D_I(S, M, I, X) such that

θ̂(N) → D_I(S, M, I, X)  w.p.1 as N → ∞

This is a statistical problem. To find such a set, certain conditions, mainly on the noise structure of the system, must be imposed.

2. Then demand that D_I(S, M, I, X) ⊂ D_T(S, M), i.e. that S is SI(M, I, X). This introduces requirements on the experimental conditions, X: the choice of input signal, feedback structures etc.

3. If so desired, require that D_T(S, M) consists of only one point. This is a condition on the model structure only, and for linear systems it is of algebraic nature.

In Lemma 4.1 and in Theorems 4.1 and 4.2 of the following section the set D_I

is determined for general model structures

(2.18), and linear systems respectively. Problem 2 is discussed in Section 5 for linear time-invariant systems. In Gustavsson-Ljung-Söderström (1974) problem 2 is extensively treated for vector difference equations. Problem 3 is, as mentioned, the problem of canonical representation and can be treated separately from the identification problem. It will not be discussed in this paper.

REMARK. In the following, the arguments S, M, I, X in D_T, D_I, SI and PI will be suppressed when there is no risk of ambiguity.

4. CONSISTENCY FOR GENERAL MODEL STRUCTURES

The problem to determine a set D_I is, as mentioned above, a statistical problem. The approach used in most works is to apply an ergodicity result to the criterion function (2.17) and then show that D_I is the set of global minima of the limit of the criterion function. However, to assume ergodicity of the involved processes introduces rather limiting conditions on the system, possible feedback structure and on the input signal. Furthermore, uniform (in θ) inequalities for the loss functions must be established. This is a fairly difficult problem, which in fact has been overlooked in many of the previous works.

The set into which the estimates converge will here be shown to be

D_I(S, M, I, X) = {θ ∈ D_M | lim inf_{N→∞} (1/N) Σ_{t=0}^{N−1} |ŷ(t+1|θ) − ŷ(t+1|S)|² = 0}    (4.1)

The reason for using limit inferior is that, under some circumstances, the limit may fail to exist.

It should also be noted that D_I may depend on the realization ω, D_I = D_I(ω), although in most applications it does not (a.e.); see Section 5. For adaptive regulators it is, however, sometimes useful to consider D_I as a function of ω.

If convergence into a set that does not depend on ω is desired, this can be achieved by showing that D_I(ω) ⊂ D̄_I w.p.1 for some deterministic set D̄_I. Then θ̂(N) → D̄_I w.p.1 as N → ∞.

4.1 MAIN RESULT

LEMMA 4.1.

C o n s i d e r the s y s t e m

where

Consider a set of models, empty.

Let

V N ( 0 ) = h[l/N

D (S,M) is nonT 0 (N) minimize the identification criterion (2.17) , Q,(M(0))],

compact set DM. z(t)

where

DA

=

Let

sup 0 €Di

M,

where DI (w)

max l
h

such that

L

satisfies.(2.16), over a

be defined by (4.1). Suppose that

I&

j(i)(tlO)

I

((i) denotes i-th row)

is an open set containing DM, satisfies the following

condition w.p.1

Assume further that

(4.3)

143

Predict ion Error Identification Methods

lim sup tr

N-)ao

1 N

Q (M(0)) < N

m

w.p.1 for any fixed 0



DM

(4.4)

and that (4.5) (for h = tr

this assumption (4.5) is not necessary).

Then the estimate

The proof of the lemma is given in the Appendix. To apply Lemma 4.1, conditions (4.3) and (4.4) have to be checked. This requires some analysis of the model structures.
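The minimization that Lemma 4.1 refers to is easy to picture numerically. The sketch below fits a first-order model by minimizing the quadratic prediction-error criterion V_N(θ) = (1/N) Σ ε(t, θ)² over a grid representing the compact set D_M; the ARX structure and all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first-order ARX system (illustrative numbers):
#   y(t) = a0 y(t-1) + b0 u(t-1) + e(t)
a0, b0, N = 0.7, 1.5, 5000
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = a0 * y[t - 1] + b0 * u[t - 1] + e[t]

y1, u1, target = y[:-1], u[:-1], y[1:]

def V(a, b):
    """Prediction-error criterion V_N(theta) = (1/N) sum eps(t, theta)^2."""
    eps = target - a * y1 - b * u1   # one-step-ahead prediction errors
    return np.mean(eps ** 2)

# Minimize over a grid covering the compact parameter set D_M.
grid_a = np.linspace(0.0, 1.0, 101)
grid_b = np.linspace(0.0, 2.0, 101)
best = min((V(a, b), a, b) for a in grid_a for b in grid_b)
print(best[1], best[2])   # near the true (a0, b0) = (0.7, 1.5)
```

The grid search stands in for whatever numerical minimizer is used in practice; the point is only that the criterion's minimizing argument approaches the true parameters as N grows.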

REMARK. If the search is restricted to a finite number of models, conditions (4.3) and (4.4) can be removed, and a smaller set than (4.1) can be chosen as D_I; see Ljung (1974).

4.2 LINEAR SYSTEMS

For the linear, time-invariant model described in Example 2.2 it is relatively easy to find the variable z(t) defined in Lemma 4.1. If the search for models is restricted to those corresponding to stable predictors, (4.3) will be satisfied if the input and output are subject to similar conditions. This is shown in Theorem 4.1.

THEOREM 4.1 (LINEAR, TIME-INVARIANT SYSTEMS). Consider the system (2.7), where E|e(t)|^4 < C (and, if the general criterion (2.17) is used, Ee(t)e(t)^T > δI), and let the model set be described by

y(t+1) = G_M(θ)(q^{-1})u(t) + H_M(θ)(q^{-1})e(t+1), θ ∈ D_M

where D_M is compact and G_M(θ)(z) and H_M(θ)(z) are matrices with rational functions as entries. Assume

- for VDE-parametrizations (2.12), that det H_M(θ)(z) has all zeroes outside the unit circle for θ ∈ D_M;
- for state space realizations (2.13), that A_M(θ) − K_M(θ)C_M(θ) has all eigenvalues inside the unit circle for θ ∈ D_M;
- for the general case, that det H_M(θ)(z) as well as the least common denominator of the denominator polynomials in G_M(θ)(z) and H_M(θ)(z) have all zeroes outside the unit circle for θ ∈ D_M.

Suppose also that D_T(S,M) (defined in (3.2)) is nonempty. Any feedback relationships between u and y may exist, but assume that the lim sup condition (4.7) on the data is satisfied. Then the identification estimate θ(N) converges into D_I w.p.1 as N tends to infinity.

In this case the set D_I can be given a more explicit characterization; see Section 5.2. The proof is given in the Appendix. It uses the fact that the linear filters that determine ŷ(t|θ) and (d/dθ)ŷ(t|θ) from u and y are exponentially stable.
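The stable-predictor requirement can be made concrete in the scalar case. For a model y(t) = b u(t−1) + e(t) + c e(t−1) (an illustrative ARMAX-type structure, not the chapter's general notation) the one-step-ahead predictor obeys the recursion ŷ(t) = b u(t−1) + c(y(t−1) − ŷ(t−1)), which is exponentially stable exactly when |c| < 1, i.e. when the noise filter H is invertible by a stable filter:

```python
import numpy as np

rng = np.random.default_rng(1)
b, c, N = 2.0, 0.5, 4000          # |c| < 1, so the predictor filter is stable

u = rng.standard_normal(N)
e = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = b * u[t - 1] + e[t] + c * e[t - 1]   # y = G u + H e with H = 1 + c q^-1

# One-step-ahead predictor  yhat(t) = b u(t-1) + c (y(t-1) - yhat(t-1)),
# i.e. yhat = H^-1 G u + (1 - H^-1) y, started from zero initial state.
yhat = np.zeros(N)
for t in range(1, N):
    yhat[t] = b * u[t - 1] + c * (y[t - 1] - yhat[t - 1])

eps = y[1:] - yhat[1:]
print(np.var(eps))   # close to var(e) = 1: the residuals recover the innovations
```

The zero-initial-state start only contributes an exponentially decaying transient, which is exactly why basing the predictor on the true initial data is unnecessary (compare the Remark after Theorem 5.1).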

For the time-varying model described in Example 2.1 it is known that the Kalman filter (see e.g. Jazwinski (1970), Theorem 7.4) is exponentially stable if the system is uniformly completely controllable and observable. Furthermore, the base of the exponential decay depends only on the bounds on the observability and controllability Gramians. Hence we have the following theorem:

THEOREM 4.2 (TIME-VARYING LINEAR SYSTEMS IN STATE SPACE FORM).

Consider the linear system described in Example 2.1 and suppose that the covariance matrices are uniformly bounded from above. (For the general criterion (2.17) assume also that C_S(t)P_S(t)C_S(t)^T + Q(t) > δI.) Let the set of models be defined over a compact set D_M such that D_T(S,M), defined by (3.1), is nonempty, where {e(t)} and {w(t)} are sequences of independent Gaussian variables, and such that the model pairs are uniformly (in t and in θ ∈ D_M) completely observable and uniformly (in t and in θ ∈ D_M) completely controllable. Any feedback relationships between u and y may exist, but assume that (4.7) is satisfied. Then the identification estimate θ(N) converges into D_I with probability one as N tends to infinity.

Theorems 4.1 and 4.2 determine the set D_I under quite general and weak conditions.

Actually, the imposed conditions (bounded fourth moments of the innovations, model search over stable predictors, and the condition (4.7) on the overall system behavior) are believed to be satisfied in almost all conceivable applications. For actual applications it is of interest to study D_I more closely: when is it possible to find a set D_I satisfying (4.2), i.e. D_I ⊂ D_T, and what additional assumptions on the input generation must be imposed in order to assure system identifiability? These questions are discussed in the next section.
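The exponential stability of the Kalman filter invoked for Theorem 4.2 can be probed numerically: under observability and controllability, the Riccati recursion for the filter covariance forgets its initial condition. A small sketch for a time-invariant special case follows; the matrices are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Hypothetical system x(t+1) = A x(t) + w(t), y(t) = C x(t) + v(t)
A = np.array([[0.9, 0.2], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)      # covariance of w
R = np.array([[1.0]])    # covariance of v

def riccati_iterate(P, n):
    """n steps of the Kalman-filter Riccati recursion for the covariance P(t)."""
    for _ in range(n):
        K = A @ P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        P = (A - K @ C) @ P @ A.T + Q
    return P

# Two very different initial covariances converge to the same steady state,
# reflecting the exponential stability of the predictor.
P1 = riccati_iterate(np.eye(2), 200)
P2 = riccati_iterate(100 * np.eye(2), 200)
print(np.abs(P1 - P2).max())   # essentially zero
```

For the genuinely time-varying case of Theorem 4.2 the same forgetting holds, with rate governed by the Gramian bounds, but the recursion then has time-dependent A(t), C(t), Q(t), R(t).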

5. IDENTIFIABILITY RESULTS

As outlined in Section 3, the identifiability and consistency questions can be separated into three problems. The first problem, to determine a set D_I, was solved in the previous section. The second problem, investigation of the set D_I and in particular the relation D_I ⊂ D_T, will be the subject of the present section.

5.1 A DETERMINISTIC SET D̄_I

D_I as defined in (4.1) is a random variable. However, in most applications

D_I = D̄_I w.p.1   (5.1)

where D̄_I is the deterministic set obtained by replacing the sample averages in (4.1) by their expectations, the expectation being with respect to the sequence of innovations. This deterministic set D̄_I may be easier to handle.

For linear systems the relation (5.1) will hold if the system is stable and operates in open loop, or contains linear feedback which makes the closed loop system exponentially stable. To include also nonlinear feedback terms, which make the closed loop system nonlinear, the concept of exponential stability has to be extended to stochastic, nonlinear systems.

DEFINITION 5.1.

Consider the linear system of (2.7), where the e(t) are independent random variables, and where part of the input u(t) is determined as (nonlinear) output feedback. Let the system and regulator be started up at time t − N, with zero initial conditions, yielding at time t the outputs y_t and y_t^{t−N} respectively. Suppose that the difference decays exponentially,

|y_t − y_t^{t−N}| ≤ ζ(Y_t) λ^N, λ < 1

where ζ is a scalar function of the data such that E ζ(Y_t)^4 < C. Then the closed loop system is said to be exponentially stable.

For linear feedback this definition is consistent with the requirement that the closed loop poles be inside the unit circle. It turns out that exponential stability assures not only

(5.1) but also (4.7). Hence we have the following lemma:

LEMMA 5.1. Consider the linear systems of Example 2.1 or Example 2.2. Let the input have the general form

u(t) = f_t(Y_t; u_r(t), w(t))

where u_r(t) is a signal that is independent of y(s), u(s), s < t, and {w(t)} is a sequence of disturbances of a filtered white noise character, say, which is independent of {e(t)} and such that E|w(t)|^4 < C. The function f_t may be unknown to the experiment designer. Assume that the input is such that the closed loop system is exponentially stable (Def. 5.1) and that D_M satisfies the assumptions of Theorem 4.2 or 4.1, respectively. Suppose that e(t), y(t) and u(t) have uniformly bounded fourth moments. Then (4.7) and (5.1) hold.

Proof. The proof is based on the following theorem due to Cramér and Leadbetter (1967): if the variances of a sequence of sample averages are summable in the sense of (5.3), then the averages converge to their means with probability one. It follows by straightforward calculations from the assumptions on exponential stability and on D_M that the variables involved satisfy (5.3). (For details, see Ljung (1974), Lemma 5.2.) This proves the lemma.
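The content of (5.1) can be illustrated numerically: for an exponentially stable closed loop, the sample criterion computed from different noise realizations approaches one and the same deterministic value. A sketch with a hypothetical first-order system under stabilizing proportional feedback (all numbers are illustrative assumptions):

```python
import numpy as np

def sample_criterion(theta, seed, N=20000):
    """(1/N) sum eps(t, theta)^2 for the illustrative system
    y(t) = 0.8 y(t-1) + u(t-1) + e(t) under the feedback u(t) = -0.5 y(t)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(N)
    y = np.zeros(N)
    u = np.zeros(N)
    for t in range(1, N):
        y[t] = 0.8 * y[t - 1] + u[t - 1] + e[t]   # closed-loop pole at 0.3
        u[t] = -0.5 * y[t]
    eps = y[1:] - theta * y[:-1] - u[:-1]          # model prediction errors
    return np.mean(eps ** 2)

# Different noise realizations give (nearly) the same limit value,
# so the limit set does not depend on the realization, as in (5.1).
vals = [sample_criterion(0.6, seed) for seed in range(5)]
print(max(vals) - min(vals))   # small
```

The spread across realizations shrinks as N grows, which is exactly the almost-sure convergence of the criterion that makes the deterministic set D̄_I usable in place of D_I(ω).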

5.2 LINEAR TIME-INVARIANT SYSTEMS

Let us now study in more detail linear time-invariant systems as treated in Example 2.2 and Theorem 4.1. Since this class includes any parametrization of vector difference equations or state space realizations, or any other parametrization of a linear time-invariant system, it is believed that such analysis is sufficient for most applications.

From Theorem 4.1 and Lemma 5.1 it follows that the estimates tend to the set D̄_I, which is characterized by the differences between the true transfer functions G_S, H_S and the model transfer functions G_M(θ), H_M(θ), weighted by the inverse noise filter.

This set clearly depends on the input signal. If the input is not sufficiently general, the set may contain parameters corresponding to models that describe the system well for the used input, but fail to describe it for other inputs. This is the case if the input contains too few frequencies or if it has certain relationships with the output. Then D̄_I is not contained in D_T and the system is not System Identifiable for this input (experiment condition).

The set D̄_I has been analysed in detail in Ljung-Gustavsson-Söderström (1974) for the case of time-varying feedback. Here we will consider a case with linear feedback and an extra input signal (or noise). Let the input be

u(t) = F(q^{-1})y(t) + u_R(t)

where {u_R(t)} is a sequence that does not depend on {e(t)}. Suppose that F(z) is a matrix with rational functions as entries, such that the closed loop system is stable. The closed loop system is

y(t+1) = (I − q^{-1}G_S(q^{-1})F(q^{-1}))^{-1} [G_S(q^{-1})u_R(t) + H_S(q^{-1})e(t+1)].

The prediction error of the model M(θ) can then be written as a filter K_θ(q^{-1}) acting on the innovations e plus a filter L_θ(q^{-1}) acting on u_R. Since the noise-driven part is independent of u_R, the expectation defining D̄_I splits into two nonnegative terms. If Ee(t)e(t)^T > δI, it then follows that

K_θ(q^{-1}) − L_θ(q^{-1})F(q^{-1}) = 0 for θ ∈ D̄_I   (5.4)

since the first term has to be arbitrarily close to zero infinitely often for θ ∈ D̄_I. If u_R is persistently exciting (see e.g. Mayne (1972)) of sufficiently high order, then this in turn implies that

L_θ(q^{-1}) = 0 for θ ∈ D̄_I

which, via (5.4), implies that

K_θ(q^{-1}) = 0 for θ ∈ D̄_I.

That is, D̄_I = D_T.
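The persistent excitation requirement on u_R can be checked on data: roughly, the sample covariance matrix of the lagged vector U_M(t) must stay bounded away from singularity. The sketch below (illustrative signals, and a simplified stand-in for condition (5.5)) contrasts white noise with a single sinusoid, which excites too few frequencies:

```python
import numpy as np

def min_excitation_eig(u, M):
    """Smallest eigenvalue of (1/N) sum U_M(t) U_M(t)^T,
    with U_M(t) = col(u(t), ..., u(t-M))."""
    N = len(u)
    U = np.column_stack([u[M - k : N - k] for k in range(M + 1)])
    return np.linalg.eigvalsh(U.T @ U / U.shape[0]).min()

rng = np.random.default_rng(2)
N = 10000
t = np.arange(N)
white = rng.standard_normal(N)      # persistently exciting of any order
sine = np.sin(0.3 * t)              # one frequency: rank-2 excitation only

print(min_excitation_eig(white, 4))  # bounded away from zero
print(min_excitation_eig(sine, 4))   # (numerically) zero: not PE of order 5
```

A single sinusoid spans only a two-dimensional subspace of the lag vectors, so for M ≥ 2 the matrix is singular; white noise keeps the smallest eigenvalue bounded below for every M, matching the "for any M" requirement in the remark above.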

REMARK. Let U_M(t) = col(u_R(t), …, u_R(t−M)). Then it is sufficient to assume that (5.5) holds; the limit of the sum in (5.5) does not have to exist, as it does in the definition of persistent excitation in Mayne (1972). The number M for which (5.5) has to be satisfied depends on S and on the parametrization of M. For state space representations M can be related to the orders of the system and model, see e.g. Mayne (1972). For the unspecified models which we deal with here, we can require that (5.5) holds for any M.

Summing up this discussion, Lemma 5.1 and Theorem 4.1, we have the following theorem.

THEOREM 5.1. Consider the system (2.7),

y(t+1) = G_S(q^{-1})u(t) + H_S(q^{-1})e(t+1)

where {e(t)} is a sequence of independent random variables such that E|e(t)|^4 < C and Ee(t)e(t)^T > δI. The input is

u(t) = F(q^{-1})y(t) + u_R(t)

where {u_R(t)} is independent of {e(t)} and satisfies (5.5) for any M. Assume that F is such that the closed loop system is exponentially stable. Let the model set, M, be described by

y(t+1) = G_M(θ)(q^{-1})u(t) + H_M(θ)(q^{-1})e(t+1), θ ∈ D_M

where D_M is compact and such that G_M(θ)(z) and H_M(θ)(z) satisfy the same conditions as in Theorem 4.1 for θ ∈ D_M, and such that D_T(S,M) is nonempty. Let θ(N) be the estimate of θ based on N data points, obtained by minimizing the general criterion (2.17). Then

θ(N) → D_T(S,M) with probability one as N → ∞.

That is, S is System Identifiable.

REMARK. Notice that, when evaluating the criterion (2.17), the predictor ŷ(t|θ) does not have to be based on the true initial data. As remarked several times above, it is most suitably chosen as the time-invariant, steady state predictor (2.9) initialized with zero initial state.
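Theorem 5.1 can be exercised numerically: with an external excitation u_R added to linear feedback, the prediction-error estimate (here a least-squares fit, which is the prediction-error estimate for an illustrative ARX structure) remains consistent despite the feedback. All numbers below are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
a0, b0, f, N = 0.5, 1.0, -0.4, 20000

# Illustrative system y(t) = a0 y(t-1) + b0 u(t-1) + e(t) under the
# feedback u = f*y + u_R, with white-noise external excitation u_R.
e = rng.standard_normal(N)
uR = rng.standard_normal(N)
y = np.zeros(N)
u = np.zeros(N)
for t in range(1, N):
    y[t] = a0 * y[t - 1] + b0 * u[t - 1] + e[t]
    u[t] = f * y[t] + uR[t]

# Least-squares fit of y(t) on (y(t-1), u(t-1)); the regressors are
# uncorrelated with e(t), so the estimate is consistent under feedback.
Phi = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta)   # close to (a0, b0) = (0.5, 1.0)
```

Removing u_R makes the experiment less informative, in line with the persistent excitation discussion of Section 5.2: the external signal is what separates the input's contribution from the noise fed back through F.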

The chapter by Caines and Chan in this volume contains several interesting, related results on identifiability of feedback systems.

6. CONCLUSIONS

In this contribution, consistency and identifiability properties of prediction error identification methods have been investigated. A separation of the problem into three different tasks has been introduced, and results of varying generality and on varying levels have been given. The results can be used as a kit for "doing-your-own consistency theorem" in a specific application. They also solve the identifiability and consistency problem for linear, time-invariant systems under quite general assumptions, as shown in Theorem 5.1.

The hard part in consistency theorems is believed to be to determine the set into which the estimates converge (Problem 1 in the formulation of Section 3). This has been solved for quite general model structures (Lemma 4.1) as well as for linear systems (Theorems 4.1 and 4.2). Due to the very weak conditions imposed, these results are applicable to adaptive systems of almost any kind, in addition to the more straightforward cases treated in Section 5.

The difficult and vital problem of choosing a parametrization (model set) that gives identifiable parameters and consistent parameter estimates has not been considered here. However, this problem is most conveniently treated as a problem of its own, and

it does not depend on the identification method chosen.

APPENDIX

A.1 PROOF OF LEMMA 4.1

The idea of the proof is to show that, for a fixed θ* not in D_I, the criterion V_N(θ) eventually exceeds the value attained on D_T, uniformly over an open sphere around θ* with radius ρ. Then the minimizing point θ(N) cannot belong to this sphere for N > N_0(θ*, ρ, ω). This result is then extended to hold for the complement of any open region containing D_I by applying the Heine-Borel theorem.

Let without loss of generality Λ(t) = I. Introduce, for short,

e(t) = y(t) − E(y(t) | Y_{t−1}; S) = y(t) − ŷ(t|S)

and consider the corresponding sample covariance matrix Q_N(S).

T

E[e(t)e(t)

St > 6 1

for a l l

IYt-,] t.

According t o t h e assumptions t' Each element of t h e m a t r i x = S

c l e a r l y i s a martingale with bounded v a r i a n c e , from which follows that w.p.1 and

2/6' >

1

' QN(s) 2 62 I

for where

as

n > N1(u), P(Ql)

= 1

N -+

m

o€n,

Lennart Ljung

I54

where

6 ' = min(6, 1 / C ) .

(The argument

suppressed i n t h e v a r i a b l e s

h

y, el y

w

w i l l a s a r u l e be

e t c . , b u t used e x p l i c i t l y

i n bounds.) Introduce a l s o

Then it follows from ( 4 .4 ) and ( A . l )

Now take a fixed element θ̄ ∈ D_T and consider the criterion value V_N(θ̄). Since θ̄ ∈ D_T, the prediction errors of M(θ̄) coincide by definition with the innovations e(t), so the sample covariance entering V_N(θ̄) equals Q_N(S). But from (A.1) this matrix converges w.p.1 as N → ∞, and hence, since h is continuous, V_N(θ̄) converges w.p.1 as well.

Next let θ* ∈ D_M be a fixed point and consider θ ∈ B(θ*, ρ) = {θ | |θ − θ*| < ρ}.
Introduce for short ỹ(t) = ŷ(t|θ) − ŷ(t|θ*). From the mean value theorem, |ỹ(t)| ≤ ρ z(t) if ρ is sufficiently small. We will first show that each element of the matrix of cross terms between the innovations and these predictor differences tends to zero w.p.1 as N tends to infinity. Let β_i(k) denote the i-th component of the second factor β(k) in these cross terms, and let r(t) = max(t, Σ_{k≤t} β_i(k)^2). It is easy to see that

Σ_{t=1}^N e_j(t)β_i(t)/r(t)

is a martingale with respect to {Y_N}, with bounded variance. Hence it converges with probability one (for ω ∈ Ω_4(θ*), P(Ω_4(θ*)) = 1) according to the martingale convergence theorem, Doob (1953). It now follows from Kronecker's lemma (see e.g. Chung (1968)) that, since r(N) → ∞,

(1/r(N)) Σ_{t=1}^N e_j(t)β_i(t) → 0 as N → ∞ for ω ∈ Ω_2.

But 1 ≤ r(N)/N ≤ C_1(ω, θ*) according to (A.2), and so

(1/N) Σ_{t=1}^N e_j(t)β_i(t) → 0 as N → ∞

for ω ∈ Ω_4(θ*) ∩ Ω_2; hence the cross terms vanish asymptotically.

h e d i c t i o n Error Identification Methodc

157

Property (2.16) can now be applied to (A.5), with the matrix split as (1/N)Q_N(θ*) + (1/N)Q̃_N(θ, θ*).

First introduce a countable subset D̃_M of D_M that is dense in D_M. Also introduce a subset Ω* of the sample space such that the convergence results above hold on Ω* for all elements of D̃_M; clearly P(Ω*) = 1. Consider from now on a fixed realization ω ∈ Ω*, and introduce the set D_M(ε, ω) of elements of D_M at distance at least ε from D_I(ω). (If this set is empty for any ε, the assertion of the theorem is trivially true.)

Since θ* is not in D_I, the criterion exceeds its limit value on D_T by a positive margin at θ*. According to (2.16), choose ε_0 = ε_0(θ*, ω) and a radius ρ(θ*) such that this margin persists on B(θ*, ρ(θ*)), and choose N_0(θ*, ω) = max(N_4(ω), N_3(ε_0/2, ω, θ*), N_2(ω, ε_0/2)).


The spheres B(θ, ρ(θ)), θ ∈ D_M(ε, ω), then clearly cover the compact set D_M(ε, ω). According to the Heine-Borel theorem there exists a finite set {B(θ_i, ρ(θ_i)), i = 1, …, K} that covers D_M(ε, ω). Let N̄_0(ω, ε) = max_i N_0(θ_i, ω). It then follows from (A.8) that the minimizing element θ(N) cannot belong to D_M(ε, ω) for N > N̄_0(ω, ε), i.e.

|θ(N) − D_I| < ε for N > N̄_0(ω, ε)

which, since ε is an arbitrarily small number, is the assertion of the theorem.

A.2 PROOF OF THEOREM 4.1

The theorem follows from Lemma 4.1 if (4.3) and (4.4) can be shown to be satisfied. We will consider the general case, and let a_M(θ)(z) be the least common denominator of the denominator polynomials in G_M(θ)(z) and H_M(θ)(z). Introduce the corresponding matrix polynomials as in (A.9). Due to the assumptions, det C_M(θ)(z) has all zeroes outside the unit circle for θ ∈ D_M, i.e. C_M(θ)(q^{-1})^{-1} is an exponentially stable filter. (There are several pole-zero cancellations between a and det H. Therefore the requirement that a has all zeroes outside the unit circle is sufficient, but not necessary. Stability of C^{-1} is the only thing that matters.)

From (A.9) and (A.10) it follows, with arguments q^{-1} suppressed, that the predictor ŷ(t|θ) and its derivative are obtained from y and u by filtering through filters with denominator C_M(θ), and from (2.14) the variable z(t) can be bounded accordingly.

Since C_M(θ)(q^{-1})^{-1} is exponentially stable for all θ ∈ D_M, and D_M is compact, the impulse responses of the filters involved can be bounded uniformly in θ. Let C and λ, λ < 1, be such that these bounds are of the form Cλ^k. Introduce for brevity a notation for the data magnitudes and define z̄(t) as the corresponding exponentially weighted sum of past |y(s)| and |u(s)|. Then z(t) ≤ z̄(t) for θ ∈ D_M. Sum over t = 0, …, N and divide by N. According to the assumptions of the lemma, the time averages of |y(t)|^2 and |u(t)|^2 are bounded w.p.1, from which it directly follows that the time average of z̄(t)^2 is bounded w.p.1. Since z(t) ≤ z̄(t), this concludes the proof of (4.3).

Condition (4.4) follows from (A.12) and (4.7) in the same way, since the prediction errors obey an analogous exponentially weighted bound.
Acknowledgements

I would like to thank Professor Karl Johan Åström for important discussions on the subject, and Torsten Söderström and Daniel Thorburn for many valuable comments on the manuscript. The support of the Swedish Board for Technical Development under Contract No. 733546 is also gratefully acknowledged.


REFERENCES

Aoki, M., and Yue, P. C. (1970), "On Certain Convergence Questions in System Identification," SIAM J. Control, Vol. 8, No. 2.

Åström, K. J., and Bohlin, T. (1965), "Numerical Identification of Linear Dynamic Systems from Normal Operating Records," IFAC (Teddington) Symposium.

Åström, K. J. (1970), Introduction to Stochastic Control Theory, Academic Press, New York.

Åström, K. J., and Eykhoff, P. (1971), "System Identification - A Survey," Automatica, 7, 123-162.

Åström, K. J., and Källström, C. (1973), "Application of System Identification Techniques to the Determination of Ship Dynamics," Preprints of 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague, pp. 415-425.

Balakrishnan, A. V. (1968), "Stochastic System Identification Techniques," in: Stochastic Optimization and Control, M. F. Karreman, Ed., Wiley, New York, pp. 65-89.

Bellman, R., and Åström, K. J. (1970), "On Structural Identifiability," Math. Biosci., Vol. 7, pp. 329-339.

Caines, P. E. (1970), "The Parameter Estimation of State Variable Models of Multivariable Linear Systems," Ph.D. dissertation, Imperial College, London.

Caines, P. E. (1971), "The Parameter Estimation of State Variable Models of Multivariable Linear Systems," Proceedings of the U.K.A.C. Conference on Multivariable Systems, Manchester, England.

Caines, P. E., and Rissanen, J. (1974), "Maximum Likelihood Estimation of Parameters in Multivariable Gaussian Stochastic Processes," IEEE Trans. IT-20, No. 1.

Chung, K. L. (1968), A Course in Probability Theory, Harcourt, Brace & World, Inc.

Cramér, H. (1946), Mathematical Methods of Statistics, Princeton University Press, Princeton.

Cramér, H., and Leadbetter, M. R. (1967), Stationary and Related Stochastic Processes, Wiley, New York.

Doob, J. L. (1953), Stochastic Processes, Wiley, New York.

Eaton, J. (1967), "Identification for Control Purposes," IEEE Winter Meeting, New York.

Fisher, R. A. (1912), "On an Absolute Criterion for Fitting Frequency Curves," Mess. of Math., Vol. 41, p. 155.

Glover, K., and Willems, J. C. (1973), "On the Identifiability of Linear Dynamic Systems," Preprints of the 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague, pp. 867-871.

Gustavsson, I., Ljung, L., and Söderström, T. (1974), "Identification of Linear, Multivariable Process Dynamics Using Closed Loop Experiments," Report 7401, Division of Automatic Control, Lund Institute of Technology.

Jazwinski, A. H. (1970), Stochastic Processes and Filtering Theory, Academic Press, New York.

Kailath, T. (1970), "The Innovations Approach to Detection and Estimation Theory," Proc. IEEE, Vol. 58, No. 5.

Lampard, D. G. (1955), "A New Method of Determining Correlation Functions of Stationary Time Series," Proc. IEE, 102C, pp. 35-41.

Ljung, L. (1974), "On Consistency for Prediction Error Identification Methods," Report 7405, Division of Automatic Control, Lund Institute of Technology.

Ljung, L. (1975), "On Consistency and Identifiability," Proc. Intern. Symp. Stochastic Systems, Lexington, Kentucky, June 1975 (North-Holland Publishing Co.). To appear in Mathematical Programming Studies.

Ljung, L., Gustavsson, I., and Söderström, T. (1974), "Identification of Linear Multivariable Systems Operating Under Linear Feedback Control," IEEE Trans. AC-19, No. 6 (Special Issue on System Identification and Time Series Analysis), pp. 836-841.

Lüders, G., and Narendra, K. S. (1974), "Stable Adaptive Schemes for State Estimation and Identification of Linear Systems," IEEE Trans. AC-19, No. 6 (Special Issue on System Identification and Time Series Analysis), pp. 841-848.

Mayne, D. Q. (1972), "A Canonical Form for Identification of Multivariable Linear Systems," IEEE Trans. AC-17, No. 5, pp. 728-729.

Mehra, R. K., and Tyler, J. S. (1973), "Case Studies in Aircraft Parameter Identification," Preprints of 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague, pp. 117-145.

Mendel, J. M. (1973), Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, New York.

Rissanen, J. (1974), "Basis of Invariants and Canonical Forms for Linear Dynamic Systems," Automatica, Vol. 10, No. 2, pp. 175-182.

Spain, D. S. (1971), "Identification and Modelling of Discrete, Stochastic Linear Systems," Tech. Report No. 6302-10, Stanford University.

Soudack, A. C., Suryanarayanan, K. L., and Rao, S. G. (1971), "A Unified Approach to Discrete-Time System Identification," Int. J. of Control, Vol. 14, No. 6, pp. 1009-1029.

Tse, E., and Anton, J. J. (1972), "On the Identifiability of Parameters," IEEE Trans. AC-17, No. 5.

Tsypkin, Ya. Z. (1973), Foundations of the Theory of Learning Systems, Academic Press, New York.

Wald, A. (1949), "Note on the Consistency of the Maximum Likelihood Estimate," Ann. Math. Stat., Vol. 20, pp. 595-601.

Woo, K. T. (1970), "Maximum Likelihood Identification of Noisy Systems," Paper 3.1, 2nd Prague IFAC Symposium on Identification and Process Parameter Estimation.