ON THE CONSISTENCY OF PREDICTION ERROR IDENTIFICATION METHODS

Lennart Ljung
Department of Automatic Control
Lund Institute of Technology
S-220 07 Lund, Sweden
1. INTRODUCTION 121
2. SYSTEMS, MODELS AND PREDICTION ERROR IDENTIFICATION METHODS 123
2.1 SYSTEM DESCRIPTION 123
2.2 MODELS 129
2.3 IDENTIFICATION CRITERIA 132
2.4 CONNECTION WITH MAXIMUM LIKELIHOOD ESTIMATION 135
3. CONSISTENCY AND IDENTIFIABILITY 137
4. CONSISTENCY FOR GENERAL MODEL STRUCTURES 141
4.1 MAIN RESULT 142
4.2 LINEAR SYSTEMS 143
5. IDENTIFIABILITY RESULTS 146
5.1 A DETERMINISTIC SET D_I 146
5.2 LINEAR TIME-INVARIANT SYSTEMS 148
6. CONCLUSIONS 152
APPENDIX 153
REFERENCES 162
1. INTRODUCTION

The problem of identification is to determine a model that describes input-output data obtained from a certain system. The choice of model is as a rule made using some criterion of closeness to the data, see e.g. Åström-Eykhoff (1971), Soudack et al. (1971). In output error methods the discrepancy between the model output and the measured output is minimized.
The common model-reference methods are of this type, see e.g. Lüders-Narendra (1974). Equation error methods minimize the discrepancy in the input-output equation describing the model. Mendel (1973) has given a quite detailed treatment of these methods.

Output- and equation error methods are originally designed for noiseless data and deterministic systems.
If they are applied to noisy data or stochastic systems they will give biased parameter estimates, unless the noise characteristics either are known or are of a very special kind. A natural extension of these methods is to take the noise characteristics into account and compare the predicted output of the model with the output signal of the system. Minimization of criteria based on this discrepancy leads to the class of prediction error identification methods.
This class contains under suitable conditions the maximum likelihood method. The maximum likelihood method (ML method) was first introduced by Fisher (1912) as a general method for statistical parameter estimation. The problem of consistency for this method has been investigated by e.g. Wald (1949) and Cramér (1946) under the assumption that the obtained observations are independent. The first application of the ML method to system identification is due to Åström-Bohlin (1965), who considered single input-single output systems of difference equation form. In this case the mentioned consistency results are not applicable. Åström-Bohlin (1965) showed one possibility to relax the assumption on independent observations.

ML identification using state space models has been considered by e.g. Caines (1970), Woo (1970), Aoki-Yue (1970), and Spain (1971). Caines-Rissanen (1974) have discussed vector difference equations. All these authors consider consistency with probability one (strong consistency). Tse-Anton (1972) have proved consistency in probability for more general models.
Balakrishnan has treated ML identification in a number of papers, see e.g. Balakrishnan (1968). In the papers dealing with strong consistency, one main tool usually is an ergodic theorem. To be able to apply such a result, significant idealization of the identification experiment conditions must be introduced. The possibilities to treat input signals that are partly determined as feedback are limited, and an indispensable condition is that the likelihood function must converge w.p.1.
To achieve this, strict stationarity of the output is usually assumed. These conditions exclude many practical identification situations. For example, to identify unstable systems some kind of stabilizing feedback must be used. Other examples are processes that inherently are under time-varying feedback, like many economic systems.
In this paper strong consistency for general prediction error methods, including the ML method, is considered. The results are valid for general process models, linear as well as nonlinear. Also quite general feedback is allowed.

A general model for stochastic dynamic systems is discussed in Section 2. There also the identification method is described. Different identifiability concepts are introduced in Section 3, where a procedure to prove consistency is outlined. In Section 4 consistency is shown for a general system structure as well as for linear systems. The application of the results to linear time-invariant systems is discussed in Section 5.

2. SYSTEMS, MODELS AND PREDICTION ERROR IDENTIFICATION METHODS

2.1 SYSTEM DESCRIPTION
A causal discrete time, deterministic system, denoted by S, can be described by a rule to compute future outputs of the system from inputs and previous outputs:

    y(t+1) = g_S(t; y(t),...,y(1); u(t),...,u(1); Y_0)                    (2.1)

where Y_0, "the initial conditions", represents the necessary information to compute y(1).

Often y(t+1) is not expressed as an explicit function of old variables, but some recursive way to calculate y(t+1) is preferred. Linear difference equations and state space models are well-known examples. The advantage with such a description is that only a finite and fixed number of old values are involved in each step.

For a stochastic system future outputs cannot be exactly determined by previous data as in (2.1). Instead the conditional probability distribution of y(t+1) given all previous data should be considered. It turns out to be convenient to subtract out the conditional mean and consider an innovations representation of the form

    y(t+1) = E[y(t+1) | Y_t, S] + ε(t+1, Y_t, S)                          (2.2)
where E[y(t+1) | Y_t, S] is the conditional mean given all previous outputs and inputs,

    E[y(t+1) | Y_t, S] = g_S(t; y(t),...,y(1); u(t),...,u(1); Y_0)        (2.3)

Here Y_t denotes the σ-algebra generated by {y(t),...,y(1); u(t),...,u(1); Y_0}, and Y_0, "the initial condition", represents the information available at time t = 0 about the previous behavior of the system.

The sequence {ε(t+1, Y_t, S)} is a sequence of random variables for which

    E[ε(t+1, Y_t, S) | Y_t] = 0

holds. It consists of the innovations, see Kailath (1970).
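The innovations representation (2.2) can be illustrated numerically. The following Python sketch simulates a hypothetical first-order system, y(t+1) = 0.5 y(t) + e(t+1), for which the conditional mean g_S is simply 0.5 y(t); what remains after subtracting it is the innovation sequence, which should be zero-mean and uncorrelated with past outputs. The system and its coefficient are illustrative assumptions, not taken from the text.

```python
import random

# Toy illustration of the innovations representation (2.2) for a
# hypothetical scalar system y(t+1) = 0.5*y(t) + e(t+1): the conditional
# mean given past data is 0.5*y(t), and subtracting it leaves e(t+1).
random.seed(0)

a = 0.5
y = [0.0]
innovations = []
for t in range(5000):
    e = random.gauss(0.0, 1.0)
    y_next = a * y[-1] + e           # system output
    y_pred = a * y[-1]               # conditional mean E[y(t+1) | Y_t, S]
    innovations.append(y_next - y_pred)
    y.append(y_next)

# The innovations should be (approximately) zero-mean and uncorrelated
# with the past output, reflecting E[eps(t+1, Y_t, S) | Y_t] = 0.
mean_innov = sum(innovations) / len(innovations)
corr = sum(innovations[t] * y[t] for t in range(len(innovations))) / len(innovations)
print(abs(mean_innov) < 0.1, abs(corr) < 0.1)
```

The two sample statistics are small because the conditional mean has absorbed everything in y(t+1) that is predictable from Y_t.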
The conditional mean E[y(t+1) | Y_t, S] will also be called the prediction of y(t+1) based on Y_t. Since it will frequently occur in this report, a simpler notation, ŷ(t+1|S), will be used.

REMARK. It should be remarked that the description (2.2) to some extent depends on Y_0. Two cases of choice of Y_0 will be discussed here. The most natural choice is of course Y_0 = the actual a priori information about previous behavior known to the "model-builder". A disadvantage with this choice is that in general E(y(t+1)|Y_t,S) will be time varying even if the system allows a time-invariant description. This point is further clarified below. A second choice is Y_0 = Ȳ_0 = the information equivalent (from the viewpoint of prediction) to knowing all previous y(t), u(t), t ≤ 0. This choice gives simpler representations of E(y(t+1)|Y_t,S), but has the disadvantage that Ȳ_0 is often not known to the user. Both choices will be discussed in
more detail for linear systems below.

It should also be clear that the function g_S in (2.3) can be taken as independent of any feedback relationships between u and y such that u(t) is independent of future ε(k, Y_{k-1}, S), k > t.

General stochastic systems can be described by (2.2), just as (2.1) is a general description of deterministic systems. The main results of this paper will be formulated for this general system description (2.2).
For practical reasons, in the usual system descriptions the output is not given explicitly as in (2.2). Various recursive ways to calculate y(t+1) are used instead. Examples are given below.

EXAMPLE 2.1. Linear Systems in State Space Representation
State space representations are a common and convenient way of describing linear, time-varying systems. The input-output relation for the system S is then defined by

    x(t+1) = A_S x(t) + B_S u(t) + v(t)
    y(t)   = C_S x(t) + e(t)                                              (2.4)

where {e(t)} and {v(t)} are sequences of independent Gaussian random vectors with zero mean values and E e(t)e(t)^T = R_S(t), E e(t)v(t)^T = R_S^c(t) and E v(t)v(t)^T = Q_S(t). The system matrices may very well be time-varying, but the time argument is suppressed.

The function E(y(t+1)|Y_t, S) = ŷ(t+1|S), where Y_t is the σ-algebra generated by {y(t),...,y(1), u(t),...,u(1), Y_0}, is obtained as follows:

    ŷ(t+1|S) = C_S x̂(t+1|S)                                              (2.5)

where the state estimate x̂ is obtained from standard Kalman filtering:

    x̂(t+1|S) = A_S x̂(t|S) + B_S u(t) + K_S(t){y(t) - C_S x̂(t|S)}         (2.6)

K_S(t) is the Kalman gain matrix, determined from A_S, B_S, C_S, R_S, R_S^c and Q_S. The information Y_0 is translated into an estimate x̂(0|S) of the initial value, with corresponding covariance P_S(0). Then (2.6) can be solved recursively from t = 0. The representation (2.6) then holds for any Y_0, and in this case it is convenient to let Y_0 be the actual a priori knowledge about the previous behavior of the system. Notice that if the system matrices and covariance matrices all are time invariant and Y_0 = Ȳ_0, then also K_S will be time invariant.
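As a minimal numerical sketch of the predictor (2.5)-(2.6), the following Python code runs a time-invariant instance of (2.4), computes the gain from the standard Riccati recursion (iterated to near-stationarity, corresponding to the remark that K_S becomes time invariant), and measures the one-step prediction error. All matrices and numerical values are hypothetical, chosen only for illustration; the cross-covariance R_S^c is taken as zero.

```python
import numpy as np

# Sketch of the Kalman one-step predictor (2.5)-(2.6) for a hypothetical
# time-invariant instance of system (2.4).
rng = np.random.default_rng(1)

A = np.array([[0.9, 0.1], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)    # E v v^T
R = np.array([[1.0]])  # E e e^T  (cross-covariance taken as zero here)

# Riccati recursion for P(t) and the gain K(t), iterated until stationary.
P = np.eye(2)
for _ in range(200):
    Sinn = C @ P @ C.T + R                     # innovation covariance
    K = A @ P @ C.T @ np.linalg.inv(Sinn)
    P = A @ P @ A.T + Q - K @ Sinn @ K.T

# Predictor recursion (2.6): xhat(t+1) = A xhat + B u + K (y - C xhat).
T = 300
x = np.zeros(2)
xhat = np.zeros(2)
err2 = 0.0
for t in range(T):
    u = np.array([np.sin(0.1 * t)])
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)
    yhat = C @ xhat                            # yhat(t|S), eq. (2.5)
    err2 += float(((y - yhat) ** 2).sum())
    xhat = A @ xhat + B @ u + K @ (y - C @ xhat)
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Q)

mse = err2 / T
print(mse)
```

The mean square prediction error settles near the stationary innovation variance C P C^T + R, as the theory predicts.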
A continuous time state representation can be chosen instead of (2.4). In e.g. Åström-Källström (1973) and Mehra-Tyler (1973) it is shown how E[y(t+1)|Y_t, S], where Y_t is as before, can be calculated. The procedure is analogous to the one described above. □

EXAMPLE 2.2. General Linear, Time-Invariant Systems
A linear time-invariant system can be described as

    y(t) = G_S(q^{-1})u(t) + H_S(q^{-1})e(t)                              (2.7)

where q^{-1} is the backward shift operator: q^{-1}u(t) = u(t-1), and G_S(z) and H_S(z) are matrix functions of z. The variables {e(t)} form a sequence of independent random variables with zero mean values and covariance matrices E e(t)e(t)^T = Λ_S (which actually may be time-varying). It will be assumed that G_S(z) and H_S(z) are matrices with rational functions of z as entries and that H_S(0) = I. The latter assumption implies that e(t) has the same dimension as y(t), but this is no loss of generality. Furthermore, assume that det H_S(z) has no zeroes on or inside the unit circle. This is no serious restriction, cf. the spectral factorization theorem. Then H_S^{-1}(q^{-1}) is a well defined exponentially stable linear filter, which is straightforwardly obtained by inverting H_S(z). To rewrite (2.7) on the form (2.2) requires some caution regarding the initial values.
If Y_0 does not contain enough information, the representation (2.2) will be time varying, even though (2.7) is time-invariant. In such a case a state space representation can be used. A simpler approach is to assume that Y_0 = Ȳ_0 = the information equivalent to knowing all previous y(t), u(t), t ≤ 0. It will follow from the analysis in the following sections that this assumption is quite relevant for identification problems. From (2.7) we have

    e(t+1) = H_S^{-1}(q^{-1})[y(t+1) - G_S(q^{-1})u(t+1)]                 (2.8)
Since H_S^{-1}(0) = I, the right hand side of (2.8) contains, apart from the term y(t+1) itself, y(s) and u(s) only up to time t. The term e(t+1) is independent of these variables, also in the case u is determined by output feedback. Hence, if Y_0 = Ȳ_0,

    E(y(t+1)|Y_t, S) = {I - H_S^{-1}(q^{-1})}y(t+1) + H_S^{-1}(q^{-1})G_S(q^{-1})u(t+1)    (2.9)
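A scalar Python sketch of the predictor formula (2.9) follows. The model is a hypothetical instance of (2.7) with G(q^{-1}) = b q^{-1} and H(q^{-1}) = 1 + c q^{-1}, |c| < 1, so that H^{-1} is a stable filter; for this case (2.9) reduces to the recursion yhat(t+1) = b u(t) + c (y(t) - yhat(t)). The coefficients are assumptions made for illustration.

```python
import random

# Scalar sketch of the predictor (2.9) for a hypothetical model (2.7)
# with G(q^{-1}) = b*q^{-1} and H(q^{-1}) = 1 + c*q^{-1}, |c| < 1.
random.seed(2)
b, c = 2.0, 0.5
N = 2000

e = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
u = [random.gauss(0.0, 1.0) for _ in range(N + 1)]

y = [0.0] * (N + 1)
for t in range(N):
    y[t + 1] = b * u[t] + e[t + 1] + c * e[t]       # system (2.7)

yhat = [0.0] * (N + 1)
for t in range(N):
    yhat[t + 1] = b * u[t] + c * (y[t] - yhat[t])   # predictor from (2.9)

resid = [y[t] - yhat[t] for t in range(1, N + 1)]
# Apart from a transient caused by the unknown initial condition
# (zero initial conditions are used here), resid(t) coincides with e(t).
max_dev = max(abs(resid[t - 1] - e[t]) for t in range(N // 2, N + 1))
print(max_dev)
```

The deviation between the prediction errors and the true driving noise decays geometrically (like c^t), illustrating why the choice of Y_0 only matters through a transient.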
Now, linear systems are often not modelled directly in terms of the impulse response functions G_S and H_S. A frequently used representation is the vector difference equation (VDE):

    A_S(q^{-1})y(t) = B_S(q^{-1})u(t) + C_S(q^{-1})e(t)                   (2.10)

Another common representation is the state space form (in the time-invariant innovations representation form):

    x(t+1) = Ā_S x(t) + B̄_S u(t) + K̄_S e(t)
    y(t)   = C̄_S x(t) + e(t)                                             (2.11)
It is easily seen that these two representations correspond to

    G_S(z) = A_S^{-1}(z)B_S(z),    H_S(z) = A_S^{-1}(z)C_S(z)             (2.12)

and

    G_S(z) = z C̄_S(I - z Ā_S)^{-1} B̄_S,    H_S(z) = I + z C̄_S(I - z Ā_S)^{-1} K̄_S    (2.13)

respectively. In these two cases G_S(z) and H_S(z) will be matrices with rational functions as entries. Inserting (2.12) into (2.9), it is seen that E(y(t+1)|Y_t, S) = ŷ(t+1|S) is found as the solution of

    C_S(q^{-1})ŷ(t+1|S) = [C_S(q^{-1}) - A_S(q^{-1})]y(t+1) + B_S(q^{-1})u(t+1)    (2.14)

for the case of a VDE model. Solving (2.14) requires knowledge of y(0),...,y(-n), u(0),...,u(-n), ŷ(0|S),...,ŷ(-n|S). This information is contained in Ȳ_0.

For the state space model (2.11), ŷ(t+1|S) is found from

    x̂(t+1) = Ā_S x̂(t) + B̄_S u(t) + K̄_S[y(t) - C̄_S x̂(t)]
    ŷ(t+1|S) = C̄_S x̂(t+1)                                               (2.14b)

where the initial condition x̂(0) is obtained from Ȳ_0.
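The VDE predictor equation (2.14) can be spot-checked numerically in the scalar case. The sketch below uses a hypothetical first-order model with A(q^{-1}) = 1 + a q^{-1}, B(q^{-1}) = b q^{-1}, C(q^{-1}) = 1 + c q^{-1}; writing (2.14) out gives the recursion yhat(t+1) = -c yhat(t) + (c - a) y(t) + b u(t), started with zero initial conditions.

```python
import random

# Numerical check of the VDE predictor (2.14) for a hypothetical scalar
# model: A(q^{-1}) = 1 + a*q^{-1}, B(q^{-1}) = b*q^{-1}, C(q^{-1}) = 1 + c*q^{-1}.
random.seed(3)
a, b, c = -0.8, 1.0, 0.4
N = 500

e = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
u = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
y = [0.0] * (N + 1)
for t in range(N):
    # A(q^{-1}) y(t+1) = B(q^{-1}) u(t+1) + C(q^{-1}) e(t+1)
    y[t + 1] = -a * y[t] + b * u[t] + e[t + 1] + c * e[t]

# (2.14) written out: yhat(t+1) = -c*yhat(t) + (c - a)*y(t) + b*u(t),
# with zero initial conditions standing in for the information in Ybar_0.
yhat = [0.0] * (N + 1)
for t in range(N):
    yhat[t + 1] = -c * yhat[t] + (c - a) * y[t] + b * u[t]

resid = [y[t] - yhat[t] for t in range(1, N + 1)]
dev = max(abs(resid[t - 1] - e[t]) for t in range(N // 2, N + 1))
print(dev)
```

After the initial-condition transient dies out, the prediction errors of (2.14) reproduce the driving noise e(t) exactly, as expected from the derivation via (2.9).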
Notice that there is a parameter redundancy in the representations (2.10) and (2.11). All matrix polynomials A_S, B_S, C_S and all matrices Ā_S, B̄_S, C̄_S, K̄_S that satisfy (2.12) and (2.13), respectively, correspond to the same system (2.7). □

These examples cover linear, possibly time varying systems. Clearly, also non-linear systems can be represented by (2.3). A simple example is

    y(t+1) = f(y(t), u(t)) + a(y(t))e(t+1)

It should, however, be remarked that it is in general no easy problem to transform a nonlinear system to the form (2.2). This is, in fact, equivalent to solving the nonlinear filtering problem. It is therefore advantageous to directly model the nonlinear system on the form (2.2), if possible.

2.2 MODELS
In many cases the system characteristics, i.e. the function g_S in (2.2) and the properties of {ε(t+1, Y_t, S)}, are not known a priori. One possibility to obtain a model of the system is to use input-output data to determine the characteristics. In this paper we shall concentrate on the problem how the function g_S can be found.

Naturally, it is impossible to find a general function g_S(t; y(t),...,y(1); u(t),...,u(1); Y_0). Therefore the class of functions among which g is sought must be restricted. We will call this set of functions the model set or the model structure. Let it be denoted by M and let the elements of the model set be indexed by a parameter vector θ. The set over which θ varies will be denoted by D_M. A certain element of M will be called a model and be denoted by M(θ), or written as g(t; y(t),...,y(1); u(t),...,u(1); Y_0; θ).

A complete model of the system also models the sequence {ε(t+1, Y_t, S)}, so that it is described by

    y(t+1) = g(t; y(t),...,y(1); u(t),...,u(1); Y_0; θ) + ε(t+1, Y_t, M(θ))

where {ε(t+1, Y_t, M(θ))} is a sequence of random variables with conditional distribution that depends on M(θ). For brevity, the notation

    ŷ(t+1|θ) = E[y(t+1) | Y_t, M(θ)]

is also used for the prediction.

REMARK.
The notions E[y(t+1)|Y_t, M(θ)] and ŷ(t+1|θ) are used to indicate that certain probabilistic arguments underlie the construction of the rules for how to calculate a good estimate of the value y(t+1) from previous data. It must however be stressed that the notions are purely formal and should be understood just as replacements for the function g_{M(θ)}. There is no underlying "model-probability space" in the discussion, and all random variables, including ε(t+1|θ), belong to the probability space of the true system. (The author is indebted to Prof. L. E. Zachrisson, who suggested this clarification.)

The model structures can be chosen in a completely arbitrary way. For example, g can be expanded into orthogonal function systems. Such choices are discussed by e.g. Lampard (1955). If there is no natural parametrization of the model, such an expansion may be advantageous. Tsypkin (1973) has discussed models of this type in connection with identification of nonlinear systems.
However, the usual choice is to take one of the models in Example 2.1 or 2.2 and introduce unknown elements θ_i into the system matrices. A vector difference equation model, e.g., is then described by

    A_{M(θ)}(q^{-1})y(t) = B_{M(θ)}(q^{-1})u(t) + C_{M(θ)}(q^{-1})ε(t; M(θ))

where {ε(t; M(θ))} is a sequence of independent random variables with zero mean values and E ε(t, M(θ))ε(t, M(θ))^T = Λ_{M(θ)}. The unknown elements may enter quite arbitrarily in the matrices A_{i,M(θ)}. Some elements may be known from basic physical laws, or a priori fixed. Other elements may be related to each other, etc. Generally speaking, M can be described by the way the parameter vector θ enters in the matrices: the model parameterization.

Thus, for time-invariant linear systems the choice of model type (vector difference equation, state space representation, etc.) and parameters can be understood as a way of parametrizing G and H in (2.8): G_{M(θ)} and H_{M(θ)} via (2.12) or (2.13).
REMARK. Like for the system description, also the model description depends on the initial conditions Y_0. It would be most sensible to choose Y_0 as the actual a priori knowledge, but, as remarked previously, this gives more complex algorithms for computing the prediction. For time-invariant systems it will therefore be assumed in the sequel that Y_0 = Ȳ_0 = knowledge of all previous history. Since Ȳ_0 is in general not known, it has to be included in the model: Ȳ_0 = Ȳ_0(θ). Often it is sufficient to take Ȳ_0(θ) = 0 for all θ, i.e. u(t) = y(t) = 0, t ≤ 0, corresponding to zero initial conditions in (2.14) and (2.14b).
2.3 IDENTIFICATION CRITERIA
The purpose of the identification is to find a model M(θ) that in some sense suitably describes the measured input and output data.

The prediction of y(t+1) plays an important role for control. In, e.g., linear quadratic control theory, the optimal input shall be chosen so that E[y(t+1)|Y_t, S] has desired behavior. This is the separation theorem, see e.g. Åström (1970). Therefore, it is very natural to choose a model that gives the best possible prediction. That is, some function of the prediction error y(t+1) - E(y(t+1)|Y_t, M(θ)) should be minimized with respect to θ.

We will consider the following class of criteria. Introduce the matrix

    Q_N(M(θ)) = Σ_{t=0}^{N-1} R(t)^{1/2}[y(t+1) - ŷ(t+1|θ)][y(t+1) - ŷ(t+1|θ)]^T R(t)^{1/2}    (2.15)

Its dimension is n_y × n_y, where n_y is the number of outputs. {R(t)} is a sequence of positive definite matrices. It is assumed that {|R(t)|} is bounded.
The selection of the matrices R(t) naturally affects the relative importance given to the components of the prediction. A special choice of weighting matrices is discussed in Section 2.4.

A scalar function, h[Q_N(M(θ))], of the matrix of prediction errors will be minimized with respect to θ. For the minimization to make sense, some simple properties of the function h must be introduced.

PROPERTIES OF h. Let h be a continuous function with n_y × n_y symmetric matrices as domain. Assume that

    h(λA) = g(λ)h(A),   λ, g(λ) scalars and g(λ) > 0 for λ > 0            (2.16a)

Let A, with δI < A < (1/δ)I, be a symmetric positive definite matrix, and let B be symmetric, positive semidefinite and nonzero. Assume that then

    h(A + B + C_ε) > h(A) + ρ(δ) tr B   for tr C_ε^T C_ε < ε_0            (2.16b)

where ε_0 > 0 and ρ(δ) > 0 depend only on δ and tr B.

If h satisfies (2.16), it defines a well posed identification criterion by

    V_N(θ) = h[(1/N) Q_N(M(θ))]                                           (2.17)

where the estimate θ̂(N) is chosen as a minimizing element of V_N(θ) over D_M.

In particular, h(A) will be taken as tr A, which clearly satisfies (2.16). This criterion is probably the easiest one to handle, theoretically as well as computationally. Then

    V_N(θ) = (1/N) Σ_{t=0}^{N-1} |y(t+1) - ŷ(t+1|θ)|²_{R(t)}

where |x|²_{R(t)} = x^T R(t) x.
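A minimal Python sketch of minimizing the criterion (2.17) with h(A) = tr A and R(t) = I follows. The model structure, ŷ(t+1|θ) = θ u(t), and all numerical values are hypothetical; a crude grid search stands in for a proper numerical minimizer so that the sketch is dependency-free.

```python
import random

# Sketch of prediction error estimation with criterion (2.17), h = tr:
# V_N(theta) = (1/N) * sum of squared prediction errors, R(t) = I.
# The model yhat(t+1|theta) = theta*u(t) is a hypothetical illustration.
random.seed(4)
theta0 = 1.5                                  # "true" parameter, in D_T(S, M)
N = 4000

u = [random.gauss(0.0, 1.0) for _ in range(N)]
e = [random.gauss(0.0, 0.5) for _ in range(N)]
y = [theta0 * u[t] + e[t] for t in range(N)]  # y(t+1) paired with u(t)

def V(theta):
    return sum((y[t] - theta * u[t]) ** 2 for t in range(N)) / N

grid = [i / 100.0 for i in range(0, 301)]     # compact set D_M = [0, 3]
theta_hat = min(grid, key=V)
print(theta_hat)
```

The minimizing element of V_N lands close to the true parameter, which is the consistency behavior the following sections analyze in general.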
Another possible choice is h(A) = det(A), which is of interest because of its relation to the likelihood function, cf. Section 2.4.

LEMMA 2.1: h(A) = det(A) satisfies (2.16).

Proof. Condition (2.16a) is trivially satisfied. For (2.16b), write

    det(A + B + C_ε) = det A^{1/2} det(I + A^{-1/2}(B + C_ε)A^{-1/2}) det A^{1/2}
                     = det A · Π_{i=1}^{n_y} (1 + d_i)

where the d_i are the eigenvalues of A^{-1/2}(B + C_ε)A^{-1/2}. Let λ be the largest eigenvalue of B. Then λ ≥ tr B/n_y. Also, A^{-1/2}BA^{-1/2} has one eigenvalue that is larger than or equal to λδ. (Consider A^{-1/2}x, where x is an eigenvector of B with eigenvalue λ.) Now, adding C_ε to B can distort the eigenvalues at most ε̄/δ, where ε̄ = ||C_ε|| is the operator norm of C_ε. Hence, for ε̄ sufficiently small,

    det(A + B + C_ε) ≥ det A (1 + λδ - ε̄/δ) ≥ det A + ρ(δ) tr B
which concludes the proof. □

In this chapter we will consider the limiting properties of the estimate θ̂ that minimizes (2.17) as N tends to infinity. Of particular interest is of course whether the limiting values of θ̂ give models M(θ) that properly describe S. This is the problem of consistency of prediction error identification methods.

So far we have only discussed how the function E[y(t+1)|Y_t, S] can be estimated. The properties of {ε(t+1, Y_t, S)} can
then be estimated from the residuals

    ε(t+1, Y_t, M(θ*)) = y(t+1) - ŷ(t+1|θ*)

where θ* is the minimizing value. In particular, if {ε(t+1, Y_t, S)} = {e(t+1)} is a stationary sequence of independent random variables with zero mean values, and we are only interested in the second order moment properties, then Λ = E e(t)e(t)^T can be estimated as (1/N)Q_N(M(θ*)), where Q_N is defined by (2.15) with R(t) = I.

2.4 CONNECTION WITH MAXIMUM LIKELIHOOD ESTIMATION
It is well known that prediction error criteria are intimately connected with maximum likelihood estimates. This section contains a brief discussion of how the formal relations can be established.

Consider the model (2.18), and let the innovations {ε(t, M(θ))} be assumed to be independent and normally distributed with covariance matrices Λ(t). The probability density of y(t+1), given Y_t and given that (2.18) is true, then is

    f(x|Y_t) = (2π)^{-n_y/2} (det Λ(t))^{-1/2} exp{-(1/2)[x - ŷ(t+1|θ)]^T Λ^{-1}(t)[x - ŷ(t+1|θ)]}

Here f(x|Y_t) = F'(x|Y_t), where F(x|Y_t) = P(y(t+1) ≤ x | Y_t). Using Bayes' rule the joint probability density of y(t+1) and y(t) given Y_{t-1} can be expressed as

    f(x_{t+1}, x_t | Y_{t-1}) = f(x_{t+1}|Y_t) f(x_t|Y_{t-1})

where y(t) in ŷ(t+1|θ) should be replaced by x_t. In case ŷ(t+1|θ) does not depend linearly on y(t), the distribution of (y(t+1), y(t)) is not jointly normal. Iteration directly gives the joint probability density of y(t+1), y(t),..., y(1) given Y_0. The logarithm of the likelihood function, given Y_0, then is obtained as

    log L(θ) = -(1/2) Σ_{t=0}^{N-1} [y(t+1) - ŷ(t+1|θ)]^T Λ^{-1}(t)[y(t+1) - ŷ(t+1|θ)] - (1/2) Σ_{t=0}^{N-1} log det Λ(t) + const.
The maximum likelihood estimate (MLE) of θ therefore is obtained as the element that minimizes

    Σ_{t=0}^{N-1} [y(t+1) - ŷ(t+1|θ)]^T Λ^{-1}(t)[y(t+1) - ŷ(t+1|θ)]

If the matrices Λ(t) are known, the MLE is consequently obtained as the minimizing point of the loss function (2.17) with h(A) = tr(A) and R(t) = Λ^{-1}(t). When the Λ(t) are unknown, the minimization should be performed also with respect to {Λ(s)}. In case Λ(t) does not depend on t, the minimization with respect to Λ can be performed analytically, Eaton (1967), yielding the problem to minimize det[Q_N(M(θ))] [where R(t) = I in Q_N(M(θ))], giving θ̂(N), and then take

    Λ̂ = (1/N) Q_N(M(θ̂(N)))
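The analytic minimization step attributed above to Eaton (1967) can be spot-checked numerically. For fixed θ and constant Λ, the Gaussian negative log likelihood is, up to constants, N log det Λ + tr(Λ^{-1} Q_N), and it is minimized over positive definite Λ by Λ = Q_N/N. In the Python sketch below, Q_N is an arbitrary random positive definite matrix standing in for (2.15) with R(t) = I.

```python
import numpy as np

# Spot check: L(Lambda) = N*log det(Lambda) + tr(Lambda^{-1} Q) is
# minimized over positive definite Lambda by Lambda = Q / N.
rng = np.random.default_rng(6)
n, N = 3, 50

M = rng.standard_normal((n, 2 * n))
Q = M @ M.T                                   # positive definite w.p. 1

def negloglik(Lam):
    return N * np.log(np.linalg.det(Lam)) + np.trace(np.linalg.inv(Lam) @ Q)

best = negloglik(Q / N)
ok = True
for _ in range(200):
    P = rng.standard_normal((n, n))
    Lam = P @ P.T + 0.01 * np.eye(n)          # random positive definite candidate
    ok = ok and (negloglik(Lam) >= best - 1e-8)
print(ok)
```

No random candidate beats Λ = Q_N/N, in agreement with the reduction of the unknown-Λ problem to minimizing det Q_N(M(θ)).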
Summing up, the loss function (identification criterion) (2.17), which per se has a good physical interpretation, also corresponds to the log likelihood function in the case of independent and normally distributed innovations. In the analysis, however, this will not be exploited. The results are therefore valid for general distributions of the innovations.

3. CONSISTENCY AND IDENTIFIABILITY

The question of identifiability concerns the possibility to
determine the characteristics of a system using input-output data. This question is obviously closely related to the problem of consistency of the parameter estimate θ̂. A way to connect the two concepts is introduced in this section. The definitions given here are consistent with those of Ljung-Gustavsson-Söderström (1974). The consistency of the parameter estimate θ̂ depends on a variety of conditions, such as noise structure, choice of input signal, model parametrization etc. One specific problem is that there usually is parameter redundancy in the models. It was demonstrated in Examples 2.1 and 2.2 that several sets of matrices give the same input-output relationships, and hence cannot be distinguished from each other from measurements of inputs and outputs. Introduce the set

    D_T(S,M) = {θ ∈ D_M | lim_{N→∞} (1/N) Σ_{t=1}^{N} E|ŷ(t|θ) - ŷ(t|S)|² = 0}    (3.1)
The set D_T(S,M) consists of all parameters in D_M which give models that describe the system without error in the mean square sense. There might be differences between S and M(θ), θ ∈ D_T(S,M), due to initial conditions and discrepancies at certain time instances, but on the whole they are indistinguishable from input-output data only.

For the case of linear, time-invariant systems it is easy to see that D_T(S,M) can be described as

    D_T(S,M) = {θ ∈ D_M | G_{M(θ)}(z) = G_S(z), H_{M(θ)}(z) = H_S(z)}     (3.2)
Clearly, it is not meaningful to consider consistency if D_T(S,M) is empty. Therefore, unless otherwise stated, it will be assumed that M is such that D_T(S,M) is nonempty. Naturally, this is a very strong assumption in practice, since it implies that the actual process can be modelled exactly. However, the theory of consistency does not concern approximation of systems, but convergence to "true" values. In Ljung (1975) general convergence results are given, which are valid also for the case when D_T(S,M) is empty.
The estimate based on N data, θ̂(N), naturally depends on S and M and on the identification method used, I. It also depends on the experimental conditions, like the choice of input signals, possible feedback structures etc. The experimental conditions will be denoted by X. When needed, these dependencies will be given as arguments.

Suppose now that

    θ̂(N) → D_T(S,M) w.p.1 as N → ∞                                       (3.3)

REMARK. By this is meant that inf_{θ' ∈ D_T} |θ̂(N) - θ'| → 0 with probability one as N → ∞. It does not imply that the estimate converges. □
Then the models that are obtained from the identification all give the same input-output characteristics as the true system. If we understand a system basically as an input-output relation, it is natural to say that we have identified the system if this convergence holds:

DEFINITION 3.1: A system S is said to be System Identifiable (SI(M,I,X)) under given M, I, and X, if θ̂(N) → D_T(S,M) w.p.1 as N → ∞.
If the objective of the identification is to obtain a model that can be used to design control laws, the concept of SI is quite adequate. Since all elements in D_T(S,M) give the same input-output relation, they also give equivalent feedback laws. When minimizing the criterion function, it may however lead to numerical difficulties if there is a non-unique minimum. If the objective is to determine some parameters that have physical significance, another concept is more natural.

DEFINITION 3.2: A system S is said to be Parameter Identifiable (PI(M,I,X)) under given M, I, and X, if it is SI(M,I,X) and D_T(S,M) consists of only one point.

REMARK. Parameter identifiability is the normal identifiability concept, and it has been used by several authors, see e.g. Åström-Bohlin (1965), Balakrishnan (1968), Bellman-Åström (1970), Tse-Anton (1972) and Glover-Willems (1973). Usually the system matrices are assumed to correspond to a certain parameter value θ⁰ for the given model parametrization. In such a case the parameter θ⁰ is said to be identifiable w.p.1 (or in probability) if there exists a sequence of estimates that tends to θ⁰ w.p.1 (or in probability). Now, the sequence of estimates converges to θ⁰ w.p.1 if and only if the system is PI(M,I,X) according to Def. 3.2 and D_T(S,M) = {θ⁰}. Therefore the definition just cited is a special case of the Definition 3.2 above.
Clearly, a system S can be PI(M,I,X) only if D_T(S,M) = {θ⁰}. This means that there exists a one to one correspondence between the transfer function and the parameter vector θ⁰. This one to one correspondence can hold globally or locally around a given value. The terms global and local identifiability have been used for the two cases, see e.g. Bellman and Åström (1970). Definition 3.2 clearly corresponds to global parameter identifiability.

The problem to obtain such a one to one correspondence for linear systems is related to canonical representation of transfer functions. This is a field that has received much attention. The special questions related to canonical forms for identification have been treated by e.g. Åström-Eykhoff (1971), Caines (1971), Mayne (1972) and Rissanen (1974).

From the above discussion we conclude that the problem of consistency and identifiability can be treated as three different problems:

1. First determine a set D_I(S,M,I,X) such that θ̂(N) → D_I(S,M,I,X) w.p.1 as N → ∞. This is a statistical problem. To find such a set, certain conditions, mainly on the noise structure of the system, must be imposed.

2. Then demand that D_I(S,M,I,X) = D_T(S,M), i.e. that S is SI(M,I,X). This introduces requirements on the experimental conditions, X, i.e. the choice of input signal, feedback structures etc.

3. If so desired, require that D_T(S,M) consists of only one point. This is a condition on the model structure only, and for linear
systems it is of algebraic nature.

In Lemma 4.1 and in Theorems 4.1 and 4.2 of the following section the set D_I is determined for general model structures (2.18) and for linear systems, respectively. Problem 2 is discussed in Section 5 for linear time-invariant systems. In Gustavsson-Ljung-Söderström (1974) problem 2 is extensively treated for vector difference equations. Problem 3 is, as mentioned, the problem of canonical representation and can be treated separately from the identification problem. It will not be discussed in this paper.

REMARK. In the following, the arguments S, M, I, X in D_I, D_T, SI and PI will be suppressed when there is no risk of ambiguity.
4. CONSISTENCY FOR GENERAL MODEL STRUCTURES

The problem to determine a set D_I is, as mentioned above, a statistical problem. The approach used in most works is to apply an ergodicity result to the criterion function (2.17) and then show that D_I is the set of global minima of the limit of the criterion function. However, to assume ergodicity of the involved processes introduces rather limiting conditions on the system, possible feedback structure and on the input signal. Furthermore, uniform (in θ) inequalities for the loss functions must be established. This is a fairly difficult problem, which in fact has been overlooked in many of the previous works.

The set into which the estimates converge will here be shown to be

    D_I(w) = {θ ∈ D_M | lim inf_{N→∞} (1/N) Σ_{t=1}^{N} |ŷ(t|θ) - ŷ(t|S)|² = 0}    (4.1)

The reason for using limit inferior is that, under some circumstances, the limit may fail to exist.
It should also be noted that D_I may depend on the realization w, D_I = D_I(w), although in most applications it does not (a.e.), see Section 5. For adaptive regulators it is, however, sometimes useful to consider D_I as a function of w. If convergence into a set that does not depend on w is desired, this can be achieved by showing that D_I(w) ⊂ D_I w.p.1. Then θ̂(N) → D_I w.p.1 as N → ∞.

4.1 MAIN RESULT
LEMMA 4.1. Consider the system (2.2), where the innovations ε(t+1, Y_t, S) have suitably bounded conditional moments. Consider a set of models M such that D_T(S,M) is nonempty. Let θ̂(N) minimize the identification criterion (2.17), V_N(θ) = h[(1/N) Q_N(M(θ))], where h satisfies (2.16), over a compact set D_M. Let D_I(w) be defined by (4.1). Suppose that

    z(t) = sup_{θ ∈ D_M'} max_i |(d/dθ) ŷ^(i)(t|θ)|    ((i) denotes i-th row)

where D_M' is an open set containing D_M, satisfies the following condition w.p.1:

    lim sup_{N→∞} (1/N) Σ_{t=1}^{N} z(t)² < ∞                             (4.3)

Assume further that

    lim sup_{N→∞} tr (1/N) Q_N(M(θ)) < ∞   w.p.1 for any fixed θ ∈ D_M    (4.4)

and that

    lim inf_{N→∞} λ_min[(1/N) Q_N(M(θ̂(N)))] > 0   w.p.1                   (4.5)

(for h = tr this assumption (4.5) is not necessary). Then the estimate θ̂(N) converges into D_I(w) w.p.1 as N → ∞.
The proof of the lemma is given in the Appendix. To apply Lemma 4.1 conditions (4.3) and (4.4) have to be checked.
This requires some analysis of the model structures.
REMARK. If the search is restricted to a finite number of models, conditions (4.3) and (4.4) can be removed, and a smaller set can be chosen as the set D_I, see Ljung (1974).

4.2 LINEAR SYSTEMS
For the linear, time-invariant model described in Example 2.2 it is relatively easy to find the variable z(t) defined in Lemma 4.1. If the search for models is restricted to those corresponding to stable predictors, (4.3) will be satisfied if the input and output are subject to similar conditions. This is shown in Theorem 4.1.

THEOREM 4.1 (LINEAR, TIME-INVARIANT SYSTEMS). Consider the system (2.7), where E|e(t)|⁴ < C (and, if the general criterion (2.17) is used, E e(t)e(t)ᵀ > δI), and let the model set be described by

    y(t+1) = G_{M(θ)}(q⁻¹)u(t) + H_{M(θ)}(q⁻¹)e(t+1),

where D_M is compact and G_{M(θ)}(z) and H_{M(θ)}(z) are matrices with rational functions as entries. Assume, for VDE parametrizations (2.12), that det H_{M(θ)}(z) has all zeroes outside the unit circle for θ ∈ D_M; for state space realizations (2.13), that A_{M(θ)} − K_{M(θ)}C_{M(θ)} has all eigenvalues inside the unit circle for θ ∈ D_M; and, in the general case, that det H_{M(θ)}(z) as well as the least common denominator of the denominator polynomials in G_{M(θ)}(z) and H_{M(θ)}(z) have all zeroes outside the unit circle for θ ∈ D_M. Suppose also that D_T(S,M) (defined in (3.2)) is non-empty. Any feedback relationships between u and y may exist, but assume that (4.7) holds. Then the identification estimate θ̂(N) converges into D_I w.p.1 as N tends to infinity.

The proof is given in the Appendix. It uses the fact that the linear filters that determine ŷ(t|θ) and (d/dθ) ŷ(t|θ) from u and y are exponentially stable.
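The state space stability condition of Theorem 4.1, that all eigenvalues of A_{M(θ)} − K_{M(θ)}C_{M(θ)} lie inside the unit circle, is straightforward to check numerically. A sketch with purely illustrative matrices (not taken from the chapter):

```python
import numpy as np

def predictor_is_stable(A, K, C):
    """Check the Theorem 4.1 condition for state space realizations:
    all eigenvalues of A - K C strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A - K @ C))) < 1.0)

# Illustrative second-order predictor
A = np.array([[0.9, 1.0],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
K = np.array([[0.5],
              [0.1]])
print(predictor_is_stable(A, K, C))
```

For the matrices above, A − KC has complex eigenvalues of modulus about 0.65, so the predictor is exponentially stable.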
For the time-varying model described in Example 2.1 it is known that the Kalman filter (see e.g. Jazwinski (1970), Theorem 7.4) is exponentially stable if the system is completely uniformly observable and completely uniformly controllable. Furthermore, the constants in the exponential bound depend only on the bounds on the observability and controllability Gramians. Hence we have the following theorem:

THEOREM 4.2 (TIME-VARYING LINEAR SYSTEMS IN STATE SPACE FORM). Consider the linear system described in Example 2.1 and suppose that the covariance matrices are uniformly bounded from above. (For the general criterion (2.17) assume also that C_S(t)P_S(t)C_Sᵀ(t) + Q(t) > δI.) Let the set of models be defined as in Example 2.1, where {e(t)} and {w(t)} are sequences of independent Gaussian random variables, and where D_M is a compact set such that D_T(S,M) defined by (3.1) is nonempty, (A, C) is uniformly (in t and in θ ∈ D_M) completely observable, and the model is uniformly (in t and in θ ∈ D_M) completely controllable. Any feedback relationships between u and y may exist, but assume that (4.7) is satisfied. Then the identification estimate θ̂(N) converges into D_I with probability one as N tends to infinity.

Theorems 4.1 and 4.2 determine the set D_I under quite general and weak conditions. Actually, the imposed conditions (bounded fourth moments of the innovations, model search over stable predictors, and the condition (4.7) on the overall system behavior) are believed to be satisfied in almost all conceivable applications.
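The exponential stability of the Kalman filter invoked for Theorem 4.2 can be seen numerically: two filters for the same system, started with very different initial estimates and covariances, forget their initializations exponentially fast. A scalar, time-invariant sketch (all numbers illustrative, not from the chapter):

```python
import numpy as np

# Illustrative scalar system x(t+1) = a x(t) + w(t), y(t) = x(t) + v(t)
a, q, r = 0.9, 0.5, 1.0
rng = np.random.default_rng(1)

def kalman_step(xhat, P, y):
    # Measurement update followed by time update (scalar Kalman filter)
    K = P / (P + r)
    xhat = xhat + K * (y - xhat)
    P = (1 - K) * P
    return a * xhat, a * a * P + q

x = 0.0
x1, P1 = 10.0, 100.0   # filter 1: wildly wrong initialization
x2, P2 = 0.0, 1.0      # filter 2: nominal initialization
diffs = []
for t in range(60):
    x = a * x + np.sqrt(q) * rng.standard_normal()
    y = x + np.sqrt(r) * rng.standard_normal()
    x1, P1 = kalman_step(x1, P1, y)
    x2, P2 = kalman_step(x2, P2, y)
    diffs.append(abs(x1 - x2))
print(diffs[0], diffs[-1])   # the gap between the two filters decays toward zero
```

The per-step contraction factor is roughly a(1 − K) with K the steady-state gain, which is the exponential forgetting underlying the theorem.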
For actual applications it is of interest to study D_I more closely: When is it possible to find a set D_I satisfying (4.2) with D_I ⊂ D_T, i.e. what additional assumptions on the input generation must be imposed in order to assure System Identifiability? These questions are discussed in the next section.
5. IDENTIFIABILITY RESULTS

As outlined in Section 3, the identifiability and consistency questions can be separated into three problems. The first problem, to determine a set D_I, was solved in the previous section. The second problem, investigation of the set D_I and in particular the relation D_I ⊂ D_T, will be the subject of the present section.

5.1 A DETERMINISTIC SET D̄_I

The set D_I as defined in (4.1) is a random variable. However, in most applications

    D_I = D̄_I  w.p.1,    (5.1)

where D̄_I is the corresponding set obtained by taking the expectation with respect to the sequence of innovations. This deterministic set D̄_I may be easier to handle.

For linear systems the relation (5.1) will hold if the system is in open loop and stable, or contains linear feedback which makes the closed loop system exponentially stable. To include also nonlinear feedback terms, which make the closed loop system nonlinear, the concept of exponential stability has to be extended to stochastic, nonlinear systems.

DEFINITION 5.1. Consider the linear system (2.7), where the e(t) are independent random variables, and where part of the input u(t) is determined as (nonlinear) output feedback. Let the system and regulator be started up at time t − N, with zero initial conditions, yielding at time t the outputs y_t and y_{t,t−N} respectively. Suppose that the effect of the starting time decays exponentially, in the sense that E|y_t − y_{t,t−N}|⁴ ≤ C λ^N for some C < ∞ and λ < 1. Then the closed loop system is said to be exponentially stable.
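The content of Definition 5.1 can be probed numerically: run the same closed loop (here a hypothetical scalar linear system with a saturating, hence nonlinear, output feedback) from two different starting times with zero initial conditions and the same noise realization, and check that the two output trajectories merge. A sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
e = rng.standard_normal(T)   # common noise realization for both start-ups

def simulate(t_start):
    """Run y(t+1) = 0.5 y(t) + u(t) + e(t+1) with the nonlinear output
    feedback u(t) = -0.4 tanh(y(t)), started at t_start with zero state."""
    y = 0.0
    traj = np.zeros(T)
    for t in range(t_start, T - 1):
        traj[t + 1] = 0.5 * y - 0.4 * np.tanh(y) + e[t + 1]
        y = traj[t + 1]
    return traj

y_a = simulate(0)
y_b = simulate(50)            # started 50 steps later, zero initial conditions
gap = np.abs(y_a - y_b)[60:]  # compare once both loops have been running
print(gap.max(), gap[-1])     # the trajectories have essentially merged
```

The map y ↦ 0.5y − 0.4 tanh(y) is a contraction (its derivative lies between 0.1 and 0.5), so the effect of the starting time decays at least like 2⁻ᴺ, in line with the definition.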
For linear feedback this definition is consistent with requiring that the closed loop poles be inside the unit circle. It turns out that exponential stability assures not only (5.1) but also (4.7). Hence we have the following lemma:

LEMMA 5.1. Consider the linear systems of Example 2.1 or Example 2.2. Let the input have the general form

    u(t) = f_t(y(s), u(s), s < t; u_r(t), w(t)),

where u_r(t) is a signal that is independent of y(s), u(s), s < t, and where {w(t)} is a sequence of disturbances of filtered white noise character, say, which is independent of {e(t)} and such that E|w(t)|⁴ < C. The function f_t may be unknown to the experiment designer. Assume that the input is such that the closed loop system is exponentially stable (Def. 5.1) and that D_M satisfies the assumptions of Theorem 4.2 or 4.1 respectively. Suppose that e(t), y(t) and u(t) have uniformly bounded fourth moments. Then (4.7) and (5.1) hold.
Proof. The proof is based on a convergence theorem due to Cramér and Leadbetter (1967): if condition (5.3) holds, then the corresponding sample averages converge with probability one. It follows by straightforward calculations from the assumptions on exponential stability and on D_M that the relevant variables satisfy (5.3). (For details, see Ljung (1974), Lemma 5.2.) This proves the lemma.

5.2 LINEAR TIME-INVARIANT SYSTEMS
Let us now study in more detail linear time-invariant systems as treated in Example 2.2 and Theorem 4.1. Since this class includes any parametrization of vector difference equations or state space realizations, or any other parametrization of a linear time-invariant system, it is believed that such analysis is sufficient for most applications.

From Theorem 4.1 and Lemma 5.1 it follows that the estimates tend to the set D̄_I, determined by the discrepancies between the transfer functions G_S, H_S of the system and G_{M(θ)}, H_{M(θ)} of the models. This set clearly depends on the input signal. If the input is not sufficiently general, the set may contain parameters corresponding to models that describe the system well for the input used, but fail to describe it for other inputs. This is the case if the input contains too few frequencies or if it has certain relationships with the output. Then D̄_I is not contained in D_T and the system is not System Identifiable for this input (experiment condition).

The set D̄_I has been analysed in detail in Ljung-Gustavsson-Söderström (1974) for the case of time-varying feedback. Here we will consider a case with linear feedback and an extra input signal (or noise). Let the input be

    u(t) = F(q⁻¹)y(t) + u_R(t),

where {u_R(t)} is a sequence that does not depend on {e(t)}. Suppose that F(z) is a matrix with rational functions as entries, such that the closed loop system is stable. The closed loop system is

    y(t+1) = (I − q⁻¹G_S(q⁻¹)F(q⁻¹))⁻¹ [H_S(q⁻¹)e(t+1) + G_S(q⁻¹)u_R(t)].
Then, since the noise terms are independent of u_R, the expectation defining D̄_I can be written as a sum of one term depending on {e(t)} and one depending on {u_R(t)}. If E e(t)e(t)ᵀ > δI, then it follows that

    K_θ(q⁻¹) − L_θ(q⁻¹)F(q⁻¹) = 0  for θ ∈ D̄_I,    (5.4)

since the first term has to be arbitrarily close to zero infinitely often for θ ∈ D̄_I. If u_R is persistently exciting (see e.g. Mayne (1972)) of sufficiently high order, then this implies that

    L_θ(q⁻¹) = 0  for θ ∈ D̄_I,

which, via (5.4), implies that

    K_θ(q⁻¹) = 0  for θ ∈ D̄_I.

That is, D̄_I = D_T.
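The persistent excitation requirement on u_R can be checked through the smallest eigenvalue of the sample covariance of U_M(t) = col(u_R(t), ..., u_R(t−M)). A sketch (illustrative signals, not from the chapter) contrasting a rich input with one containing too few frequencies:

```python
import numpy as np

def pe_min_eig(u, M):
    """Smallest eigenvalue of (1/N) * sum of U_M(t) U_M(t)^T, where
    U_M(t) = col(u(t), ..., u(t-M)); a positive value indicates the
    record is persistently exciting of order M+1."""
    N = len(u)
    U = np.column_stack([u[M - k: N - k] for k in range(M + 1)])
    R = U.T @ U / U.shape[0]
    return np.min(np.linalg.eigvalsh(R))

rng = np.random.default_rng(3)
white = rng.standard_normal(5000)        # rich input: excites every order
sine = np.sin(0.3 * np.arange(5000))     # single sinusoid: rank-2 covariance

print(pe_min_eig(white, 4) > 0.5)        # well conditioned
print(pe_min_eig(sine, 4) < 1e-8)        # too few frequencies: singular
```

The single sinusoid spans only a two-dimensional subspace of the delayed-input vectors, which is exactly the "too few frequencies" failure discussed above.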
Prediction Error Identzyicafion Methods
REMARK. Let U (t) = col(uR(t) M sufficient to assume that
,...,uR (t-M)).
I51
Then it is
(5.5) The limit of the sum does not have to exist, as in the definition of persistent excitation in Mayne (1972). The number M on
s
for which (5.5) has to be satisfied depends
and on the parametrization of M.
presentations M
For state space re-
can be related to the orders of the system and
model, see e.g. Mayne (1972).
For the unspecified models, which
we deal with here, we can require that (5.5) holds for any M. Summing up this discussion, Lemma 5.1 and Theorem 4.1, we have the following theorem. THEOREM 5.1: y (t+l)
where
{e(t)}
that
Ele(t)
where
{u,(t)
for any
M.
=
C o n s i d e r the s y s t e m (2.7), s, Gs
(q-')
where
+ Hs (q-l)e (t+l)
i s a s e q u e n c e o f i n d e p e n d e n t random v a r i a b l e s s u c h
l4
< C and
}
Ee(t)e(t)T
i s independent of
Assume t h a t
i s exponentially stable. Y(t+l)
u (t)
=
GMce)
F
{e(t)
(q-l)u(t)
8(N)
1
and s a t i s f i e s (5.5)
L e t the model s e t , M, be d e s c r i b e d b y
+ HM(8) (q-')e(t+l)
conditions a s i n Theorem 4.1 f o r Let
The i n p u t i s
i s s u c h t h a t the c l o s e d l o o p s y s t e m
DM i s compact and s u c h t h a t
i s nonempty.
> 61.
H
M(e)
8 c D~
;
s a t i s f i e s the same
(2)
8 €DM. Assume t h a t
be the estimate o f
8
D,(S,M)
b a s e d on
N
data
points, o b t a i n e d b y m i n i m i z i n g the g e n e r a l criterion (2.17). Then
8(N)
-+
DT(S,M)
w i t h p r o b a b i l i t y one a s
N
-+
Q)
Lentwart Ljung
152
where
That is,
s
is System Identifiable.
REMARK. Notice that, when evaluating the criterion (2.17), the predictor ŷ(t|θ) does not have to be based on the true initial data. As remarked several times above, it is most suitably chosen as the time-invariant, steady state predictor (2.9) initialized with zero initial state.

The chapter by Caines and Chan in this volume contains several interesting related results on identifiability of feedback systems.

6. CONCLUSIONS

In this contribution consistency and identifiability
properties of prediction error identification methods have been investigated. A separation of the problem into three different tasks has been introduced, and results of varying generality and on varying levels have been given.
The results can be used as a
kit for "doing your own consistency theorem" in a specific application. They also solve the identifiability and consistency problem for linear, time-invariant systems under quite general assumptions, as shown in Theorem 5.1. The hard part in consistency theorems is believed to be determining the set into which the estimates converge (Problem 1 in the formulation of Section 3).
This has been solved for quite general model structures (Lemma 4.1) as well as for linear systems (Theorems 4.1 and 4.2).
Due to the very weak conditions imposed, these results
are applicable to adaptive systems of almost any kind, in addition to the more straightforward cases treated in Section 5. The difficult and vital problem of choosing a parametrization (model set) that gives identifiable parameters and consistent parameter estimates has not been considered here.
However, this
problem is most conveniently treated as a problem of its own, and
it does not depend on the identification method chosen.

APPENDIX

A.1 PROOF OF LEMMA 4.1

The idea of the proof is to show that, for a point θ_x ∉ D_I, the infimum of the criterion taken over an open sphere around θ_x with radius p eventually exceeds the criterion value at a fixed θ̄ ∈ D_T. Then the minimizing point θ̂(N) cannot belong to this sphere for N > N₀(θ_x, p, ω). This result is then extended to hold for the complement of any open region containing D_I by applying the Heine-Borel theorem.

Let, without loss of generality, R(t) = I. Introduce, for short,

    e(t) = y(t) − E(y(t) | Y_{t−1}, S) = y(t) − ŷ(t|S),

and consider the sample covariance matrix of these innovations. Let

    S_t = E[e(t)e(t)ᵀ | Y_{t−1}].

According to the assumptions, S_t > δI for all t. Each element of the difference between the sample covariance and (1/N) Σ S_t clearly is a martingale with bounded variance, from which it follows that the difference tends to zero w.p.1 (A.1), and

    Q_N(S) ≥ δ'²I  for N > N₁(ω), ω ∈ Ω₁, P(Ω₁) = 1,

where δ' = min(δ, 1/C). (The argument ω will as a rule be suppressed in the variables y, e, ŷ etc., but used explicitly in bounds.)

Introduce also the corresponding quantities for the models. Then it follows from (4.4) and (A.1) that (A.2) holds. Now take a fixed element θ̄ ∈ D_T and consider the criterion value V_N(θ̄). Since θ̄ ∈ D_T, by definition its predictions coincide with those of the true system, and from (A.1) the value V_N(θ̄) converges w.p.1 as N → ∞; since h is continuous, the limit is given by (A.3).

Let θ_x ∈ D_M be a fixed point and let

    B(θ_x, p) = {θ | |θ − θ_x| < p}.

From the mean value theorem,

    |ŷ(t|θ) − ŷ(t|θ_x)| ≤ p z(t)  for θ ∈ B(θ_x, p),

if p is sufficiently small.

We will first show that each element of the sample correlation matrix between the innovations and the predictions at θ_x tends to zero w.p.1 as N tends to infinity. Let

    r(t) = max(C t, Σ_{k=1}^{t} β_i(k)²)    (β_i = i-th component of ŷ(·|θ_x)).

It is easy to see that the normalized sum Σ e_j(t)β_i(t)/r(t) is a martingale with respect to {Y_N} with bounded second moments. Hence it converges with probability one (for ω ∈ Ω₄(θ_x), P(Ω₄(θ_x)) = 1) according to the martingale convergence theorem, Doob (1953). It now follows from Kronecker's lemma (see e.g. Chung (1968)) that, since r(N) → ∞,

    (1/r(N)) Σ_{t=1}^{N} e_j(t)β_i(t) → 0  as N → ∞  for ω ∈ Ω₄(θ_x).

But r(N)/N ≤ C₁(ω, θ_x) according to (A.2), and so

    (1/N) Σ_{t=1}^{N} e_j(t)β_i(t) → 0  as N → ∞  for ω ∈ Ω₄(θ_x) ∩ Ω₂.

Hence (A.5) holds for ω ∈ Ω₄(θ_x) ∩ Ω₂.

Property (2.16) can now be applied to (A.5) with

    c = (1/N) Q_N⁽¹⁾(θ_x) + (1/N) Q_N⁽²⁾(θ, θ_x).

First introduce a countable subset D̃_M of D_M that is dense in D_M. Also introduce a subset Ω_x of the sample space with P(Ω_x) = 1, and consider from now on a fixed realization ω ∈ Ω_x. Introduce the set D_M^x(ε, ω) ⊂ D_M. (If this set is empty for some ε > 0, the assertion of the theorem is trivially true.)

Since θ_x ∉ D_I, property (2.16) provides an ε₀ = ε₀(θ_x, ω). Choose N > N₀(θ_x, ω) = max(N₄(ω), N₃(ε₀/2, ω, θ_x), N₂(ω, ε₀/2)) and choose p(θ_x) accordingly.

The spheres B(θ, p*(θ)) then clearly cover the compact set D_M^x(ε, ω). According to the Heine-Borel theorem there exists a finite set {B(θ_i, p*(θ_i)), i = 1, ..., K} that covers D_M^x(ε, ω). It then follows from (A.8) that the minimizing element θ̂(N) cannot belong to D_M^x(ε, ω) for N > N₀(ω, ε), i.e.

    |θ̂(N) − D_I| < ε  for N > N₀(ω, ε),

which, since ε is an arbitrarily small number, is the assertion of the theorem.
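The key step of the proof, that sample averages (1/N) Σ e_j(t)β_i(t) of innovations times past-measurable quantities vanish w.p.1 (martingale convergence plus Kronecker's lemma), is easy to illustrate numerically. A hypothetical scalar example, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
e = rng.standard_normal(N)            # innovations: independent, zero mean

# beta(t) is measurable with respect to the past (here: filtered old noise),
# so e(t) * beta(t) is a martingale difference sequence.
beta = np.zeros(N)
for t in range(1, N):
    beta[t] = 0.8 * beta[t - 1] + e[t - 1]

partial_means = np.cumsum(e * beta) / np.arange(1, N + 1)
print(abs(partial_means[-1]))         # the running average tends to 0 w.p.1
```

Even though beta(t) is strongly correlated over time, its product with the fresh innovation e(t) averages out, which is precisely what makes the cross terms in the criterion disappear.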
A.2 PROOF OF THEOREM 4.1

The theorem follows from Lemma 4.1 if (4.3) and (4.4) can be shown to be satisfied. We will consider the general case, and let a(z) be the least common denominator of the denominator polynomials in G_{M(θ)}(z) and H_{M(θ)}(z). Introduce the corresponding matrix polynomials, among them C_{M(θ)}(z). Due to the assumptions, det C_{M(θ)}(z) has all zeroes outside the unit circle for θ ∈ D_M, i.e. C_{M(θ)}⁻¹(z) is an exponentially stable filter. (There are several pole-zero cancellations between a and det H. Therefore the requirement that a has all zeroes outside the unit circle is sufficient, but not necessary. Stability of C⁻¹ is the only thing that matters.)

From (A.9) and (A.10) it follows (with the argument q⁻¹ suppressed) that ŷ(t|θ) and (d/dθ) ŷ(t|θ) are obtained from u and y by filtering through exponentially stable filters, cf. (2.14).

Since C_{M(θ)}(q⁻¹)⁻¹ is exponentially stable for all θ ∈ D_M, and D_M is compact, the impulse responses can be bounded uniformly in θ: let C and λ, λ < 1, be such that they are dominated by C λᵗ. Introduce for brevity z̄(t), the corresponding exponentially weighted sum of past |y(s)| and |u(s)|. Then z(t) ≤ z̄(t) for θ ∈ D_M. Summing over t = 0, ..., N and dividing by N shows that the average of z̄(t)² is dominated by the averages of |y(t)|² and |u(t)|². According to the assumptions of the lemma these are bounded w.p.1, from which it directly follows that the average of z̄(t)² is bounded w.p.1. Since z(t) ≤ z̄(t), this concludes the proof of (4.3). Condition (4.4) follows from (A.12) and (4.7) in the same way.
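The exponential bound C λᵗ on the impulse response of the stable filter C_{M(θ)}⁻¹(q⁻¹) can be verified numerically for a given parametrization. A sketch with an illustrative second-order polynomial (not from the chapter):

```python
import numpy as np

def impulse_response(c_coeffs, n):
    """Impulse response of 1 / C(q^{-1}) with C(q^{-1}) = 1 + c1 q^{-1} + ...,
    computed from the recursion h(t) = -c1 h(t-1) - c2 h(t-2) - ..., h(0) = 1."""
    c = np.asarray(c_coeffs)          # [c1, c2, ...]
    h = np.zeros(n)
    h[0] = 1.0
    for t in range(1, n):
        m = min(t, len(c))
        h[t] = -np.dot(c[:m], h[t - 1::-1][:m])
    return h

# C(q^{-1}) = 1 - 1.5 q^{-1} + 0.7 q^{-2}: the roots of z^2 - 1.5 z + 0.7
# lie inside |z| < 1, so 1/C is an exponentially stable filter.
h = impulse_response([-1.5, 0.7], 200)
lam = np.max(np.abs(np.roots([1.0, -1.5, 0.7])))   # decay rate of |h(t)|
print(lam, np.abs(h[-1]))
```

Here λ ≈ 0.84, and the computed impulse response is dominated by a constant times λᵗ, which is exactly the bound used in the proof above.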
Acknowledgements

I would like to thank Professor Karl Johan Åström for important discussions on the subject, and Torsten Söderström and Daniel Thorburn for many valuable comments on the manuscript. The support of the Swedish Board for Technical Development under Contract No. 733546 is also gratefully acknowledged.
REFERENCES

Aoki, M., and Yue, P. C. (1970), "On Certain Convergence Questions in System Identification," SIAM J. Control, Vol. 8, No. 2.

Åström, K. J., and Bohlin, T. (1965), "Numerical Identification of Linear Dynamic Systems from Normal Operating Records," IFAC (Teddington) Symposium.

Åström, K. J. (1970), Introduction to Stochastic Control Theory, Academic Press, New York.

Åström, K. J., and Eykhoff, P. (1971), "System Identification - A Survey," Automatica, 7, 123-162.

Åström, K. J., and Källström, C. (1973), "Application of System Identification Techniques to the Determination of Ship Dynamics," Preprints of 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague, pp. 415-425.

Balakrishnan, A. V. (1968), "Stochastic System Identification Techniques," in: Stochastic Optimization and Control, M. F. Karreman, Ed., Wiley, New York, pp. 65-89.

Bellman, R., and Åström, K. J. (1970), "On Structural Identifiability," Math. Biosc., Vol. 7, pp. 329-339.

Caines, P. E. (1970), "The Parameter Estimation of State Variable Models of Multivariable Linear Systems," Ph.D. dissertation, Imperial College, London.

Caines, P. E. (1971), "The Parameter Estimation of State Variable Models of Multivariable Linear Systems," Proceedings of the U.K.A.C. Conference on Multivariable Systems, Manchester, England.

Caines, P. E., and Rissanen, J. (1974), "Maximum Likelihood Estimation of Parameters in Multivariable Gaussian Stochastic Processes," IEEE Trans. IT-20, No. 1.

Chung, K. L. (1968), A Course in Probability Theory, Harcourt, Brace & World, New York.

Cramér, H. (1946), Mathematical Methods of Statistics, Princeton University Press, Princeton.

Cramér, H., and Leadbetter, M. R. (1967), Stationary and Related Stochastic Processes, Wiley, New York.

Doob, J. L. (1953), Stochastic Processes, Wiley, New York.

Eaton, J. (1967), "Identification for Control Purposes," IEEE Winter Meeting, New York.

Fisher, R. A. (1912), "On an Absolute Criterion for Fitting Frequency Curves," Mess. of Math., Vol. 41, p. 155.

Glover, K., and Willems, J. C. (1973), "On the Identifiability of Linear Dynamic Systems," Preprints of the 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague, pp. 867-871.

Gustavsson, I., Ljung, L., and Söderström, T. (1974), "Identification of Linear, Multivariable Process Dynamics Using Closed Loop Experiments," Report 7401, Division of Automatic Control, Lund Institute of Technology.

Jazwinski, A. H. (1970), Stochastic Processes and Filtering Theory, Academic Press, New York.

Kailath, T. (1970), "The Innovations Approach to Detection and Estimation Theory," Proc. IEEE, Vol. 58, No. 5.

Lampard, D. G. (1955), "A New Method of Determining Correlation Functions of Stationary Time Series," Proc. IEE, 102C, pp. 35-41.

Ljung, L. (1974), "On Consistency for Prediction Error Identification Methods," Report 7405, Division of Automatic Control, Lund Institute of Technology.

Ljung, L. (1975), "On Consistency and Identifiability," Proc. Intern. Symp. Stochastic Systems, Lexington, Kentucky, June 1975 (North-Holland Publishing Co.). To appear in Mathematical Programming Studies.

Ljung, L., Gustavsson, I., and Söderström, T. (1974), "Identification of Linear Multivariable Systems Operating Under Linear Feedback Control," IEEE Trans. AC-19, No. 6 (Special Issue on System Identification and Time Series Analysis), pp. 836-841.

Lüders, G., and Narendra, K. S. (1974), "Stable Adaptive Schemes for State Estimation and Identification of Linear Systems," IEEE Trans. AC-19, No. 6 (Special Issue on System Identification and Time Series Analysis), pp. 841-848.

Mayne, D. Q. (1972), "A Canonical Form for Identification of Multivariable Linear Systems," IEEE Trans. AC-17, No. 5, pp. 728-729.

Mehra, R. K., and Tyler, J. S. (1973), "Case Studies in Aircraft Parameter Identification," Preprints of 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague, pp. 117-145.

Mendel, J. M. (1973), Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, New York.

Rissanen, J. (1974), "Basis of Invariants and Canonical Forms for Linear Dynamic Systems," Automatica, Vol. 10, No. 2, pp. 175-182.

Spain, D. S. (1971), "Identification and Modelling of Discrete, Stochastic Linear Systems," Tech. Report No. 6302-10, Stanford University.

Soudack, A. C., Suryanarayanan, K. L., and Rao, S. G. (1971), "A Unified Approach to Discrete-Time System Identification," Int. J. of Control, Vol. 14, No. 6, pp. 1009-1029.

Tse, E., and Anton, J. J. (1972), "On the Identifiability of Parameters," IEEE Trans. AC-17, No. 5.

Tsypkin, Ya. Z. (1973), Foundations of the Theory of Learning Systems, Academic Press, New York.

Wald, A. (1949), "Note on the Consistency of the Maximum Likelihood Estimate," Ann. Math. Stat., Vol. 20, pp. 595-601.

Woo, K. T. (1970), "Maximum Likelihood Identification of Noisy Systems," Paper 3.1, 2nd Prague IFAC Symposium on Identification and Process Parameter Estimation.