Convergence properties of the generalised least squares identification method


Automatica, Vol. 10, pp. 617-626. Pergamon Press, 1974. Printed in Great Britain.

Convergence Properties of the Generalised Least Squares Identification Method*

TORSTEN SÖDERSTRÖM†

Generalized least squares identification may be interpreted as a loss function minimization, but more than one local minimum may exist, depending on the signal to noise ratio, and undesired estimates can result.

Summary--Convergence properties of the generalized least squares method are analyzed. The method can be interpreted as optimization of a likelihood function. The number of local maximum points of the likelihood function is examined. It is shown that this number is influenced by the signal to noise ratio. This theoretical result is illustrated by numerical examples using plant measurements. It is also proved that the method gives consistent estimates under suitable conditions.

The local minimum points are analyzed. An asymptotic loss function is considered, i.e. the number of data is assumed to tend to infinity. It is proved that there is a local minimum point of the real loss function, based on a finite number of data, close to every local minimum point of the asymptotic loss function when the number of data is large. The validity of this assertion is moreover illustrated by numerical examples in the paper and also by simulated examples in [13].

1. INTRODUCTION

THE MAIN objective of this paper is to examine the properties of the generalized least squares (GLS) method. This method is an extension of the well-known least squares (LS) method, see [3], [9]. The basic advantage of the GLS method is the possibility of estimating the parameters of a system corrupted with noise of an arbitrary spectrum. The GLS method is an iterative procedure. An essential task is therefore to examine if and when the method converges. If the parameter estimates converge as the number of iterations tends to infinity, it is also of interest to examine the convergence limit. It is shown in the paper that the iterative procedure can be interpreted as minimization of a loss function using a special kind of search routine. The possible limit points of the estimates turn out to be the local minimum points of this loss function.

The relation between the local minimum points of the two loss functions is closely related to the concept of consistency. Then only global minimum points are considered. It is also proved that under suitable conditions the GLS method gives consistent estimates. Examination of the local minimum points of the loss functions corresponding to the model structures of some other identification methods has also been done by the author in [4], [14]. It must be stressed, however, that such an analysis depends very strongly on the considered model structure. It is shown in the paper that the GLS method has two large drawbacks. The first drawback is that there may exist more than one local minimum point. This is shown theoretically and also illustrated by a numerical example. It is possible to some extent to test the model obtained from the identification, cf. [6]. In this way the true global minimum point may be found. The other drawback is that the iterative procedure constitutes a very slow minimization method. The convergence to a local minimum point is only linear.

* Received 3 August 1973; revised 19 November 1973; revised 3 May 1974. The original version of this paper was presented at the 3rd IFAC Symposium on Identification and System Parameter Estimation which was held in The Hague/Delft, The Netherlands, during June 1973. It was recommended for publication in revised form by Associate Editor A. Sage.
† Uppsala University, Institute of Technology, Uppsala, Sweden.


In the paper an alternative, much more efficient minimization method is presented. There is no intention to make a careful comparison of the GLS method with other, well-known methods. However, in the numerical examples a comparison with the ML method used by ÅSTRÖM and BOHLIN [2] is given. The results of this paper can be found in a more detailed form in [13], which also contains the omitted proofs of the theorems.

2. THE GLS METHOD

Assume that a process (called the system in this paper) is asymptotically stable and given by

$$A^*(q^{-1})y(t) = B^*(q^{-1})u(t) + v(t) \qquad (2.1)$$

where y(t) is the output at time t, u(t) the input and v(t) is noise. q^{-1} is the backward shift operator and

$$A^*(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_n q^{-n}$$
$$B^*(q^{-1}) = b_1 q^{-1} + \ldots + b_n q^{-n}$$

For simplicity e(t) always denotes white noise with zero mean. Its variance Ee(t)^2 is denoted by σ^2. S denotes the signal to noise ratio. In the following it will be assumed that the noise v(t) can be expressed as

$$v(t) = H^*(q^{-1})e(t)$$

where H^*(q^{-1}) is an asymptotically stable filter of finite order. The purpose of an identification is to fit a mathematical model to given data. The LS method corresponds to the model

$$\hat A^*(q^{-1})y(t) = \hat B^*(q^{-1})u(t) + \varepsilon(t)^* \qquad (2.2)$$

and the parameter estimates are obtained from the unique minimum of the loss function

$$W = \frac{1}{2N}\sum_{t=1}^{N}\varepsilon^2(t)$$

where N denotes the number of data. If the noise v(t) is white this method will give consistent estimates [1]. If v(t) is not white the parameter estimates will in general converge to false values with probability one as the number of data tends to infinity. If the correlation function of v(t), or equivalently H^*(q^{-1}), is known it is possible to filter the data with H^*(q^{-1})^{-1} so that

$$v_F(t) = A^*(q^{-1})y_F(t) - B^*(q^{-1})u_F(t), \qquad [u_F(t) = H^*(q^{-1})^{-1}u(t) \text{ etc.}] \qquad (2.3)$$

is white noise. The LS method will then give consistent estimates if u_F(t) and y_F(t) are used. However, the assumption that H^*(q^{-1}) is known is not realistic. The idea of the GLS method, [9], is to estimate H^*(q^{-1}) as well as A^*(q^{-1}) and B^*(q^{-1}) in an iterative way. The main steps are the following:

(1) Estimate A^*(q^{-1}) and B^*(q^{-1}) by the LS method.
(2) Estimate H^*(q^{-1})^{-1} by fitting an autoregression to the residuals.
(3) Filter the data. Make a new LS estimation of A^*(q^{-1}) and B^*(q^{-1}) using the filtered data, and repeat from 2.

There are different versions of the GLS method. In the one considered here, described in [9] and [13], always the original, and not the already filtered, data are filtered. This version corresponds to the model

$$\hat A^*(q^{-1})y(t) = \hat B^*(q^{-1})u(t) + \frac{1}{\hat C^*(q^{-1})}\,e(t) \qquad (2.4)$$

For simplicity assume that all the polynomials of (2.4) are of degree n. The extension of the analysis to polynomials of different degrees is straightforward.

* In equation (2.2) ε(t) denotes the residuals, which can be interpreted as an equation error in (2.2) or as an estimate of the noise v(t).
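To make the three steps concrete, the following is a minimal Python sketch of the iteration, assuming data arrays u and y and model order n. The function names and the fixed iteration count are illustrative assumptions, not part of the original presentation; numpy's least-squares solver plays the role of the LS estimator.

import numpy as np

def fit_ls(y, u, n):
    # LS fit of A*(q^-1)y(t) = B*(q^-1)u(t) + eps(t) with deg A = deg B = n
    N = len(y)
    Phi = np.array([np.concatenate((-y[t-n:t][::-1], u[t-n:t][::-1]))
                    for t in range(n, N)])
    theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
    return theta[:n], theta[n:]                  # (a_1..a_n), (b_1..b_n)

def equation_error(a, b, y, u):
    # eps(t) = A*(q^-1)y(t) - B*(q^-1)u(t), computed on the ORIGINAL data
    eps = y.astype(float)
    for i in range(1, len(a) + 1):
        eps[i:] += a[i-1]*y[:-i] - b[i-1]*u[:-i]
    return eps

def fit_ar(eps, n):
    # step 2: fit an autoregression C*(q^-1)eps(t) = e(t) to the residuals
    N = len(eps)
    Phi = np.array([-eps[t-n:t][::-1] for t in range(n, N)])
    c, *_ = np.linalg.lstsq(Phi, eps[n:], rcond=None)
    return c

def filter_signal(c, x):
    # x_F(t) = C*(q^-1)x(t) = x(t) + c_1 x(t-1) + ... + c_n x(t-n)
    xf = x.astype(float)
    for i, ci in enumerate(c, start=1):
        xf[i:] += ci * x[:-i]
    return xf

def gls(y, u, n, iterations=20):
    a, b = fit_ls(y, u, n)                       # step 1: ordinary LS
    c = np.zeros(n)
    for _ in range(iterations):
        eps = equation_error(a, b, y, u)
        c = fit_ar(eps, n)                       # step 2: AR model of residuals
        yf, uf = filter_signal(c, y), filter_signal(c, u)
        a, b = fit_ls(yf, uf, n)                 # step 3: LS on filtered data
    return a, b, c

In practice the loop would be stopped when the estimates, or the loss function of Theorem 1 below, no longer decrease.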

3. PRELIMINARIES

The concept of persistently exciting signals will be used in the paper. A signal u(t) is said to be persistently exciting of order n, cf. [2], [3], if

(i) the limits

$$\bar u = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}u(t), \qquad r_u(\tau) = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}[u(t+\tau)-\bar u][u(t)-\bar u]$$

exist, and

(ii) the matrix

$$\begin{pmatrix} r_u(0) & \cdots & r_u(n-1) \\ \vdots & & \vdots \\ r_u(n-1) & \cdots & r_u(0) \end{pmatrix}$$

is positive definite.

This concept is discussed e.g. in [11]. Loosely speaking, if a controllable system of order n is controlled by an input signal persistently exciting of order 2n, then all modes of the system are excited. To be more precise, assume that A^*(q^{-1})y(t) = B^*(q^{-1})u(t) is controllable and of order n. If u(t) is persistently exciting of order 2n, then

$$\sum_{i=1}^{n}\left[h_i\,y(t-i) - h_{i+n}\,u(t-i)\right] = 0 \quad \text{for all } t$$

implies h_i = 0, i = 1, ..., 2n.
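Condition (ii) is easy to check numerically: form the Toeplitz matrix of sample covariances and test its smallest eigenvalue. The sketch below is illustrative only; with finite data the limits are replaced by sample averages, so a rank tolerance (an assumption) is unavoidable.

import numpy as np

def persistently_exciting(u, n, tol=1e-3):
    # builds the n x n matrix of condition (ii) from sample covariances
    u = np.asarray(u, dtype=float)
    du = u - u.mean()
    N = len(u)
    r = np.array([np.dot(du[:N - tau], du[tau:]) / N for tau in range(n)])
    R = r[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
    eig = np.linalg.eigvalsh(R)                  # ascending eigenvalues
    return bool(eig[0] > tol * max(eig[-1], 1e-12))

# white noise is persistently exciting of any order n << N, while a single
# sinusoid is persistently exciting of order 2 only:
print(persistently_exciting(np.random.randn(10000), 4))          # expect True
print(persistently_exciting(np.sin(0.7 * np.arange(10000)), 4))  # expect False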

The following theorem gives a valuable interpretation of the GLS method.

Theorem 1. The considered version of GLS is a minimization of the loss function

$$W_N(\theta) = W(\theta) = \frac{1}{2N}\sum_{t=1}^{N}\varepsilon^2(t), \qquad \varepsilon(t) = \hat C^*(q^{-1})\left[\hat A^*(q^{-1})y(t) - \hat B^*(q^{-1})u(t)\right]$$

by a relaxation method in which W_N is minimized first with respect to â_1, ..., â_n, b̂_1, ..., b̂_n, then w.r.t. ĉ_1, ..., ĉ_n, then again w.r.t. â_1, ..., â_n, b̂_1, ..., b̂_n, and so on. Further, if

$$v(t) = \frac{1}{C^*(q^{-1})}\,e(t) \qquad (3.1)$$

with e(t) white gaussian noise, the GLS method is the maximum likelihood method applied to (2.1).

Note that the subscript N will sometimes be dropped for simplicity. Introduce further the notation

$$\theta = [\hat a_1, \ldots, \hat a_n, \hat b_1, \ldots, \hat b_n, \hat c_1, \ldots, \hat c_n]^T$$

Remark 1. It follows from [7] that if (3.1) holds the estimates have nice asymptotic properties; they are e.g. asymptotically efficient and asymptotically normal.

Remark 2. W(θ_k), k = 1, 2, ... being the number of iterations, form a decreasing bounded sequence, which implies convergence. Possible bounded limits must be stationary points of W(θ). It can be shown, [13], that they must be minimum points if 'stability' is required. (By a stable limit point is understood that a start sufficiently close to the point will imply convergence to the point.) Thus it is of great interest to know whether the loss function has a unique local minimum or not.

Remark 3. Note that the convergence can be very slow. Close to a minimum point θ_k will converge linearly.

Remark 4. There are more efficient ways to minimize W(θ). The following optimization method reduces the computing time to about 10 per cent for some simple examples. Let φ̂ denote the LS estimate of A^*(q^{-1}) and B^*(q^{-1}) of order 2n and let P_φ̂ denote the corresponding estimated covariance matrix. The loss function can be written, cf. (2.2), as

$$W_{LS}(\varphi) = \frac{1}{2N}\sum_{t=1}^{N}\varepsilon(t)^2 = \frac{1}{2N}\sum_{t=1}^{N}\left[y(t) + z(t)^T\varphi\right]^2$$

where

$$z(t) = [y(t-1) \ldots y(t-2n)\ \ -u(t-1) \ldots -u(t-2n)]^T, \qquad \varphi = [a_1 \ldots a_{2n},\ b_1 \ldots b_{2n}]^T$$

Introduce further the notations

$$Z = \frac{1}{2N}\sum_{t=1}^{N}z(t)z(t)^T, \qquad \bar z = \frac{1}{2N}\sum_{t=1}^{N}y(t)z(t), \qquad Y = \frac{1}{2N}\sum_{t=1}^{N}y^2(t)$$

Then

$$W_{LS}(\varphi) = Y + 2\bar z^T\varphi + \varphi^T Z\varphi$$

Thus the LS estimate, φ̂, can be written as φ̂ = -Z^{-1}z̄. The corresponding value of the loss function is W_LS(φ̂) = Y - z̄^T Z^{-1} z̄. The estimated covariance matrix P_φ̂ is, see e.g. [1],

$$P_{\hat\varphi} = \frac{W_{LS}(\hat\varphi)}{N}\,Z^{-1}$$

Consider now the function

$$\tilde W(\varphi) = (\varphi - \hat\varphi)^T P_{\hat\varphi}^{-1}(\varphi - \hat\varphi) \qquad (3.2)$$

Using straightforward calculations it is rewritten as

$$\tilde W(\varphi) = \frac{N}{W_{LS}(\hat\varphi)}\left[\varphi^T Z\varphi - 2\varphi^T Z\hat\varphi + \hat\varphi^T Z\hat\varphi\right] = \frac{N}{W_{LS}(\hat\varphi)}\left[\varphi^T Z\varphi + 2\varphi^T\bar z + \bar z^T Z^{-1}\bar z\right] = \frac{N}{W_{LS}(\hat\varphi)}\left[W_{LS}(\varphi) - W_{LS}(\hat\varphi)\right]$$

Thus

$$\tilde W(\varphi) = \frac{N\left[W_{LS}(\varphi) - W_{LS}(\hat\varphi)\right]}{W_{LS}(\hat\varphi)} \qquad (3.3)$$

Now W̃(φ) will be considered only for some special values of φ. By multiplying (2.4) with Ĉ^*(q^{-1}) it can be seen that the GLS method can be interpreted as the LS method applied to a model of order 2n and with a constraint. In fact, φ is defined as a function of θ in the following way. The two polynomials A^*(q^{-1}) and B^*(q^{-1}) of degree 2n are expressed as

$$A^*(q^{-1}) = \hat A^*(q^{-1})\hat C^*(q^{-1}), \qquad B^*(q^{-1}) = \hat B^*(q^{-1})\hat C^*(q^{-1}) \qquad (3.4)$$

Using this restricted set of values of φ, it is seen that W_LS(φ) is nothing but the loss function W(θ) defined in Theorem 1. This means that minimization of W̃(φ), subject to the constraints (3.4), is equivalent to a minimization of W(θ). The minimization of W̃(φ) subject to (3.4) can be done directly with respect to θ, since φ is easily determined as a function of θ from (3.4). Moreover, in the minimization of W̃(φ) the data are not needed explicitly. This fact can be interpreted such that φ̂ and P_φ̂ form a sufficient statistic for the present problem. It can be mentioned that the problem of minimizing functions of the type (3.2) is discussed to a large extent in [15]. The reason for this interest is that such a minimization constitutes a way of testing common factors of polynomials.
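In code, the whole second step is a small unconstrained optimization over θ once φ̂ and P_φ̂ are available; the constraint (3.4) is imposed by polynomial multiplication. A sketch, where the helper names and the use of a general-purpose optimizer are assumptions for illustration:

import numpy as np
from scipy.optimize import minimize

def phi_of_theta(theta, n):
    # constraint (3.4): A* = Ahat* Chat*, B* = Bhat* Chat*, both of degree 2n
    a, b, c = theta[:n], theta[n:2*n], theta[2*n:]
    A = np.concatenate(([1.0], a))            # Ahat*(q^-1), monic
    B = np.concatenate(([0.0], b))            # Bhat*(q^-1), no b_0 term
    C = np.concatenate(([1.0], c))            # Chat*(q^-1), monic
    AC, BC = np.convolve(A, C), np.convolve(B, C)
    return np.concatenate((AC[1:], BC[1:]))   # phi = (a_1..a_2n, b_1..b_2n)

def W_tilde(theta, phi_hat, P_inv, n):
    d = phi_of_theta(theta, n) - phi_hat      # cf. equation (3.2)
    return d @ P_inv @ d

def gls_from_statistic(phi_hat, P, n, theta0):
    # the data never enter: (phi_hat, P) is a sufficient statistic
    res = minimize(W_tilde, theta0, args=(phi_hat, np.linalg.inv(P), n))
    return res.x                              # theta of the order-n GLS model

By (3.3) this function has the same minimizers in θ as W(θ) itself, which is why only the order-2n LS results are needed.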

(i) u(t) = u_1(t) + G^*(q^{-1})e_1(t), where u_1(t) is deterministic and periodic, G^*(q^{-1}) is a stable filter of finite order, and e_1(t) is white noise. One of the terms may vanish.

(ii) e_1(t) and e(t) (and thus u(t) and v(t)) are independent.

In [13] it is shown, by utilizing standard ergodic theory, that these assumptions imply that W_N(θ) has a limit V(θ) for all θ with probability one as N tends to infinity. The calculations are not included here due to their length. The asymptotic loss function V(θ) satisfies

$$V(\theta) = \tfrac{1}{2}E\left[\frac{\hat C^*(q^{-1})\{\hat A^*(q^{-1})B^*(q^{-1}) - A^*(q^{-1})\hat B^*(q^{-1})\}}{A^*(q^{-1})}\,u(t)\right]^2 + \tfrac{1}{2}E\left[\frac{\hat A^*(q^{-1})\hat C^*(q^{-1})}{A^*(q^{-1})}\,v(t)\right]^2 \qquad (3.5)$$

with probability 1. If z(t) is a deterministic signal, the notation Ez^2(t) means

$$Ez^2(t) = \lim_{N\to\infty}\frac{1}{N}\sum_{t=1}^{N}z^2(t)$$
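For readability, the step behind (3.5) may be filled in explicitly. Substituting y(t) from (2.1) into the residuals of Theorem 1 gives

$$\varepsilon(t) = \hat C^*(q^{-1})\left[\hat A^*(q^{-1})y(t) - \hat B^*(q^{-1})u(t)\right] = \frac{\hat C^*(q^{-1})\left[\hat A^*(q^{-1})B^*(q^{-1}) - A^*(q^{-1})\hat B^*(q^{-1})\right]}{A^*(q^{-1})}\,u(t) + \frac{\hat A^*(q^{-1})\hat C^*(q^{-1})}{A^*(q^{-1})}\,v(t)$$

and by assumption (ii) the u-term and the v-term are independent, so the cross term vanishes in the limit of W_N(θ) = (1/2N)Σε^2(t), which yields the two terms of (3.5).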

The function W_N(θ) is a polynomial in θ of degree four for all values of N. The coefficients of the polynomial are different sample covariances. The assertion of convergence means that these sample covariances converge with probability one when N tends to infinity. Since W_N(θ) and also V(θ) are polynomials, it follows that W_N(θ) converges uniformly to V(θ) on every compact set. In the next section the number of local minimum points of V(θ) is examined.

4. MAIN RESULTS

In the following analysis it will be assumed that the parameter vector θ belongs to an arbitrary compact set Ω. A very natural choice of the set Ω is given by the following conditions:

(i) Â^*(z) has all zeros outside or on the unit circle.
(ii) |b̂_i| ≤ B, i = 1, ..., n, for some constant B.

Theorem 2. Consider parameter estimates in a compact set Ω*. Suppose that V(θ) has a unique global minimum point θ_V in Ω*. Then W_N(θ) has (at least for large values of N) a global minimum point θ_W in Ω* such that

$$\theta_W \to \theta_V, \quad N \to \infty, \quad \text{with probability 1} \qquad (4.1)$$

If θ_W is not unique, then (4.1) is true for all the global minimum points.

Proof. Consider only realizations for which W_N(θ) converges to V(θ). These realizations have measure 1. Pick an arbitrary ε > 0 and consider the set S_ε = {θ; ||θ - θ_V|| ≤ ε}. It will be shown that θ_W ∈ S_ε provided that N is sufficiently large. As a first step a number δ > 0 can be chosen such that (since V(θ) is continuous)

$$\inf_{\Omega^* - S_\varepsilon} V(\theta) \ge V(\theta_V) + \delta \qquad (4.2)$$

As a second step a number N_0 can be found such that (since W_N(θ) is uniformly convergent)

$$|V(\theta) - W_N(\theta)| \le \delta/3 \quad \text{for all } \theta \in \Omega^* \text{ if only } N \ge N_0$$

Thus

$$\inf_{\Omega^*} W_N(\theta) \le W_N(\theta_V) \le V(\theta_V) + \delta/3$$

and, with use of (4.2),

$$\inf_{\Omega^* - S_\varepsilon} W_N(\theta) \ge \inf_{\Omega^* - S_\varepsilon} V(\theta) - \delta/3 \ge V(\theta_V) + \tfrac{2}{3}\delta$$

Thus the global minimum point(s) of W_N(θ) is (are) situated inside S_ε. This conclusion finishes the proof. □

Corr. 1. Assume that the system (2.1) is controllable, that the input signal is persistently exciting of order 2n (cf. section 3), and that

$$v(t) = \frac{1}{C^*(q^{-1})}\,e(t) \qquad (3.1)$$

is true. Then the estimate θ_N, defined as the global minimum point of W_N(θ) in Ω, is consistent.

Proof. Take Ω* in the theorem as Ω. It remains to analyze the global minimum points of V(θ). Using the expression (3.5) for V(θ) and simple calculations,

$$V(\theta) = \tfrac{1}{2}E\left[\frac{\hat C^*(q^{-1})\{\hat A^*(q^{-1})B^*(q^{-1}) - A^*(q^{-1})\hat B^*(q^{-1})\}}{A^*(q^{-1})}\,u(t)\right]^2 + \tfrac{1}{2}E\left[\frac{\hat A^*(q^{-1})\hat C^*(q^{-1})}{A^*(q^{-1})C^*(q^{-1})}\,e(t)\right]^2$$

$$\ge \tfrac{1}{2}E\left[\frac{\hat A^*(q^{-1})\hat C^*(q^{-1})}{A^*(q^{-1})C^*(q^{-1})}\,e(t)\right]^2 = \tfrac{1}{2}E\left[e(t) + k_1 e(t-1) + \ldots\right]^2 \ge \tfrac{1}{2}E[e(t)]^2$$

Equalities are obtained if and only if

$$\frac{\hat C^*(q^{-1})}{A^*(q^{-1})}\{\hat A^*(q^{-1})B^*(q^{-1}) - A^*(q^{-1})\hat B^*(q^{-1})\}u(t) = 0 \quad \text{with prob. 1} \qquad (4.3)$$

$$\frac{\hat A^*(q^{-1})\hat C^*(q^{-1})}{A^*(q^{-1})C^*(q^{-1})} \equiv 1 \qquad (4.4)$$

Since u(t) is persistently exciting it follows from (4.3), cf. e.g. [11], [13], that

$$\hat A^*(q^{-1})B^*(q^{-1}) - A^*(q^{-1})\hat B^*(q^{-1}) \equiv 0 \qquad (4.5)$$

Since A^*(z) and B^*(z) are assumed to be relatively prime, it is easy to show that (4.5) implies

$$\hat A^*(q^{-1}) \equiv A^*(q^{-1}), \qquad \hat B^*(q^{-1}) \equiv B^*(q^{-1})$$

Finally (4.4) implies

$$\hat C^*(q^{-1}) \equiv C^*(q^{-1})$$

which completes the proof. □

Corr. 2. Assume that V(θ) has a local minimum point θ_V^L in Ω. Then, at least for large values of N, there is a local minimum point θ_W^L of W_N(θ) in Ω such that

$$\theta_W^L \to \theta_V^L, \quad N \to \infty, \quad \text{with probability 1}$$

Proof. Let Ω* be a small neighbourhood of θ_V^L. Apply the theorem. □

The implication of Corr. 2 is that V(θ) contains valuable information also about the local minimum points of W_N(θ) for large values of N. Thus the theoretical results in this paper require a large number of data. For practical reasons it is interesting to know if they are applicable for a given number of data. In order to examine this problem, identifications of simulated data, as well as of plant measurements, were made. All computations were carried out on a UNIVAC 1108. In [13] a description of the programs used and the simulation results can be found. From a practical point of view it is more interesting to use real data than to consider simulated examples. It is then also possible to satisfactorily test the relevance of the obtained models. For this reason only plant measurements are used as illustrations in the paper. For parametric identification methods the choice of model order is a crucial point. For the maximum likelihood method the following test quantity is proposed in [5]. When the order of the model is increased by one, the decrease of the loss function is ΔW. The test quantity t = NΔW/W is formed. The model of the lower order is accepted (ΔW is not significant) if t is small. Since for the ordinary maximum likelihood case t is asymptotically χ²(3) distributed, see [5], it may be appropriate to apply this technique for the GLS method as well. Then the model of lower order should be accepted if t is smaller than 9. A straightforward application of such a test of order would in general result in more complex models for the considered processes. However, the orders of the models are not unreasonable.
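Stated as a computation, the order test is one line; the following sketch assumes the minimal losses of the two competing models are available, takes the denominator as the loss of the larger model (an assumption), and uses the threshold 9 quoted in the text:

def order_test(W_m, W_m1, N, threshold=9.0):
    # W_m: minimal loss for order m; W_m1: minimal loss for order m + 1
    t = N * (W_m - W_m1) / W_m1
    return t, t < threshold        # (test quantity, accept the lower order?)

# Example 1 below reports t = 34.2 for orders 1 vs 2 (order 1 rejected)
# and t = 3.3 for orders 2 vs 3 (order 2 accepted).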

It is worth mentioning that the main purpose of the identifications of the plant measurements was to investigate the possible existence of more than one local minimum of the loss function. It is to be noted that for a 'false' minimum point the estimated covariance matrix has statistically a dubious meaning.

The results of the identification are compared with models obtained with the maximum likelihood method used by e.g. ÅSTRÖM and BOHLIN [2], i.e. the model

$$\hat A^*(q^{-1})y(t) = \hat B^*(q^{-1})u(t) + \hat C^*(q^{-1})e(t) \qquad (4.6)$$

The results of the identifications of the plant measurements are illustrated by plotted signals and normalized covariance functions. The following signals are plotted in Figs. 1 and 4: the input u(t), the output y(t), the model output y_m(t) = [B̂^*(q^{-1})/Â^*(q^{-1})]u(t), the model error e_m(t) = y(t) - y_m(t) and the residuals ε(t). The plotted covariance functions, in Figs. 2, 3, 5 and 6, are r_ε(τ), r_εu(τ) = Eε(t)u(t+τ) and r_{e_m u}(τ). The 5 per cent confidence intervals are given by dashed lines. The time is given in sampling periods in all the diagrams.

[FIG. 1. Models of the nuclear reactor. The input is given in digital units and the other variables in MW. The sampling period is 2 sec.]

[FIG. 2. Normalized sample covariance functions for the nuclear reactor, model 1.]

[FIG. 3. Normalized sample covariance functions for the nuclear reactor, model 2.]

Large signal to noise ratios

The following theorem characterizes the local minimum points of the loss function.

Theorem 3. Let the system (2.1) be controllable and of order n. Assume that the order of the model is n + k, k ≥ 0, and that u(t) is persistently exciting of order 2n + k (cf. section 3). Consider parameter estimates in Ω, an arbitrary compact set. Then there is a constant S_0 such that if S ≥ S_0 the following holds.

(i) All local minimum points satisfy

$$\hat A^*(q^{-1}) = A^*(q^{-1})L^*(q^{-1}) + o(1), \quad S \to \infty \qquad (4.7)$$
$$\hat B^*(q^{-1}) = B^*(q^{-1})L^*(q^{-1}) + o(1), \quad S \to \infty \qquad (4.8)$$

where

$$L^*(q^{-1}) = 1 + l_1 q^{-1} + \ldots + l_k q^{-k} \quad \text{if } k > 0, \qquad L^*(q^{-1}) = 1 \quad \text{if } k = 0$$

Further, L^*(q^{-1}) and Ĉ^*(q^{-1}) fulfil

$$L^*(q^{-1}) = \bar L^*(q^{-1}) + o(1), \quad S \to \infty \qquad (4.9)$$
$$\hat C^*(q^{-1}) = \bar C^*(q^{-1}) + o(1), \quad S \to \infty \qquad (4.10)$$

where (l̄_1, ..., l̄_k, c̄_1, ..., c̄_{n+k}) is a stationary point of

$$V_3(l_1, \ldots, l_k, c_1, \ldots, c_{n+k}) = E\left[L^*(q^{-1})\hat C^*(q^{-1})v(t)\right]^2 \qquad (4.11)$$

The matrix of second order derivatives of V_3 in (l̄_1, ..., c̄_{n+k}) must be positive definite or positive semidefinite.

(ii) If the matrix of second order derivatives of V_3 in (l̄_1, ..., c̄_{n+k}) is positive definite, then there exists a unique local minimum point of the form (4.7)-(4.11), and the terms o(1) can be replaced by O(1/S). Further, the matrix V'' of the second order derivatives is positive definite in this point.

All terms S·O(1/S) are bounded when S tends to infinity. All terms o(1) tend to zero when S tends to infinity.

[FIG. 5. Normalized sample covariance functions for the distillation column, model 1.]

[FIG. 6. Normalized sample covariance functions for the distillation column, model 2.]


Remark 1. The different assumptions have the following meanings.

(i) The restriction on the input signal is very natural. If the condition is not satisfied the loss function may have several global minimum points.

(ii) The study of only parameter estimates in Ω is motivated above.

(iii) The restriction on the signal to noise ratio is crucial, as is shown in Theorem 4.

(iv) The assumption of controllability is essential. If the system is not controllable, there is a factor in common between A^*(q^{-1}) and B^*(q^{-1}). Equation (2.1) can be divided by this factor, obtaining a controllable system of lower order than the original and with another correlation of the noise. A non-controllable system is thus equivalent to an increase of k.

[FIG. 4. Models of the distillation column. Digital units are used. The sampling period is 96 sec.]

Remark 2. Note that all minimum points have the property

$$\frac{\hat B^*(q^{-1})}{\hat A^*(q^{-1})} = \frac{B^*(q^{-1})}{A^*(q^{-1})} + o(1), \quad S \to \infty$$

TABLE 1. PARAMETER ESTIMATES FROM IDENTIFICATION OF THE NUCLEAR REACTOR DATA

Parameter      Model 1         Model 2         Model in [8]
â_1            -0.18 ± 0.06    -0.91 ± 0.02    -0.89 ± 0.02
â_2             0.07 ± 0.02    -0.01 ± 0.02    -0.01 ± 0.02
b̂_0             2.41 ± 0.14     2.38 ± 0.14     2.43 ± 0.12
b̂_1             5.80 ± 0.21     4.01 ± 0.19     4.02 ± 0.17
b̂_2            -1.08 ± 0.38    -5.83 ± 0.16    -5.68 ± 0.15
ĉ_1            -0.79 ± 0.07    -0.06 ± 0.04    comparison impossible
ĉ_2            -0.12 ± 0.06     0.06 ± 0.03    comparison impossible
W × 10^4        3.41            3.47            3.45
σ̂ × 10^2        2.61            2.63            2.63
Static gain     7.95            7.07            7.50

Remark 3. If k = 0 the loss function has a unique local minimum point, which satisfies

$$\hat a_i = a_i + O(1/S), \quad \hat b_i = b_i + O(1/S), \quad \hat c_i = \bar c_i + O(1/S), \quad S \to \infty, \quad i = 1, \ldots, n \qquad (4.12)$$

where C̄^*(q^{-1}) = 1 + c̄_1 q^{-1} + ... + c̄_n q^{-n} and (c̄_1, ..., c̄_n) is the minimum point of

$$E\left[\bar C^*(q^{-1})v(t)\right]^2 \qquad (4.13)$$

If further (3.1) is fulfilled, then the point satisfies â_i = a_i, b̂_i = b_i, ĉ_i = c_i, i = 1, ..., n.
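Determining the minimum point (c̄_1, ..., c̄_n) of (4.13) is a linear problem: setting the derivatives of E[C̄^*(q^{-1})v(t)]^2 with respect to the c_i to zero gives Yule-Walker type normal equations in the covariances r_v(τ) of the noise. A sketch, assuming the covariance sequence r_v(0), ..., r_v(n) is known or estimated:

import numpy as np

def minimum_point_of_4_13(r_v, n):
    # normal equations: sum_j c_j r_v(i - j) = -r_v(i), i = 1..n (c_0 = 1)
    R = np.array([[r_v[abs(i - j)] for j in range(n)] for i in range(n)])
    rhs = -np.array([r_v[i + 1] for i in range(n)])
    return np.linalg.solve(R, rhs)             # (c_1, ..., c_n)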

Example 1. The data used in this example are plant measurements from a nuclear reactor in the OECD Halden Reactor Project in Norway. The input is reactivity created by control rod movement and the output is the nuclear power. For a description of the experiment see [12]. ML identifications using the model (4.6) are reported in [8]. The system contains a direct term (i.e. the term b_0 in B(q^{-1}) = b_0 + b_1 q^{-1} + ...) which is easily handled by shifting the data. The test quantity t for comparing the models of orders 1 and 2 is 34.2, while the value is 3.3 when the models of orders 2 and 3 are compared. Thus order two seems to be satisfactory. Two minimum points of W(θ) were found for this order. The results of the identification are given in Table 1 and Figs. 1-3. It is seen from the figures that the differences between the models are small. Further, (â_1, â_2, ĉ_1, ĉ_2) of model 1 is close to (ĉ_1, ĉ_2, â_1, â_2) of model 2. In fact, both models, as well as the model in [8], may be simplified to a first order system

$$y(t) = 2.4(1 + 2.6q^{-1})u(t) + \frac{1}{1 - 0.9q^{-1}}\,e(t) \qquad (4.14)$$

if approximate factors in common and small zeros are omitted. An identification of the first order model gave the result

$$y(t) = \frac{2.396 + 6.234q^{-1}}{1 - 0.00012q^{-1}}\,u(t) + \frac{1}{1 - 0.918q^{-1} + 0.0001q^{-2}}\,e(t)$$

and σ̂ = 2.66 × 10^{-2}, which differs only slightly from the simplified model. Since the two models do not differ significantly, it is impossible to call any of them the 'best' or most 'correct' one. If (4.14) is an adequate description of the dynamical behaviour of the process, then it is expected, with Theorem 3 in mind, that there will be at least two different but equivalent models of second order. The models obtained by identification are in fact close to those expected models. Of course, this is a loose discussion, resting on the assumption that (4.14) describes the system adequately enough.
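The approximate common factors behind the simplification (4.14) can be inspected directly from the rounded entries of Table 1; the following check uses model 2 and is only indicative, since the table values are rounded:

import numpy as np

# model 2 of Table 1, written as polynomials in z (coefficients of z^2, z, 1)
A2 = [1.0, -0.91, -0.01]      # Ahat*: zeros near 0.92 and -0.01
B2 = [2.38, 4.01, -5.83]      # Bhat*, including the direct term b_0
print(np.roots(A2))           # about [0.92, -0.01]
print(np.roots(B2))           # about [0.94, -2.62]
# the zero near 0.94 almost cancels the pole near 0.92 in Bhat*/Ahat*,
# leaving the factor 2.4(1 + 2.6 q^-1) of (4.14); the pole near 0.9
# survives in the noise part 1/(Ahat* Chat*) since Chat* is close to 1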

Small signal to noise ratios

When the signal to noise ratio is low it turns out that the loss function V in general has no unique minimum point.

Theorem 4. Assume that

(i) u(t) is persistently exciting of order n (cf. section 3);

(ii) there exist at least two different pairs of polynomials Ā_1^*(q^{-1}), C̄_1^*(q^{-1}) and Ā_2^*(q^{-1}), C̄_2^*(q^{-1}) such that

$$V_2(a_1, \ldots, a_n, c_1, \ldots, c_n) = E\left[\frac{\hat A^*(q^{-1})\hat C^*(q^{-1})H^*(q^{-1})}{A^*(q^{-1})}\,e(t)\right]^2$$

has local minimum points, with positive definite matrices of second order derivatives, in (ā_{11}, ..., ā_{1n}, c̄_{11}, ..., c̄_{1n}) and (ā_{21}, ..., ā_{2n}, c̄_{21}, ..., c̄_{2n}).

Then there is a number S_1 > 0 such that 0 < S ≤ S_1 implies that the loss function V(θ) has at least two local minimum points, satisfying

$$\hat A^*(q^{-1}) = \bar A_i^*(q^{-1}) + o(S), \quad S \to 0, \quad i = 1, 2$$
$$\hat C^*(q^{-1}) = \bar C_i^*(q^{-1}) + o(S), \quad S \to 0$$

The basic idea of the proof is to first express {b̂_i} in {â_j, ĉ_k} using

$$\frac{\partial V}{\partial \hat b_i} = 0, \quad 1 \le i \le n$$

and then to use the inverse function theorem applied to

$$\frac{\partial V}{\partial \hat a_i} = 0, \quad \frac{\partial V}{\partial \hat c_i} = 0, \quad 1 \le i \le n$$

An analysis of the second assumption and when it is fulfilled can be found in [13]. For example, it is sufficient that V_2 has a minimum point with V_2'' positive definite and Ā^*(q^{-1}) ≠ C̄^*(q^{-1}). A conclusion of Theorems 3 and 4 is that S has a strong influence on the uniqueness of the estimates. However, the 'dividing point' depends on the system itself. For this reason it is difficult to explicitly specify what large S and small S mean.
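Assumption (ii) can be explored numerically. Since all polynomials are monic, V_2 equals σ^2 times the sum of squared impulse response coefficients of Â^*Ĉ^*H^*/A^*, which can be evaluated by truncated convolution. The sketch below scans the n = 1 case for an arbitrarily chosen effective noise filter H^*(q^{-1})/A^*(q^{-1}) (an assumption for illustration, not a system from the paper). Note that V_2 is symmetric in a and c, so whenever a minimizer has a ≠ c there are automatically at least two local minima, in line with the theorem.

import numpy as np

def impulse(num, den, K=200):
    # first K impulse response coefficients of num(q^-1)/den(q^-1)
    g = np.zeros(K)
    for k in range(K):
        acc = num[k] if k < len(num) else 0.0
        acc -= sum(den[j]*g[k-j] for j in range(1, min(k, len(den)-1) + 1))
        g[k] = acc
    return g

# effective noise filter 1/(1 - 0.4q^-1 - 0.21q^-2); its inverse factors
# as (1 - 0.7q^-1)(1 + 0.3q^-1), i.e. real and distinct zeros (assumed)
h = impulse([1.0], [1.0, -0.4, -0.21])

def V2(a, c):
    f = np.convolve(np.convolve([1.0, a], [1.0, c]), h)
    return np.sum(f**2)            # V2 / sigma^2, truncated

grid = np.linspace(-0.95, 0.95, 191)
V = np.array([[V2(a, c) for c in grid] for a in grid])
i, j = np.unravel_index(np.argmin(V), V.shape)
print(grid[i], grid[j])            # about (-0.7, 0.3); by symmetry (0.3, -0.7)
                                   # is a second, equally deep, local minimum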

BOHLIN [6] has given results which can be used to test if an estimate is the true maximum likelihood estimate. The test quantity involves sample covariances of ε(t) and u(t). If, however, the noise level is high this method cannot be used successfully in the case described here. The minimum points of the loss function will give residuals ε_1(t) and ε_2(t) satisfying ε_1(t) - ε_2(t) = O(S), so all possible test quantities will differ only slightly if S is small. It can be argued that this fact is not so serious. When S is small, identification is a hard task and it is difficult to get accurate models at all. However, in the example below S is not unreasonably small, and yet it is shown that the loss function has at least two local minimum points. From the sample covariance functions no clear conclusions can be made.
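The test quantities in question rest on sample covariances such as the normalized r_εu(τ) plotted in Figs. 2-6. A minimal sketch, where the ±1.96/√N band is the usual approximate 5 per cent interval under the hypothesis that residuals and input are independent (an approximation, cf. [6]):

import numpy as np

def r_eps_u(eps, u, max_tau):
    # normalized sample version of r_eps_u(tau) = E eps(t) u(t + tau)
    eps = eps - eps.mean()
    u = u - u.mean()
    N = len(eps)
    scale = np.sqrt(np.dot(eps, eps) * np.dot(u, u)) / N
    r = [np.dot(eps[:N - t], u[t:]) / N for t in range(max_tau + 1)]
    return np.array(r) / scale, 1.96 / np.sqrt(N)   # (r, confidence band)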

Example 2. The system is a binary distillation column. The data have been received from Dr. D. J. Williams, who performed the measurements at the National Physical Laboratory, London. Results of maximum likelihood identifications are reported in [10]. The input signal is the reflux ratio and the output signal is the top product composition. The test quantity t for comparing models of orders 2 and 3 is 108. Since the ML identification indicates a model of order 2 as reasonable, this order was considered in spite of the great value of the test quantity. For second order models two minimum points of the loss function were found. The results from the identification are given in Table 2 and Figs. 4-6.

TABLE 2. PARAMETER ESTIMATES FROM IDENTIFICATION OF THE DISTILLATION COLUMN DATA

Parameter      Model 1         Model 2         Model in [10]
â_1            -1.53 ± 0.02     0.19 ± 0.08    -1.54 ± 0.02
â_2             0.55 ± 0.02    -0.02 ± 0.05     0.55 ± 0.02
b̂_1             0.24 ± 0.02     0.37 ± 0.02     0.23 ± 0.02
b̂_2            -0.62 ± 0.02     0.22 ± 0.03    -0.60 ± 0.02
ĉ_1             0.82 ± 0.06    -1.51 ± 0.08    comparison impossible
ĉ_2             0.41 ± 0.06     0.54 ± 0.07    comparison impossible
W              69.04           164.37          71.68
σ̂              11.75            18.13          11.97
Static gain   -18.77             0.507        -22.46
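The static gains of Table 2 follow from the parameter estimates as B̂(1)/Â(1). Because Â(1) is close to zero for model 1, the gain is very sensitive to rounding; a quick check with the rounded table values:

def static_gain(a, b):
    # steady-state gain of B(q^-1)/A(q^-1): sum(b) / (1 + sum(a))
    return sum(b) / (1.0 + sum(a))

print(static_gain([-1.53, 0.55], [0.24, -0.62]))   # -19.0  (table: -18.77)
print(static_gain([0.19, -0.02], [0.37, 0.22]))    #  0.504 (table:  0.507)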

From Table 2 it is seen that Ĉ^*(q^{-1}) of model 2 is very close to Â^*(q^{-1}) of model 1. With Theorem 4 in mind this is not astonishing. The model from [10] is very close to model 1, which means that the noise can well be modelled as

$$v(t) = \frac{1}{\hat C_1^*(q^{-1})}\,e(t) \quad \text{as well as} \quad v(t) = \frac{1}{\hat C_2^*(q^{-1})}\,e(t)$$

with e(t) white noise, where Ĉ_i^*(q^{-1}) denotes the Ĉ^* polynomial of model i. The values of the static gain indicate that model 1 gives the best description of the process. Also, from the lower value of the loss function at the corresponding minimum point, it can be expected that this model is to be preferred. The plots of the results are a nice illustration of the expected differences. From Figs. 5-6 it is noted that the residuals are whitest for model 2 and least correlated with the input for model 1. This means it is hard or impossible to choose the best model from these figures.

5. CONCLUSIONS

Some essential properties of the generalized least squares (GLS) method for identification of dynamical systems are summarized below. Part of the material is well known. The GLS method is an uncomplicated extension of the least squares (LS) method. Besides a program for LS identification, only programs for administration and filtering are needed.


In the GLS method the noise can be modelled in different ways, and each way corresponds to some model structure. One such way is to model the noise as an autoregression. Only this version of the GLS method is treated in the paper. It can be interpreted as a special minimization of the negative logarithm of the likelihood function. Provided that the global minimum point of this loss function is found, the GLS method becomes, under mild conditions, equivalent to the maximum likelihood method, and has nice asymptotic properties, e.g. consistency of the parameter estimates.

The GLS method gives a very slow convergence close to a minimum point of the loss function. The method is thus inappropriate if great accuracy is required. An alternative, much more efficient minimization technique was proposed in the paper. As a first step an LS model of order 2n is computed. As a second and last step a non-quadratic minimization problem is solved. In this optimization the data are not needed explicitly.

Applied to data without severe peculiarities, the GLS method gives good results, comparable with the results of the more complicated 'ordinary' maximum likelihood method. The required conditions on the data are weaker than for the simpler LS method.

For a sufficiently high value of the signal to noise ratio it can be shown theoretically that the loss function has only one local minimum point. The loss function corresponding to the GLS method may, however, have more than one minimum point. In this case the result of the GLS identification depends on the starting values of the parameter estimates. The existence of several minimum points can be shown theoretically for low signal to noise ratios. In practice it can happen also for reasonable values of this ratio. It is not always easy, without intimate knowledge of the actual process, to decide which of the models will be the 'best' or most 'correct'.

Acknowledgments--The author wishes to thank his thesis supervisor Professor K. J. Åström, who proposed this problem and provided valuable guidance for the solution. In particular, he pointed out the possibility of interpreting the GLS procedure as a special maximization of the likelihood function. Many stimulating discussions with Dr. I. Gustavsson and Dr. L. Ljung are also gratefully acknowledged. He is grateful for the measurements, which were

supplied to the Division of Automatic Control, Lund Institute of Technology, by the OECD Halden Reactor Project and Dr. D. J. Williams, London. The project was partially supported by the Swedish Board for Technical Development under contracts 71-50/U 33 and 72-202/U 137.

REFERENCES

[1] K. J. ÅSTRÖM: Lectures on the Identification Problem--The Least Squares Method. Report 6806, Division of Automatic Control, Lund Institute of Technology (1968).
[2] K. J. ÅSTRÖM and T. BOHLIN: Numerical identification of linear dynamic systems from normal operating records. IFAC Symposium on Theory of Self-Adaptive Systems, Teddington, England (1965). Also in Theory of Self-Adaptive Control Systems (Edited by P. H. HAMMOND). Plenum Press, New York (1966).
[3] K. J. ÅSTRÖM and P. EYKHOFF: System identification--a survey. Automatica 7, 123-162 (1971).
[4] K. J. ÅSTRÖM and T. SÖDERSTRÖM: Uniqueness of the maximum likelihood estimates of the parameters of an ARMA model. IEEE Trans. Aut. Control AC-19 (1974).
[5] T. BOHLIN: On the maximum likelihood method of identification. IBM J. Res. Dev. 14, 41-51 (1970).
[6] T. BOHLIN: On the problem of ambiguities in maximum likelihood identification. Automatica 7, 199-210 (1971).
[7] P. E. CAINES: The Parameter Estimation of State Variable Models of Multivariable Linear Systems. Control Systems Centre Report No. 146, The University of Manchester, Institute of Science and Technology (1971).
[8] S. CARLSSON: Maximum Likelihood Identification of Reactor Dynamics from Multivariable Experiments (in Swedish). Master Thesis, Division of Automatic Control, Lund Institute of Technology (1972).
[9] D. W. CLARKE: Generalized least squares estimation of the parameters of a dynamic model. 1st IFAC Symposium on Identification in Automatic Control Systems, Prague (1967).
[10] I. GUSTAVSSON: Identification of dynamics of a distillation column. Report 6916, Division of Automatic Control, Lund Institute of Technology (1969).
[11] L. LJUNG: Characterization of the concept of 'persistently exciting' in the frequency domain. Report 7119, Division of Automatic Control, Lund Institute of Technology (1971).
[12] G. OLSSON: Modelling and identification of nuclear power reactor dynamics from multivariable experiments. Proc. 3rd IFAC Symposium on Identification and System Parameter Estimation, The Hague (1973).
[13] T. SÖDERSTRÖM: On the convergence properties of the generalized least squares identification method. Report 7228, Division of Automatic Control, Lund Institute of Technology (1972).
[14] T. SÖDERSTRÖM: On the uniqueness of maximum likelihood identification. To be published in Automatica (March 1975).
[15] T. SÖDERSTRÖM: Test of common factors of identified models. Application to the generalized least squares method. Report 7328, Division of Automatic Control, Lund Institute of Technology (1973).