Frequency-domain system identification using non-parametric noise models estimated from a small number of data sets

Frequency-domain system identification using non-parametric noise models estimated from a small number of data sets

Pergamon A~rronrorico. Vol. 33. No. 6. pp. 1073-1086. 1997 1997 Elsevier Sctence Ltd. All rights reserved Printed in Great Britain 0005.1098197 $17.0...

1MB Sizes 0 Downloads 57 Views

Pergamon

A~rronrorico. Vol. 33. No. 6. pp. 1073-1086. 1997 1997 Elsevier Sctence Ltd. All rights reserved Printed in Great Britain 0005.1098197 $17.00 +o.OO

0

PII: SOOO5-1098(97)00002-2

Frequency-domain System Identification using Non-parametric Noise Models Estimated from a Small Number of Data Sets* J. SCHOUKENS,t

R. PINTELON,t

G. VANDERSTEEN?

and P. GUILLAUMEt

We present a frequency-domain identification method using periodic noise weighting is automatically excitation signals. A non-parametric generated without user interaction. The technique allows consistent closedloop identification without knowledge of the reference signal. Key Words-Identification: errors-in-variables.

frequency

domain:

closed-loop

Abstract-This paper discusses the problem of identifying a linear system from the frequency data when the measurements of the input and the output signals are both disturbed with noise. A typical example of such a problem is the identification of a system in a feedback loop. It is known that this problem can be solved using errors-in-varaibles methods if the covariance matrices of the disturbing noise (on input and output measurements) are a priori known. It is shown that the exact covariance matrices can be replaced by the sample covariance matrices: the system can be identified from the sample means and sample covariance matrices calculated from a (small) number M of independently repeated experiments. It is shown that under these conditions the estimates are still strongly consistent for an increasing number of data points N in each experiment (N+ 0~) if M 2 4. The loss in efficiency is quantified (M 26). and the expected value of the cost function (M 2 4) and its variance (M 2 6) are calculated. 0 1997 Elsevier Science Ltd.

identification:

noise characterization:

of these noise representations to a specific domain, it turns out in practice that time-domain methods are usually combined with parametric noise models, while frequency-domain identification methods are often combined with a non-parametric frequency-domain weighting. Classical time-domain identification methods like the prediction error techniques estimate simultaneously the parametric noise model with the plant model. Non-parametric noise models are not used in the time domain, since they lead to a full weighting matrix resulting in a very time-consuming calculation of the cost function. if N represents the number of Indeed, time-domain samples then a non-parametric noise model requires a number of multiplications proportional to N* to evaluate the cost function, while a parametric model only needs O(N) multiplications. A parametric noise model also presents some disadvantages. It is not obvious that the power spectrum of the noise can be described by a (low-order) rational model (this results in a loss in efficiency, and can even lead to inconsistent estimates), and it requires a second model selection problem to be solved. The additional noise model parameters increase the complexity and reduce the convergence region of the nonlinear optimization schemes. Although a parametric noise model can also be used in frequency-domain identification, non-parametric noise models are often preferred, avoiding the above mentioned disadvantages. This is possible because in many situations the full weighting matrix can be replaced by a diagonal one, since in most practical situations the noise on the frequency-domain measurements at different frequencies is not correlated

I. INTRODUCTION

The identification of linear dynamic systems is very often formulated as a weighted least squares problem. The weighting matrix is added to improve the efficiency of the estimates (weighted output error methods), or to guarantee the consistency of the estimates (errors-invariables methods). In the identification scheme, either a non-parametric or a parametric representation of the weighting can be used. Although there is no formal reason to link one *Received 10 November 1995: revised 22 March 1996: revised 11 July 1996; received in final form 28 November 1996. This paper was not presented at any IFAC meeting. This paper will be used as the basis for a contribution to the 11th IFAC Symposium on System Identification (SYSID’97), which will be held in Fukuoka, Japan during 1997. This paper was recommended for publication in revised form by Associate Editor B. Ninness under the direction of Editor Torsten Siiderstrom. Corresponding author Dr J. Schoukens. Fax +32 2 629 28 50: E-mail [email protected]. 1‘Vrije Universiteit Brussel, Department ELEC, Pleinlaan 2. 1050 Brussel, Belgium. 1073

1074

J. Schoukens et al.

(Brillinger, 1981; Schoukens and Renneboog, 1986). Hence only the noise (co)variances at each frequency have to be measured, and the number of multiplications required to calculate the cost functions is again proportional to N (this time the number of considered frequency lines), as it is for time-domain identification schemes in combination with parametric noise models. Moreover, the variances can be easily estimated. The Fourier coefficients of the input and the output are measured at N (different) frequencies and grouped in a data set D. The sample (co)variances and sample measns of these Fourier coefficients are calculated from M independent realizations Dk, k = 1, . . . , M. This paper is completely focused on the analysis of the non-parametric approach. Assuming that the noise on the measurements of the Fourier coefficients (input and output) at each frequency is jointly normally distributed and independent from one frequency to another (see below), the following will be shown.

(9

The weighted least squares estimator using the sample-variance-based weighting matrix is strongly consistent for N + WJif at least four independent data sets are available.

(ii) The loss in efficiency due to the poor quality of the sample variance for small values of M is not larger than 15% (M = 6). (iii) The estimator is also consistent when the input and output noise are correlated. This includes consistency for feedback systems without needing to select a correct noise model or to use an exactly known external reference signal as required with parametric noise models. (iv) It is possible to calculate the expected value and the variance of the cost function, simplifying significantly the model selection problem. Section 2 of this paper formalizes the problem statement. The proofs of the above claims are given in Section 3. In Section 4 a comparison is made with existing methods, and the results are illustrated by simulations in Section 5. In this section also the connection between open loop and closed-loop issues is elaborated.

Fig. 1. Measurement setup.

function H(w) is measured at angular frequencies wk, k = 1, . . . , N, resulting in a set of measured (indicated by the subscript m) Fourier coefficients &(wk) and Ym(mk) (Remark that &,(wk) and Y,(wk) are not frequency-response measurements-they are really measurements of the Fourier coefficients describing the input and the output signal). These measurements can be made using periodic excitations in combination with discrete Fourier transform techniques (Schoukens and Pintelon, 1991). In order to avoid transient effects, the measurements can only start when the transients became negligible with the steady-state response. compared Leakage problems are avoided by measuring an integer number of periods, and aliasing problems can be fully controlled by filtering the input and output signal with good anti-aliasing filters before sampling them. More information on how to make these measurements can also be found in Schoukens et al. (1993). In this paper we do not deal further with this measurement problem, but start immediately from the frequencydomain data &-,(wk) and Y,,,(wk). These are disturbed by noises NU(wk) and NY(wk) modeling the generator noise, the measurement noise and the process noise. All these noise contributions are in general not independent (e.g. in feedback). Assumption 1. The measuremnts &,,(wk) and Y,@kj are related to the exact values cO(ok) and &(Wk) (indicated by the subscript 0) by

We

&,,(ok)

=

uO(mk)

+

Nu(mk),

ym(wk)

=

%(Wk)

+

b(wk).

make the following

&n(ok) 20

2. FORMALIZING THE PROBLEM STATEMENT

Consider the setup in Fig. 1. The spectrum of the input and output signal of a single-input, single-output linear dynamic system with transfer

r,w

UAo,)

zn

=

=

(zo(d,

=

(Znl(4,~

(1)

definitions:

zO(mk)

+

(2)

Nz(ok),

. . . , zo(wv))T~ . . , zrn(~N))T,

with superscript T indicating the transpose. Assumption 2. (i) The

noise

NZ(wk) is zero-mean,

complex

Frequency-domain normally distributed covariance matrix

identification

(Pin&bono,

1993) with

‘wz(~k)x(%)Hl = (+*u(wJd&J/J [ &(Ok) f&(0/J ’

Cz(%) =

1

NW+JW~J=1= with superscript transpose.

(3)

0 W L

H indicating the Hermitian

(ii) The noise at different independently distributed: E[NL(~,JNz(~,)H]

=0

frequencies Vk # 1.

is

using non-parametric

cost function. This knowledge allows the user to detect model errors, and to guide him or her through the model selection process. In all these results it is supposed that the noise (co)variance information is exact and known a priori. In the next section it will be shown how if the exact these results are modified (co)variances and the measured Fourier coefficients are replaced by respectively the sample (co)variances and sample means obtained from a small number of repeated experiments.

(4)

Note that (3) and (4) imply that the real and imaginary parts of Nu(ok) (and Ny(ok)) have equal variance #,(wk) (or $~*v(w~)), and also that they are independently distributed. It is known that these noise assumptions are asymptotically valid after a discrete Fourier transform (Brillinger, 1981) and the properties are approximated very well for practical lengths of the data records (e.g. 1024 data points in a record) (Schoukens et al., 1986). If there are no model errors, the exact Fourier coefficients obey the following system relations: yo(%) = H(WJG(%).

(5)

In this paper we shall also deal with the situation where model errors are present. In Pintelon et al. (1992) it is shown that the ML estimates of the model parameters in H(w, 0) = B(w, O)/A(w, f3) (with A(w, e), polynomial functions in j, B(w 0) parameterized in f3) are found by minimizing the following cost function:

1075

noise models

3. THEORY

In this section the cost function (6) will be reformulated, using this time the sample (co)variances instead of the exact values calculated from a small number M of independent measurements. Assumption 3. The measurements

U,,,,,(wJ, Y,,,,,(w&

I= 1,. . . , M; k = 1,. . . , N,

63) are M independent realizations of (1) with W+) and yO(wk) realization-independent variables. _

-

Define U,,,(o,J and Ym(wk) as the sample means, and G&(wk), &$(ok) and &,(w~) as the sample (co)variances calculated on the measurements (8), using the following definitions for the sample means and sample (co)variances of the random variables r and s, where an asterisk indicates the complex conjugate:

K(e, Z,) = $,

IB(w‘, WX%)

- A(%, e)Y,(w,)l*

x G&G+) lB(~k, @I’ + &(wJ

- 2 Re [&bkkWk,

lA(~,u U’

f4Bbkj 69*1>-‘. (6)

It is known that the minimizer of the cost function (6) is a strongly consistent estimate. In Schoukens and Pintelon (1991) it is also shown that for &,,,_= argmin K(8, Z,) the expected value and the variance of K become

-WI = Knoise + Godeb Nd iTKnoise + 2Krndel7

E[K(eML,

(7)

with dim (0) - # parameter Knoise

=

l

-

constraints

Define the equation as

error of the Ith realization

(10) The sample mean (e(wk, 0)) and sample variance (B$(e)) of e,(wk, e), 1= 1, . . . , M, can be calculated as linear combinations of the sample means U,,,(W~), E(wk) and sample variances &(ok), ~?~yjo~), &(c+).

2N

and Kmodel the model error contributions

to the

Proposition

1. Under

Assumptions

1-3,

the

1076

J. Schoukens et al.

equation error el(wk, f3) is a complex normally distributed variable with variance &@)

= g*u(%) IN wk, 0’ + v*Y(ok) IA(mk>@I’ - 2 Re [&(G)-+~,

e)B(mk, e)*].

The proof for the exact variance af,k( 8) is completely similar. Proposition

2. Under

expression 0

Assumptions

1-3,

The sample mean e(o,_ 0) is given by

- A(@~, e)ym(wk),

e(wk, e) = B(w~, e)u,(w,)

&h,(tik,

e)

=

Re

ti

.el(wkt

e)

[

-

+k,

e)

+Jk,

e)

&,k(e)

(II) and the sample variance e&(0) a:,,(e)

iw

= c%+) -

2

Re

el(wk,

is

d,(Wk,

@I’

+

c*Y(wk)

[&(~k)A(Wk,

IAh,

wbk,

@*I.

Proof

is

linear combination of which are complex &n,l(wk) and ym,l(wk), normally distributed. As a consequence, e,(mk, 0) is also complex normally distributed. (ii) The sample mean is e)

=

ti

e)

-

h

a’

(12)

(i) e,(wk,

e)

%kte) wkt

the

variables

a

1’ 1

are independently, zero-mean, normally distributed variables with unit variance (&, = c&, = 1). Although the actual values of &&(&& 0) and &&(wk,f3) depend on 8, their distribution is B-independent. Proof

We have

el(wk,

=fiRe[E[

e)

-

~

+k,

e)

(e)

]}=o.

e,k

2 ym,dwk)

A(%, 6) i

-

I =

1

-

-Abk9

em&k)

%kl

Similarly, E[&z,(Wp, e)] = 0. Because the distribution of e,(@, 0) e(tik, e) is circular, (3) it is known (Pincinbono, 1993) that E[{e,(wk, 0) - F((wk, e)}‘] = 0, and SO E[kRd%

6)

+ &(wk?

e))‘i

e)ydwk). =

E[@(@k,

e)‘]

-

E[&(wk,

e)‘]

(iii) The sample variance is + =

2jE[&(Wk,

(dR,

-

e)dbk7

&,)

+

e)]

j &,d,

= 0,

x

e) -

[el(wk,

e)i*

+k7

(13)

so that 2 OER,

-

2 g~R,el,

2 ffdp

0.

=

(14)

For normal distributions, a zero cross correlation also implies independence. Because -

A(%

-

e)[yrn.,bk)

ymbk)li2

el(wk,

E

e)

[I

-

+k,

e,

%k(e)

II

* = 1

and +

z

e)i’ -&

iy(wkf

E

lym,l(~k)

- 2 Re A(o~, 6)f+k,

e)*

-

%&k)l*

e((wk, [I

e>

-

%kte)

+k,

e)

II

* =

dR,

+

2

it follows from (14) that &, = a:,, = 1.

&

dI,

cl

i

Remark

1. Propositions 1 and 2 have been proved without making any assumption about the model. Hence they are valid in the presence of model errors.

x

5 I=1

[ym,dwk)

x

[&n,l(wk)

-

-

u,(w,)l*}.

It follows immediately &:.k(@ =

&:(wk) -

2 Re

ym(wk)l

from (9) that

iB(Wk, {&bk)A(Wk9

e)i’

+

&(mk) e)w)k,

Define I+&

e)i’ e)*>.

@k(e) = +?:,k(e).

(15)

Frequency-domain

identification

Corollary 1. E(wk, 0) is a complex distributed random variable with a&(e) and sample variance B&(e).

normally variance

Proof This follows immediately from the observation that P(ok, 0) is l/M times the sum of M independent, identically distributed comq plex normal random variables.

using non-parametric

The cost function can be rewritten as

with

In this paper we shall not deal with the degenerate situation where gc,k is zero for some values of k.

Proposition

3. Under

Kk(ol

and

&$nition

distributed.

1. S = (6 : ~~&!q

# 0, &(o)

# 0).

A new cost function R(e, Z,) is defined using the sample means and sample (co)variances of the same M independent data sets: R(e, Z,) -

-

= i $ k

x

-

(

;

Re

mWkj

e)um(Ok)

-A(%

@ym(wk)i2

I @t&k)

iB(Wk,

[&&k)A(Wk,

8)’

+

e%wk)

@)B(Wk,

iA(Wk,

a’

e)*i$’

(16) This is the non-parametric equivalent of the classical identification (e.g. prediction error methods), where often a parametric noise model is identified together with the system model. Note that the sample variance @k(e) should always be calculated using the full expression. The sample covariance should be added, even if it is a priori known that & is zero, otherwise G&(e) would be no longer the sample variance of e(wk, 6). The estimates are obtained by minimizing the new cost function.

1077

noise models

zrn)

Assumptions 1 and 2, independently are

ck(o)

Proof: The sample mean and sample variance of independent realizations of a normally distributed random variable are independent (Stuart and Ord, 1979). Because Kk(8? zm) = [F((wk, ~)~2/g:,k(~) only depends on the sample mean, and ck(8) = ~~,k(~)/ci~,k(~) On the SaIIIple variance, it is clear that they are independent. 0 Remark 2. This proposition

allows the same data to be used for making the noise analysis (measuring the variances) as for identifying the model (by using the sample mean values in the cost function). Proposition

4. Under Assumptions expected value E[c,@)] is independent is equal to

1-3, the of 8, and

(18) for M > 2. Proof.

Consider

2(M - 1) -!ck(e) =

e:k(e)

2(M - 1) -

dk(e)

Definition 2. 8 = arg minesS z?(e, Z,). Assumption

2(M - 1)

i =

5. 6 is an interior point of S.

The properties of the estimator 6 will be studied, without assuming that the exact model belongs to the model set.

2+

I

+

{Im [el(wk?

x

[i

= g =

$ ({Re

I

eR2

e,

-

cbk?

e)i2)

1

4. S is an identifiable

parameter set for the exact measurements Z,: there exists a unique global minimum point &iJZ,) = arg minetS K( 8, Z,) VN > N,, including N = x.

Assumption

$ (& ,$

le/(wkt

=

[e/(wk? 0)

-

6)

-

“r.k(e)

+k,

+kr

e)i}‘)

[ERdWk,6)” + dbkl

@‘I

@)I>’

dk(o)]-’

+ El’.

(19)

J. Schoukens et al.

1078

The expression in (19) consists of the sum of two independent central X2-distributed variables with M - 1 degrees of freedom (from Proposition 2) resulting in a x2 distribution with 2M - 2 degrees of freedom. The first- and second-order moments of l/x, where x has a x2 distribution with Y degrees of freedom, are studied in Appendix A, starting from the moments of an F distribution (Stuart and Ord, 1979): 1 2 var - = 0x (v-2)*(v-4)’

It follows immediately from Definition 3 that Z,) = 0, which proves the statement. Cl

K(&,

Proposition 6. Under Assumptions

WW,

Gl)l V8 E S, VM > 2.

Proof

It follows immediately Proposition 3 that

from

(23)

(17) and

(20) WQe,

Consequently,

Z,)i = E[ i $ K,(e, Z+,(e)] k

var [‘@)I

1 and 2,

(M - l)* = (M _ 2)2(M _ 3)’

(21) 0

Note that the moments of ~(0) are 8independent owing to the fact that ER, and &I, have e-independent distributions (Proposition 2). 3.1. Strong consistency In this section it will be shown that 8 is a strongly consistent estimate if there are no model errors. In the case of model errors the estimate strongly converges to the parameters that minimize the cost function calculated on the noiseless measurements. In order to deal with the model errors, the following definitions are made.

=

$ k

The proof then Proposition 4.

E[&@,

zrn)i&k@)l-

(24)

1

follows

immediately

from cl

Remark

3. This proposition also gives an intuitive insight into why we can replace the exact (co)variances by (poorly) estimated values. It turns out that the effect of the errors on the estimates is averaged over an increasing number of frequencies owing to the independence of the residual (g( ok, 0)) and the weighting factor (based on &k(e)). Proposition 7. Under Assumptions

1-5,

argemin E[K( f3, Z,)] = emin*

Proof.

It follows from (6) and Proposition

(25)

1 that

E[K(8, Z,)] = K(8, Z,) + 1.

Definition 3. 0, are the exact parameters

if there are no model errors: YO(ok) = H(&, wk)UO(wk) and &i,(Z,) = arg rnineEs K(8, 2,).

i

1

(26)

The proof follows immediately from Proposition 3, Assumptions 4 and 5, and the definition of 0

kina

The parameters e,i”(Z,) depend on the data Z0 in the case of model errors. For notational simplicity, the explicit dependence of &,i,(Z,) on Z,, will be omitted in the rest of this paper. Proposition 5. Under Assumption

4, e&Z,)

=

Combining Propositions 6 and 7 immediately in the following proposition. Proposition 8. Under Assumptions

l-5,

argemin E[R(e, Z,)] = emin

f$, if there are no model errors. Proof

(27)

Proof.

K(& ZO) = &5,

results

arg emin E [R (e, Z,)] IB(W, e)uh4

x b*u(~~d IBM

- 2 Re [d+~~)%k,

-Ah, e)? + QS~

wh,

e)yobd*

M-l = argemin [K(e, Z,) + l] ~_2

1

IWJ~, a* e)*l)-1. (22)

= arg min K( 8, Z,) = E&in. e

1

Frequency-domain Assumption

(persistent

6. The experiments

excitation)

identification

are informative

so that

WW,

Z,)] = WV.

9. For

8 E S, the following

using non-parametric

1079

noise models

The second equality is valid because &(0, Z,) is independent of &(e, Z,) for k Z 1. Using the rule

(28) var (xy) = u~u~ + &[y]*

Proposition

state+ E[x]*a;

ments are valid. (i) R( 0, Z,) and E[R( 13,Z,)] functions of 8.

x and y, (32)

are continuous (31) becomes

(ii) Z&(0, Z,,) and E[(&(8,

for independent

Z,))‘]

are bounded

Vk.

(putting

x = &(8, Z,)

and y =

ck(e))

Pro05

(9 R( 0, Z,)

and E[R( f3, Z,)] are rational functions in 0, and their denominator is not equal to zero on S.

(ii) Using the same argument as in (i) and the numerator of that noticing only depends on secondE]K(e, Z,)]*] and fourth-order moments of normally distributed noise, the statement is proved. 0 Starting from the previous intermediate the following theorem can be formulated.

Proof. It is known from Proposition 6 that E[R(e, Z,)] attains its global minimum only in emin. Next, uniform strong convergence (w.r.t. 0) of R(e, Z,) to E[R(e, Z,)] is shown: Z,)]} = 0.

(29)

Consider the cost function (17):

R(e,

2,)

=

with 5 Bk(~Zm’, k=l

Because a$, 5 E[K:], (,!+I)*, (33) becomes :hX .W(e,

(33)

gz I E[cg] and E[x*] 2

Z,) - zG(e,

Z,)]]*]

results

Theorem 1. Under Assumptions l-6, the estimator 8 converges strongly to emin for M 2 4.

a;.s=m {R(e, Z,) - E[R(e,

+ d,@[Kk])*).

(30)

It is known from Appendix B and Proposition 9 that each of the terms in the sum is uniformly (in 0) bounded, so that (34) converges uniformly in l/N, and consequently

R(e, z) - qR(e, z,)] = oMs(iV*).

Because the terms in (31) are independently distributed, mean square convergence also implies a.s. convergence (Stout, 1974); and a.s. convergence (uniform in 8) of cost functions that are continuous in 8 implies that (Soderstriim, 1974) a.s. lim (8 - f3,,,) = 0. N--t%

(31)

(35)

q

3.2. Loss in efficiency In this section the influence of replacing the exact (co)variances by their sample estimates on the covariance matrix of the estimates is calculated. The analysis consists of two parts. First the deviation of the estimates from their mean value due to the measurement noise is calculated. Next these results are used to obtain an asymptotic expression (as N-+ m) for the covariance matrix.

1080

J. Schoukens et al.

During these calculations, second-order moments of the first and second derivative of the cost function with respect to the parameter will be needed. In order for these moments to exist, M should be equal to or larger than 6. It turned out from the simulations that this constraint can be relaxed to M being larger than or equal to 4 if regularization is applied. This procedure will be explained first. 3.2.1. Regulurization. Consider the cost function (16) consisting of the sum of the terms ]fF(ok, 6)]‘/8$,,(0). The regularization procedure consists in removing all contributions from the sum for which B&(0) < e, where E is a constant such that E <<< min u$( f3). Theorem 2. The regularized dure is strongly consistent.

estimation

In practice the value E can be chosen only as a very small value compared with is unknown. Hence e(wk, e), since o&(B) contributions are found to be small after inspection of the measurements, and are certainly not independent of the measurements, as is required in the previous theorem. Consequently, Theorem 2 does not guarantee the consistence of the regularized estimator as it is applied in practice. It only gives a motivation why it is still possible to get reasonable results on the second-order moments with M = 4. It should be clear that preferably M = 6. d= deviations. Define 3.2.2. Parameter emin + S. The dependence of 6 on the measurement noise z can be found by considering the Taylor expansion of the cost function i?(&,,i, + 6, Z,) with respect to 6, and then imposing the stationary condition (a/N)&(e, Z,) = 0 on this expression (for notational simplicity, the dependence of ck(8) on 8 will be omitted):

6, Z,) = k(%i,,

Z,) +

Z,)))‘lO_

*

-j$(e,

Z,).

(37)

It is shown in Appendix D that a.s. N_x lim ($zQe,

Zln) 1

6=tP

-~-$we

zd

= 0,

(

) ~=%m

uniformly in 8.

(38)

Combining (37) and (38) (using the fact that a.s. limN,, e* = emin)finally results in a.s. lim VX 6 + E N-+x (

Remark.

+

6 = -(-$z@,

proce-

Proof. This is exactly the same as the proof of Theorem 1. Only the expected value E[ck(0)] should be reconsidered. In Remark 1 of Appendix B, it is shown that the expected value is still a constant independent of 0. 0

R(%in

and solving for S results in

~-1

-$B(e, z-,) = 0,

with

The expression (e/a@ (0, Z,) can also be further simplified. It follows immediately from (17) that

=

kk$, (Ck$fkter 22 + K,te, .w$$ ve (4)

and it is shown in Appendix A414 with regularization) a.s. Nix lim

(

C that (M 2 6 or

5 K(& Z,) - ; $, ck $G(@, = 0

and hence the second neglected, so that

part

Z-,) ve,

(41)

in (40) can be

M-2

a.s. lim V% 6 + ___ M-1F-l N+=

($te, ZJ)T~

with 8* = t&,,in + (1 - t)d and t E [0, l] (meanvalue theorem). If not otherwise, mentioned all derivatives w.r.t. 8 are evaluated at 8 = emin. Applying the stationary condition of (36) w.r.t. 0

It should be noted that this solution is in the presence of model errors. 3.2.3. An approximate expression covariance matrix. Making use of the dence of the sample mean and sample

also valid for

the

indepenvariance,

Frequency-domain the following asymptotic is found:

identification

(for N-, ~4) expression

ee=E[&sT]= (E)2F-l

using non-parametric

1081

noise models

and ck(8, Z,) are no longer independent because they both depend on e(Z,). To get around this problem, the following approximation is made: R(8, Z,) = R(e,.

+ 6, Z,)

$ k$1 -WE [$-fk(B, zn) ’ ($j Kk(& Zi$])F-‘* Using the result immediately that

of Appendix

e,=f+(j$, x ($ Kk(& M-2 = =F-‘E

(43)

B, it follows

R(e, Z,) =

+$K,@%n)

and

Using (38), (41) expression is found:

(42),

the

following

~(&in,.G)

Z,))T])F-’ [

-$zqe, z,)

x (5 qe, zm))‘]P.

(44)

Because

ce = PE[-$e, z,)(

The derivatives have to be evaluated in (emin, Z,). 3.3.1. Calculation of the mean value. This expression (48) can also be written as (evaluating the derivatives in emin)

-$zw, z_))~]F-~ (45)

is the covariance matrix of the parameters co(variances) are exactly known, c

e

/f-2 -

M-3

CO,

if the

(46)

with M 2 6 or with M 2 4 using regularization. It is clear that the loss in efficiency is (M - 2)/(M - 3), which is quite low even for small values of M.

and the expected value becomes WW,

=E[~(e,i,,Z,)]-ftr{~~-l

3.3. Mean and variance of the cost function in its global minimum In this section the mean value and variance of the cost function are calculated. Because of the availability of the non-parametric noise model, it is possible to make a prediction of the value of the cost function that is expected to be observed at the end of the identification process if no model errors are present. Consequently this information can be used during the model selection process by comparing the actual value of the cost function with its expected value. For this reason, the mean value and the standard deviation of the cost function are calculated here. More information on this topic can be found in Schoukens and Pintelon (1991). Some precautions should be considered when calculating E[R(8)] and (am, since Z?(@,Z,)

Z,)]

X E $2

f#ktev

zrn>

[

x

=

($

fi

ii,.

Z,))T]E,d,}

E[K(Q,i,,

Cdl

-fEE[($K(B, x j&e,

zmgTP

zd].

(50)

Using the ‘classical’ result (7) and

E[K(%i,,-Cdl= E[K(%4dl+ &, with p = dim 8 - ## parameter

constraints,

(51)

and

1082

J. Schoukens et al.

neglecting the second-order noise contributions in E[(aK/M)TF-’ aK/M], so that

E[($K(@ Z,))TF-‘-$K(B, Zm)] =g9

(52)

it is found that WW,

already been studied in the literature (Villegas, 1961, 1982; Fuller, 1980; Anderson, 1984; Amemiya and Fuller, 1984). The typical formulation as followed by Amemiya (1984) is given below. Consider yt=&,+xtP,

M-l -- 1 N(M-3)(M-2):

Y,= yt + e,,

x, = x, + 3.3.2. Calculation of the variance. For the variance, it is mostly enough to have a rough estimate. This can be obtained by neglecting the influence of S and calculating (54) o&o, Z,) = o~W,,.*Z,P This variance (variance of R(emin, Z,) = Er=‘=,[&(&,in, Zm)ck(&,,in, Z,)l/N) can be calculated using the rule (32): (M - 1)2 ‘T&8) = (M _ 2)(M _ 3) &G, (M - 1)2 (M - 2)2(M - 3) (55)

with M 2 6 or with M 2 4 using regularization. This result shows that the variance of the ‘sample cost function’ consists of two contributions: the scaled variance of the ‘true’ cost function plus an additional term depending on the model errors. The evaluation of (55) depends on the presence or absence of model errors. Both situations will be considered. (i) No model errors present. In this case K(8) is the sum of normally distributed residuals, obeying p linear(ized) equations. Hence 4NK(8) has a x2 distribution with 2N -p degrees of freedom. So the variance of K(8) is (N The second term can be ip)lN2 = l/N. replaced (E[K,(8)] = 1) by (M - 1)’

1

(M - 2)2(M - 3) ??’

1 (M - 1)” a&$, = N(M-2)2(M-3)’

(56)

(ii) Model errors present. In this case the full expression (55) should again be considered. WITH EXISTING

, N,

u,

t=l,2

,...,

1

N,

(58)

where e, and uI are unobservable error vectors of dimensions r and k respectively. Let E, = (e,, u,) be independent and identically distributed with zero mean and covariance matrix Z,, and assume that the E, are independent of the Xj for all t and j. The asymptotic properties of the maximum likelihood/weighted least squares estimators are studied for N + ~0. It is clear that this situation differs from the problem studied in this paper, because in (58) an infinite number of measurements under the same noise conditions is considered, while in the problem formulation of this paper only a (small) finite number of observations under identical conditions are available. The problem considered in this paper using the formalism of Amemiya and Fuller (1984) is

X.1= Y,I + e,l xt.,=%,I + u,,/

t = 1,2, . . . , N; I= 1,2, . . . , M, (59)

where &,,I= (e,l, u,,J has zero mean and covari-

ante matrix &B,I. In this paper consistency was shown for the weighted least squares estimator based on the sample covariance matrices e,,,, for N+ ~0while M remains finite (small). 5. SIMULATIONS

In order simulations system is closed-loop

to illustrate the previous results, two are made. In the first, an open-loop considered, while in the second a system is studied.

Simulation 1. Three situations (M = 4, M = 5, M = 6) were considered. For M = 4 and M = 5, the regularization was not applied, although a

resulting finally in

4. RELATION

,...

(57) where PO is a 1 X r vector of parameters and 0 is a k X r matrix of parameters. Y, and X, are observations satisfying

&dl

+

t=1,2

RESULTS

The errors-in-variables problem with unknown covariance matrix of the disturbing noise has

very large number of simulations were made, no outliers could be detected. The number of simulations S was respectively 96 095, 25 207 and 27 150. This results in a 95% uncertainty bounds S = 25 207) of 0.9916~~,,,, < (T,,~< (for l.O084a,,,,,. To avoid an excessive computation system is considered: time, a zeroth-order H(s) = a0 with a, = 1. This simplification does not jeopardize the value of the simulation,

Frequency-domain

identification

because the theoretical results are independent of the model complexity. The simulation parameters were N = 1000, U(oJ = ej4k, with & uniformly distributed in [0,2x), (T&C+) = 0.02, gy(wk) =O.Ol and c&ok) = 0. In all the simulations the mean value of the parameter was equal to the true value within the 2a uncertainty bounds, which confirms the consistency claim. The other results of the simulations are given in Table 1. In this table the column ‘Using exact variances’ shows the results with a priori known exact values of the variances, while the next column ‘Using sample variance’ contains the results obtained using the sample variances based cost function R. The third column gives the ratio between the results of column 2 and 1, while the last column gives the theoretical prediction of this ratio. It can be seen from this result that there is a very nice agreement between theory and simulation.

using non-parametric

Fig. 2. Measurement setup for simulation with feedback.

so that lU,(w,)l = 4 and IYO(wk)l= 3. The process noise was set to dominate in the simulation so that high correlations are present. This results in the following (co)variances: (+2u(wk)= M/1(%)

Simulation 2. A system with feedback is studied. The setup is given in Fig. 2. In this simulation A4 = 6. The simulation parameters were again N= 1000 and X(ok)=e’ -6k, with C& uniformly distributed in [0,2x). This time four noise sources were considered: ZV,,( ok), N,( wk), y2 wk are independent zero-mean &++.) and N ( ) noise sources with respective standard deviations Cru, w/J = 0.2ti, (+Jw/J = o.oM, CrY,(W/J= 2 J 2 and arz(w,J = O.OlSv?. Because of the feedback, the noise on L&,(o,J and Ym(wJ is strongly correlated. Using the notation from Fig. 2, the following relations hold:

tw

Table 1.

uu
1083

noise models

+ M+Jk)

+ &(%)

= 0.898,

&/(W)

= M/1(%)

- &G&k)

= -0.871. Note the very high correlation between the input and output noise (IpI > 0.95). The number of simulations was 7774. The exact value of the parameter is & = 2, while the mean value of estimates is 6 = 2.0039, with a standard deviation ue = 0.091. Keeping in mind that the uncertainty on the averaged value is much smaller (~6 = O.OOlO), a small bias is detectable: (f3 - &)/a, = 0.043. If the cross-correlation term was omitted, a significant bias was detected, &, = 2, 6 = 4.47 and ue = 0.49. The other results

Verification on simulations: open loop Using sample variances

3.535 x 10-4 3.173 x 10-b 2.875 x lo-“

4.978 x 10-d 3.872 x lo-“ 3.337 x 1o-4

1.408 1.220 1.161

1.414 1.224 1.155

999.64 999.62 999.65

1499.00 1332.70 1249.27

1.500 1.333 1.250

1.500 1.333 1.250

80.81 59.52 50.36

2.562 1.889 1.597

2.598 1.886 1.613

Standard deviation cost (NK) M=4 31.54 M=5 31.51 M=6 31.54

Ratio. simulations

Ratio, theoretical value

Using exact variances

1084

J. Schoukens et al. Table 2. Verification on simulations: closed loop system Using exact variances

Using sample variances

Ratio, simulations

Ratio, theoretical value

guu,, M = 6 (S = 7774)

0.07741

0.09101

1.176

1.155

Expected value cost (NK) M=6

999.56

1248.98

1.250

1.250

Standard deviation cost (NK) M=6 31.76

50.90

1.602

1.613

are given in Table 2. The results are completely in agreement with the theoretical predictions. It can be concluded from this simulation that, under the previously mentioned conditions, it is possible to estimate systems in feedback using a very small number of independently repeated measurements. Discussion. Simulation 2 shows clearly that the proposed method is also consistent in feedback situations. This seems to be in disagreement with the well-known classical result whereby the limiting identified model (H(w, em,,)) depends on the signal-to-noise ratio of the measurements. To get a better understanding of this behavior, the situation depicted in Fig. 2 is rewritten in the form of Fig. 1 by choosing 1 U”(Wk)= 1 + G(o,#$(w~)

X(4,

For that reason, many methods converge to a combination of H(w,) and G-‘(w,& depending on the signal-to-noise ratio of the measurements. The method proposed in this paper will always converge to H(o~), even for low signal-to-noise ratios, as long as the system is persistently excited (X(ok) # 0 for a sufficient number of frequencies). Basically this is due to the possibility separating the excitation ( Uo(ok)) from the noise contributions using the known periodicity of the excitation signal. At the same time, the second-order moments of the noise on input and output are estimated. Because of the consistent behavior of the estimator, this information is correctly combined over the frequencies, so that even for a small number of periods, the identified model converges to the true model for a large number of frequency components. 6. CONCLUSIONS

Note that NU(wk) and NY(wk) are allowed to be correlated. The ratio of the measured output and input spectra is

If the in-loop noise source (NYI(ok)) dominates compared with the other (noise) sources, the measured transfer function is approximately

In this paper the influence of replacing the covariance matrix by its sample value in the weighting of an errors-in-variables least-squares method has been studied. It has been shown that the resulting estimator is still strongly consistent if at least four independent, repeated data sets are available. Moreover, the loss in efficiency is quantified as a function of the number M of independent data sets. It turns out that even for very small numbers (M 2 6) the loss is not large. Also, the variation of the cost function has been studied. This is important for the model validation step, because it allows one to detect the presence of model errors. In general, it can be concluded that nonparametric noise models offer an attractive alternative to parametric noise models: this simplifies the optimization process, reduces the model selection problem (there is only one model selection problem), and robustifies the results (there are no problems with incorrect noise models). Acknowledgements-This work is supported by the Belgian National Fund for Scientific Research, the Flemish

Frequency-domain

identification

government (GOA-IMMI), and the Belgian government as a part of the Belgian programme on Interuniversity Poles of Attraction (IUAPSO) initiated by the Belgian State, Prime Minister’s Office, Science Policy Programming. The authors would like to thank the anonymous referees and Associate Editor for their constructive remarks.

using non-parametric (ii) Second-order Because

l-45.

Billingsley, P. (1986) Probability and Measure. Wiley, New York. Brillinger, D. R. (1981) Time Series: Data Analysis and Theory. McGraw-Hill, New York. Fuller, W. A. (1980) Properties of some estimators for the errors-in-variables model. Ann. Statist. 8,407-422. Kendall, M., Stuart, A. and Ord, J. K. (1983) The Advanced Theory of Statistics, Part 3. Charles Griffin, London. Lukacs, E. (1975) Stochastic Convergence. Academic Press, New York. Pintelon, R., Guillaume, P., Rolain, Y. and Verbeyst, F. (1992) Identification of linear systems captured in a feedback loop. IEEE Trans. Instrum. Meas. IM-41, 747-754.

Picinbono, B. (1993) Random Signals and Systems. Prentice-Hall, Englewood Cliffs, NJ. Schoukens, J. and Pintelon, R. (1991) Zdentification of Linear Practical

Guideline

to Accurate

278-286.

Schoukens, J., Guillaume, P. and Pintelon, R. (1993) Design of broadband excitation signals. In Perturbation Signals for System Identification; ed. K. Godfrey, pp. 126-159. Prentice-Hall, Englewood Cliffs, NJ. Soderstrom, T. (1974) Convergence properties of the generalized least squares identification method.

10,617-626.

Stout, W. F. (1974) Almost Sure Convergence. Academic Press, New York. Stuart, A. and Ord, J. K. (1979) The Advanced Theory of Statistics, Part 1. Charles Griffin, London. Villegas, C. (1961) Maximum likelihood estimation of a linear functional relationship. Ann. Math. Statist. 32, 1048-1062.

Villegas, C. (1982) Maximum likelihood and least squares estimation in linear and affine functional models. Ann. Statist.

10.256-265.

APPENDIX A-STUDY OF THE MOMENTS OF l/x WHERE x HAS A X2(n) DISTRIBUTION (i) First-order moment, E[ l/x] Consider a variable f = (x,/nr)/(xz/nr) consisting of the ratio of two independent stochastic variables xi/n, and xz/n2 having x2 distributions with respectively n, and n2 degrees of freedom. f has an F distribution with (n,, n2) degrees of freedom, and E[f] = nz/(n2 - 2) (Kendall et al., 1983). Because

E]fl = +lE[&]

= ‘[A]

=$,

(A.l)

it is found that Ei

[I

=A.

- 4)

and using the rule with x,/n, and I/(x2/nZ), it is found that (A.3)

APPENDIX B-Q. &,/a@ AND a2ck.a6* HAVE FINITE FIRST- AND SECOND-ORDER MOMENTS For the notational BZk on 0 is omitted.

simplicity, the dependence

(i) ck has finite first- and second-order From (A.2) and (A.3),

of cli and

moments for (M > 3)

M-l

EkA = ~_2> E[c;] = (E[c,#

+ a:,

+2

4(M - l)* (2M - 4)*(2M -- 6)

(M - 1)” = (M - 2)(M - 3)

(ii) ack /a9 has finite first- and second-order moments (M 2 5) Since E[cJ is independent of 0. it follows immediately that

E[ac,/ae] =o.

Modeling.

Pergamon Press, Oxford. Schoukens, J. and Renneboog, J. (1986) Modeling the noise influence on the Fourier coefficients after a discrete Fourier transform. IEEE Trans. Instrum. Meas. IM-35,

Automatica

2n$(nl + nZ - 2) &(hz - 2)%,

1 2 var 0x =(n - 2)‘(n -4)’

Amemiya, Y. and Fuller, W. A. (1984) Estimation for the multivariate errors-in-variables model with estimated error covariance matrix. Ann. Statist. 12,497-W. Anderson, T. W. (1984) The 1982 Wald memorial lectures estimating linear statistical relationships. Ann. Statist. 12,

A

moment, var (1 /x)

var (f) =

REFERENCES

Systems:

1085

noise models

(‘4.2)

with 6, the Ith parameter of 8. This is an expression of the form E[(u - b)‘]. Applying the inequality -E[u’] - E[b*] % ZE[ub] 5 E[u*] + E[b’] shows that it is sufficient to prove that the second-order moments of a and b are finite. This is obvious for a. Bounding E[b*] is more involved. Observing that 8& is a homogeneous second-order polynomial in 8, it can be shown that (aa~,,/ae,)*~w:‘a:al, if 0, is a parameter of N(w, 0) and (a&/%$)~&?$& if 0, is a parameter of D(w, 6). Without loss of generality, the first situation will be considered here, and it will be shown that the second moment of

is bounded. For notational simplicity, the dependence on k will be omitted. The sample variances (i?&, &s) have a Wishart distribution that also depends on the cross-correlation &, and is denoted by dF(& d:, p), with p = c&/m (Kendall et al., 1983). The domain of the integral to calculate the expected value of b* can be split into two parts with respect to the variable 8:: D, = {@I o$< E} and D, = (63 1 22~ E}. The integral over D, will be finite because &, has finite moments and the denominator is bounded. It can be shown after some calculations that on D, the marginal density function dH(& a$) = ID,, dF(B$, 8:, p) is bouuded by dH(8:.

6;) 5 C(,,)e(y+‘)(IPI~ru”,/‘)‘~ dC(v, aL) dC(v, 6:). (B.1)

Here dG(v, 8:) is a x2 distribution with respect to 65, with v degrees of freedom, and similarly for G(v, es), and C(v) is a constant with respect to (86, es) depending on v. Because the density function dG(v, 3;) of a x2 distribution is proportional to (~~z_)‘v’2)~-‘e~irzJ2 e , it is clear that the expected value of b’ o’ver D, will be bounded if v 2 8, so that Mz5(v=2M-2). Remark 1. A second possibility to bound the second moment is to regularize the estimator by eliminating all contributions of Dr. In that case the statement is valid for M -2 4. It should be noted that the consistency proof is still valid, because

1086

J. Schoukens et al.

E&J 0f e.

Proof:

= (A4 - l)/(M - 2) + Cs(2M-2)‘2 is still independent

(i) In the first step the expected value is calculated:

E

[~@Wt,J I 1 .9=.9’

has finite first- and second-order

=-+rkj,($ck+2%$+&$)].

mo

ments (M s 6) The proof is completely similar to the previous one, noticing that a2Bbk/a6, af$ = &:w;+’ or B&;+” or B$w;+‘, and that (&J252(8h+ 6%). This time, contributions of (1/6$)4 should be bounded, requiring that ~210, so that M26.

(D.2)

For notational simplicity, the dependence on 6 and Z, is omitted in the right-hand side. All derivatives have to be evaluated in 8*. It is known from the previous appendix that E[aCk/ae] = 0 and E[a2Ck/ae2]= 0. Because &(e, Z,) and ck are independent variables, it is clear that

Remark.

A second possibility to bound the second moment is to regularize the estimator by eliminating all contributions of D,. In that case the statement is valid for M 2 4. APPENDIX If Assumptions

C-APPROXIMATION DERIVATIVE &la@ l-7 are met then

OF THE

(D.3)

(C.1) Because .!?[&(8, Z,)] = Kk(6, Z,,) + I, (D.3) can be Proof

For notational simplicity, the dependence on 8 and 2, is omitted. It follows from (40) and Assumption 6 that it is sufficient to proof that

reduced

to

(D.4) Because a.s. lim,.,,,

I.9=63* 1 -f~$,E[$Kd8,&)/ E $it(e, ([

a.s. ,I& Proof. Because Kk and ck are stochastically independent, follows immediately that

8* = em,” (Theorem l), it follows that

it

Z,)

])=O.

(D.5)

%li.tz,,)

Since ‘E[Q] is a constant, independent of 8, it is clear that E[&,/ae] = 0, which proves the statement. (ii) #(hi,

Kkg)2]

= O(W’).

Proof: Because ck is stochastically independent because E[as/ae] = 0, it follows that

(ii) In the second step the a.s. convergence of (a2/a@)R(f3, Z,) to its expected value is shown. It will be shown that its variance is O(W’), which already proves convergence in probability (plim). Because the terms in the sum are independent distributed, a.s., convergence follows. &R(e.Z,))+$&?(e,Z,))2].

of c,, and

k

(D.6)

I

Because the noise is independently distributed over the data points, the expected value of the double products are equal to zero and (D.6) becomes Because K, is assumed to be bounded (which follows from Assumption 7). and E[(ack/a6,)2] is bounded (Appendix B), the statement is proved. (iii) It follows from parts (i) and (ii) that

var & ( +$

(Lukacs, 1975). The almost-sure convergence follows immediately from Stout (1974) where it is shown that convergence in probability of a sum of independent random variables also includes almost-sure convergence. APPENDIX D If Kk(e, z,J, (a/ae)K,(e,Z,,),and (a2/ae2)&(6, bounded on a closed neighbourhood of emin then a.s. ilii

(

$

R(e, Z,)

M-1 -M-we,

o=tJ*

= 0,

a2

Zo) are

mls-,,“,z,,,)

uniformly in 8.

(D.l)

R(e, Z,)) K&$‘].

E[(zci+2zz+

(D.7)

Using the same rule as in the previous appendix, it is sufficient to show that the second-order moments of each of the individual terms in (D.7) are uniformly (in 0) bounded to prove that

E SC [I ae2

+2aKkaCk+K k

ae ae

II

&k 2 SC<%

k de2

is uniformly bounded. From the initial assumptions and the results of Appendix B. this result follows immediately. Consequently, var($R(e,&))
(D-8)