On Consistency of Subspace Methods for System Identification


PII: S0005–1098(98)00104–6

Automatica, Vol. 34, No. 12, pp. 1507-1519, 1998. © 1998 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0005-1098/98 $ - see front matter

On Consistency of Subspace Methods for System Identification*

MAGNUS JANSSON† and BO WAHLBERG†

Conditions for consistency for a certain class of subspace system identification methods are given. It is shown that persistence of excitation of the input signal is, in general, not sufficient. Stronger sufficient conditions are presented.

Key Words—Consistency; subspace methods; identification; performance analysis; linear systems.

Abstract—Subspace methods for identification of linear time-invariant dynamical systems typically consist of two main steps. First, a so-called subspace estimate is constructed. This first step usually consists of estimating the range space of the extended observability matrix. Secondly, an estimate of the system parameters is obtained, based on the subspace estimate. In this paper, the consistency of a large class of methods for estimating the extended observability matrix is analyzed. Persistence of excitation conditions on the input signal are given which guarantee consistent estimates for systems with only measurement noise. For systems with process noise, it is shown that a persistence of excitation condition on the input is not sufficient. More precisely, an example is given for which the subspace methods fail to give a consistent estimate of the transfer function. This failure occurs even if the input is persistently exciting of any order. It is also shown that this problem can be eliminated if stronger conditions on the input signal are imposed. © 1998 Elsevier Science Ltd. All rights reserved.

1. INTRODUCTION

The large interest in subspace methods for system identification is motivated by the need for useful engineering tools to model linear multivariable dynamical systems using experimental data (Van Overschee and De Moor, 1994; Larimore, 1990; Verhaegen, 1994; Viberg, 1995). These methods, sometimes referred to as 4SID (subspace state-space system identification) methods, are easy to apply to identification of multivariable systems. Reliable numerical algorithms form the basis of these methods. The statistical properties, however, are not yet fully understood. Initial results in this direction are reported in Viberg et al. (1991, 1997), Peternell et al. (1996), Jansson and Wahlberg (1996a, b) and Bauer et al. (1997).

Conditions for consistency of the 4SID methods have often involved assumptions on the rank of certain matrices used in the schemes. It would be more useful to have explicit conditions on the input signals and on the system. The consistency of combined deterministic and stochastic subspace identification methods has been analyzed in Peternell et al. (1996). In that paper, consistency is proven under two assumptions. One is the assumption that the input is an ARMA process. Secondly, it is assumed that a dimension parameter tends to infinity at a certain rate as the number of samples tends to infinity. This assumption is needed to consistently estimate the stochastic subsystem and the system order. However, subspace methods are typically applied with fixed dimension parameters. A consistency analysis under these conditions is given in Jansson and Wahlberg (1996a, b, 1997). The analysis is further extended in the current paper.

The analysis focuses on the crucial first step of the subspace identification methods; that is, the focus is on estimating the extended observability matrix, or the "subspace estimation" step. The methods analyzed herein are the so-called Basic-4SID and IV-4SID methods. The Basic-4SID method (De Moor and Vandewalle, 1987; De Moor et al., 1988; Liu, 1992; Verhaegen, 1991) has been analyzed in Gopinath (1969), Viberg et al. (1991), Liu (1992) and Verhaegen and Dewilde (1992). Herein, we extend the analysis and give persistence of excitation conditions on the input signal which guarantee a consistent estimate of the extended observability matrix. The IV-4SID class of methods includes N4SID (Van Overschee and De Moor, 1994), MOESP (Verhaegen, 1994) and CVA (Larimore, 1990). An open question regarding the IV-4SID methods has been whether a persistence of excitation condition on the input alone is sufficient for consistency. The current contribution employs a counterexample to show that this is not the case. For systems with only measurement noise, however, it is possible to show that a persistently exciting input of a given order ensures consistency of the IV-4SID subspace estimate. For systems with process noise, IV-4SID methods may suffer from consistency problems. This is explicitly shown in the aforementioned counterexample. A simple proof shows that a white noise or a low-order ARMA input signal guarantees consistency in this case.

The remainder of the paper is organized as follows. In Section 2, the system model is defined and the general assumptions are stated. Some preliminary results are also collected in this section. In Section 3, the underlying ideas of the Basic-4SID and IV-4SID methods are presented. In Section 4, the consistency of the Basic-4SID method is analyzed. In Section 5, the critical relation for consistency of IV-4SID methods is presented, and conditions for this relation to hold are given. In Section 6, the counterexample is given, describing a system for which the critical relation does not hold. In Section 7, some simulations are presented to illustrate the theoretical results. Finally, conclusions are provided in Section 8.

*Received 2 July 1997; revised 20 March 1998; received in final form 4 June 1998. Parts of this paper were presented at the 13th IFAC World Congress, held in San Francisco, U.S.A., June 30-July 5, 1996, and at SYSID'97, held in Fukuoka, Japan, July 8-11, 1997. The Published Proceedings of this IFAC Meeting may be ordered from: Elsevier Science Limited, The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, U.K. This paper was recommended for publication in revised form by Associate Editor Paul Van den Hof under the direction of Editor Torsten Söderström. Corresponding author Dr Magnus Jansson. Tel. +46 8 790 7425; Fax +46 8 790 7329; E-mail magnus.jansson@s3.kth.se.
†S3-Automatic Control, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden.

2. PRELIMINARIES

Assume that the true system is linear, time-invariant and described by the state-space equations

    x(t+1) = A x(t) + B u(t) + w(t),   (1a)
    y(t) = C x(t) + D u(t) + v(t).     (1b)

Here, x(t) ∈ R^n is the state vector, u(t) ∈ R^m consists of the observed input signals, and y(t) ∈ R^l contains the observed output signals. The system is corrupted by the process noise w(t) ∈ R^n and the measurement noise v(t) ∈ R^l. In the remainder of this paper we will consider the system (1) in its innovations representation

    x(t+1) = A x(t) + B u(t) + K e(t),   (2a)
    y(t) = C x(t) + D u(t) + e(t),       (2b)

where e(t) ∈ R^l is the innovation process. Let us introduce the following assumptions.

A1: The innovation process e(t) is a stationary, zero-mean, white noise process with second-order moments E{e(t) e^T(s)} = r_e δ_{t,s}, where δ_{t,s} is the Kronecker delta.

A2: The input u(t) is a quasi-stationary signal (Ljung, 1987). The correlation function is defined by r_u(τ) = Ē{u(t+τ) u^T(t)}, where Ē is defined as

    Ē{·} = lim_{N→∞} (1/N) Σ_{t=1}^{N} E{·}.

The input u(t) and the innovation process e(t) are assumed to be uncorrelated: Ē{u(t) e^T(s)} = 0 for all t and s.

A3: The system (2) is asymptotically stable; i.e., the eigenvalues of A lie strictly inside the unit circle. The eigenvalues of the matrix A − KC are also inside the unit circle.

A4: The description (2) is minimal in the sense that the pair (A, C) is observable and the pair (A, [B K]) is reachable.

The concept of persistently exciting signals is defined in the following way.

Definition 1. A (quasi-stationary) signal u(t) is said to be persistently exciting of order n, denoted by PE(n), if the following matrix is positive definite:

    [ r_u(0)     r_u(1)    ...  r_u(n−1) ]
    [ r_u(−1)    r_u(0)    ...  r_u(n−2) ]
    [   ...        ...      .      ...   ]
    [ r_u(1−n)   r_u(2−n)  ...  r_u(0)   ]

For easy reference, the following two well-known lemmas are given. They will be useful in the consistency analysis.

Lemma 2. Consider the (block) matrix

    S = [ A  B ]
        [ C  D ],   (3)

where A is n×n and D is a non-singular m×m matrix. Then

    rank{S} = m + rank{A − B D⁻¹ C}.

Proof. The result follows directly from the identity

    [ I  −B D⁻¹ ] [ A  B ] [   I      0 ]   =   [ Δ  0 ]
    [ 0     I   ] [ C  D ] [ −D⁻¹ C   I ]       [ 0  D ],   (4)

where Δ = A − B D⁻¹ C is the Schur complement of D in S.

Using equation (4), the next lemma is readily proven.

Lemma 3. Let S, as defined in equation (3), be symmetric (C = B^T), and let D be positive definite. Then S is positive (semi)definite if and only if A − B D⁻¹ B^T is positive (semi)definite.
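Lemma 2 is easy to check numerically. The following sketch (a toy construction of our own, not code from the paper) builds a block matrix S whose Schur complement is forced to zero and compares both sides of the rank identity:

```python
# Numerical check of Lemma 2: for S = [[A, B], [C, D]] with D invertible,
# rank(S) = m + rank(A - B D^{-1} C). All values below are arbitrary test data.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 3 * np.eye(m)  # shifted to keep D non-singular
A = B @ np.linalg.solve(D, C)                    # forces the Schur complement to zero

S = np.block([[A, B], [C, D]])
Delta = A - B @ np.linalg.solve(D, C)            # Delta = A - B D^{-1} C (here: 0)

rank_S = np.linalg.matrix_rank(S)
rank_Delta = np.linalg.matrix_rank(Delta)
print(rank_S, m + rank_Delta)  # both sides of the identity agree
```

With the Schur complement zeroed out, rank(S) collapses to m, which is exactly what the block LDU factorization in the proof of Lemma 2 predicts.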

3. BACKGROUND

In this section, the basic equations used in subspace state-space system identification (4SID) methods are reviewed. Define the (la)×1 vector of stacked outputs as

    y_a(t) = [y^T(t), y^T(t+1), ..., y^T(t+a−1)]^T,

where a is a user-specified parameter which is required to be greater than the observability index (or, for simplicity, the system order n). In a similar manner, define vectors of stacked inputs and innovations as u_a(t) and e_a(t), respectively. By iteration of the state equations in equation (2), it is straightforward to verify the following equation for the stacked quantities:

    y_a(t) = Γ_a x(t) + Φ_a u_a(t) + n_a(t),   (5)

where n_a(t) = Ψ_a e_a(t), and where the following block matrices have been introduced:

    Γ_a = [ C; CA; ...; CA^{a−1} ],

    Φ_a = [ D           0    ...   0 ]        Ψ_a = [ I           0    ...   0 ]
          [ CB          D     .    0 ]              [ CK          I     .    0 ]
          [ ...          .    .  ... ]              [ ...          .    .  ... ]
          [ CA^{a−2}B   ...   CB   D ],             [ CA^{a−2}K   ...   CK   I ].

The key idea of subspace identification is to first estimate the n-dimensional range space of the (extended) observability matrix Γ_a. The system parameters are then estimated in different ways. For consistency of 4SID methods it is crucial that the estimate of Γ_a is consistent.

There have been several suggestions in the literature for estimating Γ_a. Most of them, however, are closely related to one another (Van Overschee and De Moor, 1995b; Viberg, 1995). To describe the methods considered, some additional notation is needed. Assume there are N+a+b−1 data samples available and introduce the block Hankel matrices

    Y_a = [y_a(1+b), ..., y_a(N+b)],   (6)
    Y_b = [y_b(1), ..., y_b(N)].       (7)

The parameter b is to be chosen by the user. Intuitively, b can be interpreted as the dimension of the observer used to estimate the current state (Wahlberg and Jansson, 1994). The partitioning of the data into Y_b and Y_a is sometimes referred to as "past" and "future". Define U_a and U_b as block Hankel matrices of inputs similar to equations (6) and (7), respectively. Comparing equations (6) and (5), it is clear that Y_a is given by the relation

    Y_a = Γ_a X_f + Φ_a U_a + N_a,   (8)

where

    X_f = [x(1+b), x(2+b), ..., x(N+b)],

and N_a is constructed from n_a(t) similar to equation (6). Two methods of estimating the range space of Γ_a from data are presented next. The methods are based on relation (8).

3.1. Basic-4SID

The method described in this section is sometimes referred to as the Basic-4SID method (De Moor and Vandewalle, 1987; De Moor et al., 1988; Liu, 1992; Verhaegen, 1991). When considering this method, it is not necessary to partition the data into past and future. Hence, it can be assumed that b = 0. The idea is to study the residuals in the regression of y_a(t) on u_a(t) in equation (5). Introduce the orthogonal projection matrix onto the null space of U_a as

    Π⊥_{U_a} = I − U_a^T (U_a U_a^T)⁻¹ U_a.

If the input u(t) is PE(a), then the inverse of U_a U_a^T exists for sufficiently large N. The residual vectors in the regression of y_a(t) on u_a(t) are now given by the columns of the following expression:

    (1/√N) Y_a Π⊥_{U_a} = (1/√N) Γ_a X_f Π⊥_{U_a} + (1/√N) N_a Π⊥_{U_a},   (9)

where the normalizing factor 1/√N has been introduced for later convenience. The Basic-4SID estimate of Γ_a is then given by

    Γ̂_a = Q̂_s,   (10)

where Q̂_s is obtained from the singular value decomposition (SVD)

    (1/√N) Y_a Π⊥_{U_a} = [ Q̂_s  Q̂_n ] [ Ŝ_s  0   ] [ V̂_s^T ]
                                        [ 0    Ŝ_n ] [ V̂_n^T ].

Here, Ŝ_s is a diagonal matrix containing the n largest singular values and Q̂_s contains the corresponding left singular vectors.

3.2. IV-4SID

The second type of estimator that will be analyzed is called IV-4SID (Larimore, 1990; Van Overschee and De Moor, 1994; Verhaegen, 1994). The reason for this name is that this approach is most easily explained in terms of instrumental variables (IVs) (see e.g. Viberg, 1995). The advantage of IV-4SID (compared to Basic-4SID) is that it can handle more general noise models. The main idea here is to use an instrumental variable which not only removes the effect of u_a(t) in equation (5) (as in the Basic-4SID), but also reduces the effect of n_a(t). The first step is to remove the future inputs. This is the same as in Basic-4SID. The second step is to remove the noise term in equation (9). This is accomplished by multiplying from the right by the "instrumental variable matrix"

    P_b = [ Y_b ]
          [ U_b ].   (11)

Post-multiplying both sides of equation (9) by the matrix in equation (11), and normalizing by 1/√N, yields

    (1/N) Y_a Π⊥_{U_a} P_b^T = (1/N) Γ_a X_f Π⊥_{U_a} P_b^T + (1/N) N_a Π⊥_{U_a} P_b^T,

where, due to the assumptions, the second term on the right tends to zero with probability one (w.p.1) as N tends to infinity (since the input and the noise process are uncorrelated, and since there is a "time shift" between N_a and the noise in Y_b). A quite general estimate of Γ_a is then given by

    Γ̂_a = Ŵ_r⁻¹ Q̂_s,   (12)

where Q̂_s is the (al)×n matrix made of the n left singular vectors of

    Ŵ_r (1/N) Y_a Π⊥_{U_a} P_b^T Ŵ_c

corresponding to the n largest singular values. The (data dependent) weighting matrices Ŵ_r and Ŵ_c can be chosen in different ways to yield a class of estimators. This class includes most of the known methods appearing in the literature; see Table 1 (cf. Van Overschee and De Moor, 1995; Viberg, 1995). The methods PO-MOESP, N4SID, IVM and CVA appear in Verhaegen (1994), Van Overschee and De Moor (1994), Viberg (1995) and Larimore (1990), respectively. For some results regarding the effect of the choice of the weighting matrices, see Jansson and Wahlberg (1995, 1996a) and Van Overschee and De Moor (1995b).
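The Basic-4SID step of Section 3.1 can be made concrete with a small noise-free simulation. The sketch below is our own toy example (all system values are assumptions, not taken from the paper): it stacks the data into block Hankel matrices, projects out the input, and recovers the range space of the extended observability matrix from an SVD.

```python
# Minimal noise-free sketch of the Basic-4SID subspace step (toy example).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.5, 0.2], [0.0, 0.3]])  # stable second-order system (assumed values)
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
d = 0.0                                 # direct term D
n, a, N = 2, 4, 400

# simulate with a persistently exciting (white) input, no noise
u = rng.standard_normal(N + a)
x = np.zeros(2)
y = np.empty(N + a)
for t in range(N + a):
    y[t] = C @ x + d * u[t]
    x = A @ x + B[:, 0] * u[t]

# block Hankel matrices Y_a and U_a (a block rows, N columns)
Y = np.array([y[i:i + N] for i in range(a)])
U = np.array([u[i:i + N] for i in range(a)])

# residual of the regression of Y_a on U_a:  Y_a (I - U_a^T (U_a U_a^T)^{-1} U_a)
Pi = np.eye(N) - U.T @ np.linalg.solve(U @ U.T, U)
Q, s, _ = np.linalg.svd(Y @ Pi / np.sqrt(N))
Gamma_hat = Q[:, :n]                    # estimated range of Gamma_a, cf. equation (10)

# true extended observability matrix Gamma_a
Gamma = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(a)])

# compare the two column spaces via their orthogonal projectors
P1 = Gamma_hat @ Gamma_hat.T
P2 = Gamma @ np.linalg.pinv(Gamma)
print(np.allclose(P1, P2, atol=1e-8))
```

In the noise-free case the residual matrix in equation (9) is exactly Γ_a X_f Π⊥, so its first n left singular vectors span the column space of Γ_a up to rounding error; with noise, only the span is recovered asymptotically, which is precisely the consistency question the paper studies.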

4. ANALYSIS OF BASIC-4SID

In this section, conditions are established under which the Basic-4SID estimate Γ̂_a in equation (10) is consistent. Here, consistency means that the column span of Γ̂_a tends to the column span of Γ_a as N tends to infinity (w.p.1). (In other words, there exists a non-singular n×n matrix T such that lim_{N→∞} Γ̂_a = Γ_a T.) This method has been analyzed previously (see Gopinath, 1969; Viberg et al., 1991; Liu, 1992; Verhaegen and Dewilde, 1992). This problem is reconsidered here, however, for two reasons. First, the proof to be presented appears to be more explicit than previous ones (known to the authors). We give less restrictive conditions on the input signal than previously known results, and the order of persistence of excitation needed is given. Second, these results are useful in the analysis of the more interesting IV-based methods given in the next section.

Recall equation (5) and note that the noise term n_a(t) is uncorrelated with u_a(t) and x(t). It is therefore relatively straightforward to show that the sample covariance matrix of the residuals in equation (9),

    R̂_ε = (1/N) Y_a Π⊥_{U_a} Y_a^T,   (13)

Table 1. Weighting matrices corresponding to specific methods. Notice that some of the weightings may not be as they appear in the referred papers. These weightings, however, give estimates of Γ̂_a identical to those obtained using the original choice of weighting Ŵ_c.

    Method     Ŵ_r                                  Ŵ_c
    PO-MOESP   I                                    ((1/N) P_b Π⊥_{U_a} P_b^T)^{−1/2}
    N4SID      I                                    ((1/N) P_b Π⊥_{U_a} P_b^T)^{−1} ((1/N) P_b P_b^T)^{1/2}
    IVM        ((1/N) Y_a Π⊥_{U_a} Y_a^T)^{−1/2}    ((1/N) P_b P_b^T)^{−1/2}
    CVA        ((1/N) Y_a Π⊥_{U_a} Y_a^T)^{−1/2}    ((1/N) P_b Π⊥_{U_a} P_b^T)^{−1/2}

tends, w.p.1 as N → ∞, to

    R_ε = Γ_a (r_x − r_xu R_u⁻¹ r_xu^T) Γ_a^T + R_n.   (14)

Here, the various correlation matrices are defined as

    r_x = Ē{x(t) x^T(t)},
    r_xu = Ē{x(t) u_a^T(t)},
    R_u = Ē{u_a(t) u_a^T(t)},
    R_n = Ē{n_a(t) n_a^T(t)}.

Recall that u(t) is assumed to be PE(a). This implies that R_u is positive definite. From equation (14) it can be concluded that the estimate of Γ_a in equation (10) is consistent if, and practically only if,†

    R_n = σ² I,   (15)
    r_x − r_xu R_u⁻¹ r_xu^T > 0   (16)

for some scalar σ². If equations (15) and (16) hold, then the limit of Γ̂_a (as defined in equation (10)) equals Γ_a T for some non-singular n×n matrix T. The condition (15) is essentially only true for output error systems perturbed by a measurement noise of equal power in each output channel; that is, if K = 0 and r_e = σ² I. In the following it is shown under what conditions (on the input signals) the condition (16) is fulfilled.

† There may exist special cases in which Basic-4SID is consistent even when R_n ≠ σ² I. However, these cases depend on the system parameters and are not of interest here.

In view of Lemma 2, the condition (16) is equivalent to

    Ē{ [ x(t)   ] [ x(t)   ]^T }   =   [ r_x      r_xu ]   > 0.   (17)
       [ u_a(t) ] [ u_a(t) ]           [ r_xu^T   R_u  ]

Next, the state is decomposed into two terms, x(t) = x^d(t) + x^s(t), where the "deterministic" term x^d(t) is due to the observed inputs and the "stochastic" term x^s(t) is caused by the innovations process e(t). Due to assumption A2, x^s(t) is uncorrelated with x^d(t) and u_a(t). Furthermore, introduce a transfer function (operator) description of x^d(t) as

    x^d(t) = (F^u(q⁻¹) / a(q⁻¹)) u(t),   (18)

where q⁻¹ is the backward shift operator (q⁻¹ u(t) = u(t−1)). The polynomials F^u(q⁻¹) and a(q⁻¹) are defined by the relations

    F^u(q⁻¹) = Adj{qI − A} B / q^n = F^u_1 q⁻¹ + F^u_2 q⁻² + ... + F^u_n q⁻ⁿ,
    a(q⁻¹) = det{qI − A} / q^n = a_0 + a_1 q⁻¹ + ... + a_n q⁻ⁿ   (a_0 = 1),   (19)

where Adj{A} denotes the adjugate matrix of A and det{·} is the determinant. The polynomial coefficients a_i (i = 0, ..., n) are scalars, while the F^u_i (i = 1, ..., n) are n×m matrices. Analogously to F^u(q⁻¹), define the matrix polynomial

    F^e(q⁻¹) = Adj{qI − A} K / q^n = F^e_1 q⁻¹ + F^e_2 q⁻² + ... + F^e_n q⁻ⁿ,

where the F^e_i (i = 1, ..., n) are n×p matrices. Thus, x^s(t) is given by the relation

    x^s(t) = (F^e(q⁻¹) / a(q⁻¹)) e(t).

Also define the Sylvester-like matrix

    S = [ F^u_n    ...  F^u_1      0        ...       0     |  F^e_n  ...  F^e_1 ]
        [ a_n I_m  ...  a_1 I_m  a_0 I_m              .     |    0    ...    0   ]
        [            .                       .              |         ...        ]
        [   0      a_n I_m    ...     ...  a_0 I_m          |    0    ...    0   ]   (20)

of dimension (n+am) × ((a+n)m + np). Using the notation introduced above, it can be verified that

    [ x(t)   ]  =  S (1/a(q⁻¹)) [ u(t−n); ...; u(t+a−1); e(t−n); ...; e(t−1) ].   (21)
    [ u_a(t) ]

It is shown in Appendix A that S has full row rank under assumption A4. This implies that the condition (17) is satisfied if the following matrix is positive definite:

    Ē{ (1/a(q⁻¹)) [ u(t−n); ...; u(t+a−1); e(t−n); ...; e(t−1) ]
       (1/a(q⁻¹)) [ u(t−n); ...; u(t+a−1); e(t−n); ...; e(t−1) ]^T }.   (22)

Since u(t) and e(t) are uncorrelated, the covariance matrix in equation (22) is block diagonal. The upper diagonal block can be shown to be positive definite if u(t) is PE(a+n); see Lemma A.3.7 in Söderström and Stoica (1983), modified to quasi-stationary signals. The lower diagonal block is positive definite assuming r_e > 0. If r_e = 0, then x(t) does not depend on the process {e(t)} and, hence, its contribution in equations (21) and (22) can be removed. In conclusion, the following theorem has been proven.

Theorem 4. Under the general assumptions A1-A4, the Basic-4SID subspace estimate (10) is consistent if the following conditions hold:
(1) The covariance matrix of the stacked noise is proportional to the identity matrix; that is, R_n = σ² I.
(2) The input signal u(t) is persistently exciting of order a+n.

Remark 5. As previously mentioned, condition (1) of the theorem is essentially only true if K = 0 and r_e = σ² I. Furthermore, K = 0 implies that the system must be reachable from u(t).
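Theorem 4 requires the input to be PE(a+n). A persistence-of-excitation test in the spirit of Definition 1 can be sketched as follows; `pe_order_check` is a hypothetical helper of our own, with (unbiased) sample covariances standing in for the true r_u:

```python
# Sketch of a PE(order) test for a scalar signal, cf. Definition 1 (our own helper).
import numpy as np

def pe_order_check(u, order, tol=1e-8):
    """True if the order x order Toeplitz matrix of sample covariances of u
    is (numerically) positive definite."""
    N = len(u)
    r = np.array([u[k:] @ u[:N - k] / (N - k) for k in range(order)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return float(np.linalg.eigvalsh(R).min()) > tol

rng = np.random.default_rng(2)
white = rng.standard_normal(5000)                   # white noise: PE of any order
alt = np.array([(-1.0) ** t for t in range(4000)])  # single spectral line: PE(1) only

print(pe_order_check(white, 10))
print(pe_order_check(alt, 1), pe_order_check(alt, 2))
```

The alternating signal carries a single spectral line, so its 2×2 covariance Toeplitz matrix is singular and the order-2 test fails, while white noise passes at any tested order; this is the kind of input-richness condition that Theorems 4 and 6 impose.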

5. ANALYSIS OF IV-4SID

In this section, conditions for the IV-4SID estimate Γ̂_a defined in equation (12) to be consistent are established. As N → ∞, the limit of the quantity in equation (13) is w.p.1 given by

    W_r Γ_a (r_xp − r_xu R_u⁻¹ R_up) W_c,   (23)

where W_r and W_c are the limits of Ŵ_r and Ŵ_c, respectively. The correlation matrices appearing in equation (23) are defined as follows:

    r_xp = Ē{x(t+b) p_b^T(t)},
    r_xu = Ē{x(t+b) u_a^T(t+b)},
    R_u = Ē{u_a(t+b) u_a^T(t+b)},
    R_up = Ē{u_a(t+b) p_b^T(t)},

where p_b(t), t = 1, ..., N, are defined as the columns of P_b in equation (11). If the weighting matrices W_r and W_c are non-singular,† the estimate (12) is consistent if and only if the following matrix has full rank equal to n:

    r_xp − r_xu R_u⁻¹ R_up.   (24)

† It is not necessary to assume that the weighting matrices are non-singular for proving consistency of IV-4SID. For example, it is sufficient that rank((r_xp − r_xu R_u⁻¹ R_up) W_c) = rank(r_xp − r_xu R_u⁻¹ R_up) = n. However, we believe that the results become clearer if we assume that W_r and W_c are non-singular and focus on the rank of equation (24).

This is because the limiting principal left singular vectors of equation (13) will be equal to W_r Γ_a T for some non-singular n×n matrix T. Hence, the limit of Γ̂_a in equation (12) is Γ_a T. In view of Lemma 2, the condition that equation (24) should have full rank can be reformulated as

    rank Ē{ [ x(t+b)   ] [ y_b(t)   ]^T }
            [ u_a(t+b) ] [ u_b(t)   ]     = n + am.   (25)
                         [ u_a(t+b) ]

This is the critical relation for consistency of IV-4SID methods.

Let us begin by deriving conditions which ensure that the weighting matrices in Table 1 are non-singular. Consider, for example, the limit of the column weighting W_c for the PO-MOESP method. We need to establish that P_b Π⊥_{U_a} P_b^T / N is positive definite for large N. Using Lemma 3, that condition can equivalently be written as

    Ē{ [ y_b(t); u_b(t); u_a(t+b) ] [ y_b(t); u_b(t); u_a(t+b) ]^T } > 0.   (26)

This is recognized as a requirement similar to one appearing in least-squares linear regression estimation (Söderström and Stoica, 1989, Complement C5.1). The condition (26) holds if u(t) is PE(a+b) and if E{n_b(t) n_b^T(t)} > 0. Similarly, conditions for all cases in Table 1 can be obtained. The result is summarized in Table 2.

Having established that the weighting matrices are non-singular, condition (25) may now be examined. The correlation matrix in equation (25) can be written as the sum of two terms:

    Ē{ [ x^d(t+b); u_a(t+b) ] [ y^d_b(t); u_b(t); u_a(t+b) ]^T }
      + Ē{ [ x^s(t+b); 0 ] [ y^s_b(t); 0; 0 ]^T },   (27)

where the superscripts (·)^d and (·)^s denote the deterministic part and the stochastic part, respectively. Below we show that the first term in equation (27) has full row rank under a PE condition on u(t). This implies that the critical relation (25) is fulfilled for systems with no process noise (since

x^s(t) ≡ 0 in that case). For systems with process noise, one may in exceptional cases lose rank when adding the two terms in equation (27). One example of such a situation is given in the next section. Generally, however, loss of rank is unlikely to occur (at least if bl is sufficiently large).

Table 2. Sufficient conditions for the weighting matrices W_r and W_c appearing in Table 1 to be non-singular

    Method     Order of PE   Condition on noise
    PO-MOESP   a+b           E{n_b(t) n_b^T(t)} > 0
    N4SID      a+b           E{n_b(t) n_b^T(t)} > 0
    IVM        max(a, b)     E{n_{max(a,b)}(t) n_{max(a,b)}^T(t)} > 0
    CVA        a+b           E{n_{max(a,b)}(t) n_{max(a,b)}^T(t)} > 0

The first term in equation (27) can be written as the product

    [ A^b   C_b(A, B)   0    ]                                                        [ Γ_b^T   0      0    ]
    [ 0     0           I_am ]  Ē{ [x^d(t); u_b(t); u_a(t+b)] [x^d(t); u_b(t); u_a(t+b)]^T }  [ Φ_b^T   I_bm   0    ] ,   (28)
                                                                                      [ 0       0      I_am ]

where C_b(A, B) denotes the reversed reachability matrix; that is,

    C_b(A, B) = [ A^{b−1}B  A^{b−2}B  ...  B ].   (29)

Clearly, the first factor has full row rank (n+am) if the system (2) is reachable from u(t) and if b is greater than or equal to the controllability index. Notice that the last factor has full row rank provided b is greater than or equal to the observability index. A sufficient condition is then obtained if the covariance matrix in the middle of equation (28) is positive definite. We have

    Ē{ [x^d(t); u_b(t); u_a(t+b)] [x^d(t); u_b(t); u_a(t+b)]^T }
      = S Ē{ (1/a(q⁻¹)) u_{a+b+n}(t−n) (1/a(q⁻¹)) u_{a+b+n}^T(t−n) } S^T,   (30)

where

    S = [ F^u_n    ...  F^u_1      0       ...      0   ]
        [ a_n I_m  ...  a_1 I_m  a_0 I_m            .   ]
        [            .                      .           ]
        [   0      a_n I_m    ...    ...  a_0 I_m       ]   (31)

is of dimension (n+am+bm) × ((a+b+n)m). If the system is reachable from u(t), then S has full row rank according to Appendix A. The matrix in equation (30) is then positive definite if u(t) is PE(a+b+n). The result is summarized in the following theorem.

Theorem 6. The IV-4SID subspace estimate Γ̂_a given by equation (12) is consistent if the general assumptions A1-A4 hold in addition to the following conditions:
(1) The weighting matrices W_r and W_c are non-singular.
(2) (A, B) is reachable.
(3) b ≥ max(observability index, controllability index).
(4) K = 0.
(5) The input u(t) is persistently exciting of order a+b+n.

Remark 7. The condition K = 0 ensures that the second term in equation (27) is zero.

Remark 8. The conditions for the non-singularity of the weighting matrices in Table 2 are fulfilled if r_e > 0 in addition to the PE condition on u(t).

We conclude this section by giving three alternate consistency theorems for IV-4SID. The first relaxes the order of PE on u(t) for single-input systems, while the second strengthens the conditions on u(t) in order to prove consistency for systems with process noise. Finally, the third theorem considers scalar ARMA (autoregressive moving average) input signals.

5.1. Single input systems

For single-input systems, we may prove that the first term in equation (27) is full row rank in an alternate way to that used in Theorem 6. This leads to a less restrictive condition on the input excitation.

Theorem 9. For single-input systems, the IV-4SID subspace estimate Γ̂_a given by equation (12) is consistent if the general assumptions A1-A4 apply in addition to the following conditions:
(1) The weighting matrices W_r and W_c are non-singular.
(2) (A, B) is reachable.
(3) b ≥ the observability index.
(4) r_w = 0.
(5) The input u(t) is persistently exciting of order a+n.

Proof. Consider a single-input system and define the vector polynomial in q⁻¹:

    F^y(q⁻¹) = C Adj{qI − A} B / q^n + D = F^y_0 + F^y_1 q⁻¹ + F^y_2 q⁻² + ... + F^y_n q⁻ⁿ,

where the F^y_i (i = 0, ..., n) are l×1 vectors. This implies that

    y^d(t) = ( Σ_{k=0}^{n} F^y_k q^{−k} / Σ_{k=0}^{n} a_k q^{−k} ) u(t),   (32)

where {a_i} is defined in equation (19). Next, define the Sylvester matrix

    S_y = [ F^y_n  ...  F^y_0     0     ...    0  ]  ⎫
          [          .                   .        ]  ⎬  bl rows
          [   0   ...  F^y_n    ...   F^y_0       ]  ⎭
          [ a_n   ...   a_0      0     ...    0   ]  ⎫
          [          .                   .        ]  ⎬  b rows
          [   0   ...   a_n     ...    a_0        ]  ⎭
          [ a_n   ...   a_0      0     ...    0   ]  ⎫
          [          .                   .        ]  ⎬  a rows
          [   0   ...   a_n     ...    a_0        ]  ⎭   (33)

of dimension (a+b+bl) × (n+a+b). It can be shown that the rank of S_y is n+a+b if the system is observable and reachable from u(t), and if b is greater than or equal to the observability index (Anderson and Jury, 1976; Bitmead et al., 1978). A straightforward (simple) proof of this fact, using the notation introduced in this paper, is given in Appendix B. The first term in equation (27) can now be written as

    Ē{ [ x^d(t+b) ] [ y^d_b(t) ]^T }
       [ u_a(t+b) ] [ u_b(t)   ]     = S R_U S_y^T,   (34)
                    [ u_a(t+b) ]

where

    R_U = Ē{ (1/a(q⁻¹)) u_{n+a}(t+b−n) (1/a(q⁻¹)) u_{a+b+n}^T(t−n) }.

The covariance matrix R_U has full row rank if u is PE(a+n). It is then clear that equation (34) has full rank, since S is non-singular and S_y has full column rank. Therefore, for single-input systems we can relax the assumption on the input signal.

The proof of Theorem 9 does not carry over to the multi-input case, because S_y is not full column rank if m > 1 (see Appendix B).

5.2. White noise input

Next, we give the following consistency result concerning the case when the input is a white noise process.

Theorem 10. Let the input u(t) to the system in equation (2) be a zero-mean white sequence in the sense that r_u(τ) = r_u(0) δ_{τ,0} and r_u(0) > 0. Assume that A1-A4 hold, that the weighting matrices W_r and W_c are positive definite, and that

    rank{[ C_b(A, B)  C_b(A, G) ]} = n.   (35)

Here, C_b(A, B) is defined in equation (29),

    C_b(A, G) = [ A^{b−1}G  A^{b−2}G  ...  G ],

and the matrix G is defined as G = Ē{x(t+1) y^T(t)}. Given the above assumptions, the IV-4SID subspace estimate Γ̂_a defined in equation (12) is consistent.

Proof. Consider the correlation matrix in equation (25):

    Ē{ [ x(t+b)   ] [ y_b(t)   ]^T }   =   [ Ē{x(t+b) y_b^T(t)}   Ē{x(t+b) u_b^T(t)}   0     ]
       [ u_a(t+b) ] [ u_b(t)   ]           [ 0                     0                     R^u_a ],   (36)
                    [ u_a(t+b) ]

where R^u_a = Ē{u_a(t) u_a^T(t)} = I_a ⊗ r_u(0). The symbol ⊗ denotes the Kronecker product. In equation (36), we used the fact that Ē{x(t) u^T(t+k)} = 0 for k ≥ 0, since u(t) is a white sequence. Notice that R^u_a > 0 since r_u(0) > 0. Under the given assumptions it is readily proven that

    Ē{x(t+b) y_b^T(t)} = C_b(A, G),
    Ē{x(t+b) u_b^T(t)} = C_b(A, B)(I_b ⊗ r_u(0)).

We conclude that the matrix in equation (36) has full row rank under the reachability assumption given in equation (35). This concludes the proof.

Notice that Theorem 10 guarantees consistency even for systems with process noise. Also observe that we were able here to relax the reachability condition on (A, B) compared to the previous theorems. Some other results for the case of a white noise input signal can be found in Verhaegen (1993).
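Theorem 10's setting can be probed numerically. The sketch below uses a toy system of our own choosing (all values are assumptions): it simulates a system with process noise driven by a white-noise input, estimates the cross-correlation matrix in equation (25) by sample averages, and inspects its smallest singular value, which should stay bounded away from zero when the critical rank condition holds.

```python
# Toy check of the critical rank condition (25) under a white-noise input.
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0.6, 0.1], [0.0, 0.4]])  # assumed stable second-order system
B = np.array([1.0, 1.0])
C = np.array([1.0, 0.5])
K = np.array([0.5, 0.2])                # process noise enters via the innovations
n, a, b, N = 2, 2, 2, 50000

u = rng.standard_normal(N + a + b)
e = rng.standard_normal(N + a + b)
x = np.zeros((N + a + b + 1, n))
y = np.empty(N + a + b)
for t in range(N + a + b):
    y[t] = C @ x[t] + e[t]
    x[t + 1] = A @ x[t] + B * u[t] + K * e[t]

# sample estimate of E{ [x(t+b); u_a(t+b)] [y_b(t); u_b(t); u_a(t+b)]^T }
M = np.zeros((n + a, 2 * b + a))
for t in range(N):
    z1 = np.concatenate([x[t + b], u[t + b:t + b + a]])
    z2 = np.concatenate([y[t:t + b], u[t:t + b], u[t + b:t + b + a]])
    M += np.outer(z1, z2) / N

s = np.linalg.svd(M, compute_uv=False)
print(M.shape, s.min())  # full row rank n + a*m = 4 means s.min() well above zero
```

With a white input, the lower-left blocks of the limit matrix vanish exactly as in equation (36), and the smallest singular value is governed by the reachability-type condition (35); for the counterexample of Section 6, by contrast, the corresponding matrix loses rank in the limit.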

5.3. Scalar ARMA input signal Finally, we give a consistency result for the case when u(t) is a scalar ARMA signal.

that an AR-input of arbitrarily high order is allowed if a is chosen large enough.

¹heorem 11. Assume that the scalar (m"1) input u(t) to the system in equation (2) is generated by the ARMA model

6. A COUNTEREXAMPLE

F(q\)u(t)"G(q\)e(t), where e(t) is a white noise process. Here, F(z) and G(z) are relatively prime polynomials of degree n and n , respectively, and they have all zeros $ % outside the unit circle. Furthermore, assume that A1–A4 hold, and (1) the weighting matrices W and W are nonP A singular, (2) (A, B) is reachable, (3) b5n, (4) n 4b!n, % (5) n 4a#b!n. $ Given the above assumptions, the IV-4SID subspace estimate CK in equation (12) is consistent. ?

Proof. Study the last two block columns of equation (25). Clearly, if (recall that m = 1)

$$\operatorname{rank}\, E\left\{\begin{bmatrix} x(t+b) \\ u_a(t+b) \end{bmatrix} \begin{bmatrix} u_b^T(t) & u_a^T(t+b) \end{bmatrix}\right\} = n + a \qquad (37)$$

holds, then the critical relation (25) is satisfied. Using the notation introduced in equation (31), we can write the correlation matrix in equation (37) as

$$S\, E\left\{\frac{1}{a(q^{-1})}\, u_{a+b}(t+b-n)\, u_{n+a}^T(t)\right\}, \qquad (38)$$

where S is (n+a)×(n+a) and non-singular since (A, B) is reachable (see Appendix A). Lemma A3.8 in Söderström and Stoica (1983) shows that the correlation matrix appearing in equation (38) has full row rank if

$$\max(a + n + n_G,\; n + n_F) \le a + b,$$

which can easily be transformed into the conditions of the theorem.

Notice that the theorem holds for arbitrarily colored noise, as long as it is uncorrelated with the input. With some more effort, the theorem can be extended to the multi-input case by modifying Lemma A3.8 in Söderström and Stoica (1983); for example, the theorem holds if the different inputs are independent ARMA signals. The result of the theorem is very interesting, since it quantifies the statement that "for large enough b, the matrix in equation (27) is full rank".

6. A COUNTEREXAMPLE

The purpose of this section is to present an example of a system for which the IV-4SID methods are not consistent. We have shown that a necessary condition for consistency is the validity of equation (25). In the discussion following equation (27), it was also pointed out that consistency problems may occur for systems with process noise. In this section, we give an explicit example for which the rank drops when the two terms in equation (27) are added.

Consider the cross-correlation matrix appearing in equation (25), which with obvious notation reads

$$R = \begin{bmatrix} R_{x^d y^d_b} + R_{x^s y^s_b} & R_{x^d u_b} & R_{x^d u_a} \\ R_{u_a y^d_b} & R_{u_a u_b} & R_{u_a u_a} \end{bmatrix}.$$

The construction of a system for which this matrix loses rank is divided into two major steps:

(1) Find a vector v such that

$$v^T \begin{bmatrix} R_{x^d u_b} & R_{x^d u_a} \\ R_{u_a u_b} & R_{u_a u_a} \end{bmatrix} = 0.$$

(2) Design R_{x^s y^s_b} such that

$$v^T \begin{bmatrix} R_{x^d y^d_b} + R_{x^s y^s_b} \\ R_{u_a y^d_b} \end{bmatrix} = 0. \qquad (39)$$
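Why these two steps suffice can be spelled out explicitly; the following short verification uses only the block partitioning above. Together, the two steps exhibit a non-trivial left null vector of the full cross-correlation matrix:

```latex
v^T R \;=\;
\begin{bmatrix}
v^T \begin{bmatrix} R_{x^d y^d_b} + R_{x^s y^s_b} \\ R_{u_a y^d_b} \end{bmatrix}
&
v^T \begin{bmatrix} R_{x^d u_b} & R_{x^d u_a} \\ R_{u_a u_b} & R_{u_a u_a} \end{bmatrix}
\end{bmatrix}
\;=\;
\begin{bmatrix} 0 & 0 \end{bmatrix},
```

so that R has rank strictly less than n + a, and the necessary condition (25) is violated.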

The first step turns out to be similar to a construction used in the analysis of instrumental variable methods (see Söderström and Stoica, 1981, 1983, and the references therein). Once a solution to the first step is found, the second step is a matter of investigating whether or not a stochastic subsystem exists that generates a valid R_{x^s y^s_b}.

Let us study the first step in more detail. For notational convenience, define the following covariance matrix:

$$P_{g\times l}(t_1, t_2) = E\left\{\frac{1}{a(q^{-1})}\, u_g(t - t_1)\, u_l^T(t - t_2)\right\}.$$

The correlation matrix of interest in the first step can then be written as

$$\begin{bmatrix} R_{x^d u_b} & R_{x^d u_a} \\ R_{u_a u_b} & R_{u_a u_a} \end{bmatrix} = S\, P_{(a+n)\times(a+b)}(n-b,\, 0),$$

where S is defined in equation (31); here, however, it is of dimension (n+am)×((a+n)m). The rank properties of P_{g×l}(0, 0) have been studied in Söderström and Stoica (1983): P is full rank either if a(q^{-1}) is positive real and u(t) is persistently exciting, or if u(t) is an ARMA process of sufficiently low order (e.g., white noise; cf. Theorem 11).

Here, we consider an a(q^{-1}) that is not positive real, to make P rank deficient. This implies that n ≥ 2. Let us fix the following parameters: n = 2, m = 1, l = 1, b = 2 and a = 3. If (A, B) is reachable, then S is non-singular (see Appendix A). The first step can then be solved if there exists a vector g such that g^T P_{5×5}(0, 0) = 0. A solution can be found by using the same setup as in the counterexample for an instrumental variable method given in Söderström and Stoica (1983). Choose a pole polynomial and an input process according to:

$$a(q^{-1}) = (1 - cq^{-1})^2, \qquad (40)$$

$$u(t) = (1 - cq^{-1})^2 (1 + cq^{-1})^2 e(t), \qquad (41)$$

where e(t) is a white noise process of unit variance. Notice that c should have a modulus less than one for stability, and that the input is persistently exciting of any order. Now, a straightforward calculation leads to

$$P_{5\times 5}(0,0) = \begin{bmatrix} 1-2c^4 & -4c^3 & c^6-2c^2 & 2c^5 & c^4 \\ 2c & 1-2c^4 & -4c^3 & c^6-2c^2 & 2c^5 \\ c^2 & 2c & 1-2c^4 & -4c^3 & c^6-2c^2 \\ 0 & c^2 & 2c & 1-2c^4 & -4c^3 \\ 0 & 0 & c^2 & 2c & 1-2c^4 \end{bmatrix}.$$

This matrix is singular if c is a solution to

$$\det(P_{5\times 5}(0,0)) = 1 + 4c^4 + 10c^8 - 8c^{12} - 15c^{16} - 12c^{20} = 0. \qquad (42)$$

The polynomial has two real roots, c ≈ ±0.91837. Choose, for example, the positive root for c, and g as the left eigenvector corresponding to the zero eigenvalue of P. This provides a solution to the first step (namely, v^T = g^T S^{-1}). The solution depends on the matrix B. Until now, we have fixed the poles of the system and a specific input process.

Remark 12. Observe that a subspace method that only uses the inputs U_b as instrumental variables (cf. equation (11)) is not consistent for systems satisfying the first step. In other words, the past input MOESP method (see Verhaegen, 1994) is not consistent if the poles and the input process are related as in equations (40) and (41) and if c is a solution to equation (42).†

† Note that condition (4) of Theorem 11 is violated, since b = n and n_G = 4.

The free parameters in the second step are B, C, D, and those specifying the stochastic subsystem, with constraints on minimality of the system description. Somewhat arbitrarily,‡ we make the choices B = [1 −2]^T, C = [2 −1] and D = 0. Consider the following description of the system in state-space form:

$$x(t+1) = \begin{bmatrix} 2c & -c^2 \\ 1 & 0 \end{bmatrix} x(t) + \begin{bmatrix} 1 \\ -2 \end{bmatrix} u(t) + \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} e(t), \qquad (43a)$$

$$y(t) = \begin{bmatrix} 2 & -1 \end{bmatrix} x(t) + e(t), \qquad (43b)$$

where the variance of the innovation process e(t) in equation (43) is r_e (this noise should not be confused with the unit-variance white noise driving the input in equation (41)). With the choices made, the system is observable and reachable from u(t). For a given c, equation (39) now consists of two non-linear equations in k_1, k_2 and r_e. The equations are, however, linear in k_1 r_e, k_2 r_e, k_1^2 r_e, k_2^2 r_e and k_1 k_2 r_e. If we change variables according to

$$k_{rel} = \frac{k_1}{k_2}, \qquad g_1 = k_2 r_e, \qquad g_2 = k_2^2 r_e,$$

then the two non-linear equations become linear in g_1 and g_2. Thus, given c and k_rel, we can easily solve for g_1 and g_2. Observe that g_2 must be positive to be a valid solution. A plot of g_2 as a function of c and k_rel is shown in Fig. 1. The solution g_2 is positive when k_rel lies between ≈0.7955 and ≈0.8475 (g_2 is discontinuous at the latter point). Let us, for example, choose the solution corresponding to k_rel = 0.825. The parameters become: c ≈ 0.9184, r_e ≈ 217.1, k_1 ≈ −0.1542 and k_2 ≈ −0.1869. Thus, if the input (41) is applied to the system (43), and if a = 3 and b = 2, then the matrix in equation (25) is rank deficient.

‡ For some choices of B, C and D we were not able to find a solution to the second step. For other choices of C, making the system "less observable", we found solutions with lower noise variance.

Fig. 1. The graph shows g_2 as a function of c and k_rel.

We conclude this section with some short comments about the counterexample. Indeed, the example presented in this section is very special. However, the existence of examples of this kind indicates that the subspace methods may perform very poorly in certain situations. As should be clear from the derivations in this paper, the consistency of 4SID methods is related to the consistency of some instrumental variable schemes (see Söderström and Stoica, 1981, 1983, and the

references therein). When using 4SID methods, it is a good idea to estimate a set of models for different choices of a and b. These models should then be validated separately to see which gives the best result. This reduces the performance degradation due to a bad choice of a and b. In the counterexample presented above, such a procedure would probably have led to another choice of the dimension parameters (given that the system order is known). While this may lead to consistent estimates, the accuracy may still be poor (as demonstrated in the next section).

7. NUMERICAL EXAMPLES

In this section, the previously obtained results are illustrated by means of numerical examples. First, the counterexample given in the previous section is simulated. Thus, identification data are generated according to equations (41) and (43). The extended observability matrix is estimated by equation (12), using the weighting matrices for the CVA method (see Table 1); similar results are obtained with other choices of the weighting matrices. The system matrix is then estimated as

$$\hat{A} = \hat{\Gamma}_1^{\dagger}\, \hat{\Gamma}_2,$$

where (·)^† denotes the Moore–Penrose pseudo-inverse, Γ̂_1 denotes the matrix made of the first a−1 rows and Γ̂_2 contains the last a−1 rows of Γ̂_a. To measure the estimation accuracy, we consider the estimated system poles (i.e., the eigenvalues of Â). Recall that the system (43) has two poles at c = 0.9184. Table 3 shows the absolute value of the bias and the standard deviation of the pole estimates. The sample statistics are based on 1000 independent trials; each trial consists of 1000 input–output data samples. The results for two different choices of a and b are given in the table. The first corresponds to the choice made in the counterexample; it is clearly seen that the estimates are not useful. For the second choice (a = 6, b = 5), it can be shown that the algorithm is consistent. This is indicated by the much more reasonable results in Table 3. The accuracy, however, is still very poor, since the matrix in equation (25) is nearly rank deficient.

Table 3. The figures shown in the table are the absolute values of the bias and, within brackets, the standard deviations.

             a = 3, b = 2        a = 6, b = 5
Pole 1       5.9572 (69.8826)    0.4704 (0.4806)
Pole 2       2.6049 (30.9250)    0.1770 (0.3526)

Next, we study the influence of the stochastic sub-system on the estimation accuracy. Recall that in Section 6 the stochastics of the system were designed to cause a cancellation in equation (27). The zeros of the stochastic sub-system are given by the eigenvalues of A − KC, which become, approximately, 0.9791 ± i0.1045. In the next simulation, the locations of these zeros are changed such that they have the same distance to the unit circle, but the angle relative to the real axis is increased five times. This is accomplished by changing K in equation (43) to [−0.2100, −0.5590]^T; the zeros then become 0.8489 ± i0.4990. The simulation results are presented in Table 4. It can be seen that the estimation accuracy is more reasonable in this case.

Table 4. Same as Table 3, but for the system with K = [−0.2100, −0.5590]^T.

             a = 3, b = 2        a = 6, b = 5
Pole 1       0.1161 (0.2405)     0.0316 (0.0452)
Pole 2       0.2266 (0.5043)     0.0282 (0.0290)

Finally, we change the input to be realizations of a zero-mean white noise process. The power of the white noise input is chosen to be equal to the power of the colored process in equation (41). Except for the input, the parameters are the same as in the counterexample. Recall Theorem 10, which shows that the estimates are consistent in this case. Table 5 presents the results of the simulations; a comparison with the results of Table 3 clearly shows a great difference.

Table 5. Same as Table 3, but with a white noise input signal.

             a = 3, b = 2        a = 6, b = 5
Pole 1       0.0464 (0.1146)     0.0184 (0.0271)
Pole 2       0.0572 (0.0523)     0.0182 (0.0171)
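Several of the figures quoted in this and the previous section can be reproduced directly. The sketch below is a numerical illustration, not part of the original simulations; it assumes the factorized forms a(q^{-1}) = (1 − cq^{-1})² and u(t) = (1 − cq^{-1})²(1 + cq^{-1})² e(t) of equations (40) and (41), so that the filtered input u(t)/a(q^{-1}) = (1 + cq^{-1})² e(t) is a finite moving average. It checks the singularity condition (42) and the zeros of the stochastic sub-system for both gain vectors:

```python
import numpy as np

def P55(c):
    # Impulse responses with respect to the unit-variance noise driving the input:
    # filtered input u/a = (1 + c q^-1)^2 e(t), raw input u = (1 - c^2 q^-2)^2 e(t)
    f = np.array([1.0, 2 * c, c**2])
    g = np.array([1.0, 0.0, -2 * c**2, 0.0, c**4])
    # Cross-covariance rho(tau) = E{(u/a)(t) u(t - tau)} = sum_k f[k] g[k - tau]
    rho = {tau: sum(f[k] * g[k - tau]
                    for k in range(len(f)) if 0 <= k - tau < len(g))
           for tau in range(-4, 5)}
    # Toeplitz matrix with entry (i, j) = rho(i - j)
    return np.array([[rho[i - j] for j in range(5)] for i in range(5)])

def det42(c):
    # Right-hand side of equation (42)
    return 1 + 4 * c**4 + 10 * c**8 - 8 * c**12 - 15 * c**16 - 12 * c**20

print(np.linalg.det(P55(0.91837)), det42(0.91837))   # both approximately zero

# Zeros of the stochastic sub-system: eigenvalues of A - K C for the system (43)
c = 0.9184
A = np.array([[2 * c, -c**2], [1.0, 0.0]])
C = np.array([[2.0, -1.0]])
K1 = np.array([[-0.1542], [-0.1869]])   # gain from the counterexample
K2 = np.array([[-0.2100], [-0.5590]])   # modified gain used for Table 4
z1 = np.linalg.eigvals(A - K1 @ C)      # approx. 0.9791 +/- 0.1045i
z2 = np.linalg.eigvals(A - K2 @ C)      # approx. 0.8489 +/- 0.4990i
print(np.abs(z1), np.abs(z2))           # equal moduli: same distance to unit circle
```

For both gains the zeros have modulus ≈0.985, while the angle to the real axis is roughly five times larger for the modified gain, matching the description above.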

8. CONCLUSIONS

Conditions that ensure consistency of the subspace estimate used in subspace system identification methods have been given. For systems without process noise, a persistence of excitation condition on the input signal was obtained. With process noise, a persistence of excitation condition alone is not sufficient. In fact, an example was given of a system for which subspace methods are not consistent. The system is reachable from the input and the input is persistently exciting of any order. The subspace methods fail, however, to provide a consistent model of the system transfer function. It was also shown that a white noise input or an ARMA input signal of sufficiently low order guarantees consistency under weak conditions. This shows that the performance of subspace methods may be sensitive to the input excitation.

Acknowledgements—The authors are very grateful to Professor Petre Stoica for fruitful discussions, suggestions and comments.

REFERENCES

Anderson, B. D. O. and E. I. Jury (1976). Generalized Bezoutian and Sylvester matrices in multivariable linear control. IEEE Trans. Automat. Control, AC-21, 551–556.
Bauer, D., M. Deistler and W. Scherrer (1997). The analysis of the asymptotic variance of subspace algorithms. In Proc. 11th IFAC Symp. on System Identification, Fukuoka, Japan, pp. 1087–1091.
Bitmead, R. R., S. Y. Kung, B. D. O. Anderson and T. Kailath (1978). Greatest common divisors via generalized Sylvester and Bezout matrices. IEEE Trans. Automat. Control, 23(6), 1043–1047.

De Moor, B. and J. Vandewalle (1987). A geometrical strategy for the identification of state space models of linear multivariable systems with singular value decomposition. In Proc. 3rd Int. Symp. in Applications of Multivariable System Techniques, Plymouth, U.K., pp. 59–69.
De Moor, B., J. Vandewalle, L. Vandenberghe and P. Van Mieghem (1988). A geometrical strategy for the identification of state space models of linear multivariable systems with singular value decomposition. In Proc. IFAC 88, Beijing, China, pp. 700–704.
Gopinath, B. (1969). On the identification of linear time-invariant systems from input–output data. Bell System Tech. J., 48(5), 1101–1113.
Jansson, M. and B. Wahlberg (1995). On weighting in state-space subspace system identification. In Proc. European Control Conf. ECC'95, Roma, Italy, pp. 435–440.
Jansson, M. and B. Wahlberg (1996a). A linear regression approach to state-space subspace system identification. Signal Processing, 52(2), 103–129.
Jansson, M. and B. Wahlberg (1996b). On consistency of subspace system identification methods. In Preprints of 13th IFAC World Congress, San Francisco, California, pp. 181–186.
Jansson, M. and B. Wahlberg (1997). Counterexample to general consistency of subspace system identification methods. In Proc. 11th IFAC Symp. on System Identification, Fukuoka, Japan, pp. 1677–1682.
Larimore, W. E. (1990). Canonical variate analysis in identification, filtering and adaptive control. In Proc. 29th CDC, Honolulu, Hawaii, pp. 596–604.
Liu, K. (1992). Identification of multi-input and multi-output systems by observability range space extraction. In Proc. CDC, Tucson, AZ, pp. 915–920.
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ.
Peternell, K., W. Scherrer and M. Deistler (1996). Statistical analysis of novel subspace identification methods. Signal Processing, Special Issue on Subspace Methods, Part II: System Identification, 52(2), 161–177.
Söderström, T. and P. Stoica (1981). Comparison of some instrumental variable methods—consistency and accuracy aspects. Automatica, 17(1), 101–115.
Söderström, T. and P. G. Stoica (1983). Instrumental Variable Methods for System Identification. Springer, Berlin.
Söderström, T. and P. Stoica (1989). System Identification. Prentice-Hall International, Hemel Hempstead, U.K.
Van Overschee, P. and B. De Moor (1994). N4SID: subspace algorithms for the identification of combined deterministic-stochastic systems. Automatica, Special Issue on Statistical Signal Processing and Control, 30(1), 75–93.
Van Overschee, P. and B. De Moor (1995a). Choice of state-space basis in combined deterministic-stochastic subspace identification. Automatica, Special Issue on Trends in System Identification, 31(12), 1877–1883.
Van Overschee, P. and B. De Moor (1995b). A unifying theorem for three subspace system identification algorithms. Automatica, Special Issue on Trends in System Identification, 31(12), 1853–1864.
Verhaegen, M. (1991). A novel non-iterative MIMO state space model identification technique. In Proc. 9th IFAC/IFORS Symp. on Identification and System Parameter Estimation, Budapest, pp. 1453–1458.
Verhaegen, M. (1993). Subspace model identification. Part III: analysis of the ordinary output-error state space model identification algorithm. Int. J. Control, 58, 555–586.
Verhaegen, M. (1994). Identification of the deterministic part of MIMO state space models given in innovations form from input–output data. Automatica, Special Issue on Statistical Signal Processing and Control, 30(1), 61–74.
Verhaegen, M. and P. Dewilde (1992). Subspace model identification. Part I: the output-error state-space model identification class of algorithms. Int. J. Control, 56, 1187–1210.
Viberg, M. (1995). Subspace-based methods for the identification of linear time-invariant systems. Automatica, Special Issue on Trends in System Identification, 31(12), 1835–1851.
Viberg, M., B. Ottersten, B. Wahlberg and L. Ljung (1991). A statistical perspective on state-space modeling using subspace methods. In Proc. 30th IEEE Conf. on Decision & Control, Brighton, U.K., pp. 1337–1342.
Viberg, M., B. Wahlberg and B. Ottersten (1997). Analysis of state space system identification methods based on instrumental variables and subspace fitting. Automatica, 33(9), 1603–1616.
Wahlberg, B. and M. Jansson (1994). 4SID linear regression. In Proc. IEEE 33rd Conf. on Decision and Control, Orlando, U.S.A., pp. 2858–2863.


APPENDIX A—RANK OF COEFFICIENT MATRIX

Lemma 13. Let A be an n×n matrix with all eigenvalues inside the unit circle and let B be an n×m matrix. Define the matrix polynomial F^u(q^{-1}) as in equation (18). The coefficient matrix is then [F^u_n ⋯ F^u_1], and it has the same rank as the reachability matrix of (A, B).

Proof. The result follows from a simple construction. Let a(q^{-1}) be as in equation (19); then,

$$F^u(q^{-1}) = \frac{\operatorname{Adj}\{qI - A\}\,B}{q^n} = a(q^{-1})(qI - A)^{-1}B = (a_0 + a_1 q^{-1} + \cdots + a_n q^{-n})\, q^{-1}(I + q^{-1}A + q^{-2}A^2 + \cdots)B.$$

Comparison of coefficients leads to

$$\begin{bmatrix} F^u_n & \cdots & F^u_1 \end{bmatrix} = \begin{bmatrix} A^{n-1}B & A^{n-2}B & \cdots & B \end{bmatrix} \begin{bmatrix} a_0 I_m & 0 & \cdots & 0 \\ a_1 I_m & a_0 I_m & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ a_{n-1} I_m & \cdots & a_1 I_m & a_0 I_m \end{bmatrix},$$

where I_m is an m×m identity matrix. The last factor is clearly non-singular (since a_0 = 1), so the lemma is proven.

The lemma shows that the first block row of S, as defined in equation (20), has full row rank if (A, [B K]) is reachable. This implies that S has full row rank (since a_0 = 1).

APPENDIX B—RANK OF SYLVESTER MATRIX

Consider the Sylvester matrix S_y in equation (33), generalized to the multi-input case. This means that every a_i is replaced by a_i I_m and that every F^y_i is now a matrix of dimension l×m. The dimension of S_y becomes ((a+b)m + bl)×((a+b+n)m). Observe that the relation (32) holds. Define

$$z = \begin{bmatrix} y^d_b(t) \\ u_b(t) \\ u_a(t+b) \end{bmatrix}$$

and express z in the following two equivalent ways:

$$z = S_y\, \frac{1}{a(q^{-1})}\, u_{a+b+n}(t-n)$$

and

$$z = \begin{bmatrix} \Gamma_b & \Phi_b & 0 \\ 0 & I_{bm} & 0 \\ 0 & 0 & I_{am} \end{bmatrix} \begin{bmatrix} x^d(t) \\ u_b(t) \\ u_a(t+b) \end{bmatrix} =: N \begin{bmatrix} x^d(t) \\ u_b(t) \\ u_a(t+b) \end{bmatrix}.$$

Next, study

$$E\{zz^T\} = S_y\, E\left\{\frac{1}{a(q^{-1})} u_{a+b+n}(t-n)\, \frac{1}{a(q^{-1})} u^T_{a+b+n}(t-n)\right\} S_y^T \qquad (B.1)$$

$$= N\, E\left\{\begin{bmatrix} x^d(t) \\ u_b(t) \\ u_a(t+b) \end{bmatrix} \begin{bmatrix} x^d(t) \\ u_b(t) \\ u_a(t+b) \end{bmatrix}^T\right\} N^T. \qquad (B.2)$$

From the proof of Theorem 6, we know that the covariance matrix in the middle of equation (B.2) is positive definite if u(t) is at least PE(a+b+n) and if (A, B) is reachable. If rank{Γ_b} = n, then N has full column rank and it follows that

$$\operatorname{rank}\{E\{zz^T\}\} = n + (a+b)m. \qquad (B.3)$$

Furthermore, if u(t) is PE(a+b+n), then

$$E\left\{\frac{1}{a(q^{-1})} u_{a+b+n}(t-n)\, \frac{1}{a(q^{-1})} u^T_{a+b+n}(t-n)\right\}$$

is positive definite (Söderström and Stoica, 1983). (Notice that, throughout the paper, we assume that the system is stable; this assumption implies that a(z) has all roots outside the unit circle.) Comparing equations (B.3) and (B.1), it can be concluded that

$$\operatorname{rank}\{S_y\} = n + (a+b)m.$$

This result holds if (A, B) is reachable, if (A, C) is observable, and if b is greater than or equal to the observability index. The observability and reachability assumptions are equivalent to requiring that the polynomials F^y(z) and a(z) do not share any common zeros.
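Lemma 13 can also be illustrated numerically: build the coefficient matrices F^u_k from a(q^{-1}) and the Markov parameters A^{k-1-i}B, and compare ranks with the reachability matrix. The following is a minimal sketch using a randomly generated stable (A, B) pair, not an example from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
A = rng.standard_normal((n, n))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))      # place all eigenvalues inside the unit circle
B = rng.standard_normal((n, m))

a = np.poly(A)                                  # a_0, ..., a_n of det(qI - A), with a_0 = 1
# F^u(q^-1) = a(q^-1)(qI - A)^{-1} B  =>  F^u_k = sum_{i=0}^{k-1} a_i A^{k-1-i} B
F = [sum(a[i] * np.linalg.matrix_power(A, k - 1 - i) @ B for i in range(k))
     for k in range(1, n + 1)]

coeff = np.hstack(F[::-1])                      # [F^u_n ... F^u_1]
reach = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(n)])
print(np.linalg.matrix_rank(coeff), np.linalg.matrix_rank(reach))   # equal ranks

# Cayley-Hamilton: the coefficient of q^{-(n+1)} vanishes, so F^u has degree n
F_next = sum(a[i] * np.linalg.matrix_power(A, n - i) @ B for i in range(n + 1))
print(np.linalg.norm(F_next))                   # numerically zero
```

The equality of the two ranks is exactly the statement of the lemma; it holds because the lower-triangular block-Toeplitz factor above is non-singular (a_0 = 1).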