Automatica 71 (2016) 118–124

Brief paper

Linear approximation and identification of MIMO Wiener–Hammerstein systems✩

Tohru Katayama a,b, Hajime Ase c

a Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
b Kyoto University, Japan
c Mathematical Technology Institute, 1-64-14 Shibasaki, Chofu, Tokyo 182-0014, Japan

Article history: Received 13 November 2014; Received in revised form 3 March 2016; Accepted 13 April 2016.

Keywords: Wiener–Hammerstein system; Linear approximation; Orthogonal projection; Subspace identification; Basis function expansion; Separable least-squares

Abstract

This paper considers the linear approximation and identification of multi-input multi-output (MIMO) Wiener–Hammerstein systems, or LNL systems. Evaluating the input–output cross-covariance matrix of the MIMO LNL system for Gaussian inputs, we show that the best linear approximation of the MIMO LNL system in the mean square sense can be obtained by the orthogonal projection (ORT) subspace identification method. For each allocation of the poles of the best linear approximation between the two linear subsystems, the unknown parameters in the numerators of the linear subsystems and the coefficients of a basis function expansion of the nonlinearity are estimated by applying the separable least-squares. The best LNL system is the one that gives the minimum mean square output error. A numerical example is included to show the feasibility of the present approach.

1. Introduction

The classical result due to Bussgang (1952), which exactly computes the input–output correlation function of a nonlinear system, has been applied to the identification of single-input single-output (SISO) Wiener–Hammerstein systems, or LNL systems (Billings & Fakhouri, 1982; Hunter & Korenberg, 1986), where a static nonlinearity (N) is sandwiched between two linear (L) subsystems. For recent developments in the identification of LNL systems, see Greblicki (2012), Mu and Chen (2014), and Schoukens, Pintelon, and Enqvist (2008). Also, the result of Bussgang (1952) has been employed to develop the multivariable output-error state space (MOESP)-based methods for identifying multi-input multi-output (MIMO) Wiener and Hammerstein systems (Verhaegen & Westwick, 1996; Westwick & Verhaegen, 1996). The linear approximation problems for SISO nonlinear finite impulse response (NFIR) systems, which include LNL systems, have been studied extensively using the classical Wiener theory (Enqvist & Ljung, 2005),

✩ The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Andrea Garulli under the direction of Editor Torsten Söderström. E-mail addresses: [email protected] (T. Katayama), [email protected] (H. Ase).

http://dx.doi.org/10.1016/j.automatica.2016.04.040

extending the results of Bussgang (1952). Moreover, a partitioning approach to identifying an SISO Wiener–Hammerstein system has been developed by using its best linear approximation (Sjöberg, Lauwers, & Schoukens, 2012; Sjöberg & Schoukens, 2012), wherein the consistency of the initialization algorithm is shown. In Ase and Katayama (2015), motivated by the subspace-based approaches (Verhaegen & Westwick, 1996; Westwick & Verhaegen, 1996) and by the partitioning method (Sjöberg & Schoukens, 2012), we have presented a subspace-based method for identifying the Wiener–Hammerstein benchmark model by using the orthogonal projection (ORT) subspace method (Picci & Katayama, 1996) and the separable least-squares (Golub & Pereyra, 1973).

In this paper, we deal with the linear approximation and identification of MIMO Wiener–Hammerstein systems, extending the SISO results of Ase and Katayama (2015) to MIMO systems. We first derive the input–output cross-covariance matrix of an MIMO LNL system for Gaussian inputs, from which we show that the best linear approximation, or the best linear model, of the LNL system in the mean square sense is obtained by replacing the static nonlinearity with the equivalent gain matrix. It is also shown that the best linear model is consistently identified using the ORT subspace method. To identify the LNL system, the poles of the best linear model are then allocated between the two linear subsystems by

using a state transformation (Ase & Katayama, 2015). For each realizable allocation of the poles, we estimate the unknown system parameters and the coefficients of a basis function expansion of the nonlinearity by the separable least-squares, which is solved using a gradient-based optimization method (Wills & Ninness, 2008); the best LNL configuration is selected based on the mean square output error.

The paper is organized as follows. Section 2 computes the input–output cross-covariance matrix of MIMO LNL systems. In Section 3, we consider the best linear approximation of the LNL system from the point of view of the orthogonal projection. The identification procedure of the best linear model by the ORT method is discussed in Section 4, and Section 5 explains a method of allocating the poles of the best linear model between the two linear subsystems. Based on a basis function expansion of the nonlinearity, we outline a method of identifying MIMO LNL systems in Section 6, and a feasibility study is included in Section 7. Section 8 concludes this paper.

Notation: E{·} denotes the mathematical expectation, and Ê{· | ·} the orthogonal projection. Two random vectors a and b are said to be orthogonal, or uncorrelated, if E{ab^T} = 0, which is expressed as a ⊥ b.

2. Input–output cross-covariance matrix

We consider the LNL system shown in Fig. 1, where G_1(z) and G_2(z) are linear subsystems, and f : R^r → R^q is a static nonlinearity. Also, u(t) ∈ R^m is the input, v(t) ∈ R^r is the output of G_2(z), w(t) ∈ R^q is the output of the nonlinearity and the input to G_1(z), y_0(t) ∈ R^p is the noise-free output, y(t) ∈ R^p is the output, and ν(t) ∈ R^p is the output noise. To ensure that all the variables are second-order stationary random processes, we assume the following.

[Fig. 1. LNL system.]

Assumption 1. (i) The input u(t) is a zero mean stationary Gaussian process with a finite covariance matrix, and ν(t) is a zero mean white noise sequence. (ii) The linear subsystems G_1(z) and G_2(z) are stable, and there is no pole-zero cancellation between them (Anderson & Gevers, 1981). (iii) The nonlinearity is a measurable function, the variance of the output of the nonlinearity is bounded, i.e. E{|f_{ij}(v)|^2} < ∞, i = 1, ..., q; j = 1, ..., r, and each element of the nonlinearity is well approximated by a basis function expansion (Ljung, 1999).

The input–output cross-covariance matrix is defined by R_{yu}(\tau) = E{y(t)u^T(t − \tau)}, \tau = 0, ±1, ..., and the other covariance matrices are defined similarly. Let G_i^{(k)} be the impulse response matrices of G_i(z), i.e.

G_i(z) = \sum_{k=0}^{\infty} G_i^{(k)} z^{-k},   i = 1, 2.

Under Assumption 1, we have the following result, an MIMO extension of Bussgang's result (Bussgang, 1952).

Proposition 1. The input–output cross-covariance matrix of the LNL system of Fig. 1 is given by

R_{yu}(\tau) = \sum_{k=0}^{\infty} \sum_{j=0}^{\infty} G_1^{(k)} F^e G_2^{(j)} R_{uu}(\tau − j − k)    (1)

where R_{uu}(·) is the covariance matrix of the input u(t), and F^e is the equivalent gain matrix of the nonlinearity, defined by F^e := E{f(v(t))v^T(t)} R_{vv}^{-1}(0) ∈ R^{q×r} (Roberts & Spanos, 1990), where R_{vv}(0) is the covariance matrix of v(t).

Proof. See Appendix. □
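The equivalent gain lends itself to a direct sample-based check. The following sketch is our illustration, not part of the paper: it draws a stationary Gaussian v(t), passes it through a hypothetical static nonlinearity f (the diagonal map of Section 7 is borrowed for concreteness), and forms the sample version of F^e = E{f(v(t))v^T(t)} R_{vv}^{-1}(0).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(v):
    # hypothetical diagonal nonlinearity f: R^2 -> R^2, cf. (21) in Section 7
    return np.stack([0.4444 * v[0] ** 3, 1.5 * v[1] - 0.3 * v[1] ** 3])

# zero mean Gaussian input to the nonlinearity, v(t) in R^2
N = 100_000
V = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 0.5]], size=N).T  # 2 x N
W = f(V)                                                                 # 2 x N

Rwv = W @ V.T / N                # sample E{f(v) v^T}
Rvv = V @ V.T / N                # sample R_vv(0)
Fe = Rwv @ np.linalg.inv(Rvv)    # equivalent gain matrix F^e (q x r)
```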



In the next section, we derive a useful orthogonal projection result from (1).

3. Linear approximation of LNL system

Let U_t^- := \overline{span}{u(t), u(t−1), ...} be the linear subspace spanned by the past inputs, where the over-bar denotes the closure in mean square, and let Ê{· | U_t^-} denote the orthogonal projection onto U_t^-. By the definition of the covariance matrix, we see that (1) is equivalent to the condition

y(t) − \sum_{j=0}^{\infty} G_1^{(j)} F^e \sum_{k=0}^{\infty} G_2^{(k)} u(t − j − k) ⊥ u(t − \tau)

for \tau = 0, ±1, .... Define

y_d(t) = \sum_{j=0}^{\infty} G_1^{(j)} F^e \sum_{k=0}^{\infty} G_2^{(k)} u(t − j − k).    (2)

Then, y_d(t) satisfies y(t) − y_d(t) ⊥ U_t^- and y_d(t) ∈ U_t^-, implying that y_d(t) is the orthogonal projection of y(t) onto the subspace U_t^-, i.e. y_d(t) = Ê{y(t) | U_t^-}. It follows from (2) that y_d(t) = G_1(z)F^e G_2(z)u(t). By the property of orthogonal projection, we see that y_d(t) is the linear least-squares estimate of y(t) given the past inputs (Enqvist & Ljung, 2005). Hence,

G_d(z) := G_1(z) F^e G_2(z)    (3)

is the best linear approximation, or the best linear model, of the LNL system. The best linear model is also called the deterministic system.

Remark 1. The best linear model G_d(z) minimizes the effect of the nonlinearity in the mean square sense (Enqvist & Ljung, 2005; Schoukens, Pintelon, Dobrowiecki, & Rolain, 2005), so that it depends on the statistics of the input to the nonlinearity. In fact, G_d(z) is the transfer matrix obtained by replacing the static nonlinearity f(·) in Fig. 1 with the equivalent gain matrix F^e.

Remark 2. Since there is no pole-zero cancellation in the right-hand side of (3) by Assumption 1(ii), the poles of G_d(z) consist of the poles of G_1(z) together with those of G_2(z). This fact will be utilized for partitioning the poles of the best linear model between the two linear subsystems.¹

¹ We can also partition the zeros of G_d(z) if the inverse G_d^{-1}(z) exists, i.e. if the matrix D_1 F^e D_2 of (7) is nonsingular. In this paper, however, we do not consider the partitioning of zeros, since the case where G_d(z) is invertible seems very rare in practice.
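To make Remark 1 concrete in the scalar case, consider the following worked example (ours, not in the original). For f(v) = c v^3 with v ~ N(0, σ_v^2), the Gaussian moment E{v^4} = 3σ_v^4 gives

F^e = E{f(v)v} / σ_v^2 = c · 3σ_v^4 / σ_v^2 = 3cσ_v^2,

so the best linear model replaces the cubic block by a gain proportional to the input power, which is exactly the sense in which G_d(z) depends on the input statistics.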


Define G_d^{(l)} = \sum_{i=0}^{l} G_1^{(l-i)} F^e G_2^{(i)}, l = 0, 1, .... Then, the orthogonal projection (2) is expressed as

y_d(t) = \sum_{l=0}^{\infty} G_d^{(l)} u(t − l).

Since, by Assumption 1(ii), G_d(z) is stable, we can show that, for a sufficiently large k, the infinite sum is well approximated by the finite sum

ŷ_d(t) := \sum_{l=0}^{k-1} G_d^{(l)} u(t − l).    (4)

In other words, the truncation error E{‖y_d(t) − ŷ_d(t)‖^2} becomes arbitrarily small for large k. It should be noted (Enqvist & Ljung, 2005) that the finite order model (4) approximates the best linear model rather than the true nonlinear system. From given input–output data, we can estimate the impulse response sequence {G_d^{(l)}, l = 0, 1, ..., k − 1} in (4) by linear regression, under the assumption that the input u(t) satisfies the PE condition (Ljung, 1999). Then, based on the estimated impulse response sequence, we could derive a state space model describing ŷ_d(t) by using the Ho–Kalman realization algorithm (Verhaegen & Verdult, 2007). In this paper, however, without taking this two-step approach, we use the ORT subspace identification method (Katayama, 2005; Picci & Katayama, 1996) to identify the best linear model G_d(z). A reason for using the ORT method is explained in Ase and Katayama (2015), where a numerical result comparing the past output (PO)-MOESP and ORT methods is included.
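Although the paper does not take this two-step route, the regression step is easy to state in code. The following sketch is our illustration under the PE assumption; u and y are hypothetical data arrays of shapes N × m and N × p.

```python
import numpy as np

def fir_estimate(u, y, k):
    """Least-squares estimate of the impulse responses G_d^(l), l = 0..k-1,
    of the best linear model, as in (4)."""
    N, m = u.shape
    # regressor [u(t)^T, u(t-1)^T, ..., u(t-k+1)^T] for t = k, ..., N
    Phi = np.hstack([u[k - 1 - l : N - l] for l in range(k)])   # (N-k+1) x km
    Theta, *_ = np.linalg.lstsq(Phi, y[k - 1 :], rcond=None)    # km x p
    return [Theta[l * m : (l + 1) * m].T for l in range(k)]     # G_d^(l): p x m
```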

  

4. Identification of best linear model

4.1. State space model

We assume that a state space model of the LNL system of Fig. 1 is given by

x_1(t+1) = A_1 x_1(t) + B_1 w(t)    (5a)
y(t) = C_1 x_1(t) + D_1 w(t) + ν(t)    (5b)
w(t) = f(v(t))    (5c)
x_2(t+1) = A_2 x_2(t) + B_2 u(t)    (5d)
v(t) = C_2 x_2(t) + D_2 u(t)    (5e)

where x_1(t) ∈ R^{n_1} and x_2(t) ∈ R^{n_2} are the state vectors for G_1(z) and G_2(z), respectively, and A_1 ∈ R^{n_1×n_1}, B_1 ∈ R^{n_1×q}, C_1 ∈ R^{p×n_1}, D_1 ∈ R^{p×q}, and A_2 ∈ R^{n_2×n_2}, B_2 ∈ R^{n_2×m}, C_2 ∈ R^{r×n_2}, D_2 ∈ R^{r×m} are constant matrices. Define n := n_1 + n_2. Then, we see from (3) and (5) that a state space model of G_d(z) is given by

x_d(t+1) = \bar{A} x_d(t) + \bar{B} u(t)    (6a)
y_d(t) = \bar{C} x_d(t) + \bar{D} u(t)    (6b)

where x_d(t) ∈ R^n is the state vector, and

\bar{A} = \begin{bmatrix} A_1 & B_1 F^e C_2 \\ 0 & A_2 \end{bmatrix}, \quad \bar{B} = \begin{bmatrix} B_1 F^e D_2 \\ B_2 \end{bmatrix}, \quad \bar{C} = [C_1 \;\; D_1 F^e C_2], \quad \bar{D} = D_1 F^e D_2.    (7)
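The composite realization (7) is mechanical to assemble; the following helper (our sketch, with all matrices as NumPy arrays) is convenient, e.g., for generating G_d(z) in simulations.

```python
import numpy as np

def best_linear_model(A1, B1, C1, D1, A2, B2, C2, D2, Fe):
    """Assemble the state space (6)-(7) of G_d(z) = G_1(z) F^e G_2(z)."""
    n1, n2 = A1.shape[0], A2.shape[0]
    Abar = np.block([[A1, B1 @ Fe @ C2],
                     [np.zeros((n2, n1)), A2]])
    Bbar = np.vstack([B1 @ Fe @ D2, B2])
    Cbar = np.hstack([C1, D1 @ Fe @ C2])
    Dbar = D1 @ Fe @ D2
    return Abar, Bbar, Cbar, Dbar
```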

4.2. Subspace identification based on finite data

We briefly describe the ORT subspace method (Katayama, 2005) for identifying the best linear model (6) from given input–output data {u(t), y(t) | t = 1, ..., N}. We define the extended observability matrix of (6) as

O_k = \begin{bmatrix} \bar{C} \\ \bar{C}\bar{A} \\ \vdots \\ \bar{C}\bar{A}^{k-1} \end{bmatrix} ∈ R^{kp×n},   k > n

and the block lower triangular Toeplitz matrix as

T_k = \begin{bmatrix} \bar{D} & & & \\ \bar{C}\bar{B} & \bar{D} & & \\ \vdots & \vdots & \ddots & \\ \bar{C}\bar{A}^{k-2}\bar{B} & \cdots & \bar{C}\bar{B} & \bar{D} \end{bmatrix} ∈ R^{kp×km}

where it should be noted that T_k consists of the impulse responses in (4). Moreover, in terms of the input data, we define the block Hankel matrices

U_{1|k} = \begin{bmatrix} u(1) & u(2) & \cdots & u(N') \\ \vdots & \vdots & & \vdots \\ u(k) & u(k+1) & \cdots & u(k+N'-1) \end{bmatrix} ∈ R^{km×N'}

U_{k+1|2k} = \begin{bmatrix} u(k+1) & u(k+2) & \cdots & u(k+N') \\ \vdots & \vdots & & \vdots \\ u(2k) & u(2k+1) & \cdots & u(N) \end{bmatrix} ∈ R^{km×N'}

where N' := N − 2k + 1, and U_{1|k} and U_{k+1|2k} denote the past and future inputs, respectively. The past and future outputs Y_{1|k} and Y_{k+1|2k} are defined similarly. Then, from (6), we derive the matrix equation

Y^d_{k+1|2k} = O_k X^d_{k+1} + T_k U_{k+1|2k}    (8)

where X^d_{k+1} := [x_d(k+1) ⋯ x_d(k+N')] ∈ R^{n×N'} is a sequence of the (unknown) state vectors.

Assumption 2. (i) rank(U_{1|2k}) = 2km, i.e. the input u(t) satisfies the PE condition. (ii) rank(O_k) = n and rank(X^d_{k+1}) = n, i.e. the system (6) is observable and reachable. (iii) span(X^d_{k+1}) ∩ span(U_{k+1|2k}) = {0}, i.e. the consistency condition is satisfied (Chiuso & Picci, 2004).

In the ORT method, the image of the extended observability matrix is obtained from the orthogonal projection of the future outputs onto the complement of the past inputs, like the past input (PI)-MOESP (Verhaegen & Verdult, 2007). Thus, as discussed in depth in Chiuso and Picci (2004), the deterministic system can be best identified by using orthogonalized data. Under Assumption 2, we consider the LQ decomposition



\begin{bmatrix} U_{k+1|2k} \\ U_{1|k} \\ Y_{1|k} \\ Y_{k+1|2k} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 & 0 & 0 \\ L_{21} & L_{22} & 0 & 0 \\ L_{31} & L_{32} & L_{33} & 0 \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \\ Q_3^T \\ Q_4^T \end{bmatrix}    (9)

where L_{11}, L_{22} ∈ R^{km×km} and L_{33}, L_{44} ∈ R^{kp×kp} are lower triangular, and Q_1, Q_2 ∈ R^{N'×km}, Q_3, Q_4 ∈ R^{N'×kp} are orthogonal matrices. Then, from (9), the deterministic component of the future output Y_{k+1|2k} is given by

Y^d_{k+1|2k} = L_{41} Q_1^T + L_{42} Q_2^T.    (10)

Equating (8) and (10),

O_k X^d_{k+1} + T_k U_{k+1|2k} = L_{41} Q_1^T + L_{42} Q_2^T.    (11)


Post-multiplying (11) by Q_2 yields O_k X^d_{k+1} Q_2 = L_{42}, since U_{k+1|2k} Q_2 = L_{11} Q_1^T Q_2 = 0. Thus, under the assumption that X^d_{k+1} Q_2 has full rank, we have Im(O_k) = Im(L_{42}) and dim Im(L_{42}) = n. Compute the singular value decomposition (SVD) of L_{42} ∈ R^{kp×km} as

L_{42} = [U_1 \;\; U_2] \begin{bmatrix} Σ_1 & 0 \\ 0 & Σ_2 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} ≈ U_1 Σ_1 V_1^T

where Σ_1 ∈ R^{n×n} is the diagonal matrix with the n largest singular values, discarding Σ_2 with the smaller singular values. Thus, the extended observability matrix is given by O_k = U_1 Σ_1^{1/2} ∈ R^{kp×n}, which is consistent as k → ∞ with N → ∞; see Chiuso and Picci (2004). Then, by using the shift invariance property of O_k, we obtain the estimates of \bar{A} and \bar{C}. Moreover, pre-multiplying (11) by U_2^T and post-multiplying by Q_1 yields U_2^T T_k(\bar{B}, \bar{D}) L_{11} = U_2^T L_{41}, since U_2^T O_k = 0. It is also shown (Chiuso & Picci, 2004) that T_k(\bar{B}, \bar{D}) is consistent as k → ∞ with N → ∞. For given O_k, we see that T_k(\bar{B}, \bar{D}) is linear with respect to (\bar{B}, \bar{D}), so that they are obtained by the least-squares method. Thus, the best linear model G_d(z) is consistently identified by the ORT subspace method.
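In code, (9)-(11) reduce to one QR factorization and one SVD. The sketch below is our condensed illustration of the (\bar{A}, \bar{C}) part only; the (\bar{B}, \bar{D}) least-squares step is omitted, and the L factors are obtained from a QR factorization of the transposed data matrix.

```python
import numpy as np

def blkhankel(x, k, start, Np):
    """Block Hankel matrix with block rows x(start+1), ..., x(start+k); x is N x dim."""
    return np.vstack([x[start + i : start + i + Np].T for i in range(k)])

def ort_AC(u, y, k, n):
    """Estimate (Abar, Cbar) of the best linear model by the ORT method."""
    N, m = u.shape
    p = y.shape[1]
    Np = N - 2 * k + 1
    H = np.vstack([blkhankel(u, k, k, Np),    # U_{k+1|2k}
                   blkhankel(u, k, 0, Np),    # U_{1|k}
                   blkhankel(y, k, 0, Np),    # Y_{1|k}
                   blkhankel(y, k, k, Np)])   # Y_{k+1|2k}
    Q, R = np.linalg.qr(H.T)                  # H = L Q^T with L = R^T, cf. (9)
    L = R.T
    L42 = L[2 * k * m + k * p :, k * m : 2 * k * m]
    U, s, _ = np.linalg.svd(L42)              # Im(O_k) = Im(L42)
    Ok = U[:, :n] * np.sqrt(s[:n])            # O_k = U_1 Sigma_1^{1/2}
    Cbar = Ok[:p]
    Abar = np.linalg.pinv(Ok[:-p]) @ Ok[p:]   # shift invariance of O_k
    return Abar, Cbar
```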


5. Allocation of poles of best linear model

Let a realization of the identified best linear model be Ĝ_d(z) = [A, B, C, D] := C(zI − A)^{-1}B + D. It follows from (7) that there exists a similarity transformation T ∈ R^{n×n} such that

T^{-1}AT = \begin{bmatrix} A_1 & B_1 F^e C_2 \\ 0 & A_2 \end{bmatrix}, \quad T^{-1}B = \begin{bmatrix} B_1 F^e D_2 \\ B_2 \end{bmatrix}, \quad CT = [C_1 \;\; D_1 F^e C_2], \quad D = D_1 F^e D_2.    (12)

Since, from Remark 2, the eigenvalues of A are the poles of Ĝ_d(z), we partition the eigenvalues in all possible ways between A_1 and A_2 in order to allocate the poles of Ĝ_d(z) between the two linear subsystems Ĝ_1(z) and Ĝ_2(z). The following procedure is adapted from Ase and Katayama (2015). Partition the given (A, B, C) as

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} \bar{B}_1 \\ \bar{B}_2 \end{bmatrix}, \quad C = [\bar{C}_1 \;\; \bar{C}_2]

where A_{ij} ∈ R^{n_i×n_j}, \bar{B}_i ∈ R^{n_i×m}, \bar{C}_j ∈ R^{p×n_j}, i, j = 1, 2. Let X ∈ R^{n_2×n_1} be a solution of the non-symmetric algebraic Riccati equation (ARE) (Laub, 1979)

−XA_{11} − XA_{12}X + A_{21} + A_{22}X = 0    (13)

and let the similarity transformation be given by T = \begin{bmatrix} I_{n_1} & 0 \\ X & I_{n_2} \end{bmatrix}. It then follows from (13) that

T^{-1}AT = \begin{bmatrix} A_{11} + A_{12}X & A_{12} \\ 0 & A_{22} − XA_{12} \end{bmatrix}, \quad T^{-1}B = \begin{bmatrix} \bar{B}_1 \\ \bar{B}_2 − X\bar{B}_1 \end{bmatrix}, \quad CT = [\bar{C}_1 + \bar{C}_2 X \;\; \bar{C}_2].

Comparing with (12) yields

A_1 = A_{11} + A_{12}X, \quad A_2 = A_{22} − XA_{12}    (14a)
B_2 = \bar{B}_2 − X\bar{B}_1, \quad C_1 = \bar{C}_1 + \bar{C}_2 X    (14b)
\begin{bmatrix} B_1 \\ D_1 \end{bmatrix} F^e [C_2 \;\; D_2] = \begin{bmatrix} A_{12} & \bar{B}_1 \\ \bar{C}_2 & D \end{bmatrix}.    (14c)

We see from (14a) and (14b) that η := (A_1, A_2, B_2, C_1) is determined by the solution X of (13). A solution method for the non-symmetric ARE (13) is briefly mentioned. Let λ(A) = {λ_1, ..., λ_n} be the eigenvalues of A, where it is assumed that they are distinct.² Let [u_i^T \; v_i^T]^T, u_i ∈ C^{n_1}, v_i ∈ C^{n_2}, be the eigenvector associated with λ_i ∈ λ(A). Let Δ_s := {λ_{s_1}, ..., λ_{s_{n_1}}} be a set of n_1 eigenvalues of A, and partition λ(A) into two disjoint subsets Δ_s and \bar{Δ}_s with Δ_s ∪ \bar{Δ}_s = λ(A). Then, arrange the eigenvectors associated with Δ_s as

\begin{bmatrix} U_s \\ V_s \end{bmatrix} = \begin{bmatrix} u_{s_1} & \cdots & u_{s_{n_1}} \\ v_{s_1} & \cdots & v_{s_{n_1}} \end{bmatrix} ∈ C^{(n_1+n_2)×n_1}.    (15)

If U_s ∈ C^{n_1×n_1} is nonsingular, then X_s = V_s U_s^{-1} is the sth solution of (13) with λ(A_1) = Δ_s and λ(A_2) = \bar{Δ}_s. Solving (13) for all realizable partitions of λ(A), we obtain all realizable pairs of (A_1, A_2), and hence all allocations of the poles between the two subsystems; see Ase and Katayama (2015) for an example of realizable partitions.

Since (14c) is independent of X, the system parameters θ := (B_1, C_2, D_1, D_2) in (14c) cannot be determined by the state transformation; hence, they should be estimated using the input–output data. Note that the vector θ ∈ R^{n_θ}, n_θ = n_1 q + n_2 r + pq + rm, is formed by applying the vec operator (Verhaegen & Verdult, 2007) to B_1, C_2, D_1 and D_2 successively.

² If there are multiple eigenvalues, then the Schur method may be applicable (Laub, 1979).
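The eigenvector construction (15) translates directly into code. This sketch (ours) computes X_s = V_s U_s^{-1} for one candidate index set sel of n_1 eigenvalues and returns η_s via (14a)-(14b); for a real-valued X_s, sel must be closed under complex conjugation, which we take to be the meaning of "realizable" here.

```python
import numpy as np

def allocate_poles(A, B, C, n1, sel):
    """One pole allocation: sel indexes the n1 eigenvalues assigned to A_1.
    Returns (A1, A2, B2, C1) of (14a)-(14b), or None if U_s is singular."""
    lam, W = np.linalg.eig(A)                # columns of W are [u_i; v_i], cf. (15)
    Us, Vs = W[:n1, sel], W[n1:, sel]
    if abs(np.linalg.det(Us)) < 1e-10:
        return None
    X = np.real(Vs @ np.linalg.inv(Us))      # X_s = V_s U_s^{-1} solves (13)
    A11, A12 = A[:n1, :n1], A[:n1, n1:]
    A22 = A[n1:, n1:]
    A1 = A11 + A12 @ X                       # (14a)
    A2 = A22 - X @ A12
    B2 = B[n1:] - X @ B[:n1]                 # (14b)
    C1 = C[:, :n1] + C[:, n1:] @ X
    return A1, A2, B2, C1
```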

6. Identification of LNL system

6.1. Model of nonlinear element

We assume that each element of the nonlinearity is well approximated by a basis function expansion (Ljung, 1999)

\hat{f}_{ij}(v_j(t)) = \sum_{l=1}^{L} α_l^{ij} ζ_l(v_j(t))    (16)

where ζ_1(·), ..., ζ_L(·) are basis functions, the α_l^{ij} are coefficients to be estimated, and i = 1, ..., q; j = 1, ..., r. We assume that \hat{f}_{ij}(0) = 0, so that the constant term ζ_0(·) is not included.

Let the estimates of the linear subsystems be given by Ĝ_1(z) = [A_1, B_1, C_1, D_1] and Ĝ_2(z) = [A_2, B_2, C_2, D_2]. Also, define φ(v_j(t)) = [ζ_1(v_j(t)) ⋯ ζ_L(v_j(t))] ∈ R^{1×L} and α^{ij} = [α_1^{ij} ⋯ α_L^{ij}]^T ∈ R^L. Since the input to the nonlinearity is approximated as \hat{v}(t) = Ĝ_2(z)u(t), the estimate of the ith output of the nonlinearity is expressed as

\hat{w}_i(t) = \sum_{j=1}^{r} \hat{f}_{ij}(\hat{v}_j(t)) = \sum_{j=1}^{r} φ(\hat{v}_j(t)) α^{ij},   i = 1, ..., q.

Thus, the noise-free output is approximated as

\hat{y}_0(t) = Ĝ_1(z)\hat{w}(t) = \sum_{j=1}^{r} \left[ Ĝ_1^{(1)} φ(\hat{v}_j(t)) α^{1j} + ⋯ + Ĝ_1^{(q)} φ(\hat{v}_j(t)) α^{qj} \right]

where Ĝ_1^{(i)} is the ith column of Ĝ_1(z). Define Z_t(j, θ) := [Ĝ_1^{(1)} φ(\hat{v}_j(t)) ⋯ Ĝ_1^{(q)} φ(\hat{v}_j(t))] ∈ R^{p×qL}. Then,

\hat{y}_0(t) = \sum_{j=1}^{r} Z_t(j, θ) \begin{bmatrix} α^{1j} \\ \vdots \\ α^{qj} \end{bmatrix} = \sum_{j=1}^{r} Z_t(j, θ) α^{•j}.    (17)

Since the Z_t(j, θ) are functions of θ, we see that \hat{y}_0(t) is a nonlinear function of θ, but is linear with respect to α^{•j} ∈ R^{qL}.
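To make (17) concrete, the sketch below (ours) builds the regressors Z_t(j, θ) and stacks them into the matrix Φ(θ) used in Section 6.2, for the monomial basis ζ_l(v) = v^l. It assumes scipy discrete-time StateSpace models G1, G2 assembled from the current parameters; each basis signal is filtered through the columns of Ĝ_1(z) with scipy.signal.dlsim.

```python
import numpy as np
from scipy.signal import StateSpace, dlsim

def basis(v, L):
    """Monomial basis zeta_l(v) = v**l, l = 1..L (no constant term, cf. (16))."""
    return np.stack([v ** l for l in range(1, L + 1)], axis=1)      # N x L

def build_Phi(G1, G2, u, L):
    """Stack the regressors Z_t(j, theta) of (17) into Phi(theta)."""
    N = u.shape[0]
    vhat = dlsim(G2, u)[1]                   # \hat v(t) = G_2(z) u(t), N x r
    p, q = G1.C.shape[0], G1.B.shape[1]
    blocks = []
    for j in range(vhat.shape[1]):           # one block column per alpha^{.j}
        zeta = basis(vhat[:, j], L)
        Zj = np.zeros((N, p, q * L))
        for i in range(q):                   # i-th column G_1^{(i)} of G_1(z)
            G1i = StateSpace(G1.A, G1.B[:, [i]], G1.C, G1.D[:, [i]], dt=G1.dt)
            for l in range(L):               # filter each basis signal
                Zj[:, :, i * L + l] = dlsim(G1i, zeta[:, [l]])[1]
        blocks.append(Zj.reshape(N * p, q * L))
    return np.hstack(blocks)                 # pN x rqL
```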

6.2. Separable least-squares

Define the extended vectors as

Y = \begin{bmatrix} y(1) \\ \vdots \\ y(N) \end{bmatrix}, \quad Y_0 = \begin{bmatrix} \hat{y}_0(1) \\ \vdots \\ \hat{y}_0(N) \end{bmatrix}, \quad α = \begin{bmatrix} α^{•1} \\ \vdots \\ α^{•r} \end{bmatrix}

where Y, Y_0 ∈ R^{pN} and α ∈ R^{rqL}, and also define the regression matrix

Φ(θ) = \begin{bmatrix} Z_1(1, θ) & \cdots & Z_1(r, θ) \\ \vdots & & \vdots \\ Z_N(1, θ) & \cdots & Z_N(r, θ) \end{bmatrix} ∈ R^{pN×rqL}.    (18)

Since, from (17), the estimate of the noise-free extended output is expressed as Y_0 = Φ(θ)α, the extended output error vector is given by E(θ, α) := Y − Φ(θ)α ∈ R^{pN}. Consider the least-squares problem of finding the parameter vectors (θ, α) minimizing the mean square error

V(θ, α) := \frac{1}{pN} ‖E(θ, α)‖^2 = \frac{1}{pN} ‖Y − Φ(θ)α‖^2.    (19)

Since E(θ, α) is a nonlinear function of θ, but a linear function of α, this nonlinear least-squares problem is called a separable least-squares problem (Golub & Pereyra, 1973). For a given θ, the least-squares estimate of α becomes α = Φ^†(θ)Y, where (·)^† denotes the pseudo-inverse. Substituting the least-squares estimate into (19) yields the variable projection functional (Golub & Pereyra, 1973)

\tilde{V}(θ) = \frac{1}{pN} ‖(I_{pN} − Φ(θ)Φ^†(θ))Y‖^2    (20)

which should be minimized with respect to θ by a gradient-based optimization method (Wills & Ninness, 2008). Suppose that the matrix Φ(θ) of (18) has constant rank over an open parameter set Ω ⊂ R^{n_θ}. It is shown in Golub and Pereyra (1973) that if \hat{θ} ∈ Ω minimizes \tilde{V}(θ), then the pair (\hat{θ}, \hat{α}) with \hat{α} = Φ^†(\hat{θ})Y also minimizes V(θ, α) of (19). Thus, the method of separable least-squares is quite effective for many data fitting problems (Golub & Pereyra, 2003); see also Bruls, Chou, Haverkamp, and Verhaegen (1999) for applications to linear and nonlinear system identification.
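Given Φ(θ), the inner least-squares step and the variable projection functional (20) are a few lines; this sketch (ours) can be handed to a general-purpose optimizer such as scipy.optimize.minimize.

```python
import numpy as np

def vp_functional(Y, Phi):
    """Variable projection functional (20) and the inner estimate alpha = Phi^+ Y."""
    alpha, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    resid = Y - Phi @ alpha
    return resid @ resid / Y.size, alpha

# usage: Vtilde, alpha_hat = vp_functional(Y, build_Phi(G1, G2, u, L))
```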

6.3. Outline of identification procedure

The algorithm for identifying the Wiener–Hammerstein system, where n_1 and n_2 are known, is summarized as follows; a code sketch of the loop over Steps 2-5 is given after the list.

Step 1. Identify Ĝ_d(z) = [A, B, C, D] by the ORT method from the given input–output data.
Step 2. Compute the eigenvalues λ(A) and the associated eigenvectors. Then, enumerate all realizable subsets Δ_s = {λ_{s_1}, ..., λ_{s_{n_1}}}, s = 1, ..., ρ, where ρ denotes the number of realizable partitions.
Step 3. Arrange the eigenvectors associated with Δ_s as in (15). If U_s is nonsingular, X_s = V_s U_s^{-1} gives the sth solution of (13); then compute η_s = (A_1, A_2, B_2, C_1)_s from (14a) and (14b).
Step 4. Compute \hat{θ}_s that minimizes the output error \tilde{V}(θ) of (20), and then put \hat{α}_s = Φ^†(\hat{θ}_s)Y. The numerical algorithm employed is a version of the gradient-based method (Wills & Ninness, 2008); see also Ase and Katayama (2015) for a related numerical algorithm for the benchmark model.
Step 5. Repeat Steps 2-4, and find s* such that \hat{θ}_{s*} attains the minimum of {\tilde{V}(\hat{θ}_s) | s = 1, ..., ρ}. Then, the parameters (\hat{θ}_{s*}, \hat{α}_{s*}, η_{s*}) determine the best configuration of the LNL system.
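A minimal driver for the loop over Steps 2-5 (our sketch): it reuses allocate_poles, build_Phi and vp_functional from the earlier sketches, enumerates index sets without filtering for conjugate-closedness, and leaves the map theta_to_models from (θ, η_s) to state-space models as a hypothetical helper.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def identify_lnl(A, B, C, u, Y, n1, L, theta0, theta_to_models):
    """Steps 2-5: try each pole allocation, run the separable least-squares,
    and keep the allocation with the smallest output error."""
    best = (np.inf, None, None)
    for sel in itertools.combinations(range(A.shape[0]), n1):
        eta = allocate_poles(A, B, C, n1, list(sel))        # Step 3
        if eta is None:
            continue                                        # U_s singular
        def cost(theta):                                    # Step 4: minimize (20)
            G1, G2 = theta_to_models(theta, eta)
            return vp_functional(Y, build_Phi(G1, G2, u, L))[0]
        res = minimize(cost, theta0, method="BFGS")
        if res.fun < best[0]:                               # Step 5
            best = (res.fun, res.x, eta)
    return best
```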

[Fig. 2. Poles (left) and zeros (right) of Ĝ_d(z) for 50 simulation runs, where the true ones are shown in black.]

Remark 3. The algorithm in Step 4 gives only local minima, since \tilde{V}(θ) is a very complicated nonlinear (non-convex) function of θ. Since the η_s obtained in Step 3 is fixed in Step 4, the number of unknown parameters in the gradient search is considerably reduced compared with the full parameterization (Wills & Ninness, 2008). Also, if the nonlinear block is diagonal, a good initial estimate of θ is obtained by using the SVD of the right-hand side of (14c), assuming that F^e is, say, the identity matrix; otherwise, we need trial and error.

7. Numerical example

Consider a two-input two-output Wiener–Hammerstein system, where the linear subsystems are given by

 0.48 −0.36z + 0.72   − 0.8z + 0.7225 z − 0.8 G1 (z ) =   −0.35z + 0.7 −0.24z + 0.8 z 2 − 0.2z + 0.7325 z 2 − 0.14z + 0.7925   0.81 0   G2 (z ) =  z − 0.69 −0.76z + 0.8  . 0 z 2 − 1.8z + 0.85 

z2

Table 1 shows the poles and zeros of the linear subsystems. Since G_1(z) and G_2(z) are strictly proper, we have D_1 = 0, D_2 = 0. We assume that the nonlinear block is diagonal with the elements

f_{11}(v_1) = 0.4444 v_1^3,   f_{22}(v_2) = 1.5 v_2 − 0.3 v_2^3.    (21)

Table 1
Poles and zeros of linear subsystems.

G_1(z): Poles: 0.8, 0.7 ± 0.55j, 0.4 ± 0.75j, 0.1 ± 0.85j. Zeros: 2, 1.5553, 0.9948, 0.1872 ± 0.7988j.
G_2(z): Poles: 0.69, 0.9 ± 0.2j. Zero: 1.0526.

For the numerical simulations, we use both the polynomial and the Legendre (orthogonal) polynomial expansions (Greblicki & Pawlak, 2008), so that if L ≥ 3, the true nonlinearities are in the model set formed by the basis functions. The input u(t) ∈ R^2 and the noise ν(t) ∈ R^2 are uncorrelated Gaussian white noises with zero means and standard deviations σ_{u1} = 0.6, σ_{u2} = 0.42 and σ_{ν1} = 0.05, σ_{ν2} = 0.05, and the data length is N = 30,000. The sample values of the output S/N ratios of channels 1 and 2 are 27.6 dB and 30.2 dB, respectively.

Fig. 2 shows the poles and zeros of the identified linear model Ĝ_d(z) for 50 Monte Carlo simulation runs with k = 18 in the ORT subspace method. We see that the estimated poles cluster around their true values, but the estimates of the zeros are scattered. In fact, we see from Table 1 that G_d(z) has six finite zeros, whereas Ĝ_d(z) has eight zeros, so that two infinite zeros are incorrectly estimated as finite zeros.
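For readers who wish to reproduce the flavor of this example, a data-generation sketch (ours; only the diagonal G_2(z) and the nonlinearities (21) are shown, and y(t) would follow by filtering w(t) through G_1(z) and adding ν(t)):

```python
import numpy as np
from scipy.signal import dlti, dlsim

rng = np.random.default_rng(1)
N = 30_000
u = rng.normal(0.0, [0.6, 0.42], size=(N, 2))    # Gaussian inputs, sigma_u1, sigma_u2

# diagonal G_2(z): 0.81/(z - 0.69) and (-0.76 z + 0.8)/(z^2 - 1.8 z + 0.85)
G2_11 = dlti([0.81], [1, -0.69], dt=1)
G2_22 = dlti([-0.76, 0.8], [1, -1.8, 0.85], dt=1)
v1 = dlsim(G2_11, u[:, 0])[1].ravel()
v2 = dlsim(G2_22, u[:, 1])[1].ravel()

# static nonlinearity (21)
w = np.column_stack([0.4444 * v1 ** 3, 1.5 * v2 - 0.3 * v2 ** 3])
```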


[Fig. 3. Convergence of iterations.]

[Fig. 4. Estimated nonlinearities f_{11} (left) and f_{22} (right) by using the Legendre polynomial expansion with L = 5, where the true ones are indicated in black.]

[Fig. 5. Bode plots of Ĝ_1^{(11)}(e^{jω}) (upper left), Ĝ_1^{(12)}(e^{jω}) (upper right), Ĝ_1^{(21)}(e^{jω}) (lower left), Ĝ_1^{(22)}(e^{jω}) (lower right), where the true ones are indicated in black.]

Table 2
Output error norms using two different basis functions (condition numbers c[Φ(θ)] in parentheses).

L | Polynomial          | Legendre
3 | 0.0757 (6.77 × 10^4) | 0.0758 (3.94 × 10^3)
4 | 0.0756 (7.28 × 10^8) | 0.0756 (5.18 × 10^6)
5 | 0.0757 (9.83 × 10^7) | 0.0756 (1.57 × 10^8)

[Table 3. True and estimated coefficients of nonlinear functions, where P (L) denotes the polynomial (Legendre). The estimated cubic coefficient of f_{11} is 0.4506 (P) / 0.4404 (L) against the true 0.4444; for f_{22}, the estimates are 1.4974 (both P and L) against the true 1.5, and −0.2972 (P) / −0.2971 (L) against the true −0.3, with all remaining coefficients below 0.011 in magnitude.]

To show the feasibility of the procedure outlined in Section 6.3, we have performed the gradient-based search in Step 4, assuming that the parameters η are known. Since r = q = 2, the dimensions of the unknown parameters are dim(θ) = 20 and dim(α) = 6, 8, 10 for L = 3, 4, 5, respectively. Fig. 3 shows the convergence of the output error norm, i.e. the square root of \tilde{V}(θ), for the Legendre polynomial expansion with L = 5. We see that the convergence is not monotonically decreasing.³ Table 2 displays the output error for the two basis function expansions for L = 3, 4, 5; the condition numbers c[Φ(θ)] at iteration 25 of the gradient-based search are included in parentheses. Since the standard deviations of the output noises are σ_{ν1} = σ_{ν2} = 0.05, the lower bound of the performance is 0.0707, so that the attained minimum is about 8% larger than the lower bound. Table 3 displays the true and estimated coefficients of the nonlinearities of (21) for L = 5, and Fig. 4 shows the true and estimated nonlinearities obtained by the Legendre polynomial expansion with L = 5; the results obtained by the polynomial

³ For L = 5, the computation time of the gradient-based optimization with N = 30,000 is about 42 min (1.2 min per iteration) on a desktop PC (Intel Core i7, 3.40 GHz) for a given partition.

[Fig. 6. Bode plots of the estimated Ĝ_2^{(11)}(e^{jω}) (left) and Ĝ_2^{(22)}(e^{jω}) (right), where the true ones are indicated in black.]

expansion are nearly the same. In view of the above results, both the polynomial and Legendre polynomial expansions give quite good results, but the condition numbers for the Legendre polynomial expansions are slightly smaller than those for the polynomial expansions, except for L = 5. Figs. 5 and 6 display Bode plots of the estimated linear subsystems obtained with the Legendre polynomial expansion with L = 5. In Fig. 5, we observe small discrepancies in the Bode plots of Ĝ_1^{(12)}(e^{jω}) and Ĝ_1^{(22)}(e^{jω}) near the frequencies of 1.1 and 1.5 rad/s, respectively. The discrepancies are apparently due to the peaks of the Bode gains of Ĝ_1^{(11)}(e^{jω}) and Ĝ_1^{(21)}(e^{jω}) near the same frequencies. In Fig. 6, however, there are no such phenomena; this is because G_2(z) is diagonal, so that there is only one transfer function in each channel. We see that if the nonlinearities are in a model set defined by a small number of basis functions, we get good identification results. Otherwise, the performance somewhat degrades, since we then need the basis function expansion (16) with a larger L. For a more realistic example, see the numerical result for the benchmark model in Ase and Katayama (2015).

8. Conclusions

In this paper, we have shown that the best linear model of an MIMO Wiener–Hammerstein system for Gaussian inputs is consistently estimated by the ORT subspace identification method. Moreover, based on the partitioning of the poles of the best linear model, a procedure for identifying MIMO Wiener–Hammerstein systems has been developed by using the basis function expansion of the nonlinearity and the separable least-squares. Simulation results are included to show the feasibility of the procedure, though it remains to improve the efficiency of the numerical algorithm.


Acknowledgments

The authors are grateful to the anonymous reviewers for their many helpful suggestions.

Appendix. Proof of Proposition 1

Under Assumption 1, all the covariance matrices are well defined. Post-multiplying y(t) = \sum_{k=0}^{\infty} G_1^{(k)} w(t − k) + ν(t) by u^T(t − \tau) and taking the expectation yield

R_{yu}(\tau) = \sum_{k=0}^{\infty} G_1^{(k)} R_{wu}(\tau − k)    (A.1)

since ν(t) is uncorrelated with u(t − \tau). By definition, R_{wu}(\tau) = E{w(t)u^T(t − \tau)} = E{f(v(t))u^T(t − \tau)} holds. Since v(t) = \sum_{j=0}^{\infty} G_2^{(j)} u(t − j) with u(t − j) Gaussian, the pair (v(t), u(t − \tau)) is jointly Gaussian. Let α := v(t), β := u(t − \tau), and let Σ_{αβ} := E{αβ^T}. Then, by the well-known property of conditional expectation and the definition of the equivalent gain, we see that

R_{wu}(\tau) = E{f(α)β^T} = E{E{f(α)β^T | α}} = E{f(α)(E{β | α})^T} = E{f(α)(Σ_{βα}Σ_{αα}^{-1}α)^T} = E{f(α)α^T}Σ_{αα}^{-1}Σ_{αβ} = F^e R_{vu}(\tau)

which is substituted into (A.1) to get

R_{yu}(\tau) = \sum_{k=0}^{\infty} G_1^{(k)} F^e R_{vu}(\tau − k).    (A.2)

Also, the cross-covariance matrix between the input and the output of the linear system G_2(z) is given by

R_{vu}(\tau) = E{v(t)u^T(t − \tau)} = \sum_{j=0}^{\infty} G_2^{(j)} R_{uu}(\tau − j).

Substituting this expression into (A.2) yields (1). □



References

Anderson, B. D. O., & Gevers, M. R. (1981). On multivariable pole-zero cancellations and the stability of feedback systems. IEEE Transactions on Circuits and Systems, CAS-28(8), 830–833.
Ase, H., & Katayama, T. (2015). A subspace-based identification of Wiener–Hammerstein benchmark model. Control Engineering Practice, 44, 126–137.
Billings, S. A., & Fakhouri, S. Y. (1982). Identification of systems containing linear dynamic and static nonlinear elements. Automatica, 18(1), 15–26.
Bruls, J., Chou, C. T., Haverkamp, B. R. J., & Verhaegen, M. (1999). Linear and nonlinear system identification using separable least-squares. European Journal of Control, 5(1), 116–128.
Bussgang, J. J. (1952). Crosscorrelation functions of amplitude-distorted Gaussian signals. Technical Report No. 216. Research Laboratory of Electronics, MIT.
Chiuso, A., & Picci, G. (2004). Subspace identification by data orthogonalization and model decoupling. Automatica, 40(10), 1689–1703.
Enqvist, M., & Ljung, L. (2005). Linear approximations of nonlinear FIR systems for separable input processes. Automatica, 41(3), 459–473.
Golub, G. H., & Pereyra, V. (1973). The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM Journal on Numerical Analysis, 10(2), 413–432.
Golub, G. H., & Pereyra, V. (2003). Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems, 19(2), 1–26.
Greblicki, W., & Pawlak, M. (2008). Nonparametric system identification. Cambridge University Press.
Hjalmarsson, H., Rojas, C. R., & Rivera, D. E. (Eds.) (2012). Special section: Wiener–Hammerstein system identification benchmark. Control Engineering Practice, 20(11), 1095–1246.
Hunter, I. W., & Korenberg, M. J. (1986). The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics, 55, 135–144.
Katayama, T. (2005). Subspace methods for system identification. Springer.
Laub, A. J. (1979). A Schur method for solving algebraic Riccati equations. IEEE Transactions on Automatic Control, AC-24(6), 913–921.
Ljung, L. (1999). System identification – theory for the user (2nd ed.). Prentice-Hall.
Mu, B.-Q., & Chen, H.-F. (2014). Recursive identification of errors-in-variables Wiener–Hammerstein systems. European Journal of Control, 20(1), 14–23.
Picci, G., & Katayama, T. (1996). Stochastic realization with exogenous inputs and 'subspace methods' identification. Signal Processing, 52(2), 145–160.
Roberts, J. B., & Spanos, P. D. (1990). Random vibration and statistical linearization. Wiley (reprinted by Dover, 2003).
Schoukens, J., Pintelon, R., Dobrowiecki, T., & Rolain, Y. (2005). Identification of linear systems with nonlinear distortions. Automatica, 41(3), 491–504.
Schoukens, J., Pintelon, R., & Enqvist, M. (2008). Study of the LTI relations between the outputs of two coupled Wiener systems and its application to the generation of initial estimates for Wiener–Hammerstein systems. Automatica, 44(7), 1654–1665.
Sjöberg, J., Lauwers, L., & Schoukens, J. (2012). Identification of Wiener–Hammerstein models: Two algorithms based on the best split of a linear model applied to the SYSID'09 benchmark problem. Control Engineering Practice, 20(11), 1119–1125.
Sjöberg, J., & Schoukens, J. (2012). Initializing Wiener–Hammerstein models based on partitioning of the best linear approximation. Automatica, 48(2), 353–359.
Verhaegen, M., & Verdult, V. (2007). Filtering and system identification. Cambridge University Press.
Verhaegen, M., & Westwick, D. (1996). Identifying MIMO Hammerstein systems in the context of subspace model identification methods. International Journal of Control, 63(2), 331–349.
Westwick, D., & Verhaegen, M. (1996). Identifying MIMO Wiener systems using subspace model identification methods. Signal Processing, 52(2), 235–258.
Wills, A., & Ninness, B. (2008). On gradient-based search for multivariable system estimates. IEEE Transactions on Automatic Control, 53(1), 298–306.

Tohru Katayama received the B.E., M.E., and Ph.D. degrees all in Applied Mathematics and Physics from Kyoto University, in 1964, 1966, and 1969, respectively. He was Professor at the Department of Applied Mathematics and Physics, Kyoto University from 1986 to 2005, and now Professor Emeritus of Kyoto University. He was the Chair of IFAC CC on Systems and Signals (2002–2005), the Vice-Chair of IFAC TB (2005–2011), and a member of EB (2011–2014). IFAC Fellow (2009). His research interests include stochastic realization, subspace identification methods, and identification of nonlinear systems.

Hajime Ase was born in Osaka, Japan in 1950 and received the M.E. degree in Applied Mathematics and Physics from Kyoto University in 1975. From 1975, he worked at the Nippon Kokan steel company as an engineer in research and development divisions. He received the Doctor of Engineering degree from Kyoto University in 1988. Since 2015, he has been a consultant to industry on the application of mathematical optimization techniques.