Accepted Manuscript
The Frisch scheme in multivariable errors–in–variables identification Roberto Diversi, Roberto Guidorzi PII: DOI: Reference:
S0947-3580(16)30075-9 10.1016/j.ejcon.2017.05.004 EJCON 209
To appear in:
European Journal of Control
Received date: Revised date: Accepted date:
25 July 2016 8 March 2017 8 May 2017
Please cite this article as: Roberto Diversi, Roberto Guidorzi, The Frisch scheme in multivariable errors– in–variables identification, European Journal of Control (2017), doi: 10.1016/j.ejcon.2017.05.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
The Frisch scheme in multivariable errors–in–variables identification a
CR IP T
Roberto Diversia,∗, Roberto Guidorzia
Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Viale del Risorgimento 2, 40136 Bologna, Italy
Abstract
M
AN US
This paper concerns the identification of multivariable errors-in-variables (EIV) models, i.e. models where all inputs and outputs are assumed as affected by additive errors. The identification of MIMO EIV models introduces challenges not present in SISO and MISO cases. The approach proposed in the paper is based on the extension of the dynamic Frisch scheme to the MIMO case. In particular, the described identification procedure relies on the association of EIV models with directions in the noise space and on the properties of a set of high order Yule-Walker equations. A method for estimating the system structure is also described.
ED
Keywords: system identification, errors–in–variables models, multivariable models, Frisch scheme
PT
1. Introduction
AC
CE
This paper concerns the identification of multivariable errors-in-variables (EIV) models, i.e. system representations where both inputs and outputs are affected by additive noise. This class of models is particularly suitable when the interest is mainly focused on the inner laws that describe the considered process and/or on filtering applications rather than on the prediction of its future behavior. EIV models play an important role in several engineering applications, as shown in [34, 35]. Nowadays several ∗
Corresponding author Email addresses:
[email protected] (Roberto Diversi),
[email protected] (Roberto Guidorzi)
Preprint submitted to Elsevier
May 16, 2017
ACCEPTED MANUSCRIPT
AC
CE
PT
ED
M
AN US
CR IP T
approaches for identifying SISO EIV models are available in the literature [25, 36, 33, 32, 38, 39, 24, 9, 12, 21, 17, 30, 26, 11, 37]. Recently, it has also been shown how many available methods can be put into a general framework, resulting in a Generalized Instrumental Variable Estimator (GIVE) [26]. An overview of EIV identification can be found in the survey papers [27, 28]. The identifiability of EIV models has been discussed, among others, in [2, 23, 1, 5]. It is worth noting that very few papers deal with the identification of EIV models in the MIMO case, see e.g. [7, 6, 8, 29, 22]. One of the most interesting approaches for modeling EIV processes is the Frisch scheme. It was originally introduced by Ragnar Frisch in the static case for describing the family of linear relations linking sets of noisy observations [13, 19]. This scheme was subsequently extended to the dynamic case in [3]. The dynamic case shares many interesting properties with the static one like the existence of convex curves or hypersurfaces defining the loci of solutions in the noise space that can be associated with the covariance matrix of the noisy data. Moreover, the time-shift properties of dynamic systems allow the asymptotic determination of the “true” model of the process that has generated the data. This nice property is based on the existence of a common point between the hypersurfaces associated with all models with orders greater than the actual one. Of course, in presence of finite sequences of data, covariance matrices are replaced by sample ones and the associated hypersurfaces do no longer share any common point. It is thus necessary to introduce suitable criteria in order to determine a single solution of the EIV identification problem [17]. The extension of the Frisch scheme to the identification of multi–input multi–output errors–in–variables models introduces additional challenges not shared by SISO and MISO cases. In fact, a MIMO system can be viewed as a set of interconnected subsystems whose number equals the number of outputs. Each subsystem can be associated with a hypersurface of admissible solutions and could thus be identified separately by means of the same procedures developed for SISO and MISO models. An approach of this kind is less advantageous from the computational viewpoint since it requires the solution of a set of separate identification problems. Moreover, it leads to different estimates for every set of input and output noise variances with a consequent loss of congruence that can be critical in applications like filtering and fault detection. For these reasons, it is preferable to treat the MIMO EIV model as a whole and not as a set of separate subsystems. Starting from the preliminary results reported in [7] this paper proposes a 2
CR IP T
ACCEPTED MANUSCRIPT
Figure 1: MIMO errors–in–variables system.
AC
CE
PT
ED
M
AN US
Frisch scheme-based identification procedure where multivariable EIV models are no longer associated to single points but to directions in the noise space. This approach, introduced in [14], relies, from a computational standpoint, on the computation of the intersection between a straight line from the origin and the hypersurfaces of admissible solutions. It allows to overcome the aforementioned drawbacks and to introduce congruent selection criteria. In particular, the criterion proposed in the paper takes advantage of the properties of a set of high order Yule–Walker equations. Even if the identification procedure still leads to different values for each noise variance, the ratios between the noise variances associated with every subsystem are the same. The described identification algorithm is compared to the generalized instrumental variable approach introduced in [29]. The paper proposes also a procedure for the estimation of the system structure that is based on the EIV Kalman filtering described in [10]. The organization of the paper is as follows. Section 2 reports the statement of the problem and the basic assumptions. Section 3 describes the set of solutions compatible with the Frisch scheme in the multivariable case. Section 4 shows how it is possible to associate MIMO EIV models to directions in the noise space and describes a selection criterion based on a set of high–order Yule–Walker equations. A procedure for estimating the system structure is proposed in Section 5 while Section 6 reports the results of some Monte Carlo simulations. Some concluding remarks are finally given in Section 7. 2. Statement of the problem Consider the multi–input multi–output errors–in–variables system represented in Figure 1 where u0 (t) ∈ Rr and y0 (t) ∈ Rm . The noise-free inputs 3
ACCEPTED MANUSCRIPT
and outputs are linked by the linear model Q(z) y0 (t) = P (z) u0 (t),
CR IP T
where
(1)
u0 (t) = [ u01 (t) u02 (t) . . . u0r (t) ]T
(2)
T
(3)
y0 (t) = [ y01 (t) y02 (t) . . . y0m (t) ]
AN US
and Q(z), P (z) are (m × m) and (m × r) polynomial matrices in the unitary advance operator z (z y(t) = y(t + 1)). The measures of both the input and output are considered as affected by additive noise so that the available observations are u(t) = u0 (t) + u˜(t) y(t) = y0 (t) + y˜(t),
(4) (5)
M
and the entries of u(t), u˜(t) and y(t), y˜(t) will be denoted with ui (t), u˜i (t) (i = 1, . . . , r) and yj (t), y˜j (t) (j = 1, . . . , m). As shown in [15], any MIMO system of type (1) can be represented by a minimally parameterized canonical model described by the following set of m relations
ED
y0i (t+νi )+
νij m X X
αijk y0j (t+k−1) =
j=1 k=1
r νX i +1 X
βijk u0j (t+k−1) i = 1, . . . , m
j=1 k=1
AC
CE
PT
(6) where the integers νi (i = 1, . . . , m) are the Kronecker observability indexes of the system while the integers νij are completely defined by these indexes through the following relations [15] νij = νi for i = j, νij = min (νi + 1, νj ) for i > j, νij = min (νi , νj ) for i < j.
(7)
It is also shown that the canonical model of type (6) representing system (1) is unique. The maximal observability index will be denoted as νM i.e. νM = maxi {νi } i = 1, . . . , m. 4
ACCEPTED MANUSCRIPT
Remark 1. For strictly proper systems, the index k in the right side of (6) ranges between 1 and νi .
i=1
CR IP T
Remark 2. The system order n, i.e. the McMillan degree of Q(z)−1 P (z), is given by m X n= νi . (8)
AN US
Remark 3. The considered class of minimally parameterized models is quite general and can describe any MIMO model. It is worth noting that the commonly used “full polynomial form” can be considered as a special case of this class. A limit of the full polynomial form is that the system order n (8) must be a multiple of the number of outputs m. Remark 4. Because of (6), the EIV system (1) can be viewed as a set of m interconnected MISO subsystems. In fact, the output y0i (t) of the i-th subsystem depends not only on the input signals u01 (t), u02 (t), . . . , u0r (t) but also on the output signals y01 (t), . . . , y0(i−1) (t), y0(i+1) (t), . . . , y0m (t).
M
The following assumptions are introduced. Assumptions
ED
A1. System (1) is asymptotically stable. A2. Q(z) and P (z) are left coprime.
CE
PT
A3. The noiseless input u0 (t) can be either a zero–mean ergodic process or a quasi–stationary bounded deterministic signal, i.e. such that the limit N 1 X u0 (t) uT0 (t − τ ) lim N →∞ N t=1
AC
exists ∀τ [20]. Moreover, u0 (t) is assumed as persistently exciting of sufficiently high order.
A4. u˜(t) and y˜(t) are zero–mean ergodic white processes with unknown diagonal covariance matrices ˜ ∗u = E[ u˜(t)˜ Σ uT (t) ] = diag [ σ ˜u2∗1 , σ ˜u2∗2 , . . . , σ ˜u2∗r ], ˜ ∗ = E[ y˜(t)˜ Σ y T (t) ] = diag [ σ ˜y2∗1 , σ ˜y2∗2 , . . . , σ ˜y2∗m ]. y 5
(9) (10)
ACCEPTED MANUSCRIPT
A5. u˜(t) and y˜(t) are mutually uncorrelated and uncorrelated with the noise–free input u0 (t).
CR IP T
A6. The structural indexes ν1 , ν2 , . . . , νm of the canonical model (6) are considered as a priori known or as already estimated by using the procedure described in Section 5.
Under these assumptions, the EIV MIMO identification problem can be stated as follows.
AN US
Problem 1. Given the set of noisy input–output observations u(1), . . . , u(N ), y(1), . . . , y(N ), estimate the coefficients αijk , βijk of the canonical model (6) ˜ ∗ (9), (10). ˜∗, Σ and the noise covariance matrices Σ y u 3. The dynamic Frisch scheme in the MIMO case
In the following, we will focus our attention on the generic i–th subsytem. The properties that will be described hold for all m subsytems. Let us define the vectors
M
ϕ0i (t) = [ −y01 (t) . . . − y01 (t + νi1 − 1) . . . − y0i (t) . . . − y0i (t + νi ) . . . − y0m (t) · · · − y0m (t + νim − 1) u01 (t) . . . u01 (t + νi ) . . .
ED
. . . u0r (t) . . . u0r (t + νi ) ]T ,
ϕi (t) = [ −y1 (t) . . . − y1 (t + νi1 − 1) . . . − yi (t) · · · − yi (t + νi ) . . . u1 (t) . . . u1 (t + νi ) . . . ur (t) . . . ur (t + νi ) ]T ,
PT
ϕ˜i (t) = [ −˜ y1 (t) . . . − y˜1 (t + νi1 − 1) . . . − y˜i (t) · · · − y˜i (t + νi ) . . . u˜1 (t) . . . u˜1 (t + νi ) . . . u˜r (t) . . . u˜r (t + νi ) ]T ,
(11)
(12) (13)
AC
CE
whose entries are noise–free, noisy and noise samples respectively, and the parameter vector θi∗ = αi11 . . . αi1νi1 αi21 . . . αi2νi2 · · · αii1 · · · αiiνi 1 · · · αim1 · · · αimνim T βi11 · · · βi1(νi +1) βi21 · · · βi2(νi +1) · · · βir1 · · · βir(νi +1) . (14) From (6) and (4)–(5) it follows that
ϕT0i (t) θi∗ = 0, ϕi (t) = ϕ0i (t) + ϕ˜i (t). 6
(15) (16)
ACCEPTED MANUSCRIPT
Define also the covariance matrices Σ0i = E [ ϕ0i (t) ϕT0i (t) ],
(17)
Σi = E [ ϕi (t) ϕTi (t) ], ˜ ∗ = E [ ϕ˜i (t) ϕ˜T (t) ]. Σ i i
(18)
Σ0i θi∗ = 0, ˜∗ Σi = Σ0i + Σ i
CR IP T
Equations (15)–(16) and Assumption A5 lead to
(19)
(20) (21)
PT
ED
M
AN US
where, because of Assumptions A4 2∗ ˜ ∗ = diag σ Σ ˜y1 Iνi1 , . . . , σ ˜y2∗i Iνi +1 , . . . , σ ˜y2∗m Iνim , σ ˜u2∗1 Iνi +1 , σ ˜u2∗2 Iνi +1 , . . . , σ ˜u2∗r Iνi +1 i (22) where Ik denotes the k×k identity matrix. The covariance matrix of the noisy data Σi can thus be decomposed as the sum of a positive semidefinite matrix ˜∗ Σ0i , whose kernel defines the parameter vector θi∗ , and a diagonal matrix Σ i that is completely characterized by the input–output noise variances. Note that equations (20) and (21) cannot be directly used to solve Problem 1 even asymptotically since only the covariance matrix of the noisy data is known. ˜ ∗i ) were known, it would be If the noise variances (and then the matrix Σ possible to determine the covariance matrix of the noise-free data via (21) and then the parameter vector θi∗ by means of (20): ˜ ∗i ) θi∗ = 0. (Σi − Σ
(23)
CE
Starting from these considerations, the Frisch scheme approach to EIV identification can be summarized by the following two steps:
AC
i) Determine the set of all diagonal matrices of type (22) (i.e. the set of all input-output noise variances) such that subtracted to Σi give positive semidefinite matrices with a null eigenvalue (the same properties of Σ0i ). Every diagonal matrix of this set can thus be associated with a matrix whose null space leads to a possible parameter vector, like in (20).
ii) Define a suitable selection criterion in order to find, within the pre˜ ∗ (i.e. the true inputviously defined set, the true diagonal matrix Σ i output noise variances) and then, the true parameter vector θi∗ via (20). 7
ACCEPTED MANUSCRIPT
CR IP T
2 2 2 Denote now with P = (˜ σ12 , . . . , σ ˜m ,σ ˜m+1 ,...,σ ˜m+r ) a generic point belonging m+r ˜ i (P ) the diagonal to the positive orthant of the noise space R and with Σ matrix 2 2 2 2 ˜ i (P ) = diag σ Iνi +1 . Iνi +1 , . . . , σ ˜m+r Iνim , σ ˜m+1 ˜m ˜i2 Iνi +1 , . . . , σ Σ ˜1 Iνi1 , . . . , σ (24) The first step of the Frisch scheme approach is equivalent to the following problem.
Problem 2. Determine the set of points P belonging to the positive orthant of Rm+r such that ˜ i (P ) , Σ0i (P ) ≥ 0, Σi − Σ
AN US
˜ i (P )) = 0. min eig Σi − Σ
(25)
This set will be called singularity (hyper)surface of Σi and denoted as S(Σi ) in the sequel. Note that a generic point P of S(Σi ) defines the set of input– output noise variances (26)
M
2 2 2 σ ˜y21 = σ ˜12 , . . . , σ ˜y2m = σ ˜m ,σ ˜u21 = σ ˜m+1 ,...,σ ˜u2r = σ ˜m+r .
ED
Because of (25), it is also possible to associate the point P with a vector of parameters θi (P ) having the same structure as (14): Σ0i (P ) θi (P ) = 0,
(27)
PT
θi (P ) = αi11 (P ) . . . αi1νi1 (P ) · · · αii1 (P ) · · · αiiνi (P ) 1 · · · αim1 (P ) · · · T · · · αimνim (P ) βi11 (P ) · · · βi1(νi +1) (P ) · · · βir1 (P ) · · · βir(νi +1) (P ) . (28)
CE
where θi (P ) is obtained from any basis of the null space of Σ0i (P ) by normalizing the appropriate entry to 1, see (14):
AC
θi (P ) =
v , v(η)
v ∈ ker (Σ0i (P )),
η = νi1 + · · · + νi(i−1) + νi + 1.
(29)
The singularity hypersurface S(Σi ) is described by the following theorem.
Theorem 1. The set of points P satisfying (25) defines an hypersurface S(Σi ) belonging to the positive orthant of Rm+r (the noise space). Every 8
CR IP T
ACCEPTED MANUSCRIPT
AN US
Figure 2: Typical shape of S(Σi ) for a one–input two–output EIV model.
point P of S(Σi ) defines a set of m + r input–output noise variances and can be associated with a vector of parameters θi (P ) satisfying the relation ˜ i (P ) θi (P ) = 0. Σi − Σ (30)
ED
M
The vector θi (P ) is obtained by normalizing to 1 the (ν1 + · · · + νi + 1)– ˜ i (P ), see (28)-(30). th entry of a generic basis of the null space of Σi − Σ 2∗ 2∗ ∗ 2∗ 2∗ 2∗ ˜u1 , . . . , σ ˜ur ), defined by the true noise The point P = (˜ σy1 , σ ˜y2 , . . . , σ ˜ ym , σ variances, belongs to S(Σi ) and is associated with the true parameter vector, i.e. θi (P ∗ ) = θi∗ .
PT
Proof. See Appendix A.
CE
As an example, Fig. 3 shows a typical surface for a system with one input and two outputs. The figure shows, to improve the rendering of the shape of ˜y21 , σ ˜y22 are free, see Lemma S(Σi ), a set of curves obtained by fixing σ ˜u2 while σ 3 in Appendix A.
AC
Remark 5. The hypersurface S(Σi ) defined by Theorem 1 partitions the positive orthant of Rm+r into two regions. The points over S(Σi ) correspond to ˜ i (P ), those under S(Σi ) correspond to positive nondefinite matrices Σi − Σ ˜ definite matrices Σi − Σi (P ). As already said at the beginning of this section, the above mentioned properties hold for every subsystem. It is thus possible to define the m hypersurfaces S(Σ1 ), S(Σ2 ), . . . , S(Σm ), that exhibit the following important property. 9
ACCEPTED MANUSCRIPT
Theorem 2. All hypersurfaces S(Σ1 ), S(Σ2 ), . . . , S(Σm ) share the common point P ∗ corresponding to the actual input–output noise variances. Moreover, ∗ θ1 (P ∗ ) = θ1∗ , θ2 (P ∗ ) = θ2∗ , . . . θm (P ∗ ) = θm .
CR IP T
Proof. See Appendix B.
4. Identification of MIMO EIV models in the Frisch scheme context
AN US
The property described by Theorem 2 allows to find, in the asymptotic case, the point P ∗ associated with the true noise variances, and, as a consequence, the true system parameters by means of the relations ˜ i (P ∗ ) θi (P ∗ ) = 0, Σi − Σ i = 1, 2, . . . , m. (31)
ED
M
Nevertheless, this nice property cannot be directly used in practice since the covariance matrices Σi , i = 1, . . . , m must be replaced by their sample esˆ i , i = 1, . . . , m so that the associated hypersurfaces S(Σ ˆ 1 ), S(Σ ˆ 2 ), timates Σ ˆ m ) do no longer share any common point. A possible way for solv. . . , S(Σ ing the identification problem could consists in the minimization, on the hypersurfaces associated with the different subsystems, of the cost functions already used in the SISO case [17]. The generic subsystem i would thus be treated as a MISO model where all outputs except the i–th one play the role of inputs, see Remark 4. An approach of this type presents, however, the following drawbacks:
CE
PT
• It is necessary to solve m optimization problems of complexity m + r and this increases the computational burden. In fact, the step ii) of the Frisch scheme has to be solved for every subsystem i.e. the selection ˆ 1 ), S(Σ ˆ 2 ), . . . , criterion has to be applied to the m hypersurfaces S(Σ ˆ m ). S(Σ
AC
ˆ 1 ), . . . , S(Σ ˆ m ) will never, • The m sets of noise variances selected on S(Σ when working with real data, coincide. This lack of congruence would compromise the use of the identified models in some applications like, for instance, fault detection and optimal filtering.
To overcome these drawbacks it is necessary to treat model (6) as a single MIMO description. This can be done by exploiting the geometric approach introduced in [14] whose rationale consists in associating the whole MIMO 10
ACCEPTED MANUSCRIPT
CR IP T
model to a direction in the noise space. This approach takes advantage also of the easy computation of the intersection between a straight line through the origin and singularity hypersurfaces [18] that is described, for the MIMO case, by the following theorem. Theorem 3. Let ξ = (ξ1 , ξ2 , . . . , ξm+r ) be a generic point of the positive orthant of Rm+r and ρ the straight line from the origin through ξ. Its intersection with S(Σi ) is the point Pi given by Pi =
ξ , λM i
AN US
where
˜ξ λM i = max eig Σ−1 i Σi
˜ ξ = diag ξ1 Iν , . . . , ξi Iν +1 , . . . , ξm Iν , ξm+1 Iν +1 , . . . , ξm+r Iν +1 . Σ i i i im i1 i Proof. See Appendix C.
(32)
(33)
ED
M
This parameterization establishes, for every subsystem, a bijection between the sheaf of lines from the origin in the first orthant of Rm+r and the locus of solutions described by Theorem 1. It is thus possible to consider the m intersections P1 , P2 , . . . , Pm of a line ρ from the origin with the m singularity hypersurfaces S(Σ1 ), . . . , S(Σm ) that define univocally the parameters θ1 (P1 ), θ2 (P2 ), . . . , θm (Pm ) i.e. a whole MIMO model.
CE
PT
Remark 6. Note that this approach still leads to m different values for each noise variance. However, since P1 , P2 , . . . , Pm belong to the same straight line from the origin, the ratios between the noise variances associated with every subsystem are the same.
AC
On the basis of all previous considerations, it is possible to associate a single MIMO EIV model to every direction in the noise space as follows. Definition 1. Consider a straight line ρ passing through the origin and belonging to the positive orthant of the noise space Rm+r . Let P1 , P2 , . . . , Pm be the intersections of ρ with the hypersurfaces S(Σ1 ), S(Σ2 ), . . . , S(Σm ). The direction ρ defines the following MIMO EIV model: M(ρ) , {θ1 (P1 ), θ2 (P2 ), . . . , θm (Pm )}. 11
(34)
ACCEPTED MANUSCRIPT
∗ }. M(ρ∗ ) = {θ1∗ , θ2∗ , . . . , θm
CR IP T
In order to solve the identification problem (Problem 1) it is necessary to introduce a selection criterion determining, asymptotically, the true direction in the noise space, i.e. the direction ρ∗ associated with the point P ∗ and, consequently, with the true model M(ρ∗ ): (35)
In fact, because of Theorem 2 and Theorem 3, if ρ = ρ∗ it follows that P1 = P2 = · · · = Pm = P ∗ . The next subsection describes a criterion based on the properties of the high–order Yule–Walker equations.
AN US
4.1. A criterion based on high–order Yule–Walker equations Consider the vectors
ϕ0 (t) = [ −y01 (t) . . . − y01 (t + ν1 ) − y02 (t) · · · − y02 (t + ν2 ) . . .
(36)
− y0m (t) · · · − y0m (t + νm ) u01 (t) . . . u01 (t + νM ) . . . u0r (t) . . . u0r (t + νM ) ]T ϕ(t) = [ −y1 (t) . . . − y1 (t + ν1 ) − y2 (t) · · · − y2 (t + ν2 ) . . . (37)
M
− ym (t) · · · − ym (t + νm ) u1 (t) . . . u1 (t + νM ) . . . ur (t) . . . ur (t + νM ) ]T ϕ(t) ˜ = [ −˜ y1 (t) . . . − y˜1 (t + ν1 ) − y˜2 (t) · · · − y˜2 (t + ν2 ) . . . (38) − y˜m (t) · · · − y˜m (t + νm ) u˜1 (t) . . . u˜1 (t + νM ) . . . u˜r (t) . . . u˜r (t + νM ) ]T
PT
ED
and the matrix of parameters
∗ Θ∗ = θ¯1∗ θ¯2∗ · · · θ¯m ,
(39)
AC
CE
where θ¯i∗ = αi11 . . . αi1νi1 0| .{z . . 0} . . . αii1 . . . αiiνi 1 . . . αim1 . . . αimνim (ν1 +1−νi1 )
. . 0} βi11 . . . βi1(νi +1) |0 .{z . . 0} . . . βir1 . . . βir(νi +1) 0| .{z (νM −νi )
(νM −νi )
T
,
0| .{z . . 0}
(νm +1−νim )
(40)
for i = 1, . . . , m. Note that θ¯i∗ differs from θi∗ only for the presence of some structural zeros, see (14). Because of (6) and (4)–(5) it is easy to show that ϕT0 (t) Θ∗ = 0 ϕ(t) = ϕ0 (t) + ϕ(t). ˜ 12
(41) (42)
ACCEPTED MANUSCRIPT
Define also the rq × 1 (where q ≥ 1 is a user–chosen parameter) vector of delayed inputs
ϕdu (t) = ϕdu0 (t) + ϕ˜du (t), where
CR IP T
ϕdu (t) = [ u1 (t−q) . . . u1 (t−1) u2 (t−q) . . . u2 (t−1) . . . ur (t−q) . . . ur (t−1) ]T , (43) that, because of (4), satisfies the condition
ϕdu0 (t) = [ u01 (t − q) . . . u01 (t − 1) u02 (t − q) . . . u02 (t − 1) . . .
AN US
ϕ˜du (t)
. . . u0r (t − q) . . . u0r (t − 1) ]T
= [ u˜1 (t − q) . . . u˜1 (t − 1) u˜2 (t − q) . . . u˜2 (t − 1) . . . . . . u˜r (t − q) . . . u˜r (t − 1) ]T .
(44)
(45)
(46)
Define now the rq × [n + m + (r + 1) νM ] matrix
(47)
M
Σd = E [ϕdu (t) ϕT (t)] = E [(ϕdu0 (t) + ϕ˜du (t)) (ϕ0 (t) + ϕ(t)) ˜ T ], Because of the assumptions A4–A5 we have
Σd Θ∗ = 0.
(49)
ED
(48)
PT
so that,from (41)
Σd = E [ϕdu0 (t) ϕT0 (t)],
AC
CE
Relation (49) represents a set of high order Yule–Walker equations that can be used to define a selection criterion. To this end, consider a direction ρ in the noise space, its intersections P1 , . . . , Pm with S(Σ1 ), . . . , S(Σm ) and the associated MIMO model M(ρ), according to Definition 1. Relation (49) allows the introduction of the cost function J(P1 , . . . , Pm ) = J(ρ) = trace ΘT (P1 , . . . , Pm ) (Σd )T Σd Θ(P1 , . . . , Pm ) , (50) ∗ where Θ(P1 , . . . , Pm ) = Θ(ρ) has the same structure of Θ and has been constructed with the entries of θ1 (P1 ), . . . , θm (Pm ). This function satisfies the conditions 1. J(ρ) ≥ 0 13
ACCEPTED MANUSCRIPT
2. J(ρ) = 0 if P1 = · · · = Pm = P ∗ ,
CR IP T
where the second condition follows directly from (49). Problem 1 can thus be solved by minimizing J(ρ) in the noise space. Of course, when the covariance matrices are replaced by their sample estimates, relation (49) does no longer hold exactly and the minimum of the cost function (50) will be greater than zero. The whole identification procedure is summarized by the following algorithm. Algorithm 1 (Identification of MIMO models)
ˆi = Σ
AN US
1. Compute, on the basis of the noisy observations u(1), . . . , u(N ), y(1), . . . , y(N ), the sample covariance matrices Σ1 , . . . , Σm : NX −νM 1 ϕi (t) ϕTi (t), N − νM t=1
i = 1, . . . , m
M
2. Choose a value of the delay q in (43) and compute the sample covariance ˆ d: matrix Σ NX −νM 1 d ˆ Σ = ϕd (t) ϕT (t). N − νM − q t=q+1 u
PT
ED
3. Start from a generic direction ρ in the first orthant of Rm+r . ˆ 1 ), . . . , S(Σ ˆ m) 4. Compute the intersections P1 , . . . , Pm between ρ and S(Σ by means of Theorem 3. ˜ 1 ), . . . , Σ(P ˜ m ) and compute the 5. Construct the diagonal matrices Σ(P parameter vectors θ1 (P1 ), θ2 (P2 ), . . . , θm (Pm ) by means of the relations ˆi − Σ ˜ i (Pi ) θi (Pi ) = 0, i = 1, . . . , m. Σ
CE
It is now possible to define the model M(ρ), see (34). Construct, with the entries of θ1 (P1 ), . . . , θm (Pm ) the matrix Θ(ρ) with the same structure of the matrix (39). Compute the cost function J(ρ) (50). Move to a new direction corresponding to a decrease of J(ρ). Repeat steps 3–8 until the direction ρˆ corresponding to the minimum of J(ρ) is found. The intersections Pˆ1 , Pˆ2 , . . . , Pˆm between ρˆ and ˆ 1 ), . . . , S(Σ ˆ m ) defines the estimated MIMO model M(ˆ S(Σ ρ):
6.
AC
7. 8. 9.
M(ˆ ρ) = {θ1 (Pˆ1 ), θ2 (Pˆ2 ), . . . , θm (Pˆm )} , {θˆ1 , θˆ2 , . . . , θˆm }. 14
(51)
ACCEPTED MANUSCRIPT
CR IP T
Remark 7. It is worth pointing out that Algorithm 1 is based on the solution of a single optimization problem of dimension m + r (the number of inputoutput noise variances). As already said at the beginning of this section, if we treat the MIMO system as a set of separate subsystem, the solution of the EIV identification problem requires the implementation of m optimization problems of dimension m + r with an increasing computational burden.
AN US
Remark 8. The criterion based on a set of Yule-Walker equations could be substituted by a more elegant geometric criterion using a cost function relying on the euclidean distances between the points Pi . Unfortunately, these distances depend on the scaling of the different input and outputs so that also the solutions would depend on these scalings, see [4] for a discussion concerning the SISO case.
ED
M
4.2. Estimation of the noise variances The MIMO model (51) identified by means of Algorithm 1 leads to m different values of each input and output noise variance given by the entries of Pˆ1 , Pˆ2 , . . . Pˆm . However, as previously said, since Pˆ1 , Pˆ2 , . . . , Pˆm belong to the same straight line ρˆ from the origin, the ratios between the noise variances are univocally defined. Let 2 ˆ 2 2 Pˆi = σ ˜12 (Pˆi ), . . . , σ ˜m (Pi ), σ ˜m+1 (Pˆi ), . . . , σ ˜m+r (Pˆi ) , i = 1, . . . , m. (52)
CE
PT
A single estimation of every noise variance can be obtained by considering the arithmetic means of the values associated with Pˆ1 , Pˆ2 , . . . , Pˆm : m
1 X 2 ˆ σ ˜ (Pi ), σ ˜ˆy2j = m i=1 j
j = 1, . . . , m
(53)
k = 1, . . . , r.
(54)
m
X 2 ˆ˜u2 = 1 (Pˆi ), σ σ ˜m+k k m i=1
AC
It is worth noting that these estimations preserve the ratios between the noise variances since the point ˆ˜y2 , . . . , σ ˆ˜y2 , σ ˆ˜u2 , . . . , σ ˆ˜u2 ) Pˆ , (σ m r 1 1
belongs to the straight line ρˆ as well. 15
(55)
ACCEPTED MANUSCRIPT
5. On structure estimation
AN US
x(t + 1) = A x(t) + B u0 (t) y0 (t) = C x(t) + D u0 (t).
CR IP T
The identification algorithm described in Subsection 4.1 relies on the a priori knowledge of the system structure (see Assumption A6). This subsection shows a procedure for estimating the structural indexes ν1 , ν2 , . . . , νm . The procedure is based on the EIV Kalman filter described in [10] and can be used for model validation as well. Consider a state space realization of the canonical input–output model (6) (see [16] for details) (56) (57)
By inserting (4), (5) in (56), (57) we obtain
x(t + 1) = A x(t) + B u(t) − B u˜(t) y(t) = C x(t) + D u(t) − D u˜(t) + y˜(t).
(58) (59)
M
This model is a special case of the more general framework considered in [10]. Introduce now the signal
ED
z(t) = y(t) − D u(t).
(60)
Starting from (58)–(59), the optimal prediction z(t + 1|t) of z(t) can be computed by means of the following Kalman filter equations [10]:
PT
x(t + 1|t) = A x(t|t − 1) + B u(t) + K(t) ε(t) K(t) = A P (t|t − 1) C T + S Σε (t)−1
AC
CE
P (t + 1|t) = A(t) P (t|t − 1) AT (t) + Q T − A P (t|t − 1) C T + S Σε (t)−1 A P (t|t − 1) C T + S z(t + 1|t) = C x(t + 1|t),
(61) (62) (63) (64)
where ε(t) and Σε (t) denote the innovation of z(t) and its covariance matrix, given by ε(t) = z(t) − C x(t|t − 1)
Σε (t) = E [ε(t) εT (t)] = CP (t|t − 1) C T + R, 16
(65) (66)
ACCEPTED MANUSCRIPT
where ˜ ∗u B T Q=BΣ ˜∗ + D Σ ˜ ∗ DT R=Σ
(67)
˜ ∗u DT . S =BΣ
(69)
(68)
u
CR IP T
y
The innovation ε(t) (65) associated with the optimal predictor (61)–(64) is a zero-mean white process with covariance matrix given by (66). This property can thus be exploited to evaluate the goodness of an identified EIV model. To this purpose, consider the normalized innovation 1
AN US
ε¯(t) = [ ε¯1 (t) ε¯2 (t) · · · ε¯m (t) ]T = Σε (t)− 2 ε(t), 1/2
(70)
where Σε (t) is the unique positive definite square root of Σε (t), whose covariance matrix is the m × m identity matrix, Σε¯(t) = E [¯ ε(t) ε¯T (t)] = Im . For every component ε¯i (t) of ε¯(t) consider the sample autocorrelations rˆε¯i (1), rˆε¯i (2), . . . , rˆε¯i (η) given by N 1 X ε¯(t + τ ) ε¯(t), N t=1
τ = 1, . . . , η
(71)
M
rˆε¯i (τ ) =
ED
where η is a user-chosen integer. Since ε¯1 (t), . . . , ε¯m (t) are zero-mean white processes with unit variance, the quantities i ζN,η
=N
η X
rˆε¯i (τ )2 ,
i = 1, . . . , m
(72)
PT
τ =1
AC
CE
are asymptotically χ2 (η) distributed [20, 31]. Then, for a given level α of the χ2 (η) distribution, a whiteness test can be applied for every sequence ε¯i (1), ε¯i (2), . . . , ε¯i (N ) by checking if ζN,η ≤ χ2α (η). Starting from these considerations, it is possible to devise an algorithm for estimating the model structure, i.e. the set of Kronecker indexes ν1 , ν2 , . . . , νm appearing in (6). For notational convenience, we will refer in the following to the multi-index ν = ν1 , ν2 , . . . , νm . (73)
Consider the following lexicographic sequence of multi–indexes [15] 1, 1, . . . , 1 , 2, 1, . . . , 1 , 2, 2, 1, . . . , 1 , · · · , 2, 2, . . . , 2 , 3, 2, . . . , 2 , · · · (74) 17
ACCEPTED MANUSCRIPT
and perform the steps described by the following algorithm Algorithm 2 (Estimation of the system structure)
M
AN US
CR IP T
1. Set the number of degrees of freedom η and the level α for the χ2α (η) whiteness test. η should be selected larger than the (assumed) value of νM . 2. Consider the system structure defined by the first term in (74) and determine, on the basis of the input-output observations u(1), . . . , u(N ), y(1), . . . , y(N ), an estimate of the parameter vectors θ1 , θ2 , . . . , θm by means of Algorithm 1. ˜ ∗ by means ˜∗, Σ 3. Compute an estimate of the noise covariance matrices Σ y u of relations (53) and (54). 4. Perform the EIV Kalman filtering (61)–(66) and compute the normalized innovation sequence ε¯(1) , ε¯(2), . . . ε¯(N ). i 5. Compute the m quantities ζN,η (i = 1, . . . , m) (72). i 6. For every i such that ζN,η ≤ χ2α (η), set νi = i and do not increase any longer νi . 7. When all conditions 1 2 m ζN,η ≤ χ2α (η), ζN,η ≤ χ2α (η), · · · , ζN,η ≤ χ2α (η)
(75)
ED
are satisfied, the model structure has been defined and the procedure ends, otherwise consider the next allowable structure in (74) and repeat steps 2–7.
PT
6. Numerical results
AC
CE
6.1. Example 1: estimation of the system parameters The effectiveness of the proposed identification algorithm has been tested by means of Monte Carlo simulations and compared to that of the GIVE approach described in [29]. Consider the two–input two–output model characterized by the polynomial matrices " # z 3 − 1.5 z 2 + 1.1 z − 0.45 0.44 z − 0.22 Q(z) = 0.4 z 2 − 0.2 z − 0.08 z 2 − z + 0.5 " # 0.22 z 3 + 0.08 z 2 − 0.05 z + 0.65 0.3 z 3 + 0.05 z 2 − 0.07 z + 0.05 P (z) = 0.18 z 2 + 0.02 z + 0.1 0.36 z 2 + 0.09 z + 0.8 18
ACCEPTED MANUSCRIPT
In this case, according to (7), we have ν1 = 3,
ν2 = 2,
ν12 = 2,
CR IP T
corresponding to the parameter vectors θ1∗ = −0.45, 1.1, −1.5, 1, −0.22, 0.44, 0.65, −0.05, 0.08, 0.22, 0.05, T − 0.07, 0.05, 0.3 (76) T θ2∗ = −0.08, −0.2, 0.4, 0.5, −1, 1, 0.1, 0.02, 0.18, 0.8, 0.09, 0.36 . (77) ν21 = 3
σ ˜u2∗1 = 0.1,
AN US
and the system order is n = ν1 + ν2 = 5. The true inputs u01 (t), u02 (t) are pseudo random binary sequences of unit variance and length N = 2000. The true values of the noise variances are σ ˜u2∗2 = 0.1,
σ ˜y2∗1 = 0.56,
σ ˜u2∗1 = 0.28,
PT
ED
M
corresponding to signal-to-noise ratios of approximately 10 dB on both inputs and outputs. A Monte Carlo simulation of 200 independent runs has been carried out by using q = 2 in (43). The GIVE method has been applied by using pu = 1 and py = 1, see [29]. With these choices, both methods rely on the same number of equations. The obtained results are summarized in the Tables 1–3. The performances of the method described in this paper and GIVE are quite similar. Slightly better results are obtained with Algorithm 1 in the estimation of the noise variances. It can, however, be observed that the procedure based on the Frisch scheme is more efficient from a computational standpoint; an approximate comparison has been performed by using the Matlab function ‘cputime’ in several Monte Carlo simulations that have shown how the Frisch–based algorithm is one order of magnitude faster than GIVE.
AC
CE
6.2. Example 2: estimation of the system structure The aim of this subsection is to test Algorithm 2 for the estimation of the system structure. To this end, the same two–input two–output model of Example 1 has been considered. Different values of the noise variances have been considered, in order to set the signal to noise ratio to the desired value for both inputs and outputs. For every considered value of the SNR a Monte Carlo simulation of 500 runs has been performed on sequences with length N = 2000. Table 4 reports the percentages associated with the detection of the correct structure ν1 = 3, ν2 = 2. The proposed procedure seems very robust. It is also worth pointing out that the structure has never been underestimated. 19
ACCEPTED MANUSCRIPT
Frisch GIVE true Frisch GIVE
α111
α112
α113
−0.45
1.1
−1.5
Frisch
0.44
1.0873 ± 0.1009
−1.4885 ± 0.0506
−0.2274 ± 0.0580
β111
β112
β113
β114
−0.4388 ± 0.0477
1.0805 ± 0.0908 −0.05
0.65 0.6525 ± 0.0420
−1.4928 ± 0.0555 0.08
−0.0519 ± 0.0391
−0.2245 ± 0.0284
0.2215 ± 0.0396
β123
β124
−0.0549 ± 0.0357
0.0814 ± 0.0426
0.05
−0.07
0.05
0.3
0.0484 ± 0.0400
0.3057 ± 0.0412
0.0479 ± 0.0428 0.0461 ± 0.0474
0.4417 ± 0.0411
0.22
0.0827 ± 0.0415
0.6385 ± 0.0580
β122
0.4504 ± 0.0641
−0.0618 ± 0.0409
−0.0692 ± 0.0363
0.0492 ± 0.0422
0.2170 ± 0.0388
0.3062 ± 0.0377
M
GIVE
α122
−0.4482 ± 0.0640
β121
true
α121
−0.22
AN US
true
CR IP T
Table 1: True and estimated values of the coefficients of Q(z) and P (z) – first subsystem. Monte Carlo simulation of 200 runs, N = 2000.
Frisch GIVE
PT
α211
true
ED
Table 2: True and estimated values of the coefficients of Q(z) and P (z) – second subsystem. Monte Carlo simulation of 200 runs, N = 2000.
−0.08
−0.0835 ± 0.0424 −0.0836 ± 0.0337
CE
β211
true
Frisch
AC
GIVE true
Frisch
GIVE
α212
α213
α221
−0.2
0.4
0.5
0.4008 ± 0.0484
0.4964 ± 0.0280
−0.1972 ± 0.0780
−0.1971 ± 0.0601 β212
0.4010 ± 0.0327 β213
0.1
0.02
0.18
0.0986 ± 0.0228
0.0218 ± 0.0250
0.1792 ± 0.0279
β122
β123
0.0963 ± 0.0224
0.0209 ± 0.0232
0.8
0.09
0.36
0.8032 ± 0.0400
0.0931 ± 0.0261
0.3624 ± 0.0273
β121
0.8048 ± 0.0316
0.0927 ± 0.0234
0.1748 ± 0.0266
0.3630 ± 0.0239
20
0.4973 ± 0.0213
α222 −1
−0.9949 ± 0.0361
−0.9954 ± 0.0249
ACCEPTED MANUSCRIPT
Table 3: True and estimated values of the noise variances. Monte Carlo simulation of 200 runs, N = 2000.
true Frisch GIVE
σ ˜y2∗1
σ ˜y2∗2
CR IP T
2∗ σ ˜u 2
2∗ σ ˜u 1
0.1
0.1
0.56
0.0995 ± 0.0369
0.1012 ± 0.0379
0.5535 ± 0.0298
0.0732 ± 0.0754
0.1034 ± 0.0203
0.5588 ± 0.0269
0.28
0.2793 ± 0.0170 0.2777 ± 0.0151
AN US
Table 4: Estimation of the system structure: percentage of detections of the correct structure ν1 = 3, ν2 = 2. Monte Carlo simulation of 500 runs, N = 2000. SNR (dB)
%
5
90
10
98.5
15
99.2
20
100
PT
ED
M
6.3. Example 3: application to optimal filtering The proposed identification algorithm can consist the preliminary step for applying the EIV filter (61)-(66) in order to obtain an optimal estimation of the noise-free output y0 (t). As shown in [10], the optimal (minimal variance) of y0 (t) that can be obtained from the noisy input-output data up to time t, the true model and the true noise variances, is given by
(78)
CE
yˆ0 (t|t) = E[y0 (t)|y(t), y(t − 1), . . . , y(1), u(t), u(t − 1), . . . , u(1)] ˜ ∗y Σ−1 = y(t) − Σ ε ε(t).
AC
It is also shown that, under suitable conditions, the covariance of the estimation error ey (t) = y0 (t) − yˆ0 (t|t) (79)
converges to
h i ˜ ∗y ˜ ∗y I − C P C T + R −1 Σ Py = lim E[ey (t)eTy (t)] = Σ t→∞
(80)
where P is the unique solution of the algebraic Riccati equation associated with (63). 21
ACCEPTED MANUSCRIPT
AN US
CR IP T
To test the performance of the EIV identification procedure with respect to the optimal filtering problem, Monte Carlo simulations of 200 runs have been carried out by using the same simulation set-up of Example 1. Both the number of samples N = 2000 and N = 10000 have been considered. In each Monte Carlo run, the model parameters and the noise variances are first identified by means of Algorithm 1. Then, the estimated quantities are used to perform the EIV filtering described by equations (61)-(66) and (78). The estimated noise-free output sequence is thus computed as well as the associated estimation error sequence (79). The asymptotic covariance matrix of the estimation error, obtained by means of the relation (80) is 0.1650 −0.0207 Py = −0.0207 0.1044
The mean and standard deviation of the sample estimates of Py obtained in the Monte Carlo runs is 0.1897 ± 0.0364 −0.0283 ± 0.0156 ˆ Py = −0.0283 ± 0.0156 0.1146 ± 0.0132 Pˆy =
0.1684 ± 0.0049 −0.0216 ± 0.0022 −0.0216 ± 0.0022 0.1062 ± 0.0025
M
for N = 2000 and
PT
ED
for N = 10000. These results show that the identified models lead to a good filtering performance. It is worth noting that the implementation of the optimal EIV filter requires a good estimation not only of the model coefficients but also of the input and output noise variances. 7. Conclusions
AC
CE
This paper has described a complete procedure for the identification of multi-input multi-output errors-in-variables models based on the dynamic Frisch scheme. This procedure relies on the association of models with directions in the noise space and takes advantage of the properties of a set of high order Yule-Walker equations. The proposed identification method has been described with reference to the general class of minimally parameterized multivariable models. The paper proposes also a procedure for the estimation of the system structure that is based on the EIV Kalman filtering described in [10] The effectiveness of the approach has been tested by means of Monte Carlo simulation. 22
ACCEPTED MANUSCRIPT
Appendix A. Proof of Theorem 1
CR IP T
Theorem 1 follows easily from the Lemmas 1-4 reported below. It is useful, for the subsequent analysis, to partition the matrix Σi (18) as follows: Σ11 Σ12 · · · Σ1η Σ12 Σ22 · · · Σ2η Σi = .. (A.1) .. .. , . . . . . . Ση1 Ση2 · · · Σηη
AN US
where η = m + r and the meaning of the various blocks can be easily derived from (12) and (18). These blocks should be indexed also with the suffix i; this has been avoided to simplify the notation.
Lemma 1. The maximal admissible value for the noise variance σ ˜k2 compatible with (25) when all other variances are zero, i.e. the value σ ˜k2max of σ ˜k2 max 2max associated with the point Pk = (0, . . . , 0, σ ˜k , 0, . . . , 0) of S(Σi ), is given by σ ˜k2max = min eig Σkk − Σk# Σ−1 (A.2) ## Σ#k ,
M
where Σk# denotes the k–th block row of (A.1) without the block Σkk , Σ#k denotes the k–th block column of (A.1) without the block Σkk and Σ## is obtained from (A.1) by deleting its k–th block row and block column.
PT
ED
Proof. We can consider, without loss of generality, the maximal admissible value of σ ˜12 . In fact, it is always possible to rearrange the order of the input– output variables y1 , . . . , ym , u1 , . . . , ur in (11)–(13) and, accordingly, the order of the parameters in (28). In this case we have 2 ˜ i (P ) = diag σ Σ ˜1 Iνi1 , 0νi 2 , . . . , 0νim , 0νi +1 , . . . , 0νi +1 ,
AC
CE
where 0j denotes a null matrix of dimension j × j, so that Σ11 − σ ˜12 Iνi1 Σ12 · · · Σ1η Σ21 Σ22 · · · Σ2η ˜12 Iνi1 Σ1# Σ11 − σ ˜ Σ0i (P ) = Σi −Σi (P ) = , .. .. .. , .. Σ#1 Σ## . . . . Ση1 Ση2 · · · Σηη with obvious meaning of the notation. By using a well known property of the determinant of block partitioned matrices, we have det (Σ0i (P )) = det (Σ## ) det Σ11 − σ ˜12 Iνi1 − Σ1# Σ−1 ## Σ#1 . 23
ACCEPTED MANUSCRIPT
Since Σ## > 0, the condition (25) is satisfied if and only if min eig Σ11 − σ ˜12 Iνi1 −Σ1# Σ−1 ## Σ#1 = 0,
Σ11 − σ ˜12 Iνi1 −Σ1# Σ−1 ## Σ#1 ≥ 0,
CR IP T
i.e., if and only if
˜12max = min eig Σ11 − Σ1# Σ−1 σ ˜12 = σ ## Σ#1 .
(A.3)
Remark 9. Note that the point Pkmax is the intersection of S(Σi ) with the 2 2 2 = 0, σ ˜k2 6= 0} of =σ ˜m+r =σ ˜k+1 ˜k−1 ˜22 = · · · = σ k–th coordinate axis {˜ σ12 = σ the noise space.
AN US
Remark 10. This lemma can be considered as an extension to the dynamic case of the result reported in [18].
M
Lemma 2. The intersection of S(Σi ) with the coordinate plane {˜ σk2 , σ ˜j2 6= 0, σ ˜ξ2 = 0, ξ 6= k, j}, i.e. the set of points P = (0, . . . , 0, σ ˜k2 , 0, . . . , 0, σ ˜j2 , 0, . . . , 0) compatible with (25) is a curve whose intersections with the coordinate axes ˜ξ2 = 0, ξ 6= j} are given by σ ˜k2max , {˜ σk2 6= 0, σ ˜ξ2 = 0, ξ 6= k} and {˜ σj2 6= 0, σ 2max . σ ˜j
PT
ED
Proof. The proof will be carried out by considering, without loss of generality, the set of points P = (˜ σ12 , σ ˜22 , 0, . . . , 0) compatible with (25). It is obvious from Lemma 1 that the intersections of S(Σi ) with the coordinate ˜22max . Since ˜12max and σ ˜22 , 0, . . . , 0) are given by σ axes (˜ σ12 , 0, . . . , 0) and (0, σ 2 ˜ i (P ) = diag σ Σ ˜1 Iνi1 , σ ˜12 Iνi2 , 0νi 3 , . . . , 0νim , 0νi +1 , . . . , 0νi +1 ,
AC
CE
it follows that
Σ11 − σ ˜12 Iνi1 Σ12 Σ12# ˜ i (P ) = Σ21 Σ22 − σ ˜22 Iνi2 Σ0i (P ) = Σi − Σ Σ#12 Σ## Σ(˜ σ12 , σ ˜22 ) Σ12# , , (A.4) Σ#12 Σ##
with obvious meaning of the blocks. It follows that det (Σ0i (P )) = det (Σ## ) det Σ(˜ σ12 , σ ˜22 ) − Σ12# Σ−1 ## Σ#21 . 24
ACCEPTED MANUSCRIPT
Since Σ## > 0, the condition (25) is satisfied if and only if min eig Σ(˜ σ12 , σ ˜22 ) − Σ12# Σ−1 ## Σ#21 = 0. (A.5) −1 By partitioning the matrix Σ12# Σ## Σ#21 as a 2 × 2 block matrix with blocks ˜22 ), it is possible to write congruent with those of Σ(˜ σ12 , σ 0 Σ012 ˜12 Iνi1 Σ11 − σ −1 2 2 0 2 2 . Σ(˜ σ1 , σ ˜2 ) − Σ12# Σ## Σ#12 , Σ (˜ σ1 , σ ˜2 ) = Σ021 Σ022 − σ ˜22 Iνi2 (A.6) To characterize the curve it is now possible to adopt the same lines followed in [3] with reference to the SISO case. Consider, for example, a generic value ˜22max is given by (A.2). From (A.6) ˜22max , where σ ˜22 < σ σ ˜22 , 0 < σ det(Σ0 (˜ σ12 , σ ˜22 )) = det(Σ022 −˜ σ22 Iνi2 ) det Σ011 − σ ˜12 Iνi1 − Σ012 (Σ022 − σ ˜22 Iνi2 )−1 Σ021 .
AN US
CR IP T
Σ(˜ σ12 , σ ˜22 ) − Σ12# Σ−1 ## Σ#21 ≥ 0,
M
Since σ ˜22 < σ ˜22max the matrix Σ022 − σ ˜22 Iνi2 is positive definite so that (A.5) is satisfied if and only if σ ˜12 = min eig Σ011 − σ ˜12 Iνi1 − Σ012 (Σ022 − σ ˜22 Iνi2 )−1 Σ021 .
ED
Of course it is, equivalently, possible to consider a generic value σ ˜12 such that ˜22 by following similar 0<σ ˜12 < σ ˜12max and find the corresponding value of σ 2 2 ˜2 ) satisfying (A.5) defines a curve whose steps. The set of couples (˜ σ1 , σ typical shape is reported in Figure 2.
PT
Lemma 3. Consider the set of points P ∈ S(Σi ) such that all the variances assume fixed (admissible) values with the exception of σ ˜j2 and σ ˜k2 , i.e.
CE
¯˜ξ2 , 0 < σ ¯˜ξ2 < σ {P ∈ S(Σi ) | σ ˜ξ2 = σ ˜ξ2max
∀ξ 6= j, k}.
(A.7)
AC
Since only σ ˜j2 and σ ˜k2 are free variables, this set is defined by the intersection of S(Σi ) with a plane parallel to the coordinate plane (0, . . . , 0, σ ˜j2 , 0, . . . , 0, σ ˜k2 , 0, . . . , 0). This set is a curve that can be characterized by means of the same procedure described in Lemma 2. Proof. As usual, we assume, without loss of generality, that the free variables are σ ˜12 and σ ˜22 . Therefore 2 ˜ i (P ) = diag σ ¯˜32 Iνi3 , . . . , σ ¯˜m+r Iνi +1 . Σ ˜1 Iνi1 , σ ˜22 Iνi2 , σ 25
CR IP T
ACCEPTED MANUSCRIPT
AN US
Figure A.3: Typical shape of the intersection of S(Σi ) with a coordinate plane in the noise space.
M
By taking into account (A.4) it is possible to write Σ11 − σ ˜12 Iνi1 Σ12 Σ12# ˜ i (P ) = Σ21 Σ22 − σ ˜22 Iνi2 Σ0i (P ) = Σi − Σ ¯ ## Σ#12 Σ Σ(˜ σ12 , σ ˜22 ) Σ12# , ¯ ## , Σ#12 Σ
PT
ED
¯ ## , where σ ¯˜m+r ¯˜32 , . . . , σ which is identical to (A.4) except for the block Σ ¯ ## is positive appear in its main diagonal. Because of (A.7), the matrix Σ definite so that the condition (25) can be rewritten as in (A.5). The set of points P of the type (A.7) belonging to S(Σi ) defines thus a curve that can be characterized as in Lemma 2.
AC
CE
Lemma 4. The point P ∗ = (˜ σy2∗1 , . . . , σ ˜y2∗m , σ ˜u2∗1 , . . . , σ ˜u2∗r ), defined by the true noise variances, belongs to S(Σi ) and is associated with the true parameter vector, i.e. θi (P ∗ ) = θi∗ . Proof. From (21) ˜ i (P ∗ ) = Σi − Σ ˜ ∗ = Σ0i . Σ0i (P ∗ ) = Σi − Σ i
(A.8)
Because of (20), the covariance matrix Σ0i , defined in (17), is positive semidefinite and its kernel defines the true parameter vector θi∗ . 26
ACCEPTED MANUSCRIPT
Appendix B. Proof of Theorem 2
CR IP T
Theorem 2 follows immediately from Theorem 1. In fact, the statement of Theorem 1 refers to the generic i–th subsystem, so that it holds for every subsystem. As a consequence, the point P ∗ = (˜ σy2∗1 , σ ˜y2∗2 , . . . , σ ˜y2∗m , σ ˜u2∗1 , ...,σ ˜u2∗r ), defined by the actual noise variances, belongs to all hypersurfaces S(Σ1 ), S(Σ2 ), . . . , S(Σm ). It follows that all hypersufaces share the point P ∗ . Moreover, since θi (P ∗ ) = θi∗ , ∀i we have θ1 (P ∗ ) = θ1∗ , θ2 (P ∗ ) = θ2∗ , ∗ . . . . , θm (P ∗ ) = θm Appendix C. Proof of Theorem 3
˜ i (Pi ) ≥ 0, Σi − Σ where
AN US
2 2 2 ) ∈ S(Σi ) must satisfy the ,...,σ ˜m+r ,σ ˜m+1 ˜m The point Pi , (˜ σ12 , . . . , σ condition (25), i.e.
˜ i (Pi )) = 0, min eig Σi − Σ
(C.1)
ED
M
2 2 2 2 ˜ i (Pi ) = diag σ Σ ˜1 Iνi1 , . . . , σ ˜i2 Iνi +1 , . . . , σ ˜m Iνim , σ ˜m+1 Iνi +1 , . . . , σ ˜m+r Iνi +1 . (C.2) Moreover, since Pi belongs to ρ, there exists a scalar λ such that ξ = λ Pi ,
PT
It follows that
˜ i )) = det det (Σi − Σ(P
λ > 0.
(C.3)
1 ˜ξ Σi − Σi = 0, λ
(C.4)
AC
CE
˜ ξ is given by (33), or, equivalently where Σ i 1 −1 ˜ ξ −1 det Σi det I − Σi Σi = 0. λ
(C.5)
The scalar λ satisfying (C.5) is thus given by ˜ξ λ = λM i = max eig(Σ−1 i Σi ).
27
(C.6)
ACCEPTED MANUSCRIPT
References References
CR IP T
[1] J. C. Aguero, G. C. Goodwin, Identifiability of errors in variables dynamic systems, Automatica 44 (2008) 371–382.
[2] B. D. O. Anderson, M. Deistler, Identifiability of dynamic errors–in– variables models, Journal of Time Series Analysis 5 (1984) 1–13. [3] S. Beghelli, R. Guidorzi, U. Soverini, The Frisch scheme in dynamic system identification, Automatica 26 (1990) 171–176.
AN US
[4] S. Beghelli, P. Castaldi, R. Guidorzi, U. Soverini, A comparison between different model selection criteria in Frisch scheme identification, System Science Journal 20 (1994) 77–84.
[5] G. Bottegal, G. Picci, S. Pinzoni, On the identifiability of errors-invariables models with white measurement errors, Automatica 47 (2011) 545-551.
ED
M
[6] C. T. Chou, M. Verhaegen, Subspace algorithms for the identification of multivariable dynamic errors-in-variables models, Automatica 33 (1997) 1857–1869.
PT
[7] R. Diversi, R. Guidorzi, Combining the Frisch scheme and Yule–Walker equations for identifying multivariable errors–in–variables models, in: Proc. of the 19th International Symposium on Mathematical Theory of Networks and Systems (MTNS 2010), Budapest, Hungary, 2010, pp. 2407–2413.
AC
CE
[8] R. Diversi, R. Guidorzi, A covariance–matching criterion in the Frisch scheme identification of MIMO EIV models, in: Proc. of the 16th IFAC Symposium on System Identification, Brussels, Belgium, 2012, pp. 1647– 1652. [9] R. Diversi, R. Guidorzi, U. Soverini, Maximum likelihood identification of noisy input-output models, Automatica 43 (2007) 464-472.
[10] R. Diversi, R. Guidorzi, U. Soverini, Kalman filtering in extended noise environments, IEEE Transactions on Automatic Control 50 (2005) 1396– 1402. 28
ACCEPTED MANUSCRIPT
[11] R. Diversi, U. Soverini, Identification of errors-in-variables models as a quadratic eigenvalue problem, in: Proc. of the 12th European Control Conference (ECC 2013), Z¨ urich, Switzerland, 2013, pp. 1896–1901.
CR IP T
[12] M. Ekman, Identification of linear systems with errors in variables using separable nonlinear least squares, in: Proc. of the 16th IFAC World Congress, Prague, Czech Republic, 2005.
[13] R.Frisch, Statistical confluence analysis by means of complete regression systems, Technical Report 5, University of Oslo, Economic Institute, Oslo, Norway, 1934.
AN US
[14] R. Guidorzi, Identification of multivariable processes in the Frisch scheme context, MTNS’96, St. Louis, USA, 1996.
[15] R. Guidorzi, Invariants and canonical forms for system structural and parametric identification, Automatica 17 (1981) 117–133. [16] R. Guidorzi, Equivalence, invariance and dynamical system canonical modelling – Part I, Kybernetika 25 (1989) 233–257.
M
[17] R. Guidorzi, R. Diversi, U. Soverini, The Frisch scheme in algebraic and dynamic identification problems, Kybernetika 44 (2008) 585–616.
ED
[18] R. Guidorzi, M. Pierantoni, A new parametrization of Frisch scheme solutions, in: Proc. of the 12th International Conference on Systems Science, Wroclaw, Poland, 1995, pp. 114–120.
CE
PT
[19] R. E. Kalman, System identification from noisy data, in: A. R. Bednarek, L. Cesari (Eds.), Dynamical Systems II, Academic Press, 1982, pp. 135–164. [20] L. Ljung, System Identification – Theory for the User, Prentice Hall, Englewood Cliffs, NJ, 1999.
AC
[21] K. Mahata, An improved bias–compensation approach for errors–in– variables model identification, Automatica 43 (2007) 1339–1354. [22] M. Mossberg, T. S¨oderstr¨om, On covariance matching for multiple input multiple output errors–in–variables systems, in: Proc. of the 16th IFAC Symposium on System Identification, Brussels, Belgium, 2012, pp. 1371– 1376. 29
ACCEPTED MANUSCRIPT
[23] E. Nowak, Identifiability in multivariate dynamic linear errors-invariables models, Journal of the American Statistical Association 87 (1992) 714-723.
CR IP T
[24] R. Pintelon, J. Schoukens, Frequency domain maximum likelihood estimation of linear dynamic errors–in–variables models, Automatica 43 (2007) 621–630. [25] T. S¨oderstr¨om, Identification of stochastic linear systems in presence of input noise, Automatica 17 (1981) 713–725.
AN US
[26] T. S¨oderstr¨om, A generalized instrumental variable estimation method for errors–in–variables identification problems, Automatica 85 (2011) 287–303. [27] T. S¨oderstr¨om, Errors–in–Variables methods in system identification, Automatica 43 (2007) 939–958.
M
[28] T. S¨oderstr¨om, System identification for the errors-in-variables problem, Transactions of the Institute of Measurement and Control 34 (2012) 780– 792.
ED
[29] T. S¨oderstr¨om, A generalized instrumental variable estimator for multivariable errors–in–variables identification problems, International Journal of Control 85 (2012) 287–303.
PT
[30] T. S¨oderstr¨om, M. Mossberg, M. Hong, A covariance matching approach for identifying errors–in–variables systems, Automatica 45 (2009) 2018– 2031.
CE
[31] T. S¨oderstr¨om, P. Stoica, System Identification, Prentice Hall, Cambridge, UK, 1989.
AC
[32] P. Stoica, M. Cedervall, A. Eriksson, Combined instrumental variable and subspace fitting approach to parameter estimation of noisy input– output systems, IEEE Transactions on Signal Processing 43 (1995) 2386–2397. [33] K. J. Tugnait, Stochastic system identification with noisy input using cumulant statistics, IEEE Transactions on Automatic Control 37 (1992) 476–485. 30
ACCEPTED MANUSCRIPT
[34] S. Van Huffel S (Ed.), Recent Advances in Total Least Squares Techniques and Errors–in–Variables Modelling, SIAM, Philadelphia, PA, USA, 1997.
CR IP T
[35] S. Van Huffel, P. Lemmerling (Eds.), Total Least Squares Techniques and Errors–in–Variables Modelling: Analysis, Algorithms and Applications, Kluwer Academic Publishers: Dordrecht, The Netherlands, 2002. [36] S. Van Huffel, J. Vandewalle, Comparison of total least squares and instrumental variable methods for parameter estimation of transfer function models, International Journal of Control 50 (1989) 1039–1056.
AN US
[37] E. Zhang, R. Pintelon, J. Schoukens, Errors-in-variables identification of dynamic systems excited by arbitrary non-white input, Automatica 49 (2013) 3032–3041.
[38] W. X. Zheng, Transfer function estimation from noisy input and output data, International Journal of Adaptive Control and Signal Processing 12 (1998) 365–380.
AC
CE
PT
ED
M
[39] W. X. Zheng, A bias correction method for identification of linear dynamic errors-in-variables models, IEEE Transactions on Automatic Control 47 (2002) 1142–1147.
31