Copyright © IFAC Identification and System Parameter Estimation 1982, Washington D.C., USA, 1982
ON THE PROBLEM OF STRUCTURE SELECTION FOR THE IDENTIFICATION OF STATIONARY STOCHASTIC PROCESSES

M. Gevers and V. Wertz, Laboratoire d'Automatique et d'Analyse des Systemes, Louvain University, Batiment Maxwell, B-1348 Louvain-la-Neuve, Belgium
ABSTRACT. Several "overlapping" but uniquely identifiable parametrizations can usually be fitted to the same multivariate stationary stochastic process. We show that these parametrizations are defined by a finite set of intrinsic invariants, and that they all give the same value to the determinant of the Fisher information matrix.

INTRODUCTION. An important problem is that of determining the structure of the state-space or ARMA model for a multivariate stationary finite-dimensional stochastic process such that the model parameters become uniquely identifiable. Two different lines of thought have been followed for this problem. The first idea is to use canonical (state-space or ARMA) forms [1]-[2]. To any finite dimensional process one can associate in a unique way a set of "structural invariants" (e.g. the Kronecker invariants), which in turn uniquely determines a canonical form. The disadvantage of using canonical forms is that if those structural invariants are wrongly estimated, then the parameter estimation problem becomes ill-conditioned. In recent years an alternative approach has been proposed, namely that of using "overlapping parametrizations" [3]-[8]. It has been recognized that the set of all finite dimensional systems can be represented by a finite number of parametrizations, each parametrization being uniquely identifiable. To each parametrization there corresponds a set of integers called "structure indices". Each of these parametrizations is able to represent almost all finite dimensional systems; each system can normally be represented by more than one such parametrization, and any two parametrizations for a given process are related by a linear transformation which corresponds to a coordinate transformation in Euclidean space; hence the name "overlapping" parametrizations.
Now, because a process can be represented in more than one overlapping form, the question naturally arises as to whether, for a given data set, any such form is better than the others in a statistical or numerical sense. So far no definite answer is available to this question. Different procedures have been proposed that select, out of several candidate parametrizations, the one considered best in some ad hoc sense [4]-[7]. In this paper we first show that, for an n-th
order process with a p-dimensional observation vector y_t, the parameters of any overlapping parametrization (either in state-space or in ARMA form) are obtained from a set of 2np "intrinsic invariants" which are determined from the Hankel matrix of impulse responses. To any choice of p structure indices there corresponds, for a given system, a set of 2np parameters which completely specify this system. The choice of the p structure indices determines in which particular local coordinate space the system is described. From these 2np "intrinsic invariants", a unique state-space or ARMA parametrization can then be derived; these will belong to the set of overlapping parametrizations. Next we compare different overlapping parametrizations in terms of asymptotic accuracy of the parameter estimates. We show that, if the determinant of the Fisher information matrix is used as a measure of asymptotic efficiency, then all overlapping parametrizations describing the same process are equivalent, in the sense that they will give the same value to this criterion. Our result implies that if a process is modelled in state-space or ARMA form using a prediction error method, then the determinant of the covariance matrix of the parameter estimates will asymptotically be the same, whichever overlapping parametrization is used.

PARAMETRIZATION OF MULTIVARIATE SYSTEMS.
We consider throughout this paper a p-dimensional stationary full rank zero-mean stochastic process {y_t} with rational spectrum. Then it is well known that {y_t} can be described, up to second order statistics, by the following finite-dimensional representations:
State-space representation:

    x_{t+1} = F x_t + K e_t
    y_t = H x_t + e_t                                            (2.1)

where the state x_t is an n-dimensional vector; F, K and H are matrices of dimensions n×n, n×p and p×n; F has all its eigenvalues strictly inside the unit circle; and {e_t} is a p-dimensional white noise sequence with covariance matrix Q.

Input-output representation (ARMA model):

    y_t + A_1 y_{t-1} + ... + A_r y_{t-r} = e_t + B_1 e_{t-1} + ... + B_s e_{t-s}    (2.2a)

where A_1, ..., A_r, B_1, ..., B_s are p×p matrices and e_t is as before. This representation is equivalent to

    A(z) y_t = B(z) e_t                                          (2.2b)

where A(z) and B(z) are square polynomial matrices in the variable z (z is the advance operator: z y_t = y_{t+1}), with det A(z) ≠ 0 for |z| > 1 and lim_{z→∞} A^{-1}(z) B(z) = I.

The impulse response matrices H_i (see (2.3) below) are related to these two representations by

    H_0 = I,   H_i = H F^{i-1} K,   i = 1, 2, ...                (2.4a)

    H(z) = Σ_{i=0}^∞ H_i z^{-i} = A^{-1}(z) B(z)                 (2.4b)
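As a present-day numerical sketch of relation (2.4a) (the matrices, dimensions and random seed below are illustrative choices, not from the paper), one can generate a stable triple (F, K, H) and read off the Markov parameters:

```python
import numpy as np

# Hypothetical 2-output, 3-state example; F is rescaled so that all
# its eigenvalues lie strictly inside the unit circle, as required.
rng = np.random.default_rng(0)
n, p = 3, 2
F = rng.standard_normal((n, n))
F *= 0.9 / max(abs(np.linalg.eigvals(F)))
K = rng.standard_normal((n, p))
H = rng.standard_normal((p, n))

def markov(F, K, H, N):
    """Markov parameters of (2.1): H_0 = I, H_i = H F^{i-1} K."""
    out = [np.eye(p)]
    Fi = np.eye(n)
    for _ in range(N):
        out.append(H @ Fi @ K)
        Fi = Fi @ F
    return out

# Cross-check H_1 against the response of (2.1) to a unit impulse
# e_0 in the first noise channel: the state after the impulse is
# K[:, 0], so the first column of H_1 must equal H @ K[:, 0].
Hs = markov(F, K, H, 5)
assert np.allclose(Hs[1][:, 0], H @ K[:, 0])
```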
Without any loss of generality, we can make the following assumptions regarding these two representations.

Assumption 1a: The matrix triple (H,F,K) is of minimal order n, where n is the dimension of the state vector x_t, i.e. the pair (H,F) is observable and the pair (F,K) is controllable. n is then called the order of the process.

Assumption 1b: The polynomial matrices A(z) and B(z) are left coprime. It can then be shown that deg det A(z) = n, the order of the process.

Definition 1a: The set of all minimal triples (H,F,K) of order n will be denoted by S_n.

Definition 1b: The set of all left coprime polynomial pairs (A(z),B(z)) with deg det A(z) = n will be denoted by S_n^a.

Eliminating x_t in (2.1), or premultiplying (2.2b) by A^{-1}(z), leads to a third representation for the process {y_t}:

    y_t = Σ_{i=0}^∞ H_i e_{t-i}                                  (2.3)

where the p×p matrices H_i are called impulse response matrices (or Markov parameters). The infinite matrix H_∞ is defined as H_∞ = [H_0 H_1 H_2 ...], with H_0 = I_p, and H(z) = Σ_{i=0}^∞ H_i z^{-i} is analytic in |z| ≥ 1. The infinite column vector E_t is defined as E_t = [e_t^T, e_{t-1}^T, e_{t-2}^T, ...]^T. In addition we shall assume that the inverse C(z) = H^{-1}(z) = Σ_{i=0}^∞ C_i z^{-i} exists and is analytic in |z| ≥ 1. Then {e_t} is called the innovation process of {y_t}, and e_t = y_t - ŷ_{t|t-1}, where ŷ_{t|t-1} is the linear least squares predictor of y_t given the past history y^{t-1} of {y_t}.

The impulse response representation (2.3) completely specifies the second-order statistics of the process {y_t}, namely the covariance function

    R_y(k) = E{y_t y_{t-k}^T},   k = 0, 1, ...

We can now define identifiability up to second order statistics. Let θ be the vector of parameters in either the triple (H,F,K) or the pair (A(z),B(z)), and let Q be the covariance matrix of the white noise sequence {e_t} in either (2.1) or (2.2).

Definition 2: Two parameter pairs (θ_1,Q_1) and (θ_2,Q_2) are undistinguishable if and only if

    R_y(k; θ_1, Q_1) = R_y(k; θ_2, Q_2),   ∀ k ≥ 0               (2.5)

where R_y(k; θ_i, Q_i) is the covariance function of the process {y_t} generated by model i.

Note that if {y_t} is Gaussian (or if a second order identification method is used, such as a prediction error method), then the probability law (or the loss function) is completely determined by the second order moments, and Definition 2 can be replaced by the following.

Definition 2': For a Gaussian process, two parameter pairs (θ_1,Q_1) and (θ_2,Q_2) are undistinguishable if and only if

    p(y_0^T; θ_1, Q_1) = p(y_0^T; θ_2, Q_2),   ∀ y_0^T and ∀ T > 0    (2.6)

Now it is easy to show that (θ_1,Q_1) and (θ_2,Q_2) are undistinguishable iff

    H_i(θ_1) = H_i(θ_2),  i = 0, 1, 2, ...,  and  Q_1 = Q_2      (2.7)

Because Q_1 = Q_2, we shall in the sequel drop the explicit dependence of R_y(k) or p(y_0^T) on Q. The undistinguishability concept induces an equivalence relation on the sets S_n and S_n^a, which we shall denote by the symbol ~. It follows from (2.4) and (2.7) that

    θ_1 ~ θ_2  ⟺  H_i(θ_1) = H_i(θ_2)  ∀ i                       (2.8)

    ⟺  H_2 = H_1 T,  F_2 = T^{-1} F_1 T,  K_2 = T^{-1} K_1,
       for some nonsingular matrix T                             (2.9)

    ⟺  A_2(z) = M(z) A_1(z),  B_2(z) = M(z) B_1(z),
       for some unimodular matrix M(z)                           (2.10)

Two matrix triples (H_1, F_1, K_1) and (H_2, F_2, K_2) (resp. two polynomial matrix pairs (A_1(z), B_1(z)) and (A_2(z), B_2(z))) are called equivalent if the relations (2.9) (resp. (2.10)) hold. The covariance function (or the probability law in the Gaussian case) of the process {y_t} is completely determined by specifying (H, F, K, Q) or (A(z), B(z), Q). But because of the nonuniqueness induced by (2.8)-(2.10), in order to achieve identifiability we have to find a reparametrization of the family R_y(k; θ) or p(y_0^T; θ) in such a way that two different sets of parameters (in the reparametrized set) correspond to two different sequences of Markov parameters. From now on we shall, for simplicity, assume that the process {y_t} is Gaussian; identifiability is then defined by Definition 2'. All statements hold, up to second order statistics, for non-Gaussian processes if p(y_0^T; θ) is replaced by R_y(k; θ). What is needed to achieve identifiability is a factorization of the map p : θ → p(.; θ) in the following way:
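The nonuniqueness expressed by (2.9) can be checked numerically: a similarity transformation leaves every Markov parameter, hence (for the same Q) the whole covariance function, unchanged. The matrices below are arbitrary illustrative values, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 2
F1 = 0.5 * rng.standard_normal((n, n))
K1 = rng.standard_normal((n, p))
H1 = rng.standard_normal((p, n))
T = rng.standard_normal((n, n))            # any nonsingular T

# The equivalent triple of (2.9)
Tinv = np.linalg.inv(T)
H2, F2, K2 = H1 @ T, Tinv @ F1 @ T, Tinv @ K1

# H_i(theta_1) = H_i(theta_2) for all i, as in (2.8)
for i in range(1, 6):
    M1 = H1 @ np.linalg.matrix_power(F1, i - 1) @ K1
    M2 = H2 @ np.linalg.matrix_power(F2, i - 1) @ K2
    assert np.allclose(M1, M2)
```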
    p : θ ∈ S_n  --f-->  ξ ∈ X_n  --p̄-->  p(.; ξ) ∈ P_n         (2.11)

Here S_n is either S_n or S_n^a (see Definitions 1 above); p is the map defined by the probability law; P_n is the image of p. The set X_n and the functions f : S_n → X_n and p̄ : X_n → P_n must satisfy the following conditions:

a) for each θ ∈ S_n, ξ = f(θ) is finite-dimensional              (2.12a)
b) p(.; θ) = p̄(.; f(θ)) for all θ ∈ S_n                         (2.12b)
c) p̄(.; ξ_1) = p̄(.; ξ_2)  ⟹  ξ_1 = ξ_2                        (2.12c)

The function f consists of a finite number of scalar components, say f_1, ..., f_k, which form a complete system of invariants (see [2]) for the equivalence relation (2.8), since by b) and c):

    θ_1 ~ θ_2  ⟺  f(θ_1) = f(θ_2)                                (2.13)

The set X_n can be identified with the quotient sets S_n/~ or S_n^a/~, or, equivalently, with the class of all impulse response sequences H_∞ admitting a minimal realization of order n. Now it can be shown that, when p > 1, no single parametrization is able to describe all n-th order systems. Rather, X_n can be described by a family of (n-1 choose p-1) overlapping parametrizations of dimension 2np. Each set of 2np invariants (i.e. each local parametrization) is defined by specifying p integer valued numbers n_1, ..., n_p called "structure indices":

    f_{n_1,...,n_p} : S_n(n_1,...,n_p) → X,  with X ⊂ R^{2np}    (2.14)

where S_n(n_1,...,n_p) is a subset of S_n. Each map (2.14) is locally a complete system of surjective invariants of dimension 2np; the subsets S_n(n_1,...,n_p) for all possible choices of n_1, ..., n_p overlap, and cover S_n.

In the next section we shall define the structure indices and show how a choice of structure indices n_1, ..., n_p defines a set of 2np invariants f_{n_1,...,n_p}. These 2np invariants are computable functions of the impulse response matrices H_0, H_1, H_2, ... and will be called intrinsic invariants. We shall then show how to construct overlapping state-space or ARMA parametrizations as a function of the 2np intrinsic invariants.

CONSTRUCTION OF A COMPLETE SYSTEM OF INVARIANTS.

From (2.3) we can write the linear least squares k-step ahead predictor ŷ_{t+k|t} as follows:

    ŷ_{t+k|t} = Σ_{i=k}^∞ H_i e_{t+k-i},   k = 0, 1, 2, ...      (3.1)

Therefore,

    [ŷ_{t+1|t}]   [H_1 H_2 H_3 ...] [e_t    ]
    [ŷ_{t+2|t}] = [H_2 H_3 H_4 ...] [e_{t-1}]  =  𝓗 E_t         (3.2)
    [ŷ_{t+3|t}]   [H_3 H_4 H_5 ...] [e_{t-2}]
    [   ...   ]   [      ...      ] [  ...  ]

Since the process is of order n, the rank of the Hankel matrix 𝓗 is n. Therefore we can choose a set of linearly independent rows of 𝓗, which will form a basis for the whole row space of 𝓗. Now the structure indices will define which rows of 𝓗 form the basis, and we shall show how to construct a set of 2np invariants from 𝓗 for a given choice of structure indices.

To any choice of n linearly independent rows of 𝓗 we shall associate a multiindex i = (i_1, ..., i_n), where the numbers i_1, ..., i_n, arranged in increasing order, are the indices of the rows of 𝓗 that form the basis. Two restrictive conditions will be imposed on the selection of the basis rows:

Condition 1: if j ∈ i, then j - p ∈ i (for j > p).
Condition 2: 1, 2, ..., p ∈ i.

Condition 1 follows from the structure of the Hankel matrix: if the (j-p)-th row of 𝓗 is in the span of the preceding rows, so is the j-th row. Condition 2 results from the full rank assumption on {y_t}: it follows that the p components of ŷ_{t+1|t} are linearly independent.
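The rank statement underlying the basis selection can be illustrated numerically. In this sketch (an illustrative random minimal system and a finite truncation of 𝓗, both our own choices), the block-Hankel matrix of (3.2) built from the Markov parameters has rank exactly n:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 3, 2
F = rng.standard_normal((n, n))
F *= 0.9 / max(abs(np.linalg.eigvals(F)))   # stable F
K = rng.standard_normal((n, p))
H = rng.standard_normal((p, n))

N = 2 * n   # truncation depth: N block rows and N block columns
Hi = [H @ np.linalg.matrix_power(F, i - 1) @ K for i in range(1, 2 * N)]

# Block (r, c) of the Hankel matrix is H_{r+c+1}, as in (3.2)
Hank = np.block([[Hi[r + c] for c in range(N)] for r in range(N)])
assert np.linalg.matrix_rank(Hank) == n
```

For a generically chosen (F, K, H) the realization is minimal, so the rank equals the state dimension n whatever the truncation depth beyond n.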
Definition: If the selection of the basis rows obeys Conditions 1 and 2, then the corresponding multiindex is called "nice".

All nice multiindices correspond to a choice of the basis inside the first n+1 row blocks of 𝓗. For given n and p, there are (n-1 choose p-1) possible nice multiindices. (Example: for a 2-dimensional vector process (p = 2) of order 3 (n = 3), there are only 2 choices: i_1 = (1, 2, 3) and i_2 = (1, 2, 4).) There are however subsets of S_n for which only one basis exists.

Structure indices. Let i = (i_1, ..., i_n) be a nice multiindex defining a basis for the rows of 𝓗. For k = 1, ..., p, let n_k be the least natural number such that (k + n_k p) ∉ i. Then n_1, ..., n_p are called the "structure indices" corresponding to that basis; they specify which rows of 𝓗 are taken in the basis. Note that Σ_{k=1}^p n_k = n. We can now define S_n(n_1, ..., n_p) as the set of all n-th order systems for which the n rows specified by n_1, ..., n_p in the Hankel matrix are linearly independent. S_n(n_1, ..., n_p) is a proper subset of S_n.

Consider now an element of S_n(n_1, ..., n_p) specified by its Hankel matrix 𝓗. We shall construct a complete system of 2np surjective invariants for this system, i.e. a reparametrization of this system using 2np parameters. Let H^i be the i-th block of p rows of the infinite Hankel matrix 𝓗 (e.g. H^2 = [H_2 H_3 H_4 ...]) and let

    H^i = [h_1(i)^T ... h_p(i)^T]^T                              (3.3)

where the h_j(i) are rows of infinite length. Since 𝓗 is an element of S_n(n_1, ..., n_p), the rows h_1(n_1+1), ..., h_p(n_p+1) can be expressed as:

    h_i(n_i+1) = Σ_{j=1}^p Σ_{k=1}^{n_j} α_{ijk} h_j(k),   i = 1, ..., p    (3.4)

These relations define np scalar numbers α_{ijk}. Now denote by h_{ij}(k) the element in row i, column j of H_k. Then the 2np numbers

    {α_{ijk}, k = 1, ..., n_j ;  h_{ij}(k), k = 1, ..., n_i ;  i, j = 1, ..., p}    (3.5)

completely coordinatize S_n(n_1, ..., n_p), i.e. they map that set, in a one-to-one manner, on a Euclidean space of dimension 2np. The impulse response sequence H_1, H_2, H_3, ... is completely specified by the p structure indices and these 2np numbers. These 2np numbers constitute a complete system of surjective invariants, which will be called "intrinsic invariants" of the process. In the notation of Section 2:

    f_{n_1,...,n_p} : S_n(n_1,...,n_p) → R^{2np},
    {H_1, H_2, ...} → f_{n_1,...,n_p} = {α_{ijk}, h_{ij}(k)}     (3.6)

We have thus constructed a family of functions f_{n_1,...,n_p}, each of which is a complete system of surjective invariants mapping onto R^{2np}. These functions are defined on the overlapping subsets S_n(n_1,...,n_p) which cover S_n. Next we show that one can construct corresponding overlapping (state-space or ARMA) parametrizations whose parameters are functions of the 2np intrinsic invariants just defined.

State-space parametrization. Consider an element of S_n(n_1,...,n_p) for a given set of structure indices, and let {α_{ijk}, h_{ij}(k)} be the intrinsic invariants of that element. Then the following is a state-space representation of that element:

    H = [H_{ij}],  with H_{ii} = (1 0 ... 0), a 1×n_i row, and H_{ij} = 0 for i ≠ j    (3.7a)

    F = [F_{ij}],  i, j = 1, ..., p                              (3.7b)

where the n_i × n_i diagonal block F_{ii} has ones on its superdiagonal, zeros elsewhere except its last row, which is (α_{ii1} ... α_{iin_i}), and the n_i × n_j off-diagonal block F_{ij} (i ≠ j) is zero except for its last row, which is (α_{ij1} ... α_{ijn_j});

    K = [K_1^T ... K_p^T]^T,  with K_i = [k_{i1}^T ... k_{in_i}^T]^T
    and k_{ij} = (h_{i1}(j), ..., h_{ip}(j)), a row p-vector     (3.7c)

The proof is a straightforward but tedious verification that the relations (2.4a) hold with the α_{ijk} and h_{ij}(k) defined from the H_i as in (3.4)-(3.5).
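The map from a nice multiindex to structure indices, and the companion-type block layout of F in (3.7b), can be sketched as follows. Function names and numerical values are ours, and the block layout follows the reconstruction above:

```python
import numpy as np

def structure_indices(multiindex, p):
    """n_k is the least natural number such that row k + n_k * p is
    NOT in the basis (rows numbered from 1), as defined in the text."""
    rows = set(multiindex)
    ns = []
    for k in range(1, p + 1):
        nk = 0
        while k + nk * p in rows:
            nk += 1
        ns.append(nk)
    return ns

def build_F(alphas, ns):
    """Assemble F as in (3.7b): each diagonal block has ones on its
    superdiagonal, and every block carries its alphas in the last row.
    alphas[i][j] is the list [alpha_ij1, ..., alpha_ijn_j]."""
    p, n = len(ns), sum(ns)
    F = np.zeros((n, n))
    r0 = 0
    for i in range(p):
        c0 = 0
        for j in range(p):
            blk = F[r0:r0 + ns[i], c0:c0 + ns[j]]   # a view into F
            if i == j and ns[i] > 1:
                blk[:-1, 1:] = np.eye(ns[i] - 1)    # superdiagonal ones
            blk[-1, :] = alphas[i][j]               # alphas in last row
            c0 += ns[j]
        r0 += ns[i]
    return F

# The two nice multiindices of the p = 2, n = 3 example:
assert structure_indices((1, 2, 3), 2) == [2, 1]
assert structure_indices((1, 2, 4), 2) == [1, 2]

ns = [2, 1]
alphas = [[[0.1, 0.2], [0.3]], [[0.4, 0.5], [0.6]]]
F = build_F(alphas, ns)
assert F.shape == (3, 3) and F[0, 1] == 1.0   # superdiagonal one in block (1,1)
```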
ARMA parametrization. By an argument similar to that developed in [2], we obtain the following equations for the entries of A(z) and B(z) in (2.2b):

    a_{ii}(z) = z^{n_i} - α_{iin_i} z^{n_i - 1} - ... - α_{ii1}          (3.12a)
    a_{ij}(z) = - α_{ijn_j} z^{n_j - 1} - ... - α_{ij1},   i ≠ j         (3.12b)
    b_{ij}(z) = b_{ij,n̄+1} z^{n̄} + b_{ij,n̄} z^{n̄-1} + ... + b_{ij,1}   (3.12c)

with n̄ = max_{1≤i≤p} n_i. The α_{ijk} are defined by (3.4), while the b_{ijk} are defined as follows:

    B = M K̄                                                             (3.13)

where B is the matrix collecting the coefficients b_{ijk} (3.14), K̄ = [I_p^T K^T]^T is the (n+p) × p matrix obtained by stacking I_p on top of the matrix K of (3.7c) (3.15), and M = [M_{ij}], i, j = 1, ..., p (3.16), whose blocks are

    M_{ii} = [1  -α_{ii1} ... -α_{iin_i}  0 ... 0]
    M_{ij} = [0  -α_{ij1} ... -α_{ijn_j}  0 ... 0],  i ≠ j

We can now summarize the main points of this section by extending the scheme (2.11). For every element of S_n(n_1,...,n_p) we have first defined the intrinsic invariants f_{n_1,...,n_p}. Next, by (3.7), (3.12) and (3.13), we have established two bijections g_1(n_1,...,n_p) and g_2(n_1,...,n_p) between the intrinsic invariants (defined on X) and the quotient spaces S_n/~ and S_n^a/~. Hence, by a property of complete sets of surjective invariants (see, e.g., [2]), h_i(n_1,...,n_p) = g_i(n_1,...,n_p) ∘ f_{n_1,...,n_p} are also locally complete sets of surjective invariants. This leads to the following factorizations:

    S_n(n_1,...,n_p)  --f-->  X  --g_1-->  S_n/~                         (3.17)
    S_n(n_1,...,n_p)  --f-->  X  --g_2-->  S_n^a/~                       (3.18)

In (3.17)-(3.18) all quantities are to be indexed by (n_1,...,n_p). These factorizations of the probability map make the multivariable models identifiable. Given an arbitrary element (H, F, K) of S_n(n_1,...,n_p), which is parametrized by n² + 2np parameters that are not identifiable, we replace this element by an equivalent element of the quotient space S_n/~ via the map h_1(n_1,...,n_p) (see (3.17)-(3.18)). This last element is parametrized by the system of 2np invariants defined by h_1 and appearing in the form (3.7). These 2np invariants are uniquely identifiable. The same can be said if an ARMA form is used.

In most cases an n-th order system can be represented in more than one of the overlapping forms, because different choices of nice multiindices can be made, which will define different sets of linearly independent rows of the Hankel matrix.

Example: Consider a bivariate process (p = 2) of order 3 (n = 3). Two nice multiindices exist, with the corresponding sets of structure indices: n_1 = 2, n_2 = 1 or n_1 = 1, n_2 = 2. Now the state x_t of the state-space realization (2.1), with H, F, K defined by (3.7), is made up of the components of the predictor stack indexed by the elements of the selected nice multiindex (see e.g. [6]). In our example:

- for n_1 = 2, n_2 = 1 (i_1 = (1, 2, 3)):  x_t = [ŷ^1_{t|t-1}, ŷ^1_{t+1|t-1}, ŷ^2_{t|t-1}]^T
- for n_1 = 1, n_2 = 2 (i_2 = (1, 2, 4)):  x_t = [ŷ^1_{t|t-1}, ŷ^2_{t|t-1}, ŷ^2_{t+1|t-1}]^T

In most cases both choices are possible, but one might think that if ŷ^1_{t+1|t-1} is close to the linear span of ŷ^1_{t|t-1} and ŷ^2_{t|t-1}, then the choice of i_2 would be preferable, because the components of the state would be more orthogonal to one another, thereby making the ensuing parameter estimation problem numerically better behaved. More generally, the question is whether any one of the overlapping parametrizations is optimal in some sense. We present a partial answer to this question.

ASYMPTOTIC EQUIVALENCE OF ALL OVERLAPPING FORMS.
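The count of overlapping forms can be illustrated by enumerating nice multiindices directly: since Conditions 1 and 2 force n_k ≥ 1 and Σ n_k = n, nice multiindices correspond to compositions of n into p positive parts, which gives the (n-1 choose p-1) count. A small sketch (the helper name is ours):

```python
from itertools import combinations
from math import comb

def nice_multiindices(n, p):
    """All nice multiindices for order n and p outputs: each set of
    structure indices (n_1,...,n_p) with n_k >= 1 and sum n, yields
    the basis rows {k + m*p : 0 <= m < n_k, 1 <= k <= p}."""
    result = []
    # compositions of n into p positive parts, via p-1 cut points
    for cuts in combinations(range(1, n), p - 1):
        parts = [b - a for a, b in zip((0,) + cuts, cuts + (n,))]
        rows = sorted(k + 1 + m * p for k in range(p) for m in range(parts[k]))
        result.append(tuple(rows))
    return result

# The p = 2, n = 3 example of the text: exactly two nice multiindices
assert sorted(nice_multiindices(3, 2)) == [(1, 2, 3), (1, 2, 4)]
assert len(nice_multiindices(5, 3)) == comb(4, 2)   # (n-1 choose p-1)
```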
Theorem: Given the intrinsic invariants {α_{ijk}, h_{ij}(k)} and {α*_{ijk}, h*_{ij}(k)} corresponding to two different sets of structure indices n_1, ..., n_p and n*_1, ..., n*_p, the determinants of the information matrices corresponding to these two parametrizations are identical.

The proof of the theorem follows from the following lemma.

Lemma: Let {α_{ijk}, h_{ij}(k)} and {α*_{ijk}, h*_{ij}(k)} be the intrinsic invariants of a given n-th order system in S_n for two different choices of the structure indices. Then the Jacobian of the transformation between these two parameter vectors is unity.

The proof of the lemma can be found in [6]. The theorem can then be proved as follows. Let θ = {α_{ijk}, h_{ij}(k)} and θ* = {α*_{ijk}, h*_{ij}(k)}. The corresponding information matrices M_θ and M_θ* are related by

    M_θ* = E_{Y|θ*} {(∂log p(Y|θ*)/∂θ*)^T (∂log p(Y|θ*)/∂θ*)}
         = (∂θ/∂θ*)^T E_{Y|θ} {(∂log p(Y|θ)/∂θ)^T (∂log p(Y|θ)/∂θ)} (∂θ/∂θ*)
         = (∂θ/∂θ*)^T M_θ (∂θ/∂θ*)

Since, by the Lemma, det(∂θ/∂θ*) = 1, it follows that det M_θ* = det M_θ. It follows from the bijective relationship between the intrinsic invariants α_{ijk}, h_{ij}(k) and the corresponding overlapping parametrizations H, F, K or A(z), B(z) that the theorem also holds when two overlapping (state-space or ARMA) parametrizations are compared.
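The final step of the proof admits a simple numerical illustration (values are arbitrary, and J stands in for the Jacobian ∂θ/∂θ*): if M* = Jᵀ M J with |det J| = 1, then det M* = (det J)² det M = det M.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
A = rng.standard_normal((d, d))
M = A @ A.T + d * np.eye(d)               # a positive definite stand-in for M_theta
J = rng.standard_normal((d, d))
J /= abs(np.linalg.det(J)) ** (1.0 / d)   # rescale so |det J| = 1

M_star = J.T @ M @ J
assert np.isclose(abs(np.linalg.det(J)), 1.0)
assert np.isclose(np.linalg.det(M_star), np.linalg.det(M))
```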
Corollary: Given two overlapping parametrizations F, K, H and F*, K*, H* in the form (3.7) (resp. A(z), B(z) and A*(z), B*(z) in the form (3.12)) for the same process, corresponding to two different sets of structure indices {n_1, ..., n_p} and {n*_1, ..., n*_p}, the determinants of the information matrices corresponding to these two parametrizations are identical.

If the parameters are estimated using a maximum likelihood or a prediction error method, then the covariance matrix of the estimation errors is asymptotically equal to the inverse of the Fisher information matrix M_θ. Therefore all overlapping parametrizations are asymptotically equivalent, as far as the accuracy of the parameter estimates is concerned, when this accuracy is measured by the determinant of the covariance matrix of the estimation errors. Of course, other criteria could be used that might be able to discriminate, even asymptotically, between different structures; see e.g. [7]. Some structures might also be better than others when only a finite data record is available. In [4] and [6] some heuristic selection procedures have been proposed to handle the finite data situation.

Footnote: If x is a scalar and θ a column k-vector, then ∂x/∂θ denotes the row vector (∂x/∂θ_1, ..., ∂x/∂θ_k).
CONCLUSIONS. The problem of specifying identifiable parametric structures for multivariable systems can be solved by a factorization of the probability map in such a way as to define a finite set of invariants which completely characterize the process. Proceeding in this way, we have constructed a family of overlapping parametrizations which completely cover the set of finite-dimensional minimal-order systems. Since a given process can in general be represented by different overlapping parametrizations, the question then arises as to whether some parametrizations might yield more accurate parameter estimates than others. Our main result is that all overlapping parametrizations yield asymptotically the same value of the determinant of the information matrix. Therefore, when a prediction error identification method is used for the estimation of the parameters, all overlapping parametrizations will give the same value to the determinant of the asymptotic error covariance matrix.

REFERENCES.

[1] M.J. DENHAM, "Canonical Forms for the Identification of Multivariable Linear Systems", IEEE Trans. A.C., vol. 19, pp. 646-656, 1974.
[2] R.P. GUIDORZI, "Invariants and Canonical Forms for Systems Structural and Parametric Identification", Automatica, vol. 17, pp. 117-133, 1981.
[3] K. GLOVER, J.C. WILLEMS, "Parametrization of Linear Dynamical Systems, Canonical Forms and Identifiability", IEEE Trans. A.C., vol. 19, pp. 640-646, 1974.
[4] L. LJUNG, J. RISSANEN, "On Canonical Forms, Parameter Identifiability and the Concept of Complexity", 4th IFAC Symp. on Identification and System Parameter Estimation, Tbilisi, USSR, 1976.
[5] A.J.M. VAN OVERBEEK, L. LJUNG, "On Line Structure Selection for Multivariable State Space Models", 5th IFAC Symp. on Identification and System Parameter Estimation, Darmstadt, FRG, 1979.
[6] V. WERTZ, M. GEVERS, E. HANNAN, "The Determination of Optimum Structures for the State Space Representation of Multivariate Stochastic Processes", to appear in IEEE Trans. Autom. Control, Oct. 1982.
[7] J. RISSANEN, "Estimation of Structure by Minimum Description Length", Proc. Intern. Workshop on Rational Approximations for Systems, Leuven, Belgium, 1981.
[8] G. PICCI, "Some Numerical Aspects of Multivariable Systems Identification", Proc. of the Workshop on "Numerical Methods for System Engineering Problems", Lexington, Kentucky, June 1980.