Cross-validation Ideas in Model Structure Selection for Multivariable Systems


Copyright © IFAC Identification and System Parameter Estimation, Beijing, PRC 1988

P. Janssen*, P. Stoica**, T. Söderström*** and P. Eykhoff*

*Department of Electrical Engineering, Eindhoven University of Technology (EUT), P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands
**Facultatea de Automatica, Institutul Politehnic Bucuresti, Splaiul Independentei 313, R-77206 Bucuresti, Romania
***Uppsala University, Institute of Technology, P.O. Box 534, S-751 21 Uppsala, Sweden

Abstract: Using cross-validation ideas, two procedures are proposed for making a choice between different model structures used for (approximate) modelling of multivariable systems. The procedures are derived under fairly general conditions: the 'true' system does not need to be contained in the model set; model structures do not need to be nested and different criteria may be used for model estimation and validation. The proposed structure selection rules are invariant to parameter scaling. Under certain conditions (essentially requiring that the system belongs to the model set and that the maximum likelihood method is used for parameter estimation) they are shown to be asymptotically equivalent to the (generalized) Akaike structure selection criteria.

Keywords: Structure selection, cross-validation, multivariable systems.

1. INTRODUCTION

When identifying dynamical systems a central issue is the choice of the model structure which will be used for representing/approximating the system under study. Many researchers have approached this topic and a multitude of methods for choosing the model structure has been proposed (see e.g. Stoica et al. (1986) for a recent overview).

Most of the proposed methods assume that the 'true' system belongs to one of the candidate model structures and try to select a 'right' structure. In practice, however, this assumption is unlikely to be fulfilled and all we can hope for is to select a model (structure) giving a suitable approximation of those system features in which we are interested. Therefore we would like to view the model structure selection problem as choosing, within a set of candidate structures, the 'best' one according to a certain criterion, expressing the intended (future) use of the model.

In this context the concept of cross-validation or cross-checking (see e.g. Stone (1974)) would be an appealing guiding principle. Roughly stated, cross-validation comes down to a division of the experimental data set into two subsets, one to be used for estimation of the model, and the other one to be used for evaluation of the performance of the model (i.e. for validation), hereby reflecting the fact that one often wants to use the model on a data set different from the one used for estimation. In this way one can assess the performance of various candidate model structures and thereby select a 'best' one.

Based on these ideas Stoica et al. (1986) proposed two cross-validation criteria for model structure selection. The assumptions made for deriving these criteria were fairly general (e.g. the system does not need to belong to the model set; model structures do not need to be nested), and the resulting procedures were invariant to parameter-scale changes. Moreover, it was shown that these criteria are asymptotically equivalent to some well-known structure selection criteria if additional assumptions are made (implying, in fact, the requirement that the system belongs to the model set). These results were presented for single-output systems and for residual sum-of-squares parameter estimation criteria.

The aim of this study is to generalize these results in three directions. We will consider multivariable systems and general parameter estimation criteria. Moreover, we will allow that the criterion used for validation differs from the criterion used for estimation.

The outline of the paper is as follows: in section 2 some basic assumptions are introduced. In section 3 we present two cross-validation criteria which are extensions of the proposals in Stoica et al. (1986). Some asymptotic results for these criteria are given in section 4. Section 5 presents some concluding remarks.

2. PRELIMINARIES AND BASIC ASSUMPTIONS

The system that generated the data is denoted by S; it is assumed that the data are realizations of stationary ergodic processes. Let M(θ) denote a model for representing/approximating S, where θ is a finite-dimensional vector of unknown parameters, θ ∈ D_M ⊂ R^{n_θ}. We will assume that the estimate, say θ̂, of the unknown parameter vector θ of M(θ) in the model set M is obtained as

θ̂ = arg min_{θ∈D_M} V(θ)    (2.1)

where

V(θ) := (1/N) Σ_{t=1}^{N} l(t,θ,ε(t,θ)).
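As a computational illustration of (2.1), the sketch below fits a hypothetical scalar ARX(1,1) model by minimizing V(θ) with the quadratic measure l(t,θ,ε) = ε². The simulated data, the model orders and the use of scipy's Nelder-Mead optimizer are assumptions made only for this example; they are not prescribed by the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data from a first-order system (placeholder for measured records).
rng = np.random.default_rng(0)
N = 200
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 0.5 * u[t - 1] + 0.1 * rng.standard_normal()

def eps(theta, y, u):
    """Estimation residual eps(t, theta): equation error of an ARX(1,1) model."""
    a, b = theta
    e = np.zeros_like(y)
    e[1:] = y[1:] - a * y[:-1] - b * u[:-1]
    return e

def V(theta, y, u):
    """Estimation criterion V(theta) = (1/N) sum_t l(t, theta, eps), here with l = eps**2."""
    return np.mean(eps(theta, y, u) ** 2)

# theta_hat = arg min V(theta), cf. (2.1).
theta_hat = minimize(V, x0=np.zeros(2), args=(y, u), method="Nelder-Mead").x
print("theta_hat =", theta_hat)
```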

Here ε(t,θ) ∈ R^q is the 'estimation residual' of M(θ) at time instant t, and N is the number of data points. l(t,θ,ε) is a scalar-valued measure, "measuring" the estimation residual ε. The performance of a specific model M(θ̂) in the set M has to be assessed with the intended use of the model in mind. In order to obtain more flexibility we allow that the criterion used for validation/performance assessment can differ from the criterion used for estimation (see e.g. Correa and Glover (1986), Gevers and Ljung (1986)). Let r(t,θ) ∈ R^h be the 'validation residual' associated with M(θ), i.e. the quantity used for validation (remark: h does not need to be equal to q). Using the scalar-valued measure f(t,θ,r) for "measuring" the validation residual r, we can define the average mis-performance of the model M(θ̂) over the set of validation data points I_v as

(1/N_v) Σ_{t∈I_v} f(t,θ̂,r(t,θ̂)).    (2.2)

Here N_v denotes the number of data points in I_v. For later use we will also define

J(θ) := (1/N) Σ_{t=1}^{N} f(t,θ,r(t,θ)).    (2.3)

In the sequel V_θ(θ), V_θθ(θ) and J_θ(θ) denote the first- and second-order derivatives of V(θ) and J(θ) with respect to θ.

The formulation given in (2.2)-(2.3) is rather general, and by appropriately defining r(t,θ) and f(t,θ,r) we can express various intended uses of the estimated model (e.g. one-step ahead prediction, multi-step ahead prediction, simulation etc.).

Remark 2.1: Although one could argue that, ideally, the intended use of the model should be reflected in the choice of the estimation residual ε(t,θ) and the function l(t,θ,ε), we have chosen here (unlike Stoica et al. (1986)) to make a distinction between estimation and validation criteria. This flexibility makes it possible to cover situations in which we estimate models by equation-error methods (which are computationally undemanding) and then assess their performance, for example, by output-error measures. Moreover, it offers us the possibility to treat estimation methods which do not minimize a criterion, such as instrumental variable methods. See Janssen et al. (1987) for details on this aspect. ***

Remark 2.2: The quantities in (2.1)-(2.3) should normally be indexed to show that they correspond to the model structure M, for example D_M, V_M(θ), ε_M(t,θ) etc. However, to simplify the notation we will omit the index M whenever there is no possibility of confusion. ***

Finally we introduce some regularity conditions that are assumed to hold throughout the paper.

Assumption 1: The functions l(t,θ,ε): R×R^{n_θ}×R^q → R and f(t,θ,r): R×R^{n_θ}×R^h → R are twice continuously differentiable with respect to (θ,ε) resp. (θ,r), and there exists some finite constant C bounding the growth of these functions and of their first- and second-order partial derivatives, uniformly in t and in θ ∈ D_M (here ‖·‖ denotes the Euclidean norm and l_ε(t,θ,ε) denotes the partial derivative of l with respect to ε, etc.). This condition is imposed in Ljung (1978) for obtaining results on the convergence of θ̂ in (2.1) when N tends to infinity. ***

Assumption 2: ε(t,θ) and r(t,θ) are sufficiently smooth functions of θ such that their derivatives with respect to θ exist and are finite for any θ ∈ D_M. ***

Assumption 3: The second-order derivative matrix V_θθ(θ̂) is positive definite. (This implies that θ̂ is an isolated minimum point of V(θ).) ***

Assumption 4: The functions l(t,θ,ε(t,θ)), f(t,θ,r(t,θ)) and their derivatives of first and second order with respect to θ are stationary ergodic processes for any θ ∈ D_M. Moreover we assume that the sample moments involving functions of the above processes converge to the theoretical moments as N tends to infinity at a rate of order O(1/√N). ***

These assumptions are all fairly weak; see Stoica et al. (1986) for further comments.
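To make the distinction between estimation and validation residuals concrete, the following sketch pairs an equation-error residual ε(t,θ) with an output-error (simulation) residual r(t,θ) for the same hypothetical ARX(1,1) model as above, so that l and f measure genuinely different quantities (cf. Remark 2.1). The model, the quadratic measures and the function names are illustrative assumptions only.

```python
import numpy as np

def eps(theta, y, u):
    """Estimation residual eps(t, theta): one-step-ahead (equation) error of ARX(1,1)."""
    a, b = theta
    e = np.zeros_like(y)
    e[1:] = y[1:] - a * y[:-1] - b * u[:-1]
    return e

def r_sim(theta, y, u):
    """Validation residual r(t, theta): measured output minus simulated output,
    where the simulation is driven by the input u only (output-error / simulation use)."""
    a, b = theta
    y_sim = np.zeros_like(y)
    for t in range(1, len(y)):
        y_sim[t] = a * y_sim[t - 1] + b * u[t - 1]
    return y - y_sim

def V(theta, y, u):
    """Estimation criterion of (2.1): sample average of l = eps**2."""
    return np.mean(eps(theta, y, u) ** 2)

def J(theta, y, u):
    """Validation measure of (2.3): sample average of f = r**2 over all N points."""
    return np.mean(r_sim(theta, y, u) ** 2)
```

With these choices V(θ) and J(θ) are in general minimized by different parameter values, which is exactly the situation the criteria of the next section are designed to handle.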

3. CROSS-VALIDATION CRITERIA FOR MULTIVARIABLE MODEL STRUCTURE SELECTION

As we indicated in the introduction, the basic idea behind cross-validation for model discrimination is to divide the data set into two subsets. One will be used to estimate the model. The other data set will be used to assess the performance of the estimated model (i.e. for validation). The validation has to be performed with the intended use of the model in mind. Based on this idea, two cross-validation criteria have been proposed in Stoica et al. (1986) for the case where ε(t,θ) is scalar and where l(t,θ,ε) = ε²; r(t,θ) = ε(t,θ); f(t,θ,r) = r². We will now extend these criteria to the general situation described in section 2.

Let I := {1,2,...,N} and

I_p := {(p-1)m+1, ..., pm},    p = 1,...,k-1    (3.1a)
I_k := {(k-1)m+1, ..., N}    (3.1b)

for some positive integer m and k = [N/m] ([x] denotes the largest integer not greater than x). Our first cross-validation criterion for assessing the model structure M is obtained by using the various subsets I_p for validation and the complementary sets I−I_p for estimation:

C_I := Σ_{p=1}^{k} Σ_{t∈I_p} f(t,θ̂_p, r(t,θ̂_p))    (3.2)

where

θ̂_p := arg min_{θ∈D_M} Σ_{t∈I−I_p} l(t,θ,ε(t,θ)).    (3.3)
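A brute-force evaluation of C_I according to (3.1)-(3.3), refitting the model once per subset I_p, could be organised as in the sketch below. The functions eps, r, l_meas and f_meas are placeholders for the user's choice of residuals and measures (for instance those of the earlier sketches), and theta0 is an assumed initial guess for the numerical optimizer; none of these names comes from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def make_subsets(N, m):
    """Index subsets I_p of (3.1a)-(3.1b), 0-based: k-1 blocks of length m, last block to N.
    Assumes N >= 2*m so that at least two subsets exist."""
    k = N // m
    subsets = [np.arange(p * m, (p + 1) * m) for p in range(k - 1)]
    subsets.append(np.arange((k - 1) * m, N))
    return subsets

def C_I(y, u, m, eps, r, l_meas, f_meas, theta0):
    """Brute-force C_I of (3.2): estimate on I - I_p as in (3.3), validate on I_p, sum over p."""
    N = len(y)
    total = 0.0
    for I_p in make_subsets(N, m):
        mask = np.ones(N, dtype=bool)
        mask[I_p] = False                       # estimation set I - I_p

        def V_p(theta):                         # criterion minimized in (3.3)
            return np.sum(l_meas(eps(theta, y, u)[mask]))

        theta_p = minimize(V_p, theta0, method="Nelder-Mead").x
        total += np.sum(f_meas(r(theta_p, y, u)[I_p]))
    return total
```

For example, C_I(y, u, 20, eps, r_sim, lambda e: e**2, lambda e: e**2, np.zeros(2)) would reuse the residual functions sketched in section 2. Refitting the model k times is exactly the cost that the approximation of Theorem 3.1 below avoids.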

Exact evaluation of C_I would be very time-consuming. Therefore an asymptotically valid approximation of C_I is derived that is much easier to compute.

Theorem 3.1: Let assumptions 1-4 be true. Then for k large enough we have

(1/N) C_I = C_1 + o(1/(k^2 √m))    (3.4)

where

C_1 := J(θ̂) + (1/N^2) Σ_{p=1}^{k} z_p^T(θ̂) V_θθ^{-1}(θ̂) w_p(θ̂) = J(θ̂) + (1/N^2) tr[V_θθ^{-1}(θ̂) W(θ̂)]    (3.5)

with

z_p(θ̂) := Σ_{t∈I_p} (∂/∂θ)[f(t,θ,r(t,θ))] |_{θ=θ̂},    p = 1,...,k    (3.6a)
w_p(θ̂) := Σ_{t∈I_p} (∂/∂θ)[l(t,θ,ε(t,θ))] |_{θ=θ̂},    p = 1,...,k    (3.6b)
W(θ̂) := Σ_{p=1}^{k} w_p(θ̂) z_p^T(θ̂).    (3.6c)

The above result holds for both 'large' and 'small' values of m.
Proof: See Janssen et al. (1987). ***

Since (cf. Janssen et al. (1987), remark 3.2) the terms neglected in (3.4), which involve J_θ(θ̂), are asymptotically negligible, the term C_1 in (3.4) will be an asymptotically valid approximation of (1/N) C_I. In general this approximate criterion C_1 is much easier to compute than C_I. Furthermore, if the parameter estimation is performed by minimizing V(θ) in (2.1) by a Newton-Raphson algorithm, then V_θθ^{-1}(θ̂) and w_p(θ̂) can be obtained easily from the last iteration of this algorithm. So if the criteria used for estimation and validation are identical (i.e. l(t,θ,ε(t,θ)) = f(t,θ,r(t,θ))), then C_1 can be evaluated with a modest computational effort. In all other cases some extra effort should be spent in order to compute J(θ̂) and z_p(θ̂). We now state our first structure selection rule, based on the (approximate) cross-validation criterion C_1.

First cross-validation model structure selection rule: Use the model structure which leads to the smallest value of C_1, where C_1 is defined by (3.5)-(3.6). ***

This procedure will depend on the selection of m. Some considerations on the choice of m are given in Stoica et al. (1986) and will not be repeated here.
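The approximate criterion C_1 in (3.5)-(3.6) needs only per-sample gradients evaluated at the single full-data estimate θ̂, together with the Hessian V_θθ(θ̂). The sketch below assumes the caller supplies grad_l and grad_f (N×n_θ arrays whose t-th rows are the gradients of l(t,θ,ε(t,θ)) and f(t,θ,r(t,θ)) with respect to θ at θ = θ̂), the value J_hat = J(θ̂) and the matrix V_hess = V_θθ(θ̂); how these are obtained (analytically, from the last Newton-Raphson iteration, or by numerical differentiation) is left to the user and is not specified here.

```python
import numpy as np

def C_1(J_hat, grad_l, grad_f, V_hess, m):
    """Approximate cross-validation criterion C_1 of (3.5)-(3.6).

    grad_l, grad_f : (N, n_theta) per-sample gradients of l and f at theta_hat.
    V_hess         : (n_theta, n_theta) matrix V_thetatheta(theta_hat).
    m              : block length used in (3.1a)-(3.1b).
    """
    N = grad_l.shape[0]
    k = N // m
    Vinv = np.linalg.inv(V_hess)
    penalty = 0.0
    for p in range(k):
        lo, hi = p * m, ((p + 1) * m if p < k - 1 else N)   # last block runs to N
        w_p = grad_l[lo:hi].sum(axis=0)                     # (3.6b)
        z_p = grad_f[lo:hi].sum(axis=0)                     # (3.6a)
        penalty += z_p @ Vinv @ w_p                         # z_p^T V_thetatheta^{-1} w_p
    return J_hat + penalty / N**2                           # (3.5)
```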

Next we will present a second cross-validation assessment criterion which is "complementary" to C_I in the sense that it uses the various subsets I_p for estimation and the corresponding subsets I−I_p for validation (as a result the length of the estimation subset, m, is now (much) smaller than the length of the validation subset, N−m). This criterion has the form:

C_II := Σ_{p=1}^{k} Σ_{t∈I−I_p} f(t,θ̂_p, r(t,θ̂_p))    (3.8)

where now θ̂_p is obtained from the data in I_p only:

θ̂_p := arg min_{θ∈D_M} Σ_{t∈I_p} l(t,θ,ε(t,θ)).    (3.9)

An asymptotically valid approximation of C_II is given in the following theorem.

Theorem 3.2: Let assumptions 1-4 be true. Then for m and k large enough, we have

(1/((k−1)N)) C_II = C_2 + O(‖J_θ(θ̂)‖/m) + o(1/min(N, m^{3/2}))    (3.10)

where

C_2 := J(θ̂) + (k/(2N^2)) tr[Q(θ̂) V_θθ^{-1}(θ̂)]    (3.11)

with

Q(θ̂) := Σ_{p=1}^{k} w_p(θ̂) w_p^T(θ̂)    (3.12)

and w_p(θ̂) given by (3.6b). In cases where V(θ) = J(θ) (i.e. validation criterion = estimation criterion) we obtain:

(1/((k−1)N)) C_II = C_2 + o(1/min(N, m^{3/2}))    (3.13)

where

C_2 = V(θ̂) + (k/(2N^2)) tr[Q(θ̂) V_θθ^{-1}(θ̂)] = V(θ̂) + (k/(2N^2)) Σ_{p=1}^{k} w_p^T(θ̂) V_θθ^{-1}(θ̂) w_p(θ̂).    (3.14)

Proof: See Janssen et al. (1987). ***

The second term in expression (3.11) has order O(1/m). The criterion C_2 can therefore be used as an asymptotically valid approximation of (1/((k−1)N)) C_II only if J_θ(θ̂) is of order o(1). In general, this will only be the case if we use the same criterion for validation and estimation (i.e. V(θ) = J(θ)) or if the system belongs to the model set (see remark 3.3 in Janssen et al. (1987)). Based on the approximation (3.13)-(3.14) of (1/((k−1)N)) C_II we can now state a second model-structure selection rule.

Second cross-validation model-structure selection rule: (This selection rule should only be used in situations where J_θ(θ̂) is of order o(1), e.g. if J(θ) = V(θ).) Choose the model structure that leads to the smallest value of C_2, where C_2 is defined by (3.11). ***
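The criterion C_2 of (3.11)-(3.12) can be assembled from the same ingredients as C_1; only the gradients w_p of the estimation measure are needed. As before, grad_l, J_hat and V_hess are names assumed to be supplied by the caller, and in the special case V(θ) = J(θ) one would simply pass J_hat = V(θ̂), cf. (3.13)-(3.14).

```python
import numpy as np

def C_2(J_hat, grad_l, V_hess, m):
    """Approximate cross-validation criterion C_2 of (3.11)-(3.12)."""
    N = grad_l.shape[0]
    k = N // m
    Vinv = np.linalg.inv(V_hess)
    Q = np.zeros_like(V_hess)
    for p in range(k):
        lo, hi = p * m, ((p + 1) * m if p < k - 1 else N)
        w_p = grad_l[lo:hi].sum(axis=0)                     # (3.6b)
        Q += np.outer(w_p, w_p)                             # (3.12)
    return J_hat + k * np.trace(Q @ Vinv) / (2.0 * N**2)    # (3.11)
```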


Note that C_2 depends, amongst other things, on k and m. For a discussion on the choice of these parameters we refer to Stoica et al. (1986).

Remark 3.1: Choosing r(t,θ) = ε(t,θ), l(t,θ,ε) = f(t,θ,ε) = ε² for scalar ε, the asymptotic results in theorems 3.1 and 5.1 of Stoica et al. (1986) are easily seen to be special cases of theorems 3.1 and 3.2 presented above. ***

Remark 3.2: The criteria C_1 resp. C_2, in (3.5) resp. (3.11), consist of the main contribution J(θ̂) and a penalty term having order O(1/(km)) resp. O(1/m) (cf. remark 3.5 in Janssen et al. (1987)). Moreover, note that for nested model structures M_1, M_2 with M_1 ⊂ M_2 we will necessarily have V_{M_1}(θ̂_{M_1}) ≥ V_{M_2}(θ̂_{M_2}). If J(θ) ≠ V(θ), then a similar inequality does not need to hold for J(θ̂): J_{M_1}(θ̂_{M_1}) can possibly be smaller than J_{M_2}(θ̂_{M_2}). Furthermore, if the system does not belong to the model set then, in general, J_{M_1}(θ̂_{M_1}) − J_{M_2}(θ̂_{M_2}) will be O(1). In such a case the 'best' model structure can be selected by 'minimizing' J(θ̂) over the set of candidate structures; the second term in C_1 (and C_2) will asymptotically have no influence on the choice of the model structure and can therefore be neglected. This, of course, simplifies the structure selection procedure to a great extent. However, in other cases, for example if the system is close to the model set or belongs to it, the second term in C_1 (and C_2) needs to be considered when choosing the 'best' structure (since in such a case the difference J_{M_1}(θ̂_{M_1}) − J_{M_2}(θ̂_{M_2}) may be O(1/√N)). ***

Summarizing, we have presented two cross-validation model-structure selection rules for multivariable systems which are generalizations of the proposals in Stoica et al. (1986). These rules are invariant to scaling of the parameters (see Janssen et al. (1987)) and are applicable to non-nested model structures. Moreover, the proposed structure selection rules are clearly seen to depend on the estimation criterion and the quantities used for validation. They necessitate the estimation of the parameters in the various candidate model structures, and can therefore be classified as a posteriori methods. In the next section we will present, under additional assumptions, some asymptotic results for the cross-validation criteria introduced above.

4. SOME ASYMPTOTIC RESULTS FOR THE PROPOSED CROSS-VALIDATION CRITERIA

In Stoica et al. (1986) it was shown under some additional conditions (in fact coming down to requiring that the system belongs to the model set) that the proposed cross-validation criteria are asymptotically equivalent to the well-known (generalized) Akaike criteria. In this section similar asymptotic results will be presented for our generalized cross-validation criteria (3.2) and (3.8). In order not to make our assumptions and results too intricate, we will focus on the situation where the validation criterion is identical to the estimation criterion, and where the function l(t,θ,ε) is the negative logarithm of the Gaussian probability density function p(θ,ε) given by

p(θ,ε) = [(2π)^q det Λ(θ)]^{-1/2} exp[−(1/2) ε^T Λ^{-1}(θ) ε].    (4.1)

In (4.1), Λ(θ) is a q×q positive definite matrix which may depend on the parameter vector θ. So we assume that

r(t,θ) = ε(t,θ)    (4.2a)
f(t,θ,ε) = l(t,θ,ε) := −log p(θ,ε).    (4.2b)

Ljung and Caines (1979) showed that under weak conditions (implied by our assumptions 2-4), as N tends to infinity,

θ̂ → θ*  (w.p.1)    (4.3a)

where

θ* := arg min_{θ∈D_M} E l(t,θ,ε(t,θ)).    (4.3b)

In order to simplify expressions (3.4) and (3.10) for C_I and C_II we impose the following extra condition:

Assumption 5: {ε(t,θ*)} is a sequence of independent and identically distributed Gaussian random vectors with zero mean and covariance matrix Λ* := Λ(θ*). ***

This assumption and (4.2) imply that the estimation method is a maximum likelihood method. Assumption 5 is essentially equivalent to requiring that {ε(t,θ)} are one-step-ahead prediction errors and that S ∈ M. Using this additional assumption, we obtain the following asymptotic results.

Theorem 4.1: Let (4.1)-(4.2) hold and let assumptions 2-5 be true. Then we have, for m ≥ 1 and sufficiently large k,

(1/N) C_I = (1/(2N)) AIC + o(1/(mk))    (4.4)

and, for large k and m,

(1/((k−1)N)) C_II = (1/(2N)) GAIC + o(1/(m·min(k, m^{1/2})))    (4.5)

where AIC and GAIC are defined as

AIC := −2L(θ̂) + 2n_θ    (4.6)
GAIC := −2L(θ̂) + k n_θ    (4.7)

and where L(θ) is the log-likelihood function defined as

L(θ) := Σ_{t=1}^{N} log p(θ,ε(t,θ)) = −N V(θ).    (4.8)

Proof: See Janssen et al. (1987). ***

AIC defined in (4.6) is the information criterion proposed by Akaike (cf. Akaike (1974, 1981)). GAIC denotes its generalized version, considered by various authors (see Stoica et al. (1986) for appropriate references). Theorem 4.1 shows that if assumptions 2-5 are fulfilled, then our cross-validation criteria are asymptotically equivalent to the (generalized) Akaike structure selection criteria. Thus Theorem 4.1 renders a nice cross-validation interpretation to the (generalized) Akaike criteria (see Stoica et al. (1986) for further discussion on this aspect).
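In the Gaussian maximum-likelihood setting of (4.1)-(4.2), AIC and GAIC reduce to simple functions of the residuals through the log-likelihood (4.8). The sketch below assumes that Λ(θ̂) is estimated by the sample covariance of the residuals, which is one common parameterisation but not the only one compatible with (4.1); eps_hat, n_theta and k are inputs the caller is assumed to provide.

```python
import numpy as np

def aic_gaic(eps_hat, n_theta, k):
    """AIC (4.6) and GAIC (4.7) for Gaussian residuals, via L(theta_hat) of (4.8).

    eps_hat : (N, q) array of one-step-ahead residuals at theta_hat.
    n_theta : number of estimated parameters.
    k       : number of subsets, k = N // m as in section 3.
    """
    N, q = eps_hat.shape
    Lam = eps_hat.T @ eps_hat / N          # sample estimate of Lambda(theta_hat)
    # With this Lambda, sum_t eps_t^T Lam^{-1} eps_t = N*q, so (4.8) becomes
    #   L(theta_hat) = -(N/2) * (q*log(2*pi) + log det Lam + q).
    sign, logdet = np.linalg.slogdet(Lam)
    logL = -0.5 * N * (q * np.log(2.0 * np.pi) + logdet + q)
    return -2.0 * logL + 2 * n_theta, -2.0 * logL + k * n_theta   # AIC, GAIC
```

By Theorem 4.1, ranking candidate structures by C_1 (resp. C_2) is then asymptotically the same as ranking them by AIC (resp. GAIC).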

5. CONCLUDING REMARKS

Making use of cross-validation ideas we have proposed two new criteria for model-structure selection of multivariable systems, thereby extending the results originally presented in Stoica et al. (1986) for single-output systems. The cross-validation criteria were derived under fairly general conditions (the system does not need to be contained in the model set) and we did not require that the criteria used for validation and estimation should be the same. The resulting structure-selection methods allow for discrimination between non-nested model structures and are invariant to parameter scaling. Some asymptotic equivalences between our methods and the (generalized) Akaike criteria for structure selection were established under additional (somewhat restrictive) conditions.

The proposed procedures necessitate estimation of the parameters in the various candidate model structures, which can be computationally costly, especially for multivariable systems. In using these procedures, we have to choose (amongst other things) the parameters k and m. Some guidelines for choosing these parameters have been presented in Stoica et al. (1986). However, further work is needed to better understand the influence of k and m on the behaviour of the proposed structure selection rules.

Although the proposed cross-validation structure selection methods appear appealing, some critical remarks may be justified. One can object that the performance of an estimated model is often judged on the basis of its use on future data sets, different from the one used for estimation. This aspect is insufficiently covered by the specific subdivisions of the data sequence into validation resp. estimation subsets, as done in C_I and C_II. Therefore, one could argue that the proposed cross-validation methods need not necessarily guarantee that the selected model is a (near-)optimal one for use on future data sets (see also Rissanen (1986)).

Concluding, we point out that cross-validation assessment appears to be a useful and appealing concept in model (structure) selection. However, further study is needed to obtain more insight into the possibilities and limitations of this approach.

REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. on Autom. Control, 19, 716-723.
Akaike, H. (1981). Modern development of statistical methods. In P. Eykhoff (Ed.), Trends and Progress in System Identification. Pergamon Press, Oxford, pp. 169-189.
Correa, G.O., and K. Glover (1986). On the choice of parameterization for identification. IEEE Trans. on Autom. Control, 31, 8-15.
Gevers, M., and L. Ljung (1986). Optimal experiment designs with respect to the intended model application. Automatica, 22, 543-554.
Janssen, P., P. Stoica, T. Söderström and P. Eykhoff (1987). Model structure selection for multivariable systems by cross-validation methods. EUT-Report 87-E-176, Eindhoven University of Technology, The Netherlands.
Ljung, L. (1978). Convergence analysis of parametric identification methods. IEEE Trans. on Autom. Control, 23, 770-783.
Ljung, L., and P.E. Caines (1979). Asymptotic normality of prediction error estimation for approximate system models. Stochastics, 3, 29-46.
Rissanen, J. (1986). Order estimation by accumulated prediction errors. J. of Applied Probability, 23A, 55-61.
Stoica, P., P. Eykhoff, P. Janssen, and T. Söderström (1986). Model-structure selection by cross-validation. Int. J. Control, 43, 1841-1878.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. J. R. Statistical Soc. B, 36, 111-133.