Copyright © IFAC 12th Triennial World Congress, Sydney, Australia, 1993

EXPERIMENT DESIGN FOR GREY-BOX MODELS

H. Melgaard*, P. Sadegh*, H. Madsen* and J. Holst**

*Institute of Mathematical Statistics and Operations Research, Building 321, The Technical University of Denmark, DK-2800 Lyngby, Denmark. **Department of Mathematical Statistics, Lund Institute of Technology, S-22100 Lund, Sweden

Abstract. In this paper, design methods are presented where information related criteria for optimality of the precision of the resulting parameter estimates are extended with prior information, corresponding to the available physical knowledge about the system, and with cost related loss function elements, reflecting the importance of the obtained parameter estimates. Bayesian as well as non-Bayesian techniques are discussed. An example is given on the grey-box approach for optimal design of a power constrained input sequence.

Key Words. Experiment Design; Parameter Estimation; Stochastic Systems; Bayesian Statistics.

INTRODUCTION

The design of experiments for identification of models of physical systems should be based on available partial, grey-box, information on the system to be studied. For a discussion of the grey-box approach to system modelling see (Bohlin, 1989; Graebe, 1990; Holst et al., 1992). In this paper, design methods are presented where information related criteria on the precision of the resulting parameter estimates are extended with such prior information. In the resulting experiment design, the character of the excitation signal and its parameters, such as sampling time, frequency, amplitude and duration, are specified.

Analytical and numerical methods have been used during the last decades to achieve a design of an experiment such that it is maximally informative. Most frequently a non-Bayesian approach has been used, where a function of the Fisher information matrix is maximized with respect to the design characteristics, see (Walter & Pronzato, 1990) and (Melgaard et al., 1992). For some simple systems the optimal design, e.g. the spectral characteristics of the input signal, can be determined analytically. However, in the classical approach the optimal design is a function of the unknown parameters. This obviously calls for a Bayesian procedure, where the prior knowledge of the parameters can be parameterized by the prior distribution function, which expresses the partial information on the system available prior to an experiment.

2 CRITERIA

In order to perform the quantitative part of the experiment design, a measure of the information achieved from an experiment is needed. It is common practice, cf. (Goodwin & Payne, 1977), to select a performance measure related to the expected accuracy of the parameter estimates to be obtained, in general the covariance of the parameters. The Cramér-Rao inequality gives a lower limit to the covariance of any unbiased estimator g(Y) of θ, subject to certain regularity conditions, cf. (Rao, 1965),

\mathrm{cov}(g) \ge M^{-1}   (1)

where M is the Fisher information matrix, defined by

M = E_{Y|\theta}\left\{ \left( \frac{\partial \log p(Y|\theta)}{\partial \theta} \right)^{T} \left( \frac{\partial \log p(Y|\theta)}{\partial \theta} \right) \right\}   (2)

Y is a vector of all observations, θ is the unknown parameter vector, and p(Y|θ) is the conditional probability density of Y for given θ. By assuming that the estimator is asymptotically efficient, a rationale for using the Fisher information matrix as a suitable characterization of the asymptotic parameter uncertainty is obtained.
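For concreteness, a small numerical sketch of (2) is given below. It is not part of the original paper: the model (a static gain plus Gaussian noise, θ = (b, s²)), the finite-difference score and all function names are illustrative assumptions of this edit, and the expectation over Y given θ is approximated by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik(theta, y, u):
    """Gaussian log-likelihood of the illustrative model y_t = b*u_t + e_t, e_t ~ N(0, s2)."""
    b, s2 = theta
    r = y - b * u
    return -0.5 * len(y) * np.log(2 * np.pi * s2) - 0.5 * np.sum(r**2) / s2

def score(theta, y, u, h=1e-5):
    """Central-difference gradient of the log-likelihood with respect to theta."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        tp, tm = np.array(theta, float), np.array(theta, float)
        tp[i] += h
        tm[i] -= h
        g[i] = (loglik(tp, y, u) - loglik(tm, y, u)) / (2 * h)
    return g

def fisher_information(theta, u, n_mc=2000):
    """Monte Carlo estimate of M = E_{Y|theta}[score^T score], cf. eq. (2)."""
    b, s2 = theta
    M = np.zeros((len(theta), len(theta)))
    for _ in range(n_mc):
        y = b * u + rng.normal(0.0, np.sqrt(s2), size=len(u))
        s = score(theta, y, u)
        M += np.outer(s, s)
    return M / n_mc

u = rng.standard_normal(100)            # a candidate input sequence
M = fisher_information([0.5, 0.1], u)   # information carried by this experiment
print(M)
print(np.linalg.inv(M))                 # Cramer-Rao lower bound on cov(g), cf. (1)
```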

In the Bayesian approach a loss function L(θ, θ̂) is defined, which describes the consequences of obtaining θ̂ when θ is the true parameter vector. This function is then used as a basis for obtaining an optimal experiment design. Prior to the experiment the expected performance is

J_1 = E_{\theta,Y}[L(\theta, \hat\theta)] = E_\theta E_{Y|\theta}[L(\theta, \hat\theta)]   (3)

The criterion may be optimized with respect to the allowable experimental conditions. By making suitable assumptions J_1 can be simplified to

J_1 \approx E_\theta\left[ L(\theta, \theta) + \tfrac{1}{2}\,\mathrm{trace}\left( (\partial^2 L / \partial\hat\theta^2)\, M^{-1} \right) \right]   (4)

where M is given by (2) and it is assumed that θ̂ is an efficient unbiased estimator. The criterion (4) can be generalized to

J = E_\theta\left[ \psi(M^{-1}) \right]   (5)

for a suitable scalar function ψ(·), which accounts for the mapping of the expected uncertainty in the parameter space into a scalar measure. In the classical case the design is obtained simply by evaluating ψ(M⁻¹) at the prior mean

\bar\theta = E(\theta)   (6)

A number of different functions ψ(·) may be considered. The Bayesian approach (4) leads to a function of the form ψ₁(M⁻¹) = trace(W M⁻¹), where W is some weight matrix. An experiment minimizing this criterion is known as A-optimal. It corresponds to minimization of the mean dispersion of the estimates of the parameters. Another commonly used criterion is ψ₂(M⁻¹) = det(M⁻¹), which results in D-optimal designs. This criterion minimizes the volume of the confidence ellipsoid of the estimates.
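Given an information matrix, both criteria are straightforward to evaluate. The snippet below is purely illustrative (the matrices M1, M2 and the helper names are our own) and simply compares two candidate designs under the A- and D-criteria.

```python
import numpy as np

def a_criterion(M, W=None):
    """A-optimality: trace(W M^{-1}), the (weighted) mean dispersion of the estimates."""
    Minv = np.linalg.inv(M)
    return np.trace(Minv if W is None else W @ Minv)

def d_criterion(M):
    """D-optimality: det(M^{-1}); minimizing it minimizes the confidence ellipsoid volume."""
    return np.linalg.det(np.linalg.inv(M))

# Two hypothetical information matrices corresponding to two candidate experiments.
M1 = np.array([[4.0, 1.0], [1.0, 2.0]])
M2 = np.array([[3.0, 0.0], [0.0, 3.0]])
for name, M in [("design 1", M1), ("design 2", M2)]:
    print(name, " A:", a_criterion(M), " D:", d_criterion(M))
```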

2.1 Physical Measures of Information

The classical measures consider all the parameters to be of equal importance, which however is seldom the case. Hence, when dealing with physical models, other measures than the classical ones considered above may be of interest. Let us assume that the physical parameters of interest can be described as f(θ), i.e. by some transformation of the entire set of parameters. It is important to take this transformation into account in the design. Then Gauss' formula gives the information matrix for the transformed parameters, i.e.,

M_f(\theta)^{-1} = \left( \frac{\partial f}{\partial \theta} \right) M^{-1} \left( \frac{\partial f}{\partial \theta} \right)^{T}   (7)

Any of the standard measures of information can now be applied on M_f(θ). In Melgaard et al. (1992) a physical model of the thermal characteristics of a building is considered. The classical measures of information all lead to the same optimal design of input signal, a certain PRBS sequence. It turned out that the main interest is not focused on the individual parameters of the model, but on the overall heat transmittance and internal heat capacity of the building. These characteristics are given as a function of the model parameters. An optimal design considering this application specific measure of information leads to an input signal which is a step. This design turns out to be much different from the original design (more weight on low frequencies), but it reflects the demands from the building physicists. This procedure can be regarded as an alternative to specifying the loss function L(θ, θ̂).
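A minimal sketch of (7): the covariance of the derived quantities f(θ̂) is obtained by propagating M⁻¹ through the Jacobian of f. The transformation f used below is an invented two-parameter example, not the building model of Melgaard et al. (1992).

```python
import numpy as np

def numerical_jacobian(f, theta, h=1e-6):
    """Finite-difference Jacobian df/dtheta, one column per parameter."""
    theta = np.asarray(theta, dtype=float)
    f0 = np.atleast_1d(f(theta))
    J = np.zeros((f0.size, theta.size))
    for i in range(theta.size):
        tp = theta.copy()
        tp[i] += h
        J[:, i] = (np.atleast_1d(f(tp)) - f0) / h
    return J

def transformed_covariance(M, f, theta):
    """Covariance of f(theta_hat), (df/dtheta) M^{-1} (df/dtheta)^T, cf. eq. (7)."""
    J = numerical_jacobian(f, theta)
    return J @ np.linalg.inv(M) @ J.T

# Invented transformation: suppose the quantities of interest are a product and a ratio.
f = lambda th: np.array([th[0] * th[1], th[0] / th[1]])
M = np.array([[10.0, 2.0], [2.0, 5.0]])
print(transformed_covariance(M, f, [1.2, 0.4]))
```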

2.2 Evaluation of the Criterion

Consider a linear discrete time model of the form

y_t = C_1(q)\, u_t + C_2(q)\, e_t   (8)

where {u_t} and {y_t} are the input and output sequences, respectively, and {e_t} is a sequence of Gaussian random variables having covariance σ. C₁ and C₂ are transfer functions. In the following an expression for M will be derived for the model (8). Define ε_t as the residual sequence, then

\varepsilon_t = C_2^{-1}(q)\left( y_t - C_1(q)\, u_t \right)   (9)

and

M = E_{Y|\theta}\left\{ \frac{1}{\sigma} \sum_{t=1}^{N} \left( \frac{\partial \varepsilon_t}{\partial \theta} \right)^{T} \left( \frac{\partial \varepsilon_t}{\partial \theta} \right) \right\} + \frac{N}{2\sigma^{2}} \left( \frac{\partial \sigma}{\partial \theta} \right)^{T} \left( \frac{\partial \sigma}{\partial \theta} \right)   (10)

Superposition, cf. Goodwin & Payne (1977, p. 131), yields

\frac{\partial \varepsilon_t}{\partial \theta} = m_t + v_t   (11)

where

m_t = -C_2^{-1}(q)\, \frac{\partial C_1(q)}{\partial \theta}\, u_t   (12)

v_t = -C_2^{-1}(q)\, \frac{\partial C_2(q)}{\partial \theta}\, \varepsilon_t   (13)

After substituting these expressions in (10) and assuming no feedback in the system (u_t is independent of e_t), we get

M = \frac{1}{\sigma} \sum_{t=1}^{N} m_t^{T} m_t + M_C   (14)

where M_C does not depend upon the choice of input signal.
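For a concrete (hypothetical) instance of (8) with C₁(q) = b q⁻¹/(1 − a q⁻¹), C₂(q) = 1 and θ = (a, b), the input-dependent part of (14) can be evaluated by filtering a candidate input through the parameter sensitivities of (12). The sketch below does this and compares a binary excitation with a step of the same power; it is an illustration under these assumptions, not the computation used in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def info_matrix(u, a, b, sigma):
    """Input-dependent part of (14), (1/sigma) * sum_t m_t^T m_t, for the hypothetical
    model y_t = C1(q) u_t + e_t with C1(q) = b q^-1 / (1 - a q^-1) and C2(q) = 1."""
    # Sensitivities of (12): dC1/da = b q^-2 / (1 - a q^-1)^2, dC1/db = q^-1 / (1 - a q^-1)
    m_a = lfilter([0.0, 0.0, b], [1.0, -2.0 * a, a**2], u)
    m_b = lfilter([0.0, 1.0], [1.0, -a], u)
    m = np.column_stack([m_a, m_b])      # one row m_t per sample
    return m.T @ m / sigma

rng = np.random.default_rng(1)
u_prbs = np.sign(rng.standard_normal(500))   # crude binary excitation of unit power
u_step = np.ones(500)                        # step input of the same power
for name, u in [("binary", u_prbs), ("step", u_step)]:
    M = info_matrix(u, a=0.7, b=0.5, sigma=0.1)
    print(name, "log det M =", np.log(np.linalg.det(M)))
```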

Now make the following simplifying assumptions: 1) The experiment time is large. 2) The input {u_t} is restricted to the class admitting a spectral representation with spectral distribution function F(ω), ω ∈ [−π, π]. 3) The allowable input power is constrained.

Consider now the average information matrix, which is defined by

\bar M = \lim_{N\to\infty} \frac{1}{N} M = \frac{1}{\pi} \int_{0}^{\pi} \left( M(\omega) + M_C \right) d\xi(\omega)   (15)

where

d\xi(\omega) = \begin{cases} \tfrac{1}{2}\, dF(\omega), & \omega = 0,\ \omega = \pi \\ dF(\omega), & \omega \in\ ]0, \pi[ \end{cases}   (16)

and

M(\omega) = \frac{1}{\sigma}\, \mathrm{Re}\left\{ \left[ \frac{\partial C_1(e^{j\omega})}{\partial \theta} \right]^{T} C_2^{-1}(e^{j\omega})\, C_2^{-1}(e^{-j\omega}) \left[ \frac{\partial C_1(e^{-j\omega})}{\partial \theta} \right] \right\}   (17)

while M_C is given by a corresponding expression which does not depend on the input   (18)

The power restriction of the input signal can be formulated by

P_u = \frac{1}{\pi} \int_{0}^{\pi} d\xi(\omega) = 1   (19)
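Under assumptions 1)-3), the single-frequency contribution (17) can be evaluated directly from the frequency response of the parameter sensitivities of C₁. The sketch below does so for the same hypothetical first-order model as above, following our reading of the normalization in (17); it is illustrative only.

```python
import numpy as np

def m_omega(w, a, b, sigma):
    """Single-frequency information matrix, cf. (17), for the hypothetical model
    C1(q) = b q^-1 / (1 - a q^-1), C2(q) = 1, theta = (a, b)."""
    z = np.exp(1j * w)                       # q = e^{j omega} on the unit circle
    dC1_da = b * z**-2 / (1.0 - a / z)**2    # dC1/da evaluated at e^{j omega}
    dC1_db = z**-1 / (1.0 - a / z)           # dC1/db evaluated at e^{j omega}
    g = np.array([dC1_da, dC1_db])           # sensitivity row dC1/dtheta
    return np.real(np.outer(g, np.conj(g))) / sigma   # |C2|^{-2} = 1 here

# With the whole unit input power placed at one frequency, the average information
# is M(omega) plus the input-independent M_C, cf. (15), (16) and (19).
for w in (0.3, 0.6, 1.2):
    print(w, np.log(np.linalg.det(m_omega(w, a=0.7, b=0.5, sigma=0.1))))
```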

The following Theorem states that it is always possible to find an optimal input comprising a finite number of sinusoids.

THEOREM 2.1: For any power constrained design ξ₁(ω) with corresponding p × p average information matrix M̄(ξ₁), there always exists a power constrained design ξ₂(ω) which is piecewise constant with at most p(p+1)/2 + 1 discontinuities and M̄(ξ₂) = M̄(ξ₁). For the design criterion J = −log det M̄, optimal designs exist comprising not more than p(p+1)/2 sinusoids.

PROOF: see (Fedorov, 1972; Goodwin & Payne, 1977). □

2.3 Bayesian Approach

We now turn to the Bayesian approach, which is closely related to grey-box modelling following the previous discussion. Assume that the prior knowledge of the system is given in terms of a prior distribution of the parameters. Hence we are able to evaluate the expectation of the considered criterion with respect to the prior distribution of the parameters, cf. (5), instead of simply evaluating the criterion at some fixed values of the parameters. In general the resulting design will be different for the two approaches, cf. the example below. In the Bayesian case it is possible to prove a theorem similar to Theorem 2.1.

THEOREM 2.2: Using J = E_θ(−log det M̄) as criterion, optimal designs exist comprising not more than p(p+1)/2 sinusoidal components.

PROOF: The criterion may be written as

J = \int_{\Theta} \left( -\log\det \bar M(\xi(\omega), \theta) \right) p(\theta)\, d\theta   (20)

where Θ is assumed to be a compact set of the parameters and p(θ) is the prior probability density of the parameters. The mean-value theorem states that for all ξ(ω) there exists a θ* ∈ Θ such that

J = K \left( -\log\det \bar M(\xi(\omega), \theta^{*}) \right) p(\theta^{*})   (21)

for some constant K which is independent of ξ(ω) and θ*. It is seen that (21) has the form of the criterion considered in the previous Theorem. The difference is that now θ* depends upon ξ(ω). But since (16) is still valid, we conclude that the set of all average information matrices is the convex hull of all average information matrices corresponding to single frequency designs, and the proof follows immediately. □

From the proof it is readily seen that in the general case (5) it is also possible to find an optimal design comprising a finite number of sinusoids, as long as ψ(·) is a continuous function of the information matrix.

3 EXAMPLE

Consider a first order stochastic differential equation with discrete observations:

dx(t) = -a\, x(t)\, dt + b\, u(t)\, dt + dw(t)   (22)

y_k = x_k + e_{2,k}   (23)

where w(t) is a Wiener process with E(dw(t)) = 0 and V(dw(t)) = r_1\, dt, and e_{2,k} is a Gaussian white noise process with V(e_{2,k}) = r_2. dw(t) and e_{2,k} are assumed to be independent; subscript k is shorthand for t_k. We are interested in the set of parameters θ = [a, b]ᵀ, and in finding an optimal input signal for the system. Since the inputs are assumed to be constant within a sampling interval, the corresponding discrete time model is written

x_{k+1} = \alpha x_k + \beta u_k + e_{1,k}   (24)

where α, β and V(e_{1,k}) depend on the sampling time T. The discrete time model is brought into the following innovations form

\hat x_{k+1} = \alpha \hat x_k + \beta u_k + K \varepsilon_k   (25)

y_k = \hat x_k + \varepsilon_k   (26)

where

V(\varepsilon_k) = P + r_2   (27)

K = \alpha P (P + r_2)^{-1}   (28)

and P is the positive semidefinite solution of the stationary Riccati equation. The innovations form of the model (25) and (26) may be rewritten as

y_k = \frac{\beta q^{-1}}{1 - \alpha q^{-1}}\, u_k + \frac{1 + (K - \alpha) q^{-1}}{1 - \alpha q^{-1}}\, \varepsilon_k   (29)
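A sketch of the computations behind (24)-(28) for this example: zero-order-hold discretization of (22) followed by iteration of the scalar stationary Riccati recursion to obtain P and the Kalman gain K. The formulas are the standard ones implied by (22)-(26); the function names and the iterative solution are our own choices.

```python
import numpy as np

def discretize(a, b, r1, T):
    """Zero-order-hold discretization of (22): x_{k+1} = alpha x_k + beta u_k + e_{1,k}."""
    alpha = np.exp(-a * T)
    beta = b / a * (1.0 - alpha)
    q1 = r1 * (1.0 - np.exp(-2.0 * a * T)) / (2.0 * a)   # V(e_{1,k})
    return alpha, beta, q1

def stationary_kalman(alpha, q1, r2, n_iter=200):
    """Fixed point of the scalar Riccati recursion, and P and K of (27)-(28)."""
    P = q1
    for _ in range(n_iter):
        P = alpha**2 * P - alpha**2 * P**2 / (P + r2) + q1
    return P, alpha * P / (P + r2)

alpha, beta, q1 = discretize(a=0.8, b=0.5, r1=1.0, T=1.0)
P, K = stationary_kalman(alpha, q1, r2=0.1)
# The innovations form (25)-(26) is then the transfer function model (29):
#   C1(q) = beta q^-1 / (1 - alpha q^-1),  C2(q) = (1 + (K - alpha) q^-1) / (1 - alpha q^-1)
print("alpha", alpha, "beta", beta, "P", P, "K", K)
```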

Since equation (29) is of the form (8) we are now ready to formulate the average Fisher information matrix for the problem. Considering a power constrained input in the frequency domain we use equations (16), (17) and (18).

Considering the criterion J₁ = −log det M̄, Theorem 2.1 states that it is possible to consider a design comprising not more than 3 sinusoids,

\bar M(\xi) = \sum_{k=1}^{3} \mu_k M(\omega_k)   (30)

where μ_k ≥ 0 and Σ_{k=1}^{3} μ_k = 1. Hence the optimization problem is formulated and some generally available software for solving nonlinearly constrained minimization may be applied. It turns out that a single sinusoid suffices for optimality in the considered case. A plot of this optimal input frequency for different values of a is shown in Fig. 1 for b = 0.5, r₁ = 1, r₂ = 0.1 and T = 1.

Fig. 1. Optimal design of input frequency ω* for different values of a.
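The resulting design problem can be attacked with standard optimization software, as noted above. The sketch below restricts the search to a single sinusoid (reported to be sufficient here) and maximizes log det M(ω) for θ fixed at (E(a), b) = (0.8, 0.5). It neglects M_C and any dependence of the criterion on C₂ and σ beyond the scaling in (17), so the resulting frequency is only indicative and is not expected to reproduce the reported ω*₁ exactly.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def model(theta, r1=1.0, r2=0.1, T=1.0):
    """Discretize (22)-(23) and return alpha, beta, the Kalman gain K and V(eps) = P + r2."""
    a, b = theta
    alpha = np.exp(-a * T)
    beta = b / a * (1.0 - alpha)
    q1 = r1 * (1.0 - np.exp(-2.0 * a * T)) / (2.0 * a)
    P = q1
    for _ in range(200):                                   # stationary Riccati iteration
        P = alpha**2 * P - alpha**2 * P**2 / (P + r2) + q1
    return alpha, beta, alpha * P / (P + r2), P + r2

def m_omega(w, theta, h=1e-6):
    """Input-dependent single-frequency information, cf. (17); M_C is neglected."""
    z = np.exp(1j * w)

    def transfer(th):
        alpha, beta, K, _ = model(th)
        den = 1.0 - alpha / z
        return beta / z / den, (1.0 + (K - alpha) / z) / den   # C1, C2 of (29)

    C1, C2 = transfer(theta)
    sigma = model(theta)[3]
    g = np.array([(transfer([theta[0] + h, theta[1]])[0] - C1) / h,   # dC1/da
                  (transfer([theta[0], theta[1] + h])[0] - C1) / h])  # dC1/db
    return np.real(np.outer(g, np.conj(g))) / (sigma * abs(C2)**2)

# Single-sinusoid design, cf. (30) with one term: maximize log det M(omega) over omega.
res = minimize_scalar(lambda w: -np.log(np.linalg.det(m_omega(w, [0.8, 0.5]))),
                      bounds=(1e-3, np.pi - 1e-3), method="bounded")
print("approximately optimal input frequency:", res.x)
```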

We now want to compare this traditional design of experiments with the Bayesian approach for a given prior probability density of the parameters. Hence we consider the criterion J₂ = E_θ(−log det M̄). It is assumed that parameter a is uniformly distributed in the interval [0.5, 1.1] and the other parameters are fixed to the same values as above. The optimal design in this case is

\omega_B^{*} = 0.4575   (31)

This should be compared to the previous design using the expected value of the parameters, E(a) = 0.8, which has the optimal design, according to J₁,

\omega_1^{*} = 0.6038   (32)

hence, here the two approaches give different designs.
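The Bayesian criterion J₂ can be approximated in the same setting by averaging −log det M(ω) over draws of a from its uniform prior before optimizing over ω. The fragment below assumes the m_omega helper from the previous sketch is in scope and carries the same simplifications, so it illustrates the shift towards lower frequencies rather than reproducing the reported ω*_B.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumes m_omega(...) from the previous sketch is in scope; M_C is still neglected.
rng = np.random.default_rng(1)
a_samples = rng.uniform(0.5, 1.1, size=200)        # draws from the prior of a

def j2(w):
    """Monte Carlo approximation of J2(omega) = E_a[-log det M(omega)] for fixed b = 0.5."""
    return np.mean([-np.log(np.linalg.det(m_omega(w, [a, 0.5]))) for a in a_samples])

res = minimize_scalar(j2, bounds=(1e-3, np.pi - 1e-3), method="bounded")
print("approximately optimal Bayesian input frequency:", res.x)
```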

4 CONCLUSION

It is demonstrated in the paper that for grey-box modelling, the Bayesian approach is an obvious tool for design of optimal experiments. It is a natural way of including knowledge of the system, in terms of a prior distribution of the parameters, in the design. In grey-box, physical modelling, this prior knowledge is usually available.

It has also been proven that taking expectation in the prior distribution of the parameters for the considered criterion does not change the fact that a finite number of sub-experiments suffices for optimality of the design.

A simple method for specifying physical measures of the importance of the individual parameters is proposed. This method can be regarded as an alternative to specifying the loss function usually involved in Bayesian procedures.

The traditional approach of evaluating the criterion at a fixed value of the parameters may be thought of as an approximation of the Bayesian approach evaluated at the expected value of the parameters. But the example demonstrates that in the considered case, the design is sensitive to the choice of criterion.

REFERENCES

Bohlin, T. (1989). The fundamentals of modelling and identification. Tech. rep. TRITA-REG-89/00002, Dept. of Automatic Control, Royal Institute of Technology, Stockholm, Sweden.

Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press.

Goodwin, G. C. & Payne, R. L. (1977). Dynamic System Identification: Experiment Design and Data Analysis, Vol. 136. Academic Press.

Graebe, S. (1990). Theory and Implementation of Gray Box Identification. Ph.D. thesis, Dept. of Automatic Control, Royal Institute of Technology, Stockholm, Sweden.

Holst, J., Holst, U., Madsen, H., & Melgaard, H. (1992). Validation of grey box models. In IFAC Symposium on Adaptive Control and Signal Processing 92, pp. 407-414.

Melgaard, H., Madsen, H., & Holst, J. (1992). Selection of optimal test sequences for identification of the heat dynamics of passive solar components. In Thermal Performance of the Exterior Envelopes of Buildings V, pp. 594-601. ASHRAE/DOE/BTECC, Clearwater Beach, Florida.

Rao, C. R. (1965). Linear Statistical Inference and Its Applications. Wiley.

Walter, E. & Pronzato, L. (1990). Qualitative and quantitative experiment design for phenomenological models - a survey. Automatica, 26(2), 195-213.