Copyright © IFAC System Identification Santa Barbara, California, USA, 2000
INFORMATION CRITERION FOR MODEL SELECTION WITH CONTROLLER DESIGN
Kouji Tsumura *
* Department of Mathematical Engineering and Information Physics
The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
[email protected]
Abstract: In this paper, an information criterion based on AIC/TIC is proposed which addresses not only the estimation and selection of a model for the plant but also the design of the controller. A key point is that the model is estimated on the space of control performance, and the resultant criterion also contains a bias term, as AIC/TIC does; however, it depends on the controller. Moreover, when the controller is also designed with stochastic data generated by the estimated model, the criterion acquires another bias term, which is proportional to the dimension of the controller parameter. Copyright © 2000 IFAC
Keywords: information criterion, robust control, identification, complexity
1. INTRODUCTION
In the field of robust control and the corresponding identification, the necessity of probabilistic evaluation, in some sense, of the uncertainties of controlled systems and of the performances of the closed loop systems has been actively discussed in the last several years (Dahleh et al., 1993; Vidyasagar, 1997; Tempo et al., 1996). One of the motivations of these researches is to avoid the conservatism of the uncertainties which are derived from identification, and another is the application of the Monte Carlo method to robust control problems which are hard to solve analytically. These researches in the fields of identification and design have advanced independently; however, the modeling of systems and the attained performance of the closed loop system which is designed from the model essentially have deep relations to each other. For example, the complexity of the model directly influences that of the controller, such as its order or the number of free parameters, and it is conjectured that the complexity of the controller and the reliability of the attained performance of the resultant closed loop system will conflict when the controlled systems are evaluated in a probabilistic sense. These facts say that when we apply probabilistic evaluation to the system uncertainties and the attained control performances, the evaluation should address both of them simultaneously.

On the other hand, in the fields of statistics and coding, several information criteria for model estimation and selection, such as AIC, TIC, MDL and so on, have been proposed in connection with information and complexity (Akaike, 1974; Takeuchi, 1976; Rissanen, 1984), where the criteria can decide which models are appropriate for the objective systems by using given data. In particular, the well-known results, which show a trade-off between the consistency of the estimated model with the given data and its complexity, are the essence of model estimation and selection. However, note that their focus has been only on the model estimation and selection problem. In controller design, the most important point is the performance of the closed loop system, and when some model, which is selected by the information criterion, is used for the controller design, the resultant closed loop system or its performance should be consistently evaluated by the same paradigm or criterion. However, the connection between the selected model and the resultant closed loop system has not been investigated in the fields of statistics or coding.
On these backgrounds, in this paper an information criterion is proposed for the model estimation and selection problem in consideration of controller design. The criterion is basically a derivative of AIC/TIC, but it reflects the selection of the controller or its class. A key point is the transformation from the modeling of the distribution of the systems themselves to that of the performance indices which are consequently attained with the systems and a controller. Moreover, it is shown that when a controller is designed by using the model estimated from the data, the complexity of the controller, that is, the number of its parameters, influences the estimation results of the model in a probabilistic sense. These results reveal the indivisibility of the modeling of the objective systems and the controller design.

2. MEASURABLE SPACE AND VARIABLE TRANSFORMATION

When an uncertainty addressed in robust control is not given a priori, a possible process we can take is identification. This means that a model set of systems is derived from a data set, and then a controller is designed from the model set. Here it should be noted that, since the final purpose of control is the performance of the closed loop system, the model set which has consistency with the given data should be modeled with respect to the resultant control performance (Tsumura and Shin, 1995; Tsumura and Shin, 1996; Tsumura and Shin, 1997).

With this discussion, it is considered here to transform the modeling of the distribution of systems, in a probabilistic sense, directly to that of the objective performance indices which are attained in the resulting closed loop system. For example, let x be an index of a system, X the set of x, f(x) a probability density, and γ a performance index of a closed loop system composed of the system x and a controller φ. Note that γ is a function of x and φ; in other words, the controller φ is considered as a mapping φ : X → Γ. Corresponding to this mapping, the probability density f(x) on the set of system indices is transformed to one on the space of the performance index, φ : f(x) → g(γ), and it is then natural to consider the modeling of g(γ) instead of that of f(x). For the analysis along this line, several formulae on variable transformation over measurable spaces are needed; the following basic notions (Lasota and Mackey, 1995) about the variable transformation of probability densities are prepared for the following sections.

First define two measurable spaces (A, X, μ) and (B, Y, ν), where X, Y are sets, A, B are σ-algebras of subsets of X and Y, μ, ν are measures on X, Y, and f, g are probability densities on (A, X, μ) and (B, Y, ν), respectively. Also define a transformation S : X → Y. Here the following holds (Lasota and Mackey, 1995).

Proposition 1. For any nonsingular transformation S : X → Y, there exists a unique operator P : L¹ → L¹ such that for all B ∈ B,

  ∫_B P f(y) ν(dy) = ∫_{S^{-1}(B)} f(x) μ(dx).   (1)

Definition 2. Let (A, X, μ), (B, Y, ν) be measurable spaces. For a nonsingular measurable transformation S : X → Y, the unique operator P : L¹ → L¹ which satisfies (1) is called a Frobenius-Perron operator.

Proposition 3. For a nonsingular measurable transformation S : X → Y, the following holds:

  ∫_{S^{-1}(B)} f(x) μ(dx) = ∫_B f(S^{-1}(y)) μS^{-1}(dy) = ∫_B f(S^{-1}(y)) J^{-1}(y) ν(dy),

where μS^{-1} and J^{-1} are defined by

  μS^{-1}(B) = ∫_B J^{-1}(y) ν(dy),  B ∈ B.

From this proposition, the next representation of the Frobenius-Perron operator follows.

Proposition 4. For an invertible nonsingular measurable transformation S : X → Y, let P be the corresponding Frobenius-Perron operator. Then the following holds:

  P f(y) = f(S^{-1}(y)) J^{-1}(y).

Moreover, when S^{-1}(y) is differentiable, J^{-1}(y) and J(x) are given by the absolute values of the Jacobians of S^{-1}(y) and S(x) as

  J^{-1}(y) = |dS^{-1}(y)/dy|,  J(x) = |dS(x)/dx|.

In this paper, μ and ν are taken to be Lebesgue measures.
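To make the change-of-variables machinery concrete, the following is a small numerical sketch (not from the paper; the map S(x) = exp(x), the Gaussian density f and the sample size are illustrative choices) that checks the representation of Proposition 4, P f(y) = f(S^{-1}(y)) |dS^{-1}(y)/dy|, against a Monte Carlo histogram of transformed samples.

    # Numerical sketch (illustrative only): the Frobenius-Perron / change-of-variables
    # formula P f(y) = f(S^{-1}(y)) |dS^{-1}(y)/dy| for a monotone map S, checked
    # against a Monte Carlo histogram of the transformed samples.
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):                      # density of the system index x (standard normal here)
        return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    S = np.exp                     # transformation S : x -> y = exp(x)
    S_inv = np.log                 # S^{-1}(y) = log y

    def jac_inv(y):                # |dS^{-1}(y)/dy| = 1/y
        return 1.0 / y

    def Pf(y):                     # transformed density via Proposition 4
        return f(S_inv(y)) * jac_inv(y)

    # empirical check: the histogram of S(x_i) should match Pf on a grid
    x = rng.standard_normal(200_000)
    y = S(x)
    grid = np.linspace(0.2, 5.0, 25)
    hist, edges = np.histogram(y, bins=grid, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - Pf(centers))))   # small value -> formula and samples agree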
3. INFORMATION CRITERION

In this section, an information criterion such as AIC or TIC is introduced for the model estimation and selection problem. First, define several notions. Let x be an index of systems and also a variable of a stochastic process, g(x) the true probability density of x, which is unknown, f_θ(x) a model of g(x) parametrized by θ, x^(n) := {x₁, x₂, ..., xₙ} sample data of x, and γ a performance index of a closed loop system.

Based on the Kullback-Leibler information

  ∫ g(x) log [g(x)/f_θ(x)] dx,   (2)

which is considered to be appropriate as a distance between two probability densities (Kullback, 1959), the consistency of the probability density f_θ(x) with the true density g(x) is evaluated by the value of

  ∫ g(x) log f_θ(x) dx,   (3)

and the maximizer f_θ(x) of (3) is considered to be a good model for g(x). However, because the true density g(x) is unknown, it is substituted by the empirical distribution, and the following maximization of the likelihood function is considered in place of (3):

  (1/n) Σ_{i=1}^n log f_θ(x_i).   (4)

This is called the maximum likelihood method (Kullback, 1959).

Usually, the modeling of f_θ(x), which is a distribution of the system index x, is considered; but following the discussion in Section 2, from the point of view of the control problem it is natural to consider the modeling of the distribution of the performance index γ instead of that of x. Note that γ is determined by a system x and a controller. When a controller is fixed, γ of the closed loop system composed of the controller and the system x can be computed, and then a mapping

  S_φ(x) = γ   (5)

from x to γ is defined for a controller φ.

Corresponding to the variable transformation (5), the probability density f_θ(x) is transformed to P_φ f_θ(γ), where the Frobenius-Perron operator P_φ is defined by the mapping (5), and therefore the evaluation (3) is also transformed to

  ∫ g_φ(γ) log P_φ f_θ(γ) dγ,   (6)

where g_φ(γ) is the unknown true density of γ. Now consider the problem

  max_θ (1/n) Σ_{i=1}^n log P_φ f_θ(γ_i),   (7)

where γ_i := S_φ(x_i) and γ^(n) := {γ₁, γ₂, ..., γₙ}, in place of the maximization of (4). Now define

  L_{φ,θ}(γ^(n)) := (1/n) Σ_{i=1}^n log P_φ f_θ(γ_i).   (8)

Here the following two facts should be noted:
1. the constant value

  ∫ g(x) log g(x) dx   (9)

in the normal model estimation may be changed by the transformation of the variable x;
2. however, the Kullback-Leibler information (2) itself is invariant under the transformation of the variables, by a cancellation between the numerator and the denominator in (2).

These facts seemingly say that the maximization of (6) or (7) is unreasonable; however, note that (6) can be decomposed as

  ∫ g_φ(γ) log P_φ f_θ(γ) dγ = ∫ g_φ(γ) log g_φ(γ) dγ − ∫ g(x) log [g(x)/f_θ(x)] dx.   (10)

The first term is the minus of the entropy of the distribution of the resultant performance index, and the second term is the minus of the normal Kullback-Leibler information between the true density g(x) and the estimate f_θ(x). Therefore, the maximization of (10) can be interpreted as a simultaneous optimization which restricts the dispersion of the resultant performance and approximates g(x) by f_θ(x). In particular, restricting the dispersion of the performance is significant for robust control. For these reasons, the current problem is sufficiently meaningful.

By employing the formulae on variable transformation introduced in Section 2, L_{φ,θ}(γ^(n)) is written as

  L_{φ,θ}(γ^(n)) = (1/n) Σ_{i=1}^n log P_φ f_θ(γ_i) = (1/n) Σ_{i=1}^n log f_θ(x_i) + (1/n) Σ_{i=1}^n log K_φ(x_i),   (11)
  K_φ(x) := J_φ^{-1}(S_φ(x)).   (12)

The equations (7) and (11) show the following: when

  θ̂ := argmax_θ (1/n) Σ_{i=1}^n log P_{φ(θ)} f_θ(γ_i),   (13)

the estimate θ̂ is also the maximizer of (1/n) Σ_{i=1}^n {log f_θ(x_i) + log K_{φ(θ)}(x_i)}.
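As a numerical illustration of the decomposition (11), the following sketch (with an assumed scalar Gaussian model f_θ and an assumed monotone performance map S_φ(x) = exp(φx); neither is taken from the paper) evaluates the transformed log-likelihood both directly through P_φ f_θ and through the sum of the ordinary log-likelihood and the (1/n) Σ log K_φ(x_i) term, and confirms that the two coincide.

    # Sketch of the decomposition (11):
    # L_{phi,theta}(gamma^(n)) = (1/n) sum log f_theta(x_i) + (1/n) sum log K_phi(x_i),
    # for an assumed scalar Gaussian model and the assumed map S_phi(x) = exp(phi*x).
    import numpy as np

    rng = np.random.default_rng(1)

    def log_f(x, theta):                      # log f_theta(x): N(theta, 1)
        return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)

    def S(x, phi):                            # performance index gamma = S_phi(x)
        return np.exp(phi * x)

    def log_K(x, phi):                        # log K_phi(x) = log |dS_phi^{-1}/dgamma| at S_phi(x)
        return -np.log(phi) - phi * x         # since dS_phi/dx = phi * exp(phi*x)

    x = rng.normal(0.3, 1.0, size=500)        # sample data x^(n)
    theta, phi = 0.25, 2.0
    gamma = S(x, phi)

    # left-hand side of (11): log of the transformed density evaluated at gamma_i
    lhs = np.mean(log_f(np.log(gamma) / phi, theta) - np.log(phi * gamma))
    # right-hand side of (11): ordinary log-likelihood plus the log K_phi term
    rhs = np.mean(log_f(x, theta)) + np.mean(log_K(x, phi))
    print(lhs, rhs)                           # the two sides coincide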
Next consider evaluating the maximum likelihood model f_θ̂ in the sense of (6). The value of (6) at θ = θ̂ can be reduced to that of

  ∫ {log f_θ̂(x) + log K_{φ(θ̂)}(x)} g(x) dx.   (15)

Of course (15) cannot be calculated because g_φ(γ) is unknown, and it should be approximated. Akaike's information criterion AIC (Akaike, 1974) or Takeuchi's information criterion TIC (Takeuchi, 1976; Konishi and Kitagawa, 1996) is known as an approximation of (3) for the maximum likelihood model. By following the process of deriving AIC/TIC, the following proposition is given.

Proposition 5. Define the following:

  W(θ, θ₀) := (1/n) Σ_{i=1}^n {log f_θ(x_i) + log K_{φ(θ)}(x_i)},
  M := ∫ {log f_{θ₀}(x) + log K_{φ(θ₀)}(x)} g(x) dx − (1/n) Σ_{i=1}^n {log f_{θ₀}(x_i) + log K_{φ(θ₀)}(x_i)},
  J(θ₀) := −∫ (∂²/∂θ∂θ') log {f_θ(x) K_{φ(θ)}(x)} |_{θ=θ₀} g(x) dx,
  I(θ₀) := ∫ (∂/∂θ) log {f_θ(x) K_{φ(θ)}(x)} (∂/∂θ') log {f_θ(x) K_{φ(θ)}(x)} |_{θ=θ₀} g(x) dx;

then the next holds:

  E[ W(θ̂, θ₀) − ∫ {log f_θ̂(x) + log K_{φ(θ̂)}(x)} g(x) dx ] = (1/n) tr J(θ₀)^{-1} I(θ₀) + o(n^{-1}),   (17)

where the expectation E is taken with respect to x^(n).

PROOF. (outline) Consider the Taylor expansion of (15) at θ = θ₀:

  ∫ {log f_θ̂(x) + log K_{φ(θ̂)}(x)} g(x) dx
   = ∫ {log f_{θ₀}(x) + log K_{φ(θ₀)}(x)} g(x) dx − (1/2)(θ̂ − θ₀)' J(θ₀) (θ̂ − θ₀) + o(n^{-1}).   (26)

Here the first term in the right-hand side of the above equation can be rewritten, by the definition of M, as

  ∫ {log f_{θ₀}(x) + log K_{φ(θ₀)}(x)} g(x) dx = (1/n) Σ_{i=1}^n {log f_{θ₀}(x_i) + log K_{φ(θ₀)}(x_i)} + M.   (27)

Substituting (27) into (26), and expanding W(θ̂, θ₀) around θ = θ₀ in the same way, the difference W(θ̂, θ₀) − ∫ {log f_θ̂(x) + log K_{φ(θ̂)}(x)} g(x) dx is expressed, up to o(n^{-1}), by the quadratic form (θ̂ − θ₀)'(J(θ₀) + o(1))(θ̂ − θ₀) minus M. Moreover,

  ∫ M g(x^(n)) dx^(n) = 0,

where g(x^(n)) means the joint density of x^(n), and since √n (θ̂ − θ₀) is asymptotically distributed with covariance J(θ₀)^{-1} I(θ₀) J(θ₀)^{-1},

  E[(θ̂ − θ₀)' J(θ₀) (θ̂ − θ₀)] = (1/n) tr J(θ₀)^{-1} I(θ₀) + o(n^{-1}).

Finally, taking the average of the resulting expansion over x^(n), the proposition is given. □

The equation (17) shows the properness of using W(θ̂, θ₀) as an estimation of (15). However, θ₀ is unknown at first, and W(θ̂, θ̂) is considered as a possible estimation; moreover, J(θ₀) and I(θ₀) are replaced by the empirical matrices J(θ̂) and I(θ̂) computed from x^(n). The resulting criterion,

  W(θ̂, θ̂) − (1/n) tr J(θ̂)^{-1} I(θ̂),

is identical to TIC except for the terms containing K_{φ(θ)}(x). When the model f_θ(x) contains the true density g(x), J(θ₀) = I(θ₀), so tr J(θ₀)^{-1} I(θ₀) = dim θ and TIC is equal to AIC. Note that W(θ, θ₀) depends not only on f_θ(x) but also on the controller S_φ(x). This suggests that when the model is estimated or selected based on W(θ̂, θ̂), the selected model or its estimation value will change depending on the class of the selected controller and the design method.
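The criterion W(θ̂, θ̂) − (1/n) tr J(θ̂)^{-1} I(θ̂) can be computed from data once the model class and the controller-dependent factor K_{φ(θ)}(x) are fixed. The following sketch does this for an assumed scalar Gaussian model and an assumed design rule φ(θ) = 1 + θ² with S_φ(x) = exp(φx); both choices, and the grid search for θ̂, are illustrative only.

    # Sketch of the controller-dependent TIC-like criterion of Proposition 5:
    # W(theta_hat, theta_hat) - (1/n) tr( J_hat^{-1} I_hat ), where the per-sample
    # "log-likelihood" is log f_theta(x_i) + log K_{phi(theta)}(x_i).
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0.4, 1.0, size=400)            # data x^(n) from the unknown g(x)

    def ell(theta):                               # per-sample log f + log K, shape (n,)
        phi = 1.0 + theta ** 2                    # hypothetical controller design rule
        return (-0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)
                - np.log(phi) - phi * x)          # log K_phi(x) = -log(phi) - phi*x

    def d_ell(theta):                             # analytic d/dtheta of ell
        phi = 1.0 + theta ** 2
        return (x - theta) - 2 * theta / phi - 2 * theta * x

    def d2_ell(theta):                            # analytic d^2/dtheta^2 of ell
        phi = 1.0 + theta ** 2
        return -1.0 - 2 * (1 - theta ** 2) / phi ** 2 - 2 * x

    # maximum likelihood estimate over a coarse grid (enough for a sketch)
    grid = np.linspace(-1.0, 1.0, 2001)
    theta_hat = grid[np.argmax([ell(t).mean() for t in grid])]

    n = x.size
    J_hat = -d2_ell(theta_hat).mean()             # empirical J(theta_hat)
    I_hat = (d_ell(theta_hat) ** 2).mean()        # empirical I(theta_hat)
    criterion = ell(theta_hat).mean() - I_hat / (J_hat * n)
    print(theta_hat, criterion)                   # larger criterion -> preferred model/controller class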
4. SELECTION OF CONTROLLER DESIGN

In the previous section, an information criterion was derived for model estimation and selection. In this section, concrete controller designs are first defined, and then the criterion derived above is extended to a new one. First, one possible design method of the controller φ is shown.

Let P_φ f_θ(γ) be the performance distribution of a closed loop system, which depends on the model; then a possible evaluation of the controller φ is the average of the attained performance,

  ∫ γ P_φ f_θ(γ) dγ.   (33)

Other evaluations are also possible, but this formula will be used here to explain the following process. The optimal controller can be defined by

  φ₀(θ) := argmin_φ ∫ γ P_φ f_θ(γ) dγ.   (34)

Here the minimized function (33) is also written as

  ∫ γ P_φ f_θ(γ) dγ = ∫ S_φ(x) f_θ(x) dx.   (35)

For example, when γ is some operator norm of a closed loop system, the minimization of (35) over φ is an infinite-dimensional optimization and is very hard to solve in general. A reasonable alternative is the Monte Carlo method: first prepare sample points x_i (i = 1, ..., m) which obey the probability density f_θ(x) and transform them to the performances at the sample points; then the controller is selected as the optimizer over the performance samples as

  φ̂ := argmin_φ (1/m) Σ_{i=1}^m γ_i = argmin_φ (1/m) Σ_{i=1}^m S_φ(x_i).   (36)

Here assume that f_θ is nonsingular; then the sample points x_i which obey f_θ are generated by using uniformly distributed sample data {z_i} (=: z^(m)) on a sample space Z of the corresponding dimension and a variable transformation T_θ, which is defined by f_θ, as follows (Fushimi, 1994):

  x_i = T_θ(z_i).   (37)

Now define

  U_{φ,θ}(z) := S_φ(T_θ(z));

then (36) is written as

  φ̂(θ, z^(m)) = argmin_φ (1/m) Σ_{i=1}^m S_φ(T_θ(z_i)) =: argmin_φ (1/m) Σ_{i=1}^m U_{φ,θ}(z_i).   (38)

Corresponding to φ̂(θ, z^(m)), φ₀(θ) is also given by

  φ₀(θ) = argmin_φ ∫ U_{φ,θ}(z) h(z) dz,   (39)

where h(z) is a uniform distribution density. The important point is that φ̂ is a function of θ and the random variable z^(m), and therefore φ̂ is also a random variable.
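The sampling-based selection (36)-(39) can be sketched numerically as follows; the exponential density f_θ, its inverse-CDF transform T_θ, and the performance map S_φ(x) = x/φ + φ are hypothetical stand-ins for the plant model and the closed-loop performance, chosen only so that the minimizer of (39) is known analytically.

    # Sketch of the Monte Carlo controller selection (36)-(38): draw uniform z_i,
    # map them through T_theta (inverse CDF of f_theta) to samples x_i, and pick
    # phi_hat = argmin (1/m) sum S_phi(T_theta(z_i)) over a finite controller grid.
    import numpy as np

    rng = np.random.default_rng(3)
    theta = 2.0                                   # estimated model parameter
    m = 5000
    z = rng.uniform(size=m)                       # uniform samples z^(m) with density h(z)

    def T_theta(z, theta):                        # inverse-CDF transform: x ~ f_theta (exponential, rate theta)
        return -np.log(1.0 - z) / theta

    def S(x, phi):                                # hypothetical closed-loop performance index
        return x / phi + phi

    x = T_theta(z, theta)
    phis = np.linspace(0.2, 3.0, 300)             # finite controller class (grid)
    avg_perf = np.array([S(x, p).mean() for p in phis])
    phi_hat = phis[np.argmin(avg_perf)]           # Monte Carlo optimizer (36)
    print(phi_hat, 1.0 / np.sqrt(theta))          # compare with the analytic minimizer of (39)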
Because φ̂ is a random variable, next consider the average over z^(m) of the term

  (1/n) Σ_{i=1}^n log K_φ(x_i)

in W(θ, θ₀) at φ = φ̂(θ, z^(m)). At first the next is given:

  ∫ h(z^(m)) (1/n) Σ_{i=1}^n log K_{φ̂(θ,z^(m))}(x_i) dz^(m)
   = (1/n) Σ_{i=1}^n log K_{φ₀(θ)}(x_i)
    + ∫ h(z^(m)) { (1/n) Σ_{i=1}^n (∂/∂φ) log K_{φ₀(θ)}(x_i) (φ̂(θ, z^(m)) − φ₀)
    + (1/2)(φ̂(θ, z^(m)) − φ₀)' [ (1/n) Σ_{i=1}^n (∂²/∂φ∂φ') log K_{φ₀(θ)}(x_i) ] (φ̂(θ, z^(m)) − φ₀) } dz^(m)
    + o(m^{-1}).   (40)

The first-order term in (40) is of higher order after the average over z^(m), and the deviation φ̂(θ, z^(m)) − φ₀(θ) has, with respect to z^(m), covariance (1/m) R(φ₀, θ)^{-1} Q(φ₀, θ) R(φ₀, θ)^{-1} up to higher-order terms, where

  R(φ, θ) := ∫ h(z) (∂²/∂φ∂φ') U_{φ,θ}(z) dz,
  Q(φ, θ) := ∫ h(z) (∂/∂φ) U_{φ,θ}(z) (∂/∂φ') U_{φ,θ}(z) dz.

Defining

  V(φ, θ) := −(1/2) Σ_{i=1}^n (∂²/∂φ∂φ') log K_{φ(θ)}(x_i),

the average of the second-order term in (40) becomes −(1/(nm)) tr R(φ₀, θ)^{-1} Q(φ₀, θ) R(φ₀, θ)^{-1} V(φ₀, θ) + o(m^{-1}). By taking the average of W(θ̂, θ₀) over z^(m), the following is then given:

  ∫ h(z^(m)) ∫ {log f_θ̂(x) + log K_{φ̂(θ̂,z^(m))}(x)} g(x) dx dz^(m)
   = (1/n) Σ_{i=1}^n {log f_θ̂(x_i) + log K_{φ₀(θ̂)}(x_i)} + Ŵ
    − (1/n) tr J(φ̂, θ₀)^{-1} I(φ̂, θ₀)
    − (1/(nm)) tr R(φ₀, θ̂)^{-1} Q(φ₀, θ̂) R(φ₀, θ̂)^{-1} V(φ₀, θ̂)
    + o(n^{-1}) + o(m^{-1}),   (43)

where J(φ, θ) and I(φ, θ) are the matrices of Proposition 5 evaluated with the controller fixed at φ, and Ŵ is a zero-mean remainder analogous to M. In the above equation, the true value of Ŵ is unknown but its average is 0; therefore approximate it by 0, and also introduce the approximations R(φ₀, θ̂) → R(φ̂, θ̂), φ₀(θ̂) → φ̂(θ̂, z^(m)) and so on. Then an information criterion can be given for the estimation/selection of the system model and also for the selection of the controller design method as follows:

  Y(φ̂, θ̂) = (1/n) Σ_{i=1}^n {log f_θ̂(x_i) + log K_{φ̂(θ̂,z^(m))}(x_i)}
   − (1/n) tr J(φ̂, θ̂)^{-1} I(φ̂, θ̂)
   − (1/(nm)) tr R(φ̂, θ̂)^{-1} Q(φ̂, θ̂) R(φ̂, θ̂)^{-1} V(φ̂, θ̂).   (44)
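Under the same hypothetical model and performance map as in the previous sketch, the criterion (44) reduces, for scalar θ and φ, to a combination of empirical averages; the following sketch assembles it from the empirical counterparts of J, I, R, Q and V. The numerical values of θ̂ and φ̂ are assumed to come from the preceding estimation and design steps.

    # Sketch of assembling the criterion (44) for scalar theta and phi, reusing the
    # hypothetical exponential model f_theta and performance map S_phi(x) = x/phi + phi
    # of the previous sketch, for which K_phi(x) = phi (constant in x).
    import numpy as np

    rng = np.random.default_rng(4)
    theta_hat, phi_hat = 2.0, 1.0 / np.sqrt(2.0)  # estimates from the previous steps
    n, m = 400, 5000
    x = rng.exponential(1.0 / 2.0, size=n)        # observed data x^(n)
    z = rng.uniform(size=m)                       # design samples z^(m)
    T = -np.log(1.0 - z) / theta_hat              # T_theta(z): samples used for the design

    # empirical J, I: score/Hessian in theta of log f_theta(x) + log K_phi(x)
    # (for the exponential density log f = log theta - theta*x; K_phi is theta-free here)
    score = 1.0 / theta_hat - x
    hess = -np.full(n, 1.0 / theta_hat ** 2)
    J_hat, I_hat = -hess.mean(), (score ** 2).mean()

    # empirical R, Q from U_{phi,theta}(z) = T/phi + phi, and V from log K_phi = log phi
    dU = 1.0 - T / phi_hat ** 2                   # d U / d phi
    d2U = 2.0 * T / phi_hat ** 3                  # d^2 U / d phi^2
    R_hat, Q_hat = d2U.mean(), (dU ** 2).mean()
    V_hat = -0.5 * n * (-1.0 / phi_hat ** 2)      # -(1/2) sum_i d^2/dphi^2 log K_phi(x_i)

    log_f_term = np.mean(np.log(theta_hat) - theta_hat * x)
    log_K_term = np.log(phi_hat)                  # (1/n) sum log K_phi(x_i), constant here
    Y = (log_f_term + log_K_term
         - I_hat / (J_hat * n)                    # model-complexity bias, as in Section 3
         - (Q_hat * V_hat) / (R_hat ** 2 * n * m))  # controller-complexity bias of (44)
    print(Y)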
The formulae (43) and (44) show that Y contains, in addition to the bias term of W(θ̂, θ̂) in the previous section, a new bias term which is proportional to the number of parameters of the controller. This suggests that when the structure of the controller is complex, a controller which shows high performance for the given data will be designed; however, when the data x used for the controller design obeys a stochastic process, such a complex controller tends to be specialized only to the given data. The new bias term can be interpreted as a penalty that avoids such overly complex controllers.

5. CONCLUSION

In this paper, information criteria were proposed for the model estimation and selection problem, based on AIC/TIC, with respect to the attained performance of closed loop systems. They are composed of an entropy term together not only with a bias term which is proportional to the complexity of the model but also with a new term which is proportional to the complexity of the controller.

The concrete methods of controller design were not specified, and the discussion proceeded under the assumption of their feasibility; however, the actual execution of solving a controller design problem such as (36) is very hard, and this problem is left for future work.

The same discussion based on MDL (Rissanen, 1984), which is related to the complexity of a model and its estimation/selection problem, is also applicable.

6. REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control AC-19-6, 716-723.
Dahleh, M. A., T. V. Theodosopoulos and J. N. Tsitsiklis (1993). The sample complexity of worst-case identification of FIR linear systems. Systems & Control Letters 20-3, 157-166.
Fushimi, M. (1994). Stochastic Method and Simulation. Iwanami Seminar, Applied Mathematics. Iwanami.
Konishi, S. and G. Kitagawa (1996). Generalised information criteria in model selection. Biometrika 83-4, 875-890.
Kullback, S. (1959). Information Theory and Statistics. Mathematical Statistics. John Wiley & Sons, Inc., New York.
Lasota, A. and M. C. Mackey (1995). Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, Second edition. Vol. 97 of Applied Mathematical Sciences. Springer-Verlag, New York.
Rissanen, J. (1984). Universal coding, information, prediction and estimation. IEEE Trans. Information Theory IT-30, 629-636.
Takeuchi, K. (1976). Distribution of information statistics and criteria for adequacy of models. Mathematical Sciences 153, 12-18.
Tempo, R., E. W. Bai and F. Dabbene (1996). Probabilistic robustness analysis: explicit bounds for the minimum number of samples. In: Proceedings of the 35th CDC. pp. 3424-3428.
Tsumura, K. and S. Shin (1995). Decision of nominal model and closed loop system design. In: The 18th SICE Symposium on Dynamical System Theory. pp. 285-288.
Tsumura, K. and S. Shin (1996). Robust control with distribution condition of system parameters. In: The 25th SICE Symposium on Control Theory. pp. 247-252.
Tsumura, K. and S. Shin (1997). Simultaneous modeling and data-distribution-dependent robust control system design. In: Preprints of the 11th IFAC Symposium on System Identification, Kitakyushu. pp. 129-134.
Vidyasagar, M. (1997). A Theory of Learning and Generalization. Springer, London.