PRINCIPLES OF INFORMATION THEORY OF DYNAMIC SYSTEMS
Petrov B.N., Zaporozhets A.V., Petrov V.V., Kochubievsky I.D., Ulanov G.M., Hazen E.M., Ageev V.M., Ulianov S.V.
Institute of Control Sciences, Moscow, USSR

In solving problems of the analysis and synthesis of the characteristics of complex systems with a developed logic of hierarchical relations, it is necessary to form generalized, complex criteria that allow one to compare from the same positions, and to confront on reasoned grounds, different realizations of the projected systems. One such criterion is given by the theory of information.
It is known that, from a certain general point of view (1), mathematics deals only with sets, and the relations between the objects of mathematical analysis are described by set theory. From this standpoint, information theory studies the quantitative characteristics of the topological properties of mappings of sets in topological spaces. In many problems it is assumed that the sets of a topological space admit metrization and that the elements of a set possess probability characteristics. But since the information characteristic deals with the topological properties of mappings, it is not always connected with the metric of the space; this explains the mutual independence of the information and accuracy characteristics. In what follows the report deals with metrized topological spaces whose elements possess probability characteristics.

The majority of the problems considered in statistical dynamics reduce to the analysis of the generalized structure shown in Fig. 1, where u(t) is the useful signal, n(t) and m(t) are noises, H(jω) and W(jω) are the ideal and the real transformation operators, æ(t) and K(t) are the ideal and the real impulse responses, and h(t) and y(t) are the ideal and the real outputs of the system under study. Understanding by the information characteristic of such a system the value of the mean quantity of information about h(t₁) contained in y(t₁) for an arbitrary moment t₁, we shall have
$$ J[h(t_1), y(t_1)] = \iint P(h, y)\,\log\frac{P(h, y)}{P(h)\,P(y)}\;dh\,dy \qquad (1) $$
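As a numerical sanity check of (1): assuming h and y jointly Gaussian with correlation coefficient ρ, the integral has the closed form −½ ln(1 − ρ²), and a crude histogram plug-in estimate approaches it. The estimator and all numbers below are illustrative assumptions, not material from the paper; the last line echoes the remark above that the information characteristic is topological rather than metric.

```python
# A minimal numerical check of (1), assuming h and y jointly Gaussian with
# correlation coefficient rho; the integral then equals -0.5*ln(1 - rho^2).
import numpy as np

rng = np.random.default_rng(0)
n, rho = 1_000_000, 0.8
h = rng.standard_normal(n)
y = rho * h + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def info_plugin(a, b, bins=60):
    """Plug-in estimate of (1): sum P(a,b) * log(P(a,b) / (P(a) * P(b)))."""
    pab, _, _ = np.histogram2d(a, b, bins=bins)
    pab /= pab.sum()
    pa = pab.sum(axis=1, keepdims=True)   # marginal P(a)
    pb = pab.sum(axis=0, keepdims=True)   # marginal P(b)
    nz = pab > 0
    return float((pab[nz] * np.log(pab[nz] / (pa @ pb)[nz])).sum())

print(info_plugin(h, y), -0.5 * np.log(1 - rho**2))   # both close to 0.51 nat
# A monotone distortion of y ruins mean-square closeness but (up to binning
# error) leaves the quantity of information about h unchanged.
print(info_plugin(h, np.cbrt(y)))                     # still close to 0.51 nat
```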
For a system described by linear differential equations and acting under stationary Gaussian inputs, the value of the information characteristic in the steady-state regime is determined by relation (2), where S(jω) denotes the corresponding spectral density.
Analysis of relation (2) makes it possible to obtain the value of the information characteristic for a system of arbitrary structure with any number of points of application of the inputs, to pass to the classical relations of information theory, and to give an exact statement of the conditions of absolute invariance and of invariance up to ε. The difference between relation (2) and the known results is that it gives the exact values of the information characteristic, and not merely the upper and lower limits of its possible values.
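The explicit form of relation (2) is given in the original; for the stationary Gaussian case a standard closed form of this kind is the Gelfand-Yaglom expression of the mutual-information rate through the coherence of h(t) and y(t). The sketch below evaluates it for a first-order example in which the operators, noise level, and frequency grid are all invented for illustration.

```python
# Hedged illustration for the stationary Gaussian case: the Gelfand-Yaglom
# formula expresses the mutual-information rate between h(t) and y(t) through
# their coherence |S_hy|^2 / (S_h * S_y).  All parameters are assumptions.
import numpy as np

w  = np.linspace(-200.0, 200.0, 400_001)    # frequency grid, rad/s
Su = 1.0 / (w**2 + 1.0)                     # spectral density of u(t), assumed
H  = 1.0 / (1j * w + 1.0)                   # ideal operator H(jw), assumed
Wr = 1.0 / (1j * w + 2.0)                   # real operator W(jw), assumed
Sm = 0.01                                   # white output noise m(t), assumed

Sh  = np.abs(H)**2 * Su                     # S_h(w): ideal output
Sy  = np.abs(Wr)**2 * Su + Sm               # S_y(w): real output plus noise
Shy = H * np.conj(Wr) * Su                  # cross-spectral density of h and y
coh = np.abs(Shy)**2 / (Sh * Sy)            # coherence, 0 <= coh < 1

rate = -np.trapz(np.log(1.0 - coh), w) / (4.0 * np.pi)
print(f"information rate about h contained in y: {rate:.3f} nat/s")
```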
For multidimensional systems the value of the information characteristic is determined by relation (3), where all the variables are given in vector-matrix form.

The obtained characteristics allow one to pose directly the variational problem for the vector of optimum transfer functions maximizing the value of the quantity of information. A peculiarity of the functionals of types (2), (3) consists in their nonlocality, which does not allow the apparatus of classical variational calculus to be used; a special treatment, with an investigation of the sufficiency of the obtained conditions, is therefore necessary.

For one-dimensional linear systems, if we neglect constant factors and take into account the monotonicity of the logarithmic dependence, the modified functional is written down as follows:

$$ V[K(t)] = \frac{\left[\iint R_x(t-\tau)\,K(t)\,æ(\tau)\,dt\,d\tau + \int R_{mx}(t)\,æ(t)\,dt\right]^{2}}{\iint R_x(t-\tau)\,K(t)\,K(\tau)\,dt\,d\tau + 2\int R_{mx}(t)\,K(t)\,dt + R_m(0)} \qquad (4) $$

As is known (4), the necessary and sufficient condition for an extremum of the functional is

$$ \left.\frac{\partial V(\alpha)}{\partial \alpha}\right|_{\alpha=0} = 0, $$

where α is a real parameter independent of t and z(t) is a variation of K(t). In this case

$$ V(\alpha) = \frac{(A + \alpha B)^{2}}{C\alpha^{2} + 2D\alpha + E}\,, \qquad (5) $$

where

$$ A = \iint R_x(t-\tau)\,K(t)\,æ(\tau)\,dt\,d\tau + \int R_{mx}(t)\,æ(t)\,dt, $$
$$ B = \iint R_x(t-\tau)\,z(t)\,æ(\tau)\,dt\,d\tau, $$
$$ C = \iint R_x(t-\tau)\,z(t)\,z(\tau)\,dt\,d\tau, $$
$$ D = \iint R_x(t-\tau)\,z(t)\,K(\tau)\,dt\,d\tau + \int R_{mx}(t)\,z(t)\,dt, $$
$$ E = \iint R_x(t-\tau)\,K(t)\,K(\tau)\,dt\,d\tau + 2\int R_{mx}(t)\,K(t)\,dt + R_m(0). \qquad (6) $$
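A discretized check of (4)-(6), under invented kernels: substituting K → K + αz into (4) gives exactly the rational function (5), whose derivative at α = 0 is 2A(BE − AD)/E², as stated in (7) below. The grid, correlation functions, and trial responses are assumptions for illustration only.

```python
# Discretized check of (4)-(6): V(alpha) from (5) versus the analytic
# derivative 2*A*(B*E - A*D)/E**2 at alpha = 0.  All kernels are invented.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 200)
dt = t[1] - t[0]
Rxm = np.exp(-np.abs(t[:, None] - t[None, :]))   # assumed R_x(t - tau)
Rmx = 0.3 * np.exp(-t)                           # assumed R_mx(t)
Rm0 = 0.5                                        # assumed R_m(0)
ae  = np.exp(-2.0 * t)                           # ideal impulse response
K   = np.exp(-t)                                 # trial impulse response
z   = rng.standard_normal(t.size)                # variation z(t)

A = K @ Rxm @ ae * dt**2 + Rmx @ ae * dt
B = z @ Rxm @ ae * dt**2
C = z @ Rxm @ z * dt**2
D = z @ Rxm @ K * dt**2 + Rmx @ z * dt
E = K @ Rxm @ K * dt**2 + 2.0 * Rmx @ K * dt + Rm0

V = lambda a: (A + a * B)**2 / (C * a**2 + 2.0 * D * a + E)
eps = 1e-6
print((V(eps) - V(-eps)) / (2.0 * eps))   # numerical derivative of V at 0
print(2.0 * A * (B * E - A * D) / E**2)   # analytic value; the two agree
```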
From relation (5) follows

$$ \left.\frac{\partial V(\alpha)}{\partial \alpha}\right|_{\alpha=0} = \frac{2A(BE - AD)}{E^{2}} = 0, \qquad (7) $$

which leads to two extremum conditions:

1) A = 0, which corresponds to the trivial minimum of the initial functional (2);

2) BE − AD = 0, (8)

which corresponds to the maximum of the initial functional. In (5) it is shown that condition (8) is not only a necessary but also a sufficient condition of the extremum. Expanding (8), we shall have

$$ \int \Phi(t)\,z(t)\,dt = 0, \qquad (9) $$

where

$$ \Phi(t) = E\int R_x(t-\tau)\,æ(\tau)\,d\tau - A\left[\int R_x(t-\tau)\,K(\tau)\,d\tau + R_{mx}(t)\right]. $$

According to the fundamental lemma of the calculus of variations (4), from (9) follows the necessary and sufficient condition of the maximum of functional (2) in the form of a system of integral equations:

$$ \int R_x(t-\tau)\,K(\tau)\,d\tau + R_{mx}(t) = \lambda \int R_x(t-\tau)\,æ(\tau)\,d\tau, \qquad \lambda = E/A. \qquad (10) $$

Analyzing the obtained relations, it should be noted that if λ = 1 in system (10), the system of integral equations providing the maximum of the information indices degenerates into the well-known Wiener-Hopf equation providing the minimum of the mean square error. On the other hand, for λ ≠ 1 the maximization of the information indices is reached by solutions other than those minimizing the error variance.

The solution of system (10) has the following form.

1) For systems with unrestricted characteristics:

$$ W(j\omega) = \lambda H(j\omega) - \frac{S_{mx}(j\omega)}{S_x(\omega)}\,. \qquad (11) $$

2) For systems satisfying the condition K(t) = 0 for t < 0:
$$ W(j\omega) = \frac{1}{\Psi(j\omega)}\left[\frac{\lambda\,S_x(j\omega)H(j\omega) - S_{mx}(j\omega)}{\Psi(-j\omega)}\right]_{+}, \qquad \Psi(j\omega)\,\Psi(-j\omega) = S_x(\omega), \qquad (12) $$

where [·]₊ denotes the physically realizable part.
3) For systems satisfying the conditions K(t) = 0 for t ≤ 0 and t ≥ T:
$$ \int_0^T R_x(t-\tau)\,K(\tau)\,d\tau + R_{mx}(t) = \lambda \int_0^T R_x(t-\tau)\,æ(\tau)\,d\tau, \qquad 0 \le t \le T, $$
$$ \lambda = \frac{\int_0^T\!\!\int_0^T R_x(t-\tau)K(t)K(\tau)\,dt\,d\tau + 2\int_0^T R_{mx}(t)K(t)\,dt + R_m(0)}{\int_0^T\!\!\int_0^T R_x(t-\tau)K(t)\,æ(\tau)\,dt\,d\tau + \int_0^T R_{mx}(t)\,æ(t)\,dt}\,. \qquad (13) $$
4) For systems with variable parameters the system of integral equations takes the analogous form

$$ \int_{t_0}^{t} R_x(\tau,\theta)\,K(t,\theta)\,d\theta + R_{mx}(t,\tau) = \lambda(t)\int_{t_0}^{t} R_x(\tau,\theta)\,æ(t,\theta)\,d\theta, \qquad t_0 \le \tau \le t, \qquad (14) $$

where λ(t) is defined by the ratio analogous to that in (13), with the integration carried out over the interval [t₀, t].
5) For discrete systems:

$$ \sum_i R_x(l-i)\,K(i) + R_{mx}(l) = \lambda \sum_i R_x(l-i)\,æ(i). \qquad (15) $$

6) For discrete systems with a finite time of observation the solution is sought in the form of the finite expansion

$$ K(iT) = a_0 + a_1(iT) + \dots + a_k(iT)^{k} + c_1\delta(iT) + \dots + c_N\delta(iT - NT), \qquad i = 0, \dots, N. \qquad (17) $$
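A sketch of solving the discrete system (15) numerically: for a given multiplier λ the equation is a linear system for the sequence K, and setting λ = 1 reproduces the minimum-mean-square (Wiener) solution discussed above, while other values of λ give different weighting sequences. All correlation data below are invented, and the quadratic error criterion assumes the model of Fig. 1 with a common input x(t).

```python
# Discrete system (15): R_x K + R_mx = lambda * (R_x ae) is linear in K for a
# given lambda.  With lambda = 1 it degenerates into the Wiener-Hopf case;
# the loop verifies that lambda = 1 minimizes the mean-square error.
import numpy as np

N = 64
i = np.arange(N)
Rx  = 0.9 ** np.abs(i[:, None] - i[None, :])   # Toeplitz matrix R_x(l - i)
Rmx = 0.2 * 0.8 ** i                           # assumed R_mx(l)
ae  = 0.7 ** i                                 # ideal weighting sequence ae(i)

def K_of(lam):
    """Solve (15) for K at the given multiplier lambda."""
    return np.linalg.solve(Rx, lam * (Rx @ ae) - Rmx)

def mse(K):
    """Mean-square error between real and ideal outputs, constants dropped."""
    # var(y - h) = (K-ae)' Rx (K-ae) + 2 Rmx'(K-ae) + const
    return K @ Rx @ K - 2.0 * K @ Rx @ ae + 2.0 * Rmx @ (K - ae)

for lam in (0.8, 1.0, 1.2):
    print(lam, mse(K_of(lam)))   # minimum of the MSE is at lambda = 1
```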
One of the possible ways of investigating nonlinear systems is the application of linearization methods. If the mean square value of the error is used as the criterion of the quality of the system, it is quite natural to substitute for the nonlinear element a linear one with a gain equal to the ratio of the root-mean-square values of the output and input signals. The use of information criteria, however, leads to the necessity of examining the questions of linearization from the positions of information theory. Let the nonlinear element be described by the nonlinear function

y = f(x).
It is required to find a linear characteristic Φ(jω) which in its properties would be close to the given nonlinear one. From the point of view of information theory, the characteristics will be equivalent if they change the entropy of the input signal x(t) in the same way. Then, if we determine the entropy H_x of the signal at the input of the nonlinear element and the entropy H_y at its output, it follows from what was said that the equivalent linear characteristic has to satisfy the relation

$$ H_x - H_y \big|_{\text{nonlinear}} = H_x - H_y \big|_{\text{linear}}. \qquad (18) $$

In the general case the determination of the quantities H_x and H_y is a complicated problem. But if we suppose that the processes at the input and at the output are Gaussian, and that the discretization interval Δt is connected with the upper frequency F by the relation of V.A. Kotelnikov, Δt = 1/(2F), we shall have for the linear element

$$ H_y - H_x = 2T\int_0^F \log\left|\Phi(jf)\right|\,df. \qquad (19) $$

Using the relation

$$ H = n\log\sqrt{2\pi e} + \frac{n}{2F}\int_0^F \log\left[2F\,S(f)\right]\,df \qquad (20) $$

we get

$$ H_x - H_y = \frac{n}{2F}\int_0^F \log\frac{S_x(f)}{S_y(f)}\,df, \qquad (21) $$

where S_x and S_y are the spectral densities of the signals x(t) and y(t). Since n = 2TF, after evident transformations and comparison with (19) the last expression is brought to the form

$$ \int_0^F \log\frac{S_y(f)}{S_x(f)}\,df = \int_0^F \log\left|\Phi(jf)\right|^{2}\,df. \qquad (23) $$

In this relation the quantity F is connected with the interval of discretization; if we demand equality of the losses of entropy in the linear and nonlinear elements for any interval of discretization, then from the relation it follows that

$$ \left|\Phi(jf)\right|^{2} = \frac{S_y(f)}{S_x(f)}\,. $$

This result coincides with the principle of linearization suggested in (6). Of course, it must not be forgotten that with a nonlinear characteristic f(x) the signals x(t) and y(t) cannot be Gaussian simultaneously; the supposition made is therefore, strictly speaking, contradictory. It is known that, for a given correlation function, the entropy takes its maximum value if the process is Gaussian; formula (23) therefore gives an averaged, excessive value of the amplitude characteristic. Thus the methods of information theory allow one to approach the problems of linearization of nonlinear characteristics and to reveal the main peculiarities which arise under the different suppositions used in practice.
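A numerical illustration of the rule |Φ(jf)|² = S_y(f)/S_x(f): estimate the input and output spectral densities of a saturating element and take the square-root ratio. The band-limited Gaussian input and the tanh nonlinearity are invented for the example; spectral densities are estimated with Welch's method.

```python
# Statistical linearization according to (23): |Phi(jf)| = sqrt(S_y/S_x).
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
fs, n = 1000.0, 2_000_000
b, a = signal.butter(2, 50.0, fs=fs)          # band-limit the input to 50 Hz
x = signal.lfilter(b, a, rng.standard_normal(n))
y = np.tanh(2.0 * x)                          # nonlinear element y = f(x)

f, Sx = signal.welch(x, fs=fs, nperseg=4096)
_, Sy = signal.welch(y, fs=fs, nperseg=4096)
amp = np.sqrt(Sy / Sx)                        # |Phi(jf)| according to (23)

print(amp[f < 50.0].mean())                   # information-equivalent gain
print(np.std(y) / np.std(x))                  # rms-ratio gain, for comparison
```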
The design of optimum systems of automatic control and of complicated information systems is in many cases connected with the problems of the reduction of large massifs of input information, of the simplification of the description of the processes in the projected system, and of the evaluation of the losses in the quality of work of the system in comparison with the maximum potentially possible exactness.

A series of estimates for the growth of risk under the reduction of data and the simplification of the description of complicated systems of automatic control can be obtained by formulating the problems of control as problems of the theory of optimum statistical decisions (12).
Let x₁, x₂, …, xₙ denote the successively received input data (the results of observations); θ is the unknown parameter about which the decision is taken according to the results of the observations; δ(x₁, x₂, …, xₙ) is the decision function, which confronts the received values (x₁, x₂, …, xₙ) with some value θ̂; W(δ(x₁, …, xₙ), θ) is the weight function of losses (or gains) under the taken decision δ(x₁, …, xₙ) and the true value θ. In the problems of sequential analysis a decision is taken also about the choice of the moment of stopping the observations, n = n(x₁, …, xₙ), and about the taking of the final decision about θ. The risk R is the mathematical expectation of the losses. It is supposed here that the input data x₁, …, xₙ represent random quantities with the joint distribution of probabilities P(x₁, …, xₙ; θ), n = 1, 2, ….

The quantity

$$ \rho = \inf_{\delta}\,M\left[W\big(\delta(x_1, \dots, x_n), n, \theta\big)\right] \qquad (24) $$

is called the Bayes risk; the lower bound is taken over all sequential rules δ.

Let us suppose that the simplification of the description of the system, or the reduction of the input data, comes to the following: instead of the initial data x₁, …, xₙ, …, the decision on the stopping of the observations and on the choice of the value θ̂ is taken on the basis of the successively arriving reduced data x̃₁ = ψ₁(x₁), …, x̃ₙ = ψₙ(x₁, …, xₙ), … (where ψ₁, ψ₂, … are some given functions). The risk for the optimum decision rule with the reduced data we shall designate by ρ̃.

Theorem. Under the reduction of data, the growth of the Bayes risk ρ̃ in comparison with the Bayes risk ρ in the initial problem of sequential statistical decisions is limited by the inequalities (25), (26).

In inequalities (25), (26) it is supposed that the function of losses W(δ(x₁, …, xₙ), θ) is bounded, W(…) ≤ C; ρ(W²) designates the Bayes risk in the problem of statistical decisions in which the weight function of losses W is replaced by W²; the difference ρ(W²) − ρ² represents the dispersion of the risk.
The quantities entering (25), (26) are

$$ \Delta J_1 = J\big((x_1, \dots, x_n);\,\theta\big) - J\big((\tilde{x}_1, \dots, \tilde{x}_n);\,\theta\big), $$

where J((x₁, …, xₙ); θ) is Shannon's quantity of information contained in the sequential sample (x₁, …, xₙ) relative to the parameter θ, and J((x̃₁, …, x̃ₙ); θ) is the quantity of information in the reduced sample, and

$$ \Delta J_2 = J\big(\tilde{x}_{m+1};\,(x_1, \dots, x_m)\big) - J\big(\tilde{x}_{m+1};\,(\tilde{x}_1, \dots, \tilde{x}_m)\big); $$

ΔJ₂ describes the losses of Shannon's quantity of information contained in the reduced data (x̃₁, …, x̃ₘ) relative to the following value x̃_{m+1}, in comparison with the information contained in the full data (x₁, …, xₘ); the averages are carried out over all samples (x₁, …, xₘ) for which the decision is taken not earlier than the m-th step (m ≤ n). The proof of the formulated theorem is set forth in (8).
Inequalities (24)-(26) represent a generalization, to the problems of sequential analysis, of the results obtained in the works of A. Peres (7) for the case of a beforehand-fixed time of observation n. In the case n = const the term ΔJ₂ (describing the losses of information about the future course of the process, necessary for the choice of the moment of stopping the observations) falls out of formulas (24)-(26).
The quantity ΔJ₁ = 0 if the reduced quantities x̃₁, …, x̃ₙ are sufficient for the estimation of the parameter θ (where sufficiency is understood according to Fisher). The quantity ΔJ₂ = 0 if (x̃₁, …, x̃ₘ) are sufficient for the estimation of x̃_{m+1}; in other words, if the conditional distributions satisfy P(x̃_{m+1} | x₁, …, xₘ) = P(x̃_{m+1} | x̃₁, …, x̃ₘ). This property of the statistics (x̃₁, …, x̃ₘ) is called transitivity. Thus, if under the reduction of data the properties of sufficiency and transitivity are kept, the risk for the optimum sequential decision rule does not grow. In this particular case (ΔJ₁ = 0, ΔJ₂ = 0) the affirmation of the theorem coincides with the result of Bahadur (1954), who proved the admissibility of using sufficient and transitive statistics for the construction of sequential decision rules.
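A hedged illustration of ΔJ₁ for a fixed number of observations: with Gaussian observations x_i = θ + ξ_i the sample mean is sufficient, so reducing the data to it costs no Shannon information about θ, whereas reducing each observation to its sign does. The prior, the noise level, and the Monte Carlo scheme below are assumptions of this example.

```python
# Delta_J1 under two reductions of x_1..x_n with x_i = theta + noise:
# the sample mean (sufficient, no loss) versus the signs of the x_i (lossy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sigma = 10, 1.0
theta = rng.standard_normal(100_000)               # theta ~ N(0, 1) prior

# Sufficiency: J((x_1..x_n); theta) = J(mean; theta) = 0.5*ln(1 + n/sigma^2)
J_full = 0.5 * np.log(1.0 + n / sigma**2)

# Reduced data: k = number of positive x_i; k | theta ~ Binomial(n, Phi(theta))
p = stats.norm.cdf(theta / sigma)
pk_cond = np.array([stats.binom.pmf(k, n, p) for k in range(n + 1)])
pk = pk_cond.mean(axis=1)                          # marginal P(k)
J_signs = (pk_cond * np.log(pk_cond / pk[:, None])).mean(axis=1).sum()

print(J_full, J_signs, "Delta_J1 =", J_full - J_signs)
```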
Inequalities (24)-(26), however, also cover those cases when the properties of sufficiency and transitivity are not kept under the reduction of data. But if the losses ΔJ₁, ΔJ₂ of Shannon's quantity of information are small, the risk ρ̃ increases only a little in comparison with ρ.

The information estimates (24)-(26) allow one to compare the risks ρ̃ and ρ without the construction of optimum sequential decision rules. The initial data (x₁, …, xₙ) can be multidimensional (xᵢ a vector) while the reduced quantities x̃ᵢ are one-dimensional; the xᵢ can be continuous and the x̃ᵢ discrete quantities, and so forth.
The construction of optimum sequential rules using the full data (x₁, …, xₙ, …) can demand such a "volume of memory" and such a speed of operation of electronic computers as may be unachievable in practice. The calculation, or the estimation from above, of the information quantities ΔJ₁, ΔJ₂ is simpler and allows one to judge the proximity of the reduced rule to the optimum, and therefore the applicability of this or that reduction or simplification of the system.
In some cases the information quantities ΔJ₁, ΔJ₂ can be determined analytically. If θ is a discrete parameter taking the values θ₁, …, θ_K, then asymptotically, as n → ∞ (in the case of independent observations and a beforehand-fixed number of observations), J((x₁, …, xₙ); θ) tends to the entropy of the a priori distribution of θ. For a continuous parameter (θ a scalar or a vector), under independent observations and n → ∞,

$$ J\big((x_1, \dots, x_n);\,\theta\big) = \frac{1}{2}\int P(\theta)\,\ln\det\left[\bar{n}(\theta)\,I(\theta)\right]d\theta + \mathrm{const}\left[P(\theta)\right], $$

where I(θ) is Fisher's information matrix and n̄(θ) is the mathematical expectation of the number of observations. The calculation of ΔJ simplifies especially in the case of Gaussian processes.

The construction of sequential decision rules is founded on the solution of a recurrence equation for the conditional risk function. Let us suppose that under the reduction of the observed data a transition takes place to a function Lₙ = L(x̃₁, …, x̃ₙ) such that the values L₁, …, Lₙ, … form a Markov process. Let S(L; m) designate the conditional mathematical expectation of the future losses under the optimum decision rule, under the condition that after m observations we have received the value L(x̃₁, …, x̃ₘ) = L; W(L; m) is the conditional mathematical expectation of the losses in the stop. Then

$$ S(L; m) = \min\left\{ W(L; m);\ \int \left[S(L_{m+1}; m+1) + C(L_{m+1}; m+1)\right] P(dL_{m+1} \mid L) \right\}, \qquad (27) $$

where C(L_{m+1}; m+1) is the cost of carrying out the next, (m+1)-th, observation. Let the number of observations be bounded, m ≤ T; then S(L; T) = W(L; T). These relations allow one to calculate S(L; m) recurrently for m ≤ T and to find the region of continuation of the observations, where S(L; m) < W(L; m).
However, if L is a multidimensional quantity, the solution of the recurrence equation (27) becomes difficult. Let us admit that we can find a function L′(x̃₁, …, x̃ₙ) = L′ₙ of smaller dimension, also possessing the Markov property as n increases. Then the optimum rule using only the current values L′ₙ is built on the foundation of the solution of an equation analogous to (27) for the function S′(L′; m). The difference between the risks ρ′ and ρ can be evaluated according to formulas (24)-(26) with

$$ \Delta J_1 = J(L_n;\,\theta) - J(L'_n;\,\theta), \qquad \Delta J_2 = J(L_{m+1};\,L_m) - J(L'_{m+1};\,L'_m) \qquad (m \le n). $$

The proximity of the risk for the rule based on the values of the reduced quantity L′ₘ to the optimum value for the rule using the multidimensional statistics Lₘ can thus be evaluated according to formulas (24)-(26), without the solution of the multidimensional equation (27) for the risk S(L; m). Some concrete examples of such a construction of suboptimum rules, and estimates of their proximity to the optimum, are given in (8).
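A backward-induction sketch of the recurrence (27) for a scalar Markov statistic: L_m is taken as the posterior probability of one of two Bernoulli hypotheses, the stopping loss is W(L; m) = min(L, 1 − L), and each further observation costs c. Every concrete number here is an invented assumption; the printed thresholds bound the SPRT-like continuation region.

```python
# Backward induction for (27): S(L;T) = W(L;T), then step back in m, taking
# the minimum of the stopping loss and the expected cost of continuing.
import numpy as np

p0, p1, c, T = 0.4, 0.6, 0.01, 30
L = np.linspace(1e-6, 1.0 - 1e-6, 2001)        # grid over the statistic
S = np.minimum(L, 1.0 - L)                      # S(L;T) = W(L;T)

def step(Lv, x):
    """Posterior update L -> L' after observing x in {0, 1}."""
    like1 = p1 if x == 1 else 1.0 - p1
    like0 = p0 if x == 1 else 1.0 - p0
    return Lv * like1 / (Lv * like1 + (1.0 - Lv) * like0)

for m in range(T - 1, -1, -1):                 # recurrence (27), backwards
    px1 = L * p1 + (1.0 - L) * p0              # P(x = 1 | L)
    cont = c + px1 * np.interp(step(L, 1), L, S) \
             + (1.0 - px1) * np.interp(step(L, 0), L, S)
    S = np.minimum(np.minimum(L, 1.0 - L), cont)

region = L[np.minimum(L, 1.0 - L) > S + 1e-12]  # where continuing is better
print(region.min(), region.max())               # SPRT-like thresholds
```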
For the simple function of losses (W(δ(x₁, …, xₙ), θ) = 0 or 1) the risk coincides with the probability of error. The information estimates for the risk thus give lower limits for the probability of errors in the distinguishing of statistical hypotheses. At the same time, side by side with the Shannon-Wiener measures of entropy and quantity of information, it is advisable to use generalized measures of entropy and quantity of information.
In solving the problems of optimization of the processes of treatment and re-treatment of large massifs of information in information systems, one often has to deal in practice both with homogeneous and with heterogeneous massifs, for which, strictly speaking, different measures of information exist. Therefore in the information estimates for the change of risk, and for the probability of errors in the distinguishing of many statistical hypotheses, measures of information different from Shannon's can be used to take into account the heterogeneity of the information massifs. Generalized f-entropies (in which other convex functions f(t) are used instead of t log t) were introduced by A. Rényi, who also revealed their statistical meaning. Possible generalized entropy measures and information quantities, and their axiomatization, are examined in (4). Using such a wider class of information divergences between probability distributions under a fixed number of observations, the information estimates for the risk and for the probabilities of errors in the classification of tested hypotheses can be improved (5). Thus, taking the "inner" structure of the treated information massifs into account allows one to receive more exact estimates for the making of the decision and to reduce the necessary number of observations under the given exactness of calculations; the received information estimates (as boundary conditions of potential exactness) allow passing to the realization of the idea of compression of the data-processing processes. The latter is a part of the special problem of the information theory of control. Let us take as an example the results of the determination of the lower limits for the probabilities of errors in the distinguishing of many hypotheses for the generalized measures of information divergence.
Let us designate by α_ij the probability of accepting the hypothesis H_j when H_i is true, for the given decision rule, which can be either a rule with a fixed number n of observations or a sequential one. Let, as before, n designate the moment of stopping the observations. The conditional mathematical expectation

$$ H_a(i : K) = M_K\!\left(\frac{dP_i}{dP_K}\right)^{\!a} \qquad (28) $$

is called the generalized information divergence of order a between the probability distributions P_i and P_K (i ≠ K). Through the information measure (28) are expressed the generalized additive measure of information divergence of order a and type β (6),

$$ I_a^{\beta}(P_i : P_K) = \frac{1}{\beta - 1}\,\ln H_a(i : K), $$

and the nonadditive measure of information divergence of order a and type β (5, 6), which is connected with it by a relation of the following type:

$$ \tilde{I}_a^{\beta}(P_i : P_K) = \frac{1 - \exp\left\{(\beta - 1)\,I_a^{\beta}(P_i : P_K)\right\}}{\beta - 1}\,. $$

For β = a the additive measure turns into the information divergence of Rényi. Using the divergence (28) and the Jensen inequalities for functions of several variables, we can get the following estimates for the probabilities of errors: if 0 < a < 1, then

$$ H_a(i : K) \le \alpha_{iK}^{\,a}\,(1 - \alpha_{Ki})^{1-a} + (1 - \alpha_{iK})^{a}\,\alpha_{Ki}^{\,1-a}, \qquad (29) $$

while for a > 1 the opposite inequality holds.

Inequalities (29) allow one to specify more exactly the lower limits for the probabilities of errors under a fixed number n of observations, in comparison with the results received when the Rényi and Kullback information measures of divergence are used. The results of the calculations according to formulas (28), (29) are shown in Fig. 2; the dotted and the dash-dotted lines show the results for the measures of Kullback and Rényi correspondingly. In the calculations, distributions of the form λ(x) = K_i e^{−K_i x} with the parameters K₁ = 1, K₂ = 2 were taken.
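A hedged numerical check of an estimate of the type (29), assuming two equal-variance Gaussian hypotheses with invented parameters: for 0 < a < 1 the divergence (28) cannot decrease under the passage to the binary decision {accept H_i, accept H_K}, which bounds the attainable error probabilities from below.

```python
# Data-processing check of type (29): H_a of the full samples versus H_a of
# the binary decision produced by a threshold rule on the sample mean.
import numpy as np
from scipy import stats

mu1, mu2, sigma, n, a = 0.0, 1.0, 1.0, 5, 0.5
# Divergence (28) for n i.i.d. equal-variance Gaussian observations
H_full = np.exp(-n * a * (1 - a) * (mu2 - mu1)**2 / (2 * sigma**2))

thr = (mu1 + mu2) / 2                          # ML threshold on the sample mean
se = sigma / np.sqrt(n)
a12 = stats.norm.sf(thr, loc=mu1, scale=se)    # P(accept H2 | H1 true)
a21 = stats.norm.cdf(thr, loc=mu2, scale=se)   # P(accept H1 | H2 true)

H_binary = a12**a * (1 - a21)**(1 - a) + (1 - a12)**a * a21**(1 - a)
print(H_full, H_binary, H_binary >= H_full)    # for 0 < a < 1: True
```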
The results given earlier remain correct for this wide class of information measures as well.
REFERENCES
(1) Shilov G.E. Mathematical Analysis, Part 3. "Nauka", 1970.
(2) Shannon C.E. Works on Information Theory and Cybernetics. IL, 1957.
(3) Leondes C.T. Contemporary Theory of Systems of Control. "Nauka", 1970.
(4) Gelfand I.M., Fomin S.V. Calculus of Variations. Fizmatgiz, 1961.
(5) Petrov V.V., Zaporozhets A.V., Polyakov I.N. Information Synthesis of Optimum Systems. DAN USSR, vol. 215, No. 4, 1974.
(6) Pupkov K.A. Calculation and Design of Nonlinear Systems by a Statistical Method. "Nauka", 1969.
(7) Peres A. Information-Theoretic Risk Estimates in Statistical Decision. "Kybernetika", No. 3, 1967; No. 4, 1965.
(8) Hazen E.M. Information Estimates for the Growth of Risk under the Reduction of Observed Data in Problems of Sequential Analysis. "Problems of Control and Information Theory", No. 2, 1974.
(9) Rényi A. Probability Theory. Budapest, 1970.
(10) Petrov B.N., Ulanov G.M., Ulianov S.V. Value of Information: Semiotical Aspects of Information Theory of Control and Cybernetics. In: "Technical Cybernetics" (Results of Science and Technology), vol. 5, VINITI, Academy of Sciences of the USSR, Moscow, 1973.
(11) Ulanov G.M., Ulianov S.V., Hazen E.M. Information Estimates for Risk in Problems of Treatment of Large Information Massives. DAN USSR, vol. 210, No. 2, 1973.
(12) Petrov B.N., Ulanov G.M., Ulianov S.V., Hazen E.M. Value of Information: Semiotical Aspects of Information Theory of Control and Cybernetics. In: "Technical Cybernetics" (Results of Science and Technology), vol. 6, VINITI, Academy of Sciences of the USSR, Moscow, 1974.
[Fig. 1. Generalized structure of the system under study: the input x(t) is formed from the useful signal u(t) and the noise n(t); the ideal operator H(jω) with impulse response æ(t) produces the ideal output h(t); the real operator W(jω) with impulse response K(t), together with the noise m(t), produces the real output y(t).]
[Fig. 2. Lower limits for the probabilities of errors as functions of the number of observations n: dotted line, the Kullback measure; dash-dotted line, the Rényi measure.]