Axiomatic information measures depending only on a probability measure


Statistics & Probability Letters 28 (1996) 329-335

T. Brezmes*, G. Naval
University of Oviedo, Facultad de Ciencias, 2a Planta, Departamento de Matemáticas, c/ Calvo Sotelo, s/n, 33007 Oviedo, Spain
* Corresponding author.

Received July 1994; revised May 1995

Abstract

This paper characterizes the information measures, in the sense of the Axiomatic Information Theory introduced by Forte and Kampé de Fériet (1969), which depend only on a probability measure and are compatible with the most general form of the "independence axiom".

Keywords: General information; General independence

1. Introduction and background

The classical Information Theory due to Wiener (1948) and Shannon (1948) is essentially based on the probability of random events. With Ω a set of elements ω (possible results of an experiment) and 𝒮 a nonempty class of subsets A of Ω, Wiener stated a theory on the measurement of the amount of information provided by the occurrence of a random event A under the hypothesis that there is a probability space associated with Ω, say (Ω, 𝒮, P). In that case, the measure of the information supplied by the occurrence of A is given by

J(A) = c log P(A),    (1)

where c is any negative constant.

This definition results from the following arguments:
(a) Suppose that the information measure depends only on a probability measure, i.e.,

J(A) = f[P(A)],    (2)

where f is a mapping from [0, 1] to [0, +∞].
(b) Assume that whenever two events A and B are "independent" from the point of view of a given information measure, then the information supplied by the simultaneous occurrence of events A and B is the sum of the information supplied by the occurrence of each of them, i.e.,

J(A ∩ B) = J(A) + J(B).    (3)

As a consequence of (2) and (3), if the "independent" events are also stochastically independent, it follows that

f[P(A)P(B)] = f[P(A)] + f[P(B)].    (4)
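As a quick illustration (a sketch added here, not part of the original paper; the constant c and the probabilities are arbitrary), the following Python fragment checks (3) and (4) numerically for the Wiener measure (1).

```python
import math

c = -1.0  # any negative constant, as in Eq. (1)

def J(p):
    """Information supplied by an event of probability p, Eq. (1)."""
    return c * math.log(p) if p > 0.0 else math.inf

# Two stochastically independent events, so P(A ∩ B) = P(A) P(B).
pA, pB = 0.25, 0.4
lhs = J(pA * pB)        # information of the simultaneous occurrence
rhs = J(pA) + J(pB)     # sum of the individual informations, Eq. (3)
assert math.isclose(lhs, rhs)
print(lhs, rhs)         # both ≈ 2.3026 (= -log 0.1)
```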


So the function f must satisfy the following functional equation:

f(xy) = f(x) + f(y),  ∀x, y ∈ [0, 1],    (5)

whose unique continuous solution is

f(x) = c log x,    (6)

where c is a negative constant that ensures the nonnegativeness of J(A).

Conditions (a) and (b) constrain information measures depending only on a probability measure to be those in (6). That constraint has motivated us to generalize the notion of independence between random events by replacing condition (b) by a less restrictive one:
(b*) Assume that whenever two events A and B of 𝒮 are "independent", then the information supplied by the simultaneous occurrence of events A and B is a function of the information associated with A and the information associated with B, i.e.,

J(A ∩ B) = G[J(A), J(B)].    (7)

In this paper, we will examine the most general form of the measure of the information supplied by an event under the assumption that the measure satisfies conditions (a) and (b*).

2. Preliminary concepts and results

We first recall some basic notions and results.

Definition 2.1. An information space is a structure (Ω, 𝒮, 𝒦, G, J) where
• Ω is a nonempty set of elements ω (elementary events);
• 𝒮 is an algebra of subsets of Ω;
• 𝒦 is a set of couples of "M-independent" subalgebras of 𝒮 (cf. Kappos, 1960);
• G is a function defined on (0, +∞] × (0, +∞] and satisfying certain suitable properties;
• J is an extended real-valued and nonnegative function defined on 𝒮 and having the following properties:
1. Monotonicity:

A ⊂ B ⟹ J(A) ≥ J(B).    (8)

2. Independence axiom: if (𝒜, ℬ) ∈ 𝒦, then

J(A ∩ B) = G[J(A), J(B)]    (9)

whatever A ∈ 𝒜 and B ∈ ℬ may be.
3. Universal values:

J(∅) = +∞  and  J(Ω) = 0.    (10)

The composition law G will be called the independence law and J will be called the general information measure.

Now let us look at some properties of the independence law G. We suppose that G is continuous on (0, +∞] × (0, +∞]. Let A_G be the set of the "G-idempotent" elements, that is,

A_G = {x ∈ (0, +∞] such that G(x, x) = x}    (11)

and

A_G^c = (0, +∞] - A_G.    (12)

Obviously A_G is closed and one can find a sequence of disjoint open intervals of (0, +∞], A_i = (α_i, β_i), i ∈ I (the set of indices I being a subset of ℕ+), satisfying

A_G^c = ∪_{i∈I} A_i.    (13)

If G satisfies the hypothesis of universality (Benvenuti, 1969, p. 488), then the following properties of the independence law G hold in (0, +∞]:
• Universal values:

G(x, +∞) = +∞.    (14)

• Symmetry:

G(x, y) = G(y, x).    (15)

• Associativity:

G[x, G(y, z)] = G[G(x, y), z].    (16)

• Monotonicity:

x_1 ≤ x_2 ⟹ G(x_1, y) ≤ G(x_2, y).    (17)
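The definitions (11)-(13) and the universality properties (14)-(17) can be explored numerically. The sketch below (added for illustration, not taken from the paper) uses the candidate law G(x, y) = xy + x + y, which reappears in Example 2 of Section 5, and checks the properties on a finite grid.

```python
import itertools, math

def G(x, y):
    """Candidate independence law; it reappears in Example 2 of Section 5."""
    return x * y + x + y

grid = [0.1, 0.5, 1.0, 2.0, 7.0, math.inf]

# Symmetry (15) on a finite grid of points.
for x, y in itertools.product(grid, repeat=2):
    assert math.isclose(G(x, y), G(y, x))

# Monotonicity (17): G(., y) is nondecreasing along the (sorted) grid.
for y in grid:
    vals = [G(x, y) for x in grid]
    assert all(a <= b for a, b in zip(vals, vals[1:]))

# Associativity (16), checked at finite points only.
for x, y, z in itertools.product([0.1, 0.5, 2.0, 7.0], repeat=3):
    assert math.isclose(G(x, G(y, z)), G(G(x, y), z))

# Universal value (14), and the G-idempotent set (11): G(x, x) = x has no
# finite solution in (0, +inf), so A_G = {+inf} and A_G^c is the single
# interval (alpha_1, beta_1) = (0, +inf).
assert G(3.0, math.inf) == math.inf and G(math.inf, math.inf) == math.inf
assert all(G(x, x) > x for x in [0.1, 0.5, 1.0, 2.0, 7.0])
```

For this law the only G-idempotent point is +∞, so a single generator describes G on (0, +∞), as the representation (19) below makes precise.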


Finally, assuming that

lim_{y→0} G(y, x) = x,  ∀x ∈ (0, +∞],    (18)

the system of equations (14)-(18) has the following solution (Benvenuti et al., 1969):

G(x, y) = ĝ_i[g_i(x) + g_i(y)]   if (x, y) ∈ A_i × A_i,
G(x, y) = sup{x, y}              otherwise,    (19)

where g_i is an increasing extended real-valued nonnegative function defined on A_i such that lim_{x↓α_i} g_i(x) = 0 and ĝ_i is the pseudoinverse function of g_i defined by

ĝ_i(z) = g_i^{-1}(z)   if z ∈ [0, g_i(β_i)],
ĝ_i(z) = β_i           if z > g_i(β_i).    (20)

Definition 2.2. A general information measure J on a measurable information space (Ω, 𝒮, 𝒦) is called compositive if one can find a real-valued nonnegative function F such that

J(A ∪ B) = F[J(A), J(B)]    (21)

holds for every A ∈ 𝒮, B ∈ 𝒮 such that A ∩ B = ∅. The function F will be called the composition law of J.

Now we are examining some properties of a composition law F. If F is continuous on [0, +∞] × [0, +∞], the set A_F of the "F-idempotent" elements, that is,

A_F = {x ∈ [0, +∞] such that F(x, x) = x},    (22)

is closed. Its complementary set A_F^c = [0, +∞] - A_F is the union of a finite or countable sequence of disjoint open intervals of [0, +∞], say

V_i = (a_i, b_i),  i ∈ L ⊂ ℕ+,    (23)

that is,

A_F^c = ∪_{i∈L} V_i.    (24)

If F satisfies the hypothesis of universality (Benvenuti, 1969, p. 488), then the following properties of the composition law F hold everywhere:
• Values:

F(x, 0) = 0  and  F(x, +∞) = x.    (25)

• Symmetry:

F(x, y) = F(y, x).    (26)

• Associativity:

F[x, F(y, z)] = F[F(x, y), z].    (27)

• Monotonicity:

x_1 ≤ x_2 ⟹ F(x_1, y) ≤ F(x_2, y).    (28)

The system of equations (25)-(28) has been solved in Benvenuti et al. (1969) and its solution is expressed by

F(x, y) = f̂_i[f_i(x) + f_i(y)]   if (x, y) ∈ V_i × V_i,
F(x, y) = inf{x, y}              otherwise,    (29)

where f_i is a decreasing extended real-valued nonnegative function defined on V_i such that lim_{x↑b_i} f_i(x) = 0 and f̂_i is the pseudoinverse function of f_i given by

f̂_i(z) = f_i^{-1}(z)   if z ∈ [0, f_i(a_i)),
f̂_i(z) = a_i           if z > f_i(a_i).    (30)

Definition 2.3. A general information measure J on a measurable information space (Ω, 𝒮, 𝒦) is called an information measure of type M if

J(A ∪ B) = θ^{-1}{θ[J(A)] + θ[J(B)]}    (31)

for all A ∈ 𝒮, B ∈ 𝒮 such that A ∩ B = ∅, where θ is an extended real-valued nonnegative, continuous and decreasing function, defined on [0, +∞], such that θ(0) = +∞ and lim_{x→+∞} θ(x) = 0.


3. General information measure on a probability space

Let (Ω, 𝒮, 𝒦, G, J) be a general information space and P a probability measure defined on (Ω, 𝒮).

Definition 3.1. We say that a general information measure J depends only on the probability measure P if we can find an extended real-valued and nonnegative function f defined on [0, 1] and continuous on (0, 1], such that

J(A) = f[P(A)],  ∀A ∈ 𝒮.    (32)

If we suppose that all (𝒜, ℬ) ∈ 𝒦 are stochastically independent, then from the nature of general information and probability spaces we can easily deduce the following properties of the function f:

f is non-increasing on [0, 1],    (33)

f(xy) = G[f(x), f(y)],  ∀x, y ∈ [0, 1],    (34)

f(0) = +∞  and  f(1) = 0.    (35)

4. Determination of a general information measure on a probability space

Let (Ω, 𝒮, 𝒦, G, J) be an information space where J is an information measure which depends only on a probability measure P by means of a function f. In this section our aim is to determine the most general form of the function f.

If we denote

α = inf{x : x ∈ A_G},    (36)

β = lim_{x→0} f(x)    (37)

and we define

x_a = sup{x ∈ [0, 1] such that f(x) ≥ a},  ∀a ∈ [0, +∞],    (38)

then, from the properties of f (see Eqs. (33) and (35)), one can conclude that

x_a = 1 ⟺ a = 0.    (39)

If we call g_1 the function defining the independence law G on (0, α) × (0, α), then we have

Proposition 4.1. If α is greater than zero, then

f(x) = ĝ_1(c log x),  ∀x ∈ (x_α, 1],    (40)

where c is a negative constant.

Proof. Since α > 0, from (39) we have x_α ≠ 1. Let x, y be two elements in (x_α, 1) such that f(x) and f(y) are not equal to zero. From the definition of x_α we have f(x), f(y) < α, i.e., f(x), f(y) ∈ (0, α). Furthermore, from the definition of α and from expression (19), we obtain that

G[f(x), f(y)] = ĝ_1{g_1[f(x)] + g_1[f(y)]}.    (41)

If g_1[f(x)] + g_1[f(y)] > g_1(α), then from (34) and from the definition of ĝ_1 we have

f(xy) = α.    (42)

In the set Γ ⊂ (x_α, 1) × (x_α, 1) where g_1[f(x)] + g_1[f(y)] < g_1(α), Eq. (34) can be expressed as follows:

f(xy) = g_1^{-1}{g_1[f(x)] + g_1[f(y)]}.    (43)

If we define a new function h on (x_α, 1) as h(t) = g_1[f(t)], then the preceding equation becomes

h(xy) = h(x) + h(y)    (44)

everywhere in Γ. The most general solution of this functional equation (Aczél, 1966) is

h(t) = c log t,    (45)

where c is a negative constant. Combining (42) and (45) we finally obtain that

f(t) = ĝ_1(c log t),  ∀t ∈ (x_α, 1).    (46)  □
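A numerical sanity check of Proposition 4.1 (again an added sketch, not from the paper): take the law generated by g_1(x) = log(1 + x) of the earlier illustrations, for which A_G = {+∞}, so α = +∞ > 0 and ĝ_1(z) = e^z - 1. The candidate f(x) = ĝ_1(c log x) = x^c - 1 then satisfies the functional relation (34) together with the boundary behaviour (33) and (35).

```python
import math

c = -0.7                                    # illustrative negative constant

def g1_hat(z):
    """Pseudoinverse of the generator g1(x) = log(1 + x) used above."""
    return math.expm1(z)

def f(x):
    """Candidate form of Proposition 4.1: f(x) = g1_hat(c log x) = x**c - 1."""
    return math.inf if x == 0.0 else g1_hat(c * math.log(x))

def G(u, v):
    """Independence law generated by g1; closed form u*v + u + v on (0, +inf)."""
    return u * v + u + v

# Key relation (34): f(xy) = G[f(x), f(y)] for x, y in (0, 1].
for x, y in [(0.9, 0.8), (0.5, 0.5), (0.31, 0.07)]:
    assert math.isclose(f(x * y), G(f(x), f(y)))

# Boundary values (33) and (35): f nonincreasing, f(0) = +inf, f(1) = 0.
assert f(1.0) == 0.0 and f(0.0) == math.inf
assert f(0.2) >= f(0.4) >= f(0.9)
```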


Next, we will obtain the most general form of the function f on (0, x_α]. For this purpose we need to study the following two cases.

4.1. First case

Suppose that there exists an interval (0, ε), with ε > 0, where f is decreasing. From properties (33) and (35) we can easily deduce that

x_a = 0 ⟺ a ≥ β.    (47)

Proposition 4.2. If there exists an element a ∈ A_G ∩ (0, β), then

x_a ∈ (0, 1)  and  f(x) = a  for all x ∈ (0, x_a].    (48)

Proof. Assume that there exists an element a ∈ A_G ∩ (0, β). Then, as a consequence of the continuity and monotonicity of f, we conclude that

f(x) < f(x_a) = a,  ∀x ∈ (x_a, 1].    (49)

From (34) we have that

f(x·x_a) = G[f(x), f(x_a)],  ∀x ∈ [x_a, 1]    (50)

and, since f(x_a) = a and a ∈ A_G, in accordance with (19) we obtain

f(x·x_a) = sup{f(x), f(x_a)} = a,  ∀x ∈ [x_a, 1].    (51)

Let z now be an element in the interval [x_a^2, x_a]. There exists an element y in [x_a, 1] such that z = y·x_a and, from (51), f(z) = a. Hence

f(z) = a,  ∀z ∈ [x_a^2, x_a].    (52)

A process analogous to (52) leads to

f(z) = a,  ∀z ∈ [x_a^3, x_a^2],    (53)

and by repeating this argument the proposition is proved.  □

From the last proposition we can immediately derive the following consequence.

Corollary 4.3. The set A_G ∩ (0, β) is empty.

Proposition 4.4. If α is greater than zero, then

f(x) = ĝ_1(c log x),  ∀x ∈ (0, 1],    (54)

where c is a negative constant.

Proof. According to Proposition 4.1 we have that

f(x) = ĝ_1(c log x),  ∀x ∈ (x_α, 1].    (55)

Furthermore, as α ∈ A_G (since A_G is a closed set), Corollary 4.3 implies that α ≥ β. Then, by (47), we have x_α = 0, whence

f(x) = ĝ_1(c log x),  ∀x ∈ (0, 1].    (56)  □

As a consequence of the last proposition we have the following results.

Corollary 4.5. α = β.

Corollary 4.6. The information measure J(A) = f[P(A)] is of type M (see Definition 2.3), where θ(x) = f^{-1}(x) for all x ∈ [0, +∞].

4.2. Second case

Suppose that there exists an interval (0, ε), with ε > 0, where f is equal to β. From properties (33) and (35) we can deduce that

x_a = 0 ⟺ a > β.    (57)

Proposition 4.7. If there exists an element a ∈ A_G ∩ (0, β], then

x_a ∈ (0, 1)  and  f(x) = a  for all x ∈ (0, x_a].    (58)

Proof. The proof is analogous to that of Proposition 4.2.  □

Proposition 4.8. If α is greater than zero, then

f(x) = α              if x ∈ (0, x_α],
f(x) = ĝ_1(c log x)   if x ∈ (x_α, 1],    (59)

where c is a negative constant.

Proof. By Proposition 4.1, we have that

f(x) = ĝ_1(c log x)  for all x ∈ (x_α, 1].    (60)

Furthermore, if x_α > 0 then, by (57), α ≤ β and from Proposition 4.7 we have that

f(x) = α  for all x ∈ (0, x_α].    (61)  □

5. Examples

Let (Ω, 𝒮, P) be a probability space. In this section, we define a general information measure J depending only on the probability measure P in the following cases.

Example 1. Suppose that the independence law G is given by

G(x, y) = x + y  for all (x, y) ∈ (0, +∞] × (0, +∞].

Then A_G = {+∞} and the function g_1 is given by

g_1(x) = x  for all x ∈ (0, +∞).

Thus, from Proposition 4.4,

f(x) = c log x  for all x ∈ [0, 1],

where c is a negative constant. This gives us the classical Shannon (1948) information measure:

J(A) = f[P(A)] = c log P(A)  with c < 0.
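Corollary 4.6 can be seen at work on Example 1 with the following sketch (added here; the probabilities are illustrative): for f(x) = c log x one has θ = f^{-1}, θ(y) = e^{y/c}, and the composition rule (31) over disjoint events holds because θ[J(A)] is just P(A).

```python
import math

c = -1.0

def f(p):                       # f of Example 1
    return math.inf if p == 0.0 else c * math.log(p)

def theta(y):                   # theta = f^{-1}, as in Corollary 4.6
    return 0.0 if y == math.inf else math.exp(y / c)

def theta_inv(p):               # theta^{-1} = f
    return f(p)

def J(p):                       # J(A) = f[P(A)]
    return f(p)

# Type-M composition (31) over disjoint events A, B: P(A u B) = P(A) + P(B).
pA, pB = 0.15, 0.3
lhs = J(pA + pB)
rhs = theta_inv(theta(J(pA)) + theta(J(pB)))
assert math.isclose(lhs, rhs)
print(lhs, rhs)                 # both equal -log(0.45) ≈ 0.7985
```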


Example 2. If G is given by

G(x, y) = xy + x + y  for all (x, y) ∈ (0, +∞] × (0, +∞],

then A_G = {+∞} and

g_1(x) = -k log[1/(x + 1)] = k log(x + 1)  for all x ∈ (0, +∞),

where k is any positive constant. From Proposition 4.4 we know that

f(x) = x^{c/k} - 1 = x^d - 1  for all x ∈ (0, 1],

where d is a negative constant, and thus we have the following information measure:

J(A) = f[P(A)] = P(A)^d - 1  with d < 0.

In particular, when d = -1 we have the well-known hyperbolic information proposed by Picard (1975).

Example 3. By selecting

G(x, y) = (2^{1-s} - 1)xy + x + y  for all (x, y) ∈ (0, +∞] × (0, +∞]

and 0 < s < 1, we have A_G = {+∞} and

g_1(x) = -k log[x(2^{1-s} - 1) + 1]^{1/(s-1)}  for all x ∈ (0, +∞),

where k is a positive constant. Using Proposition 4.4 we have

f(x) = (x^{d(1-s)} - 1)/(2^{1-s} - 1)  for all x ∈ (0, 1],

where d is a negative constant. This function corresponds (for d = -1) to the well-known Havrda and Charvát (1967) information measure of order s:

J(A) = (P(A)^{d(1-s)} - 1)/(2^{1-s} - 1)  with d < 0 and 0 < s < 1.

Example 4. Finally, let G be

G(x, y) = (2^{1-s} - 1)xy + x + y  for all (x, y) ∈ (0, +∞] × (0, +∞]

and s > 1. In this case A_G = [1/(1 - 2^{1-s}), +∞] and

g_1(x) = -k log[x(2^{1-s} - 1) + 1]^{1/(s-1)}  for all x ∈ (0, 1/(1 - 2^{1-s})),

where k is a positive constant. Furthermore, using Proposition 4.8 we conclude that

f(x) = 1/(1 - 2^{1-s})                    if x ∈ (0, x_α],
f(x) = (x^{d(s-1)} - 1)/(2^{1-s} - 1)     if x ∈ (x_α, 1],

where d is a positive constant. In this way, we obtain the following information measure:

J(A) = (P(A)^{d(s-1)} - 1)/(2^{1-s} - 1)  with d > 0 and s > 1.
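To make Examples 2 and 3 concrete, here is an added sketch (d = -1, s = 1/2 and the probabilities are illustrative, not from the paper) that evaluates the hyperbolic and the Havrda-Charvát measures and checks the independence relation (9), J(A ∩ B) = G[J(A), J(B)], for stochastically independent events; Example 4 behaves the same way with s > 1.

```python
import math

# Example 2 with d = -1: hyperbolic information under G(x, y) = x*y + x + y.
def J_hyp(p):
    return math.inf if p == 0.0 else 1.0 / p - 1.0

def G_hyp(x, y):
    return x * y + x + y

# Example 3 with d = -1 and 0 < s < 1: Havrda-Charvat information of order s
# under G(x, y) = (2**(1 - s) - 1)*x*y + x + y.
s = 0.5
lam = 2.0 ** (1.0 - s) - 1.0

def J_hc(p):
    return math.inf if p == 0.0 else (p ** (s - 1.0) - 1.0) / lam

def G_hc(x, y):
    return lam * x * y + x + y

# For stochastically independent A and B, P(A ∩ B) = P(A) P(B).
for pA, pB in [(0.2, 0.5), (0.7, 0.7), (0.04, 0.9)]:
    assert math.isclose(J_hyp(pA * pB), G_hyp(J_hyp(pA), J_hyp(pB)))
    assert math.isclose(J_hc(pA * pB), G_hc(J_hc(pA), J_hc(pB)))
```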

6. Conclusions and implications

Any general information measure, in the sense of the Axiomatic Information Theory introduced by B. Forte and J. Kampé de Fériet, which depends only on a probability measure is completely determined by means of its independence law. We have found a general expression for all information measures, in the context of the Axiomatic Information Theory (B. Forte (1969) and J. Kampé de Fériet (1970)), which are defined on a probability space. This expression includes the well-known information measures defined on a probability space, such as Shannon's and Rényi's information measures, etc.

Since, in the context of the Axiomatic theory (B. Forte (1969)), any entropy measure H of an experiment {A_1, ..., A_n} can be expressed, under suitable compositivity conditions, as a function of the information given by the occurrence of each possible result of that experiment,

H({A_1, ..., A_n}) = L[J(A_1), ..., J(A_n)],

we are developing a work on the construction of all entropy measures depending only on a probability measure. Shannon's and Rényi's entropies are some examples of this kind of measure (M. Behara (1990)).
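One familiar instance of such a compositional rule L (not developed in this paper) is the Shannon entropy, which is the P-weighted mean of the informations J(A_i) = -log P(A_i):

```python
import math

def shannon_entropy(probs):
    """H({A_1, ..., A_n}) as the P-weighted mean of J(A_i) = -log P(A_i)."""
    return sum(p * (-math.log(p)) for p in probs if p > 0.0)

# A three-outcome experiment with illustrative probabilities.
print(shannon_entropy([0.5, 0.3, 0.2]))   # ≈ 1.0297 nats
```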

7. Acknowledgements

The authors are deeply grateful to the anonymous referee for their advice and suggestions. This work was supported, in part, by Grant DGICYT No. PB941328.

References

Aczél, J. (1966), Lectures on Functional Equations and Their Applications, Vol. 19 (Academic Press, New York).
Behara, M. (1990), Additive and Nonadditive Measures of Entropy (J. Wiley, New York).
Benvenuti, P. (1969), Sulle misure di informazione compositive con traccia compositiva universale, Rend. di Mat. (3-4) 2, Serie VI, 481-506.
Benvenuti, P., Forte, B. and Kampé de Fériet, J. (1969), Forme générale de l'opération de composition, C.R. Acad. Sci. Paris, Série A, 269.
Bertoluzza, C. and Boscani, A. (1977), Un sistema di equazioni funzionali connesso con una generalizzazione della nozione d'indipendenza in teoria dell'informazione, Analisi Matematica A. III, 69-78.
Cho-Hsin Ling (1965), Representation of associative functions, Publicationes Math. 12, 1-4.
Forte, B. (1969), Measures of information: the general axiomatic theory, RIRO 3e année, No. R-2, 63-90.
Havrda, J. and Charvát, F. (1967), Quantification method of classification processes, Kybernetika 3, 30-35.
Kampé de Fériet, J. (1970), Mesure de l'information fournie par un événement, Colloq. Internat. CNRS 186, 191-221.
Kappos, D.A. (1960), Strukturtheorie der Wahrscheinlichkeitsfelder und -Räume (Springer, Berlin).
Picard, C.F. (1975), Aspects informatiques de l'information hyperbolique, Sympos. Math. 10, 55-82.
Shannon, C.E. (1948), A mathematical theory of communication, Bell Syst. Tech. J. 27, 379-423, 623-656.
Wiener, N. (1948), Cybernetics (Hermann, Paris).