
INFORMATION SCIENCES 35, 145-156 (1985)

Trigonometric Entropies, Jensen Difference Divergence Measures, and Error Bounds*

ANNIBAL P. SANT'ANNA
Instituto de Matemática, Universidade Federal do Rio de Janeiro, 21944 Rio de Janeiro, RJ, Brasil

and

INDER JEET TANEJA
Departamento de Matemática, Universidade Federal de Santa Catarina, 88.000 Florianópolis, SC, Brasil

*Partially supported by CNPq (Brazil).

ABSTRACT

Various authors have attempted to characterize generalized entropies which in special cases reduce to the Shannon entropy. In this paper, we also characterize new trigonometric entropies, and some information-theoretic properties are studied. Bounds on the Bayesian probability of error in terms of trigonometric entropies and Jensen difference divergence measures have been obtained. Ideas of paired entropies applied to statistical mechanics and fuzzy set theory are also discussed.

1. INTRODUCTION

Various entropies have been introduced in the literature, taking the Shannon entropy as basic. It was Rényi [21] who for the first time gave a parametric generalization of the Shannon entropy, known as the entropy of order α; later Havrda and Charvát [15] introduced another kind, known as the entropy of degree β. Sharma and Mittal [23] introduced a third kind involving two parameters which unifies those of Rényi with those of Havrda and Charvát, known as the entropy of order α and degree β. Sharma and Taneja [24] also gave a direct generalization of entropy, that of entropy of degree (α, β). All these generalizations are based on a power function of the type $f(P) = \sum_{i=1}^{n} p_i^{r}$, where r is any parameter greater than zero. Taneja [26] for the first time gave a systematic way


to generalize the Shannon entropy, involving the sine function, and later, Sharma and Taneja [25] characterized it jointly with the entropy of degree (α, β). There are two main basic approaches adopted to characterize these entropies, one axiomatic and another by functional equations. It is true that the Shannon entropy is fundamental from the applications point of view and arises naturally from statistical concepts. But during past years, researchers have also examined the applications of generalized entropies in different fields [2, 4, 6, 8] and found them as good as the Shannon entropy, and sometimes better because of the flexibility of the parameters, especially for comparison purposes. Here our aim is to characterize new families involving the sine function. In special cases, these either reduce to the Shannon entropy or are as good as the Shannon entropy. Some information-theoretic properties are studied. Bounds on the Bayesian probability of error are obtained. The idea of Jensen difference divergence measure or information radius has been generalized. Some possible applications to statistical mechanics and fuzzy set theory are discussed.

2. SINE ENTROPIES AND THEIR PROPERTIES

Let $\Delta_n = \{P = (p_1, p_2, \ldots, p_n) \mid p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1\}$ be the set of all complete finite discrete probability distributions associated with a discrete random variable taking a finite number of values. The sine entropy, introduced by Taneja [26] (see also [25]), is given by

$$S_1^\beta(P) = -\frac{1}{\sin\beta}\sum_{i=1}^{n} p_i \sin(\beta \log p_i), \qquad \beta \ne k\pi, \quad k = 0, 1, 2, \ldots, \tag{1}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n$, and its characterization is based on the functional equation arising from the following generalized additivity:

$$H(P * Q) = H(P)\,G(Q) + G(P)\,H(Q), \tag{2}$$

where $H(P) = \sum_{i=1}^{n} h(p_i)$ and $G(P) = \sum_{i=1}^{n} g(p_i)$ for all $P \in \Delta_n$, $Q \in \Delta_m$, and $P * Q \in \Delta_{nm}$, and h and g are continuous functions defined over [0,1]. It is easy to verify that

$$\lim_{\beta \to 0} S_1^\beta(P) = H(P) = -\sum_{i=1}^{n} p_i \log p_i. \tag{3}$$

It is understood throughout the paper that all the logarithms are to base 2, $0\log 0 = 0$, and $0\sin(\beta\log 0) = 0$, $\beta \ne 0$.
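As a quick numerical illustration (our own sketch, not part of the paper), the following Python snippet evaluates the sine entropy (1) for a fixed distribution and shows that it approaches the Shannon entropy as β → 0; the function names are ours.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def sine_entropy(p, beta):
    """Sine entropy of Eq. (1): -(1/sin beta) * sum_i p_i sin(beta * log2 p_i)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                                   # 0 * sin(beta * log 0) = 0 by convention
    return -np.sum(p[nz] * np.sin(beta * np.log2(p[nz]))) / np.sin(beta)

P = [0.5, 0.3, 0.2]
for beta in (1.0, 0.1, 0.001):
    print(f"beta = {beta:6.3f}:  S_1^beta(P) = {sine_entropy(P, beta):.6f}")
print(f"Shannon H(P)  = {shannon_entropy(P):.6f}")   # the beta -> 0 limit, Eq. (3)
```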


The above sine entropy enjoys many interesting properties similar to the Shannon entropy. Here the log is inside the sine function. In the following we give characterizations of three different sine entropies; in two of them the sine function is inside the log function, while the third is only in terms of the sine function. Some information-theoretic properties are also studied.

2.1. CHARACTERIZATION

Let h and f be two continuous functions defined over [0,1] satisfying the following relations:

$$h\big(f(p)f(q)\big) = h\big(f(p)\big) + h\big(f(q)\big) \tag{4}$$

and

$$f(p+q)\,f(p-q) = f(p)^2 - f(q)^2, \tag{5}$$

for all p, q ∈ [0,1]. Also consider the following generalized average sums:

$$S_2(P) = \sum_{i=1}^{n} p_i\, h\big(f(p_i)\big), \tag{6}$$

$$S_3(P) = \sum_{i=1}^{n} f(p_i)\, h\big(f(p_i)\big), \tag{7}$$

and

$$S_4(P) = \sum_{i=1}^{n} f(p_i), \tag{8}$$

for all $P = (p_1, p_2, \ldots, p_n) \in \Delta_n$. The most general nontrivial continuous solutions of the functional equations (4) and (5) (see Aczél [1]) are given by

$$h\big(f(p)\big) = A\log f(p), \qquad f(p) = c_1 \sin\beta p, \qquad \text{and} \qquad f(p) = c_2 \sinh\beta p,$$

where A, c_1, c_2, and β are arbitrary constants. Under the boundary conditions $f(\tfrac12) = \tfrac12$ and $h(\tfrac12) = 1$, the above set of solutions leads to

$$f(p) = \frac{\sin\beta p}{2\sin(\beta/2)}, \qquad f(p) = \frac{\sinh\beta p}{2\sinh(\beta/2)}, \qquad \text{and} \qquad h\big(f(p)\big) = -\log f(p).$$

This, together with the sum representations (6), (7), and (8), gives

$$S_2^\beta(P) = -\sum_{i=1}^{n} p_i \log\!\left[\frac{\sin\beta p_i}{2\sin(\beta/2)}\right], \qquad S_3^\beta(P) = -\sum_{i=1}^{n} \frac{\sin\beta p_i}{2\sin(\beta/2)}\,\log\!\left[\frac{\sin\beta p_i}{2\sin(\beta/2)}\right],$$

and

$$S_4^\beta(P) = \sum_{i=1}^{n} \frac{\sin\beta p_i}{2\sin(\beta/2)}.$$

It is easy to verify the following:

$$\lim_{\beta \to 0} S_2^\beta(P) = \lim_{\beta \to 0} S_3^\beta(P) = H(P) \qquad \text{and} \qquad \lim_{\beta \to 0} S_4^\beta(P) = 1.$$

The Maxwell-Boltzmann distribution of statistical mechanics can be obtained by maximizing the Shannon entropy subject to the constraint that the average energy of the system is prescribed. This distribution, however, is not obeyed by any particle in nature. All particles in nature obey either Bose-Einstein statistics or Fermi-Dirac statistics. Evidently, these distributions cannot be derived by maximizing the ordinary Shannon entropy. They can, however, be derived from a modification of the Shannon entropy. Such a procedure was used in Capocelli and De Luca [9], where the Bose-Einstein statistics is derived along with the Fermi-Dirac statistics and intermediate ones. Forte and Sempi [13]


showed that the above-mentioned entropies can be derived, without recourse to a special entropy, by maximizing the Shannon conditional entropy. The Bose-Einstein distribution, which is satisfied by bosons (photons, and nuclei and atoms containing an even number of particles), can be derived by maximizing the Bose-Einstein entropy (see Kapur [17, 18]), viz.,

$$-\sum_{i=1}^{n} p_i \ln p_i + \sum_{i=1}^{n} (1+p_i)\ln(1+p_i) - 2\ln 2. \tag{9}$$

Similarly, the Fermi-Dirac distribution, which is satisfied by electrons, neutrons, and protons, is given by maximizing the Fermi-Dirac entropy or paired entropy (see Kapur [17, 18]), viz.,

$$-\sum_{i=1}^{n} p_i \ln p_i - \sum_{i=1}^{n} (1-p_i)\ln(1-p_i). \tag{10}$$
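For concreteness (our own sketch, not from the paper), the following Python snippet evaluates the Bose-Einstein entropy (9) and the Fermi-Dirac paired entropy (10), as reproduced above, for a small distribution.

```python
import numpy as np

def bose_einstein_entropy(p):
    """Eq. (9): -sum p_i ln p_i + sum (1 + p_i) ln(1 + p_i) - 2 ln 2."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return (-np.sum(p[nz] * np.log(p[nz]))
            + np.sum((1.0 + p) * np.log(1.0 + p))
            - 2.0 * np.log(2.0))

def fermi_dirac_entropy(p):
    """Eq. (10): -sum p_i ln p_i - sum (1 - p_i) ln(1 - p_i)."""
    p = np.asarray(p, dtype=float)
    lo, hi = p > 0, p < 1
    return (-np.sum(p[lo] * np.log(p[lo]))
            - np.sum((1.0 - p[hi]) * np.log(1.0 - p[hi])))

P = [0.5, 0.3, 0.2]
print(bose_einstein_entropy(P), fermi_dirac_entropy(P))
```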

This idea of paired entropies has been systematically carried over to fuzzy-set theory by De Luca and Termini [11], where the generalized additivity (2) appears as one of the properties of the entropy of fuzzy sets. Recently Ebanks [12] took it as one of the axioms and came up with an entropy of fuzzy sets which is known as the quadratic entropy [27] in information theory. In a manner somewhat similar to Kapur [17], Burbea [6] recently extended (9) and (10) to the entropies of degree β. Here our aim is just to introduce trigonometric paired entropies, whose details of applications to fuzzy-set theory and statistical mechanics will be discussed elsewhere. These are as follows:

&CC’)=

iI sf(Pi,l-Pi)P

k =1,2,3,4,

i-l

where S,f(p,l-p), k=1,2,3,4, are the binary trigonometric entropies. similar way, Bose-Einstein trigonometric entropies can be introduced. 2.3.

(11) In a
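As an illustration of (11) (our own sketch, using the k = 4 binary entropy with the form of f(p) adopted in Section 2.1), the paired entropy of a vector of membership values might be computed as follows; the membership values are hypothetical.

```python
import numpy as np

def f(p, beta):
    """f(p) = sin(beta p) / (2 sin(beta/2)), as in Section 2.1."""
    return np.sin(beta * p) / (2.0 * np.sin(beta / 2.0))

def S4_binary(p, beta):
    """Binary trigonometric entropy S_4^beta(p, 1 - p)."""
    return f(p, beta) + f(1.0 - p, beta)

def paired_S4(mu, beta):
    """Paired entropy of Eq. (11) for k = 4: sum_i S_4^beta(mu_i, 1 - mu_i)."""
    return float(np.sum(S4_binary(np.asarray(mu, dtype=float), beta)))

mu = [0.1, 0.5, 0.9, 0.3]          # hypothetical fuzzy membership values
print(paired_S4(mu, beta=2.0))     # beta in (0, pi]
```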

2.3. PROPERTIES

In this section, we shall give some information-theoretic properties of the trigonometric entropies. These properties are subject to the condition that β ∈ (0, π]. Using the periodicity of the sine function, we can extend them to other intervals.

(i) Nonnegativity: $S_k^\beta(P)$, k = 1,2,3,4, are nonnegative for 0 < β ≤ π;
(ii) Continuity: $S_k^\beta(P)$, k = 1,2,3,4, are continuous functions of P;
(iii) Symmetry: $S_k^\beta(p_1, p_2, \ldots, p_n)$, k = 1,2,3,4, are symmetric functions of their arguments;
(iv) Expansibility: $S_k^\beta(p_1, p_2, \ldots, p_n, 0) = S_k^\beta(p_1, p_2, \ldots, p_n)$, k = 1,2,3,4;
(v) Normality: $S_k^\beta(\tfrac12, \tfrac12) = 1$, k = 1,2,3,4;
(vi) Decisivity:

$$S_1^\beta(1,0) = S_1^\beta(0,1) = 0,$$
$$S_2^\beta(1,0) = S_2^\beta(0,1) = \log\!\left(\cos\tfrac{\beta}{2}\right)^{-1}, \qquad \beta \ne \pi,$$
$$S_3^\beta(1,0) = S_3^\beta(0,1) = \left(\cos\tfrac{\beta}{2}\right)\log\!\left(\cos\tfrac{\beta}{2}\right)^{-1},$$
$$S_4^\beta(1,0) = S_4^\beta(0,1) = \cos\tfrac{\beta}{2}.$$

In the second case the entropy is never decisive. In the third and fourth cases it is decisive only when β = π, i.e., $S_3^\pi(1,0) = S_3^\pi(0,1) = 0$ and $S_4^\pi(1,0) = S_4^\pi(0,1) = 0$.
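The normality and decisivity values above can be checked numerically. The following Python sketch (ours) assumes the explicit forms of S_2^β, S_3^β, S_4^β read off in Section 2.1, i.e., f(p) = sin βp / (2 sin(β/2)).

```python
import numpy as np

beta = 2.0                                     # any beta in (0, pi)

def f(p):
    return np.sin(beta * p) / (2.0 * np.sin(beta / 2.0))

def S2(p):                                     # -sum p_i log2 f(p_i)
    p = np.asarray(p, dtype=float); nz = p > 0
    return -np.sum(p[nz] * np.log2(f(p[nz])))

def S3(p):                                     # -sum f(p_i) log2 f(p_i)
    fp = f(np.asarray(p, dtype=float)); nz = fp > 0
    return -np.sum(fp[nz] * np.log2(fp[nz]))

def S4(p):                                     # sum f(p_i)
    return np.sum(f(np.asarray(p, dtype=float)))

print(S2([0.5, 0.5]), S3([0.5, 0.5]), S4([0.5, 0.5]))         # normality: all equal 1
print(S2([1.0, 0.0]), -np.log2(np.cos(beta / 2)))              # decisivity value of S_2
print(S3([1.0, 0.0]), -np.cos(beta / 2) * np.log2(np.cos(beta / 2)))
print(S4([1.0, 0.0]), np.cos(beta / 2))                        # decisivity value of S_4
```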

3. TRIGONOMETRIC ENTROPIES AND ERROR BOUNDS

Let us consider the decision-theory problem of classifying an observation X as coming from one of n possible classes (hypotheses) C_1, C_2, ..., C_n. Let p_i = Pr{C = C_i}, i = 1, 2, ..., n, denote the a priori probabilities of the classes, and let p(x | C_i) denote the probability density function of the random variable given that C_i is the true class or hypothesis. We assume that the p_i and p(x | C_i) are completely known. Given any observation x on X, we can calculate the conditional (a posteriori) probabilities p(C_i | x) by the Bayes rule. Consider the decision rule which chooses the hypothesis with the largest a posteriori probability. Using this rule, the partial probability of error for X = x is expressed by

$$P_e(x) = 1 - \max\{p(C_1 \mid x), \ldots, p(C_n \mid x)\}.$$

Prior to observing X, the probability of error P_e associated with X is defined as the expected probability of error, i.e.,

$$P_e = E_X\{P_e(X)\} = \int_X p(x)\, P_e(x)\, dx,$$

where $p(x) = \sum_{i=1}^{n} p_i\, p(x \mid C_i)$ is the unconditional density of X evaluated at x.


In the recent literature, researchers in pattern recognition have shown considerable interest in the applications of certain probabilistic information and distance measures as criteria for feature selection. Kanal [16] and Chen [10] provided a fairly good list of information and distance measures, corresponding bounds, and relationships among them. Now we will give bounds on P_e in terms of trigonometric entropies in two different ways: by a sum representation, and by Jensen difference divergence measures.

3.1. SUM REPRESENTATION AND ERROR BOUNDS

Kovalevski [19] gave a pointwise upper bound for P_e in terms of the parameter t, and a Fano bound taking the Shannon entropy into consideration. Based on Kovalevski's idea, Ben-Bassat [3] extended his results to the general class of functions satisfying the sum property, defined by

$$\Gamma(f) = \left\{ H(P) \,\middle|\, H\colon \Delta_n \to \mathbb{R}^{+},\ n \ge 2,\ H(P) = \sum_{i=1}^{n} f(p_i),\ f \text{ strictly concave},\ f'' \text{ exists},\ f(0) = \lim_{p \to 0} f(p) = 0 \right\}.$$

The bounds of [3] consist of a pointwise upper bound on P_e(x) and a Fano-type upper bound on H, stated in terms of an integer t such that

$$\frac{t-1}{t} \le \max_i p_i \le \frac{t}{t+1}.$$

The particular cases considered by Ben-Bassat [3] are the Shannon entropy, the quadratic entropy [27], and the entropy of degree β [15]. We can extend them to the trigonometric entropies as follows: For $0 < P_e \le \tfrac12$, the upper bounds on P_e are given by

$$P_e \le \tfrac{1}{2}\, S_k^\beta(C \mid X), \qquad k = 1, 2, 3, 4, \tag{12}$$


where $S_k^\beta(C \mid X) = \int_X S_k^\beta(C \mid x)\, p(x)\, dx$, and $S_k^\beta(C \mid x)$ (k = 1,2,3,4) are the conditional trigonometric entropies of C for X = x. The upper bounds on $S_k^\beta(C \mid X)$ are the Fano-type bounds given by

$$S_k^\beta(C \mid X) \le S_k^\beta\!\left(\frac{P_e}{n-1}, \ldots, \frac{P_e}{n-1},\, 1 - P_e\right), \qquad k = 1, 2, 3, 4. \tag{13}$$

The bounds given in (12) and (13) are subject to the condition of concavity: For k = 1, f(p) is concave provided $\beta(\log e)\tan(\beta\log p) \le 1$, which holds for β small and p ∈ (0,1]. For k = 2 and 3, the bounds are valid for β ∈ (0, π/4]. For k = 4, the bound is valid for β ∈ (0, π].
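As a rough two-class check of (12) and (13) for k = 4 (our own sketch, reusing the illustrative Gaussian model above and the form of S_4^β assumed in Section 2; β ∈ (0, π]):

```python
import numpy as np

beta = 2.0

def f(p):
    return np.sin(beta * p) / (2.0 * np.sin(beta / 2.0))

def S4_binary(q):                                   # S_4^beta(q, 1 - q)
    return f(q) + f(1.0 - q)

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

priors = np.array([0.4, 0.6])
x = np.linspace(-8.0, 10.0, 20001)
dx = x[1] - x[0]
cond = np.vstack([gauss(x, 0.0, 1.0), gauss(x, 2.0, 1.0)])
mix = priors @ cond
post1 = priors[0] * cond[0] / mix                   # p(C_1 | x); p(C_2 | x) = 1 - post1

pe_x = np.minimum(post1, 1.0 - post1)               # two-class partial error
pe = np.sum(mix * pe_x) * dx                        # Bayes probability of error
S4_CX = np.sum(mix * S4_binary(post1)) * dx         # conditional entropy S_4^beta(C | X)

print(pe, 0.5 * S4_CX)                              # (12): P_e <= S_4^beta(C|X) / 2
print(S4_CX, S4_binary(pe))                         # (13): S_4^beta(C|X) <= S_4^beta(P_e, 1-P_e)
```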

3.2. JENSEN DIFFERENCE DIVERGENCE MEASURES AND ERROR BOUNDS

Recently, Burbea and Rao [7] and Burbea [5] considered three different classes of divergence measures and studied their convexity properties. Two of them are direct generalizations of Jeffreys, Kullback, and Leibler's J-divergence, and one is based on the Jensen difference. They put the greatest emphasis on the Jensen difference. It has a wide range of applications in biological sciences [22], information theory [14], statistics [20], and other related areas. Here our aim is to consider divergence measures in terms of the Jensen difference and to obtain bounds on the probability of error. Some trigonometric examples are considered. Here we consider only the two-class case. In terms of the prior probabilities, the Jensen difference divergence measure based on φ is given by

$$J_\phi = \int_X \left[ \frac{\phi\big(p(x \mid C_1)\big) + \phi\big(p(x \mid C_2)\big)}{2} - \phi\!\left(\frac{p(x \mid C_1) + p(x \mid C_2)}{2}\right) \right] dx, \tag{14}$$

where φ is a convex function defined on [0,1]. Let us consider the more general class of measures

$$J_\phi(p_1, p_2) = \int_X \left[ \frac{\phi\big(p_1\, p(x \mid C_1)\big) + \phi\big(p_2\, p(x \mid C_2)\big)}{2} - \phi\!\left(\frac{p_1\, p(x \mid C_1) + p_2\, p(x \mid C_2)}{2}\right) \right] dx. \tag{15}$$


Let us consider the Jensen difference in terms of the posterior probabilities as

$${}^{1}\!J_\phi(p_1, p_2) = \int_X J_\phi(x)\, p(x)\, dx, \tag{16}$$

where

$$J_\phi(x) = \frac{\phi\big(p(C_1 \mid x)\big) + \phi\big(p(C_2 \mid x)\big)}{2} - \phi\!\left(\tfrac{1}{2}\right).$$

Now we will obtain bounds on P_e in terms of ${}^{1}\!J_\phi(p_1, p_2)$ and then will consider some examples. In the two-class case $P_e(x) = \min\{p(C_1 \mid x),\, p(C_2 \mid x)\}$. Since $J_\phi(x)$ is symmetric in $p(C_1 \mid x)$ and $p(C_2 \mid x)$, consider $p(C_1 \mid x) = P_e(x)$, so that $p(C_2 \mid x) = 1 - P_e(x)$. This gives

$$J_\phi(x) = \frac{\phi\big(P_e(x)\big) + \phi\big(1 - P_e(x)\big)}{2} - \phi\!\left(\tfrac{1}{2}\right).$$

(i) Lower Bound on ${}^{1}\!J_\phi(p_1, p_2)$ in terms of P_e: $J_\phi(x)$ is convex in $P_e(x)$. Thus

$${}^{1}\!J_\phi(p_1, p_2) = \int_X \left[ \frac{\phi\big(P_e(x)\big) + \phi\big(1 - P_e(x)\big)}{2} - \phi\!\left(\tfrac{1}{2}\right) \right] p(x)\, dx.$$

As φ is convex, this gives

$${}^{1}\!J_\phi(p_1, p_2) \ge \frac{\phi(P_e) + \phi(1 - P_e)}{2} - \phi\!\left(\tfrac{1}{2}\right). \tag{17}$$

This is a lower bound on ${}^{1}\!J_\phi(p_1, p_2)$ in terms of P_e, which in turn gives a bound on P_e in terms of ${}^{1}\!J_\phi(p_1, p_2)$, but in a complicated form.

(ii) Upper Bound on P_e in terms of ${}^{1}\!J_\phi(p_1, p_2)$: In order to obtain an upper bound on P_e, let us put the following conditions on the function φ:

$$\phi(1) = \phi(0) = 0 \qquad \text{and} \qquad \phi\!\left(\tfrac{1}{2}\right) = -\tfrac{1}{2}.$$

This gives $J_\phi(0) = J_\phi(1) = \tfrac12$ and $J_\phi(\tfrac12) = 0$. Consider the function

$$f_\phi(x) = 1 - 2 J_\phi(x);$$

then $f_\phi(1) = f_\phi(0) = 0$ and $f_\phi(\tfrac12) = 1$. Also $f_\phi$ is concave. Then, by the shape of $f_\phi$ and $P_e(x)$, we can easily see that

$$2 P_e(x) \le f_\phi(x),$$


i.e.,

$$P_e(x) \le \tfrac{1}{2}\big[1 - 2 J_\phi(x)\big],$$

i.e.,

$$P_e \le \tfrac{1}{2}\big[1 - 2\,{}^{1}\!J_\phi(p_1, p_2)\big]. \tag{18}$$

EXAMPLE 1. For $\phi(p) = p\log p$, we have $J(p_1, p_2) = {}^{1}\!J(p_1, p_2)$, and the bounds (17) and (18) become

$$J(p_1, p_2) \ge \tfrac{1}{2}\big[1 - H(P_e, 1 - P_e)\big] \qquad \text{and} \qquad P_e \le \tfrac{1}{2}\big[1 - 2 J(p_1, p_2)\big],$$

where

$$J(p_1, p_2) = \int_X \left[ \frac{p(C_1 \mid x)\log p(C_1 \mid x) + p(C_2 \mid x)\log p(C_2 \mid x) + 1}{2} \right] p(x)\, dx
= \int_X \left[ \frac{p_1 p(x \mid C_1)\log\big(p_1 p(x \mid C_1)\big) + p_2 p(x \mid C_2)\log\big(p_2 p(x \mid C_2)\big)}{2} - \frac{p(x)}{2}\log\frac{p(x)}{2} \right] dx$$

and

$$H(P_e, 1 - P_e) = -P_e \log P_e - (1 - P_e)\log(1 - P_e).$$
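Continuing the illustrative Gaussian model from Section 3 (our sketch, not the paper's), the next snippet evaluates the Jensen difference of Example 1 via the posterior form and confirms the upper bound (18) numerically.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

priors = np.array([0.4, 0.6])
x = np.linspace(-8.0, 10.0, 20001)
dx = x[1] - x[0]
cond = np.vstack([gauss(x, 0.0, 1.0), gauss(x, 2.0, 1.0)])
mix = priors @ cond
post = priors[:, None] * cond / mix                 # posteriors p(C_i | x)

def phi(p):                                         # phi(p) = p log2 p with 0 log 0 = 0
    return np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)

J_x = 0.5 * (phi(post[0]) + phi(post[1])) + 0.5     # Jensen difference; -phi(1/2) = +1/2
J = np.sum(mix * J_x) * dx                          # J(p_1, p_2), posterior form

pe = np.sum(mix * np.minimum(post[0], post[1])) * dx
print(pe, 0.5 * (1.0 - 2.0 * J))                    # bound (18): P_e <= (1 - 2 J) / 2
```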

EXAMPLE 2. For

$$\phi(p) = \frac{p\sin(\beta\log p)}{\sin\beta},$$

we have the bound (18) with

$${}^{1}\!J_\phi(p_1, p_2) = \int_X \left[ \frac{p(C_1 \mid x)\sin\big(\beta\log p(C_1 \mid x)\big) + p(C_2 \mid x)\sin\big(\beta\log p(C_2 \mid x)\big)}{2\sin\beta} + \frac{1}{2} \right] p(x)\, dx.$$

EXAMPLE 3. For

$$\phi(p) = \frac{\sin\pi p}{2}\,\log\frac{\sin\pi p}{2},$$

we have the bound (18) with

$${}^{1}\!J_\phi(p_1, p_2) = \int_X \left[ \frac{1}{2}\left( \frac{\sin\pi p(C_1 \mid x)}{2}\log\frac{\sin\pi p(C_1 \mid x)}{2} + \frac{\sin\pi p(C_2 \mid x)}{2}\log\frac{\sin\pi p(C_2 \mid x)}{2} \right) + \frac{1}{2} \right] p(x)\, dx.$$

EXAMPLE 4. For $\phi(p) = -(\sin\pi p)/2$, we have the bound (18) with

$${}^{1}\!J_\phi(p_1, p_2) = \int_X \left[ \frac{1}{2} - \frac{\sin\pi p(C_1 \mid x) + \sin\pi p(C_2 \mid x)}{4} \right] p(x)\, dx.$$

This work was started during the first author's stay with the Departamento de Matemática, Universidade Federal de Santa Catarina, 88.000 Florianópolis, SC, Brazil, from August to November 1983, and was completed during the second author's stay with the Istituto di Scienze dell'Informazione, Facoltà di Scienze, Università di Salerno, 84100 Salerno, Italy, from November 1983 to October 1984; both authors are thankful to those universities for providing facilities and hospitality. Thanks are also extended to CNPq (Brazil) for partial support.

REFERENCES

1. J. Aczél, Lectures on Functional Equations and Their Applications, Academic, 1966.
2. S. Arimoto, Information measures and capacity of order α for discrete memoryless channels, in Colloquium on Information Theory, Keszthely, Hungary, 1975, pp. 41-52.
3. M. Ben-Bassat, f-entropies, probability of error and feature selection, Inform. and Control 39:227-242 (1978).
4. M. Ben-Bassat and J. Raviv, Rényi's entropy and the probability of error, IEEE Trans. Inform. Theory IT-24:324-331 (1978).
5. J. Burbea, J-divergence and related concepts, in Encyclopedia of Statistical Sciences, Vol. 4, 1983, pp. 290-296.
6. J. Burbea, The Bose-Einstein entropy of degree α and its Jensen difference, Utilitas Math., to appear.

7. J. Burbea and C. R. Rao, On the convexity of some divergence measures based on entropy functions, IEEE Trans. Inform. Theory IT-28:489-495 (1982).
8. L. L. Campbell, A coding theorem and Rényi's entropy, Inform. and Control 8:423-429 (1965).
9. R. M. Capocelli and A. De Luca, Fuzzy sets and decision theory, Inform. and Control 23:446-473 (1973).
10. C. H. Chen, On information and distance measures, error bounds, and feature selection, Inform. Sci. 10:159-173 (1976).
11. A. De Luca and S. Termini, A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inform. and Control 20:301-312 (1972).
12. B. R. Ebanks, On measures of fuzziness and their representation, J. Math. Anal. Appl. 94:24-37 (1981).
13. B. Forte and C. Sempi, Maximizing conditional entropies: A derivation of quantal statistics, Rend. Mat. (6) 9:551-566 (1976).
14. R. G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.
15. J. Havrda and F. Charvát, Quantification method of classification processes: Concept of structural a-entropy, Kybernetika (Prague) 3:30-35 (1967).
16. L. Kanal, Patterns in pattern recognition, IEEE Trans. Inform. Theory IT-20:697-722 (1974).
17. J. N. Kapur, Measures of uncertainty, mathematical programming and information theory, J. Indian Soc. Agric. Statist. 24:47-66 (1972).
18. J. N. Kapur, Non-additive measures of entropy and distributions of statistical mechanics, Indian J. Pure Appl. Math. 14:1372-1387 (1983).
19. V. A. Kovalevski, The problem of character recognition from the point of view of mathematical statistics, in Character Readers and Pattern Recognition, 1968, pp. 3-30.
20. C. R. Rao, Diversity and dissimilarity coefficients: A unified approach, Theoret. Population Biol. 21:24-43 (1982).
21. A. Rényi, On measures of entropy and information, in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Univ. of Calif. Press, Berkeley, 1961, Vol. 1, pp. 547-561.
22. R. Sibson, Information radius, Z. Wahrsch. Verw. Gebiete 14:149-160 (1969).
23. B. D. Sharma and D. P. Mittal, New nonadditive measures of entropy for discrete probability distributions, J. Math. Sci. 10:28-40 (1975).
24. B. D. Sharma and I. J. Taneja, Entropy of type (α, β) and other generalized measures in information theory, Metrika 22:205-215 (1975).
25. B. D. Sharma and I. J. Taneja, Three generalized additive measures of entropy, Elektron. Informationsverarb. Kybernet. 13:419-433 (1977).
26. I. J. Taneja, A study of generalized measures in information theory, Ph.D. Thesis, Univ. of Delhi, India, 1975.
27. I. Vajda, Bounds on the minimal error probability and checking a finite or countable number of hypotheses, Inform. Trans. Problems 4:9-17 (1968).

Received 13 November 1984; revised 8 January 1985