Information Sciences 152 (2003) 139–144 www.elsevier.com/locate/ins
On scale and concentration invariance in entropies

Luc Knockaert

Department of Information Technology, IMEC-INTEC, St. Pietersnieuwstraat 41, B-9000 Gent, Belgium

Received 30 December 2001; received in revised form 27 November 2002; accepted 30 December 2002

Abstract

Rényi entropies are compared to generalized log-Fisher information and variational entropies in the context of translation, scale and concentration invariance. It is proved that the Rényi entropies occupy a special place amongst these entropies. It is also shown that the Shannon entropy is centrally positioned amidst the Rényi entropies. © 2003 Elsevier Science Inc. All rights reserved.

Keywords: Rényi entropy; Shannon entropy; Information; Invariance
E-mail address: [email protected] (L. Knockaert). doi:10.1016/S0020-0255(03)00058-6

1. Introduction

Rényi entropies [1] are known to be of importance in cryptography [2], resolution in time–frequency [3], time–frequency representations [4] and the social sciences [5]. They are natural extensions [1] of the Shannon entropy, displaying the same invariance properties with respect to translation and scaling. In this paper we relate the Rényi entropies to the pertinent variational and Fisher-information based entropies [6], which also exhibit translation and scale invariance. All these entropies are shown to be strongly interrelated by means of relevant inequalities. However, only the Rényi entropies satisfy the property of concentration invariance. The central position of the Shannon entropy is expressed by the fact that all Rényi entropies can be written as functionals of the Shannon concentration entropy.
2. Shift, dilatation and concentration

Let $f(t)$ be a probability density function (p.d.f.) with support $\Omega \subseteq \mathbb{R}$ and Lebesgue support measure $0 < \mu \le \infty$. We define three basic operators transforming $f(t)$ into another p.d.f. These are the shift or translation operator

$$S_s[f](t) = f(t - s), \qquad s \in \mathbb{R} \tag{1}$$

which translates $\Omega$ by the amount $s$ but leaves $\mu$ unchanged; the scaling or dilatation operator

$$D_s[f](t) = s\,f(st), \qquad s > 0 \tag{2}$$

which scales $\mu$ by the amount $1/s$; and the concentration operator

$$C_b[f](t) = f_b(t) = f(t)^b \Big/ \int f(t)^b\,dt, \qquad b > 0 \tag{3}$$

which leaves $\Omega$ and $\mu$ unchanged. The above operators obey commutation relations, as can be seen from the following table:
|       | $S_{s'}$        | $D_{s'}$          | $C_{b'}$      |
|-------|-----------------|-------------------|---------------|
| $S_s$ | $S_{s+s'}$      | $D_{s'} S_{s's}$  | $C_{b'} S_s$  |
| $D_s$ | $S_{s'/s} D_s$  | $D_{ss'}$         | $C_{b'} D_s$  |
| $C_b$ | $S_{s'} C_b$    | $D_{s'} C_b$      | $C_{bb'}$     |

Here the entry in row $X$, column $Y$ is the composition $X \circ Y$, with the column operator applied first.
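As a sanity check, the operator algebra in the table can be verified numerically. The sketch below (helper names are ours, not the paper's; a unit Gaussian serves as test density) checks the entries $D_s S_{s'} = S_{s'/s} D_s$ and $C_b S_{s'} = S_{s'} C_b$ on a grid.

```python
import numpy as np

# Hypothetical sketch: the three operators acting on a density given as a
# callable, evaluated on a grid for a unit Gaussian test density.
def shift(f, s):                  # S_s[f](t) = f(t - s)
    return lambda t: f(t - s)

def dilate(f, s):                 # D_s[f](t) = s f(s t), s > 0
    return lambda t: s * f(s * t)

def concentrate(f, b, grid):      # C_b[f] = f^b / integral of f^b
    dt = grid[1] - grid[0]
    Z = np.sum(f(grid) ** b) * dt
    return lambda t: f(t) ** b / Z

gauss = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
t = np.linspace(-10, 10, 2001)
s, sp, b = 2.0, 0.7, 1.5

# Table entry (row D_s, column S_{s'}): D_s S_{s'} = S_{s'/s} D_s
lhs1 = dilate(shift(gauss, sp), s)(t)
rhs1 = shift(dilate(gauss, s), sp / s)(t)

# Table entry (row C_b, column S_{s'}): concentration commutes with shifts
lhs2 = concentrate(shift(gauss, sp), b, t)(t)
rhs2 = shift(concentrate(gauss, b, t), sp)(t)
```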
3. Entropic inequalities

We consider three classes of entropies. These are the generalized log-Fisher information [6]

$$J_a(f) = -\frac{1}{a}\log\int \left|\frac{f'(t)}{f(t)}\right|^a f(t)\,dt, \qquad a > 0 \tag{4}$$

the variational entropies

$$V_a(f) = \frac{1}{a}\log\,\min_s \int |t - s|^a f(t)\,dt, \qquad a > 0 \tag{5}$$

and the Rényi entropies

$$H_a(f) = \frac{1}{1-a}\log\int f(t)^a\,dt, \qquad a > 0 \tag{6}$$
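For concreteness, the three definitions can be evaluated numerically. The sketch below (our own grid-based discretization, not from the paper) computes them for a unit Gaussian, for which $V_2 = \log\sigma = 0$, $H_2 = \frac{1}{2}\log 4\pi$, and, with the sign convention used here for (4), $J_2 = 0$.

```python
import numpy as np

# Numerical sketch of definitions (4)-(6) on a uniform grid (unit Gaussian).
t = np.linspace(-12, 12, 4001)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
fp = -t * f                       # f'(t) for the unit Gaussian

def J(a):                         # generalized log-Fisher information, Eq. (4)
    return -np.log(np.sum(np.abs(fp / f) ** a * f) * dt) / a

def V(a):                         # variational entropy, Eq. (5); crude min over s
    shifts = np.linspace(-1, 1, 201)
    m = min(np.sum(np.abs(t - s) ** a * f) * dt for s in shifts)
    return np.log(m) / a

def H(a):                         # Renyi entropy, Eq. (6), for a != 1
    return np.log(np.sum(f ** a) * dt) / (1 - a)
```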
Note that $V_2(f)$ is $\log\sigma$, the logarithm of the standard deviation. For $a = 1$ the Rényi entropy coincides with the Shannon entropy [1]:

$$H(f) = H_1(f) = \lim_{a\to 1} H_a(f) = -\int f(t)\log f(t)\,dt \tag{7}$$
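The limit in (7) can be checked numerically. This sketch (unit Gaussian on a grid, a discretization of our own) shows $H_a$ approaching the Shannon entropy $\frac{1}{2}\log 2\pi e$ as $a \to 1$.

```python
import numpy as np

# Sketch: Renyi entropy H_a -> Shannon entropy H as a -> 1 (unit Gaussian).
t = np.linspace(-12, 12, 4001)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

renyi = lambda a: np.log(np.sum(f ** a) * dt) / (1 - a)    # Eq. (6)
shannon = -np.sum(f * np.log(f)) * dt                      # Eq. (7)

# Successive gaps shrink as a approaches 1 from above
gaps = [abs(renyi(a) - shannon) for a in (1.5, 1.1, 1.01)]
```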
The entropies (4)–(6) are interrelated by means of the following inequalities and equalities.

Theorem 1. Let $f(t)$ be differentiable with $\int f'(t)\,dt = 0$, $\int f'(t)\,t\,dt = -1$, and let $p, q > 1$ be conjugate Hölder exponents, i.e. $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$J_q(f) \le V_p(f) \tag{8}$$

Proof. From the premises we have

$$-1 = \int f'(t)(t - s)\,dt \tag{9}$$

Exploiting Hölder's inequality we find

$$1 = \left|\int f'(t)(t - s)\,dt\right| = \left|\int \left[f'(t)\,f(t)^{-1/p}\right]\left[f(t)^{1/p}(t - s)\right]dt\right| \le \left(\int |f'(t)|^q f(t)^{-q/p}\,dt\right)^{1/q}\left(\int f(t)\,|t - s|^p\,dt\right)^{1/p} \tag{10}$$

Hence, since $q/p = q - 1$,

$$1 \le \left(\int f(t)\,|t - s|^p\,dt\right)^{1/p}\left(\int |f'(t)/f(t)|^q f(t)\,dt\right)^{1/q} \tag{11}$$

Inequality (11) has the constant 1 on the l.h.s. and a function of $s$ on the r.h.s. Hence, taking logarithms (a strictly increasing continuous function) and minimizing the r.h.s. with respect to $s$ leads to

$$0 \le \min_s \frac{1}{p}\log\int f(t)\,|t - s|^p\,dt + \frac{1}{q}\log\int |f'(t)/f(t)|^q f(t)\,dt \tag{12}$$

or equivalently

$$0 \le V_p(f) - J_q(f) \tag{13}$$

which completes the proof.
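Theorem 1 can be probed numerically. In this sketch (our own discretization; a Gaussian with $\sigma = 2$) the pair $p = q = 2$ attains equality, $J_2 = V_2 = \log\sigma$, which is the logarithmic Cramér–Rao case, while the conjugate pair $p = 3$, $q = 3/2$ gives a strict inequality.

```python
import numpy as np

# Numerical sketch of Theorem 1: J_q(f) <= V_p(f) for a Gaussian, sigma = 2.
t = np.linspace(-25, 25, 8001)
dt = t[1] - t[0]
sigma = 2.0
f = np.exp(-t ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
fp = -t / sigma ** 2 * f          # f'(t)

def J(q):                         # Eq. (4)
    return -np.log(np.sum(np.abs(fp / f) ** q * f) * dt) / q

def V(p):                         # Eq. (5); the minimum is at s = 0 by symmetry
    return np.log(np.sum(np.abs(t) ** p * f) * dt) / p
```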
Note that for $p = q = 2$ we have a logarithmic form of the Cramér–Rao inequality [6].

Theorem 2. Let $\Gamma(z)$ be the Euler gamma function. Then

$$H(f) \le V_a(f) + \frac{1}{a}(1 + \log a) + \log\left[2\,\Gamma(1 + 1/a)\right], \qquad a > 0 \tag{14}$$

Proof. Apply the Kullback–Leibler inequality [7]

$$H(f) \le -\int f(t)\log g(t)\,dt \tag{15}$$

valid for any two p.d.f.'s $f(t)$ and $g(t)$, to the p.d.f.

$$g(t) = A^{-1}\exp\left[-s\,|t - \tau|^a\right], \qquad A = 2\,s^{-1/a}\,\Gamma(1 + 1/a), \qquad s, a > 0 \tag{16}$$

Choosing the location $\tau$ as the minimizing shift in the definition (5) of $V_a(f)$, we obtain

$$H(f) \le -\frac{1}{a}\log s + \log\left[2\,\Gamma(1 + 1/a)\right] + s\,e^{a V_a(f)} \tag{17}$$
Minimization of the r.h.s. of (17) with respect to the free parameter $s$ completes the proof.

Theorem 3. Let $p, q > 1$ be conjugate Hölder exponents, i.e. $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$V_a(f_b) \le V_{pa}(f) + \frac{1 - b}{a}\left[H_{q(b - 1/p)}(f) - H_b(f)\right], \qquad p > 1/b \tag{18}$$

Proof. Apply Hölder's inequality to

$$\int |t - s|^a f(t)^b\,dt = \int \left[|t - s|^a f(t)^{1/p}\right]\left[f(t)^{b - 1/p}\right]dt \tag{19}$$

and afterwards minimize with respect to $s$ and take logarithms. Note that, if we take $b = 1$, this also proves that $V_a(f)$ is increasing with respect to $a$.

Regarding the Rényi entropies we have the fundamental facts that they are decreasing [2], i.e. $H_a(f) \le H_b(f)$ for $a \ge b$, and that they are all equal for the uniform p.d.f. The next theorem shows that the Rényi entropy $H_a(f)$ can be written as a functional of the concentration entropy $H(f_b)$.

Theorem 4. For $a \ne 1$ we have

$$H_a(f) = \frac{a}{a - 1}\int_1^a H(f_b)\,b^{-2}\,db \tag{20}$$
Proof. We have

$$b^{-2} H(f_b) = -b^{-2}\int f_b(t)\log f_b(t)\,dt = -b^{-1}\int f_b(t)\log f(t)\,dt + b^{-2}\log\int f(t)^b\,dt = -b^{-1}\,\frac{d}{db}\log\int f(t)^b\,dt + b^{-2}\log\int f(t)^b\,dt = -\frac{d}{db}\left[b^{-1}\log\int f(t)^b\,dt\right] \tag{21}$$

and (20) follows by integrating (21) from 1 to $a$. Note that since $H_a(f)$ is decreasing with respect to $a$, we also have

$$\frac{d}{da} H_a(f) = \frac{1}{a(a - 1)}\left[H(f_a) - H_a(f)\right] \le 0 \tag{22}$$
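The identity (20) lends itself to a numerical sanity check. The sketch below (unit Gaussian, $a = 2$, simple Riemann sums of our own) compares $H_2(f)$ with $\frac{a}{a-1}\int_1^a H(f_b)\,b^{-2}\,db$.

```python
import numpy as np

# Numerical sketch of Theorem 4, Eq. (20), for a unit Gaussian and a = 2.
t = np.linspace(-12, 12, 4001)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

def shannon_of_concentration(b):  # H(f_b) with f_b = f^b / int f^b
    fb = f ** b / (np.sum(f ** b) * dt)
    return -np.sum(fb * np.log(fb)) * dt

a = 2.0
bs = np.linspace(1.0, a, 201)
db = bs[1] - bs[0]
integral = np.sum([shannon_of_concentration(b) / b ** 2 for b in bs]) * db
lhs = np.log(np.sum(f ** a) * dt) / (1 - a)    # H_2(f), Eq. (6)
rhs = a / (a - 1) * integral                   # r.h.s. of Eq. (20)
```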
4. Shift, scale and concentration invariance

Theorems 1–4 describe the interactions between the different entropies. The next theorem focuses on the invariance of the entropies with respect to the translation, dilatation and concentration operators.

Theorem 5. The entropies (4)–(6) are linearly invariant with respect to translation and dilatation, while only the Rényi entropies are linearly invariant with respect to concentration.

Proof. Linear invariance between two entities should be understood in the sense that there exists a linear relationship between the two entities. It is easy to see that

$$V_a(S_s[f]) = V_a(f), \qquad J_a(S_s[f]) = J_a(f), \qquad H_a(S_s[f]) = H_a(f) \tag{23}$$

and

$$V_a(D_s[f]) = -\log s + V_a(f), \qquad J_a(D_s[f]) = -\log s + J_a(f), \qquad H_a(D_s[f]) = -\log s + H_a(f) \tag{24}$$

which proves the first part of the statement. The second part of the statement follows straightforwardly from

$$H_a(C_b[f]) = \frac{1}{1 - a}\log\int f_b(t)^a\,dt = \frac{1 - ab}{1 - a}\,H_{ab}(f) - \frac{a - ab}{1 - a}\,H_b(f) \tag{25}$$

since (25) has no linear counterpart in terms of $J_a(C_b[f])$ or $V_a(C_b[f])$.
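Relation (25) can be illustrated numerically. In this sketch (unit Gaussian, $a = 2$, $b = 1.5$, a grid of our own) both sides agree to within discretization error.

```python
import numpy as np

# Numerical sketch of Eq. (25): the Renyi entropy of the concentrated density
# C_b[f] is a linear combination of H_{ab}(f) and H_b(f); unit-Gaussian example.
t = np.linspace(-12, 12, 4001)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

H = lambda g, a: np.log(np.sum(g ** a) * dt) / (1 - a)    # Renyi, a != 1

a, b = 2.0, 1.5
fb = f ** b / (np.sum(f ** b) * dt)                       # C_b[f]
lhs = H(fb, a)
rhs = ((1 - a * b) * H(f, a * b) - (a - a * b) * H(f, b)) / (1 - a)
```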
In the statistical physics community one sometimes utilizes the Tsallis entropies [8]

$$T_a(f) = \frac{1}{1 - a}\left[-1 + \int f(t)^a\,dt\right] \tag{26}$$

instead of the Rényi entropies. However, in contradistinction with the Rényi entropies, the Tsallis entropies (26) are not scale-invariant, since in general $T_a(D_s[f]) \ne -\log s + T_a(f)$.
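The failure of scale invariance is easy to see numerically. This sketch (unit Gaussian, $a = s = 2$, our own grid code) shows the Rényi entropy shifting by exactly $-\log s$ under $D_s$, while the Tsallis entropy does not.

```python
import numpy as np

# Sketch: under dilatation D_s, H_a picks up -log s (Eq. (24)) but T_a does not.
t = np.linspace(-30, 30, 8001)
dt = t[1] - t[0]
g = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)    # unit Gaussian

a, s = 2.0, 2.0
f = g(t)
fs = s * g(s * t)                                         # D_s[f](t) = s f(s t)

H = lambda h: np.log(np.sum(h ** a) * dt) / (1 - a)       # Renyi, Eq. (6)
T = lambda h: (-1 + np.sum(h ** a) * dt) / (1 - a)        # Tsallis, Eq. (26)

renyi_gap = H(fs) - (H(f) - np.log(s))                    # vanishes
tsallis_gap = T(fs) - (T(f) - np.log(s))                  # does not vanish
```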
References

[1] A. Rényi, On measures of entropy and information, in: Proc. 4th Berkeley Symp. Math. Stat. and Prob., vol. 1, 1961, pp. 547–561.
[2] C. Cachin, Entropy Measures and Unconditional Security in Cryptography, Hartung-Gorre Verlag, Konstanz, 1997.
[3] L. Knockaert, Comments on "Resolution in time–frequency", IEEE Trans. Signal Process. 48 (12) (2000) 3585.
[4] R.G. Baraniuk, P. Flandrin, A.J.E.M. Janssen, O. Michel, Measuring time–frequency information content using the Rényi entropies, IEEE Trans. Inf. Theory 47 (4) (2001) 1391–1409.
[5] M.M. Mayoral, Rényi's entropy as an index of diversity in simple-stage cluster sampling, Inf. Sci. 105 (1998) 101–114.
[6] A. Dembo, T.M. Cover, J.A. Thomas, Information theoretic inequalities, IEEE Trans. Inf. Theory 37 (6) (1991) 1501–1518.
[7] M. Basseville, Distance measures for signal processing and pattern recognition, Signal Process. 18 (1989) 349–369.
[8] C. Tsallis, Possible generalization of Boltzmann–Gibbs statistics, J. Stat. Phys. 52 (1988) 479–487.