
Partial Monotonicity of Entropy Revisited

Wanwan Xia
School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing, Jiangsu 211800, China
[email protected]

Revised July 2018

Abstract: In this short note, we revisit the partial monotonicity of the differential entropy and point out some errors in Shangari and Chen (2012) and Gupta and Bajaj (2013). The partial monotonicity of the Shannon entropy is also established.

Mathematics Subject Classifications (2000): 60E15, 94A17

Keywords: Entropy; Log-concavity; Dispersive order; Partial monotonicity

1  Introduction

First, we recall the definition of the entropy H(X) of a random variable X from Cover and Thomas (2006). The entropy H(X) measures the uncertainty of the random variable X. Let X be a discrete random variable. We say that X has the law {x_1, ..., x_n; p_1, ..., p_n} if p_i = P(X = x_i), i = 1, ..., n, with x_1 < x_2 < ... < x_n, p_i \ge 0, i = 1, ..., n, and \sum_{i=1}^{n} p_i = 1. Here, n may be finite or infinite. The Shannon entropy of X is defined by

H(X) = -\sum_{i=1}^{n} p_i \log p_i.

When X is an absolutely continuous random variable with density function f, the differential entropy of X is defined by

H(X) = -\int_{\mathbb{R}} f(x) \log f(x) \, dx.
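Both definitions are easy to evaluate numerically. The following minimal Python sketch is not part of the original paper; it assumes NumPy/SciPy are available, and the helper names are our own.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def shannon_entropy(p):
    """H(X) = -sum_i p_i log p_i of a discrete law (natural logarithm)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

def differential_entropy(pdf, lower, upper):
    """H(X) = -int f(x) log f(x) dx, approximated by numerical quadrature."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x))
    value, _ = quad(integrand, lower, upper)
    return value

print(shannon_entropy([0.5, 0.5]))                 # log 2 ~ 0.6931
print(differential_entropy(norm.pdf, -10, 10))     # ~ 0.5 * log(2*pi*e) ~ 1.4189
```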

Shangari and Chen (2012) investigated sufficient conditions on the distribution function F of X under which the differential entropy H(X) possesses the partial monotonicity, that is, the conditional (differential) entropy of X given a < X < b is increasing in b and decreasing in a. The partial monotonicities of Rényi's entropy and Tsallis's entropy of order α were considered by Gupta and Bajaj (2013). The formal definitions of Rényi's entropy and Tsallis's entropy of order α are given in Remark 2.8.

As we know, conditioning does not increase entropy, that is,

H(X | Y, Z) \le H(X | Z);    (1.1)

see, for example, Theorem 2.38 for Shannon entropy and Corollary 10.33 for differential entropy in Yeung (2008). By choosing Y and Z to be the indicator random variables of X ∈ B and X ∈ A, respectively, Shangari and Chen (2012) pointed out that it follows immediately from (1.1) that H(X | X ∈ B) \le H(X | X ∈ A) for intervals A and B such that B ⊆ A; that is, partial monotonicity is trivially true for all discrete random variables. However, this assertion is wrong, because

H(X | I_{\{X \in B\}}, I_{\{X \in A\}}) \ne H(X | X \in A \cap B)   and   H(X | I_{\{X \in A\}}) \ne H(X | X \in A),

where I_D is the indicator function of the set D. A counterexample is presented in Remark 3.4 of Section 3.

This paper is organized as follows. In Section 2, we revisit the partial monotonicity of the differential entropy and point out some errors in Shangari and Chen (2012) and Gupta and Bajaj (2013). The partial monotonicity of the Shannon entropy is established in Section 3.

2  Monotonicity of differential entropy

First, we recall the definition of the dispersive order, which was introduced and studied by Bickel and Lehmann (1976). A random variable X with distribution function F is said to be smaller than another random variable Y with distribution function G in the dispersive order, denoted by X \le_{disp} Y, if

F^{-1}(\beta) - F^{-1}(\alpha) \le G^{-1}(\beta) - G^{-1}(\alpha),   0 < \alpha < \beta < 1,

where F^{-1} is the left-continuous inverse function of F, defined by

F^{-1}(p) = \inf\{x : F(x) \ge p\},   p \in (0, 1).

Actually, we have X \le_{disp} Y \Longrightarrow H(X) \le H(Y). In fact, by (3.B.10) in Shaked and Shanthikumar (2007),

X \le_{disp} Y \iff g(G^{-1}(\alpha)) \le f(F^{-1}(\alpha)), \quad \forall \alpha \in (0, 1),

which implies

H(X) = -\int_0^1 \log f(F^{-1}(x)) \, dx \le -\int_0^1 \log g(G^{-1}(x)) \, dx = H(Y).
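This implication is easy to illustrate numerically. The sketch below (ours, not from the paper) uses the quantile-domain formula H(X) = -\int_0^1 \log f(F^{-1}(u)) du and two normal laws, for which N(0, 1) \le_{disp} N(0, 2) since every quantile spacing is doubled.

```python
import numpy as np
from scipy.stats import norm

def entropy_from_quantiles(dist, n=200_000):
    """H(X) = -int_0^1 log f(F^{-1}(u)) du, approximated on a midpoint grid."""
    u = (np.arange(n) + 0.5) / n
    return -np.mean(np.log(dist.pdf(dist.ppf(u))))

X, Y = norm(0, 1), norm(0, 2)       # X <=_disp Y

print(entropy_from_quantiles(X))    # ~ 1.4189
print(entropy_from_quantiles(Y))    # ~ 2.1121 = 1.4189 + log 2

# Density characterization: g(G^{-1}(u)) <= f(F^{-1}(u)) for all u in (0, 1)
u = np.linspace(0.01, 0.99, 99)
print(np.all(Y.pdf(Y.ppf(u)) <= X.pdf(X.ppf(u))))   # True
```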

The following lemma is interesting in itself. Lemma 2.1 (i) is due to Shangari and Chen (2012). A simple proof is presented in terms of the dispersive ordering of the truncated distribution.

Lemma 2.1. Let F be the distribution function of an absolutely continuous random variable X with density function f and survival function \bar{F} = 1 - F, and assume that the entropy H(X) of X is finite.

(i) If F is log-concave, then the conditional entropy of X given X \in (-\infty, b],

H(X | (-\infty, b]) = -\int_{-\infty}^{b} \frac{f(x)}{F(b)} \ln \frac{f(x)}{F(b)} \, dx,

is increasing in b.

(ii) If \bar{F} is log-concave, then the conditional entropy of X given X \in (a, \infty),

H(X | (a, \infty)) = -\int_{a}^{\infty} \frac{f(x)}{\bar{F}(a)} \ln \frac{f(x)}{\bar{F}(a)} \, dx,

is decreasing in a.

(iii) If \bar{F} is log-convex, then H(X | (a, \infty)) is increasing in a.

Proof. The desired results follow directly from the following three observations:

• If X \le_{disp} Y, then H(X) \le H(Y);

• Theorem 3.B.19 in Shaked and Shanthikumar (2007) states that if F is log-concave, then [X | X \le b] is increasing in b in the sense of the dispersive order, and if \bar{F} is log-concave, then [X | X > a] is decreasing in a in the sense of the dispersive order;

• If \bar{F} is log-convex, then [X | X > a] is increasing in a in the sense of the dispersive order (Belzunce et al. (1996), Pellerey and Shaked (1997)).

Remark 2.2. It should be pointed out that Parts (i) and (ii) of Lemma 2.1 are equivalent. To see this, note that H(X | (a, \infty)) = H(-X | (-\infty, -a]), since differential entropy is invariant under the reflection x \mapsto -x. Hence, H(X | (a, \infty)) is decreasing in a if and only if H(-X | (-\infty, b]) is increasing in b = -a, and \bar{F} is log-concave if and only if the distribution function of -X is log-concave.

If \bar{F} is log-convex, then the support of X must be of the form (\ell, +\infty). If the support of X is of the form (\ell, u) with u finite, then \bar{F} cannot be log-convex. In view of this observation, the condition that \bar{F} is log-convex, imposed in Lemma 2.2 of Shangari and Chen (2012) and in Theorem 2 (i) of Gupta and Bajaj (2013), is not correct.

Remark 2.3. In Lemma 2.1, the entropy H(X) of X is assumed to be finite. Here, we give an example to show that the entropy of a random variable may not be finite. Let X be a random variable with density function

f(x) = \frac{1}{x (\ln x)^2}, \quad x \ge e.

Then the entropy of X is

H(X) = -\int_{e}^{\infty} f(x) \ln f(x) \, dx = \int_{e}^{\infty} \left( \frac{1}{x \ln x} + \frac{2 \ln \ln x}{x (\ln x)^2} \right) dx = +\infty.
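The divergence can also be seen numerically. After the substitution t = \ln x, the entropy integral truncated at a cutoff M equals \int_1^{\ln M} (1/t + 2 \ln t / t^2) dt, which grows roughly like \ln \ln M. A small illustrative sketch (ours, assuming SciPy):

```python
import numpy as np
from scipy.integrate import quad

# Integrand of Remark 2.3 after the substitution t = ln x:
#   -f(x) ln f(x) dx  =  (1/t + 2 ln(t)/t^2) dt  on  t in [1, ln M]
g = lambda t: 1.0 / t + 2.0 * np.log(t) / t**2

for M in (1e2, 1e6, 1e12, 1e24):
    value, _ = quad(g, 1.0, np.log(M))
    print(f"M = {M:.0e}: truncated entropy = {value:.4f}, ln(ln M) = {np.log(np.log(M)):.4f}")
# The truncated integrals grow without bound, so H(X) = +infinity.
```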

To state the following result, we introduce the componentwise log-concavity of a multivariate function. Let h : D \to \mathbb{R} be a function with D a subset of \mathbb{R}^d. We call h componentwise log-concave if h is log-concave in each component on D.

Proposition 2.4. Let F be the distribution function of a continuous random variable X and f be its density function. If h(x, y) := F(y) - F(x) is componentwise log-concave on D = \{(x, y) : x \le y\}, then the conditional entropy of X given X \in A = [a, b],

H(X | A) = -\int_{a}^{b} \frac{f(x)}{F(b) - F(a)} \ln \frac{f(x)}{F(b) - F(a)} \, dx,    (2.1)

is increasing in b and decreasing in a.

Proof. We first show that H(X|A) is increasing in b. For a \in \mathbb{R}, define a random variable X_a as X_a = [X | X > a]. It is easy to see that the distribution and density functions of X_a are F_a(x) = (F(x) - F(a))/\bar{F}(a) and f_a(x) = f(x)/\bar{F}(a), x \ge a, respectively. Since h(a, x) is log-concave in x \in [a, \infty), F_a(x) = h(a, x)/\bar{F}(a) is also log-concave in x \in [a, \infty). Then, by Lemma 2.1 (i), the conditional entropy of X_a given X_a \in B = (-\infty, b],

H(X_a | B) = -\int_{a}^{b} \frac{f_a(x)}{F_a(b)} \ln \frac{f_a(x)}{F_a(b)} \, dx = H(X | A),

is increasing in b \in [a, \infty). The monotonicity of H(X|A) in a can be proved similarly by Remark 2.2.

Remark 2.5. Theorem 2.3 in Shangari and Chen (2012) states that if F is log-concave, then H(X|A) is increasing in b and decreasing in a, where A = (a, b). However, the monotonicity of H(X|A) in a is not correct, as illustrated by the following counterexample. Let X be a random variable following the Gamma distribution with shape parameter 1/2 and scale parameter 1. Figure 1 plots the functions

g(a) = H(X | X > a) and h(b) = H(X | X \le b) on R_+, which shows that both g(a) and h(b) are increasing on R_+. In fact, this result follows from Lemma 2.1 (i) and (iii), because the log-convexity of the density f implies the log-convexity of \bar{F} and the log-concavity of F (Sengupta and Nanda (1999)).

Figure 1: Plots of the functions g(x) = H(X | X > x) (left) and h(x) = H(X | X \le x) (right)
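The two curves in Figure 1 are easy to reproduce numerically. The sketch below is our own; the grid and the finite upper integration limit (a numerical stand-in for +\infty) are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

X = gamma(a=0.5, scale=1.0)          # Gamma(1/2, 1): log-convex density

def cond_entropy(lower, upper):
    """Differential entropy of X given X in (lower, upper), by quadrature."""
    mass = X.cdf(upper) - X.cdf(lower)
    integrand = lambda x: -(X.pdf(x) / mass) * np.log(X.pdf(x) / mass)
    val, _ = quad(integrand, lower, upper, limit=200)
    return val

UPPER = 60.0                          # numerically safe stand-in for +infinity
grid = np.linspace(0.5, 10.0, 20)
g = [cond_entropy(a, UPPER) for a in grid]   # g(a) = H(X | X > a)
h = [cond_entropy(0.0, b) for b in grid]     # h(b) = H(X | X <= b)

print("g(a) increasing in a:", np.all(np.diff(g) > 0))   # True: H(X|X>a) is NOT decreasing in a
print("h(b) increasing in b:", np.all(np.diff(h) > 0))   # True, as Lemma 2.1 (i) predicts
```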

Remark 2.6. It is worth noting that the condition in Proposition 2.4 is sufficient but not necessary. To show this, we assert that for a density function f, if f always takes values in [1, e], then H(X|A) defined by (2.1) is always increasing in b \in [a, \infty) and decreasing in a \in (-\infty, b]. To see this, note that in this case \ln f \in [0, 1] and, hence, the right-derivative of H(X|A) defined by (2.1) with respect to b is given by

\frac{\partial_+}{\partial b} H(X|A) = \frac{f(b)}{(F(b) - F(a))^2} \left[ \int_a^b f(x) \ln f(x) \, dx + (F(b) - F(a))(1 - \ln f(b)) \right] \ge 0.

This implies that H(X|A) is increasing in b \in [a, \infty). Similar arguments show that H(X|A) is decreasing in a \in (-\infty, b].
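Remark 2.6 can be illustrated numerically. The sketch below uses a density of our own construction, f(x) = 2 + 0.5 cos(4πx) on [0, 1/2]: its values lie in [1.5, 2.5] ⊂ [1, e], a finite-difference check indicates that F(y) - F(x) is not componentwise log-concave for it, and yet H(X|A) is monotone in both endpoints, as asserted.

```python
import numpy as np
from scipy.integrate import quad

# A density on [0, 0.5] with values in [1.5, 2.5] that is bimodal (dip at x = 0.25),
# hence not log-concave; this density is our own construction.
f = lambda x: 2.0 + 0.5 * np.cos(4.0 * np.pi * x)
F = lambda x: 2.0 * x + np.sin(4.0 * np.pi * x) / (8.0 * np.pi)   # its cdf on [0, 0.5]

def cond_entropy(a, b):
    mass = F(b) - F(a)
    integrand = lambda x: -(f(x) / mass) * np.log(f(x) / mass)
    return quad(integrand, a, b)[0]

bs = np.linspace(0.05, 0.5, 10)
H_b = [cond_entropy(0.0, b) for b in bs]          # fixed a = 0, varying b
H_a = [cond_entropy(a, 0.5) for a in bs - 0.05]   # varying a, fixed b = 0.5
print("increasing in b:", np.all(np.diff(H_b) > 0))   # True
print("decreasing in a:", np.all(np.diff(H_a) < 0))   # True

# h(0, y) = F(y) fails to be log-concave near y ~ 0.35: positive second differences of log F
ys = np.linspace(0.3, 0.45, 50)
print("log-concavity violated:", np.any(np.diff(np.log(F(ys)), 2) > 0))   # True
```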

Let f be a density function. It is well known that if f is log-concave, then its corresponding distribution and survival functions F and \bar{F} = 1 - F are both log-concave. By similar arguments, one can verify that h(x, y) := F(y) - F(x) is log-concave in y \in [x, \infty) and is also log-concave in x \in (-\infty, y]. Hence, Proposition 2.4 holds if the density function f is log-concave.

Corollary 2.7. Let X be a random variable with a log-concave density function f. Then the conditional entropy H(X|A) of X given X \in A = [a, b], defined by (2.1), is increasing in b \in [a, \infty) and decreasing in a \in (-\infty, b].

Remark 2.8. Several generalizations of the differential Shannon entropy have been proposed in the information theory literature. For example, Rényi (1961) proposed the differential Rényi entropy of order α, defined by

H_\alpha(X) = \frac{1}{1-\alpha} \log \int_{\mathbb{R}} f^\alpha(x) \, dx, \quad \alpha > 0, \ \alpha \ne 1,

and Tsallis (1988) defined the following generalized entropy:

S_\alpha(X) = \frac{1}{\alpha-1} \left( 1 - \int_{\mathbb{R}} f^\alpha(x) \, dx \right), \quad \alpha > 0, \ \alpha \ne 1.

By applying L'Hôpital's rule,

H_1(X) := \lim_{\alpha \to 1} H_\alpha(X) = H(X), \qquad S_1(X) := \lim_{\alpha \to 1} S_\alpha(X) = H(X).

Let X ~ F and Y ~ G be two random variables with respective density functions f and g. It is easy to see that X \le_{disp} Y implies H_\alpha(X) \le H_\alpha(Y) and S_\alpha(X) \le S_\alpha(Y) for all α > 0. To see this, note that X \le_{disp} Y \iff f(F^{-1}(u)) \ge g(G^{-1}(u)) for all u \in (0, 1), and, for α > 0,

H_\alpha(X) = \frac{1}{1-\alpha} \log \int_0^1 \left( f(F^{-1}(u)) \right)^{\alpha-1} du, \qquad S_\alpha(X) = \frac{1}{\alpha-1} \left( 1 - \int_0^1 \left( f(F^{-1}(u)) \right)^{\alpha-1} du \right).

Therefore, all results for the differential Shannon entropy in this section also hold for the generalized entropies H_\alpha and S_\alpha.

Theorem 2 (ii) in Gupta and Bajaj (2013) states that if F is log-concave, then S_\alpha(X|A) is increasing in b and decreasing in a, where A = (a, b). However, the decreasing property of S_\alpha(X|A) in a is not correct.
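These quantile-domain representations, and the ordering claim, are easy to verify numerically. The sketch below (ours, assuming SciPy) compares the direct and quantile-based formulas for two normal laws with X \le_{disp} Y.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def renyi(dist, alpha, lo=-30, hi=30):
    """Differential Renyi entropy via the density integral."""
    I, _ = quad(lambda x: dist.pdf(x) ** alpha, lo, hi)
    return np.log(I) / (1.0 - alpha)

def renyi_quantile(dist, alpha, n=200_000):
    """Same quantity via the quantile-domain representation."""
    u = (np.arange(n) + 0.5) / n
    return np.log(np.mean(dist.pdf(dist.ppf(u)) ** (alpha - 1.0))) / (1.0 - alpha)

def tsallis(dist, alpha, lo=-30, hi=30):
    I, _ = quad(lambda x: dist.pdf(x) ** alpha, lo, hi)
    return (1.0 - I) / (alpha - 1.0)

X, Y, alpha = norm(0, 1), norm(0, 2), 2.0           # X <=_disp Y
print(renyi(X, alpha), renyi_quantile(X, alpha))    # agree: ~ 0.5 * log(4*pi) ~ 1.2655
print(renyi(X, alpha) <= renyi(Y, alpha))           # True
print(tsallis(X, alpha) <= tsallis(Y, alpha))       # True
```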

Counterexample 2.9. Consider the same counterexample as in Remark 2.5: let X be a random variable following the Gamma distribution with shape parameter 1/2 and scale parameter 1. It can be checked directly that F is log-concave. However, Figure 2 plots the function h(a) = S_2(X | X > a), which shows that h(a) is increasing in a.

Figure 2: Plot of the function h(a) = S_2(X | X > a)
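Figure 2 can be reproduced with a few lines; the sketch below is our own, and the finite upper integration limit is a numerical stand-in for +\infty.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

X = gamma(a=0.5, scale=1.0)

def tsallis2_tail(a, upper=60.0):
    """S_2 of X given X > a; for alpha = 2, S_2 = 1 - int of the squared conditional density."""
    mass = X.sf(a)
    I, _ = quad(lambda x: (X.pdf(x) / mass) ** 2, a, upper, limit=200)
    return 1.0 - I

avals = np.linspace(0.5, 10.0, 20)
h = [tsallis2_tail(a) for a in avals]
print("S_2(X | X > a) increasing in a:", np.all(np.diff(h) > 0))   # True
```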

3  Monotonicity of Shannon entropy

In this section, we investigate conditions under which the Shannon entropy possesses the partial monotonicity. First, one lemma is required.

Lemma 3.1. Let X be a discrete random variable with law {1, . . . , n; p_1, . . . , p_n}, where n may be finite or infinite. If P_k = \sum_{i=1}^{k} p_i is log-concave on \Lambda \equiv \{1, . . . , n\}, then the conditional entropy of X given X \le k,

H(k) = -\sum_{i=1}^{k} \frac{p_i}{P_k} \ln \frac{p_i}{P_k} = -\frac{1}{P_k} \sum_{i=1}^{k} p_i \ln p_i + \ln P_k,

is increasing in k \in \Lambda.

Proof. Note that for k = 2, . . . , n, we have

H(k) - H(k-1)
= -\frac{1}{P_k} \sum_{i=1}^{k} p_i \ln p_i + \frac{1}{P_{k-1}} \sum_{i=1}^{k-1} p_i \ln p_i + \ln \frac{P_k}{P_{k-1}}
= \frac{1}{P_k P_{k-1}} \left( P_k \sum_{i=1}^{k-1} p_i \ln p_i - P_{k-1} \sum_{i=1}^{k} p_i \ln p_i \right) + \ln \frac{P_k}{P_{k-1}}
= \frac{p_k}{P_k P_{k-1}} \sum_{i=1}^{k-1} p_i \ln \frac{p_i}{p_k} + \ln \frac{P_k}{P_{k-1}}
= \frac{p_k}{P_k P_{k-1}} \left( \sum_{i=1}^{k-1} p_i \ln \frac{p_i}{p_k} + \frac{P_k P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}} \right).

Denote h(1) = 0 and

h(k) := \sum_{i=1}^{k-1} p_i \ln \frac{p_i}{p_k} + \frac{P_k P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}}, \quad k = 2, . . . , n.

Then for k = 1, . . . , n - 1, we have

h(k+1) - h(k)
= \sum_{i=1}^{k} p_i \ln \frac{p_i}{p_{k+1}} - \sum_{i=1}^{k-1} p_i \ln \frac{p_i}{p_k} + \frac{P_{k+1} P_k}{p_{k+1}} \ln \frac{P_{k+1}}{P_k} - \frac{P_k P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}}
= p_k \ln p_k - \sum_{i=1}^{k} p_i \ln p_{k+1} + \sum_{i=1}^{k-1} p_i \ln p_k + \frac{P_{k+1} P_k}{p_{k+1}} \ln \frac{P_{k+1}}{P_k} - \frac{P_k P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}}
= P_k \ln \frac{p_k}{p_{k+1}} + \frac{P_{k+1} P_k}{p_{k+1}} \ln \frac{P_{k+1}}{P_k} - \frac{P_k P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}}
\ge P_k \ln \frac{P_k}{P_{k+1}} + \frac{P_{k+1} P_k}{p_{k+1}} \ln \frac{P_{k+1}}{P_k} - \frac{P_k P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}}
= P_k \left( \frac{P_k}{p_{k+1}} \ln \frac{P_{k+1}}{P_k} - \frac{P_{k-1}}{p_k} \ln \frac{P_k}{P_{k-1}} \right)
= P_k \left[ \frac{P_k}{p_{k+1}} \ln\left(1 + \frac{p_{k+1}}{P_k}\right) - \frac{P_{k-1}}{p_k} \ln\left(1 + \frac{p_k}{P_{k-1}}\right) \right],    (3.1)

where the inequality follows from p_k / p_{k+1} \ge P_k / P_{k+1}, which is implied by the log-concavity of P_k in k. By the log-concavity of P_k in k again, that is, P_{k+1} / P_k \le P_k / P_{k-1}, we have

\frac{P_k}{p_{k+1}} = \left( \frac{P_{k+1}}{P_k} - 1 \right)^{-1} \ge \left( \frac{P_k}{P_{k-1}} - 1 \right)^{-1} = \frac{P_{k-1}}{p_k}.    (3.2)

Note that x \mapsto x \ln(1 + 1/x) is an increasing function of x > 0. Substituting (3.2) into (3.1) then yields h(k+1) - h(k) \ge 0 for k = 1, . . . , n - 1, which, together with h(1) = 0, implies that h(k) \ge 0 for all k. Hence H(k) - H(k-1) \ge 0, that is, H(k) is increasing in k. This completes the proof.

Similar to Proposition 2.4, one can show the following result by Lemma 3.1.

Proposition 3.2. Let X be a discrete random variable with law {1, . . . , n; p_1, . . . , p_n} and F be its distribution function. If F(k) - F(\ell) = \sum_{i=\ell+1}^{k} p_i is componentwise log-concave on D = \{(k, \ell) : k > \ell, (k, \ell) \in \Lambda^2\}, then the conditional entropy of X given X \in (\ell, k],

H(X | (\ell, k]) = -\sum_{i=\ell+1}^{k} \frac{p_i}{F(k) - F(\ell)} \ln \frac{p_i}{F(k) - F(\ell)},    (3.3)

is increasing in k and decreasing in \ell, where (k, \ell) \in D.

Noting the relation between the log-concavity of a probability mass function and that of its distribution and survival functions, we have the following corollary.

Corollary 3.3. Let X be a discrete random variable with law {1, . . . , n; p_1, . . . , p_n}. If p_i is log-concave in i \in \Lambda, then the conditional entropy H(X | (\ell, k]) of X given X \in (\ell, k], defined by (3.3), is increasing in k and decreasing in \ell, where \ell < k.
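Corollary 3.3 is easy to check numerically for a log-concave probability mass function. The sketch below (ours, assuming SciPy) uses the Binomial(10, 0.3) law, whose pmf is log-concave, and verifies the partial monotonicity of H(X | (\ell, k]) over all admissible endpoint pairs.

```python
import numpy as np
from scipy.stats import binom

pmf = binom.pmf(np.arange(11), 10, 0.3)        # log-concave pmf on {0, 1, ..., 10}

def cond_entropy(lo, hi):
    """Shannon entropy of X given lo < X <= hi."""
    q = pmf[lo + 1: hi + 1]
    q = q / q.sum()
    return -np.sum(q * np.log(q))

ok = True
for lo in range(-1, 9):                         # lo = -1 means no lower truncation
    for hi in range(lo + 2, 11):
        ok &= cond_entropy(lo, hi) >= cond_entropy(lo, hi - 1) - 1e-12   # increasing in hi
for hi in range(1, 11):
    for lo in range(0, hi - 1):
        ok &= cond_entropy(lo, hi) <= cond_entropy(lo - 1, hi) + 1e-12   # decreasing in lo
print("partial monotonicity holds:", ok)        # True
```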

Remark 3.4. The assertion that the conditional entropy of a discrete random variable given an interval A is always partially monotone is wrong. To see this, we give a counterexample. Let X be a random variable with probability mass function P(X = 1) = P(X = 2) = 0.1 and P(X = 3) = 0.8. It can be easily calculated that H(X | [1, 2]) = log 2 ≈ 0.693 > H(X | [1, 3]) ≈ 0.639. That is, H(X|A) is not increasing in A. This is due to the fact that the distribution function of X is not log-concave on the set {1, 2, 3}.
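The counterexample takes only a few lines to verify (our own sketch):

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                        # normalize to the conditional law
    return -np.sum(p * np.log(p))

p = np.array([0.1, 0.1, 0.8])              # law of X on {1, 2, 3}
print(shannon(p[:2]))                       # H(X | X in [1, 2]) = ln 2 ~ 0.693
print(shannon(p))                           # H(X | X in [1, 3]) ~ 0.639 < 0.693
P = np.cumsum(p)                            # P_1, P_2, P_3
print(P[1] ** 2 >= P[0] * P[2])             # False: P_k is not log-concave
```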

References

Belzunce, F., Candel, J. and Ruiz, J.M. (1996). Dispersive orderings and characterizations of ageing classes. Statistics and Probability Letters, 28, 321-327.

Bickel, P.J. and Lehmann, E.L. (1976). Descriptive statistics for non-parametric models. III. Dispersion. Annals of Statistics, 4, 1130-1158.

Chen, J. (2013). A partial order on uncertainty and information. Journal of Theoretical Probability, 26, 349-359.

Cover, T.M. and Thomas, J.A. (2006). Elements of Information Theory (Second Edition). Wiley-Interscience, New York.

Gupta, N. and Bajaj, R.K. (2013). On partial monotonic behaviour of some entropy measures. Statistics and Probability Letters, 83, 1330-1338.

Pellerey, F. and Shaked, M. (1997). Characterizations of the IFR and DFR aging notions by means of the dispersive order. Statistics and Probability Letters, 33, 389-393.

Rényi, A. (1961). On measures of entropy and information. In: Neyman, J. (Ed.), Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I. University of California Press, Berkeley, CA, pp. 547-561.

Sengupta, D. and Nanda, A.K. (1999). Log-concave and concave distributions in reliability theory. Naval Research Logistics, 46, 419-433.

Shaked, M. and Shanthikumar, J.G. (2007). Stochastic Orders. Springer, New York.

Shangari, D. and Chen, J. (2012). Partial monotonicity of entropy measures. Statistics and Probability Letters, 82, 1935-1940.

Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52, 479-487.

Yeung, R.W. (2008). Information Theory and Network Coding. Springer, New York.