Advances in Applied Mathematics 89 (2017) 18–40
General Fisher information matrices of a random vector

Songjun Lv

School of Mathematical Science, Chongqing Normal University, Chongqing, 401331, China
Article info

Article history: Received 18 May 2016; received in revised form 7 January 2017; accepted 28 March 2017.

MSC: 28D20, 62B10, 62H05, 94A17
Abstract

We extend the (q, λ)-Fisher information to a much broader setting, in which the power function x ↦ |x|^q appearing in the (q, λ)-Fisher information is replaced by an arbitrarily chosen convex function. Within this general framework we carry out a qualitative study of the newly introduced generalized Fisher information. In particular, we derive a characterization of the general Fisher information matrix of a random vector in R^n. © 2017 Elsevier Inc. All rights reserved.

Keywords: Fisher score; Fisher information; Fisher information matrix; Characterization; Renyi entropy
E-mail address: [email protected]. http://dx.doi.org/10.1016/j.aam.2017.03.002

1. Introduction

The three most fundamental and mathematically intriguing quantities studied in information theory are the Fisher information, the moment, and the entropy (alternatively, the entropy power). Recall also that the moment–entropy inequality and the Fisher information inequality show the connections between the entropy power and the moment, or the Fisher information, respectively. The combination of these two significant inequalities eventually leads to the famous Cramér–Rao inequality, which demonstrates the interconnection between the moment (matrix) and the Fisher information (matrix) for 1-dimensional random variables (multi-dimensional random vectors).

A phenomenon produced by these basic information quantities that we cannot overlook is the following: the entropy power is an intrinsic invariant under volume-preserving linear transformations, but the moment and the Fisher information are not. This phenomenon has stimulated research on the invariant Fisher information and the invariant moment; see, e.g., [7,8,15]. Recent progress on invariant information quantities sharpens the classical information inequalities and makes the uncertainty principle for entropy and Fisher information more accurate and adaptive than the classical ones. For example, equality in the classical Fisher information inequality characterizes only the standard Gaussian, and for general fixed entropy the Fisher information may behave increasingly irregularly as the covariance matrix skews away from the identity matrix. Thus the classical Fisher information inequality becomes inaccurate, and it is then necessary to introduce a Fisher information that is invariant under volume-preserving linear transformations.

There could be different ways to introduce an invariant Fisher information (moment); cf. [7,8,15]. The authors in [7] showed that there are at least two notions of Fisher information of a random vector that are invariant under SL(n) transforms. Such notions of invariant Fisher information, called affine Fisher information, are natural to investigate when there is no a priori best or natural choice for defining the total error given an error vector.
The prerequisite for introducing such invariant information measures is to reveal how Fisher scores behave under non-singular linear transformations. Indeed, the related research dates back to the classical Fisher information matrix, which was defined explicitly. For a more general setting, however, it is hard (maybe impossible) to define the Fisher information matrix entry by entry. Even the widely used (q, λ)-Fisher information matrix can only be defined implicitly: the authors in [7] introduced an invariant (q, λ)-Fisher information of a multivariate random vector by showing that, associated with each λ-Renyi entropy and exponent q ≥ 1, there is a corresponding notion of a Fisher information matrix. The generating function underlying the concept of the (q, λ)-Fisher information (and hence of the (q, λ)-Fisher information matrix) is the power function x ↦ |x|^q. To introduce affine invariant general Fisher information, especially the invariant quantities generated by nonhomogeneous convex functions such as those attached to the well-known logistic distribution (sigmoid function) and the power function distribution, it is necessary to characterize the corresponding general Fisher information matrix; cf. [1,7]. We shall show the existence and uniqueness of the general Fisher information matrix generated by an arbitrarily chosen convex function, and shall characterize and define such general Fisher information matrices whose generating functions are arbitrary convex functions. As a by-product, we shall demonstrate the asymptotic behavior
of the (q, λ)-Fisher information matrix as q goes to infinity, and show that the (∞, λ)-Fisher information matrix is naturally associated with the characterization of the John ellipsoids arising in convex analysis. For potential applications of general Fisher information matrices to the affine-invariant Riemannian metric and to the convergence/divergence of information, one may refer to [5,16,23] and the references therein.

The probability density function f_θ(X), regarded as a function of X, can also be considered as the likelihood function of the parameter θ, where X is the observed outcome of an experiment. The likelihood function f_θ(X) = f(X, θ) is what defines the Fisher score of the random vector X: the gradient of the logarithm of the likelihood function with respect to the parameter θ = (θ_1, · · · , θ_n) is exactly the Fisher score X(θ) of the random vector X:

X(θ) = ∇_θ log f(X, θ).

An important special case of the Fisher score is the one where θ is a location parameter, that is to say, θ is completely determined by the mean or median of the distribution in question. In that case, the likelihood of an observation is given by a function of the form f_θ(X) = f(X + θ), and the Fisher score, denoted by X_1, is then the gradient of the logarithm of f(x) = f(X + θ) with respect to x:

X_1 = ∇ log f(x).

Such Fisher scores are often called linear scores. Throughout, we always suppose that the Fisher score and its extensions are linear. The Fisher information matrix, denoted simply by J(X), can be expressed as

J(X) = E[X_1 ⊗ X_1] = ∫_{R^n} [∇ log f(x) ⊗ ∇ log f(x)] f(x) dx,

where w ⊗ w denotes the n × n matrix whose ij-th entry is w_i w_j for w ∈ R^n. It is well known that the Fisher information matrix J(X) can be characterized as the square of the unique matrix with minimal determinant among all positive definite symmetric matrices A such that E[|A^{-1} X_1|^2] = n.

As an important extension of the Fisher score of a random vector in R^n, the λ-score is closely connected with the λ-Renyi entropy, just as the Fisher score is tightly tied to the Shannon entropy. The λ-score X_λ of a random vector X, for λ ≠ 1, is given by

(λ − 1) X_λ = ∇[f(X)^{λ−1}].

The Fisher score X_1 is exactly the limit of the λ-score as λ → 1.
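The identity J(X) = E[X_1 ⊗ X_1] can be illustrated numerically. The following is a Monte Carlo sketch (an assumed example, not from the paper): for a centered Gaussian with covariance C, the linear score is X_1 = ∇ log f(X) = −C^{-1}X, so the Fisher information matrix should coincide with C^{-1}.

```python
import numpy as np

# Monte Carlo sketch (assumed example): for X ~ N(0, C) the linear score
# is X1 = grad log f(X) = -C^{-1} X, so J(X) = E[X1 ⊗ X1] should equal C^{-1}.
rng = np.random.default_rng(0)
C = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(2), C, size=200_000)

scores = -X @ np.linalg.inv(C).T        # each row is a realization of X1
J = scores.T @ scores / len(scores)     # empirical E[X1 ⊗ X1]

print(np.round(J, 2))                   # close to C^{-1} up to sampling error
print(np.round(np.linalg.inv(C), 2))
```

With a few hundred thousand samples the empirical matrix agrees with C^{-1} to about two decimal places.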
It is then natural to extend the Fisher information to the (q, λ)-Fisher information of a random vector X, denoted by Φ_{q,λ}(X):

Φ_{q,λ}(X) = E[|X_λ|^q],  for q > 0.
In [7], the (q, λ)-Fisher information matrix J_{q,λ}(X) was characterized implicitly as the qth power of the unique matrix with minimal determinant among all positive definite symmetric matrices A such that E[|A^{-1} X_λ|^q] = n. Recently, Vajda [23] introduced a generalized Fisher information of a random variable, in which the power function x ↦ |x|^q is replaced by an arbitrarily chosen convex function. Following the convention of extending objects for random variables to those for random vectors, we see that Vajda's generalization of the Fisher information extends to the multi-dimensional setting. Motivated by Vajda's extension of the Fisher information and by the characterization of the (q, λ)-Fisher information matrix mentioned above, we further investigate extensions of the Fisher information of a random vector in R^n. Our main goal is to study and characterize the general Fisher information matrix in a much broader framework. To this end, we first introduce a general Renyi entropy and, correspondingly, a Fisher score even more general than the λ-score, which will be called the Q-score of a random vector. We then define a general Fisher information of a random vector X in R^n in an implicit manner, and call it the (φ∗, Q)-Fisher information of X. Note that the (q, λ)-Fisher information was defined explicitly in [7], but the extension undertaken here can only be carried out implicitly: the convex function in our definition need not be homogeneous, while the power function in the definition of the (q, λ)-Fisher information is homogeneous of degree q. A key observation in carrying out the extension from the power function x ↦ |x|^q to an arbitrarily chosen convex function is that the quasi-arithmetic means contain the generalized means as special cases. For the generalized mean with respect to q itself, we shall deduce the (q, λ)-Fisher information matrix from our main result.
Since q could be very large, it is not only reasonable but also necessary to investigate the limit case of the (q, λ)-Fisher information matrices as q goes to infinity. In Section 4, we prove the existence and uniqueness, and obtain the characterization, of the (∞, Q)-Fisher information matrix of a random vector in R^n. Section 5 is devoted to an application of the general Fisher information matrix: introducing the notion of the invariant Fisher information. The theory of the λ-Renyi entropy and the (q, λ)-Fisher information naturally entails general moment–entropy inequalities as well as Fisher information inequalities; see, for example, [7–10,12]. As an extension of the λ-information theory, in Section 2 we build up a new general framework in which the Q-Renyi entropy and the (φ∗, Q)-Fisher information of a random vector in R^n are formulated. This developing theory has the advantage over the λ-information theory of allowing us to study relationships between information quantities whose generating functions are not necessarily homogeneous.
There is a dual information matrix, called the moment matrix, or the covariance matrix when an unbiased estimator is being considered. As a counterpart to the general Fisher information matrix studied here, a similar extension of the classical moment matrix, as well as of the pth moment matrix characterized by Lutwak et al. in [12], was recently established by the author in [13]. A complete classification of SL(n) covariant matrix-valued valuations on functions with finite second moments was obtained by Ludwig [6] very recently.

2. Definitions

Let X be a random vector in R^n with probability density function f_X, which we also write simply as f. If A is a nonsingular n × n matrix, then the probability density of the random vector AX is given by

f_{AX}(y) = |A|^{-1} f_X(A^{-1} y),  (2.1)

where |A| is the absolute value of the determinant of A. The Shannon entropy of a random vector X with density f is given by

h(X) = −∫_{R^n} f(x) log f(x) dx.
The λ-Renyi entropy power [17] for λ > 0 is defined to be

N_λ(X) = ( ∫_{R^n} f(x)^λ dx )^{2/(n(1−λ))},  if λ ≠ 1,
N_λ(X) = e^{(2/n) h(X)},  if λ = 1.  (2.2)
The λ-Renyi entropy of X is defined to be

h_λ(X) = (n/2) log N_λ(X).
For a random vector X in R^n with probability density f, the λ-score of X is the random vector

X_λ = f(X)^{λ−2} ∇f(X),  (2.3)

and the (q, λ)-Fisher information of X is the qth moment of the λ-score of X:

Φ_{q,λ}(X) = E[|X_λ|^q] = ∫_{R^n} f(x)^{q(λ−2)+1} |∇f(x)|^q dx.  (2.4)
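The integral in (2.4) is easy to evaluate numerically. The following is a sketch for an assumed example, the one-dimensional standard Gaussian: with q = 2 and λ = 1, formula (2.4) gives the classical Fisher information, which equals 1 for N(0, 1).

```python
import numpy as np

# Numerical sketch (assumed 1D example): Φ_{q,λ}(X) = ∫ f^{q(λ-2)+1} |f'|^q dx
# for the standard normal density. With q = 2, λ = 1 this is the classical
# Fisher information of N(0, 1), which equals 1.
x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
df = -x * f                                   # its derivative

q, lam = 2, 1
phi = np.sum(f**(q * (lam - 2) + 1) * np.abs(df)**q) * dx
print(phi)                                    # close to 1
```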
Note that the classical Fisher information is Φ(X) = Φ_{2,1}(X). The Fisher information matrix of the random vector X is

J[X] = E[X_1 ⊗ X_1],  (2.5)

where X_1 = f(X)^{-1} ∇f(X) is the 1-score of X. Recall that the classical Fisher information is the trace of the Fisher information matrix. The notion of the Fisher information matrix was recently extended to the (q, λ)-Fisher information matrix in [7], where a characterization of it was determined.

Let μ be a Borel probability measure on R^n, and let g : R^n → R be a continuous function. The generalized mean M_q[g] with respect to q ∈ [−∞, +∞] is given by

M_q[g] = ( ∫_{R^n} |g(x)|^q dμ(x) )^{1/q},  if q ∈ (−∞, 0) ∪ (0, +∞),
M_q[g] = exp( ∫_{R^n} log |g(x)| dμ(x) ),  if q = 0,
M_q[g] = sup_{x∈R^n} |g(x)|,  if q = +∞,
M_q[g] = inf_{x∈R^n} |g(x)|,  if q = −∞.

If, in addition, Q : I_1 → I_2 is a continuous real-valued function which is an injective map from I_1 ⊆ R to I_2 ⊆ R, then the quasi-arithmetic mean M_Q[g] of the function g, with respect to the probability measure μ and the function Q, is given by

M_Q[g] = Q^{-1}[ ∫_{R^n} Q(g(x)) dμ(x) ].
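The relationship between the two means can be checked on a discrete probability measure (an assumed toy example): taking Q(t) = t^q in the quasi-arithmetic mean recovers the generalized mean, and M_q increases to M_∞ = sup |g| as q grows.

```python
import numpy as np

# Sketch on a discrete measure (assumed example): the quasi-arithmetic mean
# M_Q[g] = Q^{-1}( Σ Q(g_i) μ_i ) reduces to the generalized mean
# M_q[g] = (Σ |g_i|^q μ_i)^{1/q} when Q(t) = t^q, and M_q → max |g_i| as q → ∞.
def quasi_arithmetic_mean(g, mu, Q, Qinv):
    return Qinv(np.sum(Q(g) * mu))

g = np.array([1.0, 2.0, 4.0])
mu = np.array([0.5, 0.25, 0.25])      # probability weights

q = 3.0
m_Q = quasi_arithmetic_mean(g, mu, lambda t: t**q, lambda s: s**(1 / q))
m_q = np.sum(np.abs(g)**q * mu)**(1 / q)
print(m_Q, m_q)                       # the two means agree

for p in (1, 4, 16, 64):              # M_p[g] climbs toward sup |g| = 4
    print(p, np.sum(np.abs(g)**p * mu)**(1 / p))
```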
Thanks to the generalized mean, one observes that the λ-Renyi entropy power and the (q, λ)-Fisher information reduce to the classical (Shannon) entropy power and the classical Fisher information as λ → 1. In this paper we extend these two notions to an even more general setting by employing the quasi-arithmetic mean.

Definition 1 (The Q-Renyi entropy). For a continuous real-valued function Q : I_1 → I_2, which is an injective map from I_1 ⊆ R to I_2 ⊆ R, the quantity N_Q(X) determined by

N_Q(X)^{−n/2} = M_Q[f] = Q^{-1}[ ∫_{R^n} Q(f(x)) f(x) dx ]  (2.6)

is called the Q-Renyi entropy power of a random vector X with probability density f. The Q-Renyi entropy h_Q(X) of X is
S. Lv / Advances in Applied Mathematics 89 (2017) 18–40
24
h_Q(X) = (n/2) log N_Q(X).
Taking Q(t) = t^{λ−1}/(λ − 1) in (2.6), we immediately recover the λ-Renyi entropy power (2.2). Note that the notion of the Q-Renyi entropy provides us with a unified way to investigate Renyi entropies, including the λ-Renyi entropy.

To define the Fisher information matrix associated with the Q-Renyi entropy, we first recall the definition and properties of φ-Gaussian random vectors (see Cianchi et al. [1]). We define a gauge to be a continuously differentiable function φ : (t_−, t_+) → R with 0 ≤ t_− < t_+ ≤ ∞ whose derivative φ′ : (t_−, t_+) → R is strictly increasing; in particular, φ is strictly convex. A gauge φ is said to be Gaussian if, in addition,

∫_{t_−}^{t_+} e^{−φ(t)} dt < ∞.
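The claim that Q(t) = t^{λ−1}/(λ − 1) recovers the λ-Renyi entropy power can be verified numerically. The following is a sketch for an assumed one-dimensional Gaussian example, comparing N_Q from (2.6) with N_λ from (2.2); for λ = 2 both should equal 4π.

```python
import numpy as np

# Numerical sketch (assumed 1D Gaussian example): with Q(t) = t^(λ-1)/(λ-1),
# the Q-Renyi entropy power defined through N_Q(X)^{-n/2} = M_Q[f] should
# coincide with N_λ(X) = (∫ f^λ dx)^{2/(n(1-λ))}. For N(0,1) and λ = 2,
# both equal 4π.
lam, n = 2.0, 1
x = np.linspace(-12, 12, 400_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

Q = lambda t: t**(lam - 1) / (lam - 1)
Qinv = lambda s: ((lam - 1) * s)**(1 / (lam - 1))

M_Q = Qinv(np.sum(Q(f) * f) * dx)      # quasi-arithmetic mean of f
N_Q = M_Q**(-2 / n)                    # from N_Q^{-n/2} = M_Q[f]
N_lam = (np.sum(f**lam) * dx)**(2 / (n * (1 - lam)))
print(N_Q, N_lam)
```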
For a given Gaussian gauge φ : (t_−, t_+) → R, the standard φ-Gaussian random vector Z_φ is defined to be the random vector whose density function is given by

f_{Z_φ}(y) = e^{−φ(|y|)} 1_{supp φ}(|y|) / ∫_{R^n} e^{−φ(|x|)} 1_{supp φ}(|x|) dx,  for y ∈ R^n,

where 1_{supp φ} denotes the indicator function of the support of φ. Any random vector of the form Z = T(Z_φ − μ), where T is a nonsingular matrix, is called a φ-Gaussian. If we let C = TT^t, then the density function of Z is given explicitly by

f_Z(x) = a |C|^{−1/2} exp[ −φ( (x^t C^{-1} x)^{1/2} ) ] 1_{supp φ}( (x^t C^{-1} x)^{1/2} ),  x ∈ R^n,

where a is a constant such that f_Z is a probability density. We always assume the mean vector of Z to be μ = 0 and the covariance matrix of Z to be (1/n)C.

Let φ be a Gaussian gauge, and let Z_φ be the standard φ-Gaussian random vector associated with φ. Assume that −∞ < E[φ(|Z_φ|)] < ∞, and denote

φ̂ = E[φ(|Z_φ|)] = ( ∫_{R^n} e^{−φ(|x|)} 1_{supp φ}(|x|) dx )^{-1} ∫_{R^n} φ(|x|) e^{−φ(|x|)} dx.  (2.7)
A continuous random vector X is said to have finite φ-moment if there exists m > 0 such that

P( 0 < |X|/m < ∞ ) = 1  and  E[ φ( |X|/m ) ] = φ̂.  (2.8)
In that case, the φ-moment of X is defined to be E_φ[|X|] = m.

For a gauge φ : (t_−, t_+) → R, denote the image of its derivative by φ′((t_−, t_+)) = (ξ_−, ξ_+), and define the dual gauge φ∗ : (ξ_−, ξ_+) → R of φ as the function such that

φ∗(φ′(t)) = t φ′(t) − φ(t),  (2.9)

for each t ∈ (t_−, t_+). Note that the dual gauge φ∗ is continuously differentiable, and (φ∗)′ = (φ′)^{-1} is a strictly increasing function from (ξ_−, ξ_+) onto (t_−, t_+). It is worth pointing out that φ∗ is not necessarily a Gaussian gauge, because the integral of e^{−φ∗} over (ξ_−, ξ_+) is not necessarily finite.

In particular, φ∗ is strictly increasing on (ξ_−, ξ_+). Indeed, take t_1, t_2 ∈ (t_−, t_+) with t_1 < t_2, so that φ′(t_1) < φ′(t_2). Together with the strict convexity of φ this gives

φ∗(φ′(t_1)) − φ∗(φ′(t_2)) = t_1 φ′(t_1) − t_2 φ′(t_2) + φ(t_2) − φ(t_1) < φ′(t_2)(t_1 − t_2) + φ(t_2) − φ(t_1) < 0.

That shows that φ∗(φ′(t_1)) < φ∗(φ′(t_2)), and hence φ∗ is strictly increasing over (ξ_−, ξ_+).

Since the definition of the dual gauge is derived from the Legendre transform, the transform is an involution, i.e.,

(φ∗)∗ = φ.  (2.10)

To verify this, take t ∈ (t_−, t_+) and compute

φ(t) = t φ′(t) − [t φ′(t) − φ(t)]
= φ′(t) · (φ′)^{-1}(φ′(t)) − φ∗(φ′(t))
= φ′(t) · (φ∗)′(φ′(t)) − φ∗(φ′(t))
= (φ∗)∗( (φ∗)′(φ′(t)) )
= (φ∗)∗( (φ′)^{-1}(φ′(t)) )
= (φ∗)∗(t).

Example 1 (The p-Gaussian gauge). For p, q ∈ (1, ∞) such that 1/p + 1/q = 1, the gauge φ(t) = t^p/p with t > 0 is the so-called p-gauge. Its dual gauge φ∗ is exactly the q-gauge.
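The duality of Example 1 can be checked directly against the defining relation (2.9). The following is a numerical sketch (assumed parameter choice p = 3): for φ(t) = t^p/p with φ′(t) = t^{p−1}, the function φ∗(ξ) = ξ^q/q should satisfy φ∗(φ′(t)) = tφ′(t) − φ(t).

```python
import numpy as np

# Sketch verifying the Legendre-type relation (2.9) for the p-gauge
# (Example 1; p = 3 is an assumed choice): φ(t) = t^p/p has dual
# φ*(ξ) = ξ^q/q with 1/p + 1/q = 1, i.e. φ*(φ'(t)) = t·φ'(t) − φ(t).
p = 3.0
q = p / (p - 1)

phi      = lambda t: t**p / p
dphi     = lambda t: t**(p - 1)
phi_star = lambda xi: xi**q / q

t = np.linspace(0.1, 5.0, 1000)
lhs = phi_star(dphi(t))
rhs = t * dphi(t) - phi(t)
print(np.max(np.abs(lhs - rhs)))      # essentially zero
```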
The most commonly used gauge is the 2-gauge φ(t) = t²/2 with t > 0. The probability density of the p-Gaussian Z_p is usually given by

f_{Z_p}(x) = γ e^{−|x|^p},
where γ is a constant such that the above function is a probability density.

Example 2 (The logistic Gaussian gauge). Let φ : (0, ∞) → R be given by φ(t) = t + 2 log(1 + e^{−t}) with t > 0. Then φ is a gauge and its dual gauge is φ∗ : (0, 1) → R given by

φ∗(ξ) = (1 + ξ) log((1 + ξ)/2) + (1 − ξ) log((1 − ξ)/2).

Note that φ∗ is a strictly increasing function from (0, 1) to (−2 log 2, 0). The probability distribution of any constant multiple of the φ-Gaussian Z_φ is known as a logistic distribution [24], whose probability density function is

f_{Z_φ}(x) = e^{−|x|} / (1 + e^{−|x|})².
Example 3 (The Laplace Gaussian gauge). Let φ : (0, ∞) → (0, ∞) be given by φ(t) = e^{−t} with t > 0. Then φ is a gauge and its dual gauge is φ∗ : (−∞, 0) → R given by φ∗(ξ) = ξ[1 − log(−ξ)]. The probability distribution of any constant multiple of the φ-Gaussian Z_φ has density function

f_{Z_φ}(x) = τ e^{−e^{−|x|}},
where τ is a positive constant such that the above function has total mass 1 on R^n.

Example 4 (The power function Gaussian gauge). Let φ : (0, 1) → R be given by φ(t) = −p log t with p > 0 and t > 0. Then φ is a gauge and its dual gauge is φ∗ : (−∞, −p) → R given by φ∗(ξ) = p[−1 + log p − log(−ξ)]. The probability distribution of any constant multiple of the φ-Gaussian Z_φ has density function

f_{Z_φ}(x) = 2(p + 1) π^{n/2} |x|^p / ( (n + p) Γ(n/2) ).
We are now in a position to define an even more general Fisher information of a random vector in R^n. The idea is to replace the q-moment of the λ-score in the (q, λ)-Fisher information by a φ∗-moment of the Q-score X_Q of a random vector X in R^n with density f. Here the Q-score X_Q is the random vector in R^n given by

X_Q = ∇[Q ∘ f],

where Q is an injective, continuously differentiable function and Q ∘ f is the composition of Q and f.

For 0 ≤ t_− < t_+ ≤ ∞, let φ : (t_−, t_+) → R be a Gaussian gauge, with associated φ-Gaussian random vector Z_φ and dual gauge φ∗. Denote the image of its derivative by φ′((t_−, t_+)) = (ξ_−, ξ_+), and define the random variable Ξ_φ = φ′(|Z_φ|). Also define the quantity

φ̂∗ = E[φ∗(Ξ_φ)],

if it exists, associated with the Q-score (Z_φ)_Q of Z_φ. Observe that

φ̂∗ = E[ |Z_φ| φ′(|Z_φ|) ] − φ̂.  (2.11)
Based on these setups, we define the (φ∗, Q)-Fisher information of a random vector as follows:

Definition 2 (The (φ∗, Q)-Fisher information). Let Q : I_1 → I_2 be a continuously differentiable real-valued function, which is an injective map from I_1 ⊆ R to I_2 ⊆ R, and let X_Q be the Q-score of a random vector X in R^n. We say that X has finite (φ∗, Q)-Fisher information if there exists m∗ > 0 such that

P( ξ_− < |X_Q|/m∗ < ξ_+ ) = 1  and  E[ φ∗( |X_Q|/m∗ ) ] = φ̂∗.

The (φ∗, Q)-Fisher information E_{φ∗}[|X_Q|] of X is then defined to be E_{φ∗}[|X_Q|] = m∗.

The case φ∗(t) = t^q and Q(s) = s^{λ−1}/(λ − 1) in Definition 2 recovers the (q, λ)-Fisher information introduced and investigated recently in [7].
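Because Definition 2 is implicit, the number m∗ must in general be found by solving E[φ∗(|X_Q|/m∗)] = φ̂∗ numerically. The following is a sketch for an assumed one-dimensional example: X standard normal, Q = log (so X_Q is the classical score −x), and the 2-gauge φ(t) = t²/2 with dual φ∗(ξ) = ξ²/2, for which φ̂∗ = E[|Z_φ|²]/2 = 1/2 and m∗ should come out as 1.

```python
import numpy as np

# Sketch (assumed example, n = 1): solving E[φ*(|X_Q|/m)] = φ̂* for m by
# bisection. X ~ N(0,1), Q = log (X_Q = -x), φ(t) = t²/2, φ*(ξ) = ξ²/2,
# φ̂* = E[|Z_φ|²]/2 = 1/2; the unique root is m* = 1.
x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

score = np.abs(x)                      # |X_Q| = |-x| for Q = log
phi_star = lambda xi: xi**2 / 2        # dual of the 2-gauge
phi_star_hat = 0.5

def gap(m):
    return np.sum(phi_star(score / m) * f) * dx - phi_star_hat

lo, hi = 0.1, 10.0                     # gap(m) is decreasing in m
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if gap(mid) < 0 else (mid, hi)
m_star = 0.5 * (lo + hi)
print(round(m_star, 4))                # ≈ 1.0 for the standard normal
```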
3. Characterizations of general Fisher information matrices

As mentioned before, an essential characterization of the Fisher information matrix states that the unique matrix of minimal determinant among all positive definite symmetric matrices A such that E[|A^{-1} X_1|²] = n is exactly the square root of the Fisher information matrix of the random vector X. Such characterizations were recently extended to the (q, λ)-Fisher information matrix in [7]. Motivated by the definition of the (φ∗, Q)-Fisher information, we characterize new generalized Fisher information matrices, called the (φ∗, Q)-Fisher information matrices of a random vector.

Theorem 3.1. Let φ be a Gaussian gauge and let X_Q be the Q-score of a random vector X in R^n with finite (φ∗, Q)-Fisher information. Then there exists a unique positive definite symmetric matrix A of minimal determinant such that

E_{φ∗}[|A^{-t} X_Q|] = n.

Moreover, A is the unique positive definite symmetric matrix solving this constrained minimal problem if and only if

E[ (φ∗)′( |A^{-t} X_Q|/n ) (A^{-t} X_Q ⊗ A^{-t} X_Q)/|A^{-t} X_Q| ] = (1/n) E[ (φ∗)′( |A^{-t} X_Q|/n ) |A^{-t} X_Q| ] I_n.

We divide the proof of Theorem 3.1 into several lemmas. The existence of the (φ∗, Q)-Fisher information matrix is established by the following lemma.

Lemma 3.2 (Existence). There exists a positive definite symmetric matrix A such that

E_{φ∗}[|A^{-t} X_Q|] = n  and  |A| ≤ |A′|

among all positive definite symmetric matrices A′ satisfying E_{φ∗}[|(A′)^{-t} X_Q|] = n.

Proof. Since the inverse of a positive definite symmetric matrix is again positive definite, the lemma follows from the compactness of the set S of positive definite symmetric matrices B satisfying the constraint E_{φ∗}[|B X_Q|] = n. Since S is closed, it suffices to show that it is bounded. For a given positive definite symmetric matrix B, let η be the maximal eigenvalue of B with normalized eigenvector e; then

|B X_Q| ≥ η |e · X_Q|,  (3.1)

for the Q-score of any random vector X.
We claim that if X is a random vector in R^n with finite (φ∗, Q)-Fisher information, then there exists a constant c > 0 such that

E[|e · X_Q|] ≥ c > 0,  (3.2)

for every unit vector e. Indeed, the left-hand side of (3.2) is a continuous function on the unit sphere, which is compact, so it attains its minimum. If the minimum were zero, there would exist a unit vector e such that e · ∇f(x) = 0 for almost every x in the support of f; this is impossible for a differentiable probability density function on R^n. This proves the claim.

By the monotonicity and convexity of φ∗, (3.1), Jensen's inequality, and (3.2), we have

φ̂∗ = ∫_{R^n} φ∗( |B X_Q|/n ) f(x) dx
≥ ∫_{R^n} φ∗( η|e · X_Q|/n ) f(x) dx
≥ φ∗( (η/n) ∫_{R^n} |e · X_Q| f(x) dx )
= φ∗( η E[|e · X_Q|]/n )
≥ φ∗( ηc/n ).

This, together with the invertibility and monotonicity of the function φ∗, gives

η ≤ (n/c) (φ∗)^{-1}(φ̂∗).

From the definitions of φ∗ and φ̂∗ it follows that 0 < (φ∗)^{-1}(φ̂∗) < ∞. Thus the eigenvalues of B are uniformly bounded from above, proving that the set S is bounded. □

Observe that the (φ∗, Q)-Fisher information matrix is actually the solution to the following constrained minimal problem:

min_{B∈S} |B|  subject to  E_{φ∗}[|B^{-t} X_Q|] = n,  (3.3)

where S is the set of n × n positive definite symmetric matrices. The dual problem is to find B ∈ S such that

min_{B∈S} E_{φ∗}[|B^{-t} X_Q|]  subject to  |B| = 1.  (3.4)

The solutions to (3.3) and (3.4) differ only by a scale factor. It is enough, then, to characterize the unique solution to the problem (3.4).
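The dual problem (3.4) can be explored numerically in the classical case. The following is a sketch under assumed choices (Q = log and φ∗(ξ) = ξ², a 2 × 2 Gaussian): minimizing E[|B^{-1}X_1|²] over symmetric positive definite B with det B = 1 should select B proportional to the square root of the Fisher information matrix J = C^{-1}, in line with the classical characterization recalled above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import sqrtm

# Sketch of the dual problem (3.4) in the classical case (assumed example):
# for X ~ N(0, C), E[X1 ⊗ X1] = J = C^{-1} exactly, so the objective
# E[|B^{-1}X1|²] equals tr(B^{-1} J B^{-t}). The det-1 minimizer should be
# the normalized square root of J.
C = np.array([[2.0, 0.5], [0.5, 1.0]])
J = np.linalg.inv(C)

def objective(params):
    a, b, c = params
    M = np.array([[a, b], [b, c]])
    d = np.linalg.det(M)
    if a <= 0 or d <= 0:
        return 1e9                      # penalize non-SPD candidates
    B = M / np.sqrt(d)                  # normalize to det B = 1
    Binv = np.linalg.inv(B)
    return np.trace(Binv @ J @ Binv.T)  # E[|B^{-1}X1|²]

res = minimize(objective, x0=[1.0, 0.0, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-12, "fatol": 1e-14, "maxiter": 5000})
a, b, c = res.x
M = np.array([[a, b], [b, c]])
B = M / np.sqrt(np.linalg.det(M))

expected = np.real(sqrtm(J))
expected /= np.sqrt(np.linalg.det(expected))
print(np.round(B, 4))
print(np.round(expected, 4))            # the two matrices agree
```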
Lemma 3.3 (Uniqueness). The solution to the dual constrained problem (3.4) is unique.

Proof. Suppose that there are two different symmetric positive definite matrices A_1 and A_2 solving (3.4). Then |A_1| = |A_2| = 1 and

E_{φ∗}[|A_1^{-1} X_Q|] = E_{φ∗}[|A_2^{-1} X_Q|].

Define a matrix A_3 ∈ S by

A_3^{-1} = | (A_1^{-1} + A_2^{-1})/2 |^{-1/n} (A_1^{-1} + A_2^{-1})/2.

Then |A_3| = 1. Minkowski's determinant theorem [18, p. 205] states that

| (A_1^{-1} + A_2^{-1})/2 |^{1/n} ≥ (1/2)|A_1^{-1}|^{1/n} + (1/2)|A_2^{-1}|^{1/n} = 1,  (3.5)

with equality if and only if A_1 = cA_2 with c > 0. This, together with the strict monotonicity and convexity of φ∗, yields
∫_{R^n} φ∗( |A_3^{-1} X_Q| / E_{φ∗}[|A_1^{-1} X_Q|] ) f(x) dx
≤ ∫_{R^n} φ∗( |(A_1^{-1} + A_2^{-1})/2 · X_Q| / E_{φ∗}[|A_1^{-1} X_Q|] ) f(x) dx
≤ (1/2) ∫_{R^n} φ∗( |A_1^{-1} X_Q| / E_{φ∗}[|A_1^{-1} X_Q|] ) f(x) dx + (1/2) ∫_{R^n} φ∗( |A_2^{-1} X_Q| / E_{φ∗}[|A_1^{-1} X_Q|] ) f(x) dx
= φ̂∗.

Consequently,

E_{φ∗}[|A_3^{-1} X_Q|] ≤ E_{φ∗}[|A_1^{-1} X_Q|] = E_{φ∗}[|A_2^{-1} X_Q|].
However, this contradicts the assumption that A_1 and A_2 are solutions to the problem (3.4). Therefore, the inequality in (3.5) has to be an equality, which happens only when A_1 = cA_2 with c > 0; together with the hypothesis |A_1| = |A_2| = 1, this shows that c = 1. □

Lemma 3.4. If A is a positive definite symmetric matrix such that |A| = 1 and E_{φ∗}[|A^{-t} X_Q|] ≤ E_{φ∗}[|(A′)^{-t} X_Q|] for all A′ ∈ S satisfying |A′| = 1, then

E[ (φ∗)′( |A^{-t} X_Q| / E_{φ∗}[|A^{-t} X_Q|] ) (A^{-t} X_Q ⊗ A^{-t} X_Q)/|A^{-t} X_Q| ] = E[ (φ∗)′( |A^{-t} X_Q| / E_{φ∗}[|A^{-t} X_Q|] ) |A^{-t} X_Q| ] I_n/n.  (3.6)
Proof. Suppose that A ∈ S with |A| = 1 satisfies the assumption. For every B ∈ S, denote (B^{-t} A^{-t})_s = [ (B^{-t} A^{-t})^t (B^{-t} A^{-t}) ]^{1/2}; then |B^{-t} A^{-t}| = |(B^{-t} A^{-t})_s| and |(B^{-t} A^{-t}) x| = |(B^{-t} A^{-t})_s x| for each x ∈ R^n. Denoting Y_Q = A^{-t} X_Q and B̄ = B/|B|^{1/n}, we have

E_{φ∗}[|B̄^{-t} Y_Q|] = |B^{-t}|^{-1/n} E_{φ∗}[|B^{-t} Y_Q|]
= |A^{-t}|^{1/n} |B^{-t} A^{-t}|^{-1/n} E_{φ∗}[|B^{-t} A^{-t} X_Q|]
= |A^{-t}|^{1/n} |(B^{-t} A^{-t})_s|^{-1/n} E_{φ∗}[|(B^{-t} A^{-t})_s X_Q|]
≥ |A^{-t}|^{1/n} |A^{-t}|^{-1/n} E_{φ∗}[|A^{-t} X_Q|]  (3.7)
= E_{φ∗}[|Y_Q|].

From the definition of the (φ∗, Q)-Fisher information, the inequality (3.7) is equivalent to

∫_{R^n} φ∗( |B̄^{-t} Y_Q| / E_{φ∗}[|Y_Q|] ) f(y) dy ≥ φ̂∗.  (3.8)
Let B′ ∈ S be a positive definite symmetric matrix. Then there exists ε_0 > 0 such that for each ε ∈ (0, ε_0) both matrices I_n + εB′ and I_n − εB′ are still positive definite. For ε ∈ (−ε_0, ε_0), define B̄_ε = (I_n + εB′)/|I_n + εB′|^{1/n}. Then from (3.8) we have

∫_{R^n} φ∗( |(I_n + εB′)^{-t} Y_Q| / ( |I_n + εB′|^{-1/n} E_{φ∗}[|Y_Q|] ) ) f(y) dy ≥ φ̂∗,

which holds for every ε near 0. Hence

d/dε|_{ε=0} ∫_{R^n} φ∗( |(I_n + εB′)^{-t} Y_Q| / ( |I_n + εB′|^{-1/n} E_{φ∗}[|Y_Q|] ) ) f(y) dy = 0,  (3.9)

which leads to
0 = ∫_{R^n} d/dε|_{ε=0} φ∗( |(I_n + εB′)^{-t} Y_Q| / ( |I_n + εB′|^{-1/n} E_{φ∗}[|Y_Q|] ) ) f(y) dy
= (1/E_{φ∗}[|Y_Q|]) ∫_{R^n} (φ∗)′( |Y_Q| / E_{φ∗}[|Y_Q|] ) [ (tr(B′)/n) |Y_Q| − (B′)^t Y_Q · Y_Q / |Y_Q| ] f(y) dy.

A direct calculation shows that this holds for every B′ only when

E[ (φ∗)′( |Y_Q| / E_{φ∗}[|Y_Q|] ) (Y_Q ⊗ Y_Q)/|Y_Q| ] = (1/n) E[ (φ∗)′( |Y_Q| / E_{φ∗}[|Y_Q|] ) |Y_Q| ] I_n.  (3.10)

Substituting Y_Q = A^{-t} X_Q into (3.10), we obtain (3.6) immediately. □
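Condition (3.10) can be made concrete in the classical case. The following is a Monte Carlo sketch under assumed choices (Q = log, φ∗(ξ) = ξ²): since (φ∗)′ is then linear, (3.10) reduces to isotropy of Y_Q, i.e. E[Y_Q ⊗ Y_Q] ∝ I_n, and taking A to be the square root of the Fisher information matrix J of a correlated Gaussian should achieve exactly that.

```python
import numpy as np
from scipy.linalg import sqrtm

# Monte Carlo sketch of (3.10) in the classical case (assumed example):
# with Q = log and φ*(ξ) = ξ², (3.10) says Y_Q = A^{-t}X_1 is isotropic.
# For X ~ N(0, C) with J = C^{-1} and A = J^{1/2}, E[Y_Q ⊗ Y_Q] = I_n.
rng = np.random.default_rng(1)
C = np.array([[2.0, 0.5], [0.5, 1.0]])
J = np.linalg.inv(C)
A = np.real(sqrtm(J))                      # A² = J

X = rng.multivariate_normal(np.zeros(2), C, size=300_000)
X1 = -X @ np.linalg.inv(C).T               # classical scores
Y = X1 @ np.linalg.inv(A).T                # Y_Q = A^{-t} X_1 (A symmetric)

M = Y.T @ Y / len(Y)                       # empirical E[Y_Q ⊗ Y_Q]
print(np.round(M, 2))                      # close to the identity
```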
Lemma 3.5. If A is a positive definite symmetric matrix with determinant 1 such that (3.6) holds, then A is the unique solution to the problem (3.4).

Proof. The proof is similar to that of Theorem 3.1 in [13]; we sketch the main ingredients. Let A be a positive definite symmetric matrix with determinant 1 satisfying (3.6), and denote Y_Q = A^{-t} X_Q. For an arbitrary positive definite symmetric matrix B with determinant 1, let e_1, · · · , e_n be an orthonormal basis of eigenvectors of B^{-t} with corresponding eigenvalues λ_1, · · · , λ_n. Then

∫_{R^n} φ∗( |B^{-t} Y_Q| / E_{φ∗}[|Y_Q|] ) f(y) dy
= ∫_{R^n} φ∗( ( Σ_{i=1}^n λ_i² (e_i · Y_Q)² )^{1/2} / E_{φ∗}[|Y_Q|] ) f(y) dy
= ∫_{R^n} φ∗( |diag(λ_1, · · · , λ_n) Y_Q| / E_{φ∗}[|Y_Q|] ) f(y) dy.

Let

F(λ) = ∫_{R^n} φ∗( |diag(λ_1, · · · , λ_n) Y_Q| / E_{φ∗}[|Y_Q|] ) f(y) dy;

we will prove that F(λ) ≥ F(e), where λ = (λ_1, · · · , λ_n) and e = (1, · · · , 1). It is not hard to see that F is continuous and convex, and that for any λ ∈ R^n_+ (the open cone of points of R^n with positive coordinates), F(cλ) is strictly increasing in c ∈ [0, ∞). Hence F^{-1}((0, F(e)]) is indeed a convex body (a compact, convex subset of R^n with nonempty interior).
Since φ∗ is smooth and |diag(λ1 , · · · , λn )YQ | is smooth in (λ1 , · · · , λn ) uniformly for YQ ∈ Rn , from (3.6) we achieve that ∂ F (λ) ∂λi
|YQ | ∗ φ Eφ∗ [|YQ |] |YQ | (ei · YQ ) f (y)dy = 2 |YQ | Eφ∗ [|YQ |] Rn |YQ | E φ∗ Eφ∗ [|Y | |Y Q |] Q = . nEφ∗ [|YQ |]
λ=e
2
That is a constant independent of the index i, denoted by C. We arrive at ∇F (e) = Ce. Therefore the vector e is an outer normal of F −1 ((0, F (e)]) at the boundary point e of F −1 ((0, F (e)]). Since F −1 ((0, F (e)]) is convex, it is contained in the half-space {x ∈ Rn : x · e ≤ e · e}. That is to say, for each λ ∈ Rn+ , F (λ) ≤ F (e)
=⇒
Meanwhile, for λ = (λ1 , · · · , λn ) ∈ Rn+ with gives
λ · e ≤ n. n i=1
(3.11)
λi = 1, the GM-AM inequality
λ · e ≥ n,
(3.12)
with equality if and only if λ = e. That shows that F (λ) ≥ F (e). Since F (e) = φˆ∗ , by the definition of (φ∗ , Q)-Fisher information we have Eφ∗ [|B −t YQ |] ≥ Eφ∗ [|YQ |], for all B ∈ S with determinant 1. We now consider the uniqueness of A. Suppose that A1 , A2 ∈ S have determinant 1 (1) (2) −t and satisfy (3.6). Denote by YQ = A−t 1 XQ and YQ = A2 XQ . In view of the procedure presented above, we achieve that for all B ∈ S with determinant 1, both of ≥ Eφ∗ A−t XQ Eφ∗ B −t A−t 1 XQ 1
(3.13)
≥ Eφ∗ A−t XQ Eφ∗ B −t A−t 2 XQ 2
(3.14)
and
S. Lv / Advances in Applied Mathematics 89 (2017) 18–40
34
hold true. Thus, by setting B = A2 A−1 in (3.13) and B = A1 A−1 in (3.14), respec1 2 −1 −1 tively, and by observing the fact that A2 A1 , A1 A2 ∈ S and that det (A2 A−1 1 ) = det (A1 A−1 ) = 1, we obtain 2 −t Eφ∗ [|A−t 2 XQ |] ≥ Eφ∗ [|A1 XQ |]
−t and Eφ∗ [|A−t 1 XQ |] ≥ Eφ∗ [|A2 XQ |],
which concludes (1) (2) Eφ∗ YQ = Eφ∗ YQ .
(3.15)
Combining (3.13)–(3.15) with Lemma 3.3, we see that the positive definite symmetric matrix A is unique. 2 4. The (∞, Q)-Fisher information matrices Let the Gaussian gauge φ(t) = tp with p > 1. Then the dual gauge φ∗ (s) = sq for 1/p + 1/q = 1. Under this assumption the (q, Q)-Fisher information matrix Aq is characterized (see [7]) by E[|A−1 XQ |q−2 (A−1 XQ ) ⊗ (A−1 XQ )] = In , provided that X has finite (q, Q)-Fisher information. Such a matrix A ∈ S actually is the solution to the constrained problem min{|B|}
B∈S
subject to
E[|B −t XQ |q ] q = n q . 1
1
(4.1)
As q → ∞, the problem (4.1) reduces to max{|B −t |} subject to B∈S
|B −t XQ | ≤ 1.
(4.2)
The solution to (4.2) is referred to as (∞, Q)-Fisher information matrix, which relates very closely to the John ellipsoid theorem [3] from convex analysis. We define the (∞, Q)-Fisher information of X as the essential supremum (see, e.g., [20, pp. 8]) of XQ : E∞ [|XQ |] = ess supXQ ∈Rn |XQ |
! = inf τ ≥ 0 : L({y : |fXQ (y)| > τ }) = 0 ,
where L denotes the Lebesgue measure in Rn , and fXQ (y) represents the probability density function of XQ . Throughout this section, we always assume that X has finite (∞, Q)-Fisher information. As a consequence, the probability density functions fXQ of the Q-score vectors XQ in Rn are then elements of the space L∞ (Rn , L).
S. Lv / Advances in Applied Mathematics 89 (2017) 18–40
35
The existence and uniqueness of the solution to (4.2) are obtained with the following:

Theorem 4.1. Let
$$\mathcal{S} = \{B^{-t} \in S : |B^{-t} X_Q| \le 1;\ f_{X_Q} \in L^\infty(\mathbb{R}^n, \mathcal{L})\}.$$
Then there exists exactly one $A \in S$ such that $|A^{-t}| \ge |B^{-t}|$ for all $B^{-t} \in \mathcal{S}$.

Proof. The existence of $A^{-t}$ will be clear once $\mathcal{S}$ is proved to be compact, because the determinant is continuous. It then suffices to show that $\mathcal{S}$ is bounded. For $B^{-t} \in \mathcal{S}$, let $\eta$ be the maximal eigenvalue of $B^{-t}$ with normalized eigenvector $e$; then
$$1 \ge |B^{-t} X_Q| \ge \eta |e \cdot X_Q|. \tag{4.3}$$
Define the generalized mean $M_q$ by
$$M_q = E[|e \cdot X_Q|^q]^{\frac{1}{q}}.$$
Then, by the monotonicity of generalized means, we have
$$\operatorname*{ess\,sup}_{X_Q \in \mathbb{R}^n} |e \cdot X_Q| = M_\infty \ge M_1 = E[|e \cdot X_Q|]. \tag{4.4}$$
Since by (3.2) there exists a constant $c > 0$ such that $M_1 = E[|e \cdot X_Q|] \ge c > 0$, from (4.3) and (4.4) we have
$$1 \ge \eta M_\infty \ge \eta M_1 \ge \eta c > 0,$$
proving that $\eta \le 1/c$. Thus $\mathcal{S}$ is bounded.

Suppose that there is another matrix $\bar{A} \in S$ solving the problem (4.2), i.e.,
$$|\bar{A}^{-t}| \ge |B^{-t}| \quad \text{for all} \quad B^{-t} \in \mathcal{S}.$$
Then, by Minkowski's determinant theorem and the GM–AM inequality,
$$\left| \frac{A^{-t} + \bar{A}^{-t}}{2} \right| \ge |A^{-t}|^{\frac{1}{2}} |\bar{A}^{-t}|^{\frac{1}{2}} = |A^{-t}|. \tag{4.5}$$
Now the observation that $(A^{-t} + \bar{A}^{-t})/2 \in \mathcal{S}$, together with the equality conditions of Minkowski's determinant theorem and of the GM–AM inequality, shows that $A^{-t} = \bar{A}^{-t}$. $\Box$
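The two determinant inequalities combined in (4.5), Minkowski's determinant theorem and GM–AM, can be spot-checked numerically. The sketch below (random positive definite matrices of our own choosing, purely illustrative) verifies both for a generic pair.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def random_spd(rng, n):
    """A generic positive definite symmetric matrix."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

A, B = random_spd(rng, n), random_spd(rng, n)
dA, dB = np.linalg.det(A), np.linalg.det(B)

# Minkowski's determinant theorem:
# det(A + B)^(1/n) >= det(A)^(1/n) + det(B)^(1/n) for positive definite A, B.
lhs = np.linalg.det(A + B) ** (1 / n)
rhs = dA ** (1 / n) + dB ** (1 / n)
print(lhs >= rhs)

# Combined with GM-AM as in (4.5): det((A + B)/2) >= sqrt(det A * det B).
print(np.linalg.det((A + B) / 2) >= np.sqrt(dA * dB))
```

Equality in both steps forces $A = B$, which is exactly how uniqueness is extracted in the proof above.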
There is a dual problem to (4.2), which states:
$$\min_{B \in S} |B^{-t} X_Q| \quad \text{subject to} \quad |B| = 1. \tag{4.6}$$
It is not hard to see that the solutions to the problems (4.2) and (4.6) differ only by a scaling factor. A solution to (4.6) is characterized as follows.

Theorem 4.2. Suppose that $X_Q$ has finite essential supremum. A positive definite symmetric matrix $A$ with $|A| = 1$ satisfies $|A^{-t} X_Q| \le |(A')^{-t} X_Q|$, where $A' \in S$ with $|A'| = 1$, if and only if
$$\frac{(A^{-t} X_Q) \otimes (A^{-t} X_Q)}{|A^{-t} X_Q|^2} = \frac{I_n}{n}. \tag{4.7}$$
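Condition (4.7) says that the normalized score $A^{-t}X_Q$ is isotropic: the outer product of its direction with itself, scaled by its squared length, is proportional to the identity. As an illustrative sanity check (our own; Gaussian samples stand in for an isotropic score vector, and the condition is checked on average over the sample), a rotation-invariant sample satisfies this isotropy numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 200_000
Y = rng.standard_normal((N, n))            # rotation-invariant sample

R = np.linalg.norm(Y, axis=1, keepdims=True)
U = Y / R                                  # unit directions, uniform on the sphere
M = U.T @ U / N                            # empirical average of (y ⊗ y)/|y|^2
print(np.max(np.abs(M - np.eye(n) / n)))   # small (≈ 1e-3)
```

Conversely, if the sample were stretched by a non-identity linear map, the averaged outer product would drift away from $I_n/n$, which is what the minimizing matrix $A$ in (4.6) corrects.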
Proof. Let $A \in S$ be a solution to the problem (4.6). For each $B \in S$, denote
$$(B^{-t} A^{-t})_s = [(B^{-t} A^{-t})^t (B^{-t} A^{-t})]^{1/2};$$
then $|B^{-t} A^{-t}| = |(B^{-t} A^{-t})_s|$ and $|(B^{-t} A^{-t})x| = |(B^{-t} A^{-t})_s x|$ for any $x \in \mathbb{R}^n$. Denoting $Y_Q = A^{-t} X_Q$ and $\bar{B} = B/|B|^{1/n}$, we get
$$\begin{aligned}
|\bar{B}^{-t} Y_Q| &= |B^{-t}|^{-\frac{1}{n}} |B^{-t} Y_Q| \\
&= |A^{-t}|^{\frac{1}{n}} |B^{-t} A^{-t}|^{-\frac{1}{n}} |B^{-t} A^{-t} X_Q| \\
&= |A^{-t}|^{\frac{1}{n}} |(B^{-t} A^{-t})_s|^{-\frac{1}{n}} |(B^{-t} A^{-t})_s X_Q| \\
&\ge |A^{-t}|^{\frac{1}{n}} |A^{-t}|^{-\frac{1}{n}} |A^{-t} X_Q| \\
&= |Y_Q|.
\end{aligned}$$
For $B \in S$, set
$$\bar{B} = (I_n + \varepsilon B)/|I_n + \varepsilon B|^{\frac{1}{n}}.$$
Since for small enough $\varepsilon$, $I_n + \varepsilon B$ is still positive definite, we have
$$0 = \frac{d}{d\varepsilon}\bigg|_{\varepsilon = 0} \Big( |I_n + \varepsilon B|^{\frac{1}{n}} \, |(I_n + \varepsilon B)^{-t} Y_Q| \Big) = \frac{\operatorname{tr}(B)}{n} |Y_Q| - \frac{B^t Y_Q \cdot Y_Q}{|Y_Q|}.$$
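The first-order condition just derived can be verified by a finite-difference computation. In the Python sketch below (the matrix `B`, vector `y`, and step size are our illustrative choices), the central difference of $\varepsilon \mapsto |I_n + \varepsilon B|^{1/n}\,|(I_n + \varepsilon B)^{-t} y|$ at $\varepsilon = 0$ matches the closed form $\operatorname{tr}(B)/n \cdot |y| - (B^t y \cdot y)/|y|$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
B = rng.standard_normal((n, n))
y = rng.standard_normal(n)

def f(eps):
    """The objective |I + eps*B|^(1/n) * |(I + eps*B)^{-t} y|."""
    Me = np.eye(n) + eps * B
    return np.linalg.det(Me) ** (1 / n) * np.linalg.norm(np.linalg.inv(Me).T @ y)

h = 1e-6
numeric = (f(h) - f(-h)) / (2 * h)                  # central difference
closed = (np.trace(B) / n * np.linalg.norm(y)
          - (B.T @ y) @ y / np.linalg.norm(y))       # derivative from the proof
print(numeric, closed)                               # agree to high accuracy
```

The two terms come from the product rule: $\frac{d}{d\varepsilon}|I_n + \varepsilon B|^{1/n}\big|_0 = \operatorname{tr}(B)/n$ and $\frac{d}{d\varepsilon}|(I_n + \varepsilon B)^{-t} y|\big|_0 = -(B^t y \cdot y)/|y|$.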
Therefore, it follows from a direct calculation that
$$\frac{(A^{-t} X_Q) \otimes (A^{-t} X_Q)}{|A^{-t} X_Q|^2} = \frac{I_n}{n}.$$
Conversely, we assume that (4.7) holds. Let $Y_Q = A^{-t} X_Q$; then for any unit vector $e$, it follows that
$$\frac{(Y_Q \cdot e)^2}{|Y_Q|^2} = \frac{1}{n}. \tag{4.8}$$
For $B \in S$, define a generalized mean $\overline{M}_p[Y_Q]$ by
$$\overline{M}_p[Y_Q] = \left( \int_{\mathbb{R}^n} \frac{|B^{-t} Y_Q|^p}{|Y_Q|^p} f(y)\,dy \right)^{\frac{1}{p}},$$
where $\overline{M}_\infty[Y_Q]$ and $\overline{M}_0[Y_Q]$ are the limit cases given by
$$\overline{M}_\infty[Y_Q] = \lim_{p \to \infty} \overline{M}_p[Y_Q] = \operatorname*{ess\,sup}_{Y_Q \in \mathbb{R}^n} \frac{|B^{-t} Y_Q|}{|Y_Q|} \tag{4.9}$$
and
$$\overline{M}_0[Y_Q] = \lim_{p \to 0} \overline{M}_p[Y_Q] = \exp\left( \int_{\mathbb{R}^n} \log \frac{|B^{-t} Y_Q|}{|Y_Q|} f(y)\,dy \right).$$
Let $e_1, \cdots, e_n$ be an orthonormal basis of eigenvectors of $B^{-t}$ with corresponding eigenvalues $\lambda_1, \cdots, \lambda_n$. By the monotonicity of generalized means, the concavity of $\log$, and (4.8),
$$\begin{aligned}
\overline{M}_\infty[Y_Q] &\ge \overline{M}_0[Y_Q] \\
&= \exp\left( \frac{1}{2} \int_{\mathbb{R}^n} \log \frac{|B^{-t} Y_Q|^2}{|Y_Q|^2} f(y)\,dy \right) \\
&= \exp\left( \frac{1}{2} \int_{\mathbb{R}^n} \log \left( \sum_{i=1}^n \lambda_i^2 \frac{|e_i \cdot Y_Q|^2}{|Y_Q|^2} \right) f(y)\,dy \right) \\
&\ge \exp\left( \int_{\mathbb{R}^n} \sum_{i=1}^n \frac{|e_i \cdot Y_Q|^2}{|Y_Q|^2} \log(\lambda_i) f(y)\,dy \right) \\
&= |B^{-t}|^{\frac{1}{n}}.
\end{aligned}$$
This together with (4.9) shows that
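The chain of inequalities above rests on $\overline{M}_\infty \ge \overline{M}_0$ and on the weighted GM–AM step, whose combined conclusion is $\overline{M}_0 \ge |B^{-t}|^{1/n}$ for an isotropic score. Both can be illustrated on a sample (the diagonal $B^{-t}$ with determinant 1 and the Gaussian sample below are our choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 3, 50_000
Y = rng.standard_normal((N, n))        # isotropic sample standing in for Y_Q
Binv_t = np.diag([0.5, 1.0, 2.0])      # a fixed B^{-t} with det = 1

# Ratios |B^{-t} y| / |y| for each sample point.
r = np.linalg.norm(Y @ Binv_t.T, axis=1) / np.linalg.norm(Y, axis=1)
M_inf = r.max()                         # empirical version of the sup mean
M_0 = np.exp(np.mean(np.log(r)))        # empirical version of the log mean
print(M_inf, M_0)                       # M_inf ≈ 2, and M_inf >= M_0 >= det^(1/n) = 1
```

The ratios are bounded between the smallest and largest eigenvalues of $B^{-t}$, and for the isotropic sample the geometric mean exceeds $|B^{-t}|^{1/n} = 1$, mirroring the displayed chain.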
$$|B^{-t} Y_Q| \ge |Y_Q|$$
holds for any $B^{-t} \in S$ with $|B^{-t}| = 1$. Now substituting $Y_Q$ by $A^{-t} X_Q$ and applying a symmetrization argument to $B^{-t} A^{-t}$ derives the desired result. $\Box$

Finally, we summarize the results of this section with the following:

Theorem 4.3. Let $X_Q$ denote the $Q$-score vector of an arbitrary random vector $X$ in $\mathbb{R}^n$ with finite $(\infty, Q)$-Fisher information. Then there exists a unique positive definite symmetric matrix $A$ of minimal determinant such that $|A^{-t} X_Q| = n$. Moreover, $A$ is such a unique matrix (up to scalar multiplication) if and only if
$$\frac{(A^{-t} X_Q) \otimes (A^{-t} X_Q)}{|A^{-t} X_Q|^2} = \frac{I_n}{n}.$$

5. Introduction of invariant Fisher information via the general Fisher information matrix

Associated to a Gaussian gauge $\phi$ and the $Q$-score of a random vector in $\mathbb{R}^n$, we denote the $(\phi^*, Q)$-Fisher information matrix by $J_{\phi^*,Q}(X)$. One possible way of introducing an invariant form of the generalized Fisher information, as described below, is to minimize the Fisher information over all volume-preserving linear transformations of $X$. Define
$$\hat{\Phi}_Q(X) = \inf_{A \in SL(n)} E_{\phi^*}[|A^{-t} X_Q|]. \tag{5.1}$$
The following theorem can eventually be viewed as one of the motivations for characterizing the $(\phi^*, Q)$-Fisher information matrices.

Theorem 5.1. Let $X$ be a random vector in $\mathbb{R}^n$ with finite $(\phi^*, Q)$-Fisher information for a Gaussian gauge $\phi$. Then
$$\hat{\Phi}_Q(X) = n |J_{\phi^*,Q}(X)|^{\frac{1}{n}}.$$

Proof. We first note that for each $A \in GL(n)$ and $x \in \mathbb{R}^n$,
$$|A^{-t} x| = |P^{-t} x|,$$
where $A = TP$, $T \in O(n)$, $P \in S$, is the polar decomposition of $A$. This fact, together with the homogeneity of the $(\phi^*, Q)$-Fisher information $E_{\phi^*}[|X_Q|]$, derives that
$$\begin{aligned}
\inf_{A \in SL(n)} E_{\phi^*}[|A^{-t} X_Q|] &= \inf_{A \in GL(n)} E_{\phi^*}[|A^{-t} X_Q|]\,|A|^{\frac{1}{n}} \\
&= \inf\left\{ n|A|^{\frac{1}{n}} : E_{\phi^*}[|A^{-t} X_Q|] = n,\ A \in GL(n) \right\} \\
&= \inf\left\{ n|A|^{\frac{1}{n}} : E_{\phi^*}[|A^{-t} X_Q|] = n,\ A \in S \right\} \\
&= n |J_{\phi^*,Q}(X)|^{\frac{1}{n}}. \qquad \Box
\end{aligned}$$
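The polar-decomposition fact used at the start of the proof, $|A^{-t}x| = |P^{-t}x|$ for $A = TP$, can be checked directly. In the sketch below the polar factors are computed from the SVD (a standard construction; the specific matrix and vector are our illustrative choices): if $A = U\Sigma V^t$, then $T = UV^t$ is orthogonal and $P = V\Sigma V^t$ is positive definite symmetric.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # generic invertible matrix

# Polar decomposition A = T P via the SVD: A = U S V^t, T = U V^t, P = V S V^t.
U, s, Vt = np.linalg.svd(A)
T = U @ Vt                        # orthogonal factor
P = Vt.T @ np.diag(s) @ Vt        # positive definite symmetric factor
print(np.allclose(T @ P, A))      # the factorization reconstructs A

x = rng.standard_normal(n)
lhs = np.linalg.norm(np.linalg.inv(A).T @ x)      # |A^{-t} x|
rhs = np.linalg.norm(np.linalg.inv(P).T @ x)      # |P^{-t} x|
print(lhs, rhs)                                   # equal up to rounding
```

The identity holds because $A^{-t} = T P^{-t}$ with $T$ orthogonal, and orthogonal maps preserve the Euclidean norm; this is what lets the infimum over $GL(n)$ be restricted to positive definite symmetric matrices.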
6. Conclusions

We develop a general theory regarding information quantities such as the Fisher score, the Rényi entropy, and the Fisher information of a random vector in $\mathbb{R}^n$. In particular, a general Fisher information matrix is introduced and characterized. The adopted framework allows us to study an even broader class of general Fisher information and information matrices than the class of $(q, \lambda)$-Fisher information and information matrices. However, this, as well as the general moment matrix introduced and characterized in [13], is achieved and can only be achieved implicitly, which places us in an interesting situation when we investigate the corresponding Cramér–Rao inequality, since it is then not possible to derive a Cramér–Rao bound directly from these information matrices. For a random variable, a general Cramér–Rao inequality corresponding to such a setting has been obtained by Cianchi et al. [1]. However, the idea and the technique employed in [1] cannot be carried over to the multi-dimensional case. Further work will cover the establishment of a general Stam inequality as well as a general moment-entropy inequality connecting the $\phi$-moment, the $(\phi^*, Q)$-Fisher information, and the $Q$-Rényi entropy power. Indeed, a combination of such inequalities will lead to the desired general Cramér–Rao inequality for a random vector. For known related results within the framework of $\lambda$-Rényi entropy theory, one may refer to [1,2,4,10–12,14,19,21,22] and the references therein.

On the one hand, if the function $Q$, used in the definitions of the $Q$-Rényi entropy and the $Q$-score, and its inverse satisfy a certain homogeneity property, then the $Q$-Rényi entropy may be invariant under $SL(n)$ transforms:
$$N_Q(AX) = |A|^{\frac{2}{n}} N_Q(X), \quad \text{for } A \in SL(n).$$
Hence, one may ask whether there exists a linearly invariant general Fisher information that strengthens the related information inequalities. Such an affine information theory has been developed in [7,8] for the $\lambda$-Rényi entropy, but for the $Q$-Rényi entropy it remains open. On the other hand, the fact that the general Fisher information matrix and the general moment matrix can only be characterized implicitly also motivates us to study a more general affine information theory. In particular, finding explicit expressions for an invariant general Fisher information is highly desirable.
Acknowledgments

This work is supported in part by NSFC under Grant 10801140, CSTC under Grant 2013-JCYJ-A00005, and the CQNU Foundation under Grant 13XLZ05.

References

[1] A. Cianchi, E. Lutwak, D. Yang, G. Zhang, A unified approach to Cramér–Rao inequalities, IEEE Trans. Inform. Theory 60 (1) (2014) 643–650.
[2] J.A. Costa, A.O. Hero, C. Vignat, A characterization of the multivariate distributions maximizing Renyi entropy, in: Proc. 2002 IEEE Int. Symp. Information Theory, Lausanne, Switzerland, Jun./Jul. 2002, p. 263.
[3] F. John, Extremum problems with inequalities as subsidiary conditions, in: Studies and Essays Presented to R. Courant on His 60th Birthday, January 8, Interscience Publishers, Inc., New York, NY, 1948, pp. 187–204.
[4] C.P. Kitsos, N.K. Tavoularis, Logarithmic Sobolev inequalities for information measures, IEEE Trans. Inform. Theory 55 (6) (2009) 2554–2561.
[5] V. Latyshev, Linear invariant statistics for signal parameter estimation, Electr. Electron. Eng. 5 (2) (2012) 277–283.
[6] M. Ludwig, Covariance matrices and valuations, Adv. in Appl. Math. 51 (2013) 359–366.
[7] E. Lutwak, S. Lv, D. Yang, G. Zhang, Extensions of Fisher information and Stam's inequality, IEEE Trans. Inform. Theory 58 (3) (2012) 1319–1327.
[8] E. Lutwak, S. Lv, D. Yang, G. Zhang, Affine moments of a random vector, IEEE Trans. Inform. Theory 59 (9) (2013) 5592–5599.
[9] E. Lutwak, D. Yang, G. Zhang, Moment-entropy inequalities, Ann. Probab. 32 (2004) 757–774.
[10] E. Lutwak, D. Yang, G. Zhang, Cramér–Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information, IEEE Trans. Inform. Theory 51 (2) (2005) 473–478.
[11] E. Lutwak, D. Yang, G. Zhang, Optimal Sobolev norms and the Lp Minkowski problem, Int. Math. Res. Not. IMRN 62987 (2006) 1–21.
[12] E. Lutwak, D. Yang, G. Zhang, Moment-entropy inequalities for a random vector, IEEE Trans. Inform. Theory 53 (4) (2007) 1603–1607.
[13] S. Lv, Covariance matrices associated to general moments of a random vector, J. Multivariate Anal. 134 (2015) 61–70.
[14] S. Lv, X. Lv, Affine Fisher information inequalities, J. Math. Anal. Appl. 371 (1) (2010) 347–354.
[15] M. Noguchi, Invariant Fisher information, Differential Geom. Appl. 4 (2) (1994) 179–199.
[16] A. Peter, A. Rangarajan, Shape matching using the Fisher–Rao Riemannian metric: unifying shape representation and deformation, Proc. IEEE Int. Symp. Biomed. Imag. (2006) 1164–1167.
[17] A. Rényi, On measures of information and entropy, in: Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, 1960, pp. 547–561.
[18] A.W. Roberts, D.E. Varberg, Convex Functions, Academic Press, New York, 1973.
[19] A.J. Stam, Some inequalities satisfied by the quantities of information of Fisher and Shannon, Inf. Control 2 (1959) 101–112.
[20] E.M. Stein, R. Shakarchi, Functional Analysis: Introduction to Further Topics in Analysis, Princeton Lectures in Analysis, Princeton Univ. Press, 2011.
[21] T.L. Toulias, Generalized information for the γ-order normal distribution, J. Probab. Stat. (2015) 385285.
[22] T.L. Toulias, C.P. Kitsos, Generalizations of entropy and information measures, in: N.J. Daras, M.T.H. Rassias (Eds.), Computation, Cryptography and Network Security, Springer, 2015, pp. 495–526.
[23] I. Vajda, On convergence of information contained in quantized observations, IEEE Trans. Inform. Theory 48 (8) (2002) 2163–2172.
[24] Wikipedia, Logistic distribution, Available at: http://en.wikipedia.org/wiki/Logistic_distribution.